master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[2024-11-05 20:08:59,870] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:08:59,995] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:09:00,018] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:09:00,056] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:09:00,061] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:09:00,099] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:09:00,111] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-05 20:09:03,412] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-05 20:09:03,977] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-05 20:09:03,998] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-05 20:09:03,998] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-11-05 20:09:04,011] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-05 20:09:04,027] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-05 20:09:04,226] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-05 20:09:04,332] [INFO] [comm.py:637:init_distributed] cdb=None
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 
16-bits training: True
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: True
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
11/05/2024 20:09:05 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=configs/deepspeed.json,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/Chemgpt_brain_v2-20241105-200852-1e-4/runs/Nov05_20-08-59_gpu1,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=10000,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=output/Chemgpt_brain_v2-20241105-200852-1e-4,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=7,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=output/Chemgpt_brain_v2-20241105-200852-1e-4,
save_on_each_node=False,
save_safetensors=False,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
[INFO|configuration_utils.py:667] 2024-11-05 20:09:05,033 >> loading configuration file /public1/home/amzhou/lwt/model/Brain_GPT/config.json
[INFO|configuration_utils.py:667] 2024-11-05 20:09:05,036 >> loading configuration file /public1/home/amzhou/lwt/model/Brain_GPT/config.json
[INFO|configuration_utils.py:725] 2024-11-05 20:09:05,037 >> Model config ChatGLMConfig {
  "_name_or_path": "/public1/home/amzhou/lwt/model/Brain_GPT",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1e-05,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_layers": 28,
  "original_rope": true,
  "pad_token_id": 0,
  "padded_vocab_size": 65024,
  "post_layer_norm": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "rmsnorm": true,
  "seq_length": 8192,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 65024
}
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: True
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: True
11/05/2024 20:09:05 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True
[INFO|tokenization_utils_base.py:1821] 2024-11-05 20:09:05,052 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1821] 2024-11-05 20:09:05,052 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1821] 2024-11-05 20:09:05,052 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1821] 2024-11-05 20:09:05,052 >> loading file tokenizer_config.json
[INFO|modeling_utils.py:2575] 2024-11-05 20:09:05,443 >> loading weights file /public1/home/amzhou/lwt/model/Brain_GPT/pytorch_model.bin.index.json
[INFO|configuration_utils.py:577] 2024-11-05 20:09:05,443 >> Generate config 
GenerationConfig {
  "_from_model_config": true,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.30.2"
}
Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3303] 2024-11-05 20:09:17,380 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /public1/home/amzhou/lwt/model/Brain_GPT.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:2927] 2024-11-05 20:09:17,382 >> Generation config file not found, using a generation config created from the model config.
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:14<00:10, 3.52s/it]
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:14<00:10, 3.57s/it]
Loading checkpoint shards: 43%|████▎ | 3/7 [00:16<00:21, 5.41s/it]
Loading checkpoint shards: 43%|████▎ | 3/7 [00:16<00:21, 5.45s/it]
Loading checkpoint shards: 43%|████▎ | 3/7 [00:16<00:21, 5.45s/it]
Loading checkpoint shards: 71%|███████▏ | 5/7 [00:17<00:07, 3.57s/it]
Loading checkpoint shards: 71%|███████▏ | 5/7 [00:17<00:07, 3.60s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:21<00:03, 3.51s/it]
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:21<00:15, 5.27s/it]
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:21<00:15, 5.28s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:21<00:03, 3.53s/it]
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:21<00:15, 5.32s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:22<00:00, 2.72s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:22<00:00, 3.19s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:23<00:00, 2.97s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:23<00:00, 3.31s/it]
Loading checkpoint shards: 71%|███████▏ | 5/7 [00:24<00:09, 4.61s/it]
Loading checkpoint shards: 71%|███████▏ | 5/7 [00:24<00:09, 4.68s/it]
Loading checkpoint shards: 71%|███████▏ | 5/7 [00:24<00:09, 4.70s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:28<00:04, 4.27s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:28<00:04, 4.33s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:28<00:04, 4.34s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 3.55s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 4.34s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 3.61s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 4.38s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 3.61s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 4.39s/it]
max leng of data is 8362
PrefixTrainer ()
/public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
max leng of data is 8362
PrefixTrainer ()
/public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
max leng of data is 8362
PrefixTrainer ()
/public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
max leng of data is 8362
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'<|user|>': 64795 -> -100
'': 30910 -> -100
'\n': 13 -> -100
'hello': 24954 -> -100
'<|assistant|>': 64796 -> -100
'': 30910 -> 30910
'\n': 13 -> 13
'你': 36474 -> 36474
'好': 54591 -> 54591
',': 31123 -> 31123
'我是': 33030 -> 33030
'C': 30942 -> 30942
'hem': 3343 -> 3343
'G': 30964 -> 30964
'PT': 8705 -> 8705
',': 31123 -> 31123
'请问': 42693 -> 42693
'有什么': 33277 -> 33277
'需要': 31665 -> 31665
'帮助': 31934 -> 31934
'吗': 55398 -> 55398
'?': 31514 -> 31514
'🥰': 64481 -> 64481
'': 2 -> 2
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
...
'': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 <<<<<<<<<<<<< Sanity Check PrefixTrainer () 11/05/2024 20:23:19 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. 
It is recommended to upgrade the kernel to the minimum version or higher. [INFO|trainer.py:577] 2024-11-05 20:23:19,801 >> max_steps is given, it will override any value given in num_train_epochs /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [2024-11-05 20:23:20,516] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown max leng of data is 8362 PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( max leng of data is 8362 PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( max leng of data is 8362 PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [2024-11-05 20:24:42,421] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2024-11-05 20:24:42,422] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2024-11-05 20:24:42,422] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2024-11-05 20:24:42,428] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW [2024-11-05 20:24:42,428] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type= [2024-11-05 20:24:42,428] [WARNING] [engine.py:1188:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution ***** [2024-11-05 20:24:42,428] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2024-11-05 20:24:42,428] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500000000 [2024-11-05 20:24:42,428] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500000000 [2024-11-05 20:24:42,428] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False [2024-11-05 20:24:42,428] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False 11/05/2024 20:24:56 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[2024-11-05 20:24:57,212] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states [2024-11-05 20:24:57,213] [INFO] [utils.py:801:see_memory_usage] MA 14.95 GB Max_MA 14.95 GB CA 14.96 GB Max_CA 15 GB [2024-11-05 20:24:57,213] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 82.18 GB, percent = 16.3% [2024-11-05 20:24:57,629] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states [2024-11-05 20:24:57,630] [INFO] [utils.py:801:see_memory_usage] MA 14.95 GB Max_MA 18.28 GB CA 18.28 GB Max_CA 18 GB [2024-11-05 20:24:57,630] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 84.4 GB, percent = 16.8% [2024-11-05 20:24:57,630] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized [2024-11-05 20:24:58,038] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer [2024-11-05 20:24:58,039] [INFO] [utils.py:801:see_memory_usage] MA 14.95 GB Max_MA 14.95 GB CA 18.28 GB Max_CA 18 GB [2024-11-05 20:24:58,039] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 86.32 GB, percent = 17.1% [2024-11-05 20:24:58,040] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW [2024-11-05 20:24:58,040] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler [2024-11-05 20:24:58,040] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None [2024-11-05 20:24:58,040] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-05, 5e-05], mom=[(0.9, 0.999), (0.9, 0.999)] [2024-11-05 20:24:58,041] [INFO] [config.py:996:print] DeepSpeedEngine configuration: [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] amp_enabled .................. False [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] amp_params ................... False [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] bfloat16_enabled ............. False [2024-11-05 20:24:58,041] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] comms_config ................. [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] communication_data_type ...... None [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={} [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] dataloader_drop_last ......... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] disable_allgather ............ False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] dump_state ................... 
False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] elasticity_enabled ........... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] fp16_auto_cast ............... False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] fp16_enabled ................. True [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] global_rank .................. 0 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] grad_accum_dtype ............. None [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 1 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] gradient_clipping ............ 
0.0 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] graph_harvesting ............. False [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 65536 [2024-11-05 20:24:58,042] [INFO] [config.py:1000:print] load_universal_checkpoint .... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] loss_scale ................... 0 [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] memory_breakdown ............. False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] mics_shard_size .............. -1 [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] optimizer_name ............... None [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] optimizer_params ............. None [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] pld_enabled .................. False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] pld_params ................... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] prescale_gradients ........... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] scheduler_name ............... None [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] scheduler_params ............. None [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32 [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] sparse_attention ............. None [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] steps_per_print .............. inf [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] train_batch_size ............. 49 [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 7 [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] use_node_local_storage ....... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] weight_quantization_config ... None [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] world_size ................... 7 [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] zero_allow_untested_optimizer True [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] zero_config .................. 
stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] zero_enabled ................. True [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True [2024-11-05 20:24:58,043] [INFO] [config.py:1000:print] zero_optimization_stage ...... 
2 [2024-11-05 20:24:58,043] [INFO] [config.py:986:print_user_config] json = { "train_micro_batch_size_per_gpu": 7, "zero_allow_untested_optimizer": true, "fp16": { "enabled": true, "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "zero_optimization": { "stage": 2, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "contiguous_gradients": true }, "gradient_accumulation_steps": 1, "steps_per_print": inf, "bf16": { "enabled": false } } [INFO|trainer.py:1786] 2024-11-05 20:24:58,044 >> ***** Running training ***** [INFO|trainer.py:1787] 2024-11-05 20:24:58,044 >> Num examples = 130,014 [INFO|trainer.py:1788] 2024-11-05 20:24:58,044 >> Num Epochs = 4 [INFO|trainer.py:1789] 2024-11-05 20:24:58,044 >> Instantaneous batch size per device = 7 [INFO|trainer.py:1790] 2024-11-05 20:24:58,044 >> Total train batch size (w. parallel, distributed & accumulation) = 49 [INFO|trainer.py:1791] 2024-11-05 20:24:58,044 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1792] 2024-11-05 20:24:58,044 >> Total optimization steps = 10,000 [INFO|trainer.py:1793] 2024-11-05 20:24:58,045 >> Number of trainable parameters = 6,243,584,000 0%| | 0/10000 [00:00> The following columns in the training set don't have a corresponding argument in `ChatGLMForConditionalGeneration.forward` and have been ignored: lengg. If lengg are not expected by `ChatGLMForConditionalGeneration.forward`, you can safely ignore this message. 11/05/2024 20:24:58 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11/05/2024 20:25:09 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
11/05/2024 20:25:10 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/05/2024 20:25:13 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/05/2024 20:25:15 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/05/2024 20:25:15 - WARNING - transformers_modules.Brain_GPT.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[2024-11-05 20:25:29,444] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
0%| | 1/10000 [00:31<87:13:04, 31.40s/it] {'loss': 2.2928, 'learning_rate': 5e-05, 'epoch': 0.0}
0%| | 1/10000 [00:33<87:13:04, 31.40s/it][2024-11-05 20:25:42,659] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
0%| | 2/10000 [00:44<57:29:49, 20.70s/it] {'loss': 2.138, 'learning_rate': 5e-05, 'epoch': 0.0}
0%| | 3/10000 [00:57<47:22:25, 17.06s/it] {'loss': 2.073, 'learning_rate': 4.9995000000000005e-05, 'epoch': 0.0}
0%| | 4/10000 [01:10<42:46:38, 15.41s/it] {'loss': 0.8426, 'learning_rate': 4.999e-05, 'epoch': 0.0}
0%| | 5/10000 [01:23<40:13:49, 14.49s/it] {'loss': 0.5671, 'learning_rate': 4.9985e-05, 'epoch': 0.0}
0%| | 6/10000 [01:36<38:48:40, 13.98s/it] {'loss': 0.3448, 'learning_rate': 4.9980000000000006e-05, 'epoch': 0.0}
0%| | 7/10000 [01:49<37:56:32, 13.67s/it] {'loss': 0.2545, 'learning_rate': 4.9975e-05, 'epoch': 0.0}
0%| | 8/10000 [02:02<37:19:33, 13.45s/it] {'loss': 0.247, 'learning_rate': 4.997e-05, 'epoch': 0.0}
0%| | 9/10000 [02:15<36:52:53, 13.29s/it] {'loss': 0.1912, 'learning_rate': 4.9965e-05, 'epoch': 0.0}
0%| | 10/10000 [02:27<36:34:03, 13.18s/it] {'loss': 0.1876, 'learning_rate': 4.996e-05, 'epoch': 0.0}
0%| | 11/10000 [02:40<36:24:36, 13.12s/it] {'loss': 0.1863, 'learning_rate': 4.9955e-05, 'epoch': 0.0}
0%| | 12/10000 [02:53<36:17:00, 13.08s/it] {'loss': 0.1542, 'learning_rate': 4.995e-05, 'epoch': 0.0}
0%| | 13/10000 [03:06<36:11:55, 13.05s/it] {'loss': 0.1581, 'learning_rate': 4.9945000000000004e-05, 'epoch': 0.0}
0%| | 14/10000 [03:19<36:07:26, 13.02s/it] {'loss': 0.1535, 'learning_rate': 4.9940000000000006e-05, 'epoch': 0.01}
0%| | 15/10000 [03:32<35:56:24, 12.96s/it] {'loss': 0.1407, 'learning_rate': 4.9935e-05, 'epoch': 0.01}
0%| | 16/10000 [03:45<35:56:19, 12.96s/it] {'loss': 0.1089, 'learning_rate': 4.9930000000000005e-05, 'epoch': 0.01}
0%| | 17/10000 [03:58<35:53:47, 12.94s/it] {'loss': 0.1147, 'learning_rate': 4.992500000000001e-05, 'epoch': 0.01}
0%| | 18/10000 [04:11<35:54:23, 12.95s/it] {'loss': 0.1388, 'learning_rate': 4.992e-05, 'epoch': 0.01}
0%| | 19/10000 [04:24<35:53:45, 12.95s/it] {'loss': 0.1151, 'learning_rate': 4.9915e-05, 'epoch': 0.01}
0%| | 20/10000 [04:37<35:52:50, 12.94s/it] {'loss': 0.1219, 'learning_rate': 4.991e-05, 'epoch': 0.01}
0%| | 21/10000 [04:50<35:50:33, 12.93s/it] {'loss': 0.1131, 'learning_rate': 4.9905000000000004e-05, 'epoch': 0.01}
0%| | 22/10000 [05:03<35:47:50, 12.92s/it] {'loss': 0.0921, 'learning_rate': 4.99e-05, 'epoch': 0.01}
0%| | 23/10000 [05:16<35:45:24, 12.90s/it] {'loss': 0.0732, 'learning_rate': 4.9895e-05, 'epoch': 0.01}
0%| | 24/10000 [05:28<35:44:51, 12.90s/it] {'loss': 0.0792, 'learning_rate': 4.9890000000000005e-05, 'epoch': 0.01}
0%| | 25/10000 [05:42<35:56:37, 12.97s/it] {'loss': 0.0745, 'learning_rate': 4.9885e-05, 'epoch': 0.01}
0%| | 26/10000 [05:55<36:00:10, 12.99s/it] {'loss': 0.0599, 'learning_rate': 4.9880000000000004e-05, 'epoch': 0.01}
0%| | 27/10000 [06:08<36:01:00, 13.00s/it] {'loss': 0.0985, 'learning_rate': 4.9875000000000006e-05, 'epoch': 0.01}
0%| | 28/10000 [06:21<35:55:30, 12.97s/it] {'loss': 0.083, 'learning_rate': 4.987e-05, 'epoch': 0.01}
0%| | 29/10000 [06:33<35:53:43, 12.96s/it] {'loss': 0.0492, 'learning_rate': 4.9865e-05, 'epoch': 0.01}
0%| | 30/10000 [06:46<35:50:57, 12.94s/it] {'loss': 0.1126, 'learning_rate': 4.986e-05, 'epoch': 0.01}
0%| | 31/10000 [06:59<35:47:08, 12.92s/it] {'loss': 0.0463, 'learning_rate': 4.9855e-05, 'epoch': 0.01}
0%| | 32/10000 [07:12<35:45:41, 12.92s/it] {'loss': 0.0639, 'learning_rate': 4.9850000000000006e-05, 'epoch': 0.01}
0%| | 33/10000 [07:25<35:46:40, 12.92s/it] {'loss': 0.1211, 'learning_rate': 4.9845e-05, 'epoch': 0.01}
0%| | 34/10000 [07:38<35:42:10, 12.90s/it] {'loss': 0.0479, 'learning_rate': 4.9840000000000004e-05, 'epoch': 0.01}
0%| | 35/10000 [07:51<35:41:04, 12.89s/it] {'loss': 0.0531, 'learning_rate': 4.9835000000000007e-05, 'epoch': 0.01}
0%| | 36/10000 [08:04<35:40:43, 12.89s/it] {'loss': 0.0427, 'learning_rate': 4.983e-05, 'epoch': 0.01}
0%| | 37/10000 [08:17<35:44:32, 12.92s/it] {'loss': 0.0545, 'learning_rate': 4.9825000000000005e-05, 'epoch': 0.01}
0%| | 38/10000 [08:30<35:49:32, 12.95s/it] {'loss': 0.0418, 'learning_rate': 4.982e-05, 'epoch': 0.01}
0%| | 39/10000 [08:43<35:54:29, 12.98s/it] {'loss': 0.0373, 'learning_rate': 4.9815e-05, 'epoch': 0.01}
0%| | 40/10000 [08:56<35:54:25, 12.98s/it] {'loss': 0.0292, 'learning_rate': 4.981e-05, 'epoch': 0.02}
0%| | 41/10000 [09:09<35:50:50, 12.96s/it] {'loss': 0.0428, 'learning_rate': 4.9805e-05, 'epoch': 0.02}
0%| | 42/10000 [09:22<35:51:02, 12.96s/it] {'loss': 0.0408, 'learning_rate': 4.9800000000000004e-05, 'epoch': 0.02}
0%| | 43/10000 [09:35<35:50:47, 12.96s/it] {'loss': 0.0393, 'learning_rate': 4.9795e-05, 'epoch': 0.02}
0%| | 44/10000 [09:48<35:54:05, 12.98s/it] {'loss': 0.035, 'learning_rate': 4.979e-05, 'epoch': 0.02}
0%| | 45/10000 [10:01<35:57:03, 13.00s/it] {'loss': 0.0304, 'learning_rate': 4.9785000000000005e-05, 'epoch': 0.02}
0%| | 46/10000 [10:14<35:56:02, 13.00s/it] {'loss': 0.047, 'learning_rate': 4.978e-05, 'epoch': 0.02}
0%| | 47/10000 [10:27<36:00:07, 13.02s/it] {'loss': 0.0298, 'learning_rate': 4.9775000000000004e-05, 'epoch': 0.02}
0%| | 48/10000 [10:40<35:58:53, 13.02s/it] {'loss': 0.0249, 'learning_rate': 4.977e-05, 'epoch': 0.02}
0%| | 49/10000 [10:53<35:59:16, 13.02s/it] {'loss': 0.0368, 'learning_rate': 4.9765e-05, 'epoch': 0.02}
0%| | 50/10000 [11:06<35:57:48, 13.01s/it] {'loss': 0.0496, 'learning_rate': 4.976e-05, 'epoch': 0.02}
1%| | 51/10000 [11:19<35:57:36, 13.01s/it] {'loss': 0.0313, 'learning_rate': 4.9755e-05, 'epoch': 0.02}
1%| | 52/10000 [11:32<35:57:46, 13.01s/it] {'loss': 0.0284, 'learning_rate': 4.975e-05, 'epoch': 0.02}
1%| | 53/10000 [11:45<35:57:15, 13.01s/it] {'loss': 0.023, 'learning_rate': 4.9745000000000006e-05, 'epoch': 0.02}
1%| | 54/10000 [11:58<36:03:22, 13.05s/it] {'loss': 0.0288, 'learning_rate': 4.974e-05, 'epoch': 0.02}
1%| | 55/10000 [12:11<35:57:56, 13.02s/it] {'loss': 0.0357, 'learning_rate': 4.9735000000000004e-05, 'epoch': 0.02}
1%| | 56/10000 [12:24<36:00:15, 13.03s/it] {'loss': 0.0271, 'learning_rate': 4.973000000000001e-05, 'epoch': 0.02}
1%| | 57/10000 [12:37<35:54:44, 13.00s/it] {'loss': 0.0281, 'learning_rate': 4.9725e-05, 'epoch': 0.02}
1%| | 58/10000 [12:50<35:56:59, 13.02s/it] {'loss': 0.0209, 'learning_rate': 4.972e-05, 'epoch': 0.02}
1%| | 59/10000 [13:03<35:57:29, 13.02s/it] {'loss': 0.0269, 'learning_rate': 4.9715e-05, 'epoch': 0.02}
1%| | 60/10000 [13:16<35:56:07, 13.01s/it] {'loss': 0.0596, 'learning_rate': 4.9710000000000003e-05, 'epoch': 0.02}
1%| | 61/10000 [13:29<35:50:27, 12.98s/it] {'loss': 0.0285, 'learning_rate': 4.9705e-05, 'epoch': 0.02}
1%| | 62/10000 [13:42<35:49:45, 12.98s/it] {'loss': 0.0218, 'learning_rate': 4.97e-05, 'epoch': 0.02}
1%| | 63/10000 [13:55<35:56:22, 13.02s/it] {'loss': 0.0229, 'learning_rate': 4.9695000000000004e-05, 'epoch': 0.02}
1%| | 64/10000 [14:08<35:59:31, 13.04s/it] {'loss': 0.0313, 'learning_rate': 4.969e-05, 'epoch': 0.02}
1%| | 65/10000 [14:21<35:57:39, 13.03s/it] {'loss': 0.018, 'learning_rate': 4.9685e-05, 'epoch': 0.02}
1%| | 66/10000 [14:34<35:52:32, 13.00s/it] {'loss': 0.0245, 'learning_rate': 4.9680000000000005e-05, 'epoch': 0.02}
1%| | 67/10000 [14:47<35:51:04, 12.99s/it] {'loss': 0.0298, 'learning_rate': 4.967500000000001e-05, 'epoch': 0.03}
1%| | 68/10000 [15:00<35:48:41, 12.98s/it] {'loss': 0.0431, 'learning_rate': 4.967e-05, 'epoch': 0.03}
1%| | 69/10000 [15:13<35:50:25, 12.99s/it] {'loss': 0.0188, 'learning_rate': 4.9665e-05, 'epoch': 0.03}
1%| | 70/10000 [15:26<35:48:55, 12.98s/it] {'loss': 0.0201, 'learning_rate': 4.966e-05, 'epoch': 0.03}
1%| | 71/10000 [15:39<35:48:05, 12.98s/it] {'loss': 0.0186, 'learning_rate': 4.9655000000000005e-05, 'epoch': 0.03}
1%| | 72/10000 [15:52<35:48:43, 12.99s/it] {'loss': 0.0207, 'learning_rate': 4.965e-05, 'epoch': 0.03}
1%| | 73/10000 [16:05<35:53:53, 13.02s/it] {'loss': 0.0168, 'learning_rate': 4.9645e-05, 'epoch': 0.03}
1%| | 74/10000 [16:18<35:59:17, 13.05s/it] {'loss': 0.0163, 'learning_rate': 4.9640000000000006e-05, 'epoch': 0.03}
1%| | 75/10000 [16:31<35:57:12, 13.04s/it] {'loss': 0.0112, 'learning_rate': 4.9635e-05, 'epoch': 0.03}
1%| | 76/10000 [16:44<35:53:06, 13.02s/it] {'loss': 0.0178, 'learning_rate': 4.9630000000000004e-05, 'epoch': 0.03}
1%| | 77/10000 [16:57<35:51:57, 13.01s/it] {'loss': 0.0163, 'learning_rate': 4.962500000000001e-05, 'epoch': 0.03}
1%| | 78/10000 [17:10<35:55:06, 13.03s/it] {'loss': 0.016, 'learning_rate': 4.962e-05, 'epoch': 0.03}
1%| | 79/10000 [17:23<35:53:18, 13.02s/it] {'loss': 0.0198, 'learning_rate': 4.9615e-05, 'epoch': 0.03}
1%| | 80/10000 [17:36<35:51:57, 13.02s/it] {'loss': 0.0154, 'learning_rate': 4.961e-05, 'epoch': 0.03}
1%| | 81/10000 [17:49<35:49:39, 13.00s/it] {'loss': 0.0175, 'learning_rate': 4.9605000000000004e-05, 'epoch': 0.03}
1%| | 82/10000 [18:02<35:51:38, 13.02s/it] {'loss': 0.0138, 'learning_rate': 4.96e-05, 'epoch': 0.03}
1%| | 83/10000 [18:15<35:51:28, 13.02s/it] {'loss': 0.0141, 'learning_rate': 4.9595e-05, 'epoch': 0.03}
1%| | 84/10000 [18:28<35:47:06, 12.99s/it] {'loss': 0.0431, 'learning_rate': 4.9590000000000005e-05, 'epoch': 0.03}
1%| | 85/10000 [18:41<35:47:16, 12.99s/it] {'loss': 0.0137, 'learning_rate': 4.9585e-05, 'epoch': 0.03}
1%| | 86/10000 [18:54<35:49:02, 13.01s/it] {'loss': 0.0191, 'learning_rate': 4.958e-05, 'epoch': 0.03}
1%| | 87/10000 [19:07<35:45:27, 12.99s/it] {'loss': 0.0183, 'learning_rate': 4.9575000000000006e-05, 'epoch': 0.03}
1%| | 88/10000 [19:20<35:51:53, 13.03s/it] {'loss': 0.0182, 'learning_rate': 4.957e-05, 'epoch': 0.03}
1%| | 89/10000 [19:33<35:46:13, 12.99s/it] {'loss': 0.017, 'learning_rate': 4.9565e-05, 'epoch': 0.03}
1%| | 90/10000 [19:46<35:43:58, 12.98s/it] {'loss': 0.0154, 'learning_rate': 4.956e-05, 'epoch': 0.03}
1%| | 91/10000 [19:59<35:38:14, 12.95s/it] {'loss': 0.0116, 'learning_rate': 4.9555e-05, 'epoch': 0.03}
1%| | 92/10000 [20:12<35:38:25, 12.95s/it] {'loss': 0.0098, 'learning_rate': 4.9550000000000005e-05, 'epoch': 0.03}
1%| | 93/10000 [20:25<35:39:21, 12.96s/it] {'loss': 0.0341, 'learning_rate': 4.9545e-05, 'epoch': 0.04}
1%| | 94/10000 [20:38<35:45:11, 12.99s/it] {'loss': 0.0137, 'learning_rate': 4.9540000000000003e-05, 'epoch': 0.04}
1%| | 95/10000 [20:51<35:45:41, 13.00s/it] {'loss': 0.011, 'learning_rate': 4.9535000000000006e-05, 'epoch': 0.04}
1%| | 96/10000 [21:04<35:47:14, 13.01s/it] {'loss': 0.0152, 'learning_rate': 4.953e-05, 'epoch': 0.04}
1%| | 97/10000 [21:17<35:45:11, 13.00s/it] {'loss': 0.0166, 'learning_rate': 4.9525000000000004e-05, 'epoch': 0.04}
1%| | 98/10000 [21:30<35:45:04, 13.00s/it] {'loss': 0.016, 'learning_rate': 4.952e-05, 'epoch': 0.04}
1%| | 99/10000 [21:43<35:42:00, 12.98s/it] {'loss': 0.0126, 'learning_rate': 4.9515e-05, 'epoch': 0.04}
1%| | 100/10000 [21:56<35:41:45, 12.98s/it] {'loss': 0.0129, 'learning_rate': 4.951e-05, 'epoch': 0.04}
1%| | 101/10000 [22:09<35:43:19, 12.99s/it] {'loss': 0.0155, 'learning_rate': 4.9505e-05, 'epoch': 0.04}
1%| | 102/10000 [22:22<35:43:32, 12.99s/it] {'loss': 0.0115, 'learning_rate': 4.9500000000000004e-05, 'epoch': 0.04}
1%| | 103/10000 [22:35<35:43:13, 12.99s/it] {'loss': 0.0133, 'learning_rate': 4.9495e-05, 'epoch': 0.04}
1%| | 104/10000 [22:48<35:41:57, 12.99s/it] {'loss': 0.0129, 'learning_rate': 4.949e-05, 'epoch': 0.04}
1%| | 105/10000 [23:01<35:46:45, 13.02s/it] {'loss': 0.0122, 'learning_rate': 4.9485000000000005e-05, 'epoch': 0.04}
1%| | 106/10000 [23:14<35:42:20, 12.99s/it] {'loss': 0.014, 'learning_rate': 4.948000000000001e-05, 'epoch': 0.04}
1%| | 107/10000 [23:27<35:39:45, 12.98s/it] {'loss': 0.0141, 'learning_rate': 4.9475e-05, 'epoch': 0.04} 1%| | 
107/10000 [23:27<35:39:45, 12.98s/it] 1%| | 108/10000 [23:40<35:37:12, 12.96s/it] {'loss': 0.01, 'learning_rate': 4.947e-05, 'epoch': 0.04} 1%| | 108/10000 [23:40<35:37:12, 12.96s/it] 1%| | 109/10000 [23:53<35:36:42, 12.96s/it] {'loss': 0.0124, 'learning_rate': 4.9465e-05, 'epoch': 0.04} 1%| | 109/10000 [23:53<35:36:42, 12.96s/it] 1%| | 110/10000 [24:06<35:35:59, 12.96s/it] {'loss': 0.0104, 'learning_rate': 4.946e-05, 'epoch': 0.04} 1%| | 110/10000 [24:06<35:35:59, 12.96s/it] 1%| | 111/10000 [24:19<35:37:15, 12.97s/it] {'loss': 0.0112, 'learning_rate': 4.9455e-05, 'epoch': 0.04} 1%| | 111/10000 [24:19<35:37:15, 12.97s/it] 1%| | 112/10000 [24:32<35:35:24, 12.96s/it] {'loss': 0.0153, 'learning_rate': 4.945e-05, 'epoch': 0.04} 1%| | 112/10000 [24:32<35:35:24, 12.96s/it] 1%| | 113/10000 [24:45<35:35:50, 12.96s/it] {'loss': 0.012, 'learning_rate': 4.9445000000000005e-05, 'epoch': 0.04} 1%| | 113/10000 [24:45<35:35:50, 12.96s/it] 1%| | 114/10000 [24:57<35:36:41, 12.97s/it] {'loss': 0.0145, 'learning_rate': 4.944e-05, 'epoch': 0.04} 1%| | 114/10000 [24:58<35:36:41, 12.97s/it] 1%| | 115/10000 [25:10<35:34:05, 12.95s/it] {'loss': 0.0153, 'learning_rate': 4.9435000000000004e-05, 'epoch': 0.04} 1%| | 115/10000 [25:10<35:34:05, 12.95s/it] 1%| | 116/10000 [25:23<35:35:00, 12.96s/it] {'loss': 0.0072, 'learning_rate': 4.9430000000000006e-05, 'epoch': 0.04} 1%| | 116/10000 [25:23<35:35:00, 12.96s/it] 1%| | 117/10000 [25:36<35:34:57, 12.96s/it] {'loss': 0.0111, 'learning_rate': 4.9425e-05, 'epoch': 0.04} 1%| | 117/10000 [25:36<35:34:57, 12.96s/it] 1%| | 118/10000 [25:49<35:35:15, 12.96s/it] {'loss': 0.0111, 'learning_rate': 4.942e-05, 'epoch': 0.04} 1%| | 118/10000 [25:49<35:35:15, 12.96s/it] 1%| | 119/10000 [26:02<35:35:13, 12.97s/it] {'loss': 0.0128, 'learning_rate': 4.9415e-05, 'epoch': 0.04} 1%| | 119/10000 [26:02<35:35:13, 12.97s/it] 1%| | 120/10000 [26:15<35:36:43, 12.98s/it] {'loss': 0.0115, 'learning_rate': 4.941e-05, 'epoch': 0.05} 1%| | 120/10000 [26:15<35:36:43, 
12.98s/it] 1%| | 121/10000 [26:28<35:37:16, 12.98s/it] {'loss': 0.0123, 'learning_rate': 4.9405e-05, 'epoch': 0.05} 1%| | 121/10000 [26:28<35:37:16, 12.98s/it] 1%| | 122/10000 [26:41<35:35:14, 12.97s/it] {'loss': 0.0787, 'learning_rate': 4.94e-05, 'epoch': 0.05} 1%| | 122/10000 [26:41<35:35:14, 12.97s/it] 1%| | 123/10000 [26:54<35:35:31, 12.97s/it] {'loss': 0.0153, 'learning_rate': 4.9395000000000004e-05, 'epoch': 0.05} 1%| | 123/10000 [26:54<35:35:31, 12.97s/it] 1%| | 124/10000 [27:07<35:32:29, 12.96s/it] {'loss': 0.0081, 'learning_rate': 4.939e-05, 'epoch': 0.05} 1%| | 124/10000 [27:07<35:32:29, 12.96s/it] 1%|▏ | 125/10000 [27:20<35:33:43, 12.96s/it] {'loss': 0.0091, 'learning_rate': 4.9385e-05, 'epoch': 0.05} 1%|▏ | 125/10000 [27:20<35:33:43, 12.96s/it] 1%|▏ | 126/10000 [27:33<35:30:53, 12.95s/it] {'loss': 0.0126, 'learning_rate': 4.9380000000000005e-05, 'epoch': 0.05} 1%|▏ | 126/10000 [27:33<35:30:53, 12.95s/it] 1%|▏ | 127/10000 [27:46<35:30:05, 12.94s/it] {'loss': 0.0125, 'learning_rate': 4.937500000000001e-05, 'epoch': 0.05} 1%|▏ | 127/10000 [27:46<35:30:05, 12.94s/it] 1%|▏ | 128/10000 [27:59<35:27:02, 12.93s/it] {'loss': 0.0117, 'learning_rate': 4.937e-05, 'epoch': 0.05} 1%|▏ | 128/10000 [27:59<35:27:02, 12.93s/it] 1%|▏ | 129/10000 [28:12<35:24:21, 12.91s/it] {'loss': 0.0164, 'learning_rate': 4.9365e-05, 'epoch': 0.05} 1%|▏ | 129/10000 [28:12<35:24:21, 12.91s/it] 1%|▏ | 130/10000 [28:25<35:23:27, 12.91s/it] {'loss': 0.0113, 'learning_rate': 4.936e-05, 'epoch': 0.05} 1%|▏ | 130/10000 [28:25<35:23:27, 12.91s/it] 1%|▏ | 131/10000 [28:38<35:21:33, 12.90s/it] {'loss': 0.0101, 'learning_rate': 4.9355000000000004e-05, 'epoch': 0.05} 1%|▏ | 131/10000 [28:38<35:21:33, 12.90s/it] 1%|▏ | 132/10000 [28:50<35:24:13, 12.92s/it] {'loss': 0.0119, 'learning_rate': 4.935e-05, 'epoch': 0.05} 1%|▏ | 132/10000 [28:50<35:24:13, 12.92s/it] 1%|▏ | 133/10000 [29:03<35:22:14, 12.91s/it] {'loss': 0.011, 'learning_rate': 4.9345e-05, 'epoch': 0.05} 1%|▏ | 133/10000 [29:03<35:22:14, 
12.91s/it] 1%|▏ | 134/10000 [29:16<35:22:58, 12.91s/it] {'loss': 0.0098, 'learning_rate': 4.9340000000000005e-05, 'epoch': 0.05} 1%|▏ | 134/10000 [29:16<35:22:58, 12.91s/it] 1%|▏ | 135/10000 [29:29<35:24:35, 12.92s/it] {'loss': 0.0102, 'learning_rate': 4.9335e-05, 'epoch': 0.05} 1%|▏ | 135/10000 [29:29<35:24:35, 12.92s/it] 1%|▏ | 136/10000 [29:42<35:25:03, 12.93s/it] {'loss': 0.0123, 'learning_rate': 4.9330000000000004e-05, 'epoch': 0.05} 1%|▏ | 136/10000 [29:42<35:25:03, 12.93s/it] 1%|▏ | 137/10000 [29:55<35:24:34, 12.92s/it] {'loss': 0.0072, 'learning_rate': 4.9325000000000006e-05, 'epoch': 0.05} 1%|▏ | 137/10000 [29:55<35:24:34, 12.92s/it] 1%|▏ | 138/10000 [30:08<35:28:47, 12.95s/it] {'loss': 0.0072, 'learning_rate': 4.932e-05, 'epoch': 0.05} 1%|▏ | 138/10000 [30:08<35:28:47, 12.95s/it] 1%|▏ | 139/10000 [30:21<35:30:44, 12.96s/it] {'loss': 0.0128, 'learning_rate': 4.9315e-05, 'epoch': 0.05} 1%|▏ | 139/10000 [30:21<35:30:44, 12.96s/it] 1%|▏ | 140/10000 [30:34<35:35:12, 12.99s/it] {'loss': 0.0083, 'learning_rate': 4.931e-05, 'epoch': 0.05} 1%|▏ | 140/10000 [30:34<35:35:12, 12.99s/it] 1%|▏ | 141/10000 [30:47<35:36:12, 13.00s/it] {'loss': 0.0092, 'learning_rate': 4.9305e-05, 'epoch': 0.05} 1%|▏ | 141/10000 [30:47<35:36:12, 13.00s/it] 1%|▏ | 142/10000 [31:00<35:37:01, 13.01s/it] {'loss': 0.011, 'learning_rate': 4.93e-05, 'epoch': 0.05} 1%|▏ | 142/10000 [31:00<35:37:01, 13.01s/it] 1%|▏ | 143/10000 [31:13<35:37:10, 13.01s/it] {'loss': 0.0076, 'learning_rate': 4.9295e-05, 'epoch': 0.05} 1%|▏ | 143/10000 [31:13<35:37:10, 13.01s/it] 1%|▏ | 144/10000 [31:26<35:36:06, 13.00s/it] {'loss': 0.0081, 'learning_rate': 4.9290000000000004e-05, 'epoch': 0.05} 1%|▏ | 144/10000 [31:26<35:36:06, 13.00s/it] 1%|▏ | 145/10000 [31:39<35:36:19, 13.01s/it] {'loss': 0.0083, 'learning_rate': 4.928500000000001e-05, 'epoch': 0.05} 1%|▏ | 145/10000 [31:39<35:36:19, 13.01s/it] 1%|▏ | 146/10000 [31:52<35:34:46, 13.00s/it] {'loss': 0.012, 'learning_rate': 4.928e-05, 'epoch': 0.06} 1%|▏ | 146/10000 
[31:52<35:34:46, 13.00s/it] 1%|▏ | 147/10000 [32:05<35:31:37, 12.98s/it] {'loss': 0.0159, 'learning_rate': 4.9275000000000005e-05, 'epoch': 0.06} 1%|▏ | 147/10000 [32:05<35:31:37, 12.98s/it] 1%|▏ | 148/10000 [32:18<35:30:09, 12.97s/it] {'loss': 0.0095, 'learning_rate': 4.927000000000001e-05, 'epoch': 0.06} 1%|▏ | 148/10000 [32:18<35:30:09, 12.97s/it] 1%|▏ | 149/10000 [32:31<35:27:38, 12.96s/it] {'loss': 0.0083, 'learning_rate': 4.9265e-05, 'epoch': 0.06} 1%|▏ | 149/10000 [32:31<35:27:38, 12.96s/it] 2%|▏ | 150/10000 [32:44<35:29:27, 12.97s/it] {'loss': 0.009, 'learning_rate': 4.926e-05, 'epoch': 0.06} 2%|▏ | 150/10000 [32:44<35:29:27, 12.97s/it] 2%|▏ | 151/10000 [32:57<35:28:42, 12.97s/it] {'loss': 0.0124, 'learning_rate': 4.9255e-05, 'epoch': 0.06} 2%|▏ | 151/10000 [32:57<35:28:42, 12.97s/it] 2%|▏ | 152/10000 [33:10<35:30:51, 12.98s/it] {'loss': 0.0076, 'learning_rate': 4.9250000000000004e-05, 'epoch': 0.06} 2%|▏ | 152/10000 [33:10<35:30:51, 12.98s/it] 2%|▏ | 153/10000 [33:23<35:29:07, 12.97s/it] {'loss': 0.0098, 'learning_rate': 4.9245e-05, 'epoch': 0.06} 2%|▏ | 153/10000 [33:23<35:29:07, 12.97s/it] 2%|▏ | 154/10000 [33:36<35:24:40, 12.95s/it] {'loss': 0.0081, 'learning_rate': 4.924e-05, 'epoch': 0.06} 2%|▏ | 154/10000 [33:36<35:24:40, 12.95s/it] 2%|▏ | 155/10000 [33:49<35:27:03, 12.96s/it] {'loss': 0.0085, 'learning_rate': 4.9235000000000005e-05, 'epoch': 0.06} 2%|▏ | 155/10000 [33:49<35:27:03, 12.96s/it] 2%|▏ | 156/10000 [34:02<35:30:36, 12.99s/it] {'loss': 0.0079, 'learning_rate': 4.923e-05, 'epoch': 0.06} 2%|▏ | 156/10000 [34:02<35:30:36, 12.99s/it] 2%|▏ | 157/10000 [34:15<35:33:16, 13.00s/it] {'loss': 0.007, 'learning_rate': 4.9225000000000004e-05, 'epoch': 0.06} 2%|▏ | 157/10000 [34:15<35:33:16, 13.00s/it] 2%|▏ | 158/10000 [34:28<35:32:27, 13.00s/it] {'loss': 0.0064, 'learning_rate': 4.9220000000000006e-05, 'epoch': 0.06} 2%|▏ | 158/10000 [34:28<35:32:27, 13.00s/it] 2%|▏ | 159/10000 [34:41<35:34:20, 13.01s/it] {'loss': 0.0077, 'learning_rate': 4.9215e-05, 
'epoch': 0.06} 2%|▏ | 159/10000 [34:41<35:34:20, 13.01s/it] 2%|▏ | 160/10000 [34:54<35:28:26, 12.98s/it] {'loss': 0.0086, 'learning_rate': 4.921e-05, 'epoch': 0.06} 2%|▏ | 160/10000 [34:54<35:28:26, 12.98s/it] 2%|▏ | 161/10000 [35:07<35:28:16, 12.98s/it] {'loss': 0.0136, 'learning_rate': 4.9205e-05, 'epoch': 0.06} 2%|▏ | 161/10000 [35:07<35:28:16, 12.98s/it] 2%|▏ | 162/10000 [35:20<35:28:18, 12.98s/it] {'loss': 0.0073, 'learning_rate': 4.92e-05, 'epoch': 0.06} 2%|▏ | 162/10000 [35:20<35:28:18, 12.98s/it] 2%|▏ | 163/10000 [35:33<35:26:15, 12.97s/it] {'loss': 0.0093, 'learning_rate': 4.9195e-05, 'epoch': 0.06} 2%|▏ | 163/10000 [35:33<35:26:15, 12.97s/it] 2%|▏ | 164/10000 [35:46<35:26:46, 12.97s/it] {'loss': 0.0086, 'learning_rate': 4.919e-05, 'epoch': 0.06} 2%|▏ | 164/10000 [35:46<35:26:46, 12.97s/it] 2%|▏ | 165/10000 [35:59<35:24:47, 12.96s/it] {'loss': 0.0097, 'learning_rate': 4.9185000000000004e-05, 'epoch': 0.06} 2%|▏ | 165/10000 [35:59<35:24:47, 12.96s/it] 2%|▏ | 166/10000 [36:12<35:22:30, 12.95s/it] {'loss': 0.008, 'learning_rate': 4.918000000000001e-05, 'epoch': 0.06} 2%|▏ | 166/10000 [36:12<35:22:30, 12.95s/it] 2%|▏ | 167/10000 [36:24<35:18:38, 12.93s/it] {'loss': 0.0233, 'learning_rate': 4.9175e-05, 'epoch': 0.06} 2%|▏ | 167/10000 [36:24<35:18:38, 12.93s/it] 2%|▏ | 168/10000 [36:37<35:23:23, 12.96s/it] {'loss': 0.0091, 'learning_rate': 4.9170000000000005e-05, 'epoch': 0.06} 2%|▏ | 168/10000 [36:38<35:23:23, 12.96s/it] 2%|▏ | 169/10000 [36:50<35:25:23, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.9165e-05, 'epoch': 0.06} 2%|▏ | 169/10000 [36:51<35:25:23, 12.97s/it] 2%|▏ | 170/10000 [37:03<35:24:14, 12.97s/it] {'loss': 0.0269, 'learning_rate': 4.9160000000000004e-05, 'epoch': 0.06} 2%|▏ | 170/10000 [37:03<35:24:14, 12.97s/it] 2%|▏ | 171/10000 [37:16<35:27:02, 12.98s/it] {'loss': 0.0115, 'learning_rate': 4.9155e-05, 'epoch': 0.06} 2%|▏ | 171/10000 [37:17<35:27:02, 12.98s/it] 2%|▏ | 172/10000 [37:29<35:25:49, 12.98s/it] {'loss': 0.008, 'learning_rate': 
4.915e-05, 'epoch': 0.06} 2%|▏ | 172/10000 [37:29<35:25:49, 12.98s/it] 2%|▏ | 173/10000 [37:42<35:23:20, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.9145000000000005e-05, 'epoch': 0.07} 2%|▏ | 173/10000 [37:42<35:23:20, 12.96s/it] 2%|▏ | 174/10000 [37:55<35:24:42, 12.97s/it] {'loss': 0.0076, 'learning_rate': 4.914e-05, 'epoch': 0.07} 2%|▏ | 174/10000 [37:55<35:24:42, 12.97s/it] 2%|▏ | 175/10000 [38:08<35:26:44, 12.99s/it] {'loss': 0.0092, 'learning_rate': 4.9135e-05, 'epoch': 0.07} 2%|▏ | 175/10000 [38:08<35:26:44, 12.99s/it] 2%|▏ | 176/10000 [38:21<35:24:10, 12.97s/it] {'loss': 0.0088, 'learning_rate': 4.9130000000000006e-05, 'epoch': 0.07} 2%|▏ | 176/10000 [38:21<35:24:10, 12.97s/it] 2%|▏ | 177/10000 [38:34<35:19:33, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.9125e-05, 'epoch': 0.07} 2%|▏ | 177/10000 [38:34<35:19:33, 12.95s/it] 2%|▏ | 178/10000 [38:47<35:18:26, 12.94s/it] {'loss': 0.0096, 'learning_rate': 4.9120000000000004e-05, 'epoch': 0.07} 2%|▏ | 178/10000 [38:47<35:18:26, 12.94s/it] 2%|▏ | 179/10000 [39:00<35:18:11, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.9115e-05, 'epoch': 0.07} 2%|▏ | 179/10000 [39:00<35:18:11, 12.94s/it] 2%|▏ | 180/10000 [39:13<35:16:09, 12.93s/it] {'loss': 0.0076, 'learning_rate': 4.911e-05, 'epoch': 0.07} 2%|▏ | 180/10000 [39:13<35:16:09, 12.93s/it] 2%|▏ | 181/10000 [39:26<35:14:40, 12.92s/it] {'loss': 0.0068, 'learning_rate': 4.9105e-05, 'epoch': 0.07} 2%|▏ | 181/10000 [39:26<35:14:40, 12.92s/it] 2%|▏ | 182/10000 [39:39<35:10:02, 12.89s/it] {'loss': 0.0128, 'learning_rate': 4.91e-05, 'epoch': 0.07} 2%|▏ | 182/10000 [39:39<35:10:02, 12.89s/it] 2%|▏ | 183/10000 [39:52<35:13:31, 12.92s/it] {'loss': 0.0091, 'learning_rate': 4.9095000000000003e-05, 'epoch': 0.07} 2%|▏ | 183/10000 [39:52<35:13:31, 12.92s/it] 2%|▏ | 184/10000 [40:05<35:12:56, 12.92s/it] {'loss': 0.0085, 'learning_rate': 4.9090000000000006e-05, 'epoch': 0.07} 2%|▏ | 184/10000 [40:05<35:12:56, 12.92s/it] 2%|▏ | 185/10000 [40:17<35:12:21, 12.91s/it] {'loss': 
0.0088, 'learning_rate': 4.9085e-05, 'epoch': 0.07} 2%|▏ | 185/10000 [40:18<35:12:21, 12.91s/it] 2%|▏ | 186/10000 [40:30<35:10:06, 12.90s/it] {'loss': 0.0081, 'learning_rate': 4.9080000000000004e-05, 'epoch': 0.07} 2%|▏ | 186/10000 [40:30<35:10:06, 12.90s/it] 2%|▏ | 187/10000 [40:43<35:14:12, 12.93s/it] {'loss': 0.0087, 'learning_rate': 4.907500000000001e-05, 'epoch': 0.07} 2%|▏ | 187/10000 [40:43<35:14:12, 12.93s/it] 2%|▏ | 188/10000 [40:56<35:09:34, 12.90s/it] {'loss': 0.0096, 'learning_rate': 4.907e-05, 'epoch': 0.07} 2%|▏ | 188/10000 [40:56<35:09:34, 12.90s/it] 2%|▏ | 189/10000 [41:09<35:11:35, 12.91s/it] {'loss': 0.0089, 'learning_rate': 4.9065e-05, 'epoch': 0.07} 2%|▏ | 189/10000 [41:09<35:11:35, 12.91s/it] 2%|▏ | 190/10000 [41:22<35:10:34, 12.91s/it] {'loss': 0.0115, 'learning_rate': 4.906e-05, 'epoch': 0.07} 2%|▏ | 190/10000 [41:22<35:10:34, 12.91s/it] 2%|▏ | 191/10000 [41:35<35:12:53, 12.92s/it] {'loss': 0.0082, 'learning_rate': 4.9055000000000004e-05, 'epoch': 0.07} 2%|▏ | 191/10000 [41:35<35:12:53, 12.92s/it] 2%|▏ | 192/10000 [41:48<35:12:09, 12.92s/it] {'loss': 0.0114, 'learning_rate': 4.905e-05, 'epoch': 0.07} 2%|▏ | 192/10000 [41:48<35:12:09, 12.92s/it] 2%|▏ | 193/10000 [42:01<35:11:53, 12.92s/it] {'loss': 0.0089, 'learning_rate': 4.9045e-05, 'epoch': 0.07} 2%|▏ | 193/10000 [42:01<35:11:53, 12.92s/it] 2%|▏ | 194/10000 [42:14<35:14:21, 12.94s/it] {'loss': 0.007, 'learning_rate': 4.9040000000000005e-05, 'epoch': 0.07} 2%|▏ | 194/10000 [42:14<35:14:21, 12.94s/it] 2%|▏ | 195/10000 [42:27<35:10:06, 12.91s/it] {'loss': 0.0078, 'learning_rate': 4.9035e-05, 'epoch': 0.07} 2%|▏ | 195/10000 [42:27<35:10:06, 12.91s/it] 2%|▏ | 196/10000 [42:39<35:06:04, 12.89s/it] {'loss': 0.0121, 'learning_rate': 4.903e-05, 'epoch': 0.07} 2%|▏ | 196/10000 [42:40<35:06:04, 12.89s/it] 2%|▏ | 197/10000 [42:52<35:04:47, 12.88s/it] {'loss': 0.0085, 'learning_rate': 4.9025000000000006e-05, 'epoch': 0.07} 2%|▏ | 197/10000 [42:52<35:04:47, 12.88s/it] 2%|▏ | 198/10000 [43:05<35:07:04, 
12.90s/it] {'loss': 0.0075, 'learning_rate': 4.902e-05, 'epoch': 0.07} 2%|▏ | 198/10000 [43:05<35:07:04, 12.90s/it] 2%|▏ | 199/10000 [43:18<35:08:45, 12.91s/it] {'loss': 0.0106, 'learning_rate': 4.9015e-05, 'epoch': 0.07} 2%|▏ | 199/10000 [43:18<35:08:45, 12.91s/it] 2%|▏ | 200/10000 [43:31<35:10:03, 12.92s/it] {'loss': 0.0075, 'learning_rate': 4.901e-05, 'epoch': 0.08} 2%|▏ | 200/10000 [43:31<35:10:03, 12.92s/it] 2%|▏ | 201/10000 [43:44<35:11:42, 12.93s/it] {'loss': 0.0077, 'learning_rate': 4.9005e-05, 'epoch': 0.08} 2%|▏ | 201/10000 [43:44<35:11:42, 12.93s/it] 2%|▏ | 202/10000 [43:57<35:11:37, 12.93s/it] {'loss': 0.0091, 'learning_rate': 4.9e-05, 'epoch': 0.08} 2%|▏ | 202/10000 [43:57<35:11:37, 12.93s/it] 2%|▏ | 203/10000 [44:10<35:08:26, 12.91s/it] {'loss': 0.0087, 'learning_rate': 4.8995e-05, 'epoch': 0.08} 2%|▏ | 203/10000 [44:10<35:08:26, 12.91s/it] 2%|▏ | 204/10000 [44:23<35:07:05, 12.91s/it] {'loss': 0.007, 'learning_rate': 4.8990000000000004e-05, 'epoch': 0.08} 2%|▏ | 204/10000 [44:23<35:07:05, 12.91s/it] 2%|▏ | 205/10000 [44:36<35:08:29, 12.92s/it] {'loss': 0.0124, 'learning_rate': 4.8985000000000006e-05, 'epoch': 0.08} 2%|▏ | 205/10000 [44:36<35:08:29, 12.92s/it] 2%|▏ | 206/10000 [44:49<35:07:39, 12.91s/it] {'loss': 0.0071, 'learning_rate': 4.898e-05, 'epoch': 0.08} 2%|▏ | 206/10000 [44:49<35:07:39, 12.91s/it] 2%|▏ | 207/10000 [45:02<35:08:35, 12.92s/it] {'loss': 0.0083, 'learning_rate': 4.8975000000000005e-05, 'epoch': 0.08} 2%|▏ | 207/10000 [45:02<35:08:35, 12.92s/it] 2%|▏ | 208/10000 [45:14<35:06:07, 12.91s/it] {'loss': 0.008, 'learning_rate': 4.897000000000001e-05, 'epoch': 0.08} 2%|▏ | 208/10000 [45:14<35:06:07, 12.91s/it] 2%|▏ | 209/10000 [45:27<35:05:14, 12.90s/it] {'loss': 0.0068, 'learning_rate': 4.8965e-05, 'epoch': 0.08} 2%|▏ | 209/10000 [45:27<35:05:14, 12.90s/it] 2%|▏ | 210/10000 [45:40<35:06:44, 12.91s/it] {'loss': 0.0081, 'learning_rate': 4.896e-05, 'epoch': 0.08} 2%|▏ | 210/10000 [45:40<35:06:44, 12.91s/it] 2%|▏ | 211/10000 
[45:53<35:04:19, 12.90s/it] {'loss': 0.0089, 'learning_rate': 4.8955e-05, 'epoch': 0.08} 2%|▏ | 211/10000 [45:53<35:04:19, 12.90s/it] 2%|▏ | 212/10000 [46:06<35:07:48, 12.92s/it] {'loss': 0.008, 'learning_rate': 4.8950000000000004e-05, 'epoch': 0.08} 2%|▏ | 212/10000 [46:06<35:07:48, 12.92s/it] 2%|▏ | 213/10000 [46:19<35:07:30, 12.92s/it] {'loss': 0.0119, 'learning_rate': 4.8945e-05, 'epoch': 0.08} 2%|▏ | 213/10000 [46:19<35:07:30, 12.92s/it] 2%|▏ | 214/10000 [46:32<35:06:51, 12.92s/it] {'loss': 0.0087, 'learning_rate': 4.894e-05, 'epoch': 0.08} 2%|▏ | 214/10000 [46:32<35:06:51, 12.92s/it] 2%|▏ | 215/10000 [46:45<35:05:50, 12.91s/it] {'loss': 0.0093, 'learning_rate': 4.8935000000000005e-05, 'epoch': 0.08} 2%|▏ | 215/10000 [46:45<35:05:50, 12.91s/it] 2%|▏ | 216/10000 [46:58<35:02:04, 12.89s/it] {'loss': 0.007, 'learning_rate': 4.893e-05, 'epoch': 0.08} 2%|▏ | 216/10000 [46:58<35:02:04, 12.89s/it] 2%|▏ | 217/10000 [47:11<35:02:18, 12.89s/it] {'loss': 0.0067, 'learning_rate': 4.8925e-05, 'epoch': 0.08} 2%|▏ | 217/10000 [47:11<35:02:18, 12.89s/it] 2%|▏ | 218/10000 [47:24<35:02:23, 12.90s/it] {'loss': 0.0166, 'learning_rate': 4.8920000000000006e-05, 'epoch': 0.08} 2%|▏ | 218/10000 [47:24<35:02:23, 12.90s/it] 2%|▏ | 219/10000 [47:36<35:03:05, 12.90s/it] {'loss': 0.0081, 'learning_rate': 4.8915e-05, 'epoch': 0.08} 2%|▏ | 219/10000 [47:36<35:03:05, 12.90s/it] 2%|▏ | 220/10000 [47:49<35:00:46, 12.89s/it] {'loss': 0.0083, 'learning_rate': 4.891e-05, 'epoch': 0.08} 2%|▏ | 220/10000 [47:49<35:00:46, 12.89s/it] 2%|▏ | 221/10000 [48:02<35:03:17, 12.90s/it] {'loss': 0.0066, 'learning_rate': 4.8905e-05, 'epoch': 0.08} 2%|▏ | 221/10000 [48:02<35:03:17, 12.90s/it] 2%|▏ | 222/10000 [48:15<35:01:53, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.89e-05, 'epoch': 0.08} 2%|▏ | 222/10000 [48:15<35:01:53, 12.90s/it] 2%|▏ | 223/10000 [48:28<34:59:58, 12.89s/it] {'loss': 0.006, 'learning_rate': 4.8895e-05, 'epoch': 0.08} 2%|▏ | 223/10000 [48:28<34:59:58, 12.89s/it] 2%|▏ | 224/10000 
[48:41<34:57:38, 12.87s/it] {'loss': 0.0069, 'learning_rate': 4.889e-05, 'epoch': 0.08} 2%|▏ | 224/10000 [48:41<34:57:38, 12.87s/it] 2%|▏ | 225/10000 [48:54<34:56:32, 12.87s/it] {'loss': 0.0093, 'learning_rate': 4.8885000000000004e-05, 'epoch': 0.08} 2%|▏ | 225/10000 [48:54<34:56:32, 12.87s/it] 2%|▏ | 226/10000 [49:06<34:53:12, 12.85s/it] {'loss': 0.0087, 'learning_rate': 4.8880000000000006e-05, 'epoch': 0.09} 2%|▏ | 226/10000 [49:07<34:53:12, 12.85s/it] 2%|▏ | 227/10000 [49:19<34:53:27, 12.85s/it] {'loss': 0.0183, 'learning_rate': 4.8875e-05, 'epoch': 0.09} 2%|▏ | 227/10000 [49:19<34:53:27, 12.85s/it] 2%|▏ | 228/10000 [49:32<34:57:07, 12.88s/it] {'loss': 0.007, 'learning_rate': 4.8870000000000005e-05, 'epoch': 0.09} 2%|▏ | 228/10000 [49:32<34:57:07, 12.88s/it] 2%|▏ | 229/10000 [49:45<34:58:23, 12.89s/it] {'loss': 0.0079, 'learning_rate': 4.8865e-05, 'epoch': 0.09} 2%|▏ | 229/10000 [49:45<34:58:23, 12.89s/it] 2%|▏ | 230/10000 [49:58<34:57:27, 12.88s/it] {'loss': 0.0061, 'learning_rate': 4.886e-05, 'epoch': 0.09} 2%|▏ | 230/10000 [49:58<34:57:27, 12.88s/it] 2%|▏ | 231/10000 [50:11<34:57:07, 12.88s/it] {'loss': 0.0147, 'learning_rate': 4.8855e-05, 'epoch': 0.09} 2%|▏ | 231/10000 [50:11<34:57:07, 12.88s/it] 2%|▏ | 232/10000 [50:24<35:00:36, 12.90s/it] {'loss': 0.007, 'learning_rate': 4.885e-05, 'epoch': 0.09} 2%|▏ | 232/10000 [50:24<35:00:36, 12.90s/it] 2%|▏ | 233/10000 [50:37<35:00:43, 12.91s/it] {'loss': 0.0075, 'learning_rate': 4.8845000000000004e-05, 'epoch': 0.09} 2%|▏ | 233/10000 [50:37<35:00:43, 12.91s/it] 2%|▏ | 234/10000 [50:50<34:59:58, 12.90s/it] {'loss': 0.0061, 'learning_rate': 4.884e-05, 'epoch': 0.09} 2%|▏ | 234/10000 [50:50<34:59:58, 12.90s/it] 2%|▏ | 235/10000 [51:03<35:00:27, 12.91s/it] {'loss': 0.0065, 'learning_rate': 4.8835e-05, 'epoch': 0.09} 2%|▏ | 235/10000 [51:03<35:00:27, 12.91s/it] 2%|▏ | 236/10000 [51:15<34:57:36, 12.89s/it] {'loss': 0.0085, 'learning_rate': 4.8830000000000005e-05, 'epoch': 0.09} 2%|▏ | 236/10000 [51:15<34:57:36, 12.89s/it] 
2%|▏ | 237/10000 [51:28<34:57:38, 12.89s/it] {'loss': 0.0061, 'learning_rate': 4.8825e-05, 'epoch': 0.09} 2%|▏ | 237/10000 [51:28<34:57:38, 12.89s/it] 2%|▏ | 238/10000 [51:41<35:02:59, 12.93s/it] {'loss': 0.0085, 'learning_rate': 4.8820000000000004e-05, 'epoch': 0.09} 2%|▏ | 238/10000 [51:41<35:02:59, 12.93s/it] 2%|▏ | 239/10000 [51:54<35:07:10, 12.95s/it] {'loss': 0.0066, 'learning_rate': 4.8815e-05, 'epoch': 0.09} 2%|▏ | 239/10000 [51:54<35:07:10, 12.95s/it] 2%|▏ | 240/10000 [52:07<35:07:25, 12.96s/it] {'loss': 0.006, 'learning_rate': 4.881e-05, 'epoch': 0.09} 2%|▏ | 240/10000 [52:07<35:07:25, 12.96s/it] 2%|▏ | 241/10000 [52:20<35:08:59, 12.97s/it] {'loss': 0.0104, 'learning_rate': 4.8805e-05, 'epoch': 0.09} 2%|▏ | 241/10000 [52:20<35:08:59, 12.97s/it] 2%|▏ | 242/10000 [52:33<35:10:41, 12.98s/it] {'loss': 0.0082, 'learning_rate': 4.88e-05, 'epoch': 0.09} 2%|▏ | 242/10000 [52:33<35:10:41, 12.98s/it] 2%|▏ | 243/10000 [52:46<35:03:30, 12.94s/it] {'loss': 0.0106, 'learning_rate': 4.8795e-05, 'epoch': 0.09} 2%|▏ | 243/10000 [52:46<35:03:30, 12.94s/it] 2%|▏ | 244/10000 [52:59<35:03:33, 12.94s/it] {'loss': 0.008, 'learning_rate': 4.8790000000000006e-05, 'epoch': 0.09} 2%|▏ | 244/10000 [52:59<35:03:33, 12.94s/it] 2%|▏ | 245/10000 [53:12<35:07:34, 12.96s/it] {'loss': 0.0098, 'learning_rate': 4.8785e-05, 'epoch': 0.09} 2%|▏ | 245/10000 [53:12<35:07:34, 12.96s/it] 2%|▏ | 246/10000 [53:25<35:05:27, 12.95s/it] {'loss': 0.0062, 'learning_rate': 4.8780000000000004e-05, 'epoch': 0.09} 2%|▏ | 246/10000 [53:25<35:05:27, 12.95s/it] 2%|▏ | 247/10000 [53:38<35:04:07, 12.94s/it] {'loss': 0.0068, 'learning_rate': 4.8775000000000007e-05, 'epoch': 0.09} 2%|▏ | 247/10000 [53:38<35:04:07, 12.94s/it] 2%|▏ | 248/10000 [53:51<34:59:18, 12.92s/it] {'loss': 0.0107, 'learning_rate': 4.877e-05, 'epoch': 0.09} 2%|▏ | 248/10000 [53:51<34:59:18, 12.92s/it] 2%|▏ | 249/10000 [54:04<35:01:47, 12.93s/it] {'loss': 0.0083, 'learning_rate': 4.8765e-05, 'epoch': 0.09} 2%|▏ | 249/10000 [54:04<35:01:47, 
12.93s/it] 2%|▎ | 250/10000 [54:17<35:04:04, 12.95s/it] {'loss': 0.0071, 'learning_rate': 4.876e-05, 'epoch': 0.09} 2%|▎ | 250/10000 [54:17<35:04:04, 12.95s/it] 3%|▎ | 251/10000 [54:30<35:06:52, 12.97s/it] {'loss': 0.0109, 'learning_rate': 4.8755e-05, 'epoch': 0.09} 3%|▎ | 251/10000 [54:30<35:06:52, 12.97s/it] 3%|▎ | 252/10000 [54:43<35:07:18, 12.97s/it] {'loss': 0.0078, 'learning_rate': 4.875e-05, 'epoch': 0.09} 3%|▎ | 252/10000 [54:43<35:07:18, 12.97s/it] 3%|▎ | 253/10000 [54:56<35:07:12, 12.97s/it] {'loss': 0.0104, 'learning_rate': 4.8745e-05, 'epoch': 0.1} 3%|▎ | 253/10000 [54:56<35:07:12, 12.97s/it] 3%|▎ | 254/10000 [55:09<35:10:04, 12.99s/it] {'loss': 0.0244, 'learning_rate': 4.8740000000000004e-05, 'epoch': 0.1} 3%|▎ | 254/10000 [55:09<35:10:04, 12.99s/it] 3%|▎ | 255/10000 [55:22<35:09:58, 12.99s/it] {'loss': 0.0075, 'learning_rate': 4.8735e-05, 'epoch': 0.1} 3%|▎ | 255/10000 [55:22<35:09:58, 12.99s/it] 3%|▎ | 256/10000 [55:35<35:11:29, 13.00s/it] {'loss': 0.0065, 'learning_rate': 4.873e-05, 'epoch': 0.1} 3%|▎ | 256/10000 [55:35<35:11:29, 13.00s/it] 3%|▎ | 257/10000 [55:48<35:10:50, 13.00s/it] {'loss': 0.008, 'learning_rate': 4.8725000000000005e-05, 'epoch': 0.1} 3%|▎ | 257/10000 [55:48<35:10:50, 13.00s/it] 3%|▎ | 258/10000 [56:01<35:10:57, 13.00s/it] {'loss': 0.0111, 'learning_rate': 4.872000000000001e-05, 'epoch': 0.1} 3%|▎ | 258/10000 [56:01<35:10:57, 13.00s/it] 3%|▎ | 259/10000 [56:14<35:07:27, 12.98s/it] {'loss': 0.0081, 'learning_rate': 4.8715000000000004e-05, 'epoch': 0.1} 3%|▎ | 259/10000 [56:14<35:07:27, 12.98s/it] 3%|▎ | 260/10000 [56:27<35:07:27, 12.98s/it] {'loss': 0.0101, 'learning_rate': 4.871e-05, 'epoch': 0.1} 3%|▎ | 260/10000 [56:27<35:07:27, 12.98s/it] 3%|▎ | 261/10000 [56:40<35:10:17, 13.00s/it] {'loss': 0.0128, 'learning_rate': 4.8705e-05, 'epoch': 0.1} 3%|▎ | 261/10000 [56:40<35:10:17, 13.00s/it] 3%|▎ | 262/10000 [56:53<35:11:24, 13.01s/it] {'loss': 0.0078, 'learning_rate': 4.87e-05, 'epoch': 0.1} 3%|▎ | 262/10000 [56:53<35:11:24, 
13.01s/it] 3%|▎ | 263/10000 [57:06<35:14:39, 13.03s/it] {'loss': 0.0068, 'learning_rate': 4.8695e-05, 'epoch': 0.1} 3%|▎ | 263/10000 [57:06<35:14:39, 13.03s/it] 3%|▎ | 264/10000 [57:19<35:12:07, 13.02s/it] {'loss': 0.0085, 'learning_rate': 4.869e-05, 'epoch': 0.1} 3%|▎ | 264/10000 [57:19<35:12:07, 13.02s/it] 3%|▎ | 265/10000 [57:32<35:10:17, 13.01s/it] {'loss': 0.0074, 'learning_rate': 4.8685000000000006e-05, 'epoch': 0.1} 3%|▎ | 265/10000 [57:32<35:10:17, 13.01s/it] 3%|▎ | 266/10000 [57:45<35:08:04, 12.99s/it] {'loss': 0.0085, 'learning_rate': 4.868e-05, 'epoch': 0.1} 3%|▎ | 266/10000 [57:45<35:08:04, 12.99s/it] 3%|▎ | 267/10000 [57:58<35:04:18, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.8675000000000004e-05, 'epoch': 0.1} 3%|▎ | 267/10000 [57:58<35:04:18, 12.97s/it] 3%|▎ | 268/10000 [58:11<35:05:56, 12.98s/it] {'loss': 0.0067, 'learning_rate': 4.867000000000001e-05, 'epoch': 0.1} 3%|▎ | 268/10000 [58:11<35:05:56, 12.98s/it] 3%|▎ | 269/10000 [58:24<35:06:54, 12.99s/it] {'loss': 0.0126, 'learning_rate': 4.8665e-05, 'epoch': 0.1} 3%|▎ | 269/10000 [58:24<35:06:54, 12.99s/it] 3%|▎ | 270/10000 [58:37<35:08:17, 13.00s/it] {'loss': 0.0082, 'learning_rate': 4.866e-05, 'epoch': 0.1} 3%|▎ | 270/10000 [58:37<35:08:17, 13.00s/it] 3%|▎ | 271/10000 [58:50<35:07:57, 13.00s/it] {'loss': 0.0074, 'learning_rate': 4.8655e-05, 'epoch': 0.1} 3%|▎ | 271/10000 [58:50<35:07:57, 13.00s/it] 3%|▎ | 272/10000 [59:03<35:07:22, 13.00s/it] {'loss': 0.0106, 'learning_rate': 4.8650000000000003e-05, 'epoch': 0.1} 3%|▎ | 272/10000 [59:03<35:07:22, 13.00s/it] 3%|▎ | 273/10000 [59:16<35:05:30, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.8645e-05, 'epoch': 0.1} 3%|▎ | 273/10000 [59:16<35:05:30, 12.99s/it] 3%|▎ | 274/10000 [59:29<35:03:07, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.864e-05, 'epoch': 0.1} 3%|▎ | 274/10000 [59:29<35:03:07, 12.97s/it] 3%|▎ | 275/10000 [59:42<34:59:28, 12.95s/it] {'loss': 0.008, 'learning_rate': 4.8635000000000004e-05, 'epoch': 0.1} 3%|▎ | 275/10000 
[59:42<34:59:28, 12.95s/it] 3%|▎ | 276/10000 [59:55<34:58:19, 12.95s/it] {'loss': 0.0089, 'learning_rate': 4.863e-05, 'epoch': 0.1} 3%|▎ | 276/10000 [59:55<34:58:19, 12.95s/it] 3%|▎ | 277/10000 [1:00:07<34:52:02, 12.91s/it] {'loss': 0.0083, 'learning_rate': 4.8625e-05, 'epoch': 0.1} 3%|▎ | 277/10000 [1:00:07<34:52:02, 12.91s/it] 3%|▎ | 278/10000 [1:00:20<34:51:36, 12.91s/it] {'loss': 0.0084, 'learning_rate': 4.8620000000000005e-05, 'epoch': 0.1} 3%|▎ | 278/10000 [1:00:20<34:51:36, 12.91s/it] 3%|▎ | 279/10000 [1:00:33<34:48:51, 12.89s/it] {'loss': 0.0073, 'learning_rate': 4.861500000000001e-05, 'epoch': 0.11} 3%|▎ | 279/10000 [1:00:33<34:48:51, 12.89s/it] 3%|▎ | 280/10000 [1:00:46<34:51:12, 12.91s/it] {'loss': 0.0076, 'learning_rate': 4.861e-05, 'epoch': 0.11} 3%|▎ | 280/10000 [1:00:46<34:51:12, 12.91s/it] 3%|▎ | 281/10000 [1:00:59<34:48:05, 12.89s/it] {'loss': 0.0085, 'learning_rate': 4.8605e-05, 'epoch': 0.11} 3%|▎ | 281/10000 [1:00:59<34:48:05, 12.89s/it] 3%|▎ | 282/10000 [1:01:12<34:45:46, 12.88s/it] {'loss': 0.0085, 'learning_rate': 4.86e-05, 'epoch': 0.11} 3%|▎ | 282/10000 [1:01:12<34:45:46, 12.88s/it] 3%|▎ | 283/10000 [1:01:25<34:45:41, 12.88s/it] {'loss': 0.0077, 'learning_rate': 4.8595000000000005e-05, 'epoch': 0.11} 3%|▎ | 283/10000 [1:01:25<34:45:41, 12.88s/it] 3%|▎ | 284/10000 [1:01:38<34:52:55, 12.92s/it] {'loss': 0.0097, 'learning_rate': 4.859e-05, 'epoch': 0.11} 3%|▎ | 284/10000 [1:01:38<34:52:55, 12.92s/it] 3%|▎ | 285/10000 [1:01:51<34:56:52, 12.95s/it] {'loss': 0.0074, 'learning_rate': 4.8585e-05, 'epoch': 0.11} 3%|▎ | 285/10000 [1:01:51<34:56:52, 12.95s/it] 3%|▎ | 286/10000 [1:02:04<34:51:58, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.8580000000000006e-05, 'epoch': 0.11} 3%|▎ | 286/10000 [1:02:04<34:51:58, 12.92s/it] 3%|▎ | 287/10000 [1:02:16<34:48:40, 12.90s/it] {'loss': 0.0073, 'learning_rate': 4.8575e-05, 'epoch': 0.11} 3%|▎ | 287/10000 [1:02:16<34:48:40, 12.90s/it] 3%|▎ | 288/10000 [1:02:29<34:46:38, 12.89s/it] {'loss': 0.0094, 
'learning_rate': 4.8570000000000004e-05, 'epoch': 0.11} 3%|▎ | 288/10000 [1:02:29<34:46:38, 12.89s/it] 3%|▎ | 289/10000 [1:02:42<34:47:22, 12.90s/it] {'loss': 0.0075, 'learning_rate': 4.856500000000001e-05, 'epoch': 0.11} 3%|▎ | 289/10000 [1:02:42<34:47:22, 12.90s/it] 3%|▎ | 290/10000 [1:02:55<34:46:55, 12.90s/it] {'loss': 0.0072, 'learning_rate': 4.856e-05, 'epoch': 0.11} 3%|▎ | 290/10000 [1:02:55<34:46:55, 12.90s/it] 3%|▎ | 291/10000 [1:03:08<34:48:01, 12.90s/it] {'loss': 0.0066, 'learning_rate': 4.8555e-05, 'epoch': 0.11} 3%|▎ | 291/10000 [1:03:08<34:48:01, 12.90s/it] 3%|▎ | 292/10000 [1:03:21<34:52:11, 12.93s/it] {'loss': 0.0073, 'learning_rate': 4.855e-05, 'epoch': 0.11} 3%|▎ | 292/10000 [1:03:21<34:52:11, 12.93s/it] 3%|▎ | 293/10000 [1:03:34<34:51:21, 12.93s/it] {'loss': 0.0067, 'learning_rate': 4.8545000000000004e-05, 'epoch': 0.11} 3%|▎ | 293/10000 [1:03:34<34:51:21, 12.93s/it] 3%|▎ | 294/10000 [1:03:47<34:50:59, 12.93s/it] {'loss': 0.0082, 'learning_rate': 4.854e-05, 'epoch': 0.11} 3%|▎ | 294/10000 [1:03:47<34:50:59, 12.93s/it] 3%|▎ | 295/10000 [1:04:00<34:55:48, 12.96s/it] {'loss': 0.0109, 'learning_rate': 4.8535e-05, 'epoch': 0.11} 3%|▎ | 295/10000 [1:04:00<34:55:48, 12.96s/it] 3%|▎ | 296/10000 [1:04:13<34:52:37, 12.94s/it] {'loss': 0.009, 'learning_rate': 4.8530000000000005e-05, 'epoch': 0.11} 3%|▎ | 296/10000 [1:04:13<34:52:37, 12.94s/it] 3%|▎ | 297/10000 [1:04:26<34:55:58, 12.96s/it] {'loss': 0.0074, 'learning_rate': 4.8525e-05, 'epoch': 0.11} 3%|▎ | 297/10000 [1:04:26<34:55:58, 12.96s/it] 3%|▎ | 298/10000 [1:04:39<34:53:30, 12.95s/it] {'loss': 0.0104, 'learning_rate': 4.852e-05, 'epoch': 0.11} 3%|▎ | 298/10000 [1:04:39<34:53:30, 12.95s/it] 3%|▎ | 299/10000 [1:04:52<34:49:56, 12.93s/it] {'loss': 0.0066, 'learning_rate': 4.8515000000000006e-05, 'epoch': 0.11} 3%|▎ | 299/10000 [1:04:52<34:49:56, 12.93s/it] 3%|▎ | 300/10000 [1:05:05<34:54:57, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.851e-05, 'epoch': 0.11} 3%|▎ | 300/10000 [1:05:05<34:54:57, 
12.96s/it] 3%|▎ | 301/10000 [1:05:18<34:59:05, 12.99s/it] {'loss': 0.007, 'learning_rate': 4.8505e-05, 'epoch': 0.11} 3%|▎ | 301/10000 [1:05:18<34:59:05, 12.99s/it] 3%|▎ | 302/10000 [1:05:31<34:56:45, 12.97s/it] {'loss': 0.0087, 'learning_rate': 4.85e-05, 'epoch': 0.11} 3%|▎ | 302/10000 [1:05:31<34:56:45, 12.97s/it] 3%|▎ | 303/10000 [1:05:44<35:01:05, 13.00s/it] {'loss': 0.0053, 'learning_rate': 4.8495e-05, 'epoch': 0.11} 3%|▎ | 303/10000 [1:05:44<35:01:05, 13.00s/it] 3%|▎ | 304/10000 [1:05:57<34:57:34, 12.98s/it] {'loss': 0.0072, 'learning_rate': 4.8490000000000005e-05, 'epoch': 0.11} 3%|▎ | 304/10000 [1:05:57<34:57:34, 12.98s/it] 3%|▎ | 305/10000 [1:06:10<34:58:23, 12.99s/it] {'loss': 0.0091, 'learning_rate': 4.8485e-05, 'epoch': 0.11} 3%|▎ | 305/10000 [1:06:10<34:58:23, 12.99s/it] 3%|▎ | 306/10000 [1:06:23<34:58:43, 12.99s/it] {'loss': 0.0073, 'learning_rate': 4.8480000000000003e-05, 'epoch': 0.12} 3%|▎ | 306/10000 [1:06:23<34:58:43, 12.99s/it] 3%|▎ | 307/10000 [1:06:36<34:58:36, 12.99s/it] {'loss': 0.0067, 'learning_rate': 4.8475000000000006e-05, 'epoch': 0.12} 3%|▎ | 307/10000 [1:06:36<34:58:36, 12.99s/it] 3%|▎ | 308/10000 [1:06:49<34:59:55, 13.00s/it] {'loss': 0.0069, 'learning_rate': 4.847e-05, 'epoch': 0.12} 3%|▎ | 308/10000 [1:06:49<34:59:55, 13.00s/it] 3%|▎ | 309/10000 [1:07:02<35:00:50, 13.01s/it] {'loss': 0.0057, 'learning_rate': 4.8465000000000004e-05, 'epoch': 0.12} 3%|▎ | 309/10000 [1:07:02<35:00:50, 13.01s/it] 3%|▎ | 310/10000 [1:07:15<34:57:24, 12.99s/it] {'loss': 0.0083, 'learning_rate': 4.846e-05, 'epoch': 0.12} 3%|▎ | 310/10000 [1:07:15<34:57:24, 12.99s/it] 3%|▎ | 311/10000 [1:07:27<34:55:00, 12.97s/it] {'loss': 0.0085, 'learning_rate': 4.8455e-05, 'epoch': 0.12} 3%|▎ | 311/10000 [1:07:28<34:55:00, 12.97s/it] 3%|▎ | 312/10000 [1:07:40<34:53:57, 12.97s/it] {'loss': 0.0073, 'learning_rate': 4.845e-05, 'epoch': 0.12} 3%|▎ | 312/10000 [1:07:40<34:53:57, 12.97s/it] 3%|▎ | 313/10000 [1:07:53<34:57:48, 12.99s/it] {'loss': 0.008, 'learning_rate': 
4.8445e-05, 'epoch': 0.12} 3%|▎ | 313/10000 [1:07:54<34:57:48, 12.99s/it] 3%|▎ | 314/10000 [1:08:06<34:56:20, 12.99s/it] {'loss': 0.0094, 'learning_rate': 4.8440000000000004e-05, 'epoch': 0.12} 3%|▎ | 314/10000 [1:08:06<34:56:20, 12.99s/it] 3%|▎ | 315/10000 [1:08:19<34:54:52, 12.98s/it] {'loss': 0.008, 'learning_rate': 4.8435e-05, 'epoch': 0.12} 3%|▎ | 315/10000 [1:08:19<34:54:52, 12.98s/it] 3%|▎ | 316/10000 [1:08:32<34:58:11, 13.00s/it] {'loss': 0.0061, 'learning_rate': 4.843e-05, 'epoch': 0.12} 3%|▎ | 316/10000 [1:08:32<34:58:11, 13.00s/it] 3%|▎ | 317/10000 [1:08:45<34:58:59, 13.01s/it] {'loss': 0.0064, 'learning_rate': 4.8425000000000005e-05, 'epoch': 0.12} 3%|▎ | 317/10000 [1:08:45<34:58:59, 13.01s/it] 3%|▎ | 318/10000 [1:08:58<34:57:38, 13.00s/it] {'loss': 0.0099, 'learning_rate': 4.842000000000001e-05, 'epoch': 0.12} 3%|▎ | 318/10000 [1:08:58<34:57:38, 13.00s/it] 3%|▎ | 319/10000 [1:09:12<35:00:37, 13.02s/it] {'loss': 0.0072, 'learning_rate': 4.8415e-05, 'epoch': 0.12} 3%|▎ | 319/10000 [1:09:12<35:00:37, 13.02s/it] 3%|▎ | 320/10000 [1:09:25<35:00:03, 13.02s/it] {'loss': 0.0098, 'learning_rate': 4.841e-05, 'epoch': 0.12} 3%|▎ | 320/10000 [1:09:25<35:00:03, 13.02s/it] 3%|▎ | 321/10000 [1:09:38<34:59:28, 13.01s/it] {'loss': 0.0065, 'learning_rate': 4.8405e-05, 'epoch': 0.12} 3%|▎ | 321/10000 [1:09:38<34:59:28, 13.01s/it] 3%|▎ | 322/10000 [1:09:50<34:53:55, 12.98s/it] {'loss': 0.0083, 'learning_rate': 4.8400000000000004e-05, 'epoch': 0.12} 3%|▎ | 322/10000 [1:09:50<34:53:55, 12.98s/it] 3%|▎ | 323/10000 [1:10:03<34:54:37, 12.99s/it] {'loss': 0.0084, 'learning_rate': 4.8395e-05, 'epoch': 0.12} 3%|▎ | 323/10000 [1:10:03<34:54:37, 12.99s/it] 3%|▎ | 324/10000 [1:10:16<34:51:37, 12.97s/it] {'loss': 0.0079, 'learning_rate': 4.839e-05, 'epoch': 0.12} 3%|▎ | 324/10000 [1:10:16<34:51:37, 12.97s/it] 3%|▎ | 325/10000 [1:10:29<34:48:22, 12.95s/it] {'loss': 0.0112, 'learning_rate': 4.8385000000000005e-05, 'epoch': 0.12} 3%|▎ | 325/10000 [1:10:29<34:48:22, 12.95s/it] 3%|▎ | 
326/10000 [1:10:42<34:46:29, 12.94s/it] {'loss': 0.0074, 'learning_rate': 4.838e-05, 'epoch': 0.12} 3%|▎ | 326/10000 [1:10:42<34:46:29, 12.94s/it] 3%|▎ | 327/10000 [1:10:55<34:51:50, 12.98s/it] {'loss': 0.0069, 'learning_rate': 4.8375000000000004e-05, 'epoch': 0.12} 3%|▎ | 327/10000 [1:10:55<34:51:50, 12.98s/it] 3%|▎ | 328/10000 [1:11:08<34:47:59, 12.95s/it] {'loss': 0.0084, 'learning_rate': 4.8370000000000006e-05, 'epoch': 0.12} 3%|▎ | 328/10000 [1:11:08<34:47:59, 12.95s/it] 3%|▎ | 329/10000 [1:11:21<34:47:48, 12.95s/it] {'loss': 0.0081, 'learning_rate': 4.8365e-05, 'epoch': 0.12} 3%|▎ | 329/10000 [1:11:21<34:47:48, 12.95s/it] 3%|▎ | 330/10000 [1:11:34<34:50:56, 12.97s/it] {'loss': 0.0079, 'learning_rate': 4.836e-05, 'epoch': 0.12} 3%|▎ | 330/10000 [1:11:34<34:50:56, 12.97s/it] 3%|▎ | 331/10000 [1:11:47<34:50:29, 12.97s/it] {'loss': 0.0058, 'learning_rate': 4.8355e-05, 'epoch': 0.12} 3%|▎ | 331/10000 [1:11:47<34:50:29, 12.97s/it] 3%|▎ | 332/10000 [1:12:00<34:52:44, 12.99s/it] {'loss': 0.0067, 'learning_rate': 4.835e-05, 'epoch': 0.13} 3%|▎ | 332/10000 [1:12:00<34:52:44, 12.99s/it] 3%|▎ | 333/10000 [1:12:13<34:52:48, 12.99s/it] {'loss': 0.0083, 'learning_rate': 4.8345e-05, 'epoch': 0.13} 3%|▎ | 333/10000 [1:12:13<34:52:48, 12.99s/it] 3%|▎ | 334/10000 [1:12:26<34:53:13, 12.99s/it] {'loss': 0.0076, 'learning_rate': 4.834e-05, 'epoch': 0.13} 3%|▎ | 334/10000 [1:12:26<34:53:13, 12.99s/it] 3%|▎ | 335/10000 [1:12:39<34:50:02, 12.97s/it] {'loss': 0.0071, 'learning_rate': 4.8335000000000004e-05, 'epoch': 0.13} 3%|▎ | 335/10000 [1:12:39<34:50:02, 12.97s/it] 3%|▎ | 336/10000 [1:12:52<34:50:50, 12.98s/it] {'loss': 0.0074, 'learning_rate': 4.833e-05, 'epoch': 0.13} 3%|▎ | 336/10000 [1:12:52<34:50:50, 12.98s/it] 3%|▎ | 337/10000 [1:13:05<34:48:09, 12.97s/it] {'loss': 0.0078, 'learning_rate': 4.8325e-05, 'epoch': 0.13} 3%|▎ | 337/10000 [1:13:05<34:48:09, 12.97s/it] 3%|▎ | 338/10000 [1:13:18<34:46:48, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.8320000000000005e-05, 'epoch': 
0.13} 3%|▎ | 338/10000 [1:13:18<34:46:48, 12.96s/it] 3%|▎ | 339/10000 [1:13:31<34:42:57, 12.94s/it] {'loss': 0.0077, 'learning_rate': 4.831500000000001e-05, 'epoch': 0.13} 3%|▎ | 339/10000 [1:13:31<34:42:57, 12.94s/it] 3%|▎ | 340/10000 [1:13:44<34:41:33, 12.93s/it] {'loss': 0.007, 'learning_rate': 4.8309999999999997e-05, 'epoch': 0.13} 3%|▎ | 340/10000 [1:13:44<34:41:33, 12.93s/it] 3%|▎ | 341/10000 [1:13:57<34:39:26, 12.92s/it] {'loss': 0.0082, 'learning_rate': 4.8305e-05, 'epoch': 0.13} 3%|▎ | 341/10000 [1:13:57<34:39:26, 12.92s/it] 3%|▎ | 342/10000 [1:14:10<34:43:04, 12.94s/it] {'loss': 0.0079, 'learning_rate': 4.83e-05, 'epoch': 0.13} 3%|▎ | 342/10000 [1:14:10<34:43:04, 12.94s/it] 3%|▎ | 343/10000 [1:14:23<34:44:26, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.8295000000000004e-05, 'epoch': 0.13} 3%|▎ | 343/10000 [1:14:23<34:44:26, 12.95s/it] 3%|▎ | 344/10000 [1:14:35<34:41:22, 12.93s/it] {'loss': 0.0072, 'learning_rate': 4.829e-05, 'epoch': 0.13} 3%|▎ | 344/10000 [1:14:36<34:41:22, 12.93s/it] 3%|▎ | 345/10000 [1:14:48<34:40:00, 12.93s/it] {'loss': 0.0072, 'learning_rate': 4.8285e-05, 'epoch': 0.13} 3%|▎ | 345/10000 [1:14:48<34:40:00, 12.93s/it] 3%|▎ | 346/10000 [1:15:01<34:44:48, 12.96s/it] {'loss': 0.0075, 'learning_rate': 4.8280000000000005e-05, 'epoch': 0.13} 3%|▎ | 346/10000 [1:15:01<34:44:48, 12.96s/it] 3%|▎ | 347/10000 [1:15:14<34:44:10, 12.95s/it] {'loss': 0.0107, 'learning_rate': 4.8275e-05, 'epoch': 0.13} 3%|▎ | 347/10000 [1:15:14<34:44:10, 12.95s/it] 3%|▎ | 348/10000 [1:15:27<34:41:50, 12.94s/it] {'loss': 0.0087, 'learning_rate': 4.8270000000000004e-05, 'epoch': 0.13} 3%|▎ | 348/10000 [1:15:27<34:41:50, 12.94s/it] 3%|▎ | 349/10000 [1:15:40<34:39:13, 12.93s/it] {'loss': 0.0072, 'learning_rate': 4.8265000000000006e-05, 'epoch': 0.13} 3%|▎ | 349/10000 [1:15:40<34:39:13, 12.93s/it] 4%|▎ | 350/10000 [1:15:53<34:43:13, 12.95s/it] {'loss': 0.0096, 'learning_rate': 4.826e-05, 'epoch': 0.13} 4%|▎ | 350/10000 [1:15:53<34:43:13, 12.95s/it] 4%|▎ | 351/10000 
[1:16:06<34:44:24, 12.96s/it] {'loss': 0.0084, 'learning_rate': 4.8255e-05, 'epoch': 0.13} 4%|▎ | 351/10000 [1:16:06<34:44:24, 12.96s/it] 4%|▎ | 352/10000 [1:16:19<34:45:59, 12.97s/it] {'loss': 0.0089, 'learning_rate': 4.825e-05, 'epoch': 0.13} 4%|▎ | 352/10000 [1:16:19<34:45:59, 12.97s/it] 4%|▎ | 353/10000 [1:16:32<34:43:57, 12.96s/it] {'loss': 0.0088, 'learning_rate': 4.8245e-05, 'epoch': 0.13} 4%|▎ | 353/10000 [1:16:32<34:43:57, 12.96s/it] 4%|▎ | 354/10000 [1:16:45<34:41:36, 12.95s/it] {'loss': 0.0091, 'learning_rate': 4.824e-05, 'epoch': 0.13} 4%|▎ | 354/10000 [1:16:45<34:41:36, 12.95s/it] 4%|▎ | 355/10000 [1:16:58<34:42:00, 12.95s/it] {'loss': 0.0093, 'learning_rate': 4.8235e-05, 'epoch': 0.13} 4%|▎ | 355/10000 [1:16:58<34:42:00, 12.95s/it] 4%|▎ | 356/10000 [1:17:11<34:44:15, 12.97s/it] {'loss': 0.0308, 'learning_rate': 4.8230000000000004e-05, 'epoch': 0.13} 4%|▎ | 356/10000 [1:17:11<34:44:15, 12.97s/it] 4%|▎ | 357/10000 [1:17:24<34:47:39, 12.99s/it] {'loss': 0.0069, 'learning_rate': 4.822500000000001e-05, 'epoch': 0.13} 4%|▎ | 357/10000 [1:17:24<34:47:39, 12.99s/it] 4%|▎ | 358/10000 [1:17:37<34:48:58, 13.00s/it] {'loss': 0.0119, 'learning_rate': 4.822e-05, 'epoch': 0.13} 4%|▎ | 358/10000 [1:17:37<34:48:58, 13.00s/it] 4%|▎ | 359/10000 [1:17:50<34:46:06, 12.98s/it] {'loss': 0.0093, 'learning_rate': 4.8215000000000005e-05, 'epoch': 0.14} 4%|▎ | 359/10000 [1:17:50<34:46:06, 12.98s/it] 4%|▎ | 360/10000 [1:18:03<34:48:17, 13.00s/it] {'loss': 0.0118, 'learning_rate': 4.821e-05, 'epoch': 0.14} 4%|▎ | 360/10000 [1:18:03<34:48:17, 13.00s/it] 4%|▎ | 361/10000 [1:18:16<34:48:56, 13.00s/it] {'loss': 0.0089, 'learning_rate': 4.8205000000000003e-05, 'epoch': 0.14} 4%|▎ | 361/10000 [1:18:16<34:48:56, 13.00s/it] 4%|▎ | 362/10000 [1:18:29<34:46:52, 12.99s/it] {'loss': 0.0099, 'learning_rate': 4.82e-05, 'epoch': 0.14} 4%|▎ | 362/10000 [1:18:29<34:46:52, 12.99s/it] 4%|▎ | 363/10000 [1:18:42<34:45:40, 12.99s/it] {'loss': 0.0085, 'learning_rate': 4.8195e-05, 'epoch': 0.14} 4%|▎ | 
363/10000 [1:18:42<34:45:40, 12.99s/it] 4%|▎ | 364/10000 [1:18:55<34:40:36, 12.96s/it] {'loss': 0.0098, 'learning_rate': 4.8190000000000004e-05, 'epoch': 0.14} 4%|▎ | 364/10000 [1:18:55<34:40:36, 12.96s/it] 4%|▎ | 365/10000 [1:19:08<34:38:27, 12.94s/it] {'loss': 0.007, 'learning_rate': 4.8185e-05, 'epoch': 0.14} 4%|▎ | 365/10000 [1:19:08<34:38:27, 12.94s/it] 4%|▎ | 366/10000 [1:19:21<34:36:44, 12.93s/it] {'loss': 0.0101, 'learning_rate': 4.818e-05, 'epoch': 0.14} 4%|▎ | 366/10000 [1:19:21<34:36:44, 12.93s/it] 4%|▎ | 367/10000 [1:19:34<34:31:41, 12.90s/it] {'loss': 0.0113, 'learning_rate': 4.8175000000000005e-05, 'epoch': 0.14} 4%|▎ | 367/10000 [1:19:34<34:31:41, 12.90s/it] 4%|▎ | 368/10000 [1:19:46<34:31:03, 12.90s/it] {'loss': 0.0115, 'learning_rate': 4.817e-05, 'epoch': 0.14} 4%|▎ | 368/10000 [1:19:46<34:31:03, 12.90s/it] 4%|▎ | 369/10000 [1:19:59<34:28:53, 12.89s/it] {'loss': 0.0113, 'learning_rate': 4.8165000000000004e-05, 'epoch': 0.14} 4%|▎ | 369/10000 [1:19:59<34:28:53, 12.89s/it] 4%|▎ | 370/10000 [1:20:12<34:25:06, 12.87s/it] {'loss': 0.0096, 'learning_rate': 4.816e-05, 'epoch': 0.14} 4%|▎ | 370/10000 [1:20:12<34:25:06, 12.87s/it] 4%|▎ | 371/10000 [1:20:25<34:22:05, 12.85s/it] {'loss': 0.0154, 'learning_rate': 4.8155e-05, 'epoch': 0.14} 4%|▎ | 371/10000 [1:20:25<34:22:05, 12.85s/it] 4%|▎ | 372/10000 [1:20:38<34:27:25, 12.88s/it] {'loss': 0.0125, 'learning_rate': 4.815e-05, 'epoch': 0.14} 4%|▎ | 372/10000 [1:20:38<34:27:25, 12.88s/it] 4%|▎ | 373/10000 [1:20:51<34:31:27, 12.91s/it] {'loss': 0.0089, 'learning_rate': 4.8145e-05, 'epoch': 0.14} 4%|▎ | 373/10000 [1:20:51<34:31:27, 12.91s/it] 4%|▎ | 374/10000 [1:21:04<34:31:15, 12.91s/it] {'loss': 0.0089, 'learning_rate': 4.814e-05, 'epoch': 0.14} 4%|▎ | 374/10000 [1:21:04<34:31:15, 12.91s/it] 4%|▍ | 375/10000 [1:21:17<34:28:28, 12.89s/it] {'loss': 0.0083, 'learning_rate': 4.8135e-05, 'epoch': 0.14} 4%|▍ | 375/10000 [1:21:17<34:28:28, 12.89s/it] 4%|▍ | 376/10000 [1:21:30<34:29:15, 12.90s/it] {'loss': 0.008, 
'learning_rate': 4.813e-05, 'epoch': 0.14} 4%|▍ | 376/10000 [1:21:30<34:29:15, 12.90s/it] 4%|▍ | 377/10000 [1:21:42<34:27:27, 12.89s/it] {'loss': 0.0098, 'learning_rate': 4.8125000000000004e-05, 'epoch': 0.14} 4%|▍ | 377/10000 [1:21:42<34:27:27, 12.89s/it] 4%|▍ | 378/10000 [1:21:55<34:26:36, 12.89s/it] {'loss': 0.0084, 'learning_rate': 4.812000000000001e-05, 'epoch': 0.14} 4%|▍ | 378/10000 [1:21:55<34:26:36, 12.89s/it] 4%|▍ | 379/10000 [1:22:08<34:26:39, 12.89s/it] {'loss': 0.0091, 'learning_rate': 4.8115e-05, 'epoch': 0.14} 4%|▍ | 379/10000 [1:22:08<34:26:39, 12.89s/it] 4%|▍ | 380/10000 [1:22:21<34:24:22, 12.88s/it] {'loss': 0.0083, 'learning_rate': 4.8110000000000005e-05, 'epoch': 0.14} 4%|▍ | 380/10000 [1:22:21<34:24:22, 12.88s/it] 4%|▍ | 381/10000 [1:22:34<34:22:38, 12.87s/it] {'loss': 0.0072, 'learning_rate': 4.8105e-05, 'epoch': 0.14} 4%|▍ | 381/10000 [1:22:34<34:22:38, 12.87s/it] 4%|▍ | 382/10000 [1:22:47<34:20:38, 12.85s/it] {'loss': 0.0095, 'learning_rate': 4.8100000000000004e-05, 'epoch': 0.14} 4%|▍ | 382/10000 [1:22:47<34:20:38, 12.85s/it] 4%|▍ | 383/10000 [1:23:00<34:23:14, 12.87s/it] {'loss': 0.0076, 'learning_rate': 4.8095e-05, 'epoch': 0.14} 4%|▍ | 383/10000 [1:23:00<34:23:14, 12.87s/it] 4%|▍ | 384/10000 [1:23:12<34:21:07, 12.86s/it] {'loss': 0.011, 'learning_rate': 4.809e-05, 'epoch': 0.14} 4%|▍ | 384/10000 [1:23:12<34:21:07, 12.86s/it] 4%|▍ | 385/10000 [1:23:25<34:24:39, 12.88s/it] {'loss': 0.0096, 'learning_rate': 4.8085000000000005e-05, 'epoch': 0.15} 4%|▍ | 385/10000 [1:23:25<34:24:39, 12.88s/it] 4%|▍ | 386/10000 [1:23:38<34:23:36, 12.88s/it] {'loss': 0.0075, 'learning_rate': 4.808e-05, 'epoch': 0.15} 4%|▍ | 386/10000 [1:23:38<34:23:36, 12.88s/it] 4%|▍ | 387/10000 [1:23:51<34:23:19, 12.88s/it] {'loss': 0.0103, 'learning_rate': 4.8075e-05, 'epoch': 0.15} 4%|▍ | 387/10000 [1:23:51<34:23:19, 12.88s/it] 4%|▍ | 388/10000 [1:24:04<34:24:45, 12.89s/it] {'loss': 0.0081, 'learning_rate': 4.8070000000000006e-05, 'epoch': 0.15} 4%|▍ | 388/10000 
[1:24:04<34:24:45, 12.89s/it] 4%|▍ | 389/10000 [1:24:17<34:27:33, 12.91s/it] {'loss': 0.0111, 'learning_rate': 4.8065e-05, 'epoch': 0.15} 4%|▍ | 389/10000 [1:24:17<34:27:33, 12.91s/it] 4%|▍ | 390/10000 [1:24:30<34:29:55, 12.92s/it] {'loss': 0.0074, 'learning_rate': 4.8060000000000004e-05, 'epoch': 0.15} 4%|▍ | 390/10000 [1:24:30<34:29:55, 12.92s/it] 4%|▍ | 391/10000 [1:24:43<34:26:47, 12.91s/it] {'loss': 0.0088, 'learning_rate': 4.8055e-05, 'epoch': 0.15} 4%|▍ | 391/10000 [1:24:43<34:26:47, 12.91s/it] 4%|▍ | 392/10000 [1:24:56<34:26:23, 12.90s/it] {'loss': 0.008, 'learning_rate': 4.805e-05, 'epoch': 0.15} 4%|▍ | 392/10000 [1:24:56<34:26:23, 12.90s/it] 4%|▍ | 393/10000 [1:25:09<34:26:43, 12.91s/it] {'loss': 0.0119, 'learning_rate': 4.8045e-05, 'epoch': 0.15} 4%|▍ | 393/10000 [1:25:09<34:26:43, 12.91s/it] 4%|▍ | 394/10000 [1:25:21<34:23:35, 12.89s/it] {'loss': 0.0113, 'learning_rate': 4.804e-05, 'epoch': 0.15} 4%|▍ | 394/10000 [1:25:21<34:23:35, 12.89s/it] 4%|▍ | 395/10000 [1:25:34<34:25:11, 12.90s/it] {'loss': 0.007, 'learning_rate': 4.8035000000000003e-05, 'epoch': 0.15} 4%|▍ | 395/10000 [1:25:34<34:25:11, 12.90s/it] 4%|▍ | 396/10000 [1:25:47<34:22:06, 12.88s/it] {'loss': 0.0079, 'learning_rate': 4.8030000000000006e-05, 'epoch': 0.15} 4%|▍ | 396/10000 [1:25:47<34:22:06, 12.88s/it] 4%|▍ | 397/10000 [1:26:00<34:25:30, 12.91s/it] {'loss': 0.0066, 'learning_rate': 4.8025e-05, 'epoch': 0.15} 4%|▍ | 397/10000 [1:26:00<34:25:30, 12.91s/it] 4%|▍ | 398/10000 [1:26:13<34:24:44, 12.90s/it] {'loss': 0.0086, 'learning_rate': 4.8020000000000004e-05, 'epoch': 0.15} 4%|▍ | 398/10000 [1:26:13<34:24:44, 12.90s/it] 4%|▍ | 399/10000 [1:26:26<34:24:37, 12.90s/it] {'loss': 0.0118, 'learning_rate': 4.801500000000001e-05, 'epoch': 0.15} 4%|▍ | 399/10000 [1:26:26<34:24:37, 12.90s/it] 4%|▍ | 400/10000 [1:26:39<34:26:58, 12.92s/it] {'loss': 0.0459, 'learning_rate': 4.801e-05, 'epoch': 0.15} 4%|▍ | 400/10000 [1:26:39<34:26:58, 12.92s/it] 4%|▍ | 401/10000 [1:26:52<34:27:00, 12.92s/it] {'loss': 
0.0087, 'learning_rate': 4.8005e-05, 'epoch': 0.15} 4%|▍ | 401/10000 [1:26:52<34:27:00, 12.92s/it] 4%|▍ | 402/10000 [1:27:05<34:24:19, 12.90s/it] {'loss': 0.0339, 'learning_rate': 4.8e-05, 'epoch': 0.15} 4%|▍ | 402/10000 [1:27:05<34:24:19, 12.90s/it] 4%|▍ | 403/10000 [1:27:18<34:20:17, 12.88s/it] {'loss': 0.0116, 'learning_rate': 4.7995000000000004e-05, 'epoch': 0.15} 4%|▍ | 403/10000 [1:27:18<34:20:17, 12.88s/it] 4%|▍ | 404/10000 [1:27:31<34:25:08, 12.91s/it] {'loss': 0.0109, 'learning_rate': 4.799e-05, 'epoch': 0.15} 4%|▍ | 404/10000 [1:27:31<34:25:08, 12.91s/it] 4%|▍ | 405/10000 [1:27:43<34:23:07, 12.90s/it] {'loss': 0.0083, 'learning_rate': 4.7985e-05, 'epoch': 0.15} 4%|▍ | 405/10000 [1:27:43<34:23:07, 12.90s/it] 4%|▍ | 406/10000 [1:27:56<34:21:03, 12.89s/it] {'loss': 0.0072, 'learning_rate': 4.7980000000000005e-05, 'epoch': 0.15} 4%|▍ | 406/10000 [1:27:56<34:21:03, 12.89s/it] 4%|▍ | 407/10000 [1:28:09<34:19:42, 12.88s/it] {'loss': 0.0066, 'learning_rate': 4.7975e-05, 'epoch': 0.15} 4%|▍ | 407/10000 [1:28:09<34:19:42, 12.88s/it] 4%|▍ | 408/10000 [1:28:22<34:17:50, 12.87s/it] {'loss': 0.0086, 'learning_rate': 4.797e-05, 'epoch': 0.15} 4%|▍ | 408/10000 [1:28:22<34:17:50, 12.87s/it] 4%|▍ | 409/10000 [1:28:35<34:18:25, 12.88s/it] {'loss': 0.0074, 'learning_rate': 4.7965000000000006e-05, 'epoch': 0.15} 4%|▍ | 409/10000 [1:28:35<34:18:25, 12.88s/it] 4%|▍ | 410/10000 [1:28:48<34:19:51, 12.89s/it] {'loss': 0.0077, 'learning_rate': 4.796e-05, 'epoch': 0.15} 4%|▍ | 410/10000 [1:28:48<34:19:51, 12.89s/it] 4%|▍ | 411/10000 [1:29:01<34:17:51, 12.88s/it] {'loss': 0.0061, 'learning_rate': 4.7955e-05, 'epoch': 0.15} 4%|▍ | 411/10000 [1:29:01<34:17:51, 12.88s/it] 4%|▍ | 412/10000 [1:29:13<34:17:17, 12.87s/it] {'loss': 0.0078, 'learning_rate': 4.795e-05, 'epoch': 0.16} 4%|▍ | 412/10000 [1:29:14<34:17:17, 12.87s/it] 4%|▍ | 413/10000 [1:29:26<34:19:02, 12.89s/it] {'loss': 0.0072, 'learning_rate': 4.7945e-05, 'epoch': 0.16} 4%|▍ | 413/10000 [1:29:26<34:19:02, 12.89s/it] 4%|▍ | 
414/10000 [1:29:39<34:18:45, 12.89s/it] {'loss': 0.0093, 'learning_rate': 4.794e-05, 'epoch': 0.16} 4%|▍ | 414/10000 [1:29:39<34:18:45, 12.89s/it] 4%|▍ | 415/10000 [1:29:52<34:19:12, 12.89s/it] {'loss': 0.01, 'learning_rate': 4.7935e-05, 'epoch': 0.16} 4%|▍ | 415/10000 [1:29:52<34:19:12, 12.89s/it] 4%|▍ | 416/10000 [1:30:05<34:18:09, 12.88s/it] {'loss': 0.011, 'learning_rate': 4.7930000000000004e-05, 'epoch': 0.16} 4%|▍ | 416/10000 [1:30:05<34:18:09, 12.88s/it] 4%|▍ | 417/10000 [1:30:18<34:17:15, 12.88s/it] {'loss': 0.016, 'learning_rate': 4.7925000000000006e-05, 'epoch': 0.16} 4%|▍ | 417/10000 [1:30:18<34:17:15, 12.88s/it] 4%|▍ | 418/10000 [1:30:31<34:14:58, 12.87s/it] {'loss': 0.005, 'learning_rate': 4.792e-05, 'epoch': 0.16} 4%|▍ | 418/10000 [1:30:31<34:14:58, 12.87s/it] 4%|▍ | 419/10000 [1:30:44<34:14:26, 12.87s/it] {'loss': 0.0092, 'learning_rate': 4.7915000000000005e-05, 'epoch': 0.16} 4%|▍ | 419/10000 [1:30:44<34:14:26, 12.87s/it] 4%|▍ | 420/10000 [1:30:57<34:16:29, 12.88s/it] {'loss': 0.0053, 'learning_rate': 4.791000000000001e-05, 'epoch': 0.16} 4%|▍ | 420/10000 [1:30:57<34:16:29, 12.88s/it] 4%|▍ | 421/10000 [1:31:09<34:18:15, 12.89s/it] {'loss': 0.0063, 'learning_rate': 4.7905e-05, 'epoch': 0.16} 4%|▍ | 421/10000 [1:31:09<34:18:15, 12.89s/it] 4%|▍ | 422/10000 [1:31:22<34:19:38, 12.90s/it] {'loss': 0.0092, 'learning_rate': 4.79e-05, 'epoch': 0.16} 4%|▍ | 422/10000 [1:31:22<34:19:38, 12.90s/it] 4%|▍ | 423/10000 [1:31:35<34:18:16, 12.90s/it] {'loss': 0.0073, 'learning_rate': 4.7895e-05, 'epoch': 0.16} 4%|▍ | 423/10000 [1:31:35<34:18:16, 12.90s/it] 4%|▍ | 424/10000 [1:31:48<34:21:33, 12.92s/it] {'loss': 0.0076, 'learning_rate': 4.7890000000000004e-05, 'epoch': 0.16} 4%|▍ | 424/10000 [1:31:48<34:21:33, 12.92s/it] 4%|▍ | 425/10000 [1:32:01<34:20:32, 12.91s/it] {'loss': 0.0108, 'learning_rate': 4.7885e-05, 'epoch': 0.16} 4%|▍ | 425/10000 [1:32:01<34:20:32, 12.91s/it] 4%|▍ | 426/10000 [1:32:14<34:16:43, 12.89s/it] {'loss': 0.0088, 'learning_rate': 4.788e-05, 
'epoch': 0.16} 4%|▍ | 426/10000 [1:32:14<34:16:43, 12.89s/it] 4%|▍ | 427/10000 [1:32:27<34:16:23, 12.89s/it] {'loss': 0.0074, 'learning_rate': 4.7875000000000005e-05, 'epoch': 0.16} 4%|▍ | 427/10000 [1:32:27<34:16:23, 12.89s/it] 4%|▍ | 428/10000 [1:32:40<34:14:20, 12.88s/it] {'loss': 0.0091, 'learning_rate': 4.787e-05, 'epoch': 0.16} 4%|▍ | 428/10000 [1:32:40<34:14:20, 12.88s/it] 4%|▍ | 429/10000 [1:32:53<34:11:51, 12.86s/it] {'loss': 0.0074, 'learning_rate': 4.7865e-05, 'epoch': 0.16} 4%|▍ | 429/10000 [1:32:53<34:11:51, 12.86s/it] 4%|▍ | 430/10000 [1:33:05<34:13:30, 12.87s/it] {'loss': 0.0065, 'learning_rate': 4.7860000000000006e-05, 'epoch': 0.16} 4%|▍ | 430/10000 [1:33:05<34:13:30, 12.87s/it] 4%|▍ | 431/10000 [1:33:18<34:16:31, 12.89s/it] {'loss': 0.0114, 'learning_rate': 4.7855e-05, 'epoch': 0.16} 4%|▍ | 431/10000 [1:33:18<34:16:31, 12.89s/it] 4%|▍ | 432/10000 [1:33:31<34:18:20, 12.91s/it] {'loss': 0.0052, 'learning_rate': 4.785e-05, 'epoch': 0.16} 4%|▍ | 432/10000 [1:33:31<34:18:20, 12.91s/it] 4%|▍ | 433/10000 [1:33:44<34:17:41, 12.90s/it] {'loss': 0.0063, 'learning_rate': 4.7845e-05, 'epoch': 0.16} 4%|▍ | 433/10000 [1:33:44<34:17:41, 12.90s/it] 4%|▍ | 434/10000 [1:33:57<34:18:24, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.784e-05, 'epoch': 0.16} 4%|▍ | 434/10000 [1:33:57<34:18:24, 12.91s/it] 4%|▍ | 435/10000 [1:34:10<34:18:37, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.7835000000000005e-05, 'epoch': 0.16} 4%|▍ | 435/10000 [1:34:10<34:18:37, 12.91s/it] 4%|▍ | 436/10000 [1:34:23<34:14:18, 12.89s/it] {'loss': 0.0091, 'learning_rate': 4.783e-05, 'epoch': 0.16} 4%|▍ | 436/10000 [1:34:23<34:14:18, 12.89s/it] 4%|▍ | 437/10000 [1:34:36<34:13:43, 12.89s/it] {'loss': 0.0073, 'learning_rate': 4.7825000000000004e-05, 'epoch': 0.16} 4%|▍ | 437/10000 [1:34:36<34:13:43, 12.89s/it] 4%|▍ | 438/10000 [1:34:49<34:13:03, 12.88s/it] {'loss': 0.0084, 'learning_rate': 4.7820000000000006e-05, 'epoch': 0.17} 4%|▍ | 438/10000 [1:34:49<34:13:03, 12.88s/it] 4%|▍ | 439/10000 
[1:35:02<34:15:36, 12.90s/it] {'loss': 0.008, 'learning_rate': 4.7815e-05, 'epoch': 0.17} 4%|▍ | 439/10000 [1:35:02<34:15:36, 12.90s/it] 4%|▍ | 440/10000 [1:35:15<34:17:30, 12.91s/it] {'loss': 0.0078, 'learning_rate': 4.7810000000000005e-05, 'epoch': 0.17} 4%|▍ | 440/10000 [1:35:15<34:17:30, 12.91s/it] 4%|▍ | 441/10000 [1:35:27<34:18:09, 12.92s/it] {'loss': 0.0047, 'learning_rate': 4.7805e-05, 'epoch': 0.17} 4%|▍ | 441/10000 [1:35:27<34:18:09, 12.92s/it] 4%|▍ | 442/10000 [1:35:40<34:19:15, 12.93s/it] {'loss': 0.0075, 'learning_rate': 4.78e-05, 'epoch': 0.17} 4%|▍ | 442/10000 [1:35:40<34:19:15, 12.93s/it] 4%|▍ | 443/10000 [1:35:53<34:17:15, 12.92s/it] {'loss': 0.0089, 'learning_rate': 4.7795e-05, 'epoch': 0.17} 4%|▍ | 443/10000 [1:35:53<34:17:15, 12.92s/it] 4%|▍ | 444/10000 [1:36:06<34:15:17, 12.90s/it] {'loss': 0.0089, 'learning_rate': 4.779e-05, 'epoch': 0.17} 4%|▍ | 444/10000 [1:36:06<34:15:17, 12.90s/it] 4%|▍ | 445/10000 [1:36:19<34:12:33, 12.89s/it] {'loss': 0.0069, 'learning_rate': 4.7785000000000004e-05, 'epoch': 0.17} 4%|▍ | 445/10000 [1:36:19<34:12:33, 12.89s/it] 4%|▍ | 446/10000 [1:36:32<34:13:01, 12.89s/it] {'loss': 0.0095, 'learning_rate': 4.778e-05, 'epoch': 0.17} 4%|▍ | 446/10000 [1:36:32<34:13:01, 12.89s/it] 4%|▍ | 447/10000 [1:36:45<34:09:47, 12.87s/it] {'loss': 0.0091, 'learning_rate': 4.7775e-05, 'epoch': 0.17} 4%|▍ | 447/10000 [1:36:45<34:09:47, 12.87s/it] 4%|▍ | 448/10000 [1:36:58<34:07:44, 12.86s/it] {'loss': 0.0062, 'learning_rate': 4.7770000000000005e-05, 'epoch': 0.17} 4%|▍ | 448/10000 [1:36:58<34:07:44, 12.86s/it] 4%|▍ | 449/10000 [1:37:11<34:10:17, 12.88s/it] {'loss': 0.0071, 'learning_rate': 4.7765e-05, 'epoch': 0.17} 4%|▍ | 449/10000 [1:37:11<34:10:17, 12.88s/it] 4%|▍ | 450/10000 [1:37:23<34:11:34, 12.89s/it] {'loss': 0.0066, 'learning_rate': 4.7760000000000004e-05, 'epoch': 0.17} 4%|▍ | 450/10000 [1:37:23<34:11:34, 12.89s/it] 5%|▍ | 451/10000 [1:37:36<34:11:42, 12.89s/it] {'loss': 0.0082, 'learning_rate': 4.7755e-05, 'epoch': 0.17} 5%|▍ 
| 451/10000 [1:37:36<34:11:42, 12.89s/it] 5%|▍ | 452/10000 [1:37:49<34:12:26, 12.90s/it] {'loss': 0.0114, 'learning_rate': 4.775e-05, 'epoch': 0.17} 5%|▍ | 452/10000 [1:37:49<34:12:26, 12.90s/it] 5%|▍ | 453/10000 [1:38:02<34:13:42, 12.91s/it] {'loss': 0.0066, 'learning_rate': 4.7745e-05, 'epoch': 0.17} 5%|▍ | 453/10000 [1:38:02<34:13:42, 12.91s/it] 5%|▍ | 454/10000 [1:38:15<34:12:11, 12.90s/it] {'loss': 0.0069, 'learning_rate': 4.774e-05, 'epoch': 0.17} 5%|▍ | 454/10000 [1:38:15<34:12:11, 12.90s/it] 5%|▍ | 455/10000 [1:38:28<34:09:43, 12.88s/it] {'loss': 0.0077, 'learning_rate': 4.7735e-05, 'epoch': 0.17} 5%|▍ | 455/10000 [1:38:28<34:09:43, 12.88s/it] 5%|▍ | 456/10000 [1:38:41<34:08:20, 12.88s/it] {'loss': 0.008, 'learning_rate': 4.7730000000000005e-05, 'epoch': 0.17} 5%|▍ | 456/10000 [1:38:41<34:08:20, 12.88s/it] 5%|▍ | 457/10000 [1:38:54<34:07:02, 12.87s/it] {'loss': 0.0083, 'learning_rate': 4.7725e-05, 'epoch': 0.17} 5%|▍ | 457/10000 [1:38:54<34:07:02, 12.87s/it] 5%|▍ | 458/10000 [1:39:07<34:09:28, 12.89s/it] {'loss': 0.0098, 'learning_rate': 4.7720000000000004e-05, 'epoch': 0.17} 5%|▍ | 458/10000 [1:39:07<34:09:28, 12.89s/it] 5%|▍ | 459/10000 [1:39:19<34:10:27, 12.89s/it] {'loss': 0.0067, 'learning_rate': 4.7715000000000006e-05, 'epoch': 0.17} 5%|▍ | 459/10000 [1:39:19<34:10:27, 12.89s/it] 5%|▍ | 460/10000 [1:39:32<34:10:56, 12.90s/it] {'loss': 0.0078, 'learning_rate': 4.771e-05, 'epoch': 0.17} 5%|▍ | 460/10000 [1:39:32<34:10:56, 12.90s/it] 5%|▍ | 461/10000 [1:39:45<34:13:39, 12.92s/it] {'loss': 0.0073, 'learning_rate': 4.7705e-05, 'epoch': 0.17} 5%|▍ | 461/10000 [1:39:45<34:13:39, 12.92s/it] 5%|▍ | 462/10000 [1:39:58<34:10:46, 12.90s/it] {'loss': 0.009, 'learning_rate': 4.77e-05, 'epoch': 0.17} 5%|▍ | 462/10000 [1:39:58<34:10:46, 12.90s/it] 5%|▍ | 463/10000 [1:40:11<34:12:49, 12.91s/it] {'loss': 0.0062, 'learning_rate': 4.7695e-05, 'epoch': 0.17} 5%|▍ | 463/10000 [1:40:11<34:12:49, 12.91s/it] 5%|▍ | 464/10000 [1:40:24<34:09:35, 12.90s/it] {'loss': 0.0052, 
'learning_rate': 4.769e-05, 'epoch': 0.17} 5%|▍ | 464/10000 [1:40:24<34:09:35, 12.90s/it] 5%|▍ | 465/10000 [1:40:37<34:07:59, 12.89s/it] {'loss': 0.0067, 'learning_rate': 4.7685e-05, 'epoch': 0.18} 5%|▍ | 465/10000 [1:40:37<34:07:59, 12.89s/it] 5%|▍ | 466/10000 [1:40:50<34:08:16, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.7680000000000004e-05, 'epoch': 0.18} 5%|▍ | 466/10000 [1:40:50<34:08:16, 12.89s/it] 5%|▍ | 467/10000 [1:41:03<34:08:50, 12.90s/it] {'loss': 0.008, 'learning_rate': 4.7675e-05, 'epoch': 0.18} 5%|▍ | 467/10000 [1:41:03<34:08:50, 12.90s/it] 5%|▍ | 468/10000 [1:41:16<34:10:40, 12.91s/it] {'loss': 0.0057, 'learning_rate': 4.767e-05, 'epoch': 0.18} 5%|▍ | 468/10000 [1:41:16<34:10:40, 12.91s/it] 5%|▍ | 469/10000 [1:41:28<34:08:40, 12.90s/it] {'loss': 0.0116, 'learning_rate': 4.7665000000000005e-05, 'epoch': 0.18} 5%|▍ | 469/10000 [1:41:28<34:08:40, 12.90s/it] 5%|▍ | 470/10000 [1:41:41<34:09:10, 12.90s/it] {'loss': 0.0072, 'learning_rate': 4.766000000000001e-05, 'epoch': 0.18} 5%|▍ | 470/10000 [1:41:41<34:09:10, 12.90s/it] 5%|▍ | 471/10000 [1:41:54<34:10:13, 12.91s/it] {'loss': 0.0465, 'learning_rate': 4.7655e-05, 'epoch': 0.18} 5%|▍ | 471/10000 [1:41:54<34:10:13, 12.91s/it] 5%|▍ | 472/10000 [1:42:07<34:09:38, 12.91s/it] {'loss': 0.0059, 'learning_rate': 4.765e-05, 'epoch': 0.18} 5%|▍ | 472/10000 [1:42:07<34:09:38, 12.91s/it] 5%|▍ | 473/10000 [1:42:20<34:12:19, 12.93s/it] {'loss': 0.0074, 'learning_rate': 4.7645e-05, 'epoch': 0.18} 5%|▍ | 473/10000 [1:42:20<34:12:19, 12.93s/it] 5%|▍ | 474/10000 [1:42:33<34:06:59, 12.89s/it] {'loss': 0.0108, 'learning_rate': 4.7640000000000005e-05, 'epoch': 0.18} 5%|▍ | 474/10000 [1:42:33<34:06:59, 12.89s/it] 5%|▍ | 475/10000 [1:42:46<34:09:58, 12.91s/it] {'loss': 0.0067, 'learning_rate': 4.7635e-05, 'epoch': 0.18} 5%|▍ | 475/10000 [1:42:46<34:09:58, 12.91s/it] 5%|▍ | 476/10000 [1:42:59<34:07:58, 12.90s/it] {'loss': 0.007, 'learning_rate': 4.763e-05, 'epoch': 0.18} 5%|▍ | 476/10000 [1:42:59<34:07:58, 12.90s/it] 5%|▍ | 
477/10000 [1:43:12<34:05:50, 12.89s/it] {'loss': 0.0057, 'learning_rate': 4.7625000000000006e-05, 'epoch': 0.18} 5%|▍ | 477/10000 [1:43:12<34:05:50, 12.89s/it] 5%|▍ | 478/10000 [1:43:25<34:05:47, 12.89s/it] {'loss': 0.0067, 'learning_rate': 4.762e-05, 'epoch': 0.18} 5%|▍ | 478/10000 [1:43:25<34:05:47, 12.89s/it] 5%|▍ | 479/10000 [1:43:37<34:05:24, 12.89s/it] {'loss': 0.0068, 'learning_rate': 4.7615000000000004e-05, 'epoch': 0.18} 5%|▍ | 479/10000 [1:43:38<34:05:24, 12.89s/it] 5%|▍ | 480/10000 [1:43:50<34:05:17, 12.89s/it] {'loss': 0.0101, 'learning_rate': 4.761000000000001e-05, 'epoch': 0.18} 5%|▍ | 480/10000 [1:43:50<34:05:17, 12.89s/it] 5%|▍ | 481/10000 [1:44:03<34:03:53, 12.88s/it] {'loss': 0.0082, 'learning_rate': 4.7605e-05, 'epoch': 0.18} 5%|▍ | 481/10000 [1:44:03<34:03:53, 12.88s/it] 5%|▍ | 482/10000 [1:44:16<34:02:17, 12.87s/it] {'loss': 0.0097, 'learning_rate': 4.76e-05, 'epoch': 0.18} 5%|▍ | 482/10000 [1:44:16<34:02:17, 12.87s/it] 5%|▍ | 483/10000 [1:44:29<34:05:59, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.7595e-05, 'epoch': 0.18} 5%|▍ | 483/10000 [1:44:29<34:05:59, 12.90s/it] 5%|▍ | 484/10000 [1:44:42<34:13:33, 12.95s/it] {'loss': 0.0072, 'learning_rate': 4.7590000000000003e-05, 'epoch': 0.18} 5%|▍ | 484/10000 [1:44:42<34:13:33, 12.95s/it] 5%|▍ | 485/10000 [1:44:55<34:16:34, 12.97s/it] {'loss': 0.007, 'learning_rate': 4.7585e-05, 'epoch': 0.18} 5%|▍ | 485/10000 [1:44:55<34:16:34, 12.97s/it] 5%|▍ | 486/10000 [1:45:08<34:17:18, 12.97s/it] {'loss': 0.0098, 'learning_rate': 4.758e-05, 'epoch': 0.18} 5%|▍ | 486/10000 [1:45:08<34:17:18, 12.97s/it] 5%|▍ | 487/10000 [1:45:21<34:18:51, 12.99s/it] {'loss': 0.009, 'learning_rate': 4.7575000000000004e-05, 'epoch': 0.18} 5%|▍ | 487/10000 [1:45:21<34:18:51, 12.99s/it] 5%|▍ | 488/10000 [1:45:34<34:17:00, 12.98s/it] {'loss': 0.0084, 'learning_rate': 4.757e-05, 'epoch': 0.18} 5%|▍ | 488/10000 [1:45:34<34:17:00, 12.98s/it] 5%|▍ | 489/10000 [1:45:47<34:16:51, 12.98s/it] {'loss': 0.0056, 'learning_rate': 4.7565e-05, 
'epoch': 0.18} 5%|▍ | 489/10000 [1:45:47<34:16:51, 12.98s/it] 5%|▍ | 490/10000 [1:46:00<34:16:06, 12.97s/it] {'loss': 0.0054, 'learning_rate': 4.7560000000000005e-05, 'epoch': 0.18} 5%|▍ | 490/10000 [1:46:00<34:16:06, 12.97s/it] 5%|▍ | 491/10000 [1:46:13<34:13:34, 12.96s/it] {'loss': 0.0096, 'learning_rate': 4.7555e-05, 'epoch': 0.19} 5%|▍ | 491/10000 [1:46:13<34:13:34, 12.96s/it] 5%|▍ | 492/10000 [1:46:26<34:13:57, 12.96s/it] {'loss': 0.0065, 'learning_rate': 4.755e-05, 'epoch': 0.19} 5%|▍ | 492/10000 [1:46:26<34:13:57, 12.96s/it] 5%|▍ | 493/10000 [1:46:39<34:11:52, 12.95s/it] {'loss': 0.0074, 'learning_rate': 4.7545e-05, 'epoch': 0.19} 5%|▍ | 493/10000 [1:46:39<34:11:52, 12.95s/it] 5%|▍ | 494/10000 [1:46:52<34:12:32, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.754e-05, 'epoch': 0.19} 5%|▍ | 494/10000 [1:46:52<34:12:32, 12.96s/it] 5%|▍ | 495/10000 [1:47:05<34:10:14, 12.94s/it] {'loss': 0.0159, 'learning_rate': 4.7535000000000005e-05, 'epoch': 0.19} 5%|▍ | 495/10000 [1:47:05<34:10:14, 12.94s/it] 5%|▍ | 496/10000 [1:47:18<34:10:41, 12.95s/it] {'loss': 0.0064, 'learning_rate': 4.753e-05, 'epoch': 0.19} 5%|▍ | 496/10000 [1:47:18<34:10:41, 12.95s/it] 5%|▍ | 497/10000 [1:47:31<34:10:38, 12.95s/it] {'loss': 0.0078, 'learning_rate': 4.7525e-05, 'epoch': 0.19} 5%|▍ | 497/10000 [1:47:31<34:10:38, 12.95s/it] 5%|▍ | 498/10000 [1:47:44<34:13:35, 12.97s/it] {'loss': 0.0073, 'learning_rate': 4.7520000000000006e-05, 'epoch': 0.19} 5%|▍ | 498/10000 [1:47:44<34:13:35, 12.97s/it] 5%|▍ | 499/10000 [1:47:57<34:14:58, 12.98s/it] {'loss': 0.0065, 'learning_rate': 4.7515e-05, 'epoch': 0.19} 5%|▍ | 499/10000 [1:47:57<34:14:58, 12.98s/it] 5%|▌ | 500/10000 [1:48:10<34:10:38, 12.95s/it] {'loss': 0.0066, 'learning_rate': 4.7510000000000004e-05, 'epoch': 0.19} 5%|▌ | 500/10000 [1:48:10<34:10:38, 12.95s/it] 5%|▌ | 501/10000 [1:48:22<34:09:37, 12.95s/it] {'loss': 0.009, 'learning_rate': 4.7505e-05, 'epoch': 0.19} 5%|▌ | 501/10000 [1:48:22<34:09:37, 12.95s/it] 5%|▌ | 502/10000 
[1:48:35<34:10:18, 12.95s/it] {'loss': 0.0101, 'learning_rate': 4.75e-05, 'epoch': 0.19} 5%|▌ | 502/10000 [1:48:35<34:10:18, 12.95s/it] 5%|▌ | 503/10000 [1:48:48<34:09:16, 12.95s/it] {'loss': 0.0077, 'learning_rate': 4.7495e-05, 'epoch': 0.19} 5%|▌ | 503/10000 [1:48:48<34:09:16, 12.95s/it] 5%|▌ | 504/10000 [1:49:01<34:07:17, 12.94s/it] {'loss': 0.0071, 'learning_rate': 4.749e-05, 'epoch': 0.19} 5%|▌ | 504/10000 [1:49:01<34:07:17, 12.94s/it] 5%|▌ | 505/10000 [1:49:14<34:04:25, 12.92s/it] {'loss': 0.0069, 'learning_rate': 4.7485000000000004e-05, 'epoch': 0.19} 5%|▌ | 505/10000 [1:49:14<34:04:25, 12.92s/it] 5%|▌ | 506/10000 [1:49:27<34:07:40, 12.94s/it] {'loss': 0.0098, 'learning_rate': 4.748e-05, 'epoch': 0.19} 5%|▌ | 506/10000 [1:49:27<34:07:40, 12.94s/it] 5%|▌ | 507/10000 [1:49:40<34:07:57, 12.94s/it] {'loss': 0.0104, 'learning_rate': 4.7475e-05, 'epoch': 0.19} 5%|▌ | 507/10000 [1:49:40<34:07:57, 12.94s/it] 5%|▌ | 508/10000 [1:49:53<34:11:20, 12.97s/it] {'loss': 0.008, 'learning_rate': 4.7470000000000005e-05, 'epoch': 0.19} 5%|▌ | 508/10000 [1:49:53<34:11:20, 12.97s/it] 5%|▌ | 509/10000 [1:50:06<34:10:06, 12.96s/it] {'loss': 0.0078, 'learning_rate': 4.746500000000001e-05, 'epoch': 0.19} 5%|▌ | 509/10000 [1:50:06<34:10:06, 12.96s/it] 5%|▌ | 510/10000 [1:50:19<34:13:27, 12.98s/it] {'loss': 0.0064, 'learning_rate': 4.746e-05, 'epoch': 0.19} 5%|▌ | 510/10000 [1:50:19<34:13:27, 12.98s/it] 5%|▌ | 511/10000 [1:50:32<34:10:19, 12.96s/it] {'loss': 0.0109, 'learning_rate': 4.7455000000000006e-05, 'epoch': 0.19} 5%|▌ | 511/10000 [1:50:32<34:10:19, 12.96s/it] 5%|▌ | 512/10000 [1:50:45<34:03:07, 12.92s/it] {'loss': 0.0134, 'learning_rate': 4.745e-05, 'epoch': 0.19} 5%|▌ | 512/10000 [1:50:45<34:03:07, 12.92s/it] 5%|▌ | 513/10000 [1:50:58<34:03:42, 12.93s/it] {'loss': 0.0068, 'learning_rate': 4.7445e-05, 'epoch': 0.19} 5%|▌ | 513/10000 [1:50:58<34:03:42, 12.93s/it] 5%|▌ | 514/10000 [1:51:11<34:01:39, 12.91s/it] {'loss': 0.0065, 'learning_rate': 4.744e-05, 'epoch': 0.19} 5%|▌ | 
514/10000 [1:51:11<34:01:39, 12.91s/it] 5%|▌ | 515/10000 [1:51:24<33:59:43, 12.90s/it] {'loss': 0.0078, 'learning_rate': 4.7435e-05, 'epoch': 0.19} 5%|▌ | 515/10000 [1:51:24<33:59:43, 12.90s/it] 5%|▌ | 516/10000 [1:51:36<33:57:10, 12.89s/it] {'loss': 0.0068, 'learning_rate': 4.7430000000000005e-05, 'epoch': 0.19} 5%|▌ | 516/10000 [1:51:36<33:57:10, 12.89s/it] 5%|▌ | 517/10000 [1:51:49<33:55:10, 12.88s/it] {'loss': 0.0058, 'learning_rate': 4.7425e-05, 'epoch': 0.19} 5%|▌ | 517/10000 [1:51:49<33:55:10, 12.88s/it] 5%|▌ | 518/10000 [1:52:02<33:55:21, 12.88s/it] {'loss': 0.008, 'learning_rate': 4.742e-05, 'epoch': 0.2} 5%|▌ | 518/10000 [1:52:02<33:55:21, 12.88s/it] 5%|▌ | 519/10000 [1:52:15<33:53:59, 12.87s/it] {'loss': 0.006, 'learning_rate': 4.7415000000000006e-05, 'epoch': 0.2} 5%|▌ | 519/10000 [1:52:15<33:53:59, 12.87s/it] 5%|▌ | 520/10000 [1:52:28<33:58:25, 12.90s/it] {'loss': 0.0071, 'learning_rate': 4.741e-05, 'epoch': 0.2} 5%|▌ | 520/10000 [1:52:28<33:58:25, 12.90s/it] 5%|▌ | 521/10000 [1:52:41<33:58:13, 12.90s/it] {'loss': 0.0069, 'learning_rate': 4.7405000000000004e-05, 'epoch': 0.2} 5%|▌ | 521/10000 [1:52:41<33:58:13, 12.90s/it] 5%|▌ | 522/10000 [1:52:54<33:56:47, 12.89s/it] {'loss': 0.0065, 'learning_rate': 4.74e-05, 'epoch': 0.2} 5%|▌ | 522/10000 [1:52:54<33:56:47, 12.89s/it] 5%|▌ | 523/10000 [1:53:07<33:56:03, 12.89s/it] {'loss': 0.0098, 'learning_rate': 4.7395e-05, 'epoch': 0.2} 5%|▌ | 523/10000 [1:53:07<33:56:03, 12.89s/it] 5%|▌ | 524/10000 [1:53:19<33:52:56, 12.87s/it] {'loss': 0.0072, 'learning_rate': 4.739e-05, 'epoch': 0.2} 5%|▌ | 524/10000 [1:53:19<33:52:56, 12.87s/it] 5%|▌ | 525/10000 [1:53:32<33:53:02, 12.87s/it] {'loss': 0.0059, 'learning_rate': 4.7385e-05, 'epoch': 0.2} 5%|▌ | 525/10000 [1:53:32<33:53:02, 12.87s/it] 5%|▌ | 526/10000 [1:53:45<33:52:47, 12.87s/it] {'loss': 0.0059, 'learning_rate': 4.7380000000000004e-05, 'epoch': 0.2} 5%|▌ | 526/10000 [1:53:45<33:52:47, 12.87s/it] 5%|▌ | 527/10000 [1:53:58<33:56:04, 12.90s/it] {'loss': 0.0081, 
'learning_rate': 4.7375e-05, 'epoch': 0.2} 5%|▌ | 527/10000 [1:53:58<33:56:04, 12.90s/it] 5%|▌ | 528/10000 [1:54:11<33:56:02, 12.90s/it] {'loss': 0.0089, 'learning_rate': 4.737e-05, 'epoch': 0.2} 5%|▌ | 528/10000 [1:54:11<33:56:02, 12.90s/it] 5%|▌ | 529/10000 [1:54:24<33:53:07, 12.88s/it] {'loss': 0.0085, 'learning_rate': 4.7365000000000005e-05, 'epoch': 0.2} 5%|▌ | 529/10000 [1:54:24<33:53:07, 12.88s/it] 5%|▌ | 530/10000 [1:54:37<33:51:56, 12.87s/it] {'loss': 0.0103, 'learning_rate': 4.736000000000001e-05, 'epoch': 0.2} 5%|▌ | 530/10000 [1:54:37<33:51:56, 12.87s/it] 5%|▌ | 531/10000 [1:54:50<33:51:16, 12.87s/it] {'loss': 0.0066, 'learning_rate': 4.7355e-05, 'epoch': 0.2} 5%|▌ | 531/10000 [1:54:50<33:51:16, 12.87s/it] 5%|▌ | 532/10000 [1:55:02<33:52:25, 12.88s/it] {'loss': 0.0072, 'learning_rate': 4.735e-05, 'epoch': 0.2} 5%|▌ | 532/10000 [1:55:03<33:52:25, 12.88s/it] 5%|▌ | 533/10000 [1:55:15<33:53:16, 12.89s/it] {'loss': 0.0114, 'learning_rate': 4.7345e-05, 'epoch': 0.2} 5%|▌ | 533/10000 [1:55:15<33:53:16, 12.89s/it] 5%|▌ | 534/10000 [1:55:28<33:51:06, 12.87s/it] {'loss': 0.0064, 'learning_rate': 4.7340000000000004e-05, 'epoch': 0.2} 5%|▌ | 534/10000 [1:55:28<33:51:06, 12.87s/it] 5%|▌ | 535/10000 [1:55:41<33:51:18, 12.88s/it] {'loss': 0.0065, 'learning_rate': 4.7335e-05, 'epoch': 0.2} 5%|▌ | 535/10000 [1:55:41<33:51:18, 12.88s/it] 5%|▌ | 536/10000 [1:55:54<33:49:00, 12.86s/it] {'loss': 0.0073, 'learning_rate': 4.733e-05, 'epoch': 0.2} 5%|▌ | 536/10000 [1:55:54<33:49:00, 12.86s/it] 5%|▌ | 537/10000 [1:56:07<33:52:40, 12.89s/it] {'loss': 0.0094, 'learning_rate': 4.7325000000000005e-05, 'epoch': 0.2} 5%|▌ | 537/10000 [1:56:07<33:52:40, 12.89s/it] 5%|▌ | 538/10000 [1:56:20<33:53:55, 12.90s/it] {'loss': 0.0066, 'learning_rate': 4.732e-05, 'epoch': 0.2} 5%|▌ | 538/10000 [1:56:20<33:53:55, 12.90s/it] 5%|▌ | 539/10000 [1:56:33<33:52:20, 12.89s/it] {'loss': 0.0064, 'learning_rate': 4.7315000000000004e-05, 'epoch': 0.2} 5%|▌ | 539/10000 [1:56:33<33:52:20, 12.89s/it] 5%|▌ | 
540/10000 [1:56:46<33:53:47, 12.90s/it] {'loss': 0.0104, 'learning_rate': 4.7310000000000006e-05, 'epoch': 0.2} 5%|▌ | 540/10000 [1:56:46<33:53:47, 12.90s/it] 5%|▌ | 541/10000 [1:56:59<33:56:39, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.7305e-05, 'epoch': 0.2} 5%|▌ | 541/10000 [1:56:59<33:56:39, 12.92s/it] 5%|▌ | 542/10000 [1:57:11<33:56:16, 12.92s/it] {'loss': 0.0062, 'learning_rate': 4.73e-05, 'epoch': 0.2} 5%|▌ | 542/10000 [1:57:12<33:56:16, 12.92s/it] 5%|▌ | 543/10000 [1:57:24<33:56:35, 12.92s/it] {'loss': 0.0055, 'learning_rate': 4.7295e-05, 'epoch': 0.2} 5%|▌ | 543/10000 [1:57:24<33:56:35, 12.92s/it] 5%|▌ | 544/10000 [1:57:37<33:53:25, 12.90s/it] {'loss': 0.0064, 'learning_rate': 4.729e-05, 'epoch': 0.2} 5%|▌ | 544/10000 [1:57:37<33:53:25, 12.90s/it] 5%|▌ | 545/10000 [1:57:50<33:55:17, 12.92s/it] {'loss': 0.0055, 'learning_rate': 4.7285e-05, 'epoch': 0.21} 5%|▌ | 545/10000 [1:57:50<33:55:17, 12.92s/it] 5%|▌ | 546/10000 [1:58:03<33:55:32, 12.92s/it] {'loss': 0.0073, 'learning_rate': 4.728e-05, 'epoch': 0.21} 5%|▌ | 546/10000 [1:58:03<33:55:32, 12.92s/it] 5%|▌ | 547/10000 [1:58:16<33:51:03, 12.89s/it] {'loss': 0.0074, 'learning_rate': 4.7275000000000004e-05, 'epoch': 0.21} 5%|▌ | 547/10000 [1:58:16<33:51:03, 12.89s/it] 5%|▌ | 548/10000 [1:58:29<33:50:28, 12.89s/it] {'loss': 0.0077, 'learning_rate': 4.7270000000000007e-05, 'epoch': 0.21} 5%|▌ | 548/10000 [1:58:29<33:50:28, 12.89s/it] 5%|▌ | 549/10000 [1:58:42<33:48:56, 12.88s/it] {'loss': 0.0088, 'learning_rate': 4.7265e-05, 'epoch': 0.21} 5%|▌ | 549/10000 [1:58:42<33:48:56, 12.88s/it] 6%|▌ | 550/10000 [1:58:55<33:52:12, 12.90s/it] {'loss': 0.0064, 'learning_rate': 4.7260000000000005e-05, 'epoch': 0.21} 6%|▌ | 550/10000 [1:58:55<33:52:12, 12.90s/it] 6%|▌ | 551/10000 [1:59:08<33:53:20, 12.91s/it] {'loss': 0.0105, 'learning_rate': 4.725500000000001e-05, 'epoch': 0.21} 6%|▌ | 551/10000 [1:59:08<33:53:20, 12.91s/it] 6%|▌ | 552/10000 [1:59:21<33:56:47, 12.93s/it] {'loss': 0.0097, 'learning_rate': 
4.7249999999999997e-05, 'epoch': 0.21} 6%|▌ | 552/10000 [1:59:21<33:56:47, 12.93s/it] 6%|▌ | 553/10000 [1:59:34<34:00:01, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.7245e-05, 'epoch': 0.21} 6%|▌ | 553/10000 [1:59:34<34:00:01, 12.96s/it] 6%|▌ | 554/10000 [1:59:47<33:59:46, 12.96s/it] {'loss': 0.052, 'learning_rate': 4.724e-05, 'epoch': 0.21} 6%|▌ | 554/10000 [1:59:47<33:59:46, 12.96s/it] 6%|▌ | 555/10000 [2:00:00<34:01:03, 12.97s/it] {'loss': 0.006, 'learning_rate': 4.7235000000000004e-05, 'epoch': 0.21} 6%|▌ | 555/10000 [2:00:00<34:01:03, 12.97s/it] 6%|▌ | 556/10000 [2:00:12<33:56:36, 12.94s/it] {'loss': 0.0075, 'learning_rate': 4.723e-05, 'epoch': 0.21} 6%|▌ | 556/10000 [2:00:12<33:56:36, 12.94s/it] 6%|▌ | 557/10000 [2:00:25<33:58:07, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.7225e-05, 'epoch': 0.21} 6%|▌ | 557/10000 [2:00:25<33:58:07, 12.95s/it] 6%|▌ | 558/10000 [2:00:38<33:57:06, 12.94s/it] {'loss': 0.0051, 'learning_rate': 4.7220000000000005e-05, 'epoch': 0.21} 6%|▌ | 558/10000 [2:00:38<33:57:06, 12.94s/it] 6%|▌ | 559/10000 [2:00:51<33:55:16, 12.93s/it] {'loss': 0.0077, 'learning_rate': 4.7215e-05, 'epoch': 0.21} 6%|▌ | 559/10000 [2:00:51<33:55:16, 12.93s/it] 6%|▌ | 560/10000 [2:01:04<33:55:03, 12.93s/it] {'loss': 0.0117, 'learning_rate': 4.7210000000000004e-05, 'epoch': 0.21} 6%|▌ | 560/10000 [2:01:04<33:55:03, 12.93s/it] 6%|▌ | 561/10000 [2:01:17<33:59:41, 12.97s/it] {'loss': 0.0068, 'learning_rate': 4.7205000000000006e-05, 'epoch': 0.21} 6%|▌ | 561/10000 [2:01:17<33:59:41, 12.97s/it] 6%|▌ | 562/10000 [2:01:30<33:59:35, 12.97s/it] {'loss': 0.0124, 'learning_rate': 4.72e-05, 'epoch': 0.21} 6%|▌ | 562/10000 [2:01:30<33:59:35, 12.97s/it] 6%|▌ | 563/10000 [2:01:43<33:58:33, 12.96s/it] {'loss': 0.0072, 'learning_rate': 4.7195e-05, 'epoch': 0.21} 6%|▌ | 563/10000 [2:01:43<33:58:33, 12.96s/it] 6%|▌ | 564/10000 [2:01:56<33:57:08, 12.95s/it] {'loss': 0.0085, 'learning_rate': 4.719e-05, 'epoch': 0.21} 6%|▌ | 564/10000 [2:01:56<33:57:08, 12.95s/it] 6%|▌ | 
565/10000 [2:02:09<33:53:48, 12.93s/it] {'loss': 0.0069, 'learning_rate': 4.7185e-05, 'epoch': 0.21} 6%|▌ | 565/10000 [2:02:09<33:53:48, 12.93s/it] 6%|▌ | 566/10000 [2:02:22<33:55:47, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.718e-05, 'epoch': 0.21} 6%|▌ | 566/10000 [2:02:22<33:55:47, 12.95s/it] 6%|▌ | 567/10000 [2:02:35<33:57:32, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.7175e-05, 'epoch': 0.21} 6%|▌ | 567/10000 [2:02:35<33:57:32, 12.96s/it] 6%|▌ | 568/10000 [2:02:48<33:55:03, 12.95s/it] {'loss': 0.007, 'learning_rate': 4.7170000000000004e-05, 'epoch': 0.21} 6%|▌ | 568/10000 [2:02:48<33:55:03, 12.95s/it] 6%|▌ | 569/10000 [2:03:01<33:54:16, 12.94s/it] {'loss': 0.0091, 'learning_rate': 4.716500000000001e-05, 'epoch': 0.21} 6%|▌ | 569/10000 [2:03:01<33:54:16, 12.94s/it] 6%|▌ | 570/10000 [2:03:14<33:56:39, 12.96s/it] {'loss': 0.0076, 'learning_rate': 4.716e-05, 'epoch': 0.21} 6%|▌ | 570/10000 [2:03:14<33:56:39, 12.96s/it] 6%|▌ | 571/10000 [2:03:27<33:58:45, 12.97s/it] {'loss': 0.0066, 'learning_rate': 4.7155000000000005e-05, 'epoch': 0.22} 6%|▌ | 571/10000 [2:03:27<33:58:45, 12.97s/it] 6%|▌ | 572/10000 [2:03:40<34:00:26, 12.99s/it] {'loss': 0.0071, 'learning_rate': 4.715e-05, 'epoch': 0.22} 6%|▌ | 572/10000 [2:03:40<34:00:26, 12.99s/it] 6%|▌ | 573/10000 [2:03:53<33:57:27, 12.97s/it] {'loss': 0.0088, 'learning_rate': 4.7145000000000003e-05, 'epoch': 0.22} 6%|▌ | 573/10000 [2:03:53<33:57:27, 12.97s/it] 6%|▌ | 574/10000 [2:04:06<33:54:50, 12.95s/it] {'loss': 0.0079, 'learning_rate': 4.714e-05, 'epoch': 0.22} 6%|▌ | 574/10000 [2:04:06<33:54:50, 12.95s/it] 6%|▌ | 575/10000 [2:04:19<33:56:14, 12.96s/it] {'loss': 0.0094, 'learning_rate': 4.7135e-05, 'epoch': 0.22} 6%|▌ | 575/10000 [2:04:19<33:56:14, 12.96s/it] 6%|▌ | 576/10000 [2:04:32<33:57:44, 12.97s/it] {'loss': 0.0062, 'learning_rate': 4.7130000000000004e-05, 'epoch': 0.22} 6%|▌ | 576/10000 [2:04:32<33:57:44, 12.97s/it] 6%|▌ | 577/10000 [2:04:45<33:58:02, 12.98s/it] {'loss': 0.0057, 'learning_rate': 4.7125e-05, 
'epoch': 0.22} 6%|▌ | 577/10000 [2:04:45<33:58:02, 12.98s/it] 6%|▌ | 578/10000 [2:04:58<33:53:32, 12.95s/it] {'loss': 0.0067, 'learning_rate': 4.712e-05, 'epoch': 0.22} 6%|▌ | 578/10000 [2:04:58<33:53:32, 12.95s/it] 6%|▌ | 579/10000 [2:05:10<33:49:01, 12.92s/it] {'loss': 0.0048, 'learning_rate': 4.7115000000000005e-05, 'epoch': 0.22} 6%|▌ | 579/10000 [2:05:10<33:49:01, 12.92s/it] 6%|▌ | 580/10000 [2:05:23<33:56:01, 12.97s/it] {'loss': 0.007, 'learning_rate': 4.711e-05, 'epoch': 0.22} 6%|▌ | 580/10000 [2:05:23<33:56:01, 12.97s/it] 6%|▌ | 581/10000 [2:05:36<33:55:47, 12.97s/it] {'loss': 0.0087, 'learning_rate': 4.7105000000000004e-05, 'epoch': 0.22} 6%|▌ | 581/10000 [2:05:36<33:55:47, 12.97s/it] 6%|▌ | 582/10000 [2:05:49<33:55:42, 12.97s/it] {'loss': 0.0073, 'learning_rate': 4.71e-05, 'epoch': 0.22} 6%|▌ | 582/10000 [2:05:49<33:55:42, 12.97s/it] 6%|▌ | 583/10000 [2:06:02<34:00:01, 13.00s/it] {'loss': 0.0089, 'learning_rate': 4.7095e-05, 'epoch': 0.22} 6%|▌ | 583/10000 [2:06:02<34:00:01, 13.00s/it] 6%|▌ | 584/10000 [2:06:15<33:59:53, 13.00s/it] {'loss': 0.0066, 'learning_rate': 4.709e-05, 'epoch': 0.22} 6%|▌ | 584/10000 [2:06:15<33:59:53, 13.00s/it] 6%|▌ | 585/10000 [2:06:28<33:56:55, 12.98s/it] {'loss': 0.0082, 'learning_rate': 4.7085e-05, 'epoch': 0.22} 6%|▌ | 585/10000 [2:06:28<33:56:55, 12.98s/it] 6%|▌ | 586/10000 [2:06:41<33:55:34, 12.97s/it] {'loss': 0.0069, 'learning_rate': 4.708e-05, 'epoch': 0.22} 6%|▌ | 586/10000 [2:06:41<33:55:34, 12.97s/it] 6%|▌ | 587/10000 [2:06:54<33:53:50, 12.96s/it] {'loss': 0.0072, 'learning_rate': 4.7075e-05, 'epoch': 0.22} 6%|▌ | 587/10000 [2:06:54<33:53:50, 12.96s/it] 6%|▌ | 588/10000 [2:07:07<33:57:22, 12.99s/it] {'loss': 0.0065, 'learning_rate': 4.707e-05, 'epoch': 0.22} 6%|▌ | 588/10000 [2:07:07<33:57:22, 12.99s/it] 6%|▌ | 589/10000 [2:07:20<33:57:48, 12.99s/it] {'loss': 0.0069, 'learning_rate': 4.7065000000000004e-05, 'epoch': 0.22} 6%|▌ | 589/10000 [2:07:20<33:57:48, 12.99s/it] 6%|▌ | 590/10000 [2:07:33<33:55:42, 12.98s/it] 
{'loss': 0.0064, 'learning_rate': 4.706000000000001e-05, 'epoch': 0.22} 6%|▌ | 590/10000 [2:07:33<33:55:42, 12.98s/it] 6%|▌ | 591/10000 [2:07:46<33:52:52, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.7055e-05, 'epoch': 0.22} 6%|▌ | 591/10000 [2:07:46<33:52:52, 12.96s/it] 6%|▌ | 592/10000 [2:07:59<33:53:59, 12.97s/it] {'loss': 0.0069, 'learning_rate': 4.705e-05, 'epoch': 0.22} 6%|▌ | 592/10000 [2:07:59<33:53:59, 12.97s/it] 6%|▌ | 593/10000 [2:08:12<33:52:38, 12.96s/it] {'loss': 0.008, 'learning_rate': 4.7045e-05, 'epoch': 0.22} 6%|▌ | 593/10000 [2:08:12<33:52:38, 12.96s/it] 6%|▌ | 594/10000 [2:08:25<33:50:42, 12.95s/it] {'loss': 0.0058, 'learning_rate': 4.7040000000000004e-05, 'epoch': 0.22} 6%|▌ | 594/10000 [2:08:25<33:50:42, 12.95s/it] 6%|▌ | 595/10000 [2:08:38<33:53:47, 12.97s/it] {'loss': 0.0074, 'learning_rate': 4.7035e-05, 'epoch': 0.22} 6%|▌ | 595/10000 [2:08:38<33:53:47, 12.97s/it] 6%|▌ | 596/10000 [2:08:51<33:54:35, 12.98s/it] {'loss': 0.0059, 'learning_rate': 4.703e-05, 'epoch': 0.22} 6%|▌ | 596/10000 [2:08:51<33:54:35, 12.98s/it] 6%|▌ | 597/10000 [2:09:04<33:49:59, 12.95s/it] {'loss': 0.007, 'learning_rate': 4.7025000000000005e-05, 'epoch': 0.22} 6%|▌ | 597/10000 [2:09:04<33:49:59, 12.95s/it] 6%|▌ | 598/10000 [2:09:17<33:46:52, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.702e-05, 'epoch': 0.23} 6%|▌ | 598/10000 [2:09:17<33:46:52, 12.93s/it] 6%|▌ | 599/10000 [2:09:30<33:48:02, 12.94s/it] {'loss': 0.0078, 'learning_rate': 4.7015e-05, 'epoch': 0.23} 6%|▌ | 599/10000 [2:09:30<33:48:02, 12.94s/it] 6%|▌ | 600/10000 [2:09:43<33:47:21, 12.94s/it] {'loss': 0.0054, 'learning_rate': 4.7010000000000006e-05, 'epoch': 0.23} 6%|▌ | 600/10000 [2:09:43<33:47:21, 12.94s/it] 6%|▌ | 601/10000 [2:09:56<33:48:22, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.7005e-05, 'epoch': 0.23} 6%|▌ | 601/10000 [2:09:56<33:48:22, 12.95s/it] 6%|▌ | 602/10000 [2:10:09<33:53:44, 12.98s/it] {'loss': 0.0064, 'learning_rate': 4.7e-05, 'epoch': 0.23} 6%|▌ | 602/10000 [2:10:09<33:53:44, 
12.98s/it] 6%|▌ | 603/10000 [2:10:22<33:53:20, 12.98s/it] {'loss': 0.005, 'learning_rate': 4.6995e-05, 'epoch': 0.23} 6%|▌ | 603/10000 [2:10:22<33:53:20, 12.98s/it] 6%|▌ | 604/10000 [2:10:35<33:53:53, 12.99s/it] {'loss': 0.0073, 'learning_rate': 4.699e-05, 'epoch': 0.23} 6%|▌ | 604/10000 [2:10:35<33:53:53, 12.99s/it] 6%|▌ | 605/10000 [2:10:48<33:52:58, 12.98s/it] {'loss': 0.0061, 'learning_rate': 4.6985e-05, 'epoch': 0.23} 6%|▌ | 605/10000 [2:10:48<33:52:58, 12.98s/it] 6%|▌ | 606/10000 [2:11:01<33:51:41, 12.98s/it] {'loss': 0.0062, 'learning_rate': 4.698e-05, 'epoch': 0.23} 6%|▌ | 606/10000 [2:11:01<33:51:41, 12.98s/it] 6%|▌ | 607/10000 [2:11:14<33:48:36, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.6975000000000003e-05, 'epoch': 0.23} 6%|▌ | 607/10000 [2:11:14<33:48:36, 12.96s/it] 6%|▌ | 608/10000 [2:11:27<33:45:42, 12.94s/it] {'loss': 0.0057, 'learning_rate': 4.6970000000000006e-05, 'epoch': 0.23} 6%|▌ | 608/10000 [2:11:27<33:45:42, 12.94s/it] 6%|▌ | 609/10000 [2:11:40<33:48:19, 12.96s/it] {'loss': 0.0082, 'learning_rate': 4.6965e-05, 'epoch': 0.23} 6%|▌ | 609/10000 [2:11:40<33:48:19, 12.96s/it] 6%|▌ | 610/10000 [2:11:53<33:53:20, 12.99s/it] {'loss': 0.0062, 'learning_rate': 4.6960000000000004e-05, 'epoch': 0.23} 6%|▌ | 610/10000 [2:11:53<33:53:20, 12.99s/it] 6%|▌ | 611/10000 [2:12:06<33:55:07, 13.01s/it] {'loss': 0.0057, 'learning_rate': 4.695500000000001e-05, 'epoch': 0.23} 6%|▌ | 611/10000 [2:12:06<33:55:07, 13.01s/it] 6%|▌ | 612/10000 [2:12:19<33:52:23, 12.99s/it] {'loss': 0.0062, 'learning_rate': 4.695e-05, 'epoch': 0.23} 6%|▌ | 612/10000 [2:12:19<33:52:23, 12.99s/it] 6%|▌ | 613/10000 [2:12:32<33:49:13, 12.97s/it] {'loss': 0.0069, 'learning_rate': 4.6945e-05, 'epoch': 0.23} 6%|▌ | 613/10000 [2:12:32<33:49:13, 12.97s/it] 6%|▌ | 614/10000 [2:12:44<33:46:42, 12.96s/it] {'loss': 0.0063, 'learning_rate': 4.694e-05, 'epoch': 0.23} 6%|▌ | 614/10000 [2:12:44<33:46:42, 12.96s/it] 6%|▌ | 615/10000 [2:12:57<33:44:27, 12.94s/it] {'loss': 0.0061, 'learning_rate': 
4.6935000000000004e-05, 'epoch': 0.23} 6%|▌ | 615/10000 [2:12:57<33:44:27, 12.94s/it] 6%|▌ | 616/10000 [2:13:10<33:43:55, 12.94s/it] {'loss': 0.0099, 'learning_rate': 4.693e-05, 'epoch': 0.23} 6%|▌ | 616/10000 [2:13:10<33:43:55, 12.94s/it] 6%|▌ | 617/10000 [2:13:23<33:44:36, 12.95s/it] {'loss': 0.0071, 'learning_rate': 4.6925e-05, 'epoch': 0.23} 6%|▌ | 617/10000 [2:13:23<33:44:36, 12.95s/it] 6%|▌ | 618/10000 [2:13:36<33:40:09, 12.92s/it] {'loss': 0.0073, 'learning_rate': 4.6920000000000005e-05, 'epoch': 0.23} 6%|▌ | 618/10000 [2:13:36<33:40:09, 12.92s/it] 6%|▌ | 619/10000 [2:13:49<33:43:31, 12.94s/it] {'loss': 0.0059, 'learning_rate': 4.6915e-05, 'epoch': 0.23} 6%|▌ | 619/10000 [2:13:49<33:43:31, 12.94s/it] 6%|▌ | 620/10000 [2:14:02<33:44:09, 12.95s/it] {'loss': 0.0107, 'learning_rate': 4.691e-05, 'epoch': 0.23} 6%|▌ | 620/10000 [2:14:02<33:44:09, 12.95s/it] 6%|▌ | 621/10000 [2:14:15<33:45:19, 12.96s/it] {'loss': 0.0076, 'learning_rate': 4.6905000000000006e-05, 'epoch': 0.23} 6%|▌ | 621/10000 [2:14:15<33:45:19, 12.96s/it] 6%|▌ | 622/10000 [2:14:28<33:47:27, 12.97s/it] {'loss': 0.0046, 'learning_rate': 4.69e-05, 'epoch': 0.23} 6%|▌ | 622/10000 [2:14:28<33:47:27, 12.97s/it] 6%|▌ | 623/10000 [2:14:41<33:43:52, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.6895e-05, 'epoch': 0.23} 6%|▌ | 623/10000 [2:14:41<33:43:52, 12.95s/it] 6%|▌ | 624/10000 [2:14:54<33:48:59, 12.98s/it] {'loss': 0.0072, 'learning_rate': 4.689e-05, 'epoch': 0.24} 6%|▌ | 624/10000 [2:14:54<33:48:59, 12.98s/it] 6%|▋ | 625/10000 [2:15:07<33:47:39, 12.98s/it] {'loss': 0.0054, 'learning_rate': 4.6885e-05, 'epoch': 0.24} 6%|▋ | 625/10000 [2:15:07<33:47:39, 12.98s/it] 6%|▋ | 626/10000 [2:15:20<33:47:45, 12.98s/it] {'loss': 0.0064, 'learning_rate': 4.688e-05, 'epoch': 0.24} 6%|▋ | 626/10000 [2:15:20<33:47:45, 12.98s/it] 6%|▋ | 627/10000 [2:15:33<33:46:50, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.6875e-05, 'epoch': 0.24} 6%|▋ | 627/10000 [2:15:33<33:46:50, 12.97s/it] 6%|▋ | 628/10000 [2:15:46<33:45:29, 
12.97s/it] {'loss': 0.0074, 'learning_rate': 4.6870000000000004e-05, 'epoch': 0.24} 6%|▋ | 628/10000 [2:15:46<33:45:29, 12.97s/it] 6%|▋ | 629/10000 [2:15:59<33:43:54, 12.96s/it] {'loss': 0.0065, 'learning_rate': 4.6865000000000006e-05, 'epoch': 0.24} 6%|▋ | 629/10000 [2:15:59<33:43:54, 12.96s/it] 6%|▋ | 630/10000 [2:16:12<33:44:29, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.686e-05, 'epoch': 0.24} 6%|▋ | 630/10000 [2:16:12<33:44:29, 12.96s/it] 6%|▋ | 631/10000 [2:16:25<33:44:39, 12.97s/it] {'loss': 0.0066, 'learning_rate': 4.6855000000000005e-05, 'epoch': 0.24} 6%|▋ | 631/10000 [2:16:25<33:44:39, 12.97s/it] 6%|▋ | 632/10000 [2:16:38<33:46:02, 12.98s/it] {'loss': 0.0074, 'learning_rate': 4.685000000000001e-05, 'epoch': 0.24} 6%|▋ | 632/10000 [2:16:38<33:46:02, 12.98s/it] 6%|▋ | 633/10000 [2:16:51<33:43:48, 12.96s/it] {'loss': 0.0067, 'learning_rate': 4.6845e-05, 'epoch': 0.24} 6%|▋ | 633/10000 [2:16:51<33:43:48, 12.96s/it] 6%|▋ | 634/10000 [2:17:04<33:41:30, 12.95s/it] {'loss': 0.0094, 'learning_rate': 4.684e-05, 'epoch': 0.24} 6%|▋ | 634/10000 [2:17:04<33:41:30, 12.95s/it] 6%|▋ | 635/10000 [2:17:17<33:42:06, 12.96s/it] {'loss': 0.005, 'learning_rate': 4.6835e-05, 'epoch': 0.24} 6%|▋ | 635/10000 [2:17:17<33:42:06, 12.96s/it] 6%|▋ | 636/10000 [2:17:30<33:41:48, 12.95s/it] {'loss': 0.0062, 'learning_rate': 4.6830000000000004e-05, 'epoch': 0.24} 6%|▋ | 636/10000 [2:17:30<33:41:48, 12.95s/it] 6%|▋ | 637/10000 [2:17:43<33:47:45, 12.99s/it] {'loss': 0.0066, 'learning_rate': 4.6825e-05, 'epoch': 0.24} 6%|▋ | 637/10000 [2:17:43<33:47:45, 12.99s/it] 6%|▋ | 638/10000 [2:17:56<33:45:37, 12.98s/it] {'loss': 0.0058, 'learning_rate': 4.682e-05, 'epoch': 0.24} 6%|▋ | 638/10000 [2:17:56<33:45:37, 12.98s/it] 6%|▋ | 639/10000 [2:18:08<33:41:55, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.6815000000000005e-05, 'epoch': 0.24} 6%|▋ | 639/10000 [2:18:08<33:41:55, 12.96s/it] 6%|▋ | 640/10000 [2:18:21<33:42:14, 12.96s/it] {'loss': 0.009, 'learning_rate': 4.681e-05, 'epoch': 0.24} 
6%|▋ | 640/10000 [2:18:21<33:42:14, 12.96s/it] 6%|▋ | 641/10000 [2:18:34<33:41:40, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.6805e-05, 'epoch': 0.24} 6%|▋ | 641/10000 [2:18:34<33:41:40, 12.96s/it] 6%|▋ | 642/10000 [2:18:47<33:40:44, 12.96s/it] {'loss': 0.0078, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.24} 6%|▋ | 642/10000 [2:18:47<33:40:44, 12.96s/it] 6%|▋ | 643/10000 [2:19:00<33:39:56, 12.95s/it] {'loss': 0.0078, 'learning_rate': 4.6795e-05, 'epoch': 0.24} 6%|▋ | 643/10000 [2:19:00<33:39:56, 12.95s/it] 6%|▋ | 644/10000 [2:19:13<33:39:41, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.679e-05, 'epoch': 0.24} 6%|▋ | 644/10000 [2:19:13<33:39:41, 12.95s/it] 6%|▋ | 645/10000 [2:19:26<33:42:59, 12.97s/it] {'loss': 0.0069, 'learning_rate': 4.6785e-05, 'epoch': 0.24} 6%|▋ | 645/10000 [2:19:26<33:42:59, 12.97s/it] 6%|▋ | 646/10000 [2:19:39<33:35:54, 12.93s/it] {'loss': 0.0273, 'learning_rate': 4.678e-05, 'epoch': 0.24} 6%|▋ | 646/10000 [2:19:39<33:35:54, 12.93s/it] 6%|▋ | 647/10000 [2:19:52<33:35:56, 12.93s/it] {'loss': 0.0042, 'learning_rate': 4.6775000000000005e-05, 'epoch': 0.24} 6%|▋ | 647/10000 [2:19:52<33:35:56, 12.93s/it] 6%|▋ | 648/10000 [2:20:05<33:37:55, 12.95s/it] {'loss': 0.0068, 'learning_rate': 4.677e-05, 'epoch': 0.24} 6%|▋ | 648/10000 [2:20:05<33:37:55, 12.95s/it] 6%|▋ | 649/10000 [2:20:18<33:32:00, 12.91s/it] {'loss': 0.0054, 'learning_rate': 4.6765000000000004e-05, 'epoch': 0.24} 6%|▋ | 649/10000 [2:20:18<33:32:00, 12.91s/it] 6%|▋ | 650/10000 [2:20:31<33:26:50, 12.88s/it] {'loss': 0.0065, 'learning_rate': 4.6760000000000006e-05, 'epoch': 0.24} 6%|▋ | 650/10000 [2:20:31<33:26:50, 12.88s/it] 7%|▋ | 651/10000 [2:20:44<33:28:36, 12.89s/it] {'loss': 0.0056, 'learning_rate': 4.6755e-05, 'epoch': 0.25} 7%|▋ | 651/10000 [2:20:44<33:28:36, 12.89s/it] 7%|▋ | 652/10000 [2:20:56<33:29:11, 12.90s/it] {'loss': 0.0072, 'learning_rate': 4.6750000000000005e-05, 'epoch': 0.25} 7%|▋ | 652/10000 [2:20:56<33:29:11, 12.90s/it] 7%|▋ | 653/10000 [2:21:09<33:27:19, 
12.89s/it] {'loss': 0.0069, 'learning_rate': 4.6745e-05, 'epoch': 0.25} 7%|▋ | 653/10000 [2:21:09<33:27:19, 12.89s/it] 7%|▋ | 654/10000 [2:21:22<33:28:30, 12.89s/it] {'loss': 0.0059, 'learning_rate': 4.674e-05, 'epoch': 0.25} 7%|▋ | 654/10000 [2:21:22<33:28:30, 12.89s/it] 7%|▋ | 655/10000 [2:21:35<33:30:47, 12.91s/it] {'loss': 0.0088, 'learning_rate': 4.6735e-05, 'epoch': 0.25} 7%|▋ | 655/10000 [2:21:35<33:30:47, 12.91s/it] 7%|▋ | 656/10000 [2:21:48<33:32:19, 12.92s/it] {'loss': 0.0067, 'learning_rate': 4.673e-05, 'epoch': 0.25} 7%|▋ | 656/10000 [2:21:48<33:32:19, 12.92s/it] 7%|▋ | 657/10000 [2:22:01<33:34:17, 12.94s/it] {'loss': 0.0105, 'learning_rate': 4.6725000000000004e-05, 'epoch': 0.25} 7%|▋ | 657/10000 [2:22:01<33:34:17, 12.94s/it] 7%|▋ | 658/10000 [2:22:14<33:33:06, 12.93s/it] {'loss': 0.0048, 'learning_rate': 4.672e-05, 'epoch': 0.25} 7%|▋ | 658/10000 [2:22:14<33:33:06, 12.93s/it] 7%|▋ | 659/10000 [2:22:27<33:29:36, 12.91s/it] {'loss': 0.0061, 'learning_rate': 4.6715e-05, 'epoch': 0.25} 7%|▋ | 659/10000 [2:22:27<33:29:36, 12.91s/it] 7%|▋ | 660/10000 [2:22:40<33:35:45, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.6710000000000005e-05, 'epoch': 0.25} 7%|▋ | 660/10000 [2:22:40<33:35:45, 12.95s/it] 7%|▋ | 661/10000 [2:22:53<33:37:47, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.670500000000001e-05, 'epoch': 0.25} 7%|▋ | 661/10000 [2:22:53<33:37:47, 12.96s/it] 7%|▋ | 662/10000 [2:23:06<33:37:55, 12.97s/it] {'loss': 0.0066, 'learning_rate': 4.6700000000000003e-05, 'epoch': 0.25} 7%|▋ | 662/10000 [2:23:06<33:37:55, 12.97s/it] 7%|▋ | 663/10000 [2:23:19<33:37:19, 12.96s/it] {'loss': 0.0074, 'learning_rate': 4.6695e-05, 'epoch': 0.25} 7%|▋ | 663/10000 [2:23:19<33:37:19, 12.96s/it] 7%|▋ | 664/10000 [2:23:32<33:38:02, 12.97s/it] {'loss': 0.0066, 'learning_rate': 4.669e-05, 'epoch': 0.25} 7%|▋ | 664/10000 [2:23:32<33:38:02, 12.97s/it] 7%|▋ | 665/10000 [2:23:45<33:39:18, 12.98s/it] {'loss': 0.0058, 'learning_rate': 4.6685e-05, 'epoch': 0.25} 7%|▋ | 665/10000 
[2:23:45<33:39:18, 12.98s/it] 7%|▋ | 666/10000 [2:23:58<33:37:12, 12.97s/it] {'loss': 0.0056, 'learning_rate': 4.668e-05, 'epoch': 0.25} 7%|▋ | 666/10000 [2:23:58<33:37:12, 12.97s/it] 7%|▋ | 667/10000 [2:24:11<33:34:50, 12.95s/it] {'loss': 0.0135, 'learning_rate': 4.6675e-05, 'epoch': 0.25} 7%|▋ | 667/10000 [2:24:11<33:34:50, 12.95s/it] 7%|▋ | 668/10000 [2:24:24<33:32:18, 12.94s/it] {'loss': 0.007, 'learning_rate': 4.6670000000000005e-05, 'epoch': 0.25} 7%|▋ | 668/10000 [2:24:24<33:32:18, 12.94s/it] 7%|▋ | 669/10000 [2:24:37<33:32:18, 12.94s/it] {'loss': 0.0043, 'learning_rate': 4.6665e-05, 'epoch': 0.25} 7%|▋ | 669/10000 [2:24:37<33:32:18, 12.94s/it] 7%|▋ | 670/10000 [2:24:50<33:33:23, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.6660000000000004e-05, 'epoch': 0.25} 7%|▋ | 670/10000 [2:24:50<33:33:23, 12.95s/it] 7%|▋ | 671/10000 [2:25:03<33:38:18, 12.98s/it] {'loss': 0.0061, 'learning_rate': 4.6655000000000006e-05, 'epoch': 0.25} 7%|▋ | 671/10000 [2:25:03<33:38:18, 12.98s/it] 7%|▋ | 672/10000 [2:25:16<33:36:02, 12.97s/it] {'loss': 0.0065, 'learning_rate': 4.665e-05, 'epoch': 0.25} 7%|▋ | 672/10000 [2:25:16<33:36:02, 12.97s/it] 7%|▋ | 673/10000 [2:25:28<33:36:22, 12.97s/it] {'loss': 0.0067, 'learning_rate': 4.6645e-05, 'epoch': 0.25} 7%|▋ | 673/10000 [2:25:29<33:36:22, 12.97s/it] 7%|▋ | 674/10000 [2:25:41<33:33:45, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.664e-05, 'epoch': 0.25} 7%|▋ | 674/10000 [2:25:41<33:33:45, 12.96s/it] 7%|▋ | 675/10000 [2:25:54<33:34:39, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.6635e-05, 'epoch': 0.25} 7%|▋ | 675/10000 [2:25:54<33:34:39, 12.96s/it] 7%|▋ | 676/10000 [2:26:07<33:34:55, 12.97s/it] {'loss': 0.0067, 'learning_rate': 4.663e-05, 'epoch': 0.25} 7%|▋ | 676/10000 [2:26:07<33:34:55, 12.97s/it] 7%|▋ | 677/10000 [2:26:20<33:33:25, 12.96s/it] {'loss': 0.0048, 'learning_rate': 4.6625e-05, 'epoch': 0.26} 7%|▋ | 677/10000 [2:26:20<33:33:25, 12.96s/it] 7%|▋ | 678/10000 [2:26:33<33:31:49, 12.95s/it] {'loss': 0.0056, 
'learning_rate': 4.6620000000000004e-05, 'epoch': 0.26} 7%|▋ | 678/10000 [2:26:33<33:31:49, 12.95s/it] 7%|▋ | 679/10000 [2:26:46<33:33:56, 12.96s/it] {'loss': 0.0077, 'learning_rate': 4.6615e-05, 'epoch': 0.26} 7%|▋ | 679/10000 [2:26:46<33:33:56, 12.96s/it] 7%|▋ | 680/10000 [2:26:59<33:30:36, 12.94s/it] {'loss': 0.0143, 'learning_rate': 4.661e-05, 'epoch': 0.26} 7%|▋ | 680/10000 [2:26:59<33:30:36, 12.94s/it] 7%|▋ | 681/10000 [2:27:12<33:32:39, 12.96s/it] {'loss': 0.0077, 'learning_rate': 4.6605000000000005e-05, 'epoch': 0.26} 7%|▋ | 681/10000 [2:27:12<33:32:39, 12.96s/it] 7%|▋ | 682/10000 [2:27:25<33:30:25, 12.95s/it] {'loss': 0.0059, 'learning_rate': 4.660000000000001e-05, 'epoch': 0.26} 7%|▋ | 682/10000 [2:27:25<33:30:25, 12.95s/it] 7%|▋ | 683/10000 [2:27:38<33:32:00, 12.96s/it] {'loss': 0.0076, 'learning_rate': 4.6595e-05, 'epoch': 0.26} 7%|▋ | 683/10000 [2:27:38<33:32:00, 12.96s/it] 7%|▋ | 684/10000 [2:27:51<33:30:14, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.659e-05, 'epoch': 0.26} 7%|▋ | 684/10000 [2:27:51<33:30:14, 12.95s/it] 7%|▋ | 685/10000 [2:28:04<33:31:25, 12.96s/it] {'loss': 0.0081, 'learning_rate': 4.6585e-05, 'epoch': 0.26} 7%|▋ | 685/10000 [2:28:04<33:31:25, 12.96s/it] 7%|▋ | 686/10000 [2:28:17<33:27:54, 12.93s/it] {'loss': 0.0099, 'learning_rate': 4.6580000000000005e-05, 'epoch': 0.26} 7%|▋ | 686/10000 [2:28:17<33:27:54, 12.93s/it] 7%|▋ | 687/10000 [2:28:30<33:26:34, 12.93s/it] {'loss': 0.0082, 'learning_rate': 4.6575e-05, 'epoch': 0.26} 7%|▋ | 687/10000 [2:28:30<33:26:34, 12.93s/it] 7%|▋ | 688/10000 [2:28:43<33:27:54, 12.94s/it] {'loss': 0.0068, 'learning_rate': 4.657e-05, 'epoch': 0.26} 7%|▋ | 688/10000 [2:28:43<33:27:54, 12.94s/it] 7%|▋ | 689/10000 [2:28:56<33:30:11, 12.95s/it] {'loss': 0.0072, 'learning_rate': 4.6565000000000006e-05, 'epoch': 0.26} 7%|▋ | 689/10000 [2:28:56<33:30:11, 12.95s/it] 7%|▋ | 690/10000 [2:29:09<33:30:20, 12.96s/it] {'loss': 0.0051, 'learning_rate': 4.656e-05, 'epoch': 0.26} 7%|▋ | 690/10000 [2:29:09<33:30:20, 
12.96s/it] 7%|▋ | 691/10000 [2:29:22<33:30:15, 12.96s/it] {'loss': 0.0052, 'learning_rate': 4.6555000000000004e-05, 'epoch': 0.26} 7%|▋ | 691/10000 [2:29:22<33:30:15, 12.96s/it] 7%|▋ | 692/10000 [2:29:35<33:32:47, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.655000000000001e-05, 'epoch': 0.26} 7%|▋ | 692/10000 [2:29:35<33:32:47, 12.97s/it] 7%|▋ | 693/10000 [2:29:48<33:32:15, 12.97s/it] {'loss': 0.0103, 'learning_rate': 4.6545e-05, 'epoch': 0.26} 7%|▋ | 693/10000 [2:29:48<33:32:15, 12.97s/it] 7%|▋ | 694/10000 [2:30:01<33:30:58, 12.97s/it] {'loss': 0.0105, 'learning_rate': 4.654e-05, 'epoch': 0.26} 7%|▋ | 694/10000 [2:30:01<33:30:58, 12.97s/it] 7%|▋ | 695/10000 [2:30:13<33:30:58, 12.97s/it] {'loss': 0.0089, 'learning_rate': 4.6535e-05, 'epoch': 0.26} 7%|▋ | 695/10000 [2:30:14<33:30:58, 12.97s/it] 7%|▋ | 696/10000 [2:30:26<33:30:59, 12.97s/it] {'loss': 0.0064, 'learning_rate': 4.6530000000000003e-05, 'epoch': 0.26} 7%|▋ | 696/10000 [2:30:27<33:30:59, 12.97s/it] 7%|▋ | 697/10000 [2:30:39<33:29:46, 12.96s/it] {'loss': 0.0076, 'learning_rate': 4.6525e-05, 'epoch': 0.26} 7%|▋ | 697/10000 [2:30:39<33:29:46, 12.96s/it] 7%|▋ | 698/10000 [2:30:52<33:30:03, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.652e-05, 'epoch': 0.26} 7%|▋ | 698/10000 [2:30:52<33:30:03, 12.97s/it] 7%|▋ | 699/10000 [2:31:05<33:26:58, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.6515000000000004e-05, 'epoch': 0.26} 7%|▋ | 699/10000 [2:31:05<33:26:58, 12.95s/it] 7%|▋ | 700/10000 [2:31:18<33:25:11, 12.94s/it] {'loss': 0.0065, 'learning_rate': 4.651e-05, 'epoch': 0.26} 7%|▋ | 700/10000 [2:31:18<33:25:11, 12.94s/it] 7%|▋ | 701/10000 [2:31:31<33:28:15, 12.96s/it] {'loss': 0.006, 'learning_rate': 4.6505e-05, 'epoch': 0.26} 7%|▋ | 701/10000 [2:31:31<33:28:15, 12.96s/it] 7%|▋ | 702/10000 [2:31:44<33:22:54, 12.92s/it] {'loss': 0.0174, 'learning_rate': 4.6500000000000005e-05, 'epoch': 0.26} 7%|▋ | 702/10000 [2:31:44<33:22:54, 12.92s/it] 7%|▋ | 703/10000 [2:31:57<33:22:21, 12.92s/it] {'loss': 0.0061, 
'learning_rate': 4.6495e-05, 'epoch': 0.26} 7%|▋ | 703/10000 [2:31:57<33:22:21, 12.92s/it]
7%|▋ | 704/10000 [2:32:10<33:24:05, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.649e-05, 'epoch': 0.27}
7%|▋ | 705/10000 [2:32:23<33:18:49, 12.90s/it] {'loss': 0.0062, 'learning_rate': 4.6485e-05, 'epoch': 0.27}
7%|▋ | 706/10000 [2:32:36<33:20:27, 12.91s/it] {'loss': 0.0069, 'learning_rate': 4.648e-05, 'epoch': 0.27}
7%|▋ | 707/10000 [2:32:49<33:20:56, 12.92s/it] {'loss': 0.0061, 'learning_rate': 4.6475000000000005e-05, 'epoch': 0.27}
7%|▋ | 708/10000 [2:33:02<33:21:20, 12.92s/it] {'loss': 0.0059, 'learning_rate': 4.647e-05, 'epoch': 0.27}
7%|▋ | 709/10000 [2:33:15<33:27:28, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.6465e-05, 'epoch': 0.27}
7%|▋ | 710/10000 [2:33:28<33:25:55, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.6460000000000006e-05, 'epoch': 0.27}
7%|▋ | 711/10000 [2:33:40<33:23:55, 12.94s/it] {'loss': 0.0072, 'learning_rate': 4.6455e-05, 'epoch': 0.27}
7%|▋ | 712/10000 [2:33:53<33:25:15, 12.95s/it] {'loss': 0.0075, 'learning_rate': 4.6450000000000004e-05, 'epoch': 0.27}
7%|▋ | 713/10000 [2:34:06<33:27:35, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.6445e-05, 'epoch': 0.27}
7%|▋ | 714/10000 [2:34:19<33:25:41, 12.96s/it] {'loss': 0.0062, 'learning_rate': 4.644e-05, 'epoch': 0.27}
7%|▋ | 715/10000 [2:34:32<33:25:26, 12.96s/it] {'loss': 0.006, 'learning_rate': 4.6435e-05, 'epoch': 0.27}
7%|▋ | 716/10000 [2:34:45<33:26:38, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.643e-05, 'epoch': 0.27}
7%|▋ | 717/10000 [2:34:58<33:27:18, 12.97s/it] {'loss': 0.0076, 'learning_rate': 4.6425000000000004e-05, 'epoch': 0.27}
7%|▋ | 718/10000 [2:35:11<33:26:21, 12.97s/it] {'loss': 0.0073, 'learning_rate': 4.642e-05, 'epoch': 0.27}
7%|▋ | 719/10000 [2:35:24<33:25:47, 12.97s/it] {'loss': 0.0083, 'learning_rate': 4.6415e-05, 'epoch': 0.27}
7%|▋ | 720/10000 [2:35:37<33:24:12, 12.96s/it] {'loss': 0.008, 'learning_rate': 4.6410000000000005e-05, 'epoch': 0.27}
7%|▋ | 721/10000 [2:35:50<33:22:42, 12.95s/it] {'loss': 0.0153, 'learning_rate': 4.640500000000001e-05, 'epoch': 0.27}
7%|▋ | 722/10000 [2:36:03<33:22:58, 12.95s/it] {'loss': 0.008, 'learning_rate': 4.64e-05, 'epoch': 0.27}
7%|▋ | 723/10000 [2:36:16<33:23:48, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.6395e-05, 'epoch': 0.27}
7%|▋ | 724/10000 [2:36:29<33:21:10, 12.94s/it] {'loss': 0.0078, 'learning_rate': 4.639e-05, 'epoch': 0.27}
7%|▋ | 725/10000 [2:36:42<33:20:24, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.6385000000000004e-05, 'epoch': 0.27}
7%|▋ | 726/10000 [2:36:55<33:24:14, 12.97s/it] {'loss': 0.0062, 'learning_rate': 4.638e-05, 'epoch': 0.27}
7%|▋ | 727/10000 [2:37:08<33:21:00, 12.95s/it] {'loss': 0.0089, 'learning_rate': 4.6375e-05, 'epoch': 0.27}
7%|▋ | 728/10000 [2:37:21<33:24:15, 12.97s/it] {'loss': 0.0057, 'learning_rate': 4.6370000000000005e-05, 'epoch': 0.27}
7%|▋ | 729/10000 [2:37:34<33:28:35, 13.00s/it] {'loss': 0.0076, 'learning_rate': 4.6365e-05, 'epoch': 0.27}
7%|▋ | 730/10000 [2:37:47<33:27:27, 12.99s/it] {'loss': 0.0081, 'learning_rate': 4.636e-05, 'epoch': 0.28}
7%|▋ | 731/10000 [2:38:00<33:30:31, 13.01s/it] {'loss': 0.0052, 'learning_rate': 4.6355000000000006e-05, 'epoch': 0.28}
7%|▋ | 732/10000 [2:38:13<33:23:31, 12.97s/it] {'loss': 0.0074, 'learning_rate': 4.635e-05, 'epoch': 0.28}
7%|▋ | 733/10000 [2:38:26<33:24:17, 12.98s/it] {'loss': 0.0059, 'learning_rate': 4.6345e-05, 'epoch': 0.28}
7%|▋ | 734/10000 [2:38:39<33:22:09, 12.96s/it] {'loss': 0.0077, 'learning_rate': 4.634e-05, 'epoch': 0.28}
7%|▋ | 735/10000 [2:38:52<33:18:52, 12.94s/it] {'loss': 0.0085, 'learning_rate': 4.6335e-05, 'epoch': 0.28}
7%|▋ | 736/10000 [2:39:05<33:19:21, 12.95s/it] {'loss': 0.0075, 'learning_rate': 4.633e-05, 'epoch': 0.28}
7%|▋ | 737/10000 [2:39:18<33:19:24, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.6325e-05, 'epoch': 0.28}
7%|▋ | 738/10000 [2:39:31<33:22:22, 12.97s/it] {'loss': 0.0053, 'learning_rate': 4.6320000000000004e-05, 'epoch': 0.28}
7%|▋ | 739/10000 [2:39:44<33:22:19, 12.97s/it] {'loss': 0.0078, 'learning_rate': 4.6315e-05, 'epoch': 0.28}
7%|▋ | 740/10000 [2:39:57<33:23:00, 12.98s/it] {'loss': 0.0069, 'learning_rate': 4.631e-05, 'epoch': 0.28}
7%|▋ | 741/10000 [2:40:09<33:18:51, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.6305000000000005e-05, 'epoch': 0.28}
7%|▋ | 742/10000 [2:40:22<33:15:36, 12.93s/it] {'loss': 0.0045, 'learning_rate': 4.630000000000001e-05, 'epoch': 0.28}
7%|▋ | 743/10000 [2:40:35<33:18:17, 12.95s/it] {'loss': 0.0058, 'learning_rate': 4.6294999999999996e-05, 'epoch': 0.28}
7%|▋ | 744/10000 [2:40:48<33:18:17, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.629e-05, 'epoch': 0.28}
7%|▋ | 745/10000 [2:41:01<33:13:20, 12.92s/it] {'loss': 0.0072, 'learning_rate': 4.6285e-05, 'epoch': 0.28}
7%|▋ | 746/10000 [2:41:14<33:10:38, 12.91s/it] {'loss': 0.0063, 'learning_rate': 4.6280000000000004e-05, 'epoch': 0.28}
7%|▋ | 747/10000 [2:41:27<33:10:37, 12.91s/it] {'loss': 0.0054, 'learning_rate': 4.6275e-05, 'epoch': 0.28}
7%|▋ | 748/10000 [2:41:40<33:08:47, 12.90s/it] {'loss': 0.0043, 'learning_rate': 4.627e-05, 'epoch': 0.28}
7%|▋ | 749/10000 [2:41:53<33:06:58, 12.89s/it] {'loss': 0.0072, 'learning_rate': 4.6265000000000005e-05, 'epoch': 0.28}
8%|▊ | 750/10000 [2:42:06<33:06:04, 12.88s/it] {'loss': 0.0066, 'learning_rate': 4.626e-05, 'epoch': 0.28}
8%|▊ | 751/10000 [2:42:18<33:04:48, 12.88s/it] {'loss': 0.0064, 'learning_rate': 4.6255000000000004e-05, 'epoch': 0.28}
8%|▊ | 752/10000 [2:42:31<33:05:51, 12.88s/it] {'loss': 0.006, 'learning_rate': 4.6250000000000006e-05, 'epoch': 0.28}
8%|▊ | 753/10000 [2:42:44<33:03:55, 12.87s/it] {'loss': 0.0066, 'learning_rate': 4.6245e-05, 'epoch': 0.28}
8%|▊ | 754/10000 [2:42:57<33:01:49, 12.86s/it] {'loss': 0.0084, 'learning_rate': 4.624e-05, 'epoch': 0.28}
8%|▊ | 755/10000 [2:43:10<33:07:22, 12.90s/it] {'loss': 0.0055, 'learning_rate': 4.6235e-05, 'epoch': 0.28}
8%|▊ | 756/10000 [2:43:23<33:07:50, 12.90s/it] {'loss': 0.0054, 'learning_rate': 4.623e-05, 'epoch': 0.28}
8%|▊ | 757/10000 [2:43:36<33:04:45, 12.88s/it] {'loss': 0.0075, 'learning_rate': 4.6225e-05, 'epoch': 0.29}
8%|▊ | 758/10000 [2:43:49<33:08:31, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.622e-05, 'epoch': 0.29}
8%|▊ | 759/10000 [2:44:02<33:12:10, 12.93s/it] {'loss': 0.0068, 'learning_rate': 4.6215000000000004e-05, 'epoch': 0.29}
8%|▊ | 760/10000 [2:44:15<33:14:36, 12.95s/it] {'loss': 0.0083, 'learning_rate': 4.6210000000000006e-05, 'epoch': 0.29}
8%|▊ | 761/10000 [2:44:28<33:15:07, 12.96s/it] {'loss': 0.0051, 'learning_rate': 4.6205e-05, 'epoch': 0.29}
8%|▊ | 762/10000 [2:44:41<33:14:20, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.6200000000000005e-05, 'epoch': 0.29}
8%|▊ | 763/10000 [2:44:54<33:14:03, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.619500000000001e-05, 'epoch': 0.29}
8%|▊ | 764/10000 [2:45:06<33:14:44, 12.96s/it] {'loss': 0.0082, 'learning_rate': 4.619e-05, 'epoch': 0.29}
8%|▊ | 765/10000 [2:45:19<33:12:00, 12.94s/it] {'loss': 0.0076, 'learning_rate': 4.6185e-05, 'epoch': 0.29}
8%|▊ | 766/10000 [2:45:32<33:12:51, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.618e-05, 'epoch': 0.29}
8%|▊ | 767/10000 [2:45:45<33:11:15, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.6175000000000004e-05, 'epoch': 0.29}
8%|▊ | 768/10000 [2:45:58<33:05:46, 12.91s/it] {'loss': 0.0064, 'learning_rate': 4.617e-05, 'epoch': 0.29}
8%|▊ | 769/10000 [2:46:11<33:05:37, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.6165e-05, 'epoch': 0.29}
8%|▊ | 770/10000 [2:46:24<33:06:09, 12.91s/it] {'loss': 0.0076, 'learning_rate': 4.6160000000000005e-05, 'epoch': 0.29}
8%|▊ | 771/10000 [2:46:37<33:05:36, 12.91s/it] {'loss': 0.0063, 'learning_rate': 4.6155e-05, 'epoch': 0.29}
8%|▊ | 772/10000 [2:46:50<33:08:13, 12.93s/it] {'loss': 0.0059, 'learning_rate': 4.6150000000000004e-05, 'epoch': 0.29}
8%|▊ | 773/10000 [2:47:03<33:10:05, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.6145000000000006e-05, 'epoch': 0.29}
8%|▊ | 774/10000 [2:47:16<33:10:10, 12.94s/it] {'loss': 0.0075, 'learning_rate': 4.614e-05, 'epoch': 0.29}
8%|▊ | 775/10000 [2:47:29<33:06:46, 12.92s/it] {'loss': 0.0049, 'learning_rate': 4.6135e-05, 'epoch': 0.29}
8%|▊ | 776/10000 [2:47:42<33:07:58, 12.93s/it] {'loss': 0.0062, 'learning_rate': 4.613e-05, 'epoch': 0.29}
8%|▊ | 777/10000 [2:47:54<33:05:59, 12.92s/it] {'loss': 0.01, 'learning_rate': 4.6125e-05, 'epoch': 0.29}
8%|▊ | 778/10000 [2:48:07<33:08:02, 12.93s/it] {'loss': 0.0073, 'learning_rate': 4.612e-05, 'epoch': 0.29}
8%|▊ | 779/10000 [2:48:20<33:06:29, 12.93s/it] {'loss': 0.0114, 'learning_rate': 4.6115e-05, 'epoch': 0.29}
8%|▊ | 780/10000 [2:48:33<33:11:45, 12.96s/it] {'loss': 0.0062, 'learning_rate': 4.6110000000000004e-05, 'epoch': 0.29}
8%|▊ | 781/10000 [2:48:46<33:15:13, 12.99s/it] {'loss': 0.0073, 'learning_rate': 4.610500000000001e-05, 'epoch': 0.29}
8%|▊ | 782/10000 [2:48:59<33:14:38, 12.98s/it] {'loss': 0.007, 'learning_rate': 4.61e-05, 'epoch': 0.29}
8%|▊ | 783/10000 [2:49:12<33:12:09, 12.97s/it] {'loss': 0.006, 'learning_rate': 4.6095000000000005e-05, 'epoch': 0.3}
8%|▊ | 784/10000 [2:49:25<33:08:02, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.609e-05, 'epoch': 0.3}
8%|▊ | 785/10000 [2:49:38<33:02:56, 12.91s/it] {'loss': 0.0084, 'learning_rate': 4.6085000000000003e-05, 'epoch': 0.3}
8%|▊ | 786/10000 [2:49:51<33:03:31, 12.92s/it] {'loss': 0.0062, 'learning_rate': 4.608e-05, 'epoch': 0.3}
8%|▊ | 787/10000 [2:50:04<33:04:38, 12.93s/it] {'loss': 0.0073, 'learning_rate': 4.6075e-05, 'epoch': 0.3}
8%|▊ | 788/10000 [2:50:17<33:04:04, 12.92s/it] {'loss': 0.0087, 'learning_rate': 4.6070000000000004e-05, 'epoch': 0.3}
8%|▊ | 789/10000 [2:50:30<33:03:07, 12.92s/it] {'loss': 0.007, 'learning_rate': 4.6065e-05, 'epoch': 0.3}
8%|▊ | 790/10000 [2:50:43<33:03:29, 12.92s/it] {'loss': 0.0058, 'learning_rate': 4.606e-05, 'epoch': 0.3}
8%|▊ | 791/10000 [2:50:56<32:59:29, 12.90s/it] {'loss': 0.0073, 'learning_rate': 4.6055000000000005e-05, 'epoch': 0.3}
8%|▊ | 792/10000 [2:51:08<32:57:46, 12.89s/it] {'loss': 0.0071, 'learning_rate': 4.605e-05, 'epoch': 0.3}
8%|▊ | 793/10000 [2:51:21<32:55:08, 12.87s/it] {'loss': 0.011, 'learning_rate': 4.6045000000000004e-05, 'epoch': 0.3}
8%|▊ | 794/10000 [2:51:34<32:56:54, 12.88s/it] {'loss': 0.0059, 'learning_rate': 4.604e-05, 'epoch': 0.3}
8%|▊ | 795/10000 [2:51:47<32:58:34, 12.90s/it] {'loss': 0.0053, 'learning_rate': 4.6035e-05, 'epoch': 0.3}
8%|▊ | 796/10000 [2:52:00<32:59:00, 12.90s/it] {'loss': 0.0087, 'learning_rate': 4.603e-05, 'epoch': 0.3}
8%|▊ | 797/10000 [2:52:13<32:58:19, 12.90s/it] {'loss': 0.006, 'learning_rate': 4.6025e-05, 'epoch': 0.3}
8%|▊ | 798/10000 [2:52:26<32:54:37, 12.88s/it] {'loss': 0.0069, 'learning_rate': 4.602e-05, 'epoch': 0.3}
8%|▊ | 799/10000 [2:52:39<32:53:34, 12.87s/it] {'loss': 0.006, 'learning_rate': 4.6015000000000006e-05, 'epoch': 0.3}
8%|▊ | 800/10000 [2:52:51<32:52:56, 12.87s/it] {'loss': 0.0063, 'learning_rate': 4.601e-05, 'epoch': 0.3}
8%|▊ | 801/10000 [2:53:04<32:51:24, 12.86s/it] {'loss': 0.006, 'learning_rate': 4.6005000000000004e-05, 'epoch': 0.3}
8%|▊ | 802/10000 [2:53:17<32:53:11, 12.87s/it] {'loss': 0.0065, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.3}
8%|▊ | 803/10000 [2:53:30<32:55:37, 12.89s/it] {'loss': 0.0063, 'learning_rate': 4.5995e-05, 'epoch': 0.3}
8%|▊ | 804/10000 [2:53:43<32:57:47, 12.90s/it] {'loss': 0.0072, 'learning_rate': 4.599e-05, 'epoch': 0.3}
8%|▊ | 805/10000 [2:53:56<32:57:48, 12.91s/it] {'loss': 0.0075, 'learning_rate': 4.5985e-05, 'epoch': 0.3}
8%|▊ | 806/10000 [2:54:09<32:54:55, 12.89s/it] {'loss': 0.0049, 'learning_rate': 4.5980000000000004e-05, 'epoch': 0.3}
8%|▊ | 807/10000 [2:54:22<32:56:06, 12.90s/it] {'loss': 0.0091, 'learning_rate': 4.5975e-05, 'epoch': 0.3}
8%|▊ | 808/10000 [2:54:35<32:53:31, 12.88s/it] {'loss': 0.007, 'learning_rate': 4.597e-05, 'epoch': 0.3}
8%|▊ | 809/10000 [2:54:47<32:54:33, 12.89s/it] {'loss': 0.0059, 'learning_rate': 4.5965000000000005e-05, 'epoch': 0.3}
8%|▊ | 810/10000 [2:55:00<32:53:42, 12.89s/it] {'loss': 0.0061, 'learning_rate': 4.596e-05, 'epoch': 0.31}
8%|▊ | 811/10000 [2:55:13<32:52:40, 12.88s/it] {'loss': 0.0058, 'learning_rate': 4.5955e-05, 'epoch': 0.31}
8%|▊ | 812/10000 [2:55:26<32:53:35, 12.89s/it] {'loss': 0.0069, 'learning_rate': 4.5950000000000006e-05, 'epoch': 0.31}
8%|▊ | 813/10000 [2:55:39<32:53:02, 12.89s/it] {'loss': 0.0065, 'learning_rate': 4.5945e-05, 'epoch': 0.31}
8%|▊ | 814/10000 [2:55:52<32:52:56, 12.89s/it] {'loss': 0.0073, 'learning_rate': 4.594e-05, 'epoch': 0.31}
8%|▊ | 815/10000 [2:56:05<32:56:19, 12.91s/it] {'loss': 0.0056, 'learning_rate': 4.5935e-05, 'epoch': 0.31}
8%|▊ | 816/10000 [2:56:18<32:57:17, 12.92s/it] {'loss': 0.0056, 'learning_rate': 4.593e-05, 'epoch': 0.31}
8%|▊ | 817/10000 [2:56:31<32:53:17, 12.89s/it] {'loss': 0.0067, 'learning_rate': 4.5925e-05, 'epoch': 0.31}
8%|▊ | 818/10000 [2:56:44<32:54:09, 12.90s/it] {'loss': 0.01, 'learning_rate': 4.592e-05, 'epoch': 0.31}
8%|▊ | 819/10000 [2:56:56<32:54:51, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.5915000000000003e-05, 'epoch': 0.31}
8%|▊ | 820/10000 [2:57:09<32:54:21, 12.90s/it] {'loss': 0.0086, 'learning_rate': 4.5910000000000006e-05, 'epoch': 0.31}
8%|▊ | 821/10000 [2:57:22<32:56:33, 12.92s/it] {'loss': 0.0095, 'learning_rate': 4.5905e-05, 'epoch': 0.31}
8%|▊ | 822/10000 [2:57:35<32:54:58, 12.91s/it] {'loss': 0.0074, 'learning_rate': 4.5900000000000004e-05, 'epoch': 0.31}
8%|▊ | 823/10000 [2:57:48<32:57:35, 12.93s/it] {'loss': 0.0221, 'learning_rate': 4.589500000000001e-05, 'epoch': 0.31}
8%|▊ | 824/10000 [2:58:01<32:57:19, 12.93s/it] {'loss': 0.0072, 'learning_rate': 4.589e-05, 'epoch': 0.31}
8%|▊ | 825/10000 [2:58:14<33:01:59, 12.96s/it] {'loss': 0.0065, 'learning_rate': 4.5885e-05, 'epoch': 0.31}
8%|▊ | 826/10000 [2:58:27<32:59:47, 12.95s/it] {'loss': 0.008, 'learning_rate': 4.588e-05, 'epoch': 0.31}
8%|▊ | 827/10000 [2:58:40<32:57:48, 12.94s/it] {'loss': 0.0062, 'learning_rate': 4.5875000000000004e-05, 'epoch': 0.31}
8%|▊ | 828/10000 [2:58:53<32:54:26, 12.92s/it] {'loss': 0.0077, 'learning_rate': 4.587e-05, 'epoch': 0.31}
8%|▊ | 829/10000 [2:59:06<32:51:50, 12.90s/it] {'loss': 0.007, 'learning_rate': 4.5865e-05, 'epoch': 0.31}
8%|▊ | 830/10000 [2:59:19<32:52:49, 12.91s/it] {'loss': 0.007, 'learning_rate': 4.5860000000000005e-05, 'epoch': 0.31}
8%|▊ | 831/10000 [2:59:31<32:49:17, 12.89s/it] {'loss': 0.0065, 'learning_rate': 4.5855e-05, 'epoch': 0.31}
8%|▊ | 832/10000 [2:59:44<32:53:01, 12.91s/it] {'loss': 0.008, 'learning_rate': 4.585e-05, 'epoch': 0.31}
8%|▊ | 833/10000 [2:59:57<32:51:31, 12.90s/it] {'loss': 0.0056, 'learning_rate': 4.5845000000000006e-05, 'epoch': 0.31}
8%|▊ | 834/10000 [3:00:10<32:47:50, 12.88s/it] {'loss': 0.0086, 'learning_rate': 4.584e-05, 'epoch': 0.31}
8%|▊ | 835/10000 [3:00:23<32:46:16, 12.87s/it] {'loss': 0.0066, 'learning_rate': 4.5835e-05, 'epoch': 0.31}
8%|▊ | 836/10000 [3:00:36<32:52:04, 12.91s/it] {'loss': 0.0079, 'learning_rate': 4.583e-05, 'epoch': 0.31}
8%|▊ | 837/10000 [3:00:49<32:53:20, 12.92s/it] {'loss': 0.0044, 'learning_rate': 4.5825e-05, 'epoch': 0.32}
8%|▊ | 838/10000 [3:01:02<32:53:11, 12.92s/it] {'loss': 0.0049, 'learning_rate': 4.5820000000000005e-05, 'epoch': 0.32}
8%|▊ | 839/10000 [3:01:15<32:49:42, 12.90s/it] {'loss': 0.0065, 'learning_rate': 4.5815e-05, 'epoch': 0.32}
8%|▊ | 840/10000 [3:01:28<32:46:37, 12.88s/it] {'loss': 0.0075, 'learning_rate': 4.5810000000000004e-05, 'epoch': 0.32}
8%|▊ | 841/10000 [3:01:40<32:48:20, 12.89s/it] {'loss': 0.0058, 'learning_rate': 4.5805000000000006e-05, 'epoch': 0.32}
8%|▊ | 842/10000 [3:01:53<32:48:58, 12.90s/it] {'loss': 0.0076, 'learning_rate': 4.58e-05, 'epoch': 0.32}
8%|▊ | 843/10000 [3:02:06<32:47:59, 12.90s/it] {'loss': 0.0113, 'learning_rate': 4.5795000000000005e-05, 'epoch': 0.32}
8%|▊ | 844/10000 [3:02:19<32:49:04, 12.90s/it] {'loss': 0.0077, 'learning_rate': 4.579e-05, 'epoch': 0.32}
8%|▊ | 845/10000 [3:02:32<32:47:28, 12.89s/it] {'loss': 0.0079, 'learning_rate': 4.5785e-05, 'epoch': 0.32}
8%|▊ | 846/10000 [3:02:45<32:44:47, 12.88s/it] {'loss': 0.0061, 'learning_rate': 4.578e-05, 'epoch': 0.32}
8%|▊ | 847/10000 [3:02:58<32:45:30, 12.88s/it] {'loss': 0.0069, 'learning_rate': 4.5775e-05, 'epoch': 0.32}
8%|▊ | 848/10000 [3:03:11<32:44:43, 12.88s/it] {'loss': 0.009, 'learning_rate': 4.5770000000000004e-05, 'epoch': 0.32}
8%|▊ | 849/10000 [3:03:24<32:43:11, 12.87s/it] {'loss': 0.0067, 'learning_rate': 4.5765e-05, 'epoch': 0.32}
8%|▊ | 850/10000 [3:03:36<32:46:16, 12.89s/it] {'loss': 0.0068, 'learning_rate': 4.576e-05, 'epoch': 0.32}
9%|▊ | 851/10000 [3:03:49<32:47:51, 12.91s/it] {'loss': 0.0095, 'learning_rate': 4.5755000000000005e-05, 'epoch': 0.32}
9%|▊ | 852/10000 [3:04:02<32:47:27, 12.90s/it] {'loss': 0.0128, 'learning_rate': 4.575e-05, 'epoch': 0.32}
9%|▊ | 853/10000 [3:04:15<32:49:54, 12.92s/it] {'loss': 0.007, 'learning_rate': 4.5745e-05, 'epoch': 0.32}
9%|▊ | 854/10000 [3:04:28<32:51:25, 12.93s/it] {'loss': 0.0069, 'learning_rate': 4.574e-05, 'epoch': 0.32}
9%|▊ | 855/10000 [3:04:41<32:47:34, 12.91s/it] {'loss': 0.0064, 'learning_rate': 4.5735e-05, 'epoch': 0.32}
9%|▊ | 856/10000 [3:04:54<32:45:56, 12.90s/it] {'loss': 0.0085, 'learning_rate': 4.573e-05, 'epoch': 0.32}
9%|▊ | 857/10000 [3:05:07<32:49:12, 12.92s/it] {'loss': 0.0073, 'learning_rate': 4.5725e-05, 'epoch': 0.32}
9%|▊ | 858/10000 [3:05:20<32:49:58, 12.93s/it] {'loss': 0.0077, 'learning_rate': 4.572e-05, 'epoch': 0.32}
9%|▊ | 859/10000 [3:05:33<32:54:40, 12.96s/it] {'loss': 0.0088, 'learning_rate': 4.5715000000000005e-05, 'epoch': 0.32}
9%|▊ | 860/10000 [3:05:46<32:56:21, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.571e-05, 'epoch': 0.32}
9%|▊ | 861/10000 [3:05:59<32:57:42, 12.98s/it] {'loss': 0.0078, 'learning_rate': 4.5705000000000004e-05, 'epoch': 0.32}
9%|▊ | 862/10000 [3:06:12<33:00:14, 13.00s/it] {'loss': 0.0101, 'learning_rate': 4.5700000000000006e-05, 'epoch': 0.32}
9%|▊ | 863/10000 [3:06:25<32:56:47, 12.98s/it] {'loss': 0.008, 'learning_rate': 4.5695e-05, 'epoch': 0.33}
9%|▊ | 864/10000 [3:06:38<32:58:21, 12.99s/it] {'loss': 0.0092, 'learning_rate': 4.569e-05, 'epoch': 0.33}
9%|▊ | 865/10000 [3:06:51<32:56:45, 12.98s/it] {'loss': 0.0068, 'learning_rate': 4.5685e-05, 'epoch': 0.33}
9%|▊ | 866/10000 [3:07:04<32:55:15, 12.98s/it] {'loss': 0.0069, 'learning_rate': 4.568e-05, 'epoch': 0.33}
9%|▊ | 867/10000 [3:07:17<32:56:43, 12.99s/it] {'loss': 0.0088, 'learning_rate': 4.5675e-05, 'epoch': 0.33}
9%|▊ | 868/10000 [3:07:30<32:55:11, 12.98s/it] {'loss': 0.0119, 'learning_rate': 4.567e-05, 'epoch': 0.33}
9%|▊ | 869/10000 [3:07:43<32:55:57, 12.98s/it] {'loss': 0.0086, 'learning_rate': 4.5665000000000004e-05, 'epoch': 0.33}
9%|▊ | 870/10000 [3:07:56<32:56:42, 12.99s/it] {'loss': 0.0085, 'learning_rate': 4.566e-05, 'epoch': 0.33}
9%|▊ | 871/10000 [3:08:09<32:51:14, 12.96s/it] {'loss': 0.0113, 'learning_rate': 4.5655e-05, 'epoch': 0.33}
9%|▊ | 872/10000 [3:08:22<32:54:10, 12.98s/it] {'loss': 0.0066, 'learning_rate': 4.5650000000000005e-05, 'epoch': 0.33}
9%|▊ | 873/10000 [3:08:35<32:50:45, 12.96s/it] {'loss': 0.0082, 'learning_rate': 4.564500000000001e-05, 'epoch': 0.33}
9%|▊ | 874/10000 [3:08:48<32:52:37, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.564e-05, 'epoch': 0.33}
9%|▉ | 875/10000 [3:09:01<32:55:16, 12.99s/it] {'loss': 0.0059, 'learning_rate': 4.5635e-05, 'epoch': 0.33}
9%|▉ | 876/10000 [3:09:14<32:52:09, 12.97s/it] {'loss': 0.0088, 'learning_rate': 4.563e-05, 'epoch': 0.33}
9%|▉ | 877/10000 [3:09:27<32:54:35, 12.99s/it] {'loss': 0.0086, 'learning_rate': 4.5625e-05, 'epoch': 0.33}
9%|▉ | 878/10000 [3:09:39<32:50:15, 12.96s/it] {'loss': 0.0087, 'learning_rate': 4.562e-05, 'epoch': 0.33}
9%|▉ | 879/10000 [3:09:53<32:53:19, 12.98s/it] {'loss': 0.0204, 'learning_rate': 4.5615e-05, 'epoch': 0.33}
9%|▉ | 880/10000 [3:10:06<32:56:01, 13.00s/it] {'loss': 0.0083, 'learning_rate': 4.5610000000000005e-05, 'epoch': 0.33}
9%|▉ | 881/10000 [3:10:19<32:53:36, 12.99s/it] {'loss': 0.0059, 'learning_rate': 4.5605e-05, 'epoch': 0.33}
9%|▉ | 882/10000 [3:10:32<32:54:24, 12.99s/it] {'loss': 0.0056, 'learning_rate': 4.5600000000000004e-05, 'epoch': 0.33}
9%|▉ | 883/10000 [3:10:44<32:50:20, 12.97s/it] {'loss': 0.0099, 'learning_rate': 4.5595000000000006e-05, 'epoch': 0.33}
9%|▉ | 884/10000 [3:10:57<32:49:48, 12.96s/it] {'loss': 0.0097, 'learning_rate': 4.559e-05, 'epoch': 0.33}
9%|▉ | 885/10000 [3:11:10<32:49:00, 12.96s/it] {'loss': 0.0082, 'learning_rate': 4.5585e-05, 'epoch': 0.33}
9%|▉ | 886/10000 [3:11:23<32:47:27, 12.95s/it] {'loss': 0.01, 'learning_rate': 4.558e-05, 'epoch': 0.33}
9%|▉ | 887/10000 [3:11:36<32:46:35, 12.95s/it] {'loss': 0.0167, 'learning_rate': 4.5575e-05, 'epoch': 0.33}
9%|▉ | 888/10000 [3:11:49<32:48:28, 12.96s/it] {'loss': 0.0074, 'learning_rate': 4.557e-05, 'epoch': 0.33}
9%|▉ | 889/10000 [3:12:02<32:52:07, 12.99s/it] {'loss': 0.0093, 'learning_rate': 4.5565e-05, 'epoch': 0.33}
9%|▉ | 890/10000 [3:12:15<32:50:20, 12.98s/it] {'loss': 0.0077, 'learning_rate': 4.5560000000000004e-05, 'epoch': 0.34}
9%|▉ | 891/10000 [3:12:28<32:50:07, 12.98s/it] {'loss': 0.007, 'learning_rate': 4.5555e-05, 'epoch': 0.34}
9%|▉ | 892/10000 [3:12:41<32:48:19, 12.97s/it] {'loss': 0.0088, 'learning_rate': 4.555e-05, 'epoch': 0.34}
9%|▉ | 893/10000 [3:12:54<32:47:51, 12.96s/it] {'loss': 0.0081, 'learning_rate': 4.5545000000000005e-05, 'epoch': 0.34}
9%|▉ | 894/10000 [3:13:07<32:47:03, 12.96s/it] {'loss': 0.0076, 'learning_rate': 4.554000000000001e-05, 'epoch': 0.34}
9%|▉ | 895/10000 [3:13:20<32:43:41, 12.94s/it] {'loss': 0.0109, 'learning_rate': 4.5535e-05, 'epoch': 0.34}
9%|▉ | 896/10000 [3:13:33<32:49:44, 12.98s/it] {'loss': 0.0088, 'learning_rate': 4.553e-05, 'epoch': 0.34}
9%|▉ | 897/10000 [3:13:46<32:53:13, 13.01s/it] {'loss': 0.0117, 'learning_rate': 4.5525e-05, 'epoch': 0.34}
9%|▉ | 898/10000 [3:13:59<32:49:39, 12.98s/it] {'loss': 0.0099, 'learning_rate': 4.5520000000000005e-05, 'epoch': 0.34}
9%|▉ | 899/10000 [3:14:12<32:50:28, 12.99s/it] {'loss': 0.0068, 'learning_rate': 4.5515e-05, 'epoch': 0.34}
9%|▉ | 900/10000 [3:14:25<32:47:55, 12.98s/it] {'loss': 0.0089, 'learning_rate': 4.551e-05, 'epoch': 0.34}
9%|▉ | 901/10000 [3:14:38<32:43:34, 12.95s/it] {'loss': 0.0083, 'learning_rate': 4.5505000000000006e-05, 'epoch': 0.34}
9%|▉ | 902/10000 [3:14:51<32:39:23, 12.92s/it] {'loss': 0.0116, 'learning_rate': 4.55e-05, 'epoch': 0.34}
9%|▉ | 903/10000 [3:15:04<32:36:56, 12.91s/it] {'loss': 0.0081, 'learning_rate': 4.5495000000000004e-05, 'epoch': 0.34}
9%|▉ | 904/10000 [3:15:16<32:35:30, 12.90s/it] {'loss': 0.0085, 'learning_rate': 4.549000000000001e-05, 'epoch': 0.34}
9%|▉ | 905/10000 [3:15:29<32:35:04, 12.90s/it] {'loss': 0.0112, 'learning_rate': 4.5485e-05, 'epoch': 0.34}
9%|▉ | 906/10000 [3:15:42<32:34:49, 12.90s/it] {'loss': 0.0121, 'learning_rate': 4.548e-05, 'epoch': 0.34}
9%|▉ | 907/10000 [3:15:55<32:37:12, 12.91s/it] {'loss': 0.0076, 'learning_rate': 4.5475e-05, 'epoch': 0.34}
9%|▉ | 908/10000 [3:16:08<32:36:10, 12.91s/it] {'loss': 0.0091, 'learning_rate': 4.5470000000000003e-05, 'epoch': 0.34}
9%|▉ | 909/10000 [3:16:21<32:38:36, 12.93s/it] {'loss': 0.0103, 'learning_rate': 4.5465e-05, 'epoch': 0.34}
9%|▉ | 910/10000 [3:16:34<32:36:05, 12.91s/it] {'loss': 0.0113, 'learning_rate': 4.546e-05, 'epoch': 0.34}
9%|▉ | 911/10000 [3:16:47<32:37:10, 12.92s/it] {'loss': 0.0077, 'learning_rate': 4.5455000000000004e-05, 'epoch': 0.34}
9%|▉ | 912/10000 [3:17:00<32:37:29, 12.92s/it] {'loss': 0.0102, 'learning_rate': 4.545000000000001e-05, 'epoch': 0.34}
9%|▉ | 913/10000 [3:17:13<32:39:12, 12.94s/it] {'loss': 0.0079, 'learning_rate': 4.5445e-05, 'epoch': 0.34}
9%|▉ | 914/10000 [3:17:26<32:37:28, 12.93s/it] {'loss': 0.0082, 'learning_rate': 4.5440000000000005e-05, 'epoch': 0.34}
9%|▉ | 915/10000 [3:17:39<32:37:22, 12.93s/it] {'loss': 0.0097, 'learning_rate': 4.5435e-05, 'epoch': 0.34}
9%|▉ | 916/10000 [3:17:51<32:33:07, 12.90s/it] {'loss': 0.0094, 'learning_rate': 4.543e-05, 'epoch': 0.35}
9%|▉ | 917/10000 [3:18:04<32:32:53, 12.90s/it] {'loss': 0.0103, 'learning_rate': 4.5425e-05, 'epoch': 0.35}
9%|▉ | 918/10000 [3:18:17<32:33:01, 12.90s/it] {'loss': 0.0068, 'learning_rate': 4.542e-05, 'epoch': 0.35}
9%|▉ | 919/10000 [3:18:30<32:35:13, 12.92s/it] {'loss': 0.0088, 'learning_rate': 4.5415000000000005e-05, 'epoch': 0.35}
9%|▉ | 920/10000 [3:18:43<32:34:56, 12.92s/it] {'loss': 0.0072, 'learning_rate': 4.541e-05, 'epoch': 0.35}
9%|▉ | 921/10000 [3:18:56<32:33:13, 12.91s/it] {'loss': 0.0127, 'learning_rate': 4.5405e-05, 'epoch': 0.35}
9%|▉ | 922/10000 [3:19:09<32:33:35, 12.91s/it] {'loss': 0.0104, 'learning_rate': 4.5400000000000006e-05, 'epoch': 0.35}
9%|▉ | 923/10000 [3:19:22<32:32:26, 12.91s/it] {'loss': 0.0576, 'learning_rate': 4.5395e-05, 'epoch': 0.35}
9%|▉ | 924/10000 [3:19:35<32:30:32, 12.89s/it] {'loss': 0.012, 'learning_rate': 4.5390000000000004e-05, 'epoch': 0.35}
9%|▉ | 925/10000 [3:19:48<32:30:47, 12.90s/it] {'loss': 0.0087, 'learning_rate': 4.5385e-05, 'epoch': 0.35}
9%|▉ | 926/10000 [3:20:01<32:33:57, 12.92s/it] {'loss': 0.008, 'learning_rate': 4.538e-05, 'epoch': 0.35}
9%|▉ | 927/10000 [3:20:13<32:30:59, 12.90s/it] {'loss': 0.0093, 'learning_rate': 4.5375e-05, 'epoch': 0.35}
9%|▉ | 928/10000 [3:20:26<32:26:49, 12.88s/it] {'loss': 0.013, 'learning_rate': 4.537e-05, 'epoch': 0.35}
9%|▉ | 929/10000 [3:20:39<32:25:01, 12.87s/it] {'loss': 0.0163, 'learning_rate': 4.5365000000000004e-05, 'epoch': 0.35}
9%|▉ | 930/10000 [3:20:52<32:26:15, 12.87s/it] {'loss': 0.0083, 'learning_rate': 4.536e-05, 'epoch': 0.35}
9%|▉ | 931/10000 [3:21:05<32:30:52, 12.91s/it] {'loss': 0.0103, 'learning_rate': 4.5355e-05, 'epoch': 0.35}
9%|▉ | 932/10000 [3:21:18<32:28:05, 12.89s/it] {'loss': 0.0452, 'learning_rate': 4.5350000000000005e-05, 'epoch': 0.35}
9%|▉ | 933/10000 [3:21:31<32:29:29, 12.90s/it] {'loss': 0.0084, 'learning_rate': 4.534500000000001e-05, 'epoch': 0.35}
9%|▉ | 934/10000 [3:21:44<32:27:23, 12.89s/it] {'loss': 0.01, 'learning_rate': 4.534e-05, 'epoch': 0.35}
9%|▉ | 935/10000 [3:21:57<32:30:13, 12.91s/it] {'loss': 0.0087, 'learning_rate': 4.5335e-05, 'epoch': 0.35}
9%|▉ | 936/10000 [3:22:09<32:28:06, 12.90s/it] {'loss': 0.0174, 'learning_rate': 4.533e-05, 'epoch': 0.35}
9%|▉ | 937/10000 [3:22:22<32:30:20, 12.91s/it] {'loss': 0.0088, 'learning_rate': 4.5325000000000004e-05, 'epoch': 0.35}
9%|▉ | 938/10000 [3:22:35<32:27:44, 12.90s/it] {'loss': 0.0108, 'learning_rate': 4.532e-05, 'epoch': 0.35}
9%|▉ | 939/10000 [3:22:48<32:24:21, 12.88s/it] {'loss': 0.0118, 'learning_rate': 4.5315e-05, 'epoch': 0.35}
9%|▉ | 940/10000 [3:23:01<32:23:27, 12.87s/it] {'loss': 0.0116, 'learning_rate': 4.5310000000000005e-05, 'epoch': 0.35}
9%|▉ | 941/10000 [3:23:14<32:21:54, 12.86s/it] {'loss': 0.0093, 'learning_rate': 4.5305e-05, 'epoch': 0.35}
9%|▉ | 942/10000 [3:23:27<32:22:37, 12.87s/it] {'loss': 0.0086, 'learning_rate': 4.53e-05, 'epoch': 0.35}
9%|▉ | 943/10000 [3:23:40<32:25:40, 12.89s/it] {'loss': 0.0086, 'learning_rate': 4.5295000000000006e-05, 'epoch': 0.36}
9%|▉ | 944/10000 [3:23:52<32:25:15, 12.89s/it] {'loss': 0.0084, 'learning_rate': 4.529e-05, 'epoch': 0.36}
9%|▉ | 945/10000 [3:24:05<32:28:23, 12.91s/it] {'loss': 0.0076, 'learning_rate': 4.5285e-05, 'epoch': 0.36}
9%|▉ | 946/10000 [3:24:18<32:24:18, 12.88s/it] {'loss': 0.0084, 'learning_rate': 4.528e-05, 'epoch': 0.36}
9%|▉ | 947/10000 [3:24:31<32:24:38, 12.89s/it] {'loss': 0.0101, 'learning_rate': 4.5275e-05, 'epoch': 0.36}
9%|▉ | 948/10000 [3:24:44<32:25:32, 12.90s/it] {'loss': 0.0115, 'learning_rate': 4.527e-05, 'epoch': 0.36}
9%|▉ | 949/10000 [3:24:57<32:25:28, 12.90s/it] {'loss': 0.0094, 'learning_rate': 4.5265e-05, 'epoch': 0.36}
10%|▉ | 950/10000 [3:25:10<32:28:06, 12.92s/it] {'loss': 0.0136, 'learning_rate': 4.5260000000000004e-05, 'epoch': 0.36}
10%|▉ | 951/10000 [3:25:23<32:27:35, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.5255000000000006e-05, 'epoch': 0.36}
10%|▉ | 952/10000 [3:25:36<32:29:22, 12.93s/it] {'loss': 0.0078, 'learning_rate': 4.525e-05, 'epoch': 0.36}
10%|▉ | 953/10000 [3:25:49<32:31:19, 12.94s/it] {'loss': 0.0085, 'learning_rate': 4.5245000000000005e-05, 'epoch': 0.36}
10%|▉ | 954/10000 [3:26:02<32:29:17, 12.93s/it] {'loss': 0.0086, 'learning_rate': 4.524000000000001e-05, 'epoch': 0.36}
10%|▉ | 955/10000 [3:26:15<32:30:22, 12.94s/it] {'loss': 0.0102, 'learning_rate': 4.5234999999999996e-05, 'epoch': 0.36}
10%|▉ | 956/10000 [3:26:28<32:30:13, 12.94s/it] {'loss': 0.0112, 'learning_rate': 4.523e-05, 'epoch': 0.36}
10%|▉ | 957/10000 [3:26:41<32:30:25, 12.94s/it] {'loss': 0.0162, 'learning_rate': 4.5225e-05, 'epoch': 0.36}
10%|▉ | 958/10000 [3:26:53<32:32:30, 12.96s/it] {'loss': 0.0077, 'learning_rate': 4.5220000000000004e-05, 'epoch': 0.36}
10%|▉ | 959/10000 [3:27:06<32:30:23, 12.94s/it] {'loss': 0.0083, 'learning_rate': 4.5215e-05, 'epoch': 0.36}
10%|▉ | 960/10000 [3:27:19<32:33:03, 12.96s/it] {'loss': 0.0057, 'learning_rate': 4.521e-05, 'epoch': 0.36}
10%|▉ | 961/10000 [3:27:32<32:33:20, 12.97s/it] {'loss': 0.0096, 'learning_rate': 4.5205000000000005e-05, 'epoch': 0.36}
10%|▉ | 962/10000 [3:27:45<32:29:54, 12.94s/it] {'loss': 0.0058, 'learning_rate': 4.52e-05, 'epoch': 0.36}
10%|▉ | 963/10000 [3:27:58<32:27:21, 12.93s/it] {'loss': 0.0123, 'learning_rate': 4.5195000000000004e-05, 'epoch': 0.36}
10%|▉ | 964/10000 [3:28:11<32:29:44, 12.95s/it] {'loss': 0.01, 'learning_rate': 4.5190000000000006e-05, 'epoch': 0.36}
10%|▉ | 965/10000 [3:28:24<32:31:17, 12.96s/it] {'loss': 0.0079, 'learning_rate': 4.5185e-05, 'epoch': 0.36}
10%|▉ | 966/10000 [3:28:37<32:30:45, 12.96s/it] {'loss': 0.0073, 'learning_rate': 4.518e-05, 'epoch': 0.36}
10%|▉ | 967/10000 [3:28:50<32:30:06, 12.95s/it] {'loss': 0.0083, 'learning_rate': 4.5175e-05,
'epoch': 0.36} 10%|▉ | 967/10000 [3:28:50<32:30:06, 12.95s/it] 10%|▉ | 968/10000 [3:29:03<32:35:00, 12.99s/it] {'loss': 0.008, 'learning_rate': 4.517e-05, 'epoch': 0.36} 10%|▉ | 968/10000 [3:29:03<32:35:00, 12.99s/it] 10%|▉ | 969/10000 [3:29:16<32:34:07, 12.98s/it] {'loss': 0.0089, 'learning_rate': 4.5165e-05, 'epoch': 0.37} 10%|▉ | 969/10000 [3:29:16<32:34:07, 12.98s/it] 10%|▉ | 970/10000 [3:29:29<32:29:57, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.516e-05, 'epoch': 0.37} 10%|▉ | 970/10000 [3:29:29<32:29:57, 12.96s/it] 10%|▉ | 971/10000 [3:29:42<32:31:46, 12.97s/it] {'loss': 0.006, 'learning_rate': 4.5155000000000004e-05, 'epoch': 0.37} 10%|▉ | 971/10000 [3:29:42<32:31:46, 12.97s/it] 10%|▉ | 972/10000 [3:29:55<32:34:21, 12.99s/it] {'loss': 0.0089, 'learning_rate': 4.5150000000000006e-05, 'epoch': 0.37} 10%|▉ | 972/10000 [3:29:55<32:34:21, 12.99s/it] 10%|▉ | 973/10000 [3:30:08<32:30:22, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.5145e-05, 'epoch': 0.37} 10%|▉ | 973/10000 [3:30:08<32:30:22, 12.96s/it] 10%|▉ | 974/10000 [3:30:21<32:31:24, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.5140000000000005e-05, 'epoch': 0.37} 10%|▉ | 974/10000 [3:30:21<32:31:24, 12.97s/it] 10%|▉ | 975/10000 [3:30:34<32:30:18, 12.97s/it] {'loss': 0.0066, 'learning_rate': 4.5135e-05, 'epoch': 0.37} 10%|▉ | 975/10000 [3:30:34<32:30:18, 12.97s/it] 10%|▉ | 976/10000 [3:30:47<32:28:17, 12.95s/it] {'loss': 0.0077, 'learning_rate': 4.513e-05, 'epoch': 0.37} 10%|▉ | 976/10000 [3:30:47<32:28:17, 12.95s/it] 10%|▉ | 977/10000 [3:31:00<32:27:17, 12.95s/it] {'loss': 0.0099, 'learning_rate': 4.5125e-05, 'epoch': 0.37} 10%|▉ | 977/10000 [3:31:00<32:27:17, 12.95s/it] 10%|▉ | 978/10000 [3:31:13<32:26:58, 12.95s/it] {'loss': 0.0116, 'learning_rate': 4.512e-05, 'epoch': 0.37} 10%|▉ | 978/10000 [3:31:13<32:26:58, 12.95s/it] 10%|▉ | 979/10000 [3:31:26<32:27:17, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.5115000000000004e-05, 'epoch': 0.37} 10%|▉ | 979/10000 [3:31:26<32:27:17, 12.95s/it] 10%|▉ | 
980/10000 [3:31:39<32:23:44, 12.93s/it] {'loss': 0.0089, 'learning_rate': 4.511e-05, 'epoch': 0.37} 10%|▉ | 980/10000 [3:31:39<32:23:44, 12.93s/it] 10%|▉ | 981/10000 [3:31:51<32:24:24, 12.94s/it] {'loss': 0.0091, 'learning_rate': 4.5105e-05, 'epoch': 0.37} 10%|▉ | 981/10000 [3:31:51<32:24:24, 12.94s/it] 10%|▉ | 982/10000 [3:32:04<32:24:44, 12.94s/it] {'loss': 0.0098, 'learning_rate': 4.5100000000000005e-05, 'epoch': 0.37} 10%|▉ | 982/10000 [3:32:04<32:24:44, 12.94s/it] 10%|▉ | 983/10000 [3:32:17<32:25:51, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.5095e-05, 'epoch': 0.37} 10%|▉ | 983/10000 [3:32:17<32:25:51, 12.95s/it] 10%|▉ | 984/10000 [3:32:30<32:23:41, 12.93s/it] {'loss': 0.0053, 'learning_rate': 4.5090000000000004e-05, 'epoch': 0.37} 10%|▉ | 984/10000 [3:32:30<32:23:41, 12.93s/it] 10%|▉ | 985/10000 [3:32:43<32:25:39, 12.95s/it] {'loss': 0.0099, 'learning_rate': 4.5085e-05, 'epoch': 0.37} 10%|▉ | 985/10000 [3:32:43<32:25:39, 12.95s/it] 10%|▉ | 986/10000 [3:32:56<32:22:33, 12.93s/it] {'loss': 0.0108, 'learning_rate': 4.508e-05, 'epoch': 0.37} 10%|▉ | 986/10000 [3:32:56<32:22:33, 12.93s/it] 10%|▉ | 987/10000 [3:33:09<32:21:48, 12.93s/it] {'loss': 0.0098, 'learning_rate': 4.5075e-05, 'epoch': 0.37} 10%|▉ | 987/10000 [3:33:09<32:21:48, 12.93s/it] 10%|▉ | 988/10000 [3:33:22<32:22:37, 12.93s/it] {'loss': 0.0098, 'learning_rate': 4.507e-05, 'epoch': 0.37} 10%|▉ | 988/10000 [3:33:22<32:22:37, 12.93s/it] 10%|▉ | 989/10000 [3:33:35<32:22:19, 12.93s/it] {'loss': 0.0079, 'learning_rate': 4.5065e-05, 'epoch': 0.37} 10%|▉ | 989/10000 [3:33:35<32:22:19, 12.93s/it] 10%|▉ | 990/10000 [3:33:48<32:20:30, 12.92s/it] {'loss': 0.0067, 'learning_rate': 4.506e-05, 'epoch': 0.37} 10%|▉ | 990/10000 [3:33:48<32:20:30, 12.92s/it] 10%|▉ | 991/10000 [3:34:01<32:22:18, 12.94s/it] {'loss': 0.0068, 'learning_rate': 4.5055e-05, 'epoch': 0.37} 10%|▉ | 991/10000 [3:34:01<32:22:18, 12.94s/it] 10%|▉ | 992/10000 [3:34:14<32:21:50, 12.93s/it] {'loss': 0.0096, 'learning_rate': 
4.5050000000000004e-05, 'epoch': 0.37} 10%|▉ | 992/10000 [3:34:14<32:21:50, 12.93s/it] 10%|▉ | 993/10000 [3:34:27<32:24:01, 12.95s/it] {'loss': 0.0103, 'learning_rate': 4.504500000000001e-05, 'epoch': 0.37} 10%|▉ | 993/10000 [3:34:27<32:24:01, 12.95s/it] 10%|▉ | 994/10000 [3:34:40<32:24:48, 12.96s/it] {'loss': 0.0075, 'learning_rate': 4.504e-05, 'epoch': 0.37} 10%|▉ | 994/10000 [3:34:40<32:24:48, 12.96s/it] 10%|▉ | 995/10000 [3:34:53<32:22:22, 12.94s/it] {'loss': 0.0077, 'learning_rate': 4.5035e-05, 'epoch': 0.37} 10%|▉ | 995/10000 [3:34:53<32:22:22, 12.94s/it] 10%|▉ | 996/10000 [3:35:05<32:18:17, 12.92s/it] {'loss': 0.013, 'learning_rate': 4.503e-05, 'epoch': 0.38} 10%|▉ | 996/10000 [3:35:06<32:18:17, 12.92s/it] 10%|▉ | 997/10000 [3:35:18<32:17:35, 12.91s/it] {'loss': 0.0099, 'learning_rate': 4.5025000000000003e-05, 'epoch': 0.38} 10%|▉ | 997/10000 [3:35:18<32:17:35, 12.91s/it] 10%|▉ | 998/10000 [3:35:31<32:19:35, 12.93s/it] {'loss': 0.0088, 'learning_rate': 4.502e-05, 'epoch': 0.38} 10%|▉ | 998/10000 [3:35:31<32:19:35, 12.93s/it] 10%|▉ | 999/10000 [3:35:44<32:20:00, 12.93s/it] {'loss': 0.0086, 'learning_rate': 4.5015e-05, 'epoch': 0.38} 10%|▉ | 999/10000 [3:35:44<32:20:00, 12.93s/it] 10%|█ | 1000/10000 [3:35:57<32:18:19, 12.92s/it] {'loss': 0.0076, 'learning_rate': 4.5010000000000004e-05, 'epoch': 0.38} 10%|█ | 1000/10000 [3:35:57<32:18:19, 12.92s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-06 00:00:55,774 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/config.json [INFO|configuration_utils.py:364] 2024-11-06 00:00:55,776 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-06 00:01:42,728 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-06 00:01:42,730 >> tokenizer config file saved in 
output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-06 00:01:42,731 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/special_tokens_map.json [2024-11-06 00:01:42,740] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is about to be saved! /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 
[2024-11-06 00:01:43,023] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt [2024-11-06 00:01:43,031] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt... [2024-11-06 00:02:52,098] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt. [2024-11-06 00:02:52,192] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-06 00:04:56,002] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-06 00:04:56,005] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-06 00:04:56,005] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1000 is ready now! 
10%|█ | 1001/10000 [3:40:10<212:24:11, 84.97s/it] {'loss': 0.0062, 'learning_rate': 4.5005e-05, 'epoch': 0.38} 10%|█ | 1001/10000 [3:40:10<212:24:11, 84.97s/it] 10%|█ | 1002/10000 [3:40:23<158:16:12, 63.32s/it] {'loss': 0.0076, 'learning_rate': 4.5e-05, 'epoch': 0.38} 10%|█ | 1002/10000 [3:40:23<158:16:12, 63.32s/it][2024-11-06 00:05:33,126] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 10%|█ | 1003/10000 [3:40:35<119:24:33, 47.78s/it] {'loss': 0.0075, 'learning_rate': 4.5e-05, 'epoch': 0.38} 10%|█ | 1003/10000 [3:40:35<119:24:33, 47.78s/it][2024-11-06 00:05:44,647] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 10%|█ | 1004/10000 [3:40:46<92:12:52, 36.90s/it] {'loss': 0.0075, 'learning_rate': 4.5e-05, 'epoch': 0.38} 10%|█ | 1004/10000 [3:40:46<92:12:52, 36.90s/it] 10%|█ | 1005/10000 [3:40:59<74:15:33, 29.72s/it] {'loss': 0.0058, 'learning_rate': 4.4995000000000005e-05, 'epoch': 0.38} 10%|█ | 1005/10000 [3:40:59<74:15:33, 29.72s/it] 10%|█ | 1006/10000 [3:41:12<61:36:11, 24.66s/it] {'loss': 0.0089, 'learning_rate': 4.499e-05, 'epoch': 0.38} 10%|█ | 1006/10000 [3:41:12<61:36:11, 24.66s/it] 10%|█ | 1007/10000 [3:41:25<52:47:45, 21.13s/it] {'loss': 0.007, 'learning_rate': 4.4985000000000004e-05, 'epoch': 0.38} 10%|█ | 1007/10000 [3:41:25<52:47:45, 21.13s/it] 10%|█ | 1008/10000 [3:41:38<46:35:38, 18.65s/it] {'loss': 0.0064, 'learning_rate': 4.498e-05, 'epoch': 0.38} 10%|█ | 1008/10000 [3:41:38<46:35:38, 18.65s/it] 10%|█ | 1009/10000 [3:41:51<42:16:26, 16.93s/it] {'loss': 0.0076, 'learning_rate': 4.4975e-05, 'epoch': 0.38} 10%|█ | 1009/10000 [3:41:51<42:16:26, 16.93s/it] 10%|█ | 1010/10000 [3:42:03<39:15:10, 15.72s/it] {'loss': 0.0069, 'learning_rate': 4.497e-05, 'epoch': 0.38} 10%|█ | 1010/10000 [3:42:04<39:15:10, 15.72s/it] 10%|█ | 1011/10000 [3:42:16<37:07:10, 14.87s/it] 
{'loss': 0.0123, 'learning_rate': 4.4965e-05, 'epoch': 0.38} 10%|█ | 1011/10000 [3:42:16<37:07:10, 14.87s/it] 10%|█ | 1012/10000 [3:42:29<35:40:41, 14.29s/it] {'loss': 0.0099, 'learning_rate': 4.496e-05, 'epoch': 0.38} 10%|█ | 1012/10000 [3:42:29<35:40:41, 14.29s/it] 10%|█ | 1013/10000 [3:42:42<34:38:09, 13.87s/it] {'loss': 0.0068, 'learning_rate': 4.4955000000000006e-05, 'epoch': 0.38} 10%|█ | 1013/10000 [3:42:42<34:38:09, 13.87s/it] 10%|█ | 1014/10000 [3:42:55<33:59:26, 13.62s/it] {'loss': 0.0082, 'learning_rate': 4.495e-05, 'epoch': 0.38} 10%|█ | 1014/10000 [3:42:55<33:59:26, 13.62s/it] 10%|█ | 1015/10000 [3:43:08<33:26:21, 13.40s/it] {'loss': 0.0097, 'learning_rate': 4.4945000000000004e-05, 'epoch': 0.38} 10%|█ | 1015/10000 [3:43:08<33:26:21, 13.40s/it] 10%|█ | 1016/10000 [3:43:21<33:03:19, 13.25s/it] {'loss': 0.0065, 'learning_rate': 4.494000000000001e-05, 'epoch': 0.38} 10%|█ | 1016/10000 [3:43:21<33:03:19, 13.25s/it] 10%|█ | 1017/10000 [3:43:34<32:46:27, 13.13s/it] {'loss': 0.0083, 'learning_rate': 4.4935e-05, 'epoch': 0.38} 10%|█ | 1017/10000 [3:43:34<32:46:27, 13.13s/it] 10%|█ | 1018/10000 [3:43:47<32:33:22, 13.05s/it] {'loss': 0.0084, 'learning_rate': 4.493e-05, 'epoch': 0.38} 10%|█ | 1018/10000 [3:43:47<32:33:22, 13.05s/it] 10%|█ | 1019/10000 [3:44:00<32:25:20, 13.00s/it] {'loss': 0.0087, 'learning_rate': 4.4925e-05, 'epoch': 0.38} 10%|█ | 1019/10000 [3:44:00<32:25:20, 13.00s/it] 10%|█ | 1020/10000 [3:44:13<32:20:37, 12.97s/it] {'loss': 0.0083, 'learning_rate': 4.4920000000000004e-05, 'epoch': 0.38} 10%|█ | 1020/10000 [3:44:13<32:20:37, 12.97s/it] 10%|█ | 1021/10000 [3:44:25<32:17:57, 12.95s/it] {'loss': 0.0064, 'learning_rate': 4.4915e-05, 'epoch': 0.38} 10%|█ | 1021/10000 [3:44:25<32:17:57, 12.95s/it] 10%|█ | 1022/10000 [3:44:38<32:17:25, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.491e-05, 'epoch': 0.39} 10%|█ | 1022/10000 [3:44:38<32:17:25, 12.95s/it] 10%|█ | 1023/10000 [3:44:51<32:16:32, 12.94s/it] {'loss': 0.0067, 'learning_rate': 
4.4905000000000005e-05, 'epoch': 0.39} 10%|█ | 1023/10000 [3:44:51<32:16:32, 12.94s/it] 10%|█ | 1024/10000 [3:45:04<32:14:57, 12.93s/it] {'loss': 0.013, 'learning_rate': 4.49e-05, 'epoch': 0.39} 10%|█ | 1024/10000 [3:45:04<32:14:57, 12.93s/it] 10%|█ | 1025/10000 [3:45:17<32:17:26, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.4895e-05, 'epoch': 0.39} 10%|█ | 1025/10000 [3:45:17<32:17:26, 12.95s/it] 10%|█ | 1026/10000 [3:45:30<32:17:26, 12.95s/it] {'loss': 0.0091, 'learning_rate': 4.4890000000000006e-05, 'epoch': 0.39} 10%|█ | 1026/10000 [3:45:30<32:17:26, 12.95s/it] 10%|█ | 1027/10000 [3:45:43<32:18:50, 12.96s/it] {'loss': 0.0063, 'learning_rate': 4.488500000000001e-05, 'epoch': 0.39} 10%|█ | 1027/10000 [3:45:43<32:18:50, 12.96s/it] 10%|█ | 1028/10000 [3:45:56<32:14:34, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.488e-05, 'epoch': 0.39} 10%|█ | 1028/10000 [3:45:56<32:14:34, 12.94s/it] 10%|█ | 1029/10000 [3:46:09<32:11:14, 12.92s/it] {'loss': 0.0108, 'learning_rate': 4.4875e-05, 'epoch': 0.39} 10%|█ | 1029/10000 [3:46:09<32:11:14, 12.92s/it] 10%|█ | 1030/10000 [3:46:22<32:05:59, 12.88s/it] {'loss': 0.0064, 'learning_rate': 4.487e-05, 'epoch': 0.39} 10%|█ | 1030/10000 [3:46:22<32:05:59, 12.88s/it] 10%|█ | 1031/10000 [3:46:35<32:04:55, 12.88s/it] {'loss': 0.0081, 'learning_rate': 4.4865e-05, 'epoch': 0.39} 10%|█ | 1031/10000 [3:46:35<32:04:55, 12.88s/it] 10%|█ | 1032/10000 [3:46:47<32:03:08, 12.87s/it] {'loss': 0.0134, 'learning_rate': 4.486e-05, 'epoch': 0.39} 10%|█ | 1032/10000 [3:46:47<32:03:08, 12.87s/it] 10%|█ | 1033/10000 [3:47:00<32:01:16, 12.86s/it] {'loss': 0.0065, 'learning_rate': 4.4855e-05, 'epoch': 0.39} 10%|█ | 1033/10000 [3:47:00<32:01:16, 12.86s/it] 10%|█ | 1034/10000 [3:47:13<32:03:55, 12.87s/it] {'loss': 0.0073, 'learning_rate': 4.4850000000000006e-05, 'epoch': 0.39} 10%|█ | 1034/10000 [3:47:13<32:03:55, 12.87s/it] 10%|█ | 1035/10000 [3:47:26<32:09:31, 12.91s/it] {'loss': 0.0101, 'learning_rate': 4.4845e-05, 'epoch': 0.39} 10%|█ | 1035/10000 
[3:47:26<32:09:31, 12.91s/it] 10%|█ | 1036/10000 [3:47:39<32:08:42, 12.91s/it] {'loss': 0.0062, 'learning_rate': 4.4840000000000004e-05, 'epoch': 0.39} 10%|█ | 1036/10000 [3:47:39<32:08:42, 12.91s/it] 10%|█ | 1037/10000 [3:47:52<32:10:11, 12.92s/it] {'loss': 0.0079, 'learning_rate': 4.483500000000001e-05, 'epoch': 0.39} 10%|█ | 1037/10000 [3:47:52<32:10:11, 12.92s/it] 10%|█ | 1038/10000 [3:48:05<32:05:59, 12.89s/it] {'loss': 0.0077, 'learning_rate': 4.483e-05, 'epoch': 0.39} 10%|█ | 1038/10000 [3:48:05<32:05:59, 12.89s/it] 10%|█ | 1039/10000 [3:48:18<32:06:01, 12.90s/it] {'loss': 0.0055, 'learning_rate': 4.4825e-05, 'epoch': 0.39} 10%|█ | 1039/10000 [3:48:18<32:06:01, 12.90s/it] 10%|█ | 1040/10000 [3:48:31<32:06:14, 12.90s/it] {'loss': 0.0088, 'learning_rate': 4.482e-05, 'epoch': 0.39} 10%|█ | 1040/10000 [3:48:31<32:06:14, 12.90s/it] 10%|█ | 1041/10000 [3:48:44<32:05:32, 12.90s/it] {'loss': 0.0062, 'learning_rate': 4.4815000000000004e-05, 'epoch': 0.39} 10%|█ | 1041/10000 [3:48:44<32:05:32, 12.90s/it] 10%|█ | 1042/10000 [3:48:56<32:07:56, 12.91s/it] {'loss': 0.0071, 'learning_rate': 4.481e-05, 'epoch': 0.39} 10%|█ | 1042/10000 [3:48:57<32:07:56, 12.91s/it] 10%|█ | 1043/10000 [3:49:09<32:11:58, 12.94s/it] {'loss': 0.0139, 'learning_rate': 4.4805e-05, 'epoch': 0.39} 10%|█ | 1043/10000 [3:49:10<32:11:58, 12.94s/it] 10%|█ | 1044/10000 [3:49:22<32:10:36, 12.93s/it] {'loss': 0.0064, 'learning_rate': 4.4800000000000005e-05, 'epoch': 0.39} 10%|█ | 1044/10000 [3:49:22<32:10:36, 12.93s/it] 10%|█ | 1045/10000 [3:49:35<32:09:36, 12.93s/it] {'loss': 0.007, 'learning_rate': 4.4795e-05, 'epoch': 0.39} 10%|█ | 1045/10000 [3:49:35<32:09:36, 12.93s/it] 10%|█ | 1046/10000 [3:49:48<32:10:36, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.479e-05, 'epoch': 0.39} 10%|█ | 1046/10000 [3:49:48<32:10:36, 12.94s/it] 10%|█ | 1047/10000 [3:50:01<32:07:22, 12.92s/it] {'loss': 0.0072, 'learning_rate': 4.4785000000000006e-05, 'epoch': 0.39} 10%|█ | 1047/10000 [3:50:01<32:07:22, 12.92s/it] 10%|█ | 
1048/10000 [3:50:14<32:06:52, 12.91s/it] {'loss': 0.0064, 'learning_rate': 4.478e-05, 'epoch': 0.39} 10%|█ | 1048/10000 [3:50:14<32:06:52, 12.91s/it] 10%|█ | 1049/10000 [3:50:27<32:06:00, 12.91s/it] {'loss': 0.0078, 'learning_rate': 4.4775e-05, 'epoch': 0.4} 10%|█ | 1049/10000 [3:50:27<32:06:00, 12.91s/it] 10%|█ | 1050/10000 [3:50:40<32:03:42, 12.90s/it] {'loss': 0.0061, 'learning_rate': 4.477e-05, 'epoch': 0.4} 10%|█ | 1050/10000 [3:50:40<32:03:42, 12.90s/it] 11%|█ | 1051/10000 [3:50:53<32:01:38, 12.88s/it] {'loss': 0.0088, 'learning_rate': 4.4765e-05, 'epoch': 0.4} 11%|█ | 1051/10000 [3:50:53<32:01:38, 12.88s/it] 11%|█ | 1052/10000 [3:51:06<32:00:11, 12.88s/it] {'loss': 0.007, 'learning_rate': 4.4760000000000005e-05, 'epoch': 0.4} 11%|█ | 1052/10000 [3:51:06<32:00:11, 12.88s/it] 11%|█ | 1053/10000 [3:51:18<32:02:31, 12.89s/it] {'loss': 0.0093, 'learning_rate': 4.4755e-05, 'epoch': 0.4} 11%|█ | 1053/10000 [3:51:19<32:02:31, 12.89s/it] 11%|█ | 1054/10000 [3:51:31<32:04:47, 12.91s/it] {'loss': 0.0087, 'learning_rate': 4.4750000000000004e-05, 'epoch': 0.4} 11%|█ | 1054/10000 [3:51:31<32:04:47, 12.91s/it] 11%|█ | 1055/10000 [3:51:44<32:06:28, 12.92s/it] {'loss': 0.0059, 'learning_rate': 4.4745000000000006e-05, 'epoch': 0.4} 11%|█ | 1055/10000 [3:51:44<32:06:28, 12.92s/it] 11%|█ | 1056/10000 [3:51:57<32:08:29, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.474e-05, 'epoch': 0.4} 11%|█ | 1056/10000 [3:51:57<32:08:29, 12.94s/it] 11%|█ | 1057/10000 [3:52:10<32:14:29, 12.98s/it] {'loss': 0.0074, 'learning_rate': 4.4735000000000005e-05, 'epoch': 0.4} 11%|█ | 1057/10000 [3:52:10<32:14:29, 12.98s/it] 11%|█ | 1058/10000 [3:52:23<32:16:49, 13.00s/it] {'loss': 0.0059, 'learning_rate': 4.473e-05, 'epoch': 0.4} 11%|█ | 1058/10000 [3:52:23<32:16:49, 13.00s/it] 11%|█ | 1059/10000 [3:52:36<32:16:42, 13.00s/it] {'loss': 0.0059, 'learning_rate': 4.4725e-05, 'epoch': 0.4} 11%|█ | 1059/10000 [3:52:37<32:16:42, 13.00s/it] 11%|█ | 1060/10000 [3:52:49<32:16:35, 13.00s/it] {'loss': 0.0064, 
'learning_rate': 4.472e-05, 'epoch': 0.4} 11%|█ | 1060/10000 [3:52:50<32:16:35, 13.00s/it] 11%|█ | 1061/10000 [3:53:02<32:14:45, 12.99s/it] {'loss': 0.0059, 'learning_rate': 4.4715e-05, 'epoch': 0.4} 11%|█ | 1061/10000 [3:53:02<32:14:45, 12.99s/it] 11%|█ | 1062/10000 [3:53:15<32:17:33, 13.01s/it] {'loss': 0.0071, 'learning_rate': 4.4710000000000004e-05, 'epoch': 0.4} 11%|█ | 1062/10000 [3:53:16<32:17:33, 13.01s/it] 11%|█ | 1063/10000 [3:53:29<32:19:02, 13.02s/it] {'loss': 0.0057, 'learning_rate': 4.4705e-05, 'epoch': 0.4} 11%|█ | 1063/10000 [3:53:29<32:19:02, 13.02s/it] 11%|█ | 1064/10000 [3:53:41<32:12:59, 12.98s/it] {'loss': 0.0062, 'learning_rate': 4.47e-05, 'epoch': 0.4} 11%|█ | 1064/10000 [3:53:41<32:12:59, 12.98s/it] 11%|█ | 1065/10000 [3:53:54<32:15:29, 13.00s/it] {'loss': 0.0093, 'learning_rate': 4.4695000000000005e-05, 'epoch': 0.4} 11%|█ | 1065/10000 [3:53:54<32:15:29, 13.00s/it] 11%|█ | 1066/10000 [3:54:07<32:11:39, 12.97s/it] {'loss': 0.0056, 'learning_rate': 4.469e-05, 'epoch': 0.4} 11%|█ | 1066/10000 [3:54:07<32:11:39, 12.97s/it] 11%|█ | 1067/10000 [3:54:20<32:10:09, 12.96s/it] {'loss': 0.0075, 'learning_rate': 4.4685e-05, 'epoch': 0.4} 11%|█ | 1067/10000 [3:54:20<32:10:09, 12.96s/it] 11%|█ | 1068/10000 [3:54:33<32:09:44, 12.96s/it] {'loss': 0.0056, 'learning_rate': 4.468e-05, 'epoch': 0.4} 11%|█ | 1068/10000 [3:54:33<32:09:44, 12.96s/it] 11%|█ | 1069/10000 [3:54:46<32:09:58, 12.97s/it] {'loss': 0.0071, 'learning_rate': 4.4675e-05, 'epoch': 0.4} 11%|█ | 1069/10000 [3:54:46<32:09:58, 12.97s/it] 11%|█ | 1070/10000 [3:54:59<32:09:06, 12.96s/it] {'loss': 0.007, 'learning_rate': 4.467e-05, 'epoch': 0.4} 11%|█ | 1070/10000 [3:54:59<32:09:06, 12.96s/it] 11%|█ | 1071/10000 [3:55:12<32:10:36, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.4665e-05, 'epoch': 0.4} 11%|█ | 1071/10000 [3:55:12<32:10:36, 12.97s/it] 11%|█ | 1072/10000 [3:55:25<32:09:50, 12.97s/it] {'loss': 0.0054, 'learning_rate': 4.466e-05, 'epoch': 0.4} 11%|█ | 1072/10000 [3:55:25<32:09:50, 
12.97s/it] 11%|█ | 1073/10000 [3:55:38<32:12:22, 12.99s/it] {'loss': 0.0046, 'learning_rate': 4.4655000000000005e-05, 'epoch': 0.4} 11%|█ | 1073/10000 [3:55:38<32:12:22, 12.99s/it] 11%|█ | 1074/10000 [3:55:51<32:09:18, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.465e-05, 'epoch': 0.4} 11%|█ | 1074/10000 [3:55:51<32:09:18, 12.97s/it] 11%|█ | 1075/10000 [3:56:04<32:07:26, 12.96s/it] {'loss': 0.0383, 'learning_rate': 4.4645000000000004e-05, 'epoch': 0.41} 11%|█ | 1075/10000 [3:56:04<32:07:26, 12.96s/it] 11%|█ | 1076/10000 [3:56:17<32:05:34, 12.95s/it] {'loss': 0.0093, 'learning_rate': 4.4640000000000006e-05, 'epoch': 0.41} 11%|█ | 1076/10000 [3:56:17<32:05:34, 12.95s/it] 11%|█ | 1077/10000 [3:56:30<32:07:51, 12.96s/it] {'loss': 0.0067, 'learning_rate': 4.4635e-05, 'epoch': 0.41} 11%|█ | 1077/10000 [3:56:30<32:07:51, 12.96s/it] 11%|█ | 1078/10000 [3:56:43<32:07:45, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.463e-05, 'epoch': 0.41} 11%|█ | 1078/10000 [3:56:43<32:07:45, 12.96s/it] 11%|█ | 1079/10000 [3:56:56<32:06:05, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.4625e-05, 'epoch': 0.41} 11%|█ | 1079/10000 [3:56:56<32:06:05, 12.95s/it] 11%|█ | 1080/10000 [3:57:09<32:06:50, 12.96s/it] {'loss': 0.0074, 'learning_rate': 4.462e-05, 'epoch': 0.41} 11%|█ | 1080/10000 [3:57:09<32:06:50, 12.96s/it] 11%|█ | 1081/10000 [3:57:22<32:05:03, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.4615e-05, 'epoch': 0.41} 11%|█ | 1081/10000 [3:57:22<32:05:03, 12.95s/it] 11%|█ | 1082/10000 [3:57:35<32:04:28, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.461e-05, 'epoch': 0.41} 11%|█ | 1082/10000 [3:57:35<32:04:28, 12.95s/it] 11%|█ | 1083/10000 [3:57:48<32:06:24, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.4605000000000004e-05, 'epoch': 0.41} 11%|█ | 1083/10000 [3:57:48<32:06:24, 12.96s/it] 11%|█ | 1084/10000 [3:58:01<32:08:24, 12.98s/it] {'loss': 0.0073, 'learning_rate': 4.46e-05, 'epoch': 0.41} 11%|█ | 1084/10000 [3:58:01<32:08:24, 12.98s/it] 11%|█ | 1085/10000 [3:58:14<32:08:51, 
12.98s/it] {'loss': 0.0059, 'learning_rate': 4.4595e-05, 'epoch': 0.41} 11%|█ | 1085/10000 [3:58:14<32:08:51, 12.98s/it] 11%|█ | 1086/10000 [3:58:27<32:10:31, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.4590000000000005e-05, 'epoch': 0.41} 11%|█ | 1086/10000 [3:58:27<32:10:31, 12.99s/it] 11%|█ | 1087/10000 [3:58:40<32:11:48, 13.00s/it] {'loss': 0.0069, 'learning_rate': 4.458500000000001e-05, 'epoch': 0.41} 11%|█ | 1087/10000 [3:58:40<32:11:48, 13.00s/it] 11%|█ | 1088/10000 [3:58:53<32:07:43, 12.98s/it] {'loss': 0.0082, 'learning_rate': 4.458e-05, 'epoch': 0.41} 11%|█ | 1088/10000 [3:58:53<32:07:43, 12.98s/it] 11%|█ | 1089/10000 [3:59:06<32:05:22, 12.96s/it] {'loss': 0.0075, 'learning_rate': 4.4575e-05, 'epoch': 0.41} 11%|█ | 1089/10000 [3:59:06<32:05:22, 12.96s/it] 11%|█ | 1090/10000 [3:59:19<32:06:48, 12.98s/it] {'loss': 0.0087, 'learning_rate': 4.457e-05, 'epoch': 0.41} 11%|█ | 1090/10000 [3:59:19<32:06:48, 12.98s/it] 11%|█ | 1091/10000 [3:59:32<32:05:26, 12.97s/it] {'loss': 0.0065, 'learning_rate': 4.4565000000000004e-05, 'epoch': 0.41} 11%|█ | 1091/10000 [3:59:32<32:05:26, 12.97s/it] 11%|█ | 1092/10000 [3:59:45<32:08:10, 12.99s/it] {'loss': 0.0051, 'learning_rate': 4.456e-05, 'epoch': 0.41} 11%|█ | 1092/10000 [3:59:45<32:08:10, 12.99s/it] 11%|█ | 1093/10000 [3:59:58<32:07:45, 12.99s/it] {'loss': 0.0066, 'learning_rate': 4.4555e-05, 'epoch': 0.41} 11%|█ | 1093/10000 [3:59:58<32:07:45, 12.99s/it] 11%|█ | 1094/10000 [4:00:11<32:05:53, 12.97s/it] {'loss': 0.006, 'learning_rate': 4.4550000000000005e-05, 'epoch': 0.41} 11%|█ | 1094/10000 [4:00:11<32:05:53, 12.97s/it] 11%|█ | 1095/10000 [4:00:24<32:07:15, 12.99s/it] {'loss': 0.0068, 'learning_rate': 4.4545e-05, 'epoch': 0.41} 11%|█ | 1095/10000 [4:00:24<32:07:15, 12.99s/it] 11%|█ | 1096/10000 [4:00:37<32:07:49, 12.99s/it] {'loss': 0.0083, 'learning_rate': 4.4540000000000004e-05, 'epoch': 0.41} 11%|█ | 1096/10000 [4:00:37<32:07:49, 12.99s/it] 11%|█ | 1097/10000 [4:00:49<32:04:42, 12.97s/it] {'loss': 0.0065, 
'learning_rate': 4.4535000000000006e-05, 'epoch': 0.41} 11%|█ | 1097/10000 [4:00:50<32:04:42, 12.97s/it] 11%|█ | 1098/10000 [4:01:02<32:03:47, 12.97s/it] {'loss': 0.0069, 'learning_rate': 4.453e-05, 'epoch': 0.41} 11%|█ | 1098/10000 [4:01:02<32:03:47, 12.97s/it] 11%|█ | 1099/10000 [4:01:15<32:05:40, 12.98s/it] {'loss': 0.0074, 'learning_rate': 4.4525e-05, 'epoch': 0.41} 11%|█ | 1099/10000 [4:01:15<32:05:40, 12.98s/it] 11%|█ | 1100/10000 [4:01:28<32:01:56, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.452e-05, 'epoch': 0.41} 11%|█ | 1100/10000 [4:01:28<32:01:56, 12.96s/it] 11%|█ | 1101/10000 [4:01:41<32:03:12, 12.97s/it] {'loss': 0.0065, 'learning_rate': 4.4515e-05, 'epoch': 0.41} 11%|█ | 1101/10000 [4:01:41<32:03:12, 12.97s/it] 11%|█ | 1102/10000 [4:01:54<32:04:21, 12.98s/it] {'loss': 0.0091, 'learning_rate': 4.451e-05, 'epoch': 0.42} 11%|█ | 1102/10000 [4:01:54<32:04:21, 12.98s/it] 11%|█ | 1103/10000 [4:02:07<32:05:51, 12.99s/it] {'loss': 0.0082, 'learning_rate': 4.4505e-05, 'epoch': 0.42} 11%|█ | 1103/10000 [4:02:07<32:05:51, 12.99s/it] 11%|█ | 1104/10000 [4:02:20<32:06:00, 12.99s/it] {'loss': 0.005, 'learning_rate': 4.4500000000000004e-05, 'epoch': 0.42} 11%|█ | 1104/10000 [4:02:20<32:06:00, 12.99s/it] 11%|█ | 1105/10000 [4:02:33<32:05:17, 12.99s/it] {'loss': 0.005, 'learning_rate': 4.4495e-05, 'epoch': 0.42} 11%|█ | 1105/10000 [4:02:33<32:05:17, 12.99s/it] 11%|█ | 1106/10000 [4:02:46<32:05:54, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.449e-05, 'epoch': 0.42} 11%|█ | 1106/10000 [4:02:46<32:05:54, 12.99s/it] 11%|█ | 1107/10000 [4:02:59<32:02:28, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.4485000000000005e-05, 'epoch': 0.42} 11%|█ | 1107/10000 [4:02:59<32:02:28, 12.97s/it] 11%|█ | 1108/10000 [4:03:12<32:00:18, 12.96s/it] {'loss': 0.0073, 'learning_rate': 4.448e-05, 'epoch': 0.42} 11%|█ | 1108/10000 [4:03:12<32:00:18, 12.96s/it] 11%|█ | 1109/10000 [4:03:25<31:56:41, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.4475e-05, 'epoch': 0.42} 11%|█ | 1109/10000 
[4:03:25<31:56:41, 12.93s/it] 11%|█ | 1110/10000 [4:03:38<31:59:11, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.447e-05, 'epoch': 0.42} 11%|█ | 1110/10000 [4:03:38<31:59:11, 12.95s/it] 11%|█ | 1111/10000 [4:03:51<32:02:30, 12.98s/it] {'loss': 0.0053, 'learning_rate': 4.4465e-05, 'epoch': 0.42} 11%|█ | 1111/10000 [4:03:51<32:02:30, 12.98s/it] 11%|█ | 1112/10000 [4:04:04<32:02:07, 12.98s/it] {'loss': 0.0077, 'learning_rate': 4.4460000000000005e-05, 'epoch': 0.42} 11%|█ | 1112/10000 [4:04:04<32:02:07, 12.98s/it] 11%|█ | 1113/10000 [4:04:17<32:02:06, 12.98s/it] {'loss': 0.0077, 'learning_rate': 4.4455e-05, 'epoch': 0.42} 11%|█ | 1113/10000 [4:04:17<32:02:06, 12.98s/it] 11%|█ | 1114/10000 [4:04:30<32:02:55, 12.98s/it] {'loss': 0.0065, 'learning_rate': 4.445e-05, 'epoch': 0.42} 11%|█ | 1114/10000 [4:04:30<32:02:55, 12.98s/it] 11%|█ | 1115/10000 [4:04:43<32:02:55, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.4445000000000006e-05, 'epoch': 0.42} 11%|█ | 1115/10000 [4:04:43<32:02:55, 12.99s/it] 11%|█ | 1116/10000 [4:04:56<32:03:21, 12.99s/it] {'loss': 0.005, 'learning_rate': 4.444e-05, 'epoch': 0.42} 11%|█ | 1116/10000 [4:04:56<32:03:21, 12.99s/it] 11%|█ | 1117/10000 [4:05:09<32:03:43, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.4435000000000004e-05, 'epoch': 0.42} 11%|█ | 1117/10000 [4:05:09<32:03:43, 12.99s/it] 11%|█ | 1118/10000 [4:05:22<32:00:25, 12.97s/it] {'loss': 0.0051, 'learning_rate': 4.443e-05, 'epoch': 0.42} 11%|█ | 1118/10000 [4:05:22<32:00:25, 12.97s/it] 11%|█ | 1119/10000 [4:05:35<31:53:21, 12.93s/it] {'loss': 0.0093, 'learning_rate': 4.4425e-05, 'epoch': 0.42} 11%|█ | 1119/10000 [4:05:35<31:53:21, 12.93s/it] 11%|█ | 1120/10000 [4:05:48<31:52:34, 12.92s/it] {'loss': 0.008, 'learning_rate': 4.442e-05, 'epoch': 0.42} 11%|█ | 1120/10000 [4:05:48<31:52:34, 12.92s/it] 11%|█ | 1121/10000 [4:06:01<31:54:57, 12.94s/it] {'loss': 0.0064, 'learning_rate': 4.4415e-05, 'epoch': 0.42} 11%|█ | 1121/10000 [4:06:01<31:54:57, 12.94s/it] 11%|█ | 1122/10000 
[4:06:14<31:57:56, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.4410000000000003e-05, 'epoch': 0.42} 11%|█ | 1122/10000 [4:06:14<31:57:56, 12.96s/it] 11%|█ | 1123/10000 [4:06:27<31:57:35, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.4405e-05, 'epoch': 0.42} 11%|█ | 1123/10000 [4:06:27<31:57:35, 12.96s/it] 11%|█ | 1124/10000 [4:06:40<31:56:03, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.44e-05, 'epoch': 0.42} 11%|█ | 1124/10000 [4:06:40<31:56:03, 12.95s/it] 11%|█▏ | 1125/10000 [4:06:52<31:55:08, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.4395000000000004e-05, 'epoch': 0.42} 11%|█▏ | 1125/10000 [4:06:53<31:55:08, 12.95s/it] 11%|█▏ | 1126/10000 [4:07:05<31:57:29, 12.96s/it] {'loss': 0.0063, 'learning_rate': 4.439000000000001e-05, 'epoch': 0.42} 11%|█▏ | 1126/10000 [4:07:06<31:57:29, 12.96s/it] 11%|█▏ | 1127/10000 [4:07:18<31:59:23, 12.98s/it] {'loss': 0.0068, 'learning_rate': 4.4385e-05, 'epoch': 0.42} 11%|█▏ | 1127/10000 [4:07:19<31:59:23, 12.98s/it] 11%|█▏ | 1128/10000 [4:07:31<31:58:14, 12.97s/it] {'loss': 0.0067, 'learning_rate': 4.438e-05, 'epoch': 0.43} 11%|█▏ | 1128/10000 [4:07:31<31:58:14, 12.97s/it] 11%|█▏ | 1129/10000 [4:07:44<31:57:08, 12.97s/it] {'loss': 0.0059, 'learning_rate': 4.4375e-05, 'epoch': 0.43} 11%|█▏ | 1129/10000 [4:07:44<31:57:08, 12.97s/it] 11%|█▏ | 1130/10000 [4:07:57<31:55:35, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.4370000000000004e-05, 'epoch': 0.43} 11%|█▏ | 1130/10000 [4:07:57<31:55:35, 12.96s/it] 11%|█▏ | 1131/10000 [4:08:10<31:49:29, 12.92s/it] {'loss': 0.0064, 'learning_rate': 4.4365e-05, 'epoch': 0.43} 11%|█▏ | 1131/10000 [4:08:10<31:49:29, 12.92s/it] 11%|█▏ | 1132/10000 [4:08:23<31:49:51, 12.92s/it] {'loss': 0.0066, 'learning_rate': 4.436e-05, 'epoch': 0.43} 11%|█▏ | 1132/10000 [4:08:23<31:49:51, 12.92s/it] 11%|█▏ | 1133/10000 [4:08:36<31:53:22, 12.95s/it] {'loss': 0.007, 'learning_rate': 4.4355000000000005e-05, 'epoch': 0.43} 11%|█▏ | 1133/10000 [4:08:36<31:53:22, 12.95s/it] 11%|█▏ | 1134/10000 [4:08:49<31:53:59, 
12.95s/it] {'loss': 0.0067, 'learning_rate': 4.435e-05, 'epoch': 0.43} 11%|█▏ | 1134/10000 [4:08:49<31:53:59, 12.95s/it] 11%|█▏ | 1135/10000 [4:09:02<31:54:44, 12.96s/it] {'loss': 0.0075, 'learning_rate': 4.4345e-05, 'epoch': 0.43} 11%|█▏ | 1135/10000 [4:09:02<31:54:44, 12.96s/it] 11%|█▏ | 1136/10000 [4:09:15<31:56:06, 12.97s/it] {'loss': 0.0059, 'learning_rate': 4.4340000000000006e-05, 'epoch': 0.43} 11%|█▏ | 1136/10000 [4:09:15<31:56:06, 12.97s/it] 11%|█▏ | 1137/10000 [4:09:28<31:53:43, 12.96s/it] {'loss': 0.0057, 'learning_rate': 4.4335e-05, 'epoch': 0.43} 11%|█▏ | 1137/10000 [4:09:28<31:53:43, 12.96s/it] 11%|█▏ | 1138/10000 [4:09:41<31:50:54, 12.94s/it] {'loss': 0.0653, 'learning_rate': 4.4330000000000004e-05, 'epoch': 0.43} 11%|█▏ | 1138/10000 [4:09:41<31:50:54, 12.94s/it] 11%|█▏ | 1139/10000 [4:09:54<31:51:50, 12.95s/it] {'loss': 0.0326, 'learning_rate': 4.4325e-05, 'epoch': 0.43} 11%|█▏ | 1139/10000 [4:09:54<31:51:50, 12.95s/it] 11%|█▏ | 1140/10000 [4:10:07<31:52:10, 12.95s/it] {'loss': 0.008, 'learning_rate': 4.432e-05, 'epoch': 0.43} 11%|█▏ | 1140/10000 [4:10:07<31:52:10, 12.95s/it] 11%|█▏ | 1141/10000 [4:10:20<31:52:13, 12.95s/it] {'loss': 0.0191, 'learning_rate': 4.4315e-05, 'epoch': 0.43} 11%|█▏ | 1141/10000 [4:10:20<31:52:13, 12.95s/it] 11%|█▏ | 1142/10000 [4:10:33<31:51:18, 12.95s/it] {'loss': 0.0071, 'learning_rate': 4.431e-05, 'epoch': 0.43} 11%|█▏ | 1142/10000 [4:10:33<31:51:18, 12.95s/it] 11%|█▏ | 1143/10000 [4:10:46<31:49:45, 12.94s/it] {'loss': 0.0049, 'learning_rate': 4.4305000000000004e-05, 'epoch': 0.43} 11%|█▏ | 1143/10000 [4:10:46<31:49:45, 12.94s/it] 11%|█▏ | 1144/10000 [4:10:59<31:49:21, 12.94s/it] {'loss': 0.005, 'learning_rate': 4.43e-05, 'epoch': 0.43} 11%|█▏ | 1144/10000 [4:10:59<31:49:21, 12.94s/it] 11%|█▏ | 1145/10000 [4:11:12<31:51:31, 12.95s/it] {'loss': 0.0071, 'learning_rate': 4.4295e-05, 'epoch': 0.43} 11%|█▏ | 1145/10000 [4:11:12<31:51:31, 12.95s/it] 11%|█▏ | 1146/10000 [4:11:24<31:52:55, 12.96s/it] {'loss': 0.0062, 
'learning_rate': 4.4290000000000005e-05, 'epoch': 0.43} 11%|█▏ | 1146/10000 [4:11:25<31:52:55, 12.96s/it] 11%|█▏ | 1147/10000 [4:11:38<31:55:28, 12.98s/it] {'loss': 0.0055, 'learning_rate': 4.428500000000001e-05, 'epoch': 0.43} 11%|█▏ | 1147/10000 [4:11:38<31:55:28, 12.98s/it] 11%|█▏ | 1148/10000 [4:11:51<31:58:22, 13.00s/it] {'loss': 0.0081, 'learning_rate': 4.428e-05, 'epoch': 0.43} 11%|█▏ | 1148/10000 [4:11:51<31:58:22, 13.00s/it] 11%|█▏ | 1149/10000 [4:12:04<31:56:32, 12.99s/it] {'loss': 0.0094, 'learning_rate': 4.4275e-05, 'epoch': 0.43} 11%|█▏ | 1149/10000 [4:12:04<31:56:32, 12.99s/it] 12%|█▏ | 1150/10000 [4:12:16<31:51:44, 12.96s/it] {'loss': 0.0055, 'learning_rate': 4.427e-05, 'epoch': 0.43} 12%|█▏ | 1150/10000 [4:12:16<31:51:44, 12.96s/it] 12%|█▏ | 1151/10000 [4:12:29<31:48:55, 12.94s/it] {'loss': 0.0067, 'learning_rate': 4.4265000000000004e-05, 'epoch': 0.43} 12%|█▏ | 1151/10000 [4:12:29<31:48:55, 12.94s/it] 12%|█▏ | 1152/10000 [4:12:42<31:48:53, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.426e-05, 'epoch': 0.43} 12%|█▏ | 1152/10000 [4:12:42<31:48:53, 12.94s/it] 12%|█▏ | 1153/10000 [4:12:55<31:49:15, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.4255e-05, 'epoch': 0.43} 12%|█▏ | 1153/10000 [4:12:55<31:49:15, 12.95s/it] 12%|█▏ | 1154/10000 [4:13:08<31:48:52, 12.95s/it] {'loss': 0.0079, 'learning_rate': 4.4250000000000005e-05, 'epoch': 0.43} 12%|█▏ | 1154/10000 [4:13:08<31:48:52, 12.95s/it] 12%|█▏ | 1155/10000 [4:13:21<31:46:47, 12.93s/it] {'loss': 0.0084, 'learning_rate': 4.4245e-05, 'epoch': 0.44} 12%|█▏ | 1155/10000 [4:13:21<31:46:47, 12.93s/it] 12%|█▏ | 1156/10000 [4:13:34<31:46:36, 12.93s/it] {'loss': 0.007, 'learning_rate': 4.424e-05, 'epoch': 0.44} 12%|█▏ | 1156/10000 [4:13:34<31:46:36, 12.93s/it] 12%|█▏ | 1157/10000 [4:13:47<31:49:34, 12.96s/it] {'loss': 0.0081, 'learning_rate': 4.4235000000000006e-05, 'epoch': 0.44} 12%|█▏ | 1157/10000 [4:13:47<31:49:34, 12.96s/it] 12%|█▏ | 1158/10000 [4:14:00<31:49:35, 12.96s/it] {'loss': 0.0069, 
'learning_rate': 4.423e-05, 'epoch': 0.44} 12%|█▏ | 1158/10000 [4:14:00<31:49:35, 12.96s/it] 12%|█▏ | 1159/10000 [4:14:13<31:49:02, 12.96s/it] {'loss': 0.0073, 'learning_rate': 4.4225e-05, 'epoch': 0.44} 12%|█▏ | 1159/10000 [4:14:13<31:49:02, 12.96s/it] 12%|█▏ | 1160/10000 [4:14:26<31:47:59, 12.95s/it] {'loss': 0.0051, 'learning_rate': 4.422e-05, 'epoch': 0.44} 12%|█▏ | 1160/10000 [4:14:26<31:47:59, 12.95s/it] 12%|█▏ | 1161/10000 [4:14:39<31:48:45, 12.96s/it] {'loss': 0.0073, 'learning_rate': 4.4215e-05, 'epoch': 0.44} 12%|█▏ | 1161/10000 [4:14:39<31:48:45, 12.96s/it] 12%|█▏ | 1162/10000 [4:14:52<31:48:04, 12.95s/it] {'loss': 0.0073, 'learning_rate': 4.421e-05, 'epoch': 0.44} 12%|█▏ | 1162/10000 [4:14:52<31:48:04, 12.95s/it] 12%|█▏ | 1163/10000 [4:15:05<31:46:53, 12.95s/it] {'loss': 0.0084, 'learning_rate': 4.4205e-05, 'epoch': 0.44} 12%|█▏ | 1163/10000 [4:15:05<31:46:53, 12.95s/it] 12%|█▏ | 1164/10000 [4:15:18<31:43:33, 12.93s/it] {'loss': 0.0068, 'learning_rate': 4.4200000000000004e-05, 'epoch': 0.44} 12%|█▏ | 1164/10000 [4:15:18<31:43:33, 12.93s/it] 12%|█▏ | 1165/10000 [4:15:31<31:47:39, 12.96s/it] {'loss': 0.0056, 'learning_rate': 4.4195000000000006e-05, 'epoch': 0.44} 12%|█▏ | 1165/10000 [4:15:31<31:47:39, 12.96s/it] 12%|█▏ | 1166/10000 [4:15:44<31:46:03, 12.95s/it] {'loss': 0.0048, 'learning_rate': 4.419e-05, 'epoch': 0.44} 12%|█▏ | 1166/10000 [4:15:44<31:46:03, 12.95s/it] 12%|█▏ | 1167/10000 [4:15:57<31:46:59, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.4185000000000005e-05, 'epoch': 0.44} 12%|█▏ | 1167/10000 [4:15:57<31:46:59, 12.95s/it] 12%|█▏ | 1168/10000 [4:16:09<31:45:28, 12.94s/it] {'loss': 0.0084, 'learning_rate': 4.418000000000001e-05, 'epoch': 0.44} 12%|█▏ | 1168/10000 [4:16:09<31:45:28, 12.94s/it] 12%|█▏ | 1169/10000 [4:16:22<31:46:18, 12.95s/it] {'loss': 0.0072, 'learning_rate': 4.4174999999999996e-05, 'epoch': 0.44} 12%|█▏ | 1169/10000 [4:16:22<31:46:18, 12.95s/it] 12%|█▏ | 1170/10000 [4:16:35<31:48:52, 12.97s/it] {'loss': 0.0107, 
'learning_rate': 4.417e-05, 'epoch': 0.44} 12%|█▏ | 1170/10000 [4:16:35<31:48:52, 12.97s/it] 12%|█▏ | 1171/10000 [4:16:48<31:46:59, 12.96s/it] {'loss': 0.0086, 'learning_rate': 4.4165e-05, 'epoch': 0.44} 12%|█▏ | 1171/10000 [4:16:48<31:46:59, 12.96s/it] 12%|█▏ | 1172/10000 [4:17:01<31:45:21, 12.95s/it] {'loss': 0.0074, 'learning_rate': 4.4160000000000004e-05, 'epoch': 0.44} 12%|█▏ | 1172/10000 [4:17:01<31:45:21, 12.95s/it] 12%|█▏ | 1173/10000 [4:17:14<31:39:45, 12.91s/it] {'loss': 0.0084, 'learning_rate': 4.4155e-05, 'epoch': 0.44} 12%|█▏ | 1173/10000 [4:17:14<31:39:45, 12.91s/it] 12%|█▏ | 1174/10000 [4:17:27<31:45:41, 12.96s/it] {'loss': 0.0056, 'learning_rate': 4.415e-05, 'epoch': 0.44} 12%|█▏ | 1174/10000 [4:17:27<31:45:41, 12.96s/it] 12%|█▏ | 1175/10000 [4:17:40<31:45:08, 12.95s/it] {'loss': 0.0096, 'learning_rate': 4.4145000000000005e-05, 'epoch': 0.44} 12%|█▏ | 1175/10000 [4:17:40<31:45:08, 12.95s/it] 12%|█▏ | 1176/10000 [4:17:53<31:48:33, 12.98s/it] {'loss': 0.0083, 'learning_rate': 4.414e-05, 'epoch': 0.44} 12%|█▏ | 1176/10000 [4:17:53<31:48:33, 12.98s/it] 12%|█▏ | 1177/10000 [4:18:06<31:49:27, 12.99s/it] {'loss': 0.0098, 'learning_rate': 4.4135000000000003e-05, 'epoch': 0.44} 12%|█▏ | 1177/10000 [4:18:06<31:49:27, 12.99s/it] 12%|█▏ | 1178/10000 [4:18:19<31:45:49, 12.96s/it] {'loss': 0.0088, 'learning_rate': 4.4130000000000006e-05, 'epoch': 0.44} 12%|█▏ | 1178/10000 [4:18:19<31:45:49, 12.96s/it] 12%|█▏ | 1179/10000 [4:18:32<31:50:17, 12.99s/it] {'loss': 0.0071, 'learning_rate': 4.4125e-05, 'epoch': 0.44} 12%|█▏ | 1179/10000 [4:18:32<31:50:17, 12.99s/it] 12%|█▏ | 1180/10000 [4:18:45<31:45:41, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.412e-05, 'epoch': 0.44} 12%|█▏ | 1180/10000 [4:18:45<31:45:41, 12.96s/it] 12%|█▏ | 1181/10000 [4:18:58<31:41:36, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.4115e-05, 'epoch': 0.44} 12%|█▏ | 1181/10000 [4:18:58<31:41:36, 12.94s/it] 12%|█▏ | 1182/10000 [4:19:11<31:37:59, 12.91s/it] {'loss': 0.0063, 'learning_rate': 
4.411e-05, 'epoch': 0.45} 12%|█▏ | 1182/10000 [4:19:11<31:37:59, 12.91s/it] 12%|█▏ | 1183/10000 [4:19:24<31:38:07, 12.92s/it] {'loss': 0.0071, 'learning_rate': 4.4105e-05, 'epoch': 0.45} 12%|█▏ | 1183/10000 [4:19:24<31:38:07, 12.92s/it] 12%|█▏ | 1184/10000 [4:19:37<31:40:25, 12.93s/it] {'loss': 0.0067, 'learning_rate': 4.41e-05, 'epoch': 0.45} 12%|█▏ | 1184/10000 [4:19:37<31:40:25, 12.93s/it] 12%|█▏ | 1185/10000 [4:19:50<31:41:09, 12.94s/it] {'loss': 0.0078, 'learning_rate': 4.4095000000000004e-05, 'epoch': 0.45} 12%|█▏ | 1185/10000 [4:19:50<31:41:09, 12.94s/it] 12%|█▏ | 1186/10000 [4:20:03<31:40:44, 12.94s/it] {'loss': 0.0065, 'learning_rate': 4.4090000000000006e-05, 'epoch': 0.45} 12%|█▏ | 1186/10000 [4:20:03<31:40:44, 12.94s/it] 12%|█▏ | 1187/10000 [4:20:15<31:40:44, 12.94s/it] {'loss': 0.0066, 'learning_rate': 4.4085e-05, 'epoch': 0.45} 12%|█▏ | 1187/10000 [4:20:16<31:40:44, 12.94s/it] 12%|█▏ | 1188/10000 [4:20:28<31:34:12, 12.90s/it] {'loss': 0.0058, 'learning_rate': 4.4080000000000005e-05, 'epoch': 0.45} 12%|█▏ | 1188/10000 [4:20:28<31:34:12, 12.90s/it] 12%|█▏ | 1189/10000 [4:20:41<31:39:05, 12.93s/it] {'loss': 0.0074, 'learning_rate': 4.4075e-05, 'epoch': 0.45} 12%|█▏ | 1189/10000 [4:20:41<31:39:05, 12.93s/it] 12%|█▏ | 1190/10000 [4:20:54<31:39:34, 12.94s/it] {'loss': 0.0086, 'learning_rate': 4.407e-05, 'epoch': 0.45} 12%|█▏ | 1190/10000 [4:20:54<31:39:34, 12.94s/it] 12%|█▏ | 1191/10000 [4:21:07<31:37:15, 12.92s/it] {'loss': 0.0099, 'learning_rate': 4.4065e-05, 'epoch': 0.45} 12%|█▏ | 1191/10000 [4:21:07<31:37:15, 12.92s/it] 12%|█▏ | 1192/10000 [4:21:20<31:43:04, 12.96s/it] {'loss': 0.0072, 'learning_rate': 4.406e-05, 'epoch': 0.45} 12%|█▏ | 1192/10000 [4:21:20<31:43:04, 12.96s/it] 12%|█▏ | 1193/10000 [4:21:33<31:37:56, 12.93s/it] {'loss': 0.0056, 'learning_rate': 4.4055000000000004e-05, 'epoch': 0.45} 12%|█▏ | 1193/10000 [4:21:33<31:37:56, 12.93s/it] 12%|█▏ | 1194/10000 [4:21:46<31:38:00, 12.93s/it] {'loss': 0.0067, 'learning_rate': 4.405e-05, 'epoch': 
0.45} 12%|█▏ | 1194/10000 [4:21:46<31:38:00, 12.93s/it] 12%|█▏ | 1195/10000 [4:21:59<31:30:51, 12.88s/it] {'loss': 0.0072, 'learning_rate': 4.4045e-05, 'epoch': 0.45} 12%|█▏ | 1195/10000 [4:21:59<31:30:51, 12.88s/it] 12%|█▏ | 1196/10000 [4:22:12<31:31:33, 12.89s/it] {'loss': 0.0083, 'learning_rate': 4.4040000000000005e-05, 'epoch': 0.45} 12%|█▏ | 1196/10000 [4:22:12<31:31:33, 12.89s/it] 12%|█▏ | 1197/10000 [4:22:25<31:32:15, 12.90s/it] {'loss': 0.0057, 'learning_rate': 4.4035e-05, 'epoch': 0.45} 12%|█▏ | 1197/10000 [4:22:25<31:32:15, 12.90s/it] 12%|█▏ | 1198/10000 [4:22:37<31:29:00, 12.88s/it] {'loss': 0.008, 'learning_rate': 4.4030000000000004e-05, 'epoch': 0.45} 12%|█▏ | 1198/10000 [4:22:37<31:29:00, 12.88s/it] 12%|█▏ | 1199/10000 [4:22:50<31:29:16, 12.88s/it] {'loss': 0.0087, 'learning_rate': 4.4025e-05, 'epoch': 0.45} 12%|█▏ | 1199/10000 [4:22:50<31:29:16, 12.88s/it] 12%|█▏ | 1200/10000 [4:23:03<31:31:18, 12.90s/it] {'loss': 0.0078, 'learning_rate': 4.402e-05, 'epoch': 0.45} 12%|█▏ | 1200/10000 [4:23:03<31:31:18, 12.90s/it] 12%|█▏ | 1201/10000 [4:23:16<31:30:59, 12.89s/it] {'loss': 0.0084, 'learning_rate': 4.4015e-05, 'epoch': 0.45} 12%|█▏ | 1201/10000 [4:23:16<31:30:59, 12.89s/it] 12%|█▏ | 1202/10000 [4:23:29<31:32:35, 12.91s/it] {'loss': 0.0068, 'learning_rate': 4.401e-05, 'epoch': 0.45} 12%|█▏ | 1202/10000 [4:23:29<31:32:35, 12.91s/it] 12%|█▏ | 1203/10000 [4:23:42<31:34:56, 12.92s/it] {'loss': 0.0079, 'learning_rate': 4.4005e-05, 'epoch': 0.45} 12%|█▏ | 1203/10000 [4:23:42<31:34:56, 12.92s/it] 12%|█▏ | 1204/10000 [4:23:55<31:38:34, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.45} 12%|█▏ | 1204/10000 [4:23:55<31:38:34, 12.95s/it] 12%|█▏ | 1205/10000 [4:24:08<31:37:55, 12.95s/it] {'loss': 0.0067, 'learning_rate': 4.3995e-05, 'epoch': 0.45} 12%|█▏ | 1205/10000 [4:24:08<31:37:55, 12.95s/it] 12%|█▏ | 1206/10000 [4:24:21<31:37:06, 12.94s/it] {'loss': 0.0086, 'learning_rate': 4.3990000000000004e-05, 'epoch': 0.45} 12%|█▏ | 
1206/10000 [4:24:21<31:37:06, 12.94s/it] 12%|█▏ | 1207/10000 [4:24:34<31:37:57, 12.95s/it] {'loss': 0.0085, 'learning_rate': 4.398500000000001e-05, 'epoch': 0.45} 12%|█▏ | 1207/10000 [4:24:34<31:37:57, 12.95s/it] 12%|█▏ | 1208/10000 [4:24:47<31:41:28, 12.98s/it] {'loss': 0.006, 'learning_rate': 4.398e-05, 'epoch': 0.46} 12%|█▏ | 1208/10000 [4:24:47<31:41:28, 12.98s/it] 12%|█▏ | 1209/10000 [4:25:00<31:42:28, 12.98s/it] {'loss': 0.0068, 'learning_rate': 4.3975e-05, 'epoch': 0.46} 12%|█▏ | 1209/10000 [4:25:00<31:42:28, 12.98s/it] 12%|█▏ | 1210/10000 [4:25:13<31:44:09, 13.00s/it] {'loss': 0.0049, 'learning_rate': 4.397e-05, 'epoch': 0.46} 12%|█▏ | 1210/10000 [4:25:13<31:44:09, 13.00s/it] 12%|█▏ | 1211/10000 [4:25:26<31:41:01, 12.98s/it] {'loss': 0.0071, 'learning_rate': 4.3965000000000003e-05, 'epoch': 0.46} 12%|█▏ | 1211/10000 [4:25:26<31:41:01, 12.98s/it] 12%|█▏ | 1212/10000 [4:25:39<31:35:26, 12.94s/it] {'loss': 0.0041, 'learning_rate': 4.396e-05, 'epoch': 0.46} 12%|█▏ | 1212/10000 [4:25:39<31:35:26, 12.94s/it] 12%|█▏ | 1213/10000 [4:25:52<31:32:22, 12.92s/it] {'loss': 0.0061, 'learning_rate': 4.3955e-05, 'epoch': 0.46} 12%|█▏ | 1213/10000 [4:25:52<31:32:22, 12.92s/it] 12%|█▏ | 1214/10000 [4:26:04<31:29:39, 12.90s/it] {'loss': 0.0076, 'learning_rate': 4.3950000000000004e-05, 'epoch': 0.46} 12%|█▏ | 1214/10000 [4:26:05<31:29:39, 12.90s/it] 12%|█▏ | 1215/10000 [4:26:17<31:26:36, 12.89s/it] {'loss': 0.0102, 'learning_rate': 4.3945e-05, 'epoch': 0.46} 12%|█▏ | 1215/10000 [4:26:17<31:26:36, 12.89s/it] 12%|█▏ | 1216/10000 [4:26:30<31:26:52, 12.89s/it] {'loss': 0.007, 'learning_rate': 4.394e-05, 'epoch': 0.46} 12%|█▏ | 1216/10000 [4:26:30<31:26:52, 12.89s/it] 12%|█▏ | 1217/10000 [4:26:43<31:26:48, 12.89s/it] {'loss': 0.0096, 'learning_rate': 4.3935000000000005e-05, 'epoch': 0.46} 12%|█▏ | 1217/10000 [4:26:43<31:26:48, 12.89s/it] 12%|█▏ | 1218/10000 [4:26:56<31:24:41, 12.88s/it] {'loss': 0.0083, 'learning_rate': 4.393e-05, 'epoch': 0.46} 12%|█▏ | 1218/10000 
[4:26:56<31:24:41, 12.88s/it] 12%|█▏ | 1219/10000 [4:27:09<31:24:00, 12.87s/it] {'loss': 0.0084, 'learning_rate': 4.3925e-05, 'epoch': 0.46} 12%|█▏ | 1219/10000 [4:27:09<31:24:00, 12.87s/it] 12%|█▏ | 1220/10000 [4:27:22<31:22:52, 12.87s/it] {'loss': 0.0392, 'learning_rate': 4.392e-05, 'epoch': 0.46} 12%|█▏ | 1220/10000 [4:27:22<31:22:52, 12.87s/it] 12%|█▏ | 1221/10000 [4:27:35<31:23:37, 12.87s/it] {'loss': 0.0052, 'learning_rate': 4.3915e-05, 'epoch': 0.46} 12%|█▏ | 1221/10000 [4:27:35<31:23:37, 12.87s/it] 12%|█▏ | 1222/10000 [4:27:47<31:21:20, 12.86s/it] {'loss': 0.0072, 'learning_rate': 4.391e-05, 'epoch': 0.46} 12%|█▏ | 1222/10000 [4:27:47<31:21:20, 12.86s/it] 12%|█▏ | 1223/10000 [4:28:00<31:23:21, 12.87s/it] {'loss': 0.0067, 'learning_rate': 4.3905e-05, 'epoch': 0.46} 12%|█▏ | 1223/10000 [4:28:00<31:23:21, 12.87s/it] 12%|█▏ | 1224/10000 [4:28:13<31:24:08, 12.88s/it] {'loss': 0.0062, 'learning_rate': 4.39e-05, 'epoch': 0.46} 12%|█▏ | 1224/10000 [4:28:13<31:24:08, 12.88s/it] 12%|█▏ | 1225/10000 [4:28:26<31:25:37, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.3895000000000006e-05, 'epoch': 0.46} 12%|█▏ | 1225/10000 [4:28:26<31:25:37, 12.89s/it] 12%|█▏ | 1226/10000 [4:28:39<31:27:07, 12.90s/it] {'loss': 0.0103, 'learning_rate': 4.389e-05, 'epoch': 0.46} 12%|█▏ | 1226/10000 [4:28:39<31:27:07, 12.90s/it] 12%|█▏ | 1227/10000 [4:28:52<31:30:00, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.3885000000000004e-05, 'epoch': 0.46} 12%|█▏ | 1227/10000 [4:28:52<31:30:00, 12.93s/it] 12%|█▏ | 1228/10000 [4:29:05<31:34:48, 12.96s/it] {'loss': 0.006, 'learning_rate': 4.388000000000001e-05, 'epoch': 0.46} 12%|█▏ | 1228/10000 [4:29:05<31:34:48, 12.96s/it] 12%|█▏ | 1229/10000 [4:29:18<31:36:08, 12.97s/it] {'loss': 0.0058, 'learning_rate': 4.3875e-05, 'epoch': 0.46} 12%|█▏ | 1229/10000 [4:29:18<31:36:08, 12.97s/it] 12%|█▏ | 1230/10000 [4:29:31<31:34:49, 12.96s/it] {'loss': 0.0078, 'learning_rate': 4.387e-05, 'epoch': 0.46} 12%|█▏ | 1230/10000 [4:29:31<31:34:49, 12.96s/it] 12%|█▏ | 
1231/10000 [4:29:44<31:33:14, 12.95s/it] {'loss': 0.0077, 'learning_rate': 4.3865e-05, 'epoch': 0.46} 12%|█▏ | 1231/10000 [4:29:44<31:33:14, 12.95s/it] 12%|█▏ | 1232/10000 [4:29:57<31:32:36, 12.95s/it] {'loss': 0.0073, 'learning_rate': 4.3860000000000004e-05, 'epoch': 0.46} 12%|█▏ | 1232/10000 [4:29:57<31:32:36, 12.95s/it] 12%|█▏ | 1233/10000 [4:30:10<31:33:04, 12.96s/it] {'loss': 0.0086, 'learning_rate': 4.3855e-05, 'epoch': 0.46} 12%|█▏ | 1233/10000 [4:30:10<31:33:04, 12.96s/it] 12%|█▏ | 1234/10000 [4:30:23<31:33:40, 12.96s/it] {'loss': 0.008, 'learning_rate': 4.385e-05, 'epoch': 0.46} 12%|█▏ | 1234/10000 [4:30:23<31:33:40, 12.96s/it] 12%|█▏ | 1235/10000 [4:30:36<31:34:36, 12.97s/it] {'loss': 0.0051, 'learning_rate': 4.3845000000000005e-05, 'epoch': 0.47} 12%|█▏ | 1235/10000 [4:30:36<31:34:36, 12.97s/it] 12%|█▏ | 1236/10000 [4:30:49<31:33:23, 12.96s/it] {'loss': 0.0083, 'learning_rate': 4.384e-05, 'epoch': 0.47} 12%|█▏ | 1236/10000 [4:30:49<31:33:23, 12.96s/it] 12%|█▏ | 1237/10000 [4:31:02<31:33:51, 12.97s/it] {'loss': 0.0058, 'learning_rate': 4.3835e-05, 'epoch': 0.47} 12%|█▏ | 1237/10000 [4:31:02<31:33:51, 12.97s/it] 12%|█▏ | 1238/10000 [4:31:15<31:34:53, 12.98s/it] {'loss': 0.007, 'learning_rate': 4.3830000000000006e-05, 'epoch': 0.47} 12%|█▏ | 1238/10000 [4:31:15<31:34:53, 12.98s/it] 12%|█▏ | 1239/10000 [4:31:28<31:32:52, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.3825e-05, 'epoch': 0.47} 12%|█▏ | 1239/10000 [4:31:28<31:32:52, 12.96s/it] 12%|█▏ | 1240/10000 [4:31:41<31:32:54, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.382e-05, 'epoch': 0.47} 12%|█▏ | 1240/10000 [4:31:41<31:32:54, 12.97s/it] 12%|█▏ | 1241/10000 [4:31:54<31:31:32, 12.96s/it] {'loss': 0.0062, 'learning_rate': 4.3815e-05, 'epoch': 0.47} 12%|█▏ | 1241/10000 [4:31:54<31:31:32, 12.96s/it] 12%|█▏ | 1242/10000 [4:32:07<31:30:26, 12.95s/it] {'loss': 0.0055, 'learning_rate': 4.381e-05, 'epoch': 0.47} 12%|█▏ | 1242/10000 [4:32:07<31:30:26, 12.95s/it] 12%|█▏ | 1243/10000 [4:32:19<31:30:42, 
12.95s/it] {'loss': 0.0068, 'learning_rate': 4.3805000000000005e-05, 'epoch': 0.47} 12%|█▏ | 1243/10000 [4:32:20<31:30:42, 12.95s/it] 12%|█▏ | 1244/10000 [4:32:32<31:28:12, 12.94s/it] {'loss': 0.0073, 'learning_rate': 4.38e-05, 'epoch': 0.47} 12%|█▏ | 1244/10000 [4:32:32<31:28:12, 12.94s/it] 12%|█▏ | 1245/10000 [4:32:45<31:25:04, 12.92s/it] {'loss': 0.005, 'learning_rate': 4.3795e-05, 'epoch': 0.47} 12%|█▏ | 1245/10000 [4:32:45<31:25:04, 12.92s/it] 12%|█▏ | 1246/10000 [4:32:58<31:24:55, 12.92s/it] {'loss': 0.0078, 'learning_rate': 4.3790000000000006e-05, 'epoch': 0.47} 12%|█▏ | 1246/10000 [4:32:58<31:24:55, 12.92s/it] 12%|█▏ | 1247/10000 [4:33:11<31:26:28, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.3785e-05, 'epoch': 0.47} 12%|█▏ | 1247/10000 [4:33:11<31:26:28, 12.93s/it] 12%|█▏ | 1248/10000 [4:33:24<31:25:55, 12.93s/it] {'loss': 0.006, 'learning_rate': 4.3780000000000004e-05, 'epoch': 0.47} 12%|█▏ | 1248/10000 [4:33:24<31:25:55, 12.93s/it] 12%|█▏ | 1249/10000 [4:33:37<31:24:13, 12.92s/it] {'loss': 0.0055, 'learning_rate': 4.3775e-05, 'epoch': 0.47} 12%|█▏ | 1249/10000 [4:33:37<31:24:13, 12.92s/it] 12%|█▎ | 1250/10000 [4:33:50<31:25:50, 12.93s/it] {'loss': 0.0057, 'learning_rate': 4.377e-05, 'epoch': 0.47} 12%|█▎ | 1250/10000 [4:33:50<31:25:50, 12.93s/it] 13%|█▎ | 1251/10000 [4:34:03<31:24:32, 12.92s/it] {'loss': 0.0067, 'learning_rate': 4.3765e-05, 'epoch': 0.47} 13%|█▎ | 1251/10000 [4:34:03<31:24:32, 12.92s/it] 13%|█▎ | 1252/10000 [4:34:16<31:23:42, 12.92s/it] {'loss': 0.0058, 'learning_rate': 4.376e-05, 'epoch': 0.47} 13%|█▎ | 1252/10000 [4:34:16<31:23:42, 12.92s/it] 13%|█▎ | 1253/10000 [4:34:29<31:23:39, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.3755000000000004e-05, 'epoch': 0.47} 13%|█▎ | 1253/10000 [4:34:29<31:23:39, 12.92s/it] 13%|█▎ | 1254/10000 [4:34:42<31:27:35, 12.95s/it] {'loss': 0.008, 'learning_rate': 4.375e-05, 'epoch': 0.47} 13%|█▎ | 1254/10000 [4:34:42<31:27:35, 12.95s/it] 13%|█▎ | 1255/10000 [4:34:55<31:27:07, 12.95s/it] {'loss': 0.0053, 
'learning_rate': 4.3745e-05, 'epoch': 0.47} 13%|█▎ | 1255/10000 [4:34:55<31:27:07, 12.95s/it] 13%|█▎ | 1256/10000 [4:35:08<31:25:33, 12.94s/it] {'loss': 0.0061, 'learning_rate': 4.3740000000000005e-05, 'epoch': 0.47} 13%|█▎ | 1256/10000 [4:35:08<31:25:33, 12.94s/it] 13%|█▎ | 1257/10000 [4:35:20<31:24:50, 12.94s/it] {'loss': 0.0072, 'learning_rate': 4.3735e-05, 'epoch': 0.47} 13%|█▎ | 1257/10000 [4:35:21<31:24:50, 12.94s/it] 13%|█▎ | 1258/10000 [4:35:33<31:23:16, 12.93s/it] {'loss': 0.0053, 'learning_rate': 4.373e-05, 'epoch': 0.47} 13%|█▎ | 1258/10000 [4:35:33<31:23:16, 12.93s/it] 13%|█▎ | 1259/10000 [4:35:46<31:24:14, 12.93s/it] {'loss': 0.0062, 'learning_rate': 4.3725000000000006e-05, 'epoch': 0.47} 13%|█▎ | 1259/10000 [4:35:46<31:24:14, 12.93s/it] 13%|█▎ | 1260/10000 [4:35:59<31:29:42, 12.97s/it] {'loss': 0.0061, 'learning_rate': 4.372e-05, 'epoch': 0.47} 13%|█▎ | 1260/10000 [4:35:59<31:29:42, 12.97s/it] 13%|█▎ | 1261/10000 [4:36:12<31:29:22, 12.97s/it] {'loss': 0.0069, 'learning_rate': 4.3715e-05, 'epoch': 0.48} 13%|█▎ | 1261/10000 [4:36:12<31:29:22, 12.97s/it] 13%|█▎ | 1262/10000 [4:36:25<31:26:19, 12.95s/it] {'loss': 0.0404, 'learning_rate': 4.371e-05, 'epoch': 0.48} 13%|█▎ | 1262/10000 [4:36:25<31:26:19, 12.95s/it] 13%|█▎ | 1263/10000 [4:36:38<31:24:30, 12.94s/it] {'loss': 0.0059, 'learning_rate': 4.3705e-05, 'epoch': 0.48} 13%|█▎ | 1263/10000 [4:36:38<31:24:30, 12.94s/it] 13%|█▎ | 1264/10000 [4:36:51<31:22:50, 12.93s/it] {'loss': 0.0068, 'learning_rate': 4.3700000000000005e-05, 'epoch': 0.48} 13%|█▎ | 1264/10000 [4:36:51<31:22:50, 12.93s/it] 13%|█▎ | 1265/10000 [4:37:04<31:23:27, 12.94s/it] {'loss': 0.006, 'learning_rate': 4.3695e-05, 'epoch': 0.48} 13%|█▎ | 1265/10000 [4:37:04<31:23:27, 12.94s/it] 13%|█▎ | 1266/10000 [4:37:17<31:26:08, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.3690000000000004e-05, 'epoch': 0.48} 13%|█▎ | 1266/10000 [4:37:17<31:26:08, 12.96s/it] 13%|█▎ | 1267/10000 [4:37:30<31:25:13, 12.95s/it] {'loss': 0.0057, 'learning_rate': 
4.3685000000000006e-05, 'epoch': 0.48} 13%|█▎ | 1267/10000 [4:37:30<31:25:13, 12.95s/it] 13%|█▎ | 1268/10000 [4:37:43<31:26:09, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.368e-05, 'epoch': 0.48} 13%|█▎ | 1268/10000 [4:37:43<31:26:09, 12.96s/it] 13%|█▎ | 1269/10000 [4:37:56<31:29:46, 12.99s/it] {'loss': 0.0114, 'learning_rate': 4.3675000000000005e-05, 'epoch': 0.48} 13%|█▎ | 1269/10000 [4:37:56<31:29:46, 12.99s/it] 13%|█▎ | 1270/10000 [4:38:09<31:27:53, 12.98s/it] {'loss': 0.0067, 'learning_rate': 4.367e-05, 'epoch': 0.48} 13%|█▎ | 1270/10000 [4:38:09<31:27:53, 12.98s/it] 13%|█▎ | 1271/10000 [4:38:22<31:28:52, 12.98s/it] {'loss': 0.0051, 'learning_rate': 4.3665e-05, 'epoch': 0.48} 13%|█▎ | 1271/10000 [4:38:22<31:28:52, 12.98s/it] 13%|█▎ | 1272/10000 [4:38:35<31:25:21, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.366e-05, 'epoch': 0.48} 13%|█▎ | 1272/10000 [4:38:35<31:25:21, 12.96s/it] 13%|█▎ | 1273/10000 [4:38:48<31:20:34, 12.93s/it] {'loss': 0.0071, 'learning_rate': 4.3655e-05, 'epoch': 0.48} 13%|█▎ | 1273/10000 [4:38:48<31:20:34, 12.93s/it] 13%|█▎ | 1274/10000 [4:39:01<31:21:32, 12.94s/it] {'loss': 0.007, 'learning_rate': 4.3650000000000004e-05, 'epoch': 0.48} 13%|█▎ | 1274/10000 [4:39:01<31:21:32, 12.94s/it] 13%|█▎ | 1275/10000 [4:39:14<31:21:37, 12.94s/it] {'loss': 0.0064, 'learning_rate': 4.3645e-05, 'epoch': 0.48} 13%|█▎ | 1275/10000 [4:39:14<31:21:37, 12.94s/it] 13%|█▎ | 1276/10000 [4:39:27<31:22:50, 12.95s/it] {'loss': 0.0045, 'learning_rate': 4.364e-05, 'epoch': 0.48} 13%|█▎ | 1276/10000 [4:39:27<31:22:50, 12.95s/it] 13%|█▎ | 1277/10000 [4:39:40<31:25:26, 12.97s/it] {'loss': 0.0045, 'learning_rate': 4.3635000000000005e-05, 'epoch': 0.48} 13%|█▎ | 1277/10000 [4:39:40<31:25:26, 12.97s/it] 13%|█▎ | 1278/10000 [4:39:53<31:23:29, 12.96s/it] {'loss': 0.0082, 'learning_rate': 4.363000000000001e-05, 'epoch': 0.48} 13%|█▎ | 1278/10000 [4:39:53<31:23:29, 12.96s/it] 13%|█▎ | 1279/10000 [4:40:06<31:25:32, 12.97s/it] {'loss': 0.0056, 'learning_rate': 4.3625e-05, 
'epoch': 0.48} 13%|█▎ | 1279/10000 [4:40:06<31:25:32, 12.97s/it] 13%|█▎ | 1280/10000 [4:40:18<31:22:44, 12.95s/it] {'loss': 0.0075, 'learning_rate': 4.362e-05, 'epoch': 0.48} 13%|█▎ | 1280/10000 [4:40:18<31:22:44, 12.95s/it] 13%|█▎ | 1281/10000 [4:40:31<31:22:02, 12.95s/it] {'loss': 0.0051, 'learning_rate': 4.3615e-05, 'epoch': 0.48} 13%|█▎ | 1281/10000 [4:40:31<31:22:02, 12.95s/it] 13%|█▎ | 1282/10000 [4:40:44<31:19:44, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.361e-05, 'epoch': 0.48} 13%|█▎ | 1282/10000 [4:40:44<31:19:44, 12.94s/it] 13%|█▎ | 1283/10000 [4:40:57<31:19:28, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.3605e-05, 'epoch': 0.48} 13%|█▎ | 1283/10000 [4:40:57<31:19:28, 12.94s/it] 13%|█▎ | 1284/10000 [4:41:10<31:18:21, 12.93s/it] {'loss': 0.0058, 'learning_rate': 4.36e-05, 'epoch': 0.48} 13%|█▎ | 1284/10000 [4:41:10<31:18:21, 12.93s/it] 13%|█▎ | 1285/10000 [4:41:23<31:21:47, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.3595000000000005e-05, 'epoch': 0.48} 13%|█▎ | 1285/10000 [4:41:23<31:21:47, 12.96s/it] 13%|█▎ | 1286/10000 [4:41:36<31:19:20, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.359e-05, 'epoch': 0.48} 13%|█▎ | 1286/10000 [4:41:36<31:19:20, 12.94s/it] 13%|█▎ | 1287/10000 [4:41:49<31:17:26, 12.93s/it] {'loss': 0.0088, 'learning_rate': 4.3585000000000004e-05, 'epoch': 0.48} 13%|█▎ | 1287/10000 [4:41:49<31:17:26, 12.93s/it] 13%|█▎ | 1288/10000 [4:42:02<31:22:26, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.3580000000000006e-05, 'epoch': 0.49} 13%|█▎ | 1288/10000 [4:42:02<31:22:26, 12.96s/it] 13%|█▎ | 1289/10000 [4:42:15<31:25:58, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.3575e-05, 'epoch': 0.49} 13%|█▎ | 1289/10000 [4:42:15<31:25:58, 12.99s/it] 13%|█▎ | 1290/10000 [4:42:28<31:26:40, 13.00s/it] {'loss': 0.0065, 'learning_rate': 4.357e-05, 'epoch': 0.49} 13%|█▎ | 1290/10000 [4:42:28<31:26:40, 13.00s/it] 13%|█▎ | 1291/10000 [4:42:41<31:25:46, 12.99s/it] {'loss': 0.0077, 'learning_rate': 4.3565e-05, 'epoch': 0.49} 13%|█▎ | 1291/10000 
[4:42:41<31:25:46, 12.99s/it] 13%|█▎ | 1292/10000 [4:42:54<31:26:23, 13.00s/it] {'loss': 0.0073, 'learning_rate': 4.356e-05, 'epoch': 0.49} 13%|█▎ | 1292/10000 [4:42:54<31:26:23, 13.00s/it] 13%|█▎ | 1293/10000 [4:43:07<31:24:30, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.3555e-05, 'epoch': 0.49} 13%|█▎ | 1293/10000 [4:43:07<31:24:30, 12.99s/it] 13%|█▎ | 1294/10000 [4:43:20<31:25:44, 13.00s/it] {'loss': 0.005, 'learning_rate': 4.355e-05, 'epoch': 0.49} 13%|█▎ | 1294/10000 [4:43:20<31:25:44, 13.00s/it] 13%|█▎ | 1295/10000 [4:43:33<31:21:21, 12.97s/it] {'loss': 0.0072, 'learning_rate': 4.3545000000000004e-05, 'epoch': 0.49} 13%|█▎ | 1295/10000 [4:43:33<31:21:21, 12.97s/it] 13%|█▎ | 1296/10000 [4:43:46<31:22:30, 12.98s/it] {'loss': 0.009, 'learning_rate': 4.354e-05, 'epoch': 0.49} 13%|█▎ | 1296/10000 [4:43:46<31:22:30, 12.98s/it] 13%|█▎ | 1297/10000 [4:43:59<31:21:43, 12.97s/it] {'loss': 0.0048, 'learning_rate': 4.3535e-05, 'epoch': 0.49} 13%|█▎ | 1297/10000 [4:43:59<31:21:43, 12.97s/it] 13%|█▎ | 1298/10000 [4:44:12<31:20:03, 12.96s/it] {'loss': 0.0076, 'learning_rate': 4.3530000000000005e-05, 'epoch': 0.49} 13%|█▎ | 1298/10000 [4:44:12<31:20:03, 12.96s/it] 13%|█▎ | 1299/10000 [4:44:25<31:16:09, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.352500000000001e-05, 'epoch': 0.49} 13%|█▎ | 1299/10000 [4:44:25<31:16:09, 12.94s/it] 13%|█▎ | 1300/10000 [4:44:38<31:15:58, 12.94s/it] {'loss': 0.007, 'learning_rate': 4.352e-05, 'epoch': 0.49} 13%|█▎ | 1300/10000 [4:44:38<31:15:58, 12.94s/it] 13%|█▎ | 1301/10000 [4:44:51<31:18:59, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.3515e-05, 'epoch': 0.49} 13%|█▎ | 1301/10000 [4:44:51<31:18:59, 12.96s/it] 13%|█▎ | 1302/10000 [4:45:04<31:20:55, 12.97s/it] {'loss': 0.0068, 'learning_rate': 4.351e-05, 'epoch': 0.49} 13%|█▎ | 1302/10000 [4:45:04<31:20:55, 12.97s/it] 13%|█▎ | 1303/10000 [4:45:17<31:18:15, 12.96s/it] {'loss': 0.0051, 'learning_rate': 4.3505000000000004e-05, 'epoch': 0.49} 13%|█▎ | 1303/10000 [4:45:17<31:18:15, 
12.96s/it] 13%|█▎ | 1304/10000 [4:45:30<31:16:17, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.35e-05, 'epoch': 0.49} 13%|█▎ | 1304/10000 [4:45:30<31:16:17, 12.95s/it] 13%|█▎ | 1305/10000 [4:45:43<31:19:54, 12.97s/it] {'loss': 0.0319, 'learning_rate': 4.3495e-05, 'epoch': 0.49} 13%|█▎ | 1305/10000 [4:45:43<31:19:54, 12.97s/it] 13%|█▎ | 1306/10000 [4:45:55<31:18:07, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.3490000000000005e-05, 'epoch': 0.49} 13%|█▎ | 1306/10000 [4:45:56<31:18:07, 12.96s/it] 13%|█▎ | 1307/10000 [4:46:08<31:19:18, 12.97s/it] {'loss': 0.0057, 'learning_rate': 4.3485e-05, 'epoch': 0.49} 13%|█▎ | 1307/10000 [4:46:09<31:19:18, 12.97s/it] 13%|█▎ | 1308/10000 [4:46:21<31:18:10, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.3480000000000004e-05, 'epoch': 0.49} 13%|█▎ | 1308/10000 [4:46:21<31:18:10, 12.96s/it] 13%|█▎ | 1309/10000 [4:46:34<31:15:02, 12.94s/it] {'loss': 0.0059, 'learning_rate': 4.3475000000000006e-05, 'epoch': 0.49} 13%|█▎ | 1309/10000 [4:46:34<31:15:02, 12.94s/it] 13%|█▎ | 1310/10000 [4:46:47<31:16:27, 12.96s/it] {'loss': 0.0146, 'learning_rate': 4.347e-05, 'epoch': 0.49} 13%|█▎ | 1310/10000 [4:46:47<31:16:27, 12.96s/it] 13%|█▎ | 1311/10000 [4:47:00<31:17:17, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.3465e-05, 'epoch': 0.49} 13%|█▎ | 1311/10000 [4:47:00<31:17:17, 12.96s/it] 13%|█▎ | 1312/10000 [4:47:13<31:18:41, 12.97s/it] {'loss': 0.0061, 'learning_rate': 4.346e-05, 'epoch': 0.49} 13%|█▎ | 1312/10000 [4:47:13<31:18:41, 12.97s/it] 13%|█▎ | 1313/10000 [4:47:26<31:17:38, 12.97s/it] {'loss': 0.0065, 'learning_rate': 4.3455e-05, 'epoch': 0.49} 13%|█▎ | 1313/10000 [4:47:26<31:17:38, 12.97s/it] 13%|█▎ | 1314/10000 [4:47:39<31:15:20, 12.95s/it] {'loss': 0.0062, 'learning_rate': 4.345e-05, 'epoch': 0.5} 13%|█▎ | 1314/10000 [4:47:39<31:15:20, 12.95s/it] 13%|█▎ | 1315/10000 [4:47:52<31:17:13, 12.97s/it] {'loss': 0.0064, 'learning_rate': 4.3445e-05, 'epoch': 0.5} 13%|█▎ | 1315/10000 [4:47:52<31:17:13, 12.97s/it] 13%|█▎ | 1316/10000 
[4:48:05<31:18:24, 12.98s/it] {'loss': 0.0067, 'learning_rate': 4.3440000000000004e-05, 'epoch': 0.5} 13%|█▎ | 1316/10000 [4:48:05<31:18:24, 12.98s/it] 13%|█▎ | 1317/10000 [4:48:18<31:17:54, 12.98s/it] {'loss': 0.006, 'learning_rate': 4.343500000000001e-05, 'epoch': 0.5} 13%|█▎ | 1317/10000 [4:48:18<31:17:54, 12.98s/it] 13%|█▎ | 1318/10000 [4:48:31<31:15:14, 12.96s/it] {'loss': 0.007, 'learning_rate': 4.343e-05, 'epoch': 0.5} 13%|█▎ | 1318/10000 [4:48:31<31:15:14, 12.96s/it] 13%|█▎ | 1319/10000 [4:48:44<31:17:45, 12.98s/it] {'loss': 0.0077, 'learning_rate': 4.3425000000000005e-05, 'epoch': 0.5} 13%|█▎ | 1319/10000 [4:48:44<31:17:45, 12.98s/it] 13%|█▎ | 1320/10000 [4:48:57<31:18:49, 12.99s/it] {'loss': 0.0049, 'learning_rate': 4.342e-05, 'epoch': 0.5} 13%|█▎ | 1320/10000 [4:48:57<31:18:49, 12.99s/it] 13%|█▎ | 1321/10000 [4:49:10<31:14:24, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.3415e-05, 'epoch': 0.5} 13%|█▎ | 1321/10000 [4:49:10<31:14:24, 12.96s/it] 13%|█▎ | 1322/10000 [4:49:23<31:14:56, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.341e-05, 'epoch': 0.5} 13%|█▎ | 1322/10000 [4:49:23<31:14:56, 12.96s/it] 13%|█▎ | 1323/10000 [4:49:36<31:13:57, 12.96s/it] {'loss': 0.0056, 'learning_rate': 4.3405e-05, 'epoch': 0.5} 13%|█▎ | 1323/10000 [4:49:36<31:13:57, 12.96s/it] 13%|█▎ | 1324/10000 [4:49:49<31:13:13, 12.95s/it] {'loss': 0.006, 'learning_rate': 4.3400000000000005e-05, 'epoch': 0.5} 13%|█▎ | 1324/10000 [4:49:49<31:13:13, 12.95s/it] 13%|█▎ | 1325/10000 [4:50:02<31:13:58, 12.96s/it] {'loss': 0.0058, 'learning_rate': 4.3395e-05, 'epoch': 0.5} 13%|█▎ | 1325/10000 [4:50:02<31:13:58, 12.96s/it] 13%|█▎ | 1326/10000 [4:50:15<31:13:08, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.339e-05, 'epoch': 0.5} 13%|█▎ | 1326/10000 [4:50:15<31:13:08, 12.96s/it] 13%|█▎ | 1327/10000 [4:50:28<31:11:50, 12.95s/it] {'loss': 0.0078, 'learning_rate': 4.3385000000000006e-05, 'epoch': 0.5} 13%|█▎ | 1327/10000 [4:50:28<31:11:50, 12.95s/it] 13%|█▎ | 1328/10000 [4:50:41<31:15:03, 
12.97s/it] {'loss': 0.0069, 'learning_rate': 4.338e-05, 'epoch': 0.5} 13%|█▎ | 1328/10000 [4:50:41<31:15:03, 12.97s/it] 13%|█▎ | 1329/10000 [4:50:54<31:18:34, 13.00s/it] {'loss': 0.0049, 'learning_rate': 4.3375000000000004e-05, 'epoch': 0.5} 13%|█▎ | 1329/10000 [4:50:54<31:18:34, 13.00s/it] 13%|█▎ | 1330/10000 [4:51:07<31:14:41, 12.97s/it] {'loss': 0.0077, 'learning_rate': 4.337e-05, 'epoch': 0.5} 13%|█▎ | 1330/10000 [4:51:07<31:14:41, 12.97s/it] 13%|█▎ | 1331/10000 [4:51:20<31:17:36, 13.00s/it] {'loss': 0.0073, 'learning_rate': 4.3365e-05, 'epoch': 0.5} 13%|█▎ | 1331/10000 [4:51:20<31:17:36, 13.00s/it] 13%|█▎ | 1332/10000 [4:51:33<31:18:27, 13.00s/it] {'loss': 0.006, 'learning_rate': 4.336e-05, 'epoch': 0.5} 13%|█▎ | 1332/10000 [4:51:33<31:18:27, 13.00s/it] 13%|█▎ | 1333/10000 [4:51:46<31:18:13, 13.00s/it] {'loss': 0.007, 'learning_rate': 4.3355e-05, 'epoch': 0.5} 13%|█▎ | 1333/10000 [4:51:46<31:18:13, 13.00s/it] 13%|█▎ | 1334/10000 [4:51:59<31:17:06, 13.00s/it] {'loss': 0.0057, 'learning_rate': 4.335e-05, 'epoch': 0.5} 13%|█▎ | 1334/10000 [4:51:59<31:17:06, 13.00s/it] 13%|█▎ | 1335/10000 [4:52:12<31:18:07, 13.00s/it] {'loss': 0.0059, 'learning_rate': 4.3345e-05, 'epoch': 0.5} 13%|█▎ | 1335/10000 [4:52:12<31:18:07, 13.00s/it] 13%|█▎ | 1336/10000 [4:52:25<31:19:59, 13.02s/it] {'loss': 0.0077, 'learning_rate': 4.334e-05, 'epoch': 0.5} 13%|█▎ | 1336/10000 [4:52:25<31:19:59, 13.02s/it] 13%|█▎ | 1337/10000 [4:52:38<31:20:28, 13.02s/it] {'loss': 0.0075, 'learning_rate': 4.3335000000000004e-05, 'epoch': 0.5} 13%|█▎ | 1337/10000 [4:52:38<31:20:28, 13.02s/it] 13%|█▎ | 1338/10000 [4:52:51<31:20:18, 13.02s/it] {'loss': 0.0058, 'learning_rate': 4.333000000000001e-05, 'epoch': 0.5} 13%|█▎ | 1338/10000 [4:52:51<31:20:18, 13.02s/it] 13%|█▎ | 1339/10000 [4:53:04<31:16:00, 13.00s/it] {'loss': 0.0065, 'learning_rate': 4.3325e-05, 'epoch': 0.5} 13%|█▎ | 1339/10000 [4:53:04<31:16:00, 13.00s/it] 13%|█▎ | 1340/10000 [4:53:17<31:14:42, 12.99s/it] {'loss': 0.0057, 'learning_rate': 
4.332e-05, 'epoch': 0.5} 13%|█▎ | 1340/10000 [4:53:17<31:14:42, 12.99s/it] 13%|█▎ | 1341/10000 [4:53:30<31:13:24, 12.98s/it] {'loss': 0.0054, 'learning_rate': 4.3315e-05, 'epoch': 0.51} 13%|█▎ | 1341/10000 [4:53:30<31:13:24, 12.98s/it] 13%|█▎ | 1342/10000 [4:53:43<31:12:54, 12.98s/it] {'loss': 0.0061, 'learning_rate': 4.3310000000000004e-05, 'epoch': 0.51} 13%|█▎ | 1342/10000 [4:53:43<31:12:54, 12.98s/it] 13%|█▎ | 1343/10000 [4:53:56<31:10:20, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.3305e-05, 'epoch': 0.51} 13%|█▎ | 1343/10000 [4:53:56<31:10:20, 12.96s/it] 13%|█▎ | 1344/10000 [4:54:09<31:09:57, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.33e-05, 'epoch': 0.51} 13%|█▎ | 1344/10000 [4:54:09<31:09:57, 12.96s/it] 13%|█▎ | 1345/10000 [4:54:22<31:11:37, 12.97s/it] {'loss': 0.0066, 'learning_rate': 4.3295000000000005e-05, 'epoch': 0.51} 13%|█▎ | 1345/10000 [4:54:22<31:11:37, 12.97s/it] 13%|█▎ | 1346/10000 [4:54:35<31:08:12, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.329e-05, 'epoch': 0.51} 13%|█▎ | 1346/10000 [4:54:35<31:08:12, 12.95s/it] 13%|█▎ | 1347/10000 [4:54:48<31:09:17, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.3285e-05, 'epoch': 0.51} 13%|█▎ | 1347/10000 [4:54:48<31:09:17, 12.96s/it] 13%|█▎ | 1348/10000 [4:55:00<31:07:57, 12.95s/it] {'loss': 0.0053, 'learning_rate': 4.3280000000000006e-05, 'epoch': 0.51} 13%|█▎ | 1348/10000 [4:55:00<31:07:57, 12.95s/it] 13%|█▎ | 1349/10000 [4:55:13<31:10:58, 12.98s/it] {'loss': 0.0064, 'learning_rate': 4.3275e-05, 'epoch': 0.51} 13%|█▎ | 1349/10000 [4:55:14<31:10:58, 12.98s/it] 14%|█▎ | 1350/10000 [4:55:26<31:11:25, 12.98s/it] {'loss': 0.007, 'learning_rate': 4.327e-05, 'epoch': 0.51} 14%|█▎ | 1350/10000 [4:55:27<31:11:25, 12.98s/it] 14%|█▎ | 1351/10000 [4:55:39<31:09:23, 12.97s/it] {'loss': 0.0047, 'learning_rate': 4.3265e-05, 'epoch': 0.51} 14%|█▎ | 1351/10000 [4:55:39<31:09:23, 12.97s/it] 14%|█▎ | 1352/10000 [4:55:52<31:06:46, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.326e-05, 'epoch': 0.51} 14%|█▎ | 
1352/10000 [4:55:52<31:06:46, 12.95s/it] 14%|█▎ | 1353/10000 [4:56:05<31:08:59, 12.97s/it] {'loss': 0.0113, 'learning_rate': 4.3255e-05, 'epoch': 0.51} 14%|█▎ | 1353/10000 [4:56:05<31:08:59, 12.97s/it] 14%|█▎ | 1354/10000 [4:56:18<31:09:43, 12.98s/it] {'loss': 0.0073, 'learning_rate': 4.325e-05, 'epoch': 0.51} 14%|█▎ | 1354/10000 [4:56:18<31:09:43, 12.98s/it] 14%|█▎ | 1355/10000 [4:56:31<31:05:36, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.3245000000000004e-05, 'epoch': 0.51} 14%|█▎ | 1355/10000 [4:56:31<31:05:36, 12.95s/it] 14%|█▎ | 1356/10000 [4:56:44<31:03:20, 12.93s/it] {'loss': 0.0055, 'learning_rate': 4.324e-05, 'epoch': 0.51} 14%|█▎ | 1356/10000 [4:56:44<31:03:20, 12.93s/it] 14%|█▎ | 1357/10000 [4:56:57<31:02:04, 12.93s/it] {'loss': 0.0078, 'learning_rate': 4.3235e-05, 'epoch': 0.51} 14%|█▎ | 1357/10000 [4:56:57<31:02:04, 12.93s/it] 14%|█▎ | 1358/10000 [4:57:10<31:00:17, 12.92s/it] {'loss': 0.0063, 'learning_rate': 4.3230000000000005e-05, 'epoch': 0.51} 14%|█▎ | 1358/10000 [4:57:10<31:00:17, 12.92s/it] 14%|█▎ | 1359/10000 [4:57:23<30:56:18, 12.89s/it] {'loss': 0.0054, 'learning_rate': 4.322500000000001e-05, 'epoch': 0.51} 14%|█▎ | 1359/10000 [4:57:23<30:56:18, 12.89s/it] 14%|█▎ | 1360/10000 [4:57:36<30:58:00, 12.90s/it] {'loss': 0.0066, 'learning_rate': 4.3219999999999996e-05, 'epoch': 0.51} 14%|█▎ | 1360/10000 [4:57:36<30:58:00, 12.90s/it] 14%|█▎ | 1361/10000 [4:57:49<30:58:40, 12.91s/it] {'loss': 0.0117, 'learning_rate': 4.3215e-05, 'epoch': 0.51} 14%|█▎ | 1361/10000 [4:57:49<30:58:40, 12.91s/it] 14%|█▎ | 1362/10000 [4:58:02<30:58:41, 12.91s/it] {'loss': 0.0073, 'learning_rate': 4.321e-05, 'epoch': 0.51} 14%|█▎ | 1362/10000 [4:58:02<30:58:41, 12.91s/it] 14%|█▎ | 1363/10000 [4:58:14<30:56:59, 12.90s/it] {'loss': 0.0063, 'learning_rate': 4.3205000000000004e-05, 'epoch': 0.51} 14%|█▎ | 1363/10000 [4:58:14<30:56:59, 12.90s/it] 14%|█▎ | 1364/10000 [4:58:27<30:54:47, 12.89s/it] {'loss': 0.0073, 'learning_rate': 4.32e-05, 'epoch': 0.51} 14%|█▎ | 1364/10000 
[4:58:27<30:54:47, 12.89s/it] 14%|█▎ | 1365/10000 [4:58:40<30:54:50, 12.89s/it] {'loss': 0.0061, 'learning_rate': 4.3195e-05, 'epoch': 0.51} 14%|█▎ | 1365/10000 [4:58:40<30:54:50, 12.89s/it] 14%|█▎ | 1366/10000 [4:58:53<30:56:23, 12.90s/it] {'loss': 0.0065, 'learning_rate': 4.3190000000000005e-05, 'epoch': 0.51} 14%|█▎ | 1366/10000 [4:58:53<30:56:23, 12.90s/it] 14%|█▎ | 1367/10000 [4:59:06<30:52:07, 12.87s/it] {'loss': 0.0077, 'learning_rate': 4.3185e-05, 'epoch': 0.52} 14%|█▎ | 1367/10000 [4:59:06<30:52:07, 12.87s/it] 14%|█▎ | 1368/10000 [4:59:19<30:54:54, 12.89s/it] {'loss': 0.0053, 'learning_rate': 4.318e-05, 'epoch': 0.52} 14%|█▎ | 1368/10000 [4:59:19<30:54:54, 12.89s/it] 14%|█▎ | 1369/10000 [4:59:32<30:54:15, 12.89s/it] {'loss': 0.0053, 'learning_rate': 4.3175000000000006e-05, 'epoch': 0.52} 14%|█▎ | 1369/10000 [4:59:32<30:54:15, 12.89s/it] 14%|█▎ | 1370/10000 [4:59:45<30:57:55, 12.92s/it] {'loss': 0.0047, 'learning_rate': 4.317e-05, 'epoch': 0.52} 14%|█▎ | 1370/10000 [4:59:45<30:57:55, 12.92s/it] 14%|█▎ | 1371/10000 [4:59:58<30:57:49, 12.92s/it] {'loss': 0.0057, 'learning_rate': 4.3165e-05, 'epoch': 0.52} 14%|█▎ | 1371/10000 [4:59:58<30:57:49, 12.92s/it] 14%|█▎ | 1372/10000 [5:00:10<30:55:48, 12.91s/it] {'loss': 0.0071, 'learning_rate': 4.316e-05, 'epoch': 0.52} 14%|█▎ | 1372/10000 [5:00:10<30:55:48, 12.91s/it] 14%|█▎ | 1373/10000 [5:00:23<30:56:37, 12.91s/it] {'loss': 0.0077, 'learning_rate': 4.3155e-05, 'epoch': 0.52} 14%|█▎ | 1373/10000 [5:00:23<30:56:37, 12.91s/it] 14%|█▎ | 1374/10000 [5:00:36<31:00:48, 12.94s/it] {'loss': 0.0055, 'learning_rate': 4.315e-05, 'epoch': 0.52} 14%|█▎ | 1374/10000 [5:00:36<31:00:48, 12.94s/it] 14%|█▍ | 1375/10000 [5:00:49<31:03:34, 12.96s/it] {'loss': 0.0073, 'learning_rate': 4.3145e-05, 'epoch': 0.52} 14%|█▍ | 1375/10000 [5:00:49<31:03:34, 12.96s/it] 14%|█▍ | 1376/10000 [5:01:02<31:01:05, 12.95s/it] {'loss': 0.0099, 'learning_rate': 4.3140000000000004e-05, 'epoch': 0.52} 14%|█▍ | 1376/10000 [5:01:02<31:01:05, 12.95s/it] 
14%|█▍ | 1377/10000 [5:01:15<31:01:54, 12.96s/it] {'loss': 0.0062, 'learning_rate': 4.3135000000000006e-05, 'epoch': 0.52} 14%|█▍ | 1377/10000 [5:01:15<31:01:54, 12.96s/it] 14%|█▍ | 1378/10000 [5:01:28<30:57:34, 12.93s/it] {'loss': 0.0048, 'learning_rate': 4.313e-05, 'epoch': 0.52} 14%|█▍ | 1378/10000 [5:01:28<30:57:34, 12.93s/it] 14%|█▍ | 1379/10000 [5:01:41<31:00:15, 12.95s/it] {'loss': 0.0087, 'learning_rate': 4.3125000000000005e-05, 'epoch': 0.52} 14%|█▍ | 1379/10000 [5:01:41<31:00:15, 12.95s/it] 14%|█▍ | 1380/10000 [5:01:54<31:00:34, 12.95s/it] {'loss': 0.0076, 'learning_rate': 4.312000000000001e-05, 'epoch': 0.52} 14%|█▍ | 1380/10000 [5:01:54<31:00:34, 12.95s/it] 14%|█▍ | 1381/10000 [5:02:07<31:03:49, 12.97s/it] {'loss': 0.0049, 'learning_rate': 4.3115e-05, 'epoch': 0.52} 14%|█▍ | 1381/10000 [5:02:07<31:03:49, 12.97s/it] 14%|█▍ | 1382/10000 [5:02:20<31:05:31, 12.99s/it] {'loss': 0.0047, 'learning_rate': 4.311e-05, 'epoch': 0.52} 14%|█▍ | 1382/10000 [5:02:20<31:05:31, 12.99s/it] 14%|█▍ | 1383/10000 [5:02:33<31:05:33, 12.99s/it] {'loss': 0.0056, 'learning_rate': 4.3105e-05, 'epoch': 0.52} 14%|█▍ | 1383/10000 [5:02:33<31:05:33, 12.99s/it] 14%|█▍ | 1384/10000 [5:02:46<31:00:43, 12.96s/it] {'loss': 0.0051, 'learning_rate': 4.3100000000000004e-05, 'epoch': 0.52} 14%|█▍ | 1384/10000 [5:02:46<31:00:43, 12.96s/it] 14%|█▍ | 1385/10000 [5:02:59<31:04:36, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.3095e-05, 'epoch': 0.52} 14%|█▍ | 1385/10000 [5:02:59<31:04:36, 12.99s/it] 14%|█▍ | 1386/10000 [5:03:12<31:03:03, 12.98s/it] {'loss': 0.0052, 'learning_rate': 4.309e-05, 'epoch': 0.52} 14%|█▍ | 1386/10000 [5:03:12<31:03:03, 12.98s/it] 14%|█▍ | 1387/10000 [5:03:25<31:04:30, 12.99s/it] {'loss': 0.0068, 'learning_rate': 4.3085000000000005e-05, 'epoch': 0.52} 14%|█▍ | 1387/10000 [5:03:25<31:04:30, 12.99s/it] 14%|█▍ | 1388/10000 [5:03:38<31:02:05, 12.97s/it] {'loss': 0.0054, 'learning_rate': 4.308e-05, 'epoch': 0.52} 14%|█▍ | 1388/10000 [5:03:38<31:02:05, 12.97s/it] 14%|█▍ | 
1389/10000 [5:03:51<31:06:03, 13.00s/it] {'loss': 0.0062, 'learning_rate': 4.3075000000000003e-05, 'epoch': 0.52} 14%|█▍ | 1389/10000 [5:03:51<31:06:03, 13.00s/it] 14%|█▍ | 1390/10000 [5:04:04<31:05:12, 13.00s/it] {'loss': 0.0053, 'learning_rate': 4.3070000000000006e-05, 'epoch': 0.52} 14%|█▍ | 1390/10000 [5:04:04<31:05:12, 13.00s/it] 14%|█▍ | 1391/10000 [5:04:17<31:00:58, 12.97s/it] {'loss': 0.0074, 'learning_rate': 4.3065e-05, 'epoch': 0.52} 14%|█▍ | 1391/10000 [5:04:17<31:00:58, 12.97s/it] 14%|█▍ | 1392/10000 [5:04:30<31:00:28, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.306e-05, 'epoch': 0.52} 14%|█▍ | 1392/10000 [5:04:30<31:00:28, 12.97s/it] 14%|█▍ | 1393/10000 [5:04:43<30:59:29, 12.96s/it] {'loss': 0.005, 'learning_rate': 4.3055e-05, 'epoch': 0.52} 14%|█▍ | 1393/10000 [5:04:43<30:59:29, 12.96s/it] 14%|█▍ | 1394/10000 [5:04:56<30:58:57, 12.96s/it] {'loss': 0.0079, 'learning_rate': 4.305e-05, 'epoch': 0.53} 14%|█▍ | 1394/10000 [5:04:56<30:58:57, 12.96s/it] 14%|█▍ | 1395/10000 [5:05:09<30:58:24, 12.96s/it] {'loss': 0.0063, 'learning_rate': 4.3045e-05, 'epoch': 0.53} 14%|█▍ | 1395/10000 [5:05:09<30:58:24, 12.96s/it] 14%|█▍ | 1396/10000 [5:05:22<30:55:53, 12.94s/it] {'loss': 0.0058, 'learning_rate': 4.304e-05, 'epoch': 0.53} 14%|█▍ | 1396/10000 [5:05:22<30:55:53, 12.94s/it] 14%|█▍ | 1397/10000 [5:05:35<30:51:06, 12.91s/it] {'loss': 0.0046, 'learning_rate': 4.3035000000000004e-05, 'epoch': 0.53} 14%|█▍ | 1397/10000 [5:05:35<30:51:06, 12.91s/it] 14%|█▍ | 1398/10000 [5:05:47<30:48:40, 12.89s/it] {'loss': 0.0074, 'learning_rate': 4.3030000000000006e-05, 'epoch': 0.53} 14%|█▍ | 1398/10000 [5:05:47<30:48:40, 12.89s/it] 14%|█▍ | 1399/10000 [5:06:00<30:48:04, 12.89s/it] {'loss': 0.0069, 'learning_rate': 4.3025e-05, 'epoch': 0.53} 14%|█▍ | 1399/10000 [5:06:00<30:48:04, 12.89s/it] 14%|█▍ | 1400/10000 [5:06:13<30:47:01, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.3020000000000005e-05, 'epoch': 0.53} 14%|█▍ | 1400/10000 [5:06:13<30:47:01, 12.89s/it] 14%|█▍ | 1401/10000 
[5:06:26<30:44:13, 12.87s/it] {'loss': 0.0052, 'learning_rate': 4.3015e-05, 'epoch': 0.53} 14%|█▍ | 1401/10000 [5:06:26<30:44:13, 12.87s/it] 14%|█▍ | 1402/10000 [5:06:39<30:45:08, 12.88s/it] {'loss': 0.0053, 'learning_rate': 4.301e-05, 'epoch': 0.53} 14%|█▍ | 1402/10000 [5:06:39<30:45:08, 12.88s/it] 14%|█▍ | 1403/10000 [5:06:52<30:42:58, 12.86s/it] {'loss': 0.0082, 'learning_rate': 4.3005e-05, 'epoch': 0.53} 14%|█▍ | 1403/10000 [5:06:52<30:42:58, 12.86s/it] 14%|█▍ | 1404/10000 [5:07:05<30:46:30, 12.89s/it] {'loss': 0.0061, 'learning_rate': 4.3e-05, 'epoch': 0.53} 14%|█▍ | 1404/10000 [5:07:05<30:46:30, 12.89s/it] 14%|█▍ | 1405/10000 [5:07:18<30:48:11, 12.90s/it] {'loss': 0.0056, 'learning_rate': 4.2995000000000004e-05, 'epoch': 0.53} 14%|█▍ | 1405/10000 [5:07:18<30:48:11, 12.90s/it] 14%|█▍ | 1406/10000 [5:07:31<30:48:50, 12.91s/it] {'loss': 0.0046, 'learning_rate': 4.299e-05, 'epoch': 0.53} 14%|█▍ | 1406/10000 [5:07:31<30:48:50, 12.91s/it] 14%|█▍ | 1407/10000 [5:07:44<30:53:09, 12.94s/it] {'loss': 0.0045, 'learning_rate': 4.2985e-05, 'epoch': 0.53} 14%|█▍ | 1407/10000 [5:07:44<30:53:09, 12.94s/it] 14%|█▍ | 1408/10000 [5:07:56<30:51:54, 12.93s/it] {'loss': 0.0068, 'learning_rate': 4.2980000000000005e-05, 'epoch': 0.53} 14%|█▍ | 1408/10000 [5:07:56<30:51:54, 12.93s/it] 14%|█▍ | 1409/10000 [5:08:09<30:53:58, 12.95s/it] {'loss': 0.0088, 'learning_rate': 4.2975e-05, 'epoch': 0.53} 14%|█▍ | 1409/10000 [5:08:09<30:53:58, 12.95s/it] 14%|█▍ | 1410/10000 [5:08:22<30:55:16, 12.96s/it] {'loss': 0.0074, 'learning_rate': 4.2970000000000004e-05, 'epoch': 0.53} 14%|█▍ | 1410/10000 [5:08:22<30:55:16, 12.96s/it] 14%|█▍ | 1411/10000 [5:08:35<30:52:56, 12.94s/it] {'loss': 0.0069, 'learning_rate': 4.2965e-05, 'epoch': 0.53} 14%|█▍ | 1411/10000 [5:08:35<30:52:56, 12.94s/it] 14%|█▍ | 1412/10000 [5:08:48<30:52:39, 12.94s/it] {'loss': 0.0054, 'learning_rate': 4.296e-05, 'epoch': 0.53} 14%|█▍ | 1412/10000 [5:08:48<30:52:39, 12.94s/it] 14%|█▍ | 1413/10000 [5:09:01<30:47:30, 12.91s/it] 
{'loss': 0.0071, 'learning_rate': 4.2955e-05, 'epoch': 0.53} 14%|█▍ | 1413/10000 [5:09:01<30:47:30, 12.91s/it] 14%|█▍ | 1414/10000 [5:09:14<30:46:59, 12.91s/it] {'loss': 0.0071, 'learning_rate': 4.295e-05, 'epoch': 0.53} 14%|█▍ | 1414/10000 [5:09:14<30:46:59, 12.91s/it] 14%|█▍ | 1415/10000 [5:09:27<30:50:53, 12.94s/it] {'loss': 0.0055, 'learning_rate': 4.2945e-05, 'epoch': 0.53} 14%|█▍ | 1415/10000 [5:09:27<30:50:53, 12.94s/it] 14%|█▍ | 1416/10000 [5:09:40<30:50:17, 12.93s/it] {'loss': 0.006, 'learning_rate': 4.2940000000000006e-05, 'epoch': 0.53} 14%|█▍ | 1416/10000 [5:09:40<30:50:17, 12.93s/it] 14%|█▍ | 1417/10000 [5:09:53<30:53:29, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.2935e-05, 'epoch': 0.53} 14%|█▍ | 1417/10000 [5:09:53<30:53:29, 12.96s/it] 14%|█▍ | 1418/10000 [5:10:06<30:52:31, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.2930000000000004e-05, 'epoch': 0.53} 14%|█▍ | 1418/10000 [5:10:06<30:52:31, 12.95s/it] 14%|█▍ | 1419/10000 [5:10:19<30:52:52, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.2925000000000007e-05, 'epoch': 0.53} 14%|█▍ | 1419/10000 [5:10:19<30:52:52, 12.96s/it] 14%|█▍ | 1420/10000 [5:10:32<30:52:20, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.292e-05, 'epoch': 0.54} 14%|█▍ | 1420/10000 [5:10:32<30:52:20, 12.95s/it] 14%|█▍ | 1421/10000 [5:10:45<30:54:31, 12.97s/it] {'loss': 0.0046, 'learning_rate': 4.2915e-05, 'epoch': 0.54} 14%|█▍ | 1421/10000 [5:10:45<30:54:31, 12.97s/it] 14%|█▍ | 1422/10000 [5:10:58<30:51:43, 12.95s/it] {'loss': 0.0064, 'learning_rate': 4.291e-05, 'epoch': 0.54} 14%|█▍ | 1422/10000 [5:10:58<30:51:43, 12.95s/it] 14%|█▍ | 1423/10000 [5:11:11<30:49:36, 12.94s/it] {'loss': 0.006, 'learning_rate': 4.2905000000000003e-05, 'epoch': 0.54} 14%|█▍ | 1423/10000 [5:11:11<30:49:36, 12.94s/it] 14%|█▍ | 1424/10000 [5:11:24<30:48:49, 12.93s/it] {'loss': 0.006, 'learning_rate': 4.29e-05, 'epoch': 0.54} 14%|█▍ | 1424/10000 [5:11:24<30:48:49, 12.93s/it] 14%|█▍ | 1425/10000 [5:11:37<30:51:54, 12.96s/it] {'loss': 0.0056, 
'learning_rate': 4.2895e-05, 'epoch': 0.54} 14%|█▍ | 1425/10000 [5:11:37<30:51:54, 12.96s/it] 14%|█▍ | 1426/10000 [5:11:49<30:49:54, 12.95s/it] {'loss': 0.0062, 'learning_rate': 4.2890000000000004e-05, 'epoch': 0.54} 14%|█▍ | 1426/10000 [5:11:49<30:49:54, 12.95s/it] 14%|█▍ | 1427/10000 [5:12:02<30:51:48, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.2885e-05, 'epoch': 0.54} 14%|█▍ | 1427/10000 [5:12:03<30:51:48, 12.96s/it] 14%|█▍ | 1428/10000 [5:12:15<30:51:00, 12.96s/it] {'loss': 0.0067, 'learning_rate': 4.288e-05, 'epoch': 0.54} 14%|█▍ | 1428/10000 [5:12:15<30:51:00, 12.96s/it] 14%|█▍ | 1429/10000 [5:12:28<30:51:18, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.2875000000000005e-05, 'epoch': 0.54} 14%|█▍ | 1429/10000 [5:12:28<30:51:18, 12.96s/it] 14%|█▍ | 1430/10000 [5:12:41<30:50:18, 12.95s/it] {'loss': 0.005, 'learning_rate': 4.287000000000001e-05, 'epoch': 0.54} 14%|█▍ | 1430/10000 [5:12:41<30:50:18, 12.95s/it] 14%|█▍ | 1431/10000 [5:12:54<30:50:26, 12.96s/it] {'loss': 0.0056, 'learning_rate': 4.2865e-05, 'epoch': 0.54} 14%|█▍ | 1431/10000 [5:12:54<30:50:26, 12.96s/it] 14%|█▍ | 1432/10000 [5:13:07<30:49:29, 12.95s/it] {'loss': 0.0045, 'learning_rate': 4.286e-05, 'epoch': 0.54} 14%|█▍ | 1432/10000 [5:13:07<30:49:29, 12.95s/it] 14%|█▍ | 1433/10000 [5:13:20<30:50:46, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.2855e-05, 'epoch': 0.54} 14%|█▍ | 1433/10000 [5:13:20<30:50:46, 12.96s/it] 14%|█▍ | 1434/10000 [5:13:33<30:51:21, 12.97s/it] {'loss': 0.006, 'learning_rate': 4.285e-05, 'epoch': 0.54} 14%|█▍ | 1434/10000 [5:13:33<30:51:21, 12.97s/it] 14%|█▍ | 1435/10000 [5:13:46<30:47:15, 12.94s/it] {'loss': 0.0062, 'learning_rate': 4.2845e-05, 'epoch': 0.54} 14%|█▍ | 1435/10000 [5:13:46<30:47:15, 12.94s/it] 14%|█▍ | 1436/10000 [5:13:59<30:48:07, 12.95s/it] {'loss': 0.005, 'learning_rate': 4.284e-05, 'epoch': 0.54} 14%|█▍ | 1436/10000 [5:13:59<30:48:07, 12.95s/it] 14%|█▍ | 1437/10000 [5:14:12<30:48:40, 12.95s/it] {'loss': 0.0056, 'learning_rate': 
4.2835000000000006e-05, 'epoch': 0.54} 14%|█▍ | 1437/10000 [5:14:12<30:48:40, 12.95s/it] 14%|█▍ | 1438/10000 [5:14:25<30:45:27, 12.93s/it] {'loss': 0.0055, 'learning_rate': 4.283e-05, 'epoch': 0.54} 14%|█▍ | 1438/10000 [5:14:25<30:45:27, 12.93s/it] 14%|█▍ | 1439/10000 [5:14:38<30:44:33, 12.93s/it] {'loss': 0.0069, 'learning_rate': 4.2825000000000004e-05, 'epoch': 0.54} 14%|█▍ | 1439/10000 [5:14:38<30:44:33, 12.93s/it] 14%|█▍ | 1440/10000 [5:14:51<30:44:48, 12.93s/it] {'loss': 0.0035, 'learning_rate': 4.282000000000001e-05, 'epoch': 0.54} 14%|█▍ | 1440/10000 [5:14:51<30:44:48, 12.93s/it] 14%|█▍ | 1441/10000 [5:15:04<30:47:07, 12.95s/it] {'loss': 0.0048, 'learning_rate': 4.2815e-05, 'epoch': 0.54} 14%|█▍ | 1441/10000 [5:15:04<30:47:07, 12.95s/it] 14%|█▍ | 1442/10000 [5:15:17<30:45:12, 12.94s/it] {'loss': 0.0052, 'learning_rate': 4.281e-05, 'epoch': 0.54} 14%|█▍ | 1442/10000 [5:15:17<30:45:12, 12.94s/it] 14%|█▍ | 1443/10000 [5:15:30<30:46:43, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.2805e-05, 'epoch': 0.54} 14%|█▍ | 1443/10000 [5:15:30<30:46:43, 12.95s/it] 14%|█▍ | 1444/10000 [5:15:43<30:48:23, 12.96s/it] {'loss': 0.008, 'learning_rate': 4.2800000000000004e-05, 'epoch': 0.54} 14%|█▍ | 1444/10000 [5:15:43<30:48:23, 12.96s/it] 14%|█▍ | 1445/10000 [5:15:56<30:48:51, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.2795e-05, 'epoch': 0.54} 14%|█▍ | 1445/10000 [5:15:56<30:48:51, 12.97s/it] 14%|█▍ | 1446/10000 [5:16:09<30:50:05, 12.98s/it] {'loss': 0.0057, 'learning_rate': 4.279e-05, 'epoch': 0.54} 14%|█▍ | 1446/10000 [5:16:09<30:50:05, 12.98s/it] 14%|█▍ | 1447/10000 [5:16:22<30:49:44, 12.98s/it] {'loss': 0.0067, 'learning_rate': 4.2785000000000005e-05, 'epoch': 0.55} 14%|█▍ | 1447/10000 [5:16:22<30:49:44, 12.98s/it] 14%|█▍ | 1448/10000 [5:16:35<30:48:55, 12.97s/it] {'loss': 0.0058, 'learning_rate': 4.278e-05, 'epoch': 0.55} 14%|█▍ | 1448/10000 [5:16:35<30:48:55, 12.97s/it] 14%|█▍ | 1449/10000 [5:16:47<30:48:13, 12.97s/it] {'loss': 0.0049, 'learning_rate': 4.2775e-05, 
'epoch': 0.55} 14%|█▍ | 1449/10000 [5:16:48<30:48:13, 12.97s/it] 14%|█▍ | 1450/10000 [5:17:00<30:48:00, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.2770000000000006e-05, 'epoch': 0.55} 14%|█▍ | 1450/10000 [5:17:00<30:48:00, 12.97s/it] 15%|█▍ | 1451/10000 [5:17:13<30:48:16, 12.97s/it] {'loss': 0.0047, 'learning_rate': 4.2765e-05, 'epoch': 0.55} 15%|█▍ | 1451/10000 [5:17:13<30:48:16, 12.97s/it] 15%|█▍ | 1452/10000 [5:17:26<30:47:34, 12.97s/it] {'loss': 0.0053, 'learning_rate': 4.276e-05, 'epoch': 0.55} 15%|█▍ | 1452/10000 [5:17:26<30:47:34, 12.97s/it] 15%|█▍ | 1453/10000 [5:17:39<30:49:25, 12.98s/it] {'loss': 0.0055, 'learning_rate': 4.2755e-05, 'epoch': 0.55} 15%|█▍ | 1453/10000 [5:17:39<30:49:25, 12.98s/it] 15%|█▍ | 1454/10000 [5:17:52<30:49:01, 12.98s/it] {'loss': 0.0051, 'learning_rate': 4.275e-05, 'epoch': 0.55} 15%|█▍ | 1454/10000 [5:17:52<30:49:01, 12.98s/it] 15%|█▍ | 1455/10000 [5:18:05<30:48:34, 12.98s/it] {'loss': 0.0049, 'learning_rate': 4.2745000000000005e-05, 'epoch': 0.55} 15%|█▍ | 1455/10000 [5:18:05<30:48:34, 12.98s/it] 15%|█▍ | 1456/10000 [5:18:18<30:47:02, 12.97s/it] {'loss': 0.005, 'learning_rate': 4.274e-05, 'epoch': 0.55} 15%|█▍ | 1456/10000 [5:18:18<30:47:02, 12.97s/it] 15%|█▍ | 1457/10000 [5:18:31<30:50:03, 12.99s/it] {'loss': 0.0055, 'learning_rate': 4.2735e-05, 'epoch': 0.55} 15%|█▍ | 1457/10000 [5:18:31<30:50:03, 12.99s/it] 15%|█▍ | 1458/10000 [5:18:44<30:46:24, 12.97s/it] {'loss': 0.0056, 'learning_rate': 4.2730000000000006e-05, 'epoch': 0.55} 15%|█▍ | 1458/10000 [5:18:44<30:46:24, 12.97s/it] 15%|█▍ | 1459/10000 [5:18:57<30:45:13, 12.96s/it] {'loss': 0.0057, 'learning_rate': 4.2725e-05, 'epoch': 0.55} 15%|█▍ | 1459/10000 [5:18:57<30:45:13, 12.96s/it] 15%|█▍ | 1460/10000 [5:19:10<30:44:22, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.2720000000000004e-05, 'epoch': 0.55} 15%|█▍ | 1460/10000 [5:19:10<30:44:22, 12.96s/it] 15%|█▍ | 1461/10000 [5:19:23<30:43:23, 12.95s/it] {'loss': 0.006, 'learning_rate': 4.2715e-05, 'epoch': 0.55} 15%|█▍ | 
1461/10000 [5:19:23<30:43:23, 12.95s/it] 15%|█▍ | 1462/10000 [5:19:36<30:43:49, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.271e-05, 'epoch': 0.55} 15%|█▍ | 1462/10000 [5:19:36<30:43:49, 12.96s/it] 15%|█▍ | 1463/10000 [5:19:49<30:47:00, 12.98s/it] {'loss': 0.0064, 'learning_rate': 4.2705e-05, 'epoch': 0.55} 15%|█▍ | 1463/10000 [5:19:49<30:47:00, 12.98s/it] 15%|█▍ | 1464/10000 [5:20:02<30:43:18, 12.96s/it] {'loss': 0.0063, 'learning_rate': 4.27e-05, 'epoch': 0.55} 15%|█▍ | 1464/10000 [5:20:02<30:43:18, 12.96s/it] 15%|█▍ | 1465/10000 [5:20:15<30:43:19, 12.96s/it] {'loss': 0.0066, 'learning_rate': 4.2695000000000004e-05, 'epoch': 0.55} 15%|█▍ | 1465/10000 [5:20:15<30:43:19, 12.96s/it] 15%|█▍ | 1466/10000 [5:20:28<30:45:37, 12.98s/it] {'loss': 0.0065, 'learning_rate': 4.269e-05, 'epoch': 0.55} 15%|█▍ | 1466/10000 [5:20:28<30:45:37, 12.98s/it] 15%|█▍ | 1467/10000 [5:20:41<30:47:00, 12.99s/it] {'loss': 0.0048, 'learning_rate': 4.2685e-05, 'epoch': 0.55} 15%|█▍ | 1467/10000 [5:20:41<30:47:00, 12.99s/it] 15%|█▍ | 1468/10000 [5:20:54<30:43:57, 12.97s/it] {'loss': 0.0052, 'learning_rate': 4.2680000000000005e-05, 'epoch': 0.55} 15%|█▍ | 1468/10000 [5:20:54<30:43:57, 12.97s/it] 15%|█▍ | 1469/10000 [5:21:07<30:43:47, 12.97s/it] {'loss': 0.0054, 'learning_rate': 4.2675e-05, 'epoch': 0.55} 15%|█▍ | 1469/10000 [5:21:07<30:43:47, 12.97s/it] 15%|█▍ | 1470/10000 [5:21:20<30:42:47, 12.96s/it] {'loss': 0.0065, 'learning_rate': 4.267e-05, 'epoch': 0.55} 15%|█▍ | 1470/10000 [5:21:20<30:42:47, 12.96s/it] 15%|█▍ | 1471/10000 [5:21:33<30:43:09, 12.97s/it] {'loss': 0.0064, 'learning_rate': 4.2665e-05, 'epoch': 0.55} 15%|█▍ | 1471/10000 [5:21:33<30:43:09, 12.97s/it] 15%|█▍ | 1472/10000 [5:21:46<30:39:36, 12.94s/it] {'loss': 0.0055, 'learning_rate': 4.266e-05, 'epoch': 0.55} 15%|█▍ | 1472/10000 [5:21:46<30:39:36, 12.94s/it] 15%|█▍ | 1473/10000 [5:21:59<30:36:32, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.2655e-05, 'epoch': 0.56} 15%|█▍ | 1473/10000 [5:21:59<30:36:32, 12.92s/it] 15%|█▍ | 
1474/10000 [5:22:11<30:34:50, 12.91s/it] {'loss': 0.0046, 'learning_rate': 4.265e-05, 'epoch': 0.56} 15%|█▍ | 1474/10000 [5:22:12<30:34:50, 12.91s/it] 15%|█▍ | 1475/10000 [5:22:24<30:35:14, 12.92s/it] {'loss': 0.0058, 'learning_rate': 4.2645e-05, 'epoch': 0.56} 15%|█▍ | 1475/10000 [5:22:24<30:35:14, 12.92s/it] 15%|█▍ | 1476/10000 [5:22:37<30:34:22, 12.91s/it] {'loss': 0.006, 'learning_rate': 4.2640000000000005e-05, 'epoch': 0.56} 15%|█▍ | 1476/10000 [5:22:37<30:34:22, 12.91s/it] 15%|█▍ | 1477/10000 [5:22:50<30:37:32, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.2635e-05, 'epoch': 0.56} 15%|█▍ | 1477/10000 [5:22:50<30:37:32, 12.94s/it] 15%|█▍ | 1478/10000 [5:23:03<30:34:06, 12.91s/it] {'loss': 0.006, 'learning_rate': 4.2630000000000004e-05, 'epoch': 0.56} 15%|█▍ | 1478/10000 [5:23:03<30:34:06, 12.91s/it] 15%|█▍ | 1479/10000 [5:23:16<30:31:45, 12.90s/it] {'loss': 0.0051, 'learning_rate': 4.2625000000000006e-05, 'epoch': 0.56} 15%|█▍ | 1479/10000 [5:23:16<30:31:45, 12.90s/it] 15%|█▍ | 1480/10000 [5:23:29<30:35:14, 12.92s/it] {'loss': 0.0059, 'learning_rate': 4.262e-05, 'epoch': 0.56} 15%|█▍ | 1480/10000 [5:23:29<30:35:14, 12.92s/it] 15%|█▍ | 1481/10000 [5:23:42<30:33:18, 12.91s/it] {'loss': 0.0077, 'learning_rate': 4.2615e-05, 'epoch': 0.56} 15%|█▍ | 1481/10000 [5:23:42<30:33:18, 12.91s/it] 15%|█▍ | 1482/10000 [5:23:55<30:36:40, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.261e-05, 'epoch': 0.56} 15%|█▍ | 1482/10000 [5:23:55<30:36:40, 12.94s/it] 15%|█▍ | 1483/10000 [5:24:08<30:37:14, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.2605e-05, 'epoch': 0.56} 15%|█▍ | 1483/10000 [5:24:08<30:37:14, 12.94s/it] 15%|█▍ | 1484/10000 [5:24:21<30:38:19, 12.95s/it] {'loss': 0.0053, 'learning_rate': 4.26e-05, 'epoch': 0.56} 15%|█▍ | 1484/10000 [5:24:21<30:38:19, 12.95s/it] 15%|█▍ | 1485/10000 [5:24:34<30:35:17, 12.93s/it] {'loss': 0.005, 'learning_rate': 4.2595e-05, 'epoch': 0.56} 15%|█▍ | 1485/10000 [5:24:34<30:35:17, 12.93s/it] 15%|█▍ | 1486/10000 [5:24:47<30:34:49, 
12.93s/it] {'loss': 0.0073, 'learning_rate': 4.2590000000000004e-05, 'epoch': 0.56} 15%|█▍ | 1486/10000 [5:24:47<30:34:49, 12.93s/it] 15%|█▍ | 1487/10000 [5:25:00<30:36:38, 12.94s/it] {'loss': 0.0065, 'learning_rate': 4.2585e-05, 'epoch': 0.56} 15%|█▍ | 1487/10000 [5:25:00<30:36:38, 12.94s/it] 15%|█▍ | 1488/10000 [5:25:12<30:33:17, 12.92s/it] {'loss': 0.0062, 'learning_rate': 4.258e-05, 'epoch': 0.56} 15%|█▍ | 1488/10000 [5:25:13<30:33:17, 12.92s/it] 15%|█▍ | 1489/10000 [5:25:25<30:36:20, 12.95s/it] {'loss': 0.0047, 'learning_rate': 4.2575000000000005e-05, 'epoch': 0.56} 15%|█▍ | 1489/10000 [5:25:26<30:36:20, 12.95s/it] 15%|█▍ | 1490/10000 [5:25:39<30:40:13, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.257000000000001e-05, 'epoch': 0.56} 15%|█▍ | 1490/10000 [5:25:39<30:40:13, 12.97s/it] 15%|█▍ | 1491/10000 [5:25:51<30:39:48, 12.97s/it] {'loss': 0.007, 'learning_rate': 4.2564999999999997e-05, 'epoch': 0.56} 15%|█▍ | 1491/10000 [5:25:52<30:39:48, 12.97s/it] 15%|█▍ | 1492/10000 [5:26:04<30:38:36, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.256e-05, 'epoch': 0.56} 15%|█▍ | 1492/10000 [5:26:04<30:38:36, 12.97s/it] 15%|█▍ | 1493/10000 [5:26:17<30:42:13, 12.99s/it] {'loss': 0.0051, 'learning_rate': 4.2555e-05, 'epoch': 0.56} 15%|█▍ | 1493/10000 [5:26:18<30:42:13, 12.99s/it] 15%|█▍ | 1494/10000 [5:26:30<30:37:50, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.2550000000000004e-05, 'epoch': 0.56} 15%|█▍ | 1494/10000 [5:26:30<30:37:50, 12.96s/it] 15%|█▍ | 1495/10000 [5:26:43<30:38:28, 12.97s/it] {'loss': 0.005, 'learning_rate': 4.2545e-05, 'epoch': 0.56} 15%|█▍ | 1495/10000 [5:26:43<30:38:28, 12.97s/it] 15%|█▍ | 1496/10000 [5:26:56<30:36:20, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.254e-05, 'epoch': 0.56} 15%|█▍ | 1496/10000 [5:26:56<30:36:20, 12.96s/it] 15%|█▍ | 1497/10000 [5:27:09<30:35:57, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.2535000000000005e-05, 'epoch': 0.56} 15%|█▍ | 1497/10000 [5:27:09<30:35:57, 12.96s/it] 15%|█▍ | 1498/10000 [5:27:22<30:38:10, 
12.97s/it] {'loss': 0.0056, 'learning_rate': 4.253e-05, 'epoch': 0.56} 15%|█▍ | 1498/10000 [5:27:22<30:38:10, 12.97s/it] 15%|█▍ | 1499/10000 [5:27:35<30:41:27, 13.00s/it] {'loss': 0.005, 'learning_rate': 4.2525000000000004e-05, 'epoch': 0.56} 15%|█▍ | 1499/10000 [5:27:35<30:41:27, 13.00s/it] 15%|█▌ | 1500/10000 [5:27:48<30:38:43, 12.98s/it] {'loss': 0.0052, 'learning_rate': 4.2520000000000006e-05, 'epoch': 0.57} 15%|█▌ | 1500/10000 [5:27:48<30:38:43, 12.98s/it] 15%|█▌ | 1501/10000 [5:28:01<30:39:16, 12.98s/it] {'loss': 0.0076, 'learning_rate': 4.2515e-05, 'epoch': 0.57} 15%|█▌ | 1501/10000 [5:28:01<30:39:16, 12.98s/it] 15%|█▌ | 1502/10000 [5:28:14<30:37:21, 12.97s/it] {'loss': 0.0058, 'learning_rate': 4.251e-05, 'epoch': 0.57} 15%|█▌ | 1502/10000 [5:28:14<30:37:21, 12.97s/it] 15%|█▌ | 1503/10000 [5:28:27<30:37:53, 12.98s/it] {'loss': 0.0047, 'learning_rate': 4.2505e-05, 'epoch': 0.57} 15%|█▌ | 1503/10000 [5:28:27<30:37:53, 12.98s/it] 15%|█▌ | 1504/10000 [5:28:40<30:36:04, 12.97s/it] {'loss': 0.0056, 'learning_rate': 4.25e-05, 'epoch': 0.57} 15%|█▌ | 1504/10000 [5:28:40<30:36:04, 12.97s/it] 15%|█▌ | 1505/10000 [5:28:53<30:35:22, 12.96s/it] {'loss': 0.0063, 'learning_rate': 4.2495e-05, 'epoch': 0.57} 15%|█▌ | 1505/10000 [5:28:53<30:35:22, 12.96s/it] 15%|█▌ | 1506/10000 [5:29:06<30:34:48, 12.96s/it] {'loss': 0.0047, 'learning_rate': 4.249e-05, 'epoch': 0.57} 15%|█▌ | 1506/10000 [5:29:06<30:34:48, 12.96s/it] 15%|█▌ | 1507/10000 [5:29:19<30:30:02, 12.93s/it] {'loss': 0.0061, 'learning_rate': 4.2485000000000004e-05, 'epoch': 0.57} 15%|█▌ | 1507/10000 [5:29:19<30:30:02, 12.93s/it] 15%|█▌ | 1508/10000 [5:29:32<30:31:06, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.248e-05, 'epoch': 0.57} 15%|█▌ | 1508/10000 [5:29:32<30:31:06, 12.94s/it] 15%|█▌ | 1509/10000 [5:29:45<30:31:18, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.2475e-05, 'epoch': 0.57} 15%|█▌ | 1509/10000 [5:29:45<30:31:18, 12.94s/it] 15%|█▌ | 1510/10000 [5:29:58<30:31:34, 12.94s/it] {'loss': 0.0064, 
'learning_rate': 4.2470000000000005e-05, 'epoch': 0.57} 15%|█▌ | 1510/10000 [5:29:58<30:31:34, 12.94s/it] 15%|█▌ | 1511/10000 [5:30:11<30:32:02, 12.95s/it] {'loss': 0.0067, 'learning_rate': 4.246500000000001e-05, 'epoch': 0.57} 15%|█▌ | 1511/10000 [5:30:11<30:32:02, 12.95s/it] 15%|█▌ | 1512/10000 [5:30:24<30:31:50, 12.95s/it] {'loss': 0.0094, 'learning_rate': 4.246e-05, 'epoch': 0.57} 15%|█▌ | 1512/10000 [5:30:24<30:31:50, 12.95s/it] 15%|█▌ | 1513/10000 [5:30:37<30:32:44, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.2455e-05, 'epoch': 0.57} 15%|█▌ | 1513/10000 [5:30:37<30:32:44, 12.96s/it] 15%|█▌ | 1514/10000 [5:30:50<30:34:40, 12.97s/it] {'loss': 0.0053, 'learning_rate': 4.245e-05, 'epoch': 0.57} 15%|█▌ | 1514/10000 [5:30:50<30:34:40, 12.97s/it] 15%|█▌ | 1515/10000 [5:31:03<30:34:58, 12.98s/it] {'loss': 0.0052, 'learning_rate': 4.2445000000000004e-05, 'epoch': 0.57} 15%|█▌ | 1515/10000 [5:31:03<30:34:58, 12.98s/it] 15%|█▌ | 1516/10000 [5:31:16<30:33:24, 12.97s/it] {'loss': 0.0112, 'learning_rate': 4.244e-05, 'epoch': 0.57} 15%|█▌ | 1516/10000 [5:31:16<30:33:24, 12.97s/it] 15%|█▌ | 1517/10000 [5:31:29<30:33:52, 12.97s/it] {'loss': 0.0047, 'learning_rate': 4.2435e-05, 'epoch': 0.57} 15%|█▌ | 1517/10000 [5:31:29<30:33:52, 12.97s/it] 15%|█▌ | 1518/10000 [5:31:41<30:26:02, 12.92s/it] {'loss': 0.0061, 'learning_rate': 4.2430000000000005e-05, 'epoch': 0.57} 15%|█▌ | 1518/10000 [5:31:41<30:26:02, 12.92s/it] 15%|█▌ | 1519/10000 [5:31:54<30:23:44, 12.90s/it] {'loss': 0.0063, 'learning_rate': 4.2425e-05, 'epoch': 0.57} 15%|█▌ | 1519/10000 [5:31:54<30:23:44, 12.90s/it] 15%|█▌ | 1520/10000 [5:32:07<30:21:36, 12.89s/it] {'loss': 0.006, 'learning_rate': 4.2420000000000004e-05, 'epoch': 0.57} 15%|█▌ | 1520/10000 [5:32:07<30:21:36, 12.89s/it] 15%|█▌ | 1521/10000 [5:32:20<30:21:00, 12.89s/it] {'loss': 0.0052, 'learning_rate': 4.2415000000000006e-05, 'epoch': 0.57} 15%|█▌ | 1521/10000 [5:32:20<30:21:00, 12.89s/it] 15%|█▌ | 1522/10000 [5:32:33<30:20:01, 12.88s/it] {'loss': 0.0067, 
'learning_rate': 4.241e-05, 'epoch': 0.57} 15%|█▌ | 1522/10000 [5:32:33<30:20:01, 12.88s/it] 15%|█▌ | 1523/10000 [5:32:46<30:19:55, 12.88s/it] {'loss': 0.0066, 'learning_rate': 4.2405e-05, 'epoch': 0.57} 15%|█▌ | 1523/10000 [5:32:46<30:19:55, 12.88s/it] 15%|█▌ | 1524/10000 [5:32:59<30:20:40, 12.89s/it] {'loss': 0.0052, 'learning_rate': 4.24e-05, 'epoch': 0.57} 15%|█▌ | 1524/10000 [5:32:59<30:20:40, 12.89s/it] 15%|█▌ | 1525/10000 [5:33:11<30:17:33, 12.87s/it] {'loss': 0.0067, 'learning_rate': 4.2395e-05, 'epoch': 0.57} 15%|█▌ | 1525/10000 [5:33:11<30:17:33, 12.87s/it] 15%|█▌ | 1526/10000 [5:33:24<30:18:05, 12.87s/it] {'loss': 0.0058, 'learning_rate': 4.239e-05, 'epoch': 0.57} 15%|█▌ | 1526/10000 [5:33:24<30:18:05, 12.87s/it] 15%|█▌ | 1527/10000 [5:33:37<30:15:55, 12.86s/it] {'loss': 0.0069, 'learning_rate': 4.2385e-05, 'epoch': 0.58} 15%|█▌ | 1527/10000 [5:33:37<30:15:55, 12.86s/it] 15%|█▌ | 1528/10000 [5:33:50<30:19:45, 12.89s/it] {'loss': 0.0064, 'learning_rate': 4.2380000000000004e-05, 'epoch': 0.58} 15%|█▌ | 1528/10000 [5:33:50<30:19:45, 12.89s/it] 15%|█▌ | 1529/10000 [5:34:03<30:19:43, 12.89s/it] {'loss': 0.0053, 'learning_rate': 4.237500000000001e-05, 'epoch': 0.58} 15%|█▌ | 1529/10000 [5:34:03<30:19:43, 12.89s/it] 15%|█▌ | 1530/10000 [5:34:16<30:18:03, 12.88s/it] {'loss': 0.005, 'learning_rate': 4.237e-05, 'epoch': 0.58} 15%|█▌ | 1530/10000 [5:34:16<30:18:03, 12.88s/it] 15%|█▌ | 1531/10000 [5:34:29<30:14:54, 12.86s/it] {'loss': 0.0064, 'learning_rate': 4.2365000000000005e-05, 'epoch': 0.58} 15%|█▌ | 1531/10000 [5:34:29<30:14:54, 12.86s/it] 15%|█▌ | 1532/10000 [5:34:42<30:22:54, 12.92s/it] {'loss': 0.0053, 'learning_rate': 4.236e-05, 'epoch': 0.58} 15%|█▌ | 1532/10000 [5:34:42<30:22:54, 12.92s/it] 15%|█▌ | 1533/10000 [5:34:55<30:22:50, 12.92s/it] {'loss': 0.0056, 'learning_rate': 4.2355000000000004e-05, 'epoch': 0.58} 15%|█▌ | 1533/10000 [5:34:55<30:22:50, 12.92s/it] 15%|█▌ | 1534/10000 [5:35:07<30:20:24, 12.90s/it] {'loss': 0.0044, 'learning_rate': 4.235e-05, 
'epoch': 0.58} 15%|█▌ | 1534/10000 [5:35:08<30:20:24, 12.90s/it] 15%|█▌ | 1535/10000 [5:35:20<30:21:56, 12.91s/it] {'loss': 0.0057, 'learning_rate': 4.2345e-05, 'epoch': 0.58} 15%|█▌ | 1535/10000 [5:35:20<30:21:56, 12.91s/it] 15%|█▌ | 1536/10000 [5:35:33<30:20:28, 12.91s/it] {'loss': 0.007, 'learning_rate': 4.2340000000000005e-05, 'epoch': 0.58} 15%|█▌ | 1536/10000 [5:35:33<30:20:28, 12.91s/it] 15%|█▌ | 1537/10000 [5:35:46<30:16:56, 12.88s/it] {'loss': 0.0059, 'learning_rate': 4.2335e-05, 'epoch': 0.58} 15%|█▌ | 1537/10000 [5:35:46<30:16:56, 12.88s/it] 15%|█▌ | 1538/10000 [5:35:59<30:15:05, 12.87s/it] {'loss': 0.0062, 'learning_rate': 4.233e-05, 'epoch': 0.58} 15%|█▌ | 1538/10000 [5:35:59<30:15:05, 12.87s/it] 15%|█▌ | 1539/10000 [5:36:12<30:13:31, 12.86s/it] {'loss': 0.0055, 'learning_rate': 4.2325000000000006e-05, 'epoch': 0.58} 15%|█▌ | 1539/10000 [5:36:12<30:13:31, 12.86s/it] 15%|█▌ | 1540/10000 [5:36:25<30:13:48, 12.86s/it] {'loss': 0.0056, 'learning_rate': 4.232e-05, 'epoch': 0.58} 15%|█▌ | 1540/10000 [5:36:25<30:13:48, 12.86s/it] 15%|█▌ | 1541/10000 [5:36:38<30:12:33, 12.86s/it] {'loss': 0.0056, 'learning_rate': 4.2315000000000004e-05, 'epoch': 0.58} 15%|█▌ | 1541/10000 [5:36:38<30:12:33, 12.86s/it] 15%|█▌ | 1542/10000 [5:36:50<30:13:33, 12.87s/it] {'loss': 0.0059, 'learning_rate': 4.231e-05, 'epoch': 0.58} 15%|█▌ | 1542/10000 [5:36:50<30:13:33, 12.87s/it] 15%|█▌ | 1543/10000 [5:37:03<30:14:26, 12.87s/it] {'loss': 0.0057, 'learning_rate': 4.2305e-05, 'epoch': 0.58} 15%|█▌ | 1543/10000 [5:37:03<30:14:26, 12.87s/it] 15%|█▌ | 1544/10000 [5:37:16<30:15:55, 12.88s/it] {'loss': 0.0044, 'learning_rate': 4.23e-05, 'epoch': 0.58} 15%|█▌ | 1544/10000 [5:37:16<30:15:55, 12.88s/it] 15%|█▌ | 1545/10000 [5:37:29<30:16:38, 12.89s/it] {'loss': 0.005, 'learning_rate': 4.2295e-05, 'epoch': 0.58} 15%|█▌ | 1545/10000 [5:37:29<30:16:38, 12.89s/it] 15%|█▌ | 1546/10000 [5:37:42<30:17:38, 12.90s/it] {'loss': 0.0054, 'learning_rate': 4.229e-05, 'epoch': 0.58} 15%|█▌ | 1546/10000 
[5:37:42<30:17:38, 12.90s/it] 15%|█▌ | 1547/10000 [5:37:55<30:17:33, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.2285e-05, 'epoch': 0.58} 15%|█▌ | 1547/10000 [5:37:55<30:17:33, 12.90s/it] 15%|█▌ | 1548/10000 [5:38:08<30:20:33, 12.92s/it] {'loss': 0.0056, 'learning_rate': 4.228e-05, 'epoch': 0.58} 15%|█▌ | 1548/10000 [5:38:08<30:20:33, 12.92s/it] 15%|█▌ | 1549/10000 [5:38:21<30:23:55, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.2275000000000004e-05, 'epoch': 0.58} 15%|█▌ | 1549/10000 [5:38:21<30:23:55, 12.95s/it] 16%|█▌ | 1550/10000 [5:38:34<30:21:51, 12.94s/it] {'loss': 0.0058, 'learning_rate': 4.227000000000001e-05, 'epoch': 0.58} 16%|█▌ | 1550/10000 [5:38:34<30:21:51, 12.94s/it] 16%|█▌ | 1551/10000 [5:38:47<30:21:09, 12.93s/it] {'loss': 0.0056, 'learning_rate': 4.2265e-05, 'epoch': 0.58} 16%|█▌ | 1551/10000 [5:38:47<30:21:09, 12.93s/it] 16%|█▌ | 1552/10000 [5:39:00<30:19:54, 12.93s/it] {'loss': 0.0056, 'learning_rate': 4.226e-05, 'epoch': 0.58} 16%|█▌ | 1552/10000 [5:39:00<30:19:54, 12.93s/it] 16%|█▌ | 1553/10000 [5:39:13<30:19:26, 12.92s/it] {'loss': 0.0062, 'learning_rate': 4.2255e-05, 'epoch': 0.59} 16%|█▌ | 1553/10000 [5:39:13<30:19:26, 12.92s/it] 16%|█▌ | 1554/10000 [5:39:26<30:19:35, 12.93s/it] {'loss': 0.005, 'learning_rate': 4.2250000000000004e-05, 'epoch': 0.59} 16%|█▌ | 1554/10000 [5:39:26<30:19:35, 12.93s/it] 16%|█▌ | 1555/10000 [5:39:38<30:16:40, 12.91s/it] {'loss': 0.0055, 'learning_rate': 4.2245e-05, 'epoch': 0.59} 16%|█▌ | 1555/10000 [5:39:38<30:16:40, 12.91s/it] 16%|█▌ | 1556/10000 [5:39:51<30:12:13, 12.88s/it] {'loss': 0.0062, 'learning_rate': 4.224e-05, 'epoch': 0.59} 16%|█▌ | 1556/10000 [5:39:51<30:12:13, 12.88s/it] 16%|█▌ | 1557/10000 [5:40:04<30:12:10, 12.88s/it] {'loss': 0.0063, 'learning_rate': 4.2235000000000005e-05, 'epoch': 0.59} 16%|█▌ | 1557/10000 [5:40:04<30:12:10, 12.88s/it] 16%|█▌ | 1558/10000 [5:40:17<30:13:47, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.223e-05, 'epoch': 0.59} 16%|█▌ | 1558/10000 [5:40:17<30:13:47, 
12.89s/it] 16%|█▌ | 1559/10000 [5:40:30<30:13:49, 12.89s/it] {'loss': 0.0059, 'learning_rate': 4.2225e-05, 'epoch': 0.59} 16%|█▌ | 1559/10000 [5:40:30<30:13:49, 12.89s/it] 16%|█▌ | 1560/10000 [5:40:43<30:13:11, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.2220000000000006e-05, 'epoch': 0.59} 16%|█▌ | 1560/10000 [5:40:43<30:13:11, 12.89s/it] 16%|█▌ | 1561/10000 [5:40:56<30:11:58, 12.88s/it] {'loss': 0.0098, 'learning_rate': 4.2215e-05, 'epoch': 0.59} 16%|█▌ | 1561/10000 [5:40:56<30:11:58, 12.88s/it] 16%|█▌ | 1562/10000 [5:41:09<30:13:44, 12.90s/it] {'loss': 0.0065, 'learning_rate': 4.221e-05, 'epoch': 0.59} 16%|█▌ | 1562/10000 [5:41:09<30:13:44, 12.90s/it] 16%|█▌ | 1563/10000 [5:41:21<30:13:13, 12.89s/it] {'loss': 0.0077, 'learning_rate': 4.2205e-05, 'epoch': 0.59} 16%|█▌ | 1563/10000 [5:41:21<30:13:13, 12.89s/it] 16%|█▌ | 1564/10000 [5:41:34<30:11:33, 12.88s/it] {'loss': 0.0054, 'learning_rate': 4.22e-05, 'epoch': 0.59} 16%|█▌ | 1564/10000 [5:41:34<30:11:33, 12.88s/it] 16%|█▌ | 1565/10000 [5:41:47<30:07:20, 12.86s/it] {'loss': 0.0061, 'learning_rate': 4.2195e-05, 'epoch': 0.59} 16%|█▌ | 1565/10000 [5:41:47<30:07:20, 12.86s/it] 16%|█▌ | 1566/10000 [5:42:00<30:09:51, 12.88s/it] {'loss': 0.0074, 'learning_rate': 4.219e-05, 'epoch': 0.59} 16%|█▌ | 1566/10000 [5:42:00<30:09:51, 12.88s/it] 16%|█▌ | 1567/10000 [5:42:13<30:06:46, 12.86s/it] {'loss': 0.006, 'learning_rate': 4.2185000000000004e-05, 'epoch': 0.59} 16%|█▌ | 1567/10000 [5:42:13<30:06:46, 12.86s/it] 16%|█▌ | 1568/10000 [5:42:26<30:14:41, 12.91s/it] {'loss': 0.0064, 'learning_rate': 4.2180000000000006e-05, 'epoch': 0.59} 16%|█▌ | 1568/10000 [5:42:26<30:14:41, 12.91s/it] 16%|█▌ | 1569/10000 [5:42:39<30:18:08, 12.94s/it] {'loss': 0.006, 'learning_rate': 4.2175e-05, 'epoch': 0.59} 16%|█▌ | 1569/10000 [5:42:39<30:18:08, 12.94s/it] 16%|█▌ | 1570/10000 [5:42:52<30:16:01, 12.93s/it] {'loss': 0.0062, 'learning_rate': 4.2170000000000005e-05, 'epoch': 0.59} 16%|█▌ | 1570/10000 [5:42:52<30:16:01, 12.93s/it] 16%|█▌ | 
1571/10000 [5:43:05<30:13:51, 12.91s/it] {'loss': 0.0071, 'learning_rate': 4.216500000000001e-05, 'epoch': 0.59} 16%|█▌ | 1571/10000 [5:43:05<30:13:51, 12.91s/it] 16%|█▌ | 1572/10000 [5:43:18<30:18:00, 12.94s/it] {'loss': 0.0067, 'learning_rate': 4.2159999999999996e-05, 'epoch': 0.59} 16%|█▌ | 1572/10000 [5:43:18<30:18:00, 12.94s/it] 16%|█▌ | 1573/10000 [5:43:31<30:17:34, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.2155e-05, 'epoch': 0.59} 16%|█▌ | 1573/10000 [5:43:31<30:17:34, 12.94s/it] 16%|█▌ | 1574/10000 [5:43:44<30:16:14, 12.93s/it] {'loss': 0.0051, 'learning_rate': 4.215e-05, 'epoch': 0.59} 16%|█▌ | 1574/10000 [5:43:44<30:16:14, 12.93s/it] 16%|█▌ | 1575/10000 [5:43:56<30:14:01, 12.92s/it] {'loss': 0.0074, 'learning_rate': 4.2145000000000004e-05, 'epoch': 0.59} 16%|█▌ | 1575/10000 [5:43:56<30:14:01, 12.92s/it] 16%|█▌ | 1576/10000 [5:44:09<30:13:22, 12.92s/it] {'loss': 0.0049, 'learning_rate': 4.214e-05, 'epoch': 0.59} 16%|█▌ | 1576/10000 [5:44:09<30:13:22, 12.92s/it] 16%|█▌ | 1577/10000 [5:44:22<30:10:33, 12.90s/it] {'loss': 0.0076, 'learning_rate': 4.2135e-05, 'epoch': 0.59} 16%|█▌ | 1577/10000 [5:44:22<30:10:33, 12.90s/it] 16%|█▌ | 1578/10000 [5:44:35<30:10:17, 12.90s/it] {'loss': 0.0076, 'learning_rate': 4.2130000000000005e-05, 'epoch': 0.59} 16%|█▌ | 1578/10000 [5:44:35<30:10:17, 12.90s/it] 16%|█▌ | 1579/10000 [5:44:48<30:13:37, 12.92s/it] {'loss': 0.0067, 'learning_rate': 4.2125e-05, 'epoch': 0.59} 16%|█▌ | 1579/10000 [5:44:48<30:13:37, 12.92s/it] 16%|█▌ | 1580/10000 [5:45:01<30:13:10, 12.92s/it] {'loss': 0.0056, 'learning_rate': 4.212e-05, 'epoch': 0.6} 16%|█▌ | 1580/10000 [5:45:01<30:13:10, 12.92s/it] 16%|█▌ | 1581/10000 [5:45:14<30:12:11, 12.91s/it] {'loss': 0.0069, 'learning_rate': 4.2115000000000006e-05, 'epoch': 0.6} 16%|█▌ | 1581/10000 [5:45:14<30:12:11, 12.91s/it] 16%|█▌ | 1582/10000 [5:45:27<30:13:33, 12.93s/it] {'loss': 0.0048, 'learning_rate': 4.211e-05, 'epoch': 0.6} 16%|█▌ | 1582/10000 [5:45:27<30:13:33, 12.93s/it] 16%|█▌ | 1583/10000 
[5:45:40<30:09:41, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.2105e-05, 'epoch': 0.6} 16%|█▌ | 1583/10000 [5:45:40<30:09:41, 12.90s/it] 16%|█▌ | 1584/10000 [5:45:52<30:06:13, 12.88s/it] {'loss': 0.0071, 'learning_rate': 4.21e-05, 'epoch': 0.6} 16%|█▌ | 1584/10000 [5:45:53<30:06:13, 12.88s/it] 16%|█▌ | 1585/10000 [5:46:05<30:02:27, 12.85s/it] {'loss': 0.008, 'learning_rate': 4.2095e-05, 'epoch': 0.6} 16%|█▌ | 1585/10000 [5:46:05<30:02:27, 12.85s/it] 16%|█▌ | 1586/10000 [5:46:18<30:02:09, 12.85s/it] {'loss': 0.0047, 'learning_rate': 4.209e-05, 'epoch': 0.6} 16%|█▌ | 1586/10000 [5:46:18<30:02:09, 12.85s/it] 16%|█▌ | 1587/10000 [5:46:31<30:03:53, 12.86s/it] {'loss': 0.006, 'learning_rate': 4.2085e-05, 'epoch': 0.6} 16%|█▌ | 1587/10000 [5:46:31<30:03:53, 12.86s/it] 16%|█▌ | 1588/10000 [5:46:44<30:01:54, 12.85s/it] {'loss': 0.0054, 'learning_rate': 4.2080000000000004e-05, 'epoch': 0.6} 16%|█▌ | 1588/10000 [5:46:44<30:01:54, 12.85s/it] 16%|█▌ | 1589/10000 [5:46:57<30:04:27, 12.87s/it] {'loss': 0.0053, 'learning_rate': 4.2075000000000006e-05, 'epoch': 0.6} 16%|█▌ | 1589/10000 [5:46:57<30:04:27, 12.87s/it] 16%|█▌ | 1590/10000 [5:47:10<30:01:59, 12.86s/it] {'loss': 0.0051, 'learning_rate': 4.207e-05, 'epoch': 0.6} 16%|█▌ | 1590/10000 [5:47:10<30:01:59, 12.86s/it] 16%|█▌ | 1591/10000 [5:47:22<30:00:45, 12.85s/it] {'loss': 0.007, 'learning_rate': 4.2065000000000005e-05, 'epoch': 0.6} 16%|█▌ | 1591/10000 [5:47:22<30:00:45, 12.85s/it] 16%|█▌ | 1592/10000 [5:47:35<30:01:55, 12.86s/it] {'loss': 0.0043, 'learning_rate': 4.206e-05, 'epoch': 0.6} 16%|█▌ | 1592/10000 [5:47:35<30:01:55, 12.86s/it] 16%|█▌ | 1593/10000 [5:47:48<30:02:15, 12.86s/it] {'loss': 0.0064, 'learning_rate': 4.2055e-05, 'epoch': 0.6} 16%|█▌ | 1593/10000 [5:47:48<30:02:15, 12.86s/it] 16%|█▌ | 1594/10000 [5:48:01<30:04:58, 12.88s/it] {'loss': 0.0067, 'learning_rate': 4.205e-05, 'epoch': 0.6} 16%|█▌ | 1594/10000 [5:48:01<30:04:58, 12.88s/it] 16%|█▌ | 1595/10000 [5:48:14<30:07:52, 12.91s/it] {'loss': 0.0039, 
'learning_rate': 4.2045e-05, 'epoch': 0.6} 16%|█▌ | 1595/10000 [5:48:14<30:07:52, 12.91s/it] 16%|█▌ | 1596/10000 [5:48:27<30:08:23, 12.91s/it] {'loss': 0.0072, 'learning_rate': 4.2040000000000004e-05, 'epoch': 0.6} 16%|█▌ | 1596/10000 [5:48:27<30:08:23, 12.91s/it] 16%|█▌ | 1597/10000 [5:48:40<30:12:37, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.2035e-05, 'epoch': 0.6} 16%|█▌ | 1597/10000 [5:48:40<30:12:37, 12.94s/it] 16%|█▌ | 1598/10000 [5:48:53<30:13:20, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.203e-05, 'epoch': 0.6} 16%|█▌ | 1598/10000 [5:48:53<30:13:20, 12.95s/it] 16%|█▌ | 1599/10000 [5:49:06<30:13:32, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.2025000000000005e-05, 'epoch': 0.6} 16%|█▌ | 1599/10000 [5:49:06<30:13:32, 12.95s/it] 16%|█▌ | 1600/10000 [5:49:19<30:13:44, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.202e-05, 'epoch': 0.6} 16%|█▌ | 1600/10000 [5:49:19<30:13:44, 12.96s/it] 16%|█▌ | 1601/10000 [5:49:32<30:14:28, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.2015000000000003e-05, 'epoch': 0.6} 16%|█▌ | 1601/10000 [5:49:32<30:14:28, 12.96s/it] 16%|█▌ | 1602/10000 [5:49:45<30:16:08, 12.98s/it] {'loss': 0.0046, 'learning_rate': 4.201e-05, 'epoch': 0.6} 16%|█▌ | 1602/10000 [5:49:45<30:16:08, 12.98s/it] 16%|█▌ | 1603/10000 [5:49:58<30:15:48, 12.97s/it] {'loss': 0.0064, 'learning_rate': 4.2005e-05, 'epoch': 0.6} 16%|█▌ | 1603/10000 [5:49:58<30:15:48, 12.97s/it] 16%|█▌ | 1604/10000 [5:50:11<30:13:54, 12.96s/it] {'loss': 0.0049, 'learning_rate': 4.2e-05, 'epoch': 0.6} 16%|█▌ | 1604/10000 [5:50:11<30:13:54, 12.96s/it] 16%|█▌ | 1605/10000 [5:50:24<30:10:03, 12.94s/it] {'loss': 0.0064, 'learning_rate': 4.1995e-05, 'epoch': 0.6} 16%|█▌ | 1605/10000 [5:50:24<30:10:03, 12.94s/it] 16%|█▌ | 1606/10000 [5:50:37<30:07:49, 12.92s/it] {'loss': 0.0064, 'learning_rate': 4.199e-05, 'epoch': 0.61} 16%|█▌ | 1606/10000 [5:50:37<30:07:49, 12.92s/it] 16%|█▌ | 1607/10000 [5:50:49<30:07:09, 12.92s/it] {'loss': 0.0059, 'learning_rate': 4.1985000000000005e-05, 'epoch': 
0.61} 16%|█▌ | 1607/10000 [5:50:49<30:07:09, 12.92s/it] 16%|█▌ | 1608/10000 [5:51:02<30:10:40, 12.95s/it] {'loss': 0.0048, 'learning_rate': 4.198e-05, 'epoch': 0.61} 16%|█▌ | 1608/10000 [5:51:02<30:10:40, 12.95s/it] 16%|█▌ | 1609/10000 [5:51:15<30:11:37, 12.95s/it] {'loss': 0.0047, 'learning_rate': 4.1975000000000004e-05, 'epoch': 0.61} 16%|█▌ | 1609/10000 [5:51:15<30:11:37, 12.95s/it] 16%|█▌ | 1610/10000 [5:51:28<30:12:13, 12.96s/it] {'loss': 0.0052, 'learning_rate': 4.1970000000000006e-05, 'epoch': 0.61} 16%|█▌ | 1610/10000 [5:51:28<30:12:13, 12.96s/it] 16%|█▌ | 1611/10000 [5:51:41<30:14:45, 12.98s/it] {'loss': 0.006, 'learning_rate': 4.1965e-05, 'epoch': 0.61} 16%|█▌ | 1611/10000 [5:51:41<30:14:45, 12.98s/it] 16%|█▌ | 1612/10000 [5:51:54<30:14:24, 12.98s/it] {'loss': 0.0042, 'learning_rate': 4.196e-05, 'epoch': 0.61} 16%|█▌ | 1612/10000 [5:51:54<30:14:24, 12.98s/it] 16%|█▌ | 1613/10000 [5:52:07<30:13:42, 12.98s/it] {'loss': 0.0059, 'learning_rate': 4.1955e-05, 'epoch': 0.61} 16%|█▌ | 1613/10000 [5:52:07<30:13:42, 12.98s/it] 16%|█▌ | 1614/10000 [5:52:20<30:11:29, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.195e-05, 'epoch': 0.61} 16%|█▌ | 1614/10000 [5:52:20<30:11:29, 12.96s/it] 16%|█▌ | 1615/10000 [5:52:33<30:15:39, 12.99s/it] {'loss': 0.007, 'learning_rate': 4.1945e-05, 'epoch': 0.61} 16%|█▌ | 1615/10000 [5:52:33<30:15:39, 12.99s/it] 16%|█▌ | 1616/10000 [5:52:46<30:15:21, 12.99s/it] {'loss': 0.0048, 'learning_rate': 4.194e-05, 'epoch': 0.61} 16%|█▌ | 1616/10000 [5:52:46<30:15:21, 12.99s/it] 16%|█▌ | 1617/10000 [5:52:59<30:15:52, 13.00s/it] {'loss': 0.0056, 'learning_rate': 4.1935000000000004e-05, 'epoch': 0.61} 16%|█▌ | 1617/10000 [5:52:59<30:15:52, 13.00s/it] 16%|█▌ | 1618/10000 [5:53:12<30:14:44, 12.99s/it] {'loss': 0.0054, 'learning_rate': 4.193e-05, 'epoch': 0.61} 16%|█▌ | 1618/10000 [5:53:12<30:14:44, 12.99s/it] 16%|█▌ | 1619/10000 [5:53:25<30:12:47, 12.98s/it] {'loss': 0.0085, 'learning_rate': 4.1925e-05, 'epoch': 0.61} 16%|█▌ | 1619/10000 
[5:53:25<30:12:47, 12.98s/it] 16%|█▌ | 1620/10000 [5:53:38<30:09:39, 12.96s/it] {'loss': 0.009, 'learning_rate': 4.1920000000000005e-05, 'epoch': 0.61} 16%|█▌ | 1620/10000 [5:53:38<30:09:39, 12.96s/it] 16%|█▌ | 1621/10000 [5:53:51<30:06:47, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.1915e-05, 'epoch': 0.61} 16%|█▌ | 1621/10000 [5:53:51<30:06:47, 12.94s/it] 16%|█▌ | 1622/10000 [5:54:04<30:09:15, 12.96s/it] {'loss': 0.0067, 'learning_rate': 4.191e-05, 'epoch': 0.61} 16%|█▌ | 1622/10000 [5:54:04<30:09:15, 12.96s/it] 16%|█▌ | 1623/10000 [5:54:17<30:09:23, 12.96s/it] {'loss': 0.0082, 'learning_rate': 4.1905e-05, 'epoch': 0.61} 16%|█▌ | 1623/10000 [5:54:17<30:09:23, 12.96s/it] 16%|█▌ | 1624/10000 [5:54:30<30:08:44, 12.96s/it] {'loss': 0.0069, 'learning_rate': 4.19e-05, 'epoch': 0.61} 16%|█▌ | 1624/10000 [5:54:30<30:08:44, 12.96s/it] 16%|█▋ | 1625/10000 [5:54:43<30:09:54, 12.97s/it] {'loss': 0.0053, 'learning_rate': 4.1895e-05, 'epoch': 0.61} 16%|█▋ | 1625/10000 [5:54:43<30:09:54, 12.97s/it] 16%|█▋ | 1626/10000 [5:54:56<30:11:03, 12.98s/it] {'loss': 0.0063, 'learning_rate': 4.189e-05, 'epoch': 0.61} 16%|█▋ | 1626/10000 [5:54:56<30:11:03, 12.98s/it] 16%|█▋ | 1627/10000 [5:55:09<30:08:54, 12.96s/it] {'loss': 0.006, 'learning_rate': 4.1885e-05, 'epoch': 0.61} 16%|█▋ | 1627/10000 [5:55:09<30:08:54, 12.96s/it] 16%|█▋ | 1628/10000 [5:55:22<30:08:57, 12.96s/it] {'loss': 0.008, 'learning_rate': 4.1880000000000006e-05, 'epoch': 0.61} 16%|█▋ | 1628/10000 [5:55:22<30:08:57, 12.96s/it] 16%|█▋ | 1629/10000 [5:55:35<30:06:16, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.1875e-05, 'epoch': 0.61} 16%|█▋ | 1629/10000 [5:55:35<30:06:16, 12.95s/it] 16%|█▋ | 1630/10000 [5:55:48<30:08:08, 12.96s/it] {'loss': 0.0057, 'learning_rate': 4.1870000000000004e-05, 'epoch': 0.61} 16%|█▋ | 1630/10000 [5:55:48<30:08:08, 12.96s/it] 16%|█▋ | 1631/10000 [5:56:01<30:07:58, 12.96s/it] {'loss': 0.0055, 'learning_rate': 4.1865000000000007e-05, 'epoch': 0.61} 16%|█▋ | 1631/10000 [5:56:01<30:07:58, 
12.96s/it] 16%|█▋ | 1632/10000 [5:56:14<30:05:54, 12.95s/it] {'loss': 0.0089, 'learning_rate': 4.186e-05, 'epoch': 0.61} 16%|█▋ | 1632/10000 [5:56:14<30:05:54, 12.95s/it] 16%|█▋ | 1633/10000 [5:56:27<30:02:50, 12.93s/it] {'loss': 0.0061, 'learning_rate': 4.1855e-05, 'epoch': 0.62} 16%|█▋ | 1633/10000 [5:56:27<30:02:50, 12.93s/it] 16%|█▋ | 1634/10000 [5:56:40<30:07:07, 12.96s/it] {'loss': 0.004, 'learning_rate': 4.185e-05, 'epoch': 0.62} 16%|█▋ | 1634/10000 [5:56:40<30:07:07, 12.96s/it] 16%|█▋ | 1635/10000 [5:56:53<30:13:13, 13.01s/it] {'loss': 0.0068, 'learning_rate': 4.1845000000000003e-05, 'epoch': 0.62} 16%|█▋ | 1635/10000 [5:56:53<30:13:13, 13.01s/it] 16%|█▋ | 1636/10000 [5:57:06<30:09:14, 12.98s/it] {'loss': 0.0059, 'learning_rate': 4.184e-05, 'epoch': 0.62} 16%|█▋ | 1636/10000 [5:57:06<30:09:14, 12.98s/it] 16%|█▋ | 1637/10000 [5:57:19<30:12:33, 13.00s/it] {'loss': 0.0084, 'learning_rate': 4.1835e-05, 'epoch': 0.62} 16%|█▋ | 1637/10000 [5:57:19<30:12:33, 13.00s/it] 16%|█▋ | 1638/10000 [5:57:32<30:07:54, 12.97s/it] {'loss': 0.0077, 'learning_rate': 4.1830000000000004e-05, 'epoch': 0.62} 16%|█▋ | 1638/10000 [5:57:32<30:07:54, 12.97s/it] 16%|█▋ | 1639/10000 [5:57:45<30:06:26, 12.96s/it] {'loss': 0.0073, 'learning_rate': 4.1825e-05, 'epoch': 0.62} 16%|█▋ | 1639/10000 [5:57:45<30:06:26, 12.96s/it] 16%|█▋ | 1640/10000 [5:57:57<30:06:05, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.182e-05, 'epoch': 0.62} 16%|█▋ | 1640/10000 [5:57:58<30:06:05, 12.96s/it] 16%|█▋ | 1641/10000 [5:58:10<30:02:05, 12.94s/it] {'loss': 0.0055, 'learning_rate': 4.1815000000000005e-05, 'epoch': 0.62} 16%|█▋ | 1641/10000 [5:58:10<30:02:05, 12.94s/it] 16%|█▋ | 1642/10000 [5:58:23<29:58:46, 12.91s/it] {'loss': 0.0077, 'learning_rate': 4.181000000000001e-05, 'epoch': 0.62} 16%|█▋ | 1642/10000 [5:58:23<29:58:46, 12.91s/it] 16%|█▋ | 1643/10000 [5:58:36<29:57:08, 12.90s/it] {'loss': 0.0061, 'learning_rate': 4.1805e-05, 'epoch': 0.62} 16%|█▋ | 1643/10000 [5:58:36<29:57:08, 12.90s/it] 16%|█▋ | 
1644/10000 [5:58:49<29:55:48, 12.89s/it] {'loss': 0.0068, 'learning_rate': 4.18e-05, 'epoch': 0.62} 16%|█▋ | 1644/10000 [5:58:49<29:55:48, 12.89s/it] 16%|█▋ | 1645/10000 [5:59:02<29:58:09, 12.91s/it] {'loss': 0.0059, 'learning_rate': 4.1795e-05, 'epoch': 0.62} 16%|█▋ | 1645/10000 [5:59:02<29:58:09, 12.91s/it] 16%|█▋ | 1646/10000 [5:59:15<29:55:00, 12.89s/it] {'loss': 0.0054, 'learning_rate': 4.179e-05, 'epoch': 0.62} 16%|█▋ | 1646/10000 [5:59:15<29:55:00, 12.89s/it] 16%|█▋ | 1647/10000 [5:59:28<29:54:59, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.1785e-05, 'epoch': 0.62} 16%|█▋ | 1647/10000 [5:59:28<29:54:59, 12.89s/it] 16%|█▋ | 1648/10000 [5:59:40<29:52:13, 12.88s/it] {'loss': 0.0049, 'learning_rate': 4.178e-05, 'epoch': 0.62} 16%|█▋ | 1648/10000 [5:59:41<29:52:13, 12.88s/it] 16%|█▋ | 1649/10000 [5:59:53<29:50:55, 12.87s/it] {'loss': 0.0056, 'learning_rate': 4.1775000000000006e-05, 'epoch': 0.62} 16%|█▋ | 1649/10000 [5:59:53<29:50:55, 12.87s/it] 16%|█▋ | 1650/10000 [6:00:06<29:48:09, 12.85s/it] {'loss': 0.007, 'learning_rate': 4.177e-05, 'epoch': 0.62} 16%|█▋ | 1650/10000 [6:00:06<29:48:09, 12.85s/it] 17%|█▋ | 1651/10000 [6:00:19<29:46:14, 12.84s/it] {'loss': 0.005, 'learning_rate': 4.1765000000000004e-05, 'epoch': 0.62} 17%|█▋ | 1651/10000 [6:00:19<29:46:14, 12.84s/it] 17%|█▋ | 1652/10000 [6:00:32<29:48:39, 12.86s/it] {'loss': 0.005, 'learning_rate': 4.176000000000001e-05, 'epoch': 0.62} 17%|█▋ | 1652/10000 [6:00:32<29:48:39, 12.86s/it] 17%|█▋ | 1653/10000 [6:00:45<29:49:17, 12.86s/it] {'loss': 0.005, 'learning_rate': 4.1755e-05, 'epoch': 0.62} 17%|█▋ | 1653/10000 [6:00:45<29:49:17, 12.86s/it] 17%|█▋ | 1654/10000 [6:00:58<29:50:53, 12.87s/it] {'loss': 0.0088, 'learning_rate': 4.175e-05, 'epoch': 0.62} 17%|█▋ | 1654/10000 [6:00:58<29:50:53, 12.87s/it] 17%|█▋ | 1655/10000 [6:01:10<29:48:00, 12.86s/it] {'loss': 0.0051, 'learning_rate': 4.1745e-05, 'epoch': 0.62} 17%|█▋ | 1655/10000 [6:01:10<29:48:00, 12.86s/it] 17%|█▋ | 1656/10000 [6:01:23<29:46:17, 12.84s/it] 
{'loss': 0.0109, 'learning_rate': 4.1740000000000004e-05, 'epoch': 0.62} 17%|█▋ | 1656/10000 [6:01:23<29:46:17, 12.84s/it] 17%|█▋ | 1657/10000 [6:01:36<29:46:20, 12.85s/it] {'loss': 0.009, 'learning_rate': 4.1735e-05, 'epoch': 0.62} 17%|█▋ | 1657/10000 [6:01:36<29:46:20, 12.85s/it] 17%|█▋ | 1658/10000 [6:01:49<29:46:44, 12.85s/it] {'loss': 0.0069, 'learning_rate': 4.173e-05, 'epoch': 0.62} 17%|█▋ | 1658/10000 [6:01:49<29:46:44, 12.85s/it] 17%|█▋ | 1659/10000 [6:02:02<29:46:02, 12.85s/it] {'loss': 0.0052, 'learning_rate': 4.1725000000000005e-05, 'epoch': 0.63} 17%|█▋ | 1659/10000 [6:02:02<29:46:02, 12.85s/it] 17%|█▋ | 1660/10000 [6:02:15<29:44:02, 12.83s/it] {'loss': 0.0053, 'learning_rate': 4.172e-05, 'epoch': 0.63} 17%|█▋ | 1660/10000 [6:02:15<29:44:02, 12.83s/it] 17%|█▋ | 1661/10000 [6:02:27<29:43:20, 12.83s/it] {'loss': 0.007, 'learning_rate': 4.1715e-05, 'epoch': 0.63} 17%|█▋ | 1661/10000 [6:02:27<29:43:20, 12.83s/it] 17%|█▋ | 1662/10000 [6:02:40<29:43:09, 12.83s/it] {'loss': 0.0073, 'learning_rate': 4.1710000000000006e-05, 'epoch': 0.63} 17%|█▋ | 1662/10000 [6:02:40<29:43:09, 12.83s/it] 17%|█▋ | 1663/10000 [6:02:53<29:46:04, 12.85s/it] {'loss': 0.0049, 'learning_rate': 4.1705e-05, 'epoch': 0.63} 17%|█▋ | 1663/10000 [6:02:53<29:46:04, 12.85s/it] 17%|█▋ | 1664/10000 [6:03:06<29:45:27, 12.85s/it] {'loss': 0.0059, 'learning_rate': 4.17e-05, 'epoch': 0.63} 17%|█▋ | 1664/10000 [6:03:06<29:45:27, 12.85s/it] 17%|█▋ | 1665/10000 [6:03:19<29:48:18, 12.87s/it] {'loss': 0.0059, 'learning_rate': 4.1695e-05, 'epoch': 0.63} 17%|█▋ | 1665/10000 [6:03:19<29:48:18, 12.87s/it] 17%|█▋ | 1666/10000 [6:03:32<29:47:51, 12.87s/it] {'loss': 0.006, 'learning_rate': 4.169e-05, 'epoch': 0.63} 17%|█▋ | 1666/10000 [6:03:32<29:47:51, 12.87s/it] 17%|█▋ | 1667/10000 [6:03:45<29:49:14, 12.88s/it] {'loss': 0.0055, 'learning_rate': 4.1685000000000005e-05, 'epoch': 0.63} 17%|█▋ | 1667/10000 [6:03:45<29:49:14, 12.88s/it] 17%|█▋ | 1668/10000 [6:03:58<29:47:01, 12.87s/it] {'loss': 0.006, 
'learning_rate': 4.168e-05, 'epoch': 0.63} 17%|█▋ | 1668/10000 [6:03:58<29:47:01, 12.87s/it] 17%|█▋ | 1669/10000 [6:04:10<29:46:36, 12.87s/it] {'loss': 0.0101, 'learning_rate': 4.1675e-05, 'epoch': 0.63} 17%|█▋ | 1669/10000 [6:04:10<29:46:36, 12.87s/it] 17%|█▋ | 1670/10000 [6:04:23<29:45:07, 12.86s/it] {'loss': 0.0059, 'learning_rate': 4.1670000000000006e-05, 'epoch': 0.63} 17%|█▋ | 1670/10000 [6:04:23<29:45:07, 12.86s/it] 17%|█▋ | 1671/10000 [6:04:36<29:48:26, 12.88s/it] {'loss': 0.0049, 'learning_rate': 4.1665e-05, 'epoch': 0.63} 17%|█▋ | 1671/10000 [6:04:36<29:48:26, 12.88s/it] 17%|█▋ | 1672/10000 [6:04:49<29:49:55, 12.90s/it] {'loss': 0.0063, 'learning_rate': 4.1660000000000004e-05, 'epoch': 0.63} 17%|█▋ | 1672/10000 [6:04:49<29:49:55, 12.90s/it] 17%|█▋ | 1673/10000 [6:05:02<29:50:42, 12.90s/it] {'loss': 0.005, 'learning_rate': 4.1655e-05, 'epoch': 0.63} 17%|█▋ | 1673/10000 [6:05:02<29:50:42, 12.90s/it] 17%|█▋ | 1674/10000 [6:05:15<29:49:33, 12.90s/it] {'loss': 0.0061, 'learning_rate': 4.165e-05, 'epoch': 0.63} 17%|█▋ | 1674/10000 [6:05:15<29:49:33, 12.90s/it] 17%|█▋ | 1675/10000 [6:05:28<29:45:36, 12.87s/it] {'loss': 0.0065, 'learning_rate': 4.1645e-05, 'epoch': 0.63} 17%|█▋ | 1675/10000 [6:05:28<29:45:36, 12.87s/it] 17%|█▋ | 1676/10000 [6:05:41<29:47:02, 12.88s/it] {'loss': 0.007, 'learning_rate': 4.164e-05, 'epoch': 0.63} 17%|█▋ | 1676/10000 [6:05:41<29:47:02, 12.88s/it] 17%|█▋ | 1677/10000 [6:05:54<29:45:49, 12.87s/it] {'loss': 0.0053, 'learning_rate': 4.1635000000000004e-05, 'epoch': 0.63} 17%|█▋ | 1677/10000 [6:05:54<29:45:49, 12.87s/it] 17%|█▋ | 1678/10000 [6:06:06<29:44:20, 12.86s/it] {'loss': 0.0056, 'learning_rate': 4.163e-05, 'epoch': 0.63} 17%|█▋ | 1678/10000 [6:06:06<29:44:20, 12.86s/it] 17%|█▋ | 1679/10000 [6:06:19<29:45:25, 12.87s/it] {'loss': 0.0062, 'learning_rate': 4.1625e-05, 'epoch': 0.63} 17%|█▋ | 1679/10000 [6:06:19<29:45:25, 12.87s/it] 17%|█▋ | 1680/10000 [6:06:32<29:45:53, 12.88s/it] {'loss': 0.0068, 'learning_rate': 
4.1620000000000005e-05, 'epoch': 0.63} 17%|█▋ | 1680/10000 [6:06:32<29:45:53, 12.88s/it] 17%|█▋ | 1681/10000 [6:06:45<29:46:16, 12.88s/it] {'loss': 0.0054, 'learning_rate': 4.161500000000001e-05, 'epoch': 0.63} 17%|█▋ | 1681/10000 [6:06:45<29:46:16, 12.88s/it] 17%|█▋ | 1682/10000 [6:06:58<29:44:26, 12.87s/it] {'loss': 0.008, 'learning_rate': 4.161e-05, 'epoch': 0.63} 17%|█▋ | 1682/10000 [6:06:58<29:44:26, 12.87s/it] 17%|█▋ | 1683/10000 [6:07:11<29:43:40, 12.87s/it] {'loss': 0.0056, 'learning_rate': 4.1605e-05, 'epoch': 0.63} 17%|█▋ | 1683/10000 [6:07:11<29:43:40, 12.87s/it] 17%|█▋ | 1684/10000 [6:07:24<29:42:19, 12.86s/it] {'loss': 0.0071, 'learning_rate': 4.16e-05, 'epoch': 0.63} 17%|█▋ | 1684/10000 [6:07:24<29:42:19, 12.86s/it] 17%|█▋ | 1685/10000 [6:07:36<29:41:31, 12.86s/it] {'loss': 0.0056, 'learning_rate': 4.1595e-05, 'epoch': 0.63} 17%|█▋ | 1685/10000 [6:07:36<29:41:31, 12.86s/it] 17%|█▋ | 1686/10000 [6:07:49<29:47:01, 12.90s/it] {'loss': 0.0057, 'learning_rate': 4.159e-05, 'epoch': 0.64} 17%|█▋ | 1686/10000 [6:07:49<29:47:01, 12.90s/it] 17%|█▋ | 1687/10000 [6:08:02<29:47:01, 12.90s/it] {'loss': 0.0087, 'learning_rate': 4.1585e-05, 'epoch': 0.64} 17%|█▋ | 1687/10000 [6:08:02<29:47:01, 12.90s/it] 17%|█▋ | 1688/10000 [6:08:15<29:46:16, 12.89s/it] {'loss': 0.0066, 'learning_rate': 4.1580000000000005e-05, 'epoch': 0.64} 17%|█▋ | 1688/10000 [6:08:15<29:46:16, 12.89s/it] 17%|█▋ | 1689/10000 [6:08:28<29:47:23, 12.90s/it] {'loss': 0.0048, 'learning_rate': 4.1575e-05, 'epoch': 0.64} 17%|█▋ | 1689/10000 [6:08:28<29:47:23, 12.90s/it] 17%|█▋ | 1690/10000 [6:08:41<29:48:57, 12.92s/it] {'loss': 0.0085, 'learning_rate': 4.1570000000000003e-05, 'epoch': 0.64} 17%|█▋ | 1690/10000 [6:08:41<29:48:57, 12.92s/it] 17%|█▋ | 1691/10000 [6:08:54<29:47:10, 12.91s/it] {'loss': 0.0067, 'learning_rate': 4.1565000000000006e-05, 'epoch': 0.64} 17%|█▋ | 1691/10000 [6:08:54<29:47:10, 12.91s/it] 17%|█▋ | 1692/10000 [6:09:07<29:47:49, 12.91s/it] {'loss': 0.0041, 'learning_rate': 4.156e-05, 
'epoch': 0.64} 17%|█▋ | 1692/10000 [6:09:07<29:47:49, 12.91s/it] 17%|█▋ | 1693/10000 [6:09:20<29:44:41, 12.89s/it] {'loss': 0.0082, 'learning_rate': 4.1555e-05, 'epoch': 0.64} 17%|█▋ | 1693/10000 [6:09:20<29:44:41, 12.89s/it] 17%|█▋ | 1694/10000 [6:09:33<29:45:16, 12.90s/it] {'loss': 0.0065, 'learning_rate': 4.155e-05, 'epoch': 0.64} 17%|█▋ | 1694/10000 [6:09:33<29:45:16, 12.90s/it] 17%|█▋ | 1695/10000 [6:09:46<29:46:58, 12.91s/it] {'loss': 0.0078, 'learning_rate': 4.1545e-05, 'epoch': 0.64} 17%|█▋ | 1695/10000 [6:09:46<29:46:58, 12.91s/it] 17%|█▋ | 1696/10000 [6:09:59<29:50:01, 12.93s/it] {'loss': 0.0071, 'learning_rate': 4.154e-05, 'epoch': 0.64} 17%|█▋ | 1696/10000 [6:09:59<29:50:01, 12.93s/it] 17%|█▋ | 1697/10000 [6:10:12<29:51:43, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.1535e-05, 'epoch': 0.64} 17%|█▋ | 1697/10000 [6:10:12<29:51:43, 12.95s/it] 17%|█▋ | 1698/10000 [6:10:24<29:51:44, 12.95s/it] {'loss': 0.0064, 'learning_rate': 4.1530000000000004e-05, 'epoch': 0.64} 17%|█▋ | 1698/10000 [6:10:25<29:51:44, 12.95s/it] 17%|█▋ | 1699/10000 [6:10:37<29:53:05, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.1525e-05, 'epoch': 0.64} 17%|█▋ | 1699/10000 [6:10:38<29:53:05, 12.96s/it] 17%|█▋ | 1700/10000 [6:10:50<29:51:41, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.152e-05, 'epoch': 0.64} 17%|█▋ | 1700/10000 [6:10:50<29:51:41, 12.95s/it] 17%|█▋ | 1701/10000 [6:11:03<29:50:31, 12.95s/it] {'loss': 0.0068, 'learning_rate': 4.1515000000000005e-05, 'epoch': 0.64} 17%|█▋ | 1701/10000 [6:11:03<29:50:31, 12.95s/it] 17%|█▋ | 1702/10000 [6:11:16<29:53:46, 12.97s/it] {'loss': 0.0071, 'learning_rate': 4.151000000000001e-05, 'epoch': 0.64} 17%|█▋ | 1702/10000 [6:11:16<29:53:46, 12.97s/it] 17%|█▋ | 1703/10000 [6:11:29<29:54:19, 12.98s/it] {'loss': 0.0068, 'learning_rate': 4.1504999999999996e-05, 'epoch': 0.64} 17%|█▋ | 1703/10000 [6:11:29<29:54:19, 12.98s/it] 17%|█▋ | 1704/10000 [6:11:42<29:54:00, 12.98s/it] {'loss': 0.0063, 'learning_rate': 4.15e-05, 'epoch': 0.64} 17%|█▋ | 
1704/10000 [6:11:42<29:54:00, 12.98s/it] 17%|█▋ | 1705/10000 [6:11:55<29:51:01, 12.95s/it] {'loss': 0.0084, 'learning_rate': 4.1495e-05, 'epoch': 0.64} 17%|█▋ | 1705/10000 [6:11:55<29:51:01, 12.95s/it] 17%|█▋ | 1706/10000 [6:12:08<29:48:17, 12.94s/it] {'loss': 0.0082, 'learning_rate': 4.1490000000000004e-05, 'epoch': 0.64} 17%|█▋ | 1706/10000 [6:12:08<29:48:17, 12.94s/it] 17%|█▋ | 1707/10000 [6:12:21<29:47:37, 12.93s/it] {'loss': 0.0056, 'learning_rate': 4.1485e-05, 'epoch': 0.64} 17%|█▋ | 1707/10000 [6:12:21<29:47:37, 12.93s/it] 17%|█▋ | 1708/10000 [6:12:34<29:47:40, 12.94s/it] {'loss': 0.0079, 'learning_rate': 4.148e-05, 'epoch': 0.64} 17%|█▋ | 1708/10000 [6:12:34<29:47:40, 12.94s/it] 17%|█▋ | 1709/10000 [6:12:47<29:49:37, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.1475000000000005e-05, 'epoch': 0.64} 17%|█▋ | 1709/10000 [6:12:47<29:49:37, 12.95s/it] 17%|█▋ | 1710/10000 [6:13:00<29:53:02, 12.98s/it] {'loss': 0.0047, 'learning_rate': 4.147e-05, 'epoch': 0.64} 17%|█▋ | 1710/10000 [6:13:00<29:53:02, 12.98s/it] 17%|█▋ | 1711/10000 [6:13:13<29:53:09, 12.98s/it] {'loss': 0.007, 'learning_rate': 4.1465000000000004e-05, 'epoch': 0.64} 17%|█▋ | 1711/10000 [6:13:13<29:53:09, 12.98s/it] 17%|█▋ | 1712/10000 [6:13:26<29:54:26, 12.99s/it] {'loss': 0.0065, 'learning_rate': 4.1460000000000006e-05, 'epoch': 0.65} 17%|█▋ | 1712/10000 [6:13:26<29:54:26, 12.99s/it] 17%|█▋ | 1713/10000 [6:13:39<29:49:41, 12.96s/it] {'loss': 0.008, 'learning_rate': 4.1455e-05, 'epoch': 0.65} 17%|█▋ | 1713/10000 [6:13:39<29:49:41, 12.96s/it] 17%|█▋ | 1714/10000 [6:13:52<29:48:41, 12.95s/it] {'loss': 0.0069, 'learning_rate': 4.145e-05, 'epoch': 0.65} 17%|█▋ | 1714/10000 [6:13:52<29:48:41, 12.95s/it] 17%|█▋ | 1715/10000 [6:14:05<29:50:07, 12.96s/it] {'loss': 0.0089, 'learning_rate': 4.1445e-05, 'epoch': 0.65} 17%|█▋ | 1715/10000 [6:14:05<29:50:07, 12.96s/it] 17%|█▋ | 1716/10000 [6:14:18<29:48:34, 12.95s/it] {'loss': 0.0064, 'learning_rate': 4.144e-05, 'epoch': 0.65} 17%|█▋ | 1716/10000 
[6:14:18<29:48:34, 12.95s/it] 17%|█▋ | 1717/10000 [6:14:31<29:50:00, 12.97s/it] {'loss': 0.0067, 'learning_rate': 4.1435e-05, 'epoch': 0.65} 17%|█▋ | 1717/10000 [6:14:31<29:50:00, 12.97s/it] 17%|█▋ | 1718/10000 [6:14:44<29:47:41, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.143e-05, 'epoch': 0.65} 17%|█▋ | 1718/10000 [6:14:44<29:47:41, 12.95s/it] 17%|█▋ | 1719/10000 [6:14:57<29:48:54, 12.96s/it] {'loss': 0.007, 'learning_rate': 4.1425000000000004e-05, 'epoch': 0.65} 17%|█▋ | 1719/10000 [6:14:57<29:48:54, 12.96s/it] 17%|█▋ | 1720/10000 [6:15:10<29:48:07, 12.96s/it] {'loss': 0.0062, 'learning_rate': 4.142000000000001e-05, 'epoch': 0.65} 17%|█▋ | 1720/10000 [6:15:10<29:48:07, 12.96s/it] 17%|█▋ | 1721/10000 [6:15:23<29:49:04, 12.97s/it] {'loss': 0.0055, 'learning_rate': 4.1415e-05, 'epoch': 0.65} 17%|█▋ | 1721/10000 [6:15:23<29:49:04, 12.97s/it] 17%|█▋ | 1722/10000 [6:15:36<29:47:34, 12.96s/it] {'loss': 0.0083, 'learning_rate': 4.1410000000000005e-05, 'epoch': 0.65} 17%|█▋ | 1722/10000 [6:15:36<29:47:34, 12.96s/it] 17%|█▋ | 1723/10000 [6:15:49<29:48:39, 12.97s/it] {'loss': 0.0061, 'learning_rate': 4.1405e-05, 'epoch': 0.65} 17%|█▋ | 1723/10000 [6:15:49<29:48:39, 12.97s/it] 17%|█▋ | 1724/10000 [6:16:02<29:51:50, 12.99s/it] {'loss': 0.0044, 'learning_rate': 4.14e-05, 'epoch': 0.65} 17%|█▋ | 1724/10000 [6:16:02<29:51:50, 12.99s/it] 17%|█▋ | 1725/10000 [6:16:14<29:47:14, 12.96s/it] {'loss': 0.0067, 'learning_rate': 4.1395e-05, 'epoch': 0.65} 17%|█▋ | 1725/10000 [6:16:15<29:47:14, 12.96s/it] 17%|█▋ | 1726/10000 [6:16:27<29:48:02, 12.97s/it] {'loss': 0.0059, 'learning_rate': 4.139e-05, 'epoch': 0.65} 17%|█▋ | 1726/10000 [6:16:27<29:48:02, 12.97s/it] 17%|█▋ | 1727/10000 [6:16:40<29:50:02, 12.98s/it] {'loss': 0.0046, 'learning_rate': 4.1385000000000004e-05, 'epoch': 0.65} 17%|█▋ | 1727/10000 [6:16:41<29:50:02, 12.98s/it] 17%|█▋ | 1728/10000 [6:16:53<29:48:53, 12.98s/it] {'loss': 0.0059, 'learning_rate': 4.138e-05, 'epoch': 0.65} 17%|█▋ | 1728/10000 [6:16:53<29:48:53, 
12.98s/it] 17%|█▋ | 1729/10000 [6:17:06<29:48:02, 12.97s/it] {'loss': 0.0407, 'learning_rate': 4.1375e-05, 'epoch': 0.65} 17%|█▋ | 1729/10000 [6:17:06<29:48:02, 12.97s/it] 17%|█▋ | 1730/10000 [6:17:19<29:45:43, 12.96s/it] {'loss': 0.005, 'learning_rate': 4.1370000000000005e-05, 'epoch': 0.65} 17%|█▋ | 1730/10000 [6:17:19<29:45:43, 12.96s/it] 17%|█▋ | 1731/10000 [6:17:32<29:45:37, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.1365e-05, 'epoch': 0.65} 17%|█▋ | 1731/10000 [6:17:32<29:45:37, 12.96s/it] 17%|█▋ | 1732/10000 [6:17:45<29:46:10, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.1360000000000004e-05, 'epoch': 0.65} 17%|█▋ | 1732/10000 [6:17:45<29:46:10, 12.96s/it] 17%|█▋ | 1733/10000 [6:17:58<29:45:28, 12.96s/it] {'loss': 0.0068, 'learning_rate': 4.1355e-05, 'epoch': 0.65} 17%|█▋ | 1733/10000 [6:17:58<29:45:28, 12.96s/it] 17%|█▋ | 1734/10000 [6:18:11<29:43:55, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.135e-05, 'epoch': 0.65} 17%|█▋ | 1734/10000 [6:18:11<29:43:55, 12.95s/it] 17%|█▋ | 1735/10000 [6:18:24<29:44:37, 12.96s/it] {'loss': 0.0044, 'learning_rate': 4.1345e-05, 'epoch': 0.65} 17%|█▋ | 1735/10000 [6:18:24<29:44:37, 12.96s/it] 17%|█▋ | 1736/10000 [6:18:37<29:45:52, 12.97s/it] {'loss': 0.0064, 'learning_rate': 4.134e-05, 'epoch': 0.65} 17%|█▋ | 1736/10000 [6:18:37<29:45:52, 12.97s/it] 17%|█▋ | 1737/10000 [6:18:50<29:44:04, 12.95s/it] {'loss': 0.0065, 'learning_rate': 4.1335e-05, 'epoch': 0.65} 17%|█▋ | 1737/10000 [6:18:50<29:44:04, 12.95s/it] 17%|█▋ | 1738/10000 [6:19:03<29:42:35, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.133e-05, 'epoch': 0.65} 17%|█▋ | 1738/10000 [6:19:03<29:42:35, 12.95s/it] 17%|█▋ | 1739/10000 [6:19:16<29:43:14, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.1325e-05, 'epoch': 0.66} 17%|█▋ | 1739/10000 [6:19:16<29:43:14, 12.95s/it] 17%|█▋ | 1740/10000 [6:19:29<29:45:14, 12.97s/it] {'loss': 0.0057, 'learning_rate': 4.1320000000000004e-05, 'epoch': 0.66} 17%|█▋ | 1740/10000 [6:19:29<29:45:14, 12.97s/it] 17%|█▋ | 1741/10000 
[6:19:42<29:43:04, 12.95s/it] {'loss': 0.0058, 'learning_rate': 4.131500000000001e-05, 'epoch': 0.66} 17%|█▋ | 1741/10000 [6:19:42<29:43:04, 12.95s/it] 17%|█▋ | 1742/10000 [6:19:55<29:41:55, 12.95s/it] {'loss': 0.0062, 'learning_rate': 4.131e-05, 'epoch': 0.66} 17%|█▋ | 1742/10000 [6:19:55<29:41:55, 12.95s/it] 17%|█▋ | 1743/10000 [6:20:08<29:41:06, 12.94s/it] {'loss': 0.0054, 'learning_rate': 4.1305e-05, 'epoch': 0.66} 17%|█▋ | 1743/10000 [6:20:08<29:41:06, 12.94s/it] 17%|█▋ | 1744/10000 [6:20:21<29:41:16, 12.95s/it] {'loss': 0.0047, 'learning_rate': 4.13e-05, 'epoch': 0.66} 17%|█▋ | 1744/10000 [6:20:21<29:41:16, 12.95s/it] 17%|█▋ | 1745/10000 [6:20:34<29:41:30, 12.95s/it] {'loss': 0.0078, 'learning_rate': 4.1295000000000004e-05, 'epoch': 0.66} 17%|█▋ | 1745/10000 [6:20:34<29:41:30, 12.95s/it] 17%|█▋ | 1746/10000 [6:20:47<29:43:24, 12.96s/it] {'loss': 0.0048, 'learning_rate': 4.129e-05, 'epoch': 0.66} 17%|█▋ | 1746/10000 [6:20:47<29:43:24, 12.96s/it] 17%|█▋ | 1747/10000 [6:21:00<29:41:14, 12.95s/it] {'loss': 0.0058, 'learning_rate': 4.1285e-05, 'epoch': 0.66} 17%|█▋ | 1747/10000 [6:21:00<29:41:14, 12.95s/it] 17%|█▋ | 1748/10000 [6:21:12<29:41:56, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.1280000000000005e-05, 'epoch': 0.66} 17%|█▋ | 1748/10000 [6:21:13<29:41:56, 12.96s/it] 17%|█▋ | 1749/10000 [6:21:25<29:40:48, 12.95s/it] {'loss': 0.0076, 'learning_rate': 4.1275e-05, 'epoch': 0.66} 17%|█▋ | 1749/10000 [6:21:25<29:40:48, 12.95s/it] 18%|█▊ | 1750/10000 [6:21:38<29:38:59, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.127e-05, 'epoch': 0.66} 18%|█▊ | 1750/10000 [6:21:38<29:38:59, 12.94s/it] 18%|█▊ | 1751/10000 [6:21:51<29:42:04, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.1265000000000006e-05, 'epoch': 0.66} 18%|█▊ | 1751/10000 [6:21:51<29:42:04, 12.96s/it] 18%|█▊ | 1752/10000 [6:22:04<29:43:06, 12.97s/it] {'loss': 0.0068, 'learning_rate': 4.126e-05, 'epoch': 0.66} 18%|█▊ | 1752/10000 [6:22:04<29:43:06, 12.97s/it] 18%|█▊ | 1753/10000 [6:22:17<29:42:34, 
12.97s/it] {'loss': 0.0066, 'learning_rate': 4.1255e-05, 'epoch': 0.66} 18%|█▊ | 1753/10000 [6:22:17<29:42:34, 12.97s/it] 18%|█▊ | 1754/10000 [6:22:30<29:39:51, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.125e-05, 'epoch': 0.66} 18%|█▊ | 1754/10000 [6:22:30<29:39:51, 12.95s/it] 18%|█▊ | 1755/10000 [6:22:43<29:37:31, 12.94s/it] {'loss': 0.0063, 'learning_rate': 4.1245e-05, 'epoch': 0.66} 18%|█▊ | 1755/10000 [6:22:43<29:37:31, 12.94s/it] 18%|█▊ | 1756/10000 [6:22:56<29:33:24, 12.91s/it] {'loss': 0.0067, 'learning_rate': 4.124e-05, 'epoch': 0.66} 18%|█▊ | 1756/10000 [6:22:56<29:33:24, 12.91s/it] 18%|█▊ | 1757/10000 [6:23:09<29:36:41, 12.93s/it] {'loss': 0.0061, 'learning_rate': 4.1235e-05, 'epoch': 0.66} 18%|█▊ | 1757/10000 [6:23:09<29:36:41, 12.93s/it] 18%|█▊ | 1758/10000 [6:23:22<29:33:10, 12.91s/it] {'loss': 0.0049, 'learning_rate': 4.123e-05, 'epoch': 0.66} 18%|█▊ | 1758/10000 [6:23:22<29:33:10, 12.91s/it] 18%|█▊ | 1759/10000 [6:23:35<29:31:43, 12.90s/it] {'loss': 0.0065, 'learning_rate': 4.1225e-05, 'epoch': 0.66} 18%|█▊ | 1759/10000 [6:23:35<29:31:43, 12.90s/it] 18%|█▊ | 1760/10000 [6:23:48<29:31:56, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.122e-05, 'epoch': 0.66} 18%|█▊ | 1760/10000 [6:23:48<29:31:56, 12.90s/it] 18%|█▊ | 1761/10000 [6:24:01<29:32:36, 12.91s/it] {'loss': 0.0108, 'learning_rate': 4.1215000000000004e-05, 'epoch': 0.66} 18%|█▊ | 1761/10000 [6:24:01<29:32:36, 12.91s/it] 18%|█▊ | 1762/10000 [6:24:13<29:32:21, 12.91s/it] {'loss': 0.007, 'learning_rate': 4.121000000000001e-05, 'epoch': 0.66} 18%|█▊ | 1762/10000 [6:24:13<29:32:21, 12.91s/it] 18%|█▊ | 1763/10000 [6:24:26<29:30:57, 12.90s/it] {'loss': 0.0063, 'learning_rate': 4.1205e-05, 'epoch': 0.66} 18%|█▊ | 1763/10000 [6:24:26<29:30:57, 12.90s/it] 18%|█▊ | 1764/10000 [6:24:39<29:30:35, 12.90s/it] {'loss': 0.0055, 'learning_rate': 4.12e-05, 'epoch': 0.66} 18%|█▊ | 1764/10000 [6:24:39<29:30:35, 12.90s/it] 18%|█▊ | 1765/10000 [6:24:52<29:31:43, 12.91s/it] {'loss': 0.0076, 'learning_rate': 
4.1195e-05, 'epoch': 0.67} 18%|█▊ | 1765/10000 [6:24:52<29:31:43, 12.91s/it] 18%|█▊ | 1766/10000 [6:25:05<29:32:28, 12.92s/it] {'loss': 0.0062, 'learning_rate': 4.1190000000000004e-05, 'epoch': 0.67} 18%|█▊ | 1766/10000 [6:25:05<29:32:28, 12.92s/it] 18%|█▊ | 1767/10000 [6:25:18<29:32:35, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.1185e-05, 'epoch': 0.67} 18%|█▊ | 1767/10000 [6:25:18<29:32:35, 12.92s/it] 18%|█▊ | 1768/10000 [6:25:31<29:32:35, 12.92s/it] {'loss': 0.0066, 'learning_rate': 4.118e-05, 'epoch': 0.67} 18%|█▊ | 1768/10000 [6:25:31<29:32:35, 12.92s/it] 18%|█▊ | 1769/10000 [6:25:44<29:31:56, 12.92s/it] {'loss': 0.0061, 'learning_rate': 4.1175000000000005e-05, 'epoch': 0.67} 18%|█▊ | 1769/10000 [6:25:44<29:31:56, 12.92s/it] 18%|█▊ | 1770/10000 [6:25:57<29:30:25, 12.91s/it] {'loss': 0.0063, 'learning_rate': 4.117e-05, 'epoch': 0.67} 18%|█▊ | 1770/10000 [6:25:57<29:30:25, 12.91s/it] 18%|█▊ | 1771/10000 [6:26:10<29:32:35, 12.92s/it] {'loss': 0.0066, 'learning_rate': 4.1165e-05, 'epoch': 0.67} 18%|█▊ | 1771/10000 [6:26:10<29:32:35, 12.92s/it] 18%|█▊ | 1772/10000 [6:26:23<29:29:27, 12.90s/it] {'loss': 0.0059, 'learning_rate': 4.1160000000000006e-05, 'epoch': 0.67} 18%|█▊ | 1772/10000 [6:26:23<29:29:27, 12.90s/it] 18%|█▊ | 1773/10000 [6:26:35<29:26:29, 12.88s/it] {'loss': 0.0061, 'learning_rate': 4.1155e-05, 'epoch': 0.67} 18%|█▊ | 1773/10000 [6:26:35<29:26:29, 12.88s/it] 18%|█▊ | 1774/10000 [6:26:48<29:24:53, 12.87s/it] {'loss': 0.0056, 'learning_rate': 4.115e-05, 'epoch': 0.67} 18%|█▊ | 1774/10000 [6:26:48<29:24:53, 12.87s/it] 18%|█▊ | 1775/10000 [6:27:01<29:24:00, 12.87s/it] {'loss': 0.0081, 'learning_rate': 4.1145e-05, 'epoch': 0.67} 18%|█▊ | 1775/10000 [6:27:01<29:24:00, 12.87s/it] 18%|█▊ | 1776/10000 [6:27:14<29:26:21, 12.89s/it] {'loss': 0.0053, 'learning_rate': 4.114e-05, 'epoch': 0.67} 18%|█▊ | 1776/10000 [6:27:14<29:26:21, 12.89s/it] 18%|█▊ | 1777/10000 [6:27:27<29:26:39, 12.89s/it] {'loss': 0.0067, 'learning_rate': 4.1135e-05, 'epoch': 0.67} 18%|█▊ | 
1777/10000 [6:27:27<29:26:39, 12.89s/it] 18%|█▊ | 1778/10000 [6:27:40<29:25:18, 12.88s/it] {'loss': 0.0055, 'learning_rate': 4.113e-05, 'epoch': 0.67} 18%|█▊ | 1778/10000 [6:27:40<29:25:18, 12.88s/it] 18%|█▊ | 1779/10000 [6:27:53<29:27:09, 12.90s/it] {'loss': 0.0054, 'learning_rate': 4.1125000000000004e-05, 'epoch': 0.67} 18%|█▊ | 1779/10000 [6:27:53<29:27:09, 12.90s/it] 18%|█▊ | 1780/10000 [6:28:06<29:24:11, 12.88s/it] {'loss': 0.0088, 'learning_rate': 4.1120000000000006e-05, 'epoch': 0.67} 18%|█▊ | 1780/10000 [6:28:06<29:24:11, 12.88s/it] 18%|█▊ | 1781/10000 [6:28:18<29:25:27, 12.89s/it] {'loss': 0.0055, 'learning_rate': 4.1115e-05, 'epoch': 0.67} 18%|█▊ | 1781/10000 [6:28:18<29:25:27, 12.89s/it] 18%|█▊ | 1782/10000 [6:28:31<29:25:30, 12.89s/it] {'loss': 0.0079, 'learning_rate': 4.1110000000000005e-05, 'epoch': 0.67} 18%|█▊ | 1782/10000 [6:28:31<29:25:30, 12.89s/it] 18%|█▊ | 1783/10000 [6:28:44<29:23:48, 12.88s/it] {'loss': 0.0061, 'learning_rate': 4.110500000000001e-05, 'epoch': 0.67} 18%|█▊ | 1783/10000 [6:28:44<29:23:48, 12.88s/it] 18%|█▊ | 1784/10000 [6:28:57<29:25:26, 12.89s/it] {'loss': 0.0063, 'learning_rate': 4.11e-05, 'epoch': 0.67} 18%|█▊ | 1784/10000 [6:28:57<29:25:26, 12.89s/it] 18%|█▊ | 1785/10000 [6:29:10<29:22:06, 12.87s/it] {'loss': 0.0076, 'learning_rate': 4.1095e-05, 'epoch': 0.67} 18%|█▊ | 1785/10000 [6:29:10<29:22:06, 12.87s/it] 18%|█▊ | 1786/10000 [6:29:23<29:23:33, 12.88s/it] {'loss': 0.0049, 'learning_rate': 4.109e-05, 'epoch': 0.67} 18%|█▊ | 1786/10000 [6:29:23<29:23:33, 12.88s/it] 18%|█▊ | 1787/10000 [6:29:36<29:26:07, 12.90s/it] {'loss': 0.0053, 'learning_rate': 4.1085000000000004e-05, 'epoch': 0.67} 18%|█▊ | 1787/10000 [6:29:36<29:26:07, 12.90s/it] 18%|█▊ | 1788/10000 [6:29:49<29:23:16, 12.88s/it] {'loss': 0.0065, 'learning_rate': 4.108e-05, 'epoch': 0.67} 18%|█▊ | 1788/10000 [6:29:49<29:23:16, 12.88s/it] 18%|█▊ | 1789/10000 [6:30:02<29:24:44, 12.90s/it] {'loss': 0.0058, 'learning_rate': 4.1075e-05, 'epoch': 0.67} 18%|█▊ | 1789/10000 
[6:30:02<29:24:44, 12.90s/it] 18%|█▊ | 1790/10000 [6:30:14<29:22:33, 12.88s/it] {'loss': 0.0074, 'learning_rate': 4.1070000000000005e-05, 'epoch': 0.67} 18%|█▊ | 1790/10000 [6:30:14<29:22:33, 12.88s/it] 18%|█▊ | 1791/10000 [6:30:27<29:23:57, 12.89s/it] {'loss': 0.0048, 'learning_rate': 4.1065e-05, 'epoch': 0.67} 18%|█▊ | 1791/10000 [6:30:27<29:23:57, 12.89s/it] 18%|█▊ | 1792/10000 [6:30:40<29:24:41, 12.90s/it] {'loss': 0.0057, 'learning_rate': 4.106e-05, 'epoch': 0.68} 18%|█▊ | 1792/10000 [6:30:40<29:24:41, 12.90s/it] 18%|█▊ | 1793/10000 [6:30:53<29:24:45, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.1055000000000006e-05, 'epoch': 0.68} 18%|█▊ | 1793/10000 [6:30:53<29:24:45, 12.90s/it] 18%|█▊ | 1794/10000 [6:31:06<29:23:32, 12.89s/it] {'loss': 0.0064, 'learning_rate': 4.105e-05, 'epoch': 0.68} 18%|█▊ | 1794/10000 [6:31:06<29:23:32, 12.89s/it] 18%|█▊ | 1795/10000 [6:31:19<29:21:57, 12.88s/it] {'loss': 0.0054, 'learning_rate': 4.1045e-05, 'epoch': 0.68} 18%|█▊ | 1795/10000 [6:31:19<29:21:57, 12.88s/it] 18%|█▊ | 1796/10000 [6:31:32<29:21:31, 12.88s/it] {'loss': 0.0065, 'learning_rate': 4.104e-05, 'epoch': 0.68} 18%|█▊ | 1796/10000 [6:31:32<29:21:31, 12.88s/it] 18%|█▊ | 1797/10000 [6:31:45<29:20:05, 12.87s/it] {'loss': 0.0071, 'learning_rate': 4.1035e-05, 'epoch': 0.68} 18%|█▊ | 1797/10000 [6:31:45<29:20:05, 12.87s/it] 18%|█▊ | 1798/10000 [6:31:58<29:21:21, 12.88s/it] {'loss': 0.0092, 'learning_rate': 4.103e-05, 'epoch': 0.68} 18%|█▊ | 1798/10000 [6:31:58<29:21:21, 12.88s/it] 18%|█▊ | 1799/10000 [6:32:11<29:26:02, 12.92s/it] {'loss': 0.0057, 'learning_rate': 4.1025e-05, 'epoch': 0.68} 18%|█▊ | 1799/10000 [6:32:11<29:26:02, 12.92s/it] 18%|█▊ | 1800/10000 [6:32:23<29:23:56, 12.91s/it] {'loss': 0.0073, 'learning_rate': 4.1020000000000004e-05, 'epoch': 0.68} 18%|█▊ | 1800/10000 [6:32:23<29:23:56, 12.91s/it] 18%|█▊ | 1801/10000 [6:32:36<29:20:37, 12.88s/it] {'loss': 0.0063, 'learning_rate': 4.1015000000000006e-05, 'epoch': 0.68} 18%|█▊ | 1801/10000 [6:32:36<29:20:37, 
12.88s/it] 18%|█▊ | 1802/10000 [6:32:49<29:20:55, 12.89s/it] {'loss': 0.0066, 'learning_rate': 4.101e-05, 'epoch': 0.68} 18%|█▊ | 1802/10000 [6:32:49<29:20:55, 12.89s/it] 18%|█▊ | 1803/10000 [6:33:02<29:21:06, 12.89s/it] {'loss': 0.0052, 'learning_rate': 4.1005000000000005e-05, 'epoch': 0.68} 18%|█▊ | 1803/10000 [6:33:02<29:21:06, 12.89s/it] 18%|█▊ | 1804/10000 [6:33:15<29:20:19, 12.89s/it] {'loss': 0.008, 'learning_rate': 4.1e-05, 'epoch': 0.68} 18%|█▊ | 1804/10000 [6:33:15<29:20:19, 12.89s/it] 18%|█▊ | 1805/10000 [6:33:28<29:19:36, 12.88s/it] {'loss': 0.0056, 'learning_rate': 4.0995e-05, 'epoch': 0.68} 18%|█▊ | 1805/10000 [6:33:28<29:19:36, 12.88s/it] 18%|█▊ | 1806/10000 [6:33:41<29:21:21, 12.90s/it] {'loss': 0.0076, 'learning_rate': 4.099e-05, 'epoch': 0.68} 18%|█▊ | 1806/10000 [6:33:41<29:21:21, 12.90s/it] 18%|█▊ | 1807/10000 [6:33:54<29:20:14, 12.89s/it] {'loss': 0.0056, 'learning_rate': 4.0985e-05, 'epoch': 0.68} 18%|█▊ | 1807/10000 [6:33:54<29:20:14, 12.89s/it] 18%|█▊ | 1808/10000 [6:34:06<29:20:53, 12.90s/it] {'loss': 0.0067, 'learning_rate': 4.0980000000000004e-05, 'epoch': 0.68} 18%|█▊ | 1808/10000 [6:34:07<29:20:53, 12.90s/it] 18%|█▊ | 1809/10000 [6:34:19<29:20:30, 12.90s/it] {'loss': 0.0076, 'learning_rate': 4.0975e-05, 'epoch': 0.68} 18%|█▊ | 1809/10000 [6:34:19<29:20:30, 12.90s/it] 18%|█▊ | 1810/10000 [6:34:32<29:22:14, 12.91s/it] {'loss': 0.0065, 'learning_rate': 4.097e-05, 'epoch': 0.68} 18%|█▊ | 1810/10000 [6:34:32<29:22:14, 12.91s/it] 18%|█▊ | 1811/10000 [6:34:45<29:24:44, 12.93s/it] {'loss': 0.0062, 'learning_rate': 4.0965000000000005e-05, 'epoch': 0.68} 18%|█▊ | 1811/10000 [6:34:45<29:24:44, 12.93s/it] 18%|█▊ | 1812/10000 [6:34:58<29:19:53, 12.90s/it] {'loss': 0.0066, 'learning_rate': 4.096e-05, 'epoch': 0.68} 18%|█▊ | 1812/10000 [6:34:58<29:19:53, 12.90s/it] 18%|█▊ | 1813/10000 [6:35:11<29:19:39, 12.90s/it] {'loss': 0.0071, 'learning_rate': 4.0955000000000003e-05, 'epoch': 0.68} 18%|█▊ | 1813/10000 [6:35:11<29:19:39, 12.90s/it] 18%|█▊ | 
1814/10000 [6:35:24<29:23:51, 12.93s/it] {'loss': 0.0067, 'learning_rate': 4.095e-05, 'epoch': 0.68} 18%|█▊ | 1814/10000 [6:35:24<29:23:51, 12.93s/it] 18%|█▊ | 1815/10000 [6:35:37<29:24:07, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.0945e-05, 'epoch': 0.68} 18%|█▊ | 1815/10000 [6:35:37<29:24:07, 12.93s/it] 18%|█▊ | 1816/10000 [6:35:50<29:25:12, 12.94s/it] {'loss': 0.005, 'learning_rate': 4.094e-05, 'epoch': 0.68} 18%|█▊ | 1816/10000 [6:35:50<29:25:12, 12.94s/it] 18%|█▊ | 1817/10000 [6:36:03<29:22:49, 12.93s/it] {'loss': 0.0057, 'learning_rate': 4.0935e-05, 'epoch': 0.68} 18%|█▊ | 1817/10000 [6:36:03<29:22:49, 12.93s/it] 18%|█▊ | 1818/10000 [6:36:16<29:22:13, 12.92s/it] {'loss': 0.0053, 'learning_rate': 4.093e-05, 'epoch': 0.69} 18%|█▊ | 1818/10000 [6:36:16<29:22:13, 12.92s/it] 18%|█▊ | 1819/10000 [6:36:29<29:25:03, 12.95s/it] {'loss': 0.0067, 'learning_rate': 4.0925000000000005e-05, 'epoch': 0.69} 18%|█▊ | 1819/10000 [6:36:29<29:25:03, 12.95s/it] 18%|█▊ | 1820/10000 [6:36:42<29:25:11, 12.95s/it] {'loss': 0.0053, 'learning_rate': 4.092e-05, 'epoch': 0.69} 18%|█▊ | 1820/10000 [6:36:42<29:25:11, 12.95s/it] 18%|█▊ | 1821/10000 [6:36:55<29:26:50, 12.96s/it] {'loss': 0.0058, 'learning_rate': 4.0915000000000004e-05, 'epoch': 0.69} 18%|█▊ | 1821/10000 [6:36:55<29:26:50, 12.96s/it] 18%|█▊ | 1822/10000 [6:37:08<29:25:44, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.0910000000000006e-05, 'epoch': 0.69} 18%|█▊ | 1822/10000 [6:37:08<29:25:44, 12.95s/it] 18%|█▊ | 1823/10000 [6:37:21<29:24:20, 12.95s/it] {'loss': 0.0234, 'learning_rate': 4.0905e-05, 'epoch': 0.69} 18%|█▊ | 1823/10000 [6:37:21<29:24:20, 12.95s/it] 18%|█▊ | 1824/10000 [6:37:33<29:22:59, 12.94s/it] {'loss': 0.0075, 'learning_rate': 4.09e-05, 'epoch': 0.69} 18%|█▊ | 1824/10000 [6:37:33<29:22:59, 12.94s/it] 18%|█▊ | 1825/10000 [6:37:46<29:26:47, 12.97s/it] {'loss': 0.0067, 'learning_rate': 4.0895e-05, 'epoch': 0.69} 18%|█▊ | 1825/10000 [6:37:47<29:26:47, 12.97s/it] 18%|█▊ | 1826/10000 [6:37:59<29:25:20, 
12.96s/it] {'loss': 0.008, 'learning_rate': 4.089e-05, 'epoch': 0.69} 18%|█▊ | 1826/10000 [6:37:59<29:25:20, 12.96s/it] 18%|█▊ | 1827/10000 [6:38:12<29:25:39, 12.96s/it] {'loss': 0.0057, 'learning_rate': 4.0885e-05, 'epoch': 0.69} 18%|█▊ | 1827/10000 [6:38:12<29:25:39, 12.96s/it] 18%|█▊ | 1828/10000 [6:38:25<29:21:54, 12.94s/it] {'loss': 0.0073, 'learning_rate': 4.088e-05, 'epoch': 0.69} 18%|█▊ | 1828/10000 [6:38:25<29:21:54, 12.94s/it] 18%|█▊ | 1829/10000 [6:38:38<29:19:43, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.0875000000000004e-05, 'epoch': 0.69} 18%|█▊ | 1829/10000 [6:38:38<29:19:43, 12.92s/it] 18%|█▊ | 1830/10000 [6:38:51<29:18:06, 12.91s/it] {'loss': 0.0061, 'learning_rate': 4.087e-05, 'epoch': 0.69} 18%|█▊ | 1830/10000 [6:38:51<29:18:06, 12.91s/it] 18%|█▊ | 1831/10000 [6:39:04<29:19:47, 12.93s/it] {'loss': 0.0053, 'learning_rate': 4.0865e-05, 'epoch': 0.69} 18%|█▊ | 1831/10000 [6:39:04<29:19:47, 12.93s/it] 18%|█▊ | 1832/10000 [6:39:17<29:19:31, 12.92s/it] {'loss': 0.0059, 'learning_rate': 4.0860000000000005e-05, 'epoch': 0.69} 18%|█▊ | 1832/10000 [6:39:17<29:19:31, 12.92s/it] 18%|█▊ | 1833/10000 [6:39:30<29:15:36, 12.90s/it] {'loss': 0.0053, 'learning_rate': 4.0855e-05, 'epoch': 0.69} 18%|█▊ | 1833/10000 [6:39:30<29:15:36, 12.90s/it] 18%|█▊ | 1834/10000 [6:39:43<29:11:16, 12.87s/it] {'loss': 0.007, 'learning_rate': 4.085e-05, 'epoch': 0.69} 18%|█▊ | 1834/10000 [6:39:43<29:11:16, 12.87s/it] 18%|█▊ | 1835/10000 [6:39:55<29:12:01, 12.87s/it] {'loss': 0.0056, 'learning_rate': 4.0845e-05, 'epoch': 0.69} 18%|█▊ | 1835/10000 [6:39:55<29:12:01, 12.87s/it] 18%|█▊ | 1836/10000 [6:40:08<29:13:10, 12.88s/it] {'loss': 0.0047, 'learning_rate': 4.084e-05, 'epoch': 0.69} 18%|█▊ | 1836/10000 [6:40:08<29:13:10, 12.88s/it] 18%|█▊ | 1837/10000 [6:40:21<29:14:24, 12.90s/it] {'loss': 0.0063, 'learning_rate': 4.0835e-05, 'epoch': 0.69} 18%|█▊ | 1837/10000 [6:40:21<29:14:24, 12.90s/it] 18%|█▊ | 1838/10000 [6:40:34<29:14:52, 12.90s/it] {'loss': 0.0063, 'learning_rate': 
4.083e-05, 'epoch': 0.69} 18%|█▊ | 1838/10000 [6:40:34<29:14:52, 12.90s/it] 18%|█▊ | 1839/10000 [6:40:47<29:19:23, 12.94s/it] {'loss': 0.0084, 'learning_rate': 4.0825e-05, 'epoch': 0.69} 18%|█▊ | 1839/10000 [6:40:47<29:19:23, 12.94s/it] 18%|█▊ | 1840/10000 [6:41:00<29:18:42, 12.93s/it] {'loss': 0.0055, 'learning_rate': 4.0820000000000006e-05, 'epoch': 0.69} 18%|█▊ | 1840/10000 [6:41:00<29:18:42, 12.93s/it] 18%|█▊ | 1841/10000 [6:41:13<29:17:02, 12.92s/it] {'loss': 0.0073, 'learning_rate': 4.0815e-05, 'epoch': 0.69} 18%|█▊ | 1841/10000 [6:41:13<29:17:02, 12.92s/it] 18%|█▊ | 1842/10000 [6:41:26<29:17:13, 12.92s/it] {'loss': 0.0044, 'learning_rate': 4.0810000000000004e-05, 'epoch': 0.69} 18%|█▊ | 1842/10000 [6:41:26<29:17:13, 12.92s/it] 18%|█▊ | 1843/10000 [6:41:39<29:18:28, 12.93s/it] {'loss': 0.0059, 'learning_rate': 4.0805000000000007e-05, 'epoch': 0.69} 18%|█▊ | 1843/10000 [6:41:39<29:18:28, 12.93s/it] 18%|█▊ | 1844/10000 [6:41:52<29:17:32, 12.93s/it] {'loss': 0.0051, 'learning_rate': 4.08e-05, 'epoch': 0.69} 18%|█▊ | 1844/10000 [6:41:52<29:17:32, 12.93s/it] 18%|█▊ | 1845/10000 [6:42:05<29:18:14, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.0795e-05, 'epoch': 0.7} 18%|█▊ | 1845/10000 [6:42:05<29:18:14, 12.94s/it] 18%|█▊ | 1846/10000 [6:42:18<29:19:29, 12.95s/it] {'loss': 0.0064, 'learning_rate': 4.079e-05, 'epoch': 0.7} 18%|█▊ | 1846/10000 [6:42:18<29:19:29, 12.95s/it] 18%|█▊ | 1847/10000 [6:42:31<29:18:20, 12.94s/it] {'loss': 0.0059, 'learning_rate': 4.0785e-05, 'epoch': 0.7} 18%|█▊ | 1847/10000 [6:42:31<29:18:20, 12.94s/it] 18%|█▊ | 1848/10000 [6:42:44<29:16:19, 12.93s/it] {'loss': 0.0073, 'learning_rate': 4.078e-05, 'epoch': 0.7} 18%|█▊ | 1848/10000 [6:42:44<29:16:19, 12.93s/it] 18%|█▊ | 1849/10000 [6:42:56<29:15:38, 12.92s/it] {'loss': 0.006, 'learning_rate': 4.0775e-05, 'epoch': 0.7} 18%|█▊ | 1849/10000 [6:42:57<29:15:38, 12.92s/it] 18%|█▊ | 1850/10000 [6:43:09<29:15:09, 12.92s/it] {'loss': 0.0055, 'learning_rate': 4.0770000000000004e-05, 'epoch': 0.7} 
18%|█▊ | 1850/10000 [6:43:09<29:15:09, 12.92s/it] 19%|█▊ | 1851/10000 [6:43:22<29:17:29, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.0765e-05, 'epoch': 0.7} 19%|█▊ | 1851/10000 [6:43:22<29:17:29, 12.94s/it] 19%|█▊ | 1852/10000 [6:43:35<29:14:30, 12.92s/it] {'loss': 0.0065, 'learning_rate': 4.076e-05, 'epoch': 0.7} 19%|█▊ | 1852/10000 [6:43:35<29:14:30, 12.92s/it] 19%|█▊ | 1853/10000 [6:43:48<29:15:20, 12.93s/it] {'loss': 0.0065, 'learning_rate': 4.0755000000000005e-05, 'epoch': 0.7} 19%|█▊ | 1853/10000 [6:43:48<29:15:20, 12.93s/it] 19%|█▊ | 1854/10000 [6:44:01<29:15:57, 12.93s/it] {'loss': 0.005, 'learning_rate': 4.075e-05, 'epoch': 0.7} 19%|█▊ | 1854/10000 [6:44:01<29:15:57, 12.93s/it] 19%|█▊ | 1855/10000 [6:44:14<29:14:55, 12.93s/it] {'loss': 0.0065, 'learning_rate': 4.0745e-05, 'epoch': 0.7} 19%|█▊ | 1855/10000 [6:44:14<29:14:55, 12.93s/it] 19%|█▊ | 1856/10000 [6:44:27<29:12:51, 12.91s/it] {'loss': 0.0053, 'learning_rate': 4.074e-05, 'epoch': 0.7} 19%|█▊ | 1856/10000 [6:44:27<29:12:51, 12.91s/it] 19%|█▊ | 1857/10000 [6:44:40<29:13:06, 12.92s/it] {'loss': 0.0066, 'learning_rate': 4.0735e-05, 'epoch': 0.7} 19%|█▊ | 1857/10000 [6:44:40<29:13:06, 12.92s/it] 19%|█▊ | 1858/10000 [6:44:53<29:16:13, 12.94s/it] {'loss': 0.0057, 'learning_rate': 4.0730000000000005e-05, 'epoch': 0.7} 19%|█▊ | 1858/10000 [6:44:53<29:16:13, 12.94s/it] 19%|█▊ | 1859/10000 [6:45:06<29:16:28, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.0725e-05, 'epoch': 0.7} 19%|█▊ | 1859/10000 [6:45:06<29:16:28, 12.95s/it] 19%|█▊ | 1860/10000 [6:45:19<29:20:17, 12.98s/it] {'loss': 0.0056, 'learning_rate': 4.072e-05, 'epoch': 0.7} 19%|█▊ | 1860/10000 [6:45:19<29:20:17, 12.98s/it] 19%|█▊ | 1861/10000 [6:45:32<29:20:36, 12.98s/it] {'loss': 0.0041, 'learning_rate': 4.0715000000000006e-05, 'epoch': 0.7} 19%|█▊ | 1861/10000 [6:45:32<29:20:36, 12.98s/it] 19%|█▊ | 1862/10000 [6:45:45<29:16:59, 12.95s/it] {'loss': 0.0074, 'learning_rate': 4.071e-05, 'epoch': 0.7} 19%|█▊ | 1862/10000 [6:45:45<29:16:59, 12.95s/it] 
19%|█▊ | 1863/10000 [6:45:58<29:16:13, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.0705000000000004e-05, 'epoch': 0.7} 19%|█▊ | 1863/10000 [6:45:58<29:16:13, 12.95s/it] 19%|█▊ | 1864/10000 [6:46:11<29:14:35, 12.94s/it] {'loss': 0.0047, 'learning_rate': 4.07e-05, 'epoch': 0.7} 19%|█▊ | 1864/10000 [6:46:11<29:14:35, 12.94s/it] 19%|█▊ | 1865/10000 [6:46:24<29:14:49, 12.94s/it] {'loss': 0.0042, 'learning_rate': 4.0695e-05, 'epoch': 0.7} 19%|█▊ | 1865/10000 [6:46:24<29:14:49, 12.94s/it] 19%|█▊ | 1866/10000 [6:46:36<29:13:17, 12.93s/it] {'loss': 0.0079, 'learning_rate': 4.069e-05, 'epoch': 0.7} 19%|█▊ | 1866/10000 [6:46:36<29:13:17, 12.93s/it] 19%|█▊ | 1867/10000 [6:46:49<29:12:44, 12.93s/it] {'loss': 0.0057, 'learning_rate': 4.0685e-05, 'epoch': 0.7} 19%|█▊ | 1867/10000 [6:46:49<29:12:44, 12.93s/it] 19%|█▊ | 1868/10000 [6:47:02<29:09:37, 12.91s/it] {'loss': 0.0052, 'learning_rate': 4.0680000000000004e-05, 'epoch': 0.7} 19%|█▊ | 1868/10000 [6:47:02<29:09:37, 12.91s/it] 19%|█▊ | 1869/10000 [6:47:15<29:07:27, 12.89s/it] {'loss': 0.0052, 'learning_rate': 4.0675e-05, 'epoch': 0.7} 19%|█▊ | 1869/10000 [6:47:15<29:07:27, 12.89s/it] 19%|█▊ | 1870/10000 [6:47:28<29:09:34, 12.91s/it] {'loss': 0.0054, 'learning_rate': 4.067e-05, 'epoch': 0.7} 19%|█▊ | 1870/10000 [6:47:28<29:09:34, 12.91s/it] 19%|█▊ | 1871/10000 [6:47:41<29:09:07, 12.91s/it] {'loss': 0.007, 'learning_rate': 4.0665000000000005e-05, 'epoch': 0.7} 19%|█▊ | 1871/10000 [6:47:41<29:09:07, 12.91s/it] 19%|█▊ | 1872/10000 [6:47:54<29:10:27, 12.92s/it] {'loss': 0.0062, 'learning_rate': 4.066e-05, 'epoch': 0.71} 19%|█▊ | 1872/10000 [6:47:54<29:10:27, 12.92s/it] 19%|█▊ | 1873/10000 [6:48:07<29:12:45, 12.94s/it] {'loss': 0.0083, 'learning_rate': 4.0655e-05, 'epoch': 0.71} 19%|█▊ | 1873/10000 [6:48:07<29:12:45, 12.94s/it] 19%|█▊ | 1874/10000 [6:48:20<29:11:13, 12.93s/it] {'loss': 0.0088, 'learning_rate': 4.065e-05, 'epoch': 0.71} 19%|█▊ | 1874/10000 [6:48:20<29:11:13, 12.93s/it] 19%|█▉ | 1875/10000 [6:48:33<29:08:00, 
12.91s/it] {'loss': 0.0074, 'learning_rate': 4.0645e-05, 'epoch': 0.71} 19%|█▉ | 1875/10000 [6:48:33<29:08:00, 12.91s/it] 19%|█▉ | 1876/10000 [6:48:46<29:11:11, 12.93s/it] {'loss': 0.0061, 'learning_rate': 4.064e-05, 'epoch': 0.71} 19%|█▉ | 1876/10000 [6:48:46<29:11:11, 12.93s/it] 19%|█▉ | 1877/10000 [6:48:59<29:10:12, 12.93s/it] {'loss': 0.0054, 'learning_rate': 4.0635e-05, 'epoch': 0.71} 19%|█▉ | 1877/10000 [6:48:59<29:10:12, 12.93s/it] 19%|█▉ | 1878/10000 [6:49:11<29:09:34, 12.92s/it] {'loss': 0.0054, 'learning_rate': 4.063e-05, 'epoch': 0.71} 19%|█▉ | 1878/10000 [6:49:12<29:09:34, 12.92s/it] 19%|█▉ | 1879/10000 [6:49:24<29:08:33, 12.92s/it] {'loss': 0.0054, 'learning_rate': 4.0625000000000005e-05, 'epoch': 0.71} 19%|█▉ | 1879/10000 [6:49:24<29:08:33, 12.92s/it] 19%|█▉ | 1880/10000 [6:49:37<29:07:11, 12.91s/it] {'loss': 0.0052, 'learning_rate': 4.062e-05, 'epoch': 0.71} 19%|█▉ | 1880/10000 [6:49:37<29:07:11, 12.91s/it] 19%|█▉ | 1881/10000 [6:49:50<29:07:40, 12.92s/it] {'loss': 0.006, 'learning_rate': 4.0615e-05, 'epoch': 0.71} 19%|█▉ | 1881/10000 [6:49:50<29:07:40, 12.92s/it] 19%|█▉ | 1882/10000 [6:50:03<29:10:05, 12.93s/it] {'loss': 0.0076, 'learning_rate': 4.0610000000000006e-05, 'epoch': 0.71} 19%|█▉ | 1882/10000 [6:50:03<29:10:05, 12.93s/it] 19%|█▉ | 1883/10000 [6:50:16<29:11:04, 12.94s/it] {'loss': 0.0052, 'learning_rate': 4.0605e-05, 'epoch': 0.71} 19%|█▉ | 1883/10000 [6:50:16<29:11:04, 12.94s/it] 19%|█▉ | 1884/10000 [6:50:29<29:08:42, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.0600000000000004e-05, 'epoch': 0.71} 19%|█▉ | 1884/10000 [6:50:29<29:08:42, 12.93s/it] 19%|█▉ | 1885/10000 [6:50:42<29:09:28, 12.94s/it] {'loss': 0.0053, 'learning_rate': 4.0595e-05, 'epoch': 0.71} 19%|█▉ | 1885/10000 [6:50:42<29:09:28, 12.94s/it] 19%|█▉ | 1886/10000 [6:50:55<29:12:26, 12.96s/it] {'loss': 0.0057, 'learning_rate': 4.059e-05, 'epoch': 0.71} 19%|█▉ | 1886/10000 [6:50:55<29:12:26, 12.96s/it] 19%|█▉ | 1887/10000 [6:51:08<29:12:15, 12.96s/it] {'loss': 0.0065, 
'learning_rate': 4.0585e-05, 'epoch': 0.71} 19%|█▉ | 1887/10000 [6:51:08<29:12:15, 12.96s/it] 19%|█▉ | 1888/10000 [6:51:21<29:08:42, 12.93s/it] {'loss': 0.007, 'learning_rate': 4.058e-05, 'epoch': 0.71} 19%|█▉ | 1888/10000 [6:51:21<29:08:42, 12.93s/it] 19%|█▉ | 1889/10000 [6:51:34<29:06:04, 12.92s/it] {'loss': 0.008, 'learning_rate': 4.0575000000000004e-05, 'epoch': 0.71} 19%|█▉ | 1889/10000 [6:51:34<29:06:04, 12.92s/it] 19%|█▉ | 1890/10000 [6:51:47<29:03:55, 12.90s/it] {'loss': 0.0056, 'learning_rate': 4.057e-05, 'epoch': 0.71} 19%|█▉ | 1890/10000 [6:51:47<29:03:55, 12.90s/it] 19%|█▉ | 1891/10000 [6:52:00<29:08:01, 12.93s/it] {'loss': 0.0072, 'learning_rate': 4.0565e-05, 'epoch': 0.71} 19%|█▉ | 1891/10000 [6:52:00<29:08:01, 12.93s/it] 19%|█▉ | 1892/10000 [6:52:13<29:09:01, 12.94s/it] {'loss': 0.0088, 'learning_rate': 4.0560000000000005e-05, 'epoch': 0.71} 19%|█▉ | 1892/10000 [6:52:13<29:09:01, 12.94s/it] 19%|█▉ | 1893/10000 [6:52:26<29:11:56, 12.97s/it] {'loss': 0.0054, 'learning_rate': 4.055500000000001e-05, 'epoch': 0.71} 19%|█▉ | 1893/10000 [6:52:26<29:11:56, 12.97s/it] 19%|█▉ | 1894/10000 [6:52:39<29:11:39, 12.97s/it] {'loss': 0.0051, 'learning_rate': 4.055e-05, 'epoch': 0.71} 19%|█▉ | 1894/10000 [6:52:39<29:11:39, 12.97s/it] 19%|█▉ | 1895/10000 [6:52:51<29:07:29, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.0545e-05, 'epoch': 0.71} 19%|█▉ | 1895/10000 [6:52:51<29:07:29, 12.94s/it] 19%|█▉ | 1896/10000 [6:53:04<29:04:21, 12.91s/it] {'loss': 0.0049, 'learning_rate': 4.054e-05, 'epoch': 0.71} 19%|█▉ | 1896/10000 [6:53:04<29:04:21, 12.91s/it] 19%|█▉ | 1897/10000 [6:53:17<29:07:22, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.0535000000000004e-05, 'epoch': 0.71} 19%|█▉ | 1897/10000 [6:53:17<29:07:22, 12.94s/it] 19%|█▉ | 1898/10000 [6:53:30<29:06:27, 12.93s/it] {'loss': 0.0064, 'learning_rate': 4.053e-05, 'epoch': 0.72} 19%|█▉ | 1898/10000 [6:53:30<29:06:27, 12.93s/it] 19%|█▉ | 1899/10000 [6:53:43<29:09:16, 12.96s/it] {'loss': 0.0076, 'learning_rate': 
4.0525e-05, 'epoch': 0.72} 19%|█▉ | 1899/10000 [6:53:43<29:09:16, 12.96s/it] 19%|█▉ | 1900/10000 [6:53:56<29:07:44, 12.95s/it] {'loss': 0.006, 'learning_rate': 4.0520000000000005e-05, 'epoch': 0.72} 19%|█▉ | 1900/10000 [6:53:56<29:07:44, 12.95s/it] 19%|█▉ | 1901/10000 [6:54:09<29:08:06, 12.95s/it] {'loss': 0.0061, 'learning_rate': 4.0515e-05, 'epoch': 0.72} 19%|█▉ | 1901/10000 [6:54:09<29:08:06, 12.95s/it] 19%|█▉ | 1902/10000 [6:54:22<29:10:43, 12.97s/it] {'loss': 0.0059, 'learning_rate': 4.0510000000000003e-05, 'epoch': 0.72} 19%|█▉ | 1902/10000 [6:54:22<29:10:43, 12.97s/it] 19%|█▉ | 1903/10000 [6:54:35<29:08:31, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.0505000000000006e-05, 'epoch': 0.72} 19%|█▉ | 1903/10000 [6:54:35<29:08:31, 12.96s/it] 19%|█▉ | 1904/10000 [6:54:48<29:05:18, 12.93s/it] {'loss': 0.0055, 'learning_rate': 4.05e-05, 'epoch': 0.72} 19%|█▉ | 1904/10000 [6:54:48<29:05:18, 12.93s/it] 19%|█▉ | 1905/10000 [6:55:01<29:04:31, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.0495e-05, 'epoch': 0.72} 19%|█▉ | 1905/10000 [6:55:01<29:04:31, 12.93s/it] 19%|█▉ | 1906/10000 [6:55:14<29:04:59, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.049e-05, 'epoch': 0.72} 19%|█▉ | 1906/10000 [6:55:14<29:04:59, 12.94s/it] 19%|█▉ | 1907/10000 [6:55:27<29:04:21, 12.93s/it] {'loss': 0.0071, 'learning_rate': 4.0485e-05, 'epoch': 0.72} 19%|█▉ | 1907/10000 [6:55:27<29:04:21, 12.93s/it] 19%|█▉ | 1908/10000 [6:55:40<29:06:11, 12.95s/it] {'loss': 0.005, 'learning_rate': 4.048e-05, 'epoch': 0.72} 19%|█▉ | 1908/10000 [6:55:40<29:06:11, 12.95s/it] 19%|█▉ | 1909/10000 [6:55:53<29:04:50, 12.94s/it] {'loss': 0.0054, 'learning_rate': 4.0475e-05, 'epoch': 0.72} 19%|█▉ | 1909/10000 [6:55:53<29:04:50, 12.94s/it] 19%|█▉ | 1910/10000 [6:56:06<29:06:20, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.0470000000000004e-05, 'epoch': 0.72} 19%|█▉ | 1910/10000 [6:56:06<29:06:20, 12.95s/it] 19%|█▉ | 1911/10000 [6:56:19<29:04:58, 12.94s/it] {'loss': 0.0055, 'learning_rate': 4.0465e-05, 'epoch': 
0.72} 19%|█▉ | 1911/10000 [6:56:19<29:04:58, 12.94s/it] 19%|█▉ | 1912/10000 [6:56:31<29:04:07, 12.94s/it] {'loss': 0.0056, 'learning_rate': 4.046e-05, 'epoch': 0.72} 19%|█▉ | 1912/10000 [6:56:31<29:04:07, 12.94s/it] 19%|█▉ | 1913/10000 [6:56:44<29:07:08, 12.96s/it] {'loss': 0.0045, 'learning_rate': 4.0455000000000005e-05, 'epoch': 0.72} 19%|█▉ | 1913/10000 [6:56:45<29:07:08, 12.96s/it] 19%|█▉ | 1914/10000 [6:56:57<29:07:19, 12.97s/it] {'loss': 0.0067, 'learning_rate': 4.045000000000001e-05, 'epoch': 0.72} 19%|█▉ | 1914/10000 [6:56:57<29:07:19, 12.97s/it] 19%|█▉ | 1915/10000 [6:57:10<29:09:07, 12.98s/it] {'loss': 0.0048, 'learning_rate': 4.0444999999999996e-05, 'epoch': 0.72} 19%|█▉ | 1915/10000 [6:57:11<29:09:07, 12.98s/it] 19%|█▉ | 1916/10000 [6:57:23<29:04:49, 12.95s/it] {'loss': 0.0068, 'learning_rate': 4.044e-05, 'epoch': 0.72} 19%|█▉ | 1916/10000 [6:57:23<29:04:49, 12.95s/it] 19%|█▉ | 1917/10000 [6:57:36<29:05:12, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.0435e-05, 'epoch': 0.72} 19%|█▉ | 1917/10000 [6:57:36<29:05:12, 12.95s/it] 19%|█▉ | 1918/10000 [6:57:49<29:04:51, 12.95s/it] {'loss': 0.0075, 'learning_rate': 4.0430000000000004e-05, 'epoch': 0.72} 19%|█▉ | 1918/10000 [6:57:49<29:04:51, 12.95s/it] 19%|█▉ | 1919/10000 [6:58:02<29:01:03, 12.93s/it] {'loss': 0.0072, 'learning_rate': 4.0425e-05, 'epoch': 0.72} 19%|█▉ | 1919/10000 [6:58:02<29:01:03, 12.93s/it] 19%|█▉ | 1920/10000 [6:58:15<29:06:11, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.042e-05, 'epoch': 0.72} 19%|█▉ | 1920/10000 [6:58:15<29:06:11, 12.97s/it] 19%|█▉ | 1921/10000 [6:58:28<29:02:15, 12.94s/it] {'loss': 0.0053, 'learning_rate': 4.0415000000000005e-05, 'epoch': 0.72} 19%|█▉ | 1921/10000 [6:58:28<29:02:15, 12.94s/it] 19%|█▉ | 1922/10000 [6:58:41<29:06:47, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.041e-05, 'epoch': 0.72} 19%|█▉ | 1922/10000 [6:58:41<29:06:47, 12.97s/it] 19%|█▉ | 1923/10000 [6:58:54<29:09:47, 13.00s/it] {'loss': 0.0052, 'learning_rate': 4.0405000000000004e-05, 'epoch': 
0.72} 19%|█▉ | 1923/10000 [6:58:54<29:09:47, 13.00s/it] 19%|█▉ | 1924/10000 [6:59:07<29:08:44, 12.99s/it] {'loss': 0.0049, 'learning_rate': 4.0400000000000006e-05, 'epoch': 0.72} 19%|█▉ | 1924/10000 [6:59:07<29:08:44, 12.99s/it] 19%|█▉ | 1925/10000 [6:59:20<29:09:35, 13.00s/it] {'loss': 0.0054, 'learning_rate': 4.0395e-05, 'epoch': 0.73} 19%|█▉ | 1925/10000 [6:59:20<29:09:35, 13.00s/it] 19%|█▉ | 1926/10000 [6:59:33<29:13:17, 13.03s/it] {'loss': 0.0061, 'learning_rate': 4.039e-05, 'epoch': 0.73} 19%|█▉ | 1926/10000 [6:59:33<29:13:17, 13.03s/it] 19%|█▉ | 1927/10000 [6:59:46<29:06:05, 12.98s/it] {'loss': 0.0063, 'learning_rate': 4.0385e-05, 'epoch': 0.73} 19%|█▉ | 1927/10000 [6:59:46<29:06:05, 12.98s/it] 19%|█▉ | 1928/10000 [6:59:59<29:03:40, 12.96s/it] {'loss': 0.0061, 'learning_rate': 4.038e-05, 'epoch': 0.73} 19%|█▉ | 1928/10000 [6:59:59<29:03:40, 12.96s/it] 19%|█▉ | 1929/10000 [7:00:12<29:03:57, 12.96s/it] {'loss': 0.005, 'learning_rate': 4.0375e-05, 'epoch': 0.73} 19%|█▉ | 1929/10000 [7:00:12<29:03:57, 12.96s/it] 19%|█▉ | 1930/10000 [7:00:25<29:05:07, 12.97s/it] {'loss': 0.0049, 'learning_rate': 4.037e-05, 'epoch': 0.73} 19%|█▉ | 1930/10000 [7:00:25<29:05:07, 12.97s/it] 19%|█▉ | 1931/10000 [7:00:38<29:03:03, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.0365000000000004e-05, 'epoch': 0.73} 19%|█▉ | 1931/10000 [7:00:38<29:03:03, 12.96s/it] 19%|█▉ | 1932/10000 [7:00:51<29:00:31, 12.94s/it] {'loss': 0.0059, 'learning_rate': 4.0360000000000007e-05, 'epoch': 0.73} 19%|█▉ | 1932/10000 [7:00:51<29:00:31, 12.94s/it] 19%|█▉ | 1933/10000 [7:01:04<29:04:32, 12.98s/it] {'loss': 0.0058, 'learning_rate': 4.0355e-05, 'epoch': 0.73} 19%|█▉ | 1933/10000 [7:01:04<29:04:32, 12.98s/it] 19%|█▉ | 1934/10000 [7:01:17<29:02:42, 12.96s/it] {'loss': 0.0071, 'learning_rate': 4.0350000000000005e-05, 'epoch': 0.73} 19%|█▉ | 1934/10000 [7:01:17<29:02:42, 12.96s/it] 19%|█▉ | 1935/10000 [7:01:30<29:04:09, 12.98s/it] {'loss': 0.006, 'learning_rate': 4.0345e-05, 'epoch': 0.73} 19%|█▉ | 
1935/10000 [7:01:30<29:04:09, 12.98s/it] 19%|█▉ | 1936/10000 [7:01:43<29:04:20, 12.98s/it] {'loss': 0.0057, 'learning_rate': 4.034e-05, 'epoch': 0.73} 19%|█▉ | 1936/10000 [7:01:43<29:04:20, 12.98s/it] 19%|█▉ | 1937/10000 [7:01:56<29:04:15, 12.98s/it] {'loss': 0.0057, 'learning_rate': 4.0335e-05, 'epoch': 0.73} 19%|█▉ | 1937/10000 [7:01:56<29:04:15, 12.98s/it] 19%|█▉ | 1938/10000 [7:02:09<28:58:51, 12.94s/it] {'loss': 0.0076, 'learning_rate': 4.033e-05, 'epoch': 0.73} 19%|█▉ | 1938/10000 [7:02:09<28:58:51, 12.94s/it] 19%|█▉ | 1939/10000 [7:02:22<28:57:03, 12.93s/it] {'loss': 0.0057, 'learning_rate': 4.0325000000000004e-05, 'epoch': 0.73} 19%|█▉ | 1939/10000 [7:02:22<28:57:03, 12.93s/it] 19%|█▉ | 1940/10000 [7:02:35<28:57:52, 12.94s/it] {'loss': 0.006, 'learning_rate': 4.032e-05, 'epoch': 0.73} 19%|█▉ | 1940/10000 [7:02:35<28:57:52, 12.94s/it] 19%|█▉ | 1941/10000 [7:02:47<28:59:38, 12.95s/it] {'loss': 0.0052, 'learning_rate': 4.0315e-05, 'epoch': 0.73} 19%|█▉ | 1941/10000 [7:02:48<28:59:38, 12.95s/it] 19%|█▉ | 1942/10000 [7:03:00<28:58:06, 12.94s/it] {'loss': 0.0055, 'learning_rate': 4.0310000000000005e-05, 'epoch': 0.73} 19%|█▉ | 1942/10000 [7:03:00<28:58:06, 12.94s/it] 19%|█▉ | 1943/10000 [7:03:13<28:59:52, 12.96s/it] {'loss': 0.0043, 'learning_rate': 4.0305e-05, 'epoch': 0.73} 19%|█▉ | 1943/10000 [7:03:13<28:59:52, 12.96s/it] 19%|█▉ | 1944/10000 [7:03:26<29:01:01, 12.97s/it] {'loss': 0.006, 'learning_rate': 4.0300000000000004e-05, 'epoch': 0.73} 19%|█▉ | 1944/10000 [7:03:26<29:01:01, 12.97s/it] 19%|█▉ | 1945/10000 [7:03:39<29:03:56, 12.99s/it] {'loss': 0.0049, 'learning_rate': 4.0295e-05, 'epoch': 0.73} 19%|█▉ | 1945/10000 [7:03:39<29:03:56, 12.99s/it] 19%|█▉ | 1946/10000 [7:03:52<29:01:15, 12.97s/it] {'loss': 0.0052, 'learning_rate': 4.029e-05, 'epoch': 0.73} 19%|█▉ | 1946/10000 [7:03:52<29:01:15, 12.97s/it] 19%|█▉ | 1947/10000 [7:04:05<28:57:24, 12.94s/it] {'loss': 0.0058, 'learning_rate': 4.0285e-05, 'epoch': 0.73} 19%|█▉ | 1947/10000 [7:04:05<28:57:24, 
12.94s/it] 19%|█▉ | 1948/10000 [7:04:18<29:01:06, 12.97s/it] {'loss': 0.0056, 'learning_rate': 4.028e-05, 'epoch': 0.73} 19%|█▉ | 1948/10000 [7:04:18<29:01:06, 12.97s/it] 19%|█▉ | 1949/10000 [7:04:31<29:02:30, 12.99s/it] {'loss': 0.0059, 'learning_rate': 4.0275e-05, 'epoch': 0.73} 19%|█▉ | 1949/10000 [7:04:31<29:02:30, 12.99s/it] 20%|█▉ | 1950/10000 [7:04:44<29:00:30, 12.97s/it] {'loss': 0.0063, 'learning_rate': 4.027e-05, 'epoch': 0.73} 20%|█▉ | 1950/10000 [7:04:44<29:00:30, 12.97s/it] 20%|█▉ | 1951/10000 [7:04:57<28:59:53, 12.97s/it] {'loss': 0.0054, 'learning_rate': 4.0265e-05, 'epoch': 0.74} 20%|█▉ | 1951/10000 [7:04:57<28:59:53, 12.97s/it] 20%|█▉ | 1952/10000 [7:05:10<28:58:37, 12.96s/it] {'loss': 0.0037, 'learning_rate': 4.0260000000000004e-05, 'epoch': 0.74} 20%|█▉ | 1952/10000 [7:05:10<28:58:37, 12.96s/it] 20%|█▉ | 1953/10000 [7:05:23<28:56:36, 12.95s/it] {'loss': 0.0051, 'learning_rate': 4.025500000000001e-05, 'epoch': 0.74} 20%|█▉ | 1953/10000 [7:05:23<28:56:36, 12.95s/it] 20%|█▉ | 1954/10000 [7:05:36<28:55:08, 12.94s/it] {'loss': 0.0084, 'learning_rate': 4.025e-05, 'epoch': 0.74} 20%|█▉ | 1954/10000 [7:05:36<28:55:08, 12.94s/it] 20%|█▉ | 1955/10000 [7:05:49<28:56:52, 12.95s/it] {'loss': 0.0053, 'learning_rate': 4.0245e-05, 'epoch': 0.74} 20%|█▉ | 1955/10000 [7:05:49<28:56:52, 12.95s/it] 20%|█▉ | 1956/10000 [7:06:02<28:57:59, 12.96s/it] {'loss': 0.0054, 'learning_rate': 4.024e-05, 'epoch': 0.74} 20%|█▉ | 1956/10000 [7:06:02<28:57:59, 12.96s/it] 20%|█▉ | 1957/10000 [7:06:15<28:59:13, 12.97s/it] {'loss': 0.0059, 'learning_rate': 4.0235000000000004e-05, 'epoch': 0.74} 20%|█▉ | 1957/10000 [7:06:15<28:59:13, 12.97s/it] 20%|█▉ | 1958/10000 [7:06:28<28:57:08, 12.96s/it] {'loss': 0.0047, 'learning_rate': 4.023e-05, 'epoch': 0.74} 20%|█▉ | 1958/10000 [7:06:28<28:57:08, 12.96s/it] 20%|█▉ | 1959/10000 [7:06:41<28:57:20, 12.96s/it] {'loss': 0.0059, 'learning_rate': 4.0225e-05, 'epoch': 0.74} 20%|█▉ | 1959/10000 [7:06:41<28:57:20, 12.96s/it] 20%|█▉ | 1960/10000 
[7:06:54<28:58:40, 12.98s/it] {'loss': 0.0054, 'learning_rate': 4.0220000000000005e-05, 'epoch': 0.74} 20%|█▉ | 1960/10000 [7:06:54<28:58:40, 12.98s/it] 20%|█▉ | 1961/10000 [7:07:07<28:57:52, 12.97s/it] {'loss': 0.005, 'learning_rate': 4.0215e-05, 'epoch': 0.74} 20%|█▉ | 1961/10000 [7:07:07<28:57:52, 12.97s/it] 20%|█▉ | 1962/10000 [7:07:20<28:54:29, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.021e-05, 'epoch': 0.74} 20%|█▉ | 1962/10000 [7:07:20<28:54:29, 12.95s/it] 20%|█▉ | 1963/10000 [7:07:33<28:59:56, 12.99s/it] {'loss': 0.0064, 'learning_rate': 4.0205000000000006e-05, 'epoch': 0.74} 20%|█▉ | 1963/10000 [7:07:33<28:59:56, 12.99s/it] 20%|█▉ | 1964/10000 [7:07:46<28:58:25, 12.98s/it] {'loss': 0.0056, 'learning_rate': 4.02e-05, 'epoch': 0.74} 20%|█▉ | 1964/10000 [7:07:46<28:58:25, 12.98s/it] 20%|█▉ | 1965/10000 [7:07:59<28:57:04, 12.97s/it] {'loss': 0.0058, 'learning_rate': 4.0195e-05, 'epoch': 0.74} 20%|█▉ | 1965/10000 [7:07:59<28:57:04, 12.97s/it] 20%|█▉ | 1966/10000 [7:08:12<29:02:06, 13.01s/it] {'loss': 0.0057, 'learning_rate': 4.019e-05, 'epoch': 0.74} 20%|█▉ | 1966/10000 [7:08:12<29:02:06, 13.01s/it] 20%|█▉ | 1967/10000 [7:08:25<29:03:45, 13.02s/it] {'loss': 0.0066, 'learning_rate': 4.0185e-05, 'epoch': 0.74} 20%|█▉ | 1967/10000 [7:08:25<29:03:45, 13.02s/it] 20%|█▉ | 1968/10000 [7:08:38<29:03:34, 13.02s/it] {'loss': 0.0059, 'learning_rate': 4.018e-05, 'epoch': 0.74} 20%|█▉ | 1968/10000 [7:08:38<29:03:34, 13.02s/it] 20%|█▉ | 1969/10000 [7:08:51<29:03:09, 13.02s/it] {'loss': 0.0056, 'learning_rate': 4.0175e-05, 'epoch': 0.74} 20%|█▉ | 1969/10000 [7:08:51<29:03:09, 13.02s/it] 20%|█▉ | 1970/10000 [7:09:04<29:00:15, 13.00s/it] {'loss': 0.0063, 'learning_rate': 4.017e-05, 'epoch': 0.74} 20%|█▉ | 1970/10000 [7:09:04<29:00:15, 13.00s/it] 20%|█▉ | 1971/10000 [7:09:17<29:00:42, 13.01s/it] {'loss': 0.0046, 'learning_rate': 4.0165000000000006e-05, 'epoch': 0.74} 20%|█▉ | 1971/10000 [7:09:17<29:00:42, 13.01s/it] 20%|█▉ | 1972/10000 [7:09:30<28:56:47, 12.98s/it] {'loss': 
0.005, 'learning_rate': 4.016e-05, 'epoch': 0.74} 20%|█▉ | 1972/10000 [7:09:30<28:56:47, 12.98s/it] 20%|█▉ | 1973/10000 [7:09:43<29:01:22, 13.02s/it] {'loss': 0.0061, 'learning_rate': 4.0155000000000004e-05, 'epoch': 0.74} 20%|█▉ | 1973/10000 [7:09:43<29:01:22, 13.02s/it] 20%|█▉ | 1974/10000 [7:09:56<29:03:05, 13.03s/it] {'loss': 0.0056, 'learning_rate': 4.015000000000001e-05, 'epoch': 0.74} 20%|█▉ | 1974/10000 [7:09:56<29:03:05, 13.03s/it] 20%|█▉ | 1975/10000 [7:10:09<29:07:52, 13.07s/it] {'loss': 0.0058, 'learning_rate': 4.0144999999999996e-05, 'epoch': 0.74} 20%|█▉ | 1975/10000 [7:10:09<29:07:52, 13.07s/it] 20%|█▉ | 1976/10000 [7:10:22<29:06:59, 13.06s/it] {'loss': 0.0057, 'learning_rate': 4.014e-05, 'epoch': 0.74} 20%|█▉ | 1976/10000 [7:10:22<29:06:59, 13.06s/it] 20%|█▉ | 1977/10000 [7:10:35<29:04:33, 13.05s/it] {'loss': 0.0046, 'learning_rate': 4.0135e-05, 'epoch': 0.74} 20%|█▉ | 1977/10000 [7:10:35<29:04:33, 13.05s/it] 20%|█▉ | 1978/10000 [7:10:48<29:03:06, 13.04s/it] {'loss': 0.0048, 'learning_rate': 4.0130000000000004e-05, 'epoch': 0.75} 20%|█▉ | 1978/10000 [7:10:48<29:03:06, 13.04s/it] 20%|█▉ | 1979/10000 [7:11:01<29:02:17, 13.03s/it] {'loss': 0.0056, 'learning_rate': 4.0125e-05, 'epoch': 0.75} 20%|█▉ | 1979/10000 [7:11:01<29:02:17, 13.03s/it] 20%|█▉ | 1980/10000 [7:11:14<29:01:08, 13.03s/it] {'loss': 0.0058, 'learning_rate': 4.012e-05, 'epoch': 0.75} 20%|█▉ | 1980/10000 [7:11:14<29:01:08, 13.03s/it] 20%|█▉ | 1981/10000 [7:11:27<28:58:45, 13.01s/it] {'loss': 0.0058, 'learning_rate': 4.0115000000000005e-05, 'epoch': 0.75} 20%|█▉ | 1981/10000 [7:11:27<28:58:45, 13.01s/it] 20%|█▉ | 1982/10000 [7:11:40<28:56:58, 13.00s/it] {'loss': 0.0059, 'learning_rate': 4.011e-05, 'epoch': 0.75} 20%|█▉ | 1982/10000 [7:11:40<28:56:58, 13.00s/it] 20%|█▉ | 1983/10000 [7:11:53<28:56:43, 13.00s/it] {'loss': 0.0058, 'learning_rate': 4.0105e-05, 'epoch': 0.75} 20%|█▉ | 1983/10000 [7:11:53<28:56:43, 13.00s/it] 20%|█▉ | 1984/10000 [7:12:06<28:54:03, 12.98s/it] {'loss': 0.0052, 
'learning_rate': 4.0100000000000006e-05, 'epoch': 0.75} 20%|█▉ | 1984/10000 [7:12:06<28:54:03, 12.98s/it] 20%|█▉ | 1985/10000 [7:12:19<28:51:36, 12.96s/it] {'loss': 0.0064, 'learning_rate': 4.0095e-05, 'epoch': 0.75} 20%|█▉ | 1985/10000 [7:12:19<28:51:36, 12.96s/it] 20%|█▉ | 1986/10000 [7:12:32<28:49:04, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.009e-05, 'epoch': 0.75} 20%|█▉ | 1986/10000 [7:12:32<28:49:04, 12.95s/it] 20%|█▉ | 1987/10000 [7:12:45<28:49:22, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.0085e-05, 'epoch': 0.75} 20%|█▉ | 1987/10000 [7:12:45<28:49:22, 12.95s/it] 20%|█▉ | 1988/10000 [7:12:58<28:48:18, 12.94s/it] {'loss': 0.0058, 'learning_rate': 4.008e-05, 'epoch': 0.75} 20%|█▉ | 1988/10000 [7:12:58<28:48:18, 12.94s/it] 20%|█▉ | 1989/10000 [7:13:11<28:51:01, 12.96s/it] {'loss': 0.005, 'learning_rate': 4.0075e-05, 'epoch': 0.75} 20%|█▉ | 1989/10000 [7:13:11<28:51:01, 12.96s/it] 20%|█▉ | 1990/10000 [7:13:24<28:50:28, 12.96s/it] {'loss': 0.0058, 'learning_rate': 4.007e-05, 'epoch': 0.75} 20%|█▉ | 1990/10000 [7:13:24<28:50:28, 12.96s/it] 20%|█▉ | 1991/10000 [7:13:37<28:50:20, 12.96s/it] {'loss': 0.0048, 'learning_rate': 4.0065000000000003e-05, 'epoch': 0.75} 20%|█▉ | 1991/10000 [7:13:37<28:50:20, 12.96s/it] 20%|█▉ | 1992/10000 [7:13:50<28:48:02, 12.95s/it] {'loss': 0.0063, 'learning_rate': 4.0060000000000006e-05, 'epoch': 0.75} 20%|█▉ | 1992/10000 [7:13:50<28:48:02, 12.95s/it] 20%|█▉ | 1993/10000 [7:14:03<28:45:07, 12.93s/it] {'loss': 0.0068, 'learning_rate': 4.0055e-05, 'epoch': 0.75} 20%|█▉ | 1993/10000 [7:14:03<28:45:07, 12.93s/it] 20%|█▉ | 1994/10000 [7:14:16<28:47:40, 12.95s/it] {'loss': 0.0054, 'learning_rate': 4.0050000000000004e-05, 'epoch': 0.75} 20%|█▉ | 1994/10000 [7:14:16<28:47:40, 12.95s/it] 20%|█▉ | 1995/10000 [7:14:28<28:46:59, 12.94s/it] {'loss': 0.0057, 'learning_rate': 4.0045e-05, 'epoch': 0.75} 20%|█▉ | 1995/10000 [7:14:29<28:46:59, 12.94s/it] 20%|█▉ | 1996/10000 [7:14:41<28:47:26, 12.95s/it] {'loss': 0.0045, 'learning_rate': 
4.004e-05, 'epoch': 0.75} 20%|█▉ | 1996/10000 [7:14:41<28:47:26, 12.95s/it] 20%|█▉ | 1997/10000 [7:14:54<28:45:31, 12.94s/it] {'loss': 0.0057, 'learning_rate': 4.0035e-05, 'epoch': 0.75} 20%|█▉ | 1997/10000 [7:14:54<28:45:31, 12.94s/it] 20%|█▉ | 1998/10000 [7:15:07<28:45:54, 12.94s/it] {'loss': 0.0061, 'learning_rate': 4.003e-05, 'epoch': 0.75} 20%|█▉ | 1998/10000 [7:15:07<28:45:54, 12.94s/it] 20%|█▉ | 1999/10000 [7:15:20<28:47:24, 12.95s/it] {'loss': 0.0057, 'learning_rate': 4.0025000000000004e-05, 'epoch': 0.75} 20%|█▉ | 1999/10000 [7:15:20<28:47:24, 12.95s/it] 20%|██ | 2000/10000 [7:15:33<28:47:46, 12.96s/it] {'loss': 0.0053, 'learning_rate': 4.002e-05, 'epoch': 0.75} 20%|██ | 2000/10000 [7:15:33<28:47:46, 12.96s/it]
Saving the whole model
[INFO|configuration_utils.py:458] 2024-11-06 03:40:31,883 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/config.json
[INFO|configuration_utils.py:364] 2024-11-06 03:40:31,884 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-06 03:41:27,834 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-06 03:41:27,836 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-06 03:41:27,838 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/special_tokens_map.json
[2024-11-06 03:41:27,848] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is about to be saved!
[2024-11-06 03:41:27,883] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/global_step2000/mp_rank_00_model_states.pt
[2024-11-06 03:41:27,883] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/global_step2000/mp_rank_00_model_states.pt...
[2024-11-06 03:42:18,491] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/global_step2000/mp_rank_00_model_states.pt.
[2024-11-06 03:42:18,580] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-06 03:44:07,821] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-06 03:44:10,461] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-06 03:44:10,461] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step2000 is ready now!
 20%|██ | 2001/10000 [7:19:30<177:47:09, 80.01s/it] {'loss': 0.0051, 'learning_rate': 4.0015e-05, 'epoch': 0.75} 20%|██ | 2001/10000 [7:19:30<177:47:09, 80.01s/it] 20%|██ | 2002/10000 [7:19:43<132:58:59, 59.86s/it] {'loss': 0.0053, 'learning_rate': 4.0010000000000005e-05, 'epoch': 0.75} 20%|██ | 2002/10000 [7:19:43<132:58:59, 59.86s/it] 20%|██ | 2003/10000 [7:19:55<101:37:54, 45.75s/it] {'loss': 0.0054, 'learning_rate': 4.0005e-05, 'epoch': 0.75} 20%|██ | 2003/10000 [7:19:55<101:37:54, 45.75s/it] 20%|██ | 2004/10000 [7:20:08<79:44:45, 35.90s/it] {'loss': 0.005, 'learning_rate': 4e-05, 'epoch': 0.76} 20%|██ | 2004/10000 [7:20:08<79:44:45, 35.90s/it]
[2024-11-06 03:45:18,457] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
 20%|██ | 2005/10000 [7:20:20<63:32:01, 28.61s/it] {'loss': 0.0051, 'learning_rate': 4e-05, 'epoch': 0.76} 20%|██ | 2005/10000 [7:20:20<63:32:01, 28.61s/it]
[2024-11-06 03:45:30,034] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
 20%|██ | 2006/10000 [7:20:31<52:10:48, 23.50s/it] {'loss': 0.0043, 'learning_rate': 4e-05, 'epoch': 0.76} 20%|██ | 2006/10000 [7:20:32<52:10:48, 23.50s/it] 20%|██ | 2007/10000 [7:20:44<45:06:28, 20.32s/it] {'loss': 0.0053, 'learning_rate': 3.9995000000000006e-05, 'epoch': 0.76} 20%|██ | 2007/10000 [7:20:44<45:06:28, 20.32s/it] 20%|██ | 2008/10000 [7:20:57<40:10:42, 18.10s/it] {'loss': 0.0055, 'learning_rate': 3.999e-05, 'epoch': 0.76} 20%|██ | 2008/10000 [7:20:57<40:10:42, 18.10s/it] 20%|██ | 2009/10000 [7:21:10<36:44:12, 16.55s/it] {'loss': 0.0045, 'learning_rate': 3.9985e-05, 'epoch': 0.76} 20%|██ | 2009/10000 [7:21:10<36:44:12, 16.55s/it] 20%|██ | 2010/10000 [7:21:23<34:24:44, 15.50s/it] {'loss': 0.0067, 'learning_rate': 3.998e-05, 'epoch': 0.76} 20%|██ | 2010/10000 [7:21:23<34:24:44, 15.50s/it] 20%|██ | 2011/10000 [7:21:36<32:44:42, 14.76s/it] {'loss': 0.0049, 'learning_rate': 3.9975e-05, 'epoch': 0.76} 20%|██ | 2011/10000 [7:21:36<32:44:42, 14.76s/it] 20%|██ | 2012/10000 [7:21:49<31:32:11, 14.21s/it] {'loss': 0.0044, 'learning_rate': 3.9970000000000005e-05, 'epoch': 0.76} 20%|██ | 2012/10000 [7:21:49<31:32:11, 14.21s/it] 20%|██ | 2013/10000 [7:22:02<30:39:36, 13.82s/it] {'loss': 0.005, 'learning_rate': 3.9965e-05, 'epoch': 0.76} 20%|██ | 2013/10000 [7:22:02<30:39:36, 13.82s/it] 20%|██ | 2014/10000 [7:22:15<30:00:46, 13.53s/it] {'loss': 0.0055, 'learning_rate': 3.9960000000000004e-05, 'epoch': 0.76} 20%|██ | 2014/10000 [7:22:15<30:00:46, 13.53s/it] 20%|██ | 2015/10000 [7:22:28<29:33:08, 13.32s/it] {'loss': 0.0053, 'learning_rate': 3.9955000000000006e-05, 'epoch': 0.76} 20%|██ | 2015/10000 [7:22:28<29:33:08, 13.32s/it] 20%|██ | 2016/10000 [7:22:41<29:15:02, 13.19s/it] {'loss': 0.0063, 'learning_rate': 3.995e-05, 'epoch': 0.76} 20%|██ | 2016/10000 [7:22:41<29:15:02, 13.19s/it] 20%|██ | 2017/10000 [7:22:54<28:59:52, 13.08s/it] {'loss': 0.0067, 'learning_rate': 3.9945000000000005e-05, 'epoch': 0.76} 20%|██ | 2017/10000 
[7:22:54<28:59:52, 13.08s/it] 20%|██ | 2018/10000 [7:23:06<28:50:25, 13.01s/it] {'loss': 0.0048, 'learning_rate': 3.994e-05, 'epoch': 0.76} 20%|██ | 2018/10000 [7:23:06<28:50:25, 13.01s/it] 20%|██ | 2019/10000 [7:23:19<28:45:29, 12.97s/it] {'loss': 0.0054, 'learning_rate': 3.9935e-05, 'epoch': 0.76} 20%|██ | 2019/10000 [7:23:19<28:45:29, 12.97s/it] 20%|██ | 2020/10000 [7:23:32<28:43:10, 12.96s/it] {'loss': 0.0063, 'learning_rate': 3.993e-05, 'epoch': 0.76} 20%|██ | 2020/10000 [7:23:32<28:43:10, 12.96s/it] 20%|██ | 2021/10000 [7:23:45<28:42:36, 12.95s/it] {'loss': 0.0066, 'learning_rate': 3.9925e-05, 'epoch': 0.76} 20%|██ | 2021/10000 [7:23:45<28:42:36, 12.95s/it] 20%|██ | 2022/10000 [7:23:58<28:40:24, 12.94s/it] {'loss': 0.0047, 'learning_rate': 3.9920000000000004e-05, 'epoch': 0.76} 20%|██ | 2022/10000 [7:23:58<28:40:24, 12.94s/it] 20%|██ | 2023/10000 [7:24:11<28:38:29, 12.93s/it] {'loss': 0.0057, 'learning_rate': 3.9915e-05, 'epoch': 0.76} 20%|██ | 2023/10000 [7:24:11<28:38:29, 12.93s/it] 20%|██ | 2024/10000 [7:24:24<28:36:45, 12.91s/it] {'loss': 0.0061, 'learning_rate': 3.991e-05, 'epoch': 0.76} 20%|██ | 2024/10000 [7:24:24<28:36:45, 12.91s/it] 20%|██ | 2025/10000 [7:24:37<28:35:29, 12.91s/it] {'loss': 0.0049, 'learning_rate': 3.9905000000000005e-05, 'epoch': 0.76} 20%|██ | 2025/10000 [7:24:37<28:35:29, 12.91s/it] 20%|██ | 2026/10000 [7:24:50<28:36:11, 12.91s/it] {'loss': 0.0042, 'learning_rate': 3.99e-05, 'epoch': 0.76} 20%|██ | 2026/10000 [7:24:50<28:36:11, 12.91s/it] 20%|██ | 2027/10000 [7:25:03<28:35:04, 12.91s/it] {'loss': 0.0052, 'learning_rate': 3.9895000000000003e-05, 'epoch': 0.76} 20%|██ | 2027/10000 [7:25:03<28:35:04, 12.91s/it] 20%|██ | 2028/10000 [7:25:15<28:33:22, 12.90s/it] {'loss': 0.0055, 'learning_rate': 3.989e-05, 'epoch': 0.76} 20%|██ | 2028/10000 [7:25:15<28:33:22, 12.90s/it] 20%|██ | 2029/10000 [7:25:28<28:36:00, 12.92s/it] {'loss': 0.0046, 'learning_rate': 3.9885e-05, 'epoch': 0.76} 20%|██ | 2029/10000 [7:25:28<28:36:00, 12.92s/it] 20%|██ 
| 2030/10000 [7:25:41<28:34:56, 12.91s/it] {'loss': 0.008, 'learning_rate': 3.988e-05, 'epoch': 0.76} 20%|██ | 2030/10000 [7:25:41<28:34:56, 12.91s/it] 20%|██ | 2031/10000 [7:25:54<28:33:15, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.9875e-05, 'epoch': 0.77} 20%|██ | 2031/10000 [7:25:54<28:33:15, 12.90s/it] 20%|██ | 2032/10000 [7:26:07<28:29:39, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.987e-05, 'epoch': 0.77} 20%|██ | 2032/10000 [7:26:07<28:29:39, 12.87s/it] 20%|██ | 2033/10000 [7:26:20<28:26:51, 12.85s/it] {'loss': 0.0061, 'learning_rate': 3.9865000000000005e-05, 'epoch': 0.77} 20%|██ | 2033/10000 [7:26:20<28:26:51, 12.85s/it] 20%|██ | 2034/10000 [7:26:33<28:30:16, 12.88s/it] {'loss': 0.0045, 'learning_rate': 3.986e-05, 'epoch': 0.77} 20%|██ | 2034/10000 [7:26:33<28:30:16, 12.88s/it] 20%|██ | 2035/10000 [7:26:46<28:29:37, 12.88s/it] {'loss': 0.0065, 'learning_rate': 3.9855000000000004e-05, 'epoch': 0.77} 20%|██ | 2035/10000 [7:26:46<28:29:37, 12.88s/it] 20%|██ | 2036/10000 [7:26:58<28:28:36, 12.87s/it] {'loss': 0.0059, 'learning_rate': 3.9850000000000006e-05, 'epoch': 0.77} 20%|██ | 2036/10000 [7:26:58<28:28:36, 12.87s/it] 20%|██ | 2037/10000 [7:27:11<28:30:55, 12.89s/it] {'loss': 0.0044, 'learning_rate': 3.9845e-05, 'epoch': 0.77} 20%|██ | 2037/10000 [7:27:11<28:30:55, 12.89s/it] 20%|██ | 2038/10000 [7:27:24<28:32:57, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.984e-05, 'epoch': 0.77} 20%|██ | 2038/10000 [7:27:24<28:32:57, 12.91s/it] 20%|██ | 2039/10000 [7:27:37<28:31:25, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.9835e-05, 'epoch': 0.77} 20%|██ | 2039/10000 [7:27:37<28:31:25, 12.90s/it] 20%|██ | 2040/10000 [7:27:50<28:32:08, 12.91s/it] {'loss': 0.0072, 'learning_rate': 3.983e-05, 'epoch': 0.77} 20%|██ | 2040/10000 [7:27:50<28:32:08, 12.91s/it] 20%|██ | 2041/10000 [7:28:03<28:34:51, 12.93s/it] {'loss': 0.0049, 'learning_rate': 3.9825e-05, 'epoch': 0.77} 20%|██ | 2041/10000 [7:28:03<28:34:51, 12.93s/it] 20%|██ | 2042/10000 [7:28:16<28:35:25, 
12.93s/it] {'loss': 0.0038, 'learning_rate': 3.982e-05, 'epoch': 0.77} 20%|██ | 2042/10000 [7:28:16<28:35:25, 12.93s/it] 20%|██ | 2043/10000 [7:28:29<28:35:06, 12.93s/it] {'loss': 0.0057, 'learning_rate': 3.9815000000000004e-05, 'epoch': 0.77} 20%|██ | 2043/10000 [7:28:29<28:35:06, 12.93s/it] 20%|██ | 2044/10000 [7:28:42<28:33:47, 12.92s/it] {'loss': 0.0062, 'learning_rate': 3.981e-05, 'epoch': 0.77} 20%|██ | 2044/10000 [7:28:42<28:33:47, 12.92s/it] 20%|██ | 2045/10000 [7:28:55<28:31:47, 12.91s/it] {'loss': 0.0056, 'learning_rate': 3.9805e-05, 'epoch': 0.77} 20%|██ | 2045/10000 [7:28:55<28:31:47, 12.91s/it] 20%|██ | 2046/10000 [7:29:08<28:32:43, 12.92s/it] {'loss': 0.0059, 'learning_rate': 3.9800000000000005e-05, 'epoch': 0.77} 20%|██ | 2046/10000 [7:29:08<28:32:43, 12.92s/it] 20%|██ | 2047/10000 [7:29:21<28:33:56, 12.93s/it] {'loss': 0.0052, 'learning_rate': 3.979500000000001e-05, 'epoch': 0.77} 20%|██ | 2047/10000 [7:29:21<28:33:56, 12.93s/it] 20%|██ | 2048/10000 [7:29:34<28:34:16, 12.93s/it] {'loss': 0.0051, 'learning_rate': 3.979e-05, 'epoch': 0.77} 20%|██ | 2048/10000 [7:29:34<28:34:16, 12.93s/it] 20%|██ | 2049/10000 [7:29:47<28:34:41, 12.94s/it] {'loss': 0.0067, 'learning_rate': 3.9785e-05, 'epoch': 0.77} 20%|██ | 2049/10000 [7:29:47<28:34:41, 12.94s/it] 20%|██ | 2050/10000 [7:30:00<28:36:58, 12.96s/it] {'loss': 0.0051, 'learning_rate': 3.978e-05, 'epoch': 0.77} 20%|██ | 2050/10000 [7:30:00<28:36:58, 12.96s/it] 21%|██ | 2051/10000 [7:30:12<28:31:05, 12.92s/it] {'loss': 0.005, 'learning_rate': 3.9775e-05, 'epoch': 0.77} 21%|██ | 2051/10000 [7:30:12<28:31:05, 12.92s/it] 21%|██ | 2052/10000 [7:30:25<28:29:27, 12.90s/it] {'loss': 0.008, 'learning_rate': 3.977e-05, 'epoch': 0.77} 21%|██ | 2052/10000 [7:30:25<28:29:27, 12.90s/it] 21%|██ | 2053/10000 [7:30:38<28:28:53, 12.90s/it] {'loss': 0.0079, 'learning_rate': 3.9765e-05, 'epoch': 0.77} 21%|██ | 2053/10000 [7:30:38<28:28:53, 12.90s/it] 21%|██ | 2054/10000 [7:30:51<28:28:34, 12.90s/it] {'loss': 0.0055, 
'learning_rate': 3.9760000000000006e-05, 'epoch': 0.77} 21%|██ | 2054/10000 [7:30:51<28:28:34, 12.90s/it] 21%|██ | 2055/10000 [7:31:04<28:28:47, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.9755e-05, 'epoch': 0.77} 21%|██ | 2055/10000 [7:31:04<28:28:47, 12.90s/it] 21%|██ | 2056/10000 [7:31:17<28:27:35, 12.90s/it] {'loss': 0.0064, 'learning_rate': 3.9750000000000004e-05, 'epoch': 0.77} 21%|██ | 2056/10000 [7:31:17<28:27:35, 12.90s/it] 21%|██ | 2057/10000 [7:31:30<28:27:36, 12.90s/it] {'loss': 0.0047, 'learning_rate': 3.9745000000000007e-05, 'epoch': 0.78} 21%|██ | 2057/10000 [7:31:30<28:27:36, 12.90s/it] 21%|██ | 2058/10000 [7:31:43<28:25:27, 12.88s/it] {'loss': 0.0044, 'learning_rate': 3.974e-05, 'epoch': 0.78} 21%|██ | 2058/10000 [7:31:43<28:25:27, 12.88s/it] 21%|██ | 2059/10000 [7:31:55<28:22:38, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.9735e-05, 'epoch': 0.78} 21%|██ | 2059/10000 [7:31:55<28:22:38, 12.86s/it] 21%|██ | 2060/10000 [7:32:08<28:23:27, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.973e-05, 'epoch': 0.78} 21%|██ | 2060/10000 [7:32:08<28:23:27, 12.87s/it] 21%|██ | 2061/10000 [7:32:21<28:27:07, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.9725e-05, 'epoch': 0.78} 21%|██ | 2061/10000 [7:32:21<28:27:07, 12.90s/it] 21%|██ | 2062/10000 [7:32:34<28:30:53, 12.93s/it] {'loss': 0.0063, 'learning_rate': 3.972e-05, 'epoch': 0.78} 21%|██ | 2062/10000 [7:32:34<28:30:53, 12.93s/it] 21%|██ | 2063/10000 [7:32:47<28:32:47, 12.95s/it] {'loss': 0.0097, 'learning_rate': 3.9715e-05, 'epoch': 0.78} 21%|██ | 2063/10000 [7:32:47<28:32:47, 12.95s/it] 21%|██ | 2064/10000 [7:33:00<28:31:00, 12.94s/it] {'loss': 0.0066, 'learning_rate': 3.9710000000000004e-05, 'epoch': 0.78} 21%|██ | 2064/10000 [7:33:00<28:31:00, 12.94s/it] 21%|██ | 2065/10000 [7:33:13<28:31:07, 12.94s/it] {'loss': 0.0106, 'learning_rate': 3.9705e-05, 'epoch': 0.78} 21%|██ | 2065/10000 [7:33:13<28:31:07, 12.94s/it] 21%|██ | 2066/10000 [7:33:26<28:30:30, 12.94s/it] {'loss': 0.0056, 'learning_rate': 
3.97e-05, 'epoch': 0.78} 21%|██ | 2066/10000 [7:33:26<28:30:30, 12.94s/it] 21%|██ | 2067/10000 [7:33:39<28:30:45, 12.94s/it] {'loss': 0.0059, 'learning_rate': 3.9695000000000005e-05, 'epoch': 0.78} 21%|██ | 2067/10000 [7:33:39<28:30:45, 12.94s/it] 21%|██ | 2068/10000 [7:33:52<28:28:14, 12.92s/it] {'loss': 0.0089, 'learning_rate': 3.969e-05, 'epoch': 0.78} 21%|██ | 2068/10000 [7:33:52<28:28:14, 12.92s/it] 21%|██ | 2069/10000 [7:34:05<28:27:46, 12.92s/it] {'loss': 0.0058, 'learning_rate': 3.9685e-05, 'epoch': 0.78} 21%|██ | 2069/10000 [7:34:05<28:27:46, 12.92s/it] 21%|██ | 2070/10000 [7:34:18<28:29:28, 12.93s/it] {'loss': 0.0047, 'learning_rate': 3.968e-05, 'epoch': 0.78} 21%|██ | 2070/10000 [7:34:18<28:29:28, 12.93s/it] 21%|██ | 2071/10000 [7:34:31<28:28:34, 12.93s/it] {'loss': 0.0067, 'learning_rate': 3.9675e-05, 'epoch': 0.78} 21%|██ | 2071/10000 [7:34:31<28:28:34, 12.93s/it] 21%|██ | 2072/10000 [7:34:44<28:28:46, 12.93s/it] {'loss': 0.0057, 'learning_rate': 3.9670000000000005e-05, 'epoch': 0.78} 21%|██ | 2072/10000 [7:34:44<28:28:46, 12.93s/it] 21%|██ | 2073/10000 [7:34:57<28:27:01, 12.92s/it] {'loss': 0.008, 'learning_rate': 3.9665e-05, 'epoch': 0.78} 21%|██ | 2073/10000 [7:34:57<28:27:01, 12.92s/it] 21%|██ | 2074/10000 [7:35:09<28:27:21, 12.92s/it] {'loss': 0.0074, 'learning_rate': 3.966e-05, 'epoch': 0.78} 21%|██ | 2074/10000 [7:35:09<28:27:21, 12.92s/it] 21%|██ | 2075/10000 [7:35:22<28:25:44, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.9655000000000006e-05, 'epoch': 0.78} 21%|██ | 2075/10000 [7:35:22<28:25:44, 12.91s/it] 21%|██ | 2076/10000 [7:35:35<28:24:59, 12.91s/it] {'loss': 0.0084, 'learning_rate': 3.965e-05, 'epoch': 0.78} 21%|██ | 2076/10000 [7:35:35<28:24:59, 12.91s/it] 21%|██ | 2077/10000 [7:35:48<28:26:16, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.9645000000000004e-05, 'epoch': 0.78} 21%|██ | 2077/10000 [7:35:48<28:26:16, 12.92s/it] 21%|██ | 2078/10000 [7:36:01<28:28:37, 12.94s/it] {'loss': 0.0056, 'learning_rate': 3.964e-05, 'epoch': 0.78} 
21%|██ | 2078/10000 [7:36:01<28:28:37, 12.94s/it] 21%|██ | 2079/10000 [7:36:14<28:26:34, 12.93s/it] {'loss': 0.0055, 'learning_rate': 3.9635e-05, 'epoch': 0.78} 21%|██ | 2079/10000 [7:36:14<28:26:34, 12.93s/it] 21%|██ | 2080/10000 [7:36:27<28:26:05, 12.92s/it] {'loss': 0.0068, 'learning_rate': 3.963e-05, 'epoch': 0.78} 21%|██ | 2080/10000 [7:36:27<28:26:05, 12.92s/it] 21%|██ | 2081/10000 [7:36:40<28:25:44, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.9625e-05, 'epoch': 0.78} 21%|██ | 2081/10000 [7:36:40<28:25:44, 12.92s/it] 21%|██ | 2082/10000 [7:36:53<28:25:53, 12.93s/it] {'loss': 0.0092, 'learning_rate': 3.9620000000000004e-05, 'epoch': 0.78} 21%|██ | 2082/10000 [7:36:53<28:25:53, 12.93s/it] 21%|██ | 2083/10000 [7:37:06<28:27:41, 12.94s/it] {'loss': 0.0057, 'learning_rate': 3.9615e-05, 'epoch': 0.78} 21%|██ | 2083/10000 [7:37:06<28:27:41, 12.94s/it] 21%|██ | 2084/10000 [7:37:19<28:30:44, 12.97s/it] {'loss': 0.006, 'learning_rate': 3.961e-05, 'epoch': 0.79} 21%|██ | 2084/10000 [7:37:19<28:30:44, 12.97s/it] 21%|██ | 2085/10000 [7:37:32<28:28:12, 12.95s/it] {'loss': 0.0062, 'learning_rate': 3.9605000000000005e-05, 'epoch': 0.79} 21%|██ | 2085/10000 [7:37:32<28:28:12, 12.95s/it] 21%|██ | 2086/10000 [7:37:45<28:24:32, 12.92s/it] {'loss': 0.0055, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.79} 21%|██ | 2086/10000 [7:37:45<28:24:32, 12.92s/it] 21%|██ | 2087/10000 [7:37:58<28:26:36, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.9595e-05, 'epoch': 0.79} 21%|██ | 2087/10000 [7:37:58<28:26:36, 12.94s/it] 21%|██ | 2088/10000 [7:38:10<28:24:31, 12.93s/it] {'loss': 0.0065, 'learning_rate': 3.959e-05, 'epoch': 0.79} 21%|██ | 2088/10000 [7:38:11<28:24:31, 12.93s/it] 21%|██ | 2089/10000 [7:38:23<28:24:02, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.9585e-05, 'epoch': 0.79} 21%|██ | 2089/10000 [7:38:23<28:24:02, 12.92s/it] 21%|██ | 2090/10000 [7:38:36<28:26:56, 12.95s/it] {'loss': 0.0068, 'learning_rate': 3.958e-05, 'epoch': 0.79} 21%|██ | 2090/10000 
[7:38:36<28:26:56, 12.95s/it] 21%|██ | 2091/10000 [7:38:49<28:28:09, 12.96s/it] {'loss': 0.0063, 'learning_rate': 3.9575e-05, 'epoch': 0.79} 21%|██ | 2091/10000 [7:38:49<28:28:09, 12.96s/it] 21%|██ | 2092/10000 [7:39:02<28:23:06, 12.92s/it] {'loss': 0.0061, 'learning_rate': 3.957e-05, 'epoch': 0.79} 21%|██ | 2092/10000 [7:39:02<28:23:06, 12.92s/it] 21%|██ | 2093/10000 [7:39:15<28:22:48, 12.92s/it] {'loss': 0.006, 'learning_rate': 3.9565000000000005e-05, 'epoch': 0.79} 21%|██ | 2093/10000 [7:39:15<28:22:48, 12.92s/it] 21%|██ | 2094/10000 [7:39:28<28:21:34, 12.91s/it] {'loss': 0.0065, 'learning_rate': 3.956e-05, 'epoch': 0.79} 21%|██ | 2094/10000 [7:39:28<28:21:34, 12.91s/it] 21%|██ | 2095/10000 [7:39:41<28:27:00, 12.96s/it] {'loss': 0.0058, 'learning_rate': 3.9555e-05, 'epoch': 0.79} 21%|██ | 2095/10000 [7:39:41<28:27:00, 12.96s/it] 21%|██ | 2096/10000 [7:39:54<28:30:26, 12.98s/it] {'loss': 0.0071, 'learning_rate': 3.9550000000000006e-05, 'epoch': 0.79} 21%|██ | 2096/10000 [7:39:54<28:30:26, 12.98s/it] 21%|██ | 2097/10000 [7:40:07<28:30:15, 12.98s/it] {'loss': 0.0048, 'learning_rate': 3.9545e-05, 'epoch': 0.79} 21%|██ | 2097/10000 [7:40:07<28:30:15, 12.98s/it] 21%|██ | 2098/10000 [7:40:20<28:30:52, 12.99s/it] {'loss': 0.0057, 'learning_rate': 3.954e-05, 'epoch': 0.79} 21%|██ | 2098/10000 [7:40:20<28:30:52, 12.99s/it] 21%|██ | 2099/10000 [7:40:33<28:25:42, 12.95s/it] {'loss': 0.0057, 'learning_rate': 3.9535e-05, 'epoch': 0.79} 21%|██ | 2099/10000 [7:40:33<28:25:42, 12.95s/it] 21%|██ | 2100/10000 [7:40:46<28:23:35, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.953e-05, 'epoch': 0.79} 21%|██ | 2100/10000 [7:40:46<28:23:35, 12.94s/it] 21%|██ | 2101/10000 [7:40:59<28:20:43, 12.92s/it] {'loss': 0.0062, 'learning_rate': 3.9525e-05, 'epoch': 0.79} 21%|██ | 2101/10000 [7:40:59<28:20:43, 12.92s/it] 21%|██ | 2102/10000 [7:41:12<28:19:52, 12.91s/it] {'loss': 0.0069, 'learning_rate': 3.952e-05, 'epoch': 0.79} 21%|██ | 2102/10000 [7:41:12<28:19:52, 12.91s/it] 21%|██ | 2103/10000 
[7:41:25<28:17:34, 12.90s/it] {'loss': 0.0067, 'learning_rate': 3.9515000000000004e-05, 'epoch': 0.79} 21%|██ | 2103/10000 [7:41:25<28:17:34, 12.90s/it] 21%|██ | 2104/10000 [7:41:37<28:17:14, 12.90s/it] {'loss': 0.0075, 'learning_rate': 3.951e-05, 'epoch': 0.79} 21%|██ | 2104/10000 [7:41:37<28:17:14, 12.90s/it] 21%|██ | 2105/10000 [7:41:50<28:15:57, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.9505e-05, 'epoch': 0.79} 21%|██ | 2105/10000 [7:41:50<28:15:57, 12.89s/it] 21%|██ | 2106/10000 [7:42:03<28:16:43, 12.90s/it] {'loss': 0.0068, 'learning_rate': 3.9500000000000005e-05, 'epoch': 0.79} 21%|██ | 2106/10000 [7:42:03<28:16:43, 12.90s/it] 21%|██ | 2107/10000 [7:42:16<28:17:58, 12.91s/it] {'loss': 0.0049, 'learning_rate': 3.949500000000001e-05, 'epoch': 0.79} 21%|██ | 2107/10000 [7:42:16<28:17:58, 12.91s/it] 21%|██ | 2108/10000 [7:42:29<28:16:07, 12.89s/it] {'loss': 0.0056, 'learning_rate': 3.9489999999999996e-05, 'epoch': 0.79} 21%|██ | 2108/10000 [7:42:29<28:16:07, 12.89s/it] 21%|██ | 2109/10000 [7:42:42<28:16:55, 12.90s/it] {'loss': 0.0052, 'learning_rate': 3.9485e-05, 'epoch': 0.79} 21%|██ | 2109/10000 [7:42:42<28:16:55, 12.90s/it] 21%|██ | 2110/10000 [7:42:55<28:14:00, 12.88s/it] {'loss': 0.006, 'learning_rate': 3.948e-05, 'epoch': 0.8} 21%|██ | 2110/10000 [7:42:55<28:14:00, 12.88s/it] 21%|██ | 2111/10000 [7:43:08<28:13:00, 12.88s/it] {'loss': 0.006, 'learning_rate': 3.9475000000000004e-05, 'epoch': 0.8} 21%|██ | 2111/10000 [7:43:08<28:13:00, 12.88s/it] 21%|██ | 2112/10000 [7:43:20<28:11:56, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.947e-05, 'epoch': 0.8} 21%|██ | 2112/10000 [7:43:21<28:11:56, 12.87s/it] 21%|██ | 2113/10000 [7:43:33<28:12:23, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.9465e-05, 'epoch': 0.8} 21%|██ | 2113/10000 [7:43:33<28:12:23, 12.87s/it] 21%|██ | 2114/10000 [7:43:46<28:08:53, 12.85s/it] {'loss': 0.0062, 'learning_rate': 3.9460000000000005e-05, 'epoch': 0.8} 21%|██ | 2114/10000 [7:43:46<28:08:53, 12.85s/it] 21%|██ | 2115/10000 
[7:43:59<28:09:59, 12.86s/it] {'loss': 0.0052, 'learning_rate': 3.9455e-05, 'epoch': 0.8} 21%|██ | 2115/10000 [7:43:59<28:09:59, 12.86s/it] 21%|██ | 2116/10000 [7:44:12<28:13:42, 12.89s/it] {'loss': 0.005, 'learning_rate': 3.9450000000000003e-05, 'epoch': 0.8} 21%|██ | 2116/10000 [7:44:12<28:13:42, 12.89s/it] 21%|██ | 2117/10000 [7:44:25<28:11:08, 12.87s/it] {'loss': 0.0081, 'learning_rate': 3.9445000000000006e-05, 'epoch': 0.8} 21%|██ | 2117/10000 [7:44:25<28:11:08, 12.87s/it] 21%|██ | 2118/10000 [7:44:38<28:10:52, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.944e-05, 'epoch': 0.8} 21%|██ | 2118/10000 [7:44:38<28:10:52, 12.87s/it] 21%|██ | 2119/10000 [7:44:51<28:08:51, 12.86s/it] {'loss': 0.0065, 'learning_rate': 3.9435e-05, 'epoch': 0.8} 21%|██ | 2119/10000 [7:44:51<28:08:51, 12.86s/it] 21%|██ | 2120/10000 [7:45:03<28:10:57, 12.88s/it] {'loss': 0.0061, 'learning_rate': 3.943e-05, 'epoch': 0.8} 21%|██ | 2120/10000 [7:45:03<28:10:57, 12.88s/it] 21%|██ | 2121/10000 [7:45:16<28:12:44, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.9425e-05, 'epoch': 0.8} 21%|██ | 2121/10000 [7:45:16<28:12:44, 12.89s/it] 21%|██ | 2122/10000 [7:45:29<28:11:19, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.942e-05, 'epoch': 0.8} 21%|██ | 2122/10000 [7:45:29<28:11:19, 12.88s/it] 21%|██ | 2123/10000 [7:45:42<28:11:27, 12.88s/it] {'loss': 0.0055, 'learning_rate': 3.9415e-05, 'epoch': 0.8} 21%|██ | 2123/10000 [7:45:42<28:11:27, 12.88s/it] 21%|██ | 2124/10000 [7:45:55<28:12:16, 12.89s/it] {'loss': 0.0071, 'learning_rate': 3.9410000000000004e-05, 'epoch': 0.8} 21%|██ | 2124/10000 [7:45:55<28:12:16, 12.89s/it] 21%|██▏ | 2125/10000 [7:46:08<28:09:55, 12.88s/it] {'loss': 0.0068, 'learning_rate': 3.9405e-05, 'epoch': 0.8} 21%|██▏ | 2125/10000 [7:46:08<28:09:55, 12.88s/it] 21%|██▏ | 2126/10000 [7:46:21<28:09:34, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.94e-05, 'epoch': 0.8} 21%|██▏ | 2126/10000 [7:46:21<28:09:34, 12.87s/it] 21%|██▏ | 2127/10000 [7:46:34<28:10:09, 12.88s/it] {'loss': 
0.0066, 'learning_rate': 3.9395000000000005e-05, 'epoch': 0.8} 21%|██▏ | 2127/10000 [7:46:34<28:10:09, 12.88s/it] 21%|██▏ | 2128/10000 [7:46:47<28:10:01, 12.88s/it] {'loss': 0.0069, 'learning_rate': 3.939e-05, 'epoch': 0.8} 21%|██▏ | 2128/10000 [7:46:47<28:10:01, 12.88s/it] 21%|██▏ | 2129/10000 [7:46:59<28:10:52, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.9384999999999996e-05, 'epoch': 0.8} 21%|██▏ | 2129/10000 [7:46:59<28:10:52, 12.89s/it] 21%|██▏ | 2130/10000 [7:47:12<28:10:24, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.938e-05, 'epoch': 0.8} 21%|██▏ | 2130/10000 [7:47:12<28:10:24, 12.89s/it] 21%|██▏ | 2131/10000 [7:47:25<28:09:38, 12.88s/it] {'loss': 0.0071, 'learning_rate': 3.9375e-05, 'epoch': 0.8} 21%|██▏ | 2131/10000 [7:47:25<28:09:38, 12.88s/it] 21%|██▏ | 2132/10000 [7:47:38<28:08:26, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.9370000000000004e-05, 'epoch': 0.8} 21%|██▏ | 2132/10000 [7:47:38<28:08:26, 12.88s/it] 21%|██▏ | 2133/10000 [7:47:51<28:09:05, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.9365e-05, 'epoch': 0.8} 21%|██▏ | 2133/10000 [7:47:51<28:09:05, 12.88s/it] 21%|██▏ | 2134/10000 [7:48:04<28:11:11, 12.90s/it] {'loss': 0.0083, 'learning_rate': 3.936e-05, 'epoch': 0.8} 21%|██▏ | 2134/10000 [7:48:04<28:11:11, 12.90s/it] 21%|██▏ | 2135/10000 [7:48:17<28:10:26, 12.90s/it] {'loss': 0.0055, 'learning_rate': 3.9355000000000005e-05, 'epoch': 0.8} 21%|██▏ | 2135/10000 [7:48:17<28:10:26, 12.90s/it] 21%|██▏ | 2136/10000 [7:48:30<28:09:51, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.935e-05, 'epoch': 0.8} 21%|██▏ | 2136/10000 [7:48:30<28:09:51, 12.89s/it] 21%|██▏ | 2137/10000 [7:48:43<28:08:24, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.9345000000000004e-05, 'epoch': 0.81} 21%|██▏ | 2137/10000 [7:48:43<28:08:24, 12.88s/it] 21%|██▏ | 2138/10000 [7:48:55<28:08:29, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.9340000000000006e-05, 'epoch': 0.81} 21%|██▏ | 2138/10000 [7:48:55<28:08:29, 12.89s/it] 21%|██▏ | 2139/10000 [7:49:08<28:06:09, 
12.87s/it] {'loss': 0.0051, 'learning_rate': 3.9335e-05, 'epoch': 0.81} 21%|██▏ | 2139/10000 [7:49:08<28:06:09, 12.87s/it] 21%|██▏ | 2140/10000 [7:49:21<28:04:23, 12.86s/it] {'loss': 0.0067, 'learning_rate': 3.933e-05, 'epoch': 0.81} 21%|██▏ | 2140/10000 [7:49:21<28:04:23, 12.86s/it] 21%|██▏ | 2141/10000 [7:49:34<28:03:13, 12.85s/it] {'loss': 0.0057, 'learning_rate': 3.9325e-05, 'epoch': 0.81} 21%|██▏ | 2141/10000 [7:49:34<28:03:13, 12.85s/it] 21%|██▏ | 2142/10000 [7:49:47<28:03:36, 12.86s/it] {'loss': 0.0071, 'learning_rate': 3.932e-05, 'epoch': 0.81} 21%|██▏ | 2142/10000 [7:49:47<28:03:36, 12.86s/it] 21%|██▏ | 2143/10000 [7:50:00<28:05:47, 12.87s/it] {'loss': 0.0065, 'learning_rate': 3.9315e-05, 'epoch': 0.81} 21%|██▏ | 2143/10000 [7:50:00<28:05:47, 12.87s/it] 21%|██▏ | 2144/10000 [7:50:13<28:05:07, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.931e-05, 'epoch': 0.81} 21%|██▏ | 2144/10000 [7:50:13<28:05:07, 12.87s/it] 21%|██▏ | 2145/10000 [7:50:25<28:04:31, 12.87s/it] {'loss': 0.0071, 'learning_rate': 3.9305000000000004e-05, 'epoch': 0.81} 21%|██▏ | 2145/10000 [7:50:25<28:04:31, 12.87s/it] 21%|██▏ | 2146/10000 [7:50:38<28:05:14, 12.87s/it] {'loss': 0.0056, 'learning_rate': 3.9300000000000007e-05, 'epoch': 0.81} 21%|██▏ | 2146/10000 [7:50:38<28:05:14, 12.87s/it] 21%|██▏ | 2147/10000 [7:50:51<28:07:02, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.9295e-05, 'epoch': 0.81} 21%|██▏ | 2147/10000 [7:50:51<28:07:02, 12.89s/it] 21%|██▏ | 2148/10000 [7:51:04<28:09:24, 12.91s/it] {'loss': 0.0069, 'learning_rate': 3.9290000000000005e-05, 'epoch': 0.81} 21%|██▏ | 2148/10000 [7:51:04<28:09:24, 12.91s/it] 21%|██▏ | 2149/10000 [7:51:17<28:06:59, 12.89s/it] {'loss': 0.0104, 'learning_rate': 3.9285e-05, 'epoch': 0.81} 21%|██▏ | 2149/10000 [7:51:17<28:06:59, 12.89s/it] 22%|██▏ | 2150/10000 [7:51:30<28:04:44, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.9280000000000003e-05, 'epoch': 0.81} 22%|██▏ | 2150/10000 [7:51:30<28:04:44, 12.88s/it] 22%|██▏ | 2151/10000 
[7:51:43<28:04:40, 12.88s/it] {'loss': 0.0071, 'learning_rate': 3.9275e-05, 'epoch': 0.81} 22%|██▏ | 2151/10000 [7:51:43<28:04:40, 12.88s/it] 22%|██▏ | 2152/10000 [7:51:56<28:03:49, 12.87s/it] {'loss': 0.0059, 'learning_rate': 3.927e-05, 'epoch': 0.81} 22%|██▏ | 2152/10000 [7:51:56<28:03:49, 12.87s/it] 22%|██▏ | 2153/10000 [7:52:09<28:04:13, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.9265000000000004e-05, 'epoch': 0.81} 22%|██▏ | 2153/10000 [7:52:09<28:04:13, 12.88s/it] 22%|██▏ | 2154/10000 [7:52:21<28:05:52, 12.89s/it] {'loss': 0.007, 'learning_rate': 3.926e-05, 'epoch': 0.81} 22%|██▏ | 2154/10000 [7:52:21<28:05:52, 12.89s/it] 22%|██▏ | 2155/10000 [7:52:34<28:05:02, 12.89s/it] {'loss': 0.0077, 'learning_rate': 3.9255e-05, 'epoch': 0.81} 22%|██▏ | 2155/10000 [7:52:34<28:05:02, 12.89s/it] 22%|██▏ | 2156/10000 [7:52:47<28:04:15, 12.88s/it] {'loss': 0.0121, 'learning_rate': 3.9250000000000005e-05, 'epoch': 0.81} 22%|██▏ | 2156/10000 [7:52:47<28:04:15, 12.88s/it] 22%|██▏ | 2157/10000 [7:53:00<28:03:48, 12.88s/it] {'loss': 0.0074, 'learning_rate': 3.9245e-05, 'epoch': 0.81} 22%|██▏ | 2157/10000 [7:53:00<28:03:48, 12.88s/it] 22%|██▏ | 2158/10000 [7:53:13<28:06:13, 12.90s/it] {'loss': 0.0071, 'learning_rate': 3.9240000000000004e-05, 'epoch': 0.81} 22%|██▏ | 2158/10000 [7:53:13<28:06:13, 12.90s/it] 22%|██▏ | 2159/10000 [7:53:26<28:04:09, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.9235e-05, 'epoch': 0.81} 22%|██▏ | 2159/10000 [7:53:26<28:04:09, 12.89s/it] 22%|██▏ | 2160/10000 [7:53:39<28:00:09, 12.86s/it] {'loss': 0.0063, 'learning_rate': 3.923e-05, 'epoch': 0.81} 22%|██▏ | 2160/10000 [7:53:39<28:00:09, 12.86s/it] 22%|██▏ | 2161/10000 [7:53:52<27:59:46, 12.86s/it] {'loss': 0.0047, 'learning_rate': 3.9225e-05, 'epoch': 0.81} 22%|██▏ | 2161/10000 [7:53:52<27:59:46, 12.86s/it] 22%|██▏ | 2162/10000 [7:54:04<28:01:27, 12.87s/it] {'loss': 0.0063, 'learning_rate': 3.922e-05, 'epoch': 0.81} 22%|██▏ | 2162/10000 [7:54:04<28:01:27, 12.87s/it] 22%|██▏ | 2163/10000 
[7:54:17<28:04:07, 12.89s/it] {'loss': 0.0064, 'learning_rate': 3.9215e-05, 'epoch': 0.81} 22%|██▏ | 2163/10000 [7:54:17<28:04:07, 12.89s/it] 22%|██▏ | 2164/10000 [7:54:30<28:05:11, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.921e-05, 'epoch': 0.82} 22%|██▏ | 2164/10000 [7:54:30<28:05:11, 12.90s/it] 22%|██▏ | 2165/10000 [7:54:43<28:05:30, 12.91s/it] {'loss': 0.005, 'learning_rate': 3.9205e-05, 'epoch': 0.82} 22%|██▏ | 2165/10000 [7:54:43<28:05:30, 12.91s/it] 22%|██▏ | 2166/10000 [7:54:56<28:04:24, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.9200000000000004e-05, 'epoch': 0.82} 22%|██▏ | 2166/10000 [7:54:56<28:04:24, 12.90s/it] 22%|██▏ | 2167/10000 [7:55:09<28:02:25, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.919500000000001e-05, 'epoch': 0.82} 22%|██▏ | 2167/10000 [7:55:09<28:02:25, 12.89s/it] 22%|██▏ | 2168/10000 [7:55:22<28:02:52, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.919e-05, 'epoch': 0.82} 22%|██▏ | 2168/10000 [7:55:22<28:02:52, 12.89s/it] 22%|██▏ | 2169/10000 [7:55:35<28:07:04, 12.93s/it] {'loss': 0.0079, 'learning_rate': 3.9185e-05, 'epoch': 0.82} 22%|██▏ | 2169/10000 [7:55:35<28:07:04, 12.93s/it] 22%|██▏ | 2170/10000 [7:55:48<28:09:38, 12.95s/it] {'loss': 0.0068, 'learning_rate': 3.918e-05, 'epoch': 0.82} 22%|██▏ | 2170/10000 [7:55:48<28:09:38, 12.95s/it] 22%|██▏ | 2171/10000 [7:56:01<28:08:39, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.9175000000000004e-05, 'epoch': 0.82} 22%|██▏ | 2171/10000 [7:56:01<28:08:39, 12.94s/it] 22%|██▏ | 2172/10000 [7:56:14<28:08:31, 12.94s/it] {'loss': 0.0081, 'learning_rate': 3.917e-05, 'epoch': 0.82} 22%|██▏ | 2172/10000 [7:56:14<28:08:31, 12.94s/it] 22%|██▏ | 2173/10000 [7:56:27<28:10:31, 12.96s/it] {'loss': 0.007, 'learning_rate': 3.9165e-05, 'epoch': 0.82} 22%|██▏ | 2173/10000 [7:56:27<28:10:31, 12.96s/it] 22%|██▏ | 2174/10000 [7:56:40<28:11:17, 12.97s/it] {'loss': 0.0082, 'learning_rate': 3.9160000000000005e-05, 'epoch': 0.82} 22%|██▏ | 2174/10000 [7:56:40<28:11:17, 12.97s/it] 22%|██▏ | 2175/10000 
[7:56:53<28:08:50, 12.95s/it] {'loss': 0.0082, 'learning_rate': 3.9155e-05, 'epoch': 0.82} 22%|██▏ | 2175/10000 [7:56:53<28:08:50, 12.95s/it] 22%|██▏ | 2176/10000 [7:57:06<28:08:07, 12.95s/it] {'loss': 0.0056, 'learning_rate': 3.915e-05, 'epoch': 0.82} 22%|██▏ | 2176/10000 [7:57:06<28:08:07, 12.95s/it] 22%|██▏ | 2177/10000 [7:57:19<28:09:43, 12.96s/it] {'loss': 0.0119, 'learning_rate': 3.9145000000000006e-05, 'epoch': 0.82} 22%|██▏ | 2177/10000 [7:57:19<28:09:43, 12.96s/it] 22%|██▏ | 2178/10000 [7:57:32<28:12:06, 12.98s/it] {'loss': 0.008, 'learning_rate': 3.914e-05, 'epoch': 0.82} 22%|██▏ | 2178/10000 [7:57:32<28:12:06, 12.98s/it] 22%|██▏ | 2179/10000 [7:57:45<28:12:18, 12.98s/it] {'loss': 0.0061, 'learning_rate': 3.9135e-05, 'epoch': 0.82} 22%|██▏ | 2179/10000 [7:57:45<28:12:18, 12.98s/it] 22%|██▏ | 2180/10000 [7:57:58<28:14:35, 13.00s/it] {'loss': 0.0051, 'learning_rate': 3.913e-05, 'epoch': 0.82} 22%|██▏ | 2180/10000 [7:57:58<28:14:35, 13.00s/it] 22%|██▏ | 2181/10000 [7:58:11<28:13:52, 13.00s/it] {'loss': 0.0064, 'learning_rate': 3.9125e-05, 'epoch': 0.82} 22%|██▏ | 2181/10000 [7:58:11<28:13:52, 13.00s/it] 22%|██▏ | 2182/10000 [7:58:24<28:16:46, 13.02s/it] {'loss': 0.0105, 'learning_rate': 3.912e-05, 'epoch': 0.82} 22%|██▏ | 2182/10000 [7:58:24<28:16:46, 13.02s/it] 22%|██▏ | 2183/10000 [7:58:37<28:15:24, 13.01s/it] {'loss': 0.0075, 'learning_rate': 3.9115e-05, 'epoch': 0.82} 22%|██▏ | 2183/10000 [7:58:37<28:15:24, 13.01s/it] 22%|██▏ | 2184/10000 [7:58:50<28:13:30, 13.00s/it] {'loss': 0.0051, 'learning_rate': 3.911e-05, 'epoch': 0.82} 22%|██▏ | 2184/10000 [7:58:50<28:13:30, 13.00s/it] 22%|██▏ | 2185/10000 [7:59:03<28:10:36, 12.98s/it] {'loss': 0.0069, 'learning_rate': 3.9105000000000006e-05, 'epoch': 0.82} 22%|██▏ | 2185/10000 [7:59:03<28:10:36, 12.98s/it] 22%|██▏ | 2186/10000 [7:59:16<28:10:00, 12.98s/it] {'loss': 0.0041, 'learning_rate': 3.91e-05, 'epoch': 0.82} 22%|██▏ | 2186/10000 [7:59:16<28:10:00, 12.98s/it] 22%|██▏ | 2187/10000 [7:59:29<28:09:42, 
12.98s/it] {'loss': 0.0055, 'learning_rate': 3.9095000000000004e-05, 'epoch': 0.82} 22%|██▏ | 2187/10000 [7:59:29<28:09:42, 12.98s/it] 22%|██▏ | 2188/10000 [7:59:41<28:08:28, 12.97s/it] {'loss': 0.0082, 'learning_rate': 3.909000000000001e-05, 'epoch': 0.82} 22%|██▏ | 2188/10000 [7:59:42<28:08:28, 12.97s/it] 22%|██▏ | 2189/10000 [7:59:55<28:12:03, 13.00s/it] {'loss': 0.0147, 'learning_rate': 3.9085e-05, 'epoch': 0.82} 22%|██▏ | 2189/10000 [7:59:55<28:12:03, 13.00s/it] 22%|██▏ | 2190/10000 [8:00:07<28:09:16, 12.98s/it] {'loss': 0.0058, 'learning_rate': 3.908e-05, 'epoch': 0.83} 22%|██▏ | 2190/10000 [8:00:08<28:09:16, 12.98s/it] 22%|██▏ | 2191/10000 [8:00:20<28:03:03, 12.93s/it] {'loss': 0.0075, 'learning_rate': 3.9075e-05, 'epoch': 0.83} 22%|██▏ | 2191/10000 [8:00:20<28:03:03, 12.93s/it] 22%|██▏ | 2192/10000 [8:00:33<27:58:19, 12.90s/it] {'loss': 0.0082, 'learning_rate': 3.9070000000000004e-05, 'epoch': 0.83} 22%|██▏ | 2192/10000 [8:00:33<27:58:19, 12.90s/it] 22%|██▏ | 2193/10000 [8:00:46<27:59:13, 12.91s/it] {'loss': 0.0085, 'learning_rate': 3.9065e-05, 'epoch': 0.83} 22%|██▏ | 2193/10000 [8:00:46<27:59:13, 12.91s/it] 22%|██▏ | 2194/10000 [8:00:59<28:02:09, 12.93s/it] {'loss': 0.0071, 'learning_rate': 3.906e-05, 'epoch': 0.83} 22%|██▏ | 2194/10000 [8:00:59<28:02:09, 12.93s/it] 22%|██▏ | 2195/10000 [8:01:12<27:59:44, 12.91s/it] {'loss': 0.0087, 'learning_rate': 3.9055000000000005e-05, 'epoch': 0.83} 22%|██▏ | 2195/10000 [8:01:12<27:59:44, 12.91s/it] 22%|██▏ | 2196/10000 [8:01:25<28:02:15, 12.93s/it] {'loss': 0.0097, 'learning_rate': 3.905e-05, 'epoch': 0.83} 22%|██▏ | 2196/10000 [8:01:25<28:02:15, 12.93s/it] 22%|██▏ | 2197/10000 [8:01:38<28:01:25, 12.93s/it] {'loss': 0.0062, 'learning_rate': 3.9045e-05, 'epoch': 0.83} 22%|██▏ | 2197/10000 [8:01:38<28:01:25, 12.93s/it] 22%|██▏ | 2198/10000 [8:01:51<28:00:47, 12.93s/it] {'loss': 0.0081, 'learning_rate': 3.9040000000000006e-05, 'epoch': 0.83} 22%|██▏ | 2198/10000 [8:01:51<28:00:47, 12.93s/it] 22%|██▏ | 2199/10000 
[8:02:04<28:00:21, 12.92s/it] {'loss': 0.0067, 'learning_rate': 3.9035e-05, 'epoch': 0.83} 22%|██▏ | 2199/10000 [8:02:04<28:00:21, 12.92s/it] 22%|██▏ | 2200/10000 [8:02:17<28:00:22, 12.93s/it] {'loss': 0.0079, 'learning_rate': 3.903e-05, 'epoch': 0.83} 22%|██▏ | 2200/10000 [8:02:17<28:00:22, 12.93s/it] 22%|██▏ | 2201/10000 [8:02:29<27:58:27, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.9025e-05, 'epoch': 0.83} 22%|██▏ | 2201/10000 [8:02:29<27:58:27, 12.91s/it] 22%|██▏ | 2202/10000 [8:02:42<27:58:28, 12.91s/it] {'loss': 0.0076, 'learning_rate': 3.902e-05, 'epoch': 0.83} 22%|██▏ | 2202/10000 [8:02:42<27:58:28, 12.91s/it] 22%|██▏ | 2203/10000 [8:02:55<27:59:24, 12.92s/it] {'loss': 0.0085, 'learning_rate': 3.9015e-05, 'epoch': 0.83} 22%|██▏ | 2203/10000 [8:02:55<27:59:24, 12.92s/it] 22%|██▏ | 2204/10000 [8:03:08<27:56:51, 12.91s/it] {'loss': 0.0116, 'learning_rate': 3.901e-05, 'epoch': 0.83} 22%|██▏ | 2204/10000 [8:03:08<27:56:51, 12.91s/it] 22%|██▏ | 2205/10000 [8:03:21<27:59:00, 12.92s/it] {'loss': 0.0079, 'learning_rate': 3.9005000000000003e-05, 'epoch': 0.83} 22%|██▏ | 2205/10000 [8:03:21<27:59:00, 12.92s/it] 22%|██▏ | 2206/10000 [8:03:34<28:00:08, 12.93s/it] {'loss': 0.0061, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.83} 22%|██▏ | 2206/10000 [8:03:34<28:00:08, 12.93s/it] 22%|██▏ | 2207/10000 [8:03:47<27:58:55, 12.93s/it] {'loss': 0.0072, 'learning_rate': 3.8995e-05, 'epoch': 0.83} 22%|██▏ | 2207/10000 [8:03:47<27:58:55, 12.93s/it] 22%|██▏ | 2208/10000 [8:04:00<27:58:03, 12.92s/it] {'loss': 0.0079, 'learning_rate': 3.8990000000000004e-05, 'epoch': 0.83} 22%|██▏ | 2208/10000 [8:04:00<27:58:03, 12.92s/it] 22%|██▏ | 2209/10000 [8:04:13<27:59:19, 12.93s/it] {'loss': 0.0055, 'learning_rate': 3.8985e-05, 'epoch': 0.83} 22%|██▏ | 2209/10000 [8:04:13<27:59:19, 12.93s/it] 22%|██▏ | 2210/10000 [8:04:26<27:59:10, 12.93s/it] {'loss': 0.0083, 'learning_rate': 3.898e-05, 'epoch': 0.83} 22%|██▏ | 2210/10000 [8:04:26<27:59:10, 12.93s/it] 22%|██▏ | 2211/10000 
[8:04:39<27:59:55, 12.94s/it] {'loss': 0.0067, 'learning_rate': 3.8975e-05, 'epoch': 0.83} 22%|██▏ | 2211/10000 [8:04:39<27:59:55, 12.94s/it] 22%|██▏ | 2212/10000 [8:04:52<27:59:54, 12.94s/it] {'loss': 0.0081, 'learning_rate': 3.897e-05, 'epoch': 0.83} 22%|██▏ | 2212/10000 [8:04:52<27:59:54, 12.94s/it] 22%|██▏ | 2213/10000 [8:05:05<28:00:25, 12.95s/it] {'loss': 0.0056, 'learning_rate': 3.8965000000000004e-05, 'epoch': 0.83} 22%|██▏ | 2213/10000 [8:05:05<28:00:25, 12.95s/it] 22%|██▏ | 2214/10000 [8:05:18<27:59:34, 12.94s/it] {'loss': 0.0073, 'learning_rate': 3.896e-05, 'epoch': 0.83} 22%|██▏ | 2214/10000 [8:05:18<27:59:34, 12.94s/it] 22%|██▏ | 2215/10000 [8:05:31<27:58:25, 12.94s/it] {'loss': 0.0064, 'learning_rate': 3.8955e-05, 'epoch': 0.83} 22%|██▏ | 2215/10000 [8:05:31<27:58:25, 12.94s/it] 22%|██▏ | 2216/10000 [8:05:43<27:58:55, 12.94s/it] {'loss': 0.0108, 'learning_rate': 3.8950000000000005e-05, 'epoch': 0.83} 22%|██▏ | 2216/10000 [8:05:44<27:58:55, 12.94s/it] 22%|██▏ | 2217/10000 [8:05:56<27:58:43, 12.94s/it] {'loss': 0.0076, 'learning_rate': 3.8945e-05, 'epoch': 0.84} 22%|██▏ | 2217/10000 [8:05:56<27:58:43, 12.94s/it] 22%|██▏ | 2218/10000 [8:06:09<27:53:44, 12.90s/it] {'loss': 0.0064, 'learning_rate': 3.894e-05, 'epoch': 0.84} 22%|██▏ | 2218/10000 [8:06:09<27:53:44, 12.90s/it] 22%|██▏ | 2219/10000 [8:06:22<27:51:40, 12.89s/it] {'loss': 0.0081, 'learning_rate': 3.8935e-05, 'epoch': 0.84} 22%|██▏ | 2219/10000 [8:06:22<27:51:40, 12.89s/it] 22%|██▏ | 2220/10000 [8:06:35<27:48:37, 12.87s/it] {'loss': 0.0086, 'learning_rate': 3.893e-05, 'epoch': 0.84} 22%|██▏ | 2220/10000 [8:06:35<27:48:37, 12.87s/it] 22%|██▏ | 2221/10000 [8:06:48<27:47:42, 12.86s/it] {'loss': 0.0128, 'learning_rate': 3.8925e-05, 'epoch': 0.84} 22%|██▏ | 2221/10000 [8:06:48<27:47:42, 12.86s/it] 22%|██▏ | 2222/10000 [8:07:01<27:48:24, 12.87s/it] {'loss': 0.0056, 'learning_rate': 3.892e-05, 'epoch': 0.84} 22%|██▏ | 2222/10000 [8:07:01<27:48:24, 12.87s/it] 22%|██▏ | 2223/10000 [8:07:14<27:50:18, 
12.89s/it] {'loss': 0.0075, 'learning_rate': 3.8915e-05, 'epoch': 0.84} 22%|██▏ | 2223/10000 [8:07:14<27:50:18, 12.89s/it] 22%|██▏ | 2224/10000 [8:07:27<27:51:42, 12.90s/it] {'loss': 0.0063, 'learning_rate': 3.8910000000000005e-05, 'epoch': 0.84} 22%|██▏ | 2224/10000 [8:07:27<27:51:42, 12.90s/it] 22%|██▏ | 2225/10000 [8:07:39<27:51:40, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.8905e-05, 'epoch': 0.84} 22%|██▏ | 2225/10000 [8:07:39<27:51:40, 12.90s/it] 22%|██▏ | 2226/10000 [8:07:52<27:52:53, 12.91s/it] {'loss': 0.0077, 'learning_rate': 3.8900000000000004e-05, 'epoch': 0.84} 22%|██▏ | 2226/10000 [8:07:52<27:52:53, 12.91s/it] 22%|██▏ | 2227/10000 [8:08:05<27:50:54, 12.90s/it] {'loss': 0.0091, 'learning_rate': 3.8895000000000006e-05, 'epoch': 0.84} 22%|██▏ | 2227/10000 [8:08:05<27:50:54, 12.90s/it] 22%|██▏ | 2228/10000 [8:08:18<27:49:43, 12.89s/it] {'loss': 0.0084, 'learning_rate': 3.889e-05, 'epoch': 0.84} 22%|██▏ | 2228/10000 [8:08:18<27:49:43, 12.89s/it] 22%|██▏ | 2229/10000 [8:08:31<27:48:04, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.8885e-05, 'epoch': 0.84} 22%|██▏ | 2229/10000 [8:08:31<27:48:04, 12.88s/it] 22%|██▏ | 2230/10000 [8:08:44<27:46:37, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.888e-05, 'epoch': 0.84} 22%|██▏ | 2230/10000 [8:08:44<27:46:37, 12.87s/it] 22%|██▏ | 2231/10000 [8:08:57<27:47:35, 12.88s/it] {'loss': 0.0102, 'learning_rate': 3.8875e-05, 'epoch': 0.84} 22%|██▏ | 2231/10000 [8:08:57<27:47:35, 12.88s/it] 22%|██▏ | 2232/10000 [8:09:10<27:47:39, 12.88s/it] {'loss': 0.0077, 'learning_rate': 3.887e-05, 'epoch': 0.84} 22%|██▏ | 2232/10000 [8:09:10<27:47:39, 12.88s/it] 22%|██▏ | 2233/10000 [8:09:23<27:49:45, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.8865e-05, 'epoch': 0.84} 22%|██▏ | 2233/10000 [8:09:23<27:49:45, 12.90s/it] 22%|██▏ | 2234/10000 [8:09:35<27:47:05, 12.88s/it] {'loss': 0.0077, 'learning_rate': 3.8860000000000004e-05, 'epoch': 0.84} 22%|██▏ | 2234/10000 [8:09:35<27:47:05, 12.88s/it] 22%|██▏ | 2235/10000 
[8:09:48<27:47:43, 12.89s/it] {'loss': 0.0134, 'learning_rate': 3.8855e-05, 'epoch': 0.84} 22%|██▏ | 2235/10000 [8:09:48<27:47:43, 12.89s/it] 22%|██▏ | 2236/10000 [8:10:01<27:47:52, 12.89s/it] {'loss': 0.0084, 'learning_rate': 3.885e-05, 'epoch': 0.84} 22%|██▏ | 2236/10000 [8:10:01<27:47:52, 12.89s/it] 22%|██▏ | 2237/10000 [8:10:14<27:47:35, 12.89s/it] {'loss': 0.0129, 'learning_rate': 3.8845000000000005e-05, 'epoch': 0.84} 22%|██▏ | 2237/10000 [8:10:14<27:47:35, 12.89s/it] 22%|██▏ | 2238/10000 [8:10:27<27:51:40, 12.92s/it] {'loss': 0.0074, 'learning_rate': 3.884e-05, 'epoch': 0.84} 22%|██▏ | 2238/10000 [8:10:27<27:51:40, 12.92s/it] 22%|██▏ | 2239/10000 [8:10:40<27:55:18, 12.95s/it] {'loss': 0.0067, 'learning_rate': 3.8835e-05, 'epoch': 0.84} 22%|██▏ | 2239/10000 [8:10:40<27:55:18, 12.95s/it] 22%|██▏ | 2240/10000 [8:10:53<27:56:29, 12.96s/it] {'loss': 0.0065, 'learning_rate': 3.883e-05, 'epoch': 0.84} 22%|██▏ | 2240/10000 [8:10:53<27:56:29, 12.96s/it] 22%|██▏ | 2241/10000 [8:11:06<27:59:44, 12.99s/it] {'loss': 0.0085, 'learning_rate': 3.8825e-05, 'epoch': 0.84} 22%|██▏ | 2241/10000 [8:11:06<27:59:44, 12.99s/it] 22%|██▏ | 2242/10000 [8:11:19<27:59:50, 12.99s/it] {'loss': 0.0076, 'learning_rate': 3.882e-05, 'epoch': 0.84} 22%|██▏ | 2242/10000 [8:11:19<27:59:50, 12.99s/it] 22%|██▏ | 2243/10000 [8:11:32<27:59:24, 12.99s/it] {'loss': 0.0068, 'learning_rate': 3.8815e-05, 'epoch': 0.85} 22%|██▏ | 2243/10000 [8:11:32<27:59:24, 12.99s/it] 22%|██▏ | 2244/10000 [8:11:45<27:59:17, 12.99s/it] {'loss': 0.005, 'learning_rate': 3.881e-05, 'epoch': 0.85} 22%|██▏ | 2244/10000 [8:11:45<27:59:17, 12.99s/it] 22%|██▏ | 2245/10000 [8:11:58<27:57:24, 12.98s/it] {'loss': 0.014, 'learning_rate': 3.8805000000000005e-05, 'epoch': 0.85} 22%|██▏ | 2245/10000 [8:11:58<27:57:24, 12.98s/it] 22%|██▏ | 2246/10000 [8:12:11<27:59:06, 12.99s/it] {'loss': 0.0077, 'learning_rate': 3.88e-05, 'epoch': 0.85} 22%|██▏ | 2246/10000 [8:12:11<27:59:06, 12.99s/it] 22%|██▏ | 2247/10000 [8:12:24<27:56:13, 
12.97s/it] {'loss': 0.0079, 'learning_rate': 3.8795000000000004e-05, 'epoch': 0.85} 22%|██▏ | 2247/10000 [8:12:24<27:56:13, 12.97s/it] 22%|██▏ | 2248/10000 [8:12:37<27:57:28, 12.98s/it] {'loss': 0.0092, 'learning_rate': 3.8790000000000006e-05, 'epoch': 0.85} 22%|██▏ | 2248/10000 [8:12:37<27:57:28, 12.98s/it] 22%|██▏ | 2249/10000 [8:12:50<27:58:32, 12.99s/it] {'loss': 0.0074, 'learning_rate': 3.8785e-05, 'epoch': 0.85} 22%|██▏ | 2249/10000 [8:12:50<27:58:32, 12.99s/it] 22%|██▎ | 2250/10000 [8:13:03<27:58:56, 13.00s/it] {'loss': 0.01, 'learning_rate': 3.878e-05, 'epoch': 0.85} 22%|██▎ | 2250/10000 [8:13:03<27:58:56, 13.00s/it] 23%|██▎ | 2251/10000 [8:13:16<27:56:19, 12.98s/it] {'loss': 0.0114, 'learning_rate': 3.8775e-05, 'epoch': 0.85} 23%|██▎ | 2251/10000 [8:13:16<27:56:19, 12.98s/it] 23%|██▎ | 2252/10000 [8:13:29<27:57:37, 12.99s/it] {'loss': 0.008, 'learning_rate': 3.877e-05, 'epoch': 0.85} 23%|██▎ | 2252/10000 [8:13:29<27:57:37, 12.99s/it] 23%|██▎ | 2253/10000 [8:13:42<27:55:55, 12.98s/it] {'loss': 0.0074, 'learning_rate': 3.8765e-05, 'epoch': 0.85} 23%|██▎ | 2253/10000 [8:13:42<27:55:55, 12.98s/it] 23%|██▎ | 2254/10000 [8:13:55<27:55:32, 12.98s/it] {'loss': 0.0048, 'learning_rate': 3.876e-05, 'epoch': 0.85} 23%|██▎ | 2254/10000 [8:13:55<27:55:32, 12.98s/it] 23%|██▎ | 2255/10000 [8:14:08<27:51:21, 12.95s/it] {'loss': 0.0062, 'learning_rate': 3.8755000000000004e-05, 'epoch': 0.85} 23%|██▎ | 2255/10000 [8:14:08<27:51:21, 12.95s/it] 23%|██▎ | 2256/10000 [8:14:21<27:50:09, 12.94s/it] {'loss': 0.0041, 'learning_rate': 3.875e-05, 'epoch': 0.85} 23%|██▎ | 2256/10000 [8:14:21<27:50:09, 12.94s/it] 23%|██▎ | 2257/10000 [8:14:34<27:51:04, 12.95s/it] {'loss': 0.0058, 'learning_rate': 3.8745e-05, 'epoch': 0.85} 23%|██▎ | 2257/10000 [8:14:34<27:51:04, 12.95s/it] 23%|██▎ | 2258/10000 [8:14:46<27:46:34, 12.92s/it] {'loss': 0.0143, 'learning_rate': 3.8740000000000005e-05, 'epoch': 0.85} 23%|██▎ | 2258/10000 [8:14:47<27:46:34, 12.92s/it] 23%|██▎ | 2259/10000 [8:14:59<27:44:07, 
12.90s/it] {'loss': 0.0113, 'learning_rate': 3.873500000000001e-05, 'epoch': 0.85} 23%|██▎ | 2259/10000 [8:14:59<27:44:07, 12.90s/it] 23%|██▎ | 2260/10000 [8:15:12<27:42:05, 12.88s/it] {'loss': 0.0082, 'learning_rate': 3.873e-05, 'epoch': 0.85} 23%|██▎ | 2260/10000 [8:15:12<27:42:05, 12.88s/it] 23%|██▎ | 2261/10000 [8:15:25<27:41:51, 12.88s/it] {'loss': 0.0076, 'learning_rate': 3.8725e-05, 'epoch': 0.85} 23%|██▎ | 2261/10000 [8:15:25<27:41:51, 12.88s/it] 23%|██▎ | 2262/10000 [8:15:38<27:45:29, 12.91s/it] {'loss': 0.008, 'learning_rate': 3.872e-05, 'epoch': 0.85} 23%|██▎ | 2262/10000 [8:15:38<27:45:29, 12.91s/it] 23%|██▎ | 2263/10000 [8:15:51<27:49:33, 12.95s/it] {'loss': 0.0096, 'learning_rate': 3.8715000000000005e-05, 'epoch': 0.85} 23%|██▎ | 2263/10000 [8:15:51<27:49:33, 12.95s/it] 23%|██▎ | 2264/10000 [8:16:04<27:49:48, 12.95s/it] {'loss': 0.0155, 'learning_rate': 3.871e-05, 'epoch': 0.85} 23%|██▎ | 2264/10000 [8:16:04<27:49:48, 12.95s/it] 23%|██▎ | 2265/10000 [8:16:17<27:47:57, 12.94s/it] {'loss': 0.0268, 'learning_rate': 3.8705e-05, 'epoch': 0.85} 23%|██▎ | 2265/10000 [8:16:17<27:47:57, 12.94s/it] 23%|██▎ | 2266/10000 [8:16:30<27:48:38, 12.95s/it] {'loss': 0.0074, 'learning_rate': 3.8700000000000006e-05, 'epoch': 0.85} 23%|██▎ | 2266/10000 [8:16:30<27:48:38, 12.95s/it] 23%|██▎ | 2267/10000 [8:16:43<27:49:22, 12.95s/it] {'loss': 0.0051, 'learning_rate': 3.8695e-05, 'epoch': 0.85} 23%|██▎ | 2267/10000 [8:16:43<27:49:22, 12.95s/it] 23%|██▎ | 2268/10000 [8:16:56<27:51:45, 12.97s/it] {'loss': 0.0072, 'learning_rate': 3.8690000000000004e-05, 'epoch': 0.85} 23%|██▎ | 2268/10000 [8:16:56<27:51:45, 12.97s/it] 23%|██▎ | 2269/10000 [8:17:09<27:49:11, 12.95s/it] {'loss': 0.0108, 'learning_rate': 3.8685000000000007e-05, 'epoch': 0.85} 23%|██▎ | 2269/10000 [8:17:09<27:49:11, 12.95s/it] 23%|██▎ | 2270/10000 [8:17:22<27:48:00, 12.95s/it] {'loss': 0.0109, 'learning_rate': 3.868e-05, 'epoch': 0.86} 23%|██▎ | 2270/10000 [8:17:22<27:48:00, 12.95s/it] 23%|██▎ | 2271/10000 
[8:17:35<27:47:17, 12.94s/it] {'loss': 0.0092, 'learning_rate': 3.8675e-05, 'epoch': 0.86} 23%|██▎ | 2271/10000 [8:17:35<27:47:17, 12.94s/it] 23%|██▎ | 2272/10000 [8:17:48<27:50:41, 12.97s/it] {'loss': 0.0109, 'learning_rate': 3.867e-05, 'epoch': 0.86} 23%|██▎ | 2272/10000 [8:17:48<27:50:41, 12.97s/it] 23%|██▎ | 2273/10000 [8:18:01<27:50:11, 12.97s/it] {'loss': 0.0074, 'learning_rate': 3.8665e-05, 'epoch': 0.86} 23%|██▎ | 2273/10000 [8:18:01<27:50:11, 12.97s/it] 23%|██▎ | 2274/10000 [8:18:14<27:48:37, 12.96s/it] {'loss': 0.0072, 'learning_rate': 3.866e-05, 'epoch': 0.86} 23%|██▎ | 2274/10000 [8:18:14<27:48:37, 12.96s/it] 23%|██▎ | 2275/10000 [8:18:27<27:47:45, 12.95s/it] {'loss': 0.0081, 'learning_rate': 3.8655e-05, 'epoch': 0.86} 23%|██▎ | 2275/10000 [8:18:27<27:47:45, 12.95s/it] 23%|██▎ | 2276/10000 [8:18:40<27:46:40, 12.95s/it] {'loss': 0.0092, 'learning_rate': 3.8650000000000004e-05, 'epoch': 0.86} 23%|██▎ | 2276/10000 [8:18:40<27:46:40, 12.95s/it] 23%|██▎ | 2277/10000 [8:18:53<27:49:29, 12.97s/it] {'loss': 0.0067, 'learning_rate': 3.8645e-05, 'epoch': 0.86} 23%|██▎ | 2277/10000 [8:18:53<27:49:29, 12.97s/it] 23%|██▎ | 2278/10000 [8:19:06<27:49:38, 12.97s/it] {'loss': 0.0121, 'learning_rate': 3.864e-05, 'epoch': 0.86} 23%|██▎ | 2278/10000 [8:19:06<27:49:38, 12.97s/it] 23%|██▎ | 2279/10000 [8:19:19<27:50:59, 12.99s/it] {'loss': 0.0062, 'learning_rate': 3.8635000000000005e-05, 'epoch': 0.86} 23%|██▎ | 2279/10000 [8:19:19<27:50:59, 12.99s/it] 23%|██▎ | 2280/10000 [8:19:31<27:50:21, 12.98s/it] {'loss': 0.0105, 'learning_rate': 3.863e-05, 'epoch': 0.86} 23%|██▎ | 2280/10000 [8:19:32<27:50:21, 12.98s/it] 23%|██▎ | 2281/10000 [8:19:44<27:50:43, 12.99s/it] {'loss': 0.0095, 'learning_rate': 3.8625e-05, 'epoch': 0.86} 23%|██▎ | 2281/10000 [8:19:45<27:50:43, 12.99s/it] 23%|██▎ | 2282/10000 [8:19:57<27:48:40, 12.97s/it] {'loss': 0.0067, 'learning_rate': 3.862e-05, 'epoch': 0.86} 23%|██▎ | 2282/10000 [8:19:57<27:48:40, 12.97s/it] 23%|██▎ | 2283/10000 [8:20:10<27:47:02, 
12.96s/it] {'loss': 0.0086, 'learning_rate': 3.8615e-05, 'epoch': 0.86} 23%|██▎ | 2283/10000 [8:20:10<27:47:02, 12.96s/it] 23%|██▎ | 2284/10000 [8:20:23<27:45:23, 12.95s/it] {'loss': 0.0075, 'learning_rate': 3.8610000000000005e-05, 'epoch': 0.86} 23%|██▎ | 2284/10000 [8:20:23<27:45:23, 12.95s/it] 23%|██▎ | 2285/10000 [8:20:36<27:42:37, 12.93s/it] {'loss': 0.0097, 'learning_rate': 3.8605e-05, 'epoch': 0.86} 23%|██▎ | 2285/10000 [8:20:36<27:42:37, 12.93s/it] 23%|██▎ | 2286/10000 [8:20:49<27:40:56, 12.92s/it] {'loss': 0.0081, 'learning_rate': 3.86e-05, 'epoch': 0.86} 23%|██▎ | 2286/10000 [8:20:49<27:40:56, 12.92s/it] 23%|██▎ | 2287/10000 [8:21:02<27:39:12, 12.91s/it] {'loss': 0.0061, 'learning_rate': 3.8595000000000006e-05, 'epoch': 0.86} 23%|██▎ | 2287/10000 [8:21:02<27:39:12, 12.91s/it] 23%|██▎ | 2288/10000 [8:21:15<27:34:14, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.859e-05, 'epoch': 0.86} 23%|██▎ | 2288/10000 [8:21:15<27:34:14, 12.87s/it] 23%|██▎ | 2289/10000 [8:21:28<27:34:52, 12.88s/it] {'loss': 0.0096, 'learning_rate': 3.8585000000000004e-05, 'epoch': 0.86} 23%|██▎ | 2289/10000 [8:21:28<27:34:52, 12.88s/it] 23%|██▎ | 2290/10000 [8:21:41<27:39:39, 12.92s/it] {'loss': 0.0083, 'learning_rate': 3.858e-05, 'epoch': 0.86} 23%|██▎ | 2290/10000 [8:21:41<27:39:39, 12.92s/it] 23%|██▎ | 2291/10000 [8:21:53<27:36:45, 12.89s/it] {'loss': 0.0076, 'learning_rate': 3.8575e-05, 'epoch': 0.86} 23%|██▎ | 2291/10000 [8:21:53<27:36:45, 12.89s/it] 23%|██▎ | 2292/10000 [8:22:06<27:37:09, 12.90s/it] {'loss': 0.0121, 'learning_rate': 3.857e-05, 'epoch': 0.86} 23%|██▎ | 2292/10000 [8:22:06<27:37:09, 12.90s/it] 23%|██▎ | 2293/10000 [8:22:19<27:38:24, 12.91s/it] {'loss': 0.0088, 'learning_rate': 3.8565e-05, 'epoch': 0.86} 23%|██▎ | 2293/10000 [8:22:19<27:38:24, 12.91s/it] 23%|██▎ | 2294/10000 [8:22:32<27:34:59, 12.89s/it] {'loss': 0.0105, 'learning_rate': 3.8560000000000004e-05, 'epoch': 0.86} 23%|██▎ | 2294/10000 [8:22:32<27:34:59, 12.89s/it] 23%|██▎ | 2295/10000 [8:22:45<27:33:29, 
12.88s/it] {'loss': 0.0052, 'learning_rate': 3.8555e-05, 'epoch': 0.86} 23%|██▎ | 2295/10000 [8:22:45<27:33:29, 12.88s/it] 23%|██▎ | 2296/10000 [8:22:58<27:35:19, 12.89s/it] {'loss': 0.0067, 'learning_rate': 3.855e-05, 'epoch': 0.87} 23%|██▎ | 2296/10000 [8:22:58<27:35:19, 12.89s/it] 23%|██▎ | 2297/10000 [8:23:11<27:37:17, 12.91s/it] {'loss': 0.006, 'learning_rate': 3.8545000000000004e-05, 'epoch': 0.87} 23%|██▎ | 2297/10000 [8:23:11<27:37:17, 12.91s/it] 23%|██▎ | 2298/10000 [8:23:24<27:34:37, 12.89s/it] {'loss': 0.0241, 'learning_rate': 3.854000000000001e-05, 'epoch': 0.87} 23%|██▎ | 2298/10000 [8:23:24<27:34:37, 12.89s/it] 23%|██▎ | 2299/10000 [8:23:37<27:37:55, 12.92s/it] {'loss': 0.0083, 'learning_rate': 3.8535e-05, 'epoch': 0.87} 23%|██▎ | 2299/10000 [8:23:37<27:37:55, 12.92s/it] 23%|██▎ | 2300/10000 [8:23:50<27:37:46, 12.92s/it] {'loss': 0.009, 'learning_rate': 3.853e-05, 'epoch': 0.87} 23%|██▎ | 2300/10000 [8:23:50<27:37:46, 12.92s/it] 23%|██▎ | 2301/10000 [8:24:03<27:36:15, 12.91s/it] {'loss': 0.0063, 'learning_rate': 3.8525e-05, 'epoch': 0.87} 23%|██▎ | 2301/10000 [8:24:03<27:36:15, 12.91s/it] 23%|██▎ | 2302/10000 [8:24:15<27:37:45, 12.92s/it] {'loss': 0.0116, 'learning_rate': 3.8520000000000004e-05, 'epoch': 0.87} 23%|██▎ | 2302/10000 [8:24:15<27:37:45, 12.92s/it] 23%|██▎ | 2303/10000 [8:24:28<27:37:34, 12.92s/it] {'loss': 0.0092, 'learning_rate': 3.8515e-05, 'epoch': 0.87} 23%|██▎ | 2303/10000 [8:24:28<27:37:34, 12.92s/it] 23%|██▎ | 2304/10000 [8:24:41<27:38:47, 12.93s/it] {'loss': 0.007, 'learning_rate': 3.851e-05, 'epoch': 0.87} 23%|██▎ | 2304/10000 [8:24:41<27:38:47, 12.93s/it] 23%|██▎ | 2305/10000 [8:24:54<27:36:50, 12.92s/it] {'loss': 0.0097, 'learning_rate': 3.8505000000000005e-05, 'epoch': 0.87} 23%|██▎ | 2305/10000 [8:24:54<27:36:50, 12.92s/it] 23%|██▎ | 2306/10000 [8:25:07<27:33:27, 12.89s/it] {'loss': 0.0063, 'learning_rate': 3.85e-05, 'epoch': 0.87} 23%|██▎ | 2306/10000 [8:25:07<27:33:27, 12.89s/it] 23%|██▎ | 2307/10000 [8:25:20<27:32:23, 
12.89s/it] {'loss': 0.0072, 'learning_rate': 3.8495e-05, 'epoch': 0.87} 23%|██▎ | 2307/10000 [8:25:20<27:32:23, 12.89s/it] 23%|██▎ | 2308/10000 [8:25:33<27:34:59, 12.91s/it] {'loss': 0.0087, 'learning_rate': 3.8490000000000006e-05, 'epoch': 0.87} 23%|██▎ | 2308/10000 [8:25:33<27:34:59, 12.91s/it] 23%|██▎ | 2309/10000 [8:25:46<27:32:32, 12.89s/it] {'loss': 0.0089, 'learning_rate': 3.8485e-05, 'epoch': 0.87} 23%|██▎ | 2309/10000 [8:25:46<27:32:32, 12.89s/it] 23%|██▎ | 2310/10000 [8:25:59<27:31:12, 12.88s/it] {'loss': 0.0077, 'learning_rate': 3.848e-05, 'epoch': 0.87} 23%|██▎ | 2310/10000 [8:25:59<27:31:12, 12.88s/it] 23%|██▎ | 2311/10000 [8:26:12<27:31:31, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.8475e-05, 'epoch': 0.87} 23%|██▎ | 2311/10000 [8:26:12<27:31:31, 12.89s/it] 23%|██▎ | 2312/10000 [8:26:24<27:34:51, 12.92s/it] {'loss': 0.0069, 'learning_rate': 3.847e-05, 'epoch': 0.87} 23%|██▎ | 2312/10000 [8:26:25<27:34:51, 12.92s/it] 23%|██▎ | 2313/10000 [8:26:37<27:35:28, 12.92s/it] {'loss': 0.0066, 'learning_rate': 3.8465e-05, 'epoch': 0.87} 23%|██▎ | 2313/10000 [8:26:37<27:35:28, 12.92s/it] 23%|██▎ | 2314/10000 [8:26:50<27:36:18, 12.93s/it] {'loss': 0.0062, 'learning_rate': 3.846e-05, 'epoch': 0.87} 23%|██▎ | 2314/10000 [8:26:50<27:36:18, 12.93s/it] 23%|██▎ | 2315/10000 [8:27:03<27:36:15, 12.93s/it] {'loss': 0.0089, 'learning_rate': 3.8455000000000004e-05, 'epoch': 0.87} 23%|██▎ | 2315/10000 [8:27:03<27:36:15, 12.93s/it] 23%|██▎ | 2316/10000 [8:27:16<27:38:32, 12.95s/it] {'loss': 0.0065, 'learning_rate': 3.845e-05, 'epoch': 0.87} 23%|██▎ | 2316/10000 [8:27:16<27:38:32, 12.95s/it] 23%|██▎ | 2317/10000 [8:27:29<27:34:59, 12.92s/it] {'loss': 0.0076, 'learning_rate': 3.8445e-05, 'epoch': 0.87} 23%|██▎ | 2317/10000 [8:27:29<27:34:59, 12.92s/it] 23%|██▎ | 2318/10000 [8:27:42<27:34:44, 12.92s/it] {'loss': 0.009, 'learning_rate': 3.8440000000000005e-05, 'epoch': 0.87} 23%|██▎ | 2318/10000 [8:27:42<27:34:44, 12.92s/it] 23%|██▎ | 2319/10000 [8:27:55<27:39:08, 12.96s/it] 
{'loss': 0.0072, 'learning_rate': 3.843500000000001e-05, 'epoch': 0.87} 23%|██▎ | 2319/10000 [8:27:55<27:39:08, 12.96s/it] 23%|██▎ | 2320/10000 [8:28:08<27:41:22, 12.98s/it] {'loss': 0.0158, 'learning_rate': 3.8429999999999996e-05, 'epoch': 0.87} 23%|██▎ | 2320/10000 [8:28:08<27:41:22, 12.98s/it] 23%|██▎ | 2321/10000 [8:28:21<27:34:35, 12.93s/it] {'loss': 0.0081, 'learning_rate': 3.8425e-05, 'epoch': 0.87} 23%|██▎ | 2321/10000 [8:28:21<27:34:35, 12.93s/it] 23%|██▎ | 2322/10000 [8:28:34<27:34:39, 12.93s/it] {'loss': 0.0075, 'learning_rate': 3.842e-05, 'epoch': 0.87} 23%|██▎ | 2322/10000 [8:28:34<27:34:39, 12.93s/it] 23%|██▎ | 2323/10000 [8:28:47<27:34:15, 12.93s/it] {'loss': 0.0076, 'learning_rate': 3.8415000000000004e-05, 'epoch': 0.88} 23%|██▎ | 2323/10000 [8:28:47<27:34:15, 12.93s/it] 23%|██▎ | 2324/10000 [8:29:00<27:38:26, 12.96s/it] {'loss': 0.0072, 'learning_rate': 3.841e-05, 'epoch': 0.88} 23%|██▎ | 2324/10000 [8:29:00<27:38:26, 12.96s/it] 23%|██▎ | 2325/10000 [8:29:13<27:37:36, 12.96s/it] {'loss': 0.0067, 'learning_rate': 3.8405e-05, 'epoch': 0.88} 23%|██▎ | 2325/10000 [8:29:13<27:37:36, 12.96s/it] 23%|██▎ | 2326/10000 [8:29:26<27:36:21, 12.95s/it] {'loss': 0.0074, 'learning_rate': 3.8400000000000005e-05, 'epoch': 0.88} 23%|██▎ | 2326/10000 [8:29:26<27:36:21, 12.95s/it] 23%|██▎ | 2327/10000 [8:29:39<27:37:07, 12.96s/it] {'loss': 0.0075, 'learning_rate': 3.8395e-05, 'epoch': 0.88} 23%|██▎ | 2327/10000 [8:29:39<27:37:07, 12.96s/it] 23%|██▎ | 2328/10000 [8:29:52<27:37:23, 12.96s/it] {'loss': 0.0069, 'learning_rate': 3.8390000000000003e-05, 'epoch': 0.88} 23%|██▎ | 2328/10000 [8:29:52<27:37:23, 12.96s/it] 23%|██▎ | 2329/10000 [8:30:05<27:39:00, 12.98s/it] {'loss': 0.0086, 'learning_rate': 3.8385000000000006e-05, 'epoch': 0.88} 23%|██▎ | 2329/10000 [8:30:05<27:39:00, 12.98s/it] 23%|██▎ | 2330/10000 [8:30:18<27:40:32, 12.99s/it] {'loss': 0.0067, 'learning_rate': 3.838e-05, 'epoch': 0.88} 23%|██▎ | 2330/10000 [8:30:18<27:40:32, 12.99s/it] 23%|██▎ | 2331/10000 
[8:30:31<27:39:03, 12.98s/it] {'loss': 0.0063, 'learning_rate': 3.8375e-05, 'epoch': 0.88} 23%|██▎ | 2331/10000 [8:30:31<27:39:03, 12.98s/it] 23%|██▎ | 2332/10000 [8:30:44<27:35:48, 12.96s/it] {'loss': 0.0051, 'learning_rate': 3.837e-05, 'epoch': 0.88} 23%|██▎ | 2332/10000 [8:30:44<27:35:48, 12.96s/it] 23%|██▎ | 2333/10000 [8:30:57<27:34:12, 12.95s/it] {'loss': 0.0098, 'learning_rate': 3.8365e-05, 'epoch': 0.88} 23%|██▎ | 2333/10000 [8:30:57<27:34:12, 12.95s/it] 23%|██▎ | 2334/10000 [8:31:09<27:34:52, 12.95s/it] {'loss': 0.0056, 'learning_rate': 3.836e-05, 'epoch': 0.88} 23%|██▎ | 2334/10000 [8:31:10<27:34:52, 12.95s/it] 23%|██▎ | 2335/10000 [8:31:22<27:32:29, 12.94s/it] {'loss': 0.0072, 'learning_rate': 3.8355e-05, 'epoch': 0.88} 23%|██▎ | 2335/10000 [8:31:22<27:32:29, 12.94s/it] 23%|██▎ | 2336/10000 [8:31:35<27:30:17, 12.92s/it] {'loss': 0.0097, 'learning_rate': 3.8350000000000004e-05, 'epoch': 0.88} 23%|██▎ | 2336/10000 [8:31:35<27:30:17, 12.92s/it] 23%|██▎ | 2337/10000 [8:31:48<27:26:34, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.8345000000000006e-05, 'epoch': 0.88} 23%|██▎ | 2337/10000 [8:31:48<27:26:34, 12.89s/it] 23%|██▎ | 2338/10000 [8:32:01<27:26:11, 12.89s/it] {'loss': 0.0086, 'learning_rate': 3.834e-05, 'epoch': 0.88} 23%|██▎ | 2338/10000 [8:32:01<27:26:11, 12.89s/it] 23%|██▎ | 2339/10000 [8:32:14<27:25:32, 12.89s/it] {'loss': 0.0067, 'learning_rate': 3.8335000000000005e-05, 'epoch': 0.88} 23%|██▎ | 2339/10000 [8:32:14<27:25:32, 12.89s/it] 23%|██▎ | 2340/10000 [8:32:27<27:23:11, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.833e-05, 'epoch': 0.88} 23%|██▎ | 2340/10000 [8:32:27<27:23:11, 12.87s/it] 23%|██▎ | 2341/10000 [8:32:40<27:24:45, 12.88s/it] {'loss': 0.0083, 'learning_rate': 3.8324999999999996e-05, 'epoch': 0.88} 23%|██▎ | 2341/10000 [8:32:40<27:24:45, 12.88s/it] 23%|██▎ | 2342/10000 [8:32:53<27:25:57, 12.90s/it] {'loss': 0.007, 'learning_rate': 3.832e-05, 'epoch': 0.88} 23%|██▎ | 2342/10000 [8:32:53<27:25:57, 12.90s/it] 23%|██▎ | 2343/10000 
[8:33:05<27:25:20, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.8315e-05, 'epoch': 0.88} 23%|██▎ | 2343/10000 [8:33:05<27:25:20, 12.89s/it] 23%|██▎ | 2344/10000 [8:33:18<27:24:55, 12.89s/it] {'loss': 0.0094, 'learning_rate': 3.8310000000000004e-05, 'epoch': 0.88} 23%|██▎ | 2344/10000 [8:33:18<27:24:55, 12.89s/it] 23%|██▎ | 2345/10000 [8:33:31<27:26:25, 12.90s/it] {'loss': 0.0071, 'learning_rate': 3.8305e-05, 'epoch': 0.88} 23%|██▎ | 2345/10000 [8:33:31<27:26:25, 12.90s/it] 23%|██▎ | 2346/10000 [8:33:44<27:22:05, 12.87s/it] {'loss': 0.0119, 'learning_rate': 3.83e-05, 'epoch': 0.88} 23%|██▎ | 2346/10000 [8:33:44<27:22:05, 12.87s/it] 23%|██▎ | 2347/10000 [8:33:57<27:21:55, 12.87s/it] {'loss': 0.0065, 'learning_rate': 3.8295000000000005e-05, 'epoch': 0.88} 23%|██▎ | 2347/10000 [8:33:57<27:21:55, 12.87s/it] 23%|██▎ | 2348/10000 [8:34:10<27:22:52, 12.88s/it] {'loss': 0.007, 'learning_rate': 3.829e-05, 'epoch': 0.88} 23%|██▎ | 2348/10000 [8:34:10<27:22:52, 12.88s/it] 23%|██▎ | 2349/10000 [8:34:23<27:23:40, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.8285000000000004e-05, 'epoch': 0.89} 23%|██▎ | 2349/10000 [8:34:23<27:23:40, 12.89s/it] 24%|██▎ | 2350/10000 [8:34:36<27:24:00, 12.89s/it] {'loss': 0.0061, 'learning_rate': 3.828e-05, 'epoch': 0.89} 24%|██▎ | 2350/10000 [8:34:36<27:24:00, 12.89s/it] 24%|██▎ | 2351/10000 [8:34:48<27:22:29, 12.88s/it] {'loss': 0.0045, 'learning_rate': 3.8275e-05, 'epoch': 0.89} 24%|██▎ | 2351/10000 [8:34:49<27:22:29, 12.88s/it] 24%|██▎ | 2352/10000 [8:35:01<27:23:06, 12.89s/it] {'loss': 0.0068, 'learning_rate': 3.827e-05, 'epoch': 0.89} 24%|██▎ | 2352/10000 [8:35:01<27:23:06, 12.89s/it] 24%|██▎ | 2353/10000 [8:35:14<27:20:26, 12.87s/it] {'loss': 0.0095, 'learning_rate': 3.8265e-05, 'epoch': 0.89} 24%|██▎ | 2353/10000 [8:35:14<27:20:26, 12.87s/it] 24%|██▎ | 2354/10000 [8:35:27<27:19:58, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.826e-05, 'epoch': 0.89} 24%|██▎ | 2354/10000 [8:35:27<27:19:58, 12.87s/it] 24%|██▎ | 2355/10000 
[8:35:40<27:19:54, 12.87s/it] {'loss': 0.0095, 'learning_rate': 3.8255e-05, 'epoch': 0.89} 24%|██▎ | 2355/10000 [8:35:40<27:19:54, 12.87s/it] 24%|██▎ | 2356/10000 [8:35:53<27:21:23, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.825e-05, 'epoch': 0.89} 24%|██▎ | 2356/10000 [8:35:53<27:21:23, 12.88s/it] 24%|██▎ | 2357/10000 [8:36:06<27:22:55, 12.90s/it] {'loss': 0.0086, 'learning_rate': 3.8245000000000004e-05, 'epoch': 0.89} 24%|██▎ | 2357/10000 [8:36:06<27:22:55, 12.90s/it] 24%|██▎ | 2358/10000 [8:36:19<27:22:15, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.8240000000000007e-05, 'epoch': 0.89} 24%|██▎ | 2358/10000 [8:36:19<27:22:15, 12.89s/it] 24%|██▎ | 2359/10000 [8:36:32<27:20:51, 12.88s/it] {'loss': 0.0081, 'learning_rate': 3.8235e-05, 'epoch': 0.89} 24%|██▎ | 2359/10000 [8:36:32<27:20:51, 12.88s/it] 24%|██▎ | 2360/10000 [8:36:44<27:20:18, 12.88s/it] {'loss': 0.0075, 'learning_rate': 3.823e-05, 'epoch': 0.89} 24%|██▎ | 2360/10000 [8:36:44<27:20:18, 12.88s/it] 24%|██▎ | 2361/10000 [8:36:57<27:19:56, 12.88s/it] {'loss': 0.0091, 'learning_rate': 3.8225e-05, 'epoch': 0.89} 24%|██▎ | 2361/10000 [8:36:57<27:19:56, 12.88s/it] 24%|██▎ | 2362/10000 [8:37:10<27:18:28, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.822e-05, 'epoch': 0.89} 24%|██▎ | 2362/10000 [8:37:10<27:18:28, 12.87s/it] 24%|██▎ | 2363/10000 [8:37:23<27:18:08, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.8215e-05, 'epoch': 0.89} 24%|██▎ | 2363/10000 [8:37:23<27:18:08, 12.87s/it] 24%|██▎ | 2364/10000 [8:37:36<27:18:37, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.821e-05, 'epoch': 0.89} 24%|██▎ | 2364/10000 [8:37:36<27:18:37, 12.88s/it] 24%|██▎ | 2365/10000 [8:37:49<27:20:00, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.8205000000000004e-05, 'epoch': 0.89} 24%|██▎ | 2365/10000 [8:37:49<27:20:00, 12.89s/it] 24%|██▎ | 2366/10000 [8:38:02<27:19:16, 12.88s/it] {'loss': 0.008, 'learning_rate': 3.82e-05, 'epoch': 0.89} 24%|██▎ | 2366/10000 [8:38:02<27:19:16, 12.88s/it] 24%|██▎ | 2367/10000 
[8:38:15<27:20:16, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.8195e-05, 'epoch': 0.89} 24%|██▎ | 2367/10000 [8:38:15<27:20:16, 12.89s/it] 24%|██▎ | 2368/10000 [8:38:28<27:23:03, 12.92s/it] {'loss': 0.0067, 'learning_rate': 3.8190000000000005e-05, 'epoch': 0.89} 24%|██▎ | 2368/10000 [8:38:28<27:23:03, 12.92s/it] 24%|██▎ | 2369/10000 [8:38:40<27:21:16, 12.90s/it] {'loss': 0.0045, 'learning_rate': 3.8185e-05, 'epoch': 0.89} 24%|██▎ | 2369/10000 [8:38:40<27:21:16, 12.90s/it] 24%|██▎ | 2370/10000 [8:38:53<27:21:10, 12.91s/it] {'loss': 0.0066, 'learning_rate': 3.818e-05, 'epoch': 0.89} 24%|██▎ | 2370/10000 [8:38:53<27:21:10, 12.91s/it] 24%|██▎ | 2371/10000 [8:39:06<27:18:23, 12.89s/it] {'loss': 0.0056, 'learning_rate': 3.8175e-05, 'epoch': 0.89} 24%|██▎ | 2371/10000 [8:39:06<27:18:23, 12.89s/it] 24%|██▎ | 2372/10000 [8:39:19<27:18:57, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.817e-05, 'epoch': 0.89} 24%|██▎ | 2372/10000 [8:39:19<27:18:57, 12.89s/it] 24%|██▎ | 2373/10000 [8:39:32<27:17:26, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.8165e-05, 'epoch': 0.89} 24%|██▎ | 2373/10000 [8:39:32<27:17:26, 12.88s/it] 24%|██▎ | 2374/10000 [8:39:45<27:18:19, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.816e-05, 'epoch': 0.89} 24%|██▎ | 2374/10000 [8:39:45<27:18:19, 12.89s/it] 24%|██▍ | 2375/10000 [8:39:58<27:17:34, 12.89s/it] {'loss': 0.0047, 'learning_rate': 3.8155e-05, 'epoch': 0.89} 24%|██▍ | 2375/10000 [8:39:58<27:17:34, 12.89s/it] 24%|██▍ | 2376/10000 [8:40:11<27:17:17, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.8150000000000006e-05, 'epoch': 0.9} 24%|██▍ | 2376/10000 [8:40:11<27:17:17, 12.89s/it] 24%|██▍ | 2377/10000 [8:40:24<27:17:13, 12.89s/it] {'loss': 0.0083, 'learning_rate': 3.8145e-05, 'epoch': 0.9} 24%|██▍ | 2377/10000 [8:40:24<27:17:13, 12.89s/it] 24%|██▍ | 2378/10000 [8:40:36<27:16:52, 12.89s/it] {'loss': 0.0056, 'learning_rate': 3.8140000000000004e-05, 'epoch': 0.9} 24%|██▍ | 2378/10000 [8:40:36<27:16:52, 12.89s/it] 24%|██▍ | 2379/10000 
[8:40:49<27:14:00, 12.86s/it] {'loss': 0.0046, 'learning_rate': 3.813500000000001e-05, 'epoch': 0.9} 24%|██▍ | 2379/10000 [8:40:49<27:14:00, 12.86s/it] 24%|██▍ | 2380/10000 [8:41:02<27:14:56, 12.87s/it] {'loss': 0.0162, 'learning_rate': 3.8129999999999996e-05, 'epoch': 0.9} 24%|██▍ | 2380/10000 [8:41:02<27:14:56, 12.87s/it] 24%|██▍ | 2381/10000 [8:41:15<27:14:59, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.8125e-05, 'epoch': 0.9} 24%|██▍ | 2381/10000 [8:41:15<27:14:59, 12.88s/it] 24%|██▍ | 2382/10000 [8:41:28<27:13:18, 12.86s/it] {'loss': 0.0062, 'learning_rate': 3.812e-05, 'epoch': 0.9} 24%|██▍ | 2382/10000 [8:41:28<27:13:18, 12.86s/it] 24%|██▍ | 2383/10000 [8:41:41<27:13:23, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.8115000000000004e-05, 'epoch': 0.9} 24%|██▍ | 2383/10000 [8:41:41<27:13:23, 12.87s/it] 24%|██▍ | 2384/10000 [8:41:54<27:12:10, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.811e-05, 'epoch': 0.9} 24%|██▍ | 2384/10000 [8:41:54<27:12:10, 12.86s/it] 24%|██▍ | 2385/10000 [8:42:06<27:12:08, 12.86s/it] {'loss': 0.007, 'learning_rate': 3.8105e-05, 'epoch': 0.9} 24%|██▍ | 2385/10000 [8:42:06<27:12:08, 12.86s/it] 24%|██▍ | 2386/10000 [8:42:19<27:11:57, 12.86s/it] {'loss': 0.0051, 'learning_rate': 3.8100000000000005e-05, 'epoch': 0.9} 24%|██▍ | 2386/10000 [8:42:19<27:11:57, 12.86s/it] 24%|██▍ | 2387/10000 [8:42:32<27:14:36, 12.88s/it] {'loss': 0.005, 'learning_rate': 3.8095e-05, 'epoch': 0.9} 24%|██▍ | 2387/10000 [8:42:32<27:14:36, 12.88s/it] 24%|██▍ | 2388/10000 [8:42:45<27:16:09, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.809e-05, 'epoch': 0.9} 24%|██▍ | 2388/10000 [8:42:45<27:16:09, 12.90s/it] 24%|██▍ | 2389/10000 [8:42:58<27:18:50, 12.92s/it] {'loss': 0.0046, 'learning_rate': 3.8085000000000006e-05, 'epoch': 0.9} 24%|██▍ | 2389/10000 [8:42:58<27:18:50, 12.92s/it] 24%|██▍ | 2390/10000 [8:43:11<27:17:25, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.808e-05, 'epoch': 0.9} 24%|██▍ | 2390/10000 [8:43:11<27:17:25, 12.91s/it] 24%|██▍ | 2391/10000 
[8:43:24<27:15:06, 12.89s/it] {'loss': 0.0063, 'learning_rate': 3.8075e-05, 'epoch': 0.9} 24%|██▍ | 2391/10000 [8:43:24<27:15:06, 12.89s/it] 24%|██▍ | 2392/10000 [8:43:37<27:19:21, 12.93s/it] {'loss': 0.0049, 'learning_rate': 3.807e-05, 'epoch': 0.9} 24%|██▍ | 2392/10000 [8:43:37<27:19:21, 12.93s/it] 24%|██▍ | 2393/10000 [8:43:50<27:17:16, 12.91s/it] {'loss': 0.0063, 'learning_rate': 3.8065e-05, 'epoch': 0.9} 24%|██▍ | 2393/10000 [8:43:50<27:17:16, 12.91s/it] 24%|██▍ | 2394/10000 [8:44:03<27:16:26, 12.91s/it] {'loss': 0.0048, 'learning_rate': 3.806e-05, 'epoch': 0.9} 24%|██▍ | 2394/10000 [8:44:03<27:16:26, 12.91s/it] 24%|██▍ | 2395/10000 [8:44:15<27:11:02, 12.87s/it] {'loss': 0.0081, 'learning_rate': 3.8055e-05, 'epoch': 0.9} 24%|██▍ | 2395/10000 [8:44:15<27:11:02, 12.87s/it] 24%|██▍ | 2396/10000 [8:44:28<27:10:46, 12.87s/it] {'loss': 0.0102, 'learning_rate': 3.805e-05, 'epoch': 0.9} 24%|██▍ | 2396/10000 [8:44:28<27:10:46, 12.87s/it] 24%|██▍ | 2397/10000 [8:44:41<27:16:13, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.8045000000000006e-05, 'epoch': 0.9} 24%|██▍ | 2397/10000 [8:44:41<27:16:13, 12.91s/it] 24%|██▍ | 2398/10000 [8:44:54<27:12:39, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.804e-05, 'epoch': 0.9} 24%|██▍ | 2398/10000 [8:44:54<27:12:39, 12.89s/it] 24%|██▍ | 2399/10000 [8:45:07<27:14:51, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.8035000000000004e-05, 'epoch': 0.9} 24%|██▍ | 2399/10000 [8:45:07<27:14:51, 12.91s/it] 24%|██▍ | 2400/10000 [8:45:20<27:14:25, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.803000000000001e-05, 'epoch': 0.9} 24%|██▍ | 2400/10000 [8:45:20<27:14:25, 12.90s/it] 24%|██▍ | 2401/10000 [8:45:33<27:15:03, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.8025e-05, 'epoch': 0.9} 24%|██▍ | 2401/10000 [8:45:33<27:15:03, 12.91s/it] 24%|██▍ | 2402/10000 [8:45:46<27:11:25, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.802e-05, 'epoch': 0.91} 24%|██▍ | 2402/10000 [8:45:46<27:11:25, 12.88s/it] 24%|██▍ | 2403/10000 [8:45:59<27:16:24, 
12.92s/it] {'loss': 0.0048, 'learning_rate': 3.8015e-05, 'epoch': 0.91} 24%|██▍ | 2403/10000 [8:45:59<27:16:24, 12.92s/it] 24%|██▍ | 2404/10000 [8:46:12<27:13:00, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.8010000000000004e-05, 'epoch': 0.91} 24%|██▍ | 2404/10000 [8:46:12<27:13:00, 12.90s/it] 24%|██▍ | 2405/10000 [8:46:25<27:15:20, 12.92s/it] {'loss': 0.0091, 'learning_rate': 3.8005e-05, 'epoch': 0.91} 24%|██▍ | 2405/10000 [8:46:25<27:15:20, 12.92s/it] 24%|██▍ | 2406/10000 [8:46:37<27:11:18, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.8e-05, 'epoch': 0.91} 24%|██▍ | 2406/10000 [8:46:37<27:11:18, 12.89s/it] 24%|██▍ | 2407/10000 [8:46:50<27:09:24, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.7995000000000005e-05, 'epoch': 0.91} 24%|██▍ | 2407/10000 [8:46:50<27:09:24, 12.88s/it] 24%|██▍ | 2408/10000 [8:47:03<27:12:48, 12.90s/it] {'loss': 0.0064, 'learning_rate': 3.799e-05, 'epoch': 0.91} 24%|██▍ | 2408/10000 [8:47:03<27:12:48, 12.90s/it] 24%|██▍ | 2409/10000 [8:47:16<27:09:46, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.7985e-05, 'epoch': 0.91} 24%|██▍ | 2409/10000 [8:47:16<27:09:46, 12.88s/it] 24%|██▍ | 2410/10000 [8:47:29<27:09:21, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.7980000000000006e-05, 'epoch': 0.91} 24%|██▍ | 2410/10000 [8:47:29<27:09:21, 12.88s/it] 24%|██▍ | 2411/10000 [8:47:42<27:11:09, 12.90s/it] {'loss': 0.0064, 'learning_rate': 3.7975e-05, 'epoch': 0.91} 24%|██▍ | 2411/10000 [8:47:42<27:11:09, 12.90s/it] 24%|██▍ | 2412/10000 [8:47:55<27:08:27, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.797e-05, 'epoch': 0.91} 24%|██▍ | 2412/10000 [8:47:55<27:08:27, 12.88s/it] 24%|██▍ | 2413/10000 [8:48:08<27:09:01, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.7965e-05, 'epoch': 0.91} 24%|██▍ | 2413/10000 [8:48:08<27:09:01, 12.88s/it] 24%|██▍ | 2414/10000 [8:48:20<27:09:52, 12.89s/it] {'loss': 0.0042, 'learning_rate': 3.796e-05, 'epoch': 0.91} 24%|██▍ | 2414/10000 [8:48:20<27:09:52, 12.89s/it] 24%|██▍ | 2415/10000 [8:48:33<27:09:53, 12.89s/it] 
{'loss': 0.0063, 'learning_rate': 3.7955e-05, 'epoch': 0.91} 24%|██▍ | 2415/10000 [8:48:33<27:09:53, 12.89s/it] 24%|██▍ | 2416/10000 [8:48:46<27:06:51, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.795e-05, 'epoch': 0.91} 24%|██▍ | 2416/10000 [8:48:46<27:06:51, 12.87s/it] 24%|██▍ | 2417/10000 [8:48:59<27:07:11, 12.88s/it] {'loss': 0.0067, 'learning_rate': 3.7945000000000003e-05, 'epoch': 0.91} 24%|██▍ | 2417/10000 [8:48:59<27:07:11, 12.88s/it] 24%|██▍ | 2418/10000 [8:49:12<27:05:38, 12.86s/it] {'loss': 0.0072, 'learning_rate': 3.7940000000000006e-05, 'epoch': 0.91} 24%|██▍ | 2418/10000 [8:49:12<27:05:38, 12.86s/it] 24%|██▍ | 2419/10000 [8:49:25<27:07:41, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.7935e-05, 'epoch': 0.91} 24%|██▍ | 2419/10000 [8:49:25<27:07:41, 12.88s/it] 24%|██▍ | 2420/10000 [8:49:38<27:08:45, 12.89s/it] {'loss': 0.006, 'learning_rate': 3.7930000000000004e-05, 'epoch': 0.91} 24%|██▍ | 2420/10000 [8:49:38<27:08:45, 12.89s/it] 24%|██▍ | 2421/10000 [8:49:51<27:06:40, 12.88s/it] {'loss': 0.0074, 'learning_rate': 3.7925e-05, 'epoch': 0.91} 24%|██▍ | 2421/10000 [8:49:51<27:06:40, 12.88s/it] 24%|██▍ | 2422/10000 [8:50:03<27:05:39, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.792e-05, 'epoch': 0.91} 24%|██▍ | 2422/10000 [8:50:03<27:05:39, 12.87s/it] 24%|██▍ | 2423/10000 [8:50:16<27:08:26, 12.90s/it] {'loss': 0.0075, 'learning_rate': 3.7915e-05, 'epoch': 0.91} 24%|██▍ | 2423/10000 [8:50:16<27:08:26, 12.90s/it] 24%|██▍ | 2424/10000 [8:50:29<27:06:10, 12.88s/it] {'loss': 0.0064, 'learning_rate': 3.791e-05, 'epoch': 0.91} 24%|██▍ | 2424/10000 [8:50:29<27:06:10, 12.88s/it] 24%|██▍ | 2425/10000 [8:50:42<27:07:23, 12.89s/it] {'loss': 0.0069, 'learning_rate': 3.7905000000000004e-05, 'epoch': 0.91} 24%|██▍ | 2425/10000 [8:50:42<27:07:23, 12.89s/it] 24%|██▍ | 2426/10000 [8:50:55<27:09:30, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.79e-05, 'epoch': 0.91} 24%|██▍ | 2426/10000 [8:50:55<27:09:30, 12.91s/it] 24%|██▍ | 2427/10000 [8:51:08<27:08:05, 12.90s/it] 
{'loss': 0.008, 'learning_rate': 3.7895e-05, 'epoch': 0.91} 24%|██▍ | 2427/10000 [8:51:08<27:08:05, 12.90s/it] 24%|██▍ | 2428/10000 [8:51:21<27:07:52, 12.90s/it] {'loss': 0.0072, 'learning_rate': 3.7890000000000005e-05, 'epoch': 0.91} 24%|██▍ | 2428/10000 [8:51:21<27:07:52, 12.90s/it] 24%|██▍ | 2429/10000 [8:51:34<27:04:03, 12.87s/it] {'loss': 0.0082, 'learning_rate': 3.7885e-05, 'epoch': 0.92} 24%|██▍ | 2429/10000 [8:51:34<27:04:03, 12.87s/it] 24%|██▍ | 2430/10000 [8:51:47<27:08:08, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.788e-05, 'epoch': 0.92} 24%|██▍ | 2430/10000 [8:51:47<27:08:08, 12.90s/it] 24%|██▍ | 2431/10000 [8:52:00<27:09:05, 12.91s/it] {'loss': 0.0047, 'learning_rate': 3.7875e-05, 'epoch': 0.92} 24%|██▍ | 2431/10000 [8:52:00<27:09:05, 12.91s/it] 24%|██▍ | 2432/10000 [8:52:12<27:07:11, 12.90s/it] {'loss': 0.0072, 'learning_rate': 3.787e-05, 'epoch': 0.92} 24%|██▍ | 2432/10000 [8:52:12<27:07:11, 12.90s/it] 24%|██▍ | 2433/10000 [8:52:25<27:07:40, 12.91s/it] {'loss': 0.0074, 'learning_rate': 3.7865e-05, 'epoch': 0.92} 24%|██▍ | 2433/10000 [8:52:25<27:07:40, 12.91s/it] 24%|██▍ | 2434/10000 [8:52:38<27:11:09, 12.94s/it] {'loss': 0.0061, 'learning_rate': 3.786e-05, 'epoch': 0.92} 24%|██▍ | 2434/10000 [8:52:38<27:11:09, 12.94s/it] 24%|██▍ | 2435/10000 [8:52:51<27:11:06, 12.94s/it] {'loss': 0.0053, 'learning_rate': 3.7855e-05, 'epoch': 0.92} 24%|██▍ | 2435/10000 [8:52:51<27:11:06, 12.94s/it] 24%|██▍ | 2436/10000 [8:53:04<27:08:33, 12.92s/it] {'loss': 0.006, 'learning_rate': 3.7850000000000005e-05, 'epoch': 0.92} 24%|██▍ | 2436/10000 [8:53:04<27:08:33, 12.92s/it] 24%|██▍ | 2437/10000 [8:53:17<27:03:10, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.7845e-05, 'epoch': 0.92} 24%|██▍ | 2437/10000 [8:53:17<27:03:10, 12.88s/it] 24%|██▍ | 2438/10000 [8:53:30<27:03:48, 12.88s/it] {'loss': 0.0064, 'learning_rate': 3.7840000000000004e-05, 'epoch': 0.92} 24%|██▍ | 2438/10000 [8:53:30<27:03:48, 12.88s/it] 24%|██▍ | 2439/10000 [8:53:43<27:01:32, 12.87s/it] {'loss': 
0.004, 'learning_rate': 3.7835000000000006e-05, 'epoch': 0.92} 24%|██▍ | 2439/10000 [8:53:43<27:01:32, 12.87s/it] 24%|██▍ | 2440/10000 [8:53:56<27:02:58, 12.88s/it] {'loss': 0.0046, 'learning_rate': 3.783e-05, 'epoch': 0.92} 24%|██▍ | 2440/10000 [8:53:56<27:02:58, 12.88s/it] 24%|██▍ | 2441/10000 [8:54:08<27:01:09, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.7825e-05, 'epoch': 0.92} 24%|██▍ | 2441/10000 [8:54:08<27:01:09, 12.87s/it] 24%|██▍ | 2442/10000 [8:54:21<27:02:22, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.782e-05, 'epoch': 0.92} 24%|██▍ | 2442/10000 [8:54:21<27:02:22, 12.88s/it] 24%|██▍ | 2443/10000 [8:54:34<27:00:18, 12.86s/it] {'loss': 0.0042, 'learning_rate': 3.7815e-05, 'epoch': 0.92} 24%|██▍ | 2443/10000 [8:54:34<27:00:18, 12.86s/it] 24%|██▍ | 2444/10000 [8:54:47<26:58:30, 12.85s/it] {'loss': 0.0067, 'learning_rate': 3.781e-05, 'epoch': 0.92} 24%|██▍ | 2444/10000 [8:54:47<26:58:30, 12.85s/it] 24%|██▍ | 2445/10000 [8:55:00<26:56:32, 12.84s/it] {'loss': 0.0055, 'learning_rate': 3.7805e-05, 'epoch': 0.92} 24%|██▍ | 2445/10000 [8:55:00<26:56:32, 12.84s/it] 24%|██▍ | 2446/10000 [8:55:13<27:00:08, 12.87s/it] {'loss': 0.0067, 'learning_rate': 3.7800000000000004e-05, 'epoch': 0.92} 24%|██▍ | 2446/10000 [8:55:13<27:00:08, 12.87s/it] 24%|██▍ | 2447/10000 [8:55:26<27:00:50, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.7795e-05, 'epoch': 0.92} 24%|██▍ | 2447/10000 [8:55:26<27:00:50, 12.88s/it] 24%|██▍ | 2448/10000 [8:55:39<27:01:50, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.779e-05, 'epoch': 0.92} 24%|██▍ | 2448/10000 [8:55:39<27:01:50, 12.89s/it] 24%|██▍ | 2449/10000 [8:55:52<27:04:45, 12.91s/it] {'loss': 0.0054, 'learning_rate': 3.7785000000000005e-05, 'epoch': 0.92} 24%|██▍ | 2449/10000 [8:55:52<27:04:45, 12.91s/it] 24%|██▍ | 2450/10000 [8:56:04<27:03:39, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.778000000000001e-05, 'epoch': 0.92} 24%|██▍ | 2450/10000 [8:56:04<27:03:39, 12.90s/it] 25%|██▍ | 2451/10000 [8:56:17<27:03:28, 12.90s/it] {'loss': 
0.0048, 'learning_rate': 3.7775e-05, 'epoch': 0.92} 25%|██▍ | 2451/10000 [8:56:17<27:03:28, 12.90s/it] 25%|██▍ | 2452/10000 [8:56:30<27:00:18, 12.88s/it] {'loss': 0.0067, 'learning_rate': 3.777e-05, 'epoch': 0.92} 25%|██▍ | 2452/10000 [8:56:30<27:00:18, 12.88s/it] 25%|██▍ | 2453/10000 [8:56:43<27:01:36, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.7765e-05, 'epoch': 0.92} 25%|██▍ | 2453/10000 [8:56:43<27:01:36, 12.89s/it] 25%|██▍ | 2454/10000 [8:56:56<27:01:36, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.776e-05, 'epoch': 0.92} 25%|██▍ | 2454/10000 [8:56:56<27:01:36, 12.89s/it] 25%|██▍ | 2455/10000 [8:57:09<26:57:55, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.7755e-05, 'epoch': 0.93} 25%|██▍ | 2455/10000 [8:57:09<26:57:55, 12.87s/it] 25%|██▍ | 2456/10000 [8:57:22<26:59:01, 12.88s/it] {'loss': 0.0044, 'learning_rate': 3.775e-05, 'epoch': 0.93} 25%|██▍ | 2456/10000 [8:57:22<26:59:01, 12.88s/it] 25%|██▍ | 2457/10000 [8:57:35<26:59:39, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.7745000000000005e-05, 'epoch': 0.93} 25%|██▍ | 2457/10000 [8:57:35<26:59:39, 12.88s/it] 25%|██▍ | 2458/10000 [8:57:47<27:00:28, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.774e-05, 'epoch': 0.93} 25%|██▍ | 2458/10000 [8:57:48<27:00:28, 12.89s/it] 25%|██▍ | 2459/10000 [8:58:00<27:00:41, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.7735000000000004e-05, 'epoch': 0.93} 25%|██▍ | 2459/10000 [8:58:00<27:00:41, 12.90s/it] 25%|██▍ | 2460/10000 [8:58:13<27:02:43, 12.91s/it] {'loss': 0.0051, 'learning_rate': 3.7730000000000006e-05, 'epoch': 0.93} 25%|██▍ | 2460/10000 [8:58:13<27:02:43, 12.91s/it] 25%|██▍ | 2461/10000 [8:58:26<27:04:32, 12.93s/it] {'loss': 0.0072, 'learning_rate': 3.7725e-05, 'epoch': 0.93} 25%|██▍ | 2461/10000 [8:58:26<27:04:32, 12.93s/it] 25%|██▍ | 2462/10000 [8:58:39<27:06:07, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.772e-05, 'epoch': 0.93} 25%|██▍ | 2462/10000 [8:58:39<27:06:07, 12.94s/it] 25%|██▍ | 2463/10000 [8:58:52<27:04:37, 12.93s/it] {'loss': 0.0078, 
'learning_rate': 3.7715e-05, 'epoch': 0.93} 25%|██▍ | 2463/10000 [8:58:52<27:04:37, 12.93s/it] 25%|██▍ | 2464/10000 [8:59:05<27:04:39, 12.94s/it] {'loss': 0.0069, 'learning_rate': 3.771e-05, 'epoch': 0.93} 25%|██▍ | 2464/10000 [8:59:05<27:04:39, 12.94s/it] 25%|██▍ | 2465/10000 [8:59:18<27:04:35, 12.94s/it] {'loss': 0.0056, 'learning_rate': 3.7705e-05, 'epoch': 0.93} 25%|██▍ | 2465/10000 [8:59:18<27:04:35, 12.94s/it] 25%|██▍ | 2466/10000 [8:59:31<27:04:23, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.77e-05, 'epoch': 0.93} 25%|██▍ | 2466/10000 [8:59:31<27:04:23, 12.94s/it] 25%|██▍ | 2467/10000 [8:59:44<27:00:59, 12.91s/it] {'loss': 0.0067, 'learning_rate': 3.7695000000000004e-05, 'epoch': 0.93} 25%|██▍ | 2467/10000 [8:59:44<27:00:59, 12.91s/it] 25%|██▍ | 2468/10000 [8:59:57<27:02:34, 12.93s/it] {'loss': 0.0067, 'learning_rate': 3.769e-05, 'epoch': 0.93} 25%|██▍ | 2468/10000 [8:59:57<27:02:34, 12.93s/it] 25%|██▍ | 2469/10000 [9:00:10<27:05:27, 12.95s/it] {'loss': 0.0049, 'learning_rate': 3.7685e-05, 'epoch': 0.93} 25%|██▍ | 2469/10000 [9:00:10<27:05:27, 12.95s/it] 25%|██▍ | 2470/10000 [9:00:23<27:03:07, 12.93s/it] {'loss': 0.0081, 'learning_rate': 3.7680000000000005e-05, 'epoch': 0.93} 25%|██▍ | 2470/10000 [9:00:23<27:03:07, 12.93s/it] 25%|██▍ | 2471/10000 [9:00:36<27:04:29, 12.95s/it] {'loss': 0.0102, 'learning_rate': 3.7675e-05, 'epoch': 0.93} 25%|██▍ | 2471/10000 [9:00:36<27:04:29, 12.95s/it] 25%|██▍ | 2472/10000 [9:00:49<27:04:05, 12.94s/it] {'loss': 0.0056, 'learning_rate': 3.767e-05, 'epoch': 0.93} 25%|██▍ | 2472/10000 [9:00:49<27:04:05, 12.94s/it] 25%|██▍ | 2473/10000 [9:01:02<27:05:27, 12.96s/it] {'loss': 0.0071, 'learning_rate': 3.7665e-05, 'epoch': 0.93} 25%|██▍ | 2473/10000 [9:01:02<27:05:27, 12.96s/it] 25%|██▍ | 2474/10000 [9:01:15<27:03:39, 12.94s/it] {'loss': 0.0074, 'learning_rate': 3.766e-05, 'epoch': 0.93} 25%|██▍ | 2474/10000 [9:01:15<27:03:39, 12.94s/it] 25%|██▍ | 2475/10000 [9:01:28<27:04:21, 12.95s/it] {'loss': 0.0057, 'learning_rate': 
3.7655000000000005e-05, 'epoch': 0.93} 25%|██▍ | 2475/10000 [9:01:28<27:04:21, 12.95s/it] 25%|██▍ | 2476/10000 [9:01:40<27:04:16, 12.95s/it] {'loss': 0.0085, 'learning_rate': 3.765e-05, 'epoch': 0.93} 25%|██▍ | 2476/10000 [9:01:40<27:04:16, 12.95s/it] 25%|██▍ | 2477/10000 [9:01:53<27:05:50, 12.97s/it] {'loss': 0.0051, 'learning_rate': 3.7645e-05, 'epoch': 0.93} 25%|██▍ | 2477/10000 [9:01:54<27:05:50, 12.97s/it] 25%|██▍ | 2478/10000 [9:02:06<27:04:29, 12.96s/it] {'loss': 0.0051, 'learning_rate': 3.7640000000000006e-05, 'epoch': 0.93} 25%|██▍ | 2478/10000 [9:02:06<27:04:29, 12.96s/it] 25%|██▍ | 2479/10000 [9:02:19<27:04:25, 12.96s/it] {'loss': 0.0102, 'learning_rate': 3.7635e-05, 'epoch': 0.93} 25%|██▍ | 2479/10000 [9:02:19<27:04:25, 12.96s/it] 25%|██▍ | 2480/10000 [9:02:32<27:02:19, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.7630000000000004e-05, 'epoch': 0.93} 25%|██▍ | 2480/10000 [9:02:32<27:02:19, 12.94s/it] 25%|██▍ | 2481/10000 [9:02:45<27:02:36, 12.95s/it] {'loss': 0.0107, 'learning_rate': 3.7625e-05, 'epoch': 0.93} 25%|██▍ | 2481/10000 [9:02:45<27:02:36, 12.95s/it] 25%|██▍ | 2482/10000 [9:02:58<27:01:34, 12.94s/it] {'loss': 0.0066, 'learning_rate': 3.762e-05, 'epoch': 0.94} 25%|██▍ | 2482/10000 [9:02:58<27:01:34, 12.94s/it] 25%|██▍ | 2483/10000 [9:03:11<27:04:09, 12.96s/it] {'loss': 0.0069, 'learning_rate': 3.7615e-05, 'epoch': 0.94} 25%|██▍ | 2483/10000 [9:03:11<27:04:09, 12.96s/it] 25%|██▍ | 2484/10000 [9:03:24<27:03:27, 12.96s/it] {'loss': 0.0081, 'learning_rate': 3.761e-05, 'epoch': 0.94} 25%|██▍ | 2484/10000 [9:03:24<27:03:27, 12.96s/it] 25%|██▍ | 2485/10000 [9:03:37<27:06:27, 12.99s/it] {'loss': 0.0058, 'learning_rate': 3.7605e-05, 'epoch': 0.94} 25%|██▍ | 2485/10000 [9:03:37<27:06:27, 12.99s/it] 25%|██▍ | 2486/10000 [9:03:50<27:05:16, 12.98s/it] {'loss': 0.0044, 'learning_rate': 3.76e-05, 'epoch': 0.94} 25%|██▍ | 2486/10000 [9:03:50<27:05:16, 12.98s/it] 25%|██▍ | 2487/10000 [9:04:03<27:04:40, 12.97s/it] {'loss': 0.006, 'learning_rate': 3.7595e-05, 
'epoch': 0.94} 25%|██▍ | 2487/10000 [9:04:03<27:04:40, 12.97s/it] 25%|██▍ | 2488/10000 [9:04:16<27:03:46, 12.97s/it] {'loss': 0.0043, 'learning_rate': 3.7590000000000004e-05, 'epoch': 0.94} 25%|██▍ | 2488/10000 [9:04:16<27:03:46, 12.97s/it] 25%|██▍ | 2489/10000 [9:04:29<27:03:10, 12.97s/it] {'loss': 0.0043, 'learning_rate': 3.758500000000001e-05, 'epoch': 0.94} 25%|██▍ | 2489/10000 [9:04:29<27:03:10, 12.97s/it] 25%|██▍ | 2490/10000 [9:04:42<27:01:46, 12.96s/it] {'loss': 0.0041, 'learning_rate': 3.758e-05, 'epoch': 0.94} 25%|██▍ | 2490/10000 [9:04:42<27:01:46, 12.96s/it] 25%|██▍ | 2491/10000 [9:04:55<27:00:40, 12.95s/it] {'loss': 0.0062, 'learning_rate': 3.7575e-05, 'epoch': 0.94} 25%|██▍ | 2491/10000 [9:04:55<27:00:40, 12.95s/it] 25%|██▍ | 2492/10000 [9:05:08<27:01:45, 12.96s/it] {'loss': 0.0074, 'learning_rate': 3.757e-05, 'epoch': 0.94} 25%|██▍ | 2492/10000 [9:05:08<27:01:45, 12.96s/it] 25%|██▍ | 2493/10000 [9:05:21<27:01:29, 12.96s/it] {'loss': 0.0052, 'learning_rate': 3.7565e-05, 'epoch': 0.94} 25%|██▍ | 2493/10000 [9:05:21<27:01:29, 12.96s/it] 25%|██▍ | 2494/10000 [9:05:34<27:02:43, 12.97s/it] {'loss': 0.0078, 'learning_rate': 3.756e-05, 'epoch': 0.94} 25%|██▍ | 2494/10000 [9:05:34<27:02:43, 12.97s/it] 25%|██▍ | 2495/10000 [9:05:47<27:02:43, 12.97s/it] {'loss': 0.0065, 'learning_rate': 3.7555e-05, 'epoch': 0.94} 25%|██▍ | 2495/10000 [9:05:47<27:02:43, 12.97s/it] 25%|██▍ | 2496/10000 [9:06:00<27:02:11, 12.97s/it] {'loss': 0.0069, 'learning_rate': 3.7550000000000005e-05, 'epoch': 0.94} 25%|██▍ | 2496/10000 [9:06:00<27:02:11, 12.97s/it] 25%|██▍ | 2497/10000 [9:06:13<27:02:43, 12.98s/it] {'loss': 0.0055, 'learning_rate': 3.7545e-05, 'epoch': 0.94} 25%|██▍ | 2497/10000 [9:06:13<27:02:43, 12.98s/it] 25%|██▍ | 2498/10000 [9:06:26<27:00:33, 12.96s/it] {'loss': 0.0062, 'learning_rate': 3.754e-05, 'epoch': 0.94} 25%|██▍ | 2498/10000 [9:06:26<27:00:33, 12.96s/it] 25%|██▍ | 2499/10000 [9:06:39<26:57:09, 12.94s/it] {'loss': 0.0055, 'learning_rate': 3.7535000000000006e-05, 
'epoch': 0.94} 25%|██▍ | 2499/10000 [9:06:39<26:57:09, 12.94s/it] 25%|██▌ | 2500/10000 [9:06:51<26:54:40, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.753e-05, 'epoch': 0.94} 25%|██▌ | 2500/10000 [9:06:51<26:54:40, 12.92s/it] 25%|██▌ | 2501/10000 [9:07:04<26:54:23, 12.92s/it] {'loss': 0.0051, 'learning_rate': 3.7525e-05, 'epoch': 0.94} 25%|██▌ | 2501/10000 [9:07:04<26:54:23, 12.92s/it] 25%|██▌ | 2502/10000 [9:07:17<26:53:20, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.752e-05, 'epoch': 0.94} 25%|██▌ | 2502/10000 [9:07:17<26:53:20, 12.91s/it] 25%|██▌ | 2503/10000 [9:07:30<26:52:34, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.7515e-05, 'epoch': 0.94} 25%|██▌ | 2503/10000 [9:07:30<26:52:34, 12.91s/it] 25%|██▌ | 2504/10000 [9:07:43<26:50:47, 12.89s/it] {'loss': 0.005, 'learning_rate': 3.751e-05, 'epoch': 0.94} 25%|██▌ | 2504/10000 [9:07:43<26:50:47, 12.89s/it] 25%|██▌ | 2505/10000 [9:07:56<26:48:31, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.7505e-05, 'epoch': 0.94} 25%|██▌ | 2505/10000 [9:07:56<26:48:31, 12.88s/it] 25%|██▌ | 2506/10000 [9:08:09<26:48:56, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.7500000000000003e-05, 'epoch': 0.94} 25%|██▌ | 2506/10000 [9:08:09<26:48:56, 12.88s/it] 25%|██▌ | 2507/10000 [9:08:22<26:48:13, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.7495e-05, 'epoch': 0.94} 25%|██▌ | 2507/10000 [9:08:22<26:48:13, 12.88s/it] 25%|██▌ | 2508/10000 [9:08:35<26:48:49, 12.88s/it] {'loss': 0.0054, 'learning_rate': 3.749e-05, 'epoch': 0.94} 25%|██▌ | 2508/10000 [9:08:35<26:48:49, 12.88s/it] 25%|██▌ | 2509/10000 [9:08:47<26:49:56, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.7485000000000004e-05, 'epoch': 0.95} 25%|██▌ | 2509/10000 [9:08:47<26:49:56, 12.90s/it] 25%|██▌ | 2510/10000 [9:09:00<26:47:46, 12.88s/it] {'loss': 0.0075, 'learning_rate': 3.748000000000001e-05, 'epoch': 0.95} 25%|██▌ | 2510/10000 [9:09:00<26:47:46, 12.88s/it] 25%|██▌ | 2511/10000 [9:09:13<26:48:07, 12.88s/it] {'loss': 0.0045, 'learning_rate': 3.7475e-05, 'epoch': 
0.95} 25%|██▌ | 2511/10000 [9:09:13<26:48:07, 12.88s/it] 25%|██▌ | 2512/10000 [9:09:26<26:47:52, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.747e-05, 'epoch': 0.95} 25%|██▌ | 2512/10000 [9:09:26<26:47:52, 12.88s/it] 25%|██▌ | 2513/10000 [9:09:39<26:44:14, 12.86s/it] {'loss': 0.0055, 'learning_rate': 3.7465e-05, 'epoch': 0.95} 25%|██▌ | 2513/10000 [9:09:39<26:44:14, 12.86s/it] 25%|██▌ | 2514/10000 [9:09:52<26:42:43, 12.85s/it] {'loss': 0.006, 'learning_rate': 3.7460000000000004e-05, 'epoch': 0.95} 25%|██▌ | 2514/10000 [9:09:52<26:42:43, 12.85s/it] 25%|██▌ | 2515/10000 [9:10:05<26:42:24, 12.84s/it] {'loss': 0.0054, 'learning_rate': 3.7455e-05, 'epoch': 0.95} 25%|██▌ | 2515/10000 [9:10:05<26:42:24, 12.84s/it] 25%|██▌ | 2516/10000 [9:10:17<26:44:19, 12.86s/it] {'loss': 0.0047, 'learning_rate': 3.745e-05, 'epoch': 0.95} 25%|██▌ | 2516/10000 [9:10:17<26:44:19, 12.86s/it] 25%|██▌ | 2517/10000 [9:10:30<26:42:09, 12.85s/it] {'loss': 0.0066, 'learning_rate': 3.7445000000000005e-05, 'epoch': 0.95} 25%|██▌ | 2517/10000 [9:10:30<26:42:09, 12.85s/it] 25%|██▌ | 2518/10000 [9:10:43<26:44:58, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.744e-05, 'epoch': 0.95} 25%|██▌ | 2518/10000 [9:10:43<26:44:58, 12.87s/it] 25%|██▌ | 2519/10000 [9:10:56<26:44:39, 12.87s/it] {'loss': 0.0056, 'learning_rate': 3.7435e-05, 'epoch': 0.95} 25%|██▌ | 2519/10000 [9:10:56<26:44:39, 12.87s/it] 25%|██▌ | 2520/10000 [9:11:09<26:45:46, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.7430000000000006e-05, 'epoch': 0.95} 25%|██▌ | 2520/10000 [9:11:09<26:45:46, 12.88s/it] 25%|██▌ | 2521/10000 [9:11:22<26:46:42, 12.89s/it] {'loss': 0.0061, 'learning_rate': 3.7425e-05, 'epoch': 0.95} 25%|██▌ | 2521/10000 [9:11:22<26:46:42, 12.89s/it] 25%|██▌ | 2522/10000 [9:11:35<26:48:30, 12.91s/it] {'loss': 0.0041, 'learning_rate': 3.742e-05, 'epoch': 0.95} 25%|██▌ | 2522/10000 [9:11:35<26:48:30, 12.91s/it] 25%|██▌ | 2523/10000 [9:11:48<26:47:51, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.7415e-05, 'epoch': 0.95} 
25%|██▌ | 2523/10000 [9:11:48<26:47:51, 12.90s/it] 25%|██▌ | 2524/10000 [9:12:01<26:51:11, 12.93s/it] {'loss': 0.0056, 'learning_rate': 3.741e-05, 'epoch': 0.95} 25%|██▌ | 2524/10000 [9:12:01<26:51:11, 12.93s/it] 25%|██▌ | 2525/10000 [9:12:14<26:48:07, 12.91s/it] {'loss': 0.0052, 'learning_rate': 3.7405e-05, 'epoch': 0.95} 25%|██▌ | 2525/10000 [9:12:14<26:48:07, 12.91s/it] 25%|██▌ | 2526/10000 [9:12:26<26:45:33, 12.89s/it] {'loss': 0.0071, 'learning_rate': 3.74e-05, 'epoch': 0.95} 25%|██▌ | 2526/10000 [9:12:26<26:45:33, 12.89s/it] 25%|██▌ | 2527/10000 [9:12:39<26:47:13, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.7395000000000004e-05, 'epoch': 0.95} 25%|██▌ | 2527/10000 [9:12:39<26:47:13, 12.90s/it] 25%|██▌ | 2528/10000 [9:12:52<26:47:03, 12.90s/it] {'loss': 0.0042, 'learning_rate': 3.739e-05, 'epoch': 0.95} 25%|██▌ | 2528/10000 [9:12:52<26:47:03, 12.90s/it] 25%|██▌ | 2529/10000 [9:13:05<26:47:32, 12.91s/it] {'loss': 0.0065, 'learning_rate': 3.7385e-05, 'epoch': 0.95} 25%|██▌ | 2529/10000 [9:13:05<26:47:32, 12.91s/it] 25%|██▌ | 2530/10000 [9:13:18<26:43:57, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.7380000000000005e-05, 'epoch': 0.95} 25%|██▌ | 2530/10000 [9:13:18<26:43:57, 12.88s/it] 25%|██▌ | 2531/10000 [9:13:31<26:46:43, 12.91s/it] {'loss': 0.0046, 'learning_rate': 3.737500000000001e-05, 'epoch': 0.95} 25%|██▌ | 2531/10000 [9:13:31<26:46:43, 12.91s/it] 25%|██▌ | 2532/10000 [9:13:44<26:44:46, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.7369999999999996e-05, 'epoch': 0.95} 25%|██▌ | 2532/10000 [9:13:44<26:44:46, 12.89s/it] 25%|██▌ | 2533/10000 [9:13:57<26:43:19, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.7365e-05, 'epoch': 0.95} 25%|██▌ | 2533/10000 [9:13:57<26:43:19, 12.88s/it] 25%|██▌ | 2534/10000 [9:14:09<26:41:47, 12.87s/it] {'loss': 0.0077, 'learning_rate': 3.736e-05, 'epoch': 0.95} 25%|██▌ | 2534/10000 [9:14:10<26:41:47, 12.87s/it] 25%|██▌ | 2535/10000 [9:14:22<26:42:32, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.7355000000000004e-05, 
'epoch': 0.96} 25%|██▌ | 2535/10000 [9:14:22<26:42:32, 12.88s/it] 25%|██▌ | 2536/10000 [9:14:35<26:45:47, 12.91s/it] {'loss': 0.0072, 'learning_rate': 3.735e-05, 'epoch': 0.96} 25%|██▌ | 2536/10000 [9:14:35<26:45:47, 12.91s/it] 25%|██▌ | 2537/10000 [9:14:48<26:43:32, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.7345e-05, 'epoch': 0.96} 25%|██▌ | 2537/10000 [9:14:48<26:43:32, 12.89s/it] 25%|██▌ | 2538/10000 [9:15:01<26:45:24, 12.91s/it] {'loss': 0.0064, 'learning_rate': 3.7340000000000005e-05, 'epoch': 0.96} 25%|██▌ | 2538/10000 [9:15:01<26:45:24, 12.91s/it] 25%|██▌ | 2539/10000 [9:15:14<26:42:28, 12.89s/it] {'loss': 0.006, 'learning_rate': 3.7335e-05, 'epoch': 0.96} 25%|██▌ | 2539/10000 [9:15:14<26:42:28, 12.89s/it] 25%|██▌ | 2540/10000 [9:15:27<26:45:50, 12.92s/it] {'loss': 0.0064, 'learning_rate': 3.7330000000000003e-05, 'epoch': 0.96} 25%|██▌ | 2540/10000 [9:15:27<26:45:50, 12.92s/it] 25%|██▌ | 2541/10000 [9:15:40<26:44:59, 12.91s/it] {'loss': 0.004, 'learning_rate': 3.7325000000000006e-05, 'epoch': 0.96} 25%|██▌ | 2541/10000 [9:15:40<26:44:59, 12.91s/it] 25%|██▌ | 2542/10000 [9:15:53<26:42:11, 12.89s/it] {'loss': 0.006, 'learning_rate': 3.732e-05, 'epoch': 0.96} 25%|██▌ | 2542/10000 [9:15:53<26:42:11, 12.89s/it] 25%|██▌ | 2543/10000 [9:16:06<26:38:50, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.7315e-05, 'epoch': 0.96} 25%|██▌ | 2543/10000 [9:16:06<26:38:50, 12.86s/it] 25%|██▌ | 2544/10000 [9:16:18<26:35:59, 12.84s/it] {'loss': 0.0063, 'learning_rate': 3.731e-05, 'epoch': 0.96} 25%|██▌ | 2544/10000 [9:16:18<26:35:59, 12.84s/it] 25%|██▌ | 2545/10000 [9:16:31<26:35:31, 12.84s/it] {'loss': 0.0069, 'learning_rate': 3.7305e-05, 'epoch': 0.96} 25%|██▌ | 2545/10000 [9:16:31<26:35:31, 12.84s/it] 25%|██▌ | 2546/10000 [9:16:44<26:38:46, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.73e-05, 'epoch': 0.96} 25%|██▌ | 2546/10000 [9:16:44<26:38:46, 12.87s/it] 25%|██▌ | 2547/10000 [9:16:57<26:40:41, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.7295e-05, 'epoch': 0.96} 
25%|██▌ | 2547/10000 [9:16:57<26:40:41, 12.89s/it] 25%|██▌ | 2548/10000 [9:17:10<26:40:43, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.7290000000000004e-05, 'epoch': 0.96} 25%|██▌ | 2548/10000 [9:17:10<26:40:43, 12.89s/it] 25%|██▌ | 2549/10000 [9:17:23<26:40:40, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.7285000000000006e-05, 'epoch': 0.96} 25%|██▌ | 2549/10000 [9:17:23<26:40:40, 12.89s/it] 26%|██▌ | 2550/10000 [9:17:36<26:43:29, 12.91s/it] {'loss': 0.005, 'learning_rate': 3.728e-05, 'epoch': 0.96} 26%|██▌ | 2550/10000 [9:17:36<26:43:29, 12.91s/it] 26%|██▌ | 2551/10000 [9:17:49<26:42:52, 12.91s/it] {'loss': 0.0039, 'learning_rate': 3.7275000000000005e-05, 'epoch': 0.96} 26%|██▌ | 2551/10000 [9:17:49<26:42:52, 12.91s/it] 26%|██▌ | 2552/10000 [9:18:02<26:39:53, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.727e-05, 'epoch': 0.96} 26%|██▌ | 2552/10000 [9:18:02<26:39:53, 12.89s/it] 26%|██▌ | 2553/10000 [9:18:14<26:37:24, 12.87s/it] {'loss': 0.0043, 'learning_rate': 3.7265e-05, 'epoch': 0.96} 26%|██▌ | 2553/10000 [9:18:14<26:37:24, 12.87s/it] 26%|██▌ | 2554/10000 [9:18:27<26:37:10, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.726e-05, 'epoch': 0.96} 26%|██▌ | 2554/10000 [9:18:27<26:37:10, 12.87s/it] 26%|██▌ | 2555/10000 [9:18:40<26:39:03, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.7255e-05, 'epoch': 0.96} 26%|██▌ | 2555/10000 [9:18:40<26:39:03, 12.89s/it] 26%|██▌ | 2556/10000 [9:18:53<26:39:10, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.7250000000000004e-05, 'epoch': 0.96} 26%|██▌ | 2556/10000 [9:18:53<26:39:10, 12.89s/it] 26%|██▌ | 2557/10000 [9:19:06<26:41:25, 12.91s/it] {'loss': 0.0057, 'learning_rate': 3.7245e-05, 'epoch': 0.96} 26%|██▌ | 2557/10000 [9:19:06<26:41:25, 12.91s/it] 26%|██▌ | 2558/10000 [9:19:19<26:38:37, 12.89s/it] {'loss': 0.0063, 'learning_rate': 3.724e-05, 'epoch': 0.96} 26%|██▌ | 2558/10000 [9:19:19<26:38:37, 12.89s/it] 26%|██▌ | 2559/10000 [9:19:32<26:39:07, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.7235000000000005e-05, 
'epoch': 0.96} 26%|██▌ | 2559/10000 [9:19:32<26:39:07, 12.89s/it] 26%|██▌ | 2560/10000 [9:19:45<26:37:10, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.723e-05, 'epoch': 0.96} 26%|██▌ | 2560/10000 [9:19:45<26:37:10, 12.88s/it] 26%|██▌ | 2561/10000 [9:19:57<26:38:02, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.7225000000000004e-05, 'epoch': 0.96} 26%|██▌ | 2561/10000 [9:19:58<26:38:02, 12.89s/it] 26%|██▌ | 2562/10000 [9:20:10<26:35:33, 12.87s/it] {'loss': 0.0077, 'learning_rate': 3.722e-05, 'epoch': 0.97} 26%|██▌ | 2562/10000 [9:20:10<26:35:33, 12.87s/it] 26%|██▌ | 2563/10000 [9:20:23<26:36:13, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.7215e-05, 'epoch': 0.97} 26%|██▌ | 2563/10000 [9:20:23<26:36:13, 12.88s/it] 26%|██▌ | 2564/10000 [9:20:36<26:38:28, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.721e-05, 'epoch': 0.97} 26%|██▌ | 2564/10000 [9:20:36<26:38:28, 12.90s/it] 26%|██▌ | 2565/10000 [9:20:49<26:34:53, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.7205e-05, 'epoch': 0.97} 26%|██▌ | 2565/10000 [9:20:49<26:34:53, 12.87s/it] 26%|██▌ | 2566/10000 [9:21:02<26:34:52, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.72e-05, 'epoch': 0.97} 26%|██▌ | 2566/10000 [9:21:02<26:34:52, 12.87s/it] 26%|██▌ | 2567/10000 [9:21:15<26:36:06, 12.88s/it] {'loss': 0.0085, 'learning_rate': 3.7195e-05, 'epoch': 0.97} 26%|██▌ | 2567/10000 [9:21:15<26:36:06, 12.88s/it] 26%|██▌ | 2568/10000 [9:21:28<26:38:28, 12.90s/it] {'loss': 0.0063, 'learning_rate': 3.719e-05, 'epoch': 0.97} 26%|██▌ | 2568/10000 [9:21:28<26:38:28, 12.90s/it] 26%|██▌ | 2569/10000 [9:21:41<26:39:28, 12.91s/it] {'loss': 0.0048, 'learning_rate': 3.7185000000000004e-05, 'epoch': 0.97} 26%|██▌ | 2569/10000 [9:21:41<26:39:28, 12.91s/it] 26%|██▌ | 2570/10000 [9:21:54<26:39:48, 12.92s/it] {'loss': 0.007, 'learning_rate': 3.7180000000000007e-05, 'epoch': 0.97} 26%|██▌ | 2570/10000 [9:21:54<26:39:48, 12.92s/it] 26%|██▌ | 2571/10000 [9:22:06<26:39:07, 12.92s/it] {'loss': 0.0043, 'learning_rate': 3.7175e-05, 'epoch': 0.97} 
26%|██▌ | 2571/10000 [9:22:06<26:39:07, 12.92s/it] 26%|██▌ | 2572/10000 [9:22:19<26:36:52, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.717e-05, 'epoch': 0.97} 26%|██▌ | 2572/10000 [9:22:19<26:36:52, 12.90s/it] 26%|██▌ | 2573/10000 [9:22:32<26:37:57, 12.91s/it] {'loss': 0.0063, 'learning_rate': 3.7165e-05, 'epoch': 0.97} 26%|██▌ | 2573/10000 [9:22:32<26:37:57, 12.91s/it] 26%|██▌ | 2574/10000 [9:22:45<26:34:58, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.716e-05, 'epoch': 0.97} 26%|██▌ | 2574/10000 [9:22:45<26:34:58, 12.89s/it] 26%|██▌ | 2575/10000 [9:22:58<26:37:05, 12.91s/it] {'loss': 0.0047, 'learning_rate': 3.7155e-05, 'epoch': 0.97} 26%|██▌ | 2575/10000 [9:22:58<26:37:05, 12.91s/it] 26%|██▌ | 2576/10000 [9:23:11<26:35:25, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.715e-05, 'epoch': 0.97} 26%|██▌ | 2576/10000 [9:23:11<26:35:25, 12.89s/it] 26%|██▌ | 2577/10000 [9:23:24<26:36:15, 12.90s/it] {'loss': 0.0067, 'learning_rate': 3.7145000000000004e-05, 'epoch': 0.97} 26%|██▌ | 2577/10000 [9:23:24<26:36:15, 12.90s/it] 26%|██▌ | 2578/10000 [9:23:37<26:34:15, 12.89s/it] {'loss': 0.0061, 'learning_rate': 3.714e-05, 'epoch': 0.97} 26%|██▌ | 2578/10000 [9:23:37<26:34:15, 12.89s/it] 26%|██▌ | 2579/10000 [9:23:50<26:36:25, 12.91s/it] {'loss': 0.0052, 'learning_rate': 3.7135e-05, 'epoch': 0.97} 26%|██▌ | 2579/10000 [9:23:50<26:36:25, 12.91s/it] 26%|██▌ | 2580/10000 [9:24:03<26:34:36, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.7130000000000005e-05, 'epoch': 0.97} 26%|██▌ | 2580/10000 [9:24:03<26:34:36, 12.89s/it] 26%|██▌ | 2581/10000 [9:24:15<26:34:53, 12.90s/it] {'loss': 0.0045, 'learning_rate': 3.7125e-05, 'epoch': 0.97} 26%|██▌ | 2581/10000 [9:24:15<26:34:53, 12.90s/it] 26%|██▌ | 2582/10000 [9:24:28<26:33:05, 12.89s/it] {'loss': 0.0093, 'learning_rate': 3.712e-05, 'epoch': 0.97} 26%|██▌ | 2582/10000 [9:24:28<26:33:05, 12.89s/it] 26%|██▌ | 2583/10000 [9:24:41<26:35:44, 12.91s/it] {'loss': 0.0037, 'learning_rate': 3.7115e-05, 'epoch': 0.97} 26%|██▌ | 2583/10000 
[9:24:41<26:35:44, 12.91s/it] 26%|██▌ | 2584/10000 [9:24:54<26:34:39, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.711e-05, 'epoch': 0.97} 26%|██▌ | 2584/10000 [9:24:54<26:34:39, 12.90s/it] 26%|██▌ | 2585/10000 [9:25:07<26:31:29, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.7105e-05, 'epoch': 0.97} 26%|██▌ | 2585/10000 [9:25:07<26:31:29, 12.88s/it] 26%|██▌ | 2586/10000 [9:25:20<26:32:45, 12.89s/it] {'loss': 0.008, 'learning_rate': 3.71e-05, 'epoch': 0.97} 26%|██▌ | 2586/10000 [9:25:20<26:32:45, 12.89s/it] 26%|██▌ | 2587/10000 [9:25:33<26:29:44, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.7095e-05, 'epoch': 0.97} 26%|██▌ | 2587/10000 [9:25:33<26:29:44, 12.87s/it] 26%|██▌ | 2588/10000 [9:25:46<26:29:03, 12.86s/it] {'loss': 0.007, 'learning_rate': 3.7090000000000006e-05, 'epoch': 0.98} 26%|██▌ | 2588/10000 [9:25:46<26:29:03, 12.86s/it] 26%|██▌ | 2589/10000 [9:25:58<26:31:12, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.7085e-05, 'epoch': 0.98} 26%|██▌ | 2589/10000 [9:25:58<26:31:12, 12.88s/it] 26%|██▌ | 2590/10000 [9:26:11<26:30:45, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.7080000000000004e-05, 'epoch': 0.98} 26%|██▌ | 2590/10000 [9:26:11<26:30:45, 12.88s/it] 26%|██▌ | 2591/10000 [9:26:24<26:32:29, 12.90s/it] {'loss': 0.0086, 'learning_rate': 3.707500000000001e-05, 'epoch': 0.98} 26%|██▌ | 2591/10000 [9:26:24<26:32:29, 12.90s/it] 26%|██▌ | 2592/10000 [9:26:37<26:30:41, 12.88s/it] {'loss': 0.0065, 'learning_rate': 3.707e-05, 'epoch': 0.98} 26%|██▌ | 2592/10000 [9:26:37<26:30:41, 12.88s/it] 26%|██▌ | 2593/10000 [9:26:50<26:29:20, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.7065e-05, 'epoch': 0.98} 26%|██▌ | 2593/10000 [9:26:50<26:29:20, 12.87s/it] 26%|██▌ | 2594/10000 [9:27:03<26:29:55, 12.88s/it] {'loss': 0.0054, 'learning_rate': 3.706e-05, 'epoch': 0.98} 26%|██▌ | 2594/10000 [9:27:03<26:29:55, 12.88s/it] 26%|██▌ | 2595/10000 [9:27:16<26:33:42, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.7055000000000004e-05, 'epoch': 0.98} 26%|██▌ | 2595/10000 
[9:27:16<26:33:42, 12.91s/it] 26%|██▌ | 2596/10000 [9:27:29<26:33:19, 12.91s/it] {'loss': 0.0057, 'learning_rate': 3.705e-05, 'epoch': 0.98} 26%|██▌ | 2596/10000 [9:27:29<26:33:19, 12.91s/it] 26%|██▌ | 2597/10000 [9:27:42<26:33:16, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.7045e-05, 'epoch': 0.98} 26%|██▌ | 2597/10000 [9:27:42<26:33:16, 12.91s/it] 26%|██▌ | 2598/10000 [9:27:55<26:33:15, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.7040000000000005e-05, 'epoch': 0.98} 26%|██▌ | 2598/10000 [9:27:55<26:33:15, 12.91s/it] 26%|██▌ | 2599/10000 [9:28:07<26:31:00, 12.90s/it] {'loss': 0.0071, 'learning_rate': 3.7035e-05, 'epoch': 0.98} 26%|██▌ | 2599/10000 [9:28:07<26:31:00, 12.90s/it] 26%|██▌ | 2600/10000 [9:28:20<26:29:39, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.703e-05, 'epoch': 0.98} 26%|██▌ | 2600/10000 [9:28:20<26:29:39, 12.89s/it] 26%|██▌ | 2601/10000 [9:28:33<26:27:29, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.7025000000000005e-05, 'epoch': 0.98} 26%|██▌ | 2601/10000 [9:28:33<26:27:29, 12.87s/it] 26%|██▌ | 2602/10000 [9:28:46<26:28:30, 12.88s/it] {'loss': 0.0043, 'learning_rate': 3.702e-05, 'epoch': 0.98} 26%|██▌ | 2602/10000 [9:28:46<26:28:30, 12.88s/it] 26%|██▌ | 2603/10000 [9:28:59<26:24:56, 12.86s/it] {'loss': 0.0058, 'learning_rate': 3.7015e-05, 'epoch': 0.98} 26%|██▌ | 2603/10000 [9:28:59<26:24:56, 12.86s/it] 26%|██▌ | 2604/10000 [9:29:12<26:28:25, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.701e-05, 'epoch': 0.98} 26%|██▌ | 2604/10000 [9:29:12<26:28:25, 12.89s/it] 26%|██▌ | 2605/10000 [9:29:25<26:27:30, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.7005e-05, 'epoch': 0.98} 26%|██▌ | 2605/10000 [9:29:25<26:27:30, 12.88s/it] 26%|██▌ | 2606/10000 [9:29:38<26:28:12, 12.89s/it] {'loss': 0.0046, 'learning_rate': 3.7e-05, 'epoch': 0.98} 26%|██▌ | 2606/10000 [9:29:38<26:28:12, 12.89s/it] 26%|██▌ | 2607/10000 [9:29:50<26:28:17, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.6995e-05, 'epoch': 0.98} 26%|██▌ | 2607/10000 [9:29:51<26:28:17, 
12.89s/it] 26%|██▌ | 2608/10000 [9:30:03<26:27:47, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.699e-05, 'epoch': 0.98} 26%|██▌ | 2608/10000 [9:30:03<26:27:47, 12.89s/it] 26%|██▌ | 2609/10000 [9:30:16<26:28:27, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.6985000000000006e-05, 'epoch': 0.98} 26%|██▌ | 2609/10000 [9:30:16<26:28:27, 12.90s/it] 26%|██▌ | 2610/10000 [9:30:29<26:25:56, 12.88s/it] {'loss': 0.006, 'learning_rate': 3.698e-05, 'epoch': 0.98} 26%|██▌ | 2610/10000 [9:30:29<26:25:56, 12.88s/it] 26%|██▌ | 2611/10000 [9:30:42<26:25:35, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.6975000000000004e-05, 'epoch': 0.98} 26%|██▌ | 2611/10000 [9:30:42<26:25:35, 12.88s/it] 26%|██▌ | 2612/10000 [9:30:55<26:29:11, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.697e-05, 'epoch': 0.98} 26%|██▌ | 2612/10000 [9:30:55<26:29:11, 12.91s/it] 26%|██▌ | 2613/10000 [9:31:08<26:28:48, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.6965e-05, 'epoch': 0.98} 26%|██▌ | 2613/10000 [9:31:08<26:28:48, 12.90s/it] 26%|██▌ | 2614/10000 [9:31:21<26:26:20, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.696e-05, 'epoch': 0.98} 26%|██▌ | 2614/10000 [9:31:21<26:26:20, 12.89s/it] 26%|██▌ | 2615/10000 [9:31:34<26:29:45, 12.92s/it] {'loss': 0.0053, 'learning_rate': 3.6955e-05, 'epoch': 0.99} 26%|██▌ | 2615/10000 [9:31:34<26:29:45, 12.92s/it] 26%|██▌ | 2616/10000 [9:31:47<26:28:35, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.6950000000000004e-05, 'epoch': 0.99} 26%|██▌ | 2616/10000 [9:31:47<26:28:35, 12.91s/it] 26%|██▌ | 2617/10000 [9:31:59<26:27:49, 12.90s/it] {'loss': 0.0063, 'learning_rate': 3.6945e-05, 'epoch': 0.99} 26%|██▌ | 2617/10000 [9:32:00<26:27:49, 12.90s/it] 26%|██▌ | 2618/10000 [9:32:12<26:27:15, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.694e-05, 'epoch': 0.99} 26%|██▌ | 2618/10000 [9:32:12<26:27:15, 12.90s/it] 26%|██▌ | 2619/10000 [9:32:25<26:26:56, 12.90s/it] {'loss': 0.0043, 'learning_rate': 3.6935000000000005e-05, 'epoch': 0.99} 26%|██▌ | 2619/10000 [9:32:25<26:26:56, 
12.90s/it] 26%|██▌ | 2620/10000 [9:32:38<26:26:53, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.693e-05, 'epoch': 0.99} 26%|██▌ | 2620/10000 [9:32:38<26:26:53, 12.90s/it] 26%|██▌ | 2621/10000 [9:32:51<26:24:57, 12.89s/it] {'loss': 0.0094, 'learning_rate': 3.6925e-05, 'epoch': 0.99} 26%|██▌ | 2621/10000 [9:32:51<26:24:57, 12.89s/it] 26%|██▌ | 2622/10000 [9:33:04<26:22:59, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.692e-05, 'epoch': 0.99} 26%|██▌ | 2622/10000 [9:33:04<26:22:59, 12.87s/it] 26%|██▌ | 2623/10000 [9:33:17<26:24:22, 12.89s/it] {'loss': 0.0063, 'learning_rate': 3.6915e-05, 'epoch': 0.99} 26%|██▌ | 2623/10000 [9:33:17<26:24:22, 12.89s/it] 26%|██▌ | 2624/10000 [9:33:30<26:26:22, 12.90s/it] {'loss': 0.004, 'learning_rate': 3.691e-05, 'epoch': 0.99} 26%|██▌ | 2624/10000 [9:33:30<26:26:22, 12.90s/it] 26%|██▋ | 2625/10000 [9:33:43<26:25:48, 12.90s/it] {'loss': 0.0055, 'learning_rate': 3.6905e-05, 'epoch': 0.99} 26%|██▋ | 2625/10000 [9:33:43<26:25:48, 12.90s/it] 26%|██▋ | 2626/10000 [9:33:56<26:26:09, 12.91s/it] {'loss': 0.005, 'learning_rate': 3.69e-05, 'epoch': 0.99} 26%|██▋ | 2626/10000 [9:33:56<26:26:09, 12.91s/it] 26%|██▋ | 2627/10000 [9:34:09<26:27:30, 12.92s/it] {'loss': 0.0043, 'learning_rate': 3.6895000000000005e-05, 'epoch': 0.99} 26%|██▋ | 2627/10000 [9:34:09<26:27:30, 12.92s/it] 26%|██▋ | 2628/10000 [9:34:21<26:27:58, 12.92s/it] {'loss': 0.0048, 'learning_rate': 3.689e-05, 'epoch': 0.99} 26%|██▋ | 2628/10000 [9:34:21<26:27:58, 12.92s/it] 26%|██▋ | 2629/10000 [9:34:34<26:25:45, 12.91s/it] {'loss': 0.0044, 'learning_rate': 3.6885000000000003e-05, 'epoch': 0.99} 26%|██▋ | 2629/10000 [9:34:34<26:25:45, 12.91s/it] 26%|██▋ | 2630/10000 [9:34:47<26:25:22, 12.91s/it] {'loss': 0.0064, 'learning_rate': 3.6880000000000006e-05, 'epoch': 0.99} 26%|██▋ | 2630/10000 [9:34:47<26:25:22, 12.91s/it] 26%|██▋ | 2631/10000 [9:35:00<26:23:10, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.6875e-05, 'epoch': 0.99} 26%|██▋ | 2631/10000 [9:35:00<26:23:10, 12.89s/it] 
26%|██▋ | 2632/10000 [9:35:13<26:23:08, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.6870000000000004e-05, 'epoch': 0.99} 26%|██▋ | 2632/10000 [9:35:13<26:23:08, 12.89s/it] 26%|██▋ | 2633/10000 [9:35:26<26:22:10, 12.89s/it] {'loss': 0.0085, 'learning_rate': 3.6865e-05, 'epoch': 0.99} 26%|██▋ | 2633/10000 [9:35:26<26:22:10, 12.89s/it] 26%|██▋ | 2634/10000 [9:35:39<26:20:23, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.686e-05, 'epoch': 0.99} 26%|██▋ | 2634/10000 [9:35:39<26:20:23, 12.87s/it] 26%|██▋ | 2635/10000 [9:35:52<26:20:20, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.6855e-05, 'epoch': 0.99} 26%|██▋ | 2635/10000 [9:35:52<26:20:20, 12.87s/it] 26%|██▋ | 2636/10000 [9:36:05<26:25:19, 12.92s/it] {'loss': 0.0064, 'learning_rate': 3.685e-05, 'epoch': 0.99} 26%|██▋ | 2636/10000 [9:36:05<26:25:19, 12.92s/it] 26%|██▋ | 2637/10000 [9:36:17<26:24:14, 12.91s/it] {'loss': 0.0061, 'learning_rate': 3.6845000000000004e-05, 'epoch': 0.99} 26%|██▋ | 2637/10000 [9:36:17<26:24:14, 12.91s/it] 26%|██▋ | 2638/10000 [9:36:30<26:23:20, 12.90s/it] {'loss': 0.0046, 'learning_rate': 3.684e-05, 'epoch': 0.99} 26%|██▋ | 2638/10000 [9:36:30<26:23:20, 12.90s/it] 26%|██▋ | 2639/10000 [9:36:43<26:21:04, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.6835e-05, 'epoch': 0.99} 26%|██▋ | 2639/10000 [9:36:43<26:21:04, 12.89s/it] 26%|██▋ | 2640/10000 [9:36:56<26:21:19, 12.89s/it] {'loss': 0.0067, 'learning_rate': 3.6830000000000005e-05, 'epoch': 0.99} 26%|██▋ | 2640/10000 [9:36:56<26:21:19, 12.89s/it] 26%|██▋ | 2641/10000 [9:37:09<26:21:15, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.6825e-05, 'epoch': 1.0} 26%|██▋ | 2641/10000 [9:37:09<26:21:15, 12.89s/it] 26%|██▋ | 2642/10000 [9:37:22<26:18:58, 12.88s/it] {'loss': 0.0076, 'learning_rate': 3.682e-05, 'epoch': 1.0} 26%|██▋ | 2642/10000 [9:37:22<26:18:58, 12.88s/it] 26%|██▋ | 2643/10000 [9:37:35<26:18:58, 12.88s/it] {'loss': 0.0057, 'learning_rate': 3.6815e-05, 'epoch': 1.0} 26%|██▋ | 2643/10000 [9:37:35<26:18:58, 12.88s/it] 26%|██▋ | 
2644/10000 [9:37:48<26:20:11, 12.89s/it] {'loss': 0.0046, 'learning_rate': 3.681e-05, 'epoch': 1.0} 26%|██▋ | 2644/10000 [9:37:48<26:20:11, 12.89s/it] 26%|██▋ | 2645/10000 [9:38:01<26:19:56, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.6805e-05, 'epoch': 1.0} 26%|██▋ | 2645/10000 [9:38:01<26:19:56, 12.89s/it] 26%|██▋ | 2646/10000 [9:38:13<26:18:31, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.68e-05, 'epoch': 1.0} 26%|██▋ | 2646/10000 [9:38:13<26:18:31, 12.88s/it] 26%|██▋ | 2647/10000 [9:38:26<26:19:28, 12.89s/it] {'loss': 0.0042, 'learning_rate': 3.6795e-05, 'epoch': 1.0} 26%|██▋ | 2647/10000 [9:38:26<26:19:28, 12.89s/it] 26%|██▋ | 2648/10000 [9:38:39<26:19:32, 12.89s/it] {'loss': 0.0077, 'learning_rate': 3.6790000000000005e-05, 'epoch': 1.0} 26%|██▋ | 2648/10000 [9:38:39<26:19:32, 12.89s/it] 26%|██▋ | 2649/10000 [9:38:52<26:23:13, 12.92s/it] {'loss': 0.0055, 'learning_rate': 3.6785e-05, 'epoch': 1.0} 26%|██▋ | 2649/10000 [9:38:52<26:23:13, 12.92s/it] 26%|██▋ | 2650/10000 [9:39:05<26:22:41, 12.92s/it] {'loss': 0.0047, 'learning_rate': 3.6780000000000004e-05, 'epoch': 1.0} 26%|██▋ | 2650/10000 [9:39:05<26:22:41, 12.92s/it] 27%|██▋ | 2651/10000 [9:39:18<26:24:58, 12.94s/it] {'loss': 0.0073, 'learning_rate': 3.6775000000000006e-05, 'epoch': 1.0} 27%|██▋ | 2651/10000 [9:39:18<26:24:58, 12.94s/it] 27%|██▋ | 2652/10000 [9:39:31<26:24:33, 12.94s/it] {'loss': 0.0059, 'learning_rate': 3.677e-05, 'epoch': 1.0} 27%|██▋ | 2652/10000 [9:39:31<26:24:33, 12.94s/it] 27%|██▋ | 2653/10000 [9:39:44<26:23:32, 12.93s/it] {'loss': 0.0051, 'learning_rate': 3.6765e-05, 'epoch': 1.0} 27%|██▋ | 2653/10000 [9:39:44<26:23:32, 12.93s/it] 27%|██▋ | 2654/10000 [9:39:51<22:47:32, 11.17s/it] {'loss': 0.0063, 'learning_rate': 3.676e-05, 'epoch': 1.0} 27%|██▋ | 2654/10000 [9:39:51<22:47:32, 11.17s/it] 27%|██▋ | 2655/10000 [9:40:04<23:52:59, 11.71s/it] {'loss': 0.0065, 'learning_rate': 3.6755e-05, 'epoch': 1.0} 27%|██▋ | 2655/10000 [9:40:04<23:52:59, 11.71s/it] 27%|██▋ | 2656/10000 
[9:40:17<24:39:33, 12.09s/it] {'loss': 0.0052, 'learning_rate': 3.675e-05, 'epoch': 1.0} 27%|██▋ | 2656/10000 [9:40:17<24:39:33, 12.09s/it] 27%|██▋ | 2657/10000 [9:40:30<25:11:46, 12.35s/it] {'loss': 0.0071, 'learning_rate': 3.6745e-05, 'epoch': 1.0} 27%|██▋ | 2657/10000 [9:40:30<25:11:46, 12.35s/it] 27%|██▋ | 2658/10000 [9:40:43<25:34:02, 12.54s/it] {'loss': 0.0072, 'learning_rate': 3.6740000000000004e-05, 'epoch': 1.0} 27%|██▋ | 2658/10000 [9:40:43<25:34:02, 12.54s/it] 27%|██▋ | 2659/10000 [9:40:56<25:48:20, 12.66s/it] {'loss': 0.0051, 'learning_rate': 3.6735e-05, 'epoch': 1.0} 27%|██▋ | 2659/10000 [9:40:56<25:48:20, 12.66s/it] 27%|██▋ | 2660/10000 [9:41:09<25:57:14, 12.73s/it] {'loss': 0.0055, 'learning_rate': 3.673e-05, 'epoch': 1.0} 27%|██▋ | 2660/10000 [9:41:09<25:57:14, 12.73s/it] 27%|██▋ | 2661/10000 [9:41:22<26:04:08, 12.79s/it] {'loss': 0.0065, 'learning_rate': 3.6725000000000005e-05, 'epoch': 1.0} 27%|██▋ | 2661/10000 [9:41:22<26:04:08, 12.79s/it] 27%|██▋ | 2662/10000 [9:41:35<26:10:58, 12.85s/it] {'loss': 0.0041, 'learning_rate': 3.672000000000001e-05, 'epoch': 1.0} 27%|██▋ | 2662/10000 [9:41:35<26:10:58, 12.85s/it] 27%|██▋ | 2663/10000 [9:41:48<26:13:16, 12.87s/it] {'loss': 0.0065, 'learning_rate': 3.6714999999999997e-05, 'epoch': 1.0} 27%|██▋ | 2663/10000 [9:41:48<26:13:16, 12.87s/it] 27%|██▋ | 2664/10000 [9:42:00<26:11:40, 12.85s/it] {'loss': 0.0046, 'learning_rate': 3.671e-05, 'epoch': 1.0} 27%|██▋ | 2664/10000 [9:42:00<26:11:40, 12.85s/it] 27%|██▋ | 2665/10000 [9:42:13<26:13:55, 12.87s/it] {'loss': 0.0041, 'learning_rate': 3.6705e-05, 'epoch': 1.0} 27%|██▋ | 2665/10000 [9:42:13<26:13:55, 12.87s/it] 27%|██▋ | 2666/10000 [9:42:26<26:15:07, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.6700000000000004e-05, 'epoch': 1.0} 27%|██▋ | 2666/10000 [9:42:26<26:15:07, 12.89s/it] 27%|██▋ | 2667/10000 [9:42:39<26:16:11, 12.90s/it] {'loss': 0.0075, 'learning_rate': 3.6695e-05, 'epoch': 1.0} 27%|██▋ | 2667/10000 [9:42:39<26:16:11, 12.90s/it] 27%|██▋ | 2668/10000 
[9:42:52<26:13:28, 12.88s/it] {'loss': 0.0077, 'learning_rate': 3.669e-05, 'epoch': 1.01} 27%|██▋ | 2668/10000 [9:42:52<26:13:28, 12.88s/it] 27%|██▋ | 2669/10000 [9:43:05<26:13:40, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.6685000000000005e-05, 'epoch': 1.01} 27%|██▋ | 2669/10000 [9:43:05<26:13:40, 12.88s/it] 27%|██▋ | 2670/10000 [9:43:18<26:15:06, 12.89s/it] {'loss': 0.0069, 'learning_rate': 3.668e-05, 'epoch': 1.01} 27%|██▋ | 2670/10000 [9:43:18<26:15:06, 12.89s/it] 27%|██▋ | 2671/10000 [9:43:31<26:15:24, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.6675000000000004e-05, 'epoch': 1.01} 27%|██▋ | 2671/10000 [9:43:31<26:15:24, 12.90s/it] 27%|██▋ | 2672/10000 [9:43:44<26:15:39, 12.90s/it] {'loss': 0.0047, 'learning_rate': 3.6670000000000006e-05, 'epoch': 1.01} 27%|██▋ | 2672/10000 [9:43:44<26:15:39, 12.90s/it] 27%|██▋ | 2673/10000 [9:43:56<26:14:32, 12.89s/it] {'loss': 0.0054, 'learning_rate': 3.6665e-05, 'epoch': 1.01} 27%|██▋ | 2673/10000 [9:43:56<26:14:32, 12.89s/it] 27%|██▋ | 2674/10000 [9:44:09<26:12:16, 12.88s/it] {'loss': 0.0067, 'learning_rate': 3.666e-05, 'epoch': 1.01} 27%|██▋ | 2674/10000 [9:44:09<26:12:16, 12.88s/it] 27%|██▋ | 2675/10000 [9:44:22<26:11:02, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.6655e-05, 'epoch': 1.01} 27%|██▋ | 2675/10000 [9:44:22<26:11:02, 12.87s/it] 27%|██▋ | 2676/10000 [9:44:35<26:10:27, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.665e-05, 'epoch': 1.01} 27%|██▋ | 2676/10000 [9:44:35<26:10:27, 12.87s/it] 27%|██▋ | 2677/10000 [9:44:48<26:10:45, 12.87s/it] {'loss': 0.006, 'learning_rate': 3.6645e-05, 'epoch': 1.01} 27%|██▋ | 2677/10000 [9:44:48<26:10:45, 12.87s/it] 27%|██▋ | 2678/10000 [9:45:01<26:08:36, 12.85s/it] {'loss': 0.0049, 'learning_rate': 3.664e-05, 'epoch': 1.01} 27%|██▋ | 2678/10000 [9:45:01<26:08:36, 12.85s/it] 27%|██▋ | 2679/10000 [9:45:14<26:09:50, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.6635000000000004e-05, 'epoch': 1.01} 27%|██▋ | 2679/10000 [9:45:14<26:09:50, 12.87s/it] 27%|██▋ | 2680/10000 
[9:45:27<26:12:29, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.663e-05, 'epoch': 1.01} 27%|██▋ | 2680/10000 [9:45:27<26:12:29, 12.89s/it] 27%|██▋ | 2681/10000 [9:45:39<26:13:46, 12.90s/it] {'loss': 0.0064, 'learning_rate': 3.6625e-05, 'epoch': 1.01} 27%|██▋ | 2681/10000 [9:45:39<26:13:46, 12.90s/it] 27%|██▋ | 2682/10000 [9:45:52<26:13:21, 12.90s/it] {'loss': 0.0067, 'learning_rate': 3.6620000000000005e-05, 'epoch': 1.01} 27%|██▋ | 2682/10000 [9:45:52<26:13:21, 12.90s/it] 27%|██▋ | 2683/10000 [9:46:05<26:11:23, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.6615e-05, 'epoch': 1.01} 27%|██▋ | 2683/10000 [9:46:05<26:11:23, 12.89s/it] 27%|██▋ | 2684/10000 [9:46:18<26:09:56, 12.88s/it] {'loss': 0.0061, 'learning_rate': 3.661e-05, 'epoch': 1.01} 27%|██▋ | 2684/10000 [9:46:18<26:09:56, 12.88s/it] 27%|██▋ | 2685/10000 [9:46:31<26:13:06, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.6605e-05, 'epoch': 1.01} 27%|██▋ | 2685/10000 [9:46:31<26:13:06, 12.90s/it] 27%|██▋ | 2686/10000 [9:46:44<26:12:29, 12.90s/it] {'loss': 0.0068, 'learning_rate': 3.66e-05, 'epoch': 1.01} 27%|██▋ | 2686/10000 [9:46:44<26:12:29, 12.90s/it] 27%|██▋ | 2687/10000 [9:46:57<26:09:04, 12.87s/it] {'loss': 0.0056, 'learning_rate': 3.6595000000000005e-05, 'epoch': 1.01} 27%|██▋ | 2687/10000 [9:46:57<26:09:04, 12.87s/it] 27%|██▋ | 2688/10000 [9:47:10<26:09:46, 12.88s/it] {'loss': 0.0041, 'learning_rate': 3.659e-05, 'epoch': 1.01} 27%|██▋ | 2688/10000 [9:47:10<26:09:46, 12.88s/it] 27%|██▋ | 2689/10000 [9:47:23<26:10:04, 12.89s/it] {'loss': 0.0067, 'learning_rate': 3.6585e-05, 'epoch': 1.01} 27%|██▋ | 2689/10000 [9:47:23<26:10:04, 12.89s/it] 27%|██▋ | 2690/10000 [9:47:35<26:09:30, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.6580000000000006e-05, 'epoch': 1.01} 27%|██▋ | 2690/10000 [9:47:35<26:09:30, 12.88s/it] 27%|██▋ | 2691/10000 [9:47:48<26:11:31, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.6575e-05, 'epoch': 1.01} 27%|██▋ | 2691/10000 [9:47:48<26:11:31, 12.90s/it] 27%|██▋ | 2692/10000 
[9:48:01<26:11:45, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.6570000000000004e-05, 'epoch': 1.01} 27%|██▋ | 2692/10000 [9:48:01<26:11:45, 12.90s/it] 27%|██▋ | 2693/10000 [9:48:14<26:09:23, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.6565e-05, 'epoch': 1.01} 27%|██▋ | 2693/10000 [9:48:14<26:09:23, 12.89s/it] 27%|██▋ | 2694/10000 [9:48:27<26:06:53, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.656e-05, 'epoch': 1.02} 27%|██▋ | 2694/10000 [9:48:27<26:06:53, 12.87s/it] 27%|██▋ | 2695/10000 [9:48:40<26:05:47, 12.86s/it] {'loss': 0.0047, 'learning_rate': 3.6555e-05, 'epoch': 1.02} 27%|██▋ | 2695/10000 [9:48:40<26:05:47, 12.86s/it] 27%|██▋ | 2696/10000 [9:48:53<26:03:39, 12.84s/it] {'loss': 0.0072, 'learning_rate': 3.655e-05, 'epoch': 1.02} 27%|██▋ | 2696/10000 [9:48:53<26:03:39, 12.84s/it] 27%|██▋ | 2697/10000 [9:49:05<26:04:39, 12.85s/it] {'loss': 0.0039, 'learning_rate': 3.6545e-05, 'epoch': 1.02} 27%|██▋ | 2697/10000 [9:49:05<26:04:39, 12.85s/it] 27%|██▋ | 2698/10000 [9:49:18<26:03:57, 12.85s/it] {'loss': 0.005, 'learning_rate': 3.654e-05, 'epoch': 1.02} 27%|██▋ | 2698/10000 [9:49:18<26:03:57, 12.85s/it] 27%|██▋ | 2699/10000 [9:49:31<26:05:34, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.6535e-05, 'epoch': 1.02} 27%|██▋ | 2699/10000 [9:49:31<26:05:34, 12.87s/it] 27%|██▋ | 2700/10000 [9:49:44<26:04:03, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.6530000000000004e-05, 'epoch': 1.02} 27%|██▋ | 2700/10000 [9:49:44<26:04:03, 12.86s/it] 27%|██▋ | 2701/10000 [9:49:57<26:04:23, 12.86s/it] {'loss': 0.0053, 'learning_rate': 3.652500000000001e-05, 'epoch': 1.02} 27%|██▋ | 2701/10000 [9:49:57<26:04:23, 12.86s/it] 27%|██▋ | 2702/10000 [9:50:10<26:04:42, 12.86s/it] {'loss': 0.0075, 'learning_rate': 3.652e-05, 'epoch': 1.02} 27%|██▋ | 2702/10000 [9:50:10<26:04:42, 12.86s/it] 27%|██▋ | 2703/10000 [9:50:23<26:07:16, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.6515e-05, 'epoch': 1.02} 27%|██▋ | 2703/10000 [9:50:23<26:07:16, 12.89s/it] 27%|██▋ | 2704/10000 
[9:50:36<26:06:21, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.651e-05, 'epoch': 1.02} 27%|██▋ | 2704/10000 [9:50:36<26:06:21, 12.88s/it] 27%|██▋ | 2705/10000 [9:50:48<26:06:14, 12.88s/it] {'loss': 0.0069, 'learning_rate': 3.6505e-05, 'epoch': 1.02} 27%|██▋ | 2705/10000 [9:50:48<26:06:14, 12.88s/it] 27%|██▋ | 2706/10000 [9:51:01<26:06:46, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.65e-05, 'epoch': 1.02} 27%|██▋ | 2706/10000 [9:51:01<26:06:46, 12.89s/it] 27%|██▋ | 2707/10000 [9:51:14<26:09:31, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.6495e-05, 'epoch': 1.02} 27%|██▋ | 2707/10000 [9:51:14<26:09:31, 12.91s/it] 27%|██▋ | 2708/10000 [9:51:27<26:06:27, 12.89s/it] {'loss': 0.0069, 'learning_rate': 3.6490000000000005e-05, 'epoch': 1.02} 27%|██▋ | 2708/10000 [9:51:27<26:06:27, 12.89s/it] 27%|██▋ | 2709/10000 [9:51:40<26:05:03, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.6485e-05, 'epoch': 1.02} 27%|██▋ | 2709/10000 [9:51:40<26:05:03, 12.88s/it] 27%|██▋ | 2710/10000 [9:51:53<26:05:11, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.648e-05, 'epoch': 1.02} 27%|██▋ | 2710/10000 [9:51:53<26:05:11, 12.88s/it] 27%|██▋ | 2711/10000 [9:52:06<26:02:59, 12.87s/it] {'loss': 0.006, 'learning_rate': 3.6475000000000006e-05, 'epoch': 1.02} 27%|██▋ | 2711/10000 [9:52:06<26:02:59, 12.87s/it] 27%|██▋ | 2712/10000 [9:52:19<26:02:56, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.647e-05, 'epoch': 1.02} 27%|██▋ | 2712/10000 [9:52:19<26:02:56, 12.87s/it] 27%|██▋ | 2713/10000 [9:52:31<26:01:31, 12.86s/it] {'loss': 0.0063, 'learning_rate': 3.6465e-05, 'epoch': 1.02} 27%|██▋ | 2713/10000 [9:52:31<26:01:31, 12.86s/it] 27%|██▋ | 2714/10000 [9:52:44<25:59:56, 12.85s/it] {'loss': 0.0065, 'learning_rate': 3.646e-05, 'epoch': 1.02} 27%|██▋ | 2714/10000 [9:52:44<25:59:56, 12.85s/it] 27%|██▋ | 2715/10000 [9:52:57<26:00:56, 12.86s/it] {'loss': 0.0064, 'learning_rate': 3.6455e-05, 'epoch': 1.02} 27%|██▋ | 2715/10000 [9:52:57<26:00:56, 12.86s/it] 27%|██▋ | 2716/10000 [9:53:10<26:01:01, 
12.86s/it] {'loss': 0.0055, 'learning_rate': 3.645e-05, 'epoch': 1.02} 27%|██▋ | 2716/10000 [9:53:10<26:01:01, 12.86s/it] 27%|██▋ | 2717/10000 [9:53:23<26:02:36, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.6445e-05, 'epoch': 1.02} 27%|██▋ | 2717/10000 [9:53:23<26:02:36, 12.87s/it] 27%|██▋ | 2718/10000 [9:53:36<26:02:27, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.6440000000000003e-05, 'epoch': 1.02} 27%|██▋ | 2718/10000 [9:53:36<26:02:27, 12.87s/it] 27%|██▋ | 2719/10000 [9:53:49<26:00:36, 12.86s/it] {'loss': 0.0047, 'learning_rate': 3.6435e-05, 'epoch': 1.02} 27%|██▋ | 2719/10000 [9:53:49<26:00:36, 12.86s/it] 27%|██▋ | 2720/10000 [9:54:01<26:01:06, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.643e-05, 'epoch': 1.02} 27%|██▋ | 2720/10000 [9:54:02<26:01:06, 12.87s/it] 27%|██▋ | 2721/10000 [9:54:14<26:01:15, 12.87s/it] {'loss': 0.0074, 'learning_rate': 3.6425000000000004e-05, 'epoch': 1.03} 27%|██▋ | 2721/10000 [9:54:14<26:01:15, 12.87s/it] 27%|██▋ | 2722/10000 [9:54:27<25:59:51, 12.86s/it] {'loss': 0.0052, 'learning_rate': 3.642000000000001e-05, 'epoch': 1.03} 27%|██▋ | 2722/10000 [9:54:27<25:59:51, 12.86s/it] 27%|██▋ | 2723/10000 [9:54:40<25:58:19, 12.85s/it] {'loss': 0.0065, 'learning_rate': 3.6414999999999996e-05, 'epoch': 1.03} 27%|██▋ | 2723/10000 [9:54:40<25:58:19, 12.85s/it] 27%|██▋ | 2724/10000 [9:54:53<25:58:26, 12.85s/it] {'loss': 0.0045, 'learning_rate': 3.641e-05, 'epoch': 1.03} 27%|██▋ | 2724/10000 [9:54:53<25:58:26, 12.85s/it] 27%|██▋ | 2725/10000 [9:55:06<25:58:18, 12.85s/it] {'loss': 0.0061, 'learning_rate': 3.6405e-05, 'epoch': 1.03} 27%|██▋ | 2725/10000 [9:55:06<25:58:18, 12.85s/it] 27%|██▋ | 2726/10000 [9:55:19<25:58:16, 12.85s/it] {'loss': 0.0063, 'learning_rate': 3.6400000000000004e-05, 'epoch': 1.03} 27%|██▋ | 2726/10000 [9:55:19<25:58:16, 12.85s/it] 27%|██▋ | 2727/10000 [9:55:31<25:56:23, 12.84s/it] {'loss': 0.0049, 'learning_rate': 3.6395e-05, 'epoch': 1.03} 27%|██▋ | 2727/10000 [9:55:31<25:56:23, 12.84s/it] 27%|██▋ | 2728/10000 
[9:55:44<26:01:05, 12.88s/it] {'loss': 0.004, 'learning_rate': 3.639e-05, 'epoch': 1.03} 27%|██▋ | 2728/10000 [9:55:44<26:01:05, 12.88s/it] 27%|██▋ | 2729/10000 [9:55:57<25:59:05, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.6385000000000005e-05, 'epoch': 1.03} 27%|██▋ | 2729/10000 [9:55:57<25:59:05, 12.87s/it] 27%|██▋ | 2730/10000 [9:56:10<25:59:54, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.638e-05, 'epoch': 1.03} 27%|██▋ | 2730/10000 [9:56:10<25:59:54, 12.87s/it] 27%|██▋ | 2731/10000 [9:56:23<25:59:08, 12.87s/it] {'loss': 0.0044, 'learning_rate': 3.6375e-05, 'epoch': 1.03} 27%|██▋ | 2731/10000 [9:56:23<25:59:08, 12.87s/it] 27%|██▋ | 2732/10000 [9:56:36<25:56:49, 12.85s/it] {'loss': 0.0055, 'learning_rate': 3.6370000000000006e-05, 'epoch': 1.03} 27%|██▋ | 2732/10000 [9:56:36<25:56:49, 12.85s/it] 27%|██▋ | 2733/10000 [9:56:49<25:55:44, 12.84s/it] {'loss': 0.0064, 'learning_rate': 3.6365e-05, 'epoch': 1.03} 27%|██▋ | 2733/10000 [9:56:49<25:55:44, 12.84s/it] 27%|██▋ | 2734/10000 [9:57:02<25:59:20, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.636e-05, 'epoch': 1.03} 27%|██▋ | 2734/10000 [9:57:02<25:59:20, 12.88s/it] 27%|██▋ | 2735/10000 [9:57:15<26:02:26, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.6355e-05, 'epoch': 1.03} 27%|██▋ | 2735/10000 [9:57:15<26:02:26, 12.90s/it] 27%|██▋ | 2736/10000 [9:57:28<26:06:02, 12.94s/it] {'loss': 0.0044, 'learning_rate': 3.635e-05, 'epoch': 1.03} 27%|██▋ | 2736/10000 [9:57:28<26:06:02, 12.94s/it] 27%|██▋ | 2737/10000 [9:57:40<26:03:08, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.6345e-05, 'epoch': 1.03} 27%|██▋ | 2737/10000 [9:57:40<26:03:08, 12.91s/it] 27%|██▋ | 2738/10000 [9:57:53<26:01:10, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.634e-05, 'epoch': 1.03} 27%|██▋ | 2738/10000 [9:57:53<26:01:10, 12.90s/it] 27%|██▋ | 2739/10000 [9:58:06<26:03:18, 12.92s/it] {'loss': 0.0058, 'learning_rate': 3.6335000000000004e-05, 'epoch': 1.03} 27%|██▋ | 2739/10000 [9:58:06<26:03:18, 12.92s/it] 27%|██▋ | 2740/10000 
[9:58:19<26:06:37, 12.95s/it] {'loss': 0.0053, 'learning_rate': 3.6330000000000006e-05, 'epoch': 1.03} 27%|██▋ | 2740/10000 [9:58:19<26:06:37, 12.95s/it] 27%|██▋ | 2741/10000 [9:58:32<26:05:24, 12.94s/it] {'loss': 0.0068, 'learning_rate': 3.6325e-05, 'epoch': 1.03} 27%|██▋ | 2741/10000 [9:58:32<26:05:24, 12.94s/it] 27%|██▋ | 2742/10000 [9:58:45<26:05:00, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.6320000000000005e-05, 'epoch': 1.03} 27%|██▋ | 2742/10000 [9:58:45<26:05:00, 12.94s/it] 27%|██▋ | 2743/10000 [9:58:58<26:06:28, 12.95s/it] {'loss': 0.0056, 'learning_rate': 3.6315e-05, 'epoch': 1.03} 27%|██▋ | 2743/10000 [9:58:58<26:06:28, 12.95s/it] 27%|██▋ | 2744/10000 [9:59:11<26:05:43, 12.95s/it] {'loss': 0.0051, 'learning_rate': 3.6309999999999996e-05, 'epoch': 1.03} 27%|██▋ | 2744/10000 [9:59:11<26:05:43, 12.95s/it] 27%|██▋ | 2745/10000 [9:59:24<26:02:46, 12.92s/it] {'loss': 0.0063, 'learning_rate': 3.6305e-05, 'epoch': 1.03} 27%|██▋ | 2745/10000 [9:59:24<26:02:46, 12.92s/it] 27%|██▋ | 2746/10000 [9:59:37<26:06:47, 12.96s/it] {'loss': 0.0059, 'learning_rate': 3.63e-05, 'epoch': 1.03} 27%|██▋ | 2746/10000 [9:59:37<26:06:47, 12.96s/it] 27%|██▋ | 2747/10000 [9:59:50<26:11:31, 13.00s/it] {'loss': 0.0045, 'learning_rate': 3.6295000000000004e-05, 'epoch': 1.04} 27%|██▋ | 2747/10000 [9:59:50<26:11:31, 13.00s/it] 27%|██▋ | 2748/10000 [10:00:03<26:08:42, 12.98s/it] {'loss': 0.0062, 'learning_rate': 3.629e-05, 'epoch': 1.04} 27%|██▋ | 2748/10000 [10:00:03<26:08:42, 12.98s/it] 27%|██▋ | 2749/10000 [10:00:16<26:08:27, 12.98s/it] {'loss': 0.0066, 'learning_rate': 3.6285e-05, 'epoch': 1.04} 27%|██▋ | 2749/10000 [10:00:16<26:08:27, 12.98s/it] 28%|██▊ | 2750/10000 [10:00:29<26:07:27, 12.97s/it] {'loss': 0.0074, 'learning_rate': 3.6280000000000005e-05, 'epoch': 1.04} 28%|██▊ | 2750/10000 [10:00:29<26:07:27, 12.97s/it] 28%|██▊ | 2751/10000 [10:00:42<26:06:50, 12.97s/it] {'loss': 0.0043, 'learning_rate': 3.6275e-05, 'epoch': 1.04} 28%|██▊ | 2751/10000 [10:00:42<26:06:50, 12.97s/it] 
28%|██▊ | 2752/10000 [10:00:55<26:05:21, 12.96s/it] {'loss': 0.0076, 'learning_rate': 3.6270000000000003e-05, 'epoch': 1.04} 28%|██▊ | 2752/10000 [10:00:55<26:05:21, 12.96s/it] 28%|██▊ | 2753/10000 [10:01:08<26:02:47, 12.94s/it] {'loss': 0.0051, 'learning_rate': 3.6265e-05, 'epoch': 1.04} 28%|██▊ | 2753/10000 [10:01:08<26:02:47, 12.94s/it] 28%|██▊ | 2754/10000 [10:01:21<26:01:06, 12.93s/it] {'loss': 0.0111, 'learning_rate': 3.626e-05, 'epoch': 1.04} 28%|██▊ | 2754/10000 [10:01:21<26:01:06, 12.93s/it] 28%|██▊ | 2755/10000 [10:01:33<26:00:39, 12.92s/it] {'loss': 0.0056, 'learning_rate': 3.6255e-05, 'epoch': 1.04} 28%|██▊ | 2755/10000 [10:01:34<26:00:39, 12.92s/it] 28%|██▊ | 2756/10000 [10:01:46<26:00:22, 12.92s/it] {'loss': 0.0056, 'learning_rate': 3.625e-05, 'epoch': 1.04} 28%|██▊ | 2756/10000 [10:01:46<26:00:22, 12.92s/it] 28%|██▊ | 2757/10000 [10:01:59<25:57:08, 12.90s/it] {'loss': 0.0071, 'learning_rate': 3.6245e-05, 'epoch': 1.04} 28%|██▊ | 2757/10000 [10:01:59<25:57:08, 12.90s/it] 28%|██▊ | 2758/10000 [10:02:12<25:56:40, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.624e-05, 'epoch': 1.04} 28%|██▊ | 2758/10000 [10:02:12<25:56:40, 12.90s/it] 28%|██▊ | 2759/10000 [10:02:25<25:56:16, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.6235e-05, 'epoch': 1.04} 28%|██▊ | 2759/10000 [10:02:25<25:56:16, 12.90s/it] 28%|██▊ | 2760/10000 [10:02:38<25:57:09, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.6230000000000004e-05, 'epoch': 1.04} 28%|██▊ | 2760/10000 [10:02:38<25:57:09, 12.90s/it] 28%|██▊ | 2761/10000 [10:02:51<25:57:25, 12.91s/it] {'loss': 0.0049, 'learning_rate': 3.6225000000000006e-05, 'epoch': 1.04} 28%|██▊ | 2761/10000 [10:02:51<25:57:25, 12.91s/it] 28%|██▊ | 2762/10000 [10:03:04<25:56:28, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.622e-05, 'epoch': 1.04} 28%|██▊ | 2762/10000 [10:03:04<25:56:28, 12.90s/it] 28%|██▊ | 2763/10000 [10:03:17<25:56:39, 12.91s/it] {'loss': 0.0048, 'learning_rate': 3.6215000000000005e-05, 'epoch': 1.04} 28%|██▊ | 2763/10000 
[10:03:17<25:56:39, 12.91s/it] 28%|██▊ | 2764/10000 [10:03:30<25:55:27, 12.90s/it] {'loss': 0.0063, 'learning_rate': 3.621e-05, 'epoch': 1.04} 28%|██▊ | 2764/10000 [10:03:30<25:55:27, 12.90s/it] 28%|██▊ | 2765/10000 [10:03:43<25:57:12, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.6205e-05, 'epoch': 1.04} 28%|██▊ | 2765/10000 [10:03:43<25:57:12, 12.91s/it] 28%|██▊ | 2766/10000 [10:03:55<25:57:29, 12.92s/it] {'loss': 0.0053, 'learning_rate': 3.62e-05, 'epoch': 1.04} 28%|██▊ | 2766/10000 [10:03:55<25:57:29, 12.92s/it] 28%|██▊ | 2767/10000 [10:04:08<25:57:49, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.6195e-05, 'epoch': 1.04} 28%|██▊ | 2767/10000 [10:04:08<25:57:49, 12.92s/it] 28%|██▊ | 2768/10000 [10:04:21<25:54:44, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.6190000000000004e-05, 'epoch': 1.04} 28%|██▊ | 2768/10000 [10:04:21<25:54:44, 12.90s/it] 28%|██▊ | 2769/10000 [10:04:34<25:55:25, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.6185e-05, 'epoch': 1.04} 28%|██▊ | 2769/10000 [10:04:34<25:55:25, 12.91s/it] 28%|██▊ | 2770/10000 [10:04:47<25:53:04, 12.89s/it] {'loss': 0.0047, 'learning_rate': 3.618e-05, 'epoch': 1.04} 28%|██▊ | 2770/10000 [10:04:47<25:53:04, 12.89s/it] 28%|██▊ | 2771/10000 [10:05:00<25:52:51, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.6175000000000005e-05, 'epoch': 1.04} 28%|██▊ | 2771/10000 [10:05:00<25:52:51, 12.89s/it] 28%|██▊ | 2772/10000 [10:05:13<25:52:11, 12.88s/it] {'loss': 0.005, 'learning_rate': 3.617e-05, 'epoch': 1.04} 28%|██▊ | 2772/10000 [10:05:13<25:52:11, 12.88s/it] 28%|██▊ | 2773/10000 [10:05:26<25:53:42, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.6165000000000004e-05, 'epoch': 1.04} 28%|██▊ | 2773/10000 [10:05:26<25:53:42, 12.90s/it] 28%|██▊ | 2774/10000 [10:05:39<25:52:00, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.616e-05, 'epoch': 1.05} 28%|██▊ | 2774/10000 [10:05:39<25:52:00, 12.89s/it] 28%|██▊ | 2775/10000 [10:05:51<25:52:30, 12.89s/it] {'loss': 0.0056, 'learning_rate': 3.6155e-05, 'epoch': 1.05} 28%|██▊ | 
2775/10000 [10:05:51<25:52:30, 12.89s/it] 28%|██▊ | 2776/10000 [10:06:04<25:51:05, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.615e-05, 'epoch': 1.05} 28%|██▊ | 2776/10000 [10:06:04<25:51:05, 12.88s/it] 28%|██▊ | 2777/10000 [10:06:17<25:50:27, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.6145e-05, 'epoch': 1.05} 28%|██▊ | 2777/10000 [10:06:17<25:50:27, 12.88s/it] 28%|██▊ | 2778/10000 [10:06:30<25:51:07, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.614e-05, 'epoch': 1.05} 28%|██▊ | 2778/10000 [10:06:30<25:51:07, 12.89s/it] 28%|██▊ | 2779/10000 [10:06:43<25:54:26, 12.92s/it] {'loss': 0.0045, 'learning_rate': 3.6135000000000006e-05, 'epoch': 1.05} 28%|██▊ | 2779/10000 [10:06:43<25:54:26, 12.92s/it] 28%|██▊ | 2780/10000 [10:06:56<25:52:24, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.613e-05, 'epoch': 1.05} 28%|██▊ | 2780/10000 [10:06:56<25:52:24, 12.90s/it] 28%|██▊ | 2781/10000 [10:07:09<25:50:12, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.6125000000000004e-05, 'epoch': 1.05} 28%|██▊ | 2781/10000 [10:07:09<25:50:12, 12.88s/it] 28%|██▊ | 2782/10000 [10:07:22<25:49:12, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.6120000000000007e-05, 'epoch': 1.05} 28%|██▊ | 2782/10000 [10:07:22<25:49:12, 12.88s/it] 28%|██▊ | 2783/10000 [10:07:35<25:51:12, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.6115e-05, 'epoch': 1.05} 28%|██▊ | 2783/10000 [10:07:35<25:51:12, 12.90s/it] 28%|██▊ | 2784/10000 [10:07:47<25:48:27, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.611e-05, 'epoch': 1.05} 28%|██▊ | 2784/10000 [10:07:47<25:48:27, 12.88s/it] 28%|██▊ | 2785/10000 [10:08:00<25:49:17, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.6105e-05, 'epoch': 1.05} 28%|██▊ | 2785/10000 [10:08:00<25:49:17, 12.88s/it] 28%|██▊ | 2786/10000 [10:08:13<25:46:55, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.61e-05, 'epoch': 1.05} 28%|██▊ | 2786/10000 [10:08:13<25:46:55, 12.87s/it] 28%|██▊ | 2787/10000 [10:08:26<25:45:45, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.6095e-05, 'epoch': 1.05} 
28%|██▊ | 2787/10000 [10:08:26<25:45:45, 12.86s/it] 28%|██▊ | 2788/10000 [10:08:39<25:47:41, 12.88s/it] {'loss': 0.0064, 'learning_rate': 3.609e-05, 'epoch': 1.05} 28%|██▊ | 2788/10000 [10:08:39<25:47:41, 12.88s/it] 28%|██▊ | 2789/10000 [10:08:52<25:47:15, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.6085000000000004e-05, 'epoch': 1.05} 28%|██▊ | 2789/10000 [10:08:52<25:47:15, 12.87s/it] 28%|██▊ | 2790/10000 [10:09:05<25:46:25, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.608e-05, 'epoch': 1.05} 28%|██▊ | 2790/10000 [10:09:05<25:46:25, 12.87s/it] 28%|██▊ | 2791/10000 [10:09:18<25:47:19, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.6075e-05, 'epoch': 1.05} 28%|██▊ | 2791/10000 [10:09:18<25:47:19, 12.88s/it] 28%|██▊ | 2792/10000 [10:09:30<25:43:51, 12.85s/it] {'loss': 0.0057, 'learning_rate': 3.6070000000000005e-05, 'epoch': 1.05} 28%|██▊ | 2792/10000 [10:09:30<25:43:51, 12.85s/it] 28%|██▊ | 2793/10000 [10:09:43<25:47:38, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.6065e-05, 'epoch': 1.05} 28%|██▊ | 2793/10000 [10:09:43<25:47:38, 12.88s/it] 28%|██▊ | 2794/10000 [10:09:56<25:49:53, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.606e-05, 'epoch': 1.05} 28%|██▊ | 2794/10000 [10:09:56<25:49:53, 12.90s/it] 28%|██▊ | 2795/10000 [10:10:09<25:49:06, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.6055e-05, 'epoch': 1.05} 28%|██▊ | 2795/10000 [10:10:09<25:49:06, 12.90s/it] 28%|██▊ | 2796/10000 [10:10:22<25:47:18, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.605e-05, 'epoch': 1.05} 28%|██▊ | 2796/10000 [10:10:22<25:47:18, 12.89s/it] 28%|██▊ | 2797/10000 [10:10:35<25:45:52, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.6045e-05, 'epoch': 1.05} 28%|██▊ | 2797/10000 [10:10:35<25:45:52, 12.88s/it] 28%|██▊ | 2798/10000 [10:10:48<25:47:25, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.604e-05, 'epoch': 1.05} 28%|██▊ | 2798/10000 [10:10:48<25:47:25, 12.89s/it] 28%|██▊ | 2799/10000 [10:11:01<25:44:08, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.6035e-05, 'epoch': 1.05} 
28%|██▊ | 2799/10000 [10:11:01<25:44:08, 12.87s/it] 28%|██▊ | 2800/10000 [10:11:13<25:44:38, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.6030000000000006e-05, 'epoch': 1.06} 28%|██▊ | 2800/10000 [10:11:13<25:44:38, 12.87s/it] 28%|██▊ | 2801/10000 [10:11:26<25:45:19, 12.88s/it] {'loss': 0.0059, 'learning_rate': 3.6025e-05, 'epoch': 1.06} 28%|██▊ | 2801/10000 [10:11:26<25:45:19, 12.88s/it] 28%|██▊ | 2802/10000 [10:11:39<25:44:01, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.6020000000000004e-05, 'epoch': 1.06} 28%|██▊ | 2802/10000 [10:11:39<25:44:01, 12.87s/it] 28%|██▊ | 2803/10000 [10:11:52<25:47:15, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.601500000000001e-05, 'epoch': 1.06} 28%|██▊ | 2803/10000 [10:11:52<25:47:15, 12.90s/it] 28%|██▊ | 2804/10000 [10:12:05<25:45:53, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.601e-05, 'epoch': 1.06} 28%|██▊ | 2804/10000 [10:12:05<25:45:53, 12.89s/it] 28%|██▊ | 2805/10000 [10:12:18<25:42:36, 12.86s/it] {'loss': 0.0069, 'learning_rate': 3.6005e-05, 'epoch': 1.06} 28%|██▊ | 2805/10000 [10:12:18<25:42:36, 12.86s/it] 28%|██▊ | 2806/10000 [10:12:31<25:40:31, 12.85s/it] {'loss': 0.006, 'learning_rate': 3.6e-05, 'epoch': 1.06} 28%|██▊ | 2806/10000 [10:12:31<25:40:31, 12.85s/it] 28%|██▊ | 2807/10000 [10:12:44<25:44:27, 12.88s/it] {'loss': 0.0059, 'learning_rate': 3.5995000000000004e-05, 'epoch': 1.06} 28%|██▊ | 2807/10000 [10:12:44<25:44:27, 12.88s/it] 28%|██▊ | 2808/10000 [10:12:56<25:41:04, 12.86s/it] {'loss': 0.0142, 'learning_rate': 3.599e-05, 'epoch': 1.06} 28%|██▊ | 2808/10000 [10:12:56<25:41:04, 12.86s/it] 28%|██▊ | 2809/10000 [10:13:09<25:40:23, 12.85s/it] {'loss': 0.0041, 'learning_rate': 3.5985e-05, 'epoch': 1.06} 28%|██▊ | 2809/10000 [10:13:09<25:40:23, 12.85s/it] 28%|██▊ | 2810/10000 [10:13:22<25:39:07, 12.84s/it] {'loss': 0.0054, 'learning_rate': 3.5980000000000004e-05, 'epoch': 1.06} 28%|██▊ | 2810/10000 [10:13:22<25:39:07, 12.84s/it] 28%|██▊ | 2811/10000 [10:13:35<25:38:50, 12.84s/it] {'loss': 0.0051, 
'learning_rate': 3.5975e-05, 'epoch': 1.06} 28%|██▊ | 2811/10000 [10:13:35<25:38:50, 12.84s/it] 28%|██▊ | 2812/10000 [10:13:48<25:38:56, 12.85s/it] {'loss': 0.0069, 'learning_rate': 3.597e-05, 'epoch': 1.06} 28%|██▊ | 2812/10000 [10:13:48<25:38:56, 12.85s/it] 28%|██▊ | 2813/10000 [10:14:01<25:38:12, 12.84s/it] {'loss': 0.0077, 'learning_rate': 3.5965000000000005e-05, 'epoch': 1.06} 28%|██▊ | 2813/10000 [10:14:01<25:38:12, 12.84s/it] 28%|██▊ | 2814/10000 [10:14:14<25:41:06, 12.87s/it] {'loss': 0.0059, 'learning_rate': 3.596e-05, 'epoch': 1.06} 28%|██▊ | 2814/10000 [10:14:14<25:41:06, 12.87s/it] 28%|██▊ | 2815/10000 [10:14:26<25:39:37, 12.86s/it] {'loss': 0.0046, 'learning_rate': 3.5955e-05, 'epoch': 1.06} 28%|██▊ | 2815/10000 [10:14:26<25:39:37, 12.86s/it] 28%|██▊ | 2816/10000 [10:14:39<25:40:48, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.595e-05, 'epoch': 1.06} 28%|██▊ | 2816/10000 [10:14:39<25:40:48, 12.87s/it] 28%|██▊ | 2817/10000 [10:14:52<25:43:58, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.5945e-05, 'epoch': 1.06} 28%|██▊ | 2817/10000 [10:14:52<25:43:58, 12.90s/it] 28%|██▊ | 2818/10000 [10:15:05<25:42:43, 12.89s/it] {'loss': 0.0054, 'learning_rate': 3.594e-05, 'epoch': 1.06} 28%|██▊ | 2818/10000 [10:15:05<25:42:43, 12.89s/it] 28%|██▊ | 2819/10000 [10:15:18<25:43:21, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.5935e-05, 'epoch': 1.06} 28%|██▊ | 2819/10000 [10:15:18<25:43:21, 12.90s/it] 28%|██▊ | 2820/10000 [10:15:31<25:43:16, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.593e-05, 'epoch': 1.06} 28%|██▊ | 2820/10000 [10:15:31<25:43:16, 12.90s/it] 28%|██▊ | 2821/10000 [10:15:44<25:43:17, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.5925000000000006e-05, 'epoch': 1.06} 28%|██▊ | 2821/10000 [10:15:44<25:43:17, 12.90s/it] 28%|██▊ | 2822/10000 [10:15:57<25:39:45, 12.87s/it] {'loss': 0.0044, 'learning_rate': 3.592e-05, 'epoch': 1.06} 28%|██▊ | 2822/10000 [10:15:57<25:39:45, 12.87s/it] 28%|██▊ | 2823/10000 [10:16:09<25:39:34, 12.87s/it] {'loss': 0.0043, 
'learning_rate': 3.5915000000000004e-05, 'epoch': 1.06} 28%|██▊ | 2823/10000 [10:16:09<25:39:34, 12.87s/it] 28%|██▊ | 2824/10000 [10:16:22<25:39:45, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.591e-05, 'epoch': 1.06} 28%|██▊ | 2824/10000 [10:16:22<25:39:45, 12.87s/it] 28%|██▊ | 2825/10000 [10:16:35<25:38:44, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.5905e-05, 'epoch': 1.06} 28%|██▊ | 2825/10000 [10:16:35<25:38:44, 12.87s/it] 28%|██▊ | 2826/10000 [10:16:48<25:42:15, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.59e-05, 'epoch': 1.06} 28%|██▊ | 2826/10000 [10:16:48<25:42:15, 12.90s/it] 28%|██▊ | 2827/10000 [10:17:01<25:44:58, 12.92s/it] {'loss': 0.0042, 'learning_rate': 3.5895e-05, 'epoch': 1.07} 28%|██▊ | 2827/10000 [10:17:01<25:44:58, 12.92s/it] 28%|██▊ | 2828/10000 [10:17:14<25:44:12, 12.92s/it] {'loss': 0.006, 'learning_rate': 3.5890000000000004e-05, 'epoch': 1.07} 28%|██▊ | 2828/10000 [10:17:14<25:44:12, 12.92s/it] 28%|██▊ | 2829/10000 [10:17:27<25:47:12, 12.95s/it] {'loss': 0.0054, 'learning_rate': 3.5885e-05, 'epoch': 1.07} 28%|██▊ | 2829/10000 [10:17:27<25:47:12, 12.95s/it] 28%|██▊ | 2830/10000 [10:17:40<25:44:38, 12.93s/it] {'loss': 0.0054, 'learning_rate': 3.588e-05, 'epoch': 1.07} 28%|██▊ | 2830/10000 [10:17:40<25:44:38, 12.93s/it] 28%|██▊ | 2831/10000 [10:17:53<25:44:21, 12.93s/it] {'loss': 0.0057, 'learning_rate': 3.5875000000000005e-05, 'epoch': 1.07} 28%|██▊ | 2831/10000 [10:17:53<25:44:21, 12.93s/it] 28%|██▊ | 2832/10000 [10:18:06<25:43:21, 12.92s/it] {'loss': 0.0051, 'learning_rate': 3.587e-05, 'epoch': 1.07} 28%|██▊ | 2832/10000 [10:18:06<25:43:21, 12.92s/it] 28%|██▊ | 2833/10000 [10:18:19<25:41:23, 12.90s/it] {'loss': 0.0042, 'learning_rate': 3.5865e-05, 'epoch': 1.07} 28%|██▊ | 2833/10000 [10:18:19<25:41:23, 12.90s/it] 28%|██▊ | 2834/10000 [10:18:32<25:41:20, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.586e-05, 'epoch': 1.07} 28%|██▊ | 2834/10000 [10:18:32<25:41:20, 12.91s/it] 28%|██▊ | 2835/10000 [10:18:44<25:41:24, 12.91s/it] 
{'loss': 0.0046, 'learning_rate': 3.5855e-05, 'epoch': 1.07} 28%|██▊ | 2835/10000 [10:18:44<25:41:24, 12.91s/it] 28%|██▊ | 2836/10000 [10:18:57<25:40:14, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.585e-05, 'epoch': 1.07} 28%|██▊ | 2836/10000 [10:18:57<25:40:14, 12.90s/it] 28%|██▊ | 2837/10000 [10:19:10<25:38:40, 12.89s/it] {'loss': 0.0046, 'learning_rate': 3.5845e-05, 'epoch': 1.07} 28%|██▊ | 2837/10000 [10:19:10<25:38:40, 12.89s/it] 28%|██▊ | 2838/10000 [10:19:23<25:37:04, 12.88s/it] {'loss': 0.0054, 'learning_rate': 3.584e-05, 'epoch': 1.07} 28%|██▊ | 2838/10000 [10:19:23<25:37:04, 12.88s/it] 28%|██▊ | 2839/10000 [10:19:36<25:34:13, 12.85s/it] {'loss': 0.0054, 'learning_rate': 3.5835000000000005e-05, 'epoch': 1.07} 28%|██▊ | 2839/10000 [10:19:36<25:34:13, 12.85s/it] 28%|██▊ | 2840/10000 [10:19:49<25:35:21, 12.87s/it] {'loss': 0.0034, 'learning_rate': 3.583e-05, 'epoch': 1.07} 28%|██▊ | 2840/10000 [10:19:49<25:35:21, 12.87s/it] 28%|██▊ | 2841/10000 [10:20:02<25:34:00, 12.86s/it] {'loss': 0.0058, 'learning_rate': 3.5825000000000003e-05, 'epoch': 1.07} 28%|██▊ | 2841/10000 [10:20:02<25:34:00, 12.86s/it] 28%|██▊ | 2842/10000 [10:20:14<25:32:24, 12.85s/it] {'loss': 0.0056, 'learning_rate': 3.5820000000000006e-05, 'epoch': 1.07} 28%|██▊ | 2842/10000 [10:20:14<25:32:24, 12.85s/it] 28%|██▊ | 2843/10000 [10:20:27<25:36:14, 12.88s/it] {'loss': 0.0059, 'learning_rate': 3.5815e-05, 'epoch': 1.07} 28%|██▊ | 2843/10000 [10:20:27<25:36:14, 12.88s/it] 28%|██▊ | 2844/10000 [10:20:40<25:33:58, 12.86s/it] {'loss': 0.0059, 'learning_rate': 3.581e-05, 'epoch': 1.07} 28%|██▊ | 2844/10000 [10:20:40<25:33:58, 12.86s/it] 28%|██▊ | 2845/10000 [10:20:53<25:32:24, 12.85s/it] {'loss': 0.0059, 'learning_rate': 3.5805e-05, 'epoch': 1.07} 28%|██▊ | 2845/10000 [10:20:53<25:32:24, 12.85s/it] 28%|██▊ | 2846/10000 [10:21:06<25:30:23, 12.84s/it] {'loss': 0.0053, 'learning_rate': 3.58e-05, 'epoch': 1.07} 28%|██▊ | 2846/10000 [10:21:06<25:30:23, 12.84s/it] 28%|██▊ | 2847/10000 [10:21:19<25:30:37, 
12.84s/it] {'loss': 0.0047, 'learning_rate': 3.5795e-05, 'epoch': 1.07} 28%|██▊ | 2847/10000 [10:21:19<25:30:37, 12.84s/it] 28%|██▊ | 2848/10000 [10:21:32<25:30:49, 12.84s/it] {'loss': 0.0055, 'learning_rate': 3.579e-05, 'epoch': 1.07} 28%|██▊ | 2848/10000 [10:21:32<25:30:49, 12.84s/it] 28%|██▊ | 2849/10000 [10:21:44<25:28:35, 12.83s/it] {'loss': 0.005, 'learning_rate': 3.5785000000000004e-05, 'epoch': 1.07} 28%|██▊ | 2849/10000 [10:21:44<25:28:35, 12.83s/it] 28%|██▊ | 2850/10000 [10:21:57<25:30:18, 12.84s/it] {'loss': 0.004, 'learning_rate': 3.578e-05, 'epoch': 1.07} 28%|██▊ | 2850/10000 [10:21:57<25:30:18, 12.84s/it] 29%|██▊ | 2851/10000 [10:22:10<25:34:20, 12.88s/it] {'loss': 0.0039, 'learning_rate': 3.5775e-05, 'epoch': 1.07} 29%|██▊ | 2851/10000 [10:22:10<25:34:20, 12.88s/it] 29%|██▊ | 2852/10000 [10:22:23<25:35:55, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.5770000000000005e-05, 'epoch': 1.07} 29%|██▊ | 2852/10000 [10:22:23<25:35:55, 12.89s/it] 29%|██▊ | 2853/10000 [10:22:36<25:35:15, 12.89s/it] {'loss': 0.0068, 'learning_rate': 3.576500000000001e-05, 'epoch': 1.07} 29%|██▊ | 2853/10000 [10:22:36<25:35:15, 12.89s/it] 29%|██▊ | 2854/10000 [10:22:49<25:33:18, 12.87s/it] {'loss': 0.0065, 'learning_rate': 3.5759999999999996e-05, 'epoch': 1.08} 29%|██▊ | 2854/10000 [10:22:49<25:33:18, 12.87s/it] 29%|██▊ | 2855/10000 [10:23:02<25:33:28, 12.88s/it] {'loss': 0.0059, 'learning_rate': 3.5755e-05, 'epoch': 1.08} 29%|██▊ | 2855/10000 [10:23:02<25:33:28, 12.88s/it] 29%|██▊ | 2856/10000 [10:23:15<25:33:36, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.575e-05, 'epoch': 1.08} 29%|██▊ | 2856/10000 [10:23:15<25:33:36, 12.88s/it] 29%|██▊ | 2857/10000 [10:23:27<25:30:36, 12.86s/it] {'loss': 0.0064, 'learning_rate': 3.5745e-05, 'epoch': 1.08} 29%|██▊ | 2857/10000 [10:23:27<25:30:36, 12.86s/it] 29%|██▊ | 2858/10000 [10:23:40<25:32:43, 12.88s/it] {'loss': 0.0059, 'learning_rate': 3.574e-05, 'epoch': 1.08} 29%|██▊ | 2858/10000 [10:23:40<25:32:43, 12.88s/it] 29%|██▊ | 
2859/10000 [10:23:53<25:32:03, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.5735e-05, 'epoch': 1.08} 29%|██▊ | 2859/10000 [10:23:53<25:32:03, 12.87s/it] 29%|██▊ | 2860/10000 [10:24:06<25:31:57, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.5730000000000005e-05, 'epoch': 1.08} 29%|██▊ | 2860/10000 [10:24:06<25:31:57, 12.87s/it] 29%|██▊ | 2861/10000 [10:24:19<25:33:32, 12.89s/it] {'loss': 0.008, 'learning_rate': 3.5725e-05, 'epoch': 1.08} 29%|██▊ | 2861/10000 [10:24:19<25:33:32, 12.89s/it] 29%|██▊ | 2862/10000 [10:24:32<25:34:07, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.5720000000000004e-05, 'epoch': 1.08} 29%|██▊ | 2862/10000 [10:24:32<25:34:07, 12.90s/it] 29%|██▊ | 2863/10000 [10:24:45<25:33:23, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.5715000000000006e-05, 'epoch': 1.08} 29%|██▊ | 2863/10000 [10:24:45<25:33:23, 12.89s/it] 29%|██▊ | 2864/10000 [10:24:58<25:32:11, 12.88s/it] {'loss': 0.0061, 'learning_rate': 3.571e-05, 'epoch': 1.08} 29%|██▊ | 2864/10000 [10:24:58<25:32:11, 12.88s/it] 29%|██▊ | 2865/10000 [10:25:10<25:32:10, 12.88s/it] {'loss': 0.0067, 'learning_rate': 3.5705e-05, 'epoch': 1.08} 29%|██▊ | 2865/10000 [10:25:11<25:32:10, 12.88s/it] 29%|██▊ | 2866/10000 [10:25:23<25:33:14, 12.90s/it] {'loss': 0.0041, 'learning_rate': 3.57e-05, 'epoch': 1.08} 29%|██▊ | 2866/10000 [10:25:23<25:33:14, 12.90s/it] 29%|██▊ | 2867/10000 [10:25:36<25:33:10, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.5695e-05, 'epoch': 1.08} 29%|██▊ | 2867/10000 [10:25:36<25:33:10, 12.90s/it] 29%|██▊ | 2868/10000 [10:25:49<25:29:45, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.569e-05, 'epoch': 1.08} 29%|██▊ | 2868/10000 [10:25:49<25:29:45, 12.87s/it] 29%|██▊ | 2869/10000 [10:26:02<25:28:07, 12.86s/it] {'loss': 0.0048, 'learning_rate': 3.5685e-05, 'epoch': 1.08} 29%|██▊ | 2869/10000 [10:26:02<25:28:07, 12.86s/it] 29%|██▊ | 2870/10000 [10:26:15<25:27:51, 12.86s/it] {'loss': 0.0038, 'learning_rate': 3.5680000000000004e-05, 'epoch': 1.08} 29%|██▊ | 2870/10000 
[10:26:15<25:27:51, 12.86s/it] 29%|██▊ | 2871/10000 [10:26:28<25:29:01, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.5675e-05, 'epoch': 1.08} 29%|██▊ | 2871/10000 [10:26:28<25:29:01, 12.87s/it] 29%|██▊ | 2872/10000 [10:26:41<25:28:47, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.567e-05, 'epoch': 1.08} 29%|██▊ | 2872/10000 [10:26:41<25:28:47, 12.87s/it] 29%|██▊ | 2873/10000 [10:26:53<25:28:41, 12.87s/it] {'loss': 0.004, 'learning_rate': 3.5665000000000005e-05, 'epoch': 1.08} 29%|██▊ | 2873/10000 [10:26:53<25:28:41, 12.87s/it] 29%|██▊ | 2874/10000 [10:27:06<25:27:40, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.566e-05, 'epoch': 1.08} 29%|██▊ | 2874/10000 [10:27:06<25:27:40, 12.86s/it] 29%|██▉ | 2875/10000 [10:27:19<25:27:15, 12.86s/it] {'loss': 0.0045, 'learning_rate': 3.5654999999999997e-05, 'epoch': 1.08} 29%|██▉ | 2875/10000 [10:27:19<25:27:15, 12.86s/it] 29%|██▉ | 2876/10000 [10:27:32<25:26:51, 12.86s/it] {'loss': 0.0045, 'learning_rate': 3.565e-05, 'epoch': 1.08} 29%|██▉ | 2876/10000 [10:27:32<25:26:51, 12.86s/it] 29%|██▉ | 2877/10000 [10:27:45<25:30:28, 12.89s/it] {'loss': 0.0032, 'learning_rate': 3.5645e-05, 'epoch': 1.08} 29%|██▉ | 2877/10000 [10:27:45<25:30:28, 12.89s/it] 29%|██▉ | 2878/10000 [10:27:58<25:31:12, 12.90s/it] {'loss': 0.0046, 'learning_rate': 3.5640000000000004e-05, 'epoch': 1.08} 29%|██▉ | 2878/10000 [10:27:58<25:31:12, 12.90s/it] 29%|██▉ | 2879/10000 [10:28:11<25:29:42, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.5635e-05, 'epoch': 1.08} 29%|██▉ | 2879/10000 [10:28:11<25:29:42, 12.89s/it] 29%|██▉ | 2880/10000 [10:28:24<25:30:07, 12.89s/it] {'loss': 0.0054, 'learning_rate': 3.563e-05, 'epoch': 1.09} 29%|██▉ | 2880/10000 [10:28:24<25:30:07, 12.89s/it] 29%|██▉ | 2881/10000 [10:28:37<25:30:19, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.5625000000000005e-05, 'epoch': 1.09} 29%|██▉ | 2881/10000 [10:28:37<25:30:19, 12.90s/it] 29%|██▉ | 2882/10000 [10:28:49<25:28:24, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.562e-05, 'epoch': 1.09} 
29%|██▉ | 2882/10000 [10:28:49<25:28:24, 12.88s/it] 29%|██▉ | 2883/10000 [10:29:02<25:29:29, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.5615000000000004e-05, 'epoch': 1.09} 29%|██▉ | 2883/10000 [10:29:02<25:29:29, 12.89s/it] 29%|██▉ | 2884/10000 [10:29:15<25:27:16, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.5610000000000006e-05, 'epoch': 1.09} 29%|██▉ | 2884/10000 [10:29:15<25:27:16, 12.88s/it] 29%|██▉ | 2885/10000 [10:29:28<25:28:41, 12.89s/it] {'loss': 0.0047, 'learning_rate': 3.5605e-05, 'epoch': 1.09} 29%|██▉ | 2885/10000 [10:29:28<25:28:41, 12.89s/it] 29%|██▉ | 2886/10000 [10:29:41<25:27:09, 12.88s/it] {'loss': 0.0043, 'learning_rate': 3.56e-05, 'epoch': 1.09} 29%|██▉ | 2886/10000 [10:29:41<25:27:09, 12.88s/it] 29%|██▉ | 2887/10000 [10:29:54<25:29:53, 12.91s/it] {'loss': 0.0037, 'learning_rate': 3.5595e-05, 'epoch': 1.09} 29%|██▉ | 2887/10000 [10:29:54<25:29:53, 12.91s/it] 29%|██▉ | 2888/10000 [10:30:07<25:30:06, 12.91s/it] {'loss': 0.004, 'learning_rate': 3.559e-05, 'epoch': 1.09} 29%|██▉ | 2888/10000 [10:30:07<25:30:06, 12.91s/it] 29%|██▉ | 2889/10000 [10:30:20<25:30:24, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.5585e-05, 'epoch': 1.09} 29%|██▉ | 2889/10000 [10:30:20<25:30:24, 12.91s/it] 29%|██▉ | 2890/10000 [10:30:33<25:26:37, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.558e-05, 'epoch': 1.09} 29%|██▉ | 2890/10000 [10:30:33<25:26:37, 12.88s/it] 29%|██▉ | 2891/10000 [10:30:45<25:24:30, 12.87s/it] {'loss': 0.0066, 'learning_rate': 3.5575000000000004e-05, 'epoch': 1.09} 29%|██▉ | 2891/10000 [10:30:45<25:24:30, 12.87s/it] 29%|██▉ | 2892/10000 [10:30:58<25:23:51, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.557e-05, 'epoch': 1.09} 29%|██▉ | 2892/10000 [10:30:58<25:23:51, 12.86s/it] 29%|██▉ | 2893/10000 [10:31:11<25:25:14, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.5565e-05, 'epoch': 1.09} 29%|██▉ | 2893/10000 [10:31:11<25:25:14, 12.88s/it] 29%|██▉ | 2894/10000 [10:31:24<25:25:22, 12.88s/it] {'loss': 0.0049, 'learning_rate': 
3.5560000000000005e-05, 'epoch': 1.09} 29%|██▉ | 2894/10000 [10:31:24<25:25:22, 12.88s/it] 29%|██▉ | 2895/10000 [10:31:37<25:26:04, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.5555e-05, 'epoch': 1.09} 29%|██▉ | 2895/10000 [10:31:37<25:26:04, 12.89s/it] 29%|██▉ | 2896/10000 [10:31:50<25:25:10, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.555e-05, 'epoch': 1.09} 29%|██▉ | 2896/10000 [10:31:50<25:25:10, 12.88s/it] 29%|██▉ | 2897/10000 [10:32:03<25:24:43, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.5545e-05, 'epoch': 1.09} 29%|██▉ | 2897/10000 [10:32:03<25:24:43, 12.88s/it] 29%|██▉ | 2898/10000 [10:32:16<25:25:02, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.554e-05, 'epoch': 1.09} 29%|██▉ | 2898/10000 [10:32:16<25:25:02, 12.88s/it] 29%|██▉ | 2899/10000 [10:32:28<25:24:33, 12.88s/it] {'loss': 0.0057, 'learning_rate': 3.5535000000000005e-05, 'epoch': 1.09} 29%|██▉ | 2899/10000 [10:32:28<25:24:33, 12.88s/it] 29%|██▉ | 2900/10000 [10:32:41<25:24:44, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.553e-05, 'epoch': 1.09} 29%|██▉ | 2900/10000 [10:32:41<25:24:44, 12.89s/it] 29%|██▉ | 2901/10000 [10:32:54<25:23:07, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.5525e-05, 'epoch': 1.09} 29%|██▉ | 2901/10000 [10:32:54<25:23:07, 12.87s/it] 29%|██▉ | 2902/10000 [10:33:07<25:22:40, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.5520000000000006e-05, 'epoch': 1.09} 29%|██▉ | 2902/10000 [10:33:07<25:22:40, 12.87s/it] 29%|██▉ | 2903/10000 [10:33:20<25:20:07, 12.85s/it] {'loss': 0.0058, 'learning_rate': 3.5515e-05, 'epoch': 1.09} 29%|██▉ | 2903/10000 [10:33:20<25:20:07, 12.85s/it] 29%|██▉ | 2904/10000 [10:33:33<25:21:10, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.5510000000000004e-05, 'epoch': 1.09} 29%|██▉ | 2904/10000 [10:33:33<25:21:10, 12.86s/it] 29%|██▉ | 2905/10000 [10:33:46<25:21:09, 12.86s/it] {'loss': 0.0053, 'learning_rate': 3.5505e-05, 'epoch': 1.09} 29%|██▉ | 2905/10000 [10:33:46<25:21:09, 12.86s/it] 29%|██▉ | 2906/10000 [10:33:58<25:21:25, 12.87s/it] {'loss': 
0.0053, 'learning_rate': 3.55e-05, 'epoch': 1.09} 29%|██▉ | 2906/10000 [10:33:59<25:21:25, 12.87s/it] 29%|██▉ | 2907/10000 [10:34:11<25:20:36, 12.86s/it] {'loss': 0.0053, 'learning_rate': 3.5495e-05, 'epoch': 1.1} 29%|██▉ | 2907/10000 [10:34:11<25:20:36, 12.86s/it] 29%|██▉ | 2908/10000 [10:34:24<25:21:16, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.549e-05, 'epoch': 1.1} 29%|██▉ | 2908/10000 [10:34:24<25:21:16, 12.87s/it] 29%|██▉ | 2909/10000 [10:34:37<25:20:27, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.5485e-05, 'epoch': 1.1} 29%|██▉ | 2909/10000 [10:34:37<25:20:27, 12.87s/it] 29%|██▉ | 2910/10000 [10:34:50<25:20:24, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.548e-05, 'epoch': 1.1} 29%|██▉ | 2910/10000 [10:34:50<25:20:24, 12.87s/it] 29%|██▉ | 2911/10000 [10:35:03<25:22:50, 12.89s/it] {'loss': 0.0068, 'learning_rate': 3.5475e-05, 'epoch': 1.1} 29%|██▉ | 2911/10000 [10:35:03<25:22:50, 12.89s/it] 29%|██▉ | 2912/10000 [10:35:16<25:21:47, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.5470000000000004e-05, 'epoch': 1.1} 29%|██▉ | 2912/10000 [10:35:16<25:21:47, 12.88s/it] 29%|██▉ | 2913/10000 [10:35:29<25:17:53, 12.85s/it] {'loss': 0.0053, 'learning_rate': 3.546500000000001e-05, 'epoch': 1.1} 29%|██▉ | 2913/10000 [10:35:29<25:17:53, 12.85s/it] 29%|██▉ | 2914/10000 [10:35:41<25:18:20, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.546e-05, 'epoch': 1.1} 29%|██▉ | 2914/10000 [10:35:41<25:18:20, 12.86s/it] 29%|██▉ | 2915/10000 [10:35:54<25:16:55, 12.85s/it] {'loss': 0.0053, 'learning_rate': 3.5455e-05, 'epoch': 1.1} 29%|██▉ | 2915/10000 [10:35:54<25:16:55, 12.85s/it] 29%|██▉ | 2916/10000 [10:36:07<25:19:02, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.545e-05, 'epoch': 1.1} 29%|██▉ | 2916/10000 [10:36:07<25:19:02, 12.87s/it] 29%|██▉ | 2917/10000 [10:36:20<25:19:29, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.5445000000000004e-05, 'epoch': 1.1} 29%|██▉ | 2917/10000 [10:36:20<25:19:29, 12.87s/it] 29%|██▉ | 2918/10000 [10:36:33<25:17:53, 12.86s/it] {'loss': 
0.0051, 'learning_rate': 3.544e-05, 'epoch': 1.1} 29%|██▉ | 2918/10000 [10:36:33<25:17:53, 12.86s/it] 29%|██▉ | 2919/10000 [10:36:46<25:17:13, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.5435e-05, 'epoch': 1.1} 29%|██▉ | 2919/10000 [10:36:46<25:17:13, 12.86s/it] 29%|██▉ | 2920/10000 [10:36:59<25:16:43, 12.85s/it] {'loss': 0.005, 'learning_rate': 3.5430000000000005e-05, 'epoch': 1.1} 29%|██▉ | 2920/10000 [10:36:59<25:16:43, 12.85s/it] 29%|██▉ | 2921/10000 [10:37:11<25:17:54, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.5425e-05, 'epoch': 1.1} 29%|██▉ | 2921/10000 [10:37:11<25:17:54, 12.87s/it] 29%|██▉ | 2922/10000 [10:37:24<25:16:45, 12.86s/it] {'loss': 0.0045, 'learning_rate': 3.542e-05, 'epoch': 1.1} 29%|██▉ | 2922/10000 [10:37:24<25:16:45, 12.86s/it] 29%|██▉ | 2923/10000 [10:37:37<25:21:34, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.5415000000000006e-05, 'epoch': 1.1} 29%|██▉ | 2923/10000 [10:37:37<25:21:34, 12.90s/it] 29%|██▉ | 2924/10000 [10:37:50<25:18:23, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.541e-05, 'epoch': 1.1} 29%|██▉ | 2924/10000 [10:37:50<25:18:23, 12.87s/it] 29%|██▉ | 2925/10000 [10:38:03<25:20:48, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.5405e-05, 'epoch': 1.1} 29%|██▉ | 2925/10000 [10:38:03<25:20:48, 12.90s/it] 29%|██▉ | 2926/10000 [10:38:16<25:19:54, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.54e-05, 'epoch': 1.1} 29%|██▉ | 2926/10000 [10:38:16<25:19:54, 12.89s/it] 29%|██▉ | 2927/10000 [10:38:29<25:16:52, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.5395e-05, 'epoch': 1.1} 29%|██▉ | 2927/10000 [10:38:29<25:16:52, 12.87s/it] 29%|██▉ | 2928/10000 [10:38:42<25:18:32, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.539e-05, 'epoch': 1.1} 29%|██▉ | 2928/10000 [10:38:42<25:18:32, 12.88s/it] 29%|██▉ | 2929/10000 [10:38:55<25:18:30, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.5385e-05, 'epoch': 1.1} 29%|██▉ | 2929/10000 [10:38:55<25:18:30, 12.89s/it] 29%|██▉ | 2930/10000 [10:39:07<25:17:14, 12.88s/it] {'loss': 0.0059, 
'learning_rate': 3.5380000000000003e-05, 'epoch': 1.1} 29%|██▉ | 2930/10000 [10:39:07<25:17:14, 12.88s/it] 29%|██▉ | 2931/10000 [10:39:20<25:18:53, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.5375e-05, 'epoch': 1.1} 29%|██▉ | 2931/10000 [10:39:20<25:18:53, 12.89s/it] 29%|██▉ | 2932/10000 [10:39:33<25:18:23, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.537e-05, 'epoch': 1.1} 29%|██▉ | 2932/10000 [10:39:33<25:18:23, 12.89s/it] 29%|██▉ | 2933/10000 [10:39:46<25:17:48, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.5365000000000004e-05, 'epoch': 1.11} 29%|██▉ | 2933/10000 [10:39:46<25:17:48, 12.89s/it] 29%|██▉ | 2934/10000 [10:39:59<25:16:27, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.536000000000001e-05, 'epoch': 1.11} 29%|██▉ | 2934/10000 [10:39:59<25:16:27, 12.88s/it] 29%|██▉ | 2935/10000 [10:40:12<25:17:52, 12.89s/it] {'loss': 0.0044, 'learning_rate': 3.5354999999999996e-05, 'epoch': 1.11} 29%|██▉ | 2935/10000 [10:40:12<25:17:52, 12.89s/it] 29%|██▉ | 2936/10000 [10:40:25<25:16:57, 12.88s/it] {'loss': 0.0055, 'learning_rate': 3.535e-05, 'epoch': 1.11} 29%|██▉ | 2936/10000 [10:40:25<25:16:57, 12.88s/it] 29%|██▉ | 2937/10000 [10:40:38<25:15:56, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.5345e-05, 'epoch': 1.11} 29%|██▉ | 2937/10000 [10:40:38<25:15:56, 12.88s/it] 29%|██▉ | 2938/10000 [10:40:51<25:17:59, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.5340000000000004e-05, 'epoch': 1.11} 29%|██▉ | 2938/10000 [10:40:51<25:17:59, 12.90s/it] 29%|██▉ | 2939/10000 [10:41:03<25:16:04, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.5335e-05, 'epoch': 1.11} 29%|██▉ | 2939/10000 [10:41:03<25:16:04, 12.88s/it] 29%|██▉ | 2940/10000 [10:41:16<25:13:28, 12.86s/it] {'loss': 0.0047, 'learning_rate': 3.533e-05, 'epoch': 1.11} 29%|██▉ | 2940/10000 [10:41:16<25:13:28, 12.86s/it] 29%|██▉ | 2941/10000 [10:41:29<25:12:51, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.5325000000000005e-05, 'epoch': 1.11} 29%|██▉ | 2941/10000 [10:41:29<25:12:51, 12.86s/it] 29%|██▉ | 2942/10000 
[10:41:42<25:13:33, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.532e-05, 'epoch': 1.11} 29%|██▉ | 2942/10000 [10:41:42<25:13:33, 12.87s/it] 29%|██▉ | 2943/10000 [10:41:55<25:15:21, 12.88s/it] {'loss': 0.0067, 'learning_rate': 3.5315e-05, 'epoch': 1.11} 29%|██▉ | 2943/10000 [10:41:55<25:15:21, 12.88s/it] 29%|██▉ | 2944/10000 [10:42:08<25:13:15, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.5310000000000006e-05, 'epoch': 1.11} 29%|██▉ | 2944/10000 [10:42:08<25:13:15, 12.87s/it] 29%|██▉ | 2945/10000 [10:42:21<25:11:33, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.5305e-05, 'epoch': 1.11} 29%|██▉ | 2945/10000 [10:42:21<25:11:33, 12.86s/it] 29%|██▉ | 2946/10000 [10:42:33<25:14:00, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.53e-05, 'epoch': 1.11} 29%|██▉ | 2946/10000 [10:42:33<25:14:00, 12.88s/it] 29%|██▉ | 2947/10000 [10:42:46<25:16:26, 12.90s/it] {'loss': 0.0046, 'learning_rate': 3.5295e-05, 'epoch': 1.11} 29%|██▉ | 2947/10000 [10:42:46<25:16:26, 12.90s/it] 29%|██▉ | 2948/10000 [10:42:59<25:15:13, 12.89s/it] {'loss': 0.0045, 'learning_rate': 3.529e-05, 'epoch': 1.11} 29%|██▉ | 2948/10000 [10:42:59<25:15:13, 12.89s/it] 29%|██▉ | 2949/10000 [10:43:12<25:11:18, 12.86s/it] {'loss': 0.0045, 'learning_rate': 3.5285e-05, 'epoch': 1.11} 29%|██▉ | 2949/10000 [10:43:12<25:11:18, 12.86s/it] 30%|██▉ | 2950/10000 [10:43:25<25:13:05, 12.88s/it] {'loss': 0.0068, 'learning_rate': 3.528e-05, 'epoch': 1.11} 30%|██▉ | 2950/10000 [10:43:25<25:13:05, 12.88s/it] 30%|██▉ | 2951/10000 [10:43:38<25:14:50, 12.89s/it] {'loss': 0.005, 'learning_rate': 3.5275000000000004e-05, 'epoch': 1.11} 30%|██▉ | 2951/10000 [10:43:38<25:14:50, 12.89s/it] 30%|██▉ | 2952/10000 [10:43:51<25:18:17, 12.93s/it] {'loss': 0.0045, 'learning_rate': 3.5270000000000006e-05, 'epoch': 1.11} 30%|██▉ | 2952/10000 [10:43:51<25:18:17, 12.93s/it] 30%|██▉ | 2953/10000 [10:44:04<25:19:28, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.5265e-05, 'epoch': 1.11} 30%|██▉ | 2953/10000 [10:44:04<25:19:28, 12.94s/it] 30%|██▉ | 
2954/10000 [10:44:17<25:22:07, 12.96s/it] {'loss': 0.0049, 'learning_rate': 3.5260000000000005e-05, 'epoch': 1.11} 30%|██▉ | 2954/10000 [10:44:17<25:22:07, 12.96s/it] 30%|██▉ | 2955/10000 [10:44:30<25:20:04, 12.95s/it] {'loss': 0.0047, 'learning_rate': 3.5255e-05, 'epoch': 1.11} 30%|██▉ | 2955/10000 [10:44:30<25:20:04, 12.95s/it] 30%|██▉ | 2956/10000 [10:44:43<25:18:17, 12.93s/it] {'loss': 0.0046, 'learning_rate': 3.525e-05, 'epoch': 1.11} 30%|██▉ | 2956/10000 [10:44:43<25:18:17, 12.93s/it] 30%|██▉ | 2957/10000 [10:44:56<25:18:16, 12.93s/it] {'loss': 0.0039, 'learning_rate': 3.5245e-05, 'epoch': 1.11} 30%|██▉ | 2957/10000 [10:44:56<25:18:16, 12.93s/it] 30%|██▉ | 2958/10000 [10:45:09<25:18:25, 12.94s/it] {'loss': 0.0059, 'learning_rate': 3.524e-05, 'epoch': 1.11} 30%|██▉ | 2958/10000 [10:45:09<25:18:25, 12.94s/it] 30%|██▉ | 2959/10000 [10:45:22<25:18:45, 12.94s/it] {'loss': 0.0058, 'learning_rate': 3.5235000000000004e-05, 'epoch': 1.11} 30%|██▉ | 2959/10000 [10:45:22<25:18:45, 12.94s/it] 30%|██▉ | 2960/10000 [10:45:34<25:18:10, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.523e-05, 'epoch': 1.12} 30%|██▉ | 2960/10000 [10:45:35<25:18:10, 12.94s/it] 30%|██▉ | 2961/10000 [10:45:47<25:18:31, 12.94s/it] {'loss': 0.0059, 'learning_rate': 3.5225e-05, 'epoch': 1.12} 30%|██▉ | 2961/10000 [10:45:47<25:18:31, 12.94s/it] 30%|██▉ | 2962/10000 [10:46:00<25:17:35, 12.94s/it] {'loss': 0.0058, 'learning_rate': 3.5220000000000005e-05, 'epoch': 1.12} 30%|██▉ | 2962/10000 [10:46:00<25:17:35, 12.94s/it] 30%|██▉ | 2963/10000 [10:46:13<25:16:59, 12.93s/it] {'loss': 0.0058, 'learning_rate': 3.5215e-05, 'epoch': 1.12} 30%|██▉ | 2963/10000 [10:46:13<25:16:59, 12.93s/it] 30%|██▉ | 2964/10000 [10:46:26<25:17:04, 12.94s/it] {'loss': 0.0062, 'learning_rate': 3.5210000000000003e-05, 'epoch': 1.12} 30%|██▉ | 2964/10000 [10:46:26<25:17:04, 12.94s/it] 30%|██▉ | 2965/10000 [10:46:39<25:16:45, 12.94s/it] {'loss': 0.0062, 'learning_rate': 3.5205e-05, 'epoch': 1.12} 30%|██▉ | 2965/10000 
[10:46:39<25:16:45, 12.94s/it] 30%|██▉ | 2966/10000 [10:46:52<25:17:06, 12.94s/it] {'loss': 0.0048, 'learning_rate': 3.52e-05, 'epoch': 1.12} 30%|██▉ | 2966/10000 [10:46:52<25:17:06, 12.94s/it] 30%|██▉ | 2967/10000 [10:47:05<25:18:19, 12.95s/it] {'loss': 0.0047, 'learning_rate': 3.5195e-05, 'epoch': 1.12} 30%|██▉ | 2967/10000 [10:47:05<25:18:19, 12.95s/it] 30%|██▉ | 2968/10000 [10:47:18<25:18:05, 12.95s/it] {'loss': 0.0035, 'learning_rate': 3.519e-05, 'epoch': 1.12} 30%|██▉ | 2968/10000 [10:47:18<25:18:05, 12.95s/it] 30%|██▉ | 2969/10000 [10:47:31<25:15:27, 12.93s/it] {'loss': 0.0066, 'learning_rate': 3.5185e-05, 'epoch': 1.12} 30%|██▉ | 2969/10000 [10:47:31<25:15:27, 12.93s/it] 30%|██▉ | 2970/10000 [10:47:44<25:12:40, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.518e-05, 'epoch': 1.12} 30%|██▉ | 2970/10000 [10:47:44<25:12:40, 12.91s/it] 30%|██▉ | 2971/10000 [10:47:57<25:13:10, 12.92s/it] {'loss': 0.0051, 'learning_rate': 3.5175e-05, 'epoch': 1.12} 30%|██▉ | 2971/10000 [10:47:57<25:13:10, 12.92s/it] 30%|██▉ | 2972/10000 [10:48:10<25:13:20, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.5170000000000004e-05, 'epoch': 1.12} 30%|██▉ | 2972/10000 [10:48:10<25:13:20, 12.92s/it] 30%|██▉ | 2973/10000 [10:48:23<25:12:58, 12.92s/it] {'loss': 0.0083, 'learning_rate': 3.5165000000000006e-05, 'epoch': 1.12} 30%|██▉ | 2973/10000 [10:48:23<25:12:58, 12.92s/it] 30%|██▉ | 2974/10000 [10:48:36<25:14:30, 12.93s/it] {'loss': 0.0055, 'learning_rate': 3.516e-05, 'epoch': 1.12} 30%|██▉ | 2974/10000 [10:48:36<25:14:30, 12.93s/it] 30%|██▉ | 2975/10000 [10:48:48<25:12:58, 12.92s/it] {'loss': 0.0059, 'learning_rate': 3.5155e-05, 'epoch': 1.12} 30%|██▉ | 2975/10000 [10:48:48<25:12:58, 12.92s/it] 30%|██▉ | 2976/10000 [10:49:01<25:12:43, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.515e-05, 'epoch': 1.12} 30%|██▉ | 2976/10000 [10:49:01<25:12:43, 12.92s/it] 30%|██▉ | 2977/10000 [10:49:14<25:14:47, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.5145e-05, 'epoch': 1.12} 30%|██▉ | 2977/10000 
[10:49:14<25:14:47, 12.94s/it] 30%|██▉ | 2978/10000 [10:49:27<25:15:37, 12.95s/it] {'loss': 0.004, 'learning_rate': 3.514e-05, 'epoch': 1.12} 30%|██▉ | 2978/10000 [10:49:27<25:15:37, 12.95s/it] 30%|██▉ | 2979/10000 [10:49:40<25:14:50, 12.95s/it] {'loss': 0.006, 'learning_rate': 3.5135e-05, 'epoch': 1.12} 30%|██▉ | 2979/10000 [10:49:40<25:14:50, 12.95s/it] 30%|██▉ | 2980/10000 [10:49:53<25:15:53, 12.96s/it] {'loss': 0.0056, 'learning_rate': 3.5130000000000004e-05, 'epoch': 1.12} 30%|██▉ | 2980/10000 [10:49:53<25:15:53, 12.96s/it] 30%|██▉ | 2981/10000 [10:50:06<25:14:17, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.5125e-05, 'epoch': 1.12} 30%|██▉ | 2981/10000 [10:50:06<25:14:17, 12.94s/it] 30%|██▉ | 2982/10000 [10:50:19<25:13:02, 12.94s/it] {'loss': 0.0063, 'learning_rate': 3.512e-05, 'epoch': 1.12} 30%|██▉ | 2982/10000 [10:50:19<25:13:02, 12.94s/it] 30%|██▉ | 2983/10000 [10:50:32<25:11:58, 12.93s/it] {'loss': 0.007, 'learning_rate': 3.5115000000000005e-05, 'epoch': 1.12} 30%|██▉ | 2983/10000 [10:50:32<25:11:58, 12.93s/it] 30%|██▉ | 2984/10000 [10:50:45<25:11:03, 12.92s/it] {'loss': 0.0046, 'learning_rate': 3.511e-05, 'epoch': 1.12} 30%|██▉ | 2984/10000 [10:50:45<25:11:03, 12.92s/it] 30%|██▉ | 2985/10000 [10:50:58<25:09:53, 12.91s/it] {'loss': 0.0067, 'learning_rate': 3.5105e-05, 'epoch': 1.12} 30%|██▉ | 2985/10000 [10:50:58<25:09:53, 12.91s/it] 30%|██▉ | 2986/10000 [10:51:11<25:10:21, 12.92s/it] {'loss': 0.007, 'learning_rate': 3.51e-05, 'epoch': 1.13} 30%|██▉ | 2986/10000 [10:51:11<25:10:21, 12.92s/it] 30%|██▉ | 2987/10000 [10:51:24<25:13:58, 12.95s/it] {'loss': 0.0049, 'learning_rate': 3.5095e-05, 'epoch': 1.13} 30%|██▉ | 2987/10000 [10:51:24<25:13:58, 12.95s/it] 30%|██▉ | 2988/10000 [10:51:37<25:13:42, 12.95s/it] {'loss': 0.0059, 'learning_rate': 3.509e-05, 'epoch': 1.13} 30%|██▉ | 2988/10000 [10:51:37<25:13:42, 12.95s/it] 30%|██▉ | 2989/10000 [10:51:50<25:14:26, 12.96s/it] {'loss': 0.0064, 'learning_rate': 3.5085e-05, 'epoch': 1.13} 30%|██▉ | 2989/10000 
[10:51:50<25:14:26, 12.96s/it] 30%|██▉ | 2990/10000 [10:52:03<25:18:15, 13.00s/it] {'loss': 0.0052, 'learning_rate': 3.508e-05, 'epoch': 1.13} 30%|██▉ | 2990/10000 [10:52:03<25:18:15, 13.00s/it] 30%|██▉ | 2991/10000 [10:52:16<25:14:09, 12.96s/it] {'loss': 0.0051, 'learning_rate': 3.5075000000000006e-05, 'epoch': 1.13} 30%|██▉ | 2991/10000 [10:52:16<25:14:09, 12.96s/it] 30%|██▉ | 2992/10000 [10:52:29<25:11:17, 12.94s/it] {'loss': 0.0061, 'learning_rate': 3.507e-05, 'epoch': 1.13} 30%|██▉ | 2992/10000 [10:52:29<25:11:17, 12.94s/it] 30%|██▉ | 2993/10000 [10:52:41<25:08:36, 12.92s/it] {'loss': 0.0054, 'learning_rate': 3.5065000000000004e-05, 'epoch': 1.13} 30%|██▉ | 2993/10000 [10:52:41<25:08:36, 12.92s/it] 30%|██▉ | 2994/10000 [10:52:54<25:11:51, 12.95s/it] {'loss': 0.0052, 'learning_rate': 3.5060000000000007e-05, 'epoch': 1.13} 30%|██▉ | 2994/10000 [10:52:54<25:11:51, 12.95s/it] 30%|██▉ | 2995/10000 [10:53:07<25:12:50, 12.96s/it] {'loss': 0.0054, 'learning_rate': 3.5055e-05, 'epoch': 1.13} 30%|██▉ | 2995/10000 [10:53:07<25:12:50, 12.96s/it] 30%|██▉ | 2996/10000 [10:53:20<25:11:57, 12.95s/it] {'loss': 0.0049, 'learning_rate': 3.505e-05, 'epoch': 1.13} 30%|██▉ | 2996/10000 [10:53:20<25:11:57, 12.95s/it] 30%|██▉ | 2997/10000 [10:53:33<25:09:53, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.5045e-05, 'epoch': 1.13} 30%|██▉ | 2997/10000 [10:53:33<25:09:53, 12.94s/it] 30%|██▉ | 2998/10000 [10:53:46<25:09:06, 12.93s/it] {'loss': 0.0074, 'learning_rate': 3.504e-05, 'epoch': 1.13} 30%|██▉ | 2998/10000 [10:53:46<25:09:06, 12.93s/it] 30%|██▉ | 2999/10000 [10:53:59<25:09:02, 12.93s/it] {'loss': 0.0054, 'learning_rate': 3.5035e-05, 'epoch': 1.13} 30%|██▉ | 2999/10000 [10:53:59<25:09:02, 12.93s/it] 30%|███ | 3000/10000 [10:54:12<25:09:12, 12.94s/it] {'loss': 0.0048, 'learning_rate': 3.503e-05, 'epoch': 1.13} 30%|███ | 3000/10000 [10:54:12<25:09:12, 12.94s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-06 07:19:10,623 >> Configuration saved in 
output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/config.json
[INFO|configuration_utils.py:364] 2024-11-06 07:19:10,625 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-06 07:19:58,041 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-06 07:19:58,043 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-06 07:19:58,044 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/special_tokens_map.json
[2024-11-06 07:19:58,054] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is about to be saved!
[2024-11-06 07:19:58,095] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/global_step3000/mp_rank_00_model_states.pt
[2024-11-06 07:19:58,095] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/global_step3000/mp_rank_00_model_states.pt...
[2024-11-06 07:21:00,900] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/global_step3000/mp_rank_00_model_states.pt.
[2024-11-06 07:21:01,040] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-06 07:23:11,831] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-06 07:23:11,839] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-06 07:23:11,839] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step3000 is ready now! 30%|███ | 3001/10000 [10:58:26<165:47:05, 85.27s/it] {'loss': 0.005, 'learning_rate': 3.5025000000000004e-05, 'epoch': 1.13} 30%|███ | 3001/10000 [10:58:26<165:47:05, 85.27s/it] 30%|███ | 3002/10000 [10:58:39<123:25:32, 63.49s/it] {'loss': 0.0064, 'learning_rate': 3.502e-05, 'epoch': 1.13} 30%|███ | 3002/10000 [10:58:39<123:25:32, 63.49s/it] 30%|███ | 3003/10000 [10:58:52<93:50:01, 48.28s/it] {'loss': 0.0061, 'learning_rate': 3.5015e-05, 'epoch': 1.13} 30%|███ | 3003/10000 [10:58:52<93:50:01, 48.28s/it] 30%|███ | 3004/10000 [10:59:04<73:06:53, 37.62s/it] {'loss': 0.0038, 'learning_rate': 3.5010000000000005e-05, 'epoch': 1.13} 30%|███ | 3004/10000 [10:59:04<73:06:53, 37.62s/it] 30%|███ | 3005/10000 [10:59:17<58:37:37, 30.17s/it] {'loss': 0.0037, 'learning_rate': 3.5005e-05, 'epoch': 1.13} 30%|███ | 3005/10000 [10:59:17<58:37:37, 30.17s/it] 30%|███ | 3006/10000 [10:59:30<48:30:41, 24.97s/it] {'loss': 0.0041, 'learning_rate': 3.5e-05, 'epoch': 1.13} 30%|███ | 3006/10000 [10:59:30<48:30:41, 24.97s/it][2024-11-06 07:24:40,026] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 30%|███ | 3007/10000 [10:59:41<40:41:33, 20.95s/it] {'loss': 0.0057, 'learning_rate': 3.5e-05, 'epoch': 1.13} 30%|███ | 3007/10000 [10:59:42<40:41:33, 20.95s/it][2024-11-06 07:24:51,607] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 30%|███ | 3008/10000 [10:59:53<35:13:41, 18.14s/it] {'loss': 0.004, 'learning_rate': 3.5e-05, 'epoch': 1.13} 30%|███ | 3008/10000 [10:59:53<35:13:41, 18.14s/it] 30%|███ | 3009/10000 [11:00:06<32:09:42, 16.56s/it] {'loss': 0.0055, 'learning_rate': 3.4995e-05, 'epoch': 1.13} 30%|███ | 3009/10000 [11:00:06<32:09:42, 16.56s/it] 30%|███ | 3010/10000 [11:00:19<29:58:53, 15.44s/it] {'loss': 0.0061, 'learning_rate': 3.499e-05, 'epoch': 1.13} 30%|███ | 3010/10000 [11:00:19<29:58:53, 15.44s/it] 30%|███ | 3011/10000 [11:00:32<28:30:17, 14.68s/it] {'loss': 0.0033, 'learning_rate': 3.4985e-05, 'epoch': 1.13} 30%|███ | 3011/10000 [11:00:32<28:30:17, 14.68s/it] 30%|███ | 3012/10000 [11:00:45<27:28:45, 14.16s/it] {'loss': 0.0034, 'learning_rate': 3.498e-05, 'epoch': 1.13} 30%|███ | 3012/10000 [11:00:45<27:28:45, 14.16s/it] 30%|███ | 3013/10000 [11:00:58<26:48:44, 13.81s/it] {'loss': 0.0048, 'learning_rate': 3.4975e-05, 'epoch': 1.14} 30%|███ | 3013/10000 [11:00:58<26:48:44, 13.81s/it] 30%|███ | 3014/10000 [11:01:11<26:17:12, 13.55s/it] {'loss': 0.0052, 'learning_rate': 3.4970000000000006e-05, 'epoch': 1.14} 30%|███ | 3014/10000 [11:01:11<26:17:12, 13.55s/it] 30%|███ | 3015/10000 [11:01:23<25:53:16, 13.34s/it] {'loss': 0.0054, 'learning_rate': 3.4965e-05, 'epoch': 1.14} 30%|███ | 3015/10000 [11:01:23<25:53:16, 13.34s/it] 30%|███ | 3016/10000 [11:01:36<25:36:07, 13.20s/it] {'loss': 0.0044, 'learning_rate': 3.4960000000000004e-05, 'epoch': 1.14} 30%|███ | 3016/10000 [11:01:36<25:36:07, 13.20s/it] 30%|███ | 3017/10000 [11:01:49<25:25:34, 13.11s/it] {'loss': 0.0052, 'learning_rate': 3.495500000000001e-05, 'epoch': 1.14} 30%|███ | 3017/10000 [11:01:49<25:25:34, 13.11s/it] 30%|███ | 3018/10000 [11:02:02<25:18:11, 13.05s/it] {'loss': 0.0045, 'learning_rate': 3.495e-05, 'epoch': 1.14} 30%|███ | 3018/10000 [11:02:02<25:18:11, 13.05s/it] 30%|███ | 3019/10000 [11:02:15<25:13:43, 13.01s/it] {'loss': 0.0041, 'learning_rate': 3.4945e-05, 'epoch': 
1.14} 30%|███ | 3019/10000 [11:02:15<25:13:43, 13.01s/it] 30%|███ | 3020/10000 [11:02:28<25:09:08, 12.97s/it] {'loss': 0.0075, 'learning_rate': 3.494e-05, 'epoch': 1.14} 30%|███ | 3020/10000 [11:02:28<25:09:08, 12.97s/it] 30%|███ | 3021/10000 [11:02:41<25:07:31, 12.96s/it] {'loss': 0.0041, 'learning_rate': 3.4935000000000003e-05, 'epoch': 1.14} 30%|███ | 3021/10000 [11:02:41<25:07:31, 12.96s/it] 30%|███ | 3022/10000 [11:02:54<25:05:55, 12.95s/it] {'loss': 0.0045, 'learning_rate': 3.493e-05, 'epoch': 1.14} 30%|███ | 3022/10000 [11:02:54<25:05:55, 12.95s/it] 30%|███ | 3023/10000 [11:03:07<25:02:51, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.4925e-05, 'epoch': 1.14} 30%|███ | 3023/10000 [11:03:07<25:02:51, 12.92s/it] 30%|███ | 3024/10000 [11:03:20<25:02:42, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.4920000000000004e-05, 'epoch': 1.14} 30%|███ | 3024/10000 [11:03:20<25:02:42, 12.92s/it] 30%|███ | 3025/10000 [11:03:32<25:01:03, 12.91s/it] {'loss': 0.0049, 'learning_rate': 3.4915e-05, 'epoch': 1.14} 30%|███ | 3025/10000 [11:03:32<25:01:03, 12.91s/it] 30%|███ | 3026/10000 [11:03:45<24:59:42, 12.90s/it] {'loss': 0.0052, 'learning_rate': 3.491e-05, 'epoch': 1.14} 30%|███ | 3026/10000 [11:03:45<24:59:42, 12.90s/it] 30%|███ | 3027/10000 [11:03:58<24:58:49, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.4905000000000005e-05, 'epoch': 1.14} 30%|███ | 3027/10000 [11:03:58<24:58:49, 12.90s/it] 30%|███ | 3028/10000 [11:04:11<24:59:13, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.49e-05, 'epoch': 1.14} 30%|███ | 3028/10000 [11:04:11<24:59:13, 12.90s/it] 30%|███ | 3029/10000 [11:04:24<24:58:24, 12.90s/it] {'loss': 0.0067, 'learning_rate': 3.4895e-05, 'epoch': 1.14} 30%|███ | 3029/10000 [11:04:24<24:58:24, 12.90s/it] 30%|███ | 3030/10000 [11:04:37<24:59:35, 12.91s/it] {'loss': 0.0044, 'learning_rate': 3.489e-05, 'epoch': 1.14} 30%|███ | 3030/10000 [11:04:37<24:59:35, 12.91s/it] 30%|███ | 3031/10000 [11:04:50<24:59:53, 12.91s/it] {'loss': 0.004, 'learning_rate': 3.4885e-05, 
'epoch': 1.14} 30%|███ | 3031/10000 [11:04:50<24:59:53, 12.91s/it] 30%|███ | 3032/10000 [11:05:03<25:04:13, 12.95s/it] {'loss': 0.0049, 'learning_rate': 3.4880000000000005e-05, 'epoch': 1.14} 30%|███ | 3032/10000 [11:05:03<25:04:13, 12.95s/it] 30%|███ | 3033/10000 [11:05:16<25:02:28, 12.94s/it] {'loss': 0.0044, 'learning_rate': 3.4875e-05, 'epoch': 1.14} 30%|███ | 3033/10000 [11:05:16<25:02:28, 12.94s/it] 30%|███ | 3034/10000 [11:05:29<25:02:07, 12.94s/it] {'loss': 0.0066, 'learning_rate': 3.487e-05, 'epoch': 1.14} 30%|███ | 3034/10000 [11:05:29<25:02:07, 12.94s/it] 30%|███ | 3035/10000 [11:05:42<25:00:09, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.4865000000000006e-05, 'epoch': 1.14} 30%|███ | 3035/10000 [11:05:42<25:00:09, 12.92s/it] 30%|███ | 3036/10000 [11:05:54<24:58:26, 12.91s/it] {'loss': 0.0046, 'learning_rate': 3.486e-05, 'epoch': 1.14} 30%|███ | 3036/10000 [11:05:55<24:58:26, 12.91s/it] 30%|███ | 3037/10000 [11:06:07<24:58:17, 12.91s/it] {'loss': 0.004, 'learning_rate': 3.4855000000000004e-05, 'epoch': 1.14} 30%|███ | 3037/10000 [11:06:07<24:58:17, 12.91s/it] 30%|███ | 3038/10000 [11:06:20<24:57:09, 12.90s/it] {'loss': 0.0043, 'learning_rate': 3.485e-05, 'epoch': 1.14} 30%|███ | 3038/10000 [11:06:20<24:57:09, 12.90s/it] 30%|███ | 3039/10000 [11:06:33<24:58:13, 12.91s/it] {'loss': 0.0052, 'learning_rate': 3.4845e-05, 'epoch': 1.15} 30%|███ | 3039/10000 [11:06:33<24:58:13, 12.91s/it] 30%|███ | 3040/10000 [11:06:46<24:57:13, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.484e-05, 'epoch': 1.15} 30%|███ | 3040/10000 [11:06:46<24:57:13, 12.91s/it] 30%|███ | 3041/10000 [11:06:59<24:58:37, 12.92s/it] {'loss': 0.0043, 'learning_rate': 3.4835e-05, 'epoch': 1.15} 30%|███ | 3041/10000 [11:06:59<24:58:37, 12.92s/it] 30%|███ | 3042/10000 [11:07:12<24:58:48, 12.92s/it] {'loss': 0.0068, 'learning_rate': 3.4830000000000004e-05, 'epoch': 1.15} 30%|███ | 3042/10000 [11:07:12<24:58:48, 12.92s/it] 30%|███ | 3043/10000 [11:07:25<24:55:18, 12.90s/it] {'loss': 0.0047, 
'learning_rate': 3.4825e-05, 'epoch': 1.15} 30%|███ | 3043/10000 [11:07:25<24:55:18, 12.90s/it] 30%|███ | 3044/10000 [11:07:38<24:52:27, 12.87s/it] {'loss': 0.0059, 'learning_rate': 3.482e-05, 'epoch': 1.15} 30%|███ | 3044/10000 [11:07:38<24:52:27, 12.87s/it] 30%|███ | 3045/10000 [11:07:50<24:50:26, 12.86s/it] {'loss': 0.0044, 'learning_rate': 3.4815000000000005e-05, 'epoch': 1.15} 30%|███ | 3045/10000 [11:07:51<24:50:26, 12.86s/it] 30%|███ | 3046/10000 [11:08:03<24:49:34, 12.85s/it] {'loss': 0.006, 'learning_rate': 3.481e-05, 'epoch': 1.15} 30%|███ | 3046/10000 [11:08:03<24:49:34, 12.85s/it] 30%|███ | 3047/10000 [11:08:16<24:50:02, 12.86s/it] {'loss': 0.0051, 'learning_rate': 3.4805e-05, 'epoch': 1.15} 30%|███ | 3047/10000 [11:08:16<24:50:02, 12.86s/it] 30%|███ | 3048/10000 [11:08:29<24:47:40, 12.84s/it] {'loss': 0.0061, 'learning_rate': 3.48e-05, 'epoch': 1.15} 30%|███ | 3048/10000 [11:08:29<24:47:40, 12.84s/it] 30%|███ | 3049/10000 [11:08:42<24:49:00, 12.85s/it] {'loss': 0.0047, 'learning_rate': 3.4795e-05, 'epoch': 1.15} 30%|███ | 3049/10000 [11:08:42<24:49:00, 12.85s/it] 30%|███ | 3050/10000 [11:08:55<24:50:18, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.479e-05, 'epoch': 1.15} 30%|███ | 3050/10000 [11:08:55<24:50:18, 12.87s/it] 31%|███ | 3051/10000 [11:09:08<24:50:39, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.4785e-05, 'epoch': 1.15} 31%|███ | 3051/10000 [11:09:08<24:50:39, 12.87s/it] 31%|███ | 3052/10000 [11:09:21<24:52:19, 12.89s/it] {'loss': 0.006, 'learning_rate': 3.478e-05, 'epoch': 1.15} 31%|███ | 3052/10000 [11:09:21<24:52:19, 12.89s/it] 31%|███ | 3053/10000 [11:09:33<24:49:10, 12.86s/it] {'loss': 0.005, 'learning_rate': 3.4775000000000005e-05, 'epoch': 1.15} 31%|███ | 3053/10000 [11:09:33<24:49:10, 12.86s/it] 31%|███ | 3054/10000 [11:09:46<24:48:48, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.477e-05, 'epoch': 1.15} 31%|███ | 3054/10000 [11:09:46<24:48:48, 12.86s/it] 31%|███ | 3055/10000 [11:09:59<24:47:47, 12.85s/it] {'loss': 0.0052, 
'learning_rate': 3.4765000000000003e-05, 'epoch': 1.15} 31%|███ | 3055/10000 [11:09:59<24:47:47, 12.85s/it] 31%|███ | 3056/10000 [11:10:12<24:48:32, 12.86s/it] {'loss': 0.0046, 'learning_rate': 3.4760000000000006e-05, 'epoch': 1.15} 31%|███ | 3056/10000 [11:10:12<24:48:32, 12.86s/it] 31%|███ | 3057/10000 [11:10:25<24:49:34, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.4755e-05, 'epoch': 1.15} 31%|███ | 3057/10000 [11:10:25<24:49:34, 12.87s/it] 31%|███ | 3058/10000 [11:10:38<24:51:08, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.475e-05, 'epoch': 1.15} 31%|███ | 3058/10000 [11:10:38<24:51:08, 12.89s/it] 31%|███ | 3059/10000 [11:10:51<24:50:49, 12.89s/it] {'loss': 0.0044, 'learning_rate': 3.4745e-05, 'epoch': 1.15} 31%|███ | 3059/10000 [11:10:51<24:50:49, 12.89s/it] 31%|███ | 3060/10000 [11:11:04<24:50:20, 12.88s/it] {'loss': 0.0039, 'learning_rate': 3.474e-05, 'epoch': 1.15} 31%|███ | 3060/10000 [11:11:04<24:50:20, 12.88s/it] 31%|███ | 3061/10000 [11:11:16<24:52:09, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.4735e-05, 'epoch': 1.15} 31%|███ | 3061/10000 [11:11:17<24:52:09, 12.90s/it] 31%|███ | 3062/10000 [11:11:29<24:48:53, 12.88s/it] {'loss': 0.0071, 'learning_rate': 3.473e-05, 'epoch': 1.15} 31%|███ | 3062/10000 [11:11:29<24:48:53, 12.88s/it] 31%|███ | 3063/10000 [11:11:42<24:49:33, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.4725000000000004e-05, 'epoch': 1.15} 31%|███ | 3063/10000 [11:11:42<24:49:33, 12.88s/it] 31%|███ | 3064/10000 [11:11:55<24:48:19, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.472e-05, 'epoch': 1.15} 31%|███ | 3064/10000 [11:11:55<24:48:19, 12.87s/it] 31%|███ | 3065/10000 [11:12:08<24:44:50, 12.85s/it] {'loss': 0.0065, 'learning_rate': 3.4715e-05, 'epoch': 1.15} 31%|███ | 3065/10000 [11:12:08<24:44:50, 12.85s/it] 31%|███ | 3066/10000 [11:12:21<24:44:35, 12.85s/it] {'loss': 0.0072, 'learning_rate': 3.4710000000000005e-05, 'epoch': 1.16} 31%|███ | 3066/10000 [11:12:21<24:44:35, 12.85s/it] 31%|███ | 3067/10000 [11:12:34<24:45:22, 
12.85s/it] {'loss': 0.0056, 'learning_rate': 3.470500000000001e-05, 'epoch': 1.16} 31%|███ | 3067/10000 [11:12:34<24:45:22, 12.85s/it] 31%|███ | 3068/10000 [11:12:46<24:46:28, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.4699999999999996e-05, 'epoch': 1.16} 31%|███ | 3068/10000 [11:12:46<24:46:28, 12.87s/it] 31%|███ | 3069/10000 [11:12:59<24:47:48, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.4695e-05, 'epoch': 1.16} 31%|███ | 3069/10000 [11:12:59<24:47:48, 12.88s/it] 31%|███ | 3070/10000 [11:13:12<24:48:20, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.469e-05, 'epoch': 1.16} 31%|███ | 3070/10000 [11:13:12<24:48:20, 12.89s/it] 31%|███ | 3071/10000 [11:13:25<24:49:02, 12.89s/it] {'loss': 0.0072, 'learning_rate': 3.4685000000000004e-05, 'epoch': 1.16} 31%|███ | 3071/10000 [11:13:25<24:49:02, 12.89s/it] 31%|███ | 3072/10000 [11:13:38<24:48:41, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.468e-05, 'epoch': 1.16} 31%|███ | 3072/10000 [11:13:38<24:48:41, 12.89s/it] 31%|███ | 3073/10000 [11:13:51<24:47:03, 12.88s/it] {'loss': 0.0044, 'learning_rate': 3.4675e-05, 'epoch': 1.16} 31%|███ | 3073/10000 [11:13:51<24:47:03, 12.88s/it] 31%|███ | 3074/10000 [11:14:04<24:44:06, 12.86s/it] {'loss': 0.0048, 'learning_rate': 3.4670000000000005e-05, 'epoch': 1.16} 31%|███ | 3074/10000 [11:14:04<24:44:06, 12.86s/it] 31%|███ | 3075/10000 [11:14:17<24:41:54, 12.84s/it] {'loss': 0.0049, 'learning_rate': 3.4665e-05, 'epoch': 1.16} 31%|███ | 3075/10000 [11:14:17<24:41:54, 12.84s/it] 31%|███ | 3076/10000 [11:14:29<24:44:51, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.4660000000000004e-05, 'epoch': 1.16} 31%|███ | 3076/10000 [11:14:29<24:44:51, 12.87s/it] 31%|███ | 3077/10000 [11:14:42<24:44:44, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.4655000000000006e-05, 'epoch': 1.16} 31%|███ | 3077/10000 [11:14:42<24:44:44, 12.87s/it] 31%|███ | 3078/10000 [11:14:55<24:43:25, 12.86s/it] {'loss': 0.0066, 'learning_rate': 3.465e-05, 'epoch': 1.16} 31%|███ | 3078/10000 [11:14:55<24:43:25, 
12.86s/it] 31%|███ | 3079/10000 [11:15:08<24:43:20, 12.86s/it] {'loss': 0.0057, 'learning_rate': 3.4645e-05, 'epoch': 1.16} 31%|███ | 3079/10000 [11:15:08<24:43:20, 12.86s/it] 31%|███ | 3080/10000 [11:15:21<24:42:31, 12.85s/it] {'loss': 0.0045, 'learning_rate': 3.464e-05, 'epoch': 1.16} 31%|███ | 3080/10000 [11:15:21<24:42:31, 12.85s/it] 31%|███ | 3081/10000 [11:15:34<24:40:32, 12.84s/it] {'loss': 0.0066, 'learning_rate': 3.4635e-05, 'epoch': 1.16} 31%|███ | 3081/10000 [11:15:34<24:40:32, 12.84s/it] 31%|███ | 3082/10000 [11:15:47<24:44:05, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.463e-05, 'epoch': 1.16} 31%|███ | 3082/10000 [11:15:47<24:44:05, 12.87s/it] 31%|███ | 3083/10000 [11:15:59<24:43:06, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.4625e-05, 'epoch': 1.16} 31%|███ | 3083/10000 [11:16:00<24:43:06, 12.86s/it] 31%|███ | 3084/10000 [11:16:12<24:41:52, 12.86s/it] {'loss': 0.0062, 'learning_rate': 3.4620000000000004e-05, 'epoch': 1.16} 31%|███ | 3084/10000 [11:16:12<24:41:52, 12.86s/it] 31%|███ | 3085/10000 [11:16:25<24:41:59, 12.86s/it] {'loss': 0.0042, 'learning_rate': 3.4615e-05, 'epoch': 1.16} 31%|███ | 3085/10000 [11:16:25<24:41:59, 12.86s/it] 31%|███ | 3086/10000 [11:16:38<24:40:12, 12.85s/it] {'loss': 0.0046, 'learning_rate': 3.461e-05, 'epoch': 1.16} 31%|███ | 3086/10000 [11:16:38<24:40:12, 12.85s/it] 31%|███ | 3087/10000 [11:16:51<24:38:54, 12.84s/it] {'loss': 0.0064, 'learning_rate': 3.4605000000000005e-05, 'epoch': 1.16} 31%|███ | 3087/10000 [11:16:51<24:38:54, 12.84s/it] 31%|███ | 3088/10000 [11:17:04<24:40:54, 12.86s/it] {'loss': 0.0058, 'learning_rate': 3.46e-05, 'epoch': 1.16} 31%|███ | 3088/10000 [11:17:04<24:40:54, 12.86s/it] 31%|███ | 3089/10000 [11:17:17<24:41:42, 12.86s/it] {'loss': 0.0056, 'learning_rate': 3.4594999999999997e-05, 'epoch': 1.16} 31%|███ | 3089/10000 [11:17:17<24:41:42, 12.86s/it] 31%|███ | 3090/10000 [11:17:29<24:42:03, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.459e-05, 'epoch': 1.16} 31%|███ | 3090/10000 
[11:17:29<24:42:03, 12.87s/it] 31%|███ | 3091/10000 [11:17:42<24:40:06, 12.85s/it] {'loss': 0.0057, 'learning_rate': 3.4585e-05, 'epoch': 1.16} 31%|███ | 3091/10000 [11:17:42<24:40:06, 12.85s/it] 31%|███ | 3092/10000 [11:17:55<24:38:44, 12.84s/it] {'loss': 0.0042, 'learning_rate': 3.4580000000000004e-05, 'epoch': 1.17} 31%|███ | 3092/10000 [11:17:55<24:38:44, 12.84s/it] 31%|███ | 3093/10000 [11:18:08<24:39:02, 12.85s/it] {'loss': 0.0044, 'learning_rate': 3.4575e-05, 'epoch': 1.17} 31%|███ | 3093/10000 [11:18:08<24:39:02, 12.85s/it] 31%|███ | 3094/10000 [11:18:21<24:40:20, 12.86s/it] {'loss': 0.0052, 'learning_rate': 3.457e-05, 'epoch': 1.17} 31%|███ | 3094/10000 [11:18:21<24:40:20, 12.86s/it] 31%|███ | 3095/10000 [11:18:34<24:38:33, 12.85s/it] {'loss': 0.0046, 'learning_rate': 3.4565000000000005e-05, 'epoch': 1.17} 31%|███ | 3095/10000 [11:18:34<24:38:33, 12.85s/it] 31%|███ | 3096/10000 [11:18:47<24:39:12, 12.86s/it] {'loss': 0.0052, 'learning_rate': 3.456e-05, 'epoch': 1.17} 31%|███ | 3096/10000 [11:18:47<24:39:12, 12.86s/it] 31%|███ | 3097/10000 [11:18:59<24:39:31, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.4555000000000004e-05, 'epoch': 1.17} 31%|███ | 3097/10000 [11:18:59<24:39:31, 12.86s/it] 31%|███ | 3098/10000 [11:19:12<24:37:49, 12.85s/it] {'loss': 0.0069, 'learning_rate': 3.455e-05, 'epoch': 1.17} 31%|███ | 3098/10000 [11:19:12<24:37:49, 12.85s/it] 31%|███ | 3099/10000 [11:19:25<24:40:40, 12.87s/it] {'loss': 0.0069, 'learning_rate': 3.4545e-05, 'epoch': 1.17} 31%|███ | 3099/10000 [11:19:25<24:40:40, 12.87s/it] 31%|███ | 3100/10000 [11:19:38<24:38:47, 12.86s/it] {'loss': 0.0058, 'learning_rate': 3.454e-05, 'epoch': 1.17} 31%|███ | 3100/10000 [11:19:38<24:38:47, 12.86s/it] 31%|███ | 3101/10000 [11:19:51<24:37:37, 12.85s/it] {'loss': 0.0058, 'learning_rate': 3.4535e-05, 'epoch': 1.17} 31%|███ | 3101/10000 [11:19:51<24:37:37, 12.85s/it] 31%|███ | 3102/10000 [11:20:04<24:41:47, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.453e-05, 'epoch': 1.17} 31%|███ | 
3102/10000 [11:20:04<24:41:47, 12.89s/it] 31%|███ | 3103/10000 [11:20:17<24:43:23, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.4525e-05, 'epoch': 1.17} 31%|███ | 3103/10000 [11:20:17<24:43:23, 12.90s/it] 31%|███ | 3104/10000 [11:20:30<24:41:21, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.452e-05, 'epoch': 1.17} 31%|███ | 3104/10000 [11:20:30<24:41:21, 12.89s/it] 31%|███ | 3105/10000 [11:20:42<24:39:16, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.4515000000000004e-05, 'epoch': 1.17} 31%|███ | 3105/10000 [11:20:42<24:39:16, 12.87s/it] 31%|███ | 3106/10000 [11:20:55<24:38:13, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.451000000000001e-05, 'epoch': 1.17} 31%|███ | 3106/10000 [11:20:55<24:38:13, 12.87s/it] 31%|███ | 3107/10000 [11:21:08<24:39:22, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.4505e-05, 'epoch': 1.17} 31%|███ | 3107/10000 [11:21:08<24:39:22, 12.88s/it] 31%|███ | 3108/10000 [11:21:21<24:40:05, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.45e-05, 'epoch': 1.17} 31%|███ | 3108/10000 [11:21:21<24:40:05, 12.89s/it] 31%|███ | 3109/10000 [11:21:34<24:42:06, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.4495e-05, 'epoch': 1.17} 31%|███ | 3109/10000 [11:21:34<24:42:06, 12.90s/it] 31%|███ | 3110/10000 [11:21:47<24:43:03, 12.91s/it] {'loss': 0.0064, 'learning_rate': 3.449e-05, 'epoch': 1.17} 31%|███ | 3110/10000 [11:21:47<24:43:03, 12.91s/it] 31%|███ | 3111/10000 [11:22:00<24:39:33, 12.89s/it] {'loss': 0.0071, 'learning_rate': 3.4485e-05, 'epoch': 1.17} 31%|███ | 3111/10000 [11:22:00<24:39:33, 12.89s/it] 31%|███ | 3112/10000 [11:22:13<24:40:49, 12.90s/it] {'loss': 0.0045, 'learning_rate': 3.448e-05, 'epoch': 1.17} 31%|███ | 3112/10000 [11:22:13<24:40:49, 12.90s/it] 31%|███ | 3113/10000 [11:22:26<24:39:20, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.4475000000000005e-05, 'epoch': 1.17} 31%|███ | 3113/10000 [11:22:26<24:39:20, 12.89s/it] 31%|███ | 3114/10000 [11:22:38<24:39:54, 12.89s/it] {'loss': 0.0044, 'learning_rate': 3.447e-05, 'epoch': 1.17} 
31%|███ | 3114/10000 [11:22:39<24:39:54, 12.89s/it] 31%|███ | 3115/10000 [11:22:51<24:42:40, 12.92s/it] {'loss': 0.006, 'learning_rate': 3.4465e-05, 'epoch': 1.17} 31%|███ | 3115/10000 [11:22:52<24:42:40, 12.92s/it] 31%|███ | 3116/10000 [11:23:04<24:43:44, 12.93s/it] {'loss': 0.0074, 'learning_rate': 3.4460000000000005e-05, 'epoch': 1.17} 31%|███ | 3116/10000 [11:23:04<24:43:44, 12.93s/it] 31%|███ | 3117/10000 [11:23:17<24:44:09, 12.94s/it] {'loss': 0.0049, 'learning_rate': 3.4455e-05, 'epoch': 1.17} 31%|███ | 3117/10000 [11:23:17<24:44:09, 12.94s/it] 31%|███ | 3118/10000 [11:23:30<24:45:55, 12.95s/it] {'loss': 0.0056, 'learning_rate': 3.445e-05, 'epoch': 1.17} 31%|███ | 3118/10000 [11:23:30<24:45:55, 12.95s/it] 31%|███ | 3119/10000 [11:23:43<24:42:39, 12.93s/it] {'loss': 0.0049, 'learning_rate': 3.4445e-05, 'epoch': 1.18} 31%|███ | 3119/10000 [11:23:43<24:42:39, 12.93s/it] 31%|███ | 3120/10000 [11:23:56<24:40:25, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.444e-05, 'epoch': 1.18} 31%|███ | 3120/10000 [11:23:56<24:40:25, 12.91s/it] 31%|███ | 3121/10000 [11:24:09<24:41:14, 12.92s/it] {'loss': 0.0078, 'learning_rate': 3.4435e-05, 'epoch': 1.18} 31%|███ | 3121/10000 [11:24:09<24:41:14, 12.92s/it] 31%|███ | 3122/10000 [11:24:22<24:41:09, 12.92s/it] {'loss': 0.006, 'learning_rate': 3.443e-05, 'epoch': 1.18} 31%|███ | 3122/10000 [11:24:22<24:41:09, 12.92s/it] 31%|███ | 3123/10000 [11:24:35<24:37:07, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.4425e-05, 'epoch': 1.18} 31%|███ | 3123/10000 [11:24:35<24:37:07, 12.89s/it] 31%|███ | 3124/10000 [11:24:48<24:37:21, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.442e-05, 'epoch': 1.18} 31%|███ | 3124/10000 [11:24:48<24:37:21, 12.89s/it] 31%|███▏ | 3125/10000 [11:25:01<24:37:32, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.4415e-05, 'epoch': 1.18} 31%|███▏ | 3125/10000 [11:25:01<24:37:32, 12.89s/it] 31%|███▏ | 3126/10000 [11:25:14<24:39:09, 12.91s/it] {'loss': 0.0054, 'learning_rate': 3.4410000000000004e-05, 'epoch': 
1.18} 31%|███▏ | 3126/10000 [11:25:14<24:39:09, 12.91s/it] 31%|███▏ | 3127/10000 [11:25:26<24:40:32, 12.92s/it] {'loss': 0.0054, 'learning_rate': 3.440500000000001e-05, 'epoch': 1.18} 31%|███▏ | 3127/10000 [11:25:27<24:40:32, 12.92s/it] 31%|███▏ | 3128/10000 [11:25:39<24:40:44, 12.93s/it] {'loss': 0.0057, 'learning_rate': 3.4399999999999996e-05, 'epoch': 1.18} 31%|███▏ | 3128/10000 [11:25:39<24:40:44, 12.93s/it] 31%|███▏ | 3129/10000 [11:25:52<24:41:27, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.4395e-05, 'epoch': 1.18} 31%|███▏ | 3129/10000 [11:25:52<24:41:27, 12.94s/it] 31%|███▏ | 3130/10000 [11:26:05<24:38:22, 12.91s/it] {'loss': 0.0066, 'learning_rate': 3.439e-05, 'epoch': 1.18} 31%|███▏ | 3130/10000 [11:26:05<24:38:22, 12.91s/it] 31%|███▏ | 3131/10000 [11:26:18<24:38:01, 12.91s/it] {'loss': 0.0067, 'learning_rate': 3.4385000000000004e-05, 'epoch': 1.18} 31%|███▏ | 3131/10000 [11:26:18<24:38:01, 12.91s/it] 31%|███▏ | 3132/10000 [11:26:31<24:38:15, 12.91s/it] {'loss': 0.0051, 'learning_rate': 3.438e-05, 'epoch': 1.18} 31%|███▏ | 3132/10000 [11:26:31<24:38:15, 12.91s/it] 31%|███▏ | 3133/10000 [11:26:44<24:41:55, 12.95s/it] {'loss': 0.0054, 'learning_rate': 3.4375e-05, 'epoch': 1.18} 31%|███▏ | 3133/10000 [11:26:44<24:41:55, 12.95s/it] 31%|███▏ | 3134/10000 [11:26:57<24:46:10, 12.99s/it] {'loss': 0.0057, 'learning_rate': 3.4370000000000005e-05, 'epoch': 1.18} 31%|███▏ | 3134/10000 [11:26:57<24:46:10, 12.99s/it] 31%|███▏ | 3135/10000 [11:27:10<24:45:06, 12.98s/it] {'loss': 0.0068, 'learning_rate': 3.4365e-05, 'epoch': 1.18} 31%|███▏ | 3135/10000 [11:27:10<24:45:06, 12.98s/it] 31%|███▏ | 3136/10000 [11:27:23<24:44:13, 12.97s/it] {'loss': 0.006, 'learning_rate': 3.436e-05, 'epoch': 1.18} 31%|███▏ | 3136/10000 [11:27:23<24:44:13, 12.97s/it] 31%|███▏ | 3137/10000 [11:27:36<24:43:24, 12.97s/it] {'loss': 0.0053, 'learning_rate': 3.4355000000000006e-05, 'epoch': 1.18} 31%|███▏ | 3137/10000 [11:27:36<24:43:24, 12.97s/it] 31%|███▏ | 3138/10000 [11:27:49<24:43:24, 
12.97s/it] {'loss': 0.0053, 'learning_rate': 3.435e-05, 'epoch': 1.18} 31%|███▏ | 3138/10000 [11:27:49<24:43:24, 12.97s/it] 31%|███▏ | 3139/10000 [11:28:02<24:43:07, 12.97s/it] {'loss': 0.0065, 'learning_rate': 3.4345e-05, 'epoch': 1.18} 31%|███▏ | 3139/10000 [11:28:02<24:43:07, 12.97s/it] 31%|███▏ | 3140/10000 [11:28:15<24:42:38, 12.97s/it] {'loss': 0.0049, 'learning_rate': 3.434e-05, 'epoch': 1.18} 31%|███▏ | 3140/10000 [11:28:15<24:42:38, 12.97s/it] 31%|███▏ | 3141/10000 [11:28:28<24:39:51, 12.95s/it] {'loss': 0.0068, 'learning_rate': 3.4335e-05, 'epoch': 1.18} 31%|███▏ | 3141/10000 [11:28:28<24:39:51, 12.95s/it] 31%|███▏ | 3142/10000 [11:28:41<24:41:31, 12.96s/it] {'loss': 0.0057, 'learning_rate': 3.433e-05, 'epoch': 1.18} 31%|███▏ | 3142/10000 [11:28:41<24:41:31, 12.96s/it] 31%|███▏ | 3143/10000 [11:28:54<24:40:00, 12.95s/it] {'loss': 0.0073, 'learning_rate': 3.4325e-05, 'epoch': 1.18} 31%|███▏ | 3143/10000 [11:28:54<24:40:00, 12.95s/it] 31%|███▏ | 3144/10000 [11:29:07<24:40:24, 12.96s/it] {'loss': 0.0074, 'learning_rate': 3.4320000000000003e-05, 'epoch': 1.18} 31%|███▏ | 3144/10000 [11:29:07<24:40:24, 12.96s/it] 31%|███▏ | 3145/10000 [11:29:20<24:37:17, 12.93s/it] {'loss': 0.0063, 'learning_rate': 3.4315000000000006e-05, 'epoch': 1.19} 31%|███▏ | 3145/10000 [11:29:20<24:37:17, 12.93s/it] 31%|███▏ | 3146/10000 [11:29:33<24:40:26, 12.96s/it] {'loss': 0.0055, 'learning_rate': 3.431e-05, 'epoch': 1.19} 31%|███▏ | 3146/10000 [11:29:33<24:40:26, 12.96s/it] 31%|███▏ | 3147/10000 [11:29:46<24:41:29, 12.97s/it] {'loss': 0.0056, 'learning_rate': 3.4305000000000004e-05, 'epoch': 1.19} 31%|███▏ | 3147/10000 [11:29:46<24:41:29, 12.97s/it] 31%|███▏ | 3148/10000 [11:29:59<24:41:18, 12.97s/it] {'loss': 0.006, 'learning_rate': 3.430000000000001e-05, 'epoch': 1.19} 31%|███▏ | 3148/10000 [11:29:59<24:41:18, 12.97s/it] 31%|███▏ | 3149/10000 [11:30:12<24:38:55, 12.95s/it] {'loss': 0.005, 'learning_rate': 3.4294999999999996e-05, 'epoch': 1.19} 31%|███▏ | 3149/10000 
[11:30:12<24:38:55, 12.95s/it] 32%|███▏ | 3150/10000 [11:30:25<24:41:09, 12.97s/it] {'loss': 0.0057, 'learning_rate': 3.429e-05, 'epoch': 1.19} 32%|███▏ | 3150/10000 [11:30:25<24:41:09, 12.97s/it] 32%|███▏ | 3151/10000 [11:30:38<24:41:26, 12.98s/it] {'loss': 0.0054, 'learning_rate': 3.4285e-05, 'epoch': 1.19} 32%|███▏ | 3151/10000 [11:30:38<24:41:26, 12.98s/it] 32%|███▏ | 3152/10000 [11:30:50<24:38:59, 12.96s/it] {'loss': 0.0083, 'learning_rate': 3.4280000000000004e-05, 'epoch': 1.19} 32%|███▏ | 3152/10000 [11:30:50<24:38:59, 12.96s/it] 32%|███▏ | 3153/10000 [11:31:03<24:38:48, 12.96s/it] {'loss': 0.0048, 'learning_rate': 3.4275e-05, 'epoch': 1.19} 32%|███▏ | 3153/10000 [11:31:03<24:38:48, 12.96s/it] 32%|███▏ | 3154/10000 [11:31:17<24:44:49, 13.01s/it] {'loss': 0.0051, 'learning_rate': 3.427e-05, 'epoch': 1.19} 32%|███▏ | 3154/10000 [11:31:17<24:44:49, 13.01s/it] 32%|███▏ | 3155/10000 [11:31:29<24:40:34, 12.98s/it] {'loss': 0.0061, 'learning_rate': 3.4265000000000005e-05, 'epoch': 1.19} 32%|███▏ | 3155/10000 [11:31:29<24:40:34, 12.98s/it] 32%|███▏ | 3156/10000 [11:31:42<24:41:14, 12.99s/it] {'loss': 0.0062, 'learning_rate': 3.426e-05, 'epoch': 1.19} 32%|███▏ | 3156/10000 [11:31:42<24:41:14, 12.99s/it] 32%|███▏ | 3157/10000 [11:31:55<24:36:59, 12.95s/it] {'loss': 0.007, 'learning_rate': 3.4255e-05, 'epoch': 1.19} 32%|███▏ | 3157/10000 [11:31:55<24:36:59, 12.95s/it] 32%|███▏ | 3158/10000 [11:32:08<24:37:05, 12.95s/it] {'loss': 0.005, 'learning_rate': 3.4250000000000006e-05, 'epoch': 1.19} 32%|███▏ | 3158/10000 [11:32:08<24:37:05, 12.95s/it] 32%|███▏ | 3159/10000 [11:32:21<24:35:58, 12.95s/it] {'loss': 0.0087, 'learning_rate': 3.4245e-05, 'epoch': 1.19} 32%|███▏ | 3159/10000 [11:32:21<24:35:58, 12.95s/it] 32%|███▏ | 3160/10000 [11:32:34<24:36:16, 12.95s/it] {'loss': 0.0067, 'learning_rate': 3.424e-05, 'epoch': 1.19} 32%|███▏ | 3160/10000 [11:32:34<24:36:16, 12.95s/it] 32%|███▏ | 3161/10000 [11:32:47<24:39:43, 12.98s/it] {'loss': 0.0046, 'learning_rate': 3.4235e-05, 
'epoch': 1.19} 32%|███▏ | 3161/10000 [11:32:47<24:39:43, 12.98s/it] 32%|███▏ | 3162/10000 [11:33:00<24:38:41, 12.97s/it] {'loss': 0.0055, 'learning_rate': 3.423e-05, 'epoch': 1.19} 32%|███▏ | 3162/10000 [11:33:00<24:38:41, 12.97s/it] 32%|███▏ | 3163/10000 [11:33:13<24:38:35, 12.98s/it] {'loss': 0.0062, 'learning_rate': 3.4225e-05, 'epoch': 1.19} 32%|███▏ | 3163/10000 [11:33:13<24:38:35, 12.98s/it] 32%|███▏ | 3164/10000 [11:33:26<24:41:44, 13.01s/it] {'loss': 0.0042, 'learning_rate': 3.422e-05, 'epoch': 1.19} 32%|███▏ | 3164/10000 [11:33:26<24:41:44, 13.01s/it] 32%|███▏ | 3165/10000 [11:33:39<24:40:51, 13.00s/it] {'loss': 0.0074, 'learning_rate': 3.4215000000000004e-05, 'epoch': 1.19} 32%|███▏ | 3165/10000 [11:33:39<24:40:51, 13.00s/it] 32%|███▏ | 3166/10000 [11:33:52<24:40:31, 13.00s/it] {'loss': 0.0054, 'learning_rate': 3.4210000000000006e-05, 'epoch': 1.19} 32%|███▏ | 3166/10000 [11:33:52<24:40:31, 13.00s/it] 32%|███▏ | 3167/10000 [11:34:05<24:38:39, 12.98s/it] {'loss': 0.0081, 'learning_rate': 3.4205e-05, 'epoch': 1.19} 32%|███▏ | 3167/10000 [11:34:05<24:38:39, 12.98s/it] 32%|███▏ | 3168/10000 [11:34:18<24:40:25, 13.00s/it] {'loss': 0.0044, 'learning_rate': 3.4200000000000005e-05, 'epoch': 1.19} 32%|███▏ | 3168/10000 [11:34:18<24:40:25, 13.00s/it] 32%|███▏ | 3169/10000 [11:34:31<24:37:41, 12.98s/it] {'loss': 0.0067, 'learning_rate': 3.4195e-05, 'epoch': 1.19} 32%|███▏ | 3169/10000 [11:34:31<24:37:41, 12.98s/it] 32%|███▏ | 3170/10000 [11:34:44<24:36:56, 12.97s/it] {'loss': 0.0062, 'learning_rate': 3.419e-05, 'epoch': 1.19} 32%|███▏ | 3170/10000 [11:34:44<24:36:56, 12.97s/it] 32%|███▏ | 3171/10000 [11:34:57<24:38:25, 12.99s/it] {'loss': 0.0057, 'learning_rate': 3.4185e-05, 'epoch': 1.19} 32%|███▏ | 3171/10000 [11:34:57<24:38:25, 12.99s/it] 32%|███▏ | 3172/10000 [11:35:10<24:38:50, 13.00s/it] {'loss': 0.005, 'learning_rate': 3.418e-05, 'epoch': 1.2} 32%|███▏ | 3172/10000 [11:35:10<24:38:50, 13.00s/it] 32%|███▏ | 3173/10000 [11:35:23<24:37:31, 12.99s/it] {'loss': 
0.0047, 'learning_rate': 3.4175000000000004e-05, 'epoch': 1.2} 32%|███▏ | 3173/10000 [11:35:23<24:37:31, 12.99s/it] 32%|███▏ | 3174/10000 [11:35:36<24:34:39, 12.96s/it] {'loss': 0.0063, 'learning_rate': 3.417e-05, 'epoch': 1.2} 32%|███▏ | 3174/10000 [11:35:36<24:34:39, 12.96s/it] 32%|███▏ | 3175/10000 [11:35:49<24:31:00, 12.93s/it] {'loss': 0.0065, 'learning_rate': 3.4165e-05, 'epoch': 1.2} 32%|███▏ | 3175/10000 [11:35:49<24:31:00, 12.93s/it] 32%|███▏ | 3176/10000 [11:36:02<24:26:48, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.4160000000000005e-05, 'epoch': 1.2} 32%|███▏ | 3176/10000 [11:36:02<24:26:48, 12.90s/it] 32%|███▏ | 3177/10000 [11:36:15<24:25:54, 12.89s/it] {'loss': 0.0064, 'learning_rate': 3.4155e-05, 'epoch': 1.2} 32%|███▏ | 3177/10000 [11:36:15<24:25:54, 12.89s/it] 32%|███▏ | 3178/10000 [11:36:27<24:24:39, 12.88s/it] {'loss': 0.0068, 'learning_rate': 3.415e-05, 'epoch': 1.2} 32%|███▏ | 3178/10000 [11:36:27<24:24:39, 12.88s/it] 32%|███▏ | 3179/10000 [11:36:40<24:26:48, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.4145e-05, 'epoch': 1.2} 32%|███▏ | 3179/10000 [11:36:40<24:26:48, 12.90s/it] 32%|███▏ | 3180/10000 [11:36:53<24:27:01, 12.91s/it] {'loss': 0.0069, 'learning_rate': 3.414e-05, 'epoch': 1.2} 32%|███▏ | 3180/10000 [11:36:53<24:27:01, 12.91s/it] 32%|███▏ | 3181/10000 [11:37:06<24:24:59, 12.89s/it] {'loss': 0.0061, 'learning_rate': 3.4135e-05, 'epoch': 1.2} 32%|███▏ | 3181/10000 [11:37:06<24:24:59, 12.89s/it] 32%|███▏ | 3182/10000 [11:37:19<24:24:41, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.413e-05, 'epoch': 1.2} 32%|███▏ | 3182/10000 [11:37:19<24:24:41, 12.89s/it] 32%|███▏ | 3183/10000 [11:37:32<24:24:05, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.4125e-05, 'epoch': 1.2} 32%|███▏ | 3183/10000 [11:37:32<24:24:05, 12.89s/it] 32%|███▏ | 3184/10000 [11:37:45<24:24:36, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.412e-05, 'epoch': 1.2} 32%|███▏ | 3184/10000 [11:37:45<24:24:36, 12.89s/it] 32%|███▏ | 3185/10000 [11:37:58<24:24:47, 
12.90s/it] {'loss': 0.0041, 'learning_rate': 3.4115e-05, 'epoch': 1.2} 32%|███▏ | 3185/10000 [11:37:58<24:24:47, 12.90s/it] 32%|███▏ | 3186/10000 [11:38:11<24:24:29, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.4110000000000004e-05, 'epoch': 1.2} 32%|███▏ | 3186/10000 [11:38:11<24:24:29, 12.90s/it] 32%|███▏ | 3187/10000 [11:38:23<24:22:31, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.4105000000000006e-05, 'epoch': 1.2} 32%|███▏ | 3187/10000 [11:38:23<24:22:31, 12.88s/it] 32%|███▏ | 3188/10000 [11:38:36<24:24:18, 12.90s/it] {'loss': 0.0052, 'learning_rate': 3.41e-05, 'epoch': 1.2} 32%|███▏ | 3188/10000 [11:38:36<24:24:18, 12.90s/it] 32%|███▏ | 3189/10000 [11:38:49<24:24:36, 12.90s/it] {'loss': 0.0039, 'learning_rate': 3.4095e-05, 'epoch': 1.2} 32%|███▏ | 3189/10000 [11:38:49<24:24:36, 12.90s/it] 32%|███▏ | 3190/10000 [11:39:02<24:23:39, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.409e-05, 'epoch': 1.2} 32%|███▏ | 3190/10000 [11:39:02<24:23:39, 12.90s/it] 32%|███▏ | 3191/10000 [11:39:15<24:23:44, 12.90s/it] {'loss': 0.0046, 'learning_rate': 3.4085e-05, 'epoch': 1.2} 32%|███▏ | 3191/10000 [11:39:15<24:23:44, 12.90s/it] 32%|███▏ | 3192/10000 [11:39:28<24:23:24, 12.90s/it] {'loss': 0.0055, 'learning_rate': 3.408e-05, 'epoch': 1.2} 32%|███▏ | 3192/10000 [11:39:28<24:23:24, 12.90s/it] 32%|███▏ | 3193/10000 [11:39:41<24:26:26, 12.93s/it] {'loss': 0.0044, 'learning_rate': 3.4075e-05, 'epoch': 1.2} 32%|███▏ | 3193/10000 [11:39:41<24:26:26, 12.93s/it] 32%|███▏ | 3194/10000 [11:39:54<24:26:50, 12.93s/it] {'loss': 0.0053, 'learning_rate': 3.4070000000000004e-05, 'epoch': 1.2} 32%|███▏ | 3194/10000 [11:39:54<24:26:50, 12.93s/it] 32%|███▏ | 3195/10000 [11:40:07<24:26:18, 12.93s/it] {'loss': 0.0049, 'learning_rate': 3.4065e-05, 'epoch': 1.2} 32%|███▏ | 3195/10000 [11:40:07<24:26:18, 12.93s/it] 32%|███▏ | 3196/10000 [11:40:20<24:24:06, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.406e-05, 'epoch': 1.2} 32%|███▏ | 3196/10000 [11:40:20<24:24:06, 12.91s/it] 32%|███▏ | 
3197/10000 [11:40:33<24:23:12, 12.91s/it] {'loss': 0.0078, 'learning_rate': 3.4055000000000005e-05, 'epoch': 1.2} 32%|███▏ | 3197/10000 [11:40:33<24:23:12, 12.91s/it] 32%|███▏ | 3198/10000 [11:40:46<24:23:06, 12.91s/it] {'loss': 0.005, 'learning_rate': 3.405e-05, 'epoch': 1.2} 32%|███▏ | 3198/10000 [11:40:46<24:23:06, 12.91s/it] 32%|███▏ | 3199/10000 [11:40:58<24:24:11, 12.92s/it] {'loss': 0.0044, 'learning_rate': 3.4045e-05, 'epoch': 1.21} 32%|███▏ | 3199/10000 [11:40:58<24:24:11, 12.92s/it] 32%|███▏ | 3200/10000 [11:41:11<24:21:21, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.404e-05, 'epoch': 1.21} 32%|███▏ | 3200/10000 [11:41:11<24:21:21, 12.89s/it] 32%|███▏ | 3201/10000 [11:41:24<24:18:56, 12.87s/it] {'loss': 0.0056, 'learning_rate': 3.4035e-05, 'epoch': 1.21} 32%|███▏ | 3201/10000 [11:41:24<24:18:56, 12.87s/it] 32%|███▏ | 3202/10000 [11:41:37<24:19:23, 12.88s/it] {'loss': 0.0054, 'learning_rate': 3.403e-05, 'epoch': 1.21} 32%|███▏ | 3202/10000 [11:41:37<24:19:23, 12.88s/it] 32%|███▏ | 3203/10000 [11:41:50<24:18:28, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.4025e-05, 'epoch': 1.21} 32%|███▏ | 3203/10000 [11:41:50<24:18:28, 12.87s/it] 32%|███▏ | 3204/10000 [11:42:03<24:19:33, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.402e-05, 'epoch': 1.21} 32%|███▏ | 3204/10000 [11:42:03<24:19:33, 12.89s/it] 32%|███▏ | 3205/10000 [11:42:16<24:20:38, 12.90s/it] {'loss': 0.0041, 'learning_rate': 3.4015000000000006e-05, 'epoch': 1.21} 32%|███▏ | 3205/10000 [11:42:16<24:20:38, 12.90s/it] 32%|███▏ | 3206/10000 [11:42:29<24:21:41, 12.91s/it] {'loss': 0.0048, 'learning_rate': 3.401e-05, 'epoch': 1.21} 32%|███▏ | 3206/10000 [11:42:29<24:21:41, 12.91s/it] 32%|███▏ | 3207/10000 [11:42:42<24:20:50, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.4005000000000004e-05, 'epoch': 1.21} 32%|███▏ | 3207/10000 [11:42:42<24:20:50, 12.90s/it] 32%|███▏ | 3208/10000 [11:42:54<24:20:36, 12.90s/it] {'loss': 0.0045, 'learning_rate': 3.4000000000000007e-05, 'epoch': 1.21} 32%|███▏ | 
3208/10000 [11:42:54<24:20:36, 12.90s/it] 32%|███▏ | 3209/10000 [11:43:07<24:17:41, 12.88s/it] {'loss': 0.004, 'learning_rate': 3.3995e-05, 'epoch': 1.21} 32%|███▏ | 3209/10000 [11:43:07<24:17:41, 12.88s/it] 32%|███▏ | 3210/10000 [11:43:20<24:17:01, 12.88s/it] {'loss': 0.006, 'learning_rate': 3.399e-05, 'epoch': 1.21} 32%|███▏ | 3210/10000 [11:43:20<24:17:01, 12.88s/it] 32%|███▏ | 3211/10000 [11:43:33<24:16:49, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.3985e-05, 'epoch': 1.21} 32%|███▏ | 3211/10000 [11:43:33<24:16:49, 12.88s/it] 32%|███▏ | 3212/10000 [11:43:46<24:15:19, 12.86s/it] {'loss': 0.0088, 'learning_rate': 3.398e-05, 'epoch': 1.21} 32%|███▏ | 3212/10000 [11:43:46<24:15:19, 12.86s/it] 32%|███▏ | 3213/10000 [11:43:59<24:15:35, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.3975e-05, 'epoch': 1.21} 32%|███▏ | 3213/10000 [11:43:59<24:15:35, 12.87s/it] 32%|███▏ | 3214/10000 [11:44:12<24:16:36, 12.88s/it] {'loss': 0.0084, 'learning_rate': 3.397e-05, 'epoch': 1.21} 32%|███▏ | 3214/10000 [11:44:12<24:16:36, 12.88s/it] 32%|███▏ | 3215/10000 [11:44:25<24:16:43, 12.88s/it] {'loss': 0.0057, 'learning_rate': 3.3965000000000004e-05, 'epoch': 1.21} 32%|███▏ | 3215/10000 [11:44:25<24:16:43, 12.88s/it] 32%|███▏ | 3216/10000 [11:44:37<24:15:53, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.396e-05, 'epoch': 1.21} 32%|███▏ | 3216/10000 [11:44:37<24:15:53, 12.88s/it] 32%|███▏ | 3217/10000 [11:44:50<24:18:36, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.3955e-05, 'epoch': 1.21} 32%|███▏ | 3217/10000 [11:44:50<24:18:36, 12.90s/it] 32%|███▏ | 3218/10000 [11:45:03<24:17:15, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.3950000000000005e-05, 'epoch': 1.21} 32%|███▏ | 3218/10000 [11:45:03<24:17:15, 12.89s/it] 32%|███▏ | 3219/10000 [11:45:16<24:15:36, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.3945e-05, 'epoch': 1.21} 32%|███▏ | 3219/10000 [11:45:16<24:15:36, 12.88s/it] 32%|███▏ | 3220/10000 [11:45:29<24:16:18, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.394e-05, 
'epoch': 1.21} 32%|███▏ | 3220/10000 [11:45:29<24:16:18, 12.89s/it] 32%|███▏ | 3221/10000 [11:45:42<24:17:29, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.3935e-05, 'epoch': 1.21} 32%|███▏ | 3221/10000 [11:45:42<24:17:29, 12.90s/it] 32%|███▏ | 3222/10000 [11:45:55<24:19:21, 12.92s/it] {'loss': 0.0044, 'learning_rate': 3.393e-05, 'epoch': 1.21} 32%|███▏ | 3222/10000 [11:45:55<24:19:21, 12.92s/it] 32%|███▏ | 3223/10000 [11:46:08<24:19:13, 12.92s/it] {'loss': 0.0059, 'learning_rate': 3.3925e-05, 'epoch': 1.21} 32%|███▏ | 3223/10000 [11:46:08<24:19:13, 12.92s/it] 32%|███▏ | 3224/10000 [11:46:21<24:19:09, 12.92s/it] {'loss': 0.0042, 'learning_rate': 3.392e-05, 'epoch': 1.21} 32%|███▏ | 3224/10000 [11:46:21<24:19:09, 12.92s/it] 32%|███▏ | 3225/10000 [11:46:34<24:17:55, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.3915e-05, 'epoch': 1.22} 32%|███▏ | 3225/10000 [11:46:34<24:17:55, 12.91s/it] 32%|███▏ | 3226/10000 [11:46:47<24:21:09, 12.94s/it] {'loss': 0.006, 'learning_rate': 3.3910000000000006e-05, 'epoch': 1.22} 32%|███▏ | 3226/10000 [11:46:47<24:21:09, 12.94s/it] 32%|███▏ | 3227/10000 [11:46:59<24:18:58, 12.92s/it] {'loss': 0.0065, 'learning_rate': 3.3905e-05, 'epoch': 1.22} 32%|███▏ | 3227/10000 [11:46:59<24:18:58, 12.92s/it] 32%|███▏ | 3228/10000 [11:47:12<24:15:42, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.3900000000000004e-05, 'epoch': 1.22} 32%|███▏ | 3228/10000 [11:47:12<24:15:42, 12.90s/it] 32%|███▏ | 3229/10000 [11:47:25<24:16:38, 12.91s/it] {'loss': 0.0047, 'learning_rate': 3.3895e-05, 'epoch': 1.22} 32%|███▏ | 3229/10000 [11:47:25<24:16:38, 12.91s/it] 32%|███▏ | 3230/10000 [11:47:38<24:17:02, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.389e-05, 'epoch': 1.22} 32%|███▏ | 3230/10000 [11:47:38<24:17:02, 12.91s/it] 32%|███▏ | 3231/10000 [11:47:51<24:16:24, 12.91s/it] {'loss': 0.0051, 'learning_rate': 3.3885e-05, 'epoch': 1.22} 32%|███▏ | 3231/10000 [11:47:51<24:16:24, 12.91s/it] 32%|███▏ | 3232/10000 [11:48:04<24:16:50, 12.92s/it] {'loss': 0.0088, 
'learning_rate': 3.388e-05, 'epoch': 1.22} 32%|███▏ | 3232/10000 [11:48:04<24:16:50, 12.92s/it] 32%|███▏ | 3233/10000 [11:48:17<24:15:55, 12.91s/it] {'loss': 0.0061, 'learning_rate': 3.3875000000000003e-05, 'epoch': 1.22} 32%|███▏ | 3233/10000 [11:48:17<24:15:55, 12.91s/it] 32%|███▏ | 3234/10000 [11:48:30<24:10:53, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.387e-05, 'epoch': 1.22} 32%|███▏ | 3234/10000 [11:48:30<24:10:53, 12.87s/it] 32%|███▏ | 3235/10000 [11:48:43<24:14:33, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.3865e-05, 'epoch': 1.22} 32%|███▏ | 3235/10000 [11:48:43<24:14:33, 12.90s/it] 32%|███▏ | 3236/10000 [11:48:56<24:15:27, 12.91s/it] {'loss': 0.0067, 'learning_rate': 3.3860000000000004e-05, 'epoch': 1.22} 32%|███▏ | 3236/10000 [11:48:56<24:15:27, 12.91s/it] 32%|███▏ | 3237/10000 [11:49:08<24:14:28, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.3855e-05, 'epoch': 1.22} 32%|███▏ | 3237/10000 [11:49:08<24:14:28, 12.90s/it] 32%|███▏ | 3238/10000 [11:49:21<24:16:23, 12.92s/it] {'loss': 0.0049, 'learning_rate': 3.385e-05, 'epoch': 1.22} 32%|███▏ | 3238/10000 [11:49:21<24:16:23, 12.92s/it] 32%|███▏ | 3239/10000 [11:49:34<24:13:27, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.3845e-05, 'epoch': 1.22} 32%|███▏ | 3239/10000 [11:49:34<24:13:27, 12.90s/it] 32%|███▏ | 3240/10000 [11:49:47<24:12:14, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.384e-05, 'epoch': 1.22} 32%|███▏ | 3240/10000 [11:49:47<24:12:14, 12.89s/it] 32%|███▏ | 3241/10000 [11:50:00<24:10:16, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.3835e-05, 'epoch': 1.22} 32%|███▏ | 3241/10000 [11:50:00<24:10:16, 12.87s/it] 32%|███▏ | 3242/10000 [11:50:13<24:08:44, 12.86s/it] {'loss': 0.0061, 'learning_rate': 3.383e-05, 'epoch': 1.22} 32%|███▏ | 3242/10000 [11:50:13<24:08:44, 12.86s/it] 32%|███▏ | 3243/10000 [11:50:26<24:07:05, 12.85s/it] {'loss': 0.0061, 'learning_rate': 3.3825e-05, 'epoch': 1.22} 32%|███▏ | 3243/10000 [11:50:26<24:07:05, 12.85s/it] 32%|███▏ | 3244/10000 [11:50:39<24:09:14, 
12.87s/it] {'loss': 0.0048, 'learning_rate': 3.3820000000000005e-05, 'epoch': 1.22} 32%|███▏ | 3244/10000 [11:50:39<24:09:14, 12.87s/it] 32%|███▏ | 3245/10000 [11:50:52<24:11:48, 12.90s/it] {'loss': 0.0055, 'learning_rate': 3.3815e-05, 'epoch': 1.22} 32%|███▏ | 3245/10000 [11:50:52<24:11:48, 12.90s/it] 32%|███▏ | 3246/10000 [11:51:04<24:10:17, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.381e-05, 'epoch': 1.22} 32%|███▏ | 3246/10000 [11:51:04<24:10:17, 12.88s/it] 32%|███▏ | 3247/10000 [11:51:17<24:10:17, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.3805000000000006e-05, 'epoch': 1.22} 32%|███▏ | 3247/10000 [11:51:17<24:10:17, 12.89s/it] 32%|███▏ | 3248/10000 [11:51:30<24:09:04, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.38e-05, 'epoch': 1.22} 32%|███▏ | 3248/10000 [11:51:30<24:09:04, 12.88s/it] 32%|███▏ | 3249/10000 [11:51:43<24:10:28, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.3795e-05, 'epoch': 1.22} 32%|███▏ | 3249/10000 [11:51:43<24:10:28, 12.89s/it] 32%|███▎ | 3250/10000 [11:51:56<24:12:31, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.379e-05, 'epoch': 1.22} 32%|███▎ | 3250/10000 [11:51:56<24:12:31, 12.91s/it] 33%|███▎ | 3251/10000 [11:52:09<24:12:45, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.3785e-05, 'epoch': 1.22} 33%|███▎ | 3251/10000 [11:52:09<24:12:45, 12.92s/it] 33%|███▎ | 3252/10000 [11:52:22<24:09:28, 12.89s/it] {'loss': 0.006, 'learning_rate': 3.378e-05, 'epoch': 1.23} 33%|███▎ | 3252/10000 [11:52:22<24:09:28, 12.89s/it] 33%|███▎ | 3253/10000 [11:52:35<24:10:19, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.3775e-05, 'epoch': 1.23} 33%|███▎ | 3253/10000 [11:52:35<24:10:19, 12.90s/it] 33%|███▎ | 3254/10000 [11:52:48<24:08:22, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.3770000000000004e-05, 'epoch': 1.23} 33%|███▎ | 3254/10000 [11:52:48<24:08:22, 12.88s/it] 33%|███▎ | 3255/10000 [11:53:00<24:11:04, 12.91s/it] {'loss': 0.0046, 'learning_rate': 3.3765e-05, 'epoch': 1.23} 33%|███▎ | 3255/10000 [11:53:01<24:11:04, 12.91s/it] 
33%|███▎ | 3256/10000 [11:53:14<24:14:49, 12.94s/it] {'loss': 0.0048, 'learning_rate': 3.376e-05, 'epoch': 1.23} 33%|███▎ | 3256/10000 [11:53:14<24:14:49, 12.94s/it] 33%|███▎ | 3257/10000 [11:53:26<24:11:53, 12.92s/it] {'loss': 0.0047, 'learning_rate': 3.3755000000000005e-05, 'epoch': 1.23} 33%|███▎ | 3257/10000 [11:53:26<24:11:53, 12.92s/it] 33%|███▎ | 3258/10000 [11:53:39<24:10:24, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.375000000000001e-05, 'epoch': 1.23} 33%|███▎ | 3258/10000 [11:53:39<24:10:24, 12.91s/it] 33%|███▎ | 3259/10000 [11:53:52<24:09:33, 12.90s/it] {'loss': 0.0045, 'learning_rate': 3.3745e-05, 'epoch': 1.23} 33%|███▎ | 3259/10000 [11:53:52<24:09:33, 12.90s/it] 33%|███▎ | 3260/10000 [11:54:05<24:09:23, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.374e-05, 'epoch': 1.23} 33%|███▎ | 3260/10000 [11:54:05<24:09:23, 12.90s/it] 33%|███▎ | 3261/10000 [11:54:18<24:10:20, 12.91s/it] {'loss': 0.0047, 'learning_rate': 3.3735e-05, 'epoch': 1.23} 33%|███▎ | 3261/10000 [11:54:18<24:10:20, 12.91s/it] 33%|███▎ | 3262/10000 [11:54:31<24:12:06, 12.93s/it] {'loss': 0.0059, 'learning_rate': 3.373e-05, 'epoch': 1.23} 33%|███▎ | 3262/10000 [11:54:31<24:12:06, 12.93s/it] 33%|███▎ | 3263/10000 [11:54:44<24:11:55, 12.93s/it] {'loss': 0.007, 'learning_rate': 3.3725e-05, 'epoch': 1.23} 33%|███▎ | 3263/10000 [11:54:44<24:11:55, 12.93s/it] 33%|███▎ | 3264/10000 [11:54:57<24:14:23, 12.95s/it] {'loss': 0.0057, 'learning_rate': 3.372e-05, 'epoch': 1.23} 33%|███▎ | 3264/10000 [11:54:57<24:14:23, 12.95s/it] 33%|███▎ | 3265/10000 [11:55:10<24:12:15, 12.94s/it] {'loss': 0.0045, 'learning_rate': 3.3715000000000005e-05, 'epoch': 1.23} 33%|███▎ | 3265/10000 [11:55:10<24:12:15, 12.94s/it] 33%|███▎ | 3266/10000 [11:55:23<24:11:13, 12.93s/it] {'loss': 0.0068, 'learning_rate': 3.371e-05, 'epoch': 1.23} 33%|███▎ | 3266/10000 [11:55:23<24:11:13, 12.93s/it] 33%|███▎ | 3267/10000 [11:55:36<24:11:20, 12.93s/it] {'loss': 0.0063, 'learning_rate': 3.3705000000000003e-05, 'epoch': 1.23} 
33%|███▎ | 3267/10000 [11:55:36<24:11:20, 12.93s/it] 33%|███▎ | 3268/10000 [11:55:49<24:11:01, 12.93s/it] {'loss': 0.0069, 'learning_rate': 3.3700000000000006e-05, 'epoch': 1.23} 33%|███▎ | 3268/10000 [11:55:49<24:11:01, 12.93s/it] 33%|███▎ | 3269/10000 [11:56:02<24:10:54, 12.93s/it] {'loss': 0.0069, 'learning_rate': 3.3695e-05, 'epoch': 1.23} 33%|███▎ | 3269/10000 [11:56:02<24:10:54, 12.93s/it] 33%|███▎ | 3270/10000 [11:56:14<24:12:09, 12.95s/it] {'loss': 0.0042, 'learning_rate': 3.369e-05, 'epoch': 1.23} 33%|███▎ | 3270/10000 [11:56:15<24:12:09, 12.95s/it] 33%|███▎ | 3271/10000 [11:56:28<24:14:23, 12.97s/it] {'loss': 0.0044, 'learning_rate': 3.3685e-05, 'epoch': 1.23} 33%|███▎ | 3271/10000 [11:56:28<24:14:23, 12.97s/it] 33%|███▎ | 3272/10000 [11:56:41<24:15:18, 12.98s/it] {'loss': 0.0062, 'learning_rate': 3.368e-05, 'epoch': 1.23} 33%|███▎ | 3272/10000 [11:56:41<24:15:18, 12.98s/it] 33%|███▎ | 3273/10000 [11:56:53<24:13:25, 12.96s/it] {'loss': 0.0065, 'learning_rate': 3.3675e-05, 'epoch': 1.23} 33%|███▎ | 3273/10000 [11:56:53<24:13:25, 12.96s/it] 33%|███▎ | 3274/10000 [11:57:06<24:15:46, 12.99s/it] {'loss': 0.0042, 'learning_rate': 3.367e-05, 'epoch': 1.23} 33%|███▎ | 3274/10000 [11:57:07<24:15:46, 12.99s/it] 33%|███▎ | 3275/10000 [11:57:19<24:15:16, 12.98s/it] {'loss': 0.0069, 'learning_rate': 3.3665000000000004e-05, 'epoch': 1.23} 33%|███▎ | 3275/10000 [11:57:19<24:15:16, 12.98s/it] 33%|███▎ | 3276/10000 [11:57:32<24:15:12, 12.99s/it] {'loss': 0.0067, 'learning_rate': 3.366e-05, 'epoch': 1.23} 33%|███▎ | 3276/10000 [11:57:32<24:15:12, 12.99s/it] 33%|███▎ | 3277/10000 [11:57:45<24:15:39, 12.99s/it] {'loss': 0.0055, 'learning_rate': 3.3655e-05, 'epoch': 1.23} 33%|███▎ | 3277/10000 [11:57:45<24:15:39, 12.99s/it] 33%|███▎ | 3278/10000 [11:57:58<24:12:29, 12.96s/it] {'loss': 0.006, 'learning_rate': 3.3650000000000005e-05, 'epoch': 1.24} 33%|███▎ | 3278/10000 [11:57:58<24:12:29, 12.96s/it] 33%|███▎ | 3279/10000 [11:58:11<24:13:54, 12.98s/it] {'loss': 0.0054, 
'learning_rate': 3.364500000000001e-05, 'epoch': 1.24} 33%|███▎ | 3279/10000 [11:58:11<24:13:54, 12.98s/it] 33%|███▎ | 3280/10000 [11:58:24<24:14:42, 12.99s/it] {'loss': 0.005, 'learning_rate': 3.3639999999999996e-05, 'epoch': 1.24} 33%|███▎ | 3280/10000 [11:58:24<24:14:42, 12.99s/it] 33%|███▎ | 3281/10000 [11:58:37<24:15:07, 12.99s/it] {'loss': 0.0042, 'learning_rate': 3.3635e-05, 'epoch': 1.24} 33%|███▎ | 3281/10000 [11:58:37<24:15:07, 12.99s/it] 33%|███▎ | 3282/10000 [11:58:50<24:14:12, 12.99s/it] {'loss': 0.0068, 'learning_rate': 3.363e-05, 'epoch': 1.24} 33%|███▎ | 3282/10000 [11:58:50<24:14:12, 12.99s/it] 33%|███▎ | 3283/10000 [11:59:03<24:12:21, 12.97s/it] {'loss': 0.0048, 'learning_rate': 3.3625000000000004e-05, 'epoch': 1.24} 33%|███▎ | 3283/10000 [11:59:03<24:12:21, 12.97s/it] 33%|███▎ | 3284/10000 [11:59:16<24:10:52, 12.96s/it] {'loss': 0.0047, 'learning_rate': 3.362e-05, 'epoch': 1.24} 33%|███▎ | 3284/10000 [11:59:16<24:10:52, 12.96s/it] 33%|███▎ | 3285/10000 [11:59:29<24:09:18, 12.95s/it] {'loss': 0.0042, 'learning_rate': 3.3615e-05, 'epoch': 1.24} 33%|███▎ | 3285/10000 [11:59:29<24:09:18, 12.95s/it] 33%|███▎ | 3286/10000 [11:59:42<24:09:38, 12.95s/it] {'loss': 0.0071, 'learning_rate': 3.3610000000000005e-05, 'epoch': 1.24} 33%|███▎ | 3286/10000 [11:59:42<24:09:38, 12.95s/it] 33%|███▎ | 3287/10000 [11:59:55<24:06:58, 12.93s/it] {'loss': 0.0046, 'learning_rate': 3.3605e-05, 'epoch': 1.24} 33%|███▎ | 3287/10000 [11:59:55<24:06:58, 12.93s/it] 33%|███▎ | 3288/10000 [12:00:08<24:07:33, 12.94s/it] {'loss': 0.0057, 'learning_rate': 3.3600000000000004e-05, 'epoch': 1.24} 33%|███▎ | 3288/10000 [12:00:08<24:07:33, 12.94s/it] 33%|███▎ | 3289/10000 [12:00:21<24:05:48, 12.93s/it] {'loss': 0.0057, 'learning_rate': 3.3595000000000006e-05, 'epoch': 1.24} 33%|███▎ | 3289/10000 [12:00:21<24:05:48, 12.93s/it] 33%|███▎ | 3290/10000 [12:00:34<24:05:49, 12.93s/it] {'loss': 0.0054, 'learning_rate': 3.359e-05, 'epoch': 1.24} 33%|███▎ | 3290/10000 [12:00:34<24:05:49, 
12.93s/it] 33%|███▎ | 3291/10000 [12:00:47<24:04:46, 12.92s/it] {'loss': 0.0068, 'learning_rate': 3.3585e-05, 'epoch': 1.24} 33%|███▎ | 3291/10000 [12:00:47<24:04:46, 12.92s/it] 33%|███▎ | 3292/10000 [12:01:00<24:04:43, 12.92s/it] {'loss': 0.0056, 'learning_rate': 3.358e-05, 'epoch': 1.24} 33%|███▎ | 3292/10000 [12:01:00<24:04:43, 12.92s/it] 33%|███▎ | 3293/10000 [12:01:13<24:03:43, 12.92s/it] {'loss': 0.0043, 'learning_rate': 3.3575e-05, 'epoch': 1.24} 33%|███▎ | 3293/10000 [12:01:13<24:03:43, 12.92s/it] 33%|███▎ | 3294/10000 [12:01:25<24:04:37, 12.93s/it] {'loss': 0.0064, 'learning_rate': 3.357e-05, 'epoch': 1.24} 33%|███▎ | 3294/10000 [12:01:25<24:04:37, 12.93s/it] 33%|███▎ | 3295/10000 [12:01:38<24:02:12, 12.91s/it] {'loss': 0.0046, 'learning_rate': 3.3565e-05, 'epoch': 1.24} 33%|███▎ | 3295/10000 [12:01:38<24:02:12, 12.91s/it] 33%|███▎ | 3296/10000 [12:01:51<24:03:53, 12.92s/it] {'loss': 0.0051, 'learning_rate': 3.3560000000000004e-05, 'epoch': 1.24} 33%|███▎ | 3296/10000 [12:01:51<24:03:53, 12.92s/it] 33%|███▎ | 3297/10000 [12:02:04<24:07:07, 12.95s/it] {'loss': 0.0126, 'learning_rate': 3.3555e-05, 'epoch': 1.24} 33%|███▎ | 3297/10000 [12:02:04<24:07:07, 12.95s/it] 33%|███▎ | 3298/10000 [12:02:17<24:04:07, 12.93s/it] {'loss': 0.0039, 'learning_rate': 3.355e-05, 'epoch': 1.24} 33%|███▎ | 3298/10000 [12:02:17<24:04:07, 12.93s/it] 33%|███▎ | 3299/10000 [12:02:30<24:04:55, 12.94s/it] {'loss': 0.0062, 'learning_rate': 3.3545000000000005e-05, 'epoch': 1.24} 33%|███▎ | 3299/10000 [12:02:30<24:04:55, 12.94s/it] 33%|███▎ | 3300/10000 [12:02:43<24:04:16, 12.93s/it] {'loss': 0.0046, 'learning_rate': 3.354e-05, 'epoch': 1.24} 33%|███▎ | 3300/10000 [12:02:43<24:04:16, 12.93s/it] 33%|███▎ | 3301/10000 [12:02:56<24:04:11, 12.94s/it] {'loss': 0.0051, 'learning_rate': 3.3534999999999997e-05, 'epoch': 1.24} 33%|███▎ | 3301/10000 [12:02:56<24:04:11, 12.94s/it] 33%|███▎ | 3302/10000 [12:03:09<24:01:00, 12.91s/it] {'loss': 0.0043, 'learning_rate': 3.353e-05, 'epoch': 1.24} 
33%|███▎ | 3302/10000 [12:03:09<24:01:00, 12.91s/it] 33%|███▎ | 3303/10000 [12:03:22<24:00:26, 12.91s/it] {'loss': 0.0154, 'learning_rate': 3.3525e-05, 'epoch': 1.24} 33%|███▎ | 3303/10000 [12:03:22<24:00:26, 12.91s/it] 33%|███▎ | 3304/10000 [12:03:35<24:00:32, 12.91s/it] {'loss': 0.0054, 'learning_rate': 3.3520000000000004e-05, 'epoch': 1.24} 33%|███▎ | 3304/10000 [12:03:35<24:00:32, 12.91s/it] 33%|███▎ | 3305/10000 [12:03:48<24:02:40, 12.93s/it] {'loss': 0.0039, 'learning_rate': 3.3515e-05, 'epoch': 1.25} 33%|███▎ | 3305/10000 [12:03:48<24:02:40, 12.93s/it] 33%|███▎ | 3306/10000 [12:04:01<24:03:00, 12.93s/it] {'loss': 0.0039, 'learning_rate': 3.351e-05, 'epoch': 1.25} 33%|███▎ | 3306/10000 [12:04:01<24:03:00, 12.93s/it] 33%|███▎ | 3307/10000 [12:04:14<24:06:51, 12.97s/it] {'loss': 0.0048, 'learning_rate': 3.3505000000000005e-05, 'epoch': 1.25} 33%|███▎ | 3307/10000 [12:04:14<24:06:51, 12.97s/it] 33%|███▎ | 3308/10000 [12:04:27<24:06:00, 12.96s/it] {'loss': 0.0048, 'learning_rate': 3.35e-05, 'epoch': 1.25} 33%|███▎ | 3308/10000 [12:04:27<24:06:00, 12.96s/it] 33%|███▎ | 3309/10000 [12:04:39<24:03:04, 12.94s/it] {'loss': 0.0066, 'learning_rate': 3.3495000000000004e-05, 'epoch': 1.25} 33%|███▎ | 3309/10000 [12:04:40<24:03:04, 12.94s/it] 33%|███▎ | 3310/10000 [12:04:52<24:04:24, 12.95s/it] {'loss': 0.0041, 'learning_rate': 3.349e-05, 'epoch': 1.25} 33%|███▎ | 3310/10000 [12:04:53<24:04:24, 12.95s/it] 33%|███▎ | 3311/10000 [12:05:05<24:01:05, 12.93s/it] {'loss': 0.0053, 'learning_rate': 3.3485e-05, 'epoch': 1.25} 33%|███▎ | 3311/10000 [12:05:05<24:01:05, 12.93s/it] 33%|███▎ | 3312/10000 [12:05:18<24:01:28, 12.93s/it] {'loss': 0.0048, 'learning_rate': 3.348e-05, 'epoch': 1.25} 33%|███▎ | 3312/10000 [12:05:18<24:01:28, 12.93s/it] 33%|███▎ | 3313/10000 [12:05:31<24:05:22, 12.97s/it] {'loss': 0.0047, 'learning_rate': 3.3475e-05, 'epoch': 1.25} 33%|███▎ | 3313/10000 [12:05:31<24:05:22, 12.97s/it] 33%|███▎ | 3314/10000 [12:05:44<24:03:20, 12.95s/it] {'loss': 0.0053, 
'learning_rate': 3.347e-05, 'epoch': 1.25} 33%|███▎ | 3314/10000 [12:05:44<24:03:20, 12.95s/it] 33%|███▎ | 3315/10000 [12:05:57<24:00:07, 12.93s/it] {'loss': 0.0056, 'learning_rate': 3.3465e-05, 'epoch': 1.25} 33%|███▎ | 3315/10000 [12:05:57<24:00:07, 12.93s/it] 33%|███▎ | 3316/10000 [12:06:10<24:01:00, 12.94s/it] {'loss': 0.0043, 'learning_rate': 3.346e-05, 'epoch': 1.25} 33%|███▎ | 3316/10000 [12:06:10<24:01:00, 12.94s/it] 33%|███▎ | 3317/10000 [12:06:23<24:04:46, 12.97s/it] {'loss': 0.0045, 'learning_rate': 3.3455000000000004e-05, 'epoch': 1.25} 33%|███▎ | 3317/10000 [12:06:23<24:04:46, 12.97s/it] 33%|███▎ | 3318/10000 [12:06:36<24:04:46, 12.97s/it] {'loss': 0.0038, 'learning_rate': 3.345000000000001e-05, 'epoch': 1.25} 33%|███▎ | 3318/10000 [12:06:36<24:04:46, 12.97s/it] 33%|███▎ | 3319/10000 [12:06:49<24:03:14, 12.96s/it] {'loss': 0.0051, 'learning_rate': 3.3445e-05, 'epoch': 1.25} 33%|███▎ | 3319/10000 [12:06:49<24:03:14, 12.96s/it] 33%|███▎ | 3320/10000 [12:07:02<24:03:26, 12.97s/it] {'loss': 0.0054, 'learning_rate': 3.344e-05, 'epoch': 1.25} 33%|███▎ | 3320/10000 [12:07:02<24:03:26, 12.97s/it] 33%|███▎ | 3321/10000 [12:07:15<24:04:43, 12.98s/it] {'loss': 0.0035, 'learning_rate': 3.3435e-05, 'epoch': 1.25} 33%|███▎ | 3321/10000 [12:07:15<24:04:43, 12.98s/it] 33%|███▎ | 3322/10000 [12:07:28<24:02:45, 12.96s/it] {'loss': 0.0047, 'learning_rate': 3.3430000000000003e-05, 'epoch': 1.25} 33%|███▎ | 3322/10000 [12:07:28<24:02:45, 12.96s/it] 33%|███▎ | 3323/10000 [12:07:41<24:00:18, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.3425e-05, 'epoch': 1.25} 33%|███▎ | 3323/10000 [12:07:41<24:00:18, 12.94s/it] 33%|███▎ | 3324/10000 [12:07:54<24:02:20, 12.96s/it] {'loss': 0.0048, 'learning_rate': 3.342e-05, 'epoch': 1.25} 33%|███▎ | 3324/10000 [12:07:54<24:02:20, 12.96s/it] 33%|███▎ | 3325/10000 [12:08:07<24:01:29, 12.96s/it] {'loss': 0.0048, 'learning_rate': 3.3415000000000004e-05, 'epoch': 1.25} 33%|███▎ | 3325/10000 [12:08:07<24:01:29, 12.96s/it] 33%|███▎ | 3326/10000 
[12:08:20<23:59:03, 12.94s/it] {'loss': 0.006, 'learning_rate': 3.341e-05, 'epoch': 1.25} 33%|███▎ | 3326/10000 [12:08:20<23:59:03, 12.94s/it] 33%|███▎ | 3327/10000 [12:08:33<23:57:59, 12.93s/it] {'loss': 0.0072, 'learning_rate': 3.3405e-05, 'epoch': 1.25} 33%|███▎ | 3327/10000 [12:08:33<23:57:59, 12.93s/it] 33%|███▎ | 3328/10000 [12:08:46<23:58:16, 12.93s/it] {'loss': 0.0051, 'learning_rate': 3.3400000000000005e-05, 'epoch': 1.25} 33%|███▎ | 3328/10000 [12:08:46<23:58:16, 12.93s/it] 33%|███▎ | 3329/10000 [12:08:59<24:00:37, 12.96s/it] {'loss': 0.0039, 'learning_rate': 3.3395e-05, 'epoch': 1.25} 33%|███▎ | 3329/10000 [12:08:59<24:00:37, 12.96s/it] 33%|███▎ | 3330/10000 [12:09:12<24:01:55, 12.97s/it] {'loss': 0.0051, 'learning_rate': 3.339e-05, 'epoch': 1.25} 33%|███▎ | 3330/10000 [12:09:12<24:01:55, 12.97s/it] 33%|███▎ | 3331/10000 [12:09:25<24:01:48, 12.97s/it] {'loss': 0.0055, 'learning_rate': 3.3385e-05, 'epoch': 1.26} 33%|███▎ | 3331/10000 [12:09:25<24:01:48, 12.97s/it] 33%|███▎ | 3332/10000 [12:09:37<24:01:44, 12.97s/it] {'loss': 0.0048, 'learning_rate': 3.338e-05, 'epoch': 1.26} 33%|███▎ | 3332/10000 [12:09:38<24:01:44, 12.97s/it] 33%|███▎ | 3333/10000 [12:09:51<24:03:04, 12.99s/it] {'loss': 0.0048, 'learning_rate': 3.3375e-05, 'epoch': 1.26} 33%|███▎ | 3333/10000 [12:09:51<24:03:04, 12.99s/it] 33%|███▎ | 3334/10000 [12:10:03<24:00:00, 12.96s/it] {'loss': 0.0063, 'learning_rate': 3.337e-05, 'epoch': 1.26} 33%|███▎ | 3334/10000 [12:10:03<24:00:00, 12.96s/it] 33%|███▎ | 3335/10000 [12:10:16<23:58:48, 12.95s/it] {'loss': 0.0059, 'learning_rate': 3.3365e-05, 'epoch': 1.26} 33%|███▎ | 3335/10000 [12:10:16<23:58:48, 12.95s/it] 33%|███▎ | 3336/10000 [12:10:29<23:55:06, 12.92s/it] {'loss': 0.0065, 'learning_rate': 3.336e-05, 'epoch': 1.26} 33%|███▎ | 3336/10000 [12:10:29<23:55:06, 12.92s/it] 33%|███▎ | 3337/10000 [12:10:42<23:55:30, 12.93s/it] {'loss': 0.0059, 'learning_rate': 3.3355e-05, 'epoch': 1.26} 33%|███▎ | 3337/10000 [12:10:42<23:55:30, 12.93s/it] 33%|███▎ | 
3338/10000 [12:10:55<23:55:31, 12.93s/it] {'loss': 0.0058, 'learning_rate': 3.3350000000000004e-05, 'epoch': 1.26} 33%|███▎ | 3338/10000 [12:10:55<23:55:31, 12.93s/it] 33%|███▎ | 3339/10000 [12:11:08<23:56:27, 12.94s/it] {'loss': 0.0055, 'learning_rate': 3.334500000000001e-05, 'epoch': 1.26} 33%|███▎ | 3339/10000 [12:11:08<23:56:27, 12.94s/it] 33%|███▎ | 3340/10000 [12:11:21<23:57:46, 12.95s/it] {'loss': 0.0052, 'learning_rate': 3.3339999999999996e-05, 'epoch': 1.26} 33%|███▎ | 3340/10000 [12:11:21<23:57:46, 12.95s/it] 33%|███▎ | 3341/10000 [12:11:34<24:00:18, 12.98s/it] {'loss': 0.0059, 'learning_rate': 3.3335e-05, 'epoch': 1.26} 33%|███▎ | 3341/10000 [12:11:34<24:00:18, 12.98s/it] 33%|███▎ | 3342/10000 [12:11:47<23:55:30, 12.94s/it] {'loss': 0.0061, 'learning_rate': 3.333e-05, 'epoch': 1.26} 33%|███▎ | 3342/10000 [12:11:47<23:55:30, 12.94s/it] 33%|███▎ | 3343/10000 [12:12:00<23:53:07, 12.92s/it] {'loss': 0.0043, 'learning_rate': 3.3325000000000004e-05, 'epoch': 1.26} 33%|███▎ | 3343/10000 [12:12:00<23:53:07, 12.92s/it] 33%|███▎ | 3344/10000 [12:12:13<23:54:53, 12.93s/it] {'loss': 0.0058, 'learning_rate': 3.332e-05, 'epoch': 1.26} 33%|███▎ | 3344/10000 [12:12:13<23:54:53, 12.93s/it] 33%|███▎ | 3345/10000 [12:12:26<23:50:38, 12.90s/it] {'loss': 0.0046, 'learning_rate': 3.3315e-05, 'epoch': 1.26} 33%|███▎ | 3345/10000 [12:12:26<23:50:38, 12.90s/it] 33%|███▎ | 3346/10000 [12:12:38<23:51:04, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.3310000000000005e-05, 'epoch': 1.26} 33%|███▎ | 3346/10000 [12:12:38<23:51:04, 12.90s/it] 33%|███▎ | 3347/10000 [12:12:51<23:50:31, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.3305e-05, 'epoch': 1.26} 33%|███▎ | 3347/10000 [12:12:51<23:50:31, 12.90s/it] 33%|███▎ | 3348/10000 [12:13:04<23:49:57, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.33e-05, 'epoch': 1.26} 33%|███▎ | 3348/10000 [12:13:04<23:49:57, 12.90s/it] 33%|███▎ | 3349/10000 [12:13:17<23:50:30, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.3295000000000006e-05, 'epoch': 
1.26} 33%|███▎ | 3349/10000 [12:13:17<23:50:30, 12.90s/it] 34%|███▎ | 3350/10000 [12:13:30<23:50:45, 12.91s/it] {'loss': 0.004, 'learning_rate': 3.329e-05, 'epoch': 1.26} 34%|███▎ | 3350/10000 [12:13:30<23:50:45, 12.91s/it] 34%|███▎ | 3351/10000 [12:13:43<23:51:45, 12.92s/it] {'loss': 0.0047, 'learning_rate': 3.3285e-05, 'epoch': 1.26} 34%|███▎ | 3351/10000 [12:13:43<23:51:45, 12.92s/it] 34%|███▎ | 3352/10000 [12:13:56<23:49:54, 12.91s/it] {'loss': 0.0051, 'learning_rate': 3.328e-05, 'epoch': 1.26} 34%|███▎ | 3352/10000 [12:13:56<23:49:54, 12.91s/it] 34%|███▎ | 3353/10000 [12:14:09<23:47:22, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.3275e-05, 'epoch': 1.26} 34%|███▎ | 3353/10000 [12:14:09<23:47:22, 12.88s/it] 34%|███▎ | 3354/10000 [12:14:22<23:44:18, 12.86s/it] {'loss': 0.0055, 'learning_rate': 3.327e-05, 'epoch': 1.26} 34%|███▎ | 3354/10000 [12:14:22<23:44:18, 12.86s/it] 34%|███▎ | 3355/10000 [12:14:35<23:47:11, 12.89s/it] {'loss': 0.0045, 'learning_rate': 3.3265e-05, 'epoch': 1.26} 34%|███▎ | 3355/10000 [12:14:35<23:47:11, 12.89s/it] 34%|███▎ | 3356/10000 [12:14:47<23:47:39, 12.89s/it] {'loss': 0.0046, 'learning_rate': 3.3260000000000003e-05, 'epoch': 1.26} 34%|███▎ | 3356/10000 [12:14:47<23:47:39, 12.89s/it] 34%|███▎ | 3357/10000 [12:15:00<23:47:38, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.3255000000000006e-05, 'epoch': 1.26} 34%|███▎ | 3357/10000 [12:15:00<23:47:38, 12.89s/it] 34%|███▎ | 3358/10000 [12:15:13<23:46:05, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.325e-05, 'epoch': 1.27} 34%|███▎ | 3358/10000 [12:15:13<23:46:05, 12.88s/it] 34%|███▎ | 3359/10000 [12:15:26<23:46:30, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.3245000000000004e-05, 'epoch': 1.27} 34%|███▎ | 3359/10000 [12:15:26<23:46:30, 12.89s/it] 34%|███▎ | 3360/10000 [12:15:39<23:45:16, 12.88s/it] {'loss': 0.0044, 'learning_rate': 3.324e-05, 'epoch': 1.27} 34%|███▎ | 3360/10000 [12:15:39<23:45:16, 12.88s/it] 34%|███▎ | 3361/10000 [12:15:52<23:45:12, 12.88s/it] {'loss': 0.0057, 
'learning_rate': 3.3235e-05, 'epoch': 1.27} 34%|███▎ | 3361/10000 [12:15:52<23:45:12, 12.88s/it] 34%|███▎ | 3362/10000 [12:16:05<23:44:29, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.323e-05, 'epoch': 1.27} 34%|███▎ | 3362/10000 [12:16:05<23:44:29, 12.88s/it] 34%|███▎ | 3363/10000 [12:16:18<23:43:29, 12.87s/it] {'loss': 0.0138, 'learning_rate': 3.3225e-05, 'epoch': 1.27} 34%|███▎ | 3363/10000 [12:16:18<23:43:29, 12.87s/it] 34%|███▎ | 3364/10000 [12:16:30<23:44:20, 12.88s/it] {'loss': 0.0041, 'learning_rate': 3.3220000000000004e-05, 'epoch': 1.27} 34%|███▎ | 3364/10000 [12:16:30<23:44:20, 12.88s/it] 34%|███▎ | 3365/10000 [12:16:43<23:43:15, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.3215e-05, 'epoch': 1.27} 34%|███▎ | 3365/10000 [12:16:43<23:43:15, 12.87s/it] 34%|███▎ | 3366/10000 [12:16:56<23:40:58, 12.85s/it] {'loss': 0.0046, 'learning_rate': 3.321e-05, 'epoch': 1.27} 34%|███▎ | 3366/10000 [12:16:56<23:40:58, 12.85s/it] 34%|███▎ | 3367/10000 [12:17:09<23:43:10, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.3205000000000005e-05, 'epoch': 1.27} 34%|███▎ | 3367/10000 [12:17:09<23:43:10, 12.87s/it] 34%|███▎ | 3368/10000 [12:17:22<23:41:18, 12.86s/it] {'loss': 0.0066, 'learning_rate': 3.32e-05, 'epoch': 1.27} 34%|███▎ | 3368/10000 [12:17:22<23:41:18, 12.86s/it] 34%|███▎ | 3369/10000 [12:17:35<23:42:37, 12.87s/it] {'loss': 0.0036, 'learning_rate': 3.3195e-05, 'epoch': 1.27} 34%|███▎ | 3369/10000 [12:17:35<23:42:37, 12.87s/it] 34%|███▎ | 3370/10000 [12:17:48<23:43:15, 12.88s/it] {'loss': 0.0046, 'learning_rate': 3.319e-05, 'epoch': 1.27} 34%|███▎ | 3370/10000 [12:17:48<23:43:15, 12.88s/it] 34%|███▎ | 3371/10000 [12:18:01<23:44:16, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.3185e-05, 'epoch': 1.27} 34%|███▎ | 3371/10000 [12:18:01<23:44:16, 12.89s/it] 34%|███▎ | 3372/10000 [12:18:13<23:43:52, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.318e-05, 'epoch': 1.27} 34%|███▎ | 3372/10000 [12:18:13<23:43:52, 12.89s/it] 34%|███▎ | 3373/10000 [12:18:26<23:41:45, 
12.87s/it] {'loss': 0.0047, 'learning_rate': 3.3175e-05, 'epoch': 1.27} 34%|███▎ | 3373/10000 [12:18:26<23:41:45, 12.87s/it] 34%|███▎ | 3374/10000 [12:18:39<23:41:44, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.317e-05, 'epoch': 1.27} 34%|███▎ | 3374/10000 [12:18:39<23:41:44, 12.87s/it] 34%|███▍ | 3375/10000 [12:18:52<23:39:46, 12.86s/it] {'loss': 0.0045, 'learning_rate': 3.3165e-05, 'epoch': 1.27} 34%|███▍ | 3375/10000 [12:18:52<23:39:46, 12.86s/it] 34%|███▍ | 3376/10000 [12:19:05<23:40:24, 12.87s/it] {'loss': 0.0063, 'learning_rate': 3.316e-05, 'epoch': 1.27} 34%|███▍ | 3376/10000 [12:19:05<23:40:24, 12.87s/it] 34%|███▍ | 3377/10000 [12:19:18<23:40:56, 12.87s/it] {'loss': 0.0082, 'learning_rate': 3.3155000000000004e-05, 'epoch': 1.27} 34%|███▍ | 3377/10000 [12:19:18<23:40:56, 12.87s/it] 34%|███▍ | 3378/10000 [12:19:31<23:39:46, 12.86s/it] {'loss': 0.0055, 'learning_rate': 3.3150000000000006e-05, 'epoch': 1.27} 34%|███▍ | 3378/10000 [12:19:31<23:39:46, 12.86s/it] 34%|███▍ | 3379/10000 [12:19:44<23:41:35, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.3145e-05, 'epoch': 1.27} 34%|███▍ | 3379/10000 [12:19:44<23:41:35, 12.88s/it] 34%|███▍ | 3380/10000 [12:19:56<23:42:11, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.314e-05, 'epoch': 1.27} 34%|███▍ | 3380/10000 [12:19:56<23:42:11, 12.89s/it] 34%|███▍ | 3381/10000 [12:20:09<23:41:05, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.3135e-05, 'epoch': 1.27} 34%|███▍ | 3381/10000 [12:20:09<23:41:05, 12.88s/it] 34%|███▍ | 3382/10000 [12:20:22<23:39:54, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.313e-05, 'epoch': 1.27} 34%|███▍ | 3382/10000 [12:20:22<23:39:54, 12.87s/it] 34%|███▍ | 3383/10000 [12:20:35<23:40:54, 12.88s/it] {'loss': 0.0036, 'learning_rate': 3.3125e-05, 'epoch': 1.27} 34%|███▍ | 3383/10000 [12:20:35<23:40:54, 12.88s/it] 34%|███▍ | 3384/10000 [12:20:48<23:37:52, 12.86s/it] {'loss': 0.0063, 'learning_rate': 3.312e-05, 'epoch': 1.28} 34%|███▍ | 3384/10000 [12:20:48<23:37:52, 12.86s/it] 34%|███▍ | 
3385/10000 [12:21:01<23:39:03, 12.87s/it] {'loss': 0.0043, 'learning_rate': 3.3115000000000004e-05, 'epoch': 1.28} 34%|███▍ | 3385/10000 [12:21:01<23:39:03, 12.87s/it] 34%|███▍ | 3386/10000 [12:21:14<23:37:27, 12.86s/it] {'loss': 0.0056, 'learning_rate': 3.311e-05, 'epoch': 1.28} 34%|███▍ | 3386/10000 [12:21:14<23:37:27, 12.86s/it] 34%|███▍ | 3387/10000 [12:21:26<23:36:48, 12.85s/it] {'loss': 0.0048, 'learning_rate': 3.3105e-05, 'epoch': 1.28} 34%|███▍ | 3387/10000 [12:21:26<23:36:48, 12.85s/it] 34%|███▍ | 3388/10000 [12:21:39<23:35:05, 12.84s/it] {'loss': 0.0045, 'learning_rate': 3.3100000000000005e-05, 'epoch': 1.28} 34%|███▍ | 3388/10000 [12:21:39<23:35:05, 12.84s/it] 34%|███▍ | 3389/10000 [12:21:52<23:36:57, 12.86s/it] {'loss': 0.0053, 'learning_rate': 3.3095e-05, 'epoch': 1.28} 34%|███▍ | 3389/10000 [12:21:52<23:36:57, 12.86s/it] 34%|███▍ | 3390/10000 [12:22:05<23:36:23, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.309e-05, 'epoch': 1.28} 34%|███▍ | 3390/10000 [12:22:05<23:36:23, 12.86s/it] 34%|███▍ | 3391/10000 [12:22:18<23:38:50, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.3085e-05, 'epoch': 1.28} 34%|███▍ | 3391/10000 [12:22:18<23:38:50, 12.88s/it] 34%|███▍ | 3392/10000 [12:22:31<23:37:56, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.308e-05, 'epoch': 1.28} 34%|███▍ | 3392/10000 [12:22:31<23:37:56, 12.87s/it] 34%|███▍ | 3393/10000 [12:22:44<23:39:39, 12.89s/it] {'loss': 0.0039, 'learning_rate': 3.3075e-05, 'epoch': 1.28} 34%|███▍ | 3393/10000 [12:22:44<23:39:39, 12.89s/it] 34%|███▍ | 3394/10000 [12:22:57<23:38:29, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.307e-05, 'epoch': 1.28} 34%|███▍ | 3394/10000 [12:22:57<23:38:29, 12.88s/it] 34%|███▍ | 3395/10000 [12:23:09<23:37:22, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.3065e-05, 'epoch': 1.28} 34%|███▍ | 3395/10000 [12:23:09<23:37:22, 12.88s/it] 34%|███▍ | 3396/10000 [12:23:22<23:36:40, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.3060000000000005e-05, 'epoch': 1.28} 34%|███▍ | 3396/10000 
[12:23:22<23:36:40, 12.87s/it] 34%|███▍ | 3397/10000 [12:23:35<23:36:41, 12.87s/it] {'loss': 0.0071, 'learning_rate': 3.3055e-05, 'epoch': 1.28} 34%|███▍ | 3397/10000 [12:23:35<23:36:41, 12.87s/it] 34%|███▍ | 3398/10000 [12:23:48<23:36:22, 12.87s/it] {'loss': 0.0044, 'learning_rate': 3.3050000000000004e-05, 'epoch': 1.28} 34%|███▍ | 3398/10000 [12:23:48<23:36:22, 12.87s/it] 34%|███▍ | 3399/10000 [12:24:01<23:34:39, 12.86s/it] {'loss': 0.0057, 'learning_rate': 3.3045000000000006e-05, 'epoch': 1.28} 34%|███▍ | 3399/10000 [12:24:01<23:34:39, 12.86s/it] 34%|███▍ | 3400/10000 [12:24:14<23:34:05, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.304e-05, 'epoch': 1.28} 34%|███▍ | 3400/10000 [12:24:14<23:34:05, 12.86s/it] 34%|███▍ | 3401/10000 [12:24:27<23:35:15, 12.87s/it] {'loss': 0.0067, 'learning_rate': 3.3035e-05, 'epoch': 1.28} 34%|███▍ | 3401/10000 [12:24:27<23:35:15, 12.87s/it] 34%|███▍ | 3402/10000 [12:24:39<23:34:51, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.303e-05, 'epoch': 1.28} 34%|███▍ | 3402/10000 [12:24:39<23:34:51, 12.87s/it] 34%|███▍ | 3403/10000 [12:24:52<23:34:37, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.3025e-05, 'epoch': 1.28} 34%|███▍ | 3403/10000 [12:24:52<23:34:37, 12.87s/it] 34%|███▍ | 3404/10000 [12:25:05<23:34:18, 12.87s/it] {'loss': 0.0044, 'learning_rate': 3.302e-05, 'epoch': 1.28} 34%|███▍ | 3404/10000 [12:25:05<23:34:18, 12.87s/it] 34%|███▍ | 3405/10000 [12:25:18<23:33:11, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.3015e-05, 'epoch': 1.28} 34%|███▍ | 3405/10000 [12:25:18<23:33:11, 12.86s/it] 34%|███▍ | 3406/10000 [12:25:31<23:31:52, 12.85s/it] {'loss': 0.0071, 'learning_rate': 3.3010000000000004e-05, 'epoch': 1.28} 34%|███▍ | 3406/10000 [12:25:31<23:31:52, 12.85s/it] 34%|███▍ | 3407/10000 [12:25:44<23:34:11, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.3005e-05, 'epoch': 1.28} 34%|███▍ | 3407/10000 [12:25:44<23:34:11, 12.87s/it] 34%|███▍ | 3408/10000 [12:25:57<23:33:38, 12.87s/it] {'loss': 0.0055, 'learning_rate': 3.3e-05, 
'epoch': 1.28} 34%|███▍ | 3408/10000 [12:25:57<23:33:38, 12.87s/it] 34%|███▍ | 3409/10000 [12:26:09<23:31:20, 12.85s/it] {'loss': 0.0052, 'learning_rate': 3.2995000000000005e-05, 'epoch': 1.28} 34%|███▍ | 3409/10000 [12:26:09<23:31:20, 12.85s/it] 34%|███▍ | 3410/10000 [12:26:22<23:34:50, 12.88s/it] {'loss': 0.0043, 'learning_rate': 3.299e-05, 'epoch': 1.28} 34%|███▍ | 3410/10000 [12:26:22<23:34:50, 12.88s/it] 34%|███▍ | 3411/10000 [12:26:35<23:35:25, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.2985e-05, 'epoch': 1.29} 34%|███▍ | 3411/10000 [12:26:35<23:35:25, 12.89s/it] 34%|███▍ | 3412/10000 [12:26:48<23:33:39, 12.87s/it] {'loss': 0.0064, 'learning_rate': 3.298e-05, 'epoch': 1.29} 34%|███▍ | 3412/10000 [12:26:48<23:33:39, 12.87s/it] 34%|███▍ | 3413/10000 [12:27:01<23:33:07, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.2975e-05, 'epoch': 1.29} 34%|███▍ | 3413/10000 [12:27:01<23:33:07, 12.87s/it] 34%|███▍ | 3414/10000 [12:27:14<23:32:20, 12.87s/it] {'loss': 0.0056, 'learning_rate': 3.297e-05, 'epoch': 1.29} 34%|███▍ | 3414/10000 [12:27:14<23:32:20, 12.87s/it] 34%|███▍ | 3415/10000 [12:27:27<23:32:20, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.2965e-05, 'epoch': 1.29} 34%|███▍ | 3415/10000 [12:27:27<23:32:20, 12.87s/it] 34%|███▍ | 3416/10000 [12:27:40<23:31:32, 12.86s/it] {'loss': 0.005, 'learning_rate': 3.296e-05, 'epoch': 1.29} 34%|███▍ | 3416/10000 [12:27:40<23:31:32, 12.86s/it] 34%|███▍ | 3417/10000 [12:27:52<23:31:28, 12.86s/it] {'loss': 0.005, 'learning_rate': 3.2955000000000006e-05, 'epoch': 1.29} 34%|███▍ | 3417/10000 [12:27:52<23:31:28, 12.86s/it] 34%|███▍ | 3418/10000 [12:28:05<23:30:11, 12.85s/it] {'loss': 0.0045, 'learning_rate': 3.295e-05, 'epoch': 1.29} 34%|███▍ | 3418/10000 [12:28:05<23:30:11, 12.85s/it] 34%|███▍ | 3419/10000 [12:28:18<23:29:09, 12.85s/it] {'loss': 0.0067, 'learning_rate': 3.2945000000000004e-05, 'epoch': 1.29} 34%|███▍ | 3419/10000 [12:28:18<23:29:09, 12.85s/it] 34%|███▍ | 3420/10000 [12:28:31<23:31:54, 12.87s/it] {'loss': 
0.0049, 'learning_rate': 3.2940000000000006e-05, 'epoch': 1.29} 34%|███▍ | 3420/10000 [12:28:31<23:31:54, 12.87s/it] 34%|███▍ | 3421/10000 [12:28:44<23:31:29, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.2935e-05, 'epoch': 1.29} 34%|███▍ | 3421/10000 [12:28:44<23:31:29, 12.87s/it] 34%|███▍ | 3422/10000 [12:28:57<23:29:32, 12.86s/it] {'loss': 0.005, 'learning_rate': 3.293e-05, 'epoch': 1.29} 34%|███▍ | 3422/10000 [12:28:57<23:29:32, 12.86s/it] 34%|███▍ | 3423/10000 [12:29:10<23:31:08, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.2925e-05, 'epoch': 1.29} 34%|███▍ | 3423/10000 [12:29:10<23:31:08, 12.87s/it] 34%|███▍ | 3424/10000 [12:29:23<23:31:47, 12.88s/it] {'loss': 0.0072, 'learning_rate': 3.292e-05, 'epoch': 1.29} 34%|███▍ | 3424/10000 [12:29:23<23:31:47, 12.88s/it] 34%|███▍ | 3425/10000 [12:29:35<23:30:44, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.2915e-05, 'epoch': 1.29} 34%|███▍ | 3425/10000 [12:29:35<23:30:44, 12.87s/it] 34%|███▍ | 3426/10000 [12:29:48<23:30:08, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.291e-05, 'epoch': 1.29} 34%|███▍ | 3426/10000 [12:29:48<23:30:08, 12.87s/it] 34%|███▍ | 3427/10000 [12:30:01<23:31:14, 12.88s/it] {'loss': 0.0044, 'learning_rate': 3.2905000000000004e-05, 'epoch': 1.29} 34%|███▍ | 3427/10000 [12:30:01<23:31:14, 12.88s/it] 34%|███▍ | 3428/10000 [12:30:14<23:31:36, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.29e-05, 'epoch': 1.29} 34%|███▍ | 3428/10000 [12:30:14<23:31:36, 12.89s/it] 34%|███▍ | 3429/10000 [12:30:27<23:31:48, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.2895e-05, 'epoch': 1.29} 34%|███▍ | 3429/10000 [12:30:27<23:31:48, 12.89s/it] 34%|███▍ | 3430/10000 [12:30:40<23:36:14, 12.93s/it] {'loss': 0.0051, 'learning_rate': 3.2890000000000005e-05, 'epoch': 1.29} 34%|███▍ | 3430/10000 [12:30:40<23:36:14, 12.93s/it] 34%|███▍ | 3431/10000 [12:30:53<23:35:48, 12.93s/it] {'loss': 0.0048, 'learning_rate': 3.2885e-05, 'epoch': 1.29} 34%|███▍ | 3431/10000 [12:30:53<23:35:48, 12.93s/it] 34%|███▍ | 3432/10000 
[12:31:06<23:35:10, 12.93s/it] {'loss': 0.0055, 'learning_rate': 3.288e-05, 'epoch': 1.29} 34%|███▍ | 3432/10000 [12:31:06<23:35:10, 12.93s/it] 34%|███▍ | 3433/10000 [12:31:19<23:32:11, 12.90s/it] {'loss': 0.0081, 'learning_rate': 3.2875e-05, 'epoch': 1.29} 34%|███▍ | 3433/10000 [12:31:19<23:32:11, 12.90s/it] 34%|███▍ | 3434/10000 [12:31:32<23:31:03, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.287e-05, 'epoch': 1.29} 34%|███▍ | 3434/10000 [12:31:32<23:31:03, 12.89s/it] 34%|███▍ | 3435/10000 [12:31:45<23:33:44, 12.92s/it] {'loss': 0.0048, 'learning_rate': 3.2865000000000005e-05, 'epoch': 1.29} 34%|███▍ | 3435/10000 [12:31:45<23:33:44, 12.92s/it] 34%|███▍ | 3436/10000 [12:31:57<23:32:42, 12.91s/it] {'loss': 0.0061, 'learning_rate': 3.286e-05, 'epoch': 1.29} 34%|███▍ | 3436/10000 [12:31:57<23:32:42, 12.91s/it] 34%|███▍ | 3437/10000 [12:32:10<23:32:19, 12.91s/it] {'loss': 0.0071, 'learning_rate': 3.2855e-05, 'epoch': 1.3} 34%|███▍ | 3437/10000 [12:32:10<23:32:19, 12.91s/it] 34%|███▍ | 3438/10000 [12:32:23<23:30:56, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.2850000000000006e-05, 'epoch': 1.3} 34%|███▍ | 3438/10000 [12:32:23<23:30:56, 12.90s/it] 34%|███▍ | 3439/10000 [12:32:36<23:32:39, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.2845e-05, 'epoch': 1.3} 34%|███▍ | 3439/10000 [12:32:36<23:32:39, 12.92s/it] 34%|███▍ | 3440/10000 [12:32:49<23:29:16, 12.89s/it] {'loss': 0.0073, 'learning_rate': 3.2840000000000004e-05, 'epoch': 1.3} 34%|███▍ | 3440/10000 [12:32:49<23:29:16, 12.89s/it] 34%|███▍ | 3441/10000 [12:33:02<23:30:46, 12.91s/it] {'loss': 0.0046, 'learning_rate': 3.2835e-05, 'epoch': 1.3} 34%|███▍ | 3441/10000 [12:33:02<23:30:46, 12.91s/it] 34%|███▍ | 3442/10000 [12:33:15<23:28:17, 12.88s/it] {'loss': 0.0075, 'learning_rate': 3.283e-05, 'epoch': 1.3} 34%|███▍ | 3442/10000 [12:33:15<23:28:17, 12.88s/it] 34%|███▍ | 3443/10000 [12:33:28<23:27:53, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.2825e-05, 'epoch': 1.3} 34%|███▍ | 3443/10000 [12:33:28<23:27:53, 
12.88s/it] 34%|███▍ | 3444/10000 [12:33:41<23:26:42, 12.87s/it] {'loss': 0.0116, 'learning_rate': 3.282e-05, 'epoch': 1.3} 34%|███▍ | 3444/10000 [12:33:41<23:26:42, 12.87s/it] 34%|███▍ | 3445/10000 [12:33:53<23:24:24, 12.86s/it] {'loss': 0.0069, 'learning_rate': 3.2815000000000003e-05, 'epoch': 1.3} 34%|███▍ | 3445/10000 [12:33:53<23:24:24, 12.86s/it] 34%|███▍ | 3446/10000 [12:34:06<23:23:00, 12.84s/it] {'loss': 0.0048, 'learning_rate': 3.281e-05, 'epoch': 1.3} 34%|███▍ | 3446/10000 [12:34:06<23:23:00, 12.84s/it] 34%|███▍ | 3447/10000 [12:34:19<23:26:06, 12.87s/it] {'loss': 0.0053, 'learning_rate': 3.2805e-05, 'epoch': 1.3} 34%|███▍ | 3447/10000 [12:34:19<23:26:06, 12.87s/it] 34%|███▍ | 3448/10000 [12:34:32<23:27:34, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.2800000000000004e-05, 'epoch': 1.3} 34%|███▍ | 3448/10000 [12:34:32<23:27:34, 12.89s/it] 34%|███▍ | 3449/10000 [12:34:45<23:31:50, 12.93s/it] {'loss': 0.0055, 'learning_rate': 3.2795e-05, 'epoch': 1.3} 34%|███▍ | 3449/10000 [12:34:45<23:31:50, 12.93s/it] 34%|███▍ | 3450/10000 [12:34:58<23:29:29, 12.91s/it] {'loss': 0.0043, 'learning_rate': 3.279e-05, 'epoch': 1.3} 34%|███▍ | 3450/10000 [12:34:58<23:29:29, 12.91s/it] 35%|███▍ | 3451/10000 [12:35:11<23:30:49, 12.93s/it] {'loss': 0.006, 'learning_rate': 3.2785e-05, 'epoch': 1.3} 35%|███▍ | 3451/10000 [12:35:11<23:30:49, 12.93s/it] 35%|███▍ | 3452/10000 [12:35:24<23:29:54, 12.92s/it] {'loss': 0.0058, 'learning_rate': 3.278e-05, 'epoch': 1.3} 35%|███▍ | 3452/10000 [12:35:24<23:29:54, 12.92s/it] 35%|███▍ | 3453/10000 [12:35:37<23:29:13, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.2775e-05, 'epoch': 1.3} 35%|███▍ | 3453/10000 [12:35:37<23:29:13, 12.91s/it] 35%|███▍ | 3454/10000 [12:35:50<23:32:18, 12.95s/it] {'loss': 0.0065, 'learning_rate': 3.277e-05, 'epoch': 1.3} 35%|███▍ | 3454/10000 [12:35:50<23:32:18, 12.95s/it] 35%|███▍ | 3455/10000 [12:36:03<23:32:26, 12.95s/it] {'loss': 0.0052, 'learning_rate': 3.2765e-05, 'epoch': 1.3} 35%|███▍ | 3455/10000 
[12:36:03<23:32:26, 12.95s/it] 35%|███▍ | 3456/10000 [12:36:16<23:32:06, 12.95s/it] {'loss': 0.005, 'learning_rate': 3.2760000000000005e-05, 'epoch': 1.3} 35%|███▍ | 3456/10000 [12:36:16<23:32:06, 12.95s/it] 35%|███▍ | 3457/10000 [12:36:29<23:30:21, 12.93s/it] {'loss': 0.0063, 'learning_rate': 3.2755e-05, 'epoch': 1.3} 35%|███▍ | 3457/10000 [12:36:29<23:30:21, 12.93s/it] 35%|███▍ | 3458/10000 [12:36:41<23:31:09, 12.94s/it] {'loss': 0.0053, 'learning_rate': 3.275e-05, 'epoch': 1.3} 35%|███▍ | 3458/10000 [12:36:42<23:31:09, 12.94s/it] 35%|███▍ | 3459/10000 [12:36:54<23:32:12, 12.95s/it] {'loss': 0.0052, 'learning_rate': 3.2745000000000006e-05, 'epoch': 1.3} 35%|███▍ | 3459/10000 [12:36:55<23:32:12, 12.95s/it] 35%|███▍ | 3460/10000 [12:37:07<23:31:26, 12.95s/it] {'loss': 0.0047, 'learning_rate': 3.274e-05, 'epoch': 1.3} 35%|███▍ | 3460/10000 [12:37:07<23:31:26, 12.95s/it] 35%|███▍ | 3461/10000 [12:37:20<23:30:55, 12.95s/it] {'loss': 0.0058, 'learning_rate': 3.2735e-05, 'epoch': 1.3} 35%|███▍ | 3461/10000 [12:37:20<23:30:55, 12.95s/it] 35%|███▍ | 3462/10000 [12:37:33<23:31:16, 12.95s/it] {'loss': 0.0048, 'learning_rate': 3.273e-05, 'epoch': 1.3} 35%|███▍ | 3462/10000 [12:37:33<23:31:16, 12.95s/it] 35%|███▍ | 3463/10000 [12:37:46<23:32:27, 12.96s/it] {'loss': 0.0047, 'learning_rate': 3.2725e-05, 'epoch': 1.3} 35%|███▍ | 3463/10000 [12:37:46<23:32:27, 12.96s/it] 35%|███▍ | 3464/10000 [12:37:59<23:31:23, 12.96s/it] {'loss': 0.0055, 'learning_rate': 3.272e-05, 'epoch': 1.31} 35%|███▍ | 3464/10000 [12:37:59<23:31:23, 12.96s/it] 35%|███▍ | 3465/10000 [12:38:12<23:32:01, 12.96s/it] {'loss': 0.0048, 'learning_rate': 3.2715e-05, 'epoch': 1.31} 35%|███▍ | 3465/10000 [12:38:12<23:32:01, 12.96s/it] 35%|███▍ | 3466/10000 [12:38:25<23:31:27, 12.96s/it] {'loss': 0.0056, 'learning_rate': 3.2710000000000004e-05, 'epoch': 1.31} 35%|███▍ | 3466/10000 [12:38:25<23:31:27, 12.96s/it] 35%|███▍ | 3467/10000 [12:38:38<23:28:37, 12.94s/it] {'loss': 0.0069, 'learning_rate': 3.2705e-05, 'epoch': 
1.31} 35%|███▍ | 3467/10000 [12:38:38<23:28:37, 12.94s/it] 35%|███▍ | 3468/10000 [12:38:51<23:30:34, 12.96s/it] {'loss': 0.0047, 'learning_rate': 3.27e-05, 'epoch': 1.31} 35%|███▍ | 3468/10000 [12:38:51<23:30:34, 12.96s/it] 35%|███▍ | 3469/10000 [12:39:04<23:27:56, 12.93s/it] {'loss': 0.0058, 'learning_rate': 3.2695000000000005e-05, 'epoch': 1.31} 35%|███▍ | 3469/10000 [12:39:04<23:27:56, 12.93s/it] 35%|███▍ | 3470/10000 [12:39:17<23:29:54, 12.95s/it] {'loss': 0.004, 'learning_rate': 3.269000000000001e-05, 'epoch': 1.31} 35%|███▍ | 3470/10000 [12:39:17<23:29:54, 12.95s/it] 35%|███▍ | 3471/10000 [12:39:30<23:29:03, 12.95s/it] {'loss': 0.0045, 'learning_rate': 3.2684999999999996e-05, 'epoch': 1.31} 35%|███▍ | 3471/10000 [12:39:30<23:29:03, 12.95s/it] 35%|███▍ | 3472/10000 [12:39:43<23:25:30, 12.92s/it] {'loss': 0.0075, 'learning_rate': 3.268e-05, 'epoch': 1.31} 35%|███▍ | 3472/10000 [12:39:43<23:25:30, 12.92s/it] 35%|███▍ | 3473/10000 [12:39:56<23:27:56, 12.94s/it] {'loss': 0.004, 'learning_rate': 3.2675e-05, 'epoch': 1.31} 35%|███▍ | 3473/10000 [12:39:56<23:27:56, 12.94s/it] 35%|███▍ | 3474/10000 [12:40:09<23:29:23, 12.96s/it] {'loss': 0.006, 'learning_rate': 3.267e-05, 'epoch': 1.31} 35%|███▍ | 3474/10000 [12:40:09<23:29:23, 12.96s/it] 35%|███▍ | 3475/10000 [12:40:22<23:29:33, 12.96s/it] {'loss': 0.0076, 'learning_rate': 3.2665e-05, 'epoch': 1.31} 35%|███▍ | 3475/10000 [12:40:22<23:29:33, 12.96s/it] 35%|███▍ | 3476/10000 [12:40:35<23:26:53, 12.94s/it] {'loss': 0.0064, 'learning_rate': 3.266e-05, 'epoch': 1.31} 35%|███▍ | 3476/10000 [12:40:35<23:26:53, 12.94s/it] 35%|███▍ | 3477/10000 [12:40:47<23:25:55, 12.93s/it] {'loss': 0.0047, 'learning_rate': 3.2655000000000005e-05, 'epoch': 1.31} 35%|███▍ | 3477/10000 [12:40:48<23:25:55, 12.93s/it] 35%|███▍ | 3478/10000 [12:41:00<23:23:21, 12.91s/it] {'loss': 0.0049, 'learning_rate': 3.265e-05, 'epoch': 1.31} 35%|███▍ | 3478/10000 [12:41:00<23:23:21, 12.91s/it] 35%|███▍ | 3479/10000 [12:41:13<23:23:51, 12.92s/it] {'loss': 
0.0049, 'learning_rate': 3.2645e-05, 'epoch': 1.31} 35%|███▍ | 3479/10000 [12:41:13<23:23:51, 12.92s/it] 35%|███▍ | 3480/10000 [12:41:26<23:27:16, 12.95s/it] {'loss': 0.0047, 'learning_rate': 3.2640000000000006e-05, 'epoch': 1.31} 35%|███▍ | 3480/10000 [12:41:26<23:27:16, 12.95s/it] 35%|███▍ | 3481/10000 [12:41:39<23:28:24, 12.96s/it] {'loss': 0.005, 'learning_rate': 3.2635e-05, 'epoch': 1.31} 35%|███▍ | 3481/10000 [12:41:39<23:28:24, 12.96s/it] 35%|███▍ | 3482/10000 [12:41:52<23:28:44, 12.97s/it] {'loss': 0.0048, 'learning_rate': 3.263e-05, 'epoch': 1.31} 35%|███▍ | 3482/10000 [12:41:52<23:28:44, 12.97s/it] 35%|███▍ | 3483/10000 [12:42:05<23:26:41, 12.95s/it] {'loss': 0.0051, 'learning_rate': 3.2625e-05, 'epoch': 1.31} 35%|███▍ | 3483/10000 [12:42:05<23:26:41, 12.95s/it] 35%|███▍ | 3484/10000 [12:42:18<23:27:21, 12.96s/it] {'loss': 0.0068, 'learning_rate': 3.262e-05, 'epoch': 1.31} 35%|███▍ | 3484/10000 [12:42:18<23:27:21, 12.96s/it] 35%|███▍ | 3485/10000 [12:42:31<23:25:28, 12.94s/it] {'loss': 0.0048, 'learning_rate': 3.2615e-05, 'epoch': 1.31} 35%|███▍ | 3485/10000 [12:42:31<23:25:28, 12.94s/it] 35%|███▍ | 3486/10000 [12:42:44<23:25:52, 12.95s/it] {'loss': 0.0049, 'learning_rate': 3.261e-05, 'epoch': 1.31} 35%|███▍ | 3486/10000 [12:42:44<23:25:52, 12.95s/it] 35%|███▍ | 3487/10000 [12:42:57<23:24:55, 12.94s/it] {'loss': 0.0057, 'learning_rate': 3.2605000000000004e-05, 'epoch': 1.31} 35%|███▍ | 3487/10000 [12:42:57<23:24:55, 12.94s/it] 35%|███▍ | 3488/10000 [12:43:10<23:24:57, 12.94s/it] {'loss': 0.0056, 'learning_rate': 3.26e-05, 'epoch': 1.31} 35%|███▍ | 3488/10000 [12:43:10<23:24:57, 12.94s/it] 35%|███▍ | 3489/10000 [12:43:23<23:21:44, 12.92s/it] {'loss': 0.0059, 'learning_rate': 3.2595e-05, 'epoch': 1.31} 35%|███▍ | 3489/10000 [12:43:23<23:21:44, 12.92s/it] 35%|███▍ | 3490/10000 [12:43:36<23:24:12, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.2590000000000005e-05, 'epoch': 1.31} 35%|███▍ | 3490/10000 [12:43:36<23:24:12, 12.94s/it] 35%|███▍ | 3491/10000 
[12:43:49<23:23:59, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.2585e-05, 'epoch': 1.32} 35%|███▍ | 3491/10000 [12:43:49<23:23:59, 12.94s/it] 35%|███▍ | 3492/10000 [12:44:02<23:25:59, 12.96s/it] {'loss': 0.0039, 'learning_rate': 3.2579999999999996e-05, 'epoch': 1.32} 35%|███▍ | 3492/10000 [12:44:02<23:25:59, 12.96s/it] 35%|███▍ | 3493/10000 [12:44:15<23:26:26, 12.97s/it] {'loss': 0.0053, 'learning_rate': 3.2575e-05, 'epoch': 1.32} 35%|███▍ | 3493/10000 [12:44:15<23:26:26, 12.97s/it] 35%|███▍ | 3494/10000 [12:44:28<23:23:20, 12.94s/it] {'loss': 0.0063, 'learning_rate': 3.257e-05, 'epoch': 1.32} 35%|███▍ | 3494/10000 [12:44:28<23:23:20, 12.94s/it] 35%|███▍ | 3495/10000 [12:44:41<23:22:04, 12.93s/it] {'loss': 0.0046, 'learning_rate': 3.2565000000000004e-05, 'epoch': 1.32} 35%|███▍ | 3495/10000 [12:44:41<23:22:04, 12.93s/it] 35%|███▍ | 3496/10000 [12:44:53<23:19:56, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.256e-05, 'epoch': 1.32} 35%|███▍ | 3496/10000 [12:44:53<23:19:56, 12.91s/it] 35%|███▍ | 3497/10000 [12:45:06<23:19:46, 12.92s/it] {'loss': 0.0052, 'learning_rate': 3.2555e-05, 'epoch': 1.32} 35%|███▍ | 3497/10000 [12:45:06<23:19:46, 12.92s/it] 35%|███▍ | 3498/10000 [12:45:19<23:19:37, 12.92s/it] {'loss': 0.0041, 'learning_rate': 3.2550000000000005e-05, 'epoch': 1.32} 35%|███▍ | 3498/10000 [12:45:19<23:19:37, 12.92s/it] 35%|███▍ | 3499/10000 [12:45:32<23:19:09, 12.91s/it] {'loss': 0.0051, 'learning_rate': 3.2545e-05, 'epoch': 1.32} 35%|███▍ | 3499/10000 [12:45:32<23:19:09, 12.91s/it] 35%|███▌ | 3500/10000 [12:45:45<23:19:23, 12.92s/it] {'loss': 0.0055, 'learning_rate': 3.2540000000000004e-05, 'epoch': 1.32} 35%|███▌ | 3500/10000 [12:45:45<23:19:23, 12.92s/it] 35%|███▌ | 3501/10000 [12:45:58<23:21:12, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.2535e-05, 'epoch': 1.32} 35%|███▌ | 3501/10000 [12:45:58<23:21:12, 12.94s/it] 35%|███▌ | 3502/10000 [12:46:11<23:20:18, 12.93s/it] {'loss': 0.0053, 'learning_rate': 3.253e-05, 'epoch': 1.32} 35%|███▌ | 3502/10000 
[12:46:11<23:20:18, 12.93s/it] 35%|███▌ | 3503/10000 [12:46:24<23:20:22, 12.93s/it] {'loss': 0.0047, 'learning_rate': 3.2525e-05, 'epoch': 1.32} 35%|███▌ | 3503/10000 [12:46:24<23:20:22, 12.93s/it] 35%|███▌ | 3504/10000 [12:46:37<23:21:31, 12.95s/it] {'loss': 0.0054, 'learning_rate': 3.252e-05, 'epoch': 1.32} 35%|███▌ | 3504/10000 [12:46:37<23:21:31, 12.95s/it] 35%|███▌ | 3505/10000 [12:46:50<23:23:44, 12.97s/it] {'loss': 0.0044, 'learning_rate': 3.2515e-05, 'epoch': 1.32} 35%|███▌ | 3505/10000 [12:46:50<23:23:44, 12.97s/it] 35%|███▌ | 3506/10000 [12:47:03<23:20:48, 12.94s/it] {'loss': 0.0055, 'learning_rate': 3.251e-05, 'epoch': 1.32} 35%|███▌ | 3506/10000 [12:47:03<23:20:48, 12.94s/it] 35%|███▌ | 3507/10000 [12:47:16<23:21:09, 12.95s/it] {'loss': 0.0053, 'learning_rate': 3.2505e-05, 'epoch': 1.32} 35%|███▌ | 3507/10000 [12:47:16<23:21:09, 12.95s/it] 35%|███▌ | 3508/10000 [12:47:29<23:18:05, 12.92s/it] {'loss': 0.0057, 'learning_rate': 3.2500000000000004e-05, 'epoch': 1.32} 35%|███▌ | 3508/10000 [12:47:29<23:18:05, 12.92s/it] 35%|███▌ | 3509/10000 [12:47:42<23:20:18, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.2495000000000007e-05, 'epoch': 1.32} 35%|███▌ | 3509/10000 [12:47:42<23:20:18, 12.94s/it] 35%|███▌ | 3510/10000 [12:47:54<23:17:37, 12.92s/it] {'loss': 0.0054, 'learning_rate': 3.249e-05, 'epoch': 1.32} 35%|███▌ | 3510/10000 [12:47:54<23:17:37, 12.92s/it] 35%|███▌ | 3511/10000 [12:48:07<23:18:17, 12.93s/it] {'loss': 0.0055, 'learning_rate': 3.2485000000000005e-05, 'epoch': 1.32} 35%|███▌ | 3511/10000 [12:48:07<23:18:17, 12.93s/it] 35%|███▌ | 3512/10000 [12:48:20<23:19:15, 12.94s/it] {'loss': 0.0062, 'learning_rate': 3.248e-05, 'epoch': 1.32} 35%|███▌ | 3512/10000 [12:48:20<23:19:15, 12.94s/it] 35%|███▌ | 3513/10000 [12:48:33<23:17:42, 12.93s/it] {'loss': 0.0062, 'learning_rate': 3.2474999999999997e-05, 'epoch': 1.32} 35%|███▌ | 3513/10000 [12:48:33<23:17:42, 12.93s/it] 35%|███▌ | 3514/10000 [12:48:46<23:18:08, 12.93s/it] {'loss': 0.0052, 'learning_rate': 
3.247e-05, 'epoch': 1.32} 35%|███▌ | 3514/10000 [12:48:46<23:18:08, 12.93s/it] 35%|███▌ | 3515/10000 [12:48:59<23:21:22, 12.97s/it] {'loss': 0.0035, 'learning_rate': 3.2465e-05, 'epoch': 1.32} 35%|███▌ | 3515/10000 [12:48:59<23:21:22, 12.97s/it] 35%|███▌ | 3516/10000 [12:49:12<23:20:32, 12.96s/it] {'loss': 0.0063, 'learning_rate': 3.2460000000000004e-05, 'epoch': 1.32} 35%|███▌ | 3516/10000 [12:49:12<23:20:32, 12.96s/it] 35%|███▌ | 3517/10000 [12:49:25<23:18:40, 12.94s/it] {'loss': 0.0043, 'learning_rate': 3.2455e-05, 'epoch': 1.33} 35%|███▌ | 3517/10000 [12:49:25<23:18:40, 12.94s/it] 35%|███▌ | 3518/10000 [12:49:38<23:19:46, 12.96s/it] {'loss': 0.006, 'learning_rate': 3.245e-05, 'epoch': 1.33} 35%|███▌ | 3518/10000 [12:49:38<23:19:46, 12.96s/it] 35%|███▌ | 3519/10000 [12:49:51<23:20:31, 12.97s/it] {'loss': 0.0073, 'learning_rate': 3.2445000000000005e-05, 'epoch': 1.33} 35%|███▌ | 3519/10000 [12:49:51<23:20:31, 12.97s/it] 35%|███▌ | 3520/10000 [12:50:04<23:21:49, 12.98s/it] {'loss': 0.0059, 'learning_rate': 3.244e-05, 'epoch': 1.33} 35%|███▌ | 3520/10000 [12:50:04<23:21:49, 12.98s/it] 35%|███▌ | 3521/10000 [12:50:17<23:19:08, 12.96s/it] {'loss': 0.0061, 'learning_rate': 3.2435000000000004e-05, 'epoch': 1.33} 35%|███▌ | 3521/10000 [12:50:17<23:19:08, 12.96s/it] 35%|███▌ | 3522/10000 [12:50:30<23:18:52, 12.96s/it] {'loss': 0.0057, 'learning_rate': 3.243e-05, 'epoch': 1.33} 35%|███▌ | 3522/10000 [12:50:30<23:18:52, 12.96s/it] 35%|███▌ | 3523/10000 [12:50:43<23:19:48, 12.97s/it] {'loss': 0.0055, 'learning_rate': 3.2425e-05, 'epoch': 1.33} 35%|███▌ | 3523/10000 [12:50:43<23:19:48, 12.97s/it] 35%|███▌ | 3524/10000 [12:50:56<23:19:41, 12.97s/it] {'loss': 0.0051, 'learning_rate': 3.242e-05, 'epoch': 1.33} 35%|███▌ | 3524/10000 [12:50:56<23:19:41, 12.97s/it] 35%|███▌ | 3525/10000 [12:51:09<23:18:47, 12.96s/it] {'loss': 0.0043, 'learning_rate': 3.2415e-05, 'epoch': 1.33} 35%|███▌ | 3525/10000 [12:51:09<23:18:47, 12.96s/it] 35%|███▌ | 3526/10000 [12:51:22<23:17:25, 12.95s/it] 
{'loss': 0.0039, 'learning_rate': 3.241e-05, 'epoch': 1.33} 35%|███▌ | 3526/10000 [12:51:22<23:17:25, 12.95s/it] 35%|███▌ | 3527/10000 [12:51:35<23:16:30, 12.94s/it] {'loss': 0.0054, 'learning_rate': 3.2405e-05, 'epoch': 1.33} 35%|███▌ | 3527/10000 [12:51:35<23:16:30, 12.94s/it] 35%|███▌ | 3528/10000 [12:51:48<23:16:51, 12.95s/it] {'loss': 0.0065, 'learning_rate': 3.24e-05, 'epoch': 1.33} 35%|███▌ | 3528/10000 [12:51:48<23:16:51, 12.95s/it] 35%|███▌ | 3529/10000 [12:52:01<23:18:05, 12.96s/it] {'loss': 0.0046, 'learning_rate': 3.2395000000000004e-05, 'epoch': 1.33} 35%|███▌ | 3529/10000 [12:52:01<23:18:05, 12.96s/it] 35%|███▌ | 3530/10000 [12:52:14<23:20:51, 12.99s/it] {'loss': 0.007, 'learning_rate': 3.239000000000001e-05, 'epoch': 1.33} 35%|███▌ | 3530/10000 [12:52:14<23:20:51, 12.99s/it] 35%|███▌ | 3531/10000 [12:52:27<23:19:13, 12.98s/it] {'loss': 0.0051, 'learning_rate': 3.2385e-05, 'epoch': 1.33} 35%|███▌ | 3531/10000 [12:52:27<23:19:13, 12.98s/it] 35%|███▌ | 3532/10000 [12:52:40<23:15:51, 12.95s/it] {'loss': 0.005, 'learning_rate': 3.238e-05, 'epoch': 1.33} 35%|███▌ | 3532/10000 [12:52:40<23:15:51, 12.95s/it] 35%|███▌ | 3533/10000 [12:52:52<23:14:48, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.2375e-05, 'epoch': 1.33} 35%|███▌ | 3533/10000 [12:52:52<23:14:48, 12.94s/it] 35%|███▌ | 3534/10000 [12:53:05<23:15:40, 12.95s/it] {'loss': 0.0074, 'learning_rate': 3.2370000000000003e-05, 'epoch': 1.33} 35%|███▌ | 3534/10000 [12:53:05<23:15:40, 12.95s/it] 35%|███▌ | 3535/10000 [12:53:18<23:10:09, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.2365e-05, 'epoch': 1.33} 35%|███▌ | 3535/10000 [12:53:18<23:10:09, 12.90s/it] 35%|███▌ | 3536/10000 [12:53:31<23:09:52, 12.90s/it] {'loss': 0.0052, 'learning_rate': 3.236e-05, 'epoch': 1.33} 35%|███▌ | 3536/10000 [12:53:31<23:09:52, 12.90s/it] 35%|███▌ | 3537/10000 [12:53:44<23:08:29, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.2355000000000004e-05, 'epoch': 1.33} 35%|███▌ | 3537/10000 [12:53:44<23:08:29, 12.89s/it] 35%|███▌ 
| 3538/10000 [12:53:57<23:06:37, 12.87s/it] {'loss': 0.0063, 'learning_rate': 3.235e-05, 'epoch': 1.33} 35%|███▌ | 3538/10000 [12:53:57<23:06:37, 12.87s/it] 35%|███▌ | 3539/10000 [12:54:10<23:05:33, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.2345e-05, 'epoch': 1.33} 35%|███▌ | 3539/10000 [12:54:10<23:05:33, 12.87s/it] 35%|███▌ | 3540/10000 [12:54:23<23:04:48, 12.86s/it] {'loss': 0.0043, 'learning_rate': 3.2340000000000005e-05, 'epoch': 1.33} 35%|███▌ | 3540/10000 [12:54:23<23:04:48, 12.86s/it] 35%|███▌ | 3541/10000 [12:54:35<23:06:25, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.2335e-05, 'epoch': 1.33} 35%|███▌ | 3541/10000 [12:54:35<23:06:25, 12.88s/it] 35%|███▌ | 3542/10000 [12:54:48<23:04:47, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.233e-05, 'epoch': 1.33} 35%|███▌ | 3542/10000 [12:54:48<23:04:47, 12.87s/it] 35%|███▌ | 3543/10000 [12:55:01<23:05:51, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.2325e-05, 'epoch': 1.33} 35%|███▌ | 3543/10000 [12:55:01<23:05:51, 12.88s/it] 35%|███▌ | 3544/10000 [12:55:14<23:06:09, 12.88s/it] {'loss': 0.0057, 'learning_rate': 3.232e-05, 'epoch': 1.34} 35%|███▌ | 3544/10000 [12:55:14<23:06:09, 12.88s/it] 35%|███▌ | 3545/10000 [12:55:27<23:04:41, 12.87s/it] {'loss': 0.0067, 'learning_rate': 3.2315e-05, 'epoch': 1.34} 35%|███▌ | 3545/10000 [12:55:27<23:04:41, 12.87s/it] 35%|███▌ | 3546/10000 [12:55:40<23:02:21, 12.85s/it] {'loss': 0.0066, 'learning_rate': 3.231e-05, 'epoch': 1.34} 35%|███▌ | 3546/10000 [12:55:40<23:02:21, 12.85s/it] 35%|███▌ | 3547/10000 [12:55:53<23:02:10, 12.85s/it] {'loss': 0.0059, 'learning_rate': 3.2305e-05, 'epoch': 1.34} 35%|███▌ | 3547/10000 [12:55:53<23:02:10, 12.85s/it] 35%|███▌ | 3548/10000 [12:56:05<23:02:54, 12.86s/it] {'loss': 0.0054, 'learning_rate': 3.2300000000000006e-05, 'epoch': 1.34} 35%|███▌ | 3548/10000 [12:56:05<23:02:54, 12.86s/it] 35%|███▌ | 3549/10000 [12:56:18<23:03:04, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.2295e-05, 'epoch': 1.34} 35%|███▌ | 3549/10000 
[12:56:18<23:03:04, 12.86s/it] 36%|███▌ | 3550/10000 [12:56:31<23:04:14, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.2290000000000004e-05, 'epoch': 1.34} 36%|███▌ | 3550/10000 [12:56:31<23:04:14, 12.88s/it] 36%|███▌ | 3551/10000 [12:56:44<23:04:24, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.228500000000001e-05, 'epoch': 1.34} 36%|███▌ | 3551/10000 [12:56:44<23:04:24, 12.88s/it] 36%|███▌ | 3552/10000 [12:56:57<23:03:57, 12.88s/it] {'loss': 0.0064, 'learning_rate': 3.2279999999999996e-05, 'epoch': 1.34} 36%|███▌ | 3552/10000 [12:56:57<23:03:57, 12.88s/it] 36%|███▌ | 3553/10000 [12:57:10<23:02:14, 12.86s/it] {'loss': 0.0053, 'learning_rate': 3.2275e-05, 'epoch': 1.34} 36%|███▌ | 3553/10000 [12:57:10<23:02:14, 12.86s/it] 36%|███▌ | 3554/10000 [12:57:23<23:04:45, 12.89s/it] {'loss': 0.0038, 'learning_rate': 3.227e-05, 'epoch': 1.34} 36%|███▌ | 3554/10000 [12:57:23<23:04:45, 12.89s/it] 36%|███▌ | 3555/10000 [12:57:36<23:06:00, 12.90s/it] {'loss': 0.009, 'learning_rate': 3.2265000000000004e-05, 'epoch': 1.34} 36%|███▌ | 3555/10000 [12:57:36<23:06:00, 12.90s/it] 36%|███▌ | 3556/10000 [12:57:49<23:05:47, 12.90s/it] {'loss': 0.0051, 'learning_rate': 3.226e-05, 'epoch': 1.34} 36%|███▌ | 3556/10000 [12:57:49<23:05:47, 12.90s/it] 36%|███▌ | 3557/10000 [12:58:01<23:02:55, 12.88s/it] {'loss': 0.0065, 'learning_rate': 3.2255e-05, 'epoch': 1.34} 36%|███▌ | 3557/10000 [12:58:01<23:02:55, 12.88s/it] 36%|███▌ | 3558/10000 [12:58:14<23:03:58, 12.89s/it] {'loss': 0.0054, 'learning_rate': 3.2250000000000005e-05, 'epoch': 1.34} 36%|███▌ | 3558/10000 [12:58:14<23:03:58, 12.89s/it] 36%|███▌ | 3559/10000 [12:58:27<23:05:17, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.2245e-05, 'epoch': 1.34} 36%|███▌ | 3559/10000 [12:58:27<23:05:17, 12.90s/it] 36%|███▌ | 3560/10000 [12:58:40<23:03:26, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.224e-05, 'epoch': 1.34} 36%|███▌ | 3560/10000 [12:58:40<23:03:26, 12.89s/it] 36%|███▌ | 3561/10000 [12:58:53<23:02:04, 12.88s/it] {'loss': 0.0069, 
'learning_rate': 3.2235000000000006e-05, 'epoch': 1.34} 36%|███▌ | 3561/10000 [12:58:53<23:02:04, 12.88s/it] 36%|███▌ | 3562/10000 [12:59:06<23:01:54, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.223e-05, 'epoch': 1.34} 36%|███▌ | 3562/10000 [12:59:06<23:01:54, 12.88s/it] 36%|███▌ | 3563/10000 [12:59:19<23:03:19, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.2225e-05, 'epoch': 1.34} 36%|███▌ | 3563/10000 [12:59:19<23:03:19, 12.89s/it] 36%|███▌ | 3564/10000 [12:59:32<23:04:49, 12.91s/it] {'loss': 0.006, 'learning_rate': 3.222e-05, 'epoch': 1.34} 36%|███▌ | 3564/10000 [12:59:32<23:04:49, 12.91s/it] 36%|███▌ | 3565/10000 [12:59:45<23:02:48, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.2215e-05, 'epoch': 1.34} 36%|███▌ | 3565/10000 [12:59:45<23:02:48, 12.89s/it] 36%|███▌ | 3566/10000 [12:59:58<23:02:06, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.221e-05, 'epoch': 1.34} 36%|███▌ | 3566/10000 [12:59:58<23:02:06, 12.89s/it] 36%|███▌ | 3567/10000 [13:00:10<23:00:38, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.2205e-05, 'epoch': 1.34} 36%|███▌ | 3567/10000 [13:00:10<23:00:38, 12.88s/it] 36%|███▌ | 3568/10000 [13:00:23<22:58:15, 12.86s/it] {'loss': 0.0049, 'learning_rate': 3.2200000000000003e-05, 'epoch': 1.34} 36%|███▌ | 3568/10000 [13:00:23<22:58:15, 12.86s/it] 36%|███▌ | 3569/10000 [13:00:36<22:58:40, 12.86s/it] {'loss': 0.0052, 'learning_rate': 3.2195000000000006e-05, 'epoch': 1.34} 36%|███▌ | 3569/10000 [13:00:36<22:58:40, 12.86s/it] 36%|███▌ | 3570/10000 [13:00:49<22:59:36, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.219e-05, 'epoch': 1.35} 36%|███▌ | 3570/10000 [13:00:49<22:59:36, 12.87s/it] 36%|███▌ | 3571/10000 [13:01:02<22:59:53, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.2185000000000004e-05, 'epoch': 1.35} 36%|███▌ | 3571/10000 [13:01:02<22:59:53, 12.88s/it] 36%|███▌ | 3572/10000 [13:01:15<23:01:07, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.218e-05, 'epoch': 1.35} 36%|███▌ | 3572/10000 [13:01:15<23:01:07, 12.89s/it] 36%|███▌ | 3573/10000 
[13:01:28<22:59:43, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.2175e-05, 'epoch': 1.35} 36%|███▌ | 3573/10000 [13:01:28<22:59:43, 12.88s/it] 36%|███▌ | 3574/10000 [13:01:40<22:58:25, 12.87s/it] {'loss': 0.0067, 'learning_rate': 3.217e-05, 'epoch': 1.35} 36%|███▌ | 3574/10000 [13:01:40<22:58:25, 12.87s/it] 36%|███▌ | 3575/10000 [13:01:53<22:59:09, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.2165e-05, 'epoch': 1.35} 36%|███▌ | 3575/10000 [13:01:53<22:59:09, 12.88s/it] 36%|███▌ | 3576/10000 [13:02:06<22:58:45, 12.88s/it] {'loss': 0.0054, 'learning_rate': 3.2160000000000004e-05, 'epoch': 1.35} 36%|███▌ | 3576/10000 [13:02:06<22:58:45, 12.88s/it] 36%|███▌ | 3577/10000 [13:02:19<22:59:18, 12.88s/it] {'loss': 0.0041, 'learning_rate': 3.2155e-05, 'epoch': 1.35} 36%|███▌ | 3577/10000 [13:02:19<22:59:18, 12.88s/it] 36%|███▌ | 3578/10000 [13:02:32<22:56:04, 12.86s/it] {'loss': 0.0064, 'learning_rate': 3.215e-05, 'epoch': 1.35} 36%|███▌ | 3578/10000 [13:02:32<22:56:04, 12.86s/it] 36%|███▌ | 3579/10000 [13:02:45<22:54:23, 12.84s/it] {'loss': 0.0048, 'learning_rate': 3.2145000000000005e-05, 'epoch': 1.35} 36%|███▌ | 3579/10000 [13:02:45<22:54:23, 12.84s/it] 36%|███▌ | 3580/10000 [13:02:58<22:57:00, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.214e-05, 'epoch': 1.35} 36%|███▌ | 3580/10000 [13:02:58<22:57:00, 12.87s/it] 36%|███▌ | 3581/10000 [13:03:10<22:55:25, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.2135e-05, 'epoch': 1.35} 36%|███▌ | 3581/10000 [13:03:11<22:55:25, 12.86s/it] 36%|███▌ | 3582/10000 [13:03:23<22:57:01, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.213e-05, 'epoch': 1.35} 36%|███▌ | 3582/10000 [13:03:23<22:57:01, 12.87s/it] 36%|███▌ | 3583/10000 [13:03:36<22:58:47, 12.89s/it] {'loss': 0.0051, 'learning_rate': 3.2125e-05, 'epoch': 1.35} 36%|███▌ | 3583/10000 [13:03:36<22:58:47, 12.89s/it] 36%|███▌ | 3584/10000 [13:03:49<22:57:19, 12.88s/it] {'loss': 0.0046, 'learning_rate': 3.212e-05, 'epoch': 1.35} 36%|███▌ | 3584/10000 [13:03:49<22:57:19, 12.88s/it] 
36%|███▌ | 3585/10000 [13:04:02<22:57:13, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.2115e-05, 'epoch': 1.35} 36%|███▌ | 3585/10000 [13:04:02<22:57:13, 12.88s/it] 36%|███▌ | 3586/10000 [13:04:15<22:57:16, 12.88s/it] {'loss': 0.0054, 'learning_rate': 3.211e-05, 'epoch': 1.35} 36%|███▌ | 3586/10000 [13:04:15<22:57:16, 12.88s/it] 36%|███▌ | 3587/10000 [13:04:28<22:56:45, 12.88s/it] {'loss': 0.0041, 'learning_rate': 3.2105e-05, 'epoch': 1.35} 36%|███▌ | 3587/10000 [13:04:28<22:56:45, 12.88s/it] 36%|███▌ | 3588/10000 [13:04:41<22:55:06, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.21e-05, 'epoch': 1.35} 36%|███▌ | 3588/10000 [13:04:41<22:55:06, 12.87s/it] 36%|███▌ | 3589/10000 [13:04:53<22:52:07, 12.84s/it] {'loss': 0.0054, 'learning_rate': 3.2095000000000004e-05, 'epoch': 1.35} 36%|███▌ | 3589/10000 [13:04:53<22:52:07, 12.84s/it] 36%|███▌ | 3590/10000 [13:05:06<22:53:37, 12.86s/it] {'loss': 0.0055, 'learning_rate': 3.2090000000000006e-05, 'epoch': 1.35} 36%|███▌ | 3590/10000 [13:05:06<22:53:37, 12.86s/it] 36%|███▌ | 3591/10000 [13:05:19<22:52:27, 12.85s/it] {'loss': 0.0074, 'learning_rate': 3.2085e-05, 'epoch': 1.35} 36%|███▌ | 3591/10000 [13:05:19<22:52:27, 12.85s/it] 36%|███▌ | 3592/10000 [13:05:32<22:54:39, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.208e-05, 'epoch': 1.35} 36%|███▌ | 3592/10000 [13:05:32<22:54:39, 12.87s/it] 36%|███▌ | 3593/10000 [13:05:45<22:56:55, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.2075e-05, 'epoch': 1.35} 36%|███▌ | 3593/10000 [13:05:45<22:56:55, 12.89s/it] 36%|███▌ | 3594/10000 [13:05:58<22:55:03, 12.88s/it] {'loss': 0.0049, 'learning_rate': 3.207e-05, 'epoch': 1.35} 36%|███▌ | 3594/10000 [13:05:58<22:55:03, 12.88s/it] 36%|███▌ | 3595/10000 [13:06:11<22:54:37, 12.88s/it] {'loss': 0.0042, 'learning_rate': 3.2065e-05, 'epoch': 1.35} 36%|███▌ | 3595/10000 [13:06:11<22:54:37, 12.88s/it] 36%|███▌ | 3596/10000 [13:06:24<22:55:00, 12.88s/it] {'loss': 0.0055, 'learning_rate': 3.206e-05, 'epoch': 1.35} 36%|███▌ | 3596/10000 
[13:06:24<22:55:00, 12.88s/it] 36%|███▌ | 3597/10000 [13:06:37<22:55:05, 12.89s/it] {'loss': 0.0044, 'learning_rate': 3.2055000000000004e-05, 'epoch': 1.36} 36%|███▌ | 3597/10000 [13:06:37<22:55:05, 12.89s/it] 36%|███▌ | 3598/10000 [13:06:49<22:53:11, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.205e-05, 'epoch': 1.36} 36%|███▌ | 3598/10000 [13:06:49<22:53:11, 12.87s/it] 36%|███▌ | 3599/10000 [13:07:02<22:53:22, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.2045e-05, 'epoch': 1.36} 36%|███▌ | 3599/10000 [13:07:02<22:53:22, 12.87s/it] 36%|███▌ | 3600/10000 [13:07:15<22:52:30, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.2040000000000005e-05, 'epoch': 1.36} 36%|███▌ | 3600/10000 [13:07:15<22:52:30, 12.87s/it] 36%|███▌ | 3601/10000 [13:07:28<22:51:39, 12.86s/it] {'loss': 0.0055, 'learning_rate': 3.2035e-05, 'epoch': 1.36} 36%|███▌ | 3601/10000 [13:07:28<22:51:39, 12.86s/it] 36%|███▌ | 3602/10000 [13:07:41<22:51:24, 12.86s/it] {'loss': 0.0057, 'learning_rate': 3.2029999999999997e-05, 'epoch': 1.36} 36%|███▌ | 3602/10000 [13:07:41<22:51:24, 12.86s/it] 36%|███▌ | 3603/10000 [13:07:54<22:50:32, 12.85s/it] {'loss': 0.0056, 'learning_rate': 3.2025e-05, 'epoch': 1.36} 36%|███▌ | 3603/10000 [13:07:54<22:50:32, 12.85s/it] 36%|███▌ | 3604/10000 [13:08:07<22:53:49, 12.89s/it] {'loss': 0.0054, 'learning_rate': 3.202e-05, 'epoch': 1.36} 36%|███▌ | 3604/10000 [13:08:07<22:53:49, 12.89s/it] 36%|███▌ | 3605/10000 [13:08:20<22:53:59, 12.89s/it] {'loss': 0.0054, 'learning_rate': 3.2015e-05, 'epoch': 1.36} 36%|███▌ | 3605/10000 [13:08:20<22:53:59, 12.89s/it] 36%|███▌ | 3606/10000 [13:08:32<22:50:44, 12.86s/it] {'loss': 0.0065, 'learning_rate': 3.201e-05, 'epoch': 1.36} 36%|███▌ | 3606/10000 [13:08:32<22:50:44, 12.86s/it] 36%|███▌ | 3607/10000 [13:08:45<22:49:57, 12.86s/it] {'loss': 0.0048, 'learning_rate': 3.2005e-05, 'epoch': 1.36} 36%|███▌ | 3607/10000 [13:08:45<22:49:57, 12.86s/it] 36%|███▌ | 3608/10000 [13:08:58<22:49:36, 12.86s/it] {'loss': 0.0048, 'learning_rate': 
3.2000000000000005e-05, 'epoch': 1.36} 36%|███▌ | 3608/10000 [13:08:58<22:49:36, 12.86s/it] 36%|███▌ | 3609/10000 [13:09:11<22:49:04, 12.85s/it] {'loss': 0.0052, 'learning_rate': 3.1995e-05, 'epoch': 1.36} 36%|███▌ | 3609/10000 [13:09:11<22:49:04, 12.85s/it] 36%|███▌ | 3610/10000 [13:09:24<22:50:06, 12.86s/it] {'loss': 0.0051, 'learning_rate': 3.1990000000000004e-05, 'epoch': 1.36} 36%|███▌ | 3610/10000 [13:09:24<22:50:06, 12.86s/it] 36%|███▌ | 3611/10000 [13:09:37<22:51:06, 12.88s/it] {'loss': 0.004, 'learning_rate': 3.1985000000000006e-05, 'epoch': 1.36} 36%|███▌ | 3611/10000 [13:09:37<22:51:06, 12.88s/it] 36%|███▌ | 3612/10000 [13:09:49<22:49:30, 12.86s/it] {'loss': 0.0059, 'learning_rate': 3.198e-05, 'epoch': 1.36} 36%|███▌ | 3612/10000 [13:09:50<22:49:30, 12.86s/it] 36%|███▌ | 3613/10000 [13:10:02<22:49:03, 12.86s/it] {'loss': 0.0046, 'learning_rate': 3.1975e-05, 'epoch': 1.36} 36%|███▌ | 3613/10000 [13:10:02<22:49:03, 12.86s/it] 36%|███▌ | 3614/10000 [13:10:15<22:49:13, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.197e-05, 'epoch': 1.36} 36%|███▌ | 3614/10000 [13:10:15<22:49:13, 12.86s/it] 36%|███▌ | 3615/10000 [13:10:28<22:47:56, 12.85s/it] {'loss': 0.0052, 'learning_rate': 3.1965e-05, 'epoch': 1.36} 36%|███▌ | 3615/10000 [13:10:28<22:47:56, 12.85s/it] 36%|███▌ | 3616/10000 [13:10:41<22:46:30, 12.84s/it] {'loss': 0.0049, 'learning_rate': 3.196e-05, 'epoch': 1.36} 36%|███▌ | 3616/10000 [13:10:41<22:46:30, 12.84s/it] 36%|███▌ | 3617/10000 [13:10:54<22:46:34, 12.85s/it] {'loss': 0.0057, 'learning_rate': 3.1955e-05, 'epoch': 1.36} 36%|███▌ | 3617/10000 [13:10:54<22:46:34, 12.85s/it] 36%|███▌ | 3618/10000 [13:11:07<22:47:19, 12.85s/it] {'loss': 0.0058, 'learning_rate': 3.1950000000000004e-05, 'epoch': 1.36} 36%|███▌ | 3618/10000 [13:11:07<22:47:19, 12.85s/it] 36%|███▌ | 3619/10000 [13:11:20<22:49:28, 12.88s/it] {'loss': 0.0055, 'learning_rate': 3.1945e-05, 'epoch': 1.36} 36%|███▌ | 3619/10000 [13:11:20<22:49:28, 12.88s/it] 36%|███▌ | 3620/10000 
[13:11:32<22:50:12, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.194e-05, 'epoch': 1.36} 36%|███▌ | 3620/10000 [13:11:32<22:50:12, 12.89s/it] 36%|███▌ | 3621/10000 [13:11:45<22:49:55, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.1935000000000005e-05, 'epoch': 1.36} 36%|███▌ | 3621/10000 [13:11:45<22:49:55, 12.89s/it] 36%|███▌ | 3622/10000 [13:11:58<22:51:12, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.193e-05, 'epoch': 1.36} 36%|███▌ | 3622/10000 [13:11:58<22:51:12, 12.90s/it] 36%|███▌ | 3623/10000 [13:12:11<22:55:11, 12.94s/it] {'loss': 0.005, 'learning_rate': 3.1925e-05, 'epoch': 1.37} 36%|███▌ | 3623/10000 [13:12:11<22:55:11, 12.94s/it] 36%|███▌ | 3624/10000 [13:12:24<22:56:41, 12.96s/it] {'loss': 0.0044, 'learning_rate': 3.192e-05, 'epoch': 1.37} 36%|███▌ | 3624/10000 [13:12:24<22:56:41, 12.96s/it] 36%|███▋ | 3625/10000 [13:12:37<22:59:49, 12.99s/it] {'loss': 0.0053, 'learning_rate': 3.1915e-05, 'epoch': 1.37} 36%|███▋ | 3625/10000 [13:12:37<22:59:49, 12.99s/it] 36%|███▋ | 3626/10000 [13:12:50<22:58:37, 12.98s/it] {'loss': 0.0054, 'learning_rate': 3.191e-05, 'epoch': 1.37} 36%|███▋ | 3626/10000 [13:12:50<22:58:37, 12.98s/it] 36%|███▋ | 3627/10000 [13:13:03<22:59:54, 12.99s/it] {'loss': 0.004, 'learning_rate': 3.1905e-05, 'epoch': 1.37} 36%|███▋ | 3627/10000 [13:13:03<22:59:54, 12.99s/it] 36%|███▋ | 3628/10000 [13:13:16<22:57:17, 12.97s/it] {'loss': 0.0066, 'learning_rate': 3.19e-05, 'epoch': 1.37} 36%|███▋ | 3628/10000 [13:13:16<22:57:17, 12.97s/it] 36%|███▋ | 3629/10000 [13:13:29<22:55:15, 12.95s/it] {'loss': 0.0055, 'learning_rate': 3.1895000000000005e-05, 'epoch': 1.37} 36%|███▋ | 3629/10000 [13:13:29<22:55:15, 12.95s/it] 36%|███▋ | 3630/10000 [13:13:42<22:53:54, 12.94s/it] {'loss': 0.0059, 'learning_rate': 3.189e-05, 'epoch': 1.37} 36%|███▋ | 3630/10000 [13:13:42<22:53:54, 12.94s/it] 36%|███▋ | 3631/10000 [13:13:55<22:54:31, 12.95s/it] {'loss': 0.0075, 'learning_rate': 3.1885000000000004e-05, 'epoch': 1.37} 36%|███▋ | 3631/10000 [13:13:55<22:54:31, 
12.95s/it] 36%|███▋ | 3632/10000 [13:14:08<22:54:33, 12.95s/it] {'loss': 0.0055, 'learning_rate': 3.188e-05, 'epoch': 1.37} 36%|███▋ | 3632/10000 [13:14:08<22:54:33, 12.95s/it] 36%|███▋ | 3633/10000 [13:14:21<22:55:38, 12.96s/it] {'loss': 0.0054, 'learning_rate': 3.1875e-05, 'epoch': 1.37} 36%|███▋ | 3633/10000 [13:14:21<22:55:38, 12.96s/it] 36%|███▋ | 3634/10000 [13:14:34<22:55:41, 12.97s/it] {'loss': 0.0054, 'learning_rate': 3.187e-05, 'epoch': 1.37} 36%|███▋ | 3634/10000 [13:14:34<22:55:41, 12.97s/it] 36%|███▋ | 3635/10000 [13:14:47<22:54:42, 12.96s/it] {'loss': 0.0046, 'learning_rate': 3.1865e-05, 'epoch': 1.37} 36%|███▋ | 3635/10000 [13:14:47<22:54:42, 12.96s/it] 36%|███▋ | 3636/10000 [13:15:00<22:52:45, 12.94s/it] {'loss': 0.0057, 'learning_rate': 3.186e-05, 'epoch': 1.37} 36%|███▋ | 3636/10000 [13:15:00<22:52:45, 12.94s/it] 36%|███▋ | 3637/10000 [13:15:13<22:51:48, 12.94s/it] {'loss': 0.0041, 'learning_rate': 3.1855e-05, 'epoch': 1.37} 36%|███▋ | 3637/10000 [13:15:13<22:51:48, 12.94s/it] 36%|███▋ | 3638/10000 [13:15:26<22:51:01, 12.93s/it] {'loss': 0.0038, 'learning_rate': 3.185e-05, 'epoch': 1.37} 36%|███▋ | 3638/10000 [13:15:26<22:51:01, 12.93s/it] 36%|███▋ | 3639/10000 [13:15:39<22:52:10, 12.94s/it] {'loss': 0.0057, 'learning_rate': 3.1845000000000004e-05, 'epoch': 1.37} 36%|███▋ | 3639/10000 [13:15:39<22:52:10, 12.94s/it] 36%|███▋ | 3640/10000 [13:15:52<22:52:45, 12.95s/it] {'loss': 0.0051, 'learning_rate': 3.184e-05, 'epoch': 1.37} 36%|███▋ | 3640/10000 [13:15:52<22:52:45, 12.95s/it] 36%|███▋ | 3641/10000 [13:16:05<22:53:07, 12.96s/it] {'loss': 0.0049, 'learning_rate': 3.1835e-05, 'epoch': 1.37} 36%|███▋ | 3641/10000 [13:16:05<22:53:07, 12.96s/it] 36%|███▋ | 3642/10000 [13:16:17<22:51:31, 12.94s/it] {'loss': 0.0065, 'learning_rate': 3.1830000000000005e-05, 'epoch': 1.37} 36%|███▋ | 3642/10000 [13:16:18<22:51:31, 12.94s/it] 36%|███▋ | 3643/10000 [13:16:30<22:53:11, 12.96s/it] {'loss': 0.0042, 'learning_rate': 3.1825e-05, 'epoch': 1.37} 36%|███▋ | 
3643/10000 [13:16:30<22:53:11, 12.96s/it] 36%|███▋ | 3644/10000 [13:16:43<22:52:32, 12.96s/it] {'loss': 0.0064, 'learning_rate': 3.182e-05, 'epoch': 1.37} 36%|███▋ | 3644/10000 [13:16:43<22:52:32, 12.96s/it] 36%|███▋ | 3645/10000 [13:16:56<22:53:21, 12.97s/it] {'loss': 0.0063, 'learning_rate': 3.1815e-05, 'epoch': 1.37} 36%|███▋ | 3645/10000 [13:16:56<22:53:21, 12.97s/it] 36%|███▋ | 3646/10000 [13:17:09<22:51:24, 12.95s/it] {'loss': 0.0052, 'learning_rate': 3.181e-05, 'epoch': 1.37} 36%|███▋ | 3646/10000 [13:17:09<22:51:24, 12.95s/it] 36%|███▋ | 3647/10000 [13:17:22<22:53:24, 12.97s/it] {'loss': 0.0061, 'learning_rate': 3.1805000000000005e-05, 'epoch': 1.37} 36%|███▋ | 3647/10000 [13:17:22<22:53:24, 12.97s/it] 36%|███▋ | 3648/10000 [13:17:35<22:51:19, 12.95s/it] {'loss': 0.0047, 'learning_rate': 3.18e-05, 'epoch': 1.37} 36%|███▋ | 3648/10000 [13:17:35<22:51:19, 12.95s/it] 36%|███▋ | 3649/10000 [13:17:48<22:50:54, 12.95s/it] {'loss': 0.0048, 'learning_rate': 3.1795e-05, 'epoch': 1.37} 36%|███▋ | 3649/10000 [13:17:48<22:50:54, 12.95s/it] 36%|███▋ | 3650/10000 [13:18:01<22:50:52, 12.95s/it] {'loss': 0.0053, 'learning_rate': 3.1790000000000006e-05, 'epoch': 1.38} 36%|███▋ | 3650/10000 [13:18:01<22:50:52, 12.95s/it] 37%|███▋ | 3651/10000 [13:18:14<22:51:07, 12.96s/it] {'loss': 0.005, 'learning_rate': 3.1785e-05, 'epoch': 1.38} 37%|███▋ | 3651/10000 [13:18:14<22:51:07, 12.96s/it] 37%|███▋ | 3652/10000 [13:18:27<22:47:43, 12.93s/it] {'loss': 0.0052, 'learning_rate': 3.1780000000000004e-05, 'epoch': 1.38} 37%|███▋ | 3652/10000 [13:18:27<22:47:43, 12.93s/it] 37%|███▋ | 3653/10000 [13:18:40<22:47:21, 12.93s/it] {'loss': 0.0047, 'learning_rate': 3.1775e-05, 'epoch': 1.38} 37%|███▋ | 3653/10000 [13:18:40<22:47:21, 12.93s/it] 37%|███▋ | 3654/10000 [13:18:53<22:45:53, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.177e-05, 'epoch': 1.38} 37%|███▋ | 3654/10000 [13:18:53<22:45:53, 12.91s/it] 37%|███▋ | 3655/10000 [13:19:06<22:47:17, 12.93s/it] {'loss': 0.0053, 'learning_rate': 
3.1765e-05, 'epoch': 1.38} 37%|███▋ | 3655/10000 [13:19:06<22:47:17, 12.93s/it] 37%|███▋ | 3656/10000 [13:19:19<22:47:34, 12.93s/it] {'loss': 0.0054, 'learning_rate': 3.176e-05, 'epoch': 1.38} 37%|███▋ | 3656/10000 [13:19:19<22:47:34, 12.93s/it] 37%|███▋ | 3657/10000 [13:19:32<22:47:23, 12.93s/it] {'loss': 0.0062, 'learning_rate': 3.1755000000000003e-05, 'epoch': 1.38} 37%|███▋ | 3657/10000 [13:19:32<22:47:23, 12.93s/it] 37%|███▋ | 3658/10000 [13:19:44<22:44:47, 12.91s/it] {'loss': 0.0051, 'learning_rate': 3.175e-05, 'epoch': 1.38} 37%|███▋ | 3658/10000 [13:19:45<22:44:47, 12.91s/it] 37%|███▋ | 3659/10000 [13:19:57<22:42:56, 12.90s/it] {'loss': 0.004, 'learning_rate': 3.1745e-05, 'epoch': 1.38} 37%|███▋ | 3659/10000 [13:19:57<22:42:56, 12.90s/it] 37%|███▋ | 3660/10000 [13:20:10<22:40:12, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.1740000000000004e-05, 'epoch': 1.38} 37%|███▋ | 3660/10000 [13:20:10<22:40:12, 12.87s/it] 37%|███▋ | 3661/10000 [13:20:23<22:41:58, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.1735e-05, 'epoch': 1.38} 37%|███▋ | 3661/10000 [13:20:23<22:41:58, 12.89s/it] 37%|███▋ | 3662/10000 [13:20:36<22:41:56, 12.89s/it] {'loss': 0.0046, 'learning_rate': 3.173e-05, 'epoch': 1.38} 37%|███▋ | 3662/10000 [13:20:36<22:41:56, 12.89s/it] 37%|███▋ | 3663/10000 [13:20:49<22:41:32, 12.89s/it] {'loss': 0.0075, 'learning_rate': 3.1725e-05, 'epoch': 1.38} 37%|███▋ | 3663/10000 [13:20:49<22:41:32, 12.89s/it] 37%|███▋ | 3664/10000 [13:21:02<22:39:18, 12.87s/it] {'loss': 0.005, 'learning_rate': 3.172e-05, 'epoch': 1.38} 37%|███▋ | 3664/10000 [13:21:02<22:39:18, 12.87s/it] 37%|███▋ | 3665/10000 [13:21:15<22:41:30, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.1715e-05, 'epoch': 1.38} 37%|███▋ | 3665/10000 [13:21:15<22:41:30, 12.90s/it] 37%|███▋ | 3666/10000 [13:21:27<22:38:37, 12.87s/it] {'loss': 0.0085, 'learning_rate': 3.171e-05, 'epoch': 1.38} 37%|███▋ | 3666/10000 [13:21:28<22:38:37, 12.87s/it] 37%|███▋ | 3667/10000 [13:21:40<22:37:25, 12.86s/it] {'loss': 
0.0052, 'learning_rate': 3.1705e-05, 'epoch': 1.38} 37%|███▋ | 3667/10000 [13:21:40<22:37:25, 12.86s/it] 37%|███▋ | 3668/10000 [13:21:53<22:38:31, 12.87s/it] {'loss': 0.0043, 'learning_rate': 3.1700000000000005e-05, 'epoch': 1.38} 37%|███▋ | 3668/10000 [13:21:53<22:38:31, 12.87s/it] 37%|███▋ | 3669/10000 [13:22:06<22:39:38, 12.89s/it] {'loss': 0.005, 'learning_rate': 3.1695e-05, 'epoch': 1.38} 37%|███▋ | 3669/10000 [13:22:06<22:39:38, 12.89s/it] 37%|███▋ | 3670/10000 [13:22:19<22:40:55, 12.90s/it] {'loss': 0.0049, 'learning_rate': 3.169e-05, 'epoch': 1.38} 37%|███▋ | 3670/10000 [13:22:19<22:40:55, 12.90s/it] 37%|███▋ | 3671/10000 [13:22:32<22:40:45, 12.90s/it] {'loss': 0.0045, 'learning_rate': 3.1685000000000006e-05, 'epoch': 1.38} 37%|███▋ | 3671/10000 [13:22:32<22:40:45, 12.90s/it] 37%|███▋ | 3672/10000 [13:22:45<22:41:19, 12.91s/it] {'loss': 0.0068, 'learning_rate': 3.168e-05, 'epoch': 1.38} 37%|███▋ | 3672/10000 [13:22:45<22:41:19, 12.91s/it] 37%|███▋ | 3673/10000 [13:22:58<22:40:03, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.1675e-05, 'epoch': 1.38} 37%|███▋ | 3673/10000 [13:22:58<22:40:03, 12.90s/it] 37%|███▋ | 3674/10000 [13:23:11<22:39:13, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.167e-05, 'epoch': 1.38} 37%|███▋ | 3674/10000 [13:23:11<22:39:13, 12.89s/it] 37%|███▋ | 3675/10000 [13:23:24<22:40:42, 12.91s/it] {'loss': 0.0042, 'learning_rate': 3.1665e-05, 'epoch': 1.38} 37%|███▋ | 3675/10000 [13:23:24<22:40:42, 12.91s/it] 37%|███▋ | 3676/10000 [13:23:36<22:39:28, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.166e-05, 'epoch': 1.39} 37%|███▋ | 3676/10000 [13:23:36<22:39:28, 12.90s/it] 37%|███▋ | 3677/10000 [13:23:49<22:40:05, 12.91s/it] {'loss': 0.0066, 'learning_rate': 3.1655e-05, 'epoch': 1.39} 37%|███▋ | 3677/10000 [13:23:49<22:40:05, 12.91s/it] 37%|███▋ | 3678/10000 [13:24:02<22:39:55, 12.91s/it] {'loss': 0.0052, 'learning_rate': 3.1650000000000004e-05, 'epoch': 1.39} 37%|███▋ | 3678/10000 [13:24:02<22:39:55, 12.91s/it] 37%|███▋ | 3679/10000 
[13:24:15<22:39:14, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.1645e-05, 'epoch': 1.39} 37%|███▋ | 3679/10000 [13:24:15<22:39:14, 12.90s/it] 37%|███▋ | 3680/10000 [13:24:28<22:37:43, 12.89s/it] {'loss': 0.0053, 'learning_rate': 3.164e-05, 'epoch': 1.39} 37%|███▋ | 3680/10000 [13:24:28<22:37:43, 12.89s/it] 37%|███▋ | 3681/10000 [13:24:41<22:35:09, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.1635000000000005e-05, 'epoch': 1.39} 37%|███▋ | 3681/10000 [13:24:41<22:35:09, 12.87s/it] 37%|███▋ | 3682/10000 [13:24:54<22:36:43, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.163000000000001e-05, 'epoch': 1.39} 37%|███▋ | 3682/10000 [13:24:54<22:36:43, 12.88s/it] 37%|███▋ | 3683/10000 [13:25:07<22:36:26, 12.88s/it] {'loss': 0.0046, 'learning_rate': 3.1624999999999996e-05, 'epoch': 1.39} 37%|███▋ | 3683/10000 [13:25:07<22:36:26, 12.88s/it] 37%|███▋ | 3684/10000 [13:25:20<22:35:48, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.162e-05, 'epoch': 1.39} 37%|███▋ | 3684/10000 [13:25:20<22:35:48, 12.88s/it] 37%|███▋ | 3685/10000 [13:25:32<22:36:20, 12.89s/it] {'loss': 0.0047, 'learning_rate': 3.1615e-05, 'epoch': 1.39} 37%|███▋ | 3685/10000 [13:25:32<22:36:20, 12.89s/it] 37%|███▋ | 3686/10000 [13:25:45<22:37:56, 12.90s/it] {'loss': 0.0054, 'learning_rate': 3.1610000000000004e-05, 'epoch': 1.39} 37%|███▋ | 3686/10000 [13:25:45<22:37:56, 12.90s/it] 37%|███▋ | 3687/10000 [13:25:58<22:37:34, 12.90s/it] {'loss': 0.0047, 'learning_rate': 3.1605e-05, 'epoch': 1.39} 37%|███▋ | 3687/10000 [13:25:58<22:37:34, 12.90s/it] 37%|███▋ | 3688/10000 [13:26:11<22:38:32, 12.91s/it] {'loss': 0.0056, 'learning_rate': 3.16e-05, 'epoch': 1.39} 37%|███▋ | 3688/10000 [13:26:11<22:38:32, 12.91s/it] 37%|███▋ | 3689/10000 [13:26:24<22:35:54, 12.89s/it] {'loss': 0.0044, 'learning_rate': 3.1595000000000005e-05, 'epoch': 1.39} 37%|███▋ | 3689/10000 [13:26:24<22:35:54, 12.89s/it] 37%|███▋ | 3690/10000 [13:26:37<22:34:40, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.159e-05, 'epoch': 1.39} 37%|███▋ | 
3690/10000 [13:26:37<22:34:40, 12.88s/it] 37%|███▋ | 3691/10000 [13:26:50<22:34:43, 12.88s/it] {'loss': 0.0046, 'learning_rate': 3.1585e-05, 'epoch': 1.39} 37%|███▋ | 3691/10000 [13:26:50<22:34:43, 12.88s/it] 37%|███▋ | 3692/10000 [13:27:03<22:33:57, 12.88s/it] {'loss': 0.0057, 'learning_rate': 3.1580000000000006e-05, 'epoch': 1.39} 37%|███▋ | 3692/10000 [13:27:03<22:33:57, 12.88s/it] 37%|███▋ | 3693/10000 [13:27:16<22:33:05, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.1575e-05, 'epoch': 1.39} 37%|███▋ | 3693/10000 [13:27:16<22:33:05, 12.87s/it] 37%|███▋ | 3694/10000 [13:27:28<22:31:07, 12.86s/it] {'loss': 0.0066, 'learning_rate': 3.157e-05, 'epoch': 1.39} 37%|███▋ | 3694/10000 [13:27:28<22:31:07, 12.86s/it] 37%|███▋ | 3695/10000 [13:27:41<22:32:57, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.1565e-05, 'epoch': 1.39} 37%|███▋ | 3695/10000 [13:27:41<22:32:57, 12.88s/it] 37%|███▋ | 3696/10000 [13:27:54<22:33:45, 12.88s/it] {'loss': 0.0055, 'learning_rate': 3.156e-05, 'epoch': 1.39} 37%|███▋ | 3696/10000 [13:27:54<22:33:45, 12.88s/it] 37%|███▋ | 3697/10000 [13:28:07<22:38:14, 12.93s/it] {'loss': 0.0044, 'learning_rate': 3.1555e-05, 'epoch': 1.39} 37%|███▋ | 3697/10000 [13:28:07<22:38:14, 12.93s/it] 37%|███▋ | 3698/10000 [13:28:20<22:38:34, 12.93s/it] {'loss': 0.005, 'learning_rate': 3.155e-05, 'epoch': 1.39} 37%|███▋ | 3698/10000 [13:28:20<22:38:34, 12.93s/it] 37%|███▋ | 3699/10000 [13:28:33<22:41:26, 12.96s/it] {'loss': 0.006, 'learning_rate': 3.1545000000000004e-05, 'epoch': 1.39} 37%|███▋ | 3699/10000 [13:28:33<22:41:26, 12.96s/it] 37%|███▋ | 3700/10000 [13:28:46<22:40:06, 12.95s/it] {'loss': 0.0048, 'learning_rate': 3.154e-05, 'epoch': 1.39} 37%|███▋ | 3700/10000 [13:28:46<22:40:06, 12.95s/it] 37%|███▋ | 3701/10000 [13:28:59<22:37:31, 12.93s/it] {'loss': 0.0064, 'learning_rate': 3.1535e-05, 'epoch': 1.39} 37%|███▋ | 3701/10000 [13:28:59<22:37:31, 12.93s/it] 37%|███▋ | 3702/10000 [13:29:12<22:39:02, 12.95s/it] {'loss': 0.0053, 'learning_rate': 
3.1530000000000005e-05, 'epoch': 1.39} 37%|███▋ | 3702/10000 [13:29:12<22:39:02, 12.95s/it] 37%|███▋ | 3703/10000 [13:29:25<22:39:13, 12.95s/it] {'loss': 0.0049, 'learning_rate': 3.1525e-05, 'epoch': 1.4} 37%|███▋ | 3703/10000 [13:29:25<22:39:13, 12.95s/it] 37%|███▋ | 3704/10000 [13:29:38<22:38:18, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.1519999999999996e-05, 'epoch': 1.4} 37%|███▋ | 3704/10000 [13:29:38<22:38:18, 12.94s/it] 37%|███▋ | 3705/10000 [13:29:51<22:39:12, 12.96s/it] {'loss': 0.0041, 'learning_rate': 3.1515e-05, 'epoch': 1.4} 37%|███▋ | 3705/10000 [13:29:51<22:39:12, 12.96s/it] 37%|███▋ | 3706/10000 [13:30:04<22:38:00, 12.95s/it] {'loss': 0.0054, 'learning_rate': 3.151e-05, 'epoch': 1.4} 37%|███▋ | 3706/10000 [13:30:04<22:38:00, 12.95s/it] 37%|███▋ | 3707/10000 [13:30:17<22:37:01, 12.94s/it] {'loss': 0.0044, 'learning_rate': 3.1505000000000004e-05, 'epoch': 1.4} 37%|███▋ | 3707/10000 [13:30:17<22:37:01, 12.94s/it] 37%|███▋ | 3708/10000 [13:30:30<22:37:07, 12.94s/it] {'loss': 0.0056, 'learning_rate': 3.15e-05, 'epoch': 1.4} 37%|███▋ | 3708/10000 [13:30:30<22:37:07, 12.94s/it] 37%|███▋ | 3709/10000 [13:30:42<22:33:19, 12.91s/it] {'loss': 0.007, 'learning_rate': 3.1495e-05, 'epoch': 1.4} 37%|███▋ | 3709/10000 [13:30:42<22:33:19, 12.91s/it] 37%|███▋ | 3710/10000 [13:30:55<22:33:11, 12.91s/it] {'loss': 0.0046, 'learning_rate': 3.1490000000000005e-05, 'epoch': 1.4} 37%|███▋ | 3710/10000 [13:30:55<22:33:11, 12.91s/it] 37%|███▋ | 3711/10000 [13:31:08<22:29:11, 12.87s/it] {'loss': 0.0048, 'learning_rate': 3.1485e-05, 'epoch': 1.4} 37%|███▋ | 3711/10000 [13:31:08<22:29:11, 12.87s/it] 37%|███▋ | 3712/10000 [13:31:21<22:28:44, 12.87s/it] {'loss': 0.0049, 'learning_rate': 3.1480000000000004e-05, 'epoch': 1.4} 37%|███▋ | 3712/10000 [13:31:21<22:28:44, 12.87s/it] 37%|███▋ | 3713/10000 [13:31:34<22:27:07, 12.86s/it] {'loss': 0.0053, 'learning_rate': 3.1475e-05, 'epoch': 1.4} 37%|███▋ | 3713/10000 [13:31:34<22:27:07, 12.86s/it] 37%|███▋ | 3714/10000 
[13:31:47<22:24:44, 12.84s/it] {'loss': 0.0056, 'learning_rate': 3.147e-05, 'epoch': 1.4} 37%|███▋ | 3714/10000 [13:31:47<22:24:44, 12.84s/it] 37%|███▋ | 3715/10000 [13:32:00<22:25:19, 12.84s/it] {'loss': 0.0052, 'learning_rate': 3.1465e-05, 'epoch': 1.4} 37%|███▋ | 3715/10000 [13:32:00<22:25:19, 12.84s/it] 37%|███▋ | 3716/10000 [13:32:12<22:25:43, 12.85s/it] {'loss': 0.0051, 'learning_rate': 3.146e-05, 'epoch': 1.4} 37%|███▋ | 3716/10000 [13:32:12<22:25:43, 12.85s/it] 37%|███▋ | 3717/10000 [13:32:25<22:25:51, 12.85s/it] {'loss': 0.0041, 'learning_rate': 3.1455e-05, 'epoch': 1.4} 37%|███▋ | 3717/10000 [13:32:25<22:25:51, 12.85s/it] 37%|███▋ | 3718/10000 [13:32:38<22:26:13, 12.86s/it] {'loss': 0.0065, 'learning_rate': 3.145e-05, 'epoch': 1.4} 37%|███▋ | 3718/10000 [13:32:38<22:26:13, 12.86s/it] 37%|███▋ | 3719/10000 [13:32:51<22:27:37, 12.87s/it] {'loss': 0.0045, 'learning_rate': 3.1445e-05, 'epoch': 1.4} 37%|███▋ | 3719/10000 [13:32:51<22:27:37, 12.87s/it] 37%|███▋ | 3720/10000 [13:33:04<22:26:29, 12.86s/it] {'loss': 0.0065, 'learning_rate': 3.1440000000000004e-05, 'epoch': 1.4} 37%|███▋ | 3720/10000 [13:33:04<22:26:29, 12.86s/it] 37%|███▋ | 3721/10000 [13:33:17<22:25:29, 12.86s/it] {'loss': 0.0065, 'learning_rate': 3.1435000000000007e-05, 'epoch': 1.4} 37%|███▋ | 3721/10000 [13:33:17<22:25:29, 12.86s/it] 37%|███▋ | 3722/10000 [13:33:30<22:24:29, 12.85s/it] {'loss': 0.0067, 'learning_rate': 3.143e-05, 'epoch': 1.4} 37%|███▋ | 3722/10000 [13:33:30<22:24:29, 12.85s/it] 37%|███▋ | 3723/10000 [13:33:42<22:24:52, 12.86s/it] {'loss': 0.005, 'learning_rate': 3.1425e-05, 'epoch': 1.4} 37%|███▋ | 3723/10000 [13:33:42<22:24:52, 12.86s/it] 37%|███▋ | 3724/10000 [13:33:55<22:23:40, 12.85s/it] {'loss': 0.0055, 'learning_rate': 3.142e-05, 'epoch': 1.4} 37%|███▋ | 3724/10000 [13:33:55<22:23:40, 12.85s/it] 37%|███▋ | 3725/10000 [13:34:08<22:21:46, 12.83s/it] {'loss': 0.0043, 'learning_rate': 3.1415e-05, 'epoch': 1.4} 37%|███▋ | 3725/10000 [13:34:08<22:21:46, 12.83s/it] 37%|███▋ | 
3726/10000 [13:34:21<22:23:22, 12.85s/it] {'loss': 0.0046, 'learning_rate': 3.141e-05, 'epoch': 1.4} 37%|███▋ | 3726/10000 [13:34:21<22:23:22, 12.85s/it] 37%|███▋ | 3727/10000 [13:34:34<22:25:08, 12.87s/it] {'loss': 0.0103, 'learning_rate': 3.1405e-05, 'epoch': 1.4} 37%|███▋ | 3727/10000 [13:34:34<22:25:08, 12.87s/it] 37%|███▋ | 3728/10000 [13:34:47<22:24:50, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.1400000000000004e-05, 'epoch': 1.4} 37%|███▋ | 3728/10000 [13:34:47<22:24:50, 12.87s/it] 37%|███▋ | 3729/10000 [13:35:00<22:24:47, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.1395e-05, 'epoch': 1.41} 37%|███▋ | 3729/10000 [13:35:00<22:24:47, 12.87s/it] 37%|███▋ | 3730/10000 [13:35:12<22:26:22, 12.88s/it] {'loss': 0.0044, 'learning_rate': 3.139e-05, 'epoch': 1.41} 37%|███▋ | 3730/10000 [13:35:12<22:26:22, 12.88s/it] 37%|███▋ | 3731/10000 [13:35:25<22:24:55, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.1385000000000005e-05, 'epoch': 1.41} 37%|███▋ | 3731/10000 [13:35:25<22:24:55, 12.87s/it] 37%|███▋ | 3732/10000 [13:35:38<22:22:49, 12.85s/it] {'loss': 0.0067, 'learning_rate': 3.138e-05, 'epoch': 1.41} 37%|███▋ | 3732/10000 [13:35:38<22:22:49, 12.85s/it] 37%|███▋ | 3733/10000 [13:35:51<22:21:31, 12.84s/it] {'loss': 0.0061, 'learning_rate': 3.1375e-05, 'epoch': 1.41} 37%|███▋ | 3733/10000 [13:35:51<22:21:31, 12.84s/it] 37%|███▋ | 3734/10000 [13:36:04<22:23:57, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.137e-05, 'epoch': 1.41} 37%|███▋ | 3734/10000 [13:36:04<22:23:57, 12.87s/it] 37%|███▋ | 3735/10000 [13:36:17<22:25:57, 12.89s/it] {'loss': 0.0042, 'learning_rate': 3.1365e-05, 'epoch': 1.41} 37%|███▋ | 3735/10000 [13:36:17<22:25:57, 12.89s/it] 37%|███▋ | 3736/10000 [13:36:30<22:22:59, 12.86s/it] {'loss': 0.0059, 'learning_rate': 3.136e-05, 'epoch': 1.41} 37%|███▋ | 3736/10000 [13:36:30<22:22:59, 12.86s/it] 37%|███▋ | 3737/10000 [13:36:43<22:24:43, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.1355e-05, 'epoch': 1.41} 37%|███▋ | 3737/10000 [13:36:43<22:24:43, 
12.88s/it] 37%|███▋ | 3738/10000 [13:36:55<22:26:12, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.135e-05, 'epoch': 1.41} 37%|███▋ | 3738/10000 [13:36:56<22:26:12, 12.90s/it] 37%|███▋ | 3739/10000 [13:37:08<22:26:44, 12.91s/it] {'loss': 0.0059, 'learning_rate': 3.1345e-05, 'epoch': 1.41} 37%|███▋ | 3739/10000 [13:37:08<22:26:44, 12.91s/it] 37%|███▋ | 3740/10000 [13:37:21<22:24:36, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.134e-05, 'epoch': 1.41} 37%|███▋ | 3740/10000 [13:37:21<22:24:36, 12.89s/it] 37%|███▋ | 3741/10000 [13:37:34<22:24:22, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.1335000000000004e-05, 'epoch': 1.41} 37%|███▋ | 3741/10000 [13:37:34<22:24:22, 12.89s/it] 37%|███▋ | 3742/10000 [13:37:47<22:24:43, 12.89s/it] {'loss': 0.0066, 'learning_rate': 3.133000000000001e-05, 'epoch': 1.41} 37%|███▋ | 3742/10000 [13:37:47<22:24:43, 12.89s/it] 37%|███▋ | 3743/10000 [13:38:00<22:23:56, 12.89s/it] {'loss': 0.0072, 'learning_rate': 3.1324999999999996e-05, 'epoch': 1.41} 37%|███▋ | 3743/10000 [13:38:00<22:23:56, 12.89s/it] 37%|███▋ | 3744/10000 [13:38:13<22:23:36, 12.89s/it] {'loss': 0.008, 'learning_rate': 3.132e-05, 'epoch': 1.41} 37%|███▋ | 3744/10000 [13:38:13<22:23:36, 12.89s/it] 37%|███▋ | 3745/10000 [13:38:26<22:24:01, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.1315e-05, 'epoch': 1.41} 37%|███▋ | 3745/10000 [13:38:26<22:24:01, 12.89s/it] 37%|███▋ | 3746/10000 [13:38:39<22:22:57, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.1310000000000003e-05, 'epoch': 1.41} 37%|███▋ | 3746/10000 [13:38:39<22:22:57, 12.88s/it] 37%|███▋ | 3747/10000 [13:38:51<22:21:30, 12.87s/it] {'loss': 0.0061, 'learning_rate': 3.1305e-05, 'epoch': 1.41} 37%|███▋ | 3747/10000 [13:38:51<22:21:30, 12.87s/it] 37%|███▋ | 3748/10000 [13:39:04<22:22:13, 12.88s/it] {'loss': 0.0056, 'learning_rate': 3.13e-05, 'epoch': 1.41} 37%|███▋ | 3748/10000 [13:39:04<22:22:13, 12.88s/it] 37%|███▋ | 3749/10000 [13:39:17<22:22:46, 12.89s/it] {'loss': 0.0064, 'learning_rate': 3.1295000000000004e-05, 
'epoch': 1.41} 37%|███▋ | 3749/10000 [13:39:17<22:22:46, 12.89s/it] 38%|███▊ | 3750/10000 [13:39:30<22:24:03, 12.90s/it] {'loss': 0.0059, 'learning_rate': 3.129e-05, 'epoch': 1.41} 38%|███▊ | 3750/10000 [13:39:30<22:24:03, 12.90s/it] 38%|███▊ | 3751/10000 [13:39:43<22:24:59, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.1285e-05, 'epoch': 1.41} 38%|███▊ | 3751/10000 [13:39:43<22:24:59, 12.91s/it] 38%|███▊ | 3752/10000 [13:39:56<22:21:48, 12.89s/it] {'loss': 0.0113, 'learning_rate': 3.1280000000000005e-05, 'epoch': 1.41} 38%|███▊ | 3752/10000 [13:39:56<22:21:48, 12.89s/it] 38%|███▊ | 3753/10000 [13:40:09<22:23:12, 12.90s/it] {'loss': 0.0048, 'learning_rate': 3.1275e-05, 'epoch': 1.41} 38%|███▊ | 3753/10000 [13:40:09<22:23:12, 12.90s/it] 38%|███▊ | 3754/10000 [13:40:22<22:21:27, 12.89s/it] {'loss': 0.0063, 'learning_rate': 3.127e-05, 'epoch': 1.41} 38%|███▊ | 3754/10000 [13:40:22<22:21:27, 12.89s/it] 38%|███▊ | 3755/10000 [13:40:35<22:21:03, 12.88s/it] {'loss': 0.006, 'learning_rate': 3.1265e-05, 'epoch': 1.41} 38%|███▊ | 3755/10000 [13:40:35<22:21:03, 12.88s/it] 38%|███▊ | 3756/10000 [13:40:47<22:19:26, 12.87s/it] {'loss': 0.0072, 'learning_rate': 3.126e-05, 'epoch': 1.42} 38%|███▊ | 3756/10000 [13:40:47<22:19:26, 12.87s/it] 38%|███▊ | 3757/10000 [13:41:00<22:17:03, 12.85s/it] {'loss': 0.0053, 'learning_rate': 3.1255e-05, 'epoch': 1.42} 38%|███▊ | 3757/10000 [13:41:00<22:17:03, 12.85s/it] 38%|███▊ | 3758/10000 [13:41:13<22:17:56, 12.86s/it] {'loss': 0.0065, 'learning_rate': 3.125e-05, 'epoch': 1.42} 38%|███▊ | 3758/10000 [13:41:13<22:17:56, 12.86s/it] 38%|███▊ | 3759/10000 [13:41:26<22:16:30, 12.85s/it] {'loss': 0.0079, 'learning_rate': 3.1245e-05, 'epoch': 1.42} 38%|███▊ | 3759/10000 [13:41:26<22:16:30, 12.85s/it] 38%|███▊ | 3760/10000 [13:41:39<22:19:49, 12.88s/it] {'loss': 0.0102, 'learning_rate': 3.1240000000000006e-05, 'epoch': 1.42} 38%|███▊ | 3760/10000 [13:41:39<22:19:49, 12.88s/it] 38%|███▊ | 3761/10000 [13:41:52<22:22:56, 12.91s/it] {'loss': 0.0059, 
'learning_rate': 3.1235e-05, 'epoch': 1.42} 38%|███▊ | 3761/10000 [13:41:52<22:22:56, 12.91s/it] 38%|███▊ | 3762/10000 [13:42:05<22:20:57, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.1230000000000004e-05, 'epoch': 1.42} 38%|███▊ | 3762/10000 [13:42:05<22:20:57, 12.90s/it] 38%|███▊ | 3763/10000 [13:42:18<22:18:29, 12.88s/it] {'loss': 0.008, 'learning_rate': 3.122500000000001e-05, 'epoch': 1.42} 38%|███▊ | 3763/10000 [13:42:18<22:18:29, 12.88s/it] 38%|███▊ | 3764/10000 [13:42:30<22:20:15, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.122e-05, 'epoch': 1.42} 38%|███▊ | 3764/10000 [13:42:31<22:20:15, 12.90s/it] 38%|███▊ | 3765/10000 [13:42:43<22:21:46, 12.91s/it] {'loss': 0.0076, 'learning_rate': 3.1215e-05, 'epoch': 1.42} 38%|███▊ | 3765/10000 [13:42:43<22:21:46, 12.91s/it] 38%|███▊ | 3766/10000 [13:42:56<22:20:52, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.121e-05, 'epoch': 1.42} 38%|███▊ | 3766/10000 [13:42:56<22:20:52, 12.91s/it] 38%|███▊ | 3767/10000 [13:43:09<22:21:15, 12.91s/it] {'loss': 0.0058, 'learning_rate': 3.1205000000000004e-05, 'epoch': 1.42} 38%|███▊ | 3767/10000 [13:43:09<22:21:15, 12.91s/it] 38%|███▊ | 3768/10000 [13:43:22<22:19:46, 12.90s/it] {'loss': 0.005, 'learning_rate': 3.12e-05, 'epoch': 1.42} 38%|███▊ | 3768/10000 [13:43:22<22:19:46, 12.90s/it] 38%|███▊ | 3769/10000 [13:43:35<22:18:18, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.1195e-05, 'epoch': 1.42} 38%|███▊ | 3769/10000 [13:43:35<22:18:18, 12.89s/it] 38%|███▊ | 3770/10000 [13:43:48<22:19:54, 12.90s/it] {'loss': 0.0064, 'learning_rate': 3.1190000000000005e-05, 'epoch': 1.42} 38%|███▊ | 3770/10000 [13:43:48<22:19:54, 12.90s/it] 38%|███▊ | 3771/10000 [13:44:01<22:18:10, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.1185e-05, 'epoch': 1.42} 38%|███▊ | 3771/10000 [13:44:01<22:18:10, 12.89s/it] 38%|███▊ | 3772/10000 [13:44:14<22:15:54, 12.87s/it] {'loss': 0.0051, 'learning_rate': 3.118e-05, 'epoch': 1.42} 38%|███▊ | 3772/10000 [13:44:14<22:15:54, 12.87s/it] 38%|███▊ | 3773/10000 
[13:44:27<22:16:30, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.1175000000000006e-05, 'epoch': 1.42} 38%|███▊ | 3773/10000 [13:44:27<22:16:30, 12.88s/it] 38%|███▊ | 3774/10000 [13:44:39<22:15:29, 12.87s/it] {'loss': 0.0072, 'learning_rate': 3.117e-05, 'epoch': 1.42} 38%|███▊ | 3774/10000 [13:44:39<22:15:29, 12.87s/it] 38%|███▊ | 3775/10000 [13:44:52<22:15:34, 12.87s/it] {'loss': 0.0072, 'learning_rate': 3.1165e-05, 'epoch': 1.42} 38%|███▊ | 3775/10000 [13:44:52<22:15:34, 12.87s/it] 38%|███▊ | 3776/10000 [13:45:05<22:17:46, 12.90s/it] {'loss': 0.0077, 'learning_rate': 3.116e-05, 'epoch': 1.42} 38%|███▊ | 3776/10000 [13:45:05<22:17:46, 12.90s/it] 38%|███▊ | 3777/10000 [13:45:18<22:15:44, 12.88s/it] {'loss': 0.0057, 'learning_rate': 3.1155e-05, 'epoch': 1.42} 38%|███▊ | 3777/10000 [13:45:18<22:15:44, 12.88s/it] 38%|███▊ | 3778/10000 [13:45:31<22:13:56, 12.86s/it] {'loss': 0.006, 'learning_rate': 3.115e-05, 'epoch': 1.42} 38%|███▊ | 3778/10000 [13:45:31<22:13:56, 12.86s/it] 38%|███▊ | 3779/10000 [13:45:44<22:15:14, 12.88s/it] {'loss': 0.0068, 'learning_rate': 3.1145e-05, 'epoch': 1.42} 38%|███▊ | 3779/10000 [13:45:44<22:15:14, 12.88s/it] 38%|███▊ | 3780/10000 [13:45:57<22:16:39, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.1140000000000003e-05, 'epoch': 1.42} 38%|███▊ | 3780/10000 [13:45:57<22:16:39, 12.89s/it] 38%|███▊ | 3781/10000 [13:46:10<22:17:03, 12.90s/it] {'loss': 0.0084, 'learning_rate': 3.1135000000000006e-05, 'epoch': 1.42} 38%|███▊ | 3781/10000 [13:46:10<22:17:03, 12.90s/it] 38%|███▊ | 3782/10000 [13:46:22<22:14:13, 12.87s/it] {'loss': 0.0066, 'learning_rate': 3.113e-05, 'epoch': 1.43} 38%|███▊ | 3782/10000 [13:46:22<22:14:13, 12.87s/it] 38%|███▊ | 3783/10000 [13:46:35<22:16:55, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.1125000000000004e-05, 'epoch': 1.43} 38%|███▊ | 3783/10000 [13:46:35<22:16:55, 12.90s/it] 38%|███▊ | 3784/10000 [13:46:48<22:16:37, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.112e-05, 'epoch': 1.43} 38%|███▊ | 3784/10000 
[13:46:48<22:16:37, 12.90s/it] 38%|███▊ | 3785/10000 [13:47:01<22:15:00, 12.89s/it] {'loss': 0.005, 'learning_rate': 3.1115e-05, 'epoch': 1.43} 38%|███▊ | 3785/10000 [13:47:01<22:15:00, 12.89s/it] 38%|███▊ | 3786/10000 [13:47:14<22:14:38, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.111e-05, 'epoch': 1.43} 38%|███▊ | 3786/10000 [13:47:14<22:14:38, 12.89s/it] 38%|███▊ | 3787/10000 [13:47:27<22:16:19, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.1105e-05, 'epoch': 1.43} 38%|███▊ | 3787/10000 [13:47:27<22:16:19, 12.91s/it] 38%|███▊ | 3788/10000 [13:47:40<22:14:53, 12.89s/it] {'loss': 0.0056, 'learning_rate': 3.1100000000000004e-05, 'epoch': 1.43} 38%|███▊ | 3788/10000 [13:47:40<22:14:53, 12.89s/it] 38%|███▊ | 3789/10000 [13:47:53<22:13:57, 12.89s/it] {'loss': 0.0041, 'learning_rate': 3.1095e-05, 'epoch': 1.43} 38%|███▊ | 3789/10000 [13:47:53<22:13:57, 12.89s/it] 38%|███▊ | 3790/10000 [13:48:06<22:11:24, 12.86s/it] {'loss': 0.007, 'learning_rate': 3.109e-05, 'epoch': 1.43} 38%|███▊ | 3790/10000 [13:48:06<22:11:24, 12.86s/it] 38%|███▊ | 3791/10000 [13:48:18<22:10:46, 12.86s/it] {'loss': 0.0056, 'learning_rate': 3.1085000000000005e-05, 'epoch': 1.43} 38%|███▊ | 3791/10000 [13:48:18<22:10:46, 12.86s/it] 38%|███▊ | 3792/10000 [13:48:31<22:13:29, 12.89s/it] {'loss': 0.0057, 'learning_rate': 3.108e-05, 'epoch': 1.43} 38%|███▊ | 3792/10000 [13:48:31<22:13:29, 12.89s/it] 38%|███▊ | 3793/10000 [13:48:44<22:16:52, 12.92s/it] {'loss': 0.0044, 'learning_rate': 3.1075e-05, 'epoch': 1.43} 38%|███▊ | 3793/10000 [13:48:44<22:16:52, 12.92s/it] 38%|███▊ | 3794/10000 [13:48:57<22:14:20, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.107e-05, 'epoch': 1.43} 38%|███▊ | 3794/10000 [13:48:57<22:14:20, 12.90s/it] 38%|███▊ | 3795/10000 [13:49:10<22:14:43, 12.91s/it] {'loss': 0.0053, 'learning_rate': 3.1065e-05, 'epoch': 1.43} 38%|███▊ | 3795/10000 [13:49:10<22:14:43, 12.91s/it] 38%|███▊ | 3796/10000 [13:49:23<22:15:04, 12.91s/it] {'loss': 0.0044, 'learning_rate': 3.106e-05, 'epoch': 1.43} 
38%|███▊ | 3796/10000 [13:49:23<22:15:04, 12.91s/it] 38%|███▊ | 3797/10000 [13:49:36<22:12:24, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.1055e-05, 'epoch': 1.43} 38%|███▊ | 3797/10000 [13:49:36<22:12:24, 12.89s/it] 38%|███▊ | 3798/10000 [13:49:49<22:12:13, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.105e-05, 'epoch': 1.43} 38%|███▊ | 3798/10000 [13:49:49<22:12:13, 12.89s/it] 38%|███▊ | 3799/10000 [13:50:02<22:11:33, 12.88s/it] {'loss': 0.0093, 'learning_rate': 3.1045000000000005e-05, 'epoch': 1.43} 38%|███▊ | 3799/10000 [13:50:02<22:11:33, 12.88s/it] 38%|███▊ | 3800/10000 [13:50:15<22:11:35, 12.89s/it] {'loss': 0.0095, 'learning_rate': 3.104e-05, 'epoch': 1.43} 38%|███▊ | 3800/10000 [13:50:15<22:11:35, 12.89s/it] 38%|███▊ | 3801/10000 [13:50:27<22:09:21, 12.87s/it] {'loss': 0.0098, 'learning_rate': 3.1035000000000004e-05, 'epoch': 1.43} 38%|███▊ | 3801/10000 [13:50:27<22:09:21, 12.87s/it] 38%|███▊ | 3802/10000 [13:50:40<22:09:45, 12.87s/it] {'loss': 0.0058, 'learning_rate': 3.1030000000000006e-05, 'epoch': 1.43} 38%|███▊ | 3802/10000 [13:50:40<22:09:45, 12.87s/it] 38%|███▊ | 3803/10000 [13:50:53<22:10:01, 12.88s/it] {'loss': 0.01, 'learning_rate': 3.1025e-05, 'epoch': 1.43} 38%|███▊ | 3803/10000 [13:50:53<22:10:01, 12.88s/it] 38%|███▊ | 3804/10000 [13:51:06<22:10:34, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.102e-05, 'epoch': 1.43} 38%|███▊ | 3804/10000 [13:51:06<22:10:34, 12.88s/it] 38%|███▊ | 3805/10000 [13:51:19<22:10:12, 12.88s/it] {'loss': 0.006, 'learning_rate': 3.1015e-05, 'epoch': 1.43} 38%|███▊ | 3805/10000 [13:51:19<22:10:12, 12.88s/it] 38%|███▊ | 3806/10000 [13:51:32<22:12:41, 12.91s/it] {'loss': 0.0064, 'learning_rate': 3.101e-05, 'epoch': 1.43} 38%|███▊ | 3806/10000 [13:51:32<22:12:41, 12.91s/it] 38%|███▊ | 3807/10000 [13:51:45<22:11:05, 12.90s/it] {'loss': 0.0074, 'learning_rate': 3.1005e-05, 'epoch': 1.43} 38%|███▊ | 3807/10000 [13:51:45<22:11:05, 12.90s/it] 38%|███▊ | 3808/10000 [13:51:58<22:11:02, 12.90s/it] {'loss': 0.0075, 
'learning_rate': 3.1e-05, 'epoch': 1.43} 38%|███▊ | 3808/10000 [13:51:58<22:11:02, 12.90s/it] 38%|███▊ | 3809/10000 [13:52:11<22:12:46, 12.92s/it] {'loss': 0.0109, 'learning_rate': 3.0995000000000004e-05, 'epoch': 1.44} 38%|███▊ | 3809/10000 [13:52:11<22:12:46, 12.92s/it] 38%|███▊ | 3810/10000 [13:52:24<22:12:17, 12.91s/it] {'loss': 0.008, 'learning_rate': 3.099e-05, 'epoch': 1.44} 38%|███▊ | 3810/10000 [13:52:24<22:12:17, 12.91s/it] 38%|███▊ | 3811/10000 [13:52:36<22:12:06, 12.91s/it] {'loss': 0.0086, 'learning_rate': 3.0985e-05, 'epoch': 1.44} 38%|███▊ | 3811/10000 [13:52:36<22:12:06, 12.91s/it] 38%|███▊ | 3812/10000 [13:52:49<22:10:22, 12.90s/it] {'loss': 0.0076, 'learning_rate': 3.0980000000000005e-05, 'epoch': 1.44} 38%|███▊ | 3812/10000 [13:52:49<22:10:22, 12.90s/it] 38%|███▊ | 3813/10000 [13:53:02<22:09:03, 12.89s/it] {'loss': 0.007, 'learning_rate': 3.0975e-05, 'epoch': 1.44} 38%|███▊ | 3813/10000 [13:53:02<22:09:03, 12.89s/it] 38%|███▊ | 3814/10000 [13:53:15<22:05:20, 12.85s/it] {'loss': 0.0107, 'learning_rate': 3.0969999999999997e-05, 'epoch': 1.44} 38%|███▊ | 3814/10000 [13:53:15<22:05:20, 12.85s/it] 38%|███▊ | 3815/10000 [13:53:28<22:04:36, 12.85s/it] {'loss': 0.0058, 'learning_rate': 3.0965e-05, 'epoch': 1.44} 38%|███▊ | 3815/10000 [13:53:28<22:04:36, 12.85s/it] 38%|███▊ | 3816/10000 [13:53:41<22:02:16, 12.83s/it] {'loss': 0.0089, 'learning_rate': 3.096e-05, 'epoch': 1.44} 38%|███▊ | 3816/10000 [13:53:41<22:02:16, 12.83s/it] 38%|███▊ | 3817/10000 [13:53:53<22:04:57, 12.86s/it] {'loss': 0.0068, 'learning_rate': 3.0955e-05, 'epoch': 1.44} 38%|███▊ | 3817/10000 [13:53:53<22:04:57, 12.86s/it] 38%|███▊ | 3818/10000 [13:54:06<22:06:58, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.095e-05, 'epoch': 1.44} 38%|███▊ | 3818/10000 [13:54:06<22:06:58, 12.88s/it] 38%|███▊ | 3819/10000 [13:54:19<22:04:34, 12.86s/it] {'loss': 0.0079, 'learning_rate': 3.0945e-05, 'epoch': 1.44} 38%|███▊ | 3819/10000 [13:54:19<22:04:34, 12.86s/it] 38%|███▊ | 3820/10000 
[13:54:32<22:04:12, 12.86s/it] {'loss': 0.0101, 'learning_rate': 3.0940000000000005e-05, 'epoch': 1.44} 38%|███▊ | 3820/10000 [13:54:32<22:04:12, 12.86s/it] 38%|███▊ | 3821/10000 [13:54:45<22:07:05, 12.89s/it] {'loss': 0.0109, 'learning_rate': 3.0935e-05, 'epoch': 1.44} 38%|███▊ | 3821/10000 [13:54:45<22:07:05, 12.89s/it] 38%|███▊ | 3822/10000 [13:54:58<22:05:08, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.0930000000000004e-05, 'epoch': 1.44} 38%|███▊ | 3822/10000 [13:54:58<22:05:08, 12.87s/it] 38%|███▊ | 3823/10000 [13:55:11<22:04:31, 12.87s/it] {'loss': 0.0081, 'learning_rate': 3.0925000000000006e-05, 'epoch': 1.44} 38%|███▊ | 3823/10000 [13:55:11<22:04:31, 12.87s/it] 38%|███▊ | 3824/10000 [13:55:24<22:04:51, 12.87s/it] {'loss': 0.0074, 'learning_rate': 3.092e-05, 'epoch': 1.44} 38%|███▊ | 3824/10000 [13:55:24<22:04:51, 12.87s/it] 38%|███▊ | 3825/10000 [13:55:36<22:04:11, 12.87s/it] {'loss': 0.0106, 'learning_rate': 3.0915e-05, 'epoch': 1.44} 38%|███▊ | 3825/10000 [13:55:36<22:04:11, 12.87s/it] 38%|███▊ | 3826/10000 [13:55:49<22:05:58, 12.89s/it] {'loss': 0.0093, 'learning_rate': 3.091e-05, 'epoch': 1.44} 38%|███▊ | 3826/10000 [13:55:49<22:05:58, 12.89s/it] 38%|███▊ | 3827/10000 [13:56:02<22:04:42, 12.88s/it] {'loss': 0.0113, 'learning_rate': 3.0905e-05, 'epoch': 1.44} 38%|███▊ | 3827/10000 [13:56:02<22:04:42, 12.88s/it] 38%|███▊ | 3828/10000 [13:56:15<22:03:44, 12.87s/it] {'loss': 0.0147, 'learning_rate': 3.09e-05, 'epoch': 1.44} 38%|███▊ | 3828/10000 [13:56:15<22:03:44, 12.87s/it] 38%|███▊ | 3829/10000 [13:56:28<22:07:29, 12.91s/it] {'loss': 0.0125, 'learning_rate': 3.0895e-05, 'epoch': 1.44} 38%|███▊ | 3829/10000 [13:56:28<22:07:29, 12.91s/it] 38%|███▊ | 3830/10000 [13:56:41<22:05:49, 12.89s/it] {'loss': 0.0086, 'learning_rate': 3.0890000000000004e-05, 'epoch': 1.44} 38%|███▊ | 3830/10000 [13:56:41<22:05:49, 12.89s/it] 38%|███▊ | 3831/10000 [13:56:54<22:05:56, 12.90s/it] {'loss': 0.0078, 'learning_rate': 3.0885e-05, 'epoch': 1.44} 38%|███▊ | 3831/10000 
[13:56:54<22:05:56, 12.90s/it] 38%|███▊ | 3832/10000 [13:57:07<22:04:13, 12.88s/it] {'loss': 0.0075, 'learning_rate': 3.088e-05, 'epoch': 1.44} 38%|███▊ | 3832/10000 [13:57:07<22:04:13, 12.88s/it] 38%|███▊ | 3833/10000 [13:57:20<22:03:39, 12.88s/it] {'loss': 0.0087, 'learning_rate': 3.0875000000000005e-05, 'epoch': 1.44} 38%|███▊ | 3833/10000 [13:57:20<22:03:39, 12.88s/it] 38%|███▊ | 3834/10000 [13:57:32<22:03:54, 12.88s/it] {'loss': 0.0064, 'learning_rate': 3.087e-05, 'epoch': 1.44} 38%|███▊ | 3834/10000 [13:57:32<22:03:54, 12.88s/it] 38%|███▊ | 3835/10000 [13:57:45<22:03:53, 12.88s/it] {'loss': 0.0083, 'learning_rate': 3.0865e-05, 'epoch': 1.44} 38%|███▊ | 3835/10000 [13:57:45<22:03:53, 12.88s/it] 38%|███▊ | 3836/10000 [13:57:58<22:03:13, 12.88s/it] {'loss': 0.0081, 'learning_rate': 3.086e-05, 'epoch': 1.45} 38%|███▊ | 3836/10000 [13:57:58<22:03:13, 12.88s/it] 38%|███▊ | 3837/10000 [13:58:11<22:02:27, 12.87s/it] {'loss': 0.007, 'learning_rate': 3.0855e-05, 'epoch': 1.45} 38%|███▊ | 3837/10000 [13:58:11<22:02:27, 12.87s/it] 38%|███▊ | 3838/10000 [13:58:24<22:01:32, 12.87s/it] {'loss': 0.0076, 'learning_rate': 3.0850000000000004e-05, 'epoch': 1.45} 38%|███▊ | 3838/10000 [13:58:24<22:01:32, 12.87s/it] 38%|███▊ | 3839/10000 [13:58:37<22:00:22, 12.86s/it] {'loss': 0.0099, 'learning_rate': 3.0845e-05, 'epoch': 1.45} 38%|███▊ | 3839/10000 [13:58:37<22:00:22, 12.86s/it] 38%|███▊ | 3840/10000 [13:58:50<22:01:29, 12.87s/it] {'loss': 0.0076, 'learning_rate': 3.084e-05, 'epoch': 1.45} 38%|███▊ | 3840/10000 [13:58:50<22:01:29, 12.87s/it] 38%|███▊ | 3841/10000 [13:59:02<21:58:50, 12.85s/it] {'loss': 0.0083, 'learning_rate': 3.0835000000000005e-05, 'epoch': 1.45} 38%|███▊ | 3841/10000 [13:59:02<21:58:50, 12.85s/it] 38%|███▊ | 3842/10000 [13:59:15<22:01:17, 12.87s/it] {'loss': 0.0076, 'learning_rate': 3.083e-05, 'epoch': 1.45} 38%|███▊ | 3842/10000 [13:59:15<22:01:17, 12.87s/it] 38%|███▊ | 3843/10000 [13:59:28<21:59:48, 12.86s/it] {'loss': 0.0094, 'learning_rate': 
3.0825000000000004e-05, 'epoch': 1.45} 38%|███▊ | 3843/10000 [13:59:28<21:59:48, 12.86s/it] 38%|███▊ | 3844/10000 [13:59:41<22:01:55, 12.88s/it] {'loss': 0.0073, 'learning_rate': 3.082e-05, 'epoch': 1.45} 38%|███▊ | 3844/10000 [13:59:41<22:01:55, 12.88s/it] 38%|███▊ | 3845/10000 [13:59:54<22:02:33, 12.89s/it] {'loss': 0.0091, 'learning_rate': 3.0815e-05, 'epoch': 1.45} 38%|███▊ | 3845/10000 [13:59:54<22:02:33, 12.89s/it] 38%|███▊ | 3846/10000 [14:00:07<22:01:39, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.081e-05, 'epoch': 1.45} 38%|███▊ | 3846/10000 [14:00:07<22:01:39, 12.89s/it] 38%|███▊ | 3847/10000 [14:00:20<22:03:10, 12.90s/it] {'loss': 0.0158, 'learning_rate': 3.0805e-05, 'epoch': 1.45} 38%|███▊ | 3847/10000 [14:00:20<22:03:10, 12.90s/it] 38%|███▊ | 3848/10000 [14:00:33<22:00:24, 12.88s/it] {'loss': 0.0074, 'learning_rate': 3.08e-05, 'epoch': 1.45} 38%|███▊ | 3848/10000 [14:00:33<22:00:24, 12.88s/it] 38%|███▊ | 3849/10000 [14:00:46<21:59:39, 12.87s/it] {'loss': 0.0064, 'learning_rate': 3.0795e-05, 'epoch': 1.45} 38%|███▊ | 3849/10000 [14:00:46<21:59:39, 12.87s/it] 38%|███▊ | 3850/10000 [14:00:58<22:00:09, 12.88s/it] {'loss': 0.0098, 'learning_rate': 3.079e-05, 'epoch': 1.45} 38%|███▊ | 3850/10000 [14:00:58<22:00:09, 12.88s/it] 39%|███▊ | 3851/10000 [14:01:11<21:58:35, 12.87s/it] {'loss': 0.0143, 'learning_rate': 3.0785000000000004e-05, 'epoch': 1.45} 39%|███▊ | 3851/10000 [14:01:11<21:58:35, 12.87s/it] 39%|███▊ | 3852/10000 [14:01:24<21:58:55, 12.87s/it] {'loss': 0.0095, 'learning_rate': 3.078e-05, 'epoch': 1.45} 39%|███▊ | 3852/10000 [14:01:24<21:58:55, 12.87s/it] 39%|███▊ | 3853/10000 [14:01:37<21:58:34, 12.87s/it] {'loss': 0.0083, 'learning_rate': 3.0775e-05, 'epoch': 1.45} 39%|███▊ | 3853/10000 [14:01:37<21:58:34, 12.87s/it] 39%|███▊ | 3854/10000 [14:01:50<21:58:47, 12.87s/it] {'loss': 0.0089, 'learning_rate': 3.077e-05, 'epoch': 1.45} 39%|███▊ | 3854/10000 [14:01:50<21:58:47, 12.87s/it] 39%|███▊ | 3855/10000 [14:02:03<21:58:37, 12.88s/it] {'loss': 
0.0071, 'learning_rate': 3.0765e-05, 'epoch': 1.45} 39%|███▊ | 3855/10000 [14:02:03<21:58:37, 12.88s/it] 39%|███▊ | 3856/10000 [14:02:16<22:00:16, 12.89s/it] {'loss': 0.0077, 'learning_rate': 3.076e-05, 'epoch': 1.45} 39%|███▊ | 3856/10000 [14:02:16<22:00:16, 12.89s/it] 39%|███▊ | 3857/10000 [14:02:29<22:00:02, 12.89s/it] {'loss': 0.0106, 'learning_rate': 3.0755e-05, 'epoch': 1.45} 39%|███▊ | 3857/10000 [14:02:29<22:00:02, 12.89s/it] 39%|███▊ | 3858/10000 [14:02:42<21:59:36, 12.89s/it] {'loss': 0.0092, 'learning_rate': 3.075e-05, 'epoch': 1.45} 39%|███▊ | 3858/10000 [14:02:42<21:59:36, 12.89s/it] 39%|███▊ | 3859/10000 [14:02:54<22:01:22, 12.91s/it] {'loss': 0.0088, 'learning_rate': 3.0745000000000005e-05, 'epoch': 1.45} 39%|███▊ | 3859/10000 [14:02:55<22:01:22, 12.91s/it] 39%|███▊ | 3860/10000 [14:03:07<22:00:01, 12.90s/it] {'loss': 0.0095, 'learning_rate': 3.074e-05, 'epoch': 1.45} 39%|███▊ | 3860/10000 [14:03:07<22:00:01, 12.90s/it] 39%|███▊ | 3861/10000 [14:03:20<22:01:11, 12.91s/it] {'loss': 0.008, 'learning_rate': 3.0735e-05, 'epoch': 1.45} 39%|███▊ | 3861/10000 [14:03:20<22:01:11, 12.91s/it] 39%|███▊ | 3862/10000 [14:03:33<21:58:59, 12.89s/it] {'loss': 0.0075, 'learning_rate': 3.0730000000000006e-05, 'epoch': 1.46} 39%|███▊ | 3862/10000 [14:03:33<21:58:59, 12.89s/it] 39%|███▊ | 3863/10000 [14:03:46<21:59:00, 12.90s/it] {'loss': 0.0076, 'learning_rate': 3.0725e-05, 'epoch': 1.46} 39%|███▊ | 3863/10000 [14:03:46<21:59:00, 12.90s/it] 39%|███▊ | 3864/10000 [14:03:59<21:58:24, 12.89s/it] {'loss': 0.0086, 'learning_rate': 3.072e-05, 'epoch': 1.46} 39%|███▊ | 3864/10000 [14:03:59<21:58:24, 12.89s/it] 39%|███▊ | 3865/10000 [14:04:12<21:59:17, 12.90s/it] {'loss': 0.0061, 'learning_rate': 3.0715e-05, 'epoch': 1.46} 39%|███▊ | 3865/10000 [14:04:12<21:59:17, 12.90s/it] 39%|███▊ | 3866/10000 [14:04:25<21:56:46, 12.88s/it] {'loss': 0.0077, 'learning_rate': 3.071e-05, 'epoch': 1.46} 39%|███▊ | 3866/10000 [14:04:25<21:56:46, 12.88s/it] 39%|███▊ | 3867/10000 
[14:04:38<21:58:52, 12.90s/it] {'loss': 0.0092, 'learning_rate': 3.0705e-05, 'epoch': 1.46} 39%|███▊ | 3867/10000 [14:04:38<21:58:52, 12.90s/it] 39%|███▊ | 3868/10000 [14:04:51<21:58:10, 12.90s/it] {'loss': 0.0081, 'learning_rate': 3.07e-05, 'epoch': 1.46} 39%|███▊ | 3868/10000 [14:04:51<21:58:10, 12.90s/it] 39%|███▊ | 3869/10000 [14:05:03<21:59:48, 12.92s/it] {'loss': 0.009, 'learning_rate': 3.0695000000000003e-05, 'epoch': 1.46} 39%|███▊ | 3869/10000 [14:05:04<21:59:48, 12.92s/it] 39%|███▊ | 3870/10000 [14:05:16<22:00:37, 12.93s/it] {'loss': 0.0082, 'learning_rate': 3.069e-05, 'epoch': 1.46} 39%|███▊ | 3870/10000 [14:05:16<22:00:37, 12.93s/it] 39%|███▊ | 3871/10000 [14:05:29<22:02:29, 12.95s/it] {'loss': 0.0074, 'learning_rate': 3.0685e-05, 'epoch': 1.46} 39%|███▊ | 3871/10000 [14:05:29<22:02:29, 12.95s/it] 39%|███▊ | 3872/10000 [14:05:42<22:05:09, 12.97s/it] {'loss': 0.0098, 'learning_rate': 3.0680000000000004e-05, 'epoch': 1.46} 39%|███▊ | 3872/10000 [14:05:43<22:05:09, 12.97s/it] 39%|███▊ | 3873/10000 [14:05:55<22:04:35, 12.97s/it] {'loss': 0.0064, 'learning_rate': 3.067500000000001e-05, 'epoch': 1.46} 39%|███▊ | 3873/10000 [14:05:55<22:04:35, 12.97s/it] 39%|███▊ | 3874/10000 [14:06:08<22:03:39, 12.96s/it] {'loss': 0.0094, 'learning_rate': 3.0669999999999996e-05, 'epoch': 1.46} 39%|███▊ | 3874/10000 [14:06:08<22:03:39, 12.96s/it] 39%|███▉ | 3875/10000 [14:06:21<22:02:32, 12.96s/it] {'loss': 0.0095, 'learning_rate': 3.0665e-05, 'epoch': 1.46} 39%|███▉ | 3875/10000 [14:06:21<22:02:32, 12.96s/it] 39%|███▉ | 3876/10000 [14:06:34<22:03:58, 12.97s/it] {'loss': 0.0081, 'learning_rate': 3.066e-05, 'epoch': 1.46} 39%|███▉ | 3876/10000 [14:06:34<22:03:58, 12.97s/it] 39%|███▉ | 3877/10000 [14:06:47<22:05:29, 12.99s/it] {'loss': 0.0049, 'learning_rate': 3.0655e-05, 'epoch': 1.46} 39%|███▉ | 3877/10000 [14:06:47<22:05:29, 12.99s/it] 39%|███▉ | 3878/10000 [14:07:00<22:02:35, 12.96s/it] {'loss': 0.0057, 'learning_rate': 3.065e-05, 'epoch': 1.46} 39%|███▉ | 3878/10000 
[14:07:00<22:02:35, 12.96s/it] 39%|███▉ | 3879/10000 [14:07:13<22:02:30, 12.96s/it] {'loss': 0.0104, 'learning_rate': 3.0645e-05, 'epoch': 1.46} 39%|███▉ | 3879/10000 [14:07:13<22:02:30, 12.96s/it] 39%|███▉ | 3880/10000 [14:07:26<22:03:38, 12.98s/it] {'loss': 0.0105, 'learning_rate': 3.0640000000000005e-05, 'epoch': 1.46} 39%|███▉ | 3880/10000 [14:07:26<22:03:38, 12.98s/it] 39%|███▉ | 3881/10000 [14:07:39<22:02:54, 12.97s/it] {'loss': 0.0107, 'learning_rate': 3.0635e-05, 'epoch': 1.46} 39%|███▉ | 3881/10000 [14:07:39<22:02:54, 12.97s/it] 39%|███▉ | 3882/10000 [14:07:52<22:03:48, 12.98s/it] {'loss': 0.0053, 'learning_rate': 3.063e-05, 'epoch': 1.46} 39%|███▉ | 3882/10000 [14:07:52<22:03:48, 12.98s/it] 39%|███▉ | 3883/10000 [14:08:05<22:04:36, 12.99s/it] {'loss': 0.0078, 'learning_rate': 3.0625000000000006e-05, 'epoch': 1.46} 39%|███▉ | 3883/10000 [14:08:05<22:04:36, 12.99s/it] 39%|███▉ | 3884/10000 [14:08:18<22:02:03, 12.97s/it] {'loss': 0.0088, 'learning_rate': 3.062e-05, 'epoch': 1.46} 39%|███▉ | 3884/10000 [14:08:18<22:02:03, 12.97s/it] 39%|███▉ | 3885/10000 [14:08:31<22:02:00, 12.97s/it] {'loss': 0.0063, 'learning_rate': 3.0615e-05, 'epoch': 1.46} 39%|███▉ | 3885/10000 [14:08:31<22:02:00, 12.97s/it] 39%|███▉ | 3886/10000 [14:08:44<22:00:13, 12.96s/it] {'loss': 0.0057, 'learning_rate': 3.061e-05, 'epoch': 1.46} 39%|███▉ | 3886/10000 [14:08:44<22:00:13, 12.96s/it] 39%|███▉ | 3887/10000 [14:08:57<21:59:34, 12.95s/it] {'loss': 0.0055, 'learning_rate': 3.0605e-05, 'epoch': 1.46} 39%|███▉ | 3887/10000 [14:08:57<21:59:34, 12.95s/it] 39%|███▉ | 3888/10000 [14:09:10<21:59:40, 12.95s/it] {'loss': 0.0067, 'learning_rate': 3.06e-05, 'epoch': 1.46} 39%|███▉ | 3888/10000 [14:09:10<21:59:40, 12.95s/it] 39%|███▉ | 3889/10000 [14:09:23<21:58:21, 12.94s/it] {'loss': 0.0072, 'learning_rate': 3.0595e-05, 'epoch': 1.47} 39%|███▉ | 3889/10000 [14:09:23<21:58:21, 12.94s/it] 39%|███▉ | 3890/10000 [14:09:36<21:54:47, 12.91s/it] {'loss': 0.0084, 'learning_rate': 3.0590000000000004e-05, 
'epoch': 1.47} 39%|███▉ | 3890/10000 [14:09:36<21:54:47, 12.91s/it] 39%|███▉ | 3891/10000 [14:09:49<21:51:59, 12.89s/it] {'loss': 0.0075, 'learning_rate': 3.0585e-05, 'epoch': 1.47} 39%|███▉ | 3891/10000 [14:09:49<21:51:59, 12.89s/it] 39%|███▉ | 3892/10000 [14:10:01<21:53:31, 12.90s/it] {'loss': 0.0053, 'learning_rate': 3.058e-05, 'epoch': 1.47} 39%|███▉ | 3892/10000 [14:10:01<21:53:31, 12.90s/it] 39%|███▉ | 3893/10000 [14:10:14<21:51:39, 12.89s/it] {'loss': 0.0048, 'learning_rate': 3.0575000000000005e-05, 'epoch': 1.47} 39%|███▉ | 3893/10000 [14:10:14<21:51:39, 12.89s/it] 39%|███▉ | 3894/10000 [14:10:27<21:51:59, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.057000000000001e-05, 'epoch': 1.47} 39%|███▉ | 3894/10000 [14:10:27<21:51:59, 12.89s/it] 39%|███▉ | 3895/10000 [14:10:40<21:52:07, 12.90s/it] {'loss': 0.0078, 'learning_rate': 3.0564999999999996e-05, 'epoch': 1.47} 39%|███▉ | 3895/10000 [14:10:40<21:52:07, 12.90s/it] 39%|███▉ | 3896/10000 [14:10:53<21:50:05, 12.88s/it] {'loss': 0.0074, 'learning_rate': 3.056e-05, 'epoch': 1.47} 39%|███▉ | 3896/10000 [14:10:53<21:50:05, 12.88s/it] 39%|███▉ | 3897/10000 [14:11:06<21:49:49, 12.88s/it] {'loss': 0.0063, 'learning_rate': 3.0555e-05, 'epoch': 1.47} 39%|███▉ | 3897/10000 [14:11:06<21:49:49, 12.88s/it] 39%|███▉ | 3898/10000 [14:11:19<21:48:31, 12.87s/it] {'loss': 0.0057, 'learning_rate': 3.0550000000000004e-05, 'epoch': 1.47} 39%|███▉ | 3898/10000 [14:11:19<21:48:31, 12.87s/it] 39%|███▉ | 3899/10000 [14:11:32<21:48:48, 12.87s/it] {'loss': 0.007, 'learning_rate': 3.0545e-05, 'epoch': 1.47} 39%|███▉ | 3899/10000 [14:11:32<21:48:48, 12.87s/it] 39%|███▉ | 3900/10000 [14:11:44<21:49:48, 12.88s/it] {'loss': 0.0059, 'learning_rate': 3.054e-05, 'epoch': 1.47} 39%|███▉ | 3900/10000 [14:11:44<21:49:48, 12.88s/it] 39%|███▉ | 3901/10000 [14:11:57<21:52:40, 12.91s/it] {'loss': 0.0104, 'learning_rate': 3.0535000000000005e-05, 'epoch': 1.47} 39%|███▉ | 3901/10000 [14:11:57<21:52:40, 12.91s/it] 39%|███▉ | 3902/10000 
[14:12:10<21:50:42, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.053e-05, 'epoch': 1.47} 39%|███▉ | 3902/10000 [14:12:10<21:50:42, 12.90s/it] 39%|███▉ | 3903/10000 [14:12:23<21:52:03, 12.91s/it] {'loss': 0.0048, 'learning_rate': 3.0525e-05, 'epoch': 1.47} 39%|███▉ | 3903/10000 [14:12:23<21:52:03, 12.91s/it] 39%|███▉ | 3904/10000 [14:12:36<21:51:06, 12.90s/it] {'loss': 0.0103, 'learning_rate': 3.0520000000000006e-05, 'epoch': 1.47} 39%|███▉ | 3904/10000 [14:12:36<21:51:06, 12.90s/it] 39%|███▉ | 3905/10000 [14:12:49<21:48:12, 12.88s/it] {'loss': 0.0099, 'learning_rate': 3.0515e-05, 'epoch': 1.47} 39%|███▉ | 3905/10000 [14:12:49<21:48:12, 12.88s/it] 39%|███▉ | 3906/10000 [14:13:02<21:49:00, 12.89s/it] {'loss': 0.0045, 'learning_rate': 3.051e-05, 'epoch': 1.47} 39%|███▉ | 3906/10000 [14:13:02<21:49:00, 12.89s/it] 39%|███▉ | 3907/10000 [14:13:15<21:47:34, 12.88s/it] {'loss': 0.0079, 'learning_rate': 3.0505e-05, 'epoch': 1.47} 39%|███▉ | 3907/10000 [14:13:15<21:47:34, 12.88s/it] 39%|███▉ | 3908/10000 [14:13:28<21:48:27, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.05e-05, 'epoch': 1.47} 39%|███▉ | 3908/10000 [14:13:28<21:48:27, 12.89s/it] 39%|███▉ | 3909/10000 [14:13:40<21:47:10, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.0495000000000002e-05, 'epoch': 1.47} 39%|███▉ | 3909/10000 [14:13:40<21:47:10, 12.88s/it] 39%|███▉ | 3910/10000 [14:13:53<21:46:34, 12.87s/it] {'loss': 0.0077, 'learning_rate': 3.049e-05, 'epoch': 1.47} 39%|███▉ | 3910/10000 [14:13:53<21:46:34, 12.87s/it] 39%|███▉ | 3911/10000 [14:14:06<21:46:44, 12.88s/it] {'loss': 0.0065, 'learning_rate': 3.0485000000000004e-05, 'epoch': 1.47} 39%|███▉ | 3911/10000 [14:14:06<21:46:44, 12.88s/it] 39%|███▉ | 3912/10000 [14:14:19<21:46:14, 12.87s/it] {'loss': 0.0047, 'learning_rate': 3.0480000000000003e-05, 'epoch': 1.47} 39%|███▉ | 3912/10000 [14:14:19<21:46:14, 12.87s/it] 39%|███▉ | 3913/10000 [14:14:32<21:47:11, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.0475000000000002e-05, 'epoch': 1.47} 39%|███▉ | 
3913/10000 [14:14:32<21:47:11, 12.89s/it] 39%|███▉ | 3914/10000 [14:14:45<21:48:18, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.0470000000000005e-05, 'epoch': 1.47} 39%|███▉ | 3914/10000 [14:14:45<21:48:18, 12.90s/it] 39%|███▉ | 3915/10000 [14:14:58<21:46:54, 12.89s/it] {'loss': 0.0058, 'learning_rate': 3.0465e-05, 'epoch': 1.48} 39%|███▉ | 3915/10000 [14:14:58<21:46:54, 12.89s/it] 39%|███▉ | 3916/10000 [14:15:11<21:45:14, 12.87s/it] {'loss': 0.0101, 'learning_rate': 3.046e-05, 'epoch': 1.48} 39%|███▉ | 3916/10000 [14:15:11<21:45:14, 12.87s/it] 39%|███▉ | 3917/10000 [14:15:23<21:43:43, 12.86s/it] {'loss': 0.0062, 'learning_rate': 3.0455e-05, 'epoch': 1.48} 39%|███▉ | 3917/10000 [14:15:23<21:43:43, 12.86s/it] 39%|███▉ | 3918/10000 [14:15:36<21:47:43, 12.90s/it] {'loss': 0.0082, 'learning_rate': 3.045e-05, 'epoch': 1.48} 39%|███▉ | 3918/10000 [14:15:36<21:47:43, 12.90s/it] 39%|███▉ | 3919/10000 [14:15:49<21:47:24, 12.90s/it] {'loss': 0.0057, 'learning_rate': 3.0445e-05, 'epoch': 1.48} 39%|███▉ | 3919/10000 [14:15:49<21:47:24, 12.90s/it] 39%|███▉ | 3920/10000 [14:16:02<21:48:32, 12.91s/it] {'loss': 0.0054, 'learning_rate': 3.0440000000000003e-05, 'epoch': 1.48} 39%|███▉ | 3920/10000 [14:16:02<21:48:32, 12.91s/it] 39%|███▉ | 3921/10000 [14:16:15<21:48:31, 12.92s/it] {'loss': 0.0037, 'learning_rate': 3.0435000000000003e-05, 'epoch': 1.48} 39%|███▉ | 3921/10000 [14:16:15<21:48:31, 12.92s/it] 39%|███▉ | 3922/10000 [14:16:28<21:47:58, 12.91s/it] {'loss': 0.0041, 'learning_rate': 3.0430000000000002e-05, 'epoch': 1.48} 39%|███▉ | 3922/10000 [14:16:28<21:47:58, 12.91s/it] 39%|███▉ | 3923/10000 [14:16:41<21:46:40, 12.90s/it] {'loss': 0.0068, 'learning_rate': 3.0425000000000004e-05, 'epoch': 1.48} 39%|███▉ | 3923/10000 [14:16:41<21:46:40, 12.90s/it] 39%|███▉ | 3924/10000 [14:16:54<21:46:02, 12.90s/it] {'loss': 0.008, 'learning_rate': 3.0420000000000004e-05, 'epoch': 1.48} 39%|███▉ | 3924/10000 [14:16:54<21:46:02, 12.90s/it] 39%|███▉ | 3925/10000 [14:17:07<21:44:46, 12.89s/it] 
{'loss': 0.0064, 'learning_rate': 3.0415e-05, 'epoch': 1.48} 39%|███▉ | 3925/10000 [14:17:07<21:44:46, 12.89s/it] 39%|███▉ | 3926/10000 [14:17:20<21:43:34, 12.88s/it] {'loss': 0.0053, 'learning_rate': 3.041e-05, 'epoch': 1.48} 39%|███▉ | 3926/10000 [14:17:20<21:43:34, 12.88s/it] 39%|███▉ | 3927/10000 [14:17:32<21:43:20, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.0405e-05, 'epoch': 1.48} 39%|███▉ | 3927/10000 [14:17:33<21:43:20, 12.88s/it] 39%|███▉ | 3928/10000 [14:17:45<21:40:56, 12.86s/it] {'loss': 0.0066, 'learning_rate': 3.04e-05, 'epoch': 1.48} 39%|███▉ | 3928/10000 [14:17:45<21:40:56, 12.86s/it] 39%|███▉ | 3929/10000 [14:17:58<21:40:42, 12.86s/it] {'loss': 0.009, 'learning_rate': 3.0395000000000003e-05, 'epoch': 1.48} 39%|███▉ | 3929/10000 [14:17:58<21:40:42, 12.86s/it] 39%|███▉ | 3930/10000 [14:18:11<21:42:35, 12.88s/it] {'loss': 0.0058, 'learning_rate': 3.0390000000000002e-05, 'epoch': 1.48} 39%|███▉ | 3930/10000 [14:18:11<21:42:35, 12.88s/it] 39%|███▉ | 3931/10000 [14:18:24<21:42:07, 12.87s/it] {'loss': 0.0066, 'learning_rate': 3.0385e-05, 'epoch': 1.48} 39%|███▉ | 3931/10000 [14:18:24<21:42:07, 12.87s/it] 39%|███▉ | 3932/10000 [14:18:37<21:41:55, 12.87s/it] {'loss': 0.0062, 'learning_rate': 3.0380000000000004e-05, 'epoch': 1.48} 39%|███▉ | 3932/10000 [14:18:37<21:41:55, 12.87s/it] 39%|███▉ | 3933/10000 [14:18:50<21:42:29, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.0375000000000003e-05, 'epoch': 1.48} 39%|███▉ | 3933/10000 [14:18:50<21:42:29, 12.88s/it] 39%|███▉ | 3934/10000 [14:19:03<21:40:39, 12.87s/it] {'loss': 0.0068, 'learning_rate': 3.0370000000000006e-05, 'epoch': 1.48} 39%|███▉ | 3934/10000 [14:19:03<21:40:39, 12.87s/it] 39%|███▉ | 3935/10000 [14:19:15<21:41:09, 12.87s/it] {'loss': 0.0073, 'learning_rate': 3.0364999999999998e-05, 'epoch': 1.48} 39%|███▉ | 3935/10000 [14:19:15<21:41:09, 12.87s/it] 39%|███▉ | 3936/10000 [14:19:28<21:42:52, 12.89s/it] {'loss': 0.0064, 'learning_rate': 3.036e-05, 'epoch': 1.48} 39%|███▉ | 3936/10000 
[14:19:28<21:42:52, 12.89s/it] 39%|███▉ | 3937/10000 [14:19:41<21:41:26, 12.88s/it] {'loss': 0.0062, 'learning_rate': 3.0355e-05, 'epoch': 1.48} 39%|███▉ | 3937/10000 [14:19:41<21:41:26, 12.88s/it] 39%|███▉ | 3938/10000 [14:19:54<21:40:39, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.035e-05, 'epoch': 1.48} 39%|███▉ | 3938/10000 [14:19:54<21:40:39, 12.87s/it] 39%|███▉ | 3939/10000 [14:20:07<21:39:47, 12.87s/it] {'loss': 0.008, 'learning_rate': 3.0345e-05, 'epoch': 1.48} 39%|███▉ | 3939/10000 [14:20:07<21:39:47, 12.87s/it] 39%|███▉ | 3940/10000 [14:20:20<21:40:12, 12.87s/it] {'loss': 0.0052, 'learning_rate': 3.034e-05, 'epoch': 1.48} 39%|███▉ | 3940/10000 [14:20:20<21:40:12, 12.87s/it] 39%|███▉ | 3941/10000 [14:20:33<21:40:10, 12.88s/it] {'loss': 0.0097, 'learning_rate': 3.0335000000000003e-05, 'epoch': 1.48} 39%|███▉ | 3941/10000 [14:20:33<21:40:10, 12.88s/it] 39%|███▉ | 3942/10000 [14:20:46<21:40:00, 12.88s/it] {'loss': 0.0078, 'learning_rate': 3.0330000000000003e-05, 'epoch': 1.49} 39%|███▉ | 3942/10000 [14:20:46<21:40:00, 12.88s/it] 39%|███▉ | 3943/10000 [14:20:59<21:41:44, 12.89s/it] {'loss': 0.007, 'learning_rate': 3.0325000000000002e-05, 'epoch': 1.49} 39%|███▉ | 3943/10000 [14:20:59<21:41:44, 12.89s/it] 39%|███▉ | 3944/10000 [14:21:11<21:43:04, 12.91s/it] {'loss': 0.0048, 'learning_rate': 3.0320000000000004e-05, 'epoch': 1.49} 39%|███▉ | 3944/10000 [14:21:11<21:43:04, 12.91s/it] 39%|███▉ | 3945/10000 [14:21:24<21:44:18, 12.92s/it] {'loss': 0.0083, 'learning_rate': 3.0315e-05, 'epoch': 1.49} 39%|███▉ | 3945/10000 [14:21:24<21:44:18, 12.92s/it] 39%|███▉ | 3946/10000 [14:21:37<21:47:27, 12.96s/it] {'loss': 0.0212, 'learning_rate': 3.031e-05, 'epoch': 1.49} 39%|███▉ | 3946/10000 [14:21:37<21:47:27, 12.96s/it] 39%|███▉ | 3947/10000 [14:21:50<21:45:58, 12.95s/it] {'loss': 0.0069, 'learning_rate': 3.0305e-05, 'epoch': 1.49} 39%|███▉ | 3947/10000 [14:21:50<21:45:58, 12.95s/it] 39%|███▉ | 3948/10000 [14:22:03<21:47:17, 12.96s/it] {'loss': 0.0085, 'learning_rate': 
3.03e-05, 'epoch': 1.49} 39%|███▉ | 3948/10000 [14:22:03<21:47:17, 12.96s/it] 39%|███▉ | 3949/10000 [14:22:16<21:47:26, 12.96s/it] {'loss': 0.0065, 'learning_rate': 3.0295e-05, 'epoch': 1.49} 39%|███▉ | 3949/10000 [14:22:16<21:47:26, 12.96s/it] 40%|███▉ | 3950/10000 [14:22:29<21:44:17, 12.94s/it] {'loss': 0.006, 'learning_rate': 3.0290000000000003e-05, 'epoch': 1.49} 40%|███▉ | 3950/10000 [14:22:29<21:44:17, 12.94s/it] 40%|███▉ | 3951/10000 [14:22:42<21:44:24, 12.94s/it] {'loss': 0.0071, 'learning_rate': 3.0285000000000002e-05, 'epoch': 1.49} 40%|███▉ | 3951/10000 [14:22:42<21:44:24, 12.94s/it] 40%|███▉ | 3952/10000 [14:22:55<21:46:14, 12.96s/it] {'loss': 0.0131, 'learning_rate': 3.028e-05, 'epoch': 1.49} 40%|███▉ | 3952/10000 [14:22:55<21:46:14, 12.96s/it] 40%|███▉ | 3953/10000 [14:23:08<21:46:10, 12.96s/it] {'loss': 0.0063, 'learning_rate': 3.0275000000000004e-05, 'epoch': 1.49} 40%|███▉ | 3953/10000 [14:23:08<21:46:10, 12.96s/it] 40%|███▉ | 3954/10000 [14:23:21<21:44:20, 12.94s/it] {'loss': 0.0056, 'learning_rate': 3.0270000000000003e-05, 'epoch': 1.49} 40%|███▉ | 3954/10000 [14:23:21<21:44:20, 12.94s/it] 40%|███▉ | 3955/10000 [14:23:34<21:44:02, 12.94s/it] {'loss': 0.0069, 'learning_rate': 3.0265e-05, 'epoch': 1.49} 40%|███▉ | 3955/10000 [14:23:34<21:44:02, 12.94s/it] 40%|███▉ | 3956/10000 [14:23:47<21:45:03, 12.96s/it] {'loss': 0.006, 'learning_rate': 3.0259999999999998e-05, 'epoch': 1.49} 40%|███▉ | 3956/10000 [14:23:47<21:45:03, 12.96s/it] 40%|███▉ | 3957/10000 [14:24:00<21:44:28, 12.95s/it] {'loss': 0.006, 'learning_rate': 3.0255e-05, 'epoch': 1.49} 40%|███▉ | 3957/10000 [14:24:00<21:44:28, 12.95s/it] 40%|███▉ | 3958/10000 [14:24:13<21:45:51, 12.97s/it] {'loss': 0.0062, 'learning_rate': 3.025e-05, 'epoch': 1.49} 40%|███▉ | 3958/10000 [14:24:13<21:45:51, 12.97s/it] 40%|███▉ | 3959/10000 [14:24:26<21:43:44, 12.95s/it] {'loss': 0.0069, 'learning_rate': 3.0245000000000003e-05, 'epoch': 1.49} 40%|███▉ | 3959/10000 [14:24:26<21:43:44, 12.95s/it] 40%|███▉ | 
3960/10000 [14:24:39<21:44:26, 12.96s/it] {'loss': 0.0059, 'learning_rate': 3.0240000000000002e-05, 'epoch': 1.49} 40%|███▉ | 3960/10000 [14:24:39<21:44:26, 12.96s/it] 40%|███▉ | 3961/10000 [14:24:52<21:45:48, 12.97s/it] {'loss': 0.0068, 'learning_rate': 3.0235e-05, 'epoch': 1.49} 40%|███▉ | 3961/10000 [14:24:52<21:45:48, 12.97s/it] 40%|███▉ | 3962/10000 [14:25:05<21:46:25, 12.98s/it] {'loss': 0.0077, 'learning_rate': 3.0230000000000004e-05, 'epoch': 1.49} 40%|███▉ | 3962/10000 [14:25:05<21:46:25, 12.98s/it] 40%|███▉ | 3963/10000 [14:25:18<21:47:25, 12.99s/it] {'loss': 0.0055, 'learning_rate': 3.0225000000000003e-05, 'epoch': 1.49} 40%|███▉ | 3963/10000 [14:25:18<21:47:25, 12.99s/it] 40%|███▉ | 3964/10000 [14:25:31<21:47:04, 12.99s/it] {'loss': 0.0076, 'learning_rate': 3.0220000000000005e-05, 'epoch': 1.49} 40%|███▉ | 3964/10000 [14:25:31<21:47:04, 12.99s/it] 40%|███▉ | 3965/10000 [14:25:44<21:46:13, 12.99s/it] {'loss': 0.0071, 'learning_rate': 3.0214999999999998e-05, 'epoch': 1.49} 40%|███▉ | 3965/10000 [14:25:44<21:46:13, 12.99s/it] 40%|███▉ | 3966/10000 [14:25:57<21:45:19, 12.98s/it] {'loss': 0.005, 'learning_rate': 3.021e-05, 'epoch': 1.49} 40%|███▉ | 3966/10000 [14:25:57<21:45:19, 12.98s/it] 40%|███▉ | 3967/10000 [14:26:10<21:43:48, 12.97s/it] {'loss': 0.008, 'learning_rate': 3.0205e-05, 'epoch': 1.49} 40%|███▉ | 3967/10000 [14:26:10<21:43:48, 12.97s/it] 40%|███▉ | 3968/10000 [14:26:23<21:44:13, 12.97s/it] {'loss': 0.0063, 'learning_rate': 3.02e-05, 'epoch': 1.5} 40%|███▉ | 3968/10000 [14:26:23<21:44:13, 12.97s/it] 40%|███▉ | 3969/10000 [14:26:36<21:44:27, 12.98s/it] {'loss': 0.0072, 'learning_rate': 3.0195e-05, 'epoch': 1.5} 40%|███▉ | 3969/10000 [14:26:36<21:44:27, 12.98s/it] 40%|███▉ | 3970/10000 [14:26:48<21:39:45, 12.93s/it] {'loss': 0.0072, 'learning_rate': 3.019e-05, 'epoch': 1.5} 40%|███▉ | 3970/10000 [14:26:49<21:39:45, 12.93s/it] 40%|███▉ | 3971/10000 [14:27:01<21:36:44, 12.90s/it] {'loss': 0.0072, 'learning_rate': 3.0185000000000003e-05, 'epoch': 
1.5} 40%|███▉ | 3971/10000 [14:27:01<21:36:44, 12.90s/it] 40%|███▉ | 3972/10000 [14:27:14<21:37:39, 12.92s/it] {'loss': 0.0072, 'learning_rate': 3.0180000000000002e-05, 'epoch': 1.5} 40%|███▉ | 3972/10000 [14:27:14<21:37:39, 12.92s/it] 40%|███▉ | 3973/10000 [14:27:27<21:34:59, 12.89s/it] {'loss': 0.0071, 'learning_rate': 3.0175e-05, 'epoch': 1.5} 40%|███▉ | 3973/10000 [14:27:27<21:34:59, 12.89s/it] 40%|███▉ | 3974/10000 [14:27:40<21:33:59, 12.88s/it] {'loss': 0.0065, 'learning_rate': 3.0170000000000004e-05, 'epoch': 1.5} 40%|███▉ | 3974/10000 [14:27:40<21:33:59, 12.88s/it] 40%|███▉ | 3975/10000 [14:27:53<21:34:26, 12.89s/it] {'loss': 0.0062, 'learning_rate': 3.0165e-05, 'epoch': 1.5} 40%|███▉ | 3975/10000 [14:27:53<21:34:26, 12.89s/it] 40%|███▉ | 3976/10000 [14:28:06<21:33:49, 12.89s/it] {'loss': 0.0064, 'learning_rate': 3.016e-05, 'epoch': 1.5} 40%|███▉ | 3976/10000 [14:28:06<21:33:49, 12.89s/it] 40%|███▉ | 3977/10000 [14:28:19<21:34:37, 12.90s/it] {'loss': 0.0063, 'learning_rate': 3.0155e-05, 'epoch': 1.5} 40%|███▉ | 3977/10000 [14:28:19<21:34:37, 12.90s/it] 40%|███▉ | 3978/10000 [14:28:32<21:33:42, 12.89s/it] {'loss': 0.0084, 'learning_rate': 3.015e-05, 'epoch': 1.5} 40%|███▉ | 3978/10000 [14:28:32<21:33:42, 12.89s/it] 40%|███▉ | 3979/10000 [14:28:44<21:32:50, 12.88s/it] {'loss': 0.0066, 'learning_rate': 3.0145e-05, 'epoch': 1.5} 40%|███▉ | 3979/10000 [14:28:44<21:32:50, 12.88s/it] 40%|███▉ | 3980/10000 [14:28:57<21:34:02, 12.90s/it] {'loss': 0.0067, 'learning_rate': 3.0140000000000003e-05, 'epoch': 1.5} 40%|███▉ | 3980/10000 [14:28:57<21:34:02, 12.90s/it] 40%|███▉ | 3981/10000 [14:29:10<21:32:56, 12.89s/it] {'loss': 0.0065, 'learning_rate': 3.0135000000000002e-05, 'epoch': 1.5} 40%|███▉ | 3981/10000 [14:29:10<21:32:56, 12.89s/it] 40%|███▉ | 3982/10000 [14:29:23<21:30:45, 12.87s/it] {'loss': 0.0099, 'learning_rate': 3.013e-05, 'epoch': 1.5} 40%|███▉ | 3982/10000 [14:29:23<21:30:45, 12.87s/it] 40%|███▉ | 3983/10000 [14:29:36<21:31:27, 12.88s/it] {'loss': 0.0073, 
'learning_rate': 3.0125000000000004e-05, 'epoch': 1.5} 40%|███▉ | 3983/10000 [14:29:36<21:31:27, 12.88s/it] 40%|███▉ | 3984/10000 [14:29:49<21:31:55, 12.88s/it] {'loss': 0.0065, 'learning_rate': 3.0120000000000003e-05, 'epoch': 1.5} 40%|███▉ | 3984/10000 [14:29:49<21:31:55, 12.88s/it] 40%|███▉ | 3985/10000 [14:30:02<21:30:48, 12.88s/it] {'loss': 0.0078, 'learning_rate': 3.0115e-05, 'epoch': 1.5} 40%|███▉ | 3985/10000 [14:30:02<21:30:48, 12.88s/it] 40%|███▉ | 3986/10000 [14:30:15<21:31:30, 12.89s/it] {'loss': 0.0052, 'learning_rate': 3.0109999999999998e-05, 'epoch': 1.5} 40%|███▉ | 3986/10000 [14:30:15<21:31:30, 12.89s/it] 40%|███▉ | 3987/10000 [14:30:28<21:33:39, 12.91s/it] {'loss': 0.0101, 'learning_rate': 3.0105e-05, 'epoch': 1.5} 40%|███▉ | 3987/10000 [14:30:28<21:33:39, 12.91s/it] 40%|███▉ | 3988/10000 [14:30:40<21:34:30, 12.92s/it] {'loss': 0.0084, 'learning_rate': 3.01e-05, 'epoch': 1.5} 40%|███▉ | 3988/10000 [14:30:41<21:34:30, 12.92s/it] 40%|███▉ | 3989/10000 [14:30:53<21:32:30, 12.90s/it] {'loss': 0.0071, 'learning_rate': 3.0095000000000002e-05, 'epoch': 1.5} 40%|███▉ | 3989/10000 [14:30:53<21:32:30, 12.90s/it] 40%|███▉ | 3990/10000 [14:31:06<21:31:35, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.009e-05, 'epoch': 1.5} 40%|███▉ | 3990/10000 [14:31:06<21:31:35, 12.89s/it] 40%|███▉ | 3991/10000 [14:31:19<21:31:49, 12.90s/it] {'loss': 0.0091, 'learning_rate': 3.0085e-05, 'epoch': 1.5} 40%|███▉ | 3991/10000 [14:31:19<21:31:49, 12.90s/it] 40%|███▉ | 3992/10000 [14:31:32<21:28:28, 12.87s/it] {'loss': 0.0054, 'learning_rate': 3.0080000000000003e-05, 'epoch': 1.5} 40%|███▉ | 3992/10000 [14:31:32<21:28:28, 12.87s/it] 40%|███▉ | 3993/10000 [14:31:45<21:28:32, 12.87s/it] {'loss': 0.0094, 'learning_rate': 3.0075000000000003e-05, 'epoch': 1.5} 40%|███▉ | 3993/10000 [14:31:45<21:28:32, 12.87s/it] 40%|███▉ | 3994/10000 [14:31:58<21:30:56, 12.90s/it] {'loss': 0.0056, 'learning_rate': 3.0070000000000005e-05, 'epoch': 1.5} 40%|███▉ | 3994/10000 [14:31:58<21:30:56, 
12.90s/it] 40%|███▉ | 3995/10000 [14:32:11<21:30:51, 12.90s/it] {'loss': 0.0081, 'learning_rate': 3.0064999999999998e-05, 'epoch': 1.51} 40%|███▉ | 3995/10000 [14:32:11<21:30:51, 12.90s/it] 40%|███▉ | 3996/10000 [14:32:24<21:31:59, 12.91s/it] {'loss': 0.0084, 'learning_rate': 3.006e-05, 'epoch': 1.51} 40%|███▉ | 3996/10000 [14:32:24<21:31:59, 12.91s/it] 40%|███▉ | 3997/10000 [14:32:37<21:32:59, 12.92s/it] {'loss': 0.0068, 'learning_rate': 3.0055e-05, 'epoch': 1.51} 40%|███▉ | 3997/10000 [14:32:37<21:32:59, 12.92s/it] 40%|███▉ | 3998/10000 [14:32:50<21:34:35, 12.94s/it] {'loss': 0.01, 'learning_rate': 3.0050000000000002e-05, 'epoch': 1.51} 40%|███▉ | 3998/10000 [14:32:50<21:34:35, 12.94s/it] 40%|███▉ | 3999/10000 [14:33:02<21:33:36, 12.93s/it] {'loss': 0.0058, 'learning_rate': 3.0045e-05, 'epoch': 1.51} 40%|███▉ | 3999/10000 [14:33:03<21:33:36, 12.93s/it] 40%|████ | 4000/10000 [14:33:15<21:33:49, 12.94s/it] {'loss': 0.0058, 'learning_rate': 3.004e-05, 'epoch': 1.51} 40%|████ | 4000/10000 [14:33:15<21:33:49, 12.94s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-06 10:58:14,038 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/config.json [INFO|configuration_utils.py:364] 2024-11-06 10:58:14,040 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-06 10:59:14,134 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-06 10:59:14,137 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-06 10:59:14,138 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/special_tokens_map.json [2024-11-06 10:59:14,149] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint 
global_step4000 is about to be saved! [2024-11-06 10:59:14,169] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/global_step4000/mp_rank_00_model_states.pt [2024-11-06 10:59:14,170] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/global_step4000/mp_rank_00_model_states.pt... [2024-11-06 10:59:57,452] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/global_step4000/mp_rank_00_model_states.pt. [2024-11-06 10:59:57,610] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-06 11:02:01,341] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-06 11:02:01,345] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-06 11:02:01,345] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step4000 is ready now! 
40%|████ | 4001/10000 [14:37:16<135:10:31, 81.12s/it] {'loss': 0.0059, 'learning_rate': 3.0035000000000003e-05, 'epoch': 1.51} 40%|████ | 4001/10000 [14:37:16<135:10:31, 81.12s/it] 40%|████ | 4002/10000 [14:37:28<100:56:52, 60.59s/it] {'loss': 0.0059, 'learning_rate': 3.0030000000000002e-05, 'epoch': 1.51} 40%|████ | 4002/10000 [14:37:28<100:56:52, 60.59s/it] 40%|████ | 4003/10000 [14:37:41<77:03:48, 46.26s/it] {'loss': 0.0087, 'learning_rate': 3.0025000000000005e-05, 'epoch': 1.51} 40%|████ | 4003/10000 [14:37:41<77:03:48, 46.26s/it] 40%|████ | 4004/10000 [14:37:54<60:18:24, 36.21s/it] {'loss': 0.0062, 'learning_rate': 3.0020000000000004e-05, 'epoch': 1.51} 40%|████ | 4004/10000 [14:37:54<60:18:24, 36.21s/it] 40%|████ | 4005/10000 [14:38:07<48:37:11, 29.20s/it] {'loss': 0.0056, 'learning_rate': 3.0015e-05, 'epoch': 1.51} 40%|████ | 4005/10000 [14:38:07<48:37:11, 29.20s/it] 40%|████ | 4006/10000 [14:38:20<40:29:07, 24.32s/it] {'loss': 0.0064, 'learning_rate': 3.001e-05, 'epoch': 1.51} 40%|████ | 4006/10000 [14:38:20<40:29:07, 24.32s/it] 40%|████ | 4007/10000 [14:38:33<34:46:03, 20.88s/it] {'loss': 0.0083, 'learning_rate': 3.0004999999999998e-05, 'epoch': 1.51} 40%|████ | 4007/10000 [14:38:33<34:46:03, 20.88s/it] 40%|████ | 4008/10000 [14:38:45<30:45:30, 18.48s/it] {'loss': 0.0072, 'learning_rate': 3e-05, 'epoch': 1.51} 40%|████ | 4008/10000 [14:38:45<30:45:30, 18.48s/it][2024-11-06 11:03:55,495] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 40%|████ | 4009/10000 [14:38:57<27:17:41, 16.40s/it] {'loss': 0.0097, 'learning_rate': 3e-05, 'epoch': 1.51} 40%|████ | 4009/10000 [14:38:57<27:17:41, 16.40s/it][2024-11-06 11:04:07,055] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 40%|████ | 4010/10000 [14:39:09<24:52:38, 14.95s/it] {'loss': 0.0068, 'learning_rate': 3e-05, 'epoch': 1.51} 40%|████ | 4010/10000 [14:39:09<24:52:38, 14.95s/it] 40%|████ | 4011/10000 [14:39:21<23:51:06, 14.34s/it] {'loss': 0.0066, 'learning_rate': 2.9995e-05, 'epoch': 1.51} 40%|████ | 4011/10000 [14:39:21<23:51:06, 14.34s/it] 40%|████ | 4012/10000 [14:39:34<23:11:11, 13.94s/it] {'loss': 0.0089, 'learning_rate': 2.9990000000000003e-05, 'epoch': 1.51} 40%|████ | 4012/10000 [14:39:34<23:11:11, 13.94s/it] 40%|████ | 4013/10000 [14:39:47<22:41:49, 13.65s/it] {'loss': 0.006, 'learning_rate': 2.9985000000000002e-05, 'epoch': 1.51} 40%|████ | 4013/10000 [14:39:47<22:41:49, 13.65s/it] 40%|████ | 4014/10000 [14:40:00<22:17:47, 13.41s/it] {'loss': 0.0056, 'learning_rate': 2.998e-05, 'epoch': 1.51} 40%|████ | 4014/10000 [14:40:00<22:17:47, 13.41s/it] 40%|████ | 4015/10000 [14:40:13<22:01:06, 13.24s/it] {'loss': 0.006, 'learning_rate': 2.9975000000000004e-05, 'epoch': 1.51} 40%|████ | 4015/10000 [14:40:13<22:01:06, 13.24s/it] 40%|████ | 4016/10000 [14:40:26<21:47:59, 13.11s/it] {'loss': 0.0079, 'learning_rate': 2.9970000000000003e-05, 'epoch': 1.51} 40%|████ | 4016/10000 [14:40:26<21:47:59, 13.11s/it] 40%|████ | 4017/10000 [14:40:39<21:39:54, 13.04s/it] {'loss': 0.0066, 'learning_rate': 2.9965000000000005e-05, 'epoch': 1.51} 40%|████ | 4017/10000 [14:40:39<21:39:54, 13.04s/it] 40%|████ | 4018/10000 [14:40:52<21:35:05, 12.99s/it] {'loss': 0.0048, 'learning_rate': 2.9959999999999998e-05, 'epoch': 1.51} 40%|████ | 4018/10000 [14:40:52<21:35:05, 12.99s/it] 40%|████ | 4019/10000 [14:41:05<21:30:44, 12.95s/it] {'loss': 0.0058, 'learning_rate': 2.9955e-05, 'epoch': 1.51} 40%|████ | 4019/10000 [14:41:05<21:30:44, 12.95s/it] 40%|████ | 4020/10000 [14:41:17<21:25:15, 12.90s/it] {'loss': 0.0178, 'learning_rate': 2.995e-05, 'epoch': 1.51} 40%|████ | 4020/10000 [14:41:17<21:25:15, 12.90s/it] 40%|████ | 4021/10000 [14:41:30<21:25:41, 12.90s/it] 
{'loss': 0.0077, 'learning_rate': 2.9945000000000002e-05, 'epoch': 1.52} 40%|████ | 4021/10000 [14:41:30<21:25:41, 12.90s/it] 40%|████ | 4022/10000 [14:41:43<21:23:30, 12.88s/it] {'loss': 0.0096, 'learning_rate': 2.994e-05, 'epoch': 1.52} 40%|████ | 4022/10000 [14:41:43<21:23:30, 12.88s/it] 40%|████ | 4023/10000 [14:41:56<21:25:36, 12.91s/it] {'loss': 0.0092, 'learning_rate': 2.9935e-05, 'epoch': 1.52} 40%|████ | 4023/10000 [14:41:56<21:25:36, 12.91s/it] 40%|████ | 4024/10000 [14:42:09<21:23:53, 12.89s/it] {'loss': 0.0084, 'learning_rate': 2.9930000000000003e-05, 'epoch': 1.52} 40%|████ | 4024/10000 [14:42:09<21:23:53, 12.89s/it] 40%|████ | 4025/10000 [14:42:22<21:23:01, 12.88s/it] {'loss': 0.0084, 'learning_rate': 2.9925000000000002e-05, 'epoch': 1.52} 40%|████ | 4025/10000 [14:42:22<21:23:01, 12.88s/it] 40%|████ | 4026/10000 [14:42:35<21:23:48, 12.89s/it] {'loss': 0.0077, 'learning_rate': 2.9920000000000005e-05, 'epoch': 1.52} 40%|████ | 4026/10000 [14:42:35<21:23:48, 12.89s/it] 40%|████ | 4027/10000 [14:42:48<21:22:57, 12.89s/it] {'loss': 0.0301, 'learning_rate': 2.9915000000000004e-05, 'epoch': 1.52} 40%|████ | 4027/10000 [14:42:48<21:22:57, 12.89s/it][2024-11-06 11:07:57,803] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 40%|████ | 4028/10000 [14:42:59<20:48:40, 12.55s/it] {'loss': 1.4856, 'learning_rate': 2.9915000000000004e-05, 'epoch': 1.52} 40%|████ | 4028/10000 [14:42:59<20:48:40, 12.55s/it][2024-11-06 11:08:09,495] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 16384, reducing to 8192 40%|████ | 4029/10000 [14:43:11<20:22:59, 12.29s/it] {'loss': 1.4351, 'learning_rate': 2.9915000000000004e-05, 'epoch': 1.52} 40%|████ | 4029/10000 [14:43:11<20:22:59, 12.29s/it] 40%|████ | 4030/10000 [14:43:24<20:41:00, 12.47s/it] {'loss': 1.4699, 'learning_rate': 2.991e-05, 'epoch': 1.52} 40%|████ | 4030/10000 [14:43:24<20:41:00, 12.47s/it] 40%|████ | 4031/10000 [14:43:37<20:54:14, 12.61s/it] {'loss': 0.2761, 'learning_rate': 2.9905e-05, 'epoch': 1.52} 40%|████ | 4031/10000 [14:43:37<20:54:14, 12.61s/it] 40%|████ | 4032/10000 [14:43:50<21:01:21, 12.68s/it] {'loss': 0.1187, 'learning_rate': 2.9900000000000002e-05, 'epoch': 1.52} 40%|████ | 4032/10000 [14:43:50<21:01:21, 12.68s/it] 40%|████ | 4033/10000 [14:44:03<21:07:34, 12.75s/it] {'loss': 0.0905, 'learning_rate': 2.9895e-05, 'epoch': 1.52} 40%|████ | 4033/10000 [14:44:03<21:07:34, 12.75s/it] 40%|████ | 4034/10000 [14:44:16<21:17:04, 12.84s/it] {'loss': 0.0542, 'learning_rate': 2.989e-05, 'epoch': 1.52} 40%|████ | 4034/10000 [14:44:16<21:17:04, 12.84s/it] 40%|████ | 4035/10000 [14:44:28<21:16:21, 12.84s/it] {'loss': 0.0701, 'learning_rate': 2.9885000000000003e-05, 'epoch': 1.52} 40%|████ | 4035/10000 [14:44:28<21:16:21, 12.84s/it] 40%|████ | 4036/10000 [14:44:41<21:21:11, 12.89s/it] {'loss': 0.0227, 'learning_rate': 2.9880000000000002e-05, 'epoch': 1.52} 40%|████ | 4036/10000 [14:44:41<21:21:11, 12.89s/it] 40%|████ | 4037/10000 [14:44:54<21:20:23, 12.88s/it] {'loss': 0.0771, 'learning_rate': 2.9875000000000004e-05, 'epoch': 1.52} 40%|████ | 4037/10000 [14:44:54<21:20:23, 12.88s/it] 40%|████ | 4038/10000 [14:45:07<21:20:40, 12.89s/it] {'loss': 0.0518, 'learning_rate': 2.9870000000000004e-05, 'epoch': 1.52} 40%|████ | 4038/10000 [14:45:07<21:20:40, 12.89s/it] 40%|████ | 4039/10000 [14:45:20<21:21:52, 12.90s/it] {'loss': 0.0305, 'learning_rate': 2.9865000000000003e-05, 'epoch': 1.52} 40%|████ | 4039/10000 [14:45:20<21:21:52, 12.90s/it] 40%|████ | 4040/10000 
[14:45:33<21:21:50, 12.90s/it] {'loss': 0.0912, 'learning_rate': 2.986e-05, 'epoch': 1.52} 40%|████ | 4040/10000 [14:45:33<21:21:50, 12.90s/it] 40%|████ | 4041/10000 [14:45:46<21:22:32, 12.91s/it] {'loss': 0.4602, 'learning_rate': 2.9855e-05, 'epoch': 1.52} 40%|████ | 4041/10000 [14:45:46<21:22:32, 12.91s/it] 40%|████ | 4042/10000 [14:45:59<21:20:13, 12.89s/it] {'loss': 0.0543, 'learning_rate': 2.985e-05, 'epoch': 1.52} 40%|████ | 4042/10000 [14:45:59<21:20:13, 12.89s/it] 40%|████ | 4043/10000 [14:46:12<21:20:15, 12.89s/it] {'loss': 0.0152, 'learning_rate': 2.9845e-05, 'epoch': 1.52} 40%|████ | 4043/10000 [14:46:12<21:20:15, 12.89s/it] 40%|████ | 4044/10000 [14:46:25<21:21:08, 12.91s/it] {'loss': 0.0216, 'learning_rate': 2.9840000000000002e-05, 'epoch': 1.52} 40%|████ | 4044/10000 [14:46:25<21:21:08, 12.91s/it] 40%|████ | 4045/10000 [14:46:38<21:22:29, 12.92s/it] {'loss': 0.0143, 'learning_rate': 2.9835e-05, 'epoch': 1.52} 40%|████ | 4045/10000 [14:46:38<21:22:29, 12.92s/it] 40%|████ | 4046/10000 [14:46:51<21:21:41, 12.92s/it] {'loss': 0.0112, 'learning_rate': 2.9830000000000004e-05, 'epoch': 1.52} 40%|████ | 4046/10000 [14:46:51<21:21:41, 12.92s/it] 40%|████ | 4047/10000 [14:47:03<21:21:42, 12.92s/it] {'loss': 0.0228, 'learning_rate': 2.9825000000000003e-05, 'epoch': 1.52} 40%|████ | 4047/10000 [14:47:03<21:21:42, 12.92s/it] 40%|████ | 4048/10000 [14:47:16<21:22:00, 12.92s/it] {'loss': 0.0094, 'learning_rate': 2.9820000000000002e-05, 'epoch': 1.53} 40%|████ | 4048/10000 [14:47:16<21:22:00, 12.92s/it] 40%|████ | 4049/10000 [14:47:29<21:20:05, 12.91s/it] {'loss': 0.0125, 'learning_rate': 2.9815000000000005e-05, 'epoch': 1.53} 40%|████ | 4049/10000 [14:47:29<21:20:05, 12.91s/it] 40%|████ | 4050/10000 [14:47:42<21:20:27, 12.91s/it] {'loss': 0.008, 'learning_rate': 2.9809999999999997e-05, 'epoch': 1.53} 40%|████ | 4050/10000 [14:47:42<21:20:27, 12.91s/it] 41%|████ | 4051/10000 [14:47:55<21:19:09, 12.90s/it] {'loss': 0.0073, 'learning_rate': 2.9805e-05, 'epoch': 1.53} 
41%|████ | 4051/10000 [14:47:55<21:19:09, 12.90s/it] 41%|████ | 4052/10000 [14:48:08<21:20:41, 12.92s/it] {'loss': 0.0065, 'learning_rate': 2.98e-05, 'epoch': 1.53} 41%|████ | 4052/10000 [14:48:08<21:20:41, 12.92s/it] 41%|████ | 4053/10000 [14:48:21<21:16:12, 12.88s/it] {'loss': 0.0085, 'learning_rate': 2.9795000000000002e-05, 'epoch': 1.53} 41%|████ | 4053/10000 [14:48:21<21:16:12, 12.88s/it] 41%|████ | 4054/10000 [14:48:34<21:17:05, 12.89s/it] {'loss': 0.0083, 'learning_rate': 2.979e-05, 'epoch': 1.53} 41%|████ | 4054/10000 [14:48:34<21:17:05, 12.89s/it] 41%|████ | 4055/10000 [14:48:46<21:14:06, 12.86s/it] {'loss': 0.0086, 'learning_rate': 2.9785e-05, 'epoch': 1.53} 41%|████ | 4055/10000 [14:48:47<21:14:06, 12.86s/it] 41%|████ | 4056/10000 [14:48:59<21:16:08, 12.88s/it] {'loss': 0.0236, 'learning_rate': 2.9780000000000003e-05, 'epoch': 1.53} 41%|████ | 4056/10000 [14:48:59<21:16:08, 12.88s/it] 41%|████ | 4057/10000 [14:49:12<21:17:13, 12.89s/it] {'loss': 0.0053, 'learning_rate': 2.9775000000000002e-05, 'epoch': 1.53} 41%|████ | 4057/10000 [14:49:12<21:17:13, 12.89s/it] 41%|████ | 4058/10000 [14:49:25<21:15:48, 12.88s/it] {'loss': 0.0056, 'learning_rate': 2.9770000000000005e-05, 'epoch': 1.53} 41%|████ | 4058/10000 [14:49:25<21:15:48, 12.88s/it] 41%|████ | 4059/10000 [14:49:38<21:16:42, 12.89s/it] {'loss': 0.0076, 'learning_rate': 2.9765000000000004e-05, 'epoch': 1.53} 41%|████ | 4059/10000 [14:49:38<21:16:42, 12.89s/it] 41%|████ | 4060/10000 [14:49:51<21:15:35, 12.88s/it] {'loss': 0.0082, 'learning_rate': 2.976e-05, 'epoch': 1.53} 41%|████ | 4060/10000 [14:49:51<21:15:35, 12.88s/it] 41%|████ | 4061/10000 [14:50:04<21:13:49, 12.87s/it] {'loss': 0.0081, 'learning_rate': 2.9755e-05, 'epoch': 1.53} 41%|████ | 4061/10000 [14:50:04<21:13:49, 12.87s/it] 41%|████ | 4062/10000 [14:50:17<21:12:34, 12.86s/it] {'loss': 0.0063, 'learning_rate': 2.975e-05, 'epoch': 1.53} 41%|████ | 4062/10000 [14:50:17<21:12:34, 12.86s/it] 41%|████ | 4063/10000 [14:50:30<21:13:37, 12.87s/it] 
{'loss': 0.0052, 'learning_rate': 2.9745e-05, 'epoch': 1.53} 41%|████ | 4063/10000 [14:50:30<21:13:37, 12.87s/it] 41%|████ | 4064/10000 [14:50:42<21:15:06, 12.89s/it] {'loss': 0.0065, 'learning_rate': 2.974e-05, 'epoch': 1.53} 41%|████ | 4064/10000 [14:50:43<21:15:06, 12.89s/it] 41%|████ | 4065/10000 [14:50:55<21:16:18, 12.90s/it] {'loss': 0.0046, 'learning_rate': 2.9735000000000002e-05, 'epoch': 1.53} 41%|████ | 4065/10000 [14:50:55<21:16:18, 12.90s/it] 41%|████ | 4066/10000 [14:51:08<21:14:37, 12.89s/it] {'loss': 0.0069, 'learning_rate': 2.973e-05, 'epoch': 1.53} 41%|████ | 4066/10000 [14:51:08<21:14:37, 12.89s/it] 41%|████ | 4067/10000 [14:51:21<21:14:46, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.9725000000000004e-05, 'epoch': 1.53} 41%|████ | 4067/10000 [14:51:21<21:14:46, 12.89s/it] 41%|████ | 4068/10000 [14:51:34<21:15:48, 12.90s/it] {'loss': 0.0054, 'learning_rate': 2.9720000000000003e-05, 'epoch': 1.53} 41%|████ | 4068/10000 [14:51:34<21:15:48, 12.90s/it] 41%|████ | 4069/10000 [14:51:47<21:15:02, 12.90s/it] {'loss': 0.0071, 'learning_rate': 2.9715000000000003e-05, 'epoch': 1.53} 41%|████ | 4069/10000 [14:51:47<21:15:02, 12.90s/it] 41%|████ | 4070/10000 [14:52:00<21:13:58, 12.89s/it] {'loss': 0.0052, 'learning_rate': 2.971e-05, 'epoch': 1.53} 41%|████ | 4070/10000 [14:52:00<21:13:58, 12.89s/it] 41%|████ | 4071/10000 [14:52:13<21:12:16, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.9705e-05, 'epoch': 1.53} 41%|████ | 4071/10000 [14:52:13<21:12:16, 12.88s/it] 41%|████ | 4072/10000 [14:52:26<21:10:41, 12.86s/it] {'loss': 0.009, 'learning_rate': 2.97e-05, 'epoch': 1.53} 41%|████ | 4072/10000 [14:52:26<21:10:41, 12.86s/it] 41%|████ | 4073/10000 [14:52:38<21:11:36, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.9695e-05, 'epoch': 1.53} 41%|████ | 4073/10000 [14:52:38<21:11:36, 12.87s/it] 41%|████ | 4074/10000 [14:52:51<21:14:41, 12.91s/it] {'loss': 0.0049, 'learning_rate': 2.9690000000000002e-05, 'epoch': 1.54} 41%|████ | 4074/10000 [14:52:51<21:14:41, 
12.91s/it] 41%|████ | 4075/10000 [14:53:04<21:16:27, 12.93s/it] {'loss': 0.007, 'learning_rate': 2.9685e-05, 'epoch': 1.54} 41%|████ | 4075/10000 [14:53:04<21:16:27, 12.93s/it] 41%|████ | 4076/10000 [14:53:17<21:16:34, 12.93s/it] {'loss': 0.0058, 'learning_rate': 2.9680000000000004e-05, 'epoch': 1.54} 41%|████ | 4076/10000 [14:53:17<21:16:34, 12.93s/it] 41%|████ | 4077/10000 [14:53:30<21:15:53, 12.92s/it] {'loss': 0.0047, 'learning_rate': 2.9675000000000003e-05, 'epoch': 1.54} 41%|████ | 4077/10000 [14:53:30<21:15:53, 12.92s/it] 41%|████ | 4078/10000 [14:53:43<21:15:26, 12.92s/it] {'loss': 0.0062, 'learning_rate': 2.9670000000000002e-05, 'epoch': 1.54} 41%|████ | 4078/10000 [14:53:43<21:15:26, 12.92s/it] 41%|████ | 4079/10000 [14:53:56<21:16:11, 12.93s/it] {'loss': 0.0224, 'learning_rate': 2.9665000000000005e-05, 'epoch': 1.54} 41%|████ | 4079/10000 [14:53:56<21:16:11, 12.93s/it] 41%|████ | 4080/10000 [14:54:09<21:17:23, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.9659999999999997e-05, 'epoch': 1.54} 41%|████ | 4080/10000 [14:54:09<21:17:23, 12.95s/it] 41%|████ | 4081/10000 [14:54:22<21:18:19, 12.96s/it] {'loss': 0.0053, 'learning_rate': 2.9655e-05, 'epoch': 1.54} 41%|████ | 4081/10000 [14:54:22<21:18:19, 12.96s/it] 41%|████ | 4082/10000 [14:54:35<21:16:20, 12.94s/it] {'loss': 0.0052, 'learning_rate': 2.965e-05, 'epoch': 1.54} 41%|████ | 4082/10000 [14:54:35<21:16:20, 12.94s/it] 41%|████ | 4083/10000 [14:54:48<21:16:09, 12.94s/it] {'loss': 0.0047, 'learning_rate': 2.9645e-05, 'epoch': 1.54} 41%|████ | 4083/10000 [14:54:48<21:16:09, 12.94s/it] 41%|████ | 4084/10000 [14:55:01<21:16:35, 12.95s/it] {'loss': 0.0062, 'learning_rate': 2.964e-05, 'epoch': 1.54} 41%|████ | 4084/10000 [14:55:01<21:16:35, 12.95s/it] 41%|████ | 4085/10000 [14:55:14<21:15:41, 12.94s/it] {'loss': 0.0057, 'learning_rate': 2.9635e-05, 'epoch': 1.54} 41%|████ | 4085/10000 [14:55:14<21:15:41, 12.94s/it] 41%|████ | 4086/10000 [14:55:27<21:15:04, 12.94s/it] {'loss': 0.0046, 'learning_rate': 
2.9630000000000003e-05, 'epoch': 1.54} 41%|████ | 4086/10000 [14:55:27<21:15:04, 12.94s/it] 41%|████ | 4087/10000 [14:55:40<21:14:46, 12.94s/it] {'loss': 0.021, 'learning_rate': 2.9625000000000002e-05, 'epoch': 1.54} 41%|████ | 4087/10000 [14:55:40<21:14:46, 12.94s/it] 41%|████ | 4088/10000 [14:55:53<21:16:05, 12.95s/it] {'loss': 0.0216, 'learning_rate': 2.9620000000000004e-05, 'epoch': 1.54} 41%|████ | 4088/10000 [14:55:53<21:16:05, 12.95s/it] 41%|████ | 4089/10000 [14:56:06<21:17:06, 12.96s/it] {'loss': 0.0039, 'learning_rate': 2.9615000000000004e-05, 'epoch': 1.54} 41%|████ | 4089/10000 [14:56:06<21:17:06, 12.96s/it] 41%|████ | 4090/10000 [14:56:19<21:16:05, 12.96s/it] {'loss': 0.0092, 'learning_rate': 2.961e-05, 'epoch': 1.54} 41%|████ | 4090/10000 [14:56:19<21:16:05, 12.96s/it] 41%|████ | 4091/10000 [14:56:32<21:18:13, 12.98s/it] {'loss': 0.0111, 'learning_rate': 2.9605e-05, 'epoch': 1.54} 41%|████ | 4091/10000 [14:56:32<21:18:13, 12.98s/it] 41%|████ | 4092/10000 [14:56:45<21:18:22, 12.98s/it] {'loss': 0.0108, 'learning_rate': 2.96e-05, 'epoch': 1.54} 41%|████ | 4092/10000 [14:56:45<21:18:22, 12.98s/it] 41%|████ | 4093/10000 [14:56:58<21:16:07, 12.96s/it] {'loss': 0.0145, 'learning_rate': 2.9595e-05, 'epoch': 1.54} 41%|████ | 4093/10000 [14:56:58<21:16:07, 12.96s/it] 41%|████ | 4094/10000 [14:57:10<21:16:15, 12.97s/it] {'loss': 0.0076, 'learning_rate': 2.959e-05, 'epoch': 1.54} 41%|████ | 4094/10000 [14:57:11<21:16:15, 12.97s/it] 41%|████ | 4095/10000 [14:57:23<21:17:10, 12.98s/it] {'loss': 0.0048, 'learning_rate': 2.9585000000000002e-05, 'epoch': 1.54} 41%|████ | 4095/10000 [14:57:24<21:17:10, 12.98s/it] 41%|████ | 4096/10000 [14:57:36<21:17:20, 12.98s/it] {'loss': 0.0097, 'learning_rate': 2.958e-05, 'epoch': 1.54} 41%|████ | 4096/10000 [14:57:37<21:17:20, 12.98s/it] 41%|████ | 4097/10000 [14:57:49<21:14:10, 12.95s/it] {'loss': 0.0064, 'learning_rate': 2.9575000000000004e-05, 'epoch': 1.54} 41%|████ | 4097/10000 [14:57:49<21:14:10, 12.95s/it] 41%|████ | 
4098/10000 [14:58:02<21:14:14, 12.95s/it] {'loss': 0.0076, 'learning_rate': 2.9570000000000003e-05, 'epoch': 1.54} 41%|████ | 4098/10000 [14:58:02<21:14:14, 12.95s/it] 41%|████ | 4099/10000 [14:58:15<21:14:47, 12.96s/it] {'loss': 0.0088, 'learning_rate': 2.9565000000000002e-05, 'epoch': 1.54} 41%|████ | 4099/10000 [14:58:15<21:14:47, 12.96s/it] 41%|████ | 4100/10000 [14:58:28<21:14:13, 12.96s/it] {'loss': 0.0083, 'learning_rate': 2.9559999999999998e-05, 'epoch': 1.54} 41%|████ | 4100/10000 [14:58:28<21:14:13, 12.96s/it] 41%|████ | 4101/10000 [14:58:41<21:12:45, 12.95s/it] {'loss': 0.0062, 'learning_rate': 2.9555e-05, 'epoch': 1.55} 41%|████ | 4101/10000 [14:58:41<21:12:45, 12.95s/it] 41%|████ | 4102/10000 [14:58:54<21:12:15, 12.94s/it] {'loss': 0.005, 'learning_rate': 2.955e-05, 'epoch': 1.55} 41%|████ | 4102/10000 [14:58:54<21:12:15, 12.94s/it] 41%|████ | 4103/10000 [14:59:07<21:11:17, 12.93s/it] {'loss': 0.007, 'learning_rate': 2.9545e-05, 'epoch': 1.55} 41%|████ | 4103/10000 [14:59:07<21:11:17, 12.93s/it] 41%|████ | 4104/10000 [14:59:20<21:11:51, 12.94s/it] {'loss': 0.0048, 'learning_rate': 2.9540000000000002e-05, 'epoch': 1.55} 41%|████ | 4104/10000 [14:59:20<21:11:51, 12.94s/it] 41%|████ | 4105/10000 [14:59:33<21:10:25, 12.93s/it] {'loss': 0.0048, 'learning_rate': 2.9535e-05, 'epoch': 1.55} 41%|████ | 4105/10000 [14:59:33<21:10:25, 12.93s/it] 41%|████ | 4106/10000 [14:59:46<21:11:56, 12.95s/it] {'loss': 0.0051, 'learning_rate': 2.9530000000000004e-05, 'epoch': 1.55} 41%|████ | 4106/10000 [14:59:46<21:11:56, 12.95s/it] 41%|████ | 4107/10000 [14:59:59<21:12:04, 12.95s/it] {'loss': 0.0051, 'learning_rate': 2.9525000000000003e-05, 'epoch': 1.55} 41%|████ | 4107/10000 [14:59:59<21:12:04, 12.95s/it] 41%|████ | 4108/10000 [15:00:12<21:11:49, 12.95s/it] {'loss': 0.005, 'learning_rate': 2.9520000000000002e-05, 'epoch': 1.55} 41%|████ | 4108/10000 [15:00:12<21:11:49, 12.95s/it] 41%|████ | 4109/10000 [15:00:25<21:11:51, 12.95s/it] {'loss': 0.0078, 'learning_rate': 
2.9515000000000005e-05, 'epoch': 1.55} 41%|████ | 4109/10000 [15:00:25<21:11:51, 12.95s/it] 41%|████ | 4110/10000 [15:00:38<21:09:59, 12.94s/it] {'loss': 0.0078, 'learning_rate': 2.951e-05, 'epoch': 1.55} 41%|████ | 4110/10000 [15:00:38<21:09:59, 12.94s/it] 41%|████ | 4111/10000 [15:00:51<21:07:38, 12.92s/it] {'loss': 0.005, 'learning_rate': 2.9505e-05, 'epoch': 1.55} 41%|████ | 4111/10000 [15:00:51<21:07:38, 12.92s/it] 41%|████ | 4112/10000 [15:01:03<21:08:26, 12.93s/it] {'loss': 0.0063, 'learning_rate': 2.95e-05, 'epoch': 1.55} 41%|████ | 4112/10000 [15:01:03<21:08:26, 12.93s/it] 41%|████ | 4113/10000 [15:01:16<21:06:46, 12.91s/it] {'loss': 0.0093, 'learning_rate': 2.9495e-05, 'epoch': 1.55} 41%|████ | 4113/10000 [15:01:16<21:06:46, 12.91s/it] 41%|████ | 4114/10000 [15:01:29<21:06:30, 12.91s/it] {'loss': 0.0059, 'learning_rate': 2.949e-05, 'epoch': 1.55} 41%|████ | 4114/10000 [15:01:29<21:06:30, 12.91s/it] 41%|████ | 4115/10000 [15:01:42<21:07:04, 12.92s/it] {'loss': 0.0048, 'learning_rate': 2.9485000000000003e-05, 'epoch': 1.55} 41%|████ | 4115/10000 [15:01:42<21:07:04, 12.92s/it] 41%|████ | 4116/10000 [15:01:55<21:08:00, 12.93s/it] {'loss': 0.0052, 'learning_rate': 2.9480000000000002e-05, 'epoch': 1.55} 41%|████ | 4116/10000 [15:01:55<21:08:00, 12.93s/it] 41%|████ | 4117/10000 [15:02:08<21:08:50, 12.94s/it] {'loss': 0.005, 'learning_rate': 2.9475e-05, 'epoch': 1.55} 41%|████ | 4117/10000 [15:02:08<21:08:50, 12.94s/it] 41%|████ | 4118/10000 [15:02:21<21:11:19, 12.97s/it] {'loss': 0.0064, 'learning_rate': 2.9470000000000004e-05, 'epoch': 1.55} 41%|████ | 4118/10000 [15:02:21<21:11:19, 12.97s/it] 41%|████ | 4119/10000 [15:02:34<21:13:02, 12.99s/it] {'loss': 0.0054, 'learning_rate': 2.9465000000000003e-05, 'epoch': 1.55} 41%|████ | 4119/10000 [15:02:34<21:13:02, 12.99s/it] 41%|████ | 4120/10000 [15:02:47<21:11:10, 12.97s/it] {'loss': 0.0047, 'learning_rate': 2.946e-05, 'epoch': 1.55} 41%|████ | 4120/10000 [15:02:47<21:11:10, 12.97s/it] 41%|████ | 4121/10000 
[15:03:00<21:10:42, 12.97s/it] {'loss': 0.005, 'learning_rate': 2.9455e-05, 'epoch': 1.55} 41%|████ | 4121/10000 [15:03:00<21:10:42, 12.97s/it] 41%|████ | 4122/10000 [15:03:13<21:10:09, 12.97s/it] {'loss': 0.006, 'learning_rate': 2.945e-05, 'epoch': 1.55} 41%|████ | 4122/10000 [15:03:13<21:10:09, 12.97s/it] 41%|████ | 4123/10000 [15:03:26<21:10:16, 12.97s/it] {'loss': 0.005, 'learning_rate': 2.9445e-05, 'epoch': 1.55} 41%|████ | 4123/10000 [15:03:26<21:10:16, 12.97s/it] 41%|████ | 4124/10000 [15:03:39<21:10:44, 12.98s/it] {'loss': 0.0051, 'learning_rate': 2.944e-05, 'epoch': 1.55} 41%|████ | 4124/10000 [15:03:39<21:10:44, 12.98s/it] 41%|████▏ | 4125/10000 [15:03:52<21:11:21, 12.98s/it] {'loss': 0.0047, 'learning_rate': 2.9435000000000002e-05, 'epoch': 1.55} 41%|████▏ | 4125/10000 [15:03:52<21:11:21, 12.98s/it] 41%|████▏ | 4126/10000 [15:04:05<21:08:53, 12.96s/it] {'loss': 0.0054, 'learning_rate': 2.943e-05, 'epoch': 1.55} 41%|████▏ | 4126/10000 [15:04:05<21:08:53, 12.96s/it] 41%|████▏ | 4127/10000 [15:04:18<21:10:06, 12.98s/it] {'loss': 0.0048, 'learning_rate': 2.9425000000000004e-05, 'epoch': 1.56} 41%|████▏ | 4127/10000 [15:04:18<21:10:06, 12.98s/it] 41%|████▏ | 4128/10000 [15:04:31<21:10:16, 12.98s/it] {'loss': 0.0049, 'learning_rate': 2.9420000000000003e-05, 'epoch': 1.56} 41%|████▏ | 4128/10000 [15:04:31<21:10:16, 12.98s/it] 41%|████▏ | 4129/10000 [15:04:44<21:07:30, 12.95s/it] {'loss': 0.0058, 'learning_rate': 2.9415000000000002e-05, 'epoch': 1.56} 41%|████▏ | 4129/10000 [15:04:44<21:07:30, 12.95s/it] 41%|████▏ | 4130/10000 [15:04:57<21:08:19, 12.96s/it] {'loss': 0.005, 'learning_rate': 2.9409999999999998e-05, 'epoch': 1.56} 41%|████▏ | 4130/10000 [15:04:57<21:08:19, 12.96s/it] 41%|████▏ | 4131/10000 [15:05:10<21:07:38, 12.96s/it] {'loss': 0.0052, 'learning_rate': 2.9405e-05, 'epoch': 1.56} 41%|████▏ | 4131/10000 [15:05:10<21:07:38, 12.96s/it] 41%|████▏ | 4132/10000 [15:05:23<21:09:26, 12.98s/it] {'loss': 0.0057, 'learning_rate': 2.94e-05, 'epoch': 1.56} 
41%|████▏ | 4132/10000 [15:05:23<21:09:26, 12.98s/it] 41%|████▏ | 4133/10000 [15:05:36<21:06:31, 12.95s/it] {'loss': 0.0056, 'learning_rate': 2.9395e-05, 'epoch': 1.56} 41%|████▏ | 4133/10000 [15:05:36<21:06:31, 12.95s/it] 41%|████▏ | 4134/10000 [15:05:49<21:06:32, 12.95s/it] {'loss': 0.0055, 'learning_rate': 2.939e-05, 'epoch': 1.56} 41%|████▏ | 4134/10000 [15:05:49<21:06:32, 12.95s/it] 41%|████▏ | 4135/10000 [15:06:01<21:03:00, 12.92s/it] {'loss': 0.0052, 'learning_rate': 2.9385e-05, 'epoch': 1.56} 41%|████▏ | 4135/10000 [15:06:01<21:03:00, 12.92s/it] 41%|████▏ | 4136/10000 [15:06:14<21:00:57, 12.90s/it] {'loss': 0.0049, 'learning_rate': 2.9380000000000003e-05, 'epoch': 1.56} 41%|████▏ | 4136/10000 [15:06:14<21:00:57, 12.90s/it] 41%|████▏ | 4137/10000 [15:06:27<20:59:23, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.9375000000000003e-05, 'epoch': 1.56} 41%|████▏ | 4137/10000 [15:06:27<20:59:23, 12.89s/it] 41%|████▏ | 4138/10000 [15:06:40<20:57:24, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.9370000000000002e-05, 'epoch': 1.56} 41%|████▏ | 4138/10000 [15:06:40<20:57:24, 12.87s/it] 41%|████▏ | 4139/10000 [15:06:53<20:58:14, 12.88s/it] {'loss': 0.0075, 'learning_rate': 2.9365000000000004e-05, 'epoch': 1.56} 41%|████▏ | 4139/10000 [15:06:53<20:58:14, 12.88s/it] 41%|████▏ | 4140/10000 [15:07:06<21:00:59, 12.91s/it] {'loss': 0.0051, 'learning_rate': 2.9360000000000003e-05, 'epoch': 1.56} 41%|████▏ | 4140/10000 [15:07:06<21:00:59, 12.91s/it] 41%|████▏ | 4141/10000 [15:07:19<21:00:06, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.9355e-05, 'epoch': 1.56} 41%|████▏ | 4141/10000 [15:07:19<21:00:06, 12.90s/it] 41%|████▏ | 4142/10000 [15:07:32<20:58:21, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.935e-05, 'epoch': 1.56} 41%|████▏ | 4142/10000 [15:07:32<20:58:21, 12.89s/it] 41%|████▏ | 4143/10000 [15:07:44<20:56:23, 12.87s/it] {'loss': 0.0099, 'learning_rate': 2.9345e-05, 'epoch': 1.56} 41%|████▏ | 4143/10000 [15:07:44<20:56:23, 12.87s/it] 41%|████▏ | 4144/10000 
[15:07:57<20:58:11, 12.89s/it] {'loss': 0.006, 'learning_rate': 2.934e-05, 'epoch': 1.56} 41%|████▏ | 4144/10000 [15:07:57<20:58:11, 12.89s/it] 41%|████▏ | 4145/10000 [15:08:10<20:57:08, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.9335000000000003e-05, 'epoch': 1.56} 41%|████▏ | 4145/10000 [15:08:10<20:57:08, 12.88s/it] 41%|████▏ | 4146/10000 [15:08:23<20:56:07, 12.87s/it] {'loss': 0.0062, 'learning_rate': 2.9330000000000002e-05, 'epoch': 1.56} 41%|████▏ | 4146/10000 [15:08:23<20:56:07, 12.87s/it] 41%|████▏ | 4147/10000 [15:08:36<20:55:46, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.9325e-05, 'epoch': 1.56} 41%|████▏ | 4147/10000 [15:08:36<20:55:46, 12.87s/it] 41%|████▏ | 4148/10000 [15:08:49<20:58:04, 12.90s/it] {'loss': 0.0045, 'learning_rate': 2.9320000000000004e-05, 'epoch': 1.56} 41%|████▏ | 4148/10000 [15:08:49<20:58:04, 12.90s/it] 41%|████▏ | 4149/10000 [15:09:02<20:57:16, 12.89s/it] {'loss': 0.0073, 'learning_rate': 2.9315000000000003e-05, 'epoch': 1.56} 41%|████▏ | 4149/10000 [15:09:02<20:57:16, 12.89s/it] 42%|████▏ | 4150/10000 [15:09:15<20:56:30, 12.89s/it] {'loss': 0.006, 'learning_rate': 2.9310000000000006e-05, 'epoch': 1.56} 42%|████▏ | 4150/10000 [15:09:15<20:56:30, 12.89s/it] 42%|████▏ | 4151/10000 [15:09:28<20:58:00, 12.90s/it] {'loss': 0.0051, 'learning_rate': 2.9304999999999998e-05, 'epoch': 1.56} 42%|████▏ | 4151/10000 [15:09:28<20:58:00, 12.90s/it] 42%|████▏ | 4152/10000 [15:09:41<20:57:22, 12.90s/it] {'loss': 0.0079, 'learning_rate': 2.93e-05, 'epoch': 1.56} 42%|████▏ | 4152/10000 [15:09:41<20:57:22, 12.90s/it] 42%|████▏ | 4153/10000 [15:09:53<20:58:03, 12.91s/it] {'loss': 0.0047, 'learning_rate': 2.9295e-05, 'epoch': 1.56} 42%|████▏ | 4153/10000 [15:09:53<20:58:03, 12.91s/it] 42%|████▏ | 4154/10000 [15:10:06<20:58:22, 12.92s/it] {'loss': 0.0052, 'learning_rate': 2.929e-05, 'epoch': 1.57} 42%|████▏ | 4154/10000 [15:10:06<20:58:22, 12.92s/it] 42%|████▏ | 4155/10000 [15:10:19<20:58:49, 12.92s/it] {'loss': 0.0055, 'learning_rate': 
2.9285e-05, 'epoch': 1.57} 42%|████▏ | 4155/10000 [15:10:19<20:58:49, 12.92s/it] 42%|████▏ | 4156/10000 [15:10:32<20:58:50, 12.92s/it] {'loss': 0.0061, 'learning_rate': 2.928e-05, 'epoch': 1.57} 42%|████▏ | 4156/10000 [15:10:32<20:58:50, 12.92s/it] 42%|████▏ | 4157/10000 [15:10:45<20:59:18, 12.93s/it] {'loss': 0.0047, 'learning_rate': 2.9275000000000003e-05, 'epoch': 1.57} 42%|████▏ | 4157/10000 [15:10:45<20:59:18, 12.93s/it] 42%|████▏ | 4158/10000 [15:10:58<20:57:07, 12.91s/it] {'loss': 0.0051, 'learning_rate': 2.9270000000000003e-05, 'epoch': 1.57} 42%|████▏ | 4158/10000 [15:10:58<20:57:07, 12.91s/it] 42%|████▏ | 4159/10000 [15:11:11<20:54:14, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.9265000000000002e-05, 'epoch': 1.57} 42%|████▏ | 4159/10000 [15:11:11<20:54:14, 12.88s/it] 42%|████▏ | 4160/10000 [15:11:24<20:53:54, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.9260000000000004e-05, 'epoch': 1.57} 42%|████▏ | 4160/10000 [15:11:24<20:53:54, 12.88s/it] 42%|████▏ | 4161/10000 [15:11:37<20:54:01, 12.89s/it] {'loss': 0.006, 'learning_rate': 2.9255e-05, 'epoch': 1.57} 42%|████▏ | 4161/10000 [15:11:37<20:54:01, 12.89s/it] 42%|████▏ | 4162/10000 [15:11:50<20:54:40, 12.89s/it] {'loss': 0.0052, 'learning_rate': 2.925e-05, 'epoch': 1.57} 42%|████▏ | 4162/10000 [15:11:50<20:54:40, 12.89s/it] 42%|████▏ | 4163/10000 [15:12:02<20:55:18, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.9245e-05, 'epoch': 1.57} 42%|████▏ | 4163/10000 [15:12:03<20:55:18, 12.90s/it] 42%|████▏ | 4164/10000 [15:12:15<20:51:35, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.924e-05, 'epoch': 1.57} 42%|████▏ | 4164/10000 [15:12:15<20:51:35, 12.87s/it] 42%|████▏ | 4165/10000 [15:12:28<20:52:26, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.9235e-05, 'epoch': 1.57} 42%|████▏ | 4165/10000 [15:12:28<20:52:26, 12.88s/it] 42%|████▏ | 4166/10000 [15:12:41<20:51:35, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.9230000000000003e-05, 'epoch': 1.57} 42%|████▏ | 4166/10000 [15:12:41<20:51:35, 12.87s/it] 
42%|████▏ | 4167/10000 [15:12:54<20:51:19, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.9225000000000002e-05, 'epoch': 1.57} 42%|████▏ | 4167/10000 [15:12:54<20:51:19, 12.87s/it] 42%|████▏ | 4168/10000 [15:13:07<20:49:50, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.922e-05, 'epoch': 1.57} 42%|████▏ | 4168/10000 [15:13:07<20:49:50, 12.86s/it] 42%|████▏ | 4169/10000 [15:13:20<20:53:20, 12.90s/it] {'loss': 0.0048, 'learning_rate': 2.9215000000000004e-05, 'epoch': 1.57} 42%|████▏ | 4169/10000 [15:13:20<20:53:20, 12.90s/it] 42%|████▏ | 4170/10000 [15:13:33<20:54:55, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.9210000000000003e-05, 'epoch': 1.57} 42%|████▏ | 4170/10000 [15:13:33<20:54:55, 12.92s/it] 42%|████▏ | 4171/10000 [15:13:46<20:55:04, 12.92s/it] {'loss': 0.0053, 'learning_rate': 2.9205e-05, 'epoch': 1.57} 42%|████▏ | 4171/10000 [15:13:46<20:55:04, 12.92s/it] 42%|████▏ | 4172/10000 [15:13:59<20:57:34, 12.95s/it] {'loss': 0.0048, 'learning_rate': 2.9199999999999998e-05, 'epoch': 1.57} 42%|████▏ | 4172/10000 [15:13:59<20:57:34, 12.95s/it] 42%|████▏ | 4173/10000 [15:14:12<20:58:00, 12.95s/it] {'loss': 0.0054, 'learning_rate': 2.9195e-05, 'epoch': 1.57} 42%|████▏ | 4173/10000 [15:14:12<20:58:00, 12.95s/it] 42%|████▏ | 4174/10000 [15:14:25<20:58:19, 12.96s/it] {'loss': 0.0047, 'learning_rate': 2.919e-05, 'epoch': 1.57} 42%|████▏ | 4174/10000 [15:14:25<20:58:19, 12.96s/it] 42%|████▏ | 4175/10000 [15:14:37<20:56:42, 12.94s/it] {'loss': 0.0046, 'learning_rate': 2.9185000000000003e-05, 'epoch': 1.57} 42%|████▏ | 4175/10000 [15:14:38<20:56:42, 12.94s/it] 42%|████▏ | 4176/10000 [15:14:50<20:57:07, 12.95s/it] {'loss': 0.0056, 'learning_rate': 2.9180000000000002e-05, 'epoch': 1.57} 42%|████▏ | 4176/10000 [15:14:50<20:57:07, 12.95s/it] 42%|████▏ | 4177/10000 [15:15:03<20:57:10, 12.95s/it] {'loss': 0.0056, 'learning_rate': 2.9175e-05, 'epoch': 1.57} 42%|████▏ | 4177/10000 [15:15:03<20:57:10, 12.95s/it] 42%|████▏ | 4178/10000 [15:15:16<20:54:57, 12.93s/it] {'loss': 0.0061, 
'learning_rate': 2.9170000000000004e-05, 'epoch': 1.57} 42%|████▏ | 4178/10000 [15:15:16<20:54:57, 12.93s/it] 42%|████▏ | 4179/10000 [15:15:29<20:54:21, 12.93s/it] {'loss': 0.005, 'learning_rate': 2.9165000000000003e-05, 'epoch': 1.57} 42%|████▏ | 4179/10000 [15:15:29<20:54:21, 12.93s/it] 42%|████▏ | 4180/10000 [15:15:42<20:56:12, 12.95s/it] {'loss': 0.0054, 'learning_rate': 2.9160000000000005e-05, 'epoch': 1.57} 42%|████▏ | 4180/10000 [15:15:42<20:56:12, 12.95s/it] 42%|████▏ | 4181/10000 [15:15:55<20:53:14, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.9154999999999998e-05, 'epoch': 1.58} 42%|████▏ | 4181/10000 [15:15:55<20:53:14, 12.92s/it] 42%|████▏ | 4182/10000 [15:16:08<20:55:30, 12.95s/it] {'loss': 0.0061, 'learning_rate': 2.915e-05, 'epoch': 1.58} 42%|████▏ | 4182/10000 [15:16:08<20:55:30, 12.95s/it] 42%|████▏ | 4183/10000 [15:16:21<20:56:41, 12.96s/it] {'loss': 0.0047, 'learning_rate': 2.9145e-05, 'epoch': 1.58} 42%|████▏ | 4183/10000 [15:16:21<20:56:41, 12.96s/it] 42%|████▏ | 4184/10000 [15:16:34<20:54:44, 12.94s/it] {'loss': 0.0057, 'learning_rate': 2.9140000000000002e-05, 'epoch': 1.58} 42%|████▏ | 4184/10000 [15:16:34<20:54:44, 12.94s/it] 42%|████▏ | 4185/10000 [15:16:47<20:54:26, 12.94s/it] {'loss': 0.0043, 'learning_rate': 2.9135e-05, 'epoch': 1.58} 42%|████▏ | 4185/10000 [15:16:47<20:54:26, 12.94s/it] 42%|████▏ | 4186/10000 [15:17:00<20:52:57, 12.93s/it] {'loss': 0.0045, 'learning_rate': 2.913e-05, 'epoch': 1.58} 42%|████▏ | 4186/10000 [15:17:00<20:52:57, 12.93s/it] 42%|████▏ | 4187/10000 [15:17:13<20:52:39, 12.93s/it] {'loss': 0.0062, 'learning_rate': 2.9125000000000003e-05, 'epoch': 1.58} 42%|████▏ | 4187/10000 [15:17:13<20:52:39, 12.93s/it] 42%|████▏ | 4188/10000 [15:17:26<20:52:58, 12.93s/it] {'loss': 0.0044, 'learning_rate': 2.9120000000000002e-05, 'epoch': 1.58} 42%|████▏ | 4188/10000 [15:17:26<20:52:58, 12.93s/it] 42%|████▏ | 4189/10000 [15:17:39<20:52:56, 12.94s/it] {'loss': 0.0077, 'learning_rate': 2.9115000000000005e-05, 'epoch': 1.58} 
42%|████▏ | 4189/10000 [15:17:39<20:52:56, 12.94s/it] 42%|████▏ | 4190/10000 [15:17:52<20:52:48, 12.94s/it] {'loss': 0.0052, 'learning_rate': 2.9110000000000004e-05, 'epoch': 1.58} 42%|████▏ | 4190/10000 [15:17:52<20:52:48, 12.94s/it] 42%|████▏ | 4191/10000 [15:18:05<20:53:55, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.9105e-05, 'epoch': 1.58} 42%|████▏ | 4191/10000 [15:18:05<20:53:55, 12.95s/it] 42%|████▏ | 4192/10000 [15:18:17<20:51:54, 12.93s/it] {'loss': 0.0061, 'learning_rate': 2.91e-05, 'epoch': 1.58} 42%|████▏ | 4192/10000 [15:18:17<20:51:54, 12.93s/it] 42%|████▏ | 4193/10000 [15:18:30<20:52:19, 12.94s/it] {'loss': 0.0055, 'learning_rate': 2.9095e-05, 'epoch': 1.58} 42%|████▏ | 4193/10000 [15:18:30<20:52:19, 12.94s/it] 42%|████▏ | 4194/10000 [15:18:43<20:52:42, 12.95s/it] {'loss': 0.0058, 'learning_rate': 2.909e-05, 'epoch': 1.58} 42%|████▏ | 4194/10000 [15:18:43<20:52:42, 12.95s/it] 42%|████▏ | 4195/10000 [15:18:56<20:54:03, 12.96s/it] {'loss': 0.0044, 'learning_rate': 2.9085e-05, 'epoch': 1.58} 42%|████▏ | 4195/10000 [15:18:56<20:54:03, 12.96s/it] 42%|████▏ | 4196/10000 [15:19:09<20:52:37, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.9080000000000003e-05, 'epoch': 1.58} 42%|████▏ | 4196/10000 [15:19:09<20:52:37, 12.95s/it] 42%|████▏ | 4197/10000 [15:19:22<20:53:55, 12.97s/it] {'loss': 0.0047, 'learning_rate': 2.9075000000000002e-05, 'epoch': 1.58} 42%|████▏ | 4197/10000 [15:19:22<20:53:55, 12.97s/it] 42%|████▏ | 4198/10000 [15:19:35<20:55:04, 12.98s/it] {'loss': 0.0041, 'learning_rate': 2.907e-05, 'epoch': 1.58} 42%|████▏ | 4198/10000 [15:19:35<20:55:04, 12.98s/it] 42%|████▏ | 4199/10000 [15:19:48<20:54:23, 12.97s/it] {'loss': 0.0045, 'learning_rate': 2.9065000000000004e-05, 'epoch': 1.58} 42%|████▏ | 4199/10000 [15:19:48<20:54:23, 12.97s/it] 42%|████▏ | 4200/10000 [15:20:01<20:53:20, 12.97s/it] {'loss': 0.0051, 'learning_rate': 2.9060000000000003e-05, 'epoch': 1.58} 42%|████▏ | 4200/10000 [15:20:01<20:53:20, 12.97s/it] 42%|████▏ | 4201/10000 
[15:20:14<20:53:43, 12.97s/it] {'loss': 0.0066, 'learning_rate': 2.9055e-05, 'epoch': 1.58} 42%|████▏ | 4201/10000 [15:20:14<20:53:43, 12.97s/it] 42%|████▏ | 4202/10000 [15:20:27<20:52:49, 12.96s/it] {'loss': 0.0036, 'learning_rate': 2.9049999999999998e-05, 'epoch': 1.58} 42%|████▏ | 4202/10000 [15:20:27<20:52:49, 12.96s/it] 42%|████▏ | 4203/10000 [15:20:40<20:54:19, 12.98s/it] {'loss': 0.0048, 'learning_rate': 2.9045e-05, 'epoch': 1.58} 42%|████▏ | 4203/10000 [15:20:40<20:54:19, 12.98s/it] 42%|████▏ | 4204/10000 [15:20:53<20:53:41, 12.98s/it] {'loss': 0.0048, 'learning_rate': 2.904e-05, 'epoch': 1.58} 42%|████▏ | 4204/10000 [15:20:53<20:53:41, 12.98s/it] 42%|████▏ | 4205/10000 [15:21:06<20:51:54, 12.96s/it] {'loss': 0.0057, 'learning_rate': 2.9035000000000002e-05, 'epoch': 1.58} 42%|████▏ | 4205/10000 [15:21:06<20:51:54, 12.96s/it] 42%|████▏ | 4206/10000 [15:21:19<20:51:27, 12.96s/it] {'loss': 0.0042, 'learning_rate': 2.903e-05, 'epoch': 1.58} 42%|████▏ | 4206/10000 [15:21:19<20:51:27, 12.96s/it] 42%|████▏ | 4207/10000 [15:21:32<20:51:32, 12.96s/it] {'loss': 0.0057, 'learning_rate': 2.9025e-05, 'epoch': 1.59} 42%|████▏ | 4207/10000 [15:21:32<20:51:32, 12.96s/it] 42%|████▏ | 4208/10000 [15:21:45<20:50:33, 12.95s/it] {'loss': 0.006, 'learning_rate': 2.9020000000000003e-05, 'epoch': 1.59} 42%|████▏ | 4208/10000 [15:21:45<20:50:33, 12.95s/it] 42%|████▏ | 4209/10000 [15:21:58<20:51:03, 12.96s/it] {'loss': 0.0054, 'learning_rate': 2.9015000000000003e-05, 'epoch': 1.59} 42%|████▏ | 4209/10000 [15:21:58<20:51:03, 12.96s/it] 42%|████▏ | 4210/10000 [15:22:11<20:48:02, 12.93s/it] {'loss': 0.0067, 'learning_rate': 2.9010000000000005e-05, 'epoch': 1.59} 42%|████▏ | 4210/10000 [15:22:11<20:48:02, 12.93s/it] 42%|████▏ | 4211/10000 [15:22:24<20:48:35, 12.94s/it] {'loss': 0.0046, 'learning_rate': 2.9004999999999998e-05, 'epoch': 1.59} 42%|████▏ | 4211/10000 [15:22:24<20:48:35, 12.94s/it] 42%|████▏ | 4212/10000 [15:22:37<20:48:17, 12.94s/it] {'loss': 0.0058, 'learning_rate': 
2.9e-05, 'epoch': 1.59} 42%|████▏ | 4212/10000 [15:22:37<20:48:17, 12.94s/it] 42%|████▏ | 4213/10000 [15:22:50<20:47:20, 12.93s/it] {'loss': 0.0041, 'learning_rate': 2.8995e-05, 'epoch': 1.59} 42%|████▏ | 4213/10000 [15:22:50<20:47:20, 12.93s/it] 42%|████▏ | 4214/10000 [15:23:02<20:46:08, 12.92s/it] {'loss': 0.0039, 'learning_rate': 2.8990000000000002e-05, 'epoch': 1.59} 42%|████▏ | 4214/10000 [15:23:03<20:46:08, 12.92s/it] 42%|████▏ | 4215/10000 [15:23:15<20:47:08, 12.93s/it] {'loss': 0.005, 'learning_rate': 2.8985e-05, 'epoch': 1.59} 42%|████▏ | 4215/10000 [15:23:15<20:47:08, 12.93s/it] 42%|████▏ | 4216/10000 [15:23:28<20:46:31, 12.93s/it] {'loss': 0.0053, 'learning_rate': 2.898e-05, 'epoch': 1.59} 42%|████▏ | 4216/10000 [15:23:28<20:46:31, 12.93s/it] 42%|████▏ | 4217/10000 [15:23:41<20:49:21, 12.96s/it] {'loss': 0.0053, 'learning_rate': 2.8975000000000003e-05, 'epoch': 1.59} 42%|████▏ | 4217/10000 [15:23:41<20:49:21, 12.96s/it] 42%|████▏ | 4218/10000 [15:23:54<20:49:02, 12.96s/it] {'loss': 0.0045, 'learning_rate': 2.8970000000000002e-05, 'epoch': 1.59} 42%|████▏ | 4218/10000 [15:23:54<20:49:02, 12.96s/it] 42%|████▏ | 4219/10000 [15:24:07<20:46:14, 12.93s/it] {'loss': 0.0056, 'learning_rate': 2.8965000000000005e-05, 'epoch': 1.59} 42%|████▏ | 4219/10000 [15:24:07<20:46:14, 12.93s/it] 42%|████▏ | 4220/10000 [15:24:20<20:47:39, 12.95s/it] {'loss': 0.0063, 'learning_rate': 2.8960000000000004e-05, 'epoch': 1.59} 42%|████▏ | 4220/10000 [15:24:20<20:47:39, 12.95s/it] 42%|████▏ | 4221/10000 [15:24:33<20:48:55, 12.97s/it] {'loss': 0.0048, 'learning_rate': 2.8955e-05, 'epoch': 1.59} 42%|████▏ | 4221/10000 [15:24:33<20:48:55, 12.97s/it] 42%|████▏ | 4222/10000 [15:24:46<20:49:33, 12.98s/it] {'loss': 0.0059, 'learning_rate': 2.895e-05, 'epoch': 1.59} 42%|████▏ | 4222/10000 [15:24:46<20:49:33, 12.98s/it] 42%|████▏ | 4223/10000 [15:24:59<20:49:24, 12.98s/it] {'loss': 0.0048, 'learning_rate': 2.8945e-05, 'epoch': 1.59} 42%|████▏ | 4223/10000 [15:24:59<20:49:24, 12.98s/it] 
42%|████▏ | 4224/10000 [15:25:12<20:49:24, 12.98s/it] {'loss': 0.0046, 'learning_rate': 2.894e-05, 'epoch': 1.59} 42%|████▏ | 4224/10000 [15:25:12<20:49:24, 12.98s/it] 42%|████▏ | 4225/10000 [15:25:25<20:47:36, 12.96s/it] {'loss': 0.0067, 'learning_rate': 2.8935e-05, 'epoch': 1.59} 42%|████▏ | 4225/10000 [15:25:25<20:47:36, 12.96s/it] 42%|████▏ | 4226/10000 [15:25:38<20:45:36, 12.94s/it] {'loss': 0.0054, 'learning_rate': 2.8930000000000003e-05, 'epoch': 1.59} 42%|████▏ | 4226/10000 [15:25:38<20:45:36, 12.94s/it] 42%|████▏ | 4227/10000 [15:25:51<20:44:56, 12.94s/it] {'loss': 0.0059, 'learning_rate': 2.8925000000000002e-05, 'epoch': 1.59} 42%|████▏ | 4227/10000 [15:25:51<20:44:56, 12.94s/it] 42%|████▏ | 4228/10000 [15:26:04<20:44:10, 12.93s/it] {'loss': 0.006, 'learning_rate': 2.8920000000000004e-05, 'epoch': 1.59} 42%|████▏ | 4228/10000 [15:26:04<20:44:10, 12.93s/it] 42%|████▏ | 4229/10000 [15:26:17<20:43:24, 12.93s/it] {'loss': 0.006, 'learning_rate': 2.8915000000000004e-05, 'epoch': 1.59} 42%|████▏ | 4229/10000 [15:26:17<20:43:24, 12.93s/it] 42%|████▏ | 4230/10000 [15:26:30<20:42:54, 12.92s/it] {'loss': 0.0057, 'learning_rate': 2.8910000000000003e-05, 'epoch': 1.59} 42%|████▏ | 4230/10000 [15:26:30<20:42:54, 12.92s/it] 42%|████▏ | 4231/10000 [15:26:43<20:44:11, 12.94s/it] {'loss': 0.005, 'learning_rate': 2.8905e-05, 'epoch': 1.59} 42%|████▏ | 4231/10000 [15:26:43<20:44:11, 12.94s/it] 42%|████▏ | 4232/10000 [15:26:55<20:40:51, 12.91s/it] {'loss': 0.0064, 'learning_rate': 2.8899999999999998e-05, 'epoch': 1.59} 42%|████▏ | 4232/10000 [15:26:56<20:40:51, 12.91s/it] 42%|████▏ | 4233/10000 [15:27:08<20:42:43, 12.93s/it] {'loss': 0.0042, 'learning_rate': 2.8895e-05, 'epoch': 1.59} 42%|████▏ | 4233/10000 [15:27:08<20:42:43, 12.93s/it] 42%|████▏ | 4234/10000 [15:27:21<20:44:47, 12.95s/it] {'loss': 0.0047, 'learning_rate': 2.889e-05, 'epoch': 1.6} 42%|████▏ | 4234/10000 [15:27:22<20:44:47, 12.95s/it] 42%|████▏ | 4235/10000 [15:27:34<20:45:48, 12.97s/it] {'loss': 0.0051, 
'learning_rate': 2.8885000000000002e-05, 'epoch': 1.6} 42%|████▏ | 4235/10000 [15:27:35<20:45:48, 12.97s/it] 42%|████▏ | 4236/10000 [15:27:47<20:45:31, 12.97s/it] {'loss': 0.0053, 'learning_rate': 2.888e-05, 'epoch': 1.6} 42%|████▏ | 4236/10000 [15:27:47<20:45:31, 12.97s/it] 42%|████▏ | 4237/10000 [15:28:00<20:43:08, 12.94s/it] {'loss': 0.0064, 'learning_rate': 2.8875e-05, 'epoch': 1.6} 42%|████▏ | 4237/10000 [15:28:00<20:43:08, 12.94s/it] 42%|████▏ | 4238/10000 [15:28:13<20:39:30, 12.91s/it] {'loss': 0.0051, 'learning_rate': 2.8870000000000003e-05, 'epoch': 1.6} 42%|████▏ | 4238/10000 [15:28:13<20:39:30, 12.91s/it] 42%|████▏ | 4239/10000 [15:28:26<20:37:45, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.8865000000000002e-05, 'epoch': 1.6} 42%|████▏ | 4239/10000 [15:28:26<20:37:45, 12.89s/it] 42%|████▏ | 4240/10000 [15:28:39<20:37:06, 12.89s/it] {'loss': 0.0058, 'learning_rate': 2.8860000000000005e-05, 'epoch': 1.6} 42%|████▏ | 4240/10000 [15:28:39<20:37:06, 12.89s/it] 42%|████▏ | 4241/10000 [15:28:52<20:35:15, 12.87s/it] {'loss': 0.0065, 'learning_rate': 2.8854999999999997e-05, 'epoch': 1.6} 42%|████▏ | 4241/10000 [15:28:52<20:35:15, 12.87s/it] 42%|████▏ | 4242/10000 [15:29:05<20:37:22, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.885e-05, 'epoch': 1.6} 42%|████▏ | 4242/10000 [15:29:05<20:37:22, 12.89s/it] 42%|████▏ | 4243/10000 [15:29:17<20:35:41, 12.88s/it] {'loss': 0.0056, 'learning_rate': 2.8845e-05, 'epoch': 1.6} 42%|████▏ | 4243/10000 [15:29:18<20:35:41, 12.88s/it] 42%|████▏ | 4244/10000 [15:29:30<20:35:28, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.8840000000000002e-05, 'epoch': 1.6} 42%|████▏ | 4244/10000 [15:29:30<20:35:28, 12.88s/it] 42%|████▏ | 4245/10000 [15:29:43<20:35:14, 12.88s/it] {'loss': 0.0152, 'learning_rate': 2.8835e-05, 'epoch': 1.6} 42%|████▏ | 4245/10000 [15:29:43<20:35:14, 12.88s/it] 42%|████▏ | 4246/10000 [15:29:56<20:34:53, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.883e-05, 'epoch': 1.6} 42%|████▏ | 4246/10000 
[15:29:56<20:34:53, 12.88s/it] 42%|████▏ | 4247/10000 [15:30:09<20:35:51, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.8825000000000003e-05, 'epoch': 1.6} 42%|████▏ | 4247/10000 [15:30:09<20:35:51, 12.89s/it] 42%|████▏ | 4248/10000 [15:30:22<20:36:14, 12.90s/it] {'loss': 0.0045, 'learning_rate': 2.8820000000000002e-05, 'epoch': 1.6} 42%|████▏ | 4248/10000 [15:30:22<20:36:14, 12.90s/it] 42%|████▏ | 4249/10000 [15:30:35<20:38:26, 12.92s/it] {'loss': 0.0056, 'learning_rate': 2.8815000000000004e-05, 'epoch': 1.6} 42%|████▏ | 4249/10000 [15:30:35<20:38:26, 12.92s/it] 42%|████▎ | 4250/10000 [15:30:48<20:37:43, 12.92s/it] {'loss': 0.0117, 'learning_rate': 2.8810000000000004e-05, 'epoch': 1.6} 42%|████▎ | 4250/10000 [15:30:48<20:37:43, 12.92s/it] 43%|████▎ | 4251/10000 [15:31:01<20:39:41, 12.94s/it] {'loss': 0.0058, 'learning_rate': 2.8805e-05, 'epoch': 1.6} 43%|████▎ | 4251/10000 [15:31:01<20:39:41, 12.94s/it] 43%|████▎ | 4252/10000 [15:31:14<20:40:06, 12.94s/it] {'loss': 0.0042, 'learning_rate': 2.88e-05, 'epoch': 1.6} 43%|████▎ | 4252/10000 [15:31:14<20:40:06, 12.94s/it] 43%|████▎ | 4253/10000 [15:31:27<20:39:16, 12.94s/it] {'loss': 0.0059, 'learning_rate': 2.8795e-05, 'epoch': 1.6} 43%|████▎ | 4253/10000 [15:31:27<20:39:16, 12.94s/it] 43%|████▎ | 4254/10000 [15:31:40<20:38:59, 12.94s/it] {'loss': 0.0045, 'learning_rate': 2.879e-05, 'epoch': 1.6} 43%|████▎ | 4254/10000 [15:31:40<20:38:59, 12.94s/it] 43%|████▎ | 4255/10000 [15:31:53<20:37:21, 12.92s/it] {'loss': 0.0051, 'learning_rate': 2.8785e-05, 'epoch': 1.6} 43%|████▎ | 4255/10000 [15:31:53<20:37:21, 12.92s/it] 43%|████▎ | 4256/10000 [15:32:06<20:41:26, 12.97s/it] {'loss': 0.0053, 'learning_rate': 2.8780000000000002e-05, 'epoch': 1.6} 43%|████▎ | 4256/10000 [15:32:06<20:41:26, 12.97s/it] 43%|████▎ | 4257/10000 [15:32:19<20:42:26, 12.98s/it] {'loss': 0.0061, 'learning_rate': 2.8775e-05, 'epoch': 1.6} 43%|████▎ | 4257/10000 [15:32:19<20:42:26, 12.98s/it] 43%|████▎ | 4258/10000 [15:32:32<20:40:20, 12.96s/it] {'loss': 
0.007, 'learning_rate': 2.8770000000000004e-05, 'epoch': 1.6} 43%|████▎ | 4258/10000 [15:32:32<20:40:20, 12.96s/it] 43%|████▎ | 4259/10000 [15:32:45<20:43:04, 12.99s/it] {'loss': 0.0046, 'learning_rate': 2.8765000000000003e-05, 'epoch': 1.6} 43%|████▎ | 4259/10000 [15:32:45<20:43:04, 12.99s/it] 43%|████▎ | 4260/10000 [15:32:58<20:41:36, 12.98s/it] {'loss': 0.0057, 'learning_rate': 2.8760000000000002e-05, 'epoch': 1.61} 43%|████▎ | 4260/10000 [15:32:58<20:41:36, 12.98s/it] 43%|████▎ | 4261/10000 [15:33:11<20:42:19, 12.99s/it] {'loss': 0.0055, 'learning_rate': 2.8754999999999998e-05, 'epoch': 1.61} 43%|████▎ | 4261/10000 [15:33:11<20:42:19, 12.99s/it] 43%|████▎ | 4262/10000 [15:33:23<20:40:18, 12.97s/it] {'loss': 0.0058, 'learning_rate': 2.8749999999999997e-05, 'epoch': 1.61} 43%|████▎ | 4262/10000 [15:33:24<20:40:18, 12.97s/it] 43%|████▎ | 4263/10000 [15:33:36<20:37:01, 12.94s/it] {'loss': 0.0066, 'learning_rate': 2.8745e-05, 'epoch': 1.61} 43%|████▎ | 4263/10000 [15:33:36<20:37:01, 12.94s/it] 43%|████▎ | 4264/10000 [15:33:49<20:35:58, 12.93s/it] {'loss': 0.0053, 'learning_rate': 2.874e-05, 'epoch': 1.61} 43%|████▎ | 4264/10000 [15:33:49<20:35:58, 12.93s/it] 43%|████▎ | 4265/10000 [15:34:02<20:36:59, 12.94s/it] {'loss': 0.0052, 'learning_rate': 2.8735000000000002e-05, 'epoch': 1.61} 43%|████▎ | 4265/10000 [15:34:02<20:36:59, 12.94s/it] 43%|████▎ | 4266/10000 [15:34:15<20:36:29, 12.94s/it] {'loss': 0.0048, 'learning_rate': 2.873e-05, 'epoch': 1.61} 43%|████▎ | 4266/10000 [15:34:15<20:36:29, 12.94s/it] 43%|████▎ | 4267/10000 [15:34:28<20:38:16, 12.96s/it] {'loss': 0.004, 'learning_rate': 2.8725e-05, 'epoch': 1.61} 43%|████▎ | 4267/10000 [15:34:28<20:38:16, 12.96s/it] 43%|████▎ | 4268/10000 [15:34:41<20:35:27, 12.93s/it] {'loss': 0.0055, 'learning_rate': 2.8720000000000003e-05, 'epoch': 1.61} 43%|████▎ | 4268/10000 [15:34:41<20:35:27, 12.93s/it] 43%|████▎ | 4269/10000 [15:34:54<20:32:24, 12.90s/it] {'loss': 0.0053, 'learning_rate': 2.8715000000000002e-05, 'epoch': 
1.61} 43%|████▎ | 4269/10000 [15:34:54<20:32:24, 12.90s/it] 43%|████▎ | 4270/10000 [15:35:07<20:30:24, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.8710000000000005e-05, 'epoch': 1.61} 43%|████▎ | 4270/10000 [15:35:07<20:30:24, 12.88s/it] 43%|████▎ | 4271/10000 [15:35:20<20:29:59, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.8705000000000004e-05, 'epoch': 1.61} 43%|████▎ | 4271/10000 [15:35:20<20:29:59, 12.88s/it] 43%|████▎ | 4272/10000 [15:35:32<20:28:45, 12.87s/it] {'loss': 0.0038, 'learning_rate': 2.87e-05, 'epoch': 1.61} 43%|████▎ | 4272/10000 [15:35:32<20:28:45, 12.87s/it] 43%|████▎ | 4273/10000 [15:35:45<20:28:01, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.8695e-05, 'epoch': 1.61} 43%|████▎ | 4273/10000 [15:35:45<20:28:01, 12.87s/it] 43%|████▎ | 4274/10000 [15:35:58<20:27:09, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.869e-05, 'epoch': 1.61} 43%|████▎ | 4274/10000 [15:35:58<20:27:09, 12.86s/it] 43%|████▎ | 4275/10000 [15:36:11<20:27:00, 12.86s/it] {'loss': 0.006, 'learning_rate': 2.8685e-05, 'epoch': 1.61} 43%|████▎ | 4275/10000 [15:36:11<20:27:00, 12.86s/it] 43%|████▎ | 4276/10000 [15:36:24<20:27:00, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.868e-05, 'epoch': 1.61} 43%|████▎ | 4276/10000 [15:36:24<20:27:00, 12.86s/it] 43%|████▎ | 4277/10000 [15:36:37<20:29:00, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.8675000000000002e-05, 'epoch': 1.61} 43%|████▎ | 4277/10000 [15:36:37<20:29:00, 12.88s/it] 43%|████▎ | 4278/10000 [15:36:50<20:27:22, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.867e-05, 'epoch': 1.61} 43%|████▎ | 4278/10000 [15:36:50<20:27:22, 12.87s/it] 43%|████▎ | 4279/10000 [15:37:02<20:25:05, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.8665000000000004e-05, 'epoch': 1.61} 43%|████▎ | 4279/10000 [15:37:02<20:25:05, 12.85s/it] 43%|████▎ | 4280/10000 [15:37:15<20:24:36, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.8660000000000003e-05, 'epoch': 1.61} 43%|████▎ | 4280/10000 [15:37:15<20:24:36, 12.85s/it] 43%|████▎ | 4281/10000 
[15:37:28<20:25:35, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.8655000000000003e-05, 'epoch': 1.61} 43%|████▎ | 4281/10000 [15:37:28<20:25:35, 12.86s/it] 43%|████▎ | 4282/10000 [15:37:41<20:25:54, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.865e-05, 'epoch': 1.61} 43%|████▎ | 4282/10000 [15:37:41<20:25:54, 12.86s/it] 43%|████▎ | 4283/10000 [15:37:54<20:25:02, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.8645e-05, 'epoch': 1.61} 43%|████▎ | 4283/10000 [15:37:54<20:25:02, 12.86s/it] 43%|████▎ | 4284/10000 [15:38:07<20:26:42, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.864e-05, 'epoch': 1.61} 43%|████▎ | 4284/10000 [15:38:07<20:26:42, 12.88s/it] 43%|████▎ | 4285/10000 [15:38:20<20:27:03, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.8635e-05, 'epoch': 1.61} 43%|████▎ | 4285/10000 [15:38:20<20:27:03, 12.88s/it] 43%|████▎ | 4286/10000 [15:38:33<20:27:23, 12.89s/it] {'loss': 0.0044, 'learning_rate': 2.8630000000000002e-05, 'epoch': 1.61} 43%|████▎ | 4286/10000 [15:38:33<20:27:23, 12.89s/it] 43%|████▎ | 4287/10000 [15:38:45<20:26:57, 12.89s/it] {'loss': 0.0052, 'learning_rate': 2.8625e-05, 'epoch': 1.62} 43%|████▎ | 4287/10000 [15:38:46<20:26:57, 12.89s/it] 43%|████▎ | 4288/10000 [15:38:58<20:27:08, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.8620000000000004e-05, 'epoch': 1.62} 43%|████▎ | 4288/10000 [15:38:58<20:27:08, 12.89s/it] 43%|████▎ | 4289/10000 [15:39:11<20:25:15, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.8615000000000003e-05, 'epoch': 1.62} 43%|████▎ | 4289/10000 [15:39:11<20:25:15, 12.87s/it] 43%|████▎ | 4290/10000 [15:39:24<20:23:25, 12.86s/it] {'loss': 0.006, 'learning_rate': 2.8610000000000002e-05, 'epoch': 1.62} 43%|████▎ | 4290/10000 [15:39:24<20:23:25, 12.86s/it] 43%|████▎ | 4291/10000 [15:39:37<20:21:51, 12.84s/it] {'loss': 0.0055, 'learning_rate': 2.8605000000000005e-05, 'epoch': 1.62} 43%|████▎ | 4291/10000 [15:39:37<20:21:51, 12.84s/it] 43%|████▎ | 4292/10000 [15:39:50<20:23:28, 12.86s/it] {'loss': 0.006, 'learning_rate': 
2.86e-05, 'epoch': 1.62} 43%|████▎ | 4292/10000 [15:39:50<20:23:28, 12.86s/it] 43%|████▎ | 4293/10000 [15:40:03<20:23:06, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.8595e-05, 'epoch': 1.62} 43%|████▎ | 4293/10000 [15:40:03<20:23:06, 12.86s/it] 43%|████▎ | 4294/10000 [15:40:15<20:22:39, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.859e-05, 'epoch': 1.62} 43%|████▎ | 4294/10000 [15:40:15<20:22:39, 12.86s/it] 43%|████▎ | 4295/10000 [15:40:28<20:20:15, 12.83s/it] {'loss': 0.0066, 'learning_rate': 2.8585e-05, 'epoch': 1.62} 43%|████▎ | 4295/10000 [15:40:28<20:20:15, 12.83s/it] 43%|████▎ | 4296/10000 [15:40:41<20:19:53, 12.83s/it] {'loss': 0.0042, 'learning_rate': 2.858e-05, 'epoch': 1.62} 43%|████▎ | 4296/10000 [15:40:41<20:19:53, 12.83s/it] 43%|████▎ | 4297/10000 [15:40:54<20:19:24, 12.83s/it] {'loss': 0.0049, 'learning_rate': 2.8575000000000003e-05, 'epoch': 1.62} 43%|████▎ | 4297/10000 [15:40:54<20:19:24, 12.83s/it] 43%|████▎ | 4298/10000 [15:41:07<20:19:06, 12.83s/it] {'loss': 0.0054, 'learning_rate': 2.8570000000000003e-05, 'epoch': 1.62} 43%|████▎ | 4298/10000 [15:41:07<20:19:06, 12.83s/it] 43%|████▎ | 4299/10000 [15:41:20<20:18:33, 12.82s/it] {'loss': 0.0045, 'learning_rate': 2.8565000000000002e-05, 'epoch': 1.62} 43%|████▎ | 4299/10000 [15:41:20<20:18:33, 12.82s/it] 43%|████▎ | 4300/10000 [15:41:32<20:22:10, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.8560000000000004e-05, 'epoch': 1.62} 43%|████▎ | 4300/10000 [15:41:32<20:22:10, 12.86s/it] 43%|████▎ | 4301/10000 [15:41:45<20:22:02, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.8555000000000004e-05, 'epoch': 1.62} 43%|████▎ | 4301/10000 [15:41:45<20:22:02, 12.87s/it] 43%|████▎ | 4302/10000 [15:41:58<20:21:06, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.855e-05, 'epoch': 1.62} 43%|████▎ | 4302/10000 [15:41:58<20:21:06, 12.86s/it] 43%|████▎ | 4303/10000 [15:42:11<20:20:34, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.8545e-05, 'epoch': 1.62} 43%|████▎ | 4303/10000 [15:42:11<20:20:34, 12.86s/it] 
43%|████▎ | 4304/10000 [15:42:24<20:19:56, 12.85s/it] {'loss': 0.0051, 'learning_rate': 2.854e-05, 'epoch': 1.62} 43%|████▎ | 4304/10000 [15:42:24<20:19:56, 12.85s/it] 43%|████▎ | 4305/10000 [15:42:37<20:20:13, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.8535e-05, 'epoch': 1.62} 43%|████▎ | 4305/10000 [15:42:37<20:20:13, 12.86s/it] 43%|████▎ | 4306/10000 [15:42:50<20:20:32, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.853e-05, 'epoch': 1.62} 43%|████▎ | 4306/10000 [15:42:50<20:20:32, 12.86s/it] 43%|████▎ | 4307/10000 [15:43:02<20:20:26, 12.86s/it] {'loss': 0.0064, 'learning_rate': 2.8525000000000002e-05, 'epoch': 1.62} 43%|████▎ | 4307/10000 [15:43:02<20:20:26, 12.86s/it] 43%|████▎ | 4308/10000 [15:43:15<20:20:56, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.852e-05, 'epoch': 1.62} 43%|████▎ | 4308/10000 [15:43:15<20:20:56, 12.87s/it] 43%|████▎ | 4309/10000 [15:43:28<20:20:00, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.8515000000000004e-05, 'epoch': 1.62} 43%|████▎ | 4309/10000 [15:43:28<20:20:00, 12.86s/it] 43%|████▎ | 4310/10000 [15:43:41<20:20:03, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.8510000000000003e-05, 'epoch': 1.62} 43%|████▎ | 4310/10000 [15:43:41<20:20:03, 12.87s/it] 43%|████▎ | 4311/10000 [15:43:54<20:20:47, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.8505000000000002e-05, 'epoch': 1.62} 43%|████▎ | 4311/10000 [15:43:54<20:20:47, 12.88s/it] 43%|████▎ | 4312/10000 [15:44:07<20:18:00, 12.85s/it] {'loss': 0.0062, 'learning_rate': 2.8499999999999998e-05, 'epoch': 1.62} 43%|████▎ | 4312/10000 [15:44:07<20:18:00, 12.85s/it] 43%|████▎ | 4313/10000 [15:44:20<20:16:46, 12.84s/it] {'loss': 0.0055, 'learning_rate': 2.8495e-05, 'epoch': 1.63} 43%|████▎ | 4313/10000 [15:44:20<20:16:46, 12.84s/it] 43%|████▎ | 4314/10000 [15:44:32<20:15:43, 12.83s/it] {'loss': 0.006, 'learning_rate': 2.849e-05, 'epoch': 1.63} 43%|████▎ | 4314/10000 [15:44:32<20:15:43, 12.83s/it] 43%|████▎ | 4315/10000 [15:44:45<20:17:59, 12.85s/it] {'loss': 0.0048, 
'learning_rate': 2.8485e-05, 'epoch': 1.63} 43%|████▎ | 4315/10000 [15:44:45<20:17:59, 12.85s/it] 43%|████▎ | 4316/10000 [15:44:58<20:17:33, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.8480000000000002e-05, 'epoch': 1.63} 43%|████▎ | 4316/10000 [15:44:58<20:17:33, 12.85s/it] 43%|████▎ | 4317/10000 [15:45:11<20:19:35, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.8475e-05, 'epoch': 1.63} 43%|████▎ | 4317/10000 [15:45:11<20:19:35, 12.88s/it] 43%|████▎ | 4318/10000 [15:45:24<20:21:12, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.8470000000000004e-05, 'epoch': 1.63} 43%|████▎ | 4318/10000 [15:45:24<20:21:12, 12.90s/it] 43%|████▎ | 4319/10000 [15:45:37<20:22:11, 12.91s/it] {'loss': 0.0063, 'learning_rate': 2.8465000000000003e-05, 'epoch': 1.63} 43%|████▎ | 4319/10000 [15:45:37<20:22:11, 12.91s/it] 43%|████▎ | 4320/10000 [15:45:50<20:24:30, 12.93s/it] {'loss': 0.0052, 'learning_rate': 2.8460000000000002e-05, 'epoch': 1.63} 43%|████▎ | 4320/10000 [15:45:50<20:24:30, 12.93s/it] 43%|████▎ | 4321/10000 [15:46:03<20:24:44, 12.94s/it] {'loss': 0.0059, 'learning_rate': 2.8455000000000005e-05, 'epoch': 1.63} 43%|████▎ | 4321/10000 [15:46:03<20:24:44, 12.94s/it] 43%|████▎ | 4322/10000 [15:46:16<20:24:38, 12.94s/it] {'loss': 0.0067, 'learning_rate': 2.845e-05, 'epoch': 1.63} 43%|████▎ | 4322/10000 [15:46:16<20:24:38, 12.94s/it] 43%|████▎ | 4323/10000 [15:46:29<20:26:04, 12.96s/it] {'loss': 0.0056, 'learning_rate': 2.8445e-05, 'epoch': 1.63} 43%|████▎ | 4323/10000 [15:46:29<20:26:04, 12.96s/it] 43%|████▎ | 4324/10000 [15:46:42<20:22:05, 12.92s/it] {'loss': 0.0048, 'learning_rate': 2.844e-05, 'epoch': 1.63} 43%|████▎ | 4324/10000 [15:46:42<20:22:05, 12.92s/it] 43%|████▎ | 4325/10000 [15:46:55<20:20:27, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.8435e-05, 'epoch': 1.63} 43%|████▎ | 4325/10000 [15:46:55<20:20:27, 12.90s/it] 43%|████▎ | 4326/10000 [15:47:07<20:17:00, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.843e-05, 'epoch': 1.63} 43%|████▎ | 4326/10000 
[15:47:07<20:17:00, 12.87s/it] 43%|████▎ | 4327/10000 [15:47:20<20:16:58, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.8425000000000003e-05, 'epoch': 1.63} 43%|████▎ | 4327/10000 [15:47:20<20:16:58, 12.87s/it] 43%|████▎ | 4328/10000 [15:47:33<20:17:50, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.8420000000000002e-05, 'epoch': 1.63} 43%|████▎ | 4328/10000 [15:47:33<20:17:50, 12.88s/it] 43%|████▎ | 4329/10000 [15:47:46<20:15:37, 12.86s/it] {'loss': 0.0059, 'learning_rate': 2.8415e-05, 'epoch': 1.63} 43%|████▎ | 4329/10000 [15:47:46<20:15:37, 12.86s/it] 43%|████▎ | 4330/10000 [15:47:59<20:16:25, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.8410000000000004e-05, 'epoch': 1.63} 43%|████▎ | 4330/10000 [15:47:59<20:16:25, 12.87s/it] 43%|████▎ | 4331/10000 [15:48:12<20:15:36, 12.87s/it] {'loss': 0.007, 'learning_rate': 2.8405000000000003e-05, 'epoch': 1.63} 43%|████▎ | 4331/10000 [15:48:12<20:15:36, 12.87s/it] 43%|████▎ | 4332/10000 [15:48:25<20:16:07, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.84e-05, 'epoch': 1.63} 43%|████▎ | 4332/10000 [15:48:25<20:16:07, 12.87s/it] 43%|████▎ | 4333/10000 [15:48:37<20:15:43, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.8395e-05, 'epoch': 1.63} 43%|████▎ | 4333/10000 [15:48:37<20:15:43, 12.87s/it] 43%|████▎ | 4334/10000 [15:48:50<20:14:24, 12.86s/it] {'loss': 0.0061, 'learning_rate': 2.839e-05, 'epoch': 1.63} 43%|████▎ | 4334/10000 [15:48:50<20:14:24, 12.86s/it] 43%|████▎ | 4335/10000 [15:49:03<20:13:31, 12.85s/it] {'loss': 0.0071, 'learning_rate': 2.8385e-05, 'epoch': 1.63} 43%|████▎ | 4335/10000 [15:49:03<20:13:31, 12.85s/it] 43%|████▎ | 4336/10000 [15:49:16<20:14:18, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.8380000000000003e-05, 'epoch': 1.63} 43%|████▎ | 4336/10000 [15:49:16<20:14:18, 12.86s/it] 43%|████▎ | 4337/10000 [15:49:29<20:14:07, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.8375000000000002e-05, 'epoch': 1.63} 43%|████▎ | 4337/10000 [15:49:29<20:14:07, 12.86s/it] 43%|████▎ | 4338/10000 [15:49:42<20:11:31, 
12.84s/it] {'loss': 0.006, 'learning_rate': 2.837e-05, 'epoch': 1.63} 43%|████▎ | 4338/10000 [15:49:42<20:11:31, 12.84s/it] 43%|████▎ | 4339/10000 [15:49:55<20:12:23, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.8365000000000004e-05, 'epoch': 1.63} 43%|████▎ | 4339/10000 [15:49:55<20:12:23, 12.85s/it] 43%|████▎ | 4340/10000 [15:50:07<20:14:36, 12.88s/it] {'loss': 0.0054, 'learning_rate': 2.8360000000000003e-05, 'epoch': 1.64} 43%|████▎ | 4340/10000 [15:50:07<20:14:36, 12.88s/it] 43%|████▎ | 4341/10000 [15:50:20<20:17:30, 12.91s/it] {'loss': 0.0053, 'learning_rate': 2.8355000000000002e-05, 'epoch': 1.64} 43%|████▎ | 4341/10000 [15:50:20<20:17:30, 12.91s/it] 43%|████▎ | 4342/10000 [15:50:33<20:20:52, 12.95s/it] {'loss': 0.0044, 'learning_rate': 2.8349999999999998e-05, 'epoch': 1.64} 43%|████▎ | 4342/10000 [15:50:33<20:20:52, 12.95s/it] 43%|████▎ | 4343/10000 [15:50:46<20:20:25, 12.94s/it] {'loss': 0.0046, 'learning_rate': 2.8345e-05, 'epoch': 1.64} 43%|████▎ | 4343/10000 [15:50:46<20:20:25, 12.94s/it] 43%|████▎ | 4344/10000 [15:50:59<20:20:35, 12.95s/it] {'loss': 0.0041, 'learning_rate': 2.834e-05, 'epoch': 1.64} 43%|████▎ | 4344/10000 [15:50:59<20:20:35, 12.95s/it] 43%|████▎ | 4345/10000 [15:51:12<20:20:16, 12.95s/it] {'loss': 0.0059, 'learning_rate': 2.8335e-05, 'epoch': 1.64} 43%|████▎ | 4345/10000 [15:51:12<20:20:16, 12.95s/it] 43%|████▎ | 4346/10000 [15:51:25<20:18:16, 12.93s/it] {'loss': 0.0069, 'learning_rate': 2.833e-05, 'epoch': 1.64} 43%|████▎ | 4346/10000 [15:51:25<20:18:16, 12.93s/it] 43%|████▎ | 4347/10000 [15:51:38<20:17:30, 12.92s/it] {'loss': 0.0056, 'learning_rate': 2.8325e-05, 'epoch': 1.64} 43%|████▎ | 4347/10000 [15:51:38<20:17:30, 12.92s/it] 43%|████▎ | 4348/10000 [15:51:51<20:18:39, 12.94s/it] {'loss': 0.0048, 'learning_rate': 2.8320000000000003e-05, 'epoch': 1.64} 43%|████▎ | 4348/10000 [15:51:51<20:18:39, 12.94s/it] 43%|████▎ | 4349/10000 [15:52:04<20:20:20, 12.96s/it] {'loss': 0.0051, 'learning_rate': 2.8315000000000002e-05, 'epoch': 1.64} 
43%|████▎ | 4349/10000 [15:52:04<20:20:20, 12.96s/it] 44%|████▎ | 4350/10000 [15:52:17<20:20:47, 12.96s/it] {'loss': 0.0045, 'learning_rate': 2.8310000000000002e-05, 'epoch': 1.64} 44%|████▎ | 4350/10000 [15:52:17<20:20:47, 12.96s/it] 44%|████▎ | 4351/10000 [15:52:30<20:18:55, 12.95s/it] {'loss': 0.0043, 'learning_rate': 2.8305000000000004e-05, 'epoch': 1.64} 44%|████▎ | 4351/10000 [15:52:30<20:18:55, 12.95s/it] 44%|████▎ | 4352/10000 [15:52:43<20:16:09, 12.92s/it] {'loss': 0.0067, 'learning_rate': 2.83e-05, 'epoch': 1.64} 44%|████▎ | 4352/10000 [15:52:43<20:16:09, 12.92s/it] 44%|████▎ | 4353/10000 [15:52:56<20:15:52, 12.92s/it] {'loss': 0.0052, 'learning_rate': 2.8295e-05, 'epoch': 1.64} 44%|████▎ | 4353/10000 [15:52:56<20:15:52, 12.92s/it] 44%|████▎ | 4354/10000 [15:53:09<20:15:41, 12.92s/it] {'loss': 0.0058, 'learning_rate': 2.829e-05, 'epoch': 1.64} 44%|████▎ | 4354/10000 [15:53:09<20:15:41, 12.92s/it] 44%|████▎ | 4355/10000 [15:53:22<20:15:00, 12.91s/it] {'loss': 0.0046, 'learning_rate': 2.8285e-05, 'epoch': 1.64} 44%|████▎ | 4355/10000 [15:53:22<20:15:00, 12.91s/it] 44%|████▎ | 4356/10000 [15:53:34<20:14:21, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.828e-05, 'epoch': 1.64} 44%|████▎ | 4356/10000 [15:53:34<20:14:21, 12.91s/it] 44%|████▎ | 4357/10000 [15:53:47<20:12:48, 12.90s/it] {'loss': 0.0091, 'learning_rate': 2.8275000000000003e-05, 'epoch': 1.64} 44%|████▎ | 4357/10000 [15:53:47<20:12:48, 12.90s/it] 44%|████▎ | 4358/10000 [15:54:00<20:12:37, 12.90s/it] {'loss': 0.0046, 'learning_rate': 2.8270000000000002e-05, 'epoch': 1.64} 44%|████▎ | 4358/10000 [15:54:00<20:12:37, 12.90s/it] 44%|████▎ | 4359/10000 [15:54:13<20:10:15, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.8265e-05, 'epoch': 1.64} 44%|████▎ | 4359/10000 [15:54:13<20:10:15, 12.87s/it] 44%|████▎ | 4360/10000 [15:54:26<20:09:33, 12.87s/it] {'loss': 0.0067, 'learning_rate': 2.8260000000000004e-05, 'epoch': 1.64} 44%|████▎ | 4360/10000 [15:54:26<20:09:33, 12.87s/it] 44%|████▎ | 4361/10000 
[15:54:39<20:10:29, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.8255000000000003e-05, 'epoch': 1.64} 44%|████▎ | 4361/10000 [15:54:39<20:10:29, 12.88s/it] 44%|████▎ | 4362/10000 [15:54:52<20:09:15, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.825e-05, 'epoch': 1.64} 44%|████▎ | 4362/10000 [15:54:52<20:09:15, 12.87s/it] 44%|████▎ | 4363/10000 [15:55:05<20:09:09, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.8244999999999998e-05, 'epoch': 1.64} 44%|████▎ | 4363/10000 [15:55:05<20:09:09, 12.87s/it] 44%|████▎ | 4364/10000 [15:55:17<20:09:20, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.824e-05, 'epoch': 1.64} 44%|████▎ | 4364/10000 [15:55:17<20:09:20, 12.87s/it] 44%|████▎ | 4365/10000 [15:55:30<20:08:04, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.8235e-05, 'epoch': 1.64} 44%|████▎ | 4365/10000 [15:55:30<20:08:04, 12.86s/it] 44%|████▎ | 4366/10000 [15:55:43<20:06:58, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.8230000000000002e-05, 'epoch': 1.65} 44%|████▎ | 4366/10000 [15:55:43<20:06:58, 12.85s/it] 44%|████▎ | 4367/10000 [15:55:56<20:06:38, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.8225e-05, 'epoch': 1.65} 44%|████▎ | 4367/10000 [15:55:56<20:06:38, 12.85s/it] 44%|████▎ | 4368/10000 [15:56:09<20:04:56, 12.84s/it] {'loss': 0.0059, 'learning_rate': 2.822e-05, 'epoch': 1.65} 44%|████▎ | 4368/10000 [15:56:09<20:04:56, 12.84s/it] 44%|████▎ | 4369/10000 [15:56:22<20:05:39, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.8215000000000003e-05, 'epoch': 1.65} 44%|████▎ | 4369/10000 [15:56:22<20:05:39, 12.85s/it] 44%|████▎ | 4370/10000 [15:56:34<20:06:12, 12.85s/it] {'loss': 0.0065, 'learning_rate': 2.8210000000000003e-05, 'epoch': 1.65} 44%|████▎ | 4370/10000 [15:56:34<20:06:12, 12.85s/it] 44%|████▎ | 4371/10000 [15:56:47<20:06:53, 12.86s/it] {'loss': 0.0052, 'learning_rate': 2.8205000000000005e-05, 'epoch': 1.65} 44%|████▎ | 4371/10000 [15:56:47<20:06:53, 12.86s/it] 44%|████▎ | 4372/10000 [15:57:00<20:05:58, 12.86s/it] {'loss': 0.0054, 'learning_rate': 
2.8199999999999998e-05, 'epoch': 1.65} 44%|████▎ | 4372/10000 [15:57:00<20:05:58, 12.86s/it] 44%|████▎ | 4373/10000 [15:57:13<20:05:53, 12.86s/it] {'loss': 0.0057, 'learning_rate': 2.8195e-05, 'epoch': 1.65} 44%|████▎ | 4373/10000 [15:57:13<20:05:53, 12.86s/it] 44%|████▎ | 4374/10000 [15:57:26<20:06:15, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.819e-05, 'epoch': 1.65} 44%|████▎ | 4374/10000 [15:57:26<20:06:15, 12.86s/it] 44%|████▍ | 4375/10000 [15:57:39<20:06:51, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.8185e-05, 'epoch': 1.65} 44%|████▍ | 4375/10000 [15:57:39<20:06:51, 12.87s/it] 44%|████▍ | 4376/10000 [15:57:52<20:07:37, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.818e-05, 'epoch': 1.65} 44%|████▍ | 4376/10000 [15:57:52<20:07:37, 12.88s/it] 44%|████▍ | 4377/10000 [15:58:05<20:08:50, 12.90s/it] {'loss': 0.0047, 'learning_rate': 2.8175e-05, 'epoch': 1.65} 44%|████▍ | 4377/10000 [15:58:05<20:08:50, 12.90s/it] 44%|████▍ | 4378/10000 [15:58:18<20:10:13, 12.92s/it] {'loss': 0.004, 'learning_rate': 2.8170000000000003e-05, 'epoch': 1.65} 44%|████▍ | 4378/10000 [15:58:18<20:10:13, 12.92s/it] 44%|████▍ | 4379/10000 [15:58:30<20:08:16, 12.90s/it] {'loss': 0.0057, 'learning_rate': 2.8165000000000002e-05, 'epoch': 1.65} 44%|████▍ | 4379/10000 [15:58:30<20:08:16, 12.90s/it] 44%|████▍ | 4380/10000 [15:58:43<20:08:24, 12.90s/it] {'loss': 0.0051, 'learning_rate': 2.816e-05, 'epoch': 1.65} 44%|████▍ | 4380/10000 [15:58:43<20:08:24, 12.90s/it] 44%|████▍ | 4381/10000 [15:58:56<20:06:38, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.8155000000000004e-05, 'epoch': 1.65} 44%|████▍ | 4381/10000 [15:58:56<20:06:38, 12.88s/it] 44%|████▍ | 4382/10000 [15:59:09<20:07:51, 12.90s/it] {'loss': 0.0048, 'learning_rate': 2.815e-05, 'epoch': 1.65} 44%|████▍ | 4382/10000 [15:59:09<20:07:51, 12.90s/it] 44%|████▍ | 4383/10000 [15:59:22<20:09:07, 12.92s/it] {'loss': 0.0057, 'learning_rate': 2.8145e-05, 'epoch': 1.65} 44%|████▍ | 4383/10000 [15:59:22<20:09:07, 12.92s/it] 44%|████▍ | 
4384/10000 [15:59:35<20:08:47, 12.91s/it] {'loss': 0.0054, 'learning_rate': 2.8139999999999998e-05, 'epoch': 1.65} 44%|████▍ | 4384/10000 [15:59:35<20:08:47, 12.91s/it] 44%|████▍ | 4385/10000 [15:59:48<20:09:53, 12.93s/it] {'loss': 0.0056, 'learning_rate': 2.8135e-05, 'epoch': 1.65} 44%|████▍ | 4385/10000 [15:59:48<20:09:53, 12.93s/it] 44%|████▍ | 4386/10000 [16:00:01<20:07:53, 12.91s/it] {'loss': 0.0082, 'learning_rate': 2.813e-05, 'epoch': 1.65} 44%|████▍ | 4386/10000 [16:00:01<20:07:53, 12.91s/it] 44%|████▍ | 4387/10000 [16:00:14<20:06:12, 12.89s/it] {'loss': 0.0061, 'learning_rate': 2.8125000000000003e-05, 'epoch': 1.65} 44%|████▍ | 4387/10000 [16:00:14<20:06:12, 12.89s/it] 44%|████▍ | 4388/10000 [16:00:27<20:04:52, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.8120000000000002e-05, 'epoch': 1.65} 44%|████▍ | 4388/10000 [16:00:27<20:04:52, 12.88s/it] 44%|████▍ | 4389/10000 [16:00:39<20:04:50, 12.88s/it] {'loss': 0.0057, 'learning_rate': 2.8115e-05, 'epoch': 1.65} 44%|████▍ | 4389/10000 [16:00:39<20:04:50, 12.88s/it] 44%|████▍ | 4390/10000 [16:00:52<20:03:32, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.8110000000000004e-05, 'epoch': 1.65} 44%|████▍ | 4390/10000 [16:00:52<20:03:32, 12.87s/it] 44%|████▍ | 4391/10000 [16:01:05<20:07:06, 12.91s/it] {'loss': 0.0045, 'learning_rate': 2.8105000000000003e-05, 'epoch': 1.65} 44%|████▍ | 4391/10000 [16:01:05<20:07:06, 12.91s/it] 44%|████▍ | 4392/10000 [16:01:18<20:08:19, 12.93s/it] {'loss': 0.0065, 'learning_rate': 2.8100000000000005e-05, 'epoch': 1.65} 44%|████▍ | 4392/10000 [16:01:18<20:08:19, 12.93s/it] 44%|████▍ | 4393/10000 [16:01:31<20:09:32, 12.94s/it] {'loss': 0.0051, 'learning_rate': 2.8094999999999998e-05, 'epoch': 1.66} 44%|████▍ | 4393/10000 [16:01:31<20:09:32, 12.94s/it] 44%|████▍ | 4394/10000 [16:01:44<20:08:28, 12.93s/it] {'loss': 0.0072, 'learning_rate': 2.809e-05, 'epoch': 1.66} 44%|████▍ | 4394/10000 [16:01:44<20:08:28, 12.93s/it] 44%|████▍ | 4395/10000 [16:01:57<20:10:21, 12.96s/it] {'loss': 0.0038, 
'learning_rate': 2.8085e-05, 'epoch': 1.66} 44%|████▍ | 4395/10000 [16:01:57<20:10:21, 12.96s/it] 44%|████▍ | 4396/10000 [16:02:10<20:10:09, 12.96s/it] {'loss': 0.0069, 'learning_rate': 2.8080000000000002e-05, 'epoch': 1.66} 44%|████▍ | 4396/10000 [16:02:10<20:10:09, 12.96s/it] 44%|████▍ | 4397/10000 [16:02:23<20:09:15, 12.95s/it] {'loss': 0.0074, 'learning_rate': 2.8075e-05, 'epoch': 1.66} 44%|████▍ | 4397/10000 [16:02:23<20:09:15, 12.95s/it] 44%|████▍ | 4398/10000 [16:02:36<20:07:53, 12.94s/it] {'loss': 0.006, 'learning_rate': 2.807e-05, 'epoch': 1.66} 44%|████▍ | 4398/10000 [16:02:36<20:07:53, 12.94s/it] 44%|████▍ | 4399/10000 [16:02:49<20:07:01, 12.93s/it] {'loss': 0.0125, 'learning_rate': 2.8065000000000003e-05, 'epoch': 1.66} 44%|████▍ | 4399/10000 [16:02:49<20:07:01, 12.93s/it] 44%|████▍ | 4400/10000 [16:03:02<20:05:54, 12.92s/it] {'loss': 0.0063, 'learning_rate': 2.8060000000000002e-05, 'epoch': 1.66} 44%|████▍ | 4400/10000 [16:03:02<20:05:54, 12.92s/it] 44%|████▍ | 4401/10000 [16:03:15<20:03:03, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.8055000000000005e-05, 'epoch': 1.66} 44%|████▍ | 4401/10000 [16:03:15<20:03:03, 12.89s/it] 44%|████▍ | 4402/10000 [16:03:27<20:01:32, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.8050000000000004e-05, 'epoch': 1.66} 44%|████▍ | 4402/10000 [16:03:27<20:01:32, 12.88s/it] 44%|████▍ | 4403/10000 [16:03:40<20:00:43, 12.87s/it] {'loss': 0.0055, 'learning_rate': 2.8045e-05, 'epoch': 1.66} 44%|████▍ | 4403/10000 [16:03:40<20:00:43, 12.87s/it] 44%|████▍ | 4404/10000 [16:03:53<20:00:39, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.804e-05, 'epoch': 1.66} 44%|████▍ | 4404/10000 [16:03:53<20:00:39, 12.87s/it] 44%|████▍ | 4405/10000 [16:04:06<20:01:34, 12.89s/it] {'loss': 0.0053, 'learning_rate': 2.8035000000000002e-05, 'epoch': 1.66} 44%|████▍ | 4405/10000 [16:04:06<20:01:34, 12.89s/it] 44%|████▍ | 4406/10000 [16:04:19<19:59:48, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.803e-05, 'epoch': 1.66} 44%|████▍ | 4406/10000 
[16:04:19<19:59:48, 12.87s/it] 44%|████▍ | 4407/10000 [16:04:32<20:00:25, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.8025e-05, 'epoch': 1.66} 44%|████▍ | 4407/10000 [16:04:32<20:00:25, 12.88s/it] 44%|████▍ | 4408/10000 [16:04:45<19:58:47, 12.86s/it] {'loss': 0.0052, 'learning_rate': 2.8020000000000003e-05, 'epoch': 1.66} 44%|████▍ | 4408/10000 [16:04:45<19:58:47, 12.86s/it] 44%|████▍ | 4409/10000 [16:04:58<19:59:51, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.8015000000000002e-05, 'epoch': 1.66} 44%|████▍ | 4409/10000 [16:04:58<19:59:51, 12.88s/it] 44%|████▍ | 4410/10000 [16:05:10<20:01:03, 12.89s/it] {'loss': 0.0062, 'learning_rate': 2.8010000000000005e-05, 'epoch': 1.66} 44%|████▍ | 4410/10000 [16:05:11<20:01:03, 12.89s/it] 44%|████▍ | 4411/10000 [16:05:23<19:59:46, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.8005000000000004e-05, 'epoch': 1.66} 44%|████▍ | 4411/10000 [16:05:23<19:59:46, 12.88s/it] 44%|████▍ | 4412/10000 [16:05:36<19:58:38, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.8000000000000003e-05, 'epoch': 1.66} 44%|████▍ | 4412/10000 [16:05:36<19:58:38, 12.87s/it] 44%|████▍ | 4413/10000 [16:05:49<19:59:23, 12.88s/it] {'loss': 0.0041, 'learning_rate': 2.7995e-05, 'epoch': 1.66} 44%|████▍ | 4413/10000 [16:05:49<19:59:23, 12.88s/it] 44%|████▍ | 4414/10000 [16:06:02<19:58:56, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.7989999999999998e-05, 'epoch': 1.66} 44%|████▍ | 4414/10000 [16:06:02<19:58:56, 12.88s/it] 44%|████▍ | 4415/10000 [16:06:15<19:58:59, 12.88s/it] {'loss': 0.004, 'learning_rate': 2.7985e-05, 'epoch': 1.66} 44%|████▍ | 4415/10000 [16:06:15<19:58:59, 12.88s/it] 44%|████▍ | 4416/10000 [16:06:28<19:58:58, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.798e-05, 'epoch': 1.66} 44%|████▍ | 4416/10000 [16:06:28<19:58:58, 12.88s/it] 44%|████▍ | 4417/10000 [16:06:41<19:58:43, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.7975000000000002e-05, 'epoch': 1.66} 44%|████▍ | 4417/10000 [16:06:41<19:58:43, 12.88s/it] 44%|████▍ | 4418/10000 
[16:06:54<19:58:24, 12.88s/it] {'loss': 0.0056, 'learning_rate': 2.797e-05, 'epoch': 1.66} 44%|████▍ | 4418/10000 [16:06:54<19:58:24, 12.88s/it] 44%|████▍ | 4419/10000 [16:07:06<19:59:14, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.7965e-05, 'epoch': 1.67} 44%|████▍ | 4419/10000 [16:07:06<19:59:14, 12.89s/it] 44%|████▍ | 4420/10000 [16:07:19<19:58:48, 12.89s/it] {'loss': 0.0048, 'learning_rate': 2.7960000000000003e-05, 'epoch': 1.67} 44%|████▍ | 4420/10000 [16:07:19<19:58:48, 12.89s/it] 44%|████▍ | 4421/10000 [16:07:32<19:58:50, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.7955000000000003e-05, 'epoch': 1.67} 44%|████▍ | 4421/10000 [16:07:32<19:58:50, 12.89s/it] 44%|████▍ | 4422/10000 [16:07:45<19:56:28, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.7950000000000005e-05, 'epoch': 1.67} 44%|████▍ | 4422/10000 [16:07:45<19:56:28, 12.87s/it] 44%|████▍ | 4423/10000 [16:07:58<19:55:08, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.7944999999999998e-05, 'epoch': 1.67} 44%|████▍ | 4423/10000 [16:07:58<19:55:08, 12.86s/it] 44%|████▍ | 4424/10000 [16:08:11<19:55:06, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.794e-05, 'epoch': 1.67} 44%|████▍ | 4424/10000 [16:08:11<19:55:06, 12.86s/it] 44%|████▍ | 4425/10000 [16:08:24<19:56:50, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.7935e-05, 'epoch': 1.67} 44%|████▍ | 4425/10000 [16:08:24<19:56:50, 12.88s/it] 44%|████▍ | 4426/10000 [16:08:36<19:55:16, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.7930000000000002e-05, 'epoch': 1.67} 44%|████▍ | 4426/10000 [16:08:37<19:55:16, 12.87s/it] 44%|████▍ | 4427/10000 [16:08:49<19:56:23, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.7925e-05, 'epoch': 1.67} 44%|████▍ | 4427/10000 [16:08:49<19:56:23, 12.88s/it] 44%|████▍ | 4428/10000 [16:09:02<19:57:20, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.792e-05, 'epoch': 1.67} 44%|████▍ | 4428/10000 [16:09:02<19:57:20, 12.89s/it] 44%|████▍ | 4429/10000 [16:09:15<19:58:22, 12.91s/it] {'loss': 0.0054, 'learning_rate': 
2.7915000000000003e-05, 'epoch': 1.67} 44%|████▍ | 4429/10000 [16:09:15<19:58:22, 12.91s/it] 44%|████▍ | 4430/10000 [16:09:28<19:58:27, 12.91s/it] {'loss': 0.0048, 'learning_rate': 2.7910000000000002e-05, 'epoch': 1.67} 44%|████▍ | 4430/10000 [16:09:28<19:58:27, 12.91s/it] 44%|████▍ | 4431/10000 [16:09:41<19:57:19, 12.90s/it] {'loss': 0.0055, 'learning_rate': 2.7905000000000005e-05, 'epoch': 1.67} 44%|████▍ | 4431/10000 [16:09:41<19:57:19, 12.90s/it] 44%|████▍ | 4432/10000 [16:09:54<19:55:18, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.7900000000000004e-05, 'epoch': 1.67} 44%|████▍ | 4432/10000 [16:09:54<19:55:18, 12.88s/it] 44%|████▍ | 4433/10000 [16:10:07<19:54:58, 12.88s/it] {'loss': 0.0085, 'learning_rate': 2.7895e-05, 'epoch': 1.67} 44%|████▍ | 4433/10000 [16:10:07<19:54:58, 12.88s/it] 44%|████▍ | 4434/10000 [16:10:20<19:55:45, 12.89s/it] {'loss': 0.0051, 'learning_rate': 2.789e-05, 'epoch': 1.67} 44%|████▍ | 4434/10000 [16:10:20<19:55:45, 12.89s/it] 44%|████▍ | 4435/10000 [16:10:33<19:55:30, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.7885e-05, 'epoch': 1.67} 44%|████▍ | 4435/10000 [16:10:33<19:55:30, 12.89s/it] 44%|████▍ | 4436/10000 [16:10:45<19:53:34, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.788e-05, 'epoch': 1.67} 44%|████▍ | 4436/10000 [16:10:45<19:53:34, 12.87s/it] 44%|████▍ | 4437/10000 [16:10:58<19:51:09, 12.85s/it] {'loss': 0.0055, 'learning_rate': 2.7875e-05, 'epoch': 1.67} 44%|████▍ | 4437/10000 [16:10:58<19:51:09, 12.85s/it] 44%|████▍ | 4438/10000 [16:11:11<19:52:08, 12.86s/it] {'loss': 0.0038, 'learning_rate': 2.7870000000000003e-05, 'epoch': 1.67} 44%|████▍ | 4438/10000 [16:11:11<19:52:08, 12.86s/it] 44%|████▍ | 4439/10000 [16:11:24<19:54:58, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.7865000000000002e-05, 'epoch': 1.67} 44%|████▍ | 4439/10000 [16:11:24<19:54:58, 12.89s/it] 44%|████▍ | 4440/10000 [16:11:37<19:55:58, 12.91s/it] {'loss': 0.0068, 'learning_rate': 2.7860000000000004e-05, 'epoch': 1.67} 44%|████▍ | 4440/10000 
[16:11:37<19:55:58, 12.91s/it] 44%|████▍ | 4441/10000 [16:11:50<19:55:27, 12.90s/it] {'loss': 0.0054, 'learning_rate': 2.7855000000000004e-05, 'epoch': 1.67} 44%|████▍ | 4441/10000 [16:11:50<19:55:27, 12.90s/it] 44%|████▍ | 4442/10000 [16:12:03<19:58:16, 12.94s/it] {'loss': 0.0074, 'learning_rate': 2.7850000000000003e-05, 'epoch': 1.67} 44%|████▍ | 4442/10000 [16:12:03<19:58:16, 12.94s/it] 44%|████▍ | 4443/10000 [16:12:16<19:55:14, 12.91s/it] {'loss': 0.0055, 'learning_rate': 2.7845e-05, 'epoch': 1.67} 44%|████▍ | 4443/10000 [16:12:16<19:55:14, 12.91s/it] 44%|████▍ | 4444/10000 [16:12:29<19:52:55, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.7839999999999998e-05, 'epoch': 1.67} 44%|████▍ | 4444/10000 [16:12:29<19:52:55, 12.88s/it] 44%|████▍ | 4445/10000 [16:12:41<19:50:30, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.7835e-05, 'epoch': 1.67} 44%|████▍ | 4445/10000 [16:12:41<19:50:30, 12.86s/it] 44%|████▍ | 4446/10000 [16:12:54<19:51:06, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.783e-05, 'epoch': 1.68} 44%|████▍ | 4446/10000 [16:12:54<19:51:06, 12.87s/it] 44%|████▍ | 4447/10000 [16:13:07<19:51:59, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.7825000000000002e-05, 'epoch': 1.68} 44%|████▍ | 4447/10000 [16:13:07<19:51:59, 12.88s/it] 44%|████▍ | 4448/10000 [16:13:20<19:52:04, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.782e-05, 'epoch': 1.68} 44%|████▍ | 4448/10000 [16:13:20<19:52:04, 12.88s/it] 44%|████▍ | 4449/10000 [16:13:33<19:50:13, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.7815e-05, 'epoch': 1.68} 44%|████▍ | 4449/10000 [16:13:33<19:50:13, 12.86s/it] 44%|████▍ | 4450/10000 [16:13:46<19:50:25, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.7810000000000003e-05, 'epoch': 1.68} 44%|████▍ | 4450/10000 [16:13:46<19:50:25, 12.87s/it] 45%|████▍ | 4451/10000 [16:13:59<19:49:24, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.7805000000000002e-05, 'epoch': 1.68} 45%|████▍ | 4451/10000 [16:13:59<19:49:24, 12.86s/it] 45%|████▍ | 4452/10000 
[16:14:11<19:50:22, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.7800000000000005e-05, 'epoch': 1.68} 45%|████▍ | 4452/10000 [16:14:12<19:50:22, 12.87s/it] 45%|████▍ | 4453/10000 [16:14:24<19:50:47, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.7794999999999997e-05, 'epoch': 1.68} 45%|████▍ | 4453/10000 [16:14:24<19:50:47, 12.88s/it] 45%|████▍ | 4454/10000 [16:14:37<19:50:50, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.779e-05, 'epoch': 1.68} 45%|████▍ | 4454/10000 [16:14:37<19:50:50, 12.88s/it] 45%|████▍ | 4455/10000 [16:14:50<19:49:36, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.7785e-05, 'epoch': 1.68} 45%|████▍ | 4455/10000 [16:14:50<19:49:36, 12.87s/it] 45%|████▍ | 4456/10000 [16:15:03<19:47:16, 12.85s/it] {'loss': 0.0057, 'learning_rate': 2.778e-05, 'epoch': 1.68} 45%|████▍ | 4456/10000 [16:15:03<19:47:16, 12.85s/it] 45%|████▍ | 4457/10000 [16:15:16<19:47:47, 12.86s/it] {'loss': 0.0058, 'learning_rate': 2.7775e-05, 'epoch': 1.68} 45%|████▍ | 4457/10000 [16:15:16<19:47:47, 12.86s/it] 45%|████▍ | 4458/10000 [16:15:29<19:48:05, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.777e-05, 'epoch': 1.68} 45%|████▍ | 4458/10000 [16:15:29<19:48:05, 12.86s/it] 45%|████▍ | 4459/10000 [16:15:42<19:48:14, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.7765000000000003e-05, 'epoch': 1.68} 45%|████▍ | 4459/10000 [16:15:42<19:48:14, 12.87s/it] 45%|████▍ | 4460/10000 [16:15:54<19:47:24, 12.86s/it] {'loss': 0.0056, 'learning_rate': 2.7760000000000002e-05, 'epoch': 1.68} 45%|████▍ | 4460/10000 [16:15:54<19:47:24, 12.86s/it] 45%|████▍ | 4461/10000 [16:16:07<19:47:09, 12.86s/it] {'loss': 0.0057, 'learning_rate': 2.7755000000000004e-05, 'epoch': 1.68} 45%|████▍ | 4461/10000 [16:16:07<19:47:09, 12.86s/it] 45%|████▍ | 4462/10000 [16:16:20<19:48:18, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.7750000000000004e-05, 'epoch': 1.68} 45%|████▍ | 4462/10000 [16:16:20<19:48:18, 12.87s/it] 45%|████▍ | 4463/10000 [16:16:33<19:46:15, 12.85s/it] {'loss': 0.0055, 'learning_rate': 
2.7745e-05, 'epoch': 1.68} 45%|████▍ | 4463/10000 [16:16:33<19:46:15, 12.85s/it] 45%|████▍ | 4464/10000 [16:16:46<19:47:18, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.774e-05, 'epoch': 1.68} 45%|████▍ | 4464/10000 [16:16:46<19:47:18, 12.87s/it] 45%|████▍ | 4465/10000 [16:16:59<19:48:18, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.7735e-05, 'epoch': 1.68} 45%|████▍ | 4465/10000 [16:16:59<19:48:18, 12.88s/it] 45%|████▍ | 4466/10000 [16:17:12<19:49:35, 12.90s/it] {'loss': 0.0057, 'learning_rate': 2.773e-05, 'epoch': 1.68} 45%|████▍ | 4466/10000 [16:17:12<19:49:35, 12.90s/it] 45%|████▍ | 4467/10000 [16:17:25<19:49:40, 12.90s/it] {'loss': 0.0058, 'learning_rate': 2.7725e-05, 'epoch': 1.68} 45%|████▍ | 4467/10000 [16:17:25<19:49:40, 12.90s/it] 45%|████▍ | 4468/10000 [16:17:38<19:50:28, 12.91s/it] {'loss': 0.0056, 'learning_rate': 2.7720000000000002e-05, 'epoch': 1.68} 45%|████▍ | 4468/10000 [16:17:38<19:50:28, 12.91s/it] 45%|████▍ | 4469/10000 [16:17:51<19:51:11, 12.92s/it] {'loss': 0.0042, 'learning_rate': 2.7715e-05, 'epoch': 1.68} 45%|████▍ | 4469/10000 [16:17:51<19:51:11, 12.92s/it] 45%|████▍ | 4470/10000 [16:18:03<19:52:47, 12.94s/it] {'loss': 0.0037, 'learning_rate': 2.7710000000000004e-05, 'epoch': 1.68} 45%|████▍ | 4470/10000 [16:18:04<19:52:47, 12.94s/it] 45%|████▍ | 4471/10000 [16:18:16<19:53:19, 12.95s/it] {'loss': 0.0055, 'learning_rate': 2.7705000000000003e-05, 'epoch': 1.68} 45%|████▍ | 4471/10000 [16:18:17<19:53:19, 12.95s/it] 45%|████▍ | 4472/10000 [16:18:29<19:53:07, 12.95s/it] {'loss': 0.004, 'learning_rate': 2.7700000000000002e-05, 'epoch': 1.69} 45%|████▍ | 4472/10000 [16:18:29<19:53:07, 12.95s/it] 45%|████▍ | 4473/10000 [16:18:42<19:53:43, 12.96s/it] {'loss': 0.0074, 'learning_rate': 2.7694999999999998e-05, 'epoch': 1.69} 45%|████▍ | 4473/10000 [16:18:42<19:53:43, 12.96s/it] 45%|████▍ | 4474/10000 [16:18:55<19:53:31, 12.96s/it] {'loss': 0.0048, 'learning_rate': 2.769e-05, 'epoch': 1.69} 45%|████▍ | 4474/10000 [16:18:55<19:53:31, 12.96s/it] 
45%|████▍ | 4475/10000 [16:19:08<19:52:39, 12.95s/it] {'loss': 0.0042, 'learning_rate': 2.7685e-05, 'epoch': 1.69} 45%|████▍ | 4475/10000 [16:19:08<19:52:39, 12.95s/it] 45%|████▍ | 4476/10000 [16:19:21<19:52:30, 12.95s/it] {'loss': 0.0052, 'learning_rate': 2.768e-05, 'epoch': 1.69} 45%|████▍ | 4476/10000 [16:19:21<19:52:30, 12.95s/it] 45%|████▍ | 4477/10000 [16:19:34<19:52:45, 12.96s/it] {'loss': 0.0044, 'learning_rate': 2.7675000000000002e-05, 'epoch': 1.69} 45%|████▍ | 4477/10000 [16:19:34<19:52:45, 12.96s/it] 45%|████▍ | 4478/10000 [16:19:47<19:51:38, 12.95s/it] {'loss': 0.0052, 'learning_rate': 2.767e-05, 'epoch': 1.69} 45%|████▍ | 4478/10000 [16:19:47<19:51:38, 12.95s/it] 45%|████▍ | 4479/10000 [16:20:00<19:49:59, 12.93s/it] {'loss': 0.0048, 'learning_rate': 2.7665000000000004e-05, 'epoch': 1.69} 45%|████▍ | 4479/10000 [16:20:00<19:49:59, 12.93s/it] 45%|████▍ | 4480/10000 [16:20:13<19:49:54, 12.93s/it] {'loss': 0.0036, 'learning_rate': 2.7660000000000003e-05, 'epoch': 1.69} 45%|████▍ | 4480/10000 [16:20:13<19:49:54, 12.93s/it] 45%|████▍ | 4481/10000 [16:20:26<19:49:42, 12.93s/it] {'loss': 0.0062, 'learning_rate': 2.7655000000000002e-05, 'epoch': 1.69} 45%|████▍ | 4481/10000 [16:20:26<19:49:42, 12.93s/it] 45%|████▍ | 4482/10000 [16:20:39<19:51:54, 12.96s/it] {'loss': 0.0047, 'learning_rate': 2.7650000000000005e-05, 'epoch': 1.69} 45%|████▍ | 4482/10000 [16:20:39<19:51:54, 12.96s/it] 45%|████▍ | 4483/10000 [16:20:52<19:48:51, 12.93s/it] {'loss': 0.0046, 'learning_rate': 2.7644999999999997e-05, 'epoch': 1.69} 45%|████▍ | 4483/10000 [16:20:52<19:48:51, 12.93s/it] 45%|████▍ | 4484/10000 [16:21:05<19:48:32, 12.93s/it] {'loss': 0.0054, 'learning_rate': 2.764e-05, 'epoch': 1.69} 45%|████▍ | 4484/10000 [16:21:05<19:48:32, 12.93s/it] 45%|████▍ | 4485/10000 [16:21:18<19:48:15, 12.93s/it] {'loss': 0.0045, 'learning_rate': 2.7635e-05, 'epoch': 1.69} 45%|████▍ | 4485/10000 [16:21:18<19:48:15, 12.93s/it] 45%|████▍ | 4486/10000 [16:21:31<19:48:58, 12.94s/it] {'loss': 0.0055, 
'learning_rate': 2.763e-05, 'epoch': 1.69} 45%|████▍ | 4486/10000 [16:21:31<19:48:58, 12.94s/it] 45%|████▍ | 4487/10000 [16:21:43<19:47:12, 12.92s/it] {'loss': 0.006, 'learning_rate': 2.7625e-05, 'epoch': 1.69} 45%|████▍ | 4487/10000 [16:21:44<19:47:12, 12.92s/it] 45%|████▍ | 4488/10000 [16:21:56<19:48:30, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.762e-05, 'epoch': 1.69} 45%|████▍ | 4488/10000 [16:21:56<19:48:30, 12.94s/it] 45%|████▍ | 4489/10000 [16:22:09<19:47:56, 12.93s/it] {'loss': 0.0049, 'learning_rate': 2.7615000000000002e-05, 'epoch': 1.69} 45%|████▍ | 4489/10000 [16:22:09<19:47:56, 12.93s/it] 45%|████▍ | 4490/10000 [16:22:22<19:49:39, 12.95s/it] {'loss': 0.0055, 'learning_rate': 2.761e-05, 'epoch': 1.69} 45%|████▍ | 4490/10000 [16:22:22<19:49:39, 12.95s/it] 45%|████▍ | 4491/10000 [16:22:35<19:46:33, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.7605000000000004e-05, 'epoch': 1.69} 45%|████▍ | 4491/10000 [16:22:35<19:46:33, 12.92s/it] 45%|████▍ | 4492/10000 [16:22:48<19:45:29, 12.91s/it] {'loss': 0.0045, 'learning_rate': 2.7600000000000003e-05, 'epoch': 1.69} 45%|████▍ | 4492/10000 [16:22:48<19:45:29, 12.91s/it] 45%|████▍ | 4493/10000 [16:23:01<19:45:58, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.7595e-05, 'epoch': 1.69} 45%|████▍ | 4493/10000 [16:23:01<19:45:58, 12.92s/it] 45%|████▍ | 4494/10000 [16:23:14<19:47:18, 12.94s/it] {'loss': 0.0053, 'learning_rate': 2.759e-05, 'epoch': 1.69} 45%|████▍ | 4494/10000 [16:23:14<19:47:18, 12.94s/it] 45%|████▍ | 4495/10000 [16:23:27<19:48:30, 12.95s/it] {'loss': 0.0047, 'learning_rate': 2.7585e-05, 'epoch': 1.69} 45%|████▍ | 4495/10000 [16:23:27<19:48:30, 12.95s/it] 45%|████▍ | 4496/10000 [16:23:40<19:47:19, 12.94s/it] {'loss': 0.0038, 'learning_rate': 2.758e-05, 'epoch': 1.69} 45%|████▍ | 4496/10000 [16:23:40<19:47:19, 12.94s/it] 45%|████▍ | 4497/10000 [16:23:53<19:47:19, 12.95s/it] {'loss': 0.0044, 'learning_rate': 2.7575e-05, 'epoch': 1.69} 45%|████▍ | 4497/10000 [16:23:53<19:47:19, 12.95s/it] 45%|████▍ | 
4498/10000 [16:24:06<19:46:37, 12.94s/it] {'loss': 0.006, 'learning_rate': 2.7570000000000002e-05, 'epoch': 1.69} 45%|████▍ | 4498/10000 [16:24:06<19:46:37, 12.94s/it] 45%|████▍ | 4499/10000 [16:24:19<19:46:14, 12.94s/it] {'loss': 0.004, 'learning_rate': 2.7565e-05, 'epoch': 1.7} 45%|████▍ | 4499/10000 [16:24:19<19:46:14, 12.94s/it] 45%|████▌ | 4500/10000 [16:24:32<19:46:20, 12.94s/it] {'loss': 0.0037, 'learning_rate': 2.7560000000000004e-05, 'epoch': 1.7} 45%|████▌ | 4500/10000 [16:24:32<19:46:20, 12.94s/it] 45%|████▌ | 4501/10000 [16:24:45<19:45:35, 12.94s/it] {'loss': 0.0036, 'learning_rate': 2.7555000000000003e-05, 'epoch': 1.7} 45%|████▌ | 4501/10000 [16:24:45<19:45:35, 12.94s/it] 45%|████▌ | 4502/10000 [16:24:57<19:43:23, 12.91s/it] {'loss': 0.0052, 'learning_rate': 2.7550000000000002e-05, 'epoch': 1.7} 45%|████▌ | 4502/10000 [16:24:58<19:43:23, 12.91s/it] 45%|████▌ | 4503/10000 [16:25:10<19:43:38, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.7544999999999998e-05, 'epoch': 1.7} 45%|████▌ | 4503/10000 [16:25:10<19:43:38, 12.92s/it] 45%|████▌ | 4504/10000 [16:25:23<19:43:21, 12.92s/it] {'loss': 0.0048, 'learning_rate': 2.754e-05, 'epoch': 1.7} 45%|████▌ | 4504/10000 [16:25:23<19:43:21, 12.92s/it] 45%|████▌ | 4505/10000 [16:25:36<19:41:03, 12.90s/it] {'loss': 0.0048, 'learning_rate': 2.7535e-05, 'epoch': 1.7} 45%|████▌ | 4505/10000 [16:25:36<19:41:03, 12.90s/it] 45%|████▌ | 4506/10000 [16:25:49<19:41:51, 12.91s/it] {'loss': 0.0047, 'learning_rate': 2.753e-05, 'epoch': 1.7} 45%|████▌ | 4506/10000 [16:25:49<19:41:51, 12.91s/it] 45%|████▌ | 4507/10000 [16:26:02<19:41:06, 12.90s/it] {'loss': 0.0057, 'learning_rate': 2.7525e-05, 'epoch': 1.7} 45%|████▌ | 4507/10000 [16:26:02<19:41:06, 12.90s/it] 45%|████▌ | 4508/10000 [16:26:15<19:41:49, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.752e-05, 'epoch': 1.7} 45%|████▌ | 4508/10000 [16:26:15<19:41:49, 12.91s/it] 45%|████▌ | 4509/10000 [16:26:28<19:41:39, 12.91s/it] {'loss': 0.0056, 'learning_rate': 
2.7515000000000003e-05, 'epoch': 1.7} 45%|████▌ | 4509/10000 [16:26:28<19:41:39, 12.91s/it] 45%|████▌ | 4510/10000 [16:26:41<19:40:59, 12.91s/it] {'loss': 0.0036, 'learning_rate': 2.7510000000000003e-05, 'epoch': 1.7} 45%|████▌ | 4510/10000 [16:26:41<19:40:59, 12.91s/it] 45%|████▌ | 4511/10000 [16:26:54<19:41:08, 12.91s/it] {'loss': 0.0063, 'learning_rate': 2.7505000000000002e-05, 'epoch': 1.7} 45%|████▌ | 4511/10000 [16:26:54<19:41:08, 12.91s/it] 45%|████▌ | 4512/10000 [16:27:07<19:39:44, 12.90s/it] {'loss': 0.0042, 'learning_rate': 2.7500000000000004e-05, 'epoch': 1.7} 45%|████▌ | 4512/10000 [16:27:07<19:39:44, 12.90s/it] 45%|████▌ | 4513/10000 [16:27:19<19:40:47, 12.91s/it] {'loss': 0.0061, 'learning_rate': 2.7495000000000004e-05, 'epoch': 1.7} 45%|████▌ | 4513/10000 [16:27:20<19:40:47, 12.91s/it] 45%|████▌ | 4514/10000 [16:27:32<19:40:55, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.749e-05, 'epoch': 1.7} 45%|████▌ | 4514/10000 [16:27:32<19:40:55, 12.92s/it] 45%|████▌ | 4515/10000 [16:27:45<19:38:34, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.7485e-05, 'epoch': 1.7} 45%|████▌ | 4515/10000 [16:27:45<19:38:34, 12.89s/it] 45%|████▌ | 4516/10000 [16:27:58<19:39:00, 12.90s/it] {'loss': 0.0043, 'learning_rate': 2.748e-05, 'epoch': 1.7} 45%|████▌ | 4516/10000 [16:27:58<19:39:00, 12.90s/it] 45%|████▌ | 4517/10000 [16:28:11<19:40:26, 12.92s/it] {'loss': 0.0062, 'learning_rate': 2.7475e-05, 'epoch': 1.7} 45%|████▌ | 4517/10000 [16:28:11<19:40:26, 12.92s/it] 45%|████▌ | 4518/10000 [16:28:24<19:40:52, 12.92s/it] {'loss': 0.0042, 'learning_rate': 2.7470000000000003e-05, 'epoch': 1.7} 45%|████▌ | 4518/10000 [16:28:24<19:40:52, 12.92s/it] 45%|████▌ | 4519/10000 [16:28:37<19:40:53, 12.93s/it] {'loss': 0.0042, 'learning_rate': 2.7465000000000002e-05, 'epoch': 1.7} 45%|████▌ | 4519/10000 [16:28:37<19:40:53, 12.93s/it] 45%|████▌ | 4520/10000 [16:28:50<19:40:22, 12.92s/it] {'loss': 0.005, 'learning_rate': 2.746e-05, 'epoch': 1.7} 45%|████▌ | 4520/10000 [16:28:50<19:40:22, 
12.92s/it] 45%|████▌ | 4521/10000 [16:29:03<19:41:02, 12.93s/it] {'loss': 0.0061, 'learning_rate': 2.7455000000000004e-05, 'epoch': 1.7} 45%|████▌ | 4521/10000 [16:29:03<19:41:02, 12.93s/it] 45%|████▌ | 4522/10000 [16:29:16<19:40:29, 12.93s/it] {'loss': 0.005, 'learning_rate': 2.7450000000000003e-05, 'epoch': 1.7} 45%|████▌ | 4522/10000 [16:29:16<19:40:29, 12.93s/it] 45%|████▌ | 4523/10000 [16:29:29<19:40:45, 12.94s/it] {'loss': 0.0055, 'learning_rate': 2.7445000000000002e-05, 'epoch': 1.7} 45%|████▌ | 4523/10000 [16:29:29<19:40:45, 12.94s/it] 45%|████▌ | 4524/10000 [16:29:42<19:39:45, 12.93s/it] {'loss': 0.0044, 'learning_rate': 2.7439999999999998e-05, 'epoch': 1.7} 45%|████▌ | 4524/10000 [16:29:42<19:39:45, 12.93s/it] 45%|████▌ | 4525/10000 [16:29:55<19:40:37, 12.94s/it] {'loss': 0.0039, 'learning_rate': 2.7435e-05, 'epoch': 1.7} 45%|████▌ | 4525/10000 [16:29:55<19:40:37, 12.94s/it] 45%|████▌ | 4526/10000 [16:30:07<19:39:05, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.743e-05, 'epoch': 1.71} 45%|████▌ | 4526/10000 [16:30:08<19:39:05, 12.92s/it] 45%|████▌ | 4527/10000 [16:30:20<19:36:48, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.7425e-05, 'epoch': 1.71} 45%|████▌ | 4527/10000 [16:30:20<19:36:48, 12.90s/it] 45%|████▌ | 4528/10000 [16:30:33<19:36:24, 12.90s/it] {'loss': 0.0046, 'learning_rate': 2.7420000000000002e-05, 'epoch': 1.71} 45%|████▌ | 4528/10000 [16:30:33<19:36:24, 12.90s/it] 45%|████▌ | 4529/10000 [16:30:46<19:36:26, 12.90s/it] {'loss': 0.0057, 'learning_rate': 2.7415e-05, 'epoch': 1.71} 45%|████▌ | 4529/10000 [16:30:46<19:36:26, 12.90s/it] 45%|████▌ | 4530/10000 [16:30:59<19:38:23, 12.93s/it] {'loss': 0.0039, 'learning_rate': 2.7410000000000004e-05, 'epoch': 1.71} 45%|████▌ | 4530/10000 [16:30:59<19:38:23, 12.93s/it] 45%|████▌ | 4531/10000 [16:31:12<19:37:54, 12.92s/it] {'loss': 0.0047, 'learning_rate': 2.7405000000000003e-05, 'epoch': 1.71} 45%|████▌ | 4531/10000 [16:31:12<19:37:54, 12.92s/it] 45%|████▌ | 4532/10000 [16:31:25<19:39:03, 
12.94s/it] {'loss': 0.0057, 'learning_rate': 2.7400000000000002e-05, 'epoch': 1.71} 45%|████▌ | 4532/10000 [16:31:25<19:39:03, 12.94s/it] 45%|████▌ | 4533/10000 [16:31:38<19:37:37, 12.92s/it] {'loss': 0.0053, 'learning_rate': 2.7395000000000005e-05, 'epoch': 1.71} 45%|████▌ | 4533/10000 [16:31:38<19:37:37, 12.92s/it] 45%|████▌ | 4534/10000 [16:31:51<19:36:56, 12.92s/it] {'loss': 0.0058, 'learning_rate': 2.739e-05, 'epoch': 1.71} 45%|████▌ | 4534/10000 [16:31:51<19:36:56, 12.92s/it] 45%|████▌ | 4535/10000 [16:32:04<19:37:35, 12.93s/it] {'loss': 0.0061, 'learning_rate': 2.7385e-05, 'epoch': 1.71} 45%|████▌ | 4535/10000 [16:32:04<19:37:35, 12.93s/it] 45%|████▌ | 4536/10000 [16:32:17<19:37:03, 12.93s/it] {'loss': 0.0039, 'learning_rate': 2.738e-05, 'epoch': 1.71} 45%|████▌ | 4536/10000 [16:32:17<19:37:03, 12.93s/it] 45%|████▌ | 4537/10000 [16:32:30<19:36:20, 12.92s/it] {'loss': 0.0049, 'learning_rate': 2.7375e-05, 'epoch': 1.71} 45%|████▌ | 4537/10000 [16:32:30<19:36:20, 12.92s/it] 45%|████▌ | 4538/10000 [16:32:43<19:36:01, 12.92s/it] {'loss': 0.0052, 'learning_rate': 2.737e-05, 'epoch': 1.71} 45%|████▌ | 4538/10000 [16:32:43<19:36:01, 12.92s/it] 45%|████▌ | 4539/10000 [16:32:56<19:38:21, 12.95s/it] {'loss': 0.0035, 'learning_rate': 2.7365000000000003e-05, 'epoch': 1.71} 45%|████▌ | 4539/10000 [16:32:56<19:38:21, 12.95s/it] 45%|████▌ | 4540/10000 [16:33:09<19:39:10, 12.96s/it] {'loss': 0.0045, 'learning_rate': 2.7360000000000002e-05, 'epoch': 1.71} 45%|████▌ | 4540/10000 [16:33:09<19:39:10, 12.96s/it] 45%|████▌ | 4541/10000 [16:33:21<19:38:40, 12.95s/it] {'loss': 0.0048, 'learning_rate': 2.7355e-05, 'epoch': 1.71} 45%|████▌ | 4541/10000 [16:33:21<19:38:40, 12.95s/it] 45%|████▌ | 4542/10000 [16:33:34<19:37:10, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.7350000000000004e-05, 'epoch': 1.71} 45%|████▌ | 4542/10000 [16:33:34<19:37:10, 12.94s/it] 45%|████▌ | 4543/10000 [16:33:47<19:34:36, 12.91s/it] {'loss': 0.006, 'learning_rate': 2.7345000000000003e-05, 'epoch': 1.71} 
45%|████▌ | 4543/10000 [16:33:47<19:34:36, 12.91s/it] 45%|████▌ | 4544/10000 [16:34:00<19:35:13, 12.92s/it] {'loss': 0.0086, 'learning_rate': 2.734e-05, 'epoch': 1.71} 45%|████▌ | 4544/10000 [16:34:00<19:35:13, 12.92s/it] 45%|████▌ | 4545/10000 [16:34:13<19:35:33, 12.93s/it] {'loss': 0.0047, 'learning_rate': 2.7335e-05, 'epoch': 1.71} 45%|████▌ | 4545/10000 [16:34:13<19:35:33, 12.93s/it] 45%|████▌ | 4546/10000 [16:34:26<19:34:24, 12.92s/it] {'loss': 0.0059, 'learning_rate': 2.733e-05, 'epoch': 1.71} 45%|████▌ | 4546/10000 [16:34:26<19:34:24, 12.92s/it] 45%|████▌ | 4547/10000 [16:34:39<19:37:15, 12.95s/it] {'loss': 0.0042, 'learning_rate': 2.7325e-05, 'epoch': 1.71} 45%|████▌ | 4547/10000 [16:34:39<19:37:15, 12.95s/it] 45%|████▌ | 4548/10000 [16:34:52<19:37:31, 12.96s/it] {'loss': 0.0043, 'learning_rate': 2.7320000000000003e-05, 'epoch': 1.71} 45%|████▌ | 4548/10000 [16:34:52<19:37:31, 12.96s/it] 45%|████▌ | 4549/10000 [16:35:05<19:38:35, 12.97s/it] {'loss': 0.0054, 'learning_rate': 2.7315000000000002e-05, 'epoch': 1.71} 45%|████▌ | 4549/10000 [16:35:05<19:38:35, 12.97s/it] 46%|████▌ | 4550/10000 [16:35:18<19:36:30, 12.95s/it] {'loss': 0.0043, 'learning_rate': 2.731e-05, 'epoch': 1.71} 46%|████▌ | 4550/10000 [16:35:18<19:36:30, 12.95s/it] 46%|████▌ | 4551/10000 [16:35:31<19:37:27, 12.97s/it] {'loss': 0.006, 'learning_rate': 2.7305000000000004e-05, 'epoch': 1.71} 46%|████▌ | 4551/10000 [16:35:31<19:37:27, 12.97s/it] 46%|████▌ | 4552/10000 [16:35:44<19:38:56, 12.98s/it] {'loss': 0.0045, 'learning_rate': 2.7300000000000003e-05, 'epoch': 1.72} 46%|████▌ | 4552/10000 [16:35:44<19:38:56, 12.98s/it] 46%|████▌ | 4553/10000 [16:35:57<19:33:49, 12.93s/it] {'loss': 0.0067, 'learning_rate': 2.7295000000000005e-05, 'epoch': 1.72} 46%|████▌ | 4553/10000 [16:35:57<19:33:49, 12.93s/it] 46%|████▌ | 4554/10000 [16:36:10<19:30:49, 12.90s/it] {'loss': 0.0047, 'learning_rate': 2.7289999999999998e-05, 'epoch': 1.72} 46%|████▌ | 4554/10000 [16:36:10<19:30:49, 12.90s/it] 46%|████▌ | 
4555/10000 [16:36:22<19:28:37, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.7285e-05, 'epoch': 1.72} 46%|████▌ | 4555/10000 [16:36:22<19:28:37, 12.88s/it] 46%|████▌ | 4556/10000 [16:36:35<19:29:37, 12.89s/it] {'loss': 0.0058, 'learning_rate': 2.728e-05, 'epoch': 1.72} 46%|████▌ | 4556/10000 [16:36:35<19:29:37, 12.89s/it] 46%|████▌ | 4557/10000 [16:36:48<19:28:21, 12.88s/it] {'loss': 0.0055, 'learning_rate': 2.7275e-05, 'epoch': 1.72} 46%|████▌ | 4557/10000 [16:36:48<19:28:21, 12.88s/it] 46%|████▌ | 4558/10000 [16:37:01<19:27:50, 12.88s/it] {'loss': 0.0056, 'learning_rate': 2.727e-05, 'epoch': 1.72} 46%|████▌ | 4558/10000 [16:37:01<19:27:50, 12.88s/it] 46%|████▌ | 4559/10000 [16:37:14<19:27:46, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.7265e-05, 'epoch': 1.72} 46%|████▌ | 4559/10000 [16:37:14<19:27:46, 12.88s/it] 46%|████▌ | 4560/10000 [16:37:27<19:26:29, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.7260000000000003e-05, 'epoch': 1.72} 46%|████▌ | 4560/10000 [16:37:27<19:26:29, 12.87s/it] 46%|████▌ | 4561/10000 [16:37:40<19:25:56, 12.86s/it] {'loss': 0.0056, 'learning_rate': 2.7255000000000002e-05, 'epoch': 1.72} 46%|████▌ | 4561/10000 [16:37:40<19:25:56, 12.86s/it] 46%|████▌ | 4562/10000 [16:37:52<19:26:05, 12.87s/it] {'loss': 0.0058, 'learning_rate': 2.725e-05, 'epoch': 1.72} 46%|████▌ | 4562/10000 [16:37:53<19:26:05, 12.87s/it] 46%|████▌ | 4563/10000 [16:38:05<19:26:36, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.7245000000000004e-05, 'epoch': 1.72} 46%|████▌ | 4563/10000 [16:38:05<19:26:36, 12.87s/it] 46%|████▌ | 4564/10000 [16:38:18<19:25:32, 12.86s/it] {'loss': 0.0065, 'learning_rate': 2.724e-05, 'epoch': 1.72} 46%|████▌ | 4564/10000 [16:38:18<19:25:32, 12.86s/it] 46%|████▌ | 4565/10000 [16:38:31<19:26:28, 12.88s/it] {'loss': 0.0054, 'learning_rate': 2.7235e-05, 'epoch': 1.72} 46%|████▌ | 4565/10000 [16:38:31<19:26:28, 12.88s/it] 46%|████▌ | 4566/10000 [16:38:44<19:26:47, 12.88s/it] {'loss': 0.004, 'learning_rate': 2.723e-05, 'epoch': 1.72} 
46%|████▌ | 4566/10000 [16:38:44<19:26:47, 12.88s/it] 46%|████▌ | 4567/10000 [16:38:57<19:26:42, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.7225e-05, 'epoch': 1.72} 46%|████▌ | 4567/10000 [16:38:57<19:26:42, 12.88s/it] 46%|████▌ | 4568/10000 [16:39:10<19:25:20, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.722e-05, 'epoch': 1.72} 46%|████▌ | 4568/10000 [16:39:10<19:25:20, 12.87s/it] 46%|████▌ | 4569/10000 [16:39:23<19:24:31, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.7215000000000003e-05, 'epoch': 1.72} 46%|████▌ | 4569/10000 [16:39:23<19:24:31, 12.87s/it] 46%|████▌ | 4570/10000 [16:39:35<19:24:16, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.7210000000000002e-05, 'epoch': 1.72} 46%|████▌ | 4570/10000 [16:39:35<19:24:16, 12.86s/it] 46%|████▌ | 4571/10000 [16:39:48<19:25:39, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.7205e-05, 'epoch': 1.72} 46%|████▌ | 4571/10000 [16:39:48<19:25:39, 12.88s/it] 46%|████▌ | 4572/10000 [16:40:01<19:24:19, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.7200000000000004e-05, 'epoch': 1.72} 46%|████▌ | 4572/10000 [16:40:01<19:24:19, 12.87s/it] 46%|████▌ | 4573/10000 [16:40:14<19:24:07, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.7195000000000003e-05, 'epoch': 1.72} 46%|████▌ | 4573/10000 [16:40:14<19:24:07, 12.87s/it] 46%|████▌ | 4574/10000 [16:40:27<19:24:16, 12.87s/it] {'loss': 0.0061, 'learning_rate': 2.719e-05, 'epoch': 1.72} 46%|████▌ | 4574/10000 [16:40:27<19:24:16, 12.87s/it] 46%|████▌ | 4575/10000 [16:40:40<19:23:20, 12.87s/it] {'loss': 0.0031, 'learning_rate': 2.7184999999999998e-05, 'epoch': 1.72} 46%|████▌ | 4575/10000 [16:40:40<19:23:20, 12.87s/it] 46%|████▌ | 4576/10000 [16:40:53<19:23:52, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.718e-05, 'epoch': 1.72} 46%|████▌ | 4576/10000 [16:40:53<19:23:52, 12.87s/it] 46%|████▌ | 4577/10000 [16:41:06<19:23:38, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.7175e-05, 'epoch': 1.72} 46%|████▌ | 4577/10000 [16:41:06<19:23:38, 12.87s/it] 46%|████▌ | 4578/10000 
[16:41:18<19:23:09, 12.87s/it] {'loss': 0.0062, 'learning_rate': 2.7170000000000002e-05, 'epoch': 1.72} 46%|████▌ | 4578/10000 [16:41:18<19:23:09, 12.87s/it] 46%|████▌ | 4579/10000 [16:41:31<19:21:06, 12.85s/it] {'loss': 0.006, 'learning_rate': 2.7165e-05, 'epoch': 1.73} 46%|████▌ | 4579/10000 [16:41:31<19:21:06, 12.85s/it] 46%|████▌ | 4580/10000 [16:41:44<19:19:20, 12.83s/it] {'loss': 0.0076, 'learning_rate': 2.716e-05, 'epoch': 1.73} 46%|████▌ | 4580/10000 [16:41:44<19:19:20, 12.83s/it] 46%|████▌ | 4581/10000 [16:41:57<19:19:46, 12.84s/it] {'loss': 0.0039, 'learning_rate': 2.7155000000000003e-05, 'epoch': 1.73} 46%|████▌ | 4581/10000 [16:41:57<19:19:46, 12.84s/it] 46%|████▌ | 4582/10000 [16:42:10<19:18:52, 12.83s/it] {'loss': 0.0043, 'learning_rate': 2.7150000000000003e-05, 'epoch': 1.73} 46%|████▌ | 4582/10000 [16:42:10<19:18:52, 12.83s/it] 46%|████▌ | 4583/10000 [16:42:23<19:18:05, 12.83s/it] {'loss': 0.0046, 'learning_rate': 2.7145000000000005e-05, 'epoch': 1.73} 46%|████▌ | 4583/10000 [16:42:23<19:18:05, 12.83s/it] 46%|████▌ | 4584/10000 [16:42:35<19:20:02, 12.85s/it] {'loss': 0.0051, 'learning_rate': 2.7139999999999998e-05, 'epoch': 1.73} 46%|████▌ | 4584/10000 [16:42:35<19:20:02, 12.85s/it] 46%|████▌ | 4585/10000 [16:42:48<19:20:11, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.7135e-05, 'epoch': 1.73} 46%|████▌ | 4585/10000 [16:42:48<19:20:11, 12.86s/it] 46%|████▌ | 4586/10000 [16:43:01<19:21:43, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.713e-05, 'epoch': 1.73} 46%|████▌ | 4586/10000 [16:43:01<19:21:43, 12.87s/it] 46%|████▌ | 4587/10000 [16:43:14<19:19:49, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.7125000000000002e-05, 'epoch': 1.73} 46%|████▌ | 4587/10000 [16:43:14<19:19:49, 12.86s/it] 46%|████▌ | 4588/10000 [16:43:27<19:19:15, 12.85s/it] {'loss': 0.0055, 'learning_rate': 2.712e-05, 'epoch': 1.73} 46%|████▌ | 4588/10000 [16:43:27<19:19:15, 12.85s/it] 46%|████▌ | 4589/10000 [16:43:40<19:19:15, 12.85s/it] {'loss': 0.0057, 'learning_rate': 
2.7115e-05, 'epoch': 1.73} 46%|████▌ | 4589/10000 [16:43:40<19:19:15, 12.85s/it] 46%|████▌ | 4590/10000 [16:43:53<19:17:41, 12.84s/it] {'loss': 0.0049, 'learning_rate': 2.7110000000000003e-05, 'epoch': 1.73} 46%|████▌ | 4590/10000 [16:43:53<19:17:41, 12.84s/it] 46%|████▌ | 4591/10000 [16:44:05<19:16:30, 12.83s/it] {'loss': 0.0048, 'learning_rate': 2.7105000000000002e-05, 'epoch': 1.73} 46%|████▌ | 4591/10000 [16:44:05<19:16:30, 12.83s/it] 46%|████▌ | 4592/10000 [16:44:18<19:15:57, 12.82s/it] {'loss': 0.0045, 'learning_rate': 2.7100000000000005e-05, 'epoch': 1.73} 46%|████▌ | 4592/10000 [16:44:18<19:15:57, 12.82s/it] 46%|████▌ | 4593/10000 [16:44:31<19:16:21, 12.83s/it] {'loss': 0.0053, 'learning_rate': 2.7095000000000004e-05, 'epoch': 1.73} 46%|████▌ | 4593/10000 [16:44:31<19:16:21, 12.83s/it] 46%|████▌ | 4594/10000 [16:44:44<19:16:52, 12.84s/it] {'loss': 0.0082, 'learning_rate': 2.709e-05, 'epoch': 1.73} 46%|████▌ | 4594/10000 [16:44:44<19:16:52, 12.84s/it] 46%|████▌ | 4595/10000 [16:44:57<19:17:08, 12.85s/it] {'loss': 0.0054, 'learning_rate': 2.7085e-05, 'epoch': 1.73} 46%|████▌ | 4595/10000 [16:44:57<19:17:08, 12.85s/it] 46%|████▌ | 4596/10000 [16:45:10<19:16:36, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.7079999999999998e-05, 'epoch': 1.73} 46%|████▌ | 4596/10000 [16:45:10<19:16:36, 12.84s/it] 46%|████▌ | 4597/10000 [16:45:22<19:17:31, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.7075e-05, 'epoch': 1.73} 46%|████▌ | 4597/10000 [16:45:22<19:17:31, 12.85s/it] 46%|████▌ | 4598/10000 [16:45:35<19:17:33, 12.86s/it] {'loss': 0.007, 'learning_rate': 2.707e-05, 'epoch': 1.73} 46%|████▌ | 4598/10000 [16:45:35<19:17:33, 12.86s/it] 46%|████▌ | 4599/10000 [16:45:48<19:18:14, 12.87s/it] {'loss': 0.0057, 'learning_rate': 2.7065000000000003e-05, 'epoch': 1.73} 46%|████▌ | 4599/10000 [16:45:48<19:18:14, 12.87s/it] 46%|████▌ | 4600/10000 [16:46:01<19:16:45, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.7060000000000002e-05, 'epoch': 1.73} 46%|████▌ | 4600/10000 
[16:46:01<19:16:45, 12.85s/it] 46%|████▌ | 4601/10000 [16:46:14<19:15:02, 12.84s/it] {'loss': 0.0068, 'learning_rate': 2.7055e-05, 'epoch': 1.73} 46%|████▌ | 4601/10000 [16:46:14<19:15:02, 12.84s/it] 46%|████▌ | 4602/10000 [16:46:27<19:16:00, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.7050000000000004e-05, 'epoch': 1.73} 46%|████▌ | 4602/10000 [16:46:27<19:16:00, 12.85s/it] 46%|████▌ | 4603/10000 [16:46:40<19:17:48, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.7045000000000003e-05, 'epoch': 1.73} 46%|████▌ | 4603/10000 [16:46:40<19:17:48, 12.87s/it] 46%|████▌ | 4604/10000 [16:46:52<19:15:30, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.704e-05, 'epoch': 1.73} 46%|████▌ | 4604/10000 [16:46:52<19:15:30, 12.85s/it] 46%|████▌ | 4605/10000 [16:47:05<19:15:32, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.7034999999999998e-05, 'epoch': 1.74} 46%|████▌ | 4605/10000 [16:47:05<19:15:32, 12.85s/it] 46%|████▌ | 4606/10000 [16:47:18<19:15:59, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.703e-05, 'epoch': 1.74} 46%|████▌ | 4606/10000 [16:47:18<19:15:59, 12.86s/it] 46%|████▌ | 4607/10000 [16:47:31<19:15:10, 12.85s/it] {'loss': 0.0055, 'learning_rate': 2.7025e-05, 'epoch': 1.74} 46%|████▌ | 4607/10000 [16:47:31<19:15:10, 12.85s/it] 46%|████▌ | 4608/10000 [16:47:44<19:14:51, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.7020000000000002e-05, 'epoch': 1.74} 46%|████▌ | 4608/10000 [16:47:44<19:14:51, 12.85s/it] 46%|████▌ | 4609/10000 [16:47:57<19:13:48, 12.84s/it] {'loss': 0.0054, 'learning_rate': 2.7015e-05, 'epoch': 1.74} 46%|████▌ | 4609/10000 [16:47:57<19:13:48, 12.84s/it] 46%|████▌ | 4610/10000 [16:48:09<19:11:10, 12.81s/it] {'loss': 0.0046, 'learning_rate': 2.701e-05, 'epoch': 1.74} 46%|████▌ | 4610/10000 [16:48:09<19:11:10, 12.81s/it] 46%|████▌ | 4611/10000 [16:48:22<19:11:58, 12.83s/it] {'loss': 0.0059, 'learning_rate': 2.7005000000000003e-05, 'epoch': 1.74} 46%|████▌ | 4611/10000 [16:48:22<19:11:58, 12.83s/it] 46%|████▌ | 4612/10000 [16:48:35<19:12:42, 
12.84s/it] {'loss': 0.0053, 'learning_rate': 2.7000000000000002e-05, 'epoch': 1.74} 46%|████▌ | 4612/10000 [16:48:35<19:12:42, 12.84s/it] 46%|████▌ | 4613/10000 [16:48:48<19:15:43, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.6995000000000005e-05, 'epoch': 1.74} 46%|████▌ | 4613/10000 [16:48:48<19:15:43, 12.87s/it] 46%|████▌ | 4614/10000 [16:49:01<19:13:58, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.6989999999999997e-05, 'epoch': 1.74} 46%|████▌ | 4614/10000 [16:49:01<19:13:58, 12.86s/it] 46%|████▌ | 4615/10000 [16:49:14<19:16:43, 12.89s/it] {'loss': 0.0055, 'learning_rate': 2.6985e-05, 'epoch': 1.74} 46%|████▌ | 4615/10000 [16:49:14<19:16:43, 12.89s/it] 46%|████▌ | 4616/10000 [16:49:27<19:16:28, 12.89s/it] {'loss': 0.0057, 'learning_rate': 2.698e-05, 'epoch': 1.74} 46%|████▌ | 4616/10000 [16:49:27<19:16:28, 12.89s/it] 46%|████▌ | 4617/10000 [16:49:40<19:16:18, 12.89s/it] {'loss': 0.0056, 'learning_rate': 2.6975000000000002e-05, 'epoch': 1.74} 46%|████▌ | 4617/10000 [16:49:40<19:16:18, 12.89s/it] 46%|████▌ | 4618/10000 [16:49:53<19:19:20, 12.92s/it] {'loss': 0.0047, 'learning_rate': 2.697e-05, 'epoch': 1.74} 46%|████▌ | 4618/10000 [16:49:53<19:19:20, 12.92s/it] 46%|████▌ | 4619/10000 [16:50:06<19:19:01, 12.92s/it] {'loss': 0.0044, 'learning_rate': 2.6965e-05, 'epoch': 1.74} 46%|████▌ | 4619/10000 [16:50:06<19:19:01, 12.92s/it] 46%|████▌ | 4620/10000 [16:50:18<19:17:45, 12.91s/it] {'loss': 0.0047, 'learning_rate': 2.6960000000000003e-05, 'epoch': 1.74} 46%|████▌ | 4620/10000 [16:50:19<19:17:45, 12.91s/it] 46%|████▌ | 4621/10000 [16:50:31<19:20:13, 12.94s/it] {'loss': 0.0039, 'learning_rate': 2.6955000000000002e-05, 'epoch': 1.74} 46%|████▌ | 4621/10000 [16:50:32<19:20:13, 12.94s/it] 46%|████▌ | 4622/10000 [16:50:44<19:19:53, 12.94s/it] {'loss': 0.0043, 'learning_rate': 2.6950000000000005e-05, 'epoch': 1.74} 46%|████▌ | 4622/10000 [16:50:44<19:19:53, 12.94s/it] 46%|████▌ | 4623/10000 [16:50:57<19:18:42, 12.93s/it] {'loss': 0.0046, 'learning_rate': 
2.6945000000000004e-05, 'epoch': 1.74} 46%|████▌ | 4623/10000 [16:50:57<19:18:42, 12.93s/it] 46%|████▌ | 4624/10000 [16:51:10<19:20:06, 12.95s/it] {'loss': 0.0051, 'learning_rate': 2.694e-05, 'epoch': 1.74} 46%|████▌ | 4624/10000 [16:51:10<19:20:06, 12.95s/it] 46%|████▋ | 4625/10000 [16:51:23<19:18:09, 12.93s/it] {'loss': 0.0061, 'learning_rate': 2.6935e-05, 'epoch': 1.74} 46%|████▋ | 4625/10000 [16:51:23<19:18:09, 12.93s/it] 46%|████▋ | 4626/10000 [16:51:36<19:18:10, 12.93s/it] {'loss': 0.0048, 'learning_rate': 2.693e-05, 'epoch': 1.74} 46%|████▋ | 4626/10000 [16:51:36<19:18:10, 12.93s/it] 46%|████▋ | 4627/10000 [16:51:49<19:17:51, 12.93s/it] {'loss': 0.005, 'learning_rate': 2.6925e-05, 'epoch': 1.74} 46%|████▋ | 4627/10000 [16:51:49<19:17:51, 12.93s/it] 46%|████▋ | 4628/10000 [16:52:02<19:17:03, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.692e-05, 'epoch': 1.74} 46%|████▋ | 4628/10000 [16:52:02<19:17:03, 12.92s/it] 46%|████▋ | 4629/10000 [16:52:15<19:16:31, 12.92s/it] {'loss': 0.0052, 'learning_rate': 2.6915000000000002e-05, 'epoch': 1.74} 46%|████▋ | 4629/10000 [16:52:15<19:16:31, 12.92s/it] 46%|████▋ | 4630/10000 [16:52:28<19:16:52, 12.93s/it] {'loss': 0.0045, 'learning_rate': 2.691e-05, 'epoch': 1.74} 46%|████▋ | 4630/10000 [16:52:28<19:16:52, 12.93s/it] 46%|████▋ | 4631/10000 [16:52:41<19:17:30, 12.94s/it] {'loss': 0.0054, 'learning_rate': 2.6905e-05, 'epoch': 1.74} 46%|████▋ | 4631/10000 [16:52:41<19:17:30, 12.94s/it] 46%|████▋ | 4632/10000 [16:52:54<19:14:47, 12.91s/it] {'loss': 0.0055, 'learning_rate': 2.6900000000000003e-05, 'epoch': 1.75} 46%|████▋ | 4632/10000 [16:52:54<19:14:47, 12.91s/it] 46%|████▋ | 4633/10000 [16:53:07<19:15:39, 12.92s/it] {'loss': 0.0039, 'learning_rate': 2.6895000000000003e-05, 'epoch': 1.75} 46%|████▋ | 4633/10000 [16:53:07<19:15:39, 12.92s/it] 46%|████▋ | 4634/10000 [16:53:20<19:16:25, 12.93s/it] {'loss': 0.0049, 'learning_rate': 2.689e-05, 'epoch': 1.75} 46%|████▋ | 4634/10000 [16:53:20<19:16:25, 12.93s/it] 46%|████▋ | 
4635/10000 [16:53:33<19:18:17, 12.95s/it] {'loss': 0.0052, 'learning_rate': 2.6884999999999998e-05, 'epoch': 1.75} 46%|████▋ | 4635/10000 [16:53:33<19:18:17, 12.95s/it] 46%|████▋ | 4636/10000 [16:53:45<19:16:23, 12.94s/it] {'loss': 0.0051, 'learning_rate': 2.688e-05, 'epoch': 1.75} 46%|████▋ | 4636/10000 [16:53:45<19:16:23, 12.94s/it] 46%|████▋ | 4637/10000 [16:53:58<19:17:50, 12.95s/it] {'loss': 0.0057, 'learning_rate': 2.6875e-05, 'epoch': 1.75} 46%|████▋ | 4637/10000 [16:53:58<19:17:50, 12.95s/it] 46%|████▋ | 4638/10000 [16:54:11<19:16:17, 12.94s/it] {'loss': 0.0038, 'learning_rate': 2.6870000000000002e-05, 'epoch': 1.75} 46%|████▋ | 4638/10000 [16:54:11<19:16:17, 12.94s/it] 46%|████▋ | 4639/10000 [16:54:24<19:16:03, 12.94s/it] {'loss': 0.0047, 'learning_rate': 2.6865e-05, 'epoch': 1.75} 46%|████▋ | 4639/10000 [16:54:24<19:16:03, 12.94s/it] 46%|████▋ | 4640/10000 [16:54:37<19:18:03, 12.96s/it] {'loss': 0.0043, 'learning_rate': 2.686e-05, 'epoch': 1.75} 46%|████▋ | 4640/10000 [16:54:37<19:18:03, 12.96s/it] 46%|████▋ | 4641/10000 [16:54:50<19:18:38, 12.97s/it] {'loss': 0.0042, 'learning_rate': 2.6855000000000003e-05, 'epoch': 1.75} 46%|████▋ | 4641/10000 [16:54:50<19:18:38, 12.97s/it] 46%|████▋ | 4642/10000 [16:55:03<19:18:18, 12.97s/it] {'loss': 0.0039, 'learning_rate': 2.6850000000000002e-05, 'epoch': 1.75} 46%|████▋ | 4642/10000 [16:55:03<19:18:18, 12.97s/it] 46%|████▋ | 4643/10000 [16:55:16<19:17:08, 12.96s/it] {'loss': 0.0051, 'learning_rate': 2.6845000000000005e-05, 'epoch': 1.75} 46%|████▋ | 4643/10000 [16:55:16<19:17:08, 12.96s/it] 46%|████▋ | 4644/10000 [16:55:29<19:18:22, 12.98s/it] {'loss': 0.0045, 'learning_rate': 2.6840000000000004e-05, 'epoch': 1.75} 46%|████▋ | 4644/10000 [16:55:29<19:18:22, 12.98s/it] 46%|████▋ | 4645/10000 [16:55:42<19:16:31, 12.96s/it] {'loss': 0.0049, 'learning_rate': 2.6835e-05, 'epoch': 1.75} 46%|████▋ | 4645/10000 [16:55:42<19:16:31, 12.96s/it] 46%|████▋ | 4646/10000 [16:55:55<19:15:48, 12.95s/it] {'loss': 0.0064, 
'learning_rate': 2.683e-05, 'epoch': 1.75} 46%|████▋ | 4646/10000 [16:55:55<19:15:48, 12.95s/it] 46%|████▋ | 4647/10000 [16:56:08<19:15:45, 12.95s/it] {'loss': 0.0054, 'learning_rate': 2.6825e-05, 'epoch': 1.75} 46%|████▋ | 4647/10000 [16:56:08<19:15:45, 12.95s/it] 46%|████▋ | 4648/10000 [16:56:21<19:16:39, 12.97s/it] {'loss': 0.0046, 'learning_rate': 2.682e-05, 'epoch': 1.75} 46%|████▋ | 4648/10000 [16:56:21<19:16:39, 12.97s/it] 46%|████▋ | 4649/10000 [16:56:34<19:15:16, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.6815e-05, 'epoch': 1.75} 46%|████▋ | 4649/10000 [16:56:34<19:15:16, 12.95s/it] 46%|████▋ | 4650/10000 [16:56:47<19:17:03, 12.98s/it] {'loss': 0.0046, 'learning_rate': 2.6810000000000003e-05, 'epoch': 1.75} 46%|████▋ | 4650/10000 [16:56:47<19:17:03, 12.98s/it] 47%|████▋ | 4651/10000 [16:57:00<19:18:00, 12.99s/it] {'loss': 0.0053, 'learning_rate': 2.6805000000000002e-05, 'epoch': 1.75} 47%|████▋ | 4651/10000 [16:57:00<19:18:00, 12.99s/it] 47%|████▋ | 4652/10000 [16:57:13<19:16:34, 12.98s/it] {'loss': 0.005, 'learning_rate': 2.6800000000000004e-05, 'epoch': 1.75} 47%|████▋ | 4652/10000 [16:57:13<19:16:34, 12.98s/it] 47%|████▋ | 4653/10000 [16:57:26<19:13:41, 12.95s/it] {'loss': 0.0056, 'learning_rate': 2.6795000000000003e-05, 'epoch': 1.75} 47%|████▋ | 4653/10000 [16:57:26<19:13:41, 12.95s/it] 47%|████▋ | 4654/10000 [16:57:39<19:12:41, 12.94s/it] {'loss': 0.0053, 'learning_rate': 2.6790000000000003e-05, 'epoch': 1.75} 47%|████▋ | 4654/10000 [16:57:39<19:12:41, 12.94s/it] 47%|████▋ | 4655/10000 [16:57:52<19:12:54, 12.94s/it] {'loss': 0.0051, 'learning_rate': 2.6785e-05, 'epoch': 1.75} 47%|████▋ | 4655/10000 [16:57:52<19:12:54, 12.94s/it] 47%|████▋ | 4656/10000 [16:58:05<19:12:20, 12.94s/it] {'loss': 0.0043, 'learning_rate': 2.678e-05, 'epoch': 1.75} 47%|████▋ | 4656/10000 [16:58:05<19:12:20, 12.94s/it] 47%|████▋ | 4657/10000 [16:58:18<19:14:15, 12.96s/it] {'loss': 0.0046, 'learning_rate': 2.6775e-05, 'epoch': 1.75} 47%|████▋ | 4657/10000 
[16:58:18<19:14:15, 12.96s/it] 47%|████▋ | 4658/10000 [16:58:31<19:12:17, 12.94s/it] {'loss': 0.0059, 'learning_rate': 2.677e-05, 'epoch': 1.76} 47%|████▋ | 4658/10000 [16:58:31<19:12:17, 12.94s/it] 47%|████▋ | 4659/10000 [16:58:43<19:10:49, 12.93s/it] {'loss': 0.0057, 'learning_rate': 2.6765000000000002e-05, 'epoch': 1.76} 47%|████▋ | 4659/10000 [16:58:43<19:10:49, 12.93s/it] 47%|████▋ | 4660/10000 [16:58:56<19:10:04, 12.92s/it] {'loss': 0.0053, 'learning_rate': 2.676e-05, 'epoch': 1.76} 47%|████▋ | 4660/10000 [16:58:56<19:10:04, 12.92s/it] 47%|████▋ | 4661/10000 [16:59:09<19:10:58, 12.93s/it] {'loss': 0.0054, 'learning_rate': 2.6755000000000004e-05, 'epoch': 1.76} 47%|████▋ | 4661/10000 [16:59:09<19:10:58, 12.93s/it] 47%|████▋ | 4662/10000 [16:59:22<19:11:25, 12.94s/it] {'loss': 0.0057, 'learning_rate': 2.6750000000000003e-05, 'epoch': 1.76} 47%|████▋ | 4662/10000 [16:59:22<19:11:25, 12.94s/it] 47%|████▋ | 4663/10000 [16:59:35<19:09:52, 12.93s/it] {'loss': 0.0055, 'learning_rate': 2.6745000000000002e-05, 'epoch': 1.76} 47%|████▋ | 4663/10000 [16:59:35<19:09:52, 12.93s/it] 47%|████▋ | 4664/10000 [16:59:48<19:10:04, 12.93s/it] {'loss': 0.0036, 'learning_rate': 2.6740000000000005e-05, 'epoch': 1.76} 47%|████▋ | 4664/10000 [16:59:48<19:10:04, 12.93s/it] 47%|████▋ | 4665/10000 [17:00:01<19:10:58, 12.94s/it] {'loss': 0.0054, 'learning_rate': 2.6734999999999997e-05, 'epoch': 1.76} 47%|████▋ | 4665/10000 [17:00:01<19:10:58, 12.94s/it] 47%|████▋ | 4666/10000 [17:00:14<19:10:34, 12.94s/it] {'loss': 0.005, 'learning_rate': 2.673e-05, 'epoch': 1.76} 47%|████▋ | 4666/10000 [17:00:14<19:10:34, 12.94s/it] 47%|████▋ | 4667/10000 [17:00:27<19:09:48, 12.94s/it] {'loss': 0.006, 'learning_rate': 2.6725e-05, 'epoch': 1.76} 47%|████▋ | 4667/10000 [17:00:27<19:09:48, 12.94s/it] 47%|████▋ | 4668/10000 [17:00:40<19:09:23, 12.93s/it] {'loss': 0.0052, 'learning_rate': 2.672e-05, 'epoch': 1.76} 47%|████▋ | 4668/10000 [17:00:40<19:09:23, 12.93s/it] 47%|████▋ | 4669/10000 [17:00:53<19:08:27, 
12.93s/it] {'loss': 0.005, 'learning_rate': 2.6715e-05, 'epoch': 1.76} 47%|████▋ | 4669/10000 [17:00:53<19:08:27, 12.93s/it] 47%|████▋ | 4670/10000 [17:01:06<19:08:47, 12.93s/it] {'loss': 0.0039, 'learning_rate': 2.671e-05, 'epoch': 1.76} 47%|████▋ | 4670/10000 [17:01:06<19:08:47, 12.93s/it] 47%|████▋ | 4671/10000 [17:01:19<19:10:01, 12.95s/it] {'loss': 0.0042, 'learning_rate': 2.6705000000000003e-05, 'epoch': 1.76} 47%|████▋ | 4671/10000 [17:01:19<19:10:01, 12.95s/it] 47%|████▋ | 4672/10000 [17:01:32<19:10:40, 12.96s/it] {'loss': 0.0054, 'learning_rate': 2.6700000000000002e-05, 'epoch': 1.76} 47%|████▋ | 4672/10000 [17:01:32<19:10:40, 12.96s/it] 47%|████▋ | 4673/10000 [17:01:45<19:10:00, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.6695000000000004e-05, 'epoch': 1.76} 47%|████▋ | 4673/10000 [17:01:45<19:10:00, 12.95s/it] 47%|████▋ | 4674/10000 [17:01:58<19:09:57, 12.95s/it] {'loss': 0.0048, 'learning_rate': 2.6690000000000004e-05, 'epoch': 1.76} 47%|████▋ | 4674/10000 [17:01:58<19:09:57, 12.95s/it] 47%|████▋ | 4675/10000 [17:02:10<19:09:00, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.6685e-05, 'epoch': 1.76} 47%|████▋ | 4675/10000 [17:02:11<19:09:00, 12.95s/it] 47%|████▋ | 4676/10000 [17:02:23<19:08:00, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.668e-05, 'epoch': 1.76} 47%|████▋ | 4676/10000 [17:02:23<19:08:00, 12.94s/it] 47%|████▋ | 4677/10000 [17:02:36<19:08:52, 12.95s/it] {'loss': 0.0038, 'learning_rate': 2.6675e-05, 'epoch': 1.76} 47%|████▋ | 4677/10000 [17:02:36<19:08:52, 12.95s/it] 47%|████▋ | 4678/10000 [17:02:49<19:08:39, 12.95s/it] {'loss': 0.0037, 'learning_rate': 2.667e-05, 'epoch': 1.76} 47%|████▋ | 4678/10000 [17:02:49<19:08:39, 12.95s/it] 47%|████▋ | 4679/10000 [17:03:02<19:08:01, 12.95s/it] {'loss': 0.0036, 'learning_rate': 2.6665e-05, 'epoch': 1.76} 47%|████▋ | 4679/10000 [17:03:02<19:08:01, 12.95s/it] 47%|████▋ | 4680/10000 [17:03:15<19:07:09, 12.94s/it] {'loss': 0.0053, 'learning_rate': 2.6660000000000002e-05, 'epoch': 1.76} 47%|████▋ | 
4680/10000 [17:03:15<19:07:09, 12.94s/it] 47%|████▋ | 4681/10000 [17:03:28<19:05:35, 12.92s/it] {'loss': 0.005, 'learning_rate': 2.6655e-05, 'epoch': 1.76} 47%|████▋ | 4681/10000 [17:03:28<19:05:35, 12.92s/it] 47%|████▋ | 4682/10000 [17:03:41<19:04:20, 12.91s/it] {'loss': 0.0052, 'learning_rate': 2.6650000000000004e-05, 'epoch': 1.76} 47%|████▋ | 4682/10000 [17:03:41<19:04:20, 12.91s/it] 47%|████▋ | 4683/10000 [17:03:54<19:05:38, 12.93s/it] {'loss': 0.0047, 'learning_rate': 2.6645000000000003e-05, 'epoch': 1.76} 47%|████▋ | 4683/10000 [17:03:54<19:05:38, 12.93s/it] 47%|████▋ | 4684/10000 [17:04:07<19:05:13, 12.93s/it] {'loss': 0.0064, 'learning_rate': 2.6640000000000002e-05, 'epoch': 1.76} 47%|████▋ | 4684/10000 [17:04:07<19:05:13, 12.93s/it] 47%|████▋ | 4685/10000 [17:04:20<19:04:00, 12.91s/it] {'loss': 0.0057, 'learning_rate': 2.6634999999999998e-05, 'epoch': 1.77} 47%|████▋ | 4685/10000 [17:04:20<19:04:00, 12.91s/it] 47%|████▋ | 4686/10000 [17:04:33<19:04:16, 12.92s/it] {'loss': 0.005, 'learning_rate': 2.663e-05, 'epoch': 1.77} 47%|████▋ | 4686/10000 [17:04:33<19:04:16, 12.92s/it] 47%|████▋ | 4687/10000 [17:04:46<19:06:20, 12.95s/it] {'loss': 0.0045, 'learning_rate': 2.6625e-05, 'epoch': 1.77} 47%|████▋ | 4687/10000 [17:04:46<19:06:20, 12.95s/it] 47%|████▋ | 4688/10000 [17:04:59<19:08:45, 12.98s/it] {'loss': 0.0046, 'learning_rate': 2.662e-05, 'epoch': 1.77} 47%|████▋ | 4688/10000 [17:04:59<19:08:45, 12.98s/it] 47%|████▋ | 4689/10000 [17:05:12<19:08:59, 12.98s/it] {'loss': 0.0047, 'learning_rate': 2.6615000000000002e-05, 'epoch': 1.77} 47%|████▋ | 4689/10000 [17:05:12<19:08:59, 12.98s/it] 47%|████▋ | 4690/10000 [17:05:25<19:08:07, 12.97s/it] {'loss': 0.0047, 'learning_rate': 2.661e-05, 'epoch': 1.77} 47%|████▋ | 4690/10000 [17:05:25<19:08:07, 12.97s/it] 47%|████▋ | 4691/10000 [17:05:38<19:05:51, 12.95s/it] {'loss': 0.0055, 'learning_rate': 2.6605000000000004e-05, 'epoch': 1.77} 47%|████▋ | 4691/10000 [17:05:38<19:05:51, 12.95s/it] 47%|████▋ | 4692/10000 
[17:05:50<19:05:22, 12.95s/it] {'loss': 0.0056, 'learning_rate': 2.6600000000000003e-05, 'epoch': 1.77} 47%|████▋ | 4692/10000 [17:05:51<19:05:22, 12.95s/it] 47%|████▋ | 4693/10000 [17:06:03<19:05:18, 12.95s/it] {'loss': 0.004, 'learning_rate': 2.6595000000000002e-05, 'epoch': 1.77} 47%|████▋ | 4693/10000 [17:06:03<19:05:18, 12.95s/it] 47%|████▋ | 4694/10000 [17:06:16<19:06:13, 12.96s/it] {'loss': 0.0042, 'learning_rate': 2.6590000000000005e-05, 'epoch': 1.77} 47%|████▋ | 4694/10000 [17:06:16<19:06:13, 12.96s/it] 47%|████▋ | 4695/10000 [17:06:29<19:05:10, 12.95s/it] {'loss': 0.0047, 'learning_rate': 2.6585e-05, 'epoch': 1.77} 47%|████▋ | 4695/10000 [17:06:29<19:05:10, 12.95s/it] 47%|████▋ | 4696/10000 [17:06:42<19:04:48, 12.95s/it] {'loss': 0.0048, 'learning_rate': 2.658e-05, 'epoch': 1.77} 47%|████▋ | 4696/10000 [17:06:42<19:04:48, 12.95s/it] 47%|████▋ | 4697/10000 [17:06:55<19:06:26, 12.97s/it] {'loss': 0.0048, 'learning_rate': 2.6575e-05, 'epoch': 1.77} 47%|████▋ | 4697/10000 [17:06:55<19:06:26, 12.97s/it] 47%|████▋ | 4698/10000 [17:07:08<19:04:39, 12.95s/it] {'loss': 0.0059, 'learning_rate': 2.657e-05, 'epoch': 1.77} 47%|████▋ | 4698/10000 [17:07:08<19:04:39, 12.95s/it] 47%|████▋ | 4699/10000 [17:07:21<19:03:00, 12.94s/it] {'loss': 0.0072, 'learning_rate': 2.6565e-05, 'epoch': 1.77} 47%|████▋ | 4699/10000 [17:07:21<19:03:00, 12.94s/it] 47%|████▋ | 4700/10000 [17:07:34<19:03:30, 12.95s/it] {'loss': 0.0046, 'learning_rate': 2.6560000000000003e-05, 'epoch': 1.77} 47%|████▋ | 4700/10000 [17:07:34<19:03:30, 12.95s/it] 47%|████▋ | 4701/10000 [17:07:47<19:02:49, 12.94s/it] {'loss': 0.0054, 'learning_rate': 2.6555000000000002e-05, 'epoch': 1.77} 47%|████▋ | 4701/10000 [17:07:47<19:02:49, 12.94s/it] 47%|████▋ | 4702/10000 [17:08:00<19:03:50, 12.95s/it] {'loss': 0.0041, 'learning_rate': 2.655e-05, 'epoch': 1.77} 47%|████▋ | 4702/10000 [17:08:00<19:03:50, 12.95s/it] 47%|████▋ | 4703/10000 [17:08:13<19:04:35, 12.97s/it] {'loss': 0.0042, 'learning_rate': 
2.6545000000000004e-05, 'epoch': 1.77} 47%|████▋ | 4703/10000 [17:08:13<19:04:35, 12.97s/it] 47%|████▋ | 4704/10000 [17:08:26<19:04:23, 12.97s/it] {'loss': 0.0044, 'learning_rate': 2.6540000000000003e-05, 'epoch': 1.77} 47%|████▋ | 4704/10000 [17:08:26<19:04:23, 12.97s/it] 47%|████▋ | 4705/10000 [17:08:39<19:05:27, 12.98s/it] {'loss': 0.0049, 'learning_rate': 2.6535e-05, 'epoch': 1.77} 47%|████▋ | 4705/10000 [17:08:39<19:05:27, 12.98s/it] 47%|████▋ | 4706/10000 [17:08:52<19:06:05, 12.99s/it] {'loss': 0.005, 'learning_rate': 2.653e-05, 'epoch': 1.77} 47%|████▋ | 4706/10000 [17:08:52<19:06:05, 12.99s/it] 47%|████▋ | 4707/10000 [17:09:05<19:03:56, 12.97s/it] {'loss': 0.0043, 'learning_rate': 2.6525e-05, 'epoch': 1.77} 47%|████▋ | 4707/10000 [17:09:05<19:03:56, 12.97s/it] 47%|████▋ | 4708/10000 [17:09:18<19:03:06, 12.96s/it] {'loss': 0.0057, 'learning_rate': 2.652e-05, 'epoch': 1.77} 47%|████▋ | 4708/10000 [17:09:18<19:03:06, 12.96s/it] 47%|████▋ | 4709/10000 [17:09:31<19:03:17, 12.97s/it] {'loss': 0.0043, 'learning_rate': 2.6515e-05, 'epoch': 1.77} 47%|████▋ | 4709/10000 [17:09:31<19:03:17, 12.97s/it] 47%|████▋ | 4710/10000 [17:09:44<19:03:28, 12.97s/it] {'loss': 0.0049, 'learning_rate': 2.6510000000000002e-05, 'epoch': 1.77} 47%|████▋ | 4710/10000 [17:09:44<19:03:28, 12.97s/it] 47%|████▋ | 4711/10000 [17:09:57<19:05:14, 12.99s/it] {'loss': 0.0041, 'learning_rate': 2.6505e-05, 'epoch': 1.78} 47%|████▋ | 4711/10000 [17:09:57<19:05:14, 12.99s/it] 47%|████▋ | 4712/10000 [17:10:10<19:04:16, 12.98s/it] {'loss': 0.0045, 'learning_rate': 2.6500000000000004e-05, 'epoch': 1.78} 47%|████▋ | 4712/10000 [17:10:10<19:04:16, 12.98s/it] 47%|████▋ | 4713/10000 [17:10:23<19:01:14, 12.95s/it] {'loss': 0.0042, 'learning_rate': 2.6495000000000003e-05, 'epoch': 1.78} 47%|████▋ | 4713/10000 [17:10:23<19:01:14, 12.95s/it] 47%|████▋ | 4714/10000 [17:10:36<19:02:06, 12.96s/it] {'loss': 0.0038, 'learning_rate': 2.6490000000000002e-05, 'epoch': 1.78} 47%|████▋ | 4714/10000 [17:10:36<19:02:06, 
12.96s/it] 47%|████▋ | 4715/10000 [17:10:49<19:02:48, 12.97s/it] {'loss': 0.0051, 'learning_rate': 2.6484999999999998e-05, 'epoch': 1.78} 47%|████▋ | 4715/10000 [17:10:49<19:02:48, 12.97s/it] 47%|████▋ | 4716/10000 [17:11:02<19:02:03, 12.97s/it] {'loss': 0.0057, 'learning_rate': 2.648e-05, 'epoch': 1.78} 47%|████▋ | 4716/10000 [17:11:02<19:02:03, 12.97s/it] 47%|████▋ | 4717/10000 [17:11:15<19:03:28, 12.99s/it] {'loss': 0.0047, 'learning_rate': 2.6475e-05, 'epoch': 1.78} 47%|████▋ | 4717/10000 [17:11:15<19:03:28, 12.99s/it] 47%|████▋ | 4718/10000 [17:11:28<19:02:08, 12.97s/it] {'loss': 0.0056, 'learning_rate': 2.647e-05, 'epoch': 1.78} 47%|████▋ | 4718/10000 [17:11:28<19:02:08, 12.97s/it] 47%|████▋ | 4719/10000 [17:11:41<19:02:30, 12.98s/it] {'loss': 0.0044, 'learning_rate': 2.6465e-05, 'epoch': 1.78} 47%|████▋ | 4719/10000 [17:11:41<19:02:30, 12.98s/it] 47%|████▋ | 4720/10000 [17:11:54<19:01:52, 12.98s/it] {'loss': 0.0052, 'learning_rate': 2.646e-05, 'epoch': 1.78} 47%|████▋ | 4720/10000 [17:11:54<19:01:52, 12.98s/it] 47%|████▋ | 4721/10000 [17:12:07<19:02:22, 12.98s/it] {'loss': 0.0058, 'learning_rate': 2.6455000000000003e-05, 'epoch': 1.78} 47%|████▋ | 4721/10000 [17:12:07<19:02:22, 12.98s/it] 47%|████▋ | 4722/10000 [17:12:20<19:01:16, 12.97s/it] {'loss': 0.0043, 'learning_rate': 2.6450000000000003e-05, 'epoch': 1.78} 47%|████▋ | 4722/10000 [17:12:20<19:01:16, 12.97s/it] 47%|████▋ | 4723/10000 [17:12:32<19:00:37, 12.97s/it] {'loss': 0.0045, 'learning_rate': 2.6445000000000002e-05, 'epoch': 1.78} 47%|████▋ | 4723/10000 [17:12:33<19:00:37, 12.97s/it] 47%|████▋ | 4724/10000 [17:12:45<18:58:51, 12.95s/it] {'loss': 0.0046, 'learning_rate': 2.6440000000000004e-05, 'epoch': 1.78} 47%|████▋ | 4724/10000 [17:12:45<18:58:51, 12.95s/it] 47%|████▋ | 4725/10000 [17:12:58<18:59:39, 12.96s/it] {'loss': 0.0049, 'learning_rate': 2.6435e-05, 'epoch': 1.78} 47%|████▋ | 4725/10000 [17:12:58<18:59:39, 12.96s/it] 47%|████▋ | 4726/10000 [17:13:11<18:59:59, 12.97s/it] {'loss': 0.006, 
'learning_rate': 2.643e-05, 'epoch': 1.78} 47%|████▋ | 4726/10000 [17:13:11<18:59:59, 12.97s/it] 47%|████▋ | 4727/10000 [17:13:24<18:57:19, 12.94s/it] {'loss': 0.006, 'learning_rate': 2.6425e-05, 'epoch': 1.78} 47%|████▋ | 4727/10000 [17:13:24<18:57:19, 12.94s/it] 47%|████▋ | 4728/10000 [17:13:37<18:57:52, 12.95s/it] {'loss': 0.0049, 'learning_rate': 2.642e-05, 'epoch': 1.78} 47%|████▋ | 4728/10000 [17:13:37<18:57:52, 12.95s/it] 47%|████▋ | 4729/10000 [17:13:50<18:57:01, 12.94s/it] {'loss': 0.0042, 'learning_rate': 2.6415e-05, 'epoch': 1.78} 47%|████▋ | 4729/10000 [17:13:50<18:57:01, 12.94s/it] 47%|████▋ | 4730/10000 [17:14:03<18:57:24, 12.95s/it] {'loss': 0.0047, 'learning_rate': 2.6410000000000003e-05, 'epoch': 1.78} 47%|████▋ | 4730/10000 [17:14:03<18:57:24, 12.95s/it] 47%|████▋ | 4731/10000 [17:14:16<18:55:38, 12.93s/it] {'loss': 0.0043, 'learning_rate': 2.6405000000000002e-05, 'epoch': 1.78} 47%|████▋ | 4731/10000 [17:14:16<18:55:38, 12.93s/it] 47%|████▋ | 4732/10000 [17:14:29<18:52:46, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.64e-05, 'epoch': 1.78} 47%|████▋ | 4732/10000 [17:14:29<18:52:46, 12.90s/it] 47%|████▋ | 4733/10000 [17:14:42<18:52:31, 12.90s/it] {'loss': 0.0045, 'learning_rate': 2.6395000000000004e-05, 'epoch': 1.78} 47%|████▋ | 4733/10000 [17:14:42<18:52:31, 12.90s/it] 47%|████▋ | 4734/10000 [17:14:55<18:53:19, 12.91s/it] {'loss': 0.0047, 'learning_rate': 2.6390000000000003e-05, 'epoch': 1.78} 47%|████▋ | 4734/10000 [17:14:55<18:53:19, 12.91s/it] 47%|████▋ | 4735/10000 [17:15:08<18:52:58, 12.91s/it] {'loss': 0.0053, 'learning_rate': 2.6385e-05, 'epoch': 1.78} 47%|████▋ | 4735/10000 [17:15:08<18:52:58, 12.91s/it] 47%|████▋ | 4736/10000 [17:15:20<18:52:33, 12.91s/it] {'loss': 0.0048, 'learning_rate': 2.6379999999999998e-05, 'epoch': 1.78} 47%|████▋ | 4736/10000 [17:15:21<18:52:33, 12.91s/it] 47%|████▋ | 4737/10000 [17:15:34<18:55:34, 12.95s/it] {'loss': 0.0049, 'learning_rate': 2.6375e-05, 'epoch': 1.78} 47%|████▋ | 4737/10000 [17:15:34<18:55:34, 
12.95s/it] 47%|████▋ | 4738/10000 [17:15:46<18:55:30, 12.95s/it] {'loss': 0.0061, 'learning_rate': 2.637e-05, 'epoch': 1.79} 47%|████▋ | 4738/10000 [17:15:47<18:55:30, 12.95s/it] 47%|████▋ | 4739/10000 [17:15:59<18:55:57, 12.96s/it] {'loss': 0.0056, 'learning_rate': 2.6365e-05, 'epoch': 1.79} 47%|████▋ | 4739/10000 [17:15:59<18:55:57, 12.96s/it] 47%|████▋ | 4740/10000 [17:16:12<18:54:35, 12.94s/it] {'loss': 0.0058, 'learning_rate': 2.6360000000000002e-05, 'epoch': 1.79} 47%|████▋ | 4740/10000 [17:16:12<18:54:35, 12.94s/it] 47%|████▋ | 4741/10000 [17:16:25<18:54:28, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.6355e-05, 'epoch': 1.79} 47%|████▋ | 4741/10000 [17:16:25<18:54:28, 12.94s/it] 47%|████▋ | 4742/10000 [17:16:38<18:54:33, 12.95s/it] {'loss': 0.0043, 'learning_rate': 2.6350000000000004e-05, 'epoch': 1.79} 47%|████▋ | 4742/10000 [17:16:38<18:54:33, 12.95s/it] 47%|████▋ | 4743/10000 [17:16:51<18:53:25, 12.94s/it] {'loss': 0.0051, 'learning_rate': 2.6345000000000003e-05, 'epoch': 1.79} 47%|████▋ | 4743/10000 [17:16:51<18:53:25, 12.94s/it] 47%|████▋ | 4744/10000 [17:17:04<18:53:36, 12.94s/it] {'loss': 0.0048, 'learning_rate': 2.6340000000000002e-05, 'epoch': 1.79} 47%|████▋ | 4744/10000 [17:17:04<18:53:36, 12.94s/it] 47%|████▋ | 4745/10000 [17:17:17<18:53:25, 12.94s/it] {'loss': 0.005, 'learning_rate': 2.6334999999999998e-05, 'epoch': 1.79} 47%|████▋ | 4745/10000 [17:17:17<18:53:25, 12.94s/it] 47%|████▋ | 4746/10000 [17:17:30<18:52:11, 12.93s/it] {'loss': 0.005, 'learning_rate': 2.633e-05, 'epoch': 1.79} 47%|████▋ | 4746/10000 [17:17:30<18:52:11, 12.93s/it] 47%|████▋ | 4747/10000 [17:17:43<18:52:45, 12.94s/it] {'loss': 0.0054, 'learning_rate': 2.6325e-05, 'epoch': 1.79} 47%|████▋ | 4747/10000 [17:17:43<18:52:45, 12.94s/it] 47%|████▋ | 4748/10000 [17:17:56<18:55:07, 12.97s/it] {'loss': 0.0051, 'learning_rate': 2.632e-05, 'epoch': 1.79} 47%|████▋ | 4748/10000 [17:17:56<18:55:07, 12.97s/it] 47%|████▋ | 4749/10000 [17:18:09<18:54:17, 12.96s/it] {'loss': 0.0051, 
'learning_rate': 2.6315e-05, 'epoch': 1.79} 47%|████▋ | 4749/10000 [17:18:09<18:54:17, 12.96s/it] 48%|████▊ | 4750/10000 [17:18:22<18:53:36, 12.96s/it] {'loss': 0.0057, 'learning_rate': 2.631e-05, 'epoch': 1.79} 48%|████▊ | 4750/10000 [17:18:22<18:53:36, 12.96s/it] 48%|████▊ | 4751/10000 [17:18:35<18:54:55, 12.97s/it] {'loss': 0.0058, 'learning_rate': 2.6305000000000003e-05, 'epoch': 1.79} 48%|████▊ | 4751/10000 [17:18:35<18:54:55, 12.97s/it] 48%|████▊ | 4752/10000 [17:18:48<18:54:47, 12.97s/it] {'loss': 0.0054, 'learning_rate': 2.6300000000000002e-05, 'epoch': 1.79} 48%|████▊ | 4752/10000 [17:18:48<18:54:47, 12.97s/it] 48%|████▊ | 4753/10000 [17:19:01<18:55:06, 12.98s/it] {'loss': 0.005, 'learning_rate': 2.6295e-05, 'epoch': 1.79} 48%|████▊ | 4753/10000 [17:19:01<18:55:06, 12.98s/it] 48%|████▊ | 4754/10000 [17:19:14<18:52:34, 12.95s/it] {'loss': 0.005, 'learning_rate': 2.6290000000000004e-05, 'epoch': 1.79} 48%|████▊ | 4754/10000 [17:19:14<18:52:34, 12.95s/it] 48%|████▊ | 4755/10000 [17:19:27<18:53:10, 12.96s/it] {'loss': 0.0057, 'learning_rate': 2.6285e-05, 'epoch': 1.79} 48%|████▊ | 4755/10000 [17:19:27<18:53:10, 12.96s/it] 48%|████▊ | 4756/10000 [17:19:40<18:52:46, 12.96s/it] {'loss': 0.0039, 'learning_rate': 2.628e-05, 'epoch': 1.79} 48%|████▊ | 4756/10000 [17:19:40<18:52:46, 12.96s/it] 48%|████▊ | 4757/10000 [17:19:53<18:54:04, 12.98s/it] {'loss': 0.0056, 'learning_rate': 2.6275e-05, 'epoch': 1.79} 48%|████▊ | 4757/10000 [17:19:53<18:54:04, 12.98s/it] 48%|████▊ | 4758/10000 [17:20:06<18:54:27, 12.99s/it] {'loss': 0.0058, 'learning_rate': 2.627e-05, 'epoch': 1.79} 48%|████▊ | 4758/10000 [17:20:06<18:54:27, 12.99s/it] 48%|████▊ | 4759/10000 [17:20:19<18:52:57, 12.97s/it] {'loss': 0.0047, 'learning_rate': 2.6265e-05, 'epoch': 1.79} 48%|████▊ | 4759/10000 [17:20:19<18:52:57, 12.97s/it] 48%|████▊ | 4760/10000 [17:20:32<18:51:44, 12.96s/it] {'loss': 0.0042, 'learning_rate': 2.6260000000000003e-05, 'epoch': 1.79} 48%|████▊ | 4760/10000 [17:20:32<18:51:44, 12.96s/it] 
48%|████▊ | 4761/10000 [17:20:45<18:53:23, 12.98s/it] {'loss': 0.0047, 'learning_rate': 2.6255000000000002e-05, 'epoch': 1.79} 48%|████▊ | 4761/10000 [17:20:45<18:53:23, 12.98s/it] 48%|████▊ | 4762/10000 [17:20:58<18:53:53, 12.99s/it] {'loss': 0.0055, 'learning_rate': 2.625e-05, 'epoch': 1.79} 48%|████▊ | 4762/10000 [17:20:58<18:53:53, 12.99s/it] 48%|████▊ | 4763/10000 [17:21:10<18:51:23, 12.96s/it] {'loss': 0.0066, 'learning_rate': 2.6245000000000004e-05, 'epoch': 1.79} 48%|████▊ | 4763/10000 [17:21:11<18:51:23, 12.96s/it] 48%|████▊ | 4764/10000 [17:21:23<18:48:31, 12.93s/it] {'loss': 0.0057, 'learning_rate': 2.6240000000000003e-05, 'epoch': 1.8} 48%|████▊ | 4764/10000 [17:21:23<18:48:31, 12.93s/it] 48%|████▊ | 4765/10000 [17:21:36<18:46:19, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.6235000000000005e-05, 'epoch': 1.8} 48%|████▊ | 4765/10000 [17:21:36<18:46:19, 12.91s/it] 48%|████▊ | 4766/10000 [17:21:49<18:45:26, 12.90s/it] {'loss': 0.0049, 'learning_rate': 2.6229999999999998e-05, 'epoch': 1.8} 48%|████▊ | 4766/10000 [17:21:49<18:45:26, 12.90s/it] 48%|████▊ | 4767/10000 [17:22:02<18:47:06, 12.92s/it] {'loss': 0.0049, 'learning_rate': 2.6225e-05, 'epoch': 1.8} 48%|████▊ | 4767/10000 [17:22:02<18:47:06, 12.92s/it] 48%|████▊ | 4768/10000 [17:22:15<18:47:11, 12.93s/it] {'loss': 0.0049, 'learning_rate': 2.622e-05, 'epoch': 1.8} 48%|████▊ | 4768/10000 [17:22:15<18:47:11, 12.93s/it] 48%|████▊ | 4769/10000 [17:22:28<18:44:16, 12.90s/it] {'loss': 0.0051, 'learning_rate': 2.6215000000000002e-05, 'epoch': 1.8} 48%|████▊ | 4769/10000 [17:22:28<18:44:16, 12.90s/it] 48%|████▊ | 4770/10000 [17:22:41<18:43:03, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.621e-05, 'epoch': 1.8} 48%|████▊ | 4770/10000 [17:22:41<18:43:03, 12.88s/it] 48%|████▊ | 4771/10000 [17:22:54<18:41:47, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.6205e-05, 'epoch': 1.8} 48%|████▊ | 4771/10000 [17:22:54<18:41:47, 12.87s/it] 48%|████▊ | 4772/10000 [17:23:06<18:43:15, 12.89s/it] {'loss': 0.0067, 
'learning_rate': 2.6200000000000003e-05, 'epoch': 1.8} 48%|████▊ | 4772/10000 [17:23:06<18:43:15, 12.89s/it] 48%|████▊ | 4773/10000 [17:23:19<18:41:22, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.6195000000000002e-05, 'epoch': 1.8} 48%|████▊ | 4773/10000 [17:23:19<18:41:22, 12.87s/it] 48%|████▊ | 4774/10000 [17:23:32<18:40:27, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.6190000000000005e-05, 'epoch': 1.8} 48%|████▊ | 4774/10000 [17:23:32<18:40:27, 12.86s/it] 48%|████▊ | 4775/10000 [17:23:45<18:40:31, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.6185000000000004e-05, 'epoch': 1.8} 48%|████▊ | 4775/10000 [17:23:45<18:40:31, 12.87s/it] 48%|████▊ | 4776/10000 [17:23:58<18:39:55, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.618e-05, 'epoch': 1.8} 48%|████▊ | 4776/10000 [17:23:58<18:39:55, 12.86s/it] 48%|████▊ | 4777/10000 [17:24:11<18:41:22, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.6175e-05, 'epoch': 1.8} 48%|████▊ | 4777/10000 [17:24:11<18:41:22, 12.88s/it] 48%|████▊ | 4778/10000 [17:24:24<18:41:14, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.617e-05, 'epoch': 1.8} 48%|████▊ | 4778/10000 [17:24:24<18:41:14, 12.88s/it] 48%|████▊ | 4779/10000 [17:24:37<18:40:57, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.6165e-05, 'epoch': 1.8} 48%|████▊ | 4779/10000 [17:24:37<18:40:57, 12.88s/it] 48%|████▊ | 4780/10000 [17:24:49<18:40:27, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.616e-05, 'epoch': 1.8} 48%|████▊ | 4780/10000 [17:24:49<18:40:27, 12.88s/it] 48%|████▊ | 4781/10000 [17:25:02<18:41:16, 12.89s/it] {'loss': 0.0033, 'learning_rate': 2.6155000000000003e-05, 'epoch': 1.8} 48%|████▊ | 4781/10000 [17:25:02<18:41:16, 12.89s/it] 48%|████▊ | 4782/10000 [17:25:15<18:42:36, 12.91s/it] {'loss': 0.0037, 'learning_rate': 2.6150000000000002e-05, 'epoch': 1.8} 48%|████▊ | 4782/10000 [17:25:15<18:42:36, 12.91s/it] 48%|████▊ | 4783/10000 [17:25:28<18:43:19, 12.92s/it] {'loss': 0.0043, 'learning_rate': 2.6145e-05, 'epoch': 1.8} 48%|████▊ | 4783/10000 
[17:25:28<18:43:19, 12.92s/it] 48%|████▊ | 4784/10000 [17:25:41<18:41:01, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.6140000000000004e-05, 'epoch': 1.8} 48%|████▊ | 4784/10000 [17:25:41<18:41:01, 12.90s/it] 48%|████▊ | 4785/10000 [17:25:54<18:38:33, 12.87s/it] {'loss': 0.0062, 'learning_rate': 2.6135000000000003e-05, 'epoch': 1.8} 48%|████▊ | 4785/10000 [17:25:54<18:38:33, 12.87s/it] 48%|████▊ | 4786/10000 [17:26:07<18:38:37, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.613e-05, 'epoch': 1.8} 48%|████▊ | 4786/10000 [17:26:07<18:38:37, 12.87s/it] 48%|████▊ | 4787/10000 [17:26:20<18:40:17, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.6124999999999998e-05, 'epoch': 1.8} 48%|████▊ | 4787/10000 [17:26:20<18:40:17, 12.89s/it] 48%|████▊ | 4788/10000 [17:26:33<18:41:20, 12.91s/it] {'loss': 0.0047, 'learning_rate': 2.612e-05, 'epoch': 1.8} 48%|████▊ | 4788/10000 [17:26:33<18:41:20, 12.91s/it] 48%|████▊ | 4789/10000 [17:26:46<18:40:15, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.6115e-05, 'epoch': 1.8} 48%|████▊ | 4789/10000 [17:26:46<18:40:15, 12.90s/it] 48%|████▊ | 4790/10000 [17:26:58<18:39:14, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.6110000000000002e-05, 'epoch': 1.8} 48%|████▊ | 4790/10000 [17:26:58<18:39:14, 12.89s/it] 48%|████▊ | 4791/10000 [17:27:11<18:39:13, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.6105e-05, 'epoch': 1.81} 48%|████▊ | 4791/10000 [17:27:11<18:39:13, 12.89s/it] 48%|████▊ | 4792/10000 [17:27:24<18:37:48, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.61e-05, 'epoch': 1.81} 48%|████▊ | 4792/10000 [17:27:24<18:37:48, 12.88s/it] 48%|████▊ | 4793/10000 [17:27:37<18:38:00, 12.88s/it] {'loss': 0.0061, 'learning_rate': 2.6095000000000003e-05, 'epoch': 1.81} 48%|████▊ | 4793/10000 [17:27:37<18:38:00, 12.88s/it] 48%|████▊ | 4794/10000 [17:27:50<18:35:42, 12.86s/it] {'loss': 0.0059, 'learning_rate': 2.6090000000000003e-05, 'epoch': 1.81} 48%|████▊ | 4794/10000 [17:27:50<18:35:42, 12.86s/it] 48%|████▊ | 4795/10000 [17:28:03<18:37:51, 
12.89s/it] {'loss': 0.0054, 'learning_rate': 2.6085000000000005e-05, 'epoch': 1.81} 48%|████▊ | 4795/10000 [17:28:03<18:37:51, 12.89s/it] 48%|████▊ | 4796/10000 [17:28:16<18:38:14, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.6079999999999998e-05, 'epoch': 1.81} 48%|████▊ | 4796/10000 [17:28:16<18:38:14, 12.89s/it] 48%|████▊ | 4797/10000 [17:28:29<18:38:11, 12.89s/it] {'loss': 0.0038, 'learning_rate': 2.6075e-05, 'epoch': 1.81} 48%|████▊ | 4797/10000 [17:28:29<18:38:11, 12.89s/it] 48%|████▊ | 4798/10000 [17:28:42<18:39:10, 12.91s/it] {'loss': 0.0051, 'learning_rate': 2.607e-05, 'epoch': 1.81} 48%|████▊ | 4798/10000 [17:28:42<18:39:10, 12.91s/it] 48%|████▊ | 4799/10000 [17:28:54<18:40:07, 12.92s/it] {'loss': 0.0046, 'learning_rate': 2.6065000000000002e-05, 'epoch': 1.81} 48%|████▊ | 4799/10000 [17:28:55<18:40:07, 12.92s/it] 48%|████▊ | 4800/10000 [17:29:07<18:37:27, 12.89s/it] {'loss': 0.0059, 'learning_rate': 2.606e-05, 'epoch': 1.81} 48%|████▊ | 4800/10000 [17:29:07<18:37:27, 12.89s/it] 48%|████▊ | 4801/10000 [17:29:20<18:36:11, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.6055e-05, 'epoch': 1.81} 48%|████▊ | 4801/10000 [17:29:20<18:36:11, 12.88s/it] 48%|████▊ | 4802/10000 [17:29:33<18:35:13, 12.87s/it] {'loss': 0.0034, 'learning_rate': 2.6050000000000003e-05, 'epoch': 1.81} 48%|████▊ | 4802/10000 [17:29:33<18:35:13, 12.87s/it] 48%|████▊ | 4803/10000 [17:29:46<18:34:40, 12.87s/it] {'loss': 0.0033, 'learning_rate': 2.6045000000000002e-05, 'epoch': 1.81} 48%|████▊ | 4803/10000 [17:29:46<18:34:40, 12.87s/it] 48%|████▊ | 4804/10000 [17:29:59<18:34:15, 12.87s/it] {'loss': 0.0036, 'learning_rate': 2.6040000000000005e-05, 'epoch': 1.81} 48%|████▊ | 4804/10000 [17:29:59<18:34:15, 12.87s/it] 48%|████▊ | 4805/10000 [17:30:12<18:35:04, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.6035000000000004e-05, 'epoch': 1.81} 48%|████▊ | 4805/10000 [17:30:12<18:35:04, 12.88s/it] 48%|████▊ | 4806/10000 [17:30:25<18:36:02, 12.89s/it] {'loss': 0.0059, 'learning_rate': 2.603e-05, 
'epoch': 1.81} 48%|████▊ | 4806/10000 [17:30:25<18:36:02, 12.89s/it] 48%|████▊ | 4807/10000 [17:30:37<18:34:46, 12.88s/it] {'loss': 0.0061, 'learning_rate': 2.6025e-05, 'epoch': 1.81} 48%|████▊ | 4807/10000 [17:30:37<18:34:46, 12.88s/it] 48%|████▊ | 4808/10000 [17:30:50<18:36:50, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.602e-05, 'epoch': 1.81} 48%|████▊ | 4808/10000 [17:30:50<18:36:50, 12.91s/it] 48%|████▊ | 4809/10000 [17:31:03<18:38:54, 12.93s/it] {'loss': 0.0042, 'learning_rate': 2.6015e-05, 'epoch': 1.81} 48%|████▊ | 4809/10000 [17:31:03<18:38:54, 12.93s/it] 48%|████▊ | 4810/10000 [17:31:16<18:38:18, 12.93s/it] {'loss': 0.0055, 'learning_rate': 2.601e-05, 'epoch': 1.81} 48%|████▊ | 4810/10000 [17:31:16<18:38:18, 12.93s/it] 48%|████▊ | 4811/10000 [17:31:29<18:37:42, 12.92s/it] {'loss': 0.0064, 'learning_rate': 2.6005000000000003e-05, 'epoch': 1.81} 48%|████▊ | 4811/10000 [17:31:29<18:37:42, 12.92s/it] 48%|████▊ | 4812/10000 [17:31:42<18:39:00, 12.94s/it] {'loss': 0.0052, 'learning_rate': 2.6000000000000002e-05, 'epoch': 1.81} 48%|████▊ | 4812/10000 [17:31:42<18:39:00, 12.94s/it] 48%|████▊ | 4813/10000 [17:31:55<18:37:05, 12.92s/it] {'loss': 0.0057, 'learning_rate': 2.5995000000000004e-05, 'epoch': 1.81} 48%|████▊ | 4813/10000 [17:31:55<18:37:05, 12.92s/it] 48%|████▊ | 4814/10000 [17:32:08<18:37:33, 12.93s/it] {'loss': 0.0044, 'learning_rate': 2.5990000000000004e-05, 'epoch': 1.81} 48%|████▊ | 4814/10000 [17:32:08<18:37:33, 12.93s/it] 48%|████▊ | 4815/10000 [17:32:21<18:38:07, 12.94s/it] {'loss': 0.0053, 'learning_rate': 2.5985000000000003e-05, 'epoch': 1.81} 48%|████▊ | 4815/10000 [17:32:21<18:38:07, 12.94s/it] 48%|████▊ | 4816/10000 [17:32:34<18:38:52, 12.95s/it] {'loss': 0.005, 'learning_rate': 2.598e-05, 'epoch': 1.81} 48%|████▊ | 4816/10000 [17:32:34<18:38:52, 12.95s/it] 48%|████▊ | 4817/10000 [17:32:47<18:39:12, 12.96s/it] {'loss': 0.0047, 'learning_rate': 2.5974999999999998e-05, 'epoch': 1.81} 48%|████▊ | 4817/10000 [17:32:47<18:39:12, 12.96s/it] 
48%|████▊ | 4818/10000 [17:33:00<18:37:08, 12.93s/it] {'loss': 0.0046, 'learning_rate': 2.597e-05, 'epoch': 1.82} 48%|████▊ | 4818/10000 [17:33:00<18:37:08, 12.93s/it] 48%|████▊ | 4819/10000 [17:33:13<18:36:06, 12.93s/it] {'loss': 0.0052, 'learning_rate': 2.5965e-05, 'epoch': 1.82} 48%|████▊ | 4819/10000 [17:33:13<18:36:06, 12.93s/it] 48%|████▊ | 4820/10000 [17:33:26<18:35:52, 12.93s/it] {'loss': 0.0049, 'learning_rate': 2.5960000000000002e-05, 'epoch': 1.82} 48%|████▊ | 4820/10000 [17:33:26<18:35:52, 12.93s/it] 48%|████▊ | 4821/10000 [17:33:39<18:36:47, 12.94s/it] {'loss': 0.005, 'learning_rate': 2.5955e-05, 'epoch': 1.82} 48%|████▊ | 4821/10000 [17:33:39<18:36:47, 12.94s/it] 48%|████▊ | 4822/10000 [17:33:52<18:36:35, 12.94s/it] {'loss': 0.0058, 'learning_rate': 2.595e-05, 'epoch': 1.82} 48%|████▊ | 4822/10000 [17:33:52<18:36:35, 12.94s/it] 48%|████▊ | 4823/10000 [17:34:05<18:36:37, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.5945000000000003e-05, 'epoch': 1.82} 48%|████▊ | 4823/10000 [17:34:05<18:36:37, 12.94s/it] 48%|████▊ | 4824/10000 [17:34:17<18:37:39, 12.96s/it] {'loss': 0.0049, 'learning_rate': 2.5940000000000002e-05, 'epoch': 1.82} 48%|████▊ | 4824/10000 [17:34:18<18:37:39, 12.96s/it] 48%|████▊ | 4825/10000 [17:34:30<18:37:05, 12.95s/it] {'loss': 0.0057, 'learning_rate': 2.5935000000000005e-05, 'epoch': 1.82} 48%|████▊ | 4825/10000 [17:34:30<18:37:05, 12.95s/it] 48%|████▊ | 4826/10000 [17:34:43<18:36:03, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.5929999999999997e-05, 'epoch': 1.82} 48%|████▊ | 4826/10000 [17:34:43<18:36:03, 12.94s/it] 48%|████▊ | 4827/10000 [17:34:56<18:38:04, 12.97s/it] {'loss': 0.0046, 'learning_rate': 2.5925e-05, 'epoch': 1.82} 48%|████▊ | 4827/10000 [17:34:56<18:38:04, 12.97s/it] 48%|████▊ | 4828/10000 [17:35:09<18:35:50, 12.94s/it] {'loss': 0.0047, 'learning_rate': 2.592e-05, 'epoch': 1.82} 48%|████▊ | 4828/10000 [17:35:09<18:35:50, 12.94s/it] 48%|████▊ | 4829/10000 [17:35:22<18:35:55, 12.95s/it] {'loss': 0.005, 
'learning_rate': 2.5915000000000002e-05, 'epoch': 1.82} 48%|████▊ | 4829/10000 [17:35:22<18:35:55, 12.95s/it] 48%|████▊ | 4830/10000 [17:35:35<18:34:46, 12.94s/it] {'loss': 0.0056, 'learning_rate': 2.591e-05, 'epoch': 1.82} 48%|████▊ | 4830/10000 [17:35:35<18:34:46, 12.94s/it] 48%|████▊ | 4831/10000 [17:35:48<18:35:09, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.5905e-05, 'epoch': 1.82} 48%|████▊ | 4831/10000 [17:35:48<18:35:09, 12.94s/it] 48%|████▊ | 4832/10000 [17:36:01<18:33:47, 12.93s/it] {'loss': 0.0047, 'learning_rate': 2.5900000000000003e-05, 'epoch': 1.82} 48%|████▊ | 4832/10000 [17:36:01<18:33:47, 12.93s/it] 48%|████▊ | 4833/10000 [17:36:14<18:36:31, 12.97s/it] {'loss': 0.0045, 'learning_rate': 2.5895000000000002e-05, 'epoch': 1.82} 48%|████▊ | 4833/10000 [17:36:14<18:36:31, 12.97s/it] 48%|████▊ | 4834/10000 [17:36:27<18:35:50, 12.96s/it] {'loss': 0.003, 'learning_rate': 2.5890000000000005e-05, 'epoch': 1.82} 48%|████▊ | 4834/10000 [17:36:27<18:35:50, 12.96s/it] 48%|████▊ | 4835/10000 [17:36:40<18:36:51, 12.97s/it] {'loss': 0.0045, 'learning_rate': 2.5885000000000004e-05, 'epoch': 1.82} 48%|████▊ | 4835/10000 [17:36:40<18:36:51, 12.97s/it] 48%|████▊ | 4836/10000 [17:36:53<18:36:54, 12.98s/it] {'loss': 0.004, 'learning_rate': 2.588e-05, 'epoch': 1.82} 48%|████▊ | 4836/10000 [17:36:53<18:36:54, 12.98s/it] 48%|████▊ | 4837/10000 [17:37:06<18:36:33, 12.98s/it] {'loss': 0.0056, 'learning_rate': 2.5875e-05, 'epoch': 1.82} 48%|████▊ | 4837/10000 [17:37:06<18:36:33, 12.98s/it] 48%|████▊ | 4838/10000 [17:37:19<18:36:33, 12.98s/it] {'loss': 0.0038, 'learning_rate': 2.587e-05, 'epoch': 1.82} 48%|████▊ | 4838/10000 [17:37:19<18:36:33, 12.98s/it] 48%|████▊ | 4839/10000 [17:37:32<18:37:52, 13.00s/it] {'loss': 0.0043, 'learning_rate': 2.5865e-05, 'epoch': 1.82} 48%|████▊ | 4839/10000 [17:37:32<18:37:52, 13.00s/it] 48%|████▊ | 4840/10000 [17:37:45<18:37:18, 12.99s/it] {'loss': 0.0044, 'learning_rate': 2.586e-05, 'epoch': 1.82} 48%|████▊ | 4840/10000 [17:37:45<18:37:18, 
12.99s/it] 48%|████▊ | 4841/10000 [17:37:58<18:37:41, 13.00s/it] {'loss': 0.0052, 'learning_rate': 2.5855000000000002e-05, 'epoch': 1.82} 48%|████▊ | 4841/10000 [17:37:58<18:37:41, 13.00s/it] 48%|████▊ | 4842/10000 [17:38:11<18:37:47, 13.00s/it] {'loss': 0.0052, 'learning_rate': 2.585e-05, 'epoch': 1.82} 48%|████▊ | 4842/10000 [17:38:11<18:37:47, 13.00s/it] 48%|████▊ | 4843/10000 [17:38:24<18:35:58, 12.98s/it] {'loss': 0.0049, 'learning_rate': 2.5845000000000004e-05, 'epoch': 1.82} 48%|████▊ | 4843/10000 [17:38:24<18:35:58, 12.98s/it] 48%|████▊ | 4844/10000 [17:38:37<18:33:25, 12.96s/it] {'loss': 0.0053, 'learning_rate': 2.5840000000000003e-05, 'epoch': 1.83} 48%|████▊ | 4844/10000 [17:38:37<18:33:25, 12.96s/it] 48%|████▊ | 4845/10000 [17:38:50<18:33:03, 12.96s/it] {'loss': 0.0049, 'learning_rate': 2.5835000000000003e-05, 'epoch': 1.83} 48%|████▊ | 4845/10000 [17:38:50<18:33:03, 12.96s/it] 48%|████▊ | 4846/10000 [17:39:03<18:34:49, 12.98s/it] {'loss': 0.0051, 'learning_rate': 2.583e-05, 'epoch': 1.83} 48%|████▊ | 4846/10000 [17:39:03<18:34:49, 12.98s/it] 48%|████▊ | 4847/10000 [17:39:16<18:34:05, 12.97s/it] {'loss': 0.0035, 'learning_rate': 2.5824999999999998e-05, 'epoch': 1.83} 48%|████▊ | 4847/10000 [17:39:16<18:34:05, 12.97s/it] 48%|████▊ | 4848/10000 [17:39:29<18:33:05, 12.96s/it] {'loss': 0.0043, 'learning_rate': 2.582e-05, 'epoch': 1.83} 48%|████▊ | 4848/10000 [17:39:29<18:33:05, 12.96s/it] 48%|████▊ | 4849/10000 [17:39:42<18:32:48, 12.96s/it] {'loss': 0.0039, 'learning_rate': 2.5815e-05, 'epoch': 1.83} 48%|████▊ | 4849/10000 [17:39:42<18:32:48, 12.96s/it] 48%|████▊ | 4850/10000 [17:39:55<18:33:07, 12.97s/it] {'loss': 0.0056, 'learning_rate': 2.5810000000000002e-05, 'epoch': 1.83} 48%|████▊ | 4850/10000 [17:39:55<18:33:07, 12.97s/it] 49%|████▊ | 4851/10000 [17:40:08<18:31:34, 12.95s/it] {'loss': 0.0052, 'learning_rate': 2.5805e-05, 'epoch': 1.83} 49%|████▊ | 4851/10000 [17:40:08<18:31:34, 12.95s/it] 49%|████▊ | 4852/10000 [17:40:21<18:33:18, 12.98s/it] 
{'loss': 0.0052, 'learning_rate': 2.58e-05, 'epoch': 1.83} 49%|████▊ | 4852/10000 [17:40:21<18:33:18, 12.98s/it] 49%|████▊ | 4853/10000 [17:40:34<18:32:40, 12.97s/it] {'loss': 0.0054, 'learning_rate': 2.5795000000000003e-05, 'epoch': 1.83} 49%|████▊ | 4853/10000 [17:40:34<18:32:40, 12.97s/it] 49%|████▊ | 4854/10000 [17:40:46<18:30:45, 12.95s/it] {'loss': 0.0053, 'learning_rate': 2.5790000000000002e-05, 'epoch': 1.83} 49%|████▊ | 4854/10000 [17:40:47<18:30:45, 12.95s/it] 49%|████▊ | 4855/10000 [17:40:59<18:30:26, 12.95s/it] {'loss': 0.0052, 'learning_rate': 2.5785000000000005e-05, 'epoch': 1.83} 49%|████▊ | 4855/10000 [17:40:59<18:30:26, 12.95s/it] 49%|████▊ | 4856/10000 [17:41:12<18:31:14, 12.96s/it] {'loss': 0.0051, 'learning_rate': 2.5779999999999997e-05, 'epoch': 1.83} 49%|████▊ | 4856/10000 [17:41:12<18:31:14, 12.96s/it] 49%|████▊ | 4857/10000 [17:41:25<18:30:09, 12.95s/it] {'loss': 0.0047, 'learning_rate': 2.5775e-05, 'epoch': 1.83} 49%|████▊ | 4857/10000 [17:41:25<18:30:09, 12.95s/it] 49%|████▊ | 4858/10000 [17:41:38<18:27:38, 12.92s/it] {'loss': 0.0043, 'learning_rate': 2.577e-05, 'epoch': 1.83} 49%|████▊ | 4858/10000 [17:41:38<18:27:38, 12.92s/it] 49%|████▊ | 4859/10000 [17:41:51<18:25:35, 12.90s/it] {'loss': 0.0055, 'learning_rate': 2.5765e-05, 'epoch': 1.83} 49%|████▊ | 4859/10000 [17:41:51<18:25:35, 12.90s/it] 49%|████▊ | 4860/10000 [17:42:04<18:28:12, 12.94s/it] {'loss': 0.0046, 'learning_rate': 2.576e-05, 'epoch': 1.83} 49%|████▊ | 4860/10000 [17:42:04<18:28:12, 12.94s/it] 49%|████▊ | 4861/10000 [17:42:17<18:28:13, 12.94s/it] {'loss': 0.0042, 'learning_rate': 2.5755e-05, 'epoch': 1.83} 49%|████▊ | 4861/10000 [17:42:17<18:28:13, 12.94s/it] 49%|████▊ | 4862/10000 [17:42:30<18:30:17, 12.97s/it] {'loss': 0.006, 'learning_rate': 2.5750000000000002e-05, 'epoch': 1.83} 49%|████▊ | 4862/10000 [17:42:30<18:30:17, 12.97s/it] 49%|████▊ | 4863/10000 [17:42:43<18:30:38, 12.97s/it] {'loss': 0.0052, 'learning_rate': 2.5745e-05, 'epoch': 1.83} 49%|████▊ | 4863/10000 
[17:42:43<18:30:38, 12.97s/it] 49%|████▊ | 4864/10000 [17:42:56<18:31:33, 12.99s/it] {'loss': 0.0056, 'learning_rate': 2.5740000000000004e-05, 'epoch': 1.83} 49%|████▊ | 4864/10000 [17:42:56<18:31:33, 12.99s/it] 49%|████▊ | 4865/10000 [17:43:09<18:29:31, 12.96s/it] {'loss': 0.0055, 'learning_rate': 2.5735000000000003e-05, 'epoch': 1.83} 49%|████▊ | 4865/10000 [17:43:09<18:29:31, 12.96s/it] 49%|████▊ | 4866/10000 [17:43:22<18:29:44, 12.97s/it] {'loss': 0.006, 'learning_rate': 2.573e-05, 'epoch': 1.83} 49%|████▊ | 4866/10000 [17:43:22<18:29:44, 12.97s/it] 49%|████▊ | 4867/10000 [17:43:35<18:29:13, 12.97s/it] {'loss': 0.0043, 'learning_rate': 2.5725e-05, 'epoch': 1.83} 49%|████▊ | 4867/10000 [17:43:35<18:29:13, 12.97s/it] 49%|████▊ | 4868/10000 [17:43:48<18:27:02, 12.94s/it] {'loss': 0.0049, 'learning_rate': 2.572e-05, 'epoch': 1.83} 49%|████▊ | 4868/10000 [17:43:48<18:27:02, 12.94s/it] 49%|████▊ | 4869/10000 [17:44:01<18:23:34, 12.90s/it] {'loss': 0.004, 'learning_rate': 2.5715e-05, 'epoch': 1.83} 49%|████▊ | 4869/10000 [17:44:01<18:23:34, 12.90s/it] 49%|████▊ | 4870/10000 [17:44:13<18:21:14, 12.88s/it] {'loss': 0.0063, 'learning_rate': 2.571e-05, 'epoch': 1.83} 49%|████▊ | 4870/10000 [17:44:13<18:21:14, 12.88s/it] 49%|████▊ | 4871/10000 [17:44:26<18:19:41, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.5705000000000002e-05, 'epoch': 1.84} 49%|████▊ | 4871/10000 [17:44:26<18:19:41, 12.86s/it] 49%|████▊ | 4872/10000 [17:44:39<18:20:02, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.57e-05, 'epoch': 1.84} 49%|████▊ | 4872/10000 [17:44:39<18:20:02, 12.87s/it] 49%|████▊ | 4873/10000 [17:44:52<18:20:39, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.5695000000000004e-05, 'epoch': 1.84} 49%|████▊ | 4873/10000 [17:44:52<18:20:39, 12.88s/it] 49%|████▊ | 4874/10000 [17:45:05<18:19:42, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.5690000000000003e-05, 'epoch': 1.84} 49%|████▊ | 4874/10000 [17:45:05<18:19:42, 12.87s/it] 49%|████▉ | 4875/10000 [17:45:18<18:18:38, 12.86s/it] 
{'loss': 0.0055, 'learning_rate': 2.5685000000000002e-05, 'epoch': 1.84} 49%|████▉ | 4875/10000 [17:45:18<18:18:38, 12.86s/it] 49%|████▉ | 4876/10000 [17:45:31<18:18:00, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.5679999999999998e-05, 'epoch': 1.84} 49%|████▉ | 4876/10000 [17:45:31<18:18:00, 12.86s/it] 49%|████▉ | 4877/10000 [17:45:43<18:18:23, 12.86s/it] {'loss': 0.006, 'learning_rate': 2.5675e-05, 'epoch': 1.84} 49%|████▉ | 4877/10000 [17:45:43<18:18:23, 12.86s/it] 49%|████▉ | 4878/10000 [17:45:56<18:17:21, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.567e-05, 'epoch': 1.84} 49%|████▉ | 4878/10000 [17:45:56<18:17:21, 12.85s/it] 49%|████▉ | 4879/10000 [17:46:09<18:17:23, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.5665e-05, 'epoch': 1.84} 49%|████▉ | 4879/10000 [17:46:09<18:17:23, 12.86s/it] 49%|████▉ | 4880/10000 [17:46:22<18:16:52, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.566e-05, 'epoch': 1.84} 49%|████▉ | 4880/10000 [17:46:22<18:16:52, 12.85s/it] 49%|████▉ | 4881/10000 [17:46:35<18:15:25, 12.84s/it] {'loss': 0.0055, 'learning_rate': 2.5655e-05, 'epoch': 1.84} 49%|████▉ | 4881/10000 [17:46:35<18:15:25, 12.84s/it] 49%|████▉ | 4882/10000 [17:46:48<18:15:20, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.5650000000000003e-05, 'epoch': 1.84} 49%|████▉ | 4882/10000 [17:46:48<18:15:20, 12.84s/it] 49%|████▉ | 4883/10000 [17:47:01<18:16:25, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.5645000000000003e-05, 'epoch': 1.84} 49%|████▉ | 4883/10000 [17:47:01<18:16:25, 12.86s/it] 49%|████▉ | 4884/10000 [17:47:13<18:15:41, 12.85s/it] {'loss': 0.0059, 'learning_rate': 2.5640000000000002e-05, 'epoch': 1.84} 49%|████▉ | 4884/10000 [17:47:13<18:15:41, 12.85s/it] 49%|████▉ | 4885/10000 [17:47:26<18:14:00, 12.83s/it] {'loss': 0.0069, 'learning_rate': 2.5635000000000004e-05, 'epoch': 1.84} 49%|████▉ | 4885/10000 [17:47:26<18:14:00, 12.83s/it] 49%|████▉ | 4886/10000 [17:47:39<18:14:40, 12.84s/it] {'loss': 0.0056, 'learning_rate': 2.5629999999999997e-05, 'epoch': 
1.84} 49%|████▉ | 4886/10000 [17:47:39<18:14:40, 12.84s/it] 49%|████▉ | 4887/10000 [17:47:52<18:15:55, 12.86s/it] {'loss': 0.0052, 'learning_rate': 2.5625e-05, 'epoch': 1.84} 49%|████▉ | 4887/10000 [17:47:52<18:15:55, 12.86s/it] 49%|████▉ | 4888/10000 [17:48:05<18:15:55, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.562e-05, 'epoch': 1.84} 49%|████▉ | 4888/10000 [17:48:05<18:15:55, 12.86s/it] 49%|████▉ | 4889/10000 [17:48:18<18:14:46, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.5615e-05, 'epoch': 1.84} 49%|████▉ | 4889/10000 [17:48:18<18:14:46, 12.85s/it] 49%|████▉ | 4890/10000 [17:48:31<18:15:26, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.561e-05, 'epoch': 1.84} 49%|████▉ | 4890/10000 [17:48:31<18:15:26, 12.86s/it] 49%|████▉ | 4891/10000 [17:48:43<18:15:24, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.5605e-05, 'epoch': 1.84} 49%|████▉ | 4891/10000 [17:48:43<18:15:24, 12.86s/it] 49%|████▉ | 4892/10000 [17:48:56<18:14:09, 12.85s/it] {'loss': 0.0058, 'learning_rate': 2.5600000000000002e-05, 'epoch': 1.84} 49%|████▉ | 4892/10000 [17:48:56<18:14:09, 12.85s/it] 49%|████▉ | 4893/10000 [17:49:09<18:14:01, 12.85s/it] {'loss': 0.0055, 'learning_rate': 2.5595e-05, 'epoch': 1.84} 49%|████▉ | 4893/10000 [17:49:09<18:14:01, 12.85s/it] 49%|████▉ | 4894/10000 [17:49:22<18:14:00, 12.86s/it] {'loss': 0.0038, 'learning_rate': 2.5590000000000004e-05, 'epoch': 1.84} 49%|████▉ | 4894/10000 [17:49:22<18:14:00, 12.86s/it] 49%|████▉ | 4895/10000 [17:49:35<18:12:46, 12.84s/it] {'loss': 0.0059, 'learning_rate': 2.5585000000000003e-05, 'epoch': 1.84} 49%|████▉ | 4895/10000 [17:49:35<18:12:46, 12.84s/it] 49%|████▉ | 4896/10000 [17:49:48<18:12:21, 12.84s/it] {'loss': 0.0037, 'learning_rate': 2.5580000000000002e-05, 'epoch': 1.84} 49%|████▉ | 4896/10000 [17:49:48<18:12:21, 12.84s/it] 49%|████▉ | 4897/10000 [17:50:00<18:12:19, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.5574999999999998e-05, 'epoch': 1.85} 49%|████▉ | 4897/10000 [17:50:00<18:12:19, 12.84s/it] 49%|████▉ | 4898/10000 
[17:50:13<18:12:01, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.557e-05, 'epoch': 1.85} 49%|████▉ | 4898/10000 [17:50:13<18:12:01, 12.84s/it] 49%|████▉ | 4899/10000 [17:50:26<18:12:52, 12.85s/it] {'loss': 0.0059, 'learning_rate': 2.5565e-05, 'epoch': 1.85} 49%|████▉ | 4899/10000 [17:50:26<18:12:52, 12.85s/it] 49%|████▉ | 4900/10000 [17:50:39<18:14:07, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.556e-05, 'epoch': 1.85} 49%|████▉ | 4900/10000 [17:50:39<18:14:07, 12.87s/it] 49%|████▉ | 4901/10000 [17:50:52<18:13:54, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.5555000000000002e-05, 'epoch': 1.85} 49%|████▉ | 4901/10000 [17:50:52<18:13:54, 12.87s/it] 49%|████▉ | 4902/10000 [17:51:05<18:13:47, 12.87s/it] {'loss': 0.0061, 'learning_rate': 2.555e-05, 'epoch': 1.85} 49%|████▉ | 4902/10000 [17:51:05<18:13:47, 12.87s/it] 49%|████▉ | 4903/10000 [17:51:18<18:13:54, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.5545000000000004e-05, 'epoch': 1.85} 49%|████▉ | 4903/10000 [17:51:18<18:13:54, 12.88s/it] 49%|████▉ | 4904/10000 [17:51:31<18:13:12, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.5540000000000003e-05, 'epoch': 1.85} 49%|████▉ | 4904/10000 [17:51:31<18:13:12, 12.87s/it] 49%|████▉ | 4905/10000 [17:51:43<18:13:13, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.5535000000000002e-05, 'epoch': 1.85} 49%|████▉ | 4905/10000 [17:51:43<18:13:13, 12.87s/it] 49%|████▉ | 4906/10000 [17:51:56<18:12:25, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.5530000000000005e-05, 'epoch': 1.85} 49%|████▉ | 4906/10000 [17:51:56<18:12:25, 12.87s/it] 49%|████▉ | 4907/10000 [17:52:09<18:12:18, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.5525e-05, 'epoch': 1.85} 49%|████▉ | 4907/10000 [17:52:09<18:12:18, 12.87s/it] 49%|████▉ | 4908/10000 [17:52:22<18:12:33, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.552e-05, 'epoch': 1.85} 49%|████▉ | 4908/10000 [17:52:22<18:12:33, 12.87s/it] 49%|████▉ | 4909/10000 [17:52:35<18:12:12, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.5515e-05, 
'epoch': 1.85} 49%|████▉ | 4909/10000 [17:52:35<18:12:12, 12.87s/it] 49%|████▉ | 4910/10000 [17:52:48<18:10:44, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.551e-05, 'epoch': 1.85} 49%|████▉ | 4910/10000 [17:52:48<18:10:44, 12.86s/it] 49%|████▉ | 4911/10000 [17:53:01<18:10:19, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.5505e-05, 'epoch': 1.85} 49%|████▉ | 4911/10000 [17:53:01<18:10:19, 12.86s/it] 49%|████▉ | 4912/10000 [17:53:13<18:09:58, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.5500000000000003e-05, 'epoch': 1.85} 49%|████▉ | 4912/10000 [17:53:13<18:09:58, 12.85s/it] 49%|████▉ | 4913/10000 [17:53:26<18:08:37, 12.84s/it] {'loss': 0.006, 'learning_rate': 2.5495000000000002e-05, 'epoch': 1.85} 49%|████▉ | 4913/10000 [17:53:26<18:08:37, 12.84s/it] 49%|████▉ | 4914/10000 [17:53:39<18:09:50, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.549e-05, 'epoch': 1.85} 49%|████▉ | 4914/10000 [17:53:39<18:09:50, 12.86s/it] 49%|████▉ | 4915/10000 [17:53:52<18:09:58, 12.86s/it] {'loss': 0.0033, 'learning_rate': 2.5485000000000004e-05, 'epoch': 1.85} 49%|████▉ | 4915/10000 [17:53:52<18:09:58, 12.86s/it] 49%|████▉ | 4916/10000 [17:54:05<18:09:35, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.5480000000000003e-05, 'epoch': 1.85} 49%|████▉ | 4916/10000 [17:54:05<18:09:35, 12.86s/it] 49%|████▉ | 4917/10000 [17:54:18<18:09:26, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.5475e-05, 'epoch': 1.85} 49%|████▉ | 4917/10000 [17:54:18<18:09:26, 12.86s/it] 49%|████▉ | 4918/10000 [17:54:31<18:08:31, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.547e-05, 'epoch': 1.85} 49%|████▉ | 4918/10000 [17:54:31<18:08:31, 12.85s/it] 49%|████▉ | 4919/10000 [17:54:43<18:08:31, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.5465e-05, 'epoch': 1.85} 49%|████▉ | 4919/10000 [17:54:43<18:08:31, 12.85s/it] 49%|████▉ | 4920/10000 [17:54:56<18:09:45, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.546e-05, 'epoch': 1.85} 49%|████▉ | 4920/10000 [17:54:56<18:09:45, 12.87s/it] 49%|████▉ | 4921/10000 
[17:55:09<18:08:14, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.5455e-05, 'epoch': 1.85} 49%|████▉ | 4921/10000 [17:55:09<18:08:14, 12.86s/it] 49%|████▉ | 4922/10000 [17:55:22<18:07:21, 12.85s/it] {'loss': 0.0057, 'learning_rate': 2.5450000000000002e-05, 'epoch': 1.85} 49%|████▉ | 4922/10000 [17:55:22<18:07:21, 12.85s/it] 49%|████▉ | 4923/10000 [17:55:35<18:06:56, 12.85s/it] {'loss': 0.0055, 'learning_rate': 2.5445e-05, 'epoch': 1.85} 49%|████▉ | 4923/10000 [17:55:35<18:06:56, 12.85s/it] 49%|████▉ | 4924/10000 [17:55:48<18:06:48, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.5440000000000004e-05, 'epoch': 1.86} 49%|████▉ | 4924/10000 [17:55:48<18:06:48, 12.85s/it] 49%|████▉ | 4925/10000 [17:56:01<18:08:07, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.5435000000000003e-05, 'epoch': 1.86} 49%|████▉ | 4925/10000 [17:56:01<18:08:07, 12.86s/it] 49%|████▉ | 4926/10000 [17:56:13<18:08:56, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.5430000000000002e-05, 'epoch': 1.86} 49%|████▉ | 4926/10000 [17:56:14<18:08:56, 12.88s/it] 49%|████▉ | 4927/10000 [17:56:26<18:09:23, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.5424999999999998e-05, 'epoch': 1.86} 49%|████▉ | 4927/10000 [17:56:26<18:09:23, 12.88s/it] 49%|████▉ | 4928/10000 [17:56:39<18:08:57, 12.88s/it] {'loss': 0.0057, 'learning_rate': 2.542e-05, 'epoch': 1.86} 49%|████▉ | 4928/10000 [17:56:39<18:08:57, 12.88s/it] 49%|████▉ | 4929/10000 [17:56:52<18:08:23, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.5415e-05, 'epoch': 1.86} 49%|████▉ | 4929/10000 [17:56:52<18:08:23, 12.88s/it] 49%|████▉ | 4930/10000 [17:57:05<18:08:19, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.541e-05, 'epoch': 1.86} 49%|████▉ | 4930/10000 [17:57:05<18:08:19, 12.88s/it] 49%|████▉ | 4931/10000 [17:57:18<18:05:51, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.5405e-05, 'epoch': 1.86} 49%|████▉ | 4931/10000 [17:57:18<18:05:51, 12.85s/it] 49%|████▉ | 4932/10000 [17:57:31<18:06:01, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.54e-05, 'epoch': 
1.86} 49%|████▉ | 4932/10000 [17:57:31<18:06:01, 12.86s/it] 49%|████▉ | 4933/10000 [17:57:43<18:04:19, 12.84s/it] {'loss': 0.0034, 'learning_rate': 2.5395000000000003e-05, 'epoch': 1.86} 49%|████▉ | 4933/10000 [17:57:43<18:04:19, 12.84s/it] 49%|████▉ | 4934/10000 [17:57:56<18:05:18, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.5390000000000003e-05, 'epoch': 1.86} 49%|████▉ | 4934/10000 [17:57:56<18:05:18, 12.85s/it] 49%|████▉ | 4935/10000 [17:58:09<18:05:30, 12.86s/it] {'loss': 0.0055, 'learning_rate': 2.5385000000000002e-05, 'epoch': 1.86} 49%|████▉ | 4935/10000 [17:58:09<18:05:30, 12.86s/it] 49%|████▉ | 4936/10000 [17:58:22<18:06:16, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.5380000000000004e-05, 'epoch': 1.86} 49%|████▉ | 4936/10000 [17:58:22<18:06:16, 12.87s/it] 49%|████▉ | 4937/10000 [17:58:35<18:05:02, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.5375e-05, 'epoch': 1.86} 49%|████▉ | 4937/10000 [17:58:35<18:05:02, 12.86s/it] 49%|████▉ | 4938/10000 [17:58:48<18:03:28, 12.84s/it] {'loss': 0.0047, 'learning_rate': 2.537e-05, 'epoch': 1.86} 49%|████▉ | 4938/10000 [17:58:48<18:03:28, 12.84s/it] 49%|████▉ | 4939/10000 [17:59:01<18:04:00, 12.85s/it] {'loss': 0.0042, 'learning_rate': 2.5365e-05, 'epoch': 1.86} 49%|████▉ | 4939/10000 [17:59:01<18:04:00, 12.85s/it] 49%|████▉ | 4940/10000 [17:59:14<18:04:55, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.536e-05, 'epoch': 1.86} 49%|████▉ | 4940/10000 [17:59:14<18:04:55, 12.86s/it] 49%|████▉ | 4941/10000 [17:59:26<18:03:29, 12.85s/it] {'loss': 0.0062, 'learning_rate': 2.5355e-05, 'epoch': 1.86} 49%|████▉ | 4941/10000 [17:59:26<18:03:29, 12.85s/it] 49%|████▉ | 4942/10000 [17:59:39<18:03:15, 12.85s/it] {'loss': 0.0058, 'learning_rate': 2.5350000000000003e-05, 'epoch': 1.86} 49%|████▉ | 4942/10000 [17:59:39<18:03:15, 12.85s/it] 49%|████▉ | 4943/10000 [17:59:52<18:04:09, 12.86s/it] {'loss': 0.0065, 'learning_rate': 2.5345000000000002e-05, 'epoch': 1.86} 49%|████▉ | 4943/10000 [17:59:52<18:04:09, 12.86s/it] 49%|████▉ 
| 4944/10000 [18:00:05<18:02:30, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.534e-05, 'epoch': 1.86} 49%|████▉ | 4944/10000 [18:00:05<18:02:30, 12.85s/it] 49%|████▉ | 4945/10000 [18:00:18<18:02:40, 12.85s/it] {'loss': 0.0061, 'learning_rate': 2.5335000000000004e-05, 'epoch': 1.86} 49%|████▉ | 4945/10000 [18:00:18<18:02:40, 12.85s/it] 49%|████▉ | 4946/10000 [18:00:31<18:02:46, 12.85s/it] {'loss': 0.0059, 'learning_rate': 2.5330000000000003e-05, 'epoch': 1.86} 49%|████▉ | 4946/10000 [18:00:31<18:02:46, 12.85s/it] 49%|████▉ | 4947/10000 [18:00:43<18:01:40, 12.84s/it] {'loss': 0.0054, 'learning_rate': 2.5325e-05, 'epoch': 1.86} 49%|████▉ | 4947/10000 [18:00:43<18:01:40, 12.84s/it] 49%|████▉ | 4948/10000 [18:00:56<18:03:01, 12.86s/it] {'loss': 0.0036, 'learning_rate': 2.5319999999999998e-05, 'epoch': 1.86} 49%|████▉ | 4948/10000 [18:00:56<18:03:01, 12.86s/it] 49%|████▉ | 4949/10000 [18:01:09<18:04:09, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.5315e-05, 'epoch': 1.86} 49%|████▉ | 4949/10000 [18:01:09<18:04:09, 12.88s/it] 50%|████▉ | 4950/10000 [18:01:22<18:03:59, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.531e-05, 'epoch': 1.87} 50%|████▉ | 4950/10000 [18:01:22<18:03:59, 12.88s/it] 50%|████▉ | 4951/10000 [18:01:35<18:03:35, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.5305000000000003e-05, 'epoch': 1.87} 50%|████▉ | 4951/10000 [18:01:35<18:03:35, 12.88s/it] 50%|████▉ | 4952/10000 [18:01:48<18:03:36, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.5300000000000002e-05, 'epoch': 1.87} 50%|████▉ | 4952/10000 [18:01:48<18:03:36, 12.88s/it] 50%|████▉ | 4953/10000 [18:02:01<18:02:50, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.5295e-05, 'epoch': 1.87} 50%|████▉ | 4953/10000 [18:02:01<18:02:50, 12.87s/it] 50%|████▉ | 4954/10000 [18:02:14<18:01:39, 12.86s/it] {'loss': 0.0063, 'learning_rate': 2.5290000000000004e-05, 'epoch': 1.87} 50%|████▉ | 4954/10000 [18:02:14<18:01:39, 12.86s/it] 50%|████▉ | 4955/10000 [18:02:26<18:00:37, 12.85s/it] {'loss': 0.0063, 
'learning_rate': 2.5285000000000003e-05, 'epoch': 1.87} 50%|████▉ | 4955/10000 [18:02:26<18:00:37, 12.85s/it] 50%|████▉ | 4956/10000 [18:02:39<18:01:12, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.5280000000000005e-05, 'epoch': 1.87} 50%|████▉ | 4956/10000 [18:02:39<18:01:12, 12.86s/it] 50%|████▉ | 4957/10000 [18:02:52<18:01:47, 12.87s/it] {'loss': 0.0062, 'learning_rate': 2.5274999999999998e-05, 'epoch': 1.87} 50%|████▉ | 4957/10000 [18:02:52<18:01:47, 12.87s/it] 50%|████▉ | 4958/10000 [18:03:05<18:03:43, 12.90s/it] {'loss': 0.0042, 'learning_rate': 2.527e-05, 'epoch': 1.87} 50%|████▉ | 4958/10000 [18:03:05<18:03:43, 12.90s/it] 50%|████▉ | 4959/10000 [18:03:18<18:07:34, 12.94s/it] {'loss': 0.0048, 'learning_rate': 2.5265e-05, 'epoch': 1.87} 50%|████▉ | 4959/10000 [18:03:18<18:07:34, 12.94s/it] 50%|████▉ | 4960/10000 [18:03:31<18:09:00, 12.96s/it] {'loss': 0.0043, 'learning_rate': 2.526e-05, 'epoch': 1.87} 50%|████▉ | 4960/10000 [18:03:31<18:09:00, 12.96s/it] 50%|████▉ | 4961/10000 [18:03:44<18:11:07, 12.99s/it] {'loss': 0.0047, 'learning_rate': 2.5255e-05, 'epoch': 1.87} 50%|████▉ | 4961/10000 [18:03:44<18:11:07, 12.99s/it] 50%|████▉ | 4962/10000 [18:03:57<18:10:49, 12.99s/it] {'loss': 0.0045, 'learning_rate': 2.525e-05, 'epoch': 1.87} 50%|████▉ | 4962/10000 [18:03:57<18:10:49, 12.99s/it] 50%|████▉ | 4963/10000 [18:04:10<18:09:40, 12.98s/it] {'loss': 0.0059, 'learning_rate': 2.5245000000000003e-05, 'epoch': 1.87} 50%|████▉ | 4963/10000 [18:04:10<18:09:40, 12.98s/it] 50%|████▉ | 4964/10000 [18:04:23<18:08:14, 12.97s/it] {'loss': 0.004, 'learning_rate': 2.5240000000000002e-05, 'epoch': 1.87} 50%|████▉ | 4964/10000 [18:04:23<18:08:14, 12.97s/it] 50%|████▉ | 4965/10000 [18:04:36<18:06:46, 12.95s/it] {'loss': 0.0048, 'learning_rate': 2.5235e-05, 'epoch': 1.87} 50%|████▉ | 4965/10000 [18:04:36<18:06:46, 12.95s/it] 50%|████▉ | 4966/10000 [18:04:49<18:04:56, 12.93s/it] {'loss': 0.0056, 'learning_rate': 2.5230000000000004e-05, 'epoch': 1.87} 50%|████▉ | 4966/10000 
[18:04:49<18:04:56, 12.93s/it] 50%|████▉ | 4967/10000 [18:05:02<18:05:55, 12.95s/it] {'loss': 0.0044, 'learning_rate': 2.5225e-05, 'epoch': 1.87} 50%|████▉ | 4967/10000 [18:05:02<18:05:55, 12.95s/it] 50%|████▉ | 4968/10000 [18:05:15<18:07:15, 12.96s/it] {'loss': 0.0052, 'learning_rate': 2.522e-05, 'epoch': 1.87} 50%|████▉ | 4968/10000 [18:05:15<18:07:15, 12.96s/it] 50%|████▉ | 4969/10000 [18:05:28<18:07:32, 12.97s/it] {'loss': 0.0047, 'learning_rate': 2.5214999999999998e-05, 'epoch': 1.87} 50%|████▉ | 4969/10000 [18:05:28<18:07:32, 12.97s/it] 50%|████▉ | 4970/10000 [18:05:41<18:08:28, 12.98s/it] {'loss': 0.0056, 'learning_rate': 2.521e-05, 'epoch': 1.87} 50%|████▉ | 4970/10000 [18:05:41<18:08:28, 12.98s/it] 50%|████▉ | 4971/10000 [18:05:54<18:09:46, 13.00s/it] {'loss': 0.0039, 'learning_rate': 2.5205e-05, 'epoch': 1.87} 50%|████▉ | 4971/10000 [18:05:54<18:09:46, 13.00s/it] 50%|████▉ | 4972/10000 [18:06:07<18:10:30, 13.01s/it] {'loss': 0.0059, 'learning_rate': 2.5200000000000003e-05, 'epoch': 1.87} 50%|████▉ | 4972/10000 [18:06:07<18:10:30, 13.01s/it] 50%|████▉ | 4973/10000 [18:06:20<18:10:40, 13.02s/it] {'loss': 0.005, 'learning_rate': 2.5195000000000002e-05, 'epoch': 1.87} 50%|████▉ | 4973/10000 [18:06:20<18:10:40, 13.02s/it] 50%|████▉ | 4974/10000 [18:06:33<18:11:26, 13.03s/it] {'loss': 0.0054, 'learning_rate': 2.519e-05, 'epoch': 1.87} 50%|████▉ | 4974/10000 [18:06:33<18:11:26, 13.03s/it] 50%|████▉ | 4975/10000 [18:06:46<18:10:27, 13.02s/it] {'loss': 0.0043, 'learning_rate': 2.5185000000000004e-05, 'epoch': 1.87} 50%|████▉ | 4975/10000 [18:06:46<18:10:27, 13.02s/it] 50%|████▉ | 4976/10000 [18:06:59<18:12:45, 13.05s/it] {'loss': 0.0053, 'learning_rate': 2.5180000000000003e-05, 'epoch': 1.87} 50%|████▉ | 4976/10000 [18:06:59<18:12:45, 13.05s/it] 50%|████▉ | 4977/10000 [18:07:12<18:11:45, 13.04s/it] {'loss': 0.0054, 'learning_rate': 2.5175e-05, 'epoch': 1.88} 50%|████▉ | 4977/10000 [18:07:12<18:11:45, 13.04s/it] 50%|████▉ | 4978/10000 [18:07:25<18:10:26, 13.03s/it] 
{'loss': 0.0061, 'learning_rate': 2.5169999999999998e-05, 'epoch': 1.88} 50%|████▉ | 4978/10000 [18:07:25<18:10:26, 13.03s/it] 50%|████▉ | 4979/10000 [18:07:38<18:09:35, 13.02s/it] {'loss': 0.0055, 'learning_rate': 2.5165e-05, 'epoch': 1.88} 50%|████▉ | 4979/10000 [18:07:38<18:09:35, 13.02s/it] 50%|████▉ | 4980/10000 [18:07:51<18:08:56, 13.02s/it] {'loss': 0.0042, 'learning_rate': 2.516e-05, 'epoch': 1.88} 50%|████▉ | 4980/10000 [18:07:51<18:08:56, 13.02s/it] 50%|████▉ | 4981/10000 [18:08:04<18:05:45, 12.98s/it] {'loss': 0.0045, 'learning_rate': 2.5155000000000002e-05, 'epoch': 1.88} 50%|████▉ | 4981/10000 [18:08:04<18:05:45, 12.98s/it] 50%|████▉ | 4982/10000 [18:08:17<18:03:19, 12.95s/it] {'loss': 0.0046, 'learning_rate': 2.515e-05, 'epoch': 1.88} 50%|████▉ | 4982/10000 [18:08:17<18:03:19, 12.95s/it] 50%|████▉ | 4983/10000 [18:08:30<18:01:14, 12.93s/it] {'loss': 0.0048, 'learning_rate': 2.5145e-05, 'epoch': 1.88} 50%|████▉ | 4983/10000 [18:08:30<18:01:14, 12.93s/it] 50%|████▉ | 4984/10000 [18:08:43<18:00:45, 12.93s/it] {'loss': 0.0052, 'learning_rate': 2.5140000000000003e-05, 'epoch': 1.88} 50%|████▉ | 4984/10000 [18:08:43<18:00:45, 12.93s/it] 50%|████▉ | 4985/10000 [18:08:56<18:00:17, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.5135000000000002e-05, 'epoch': 1.88} 50%|████▉ | 4985/10000 [18:08:56<18:00:17, 12.92s/it] 50%|████▉ | 4986/10000 [18:09:09<18:00:53, 12.93s/it] {'loss': 0.004, 'learning_rate': 2.5130000000000005e-05, 'epoch': 1.88} 50%|████▉ | 4986/10000 [18:09:09<18:00:53, 12.93s/it] 50%|████▉ | 4987/10000 [18:09:22<18:00:13, 12.93s/it] {'loss': 0.0038, 'learning_rate': 2.5124999999999997e-05, 'epoch': 1.88} 50%|████▉ | 4987/10000 [18:09:22<18:00:13, 12.93s/it] 50%|████▉ | 4988/10000 [18:09:35<18:00:00, 12.93s/it] {'loss': 0.0043, 'learning_rate': 2.512e-05, 'epoch': 1.88} 50%|████▉ | 4988/10000 [18:09:35<18:00:00, 12.93s/it] 50%|████▉ | 4989/10000 [18:09:47<17:59:27, 12.93s/it] {'loss': 0.0062, 'learning_rate': 2.5115e-05, 'epoch': 1.88} 50%|████▉ | 
4989/10000 [18:09:48<17:59:27, 12.93s/it] 50%|████▉ | 4990/10000 [18:10:00<18:00:50, 12.94s/it] {'loss': 0.0061, 'learning_rate': 2.5110000000000002e-05, 'epoch': 1.88} 50%|████▉ | 4990/10000 [18:10:00<18:00:50, 12.94s/it] 50%|████▉ | 4991/10000 [18:10:13<17:59:59, 12.94s/it] {'loss': 0.0038, 'learning_rate': 2.5105e-05, 'epoch': 1.88} 50%|████▉ | 4991/10000 [18:10:13<17:59:59, 12.94s/it] 50%|████▉ | 4992/10000 [18:10:26<17:58:09, 12.92s/it] {'loss': 0.0048, 'learning_rate': 2.51e-05, 'epoch': 1.88} 50%|████▉ | 4992/10000 [18:10:26<17:58:09, 12.92s/it] 50%|████▉ | 4993/10000 [18:10:39<17:56:57, 12.91s/it] {'loss': 0.0054, 'learning_rate': 2.5095000000000003e-05, 'epoch': 1.88} 50%|████▉ | 4993/10000 [18:10:39<17:56:57, 12.91s/it] 50%|████▉ | 4994/10000 [18:10:52<17:58:15, 12.92s/it] {'loss': 0.005, 'learning_rate': 2.5090000000000002e-05, 'epoch': 1.88} 50%|████▉ | 4994/10000 [18:10:52<17:58:15, 12.92s/it] 50%|████▉ | 4995/10000 [18:11:05<17:57:05, 12.91s/it] {'loss': 0.0063, 'learning_rate': 2.5085000000000005e-05, 'epoch': 1.88} 50%|████▉ | 4995/10000 [18:11:05<17:57:05, 12.91s/it] 50%|████▉ | 4996/10000 [18:11:18<17:56:06, 12.90s/it] {'loss': 0.0063, 'learning_rate': 2.5080000000000004e-05, 'epoch': 1.88} 50%|████▉ | 4996/10000 [18:11:18<17:56:06, 12.90s/it] 50%|████▉ | 4997/10000 [18:11:31<17:54:54, 12.89s/it] {'loss': 0.007, 'learning_rate': 2.5075e-05, 'epoch': 1.88} 50%|████▉ | 4997/10000 [18:11:31<17:54:54, 12.89s/it] 50%|████▉ | 4998/10000 [18:11:44<17:54:42, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.507e-05, 'epoch': 1.88} 50%|████▉ | 4998/10000 [18:11:44<17:54:42, 12.89s/it] 50%|████▉ | 4999/10000 [18:11:57<17:56:30, 12.92s/it] {'loss': 0.0042, 'learning_rate': 2.5064999999999998e-05, 'epoch': 1.88} 50%|████▉ | 4999/10000 [18:11:57<17:56:30, 12.92s/it] 50%|█████ | 5000/10000 [18:12:10<17:56:47, 12.92s/it] {'loss': 0.0037, 'learning_rate': 2.506e-05, 'epoch': 1.88} 50%|█████ | 5000/10000 [18:12:10<17:56:47, 12.92s/it]Saving the whole model 
[INFO|configuration_utils.py:458] 2024-11-06 14:37:08,119 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/config.json [INFO|configuration_utils.py:364] 2024-11-06 14:37:08,121 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-06 14:37:53,972 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-06 14:37:53,974 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-06 14:37:53,975 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/special_tokens_map.json [2024-11-06 14:37:53,984] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step5000 is about to be saved! [2024-11-06 14:37:54,036] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/global_step5000/mp_rank_00_model_states.pt [2024-11-06 14:37:54,036] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/global_step5000/mp_rank_00_model_states.pt... [2024-11-06 14:38:42,094] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/global_step5000/mp_rank_00_model_states.pt. [2024-11-06 14:38:42,192] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-06 14:40:22,255] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2024-11-06 14:40:22,430] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-06 14:40:22,430] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step5000 is ready now! 50%|█████ | 5001/10000 [18:15:37<99:07:23, 71.38s/it] {'loss': 0.0062, 'learning_rate': 2.5055e-05, 'epoch': 1.88} 50%|█████ | 5001/10000 [18:15:37<99:07:23, 71.38s/it] 50%|█████ | 5002/10000 [18:15:50<74:40:34, 53.79s/it] {'loss': 0.0042, 'learning_rate': 2.5050000000000002e-05, 'epoch': 1.88} 50%|█████ | 5002/10000 [18:15:50<74:40:34, 53.79s/it] 50%|█████ | 5003/10000 [18:16:03<57:35:47, 41.49s/it] {'loss': 0.0042, 'learning_rate': 2.5045e-05, 'epoch': 1.89} 50%|█████ | 5003/10000 [18:16:03<57:35:47, 41.49s/it] 50%|█████ | 5004/10000 [18:16:16<45:39:21, 32.90s/it] {'loss': 0.0038, 'learning_rate': 2.504e-05, 'epoch': 1.89} 50%|█████ | 5004/10000 [18:16:16<45:39:21, 32.90s/it] 50%|█████ | 5005/10000 [18:16:29<37:19:34, 26.90s/it] {'loss': 0.0043, 'learning_rate': 2.5035000000000003e-05, 'epoch': 1.89} 50%|█████ | 5005/10000 [18:16:29<37:19:34, 26.90s/it] 50%|█████ | 5006/10000 [18:16:41<31:28:31, 22.69s/it] {'loss': 0.0053, 'learning_rate': 2.5030000000000003e-05, 'epoch': 1.89} 50%|█████ | 5006/10000 [18:16:42<31:28:31, 22.69s/it] 50%|█████ | 5007/10000 [18:16:54<27:22:05, 19.73s/it] {'loss': 0.0037, 'learning_rate': 2.5025e-05, 'epoch': 1.89} 50%|█████ | 5007/10000 [18:16:54<27:22:05, 19.73s/it] 50%|█████ | 5008/10000 [18:17:07<24:29:28, 17.66s/it] {'loss': 0.0041, 'learning_rate': 2.5019999999999998e-05, 'epoch': 1.89} 50%|█████ | 5008/10000 [18:17:07<24:29:28, 17.66s/it] 50%|█████ | 5009/10000 [18:17:20<22:29:17, 16.22s/it] {'loss': 0.0053, 'learning_rate': 2.5015e-05, 'epoch': 1.89} 50%|█████ | 5009/10000 [18:17:20<22:29:17, 16.22s/it] 50%|█████ | 5010/10000 [18:17:33<21:05:02, 15.21s/it] {'loss': 0.0062, 'learning_rate': 
2.501e-05, 'epoch': 1.89} 50%|█████ | 5010/10000 [18:17:33<21:05:02, 15.21s/it] 50%|█████ | 5011/10000 [18:17:46<20:07:39, 14.52s/it] {'loss': 0.0064, 'learning_rate': 2.5005000000000002e-05, 'epoch': 1.89} 50%|█████ | 5011/10000 [18:17:46<20:07:39, 14.52s/it] 50%|█████ | 5012/10000 [18:17:59<19:26:35, 14.03s/it] {'loss': 0.0061, 'learning_rate': 2.5e-05, 'epoch': 1.89} 50%|█████ | 5012/10000 [18:17:59<19:26:35, 14.03s/it] 50%|█████ | 5013/10000 [18:18:11<18:56:24, 13.67s/it] {'loss': 0.0044, 'learning_rate': 2.4995e-05, 'epoch': 1.89} 50%|█████ | 5013/10000 [18:18:12<18:56:24, 13.67s/it] 50%|█████ | 5014/10000 [18:18:24<18:37:32, 13.45s/it] {'loss': 0.0043, 'learning_rate': 2.4990000000000003e-05, 'epoch': 1.89} 50%|█████ | 5014/10000 [18:18:24<18:37:32, 13.45s/it] 50%|█████ | 5015/10000 [18:18:37<18:23:36, 13.28s/it] {'loss': 0.0056, 'learning_rate': 2.4985e-05, 'epoch': 1.89} 50%|█████ | 5015/10000 [18:18:37<18:23:36, 13.28s/it] 50%|█████ | 5016/10000 [18:18:50<18:13:28, 13.16s/it] {'loss': 0.0054, 'learning_rate': 2.498e-05, 'epoch': 1.89} 50%|█████ | 5016/10000 [18:18:50<18:13:28, 13.16s/it] 50%|█████ | 5017/10000 [18:19:03<18:05:24, 13.07s/it] {'loss': 0.0044, 'learning_rate': 2.4975e-05, 'epoch': 1.89} 50%|█████ | 5017/10000 [18:19:03<18:05:24, 13.07s/it] 50%|█████ | 5018/10000 [18:19:16<18:00:32, 13.01s/it] {'loss': 0.0046, 'learning_rate': 2.4970000000000003e-05, 'epoch': 1.89} 50%|█████ | 5018/10000 [18:19:16<18:00:32, 13.01s/it] 50%|█████ | 5019/10000 [18:19:29<17:55:43, 12.96s/it] {'loss': 0.0058, 'learning_rate': 2.4965000000000002e-05, 'epoch': 1.89} 50%|█████ | 5019/10000 [18:19:29<17:55:43, 12.96s/it] 50%|█████ | 5020/10000 [18:19:42<17:55:03, 12.95s/it] {'loss': 0.0041, 'learning_rate': 2.496e-05, 'epoch': 1.89} 50%|█████ | 5020/10000 [18:19:42<17:55:03, 12.95s/it] 50%|█████ | 5021/10000 [18:19:55<17:51:56, 12.92s/it] {'loss': 0.0037, 'learning_rate': 2.4955e-05, 'epoch': 1.89} 50%|█████ | 5021/10000 [18:19:55<17:51:56, 12.92s/it] 50%|█████ | 
5022/10000 [18:20:07<17:50:23, 12.90s/it] {'loss': 0.0038, 'learning_rate': 2.495e-05, 'epoch': 1.89} 50%|█████ | 5022/10000 [18:20:07<17:50:23, 12.90s/it] 50%|█████ | 5023/10000 [18:20:20<17:48:57, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.4945000000000003e-05, 'epoch': 1.89} 50%|█████ | 5023/10000 [18:20:20<17:48:57, 12.89s/it] 50%|█████ | 5024/10000 [18:20:33<17:47:56, 12.88s/it] {'loss': 0.0043, 'learning_rate': 2.4940000000000002e-05, 'epoch': 1.89} 50%|█████ | 5024/10000 [18:20:33<17:47:56, 12.88s/it] 50%|█████ | 5025/10000 [18:20:46<17:47:03, 12.87s/it] {'loss': 0.0058, 'learning_rate': 2.4935e-05, 'epoch': 1.89} 50%|█████ | 5025/10000 [18:20:46<17:47:03, 12.87s/it] 50%|█████ | 5026/10000 [18:20:59<17:46:17, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.493e-05, 'epoch': 1.89} 50%|█████ | 5026/10000 [18:20:59<17:46:17, 12.86s/it] 50%|█████ | 5027/10000 [18:21:12<17:45:49, 12.86s/it] {'loss': 0.0059, 'learning_rate': 2.4925000000000003e-05, 'epoch': 1.89} 50%|█████ | 5027/10000 [18:21:12<17:45:49, 12.86s/it] 50%|█████ | 5028/10000 [18:21:25<17:45:31, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.4920000000000002e-05, 'epoch': 1.89} 50%|█████ | 5028/10000 [18:21:25<17:45:31, 12.86s/it] 50%|█████ | 5029/10000 [18:21:37<17:44:22, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.4915e-05, 'epoch': 1.89} 50%|█████ | 5029/10000 [18:21:37<17:44:22, 12.85s/it] 50%|█████ | 5030/10000 [18:21:50<17:45:29, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.491e-05, 'epoch': 1.9} 50%|█████ | 5030/10000 [18:21:50<17:45:29, 12.86s/it] 50%|█████ | 5031/10000 [18:22:03<17:48:09, 12.90s/it] {'loss': 0.0043, 'learning_rate': 2.4905e-05, 'epoch': 1.9} 50%|█████ | 5031/10000 [18:22:03<17:48:09, 12.90s/it] 50%|█████ | 5032/10000 [18:22:16<17:46:51, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.4900000000000002e-05, 'epoch': 1.9} 50%|█████ | 5032/10000 [18:22:16<17:46:51, 12.88s/it] 50%|█████ | 5033/10000 [18:22:29<17:45:10, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.4895e-05, 
'epoch': 1.9} 50%|█████ | 5033/10000 [18:22:29<17:45:10, 12.87s/it] 50%|█████ | 5034/10000 [18:22:42<17:46:35, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.489e-05, 'epoch': 1.9} 50%|█████ | 5034/10000 [18:22:42<17:46:35, 12.89s/it] 50%|█████ | 5035/10000 [18:22:55<17:46:05, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.4885e-05, 'epoch': 1.9} 50%|█████ | 5035/10000 [18:22:55<17:46:05, 12.88s/it] 50%|█████ | 5036/10000 [18:23:08<17:46:04, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.488e-05, 'epoch': 1.9} 50%|█████ | 5036/10000 [18:23:08<17:46:04, 12.89s/it] 50%|█████ | 5037/10000 [18:23:21<17:46:47, 12.90s/it] {'loss': 0.0058, 'learning_rate': 2.4875e-05, 'epoch': 1.9} 50%|█████ | 5037/10000 [18:23:21<17:46:47, 12.90s/it] 50%|█████ | 5038/10000 [18:23:33<17:47:34, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.487e-05, 'epoch': 1.9} 50%|█████ | 5038/10000 [18:23:33<17:47:34, 12.91s/it] 50%|█████ | 5039/10000 [18:23:46<17:48:26, 12.92s/it] {'loss': 0.0075, 'learning_rate': 2.4865000000000003e-05, 'epoch': 1.9} 50%|█████ | 5039/10000 [18:23:46<17:48:26, 12.92s/it] 50%|█████ | 5040/10000 [18:23:59<17:48:42, 12.93s/it] {'loss': 0.0053, 'learning_rate': 2.486e-05, 'epoch': 1.9} 50%|█████ | 5040/10000 [18:23:59<17:48:42, 12.93s/it] 50%|█████ | 5041/10000 [18:24:12<17:47:19, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.4855000000000002e-05, 'epoch': 1.9} 50%|█████ | 5041/10000 [18:24:12<17:47:19, 12.91s/it] 50%|█████ | 5042/10000 [18:24:25<17:47:06, 12.91s/it] {'loss': 0.0054, 'learning_rate': 2.485e-05, 'epoch': 1.9} 50%|█████ | 5042/10000 [18:24:25<17:47:06, 12.91s/it] 50%|█████ | 5043/10000 [18:24:38<17:45:06, 12.89s/it] {'loss': 0.0056, 'learning_rate': 2.4845e-05, 'epoch': 1.9} 50%|█████ | 5043/10000 [18:24:38<17:45:06, 12.89s/it] 50%|█████ | 5044/10000 [18:24:51<17:44:44, 12.89s/it] {'loss': 0.0044, 'learning_rate': 2.4840000000000003e-05, 'epoch': 1.9} 50%|█████ | 5044/10000 [18:24:51<17:44:44, 12.89s/it] 50%|█████ | 5045/10000 [18:25:04<17:44:28, 12.89s/it] 
{'loss': 0.005, 'learning_rate': 2.4835e-05, 'epoch': 1.9} 50%|█████ | 5045/10000 [18:25:04<17:44:28, 12.89s/it] 50%|█████ | 5046/10000 [18:25:17<17:43:22, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.483e-05, 'epoch': 1.9} 50%|█████ | 5046/10000 [18:25:17<17:43:22, 12.88s/it] 50%|█████ | 5047/10000 [18:25:30<17:45:10, 12.90s/it] {'loss': 0.0052, 'learning_rate': 2.4825e-05, 'epoch': 1.9} 50%|█████ | 5047/10000 [18:25:30<17:45:10, 12.90s/it] 50%|█████ | 5048/10000 [18:25:42<17:44:07, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.4820000000000003e-05, 'epoch': 1.9} 50%|█████ | 5048/10000 [18:25:42<17:44:07, 12.89s/it] 50%|█████ | 5049/10000 [18:25:55<17:43:40, 12.89s/it] {'loss': 0.0057, 'learning_rate': 2.4815000000000002e-05, 'epoch': 1.9} 50%|█████ | 5049/10000 [18:25:55<17:43:40, 12.89s/it] 50%|█████ | 5050/10000 [18:26:08<17:42:48, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.481e-05, 'epoch': 1.9} 50%|█████ | 5050/10000 [18:26:08<17:42:48, 12.88s/it] 51%|█████ | 5051/10000 [18:26:21<17:43:02, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.4805e-05, 'epoch': 1.9} 51%|█████ | 5051/10000 [18:26:21<17:43:02, 12.89s/it] 51%|█████ | 5052/10000 [18:26:34<17:41:13, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.48e-05, 'epoch': 1.9} 51%|█████ | 5052/10000 [18:26:34<17:41:13, 12.87s/it] 51%|█████ | 5053/10000 [18:26:47<17:40:42, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.4795000000000002e-05, 'epoch': 1.9} 51%|█████ | 5053/10000 [18:26:47<17:40:42, 12.86s/it] 51%|█████ | 5054/10000 [18:27:00<17:41:53, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.479e-05, 'epoch': 1.9} 51%|█████ | 5054/10000 [18:27:00<17:41:53, 12.88s/it] 51%|█████ | 5055/10000 [18:27:13<17:41:11, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.4785e-05, 'epoch': 1.9} 51%|█████ | 5055/10000 [18:27:13<17:41:11, 12.88s/it] 51%|█████ | 5056/10000 [18:27:25<17:41:18, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.478e-05, 'epoch': 1.91} 51%|█████ | 5056/10000 [18:27:25<17:41:18, 12.88s/it] 
51%|█████ | 5057/10000 [18:27:38<17:42:00, 12.89s/it] {'loss': 0.0055, 'learning_rate': 2.4775000000000003e-05, 'epoch': 1.91} 51%|█████ | 5057/10000 [18:27:38<17:42:00, 12.89s/it] 51%|█████ | 5058/10000 [18:27:51<17:44:08, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.4770000000000002e-05, 'epoch': 1.91} 51%|█████ | 5058/10000 [18:27:51<17:44:08, 12.92s/it] 51%|█████ | 5059/10000 [18:28:04<17:43:16, 12.91s/it] {'loss': 0.0046, 'learning_rate': 2.4765e-05, 'epoch': 1.91} 51%|█████ | 5059/10000 [18:28:04<17:43:16, 12.91s/it] 51%|█████ | 5060/10000 [18:28:17<17:43:46, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.476e-05, 'epoch': 1.91} 51%|█████ | 5060/10000 [18:28:17<17:43:46, 12.92s/it] 51%|█████ | 5061/10000 [18:28:30<17:44:00, 12.93s/it] {'loss': 0.004, 'learning_rate': 2.4755e-05, 'epoch': 1.91} 51%|█████ | 5061/10000 [18:28:30<17:44:00, 12.93s/it] 51%|█████ | 5062/10000 [18:28:43<17:42:35, 12.91s/it] {'loss': 0.0042, 'learning_rate': 2.4750000000000002e-05, 'epoch': 1.91} 51%|█████ | 5062/10000 [18:28:43<17:42:35, 12.91s/it] 51%|█████ | 5063/10000 [18:28:56<17:40:13, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.4745e-05, 'epoch': 1.91} 51%|█████ | 5063/10000 [18:28:56<17:40:13, 12.89s/it] 51%|█████ | 5064/10000 [18:29:09<17:41:35, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.4740000000000004e-05, 'epoch': 1.91} 51%|█████ | 5064/10000 [18:29:09<17:41:35, 12.90s/it] 51%|█████ | 5065/10000 [18:29:22<17:40:54, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.4735e-05, 'epoch': 1.91} 51%|█████ | 5065/10000 [18:29:22<17:40:54, 12.90s/it] 51%|█████ | 5066/10000 [18:29:35<17:40:29, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.473e-05, 'epoch': 1.91} 51%|█████ | 5066/10000 [18:29:35<17:40:29, 12.90s/it] 51%|█████ | 5067/10000 [18:29:47<17:41:07, 12.91s/it] {'loss': 0.0055, 'learning_rate': 2.4725e-05, 'epoch': 1.91} 51%|█████ | 5067/10000 [18:29:47<17:41:07, 12.91s/it] 51%|█████ | 5068/10000 [18:30:00<17:40:19, 12.90s/it] {'loss': 0.006, 'learning_rate': 
2.472e-05, 'epoch': 1.91} 51%|█████ | 5068/10000 [18:30:00<17:40:19, 12.90s/it] 51%|█████ | 5069/10000 [18:30:13<17:39:20, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.4715000000000003e-05, 'epoch': 1.91} 51%|█████ | 5069/10000 [18:30:13<17:39:20, 12.89s/it] 51%|█████ | 5070/10000 [18:30:26<17:37:00, 12.86s/it] {'loss': 0.006, 'learning_rate': 2.471e-05, 'epoch': 1.91} 51%|█████ | 5070/10000 [18:30:26<17:37:00, 12.86s/it] 51%|█████ | 5071/10000 [18:30:39<17:36:23, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.4705e-05, 'epoch': 1.91} 51%|█████ | 5071/10000 [18:30:39<17:36:23, 12.86s/it] 51%|█████ | 5072/10000 [18:30:52<17:35:24, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.47e-05, 'epoch': 1.91} 51%|█████ | 5072/10000 [18:30:52<17:35:24, 12.85s/it] 51%|█████ | 5073/10000 [18:31:05<17:34:21, 12.84s/it] {'loss': 0.0051, 'learning_rate': 2.4695e-05, 'epoch': 1.91} 51%|█████ | 5073/10000 [18:31:05<17:34:21, 12.84s/it] 51%|█████ | 5074/10000 [18:31:17<17:35:13, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.4690000000000002e-05, 'epoch': 1.91} 51%|█████ | 5074/10000 [18:31:17<17:35:13, 12.85s/it] 51%|█████ | 5075/10000 [18:31:30<17:34:00, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.4685e-05, 'epoch': 1.91} 51%|█████ | 5075/10000 [18:31:30<17:34:00, 12.84s/it] 51%|█████ | 5076/10000 [18:31:43<17:33:34, 12.84s/it] {'loss': 0.005, 'learning_rate': 2.468e-05, 'epoch': 1.91} 51%|█████ | 5076/10000 [18:31:43<17:33:34, 12.84s/it] 51%|█████ | 5077/10000 [18:31:56<17:34:44, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.4675e-05, 'epoch': 1.91} 51%|█████ | 5077/10000 [18:31:56<17:34:44, 12.85s/it] 51%|█████ | 5078/10000 [18:32:09<17:35:02, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.4670000000000003e-05, 'epoch': 1.91} 51%|█████ | 5078/10000 [18:32:09<17:35:02, 12.86s/it] 51%|█████ | 5079/10000 [18:32:22<17:36:19, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.4665000000000002e-05, 'epoch': 1.91} 51%|█████ | 5079/10000 [18:32:22<17:36:19, 12.88s/it] 51%|█████ | 
5080/10000 [18:32:35<17:36:01, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.466e-05, 'epoch': 1.91} 51%|█████ | 5080/10000 [18:32:35<17:36:01, 12.88s/it] 51%|█████ | 5081/10000 [18:32:47<17:36:03, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.4655e-05, 'epoch': 1.91} 51%|█████ | 5081/10000 [18:32:48<17:36:03, 12.88s/it] 51%|█████ | 5082/10000 [18:33:00<17:36:00, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.465e-05, 'epoch': 1.91} 51%|█████ | 5082/10000 [18:33:00<17:36:00, 12.88s/it] 51%|█████ | 5083/10000 [18:33:13<17:34:33, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.4645000000000002e-05, 'epoch': 1.92} 51%|█████ | 5083/10000 [18:33:13<17:34:33, 12.87s/it] 51%|█████ | 5084/10000 [18:33:26<17:35:00, 12.88s/it] {'loss': 0.0055, 'learning_rate': 2.464e-05, 'epoch': 1.92} 51%|█████ | 5084/10000 [18:33:26<17:35:00, 12.88s/it] 51%|█████ | 5085/10000 [18:33:39<17:37:25, 12.91s/it] {'loss': 0.0051, 'learning_rate': 2.4635000000000004e-05, 'epoch': 1.92} 51%|█████ | 5085/10000 [18:33:39<17:37:25, 12.91s/it] 51%|█████ | 5086/10000 [18:33:52<17:35:48, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.463e-05, 'epoch': 1.92} 51%|█████ | 5086/10000 [18:33:52<17:35:48, 12.89s/it] 51%|█████ | 5087/10000 [18:34:05<17:34:50, 12.88s/it] {'loss': 0.0062, 'learning_rate': 2.4625000000000002e-05, 'epoch': 1.92} 51%|█████ | 5087/10000 [18:34:05<17:34:50, 12.88s/it] 51%|█████ | 5088/10000 [18:34:18<17:34:34, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.462e-05, 'epoch': 1.92} 51%|█████ | 5088/10000 [18:34:18<17:34:34, 12.88s/it] 51%|█████ | 5089/10000 [18:34:31<17:35:48, 12.90s/it] {'loss': 0.0043, 'learning_rate': 2.4615e-05, 'epoch': 1.92} 51%|█████ | 5089/10000 [18:34:31<17:35:48, 12.90s/it] 51%|█████ | 5090/10000 [18:34:43<17:33:43, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.4610000000000003e-05, 'epoch': 1.92} 51%|█████ | 5090/10000 [18:34:43<17:33:43, 12.88s/it] 51%|█████ | 5091/10000 [18:34:56<17:35:15, 12.90s/it] {'loss': 0.0058, 'learning_rate': 2.4605e-05, 'epoch': 
1.92} 51%|█████ | 5091/10000 [18:34:56<17:35:15, 12.90s/it] 51%|█████ | 5092/10000 [18:35:09<17:36:08, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.46e-05, 'epoch': 1.92} 51%|█████ | 5092/10000 [18:35:09<17:36:08, 12.91s/it] 51%|█████ | 5093/10000 [18:35:22<17:35:23, 12.90s/it] {'loss': 0.0048, 'learning_rate': 2.4595e-05, 'epoch': 1.92} 51%|█████ | 5093/10000 [18:35:22<17:35:23, 12.90s/it] 51%|█████ | 5094/10000 [18:35:35<17:35:11, 12.90s/it] {'loss': 0.0045, 'learning_rate': 2.4590000000000003e-05, 'epoch': 1.92} 51%|█████ | 5094/10000 [18:35:35<17:35:11, 12.90s/it] 51%|█████ | 5095/10000 [18:35:48<17:33:39, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.4585000000000003e-05, 'epoch': 1.92} 51%|█████ | 5095/10000 [18:35:48<17:33:39, 12.89s/it] 51%|█████ | 5096/10000 [18:36:01<17:32:50, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.4580000000000002e-05, 'epoch': 1.92} 51%|█████ | 5096/10000 [18:36:01<17:32:50, 12.88s/it] 51%|█████ | 5097/10000 [18:36:14<17:32:33, 12.88s/it] {'loss': 0.0034, 'learning_rate': 2.4575e-05, 'epoch': 1.92} 51%|█████ | 5097/10000 [18:36:14<17:32:33, 12.88s/it] 51%|█████ | 5098/10000 [18:36:27<17:32:52, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.457e-05, 'epoch': 1.92} 51%|█████ | 5098/10000 [18:36:27<17:32:52, 12.89s/it] 51%|█████ | 5099/10000 [18:36:39<17:31:59, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.4565000000000003e-05, 'epoch': 1.92} 51%|█████ | 5099/10000 [18:36:40<17:31:59, 12.88s/it] 51%|█████ | 5100/10000 [18:36:52<17:31:28, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.4560000000000002e-05, 'epoch': 1.92} 51%|█████ | 5100/10000 [18:36:52<17:31:28, 12.88s/it] 51%|█████ | 5101/10000 [18:37:05<17:31:14, 12.88s/it] {'loss': 0.0068, 'learning_rate': 2.4555e-05, 'epoch': 1.92} 51%|█████ | 5101/10000 [18:37:05<17:31:14, 12.88s/it] 51%|█████ | 5102/10000 [18:37:18<17:30:47, 12.87s/it] {'loss': 0.007, 'learning_rate': 2.455e-05, 'epoch': 1.92} 51%|█████ | 5102/10000 [18:37:18<17:30:47, 12.87s/it] 51%|█████ | 5103/10000 
[18:37:31<17:30:40, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.4545000000000003e-05, 'epoch': 1.92} 51%|█████ | 5103/10000 [18:37:31<17:30:40, 12.87s/it] 51%|█████ | 5104/10000 [18:37:44<17:30:10, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.4540000000000002e-05, 'epoch': 1.92} 51%|█████ | 5104/10000 [18:37:44<17:30:10, 12.87s/it] 51%|█████ | 5105/10000 [18:37:57<17:29:43, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.4535e-05, 'epoch': 1.92} 51%|█████ | 5105/10000 [18:37:57<17:29:43, 12.87s/it] 51%|█████ | 5106/10000 [18:38:10<17:29:06, 12.86s/it] {'loss': 0.0057, 'learning_rate': 2.453e-05, 'epoch': 1.92} 51%|█████ | 5106/10000 [18:38:10<17:29:06, 12.86s/it] 51%|█████ | 5107/10000 [18:38:22<17:29:34, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.4525e-05, 'epoch': 1.92} 51%|█████ | 5107/10000 [18:38:22<17:29:34, 12.87s/it] 51%|█████ | 5108/10000 [18:38:35<17:28:17, 12.86s/it] {'loss': 0.0059, 'learning_rate': 2.4520000000000002e-05, 'epoch': 1.92} 51%|█████ | 5108/10000 [18:38:35<17:28:17, 12.86s/it] 51%|█████ | 5109/10000 [18:38:48<17:28:03, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.4515e-05, 'epoch': 1.93} 51%|█████ | 5109/10000 [18:38:48<17:28:03, 12.86s/it] 51%|█████ | 5110/10000 [18:39:01<17:27:40, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.451e-05, 'epoch': 1.93} 51%|█████ | 5110/10000 [18:39:01<17:27:40, 12.85s/it] 51%|█████ | 5111/10000 [18:39:14<17:28:29, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.4505e-05, 'epoch': 1.93} 51%|█████ | 5111/10000 [18:39:14<17:28:29, 12.87s/it] 51%|█████ | 5112/10000 [18:39:27<17:28:55, 12.88s/it] {'loss': 0.0054, 'learning_rate': 2.45e-05, 'epoch': 1.93} 51%|█████ | 5112/10000 [18:39:27<17:28:55, 12.88s/it] 51%|█████ | 5113/10000 [18:39:40<17:28:49, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.4495000000000002e-05, 'epoch': 1.93} 51%|█████ | 5113/10000 [18:39:40<17:28:49, 12.88s/it] 51%|█████ | 5114/10000 [18:39:53<17:28:38, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.449e-05, 'epoch': 1.93} 
51%|█████ | 5114/10000 [18:39:53<17:28:38, 12.88s/it] 51%|█████ | 5115/10000 [18:40:05<17:28:36, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.4485000000000004e-05, 'epoch': 1.93} 51%|█████ | 5115/10000 [18:40:05<17:28:36, 12.88s/it] 51%|█████ | 5116/10000 [18:40:18<17:26:39, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.448e-05, 'epoch': 1.93} 51%|█████ | 5116/10000 [18:40:18<17:26:39, 12.86s/it] 51%|█████ | 5117/10000 [18:40:31<17:26:50, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.4475000000000002e-05, 'epoch': 1.93} 51%|█████ | 5117/10000 [18:40:31<17:26:50, 12.86s/it] 51%|█████ | 5118/10000 [18:40:44<17:27:48, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.447e-05, 'epoch': 1.93} 51%|█████ | 5118/10000 [18:40:44<17:27:48, 12.88s/it] 51%|█████ | 5119/10000 [18:40:57<17:27:27, 12.88s/it] {'loss': 0.0043, 'learning_rate': 2.4465e-05, 'epoch': 1.93} 51%|█████ | 5119/10000 [18:40:57<17:27:27, 12.88s/it] 51%|█████ | 5120/10000 [18:41:10<17:27:27, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.4460000000000003e-05, 'epoch': 1.93} 51%|█████ | 5120/10000 [18:41:10<17:27:27, 12.88s/it] 51%|█████ | 5121/10000 [18:41:23<17:27:42, 12.88s/it] {'loss': 0.0055, 'learning_rate': 2.4455e-05, 'epoch': 1.93} 51%|█████ | 5121/10000 [18:41:23<17:27:42, 12.88s/it] 51%|█████ | 5122/10000 [18:41:36<17:27:28, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.445e-05, 'epoch': 1.93} 51%|█████ | 5122/10000 [18:41:36<17:27:28, 12.88s/it] 51%|█████ | 5123/10000 [18:41:48<17:26:09, 12.87s/it] {'loss': 0.0071, 'learning_rate': 2.4445e-05, 'epoch': 1.93} 51%|█████ | 5123/10000 [18:41:48<17:26:09, 12.87s/it] 51%|█████ | 5124/10000 [18:42:01<17:25:10, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.4440000000000003e-05, 'epoch': 1.93} 51%|█████ | 5124/10000 [18:42:01<17:25:10, 12.86s/it] 51%|█████▏ | 5125/10000 [18:42:14<17:25:31, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.4435000000000002e-05, 'epoch': 1.93} 51%|█████▏ | 5125/10000 [18:42:14<17:25:31, 12.87s/it] 51%|█████▏ | 5126/10000 
[18:42:27<17:25:08, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.443e-05, 'epoch': 1.93} 51%|█████▏ | 5126/10000 [18:42:27<17:25:08, 12.87s/it] 51%|█████▏ | 5127/10000 [18:42:40<17:24:23, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.4425e-05, 'epoch': 1.93} 51%|█████▏ | 5127/10000 [18:42:40<17:24:23, 12.86s/it] 51%|█████▏ | 5128/10000 [18:42:53<17:26:48, 12.89s/it] {'loss': 0.0051, 'learning_rate': 2.442e-05, 'epoch': 1.93} 51%|█████▏ | 5128/10000 [18:42:53<17:26:48, 12.89s/it] 51%|█████▏ | 5129/10000 [18:43:06<17:28:44, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.4415000000000003e-05, 'epoch': 1.93} 51%|█████▏ | 5129/10000 [18:43:06<17:28:44, 12.92s/it] 51%|█████▏ | 5130/10000 [18:43:19<17:27:52, 12.91s/it] {'loss': 0.0041, 'learning_rate': 2.4410000000000002e-05, 'epoch': 1.93} 51%|█████▏ | 5130/10000 [18:43:19<17:27:52, 12.91s/it] 51%|█████▏ | 5131/10000 [18:43:32<17:26:40, 12.90s/it] {'loss': 0.0054, 'learning_rate': 2.4405e-05, 'epoch': 1.93} 51%|█████▏ | 5131/10000 [18:43:32<17:26:40, 12.90s/it] 51%|█████▏ | 5132/10000 [18:43:44<17:28:19, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.44e-05, 'epoch': 1.93} 51%|█████▏ | 5132/10000 [18:43:45<17:28:19, 12.92s/it] 51%|█████▏ | 5133/10000 [18:43:57<17:27:18, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.4395000000000003e-05, 'epoch': 1.93} 51%|█████▏ | 5133/10000 [18:43:57<17:27:18, 12.91s/it] 51%|█████▏ | 5134/10000 [18:44:10<17:26:33, 12.90s/it] {'loss': 0.0055, 'learning_rate': 2.4390000000000002e-05, 'epoch': 1.93} 51%|█████▏ | 5134/10000 [18:44:10<17:26:33, 12.90s/it] 51%|█████▏ | 5135/10000 [18:44:23<17:27:39, 12.92s/it] {'loss': 0.0049, 'learning_rate': 2.4385e-05, 'epoch': 1.93} 51%|█████▏ | 5135/10000 [18:44:23<17:27:39, 12.92s/it] 51%|█████▏ | 5136/10000 [18:44:36<17:27:38, 12.92s/it] {'loss': 0.0056, 'learning_rate': 2.438e-05, 'epoch': 1.94} 51%|█████▏ | 5136/10000 [18:44:36<17:27:38, 12.92s/it] 51%|█████▏ | 5137/10000 [18:44:49<17:26:19, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.4375e-05, 
'epoch': 1.94} 51%|█████▏ | 5137/10000 [18:44:49<17:26:19, 12.91s/it] 51%|█████▏ | 5138/10000 [18:45:02<17:25:07, 12.90s/it] {'loss': 0.0042, 'learning_rate': 2.4370000000000002e-05, 'epoch': 1.94} 51%|█████▏ | 5138/10000 [18:45:02<17:25:07, 12.90s/it] 51%|█████▏ | 5139/10000 [18:45:15<17:26:35, 12.92s/it] {'loss': 0.0042, 'learning_rate': 2.4365e-05, 'epoch': 1.94} 51%|█████▏ | 5139/10000 [18:45:15<17:26:35, 12.92s/it] 51%|█████▏ | 5140/10000 [18:45:28<17:25:34, 12.91s/it] {'loss': 0.0051, 'learning_rate': 2.4360000000000004e-05, 'epoch': 1.94} 51%|█████▏ | 5140/10000 [18:45:28<17:25:34, 12.91s/it] 51%|█████▏ | 5141/10000 [18:45:41<17:25:15, 12.91s/it] {'loss': 0.0041, 'learning_rate': 2.4355e-05, 'epoch': 1.94} 51%|█████▏ | 5141/10000 [18:45:41<17:25:15, 12.91s/it] 51%|█████▏ | 5142/10000 [18:45:54<17:26:13, 12.92s/it] {'loss': 0.0061, 'learning_rate': 2.435e-05, 'epoch': 1.94} 51%|█████▏ | 5142/10000 [18:45:54<17:26:13, 12.92s/it] 51%|█████▏ | 5143/10000 [18:46:07<17:27:03, 12.93s/it] {'loss': 0.0059, 'learning_rate': 2.4345e-05, 'epoch': 1.94} 51%|█████▏ | 5143/10000 [18:46:07<17:27:03, 12.93s/it] 51%|█████▏ | 5144/10000 [18:46:19<17:26:22, 12.93s/it] {'loss': 0.005, 'learning_rate': 2.434e-05, 'epoch': 1.94} 51%|█████▏ | 5144/10000 [18:46:20<17:26:22, 12.93s/it] 51%|█████▏ | 5145/10000 [18:46:32<17:27:35, 12.95s/it] {'loss': 0.0054, 'learning_rate': 2.4335000000000003e-05, 'epoch': 1.94} 51%|█████▏ | 5145/10000 [18:46:33<17:27:35, 12.95s/it] 51%|█████▏ | 5146/10000 [18:46:45<17:25:31, 12.92s/it] {'loss': 0.0065, 'learning_rate': 2.433e-05, 'epoch': 1.94} 51%|█████▏ | 5146/10000 [18:46:45<17:25:31, 12.92s/it] 51%|█████▏ | 5147/10000 [18:46:58<17:25:41, 12.93s/it] {'loss': 0.0057, 'learning_rate': 2.4325000000000002e-05, 'epoch': 1.94} 51%|█████▏ | 5147/10000 [18:46:58<17:25:41, 12.93s/it] 51%|█████▏ | 5148/10000 [18:47:11<17:22:42, 12.89s/it] {'loss': 0.0058, 'learning_rate': 2.432e-05, 'epoch': 1.94} 51%|█████▏ | 5148/10000 [18:47:11<17:22:42, 12.89s/it] 
51%|█████▏ | 5149/10000 [18:47:24<17:22:32, 12.89s/it] {'loss': 0.0073, 'learning_rate': 2.4315e-05, 'epoch': 1.94} 51%|█████▏ | 5149/10000 [18:47:24<17:22:32, 12.89s/it] 52%|█████▏ | 5150/10000 [18:47:37<17:22:56, 12.90s/it] {'loss': 0.0056, 'learning_rate': 2.4310000000000003e-05, 'epoch': 1.94} 52%|█████▏ | 5150/10000 [18:47:37<17:22:56, 12.90s/it] 52%|█████▏ | 5151/10000 [18:47:50<17:22:34, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.4305e-05, 'epoch': 1.94} 52%|█████▏ | 5151/10000 [18:47:50<17:22:34, 12.90s/it] 52%|█████▏ | 5152/10000 [18:48:03<17:21:49, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.43e-05, 'epoch': 1.94} 52%|█████▏ | 5152/10000 [18:48:03<17:21:49, 12.89s/it] 52%|█████▏ | 5153/10000 [18:48:16<17:22:31, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.4295e-05, 'epoch': 1.94} 52%|█████▏ | 5153/10000 [18:48:16<17:22:31, 12.91s/it] 52%|█████▏ | 5154/10000 [18:48:28<17:19:59, 12.88s/it] {'loss': 0.0041, 'learning_rate': 2.4290000000000003e-05, 'epoch': 1.94} 52%|█████▏ | 5154/10000 [18:48:28<17:19:59, 12.88s/it] 52%|█████▏ | 5155/10000 [18:48:41<17:20:55, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.4285000000000002e-05, 'epoch': 1.94} 52%|█████▏ | 5155/10000 [18:48:41<17:20:55, 12.89s/it] 52%|█████▏ | 5156/10000 [18:48:54<17:20:16, 12.89s/it] {'loss': 0.0036, 'learning_rate': 2.428e-05, 'epoch': 1.94} 52%|█████▏ | 5156/10000 [18:48:54<17:20:16, 12.89s/it] 52%|█████▏ | 5157/10000 [18:49:07<17:18:59, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.4275e-05, 'epoch': 1.94} 52%|█████▏ | 5157/10000 [18:49:07<17:18:59, 12.87s/it] 52%|█████▏ | 5158/10000 [18:49:20<17:19:10, 12.88s/it] {'loss': 0.0064, 'learning_rate': 2.427e-05, 'epoch': 1.94} 52%|█████▏ | 5158/10000 [18:49:20<17:19:10, 12.88s/it] 52%|█████▏ | 5159/10000 [18:49:33<17:18:10, 12.87s/it] {'loss': 0.0065, 'learning_rate': 2.4265000000000002e-05, 'epoch': 1.94} 52%|█████▏ | 5159/10000 [18:49:33<17:18:10, 12.87s/it] 52%|█████▏ | 5160/10000 [18:49:46<17:17:11, 12.86s/it] {'loss': 0.006, 
'learning_rate': 2.426e-05, 'epoch': 1.94} 52%|█████▏ | 5160/10000 [18:49:46<17:17:11, 12.86s/it] 52%|█████▏ | 5161/10000 [18:49:59<17:18:41, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.4255e-05, 'epoch': 1.94} 52%|█████▏ | 5161/10000 [18:49:59<17:18:41, 12.88s/it] 52%|█████▏ | 5162/10000 [18:50:11<17:19:17, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.425e-05, 'epoch': 1.94} 52%|█████▏ | 5162/10000 [18:50:12<17:19:17, 12.89s/it] 52%|█████▏ | 5163/10000 [18:50:24<17:19:42, 12.90s/it] {'loss': 0.0047, 'learning_rate': 2.4245000000000002e-05, 'epoch': 1.95} 52%|█████▏ | 5163/10000 [18:50:24<17:19:42, 12.90s/it] 52%|█████▏ | 5164/10000 [18:50:37<17:19:38, 12.90s/it] {'loss': 0.0054, 'learning_rate': 2.4240000000000002e-05, 'epoch': 1.95} 52%|█████▏ | 5164/10000 [18:50:37<17:19:38, 12.90s/it] 52%|█████▏ | 5165/10000 [18:50:50<17:18:15, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.4235e-05, 'epoch': 1.95} 52%|█████▏ | 5165/10000 [18:50:50<17:18:15, 12.88s/it] 52%|█████▏ | 5166/10000 [18:51:03<17:17:35, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.423e-05, 'epoch': 1.95} 52%|█████▏ | 5166/10000 [18:51:03<17:17:35, 12.88s/it] 52%|█████▏ | 5167/10000 [18:51:16<17:16:33, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.4225e-05, 'epoch': 1.95} 52%|█████▏ | 5167/10000 [18:51:16<17:16:33, 12.87s/it] 52%|█████▏ | 5168/10000 [18:51:29<17:16:36, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.4220000000000002e-05, 'epoch': 1.95} 52%|█████▏ | 5168/10000 [18:51:29<17:16:36, 12.87s/it] 52%|█████▏ | 5169/10000 [18:51:42<17:16:00, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.4215e-05, 'epoch': 1.95} 52%|█████▏ | 5169/10000 [18:51:42<17:16:00, 12.87s/it] 52%|█████▏ | 5170/10000 [18:51:54<17:14:07, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.4210000000000004e-05, 'epoch': 1.95} 52%|█████▏ | 5170/10000 [18:51:54<17:14:07, 12.85s/it] 52%|█████▏ | 5171/10000 [18:52:07<17:13:53, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.4205e-05, 'epoch': 1.95} 52%|█████▏ | 5171/10000 
[18:52:07<17:13:53, 12.85s/it] 52%|█████▏ | 5172/10000 [18:52:20<17:12:22, 12.83s/it] {'loss': 0.0053, 'learning_rate': 2.4200000000000002e-05, 'epoch': 1.95} 52%|█████▏ | 5172/10000 [18:52:20<17:12:22, 12.83s/it] 52%|█████▏ | 5173/10000 [18:52:33<17:13:54, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.4195e-05, 'epoch': 1.95} 52%|█████▏ | 5173/10000 [18:52:33<17:13:54, 12.85s/it] 52%|█████▏ | 5174/10000 [18:52:46<17:13:19, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.419e-05, 'epoch': 1.95} 52%|█████▏ | 5174/10000 [18:52:46<17:13:19, 12.85s/it] 52%|█████▏ | 5175/10000 [18:52:59<17:13:48, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.4185000000000003e-05, 'epoch': 1.95} 52%|█████▏ | 5175/10000 [18:52:59<17:13:48, 12.86s/it] 52%|█████▏ | 5176/10000 [18:53:11<17:13:18, 12.85s/it] {'loss': 0.0056, 'learning_rate': 2.418e-05, 'epoch': 1.95} 52%|█████▏ | 5176/10000 [18:53:12<17:13:18, 12.85s/it] 52%|█████▏ | 5177/10000 [18:53:24<17:14:40, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.4175e-05, 'epoch': 1.95} 52%|█████▏ | 5177/10000 [18:53:24<17:14:40, 12.87s/it] 52%|█████▏ | 5178/10000 [18:53:37<17:15:00, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.417e-05, 'epoch': 1.95} 52%|█████▏ | 5178/10000 [18:53:37<17:15:00, 12.88s/it] 52%|█████▏ | 5179/10000 [18:53:50<17:13:40, 12.86s/it] {'loss': 0.0056, 'learning_rate': 2.4165e-05, 'epoch': 1.95} 52%|█████▏ | 5179/10000 [18:53:50<17:13:40, 12.86s/it] 52%|█████▏ | 5180/10000 [18:54:03<17:13:33, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.4160000000000002e-05, 'epoch': 1.95} 52%|█████▏ | 5180/10000 [18:54:03<17:13:33, 12.87s/it] 52%|█████▏ | 5181/10000 [18:54:16<17:15:36, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.4154999999999998e-05, 'epoch': 1.95} 52%|█████▏ | 5181/10000 [18:54:16<17:15:36, 12.89s/it] 52%|█████▏ | 5182/10000 [18:54:29<17:16:13, 12.90s/it] {'loss': 0.0032, 'learning_rate': 2.415e-05, 'epoch': 1.95} 52%|█████▏ | 5182/10000 [18:54:29<17:16:13, 12.90s/it] 52%|█████▏ | 5183/10000 [18:54:42<17:14:55, 
12.89s/it] {'loss': 0.0056, 'learning_rate': 2.4145e-05, 'epoch': 1.95} 52%|█████▏ | 5183/10000 [18:54:42<17:14:55, 12.89s/it] 52%|█████▏ | 5184/10000 [18:54:55<17:17:14, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.4140000000000003e-05, 'epoch': 1.95} 52%|█████▏ | 5184/10000 [18:54:55<17:17:14, 12.92s/it] 52%|█████▏ | 5185/10000 [18:55:08<17:16:57, 12.92s/it] {'loss': 0.0052, 'learning_rate': 2.4135000000000002e-05, 'epoch': 1.95} 52%|█████▏ | 5185/10000 [18:55:08<17:16:57, 12.92s/it] 52%|█████▏ | 5186/10000 [18:55:21<17:15:27, 12.91s/it] {'loss': 0.0056, 'learning_rate': 2.413e-05, 'epoch': 1.95} 52%|█████▏ | 5186/10000 [18:55:21<17:15:27, 12.91s/it] 52%|█████▏ | 5187/10000 [18:55:33<17:16:11, 12.92s/it] {'loss': 0.0047, 'learning_rate': 2.4125e-05, 'epoch': 1.95} 52%|█████▏ | 5187/10000 [18:55:34<17:16:11, 12.92s/it] 52%|█████▏ | 5188/10000 [18:55:46<17:13:54, 12.89s/it] {'loss': 0.006, 'learning_rate': 2.412e-05, 'epoch': 1.95} 52%|█████▏ | 5188/10000 [18:55:46<17:13:54, 12.89s/it] 52%|█████▏ | 5189/10000 [18:55:59<17:12:48, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.4115000000000002e-05, 'epoch': 1.96} 52%|█████▏ | 5189/10000 [18:55:59<17:12:48, 12.88s/it] 52%|█████▏ | 5190/10000 [18:56:12<17:12:35, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.411e-05, 'epoch': 1.96} 52%|█████▏ | 5190/10000 [18:56:12<17:12:35, 12.88s/it] 52%|█████▏ | 5191/10000 [18:56:25<17:10:37, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.4105e-05, 'epoch': 1.96} 52%|█████▏ | 5191/10000 [18:56:25<17:10:37, 12.86s/it] 52%|█████▏ | 5192/10000 [18:56:38<17:11:00, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.41e-05, 'epoch': 1.96} 52%|█████▏ | 5192/10000 [18:56:38<17:11:00, 12.87s/it] 52%|█████▏ | 5193/10000 [18:56:51<17:11:09, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.4095000000000002e-05, 'epoch': 1.96} 52%|█████▏ | 5193/10000 [18:56:51<17:11:09, 12.87s/it] 52%|█████▏ | 5194/10000 [18:57:04<17:12:01, 12.88s/it] {'loss': 0.0043, 'learning_rate': 2.409e-05, 'epoch': 1.96} 
52%|█████▏ | 5194/10000 [18:57:04<17:12:01, 12.88s/it] 52%|█████▏ | 5195/10000 [18:57:16<17:11:08, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.4085e-05, 'epoch': 1.96} 52%|█████▏ | 5195/10000 [18:57:16<17:11:08, 12.88s/it] 52%|█████▏ | 5196/10000 [18:57:29<17:09:11, 12.85s/it] {'loss': 0.0038, 'learning_rate': 2.408e-05, 'epoch': 1.96} 52%|█████▏ | 5196/10000 [18:57:29<17:09:11, 12.85s/it] 52%|█████▏ | 5197/10000 [18:57:42<17:10:58, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.4075e-05, 'epoch': 1.96} 52%|█████▏ | 5197/10000 [18:57:42<17:10:58, 12.88s/it] 52%|█████▏ | 5198/10000 [18:57:55<17:12:40, 12.90s/it] {'loss': 0.0053, 'learning_rate': 2.407e-05, 'epoch': 1.96} 52%|█████▏ | 5198/10000 [18:57:55<17:12:40, 12.90s/it] 52%|█████▏ | 5199/10000 [18:58:08<17:10:39, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.4065e-05, 'epoch': 1.96} 52%|█████▏ | 5199/10000 [18:58:08<17:10:39, 12.88s/it] 52%|█████▏ | 5200/10000 [18:58:21<17:10:44, 12.88s/it] {'loss': 0.0059, 'learning_rate': 2.4060000000000003e-05, 'epoch': 1.96} 52%|█████▏ | 5200/10000 [18:58:21<17:10:44, 12.88s/it] 52%|█████▏ | 5201/10000 [18:58:34<17:10:40, 12.89s/it] {'loss': 0.0063, 'learning_rate': 2.4055000000000003e-05, 'epoch': 1.96} 52%|█████▏ | 5201/10000 [18:58:34<17:10:40, 12.89s/it] 52%|█████▏ | 5202/10000 [18:58:47<17:12:42, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.4050000000000002e-05, 'epoch': 1.96} 52%|█████▏ | 5202/10000 [18:58:47<17:12:42, 12.91s/it] 52%|█████▏ | 5203/10000 [18:59:00<17:13:09, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.4045e-05, 'epoch': 1.96} 52%|█████▏ | 5203/10000 [18:59:00<17:13:09, 12.92s/it] 52%|█████▏ | 5204/10000 [18:59:13<17:13:17, 12.93s/it] {'loss': 0.0042, 'learning_rate': 2.404e-05, 'epoch': 1.96} 52%|█████▏ | 5204/10000 [18:59:13<17:13:17, 12.93s/it] 52%|█████▏ | 5205/10000 [18:59:25<17:10:32, 12.90s/it] {'loss': 0.0064, 'learning_rate': 2.4035000000000003e-05, 'epoch': 1.96} 52%|█████▏ | 5205/10000 [18:59:25<17:10:32, 12.90s/it] 52%|█████▏ | 
5206/10000 [18:59:38<17:08:31, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.4030000000000002e-05, 'epoch': 1.96} 52%|█████▏ | 5206/10000 [18:59:38<17:08:31, 12.87s/it] 52%|█████▏ | 5207/10000 [18:59:51<17:09:27, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.4025e-05, 'epoch': 1.96} 52%|█████▏ | 5207/10000 [18:59:51<17:09:27, 12.89s/it] 52%|█████▏ | 5208/10000 [19:00:04<17:09:38, 12.89s/it] {'loss': 0.0046, 'learning_rate': 2.402e-05, 'epoch': 1.96} 52%|█████▏ | 5208/10000 [19:00:04<17:09:38, 12.89s/it] 52%|█████▏ | 5209/10000 [19:00:17<17:08:54, 12.89s/it] {'loss': 0.0056, 'learning_rate': 2.4015000000000003e-05, 'epoch': 1.96} 52%|█████▏ | 5209/10000 [19:00:17<17:08:54, 12.89s/it] 52%|█████▏ | 5210/10000 [19:00:30<17:06:54, 12.86s/it] {'loss': 0.0058, 'learning_rate': 2.4010000000000002e-05, 'epoch': 1.96} 52%|█████▏ | 5210/10000 [19:00:30<17:06:54, 12.86s/it] 52%|█████▏ | 5211/10000 [19:00:43<17:06:05, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.4005e-05, 'epoch': 1.96} 52%|█████▏ | 5211/10000 [19:00:43<17:06:05, 12.86s/it] 52%|█████▏ | 5212/10000 [19:00:55<17:05:23, 12.85s/it] {'loss': 0.0079, 'learning_rate': 2.4e-05, 'epoch': 1.96} 52%|█████▏ | 5212/10000 [19:00:55<17:05:23, 12.85s/it] 52%|█████▏ | 5213/10000 [19:01:08<17:06:50, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.3995e-05, 'epoch': 1.96} 52%|█████▏ | 5213/10000 [19:01:08<17:06:50, 12.87s/it] 52%|█████▏ | 5214/10000 [19:01:21<17:09:29, 12.91s/it] {'loss': 0.0042, 'learning_rate': 2.3990000000000002e-05, 'epoch': 1.96} 52%|█████▏ | 5214/10000 [19:01:21<17:09:29, 12.91s/it] 52%|█████▏ | 5215/10000 [19:01:34<17:08:29, 12.90s/it] {'loss': 0.0051, 'learning_rate': 2.3985e-05, 'epoch': 1.96} 52%|█████▏ | 5215/10000 [19:01:34<17:08:29, 12.90s/it] 52%|█████▏ | 5216/10000 [19:01:47<17:07:20, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.398e-05, 'epoch': 1.97} 52%|█████▏ | 5216/10000 [19:01:47<17:07:20, 12.88s/it] 52%|█████▏ | 5217/10000 [19:02:00<17:07:08, 12.88s/it] {'loss': 0.0055, 'learning_rate': 
2.3975e-05, 'epoch': 1.97} 52%|█████▏ | 5217/10000 [19:02:00<17:07:08, 12.88s/it] 52%|█████▏ | 5218/10000 [19:02:13<17:07:40, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.397e-05, 'epoch': 1.97} 52%|█████▏ | 5218/10000 [19:02:13<17:07:40, 12.89s/it] 52%|█████▏ | 5219/10000 [19:02:26<17:06:55, 12.89s/it] {'loss': 0.006, 'learning_rate': 2.3965000000000002e-05, 'epoch': 1.97} 52%|█████▏ | 5219/10000 [19:02:26<17:06:55, 12.89s/it] 52%|█████▏ | 5220/10000 [19:02:39<17:06:30, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.396e-05, 'epoch': 1.97} 52%|█████▏ | 5220/10000 [19:02:39<17:06:30, 12.89s/it] 52%|█████▏ | 5221/10000 [19:02:51<17:05:21, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.3955000000000004e-05, 'epoch': 1.97} 52%|█████▏ | 5221/10000 [19:02:51<17:05:21, 12.87s/it] 52%|█████▏ | 5222/10000 [19:03:04<17:05:21, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.395e-05, 'epoch': 1.97} 52%|█████▏ | 5222/10000 [19:03:04<17:05:21, 12.88s/it] 52%|█████▏ | 5223/10000 [19:03:17<17:04:58, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.3945000000000002e-05, 'epoch': 1.97} 52%|█████▏ | 5223/10000 [19:03:17<17:04:58, 12.87s/it] 52%|█████▏ | 5224/10000 [19:03:30<17:04:40, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.394e-05, 'epoch': 1.97} 52%|█████▏ | 5224/10000 [19:03:30<17:04:40, 12.87s/it] 52%|█████▏ | 5225/10000 [19:03:43<17:06:25, 12.90s/it] {'loss': 0.0047, 'learning_rate': 2.3935e-05, 'epoch': 1.97} 52%|█████▏ | 5225/10000 [19:03:43<17:06:25, 12.90s/it] 52%|█████▏ | 5226/10000 [19:03:56<17:03:33, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.3930000000000003e-05, 'epoch': 1.97} 52%|█████▏ | 5226/10000 [19:03:56<17:03:33, 12.86s/it] 52%|█████▏ | 5227/10000 [19:04:09<17:02:31, 12.85s/it] {'loss': 0.0065, 'learning_rate': 2.3925e-05, 'epoch': 1.97} 52%|█████▏ | 5227/10000 [19:04:09<17:02:31, 12.85s/it] 52%|█████▏ | 5228/10000 [19:04:21<17:02:57, 12.86s/it] {'loss': 0.0061, 'learning_rate': 2.392e-05, 'epoch': 1.97} 52%|█████▏ | 5228/10000 [19:04:22<17:02:57, 
12.86s/it] 52%|█████▏ | 5229/10000 [19:04:34<17:02:12, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.3915e-05, 'epoch': 1.97} 52%|█████▏ | 5229/10000 [19:04:34<17:02:12, 12.86s/it] 52%|█████▏ | 5230/10000 [19:04:47<17:03:51, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.3910000000000003e-05, 'epoch': 1.97} 52%|█████▏ | 5230/10000 [19:04:47<17:03:51, 12.88s/it] 52%|█████▏ | 5231/10000 [19:05:00<17:05:11, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.3905000000000002e-05, 'epoch': 1.97} 52%|█████▏ | 5231/10000 [19:05:00<17:05:11, 12.90s/it] 52%|█████▏ | 5232/10000 [19:05:13<17:02:44, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.39e-05, 'epoch': 1.97} 52%|█████▏ | 5232/10000 [19:05:13<17:02:44, 12.87s/it] 52%|█████▏ | 5233/10000 [19:05:26<17:02:40, 12.87s/it] {'loss': 0.0037, 'learning_rate': 2.3895e-05, 'epoch': 1.97} 52%|█████▏ | 5233/10000 [19:05:26<17:02:40, 12.87s/it] 52%|█████▏ | 5234/10000 [19:05:39<17:02:34, 12.87s/it] {'loss': 0.0055, 'learning_rate': 2.389e-05, 'epoch': 1.97} 52%|█████▏ | 5234/10000 [19:05:39<17:02:34, 12.87s/it] 52%|█████▏ | 5235/10000 [19:05:52<17:02:46, 12.88s/it] {'loss': 0.0057, 'learning_rate': 2.3885000000000003e-05, 'epoch': 1.97} 52%|█████▏ | 5235/10000 [19:05:52<17:02:46, 12.88s/it] 52%|█████▏ | 5236/10000 [19:06:05<17:03:28, 12.89s/it] {'loss': 0.0046, 'learning_rate': 2.3880000000000002e-05, 'epoch': 1.97} 52%|█████▏ | 5236/10000 [19:06:05<17:03:28, 12.89s/it] 52%|█████▏ | 5237/10000 [19:06:17<17:02:20, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.3875e-05, 'epoch': 1.97} 52%|█████▏ | 5237/10000 [19:06:17<17:02:20, 12.88s/it] 52%|█████▏ | 5238/10000 [19:06:30<17:02:54, 12.89s/it] {'loss': 0.0042, 'learning_rate': 2.387e-05, 'epoch': 1.97} 52%|█████▏ | 5238/10000 [19:06:30<17:02:54, 12.89s/it] 52%|█████▏ | 5239/10000 [19:06:43<17:02:40, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.3865000000000003e-05, 'epoch': 1.97} 52%|█████▏ | 5239/10000 [19:06:43<17:02:40, 12.89s/it] 52%|█████▏ | 5240/10000 [19:06:56<17:01:36, 
12.88s/it] {'loss': 0.0063, 'learning_rate': 2.3860000000000002e-05, 'epoch': 1.97} 52%|█████▏ | 5240/10000 [19:06:56<17:01:36, 12.88s/it] 52%|█████▏ | 5241/10000 [19:07:09<17:03:30, 12.90s/it] {'loss': 0.0038, 'learning_rate': 2.3855e-05, 'epoch': 1.97} 52%|█████▏ | 5241/10000 [19:07:09<17:03:30, 12.90s/it] 52%|█████▏ | 5242/10000 [19:07:22<17:03:31, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.385e-05, 'epoch': 1.98} 52%|█████▏ | 5242/10000 [19:07:22<17:03:31, 12.91s/it] 52%|█████▏ | 5243/10000 [19:07:35<17:01:25, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.3845e-05, 'epoch': 1.98} 52%|█████▏ | 5243/10000 [19:07:35<17:01:25, 12.88s/it] 52%|█████▏ | 5244/10000 [19:07:48<17:01:54, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.3840000000000002e-05, 'epoch': 1.98} 52%|█████▏ | 5244/10000 [19:07:48<17:01:54, 12.89s/it] 52%|█████▏ | 5245/10000 [19:08:01<17:02:52, 12.91s/it] {'loss': 0.0045, 'learning_rate': 2.3835e-05, 'epoch': 1.98} 52%|█████▏ | 5245/10000 [19:08:01<17:02:52, 12.91s/it] 52%|█████▏ | 5246/10000 [19:08:14<17:02:08, 12.90s/it] {'loss': 0.0057, 'learning_rate': 2.3830000000000004e-05, 'epoch': 1.98} 52%|█████▏ | 5246/10000 [19:08:14<17:02:08, 12.90s/it] 52%|█████▏ | 5247/10000 [19:08:26<17:01:39, 12.90s/it] {'loss': 0.0062, 'learning_rate': 2.3825e-05, 'epoch': 1.98} 52%|█████▏ | 5247/10000 [19:08:26<17:01:39, 12.90s/it] 52%|█████▏ | 5248/10000 [19:08:39<17:00:59, 12.89s/it] {'loss': 0.0055, 'learning_rate': 2.3820000000000002e-05, 'epoch': 1.98} 52%|█████▏ | 5248/10000 [19:08:39<17:00:59, 12.89s/it] 52%|█████▏ | 5249/10000 [19:08:52<17:01:41, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.3815e-05, 'epoch': 1.98} 52%|█████▏ | 5249/10000 [19:08:52<17:01:41, 12.90s/it] 52%|█████▎ | 5250/10000 [19:09:05<16:59:58, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.381e-05, 'epoch': 1.98} 52%|█████▎ | 5250/10000 [19:09:05<16:59:58, 12.88s/it] 53%|█████▎ | 5251/10000 [19:09:18<16:59:28, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.3805000000000003e-05, 
'epoch': 1.98} 53%|█████▎ | 5251/10000 [19:09:18<16:59:28, 12.88s/it] 53%|█████▎ | 5252/10000 [19:09:31<17:00:52, 12.90s/it] {'loss': 0.0048, 'learning_rate': 2.38e-05, 'epoch': 1.98} 53%|█████▎ | 5252/10000 [19:09:31<17:00:52, 12.90s/it] 53%|█████▎ | 5253/10000 [19:09:44<17:00:49, 12.90s/it] {'loss': 0.0037, 'learning_rate': 2.3795000000000002e-05, 'epoch': 1.98} 53%|█████▎ | 5253/10000 [19:09:44<17:00:49, 12.90s/it] 53%|█████▎ | 5254/10000 [19:09:57<16:58:54, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.379e-05, 'epoch': 1.98} 53%|█████▎ | 5254/10000 [19:09:57<16:58:54, 12.88s/it] 53%|█████▎ | 5255/10000 [19:10:09<16:58:00, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.3785e-05, 'epoch': 1.98} 53%|█████▎ | 5255/10000 [19:10:10<16:58:00, 12.87s/it] 53%|█████▎ | 5256/10000 [19:10:22<16:57:36, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.3780000000000003e-05, 'epoch': 1.98} 53%|█████▎ | 5256/10000 [19:10:22<16:57:36, 12.87s/it] 53%|█████▎ | 5257/10000 [19:10:35<16:59:19, 12.89s/it] {'loss': 0.0057, 'learning_rate': 2.3775e-05, 'epoch': 1.98} 53%|█████▎ | 5257/10000 [19:10:35<16:59:19, 12.89s/it] 53%|█████▎ | 5258/10000 [19:10:48<16:55:50, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.377e-05, 'epoch': 1.98} 53%|█████▎ | 5258/10000 [19:10:48<16:55:50, 12.85s/it] 53%|█████▎ | 5259/10000 [19:11:01<16:57:15, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.3765e-05, 'epoch': 1.98} 53%|█████▎ | 5259/10000 [19:11:01<16:57:15, 12.87s/it] 53%|█████▎ | 5260/10000 [19:11:14<16:55:59, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.3760000000000003e-05, 'epoch': 1.98} 53%|█████▎ | 5260/10000 [19:11:14<16:55:59, 12.86s/it] 53%|█████▎ | 5261/10000 [19:11:27<16:57:53, 12.89s/it] {'loss': 0.0065, 'learning_rate': 2.3755000000000002e-05, 'epoch': 1.98} 53%|█████▎ | 5261/10000 [19:11:27<16:57:53, 12.89s/it] 53%|█████▎ | 5262/10000 [19:11:40<16:57:28, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.375e-05, 'epoch': 1.98} 53%|█████▎ | 5262/10000 [19:11:40<16:57:28, 12.88s/it] 
53%|█████▎ | 5263/10000 [19:11:53<16:57:10, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.3745e-05, 'epoch': 1.98} 53%|█████▎ | 5263/10000 [19:11:53<16:57:10, 12.88s/it] 53%|█████▎ | 5264/10000 [19:12:05<16:57:53, 12.90s/it] {'loss': 0.0065, 'learning_rate': 2.374e-05, 'epoch': 1.98} 53%|█████▎ | 5264/10000 [19:12:05<16:57:53, 12.90s/it] 53%|█████▎ | 5265/10000 [19:12:18<16:54:49, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.3735000000000002e-05, 'epoch': 1.98} 53%|█████▎ | 5265/10000 [19:12:18<16:54:49, 12.86s/it] 53%|█████▎ | 5266/10000 [19:12:31<16:54:53, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.373e-05, 'epoch': 1.98} 53%|█████▎ | 5266/10000 [19:12:31<16:54:53, 12.86s/it] 53%|█████▎ | 5267/10000 [19:12:44<16:54:24, 12.86s/it] {'loss': 0.0056, 'learning_rate': 2.3725e-05, 'epoch': 1.98} 53%|█████▎ | 5267/10000 [19:12:44<16:54:24, 12.86s/it] 53%|█████▎ | 5268/10000 [19:12:57<16:53:30, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.372e-05, 'epoch': 1.98} 53%|█████▎ | 5268/10000 [19:12:57<16:53:30, 12.85s/it] 53%|█████▎ | 5269/10000 [19:13:10<16:54:12, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.3715000000000002e-05, 'epoch': 1.99} 53%|█████▎ | 5269/10000 [19:13:10<16:54:12, 12.86s/it] 53%|█████▎ | 5270/10000 [19:13:23<16:55:29, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.371e-05, 'epoch': 1.99} 53%|█████▎ | 5270/10000 [19:13:23<16:55:29, 12.88s/it] 53%|█████▎ | 5271/10000 [19:13:35<16:54:45, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.3705e-05, 'epoch': 1.99} 53%|█████▎ | 5271/10000 [19:13:35<16:54:45, 12.88s/it] 53%|█████▎ | 5272/10000 [19:13:48<16:55:34, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.37e-05, 'epoch': 1.99} 53%|█████▎ | 5272/10000 [19:13:48<16:55:34, 12.89s/it] 53%|█████▎ | 5273/10000 [19:14:01<16:55:10, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.3695e-05, 'epoch': 1.99} 53%|█████▎ | 5273/10000 [19:14:01<16:55:10, 12.89s/it] 53%|█████▎ | 5274/10000 [19:14:14<16:55:00, 12.89s/it] {'loss': 0.005, 'learning_rate': 
2.3690000000000002e-05, 'epoch': 1.99} 53%|█████▎ | 5274/10000 [19:14:14<16:55:00, 12.89s/it] 53%|█████▎ | 5275/10000 [19:14:27<16:53:34, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.3685e-05, 'epoch': 1.99} 53%|█████▎ | 5275/10000 [19:14:27<16:53:34, 12.87s/it] 53%|█████▎ | 5276/10000 [19:14:40<16:52:52, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.3680000000000004e-05, 'epoch': 1.99} 53%|█████▎ | 5276/10000 [19:14:40<16:52:52, 12.86s/it] 53%|█████▎ | 5277/10000 [19:14:53<16:50:25, 12.84s/it] {'loss': 0.0074, 'learning_rate': 2.3675e-05, 'epoch': 1.99} 53%|█████▎ | 5277/10000 [19:14:53<16:50:25, 12.84s/it] 53%|█████▎ | 5278/10000 [19:15:05<16:50:13, 12.84s/it] {'loss': 0.0053, 'learning_rate': 2.3670000000000002e-05, 'epoch': 1.99} 53%|█████▎ | 5278/10000 [19:15:05<16:50:13, 12.84s/it] 53%|█████▎ | 5279/10000 [19:15:18<16:50:57, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.3665e-05, 'epoch': 1.99} 53%|█████▎ | 5279/10000 [19:15:18<16:50:57, 12.85s/it] 53%|█████▎ | 5280/10000 [19:15:31<16:49:01, 12.83s/it] {'loss': 0.0043, 'learning_rate': 2.366e-05, 'epoch': 1.99} 53%|█████▎ | 5280/10000 [19:15:31<16:49:01, 12.83s/it] 53%|█████▎ | 5281/10000 [19:15:44<16:48:38, 12.82s/it] {'loss': 0.0039, 'learning_rate': 2.3655000000000003e-05, 'epoch': 1.99} 53%|█████▎ | 5281/10000 [19:15:44<16:48:38, 12.82s/it] 53%|█████▎ | 5282/10000 [19:15:57<16:49:15, 12.83s/it] {'loss': 0.0044, 'learning_rate': 2.365e-05, 'epoch': 1.99} 53%|█████▎ | 5282/10000 [19:15:57<16:49:15, 12.83s/it] 53%|█████▎ | 5283/10000 [19:16:10<16:48:53, 12.83s/it] {'loss': 0.0061, 'learning_rate': 2.3645e-05, 'epoch': 1.99} 53%|█████▎ | 5283/10000 [19:16:10<16:48:53, 12.83s/it] 53%|█████▎ | 5284/10000 [19:16:22<16:49:35, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.364e-05, 'epoch': 1.99} 53%|█████▎ | 5284/10000 [19:16:22<16:49:35, 12.84s/it] 53%|█████▎ | 5285/10000 [19:16:35<16:50:30, 12.86s/it] {'loss': 0.0063, 'learning_rate': 2.3635000000000003e-05, 'epoch': 1.99} 53%|█████▎ | 5285/10000 
[19:16:35<16:50:30, 12.86s/it] 53%|█████▎ | 5286/10000 [19:16:48<16:51:15, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.3630000000000002e-05, 'epoch': 1.99} 53%|█████▎ | 5286/10000 [19:16:48<16:51:15, 12.87s/it] 53%|█████▎ | 5287/10000 [19:17:01<16:52:22, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.3624999999999998e-05, 'epoch': 1.99} 53%|█████▎ | 5287/10000 [19:17:01<16:52:22, 12.89s/it] 53%|█████▎ | 5288/10000 [19:17:14<16:52:58, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.362e-05, 'epoch': 1.99} 53%|█████▎ | 5288/10000 [19:17:14<16:52:58, 12.90s/it] 53%|█████▎ | 5289/10000 [19:17:27<16:50:17, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.3615e-05, 'epoch': 1.99} 53%|█████▎ | 5289/10000 [19:17:27<16:50:17, 12.87s/it] 53%|█████▎ | 5290/10000 [19:17:40<16:49:19, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.3610000000000003e-05, 'epoch': 1.99} 53%|█████▎ | 5290/10000 [19:17:40<16:49:19, 12.86s/it] 53%|█████▎ | 5291/10000 [19:17:53<16:50:11, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.3605000000000002e-05, 'epoch': 1.99} 53%|█████▎ | 5291/10000 [19:17:53<16:50:11, 12.87s/it] 53%|█████▎ | 5292/10000 [19:18:06<16:53:23, 12.91s/it] {'loss': 0.0054, 'learning_rate': 2.36e-05, 'epoch': 1.99} 53%|█████▎ | 5292/10000 [19:18:06<16:53:23, 12.91s/it] 53%|█████▎ | 5293/10000 [19:18:19<16:54:53, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.3595e-05, 'epoch': 1.99} 53%|█████▎ | 5293/10000 [19:18:19<16:54:53, 12.94s/it] 53%|█████▎ | 5294/10000 [19:18:32<16:53:59, 12.93s/it] {'loss': 0.0042, 'learning_rate': 2.359e-05, 'epoch': 1.99} 53%|█████▎ | 5294/10000 [19:18:32<16:53:59, 12.93s/it] 53%|█████▎ | 5295/10000 [19:18:44<16:50:53, 12.89s/it] {'loss': 0.0059, 'learning_rate': 2.3585000000000002e-05, 'epoch': 2.0} 53%|█████▎ | 5295/10000 [19:18:44<16:50:53, 12.89s/it] 53%|█████▎ | 5296/10000 [19:18:57<16:50:28, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.358e-05, 'epoch': 2.0} 53%|█████▎ | 5296/10000 [19:18:57<16:50:28, 12.89s/it] 53%|█████▎ | 5297/10000 
[19:19:10<16:50:05, 12.89s/it] {'loss': 0.007, 'learning_rate': 2.3575e-05, 'epoch': 2.0} 53%|█████▎ | 5297/10000 [19:19:10<16:50:05, 12.89s/it] 53%|█████▎ | 5298/10000 [19:19:23<16:49:25, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.357e-05, 'epoch': 2.0} 53%|█████▎ | 5298/10000 [19:19:23<16:49:25, 12.88s/it] 53%|█████▎ | 5299/10000 [19:19:36<16:49:39, 12.89s/it] {'loss': 0.0052, 'learning_rate': 2.3565000000000002e-05, 'epoch': 2.0} 53%|█████▎ | 5299/10000 [19:19:36<16:49:39, 12.89s/it] 53%|█████▎ | 5300/10000 [19:19:49<16:48:54, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.356e-05, 'epoch': 2.0} 53%|█████▎ | 5300/10000 [19:19:49<16:48:54, 12.88s/it] 53%|█████▎ | 5301/10000 [19:20:02<16:48:36, 12.88s/it] {'loss': 0.0061, 'learning_rate': 2.3555e-05, 'epoch': 2.0} 53%|█████▎ | 5301/10000 [19:20:02<16:48:36, 12.88s/it] 53%|█████▎ | 5302/10000 [19:20:14<16:47:23, 12.87s/it] {'loss': 0.0063, 'learning_rate': 2.355e-05, 'epoch': 2.0} 53%|█████▎ | 5302/10000 [19:20:14<16:47:23, 12.87s/it] 53%|█████▎ | 5303/10000 [19:20:27<16:46:06, 12.85s/it] {'loss': 0.0051, 'learning_rate': 2.3545e-05, 'epoch': 2.0} 53%|█████▎ | 5303/10000 [19:20:27<16:46:06, 12.85s/it] 53%|█████▎ | 5304/10000 [19:20:40<16:45:19, 12.84s/it] {'loss': 0.005, 'learning_rate': 2.354e-05, 'epoch': 2.0} 53%|█████▎ | 5304/10000 [19:20:40<16:45:19, 12.84s/it] 53%|█████▎ | 5305/10000 [19:20:53<16:45:44, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.3535e-05, 'epoch': 2.0} 53%|█████▎ | 5305/10000 [19:20:53<16:45:44, 12.85s/it] 53%|█████▎ | 5306/10000 [19:21:06<16:45:12, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.3530000000000003e-05, 'epoch': 2.0} 53%|█████▎ | 5306/10000 [19:21:06<16:45:12, 12.85s/it] 53%|█████▎ | 5307/10000 [19:21:19<16:44:56, 12.85s/it] {'loss': 0.0056, 'learning_rate': 2.3525e-05, 'epoch': 2.0} 53%|█████▎ | 5307/10000 [19:21:19<16:44:56, 12.85s/it] 53%|█████▎ | 5308/10000 [19:21:26<14:27:18, 11.09s/it] {'loss': 0.0058, 'learning_rate': 2.3520000000000002e-05, 'epoch': 2.0} 
53%|█████▎ | 5308/10000 [19:21:26<14:27:18, 11.09s/it] 53%|█████▎ | 5309/10000 [19:21:39<15:10:02, 11.64s/it] {'loss': 0.0044, 'learning_rate': 2.3515e-05, 'epoch': 2.0} 53%|█████▎ | 5309/10000 [19:21:39<15:10:02, 11.64s/it] 53%|█████▎ | 5310/10000 [19:21:51<15:37:24, 11.99s/it] {'loss': 0.0054, 'learning_rate': 2.351e-05, 'epoch': 2.0} 53%|█████▎ | 5310/10000 [19:21:51<15:37:24, 11.99s/it] 53%|█████▎ | 5311/10000 [19:22:04<15:56:42, 12.24s/it] {'loss': 0.0042, 'learning_rate': 2.3505000000000003e-05, 'epoch': 2.0} 53%|█████▎ | 5311/10000 [19:22:04<15:56:42, 12.24s/it] 53%|█████▎ | 5312/10000 [19:22:17<16:09:08, 12.40s/it] {'loss': 0.005, 'learning_rate': 2.35e-05, 'epoch': 2.0} 53%|█████▎ | 5312/10000 [19:22:17<16:09:08, 12.40s/it] 53%|█████▎ | 5313/10000 [19:22:30<16:19:26, 12.54s/it] {'loss': 0.005, 'learning_rate': 2.3495e-05, 'epoch': 2.0} 53%|█████▎ | 5313/10000 [19:22:30<16:19:26, 12.54s/it] 53%|█████▎ | 5314/10000 [19:22:43<16:26:28, 12.63s/it] {'loss': 0.0055, 'learning_rate': 2.349e-05, 'epoch': 2.0} 53%|█████▎ | 5314/10000 [19:22:43<16:26:28, 12.63s/it] 53%|█████▎ | 5315/10000 [19:22:56<16:30:49, 12.69s/it] {'loss': 0.0065, 'learning_rate': 2.3485000000000003e-05, 'epoch': 2.0} 53%|█████▎ | 5315/10000 [19:22:56<16:30:49, 12.69s/it] 53%|█████▎ | 5316/10000 [19:23:08<16:34:00, 12.73s/it] {'loss': 0.0051, 'learning_rate': 2.3480000000000002e-05, 'epoch': 2.0} 53%|█████▎ | 5316/10000 [19:23:08<16:34:00, 12.73s/it] 53%|█████▎ | 5317/10000 [19:23:21<16:37:54, 12.79s/it] {'loss': 0.0046, 'learning_rate': 2.3475e-05, 'epoch': 2.0} 53%|█████▎ | 5317/10000 [19:23:21<16:37:54, 12.79s/it] 53%|█████▎ | 5318/10000 [19:23:34<16:40:13, 12.82s/it] {'loss': 0.0042, 'learning_rate': 2.347e-05, 'epoch': 2.0} 53%|█████▎ | 5318/10000 [19:23:34<16:40:13, 12.82s/it] 53%|█████▎ | 5319/10000 [19:23:47<16:40:11, 12.82s/it] {'loss': 0.0055, 'learning_rate': 2.3465e-05, 'epoch': 2.0} 53%|█████▎ | 5319/10000 [19:23:47<16:40:11, 12.82s/it] 53%|█████▎ | 5320/10000 [19:24:00<16:41:57, 
12.85s/it] {'loss': 0.0041, 'learning_rate': 2.3460000000000002e-05, 'epoch': 2.0} 53%|█████▎ | 5320/10000 [19:24:00<16:41:57, 12.85s/it] 53%|█████▎ | 5321/10000 [19:24:13<16:41:29, 12.84s/it] {'loss': 0.0049, 'learning_rate': 2.3455e-05, 'epoch': 2.0} 53%|█████▎ | 5321/10000 [19:24:13<16:41:29, 12.84s/it] 53%|█████▎ | 5322/10000 [19:24:26<16:41:58, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.345e-05, 'epoch': 2.01} 53%|█████▎ | 5322/10000 [19:24:26<16:41:58, 12.85s/it] 53%|█████▎ | 5323/10000 [19:24:38<16:43:10, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.3445e-05, 'epoch': 2.01} 53%|█████▎ | 5323/10000 [19:24:39<16:43:10, 12.87s/it] 53%|█████▎ | 5324/10000 [19:24:51<16:42:13, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.344e-05, 'epoch': 2.01} 53%|█████▎ | 5324/10000 [19:24:51<16:42:13, 12.86s/it] 53%|█████▎ | 5325/10000 [19:25:04<16:43:05, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.3435000000000002e-05, 'epoch': 2.01} 53%|█████▎ | 5325/10000 [19:25:04<16:43:05, 12.87s/it] 53%|█████▎ | 5326/10000 [19:25:17<16:43:23, 12.88s/it] {'loss': 0.0043, 'learning_rate': 2.343e-05, 'epoch': 2.01} 53%|█████▎ | 5326/10000 [19:25:17<16:43:23, 12.88s/it] 53%|█████▎ | 5327/10000 [19:25:30<16:40:36, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.3425000000000004e-05, 'epoch': 2.01} 53%|█████▎ | 5327/10000 [19:25:30<16:40:36, 12.85s/it] 53%|█████▎ | 5328/10000 [19:25:43<16:41:49, 12.87s/it] {'loss': 0.0037, 'learning_rate': 2.342e-05, 'epoch': 2.01} 53%|█████▎ | 5328/10000 [19:25:43<16:41:49, 12.87s/it] 53%|█████▎ | 5329/10000 [19:25:56<16:41:50, 12.87s/it] {'loss': 0.0059, 'learning_rate': 2.3415000000000002e-05, 'epoch': 2.01} 53%|█████▎ | 5329/10000 [19:25:56<16:41:50, 12.87s/it] 53%|█████▎ | 5330/10000 [19:26:09<16:42:33, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.341e-05, 'epoch': 2.01} 53%|█████▎ | 5330/10000 [19:26:09<16:42:33, 12.88s/it] 53%|█████▎ | 5331/10000 [19:26:22<16:42:47, 12.89s/it] {'loss': 0.0044, 'learning_rate': 2.3405e-05, 'epoch': 2.01} 
53%|█████▎ | 5331/10000 [19:26:22<16:42:47, 12.89s/it] 53%|█████▎ | 5332/10000 [19:26:34<16:42:47, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.3400000000000003e-05, 'epoch': 2.01} 53%|█████▎ | 5332/10000 [19:26:34<16:42:47, 12.89s/it] 53%|█████▎ | 5333/10000 [19:26:47<16:40:35, 12.86s/it] {'loss': 0.0063, 'learning_rate': 2.3395e-05, 'epoch': 2.01} 53%|█████▎ | 5333/10000 [19:26:47<16:40:35, 12.86s/it] 53%|█████▎ | 5334/10000 [19:27:00<16:41:19, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.339e-05, 'epoch': 2.01} 53%|█████▎ | 5334/10000 [19:27:00<16:41:19, 12.88s/it] 53%|█████▎ | 5335/10000 [19:27:13<16:43:37, 12.91s/it] {'loss': 0.0042, 'learning_rate': 2.3385e-05, 'epoch': 2.01} 53%|█████▎ | 5335/10000 [19:27:13<16:43:37, 12.91s/it] 53%|█████▎ | 5336/10000 [19:27:26<16:42:27, 12.90s/it] {'loss': 0.0043, 'learning_rate': 2.3380000000000003e-05, 'epoch': 2.01} 53%|█████▎ | 5336/10000 [19:27:26<16:42:27, 12.90s/it] 53%|█████▎ | 5337/10000 [19:27:39<16:43:38, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.3375000000000002e-05, 'epoch': 2.01} 53%|█████▎ | 5337/10000 [19:27:39<16:43:38, 12.91s/it] 53%|█████▎ | 5338/10000 [19:27:52<16:43:31, 12.92s/it] {'loss': 0.005, 'learning_rate': 2.337e-05, 'epoch': 2.01} 53%|█████▎ | 5338/10000 [19:27:52<16:43:31, 12.92s/it] 53%|█████▎ | 5339/10000 [19:28:05<16:43:29, 12.92s/it] {'loss': 0.0039, 'learning_rate': 2.3365e-05, 'epoch': 2.01} 53%|█████▎ | 5339/10000 [19:28:05<16:43:29, 12.92s/it] 53%|█████▎ | 5340/10000 [19:28:18<16:42:38, 12.91s/it] {'loss': 0.0052, 'learning_rate': 2.336e-05, 'epoch': 2.01} 53%|█████▎ | 5340/10000 [19:28:18<16:42:38, 12.91s/it] 53%|█████▎ | 5341/10000 [19:28:31<16:41:29, 12.90s/it] {'loss': 0.0032, 'learning_rate': 2.3355000000000003e-05, 'epoch': 2.01} 53%|█████▎ | 5341/10000 [19:28:31<16:41:29, 12.90s/it] 53%|█████▎ | 5342/10000 [19:28:43<16:41:23, 12.90s/it] {'loss': 0.0043, 'learning_rate': 2.3350000000000002e-05, 'epoch': 2.01} 53%|█████▎ | 5342/10000 [19:28:43<16:41:23, 12.90s/it] 
53%|█████▎ | 5343/10000 [19:28:56<16:40:27, 12.89s/it] {'loss': 0.0046, 'learning_rate': 2.3345e-05, 'epoch': 2.01} 53%|█████▎ | 5343/10000 [19:28:56<16:40:27, 12.89s/it] 53%|█████▎ | 5344/10000 [19:29:09<16:40:21, 12.89s/it] {'loss': 0.0036, 'learning_rate': 2.334e-05, 'epoch': 2.01} 53%|█████▎ | 5344/10000 [19:29:09<16:40:21, 12.89s/it] 53%|█████▎ | 5345/10000 [19:29:22<16:39:18, 12.88s/it] {'loss': 0.004, 'learning_rate': 2.3335000000000003e-05, 'epoch': 2.01} 53%|█████▎ | 5345/10000 [19:29:22<16:39:18, 12.88s/it] 53%|█████▎ | 5346/10000 [19:29:35<16:37:44, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.3330000000000002e-05, 'epoch': 2.01} 53%|█████▎ | 5346/10000 [19:29:35<16:37:44, 12.86s/it] 53%|█████▎ | 5347/10000 [19:29:48<16:37:46, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.3325e-05, 'epoch': 2.01} 53%|█████▎ | 5347/10000 [19:29:48<16:37:46, 12.87s/it] 53%|█████▎ | 5348/10000 [19:30:01<16:37:27, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.332e-05, 'epoch': 2.02} 53%|█████▎ | 5348/10000 [19:30:01<16:37:27, 12.86s/it] 53%|█████▎ | 5349/10000 [19:30:13<16:36:59, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.3315e-05, 'epoch': 2.02} 53%|█████▎ | 5349/10000 [19:30:13<16:36:59, 12.86s/it] 54%|█████▎ | 5350/10000 [19:30:26<16:35:55, 12.85s/it] {'loss': 0.0035, 'learning_rate': 2.3310000000000002e-05, 'epoch': 2.02} 54%|█████▎ | 5350/10000 [19:30:26<16:35:55, 12.85s/it] 54%|█████▎ | 5351/10000 [19:30:39<16:35:56, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.3305e-05, 'epoch': 2.02} 54%|█████▎ | 5351/10000 [19:30:39<16:35:56, 12.85s/it] 54%|█████▎ | 5352/10000 [19:30:52<16:37:15, 12.87s/it] {'loss': 0.0029, 'learning_rate': 2.3300000000000004e-05, 'epoch': 2.02} 54%|█████▎ | 5352/10000 [19:30:52<16:37:15, 12.87s/it] 54%|█████▎ | 5353/10000 [19:31:05<16:36:51, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.3295e-05, 'epoch': 2.02} 54%|█████▎ | 5353/10000 [19:31:05<16:36:51, 12.87s/it] 54%|█████▎ | 5354/10000 [19:31:18<16:37:13, 12.88s/it] {'loss': 0.0048, 
'learning_rate': 2.3290000000000002e-05, 'epoch': 2.02} 54%|█████▎ | 5354/10000 [19:31:18<16:37:13, 12.88s/it] 54%|█████▎ | 5355/10000 [19:31:31<16:38:18, 12.90s/it] {'loss': 0.0056, 'learning_rate': 2.3285e-05, 'epoch': 2.02} 54%|█████▎ | 5355/10000 [19:31:31<16:38:18, 12.90s/it] 54%|█████▎ | 5356/10000 [19:31:44<16:35:17, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.328e-05, 'epoch': 2.02} 54%|█████▎ | 5356/10000 [19:31:44<16:35:17, 12.86s/it] 54%|█████▎ | 5357/10000 [19:31:56<16:36:29, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.3275000000000003e-05, 'epoch': 2.02} 54%|█████▎ | 5357/10000 [19:31:56<16:36:29, 12.88s/it] 54%|█████▎ | 5358/10000 [19:32:09<16:35:52, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.327e-05, 'epoch': 2.02} 54%|█████▎ | 5358/10000 [19:32:09<16:35:52, 12.87s/it] 54%|█████▎ | 5359/10000 [19:32:22<16:36:44, 12.89s/it] {'loss': 0.004, 'learning_rate': 2.3265000000000002e-05, 'epoch': 2.02} 54%|█████▎ | 5359/10000 [19:32:22<16:36:44, 12.89s/it] 54%|█████▎ | 5360/10000 [19:32:35<16:36:52, 12.89s/it] {'loss': 0.0052, 'learning_rate': 2.326e-05, 'epoch': 2.02} 54%|█████▎ | 5360/10000 [19:32:35<16:36:52, 12.89s/it] 54%|█████▎ | 5361/10000 [19:32:48<16:36:11, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.3255e-05, 'epoch': 2.02} 54%|█████▎ | 5361/10000 [19:32:48<16:36:11, 12.88s/it] 54%|█████▎ | 5362/10000 [19:33:01<16:35:22, 12.88s/it] {'loss': 0.0036, 'learning_rate': 2.3250000000000003e-05, 'epoch': 2.02} 54%|█████▎ | 5362/10000 [19:33:01<16:35:22, 12.88s/it] 54%|█████▎ | 5363/10000 [19:33:14<16:35:03, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.3245e-05, 'epoch': 2.02} 54%|█████▎ | 5363/10000 [19:33:14<16:35:03, 12.88s/it] 54%|█████▎ | 5364/10000 [19:33:27<16:33:59, 12.86s/it] {'loss': 0.0059, 'learning_rate': 2.324e-05, 'epoch': 2.02} 54%|█████▎ | 5364/10000 [19:33:27<16:33:59, 12.86s/it] 54%|█████▎ | 5365/10000 [19:33:39<16:33:21, 12.86s/it] {'loss': 0.0052, 'learning_rate': 2.3235e-05, 'epoch': 2.02} 54%|█████▎ | 5365/10000 
[19:33:39<16:33:21, 12.86s/it] 54%|█████▎ | 5366/10000 [19:33:52<16:32:36, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.3230000000000003e-05, 'epoch': 2.02} 54%|█████▎ | 5366/10000 [19:33:52<16:32:36, 12.85s/it] 54%|█████▎ | 5367/10000 [19:34:05<16:31:45, 12.84s/it] {'loss': 0.0053, 'learning_rate': 2.3225000000000002e-05, 'epoch': 2.02} 54%|█████▎ | 5367/10000 [19:34:05<16:31:45, 12.84s/it] 54%|█████▎ | 5368/10000 [19:34:18<16:32:18, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.322e-05, 'epoch': 2.02} 54%|█████▎ | 5368/10000 [19:34:18<16:32:18, 12.85s/it] 54%|█████▎ | 5369/10000 [19:34:31<16:32:55, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.3215e-05, 'epoch': 2.02} 54%|█████▎ | 5369/10000 [19:34:31<16:32:55, 12.86s/it] 54%|█████▎ | 5370/10000 [19:34:44<16:30:46, 12.84s/it] {'loss': 0.0046, 'learning_rate': 2.321e-05, 'epoch': 2.02} 54%|█████▎ | 5370/10000 [19:34:44<16:30:46, 12.84s/it] 54%|█████▎ | 5371/10000 [19:34:57<16:33:21, 12.88s/it] {'loss': 0.0034, 'learning_rate': 2.3205000000000002e-05, 'epoch': 2.02} 54%|█████▎ | 5371/10000 [19:34:57<16:33:21, 12.88s/it] 54%|█████▎ | 5372/10000 [19:35:09<16:32:47, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.32e-05, 'epoch': 2.02} 54%|█████▎ | 5372/10000 [19:35:09<16:32:47, 12.87s/it] 54%|█████▎ | 5373/10000 [19:35:22<16:34:58, 12.90s/it] {'loss': 0.0045, 'learning_rate': 2.3195e-05, 'epoch': 2.02} 54%|█████▎ | 5373/10000 [19:35:22<16:34:58, 12.90s/it] 54%|█████▎ | 5374/10000 [19:35:35<16:35:51, 12.92s/it] {'loss': 0.0046, 'learning_rate': 2.319e-05, 'epoch': 2.02} 54%|█████▎ | 5374/10000 [19:35:35<16:35:51, 12.92s/it] 54%|█████▍ | 5375/10000 [19:35:48<16:32:59, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.3185000000000002e-05, 'epoch': 2.03} 54%|█████▍ | 5375/10000 [19:35:48<16:32:59, 12.88s/it] 54%|█████▍ | 5376/10000 [19:36:01<16:32:34, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.318e-05, 'epoch': 2.03} 54%|█████▍ | 5376/10000 [19:36:01<16:32:34, 12.88s/it] 54%|█████▍ | 5377/10000 [19:36:14<16:32:42, 
12.88s/it] {'loss': 0.0039, 'learning_rate': 2.3175e-05, 'epoch': 2.03} 54%|█████▍ | 5377/10000 [19:36:14<16:32:42, 12.88s/it] 54%|█████▍ | 5378/10000 [19:36:27<16:33:05, 12.89s/it] {'loss': 0.004, 'learning_rate': 2.317e-05, 'epoch': 2.03} 54%|█████▍ | 5378/10000 [19:36:27<16:33:05, 12.89s/it] 54%|█████▍ | 5379/10000 [19:36:40<16:31:47, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.3165e-05, 'epoch': 2.03} 54%|█████▍ | 5379/10000 [19:36:40<16:31:47, 12.88s/it] 54%|█████▍ | 5380/10000 [19:36:53<16:32:43, 12.89s/it] {'loss': 0.0033, 'learning_rate': 2.3160000000000002e-05, 'epoch': 2.03} 54%|█████▍ | 5380/10000 [19:36:53<16:32:43, 12.89s/it] 54%|█████▍ | 5381/10000 [19:37:05<16:31:50, 12.88s/it] {'loss': 0.0031, 'learning_rate': 2.3155e-05, 'epoch': 2.03} 54%|█████▍ | 5381/10000 [19:37:06<16:31:50, 12.88s/it] 54%|█████▍ | 5382/10000 [19:37:18<16:31:18, 12.88s/it] {'loss': 0.004, 'learning_rate': 2.3150000000000004e-05, 'epoch': 2.03} 54%|█████▍ | 5382/10000 [19:37:18<16:31:18, 12.88s/it] 54%|█████▍ | 5383/10000 [19:37:31<16:32:05, 12.89s/it] {'loss': 0.003, 'learning_rate': 2.3145e-05, 'epoch': 2.03} 54%|█████▍ | 5383/10000 [19:37:31<16:32:05, 12.89s/it] 54%|█████▍ | 5384/10000 [19:37:44<16:30:44, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.3140000000000002e-05, 'epoch': 2.03} 54%|█████▍ | 5384/10000 [19:37:44<16:30:44, 12.88s/it] 54%|█████▍ | 5385/10000 [19:37:57<16:30:04, 12.87s/it] {'loss': 0.0057, 'learning_rate': 2.3135e-05, 'epoch': 2.03} 54%|█████▍ | 5385/10000 [19:37:57<16:30:04, 12.87s/it] 54%|█████▍ | 5386/10000 [19:38:10<16:31:49, 12.90s/it] {'loss': 0.0037, 'learning_rate': 2.313e-05, 'epoch': 2.03} 54%|█████▍ | 5386/10000 [19:38:10<16:31:49, 12.90s/it] 54%|█████▍ | 5387/10000 [19:38:23<16:32:03, 12.90s/it] {'loss': 0.0039, 'learning_rate': 2.3125000000000003e-05, 'epoch': 2.03} 54%|█████▍ | 5387/10000 [19:38:23<16:32:03, 12.90s/it] 54%|█████▍ | 5388/10000 [19:38:36<16:30:15, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.312e-05, 'epoch': 2.03} 
54%|█████▍ | 5388/10000 [19:38:36<16:30:15, 12.88s/it] 54%|█████▍ | 5389/10000 [19:38:49<16:29:48, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.3115e-05, 'epoch': 2.03} 54%|█████▍ | 5389/10000 [19:38:49<16:29:48, 12.88s/it] 54%|█████▍ | 5390/10000 [19:39:01<16:28:30, 12.87s/it] {'loss': 0.0057, 'learning_rate': 2.311e-05, 'epoch': 2.03} 54%|█████▍ | 5390/10000 [19:39:01<16:28:30, 12.87s/it] 54%|█████▍ | 5391/10000 [19:39:14<16:29:42, 12.88s/it] {'loss': 0.0035, 'learning_rate': 2.3105000000000003e-05, 'epoch': 2.03} 54%|█████▍ | 5391/10000 [19:39:14<16:29:42, 12.88s/it] 54%|█████▍ | 5392/10000 [19:39:27<16:30:26, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.3100000000000002e-05, 'epoch': 2.03} 54%|█████▍ | 5392/10000 [19:39:27<16:30:26, 12.90s/it] 54%|█████▍ | 5393/10000 [19:39:40<16:30:00, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.3095e-05, 'epoch': 2.03} 54%|█████▍ | 5393/10000 [19:39:40<16:30:00, 12.89s/it] 54%|█████▍ | 5394/10000 [19:39:53<16:29:17, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.309e-05, 'epoch': 2.03} 54%|█████▍ | 5394/10000 [19:39:53<16:29:17, 12.89s/it] 54%|█████▍ | 5395/10000 [19:40:06<16:29:43, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.3085e-05, 'epoch': 2.03} 54%|█████▍ | 5395/10000 [19:40:06<16:29:43, 12.90s/it] 54%|█████▍ | 5396/10000 [19:40:19<16:28:56, 12.89s/it] {'loss': 0.004, 'learning_rate': 2.3080000000000003e-05, 'epoch': 2.03} 54%|█████▍ | 5396/10000 [19:40:19<16:28:56, 12.89s/it] 54%|█████▍ | 5397/10000 [19:40:32<16:30:33, 12.91s/it] {'loss': 0.0049, 'learning_rate': 2.3075000000000002e-05, 'epoch': 2.03} 54%|█████▍ | 5397/10000 [19:40:32<16:30:33, 12.91s/it] 54%|█████▍ | 5398/10000 [19:40:45<16:31:42, 12.93s/it] {'loss': 0.0053, 'learning_rate': 2.307e-05, 'epoch': 2.03} 54%|█████▍ | 5398/10000 [19:40:45<16:31:42, 12.93s/it] 54%|█████▍ | 5399/10000 [19:40:58<16:32:11, 12.94s/it] {'loss': 0.0058, 'learning_rate': 2.3065e-05, 'epoch': 2.03} 54%|█████▍ | 5399/10000 [19:40:58<16:32:11, 12.94s/it] 54%|█████▍ | 
5400/10000 [19:41:11<16:30:48, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.306e-05, 'epoch': 2.03} 54%|█████▍ | 5400/10000 [19:41:11<16:30:48, 12.92s/it] 54%|█████▍ | 5401/10000 [19:41:23<16:28:12, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.3055000000000002e-05, 'epoch': 2.04} 54%|█████▍ | 5401/10000 [19:41:23<16:28:12, 12.89s/it] 54%|█████▍ | 5402/10000 [19:41:36<16:26:44, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.305e-05, 'epoch': 2.04} 54%|█████▍ | 5402/10000 [19:41:36<16:26:44, 12.88s/it] 54%|█████▍ | 5403/10000 [19:41:49<16:26:18, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.3045e-05, 'epoch': 2.04} 54%|█████▍ | 5403/10000 [19:41:49<16:26:18, 12.87s/it] 54%|█████▍ | 5404/10000 [19:42:02<16:24:34, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.304e-05, 'epoch': 2.04} 54%|█████▍ | 5404/10000 [19:42:02<16:24:34, 12.85s/it] 54%|█████▍ | 5405/10000 [19:42:15<16:25:35, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.3035000000000002e-05, 'epoch': 2.04} 54%|█████▍ | 5405/10000 [19:42:15<16:25:35, 12.87s/it] 54%|█████▍ | 5406/10000 [19:42:28<16:25:22, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.303e-05, 'epoch': 2.04} 54%|█████▍ | 5406/10000 [19:42:28<16:25:22, 12.87s/it] 54%|█████▍ | 5407/10000 [19:42:40<16:23:56, 12.85s/it] {'loss': 0.0059, 'learning_rate': 2.3025e-05, 'epoch': 2.04} 54%|█████▍ | 5407/10000 [19:42:41<16:23:56, 12.85s/it] 54%|█████▍ | 5408/10000 [19:42:53<16:22:51, 12.84s/it] {'loss': 0.0039, 'learning_rate': 2.302e-05, 'epoch': 2.04} 54%|█████▍ | 5408/10000 [19:42:53<16:22:51, 12.84s/it] 54%|█████▍ | 5409/10000 [19:43:06<16:22:39, 12.84s/it] {'loss': 0.0051, 'learning_rate': 2.3015e-05, 'epoch': 2.04} 54%|█████▍ | 5409/10000 [19:43:06<16:22:39, 12.84s/it] 54%|█████▍ | 5410/10000 [19:43:19<16:22:36, 12.84s/it] {'loss': 0.0042, 'learning_rate': 2.301e-05, 'epoch': 2.04} 54%|█████▍ | 5410/10000 [19:43:19<16:22:36, 12.84s/it] 54%|█████▍ | 5411/10000 [19:43:32<16:24:04, 12.87s/it] {'loss': 0.0037, 'learning_rate': 2.3005e-05, 'epoch': 
2.04} 54%|█████▍ | 5411/10000 [19:43:32<16:24:04, 12.87s/it] 54%|█████▍ | 5412/10000 [19:43:45<16:23:00, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.3000000000000003e-05, 'epoch': 2.04} 54%|█████▍ | 5412/10000 [19:43:45<16:23:00, 12.86s/it] 54%|█████▍ | 5413/10000 [19:43:58<16:21:58, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.2995e-05, 'epoch': 2.04} 54%|█████▍ | 5413/10000 [19:43:58<16:21:58, 12.84s/it] 54%|█████▍ | 5414/10000 [19:44:10<16:22:45, 12.86s/it] {'loss': 0.0039, 'learning_rate': 2.2990000000000002e-05, 'epoch': 2.04} 54%|█████▍ | 5414/10000 [19:44:10<16:22:45, 12.86s/it] 54%|█████▍ | 5415/10000 [19:44:23<16:23:30, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.2985e-05, 'epoch': 2.04} 54%|█████▍ | 5415/10000 [19:44:23<16:23:30, 12.87s/it] 54%|█████▍ | 5416/10000 [19:44:36<16:22:41, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.298e-05, 'epoch': 2.04} 54%|█████▍ | 5416/10000 [19:44:36<16:22:41, 12.86s/it] 54%|█████▍ | 5417/10000 [19:44:49<16:23:36, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.2975000000000003e-05, 'epoch': 2.04} 54%|█████▍ | 5417/10000 [19:44:49<16:23:36, 12.88s/it] 54%|█████▍ | 5418/10000 [19:45:02<16:23:42, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.297e-05, 'epoch': 2.04} 54%|█████▍ | 5418/10000 [19:45:02<16:23:42, 12.88s/it] 54%|█████▍ | 5419/10000 [19:45:15<16:25:05, 12.90s/it] {'loss': 0.0058, 'learning_rate': 2.2965e-05, 'epoch': 2.04} 54%|█████▍ | 5419/10000 [19:45:15<16:25:05, 12.90s/it] 54%|█████▍ | 5420/10000 [19:45:28<16:23:46, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.296e-05, 'epoch': 2.04} 54%|█████▍ | 5420/10000 [19:45:28<16:23:46, 12.89s/it] 54%|█████▍ | 5421/10000 [19:45:41<16:24:12, 12.90s/it] {'loss': 0.0046, 'learning_rate': 2.2955000000000003e-05, 'epoch': 2.04} 54%|█████▍ | 5421/10000 [19:45:41<16:24:12, 12.90s/it] 54%|█████▍ | 5422/10000 [19:45:54<16:22:58, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.2950000000000002e-05, 'epoch': 2.04} 54%|█████▍ | 5422/10000 [19:45:54<16:22:58, 12.88s/it] 
54%|█████▍ | 5423/10000 [19:46:06<16:21:53, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.2945e-05, 'epoch': 2.04} 54%|█████▍ | 5423/10000 [19:46:06<16:21:53, 12.87s/it] 54%|█████▍ | 5424/10000 [19:46:19<16:24:54, 12.91s/it] {'loss': 0.0039, 'learning_rate': 2.294e-05, 'epoch': 2.04} 54%|█████▍ | 5424/10000 [19:46:19<16:24:54, 12.91s/it] 54%|█████▍ | 5425/10000 [19:46:32<16:26:03, 12.93s/it] {'loss': 0.0037, 'learning_rate': 2.2935e-05, 'epoch': 2.04} 54%|█████▍ | 5425/10000 [19:46:32<16:26:03, 12.93s/it] 54%|█████▍ | 5426/10000 [19:46:45<16:25:40, 12.93s/it] {'loss': 0.0045, 'learning_rate': 2.2930000000000002e-05, 'epoch': 2.04} 54%|█████▍ | 5426/10000 [19:46:45<16:25:40, 12.93s/it] 54%|█████▍ | 5427/10000 [19:46:58<16:27:58, 12.96s/it] {'loss': 0.0039, 'learning_rate': 2.2925e-05, 'epoch': 2.04} 54%|█████▍ | 5427/10000 [19:46:58<16:27:58, 12.96s/it] 54%|█████▍ | 5428/10000 [19:47:11<16:28:59, 12.98s/it] {'loss': 0.0038, 'learning_rate': 2.292e-05, 'epoch': 2.05} 54%|█████▍ | 5428/10000 [19:47:11<16:28:59, 12.98s/it] 54%|█████▍ | 5429/10000 [19:47:24<16:28:33, 12.98s/it] {'loss': 0.0049, 'learning_rate': 2.2915e-05, 'epoch': 2.05} 54%|█████▍ | 5429/10000 [19:47:24<16:28:33, 12.98s/it] 54%|█████▍ | 5430/10000 [19:47:37<16:27:34, 12.97s/it] {'loss': 0.0042, 'learning_rate': 2.2910000000000003e-05, 'epoch': 2.05} 54%|█████▍ | 5430/10000 [19:47:37<16:27:34, 12.97s/it] 54%|█████▍ | 5431/10000 [19:47:50<16:28:24, 12.98s/it] {'loss': 0.0045, 'learning_rate': 2.2905000000000002e-05, 'epoch': 2.05} 54%|█████▍ | 5431/10000 [19:47:50<16:28:24, 12.98s/it] 54%|█████▍ | 5432/10000 [19:48:03<16:27:33, 12.97s/it] {'loss': 0.0053, 'learning_rate': 2.29e-05, 'epoch': 2.05} 54%|█████▍ | 5432/10000 [19:48:03<16:27:33, 12.97s/it] 54%|█████▍ | 5433/10000 [19:48:16<16:24:14, 12.93s/it] {'loss': 0.0042, 'learning_rate': 2.2895e-05, 'epoch': 2.05} 54%|█████▍ | 5433/10000 [19:48:16<16:24:14, 12.93s/it] 54%|█████▍ | 5434/10000 [19:48:29<16:23:26, 12.92s/it] {'loss': 0.005, 
'learning_rate': 2.289e-05, 'epoch': 2.05} 54%|█████▍ | 5434/10000 [19:48:29<16:23:26, 12.92s/it] 54%|█████▍ | 5435/10000 [19:48:42<16:22:58, 12.92s/it] {'loss': 0.0047, 'learning_rate': 2.2885000000000002e-05, 'epoch': 2.05} 54%|█████▍ | 5435/10000 [19:48:42<16:22:58, 12.92s/it] 54%|█████▍ | 5436/10000 [19:48:55<16:21:38, 12.91s/it] {'loss': 0.0039, 'learning_rate': 2.288e-05, 'epoch': 2.05} 54%|█████▍ | 5436/10000 [19:48:55<16:21:38, 12.91s/it] 54%|█████▍ | 5437/10000 [19:49:08<16:20:50, 12.90s/it] {'loss': 0.0056, 'learning_rate': 2.2875e-05, 'epoch': 2.05} 54%|█████▍ | 5437/10000 [19:49:08<16:20:50, 12.90s/it] 54%|█████▍ | 5438/10000 [19:49:21<16:20:53, 12.90s/it] {'loss': 0.0058, 'learning_rate': 2.287e-05, 'epoch': 2.05} 54%|█████▍ | 5438/10000 [19:49:21<16:20:53, 12.90s/it] 54%|█████▍ | 5439/10000 [19:49:33<16:19:29, 12.89s/it] {'loss': 0.0048, 'learning_rate': 2.2865e-05, 'epoch': 2.05} 54%|█████▍ | 5439/10000 [19:49:33<16:19:29, 12.89s/it] 54%|█████▍ | 5440/10000 [19:49:46<16:20:24, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.286e-05, 'epoch': 2.05} 54%|█████▍ | 5440/10000 [19:49:46<16:20:24, 12.90s/it] 54%|█████▍ | 5441/10000 [19:49:59<16:19:17, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.2855e-05, 'epoch': 2.05} 54%|█████▍ | 5441/10000 [19:49:59<16:19:17, 12.89s/it] 54%|█████▍ | 5442/10000 [19:50:12<16:17:21, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.2850000000000003e-05, 'epoch': 2.05} 54%|█████▍ | 5442/10000 [19:50:12<16:17:21, 12.87s/it] 54%|█████▍ | 5443/10000 [19:50:25<16:17:08, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.2845e-05, 'epoch': 2.05} 54%|█████▍ | 5443/10000 [19:50:25<16:17:08, 12.87s/it] 54%|█████▍ | 5444/10000 [19:50:38<16:16:47, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.284e-05, 'epoch': 2.05} 54%|█████▍ | 5444/10000 [19:50:38<16:16:47, 12.86s/it] 54%|█████▍ | 5445/10000 [19:50:51<16:16:23, 12.86s/it] {'loss': 0.0037, 'learning_rate': 2.2835e-05, 'epoch': 2.05} 54%|█████▍ | 5445/10000 [19:50:51<16:16:23, 12.86s/it] 
54%|█████▍ | 5446/10000 [19:51:04<16:16:44, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.283e-05, 'epoch': 2.05} 54%|█████▍ | 5446/10000 [19:51:04<16:16:44, 12.87s/it] 54%|█████▍ | 5447/10000 [19:51:16<16:16:12, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.2825000000000003e-05, 'epoch': 2.05} 54%|█████▍ | 5447/10000 [19:51:16<16:16:12, 12.86s/it] 54%|█████▍ | 5448/10000 [19:51:29<16:16:05, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.282e-05, 'epoch': 2.05} 54%|█████▍ | 5448/10000 [19:51:29<16:16:05, 12.87s/it] 54%|█████▍ | 5449/10000 [19:51:42<16:15:38, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.2815e-05, 'epoch': 2.05} 54%|█████▍ | 5449/10000 [19:51:42<16:15:38, 12.86s/it] 55%|█████▍ | 5450/10000 [19:51:55<16:16:53, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.281e-05, 'epoch': 2.05} 55%|█████▍ | 5450/10000 [19:51:55<16:16:53, 12.88s/it] 55%|█████▍ | 5451/10000 [19:52:08<16:16:27, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.2805000000000003e-05, 'epoch': 2.05} 55%|█████▍ | 5451/10000 [19:52:08<16:16:27, 12.88s/it] 55%|█████▍ | 5452/10000 [19:52:21<16:16:12, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.2800000000000002e-05, 'epoch': 2.05} 55%|█████▍ | 5452/10000 [19:52:21<16:16:12, 12.88s/it] 55%|█████▍ | 5453/10000 [19:52:34<16:15:55, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.2795e-05, 'epoch': 2.05} 55%|█████▍ | 5453/10000 [19:52:34<16:15:55, 12.88s/it] 55%|█████▍ | 5454/10000 [19:52:46<16:14:57, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.279e-05, 'epoch': 2.06} 55%|█████▍ | 5454/10000 [19:52:47<16:14:57, 12.87s/it] 55%|█████▍ | 5455/10000 [19:52:59<16:14:42, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.2785e-05, 'epoch': 2.06} 55%|█████▍ | 5455/10000 [19:52:59<16:14:42, 12.87s/it] 55%|█████▍ | 5456/10000 [19:53:12<16:12:56, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.2780000000000002e-05, 'epoch': 2.06} 55%|█████▍ | 5456/10000 [19:53:12<16:12:56, 12.85s/it] 55%|█████▍ | 5457/10000 [19:53:25<16:12:14, 12.84s/it] {'loss': 0.0042, 
'learning_rate': 2.2775e-05, 'epoch': 2.06} 55%|█████▍ | 5457/10000 [19:53:25<16:12:14, 12.84s/it] 55%|█████▍ | 5458/10000 [19:53:38<16:13:50, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.2770000000000004e-05, 'epoch': 2.06} 55%|█████▍ | 5458/10000 [19:53:38<16:13:50, 12.86s/it] 55%|█████▍ | 5459/10000 [19:53:51<16:14:23, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.2765e-05, 'epoch': 2.06} 55%|█████▍ | 5459/10000 [19:53:51<16:14:23, 12.87s/it] 55%|█████▍ | 5460/10000 [19:54:04<16:13:09, 12.86s/it] {'loss': 0.0034, 'learning_rate': 2.2760000000000002e-05, 'epoch': 2.06} 55%|█████▍ | 5460/10000 [19:54:04<16:13:09, 12.86s/it] 55%|█████▍ | 5461/10000 [19:54:16<16:11:32, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.2755e-05, 'epoch': 2.06} 55%|█████▍ | 5461/10000 [19:54:16<16:11:32, 12.84s/it] 55%|█████▍ | 5462/10000 [19:54:29<16:12:11, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.275e-05, 'epoch': 2.06} 55%|█████▍ | 5462/10000 [19:54:29<16:12:11, 12.85s/it] 55%|█████▍ | 5463/10000 [19:54:42<16:12:39, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.2745000000000003e-05, 'epoch': 2.06} 55%|█████▍ | 5463/10000 [19:54:42<16:12:39, 12.86s/it] 55%|█████▍ | 5464/10000 [19:54:55<16:11:24, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.274e-05, 'epoch': 2.06} 55%|█████▍ | 5464/10000 [19:54:55<16:11:24, 12.85s/it] 55%|█████▍ | 5465/10000 [19:55:08<16:13:07, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.2735000000000002e-05, 'epoch': 2.06} 55%|█████▍ | 5465/10000 [19:55:08<16:13:07, 12.87s/it] 55%|█████▍ | 5466/10000 [19:55:21<16:12:29, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.273e-05, 'epoch': 2.06} 55%|█████▍ | 5466/10000 [19:55:21<16:12:29, 12.87s/it] 55%|█████▍ | 5467/10000 [19:55:34<16:11:12, 12.86s/it] {'loss': 0.0039, 'learning_rate': 2.2725000000000003e-05, 'epoch': 2.06} 55%|█████▍ | 5467/10000 [19:55:34<16:11:12, 12.86s/it] 55%|█████▍ | 5468/10000 [19:55:46<16:11:03, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.2720000000000003e-05, 'epoch': 2.06} 
55%|█████▍ | 5468/10000 [19:55:46<16:11:03, 12.86s/it] 55%|█████▍ | 5469/10000 [19:55:59<16:09:20, 12.84s/it] {'loss': 0.0069, 'learning_rate': 2.2715e-05, 'epoch': 2.06} 55%|█████▍ | 5469/10000 [19:55:59<16:09:20, 12.84s/it] 55%|█████▍ | 5470/10000 [19:56:12<16:10:21, 12.85s/it] {'loss': 0.0042, 'learning_rate': 2.271e-05, 'epoch': 2.06} 55%|█████▍ | 5470/10000 [19:56:12<16:10:21, 12.85s/it] 55%|█████▍ | 5471/10000 [19:56:25<16:10:09, 12.85s/it] {'loss': 0.0039, 'learning_rate': 2.2705e-05, 'epoch': 2.06} 55%|█████▍ | 5471/10000 [19:56:25<16:10:09, 12.85s/it] 55%|█████▍ | 5472/10000 [19:56:38<16:09:17, 12.84s/it] {'loss': 0.0041, 'learning_rate': 2.2700000000000003e-05, 'epoch': 2.06} 55%|█████▍ | 5472/10000 [19:56:38<16:09:17, 12.84s/it] 55%|█████▍ | 5473/10000 [19:56:51<16:10:19, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.2695000000000002e-05, 'epoch': 2.06} 55%|█████▍ | 5473/10000 [19:56:51<16:10:19, 12.86s/it] 55%|█████▍ | 5474/10000 [19:57:04<16:08:59, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.269e-05, 'epoch': 2.06} 55%|█████▍ | 5474/10000 [19:57:04<16:08:59, 12.85s/it] 55%|█████▍ | 5475/10000 [19:57:16<16:08:18, 12.84s/it] {'loss': 0.0049, 'learning_rate': 2.2685e-05, 'epoch': 2.06} 55%|█████▍ | 5475/10000 [19:57:16<16:08:18, 12.84s/it] 55%|█████▍ | 5476/10000 [19:57:29<16:06:39, 12.82s/it] {'loss': 0.0056, 'learning_rate': 2.268e-05, 'epoch': 2.06} 55%|█████▍ | 5476/10000 [19:57:29<16:06:39, 12.82s/it] 55%|█████▍ | 5477/10000 [19:57:42<16:07:38, 12.84s/it] {'loss': 0.0037, 'learning_rate': 2.2675000000000002e-05, 'epoch': 2.06} 55%|█████▍ | 5477/10000 [19:57:42<16:07:38, 12.84s/it] 55%|█████▍ | 5478/10000 [19:57:55<16:07:01, 12.83s/it] {'loss': 0.0042, 'learning_rate': 2.267e-05, 'epoch': 2.06} 55%|█████▍ | 5478/10000 [19:57:55<16:07:01, 12.83s/it] 55%|█████▍ | 5479/10000 [19:58:08<16:06:42, 12.83s/it] {'loss': 0.0049, 'learning_rate': 2.2665e-05, 'epoch': 2.06} 55%|█████▍ | 5479/10000 [19:58:08<16:06:42, 12.83s/it] 55%|█████▍ | 5480/10000 
[19:58:20<16:05:52, 12.82s/it] {'loss': 0.0046, 'learning_rate': 2.266e-05, 'epoch': 2.06} 55%|█████▍ | 5480/10000 [19:58:20<16:05:52, 12.82s/it] 55%|█████▍ | 5481/10000 [19:58:33<16:04:54, 12.81s/it] {'loss': 0.003, 'learning_rate': 2.2655000000000002e-05, 'epoch': 2.07} 55%|█████▍ | 5481/10000 [19:58:33<16:04:54, 12.81s/it] 55%|█████▍ | 5482/10000 [19:58:46<16:06:29, 12.84s/it] {'loss': 0.004, 'learning_rate': 2.265e-05, 'epoch': 2.07} 55%|█████▍ | 5482/10000 [19:58:46<16:06:29, 12.84s/it] 55%|█████▍ | 5483/10000 [19:58:59<16:06:49, 12.84s/it] {'loss': 0.0041, 'learning_rate': 2.2645e-05, 'epoch': 2.07} 55%|█████▍ | 5483/10000 [19:58:59<16:06:49, 12.84s/it] 55%|█████▍ | 5484/10000 [19:59:12<16:06:05, 12.84s/it] {'loss': 0.0049, 'learning_rate': 2.264e-05, 'epoch': 2.07} 55%|█████▍ | 5484/10000 [19:59:12<16:06:05, 12.84s/it] 55%|█████▍ | 5485/10000 [19:59:25<16:06:00, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.2635e-05, 'epoch': 2.07} 55%|█████▍ | 5485/10000 [19:59:25<16:06:00, 12.84s/it] 55%|█████▍ | 5486/10000 [19:59:37<16:04:30, 12.82s/it] {'loss': 0.0047, 'learning_rate': 2.2630000000000002e-05, 'epoch': 2.07} 55%|█████▍ | 5486/10000 [19:59:37<16:04:30, 12.82s/it] 55%|█████▍ | 5487/10000 [19:59:50<16:03:52, 12.81s/it] {'loss': 0.0042, 'learning_rate': 2.2625e-05, 'epoch': 2.07} 55%|█████▍ | 5487/10000 [19:59:50<16:03:52, 12.81s/it] 55%|█████▍ | 5488/10000 [20:00:03<16:04:59, 12.83s/it] {'loss': 0.004, 'learning_rate': 2.2620000000000004e-05, 'epoch': 2.07} 55%|█████▍ | 5488/10000 [20:00:03<16:04:59, 12.83s/it] 55%|█████▍ | 5489/10000 [20:00:16<16:04:30, 12.83s/it] {'loss': 0.0035, 'learning_rate': 2.2615e-05, 'epoch': 2.07} 55%|█████▍ | 5489/10000 [20:00:16<16:04:30, 12.83s/it] 55%|█████▍ | 5490/10000 [20:00:29<16:03:42, 12.82s/it] {'loss': 0.0053, 'learning_rate': 2.2610000000000002e-05, 'epoch': 2.07} 55%|█████▍ | 5490/10000 [20:00:29<16:03:42, 12.82s/it] 55%|█████▍ | 5491/10000 [20:00:42<16:04:29, 12.83s/it] {'loss': 0.0048, 'learning_rate': 2.2605e-05, 
'epoch': 2.07} 55%|█████▍ | 5491/10000 [20:00:42<16:04:29, 12.83s/it] 55%|█████▍ | 5492/10000 [20:00:54<16:02:36, 12.81s/it] {'loss': 0.0038, 'learning_rate': 2.26e-05, 'epoch': 2.07} 55%|█████▍ | 5492/10000 [20:00:54<16:02:36, 12.81s/it] 55%|█████▍ | 5493/10000 [20:01:07<16:04:54, 12.85s/it] {'loss': 0.0051, 'learning_rate': 2.2595000000000003e-05, 'epoch': 2.07} 55%|█████▍ | 5493/10000 [20:01:07<16:04:54, 12.85s/it] 55%|█████▍ | 5494/10000 [20:01:20<16:04:05, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.259e-05, 'epoch': 2.07} 55%|█████▍ | 5494/10000 [20:01:20<16:04:05, 12.84s/it] 55%|█████▍ | 5495/10000 [20:01:33<16:03:32, 12.83s/it] {'loss': 0.0063, 'learning_rate': 2.2585e-05, 'epoch': 2.07} 55%|█████▍ | 5495/10000 [20:01:33<16:03:32, 12.83s/it] 55%|█████▍ | 5496/10000 [20:01:46<16:03:11, 12.83s/it] {'loss': 0.0036, 'learning_rate': 2.258e-05, 'epoch': 2.07} 55%|█████▍ | 5496/10000 [20:01:46<16:03:11, 12.83s/it] 55%|█████▍ | 5497/10000 [20:01:59<16:03:26, 12.84s/it] {'loss': 0.005, 'learning_rate': 2.2575000000000003e-05, 'epoch': 2.07} 55%|█████▍ | 5497/10000 [20:01:59<16:03:26, 12.84s/it] 55%|█████▍ | 5498/10000 [20:02:11<16:03:33, 12.84s/it] {'loss': 0.0058, 'learning_rate': 2.2570000000000002e-05, 'epoch': 2.07} 55%|█████▍ | 5498/10000 [20:02:11<16:03:33, 12.84s/it] 55%|█████▍ | 5499/10000 [20:02:24<16:03:10, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.2565e-05, 'epoch': 2.07} 55%|█████▍ | 5499/10000 [20:02:24<16:03:10, 12.84s/it] 55%|█████▌ | 5500/10000 [20:02:37<16:03:58, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.256e-05, 'epoch': 2.07} 55%|█████▌ | 5500/10000 [20:02:37<16:03:58, 12.85s/it] 55%|█████▌ | 5501/10000 [20:02:50<16:03:11, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.2555e-05, 'epoch': 2.07} 55%|█████▌ | 5501/10000 [20:02:50<16:03:11, 12.85s/it] 55%|█████▌ | 5502/10000 [20:03:03<16:01:56, 12.83s/it] {'loss': 0.0053, 'learning_rate': 2.2550000000000003e-05, 'epoch': 2.07} 55%|█████▌ | 5502/10000 [20:03:03<16:01:56, 12.83s/it] 
55%|█████▌ | 5503/10000 [20:03:16<16:02:47, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.2545000000000002e-05, 'epoch': 2.07} 55%|█████▌ | 5503/10000 [20:03:16<16:02:47, 12.85s/it] 55%|█████▌ | 5504/10000 [20:03:29<16:02:06, 12.84s/it] {'loss': 0.0042, 'learning_rate': 2.254e-05, 'epoch': 2.07} 55%|█████▌ | 5504/10000 [20:03:29<16:02:06, 12.84s/it] 55%|█████▌ | 5505/10000 [20:03:41<16:02:11, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.2535e-05, 'epoch': 2.07} 55%|█████▌ | 5505/10000 [20:03:41<16:02:11, 12.84s/it] 55%|█████▌ | 5506/10000 [20:03:54<16:00:26, 12.82s/it] {'loss': 0.0044, 'learning_rate': 2.253e-05, 'epoch': 2.07} 55%|█████▌ | 5506/10000 [20:03:54<16:00:26, 12.82s/it] 55%|█████▌ | 5507/10000 [20:04:07<16:02:39, 12.86s/it] {'loss': 0.0052, 'learning_rate': 2.2525000000000002e-05, 'epoch': 2.07} 55%|█████▌ | 5507/10000 [20:04:07<16:02:39, 12.86s/it] 55%|█████▌ | 5508/10000 [20:04:20<16:03:21, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.252e-05, 'epoch': 2.08} 55%|█████▌ | 5508/10000 [20:04:20<16:03:21, 12.87s/it] 55%|█████▌ | 5509/10000 [20:04:33<16:03:32, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.2515e-05, 'epoch': 2.08} 55%|█████▌ | 5509/10000 [20:04:33<16:03:32, 12.87s/it] 55%|█████▌ | 5510/10000 [20:04:46<16:02:16, 12.86s/it] {'loss': 0.0034, 'learning_rate': 2.251e-05, 'epoch': 2.08} 55%|█████▌ | 5510/10000 [20:04:46<16:02:16, 12.86s/it] 55%|█████▌ | 5511/10000 [20:04:59<16:02:00, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.2505000000000002e-05, 'epoch': 2.08} 55%|█████▌ | 5511/10000 [20:04:59<16:02:00, 12.86s/it] 55%|█████▌ | 5512/10000 [20:05:11<16:04:05, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.25e-05, 'epoch': 2.08} 55%|█████▌ | 5512/10000 [20:05:12<16:04:05, 12.89s/it] 55%|█████▌ | 5513/10000 [20:05:24<16:03:58, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.2495e-05, 'epoch': 2.08} 55%|█████▌ | 5513/10000 [20:05:24<16:03:58, 12.89s/it] 55%|█████▌ | 5514/10000 [20:05:37<16:06:05, 12.92s/it] {'loss': 0.0045, 'learning_rate': 
2.249e-05, 'epoch': 2.08} 55%|█████▌ | 5514/10000 [20:05:37<16:06:05, 12.92s/it] 55%|█████▌ | 5515/10000 [20:05:50<16:03:08, 12.88s/it] {'loss': 0.0054, 'learning_rate': 2.2485e-05, 'epoch': 2.08} 55%|█████▌ | 5515/10000 [20:05:50<16:03:08, 12.88s/it] 55%|█████▌ | 5516/10000 [20:06:03<16:01:37, 12.87s/it] {'loss': 0.0054, 'learning_rate': 2.248e-05, 'epoch': 2.08} 55%|█████▌ | 5516/10000 [20:06:03<16:01:37, 12.87s/it] 55%|█████▌ | 5517/10000 [20:06:16<15:59:54, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.2475e-05, 'epoch': 2.08} 55%|█████▌ | 5517/10000 [20:06:16<15:59:54, 12.85s/it] 55%|█████▌ | 5518/10000 [20:06:29<15:58:58, 12.84s/it] {'loss': 0.005, 'learning_rate': 2.2470000000000003e-05, 'epoch': 2.08} 55%|█████▌ | 5518/10000 [20:06:29<15:58:58, 12.84s/it] 55%|█████▌ | 5519/10000 [20:06:41<15:57:50, 12.83s/it] {'loss': 0.0039, 'learning_rate': 2.2465e-05, 'epoch': 2.08} 55%|█████▌ | 5519/10000 [20:06:41<15:57:50, 12.83s/it] 55%|█████▌ | 5520/10000 [20:06:54<15:56:11, 12.81s/it] {'loss': 0.0038, 'learning_rate': 2.2460000000000002e-05, 'epoch': 2.08} 55%|█████▌ | 5520/10000 [20:06:54<15:56:11, 12.81s/it] 55%|█████▌ | 5521/10000 [20:07:07<15:57:48, 12.83s/it] {'loss': 0.0032, 'learning_rate': 2.2455e-05, 'epoch': 2.08} 55%|█████▌ | 5521/10000 [20:07:07<15:57:48, 12.83s/it] 55%|█████▌ | 5522/10000 [20:07:20<15:57:32, 12.83s/it] {'loss': 0.0048, 'learning_rate': 2.245e-05, 'epoch': 2.08} 55%|█████▌ | 5522/10000 [20:07:20<15:57:32, 12.83s/it] 55%|█████▌ | 5523/10000 [20:07:33<15:59:13, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.2445000000000003e-05, 'epoch': 2.08} 55%|█████▌ | 5523/10000 [20:07:33<15:59:13, 12.86s/it] 55%|█████▌ | 5524/10000 [20:07:46<15:59:35, 12.86s/it] {'loss': 0.006, 'learning_rate': 2.244e-05, 'epoch': 2.08} 55%|█████▌ | 5524/10000 [20:07:46<15:59:35, 12.86s/it] 55%|█████▌ | 5525/10000 [20:07:58<15:57:30, 12.84s/it] {'loss': 0.006, 'learning_rate': 2.2435e-05, 'epoch': 2.08} 55%|█████▌ | 5525/10000 [20:07:58<15:57:30, 12.84s/it] 
55%|█████▌ | 5526/10000 [20:08:11<15:56:15, 12.82s/it] {'loss': 0.0054, 'learning_rate': 2.243e-05, 'epoch': 2.08} 55%|█████▌ | 5526/10000 [20:08:11<15:56:15, 12.82s/it] 55%|█████▌ | 5527/10000 [20:08:24<15:54:21, 12.80s/it] {'loss': 0.0061, 'learning_rate': 2.2425000000000003e-05, 'epoch': 2.08} 55%|█████▌ | 5527/10000 [20:08:24<15:54:21, 12.80s/it] 55%|█████▌ | 5528/10000 [20:08:37<15:55:40, 12.82s/it] {'loss': 0.0043, 'learning_rate': 2.2420000000000002e-05, 'epoch': 2.08} 55%|█████▌ | 5528/10000 [20:08:37<15:55:40, 12.82s/it] 55%|█████▌ | 5529/10000 [20:08:50<15:55:26, 12.82s/it] {'loss': 0.0043, 'learning_rate': 2.2415e-05, 'epoch': 2.08} 55%|█████▌ | 5529/10000 [20:08:50<15:55:26, 12.82s/it] 55%|█████▌ | 5530/10000 [20:09:03<15:57:41, 12.85s/it] {'loss': 0.0068, 'learning_rate': 2.241e-05, 'epoch': 2.08} 55%|█████▌ | 5530/10000 [20:09:03<15:57:41, 12.85s/it] 55%|█████▌ | 5531/10000 [20:09:15<15:56:31, 12.84s/it] {'loss': 0.0052, 'learning_rate': 2.2405e-05, 'epoch': 2.08} 55%|█████▌ | 5531/10000 [20:09:15<15:56:31, 12.84s/it] 55%|█████▌ | 5532/10000 [20:09:28<15:58:21, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.2400000000000002e-05, 'epoch': 2.08} 55%|█████▌ | 5532/10000 [20:09:28<15:58:21, 12.87s/it] 55%|█████▌ | 5533/10000 [20:09:41<15:56:53, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.2395e-05, 'epoch': 2.08} 55%|█████▌ | 5533/10000 [20:09:41<15:56:53, 12.85s/it] 55%|█████▌ | 5534/10000 [20:09:54<15:55:49, 12.84s/it] {'loss': 0.0062, 'learning_rate': 2.239e-05, 'epoch': 2.09} 55%|█████▌ | 5534/10000 [20:09:54<15:55:49, 12.84s/it] 55%|█████▌ | 5535/10000 [20:10:07<15:55:10, 12.84s/it] {'loss': 0.0052, 'learning_rate': 2.2385e-05, 'epoch': 2.09} 55%|█████▌ | 5535/10000 [20:10:07<15:55:10, 12.84s/it] 55%|█████▌ | 5536/10000 [20:10:20<15:54:35, 12.83s/it] {'loss': 0.0047, 'learning_rate': 2.2380000000000003e-05, 'epoch': 2.09} 55%|█████▌ | 5536/10000 [20:10:20<15:54:35, 12.83s/it] 55%|█████▌ | 5537/10000 [20:10:32<15:53:37, 12.82s/it] {'loss': 0.0041, 
'learning_rate': 2.2375000000000002e-05, 'epoch': 2.09} 55%|█████▌ | 5537/10000 [20:10:32<15:53:37, 12.82s/it] 55%|█████▌ | 5538/10000 [20:10:45<15:53:00, 12.82s/it] {'loss': 0.0051, 'learning_rate': 2.237e-05, 'epoch': 2.09} 55%|█████▌ | 5538/10000 [20:10:45<15:53:00, 12.82s/it] 55%|█████▌ | 5539/10000 [20:10:58<15:52:54, 12.82s/it] {'loss': 0.0048, 'learning_rate': 2.2365e-05, 'epoch': 2.09} 55%|█████▌ | 5539/10000 [20:10:58<15:52:54, 12.82s/it] 55%|█████▌ | 5540/10000 [20:11:11<15:53:36, 12.83s/it] {'loss': 0.0041, 'learning_rate': 2.236e-05, 'epoch': 2.09} 55%|█████▌ | 5540/10000 [20:11:11<15:53:36, 12.83s/it] 55%|█████▌ | 5541/10000 [20:11:24<15:54:29, 12.84s/it] {'loss': 0.0053, 'learning_rate': 2.2355000000000002e-05, 'epoch': 2.09} 55%|█████▌ | 5541/10000 [20:11:24<15:54:29, 12.84s/it] 55%|█████▌ | 5542/10000 [20:11:37<15:54:30, 12.85s/it] {'loss': 0.0032, 'learning_rate': 2.235e-05, 'epoch': 2.09} 55%|█████▌ | 5542/10000 [20:11:37<15:54:30, 12.85s/it] 55%|█████▌ | 5543/10000 [20:11:49<15:52:48, 12.83s/it] {'loss': 0.0056, 'learning_rate': 2.2345e-05, 'epoch': 2.09} 55%|█████▌ | 5543/10000 [20:11:49<15:52:48, 12.83s/it] 55%|█████▌ | 5544/10000 [20:12:02<15:52:54, 12.83s/it] {'loss': 0.0035, 'learning_rate': 2.234e-05, 'epoch': 2.09} 55%|█████▌ | 5544/10000 [20:12:02<15:52:54, 12.83s/it] 55%|█████▌ | 5545/10000 [20:12:15<15:53:52, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.2335e-05, 'epoch': 2.09} 55%|█████▌ | 5545/10000 [20:12:15<15:53:52, 12.85s/it] 55%|█████▌ | 5546/10000 [20:12:28<15:51:37, 12.82s/it] {'loss': 0.0054, 'learning_rate': 2.233e-05, 'epoch': 2.09} 55%|█████▌ | 5546/10000 [20:12:28<15:51:37, 12.82s/it] 55%|█████▌ | 5547/10000 [20:12:41<15:52:35, 12.84s/it] {'loss': 0.005, 'learning_rate': 2.2325e-05, 'epoch': 2.09} 55%|█████▌ | 5547/10000 [20:12:41<15:52:35, 12.84s/it] 55%|█████▌ | 5548/10000 [20:12:54<15:52:54, 12.84s/it] {'loss': 0.0035, 'learning_rate': 2.2320000000000003e-05, 'epoch': 2.09} 55%|█████▌ | 5548/10000 [20:12:54<15:52:54, 
12.84s/it] 55%|█████▌ | 5549/10000 [20:13:06<15:52:33, 12.84s/it] {'loss': 0.0041, 'learning_rate': 2.2315e-05, 'epoch': 2.09} 55%|█████▌ | 5549/10000 [20:13:07<15:52:33, 12.84s/it] 56%|█████▌ | 5550/10000 [20:13:19<15:51:20, 12.83s/it] {'loss': 0.0051, 'learning_rate': 2.231e-05, 'epoch': 2.09} 56%|█████▌ | 5550/10000 [20:13:19<15:51:20, 12.83s/it] 56%|█████▌ | 5551/10000 [20:13:32<15:51:35, 12.83s/it] {'loss': 0.0044, 'learning_rate': 2.2305e-05, 'epoch': 2.09} 56%|█████▌ | 5551/10000 [20:13:32<15:51:35, 12.83s/it] 56%|█████▌ | 5552/10000 [20:13:45<15:51:26, 12.83s/it] {'loss': 0.0043, 'learning_rate': 2.23e-05, 'epoch': 2.09} 56%|█████▌ | 5552/10000 [20:13:45<15:51:26, 12.83s/it] 56%|█████▌ | 5553/10000 [20:13:58<15:50:50, 12.83s/it] {'loss': 0.0059, 'learning_rate': 2.2295000000000003e-05, 'epoch': 2.09} 56%|█████▌ | 5553/10000 [20:13:58<15:50:50, 12.83s/it] 56%|█████▌ | 5554/10000 [20:14:11<15:49:58, 12.82s/it] {'loss': 0.0053, 'learning_rate': 2.229e-05, 'epoch': 2.09} 56%|█████▌ | 5554/10000 [20:14:11<15:49:58, 12.82s/it] 56%|█████▌ | 5555/10000 [20:14:23<15:48:54, 12.81s/it] {'loss': 0.0046, 'learning_rate': 2.2285e-05, 'epoch': 2.09} 56%|█████▌ | 5555/10000 [20:14:23<15:48:54, 12.81s/it] 56%|█████▌ | 5556/10000 [20:14:36<15:49:03, 12.81s/it] {'loss': 0.0045, 'learning_rate': 2.228e-05, 'epoch': 2.09} 56%|█████▌ | 5556/10000 [20:14:36<15:49:03, 12.81s/it] 56%|█████▌ | 5557/10000 [20:14:49<15:50:10, 12.83s/it] {'loss': 0.0051, 'learning_rate': 2.2275000000000003e-05, 'epoch': 2.09} 56%|█████▌ | 5557/10000 [20:14:49<15:50:10, 12.83s/it] 56%|█████▌ | 5558/10000 [20:15:02<15:51:05, 12.85s/it] {'loss': 0.0051, 'learning_rate': 2.2270000000000002e-05, 'epoch': 2.09} 56%|█████▌ | 5558/10000 [20:15:02<15:51:05, 12.85s/it] 56%|█████▌ | 5559/10000 [20:15:15<15:49:53, 12.83s/it] {'loss': 0.0038, 'learning_rate': 2.2265e-05, 'epoch': 2.09} 56%|█████▌ | 5559/10000 [20:15:15<15:49:53, 12.83s/it] 56%|█████▌ | 5560/10000 [20:15:28<15:49:39, 12.83s/it] {'loss': 0.006, 
'learning_rate': 2.226e-05, 'epoch': 2.09} 56%|█████▌ | 5560/10000 [20:15:28<15:49:39, 12.83s/it] 56%|█████▌ | 5561/10000 [20:15:40<15:48:44, 12.82s/it] {'loss': 0.0052, 'learning_rate': 2.2255e-05, 'epoch': 2.1} 56%|█████▌ | 5561/10000 [20:15:40<15:48:44, 12.82s/it] 56%|█████▌ | 5562/10000 [20:15:53<15:47:45, 12.81s/it] {'loss': 0.0054, 'learning_rate': 2.2250000000000002e-05, 'epoch': 2.1} 56%|█████▌ | 5562/10000 [20:15:53<15:47:45, 12.81s/it] 56%|█████▌ | 5563/10000 [20:16:06<15:47:54, 12.82s/it] {'loss': 0.0051, 'learning_rate': 2.2245e-05, 'epoch': 2.1} 56%|█████▌ | 5563/10000 [20:16:06<15:47:54, 12.82s/it] 56%|█████▌ | 5564/10000 [20:16:19<15:48:13, 12.83s/it] {'loss': 0.0048, 'learning_rate': 2.224e-05, 'epoch': 2.1} 56%|█████▌ | 5564/10000 [20:16:19<15:48:13, 12.83s/it] 56%|█████▌ | 5565/10000 [20:16:32<15:48:19, 12.83s/it] {'loss': 0.0033, 'learning_rate': 2.2235e-05, 'epoch': 2.1} 56%|█████▌ | 5565/10000 [20:16:32<15:48:19, 12.83s/it] 56%|█████▌ | 5566/10000 [20:16:45<15:48:47, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.2230000000000002e-05, 'epoch': 2.1} 56%|█████▌ | 5566/10000 [20:16:45<15:48:47, 12.84s/it] 56%|█████▌ | 5567/10000 [20:16:57<15:48:51, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.2225e-05, 'epoch': 2.1} 56%|█████▌ | 5567/10000 [20:16:57<15:48:51, 12.84s/it] 56%|█████▌ | 5568/10000 [20:17:10<15:50:38, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.222e-05, 'epoch': 2.1} 56%|█████▌ | 5568/10000 [20:17:10<15:50:38, 12.87s/it] 56%|█████▌ | 5569/10000 [20:17:23<15:49:24, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.2215e-05, 'epoch': 2.1} 56%|█████▌ | 5569/10000 [20:17:23<15:49:24, 12.86s/it] 56%|█████▌ | 5570/10000 [20:17:36<15:49:19, 12.86s/it] {'loss': 0.0051, 'learning_rate': 2.221e-05, 'epoch': 2.1} 56%|█████▌ | 5570/10000 [20:17:36<15:49:19, 12.86s/it] 56%|█████▌ | 5571/10000 [20:17:49<15:50:24, 12.88s/it] {'loss': 0.003, 'learning_rate': 2.2205000000000002e-05, 'epoch': 2.1} 56%|█████▌ | 5571/10000 [20:17:49<15:50:24, 12.88s/it] 
56%|█████▌ | 5572/10000 [20:18:02<15:49:33, 12.87s/it] {'loss': 0.0059, 'learning_rate': 2.22e-05, 'epoch': 2.1} 56%|█████▌ | 5572/10000 [20:18:02<15:49:33, 12.87s/it] 56%|█████▌ | 5573/10000 [20:18:15<15:47:33, 12.84s/it] {'loss': 0.0055, 'learning_rate': 2.2195000000000003e-05, 'epoch': 2.1} 56%|█████▌ | 5573/10000 [20:18:15<15:47:33, 12.84s/it] 56%|█████▌ | 5574/10000 [20:18:27<15:47:32, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.219e-05, 'epoch': 2.1} 56%|█████▌ | 5574/10000 [20:18:27<15:47:32, 12.85s/it] 56%|█████▌ | 5575/10000 [20:18:40<15:46:39, 12.84s/it] {'loss': 0.0054, 'learning_rate': 2.2185000000000002e-05, 'epoch': 2.1} 56%|█████▌ | 5575/10000 [20:18:40<15:46:39, 12.84s/it] 56%|█████▌ | 5576/10000 [20:18:53<15:47:13, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.218e-05, 'epoch': 2.1} 56%|█████▌ | 5576/10000 [20:18:53<15:47:13, 12.85s/it] 56%|█████▌ | 5577/10000 [20:19:06<15:46:30, 12.84s/it] {'loss': 0.0058, 'learning_rate': 2.2175e-05, 'epoch': 2.1} 56%|█████▌ | 5577/10000 [20:19:06<15:46:30, 12.84s/it] 56%|█████▌ | 5578/10000 [20:19:19<15:46:12, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.2170000000000003e-05, 'epoch': 2.1} 56%|█████▌ | 5578/10000 [20:19:19<15:46:12, 12.84s/it] 56%|█████▌ | 5579/10000 [20:19:32<15:46:39, 12.85s/it] {'loss': 0.0036, 'learning_rate': 2.2165000000000002e-05, 'epoch': 2.1} 56%|█████▌ | 5579/10000 [20:19:32<15:46:39, 12.85s/it] 56%|█████▌ | 5580/10000 [20:19:44<15:46:21, 12.85s/it] {'loss': 0.0041, 'learning_rate': 2.216e-05, 'epoch': 2.1} 56%|█████▌ | 5580/10000 [20:19:45<15:46:21, 12.85s/it] 56%|█████▌ | 5581/10000 [20:19:57<15:46:31, 12.85s/it] {'loss': 0.0039, 'learning_rate': 2.2155e-05, 'epoch': 2.1} 56%|█████▌ | 5581/10000 [20:19:57<15:46:31, 12.85s/it] 56%|█████▌ | 5582/10000 [20:20:10<15:45:10, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.215e-05, 'epoch': 2.1} 56%|█████▌ | 5582/10000 [20:20:10<15:45:10, 12.84s/it] 56%|█████▌ | 5583/10000 [20:20:23<15:46:04, 12.85s/it] {'loss': 0.0043, 
'learning_rate': 2.2145000000000002e-05, 'epoch': 2.1} 56%|█████▌ | 5583/10000 [20:20:23<15:46:04, 12.85s/it] 56%|█████▌ | 5584/10000 [20:20:36<15:47:28, 12.87s/it] {'loss': 0.0038, 'learning_rate': 2.214e-05, 'epoch': 2.1} 56%|█████▌ | 5584/10000 [20:20:36<15:47:28, 12.87s/it] 56%|█████▌ | 5585/10000 [20:20:49<15:46:45, 12.87s/it] {'loss': 0.0038, 'learning_rate': 2.2135e-05, 'epoch': 2.1} 56%|█████▌ | 5585/10000 [20:20:49<15:46:45, 12.87s/it] 56%|█████▌ | 5586/10000 [20:21:02<15:45:01, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.213e-05, 'epoch': 2.1} 56%|█████▌ | 5586/10000 [20:21:02<15:45:01, 12.85s/it] 56%|█████▌ | 5587/10000 [20:21:15<15:47:14, 12.88s/it] {'loss': 0.0061, 'learning_rate': 2.2125000000000002e-05, 'epoch': 2.11} 56%|█████▌ | 5587/10000 [20:21:15<15:47:14, 12.88s/it] 56%|█████▌ | 5588/10000 [20:21:27<15:45:54, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.212e-05, 'epoch': 2.11} 56%|█████▌ | 5588/10000 [20:21:27<15:45:54, 12.86s/it] 56%|█████▌ | 5589/10000 [20:21:40<15:44:30, 12.85s/it] {'loss': 0.0057, 'learning_rate': 2.2115e-05, 'epoch': 2.11} 56%|█████▌ | 5589/10000 [20:21:40<15:44:30, 12.85s/it] 56%|█████▌ | 5590/10000 [20:21:53<15:43:26, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.211e-05, 'epoch': 2.11} 56%|█████▌ | 5590/10000 [20:21:53<15:43:26, 12.84s/it] 56%|█████▌ | 5591/10000 [20:22:06<15:44:15, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.2105e-05, 'epoch': 2.11} 56%|█████▌ | 5591/10000 [20:22:06<15:44:15, 12.85s/it] 56%|█████▌ | 5592/10000 [20:22:19<15:43:26, 12.84s/it] {'loss': 0.0038, 'learning_rate': 2.2100000000000002e-05, 'epoch': 2.11} 56%|█████▌ | 5592/10000 [20:22:19<15:43:26, 12.84s/it] 56%|█████▌ | 5593/10000 [20:22:32<15:42:53, 12.84s/it] {'loss': 0.0034, 'learning_rate': 2.2095e-05, 'epoch': 2.11} 56%|█████▌ | 5593/10000 [20:22:32<15:42:53, 12.84s/it] 56%|█████▌ | 5594/10000 [20:22:44<15:43:16, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.2090000000000004e-05, 'epoch': 2.11} 56%|█████▌ | 5594/10000 
[20:22:44<15:43:16, 12.85s/it] 56%|█████▌ | 5595/10000 [20:22:57<15:43:22, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.2085e-05, 'epoch': 2.11} 56%|█████▌ | 5595/10000 [20:22:57<15:43:22, 12.85s/it] 56%|█████▌ | 5596/10000 [20:23:10<15:42:44, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.2080000000000002e-05, 'epoch': 2.11} 56%|█████▌ | 5596/10000 [20:23:10<15:42:44, 12.84s/it] 56%|█████▌ | 5597/10000 [20:23:23<15:43:13, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.2075e-05, 'epoch': 2.11} 56%|█████▌ | 5597/10000 [20:23:23<15:43:13, 12.85s/it] 56%|█████▌ | 5598/10000 [20:23:36<15:41:22, 12.83s/it] {'loss': 0.0067, 'learning_rate': 2.207e-05, 'epoch': 2.11} 56%|█████▌ | 5598/10000 [20:23:36<15:41:22, 12.83s/it] 56%|█████▌ | 5599/10000 [20:23:49<15:43:06, 12.86s/it] {'loss': 0.0034, 'learning_rate': 2.2065000000000003e-05, 'epoch': 2.11} 56%|█████▌ | 5599/10000 [20:23:49<15:43:06, 12.86s/it] 56%|█████▌ | 5600/10000 [20:24:02<15:43:02, 12.86s/it] {'loss': 0.0035, 'learning_rate': 2.206e-05, 'epoch': 2.11} 56%|█████▌ | 5600/10000 [20:24:02<15:43:02, 12.86s/it] 56%|█████▌ | 5601/10000 [20:24:14<15:41:20, 12.84s/it] {'loss': 0.0048, 'learning_rate': 2.2055e-05, 'epoch': 2.11} 56%|█████▌ | 5601/10000 [20:24:14<15:41:20, 12.84s/it] 56%|█████▌ | 5602/10000 [20:24:27<15:41:34, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.205e-05, 'epoch': 2.11} 56%|█████▌ | 5602/10000 [20:24:27<15:41:34, 12.85s/it] 56%|█████▌ | 5603/10000 [20:24:40<15:42:28, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.2045000000000003e-05, 'epoch': 2.11} 56%|█████▌ | 5603/10000 [20:24:40<15:42:28, 12.86s/it] 56%|█████▌ | 5604/10000 [20:24:53<15:44:26, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.2040000000000002e-05, 'epoch': 2.11} 56%|█████▌ | 5604/10000 [20:24:53<15:44:26, 12.89s/it] 56%|█████▌ | 5605/10000 [20:25:06<15:42:47, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.2035e-05, 'epoch': 2.11} 56%|█████▌ | 5605/10000 [20:25:06<15:42:47, 12.87s/it] 56%|█████▌ | 5606/10000 [20:25:19<15:42:28, 
12.87s/it] {'loss': 0.0056, 'learning_rate': 2.203e-05, 'epoch': 2.11} 56%|█████▌ | 5606/10000 [20:25:19<15:42:28, 12.87s/it] 56%|█████▌ | 5607/10000 [20:25:32<15:41:41, 12.86s/it] {'loss': 0.0055, 'learning_rate': 2.2025e-05, 'epoch': 2.11} 56%|█████▌ | 5607/10000 [20:25:32<15:41:41, 12.86s/it] 56%|█████▌ | 5608/10000 [20:25:44<15:40:03, 12.84s/it] {'loss': 0.004, 'learning_rate': 2.2020000000000003e-05, 'epoch': 2.11} 56%|█████▌ | 5608/10000 [20:25:44<15:40:03, 12.84s/it] 56%|█████▌ | 5609/10000 [20:25:57<15:38:34, 12.82s/it] {'loss': 0.0051, 'learning_rate': 2.2015000000000002e-05, 'epoch': 2.11} 56%|█████▌ | 5609/10000 [20:25:57<15:38:34, 12.82s/it] 56%|█████▌ | 5610/10000 [20:26:10<15:37:32, 12.81s/it] {'loss': 0.0047, 'learning_rate': 2.201e-05, 'epoch': 2.11} 56%|█████▌ | 5610/10000 [20:26:10<15:37:32, 12.81s/it] 56%|█████▌ | 5611/10000 [20:26:23<15:38:04, 12.82s/it] {'loss': 0.0041, 'learning_rate': 2.2005e-05, 'epoch': 2.11} 56%|█████▌ | 5611/10000 [20:26:23<15:38:04, 12.82s/it] 56%|█████▌ | 5612/10000 [20:26:36<15:39:26, 12.85s/it] {'loss': 0.0041, 'learning_rate': 2.2000000000000003e-05, 'epoch': 2.11} 56%|█████▌ | 5612/10000 [20:26:36<15:39:26, 12.85s/it] 56%|█████▌ | 5613/10000 [20:26:49<15:39:22, 12.85s/it] {'loss': 0.0054, 'learning_rate': 2.1995000000000002e-05, 'epoch': 2.11} 56%|█████▌ | 5613/10000 [20:26:49<15:39:22, 12.85s/it] 56%|█████▌ | 5614/10000 [20:27:01<15:40:23, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.199e-05, 'epoch': 2.12} 56%|█████▌ | 5614/10000 [20:27:01<15:40:23, 12.86s/it] 56%|█████▌ | 5615/10000 [20:27:14<15:38:41, 12.84s/it] {'loss': 0.0054, 'learning_rate': 2.1985e-05, 'epoch': 2.12} 56%|█████▌ | 5615/10000 [20:27:14<15:38:41, 12.84s/it] 56%|█████▌ | 5616/10000 [20:27:27<15:38:17, 12.84s/it] {'loss': 0.0039, 'learning_rate': 2.198e-05, 'epoch': 2.12} 56%|█████▌ | 5616/10000 [20:27:27<15:38:17, 12.84s/it] 56%|█████▌ | 5617/10000 [20:27:40<15:40:11, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.1975000000000002e-05, 
'epoch': 2.12} 56%|█████▌ | 5617/10000 [20:27:40<15:40:11, 12.87s/it] 56%|█████▌ | 5618/10000 [20:27:53<15:40:02, 12.87s/it] {'loss': 0.0037, 'learning_rate': 2.197e-05, 'epoch': 2.12} 56%|█████▌ | 5618/10000 [20:27:53<15:40:02, 12.87s/it] 56%|█████▌ | 5619/10000 [20:28:06<15:40:06, 12.88s/it] {'loss': 0.0033, 'learning_rate': 2.1965e-05, 'epoch': 2.12} 56%|█████▌ | 5619/10000 [20:28:06<15:40:06, 12.88s/it] 56%|█████▌ | 5620/10000 [20:28:19<15:39:24, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.196e-05, 'epoch': 2.12} 56%|█████▌ | 5620/10000 [20:28:19<15:39:24, 12.87s/it] 56%|█████▌ | 5621/10000 [20:28:31<15:37:13, 12.84s/it] {'loss': 0.0041, 'learning_rate': 2.1955e-05, 'epoch': 2.12} 56%|█████▌ | 5621/10000 [20:28:31<15:37:13, 12.84s/it] 56%|█████▌ | 5622/10000 [20:28:44<15:35:50, 12.83s/it] {'loss': 0.0051, 'learning_rate': 2.195e-05, 'epoch': 2.12} 56%|█████▌ | 5622/10000 [20:28:44<15:35:50, 12.83s/it] 56%|█████▌ | 5623/10000 [20:28:57<15:35:24, 12.82s/it] {'loss': 0.0046, 'learning_rate': 2.1945e-05, 'epoch': 2.12} 56%|█████▌ | 5623/10000 [20:28:57<15:35:24, 12.82s/it] 56%|█████▌ | 5624/10000 [20:29:10<15:37:07, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.1940000000000003e-05, 'epoch': 2.12} 56%|█████▌ | 5624/10000 [20:29:10<15:37:07, 12.85s/it] 56%|█████▋ | 5625/10000 [20:29:23<15:37:47, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.1935e-05, 'epoch': 2.12} 56%|█████▋ | 5625/10000 [20:29:23<15:37:47, 12.86s/it] 56%|█████▋ | 5626/10000 [20:29:36<15:38:47, 12.88s/it] {'loss': 0.0041, 'learning_rate': 2.1930000000000002e-05, 'epoch': 2.12} 56%|█████▋ | 5626/10000 [20:29:36<15:38:47, 12.88s/it] 56%|█████▋ | 5627/10000 [20:29:49<15:38:19, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.1925e-05, 'epoch': 2.12} 56%|█████▋ | 5627/10000 [20:29:49<15:38:19, 12.87s/it] 56%|█████▋ | 5628/10000 [20:30:01<15:36:52, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.192e-05, 'epoch': 2.12} 56%|█████▋ | 5628/10000 [20:30:01<15:36:52, 12.86s/it] 56%|█████▋ | 5629/10000 
[20:30:14<15:35:55, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.1915000000000003e-05, 'epoch': 2.12} 56%|█████▋ | 5629/10000 [20:30:14<15:35:55, 12.85s/it] 56%|█████▋ | 5630/10000 [20:30:27<15:37:17, 12.87s/it] {'loss': 0.0035, 'learning_rate': 2.191e-05, 'epoch': 2.12} 56%|█████▋ | 5630/10000 [20:30:27<15:37:17, 12.87s/it] 56%|█████▋ | 5631/10000 [20:30:40<15:39:04, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.1905e-05, 'epoch': 2.12} 56%|█████▋ | 5631/10000 [20:30:40<15:39:04, 12.90s/it] 56%|█████▋ | 5632/10000 [20:30:53<15:38:33, 12.89s/it] {'loss': 0.0048, 'learning_rate': 2.19e-05, 'epoch': 2.12} 56%|█████▋ | 5632/10000 [20:30:53<15:38:33, 12.89s/it] 56%|█████▋ | 5633/10000 [20:31:06<15:39:35, 12.91s/it] {'loss': 0.0046, 'learning_rate': 2.1895000000000003e-05, 'epoch': 2.12} 56%|█████▋ | 5633/10000 [20:31:06<15:39:35, 12.91s/it] 56%|█████▋ | 5634/10000 [20:31:19<15:42:34, 12.95s/it] {'loss': 0.0045, 'learning_rate': 2.1890000000000002e-05, 'epoch': 2.12} 56%|█████▋ | 5634/10000 [20:31:19<15:42:34, 12.95s/it] 56%|█████▋ | 5635/10000 [20:31:32<15:40:16, 12.92s/it] {'loss': 0.004, 'learning_rate': 2.1885e-05, 'epoch': 2.12} 56%|█████▋ | 5635/10000 [20:31:32<15:40:16, 12.92s/it] 56%|█████▋ | 5636/10000 [20:31:45<15:41:18, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.188e-05, 'epoch': 2.12} 56%|█████▋ | 5636/10000 [20:31:45<15:41:18, 12.94s/it] 56%|█████▋ | 5637/10000 [20:31:58<15:39:57, 12.93s/it] {'loss': 0.0041, 'learning_rate': 2.1875e-05, 'epoch': 2.12} 56%|█████▋ | 5637/10000 [20:31:58<15:39:57, 12.93s/it] 56%|█████▋ | 5638/10000 [20:32:11<15:38:03, 12.90s/it] {'loss': 0.0054, 'learning_rate': 2.1870000000000002e-05, 'epoch': 2.12} 56%|█████▋ | 5638/10000 [20:32:11<15:38:03, 12.90s/it] 56%|█████▋ | 5639/10000 [20:32:23<15:37:35, 12.90s/it] {'loss': 0.004, 'learning_rate': 2.1865e-05, 'epoch': 2.12} 56%|█████▋ | 5639/10000 [20:32:23<15:37:35, 12.90s/it] 56%|█████▋ | 5640/10000 [20:32:36<15:36:47, 12.89s/it] {'loss': 0.0048, 'learning_rate': 2.186e-05, 
'epoch': 2.13} 56%|█████▋ | 5640/10000 [20:32:36<15:36:47, 12.89s/it] 56%|█████▋ | 5641/10000 [20:32:49<15:35:22, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.1855e-05, 'epoch': 2.13} 56%|█████▋ | 5641/10000 [20:32:49<15:35:22, 12.88s/it] 56%|█████▋ | 5642/10000 [20:33:02<15:36:18, 12.89s/it] {'loss': 0.0038, 'learning_rate': 2.1850000000000003e-05, 'epoch': 2.13} 56%|█████▋ | 5642/10000 [20:33:02<15:36:18, 12.89s/it] 56%|█████▋ | 5643/10000 [20:33:15<15:37:23, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.1845000000000002e-05, 'epoch': 2.13} 56%|█████▋ | 5643/10000 [20:33:15<15:37:23, 12.91s/it] 56%|█████▋ | 5644/10000 [20:33:28<15:35:15, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.184e-05, 'epoch': 2.13} 56%|█████▋ | 5644/10000 [20:33:28<15:35:15, 12.88s/it] 56%|█████▋ | 5645/10000 [20:33:41<15:32:45, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.1835e-05, 'epoch': 2.13} 56%|█████▋ | 5645/10000 [20:33:41<15:32:45, 12.85s/it] 56%|█████▋ | 5646/10000 [20:33:53<15:32:07, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.183e-05, 'epoch': 2.13} 56%|█████▋ | 5646/10000 [20:33:54<15:32:07, 12.85s/it] 56%|█████▋ | 5647/10000 [20:34:06<15:33:46, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.1825000000000002e-05, 'epoch': 2.13} 56%|█████▋ | 5647/10000 [20:34:06<15:33:46, 12.87s/it] 56%|█████▋ | 5648/10000 [20:34:19<15:31:38, 12.84s/it] {'loss': 0.0053, 'learning_rate': 2.182e-05, 'epoch': 2.13} 56%|█████▋ | 5648/10000 [20:34:19<15:31:38, 12.84s/it] 56%|█████▋ | 5649/10000 [20:34:32<15:32:42, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.1815000000000004e-05, 'epoch': 2.13} 56%|█████▋ | 5649/10000 [20:34:32<15:32:42, 12.86s/it] 56%|█████▋ | 5650/10000 [20:34:45<15:31:44, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.181e-05, 'epoch': 2.13} 56%|█████▋ | 5650/10000 [20:34:45<15:31:44, 12.85s/it] 57%|█████▋ | 5651/10000 [20:34:58<15:31:45, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.1805e-05, 'epoch': 2.13} 57%|█████▋ | 5651/10000 [20:34:58<15:31:45, 12.85s/it] 
57%|█████▋ | 5652/10000 [20:35:11<15:31:52, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.18e-05, 'epoch': 2.13} 57%|█████▋ | 5652/10000 [20:35:11<15:31:52, 12.86s/it] 57%|█████▋ | 5653/10000 [20:35:23<15:30:41, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.1795e-05, 'epoch': 2.13} 57%|█████▋ | 5653/10000 [20:35:23<15:30:41, 12.85s/it] 57%|█████▋ | 5654/10000 [20:35:36<15:32:34, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.1790000000000003e-05, 'epoch': 2.13} 57%|█████▋ | 5654/10000 [20:35:36<15:32:34, 12.87s/it] 57%|█████▋ | 5655/10000 [20:35:49<15:32:32, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.1785e-05, 'epoch': 2.13} 57%|█████▋ | 5655/10000 [20:35:49<15:32:32, 12.88s/it] 57%|█████▋ | 5656/10000 [20:36:02<15:30:57, 12.86s/it] {'loss': 0.0067, 'learning_rate': 2.178e-05, 'epoch': 2.13} 57%|█████▋ | 5656/10000 [20:36:02<15:30:57, 12.86s/it] 57%|█████▋ | 5657/10000 [20:36:15<15:31:02, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.1775e-05, 'epoch': 2.13} 57%|█████▋ | 5657/10000 [20:36:15<15:31:02, 12.86s/it] 57%|█████▋ | 5658/10000 [20:36:28<15:30:36, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.177e-05, 'epoch': 2.13} 57%|█████▋ | 5658/10000 [20:36:28<15:30:36, 12.86s/it] 57%|█████▋ | 5659/10000 [20:36:41<15:29:49, 12.85s/it] {'loss': 0.004, 'learning_rate': 2.1765000000000003e-05, 'epoch': 2.13} 57%|█████▋ | 5659/10000 [20:36:41<15:29:49, 12.85s/it] 57%|█████▋ | 5660/10000 [20:36:54<15:30:43, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.176e-05, 'epoch': 2.13} 57%|█████▋ | 5660/10000 [20:36:54<15:30:43, 12.87s/it] 57%|█████▋ | 5661/10000 [20:37:06<15:30:36, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.1755e-05, 'epoch': 2.13} 57%|█████▋ | 5661/10000 [20:37:06<15:30:36, 12.87s/it] 57%|█████▋ | 5662/10000 [20:37:19<15:30:51, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.175e-05, 'epoch': 2.13} 57%|█████▋ | 5662/10000 [20:37:19<15:30:51, 12.87s/it] 57%|█████▋ | 5663/10000 [20:37:32<15:30:21, 12.87s/it] {'loss': 0.0048, 'learning_rate': 
2.1745000000000003e-05, 'epoch': 2.13} 57%|█████▋ | 5663/10000 [20:37:32<15:30:21, 12.87s/it] 57%|█████▋ | 5664/10000 [20:37:45<15:29:38, 12.86s/it] {'loss': 0.0055, 'learning_rate': 2.1740000000000002e-05, 'epoch': 2.13} 57%|█████▋ | 5664/10000 [20:37:45<15:29:38, 12.86s/it] 57%|█████▋ | 5665/10000 [20:37:58<15:28:51, 12.86s/it] {'loss': 0.0065, 'learning_rate': 2.1735e-05, 'epoch': 2.13} 57%|█████▋ | 5665/10000 [20:37:58<15:28:51, 12.86s/it] 57%|█████▋ | 5666/10000 [20:38:11<15:28:57, 12.86s/it] {'loss': 0.0058, 'learning_rate': 2.173e-05, 'epoch': 2.13} 57%|█████▋ | 5666/10000 [20:38:11<15:28:57, 12.86s/it] 57%|█████▋ | 5667/10000 [20:38:24<15:28:05, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.1725e-05, 'epoch': 2.14} 57%|█████▋ | 5667/10000 [20:38:24<15:28:05, 12.85s/it] 57%|█████▋ | 5668/10000 [20:38:36<15:29:18, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.1720000000000002e-05, 'epoch': 2.14} 57%|█████▋ | 5668/10000 [20:38:37<15:29:18, 12.87s/it] 57%|█████▋ | 5669/10000 [20:38:49<15:29:27, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.1715e-05, 'epoch': 2.14} 57%|█████▋ | 5669/10000 [20:38:49<15:29:27, 12.88s/it] 57%|█████▋ | 5670/10000 [20:39:02<15:28:19, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.171e-05, 'epoch': 2.14} 57%|█████▋ | 5670/10000 [20:39:02<15:28:19, 12.86s/it] 57%|█████▋ | 5671/10000 [20:39:15<15:28:46, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.1705e-05, 'epoch': 2.14} 57%|█████▋ | 5671/10000 [20:39:15<15:28:46, 12.87s/it] 57%|█████▋ | 5672/10000 [20:39:28<15:29:23, 12.88s/it] {'loss': 0.0045, 'learning_rate': 2.1700000000000002e-05, 'epoch': 2.14} 57%|█████▋ | 5672/10000 [20:39:28<15:29:23, 12.88s/it] 57%|█████▋ | 5673/10000 [20:39:41<15:29:11, 12.88s/it] {'loss': 0.0055, 'learning_rate': 2.1695e-05, 'epoch': 2.14} 57%|█████▋ | 5673/10000 [20:39:41<15:29:11, 12.88s/it] 57%|█████▋ | 5674/10000 [20:39:54<15:28:06, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.169e-05, 'epoch': 2.14} 57%|█████▋ | 5674/10000 [20:39:54<15:28:06, 
12.87s/it] 57%|█████▋ | 5675/10000 [20:40:07<15:27:04, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.1685e-05, 'epoch': 2.14} 57%|█████▋ | 5675/10000 [20:40:07<15:27:04, 12.86s/it] 57%|█████▋ | 5676/10000 [20:40:19<15:27:53, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.168e-05, 'epoch': 2.14} 57%|█████▋ | 5676/10000 [20:40:20<15:27:53, 12.88s/it] 57%|█████▋ | 5677/10000 [20:40:32<15:27:01, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.1675e-05, 'epoch': 2.14} 57%|█████▋ | 5677/10000 [20:40:32<15:27:01, 12.87s/it] 57%|█████▋ | 5678/10000 [20:40:45<15:24:24, 12.83s/it] {'loss': 0.0062, 'learning_rate': 2.167e-05, 'epoch': 2.14} 57%|█████▋ | 5678/10000 [20:40:45<15:24:24, 12.83s/it] 57%|█████▋ | 5679/10000 [20:40:58<15:23:14, 12.82s/it] {'loss': 0.0054, 'learning_rate': 2.1665000000000003e-05, 'epoch': 2.14} 57%|█████▋ | 5679/10000 [20:40:58<15:23:14, 12.82s/it] 57%|█████▋ | 5680/10000 [20:41:11<15:26:06, 12.86s/it] {'loss': 0.0037, 'learning_rate': 2.166e-05, 'epoch': 2.14} 57%|█████▋ | 5680/10000 [20:41:11<15:26:06, 12.86s/it] 57%|█████▋ | 5681/10000 [20:41:24<15:26:36, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.1655000000000002e-05, 'epoch': 2.14} 57%|█████▋ | 5681/10000 [20:41:24<15:26:36, 12.87s/it] 57%|█████▋ | 5682/10000 [20:41:37<15:26:47, 12.88s/it] {'loss': 0.0036, 'learning_rate': 2.165e-05, 'epoch': 2.14} 57%|█████▋ | 5682/10000 [20:41:37<15:26:47, 12.88s/it] 57%|█████▋ | 5683/10000 [20:41:49<15:25:26, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.1645e-05, 'epoch': 2.14} 57%|█████▋ | 5683/10000 [20:41:49<15:25:26, 12.86s/it] 57%|█████▋ | 5684/10000 [20:42:02<15:25:14, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.1640000000000003e-05, 'epoch': 2.14} 57%|█████▋ | 5684/10000 [20:42:02<15:25:14, 12.86s/it] 57%|█████▋ | 5685/10000 [20:42:15<15:26:57, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.1635e-05, 'epoch': 2.14} 57%|█████▋ | 5685/10000 [20:42:15<15:26:57, 12.89s/it] 57%|█████▋ | 5686/10000 [20:42:28<15:25:26, 12.87s/it] {'loss': 0.0053, 
'learning_rate': 2.163e-05, 'epoch': 2.14} 57%|█████▋ | 5686/10000 [20:42:28<15:25:26, 12.87s/it] 57%|█████▋ | 5687/10000 [20:42:41<15:24:29, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.1625e-05, 'epoch': 2.14} 57%|█████▋ | 5687/10000 [20:42:41<15:24:29, 12.86s/it] 57%|█████▋ | 5688/10000 [20:42:54<15:24:35, 12.87s/it] {'loss': 0.0053, 'learning_rate': 2.162e-05, 'epoch': 2.14} 57%|█████▋ | 5688/10000 [20:42:54<15:24:35, 12.87s/it] 57%|█████▋ | 5689/10000 [20:43:07<15:23:49, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.1615000000000002e-05, 'epoch': 2.14} 57%|█████▋ | 5689/10000 [20:43:07<15:23:49, 12.86s/it] 57%|█████▋ | 5690/10000 [20:43:20<15:24:28, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.1609999999999998e-05, 'epoch': 2.14} 57%|█████▋ | 5690/10000 [20:43:20<15:24:28, 12.87s/it] 57%|█████▋ | 5691/10000 [20:43:32<15:25:47, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.1605e-05, 'epoch': 2.14} 57%|█████▋ | 5691/10000 [20:43:33<15:25:47, 12.89s/it] 57%|█████▋ | 5692/10000 [20:43:45<15:24:53, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.16e-05, 'epoch': 2.14} 57%|█████▋ | 5692/10000 [20:43:45<15:24:53, 12.88s/it] 57%|█████▋ | 5693/10000 [20:43:58<15:23:24, 12.86s/it] {'loss': 0.0033, 'learning_rate': 2.1595000000000002e-05, 'epoch': 2.15} 57%|█████▋ | 5693/10000 [20:43:58<15:23:24, 12.86s/it] 57%|█████▋ | 5694/10000 [20:44:11<15:22:53, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.159e-05, 'epoch': 2.15} 57%|█████▋ | 5694/10000 [20:44:11<15:22:53, 12.86s/it] 57%|█████▋ | 5695/10000 [20:44:24<15:24:05, 12.88s/it] {'loss': 0.005, 'learning_rate': 2.1585e-05, 'epoch': 2.15} 57%|█████▋ | 5695/10000 [20:44:24<15:24:05, 12.88s/it] 57%|█████▋ | 5696/10000 [20:44:37<15:23:42, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.158e-05, 'epoch': 2.15} 57%|█████▋ | 5696/10000 [20:44:37<15:23:42, 12.88s/it] 57%|█████▋ | 5697/10000 [20:44:50<15:23:17, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.1575e-05, 'epoch': 2.15} 57%|█████▋ | 5697/10000 [20:44:50<15:23:17, 
12.87s/it] 57%|█████▋ | 5698/10000 [20:45:03<15:22:10, 12.86s/it] {'loss': 0.0052, 'learning_rate': 2.1570000000000002e-05, 'epoch': 2.15} 57%|█████▋ | 5698/10000 [20:45:03<15:22:10, 12.86s/it] 57%|█████▋ | 5699/10000 [20:45:15<15:19:33, 12.83s/it] {'loss': 0.0056, 'learning_rate': 2.1565e-05, 'epoch': 2.15} 57%|█████▋ | 5699/10000 [20:45:15<15:19:33, 12.83s/it] 57%|█████▋ | 5700/10000 [20:45:28<15:19:02, 12.82s/it] {'loss': 0.0038, 'learning_rate': 2.1560000000000004e-05, 'epoch': 2.15} 57%|█████▋ | 5700/10000 [20:45:28<15:19:02, 12.82s/it] 57%|█████▋ | 5701/10000 [20:45:41<15:18:55, 12.83s/it] {'loss': 0.0034, 'learning_rate': 2.1555e-05, 'epoch': 2.15} 57%|█████▋ | 5701/10000 [20:45:41<15:18:55, 12.83s/it] 57%|█████▋ | 5702/10000 [20:45:54<15:19:46, 12.84s/it] {'loss': 0.0041, 'learning_rate': 2.1550000000000002e-05, 'epoch': 2.15} 57%|█████▋ | 5702/10000 [20:45:54<15:19:46, 12.84s/it] 57%|█████▋ | 5703/10000 [20:46:07<15:19:52, 12.84s/it] {'loss': 0.0049, 'learning_rate': 2.1545e-05, 'epoch': 2.15} 57%|█████▋ | 5703/10000 [20:46:07<15:19:52, 12.84s/it] 57%|█████▋ | 5704/10000 [20:46:19<15:18:44, 12.83s/it] {'loss': 0.0041, 'learning_rate': 2.154e-05, 'epoch': 2.15} 57%|█████▋ | 5704/10000 [20:46:19<15:18:44, 12.83s/it] 57%|█████▋ | 5705/10000 [20:46:32<15:19:15, 12.84s/it] {'loss': 0.0038, 'learning_rate': 2.1535000000000003e-05, 'epoch': 2.15} 57%|█████▋ | 5705/10000 [20:46:32<15:19:15, 12.84s/it] 57%|█████▋ | 5706/10000 [20:46:45<15:19:19, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.153e-05, 'epoch': 2.15} 57%|█████▋ | 5706/10000 [20:46:45<15:19:19, 12.85s/it] 57%|█████▋ | 5707/10000 [20:46:58<15:20:01, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.1525e-05, 'epoch': 2.15} 57%|█████▋ | 5707/10000 [20:46:58<15:20:01, 12.86s/it] 57%|█████▋ | 5708/10000 [20:47:11<15:19:22, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.152e-05, 'epoch': 2.15} 57%|█████▋ | 5708/10000 [20:47:11<15:19:22, 12.85s/it] 57%|█████▋ | 5709/10000 [20:47:24<15:19:34, 12.86s/it] 
{'loss': 0.0044, 'learning_rate': 2.1515000000000003e-05, 'epoch': 2.15} 57%|█████▋ | 5709/10000 [20:47:24<15:19:34, 12.86s/it] 57%|█████▋ | 5710/10000 [20:47:37<15:18:01, 12.84s/it] {'loss': 0.006, 'learning_rate': 2.1510000000000002e-05, 'epoch': 2.15} 57%|█████▋ | 5710/10000 [20:47:37<15:18:01, 12.84s/it] 57%|█████▋ | 5711/10000 [20:47:49<15:17:36, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.1505e-05, 'epoch': 2.15} 57%|█████▋ | 5711/10000 [20:47:49<15:17:36, 12.84s/it] 57%|█████▋ | 5712/10000 [20:48:02<15:17:56, 12.84s/it] {'loss': 0.0039, 'learning_rate': 2.15e-05, 'epoch': 2.15} 57%|█████▋ | 5712/10000 [20:48:02<15:17:56, 12.84s/it] 57%|█████▋ | 5713/10000 [20:48:15<15:20:24, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.1495e-05, 'epoch': 2.15} 57%|█████▋ | 5713/10000 [20:48:15<15:20:24, 12.88s/it] 57%|█████▋ | 5714/10000 [20:48:28<15:20:18, 12.88s/it] {'loss': 0.0055, 'learning_rate': 2.1490000000000003e-05, 'epoch': 2.15} 57%|█████▋ | 5714/10000 [20:48:28<15:20:18, 12.88s/it] 57%|█████▋ | 5715/10000 [20:48:41<15:18:53, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.1485000000000002e-05, 'epoch': 2.15} 57%|█████▋ | 5715/10000 [20:48:41<15:18:53, 12.87s/it] 57%|█████▋ | 5716/10000 [20:48:54<15:18:20, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.148e-05, 'epoch': 2.15} 57%|█████▋ | 5716/10000 [20:48:54<15:18:20, 12.86s/it] 57%|█████▋ | 5717/10000 [20:49:07<15:17:58, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.1475e-05, 'epoch': 2.15} 57%|█████▋ | 5717/10000 [20:49:07<15:17:58, 12.86s/it] 57%|█████▋ | 5718/10000 [20:49:19<15:17:26, 12.86s/it] {'loss': 0.0045, 'learning_rate': 2.1470000000000003e-05, 'epoch': 2.15} 57%|█████▋ | 5718/10000 [20:49:20<15:17:26, 12.86s/it] 57%|█████▋ | 5719/10000 [20:49:32<15:17:39, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.1465000000000002e-05, 'epoch': 2.15} 57%|█████▋ | 5719/10000 [20:49:32<15:17:39, 12.86s/it] 57%|█████▋ | 5720/10000 [20:49:45<15:16:36, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.146e-05, 
'epoch': 2.16} 57%|█████▋ | 5720/10000 [20:49:45<15:16:36, 12.85s/it] 57%|█████▋ | 5721/10000 [20:49:58<15:16:38, 12.85s/it] {'loss': 0.0041, 'learning_rate': 2.1455e-05, 'epoch': 2.16} 57%|█████▋ | 5721/10000 [20:49:58<15:16:38, 12.85s/it] 57%|█████▋ | 5722/10000 [20:50:11<15:15:39, 12.84s/it] {'loss': 0.0048, 'learning_rate': 2.145e-05, 'epoch': 2.16} 57%|█████▋ | 5722/10000 [20:50:11<15:15:39, 12.84s/it] 57%|█████▋ | 5723/10000 [20:50:24<15:15:38, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.1445000000000002e-05, 'epoch': 2.16} 57%|█████▋ | 5723/10000 [20:50:24<15:15:38, 12.85s/it] 57%|█████▋ | 5724/10000 [20:50:37<15:14:09, 12.83s/it] {'loss': 0.005, 'learning_rate': 2.144e-05, 'epoch': 2.16} 57%|█████▋ | 5724/10000 [20:50:37<15:14:09, 12.83s/it] 57%|█████▋ | 5725/10000 [20:50:49<15:13:03, 12.81s/it] {'loss': 0.0044, 'learning_rate': 2.1435000000000004e-05, 'epoch': 2.16} 57%|█████▋ | 5725/10000 [20:50:49<15:13:03, 12.81s/it] 57%|█████▋ | 5726/10000 [20:51:02<15:14:47, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.143e-05, 'epoch': 2.16} 57%|█████▋ | 5726/10000 [20:51:02<15:14:47, 12.84s/it] 57%|█████▋ | 5727/10000 [20:51:15<15:14:19, 12.84s/it] {'loss': 0.0051, 'learning_rate': 2.1425e-05, 'epoch': 2.16} 57%|█████▋ | 5727/10000 [20:51:15<15:14:19, 12.84s/it] 57%|█████▋ | 5728/10000 [20:51:28<15:14:44, 12.85s/it] {'loss': 0.0038, 'learning_rate': 2.142e-05, 'epoch': 2.16} 57%|█████▋ | 5728/10000 [20:51:28<15:14:44, 12.85s/it] 57%|█████▋ | 5729/10000 [20:51:41<15:14:31, 12.85s/it] {'loss': 0.0057, 'learning_rate': 2.1415e-05, 'epoch': 2.16} 57%|█████▋ | 5729/10000 [20:51:41<15:14:31, 12.85s/it] 57%|█████▋ | 5730/10000 [20:51:54<15:14:19, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.1410000000000003e-05, 'epoch': 2.16} 57%|█████▋ | 5730/10000 [20:51:54<15:14:19, 12.85s/it] 57%|█████▋ | 5731/10000 [20:52:07<15:15:35, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.1405e-05, 'epoch': 2.16} 57%|█████▋ | 5731/10000 [20:52:07<15:15:35, 12.87s/it] 57%|█████▋ | 
5732/10000 [20:52:19<15:14:28, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.1400000000000002e-05, 'epoch': 2.16} 57%|█████▋ | 5732/10000 [20:52:19<15:14:28, 12.86s/it] 57%|█████▋ | 5733/10000 [20:52:32<15:13:54, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.1395e-05, 'epoch': 2.16} 57%|█████▋ | 5733/10000 [20:52:32<15:13:54, 12.85s/it] 57%|█████▋ | 5734/10000 [20:52:45<15:13:45, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.139e-05, 'epoch': 2.16} 57%|█████▋ | 5734/10000 [20:52:45<15:13:45, 12.85s/it] 57%|█████▋ | 5735/10000 [20:52:58<15:12:56, 12.84s/it] {'loss': 0.0048, 'learning_rate': 2.1385000000000003e-05, 'epoch': 2.16} 57%|█████▋ | 5735/10000 [20:52:58<15:12:56, 12.84s/it] 57%|█████▋ | 5736/10000 [20:53:11<15:11:40, 12.83s/it] {'loss': 0.0039, 'learning_rate': 2.138e-05, 'epoch': 2.16} 57%|█████▋ | 5736/10000 [20:53:11<15:11:40, 12.83s/it] 57%|█████▋ | 5737/10000 [20:53:23<15:11:54, 12.83s/it] {'loss': 0.0041, 'learning_rate': 2.1375e-05, 'epoch': 2.16} 57%|█████▋ | 5737/10000 [20:53:24<15:11:54, 12.83s/it] 57%|█████▋ | 5738/10000 [20:53:36<15:11:17, 12.83s/it] {'loss': 0.0041, 'learning_rate': 2.137e-05, 'epoch': 2.16} 57%|█████▋ | 5738/10000 [20:53:36<15:11:17, 12.83s/it] 57%|█████▋ | 5739/10000 [20:53:49<15:13:21, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.1365000000000003e-05, 'epoch': 2.16} 57%|█████▋ | 5739/10000 [20:53:49<15:13:21, 12.86s/it] 57%|█████▋ | 5740/10000 [20:54:02<15:12:55, 12.86s/it] {'loss': 0.0056, 'learning_rate': 2.1360000000000002e-05, 'epoch': 2.16} 57%|█████▋ | 5740/10000 [20:54:02<15:12:55, 12.86s/it] 57%|█████▋ | 5741/10000 [20:54:15<15:13:27, 12.87s/it] {'loss': 0.0034, 'learning_rate': 2.1355e-05, 'epoch': 2.16} 57%|█████▋ | 5741/10000 [20:54:15<15:13:27, 12.87s/it] 57%|█████▋ | 5742/10000 [20:54:28<15:12:10, 12.85s/it] {'loss': 0.0038, 'learning_rate': 2.135e-05, 'epoch': 2.16} 57%|█████▋ | 5742/10000 [20:54:28<15:12:10, 12.85s/it] 57%|█████▋ | 5743/10000 [20:54:41<15:11:17, 12.84s/it] {'loss': 0.0045, 'learning_rate': 
2.1345e-05, 'epoch': 2.16} 57%|█████▋ | 5743/10000 [20:54:41<15:11:17, 12.84s/it] 57%|█████▋ | 5744/10000 [20:54:54<15:11:57, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.1340000000000002e-05, 'epoch': 2.16} 57%|█████▋ | 5744/10000 [20:54:54<15:11:57, 12.86s/it] 57%|█████▋ | 5745/10000 [20:55:06<15:11:30, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.1335e-05, 'epoch': 2.16} 57%|█████▋ | 5745/10000 [20:55:06<15:11:30, 12.85s/it] 57%|█████▋ | 5746/10000 [20:55:19<15:10:48, 12.85s/it] {'loss': 0.0046, 'learning_rate': 2.133e-05, 'epoch': 2.17} 57%|█████▋ | 5746/10000 [20:55:19<15:10:48, 12.85s/it] 57%|█████▋ | 5747/10000 [20:55:32<15:08:02, 12.81s/it] {'loss': 0.0061, 'learning_rate': 2.1325e-05, 'epoch': 2.17} 57%|█████▋ | 5747/10000 [20:55:32<15:08:02, 12.81s/it] 57%|█████▋ | 5748/10000 [20:55:45<15:07:13, 12.80s/it] {'loss': 0.0049, 'learning_rate': 2.1320000000000003e-05, 'epoch': 2.17} 57%|█████▋ | 5748/10000 [20:55:45<15:07:13, 12.80s/it] 57%|█████▋ | 5749/10000 [20:55:58<15:08:07, 12.82s/it] {'loss': 0.0046, 'learning_rate': 2.1315000000000002e-05, 'epoch': 2.17} 57%|█████▋ | 5749/10000 [20:55:58<15:08:07, 12.82s/it] 57%|█████▊ | 5750/10000 [20:56:10<15:08:52, 12.83s/it] {'loss': 0.0049, 'learning_rate': 2.131e-05, 'epoch': 2.17} 57%|█████▊ | 5750/10000 [20:56:10<15:08:52, 12.83s/it] 58%|█████▊ | 5751/10000 [20:56:23<15:08:31, 12.83s/it] {'loss': 0.0053, 'learning_rate': 2.1305e-05, 'epoch': 2.17} 58%|█████▊ | 5751/10000 [20:56:23<15:08:31, 12.83s/it] 58%|█████▊ | 5752/10000 [20:56:36<15:08:46, 12.84s/it] {'loss': 0.0046, 'learning_rate': 2.13e-05, 'epoch': 2.17} 58%|█████▊ | 5752/10000 [20:56:36<15:08:46, 12.84s/it] 58%|█████▊ | 5753/10000 [20:56:49<15:07:34, 12.82s/it] {'loss': 0.0047, 'learning_rate': 2.1295000000000002e-05, 'epoch': 2.17} 58%|█████▊ | 5753/10000 [20:56:49<15:07:34, 12.82s/it] 58%|█████▊ | 5754/10000 [20:57:02<15:08:32, 12.84s/it] {'loss': 0.0048, 'learning_rate': 2.129e-05, 'epoch': 2.17} 58%|█████▊ | 5754/10000 [20:57:02<15:08:32, 
12.84s/it] 58%|█████▊ | 5755/10000 [20:57:15<15:09:26, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.1285000000000004e-05, 'epoch': 2.17} 58%|█████▊ | 5755/10000 [20:57:15<15:09:26, 12.85s/it] 58%|█████▊ | 5756/10000 [20:57:28<15:09:11, 12.85s/it] {'loss': 0.0057, 'learning_rate': 2.128e-05, 'epoch': 2.17} 58%|█████▊ | 5756/10000 [20:57:28<15:09:11, 12.85s/it] 58%|█████▊ | 5757/10000 [20:57:40<15:08:29, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.1275000000000002e-05, 'epoch': 2.17} 58%|█████▊ | 5757/10000 [20:57:40<15:08:29, 12.85s/it] 58%|█████▊ | 5758/10000 [20:57:53<15:09:03, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.127e-05, 'epoch': 2.17} 58%|█████▊ | 5758/10000 [20:57:53<15:09:03, 12.86s/it] 58%|█████▊ | 5759/10000 [20:58:06<15:08:56, 12.86s/it] {'loss': 0.0058, 'learning_rate': 2.1265e-05, 'epoch': 2.17} 58%|█████▊ | 5759/10000 [20:58:06<15:08:56, 12.86s/it] 58%|█████▊ | 5760/10000 [20:58:19<15:07:45, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.1260000000000003e-05, 'epoch': 2.17} 58%|█████▊ | 5760/10000 [20:58:19<15:07:45, 12.85s/it] 58%|█████▊ | 5761/10000 [20:58:32<15:08:41, 12.86s/it] {'loss': 0.0049, 'learning_rate': 2.1255e-05, 'epoch': 2.17} 58%|█████▊ | 5761/10000 [20:58:32<15:08:41, 12.86s/it] 58%|█████▊ | 5762/10000 [20:58:45<15:08:52, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.125e-05, 'epoch': 2.17} 58%|█████▊ | 5762/10000 [20:58:45<15:08:52, 12.87s/it] 58%|█████▊ | 5763/10000 [20:58:57<15:06:58, 12.84s/it] {'loss': 0.0048, 'learning_rate': 2.1245e-05, 'epoch': 2.17} 58%|█████▊ | 5763/10000 [20:58:57<15:06:58, 12.84s/it] 58%|█████▊ | 5764/10000 [20:59:10<15:07:11, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.124e-05, 'epoch': 2.17} 58%|█████▊ | 5764/10000 [20:59:10<15:07:11, 12.85s/it] 58%|█████▊ | 5765/10000 [20:59:23<15:06:16, 12.84s/it] {'loss': 0.0046, 'learning_rate': 2.1235000000000003e-05, 'epoch': 2.17} 58%|█████▊ | 5765/10000 [20:59:23<15:06:16, 12.84s/it] 58%|█████▊ | 5766/10000 [20:59:36<15:06:58, 12.85s/it] {'loss': 
0.0035, 'learning_rate': 2.123e-05, 'epoch': 2.17} 58%|█████▊ | 5766/10000 [20:59:36<15:06:58, 12.85s/it] 58%|█████▊ | 5767/10000 [20:59:49<15:05:26, 12.83s/it] {'loss': 0.006, 'learning_rate': 2.1225e-05, 'epoch': 2.17} 58%|█████▊ | 5767/10000 [20:59:49<15:05:26, 12.83s/it] 58%|█████▊ | 5768/10000 [21:00:02<15:05:34, 12.84s/it] {'loss': 0.0061, 'learning_rate': 2.122e-05, 'epoch': 2.17} 58%|█████▊ | 5768/10000 [21:00:02<15:05:34, 12.84s/it] 58%|█████▊ | 5769/10000 [21:00:15<15:06:12, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.1215000000000003e-05, 'epoch': 2.17} 58%|█████▊ | 5769/10000 [21:00:15<15:06:12, 12.85s/it] 58%|█████▊ | 5770/10000 [21:00:27<15:06:10, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.1210000000000002e-05, 'epoch': 2.17} 58%|█████▊ | 5770/10000 [21:00:27<15:06:10, 12.85s/it] 58%|█████▊ | 5771/10000 [21:00:40<15:07:07, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.1205e-05, 'epoch': 2.17} 58%|█████▊ | 5771/10000 [21:00:40<15:07:07, 12.87s/it] 58%|█████▊ | 5772/10000 [21:00:53<15:06:04, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.12e-05, 'epoch': 2.17} 58%|█████▊ | 5772/10000 [21:00:53<15:06:04, 12.86s/it] 58%|█████▊ | 5773/10000 [21:01:06<15:06:05, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.1195e-05, 'epoch': 2.18} 58%|█████▊ | 5773/10000 [21:01:06<15:06:05, 12.86s/it] 58%|█████▊ | 5774/10000 [21:01:19<15:06:06, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.1190000000000002e-05, 'epoch': 2.18} 58%|█████▊ | 5774/10000 [21:01:19<15:06:06, 12.86s/it] 58%|█████▊ | 5775/10000 [21:01:32<15:05:47, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.1185e-05, 'epoch': 2.18} 58%|█████▊ | 5775/10000 [21:01:32<15:05:47, 12.86s/it] 58%|█████▊ | 5776/10000 [21:01:45<15:05:45, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.118e-05, 'epoch': 2.18} 58%|█████▊ | 5776/10000 [21:01:45<15:05:45, 12.87s/it] 58%|█████▊ | 5777/10000 [21:01:57<15:04:44, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.1175e-05, 'epoch': 2.18} 58%|█████▊ | 5777/10000 
[21:01:57<15:04:44, 12.85s/it] 58%|█████▊ | 5778/10000 [21:02:10<15:05:29, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.1170000000000002e-05, 'epoch': 2.18} 58%|█████▊ | 5778/10000 [21:02:10<15:05:29, 12.87s/it] 58%|█████▊ | 5779/10000 [21:02:23<15:06:12, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.1165e-05, 'epoch': 2.18} 58%|█████▊ | 5779/10000 [21:02:23<15:06:12, 12.88s/it] 58%|█████▊ | 5780/10000 [21:02:36<15:05:37, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.116e-05, 'epoch': 2.18} 58%|█████▊ | 5780/10000 [21:02:36<15:05:37, 12.88s/it] 58%|█████▊ | 5781/10000 [21:02:49<15:05:49, 12.88s/it] {'loss': 0.0037, 'learning_rate': 2.1155e-05, 'epoch': 2.18} 58%|█████▊ | 5781/10000 [21:02:49<15:05:49, 12.88s/it] 58%|█████▊ | 5782/10000 [21:03:02<15:05:44, 12.88s/it] {'loss': 0.0062, 'learning_rate': 2.115e-05, 'epoch': 2.18} 58%|█████▊ | 5782/10000 [21:03:02<15:05:44, 12.88s/it] 58%|█████▊ | 5783/10000 [21:03:15<15:04:30, 12.87s/it] {'loss': 0.0049, 'learning_rate': 2.1145e-05, 'epoch': 2.18} 58%|█████▊ | 5783/10000 [21:03:15<15:04:30, 12.87s/it] 58%|█████▊ | 5784/10000 [21:03:28<15:03:20, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.114e-05, 'epoch': 2.18} 58%|█████▊ | 5784/10000 [21:03:28<15:03:20, 12.86s/it] 58%|█████▊ | 5785/10000 [21:03:40<15:02:01, 12.84s/it] {'loss': 0.0055, 'learning_rate': 2.1135000000000003e-05, 'epoch': 2.18} 58%|█████▊ | 5785/10000 [21:03:40<15:02:01, 12.84s/it] 58%|█████▊ | 5786/10000 [21:03:53<15:02:19, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.113e-05, 'epoch': 2.18} 58%|█████▊ | 5786/10000 [21:03:53<15:02:19, 12.85s/it] 58%|█████▊ | 5787/10000 [21:04:06<15:01:31, 12.84s/it] {'loss': 0.0052, 'learning_rate': 2.1125000000000002e-05, 'epoch': 2.18} 58%|█████▊ | 5787/10000 [21:04:06<15:01:31, 12.84s/it] 58%|█████▊ | 5788/10000 [21:04:19<15:00:49, 12.83s/it] {'loss': 0.0046, 'learning_rate': 2.112e-05, 'epoch': 2.18} 58%|█████▊ | 5788/10000 [21:04:19<15:00:49, 12.83s/it] 58%|█████▊ | 5789/10000 [21:04:32<14:59:48, 12.82s/it] 
{'loss': 0.0038, 'learning_rate': 2.1115e-05, 'epoch': 2.18} 58%|█████▊ | 5789/10000 [21:04:32<14:59:48, 12.82s/it] 58%|█████▊ | 5790/10000 [21:04:45<15:01:13, 12.84s/it] {'loss': 0.0049, 'learning_rate': 2.1110000000000003e-05, 'epoch': 2.18} 58%|█████▊ | 5790/10000 [21:04:45<15:01:13, 12.84s/it] 58%|█████▊ | 5791/10000 [21:04:58<15:03:17, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.1105e-05, 'epoch': 2.18} 58%|█████▊ | 5791/10000 [21:04:58<15:03:17, 12.88s/it] 58%|█████▊ | 5792/10000 [21:05:10<15:02:41, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.11e-05, 'epoch': 2.18} 58%|█████▊ | 5792/10000 [21:05:10<15:02:41, 12.87s/it] 58%|█████▊ | 5793/10000 [21:05:23<15:03:41, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.1095e-05, 'epoch': 2.18} 58%|█████▊ | 5793/10000 [21:05:23<15:03:41, 12.89s/it] 58%|█████▊ | 5794/10000 [21:05:36<15:01:28, 12.86s/it] {'loss': 0.0033, 'learning_rate': 2.1090000000000003e-05, 'epoch': 2.18} 58%|█████▊ | 5794/10000 [21:05:36<15:01:28, 12.86s/it] 58%|█████▊ | 5795/10000 [21:05:49<15:01:44, 12.87s/it] {'loss': 0.0031, 'learning_rate': 2.1085000000000002e-05, 'epoch': 2.18} 58%|█████▊ | 5795/10000 [21:05:49<15:01:44, 12.87s/it] 58%|█████▊ | 5796/10000 [21:06:02<14:59:58, 12.84s/it] {'loss': 0.0039, 'learning_rate': 2.1079999999999998e-05, 'epoch': 2.18} 58%|█████▊ | 5796/10000 [21:06:02<14:59:58, 12.84s/it] 58%|█████▊ | 5797/10000 [21:06:15<15:02:30, 12.88s/it] {'loss': 0.0033, 'learning_rate': 2.1075e-05, 'epoch': 2.18} 58%|█████▊ | 5797/10000 [21:06:15<15:02:30, 12.88s/it] 58%|█████▊ | 5798/10000 [21:06:28<15:00:51, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.107e-05, 'epoch': 2.18} 58%|█████▊ | 5798/10000 [21:06:28<15:00:51, 12.86s/it] 58%|█████▊ | 5799/10000 [21:06:40<15:00:39, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.1065000000000002e-05, 'epoch': 2.19} 58%|█████▊ | 5799/10000 [21:06:40<15:00:39, 12.86s/it] 58%|█████▊ | 5800/10000 [21:06:53<15:00:12, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.106e-05, 'epoch': 2.19} 
58%|█████▊ | 5800/10000 [21:06:53<15:00:12, 12.86s/it] 58%|█████▊ | 5801/10000 [21:07:06<15:00:59, 12.87s/it] {'loss': 0.0056, 'learning_rate': 2.1055e-05, 'epoch': 2.19} 58%|█████▊ | 5801/10000 [21:07:06<15:00:59, 12.87s/it] 58%|█████▊ | 5802/10000 [21:07:19<15:00:43, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.105e-05, 'epoch': 2.19} 58%|█████▊ | 5802/10000 [21:07:19<15:00:43, 12.87s/it] 58%|█████▊ | 5803/10000 [21:07:32<15:00:26, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.1045e-05, 'epoch': 2.19} 58%|█████▊ | 5803/10000 [21:07:32<15:00:26, 12.87s/it] 58%|█████▊ | 5804/10000 [21:07:45<15:00:08, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.1040000000000002e-05, 'epoch': 2.19} 58%|█████▊ | 5804/10000 [21:07:45<15:00:08, 12.87s/it] 58%|█████▊ | 5805/10000 [21:07:58<15:00:55, 12.89s/it] {'loss': 0.006, 'learning_rate': 2.1035e-05, 'epoch': 2.19} 58%|█████▊ | 5805/10000 [21:07:58<15:00:55, 12.89s/it] 58%|█████▊ | 5806/10000 [21:08:11<15:01:33, 12.90s/it] {'loss': 0.0065, 'learning_rate': 2.103e-05, 'epoch': 2.19} 58%|█████▊ | 5806/10000 [21:08:11<15:01:33, 12.90s/it] 58%|█████▊ | 5807/10000 [21:08:23<15:00:06, 12.88s/it] {'loss': 0.0036, 'learning_rate': 2.1025e-05, 'epoch': 2.19} 58%|█████▊ | 5807/10000 [21:08:24<15:00:06, 12.88s/it] 58%|█████▊ | 5808/10000 [21:08:36<15:00:52, 12.89s/it] {'loss': 0.0037, 'learning_rate': 2.1020000000000002e-05, 'epoch': 2.19} 58%|█████▊ | 5808/10000 [21:08:36<15:00:52, 12.89s/it] 58%|█████▊ | 5809/10000 [21:08:49<14:59:44, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.1015e-05, 'epoch': 2.19} 58%|█████▊ | 5809/10000 [21:08:49<14:59:44, 12.88s/it] 58%|█████▊ | 5810/10000 [21:09:02<14:59:07, 12.88s/it] {'loss': 0.0068, 'learning_rate': 2.101e-05, 'epoch': 2.19} 58%|█████▊ | 5810/10000 [21:09:02<14:59:07, 12.88s/it] 58%|█████▊ | 5811/10000 [21:09:15<14:57:12, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.1005e-05, 'epoch': 2.19} 58%|█████▊ | 5811/10000 [21:09:15<14:57:12, 12.85s/it] 58%|█████▊ | 5812/10000 [21:09:28<14:55:09, 
12.82s/it] {'loss': 0.0055, 'learning_rate': 2.1e-05, 'epoch': 2.19} 58%|█████▊ | 5812/10000 [21:09:28<14:55:09, 12.82s/it] 58%|█████▊ | 5813/10000 [21:09:41<14:56:09, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.0995e-05, 'epoch': 2.19} 58%|█████▊ | 5813/10000 [21:09:41<14:56:09, 12.84s/it] 58%|█████▊ | 5814/10000 [21:09:53<14:57:07, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.099e-05, 'epoch': 2.19} 58%|█████▊ | 5814/10000 [21:09:53<14:57:07, 12.86s/it] 58%|█████▊ | 5815/10000 [21:10:06<14:57:10, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.0985000000000003e-05, 'epoch': 2.19} 58%|█████▊ | 5815/10000 [21:10:06<14:57:10, 12.86s/it] 58%|█████▊ | 5816/10000 [21:10:19<14:55:46, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.098e-05, 'epoch': 2.19} 58%|█████▊ | 5816/10000 [21:10:19<14:55:46, 12.85s/it] 58%|█████▊ | 5817/10000 [21:10:32<14:56:42, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.0975e-05, 'epoch': 2.19} 58%|█████▊ | 5817/10000 [21:10:32<14:56:42, 12.86s/it] 58%|█████▊ | 5818/10000 [21:10:45<14:56:43, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.097e-05, 'epoch': 2.19} 58%|█████▊ | 5818/10000 [21:10:45<14:56:43, 12.87s/it] 58%|█████▊ | 5819/10000 [21:10:58<14:56:39, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.0965e-05, 'epoch': 2.19} 58%|█████▊ | 5819/10000 [21:10:58<14:56:39, 12.87s/it] 58%|█████▊ | 5820/10000 [21:11:11<14:54:46, 12.84s/it] {'loss': 0.0044, 'learning_rate': 2.0960000000000003e-05, 'epoch': 2.19} 58%|█████▊ | 5820/10000 [21:11:11<14:54:46, 12.84s/it] 58%|█████▊ | 5821/10000 [21:11:24<14:56:39, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.0955e-05, 'epoch': 2.19} 58%|█████▊ | 5821/10000 [21:11:24<14:56:39, 12.87s/it] 58%|█████▊ | 5822/10000 [21:11:36<14:55:09, 12.86s/it] {'loss': 0.0054, 'learning_rate': 2.095e-05, 'epoch': 2.19} 58%|█████▊ | 5822/10000 [21:11:36<14:55:09, 12.86s/it] 58%|█████▊ | 5823/10000 [21:11:49<14:56:08, 12.87s/it] {'loss': 0.0032, 'learning_rate': 2.0945e-05, 'epoch': 2.19} 58%|█████▊ | 5823/10000 
[21:11:49<14:56:08, 12.87s/it] 58%|█████▊ | 5824/10000 [21:12:02<14:54:01, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.0940000000000003e-05, 'epoch': 2.19} 58%|█████▊ | 5824/10000 [21:12:02<14:54:01, 12.85s/it] 58%|█████▊ | 5825/10000 [21:12:15<14:54:12, 12.85s/it] {'loss': 0.0037, 'learning_rate': 2.0935000000000002e-05, 'epoch': 2.19} 58%|█████▊ | 5825/10000 [21:12:15<14:54:12, 12.85s/it] 58%|█████▊ | 5826/10000 [21:12:28<14:53:52, 12.85s/it] {'loss': 0.0036, 'learning_rate': 2.093e-05, 'epoch': 2.2} 58%|█████▊ | 5826/10000 [21:12:28<14:53:52, 12.85s/it] 58%|█████▊ | 5827/10000 [21:12:41<14:54:23, 12.86s/it] {'loss': 0.0039, 'learning_rate': 2.0925e-05, 'epoch': 2.2} 58%|█████▊ | 5827/10000 [21:12:41<14:54:23, 12.86s/it] 58%|█████▊ | 5828/10000 [21:12:53<14:52:30, 12.84s/it] {'loss': 0.0061, 'learning_rate': 2.092e-05, 'epoch': 2.2} 58%|█████▊ | 5828/10000 [21:12:53<14:52:30, 12.84s/it] 58%|█████▊ | 5829/10000 [21:13:06<14:54:38, 12.87s/it] {'loss': 0.0039, 'learning_rate': 2.0915000000000002e-05, 'epoch': 2.2} 58%|█████▊ | 5829/10000 [21:13:06<14:54:38, 12.87s/it] 58%|█████▊ | 5830/10000 [21:13:19<14:53:24, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.091e-05, 'epoch': 2.2} 58%|█████▊ | 5830/10000 [21:13:19<14:53:24, 12.85s/it] 58%|█████▊ | 5831/10000 [21:13:32<14:53:37, 12.86s/it] {'loss': 0.0044, 'learning_rate': 2.0905000000000004e-05, 'epoch': 2.2} 58%|█████▊ | 5831/10000 [21:13:32<14:53:37, 12.86s/it] 58%|█████▊ | 5832/10000 [21:13:45<14:52:45, 12.85s/it] {'loss': 0.0054, 'learning_rate': 2.09e-05, 'epoch': 2.2} 58%|█████▊ | 5832/10000 [21:13:45<14:52:45, 12.85s/it] 58%|█████▊ | 5833/10000 [21:13:58<14:54:26, 12.88s/it] {'loss': 0.0049, 'learning_rate': 2.0895e-05, 'epoch': 2.2} 58%|█████▊ | 5833/10000 [21:13:58<14:54:26, 12.88s/it] 58%|█████▊ | 5834/10000 [21:14:11<14:54:55, 12.89s/it] {'loss': 0.0052, 'learning_rate': 2.089e-05, 'epoch': 2.2} 58%|█████▊ | 5834/10000 [21:14:11<14:54:55, 12.89s/it] 58%|█████▊ | 5835/10000 [21:14:24<14:54:43, 12.89s/it] 
{'loss': 0.0042, 'learning_rate': 2.0885e-05, 'epoch': 2.2} 58%|█████▊ | 5835/10000 [21:14:24<14:54:43, 12.89s/it] 58%|█████▊ | 5836/10000 [21:14:37<14:54:55, 12.90s/it] {'loss': 0.0049, 'learning_rate': 2.0880000000000003e-05, 'epoch': 2.2} 58%|█████▊ | 5836/10000 [21:14:37<14:54:55, 12.90s/it] 58%|█████▊ | 5837/10000 [21:14:49<14:53:18, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.0875e-05, 'epoch': 2.2} 58%|█████▊ | 5837/10000 [21:14:49<14:53:18, 12.87s/it] 58%|█████▊ | 5838/10000 [21:15:02<14:53:10, 12.88s/it] {'loss': 0.0058, 'learning_rate': 2.0870000000000002e-05, 'epoch': 2.2} 58%|█████▊ | 5838/10000 [21:15:02<14:53:10, 12.88s/it] 58%|█████▊ | 5839/10000 [21:15:15<14:52:33, 12.87s/it] {'loss': 0.005, 'learning_rate': 2.0865e-05, 'epoch': 2.2} 58%|█████▊ | 5839/10000 [21:15:15<14:52:33, 12.87s/it] 58%|█████▊ | 5840/10000 [21:15:28<14:51:07, 12.85s/it] {'loss': 0.0056, 'learning_rate': 2.086e-05, 'epoch': 2.2} 58%|█████▊ | 5840/10000 [21:15:28<14:51:07, 12.85s/it] 58%|█████▊ | 5841/10000 [21:15:41<14:50:00, 12.84s/it] {'loss': 0.0054, 'learning_rate': 2.0855000000000003e-05, 'epoch': 2.2} 58%|█████▊ | 5841/10000 [21:15:41<14:50:00, 12.84s/it] 58%|█████▊ | 5842/10000 [21:15:54<14:49:08, 12.83s/it] {'loss': 0.0049, 'learning_rate': 2.085e-05, 'epoch': 2.2} 58%|█████▊ | 5842/10000 [21:15:54<14:49:08, 12.83s/it] 58%|█████▊ | 5843/10000 [21:16:06<14:51:09, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.0845e-05, 'epoch': 2.2} 58%|█████▊ | 5843/10000 [21:16:06<14:51:09, 12.86s/it] 58%|█████▊ | 5844/10000 [21:16:19<14:54:45, 12.92s/it] {'loss': 0.0054, 'learning_rate': 2.084e-05, 'epoch': 2.2} 58%|█████▊ | 5844/10000 [21:16:20<14:54:45, 12.92s/it] 58%|█████▊ | 5845/10000 [21:16:32<14:55:00, 12.92s/it] {'loss': 0.0049, 'learning_rate': 2.0835000000000003e-05, 'epoch': 2.2} 58%|█████▊ | 5845/10000 [21:16:32<14:55:00, 12.92s/it] 58%|█████▊ | 5846/10000 [21:16:45<14:53:39, 12.91s/it] {'loss': 0.003, 'learning_rate': 2.0830000000000002e-05, 'epoch': 2.2} 58%|█████▊ | 
5846/10000 [21:16:45<14:53:39, 12.91s/it] 58%|█████▊ | 5847/10000 [21:16:58<14:53:01, 12.90s/it] {'loss': 0.0039, 'learning_rate': 2.0825e-05, 'epoch': 2.2} 58%|█████▊ | 5847/10000 [21:16:58<14:53:01, 12.90s/it] 58%|█████▊ | 5848/10000 [21:17:11<14:51:31, 12.88s/it] {'loss': 0.0055, 'learning_rate': 2.082e-05, 'epoch': 2.2} 58%|█████▊ | 5848/10000 [21:17:11<14:51:31, 12.88s/it] 58%|█████▊ | 5849/10000 [21:17:24<14:51:46, 12.89s/it] {'loss': 0.0059, 'learning_rate': 2.0815e-05, 'epoch': 2.2} 58%|█████▊ | 5849/10000 [21:17:24<14:51:46, 12.89s/it] 58%|█████▊ | 5850/10000 [21:17:37<14:50:41, 12.88s/it] {'loss': 0.0054, 'learning_rate': 2.0810000000000002e-05, 'epoch': 2.2} 58%|█████▊ | 5850/10000 [21:17:37<14:50:41, 12.88s/it] 59%|█████▊ | 5851/10000 [21:17:50<14:51:22, 12.89s/it] {'loss': 0.0053, 'learning_rate': 2.0805e-05, 'epoch': 2.2} 59%|█████▊ | 5851/10000 [21:17:50<14:51:22, 12.89s/it] 59%|█████▊ | 5852/10000 [21:18:03<14:52:24, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.08e-05, 'epoch': 2.2} 59%|█████▊ | 5852/10000 [21:18:03<14:52:24, 12.91s/it] 59%|█████▊ | 5853/10000 [21:18:16<14:53:10, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.0795e-05, 'epoch': 2.21} 59%|█████▊ | 5853/10000 [21:18:16<14:53:10, 12.92s/it] 59%|█████▊ | 5854/10000 [21:18:29<14:52:36, 12.92s/it] {'loss': 0.0051, 'learning_rate': 2.0790000000000003e-05, 'epoch': 2.21} 59%|█████▊ | 5854/10000 [21:18:29<14:52:36, 12.92s/it] 59%|█████▊ | 5855/10000 [21:18:41<14:50:59, 12.90s/it] {'loss': 0.0054, 'learning_rate': 2.0785000000000002e-05, 'epoch': 2.21} 59%|█████▊ | 5855/10000 [21:18:41<14:50:59, 12.90s/it] 59%|█████▊ | 5856/10000 [21:18:54<14:49:49, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.078e-05, 'epoch': 2.21} 59%|█████▊ | 5856/10000 [21:18:54<14:49:49, 12.88s/it] 59%|█████▊ | 5857/10000 [21:19:07<14:49:32, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.0775e-05, 'epoch': 2.21} 59%|█████▊ | 5857/10000 [21:19:07<14:49:32, 12.88s/it] 59%|█████▊ | 5858/10000 [21:19:20<14:48:54, 
12.88s/it] {'loss': 0.0053, 'learning_rate': 2.077e-05, 'epoch': 2.21} 59%|█████▊ | 5858/10000 [21:19:20<14:48:54, 12.88s/it] 59%|█████▊ | 5859/10000 [21:19:33<14:48:52, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.0765000000000002e-05, 'epoch': 2.21} 59%|█████▊ | 5859/10000 [21:19:33<14:48:52, 12.88s/it] 59%|█████▊ | 5860/10000 [21:19:46<14:47:50, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.076e-05, 'epoch': 2.21} 59%|█████▊ | 5860/10000 [21:19:46<14:47:50, 12.87s/it] 59%|█████▊ | 5861/10000 [21:19:58<14:45:55, 12.84s/it] {'loss': 0.0037, 'learning_rate': 2.0755000000000004e-05, 'epoch': 2.21} 59%|█████▊ | 5861/10000 [21:19:59<14:45:55, 12.84s/it] 59%|█████▊ | 5862/10000 [21:20:11<14:47:36, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.075e-05, 'epoch': 2.21} 59%|█████▊ | 5862/10000 [21:20:11<14:47:36, 12.87s/it] 59%|█████▊ | 5863/10000 [21:20:24<14:47:45, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.0745000000000002e-05, 'epoch': 2.21} 59%|█████▊ | 5863/10000 [21:20:24<14:47:45, 12.88s/it] 59%|█████▊ | 5864/10000 [21:20:37<14:47:07, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.074e-05, 'epoch': 2.21} 59%|█████▊ | 5864/10000 [21:20:37<14:47:07, 12.87s/it] 59%|█████▊ | 5865/10000 [21:20:50<14:48:40, 12.90s/it] {'loss': 0.004, 'learning_rate': 2.0735e-05, 'epoch': 2.21} 59%|█████▊ | 5865/10000 [21:20:50<14:48:40, 12.90s/it] 59%|█████▊ | 5866/10000 [21:21:03<14:47:12, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.0730000000000003e-05, 'epoch': 2.21} 59%|█████▊ | 5866/10000 [21:21:03<14:47:12, 12.88s/it] 59%|█████▊ | 5867/10000 [21:21:16<14:46:10, 12.86s/it] {'loss': 0.0059, 'learning_rate': 2.0725e-05, 'epoch': 2.21} 59%|█████▊ | 5867/10000 [21:21:16<14:46:10, 12.86s/it] 59%|█████▊ | 5868/10000 [21:21:29<14:45:16, 12.85s/it] {'loss': 0.0038, 'learning_rate': 2.072e-05, 'epoch': 2.21} 59%|█████▊ | 5868/10000 [21:21:29<14:45:16, 12.85s/it] 59%|█████▊ | 5869/10000 [21:21:42<14:45:55, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.0715e-05, 'epoch': 2.21} 
59%|█████▊ | 5869/10000 [21:21:42<14:45:55, 12.87s/it] 59%|█████▊ | 5870/10000 [21:21:54<14:47:15, 12.89s/it] {'loss': 0.0054, 'learning_rate': 2.0710000000000003e-05, 'epoch': 2.21} 59%|█████▊ | 5870/10000 [21:21:54<14:47:15, 12.89s/it] 59%|█████▊ | 5871/10000 [21:22:07<14:45:02, 12.86s/it] {'loss': 0.0038, 'learning_rate': 2.0705000000000003e-05, 'epoch': 2.21} 59%|█████▊ | 5871/10000 [21:22:07<14:45:02, 12.86s/it] 59%|█████▊ | 5872/10000 [21:22:20<14:45:26, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.07e-05, 'epoch': 2.21} 59%|█████▊ | 5872/10000 [21:22:20<14:45:26, 12.87s/it] 59%|█████▊ | 5873/10000 [21:22:33<14:45:13, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.0695e-05, 'epoch': 2.21} 59%|█████▊ | 5873/10000 [21:22:33<14:45:13, 12.87s/it] 59%|█████▊ | 5874/10000 [21:22:46<14:46:15, 12.89s/it] {'loss': 0.0046, 'learning_rate': 2.069e-05, 'epoch': 2.21} 59%|█████▊ | 5874/10000 [21:22:46<14:46:15, 12.89s/it] 59%|█████▉ | 5875/10000 [21:22:59<14:45:50, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.0685000000000003e-05, 'epoch': 2.21} 59%|█████▉ | 5875/10000 [21:22:59<14:45:50, 12.89s/it] 59%|█████▉ | 5876/10000 [21:23:12<14:46:15, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.0680000000000002e-05, 'epoch': 2.21} 59%|█████▉ | 5876/10000 [21:23:12<14:46:15, 12.89s/it] 59%|█████▉ | 5877/10000 [21:23:25<14:47:15, 12.91s/it] {'loss': 0.0057, 'learning_rate': 2.0675e-05, 'epoch': 2.21} 59%|█████▉ | 5877/10000 [21:23:25<14:47:15, 12.91s/it] 59%|█████▉ | 5878/10000 [21:23:38<14:49:59, 12.95s/it] {'loss': 0.0046, 'learning_rate': 2.067e-05, 'epoch': 2.21} 59%|█████▉ | 5878/10000 [21:23:38<14:49:59, 12.95s/it] 59%|█████▉ | 5879/10000 [21:23:51<14:48:54, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.0665e-05, 'epoch': 2.22} 59%|█████▉ | 5879/10000 [21:23:51<14:48:54, 12.94s/it] 59%|█████▉ | 5880/10000 [21:24:04<14:46:47, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.0660000000000002e-05, 'epoch': 2.22} 59%|█████▉ | 5880/10000 [21:24:04<14:46:47, 12.91s/it] 
59%|█████▉ | 5881/10000 [21:24:16<14:45:35, 12.90s/it] {'loss': 0.0048, 'learning_rate': 2.0655e-05, 'epoch': 2.22} 59%|█████▉ | 5881/10000 [21:24:16<14:45:35, 12.90s/it] 59%|█████▉ | 5882/10000 [21:24:29<14:43:26, 12.87s/it] {'loss': 0.0051, 'learning_rate': 2.065e-05, 'epoch': 2.22} 59%|█████▉ | 5882/10000 [21:24:29<14:43:26, 12.87s/it] 59%|█████▉ | 5883/10000 [21:24:42<14:42:25, 12.86s/it] {'loss': 0.0035, 'learning_rate': 2.0645e-05, 'epoch': 2.22} 59%|█████▉ | 5883/10000 [21:24:42<14:42:25, 12.86s/it] 59%|█████▉ | 5884/10000 [21:24:55<14:42:40, 12.87s/it] {'loss': 0.0045, 'learning_rate': 2.0640000000000002e-05, 'epoch': 2.22} 59%|█████▉ | 5884/10000 [21:24:55<14:42:40, 12.87s/it] 59%|█████▉ | 5885/10000 [21:25:08<14:41:43, 12.86s/it] {'loss': 0.0039, 'learning_rate': 2.0635e-05, 'epoch': 2.22} 59%|█████▉ | 5885/10000 [21:25:08<14:41:43, 12.86s/it] 59%|█████▉ | 5886/10000 [21:25:21<14:42:00, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.063e-05, 'epoch': 2.22} 59%|█████▉ | 5886/10000 [21:25:21<14:42:00, 12.86s/it] 59%|█████▉ | 5887/10000 [21:25:33<14:42:09, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.0625e-05, 'epoch': 2.22} 59%|█████▉ | 5887/10000 [21:25:34<14:42:09, 12.87s/it] 59%|█████▉ | 5888/10000 [21:25:46<14:39:54, 12.84s/it] {'loss': 0.0043, 'learning_rate': 2.062e-05, 'epoch': 2.22} 59%|█████▉ | 5888/10000 [21:25:46<14:39:54, 12.84s/it] 59%|█████▉ | 5889/10000 [21:25:59<14:40:22, 12.85s/it] {'loss': 0.0055, 'learning_rate': 2.0615e-05, 'epoch': 2.22} 59%|█████▉ | 5889/10000 [21:25:59<14:40:22, 12.85s/it] 59%|█████▉ | 5890/10000 [21:26:12<14:39:42, 12.84s/it] {'loss': 0.0052, 'learning_rate': 2.061e-05, 'epoch': 2.22} 59%|█████▉ | 5890/10000 [21:26:12<14:39:42, 12.84s/it] 59%|█████▉ | 5891/10000 [21:26:25<14:39:13, 12.84s/it] {'loss': 0.0051, 'learning_rate': 2.0605000000000003e-05, 'epoch': 2.22} 59%|█████▉ | 5891/10000 [21:26:25<14:39:13, 12.84s/it] 59%|█████▉ | 5892/10000 [21:26:38<14:37:56, 12.82s/it] {'loss': 0.0053, 'learning_rate': 2.06e-05, 
'epoch': 2.22} 59%|█████▉ | 5892/10000 [21:26:38<14:37:56, 12.82s/it] 59%|█████▉ | 5893/10000 [21:26:50<14:37:17, 12.82s/it] {'loss': 0.0052, 'learning_rate': 2.0595000000000002e-05, 'epoch': 2.22} 59%|█████▉ | 5893/10000 [21:26:50<14:37:17, 12.82s/it] 59%|█████▉ | 5894/10000 [21:27:03<14:38:05, 12.83s/it] {'loss': 0.0035, 'learning_rate': 2.059e-05, 'epoch': 2.22} 59%|█████▉ | 5894/10000 [21:27:03<14:38:05, 12.83s/it] 59%|█████▉ | 5895/10000 [21:27:16<14:40:14, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.0585e-05, 'epoch': 2.22} 59%|█████▉ | 5895/10000 [21:27:16<14:40:14, 12.87s/it] 59%|█████▉ | 5896/10000 [21:27:29<14:40:44, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.0580000000000003e-05, 'epoch': 2.22} 59%|█████▉ | 5896/10000 [21:27:29<14:40:44, 12.88s/it] 59%|█████▉ | 5897/10000 [21:27:42<14:40:46, 12.88s/it] {'loss': 0.0043, 'learning_rate': 2.0575e-05, 'epoch': 2.22} 59%|█████▉ | 5897/10000 [21:27:42<14:40:46, 12.88s/it] 59%|█████▉ | 5898/10000 [21:27:55<14:38:54, 12.86s/it] {'loss': 0.006, 'learning_rate': 2.057e-05, 'epoch': 2.22} 59%|█████▉ | 5898/10000 [21:27:55<14:38:54, 12.86s/it] 59%|█████▉ | 5899/10000 [21:28:08<14:41:15, 12.89s/it] {'loss': 0.0044, 'learning_rate': 2.0565e-05, 'epoch': 2.22} 59%|█████▉ | 5899/10000 [21:28:08<14:41:15, 12.89s/it] 59%|█████▉ | 5900/10000 [21:28:21<14:41:10, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.0560000000000003e-05, 'epoch': 2.22} 59%|█████▉ | 5900/10000 [21:28:21<14:41:10, 12.90s/it] 59%|█████▉ | 5901/10000 [21:28:33<14:39:47, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.0555000000000002e-05, 'epoch': 2.22} 59%|█████▉ | 5901/10000 [21:28:34<14:39:47, 12.88s/it] 59%|█████▉ | 5902/10000 [21:28:46<14:38:47, 12.87s/it] {'loss': 0.0057, 'learning_rate': 2.055e-05, 'epoch': 2.22} 59%|█████▉ | 5902/10000 [21:28:46<14:38:47, 12.87s/it] 59%|█████▉ | 5903/10000 [21:28:59<14:38:06, 12.86s/it] {'loss': 0.0046, 'learning_rate': 2.0545e-05, 'epoch': 2.22} 59%|█████▉ | 5903/10000 [21:28:59<14:38:06, 12.86s/it] 
59%|█████▉ | 5904/10000 [21:29:12<14:37:35, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.054e-05, 'epoch': 2.22} 59%|█████▉ | 5904/10000 [21:29:12<14:37:35, 12.86s/it] 59%|█████▉ | 5905/10000 [21:29:25<14:38:50, 12.88s/it] {'loss': 0.0041, 'learning_rate': 2.0535000000000002e-05, 'epoch': 2.22} 59%|█████▉ | 5905/10000 [21:29:25<14:38:50, 12.88s/it] 59%|█████▉ | 5906/10000 [21:29:38<14:39:16, 12.89s/it] {'loss': 0.0053, 'learning_rate': 2.053e-05, 'epoch': 2.23} 59%|█████▉ | 5906/10000 [21:29:38<14:39:16, 12.89s/it] 59%|█████▉ | 5907/10000 [21:29:51<14:38:46, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.0525e-05, 'epoch': 2.23} 59%|█████▉ | 5907/10000 [21:29:51<14:38:46, 12.88s/it] 59%|█████▉ | 5908/10000 [21:30:04<14:37:32, 12.87s/it] {'loss': 0.0046, 'learning_rate': 2.052e-05, 'epoch': 2.23} 59%|█████▉ | 5908/10000 [21:30:04<14:37:32, 12.87s/it] 59%|█████▉ | 5909/10000 [21:30:16<14:38:52, 12.89s/it] {'loss': 0.005, 'learning_rate': 2.0515e-05, 'epoch': 2.23} 59%|█████▉ | 5909/10000 [21:30:17<14:38:52, 12.89s/it] 59%|█████▉ | 5910/10000 [21:30:29<14:38:02, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.0510000000000002e-05, 'epoch': 2.23} 59%|█████▉ | 5910/10000 [21:30:29<14:38:02, 12.88s/it] 59%|█████▉ | 5911/10000 [21:30:42<14:37:46, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.0505e-05, 'epoch': 2.23} 59%|█████▉ | 5911/10000 [21:30:42<14:37:46, 12.88s/it] 59%|█████▉ | 5912/10000 [21:30:55<14:36:42, 12.87s/it] {'loss': 0.0036, 'learning_rate': 2.05e-05, 'epoch': 2.23} 59%|█████▉ | 5912/10000 [21:30:55<14:36:42, 12.87s/it] 59%|█████▉ | 5913/10000 [21:31:08<14:36:24, 12.87s/it] {'loss': 0.0036, 'learning_rate': 2.0495e-05, 'epoch': 2.23} 59%|█████▉ | 5913/10000 [21:31:08<14:36:24, 12.87s/it] 59%|█████▉ | 5914/10000 [21:31:21<14:34:31, 12.84s/it] {'loss': 0.0038, 'learning_rate': 2.0490000000000002e-05, 'epoch': 2.23} 59%|█████▉ | 5914/10000 [21:31:21<14:34:31, 12.84s/it] 59%|█████▉ | 5915/10000 [21:31:34<14:37:07, 12.88s/it] {'loss': 0.0043, 'learning_rate': 
2.0485e-05, 'epoch': 2.23} 59%|█████▉ | 5915/10000 [21:31:34<14:37:07, 12.88s/it] 59%|█████▉ | 5916/10000 [21:31:47<14:37:36, 12.89s/it] {'loss': 0.0035, 'learning_rate': 2.048e-05, 'epoch': 2.23} 59%|█████▉ | 5916/10000 [21:31:47<14:37:36, 12.89s/it] 59%|█████▉ | 5917/10000 [21:31:59<14:36:59, 12.89s/it] {'loss': 0.0051, 'learning_rate': 2.0475e-05, 'epoch': 2.23} 59%|█████▉ | 5917/10000 [21:32:00<14:36:59, 12.89s/it] 59%|█████▉ | 5918/10000 [21:32:12<14:35:19, 12.87s/it] {'loss': 0.0062, 'learning_rate': 2.047e-05, 'epoch': 2.23} 59%|█████▉ | 5918/10000 [21:32:12<14:35:19, 12.87s/it] 59%|█████▉ | 5919/10000 [21:32:25<14:34:30, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.0465e-05, 'epoch': 2.23} 59%|█████▉ | 5919/10000 [21:32:25<14:34:30, 12.86s/it] 59%|█████▉ | 5920/10000 [21:32:38<14:34:13, 12.86s/it] {'loss': 0.004, 'learning_rate': 2.046e-05, 'epoch': 2.23} 59%|█████▉ | 5920/10000 [21:32:38<14:34:13, 12.86s/it] 59%|█████▉ | 5921/10000 [21:32:51<14:32:42, 12.84s/it] {'loss': 0.0056, 'learning_rate': 2.0455000000000003e-05, 'epoch': 2.23} 59%|█████▉ | 5921/10000 [21:32:51<14:32:42, 12.84s/it] 59%|█████▉ | 5922/10000 [21:33:04<14:32:40, 12.84s/it] {'loss': 0.0045, 'learning_rate': 2.045e-05, 'epoch': 2.23} 59%|█████▉ | 5922/10000 [21:33:04<14:32:40, 12.84s/it] 59%|█████▉ | 5923/10000 [21:33:16<14:32:13, 12.84s/it] {'loss': 0.0053, 'learning_rate': 2.0445e-05, 'epoch': 2.23} 59%|█████▉ | 5923/10000 [21:33:16<14:32:13, 12.84s/it] 59%|█████▉ | 5924/10000 [21:33:29<14:30:32, 12.81s/it] {'loss': 0.0051, 'learning_rate': 2.044e-05, 'epoch': 2.23} 59%|█████▉ | 5924/10000 [21:33:29<14:30:32, 12.81s/it] 59%|█████▉ | 5925/10000 [21:33:42<14:33:47, 12.87s/it] {'loss': 0.0043, 'learning_rate': 2.0435e-05, 'epoch': 2.23} 59%|█████▉ | 5925/10000 [21:33:42<14:33:47, 12.87s/it] 59%|█████▉ | 5926/10000 [21:33:55<14:32:32, 12.85s/it] {'loss': 0.0047, 'learning_rate': 2.0430000000000003e-05, 'epoch': 2.23} 59%|█████▉ | 5926/10000 [21:33:55<14:32:32, 12.85s/it] 59%|█████▉ | 
5927/10000 [21:34:08<14:32:33, 12.85s/it] {'loss': 0.0045, 'learning_rate': 2.0425e-05, 'epoch': 2.23} 59%|█████▉ | 5927/10000 [21:34:08<14:32:33, 12.85s/it] 59%|█████▉ | 5928/10000 [21:34:21<14:33:01, 12.86s/it] {'loss': 0.0043, 'learning_rate': 2.042e-05, 'epoch': 2.23} 59%|█████▉ | 5928/10000 [21:34:21<14:33:01, 12.86s/it] 59%|█████▉ | 5929/10000 [21:34:34<14:32:53, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.0415e-05, 'epoch': 2.23} 59%|█████▉ | 5929/10000 [21:34:34<14:32:53, 12.87s/it] 59%|█████▉ | 5930/10000 [21:34:47<14:33:47, 12.88s/it] {'loss': 0.0041, 'learning_rate': 2.0410000000000003e-05, 'epoch': 2.23} 59%|█████▉ | 5930/10000 [21:34:47<14:33:47, 12.88s/it] 59%|█████▉ | 5931/10000 [21:34:59<14:33:03, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.0405000000000002e-05, 'epoch': 2.23} 59%|█████▉ | 5931/10000 [21:34:59<14:33:03, 12.87s/it] 59%|█████▉ | 5932/10000 [21:35:12<14:33:10, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.04e-05, 'epoch': 2.24} 59%|█████▉ | 5932/10000 [21:35:12<14:33:10, 12.88s/it] 59%|█████▉ | 5933/10000 [21:35:25<14:33:05, 12.88s/it] {'loss': 0.0047, 'learning_rate': 2.0395e-05, 'epoch': 2.24} 59%|█████▉ | 5933/10000 [21:35:25<14:33:05, 12.88s/it] 59%|█████▉ | 5934/10000 [21:35:38<14:32:22, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.039e-05, 'epoch': 2.24} 59%|█████▉ | 5934/10000 [21:35:38<14:32:22, 12.87s/it] 59%|█████▉ | 5935/10000 [21:35:51<14:32:47, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.0385000000000002e-05, 'epoch': 2.24} 59%|█████▉ | 5935/10000 [21:35:51<14:32:47, 12.88s/it] 59%|█████▉ | 5936/10000 [21:36:04<14:31:34, 12.87s/it] {'loss': 0.0044, 'learning_rate': 2.038e-05, 'epoch': 2.24} 59%|█████▉ | 5936/10000 [21:36:04<14:31:34, 12.87s/it] 59%|█████▉ | 5937/10000 [21:36:17<14:30:33, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.0375e-05, 'epoch': 2.24} 59%|█████▉ | 5937/10000 [21:36:17<14:30:33, 12.86s/it] 59%|█████▉ | 5938/10000 [21:36:30<14:31:07, 12.87s/it] {'loss': 0.0057, 'learning_rate': 2.037e-05, 
'epoch': 2.24} 59%|█████▉ | 5938/10000 [21:36:30<14:31:07, 12.87s/it] 59%|█████▉ | 5939/10000 [21:36:42<14:30:10, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.0365000000000002e-05, 'epoch': 2.24} 59%|█████▉ | 5939/10000 [21:36:42<14:30:10, 12.86s/it] 59%|█████▉ | 5940/10000 [21:36:55<14:29:10, 12.84s/it] {'loss': 0.0054, 'learning_rate': 2.036e-05, 'epoch': 2.24} 59%|█████▉ | 5940/10000 [21:36:55<14:29:10, 12.84s/it] 59%|█████▉ | 5941/10000 [21:37:08<14:28:59, 12.85s/it] {'loss': 0.0053, 'learning_rate': 2.0355e-05, 'epoch': 2.24} 59%|█████▉ | 5941/10000 [21:37:08<14:28:59, 12.85s/it] 59%|█████▉ | 5942/10000 [21:37:21<14:29:30, 12.86s/it] {'loss': 0.0039, 'learning_rate': 2.035e-05, 'epoch': 2.24} 59%|█████▉ | 5942/10000 [21:37:21<14:29:30, 12.86s/it] 59%|█████▉ | 5943/10000 [21:37:34<14:30:08, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.0345e-05, 'epoch': 2.24} 59%|█████▉ | 5943/10000 [21:37:34<14:30:08, 12.87s/it] 59%|█████▉ | 5944/10000 [21:37:47<14:30:50, 12.88s/it] {'loss': 0.0035, 'learning_rate': 2.0340000000000002e-05, 'epoch': 2.24} 59%|█████▉ | 5944/10000 [21:37:47<14:30:50, 12.88s/it] 59%|█████▉ | 5945/10000 [21:38:00<14:30:48, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.0335e-05, 'epoch': 2.24} 59%|█████▉ | 5945/10000 [21:38:00<14:30:48, 12.89s/it] 59%|█████▉ | 5946/10000 [21:38:12<14:28:48, 12.86s/it] {'loss': 0.0048, 'learning_rate': 2.033e-05, 'epoch': 2.24} 59%|█████▉ | 5946/10000 [21:38:12<14:28:48, 12.86s/it] 59%|█████▉ | 5947/10000 [21:38:25<14:28:40, 12.86s/it] {'loss': 0.0057, 'learning_rate': 2.0325e-05, 'epoch': 2.24} 59%|█████▉ | 5947/10000 [21:38:25<14:28:40, 12.86s/it] 59%|█████▉ | 5948/10000 [21:38:38<14:28:51, 12.87s/it] {'loss': 0.0035, 'learning_rate': 2.032e-05, 'epoch': 2.24} 59%|█████▉ | 5948/10000 [21:38:38<14:28:51, 12.87s/it] 59%|█████▉ | 5949/10000 [21:38:51<14:30:11, 12.89s/it] {'loss': 0.0066, 'learning_rate': 2.0315e-05, 'epoch': 2.24} 59%|█████▉ | 5949/10000 [21:38:51<14:30:11, 12.89s/it] 60%|█████▉ | 5950/10000 
[21:39:04<14:30:31, 12.90s/it] {'loss': 0.0038, 'learning_rate': 2.031e-05, 'epoch': 2.24} 60%|█████▉ | 5950/10000 [21:39:04<14:30:31, 12.90s/it] 60%|█████▉ | 5951/10000 [21:39:17<14:30:03, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.0305000000000003e-05, 'epoch': 2.24} 60%|█████▉ | 5951/10000 [21:39:17<14:30:03, 12.89s/it] 60%|█████▉ | 5952/10000 [21:39:30<14:29:27, 12.89s/it] {'loss': 0.0044, 'learning_rate': 2.0300000000000002e-05, 'epoch': 2.24} 60%|█████▉ | 5952/10000 [21:39:30<14:29:27, 12.89s/it] 60%|█████▉ | 5953/10000 [21:39:43<14:27:03, 12.85s/it] {'loss': 0.0043, 'learning_rate': 2.0295e-05, 'epoch': 2.24} 60%|█████▉ | 5953/10000 [21:39:43<14:27:03, 12.85s/it] 60%|█████▉ | 5954/10000 [21:39:55<14:26:37, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.029e-05, 'epoch': 2.24} 60%|█████▉ | 5954/10000 [21:39:55<14:26:37, 12.85s/it] 60%|█████▉ | 5955/10000 [21:40:08<14:27:56, 12.87s/it] {'loss': 0.0061, 'learning_rate': 2.0285e-05, 'epoch': 2.24} 60%|█████▉ | 5955/10000 [21:40:08<14:27:56, 12.87s/it] 60%|█████▉ | 5956/10000 [21:40:21<14:28:25, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.0280000000000002e-05, 'epoch': 2.24} 60%|█████▉ | 5956/10000 [21:40:21<14:28:25, 12.88s/it] 60%|█████▉ | 5957/10000 [21:40:34<14:27:11, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.0275e-05, 'epoch': 2.24} 60%|█████▉ | 5957/10000 [21:40:34<14:27:11, 12.87s/it] 60%|█████▉ | 5958/10000 [21:40:47<14:27:12, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.027e-05, 'epoch': 2.24} 60%|█████▉ | 5958/10000 [21:40:47<14:27:12, 12.87s/it] 60%|█████▉ | 5959/10000 [21:41:00<14:25:24, 12.85s/it] {'loss': 0.0049, 'learning_rate': 2.0265e-05, 'epoch': 2.25} 60%|█████▉ | 5959/10000 [21:41:00<14:25:24, 12.85s/it] 60%|█████▉ | 5960/10000 [21:41:13<14:25:25, 12.85s/it] {'loss': 0.0048, 'learning_rate': 2.0260000000000003e-05, 'epoch': 2.25} 60%|█████▉ | 5960/10000 [21:41:13<14:25:25, 12.85s/it] 60%|█████▉ | 5961/10000 [21:41:25<14:24:13, 12.84s/it] {'loss': 0.0048, 'learning_rate': 
2.0255000000000002e-05, 'epoch': 2.25} 60%|█████▉ | 5961/10000 [21:41:25<14:24:13, 12.84s/it] 60%|█████▉ | 5962/10000 [21:41:38<14:26:04, 12.87s/it] {'loss': 0.0041, 'learning_rate': 2.025e-05, 'epoch': 2.25} 60%|█████▉ | 5962/10000 [21:41:38<14:26:04, 12.87s/it] 60%|█████▉ | 5963/10000 [21:41:51<14:23:45, 12.84s/it] {'loss': 0.0053, 'learning_rate': 2.0245e-05, 'epoch': 2.25} 60%|█████▉ | 5963/10000 [21:41:51<14:23:45, 12.84s/it] 60%|█████▉ | 5964/10000 [21:42:04<14:24:37, 12.85s/it] {'loss': 0.005, 'learning_rate': 2.024e-05, 'epoch': 2.25} 60%|█████▉ | 5964/10000 [21:42:04<14:24:37, 12.85s/it] 60%|█████▉ | 5965/10000 [21:42:17<14:24:22, 12.85s/it] {'loss': 0.0052, 'learning_rate': 2.0235000000000002e-05, 'epoch': 2.25} 60%|█████▉ | 5965/10000 [21:42:17<14:24:22, 12.85s/it] 60%|█████▉ | 5966/10000 [21:42:30<14:22:55, 12.83s/it] {'loss': 0.0047, 'learning_rate': 2.023e-05, 'epoch': 2.25} 60%|█████▉ | 5966/10000 [21:42:30<14:22:55, 12.83s/it] 60%|█████▉ | 5967/10000 [21:42:43<14:23:39, 12.85s/it] {'loss': 0.0066, 'learning_rate': 2.0225000000000004e-05, 'epoch': 2.25} 60%|█████▉ | 5967/10000 [21:42:43<14:23:39, 12.85s/it] 60%|█████▉ | 5968/10000 [21:42:55<14:23:26, 12.85s/it] {'loss': 0.0044, 'learning_rate': 2.022e-05, 'epoch': 2.25} 60%|█████▉ | 5968/10000 [21:42:55<14:23:26, 12.85s/it] 60%|█████▉ | 5969/10000 [21:43:08<14:22:51, 12.84s/it] {'loss': 0.0052, 'learning_rate': 2.0215000000000002e-05, 'epoch': 2.25} 60%|█████▉ | 5969/10000 [21:43:08<14:22:51, 12.84s/it] 60%|█████▉ | 5970/10000 [21:43:21<14:23:48, 12.86s/it] {'loss': 0.0027, 'learning_rate': 2.021e-05, 'epoch': 2.25} 60%|█████▉ | 5970/10000 [21:43:21<14:23:48, 12.86s/it] 60%|█████▉ | 5971/10000 [21:43:34<14:23:23, 12.86s/it] {'loss': 0.0042, 'learning_rate': 2.0205e-05, 'epoch': 2.25} 60%|█████▉ | 5971/10000 [21:43:34<14:23:23, 12.86s/it] 60%|█████▉ | 5972/10000 [21:43:47<14:23:20, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.0200000000000003e-05, 'epoch': 2.25} 60%|█████▉ | 5972/10000 
[21:43:47<14:23:20, 12.86s/it] 60%|█████▉ | 5973/10000 [21:44:00<14:24:51, 12.89s/it] {'loss': 0.0039, 'learning_rate': 2.0195e-05, 'epoch': 2.25} 60%|█████▉ | 5973/10000 [21:44:00<14:24:51, 12.89s/it] 60%|█████▉ | 5974/10000 [21:44:13<14:25:14, 12.89s/it] {'loss': 0.0029, 'learning_rate': 2.019e-05, 'epoch': 2.25} 60%|█████▉ | 5974/10000 [21:44:13<14:25:14, 12.89s/it] 60%|█████▉ | 5975/10000 [21:44:26<14:24:36, 12.89s/it] {'loss': 0.0042, 'learning_rate': 2.0185e-05, 'epoch': 2.25} 60%|█████▉ | 5975/10000 [21:44:26<14:24:36, 12.89s/it] 60%|█████▉ | 5976/10000 [21:44:38<14:24:12, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.0180000000000003e-05, 'epoch': 2.25} 60%|█████▉ | 5976/10000 [21:44:38<14:24:12, 12.89s/it] 60%|█████▉ | 5977/10000 [21:44:51<14:24:24, 12.89s/it] {'loss': 0.0053, 'learning_rate': 2.0175000000000003e-05, 'epoch': 2.25} 60%|█████▉ | 5977/10000 [21:44:51<14:24:24, 12.89s/it] 60%|█████▉ | 5978/10000 [21:45:04<14:23:42, 12.88s/it] {'loss': 0.0054, 'learning_rate': 2.017e-05, 'epoch': 2.25} 60%|█████▉ | 5978/10000 [21:45:04<14:23:42, 12.88s/it] 60%|█████▉ | 5979/10000 [21:45:17<14:25:29, 12.91s/it] {'loss': 0.0065, 'learning_rate': 2.0165e-05, 'epoch': 2.25} 60%|█████▉ | 5979/10000 [21:45:17<14:25:29, 12.91s/it] 60%|█████▉ | 5980/10000 [21:45:30<14:22:53, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.016e-05, 'epoch': 2.25} 60%|█████▉ | 5980/10000 [21:45:30<14:22:53, 12.88s/it] 60%|█████▉ | 5981/10000 [21:45:43<14:22:39, 12.88s/it] {'loss': 0.004, 'learning_rate': 2.0155000000000003e-05, 'epoch': 2.25} 60%|█████▉ | 5981/10000 [21:45:43<14:22:39, 12.88s/it] 60%|█████▉ | 5982/10000 [21:45:56<14:23:57, 12.90s/it] {'loss': 0.0049, 'learning_rate': 2.0150000000000002e-05, 'epoch': 2.25} 60%|█████▉ | 5982/10000 [21:45:56<14:23:57, 12.90s/it] 60%|█████▉ | 5983/10000 [21:46:09<14:21:12, 12.86s/it] {'loss': 0.0047, 'learning_rate': 2.0145e-05, 'epoch': 2.25} 60%|█████▉ | 5983/10000 [21:46:09<14:21:12, 12.86s/it] 60%|█████▉ | 5984/10000 [21:46:22<14:22:18, 
12.88s/it] {'loss': 0.0057, 'learning_rate': 2.014e-05, 'epoch': 2.25} 60%|█████▉ | 5984/10000 [21:46:22<14:22:18, 12.88s/it] 60%|█████▉ | 5985/10000 [21:46:34<14:22:09, 12.88s/it] {'loss': 0.0048, 'learning_rate': 2.0135e-05, 'epoch': 2.26} 60%|█████▉ | 5985/10000 [21:46:34<14:22:09, 12.88s/it] 60%|█████▉ | 5986/10000 [21:46:47<14:21:09, 12.87s/it] {'loss': 0.0052, 'learning_rate': 2.0130000000000002e-05, 'epoch': 2.26} 60%|█████▉ | 5986/10000 [21:46:47<14:21:09, 12.87s/it] 60%|█████▉ | 5987/10000 [21:47:00<14:19:52, 12.86s/it] {'loss': 0.0053, 'learning_rate': 2.0125e-05, 'epoch': 2.26} 60%|█████▉ | 5987/10000 [21:47:00<14:19:52, 12.86s/it] 60%|█████▉ | 5988/10000 [21:47:13<14:19:34, 12.86s/it] {'loss': 0.0063, 'learning_rate': 2.012e-05, 'epoch': 2.26} 60%|█████▉ | 5988/10000 [21:47:13<14:19:34, 12.86s/it] 60%|█████▉ | 5989/10000 [21:47:26<14:20:31, 12.87s/it] {'loss': 0.0047, 'learning_rate': 2.0115e-05, 'epoch': 2.26} 60%|█████▉ | 5989/10000 [21:47:26<14:20:31, 12.87s/it] 60%|█████▉ | 5990/10000 [21:47:39<14:20:52, 12.88s/it] {'loss': 0.0053, 'learning_rate': 2.0110000000000002e-05, 'epoch': 2.26} 60%|█████▉ | 5990/10000 [21:47:39<14:20:52, 12.88s/it] 60%|█████▉ | 5991/10000 [21:47:52<14:21:38, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.0105e-05, 'epoch': 2.26} 60%|█████▉ | 5991/10000 [21:47:52<14:21:38, 12.90s/it] 60%|█████▉ | 5992/10000 [21:48:05<14:20:32, 12.88s/it] {'loss': 0.0046, 'learning_rate': 2.01e-05, 'epoch': 2.26} 60%|█████▉ | 5992/10000 [21:48:05<14:20:32, 12.88s/it] 60%|█████▉ | 5993/10000 [21:48:17<14:20:17, 12.88s/it] {'loss': 0.0051, 'learning_rate': 2.0095e-05, 'epoch': 2.26} 60%|█████▉ | 5993/10000 [21:48:17<14:20:17, 12.88s/it] 60%|█████▉ | 5994/10000 [21:48:30<14:20:20, 12.89s/it] {'loss': 0.0035, 'learning_rate': 2.009e-05, 'epoch': 2.26} 60%|█████▉ | 5994/10000 [21:48:30<14:20:20, 12.89s/it] 60%|█████▉ | 5995/10000 [21:48:43<14:19:41, 12.88s/it] {'loss': 0.0052, 'learning_rate': 2.0085e-05, 'epoch': 2.26} 60%|█████▉ | 5995/10000 
[21:48:43<14:19:41, 12.88s/it] 60%|█████▉ | 5996/10000 [21:48:56<14:20:03, 12.89s/it] {'loss': 0.004, 'learning_rate': 2.008e-05, 'epoch': 2.26} 60%|█████▉ | 5996/10000 [21:48:56<14:20:03, 12.89s/it] 60%|█████▉ | 5997/10000 [21:49:09<14:19:49, 12.89s/it] {'loss': 0.0049, 'learning_rate': 2.0075000000000003e-05, 'epoch': 2.26} 60%|█████▉ | 5997/10000 [21:49:09<14:19:49, 12.89s/it] 60%|█████▉ | 5998/10000 [21:49:22<14:19:36, 12.89s/it] {'loss': 0.0047, 'learning_rate': 2.007e-05, 'epoch': 2.26} 60%|█████▉ | 5998/10000 [21:49:22<14:19:36, 12.89s/it] 60%|█████▉ | 5999/10000 [21:49:35<14:19:04, 12.88s/it] {'loss': 0.003, 'learning_rate': 2.0065000000000002e-05, 'epoch': 2.26} 60%|█████▉ | 5999/10000 [21:49:35<14:19:04, 12.88s/it] 60%|██████ | 6000/10000 [21:49:48<14:19:44, 12.90s/it] {'loss': 0.0039, 'learning_rate': 2.006e-05, 'epoch': 2.26} 60%|██████ | 6000/10000 [21:49:48<14:19:44, 12.90s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-06 18:14:46,299 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/config.json [INFO|configuration_utils.py:364] 2024-11-06 18:14:46,302 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-06 18:15:41,699 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-06 18:15:41,702 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-06 18:15:41,704 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/special_tokens_map.json [2024-11-06 18:15:41,752] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step6000 is about to be saved! 
[2024-11-06 18:15:41,810] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/global_step6000/mp_rank_00_model_states.pt
[2024-11-06 18:15:41,831] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/global_step6000/mp_rank_00_model_states.pt...
[2024-11-06 18:16:51,642] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/global_step6000/mp_rank_00_model_states.pt.
[2024-11-06 18:16:51,735] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-06 18:18:36,081] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-06 18:18:36,377] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-6000/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-06 18:18:36,377] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step6000 is ready now!
60%|██████ | 6001/10000 [21:53:51<91:01:55, 81.95s/it] {'loss': 0.0052, 'learning_rate': 2.0055e-05, 'epoch': 2.26} 60%|██████ | 6001/10000 [21:53:51<91:01:55, 81.95s/it] 60%|██████ | 6002/10000 [21:54:03<67:54:53, 61.15s/it] {'loss': 0.0053, 'learning_rate': 2.0050000000000003e-05, 'epoch': 2.26} 60%|██████ | 6002/10000 [21:54:03<67:54:53, 61.15s/it] 60%|██████ | 6003/10000 [21:54:16<51:48:36, 46.66s/it] {'loss': 0.0041, 'learning_rate': 2.0045e-05, 'epoch': 2.26} 60%|██████ | 6003/10000 [21:54:16<51:48:36, 46.66s/it] 60%|██████ | 6004/10000 [21:54:29<40:29:28, 36.48s/it] {'loss': 0.0056, 'learning_rate': 2.004e-05, 'epoch': 2.26} 60%|██████ | 6004/10000 [21:54:29<40:29:28, 36.48s/it] 60%|██████ | 6005/10000 [21:54:42<32:34:27, 29.35s/it] {'loss': 0.0061, 'learning_rate': 2.0035e-05, 'epoch': 2.26} 60%|██████ | 6005/10000 [21:54:42<32:34:27, 29.35s/it] 60%|██████ | 6006/10000 [21:54:54<27:04:07, 24.40s/it] {'loss': 0.0046, 'learning_rate': 2.0030000000000003e-05, 'epoch': 2.26} 60%|██████ | 6006/10000 [21:54:54<27:04:07, 24.40s/it] 60%|██████ | 6007/10000 [21:55:07<23:12:04, 20.92s/it] {'loss': 0.004, 'learning_rate': 2.0025000000000002e-05, 'epoch': 2.26} 60%|██████ | 6007/10000 [21:55:07<23:12:04, 20.92s/it] 60%|██████ | 6008/10000 [21:55:20<20:28:50, 18.47s/it] {'loss': 0.0061, 'learning_rate': 2.002e-05, 'epoch': 2.26} 60%|██████ | 6008/10000 [21:55:20<20:28:50, 18.47s/it] 60%|██████ | 6009/10000 [21:55:33<18:35:20, 16.77s/it] {'loss': 0.0046, 'learning_rate': 2.0015e-05, 'epoch': 2.26} 60%|██████ | 6009/10000 [21:55:33<18:35:20, 16.77s/it] 60%|██████ | 6010/10000 [21:55:46<17:17:03, 15.59s/it] {'loss': 0.004, 'learning_rate': 2.001e-05, 'epoch': 2.26} 60%|██████ | 6010/10000 [21:55:46<17:17:03, 15.59s/it] 60%|██████ | 6011/10000 [21:55:59<16:22:19, 14.78s/it] {'loss': 0.004, 'learning_rate': 2.0005000000000002e-05, 'epoch': 2.26} 60%|██████ | 6011/10000 [21:55:59<16:22:19, 14.78s/it] 60%|██████ | 6012/10000 [21:56:11<15:44:14, 14.21s/it] {'loss': 0.0043, 
'learning_rate': 2e-05, 'epoch': 2.27} 60%|██████ | 6012/10000 [21:56:11<15:44:14, 14.21s/it] 60%|██████ | 6013/10000 [21:56:24<15:17:12, 13.80s/it] {'loss': 0.0052, 'learning_rate': 1.9995e-05, 'epoch': 2.27} 60%|██████ | 6013/10000 [21:56:24<15:17:12, 13.80s/it] 60%|██████ | 6014/10000 [21:56:37<14:57:56, 13.52s/it] {'loss': 0.0043, 'learning_rate': 1.999e-05, 'epoch': 2.27} 60%|██████ | 6014/10000 [21:56:37<14:57:56, 13.52s/it] 60%|██████ | 6015/10000 [21:56:50<14:44:14, 13.31s/it] {'loss': 0.0068, 'learning_rate': 1.9985000000000003e-05, 'epoch': 2.27} 60%|██████ | 6015/10000 [21:56:50<14:44:14, 13.31s/it] 60%|██████ | 6016/10000 [21:57:03<14:35:55, 13.19s/it] {'loss': 0.0032, 'learning_rate': 1.9980000000000002e-05, 'epoch': 2.27} 60%|██████ | 6016/10000 [21:57:03<14:35:55, 13.19s/it] 60%|██████ | 6017/10000 [21:57:16<14:30:03, 13.11s/it] {'loss': 0.0069, 'learning_rate': 1.9975e-05, 'epoch': 2.27} 60%|██████ | 6017/10000 [21:57:16<14:30:03, 13.11s/it] 60%|██████ | 6018/10000 [21:57:29<14:25:06, 13.04s/it] {'loss': 0.0045, 'learning_rate': 1.997e-05, 'epoch': 2.27} 60%|██████ | 6018/10000 [21:57:29<14:25:06, 13.04s/it] 60%|██████ | 6019/10000 [21:57:41<14:20:26, 12.97s/it] {'loss': 0.0046, 'learning_rate': 1.9965e-05, 'epoch': 2.27} 60%|██████ | 6019/10000 [21:57:41<14:20:26, 12.97s/it] 60%|██████ | 6020/10000 [21:57:54<14:17:32, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.9960000000000002e-05, 'epoch': 2.27} 60%|██████ | 6020/10000 [21:57:54<14:17:32, 12.93s/it] 60%|██████ | 6021/10000 [21:58:07<14:16:50, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.9955e-05, 'epoch': 2.27} 60%|██████ | 6021/10000 [21:58:07<14:16:50, 12.92s/it] 60%|██████ | 6022/10000 [21:58:20<14:15:15, 12.90s/it] {'loss': 0.0057, 'learning_rate': 1.995e-05, 'epoch': 2.27} 60%|██████ | 6022/10000 [21:58:20<14:15:15, 12.90s/it] 60%|██████ | 6023/10000 [21:58:33<14:15:09, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.9945e-05, 'epoch': 2.27} 60%|██████ | 6023/10000 [21:58:33<14:15:09, 
12.90s/it] 60%|██████ | 6024/10000 [21:58:46<14:13:42, 12.88s/it] {'loss': 0.0055, 'learning_rate': 1.994e-05, 'epoch': 2.27} 60%|██████ | 6024/10000 [21:58:46<14:13:42, 12.88s/it] 60%|██████ | 6025/10000 [21:58:59<14:12:36, 12.87s/it] {'loss': 0.0047, 'learning_rate': 1.9935e-05, 'epoch': 2.27} 60%|██████ | 6025/10000 [21:58:59<14:12:36, 12.87s/it] 60%|██████ | 6026/10000 [21:59:12<14:12:57, 12.88s/it] {'loss': 0.0043, 'learning_rate': 1.993e-05, 'epoch': 2.27} 60%|██████ | 6026/10000 [21:59:12<14:12:57, 12.88s/it] 60%|██████ | 6027/10000 [21:59:24<14:13:16, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.9925000000000003e-05, 'epoch': 2.27} 60%|██████ | 6027/10000 [21:59:24<14:13:16, 12.89s/it] 60%|██████ | 6028/10000 [21:59:37<14:12:52, 12.88s/it] {'loss': 0.0046, 'learning_rate': 1.992e-05, 'epoch': 2.27} 60%|██████ | 6028/10000 [21:59:37<14:12:52, 12.88s/it] 60%|██████ | 6029/10000 [21:59:50<14:11:43, 12.87s/it] {'loss': 0.0059, 'learning_rate': 1.9915e-05, 'epoch': 2.27} 60%|██████ | 6029/10000 [21:59:50<14:11:43, 12.87s/it] 60%|██████ | 6030/10000 [22:00:03<14:11:05, 12.86s/it] {'loss': 0.0042, 'learning_rate': 1.991e-05, 'epoch': 2.27} 60%|██████ | 6030/10000 [22:00:03<14:11:05, 12.86s/it] 60%|██████ | 6031/10000 [22:00:16<14:10:58, 12.86s/it] {'loss': 0.0054, 'learning_rate': 1.9905e-05, 'epoch': 2.27} 60%|██████ | 6031/10000 [22:00:16<14:10:58, 12.86s/it] 60%|██████ | 6032/10000 [22:00:29<14:09:18, 12.84s/it] {'loss': 0.0056, 'learning_rate': 1.9900000000000003e-05, 'epoch': 2.27} 60%|██████ | 6032/10000 [22:00:29<14:09:18, 12.84s/it] 60%|██████ | 6033/10000 [22:00:42<14:10:37, 12.87s/it] {'loss': 0.0045, 'learning_rate': 1.9895e-05, 'epoch': 2.27} 60%|██████ | 6033/10000 [22:00:42<14:10:37, 12.87s/it] 60%|██████ | 6034/10000 [22:00:54<14:11:29, 12.88s/it] {'loss': 0.0049, 'learning_rate': 1.989e-05, 'epoch': 2.27} 60%|██████ | 6034/10000 [22:00:55<14:11:29, 12.88s/it] 60%|██████ | 6035/10000 [22:01:07<14:11:23, 12.88s/it] {'loss': 0.0056, 'learning_rate': 
1.9885e-05, 'epoch': 2.27} 60%|██████ | 6035/10000 [22:01:07<14:11:23, 12.88s/it] 60%|██████ | 6036/10000 [22:01:20<14:10:57, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.9880000000000003e-05, 'epoch': 2.27} 60%|██████ | 6036/10000 [22:01:20<14:10:57, 12.88s/it] 60%|██████ | 6037/10000 [22:01:33<14:10:54, 12.88s/it] {'loss': 0.0051, 'learning_rate': 1.9875000000000002e-05, 'epoch': 2.27} 60%|██████ | 6037/10000 [22:01:33<14:10:54, 12.88s/it] 60%|██████ | 6038/10000 [22:01:46<14:09:49, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.987e-05, 'epoch': 2.28} 60%|██████ | 6038/10000 [22:01:46<14:09:49, 12.87s/it] 60%|██████ | 6039/10000 [22:01:59<14:10:53, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.9865e-05, 'epoch': 2.28} 60%|██████ | 6039/10000 [22:01:59<14:10:53, 12.89s/it] 60%|██████ | 6040/10000 [22:02:12<14:09:13, 12.87s/it] {'loss': 0.0044, 'learning_rate': 1.986e-05, 'epoch': 2.28} 60%|██████ | 6040/10000 [22:02:12<14:09:13, 12.87s/it] 60%|██████ | 6041/10000 [22:02:25<14:11:15, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.9855000000000002e-05, 'epoch': 2.28} 60%|██████ | 6041/10000 [22:02:25<14:11:15, 12.90s/it] 60%|██████ | 6042/10000 [22:02:38<14:10:03, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.985e-05, 'epoch': 2.28} 60%|██████ | 6042/10000 [22:02:38<14:10:03, 12.89s/it] 60%|██████ | 6043/10000 [22:02:50<14:09:07, 12.88s/it] {'loss': 0.0038, 'learning_rate': 1.9845e-05, 'epoch': 2.28} 60%|██████ | 6043/10000 [22:02:50<14:09:07, 12.88s/it] 60%|██████ | 6044/10000 [22:03:03<14:09:26, 12.88s/it] {'loss': 0.0045, 'learning_rate': 1.984e-05, 'epoch': 2.28} 60%|██████ | 6044/10000 [22:03:03<14:09:26, 12.88s/it] 60%|██████ | 6045/10000 [22:03:16<14:07:40, 12.86s/it] {'loss': 0.0058, 'learning_rate': 1.9835000000000002e-05, 'epoch': 2.28} 60%|██████ | 6045/10000 [22:03:16<14:07:40, 12.86s/it] 60%|██████ | 6046/10000 [22:03:29<14:07:07, 12.85s/it] {'loss': 0.0054, 'learning_rate': 1.983e-05, 'epoch': 2.28} 60%|██████ | 6046/10000 [22:03:29<14:07:07, 
12.85s/it] 60%|██████ | 6047/10000 [22:03:42<14:09:25, 12.89s/it] {'loss': 0.0058, 'learning_rate': 1.9825e-05, 'epoch': 2.28} 60%|██████ | 6047/10000 [22:03:42<14:09:25, 12.89s/it] 60%|██████ | 6048/10000 [22:03:55<14:07:41, 12.87s/it] {'loss': 0.0051, 'learning_rate': 1.982e-05, 'epoch': 2.28} 60%|██████ | 6048/10000 [22:03:55<14:07:41, 12.87s/it] 60%|██████ | 6049/10000 [22:04:08<14:07:38, 12.87s/it] {'loss': 0.0047, 'learning_rate': 1.9815e-05, 'epoch': 2.28} 60%|██████ | 6049/10000 [22:04:08<14:07:38, 12.87s/it] 60%|██████ | 6050/10000 [22:04:21<14:08:00, 12.88s/it] {'loss': 0.0052, 'learning_rate': 1.9810000000000002e-05, 'epoch': 2.28} 60%|██████ | 6050/10000 [22:04:21<14:08:00, 12.88s/it] 61%|██████ | 6051/10000 [22:04:33<14:07:02, 12.87s/it] {'loss': 0.0044, 'learning_rate': 1.9805e-05, 'epoch': 2.28} 61%|██████ | 6051/10000 [22:04:33<14:07:02, 12.87s/it] 61%|██████ | 6052/10000 [22:04:46<14:06:12, 12.86s/it] {'loss': 0.0044, 'learning_rate': 1.9800000000000004e-05, 'epoch': 2.28} 61%|██████ | 6052/10000 [22:04:46<14:06:12, 12.86s/it] 61%|██████ | 6053/10000 [22:04:59<14:06:00, 12.86s/it] {'loss': 0.0043, 'learning_rate': 1.9795e-05, 'epoch': 2.28} 61%|██████ | 6053/10000 [22:04:59<14:06:00, 12.86s/it] 61%|██████ | 6054/10000 [22:05:12<14:06:53, 12.88s/it] {'loss': 0.0045, 'learning_rate': 1.979e-05, 'epoch': 2.28} 61%|██████ | 6054/10000 [22:05:12<14:06:53, 12.88s/it] 61%|██████ | 6055/10000 [22:05:25<14:05:28, 12.86s/it] {'loss': 0.0053, 'learning_rate': 1.9785e-05, 'epoch': 2.28} 61%|██████ | 6055/10000 [22:05:25<14:05:28, 12.86s/it] 61%|██████ | 6056/10000 [22:05:38<14:05:39, 12.86s/it] {'loss': 0.0036, 'learning_rate': 1.978e-05, 'epoch': 2.28} 61%|██████ | 6056/10000 [22:05:38<14:05:39, 12.86s/it] 61%|██████ | 6057/10000 [22:05:51<14:07:14, 12.89s/it] {'loss': 0.0035, 'learning_rate': 1.9775000000000003e-05, 'epoch': 2.28} 61%|██████ | 6057/10000 [22:05:51<14:07:14, 12.89s/it] 61%|██████ | 6058/10000 [22:06:04<14:08:08, 12.91s/it] {'loss': 0.0049, 
'learning_rate': 1.977e-05, 'epoch': 2.28} 61%|██████ | 6058/10000 [22:06:04<14:08:08, 12.91s/it] 61%|██████ | 6059/10000 [22:06:17<14:08:30, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.9765e-05, 'epoch': 2.28} 61%|██████ | 6059/10000 [22:06:17<14:08:30, 12.92s/it] 61%|██████ | 6060/10000 [22:06:29<14:08:52, 12.93s/it] {'loss': 0.0057, 'learning_rate': 1.976e-05, 'epoch': 2.28} 61%|██████ | 6060/10000 [22:06:30<14:08:52, 12.93s/it] 61%|██████ | 6061/10000 [22:06:42<14:08:52, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.9755e-05, 'epoch': 2.28} 61%|██████ | 6061/10000 [22:06:42<14:08:52, 12.93s/it] 61%|██████ | 6062/10000 [22:06:55<14:08:51, 12.93s/it] {'loss': 0.0039, 'learning_rate': 1.9750000000000002e-05, 'epoch': 2.28} 61%|██████ | 6062/10000 [22:06:55<14:08:51, 12.93s/it] 61%|██████ | 6063/10000 [22:07:08<14:06:03, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.9744999999999998e-05, 'epoch': 2.28} 61%|██████ | 6063/10000 [22:07:08<14:06:03, 12.89s/it] 61%|██████ | 6064/10000 [22:07:21<14:06:45, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.974e-05, 'epoch': 2.28} 61%|██████ | 6064/10000 [22:07:21<14:06:45, 12.91s/it] 61%|██████ | 6065/10000 [22:07:34<14:06:13, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.9735e-05, 'epoch': 2.29} 61%|██████ | 6065/10000 [22:07:34<14:06:13, 12.90s/it] 61%|██████ | 6066/10000 [22:07:47<14:05:57, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.9730000000000003e-05, 'epoch': 2.29} 61%|██████ | 6066/10000 [22:07:47<14:05:57, 12.90s/it] 61%|██████ | 6067/10000 [22:08:00<14:04:45, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.9725000000000002e-05, 'epoch': 2.29} 61%|██████ | 6067/10000 [22:08:00<14:04:45, 12.89s/it] 61%|██████ | 6068/10000 [22:08:13<14:03:17, 12.87s/it] {'loss': 0.0042, 'learning_rate': 1.972e-05, 'epoch': 2.29} 61%|██████ | 6068/10000 [22:08:13<14:03:17, 12.87s/it] 61%|██████ | 6069/10000 [22:08:25<14:02:36, 12.86s/it] {'loss': 0.0038, 'learning_rate': 1.9715e-05, 'epoch': 2.29} 61%|██████ | 6069/10000 
[22:08:25<14:02:36, 12.86s/it] 61%|██████ | 6070/10000 [22:08:38<14:01:40, 12.85s/it] {'loss': 0.0046, 'learning_rate': 1.971e-05, 'epoch': 2.29} 61%|██████ | 6070/10000 [22:08:38<14:01:40, 12.85s/it] 61%|██████ | 6071/10000 [22:08:51<14:00:18, 12.83s/it] {'loss': 0.0047, 'learning_rate': 1.9705000000000002e-05, 'epoch': 2.29} 61%|██████ | 6071/10000 [22:08:51<14:00:18, 12.83s/it] 61%|██████ | 6072/10000 [22:09:04<14:00:03, 12.83s/it] {'loss': 0.0042, 'learning_rate': 1.97e-05, 'epoch': 2.29} 61%|██████ | 6072/10000 [22:09:04<14:00:03, 12.83s/it] 61%|██████ | 6073/10000 [22:09:17<14:01:12, 12.85s/it] {'loss': 0.0046, 'learning_rate': 1.9695e-05, 'epoch': 2.29} 61%|██████ | 6073/10000 [22:09:17<14:01:12, 12.85s/it] 61%|██████ | 6074/10000 [22:09:30<14:02:19, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.969e-05, 'epoch': 2.29} 61%|██████ | 6074/10000 [22:09:30<14:02:19, 12.87s/it] 61%|██████ | 6075/10000 [22:09:43<14:02:19, 12.88s/it] {'loss': 0.0056, 'learning_rate': 1.9685000000000002e-05, 'epoch': 2.29} 61%|██████ | 6075/10000 [22:09:43<14:02:19, 12.88s/it] 61%|██████ | 6076/10000 [22:09:56<14:03:35, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.968e-05, 'epoch': 2.29} 61%|██████ | 6076/10000 [22:09:56<14:03:35, 12.90s/it] 61%|██████ | 6077/10000 [22:10:09<14:05:30, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.9675e-05, 'epoch': 2.29} 61%|██████ | 6077/10000 [22:10:09<14:05:30, 12.93s/it] 61%|██████ | 6078/10000 [22:10:21<14:04:29, 12.92s/it] {'loss': 0.0059, 'learning_rate': 1.9670000000000003e-05, 'epoch': 2.29} 61%|██████ | 6078/10000 [22:10:21<14:04:29, 12.92s/it] 61%|██████ | 6079/10000 [22:10:34<14:04:59, 12.93s/it] {'loss': 0.0062, 'learning_rate': 1.9665e-05, 'epoch': 2.29} 61%|██████ | 6079/10000 [22:10:34<14:04:59, 12.93s/it] 61%|██████ | 6080/10000 [22:10:47<14:03:30, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.966e-05, 'epoch': 2.29} 61%|██████ | 6080/10000 [22:10:47<14:03:30, 12.91s/it] 61%|██████ | 6081/10000 [22:11:00<14:03:47, 12.92s/it] 
{'loss': 0.0037, 'learning_rate': 1.9655e-05, 'epoch': 2.29} 61%|██████ | 6081/10000 [22:11:00<14:03:47, 12.92s/it] 61%|██████ | 6082/10000 [22:11:13<14:04:23, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.9650000000000003e-05, 'epoch': 2.29} 61%|██████ | 6082/10000 [22:11:13<14:04:23, 12.93s/it] 61%|██████ | 6083/10000 [22:11:26<14:04:18, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.9645000000000002e-05, 'epoch': 2.29} 61%|██████ | 6083/10000 [22:11:26<14:04:18, 12.93s/it] 61%|██████ | 6084/10000 [22:11:39<14:04:16, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.9640000000000002e-05, 'epoch': 2.29} 61%|██████ | 6084/10000 [22:11:39<14:04:16, 12.94s/it] 61%|██████ | 6085/10000 [22:11:52<14:02:34, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.9635e-05, 'epoch': 2.29} 61%|██████ | 6085/10000 [22:11:52<14:02:34, 12.91s/it] 61%|██████ | 6086/10000 [22:12:05<14:00:41, 12.89s/it] {'loss': 0.0057, 'learning_rate': 1.963e-05, 'epoch': 2.29} 61%|██████ | 6086/10000 [22:12:05<14:00:41, 12.89s/it] 61%|██████ | 6087/10000 [22:12:18<14:01:46, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.9625000000000003e-05, 'epoch': 2.29} 61%|██████ | 6087/10000 [22:12:18<14:01:46, 12.91s/it] 61%|██████ | 6088/10000 [22:12:31<14:01:10, 12.90s/it] {'loss': 0.0054, 'learning_rate': 1.9620000000000002e-05, 'epoch': 2.29} 61%|██████ | 6088/10000 [22:12:31<14:01:10, 12.90s/it] 61%|██████ | 6089/10000 [22:12:43<14:01:02, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.9615e-05, 'epoch': 2.29} 61%|██████ | 6089/10000 [22:12:43<14:01:02, 12.90s/it] 61%|██████ | 6090/10000 [22:12:56<14:01:13, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.961e-05, 'epoch': 2.29} 61%|██████ | 6090/10000 [22:12:56<14:01:13, 12.91s/it] 61%|██████ | 6091/10000 [22:13:09<13:58:52, 12.88s/it] {'loss': 0.0048, 'learning_rate': 1.9605e-05, 'epoch': 2.3} 61%|██████ | 6091/10000 [22:13:09<13:58:52, 12.88s/it] 61%|██████ | 6092/10000 [22:13:22<14:00:15, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.9600000000000002e-05, 
'epoch': 2.3} 61%|██████ | 6092/10000 [22:13:22<14:00:15, 12.90s/it] 61%|██████ | 6093/10000 [22:13:35<14:00:53, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.9595e-05, 'epoch': 2.3} 61%|██████ | 6093/10000 [22:13:35<14:00:53, 12.91s/it] 61%|██████ | 6094/10000 [22:13:48<13:59:04, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.959e-05, 'epoch': 2.3} 61%|██████ | 6094/10000 [22:13:48<13:59:04, 12.89s/it] 61%|██████ | 6095/10000 [22:14:01<14:00:24, 12.91s/it] {'loss': 0.0033, 'learning_rate': 1.9585e-05, 'epoch': 2.3} 61%|██████ | 6095/10000 [22:14:01<14:00:24, 12.91s/it] 61%|██████ | 6096/10000 [22:14:14<14:00:17, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.9580000000000002e-05, 'epoch': 2.3} 61%|██████ | 6096/10000 [22:14:14<14:00:17, 12.91s/it] 61%|██████ | 6097/10000 [22:14:27<13:57:35, 12.88s/it] {'loss': 0.0053, 'learning_rate': 1.9575e-05, 'epoch': 2.3} 61%|██████ | 6097/10000 [22:14:27<13:57:35, 12.88s/it] 61%|██████ | 6098/10000 [22:14:39<13:56:30, 12.86s/it] {'loss': 0.0039, 'learning_rate': 1.957e-05, 'epoch': 2.3} 61%|██████ | 6098/10000 [22:14:39<13:56:30, 12.86s/it] 61%|██████ | 6099/10000 [22:14:52<13:56:37, 12.87s/it] {'loss': 0.0031, 'learning_rate': 1.9565e-05, 'epoch': 2.3} 61%|██████ | 6099/10000 [22:14:52<13:56:37, 12.87s/it] 61%|██████ | 6100/10000 [22:15:05<13:56:36, 12.87s/it] {'loss': 0.0044, 'learning_rate': 1.956e-05, 'epoch': 2.3} 61%|██████ | 6100/10000 [22:15:05<13:56:36, 12.87s/it] 61%|██████ | 6101/10000 [22:15:18<13:57:23, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.9555e-05, 'epoch': 2.3} 61%|██████ | 6101/10000 [22:15:18<13:57:23, 12.89s/it] 61%|██████ | 6102/10000 [22:15:31<13:57:59, 12.90s/it] {'loss': 0.0063, 'learning_rate': 1.955e-05, 'epoch': 2.3} 61%|██████ | 6102/10000 [22:15:31<13:57:59, 12.90s/it] 61%|██████ | 6103/10000 [22:15:44<13:57:24, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.9545000000000003e-05, 'epoch': 2.3} 61%|██████ | 6103/10000 [22:15:44<13:57:24, 12.89s/it] 61%|██████ | 6104/10000 
[22:15:57<13:56:54, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.954e-05, 'epoch': 2.3} 61%|██████ | 6104/10000 [22:15:57<13:56:54, 12.89s/it] 61%|██████ | 6105/10000 [22:16:10<13:57:14, 12.90s/it] {'loss': 0.0035, 'learning_rate': 1.9535000000000002e-05, 'epoch': 2.3} 61%|██████ | 6105/10000 [22:16:10<13:57:14, 12.90s/it] 61%|██████ | 6106/10000 [22:16:23<13:57:29, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.953e-05, 'epoch': 2.3} 61%|██████ | 6106/10000 [22:16:23<13:57:29, 12.90s/it] 61%|██████ | 6107/10000 [22:16:36<13:58:29, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.9525e-05, 'epoch': 2.3} 61%|██████ | 6107/10000 [22:16:36<13:58:29, 12.92s/it] 61%|██████ | 6108/10000 [22:16:48<13:57:47, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.9520000000000003e-05, 'epoch': 2.3} 61%|██████ | 6108/10000 [22:16:49<13:57:47, 12.92s/it] 61%|██████ | 6109/10000 [22:17:01<13:58:04, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.9515e-05, 'epoch': 2.3} 61%|██████ | 6109/10000 [22:17:01<13:58:04, 12.92s/it] 61%|██████ | 6110/10000 [22:17:14<13:58:41, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.951e-05, 'epoch': 2.3} 61%|██████ | 6110/10000 [22:17:14<13:58:41, 12.94s/it] 61%|██████ | 6111/10000 [22:17:27<13:57:03, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.9505e-05, 'epoch': 2.3} 61%|██████ | 6111/10000 [22:17:27<13:57:03, 12.91s/it] 61%|██████ | 6112/10000 [22:17:40<13:55:16, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.9500000000000003e-05, 'epoch': 2.3} 61%|██████ | 6112/10000 [22:17:40<13:55:16, 12.89s/it] 61%|██████ | 6113/10000 [22:17:53<13:53:51, 12.87s/it] {'loss': 0.0053, 'learning_rate': 1.9495000000000002e-05, 'epoch': 2.3} 61%|██████ | 6113/10000 [22:17:53<13:53:51, 12.87s/it] 61%|██████ | 6114/10000 [22:18:06<13:53:37, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.949e-05, 'epoch': 2.3} 61%|██████ | 6114/10000 [22:18:06<13:53:37, 12.87s/it] 61%|██████ | 6115/10000 [22:18:19<13:53:58, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.9485e-05, 'epoch': 
2.3} 61%|██████ | 6115/10000 [22:18:19<13:53:58, 12.88s/it] 61%|██████ | 6116/10000 [22:18:32<13:53:28, 12.88s/it] {'loss': 0.0046, 'learning_rate': 1.948e-05, 'epoch': 2.3} 61%|██████ | 6116/10000 [22:18:32<13:53:28, 12.88s/it] 61%|██████ | 6117/10000 [22:18:44<13:53:11, 12.87s/it] {'loss': 0.0055, 'learning_rate': 1.9475000000000002e-05, 'epoch': 2.3} 61%|██████ | 6117/10000 [22:18:44<13:53:11, 12.87s/it] 61%|██████ | 6118/10000 [22:18:57<13:51:57, 12.86s/it] {'loss': 0.0062, 'learning_rate': 1.947e-05, 'epoch': 2.31} 61%|██████ | 6118/10000 [22:18:57<13:51:57, 12.86s/it] 61%|██████ | 6119/10000 [22:19:10<13:51:24, 12.85s/it] {'loss': 0.0053, 'learning_rate': 1.9465e-05, 'epoch': 2.31} 61%|██████ | 6119/10000 [22:19:10<13:51:24, 12.85s/it] 61%|██████ | 6120/10000 [22:19:23<13:52:44, 12.88s/it] {'loss': 0.0038, 'learning_rate': 1.946e-05, 'epoch': 2.31} 61%|██████ | 6120/10000 [22:19:23<13:52:44, 12.88s/it] 61%|██████ | 6121/10000 [22:19:36<13:52:18, 12.87s/it] {'loss': 0.0049, 'learning_rate': 1.9455000000000003e-05, 'epoch': 2.31} 61%|██████ | 6121/10000 [22:19:36<13:52:18, 12.87s/it] 61%|██████ | 6122/10000 [22:19:49<13:50:48, 12.85s/it] {'loss': 0.0058, 'learning_rate': 1.9450000000000002e-05, 'epoch': 2.31} 61%|██████ | 6122/10000 [22:19:49<13:50:48, 12.85s/it] 61%|██████ | 6123/10000 [22:20:02<13:50:25, 12.85s/it] {'loss': 0.0054, 'learning_rate': 1.9445e-05, 'epoch': 2.31} 61%|██████ | 6123/10000 [22:20:02<13:50:25, 12.85s/it] 61%|██████ | 6124/10000 [22:20:14<13:50:24, 12.85s/it] {'loss': 0.0052, 'learning_rate': 1.944e-05, 'epoch': 2.31} 61%|██████ | 6124/10000 [22:20:14<13:50:24, 12.85s/it] 61%|██████▏ | 6125/10000 [22:20:27<13:51:27, 12.87s/it] {'loss': 0.0036, 'learning_rate': 1.9435e-05, 'epoch': 2.31} 61%|██████▏ | 6125/10000 [22:20:27<13:51:27, 12.87s/it] 61%|██████▏ | 6126/10000 [22:20:40<13:51:11, 12.87s/it] {'loss': 0.0037, 'learning_rate': 1.9430000000000002e-05, 'epoch': 2.31} 61%|██████▏ | 6126/10000 [22:20:40<13:51:11, 12.87s/it] 61%|██████▏ 
| 6127/10000 [22:20:53<13:50:09, 12.86s/it] {'loss': 0.006, 'learning_rate': 1.9425e-05, 'epoch': 2.31} 61%|██████▏ | 6127/10000 [22:20:53<13:50:09, 12.86s/it] 61%|██████▏ | 6128/10000 [22:21:06<13:50:34, 12.87s/it] {'loss': 0.0053, 'learning_rate': 1.942e-05, 'epoch': 2.31} 61%|██████▏ | 6128/10000 [22:21:06<13:50:34, 12.87s/it] 61%|██████▏ | 6129/10000 [22:21:19<13:49:16, 12.85s/it] {'loss': 0.0047, 'learning_rate': 1.9415e-05, 'epoch': 2.31} 61%|██████▏ | 6129/10000 [22:21:19<13:49:16, 12.85s/it] 61%|██████▏ | 6130/10000 [22:21:32<13:51:04, 12.88s/it] {'loss': 0.005, 'learning_rate': 1.941e-05, 'epoch': 2.31} 61%|██████▏ | 6130/10000 [22:21:32<13:51:04, 12.88s/it] 61%|██████▏ | 6131/10000 [22:21:45<13:51:24, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.9405e-05, 'epoch': 2.31} 61%|██████▏ | 6131/10000 [22:21:45<13:51:24, 12.89s/it] 61%|██████▏ | 6132/10000 [22:21:57<13:51:10, 12.89s/it] {'loss': 0.0059, 'learning_rate': 1.94e-05, 'epoch': 2.31} 61%|██████▏ | 6132/10000 [22:21:58<13:51:10, 12.89s/it] 61%|██████▏ | 6133/10000 [22:22:10<13:51:17, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.9395000000000003e-05, 'epoch': 2.31} 61%|██████▏ | 6133/10000 [22:22:10<13:51:17, 12.90s/it] 61%|██████▏ | 6134/10000 [22:22:23<13:51:00, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.939e-05, 'epoch': 2.31} 61%|██████▏ | 6134/10000 [22:22:23<13:51:00, 12.90s/it] 61%|██████▏ | 6135/10000 [22:22:36<13:51:09, 12.90s/it] {'loss': 0.0051, 'learning_rate': 1.9385e-05, 'epoch': 2.31} 61%|██████▏ | 6135/10000 [22:22:36<13:51:09, 12.90s/it] 61%|██████▏ | 6136/10000 [22:22:49<13:50:22, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.938e-05, 'epoch': 2.31} 61%|██████▏ | 6136/10000 [22:22:49<13:50:22, 12.89s/it] 61%|██████▏ | 6137/10000 [22:23:02<13:50:13, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.9375e-05, 'epoch': 2.31} 61%|██████▏ | 6137/10000 [22:23:02<13:50:13, 12.90s/it] 61%|██████▏ | 6138/10000 [22:23:15<13:51:04, 12.91s/it] {'loss': 0.0042, 'learning_rate': 
1.9370000000000003e-05, 'epoch': 2.31} 61%|██████▏ | 6138/10000 [22:23:15<13:51:04, 12.91s/it] 61%|██████▏ | 6139/10000 [22:23:28<13:50:16, 12.90s/it] {'loss': 0.0056, 'learning_rate': 1.9365e-05, 'epoch': 2.31} 61%|██████▏ | 6139/10000 [22:23:28<13:50:16, 12.90s/it] 61%|██████▏ | 6140/10000 [22:23:41<13:50:22, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.936e-05, 'epoch': 2.31} 61%|██████▏ | 6140/10000 [22:23:41<13:50:22, 12.91s/it] 61%|██████▏ | 6141/10000 [22:23:54<13:49:43, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.9355e-05, 'epoch': 2.31} 61%|██████▏ | 6141/10000 [22:23:54<13:49:43, 12.90s/it] 61%|██████▏ | 6142/10000 [22:24:07<13:49:19, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.9350000000000003e-05, 'epoch': 2.31} 61%|██████▏ | 6142/10000 [22:24:07<13:49:19, 12.90s/it] 61%|██████▏ | 6143/10000 [22:24:19<13:47:48, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.9345000000000002e-05, 'epoch': 2.31} 61%|██████▏ | 6143/10000 [22:24:19<13:47:48, 12.88s/it] 61%|██████▏ | 6144/10000 [22:24:32<13:47:32, 12.88s/it] {'loss': 0.0052, 'learning_rate': 1.934e-05, 'epoch': 2.31} 61%|██████▏ | 6144/10000 [22:24:32<13:47:32, 12.88s/it] 61%|██████▏ | 6145/10000 [22:24:45<13:48:14, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.9335e-05, 'epoch': 2.32} 61%|██████▏ | 6145/10000 [22:24:45<13:48:14, 12.89s/it] 61%|██████▏ | 6146/10000 [22:24:58<13:48:39, 12.90s/it] {'loss': 0.0034, 'learning_rate': 1.933e-05, 'epoch': 2.32} 61%|██████▏ | 6146/10000 [22:24:58<13:48:39, 12.90s/it] 61%|██████▏ | 6147/10000 [22:25:11<13:50:03, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.9325000000000002e-05, 'epoch': 2.32} 61%|██████▏ | 6147/10000 [22:25:11<13:50:03, 12.93s/it] 61%|██████▏ | 6148/10000 [22:25:24<13:48:36, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.932e-05, 'epoch': 2.32} 61%|██████▏ | 6148/10000 [22:25:24<13:48:36, 12.91s/it] 61%|██████▏ | 6149/10000 [22:25:37<13:48:51, 12.91s/it] {'loss': 0.0054, 'learning_rate': 1.9315e-05, 'epoch': 2.32} 61%|██████▏ | 6149/10000 
[22:25:37<13:48:51, 12.91s/it] 62%|██████▏ | 6150/10000 [22:25:50<13:47:48, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.931e-05, 'epoch': 2.32} 62%|██████▏ | 6150/10000 [22:25:50<13:47:48, 12.90s/it] 62%|██████▏ | 6151/10000 [22:26:03<13:48:09, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.9305000000000002e-05, 'epoch': 2.32} 62%|██████▏ | 6151/10000 [22:26:03<13:48:09, 12.91s/it] 62%|██████▏ | 6152/10000 [22:26:16<13:48:35, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.93e-05, 'epoch': 2.32} 62%|██████▏ | 6152/10000 [22:26:16<13:48:35, 12.92s/it] 62%|██████▏ | 6153/10000 [22:26:29<13:49:41, 12.94s/it] {'loss': 0.0036, 'learning_rate': 1.9295e-05, 'epoch': 2.32} 62%|██████▏ | 6153/10000 [22:26:29<13:49:41, 12.94s/it] 62%|██████▏ | 6154/10000 [22:26:42<13:49:33, 12.94s/it] {'loss': 0.0053, 'learning_rate': 1.929e-05, 'epoch': 2.32} 62%|██████▏ | 6154/10000 [22:26:42<13:49:33, 12.94s/it] 62%|██████▏ | 6155/10000 [22:26:54<13:47:34, 12.91s/it] {'loss': 0.0057, 'learning_rate': 1.9285e-05, 'epoch': 2.32} 62%|██████▏ | 6155/10000 [22:26:54<13:47:34, 12.91s/it] 62%|██████▏ | 6156/10000 [22:27:07<13:45:56, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.9280000000000002e-05, 'epoch': 2.32} 62%|██████▏ | 6156/10000 [22:27:07<13:45:56, 12.89s/it] 62%|██████▏ | 6157/10000 [22:27:20<13:47:40, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.9275e-05, 'epoch': 2.32} 62%|██████▏ | 6157/10000 [22:27:20<13:47:40, 12.92s/it] 62%|██████▏ | 6158/10000 [22:27:33<13:46:05, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.9270000000000004e-05, 'epoch': 2.32} 62%|██████▏ | 6158/10000 [22:27:33<13:46:05, 12.90s/it] 62%|██████▏ | 6159/10000 [22:27:46<13:45:24, 12.89s/it] {'loss': 0.0039, 'learning_rate': 1.9265e-05, 'epoch': 2.32} 62%|██████▏ | 6159/10000 [22:27:46<13:45:24, 12.89s/it] 62%|██████▏ | 6160/10000 [22:27:59<13:45:09, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.9260000000000002e-05, 'epoch': 2.32} 62%|██████▏ | 6160/10000 [22:27:59<13:45:09, 12.89s/it] 62%|██████▏ | 
6161/10000 [22:28:12<13:45:45, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.9255e-05, 'epoch': 2.32} 62%|██████▏ | 6161/10000 [22:28:12<13:45:45, 12.91s/it] 62%|██████▏ | 6162/10000 [22:28:25<13:45:56, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.925e-05, 'epoch': 2.32} 62%|██████▏ | 6162/10000 [22:28:25<13:45:56, 12.91s/it] 62%|██████▏ | 6163/10000 [22:28:38<13:45:20, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.9245000000000003e-05, 'epoch': 2.32} 62%|██████▏ | 6163/10000 [22:28:38<13:45:20, 12.91s/it] 62%|██████▏ | 6164/10000 [22:28:50<13:43:57, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.924e-05, 'epoch': 2.32} 62%|██████▏ | 6164/10000 [22:28:50<13:43:57, 12.89s/it] 62%|██████▏ | 6165/10000 [22:29:03<13:45:25, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.9235e-05, 'epoch': 2.32} 62%|██████▏ | 6165/10000 [22:29:03<13:45:25, 12.91s/it] 62%|██████▏ | 6166/10000 [22:29:16<13:44:46, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.923e-05, 'epoch': 2.32} 62%|██████▏ | 6166/10000 [22:29:16<13:44:46, 12.91s/it] 62%|██████▏ | 6167/10000 [22:29:29<13:43:36, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.9225e-05, 'epoch': 2.32} 62%|██████▏ | 6167/10000 [22:29:29<13:43:36, 12.89s/it] 62%|██████▏ | 6168/10000 [22:29:42<13:42:37, 12.88s/it] {'loss': 0.005, 'learning_rate': 1.9220000000000002e-05, 'epoch': 2.32} 62%|██████▏ | 6168/10000 [22:29:42<13:42:37, 12.88s/it] 62%|██████▏ | 6169/10000 [22:29:55<13:42:52, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.9214999999999998e-05, 'epoch': 2.32} 62%|██████▏ | 6169/10000 [22:29:55<13:42:52, 12.89s/it] 62%|██████▏ | 6170/10000 [22:30:08<13:41:21, 12.87s/it] {'loss': 0.0057, 'learning_rate': 1.921e-05, 'epoch': 2.32} 62%|██████▏ | 6170/10000 [22:30:08<13:41:21, 12.87s/it] 62%|██████▏ | 6171/10000 [22:30:21<13:41:40, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.9205e-05, 'epoch': 2.33} 62%|██████▏ | 6171/10000 [22:30:21<13:41:40, 12.88s/it] 62%|██████▏ | 6172/10000 [22:30:33<13:41:09, 12.87s/it] {'loss': 0.0044, 
'learning_rate': 1.9200000000000003e-05, 'epoch': 2.33} 62%|██████▏ | 6172/10000 [22:30:33<13:41:09, 12.87s/it] 62%|██████▏ | 6173/10000 [22:30:46<13:40:40, 12.87s/it] {'loss': 0.0049, 'learning_rate': 1.9195000000000002e-05, 'epoch': 2.33} 62%|██████▏ | 6173/10000 [22:30:46<13:40:40, 12.87s/it] 62%|██████▏ | 6174/10000 [22:30:59<13:39:19, 12.85s/it] {'loss': 0.0046, 'learning_rate': 1.919e-05, 'epoch': 2.33} 62%|██████▏ | 6174/10000 [22:30:59<13:39:19, 12.85s/it] 62%|██████▏ | 6175/10000 [22:31:12<13:40:05, 12.86s/it] {'loss': 0.0046, 'learning_rate': 1.9185e-05, 'epoch': 2.33} 62%|██████▏ | 6175/10000 [22:31:12<13:40:05, 12.86s/it] 62%|██████▏ | 6176/10000 [22:31:25<13:39:11, 12.85s/it] {'loss': 0.0047, 'learning_rate': 1.918e-05, 'epoch': 2.33} 62%|██████▏ | 6176/10000 [22:31:25<13:39:11, 12.85s/it] 62%|██████▏ | 6177/10000 [22:31:38<13:40:00, 12.87s/it] {'loss': 0.0038, 'learning_rate': 1.9175000000000002e-05, 'epoch': 2.33} 62%|██████▏ | 6177/10000 [22:31:38<13:40:00, 12.87s/it] 62%|██████▏ | 6178/10000 [22:31:51<13:40:11, 12.88s/it] {'loss': 0.0057, 'learning_rate': 1.917e-05, 'epoch': 2.33} 62%|██████▏ | 6178/10000 [22:31:51<13:40:11, 12.88s/it] 62%|██████▏ | 6179/10000 [22:32:04<13:40:01, 12.88s/it] {'loss': 0.0046, 'learning_rate': 1.9165e-05, 'epoch': 2.33} 62%|██████▏ | 6179/10000 [22:32:04<13:40:01, 12.88s/it] 62%|██████▏ | 6180/10000 [22:32:16<13:40:10, 12.88s/it] {'loss': 0.0036, 'learning_rate': 1.916e-05, 'epoch': 2.33} 62%|██████▏ | 6180/10000 [22:32:16<13:40:10, 12.88s/it] 62%|██████▏ | 6181/10000 [22:32:29<13:40:33, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.9155000000000002e-05, 'epoch': 2.33} 62%|██████▏ | 6181/10000 [22:32:29<13:40:33, 12.89s/it] 62%|██████▏ | 6182/10000 [22:32:42<13:41:23, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.915e-05, 'epoch': 2.33} 62%|██████▏ | 6182/10000 [22:32:42<13:41:23, 12.91s/it] 62%|██████▏ | 6183/10000 [22:32:55<13:40:22, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.9145e-05, 'epoch': 2.33} 
62%|██████▏ | 6183/10000 [22:32:55<13:40:22, 12.90s/it] 62%|██████▏ | 6184/10000 [22:33:08<13:39:06, 12.88s/it] {'loss': 0.0049, 'learning_rate': 1.914e-05, 'epoch': 2.33} 62%|██████▏ | 6184/10000 [22:33:08<13:39:06, 12.88s/it] 62%|██████▏ | 6185/10000 [22:33:21<13:40:36, 12.91s/it] {'loss': 0.0035, 'learning_rate': 1.9135e-05, 'epoch': 2.33} 62%|██████▏ | 6185/10000 [22:33:21<13:40:36, 12.91s/it] 62%|██████▏ | 6186/10000 [22:33:34<13:39:35, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.913e-05, 'epoch': 2.33} 62%|██████▏ | 6186/10000 [22:33:34<13:39:35, 12.89s/it] 62%|██████▏ | 6187/10000 [22:33:47<13:38:35, 12.88s/it] {'loss': 0.0055, 'learning_rate': 1.9125e-05, 'epoch': 2.33} 62%|██████▏ | 6187/10000 [22:33:47<13:38:35, 12.88s/it] 62%|██████▏ | 6188/10000 [22:34:00<13:37:17, 12.86s/it] {'loss': 0.0046, 'learning_rate': 1.9120000000000003e-05, 'epoch': 2.33} 62%|██████▏ | 6188/10000 [22:34:00<13:37:17, 12.86s/it] 62%|██████▏ | 6189/10000 [22:34:12<13:35:42, 12.84s/it] {'loss': 0.0046, 'learning_rate': 1.9115e-05, 'epoch': 2.33} 62%|██████▏ | 6189/10000 [22:34:12<13:35:42, 12.84s/it] 62%|██████▏ | 6190/10000 [22:34:25<13:34:59, 12.83s/it] {'loss': 0.0045, 'learning_rate': 1.911e-05, 'epoch': 2.33} 62%|██████▏ | 6190/10000 [22:34:25<13:34:59, 12.83s/it] 62%|██████▏ | 6191/10000 [22:34:38<13:36:28, 12.86s/it] {'loss': 0.0035, 'learning_rate': 1.9105e-05, 'epoch': 2.33} 62%|██████▏ | 6191/10000 [22:34:38<13:36:28, 12.86s/it] 62%|██████▏ | 6192/10000 [22:34:51<13:36:18, 12.86s/it] {'loss': 0.0048, 'learning_rate': 1.91e-05, 'epoch': 2.33} 62%|██████▏ | 6192/10000 [22:34:51<13:36:18, 12.86s/it] 62%|██████▏ | 6193/10000 [22:35:04<13:34:45, 12.84s/it] {'loss': 0.0047, 'learning_rate': 1.9095000000000003e-05, 'epoch': 2.33} 62%|██████▏ | 6193/10000 [22:35:04<13:34:45, 12.84s/it] 62%|██████▏ | 6194/10000 [22:35:17<13:35:57, 12.86s/it] {'loss': 0.0046, 'learning_rate': 1.909e-05, 'epoch': 2.33} 62%|██████▏ | 6194/10000 [22:35:17<13:35:57, 12.86s/it] 62%|██████▏ | 
6195/10000 [22:35:30<13:37:46, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.9085e-05, 'epoch': 2.33} 62%|██████▏ | 6195/10000 [22:35:30<13:37:46, 12.90s/it] 62%|██████▏ | 6196/10000 [22:35:42<13:36:09, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.908e-05, 'epoch': 2.33} 62%|██████▏ | 6196/10000 [22:35:42<13:36:09, 12.87s/it] 62%|██████▏ | 6197/10000 [22:35:55<13:35:24, 12.86s/it] {'loss': 0.0041, 'learning_rate': 1.9075000000000003e-05, 'epoch': 2.33} 62%|██████▏ | 6197/10000 [22:35:55<13:35:24, 12.86s/it] 62%|██████▏ | 6198/10000 [22:36:08<13:34:16, 12.85s/it] {'loss': 0.0048, 'learning_rate': 1.9070000000000002e-05, 'epoch': 2.34} 62%|██████▏ | 6198/10000 [22:36:08<13:34:16, 12.85s/it] 62%|██████▏ | 6199/10000 [22:36:21<13:34:18, 12.85s/it] {'loss': 0.0045, 'learning_rate': 1.9064999999999998e-05, 'epoch': 2.34} 62%|██████▏ | 6199/10000 [22:36:21<13:34:18, 12.85s/it] 62%|██████▏ | 6200/10000 [22:36:34<13:34:31, 12.86s/it] {'loss': 0.0044, 'learning_rate': 1.906e-05, 'epoch': 2.34} 62%|██████▏ | 6200/10000 [22:36:34<13:34:31, 12.86s/it] 62%|██████▏ | 6201/10000 [22:36:47<13:34:54, 12.87s/it] {'loss': 0.0048, 'learning_rate': 1.9055e-05, 'epoch': 2.34} 62%|██████▏ | 6201/10000 [22:36:47<13:34:54, 12.87s/it] 62%|██████▏ | 6202/10000 [22:37:00<13:34:11, 12.86s/it] {'loss': 0.0043, 'learning_rate': 1.9050000000000002e-05, 'epoch': 2.34} 62%|██████▏ | 6202/10000 [22:37:00<13:34:11, 12.86s/it] 62%|██████▏ | 6203/10000 [22:37:12<13:33:27, 12.85s/it] {'loss': 0.0053, 'learning_rate': 1.9045e-05, 'epoch': 2.34} 62%|██████▏ | 6203/10000 [22:37:12<13:33:27, 12.85s/it] 62%|██████▏ | 6204/10000 [22:37:25<13:33:26, 12.86s/it] {'loss': 0.0046, 'learning_rate': 1.904e-05, 'epoch': 2.34} 62%|██████▏ | 6204/10000 [22:37:25<13:33:26, 12.86s/it] 62%|██████▏ | 6205/10000 [22:37:38<13:34:51, 12.88s/it] {'loss': 0.0043, 'learning_rate': 1.9035e-05, 'epoch': 2.34} 62%|██████▏ | 6205/10000 [22:37:38<13:34:51, 12.88s/it] 62%|██████▏ | 6206/10000 [22:37:51<13:36:34, 12.91s/it] {'loss': 
0.005, 'learning_rate': 1.903e-05, 'epoch': 2.34} 62%|██████▏ | 6206/10000 [22:37:51<13:36:34, 12.91s/it] 62%|██████▏ | 6207/10000 [22:38:04<13:36:10, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.9025e-05, 'epoch': 2.34} 62%|██████▏ | 6207/10000 [22:38:04<13:36:10, 12.91s/it] 62%|██████▏ | 6208/10000 [22:38:17<13:36:55, 12.93s/it] {'loss': 0.0039, 'learning_rate': 1.902e-05, 'epoch': 2.34} 62%|██████▏ | 6208/10000 [22:38:17<13:36:55, 12.93s/it] 62%|██████▏ | 6209/10000 [22:38:30<13:36:22, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.9015000000000003e-05, 'epoch': 2.34} 62%|██████▏ | 6209/10000 [22:38:30<13:36:22, 12.92s/it] 62%|██████▏ | 6210/10000 [22:38:43<13:36:29, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.901e-05, 'epoch': 2.34} 62%|██████▏ | 6210/10000 [22:38:43<13:36:29, 12.93s/it] 62%|██████▏ | 6211/10000 [22:38:56<13:36:34, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.9005000000000002e-05, 'epoch': 2.34} 62%|██████▏ | 6211/10000 [22:38:56<13:36:34, 12.93s/it] 62%|██████▏ | 6212/10000 [22:39:09<13:35:49, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.9e-05, 'epoch': 2.34} 62%|██████▏ | 6212/10000 [22:39:09<13:35:49, 12.92s/it] 62%|██████▏ | 6213/10000 [22:39:22<13:35:33, 12.92s/it] {'loss': 0.0047, 'learning_rate': 1.8995e-05, 'epoch': 2.34} 62%|██████▏ | 6213/10000 [22:39:22<13:35:33, 12.92s/it] 62%|██████▏ | 6214/10000 [22:40:17<26:57:50, 25.64s/it] {'loss': 0.0049, 'learning_rate': 1.8990000000000003e-05, 'epoch': 2.34} 62%|██████▏ | 6214/10000 [22:40:17<26:57:50, 25.64s/it] 62%|██████▏ | 6215/10000 [22:40:30<22:56:25, 21.82s/it] {'loss': 0.0039, 'learning_rate': 1.8985e-05, 'epoch': 2.34} 62%|██████▏ | 6215/10000 [22:40:30<22:56:25, 21.82s/it] 62%|██████▏ | 6216/10000 [22:40:43<20:07:25, 19.15s/it] {'loss': 0.0046, 'learning_rate': 1.898e-05, 'epoch': 2.34} 62%|██████▏ | 6216/10000 [22:40:43<20:07:25, 19.15s/it] 62%|██████▏ | 6217/10000 [22:40:56<18:09:31, 17.28s/it] {'loss': 0.0047, 'learning_rate': 1.8975e-05, 'epoch': 2.34} 62%|██████▏ | 
6217/10000 [22:40:56<18:09:31, 17.28s/it] 62%|██████▏ | 6218/10000 [22:41:09<16:47:43, 15.99s/it] {'loss': 0.0047, 'learning_rate': 1.8970000000000003e-05, 'epoch': 2.34} 62%|██████▏ | 6218/10000 [22:41:09<16:47:43, 15.99s/it] 62%|██████▏ | 6219/10000 [22:41:22<15:50:12, 15.08s/it] {'loss': 0.0044, 'learning_rate': 1.8965000000000002e-05, 'epoch': 2.34} 62%|██████▏ | 6219/10000 [22:41:22<15:50:12, 15.08s/it] 62%|██████▏ | 6220/10000 [22:41:35<15:10:01, 14.44s/it] {'loss': 0.0039, 'learning_rate': 1.896e-05, 'epoch': 2.34} 62%|██████▏ | 6220/10000 [22:41:35<15:10:01, 14.44s/it] 62%|██████▏ | 6221/10000 [22:41:48<14:42:14, 14.01s/it] {'loss': 0.0042, 'learning_rate': 1.8955e-05, 'epoch': 2.34} 62%|██████▏ | 6221/10000 [22:41:48<14:42:14, 14.01s/it] 62%|██████▏ | 6222/10000 [22:42:00<14:21:00, 13.67s/it] {'loss': 0.0051, 'learning_rate': 1.895e-05, 'epoch': 2.34} 62%|██████▏ | 6222/10000 [22:42:01<14:21:00, 13.67s/it] 62%|██████▏ | 6223/10000 [22:42:13<14:05:49, 13.44s/it] {'loss': 0.0042, 'learning_rate': 1.8945000000000002e-05, 'epoch': 2.34} 62%|██████▏ | 6223/10000 [22:42:13<14:05:49, 13.44s/it] 62%|██████▏ | 6224/10000 [22:42:26<13:55:23, 13.27s/it] {'loss': 0.005, 'learning_rate': 1.894e-05, 'epoch': 2.35} 62%|██████▏ | 6224/10000 [22:42:26<13:55:23, 13.27s/it] 62%|██████▏ | 6225/10000 [22:42:39<13:47:09, 13.15s/it] {'loss': 0.0053, 'learning_rate': 1.8935e-05, 'epoch': 2.35} 62%|██████▏ | 6225/10000 [22:42:39<13:47:09, 13.15s/it] 62%|██████▏ | 6226/10000 [22:42:52<13:42:20, 13.07s/it] {'loss': 0.0042, 'learning_rate': 1.893e-05, 'epoch': 2.35} 62%|██████▏ | 6226/10000 [22:42:52<13:42:20, 13.07s/it] 62%|██████▏ | 6227/10000 [22:43:05<13:37:50, 13.01s/it] {'loss': 0.0063, 'learning_rate': 1.8925000000000003e-05, 'epoch': 2.35} 62%|██████▏ | 6227/10000 [22:43:05<13:37:50, 13.01s/it] 62%|██████▏ | 6228/10000 [22:43:18<13:36:17, 12.98s/it] {'loss': 0.0046, 'learning_rate': 1.8920000000000002e-05, 'epoch': 2.35} 62%|██████▏ | 6228/10000 [22:43:18<13:36:17, 12.98s/it] 
62%|██████▏ | 6229/10000 [22:43:31<13:34:43, 12.96s/it] {'loss': 0.0053, 'learning_rate': 1.8915e-05, 'epoch': 2.35} 62%|██████▏ | 6229/10000 [22:43:31<13:34:43, 12.96s/it] 62%|██████▏ | 6230/10000 [22:43:44<13:32:12, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.891e-05, 'epoch': 2.35} 62%|██████▏ | 6230/10000 [22:43:44<13:32:12, 12.93s/it] 62%|██████▏ | 6231/10000 [22:43:56<13:31:35, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.8905e-05, 'epoch': 2.35} 62%|██████▏ | 6231/10000 [22:43:56<13:31:35, 12.92s/it] 62%|██████▏ | 6232/10000 [22:44:09<13:31:32, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.8900000000000002e-05, 'epoch': 2.35} 62%|██████▏ | 6232/10000 [22:44:09<13:31:32, 12.92s/it] 62%|██████▏ | 6233/10000 [22:44:22<13:32:12, 12.94s/it] {'loss': 0.0036, 'learning_rate': 1.8895e-05, 'epoch': 2.35} 62%|██████▏ | 6233/10000 [22:44:22<13:32:12, 12.94s/it] 62%|██████▏ | 6234/10000 [22:44:35<13:31:19, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.8890000000000004e-05, 'epoch': 2.35} 62%|██████▏ | 6234/10000 [22:44:35<13:31:19, 12.93s/it] 62%|██████▏ | 6235/10000 [22:44:48<13:31:11, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.8885e-05, 'epoch': 2.35} 62%|██████▏ | 6235/10000 [22:44:48<13:31:11, 12.93s/it] 62%|██████▏ | 6236/10000 [22:45:01<13:29:46, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.888e-05, 'epoch': 2.35} 62%|██████▏ | 6236/10000 [22:45:01<13:29:46, 12.91s/it] 62%|██████▏ | 6237/10000 [22:45:14<13:31:05, 12.93s/it] {'loss': 0.0058, 'learning_rate': 1.8875e-05, 'epoch': 2.35} 62%|██████▏ | 6237/10000 [22:45:14<13:31:05, 12.93s/it] 62%|██████▏ | 6238/10000 [22:45:27<13:30:41, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.887e-05, 'epoch': 2.35} 62%|██████▏ | 6238/10000 [22:45:27<13:30:41, 12.93s/it] 62%|██████▏ | 6239/10000 [22:45:40<13:30:59, 12.94s/it] {'loss': 0.0055, 'learning_rate': 1.8865000000000003e-05, 'epoch': 2.35} 62%|██████▏ | 6239/10000 [22:45:40<13:30:59, 12.94s/it] 62%|██████▏ | 6240/10000 [22:45:53<13:31:01, 12.94s/it] {'loss': 
0.004, 'learning_rate': 1.886e-05, 'epoch': 2.35} 62%|██████▏ | 6240/10000 [22:45:53<13:31:01, 12.94s/it] 62%|██████▏ | 6241/10000 [22:46:06<13:30:14, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.8855e-05, 'epoch': 2.35} 62%|██████▏ | 6241/10000 [22:46:06<13:30:14, 12.93s/it] 62%|██████▏ | 6242/10000 [22:46:19<13:29:38, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.885e-05, 'epoch': 2.35} 62%|██████▏ | 6242/10000 [22:46:19<13:29:38, 12.93s/it] 62%|██████▏ | 6243/10000 [22:46:32<13:28:16, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.8845e-05, 'epoch': 2.35} 62%|██████▏ | 6243/10000 [22:46:32<13:28:16, 12.91s/it] 62%|██████▏ | 6244/10000 [22:46:44<13:27:32, 12.90s/it] {'loss': 0.0058, 'learning_rate': 1.8840000000000003e-05, 'epoch': 2.35} 62%|██████▏ | 6244/10000 [22:46:44<13:27:32, 12.90s/it] 62%|██████▏ | 6245/10000 [22:46:57<13:27:16, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.8835e-05, 'epoch': 2.35} 62%|██████▏ | 6245/10000 [22:46:57<13:27:16, 12.90s/it] 62%|██████▏ | 6246/10000 [22:47:10<13:27:58, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.883e-05, 'epoch': 2.35} 62%|██████▏ | 6246/10000 [22:47:10<13:27:58, 12.91s/it] 62%|██████▏ | 6247/10000 [22:47:23<13:28:48, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.8825e-05, 'epoch': 2.35} 62%|██████▏ | 6247/10000 [22:47:23<13:28:48, 12.93s/it] 62%|██████▏ | 6248/10000 [22:47:36<13:29:25, 12.94s/it] {'loss': 0.0051, 'learning_rate': 1.8820000000000003e-05, 'epoch': 2.35} 62%|██████▏ | 6248/10000 [22:47:36<13:29:25, 12.94s/it] 62%|██████▏ | 6249/10000 [22:47:49<13:27:58, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.8815000000000002e-05, 'epoch': 2.35} 62%|██████▏ | 6249/10000 [22:47:49<13:27:58, 12.92s/it] 62%|██████▎ | 6250/10000 [22:48:02<13:27:09, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.881e-05, 'epoch': 2.35} 62%|██████▎ | 6250/10000 [22:48:02<13:27:09, 12.91s/it] 63%|██████▎ | 6251/10000 [22:48:15<13:26:51, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.8805e-05, 'epoch': 2.36} 63%|██████▎ | 
6251/10000 [22:48:15<13:26:51, 12.91s/it] 63%|██████▎ | 6252/10000 [22:48:28<13:26:50, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.88e-05, 'epoch': 2.36} 63%|██████▎ | 6252/10000 [22:48:28<13:26:50, 12.92s/it] 63%|██████▎ | 6253/10000 [22:48:41<13:26:46, 12.92s/it] {'loss': 0.0058, 'learning_rate': 1.8795000000000002e-05, 'epoch': 2.36} 63%|██████▎ | 6253/10000 [22:48:41<13:26:46, 12.92s/it] 63%|██████▎ | 6254/10000 [22:48:54<13:26:17, 12.91s/it] {'loss': 0.0036, 'learning_rate': 1.879e-05, 'epoch': 2.36} 63%|██████▎ | 6254/10000 [22:48:54<13:26:17, 12.91s/it] 63%|██████▎ | 6255/10000 [22:49:07<13:25:11, 12.90s/it] {'loss': 0.0031, 'learning_rate': 1.8785e-05, 'epoch': 2.36} 63%|██████▎ | 6255/10000 [22:49:07<13:25:11, 12.90s/it] 63%|██████▎ | 6256/10000 [22:49:19<13:25:27, 12.91s/it] {'loss': 0.0054, 'learning_rate': 1.878e-05, 'epoch': 2.36} 63%|██████▎ | 6256/10000 [22:49:19<13:25:27, 12.91s/it] 63%|██████▎ | 6257/10000 [22:49:32<13:25:01, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.8775000000000002e-05, 'epoch': 2.36} 63%|██████▎ | 6257/10000 [22:49:32<13:25:01, 12.90s/it] 63%|██████▎ | 6258/10000 [22:49:45<13:26:29, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.877e-05, 'epoch': 2.36} 63%|██████▎ | 6258/10000 [22:49:45<13:26:29, 12.93s/it] 63%|██████▎ | 6259/10000 [22:49:58<13:24:41, 12.91s/it] {'loss': 0.0062, 'learning_rate': 1.8765e-05, 'epoch': 2.36} 63%|██████▎ | 6259/10000 [22:49:58<13:24:41, 12.91s/it] 63%|██████▎ | 6260/10000 [22:50:11<13:23:15, 12.89s/it] {'loss': 0.0038, 'learning_rate': 1.876e-05, 'epoch': 2.36} 63%|██████▎ | 6260/10000 [22:50:11<13:23:15, 12.89s/it] 63%|██████▎ | 6261/10000 [22:50:24<13:21:06, 12.86s/it] {'loss': 0.0045, 'learning_rate': 1.8755e-05, 'epoch': 2.36} 63%|██████▎ | 6261/10000 [22:50:24<13:21:06, 12.86s/it] 63%|██████▎ | 6262/10000 [22:50:37<13:21:57, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.8750000000000002e-05, 'epoch': 2.36} 63%|██████▎ | 6262/10000 [22:50:37<13:21:57, 12.87s/it] 63%|██████▎ | 6263/10000 
[22:50:50<13:23:19, 12.90s/it] {'loss': 0.0036, 'learning_rate': 1.8745e-05, 'epoch': 2.36} 63%|██████▎ | 6263/10000 [22:50:50<13:23:19, 12.90s/it] 63%|██████▎ | 6264/10000 [22:51:03<13:22:49, 12.89s/it] {'loss': 0.0035, 'learning_rate': 1.8740000000000004e-05, 'epoch': 2.36} 63%|██████▎ | 6264/10000 [22:51:03<13:22:49, 12.89s/it] 63%|██████▎ | 6265/10000 [22:51:15<13:22:02, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.8735e-05, 'epoch': 2.36} 63%|██████▎ | 6265/10000 [22:51:15<13:22:02, 12.88s/it] 63%|██████▎ | 6266/10000 [22:51:28<13:23:33, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.8730000000000002e-05, 'epoch': 2.36} 63%|██████▎ | 6266/10000 [22:51:28<13:23:33, 12.91s/it] 63%|██████▎ | 6267/10000 [22:51:41<13:24:12, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.8725e-05, 'epoch': 2.36} 63%|██████▎ | 6267/10000 [22:51:41<13:24:12, 12.93s/it] 63%|██████▎ | 6268/10000 [22:51:54<13:24:27, 12.93s/it] {'loss': 0.0063, 'learning_rate': 1.872e-05, 'epoch': 2.36} 63%|██████▎ | 6268/10000 [22:51:54<13:24:27, 12.93s/it] 63%|██████▎ | 6269/10000 [22:52:07<13:22:52, 12.91s/it] {'loss': 0.0056, 'learning_rate': 1.8715000000000003e-05, 'epoch': 2.36} 63%|██████▎ | 6269/10000 [22:52:07<13:22:52, 12.91s/it] 63%|██████▎ | 6270/10000 [22:52:20<13:23:11, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.871e-05, 'epoch': 2.36} 63%|██████▎ | 6270/10000 [22:52:20<13:23:11, 12.92s/it] 63%|██████▎ | 6271/10000 [22:52:33<13:23:10, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.8705e-05, 'epoch': 2.36} 63%|██████▎ | 6271/10000 [22:52:33<13:23:10, 12.92s/it] 63%|██████▎ | 6272/10000 [22:52:46<13:22:33, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.87e-05, 'epoch': 2.36} 63%|██████▎ | 6272/10000 [22:52:46<13:22:33, 12.92s/it] 63%|██████▎ | 6273/10000 [22:52:59<13:21:49, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.8695e-05, 'epoch': 2.36} 63%|██████▎ | 6273/10000 [22:52:59<13:21:49, 12.91s/it] 63%|██████▎ | 6274/10000 [22:53:12<13:20:50, 12.90s/it] {'loss': 0.0052, 'learning_rate': 
1.8690000000000002e-05, 'epoch': 2.36} 63%|██████▎ | 6274/10000 [22:53:12<13:20:50, 12.90s/it] 63%|██████▎ | 6275/10000 [22:53:25<13:19:50, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.8684999999999998e-05, 'epoch': 2.36} 63%|██████▎ | 6275/10000 [22:53:25<13:19:50, 12.88s/it] 63%|██████▎ | 6276/10000 [22:53:37<13:20:20, 12.89s/it] {'loss': 0.0036, 'learning_rate': 1.868e-05, 'epoch': 2.36} 63%|██████▎ | 6276/10000 [22:53:38<13:20:20, 12.89s/it] 63%|██████▎ | 6277/10000 [22:53:51<13:22:23, 12.93s/it] {'loss': 0.0032, 'learning_rate': 1.8675e-05, 'epoch': 2.37} 63%|██████▎ | 6277/10000 [22:53:51<13:22:23, 12.93s/it] 63%|██████▎ | 6278/10000 [22:54:03<13:22:55, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.8670000000000003e-05, 'epoch': 2.37} 63%|██████▎ | 6278/10000 [22:54:04<13:22:55, 12.94s/it] 63%|██████▎ | 6279/10000 [22:54:16<13:21:05, 12.92s/it] {'loss': 0.0055, 'learning_rate': 1.8665000000000002e-05, 'epoch': 2.37} 63%|██████▎ | 6279/10000 [22:54:16<13:21:05, 12.92s/it] 63%|██████▎ | 6280/10000 [22:54:29<13:20:46, 12.92s/it] {'loss': 0.0062, 'learning_rate': 1.866e-05, 'epoch': 2.37} 63%|██████▎ | 6280/10000 [22:54:29<13:20:46, 12.92s/it] 63%|██████▎ | 6281/10000 [22:54:42<13:19:22, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.8655e-05, 'epoch': 2.37} 63%|██████▎ | 6281/10000 [22:54:42<13:19:22, 12.90s/it] 63%|██████▎ | 6282/10000 [22:54:55<13:20:16, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.865e-05, 'epoch': 2.37} 63%|██████▎ | 6282/10000 [22:54:55<13:20:16, 12.91s/it] 63%|██████▎ | 6283/10000 [22:55:08<13:19:19, 12.90s/it] {'loss': 0.0033, 'learning_rate': 1.8645000000000002e-05, 'epoch': 2.37} 63%|██████▎ | 6283/10000 [22:55:08<13:19:19, 12.90s/it] 63%|██████▎ | 6284/10000 [22:55:21<13:20:39, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.864e-05, 'epoch': 2.37} 63%|██████▎ | 6284/10000 [22:55:21<13:20:39, 12.93s/it] 63%|██████▎ | 6285/10000 [22:55:34<13:19:28, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.8635e-05, 'epoch': 2.37} 63%|██████▎ | 
6285/10000 [22:55:34<13:19:28, 12.91s/it] 63%|██████▎ | 6286/10000 [22:55:47<13:20:03, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.863e-05, 'epoch': 2.37} 63%|██████▎ | 6286/10000 [22:55:47<13:20:03, 12.92s/it] 63%|██████▎ | 6287/10000 [22:56:00<13:20:51, 12.94s/it] {'loss': 0.0037, 'learning_rate': 1.8625000000000002e-05, 'epoch': 2.37} 63%|██████▎ | 6287/10000 [22:56:00<13:20:51, 12.94s/it] 63%|██████▎ | 6288/10000 [22:56:13<13:21:12, 12.95s/it] {'loss': 0.0053, 'learning_rate': 1.862e-05, 'epoch': 2.37} 63%|██████▎ | 6288/10000 [22:56:13<13:21:12, 12.95s/it] 63%|██████▎ | 6289/10000 [22:56:26<13:19:59, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.8615e-05, 'epoch': 2.37} 63%|██████▎ | 6289/10000 [22:56:26<13:19:59, 12.93s/it] 63%|██████▎ | 6290/10000 [22:56:38<13:18:00, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.861e-05, 'epoch': 2.37} 63%|██████▎ | 6290/10000 [22:56:38<13:18:00, 12.91s/it] 63%|██████▎ | 6291/10000 [22:56:51<13:16:32, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.8605e-05, 'epoch': 2.37} 63%|██████▎ | 6291/10000 [22:56:51<13:16:32, 12.89s/it] 63%|██████▎ | 6292/10000 [22:57:04<13:15:51, 12.88s/it] {'loss': 0.0033, 'learning_rate': 1.86e-05, 'epoch': 2.37} 63%|██████▎ | 6292/10000 [22:57:04<13:15:51, 12.88s/it] 63%|██████▎ | 6293/10000 [22:57:17<13:15:34, 12.88s/it] {'loss': 0.0044, 'learning_rate': 1.8595e-05, 'epoch': 2.37} 63%|██████▎ | 6293/10000 [22:57:17<13:15:34, 12.88s/it] 63%|██████▎ | 6294/10000 [22:57:30<13:15:10, 12.87s/it] {'loss': 0.0038, 'learning_rate': 1.8590000000000003e-05, 'epoch': 2.37} 63%|██████▎ | 6294/10000 [22:57:30<13:15:10, 12.87s/it] 63%|██████▎ | 6295/10000 [22:57:43<13:14:25, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.8585e-05, 'epoch': 2.37} 63%|██████▎ | 6295/10000 [22:57:43<13:14:25, 12.87s/it] 63%|██████▎ | 6296/10000 [22:57:56<13:15:12, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.858e-05, 'epoch': 2.37} 63%|██████▎ | 6296/10000 [22:57:56<13:15:12, 12.88s/it] 63%|██████▎ | 6297/10000 
[22:58:09<13:15:16, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.8575e-05, 'epoch': 2.37} 63%|██████▎ | 6297/10000 [22:58:09<13:15:16, 12.89s/it] 63%|██████▎ | 6298/10000 [22:58:21<13:14:34, 12.88s/it] {'loss': 0.0049, 'learning_rate': 1.857e-05, 'epoch': 2.37} 63%|██████▎ | 6298/10000 [22:58:21<13:14:34, 12.88s/it] 63%|██████▎ | 6299/10000 [22:58:34<13:14:37, 12.88s/it] {'loss': 0.0056, 'learning_rate': 1.8565000000000003e-05, 'epoch': 2.37} 63%|██████▎ | 6299/10000 [22:58:34<13:14:37, 12.88s/it] 63%|██████▎ | 6300/10000 [22:58:47<13:16:08, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.856e-05, 'epoch': 2.37} 63%|██████▎ | 6300/10000 [22:58:47<13:16:08, 12.91s/it] 63%|██████▎ | 6301/10000 [22:59:00<13:15:44, 12.91s/it] {'loss': 0.0037, 'learning_rate': 1.8555e-05, 'epoch': 2.37} 63%|██████▎ | 6301/10000 [22:59:00<13:15:44, 12.91s/it] 63%|██████▎ | 6302/10000 [22:59:13<13:15:58, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.855e-05, 'epoch': 2.37} 63%|██████▎ | 6302/10000 [22:59:13<13:15:58, 12.91s/it] 63%|██████▎ | 6303/10000 [22:59:26<13:17:20, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.8545000000000003e-05, 'epoch': 2.37} 63%|██████▎ | 6303/10000 [22:59:26<13:17:20, 12.94s/it] 63%|██████▎ | 6304/10000 [22:59:39<13:17:15, 12.94s/it] {'loss': 0.0063, 'learning_rate': 1.8540000000000002e-05, 'epoch': 2.38} 63%|██████▎ | 6304/10000 [22:59:39<13:17:15, 12.94s/it] 63%|██████▎ | 6305/10000 [22:59:52<13:16:24, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.8535e-05, 'epoch': 2.38} 63%|██████▎ | 6305/10000 [22:59:52<13:16:24, 12.93s/it] 63%|██████▎ | 6306/10000 [23:00:05<13:16:28, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.853e-05, 'epoch': 2.38} 63%|██████▎ | 6306/10000 [23:00:05<13:16:28, 12.94s/it] 63%|██████▎ | 6307/10000 [23:00:18<13:16:31, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.8525e-05, 'epoch': 2.38} 63%|██████▎ | 6307/10000 [23:00:18<13:16:31, 12.94s/it] 63%|██████▎ | 6308/10000 [23:00:31<13:16:15, 12.94s/it] {'loss': 0.0051, 'learning_rate': 
1.8520000000000002e-05, 'epoch': 2.38} 63%|██████▎ | 6308/10000 [23:00:31<13:16:15, 12.94s/it] 63%|██████▎ | 6309/10000 [23:00:44<13:16:01, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.8515e-05, 'epoch': 2.38} 63%|██████▎ | 6309/10000 [23:00:44<13:16:01, 12.94s/it] 63%|██████▎ | 6310/10000 [23:00:57<13:15:28, 12.93s/it] {'loss': 0.0063, 'learning_rate': 1.851e-05, 'epoch': 2.38} 63%|██████▎ | 6310/10000 [23:00:57<13:15:28, 12.93s/it] 63%|██████▎ | 6311/10000 [23:01:09<13:13:37, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.8505e-05, 'epoch': 2.38} 63%|██████▎ | 6311/10000 [23:01:10<13:13:37, 12.91s/it] 63%|██████▎ | 6312/10000 [23:01:22<13:15:17, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.85e-05, 'epoch': 2.38} 63%|██████▎ | 6312/10000 [23:01:23<13:15:17, 12.94s/it] 63%|██████▎ | 6313/10000 [23:01:35<13:15:30, 12.95s/it] {'loss': 0.0054, 'learning_rate': 1.8495e-05, 'epoch': 2.38} 63%|██████▎ | 6313/10000 [23:01:35<13:15:30, 12.95s/it] 63%|██████▎ | 6314/10000 [23:01:48<13:15:06, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.849e-05, 'epoch': 2.38} 63%|██████▎ | 6314/10000 [23:01:48<13:15:06, 12.94s/it] 63%|██████▎ | 6315/10000 [23:02:01<13:13:14, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.8485e-05, 'epoch': 2.38} 63%|██████▎ | 6315/10000 [23:02:01<13:13:14, 12.92s/it] 63%|██████▎ | 6316/10000 [23:02:14<13:12:25, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.848e-05, 'epoch': 2.38} 63%|██████▎ | 6316/10000 [23:02:14<13:12:25, 12.91s/it] 63%|██████▎ | 6317/10000 [23:02:27<13:12:28, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.8475000000000002e-05, 'epoch': 2.38} 63%|██████▎ | 6317/10000 [23:02:27<13:12:28, 12.91s/it] 63%|██████▎ | 6318/10000 [23:02:40<13:10:40, 12.88s/it] {'loss': 0.0043, 'learning_rate': 1.847e-05, 'epoch': 2.38} 63%|██████▎ | 6318/10000 [23:02:40<13:10:40, 12.88s/it] 63%|██████▎ | 6319/10000 [23:02:53<13:10:09, 12.88s/it] {'loss': 0.0046, 'learning_rate': 1.8465e-05, 'epoch': 2.38} 63%|██████▎ | 6319/10000 [23:02:53<13:10:09, 
12.88s/it] 63%|██████▎ | 6320/10000 [23:03:06<13:09:39, 12.87s/it] {'loss': 0.006, 'learning_rate': 1.846e-05, 'epoch': 2.38} 63%|██████▎ | 6320/10000 [23:03:06<13:09:39, 12.87s/it] 63%|██████▎ | 6321/10000 [23:03:18<13:09:05, 12.87s/it] {'loss': 0.0046, 'learning_rate': 1.8455e-05, 'epoch': 2.38} 63%|██████▎ | 6321/10000 [23:03:18<13:09:05, 12.87s/it] 63%|██████▎ | 6322/10000 [23:03:31<13:10:37, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.845e-05, 'epoch': 2.38} 63%|██████▎ | 6322/10000 [23:03:31<13:10:37, 12.90s/it] 63%|██████▎ | 6323/10000 [23:03:44<13:10:48, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.8445e-05, 'epoch': 2.38} 63%|██████▎ | 6323/10000 [23:03:44<13:10:48, 12.90s/it] 63%|██████▎ | 6324/10000 [23:03:57<13:11:51, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.8440000000000003e-05, 'epoch': 2.38} 63%|██████▎ | 6324/10000 [23:03:57<13:11:51, 12.92s/it] 63%|██████▎ | 6325/10000 [23:04:10<13:11:50, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.8435000000000002e-05, 'epoch': 2.38} 63%|██████▎ | 6325/10000 [23:04:10<13:11:50, 12.93s/it] 63%|██████▎ | 6326/10000 [23:04:23<13:10:13, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.843e-05, 'epoch': 2.38} 63%|██████▎ | 6326/10000 [23:04:23<13:10:13, 12.91s/it] 63%|██████▎ | 6327/10000 [23:04:36<13:09:51, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.8425e-05, 'epoch': 2.38} 63%|██████▎ | 6327/10000 [23:04:36<13:09:51, 12.90s/it] 63%|██████▎ | 6328/10000 [23:04:49<13:09:30, 12.90s/it] {'loss': 0.0059, 'learning_rate': 1.842e-05, 'epoch': 2.38} 63%|██████▎ | 6328/10000 [23:04:49<13:09:30, 12.90s/it] 63%|██████▎ | 6329/10000 [23:05:02<13:08:32, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.8415000000000002e-05, 'epoch': 2.38} 63%|██████▎ | 6329/10000 [23:05:02<13:08:32, 12.89s/it] 63%|██████▎ | 6330/10000 [23:05:15<13:07:58, 12.88s/it] {'loss': 0.0046, 'learning_rate': 1.841e-05, 'epoch': 2.39} 63%|██████▎ | 6330/10000 [23:05:15<13:07:58, 12.88s/it] 63%|██████▎ | 6331/10000 [23:05:28<13:09:19, 12.91s/it] 
{'loss': 0.0059, 'learning_rate': 1.8405e-05, 'epoch': 2.39} 63%|██████▎ | 6331/10000 [23:05:28<13:09:19, 12.91s/it] 63%|██████▎ | 6332/10000 [23:05:41<13:10:52, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.84e-05, 'epoch': 2.39} 63%|██████▎ | 6332/10000 [23:05:41<13:10:52, 12.94s/it] 63%|██████▎ | 6333/10000 [23:05:54<13:10:44, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.8395000000000003e-05, 'epoch': 2.39} 63%|██████▎ | 6333/10000 [23:05:54<13:10:44, 12.94s/it] 63%|██████▎ | 6334/10000 [23:06:06<13:08:37, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.8390000000000002e-05, 'epoch': 2.39} 63%|██████▎ | 6334/10000 [23:06:06<13:08:37, 12.91s/it] 63%|██████▎ | 6335/10000 [23:06:19<13:08:56, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.8385e-05, 'epoch': 2.39} 63%|██████▎ | 6335/10000 [23:06:19<13:08:56, 12.92s/it] 63%|██████▎ | 6336/10000 [23:06:32<13:09:16, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.838e-05, 'epoch': 2.39} 63%|██████▎ | 6336/10000 [23:06:32<13:09:16, 12.92s/it] 63%|██████▎ | 6337/10000 [23:06:45<13:09:10, 12.93s/it] {'loss': 0.0056, 'learning_rate': 1.8375e-05, 'epoch': 2.39} 63%|██████▎ | 6337/10000 [23:06:45<13:09:10, 12.93s/it] 63%|██████▎ | 6338/10000 [23:06:58<13:07:16, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.8370000000000002e-05, 'epoch': 2.39} 63%|██████▎ | 6338/10000 [23:06:58<13:07:16, 12.90s/it] 63%|██████▎ | 6339/10000 [23:07:11<13:05:42, 12.88s/it] {'loss': 0.0067, 'learning_rate': 1.8365e-05, 'epoch': 2.39} 63%|██████▎ | 6339/10000 [23:07:11<13:05:42, 12.88s/it] 63%|██████▎ | 6340/10000 [23:07:24<13:05:51, 12.88s/it] {'loss': 0.0054, 'learning_rate': 1.8360000000000004e-05, 'epoch': 2.39} 63%|██████▎ | 6340/10000 [23:07:24<13:05:51, 12.88s/it] 63%|██████▎ | 6341/10000 [23:07:37<13:06:30, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.8355e-05, 'epoch': 2.39} 63%|██████▎ | 6341/10000 [23:07:37<13:06:30, 12.90s/it] 63%|██████▎ | 6342/10000 [23:07:50<13:07:44, 12.92s/it] {'loss': 0.0055, 'learning_rate': 
1.8350000000000002e-05, 'epoch': 2.39} 63%|██████▎ | 6342/10000 [23:07:50<13:07:44, 12.92s/it] 63%|██████▎ | 6343/10000 [23:08:03<13:07:34, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.8345e-05, 'epoch': 2.39} 63%|██████▎ | 6343/10000 [23:08:03<13:07:34, 12.92s/it] 63%|██████▎ | 6344/10000 [23:08:15<13:05:47, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.834e-05, 'epoch': 2.39} 63%|██████▎ | 6344/10000 [23:08:15<13:05:47, 12.90s/it] 63%|██████▎ | 6345/10000 [23:08:28<13:08:03, 12.94s/it] {'loss': 0.0029, 'learning_rate': 1.8335000000000003e-05, 'epoch': 2.39} 63%|██████▎ | 6345/10000 [23:08:28<13:08:03, 12.94s/it] 63%|██████▎ | 6346/10000 [23:08:41<13:07:17, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.833e-05, 'epoch': 2.39} 63%|██████▎ | 6346/10000 [23:08:41<13:07:17, 12.93s/it] 63%|██████▎ | 6347/10000 [23:08:54<13:06:26, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.8325e-05, 'epoch': 2.39} 63%|██████▎ | 6347/10000 [23:08:54<13:06:26, 12.92s/it] 63%|██████▎ | 6348/10000 [23:09:07<13:04:25, 12.89s/it] {'loss': 0.003, 'learning_rate': 1.832e-05, 'epoch': 2.39} 63%|██████▎ | 6348/10000 [23:09:07<13:04:25, 12.89s/it] 63%|██████▎ | 6349/10000 [23:09:20<13:07:07, 12.94s/it] {'loss': 0.0037, 'learning_rate': 1.8315e-05, 'epoch': 2.39} 63%|██████▎ | 6349/10000 [23:09:20<13:07:07, 12.94s/it] 64%|██████▎ | 6350/10000 [23:09:33<13:05:25, 12.91s/it] {'loss': 0.0029, 'learning_rate': 1.8310000000000003e-05, 'epoch': 2.39} 64%|██████▎ | 6350/10000 [23:09:33<13:05:25, 12.91s/it] 64%|██████▎ | 6351/10000 [23:09:46<13:03:51, 12.89s/it] {'loss': 0.0061, 'learning_rate': 1.8305e-05, 'epoch': 2.39} 64%|██████▎ | 6351/10000 [23:09:46<13:03:51, 12.89s/it] 64%|██████▎ | 6352/10000 [23:09:59<13:03:58, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.83e-05, 'epoch': 2.39} 64%|██████▎ | 6352/10000 [23:09:59<13:03:58, 12.89s/it] 64%|██████▎ | 6353/10000 [23:10:12<13:02:48, 12.88s/it] {'loss': 0.0041, 'learning_rate': 1.8295e-05, 'epoch': 2.39} 64%|██████▎ | 6353/10000 
[23:10:12<13:02:48, 12.88s/it] 64%|██████▎ | 6354/10000 [23:10:24<13:03:24, 12.89s/it] {'loss': 0.0061, 'learning_rate': 1.8290000000000003e-05, 'epoch': 2.39} 64%|██████▎ | 6354/10000 [23:10:24<13:03:24, 12.89s/it] 64%|██████▎ | 6355/10000 [23:10:37<13:02:59, 12.89s/it] {'loss': 0.0059, 'learning_rate': 1.8285000000000002e-05, 'epoch': 2.39} 64%|██████▎ | 6355/10000 [23:10:37<13:02:59, 12.89s/it] 64%|██████▎ | 6356/10000 [23:10:50<13:03:20, 12.90s/it] {'loss': 0.0053, 'learning_rate': 1.828e-05, 'epoch': 2.39} 64%|██████▎ | 6356/10000 [23:10:50<13:03:20, 12.90s/it] 64%|██████▎ | 6357/10000 [23:11:03<13:02:46, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.8275e-05, 'epoch': 2.4} 64%|██████▎ | 6357/10000 [23:11:03<13:02:46, 12.89s/it] 64%|██████▎ | 6358/10000 [23:11:16<13:01:47, 12.88s/it] {'loss': 0.0049, 'learning_rate': 1.827e-05, 'epoch': 2.4} 64%|██████▎ | 6358/10000 [23:11:16<13:01:47, 12.88s/it] 64%|██████▎ | 6359/10000 [23:11:29<13:02:19, 12.89s/it] {'loss': 0.005, 'learning_rate': 1.8265000000000002e-05, 'epoch': 2.4} 64%|██████▎ | 6359/10000 [23:11:29<13:02:19, 12.89s/it] 64%|██████▎ | 6360/10000 [23:11:42<13:01:21, 12.88s/it] {'loss': 0.0047, 'learning_rate': 1.826e-05, 'epoch': 2.4} 64%|██████▎ | 6360/10000 [23:11:42<13:01:21, 12.88s/it] 64%|██████▎ | 6361/10000 [23:11:55<13:01:29, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.8255e-05, 'epoch': 2.4} 64%|██████▎ | 6361/10000 [23:11:55<13:01:29, 12.89s/it] 64%|██████▎ | 6362/10000 [23:12:08<13:01:54, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.825e-05, 'epoch': 2.4} 64%|██████▎ | 6362/10000 [23:12:08<13:01:54, 12.90s/it] 64%|██████▎ | 6363/10000 [23:12:21<13:02:52, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.8245000000000002e-05, 'epoch': 2.4} 64%|██████▎ | 6363/10000 [23:12:21<13:02:52, 12.92s/it] 64%|██████▎ | 6364/10000 [23:12:34<13:03:37, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.824e-05, 'epoch': 2.4} 64%|██████▎ | 6364/10000 [23:12:34<13:03:37, 12.93s/it] 64%|██████▎ | 6365/10000 
[23:12:46<13:02:23, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.8235e-05, 'epoch': 2.4} 64%|██████▎ | 6365/10000 [23:12:46<13:02:23, 12.91s/it] 64%|██████▎ | 6366/10000 [23:12:59<13:01:43, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.823e-05, 'epoch': 2.4} 64%|██████▎ | 6366/10000 [23:12:59<13:01:43, 12.91s/it] 64%|██████▎ | 6367/10000 [23:13:12<13:02:31, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.8225e-05, 'epoch': 2.4} 64%|██████▎ | 6367/10000 [23:13:12<13:02:31, 12.92s/it] 64%|██████▎ | 6368/10000 [23:13:25<13:03:32, 12.94s/it] {'loss': 0.0053, 'learning_rate': 1.8220000000000002e-05, 'epoch': 2.4} 64%|██████▎ | 6368/10000 [23:13:25<13:03:32, 12.94s/it] 64%|██████▎ | 6369/10000 [23:13:38<13:04:55, 12.97s/it] {'loss': 0.0044, 'learning_rate': 1.8215e-05, 'epoch': 2.4} 64%|██████▎ | 6369/10000 [23:13:38<13:04:55, 12.97s/it] 64%|██████▎ | 6370/10000 [23:13:51<13:03:47, 12.96s/it] {'loss': 0.0047, 'learning_rate': 1.8210000000000004e-05, 'epoch': 2.4} 64%|██████▎ | 6370/10000 [23:13:51<13:03:47, 12.96s/it] 64%|██████▎ | 6371/10000 [23:14:04<13:04:36, 12.97s/it] {'loss': 0.0051, 'learning_rate': 1.8205e-05, 'epoch': 2.4} 64%|██████▎ | 6371/10000 [23:14:04<13:04:36, 12.97s/it] 64%|██████▎ | 6372/10000 [23:14:17<13:03:35, 12.96s/it] {'loss': 0.006, 'learning_rate': 1.8200000000000002e-05, 'epoch': 2.4} 64%|██████▎ | 6372/10000 [23:14:17<13:03:35, 12.96s/it] 64%|██████▎ | 6373/10000 [23:14:30<13:04:37, 12.98s/it] {'loss': 0.0038, 'learning_rate': 1.8195e-05, 'epoch': 2.4} 64%|██████▎ | 6373/10000 [23:14:30<13:04:37, 12.98s/it] 64%|██████▎ | 6374/10000 [23:14:43<13:03:36, 12.97s/it] {'loss': 0.0055, 'learning_rate': 1.819e-05, 'epoch': 2.4} 64%|██████▎ | 6374/10000 [23:14:43<13:03:36, 12.97s/it] 64%|██████▍ | 6375/10000 [23:14:56<13:02:28, 12.95s/it] {'loss': 0.0037, 'learning_rate': 1.8185000000000003e-05, 'epoch': 2.4} 64%|██████▍ | 6375/10000 [23:14:56<13:02:28, 12.95s/it] 64%|██████▍ | 6376/10000 [23:15:09<13:01:18, 12.94s/it] {'loss': 0.005, 
'learning_rate': 1.818e-05, 'epoch': 2.4} 64%|██████▍ | 6376/10000 [23:15:09<13:01:18, 12.94s/it] 64%|██████▍ | 6377/10000 [23:15:22<13:02:21, 12.96s/it] {'loss': 0.0052, 'learning_rate': 1.8175e-05, 'epoch': 2.4} 64%|██████▍ | 6377/10000 [23:15:22<13:02:21, 12.96s/it] 64%|██████▍ | 6378/10000 [23:15:35<13:00:39, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.817e-05, 'epoch': 2.4} 64%|██████▍ | 6378/10000 [23:15:35<13:00:39, 12.93s/it] 64%|██████▍ | 6379/10000 [23:15:48<12:59:31, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.8165000000000003e-05, 'epoch': 2.4} 64%|██████▍ | 6379/10000 [23:15:48<12:59:31, 12.92s/it] 64%|██████▍ | 6380/10000 [23:16:01<12:58:55, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.8160000000000002e-05, 'epoch': 2.4} 64%|██████▍ | 6380/10000 [23:16:01<12:58:55, 12.91s/it] 64%|██████▍ | 6381/10000 [23:16:13<12:58:10, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.8154999999999998e-05, 'epoch': 2.4} 64%|██████▍ | 6381/10000 [23:16:13<12:58:10, 12.90s/it] 64%|██████▍ | 6382/10000 [23:16:26<12:59:19, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.815e-05, 'epoch': 2.4} 64%|██████▍ | 6382/10000 [23:16:26<12:59:19, 12.92s/it] 64%|██████▍ | 6383/10000 [23:16:39<12:59:00, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.8145e-05, 'epoch': 2.41} 64%|██████▍ | 6383/10000 [23:16:39<12:59:00, 12.92s/it] 64%|██████▍ | 6384/10000 [23:16:52<12:57:53, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.8140000000000003e-05, 'epoch': 2.41} 64%|██████▍ | 6384/10000 [23:16:52<12:57:53, 12.91s/it] 64%|██████▍ | 6385/10000 [23:17:05<12:57:33, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.8135000000000002e-05, 'epoch': 2.41} 64%|██████▍ | 6385/10000 [23:17:05<12:57:33, 12.91s/it] 64%|██████▍ | 6386/10000 [23:17:18<12:58:09, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.813e-05, 'epoch': 2.41} 64%|██████▍ | 6386/10000 [23:17:18<12:58:09, 12.92s/it] 64%|██████▍ | 6387/10000 [23:17:31<12:59:00, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.8125e-05, 'epoch': 2.41} 
64%|██████▍ | 6387/10000 [23:17:31<12:59:00, 12.94s/it] 64%|██████▍ | 6388/10000 [23:17:44<12:58:05, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.812e-05, 'epoch': 2.41} 64%|██████▍ | 6388/10000 [23:17:44<12:58:05, 12.93s/it] 64%|██████▍ | 6389/10000 [23:17:57<12:55:57, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.8115000000000002e-05, 'epoch': 2.41} 64%|██████▍ | 6389/10000 [23:17:57<12:55:57, 12.89s/it] 64%|██████▍ | 6390/10000 [23:18:10<12:56:52, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.811e-05, 'epoch': 2.41} 64%|██████▍ | 6390/10000 [23:18:10<12:56:52, 12.91s/it] 64%|██████▍ | 6391/10000 [23:18:23<12:57:46, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.8105e-05, 'epoch': 2.41} 64%|██████▍ | 6391/10000 [23:18:23<12:57:46, 12.93s/it] 64%|██████▍ | 6392/10000 [23:18:36<12:58:26, 12.95s/it] {'loss': 0.0044, 'learning_rate': 1.81e-05, 'epoch': 2.41} 64%|██████▍ | 6392/10000 [23:18:36<12:58:26, 12.95s/it] 64%|██████▍ | 6393/10000 [23:18:49<12:58:43, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.8095000000000002e-05, 'epoch': 2.41} 64%|██████▍ | 6393/10000 [23:18:49<12:58:43, 12.95s/it] 64%|██████▍ | 6394/10000 [23:19:02<12:57:27, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.809e-05, 'epoch': 2.41} 64%|██████▍ | 6394/10000 [23:19:02<12:57:27, 12.94s/it] 64%|██████▍ | 6395/10000 [23:19:14<12:56:52, 12.93s/it] {'loss': 0.0038, 'learning_rate': 1.8085e-05, 'epoch': 2.41} 64%|██████▍ | 6395/10000 [23:19:14<12:56:52, 12.93s/it] 64%|██████▍ | 6396/10000 [23:19:27<12:55:02, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.808e-05, 'epoch': 2.41} 64%|██████▍ | 6396/10000 [23:19:27<12:55:02, 12.90s/it] 64%|██████▍ | 6397/10000 [23:19:40<12:54:01, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.8075e-05, 'epoch': 2.41} 64%|██████▍ | 6397/10000 [23:19:40<12:54:01, 12.89s/it] 64%|██████▍ | 6398/10000 [23:19:53<12:53:58, 12.89s/it] {'loss': 0.0033, 'learning_rate': 1.807e-05, 'epoch': 2.41} 64%|██████▍ | 6398/10000 [23:19:53<12:53:58, 12.89s/it] 64%|██████▍ | 
6399/10000 [23:20:06<12:54:24, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.8065e-05, 'epoch': 2.41} 64%|██████▍ | 6399/10000 [23:20:06<12:54:24, 12.90s/it] 64%|██████▍ | 6400/10000 [23:20:19<12:55:08, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.8060000000000003e-05, 'epoch': 2.41} 64%|██████▍ | 6400/10000 [23:20:19<12:55:08, 12.92s/it] 64%|██████▍ | 6401/10000 [23:20:32<12:55:27, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.8055e-05, 'epoch': 2.41} 64%|██████▍ | 6401/10000 [23:20:32<12:55:27, 12.93s/it] 64%|██████▍ | 6402/10000 [23:20:45<12:55:06, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.805e-05, 'epoch': 2.41} 64%|██████▍ | 6402/10000 [23:20:45<12:55:06, 12.93s/it] 64%|██████▍ | 6403/10000 [23:20:58<12:53:10, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.8045e-05, 'epoch': 2.41} 64%|██████▍ | 6403/10000 [23:20:58<12:53:10, 12.90s/it] 64%|██████▍ | 6404/10000 [23:21:10<12:51:54, 12.88s/it] {'loss': 0.0041, 'learning_rate': 1.804e-05, 'epoch': 2.41} 64%|██████▍ | 6404/10000 [23:21:10<12:51:54, 12.88s/it] 64%|██████▍ | 6405/10000 [23:21:23<12:52:42, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.8035000000000003e-05, 'epoch': 2.41} 64%|██████▍ | 6405/10000 [23:21:23<12:52:42, 12.90s/it] 64%|██████▍ | 6406/10000 [23:21:36<12:52:07, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.803e-05, 'epoch': 2.41} 64%|██████▍ | 6406/10000 [23:21:36<12:52:07, 12.89s/it] 64%|██████▍ | 6407/10000 [23:21:49<12:52:13, 12.90s/it] {'loss': 0.006, 'learning_rate': 1.8025e-05, 'epoch': 2.41} 64%|██████▍ | 6407/10000 [23:21:49<12:52:13, 12.90s/it] 64%|██████▍ | 6408/10000 [23:22:02<12:50:59, 12.88s/it] {'loss': 0.0048, 'learning_rate': 1.802e-05, 'epoch': 2.41} 64%|██████▍ | 6408/10000 [23:22:02<12:50:59, 12.88s/it] 64%|██████▍ | 6409/10000 [23:22:15<12:50:14, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.8015000000000003e-05, 'epoch': 2.41} 64%|██████▍ | 6409/10000 [23:22:15<12:50:14, 12.87s/it] 64%|██████▍ | 6410/10000 [23:22:28<12:49:59, 12.87s/it] {'loss': 0.0052, 
'learning_rate': 1.8010000000000002e-05, 'epoch': 2.42} 64%|██████▍ | 6410/10000 [23:22:28<12:49:59, 12.87s/it] 64%|██████▍ | 6411/10000 [23:22:41<12:51:46, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.8005e-05, 'epoch': 2.42} 64%|██████▍ | 6411/10000 [23:22:41<12:51:46, 12.90s/it] 64%|██████▍ | 6412/10000 [23:22:54<12:52:24, 12.92s/it] {'loss': 0.0061, 'learning_rate': 1.8e-05, 'epoch': 2.42} 64%|██████▍ | 6412/10000 [23:22:54<12:52:24, 12.92s/it] 64%|██████▍ | 6413/10000 [23:23:07<12:51:09, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.7995e-05, 'epoch': 2.42} 64%|██████▍ | 6413/10000 [23:23:07<12:51:09, 12.90s/it] 64%|██████▍ | 6414/10000 [23:23:19<12:51:28, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.7990000000000002e-05, 'epoch': 2.42} 64%|██████▍ | 6414/10000 [23:23:19<12:51:28, 12.91s/it] 64%|██████▍ | 6415/10000 [23:23:32<12:50:57, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.7985e-05, 'epoch': 2.42} 64%|██████▍ | 6415/10000 [23:23:32<12:50:57, 12.90s/it] 64%|██████▍ | 6416/10000 [23:23:45<12:51:24, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.798e-05, 'epoch': 2.42} 64%|██████▍ | 6416/10000 [23:23:45<12:51:24, 12.91s/it] 64%|██████▍ | 6417/10000 [23:23:58<12:50:58, 12.91s/it] {'loss': 0.0061, 'learning_rate': 1.7975e-05, 'epoch': 2.42} 64%|██████▍ | 6417/10000 [23:23:58<12:50:58, 12.91s/it] 64%|██████▍ | 6418/10000 [23:24:11<12:51:11, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.797e-05, 'epoch': 2.42} 64%|██████▍ | 6418/10000 [23:24:11<12:51:11, 12.92s/it] 64%|██████▍ | 6419/10000 [23:24:24<12:52:56, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.7965e-05, 'epoch': 2.42} 64%|██████▍ | 6419/10000 [23:24:24<12:52:56, 12.95s/it] 64%|██████▍ | 6420/10000 [23:24:37<12:52:14, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.796e-05, 'epoch': 2.42} 64%|██████▍ | 6420/10000 [23:24:37<12:52:14, 12.94s/it] 64%|██████▍ | 6421/10000 [23:24:50<12:50:16, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.7955e-05, 'epoch': 2.42} 64%|██████▍ | 6421/10000 
[23:24:50<12:50:16, 12.91s/it] 64%|██████▍ | 6422/10000 [23:25:03<12:49:44, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.795e-05, 'epoch': 2.42} 64%|██████▍ | 6422/10000 [23:25:03<12:49:44, 12.91s/it] 64%|██████▍ | 6423/10000 [23:25:16<12:49:08, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.7945000000000002e-05, 'epoch': 2.42} 64%|██████▍ | 6423/10000 [23:25:16<12:49:08, 12.90s/it] 64%|██████▍ | 6424/10000 [23:25:29<12:50:00, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.794e-05, 'epoch': 2.42} 64%|██████▍ | 6424/10000 [23:25:29<12:50:00, 12.92s/it] 64%|██████▍ | 6425/10000 [23:25:42<12:50:59, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.7935e-05, 'epoch': 2.42} 64%|██████▍ | 6425/10000 [23:25:42<12:50:59, 12.94s/it] 64%|██████▍ | 6426/10000 [23:25:55<12:50:28, 12.93s/it] {'loss': 0.004, 'learning_rate': 1.793e-05, 'epoch': 2.42} 64%|██████▍ | 6426/10000 [23:25:55<12:50:28, 12.93s/it] 64%|██████▍ | 6427/10000 [23:26:07<12:49:19, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.7925e-05, 'epoch': 2.42} 64%|██████▍ | 6427/10000 [23:26:07<12:49:19, 12.92s/it] 64%|██████▍ | 6428/10000 [23:26:20<12:48:46, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.792e-05, 'epoch': 2.42} 64%|██████▍ | 6428/10000 [23:26:20<12:48:46, 12.91s/it] 64%|██████▍ | 6429/10000 [23:26:33<12:48:27, 12.91s/it] {'loss': 0.0055, 'learning_rate': 1.7915e-05, 'epoch': 2.42} 64%|██████▍ | 6429/10000 [23:26:33<12:48:27, 12.91s/it] 64%|██████▍ | 6430/10000 [23:26:46<12:48:54, 12.92s/it] {'loss': 0.0062, 'learning_rate': 1.7910000000000003e-05, 'epoch': 2.42} 64%|██████▍ | 6430/10000 [23:26:46<12:48:54, 12.92s/it] 64%|██████▍ | 6431/10000 [23:26:59<12:48:05, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.7905e-05, 'epoch': 2.42} 64%|██████▍ | 6431/10000 [23:26:59<12:48:05, 12.91s/it] 64%|██████▍ | 6432/10000 [23:27:12<12:48:50, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.79e-05, 'epoch': 2.42} 64%|██████▍ | 6432/10000 [23:27:12<12:48:50, 12.93s/it] 64%|██████▍ | 6433/10000 [23:27:25<12:47:06, 
12.90s/it] {'loss': 0.0047, 'learning_rate': 1.7895e-05, 'epoch': 2.42} 64%|██████▍ | 6433/10000 [23:27:25<12:47:06, 12.90s/it] 64%|██████▍ | 6434/10000 [23:27:38<12:46:04, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.789e-05, 'epoch': 2.42} 64%|██████▍ | 6434/10000 [23:27:38<12:46:04, 12.89s/it] 64%|██████▍ | 6435/10000 [23:27:51<12:46:17, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.7885000000000002e-05, 'epoch': 2.42} 64%|██████▍ | 6435/10000 [23:27:51<12:46:17, 12.90s/it] 64%|██████▍ | 6436/10000 [23:28:04<12:45:54, 12.89s/it] {'loss': 0.0061, 'learning_rate': 1.7879999999999998e-05, 'epoch': 2.43} 64%|██████▍ | 6436/10000 [23:28:04<12:45:54, 12.89s/it] 64%|██████▍ | 6437/10000 [23:28:17<12:46:54, 12.91s/it] {'loss': 0.006, 'learning_rate': 1.7875e-05, 'epoch': 2.43} 64%|██████▍ | 6437/10000 [23:28:17<12:46:54, 12.91s/it] 64%|██████▍ | 6438/10000 [23:28:29<12:45:50, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.787e-05, 'epoch': 2.43} 64%|██████▍ | 6438/10000 [23:28:29<12:45:50, 12.90s/it] 64%|██████▍ | 6439/10000 [23:28:42<12:47:24, 12.93s/it] {'loss': 0.0051, 'learning_rate': 1.7865000000000003e-05, 'epoch': 2.43} 64%|██████▍ | 6439/10000 [23:28:42<12:47:24, 12.93s/it] 64%|██████▍ | 6440/10000 [23:28:55<12:47:50, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.7860000000000002e-05, 'epoch': 2.43} 64%|██████▍ | 6440/10000 [23:28:55<12:47:50, 12.94s/it] 64%|██████▍ | 6441/10000 [23:29:08<12:47:50, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.7855e-05, 'epoch': 2.43} 64%|██████▍ | 6441/10000 [23:29:08<12:47:50, 12.94s/it] 64%|██████▍ | 6442/10000 [23:29:21<12:47:06, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.785e-05, 'epoch': 2.43} 64%|██████▍ | 6442/10000 [23:29:21<12:47:06, 12.94s/it] 64%|██████▍ | 6443/10000 [23:29:34<12:46:18, 12.93s/it] {'loss': 0.0039, 'learning_rate': 1.7845e-05, 'epoch': 2.43} 64%|██████▍ | 6443/10000 [23:29:34<12:46:18, 12.93s/it] 64%|██████▍ | 6444/10000 [23:29:47<12:45:42, 12.92s/it] {'loss': 0.0039, 'learning_rate': 
1.7840000000000002e-05, 'epoch': 2.43} 64%|██████▍ | 6444/10000 [23:29:47<12:45:42, 12.92s/it] 64%|██████▍ | 6445/10000 [23:30:00<12:43:51, 12.89s/it] {'loss': 0.0055, 'learning_rate': 1.7835e-05, 'epoch': 2.43} 64%|██████▍ | 6445/10000 [23:30:00<12:43:51, 12.89s/it] 64%|██████▍ | 6446/10000 [23:30:13<12:42:34, 12.87s/it] {'loss': 0.0051, 'learning_rate': 1.783e-05, 'epoch': 2.43} 64%|██████▍ | 6446/10000 [23:30:13<12:42:34, 12.87s/it] 64%|██████▍ | 6447/10000 [23:30:26<12:42:28, 12.88s/it] {'loss': 0.0057, 'learning_rate': 1.7825e-05, 'epoch': 2.43} 64%|██████▍ | 6447/10000 [23:30:26<12:42:28, 12.88s/it] 64%|██████▍ | 6448/10000 [23:30:38<12:42:09, 12.87s/it] {'loss': 0.0038, 'learning_rate': 1.7820000000000002e-05, 'epoch': 2.43} 64%|██████▍ | 6448/10000 [23:30:38<12:42:09, 12.87s/it] 64%|██████▍ | 6449/10000 [23:30:51<12:42:05, 12.88s/it] {'loss': 0.0047, 'learning_rate': 1.7815e-05, 'epoch': 2.43} 64%|██████▍ | 6449/10000 [23:30:51<12:42:05, 12.88s/it] 64%|██████▍ | 6450/10000 [23:31:04<12:41:19, 12.87s/it] {'loss': 0.005, 'learning_rate': 1.781e-05, 'epoch': 2.43} 64%|██████▍ | 6450/10000 [23:31:04<12:41:19, 12.87s/it] 65%|██████▍ | 6451/10000 [23:31:17<12:41:38, 12.88s/it] {'loss': 0.0051, 'learning_rate': 1.7805000000000003e-05, 'epoch': 2.43} 65%|██████▍ | 6451/10000 [23:31:17<12:41:38, 12.88s/it] 65%|██████▍ | 6452/10000 [23:31:30<12:42:03, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.78e-05, 'epoch': 2.43} 65%|██████▍ | 6452/10000 [23:31:30<12:42:03, 12.89s/it] 65%|██████▍ | 6453/10000 [23:31:43<12:41:59, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.7795e-05, 'epoch': 2.43} 65%|██████▍ | 6453/10000 [23:31:43<12:41:59, 12.89s/it] 65%|██████▍ | 6454/10000 [23:31:56<12:43:24, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.779e-05, 'epoch': 2.43} 65%|██████▍ | 6454/10000 [23:31:56<12:43:24, 12.92s/it] 65%|██████▍ | 6455/10000 [23:32:09<12:41:43, 12.89s/it] {'loss': 0.0051, 'learning_rate': 1.7785e-05, 'epoch': 2.43} 65%|██████▍ | 6455/10000 
[23:32:09<12:41:43, 12.89s/it] 65%|██████▍ | 6456/10000 [23:32:22<12:41:00, 12.88s/it] {'loss': 0.0037, 'learning_rate': 1.7780000000000003e-05, 'epoch': 2.43} 65%|██████▍ | 6456/10000 [23:32:22<12:41:00, 12.88s/it] 65%|██████▍ | 6457/10000 [23:32:34<12:40:43, 12.88s/it] {'loss': 0.0041, 'learning_rate': 1.7775e-05, 'epoch': 2.43} 65%|██████▍ | 6457/10000 [23:32:34<12:40:43, 12.88s/it] 65%|██████▍ | 6458/10000 [23:32:47<12:40:57, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.777e-05, 'epoch': 2.43} 65%|██████▍ | 6458/10000 [23:32:47<12:40:57, 12.89s/it] 65%|██████▍ | 6459/10000 [23:33:00<12:40:59, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.7765e-05, 'epoch': 2.43} 65%|██████▍ | 6459/10000 [23:33:00<12:40:59, 12.89s/it] 65%|██████▍ | 6460/10000 [23:33:13<12:41:04, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.7760000000000003e-05, 'epoch': 2.43} 65%|██████▍ | 6460/10000 [23:33:13<12:41:04, 12.90s/it] 65%|██████▍ | 6461/10000 [23:33:26<12:42:08, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.7755000000000002e-05, 'epoch': 2.43} 65%|██████▍ | 6461/10000 [23:33:26<12:42:08, 12.92s/it] 65%|██████▍ | 6462/10000 [23:33:39<12:42:07, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.775e-05, 'epoch': 2.43} 65%|██████▍ | 6462/10000 [23:33:39<12:42:07, 12.92s/it] 65%|██████▍ | 6463/10000 [23:33:52<12:42:36, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.7745e-05, 'epoch': 2.44} 65%|██████▍ | 6463/10000 [23:33:52<12:42:36, 12.94s/it] 65%|██████▍ | 6464/10000 [23:34:05<12:42:18, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.774e-05, 'epoch': 2.44} 65%|██████▍ | 6464/10000 [23:34:05<12:42:18, 12.94s/it] 65%|██████▍ | 6465/10000 [23:34:18<12:41:09, 12.92s/it] {'loss': 0.0047, 'learning_rate': 1.7735000000000002e-05, 'epoch': 2.44} 65%|██████▍ | 6465/10000 [23:34:18<12:41:09, 12.92s/it] 65%|██████▍ | 6466/10000 [23:34:31<12:40:57, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.773e-05, 'epoch': 2.44} 65%|██████▍ | 6466/10000 [23:34:31<12:40:57, 12.92s/it] 65%|██████▍ | 
6467/10000 [23:34:44<12:40:32, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.7725e-05, 'epoch': 2.44} 65%|██████▍ | 6467/10000 [23:34:44<12:40:32, 12.92s/it] 65%|██████▍ | 6468/10000 [23:34:57<12:38:54, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.772e-05, 'epoch': 2.44} 65%|██████▍ | 6468/10000 [23:34:57<12:38:54, 12.89s/it] 65%|██████▍ | 6469/10000 [23:35:09<12:39:12, 12.90s/it] {'loss': 0.0051, 'learning_rate': 1.7715000000000002e-05, 'epoch': 2.44} 65%|██████▍ | 6469/10000 [23:35:09<12:39:12, 12.90s/it] 65%|██████▍ | 6470/10000 [23:35:22<12:40:06, 12.92s/it] {'loss': 0.0026, 'learning_rate': 1.771e-05, 'epoch': 2.44} 65%|██████▍ | 6470/10000 [23:35:22<12:40:06, 12.92s/it] 65%|██████▍ | 6471/10000 [23:35:35<12:39:29, 12.91s/it] {'loss': 0.0053, 'learning_rate': 1.7705e-05, 'epoch': 2.44} 65%|██████▍ | 6471/10000 [23:35:35<12:39:29, 12.91s/it] 65%|██████▍ | 6472/10000 [23:35:48<12:38:05, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.77e-05, 'epoch': 2.44} 65%|██████▍ | 6472/10000 [23:35:48<12:38:05, 12.89s/it] 65%|██████▍ | 6473/10000 [23:36:01<12:38:40, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.7695e-05, 'epoch': 2.44} 65%|██████▍ | 6473/10000 [23:36:01<12:38:40, 12.91s/it] 65%|██████▍ | 6474/10000 [23:36:14<12:38:25, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.7690000000000002e-05, 'epoch': 2.44} 65%|██████▍ | 6474/10000 [23:36:14<12:38:25, 12.91s/it] 65%|██████▍ | 6475/10000 [23:36:27<12:36:42, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.7685e-05, 'epoch': 2.44} 65%|██████▍ | 6475/10000 [23:36:27<12:36:42, 12.88s/it] 65%|██████▍ | 6476/10000 [23:36:40<12:37:43, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.7680000000000004e-05, 'epoch': 2.44} 65%|██████▍ | 6476/10000 [23:36:40<12:37:43, 12.90s/it] 65%|██████▍ | 6477/10000 [23:36:53<12:37:49, 12.91s/it] {'loss': 0.0034, 'learning_rate': 1.7675e-05, 'epoch': 2.44} 65%|██████▍ | 6477/10000 [23:36:53<12:37:49, 12.91s/it] 65%|██████▍ | 6478/10000 [23:37:06<12:37:36, 12.91s/it] {'loss': 0.0038, 
'learning_rate': 1.7670000000000002e-05, 'epoch': 2.44} 65%|██████▍ | 6478/10000 [23:37:06<12:37:36, 12.91s/it] 65%|██████▍ | 6479/10000 [23:37:19<12:38:11, 12.92s/it] {'loss': 0.0058, 'learning_rate': 1.7665e-05, 'epoch': 2.44} 65%|██████▍ | 6479/10000 [23:37:19<12:38:11, 12.92s/it] 65%|██████▍ | 6480/10000 [23:37:32<12:39:02, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.766e-05, 'epoch': 2.44} 65%|██████▍ | 6480/10000 [23:37:32<12:39:02, 12.94s/it] 65%|██████▍ | 6481/10000 [23:37:44<12:38:08, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.7655000000000003e-05, 'epoch': 2.44} 65%|██████▍ | 6481/10000 [23:37:44<12:38:08, 12.93s/it] 65%|██████▍ | 6482/10000 [23:37:57<12:37:40, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.765e-05, 'epoch': 2.44} 65%|██████▍ | 6482/10000 [23:37:57<12:37:40, 12.92s/it] 65%|██████▍ | 6483/10000 [23:38:10<12:35:37, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.7645e-05, 'epoch': 2.44} 65%|██████▍ | 6483/10000 [23:38:10<12:35:37, 12.89s/it] 65%|██████▍ | 6484/10000 [23:38:23<12:34:44, 12.88s/it] {'loss': 0.0047, 'learning_rate': 1.764e-05, 'epoch': 2.44} 65%|██████▍ | 6484/10000 [23:38:23<12:34:44, 12.88s/it] 65%|██████▍ | 6485/10000 [23:38:36<12:35:21, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.7635000000000003e-05, 'epoch': 2.44} 65%|██████▍ | 6485/10000 [23:38:36<12:35:21, 12.89s/it] 65%|██████▍ | 6486/10000 [23:38:49<12:35:57, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.7630000000000002e-05, 'epoch': 2.44} 65%|██████▍ | 6486/10000 [23:38:49<12:35:57, 12.91s/it] 65%|██████▍ | 6487/10000 [23:39:02<12:34:46, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.7625e-05, 'epoch': 2.44} 65%|██████▍ | 6487/10000 [23:39:02<12:34:46, 12.89s/it] 65%|██████▍ | 6488/10000 [23:39:15<12:34:18, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.762e-05, 'epoch': 2.44} 65%|██████▍ | 6488/10000 [23:39:15<12:34:18, 12.89s/it] 65%|██████▍ | 6489/10000 [23:39:28<12:34:34, 12.89s/it] {'loss': 0.005, 'learning_rate': 1.7615e-05, 'epoch': 2.44} 
65%|██████▍ | 6489/10000 [23:39:28<12:34:34, 12.89s/it] 65%|██████▍ | 6490/10000 [23:39:40<12:34:21, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.7610000000000002e-05, 'epoch': 2.45} 65%|██████▍ | 6490/10000 [23:39:40<12:34:21, 12.90s/it] 65%|██████▍ | 6491/10000 [23:39:53<12:32:59, 12.88s/it] {'loss': 0.0058, 'learning_rate': 1.7605000000000002e-05, 'epoch': 2.45} 65%|██████▍ | 6491/10000 [23:39:53<12:32:59, 12.88s/it] 65%|██████▍ | 6492/10000 [23:40:06<12:33:55, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.76e-05, 'epoch': 2.45} 65%|██████▍ | 6492/10000 [23:40:06<12:33:55, 12.90s/it] 65%|██████▍ | 6493/10000 [23:40:19<12:35:06, 12.92s/it] {'loss': 0.0047, 'learning_rate': 1.7595e-05, 'epoch': 2.45} 65%|██████▍ | 6493/10000 [23:40:19<12:35:06, 12.92s/it] 65%|██████▍ | 6494/10000 [23:40:32<12:35:30, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.759e-05, 'epoch': 2.45} 65%|██████▍ | 6494/10000 [23:40:32<12:35:30, 12.93s/it] 65%|██████▍ | 6495/10000 [23:40:45<12:34:16, 12.91s/it] {'loss': 0.0054, 'learning_rate': 1.7585000000000002e-05, 'epoch': 2.45} 65%|██████▍ | 6495/10000 [23:40:45<12:34:16, 12.91s/it] 65%|██████▍ | 6496/10000 [23:40:58<12:33:35, 12.90s/it] {'loss': 0.0053, 'learning_rate': 1.758e-05, 'epoch': 2.45} 65%|██████▍ | 6496/10000 [23:40:58<12:33:35, 12.90s/it] 65%|██████▍ | 6497/10000 [23:41:11<12:33:11, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.7575e-05, 'epoch': 2.45} 65%|██████▍ | 6497/10000 [23:41:11<12:33:11, 12.90s/it] 65%|██████▍ | 6498/10000 [23:41:24<12:33:31, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.757e-05, 'epoch': 2.45} 65%|██████▍ | 6498/10000 [23:41:24<12:33:31, 12.91s/it] 65%|██████▍ | 6499/10000 [23:41:37<12:34:33, 12.93s/it] {'loss': 0.0032, 'learning_rate': 1.7565000000000002e-05, 'epoch': 2.45} 65%|██████▍ | 6499/10000 [23:41:37<12:34:33, 12.93s/it] 65%|██████▌ | 6500/10000 [23:41:50<12:35:42, 12.96s/it] {'loss': 0.0037, 'learning_rate': 1.756e-05, 'epoch': 2.45} 65%|██████▌ | 6500/10000 [23:41:50<12:35:42, 
12.96s/it] 65%|██████▌ | 6501/10000 [23:42:03<12:34:06, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.7555e-05, 'epoch': 2.45} 65%|██████▌ | 6501/10000 [23:42:03<12:34:06, 12.93s/it] 65%|██████▌ | 6502/10000 [23:42:15<12:32:26, 12.91s/it] {'loss': 0.005, 'learning_rate': 1.755e-05, 'epoch': 2.45} 65%|██████▌ | 6502/10000 [23:42:15<12:32:26, 12.91s/it] 65%|██████▌ | 6503/10000 [23:42:28<12:31:15, 12.89s/it] {'loss': 0.0055, 'learning_rate': 1.7545e-05, 'epoch': 2.45} 65%|██████▌ | 6503/10000 [23:42:28<12:31:15, 12.89s/it] 65%|██████▌ | 6504/10000 [23:42:41<12:30:40, 12.88s/it] {'loss': 0.0045, 'learning_rate': 1.754e-05, 'epoch': 2.45} 65%|██████▌ | 6504/10000 [23:42:41<12:30:40, 12.88s/it] 65%|██████▌ | 6505/10000 [23:42:54<12:29:41, 12.87s/it] {'loss': 0.0062, 'learning_rate': 1.7535e-05, 'epoch': 2.45} 65%|██████▌ | 6505/10000 [23:42:54<12:29:41, 12.87s/it] 65%|██████▌ | 6506/10000 [23:43:07<12:29:56, 12.88s/it] {'loss': 0.0035, 'learning_rate': 1.7530000000000003e-05, 'epoch': 2.45} 65%|██████▌ | 6506/10000 [23:43:07<12:29:56, 12.88s/it] 65%|██████▌ | 6507/10000 [23:43:20<12:29:13, 12.87s/it] {'loss': 0.005, 'learning_rate': 1.7525e-05, 'epoch': 2.45} 65%|██████▌ | 6507/10000 [23:43:20<12:29:13, 12.87s/it] 65%|██████▌ | 6508/10000 [23:43:33<12:31:28, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.752e-05, 'epoch': 2.45} 65%|██████▌ | 6508/10000 [23:43:33<12:31:28, 12.91s/it] 65%|██████▌ | 6509/10000 [23:43:46<12:31:20, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.7515e-05, 'epoch': 2.45} 65%|██████▌ | 6509/10000 [23:43:46<12:31:20, 12.91s/it] 65%|██████▌ | 6510/10000 [23:43:59<12:31:07, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.751e-05, 'epoch': 2.45} 65%|██████▌ | 6510/10000 [23:43:59<12:31:07, 12.91s/it] 65%|██████▌ | 6511/10000 [23:44:11<12:30:25, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.7505000000000003e-05, 'epoch': 2.45} 65%|██████▌ | 6511/10000 [23:44:11<12:30:25, 12.91s/it] 65%|██████▌ | 6512/10000 [23:44:24<12:29:39, 12.90s/it] {'loss': 
0.006, 'learning_rate': 1.75e-05, 'epoch': 2.45} 65%|██████▌ | 6512/10000 [23:44:24<12:29:39, 12.90s/it] 65%|██████▌ | 6513/10000 [23:44:37<12:28:14, 12.87s/it] {'loss': 0.0054, 'learning_rate': 1.7495e-05, 'epoch': 2.45} 65%|██████▌ | 6513/10000 [23:44:37<12:28:14, 12.87s/it] 65%|██████▌ | 6514/10000 [23:44:50<12:29:11, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.749e-05, 'epoch': 2.45} 65%|██████▌ | 6514/10000 [23:44:50<12:29:11, 12.89s/it] 65%|██████▌ | 6515/10000 [23:45:03<12:28:54, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.7485000000000003e-05, 'epoch': 2.45} 65%|██████▌ | 6515/10000 [23:45:03<12:28:54, 12.89s/it] 65%|██████▌ | 6516/10000 [23:45:16<12:30:07, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.7480000000000002e-05, 'epoch': 2.46} 65%|██████▌ | 6516/10000 [23:45:16<12:30:07, 12.92s/it] 65%|██████▌ | 6517/10000 [23:45:29<12:30:43, 12.93s/it] {'loss': 0.0039, 'learning_rate': 1.7475e-05, 'epoch': 2.46} 65%|██████▌ | 6517/10000 [23:45:29<12:30:43, 12.93s/it] 65%|██████▌ | 6518/10000 [23:45:42<12:32:21, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.747e-05, 'epoch': 2.46} 65%|██████▌ | 6518/10000 [23:45:42<12:32:21, 12.96s/it] 65%|██████▌ | 6519/10000 [23:45:55<12:31:47, 12.96s/it] {'loss': 0.0055, 'learning_rate': 1.7465e-05, 'epoch': 2.46} 65%|██████▌ | 6519/10000 [23:45:55<12:31:47, 12.96s/it] 65%|██████▌ | 6520/10000 [23:46:08<12:31:01, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.7460000000000002e-05, 'epoch': 2.46} 65%|██████▌ | 6520/10000 [23:46:08<12:31:01, 12.95s/it] 65%|██████▌ | 6521/10000 [23:46:21<12:31:39, 12.96s/it] {'loss': 0.0043, 'learning_rate': 1.7455e-05, 'epoch': 2.46} 65%|██████▌ | 6521/10000 [23:46:21<12:31:39, 12.96s/it] 65%|██████▌ | 6522/10000 [23:46:34<12:31:22, 12.96s/it] {'loss': 0.0038, 'learning_rate': 1.745e-05, 'epoch': 2.46} 65%|██████▌ | 6522/10000 [23:46:34<12:31:22, 12.96s/it] 65%|██████▌ | 6523/10000 [23:46:47<12:31:28, 12.97s/it] {'loss': 0.0036, 'learning_rate': 1.7445e-05, 'epoch': 2.46} 65%|██████▌ | 
6523/10000 [23:46:47<12:31:28, 12.97s/it] 65%|██████▌ | 6524/10000 [23:47:00<12:31:34, 12.97s/it] {'loss': 0.0038, 'learning_rate': 1.7440000000000002e-05, 'epoch': 2.46} 65%|██████▌ | 6524/10000 [23:47:00<12:31:34, 12.97s/it] 65%|██████▌ | 6525/10000 [23:47:13<12:31:22, 12.97s/it] {'loss': 0.005, 'learning_rate': 1.7435e-05, 'epoch': 2.46} 65%|██████▌ | 6525/10000 [23:47:13<12:31:22, 12.97s/it] 65%|██████▌ | 6526/10000 [23:47:26<12:30:06, 12.96s/it] {'loss': 0.005, 'learning_rate': 1.743e-05, 'epoch': 2.46} 65%|██████▌ | 6526/10000 [23:47:26<12:30:06, 12.96s/it] 65%|██████▌ | 6527/10000 [23:47:38<12:28:05, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.7425e-05, 'epoch': 2.46} 65%|██████▌ | 6527/10000 [23:47:39<12:28:05, 12.92s/it] 65%|██████▌ | 6528/10000 [23:47:51<12:27:10, 12.91s/it] {'loss': 0.0055, 'learning_rate': 1.742e-05, 'epoch': 2.46} 65%|██████▌ | 6528/10000 [23:47:51<12:27:10, 12.91s/it] 65%|██████▌ | 6529/10000 [23:48:04<12:26:20, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.7415000000000002e-05, 'epoch': 2.46} 65%|██████▌ | 6529/10000 [23:48:04<12:26:20, 12.90s/it] 65%|██████▌ | 6530/10000 [23:48:17<12:25:07, 12.88s/it] {'loss': 0.0052, 'learning_rate': 1.741e-05, 'epoch': 2.46} 65%|██████▌ | 6530/10000 [23:48:17<12:25:07, 12.88s/it] 65%|██████▌ | 6531/10000 [23:48:30<12:25:28, 12.89s/it] {'loss': 0.005, 'learning_rate': 1.7405e-05, 'epoch': 2.46} 65%|██████▌ | 6531/10000 [23:48:30<12:25:28, 12.89s/it] 65%|██████▌ | 6532/10000 [23:48:43<12:26:42, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.74e-05, 'epoch': 2.46} 65%|██████▌ | 6532/10000 [23:48:43<12:26:42, 12.92s/it] 65%|██████▌ | 6533/10000 [23:48:56<12:28:00, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.7395e-05, 'epoch': 2.46} 65%|██████▌ | 6533/10000 [23:48:56<12:28:00, 12.95s/it] 65%|██████▌ | 6534/10000 [23:49:09<12:28:24, 12.96s/it] {'loss': 0.0036, 'learning_rate': 1.739e-05, 'epoch': 2.46} 65%|██████▌ | 6534/10000 [23:49:09<12:28:24, 12.96s/it] 65%|██████▌ | 6535/10000 
[23:49:22<12:28:36, 12.96s/it] {'loss': 0.0047, 'learning_rate': 1.7385e-05, 'epoch': 2.46} 65%|██████▌ | 6535/10000 [23:49:22<12:28:36, 12.96s/it] 65%|██████▌ | 6536/10000 [23:49:35<12:28:45, 12.97s/it] {'loss': 0.004, 'learning_rate': 1.7380000000000003e-05, 'epoch': 2.46} 65%|██████▌ | 6536/10000 [23:49:35<12:28:45, 12.97s/it] 65%|██████▌ | 6537/10000 [23:49:48<12:27:31, 12.95s/it] {'loss': 0.0039, 'learning_rate': 1.7375e-05, 'epoch': 2.46} 65%|██████▌ | 6537/10000 [23:49:48<12:27:31, 12.95s/it] 65%|██████▌ | 6538/10000 [23:50:01<12:27:03, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.737e-05, 'epoch': 2.46} 65%|██████▌ | 6538/10000 [23:50:01<12:27:03, 12.95s/it] 65%|██████▌ | 6539/10000 [23:50:14<12:24:51, 12.91s/it] {'loss': 0.0064, 'learning_rate': 1.7365e-05, 'epoch': 2.46} 65%|██████▌ | 6539/10000 [23:50:14<12:24:51, 12.91s/it] 65%|██████▌ | 6540/10000 [23:50:27<12:26:10, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.736e-05, 'epoch': 2.46} 65%|██████▌ | 6540/10000 [23:50:27<12:26:10, 12.94s/it] 65%|██████▌ | 6541/10000 [23:50:40<12:25:19, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.7355000000000002e-05, 'epoch': 2.46} 65%|██████▌ | 6541/10000 [23:50:40<12:25:19, 12.93s/it] 65%|██████▌ | 6542/10000 [23:50:52<12:24:58, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.7349999999999998e-05, 'epoch': 2.46} 65%|██████▌ | 6542/10000 [23:50:52<12:24:58, 12.93s/it] 65%|██████▌ | 6543/10000 [23:51:05<12:25:48, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.7345e-05, 'epoch': 2.47} 65%|██████▌ | 6543/10000 [23:51:06<12:25:48, 12.94s/it] 65%|██████▌ | 6544/10000 [23:51:18<12:26:53, 12.97s/it] {'loss': 0.0053, 'learning_rate': 1.734e-05, 'epoch': 2.47} 65%|██████▌ | 6544/10000 [23:51:18<12:26:53, 12.97s/it] 65%|██████▌ | 6545/10000 [23:51:31<12:25:32, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.7335000000000003e-05, 'epoch': 2.47} 65%|██████▌ | 6545/10000 [23:51:31<12:25:32, 12.95s/it] 65%|██████▌ | 6546/10000 [23:51:44<12:24:25, 12.93s/it] {'loss': 0.0043, 
'learning_rate': 1.7330000000000002e-05, 'epoch': 2.47} 65%|██████▌ | 6546/10000 [23:51:44<12:24:25, 12.93s/it] 65%|██████▌ | 6547/10000 [23:51:57<12:23:56, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.7325e-05, 'epoch': 2.47} 65%|██████▌ | 6547/10000 [23:51:57<12:23:56, 12.93s/it] 65%|██████▌ | 6548/10000 [23:52:10<12:23:10, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.732e-05, 'epoch': 2.47} 65%|██████▌ | 6548/10000 [23:52:10<12:23:10, 12.92s/it] 65%|██████▌ | 6549/10000 [23:52:23<12:23:09, 12.92s/it] {'loss': 0.0032, 'learning_rate': 1.7315e-05, 'epoch': 2.47} 65%|██████▌ | 6549/10000 [23:52:23<12:23:09, 12.92s/it] 66%|██████▌ | 6550/10000 [23:52:36<12:21:41, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.7310000000000002e-05, 'epoch': 2.47} 66%|██████▌ | 6550/10000 [23:52:36<12:21:41, 12.90s/it] 66%|██████▌ | 6551/10000 [23:52:49<12:21:51, 12.91s/it] {'loss': 0.0059, 'learning_rate': 1.7305e-05, 'epoch': 2.47} 66%|██████▌ | 6551/10000 [23:52:49<12:21:51, 12.91s/it] 66%|██████▌ | 6552/10000 [23:53:02<12:22:24, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.73e-05, 'epoch': 2.47} 66%|██████▌ | 6552/10000 [23:53:02<12:22:24, 12.92s/it] 66%|██████▌ | 6553/10000 [23:53:15<12:22:42, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.7295e-05, 'epoch': 2.47} 66%|██████▌ | 6553/10000 [23:53:15<12:22:42, 12.93s/it] 66%|██████▌ | 6554/10000 [23:53:28<12:24:14, 12.96s/it] {'loss': 0.004, 'learning_rate': 1.7290000000000002e-05, 'epoch': 2.47} 66%|██████▌ | 6554/10000 [23:53:28<12:24:14, 12.96s/it] 66%|██████▌ | 6555/10000 [23:53:41<12:22:31, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.7285e-05, 'epoch': 2.47} 66%|██████▌ | 6555/10000 [23:53:41<12:22:31, 12.93s/it] 66%|██████▌ | 6556/10000 [23:53:53<12:21:03, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.728e-05, 'epoch': 2.47} 66%|██████▌ | 6556/10000 [23:53:53<12:21:03, 12.91s/it] 66%|██████▌ | 6557/10000 [23:54:06<12:20:04, 12.90s/it] {'loss': 0.0029, 'learning_rate': 1.7275e-05, 'epoch': 2.47} 66%|██████▌ | 
6557/10000 [23:54:06<12:20:04, 12.90s/it] 66%|██████▌ | 6558/10000 [23:54:19<12:19:57, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.727e-05, 'epoch': 2.47} 66%|██████▌ | 6558/10000 [23:54:19<12:19:57, 12.90s/it] 66%|██████▌ | 6559/10000 [23:54:32<12:18:48, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.7265e-05, 'epoch': 2.47} 66%|██████▌ | 6559/10000 [23:54:32<12:18:48, 12.88s/it] 66%|██████▌ | 6560/10000 [23:54:45<12:18:52, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.726e-05, 'epoch': 2.47} 66%|██████▌ | 6560/10000 [23:54:45<12:18:52, 12.89s/it] 66%|██████▌ | 6561/10000 [23:54:58<12:18:36, 12.89s/it] {'loss': 0.0057, 'learning_rate': 1.7255000000000003e-05, 'epoch': 2.47} 66%|██████▌ | 6561/10000 [23:54:58<12:18:36, 12.89s/it] 66%|██████▌ | 6562/10000 [23:55:11<12:18:37, 12.89s/it] {'loss': 0.0051, 'learning_rate': 1.725e-05, 'epoch': 2.47} 66%|██████▌ | 6562/10000 [23:55:11<12:18:37, 12.89s/it] 66%|██████▌ | 6563/10000 [23:55:24<12:18:25, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.7245e-05, 'epoch': 2.47} 66%|██████▌ | 6563/10000 [23:55:24<12:18:25, 12.89s/it] 66%|██████▌ | 6564/10000 [23:55:36<12:16:55, 12.87s/it] {'loss': 0.0054, 'learning_rate': 1.724e-05, 'epoch': 2.47} 66%|██████▌ | 6564/10000 [23:55:36<12:16:55, 12.87s/it] 66%|██████▌ | 6565/10000 [23:55:49<12:16:58, 12.87s/it] {'loss': 0.0048, 'learning_rate': 1.7235e-05, 'epoch': 2.47} 66%|██████▌ | 6565/10000 [23:55:49<12:16:58, 12.87s/it] 66%|██████▌ | 6566/10000 [23:56:02<12:17:31, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.7230000000000003e-05, 'epoch': 2.47} 66%|██████▌ | 6566/10000 [23:56:02<12:17:31, 12.89s/it] 66%|██████▌ | 6567/10000 [23:56:15<12:17:15, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.7225e-05, 'epoch': 2.47} 66%|██████▌ | 6567/10000 [23:56:15<12:17:15, 12.89s/it] 66%|██████▌ | 6568/10000 [23:56:28<12:17:13, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.722e-05, 'epoch': 2.47} 66%|██████▌ | 6568/10000 [23:56:28<12:17:13, 12.89s/it] 66%|██████▌ | 6569/10000 
[23:56:41<12:17:00, 12.89s/it] {'loss': 0.0052, 'learning_rate': 1.7215e-05, 'epoch': 2.48} 66%|██████▌ | 6569/10000 [23:56:41<12:17:00, 12.89s/it] 66%|██████▌ | 6570/10000 [23:56:54<12:16:18, 12.88s/it] {'loss': 0.0071, 'learning_rate': 1.721e-05, 'epoch': 2.48} 66%|██████▌ | 6570/10000 [23:56:54<12:16:18, 12.88s/it] 66%|██████▌ | 6571/10000 [23:57:07<12:17:15, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.7205000000000002e-05, 'epoch': 2.48} 66%|██████▌ | 6571/10000 [23:57:07<12:17:15, 12.90s/it] 66%|██████▌ | 6572/10000 [23:57:20<12:17:14, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.7199999999999998e-05, 'epoch': 2.48} 66%|██████▌ | 6572/10000 [23:57:20<12:17:14, 12.90s/it] 66%|██████▌ | 6573/10000 [23:57:32<12:16:40, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.7195e-05, 'epoch': 2.48} 66%|██████▌ | 6573/10000 [23:57:33<12:16:40, 12.90s/it] 66%|██████▌ | 6574/10000 [23:57:45<12:17:40, 12.92s/it] {'loss': 0.0056, 'learning_rate': 1.719e-05, 'epoch': 2.48} 66%|██████▌ | 6574/10000 [23:57:46<12:17:40, 12.92s/it] 66%|██████▌ | 6575/10000 [23:57:58<12:17:32, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.7185000000000002e-05, 'epoch': 2.48} 66%|██████▌ | 6575/10000 [23:57:58<12:17:32, 12.92s/it] 66%|██████▌ | 6576/10000 [23:58:11<12:17:44, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.718e-05, 'epoch': 2.48} 66%|██████▌ | 6576/10000 [23:58:11<12:17:44, 12.93s/it] 66%|██████▌ | 6577/10000 [23:58:24<12:18:28, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.7175e-05, 'epoch': 2.48} 66%|██████▌ | 6577/10000 [23:58:24<12:18:28, 12.94s/it] 66%|██████▌ | 6578/10000 [23:58:37<12:17:19, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.717e-05, 'epoch': 2.48} 66%|██████▌ | 6578/10000 [23:58:37<12:17:19, 12.93s/it] 66%|██████▌ | 6579/10000 [23:58:50<12:16:07, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.7165e-05, 'epoch': 2.48} 66%|██████▌ | 6579/10000 [23:58:50<12:16:07, 12.91s/it] 66%|██████▌ | 6580/10000 [23:59:03<12:16:02, 12.91s/it] {'loss': 0.0044, 'learning_rate': 
1.7160000000000002e-05, 'epoch': 2.48} 66%|██████▌ | 6580/10000 [23:59:03<12:16:02, 12.91s/it] 66%|██████▌ | 6581/10000 [23:59:16<12:13:51, 12.88s/it] {'loss': 0.0054, 'learning_rate': 1.7155e-05, 'epoch': 2.48} 66%|██████▌ | 6581/10000 [23:59:16<12:13:51, 12.88s/it] 66%|██████▌ | 6582/10000 [23:59:29<12:14:08, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.7150000000000004e-05, 'epoch': 2.48} 66%|██████▌ | 6582/10000 [23:59:29<12:14:08, 12.89s/it] 66%|██████▌ | 6583/10000 [23:59:42<12:14:29, 12.90s/it] {'loss': 0.0034, 'learning_rate': 1.7145e-05, 'epoch': 2.48} 66%|██████▌ | 6583/10000 [23:59:42<12:14:29, 12.90s/it] 66%|██████▌ | 6584/10000 [23:59:54<12:13:13, 12.88s/it] {'loss': 0.0044, 'learning_rate': 1.7140000000000002e-05, 'epoch': 2.48} 66%|██████▌ | 6584/10000 [23:59:54<12:13:13, 12.88s/it] 66%|██████▌ | 6585/10000 [24:00:07<12:12:41, 12.87s/it] {'loss': 0.0039, 'learning_rate': 1.7135e-05, 'epoch': 2.48} 66%|██████▌ | 6585/10000 [24:00:07<12:12:41, 12.87s/it] 66%|██████▌ | 6586/10000 [24:00:20<12:12:04, 12.87s/it] {'loss': 0.0048, 'learning_rate': 1.713e-05, 'epoch': 2.48} 66%|██████▌ | 6586/10000 [24:00:20<12:12:04, 12.87s/it] 66%|██████▌ | 6587/10000 [24:00:33<12:11:49, 12.87s/it] {'loss': 0.004, 'learning_rate': 1.7125000000000003e-05, 'epoch': 2.48} 66%|██████▌ | 6587/10000 [24:00:33<12:11:49, 12.87s/it] 66%|██████▌ | 6588/10000 [24:00:46<12:12:11, 12.88s/it] {'loss': 0.0048, 'learning_rate': 1.712e-05, 'epoch': 2.48} 66%|██████▌ | 6588/10000 [24:00:46<12:12:11, 12.88s/it] 66%|██████▌ | 6589/10000 [24:00:59<12:13:41, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.7115e-05, 'epoch': 2.48} 66%|██████▌ | 6589/10000 [24:00:59<12:13:41, 12.91s/it] 66%|██████▌ | 6590/10000 [24:01:12<12:12:33, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.711e-05, 'epoch': 2.48} 66%|██████▌ | 6590/10000 [24:01:12<12:12:33, 12.89s/it] 66%|██████▌ | 6591/10000 [24:01:25<12:14:00, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.7105000000000003e-05, 'epoch': 2.48} 66%|██████▌ | 
6591/10000 [24:01:25<12:14:00, 12.92s/it] 66%|██████▌ | 6592/10000 [24:01:38<12:14:13, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.7100000000000002e-05, 'epoch': 2.48} 66%|██████▌ | 6592/10000 [24:01:38<12:14:13, 12.93s/it] 66%|██████▌ | 6593/10000 [24:01:51<12:12:23, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.7095e-05, 'epoch': 2.48} 66%|██████▌ | 6593/10000 [24:01:51<12:12:23, 12.90s/it] 66%|██████▌ | 6594/10000 [24:02:03<12:13:17, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.709e-05, 'epoch': 2.48} 66%|██████▌ | 6594/10000 [24:02:03<12:13:17, 12.92s/it] 66%|██████▌ | 6595/10000 [24:02:16<12:13:02, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.7085e-05, 'epoch': 2.48} 66%|██████▌ | 6595/10000 [24:02:16<12:13:02, 12.92s/it] 66%|██████▌ | 6596/10000 [24:02:29<12:11:40, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.7080000000000002e-05, 'epoch': 2.49} 66%|██████▌ | 6596/10000 [24:02:29<12:11:40, 12.90s/it] 66%|██████▌ | 6597/10000 [24:02:42<12:10:51, 12.89s/it] {'loss': 0.0058, 'learning_rate': 1.7075e-05, 'epoch': 2.49} 66%|██████▌ | 6597/10000 [24:02:42<12:10:51, 12.89s/it] 66%|██████▌ | 6598/10000 [24:02:55<12:11:20, 12.90s/it] {'loss': 0.0036, 'learning_rate': 1.707e-05, 'epoch': 2.49} 66%|██████▌ | 6598/10000 [24:02:55<12:11:20, 12.90s/it] 66%|██████▌ | 6599/10000 [24:03:08<12:10:15, 12.88s/it] {'loss': 0.0058, 'learning_rate': 1.7065e-05, 'epoch': 2.49} 66%|██████▌ | 6599/10000 [24:03:08<12:10:15, 12.88s/it] 66%|██████▌ | 6600/10000 [24:03:21<12:09:33, 12.87s/it] {'loss': 0.0033, 'learning_rate': 1.706e-05, 'epoch': 2.49} 66%|██████▌ | 6600/10000 [24:03:21<12:09:33, 12.87s/it] 66%|██████▌ | 6601/10000 [24:03:34<12:10:26, 12.89s/it] {'loss': 0.0038, 'learning_rate': 1.7055000000000002e-05, 'epoch': 2.49} 66%|██████▌ | 6601/10000 [24:03:34<12:10:26, 12.89s/it] 66%|██████▌ | 6602/10000 [24:03:47<12:10:18, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.705e-05, 'epoch': 2.49} 66%|██████▌ | 6602/10000 [24:03:47<12:10:18, 12.90s/it] 66%|██████▌ | 
6603/10000 [24:04:00<12:11:29, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.7045e-05, 'epoch': 2.49} 66%|██████▌ | 6603/10000 [24:04:00<12:11:29, 12.92s/it] 66%|██████▌ | 6604/10000 [24:04:12<12:10:10, 12.90s/it] {'loss': 0.0055, 'learning_rate': 1.704e-05, 'epoch': 2.49} 66%|██████▌ | 6604/10000 [24:04:12<12:10:10, 12.90s/it] 66%|██████▌ | 6605/10000 [24:04:25<12:10:09, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.7035000000000002e-05, 'epoch': 2.49} 66%|██████▌ | 6605/10000 [24:04:25<12:10:09, 12.90s/it] 66%|██████▌ | 6606/10000 [24:04:38<12:10:17, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.703e-05, 'epoch': 2.49} 66%|██████▌ | 6606/10000 [24:04:38<12:10:17, 12.91s/it] 66%|██████▌ | 6607/10000 [24:04:51<12:10:16, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.7025e-05, 'epoch': 2.49} 66%|██████▌ | 6607/10000 [24:04:51<12:10:16, 12.91s/it] 66%|██████▌ | 6608/10000 [24:05:04<12:11:13, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.702e-05, 'epoch': 2.49} 66%|██████▌ | 6608/10000 [24:05:04<12:11:13, 12.93s/it] 66%|██████▌ | 6609/10000 [24:05:17<12:09:33, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.7015e-05, 'epoch': 2.49} 66%|██████▌ | 6609/10000 [24:05:17<12:09:33, 12.91s/it] 66%|██████▌ | 6610/10000 [24:05:30<12:09:16, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.701e-05, 'epoch': 2.49} 66%|██████▌ | 6610/10000 [24:05:30<12:09:16, 12.91s/it] 66%|██████▌ | 6611/10000 [24:05:43<12:08:43, 12.90s/it] {'loss': 0.0035, 'learning_rate': 1.7005e-05, 'epoch': 2.49} 66%|██████▌ | 6611/10000 [24:05:43<12:08:43, 12.90s/it] 66%|██████▌ | 6612/10000 [24:05:56<12:08:25, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.7000000000000003e-05, 'epoch': 2.49} 66%|██████▌ | 6612/10000 [24:05:56<12:08:25, 12.90s/it] 66%|██████▌ | 6613/10000 [24:06:09<12:09:13, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.6995e-05, 'epoch': 2.49} 66%|██████▌ | 6613/10000 [24:06:09<12:09:13, 12.92s/it] 66%|██████▌ | 6614/10000 [24:06:21<12:07:10, 12.89s/it] {'loss': 0.0043, 'learning_rate': 
1.699e-05, 'epoch': 2.49} 66%|██████▌ | 6614/10000 [24:06:21<12:07:10, 12.89s/it] 66%|██████▌ | 6615/10000 [24:06:34<12:06:45, 12.88s/it] {'loss': 0.0059, 'learning_rate': 1.6985e-05, 'epoch': 2.49} 66%|██████▌ | 6615/10000 [24:06:34<12:06:45, 12.88s/it] 66%|██████▌ | 6616/10000 [24:06:47<12:06:03, 12.87s/it] {'loss': 0.0049, 'learning_rate': 1.698e-05, 'epoch': 2.49} 66%|██████▌ | 6616/10000 [24:06:47<12:06:03, 12.87s/it] 66%|██████▌ | 6617/10000 [24:07:00<12:05:52, 12.87s/it] {'loss': 0.0037, 'learning_rate': 1.6975000000000003e-05, 'epoch': 2.49} 66%|██████▌ | 6617/10000 [24:07:00<12:05:52, 12.87s/it] 66%|██████▌ | 6618/10000 [24:07:13<12:09:52, 12.95s/it] {'loss': 0.0039, 'learning_rate': 1.697e-05, 'epoch': 2.49} 66%|██████▌ | 6618/10000 [24:07:13<12:09:52, 12.95s/it] 66%|██████▌ | 6619/10000 [24:07:26<12:07:31, 12.91s/it] {'loss': 0.0053, 'learning_rate': 1.6965e-05, 'epoch': 2.49} 66%|██████▌ | 6619/10000 [24:07:26<12:07:31, 12.91s/it] 66%|██████▌ | 6620/10000 [24:07:39<12:07:37, 12.92s/it] {'loss': 0.0047, 'learning_rate': 1.696e-05, 'epoch': 2.49} 66%|██████▌ | 6620/10000 [24:07:39<12:07:37, 12.92s/it] 66%|██████▌ | 6621/10000 [24:07:52<12:07:10, 12.91s/it] {'loss': 0.0053, 'learning_rate': 1.6955000000000003e-05, 'epoch': 2.49} 66%|██████▌ | 6621/10000 [24:07:52<12:07:10, 12.91s/it] 66%|██████▌ | 6622/10000 [24:08:05<12:07:35, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.6950000000000002e-05, 'epoch': 2.5} 66%|██████▌ | 6622/10000 [24:08:05<12:07:35, 12.92s/it] 66%|██████▌ | 6623/10000 [24:08:18<12:07:28, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.6945e-05, 'epoch': 2.5} 66%|██████▌ | 6623/10000 [24:08:18<12:07:28, 12.93s/it] 66%|██████▌ | 6624/10000 [24:08:31<12:08:18, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.694e-05, 'epoch': 2.5} 66%|██████▌ | 6624/10000 [24:08:31<12:08:18, 12.94s/it] 66%|██████▋ | 6625/10000 [24:08:44<12:08:53, 12.96s/it] {'loss': 0.0037, 'learning_rate': 1.6935e-05, 'epoch': 2.5} 66%|██████▋ | 6625/10000 
[24:08:44<12:08:53, 12.96s/it] 66%|██████▋ | 6626/10000 [24:08:57<12:09:39, 12.98s/it] {'loss': 0.0037, 'learning_rate': 1.6930000000000002e-05, 'epoch': 2.5} 66%|██████▋ | 6626/10000 [24:08:57<12:09:39, 12.98s/it] 66%|██████▋ | 6627/10000 [24:09:10<12:08:30, 12.96s/it] {'loss': 0.0055, 'learning_rate': 1.6925e-05, 'epoch': 2.5} 66%|██████▋ | 6627/10000 [24:09:10<12:08:30, 12.96s/it] 66%|██████▋ | 6628/10000 [24:09:23<12:07:58, 12.95s/it] {'loss': 0.0053, 'learning_rate': 1.692e-05, 'epoch': 2.5} 66%|██████▋ | 6628/10000 [24:09:23<12:07:58, 12.95s/it] 66%|██████▋ | 6629/10000 [24:09:36<12:07:27, 12.95s/it] {'loss': 0.0038, 'learning_rate': 1.6915e-05, 'epoch': 2.5} 66%|██████▋ | 6629/10000 [24:09:36<12:07:27, 12.95s/it] 66%|██████▋ | 6630/10000 [24:09:48<12:06:37, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.6910000000000002e-05, 'epoch': 2.5} 66%|██████▋ | 6630/10000 [24:09:48<12:06:37, 12.94s/it] 66%|██████▋ | 6631/10000 [24:10:01<12:07:17, 12.95s/it] {'loss': 0.0035, 'learning_rate': 1.6905e-05, 'epoch': 2.5} 66%|██████▋ | 6631/10000 [24:10:01<12:07:17, 12.95s/it] 66%|██████▋ | 6632/10000 [24:10:14<12:07:16, 12.96s/it] {'loss': 0.0049, 'learning_rate': 1.69e-05, 'epoch': 2.5} 66%|██████▋ | 6632/10000 [24:10:14<12:07:16, 12.96s/it] 66%|██████▋ | 6633/10000 [24:10:27<12:06:18, 12.94s/it] {'loss': 0.0035, 'learning_rate': 1.6895e-05, 'epoch': 2.5} 66%|██████▋ | 6633/10000 [24:10:27<12:06:18, 12.94s/it] 66%|██████▋ | 6634/10000 [24:10:40<12:05:22, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.689e-05, 'epoch': 2.5} 66%|██████▋ | 6634/10000 [24:10:40<12:05:22, 12.93s/it] 66%|██████▋ | 6635/10000 [24:10:53<12:06:07, 12.95s/it] {'loss': 0.0052, 'learning_rate': 1.6885000000000002e-05, 'epoch': 2.5} 66%|██████▋ | 6635/10000 [24:10:53<12:06:07, 12.95s/it] 66%|██████▋ | 6636/10000 [24:11:06<12:05:07, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.688e-05, 'epoch': 2.5} 66%|██████▋ | 6636/10000 [24:11:06<12:05:07, 12.93s/it] 66%|██████▋ | 6637/10000 [24:11:19<12:05:09, 
12.94s/it] {'loss': 0.0054, 'learning_rate': 1.6875000000000004e-05, 'epoch': 2.5} 66%|██████▋ | 6637/10000 [24:11:19<12:05:09, 12.94s/it] 66%|██████▋ | 6638/10000 [24:11:32<12:06:12, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.687e-05, 'epoch': 2.5} 66%|██████▋ | 6638/10000 [24:11:32<12:06:12, 12.96s/it] 66%|██████▋ | 6639/10000 [24:11:45<12:05:47, 12.96s/it] {'loss': 0.0039, 'learning_rate': 1.6865e-05, 'epoch': 2.5} 66%|██████▋ | 6639/10000 [24:11:45<12:05:47, 12.96s/it] 66%|██████▋ | 6640/10000 [24:11:58<12:03:50, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.686e-05, 'epoch': 2.5} 66%|██████▋ | 6640/10000 [24:11:58<12:03:50, 12.93s/it] 66%|██████▋ | 6641/10000 [24:12:11<12:02:35, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.6855e-05, 'epoch': 2.5} 66%|██████▋ | 6641/10000 [24:12:11<12:02:35, 12.91s/it] 66%|██████▋ | 6642/10000 [24:12:24<12:01:11, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.6850000000000003e-05, 'epoch': 2.5} 66%|██████▋ | 6642/10000 [24:12:24<12:01:11, 12.89s/it] 66%|██████▋ | 6643/10000 [24:12:36<12:00:12, 12.87s/it] {'loss': 0.0057, 'learning_rate': 1.6845e-05, 'epoch': 2.5} 66%|██████▋ | 6643/10000 [24:12:36<12:00:12, 12.87s/it] 66%|██████▋ | 6644/10000 [24:12:49<12:01:19, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.684e-05, 'epoch': 2.5} 66%|██████▋ | 6644/10000 [24:12:50<12:01:19, 12.90s/it] 66%|██████▋ | 6645/10000 [24:13:02<12:02:51, 12.93s/it] {'loss': 0.0057, 'learning_rate': 1.6835e-05, 'epoch': 2.5} 66%|██████▋ | 6645/10000 [24:13:02<12:02:51, 12.93s/it] 66%|██████▋ | 6646/10000 [24:13:15<12:02:13, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.683e-05, 'epoch': 2.5} 66%|██████▋ | 6646/10000 [24:13:15<12:02:13, 12.92s/it] 66%|██████▋ | 6647/10000 [24:13:28<12:00:52, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.6825000000000002e-05, 'epoch': 2.5} 66%|██████▋ | 6647/10000 [24:13:28<12:00:52, 12.90s/it] 66%|██████▋ | 6648/10000 [24:13:41<11:59:35, 12.88s/it] {'loss': 0.0047, 'learning_rate': 1.6819999999999998e-05, 
'epoch': 2.5} 66%|██████▋ | 6648/10000 [24:13:41<11:59:35, 12.88s/it] 66%|██████▋ | 6649/10000 [24:13:54<12:00:07, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.6815e-05, 'epoch': 2.51} 66%|██████▋ | 6649/10000 [24:13:54<12:00:07, 12.89s/it] 66%|██████▋ | 6650/10000 [24:14:07<11:58:59, 12.88s/it] {'loss': 0.0045, 'learning_rate': 1.681e-05, 'epoch': 2.51} 66%|██████▋ | 6650/10000 [24:14:07<11:58:59, 12.88s/it] 67%|██████▋ | 6651/10000 [24:14:20<11:59:10, 12.88s/it] {'loss': 0.0058, 'learning_rate': 1.6805000000000003e-05, 'epoch': 2.51} 67%|██████▋ | 6651/10000 [24:14:20<11:59:10, 12.88s/it] 67%|██████▋ | 6652/10000 [24:14:32<11:59:15, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.6800000000000002e-05, 'epoch': 2.51} 67%|██████▋ | 6652/10000 [24:14:33<11:59:15, 12.89s/it] 67%|██████▋ | 6653/10000 [24:14:45<11:58:34, 12.88s/it] {'loss': 0.0065, 'learning_rate': 1.6795e-05, 'epoch': 2.51} 67%|██████▋ | 6653/10000 [24:14:45<11:58:34, 12.88s/it] 67%|██████▋ | 6654/10000 [24:14:58<11:57:53, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.679e-05, 'epoch': 2.51} 67%|██████▋ | 6654/10000 [24:14:58<11:57:53, 12.87s/it] 67%|██████▋ | 6655/10000 [24:15:11<11:57:34, 12.87s/it] {'loss': 0.0053, 'learning_rate': 1.6785e-05, 'epoch': 2.51} 67%|██████▋ | 6655/10000 [24:15:11<11:57:34, 12.87s/it] 67%|██████▋ | 6656/10000 [24:15:24<11:57:04, 12.87s/it] {'loss': 0.0053, 'learning_rate': 1.6780000000000002e-05, 'epoch': 2.51} 67%|██████▋ | 6656/10000 [24:15:24<11:57:04, 12.87s/it] 67%|██████▋ | 6657/10000 [24:15:37<11:56:20, 12.86s/it] {'loss': 0.0044, 'learning_rate': 1.6775e-05, 'epoch': 2.51} 67%|██████▋ | 6657/10000 [24:15:37<11:56:20, 12.86s/it] 67%|██████▋ | 6658/10000 [24:15:50<11:55:35, 12.85s/it] {'loss': 0.0048, 'learning_rate': 1.677e-05, 'epoch': 2.51} 67%|██████▋ | 6658/10000 [24:15:50<11:55:35, 12.85s/it] 67%|██████▋ | 6659/10000 [24:16:02<11:54:38, 12.83s/it] {'loss': 0.0045, 'learning_rate': 1.6765e-05, 'epoch': 2.51} 67%|██████▋ | 6659/10000 [24:16:02<11:54:38, 
12.83s/it] 67%|██████▋ | 6660/10000 [24:16:15<11:54:18, 12.83s/it] {'loss': 0.0055, 'learning_rate': 1.6760000000000002e-05, 'epoch': 2.51} 67%|██████▋ | 6660/10000 [24:16:15<11:54:18, 12.83s/it] 67%|██████▋ | 6661/10000 [24:16:28<11:54:51, 12.85s/it] {'loss': 0.0045, 'learning_rate': 1.6755e-05, 'epoch': 2.51} 67%|██████▋ | 6661/10000 [24:16:28<11:54:51, 12.85s/it] 67%|██████▋ | 6662/10000 [24:16:41<11:53:02, 12.82s/it] {'loss': 0.0047, 'learning_rate': 1.675e-05, 'epoch': 2.51} 67%|██████▋ | 6662/10000 [24:16:41<11:53:02, 12.82s/it] 67%|██████▋ | 6663/10000 [24:16:54<11:56:35, 12.88s/it] {'loss': 0.0037, 'learning_rate': 1.6745e-05, 'epoch': 2.51} 67%|██████▋ | 6663/10000 [24:16:54<11:56:35, 12.88s/it] 67%|██████▋ | 6664/10000 [24:17:07<11:56:59, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.674e-05, 'epoch': 2.51} 67%|██████▋ | 6664/10000 [24:17:07<11:56:59, 12.90s/it] 67%|██████▋ | 6665/10000 [24:17:20<11:56:33, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.6735e-05, 'epoch': 2.51} 67%|██████▋ | 6665/10000 [24:17:20<11:56:33, 12.89s/it] 67%|██████▋ | 6666/10000 [24:17:33<11:56:57, 12.90s/it] {'loss': 0.0029, 'learning_rate': 1.673e-05, 'epoch': 2.51} 67%|██████▋ | 6666/10000 [24:17:33<11:56:57, 12.90s/it] 67%|██████▋ | 6667/10000 [24:17:45<11:55:48, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.6725000000000003e-05, 'epoch': 2.51} 67%|██████▋ | 6667/10000 [24:17:45<11:55:48, 12.89s/it] 67%|██████▋ | 6668/10000 [24:17:58<11:55:35, 12.89s/it] {'loss': 0.0061, 'learning_rate': 1.672e-05, 'epoch': 2.51} 67%|██████▋ | 6668/10000 [24:17:58<11:55:35, 12.89s/it] 67%|██████▋ | 6669/10000 [24:18:11<11:56:08, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.6715000000000002e-05, 'epoch': 2.51} 67%|██████▋ | 6669/10000 [24:18:11<11:56:08, 12.90s/it] 67%|██████▋ | 6670/10000 [24:18:24<11:55:57, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.671e-05, 'epoch': 2.51} 67%|██████▋ | 6670/10000 [24:18:24<11:55:57, 12.90s/it] 67%|██████▋ | 6671/10000 [24:18:37<11:55:00, 12.89s/it] 
{'loss': 0.0059, 'learning_rate': 1.6705e-05, 'epoch': 2.51} 67%|██████▋ | 6671/10000 [24:18:37<11:55:00, 12.89s/it] 67%|██████▋ | 6672/10000 [24:18:50<11:55:35, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.6700000000000003e-05, 'epoch': 2.51} 67%|██████▋ | 6672/10000 [24:18:50<11:55:35, 12.90s/it] 67%|██████▋ | 6673/10000 [24:19:03<11:54:29, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.6695e-05, 'epoch': 2.51} 67%|██████▋ | 6673/10000 [24:19:03<11:54:29, 12.89s/it] 67%|██████▋ | 6674/10000 [24:19:16<11:55:34, 12.91s/it] {'loss': 0.0037, 'learning_rate': 1.669e-05, 'epoch': 2.51} 67%|██████▋ | 6674/10000 [24:19:16<11:55:34, 12.91s/it] 67%|██████▋ | 6675/10000 [24:19:29<11:55:21, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.6685e-05, 'epoch': 2.52} 67%|██████▋ | 6675/10000 [24:19:29<11:55:21, 12.91s/it] 67%|██████▋ | 6676/10000 [24:19:42<11:54:28, 12.90s/it] {'loss': 0.0054, 'learning_rate': 1.668e-05, 'epoch': 2.52} 67%|██████▋ | 6676/10000 [24:19:42<11:54:28, 12.90s/it] 67%|██████▋ | 6677/10000 [24:19:54<11:52:59, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.6675000000000002e-05, 'epoch': 2.52} 67%|██████▋ | 6677/10000 [24:19:54<11:52:59, 12.87s/it] 67%|██████▋ | 6678/10000 [24:20:07<11:53:07, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.6669999999999998e-05, 'epoch': 2.52} 67%|██████▋ | 6678/10000 [24:20:07<11:53:07, 12.88s/it] 67%|██████▋ | 6679/10000 [24:20:20<11:54:30, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.6665e-05, 'epoch': 2.52} 67%|██████▋ | 6679/10000 [24:20:20<11:54:30, 12.91s/it] 67%|██████▋ | 6680/10000 [24:20:33<11:55:32, 12.93s/it] {'loss': 0.0051, 'learning_rate': 1.666e-05, 'epoch': 2.52} 67%|██████▋ | 6680/10000 [24:20:33<11:55:32, 12.93s/it] 67%|██████▋ | 6681/10000 [24:20:46<11:53:12, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.6655000000000002e-05, 'epoch': 2.52} 67%|██████▋ | 6681/10000 [24:20:46<11:53:12, 12.89s/it] 67%|██████▋ | 6682/10000 [24:20:59<11:53:42, 12.91s/it] {'loss': 0.0074, 'learning_rate': 1.665e-05, 
'epoch': 2.52} 67%|██████▋ | 6682/10000 [24:20:59<11:53:42, 12.91s/it] 67%|██████▋ | 6683/10000 [24:21:12<11:53:03, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.6645e-05, 'epoch': 2.52} 67%|██████▋ | 6683/10000 [24:21:12<11:53:03, 12.90s/it] 67%|██████▋ | 6684/10000 [24:21:25<11:53:26, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.664e-05, 'epoch': 2.52} 67%|██████▋ | 6684/10000 [24:21:25<11:53:26, 12.91s/it] 67%|██████▋ | 6685/10000 [24:21:38<11:53:39, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.6635e-05, 'epoch': 2.52} 67%|██████▋ | 6685/10000 [24:21:38<11:53:39, 12.92s/it] 67%|██████▋ | 6686/10000 [24:21:51<11:54:20, 12.93s/it] {'loss': 0.0036, 'learning_rate': 1.6630000000000002e-05, 'epoch': 2.52} 67%|██████▋ | 6686/10000 [24:21:51<11:54:20, 12.93s/it] 67%|██████▋ | 6687/10000 [24:22:04<11:53:21, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.6625e-05, 'epoch': 2.52} 67%|██████▋ | 6687/10000 [24:22:04<11:53:21, 12.92s/it] 67%|██████▋ | 6688/10000 [24:22:16<11:52:43, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.662e-05, 'epoch': 2.52} 67%|██████▋ | 6688/10000 [24:22:17<11:52:43, 12.91s/it] 67%|██████▋ | 6689/10000 [24:22:29<11:52:21, 12.91s/it] {'loss': 0.006, 'learning_rate': 1.6615e-05, 'epoch': 2.52} 67%|██████▋ | 6689/10000 [24:22:29<11:52:21, 12.91s/it] 67%|██████▋ | 6690/10000 [24:22:42<11:53:02, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.6610000000000002e-05, 'epoch': 2.52} 67%|██████▋ | 6690/10000 [24:22:42<11:53:02, 12.93s/it] 67%|██████▋ | 6691/10000 [24:22:55<11:51:17, 12.90s/it] {'loss': 0.0035, 'learning_rate': 1.6605e-05, 'epoch': 2.52} 67%|██████▋ | 6691/10000 [24:22:55<11:51:17, 12.90s/it] 67%|██████▋ | 6692/10000 [24:23:08<11:51:14, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.66e-05, 'epoch': 2.52} 67%|██████▋ | 6692/10000 [24:23:08<11:51:14, 12.90s/it] 67%|██████▋ | 6693/10000 [24:23:21<11:51:34, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.6595e-05, 'epoch': 2.52} 67%|██████▋ | 6693/10000 [24:23:21<11:51:34, 12.91s/it] 
67%|██████▋ | 6694/10000 [24:23:34<11:49:26, 12.88s/it] {'loss': 0.0045, 'learning_rate': 1.659e-05, 'epoch': 2.52} 67%|██████▋ | 6694/10000 [24:23:34<11:49:26, 12.88s/it] 67%|██████▋ | 6695/10000 [24:23:47<11:48:34, 12.86s/it] {'loss': 0.005, 'learning_rate': 1.6585e-05, 'epoch': 2.52} 67%|██████▋ | 6695/10000 [24:23:47<11:48:34, 12.86s/it] 67%|██████▋ | 6696/10000 [24:23:59<11:47:22, 12.85s/it] {'loss': 0.0041, 'learning_rate': 1.658e-05, 'epoch': 2.52} 67%|██████▋ | 6696/10000 [24:23:59<11:47:22, 12.85s/it] 67%|██████▋ | 6697/10000 [24:24:12<11:48:35, 12.87s/it] {'loss': 0.004, 'learning_rate': 1.6575000000000003e-05, 'epoch': 2.52} 67%|██████▋ | 6697/10000 [24:24:12<11:48:35, 12.87s/it] 67%|██████▋ | 6698/10000 [24:24:25<11:49:09, 12.89s/it] {'loss': 0.0048, 'learning_rate': 1.657e-05, 'epoch': 2.52} 67%|██████▋ | 6698/10000 [24:24:25<11:49:09, 12.89s/it] 67%|██████▋ | 6699/10000 [24:24:38<11:47:56, 12.87s/it] {'loss': 0.0047, 'learning_rate': 1.6565e-05, 'epoch': 2.52} 67%|██████▋ | 6699/10000 [24:24:38<11:47:56, 12.87s/it] 67%|██████▋ | 6700/10000 [24:24:51<11:47:43, 12.87s/it] {'loss': 0.0049, 'learning_rate': 1.656e-05, 'epoch': 2.52} 67%|██████▋ | 6700/10000 [24:24:51<11:47:43, 12.87s/it] 67%|██████▋ | 6701/10000 [24:25:04<11:46:56, 12.86s/it] {'loss': 0.0046, 'learning_rate': 1.6555e-05, 'epoch': 2.52} 67%|██████▋ | 6701/10000 [24:25:04<11:46:56, 12.86s/it] 67%|██████▋ | 6702/10000 [24:25:17<11:46:38, 12.86s/it] {'loss': 0.006, 'learning_rate': 1.6550000000000002e-05, 'epoch': 2.53} 67%|██████▋ | 6702/10000 [24:25:17<11:46:38, 12.86s/it] 67%|██████▋ | 6703/10000 [24:25:30<11:47:06, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.6545e-05, 'epoch': 2.53} 67%|██████▋ | 6703/10000 [24:25:30<11:47:06, 12.87s/it] 67%|██████▋ | 6704/10000 [24:25:42<11:47:37, 12.88s/it] {'loss': 0.0052, 'learning_rate': 1.654e-05, 'epoch': 2.53} 67%|██████▋ | 6704/10000 [24:25:43<11:47:37, 12.88s/it] 67%|██████▋ | 6705/10000 [24:25:55<11:48:41, 12.90s/it] {'loss': 0.0047, 
'learning_rate': 1.6535e-05, 'epoch': 2.53} 67%|██████▋ | 6705/10000 [24:25:55<11:48:41, 12.90s/it] 67%|██████▋ | 6706/10000 [24:26:08<11:48:54, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.6530000000000003e-05, 'epoch': 2.53} 67%|██████▋ | 6706/10000 [24:26:08<11:48:54, 12.91s/it] 67%|██████▋ | 6707/10000 [24:26:21<11:48:51, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.6525000000000002e-05, 'epoch': 2.53} 67%|██████▋ | 6707/10000 [24:26:21<11:48:51, 12.92s/it] 67%|██████▋ | 6708/10000 [24:26:34<11:50:21, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.652e-05, 'epoch': 2.53} 67%|██████▋ | 6708/10000 [24:26:34<11:50:21, 12.95s/it] 67%|██████▋ | 6709/10000 [24:26:47<11:49:43, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.6515e-05, 'epoch': 2.53} 67%|██████▋ | 6709/10000 [24:26:47<11:49:43, 12.94s/it] 67%|██████▋ | 6710/10000 [24:27:00<11:49:15, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.651e-05, 'epoch': 2.53} 67%|██████▋ | 6710/10000 [24:27:00<11:49:15, 12.93s/it] 67%|██████▋ | 6711/10000 [24:27:13<11:49:21, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.6505000000000002e-05, 'epoch': 2.53} 67%|██████▋ | 6711/10000 [24:27:13<11:49:21, 12.94s/it] 67%|██████▋ | 6712/10000 [24:27:26<11:48:56, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.65e-05, 'epoch': 2.53} 67%|██████▋ | 6712/10000 [24:27:26<11:48:56, 12.94s/it] 67%|██████▋ | 6713/10000 [24:27:39<11:48:38, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.6495e-05, 'epoch': 2.53} 67%|██████▋ | 6713/10000 [24:27:39<11:48:38, 12.94s/it] 67%|██████▋ | 6714/10000 [24:27:52<11:47:30, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.649e-05, 'epoch': 2.53} 67%|██████▋ | 6714/10000 [24:27:52<11:47:30, 12.92s/it] 67%|██████▋ | 6715/10000 [24:28:05<11:48:09, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.6485e-05, 'epoch': 2.53} 67%|██████▋ | 6715/10000 [24:28:05<11:48:09, 12.93s/it] 67%|██████▋ | 6716/10000 [24:28:18<11:49:49, 12.97s/it] {'loss': 0.0049, 'learning_rate': 1.648e-05, 'epoch': 2.53} 67%|██████▋ | 
6716/10000 [24:28:18<11:49:49, 12.97s/it] 67%|██████▋ | 6717/10000 [24:28:31<11:48:46, 12.95s/it] {'loss': 0.0047, 'learning_rate': 1.6475e-05, 'epoch': 2.53} 67%|██████▋ | 6717/10000 [24:28:31<11:48:46, 12.95s/it] 67%|██████▋ | 6718/10000 [24:28:44<11:48:18, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.6470000000000003e-05, 'epoch': 2.53} 67%|██████▋ | 6718/10000 [24:28:44<11:48:18, 12.95s/it] 67%|██████▋ | 6719/10000 [24:28:57<11:47:09, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.6465e-05, 'epoch': 2.53} 67%|██████▋ | 6719/10000 [24:28:57<11:47:09, 12.93s/it] 67%|██████▋ | 6720/10000 [24:29:10<11:46:04, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.646e-05, 'epoch': 2.53} 67%|██████▋ | 6720/10000 [24:29:10<11:46:04, 12.92s/it] 67%|██████▋ | 6721/10000 [24:29:22<11:45:49, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.6455e-05, 'epoch': 2.53} 67%|██████▋ | 6721/10000 [24:29:22<11:45:49, 12.92s/it] 67%|██████▋ | 6722/10000 [24:29:35<11:45:39, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.645e-05, 'epoch': 2.53} 67%|██████▋ | 6722/10000 [24:29:35<11:45:39, 12.92s/it] 67%|██████▋ | 6723/10000 [24:29:48<11:45:15, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.6445000000000003e-05, 'epoch': 2.53} 67%|██████▋ | 6723/10000 [24:29:48<11:45:15, 12.91s/it] 67%|██████▋ | 6724/10000 [24:30:01<11:46:13, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.644e-05, 'epoch': 2.53} 67%|██████▋ | 6724/10000 [24:30:01<11:46:13, 12.93s/it] 67%|██████▋ | 6725/10000 [24:30:14<11:44:26, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.6435e-05, 'epoch': 2.53} 67%|██████▋ | 6725/10000 [24:30:14<11:44:26, 12.91s/it] 67%|██████▋ | 6726/10000 [24:30:27<11:43:48, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.643e-05, 'epoch': 2.53} 67%|██████▋ | 6726/10000 [24:30:27<11:43:48, 12.90s/it] 67%|██████▋ | 6727/10000 [24:30:40<11:45:42, 12.94s/it] {'loss': 0.0037, 'learning_rate': 1.6425000000000003e-05, 'epoch': 2.53} 67%|██████▋ | 6727/10000 [24:30:40<11:45:42, 12.94s/it] 67%|██████▋ | 
6728/10000 [24:30:53<11:46:50, 12.96s/it] {'loss': 0.0048, 'learning_rate': 1.6420000000000002e-05, 'epoch': 2.54} 67%|██████▋ | 6728/10000 [24:30:53<11:46:50, 12.96s/it] 67%|██████▋ | 6729/10000 [24:31:06<11:46:13, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.6415e-05, 'epoch': 2.54} 67%|██████▋ | 6729/10000 [24:31:06<11:46:13, 12.95s/it] 67%|██████▋ | 6730/10000 [24:31:19<11:45:18, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.641e-05, 'epoch': 2.54} 67%|██████▋ | 6730/10000 [24:31:19<11:45:18, 12.94s/it] 67%|██████▋ | 6731/10000 [24:31:32<11:45:43, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.6405e-05, 'epoch': 2.54} 67%|██████▋ | 6731/10000 [24:31:32<11:45:43, 12.95s/it] 67%|██████▋ | 6732/10000 [24:31:45<11:46:09, 12.96s/it] {'loss': 0.0048, 'learning_rate': 1.6400000000000002e-05, 'epoch': 2.54} 67%|██████▋ | 6732/10000 [24:31:45<11:46:09, 12.96s/it] 67%|██████▋ | 6733/10000 [24:31:58<11:44:11, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.6395e-05, 'epoch': 2.54} 67%|██████▋ | 6733/10000 [24:31:58<11:44:11, 12.93s/it] 67%|██████▋ | 6734/10000 [24:32:11<11:44:29, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.639e-05, 'epoch': 2.54} 67%|██████▋ | 6734/10000 [24:32:11<11:44:29, 12.94s/it] 67%|██████▋ | 6735/10000 [24:32:24<11:42:59, 12.92s/it] {'loss': 0.0032, 'learning_rate': 1.6385e-05, 'epoch': 2.54} 67%|██████▋ | 6735/10000 [24:32:24<11:42:59, 12.92s/it] 67%|██████▋ | 6736/10000 [24:32:37<11:44:35, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.6380000000000002e-05, 'epoch': 2.54} 67%|██████▋ | 6736/10000 [24:32:37<11:44:35, 12.95s/it] 67%|██████▋ | 6737/10000 [24:32:50<11:45:01, 12.96s/it] {'loss': 0.0055, 'learning_rate': 1.6375e-05, 'epoch': 2.54} 67%|██████▋ | 6737/10000 [24:32:50<11:45:01, 12.96s/it] 67%|██████▋ | 6738/10000 [24:33:03<11:45:07, 12.97s/it] {'loss': 0.0049, 'learning_rate': 1.637e-05, 'epoch': 2.54} 67%|██████▋ | 6738/10000 [24:33:03<11:45:07, 12.97s/it] 67%|██████▋ | 6739/10000 [24:33:15<11:43:53, 12.95s/it] {'loss': 0.0052, 
'learning_rate': 1.6365e-05, 'epoch': 2.54} 67%|██████▋ | 6739/10000 [24:33:15<11:43:53, 12.95s/it] 67%|██████▋ | 6740/10000 [24:33:28<11:42:27, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.636e-05, 'epoch': 2.54} 67%|██████▋ | 6740/10000 [24:33:28<11:42:27, 12.93s/it] 67%|██████▋ | 6741/10000 [24:33:41<11:42:35, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.6355000000000002e-05, 'epoch': 2.54} 67%|██████▋ | 6741/10000 [24:33:41<11:42:35, 12.94s/it] 67%|██████▋ | 6742/10000 [24:33:54<11:42:43, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.635e-05, 'epoch': 2.54} 67%|██████▋ | 6742/10000 [24:33:54<11:42:43, 12.94s/it] 67%|██████▋ | 6743/10000 [24:34:07<11:42:22, 12.94s/it] {'loss': 0.0066, 'learning_rate': 1.6345000000000004e-05, 'epoch': 2.54} 67%|██████▋ | 6743/10000 [24:34:07<11:42:22, 12.94s/it] 67%|██████▋ | 6744/10000 [24:34:20<11:42:25, 12.94s/it] {'loss': 0.0051, 'learning_rate': 1.634e-05, 'epoch': 2.54} 67%|██████▋ | 6744/10000 [24:34:20<11:42:25, 12.94s/it] 67%|██████▋ | 6745/10000 [24:34:33<11:40:16, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.6335e-05, 'epoch': 2.54} 67%|██████▋ | 6745/10000 [24:34:33<11:40:16, 12.91s/it] 67%|██████▋ | 6746/10000 [24:34:46<11:39:16, 12.89s/it] {'loss': 0.0048, 'learning_rate': 1.633e-05, 'epoch': 2.54} 67%|██████▋ | 6746/10000 [24:34:46<11:39:16, 12.89s/it] 67%|██████▋ | 6747/10000 [24:34:59<11:39:34, 12.90s/it] {'loss': 0.0035, 'learning_rate': 1.6325e-05, 'epoch': 2.54} 67%|██████▋ | 6747/10000 [24:34:59<11:39:34, 12.90s/it] 67%|██████▋ | 6748/10000 [24:35:12<11:40:31, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.6320000000000003e-05, 'epoch': 2.54} 67%|██████▋ | 6748/10000 [24:35:12<11:40:31, 12.92s/it] 67%|██████▋ | 6749/10000 [24:35:25<11:41:15, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.6315e-05, 'epoch': 2.54} 67%|██████▋ | 6749/10000 [24:35:25<11:41:15, 12.94s/it] 68%|██████▊ | 6750/10000 [24:35:38<11:40:02, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.631e-05, 'epoch': 2.54} 68%|██████▊ | 
6750/10000 [24:35:38<11:40:02, 12.92s/it] 68%|██████▊ | 6751/10000 [24:35:50<11:40:09, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.6305e-05, 'epoch': 2.54} 68%|██████▊ | 6751/10000 [24:35:50<11:40:09, 12.93s/it] 68%|██████▊ | 6752/10000 [24:36:03<11:39:44, 12.93s/it] {'loss': 0.0064, 'learning_rate': 1.63e-05, 'epoch': 2.54} 68%|██████▊ | 6752/10000 [24:36:03<11:39:44, 12.93s/it] 68%|██████▊ | 6753/10000 [24:36:16<11:39:10, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.6295000000000002e-05, 'epoch': 2.54} 68%|██████▊ | 6753/10000 [24:36:16<11:39:10, 12.92s/it] 68%|██████▊ | 6754/10000 [24:36:29<11:38:30, 12.91s/it] {'loss': 0.0037, 'learning_rate': 1.6289999999999998e-05, 'epoch': 2.54} 68%|██████▊ | 6754/10000 [24:36:29<11:38:30, 12.91s/it] 68%|██████▊ | 6755/10000 [24:36:42<11:39:09, 12.93s/it] {'loss': 0.0034, 'learning_rate': 1.6285e-05, 'epoch': 2.55} 68%|██████▊ | 6755/10000 [24:36:42<11:39:09, 12.93s/it] 68%|██████▊ | 6756/10000 [24:36:55<11:39:39, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.628e-05, 'epoch': 2.55} 68%|██████▊ | 6756/10000 [24:36:55<11:39:39, 12.94s/it] 68%|██████▊ | 6757/10000 [24:37:08<11:38:44, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.6275000000000003e-05, 'epoch': 2.55} 68%|██████▊ | 6757/10000 [24:37:08<11:38:44, 12.93s/it] 68%|██████▊ | 6758/10000 [24:37:21<11:38:58, 12.94s/it] {'loss': 0.0058, 'learning_rate': 1.6270000000000002e-05, 'epoch': 2.55} 68%|██████▊ | 6758/10000 [24:37:21<11:38:58, 12.94s/it] 68%|██████▊ | 6759/10000 [24:37:34<11:38:21, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.6265e-05, 'epoch': 2.55} 68%|██████▊ | 6759/10000 [24:37:34<11:38:21, 12.93s/it] 68%|██████▊ | 6760/10000 [24:37:47<11:38:16, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.626e-05, 'epoch': 2.55} 68%|██████▊ | 6760/10000 [24:37:47<11:38:16, 12.93s/it] 68%|██████▊ | 6761/10000 [24:38:00<11:38:01, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.6255e-05, 'epoch': 2.55} 68%|██████▊ | 6761/10000 [24:38:00<11:38:01, 12.93s/it] 68%|██████▊ 
| 6762/10000 [24:38:13<11:38:29, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.6250000000000002e-05, 'epoch': 2.55} 68%|██████▊ | 6762/10000 [24:38:13<11:38:29, 12.94s/it] 68%|██████▊ | 6763/10000 [24:38:26<11:38:44, 12.95s/it] {'loss': 0.0031, 'learning_rate': 1.6245e-05, 'epoch': 2.55} 68%|██████▊ | 6763/10000 [24:38:26<11:38:44, 12.95s/it] 68%|██████▊ | 6764/10000 [24:38:39<11:37:32, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.624e-05, 'epoch': 2.55} 68%|██████▊ | 6764/10000 [24:38:39<11:37:32, 12.93s/it] 68%|██████▊ | 6765/10000 [24:38:51<11:36:45, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.6235e-05, 'epoch': 2.55} 68%|██████▊ | 6765/10000 [24:38:52<11:36:45, 12.92s/it] 68%|██████▊ | 6766/10000 [24:39:04<11:37:20, 12.94s/it] {'loss': 0.0056, 'learning_rate': 1.6230000000000002e-05, 'epoch': 2.55} 68%|██████▊ | 6766/10000 [24:39:04<11:37:20, 12.94s/it] 68%|██████▊ | 6767/10000 [24:39:17<11:37:34, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.6225e-05, 'epoch': 2.55} 68%|██████▊ | 6767/10000 [24:39:17<11:37:34, 12.95s/it] 68%|██████▊ | 6768/10000 [24:39:30<11:37:26, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.622e-05, 'epoch': 2.55} 68%|██████▊ | 6768/10000 [24:39:30<11:37:26, 12.95s/it] 68%|██████▊ | 6769/10000 [24:39:43<11:36:28, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.6215e-05, 'epoch': 2.55} 68%|██████▊ | 6769/10000 [24:39:43<11:36:28, 12.93s/it] 68%|██████▊ | 6770/10000 [24:39:56<11:35:08, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.621e-05, 'epoch': 2.55} 68%|██████▊ | 6770/10000 [24:39:56<11:35:08, 12.91s/it] 68%|██████▊ | 6771/10000 [24:40:09<11:35:40, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.6205e-05, 'epoch': 2.55} 68%|██████▊ | 6771/10000 [24:40:09<11:35:40, 12.93s/it] 68%|██████▊ | 6772/10000 [24:40:22<11:36:43, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.62e-05, 'epoch': 2.55} 68%|██████▊ | 6772/10000 [24:40:22<11:36:43, 12.95s/it] 68%|██████▊ | 6773/10000 [24:40:35<11:36:08, 12.94s/it] {'loss': 0.0054, 'learning_rate': 
1.6195000000000003e-05, 'epoch': 2.55} 68%|██████▊ | 6773/10000 [24:40:35<11:36:08, 12.94s/it] 68%|██████▊ | 6774/10000 [24:40:48<11:35:39, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.619e-05, 'epoch': 2.55} 68%|██████▊ | 6774/10000 [24:40:48<11:35:39, 12.94s/it] 68%|██████▊ | 6775/10000 [24:41:01<11:35:05, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.6185000000000002e-05, 'epoch': 2.55} 68%|██████▊ | 6775/10000 [24:41:01<11:35:05, 12.93s/it] 68%|██████▊ | 6776/10000 [24:41:14<11:34:47, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.618e-05, 'epoch': 2.55} 68%|██████▊ | 6776/10000 [24:41:14<11:34:47, 12.93s/it] 68%|██████▊ | 6777/10000 [24:41:27<11:35:15, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.6175e-05, 'epoch': 2.55} 68%|██████▊ | 6777/10000 [24:41:27<11:35:15, 12.94s/it] 68%|██████▊ | 6778/10000 [24:41:40<11:36:37, 12.97s/it] {'loss': 0.0048, 'learning_rate': 1.6170000000000003e-05, 'epoch': 2.55} 68%|██████▊ | 6778/10000 [24:41:40<11:36:37, 12.97s/it] 68%|██████▊ | 6779/10000 [24:41:53<11:36:48, 12.98s/it] {'loss': 0.0038, 'learning_rate': 1.6165e-05, 'epoch': 2.55} 68%|██████▊ | 6779/10000 [24:41:53<11:36:48, 12.98s/it] 68%|██████▊ | 6780/10000 [24:42:06<11:36:53, 12.99s/it] {'loss': 0.0045, 'learning_rate': 1.616e-05, 'epoch': 2.55} 68%|██████▊ | 6780/10000 [24:42:06<11:36:53, 12.99s/it] 68%|██████▊ | 6781/10000 [24:42:19<11:35:37, 12.97s/it] {'loss': 0.0056, 'learning_rate': 1.6155e-05, 'epoch': 2.56} 68%|██████▊ | 6781/10000 [24:42:19<11:35:37, 12.97s/it] 68%|██████▊ | 6782/10000 [24:42:32<11:33:28, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.6150000000000003e-05, 'epoch': 2.56} 68%|██████▊ | 6782/10000 [24:42:32<11:33:28, 12.93s/it] 68%|██████▊ | 6783/10000 [24:42:45<11:34:16, 12.95s/it] {'loss': 0.0052, 'learning_rate': 1.6145000000000002e-05, 'epoch': 2.56} 68%|██████▊ | 6783/10000 [24:42:45<11:34:16, 12.95s/it] 68%|██████▊ | 6784/10000 [24:42:58<11:34:50, 12.96s/it] {'loss': 0.0035, 'learning_rate': 1.6139999999999998e-05, 'epoch': 2.56} 
68%|██████▊ | 6784/10000 [24:42:58<11:34:50, 12.96s/it] 68%|██████▊ | 6785/10000 [24:43:10<11:33:20, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.6135e-05, 'epoch': 2.56} 68%|██████▊ | 6785/10000 [24:43:10<11:33:20, 12.94s/it] 68%|██████▊ | 6786/10000 [24:43:23<11:33:19, 12.94s/it] {'loss': 0.0036, 'learning_rate': 1.613e-05, 'epoch': 2.56} 68%|██████▊ | 6786/10000 [24:43:23<11:33:19, 12.94s/it] 68%|██████▊ | 6787/10000 [24:43:36<11:33:14, 12.95s/it] {'loss': 0.0039, 'learning_rate': 1.6125000000000002e-05, 'epoch': 2.56} 68%|██████▊ | 6787/10000 [24:43:36<11:33:14, 12.95s/it] 68%|██████▊ | 6788/10000 [24:43:49<11:34:01, 12.96s/it] {'loss': 0.0042, 'learning_rate': 1.612e-05, 'epoch': 2.56} 68%|██████▊ | 6788/10000 [24:43:49<11:34:01, 12.96s/it] 68%|██████▊ | 6789/10000 [24:44:02<11:35:30, 13.00s/it] {'loss': 0.0044, 'learning_rate': 1.6115e-05, 'epoch': 2.56} 68%|██████▊ | 6789/10000 [24:44:02<11:35:30, 13.00s/it] 68%|██████▊ | 6790/10000 [24:44:15<11:36:15, 13.01s/it] {'loss': 0.0054, 'learning_rate': 1.611e-05, 'epoch': 2.56} 68%|██████▊ | 6790/10000 [24:44:16<11:36:15, 13.01s/it] 68%|██████▊ | 6791/10000 [24:44:28<11:34:38, 12.99s/it] {'loss': 0.0055, 'learning_rate': 1.6105e-05, 'epoch': 2.56} 68%|██████▊ | 6791/10000 [24:44:28<11:34:38, 12.99s/it] 68%|██████▊ | 6792/10000 [24:44:41<11:33:11, 12.96s/it] {'loss': 0.0047, 'learning_rate': 1.6100000000000002e-05, 'epoch': 2.56} 68%|██████▊ | 6792/10000 [24:44:41<11:33:11, 12.96s/it] 68%|██████▊ | 6793/10000 [24:44:54<11:31:46, 12.94s/it] {'loss': 0.0037, 'learning_rate': 1.6095e-05, 'epoch': 2.56} 68%|██████▊ | 6793/10000 [24:44:54<11:31:46, 12.94s/it] 68%|██████▊ | 6794/10000 [24:45:07<11:31:33, 12.94s/it] {'loss': 0.0051, 'learning_rate': 1.609e-05, 'epoch': 2.56} 68%|██████▊ | 6794/10000 [24:45:07<11:31:33, 12.94s/it] 68%|██████▊ | 6795/10000 [24:45:20<11:31:14, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.6085e-05, 'epoch': 2.56} 68%|██████▊ | 6795/10000 [24:45:20<11:31:14, 12.94s/it] 68%|██████▊ | 
6796/10000 [24:45:33<11:30:21, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.6080000000000002e-05, 'epoch': 2.56} 68%|██████▊ | 6796/10000 [24:45:33<11:30:21, 12.93s/it] 68%|██████▊ | 6797/10000 [24:45:46<11:30:07, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.6075e-05, 'epoch': 2.56} 68%|██████▊ | 6797/10000 [24:45:46<11:30:07, 12.93s/it] 68%|██████▊ | 6798/10000 [24:45:59<11:29:06, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.607e-05, 'epoch': 2.56} 68%|██████▊ | 6798/10000 [24:45:59<11:29:06, 12.91s/it] 68%|██████▊ | 6799/10000 [24:46:12<11:27:55, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.6065e-05, 'epoch': 2.56} 68%|██████▊ | 6799/10000 [24:46:12<11:27:55, 12.89s/it] 68%|██████▊ | 6800/10000 [24:46:25<11:27:47, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.606e-05, 'epoch': 2.56} 68%|██████▊ | 6800/10000 [24:46:25<11:27:47, 12.90s/it] 68%|██████▊ | 6801/10000 [24:46:38<11:29:06, 12.92s/it] {'loss': 0.0039, 'learning_rate': 1.6055e-05, 'epoch': 2.56} 68%|██████▊ | 6801/10000 [24:46:38<11:29:06, 12.92s/it] 68%|██████▊ | 6802/10000 [24:46:50<11:27:20, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.605e-05, 'epoch': 2.56} 68%|██████▊ | 6802/10000 [24:46:50<11:27:20, 12.90s/it] 68%|██████▊ | 6803/10000 [24:47:03<11:27:35, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.6045000000000003e-05, 'epoch': 2.56} 68%|██████▊ | 6803/10000 [24:47:03<11:27:35, 12.90s/it] 68%|██████▊ | 6804/10000 [24:47:16<11:28:19, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.604e-05, 'epoch': 2.56} 68%|██████▊ | 6804/10000 [24:47:16<11:28:19, 12.92s/it] 68%|██████▊ | 6805/10000 [24:47:29<11:28:04, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.6035e-05, 'epoch': 2.56} 68%|██████▊ | 6805/10000 [24:47:29<11:28:04, 12.92s/it] 68%|██████▊ | 6806/10000 [24:47:42<11:27:28, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.603e-05, 'epoch': 2.56} 68%|██████▊ | 6806/10000 [24:47:42<11:27:28, 12.91s/it] 68%|██████▊ | 6807/10000 [24:47:55<11:27:57, 12.93s/it] {'loss': 0.0046, 'learning_rate': 
1.6025e-05, 'epoch': 2.56} 68%|██████▊ | 6807/10000 [24:47:55<11:27:57, 12.93s/it] 68%|██████▊ | 6808/10000 [24:48:08<11:30:45, 12.98s/it] {'loss': 0.0058, 'learning_rate': 1.6020000000000002e-05, 'epoch': 2.57} 68%|██████▊ | 6808/10000 [24:48:08<11:30:45, 12.98s/it] 68%|██████▊ | 6809/10000 [24:48:21<11:31:59, 13.01s/it] {'loss': 0.005, 'learning_rate': 1.6014999999999998e-05, 'epoch': 2.57} 68%|██████▊ | 6809/10000 [24:48:21<11:31:59, 13.01s/it] 68%|██████▊ | 6810/10000 [24:48:34<11:29:33, 12.97s/it] {'loss': 0.0041, 'learning_rate': 1.601e-05, 'epoch': 2.57} 68%|██████▊ | 6810/10000 [24:48:34<11:29:33, 12.97s/it] 68%|██████▊ | 6811/10000 [24:48:47<11:30:04, 12.98s/it] {'loss': 0.0053, 'learning_rate': 1.6005e-05, 'epoch': 2.57} 68%|██████▊ | 6811/10000 [24:48:47<11:30:04, 12.98s/it] 68%|██████▊ | 6812/10000 [24:49:00<11:29:53, 12.98s/it] {'loss': 0.0043, 'learning_rate': 1.6000000000000003e-05, 'epoch': 2.57} 68%|██████▊ | 6812/10000 [24:49:00<11:29:53, 12.98s/it] 68%|██████▊ | 6813/10000 [24:49:13<11:29:16, 12.98s/it] {'loss': 0.0048, 'learning_rate': 1.5995000000000002e-05, 'epoch': 2.57} 68%|██████▊ | 6813/10000 [24:49:13<11:29:16, 12.98s/it] 68%|██████▊ | 6814/10000 [24:49:26<11:27:56, 12.96s/it] {'loss': 0.005, 'learning_rate': 1.599e-05, 'epoch': 2.57} 68%|██████▊ | 6814/10000 [24:49:26<11:27:56, 12.96s/it] 68%|██████▊ | 6815/10000 [24:49:39<11:27:25, 12.95s/it] {'loss': 0.006, 'learning_rate': 1.5985e-05, 'epoch': 2.57} 68%|██████▊ | 6815/10000 [24:49:39<11:27:25, 12.95s/it] 68%|██████▊ | 6816/10000 [24:49:52<11:27:18, 12.95s/it] {'loss': 0.0057, 'learning_rate': 1.598e-05, 'epoch': 2.57} 68%|██████▊ | 6816/10000 [24:49:52<11:27:18, 12.95s/it] 68%|██████▊ | 6817/10000 [24:50:05<11:28:23, 12.98s/it] {'loss': 0.0041, 'learning_rate': 1.5975000000000002e-05, 'epoch': 2.57} 68%|██████▊ | 6817/10000 [24:50:05<11:28:23, 12.98s/it] 68%|██████▊ | 6818/10000 [24:50:18<11:26:22, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.597e-05, 'epoch': 2.57} 68%|██████▊ | 
6818/10000 [24:50:18<11:26:22, 12.94s/it] 68%|██████▊ | 6819/10000 [24:50:31<11:25:14, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.5965e-05, 'epoch': 2.57} 68%|██████▊ | 6819/10000 [24:50:31<11:25:14, 12.92s/it] 68%|██████▊ | 6820/10000 [24:50:44<11:26:04, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.596e-05, 'epoch': 2.57} 68%|██████▊ | 6820/10000 [24:50:44<11:26:04, 12.94s/it] 68%|██████▊ | 6821/10000 [24:50:57<11:24:38, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.5955e-05, 'epoch': 2.57} 68%|██████▊ | 6821/10000 [24:50:57<11:24:38, 12.92s/it] 68%|██████▊ | 6822/10000 [24:51:09<11:24:11, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.595e-05, 'epoch': 2.57} 68%|██████▊ | 6822/10000 [24:51:09<11:24:11, 12.92s/it] 68%|██████▊ | 6823/10000 [24:51:22<11:25:14, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.5945e-05, 'epoch': 2.57} 68%|██████▊ | 6823/10000 [24:51:22<11:25:14, 12.94s/it] 68%|██████▊ | 6824/10000 [24:51:35<11:23:53, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.594e-05, 'epoch': 2.57} 68%|██████▊ | 6824/10000 [24:51:35<11:23:53, 12.92s/it] 68%|██████▊ | 6825/10000 [24:51:48<11:22:19, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.5935e-05, 'epoch': 2.57} 68%|██████▊ | 6825/10000 [24:51:48<11:22:19, 12.89s/it] 68%|██████▊ | 6826/10000 [24:52:01<11:23:08, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.593e-05, 'epoch': 2.57} 68%|██████▊ | 6826/10000 [24:52:01<11:23:08, 12.91s/it] 68%|██████▊ | 6827/10000 [24:52:14<11:23:05, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.5925e-05, 'epoch': 2.57} 68%|██████▊ | 6827/10000 [24:52:14<11:23:05, 12.92s/it] 68%|██████▊ | 6828/10000 [24:52:27<11:22:29, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.592e-05, 'epoch': 2.57} 68%|██████▊ | 6828/10000 [24:52:27<11:22:29, 12.91s/it] 68%|██████▊ | 6829/10000 [24:52:40<11:27:22, 13.01s/it] {'loss': 0.0043, 'learning_rate': 1.5915000000000003e-05, 'epoch': 2.57} 68%|██████▊ | 6829/10000 [24:52:40<11:27:22, 13.01s/it] 68%|██████▊ | 6830/10000 [24:52:53<11:24:24, 
12.95s/it] {'loss': 0.0041, 'learning_rate': 1.591e-05, 'epoch': 2.57} 68%|██████▊ | 6830/10000 [24:52:53<11:24:24, 12.95s/it] 68%|██████▊ | 6831/10000 [24:53:06<11:24:03, 12.95s/it] {'loss': 0.0039, 'learning_rate': 1.5905e-05, 'epoch': 2.57} 68%|██████▊ | 6831/10000 [24:53:06<11:24:03, 12.95s/it] 68%|██████▊ | 6832/10000 [24:53:19<11:24:15, 12.96s/it] {'loss': 0.0037, 'learning_rate': 1.59e-05, 'epoch': 2.57} 68%|██████▊ | 6832/10000 [24:53:19<11:24:15, 12.96s/it] 68%|██████▊ | 6833/10000 [24:53:32<11:23:21, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.5895000000000003e-05, 'epoch': 2.57} 68%|██████▊ | 6833/10000 [24:53:32<11:23:21, 12.95s/it] 68%|██████▊ | 6834/10000 [24:53:45<11:23:40, 12.96s/it] {'loss': 0.0056, 'learning_rate': 1.5890000000000002e-05, 'epoch': 2.57} 68%|██████▊ | 6834/10000 [24:53:45<11:23:40, 12.96s/it] 68%|██████▊ | 6835/10000 [24:53:58<11:23:07, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.5885e-05, 'epoch': 2.58} 68%|██████▊ | 6835/10000 [24:53:58<11:23:07, 12.95s/it] 68%|██████▊ | 6836/10000 [24:54:11<11:23:03, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.588e-05, 'epoch': 2.58} 68%|██████▊ | 6836/10000 [24:54:11<11:23:03, 12.95s/it] 68%|██████▊ | 6837/10000 [24:54:24<11:21:54, 12.94s/it] {'loss': 0.0056, 'learning_rate': 1.5875e-05, 'epoch': 2.58} 68%|██████▊ | 6837/10000 [24:54:24<11:21:54, 12.94s/it] 68%|██████▊ | 6838/10000 [24:54:36<11:21:02, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.5870000000000002e-05, 'epoch': 2.58} 68%|██████▊ | 6838/10000 [24:54:36<11:21:02, 12.92s/it] 68%|██████▊ | 6839/10000 [24:54:49<11:20:44, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.5865e-05, 'epoch': 2.58} 68%|██████▊ | 6839/10000 [24:54:49<11:20:44, 12.92s/it] 68%|██████▊ | 6840/10000 [24:55:02<11:20:51, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.586e-05, 'epoch': 2.58} 68%|██████▊ | 6840/10000 [24:55:02<11:20:51, 12.93s/it] 68%|██████▊ | 6841/10000 [24:55:15<11:19:18, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.5855e-05, 'epoch': 
2.58} 68%|██████▊ | 6841/10000 [24:55:15<11:19:18, 12.90s/it] 68%|██████▊ | 6842/10000 [24:55:28<11:18:01, 12.88s/it] {'loss': 0.0057, 'learning_rate': 1.5850000000000002e-05, 'epoch': 2.58} 68%|██████▊ | 6842/10000 [24:55:28<11:18:01, 12.88s/it] 68%|██████▊ | 6843/10000 [24:55:41<11:18:46, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.5845e-05, 'epoch': 2.58} 68%|██████▊ | 6843/10000 [24:55:41<11:18:46, 12.90s/it] 68%|██████▊ | 6844/10000 [24:55:54<11:18:33, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.584e-05, 'epoch': 2.58} 68%|██████▊ | 6844/10000 [24:55:54<11:18:33, 12.90s/it] 68%|██████▊ | 6845/10000 [24:56:07<11:18:00, 12.89s/it] {'loss': 0.0057, 'learning_rate': 1.5835e-05, 'epoch': 2.58} 68%|██████▊ | 6845/10000 [24:56:07<11:18:00, 12.89s/it] 68%|██████▊ | 6846/10000 [24:56:20<11:17:47, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.583e-05, 'epoch': 2.58} 68%|██████▊ | 6846/10000 [24:56:20<11:17:47, 12.89s/it] 68%|██████▊ | 6847/10000 [24:56:32<11:17:17, 12.89s/it] {'loss': 0.0032, 'learning_rate': 1.5825000000000002e-05, 'epoch': 2.58} 68%|██████▊ | 6847/10000 [24:56:33<11:17:17, 12.89s/it] 68%|██████▊ | 6848/10000 [24:56:45<11:16:14, 12.87s/it] {'loss': 0.0056, 'learning_rate': 1.582e-05, 'epoch': 2.58} 68%|██████▊ | 6848/10000 [24:56:45<11:16:14, 12.87s/it] 68%|██████▊ | 6849/10000 [24:56:58<11:16:06, 12.87s/it] {'loss': 0.0034, 'learning_rate': 1.5815000000000004e-05, 'epoch': 2.58} 68%|██████▊ | 6849/10000 [24:56:58<11:16:06, 12.87s/it] 68%|██████▊ | 6850/10000 [24:57:11<11:17:19, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.581e-05, 'epoch': 2.58} 68%|██████▊ | 6850/10000 [24:57:11<11:17:19, 12.90s/it] 69%|██████▊ | 6851/10000 [24:57:24<11:16:34, 12.89s/it] {'loss': 0.0038, 'learning_rate': 1.5805000000000002e-05, 'epoch': 2.58} 69%|██████▊ | 6851/10000 [24:57:24<11:16:34, 12.89s/it] 69%|██████▊ | 6852/10000 [24:57:37<11:16:16, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.58e-05, 'epoch': 2.58} 69%|██████▊ | 6852/10000 [24:57:37<11:16:16, 
12.89s/it] 69%|██████▊ | 6853/10000 [24:57:50<11:15:53, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.5795e-05, 'epoch': 2.58} 69%|██████▊ | 6853/10000 [24:57:50<11:15:53, 12.89s/it] 69%|██████▊ | 6854/10000 [24:58:03<11:15:49, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.5790000000000003e-05, 'epoch': 2.58} 69%|██████▊ | 6854/10000 [24:58:03<11:15:49, 12.89s/it] 69%|██████▊ | 6855/10000 [24:58:16<11:15:03, 12.88s/it] {'loss': 0.0037, 'learning_rate': 1.5785e-05, 'epoch': 2.58} 69%|██████▊ | 6855/10000 [24:58:16<11:15:03, 12.88s/it] 69%|██████▊ | 6856/10000 [24:58:28<11:14:38, 12.87s/it] {'loss': 0.0053, 'learning_rate': 1.578e-05, 'epoch': 2.58} 69%|██████▊ | 6856/10000 [24:58:28<11:14:38, 12.87s/it] 69%|██████▊ | 6857/10000 [24:58:41<11:14:50, 12.88s/it] {'loss': 0.0053, 'learning_rate': 1.5775e-05, 'epoch': 2.58} 69%|██████▊ | 6857/10000 [24:58:41<11:14:50, 12.88s/it] 69%|██████▊ | 6858/10000 [24:58:54<11:14:16, 12.88s/it] {'loss': 0.0049, 'learning_rate': 1.577e-05, 'epoch': 2.58} 69%|██████▊ | 6858/10000 [24:58:54<11:14:16, 12.88s/it] 69%|██████▊ | 6859/10000 [24:59:07<11:15:57, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.5765000000000002e-05, 'epoch': 2.58} 69%|██████▊ | 6859/10000 [24:59:07<11:15:57, 12.91s/it] 69%|██████▊ | 6860/10000 [24:59:20<11:17:18, 12.94s/it] {'loss': 0.0056, 'learning_rate': 1.5759999999999998e-05, 'epoch': 2.58} 69%|██████▊ | 6860/10000 [24:59:20<11:17:18, 12.94s/it] 69%|██████▊ | 6861/10000 [24:59:33<11:16:46, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.5755e-05, 'epoch': 2.59} 69%|██████▊ | 6861/10000 [24:59:33<11:16:46, 12.94s/it] 69%|██████▊ | 6862/10000 [24:59:46<11:16:18, 12.93s/it] {'loss': 0.0036, 'learning_rate': 1.575e-05, 'epoch': 2.59} 69%|██████▊ | 6862/10000 [24:59:46<11:16:18, 12.93s/it] 69%|██████▊ | 6863/10000 [24:59:59<11:15:58, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.5745000000000003e-05, 'epoch': 2.59} 69%|██████▊ | 6863/10000 [24:59:59<11:15:58, 12.93s/it] 69%|██████▊ | 6864/10000 
[25:00:12<11:16:18, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.5740000000000002e-05, 'epoch': 2.59} 69%|██████▊ | 6864/10000 [25:00:12<11:16:18, 12.94s/it] 69%|██████▊ | 6865/10000 [25:00:25<11:16:45, 12.95s/it] {'loss': 0.0044, 'learning_rate': 1.5735e-05, 'epoch': 2.59} 69%|██████▊ | 6865/10000 [25:00:25<11:16:45, 12.95s/it] 69%|██████▊ | 6866/10000 [25:00:38<11:15:14, 12.93s/it] {'loss': 0.0038, 'learning_rate': 1.573e-05, 'epoch': 2.59} 69%|██████▊ | 6866/10000 [25:00:38<11:15:14, 12.93s/it] 69%|██████▊ | 6867/10000 [25:00:51<11:13:41, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.5725e-05, 'epoch': 2.59} 69%|██████▊ | 6867/10000 [25:00:51<11:13:41, 12.90s/it] 69%|██████▊ | 6868/10000 [25:01:04<11:13:41, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.5720000000000002e-05, 'epoch': 2.59} 69%|██████▊ | 6868/10000 [25:01:04<11:13:41, 12.91s/it] 69%|██████▊ | 6869/10000 [25:01:16<11:12:40, 12.89s/it] {'loss': 0.0038, 'learning_rate': 1.5715e-05, 'epoch': 2.59} 69%|██████▊ | 6869/10000 [25:01:16<11:12:40, 12.89s/it] 69%|██████▊ | 6870/10000 [25:01:29<11:11:57, 12.88s/it] {'loss': 0.0053, 'learning_rate': 1.571e-05, 'epoch': 2.59} 69%|██████▊ | 6870/10000 [25:01:29<11:11:57, 12.88s/it] 69%|██████▊ | 6871/10000 [25:01:42<11:11:14, 12.87s/it] {'loss': 0.006, 'learning_rate': 1.5705e-05, 'epoch': 2.59} 69%|██████▊ | 6871/10000 [25:01:42<11:11:14, 12.87s/it] 69%|██████▊ | 6872/10000 [25:01:55<11:12:03, 12.89s/it] {'loss': 0.0037, 'learning_rate': 1.5700000000000002e-05, 'epoch': 2.59} 69%|██████▊ | 6872/10000 [25:01:55<11:12:03, 12.89s/it] 69%|██████▊ | 6873/10000 [25:02:08<11:11:54, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.5695e-05, 'epoch': 2.59} 69%|██████▊ | 6873/10000 [25:02:08<11:11:54, 12.89s/it] 69%|██████▊ | 6874/10000 [25:02:21<11:10:52, 12.88s/it] {'loss': 0.0056, 'learning_rate': 1.569e-05, 'epoch': 2.59} 69%|██████▊ | 6874/10000 [25:02:21<11:10:52, 12.88s/it] 69%|██████▉ | 6875/10000 [25:02:34<11:11:04, 12.88s/it] {'loss': 0.0054, 'learning_rate': 
1.5685e-05, 'epoch': 2.59} 69%|██████▉ | 6875/10000 [25:02:34<11:11:04, 12.88s/it] 69%|██████▉ | 6876/10000 [25:02:47<11:10:51, 12.88s/it] {'loss': 0.0032, 'learning_rate': 1.568e-05, 'epoch': 2.59} 69%|██████▉ | 6876/10000 [25:02:47<11:10:51, 12.88s/it] 69%|██████▉ | 6877/10000 [25:02:59<11:11:00, 12.89s/it] {'loss': 0.0056, 'learning_rate': 1.5675e-05, 'epoch': 2.59} 69%|██████▉ | 6877/10000 [25:02:59<11:11:00, 12.89s/it] 69%|██████▉ | 6878/10000 [25:03:12<11:11:12, 12.90s/it] {'loss': 0.0052, 'learning_rate': 1.567e-05, 'epoch': 2.59} 69%|██████▉ | 6878/10000 [25:03:12<11:11:12, 12.90s/it] 69%|██████▉ | 6879/10000 [25:03:25<11:11:57, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.5665000000000003e-05, 'epoch': 2.59} 69%|██████▉ | 6879/10000 [25:03:25<11:11:57, 12.92s/it] 69%|██████▉ | 6880/10000 [25:03:38<11:13:10, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.566e-05, 'epoch': 2.59} 69%|██████▉ | 6880/10000 [25:03:38<11:13:10, 12.95s/it] 69%|██████▉ | 6881/10000 [25:03:51<11:14:04, 12.97s/it] {'loss': 0.0054, 'learning_rate': 1.5655000000000002e-05, 'epoch': 2.59} 69%|██████▉ | 6881/10000 [25:03:51<11:14:04, 12.97s/it] 69%|██████▉ | 6882/10000 [25:04:04<11:14:21, 12.98s/it] {'loss': 0.0043, 'learning_rate': 1.565e-05, 'epoch': 2.59} 69%|██████▉ | 6882/10000 [25:04:04<11:14:21, 12.98s/it] 69%|██████▉ | 6883/10000 [25:04:17<11:14:09, 12.98s/it] {'loss': 0.0048, 'learning_rate': 1.5645e-05, 'epoch': 2.59} 69%|██████▉ | 6883/10000 [25:04:17<11:14:09, 12.98s/it] 69%|██████▉ | 6884/10000 [25:04:30<11:12:44, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.5640000000000003e-05, 'epoch': 2.59} 69%|██████▉ | 6884/10000 [25:04:30<11:12:44, 12.95s/it] 69%|██████▉ | 6885/10000 [25:04:43<11:11:46, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.5635e-05, 'epoch': 2.59} 69%|██████▉ | 6885/10000 [25:04:43<11:11:46, 12.94s/it] 69%|██████▉ | 6886/10000 [25:04:56<11:10:30, 12.92s/it] {'loss': 0.0058, 'learning_rate': 1.563e-05, 'epoch': 2.59} 69%|██████▉ | 6886/10000 
[25:04:56<11:10:30, 12.92s/it] 69%|██████▉ | 6887/10000 [25:05:09<11:11:23, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.5625e-05, 'epoch': 2.59} 69%|██████▉ | 6887/10000 [25:05:09<11:11:23, 12.94s/it] 69%|██████▉ | 6888/10000 [25:05:22<11:11:32, 12.95s/it] {'loss': 0.0033, 'learning_rate': 1.5620000000000003e-05, 'epoch': 2.6} 69%|██████▉ | 6888/10000 [25:05:22<11:11:32, 12.95s/it] 69%|██████▉ | 6889/10000 [25:05:35<11:10:02, 12.92s/it] {'loss': 0.0062, 'learning_rate': 1.5615000000000002e-05, 'epoch': 2.6} 69%|██████▉ | 6889/10000 [25:05:35<11:10:02, 12.92s/it] 69%|██████▉ | 6890/10000 [25:05:48<11:09:43, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.561e-05, 'epoch': 2.6} 69%|██████▉ | 6890/10000 [25:05:48<11:09:43, 12.92s/it] 69%|██████▉ | 6891/10000 [25:06:01<11:08:57, 12.91s/it] {'loss': 0.005, 'learning_rate': 1.5605e-05, 'epoch': 2.6} 69%|██████▉ | 6891/10000 [25:06:01<11:08:57, 12.91s/it] 69%|██████▉ | 6892/10000 [25:06:14<11:08:36, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.56e-05, 'epoch': 2.6} 69%|██████▉ | 6892/10000 [25:06:14<11:08:36, 12.91s/it] 69%|██████▉ | 6893/10000 [25:06:26<11:08:55, 12.92s/it] {'loss': 0.0055, 'learning_rate': 1.5595000000000002e-05, 'epoch': 2.6} 69%|██████▉ | 6893/10000 [25:06:27<11:08:55, 12.92s/it] 69%|██████▉ | 6894/10000 [25:06:40<11:10:20, 12.95s/it] {'loss': 0.0047, 'learning_rate': 1.559e-05, 'epoch': 2.6} 69%|██████▉ | 6894/10000 [25:06:40<11:10:20, 12.95s/it] 69%|██████▉ | 6895/10000 [25:06:52<11:09:39, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.5585e-05, 'epoch': 2.6} 69%|██████▉ | 6895/10000 [25:06:52<11:09:39, 12.94s/it] 69%|██████▉ | 6896/10000 [25:07:05<11:10:36, 12.96s/it] {'loss': 0.0049, 'learning_rate': 1.558e-05, 'epoch': 2.6} 69%|██████▉ | 6896/10000 [25:07:05<11:10:36, 12.96s/it] 69%|██████▉ | 6897/10000 [25:07:18<11:10:26, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.5575e-05, 'epoch': 2.6} 69%|██████▉ | 6897/10000 [25:07:18<11:10:26, 12.96s/it] 69%|██████▉ | 6898/10000 [25:07:31<11:09:36, 
12.95s/it] {'loss': 0.0045, 'learning_rate': 1.5570000000000002e-05, 'epoch': 2.6} 69%|██████▉ | 6898/10000 [25:07:31<11:09:36, 12.95s/it] 69%|██████▉ | 6899/10000 [25:07:44<11:09:53, 12.96s/it] {'loss': 0.004, 'learning_rate': 1.5565e-05, 'epoch': 2.6} 69%|██████▉ | 6899/10000 [25:07:44<11:09:53, 12.96s/it] 69%|██████▉ | 6900/10000 [25:07:57<11:07:28, 12.92s/it] {'loss': 0.0059, 'learning_rate': 1.556e-05, 'epoch': 2.6} 69%|██████▉ | 6900/10000 [25:07:57<11:07:28, 12.92s/it] 69%|██████▉ | 6901/10000 [25:08:10<11:07:25, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.5555e-05, 'epoch': 2.6} 69%|██████▉ | 6901/10000 [25:08:10<11:07:25, 12.92s/it] 69%|██████▉ | 6902/10000 [25:08:23<11:06:37, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.5550000000000002e-05, 'epoch': 2.6} 69%|██████▉ | 6902/10000 [25:08:23<11:06:37, 12.91s/it] 69%|██████▉ | 6903/10000 [25:08:36<11:05:27, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.5545e-05, 'epoch': 2.6} 69%|██████▉ | 6903/10000 [25:08:36<11:05:27, 12.89s/it] 69%|██████▉ | 6904/10000 [25:08:49<11:06:10, 12.91s/it] {'loss': 0.0036, 'learning_rate': 1.554e-05, 'epoch': 2.6} 69%|██████▉ | 6904/10000 [25:08:49<11:06:10, 12.91s/it] 69%|██████▉ | 6905/10000 [25:09:02<11:06:44, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.5535e-05, 'epoch': 2.6} 69%|██████▉ | 6905/10000 [25:09:02<11:06:44, 12.93s/it] 69%|██████▉ | 6906/10000 [25:09:15<11:05:34, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.553e-05, 'epoch': 2.6} 69%|██████▉ | 6906/10000 [25:09:15<11:05:34, 12.91s/it] 69%|██████▉ | 6907/10000 [25:09:28<11:06:33, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.5525e-05, 'epoch': 2.6} 69%|██████▉ | 6907/10000 [25:09:28<11:06:33, 12.93s/it] 69%|██████▉ | 6908/10000 [25:09:40<11:05:44, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.552e-05, 'epoch': 2.6} 69%|██████▉ | 6908/10000 [25:09:40<11:05:44, 12.92s/it] 69%|██████▉ | 6909/10000 [25:09:53<11:05:24, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.5515000000000003e-05, 'epoch': 2.6} 
69%|██████▉ | 6909/10000 [25:09:53<11:05:24, 12.92s/it] 69%|██████▉ | 6910/10000 [25:10:06<11:04:50, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.551e-05, 'epoch': 2.6} 69%|██████▉ | 6910/10000 [25:10:06<11:04:50, 12.91s/it] 69%|██████▉ | 6911/10000 [25:10:19<11:05:24, 12.92s/it] {'loss': 0.0032, 'learning_rate': 1.5505e-05, 'epoch': 2.6} 69%|██████▉ | 6911/10000 [25:10:19<11:05:24, 12.92s/it] 69%|██████▉ | 6912/10000 [25:10:32<11:04:39, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.55e-05, 'epoch': 2.6} 69%|██████▉ | 6912/10000 [25:10:32<11:04:39, 12.91s/it] 69%|██████▉ | 6913/10000 [25:10:45<11:05:14, 12.93s/it] {'loss': 0.0032, 'learning_rate': 1.5495e-05, 'epoch': 2.6} 69%|██████▉ | 6913/10000 [25:10:45<11:05:14, 12.93s/it] 69%|██████▉ | 6914/10000 [25:10:58<11:07:00, 12.97s/it] {'loss': 0.004, 'learning_rate': 1.5490000000000002e-05, 'epoch': 2.61} 69%|██████▉ | 6914/10000 [25:10:58<11:07:00, 12.97s/it] 69%|██████▉ | 6915/10000 [25:11:11<11:07:03, 12.97s/it] {'loss': 0.004, 'learning_rate': 1.5484999999999998e-05, 'epoch': 2.61} 69%|██████▉ | 6915/10000 [25:11:11<11:07:03, 12.97s/it] 69%|██████▉ | 6916/10000 [25:11:24<11:05:11, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.548e-05, 'epoch': 2.61} 69%|██████▉ | 6916/10000 [25:11:24<11:05:11, 12.94s/it] 69%|██████▉ | 6917/10000 [25:11:37<11:03:30, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.5475e-05, 'epoch': 2.61} 69%|██████▉ | 6917/10000 [25:11:37<11:03:30, 12.91s/it] 69%|██████▉ | 6918/10000 [25:11:50<11:02:23, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.5470000000000003e-05, 'epoch': 2.61} 69%|██████▉ | 6918/10000 [25:11:50<11:02:23, 12.90s/it] 69%|██████▉ | 6919/10000 [25:12:03<11:01:55, 12.89s/it] {'loss': 0.0061, 'learning_rate': 1.5465000000000002e-05, 'epoch': 2.61} 69%|██████▉ | 6919/10000 [25:12:03<11:01:55, 12.89s/it] 69%|██████▉ | 6920/10000 [25:12:15<11:02:17, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.546e-05, 'epoch': 2.61} 69%|██████▉ | 6920/10000 [25:12:16<11:02:17, 12.90s/it] 
69%|██████▉ | 6921/10000 [25:12:28<11:02:17, 12.91s/it] {'loss': 0.0037, 'learning_rate': 1.5455e-05, 'epoch': 2.61} 69%|██████▉ | 6921/10000 [25:12:28<11:02:17, 12.91s/it] 69%|██████▉ | 6922/10000 [25:12:41<11:02:12, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.545e-05, 'epoch': 2.61} 69%|██████▉ | 6922/10000 [25:12:41<11:02:12, 12.91s/it] 69%|██████▉ | 6923/10000 [25:12:54<11:01:39, 12.90s/it] {'loss': 0.0059, 'learning_rate': 1.5445000000000002e-05, 'epoch': 2.61} 69%|██████▉ | 6923/10000 [25:12:54<11:01:39, 12.90s/it] 69%|██████▉ | 6924/10000 [25:13:07<11:01:24, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.544e-05, 'epoch': 2.61} 69%|██████▉ | 6924/10000 [25:13:07<11:01:24, 12.90s/it] 69%|██████▉ | 6925/10000 [25:13:20<11:00:40, 12.89s/it] {'loss': 0.0037, 'learning_rate': 1.5435e-05, 'epoch': 2.61} 69%|██████▉ | 6925/10000 [25:13:20<11:00:40, 12.89s/it] 69%|██████▉ | 6926/10000 [25:13:33<11:01:29, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.543e-05, 'epoch': 2.61} 69%|██████▉ | 6926/10000 [25:13:33<11:01:29, 12.91s/it] 69%|██████▉ | 6927/10000 [25:13:46<11:00:44, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.5425000000000002e-05, 'epoch': 2.61} 69%|██████▉ | 6927/10000 [25:13:46<11:00:44, 12.90s/it] 69%|██████▉ | 6928/10000 [25:13:59<10:59:05, 12.87s/it] {'loss': 0.0044, 'learning_rate': 1.542e-05, 'epoch': 2.61} 69%|██████▉ | 6928/10000 [25:13:59<10:59:05, 12.87s/it] 69%|██████▉ | 6929/10000 [25:14:11<10:58:48, 12.87s/it] {'loss': 0.0048, 'learning_rate': 1.5415e-05, 'epoch': 2.61} 69%|██████▉ | 6929/10000 [25:14:12<10:58:48, 12.87s/it] 69%|██████▉ | 6930/10000 [25:14:24<11:00:11, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.541e-05, 'epoch': 2.61} 69%|██████▉ | 6930/10000 [25:14:24<11:00:11, 12.90s/it] 69%|██████▉ | 6931/10000 [25:14:37<11:00:12, 12.91s/it] {'loss': 0.0036, 'learning_rate': 1.5405e-05, 'epoch': 2.61} 69%|██████▉ | 6931/10000 [25:14:37<11:00:12, 12.91s/it] 69%|██████▉ | 6932/10000 [25:14:50<10:59:13, 12.89s/it] {'loss': 0.0057, 
'learning_rate': 1.54e-05, 'epoch': 2.61} 69%|██████▉ | 6932/10000 [25:14:50<10:59:13, 12.89s/it] 69%|██████▉ | 6933/10000 [25:15:03<10:58:47, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.5395e-05, 'epoch': 2.61} 69%|██████▉ | 6933/10000 [25:15:03<10:58:47, 12.89s/it] 69%|██████▉ | 6934/10000 [25:15:16<10:59:06, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.539e-05, 'epoch': 2.61} 69%|██████▉ | 6934/10000 [25:15:16<10:59:06, 12.90s/it] 69%|██████▉ | 6935/10000 [25:15:29<10:58:33, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.5385e-05, 'epoch': 2.61} 69%|██████▉ | 6935/10000 [25:15:29<10:58:33, 12.89s/it] 69%|██████▉ | 6936/10000 [25:15:42<10:58:17, 12.89s/it] {'loss': 0.0032, 'learning_rate': 1.538e-05, 'epoch': 2.61} 69%|██████▉ | 6936/10000 [25:15:42<10:58:17, 12.89s/it] 69%|██████▉ | 6937/10000 [25:15:55<10:58:12, 12.89s/it] {'loss': 0.0051, 'learning_rate': 1.5375e-05, 'epoch': 2.61} 69%|██████▉ | 6937/10000 [25:15:55<10:58:12, 12.89s/it] 69%|██████▉ | 6938/10000 [25:16:08<10:58:01, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.537e-05, 'epoch': 2.61} 69%|██████▉ | 6938/10000 [25:16:08<10:58:01, 12.89s/it] 69%|██████▉ | 6939/10000 [25:16:20<10:57:44, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.5365000000000003e-05, 'epoch': 2.61} 69%|██████▉ | 6939/10000 [25:16:21<10:57:44, 12.89s/it] 69%|██████▉ | 6940/10000 [25:16:33<10:57:47, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.536e-05, 'epoch': 2.61} 69%|██████▉ | 6940/10000 [25:16:33<10:57:47, 12.90s/it] 69%|██████▉ | 6941/10000 [25:16:46<10:57:25, 12.89s/it] {'loss': 0.005, 'learning_rate': 1.5355e-05, 'epoch': 2.62} 69%|██████▉ | 6941/10000 [25:16:46<10:57:25, 12.89s/it] 69%|██████▉ | 6942/10000 [25:16:59<10:59:38, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.535e-05, 'epoch': 2.62} 69%|██████▉ | 6942/10000 [25:16:59<10:59:38, 12.94s/it] 69%|██████▉ | 6943/10000 [25:17:12<10:58:37, 12.93s/it] {'loss': 0.0056, 'learning_rate': 1.5345e-05, 'epoch': 2.62} 69%|██████▉ | 6943/10000 [25:17:12<10:58:37, 
12.93s/it] 69%|██████▉ | 6944/10000 [25:17:25<10:59:46, 12.95s/it] {'loss': 0.0037, 'learning_rate': 1.5340000000000002e-05, 'epoch': 2.62} 69%|██████▉ | 6944/10000 [25:17:25<10:59:46, 12.95s/it] 69%|██████▉ | 6945/10000 [25:17:38<10:59:02, 12.94s/it] {'loss': 0.0053, 'learning_rate': 1.5334999999999998e-05, 'epoch': 2.62} 69%|██████▉ | 6945/10000 [25:17:38<10:59:02, 12.94s/it] 69%|██████▉ | 6946/10000 [25:17:51<10:56:49, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.533e-05, 'epoch': 2.62} 69%|██████▉ | 6946/10000 [25:17:51<10:56:49, 12.90s/it] 69%|██████▉ | 6947/10000 [25:18:04<10:55:12, 12.88s/it] {'loss': 0.0048, 'learning_rate': 1.5325e-05, 'epoch': 2.62} 69%|██████▉ | 6947/10000 [25:18:04<10:55:12, 12.88s/it] 69%|██████▉ | 6948/10000 [25:18:17<10:55:32, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.5320000000000002e-05, 'epoch': 2.62} 69%|██████▉ | 6948/10000 [25:18:17<10:55:32, 12.89s/it] 69%|██████▉ | 6949/10000 [25:18:30<10:55:34, 12.89s/it] {'loss': 0.0039, 'learning_rate': 1.5315e-05, 'epoch': 2.62} 69%|██████▉ | 6949/10000 [25:18:30<10:55:34, 12.89s/it] 70%|██████▉ | 6950/10000 [25:18:42<10:54:25, 12.87s/it] {'loss': 0.0044, 'learning_rate': 1.531e-05, 'epoch': 2.62} 70%|██████▉ | 6950/10000 [25:18:42<10:54:25, 12.87s/it] 70%|██████▉ | 6951/10000 [25:18:55<10:53:34, 12.86s/it] {'loss': 0.0065, 'learning_rate': 1.5305e-05, 'epoch': 2.62} 70%|██████▉ | 6951/10000 [25:18:55<10:53:34, 12.86s/it] 70%|██████▉ | 6952/10000 [25:19:08<10:54:52, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.53e-05, 'epoch': 2.62} 70%|██████▉ | 6952/10000 [25:19:08<10:54:52, 12.89s/it] 70%|██████▉ | 6953/10000 [25:19:21<10:53:45, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.5295000000000002e-05, 'epoch': 2.62} 70%|██████▉ | 6953/10000 [25:19:21<10:53:45, 12.87s/it] 70%|██████▉ | 6954/10000 [25:19:34<10:53:33, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.529e-05, 'epoch': 2.62} 70%|██████▉ | 6954/10000 [25:19:34<10:53:33, 12.87s/it] 70%|██████▉ | 6955/10000 
[25:19:47<10:53:05, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.5285000000000004e-05, 'epoch': 2.62} 70%|██████▉ | 6955/10000 [25:19:47<10:53:05, 12.87s/it] 70%|██████▉ | 6956/10000 [25:20:00<10:53:38, 12.88s/it] {'loss': 0.0038, 'learning_rate': 1.528e-05, 'epoch': 2.62} 70%|██████▉ | 6956/10000 [25:20:00<10:53:38, 12.88s/it] 70%|██████▉ | 6957/10000 [25:20:13<10:53:58, 12.89s/it] {'loss': 0.0039, 'learning_rate': 1.5275000000000002e-05, 'epoch': 2.62} 70%|██████▉ | 6957/10000 [25:20:13<10:53:58, 12.89s/it] 70%|██████▉ | 6958/10000 [25:20:26<10:54:53, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.527e-05, 'epoch': 2.62} 70%|██████▉ | 6958/10000 [25:20:26<10:54:53, 12.92s/it] 70%|██████▉ | 6959/10000 [25:20:39<10:54:48, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.5265e-05, 'epoch': 2.62} 70%|██████▉ | 6959/10000 [25:20:39<10:54:48, 12.92s/it] 70%|██████▉ | 6960/10000 [25:20:51<10:55:14, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.5260000000000003e-05, 'epoch': 2.62} 70%|██████▉ | 6960/10000 [25:20:52<10:55:14, 12.93s/it] 70%|██████▉ | 6961/10000 [25:21:04<10:53:37, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.5255e-05, 'epoch': 2.62} 70%|██████▉ | 6961/10000 [25:21:04<10:53:37, 12.90s/it] 70%|██████▉ | 6962/10000 [25:21:17<10:51:57, 12.88s/it] {'loss': 0.0035, 'learning_rate': 1.525e-05, 'epoch': 2.62} 70%|██████▉ | 6962/10000 [25:21:17<10:51:57, 12.88s/it] 70%|██████▉ | 6963/10000 [25:21:30<10:51:06, 12.86s/it] {'loss': 0.006, 'learning_rate': 1.5245e-05, 'epoch': 2.62} 70%|██████▉ | 6963/10000 [25:21:30<10:51:06, 12.86s/it] 70%|██████▉ | 6964/10000 [25:21:43<10:51:48, 12.88s/it] {'loss': 0.0036, 'learning_rate': 1.5240000000000001e-05, 'epoch': 2.62} 70%|██████▉ | 6964/10000 [25:21:43<10:51:48, 12.88s/it] 70%|██████▉ | 6965/10000 [25:21:56<10:52:39, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.5235000000000002e-05, 'epoch': 2.62} 70%|██████▉ | 6965/10000 [25:21:56<10:52:39, 12.90s/it] 70%|██████▉ | 6966/10000 [25:22:09<10:52:50, 12.91s/it] {'loss': 
0.0036, 'learning_rate': 1.523e-05, 'epoch': 2.62} 70%|██████▉ | 6966/10000 [25:22:09<10:52:50, 12.91s/it] 70%|██████▉ | 6967/10000 [25:22:22<10:52:27, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.5225e-05, 'epoch': 2.63} 70%|██████▉ | 6967/10000 [25:22:22<10:52:27, 12.91s/it] 70%|██████▉ | 6968/10000 [25:22:35<10:51:38, 12.90s/it] {'loss': 0.0064, 'learning_rate': 1.5220000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6968/10000 [25:22:35<10:51:38, 12.90s/it] 70%|██████▉ | 6969/10000 [25:22:47<10:51:47, 12.90s/it] {'loss': 0.0054, 'learning_rate': 1.5215000000000001e-05, 'epoch': 2.63} 70%|██████▉ | 6969/10000 [25:22:47<10:51:47, 12.90s/it] 70%|██████▉ | 6970/10000 [25:23:00<10:52:06, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.5210000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6970/10000 [25:23:00<10:52:06, 12.91s/it] 70%|██████▉ | 6971/10000 [25:23:13<10:51:05, 12.90s/it] {'loss': 0.0063, 'learning_rate': 1.5205e-05, 'epoch': 2.63} 70%|██████▉ | 6971/10000 [25:23:13<10:51:05, 12.90s/it] 70%|██████▉ | 6972/10000 [25:23:26<10:52:40, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.52e-05, 'epoch': 2.63} 70%|██████▉ | 6972/10000 [25:23:26<10:52:40, 12.93s/it] 70%|██████▉ | 6973/10000 [25:23:39<10:52:37, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.5195000000000001e-05, 'epoch': 2.63} 70%|██████▉ | 6973/10000 [25:23:39<10:52:37, 12.94s/it] 70%|██████▉ | 6974/10000 [25:23:52<10:52:46, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.5190000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6974/10000 [25:23:52<10:52:46, 12.94s/it] 70%|██████▉ | 6975/10000 [25:24:05<10:51:28, 12.92s/it] {'loss': 0.0056, 'learning_rate': 1.5185000000000003e-05, 'epoch': 2.63} 70%|██████▉ | 6975/10000 [25:24:05<10:51:28, 12.92s/it] 70%|██████▉ | 6976/10000 [25:24:18<10:50:49, 12.91s/it] {'loss': 0.0059, 'learning_rate': 1.518e-05, 'epoch': 2.63} 70%|██████▉ | 6976/10000 [25:24:18<10:50:49, 12.91s/it] 70%|██████▉ | 6977/10000 [25:24:31<10:50:40, 12.91s/it] {'loss': 0.0043, 'learning_rate': 
1.5175e-05, 'epoch': 2.63} 70%|██████▉ | 6977/10000 [25:24:31<10:50:40, 12.91s/it] 70%|██████▉ | 6978/10000 [25:24:44<10:49:43, 12.90s/it] {'loss': 0.0034, 'learning_rate': 1.517e-05, 'epoch': 2.63} 70%|██████▉ | 6978/10000 [25:24:44<10:49:43, 12.90s/it] 70%|██████▉ | 6979/10000 [25:24:57<10:48:57, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.5165000000000001e-05, 'epoch': 2.63} 70%|██████▉ | 6979/10000 [25:24:57<10:48:57, 12.89s/it] 70%|██████▉ | 6980/10000 [25:25:09<10:48:52, 12.89s/it] {'loss': 0.003, 'learning_rate': 1.5160000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6980/10000 [25:25:10<10:48:52, 12.89s/it] 70%|██████▉ | 6981/10000 [25:25:22<10:48:13, 12.88s/it] {'loss': 0.0063, 'learning_rate': 1.5155e-05, 'epoch': 2.63} 70%|██████▉ | 6981/10000 [25:25:22<10:48:13, 12.88s/it] 70%|██████▉ | 6982/10000 [25:25:35<10:48:33, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.515e-05, 'epoch': 2.63} 70%|██████▉ | 6982/10000 [25:25:35<10:48:33, 12.89s/it] 70%|██████▉ | 6983/10000 [25:25:48<10:47:38, 12.88s/it] {'loss': 0.0043, 'learning_rate': 1.5145000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6983/10000 [25:25:48<10:47:38, 12.88s/it] 70%|██████▉ | 6984/10000 [25:26:01<10:47:07, 12.87s/it] {'loss': 0.0047, 'learning_rate': 1.514e-05, 'epoch': 2.63} 70%|██████▉ | 6984/10000 [25:26:01<10:47:07, 12.87s/it] 70%|██████▉ | 6985/10000 [25:26:14<10:47:20, 12.88s/it] {'loss': 0.0039, 'learning_rate': 1.5135000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6985/10000 [25:26:14<10:47:20, 12.88s/it] 70%|██████▉ | 6986/10000 [25:26:27<10:47:28, 12.89s/it] {'loss': 0.0051, 'learning_rate': 1.5129999999999999e-05, 'epoch': 2.63} 70%|██████▉ | 6986/10000 [25:26:27<10:47:28, 12.89s/it] 70%|██████▉ | 6987/10000 [25:26:40<10:47:44, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.5125e-05, 'epoch': 2.63} 70%|██████▉ | 6987/10000 [25:26:40<10:47:44, 12.90s/it] 70%|██████▉ | 6988/10000 [25:26:53<10:48:45, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.5120000000000001e-05, 'epoch': 2.63} 
70%|██████▉ | 6988/10000 [25:26:53<10:48:45, 12.92s/it] 70%|██████▉ | 6989/10000 [25:27:06<10:49:05, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.5115000000000002e-05, 'epoch': 2.63} 70%|██████▉ | 6989/10000 [25:27:06<10:49:05, 12.93s/it] 70%|██████▉ | 6990/10000 [25:27:19<10:49:37, 12.95s/it] {'loss': 0.0056, 'learning_rate': 1.5110000000000003e-05, 'epoch': 2.63} 70%|██████▉ | 6990/10000 [25:27:19<10:49:37, 12.95s/it] 70%|██████▉ | 6991/10000 [25:27:32<10:48:50, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.5105e-05, 'epoch': 2.63} 70%|██████▉ | 6991/10000 [25:27:32<10:48:50, 12.94s/it] 70%|██████▉ | 6992/10000 [25:27:44<10:48:34, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.51e-05, 'epoch': 2.63} 70%|██████▉ | 6992/10000 [25:27:45<10:48:34, 12.94s/it] 70%|██████▉ | 6993/10000 [25:27:57<10:47:35, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.5095e-05, 'epoch': 2.63} 70%|██████▉ | 6993/10000 [25:27:57<10:47:35, 12.92s/it] 70%|██████▉ | 6994/10000 [25:28:10<10:46:45, 12.91s/it] {'loss': 0.005, 'learning_rate': 1.5090000000000001e-05, 'epoch': 2.64} 70%|██████▉ | 6994/10000 [25:28:10<10:46:45, 12.91s/it] 70%|██████▉ | 6995/10000 [25:28:23<10:46:27, 12.91s/it] {'loss': 0.0035, 'learning_rate': 1.5085000000000002e-05, 'epoch': 2.64} 70%|██████▉ | 6995/10000 [25:28:23<10:46:27, 12.91s/it] 70%|██████▉ | 6996/10000 [25:28:36<10:45:20, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.508e-05, 'epoch': 2.64} 70%|██████▉ | 6996/10000 [25:28:36<10:45:20, 12.89s/it] 70%|██████▉ | 6997/10000 [25:28:49<10:46:36, 12.92s/it] {'loss': 0.0055, 'learning_rate': 1.5075e-05, 'epoch': 2.64} 70%|██████▉ | 6997/10000 [25:28:49<10:46:36, 12.92s/it] 70%|██████▉ | 6998/10000 [25:29:02<10:46:59, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.5070000000000001e-05, 'epoch': 2.64} 70%|██████▉ | 6998/10000 [25:29:02<10:46:59, 12.93s/it] 70%|██████▉ | 6999/10000 [25:29:15<10:45:58, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.5065e-05, 'epoch': 2.64} 70%|██████▉ | 6999/10000 
[25:29:15<10:45:58, 12.92s/it] 70%|███████ | 7000/10000 [25:29:28<10:45:59, 12.92s/it] {'loss': 0.0051, 'learning_rate': 1.5060000000000001e-05, 'epoch': 2.64} 70%|███████ | 7000/10000 [25:29:28<10:45:59, 12.92s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-06 21:54:26,403 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/config.json [INFO|configuration_utils.py:364] 2024-11-06 21:54:26,405 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-06 21:55:35,038 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-06 21:55:35,040 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-06 21:55:35,042 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/special_tokens_map.json [2024-11-06 21:55:35,053] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step7000 is about to be saved! [2024-11-06 21:55:35,101] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/global_step7000/mp_rank_00_model_states.pt [2024-11-06 21:55:35,101] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/global_step7000/mp_rank_00_model_states.pt... [2024-11-06 21:56:25,462] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/global_step7000/mp_rank_00_model_states.pt. [2024-11-06 21:56:25,558] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/global_step7000/zero_pp_rank_0_mp_rank_00_optim_states.pt... 
[2024-11-06 21:57:52,027] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/global_step7000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-06 21:57:52,053] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-7000/global_step7000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-06 21:57:52,053] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step7000 is ready now! 70%|███████ | 7001/10000 [25:33:06<62:08:34, 74.60s/it] {'loss': 0.0062, 'learning_rate': 1.5054999999999999e-05, 'epoch': 2.64} 70%|███████ | 7001/10000 [25:33:06<62:08:34, 74.60s/it] 70%|███████ | 7002/10000 [25:33:19<46:39:19, 56.02s/it] {'loss': 0.0039, 'learning_rate': 1.505e-05, 'epoch': 2.64} 70%|███████ | 7002/10000 [25:33:19<46:39:19, 56.02s/it] 70%|███████ | 7003/10000 [25:33:32<35:50:33, 43.05s/it] {'loss': 0.0047, 'learning_rate': 1.5045e-05, 'epoch': 2.64} 70%|███████ | 7003/10000 [25:33:32<35:50:33, 43.05s/it] 70%|███████ | 7004/10000 [25:33:44<28:15:55, 33.96s/it] {'loss': 0.0049, 'learning_rate': 1.5040000000000002e-05, 'epoch': 2.64} 70%|███████ | 7004/10000 [25:33:45<28:15:55, 33.96s/it] 70%|███████ | 7005/10000 [25:33:57<22:57:33, 27.60s/it] {'loss': 0.0046, 'learning_rate': 1.5035000000000003e-05, 'epoch': 2.64} 70%|███████ | 7005/10000 [25:33:57<22:57:33, 27.60s/it] 70%|███████ | 7006/10000 [25:34:10<19:15:04, 23.15s/it] {'loss': 0.0036, 'learning_rate': 1.503e-05, 'epoch': 2.64} 70%|███████ | 7006/10000 [25:34:10<19:15:04, 23.15s/it] 70%|███████ | 7007/10000 [25:34:23<16:39:35, 20.04s/it] {'loss': 0.0034, 'learning_rate': 1.5025000000000001e-05, 'epoch': 2.64} 70%|███████ | 7007/10000 [25:34:23<16:39:35, 20.04s/it] 70%|███████ | 7008/10000 [25:34:36<14:51:02, 17.87s/it] {'loss': 0.0052, 'learning_rate': 1.502e-05, 'epoch': 2.64} 70%|███████ | 7008/10000 [25:34:36<14:51:02, 17.87s/it] 70%|███████ | 7009/10000 
[25:34:48<13:34:28, 16.34s/it] {'loss': 0.004, 'learning_rate': 1.5015000000000001e-05, 'epoch': 2.64} 70%|███████ | 7009/10000 [25:34:48<13:34:28, 16.34s/it] 70%|███████ | 7010/10000 [25:35:01<12:41:52, 15.29s/it] {'loss': 0.0043, 'learning_rate': 1.5010000000000002e-05, 'epoch': 2.64} 70%|███████ | 7010/10000 [25:35:01<12:41:52, 15.29s/it] 70%|███████ | 7011/10000 [25:35:14<12:04:25, 14.54s/it] {'loss': 0.0034, 'learning_rate': 1.5005e-05, 'epoch': 2.64} 70%|███████ | 7011/10000 [25:35:14<12:04:25, 14.54s/it] 70%|███████ | 7012/10000 [25:35:27<11:38:32, 14.03s/it] {'loss': 0.0052, 'learning_rate': 1.5e-05, 'epoch': 2.64} 70%|███████ | 7012/10000 [25:35:27<11:38:32, 14.03s/it] 70%|███████ | 7013/10000 [25:35:40<11:20:39, 13.67s/it] {'loss': 0.0059, 'learning_rate': 1.4995000000000001e-05, 'epoch': 2.64} 70%|███████ | 7013/10000 [25:35:40<11:20:39, 13.67s/it] 70%|███████ | 7014/10000 [25:35:53<11:09:05, 13.44s/it] {'loss': 0.0053, 'learning_rate': 1.499e-05, 'epoch': 2.64} 70%|███████ | 7014/10000 [25:35:53<11:09:05, 13.44s/it] 70%|███████ | 7015/10000 [25:36:05<11:00:54, 13.28s/it] {'loss': 0.0044, 'learning_rate': 1.4985000000000001e-05, 'epoch': 2.64} 70%|███████ | 7015/10000 [25:36:06<11:00:54, 13.28s/it] 70%|███████ | 7016/10000 [25:36:18<10:55:16, 13.18s/it] {'loss': 0.0048, 'learning_rate': 1.4979999999999999e-05, 'epoch': 2.64} 70%|███████ | 7016/10000 [25:36:18<10:55:16, 13.18s/it] 70%|███████ | 7017/10000 [25:36:31<10:51:08, 13.10s/it] {'loss': 0.0039, 'learning_rate': 1.4975e-05, 'epoch': 2.64} 70%|███████ | 7017/10000 [25:36:31<10:51:08, 13.10s/it] 70%|███████ | 7018/10000 [25:36:44<10:46:31, 13.01s/it] {'loss': 0.0044, 'learning_rate': 1.497e-05, 'epoch': 2.64} 70%|███████ | 7018/10000 [25:36:44<10:46:31, 13.01s/it] 70%|███████ | 7019/10000 [25:36:57<10:45:24, 12.99s/it] {'loss': 0.0051, 'learning_rate': 1.4965000000000002e-05, 'epoch': 2.64} 70%|███████ | 7019/10000 [25:36:57<10:45:24, 12.99s/it] 70%|███████ | 7020/10000 [25:37:10<10:43:46, 12.96s/it] 
{'loss': 0.0045, 'learning_rate': 1.4960000000000002e-05, 'epoch': 2.65} 70%|███████ | 7020/10000 [25:37:10<10:43:46, 12.96s/it] 70%|███████ | 7021/10000 [25:37:23<10:43:18, 12.96s/it] {'loss': 0.0042, 'learning_rate': 1.4955e-05, 'epoch': 2.65} 70%|███████ | 7021/10000 [25:37:23<10:43:18, 12.96s/it] 70%|███████ | 7022/10000 [25:37:36<10:43:51, 12.97s/it] {'loss': 0.005, 'learning_rate': 1.4950000000000001e-05, 'epoch': 2.65} 70%|███████ | 7022/10000 [25:37:36<10:43:51, 12.97s/it] 70%|███████ | 7023/10000 [25:37:49<10:43:45, 12.97s/it] {'loss': 0.0047, 'learning_rate': 1.4945e-05, 'epoch': 2.65} 70%|███████ | 7023/10000 [25:37:49<10:43:45, 12.97s/it] 70%|███████ | 7024/10000 [25:38:02<10:42:57, 12.96s/it] {'loss': 0.0048, 'learning_rate': 1.4940000000000001e-05, 'epoch': 2.65} 70%|███████ | 7024/10000 [25:38:02<10:42:57, 12.96s/it] 70%|███████ | 7025/10000 [25:38:15<10:42:56, 12.97s/it] {'loss': 0.0044, 'learning_rate': 1.4935000000000002e-05, 'epoch': 2.65} 70%|███████ | 7025/10000 [25:38:15<10:42:56, 12.97s/it] 70%|███████ | 7026/10000 [25:38:28<10:43:53, 12.99s/it] {'loss': 0.0041, 'learning_rate': 1.493e-05, 'epoch': 2.65} 70%|███████ | 7026/10000 [25:38:28<10:43:53, 12.99s/it] 70%|███████ | 7027/10000 [25:38:41<10:42:56, 12.98s/it] {'loss': 0.0061, 'learning_rate': 1.4925e-05, 'epoch': 2.65} 70%|███████ | 7027/10000 [25:38:41<10:42:56, 12.98s/it] 70%|███████ | 7028/10000 [25:38:54<10:42:42, 12.98s/it] {'loss': 0.0033, 'learning_rate': 1.4920000000000001e-05, 'epoch': 2.65} 70%|███████ | 7028/10000 [25:38:54<10:42:42, 12.98s/it] 70%|███████ | 7029/10000 [25:39:07<10:42:08, 12.97s/it] {'loss': 0.0044, 'learning_rate': 1.4915000000000002e-05, 'epoch': 2.65} 70%|███████ | 7029/10000 [25:39:07<10:42:08, 12.97s/it][2024-11-06 22:04:16,789] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 70%|███████ | 7030/10000 [25:39:18<10:20:25, 12.53s/it] {'loss': 0.0048, 'learning_rate': 1.4915000000000002e-05, 'epoch': 2.65} 70%|███████ | 7030/10000 [25:39:18<10:20:25, 12.53s/it][2024-11-06 22:04:28,381] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 70%|███████ | 7031/10000 [25:39:30<10:06:14, 12.25s/it] {'loss': 0.0046, 'learning_rate': 1.4915000000000002e-05, 'epoch': 2.65} 70%|███████ | 7031/10000 [25:39:30<10:06:14, 12.25s/it] 70%|███████ | 7032/10000 [25:39:43<10:15:49, 12.45s/it] {'loss': 0.0038, 'learning_rate': 1.4910000000000001e-05, 'epoch': 2.65} 70%|███████ | 7032/10000 [25:39:43<10:15:49, 12.45s/it] 70%|███████ | 7033/10000 [25:39:56<10:22:39, 12.59s/it] {'loss': 0.0046, 'learning_rate': 1.4904999999999999e-05, 'epoch': 2.65} 70%|███████ | 7033/10000 [25:39:56<10:22:39, 12.59s/it] 70%|███████ | 7034/10000 [25:40:09<10:27:33, 12.69s/it] {'loss': 0.0055, 'learning_rate': 1.49e-05, 'epoch': 2.65} 70%|███████ | 7034/10000 [25:40:09<10:27:33, 12.69s/it] 70%|███████ | 7035/10000 [25:40:21<10:28:47, 12.72s/it] {'loss': 0.0048, 'learning_rate': 1.4895e-05, 'epoch': 2.65} 70%|███████ | 7035/10000 [25:40:21<10:28:47, 12.72s/it] 70%|███████ | 7036/10000 [25:40:34<10:30:38, 12.77s/it] {'loss': 0.0042, 'learning_rate': 1.4890000000000001e-05, 'epoch': 2.65} 70%|███████ | 7036/10000 [25:40:34<10:30:38, 12.77s/it] 70%|███████ | 7037/10000 [25:40:47<10:31:36, 12.79s/it] {'loss': 0.004, 'learning_rate': 1.4885000000000002e-05, 'epoch': 2.65} 70%|███████ | 7037/10000 [25:40:47<10:31:36, 12.79s/it] 70%|███████ | 7038/10000 [25:41:00<10:32:00, 12.80s/it] {'loss': 0.0054, 'learning_rate': 1.488e-05, 'epoch': 2.65} 70%|███████ | 7038/10000 [25:41:00<10:32:00, 12.80s/it] 70%|███████ | 7039/10000 [25:41:13<10:34:18, 12.85s/it] {'loss': 0.0047, 'learning_rate': 1.4875e-05, 'epoch': 2.65} 70%|███████ | 7039/10000 [25:41:13<10:34:18, 12.85s/it] 70%|███████ | 
7040/10000 [25:41:26<10:34:46, 12.87s/it] {'loss': 0.0042, 'learning_rate': 1.487e-05, 'epoch': 2.65} 70%|███████ | 7040/10000 [25:41:26<10:34:46, 12.87s/it] 70%|███████ | 7041/10000 [25:41:39<10:33:54, 12.85s/it] {'loss': 0.0042, 'learning_rate': 1.4865e-05, 'epoch': 2.65} 70%|███████ | 7041/10000 [25:41:39<10:33:54, 12.85s/it] 70%|███████ | 7042/10000 [25:41:51<10:33:03, 12.84s/it] {'loss': 0.0042, 'learning_rate': 1.4860000000000002e-05, 'epoch': 2.65} 70%|███████ | 7042/10000 [25:41:51<10:33:03, 12.84s/it] 70%|███████ | 7043/10000 [25:42:04<10:32:37, 12.84s/it] {'loss': 0.0038, 'learning_rate': 1.4855e-05, 'epoch': 2.65} 70%|███████ | 7043/10000 [25:42:04<10:32:37, 12.84s/it] 70%|███████ | 7044/10000 [25:42:17<10:34:28, 12.88s/it] {'loss': 0.0045, 'learning_rate': 1.485e-05, 'epoch': 2.65} 70%|███████ | 7044/10000 [25:42:17<10:34:28, 12.88s/it] 70%|███████ | 7045/10000 [25:42:30<10:35:31, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.4845000000000001e-05, 'epoch': 2.65} 70%|███████ | 7045/10000 [25:42:30<10:35:31, 12.90s/it] 70%|███████ | 7046/10000 [25:42:43<10:34:47, 12.89s/it] {'loss': 0.0048, 'learning_rate': 1.4840000000000002e-05, 'epoch': 2.65} 70%|███████ | 7046/10000 [25:42:43<10:34:47, 12.89s/it] 70%|███████ | 7047/10000 [25:42:56<10:35:10, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.4835000000000001e-05, 'epoch': 2.66} 70%|███████ | 7047/10000 [25:42:56<10:35:10, 12.91s/it] 70%|███████ | 7048/10000 [25:43:09<10:35:11, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.4829999999999999e-05, 'epoch': 2.66} 70%|███████ | 7048/10000 [25:43:09<10:35:11, 12.91s/it] 70%|███████ | 7049/10000 [25:43:22<10:35:20, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.4825e-05, 'epoch': 2.66} 70%|███████ | 7049/10000 [25:43:22<10:35:20, 12.92s/it] 70%|███████ | 7050/10000 [25:43:35<10:34:39, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.482e-05, 'epoch': 2.66} 70%|███████ | 7050/10000 [25:43:35<10:34:39, 12.91s/it] 71%|███████ | 7051/10000 [25:43:48<10:34:25, 
12.91s/it] {'loss': 0.0042, 'learning_rate': 1.4815000000000001e-05, 'epoch': 2.66} 71%|███████ | 7051/10000 [25:43:48<10:34:25, 12.91s/it] 71%|███████ | 7052/10000 [25:44:01<10:33:55, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.4810000000000002e-05, 'epoch': 2.66} 71%|███████ | 7052/10000 [25:44:01<10:33:55, 12.90s/it] 71%|███████ | 7053/10000 [25:44:13<10:33:18, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.4805e-05, 'epoch': 2.66} 71%|███████ | 7053/10000 [25:44:13<10:33:18, 12.89s/it] 71%|███████ | 7054/10000 [25:44:26<10:32:54, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.48e-05, 'epoch': 2.66} 71%|███████ | 7054/10000 [25:44:26<10:32:54, 12.89s/it] 71%|███████ | 7055/10000 [25:44:39<10:32:38, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.4795e-05, 'epoch': 2.66} 71%|███████ | 7055/10000 [25:44:39<10:32:38, 12.89s/it] 71%|███████ | 7056/10000 [25:44:52<10:31:50, 12.88s/it] {'loss': 0.005, 'learning_rate': 1.479e-05, 'epoch': 2.66} 71%|███████ | 7056/10000 [25:44:52<10:31:50, 12.88s/it] 71%|███████ | 7057/10000 [25:45:05<10:31:16, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.4785000000000002e-05, 'epoch': 2.66} 71%|███████ | 7057/10000 [25:45:05<10:31:16, 12.87s/it] 71%|███████ | 7058/10000 [25:45:18<10:30:23, 12.86s/it] {'loss': 0.0056, 'learning_rate': 1.4779999999999999e-05, 'epoch': 2.66} 71%|███████ | 7058/10000 [25:45:18<10:30:23, 12.86s/it] 71%|███████ | 7059/10000 [25:45:31<10:30:21, 12.86s/it] {'loss': 0.0044, 'learning_rate': 1.4775e-05, 'epoch': 2.66} 71%|███████ | 7059/10000 [25:45:31<10:30:21, 12.86s/it] 71%|███████ | 7060/10000 [25:45:43<10:30:30, 12.87s/it] {'loss': 0.0042, 'learning_rate': 1.4770000000000001e-05, 'epoch': 2.66} 71%|███████ | 7060/10000 [25:45:44<10:30:30, 12.87s/it] 71%|███████ | 7061/10000 [25:45:56<10:30:02, 12.86s/it] {'loss': 0.0041, 'learning_rate': 1.4765000000000002e-05, 'epoch': 2.66} 71%|███████ | 7061/10000 [25:45:56<10:30:02, 12.86s/it] 71%|███████ | 7062/10000 [25:46:09<10:30:44, 12.88s/it] {'loss': 0.0038, 
'learning_rate': 1.4760000000000001e-05, 'epoch': 2.66} 71%|███████ | 7062/10000 [25:46:09<10:30:44, 12.88s/it] 71%|███████ | 7063/10000 [25:46:22<10:30:52, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.4755e-05, 'epoch': 2.66} 71%|███████ | 7063/10000 [25:46:22<10:30:52, 12.89s/it] 71%|███████ | 7064/10000 [25:46:35<10:29:41, 12.87s/it] {'loss': 0.0045, 'learning_rate': 1.475e-05, 'epoch': 2.66} 71%|███████ | 7064/10000 [25:46:35<10:29:41, 12.87s/it] 71%|███████ | 7065/10000 [25:46:48<10:29:43, 12.87s/it] {'loss': 0.006, 'learning_rate': 1.4745e-05, 'epoch': 2.66} 71%|███████ | 7065/10000 [25:46:48<10:29:43, 12.87s/it] 71%|███████ | 7066/10000 [25:47:01<10:30:55, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.4740000000000001e-05, 'epoch': 2.66} 71%|███████ | 7066/10000 [25:47:01<10:30:55, 12.90s/it] 71%|███████ | 7067/10000 [25:47:14<10:30:48, 12.90s/it] {'loss': 0.0033, 'learning_rate': 1.4735000000000002e-05, 'epoch': 2.66} 71%|███████ | 7067/10000 [25:47:14<10:30:48, 12.90s/it] 71%|███████ | 7068/10000 [25:47:27<10:29:04, 12.87s/it] {'loss': 0.0055, 'learning_rate': 1.473e-05, 'epoch': 2.66} 71%|███████ | 7068/10000 [25:47:27<10:29:04, 12.87s/it] 71%|███████ | 7069/10000 [25:47:39<10:28:11, 12.86s/it] {'loss': 0.0056, 'learning_rate': 1.4725e-05, 'epoch': 2.66} 71%|███████ | 7069/10000 [25:47:39<10:28:11, 12.86s/it] 71%|███████ | 7070/10000 [25:47:52<10:27:35, 12.85s/it] {'loss': 0.0053, 'learning_rate': 1.472e-05, 'epoch': 2.66} 71%|███████ | 7070/10000 [25:47:52<10:27:35, 12.85s/it] 71%|███████ | 7071/10000 [25:48:05<10:28:10, 12.87s/it] {'loss': 0.0039, 'learning_rate': 1.4715e-05, 'epoch': 2.66} 71%|███████ | 7071/10000 [25:48:05<10:28:10, 12.87s/it] 71%|███████ | 7072/10000 [25:48:18<10:27:15, 12.85s/it] {'loss': 0.0064, 'learning_rate': 1.4710000000000001e-05, 'epoch': 2.66} 71%|███████ | 7072/10000 [25:48:18<10:27:15, 12.85s/it] 71%|███████ | 7073/10000 [25:48:31<10:27:50, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.4704999999999999e-05, 'epoch': 
2.67} 71%|███████ | 7073/10000 [25:48:31<10:27:50, 12.87s/it] 71%|███████ | 7074/10000 [25:48:44<10:27:21, 12.86s/it] {'loss': 0.0037, 'learning_rate': 1.47e-05, 'epoch': 2.67} 71%|███████ | 7074/10000 [25:48:44<10:27:21, 12.86s/it] 71%|███████ | 7075/10000 [25:48:57<10:27:38, 12.87s/it] {'loss': 0.004, 'learning_rate': 1.4695e-05, 'epoch': 2.67} 71%|███████ | 7075/10000 [25:48:57<10:27:38, 12.87s/it] 71%|███████ | 7076/10000 [25:49:09<10:27:17, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.4690000000000002e-05, 'epoch': 2.67} 71%|███████ | 7076/10000 [25:49:09<10:27:17, 12.87s/it] 71%|███████ | 7077/10000 [25:49:22<10:27:01, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.4685000000000001e-05, 'epoch': 2.67} 71%|███████ | 7077/10000 [25:49:22<10:27:01, 12.87s/it] 71%|███████ | 7078/10000 [25:49:35<10:27:23, 12.88s/it] {'loss': 0.0044, 'learning_rate': 1.4680000000000002e-05, 'epoch': 2.67} 71%|███████ | 7078/10000 [25:49:35<10:27:23, 12.88s/it] 71%|███████ | 7079/10000 [25:49:48<10:25:54, 12.86s/it] {'loss': 0.0039, 'learning_rate': 1.4675e-05, 'epoch': 2.67} 71%|███████ | 7079/10000 [25:49:48<10:25:54, 12.86s/it] 71%|███████ | 7080/10000 [25:50:01<10:25:14, 12.85s/it] {'loss': 0.0047, 'learning_rate': 1.467e-05, 'epoch': 2.67} 71%|███████ | 7080/10000 [25:50:01<10:25:14, 12.85s/it] 71%|███████ | 7081/10000 [25:50:14<10:24:54, 12.85s/it] {'loss': 0.0052, 'learning_rate': 1.4665000000000001e-05, 'epoch': 2.67} 71%|███████ | 7081/10000 [25:50:14<10:24:54, 12.85s/it] 71%|███████ | 7082/10000 [25:50:27<10:25:58, 12.87s/it] {'loss': 0.005, 'learning_rate': 1.4660000000000002e-05, 'epoch': 2.67} 71%|███████ | 7082/10000 [25:50:27<10:25:58, 12.87s/it] 71%|███████ | 7083/10000 [25:50:39<10:25:32, 12.87s/it] {'loss': 0.006, 'learning_rate': 1.4655000000000003e-05, 'epoch': 2.67} 71%|███████ | 7083/10000 [25:50:40<10:25:32, 12.87s/it] 71%|███████ | 7084/10000 [25:50:52<10:25:09, 12.86s/it] {'loss': 0.0053, 'learning_rate': 1.465e-05, 'epoch': 2.67} 71%|███████ | 7084/10000 
[25:50:52<10:25:09, 12.86s/it] 71%|███████ | 7085/10000 [25:51:05<10:24:17, 12.85s/it] {'loss': 0.0053, 'learning_rate': 1.4645e-05, 'epoch': 2.67} 71%|███████ | 7085/10000 [25:51:05<10:24:17, 12.85s/it] 71%|███████ | 7086/10000 [25:51:18<10:26:03, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.464e-05, 'epoch': 2.67} 71%|███████ | 7086/10000 [25:51:18<10:26:03, 12.89s/it] 71%|███████ | 7087/10000 [25:51:31<10:27:15, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.4635000000000001e-05, 'epoch': 2.67} 71%|███████ | 7087/10000 [25:51:31<10:27:15, 12.92s/it] 71%|███████ | 7088/10000 [25:51:44<10:27:41, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.4630000000000002e-05, 'epoch': 2.67} 71%|███████ | 7088/10000 [25:51:44<10:27:41, 12.93s/it] 71%|███████ | 7089/10000 [25:51:57<10:27:32, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.4625e-05, 'epoch': 2.67} 71%|███████ | 7089/10000 [25:51:57<10:27:32, 12.93s/it] 71%|███████ | 7090/10000 [25:52:10<10:27:19, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.462e-05, 'epoch': 2.67} 71%|███████ | 7090/10000 [25:52:10<10:27:19, 12.93s/it] 71%|███████ | 7091/10000 [25:52:23<10:26:26, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.4615000000000002e-05, 'epoch': 2.67} 71%|███████ | 7091/10000 [25:52:23<10:26:26, 12.92s/it] 71%|███████ | 7092/10000 [25:52:36<10:26:53, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.461e-05, 'epoch': 2.67} 71%|███████ | 7092/10000 [25:52:36<10:26:53, 12.93s/it] 71%|███████ | 7093/10000 [25:52:49<10:25:28, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.4605000000000002e-05, 'epoch': 2.67} 71%|███████ | 7093/10000 [25:52:49<10:25:28, 12.91s/it] 71%|███████ | 7094/10000 [25:53:02<10:27:03, 12.95s/it] {'loss': 0.0049, 'learning_rate': 1.4599999999999999e-05, 'epoch': 2.67} 71%|███████ | 7094/10000 [25:53:02<10:27:03, 12.95s/it] 71%|███████ | 7095/10000 [25:53:15<10:26:39, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.4595e-05, 'epoch': 2.67} 71%|███████ | 7095/10000 [25:53:15<10:26:39, 12.94s/it] 
71%|███████ | 7096/10000 [25:53:28<10:26:59, 12.95s/it] {'loss': 0.0055, 'learning_rate': 1.4590000000000001e-05, 'epoch': 2.67} 71%|███████ | 7096/10000 [25:53:28<10:26:59, 12.95s/it] 71%|███████ | 7097/10000 [25:53:41<10:26:35, 12.95s/it] {'loss': 0.0044, 'learning_rate': 1.4585000000000002e-05, 'epoch': 2.67} 71%|███████ | 7097/10000 [25:53:41<10:26:35, 12.95s/it] 71%|███████ | 7098/10000 [25:53:54<10:26:21, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.4580000000000003e-05, 'epoch': 2.67} 71%|███████ | 7098/10000 [25:53:54<10:26:21, 12.95s/it] 71%|███████ | 7099/10000 [25:54:06<10:26:20, 12.95s/it] {'loss': 0.0036, 'learning_rate': 1.4575e-05, 'epoch': 2.67} 71%|███████ | 7099/10000 [25:54:07<10:26:20, 12.95s/it] 71%|███████ | 7100/10000 [25:54:19<10:25:35, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.4570000000000001e-05, 'epoch': 2.68} 71%|███████ | 7100/10000 [25:54:19<10:25:35, 12.94s/it] 71%|███████ | 7101/10000 [25:54:32<10:25:11, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.4565e-05, 'epoch': 2.68} 71%|███████ | 7101/10000 [25:54:32<10:25:11, 12.94s/it] 71%|███████ | 7102/10000 [25:54:45<10:25:06, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.4560000000000001e-05, 'epoch': 2.68} 71%|███████ | 7102/10000 [25:54:45<10:25:06, 12.94s/it] 71%|███████ | 7103/10000 [25:54:58<10:23:28, 12.91s/it] {'loss': 0.005, 'learning_rate': 1.4555000000000002e-05, 'epoch': 2.68} 71%|███████ | 7103/10000 [25:54:58<10:23:28, 12.91s/it] 71%|███████ | 7104/10000 [25:55:11<10:22:57, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.455e-05, 'epoch': 2.68} 71%|███████ | 7104/10000 [25:55:11<10:22:57, 12.91s/it] 71%|███████ | 7105/10000 [25:55:24<10:21:56, 12.89s/it] {'loss': 0.0052, 'learning_rate': 1.4545e-05, 'epoch': 2.68} 71%|███████ | 7105/10000 [25:55:24<10:21:56, 12.89s/it] 71%|███████ | 7106/10000 [25:55:37<10:21:29, 12.89s/it] {'loss': 0.0073, 'learning_rate': 1.4540000000000001e-05, 'epoch': 2.68} 71%|███████ | 7106/10000 [25:55:37<10:21:29, 12.89s/it] 71%|███████ | 
7107/10000 [25:55:50<10:20:20, 12.87s/it] {'loss': 0.0061, 'learning_rate': 1.4535e-05, 'epoch': 2.68} 71%|███████ | 7107/10000 [25:55:50<10:20:20, 12.87s/it] 71%|███████ | 7108/10000 [25:56:02<10:20:32, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.4530000000000001e-05, 'epoch': 2.68} 71%|███████ | 7108/10000 [25:56:02<10:20:32, 12.87s/it] 71%|███████ | 7109/10000 [25:56:15<10:20:20, 12.87s/it] {'loss': 0.0049, 'learning_rate': 1.4524999999999999e-05, 'epoch': 2.68} 71%|███████ | 7109/10000 [25:56:15<10:20:20, 12.87s/it] 71%|███████ | 7110/10000 [25:56:28<10:19:17, 12.86s/it] {'loss': 0.0035, 'learning_rate': 1.452e-05, 'epoch': 2.68} 71%|███████ | 7110/10000 [25:56:28<10:19:17, 12.86s/it] 71%|███████ | 7111/10000 [25:56:41<10:18:32, 12.85s/it] {'loss': 0.0048, 'learning_rate': 1.4515e-05, 'epoch': 2.68} 71%|███████ | 7111/10000 [25:56:41<10:18:32, 12.85s/it] 71%|███████ | 7112/10000 [25:56:54<10:18:20, 12.85s/it] {'loss': 0.0037, 'learning_rate': 1.4510000000000002e-05, 'epoch': 2.68} 71%|███████ | 7112/10000 [25:56:54<10:18:20, 12.85s/it] 71%|███████ | 7113/10000 [25:57:07<10:18:54, 12.86s/it] {'loss': 0.004, 'learning_rate': 1.4505000000000003e-05, 'epoch': 2.68} 71%|███████ | 7113/10000 [25:57:07<10:18:54, 12.86s/it] 71%|███████ | 7114/10000 [25:57:20<10:18:01, 12.85s/it] {'loss': 0.0043, 'learning_rate': 1.45e-05, 'epoch': 2.68} 71%|███████ | 7114/10000 [25:57:20<10:18:01, 12.85s/it] 71%|███████ | 7115/10000 [25:57:32<10:18:22, 12.86s/it] {'loss': 0.0042, 'learning_rate': 1.4495000000000001e-05, 'epoch': 2.68} 71%|███████ | 7115/10000 [25:57:32<10:18:22, 12.86s/it] 71%|███████ | 7116/10000 [25:57:45<10:18:40, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.449e-05, 'epoch': 2.68} 71%|███████ | 7116/10000 [25:57:45<10:18:40, 12.87s/it] 71%|███████ | 7117/10000 [25:57:58<10:19:29, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.4485000000000001e-05, 'epoch': 2.68} 71%|███████ | 7117/10000 [25:57:58<10:19:29, 12.89s/it] 71%|███████ | 7118/10000 [25:58:11<10:19:17, 
12.89s/it] {'loss': 0.0036, 'learning_rate': 1.4480000000000002e-05, 'epoch': 2.68} 71%|███████ | 7118/10000 [25:58:11<10:19:17, 12.89s/it] 71%|███████ | 7119/10000 [25:58:24<10:19:46, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.4475e-05, 'epoch': 2.68} 71%|███████ | 7119/10000 [25:58:24<10:19:46, 12.91s/it] 71%|███████ | 7120/10000 [25:58:37<10:20:17, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.447e-05, 'epoch': 2.68} 71%|███████ | 7120/10000 [25:58:37<10:20:17, 12.92s/it] 71%|███████ | 7121/10000 [25:58:50<10:19:41, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.4465000000000001e-05, 'epoch': 2.68} 71%|███████ | 7121/10000 [25:58:50<10:19:41, 12.91s/it] 71%|███████ | 7122/10000 [25:59:03<10:19:52, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.4460000000000002e-05, 'epoch': 2.68} 71%|███████ | 7122/10000 [25:59:03<10:19:52, 12.92s/it] 71%|███████ | 7123/10000 [25:59:16<10:18:35, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.4455000000000001e-05, 'epoch': 2.68} 71%|███████ | 7123/10000 [25:59:16<10:18:35, 12.90s/it] 71%|███████ | 7124/10000 [25:59:29<10:18:56, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.4449999999999999e-05, 'epoch': 2.68} 71%|███████ | 7124/10000 [25:59:29<10:18:56, 12.91s/it] 71%|███████▏ | 7125/10000 [25:59:42<10:18:46, 12.91s/it] {'loss': 0.0054, 'learning_rate': 1.4445e-05, 'epoch': 2.68} 71%|███████▏ | 7125/10000 [25:59:42<10:18:46, 12.91s/it] 71%|███████▏ | 7126/10000 [25:59:55<10:18:43, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.444e-05, 'epoch': 2.69} 71%|███████▏ | 7126/10000 [25:59:55<10:18:43, 12.92s/it] 71%|███████▏ | 7127/10000 [26:00:07<10:17:46, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.4435000000000002e-05, 'epoch': 2.69} 71%|███████▏ | 7127/10000 [26:00:07<10:17:46, 12.90s/it] 71%|███████▏ | 7128/10000 [26:00:20<10:17:50, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.4430000000000002e-05, 'epoch': 2.69} 71%|███████▏ | 7128/10000 [26:00:20<10:17:50, 12.91s/it] 71%|███████▏ | 7129/10000 [26:00:33<10:17:15, 
12.90s/it] {'loss': 0.0039, 'learning_rate': 1.4425e-05, 'epoch': 2.69} 71%|███████▏ | 7129/10000 [26:00:33<10:17:15, 12.90s/it] 71%|███████▏ | 7130/10000 [26:00:46<10:15:49, 12.87s/it] {'loss': 0.0042, 'learning_rate': 1.4420000000000001e-05, 'epoch': 2.69} 71%|███████▏ | 7130/10000 [26:00:46<10:15:49, 12.87s/it] 71%|███████▏ | 7131/10000 [26:00:59<10:15:20, 12.87s/it] {'loss': 0.0048, 'learning_rate': 1.4415e-05, 'epoch': 2.69} 71%|███████▏ | 7131/10000 [26:00:59<10:15:20, 12.87s/it] 71%|███████▏ | 7132/10000 [26:01:12<10:14:29, 12.86s/it] {'loss': 0.0037, 'learning_rate': 1.4410000000000001e-05, 'epoch': 2.69} 71%|███████▏ | 7132/10000 [26:01:12<10:14:29, 12.86s/it] 71%|███████▏ | 7133/10000 [26:01:25<10:15:29, 12.88s/it] {'loss': 0.0037, 'learning_rate': 1.4405000000000002e-05, 'epoch': 2.69} 71%|███████▏ | 7133/10000 [26:01:25<10:15:29, 12.88s/it] 71%|███████▏ | 7134/10000 [26:01:38<10:15:46, 12.89s/it] {'loss': 0.0052, 'learning_rate': 1.44e-05, 'epoch': 2.69} 71%|███████▏ | 7134/10000 [26:01:38<10:15:46, 12.89s/it] 71%|███████▏ | 7135/10000 [26:01:50<10:15:49, 12.90s/it] {'loss': 0.0057, 'learning_rate': 1.4395e-05, 'epoch': 2.69} 71%|███████▏ | 7135/10000 [26:01:50<10:15:49, 12.90s/it] 71%|███████▏ | 7136/10000 [26:02:03<10:15:30, 12.89s/it] {'loss': 0.0048, 'learning_rate': 1.4390000000000001e-05, 'epoch': 2.69} 71%|███████▏ | 7136/10000 [26:02:03<10:15:30, 12.89s/it] 71%|███████▏ | 7137/10000 [26:02:16<10:15:45, 12.90s/it] {'loss': 0.0053, 'learning_rate': 1.4385000000000002e-05, 'epoch': 2.69} 71%|███████▏ | 7137/10000 [26:02:16<10:15:45, 12.90s/it] 71%|███████▏ | 7138/10000 [26:02:29<10:16:40, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.4380000000000001e-05, 'epoch': 2.69} 71%|███████▏ | 7138/10000 [26:02:29<10:16:40, 12.93s/it] 71%|███████▏ | 7139/10000 [26:02:42<10:15:25, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.4374999999999999e-05, 'epoch': 2.69} 71%|███████▏ | 7139/10000 [26:02:42<10:15:25, 12.91s/it] 71%|███████▏ | 7140/10000 
[26:02:55<10:15:17, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.437e-05, 'epoch': 2.69} 71%|███████▏ | 7140/10000 [26:02:55<10:15:17, 12.91s/it] 71%|███████▏ | 7141/10000 [26:03:08<10:14:27, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.4365e-05, 'epoch': 2.69} 71%|███████▏ | 7141/10000 [26:03:08<10:14:27, 12.90s/it] 71%|███████▏ | 7142/10000 [26:03:21<10:13:15, 12.87s/it] {'loss': 0.0046, 'learning_rate': 1.4360000000000001e-05, 'epoch': 2.69} 71%|███████▏ | 7142/10000 [26:03:21<10:13:15, 12.87s/it] 71%|███████▏ | 7143/10000 [26:03:34<10:13:35, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.4355000000000002e-05, 'epoch': 2.69} 71%|███████▏ | 7143/10000 [26:03:34<10:13:35, 12.89s/it] 71%|███████▏ | 7144/10000 [26:03:47<10:15:25, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.435e-05, 'epoch': 2.69} 71%|███████▏ | 7144/10000 [26:03:47<10:15:25, 12.93s/it] 71%|███████▏ | 7145/10000 [26:04:00<10:16:00, 12.95s/it] {'loss': 0.0047, 'learning_rate': 1.4345e-05, 'epoch': 2.69} 71%|███████▏ | 7145/10000 [26:04:00<10:16:00, 12.95s/it] 71%|███████▏ | 7146/10000 [26:04:13<10:16:38, 12.96s/it] {'loss': 0.0043, 'learning_rate': 1.434e-05, 'epoch': 2.69} 71%|███████▏ | 7146/10000 [26:04:13<10:16:38, 12.96s/it] 71%|███████▏ | 7147/10000 [26:04:26<10:15:42, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.4335e-05, 'epoch': 2.69} 71%|███████▏ | 7147/10000 [26:04:26<10:15:42, 12.95s/it] 71%|███████▏ | 7148/10000 [26:04:38<10:15:23, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.4330000000000002e-05, 'epoch': 2.69} 71%|███████▏ | 7148/10000 [26:04:39<10:15:23, 12.95s/it] 71%|███████▏ | 7149/10000 [26:04:51<10:13:55, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.4325e-05, 'epoch': 2.69} 71%|███████▏ | 7149/10000 [26:04:51<10:13:55, 12.92s/it] 72%|███████▏ | 7150/10000 [26:05:04<10:13:39, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.432e-05, 'epoch': 2.69} 72%|███████▏ | 7150/10000 [26:05:04<10:13:39, 12.92s/it] 72%|███████▏ | 7151/10000 [26:05:17<10:12:41, 12.90s/it] {'loss': 
0.0062, 'learning_rate': 1.4315000000000001e-05, 'epoch': 2.69} 72%|███████▏ | 7151/10000 [26:05:17<10:12:41, 12.90s/it] 72%|███████▏ | 7152/10000 [26:05:30<10:12:30, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.4310000000000002e-05, 'epoch': 2.69} 72%|███████▏ | 7152/10000 [26:05:30<10:12:30, 12.90s/it] 72%|███████▏ | 7153/10000 [26:05:43<10:11:45, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.4305000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7153/10000 [26:05:43<10:11:45, 12.89s/it] 72%|███████▏ | 7154/10000 [26:05:56<10:11:12, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.43e-05, 'epoch': 2.7} 72%|███████▏ | 7154/10000 [26:05:56<10:11:12, 12.89s/it] 72%|███████▏ | 7155/10000 [26:06:09<10:11:45, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.4295e-05, 'epoch': 2.7} 72%|███████▏ | 7155/10000 [26:06:09<10:11:45, 12.90s/it] 72%|███████▏ | 7156/10000 [26:06:22<10:12:11, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.429e-05, 'epoch': 2.7} 72%|███████▏ | 7156/10000 [26:06:22<10:12:11, 12.92s/it] 72%|███████▏ | 7157/10000 [26:06:35<10:11:53, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.4285000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7157/10000 [26:06:35<10:11:53, 12.91s/it] 72%|███████▏ | 7158/10000 [26:06:48<10:13:06, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.4280000000000002e-05, 'epoch': 2.7} 72%|███████▏ | 7158/10000 [26:06:48<10:13:06, 12.94s/it] 72%|███████▏ | 7159/10000 [26:07:00<10:12:16, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.4275e-05, 'epoch': 2.7} 72%|███████▏ | 7159/10000 [26:07:01<10:12:16, 12.93s/it] 72%|███████▏ | 7160/10000 [26:07:13<10:11:10, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.427e-05, 'epoch': 2.7} 72%|███████▏ | 7160/10000 [26:07:13<10:11:10, 12.91s/it] 72%|███████▏ | 7161/10000 [26:07:26<10:11:17, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.4265e-05, 'epoch': 2.7} 72%|███████▏ | 7161/10000 [26:07:26<10:11:17, 12.92s/it] 72%|███████▏ | 7162/10000 [26:07:39<10:11:25, 12.93s/it] {'loss': 0.004, 'learning_rate': 
1.426e-05, 'epoch': 2.7} 72%|███████▏ | 7162/10000 [26:07:39<10:11:25, 12.93s/it] 72%|███████▏ | 7163/10000 [26:07:52<10:11:41, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.4255000000000002e-05, 'epoch': 2.7} 72%|███████▏ | 7163/10000 [26:07:52<10:11:41, 12.94s/it] 72%|███████▏ | 7164/10000 [26:08:05<10:10:53, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.4249999999999999e-05, 'epoch': 2.7} 72%|███████▏ | 7164/10000 [26:08:05<10:10:53, 12.92s/it] 72%|███████▏ | 7165/10000 [26:08:18<10:10:02, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.4245e-05, 'epoch': 2.7} 72%|███████▏ | 7165/10000 [26:08:18<10:10:02, 12.91s/it] 72%|███████▏ | 7166/10000 [26:08:31<10:10:21, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.4240000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7166/10000 [26:08:31<10:10:21, 12.92s/it] 72%|███████▏ | 7167/10000 [26:08:44<10:09:27, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.4235000000000002e-05, 'epoch': 2.7} 72%|███████▏ | 7167/10000 [26:08:44<10:09:27, 12.91s/it] 72%|███████▏ | 7168/10000 [26:08:57<10:09:13, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.4230000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7168/10000 [26:08:57<10:09:13, 12.91s/it] 72%|███████▏ | 7169/10000 [26:09:10<10:08:35, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.4225e-05, 'epoch': 2.7} 72%|███████▏ | 7169/10000 [26:09:10<10:08:35, 12.90s/it] 72%|███████▏ | 7170/10000 [26:09:23<10:08:46, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.422e-05, 'epoch': 2.7} 72%|███████▏ | 7170/10000 [26:09:23<10:08:46, 12.91s/it] 72%|███████▏ | 7171/10000 [26:09:35<10:09:31, 12.93s/it] {'loss': 0.004, 'learning_rate': 1.4215e-05, 'epoch': 2.7} 72%|███████▏ | 7171/10000 [26:09:36<10:09:31, 12.93s/it] 72%|███████▏ | 7172/10000 [26:09:48<10:08:35, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.4210000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7172/10000 [26:09:48<10:08:35, 12.91s/it] 72%|███████▏ | 7173/10000 [26:10:01<10:08:54, 12.92s/it] {'loss': 0.0038, 'learning_rate': 
1.4205000000000002e-05, 'epoch': 2.7} 72%|███████▏ | 7173/10000 [26:10:01<10:08:54, 12.92s/it] 72%|███████▏ | 7174/10000 [26:10:14<10:08:51, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.42e-05, 'epoch': 2.7} 72%|███████▏ | 7174/10000 [26:10:14<10:08:51, 12.93s/it] 72%|███████▏ | 7175/10000 [26:10:27<10:08:15, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.4195e-05, 'epoch': 2.7} 72%|███████▏ | 7175/10000 [26:10:27<10:08:15, 12.92s/it] 72%|███████▏ | 7176/10000 [26:10:40<10:08:19, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.4190000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7176/10000 [26:10:40<10:08:19, 12.92s/it] 72%|███████▏ | 7177/10000 [26:10:53<10:09:13, 12.95s/it] {'loss': 0.0044, 'learning_rate': 1.4185e-05, 'epoch': 2.7} 72%|███████▏ | 7177/10000 [26:10:53<10:09:13, 12.95s/it] 72%|███████▏ | 7178/10000 [26:11:06<10:09:41, 12.96s/it] {'loss': 0.0046, 'learning_rate': 1.4180000000000001e-05, 'epoch': 2.7} 72%|███████▏ | 7178/10000 [26:11:06<10:09:41, 12.96s/it] 72%|███████▏ | 7179/10000 [26:11:19<10:09:41, 12.97s/it] {'loss': 0.0056, 'learning_rate': 1.4174999999999999e-05, 'epoch': 2.7} 72%|███████▏ | 7179/10000 [26:11:19<10:09:41, 12.97s/it] 72%|███████▏ | 7180/10000 [26:11:32<10:07:59, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.417e-05, 'epoch': 2.71} 72%|███████▏ | 7180/10000 [26:11:32<10:07:59, 12.94s/it] 72%|███████▏ | 7181/10000 [26:11:45<10:07:19, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.4165e-05, 'epoch': 2.71} 72%|███████▏ | 7181/10000 [26:11:45<10:07:19, 12.93s/it] 72%|███████▏ | 7182/10000 [26:11:58<10:07:45, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.4160000000000002e-05, 'epoch': 2.71} 72%|███████▏ | 7182/10000 [26:11:58<10:07:45, 12.94s/it] 72%|███████▏ | 7183/10000 [26:12:11<10:07:17, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.4155000000000001e-05, 'epoch': 2.71} 72%|███████▏ | 7183/10000 [26:12:11<10:07:17, 12.93s/it] 72%|███████▏ | 7184/10000 [26:12:24<10:06:48, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.415e-05, 
'epoch': 2.71} 72%|███████▏ | 7184/10000 [26:12:24<10:06:48, 12.93s/it] 72%|███████▏ | 7185/10000 [26:12:37<10:06:30, 12.93s/it] {'loss': 0.0056, 'learning_rate': 1.4145e-05, 'epoch': 2.71} 72%|███████▏ | 7185/10000 [26:12:37<10:06:30, 12.93s/it] 72%|███████▏ | 7186/10000 [26:12:49<10:05:21, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.414e-05, 'epoch': 2.71} 72%|███████▏ | 7186/10000 [26:12:49<10:05:21, 12.91s/it] 72%|███████▏ | 7187/10000 [26:13:02<10:04:34, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.4135000000000001e-05, 'epoch': 2.71} 72%|███████▏ | 7187/10000 [26:13:02<10:04:34, 12.90s/it] 72%|███████▏ | 7188/10000 [26:13:15<10:04:01, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.4130000000000002e-05, 'epoch': 2.71} 72%|███████▏ | 7188/10000 [26:13:15<10:04:01, 12.89s/it] 72%|███████▏ | 7189/10000 [26:13:28<10:03:28, 12.88s/it] {'loss': 0.005, 'learning_rate': 1.4125e-05, 'epoch': 2.71} 72%|███████▏ | 7189/10000 [26:13:28<10:03:28, 12.88s/it] 72%|███████▏ | 7190/10000 [26:13:41<10:03:20, 12.88s/it] {'loss': 0.005, 'learning_rate': 1.412e-05, 'epoch': 2.71} 72%|███████▏ | 7190/10000 [26:13:41<10:03:20, 12.88s/it] 72%|███████▏ | 7191/10000 [26:13:54<10:04:54, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.4115000000000001e-05, 'epoch': 2.71} 72%|███████▏ | 7191/10000 [26:13:54<10:04:54, 12.92s/it] 72%|███████▏ | 7192/10000 [26:14:07<10:05:26, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.411e-05, 'epoch': 2.71} 72%|███████▏ | 7192/10000 [26:14:07<10:05:26, 12.94s/it] 72%|███████▏ | 7193/10000 [26:14:20<10:03:53, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.4105000000000001e-05, 'epoch': 2.71} 72%|███████▏ | 7193/10000 [26:14:20<10:03:53, 12.91s/it] 72%|███████▏ | 7194/10000 [26:14:33<10:03:28, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.4099999999999999e-05, 'epoch': 2.71} 72%|███████▏ | 7194/10000 [26:14:33<10:03:28, 12.90s/it] 72%|███████▏ | 7195/10000 [26:14:46<10:04:05, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.4095e-05, 'epoch': 2.71} 
72%|███████▏ | 7195/10000 [26:14:46<10:04:05, 12.92s/it] 72%|███████▏ | 7196/10000 [26:14:58<10:02:53, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.409e-05, 'epoch': 2.71} 72%|███████▏ | 7196/10000 [26:14:58<10:02:53, 12.90s/it] 72%|███████▏ | 7197/10000 [26:15:11<10:03:21, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.4085000000000002e-05, 'epoch': 2.71} 72%|███████▏ | 7197/10000 [26:15:11<10:03:21, 12.92s/it] 72%|███████▏ | 7198/10000 [26:15:24<10:03:51, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.408e-05, 'epoch': 2.71} 72%|███████▏ | 7198/10000 [26:15:24<10:03:51, 12.93s/it] 72%|███████▏ | 7199/10000 [26:15:37<10:04:28, 12.95s/it] {'loss': 0.0057, 'learning_rate': 1.4075e-05, 'epoch': 2.71} 72%|███████▏ | 7199/10000 [26:15:37<10:04:28, 12.95s/it] 72%|███████▏ | 7200/10000 [26:15:50<10:04:43, 12.96s/it] {'loss': 0.0038, 'learning_rate': 1.4069999999999999e-05, 'epoch': 2.71} 72%|███████▏ | 7200/10000 [26:15:50<10:04:43, 12.96s/it] 72%|███████▏ | 7201/10000 [26:16:03<10:04:15, 12.95s/it] {'loss': 0.0059, 'learning_rate': 1.4065e-05, 'epoch': 2.71} 72%|███████▏ | 7201/10000 [26:16:03<10:04:15, 12.95s/it] 72%|███████▏ | 7202/10000 [26:16:16<10:04:25, 12.96s/it] {'loss': 0.0051, 'learning_rate': 1.4060000000000001e-05, 'epoch': 2.71} 72%|███████▏ | 7202/10000 [26:16:16<10:04:25, 12.96s/it] 72%|███████▏ | 7203/10000 [26:16:29<10:04:05, 12.96s/it] {'loss': 0.0037, 'learning_rate': 1.4055000000000002e-05, 'epoch': 2.71} 72%|███████▏ | 7203/10000 [26:16:29<10:04:05, 12.96s/it] 72%|███████▏ | 7204/10000 [26:16:42<10:02:29, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.4050000000000003e-05, 'epoch': 2.71} 72%|███████▏ | 7204/10000 [26:16:42<10:02:29, 12.93s/it] 72%|███████▏ | 7205/10000 [26:16:55<10:01:55, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.4045e-05, 'epoch': 2.71} 72%|███████▏ | 7205/10000 [26:16:55<10:01:55, 12.92s/it] 72%|███████▏ | 7206/10000 [26:17:08<10:00:49, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.4040000000000001e-05, 'epoch': 2.72} 
72%|███████▏ | 7206/10000 [26:17:08<10:00:49, 12.90s/it] 72%|███████▏ | 7207/10000 [26:17:21<10:01:14, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.4035e-05, 'epoch': 2.72} 72%|███████▏ | 7207/10000 [26:17:21<10:01:14, 12.92s/it] 72%|███████▏ | 7208/10000 [26:17:34<10:00:36, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.4030000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7208/10000 [26:17:34<10:00:36, 12.91s/it] 72%|███████▏ | 7209/10000 [26:17:47<10:00:40, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.4025000000000002e-05, 'epoch': 2.72} 72%|███████▏ | 7209/10000 [26:17:47<10:00:40, 12.91s/it] 72%|███████▏ | 7210/10000 [26:18:00<10:00:29, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.402e-05, 'epoch': 2.72} 72%|███████▏ | 7210/10000 [26:18:00<10:00:29, 12.91s/it] 72%|███████▏ | 7211/10000 [26:18:12<10:00:15, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.4015e-05, 'epoch': 2.72} 72%|███████▏ | 7211/10000 [26:18:12<10:00:15, 12.91s/it] 72%|███████▏ | 7212/10000 [26:18:25<9:59:53, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.4010000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7212/10000 [26:18:25<9:59:53, 12.91s/it] 72%|███████▏ | 7213/10000 [26:18:38<9:59:30, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.4005000000000002e-05, 'epoch': 2.72} 72%|███████▏ | 7213/10000 [26:18:38<9:59:30, 12.91s/it] 72%|███████▏ | 7214/10000 [26:18:51<10:00:09, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.4000000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7214/10000 [26:18:51<10:00:09, 12.92s/it] 72%|███████▏ | 7215/10000 [26:19:04<10:00:43, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.3994999999999999e-05, 'epoch': 2.72} 72%|███████▏ | 7215/10000 [26:19:04<10:00:43, 12.94s/it] 72%|███████▏ | 7216/10000 [26:19:17<10:01:20, 12.96s/it] {'loss': 0.005, 'learning_rate': 1.399e-05, 'epoch': 2.72} 72%|███████▏ | 7216/10000 [26:19:17<10:01:20, 12.96s/it] 72%|███████▏ | 7217/10000 [26:19:30<10:01:41, 12.97s/it] {'loss': 0.0039, 'learning_rate': 1.3985e-05, 'epoch': 2.72} 
72%|███████▏ | 7217/10000 [26:19:30<10:01:41, 12.97s/it] 72%|███████▏ | 7218/10000 [26:19:43<10:01:19, 12.97s/it] {'loss': 0.0041, 'learning_rate': 1.3980000000000002e-05, 'epoch': 2.72} 72%|███████▏ | 7218/10000 [26:19:43<10:01:19, 12.97s/it] 72%|███████▏ | 7219/10000 [26:19:56<10:01:18, 12.97s/it] {'loss': 0.0043, 'learning_rate': 1.3975000000000003e-05, 'epoch': 2.72} 72%|███████▏ | 7219/10000 [26:19:56<10:01:18, 12.97s/it] 72%|███████▏ | 7220/10000 [26:20:09<9:59:27, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.397e-05, 'epoch': 2.72} 72%|███████▏ | 7220/10000 [26:20:09<9:59:27, 12.94s/it] 72%|███████▏ | 7221/10000 [26:20:22<9:59:29, 12.94s/it] {'loss': 0.0061, 'learning_rate': 1.3965000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7221/10000 [26:20:22<9:59:29, 12.94s/it] 72%|███████▏ | 7222/10000 [26:20:35<9:59:03, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.396e-05, 'epoch': 2.72} 72%|███████▏ | 7222/10000 [26:20:35<9:59:03, 12.94s/it] 72%|███████▏ | 7223/10000 [26:20:48<9:59:03, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.3955000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7223/10000 [26:20:48<9:59:03, 12.94s/it] 72%|███████▏ | 7224/10000 [26:21:01<9:58:19, 12.93s/it] {'loss': 0.0033, 'learning_rate': 1.3950000000000002e-05, 'epoch': 2.72} 72%|███████▏ | 7224/10000 [26:21:01<9:58:19, 12.93s/it] 72%|███████▏ | 7225/10000 [26:21:14<9:57:53, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.3945e-05, 'epoch': 2.72} 72%|███████▏ | 7225/10000 [26:21:14<9:57:53, 12.93s/it] 72%|███████▏ | 7226/10000 [26:21:26<9:56:48, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.394e-05, 'epoch': 2.72} 72%|███████▏ | 7226/10000 [26:21:27<9:56:48, 12.91s/it] 72%|███████▏ | 7227/10000 [26:21:39<9:56:29, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.3935000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7227/10000 [26:21:39<9:56:29, 12.91s/it] 72%|███████▏ | 7228/10000 [26:21:52<9:54:36, 12.87s/it] {'loss': 0.0058, 'learning_rate': 1.3930000000000002e-05, 'epoch': 2.72} 
72%|███████▏ | 7228/10000 [26:21:52<9:54:36, 12.87s/it] 72%|███████▏ | 7229/10000 [26:22:05<9:55:02, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.3925000000000001e-05, 'epoch': 2.72} 72%|███████▏ | 7229/10000 [26:22:05<9:55:02, 12.88s/it] 72%|███████▏ | 7230/10000 [26:22:18<9:56:14, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.3919999999999999e-05, 'epoch': 2.72} 72%|███████▏ | 7230/10000 [26:22:18<9:56:14, 12.92s/it] 72%|███████▏ | 7231/10000 [26:22:31<9:56:50, 12.93s/it] {'loss': 0.0059, 'learning_rate': 1.3915e-05, 'epoch': 2.72} 72%|███████▏ | 7231/10000 [26:22:31<9:56:50, 12.93s/it] 72%|███████▏ | 7232/10000 [26:22:44<9:56:30, 12.93s/it] {'loss': 0.0056, 'learning_rate': 1.391e-05, 'epoch': 2.72} 72%|███████▏ | 7232/10000 [26:22:44<9:56:30, 12.93s/it] 72%|███████▏ | 7233/10000 [26:22:57<9:55:19, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.3905000000000002e-05, 'epoch': 2.73} 72%|███████▏ | 7233/10000 [26:22:57<9:55:19, 12.91s/it] 72%|███████▏ | 7234/10000 [26:23:10<9:55:34, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.3900000000000002e-05, 'epoch': 2.73} 72%|███████▏ | 7234/10000 [26:23:10<9:55:34, 12.92s/it] 72%|███████▏ | 7235/10000 [26:23:23<9:54:56, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.3895e-05, 'epoch': 2.73} 72%|███████▏ | 7235/10000 [26:23:23<9:54:56, 12.91s/it] 72%|███████▏ | 7236/10000 [26:23:36<9:55:23, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.389e-05, 'epoch': 2.73} 72%|███████▏ | 7236/10000 [26:23:36<9:55:23, 12.92s/it] 72%|███████▏ | 7237/10000 [26:23:49<9:57:09, 12.97s/it] {'loss': 0.0044, 'learning_rate': 1.3885e-05, 'epoch': 2.73} 72%|███████▏ | 7237/10000 [26:23:49<9:57:09, 12.97s/it] 72%|███████▏ | 7238/10000 [26:24:02<9:56:07, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.3880000000000001e-05, 'epoch': 2.73} 72%|███████▏ | 7238/10000 [26:24:02<9:56:07, 12.95s/it] 72%|███████▏ | 7239/10000 [26:24:15<9:57:02, 12.97s/it] {'loss': 0.0037, 'learning_rate': 1.3875000000000002e-05, 'epoch': 2.73} 72%|███████▏ | 7239/10000 
[26:24:15<9:57:02, 12.97s/it] 72%|███████▏ | 7240/10000 [26:24:28<9:56:04, 12.96s/it] {'loss': 0.0042, 'learning_rate': 1.387e-05, 'epoch': 2.73} 72%|███████▏ | 7240/10000 [26:24:28<9:56:04, 12.96s/it] 72%|███████▏ | 7241/10000 [26:24:41<9:55:39, 12.95s/it] {'loss': 0.0049, 'learning_rate': 1.3865e-05, 'epoch': 2.73} 72%|███████▏ | 7241/10000 [26:24:41<9:55:39, 12.95s/it] 72%|███████▏ | 7242/10000 [26:24:53<9:54:33, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.3860000000000001e-05, 'epoch': 2.73} 72%|███████▏ | 7242/10000 [26:24:53<9:54:33, 12.93s/it] 72%|███████▏ | 7243/10000 [26:25:06<9:55:51, 12.97s/it] {'loss': 0.0038, 'learning_rate': 1.3855000000000002e-05, 'epoch': 2.73} 72%|███████▏ | 7243/10000 [26:25:06<9:55:51, 12.97s/it] 72%|███████▏ | 7244/10000 [26:25:19<9:56:32, 12.99s/it] {'loss': 0.0055, 'learning_rate': 1.3850000000000001e-05, 'epoch': 2.73} 72%|███████▏ | 7244/10000 [26:25:20<9:56:32, 12.99s/it] 72%|███████▏ | 7245/10000 [26:25:33<9:57:16, 13.01s/it] {'loss': 0.0042, 'learning_rate': 1.3845e-05, 'epoch': 2.73} 72%|███████▏ | 7245/10000 [26:25:33<9:57:16, 13.01s/it] 72%|███████▏ | 7246/10000 [26:25:45<9:55:22, 12.97s/it] {'loss': 0.0044, 'learning_rate': 1.384e-05, 'epoch': 2.73} 72%|███████▏ | 7246/10000 [26:25:45<9:55:22, 12.97s/it] 72%|███████▏ | 7247/10000 [26:25:58<9:55:06, 12.97s/it] {'loss': 0.0035, 'learning_rate': 1.3835e-05, 'epoch': 2.73} 72%|███████▏ | 7247/10000 [26:25:58<9:55:06, 12.97s/it] 72%|███████▏ | 7248/10000 [26:26:11<9:53:42, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.3830000000000001e-05, 'epoch': 2.73} 72%|███████▏ | 7248/10000 [26:26:11<9:53:42, 12.94s/it] 72%|███████▏ | 7249/10000 [26:26:24<9:53:20, 12.94s/it] {'loss': 0.0036, 'learning_rate': 1.3825000000000002e-05, 'epoch': 2.73} 72%|███████▏ | 7249/10000 [26:26:24<9:53:20, 12.94s/it] 72%|███████▎ | 7250/10000 [26:26:37<9:52:39, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.382e-05, 'epoch': 2.73} 72%|███████▎ | 7250/10000 [26:26:37<9:52:39, 12.93s/it] 
73%|███████▎ | 7251/10000 [26:26:50<9:52:25, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.3815e-05, 'epoch': 2.73} 73%|███████▎ | 7251/10000 [26:26:50<9:52:25, 12.93s/it] 73%|███████▎ | 7252/10000 [26:27:03<9:52:45, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.381e-05, 'epoch': 2.73} 73%|███████▎ | 7252/10000 [26:27:03<9:52:45, 12.94s/it] 73%|███████▎ | 7253/10000 [26:27:16<9:52:16, 12.94s/it] {'loss': 0.0054, 'learning_rate': 1.3805e-05, 'epoch': 2.73} 73%|███████▎ | 7253/10000 [26:27:16<9:52:16, 12.94s/it] 73%|███████▎ | 7254/10000 [26:27:29<9:51:11, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.3800000000000002e-05, 'epoch': 2.73} 73%|███████▎ | 7254/10000 [26:27:29<9:51:11, 12.92s/it] 73%|███████▎ | 7255/10000 [26:27:42<9:51:17, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.3795e-05, 'epoch': 2.73} 73%|███████▎ | 7255/10000 [26:27:42<9:51:17, 12.92s/it] 73%|███████▎ | 7256/10000 [26:27:55<9:51:33, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.379e-05, 'epoch': 2.73} 73%|███████▎ | 7256/10000 [26:27:55<9:51:33, 12.94s/it] 73%|███████▎ | 7257/10000 [26:28:08<9:50:50, 12.92s/it] {'loss': 0.0063, 'learning_rate': 1.3785000000000001e-05, 'epoch': 2.73} 73%|███████▎ | 7257/10000 [26:28:08<9:50:50, 12.92s/it] 73%|███████▎ | 7258/10000 [26:28:20<9:49:25, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.3780000000000002e-05, 'epoch': 2.73} 73%|███████▎ | 7258/10000 [26:28:20<9:49:25, 12.90s/it] 73%|███████▎ | 7259/10000 [26:28:33<9:49:32, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.3775000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7259/10000 [26:28:33<9:49:32, 12.90s/it] 73%|███████▎ | 7260/10000 [26:28:46<9:49:51, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.377e-05, 'epoch': 2.74} 73%|███████▎ | 7260/10000 [26:28:46<9:49:51, 12.92s/it] 73%|███████▎ | 7261/10000 [26:28:59<9:49:18, 12.91s/it] {'loss': 0.0029, 'learning_rate': 1.3765e-05, 'epoch': 2.74} 73%|███████▎ | 7261/10000 [26:28:59<9:49:18, 12.91s/it] 73%|███████▎ | 7262/10000 [26:29:12<9:50:02, 
12.93s/it] {'loss': 0.0055, 'learning_rate': 1.376e-05, 'epoch': 2.74} 73%|███████▎ | 7262/10000 [26:29:12<9:50:02, 12.93s/it] 73%|███████▎ | 7263/10000 [26:29:25<9:49:34, 12.92s/it] {'loss': 0.0056, 'learning_rate': 1.3755000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7263/10000 [26:29:25<9:49:34, 12.92s/it] 73%|███████▎ | 7264/10000 [26:29:38<9:49:12, 12.92s/it] {'loss': 0.0051, 'learning_rate': 1.3750000000000002e-05, 'epoch': 2.74} 73%|███████▎ | 7264/10000 [26:29:38<9:49:12, 12.92s/it] 73%|███████▎ | 7265/10000 [26:29:51<9:48:48, 12.92s/it] {'loss': 0.0031, 'learning_rate': 1.3745e-05, 'epoch': 2.74} 73%|███████▎ | 7265/10000 [26:29:51<9:48:48, 12.92s/it] 73%|███████▎ | 7266/10000 [26:30:04<9:47:33, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.374e-05, 'epoch': 2.74} 73%|███████▎ | 7266/10000 [26:30:04<9:47:33, 12.89s/it] 73%|███████▎ | 7267/10000 [26:30:17<9:46:23, 12.87s/it] {'loss': 0.0059, 'learning_rate': 1.3735000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7267/10000 [26:30:17<9:46:23, 12.87s/it] 73%|███████▎ | 7268/10000 [26:30:30<9:47:13, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.373e-05, 'epoch': 2.74} 73%|███████▎ | 7268/10000 [26:30:30<9:47:13, 12.90s/it] 73%|███████▎ | 7269/10000 [26:30:42<9:47:52, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.3725000000000002e-05, 'epoch': 2.74} 73%|███████▎ | 7269/10000 [26:30:43<9:47:52, 12.92s/it] 73%|███████▎ | 7270/10000 [26:30:55<9:48:51, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.3719999999999999e-05, 'epoch': 2.74} 73%|███████▎ | 7270/10000 [26:30:56<9:48:51, 12.94s/it] 73%|███████▎ | 7271/10000 [26:31:08<9:48:39, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.3715e-05, 'epoch': 2.74} 73%|███████▎ | 7271/10000 [26:31:08<9:48:39, 12.94s/it] 73%|███████▎ | 7272/10000 [26:31:21<9:47:45, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.3710000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7272/10000 [26:31:21<9:47:45, 12.93s/it] 73%|███████▎ | 7273/10000 [26:31:34<9:47:43, 12.93s/it] {'loss': 0.0049, 
'learning_rate': 1.3705000000000002e-05, 'epoch': 2.74} 73%|███████▎ | 7273/10000 [26:31:34<9:47:43, 12.93s/it] 73%|███████▎ | 7274/10000 [26:31:47<9:47:21, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.3700000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7274/10000 [26:31:47<9:47:21, 12.93s/it] 73%|███████▎ | 7275/10000 [26:32:00<9:46:01, 12.90s/it] {'loss': 0.0058, 'learning_rate': 1.3695e-05, 'epoch': 2.74} 73%|███████▎ | 7275/10000 [26:32:00<9:46:01, 12.90s/it] 73%|███████▎ | 7276/10000 [26:32:13<9:47:02, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.369e-05, 'epoch': 2.74} 73%|███████▎ | 7276/10000 [26:32:13<9:47:02, 12.93s/it] 73%|███████▎ | 7277/10000 [26:32:26<9:46:20, 12.92s/it] {'loss': 0.0059, 'learning_rate': 1.3685e-05, 'epoch': 2.74} 73%|███████▎ | 7277/10000 [26:32:26<9:46:20, 12.92s/it] 73%|███████▎ | 7278/10000 [26:32:39<9:46:25, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.3680000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7278/10000 [26:32:39<9:46:25, 12.93s/it] 73%|███████▎ | 7279/10000 [26:32:52<9:46:22, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.3675000000000002e-05, 'epoch': 2.74} 73%|███████▎ | 7279/10000 [26:32:52<9:46:22, 12.93s/it] 73%|███████▎ | 7280/10000 [26:33:05<9:45:58, 12.93s/it] {'loss': 0.0039, 'learning_rate': 1.367e-05, 'epoch': 2.74} 73%|███████▎ | 7280/10000 [26:33:05<9:45:58, 12.93s/it] 73%|███████▎ | 7281/10000 [26:33:18<9:45:11, 12.91s/it] {'loss': 0.0035, 'learning_rate': 1.3665e-05, 'epoch': 2.74} 73%|███████▎ | 7281/10000 [26:33:18<9:45:11, 12.91s/it] 73%|███████▎ | 7282/10000 [26:33:30<9:44:21, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.3660000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7282/10000 [26:33:30<9:44:21, 12.90s/it] 73%|███████▎ | 7283/10000 [26:33:43<9:44:18, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.3655e-05, 'epoch': 2.74} 73%|███████▎ | 7283/10000 [26:33:43<9:44:18, 12.90s/it] 73%|███████▎ | 7284/10000 [26:33:56<9:44:26, 12.91s/it] {'loss': 0.0039, 'learning_rate': 
1.3650000000000001e-05, 'epoch': 2.74} 73%|███████▎ | 7284/10000 [26:33:56<9:44:26, 12.91s/it] 73%|███████▎ | 7285/10000 [26:34:09<9:44:10, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.3644999999999999e-05, 'epoch': 2.74} 73%|███████▎ | 7285/10000 [26:34:09<9:44:10, 12.91s/it] 73%|███████▎ | 7286/10000 [26:34:22<9:44:13, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.364e-05, 'epoch': 2.75} 73%|███████▎ | 7286/10000 [26:34:22<9:44:13, 12.92s/it] 73%|███████▎ | 7287/10000 [26:34:35<9:44:33, 12.93s/it] {'loss': 0.0051, 'learning_rate': 1.3635e-05, 'epoch': 2.75} 73%|███████▎ | 7287/10000 [26:34:35<9:44:33, 12.93s/it] 73%|███████▎ | 7288/10000 [26:34:48<9:46:46, 12.98s/it] {'loss': 0.0035, 'learning_rate': 1.3630000000000002e-05, 'epoch': 2.75} 73%|███████▎ | 7288/10000 [26:34:48<9:46:46, 12.98s/it] 73%|███████▎ | 7289/10000 [26:35:01<9:47:25, 13.00s/it] {'loss': 0.0043, 'learning_rate': 1.3625e-05, 'epoch': 2.75} 73%|███████▎ | 7289/10000 [26:35:01<9:47:25, 13.00s/it] 73%|███████▎ | 7290/10000 [26:35:14<9:47:20, 13.00s/it] {'loss': 0.0044, 'learning_rate': 1.362e-05, 'epoch': 2.75} 73%|███████▎ | 7290/10000 [26:35:14<9:47:20, 13.00s/it] 73%|███████▎ | 7291/10000 [26:35:27<9:47:00, 13.00s/it] {'loss': 0.0032, 'learning_rate': 1.3615e-05, 'epoch': 2.75} 73%|███████▎ | 7291/10000 [26:35:27<9:47:00, 13.00s/it] 73%|███████▎ | 7292/10000 [26:35:40<9:46:20, 12.99s/it] {'loss': 0.0054, 'learning_rate': 1.361e-05, 'epoch': 2.75} 73%|███████▎ | 7292/10000 [26:35:40<9:46:20, 12.99s/it] 73%|███████▎ | 7293/10000 [26:35:53<9:46:02, 12.99s/it] {'loss': 0.0053, 'learning_rate': 1.3605000000000001e-05, 'epoch': 2.75} 73%|███████▎ | 7293/10000 [26:35:53<9:46:02, 12.99s/it] 73%|███████▎ | 7294/10000 [26:36:06<9:46:18, 13.00s/it] {'loss': 0.0045, 'learning_rate': 1.3600000000000002e-05, 'epoch': 2.75} 73%|███████▎ | 7294/10000 [26:36:06<9:46:18, 13.00s/it] 73%|███████▎ | 7295/10000 [26:36:19<9:45:49, 12.99s/it] {'loss': 0.0058, 'learning_rate': 1.3595e-05, 'epoch': 2.75} 73%|███████▎ 
| 7295/10000 [26:36:19<9:45:49, 12.99s/it] 73%|███████▎ | 7296/10000 [26:36:32<9:45:12, 12.99s/it] {'loss': 0.0039, 'learning_rate': 1.359e-05, 'epoch': 2.75} 73%|███████▎ | 7296/10000 [26:36:32<9:45:12, 12.99s/it] 73%|███████▎ | 7297/10000 [26:36:45<9:45:02, 12.99s/it] {'loss': 0.0043, 'learning_rate': 1.3585000000000001e-05, 'epoch': 2.75} 73%|███████▎ | 7297/10000 [26:36:45<9:45:02, 12.99s/it] 73%|███████▎ | 7298/10000 [26:36:58<9:43:35, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.358e-05, 'epoch': 2.75} 73%|███████▎ | 7298/10000 [26:36:58<9:43:35, 12.96s/it] 73%|███████▎ | 7299/10000 [26:37:11<9:42:42, 12.94s/it] {'loss': 0.0053, 'learning_rate': 1.3575000000000001e-05, 'epoch': 2.75} 73%|███████▎ | 7299/10000 [26:37:11<9:42:42, 12.94s/it] 73%|███████▎ | 7300/10000 [26:37:24<9:43:01, 12.96s/it] {'loss': 0.0038, 'learning_rate': 1.3569999999999999e-05, 'epoch': 2.75} 73%|███████▎ | 7300/10000 [26:37:24<9:43:01, 12.96s/it] 73%|███████▎ | 7301/10000 [26:37:37<9:41:51, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.3565e-05, 'epoch': 2.75} 73%|███████▎ | 7301/10000 [26:37:37<9:41:51, 12.93s/it] 73%|███████▎ | 7302/10000 [26:37:50<9:42:09, 12.95s/it] {'loss': 0.0037, 'learning_rate': 1.356e-05, 'epoch': 2.75} 73%|███████▎ | 7302/10000 [26:37:50<9:42:09, 12.95s/it] 73%|███████▎ | 7303/10000 [26:38:03<9:42:29, 12.96s/it] {'loss': 0.0039, 'learning_rate': 1.3555000000000002e-05, 'epoch': 2.75} 73%|███████▎ | 7303/10000 [26:38:03<9:42:29, 12.96s/it] 73%|███████▎ | 7304/10000 [26:38:16<9:42:30, 12.96s/it] {'loss': 0.0054, 'learning_rate': 1.3550000000000002e-05, 'epoch': 2.75} 73%|███████▎ | 7304/10000 [26:38:16<9:42:30, 12.96s/it] 73%|███████▎ | 7305/10000 [26:38:29<9:42:00, 12.96s/it] {'loss': 0.0045, 'learning_rate': 1.3545e-05, 'epoch': 2.75} 73%|███████▎ | 7305/10000 [26:38:29<9:42:00, 12.96s/it] 73%|███████▎ | 7306/10000 [26:38:42<9:41:32, 12.95s/it] {'loss': 0.0049, 'learning_rate': 1.3539999999999999e-05, 'epoch': 2.75} 73%|███████▎ | 7306/10000 
[26:38:42<9:41:32, 12.95s/it] 73%|███████▎ | 7307/10000 [26:38:55<9:42:00, 12.97s/it] {'loss': 0.005, 'learning_rate': 1.3535e-05, 'epoch': 2.75} 73%|███████▎ | 7307/10000 [26:38:55<9:42:00, 12.97s/it] 73%|███████▎ | 7308/10000 [26:39:08<9:40:32, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.3530000000000001e-05, 'epoch': 2.75} 73%|███████▎ | 7308/10000 [26:39:08<9:40:32, 12.94s/it] 73%|███████▎ | 7309/10000 [26:39:20<9:40:20, 12.94s/it] {'loss': 0.0055, 'learning_rate': 1.3525000000000002e-05, 'epoch': 2.75} 73%|███████▎ | 7309/10000 [26:39:21<9:40:20, 12.94s/it] 73%|███████▎ | 7310/10000 [26:39:34<9:41:15, 12.96s/it] {'loss': 0.0046, 'learning_rate': 1.352e-05, 'epoch': 2.75} 73%|███████▎ | 7310/10000 [26:39:34<9:41:15, 12.96s/it] 73%|███████▎ | 7311/10000 [26:39:46<9:40:17, 12.95s/it] {'loss': 0.0047, 'learning_rate': 1.3515e-05, 'epoch': 2.75} 73%|███████▎ | 7311/10000 [26:39:46<9:40:17, 12.95s/it] 73%|███████▎ | 7312/10000 [26:39:59<9:40:30, 12.96s/it] {'loss': 0.0038, 'learning_rate': 1.3510000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7312/10000 [26:39:59<9:40:30, 12.96s/it] 73%|███████▎ | 7313/10000 [26:40:12<9:40:07, 12.95s/it] {'loss': 0.0053, 'learning_rate': 1.3505e-05, 'epoch': 2.76} 73%|███████▎ | 7313/10000 [26:40:12<9:40:07, 12.95s/it] 73%|███████▎ | 7314/10000 [26:40:25<9:39:02, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.3500000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7314/10000 [26:40:25<9:39:02, 12.93s/it] 73%|███████▎ | 7315/10000 [26:40:38<9:38:30, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.3494999999999999e-05, 'epoch': 2.76} 73%|███████▎ | 7315/10000 [26:40:38<9:38:30, 12.93s/it] 73%|███████▎ | 7316/10000 [26:40:51<9:37:15, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.349e-05, 'epoch': 2.76} 73%|███████▎ | 7316/10000 [26:40:51<9:37:15, 12.90s/it] 73%|███████▎ | 7317/10000 [26:41:04<9:38:15, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.3485e-05, 'epoch': 2.76} 73%|███████▎ | 7317/10000 [26:41:04<9:38:15, 12.93s/it] 
73%|███████▎ | 7318/10000 [26:41:17<9:38:36, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.3480000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7318/10000 [26:41:17<9:38:36, 12.94s/it] 73%|███████▎ | 7319/10000 [26:41:30<9:39:18, 12.96s/it] {'loss': 0.0045, 'learning_rate': 1.3475000000000002e-05, 'epoch': 2.76} 73%|███████▎ | 7319/10000 [26:41:30<9:39:18, 12.96s/it] 73%|███████▎ | 7320/10000 [26:41:43<9:38:41, 12.96s/it] {'loss': 0.0052, 'learning_rate': 1.347e-05, 'epoch': 2.76} 73%|███████▎ | 7320/10000 [26:41:43<9:38:41, 12.96s/it] 73%|███████▎ | 7321/10000 [26:41:56<9:38:22, 12.95s/it] {'loss': 0.0039, 'learning_rate': 1.3465e-05, 'epoch': 2.76} 73%|███████▎ | 7321/10000 [26:41:56<9:38:22, 12.95s/it] 73%|███████▎ | 7322/10000 [26:42:09<9:38:07, 12.95s/it] {'loss': 0.0054, 'learning_rate': 1.346e-05, 'epoch': 2.76} 73%|███████▎ | 7322/10000 [26:42:09<9:38:07, 12.95s/it] 73%|███████▎ | 7323/10000 [26:42:22<9:37:51, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.3455e-05, 'epoch': 2.76} 73%|███████▎ | 7323/10000 [26:42:22<9:37:51, 12.95s/it] 73%|███████▎ | 7324/10000 [26:42:35<9:38:08, 12.96s/it] {'loss': 0.0047, 'learning_rate': 1.3450000000000002e-05, 'epoch': 2.76} 73%|███████▎ | 7324/10000 [26:42:35<9:38:08, 12.96s/it] 73%|███████▎ | 7325/10000 [26:42:48<9:37:08, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.3445e-05, 'epoch': 2.76} 73%|███████▎ | 7325/10000 [26:42:48<9:37:08, 12.95s/it] 73%|███████▎ | 7326/10000 [26:43:01<9:36:14, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.344e-05, 'epoch': 2.76} 73%|███████▎ | 7326/10000 [26:43:01<9:36:14, 12.93s/it] 73%|███████▎ | 7327/10000 [26:43:13<9:36:10, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.3435000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7327/10000 [26:43:14<9:36:10, 12.93s/it] 73%|███████▎ | 7328/10000 [26:43:26<9:35:55, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.343e-05, 'epoch': 2.76} 73%|███████▎ | 7328/10000 [26:43:26<9:35:55, 12.93s/it] 73%|███████▎ | 7329/10000 [26:43:39<9:35:44, 
12.93s/it] {'loss': 0.0042, 'learning_rate': 1.3425000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7329/10000 [26:43:39<9:35:44, 12.93s/it] 73%|███████▎ | 7330/10000 [26:43:52<9:35:09, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.3420000000000002e-05, 'epoch': 2.76} 73%|███████▎ | 7330/10000 [26:43:52<9:35:09, 12.92s/it] 73%|███████▎ | 7331/10000 [26:44:05<9:34:48, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.3415e-05, 'epoch': 2.76} 73%|███████▎ | 7331/10000 [26:44:05<9:34:48, 12.92s/it] 73%|███████▎ | 7332/10000 [26:44:18<9:33:54, 12.91s/it] {'loss': 0.006, 'learning_rate': 1.341e-05, 'epoch': 2.76} 73%|███████▎ | 7332/10000 [26:44:18<9:33:54, 12.91s/it] 73%|███████▎ | 7333/10000 [26:44:31<9:33:50, 12.91s/it] {'loss': 0.0053, 'learning_rate': 1.3405000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7333/10000 [26:44:31<9:33:50, 12.91s/it] 73%|███████▎ | 7334/10000 [26:44:44<9:33:20, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.3400000000000002e-05, 'epoch': 2.76} 73%|███████▎ | 7334/10000 [26:44:44<9:33:20, 12.90s/it] 73%|███████▎ | 7335/10000 [26:44:57<9:33:07, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.3395000000000001e-05, 'epoch': 2.76} 73%|███████▎ | 7335/10000 [26:44:57<9:33:07, 12.90s/it] 73%|███████▎ | 7336/10000 [26:45:10<9:33:07, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.339e-05, 'epoch': 2.76} 73%|███████▎ | 7336/10000 [26:45:10<9:33:07, 12.91s/it] 73%|███████▎ | 7337/10000 [26:45:23<9:32:18, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.3385e-05, 'epoch': 2.76} 73%|███████▎ | 7337/10000 [26:45:23<9:32:18, 12.89s/it] 73%|███████▎ | 7338/10000 [26:45:36<9:33:31, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.338e-05, 'epoch': 2.76} 73%|███████▎ | 7338/10000 [26:45:36<9:33:31, 12.93s/it] 73%|███████▎ | 7339/10000 [26:45:48<9:32:01, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.3375000000000002e-05, 'epoch': 2.77} 73%|███████▎ | 7339/10000 [26:45:48<9:32:01, 12.90s/it] 73%|███████▎ | 7340/10000 [26:46:01<9:31:59, 12.90s/it] {'loss': 0.0042, 
'learning_rate': 1.3370000000000002e-05, 'epoch': 2.77} 73%|███████▎ | 7340/10000 [26:46:01<9:31:59, 12.90s/it] 73%|███████▎ | 7341/10000 [26:46:14<9:31:49, 12.90s/it] {'loss': 0.0054, 'learning_rate': 1.3365e-05, 'epoch': 2.77} 73%|███████▎ | 7341/10000 [26:46:14<9:31:49, 12.90s/it] 73%|███████▎ | 7342/10000 [26:46:27<9:31:32, 12.90s/it] {'loss': 0.0033, 'learning_rate': 1.336e-05, 'epoch': 2.77} 73%|███████▎ | 7342/10000 [26:46:27<9:31:32, 12.90s/it] 73%|███████▎ | 7343/10000 [26:46:40<9:31:41, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.3355e-05, 'epoch': 2.77} 73%|███████▎ | 7343/10000 [26:46:40<9:31:41, 12.91s/it] 73%|███████▎ | 7344/10000 [26:46:53<9:32:21, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.3350000000000001e-05, 'epoch': 2.77} 73%|███████▎ | 7344/10000 [26:46:53<9:32:21, 12.93s/it] 73%|███████▎ | 7345/10000 [26:47:06<9:32:33, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.3345000000000002e-05, 'epoch': 2.77} 73%|███████▎ | 7345/10000 [26:47:06<9:32:33, 12.94s/it] 73%|███████▎ | 7346/10000 [26:47:19<9:32:00, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.334e-05, 'epoch': 2.77} 73%|███████▎ | 7346/10000 [26:47:19<9:32:00, 12.93s/it] 73%|███████▎ | 7347/10000 [26:47:32<9:31:46, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.3335e-05, 'epoch': 2.77} 73%|███████▎ | 7347/10000 [26:47:32<9:31:46, 12.93s/it] 73%|███████▎ | 7348/10000 [26:47:45<9:32:15, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.3330000000000001e-05, 'epoch': 2.77} 73%|███████▎ | 7348/10000 [26:47:45<9:32:15, 12.95s/it] 73%|███████▎ | 7349/10000 [26:47:58<9:31:34, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.3325000000000002e-05, 'epoch': 2.77} 73%|███████▎ | 7349/10000 [26:47:58<9:31:34, 12.94s/it] 74%|███████▎ | 7350/10000 [26:48:11<9:31:50, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.3320000000000001e-05, 'epoch': 2.77} 74%|███████▎ | 7350/10000 [26:48:11<9:31:50, 12.95s/it] 74%|███████▎ | 7351/10000 [26:48:24<9:30:23, 12.92s/it] {'loss': 0.0034, 'learning_rate': 
1.3315e-05, 'epoch': 2.77} 74%|███████▎ | 7351/10000 [26:48:24<9:30:23, 12.92s/it] 74%|███████▎ | 7352/10000 [26:48:36<9:30:14, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.331e-05, 'epoch': 2.77} 74%|███████▎ | 7352/10000 [26:48:36<9:30:14, 12.92s/it] 74%|███████▎ | 7353/10000 [26:48:49<9:29:22, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.3305e-05, 'epoch': 2.77} 74%|███████▎ | 7353/10000 [26:48:49<9:29:22, 12.91s/it] 74%|███████▎ | 7354/10000 [26:49:02<9:28:09, 12.88s/it] {'loss': 0.0054, 'learning_rate': 1.3300000000000001e-05, 'epoch': 2.77} 74%|███████▎ | 7354/10000 [26:49:02<9:28:09, 12.88s/it] 74%|███████▎ | 7355/10000 [26:49:15<9:28:18, 12.89s/it] {'loss': 0.0047, 'learning_rate': 1.3295000000000002e-05, 'epoch': 2.77} 74%|███████▎ | 7355/10000 [26:49:15<9:28:18, 12.89s/it] 74%|███████▎ | 7356/10000 [26:49:28<9:27:08, 12.87s/it] {'loss': 0.005, 'learning_rate': 1.329e-05, 'epoch': 2.77} 74%|███████▎ | 7356/10000 [26:49:28<9:27:08, 12.87s/it] 74%|███████▎ | 7357/10000 [26:49:41<9:27:06, 12.87s/it] {'loss': 0.0054, 'learning_rate': 1.3285e-05, 'epoch': 2.77} 74%|███████▎ | 7357/10000 [26:49:41<9:27:06, 12.87s/it] 74%|███████▎ | 7358/10000 [26:49:54<9:27:36, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.3280000000000002e-05, 'epoch': 2.77} 74%|███████▎ | 7358/10000 [26:49:54<9:27:36, 12.89s/it] 74%|███████▎ | 7359/10000 [26:50:07<9:27:33, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.3275e-05, 'epoch': 2.77} 74%|███████▎ | 7359/10000 [26:50:07<9:27:33, 12.89s/it] 74%|███████▎ | 7360/10000 [26:50:19<9:26:36, 12.88s/it] {'loss': 0.0048, 'learning_rate': 1.3270000000000002e-05, 'epoch': 2.77} 74%|███████▎ | 7360/10000 [26:50:19<9:26:36, 12.88s/it] 74%|███████▎ | 7361/10000 [26:50:32<9:26:21, 12.88s/it] {'loss': 0.0049, 'learning_rate': 1.3265e-05, 'epoch': 2.77} 74%|███████▎ | 7361/10000 [26:50:32<9:26:21, 12.88s/it] 74%|███████▎ | 7362/10000 [26:50:45<9:26:04, 12.88s/it] {'loss': 0.0041, 'learning_rate': 1.326e-05, 'epoch': 2.77} 74%|███████▎ | 7362/10000 
[26:50:45<9:26:04, 12.88s/it] 74%|███████▎ | 7363/10000 [26:50:58<9:26:24, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.3255000000000001e-05, 'epoch': 2.77} 74%|███████▎ | 7363/10000 [26:50:58<9:26:24, 12.89s/it] 74%|███████▎ | 7364/10000 [26:51:11<9:26:03, 12.88s/it] {'loss': 0.0038, 'learning_rate': 1.3250000000000002e-05, 'epoch': 2.77} 74%|███████▎ | 7364/10000 [26:51:11<9:26:03, 12.88s/it] 74%|███████▎ | 7365/10000 [26:51:24<9:25:23, 12.87s/it] {'loss': 0.0052, 'learning_rate': 1.3245000000000001e-05, 'epoch': 2.78} 74%|███████▎ | 7365/10000 [26:51:24<9:25:23, 12.87s/it] 74%|███████▎ | 7366/10000 [26:51:37<9:25:45, 12.89s/it] {'loss': 0.0057, 'learning_rate': 1.324e-05, 'epoch': 2.78} 74%|███████▎ | 7366/10000 [26:51:37<9:25:45, 12.89s/it] 74%|███████▎ | 7367/10000 [26:51:50<9:25:51, 12.89s/it] {'loss': 0.0039, 'learning_rate': 1.3235e-05, 'epoch': 2.78} 74%|███████▎ | 7367/10000 [26:51:50<9:25:51, 12.89s/it] 74%|███████▎ | 7368/10000 [26:52:03<9:26:48, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.323e-05, 'epoch': 2.78} 74%|███████▎ | 7368/10000 [26:52:03<9:26:48, 12.92s/it] 74%|███████▎ | 7369/10000 [26:52:16<9:27:42, 12.95s/it] {'loss': 0.0031, 'learning_rate': 1.3225000000000001e-05, 'epoch': 2.78} 74%|███████▎ | 7369/10000 [26:52:16<9:27:42, 12.95s/it] 74%|███████▎ | 7370/10000 [26:52:29<9:26:39, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.3220000000000002e-05, 'epoch': 2.78} 74%|███████▎ | 7370/10000 [26:52:29<9:26:39, 12.93s/it] 74%|███████▎ | 7371/10000 [26:52:41<9:26:12, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.3215e-05, 'epoch': 2.78} 74%|███████▎ | 7371/10000 [26:52:41<9:26:12, 12.92s/it] 74%|███████▎ | 7372/10000 [26:52:54<9:26:26, 12.93s/it] {'loss': 0.0034, 'learning_rate': 1.321e-05, 'epoch': 2.78} 74%|███████▎ | 7372/10000 [26:52:54<9:26:26, 12.93s/it] 74%|███████▎ | 7373/10000 [26:53:07<9:25:58, 12.93s/it] {'loss': 0.0051, 'learning_rate': 1.3205000000000001e-05, 'epoch': 2.78} 74%|███████▎ | 7373/10000 [26:53:07<9:25:58, 
12.93s/it] 74%|███████▎ | 7374/10000 [26:53:20<9:25:23, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.32e-05, 'epoch': 2.78} 74%|███████▎ | 7374/10000 [26:53:20<9:25:23, 12.92s/it] 74%|███████▍ | 7375/10000 [26:53:33<9:24:18, 12.90s/it] {'loss': 0.0048, 'learning_rate': 1.3195000000000002e-05, 'epoch': 2.78} 74%|███████▍ | 7375/10000 [26:53:33<9:24:18, 12.90s/it] 74%|███████▍ | 7376/10000 [26:53:46<9:24:32, 12.91s/it] {'loss': 0.0056, 'learning_rate': 1.3189999999999999e-05, 'epoch': 2.78} 74%|███████▍ | 7376/10000 [26:53:46<9:24:32, 12.91s/it] 74%|███████▍ | 7377/10000 [26:53:59<9:25:03, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.3185e-05, 'epoch': 2.78} 74%|███████▍ | 7377/10000 [26:53:59<9:25:03, 12.93s/it] 74%|███████▍ | 7378/10000 [26:54:12<9:25:08, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.3180000000000001e-05, 'epoch': 2.78} 74%|███████▍ | 7378/10000 [26:54:12<9:25:08, 12.93s/it] 74%|███████▍ | 7379/10000 [26:54:25<9:26:05, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.3175000000000002e-05, 'epoch': 2.78} 74%|███████▍ | 7379/10000 [26:54:25<9:26:05, 12.96s/it] 74%|███████▍ | 7380/10000 [26:54:38<9:26:36, 12.98s/it] {'loss': 0.004, 'learning_rate': 1.3170000000000001e-05, 'epoch': 2.78} 74%|███████▍ | 7380/10000 [26:54:38<9:26:36, 12.98s/it] 74%|███████▍ | 7381/10000 [26:54:51<9:26:08, 12.97s/it] {'loss': 0.0049, 'learning_rate': 1.3165e-05, 'epoch': 2.78} 74%|███████▍ | 7381/10000 [26:54:51<9:26:08, 12.97s/it] 74%|███████▍ | 7382/10000 [26:55:04<9:25:15, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.316e-05, 'epoch': 2.78} 74%|███████▍ | 7382/10000 [26:55:04<9:25:15, 12.95s/it] 74%|███████▍ | 7383/10000 [26:55:17<9:25:25, 12.96s/it] {'loss': 0.0056, 'learning_rate': 1.3155e-05, 'epoch': 2.78} 74%|███████▍ | 7383/10000 [26:55:17<9:25:25, 12.96s/it] 74%|███████▍ | 7384/10000 [26:55:30<9:26:11, 12.99s/it] {'loss': 0.0055, 'learning_rate': 1.3150000000000001e-05, 'epoch': 2.78} 74%|███████▍ | 7384/10000 [26:55:30<9:26:11, 12.99s/it] 74%|███████▍ | 
7385/10000 [26:55:43<9:24:29, 12.95s/it] {'loss': 0.0053, 'learning_rate': 1.3145000000000002e-05, 'epoch': 2.78} 74%|███████▍ | 7385/10000 [26:55:43<9:24:29, 12.95s/it] 74%|███████▍ | 7386/10000 [26:55:56<9:23:49, 12.94s/it] {'loss': 0.0037, 'learning_rate': 1.314e-05, 'epoch': 2.78} 74%|███████▍ | 7386/10000 [26:55:56<9:23:49, 12.94s/it] 74%|███████▍ | 7387/10000 [26:56:09<9:22:53, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.3135e-05, 'epoch': 2.78} 74%|███████▍ | 7387/10000 [26:56:09<9:22:53, 12.93s/it] 74%|███████▍ | 7388/10000 [26:56:21<9:22:42, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.3130000000000001e-05, 'epoch': 2.78} 74%|███████▍ | 7388/10000 [26:56:21<9:22:42, 12.93s/it] 74%|███████▍ | 7389/10000 [26:56:34<9:22:19, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.3125e-05, 'epoch': 2.78} 74%|███████▍ | 7389/10000 [26:56:34<9:22:19, 12.92s/it] 74%|███████▍ | 7390/10000 [26:56:47<9:23:05, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.3120000000000001e-05, 'epoch': 2.78} 74%|███████▍ | 7390/10000 [26:56:47<9:23:05, 12.94s/it] 74%|███████▍ | 7391/10000 [26:57:00<9:23:24, 12.96s/it] {'loss': 0.0051, 'learning_rate': 1.3114999999999999e-05, 'epoch': 2.78} 74%|███████▍ | 7391/10000 [26:57:00<9:23:24, 12.96s/it] 74%|███████▍ | 7392/10000 [26:57:13<9:22:53, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.311e-05, 'epoch': 2.79} 74%|███████▍ | 7392/10000 [26:57:13<9:22:53, 12.95s/it] 74%|███████▍ | 7393/10000 [26:57:26<9:24:39, 13.00s/it] {'loss': 0.0043, 'learning_rate': 1.3105e-05, 'epoch': 2.79} 74%|███████▍ | 7393/10000 [26:57:26<9:24:39, 13.00s/it] 74%|███████▍ | 7394/10000 [26:57:39<9:23:56, 12.98s/it] {'loss': 0.0059, 'learning_rate': 1.3100000000000002e-05, 'epoch': 2.79} 74%|███████▍ | 7394/10000 [26:57:39<9:23:56, 12.98s/it] 74%|███████▍ | 7395/10000 [26:57:52<9:23:30, 12.98s/it] {'loss': 0.0044, 'learning_rate': 1.3095000000000003e-05, 'epoch': 2.79} 74%|███████▍ | 7395/10000 [26:57:52<9:23:30, 12.98s/it] 74%|███████▍ | 7396/10000 
[26:58:05<9:23:07, 12.98s/it] {'loss': 0.0045, 'learning_rate': 1.309e-05, 'epoch': 2.79} 74%|███████▍ | 7396/10000 [26:58:05<9:23:07, 12.98s/it] 74%|███████▍ | 7397/10000 [26:58:18<9:22:44, 12.97s/it] {'loss': 0.0061, 'learning_rate': 1.3085e-05, 'epoch': 2.79} 74%|███████▍ | 7397/10000 [26:58:18<9:22:44, 12.97s/it] 74%|███████▍ | 7398/10000 [26:58:31<9:22:40, 12.97s/it] {'loss': 0.0045, 'learning_rate': 1.308e-05, 'epoch': 2.79} 74%|███████▍ | 7398/10000 [26:58:31<9:22:40, 12.97s/it] 74%|███████▍ | 7399/10000 [26:58:44<9:20:13, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.3075000000000001e-05, 'epoch': 2.79} 74%|███████▍ | 7399/10000 [26:58:44<9:20:13, 12.92s/it] 74%|███████▍ | 7400/10000 [26:58:57<9:20:04, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.3070000000000002e-05, 'epoch': 2.79} 74%|███████▍ | 7400/10000 [26:58:57<9:20:04, 12.92s/it] 74%|███████▍ | 7401/10000 [26:59:10<9:20:11, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.3065e-05, 'epoch': 2.79} 74%|███████▍ | 7401/10000 [26:59:10<9:20:11, 12.93s/it] 74%|███████▍ | 7402/10000 [26:59:23<9:18:48, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.306e-05, 'epoch': 2.79} 74%|███████▍ | 7402/10000 [26:59:23<9:18:48, 12.91s/it] 74%|███████▍ | 7403/10000 [26:59:36<9:17:50, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.3055000000000001e-05, 'epoch': 2.79} 74%|███████▍ | 7403/10000 [26:59:36<9:17:50, 12.89s/it] 74%|███████▍ | 7404/10000 [26:59:48<9:17:39, 12.89s/it] {'loss': 0.0048, 'learning_rate': 1.305e-05, 'epoch': 2.79} 74%|███████▍ | 7404/10000 [26:59:48<9:17:39, 12.89s/it] 74%|███████▍ | 7405/10000 [27:00:01<9:16:40, 12.87s/it] {'loss': 0.0054, 'learning_rate': 1.3045000000000001e-05, 'epoch': 2.79} 74%|███████▍ | 7405/10000 [27:00:01<9:16:40, 12.87s/it] 74%|███████▍ | 7406/10000 [27:00:14<9:17:33, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.3039999999999999e-05, 'epoch': 2.79} 74%|███████▍ | 7406/10000 [27:00:14<9:17:33, 12.90s/it] 74%|███████▍ | 7407/10000 [27:00:27<9:17:32, 12.90s/it] {'loss': 
0.0042, 'learning_rate': 1.3035e-05, 'epoch': 2.79} 74%|███████▍ | 7407/10000 [27:00:27<9:17:32, 12.90s/it] 74%|███████▍ | 7408/10000 [27:00:40<9:16:51, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.303e-05, 'epoch': 2.79} 74%|███████▍ | 7408/10000 [27:00:40<9:16:51, 12.89s/it] 74%|███████▍ | 7409/10000 [27:00:53<9:17:15, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.3025000000000002e-05, 'epoch': 2.79} 74%|███████▍ | 7409/10000 [27:00:53<9:17:15, 12.90s/it] 74%|███████▍ | 7410/10000 [27:01:06<9:17:50, 12.92s/it] {'loss': 0.0051, 'learning_rate': 1.3020000000000002e-05, 'epoch': 2.79} 74%|███████▍ | 7410/10000 [27:01:06<9:17:50, 12.92s/it] 74%|███████▍ | 7411/10000 [27:01:19<9:16:38, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.3015e-05, 'epoch': 2.79} 74%|███████▍ | 7411/10000 [27:01:19<9:16:38, 12.90s/it] 74%|███████▍ | 7412/10000 [27:01:32<9:17:37, 12.93s/it] {'loss': 0.0032, 'learning_rate': 1.301e-05, 'epoch': 2.79} 74%|███████▍ | 7412/10000 [27:01:32<9:17:37, 12.93s/it] 74%|███████▍ | 7413/10000 [27:01:45<9:18:29, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.3005e-05, 'epoch': 2.79} 74%|███████▍ | 7413/10000 [27:01:45<9:18:29, 12.95s/it] 74%|███████▍ | 7414/10000 [27:01:58<9:18:51, 12.97s/it] {'loss': 0.0041, 'learning_rate': 1.3000000000000001e-05, 'epoch': 2.79} 74%|███████▍ | 7414/10000 [27:01:58<9:18:51, 12.97s/it] 74%|███████▍ | 7415/10000 [27:02:11<9:18:05, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.2995000000000002e-05, 'epoch': 2.79} 74%|███████▍ | 7415/10000 [27:02:11<9:18:05, 12.95s/it] 74%|███████▍ | 7416/10000 [27:02:24<9:17:43, 12.95s/it] {'loss': 0.0066, 'learning_rate': 1.299e-05, 'epoch': 2.79} 74%|███████▍ | 7416/10000 [27:02:24<9:17:43, 12.95s/it] 74%|███████▍ | 7417/10000 [27:02:37<9:16:57, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.2985e-05, 'epoch': 2.79} 74%|███████▍ | 7417/10000 [27:02:37<9:16:57, 12.94s/it] 74%|███████▍ | 7418/10000 [27:02:50<9:17:38, 12.96s/it] {'loss': 0.0038, 'learning_rate': 1.2980000000000001e-05, 
'epoch': 2.8} 74%|███████▍ | 7418/10000 [27:02:50<9:17:38, 12.96s/it] 74%|███████▍ | 7419/10000 [27:03:03<9:18:14, 12.98s/it] {'loss': 0.0051, 'learning_rate': 1.2975e-05, 'epoch': 2.8} 74%|███████▍ | 7419/10000 [27:03:03<9:18:14, 12.98s/it] 74%|███████▍ | 7420/10000 [27:03:16<9:17:54, 12.97s/it] {'loss': 0.0039, 'learning_rate': 1.2970000000000001e-05, 'epoch': 2.8} 74%|███████▍ | 7420/10000 [27:03:16<9:17:54, 12.97s/it] 74%|███████▍ | 7421/10000 [27:03:28<9:17:17, 12.97s/it] {'loss': 0.0045, 'learning_rate': 1.2964999999999999e-05, 'epoch': 2.8} 74%|███████▍ | 7421/10000 [27:03:29<9:17:17, 12.97s/it] 74%|███████▍ | 7422/10000 [27:03:41<9:16:51, 12.96s/it] {'loss': 0.0048, 'learning_rate': 1.296e-05, 'epoch': 2.8} 74%|███████▍ | 7422/10000 [27:03:41<9:16:51, 12.96s/it] 74%|███████▍ | 7423/10000 [27:03:54<9:17:00, 12.97s/it] {'loss': 0.0039, 'learning_rate': 1.2955e-05, 'epoch': 2.8} 74%|███████▍ | 7423/10000 [27:03:54<9:17:00, 12.97s/it] 74%|███████▍ | 7424/10000 [27:04:07<9:16:59, 12.97s/it] {'loss': 0.0057, 'learning_rate': 1.2950000000000001e-05, 'epoch': 2.8} 74%|███████▍ | 7424/10000 [27:04:07<9:16:59, 12.97s/it] 74%|███████▍ | 7425/10000 [27:04:20<9:16:53, 12.98s/it] {'loss': 0.004, 'learning_rate': 1.2945000000000002e-05, 'epoch': 2.8} 74%|███████▍ | 7425/10000 [27:04:20<9:16:53, 12.98s/it] 74%|███████▍ | 7426/10000 [27:04:33<9:14:50, 12.93s/it] {'loss': 0.006, 'learning_rate': 1.294e-05, 'epoch': 2.8} 74%|███████▍ | 7426/10000 [27:04:33<9:14:50, 12.93s/it] 74%|███████▍ | 7427/10000 [27:04:46<9:14:42, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.2935e-05, 'epoch': 2.8} 74%|███████▍ | 7427/10000 [27:04:46<9:14:42, 12.94s/it] 74%|███████▍ | 7428/10000 [27:04:59<9:15:09, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.293e-05, 'epoch': 2.8} 74%|███████▍ | 7428/10000 [27:04:59<9:15:09, 12.95s/it] 74%|███████▍ | 7429/10000 [27:05:12<9:13:38, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.2925e-05, 'epoch': 2.8} 74%|███████▍ | 7429/10000 [27:05:12<9:13:38, 
12.92s/it] 74%|███████▍ | 7430/10000 [27:05:25<9:13:26, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.2920000000000002e-05, 'epoch': 2.8} 74%|███████▍ | 7430/10000 [27:05:25<9:13:26, 12.92s/it] 74%|███████▍ | 7431/10000 [27:05:38<9:13:30, 12.93s/it] {'loss': 0.0032, 'learning_rate': 1.2915e-05, 'epoch': 2.8} 74%|███████▍ | 7431/10000 [27:05:38<9:13:30, 12.93s/it] 74%|███████▍ | 7432/10000 [27:05:51<9:12:52, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.291e-05, 'epoch': 2.8} 74%|███████▍ | 7432/10000 [27:05:51<9:12:52, 12.92s/it] 74%|███████▍ | 7433/10000 [27:06:04<9:14:40, 12.96s/it] {'loss': 0.0033, 'learning_rate': 1.2905000000000001e-05, 'epoch': 2.8} 74%|███████▍ | 7433/10000 [27:06:04<9:14:40, 12.96s/it] 74%|███████▍ | 7434/10000 [27:06:17<9:13:59, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.29e-05, 'epoch': 2.8} 74%|███████▍ | 7434/10000 [27:06:17<9:13:59, 12.95s/it] 74%|███████▍ | 7435/10000 [27:06:30<9:12:54, 12.93s/it] {'loss': 0.0061, 'learning_rate': 1.2895000000000001e-05, 'epoch': 2.8} 74%|███████▍ | 7435/10000 [27:06:30<9:12:54, 12.93s/it] 74%|███████▍ | 7436/10000 [27:06:43<9:12:59, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.2889999999999999e-05, 'epoch': 2.8} 74%|███████▍ | 7436/10000 [27:06:43<9:12:59, 12.94s/it] 74%|███████▍ | 7437/10000 [27:06:56<9:12:39, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.2885e-05, 'epoch': 2.8} 74%|███████▍ | 7437/10000 [27:06:56<9:12:39, 12.94s/it] 74%|███████▍ | 7438/10000 [27:07:08<9:11:42, 12.92s/it] {'loss': 0.0031, 'learning_rate': 1.288e-05, 'epoch': 2.8} 74%|███████▍ | 7438/10000 [27:07:08<9:11:42, 12.92s/it] 74%|███████▍ | 7439/10000 [27:07:21<9:12:20, 12.94s/it] {'loss': 0.0033, 'learning_rate': 1.2875000000000001e-05, 'epoch': 2.8} 74%|███████▍ | 7439/10000 [27:07:21<9:12:20, 12.94s/it] 74%|███████▍ | 7440/10000 [27:07:34<9:12:56, 12.96s/it] {'loss': 0.0043, 'learning_rate': 1.2870000000000002e-05, 'epoch': 2.8} 74%|███████▍ | 7440/10000 [27:07:34<9:12:56, 12.96s/it] 74%|███████▍ | 7441/10000 
[27:07:47<9:12:28, 12.95s/it] {'loss': 0.0035, 'learning_rate': 1.2865e-05, 'epoch': 2.8} 74%|███████▍ | 7441/10000 [27:07:47<9:12:28, 12.95s/it] 74%|███████▍ | 7442/10000 [27:08:00<9:13:42, 12.99s/it] {'loss': 0.003, 'learning_rate': 1.286e-05, 'epoch': 2.8} 74%|███████▍ | 7442/10000 [27:08:00<9:13:42, 12.99s/it] 74%|███████▍ | 7443/10000 [27:08:13<9:12:05, 12.95s/it] {'loss': 0.0052, 'learning_rate': 1.2855e-05, 'epoch': 2.8} 74%|███████▍ | 7443/10000 [27:08:13<9:12:05, 12.95s/it] 74%|███████▍ | 7444/10000 [27:08:26<9:12:51, 12.98s/it] {'loss': 0.0042, 'learning_rate': 1.285e-05, 'epoch': 2.8} 74%|███████▍ | 7444/10000 [27:08:26<9:12:51, 12.98s/it] 74%|███████▍ | 7445/10000 [27:08:39<9:13:51, 13.01s/it] {'loss': 0.0045, 'learning_rate': 1.2845000000000002e-05, 'epoch': 2.81} 74%|███████▍ | 7445/10000 [27:08:39<9:13:51, 13.01s/it] 74%|███████▍ | 7446/10000 [27:08:52<9:14:33, 13.03s/it] {'loss': 0.0039, 'learning_rate': 1.2839999999999999e-05, 'epoch': 2.81} 74%|███████▍ | 7446/10000 [27:08:53<9:14:33, 13.03s/it] 74%|███████▍ | 7447/10000 [27:09:06<9:14:14, 13.03s/it] {'loss': 0.0051, 'learning_rate': 1.2835e-05, 'epoch': 2.81} 74%|███████▍ | 7447/10000 [27:09:06<9:14:14, 13.03s/it] 74%|███████▍ | 7448/10000 [27:09:19<9:14:25, 13.04s/it] {'loss': 0.0041, 'learning_rate': 1.283e-05, 'epoch': 2.81} 74%|███████▍ | 7448/10000 [27:09:19<9:14:25, 13.04s/it] 74%|███████▍ | 7449/10000 [27:09:32<9:14:25, 13.04s/it] {'loss': 0.006, 'learning_rate': 1.2825000000000002e-05, 'epoch': 2.81} 74%|███████▍ | 7449/10000 [27:09:32<9:14:25, 13.04s/it] 74%|███████▍ | 7450/10000 [27:09:44<9:11:54, 12.99s/it] {'loss': 0.0054, 'learning_rate': 1.2820000000000001e-05, 'epoch': 2.81} 74%|███████▍ | 7450/10000 [27:09:45<9:11:54, 12.99s/it] 75%|███████▍ | 7451/10000 [27:09:57<9:11:34, 12.98s/it] {'loss': 0.0043, 'learning_rate': 1.2814999999999998e-05, 'epoch': 2.81} 75%|███████▍ | 7451/10000 [27:09:57<9:11:34, 12.98s/it] 75%|███████▍ | 7452/10000 [27:10:10<9:09:25, 12.94s/it] {'loss': 
0.0045, 'learning_rate': 1.281e-05, 'epoch': 2.81} 75%|███████▍ | 7452/10000 [27:10:10<9:09:25, 12.94s/it] 75%|███████▍ | 7453/10000 [27:10:23<9:08:16, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.2805e-05, 'epoch': 2.81} 75%|███████▍ | 7453/10000 [27:10:23<9:08:16, 12.92s/it] 75%|███████▍ | 7454/10000 [27:10:36<9:08:37, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.2800000000000001e-05, 'epoch': 2.81} 75%|███████▍ | 7454/10000 [27:10:36<9:08:37, 12.93s/it] 75%|███████▍ | 7455/10000 [27:10:49<9:08:17, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.2795000000000002e-05, 'epoch': 2.81} 75%|███████▍ | 7455/10000 [27:10:49<9:08:17, 12.93s/it] 75%|███████▍ | 7456/10000 [27:11:02<9:08:31, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.2790000000000001e-05, 'epoch': 2.81} 75%|███████▍ | 7456/10000 [27:11:02<9:08:31, 12.94s/it] 75%|███████▍ | 7457/10000 [27:11:15<9:08:13, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.2785e-05, 'epoch': 2.81} 75%|███████▍ | 7457/10000 [27:11:15<9:08:13, 12.93s/it] 75%|███████▍ | 7458/10000 [27:11:28<9:07:00, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.278e-05, 'epoch': 2.81} 75%|███████▍ | 7458/10000 [27:11:28<9:07:00, 12.91s/it] 75%|███████▍ | 7459/10000 [27:11:41<9:06:34, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.2775e-05, 'epoch': 2.81} 75%|███████▍ | 7459/10000 [27:11:41<9:06:34, 12.91s/it] 75%|███████▍ | 7460/10000 [27:11:54<9:06:39, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.2770000000000001e-05, 'epoch': 2.81} 75%|███████▍ | 7460/10000 [27:11:54<9:06:39, 12.91s/it] 75%|███████▍ | 7461/10000 [27:12:06<9:05:38, 12.89s/it] {'loss': 0.005, 'learning_rate': 1.2765000000000002e-05, 'epoch': 2.81} 75%|███████▍ | 7461/10000 [27:12:06<9:05:38, 12.89s/it] 75%|███████▍ | 7462/10000 [27:12:19<9:05:52, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.276e-05, 'epoch': 2.81} 75%|███████▍ | 7462/10000 [27:12:19<9:05:52, 12.91s/it] 75%|███████▍ | 7463/10000 [27:12:32<9:05:33, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.2755e-05, 
'epoch': 2.81} 75%|███████▍ | 7463/10000 [27:12:32<9:05:33, 12.90s/it] 75%|███████▍ | 7464/10000 [27:12:45<9:05:07, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.2750000000000002e-05, 'epoch': 2.81} 75%|███████▍ | 7464/10000 [27:12:45<9:05:07, 12.90s/it] 75%|███████▍ | 7465/10000 [27:12:58<9:05:32, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.2745e-05, 'epoch': 2.81} 75%|███████▍ | 7465/10000 [27:12:58<9:05:32, 12.91s/it] 75%|███████▍ | 7466/10000 [27:13:11<9:05:27, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.2740000000000002e-05, 'epoch': 2.81} 75%|███████▍ | 7466/10000 [27:13:11<9:05:27, 12.92s/it] 75%|███████▍ | 7467/10000 [27:13:24<9:04:46, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.2735e-05, 'epoch': 2.81} 75%|███████▍ | 7467/10000 [27:13:24<9:04:46, 12.90s/it] 75%|███████▍ | 7468/10000 [27:13:37<9:05:39, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.273e-05, 'epoch': 2.81} 75%|███████▍ | 7468/10000 [27:13:37<9:05:39, 12.93s/it] 75%|███████▍ | 7469/10000 [27:13:50<9:05:48, 12.94s/it] {'loss': 0.0054, 'learning_rate': 1.2725000000000001e-05, 'epoch': 2.81} 75%|███████▍ | 7469/10000 [27:13:50<9:05:48, 12.94s/it] 75%|███████▍ | 7470/10000 [27:14:03<9:05:37, 12.94s/it] {'loss': 0.0059, 'learning_rate': 1.2720000000000002e-05, 'epoch': 2.81} 75%|███████▍ | 7470/10000 [27:14:03<9:05:37, 12.94s/it] 75%|███████▍ | 7471/10000 [27:14:16<9:06:26, 12.96s/it] {'loss': 0.0045, 'learning_rate': 1.2715000000000001e-05, 'epoch': 2.81} 75%|███████▍ | 7471/10000 [27:14:16<9:06:26, 12.96s/it] 75%|███████▍ | 7472/10000 [27:14:29<9:04:52, 12.93s/it] {'loss': 0.004, 'learning_rate': 1.271e-05, 'epoch': 2.82} 75%|███████▍ | 7472/10000 [27:14:29<9:04:52, 12.93s/it] 75%|███████▍ | 7473/10000 [27:14:42<9:03:34, 12.91s/it] {'loss': 0.005, 'learning_rate': 1.2705e-05, 'epoch': 2.82} 75%|███████▍ | 7473/10000 [27:14:42<9:03:34, 12.91s/it] 75%|███████▍ | 7474/10000 [27:14:54<9:02:47, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.27e-05, 'epoch': 2.82} 75%|███████▍ | 7474/10000 
[27:14:54<9:02:47, 12.89s/it] 75%|███████▍ | 7475/10000 [27:15:07<9:02:30, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.2695000000000001e-05, 'epoch': 2.82} 75%|███████▍ | 7475/10000 [27:15:07<9:02:30, 12.89s/it] 75%|███████▍ | 7476/10000 [27:15:20<9:03:15, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.2690000000000002e-05, 'epoch': 2.82} 75%|███████▍ | 7476/10000 [27:15:20<9:03:15, 12.91s/it] 75%|███████▍ | 7477/10000 [27:15:33<9:03:36, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.2685e-05, 'epoch': 2.82} 75%|███████▍ | 7477/10000 [27:15:33<9:03:36, 12.93s/it] 75%|███████▍ | 7478/10000 [27:15:46<9:04:16, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.268e-05, 'epoch': 2.82} 75%|███████▍ | 7478/10000 [27:15:46<9:04:16, 12.95s/it] 75%|███████▍ | 7479/10000 [27:15:59<9:03:53, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.2675000000000001e-05, 'epoch': 2.82} 75%|███████▍ | 7479/10000 [27:15:59<9:03:53, 12.94s/it] 75%|███████▍ | 7480/10000 [27:16:12<9:03:13, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.267e-05, 'epoch': 2.82} 75%|███████▍ | 7480/10000 [27:16:12<9:03:13, 12.93s/it] 75%|███████▍ | 7481/10000 [27:16:25<9:03:59, 12.96s/it] {'loss': 0.0054, 'learning_rate': 1.2665000000000002e-05, 'epoch': 2.82} 75%|███████▍ | 7481/10000 [27:16:25<9:03:59, 12.96s/it] 75%|███████▍ | 7482/10000 [27:16:38<9:02:52, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.2659999999999999e-05, 'epoch': 2.82} 75%|███████▍ | 7482/10000 [27:16:38<9:02:52, 12.94s/it] 75%|███████▍ | 7483/10000 [27:16:51<9:01:25, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.2655e-05, 'epoch': 2.82} 75%|███████▍ | 7483/10000 [27:16:51<9:01:25, 12.91s/it] 75%|███████▍ | 7484/10000 [27:17:04<9:01:04, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.2650000000000001e-05, 'epoch': 2.82} 75%|███████▍ | 7484/10000 [27:17:04<9:01:04, 12.90s/it] 75%|███████▍ | 7485/10000 [27:17:17<9:00:45, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.2645000000000002e-05, 'epoch': 2.82} 75%|███████▍ | 7485/10000 
[27:17:17<9:00:45, 12.90s/it] 75%|███████▍ | 7486/10000 [27:17:29<9:00:29, 12.90s/it] {'loss': 0.0064, 'learning_rate': 1.2640000000000003e-05, 'epoch': 2.82} 75%|███████▍ | 7486/10000 [27:17:29<9:00:29, 12.90s/it] 75%|███████▍ | 7487/10000 [27:17:42<9:00:31, 12.91s/it] {'loss': 0.005, 'learning_rate': 1.2635e-05, 'epoch': 2.82} 75%|███████▍ | 7487/10000 [27:17:42<9:00:31, 12.91s/it] 75%|███████▍ | 7488/10000 [27:17:55<9:00:25, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.263e-05, 'epoch': 2.82} 75%|███████▍ | 7488/10000 [27:17:55<9:00:25, 12.91s/it] 75%|███████▍ | 7489/10000 [27:18:08<9:01:50, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.2625e-05, 'epoch': 2.82} 75%|███████▍ | 7489/10000 [27:18:08<9:01:50, 12.95s/it] 75%|███████▍ | 7490/10000 [27:18:21<9:01:03, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.2620000000000001e-05, 'epoch': 2.82} 75%|███████▍ | 7490/10000 [27:18:21<9:01:03, 12.93s/it] 75%|███████▍ | 7491/10000 [27:18:34<9:01:37, 12.95s/it] {'loss': 0.0044, 'learning_rate': 1.2615000000000002e-05, 'epoch': 2.82} 75%|███████▍ | 7491/10000 [27:18:34<9:01:37, 12.95s/it] 75%|███████▍ | 7492/10000 [27:18:47<9:00:41, 12.94s/it] {'loss': 0.0036, 'learning_rate': 1.261e-05, 'epoch': 2.82} 75%|███████▍ | 7492/10000 [27:18:47<9:00:41, 12.94s/it] 75%|███████▍ | 7493/10000 [27:19:00<9:00:30, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.2605e-05, 'epoch': 2.82} 75%|███████▍ | 7493/10000 [27:19:00<9:00:30, 12.94s/it] 75%|███████▍ | 7494/10000 [27:19:13<9:00:06, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.2600000000000001e-05, 'epoch': 2.82} 75%|███████▍ | 7494/10000 [27:19:13<9:00:06, 12.93s/it] 75%|███████▍ | 7495/10000 [27:19:26<8:58:47, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.2595e-05, 'epoch': 2.82} 75%|███████▍ | 7495/10000 [27:19:26<8:58:47, 12.91s/it] 75%|███████▍ | 7496/10000 [27:19:39<8:59:00, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.2590000000000001e-05, 'epoch': 2.82} 75%|███████▍ | 7496/10000 [27:19:39<8:59:00, 12.92s/it] 
75%|███████▍ | 7497/10000 [27:19:52<8:58:01, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.2584999999999999e-05, 'epoch': 2.82} 75%|███████▍ | 7497/10000 [27:19:52<8:58:01, 12.90s/it] 75%|███████▍ | 7498/10000 [27:20:05<8:57:58, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.258e-05, 'epoch': 2.83} 75%|███████▍ | 7498/10000 [27:20:05<8:57:58, 12.90s/it] 75%|███████▍ | 7499/10000 [27:20:17<8:57:46, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.2575e-05, 'epoch': 2.83} 75%|███████▍ | 7499/10000 [27:20:17<8:57:46, 12.90s/it] 75%|███████▌ | 7500/10000 [27:20:30<8:57:11, 12.89s/it] {'loss': 0.0046, 'learning_rate': 1.2570000000000002e-05, 'epoch': 2.83} 75%|███████▌ | 7500/10000 [27:20:30<8:57:11, 12.89s/it] 75%|███████▌ | 7501/10000 [27:20:43<8:57:14, 12.90s/it] {'loss': 0.0036, 'learning_rate': 1.2565000000000003e-05, 'epoch': 2.83} 75%|███████▌ | 7501/10000 [27:20:43<8:57:14, 12.90s/it] 75%|███████▌ | 7502/10000 [27:20:56<8:57:29, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.256e-05, 'epoch': 2.83} 75%|███████▌ | 7502/10000 [27:20:56<8:57:29, 12.91s/it] 75%|███████▌ | 7503/10000 [27:21:09<8:58:35, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.2555000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7503/10000 [27:21:09<8:58:35, 12.94s/it] 75%|███████▌ | 7504/10000 [27:21:22<8:57:54, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.255e-05, 'epoch': 2.83} 75%|███████▌ | 7504/10000 [27:21:22<8:57:54, 12.93s/it] 75%|███████▌ | 7505/10000 [27:21:35<8:58:02, 12.94s/it] {'loss': 0.0054, 'learning_rate': 1.2545000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7505/10000 [27:21:35<8:58:02, 12.94s/it] 75%|███████▌ | 7506/10000 [27:21:48<8:58:26, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.2540000000000002e-05, 'epoch': 2.83} 75%|███████▌ | 7506/10000 [27:21:48<8:58:26, 12.95s/it] 75%|███████▌ | 7507/10000 [27:22:01<8:57:49, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.2535e-05, 'epoch': 2.83} 75%|███████▌ | 7507/10000 [27:22:01<8:57:49, 12.94s/it] 75%|███████▌ | 7508/10000 
[27:22:14<8:56:27, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.253e-05, 'epoch': 2.83} 75%|███████▌ | 7508/10000 [27:22:14<8:56:27, 12.92s/it] 75%|███████▌ | 7509/10000 [27:22:27<8:56:56, 12.93s/it] {'loss': 0.0036, 'learning_rate': 1.2525000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7509/10000 [27:22:27<8:56:56, 12.93s/it] 75%|███████▌ | 7510/10000 [27:22:40<8:56:43, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.252e-05, 'epoch': 2.83} 75%|███████▌ | 7510/10000 [27:22:40<8:56:43, 12.93s/it] 75%|███████▌ | 7511/10000 [27:22:53<8:57:14, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.2515000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7511/10000 [27:22:53<8:57:14, 12.95s/it] 75%|███████▌ | 7512/10000 [27:23:06<8:57:36, 12.96s/it] {'loss': 0.004, 'learning_rate': 1.2509999999999999e-05, 'epoch': 2.83} 75%|███████▌ | 7512/10000 [27:23:06<8:57:36, 12.96s/it] 75%|███████▌ | 7513/10000 [27:23:19<8:56:35, 12.95s/it] {'loss': 0.0037, 'learning_rate': 1.2505e-05, 'epoch': 2.83} 75%|███████▌ | 7513/10000 [27:23:19<8:56:35, 12.95s/it] 75%|███████▌ | 7514/10000 [27:23:32<8:56:07, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.25e-05, 'epoch': 2.83} 75%|███████▌ | 7514/10000 [27:23:32<8:56:07, 12.94s/it] 75%|███████▌ | 7515/10000 [27:23:44<8:56:24, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.2495000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7515/10000 [27:23:45<8:56:24, 12.95s/it] 75%|███████▌ | 7516/10000 [27:23:57<8:56:05, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.249e-05, 'epoch': 2.83} 75%|███████▌ | 7516/10000 [27:23:57<8:56:05, 12.95s/it] 75%|███████▌ | 7517/10000 [27:24:10<8:54:50, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.2485000000000002e-05, 'epoch': 2.83} 75%|███████▌ | 7517/10000 [27:24:10<8:54:50, 12.92s/it] 75%|███████▌ | 7518/10000 [27:24:23<8:53:56, 12.91s/it] {'loss': 0.0057, 'learning_rate': 1.248e-05, 'epoch': 2.83} 75%|███████▌ | 7518/10000 [27:24:23<8:53:56, 12.91s/it] 75%|███████▌ | 7519/10000 [27:24:36<8:54:30, 12.93s/it] {'loss': 
0.0045, 'learning_rate': 1.2475e-05, 'epoch': 2.83} 75%|███████▌ | 7519/10000 [27:24:36<8:54:30, 12.93s/it] 75%|███████▌ | 7520/10000 [27:24:49<8:53:33, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.2470000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7520/10000 [27:24:49<8:53:33, 12.91s/it] 75%|███████▌ | 7521/10000 [27:25:02<8:53:46, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.2465e-05, 'epoch': 2.83} 75%|███████▌ | 7521/10000 [27:25:02<8:53:46, 12.92s/it] 75%|███████▌ | 7522/10000 [27:25:15<8:54:24, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.2460000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7522/10000 [27:25:15<8:54:24, 12.94s/it] 75%|███████▌ | 7523/10000 [27:25:28<8:54:44, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.2455e-05, 'epoch': 2.83} 75%|███████▌ | 7523/10000 [27:25:28<8:54:44, 12.95s/it] 75%|███████▌ | 7524/10000 [27:25:41<8:54:24, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.2450000000000001e-05, 'epoch': 2.83} 75%|███████▌ | 7524/10000 [27:25:41<8:54:24, 12.95s/it] 75%|███████▌ | 7525/10000 [27:25:54<8:54:52, 12.97s/it] {'loss': 0.0041, 'learning_rate': 1.2445e-05, 'epoch': 2.84} 75%|███████▌ | 7525/10000 [27:25:54<8:54:52, 12.97s/it] 75%|███████▌ | 7526/10000 [27:26:07<8:55:04, 12.98s/it] {'loss': 0.0046, 'learning_rate': 1.244e-05, 'epoch': 2.84} 75%|███████▌ | 7526/10000 [27:26:07<8:55:04, 12.98s/it] 75%|███████▌ | 7527/10000 [27:26:20<8:55:20, 12.99s/it] {'loss': 0.0041, 'learning_rate': 1.2435e-05, 'epoch': 2.84} 75%|███████▌ | 7527/10000 [27:26:20<8:55:20, 12.99s/it] 75%|███████▌ | 7528/10000 [27:26:33<8:53:43, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.243e-05, 'epoch': 2.84} 75%|███████▌ | 7528/10000 [27:26:33<8:53:43, 12.95s/it] 75%|███████▌ | 7529/10000 [27:26:46<8:53:24, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.2425e-05, 'epoch': 2.84} 75%|███████▌ | 7529/10000 [27:26:46<8:53:24, 12.95s/it] 75%|███████▌ | 7530/10000 [27:26:59<8:52:31, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.2420000000000001e-05, 'epoch': 2.84} 
75%|███████▌ | 7530/10000 [27:26:59<8:52:31, 12.94s/it] 75%|███████▌ | 7531/10000 [27:27:12<8:52:58, 12.95s/it] {'loss': 0.0057, 'learning_rate': 1.2415e-05, 'epoch': 2.84} 75%|███████▌ | 7531/10000 [27:27:12<8:52:58, 12.95s/it] 75%|███████▌ | 7532/10000 [27:27:25<8:53:36, 12.97s/it] {'loss': 0.0055, 'learning_rate': 1.2410000000000001e-05, 'epoch': 2.84} 75%|███████▌ | 7532/10000 [27:27:25<8:53:36, 12.97s/it] 75%|███████▌ | 7533/10000 [27:27:38<8:52:51, 12.96s/it] {'loss': 0.0056, 'learning_rate': 1.2405e-05, 'epoch': 2.84} 75%|███████▌ | 7533/10000 [27:27:38<8:52:51, 12.96s/it] 75%|███████▌ | 7534/10000 [27:27:50<8:51:43, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.24e-05, 'epoch': 2.84} 75%|███████▌ | 7534/10000 [27:27:50<8:51:43, 12.94s/it] 75%|███████▌ | 7535/10000 [27:28:03<8:51:58, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.2395e-05, 'epoch': 2.84} 75%|███████▌ | 7535/10000 [27:28:03<8:51:58, 12.95s/it] 75%|███████▌ | 7536/10000 [27:28:16<8:52:29, 12.97s/it] {'loss': 0.0051, 'learning_rate': 1.239e-05, 'epoch': 2.84} 75%|███████▌ | 7536/10000 [27:28:16<8:52:29, 12.97s/it] 75%|███████▌ | 7537/10000 [27:28:29<8:52:19, 12.97s/it] {'loss': 0.0041, 'learning_rate': 1.2385000000000001e-05, 'epoch': 2.84} 75%|███████▌ | 7537/10000 [27:28:29<8:52:19, 12.97s/it] 75%|███████▌ | 7538/10000 [27:28:42<8:52:16, 12.97s/it] {'loss': 0.0038, 'learning_rate': 1.238e-05, 'epoch': 2.84} 75%|███████▌ | 7538/10000 [27:28:42<8:52:16, 12.97s/it] 75%|███████▌ | 7539/10000 [27:28:55<8:52:26, 12.98s/it] {'loss': 0.0041, 'learning_rate': 1.2375000000000001e-05, 'epoch': 2.84} 75%|███████▌ | 7539/10000 [27:28:55<8:52:26, 12.98s/it] 75%|███████▌ | 7540/10000 [27:29:08<8:50:58, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.2370000000000002e-05, 'epoch': 2.84} 75%|███████▌ | 7540/10000 [27:29:08<8:50:58, 12.95s/it] 75%|███████▌ | 7541/10000 [27:29:21<8:50:27, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.2365e-05, 'epoch': 2.84} 75%|███████▌ | 7541/10000 [27:29:21<8:50:27, 
12.94s/it] 75%|███████▌ | 7542/10000 [27:29:34<8:50:33, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.236e-05, 'epoch': 2.84} 75%|███████▌ | 7542/10000 [27:29:34<8:50:33, 12.95s/it] 75%|███████▌ | 7543/10000 [27:29:47<8:50:24, 12.95s/it] {'loss': 0.0029, 'learning_rate': 1.2355e-05, 'epoch': 2.84} 75%|███████▌ | 7543/10000 [27:29:47<8:50:24, 12.95s/it] 75%|███████▌ | 7544/10000 [27:30:00<8:50:56, 12.97s/it] {'loss': 0.0051, 'learning_rate': 1.235e-05, 'epoch': 2.84} 75%|███████▌ | 7544/10000 [27:30:00<8:50:56, 12.97s/it] 75%|███████▌ | 7545/10000 [27:30:13<8:49:35, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.2345000000000001e-05, 'epoch': 2.84} 75%|███████▌ | 7545/10000 [27:30:13<8:49:35, 12.94s/it] 75%|███████▌ | 7546/10000 [27:30:26<8:48:25, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.234e-05, 'epoch': 2.84} 75%|███████▌ | 7546/10000 [27:30:26<8:48:25, 12.92s/it] 75%|███████▌ | 7547/10000 [27:30:39<8:48:44, 12.93s/it] {'loss': 0.0036, 'learning_rate': 1.2335000000000001e-05, 'epoch': 2.84} 75%|███████▌ | 7547/10000 [27:30:39<8:48:44, 12.93s/it] 75%|███████▌ | 7548/10000 [27:30:52<8:48:05, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.233e-05, 'epoch': 2.84} 75%|███████▌ | 7548/10000 [27:30:52<8:48:05, 12.92s/it] 75%|███████▌ | 7549/10000 [27:31:05<8:46:10, 12.88s/it] {'loss': 0.0053, 'learning_rate': 1.2325e-05, 'epoch': 2.84} 75%|███████▌ | 7549/10000 [27:31:05<8:46:10, 12.88s/it] 76%|███████▌ | 7550/10000 [27:31:17<8:46:36, 12.90s/it] {'loss': 0.0055, 'learning_rate': 1.232e-05, 'epoch': 2.84} 76%|███████▌ | 7550/10000 [27:31:17<8:46:36, 12.90s/it] 76%|███████▌ | 7551/10000 [27:31:30<8:47:41, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.2315e-05, 'epoch': 2.85} 76%|███████▌ | 7551/10000 [27:31:31<8:47:41, 12.93s/it] 76%|███████▌ | 7552/10000 [27:31:43<8:48:48, 12.96s/it] {'loss': 0.0037, 'learning_rate': 1.231e-05, 'epoch': 2.85} 76%|███████▌ | 7552/10000 [27:31:44<8:48:48, 12.96s/it] 76%|███████▌ | 7553/10000 [27:31:56<8:47:10, 12.93s/it] {'loss': 
0.0043, 'learning_rate': 1.2305000000000002e-05, 'epoch': 2.85} 76%|███████▌ | 7553/10000 [27:31:56<8:47:10, 12.93s/it] 76%|███████▌ | 7554/10000 [27:32:09<8:47:11, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.23e-05, 'epoch': 2.85} 76%|███████▌ | 7554/10000 [27:32:09<8:47:11, 12.93s/it] 76%|███████▌ | 7555/10000 [27:32:22<8:46:59, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.2295000000000002e-05, 'epoch': 2.85} 76%|███████▌ | 7555/10000 [27:32:22<8:46:59, 12.93s/it] 76%|███████▌ | 7556/10000 [27:32:35<8:46:48, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.2290000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7556/10000 [27:32:35<8:46:48, 12.93s/it] 76%|███████▌ | 7557/10000 [27:32:48<8:46:46, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.2285e-05, 'epoch': 2.85} 76%|███████▌ | 7557/10000 [27:32:48<8:46:46, 12.94s/it] 76%|███████▌ | 7558/10000 [27:33:01<8:46:50, 12.94s/it] {'loss': 0.0057, 'learning_rate': 1.2280000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7558/10000 [27:33:01<8:46:50, 12.94s/it] 76%|███████▌ | 7559/10000 [27:33:14<8:46:50, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.2275e-05, 'epoch': 2.85} 76%|███████▌ | 7559/10000 [27:33:14<8:46:50, 12.95s/it] 76%|███████▌ | 7560/10000 [27:33:27<8:47:14, 12.97s/it] {'loss': 0.0048, 'learning_rate': 1.2270000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7560/10000 [27:33:27<8:47:14, 12.97s/it] 76%|███████▌ | 7561/10000 [27:33:40<8:46:35, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.2265e-05, 'epoch': 2.85} 76%|███████▌ | 7561/10000 [27:33:40<8:46:35, 12.95s/it] 76%|███████▌ | 7562/10000 [27:33:53<8:45:18, 12.93s/it] {'loss': 0.0037, 'learning_rate': 1.2260000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7562/10000 [27:33:53<8:45:18, 12.93s/it] 76%|███████▌ | 7563/10000 [27:34:06<8:46:13, 12.96s/it] {'loss': 0.0046, 'learning_rate': 1.2255e-05, 'epoch': 2.85} 76%|███████▌ | 7563/10000 [27:34:06<8:46:13, 12.96s/it] 76%|███████▌ | 7564/10000 [27:34:19<8:45:55, 12.95s/it] {'loss': 0.0052, 'learning_rate': 
1.225e-05, 'epoch': 2.85} 76%|███████▌ | 7564/10000 [27:34:19<8:45:55, 12.95s/it] 76%|███████▌ | 7565/10000 [27:34:32<8:46:06, 12.96s/it] {'loss': 0.0036, 'learning_rate': 1.2245e-05, 'epoch': 2.85} 76%|███████▌ | 7565/10000 [27:34:32<8:46:06, 12.96s/it] 76%|███████▌ | 7566/10000 [27:34:45<8:45:25, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.224e-05, 'epoch': 2.85} 76%|███████▌ | 7566/10000 [27:34:45<8:45:25, 12.95s/it] 76%|███████▌ | 7567/10000 [27:34:58<8:44:23, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.2235e-05, 'epoch': 2.85} 76%|███████▌ | 7567/10000 [27:34:58<8:44:23, 12.93s/it] 76%|███████▌ | 7568/10000 [27:35:10<8:43:31, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.2230000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7568/10000 [27:35:10<8:43:31, 12.92s/it] 76%|███████▌ | 7569/10000 [27:35:23<8:43:10, 12.91s/it] {'loss': 0.0053, 'learning_rate': 1.2225e-05, 'epoch': 2.85} 76%|███████▌ | 7569/10000 [27:35:23<8:43:10, 12.91s/it] 76%|███████▌ | 7570/10000 [27:35:36<8:43:33, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.2220000000000002e-05, 'epoch': 2.85} 76%|███████▌ | 7570/10000 [27:35:36<8:43:33, 12.93s/it] 76%|███████▌ | 7571/10000 [27:35:49<8:42:32, 12.91s/it] {'loss': 0.0028, 'learning_rate': 1.2215e-05, 'epoch': 2.85} 76%|███████▌ | 7571/10000 [27:35:49<8:42:32, 12.91s/it] 76%|███████▌ | 7572/10000 [27:36:02<8:43:39, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.221e-05, 'epoch': 2.85} 76%|███████▌ | 7572/10000 [27:36:02<8:43:39, 12.94s/it] 76%|███████▌ | 7573/10000 [27:36:15<8:44:24, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.2205000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7573/10000 [27:36:15<8:44:24, 12.96s/it] 76%|███████▌ | 7574/10000 [27:36:28<8:43:21, 12.94s/it] {'loss': 0.0051, 'learning_rate': 1.22e-05, 'epoch': 2.85} 76%|███████▌ | 7574/10000 [27:36:28<8:43:21, 12.94s/it] 76%|███████▌ | 7575/10000 [27:36:41<8:42:50, 12.94s/it] {'loss': 0.0053, 'learning_rate': 1.2195000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7575/10000 
[27:36:41<8:42:50, 12.94s/it] 76%|███████▌ | 7576/10000 [27:36:54<8:42:47, 12.94s/it] {'loss': 0.0057, 'learning_rate': 1.219e-05, 'epoch': 2.85} 76%|███████▌ | 7576/10000 [27:36:54<8:42:47, 12.94s/it] 76%|███████▌ | 7577/10000 [27:37:07<8:42:29, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.2185000000000001e-05, 'epoch': 2.85} 76%|███████▌ | 7577/10000 [27:37:07<8:42:29, 12.94s/it] 76%|███████▌ | 7578/10000 [27:37:20<8:42:32, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.2180000000000002e-05, 'epoch': 2.86} 76%|███████▌ | 7578/10000 [27:37:20<8:42:32, 12.95s/it] 76%|███████▌ | 7579/10000 [27:37:33<8:41:32, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.2175e-05, 'epoch': 2.86} 76%|███████▌ | 7579/10000 [27:37:33<8:41:32, 12.93s/it] 76%|███████▌ | 7580/10000 [27:37:46<8:40:58, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.217e-05, 'epoch': 2.86} 76%|███████▌ | 7580/10000 [27:37:46<8:40:58, 12.92s/it] 76%|███████▌ | 7581/10000 [27:37:59<8:41:20, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.2165e-05, 'epoch': 2.86} 76%|███████▌ | 7581/10000 [27:37:59<8:41:20, 12.93s/it] 76%|███████▌ | 7582/10000 [27:38:12<8:40:45, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.216e-05, 'epoch': 2.86} 76%|███████▌ | 7582/10000 [27:38:12<8:40:45, 12.92s/it] 76%|███████▌ | 7583/10000 [27:38:24<8:40:06, 12.91s/it] {'loss': 0.0054, 'learning_rate': 1.2155000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7583/10000 [27:38:24<8:40:06, 12.91s/it] 76%|███████▌ | 7584/10000 [27:38:37<8:40:01, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.215e-05, 'epoch': 2.86} 76%|███████▌ | 7584/10000 [27:38:37<8:40:01, 12.91s/it] 76%|███████▌ | 7585/10000 [27:38:50<8:39:15, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.2145000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7585/10000 [27:38:50<8:39:15, 12.90s/it] 76%|███████▌ | 7586/10000 [27:39:03<8:40:11, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.214e-05, 'epoch': 2.86} 76%|███████▌ | 7586/10000 [27:39:03<8:40:11, 12.93s/it] 76%|███████▌ | 
7587/10000 [27:39:16<8:40:21, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.2135e-05, 'epoch': 2.86} 76%|███████▌ | 7587/10000 [27:39:16<8:40:21, 12.94s/it] 76%|███████▌ | 7588/10000 [27:39:29<8:40:29, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.213e-05, 'epoch': 2.86} 76%|███████▌ | 7588/10000 [27:39:29<8:40:29, 12.95s/it] 76%|███████▌ | 7589/10000 [27:39:42<8:40:44, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.2125e-05, 'epoch': 2.86} 76%|███████▌ | 7589/10000 [27:39:42<8:40:44, 12.96s/it] 76%|███████▌ | 7590/10000 [27:39:55<8:39:22, 12.93s/it] {'loss': 0.0038, 'learning_rate': 1.2120000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7590/10000 [27:39:55<8:39:22, 12.93s/it] 76%|███████▌ | 7591/10000 [27:40:08<8:38:53, 12.92s/it] {'loss': 0.0052, 'learning_rate': 1.2115e-05, 'epoch': 2.86} 76%|███████▌ | 7591/10000 [27:40:08<8:38:53, 12.92s/it] 76%|███████▌ | 7592/10000 [27:40:21<8:38:47, 12.93s/it] {'loss': 0.0039, 'learning_rate': 1.2110000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7592/10000 [27:40:21<8:38:47, 12.93s/it] 76%|███████▌ | 7593/10000 [27:40:34<8:38:31, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.2105000000000002e-05, 'epoch': 2.86} 76%|███████▌ | 7593/10000 [27:40:34<8:38:31, 12.93s/it] 76%|███████▌ | 7594/10000 [27:40:47<8:37:34, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.2100000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7594/10000 [27:40:47<8:37:34, 12.91s/it] 76%|███████▌ | 7595/10000 [27:41:00<8:38:27, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.2095e-05, 'epoch': 2.86} 76%|███████▌ | 7595/10000 [27:41:00<8:38:27, 12.93s/it] 76%|███████▌ | 7596/10000 [27:41:13<8:37:59, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.209e-05, 'epoch': 2.86} 76%|███████▌ | 7596/10000 [27:41:13<8:37:59, 12.93s/it] 76%|███████▌ | 7597/10000 [27:41:25<8:38:14, 12.94s/it] {'loss': 0.0035, 'learning_rate': 1.2085e-05, 'epoch': 2.86} 76%|███████▌ | 7597/10000 [27:41:26<8:38:14, 12.94s/it] 76%|███████▌ | 7598/10000 [27:41:38<8:37:32, 12.93s/it] {'loss': 
0.0051, 'learning_rate': 1.2080000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7598/10000 [27:41:38<8:37:32, 12.93s/it] 76%|███████▌ | 7599/10000 [27:41:51<8:38:07, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.2075e-05, 'epoch': 2.86} 76%|███████▌ | 7599/10000 [27:41:51<8:38:07, 12.95s/it] 76%|███████▌ | 7600/10000 [27:42:04<8:38:11, 12.95s/it] {'loss': 0.0049, 'learning_rate': 1.2070000000000001e-05, 'epoch': 2.86} 76%|███████▌ | 7600/10000 [27:42:04<8:38:11, 12.95s/it] 76%|███████▌ | 7601/10000 [27:42:17<8:38:05, 12.96s/it] {'loss': 0.0051, 'learning_rate': 1.2065e-05, 'epoch': 2.86} 76%|███████▌ | 7601/10000 [27:42:17<8:38:05, 12.96s/it] 76%|███████▌ | 7602/10000 [27:42:30<8:36:38, 12.93s/it] {'loss': 0.0057, 'learning_rate': 1.206e-05, 'epoch': 2.86} 76%|███████▌ | 7602/10000 [27:42:30<8:36:38, 12.93s/it] 76%|███████▌ | 7603/10000 [27:42:43<8:37:05, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.2055e-05, 'epoch': 2.86} 76%|███████▌ | 7603/10000 [27:42:43<8:37:05, 12.94s/it] 76%|███████▌ | 7604/10000 [27:42:56<8:37:00, 12.95s/it] {'loss': 0.005, 'learning_rate': 1.205e-05, 'epoch': 2.87} 76%|███████▌ | 7604/10000 [27:42:56<8:37:00, 12.95s/it] 76%|███████▌ | 7605/10000 [27:43:09<8:37:22, 12.96s/it] {'loss': 0.0053, 'learning_rate': 1.2045e-05, 'epoch': 2.87} 76%|███████▌ | 7605/10000 [27:43:09<8:37:22, 12.96s/it] 76%|███████▌ | 7606/10000 [27:43:22<8:36:38, 12.95s/it] {'loss': 0.0053, 'learning_rate': 1.204e-05, 'epoch': 2.87} 76%|███████▌ | 7606/10000 [27:43:22<8:36:38, 12.95s/it] 76%|███████▌ | 7607/10000 [27:43:35<8:36:27, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.2035e-05, 'epoch': 2.87} 76%|███████▌ | 7607/10000 [27:43:35<8:36:27, 12.95s/it] 76%|███████▌ | 7608/10000 [27:43:48<8:36:56, 12.97s/it] {'loss': 0.0039, 'learning_rate': 1.2030000000000002e-05, 'epoch': 2.87} 76%|███████▌ | 7608/10000 [27:43:48<8:36:56, 12.97s/it] 76%|███████▌ | 7609/10000 [27:44:01<8:35:51, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.2025000000000001e-05, 'epoch': 2.87} 
76%|███████▌ | 7609/10000 [27:44:01<8:35:51, 12.95s/it] 76%|███████▌ | 7610/10000 [27:44:14<8:36:15, 12.96s/it] {'loss': 0.0039, 'learning_rate': 1.202e-05, 'epoch': 2.87} 76%|███████▌ | 7610/10000 [27:44:14<8:36:15, 12.96s/it] 76%|███████▌ | 7611/10000 [27:44:27<8:36:42, 12.98s/it] {'loss': 0.0045, 'learning_rate': 1.2015000000000001e-05, 'epoch': 2.87} 76%|███████▌ | 7611/10000 [27:44:27<8:36:42, 12.98s/it] 76%|███████▌ | 7612/10000 [27:44:40<8:37:09, 12.99s/it] {'loss': 0.0045, 'learning_rate': 1.201e-05, 'epoch': 2.87} 76%|███████▌ | 7612/10000 [27:44:40<8:37:09, 12.99s/it] 76%|███████▌ | 7613/10000 [27:44:53<8:35:57, 12.97s/it] {'loss': 0.0046, 'learning_rate': 1.2005000000000001e-05, 'epoch': 2.87} 76%|███████▌ | 7613/10000 [27:44:53<8:35:57, 12.97s/it] 76%|███████▌ | 7614/10000 [27:45:06<8:34:57, 12.95s/it] {'loss': 0.0035, 'learning_rate': 1.2e-05, 'epoch': 2.87} 76%|███████▌ | 7614/10000 [27:45:06<8:34:57, 12.95s/it] 76%|███████▌ | 7615/10000 [27:45:19<8:36:30, 12.99s/it] {'loss': 0.0054, 'learning_rate': 1.1995000000000001e-05, 'epoch': 2.87} 76%|███████▌ | 7615/10000 [27:45:19<8:36:30, 12.99s/it] 76%|███████▌ | 7616/10000 [27:45:32<8:36:19, 12.99s/it] {'loss': 0.0054, 'learning_rate': 1.199e-05, 'epoch': 2.87} 76%|███████▌ | 7616/10000 [27:45:32<8:36:19, 12.99s/it] 76%|███████▌ | 7617/10000 [27:45:45<8:35:15, 12.97s/it] {'loss': 0.0047, 'learning_rate': 1.1985e-05, 'epoch': 2.87} 76%|███████▌ | 7617/10000 [27:45:45<8:35:15, 12.97s/it] 76%|███████▌ | 7618/10000 [27:45:58<8:35:15, 12.98s/it] {'loss': 0.0034, 'learning_rate': 1.198e-05, 'epoch': 2.87} 76%|███████▌ | 7618/10000 [27:45:58<8:35:15, 12.98s/it] 76%|███████▌ | 7619/10000 [27:46:11<8:35:07, 12.98s/it] {'loss': 0.0047, 'learning_rate': 1.1975e-05, 'epoch': 2.87} 76%|███████▌ | 7619/10000 [27:46:11<8:35:07, 12.98s/it] 76%|███████▌ | 7620/10000 [27:46:24<8:35:23, 12.99s/it] {'loss': 0.0046, 'learning_rate': 1.197e-05, 'epoch': 2.87} 76%|███████▌ | 7620/10000 [27:46:24<8:35:23, 12.99s/it] 76%|███████▌ 
| 7621/10000 [27:46:37<8:33:55, 12.96s/it] {'loss': 0.0045, 'learning_rate': 1.1965000000000001e-05, 'epoch': 2.87} 76%|███████▌ | 7621/10000 [27:46:37<8:33:55, 12.96s/it] 76%|███████▌ | 7622/10000 [27:46:50<8:33:53, 12.97s/it] {'loss': 0.0046, 'learning_rate': 1.196e-05, 'epoch': 2.87} 76%|███████▌ | 7622/10000 [27:46:50<8:33:53, 12.97s/it] 76%|███████▌ | 7623/10000 [27:47:03<8:34:41, 12.99s/it] {'loss': 0.0047, 'learning_rate': 1.1955000000000002e-05, 'epoch': 2.87} 76%|███████▌ | 7623/10000 [27:47:03<8:34:41, 12.99s/it] 76%|███████▌ | 7624/10000 [27:47:16<8:33:37, 12.97s/it] {'loss': 0.0066, 'learning_rate': 1.195e-05, 'epoch': 2.87} 76%|███████▌ | 7624/10000 [27:47:16<8:33:37, 12.97s/it] 76%|███████▋ | 7625/10000 [27:47:28<8:32:39, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.1945e-05, 'epoch': 2.87} 76%|███████▋ | 7625/10000 [27:47:29<8:32:39, 12.95s/it] 76%|███████▋ | 7626/10000 [27:47:41<8:32:16, 12.95s/it] {'loss': 0.0053, 'learning_rate': 1.1940000000000001e-05, 'epoch': 2.87} 76%|███████▋ | 7626/10000 [27:47:41<8:32:16, 12.95s/it] 76%|███████▋ | 7627/10000 [27:47:54<8:32:49, 12.97s/it] {'loss': 0.0049, 'learning_rate': 1.1935e-05, 'epoch': 2.87} 76%|███████▋ | 7627/10000 [27:47:54<8:32:49, 12.97s/it] 76%|███████▋ | 7628/10000 [27:48:07<8:32:38, 12.97s/it] {'loss': 0.0041, 'learning_rate': 1.1930000000000001e-05, 'epoch': 2.87} 76%|███████▋ | 7628/10000 [27:48:07<8:32:38, 12.97s/it] 76%|███████▋ | 7629/10000 [27:48:20<8:32:39, 12.97s/it] {'loss': 0.0044, 'learning_rate': 1.1925e-05, 'epoch': 2.87} 76%|███████▋ | 7629/10000 [27:48:20<8:32:39, 12.97s/it] 76%|███████▋ | 7630/10000 [27:48:33<8:32:05, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.1920000000000001e-05, 'epoch': 2.87} 76%|███████▋ | 7630/10000 [27:48:33<8:32:05, 12.96s/it] 76%|███████▋ | 7631/10000 [27:48:46<8:30:43, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.1915000000000002e-05, 'epoch': 2.88} 76%|███████▋ | 7631/10000 [27:48:46<8:30:43, 12.94s/it] 76%|███████▋ | 7632/10000 
[27:48:59<8:30:11, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.1910000000000001e-05, 'epoch': 2.88} 76%|███████▋ | 7632/10000 [27:48:59<8:30:11, 12.93s/it] 76%|███████▋ | 7633/10000 [27:49:12<8:30:36, 12.94s/it] {'loss': 0.0032, 'learning_rate': 1.1905e-05, 'epoch': 2.88} 76%|███████▋ | 7633/10000 [27:49:12<8:30:36, 12.94s/it] 76%|███████▋ | 7634/10000 [27:49:25<8:30:18, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.19e-05, 'epoch': 2.88} 76%|███████▋ | 7634/10000 [27:49:25<8:30:18, 12.94s/it] 76%|███████▋ | 7635/10000 [27:49:38<8:30:33, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.1895e-05, 'epoch': 2.88} 76%|███████▋ | 7635/10000 [27:49:38<8:30:33, 12.95s/it] 76%|███████▋ | 7636/10000 [27:49:51<8:30:28, 12.96s/it] {'loss': 0.0049, 'learning_rate': 1.1890000000000001e-05, 'epoch': 2.88} 76%|███████▋ | 7636/10000 [27:49:51<8:30:28, 12.96s/it] 76%|███████▋ | 7637/10000 [27:50:04<8:31:12, 12.98s/it] {'loss': 0.0041, 'learning_rate': 1.1885e-05, 'epoch': 2.88} 76%|███████▋ | 7637/10000 [27:50:04<8:31:12, 12.98s/it] 76%|███████▋ | 7638/10000 [27:50:17<8:32:16, 13.01s/it] {'loss': 0.0038, 'learning_rate': 1.1880000000000001e-05, 'epoch': 2.88} 76%|███████▋ | 7638/10000 [27:50:17<8:32:16, 13.01s/it] 76%|███████▋ | 7639/10000 [27:50:30<8:31:43, 13.00s/it] {'loss': 0.0041, 'learning_rate': 1.1875e-05, 'epoch': 2.88} 76%|███████▋ | 7639/10000 [27:50:30<8:31:43, 13.00s/it] 76%|███████▋ | 7640/10000 [27:50:43<8:31:43, 13.01s/it] {'loss': 0.0043, 'learning_rate': 1.187e-05, 'epoch': 2.88} 76%|███████▋ | 7640/10000 [27:50:43<8:31:43, 13.01s/it] 76%|███████▋ | 7641/10000 [27:50:56<8:31:22, 13.01s/it] {'loss': 0.0068, 'learning_rate': 1.1865e-05, 'epoch': 2.88} 76%|███████▋ | 7641/10000 [27:50:56<8:31:22, 13.01s/it] 76%|███████▋ | 7642/10000 [27:51:09<8:31:05, 13.00s/it] {'loss': 0.0035, 'learning_rate': 1.186e-05, 'epoch': 2.88} 76%|███████▋ | 7642/10000 [27:51:09<8:31:05, 13.00s/it] 76%|███████▋ | 7643/10000 [27:51:22<8:28:37, 12.95s/it] {'loss': 0.0055, 'learning_rate': 
1.1855e-05, 'epoch': 2.88} 76%|███████▋ | 7643/10000 [27:51:22<8:28:37, 12.95s/it] 76%|███████▋ | 7644/10000 [27:51:35<8:28:00, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.185e-05, 'epoch': 2.88} 76%|███████▋ | 7644/10000 [27:51:35<8:28:00, 12.94s/it] 76%|███████▋ | 7645/10000 [27:51:48<8:28:10, 12.95s/it] {'loss': 0.0038, 'learning_rate': 1.1845000000000001e-05, 'epoch': 2.88} 76%|███████▋ | 7645/10000 [27:51:48<8:28:10, 12.95s/it] 76%|███████▋ | 7646/10000 [27:52:01<8:27:32, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.1840000000000002e-05, 'epoch': 2.88} 76%|███████▋ | 7646/10000 [27:52:01<8:27:32, 12.94s/it] 76%|███████▋ | 7647/10000 [27:52:14<8:27:39, 12.95s/it] {'loss': 0.0047, 'learning_rate': 1.1835000000000001e-05, 'epoch': 2.88} 76%|███████▋ | 7647/10000 [27:52:14<8:27:39, 12.95s/it] 76%|███████▋ | 7648/10000 [27:52:27<8:26:55, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.183e-05, 'epoch': 2.88} 76%|███████▋ | 7648/10000 [27:52:27<8:26:55, 12.93s/it] 76%|███████▋ | 7649/10000 [27:52:39<8:26:35, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.1825e-05, 'epoch': 2.88} 76%|███████▋ | 7649/10000 [27:52:40<8:26:35, 12.93s/it] 76%|███████▋ | 7650/10000 [27:52:53<8:27:19, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.182e-05, 'epoch': 2.88} 76%|███████▋ | 7650/10000 [27:52:53<8:27:19, 12.95s/it] 77%|███████▋ | 7651/10000 [27:53:06<8:28:25, 12.99s/it] {'loss': 0.0032, 'learning_rate': 1.1815000000000001e-05, 'epoch': 2.88} 77%|███████▋ | 7651/10000 [27:53:06<8:28:25, 12.99s/it] 77%|███████▋ | 7652/10000 [27:53:19<8:27:51, 12.98s/it] {'loss': 0.0035, 'learning_rate': 1.181e-05, 'epoch': 2.88} 77%|███████▋ | 7652/10000 [27:53:19<8:27:51, 12.98s/it] 77%|███████▋ | 7653/10000 [27:53:31<8:27:16, 12.97s/it] {'loss': 0.0037, 'learning_rate': 1.1805000000000001e-05, 'epoch': 2.88} 77%|███████▋ | 7653/10000 [27:53:31<8:27:16, 12.97s/it] 77%|███████▋ | 7654/10000 [27:53:44<8:26:04, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.18e-05, 'epoch': 2.88} 77%|███████▋ | 
7654/10000 [27:53:44<8:26:04, 12.94s/it] 77%|███████▋ | 7655/10000 [27:53:57<8:25:07, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.1795e-05, 'epoch': 2.88} 77%|███████▋ | 7655/10000 [27:53:57<8:25:07, 12.92s/it] 77%|███████▋ | 7656/10000 [27:54:10<8:25:22, 12.94s/it] {'loss': 0.0058, 'learning_rate': 1.179e-05, 'epoch': 2.88} 77%|███████▋ | 7656/10000 [27:54:10<8:25:22, 12.94s/it] 77%|███████▋ | 7657/10000 [27:54:23<8:25:10, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.1785e-05, 'epoch': 2.89} 77%|███████▋ | 7657/10000 [27:54:23<8:25:10, 12.94s/it] 77%|███████▋ | 7658/10000 [27:54:36<8:23:59, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.178e-05, 'epoch': 2.89} 77%|███████▋ | 7658/10000 [27:54:36<8:23:59, 12.91s/it] 77%|███████▋ | 7659/10000 [27:54:49<8:24:00, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.1775e-05, 'epoch': 2.89} 77%|███████▋ | 7659/10000 [27:54:49<8:24:00, 12.92s/it] 77%|███████▋ | 7660/10000 [27:55:02<8:24:20, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.177e-05, 'epoch': 2.89} 77%|███████▋ | 7660/10000 [27:55:02<8:24:20, 12.93s/it] 77%|███████▋ | 7661/10000 [27:55:15<8:23:01, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.1765000000000002e-05, 'epoch': 2.89} 77%|███████▋ | 7661/10000 [27:55:15<8:23:01, 12.90s/it] 77%|███████▋ | 7662/10000 [27:55:28<8:24:45, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.1760000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7662/10000 [27:55:28<8:24:45, 12.95s/it] 77%|███████▋ | 7663/10000 [27:55:41<8:24:29, 12.95s/it] {'loss': 0.0041, 'learning_rate': 1.1755e-05, 'epoch': 2.89} 77%|███████▋ | 7663/10000 [27:55:41<8:24:29, 12.95s/it] 77%|███████▋ | 7664/10000 [27:55:54<8:25:11, 12.98s/it] {'loss': 0.0048, 'learning_rate': 1.175e-05, 'epoch': 2.89} 77%|███████▋ | 7664/10000 [27:55:54<8:25:11, 12.98s/it] 77%|███████▋ | 7665/10000 [27:56:07<8:25:28, 12.99s/it] {'loss': 0.0052, 'learning_rate': 1.1745e-05, 'epoch': 2.89} 77%|███████▋ | 7665/10000 [27:56:07<8:25:28, 12.99s/it] 77%|███████▋ | 7666/10000 
[27:56:20<8:24:33, 12.97s/it] {'loss': 0.0039, 'learning_rate': 1.1740000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7666/10000 [27:56:20<8:24:33, 12.97s/it] 77%|███████▋ | 7667/10000 [27:56:33<8:25:18, 13.00s/it] {'loss': 0.0039, 'learning_rate': 1.1735e-05, 'epoch': 2.89} 77%|███████▋ | 7667/10000 [27:56:33<8:25:18, 13.00s/it] 77%|███████▋ | 7668/10000 [27:56:46<8:23:07, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.1730000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7668/10000 [27:56:46<8:23:07, 12.94s/it] 77%|███████▋ | 7669/10000 [27:56:59<8:23:19, 12.96s/it] {'loss': 0.005, 'learning_rate': 1.1725e-05, 'epoch': 2.89} 77%|███████▋ | 7669/10000 [27:56:59<8:23:19, 12.96s/it] 77%|███████▋ | 7670/10000 [27:57:12<8:23:23, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.172e-05, 'epoch': 2.89} 77%|███████▋ | 7670/10000 [27:57:12<8:23:23, 12.96s/it] 77%|███████▋ | 7671/10000 [27:57:25<8:23:27, 12.97s/it] {'loss': 0.0042, 'learning_rate': 1.1715e-05, 'epoch': 2.89} 77%|███████▋ | 7671/10000 [27:57:25<8:23:27, 12.97s/it] 77%|███████▋ | 7672/10000 [27:57:37<8:22:50, 12.96s/it] {'loss': 0.0043, 'learning_rate': 1.171e-05, 'epoch': 2.89} 77%|███████▋ | 7672/10000 [27:57:38<8:22:50, 12.96s/it] 77%|███████▋ | 7673/10000 [27:57:50<8:22:12, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.1705e-05, 'epoch': 2.89} 77%|███████▋ | 7673/10000 [27:57:50<8:22:12, 12.95s/it] 77%|███████▋ | 7674/10000 [27:58:03<8:22:02, 12.95s/it] {'loss': 0.0047, 'learning_rate': 1.1700000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7674/10000 [27:58:03<8:22:02, 12.95s/it] 77%|███████▋ | 7675/10000 [27:58:16<8:21:07, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.1695e-05, 'epoch': 2.89} 77%|███████▋ | 7675/10000 [27:58:16<8:21:07, 12.93s/it] 77%|███████▋ | 7676/10000 [27:58:29<8:21:05, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.1690000000000002e-05, 'epoch': 2.89} 77%|███████▋ | 7676/10000 [27:58:29<8:21:05, 12.94s/it] 77%|███████▋ | 7677/10000 [27:58:42<8:21:17, 12.95s/it] {'loss': 0.0048, 
'learning_rate': 1.1685e-05, 'epoch': 2.89} 77%|███████▋ | 7677/10000 [27:58:42<8:21:17, 12.95s/it] 77%|███████▋ | 7678/10000 [27:58:55<8:21:28, 12.96s/it] {'loss': 0.005, 'learning_rate': 1.168e-05, 'epoch': 2.89} 77%|███████▋ | 7678/10000 [27:58:55<8:21:28, 12.96s/it] 77%|███████▋ | 7679/10000 [27:59:08<8:22:51, 13.00s/it] {'loss': 0.0038, 'learning_rate': 1.1675000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7679/10000 [27:59:08<8:22:51, 13.00s/it] 77%|███████▋ | 7680/10000 [27:59:21<8:21:53, 12.98s/it] {'loss': 0.0045, 'learning_rate': 1.167e-05, 'epoch': 2.89} 77%|███████▋ | 7680/10000 [27:59:21<8:21:53, 12.98s/it] 77%|███████▋ | 7681/10000 [27:59:34<8:20:38, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.1665000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7681/10000 [27:59:34<8:20:38, 12.95s/it] 77%|███████▋ | 7682/10000 [27:59:47<8:20:31, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.166e-05, 'epoch': 2.89} 77%|███████▋ | 7682/10000 [27:59:47<8:20:31, 12.96s/it] 77%|███████▋ | 7683/10000 [28:00:00<8:19:23, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.1655000000000001e-05, 'epoch': 2.89} 77%|███████▋ | 7683/10000 [28:00:00<8:19:23, 12.93s/it] 77%|███████▋ | 7684/10000 [28:00:13<8:18:48, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.1650000000000002e-05, 'epoch': 2.9} 77%|███████▋ | 7684/10000 [28:00:13<8:18:48, 12.92s/it] 77%|███████▋ | 7685/10000 [28:00:26<8:18:27, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.1645000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7685/10000 [28:00:26<8:18:27, 12.92s/it] 77%|███████▋ | 7686/10000 [28:00:39<8:18:44, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.164e-05, 'epoch': 2.9} 77%|███████▋ | 7686/10000 [28:00:39<8:18:44, 12.93s/it] 77%|███████▋ | 7687/10000 [28:00:52<8:18:40, 12.94s/it] {'loss': 0.0047, 'learning_rate': 1.1635e-05, 'epoch': 2.9} 77%|███████▋ | 7687/10000 [28:00:52<8:18:40, 12.94s/it] 77%|███████▋ | 7688/10000 [28:01:05<8:17:47, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.163e-05, 'epoch': 2.9} 
77%|███████▋ | 7688/10000 [28:01:05<8:17:47, 12.92s/it] 77%|███████▋ | 7689/10000 [28:01:18<8:18:22, 12.94s/it] {'loss': 0.0035, 'learning_rate': 1.1625000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7689/10000 [28:01:18<8:18:22, 12.94s/it] 77%|███████▋ | 7690/10000 [28:01:30<8:18:23, 12.95s/it] {'loss': 0.0054, 'learning_rate': 1.162e-05, 'epoch': 2.9} 77%|███████▋ | 7690/10000 [28:01:30<8:18:23, 12.95s/it] 77%|███████▋ | 7691/10000 [28:01:43<8:17:15, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.1615000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7691/10000 [28:01:43<8:17:15, 12.92s/it] 77%|███████▋ | 7692/10000 [28:01:56<8:17:22, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.161e-05, 'epoch': 2.9} 77%|███████▋ | 7692/10000 [28:01:56<8:17:22, 12.93s/it] 77%|███████▋ | 7693/10000 [28:02:09<8:16:04, 12.90s/it] {'loss': 0.0075, 'learning_rate': 1.1605e-05, 'epoch': 2.9} 77%|███████▋ | 7693/10000 [28:02:09<8:16:04, 12.90s/it] 77%|███████▋ | 7694/10000 [28:02:22<8:15:46, 12.90s/it] {'loss': 0.0054, 'learning_rate': 1.16e-05, 'epoch': 2.9} 77%|███████▋ | 7694/10000 [28:02:22<8:15:46, 12.90s/it] 77%|███████▋ | 7695/10000 [28:02:35<8:14:48, 12.88s/it] {'loss': 0.0044, 'learning_rate': 1.1595e-05, 'epoch': 2.9} 77%|███████▋ | 7695/10000 [28:02:35<8:14:48, 12.88s/it] 77%|███████▋ | 7696/10000 [28:02:48<8:14:51, 12.89s/it] {'loss': 0.0044, 'learning_rate': 1.159e-05, 'epoch': 2.9} 77%|███████▋ | 7696/10000 [28:02:48<8:14:51, 12.89s/it] 77%|███████▋ | 7697/10000 [28:03:01<8:15:19, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.1585e-05, 'epoch': 2.9} 77%|███████▋ | 7697/10000 [28:03:01<8:15:19, 12.90s/it] 77%|███████▋ | 7698/10000 [28:03:14<8:15:26, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.1580000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7698/10000 [28:03:14<8:15:26, 12.91s/it] 77%|███████▋ | 7699/10000 [28:03:27<8:15:35, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.1575000000000002e-05, 'epoch': 2.9} 77%|███████▋ | 7699/10000 [28:03:27<8:15:35, 12.92s/it] 
77%|███████▋ | 7700/10000 [28:03:39<8:15:01, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.1570000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7700/10000 [28:03:39<8:15:01, 12.91s/it] 77%|███████▋ | 7701/10000 [28:03:52<8:14:31, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.1565e-05, 'epoch': 2.9} 77%|███████▋ | 7701/10000 [28:03:52<8:14:31, 12.91s/it] 77%|███████▋ | 7702/10000 [28:04:05<8:15:00, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.156e-05, 'epoch': 2.9} 77%|███████▋ | 7702/10000 [28:04:05<8:15:00, 12.92s/it] 77%|███████▋ | 7703/10000 [28:04:18<8:14:44, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.1555e-05, 'epoch': 2.9} 77%|███████▋ | 7703/10000 [28:04:18<8:14:44, 12.92s/it] 77%|███████▋ | 7704/10000 [28:04:31<8:13:30, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.1550000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7704/10000 [28:04:31<8:13:30, 12.90s/it] 77%|███████▋ | 7705/10000 [28:04:44<8:14:43, 12.93s/it] {'loss': 0.0061, 'learning_rate': 1.1545e-05, 'epoch': 2.9} 77%|███████▋ | 7705/10000 [28:04:44<8:14:43, 12.93s/it] 77%|███████▋ | 7706/10000 [28:04:57<8:14:19, 12.93s/it] {'loss': 0.0055, 'learning_rate': 1.1540000000000001e-05, 'epoch': 2.9} 77%|███████▋ | 7706/10000 [28:04:57<8:14:19, 12.93s/it] 77%|███████▋ | 7707/10000 [28:05:10<8:16:03, 12.98s/it] {'loss': 0.0044, 'learning_rate': 1.1535e-05, 'epoch': 2.9} 77%|███████▋ | 7707/10000 [28:05:10<8:16:03, 12.98s/it] 77%|███████▋ | 7708/10000 [28:05:23<8:14:18, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.153e-05, 'epoch': 2.9} 77%|███████▋ | 7708/10000 [28:05:23<8:14:18, 12.94s/it] 77%|███████▋ | 7709/10000 [28:05:36<8:14:04, 12.94s/it] {'loss': 0.0061, 'learning_rate': 1.1525e-05, 'epoch': 2.9} 77%|███████▋ | 7709/10000 [28:05:36<8:14:04, 12.94s/it] 77%|███████▋ | 7710/10000 [28:05:49<8:14:44, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.152e-05, 'epoch': 2.91} 77%|███████▋ | 7710/10000 [28:05:49<8:14:44, 12.96s/it] 77%|███████▋ | 7711/10000 [28:06:02<8:15:10, 12.98s/it] {'loss': 0.0044, 
'learning_rate': 1.1515e-05, 'epoch': 2.91} 77%|███████▋ | 7711/10000 [28:06:02<8:15:10, 12.98s/it] 77%|███████▋ | 7712/10000 [28:06:15<8:14:58, 12.98s/it] {'loss': 0.0043, 'learning_rate': 1.151e-05, 'epoch': 2.91} 77%|███████▋ | 7712/10000 [28:06:15<8:14:58, 12.98s/it] 77%|███████▋ | 7713/10000 [28:06:28<8:14:55, 12.98s/it] {'loss': 0.0046, 'learning_rate': 1.1505e-05, 'epoch': 2.91} 77%|███████▋ | 7713/10000 [28:06:28<8:14:55, 12.98s/it] 77%|███████▋ | 7714/10000 [28:06:41<8:14:36, 12.98s/it] {'loss': 0.0048, 'learning_rate': 1.1500000000000002e-05, 'epoch': 2.91} 77%|███████▋ | 7714/10000 [28:06:41<8:14:36, 12.98s/it] 77%|███████▋ | 7715/10000 [28:06:54<8:14:53, 12.99s/it] {'loss': 0.0052, 'learning_rate': 1.1495000000000001e-05, 'epoch': 2.91} 77%|███████▋ | 7715/10000 [28:06:54<8:14:53, 12.99s/it] 77%|███████▋ | 7716/10000 [28:07:07<8:14:39, 12.99s/it] {'loss': 0.0045, 'learning_rate': 1.149e-05, 'epoch': 2.91} 77%|███████▋ | 7716/10000 [28:07:07<8:14:39, 12.99s/it] 77%|███████▋ | 7717/10000 [28:07:20<8:13:18, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.1485e-05, 'epoch': 2.91} 77%|███████▋ | 7717/10000 [28:07:20<8:13:18, 12.96s/it] 77%|███████▋ | 7718/10000 [28:07:33<8:13:10, 12.97s/it] {'loss': 0.0049, 'learning_rate': 1.148e-05, 'epoch': 2.91} 77%|███████▋ | 7718/10000 [28:07:33<8:13:10, 12.97s/it] 77%|███████▋ | 7719/10000 [28:07:46<8:13:02, 12.97s/it] {'loss': 0.0083, 'learning_rate': 1.1475000000000001e-05, 'epoch': 2.91} 77%|███████▋ | 7719/10000 [28:07:46<8:13:02, 12.97s/it] 77%|███████▋ | 7720/10000 [28:07:59<8:12:10, 12.95s/it] {'loss': 0.0048, 'learning_rate': 1.147e-05, 'epoch': 2.91} 77%|███████▋ | 7720/10000 [28:07:59<8:12:10, 12.95s/it] 77%|███████▋ | 7721/10000 [28:08:12<8:11:06, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.1465000000000001e-05, 'epoch': 2.91} 77%|███████▋ | 7721/10000 [28:08:12<8:11:06, 12.93s/it] 77%|███████▋ | 7722/10000 [28:08:24<8:11:05, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.146e-05, 'epoch': 2.91} 
77%|███████▋ | 7722/10000 [28:08:25<8:11:05, 12.93s/it] 77%|███████▋ | 7723/10000 [28:08:37<8:11:20, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.1455000000000001e-05, 'epoch': 2.91} 77%|███████▋ | 7723/10000 [28:08:37<8:11:20, 12.95s/it] 77%|███████▋ | 7724/10000 [28:08:50<8:10:53, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.145e-05, 'epoch': 2.91} 77%|███████▋ | 7724/10000 [28:08:50<8:10:53, 12.94s/it] 77%|███████▋ | 7725/10000 [28:09:03<8:10:43, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.1445e-05, 'epoch': 2.91} 77%|███████▋ | 7725/10000 [28:09:03<8:10:43, 12.94s/it] 77%|███████▋ | 7726/10000 [28:09:16<8:10:29, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.144e-05, 'epoch': 2.91} 77%|███████▋ | 7726/10000 [28:09:16<8:10:29, 12.94s/it] 77%|███████▋ | 7727/10000 [28:09:29<8:09:59, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.1435e-05, 'epoch': 2.91} 77%|███████▋ | 7727/10000 [28:09:29<8:09:59, 12.93s/it] 77%|███████▋ | 7728/10000 [28:09:42<8:09:38, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.143e-05, 'epoch': 2.91} 77%|███████▋ | 7728/10000 [28:09:42<8:09:38, 12.93s/it] 77%|███████▋ | 7729/10000 [28:09:55<8:09:10, 12.92s/it] {'loss': 0.0051, 'learning_rate': 1.1425000000000002e-05, 'epoch': 2.91} 77%|███████▋ | 7729/10000 [28:09:55<8:09:10, 12.92s/it] 77%|███████▋ | 7730/10000 [28:10:08<8:08:09, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.142e-05, 'epoch': 2.91} 77%|███████▋ | 7730/10000 [28:10:08<8:08:09, 12.90s/it] 77%|███████▋ | 7731/10000 [28:10:21<8:09:00, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.1415e-05, 'epoch': 2.91} 77%|███████▋ | 7731/10000 [28:10:21<8:09:00, 12.93s/it] 77%|███████▋ | 7732/10000 [28:10:34<8:08:50, 12.93s/it] {'loss': 0.0038, 'learning_rate': 1.141e-05, 'epoch': 2.91} 77%|███████▋ | 7732/10000 [28:10:34<8:08:50, 12.93s/it] 77%|███████▋ | 7733/10000 [28:10:47<8:08:46, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.1405e-05, 'epoch': 2.91} 77%|███████▋ | 7733/10000 [28:10:47<8:08:46, 12.94s/it] 77%|███████▋ | 
7734/10000 [28:11:00<8:08:50, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.1400000000000001e-05, 'epoch': 2.91} 77%|███████▋ | 7734/10000 [28:11:00<8:08:50, 12.94s/it] 77%|███████▋ | 7735/10000 [28:11:13<8:08:23, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.1395e-05, 'epoch': 2.91} 77%|███████▋ | 7735/10000 [28:11:13<8:08:23, 12.94s/it] 77%|███████▋ | 7736/10000 [28:11:26<8:07:30, 12.92s/it] {'loss': 0.0051, 'learning_rate': 1.1390000000000001e-05, 'epoch': 2.91} 77%|███████▋ | 7736/10000 [28:11:26<8:07:30, 12.92s/it] 77%|███████▋ | 7737/10000 [28:11:39<8:08:09, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.1385000000000002e-05, 'epoch': 2.92} 77%|███████▋ | 7737/10000 [28:11:39<8:08:09, 12.94s/it] 77%|███████▋ | 7738/10000 [28:11:51<8:08:15, 12.95s/it] {'loss': 0.0049, 'learning_rate': 1.1380000000000001e-05, 'epoch': 2.92} 77%|███████▋ | 7738/10000 [28:11:52<8:08:15, 12.95s/it] 77%|███████▋ | 7739/10000 [28:12:04<8:07:46, 12.94s/it] {'loss': 0.0031, 'learning_rate': 1.1375e-05, 'epoch': 2.92} 77%|███████▋ | 7739/10000 [28:12:04<8:07:46, 12.94s/it] 77%|███████▋ | 7740/10000 [28:12:17<8:07:05, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.137e-05, 'epoch': 2.92} 77%|███████▋ | 7740/10000 [28:12:17<8:07:05, 12.93s/it] 77%|███████▋ | 7741/10000 [28:12:30<8:07:03, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.1365e-05, 'epoch': 2.92} 77%|███████▋ | 7741/10000 [28:12:30<8:07:03, 12.94s/it] 77%|███████▋ | 7742/10000 [28:12:43<8:07:39, 12.96s/it] {'loss': 0.0065, 'learning_rate': 1.1360000000000001e-05, 'epoch': 2.92} 77%|███████▋ | 7742/10000 [28:12:43<8:07:39, 12.96s/it] 77%|███████▋ | 7743/10000 [28:12:56<8:05:41, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.1355e-05, 'epoch': 2.92} 77%|███████▋ | 7743/10000 [28:12:56<8:05:41, 12.91s/it] 77%|███████▋ | 7744/10000 [28:13:09<8:05:23, 12.91s/it] {'loss': 0.0053, 'learning_rate': 1.1350000000000001e-05, 'epoch': 2.92} 77%|███████▋ | 7744/10000 [28:13:09<8:05:23, 12.91s/it] 77%|███████▋ | 7745/10000 
[28:13:22<8:06:00, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.1345e-05, 'epoch': 2.92} 77%|███████▋ | 7745/10000 [28:13:22<8:06:00, 12.93s/it] 77%|███████▋ | 7746/10000 [28:13:35<8:06:17, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.134e-05, 'epoch': 2.92} 77%|███████▋ | 7746/10000 [28:13:35<8:06:17, 12.94s/it] 77%|███████▋ | 7747/10000 [28:13:48<8:06:15, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.1335e-05, 'epoch': 2.92} 77%|███████▋ | 7747/10000 [28:13:48<8:06:15, 12.95s/it] 77%|███████▋ | 7748/10000 [28:14:01<8:05:32, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.133e-05, 'epoch': 2.92} 77%|███████▋ | 7748/10000 [28:14:01<8:05:32, 12.94s/it] 77%|███████▋ | 7749/10000 [28:14:14<8:04:49, 12.92s/it] {'loss': 0.0048, 'learning_rate': 1.1325e-05, 'epoch': 2.92} 77%|███████▋ | 7749/10000 [28:14:14<8:04:49, 12.92s/it] 78%|███████▊ | 7750/10000 [28:14:27<8:05:26, 12.95s/it] {'loss': 0.0038, 'learning_rate': 1.132e-05, 'epoch': 2.92} 78%|███████▊ | 7750/10000 [28:14:27<8:05:26, 12.95s/it] 78%|███████▊ | 7751/10000 [28:14:40<8:04:42, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.1315000000000001e-05, 'epoch': 2.92} 78%|███████▊ | 7751/10000 [28:14:40<8:04:42, 12.93s/it] 78%|███████▊ | 7752/10000 [28:14:53<8:05:26, 12.96s/it] {'loss': 0.0047, 'learning_rate': 1.1310000000000002e-05, 'epoch': 2.92} 78%|███████▊ | 7752/10000 [28:14:53<8:05:26, 12.96s/it] 78%|███████▊ | 7753/10000 [28:15:06<8:05:18, 12.96s/it] {'loss': 0.0058, 'learning_rate': 1.1305000000000001e-05, 'epoch': 2.92} 78%|███████▊ | 7753/10000 [28:15:06<8:05:18, 12.96s/it] 78%|███████▊ | 7754/10000 [28:15:18<8:04:19, 12.94s/it] {'loss': 0.0048, 'learning_rate': 1.13e-05, 'epoch': 2.92} 78%|███████▊ | 7754/10000 [28:15:18<8:04:19, 12.94s/it] 78%|███████▊ | 7755/10000 [28:15:31<8:03:53, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.1295e-05, 'epoch': 2.92} 78%|███████▊ | 7755/10000 [28:15:31<8:03:53, 12.93s/it] 78%|███████▊ | 7756/10000 [28:15:44<8:03:09, 12.92s/it] {'loss': 0.0045, 'learning_rate': 
1.129e-05, 'epoch': 2.92} 78%|███████▊ | 7756/10000 [28:15:44<8:03:09, 12.92s/it] 78%|███████▊ | 7757/10000 [28:15:57<8:03:16, 12.93s/it] {'loss': 0.0046, 'learning_rate': 1.1285000000000001e-05, 'epoch': 2.92} 78%|███████▊ | 7757/10000 [28:15:57<8:03:16, 12.93s/it] 78%|███████▊ | 7758/10000 [28:16:10<8:02:39, 12.92s/it] {'loss': 0.0061, 'learning_rate': 1.128e-05, 'epoch': 2.92} 78%|███████▊ | 7758/10000 [28:16:10<8:02:39, 12.92s/it] 78%|███████▊ | 7759/10000 [28:16:23<8:02:55, 12.93s/it] {'loss': 0.0038, 'learning_rate': 1.1275000000000001e-05, 'epoch': 2.92} 78%|███████▊ | 7759/10000 [28:16:23<8:02:55, 12.93s/it] 78%|███████▊ | 7760/10000 [28:16:36<8:03:05, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.127e-05, 'epoch': 2.92} 78%|███████▊ | 7760/10000 [28:16:36<8:03:05, 12.94s/it] 78%|███████▊ | 7761/10000 [28:16:49<8:02:10, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.1265e-05, 'epoch': 2.92} 78%|███████▊ | 7761/10000 [28:16:49<8:02:10, 12.92s/it] 78%|███████▊ | 7762/10000 [28:17:02<8:01:42, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.126e-05, 'epoch': 2.92} 78%|███████▊ | 7762/10000 [28:17:02<8:01:42, 12.91s/it] 78%|███████▊ | 7763/10000 [28:17:15<8:01:14, 12.91s/it] {'loss': 0.0056, 'learning_rate': 1.1255e-05, 'epoch': 2.93} 78%|███████▊ | 7763/10000 [28:17:15<8:01:14, 12.91s/it] 78%|███████▊ | 7764/10000 [28:17:28<8:01:26, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.125e-05, 'epoch': 2.93} 78%|███████▊ | 7764/10000 [28:17:28<8:01:26, 12.92s/it] 78%|███████▊ | 7765/10000 [28:17:41<8:01:55, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.1245e-05, 'epoch': 2.93} 78%|███████▊ | 7765/10000 [28:17:41<8:01:55, 12.94s/it] 78%|███████▊ | 7766/10000 [28:17:54<8:01:18, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.124e-05, 'epoch': 2.93} 78%|███████▊ | 7766/10000 [28:17:54<8:01:18, 12.93s/it] 78%|███████▊ | 7767/10000 [28:18:06<8:00:25, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.1235000000000002e-05, 'epoch': 2.93} 78%|███████▊ | 7767/10000 
[28:18:06<8:00:25, 12.91s/it] 78%|███████▊ | 7768/10000 [28:18:19<8:00:07, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.1230000000000001e-05, 'epoch': 2.93} 78%|███████▊ | 7768/10000 [28:18:19<8:00:07, 12.91s/it] 78%|███████▊ | 7769/10000 [28:18:32<8:01:37, 12.95s/it] {'loss': 0.0035, 'learning_rate': 1.1225e-05, 'epoch': 2.93} 78%|███████▊ | 7769/10000 [28:18:32<8:01:37, 12.95s/it] 78%|███████▊ | 7770/10000 [28:18:45<8:00:49, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.122e-05, 'epoch': 2.93} 78%|███████▊ | 7770/10000 [28:18:45<8:00:49, 12.94s/it] 78%|███████▊ | 7771/10000 [28:18:58<8:00:18, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.1215e-05, 'epoch': 2.93} 78%|███████▊ | 7771/10000 [28:18:58<8:00:18, 12.93s/it] 78%|███████▊ | 7772/10000 [28:19:11<8:00:06, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.1210000000000001e-05, 'epoch': 2.93} 78%|███████▊ | 7772/10000 [28:19:11<8:00:06, 12.93s/it] 78%|███████▊ | 7773/10000 [28:19:24<8:00:34, 12.95s/it] {'loss': 0.0037, 'learning_rate': 1.1205e-05, 'epoch': 2.93} 78%|███████▊ | 7773/10000 [28:19:24<8:00:34, 12.95s/it] 78%|███████▊ | 7774/10000 [28:19:37<7:59:53, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.1200000000000001e-05, 'epoch': 2.93} 78%|███████▊ | 7774/10000 [28:19:37<7:59:53, 12.93s/it] 78%|███████▊ | 7775/10000 [28:19:50<7:59:51, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.1195e-05, 'epoch': 2.93} 78%|███████▊ | 7775/10000 [28:19:50<7:59:51, 12.94s/it] 78%|███████▊ | 7776/10000 [28:20:03<7:59:19, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.1190000000000001e-05, 'epoch': 2.93} 78%|███████▊ | 7776/10000 [28:20:03<7:59:19, 12.93s/it] 78%|███████▊ | 7777/10000 [28:20:16<7:58:39, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.1185e-05, 'epoch': 2.93} 78%|███████▊ | 7777/10000 [28:20:16<7:58:39, 12.92s/it] 78%|███████▊ | 7778/10000 [28:20:29<7:59:00, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.118e-05, 'epoch': 2.93} 78%|███████▊ | 7778/10000 [28:20:29<7:59:00, 12.93s/it] 78%|███████▊ | 
7779/10000 [28:20:42<7:58:42, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.1175e-05, 'epoch': 2.93} 78%|███████▊ | 7779/10000 [28:20:42<7:58:42, 12.93s/it] 78%|███████▊ | 7780/10000 [28:20:55<7:58:14, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.117e-05, 'epoch': 2.93} 78%|███████▊ | 7780/10000 [28:20:55<7:58:14, 12.93s/it] 78%|███████▊ | 7781/10000 [28:21:08<7:58:21, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.1165e-05, 'epoch': 2.93} 78%|███████▊ | 7781/10000 [28:21:08<7:58:21, 12.93s/it] 78%|███████▊ | 7782/10000 [28:21:20<7:57:36, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.1160000000000002e-05, 'epoch': 2.93} 78%|███████▊ | 7782/10000 [28:21:20<7:57:36, 12.92s/it] 78%|███████▊ | 7783/10000 [28:21:33<7:57:11, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.1155e-05, 'epoch': 2.93} 78%|███████▊ | 7783/10000 [28:21:33<7:57:11, 12.91s/it] 78%|███████▊ | 7784/10000 [28:21:46<7:56:39, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.115e-05, 'epoch': 2.93} 78%|███████▊ | 7784/10000 [28:21:46<7:56:39, 12.91s/it] 78%|███████▊ | 7785/10000 [28:21:59<7:56:35, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.1145e-05, 'epoch': 2.93} 78%|███████▊ | 7785/10000 [28:21:59<7:56:35, 12.91s/it] 78%|███████▊ | 7786/10000 [28:22:12<7:57:03, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.114e-05, 'epoch': 2.93} 78%|███████▊ | 7786/10000 [28:22:12<7:57:03, 12.93s/it] 78%|███████▊ | 7787/10000 [28:22:25<7:56:57, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.1135000000000001e-05, 'epoch': 2.93} 78%|███████▊ | 7787/10000 [28:22:25<7:56:57, 12.93s/it] 78%|███████▊ | 7788/10000 [28:22:38<7:55:54, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.113e-05, 'epoch': 2.93} 78%|███████▊ | 7788/10000 [28:22:38<7:55:54, 12.91s/it] 78%|███████▊ | 7789/10000 [28:22:51<7:55:10, 12.89s/it] {'loss': 0.0045, 'learning_rate': 1.1125000000000001e-05, 'epoch': 2.93} 78%|███████▊ | 7789/10000 [28:22:51<7:55:10, 12.89s/it] 78%|███████▊ | 7790/10000 [28:23:04<7:54:22, 12.88s/it] {'loss': 0.0058, 
'learning_rate': 1.112e-05, 'epoch': 2.94} 78%|███████▊ | 7790/10000 [28:23:04<7:54:22, 12.88s/it] 78%|███████▊ | 7791/10000 [28:23:16<7:54:46, 12.90s/it] {'loss': 0.0055, 'learning_rate': 1.1115000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7791/10000 [28:23:17<7:54:46, 12.90s/it] 78%|███████▊ | 7792/10000 [28:23:29<7:53:04, 12.86s/it] {'loss': 0.0048, 'learning_rate': 1.111e-05, 'epoch': 2.94} 78%|███████▊ | 7792/10000 [28:23:29<7:53:04, 12.86s/it] 78%|███████▊ | 7793/10000 [28:23:42<7:53:34, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.1105e-05, 'epoch': 2.94} 78%|███████▊ | 7793/10000 [28:23:42<7:53:34, 12.87s/it] 78%|███████▊ | 7794/10000 [28:23:55<7:53:16, 12.87s/it] {'loss': 0.0051, 'learning_rate': 1.11e-05, 'epoch': 2.94} 78%|███████▊ | 7794/10000 [28:23:55<7:53:16, 12.87s/it] 78%|███████▊ | 7795/10000 [28:24:08<7:52:44, 12.86s/it] {'loss': 0.0052, 'learning_rate': 1.1095e-05, 'epoch': 2.94} 78%|███████▊ | 7795/10000 [28:24:08<7:52:44, 12.86s/it] 78%|███████▊ | 7796/10000 [28:24:21<7:52:53, 12.87s/it] {'loss': 0.0051, 'learning_rate': 1.109e-05, 'epoch': 2.94} 78%|███████▊ | 7796/10000 [28:24:21<7:52:53, 12.87s/it] 78%|███████▊ | 7797/10000 [28:24:34<7:54:10, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.1085000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7797/10000 [28:24:34<7:54:10, 12.91s/it] 78%|███████▊ | 7798/10000 [28:24:47<7:53:32, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.108e-05, 'epoch': 2.94} 78%|███████▊ | 7798/10000 [28:24:47<7:53:32, 12.90s/it] 78%|███████▊ | 7799/10000 [28:25:00<7:53:05, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.1075e-05, 'epoch': 2.94} 78%|███████▊ | 7799/10000 [28:25:00<7:53:05, 12.90s/it] 78%|███████▊ | 7800/10000 [28:25:13<7:53:34, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.107e-05, 'epoch': 2.94} 78%|███████▊ | 7800/10000 [28:25:13<7:53:34, 12.92s/it] 78%|███████▊ | 7801/10000 [28:25:25<7:54:01, 12.93s/it] {'loss': 0.0048, 'learning_rate': 1.1065e-05, 'epoch': 2.94} 78%|███████▊ | 7801/10000 
[28:25:26<7:54:01, 12.93s/it] 78%|███████▊ | 7802/10000 [28:25:38<7:54:05, 12.94s/it] {'loss': 0.0057, 'learning_rate': 1.106e-05, 'epoch': 2.94} 78%|███████▊ | 7802/10000 [28:25:38<7:54:05, 12.94s/it] 78%|███████▊ | 7803/10000 [28:25:51<7:52:44, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.1055e-05, 'epoch': 2.94} 78%|███████▊ | 7803/10000 [28:25:51<7:52:44, 12.91s/it] 78%|███████▊ | 7804/10000 [28:26:04<7:52:06, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.1050000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7804/10000 [28:26:04<7:52:06, 12.90s/it] 78%|███████▊ | 7805/10000 [28:26:17<7:52:11, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.1045000000000002e-05, 'epoch': 2.94} 78%|███████▊ | 7805/10000 [28:26:17<7:52:11, 12.91s/it] 78%|███████▊ | 7806/10000 [28:26:30<7:51:43, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.1040000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7806/10000 [28:26:30<7:51:43, 12.90s/it] 78%|███████▊ | 7807/10000 [28:26:43<7:51:22, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.1035e-05, 'epoch': 2.94} 78%|███████▊ | 7807/10000 [28:26:43<7:51:22, 12.90s/it] 78%|███████▊ | 7808/10000 [28:26:56<7:51:14, 12.90s/it] {'loss': 0.0044, 'learning_rate': 1.103e-05, 'epoch': 2.94} 78%|███████▊ | 7808/10000 [28:26:56<7:51:14, 12.90s/it] 78%|███████▊ | 7809/10000 [28:27:09<7:50:50, 12.89s/it] {'loss': 0.0061, 'learning_rate': 1.1025e-05, 'epoch': 2.94} 78%|███████▊ | 7809/10000 [28:27:09<7:50:50, 12.89s/it] 78%|███████▊ | 7810/10000 [28:27:22<7:51:20, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.1020000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7810/10000 [28:27:22<7:51:20, 12.91s/it] 78%|███████▊ | 7811/10000 [28:27:35<7:51:02, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.1015e-05, 'epoch': 2.94} 78%|███████▊ | 7811/10000 [28:27:35<7:51:02, 12.91s/it] 78%|███████▊ | 7812/10000 [28:27:47<7:50:26, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.1010000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7812/10000 [28:27:47<7:50:26, 12.90s/it] 
78%|███████▊ | 7813/10000 [28:28:00<7:48:55, 12.86s/it] {'loss': 0.0055, 'learning_rate': 1.1005e-05, 'epoch': 2.94} 78%|███████▊ | 7813/10000 [28:28:00<7:48:55, 12.86s/it] 78%|███████▊ | 7814/10000 [28:28:13<7:48:56, 12.87s/it] {'loss': 0.0051, 'learning_rate': 1.1000000000000001e-05, 'epoch': 2.94} 78%|███████▊ | 7814/10000 [28:28:13<7:48:56, 12.87s/it] 78%|███████▊ | 7815/10000 [28:28:26<7:48:28, 12.86s/it] {'loss': 0.0046, 'learning_rate': 1.0995e-05, 'epoch': 2.94} 78%|███████▊ | 7815/10000 [28:28:26<7:48:28, 12.86s/it] 78%|███████▊ | 7816/10000 [28:28:39<7:49:18, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.099e-05, 'epoch': 2.94} 78%|███████▊ | 7816/10000 [28:28:39<7:49:18, 12.89s/it] 78%|███████▊ | 7817/10000 [28:28:52<7:49:57, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.0985e-05, 'epoch': 2.95} 78%|███████▊ | 7817/10000 [28:28:52<7:49:57, 12.92s/it] 78%|███████▊ | 7818/10000 [28:29:05<7:49:40, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.098e-05, 'epoch': 2.95} 78%|███████▊ | 7818/10000 [28:29:05<7:49:40, 12.91s/it] 78%|███████▊ | 7819/10000 [28:29:18<7:49:24, 12.91s/it] {'loss': 0.0057, 'learning_rate': 1.0975e-05, 'epoch': 2.95} 78%|███████▊ | 7819/10000 [28:29:18<7:49:24, 12.91s/it] 78%|███████▊ | 7820/10000 [28:29:31<7:49:14, 12.91s/it] {'loss': 0.0034, 'learning_rate': 1.0970000000000002e-05, 'epoch': 2.95} 78%|███████▊ | 7820/10000 [28:29:31<7:49:14, 12.91s/it] 78%|███████▊ | 7821/10000 [28:29:43<7:48:20, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.0965000000000001e-05, 'epoch': 2.95} 78%|███████▊ | 7821/10000 [28:29:43<7:48:20, 12.90s/it] 78%|███████▊ | 7822/10000 [28:29:56<7:48:17, 12.90s/it] {'loss': 0.0047, 'learning_rate': 1.096e-05, 'epoch': 2.95} 78%|███████▊ | 7822/10000 [28:29:56<7:48:17, 12.90s/it] 78%|███████▊ | 7823/10000 [28:30:09<7:48:15, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.0955e-05, 'epoch': 2.95} 78%|███████▊ | 7823/10000 [28:30:09<7:48:15, 12.91s/it] 78%|███████▊ | 7824/10000 [28:30:22<7:47:26, 12.89s/it] {'loss': 
0.006, 'learning_rate': 1.095e-05, 'epoch': 2.95} 78%|███████▊ | 7824/10000 [28:30:22<7:47:26, 12.89s/it] 78%|███████▊ | 7825/10000 [28:30:35<7:47:29, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.0945000000000001e-05, 'epoch': 2.95} 78%|███████▊ | 7825/10000 [28:30:35<7:47:29, 12.90s/it] 78%|███████▊ | 7826/10000 [28:30:48<7:47:29, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.094e-05, 'epoch': 2.95} 78%|███████▊ | 7826/10000 [28:30:48<7:47:29, 12.90s/it] 78%|███████▊ | 7827/10000 [28:31:01<7:47:35, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.0935000000000001e-05, 'epoch': 2.95} 78%|███████▊ | 7827/10000 [28:31:01<7:47:35, 12.91s/it] 78%|███████▊ | 7828/10000 [28:31:14<7:47:15, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.093e-05, 'epoch': 2.95} 78%|███████▊ | 7828/10000 [28:31:14<7:47:15, 12.91s/it] 78%|███████▊ | 7829/10000 [28:31:27<7:47:30, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.0925000000000001e-05, 'epoch': 2.95} 78%|███████▊ | 7829/10000 [28:31:27<7:47:30, 12.92s/it] 78%|███████▊ | 7830/10000 [28:31:40<7:46:55, 12.91s/it] {'loss': 0.0049, 'learning_rate': 1.092e-05, 'epoch': 2.95} 78%|███████▊ | 7830/10000 [28:31:40<7:46:55, 12.91s/it] 78%|███████▊ | 7831/10000 [28:31:53<7:46:55, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.0915e-05, 'epoch': 2.95} 78%|███████▊ | 7831/10000 [28:31:53<7:46:55, 12.92s/it] 78%|███████▊ | 7832/10000 [28:32:05<7:46:40, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.091e-05, 'epoch': 2.95} 78%|███████▊ | 7832/10000 [28:32:05<7:46:40, 12.92s/it] 78%|███████▊ | 7833/10000 [28:32:18<7:47:07, 12.93s/it] {'loss': 0.0064, 'learning_rate': 1.0905e-05, 'epoch': 2.95} 78%|███████▊ | 7833/10000 [28:32:18<7:47:07, 12.93s/it] 78%|███████▊ | 7834/10000 [28:32:31<7:48:04, 12.97s/it] {'loss': 0.0061, 'learning_rate': 1.09e-05, 'epoch': 2.95} 78%|███████▊ | 7834/10000 [28:32:32<7:48:04, 12.97s/it] 78%|███████▊ | 7835/10000 [28:32:44<7:47:34, 12.96s/it] {'loss': 0.0045, 'learning_rate': 1.0895000000000002e-05, 'epoch': 2.95} 
78%|███████▊ | 7835/10000 [28:32:44<7:47:34, 12.96s/it] 78%|███████▊ | 7836/10000 [28:32:57<7:47:14, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.089e-05, 'epoch': 2.95} 78%|███████▊ | 7836/10000 [28:32:57<7:47:14, 12.95s/it] 78%|███████▊ | 7837/10000 [28:33:10<7:46:45, 12.95s/it] {'loss': 0.0045, 'learning_rate': 1.0885e-05, 'epoch': 2.95} 78%|███████▊ | 7837/10000 [28:33:10<7:46:45, 12.95s/it] 78%|███████▊ | 7838/10000 [28:33:23<7:46:54, 12.96s/it] {'loss': 0.0039, 'learning_rate': 1.088e-05, 'epoch': 2.95} 78%|███████▊ | 7838/10000 [28:33:23<7:46:54, 12.96s/it] 78%|███████▊ | 7839/10000 [28:33:36<7:47:05, 12.97s/it] {'loss': 0.0049, 'learning_rate': 1.0875e-05, 'epoch': 2.95} 78%|███████▊ | 7839/10000 [28:33:36<7:47:05, 12.97s/it] 78%|███████▊ | 7840/10000 [28:33:49<7:46:54, 12.97s/it] {'loss': 0.0055, 'learning_rate': 1.0870000000000001e-05, 'epoch': 2.95} 78%|███████▊ | 7840/10000 [28:33:49<7:46:54, 12.97s/it] 78%|███████▊ | 7841/10000 [28:34:02<7:46:04, 12.95s/it] {'loss': 0.0051, 'learning_rate': 1.0865e-05, 'epoch': 2.95} 78%|███████▊ | 7841/10000 [28:34:02<7:46:04, 12.95s/it] 78%|███████▊ | 7842/10000 [28:34:15<7:46:10, 12.96s/it] {'loss': 0.0045, 'learning_rate': 1.0860000000000001e-05, 'epoch': 2.95} 78%|███████▊ | 7842/10000 [28:34:15<7:46:10, 12.96s/it] 78%|███████▊ | 7843/10000 [28:34:28<7:45:13, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.0855e-05, 'epoch': 2.96} 78%|███████▊ | 7843/10000 [28:34:28<7:45:13, 12.94s/it] 78%|███████▊ | 7844/10000 [28:34:41<7:45:15, 12.95s/it] {'loss': 0.0039, 'learning_rate': 1.0850000000000001e-05, 'epoch': 2.96} 78%|███████▊ | 7844/10000 [28:34:41<7:45:15, 12.95s/it] 78%|███████▊ | 7845/10000 [28:34:54<7:44:01, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.0845e-05, 'epoch': 2.96} 78%|███████▊ | 7845/10000 [28:34:54<7:44:01, 12.92s/it] 78%|███████▊ | 7846/10000 [28:35:07<7:43:35, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.084e-05, 'epoch': 2.96} 78%|███████▊ | 7846/10000 [28:35:07<7:43:35, 12.91s/it] 
78%|███████▊ | 7847/10000 [28:35:20<7:43:29, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.0835e-05, 'epoch': 2.96} 78%|███████▊ | 7847/10000 [28:35:20<7:43:29, 12.92s/it] 78%|███████▊ | 7848/10000 [28:35:33<7:42:52, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.083e-05, 'epoch': 2.96} 78%|███████▊ | 7848/10000 [28:35:33<7:42:52, 12.91s/it] 78%|███████▊ | 7849/10000 [28:35:45<7:42:41, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.0825e-05, 'epoch': 2.96} 78%|███████▊ | 7849/10000 [28:35:45<7:42:41, 12.91s/it] 78%|███████▊ | 7850/10000 [28:35:58<7:42:39, 12.91s/it] {'loss': 0.0038, 'learning_rate': 1.0820000000000001e-05, 'epoch': 2.96} 78%|███████▊ | 7850/10000 [28:35:58<7:42:39, 12.91s/it] 79%|███████▊ | 7851/10000 [28:36:11<7:42:39, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.0815e-05, 'epoch': 2.96} 79%|███████▊ | 7851/10000 [28:36:11<7:42:39, 12.92s/it] 79%|███████▊ | 7852/10000 [28:36:24<7:42:12, 12.91s/it] {'loss': 0.0051, 'learning_rate': 1.081e-05, 'epoch': 2.96} 79%|███████▊ | 7852/10000 [28:36:24<7:42:12, 12.91s/it] 79%|███████▊ | 7853/10000 [28:36:37<7:42:03, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.0804999999999999e-05, 'epoch': 2.96} 79%|███████▊ | 7853/10000 [28:36:37<7:42:03, 12.91s/it] 79%|███████▊ | 7854/10000 [28:36:50<7:42:39, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.08e-05, 'epoch': 2.96} 79%|███████▊ | 7854/10000 [28:36:50<7:42:39, 12.94s/it] 79%|███████▊ | 7855/10000 [28:37:03<7:41:55, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.0795e-05, 'epoch': 2.96} 79%|███████▊ | 7855/10000 [28:37:03<7:41:55, 12.92s/it] 79%|███████▊ | 7856/10000 [28:37:16<7:42:29, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.079e-05, 'epoch': 2.96} 79%|███████▊ | 7856/10000 [28:37:16<7:42:29, 12.94s/it] 79%|███████▊ | 7857/10000 [28:37:29<7:43:27, 12.98s/it] {'loss': 0.0041, 'learning_rate': 1.0785000000000001e-05, 'epoch': 2.96} 79%|███████▊ | 7857/10000 [28:37:29<7:43:27, 12.98s/it] 79%|███████▊ | 7858/10000 [28:37:42<7:42:18, 12.95s/it] {'loss': 
0.0065, 'learning_rate': 1.0780000000000002e-05, 'epoch': 2.96} 79%|███████▊ | 7858/10000 [28:37:42<7:42:18, 12.95s/it] 79%|███████▊ | 7859/10000 [28:37:55<7:41:38, 12.94s/it] {'loss': 0.0049, 'learning_rate': 1.0775000000000001e-05, 'epoch': 2.96} 79%|███████▊ | 7859/10000 [28:37:55<7:41:38, 12.94s/it] 79%|███████▊ | 7860/10000 [28:38:08<7:41:25, 12.94s/it] {'loss': 0.0044, 'learning_rate': 1.077e-05, 'epoch': 2.96} 79%|███████▊ | 7860/10000 [28:38:08<7:41:25, 12.94s/it] 79%|███████▊ | 7861/10000 [28:38:21<7:40:58, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.0765e-05, 'epoch': 2.96} 79%|███████▊ | 7861/10000 [28:38:21<7:40:58, 12.93s/it] 79%|███████▊ | 7862/10000 [28:38:34<7:40:41, 12.93s/it] {'loss': 0.0052, 'learning_rate': 1.076e-05, 'epoch': 2.96} 79%|███████▊ | 7862/10000 [28:38:34<7:40:41, 12.93s/it] 79%|███████▊ | 7863/10000 [28:38:46<7:39:18, 12.90s/it] {'loss': 0.0053, 'learning_rate': 1.0755000000000001e-05, 'epoch': 2.96} 79%|███████▊ | 7863/10000 [28:38:46<7:39:18, 12.90s/it] 79%|███████▊ | 7864/10000 [28:38:59<7:38:12, 12.87s/it] {'loss': 0.0045, 'learning_rate': 1.075e-05, 'epoch': 2.96} 79%|███████▊ | 7864/10000 [28:38:59<7:38:12, 12.87s/it] 79%|███████▊ | 7865/10000 [28:39:12<7:37:43, 12.86s/it] {'loss': 0.0051, 'learning_rate': 1.0745000000000001e-05, 'epoch': 2.96} 79%|███████▊ | 7865/10000 [28:39:12<7:37:43, 12.86s/it] 79%|███████▊ | 7866/10000 [28:39:25<7:38:38, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.074e-05, 'epoch': 2.96} 79%|███████▊ | 7866/10000 [28:39:25<7:38:38, 12.90s/it] 79%|███████▊ | 7867/10000 [28:39:38<7:38:53, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.0735000000000001e-05, 'epoch': 2.96} 79%|███████▊ | 7867/10000 [28:39:38<7:38:53, 12.91s/it] 79%|███████▊ | 7868/10000 [28:39:51<7:39:32, 12.93s/it] {'loss': 0.0054, 'learning_rate': 1.073e-05, 'epoch': 2.96} 79%|███████▊ | 7868/10000 [28:39:51<7:39:32, 12.93s/it] 79%|███████▊ | 7869/10000 [28:40:04<7:38:43, 12.92s/it] {'loss': 0.0054, 'learning_rate': 1.0725e-05, 
'epoch': 2.96} 79%|███████▊ | 7869/10000 [28:40:04<7:38:43, 12.92s/it] 79%|███████▊ | 7870/10000 [28:40:17<7:39:49, 12.95s/it] {'loss': 0.0056, 'learning_rate': 1.072e-05, 'epoch': 2.97} 79%|███████▊ | 7870/10000 [28:40:17<7:39:49, 12.95s/it] 79%|███████▊ | 7871/10000 [28:40:30<7:39:33, 12.95s/it] {'loss': 0.0034, 'learning_rate': 1.0715e-05, 'epoch': 2.97} 79%|███████▊ | 7871/10000 [28:40:30<7:39:33, 12.95s/it] 79%|███████▊ | 7872/10000 [28:40:43<7:39:08, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.071e-05, 'epoch': 2.97} 79%|███████▊ | 7872/10000 [28:40:43<7:39:08, 12.95s/it] 79%|███████▊ | 7873/10000 [28:40:56<7:37:46, 12.91s/it] {'loss': 0.0041, 'learning_rate': 1.0705000000000002e-05, 'epoch': 2.97} 79%|███████▊ | 7873/10000 [28:40:56<7:37:46, 12.91s/it] 79%|███████▊ | 7874/10000 [28:41:09<7:38:40, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.0700000000000001e-05, 'epoch': 2.97} 79%|███████▊ | 7874/10000 [28:41:09<7:38:40, 12.94s/it] 79%|███████▉ | 7875/10000 [28:41:22<7:37:56, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.0695e-05, 'epoch': 2.97} 79%|███████▉ | 7875/10000 [28:41:22<7:37:56, 12.93s/it] 79%|███████▉ | 7876/10000 [28:41:35<7:38:58, 12.97s/it] {'loss': 0.005, 'learning_rate': 1.069e-05, 'epoch': 2.97} 79%|███████▉ | 7876/10000 [28:41:35<7:38:58, 12.97s/it] 79%|███████▉ | 7877/10000 [28:41:48<7:38:51, 12.97s/it] {'loss': 0.0045, 'learning_rate': 1.0685e-05, 'epoch': 2.97} 79%|███████▉ | 7877/10000 [28:41:48<7:38:51, 12.97s/it] 79%|███████▉ | 7878/10000 [28:42:01<7:38:29, 12.96s/it] {'loss': 0.0042, 'learning_rate': 1.0680000000000001e-05, 'epoch': 2.97} 79%|███████▉ | 7878/10000 [28:42:01<7:38:29, 12.96s/it] 79%|███████▉ | 7879/10000 [28:42:13<7:38:16, 12.96s/it] {'loss': 0.004, 'learning_rate': 1.0675e-05, 'epoch': 2.97} 79%|███████▉ | 7879/10000 [28:42:13<7:38:16, 12.96s/it] 79%|███████▉ | 7880/10000 [28:42:26<7:37:07, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.0670000000000001e-05, 'epoch': 2.97} 79%|███████▉ | 7880/10000 
[28:42:26<7:37:07, 12.94s/it] 79%|███████▉ | 7881/10000 [28:42:39<7:36:04, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.0665e-05, 'epoch': 2.97} 79%|███████▉ | 7881/10000 [28:42:39<7:36:04, 12.91s/it] 79%|███████▉ | 7882/10000 [28:42:52<7:36:17, 12.93s/it] {'loss': 0.0041, 'learning_rate': 1.0660000000000001e-05, 'epoch': 2.97} 79%|███████▉ | 7882/10000 [28:42:52<7:36:17, 12.93s/it] 79%|███████▉ | 7883/10000 [28:43:05<7:35:20, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.0655e-05, 'epoch': 2.97} 79%|███████▉ | 7883/10000 [28:43:05<7:35:20, 12.91s/it] 79%|███████▉ | 7884/10000 [28:43:18<7:36:34, 12.95s/it] {'loss': 0.003, 'learning_rate': 1.065e-05, 'epoch': 2.97} 79%|███████▉ | 7884/10000 [28:43:18<7:36:34, 12.95s/it] 79%|███████▉ | 7885/10000 [28:43:31<7:35:57, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.0645e-05, 'epoch': 2.97} 79%|███████▉ | 7885/10000 [28:43:31<7:35:57, 12.93s/it] 79%|███████▉ | 7886/10000 [28:43:44<7:35:47, 12.94s/it] {'loss': 0.0046, 'learning_rate': 1.064e-05, 'epoch': 2.97} 79%|███████▉ | 7886/10000 [28:43:44<7:35:47, 12.94s/it] 79%|███████▉ | 7887/10000 [28:43:57<7:36:20, 12.96s/it] {'loss': 0.0044, 'learning_rate': 1.0635e-05, 'epoch': 2.97} 79%|███████▉ | 7887/10000 [28:43:57<7:36:20, 12.96s/it] 79%|███████▉ | 7888/10000 [28:44:10<7:36:08, 12.96s/it] {'loss': 0.0065, 'learning_rate': 1.0630000000000002e-05, 'epoch': 2.97} 79%|███████▉ | 7888/10000 [28:44:10<7:36:08, 12.96s/it] 79%|███████▉ | 7889/10000 [28:44:23<7:36:26, 12.97s/it] {'loss': 0.0039, 'learning_rate': 1.0625e-05, 'epoch': 2.97} 79%|███████▉ | 7889/10000 [28:44:23<7:36:26, 12.97s/it] 79%|███████▉ | 7890/10000 [28:44:36<7:35:34, 12.95s/it] {'loss': 0.004, 'learning_rate': 1.062e-05, 'epoch': 2.97} 79%|███████▉ | 7890/10000 [28:44:36<7:35:34, 12.95s/it] 79%|███████▉ | 7891/10000 [28:44:49<7:34:19, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.0615e-05, 'epoch': 2.97} 79%|███████▉ | 7891/10000 [28:44:49<7:34:19, 12.93s/it] 79%|███████▉ | 7892/10000 [28:45:01<7:33:14, 
12.90s/it] {'loss': 0.0052, 'learning_rate': 1.061e-05, 'epoch': 2.97} 79%|███████▉ | 7892/10000 [28:45:02<7:33:14, 12.90s/it] 79%|███████▉ | 7893/10000 [28:45:14<7:34:02, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.0605000000000001e-05, 'epoch': 2.97} 79%|███████▉ | 7893/10000 [28:45:15<7:34:02, 12.93s/it] 79%|███████▉ | 7894/10000 [28:45:27<7:33:32, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.06e-05, 'epoch': 2.97} 79%|███████▉ | 7894/10000 [28:45:27<7:33:32, 12.92s/it] 79%|███████▉ | 7895/10000 [28:45:40<7:32:41, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.0595000000000001e-05, 'epoch': 2.97} 79%|███████▉ | 7895/10000 [28:45:40<7:32:41, 12.90s/it] 79%|███████▉ | 7896/10000 [28:45:53<7:33:22, 12.93s/it] {'loss': 0.0038, 'learning_rate': 1.059e-05, 'epoch': 2.98} 79%|███████▉ | 7896/10000 [28:45:53<7:33:22, 12.93s/it] 79%|███████▉ | 7897/10000 [28:46:06<7:33:05, 12.93s/it] {'loss': 0.0049, 'learning_rate': 1.0585000000000001e-05, 'epoch': 2.98} 79%|███████▉ | 7897/10000 [28:46:06<7:33:05, 12.93s/it] 79%|███████▉ | 7898/10000 [28:46:19<7:32:30, 12.92s/it] {'loss': 0.0053, 'learning_rate': 1.058e-05, 'epoch': 2.98} 79%|███████▉ | 7898/10000 [28:46:19<7:32:30, 12.92s/it] 79%|███████▉ | 7899/10000 [28:46:32<7:31:52, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.0575e-05, 'epoch': 2.98} 79%|███████▉ | 7899/10000 [28:46:32<7:31:52, 12.90s/it] 79%|███████▉ | 7900/10000 [28:46:45<7:31:57, 12.91s/it] {'loss': 0.0052, 'learning_rate': 1.057e-05, 'epoch': 2.98} 79%|███████▉ | 7900/10000 [28:46:45<7:31:57, 12.91s/it] 79%|███████▉ | 7901/10000 [28:46:58<7:31:05, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.0565e-05, 'epoch': 2.98} 79%|███████▉ | 7901/10000 [28:46:58<7:31:05, 12.89s/it] 79%|███████▉ | 7902/10000 [28:47:11<7:31:44, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.056e-05, 'epoch': 2.98} 79%|███████▉ | 7902/10000 [28:47:11<7:31:44, 12.92s/it] 79%|███████▉ | 7903/10000 [28:47:24<7:32:06, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.0555000000000001e-05, 
'epoch': 2.98} 79%|███████▉ | 7903/10000 [28:47:24<7:32:06, 12.94s/it] 79%|███████▉ | 7904/10000 [28:47:37<7:32:17, 12.95s/it] {'loss': 0.0038, 'learning_rate': 1.055e-05, 'epoch': 2.98} 79%|███████▉ | 7904/10000 [28:47:37<7:32:17, 12.95s/it] 79%|███████▉ | 7905/10000 [28:47:50<7:31:44, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.0545000000000002e-05, 'epoch': 2.98} 79%|███████▉ | 7905/10000 [28:47:50<7:31:44, 12.94s/it] 79%|███████▉ | 7906/10000 [28:48:02<7:31:13, 12.93s/it] {'loss': 0.006, 'learning_rate': 1.0539999999999999e-05, 'epoch': 2.98} 79%|███████▉ | 7906/10000 [28:48:03<7:31:13, 12.93s/it] 79%|███████▉ | 7907/10000 [28:48:15<7:29:56, 12.90s/it] {'loss': 0.0054, 'learning_rate': 1.0535e-05, 'epoch': 2.98} 79%|███████▉ | 7907/10000 [28:48:15<7:29:56, 12.90s/it] 79%|███████▉ | 7908/10000 [28:48:28<7:29:24, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.053e-05, 'epoch': 2.98} 79%|███████▉ | 7908/10000 [28:48:28<7:29:24, 12.89s/it] 79%|███████▉ | 7909/10000 [28:48:41<7:31:24, 12.95s/it] {'loss': 0.0036, 'learning_rate': 1.0525e-05, 'epoch': 2.98} 79%|███████▉ | 7909/10000 [28:48:41<7:31:24, 12.95s/it] 79%|███████▉ | 7910/10000 [28:48:54<7:32:35, 12.99s/it] {'loss': 0.0038, 'learning_rate': 1.0520000000000001e-05, 'epoch': 2.98} 79%|███████▉ | 7910/10000 [28:48:54<7:32:35, 12.99s/it] 79%|███████▉ | 7911/10000 [28:49:07<7:33:28, 13.02s/it] {'loss': 0.0035, 'learning_rate': 1.0515e-05, 'epoch': 2.98} 79%|███████▉ | 7911/10000 [28:49:07<7:33:28, 13.02s/it] 79%|███████▉ | 7912/10000 [28:49:20<7:32:45, 13.01s/it] {'loss': 0.005, 'learning_rate': 1.0510000000000001e-05, 'epoch': 2.98} 79%|███████▉ | 7912/10000 [28:49:20<7:32:45, 13.01s/it] 79%|███████▉ | 7913/10000 [28:49:33<7:31:14, 12.97s/it] {'loss': 0.005, 'learning_rate': 1.0505e-05, 'epoch': 2.98} 79%|███████▉ | 7913/10000 [28:49:33<7:31:14, 12.97s/it] 79%|███████▉ | 7914/10000 [28:49:46<7:30:41, 12.96s/it] {'loss': 0.0059, 'learning_rate': 1.05e-05, 'epoch': 2.98} 79%|███████▉ | 7914/10000 
[28:49:46<7:30:41, 12.96s/it] 79%|███████▉ | 7915/10000 [28:49:59<7:29:51, 12.95s/it] {'loss': 0.0054, 'learning_rate': 1.0495e-05, 'epoch': 2.98} 79%|███████▉ | 7915/10000 [28:49:59<7:29:51, 12.95s/it] 79%|███████▉ | 7916/10000 [28:50:12<7:29:06, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.049e-05, 'epoch': 2.98} 79%|███████▉ | 7916/10000 [28:50:12<7:29:06, 12.93s/it] 79%|███████▉ | 7917/10000 [28:50:25<7:29:36, 12.95s/it] {'loss': 0.0035, 'learning_rate': 1.0485e-05, 'epoch': 2.98} 79%|███████▉ | 7917/10000 [28:50:25<7:29:36, 12.95s/it] 79%|███████▉ | 7918/10000 [28:50:38<7:28:58, 12.94s/it] {'loss': 0.0052, 'learning_rate': 1.0480000000000001e-05, 'epoch': 2.98} 79%|███████▉ | 7918/10000 [28:50:38<7:28:58, 12.94s/it] 79%|███████▉ | 7919/10000 [28:50:51<7:28:25, 12.93s/it] {'loss': 0.0051, 'learning_rate': 1.0475e-05, 'epoch': 2.98} 79%|███████▉ | 7919/10000 [28:50:51<7:28:25, 12.93s/it] 79%|███████▉ | 7920/10000 [28:51:04<7:29:24, 12.96s/it] {'loss': 0.0042, 'learning_rate': 1.0470000000000001e-05, 'epoch': 2.98} 79%|███████▉ | 7920/10000 [28:51:04<7:29:24, 12.96s/it] 79%|███████▉ | 7921/10000 [28:51:17<7:28:55, 12.96s/it] {'loss': 0.0046, 'learning_rate': 1.0465e-05, 'epoch': 2.98} 79%|███████▉ | 7921/10000 [28:51:17<7:28:55, 12.96s/it] 79%|███████▉ | 7922/10000 [28:51:30<7:27:59, 12.94s/it] {'loss': 0.005, 'learning_rate': 1.046e-05, 'epoch': 2.98} 79%|███████▉ | 7922/10000 [28:51:30<7:27:59, 12.94s/it] 79%|███████▉ | 7923/10000 [28:51:43<7:27:11, 12.92s/it] {'loss': 0.0059, 'learning_rate': 1.0455e-05, 'epoch': 2.99} 79%|███████▉ | 7923/10000 [28:51:43<7:27:11, 12.92s/it] 79%|███████▉ | 7924/10000 [28:51:56<7:26:58, 12.92s/it] {'loss': 0.0047, 'learning_rate': 1.045e-05, 'epoch': 2.99} 79%|███████▉ | 7924/10000 [28:51:56<7:26:58, 12.92s/it] 79%|███████▉ | 7925/10000 [28:52:08<7:26:34, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.0445e-05, 'epoch': 2.99} 79%|███████▉ | 7925/10000 [28:52:08<7:26:34, 12.91s/it] 79%|███████▉ | 7926/10000 [28:52:21<7:26:21, 
12.91s/it] {'loss': 0.0036, 'learning_rate': 1.0440000000000002e-05, 'epoch': 2.99} 79%|███████▉ | 7926/10000 [28:52:21<7:26:21, 12.91s/it] 79%|███████▉ | 7927/10000 [28:52:34<7:25:59, 12.91s/it] {'loss': 0.0045, 'learning_rate': 1.0435000000000001e-05, 'epoch': 2.99} 79%|███████▉ | 7927/10000 [28:52:34<7:25:59, 12.91s/it] 79%|███████▉ | 7928/10000 [28:52:47<7:26:23, 12.93s/it] {'loss': 0.0047, 'learning_rate': 1.043e-05, 'epoch': 2.99} 79%|███████▉ | 7928/10000 [28:52:47<7:26:23, 12.93s/it] 79%|███████▉ | 7929/10000 [28:53:00<7:26:54, 12.95s/it] {'loss': 0.0046, 'learning_rate': 1.0425e-05, 'epoch': 2.99} 79%|███████▉ | 7929/10000 [28:53:00<7:26:54, 12.95s/it] 79%|███████▉ | 7930/10000 [28:53:13<7:27:21, 12.97s/it] {'loss': 0.0047, 'learning_rate': 1.042e-05, 'epoch': 2.99} 79%|███████▉ | 7930/10000 [28:53:13<7:27:21, 12.97s/it] 79%|███████▉ | 7931/10000 [28:53:26<7:26:14, 12.94s/it] {'loss': 0.0042, 'learning_rate': 1.0415000000000001e-05, 'epoch': 2.99} 79%|███████▉ | 7931/10000 [28:53:26<7:26:14, 12.94s/it] 79%|███████▉ | 7932/10000 [28:53:39<7:25:56, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.041e-05, 'epoch': 2.99} 79%|███████▉ | 7932/10000 [28:53:39<7:25:56, 12.94s/it] 79%|███████▉ | 7933/10000 [28:53:52<7:25:13, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.0405000000000001e-05, 'epoch': 2.99} 79%|███████▉ | 7933/10000 [28:53:52<7:25:13, 12.92s/it] 79%|███████▉ | 7934/10000 [28:54:05<7:24:54, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.04e-05, 'epoch': 2.99} 79%|███████▉ | 7934/10000 [28:54:05<7:24:54, 12.92s/it] 79%|███████▉ | 7935/10000 [28:54:18<7:24:21, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.0395000000000001e-05, 'epoch': 2.99} 79%|███████▉ | 7935/10000 [28:54:18<7:24:21, 12.91s/it] 79%|███████▉ | 7936/10000 [28:54:31<7:24:01, 12.91s/it] {'loss': 0.0056, 'learning_rate': 1.039e-05, 'epoch': 2.99} 79%|███████▉ | 7936/10000 [28:54:31<7:24:01, 12.91s/it] 79%|███████▉ | 7937/10000 [28:54:44<7:24:27, 12.93s/it] {'loss': 0.0052, 'learning_rate': 
1.0385e-05, 'epoch': 2.99} 79%|███████▉ | 7937/10000 [28:54:44<7:24:27, 12.93s/it] 79%|███████▉ | 7938/10000 [28:54:57<7:24:33, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.038e-05, 'epoch': 2.99} 79%|███████▉ | 7938/10000 [28:54:57<7:24:33, 12.94s/it] 79%|███████▉ | 7939/10000 [28:55:09<7:24:21, 12.94s/it] {'loss': 0.0039, 'learning_rate': 1.0375e-05, 'epoch': 2.99} 79%|███████▉ | 7939/10000 [28:55:10<7:24:21, 12.94s/it] 79%|███████▉ | 7940/10000 [28:55:22<7:23:44, 12.92s/it] {'loss': 0.0055, 'learning_rate': 1.037e-05, 'epoch': 2.99} 79%|███████▉ | 7940/10000 [28:55:22<7:23:44, 12.92s/it] 79%|███████▉ | 7941/10000 [28:55:35<7:23:22, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.0365000000000002e-05, 'epoch': 2.99} 79%|███████▉ | 7941/10000 [28:55:35<7:23:22, 12.92s/it] 79%|███████▉ | 7942/10000 [28:55:48<7:24:17, 12.95s/it] {'loss': 0.0042, 'learning_rate': 1.036e-05, 'epoch': 2.99} 79%|███████▉ | 7942/10000 [28:55:48<7:24:17, 12.95s/it] 79%|███████▉ | 7943/10000 [28:56:01<7:24:10, 12.96s/it] {'loss': 0.0054, 'learning_rate': 1.0355000000000002e-05, 'epoch': 2.99} 79%|███████▉ | 7943/10000 [28:56:01<7:24:10, 12.96s/it] 79%|███████▉ | 7944/10000 [28:56:14<7:22:46, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.035e-05, 'epoch': 2.99} 79%|███████▉ | 7944/10000 [28:56:14<7:22:46, 12.92s/it] 79%|███████▉ | 7945/10000 [28:56:27<7:22:20, 12.91s/it] {'loss': 0.0056, 'learning_rate': 1.0345e-05, 'epoch': 2.99} 79%|███████▉ | 7945/10000 [28:56:27<7:22:20, 12.91s/it] 79%|███████▉ | 7946/10000 [28:56:40<7:22:21, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.0340000000000001e-05, 'epoch': 2.99} 79%|███████▉ | 7946/10000 [28:56:40<7:22:21, 12.92s/it] 79%|███████▉ | 7947/10000 [28:56:53<7:23:12, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.0335e-05, 'epoch': 2.99} 79%|███████▉ | 7947/10000 [28:56:53<7:23:12, 12.95s/it] 79%|███████▉ | 7948/10000 [28:57:06<7:23:03, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.0330000000000001e-05, 'epoch': 2.99} 79%|███████▉ | 7948/10000 
[28:57:06<7:23:03, 12.96s/it] 79%|███████▉ | 7949/10000 [28:57:19<7:22:19, 12.94s/it] {'loss': 0.004, 'learning_rate': 1.0325e-05, 'epoch': 3.0} 79%|███████▉ | 7949/10000 [28:57:19<7:22:19, 12.94s/it] 80%|███████▉ | 7950/10000 [28:57:32<7:21:37, 12.93s/it] {'loss': 0.005, 'learning_rate': 1.0320000000000001e-05, 'epoch': 3.0} 80%|███████▉ | 7950/10000 [28:57:32<7:21:37, 12.93s/it] 80%|███████▉ | 7951/10000 [28:57:45<7:21:12, 12.92s/it] {'loss': 0.0073, 'learning_rate': 1.0315e-05, 'epoch': 3.0} 80%|███████▉ | 7951/10000 [28:57:45<7:21:12, 12.92s/it] 80%|███████▉ | 7952/10000 [28:57:58<7:20:21, 12.90s/it] {'loss': 0.005, 'learning_rate': 1.031e-05, 'epoch': 3.0} 80%|███████▉ | 7952/10000 [28:57:58<7:20:21, 12.90s/it] 80%|███████▉ | 7953/10000 [28:58:11<7:21:06, 12.93s/it] {'loss': 0.0045, 'learning_rate': 1.0305e-05, 'epoch': 3.0} 80%|███████▉ | 7953/10000 [28:58:11<7:21:06, 12.93s/it] 80%|███████▉ | 7954/10000 [28:58:23<7:21:05, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.03e-05, 'epoch': 3.0} 80%|███████▉ | 7954/10000 [28:58:23<7:21:05, 12.94s/it] 80%|███████▉ | 7955/10000 [28:58:36<7:20:41, 12.93s/it] {'loss': 0.0033, 'learning_rate': 1.0295e-05, 'epoch': 3.0} 80%|███████▉ | 7955/10000 [28:58:36<7:20:41, 12.93s/it] 80%|███████▉ | 7956/10000 [28:58:49<7:21:15, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.0290000000000001e-05, 'epoch': 3.0} 80%|███████▉ | 7956/10000 [28:58:49<7:21:15, 12.95s/it] 80%|███████▉ | 7957/10000 [28:59:02<7:21:53, 12.98s/it] {'loss': 0.0034, 'learning_rate': 1.0285e-05, 'epoch': 3.0} 80%|███████▉ | 7957/10000 [28:59:02<7:21:53, 12.98s/it] 80%|███████▉ | 7958/10000 [28:59:15<7:20:22, 12.94s/it] {'loss': 0.0055, 'learning_rate': 1.0280000000000002e-05, 'epoch': 3.0} 80%|███████▉ | 7958/10000 [28:59:15<7:20:22, 12.94s/it] 80%|███████▉ | 7959/10000 [28:59:28<7:19:45, 12.93s/it] {'loss': 0.0059, 'learning_rate': 1.0275e-05, 'epoch': 3.0} 80%|███████▉ | 7959/10000 [28:59:28<7:19:45, 12.93s/it] 80%|███████▉ | 7960/10000 [28:59:41<7:19:20, 
12.92s/it] {'loss': 0.0048, 'learning_rate': 1.027e-05, 'epoch': 3.0} 80%|███████▉ | 7960/10000 [28:59:41<7:19:20, 12.92s/it] 80%|███████▉ | 7961/10000 [28:59:54<7:19:14, 12.93s/it] {'loss': 0.0043, 'learning_rate': 1.0265e-05, 'epoch': 3.0} 80%|███████▉ | 7961/10000 [28:59:54<7:19:14, 12.93s/it] 80%|███████▉ | 7962/10000 [29:00:01<6:19:42, 11.18s/it] {'loss': 0.0057, 'learning_rate': 1.026e-05, 'epoch': 3.0} 80%|███████▉ | 7962/10000 [29:00:01<6:19:42, 11.18s/it] 80%|███████▉ | 7963/10000 [29:00:14<6:37:10, 11.70s/it] {'loss': 0.0042, 'learning_rate': 1.0255000000000001e-05, 'epoch': 3.0} 80%|███████▉ | 7963/10000 [29:00:14<6:37:10, 11.70s/it] 80%|███████▉ | 7964/10000 [29:00:27<6:49:42, 12.07s/it] {'loss': 0.0041, 'learning_rate': 1.025e-05, 'epoch': 3.0} 80%|███████▉ | 7964/10000 [29:00:27<6:49:42, 12.07s/it] 80%|███████▉ | 7965/10000 [29:00:40<6:58:09, 12.33s/it] {'loss': 0.0044, 'learning_rate': 1.0245000000000001e-05, 'epoch': 3.0} 80%|███████▉ | 7965/10000 [29:00:40<6:58:09, 12.33s/it] 80%|███████▉ | 7966/10000 [29:00:53<7:04:23, 12.52s/it] {'loss': 0.0041, 'learning_rate': 1.024e-05, 'epoch': 3.0} 80%|███████▉ | 7966/10000 [29:00:53<7:04:23, 12.52s/it] 80%|███████▉ | 7967/10000 [29:01:06<7:08:16, 12.64s/it] {'loss': 0.0031, 'learning_rate': 1.0235e-05, 'epoch': 3.0} 80%|███████▉ | 7967/10000 [29:01:06<7:08:16, 12.64s/it] 80%|███████▉ | 7968/10000 [29:01:19<7:11:26, 12.74s/it] {'loss': 0.0035, 'learning_rate': 1.023e-05, 'epoch': 3.0} 80%|███████▉ | 7968/10000 [29:01:19<7:11:26, 12.74s/it] 80%|███████▉ | 7969/10000 [29:01:32<7:12:47, 12.79s/it] {'loss': 0.0045, 'learning_rate': 1.0225e-05, 'epoch': 3.0} 80%|███████▉ | 7969/10000 [29:01:32<7:12:47, 12.79s/it] 80%|███████▉ | 7970/10000 [29:01:45<7:13:20, 12.81s/it] {'loss': 0.0036, 'learning_rate': 1.022e-05, 'epoch': 3.0} 80%|███████▉ | 7970/10000 [29:01:45<7:13:20, 12.81s/it] 80%|███████▉ | 7971/10000 [29:01:57<7:14:18, 12.84s/it] {'loss': 0.004, 'learning_rate': 1.0215000000000001e-05, 'epoch': 3.0} 
80%|███████▉ | 7971/10000 [29:01:57<7:14:18, 12.84s/it] 80%|███████▉ | 7972/10000 [29:02:10<7:14:09, 12.84s/it] {'loss': 0.0042, 'learning_rate': 1.021e-05, 'epoch': 3.0} 80%|███████▉ | 7972/10000 [29:02:10<7:14:09, 12.84s/it] 80%|███████▉ | 7973/10000 [29:02:23<7:15:13, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.0205000000000001e-05, 'epoch': 3.0} 80%|███████▉ | 7973/10000 [29:02:23<7:15:13, 12.88s/it] 80%|███████▉ | 7974/10000 [29:02:36<7:14:56, 12.88s/it] {'loss': 0.0032, 'learning_rate': 1.02e-05, 'epoch': 3.0} 80%|███████▉ | 7974/10000 [29:02:36<7:14:56, 12.88s/it] 80%|███████▉ | 7975/10000 [29:02:49<7:14:35, 12.88s/it] {'loss': 0.0042, 'learning_rate': 1.0195e-05, 'epoch': 3.0} 80%|███████▉ | 7975/10000 [29:02:49<7:14:35, 12.88s/it] 80%|███████▉ | 7976/10000 [29:03:02<7:15:12, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.019e-05, 'epoch': 3.01} 80%|███████▉ | 7976/10000 [29:03:02<7:15:12, 12.90s/it] 80%|███████▉ | 7977/10000 [29:03:15<7:15:03, 12.90s/it] {'loss': 0.0036, 'learning_rate': 1.0185e-05, 'epoch': 3.01} 80%|███████▉ | 7977/10000 [29:03:15<7:15:03, 12.90s/it] 80%|███████▉ | 7978/10000 [29:03:28<7:14:22, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.018e-05, 'epoch': 3.01} 80%|███████▉ | 7978/10000 [29:03:28<7:14:22, 12.89s/it] 80%|███████▉ | 7979/10000 [29:03:41<7:14:35, 12.90s/it] {'loss': 0.0034, 'learning_rate': 1.0175e-05, 'epoch': 3.01} 80%|███████▉ | 7979/10000 [29:03:41<7:14:35, 12.90s/it] 80%|███████▉ | 7980/10000 [29:03:54<7:14:44, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.0170000000000001e-05, 'epoch': 3.01} 80%|███████▉ | 7980/10000 [29:03:54<7:14:44, 12.91s/it] 80%|███████▉ | 7981/10000 [29:04:07<7:15:09, 12.93s/it] {'loss': 0.003, 'learning_rate': 1.0165e-05, 'epoch': 3.01} 80%|███████▉ | 7981/10000 [29:04:07<7:15:09, 12.93s/it] 80%|███████▉ | 7982/10000 [29:04:19<7:14:42, 12.93s/it] {'loss': 0.0036, 'learning_rate': 1.016e-05, 'epoch': 3.01} 80%|███████▉ | 7982/10000 [29:04:19<7:14:42, 12.93s/it] 80%|███████▉ | 7983/10000 
[29:04:32<7:14:29, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.0155e-05, 'epoch': 3.01} 80%|███████▉ | 7983/10000 [29:04:32<7:14:29, 12.92s/it] 80%|███████▉ | 7984/10000 [29:04:45<7:14:04, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.0150000000000001e-05, 'epoch': 3.01} 80%|███████▉ | 7984/10000 [29:04:45<7:14:04, 12.92s/it] 80%|███████▉ | 7985/10000 [29:04:58<7:13:50, 12.92s/it] {'loss': 0.0031, 'learning_rate': 1.0145e-05, 'epoch': 3.01} 80%|███████▉ | 7985/10000 [29:04:58<7:13:50, 12.92s/it] 80%|███████▉ | 7986/10000 [29:05:11<7:13:30, 12.91s/it] {'loss': 0.0029, 'learning_rate': 1.0140000000000001e-05, 'epoch': 3.01} 80%|███████▉ | 7986/10000 [29:05:11<7:13:30, 12.91s/it] 80%|███████▉ | 7987/10000 [29:05:24<7:13:11, 12.91s/it] {'loss': 0.0033, 'learning_rate': 1.0135e-05, 'epoch': 3.01} 80%|███████▉ | 7987/10000 [29:05:24<7:13:11, 12.91s/it] 80%|███████▉ | 7988/10000 [29:05:37<7:12:52, 12.91s/it] {'loss': 0.0035, 'learning_rate': 1.0130000000000001e-05, 'epoch': 3.01} 80%|███████▉ | 7988/10000 [29:05:37<7:12:52, 12.91s/it] 80%|███████▉ | 7989/10000 [29:05:50<7:12:58, 12.92s/it] {'loss': 0.0032, 'learning_rate': 1.0125e-05, 'epoch': 3.01} 80%|███████▉ | 7989/10000 [29:05:50<7:12:58, 12.92s/it] 80%|███████▉ | 7990/10000 [29:06:03<7:12:49, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.012e-05, 'epoch': 3.01} 80%|███████▉ | 7990/10000 [29:06:03<7:12:49, 12.92s/it] 80%|███████▉ | 7991/10000 [29:06:16<7:12:09, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.0115e-05, 'epoch': 3.01} 80%|███████▉ | 7991/10000 [29:06:16<7:12:09, 12.91s/it] 80%|███████▉ | 7992/10000 [29:06:29<7:11:42, 12.90s/it] {'loss': 0.0049, 'learning_rate': 1.011e-05, 'epoch': 3.01} 80%|███████▉ | 7992/10000 [29:06:29<7:11:42, 12.90s/it] 80%|███████▉ | 7993/10000 [29:06:41<7:11:26, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.0105e-05, 'epoch': 3.01} 80%|███████▉ | 7993/10000 [29:06:41<7:11:26, 12.90s/it] 80%|███████▉ | 7994/10000 [29:06:54<7:11:14, 12.90s/it] {'loss': 0.0053, 
'learning_rate': 1.0100000000000002e-05, 'epoch': 3.01} 80%|███████▉ | 7994/10000 [29:06:54<7:11:14, 12.90s/it] 80%|███████▉ | 7995/10000 [29:07:07<7:11:28, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.0095e-05, 'epoch': 3.01} 80%|███████▉ | 7995/10000 [29:07:07<7:11:28, 12.91s/it] 80%|███████▉ | 7996/10000 [29:07:20<7:11:18, 12.91s/it] {'loss': 0.0036, 'learning_rate': 1.0090000000000002e-05, 'epoch': 3.01} 80%|███████▉ | 7996/10000 [29:07:20<7:11:18, 12.91s/it] 80%|███████▉ | 7997/10000 [29:07:33<7:11:26, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.0085e-05, 'epoch': 3.01} 80%|███████▉ | 7997/10000 [29:07:33<7:11:26, 12.92s/it] 80%|███████▉ | 7998/10000 [29:07:46<7:10:25, 12.90s/it] {'loss': 0.0033, 'learning_rate': 1.008e-05, 'epoch': 3.01} 80%|███████▉ | 7998/10000 [29:07:46<7:10:25, 12.90s/it] 80%|███████▉ | 7999/10000 [29:07:59<7:09:52, 12.89s/it] {'loss': 0.0049, 'learning_rate': 1.0075000000000001e-05, 'epoch': 3.01} 80%|███████▉ | 7999/10000 [29:07:59<7:09:52, 12.89s/it] 80%|████████ | 8000/10000 [29:08:12<7:09:32, 12.89s/it] {'loss': 0.0033, 'learning_rate': 1.007e-05, 'epoch': 3.01} 80%|████████ | 8000/10000 [29:08:12<7:09:32, 12.89s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-07 01:33:10,341 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/config.json [INFO|configuration_utils.py:364] 2024-11-07 01:33:10,343 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-07 01:34:00,588 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-07 01:34:00,591 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-07 01:34:00,592 >> Special tokens file saved in 
output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/special_tokens_map.json [2024-11-07 01:34:00,604] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step8000 is about to be saved! [2024-11-07 01:34:00,653] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/global_step8000/mp_rank_00_model_states.pt [2024-11-07 01:34:00,654] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/global_step8000/mp_rank_00_model_states.pt... [2024-11-07 01:34:52,720] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/global_step8000/mp_rank_00_model_states.pt. [2024-11-07 01:34:52,826] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-07 01:36:28,830] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-07 01:36:28,962] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-07 01:36:28,962] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step8000 is ready now! 
80%|████████ | 8001/10000 [29:11:44<40:21:35, 72.68s/it] {'loss': 0.0037, 'learning_rate': 1.0065000000000001e-05, 'epoch': 3.01} 80%|████████ | 8001/10000 [29:11:44<40:21:35, 72.68s/it] 80%|████████ | 8002/10000 [29:11:57<30:20:52, 54.68s/it] {'loss': 0.0041, 'learning_rate': 1.006e-05, 'epoch': 3.02} 80%|████████ | 8002/10000 [29:11:57<30:20:52, 54.68s/it] 80%|████████ | 8003/10000 [29:12:09<23:20:43, 42.08s/it] {'loss': 0.0044, 'learning_rate': 1.0055000000000001e-05, 'epoch': 3.02} 80%|████████ | 8003/10000 [29:12:09<23:20:43, 42.08s/it] 80%|████████ | 8004/10000 [29:12:22<18:27:29, 33.29s/it] {'loss': 0.0035, 'learning_rate': 1.005e-05, 'epoch': 3.02} 80%|████████ | 8004/10000 [29:12:22<18:27:29, 33.29s/it] 80%|████████ | 8005/10000 [29:12:35<15:02:09, 27.13s/it] {'loss': 0.0033, 'learning_rate': 1.0045e-05, 'epoch': 3.02} 80%|████████ | 8005/10000 [29:12:35<15:02:09, 27.13s/it] 80%|████████ | 8006/10000 [29:12:48<12:39:27, 22.85s/it] {'loss': 0.0038, 'learning_rate': 1.004e-05, 'epoch': 3.02} 80%|████████ | 8006/10000 [29:12:48<12:39:27, 22.85s/it] 80%|████████ | 8007/10000 [29:13:00<10:58:33, 19.83s/it] {'loss': 0.0044, 'learning_rate': 1.0035e-05, 'epoch': 3.02} 80%|████████ | 8007/10000 [29:13:01<10:58:33, 19.83s/it] 80%|████████ | 8008/10000 [29:13:13<9:48:25, 17.72s/it] {'loss': 0.0031, 'learning_rate': 1.003e-05, 'epoch': 3.02} 80%|████████ | 8008/10000 [29:13:13<9:48:25, 17.72s/it] 80%|████████ | 8009/10000 [29:13:26<8:58:38, 16.23s/it] {'loss': 0.0038, 'learning_rate': 1.0025000000000001e-05, 'epoch': 3.02} 80%|████████ | 8009/10000 [29:13:26<8:58:38, 16.23s/it] 80%|████████ | 8010/10000 [29:13:39<8:24:30, 15.21s/it] {'loss': 0.0042, 'learning_rate': 1.002e-05, 'epoch': 3.02} 80%|████████ | 8010/10000 [29:13:39<8:24:30, 15.21s/it] 80%|████████ | 8011/10000 [29:13:52<8:00:28, 14.49s/it] {'loss': 0.0042, 'learning_rate': 1.0015000000000002e-05, 'epoch': 3.02} 80%|████████ | 8011/10000 [29:13:52<8:00:28, 14.49s/it] 80%|████████ | 8012/10000 
[29:14:05<7:43:58, 14.00s/it] {'loss': 0.0055, 'learning_rate': 1.001e-05, 'epoch': 3.02} 80%|████████ | 8012/10000 [29:14:05<7:43:58, 14.00s/it] 80%|████████ | 8013/10000 [29:14:18<7:33:08, 13.68s/it] {'loss': 0.0039, 'learning_rate': 1.0005e-05, 'epoch': 3.02} 80%|████████ | 8013/10000 [29:14:18<7:33:08, 13.68s/it] 80%|████████ | 8014/10000 [29:14:30<7:25:46, 13.47s/it] {'loss': 0.0033, 'learning_rate': 1e-05, 'epoch': 3.02} 80%|████████ | 8014/10000 [29:14:31<7:25:46, 13.47s/it] 80%|████████ | 8015/10000 [29:14:43<7:20:05, 13.30s/it] {'loss': 0.0039, 'learning_rate': 9.995e-06, 'epoch': 3.02} 80%|████████ | 8015/10000 [29:14:43<7:20:05, 13.30s/it] 80%|████████ | 8016/10000 [29:14:56<7:17:03, 13.22s/it] {'loss': 0.0035, 'learning_rate': 9.990000000000001e-06, 'epoch': 3.02} 80%|████████ | 8016/10000 [29:14:56<7:17:03, 13.22s/it] 80%|████████ | 8017/10000 [29:15:09<7:13:52, 13.13s/it] {'loss': 0.0033, 'learning_rate': 9.985e-06, 'epoch': 3.02} 80%|████████ | 8017/10000 [29:15:09<7:13:52, 13.13s/it] 80%|████████ | 8018/10000 [29:15:22<7:12:03, 13.08s/it] {'loss': 0.0033, 'learning_rate': 9.980000000000001e-06, 'epoch': 3.02} 80%|████████ | 8018/10000 [29:15:22<7:12:03, 13.08s/it] 80%|████████ | 8019/10000 [29:15:35<7:10:18, 13.03s/it] {'loss': 0.0049, 'learning_rate': 9.975e-06, 'epoch': 3.02} 80%|████████ | 8019/10000 [29:15:35<7:10:18, 13.03s/it] 80%|████████ | 8020/10000 [29:15:48<7:08:46, 12.99s/it] {'loss': 0.0039, 'learning_rate': 9.97e-06, 'epoch': 3.02} 80%|████████ | 8020/10000 [29:15:48<7:08:46, 12.99s/it] 80%|████████ | 8021/10000 [29:16:01<7:07:21, 12.96s/it] {'loss': 0.0032, 'learning_rate': 9.965e-06, 'epoch': 3.02} 80%|████████ | 8021/10000 [29:16:01<7:07:21, 12.96s/it] 80%|████████ | 8022/10000 [29:16:14<7:06:33, 12.94s/it] {'loss': 0.0027, 'learning_rate': 9.96e-06, 'epoch': 3.02} 80%|████████ | 8022/10000 [29:16:14<7:06:33, 12.94s/it] 80%|████████ | 8023/10000 [29:16:27<7:06:25, 12.94s/it] {'loss': 0.0038, 'learning_rate': 9.955e-06, 'epoch': 
3.02} 80%|████████ | 8023/10000 [29:16:27<7:06:25, 12.94s/it] 80%|████████ | 8024/10000 [29:16:40<7:06:27, 12.95s/it] {'loss': 0.0027, 'learning_rate': 9.950000000000001e-06, 'epoch': 3.02} 80%|████████ | 8024/10000 [29:16:40<7:06:27, 12.95s/it] 80%|████████ | 8025/10000 [29:16:53<7:06:59, 12.97s/it] {'loss': 0.0035, 'learning_rate': 9.945e-06, 'epoch': 3.02} 80%|████████ | 8025/10000 [29:16:53<7:06:59, 12.97s/it] 80%|████████ | 8026/10000 [29:17:06<7:07:55, 13.01s/it] {'loss': 0.0035, 'learning_rate': 9.940000000000001e-06, 'epoch': 3.02} 80%|████████ | 8026/10000 [29:17:06<7:07:55, 13.01s/it] 80%|████████ | 8027/10000 [29:17:19<7:07:57, 13.01s/it] {'loss': 0.0032, 'learning_rate': 9.935e-06, 'epoch': 3.02} 80%|████████ | 8027/10000 [29:17:19<7:07:57, 13.01s/it] 80%|████████ | 8028/10000 [29:17:32<7:07:26, 13.01s/it] {'loss': 0.0042, 'learning_rate': 9.93e-06, 'epoch': 3.02} 80%|████████ | 8028/10000 [29:17:32<7:07:26, 13.01s/it] 80%|████████ | 8029/10000 [29:17:45<7:07:04, 13.00s/it] {'loss': 0.004, 'learning_rate': 9.925e-06, 'epoch': 3.03} 80%|████████ | 8029/10000 [29:17:45<7:07:04, 13.00s/it] 80%|████████ | 8030/10000 [29:17:58<7:07:03, 13.01s/it] {'loss': 0.0032, 'learning_rate': 9.92e-06, 'epoch': 3.03} 80%|████████ | 8030/10000 [29:17:58<7:07:03, 13.01s/it] 80%|████████ | 8031/10000 [29:18:11<7:06:25, 12.99s/it] {'loss': 0.004, 'learning_rate': 9.915e-06, 'epoch': 3.03} 80%|████████ | 8031/10000 [29:18:11<7:06:25, 12.99s/it][2024-11-07 01:43:21,097] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 80%|████████ | 8032/10000 [29:18:23<6:52:53, 12.59s/it] {'loss': 0.0028, 'learning_rate': 9.915e-06, 'epoch': 3.03} 80%|████████ | 8032/10000 [29:18:23<6:52:53, 12.59s/it][2024-11-07 01:43:32,757] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 80%|████████ | 8033/10000 [29:18:34<6:43:33, 12.31s/it] {'loss': 0.0044, 'learning_rate': 9.915e-06, 'epoch': 3.03} 80%|████████ | 8033/10000 [29:18:34<6:43:33, 12.31s/it] 80%|████████ | 8034/10000 [29:18:47<6:50:28, 12.53s/it] {'loss': 0.0041, 'learning_rate': 9.91e-06, 'epoch': 3.03} 80%|████████ | 8034/10000 [29:18:47<6:50:28, 12.53s/it] 80%|████████ | 8035/10000 [29:19:00<6:54:52, 12.67s/it] {'loss': 0.0033, 'learning_rate': 9.905000000000001e-06, 'epoch': 3.03} 80%|████████ | 8035/10000 [29:19:00<6:54:52, 12.67s/it] 80%|████████ | 8036/10000 [29:19:13<6:57:27, 12.75s/it] {'loss': 0.0045, 'learning_rate': 9.900000000000002e-06, 'epoch': 3.03} 80%|████████ | 8036/10000 [29:19:13<6:57:27, 12.75s/it] 80%|████████ | 8037/10000 [29:19:26<6:59:09, 12.81s/it] {'loss': 0.0043, 'learning_rate': 9.895e-06, 'epoch': 3.03} 80%|████████ | 8037/10000 [29:19:26<6:59:09, 12.81s/it] 80%|████████ | 8038/10000 [29:19:39<6:59:17, 12.82s/it] {'loss': 0.0045, 'learning_rate': 9.89e-06, 'epoch': 3.03} 80%|████████ | 8038/10000 [29:19:39<6:59:17, 12.82s/it] 80%|████████ | 8039/10000 [29:19:52<6:59:15, 12.83s/it] {'loss': 0.0042, 'learning_rate': 9.885e-06, 'epoch': 3.03} 80%|████████ | 8039/10000 [29:19:52<6:59:15, 12.83s/it] 80%|████████ | 8040/10000 [29:20:05<6:59:52, 12.85s/it] {'loss': 0.0034, 'learning_rate': 9.88e-06, 'epoch': 3.03} 80%|████████ | 8040/10000 [29:20:05<6:59:52, 12.85s/it] 80%|████████ | 8041/10000 [29:20:18<7:00:08, 12.87s/it] {'loss': 0.0029, 'learning_rate': 9.875000000000001e-06, 'epoch': 3.03} 80%|████████ | 8041/10000 [29:20:18<7:00:08, 12.87s/it] 80%|████████ | 8042/10000 [29:20:31<7:00:30, 12.89s/it] {'loss': 0.0032, 'learning_rate': 9.87e-06, 'epoch': 3.03} 80%|████████ | 8042/10000 [29:20:31<7:00:30, 12.89s/it] 80%|████████ | 8043/10000 [29:20:44<7:00:49, 12.90s/it] {'loss': 0.0032, 'learning_rate': 9.865000000000001e-06, 'epoch': 3.03} 80%|████████ | 8043/10000 [29:20:44<7:00:49, 12.90s/it] 80%|████████ | 
8044/10000 [29:20:56<7:00:12, 12.89s/it] {'loss': 0.0039, 'learning_rate': 9.86e-06, 'epoch': 3.03} 80%|████████ | 8044/10000 [29:20:56<7:00:12, 12.89s/it] 80%|████████ | 8045/10000 [29:21:09<7:00:26, 12.90s/it] {'loss': 0.0038, 'learning_rate': 9.855e-06, 'epoch': 3.03} 80%|████████ | 8045/10000 [29:21:09<7:00:26, 12.90s/it] 80%|████████ | 8046/10000 [29:21:22<7:00:33, 12.91s/it] {'loss': 0.0031, 'learning_rate': 9.85e-06, 'epoch': 3.03} 80%|████████ | 8046/10000 [29:21:22<7:00:33, 12.91s/it] 80%|████████ | 8047/10000 [29:21:35<7:00:51, 12.93s/it] {'loss': 0.0037, 'learning_rate': 9.845e-06, 'epoch': 3.03} 80%|████████ | 8047/10000 [29:21:35<7:00:51, 12.93s/it] 80%|████████ | 8048/10000 [29:21:48<7:00:43, 12.93s/it] {'loss': 0.0033, 'learning_rate': 9.84e-06, 'epoch': 3.03} 80%|████████ | 8048/10000 [29:21:48<7:00:43, 12.93s/it] 80%|████████ | 8049/10000 [29:22:01<6:59:34, 12.90s/it] {'loss': 0.0042, 'learning_rate': 9.835000000000002e-06, 'epoch': 3.03} 80%|████████ | 8049/10000 [29:22:01<6:59:34, 12.90s/it] 80%|████████ | 8050/10000 [29:22:14<6:59:08, 12.90s/it] {'loss': 0.003, 'learning_rate': 9.83e-06, 'epoch': 3.03} 80%|████████ | 8050/10000 [29:22:14<6:59:08, 12.90s/it] 81%|████████ | 8051/10000 [29:22:27<6:58:02, 12.87s/it] {'loss': 0.0044, 'learning_rate': 9.825000000000002e-06, 'epoch': 3.03} 81%|████████ | 8051/10000 [29:22:27<6:58:02, 12.87s/it] 81%|████████ | 8052/10000 [29:22:40<6:57:53, 12.87s/it] {'loss': 0.0031, 'learning_rate': 9.820000000000001e-06, 'epoch': 3.03} 81%|████████ | 8052/10000 [29:22:40<6:57:53, 12.87s/it] 81%|████████ | 8053/10000 [29:22:52<6:58:14, 12.89s/it] {'loss': 0.0031, 'learning_rate': 9.815e-06, 'epoch': 3.03} 81%|████████ | 8053/10000 [29:22:53<6:58:14, 12.89s/it] 81%|████████ | 8054/10000 [29:23:05<6:58:07, 12.89s/it] {'loss': 0.004, 'learning_rate': 9.810000000000001e-06, 'epoch': 3.03} 81%|████████ | 8054/10000 [29:23:05<6:58:07, 12.89s/it] 81%|████████ | 8055/10000 [29:23:18<6:58:27, 12.91s/it] {'loss': 0.0035, 
'learning_rate': 9.805e-06, 'epoch': 3.04} 81%|████████ | 8055/10000 [29:23:18<6:58:27, 12.91s/it] 81%|████████ | 8056/10000 [29:23:31<6:58:47, 12.93s/it] {'loss': 0.0043, 'learning_rate': 9.800000000000001e-06, 'epoch': 3.04} 81%|████████ | 8056/10000 [29:23:31<6:58:47, 12.93s/it] 81%|████████ | 8057/10000 [29:23:44<6:59:13, 12.95s/it] {'loss': 0.003, 'learning_rate': 9.795e-06, 'epoch': 3.04} 81%|████████ | 8057/10000 [29:23:44<6:59:13, 12.95s/it] 81%|████████ | 8058/10000 [29:23:57<7:00:02, 12.98s/it] {'loss': 0.0036, 'learning_rate': 9.790000000000001e-06, 'epoch': 3.04} 81%|████████ | 8058/10000 [29:23:57<7:00:02, 12.98s/it] 81%|████████ | 8059/10000 [29:24:10<7:00:28, 13.00s/it] {'loss': 0.0036, 'learning_rate': 9.785e-06, 'epoch': 3.04} 81%|████████ | 8059/10000 [29:24:10<7:00:28, 13.00s/it] 81%|████████ | 8060/10000 [29:24:23<6:59:55, 12.99s/it] {'loss': 0.0048, 'learning_rate': 9.78e-06, 'epoch': 3.04} 81%|████████ | 8060/10000 [29:24:23<6:59:55, 12.99s/it] 81%|████████ | 8061/10000 [29:24:36<6:58:51, 12.96s/it] {'loss': 0.0027, 'learning_rate': 9.775e-06, 'epoch': 3.04} 81%|████████ | 8061/10000 [29:24:36<6:58:51, 12.96s/it] 81%|████████ | 8062/10000 [29:24:49<6:57:22, 12.92s/it] {'loss': 0.0047, 'learning_rate': 9.77e-06, 'epoch': 3.04} 81%|████████ | 8062/10000 [29:24:49<6:57:22, 12.92s/it] 81%|████████ | 8063/10000 [29:25:02<6:56:32, 12.90s/it] {'loss': 0.0038, 'learning_rate': 9.765e-06, 'epoch': 3.04} 81%|████████ | 8063/10000 [29:25:02<6:56:32, 12.90s/it] 81%|████████ | 8064/10000 [29:25:15<6:55:54, 12.89s/it] {'loss': 0.0043, 'learning_rate': 9.760000000000001e-06, 'epoch': 3.04} 81%|████████ | 8064/10000 [29:25:15<6:55:54, 12.89s/it] 81%|████████ | 8065/10000 [29:25:28<6:55:55, 12.90s/it] {'loss': 0.0042, 'learning_rate': 9.755e-06, 'epoch': 3.04} 81%|████████ | 8065/10000 [29:25:28<6:55:55, 12.90s/it] 81%|████████ | 8066/10000 [29:25:41<6:55:31, 12.89s/it] {'loss': 0.0044, 'learning_rate': 9.750000000000002e-06, 'epoch': 3.04} 81%|████████ | 
8066/10000 [29:25:41<6:55:31, 12.89s/it] 81%|████████ | 8067/10000 [29:25:54<6:55:56, 12.91s/it] {'loss': 0.0029, 'learning_rate': 9.745e-06, 'epoch': 3.04} 81%|████████ | 8067/10000 [29:25:54<6:55:56, 12.91s/it] 81%|████████ | 8068/10000 [29:26:06<6:55:44, 12.91s/it] {'loss': 0.0045, 'learning_rate': 9.74e-06, 'epoch': 3.04} 81%|████████ | 8068/10000 [29:26:07<6:55:44, 12.91s/it] 81%|████████ | 8069/10000 [29:26:19<6:56:18, 12.94s/it] {'loss': 0.0042, 'learning_rate': 9.735e-06, 'epoch': 3.04} 81%|████████ | 8069/10000 [29:26:19<6:56:18, 12.94s/it] 81%|████████ | 8070/10000 [29:26:32<6:56:12, 12.94s/it] {'loss': 0.0034, 'learning_rate': 9.73e-06, 'epoch': 3.04} 81%|████████ | 8070/10000 [29:26:32<6:56:12, 12.94s/it] 81%|████████ | 8071/10000 [29:26:45<6:55:07, 12.91s/it] {'loss': 0.0031, 'learning_rate': 9.725000000000001e-06, 'epoch': 3.04} 81%|████████ | 8071/10000 [29:26:45<6:55:07, 12.91s/it] 81%|████████ | 8072/10000 [29:26:58<6:55:04, 12.92s/it] {'loss': 0.0048, 'learning_rate': 9.72e-06, 'epoch': 3.04} 81%|████████ | 8072/10000 [29:26:58<6:55:04, 12.92s/it] 81%|████████ | 8073/10000 [29:27:11<6:55:03, 12.92s/it] {'loss': 0.0046, 'learning_rate': 9.715000000000001e-06, 'epoch': 3.04} 81%|████████ | 8073/10000 [29:27:11<6:55:03, 12.92s/it] 81%|████████ | 8074/10000 [29:27:24<6:55:01, 12.93s/it] {'loss': 0.0033, 'learning_rate': 9.71e-06, 'epoch': 3.04} 81%|████████ | 8074/10000 [29:27:24<6:55:01, 12.93s/it] 81%|████████ | 8075/10000 [29:27:37<6:55:44, 12.96s/it] {'loss': 0.0041, 'learning_rate': 9.705e-06, 'epoch': 3.04} 81%|████████ | 8075/10000 [29:27:37<6:55:44, 12.96s/it] 81%|████████ | 8076/10000 [29:27:50<6:54:53, 12.94s/it] {'loss': 0.0035, 'learning_rate': 9.7e-06, 'epoch': 3.04} 81%|████████ | 8076/10000 [29:27:50<6:54:53, 12.94s/it] 81%|████████ | 8077/10000 [29:28:03<6:54:15, 12.93s/it] {'loss': 0.0042, 'learning_rate': 9.695e-06, 'epoch': 3.04} 81%|████████ | 8077/10000 [29:28:03<6:54:15, 12.93s/it] 81%|████████ | 8078/10000 [29:28:16<6:54:57, 
12.95s/it] {'loss': 0.0035, 'learning_rate': 9.69e-06, 'epoch': 3.04} 81%|████████ | 8078/10000 [29:28:16<6:54:57, 12.95s/it] 81%|████████ | 8079/10000 [29:28:29<6:54:26, 12.94s/it] {'loss': 0.0023, 'learning_rate': 9.685000000000001e-06, 'epoch': 3.04} 81%|████████ | 8079/10000 [29:28:29<6:54:26, 12.94s/it] 81%|████████ | 8080/10000 [29:28:42<6:54:00, 12.94s/it] {'loss': 0.0039, 'learning_rate': 9.68e-06, 'epoch': 3.04} 81%|████████ | 8080/10000 [29:28:42<6:54:00, 12.94s/it] 81%|████████ | 8081/10000 [29:28:55<6:53:23, 12.93s/it] {'loss': 0.0034, 'learning_rate': 9.675000000000001e-06, 'epoch': 3.04} 81%|████████ | 8081/10000 [29:28:55<6:53:23, 12.93s/it] 81%|████████ | 8082/10000 [29:29:08<6:53:34, 12.94s/it] {'loss': 0.0026, 'learning_rate': 9.67e-06, 'epoch': 3.05} 81%|████████ | 8082/10000 [29:29:08<6:53:34, 12.94s/it] 81%|████████ | 8083/10000 [29:29:21<6:53:46, 12.95s/it] {'loss': 0.0034, 'learning_rate': 9.665e-06, 'epoch': 3.05} 81%|████████ | 8083/10000 [29:29:21<6:53:46, 12.95s/it] 81%|████████ | 8084/10000 [29:29:34<6:53:29, 12.95s/it] {'loss': 0.0033, 'learning_rate': 9.66e-06, 'epoch': 3.05} 81%|████████ | 8084/10000 [29:29:34<6:53:29, 12.95s/it] 81%|████████ | 8085/10000 [29:29:46<6:52:39, 12.93s/it] {'loss': 0.005, 'learning_rate': 9.655e-06, 'epoch': 3.05} 81%|████████ | 8085/10000 [29:29:46<6:52:39, 12.93s/it] 81%|████████ | 8086/10000 [29:29:59<6:52:27, 12.93s/it] {'loss': 0.0034, 'learning_rate': 9.65e-06, 'epoch': 3.05} 81%|████████ | 8086/10000 [29:29:59<6:52:27, 12.93s/it] 81%|████████ | 8087/10000 [29:30:12<6:52:18, 12.93s/it] {'loss': 0.0044, 'learning_rate': 9.645e-06, 'epoch': 3.05} 81%|████████ | 8087/10000 [29:30:12<6:52:18, 12.93s/it] 81%|████████ | 8088/10000 [29:30:25<6:52:14, 12.94s/it] {'loss': 0.0034, 'learning_rate': 9.640000000000001e-06, 'epoch': 3.05} 81%|████████ | 8088/10000 [29:30:25<6:52:14, 12.94s/it] 81%|████████ | 8089/10000 [29:30:38<6:51:38, 12.92s/it] {'loss': 0.0045, 'learning_rate': 9.635000000000002e-06, 'epoch': 
3.05} 81%|████████ | 8089/10000 [29:30:38<6:51:38, 12.92s/it] 81%|████████ | 8090/10000 [29:30:51<6:50:30, 12.90s/it] {'loss': 0.0047, 'learning_rate': 9.630000000000001e-06, 'epoch': 3.05} 81%|████████ | 8090/10000 [29:30:51<6:50:30, 12.90s/it] 81%|████████ | 8091/10000 [29:31:04<6:50:13, 12.89s/it] {'loss': 0.0034, 'learning_rate': 9.625e-06, 'epoch': 3.05} 81%|████████ | 8091/10000 [29:31:04<6:50:13, 12.89s/it] 81%|████████ | 8092/10000 [29:31:17<6:49:39, 12.88s/it] {'loss': 0.005, 'learning_rate': 9.62e-06, 'epoch': 3.05} 81%|████████ | 8092/10000 [29:31:17<6:49:39, 12.88s/it] 81%|████████ | 8093/10000 [29:31:30<6:49:17, 12.88s/it] {'loss': 0.0042, 'learning_rate': 9.615e-06, 'epoch': 3.05} 81%|████████ | 8093/10000 [29:31:30<6:49:17, 12.88s/it] 81%|████████ | 8094/10000 [29:31:43<6:49:57, 12.91s/it] {'loss': 0.0038, 'learning_rate': 9.610000000000001e-06, 'epoch': 3.05} 81%|████████ | 8094/10000 [29:31:43<6:49:57, 12.91s/it] 81%|████████ | 8095/10000 [29:31:56<6:51:13, 12.95s/it] {'loss': 0.0043, 'learning_rate': 9.605e-06, 'epoch': 3.05} 81%|████████ | 8095/10000 [29:31:56<6:51:13, 12.95s/it] 81%|████████ | 8096/10000 [29:32:09<6:50:40, 12.94s/it] {'loss': 0.0034, 'learning_rate': 9.600000000000001e-06, 'epoch': 3.05} 81%|████████ | 8096/10000 [29:32:09<6:50:40, 12.94s/it] 81%|████████ | 8097/10000 [29:32:21<6:50:00, 12.93s/it] {'loss': 0.004, 'learning_rate': 9.595e-06, 'epoch': 3.05} 81%|████████ | 8097/10000 [29:32:21<6:50:00, 12.93s/it] 81%|████████ | 8098/10000 [29:32:34<6:49:16, 12.91s/it] {'loss': 0.0041, 'learning_rate': 9.59e-06, 'epoch': 3.05} 81%|████████ | 8098/10000 [29:32:34<6:49:16, 12.91s/it] 81%|████████ | 8099/10000 [29:32:47<6:49:39, 12.93s/it] {'loss': 0.0035, 'learning_rate': 9.585e-06, 'epoch': 3.05} 81%|████████ | 8099/10000 [29:32:47<6:49:39, 12.93s/it] 81%|████████ | 8100/10000 [29:33:00<6:49:41, 12.94s/it] {'loss': 0.003, 'learning_rate': 9.58e-06, 'epoch': 3.05} 81%|████████ | 8100/10000 [29:33:00<6:49:41, 12.94s/it] 81%|████████ | 
8101/10000 [29:33:13<6:49:24, 12.94s/it] {'loss': 0.0026, 'learning_rate': 9.575e-06, 'epoch': 3.05} 81%|████████ | 8101/10000 [29:33:13<6:49:24, 12.94s/it] 81%|████████ | 8102/10000 [29:33:26<6:48:49, 12.92s/it] {'loss': 0.0035, 'learning_rate': 9.57e-06, 'epoch': 3.05} 81%|████████ | 8102/10000 [29:33:26<6:48:49, 12.92s/it] 81%|████████ | 8103/10000 [29:33:39<6:48:14, 12.91s/it] {'loss': 0.0032, 'learning_rate': 9.565e-06, 'epoch': 3.05} 81%|████████ | 8103/10000 [29:33:39<6:48:14, 12.91s/it] 81%|████████ | 8104/10000 [29:33:52<6:48:27, 12.93s/it] {'loss': 0.0026, 'learning_rate': 9.560000000000002e-06, 'epoch': 3.05} 81%|████████ | 8104/10000 [29:33:52<6:48:27, 12.93s/it] 81%|████████ | 8105/10000 [29:34:05<6:48:05, 12.92s/it] {'loss': 0.0053, 'learning_rate': 9.555e-06, 'epoch': 3.05} 81%|████████ | 8105/10000 [29:34:05<6:48:05, 12.92s/it] 81%|████████ | 8106/10000 [29:34:18<6:47:02, 12.89s/it] {'loss': 0.0038, 'learning_rate': 9.55e-06, 'epoch': 3.05} 81%|████████ | 8106/10000 [29:34:18<6:47:02, 12.89s/it] 81%|████████ | 8107/10000 [29:34:31<6:47:15, 12.91s/it] {'loss': 0.0031, 'learning_rate': 9.545e-06, 'epoch': 3.05} 81%|████████ | 8107/10000 [29:34:31<6:47:15, 12.91s/it] 81%|████████ | 8108/10000 [29:34:43<6:47:09, 12.91s/it] {'loss': 0.0033, 'learning_rate': 9.54e-06, 'epoch': 3.06} 81%|████████ | 8108/10000 [29:34:43<6:47:09, 12.91s/it] 81%|████████ | 8109/10000 [29:34:56<6:47:02, 12.92s/it] {'loss': 0.003, 'learning_rate': 9.535000000000001e-06, 'epoch': 3.06} 81%|████████ | 8109/10000 [29:34:56<6:47:02, 12.92s/it] 81%|████████ | 8110/10000 [29:35:09<6:46:29, 12.90s/it] {'loss': 0.0031, 'learning_rate': 9.53e-06, 'epoch': 3.06} 81%|████████ | 8110/10000 [29:35:09<6:46:29, 12.90s/it] 81%|████████ | 8111/10000 [29:35:22<6:46:41, 12.92s/it] {'loss': 0.0035, 'learning_rate': 9.525000000000001e-06, 'epoch': 3.06} 81%|████████ | 8111/10000 [29:35:22<6:46:41, 12.92s/it] 81%|████████ | 8112/10000 [29:35:35<6:47:08, 12.94s/it] {'loss': 0.0028, 'learning_rate': 
9.52e-06, 'epoch': 3.06} 81%|████████ | 8112/10000 [29:35:35<6:47:08, 12.94s/it] 81%|████████ | 8113/10000 [29:35:48<6:47:36, 12.96s/it] {'loss': 0.0031, 'learning_rate': 9.515e-06, 'epoch': 3.06} 81%|████████ | 8113/10000 [29:35:48<6:47:36, 12.96s/it] 81%|████████ | 8114/10000 [29:36:01<6:47:21, 12.96s/it] {'loss': 0.0052, 'learning_rate': 9.51e-06, 'epoch': 3.06} 81%|████████ | 8114/10000 [29:36:01<6:47:21, 12.96s/it] 81%|████████ | 8115/10000 [29:36:14<6:46:52, 12.95s/it] {'loss': 0.0029, 'learning_rate': 9.505e-06, 'epoch': 3.06} 81%|████████ | 8115/10000 [29:36:14<6:46:52, 12.95s/it] 81%|████████ | 8116/10000 [29:36:27<6:46:07, 12.93s/it] {'loss': 0.003, 'learning_rate': 9.5e-06, 'epoch': 3.06} 81%|████████ | 8116/10000 [29:36:27<6:46:07, 12.93s/it] 81%|████████ | 8117/10000 [29:36:40<6:45:27, 12.92s/it] {'loss': 0.0049, 'learning_rate': 9.495000000000001e-06, 'epoch': 3.06} 81%|████████ | 8117/10000 [29:36:40<6:45:27, 12.92s/it] 81%|████████ | 8118/10000 [29:36:53<6:45:03, 12.91s/it] {'loss': 0.0026, 'learning_rate': 9.49e-06, 'epoch': 3.06} 81%|████████ | 8118/10000 [29:36:53<6:45:03, 12.91s/it] 81%|████████ | 8119/10000 [29:37:06<6:44:47, 12.91s/it] {'loss': 0.0036, 'learning_rate': 9.485000000000002e-06, 'epoch': 3.06} 81%|████████ | 8119/10000 [29:37:06<6:44:47, 12.91s/it] 81%|████████ | 8120/10000 [29:37:19<6:44:50, 12.92s/it] {'loss': 0.0041, 'learning_rate': 9.48e-06, 'epoch': 3.06} 81%|████████ | 8120/10000 [29:37:19<6:44:50, 12.92s/it] 81%|████████ | 8121/10000 [29:37:32<6:44:12, 12.91s/it] {'loss': 0.0031, 'learning_rate': 9.475e-06, 'epoch': 3.06} 81%|████████ | 8121/10000 [29:37:32<6:44:12, 12.91s/it] 81%|████████ | 8122/10000 [29:37:45<6:44:48, 12.93s/it] {'loss': 0.0034, 'learning_rate': 9.47e-06, 'epoch': 3.06} 81%|████████ | 8122/10000 [29:37:45<6:44:48, 12.93s/it] 81%|████████ | 8123/10000 [29:37:57<6:44:42, 12.94s/it] {'loss': 0.0046, 'learning_rate': 9.465e-06, 'epoch': 3.06} 81%|████████ | 8123/10000 [29:37:58<6:44:42, 12.94s/it] 
81%|████████ | 8124/10000 [29:38:10<6:43:20, 12.90s/it] {'loss': 0.0035, 'learning_rate': 9.460000000000001e-06, 'epoch': 3.06} 81%|████████ | 8124/10000 [29:38:10<6:43:20, 12.90s/it] 81%|████████▏ | 8125/10000 [29:38:23<6:43:59, 12.93s/it] {'loss': 0.0038, 'learning_rate': 9.455e-06, 'epoch': 3.06} 81%|████████▏ | 8125/10000 [29:38:23<6:43:59, 12.93s/it] 81%|████████▏ | 8126/10000 [29:38:36<6:45:01, 12.97s/it] {'loss': 0.0031, 'learning_rate': 9.450000000000001e-06, 'epoch': 3.06} 81%|████████▏ | 8126/10000 [29:38:36<6:45:01, 12.97s/it] 81%|████████▏ | 8127/10000 [29:38:49<6:44:03, 12.94s/it] {'loss': 0.0028, 'learning_rate': 9.445000000000002e-06, 'epoch': 3.06} 81%|████████▏ | 8127/10000 [29:38:49<6:44:03, 12.94s/it] 81%|████████▏ | 8128/10000 [29:39:02<6:42:57, 12.92s/it] {'loss': 0.003, 'learning_rate': 9.44e-06, 'epoch': 3.06} 81%|████████▏ | 8128/10000 [29:39:02<6:42:57, 12.92s/it] 81%|████████▏ | 8129/10000 [29:39:15<6:42:18, 12.90s/it] {'loss': 0.0039, 'learning_rate': 9.435e-06, 'epoch': 3.06} 81%|████████▏ | 8129/10000 [29:39:15<6:42:18, 12.90s/it] 81%|████████▏ | 8130/10000 [29:39:28<6:42:44, 12.92s/it] {'loss': 0.0036, 'learning_rate': 9.43e-06, 'epoch': 3.06} 81%|████████▏ | 8130/10000 [29:39:28<6:42:44, 12.92s/it] 81%|████████▏ | 8131/10000 [29:39:41<6:43:07, 12.94s/it] {'loss': 0.0029, 'learning_rate': 9.425e-06, 'epoch': 3.06} 81%|████████▏ | 8131/10000 [29:39:41<6:43:07, 12.94s/it] 81%|████████▏ | 8132/10000 [29:39:54<6:42:45, 12.94s/it] {'loss': 0.0027, 'learning_rate': 9.420000000000001e-06, 'epoch': 3.06} 81%|████████▏ | 8132/10000 [29:39:54<6:42:45, 12.94s/it] 81%|████████▏ | 8133/10000 [29:40:07<6:42:10, 12.92s/it] {'loss': 0.0036, 'learning_rate': 9.415e-06, 'epoch': 3.06} 81%|████████▏ | 8133/10000 [29:40:07<6:42:10, 12.92s/it] 81%|████████▏ | 8134/10000 [29:40:20<6:42:27, 12.94s/it] {'loss': 0.005, 'learning_rate': 9.410000000000001e-06, 'epoch': 3.06} 81%|████████▏ | 8134/10000 [29:40:20<6:42:27, 12.94s/it] 81%|████████▏ | 8135/10000 
[29:40:33<6:41:04, 12.90s/it] {'loss': 0.0049, 'learning_rate': 9.405e-06, 'epoch': 3.07} 81%|████████▏ | 8135/10000 [29:40:33<6:41:04, 12.90s/it] 81%|████████▏ | 8136/10000 [29:40:45<6:41:25, 12.92s/it] {'loss': 0.0043, 'learning_rate': 9.4e-06, 'epoch': 3.07} 81%|████████▏ | 8136/10000 [29:40:45<6:41:25, 12.92s/it] 81%|████████▏ | 8137/10000 [29:40:58<6:41:10, 12.92s/it] {'loss': 0.0028, 'learning_rate': 9.395e-06, 'epoch': 3.07} 81%|████████▏ | 8137/10000 [29:40:58<6:41:10, 12.92s/it] 81%|████████▏ | 8138/10000 [29:41:11<6:40:20, 12.90s/it] {'loss': 0.0052, 'learning_rate': 9.39e-06, 'epoch': 3.07} 81%|████████▏ | 8138/10000 [29:41:11<6:40:20, 12.90s/it] 81%|████████▏ | 8139/10000 [29:41:24<6:40:19, 12.91s/it] {'loss': 0.0037, 'learning_rate': 9.385e-06, 'epoch': 3.07} 81%|████████▏ | 8139/10000 [29:41:24<6:40:19, 12.91s/it] 81%|████████▏ | 8140/10000 [29:41:37<6:40:07, 12.91s/it] {'loss': 0.0034, 'learning_rate': 9.38e-06, 'epoch': 3.07} 81%|████████▏ | 8140/10000 [29:41:37<6:40:07, 12.91s/it] 81%|████████▏ | 8141/10000 [29:41:50<6:39:09, 12.88s/it] {'loss': 0.004, 'learning_rate': 9.375000000000001e-06, 'epoch': 3.07} 81%|████████▏ | 8141/10000 [29:41:50<6:39:09, 12.88s/it] 81%|████████▏ | 8142/10000 [29:42:03<6:39:20, 12.90s/it] {'loss': 0.0037, 'learning_rate': 9.370000000000002e-06, 'epoch': 3.07} 81%|████████▏ | 8142/10000 [29:42:03<6:39:20, 12.90s/it] 81%|████████▏ | 8143/10000 [29:42:16<6:38:38, 12.88s/it] {'loss': 0.0048, 'learning_rate': 9.365000000000001e-06, 'epoch': 3.07} 81%|████████▏ | 8143/10000 [29:42:16<6:38:38, 12.88s/it] 81%|████████▏ | 8144/10000 [29:42:29<6:39:07, 12.90s/it] {'loss': 0.004, 'learning_rate': 9.36e-06, 'epoch': 3.07} 81%|████████▏ | 8144/10000 [29:42:29<6:39:07, 12.90s/it] 81%|████████▏ | 8145/10000 [29:42:42<6:38:51, 12.90s/it] {'loss': 0.0036, 'learning_rate': 9.355e-06, 'epoch': 3.07} 81%|████████▏ | 8145/10000 [29:42:42<6:38:51, 12.90s/it] 81%|████████▏ | 8146/10000 [29:42:54<6:39:15, 12.92s/it] {'loss': 0.0028, 
'learning_rate': 9.35e-06, 'epoch': 3.07} 81%|████████▏ | 8146/10000 [29:42:55<6:39:15, 12.92s/it] 81%|████████▏ | 8147/10000 [29:43:07<6:39:30, 12.94s/it] {'loss': 0.0033, 'learning_rate': 9.345000000000001e-06, 'epoch': 3.07} 81%|████████▏ | 8147/10000 [29:43:07<6:39:30, 12.94s/it] 81%|████████▏ | 8148/10000 [29:43:20<6:39:40, 12.95s/it] {'loss': 0.0049, 'learning_rate': 9.34e-06, 'epoch': 3.07} 81%|████████▏ | 8148/10000 [29:43:20<6:39:40, 12.95s/it] 81%|████████▏ | 8149/10000 [29:43:33<6:38:53, 12.93s/it] {'loss': 0.002, 'learning_rate': 9.335000000000001e-06, 'epoch': 3.07} 81%|████████▏ | 8149/10000 [29:43:33<6:38:53, 12.93s/it] 82%|████████▏ | 8150/10000 [29:43:46<6:38:33, 12.93s/it] {'loss': 0.0032, 'learning_rate': 9.33e-06, 'epoch': 3.07} 82%|████████▏ | 8150/10000 [29:43:46<6:38:33, 12.93s/it] 82%|████████▏ | 8151/10000 [29:43:59<6:38:08, 12.92s/it] {'loss': 0.0029, 'learning_rate': 9.325e-06, 'epoch': 3.07} 82%|████████▏ | 8151/10000 [29:43:59<6:38:08, 12.92s/it] 82%|████████▏ | 8152/10000 [29:44:12<6:37:38, 12.91s/it] {'loss': 0.0033, 'learning_rate': 9.32e-06, 'epoch': 3.07} 82%|████████▏ | 8152/10000 [29:44:12<6:37:38, 12.91s/it] 82%|████████▏ | 8153/10000 [29:44:25<6:37:38, 12.92s/it] {'loss': 0.0038, 'learning_rate': 9.315e-06, 'epoch': 3.07} 82%|████████▏ | 8153/10000 [29:44:25<6:37:38, 12.92s/it] 82%|████████▏ | 8154/10000 [29:44:38<6:37:29, 12.92s/it] {'loss': 0.004, 'learning_rate': 9.31e-06, 'epoch': 3.07} 82%|████████▏ | 8154/10000 [29:44:38<6:37:29, 12.92s/it] 82%|████████▏ | 8155/10000 [29:44:51<6:36:51, 12.91s/it] {'loss': 0.0051, 'learning_rate': 9.305e-06, 'epoch': 3.07} 82%|████████▏ | 8155/10000 [29:44:51<6:36:51, 12.91s/it] 82%|████████▏ | 8156/10000 [29:45:04<6:37:24, 12.93s/it] {'loss': 0.0033, 'learning_rate': 9.3e-06, 'epoch': 3.07} 82%|████████▏ | 8156/10000 [29:45:04<6:37:24, 12.93s/it] 82%|████████▏ | 8157/10000 [29:45:17<6:37:17, 12.93s/it] {'loss': 0.0044, 'learning_rate': 9.295000000000002e-06, 'epoch': 3.07} 82%|████████▏ | 
8157/10000 [29:45:17<6:37:17, 12.93s/it] 82%|████████▏ | 8158/10000 [29:45:30<6:37:05, 12.93s/it] {'loss': 0.0034, 'learning_rate': 9.29e-06, 'epoch': 3.07} 82%|████████▏ | 8158/10000 [29:45:30<6:37:05, 12.93s/it] 82%|████████▏ | 8159/10000 [29:45:43<6:36:37, 12.93s/it] {'loss': 0.003, 'learning_rate': 9.285e-06, 'epoch': 3.07} 82%|████████▏ | 8159/10000 [29:45:43<6:36:37, 12.93s/it] 82%|████████▏ | 8160/10000 [29:45:55<6:36:18, 12.92s/it] {'loss': 0.0034, 'learning_rate': 9.28e-06, 'epoch': 3.07} 82%|████████▏ | 8160/10000 [29:45:55<6:36:18, 12.92s/it] 82%|████████▏ | 8161/10000 [29:46:08<6:35:59, 12.92s/it] {'loss': 0.0048, 'learning_rate': 9.275e-06, 'epoch': 3.07} 82%|████████▏ | 8161/10000 [29:46:08<6:35:59, 12.92s/it] 82%|████████▏ | 8162/10000 [29:46:21<6:35:04, 12.90s/it] {'loss': 0.0045, 'learning_rate': 9.270000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8162/10000 [29:46:21<6:35:04, 12.90s/it] 82%|████████▏ | 8163/10000 [29:46:34<6:34:48, 12.90s/it] {'loss': 0.0032, 'learning_rate': 9.265e-06, 'epoch': 3.08} 82%|████████▏ | 8163/10000 [29:46:34<6:34:48, 12.90s/it] 82%|████████▏ | 8164/10000 [29:46:47<6:34:11, 12.88s/it] {'loss': 0.0046, 'learning_rate': 9.260000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8164/10000 [29:46:47<6:34:11, 12.88s/it] 82%|████████▏ | 8165/10000 [29:47:00<6:34:11, 12.89s/it] {'loss': 0.0041, 'learning_rate': 9.255e-06, 'epoch': 3.08} 82%|████████▏ | 8165/10000 [29:47:00<6:34:11, 12.89s/it] 82%|████████▏ | 8166/10000 [29:47:13<6:34:55, 12.92s/it] {'loss': 0.0043, 'learning_rate': 9.25e-06, 'epoch': 3.08} 82%|████████▏ | 8166/10000 [29:47:13<6:34:55, 12.92s/it] 82%|████████▏ | 8167/10000 [29:47:26<6:35:04, 12.93s/it] {'loss': 0.004, 'learning_rate': 9.245e-06, 'epoch': 3.08} 82%|████████▏ | 8167/10000 [29:47:26<6:35:04, 12.93s/it] 82%|████████▏ | 8168/10000 [29:47:39<6:35:36, 12.96s/it] {'loss': 0.0041, 'learning_rate': 9.24e-06, 'epoch': 3.08} 82%|████████▏ | 8168/10000 [29:47:39<6:35:36, 12.96s/it] 82%|████████▏ | 8169/10000 
[29:47:52<6:36:42, 13.00s/it] {'loss': 0.0034, 'learning_rate': 9.235e-06, 'epoch': 3.08} 82%|████████▏ | 8169/10000 [29:47:52<6:36:42, 13.00s/it] 82%|████████▏ | 8170/10000 [29:48:05<6:36:43, 13.01s/it] {'loss': 0.0042, 'learning_rate': 9.23e-06, 'epoch': 3.08} 82%|████████▏ | 8170/10000 [29:48:05<6:36:43, 13.01s/it] 82%|████████▏ | 8171/10000 [29:48:18<6:35:54, 12.99s/it] {'loss': 0.0042, 'learning_rate': 9.225e-06, 'epoch': 3.08} 82%|████████▏ | 8171/10000 [29:48:18<6:35:54, 12.99s/it] 82%|████████▏ | 8172/10000 [29:48:31<6:35:25, 12.98s/it] {'loss': 0.0038, 'learning_rate': 9.220000000000002e-06, 'epoch': 3.08} 82%|████████▏ | 8172/10000 [29:48:31<6:35:25, 12.98s/it] 82%|████████▏ | 8173/10000 [29:48:44<6:34:49, 12.97s/it] {'loss': 0.0033, 'learning_rate': 9.215e-06, 'epoch': 3.08} 82%|████████▏ | 8173/10000 [29:48:44<6:34:49, 12.97s/it] 82%|████████▏ | 8174/10000 [29:48:57<6:34:25, 12.96s/it] {'loss': 0.0042, 'learning_rate': 9.21e-06, 'epoch': 3.08} 82%|████████▏ | 8174/10000 [29:48:57<6:34:25, 12.96s/it] 82%|████████▏ | 8175/10000 [29:49:10<6:33:10, 12.93s/it] {'loss': 0.0034, 'learning_rate': 9.205e-06, 'epoch': 3.08} 82%|████████▏ | 8175/10000 [29:49:10<6:33:10, 12.93s/it] 82%|████████▏ | 8176/10000 [29:49:22<6:32:39, 12.92s/it] {'loss': 0.0032, 'learning_rate': 9.2e-06, 'epoch': 3.08} 82%|████████▏ | 8176/10000 [29:49:22<6:32:39, 12.92s/it] 82%|████████▏ | 8177/10000 [29:49:35<6:32:22, 12.91s/it] {'loss': 0.0045, 'learning_rate': 9.195000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8177/10000 [29:49:35<6:32:22, 12.91s/it] 82%|████████▏ | 8178/10000 [29:49:48<6:31:48, 12.90s/it] {'loss': 0.0036, 'learning_rate': 9.19e-06, 'epoch': 3.08} 82%|████████▏ | 8178/10000 [29:49:48<6:31:48, 12.90s/it] 82%|████████▏ | 8179/10000 [29:50:01<6:31:26, 12.90s/it] {'loss': 0.0042, 'learning_rate': 9.185000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8179/10000 [29:50:01<6:31:26, 12.90s/it] 82%|████████▏ | 8180/10000 [29:50:14<6:31:12, 12.90s/it] {'loss': 0.003, 
'learning_rate': 9.180000000000002e-06, 'epoch': 3.08} 82%|████████▏ | 8180/10000 [29:50:14<6:31:12, 12.90s/it] 82%|████████▏ | 8181/10000 [29:50:27<6:30:43, 12.89s/it] {'loss': 0.0047, 'learning_rate': 9.175000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8181/10000 [29:50:27<6:30:43, 12.89s/it] 82%|████████▏ | 8182/10000 [29:50:40<6:30:12, 12.88s/it] {'loss': 0.0034, 'learning_rate': 9.17e-06, 'epoch': 3.08} 82%|████████▏ | 8182/10000 [29:50:40<6:30:12, 12.88s/it] 82%|████████▏ | 8183/10000 [29:50:53<6:30:04, 12.88s/it] {'loss': 0.0034, 'learning_rate': 9.165e-06, 'epoch': 3.08} 82%|████████▏ | 8183/10000 [29:50:53<6:30:04, 12.88s/it] 82%|████████▏ | 8184/10000 [29:51:06<6:29:59, 12.89s/it] {'loss': 0.0039, 'learning_rate': 9.16e-06, 'epoch': 3.08} 82%|████████▏ | 8184/10000 [29:51:06<6:29:59, 12.89s/it] 82%|████████▏ | 8185/10000 [29:51:19<6:30:28, 12.91s/it] {'loss': 0.0046, 'learning_rate': 9.155000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8185/10000 [29:51:19<6:30:28, 12.91s/it] 82%|████████▏ | 8186/10000 [29:51:31<6:30:29, 12.92s/it] {'loss': 0.0041, 'learning_rate': 9.15e-06, 'epoch': 3.08} 82%|████████▏ | 8186/10000 [29:51:31<6:30:29, 12.92s/it] 82%|████████▏ | 8187/10000 [29:51:44<6:29:36, 12.89s/it] {'loss': 0.0038, 'learning_rate': 9.145000000000001e-06, 'epoch': 3.08} 82%|████████▏ | 8187/10000 [29:51:44<6:29:36, 12.89s/it] 82%|████████▏ | 8188/10000 [29:51:57<6:29:23, 12.89s/it] {'loss': 0.0025, 'learning_rate': 9.14e-06, 'epoch': 3.09} 82%|████████▏ | 8188/10000 [29:51:57<6:29:23, 12.89s/it] 82%|████████▏ | 8189/10000 [29:52:10<6:29:56, 12.92s/it] {'loss': 0.0039, 'learning_rate': 9.135e-06, 'epoch': 3.09} 82%|████████▏ | 8189/10000 [29:52:10<6:29:56, 12.92s/it] 82%|████████▏ | 8190/10000 [29:52:23<6:29:47, 12.92s/it] {'loss': 0.0033, 'learning_rate': 9.13e-06, 'epoch': 3.09} 82%|████████▏ | 8190/10000 [29:52:23<6:29:47, 12.92s/it] 82%|████████▏ | 8191/10000 [29:52:36<6:29:48, 12.93s/it] {'loss': 0.004, 'learning_rate': 9.125e-06, 'epoch': 3.09} 
82%|████████▏ | 8191/10000 [29:52:36<6:29:48, 12.93s/it] 82%|████████▏ | 8192/10000 [29:52:49<6:29:04, 12.91s/it] {'loss': 0.0051, 'learning_rate': 9.12e-06, 'epoch': 3.09} 82%|████████▏ | 8192/10000 [29:52:49<6:29:04, 12.91s/it] 82%|████████▏ | 8193/10000 [29:53:02<6:28:47, 12.91s/it] {'loss': 0.0038, 'learning_rate': 9.115e-06, 'epoch': 3.09} 82%|████████▏ | 8193/10000 [29:53:02<6:28:47, 12.91s/it] 82%|████████▏ | 8194/10000 [29:53:15<6:28:50, 12.92s/it] {'loss': 0.0047, 'learning_rate': 9.110000000000001e-06, 'epoch': 3.09} 82%|████████▏ | 8194/10000 [29:53:15<6:28:50, 12.92s/it] 82%|████████▏ | 8195/10000 [29:53:28<6:28:29, 12.91s/it] {'loss': 0.0033, 'learning_rate': 9.105000000000002e-06, 'epoch': 3.09} 82%|████████▏ | 8195/10000 [29:53:28<6:28:29, 12.91s/it] 82%|████████▏ | 8196/10000 [29:53:41<6:28:47, 12.93s/it] {'loss': 0.0032, 'learning_rate': 9.100000000000001e-06, 'epoch': 3.09} 82%|████████▏ | 8196/10000 [29:53:41<6:28:47, 12.93s/it] 82%|████████▏ | 8197/10000 [29:53:54<6:28:26, 12.93s/it] {'loss': 0.0049, 'learning_rate': 9.095e-06, 'epoch': 3.09} 82%|████████▏ | 8197/10000 [29:53:54<6:28:26, 12.93s/it] 82%|████████▏ | 8198/10000 [29:54:06<6:28:33, 12.94s/it] {'loss': 0.0044, 'learning_rate': 9.09e-06, 'epoch': 3.09} 82%|████████▏ | 8198/10000 [29:54:07<6:28:33, 12.94s/it] 82%|████████▏ | 8199/10000 [29:54:19<6:28:09, 12.93s/it] {'loss': 0.003, 'learning_rate': 9.085e-06, 'epoch': 3.09} 82%|████████▏ | 8199/10000 [29:54:19<6:28:09, 12.93s/it] 82%|████████▏ | 8200/10000 [29:54:32<6:27:45, 12.93s/it] {'loss': 0.0032, 'learning_rate': 9.080000000000001e-06, 'epoch': 3.09} 82%|████████▏ | 8200/10000 [29:54:32<6:27:45, 12.93s/it] 82%|████████▏ | 8201/10000 [29:54:45<6:27:35, 12.93s/it] {'loss': 0.003, 'learning_rate': 9.075e-06, 'epoch': 3.09} 82%|████████▏ | 8201/10000 [29:54:45<6:27:35, 12.93s/it] 82%|████████▏ | 8202/10000 [29:54:58<6:26:59, 12.91s/it] {'loss': 0.0039, 'learning_rate': 9.070000000000001e-06, 'epoch': 3.09} 82%|████████▏ | 8202/10000 
[29:54:58<6:26:59, 12.91s/it] 82%|████████▏ | 8203/10000 [29:55:11<6:26:34, 12.91s/it] {'loss': 0.004, 'learning_rate': 9.065e-06, 'epoch': 3.09} 82%|████████▏ | 8203/10000 [29:55:11<6:26:34, 12.91s/it] 82%|████████▏ | 8204/10000 [29:55:24<6:26:50, 12.92s/it] {'loss': 0.0047, 'learning_rate': 9.06e-06, 'epoch': 3.09} 82%|████████▏ | 8204/10000 [29:55:24<6:26:50, 12.92s/it] 82%|████████▏ | 8205/10000 [29:55:37<6:26:59, 12.94s/it] {'loss': 0.0043, 'learning_rate': 9.055e-06, 'epoch': 3.09} 82%|████████▏ | 8205/10000 [29:55:37<6:26:59, 12.94s/it] 82%|████████▏ | 8206/10000 [29:55:50<6:27:28, 12.96s/it] {'loss': 0.0038, 'learning_rate': 9.05e-06, 'epoch': 3.09} 82%|████████▏ | 8206/10000 [29:55:50<6:27:28, 12.96s/it] 82%|████████▏ | 8207/10000 [29:56:03<6:27:21, 12.96s/it] {'loss': 0.0037, 'learning_rate': 9.045e-06, 'epoch': 3.09} 82%|████████▏ | 8207/10000 [29:56:03<6:27:21, 12.96s/it] 82%|████████▏ | 8208/10000 [29:56:16<6:26:52, 12.95s/it] {'loss': 0.0035, 'learning_rate': 9.04e-06, 'epoch': 3.09} 82%|████████▏ | 8208/10000 [29:56:16<6:26:52, 12.95s/it] 82%|████████▏ | 8209/10000 [29:56:29<6:26:19, 12.94s/it] {'loss': 0.0034, 'learning_rate': 9.035e-06, 'epoch': 3.09} 82%|████████▏ | 8209/10000 [29:56:29<6:26:19, 12.94s/it] 82%|████████▏ | 8210/10000 [29:56:42<6:25:34, 12.92s/it] {'loss': 0.0035, 'learning_rate': 9.030000000000002e-06, 'epoch': 3.09} 82%|████████▏ | 8210/10000 [29:56:42<6:25:34, 12.92s/it] 82%|████████▏ | 8211/10000 [29:56:55<6:24:35, 12.90s/it] {'loss': 0.0037, 'learning_rate': 9.025e-06, 'epoch': 3.09} 82%|████████▏ | 8211/10000 [29:56:55<6:24:35, 12.90s/it] 82%|████████▏ | 8212/10000 [29:57:07<6:23:57, 12.88s/it] {'loss': 0.0038, 'learning_rate': 9.02e-06, 'epoch': 3.09} 82%|████████▏ | 8212/10000 [29:57:07<6:23:57, 12.88s/it] 82%|████████▏ | 8213/10000 [29:57:20<6:23:39, 12.88s/it] {'loss': 0.004, 'learning_rate': 9.015e-06, 'epoch': 3.09} 82%|████████▏ | 8213/10000 [29:57:20<6:23:39, 12.88s/it] 82%|████████▏ | 8214/10000 [29:57:33<6:23:40, 
12.89s/it] {'loss': 0.0037, 'learning_rate': 9.01e-06, 'epoch': 3.09} 82%|████████▏ | 8214/10000 [29:57:33<6:23:40, 12.89s/it] 82%|████████▏ | 8215/10000 [29:57:46<6:24:28, 12.92s/it] {'loss': 0.0035, 'learning_rate': 9.005000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8215/10000 [29:57:46<6:24:28, 12.92s/it] 82%|████████▏ | 8216/10000 [29:57:59<6:24:42, 12.94s/it] {'loss': 0.0028, 'learning_rate': 9e-06, 'epoch': 3.1} 82%|████████▏ | 8216/10000 [29:57:59<6:24:42, 12.94s/it] 82%|████████▏ | 8217/10000 [29:58:12<6:24:40, 12.95s/it] {'loss': 0.0041, 'learning_rate': 8.995000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8217/10000 [29:58:12<6:24:40, 12.95s/it] 82%|████████▏ | 8218/10000 [29:58:25<6:24:27, 12.94s/it] {'loss': 0.0038, 'learning_rate': 8.99e-06, 'epoch': 3.1} 82%|████████▏ | 8218/10000 [29:58:25<6:24:27, 12.94s/it] 82%|████████▏ | 8219/10000 [29:58:38<6:24:28, 12.95s/it] {'loss': 0.0048, 'learning_rate': 8.985e-06, 'epoch': 3.1} 82%|████████▏ | 8219/10000 [29:58:38<6:24:28, 12.95s/it] 82%|████████▏ | 8220/10000 [29:58:51<6:23:50, 12.94s/it] {'loss': 0.0037, 'learning_rate': 8.98e-06, 'epoch': 3.1} 82%|████████▏ | 8220/10000 [29:58:51<6:23:50, 12.94s/it] 82%|████████▏ | 8221/10000 [29:59:04<6:22:38, 12.91s/it] {'loss': 0.0049, 'learning_rate': 8.975e-06, 'epoch': 3.1} 82%|████████▏ | 8221/10000 [29:59:04<6:22:38, 12.91s/it] 82%|████████▏ | 8222/10000 [29:59:17<6:22:34, 12.91s/it] {'loss': 0.0048, 'learning_rate': 8.97e-06, 'epoch': 3.1} 82%|████████▏ | 8222/10000 [29:59:17<6:22:34, 12.91s/it] 82%|████████▏ | 8223/10000 [29:59:30<6:22:03, 12.90s/it] {'loss': 0.0038, 'learning_rate': 8.965e-06, 'epoch': 3.1} 82%|████████▏ | 8223/10000 [29:59:30<6:22:03, 12.90s/it] 82%|████████▏ | 8224/10000 [29:59:42<6:21:59, 12.91s/it] {'loss': 0.0037, 'learning_rate': 8.96e-06, 'epoch': 3.1} 82%|████████▏ | 8224/10000 [29:59:42<6:21:59, 12.91s/it] 82%|████████▏ | 8225/10000 [29:59:55<6:22:13, 12.92s/it] {'loss': 0.0041, 'learning_rate': 8.955000000000002e-06, 'epoch': 
3.1} 82%|████████▏ | 8225/10000 [29:59:55<6:22:13, 12.92s/it] 82%|████████▏ | 8226/10000 [30:00:08<6:22:33, 12.94s/it] {'loss': 0.0043, 'learning_rate': 8.95e-06, 'epoch': 3.1} 82%|████████▏ | 8226/10000 [30:00:08<6:22:33, 12.94s/it] 82%|████████▏ | 8227/10000 [30:00:21<6:22:00, 12.93s/it] {'loss': 0.0035, 'learning_rate': 8.945e-06, 'epoch': 3.1} 82%|████████▏ | 8227/10000 [30:00:21<6:22:00, 12.93s/it] 82%|████████▏ | 8228/10000 [30:00:34<6:21:50, 12.93s/it] {'loss': 0.0037, 'learning_rate': 8.939999999999999e-06, 'epoch': 3.1} 82%|████████▏ | 8228/10000 [30:00:34<6:21:50, 12.93s/it] 82%|████████▏ | 8229/10000 [30:00:47<6:21:19, 12.92s/it] {'loss': 0.0028, 'learning_rate': 8.935e-06, 'epoch': 3.1} 82%|████████▏ | 8229/10000 [30:00:47<6:21:19, 12.92s/it] 82%|████████▏ | 8230/10000 [30:01:00<6:21:53, 12.95s/it] {'loss': 0.003, 'learning_rate': 8.930000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8230/10000 [30:01:00<6:21:53, 12.95s/it] 82%|████████▏ | 8231/10000 [30:01:13<6:21:15, 12.93s/it] {'loss': 0.0036, 'learning_rate': 8.925e-06, 'epoch': 3.1} 82%|████████▏ | 8231/10000 [30:01:13<6:21:15, 12.93s/it] 82%|████████▏ | 8232/10000 [30:01:26<6:19:35, 12.88s/it] {'loss': 0.0058, 'learning_rate': 8.920000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8232/10000 [30:01:26<6:19:35, 12.88s/it] 82%|████████▏ | 8233/10000 [30:01:39<6:19:19, 12.88s/it] {'loss': 0.0038, 'learning_rate': 8.915e-06, 'epoch': 3.1} 82%|████████▏ | 8233/10000 [30:01:39<6:19:19, 12.88s/it] 82%|████████▏ | 8234/10000 [30:01:52<6:20:06, 12.91s/it] {'loss': 0.0036, 'learning_rate': 8.910000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8234/10000 [30:01:52<6:20:06, 12.91s/it] 82%|████████▏ | 8235/10000 [30:02:05<6:19:37, 12.91s/it] {'loss': 0.0042, 'learning_rate': 8.905e-06, 'epoch': 3.1} 82%|████████▏ | 8235/10000 [30:02:05<6:19:37, 12.91s/it] 82%|████████▏ | 8236/10000 [30:02:17<6:19:46, 12.92s/it] {'loss': 0.003, 'learning_rate': 8.9e-06, 'epoch': 3.1} 82%|████████▏ | 8236/10000 [30:02:18<6:19:46, 
12.92s/it] 82%|████████▏ | 8237/10000 [30:02:30<6:19:50, 12.93s/it] {'loss': 0.0038, 'learning_rate': 8.895e-06, 'epoch': 3.1} 82%|████████▏ | 8237/10000 [30:02:30<6:19:50, 12.93s/it] 82%|████████▏ | 8238/10000 [30:02:43<6:19:15, 12.91s/it] {'loss': 0.0038, 'learning_rate': 8.890000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8238/10000 [30:02:43<6:19:15, 12.91s/it] 82%|████████▏ | 8239/10000 [30:02:56<6:18:26, 12.89s/it] {'loss': 0.0041, 'learning_rate': 8.885e-06, 'epoch': 3.1} 82%|████████▏ | 8239/10000 [30:02:56<6:18:26, 12.89s/it] 82%|████████▏ | 8240/10000 [30:03:09<6:18:22, 12.90s/it] {'loss': 0.0033, 'learning_rate': 8.880000000000001e-06, 'epoch': 3.1} 82%|████████▏ | 8240/10000 [30:03:09<6:18:22, 12.90s/it] 82%|████████▏ | 8241/10000 [30:03:22<6:17:55, 12.89s/it] {'loss': 0.0043, 'learning_rate': 8.875e-06, 'epoch': 3.11} 82%|████████▏ | 8241/10000 [30:03:22<6:17:55, 12.89s/it] 82%|████████▏ | 8242/10000 [30:03:35<6:17:48, 12.89s/it] {'loss': 0.0035, 'learning_rate': 8.87e-06, 'epoch': 3.11} 82%|████████▏ | 8242/10000 [30:03:35<6:17:48, 12.89s/it] 82%|████████▏ | 8243/10000 [30:03:48<6:18:06, 12.91s/it] {'loss': 0.0038, 'learning_rate': 8.865e-06, 'epoch': 3.11} 82%|████████▏ | 8243/10000 [30:03:48<6:18:06, 12.91s/it] 82%|████████▏ | 8244/10000 [30:04:01<6:18:06, 12.92s/it] {'loss': 0.0031, 'learning_rate': 8.86e-06, 'epoch': 3.11} 82%|████████▏ | 8244/10000 [30:04:01<6:18:06, 12.92s/it] 82%|████████▏ | 8245/10000 [30:04:14<6:18:01, 12.92s/it] {'loss': 0.0038, 'learning_rate': 8.855e-06, 'epoch': 3.11} 82%|████████▏ | 8245/10000 [30:04:14<6:18:01, 12.92s/it] 82%|████████▏ | 8246/10000 [30:04:27<6:17:22, 12.91s/it] {'loss': 0.0043, 'learning_rate': 8.85e-06, 'epoch': 3.11} 82%|████████▏ | 8246/10000 [30:04:27<6:17:22, 12.91s/it] 82%|████████▏ | 8247/10000 [30:04:39<6:17:14, 12.91s/it] {'loss': 0.0039, 'learning_rate': 8.845000000000001e-06, 'epoch': 3.11} 82%|████████▏ | 8247/10000 [30:04:39<6:17:14, 12.91s/it] 82%|████████▏ | 8248/10000 [30:04:52<6:17:12, 
12.92s/it] {'loss': 0.0029, 'learning_rate': 8.840000000000002e-06, 'epoch': 3.11} 82%|████████▏ | 8248/10000 [30:04:52<6:17:12, 12.92s/it] 82%|████████▏ | 8249/10000 [30:05:05<6:17:04, 12.92s/it] {'loss': 0.0037, 'learning_rate': 8.835000000000001e-06, 'epoch': 3.11} 82%|████████▏ | 8249/10000 [30:05:05<6:17:04, 12.92s/it] 82%|████████▎ | 8250/10000 [30:05:18<6:16:15, 12.90s/it] {'loss': 0.0045, 'learning_rate': 8.83e-06, 'epoch': 3.11} 82%|████████▎ | 8250/10000 [30:05:18<6:16:15, 12.90s/it] 83%|████████▎ | 8251/10000 [30:05:31<6:15:57, 12.90s/it] {'loss': 0.0031, 'learning_rate': 8.825e-06, 'epoch': 3.11} 83%|████████▎ | 8251/10000 [30:05:31<6:15:57, 12.90s/it] 83%|████████▎ | 8252/10000 [30:05:44<6:15:34, 12.89s/it] {'loss': 0.0032, 'learning_rate': 8.82e-06, 'epoch': 3.11} 83%|████████▎ | 8252/10000 [30:05:44<6:15:34, 12.89s/it] 83%|████████▎ | 8253/10000 [30:05:57<6:15:38, 12.90s/it] {'loss': 0.0034, 'learning_rate': 8.815000000000001e-06, 'epoch': 3.11} 83%|████████▎ | 8253/10000 [30:05:57<6:15:38, 12.90s/it] 83%|████████▎ | 8254/10000 [30:06:10<6:16:02, 12.92s/it] {'loss': 0.0019, 'learning_rate': 8.81e-06, 'epoch': 3.11} 83%|████████▎ | 8254/10000 [30:06:10<6:16:02, 12.92s/it] 83%|████████▎ | 8255/10000 [30:06:23<6:16:00, 12.93s/it] {'loss': 0.0032, 'learning_rate': 8.805000000000001e-06, 'epoch': 3.11} 83%|████████▎ | 8255/10000 [30:06:23<6:16:00, 12.93s/it] 83%|████████▎ | 8256/10000 [30:06:36<6:15:50, 12.93s/it] {'loss': 0.0035, 'learning_rate': 8.8e-06, 'epoch': 3.11} 83%|████████▎ | 8256/10000 [30:06:36<6:15:50, 12.93s/it] 83%|████████▎ | 8257/10000 [30:06:49<6:15:51, 12.94s/it] {'loss': 0.0034, 'learning_rate': 8.795e-06, 'epoch': 3.11} 83%|████████▎ | 8257/10000 [30:06:49<6:15:51, 12.94s/it] 83%|████████▎ | 8258/10000 [30:07:02<6:15:27, 12.93s/it] {'loss': 0.0031, 'learning_rate': 8.79e-06, 'epoch': 3.11} 83%|████████▎ | 8258/10000 [30:07:02<6:15:27, 12.93s/it] 83%|████████▎ | 8259/10000 [30:07:15<6:15:26, 12.94s/it] {'loss': 0.0046, 
'learning_rate': 8.785e-06, 'epoch': 3.11} 83%|████████▎ | 8259/10000 [30:07:15<6:15:26, 12.94s/it] 83%|████████▎ | 8260/10000 [30:07:27<6:14:46, 12.92s/it] {'loss': 0.0043, 'learning_rate': 8.78e-06, 'epoch': 3.11} 83%|████████▎ | 8260/10000 [30:07:27<6:14:46, 12.92s/it] 83%|████████▎ | 8261/10000 [30:07:40<6:14:59, 12.94s/it] {'loss': 0.0042, 'learning_rate': 8.775e-06, 'epoch': 3.11} 83%|████████▎ | 8261/10000 [30:07:40<6:14:59, 12.94s/it] 83%|████████▎ | 8262/10000 [30:07:53<6:14:36, 12.93s/it] {'loss': 0.004, 'learning_rate': 8.77e-06, 'epoch': 3.11} 83%|████████▎ | 8262/10000 [30:07:53<6:14:36, 12.93s/it] 83%|████████▎ | 8263/10000 [30:08:06<6:14:01, 12.92s/it] {'loss': 0.0038, 'learning_rate': 8.765000000000002e-06, 'epoch': 3.11} 83%|████████▎ | 8263/10000 [30:08:06<6:14:01, 12.92s/it] 83%|████████▎ | 8264/10000 [30:08:19<6:13:41, 12.92s/it] {'loss': 0.0034, 'learning_rate': 8.76e-06, 'epoch': 3.11} 83%|████████▎ | 8264/10000 [30:08:19<6:13:41, 12.92s/it] 83%|████████▎ | 8265/10000 [30:08:32<6:13:58, 12.93s/it] {'loss': 0.0049, 'learning_rate': 8.755e-06, 'epoch': 3.11} 83%|████████▎ | 8265/10000 [30:08:32<6:13:58, 12.93s/it] 83%|████████▎ | 8266/10000 [30:08:45<6:13:34, 12.93s/it] {'loss': 0.003, 'learning_rate': 8.75e-06, 'epoch': 3.11} 83%|████████▎ | 8266/10000 [30:08:45<6:13:34, 12.93s/it] 83%|████████▎ | 8267/10000 [30:08:58<6:13:20, 12.93s/it] {'loss': 0.0029, 'learning_rate': 8.745e-06, 'epoch': 3.11} 83%|████████▎ | 8267/10000 [30:08:58<6:13:20, 12.93s/it] 83%|████████▎ | 8268/10000 [30:09:11<6:14:12, 12.96s/it] {'loss': 0.0038, 'learning_rate': 8.740000000000001e-06, 'epoch': 3.12} 83%|████████▎ | 8268/10000 [30:09:11<6:14:12, 12.96s/it] 83%|████████▎ | 8269/10000 [30:09:24<6:13:59, 12.96s/it] {'loss': 0.0033, 'learning_rate': 8.735e-06, 'epoch': 3.12} 83%|████████▎ | 8269/10000 [30:09:24<6:13:59, 12.96s/it] 83%|████████▎ | 8270/10000 [30:09:37<6:13:46, 12.96s/it] {'loss': 0.0025, 'learning_rate': 8.730000000000001e-06, 'epoch': 3.12} 
83%|████████▎ | 8270/10000 [30:09:37<6:13:46, 12.96s/it] 83%|████████▎ | 8271/10000 [30:09:50<6:13:46, 12.97s/it] {'loss': 0.0034, 'learning_rate': 8.725e-06, 'epoch': 3.12} 83%|████████▎ | 8271/10000 [30:09:50<6:13:46, 12.97s/it] 83%|████████▎ | 8272/10000 [30:10:03<6:13:19, 12.96s/it] {'loss': 0.004, 'learning_rate': 8.720000000000001e-06, 'epoch': 3.12} 83%|████████▎ | 8272/10000 [30:10:03<6:13:19, 12.96s/it] 83%|████████▎ | 8273/10000 [30:10:16<6:13:30, 12.98s/it] {'loss': 0.0027, 'learning_rate': 8.715e-06, 'epoch': 3.12} 83%|████████▎ | 8273/10000 [30:10:16<6:13:30, 12.98s/it] 83%|████████▎ | 8274/10000 [30:10:29<6:13:06, 12.97s/it] {'loss': 0.0038, 'learning_rate': 8.71e-06, 'epoch': 3.12} 83%|████████▎ | 8274/10000 [30:10:29<6:13:06, 12.97s/it] 83%|████████▎ | 8275/10000 [30:10:42<6:12:35, 12.96s/it] {'loss': 0.0027, 'learning_rate': 8.705e-06, 'epoch': 3.12} 83%|████████▎ | 8275/10000 [30:10:42<6:12:35, 12.96s/it] 83%|████████▎ | 8276/10000 [30:10:55<6:12:10, 12.95s/it] {'loss': 0.0031, 'learning_rate': 8.7e-06, 'epoch': 3.12} 83%|████████▎ | 8276/10000 [30:10:55<6:12:10, 12.95s/it] 83%|████████▎ | 8277/10000 [30:11:08<6:11:35, 12.94s/it] {'loss': 0.0043, 'learning_rate': 8.695e-06, 'epoch': 3.12} 83%|████████▎ | 8277/10000 [30:11:08<6:11:35, 12.94s/it] 83%|████████▎ | 8278/10000 [30:11:21<6:11:27, 12.94s/it] {'loss': 0.0039, 'learning_rate': 8.690000000000002e-06, 'epoch': 3.12} 83%|████████▎ | 8278/10000 [30:11:21<6:11:27, 12.94s/it] 83%|████████▎ | 8279/10000 [30:11:33<6:10:50, 12.93s/it] {'loss': 0.0056, 'learning_rate': 8.685e-06, 'epoch': 3.12} 83%|████████▎ | 8279/10000 [30:11:33<6:10:50, 12.93s/it] 83%|████████▎ | 8280/10000 [30:11:46<6:10:42, 12.93s/it] {'loss': 0.0033, 'learning_rate': 8.68e-06, 'epoch': 3.12} 83%|████████▎ | 8280/10000 [30:11:46<6:10:42, 12.93s/it] 83%|████████▎ | 8281/10000 [30:11:59<6:10:05, 12.92s/it] {'loss': 0.0032, 'learning_rate': 8.674999999999999e-06, 'epoch': 3.12} 83%|████████▎ | 8281/10000 [30:11:59<6:10:05, 
12.92s/it] 83%|████████▎ | 8282/10000 [30:12:12<6:09:35, 12.91s/it] {'loss': 0.0046, 'learning_rate': 8.67e-06, 'epoch': 3.12} 83%|████████▎ | 8282/10000 [30:12:12<6:09:35, 12.91s/it] 83%|████████▎ | 8283/10000 [30:12:25<6:09:22, 12.91s/it] {'loss': 0.0034, 'learning_rate': 8.665000000000001e-06, 'epoch': 3.12} 83%|████████▎ | 8283/10000 [30:12:25<6:09:22, 12.91s/it] 83%|████████▎ | 8284/10000 [30:12:38<6:09:28, 12.92s/it] {'loss': 0.0026, 'learning_rate': 8.66e-06, 'epoch': 3.12} 83%|████████▎ | 8284/10000 [30:12:38<6:09:28, 12.92s/it] 83%|████████▎ | 8285/10000 [30:12:51<6:08:37, 12.90s/it] {'loss': 0.0035, 'learning_rate': 8.655000000000001e-06, 'epoch': 3.12} 83%|████████▎ | 8285/10000 [30:12:51<6:08:37, 12.90s/it] 83%|████████▎ | 8286/10000 [30:13:04<6:09:16, 12.93s/it] {'loss': 0.003, 'learning_rate': 8.65e-06, 'epoch': 3.12} 83%|████████▎ | 8286/10000 [30:13:04<6:09:16, 12.93s/it] 83%|████████▎ | 8287/10000 [30:13:17<6:09:34, 12.94s/it] {'loss': 0.0037, 'learning_rate': 8.645000000000001e-06, 'epoch': 3.12} 83%|████████▎ | 8287/10000 [30:13:17<6:09:34, 12.94s/it] 83%|████████▎ | 8288/10000 [30:13:30<6:09:57, 12.97s/it] {'loss': 0.0039, 'learning_rate': 8.64e-06, 'epoch': 3.12} 83%|████████▎ | 8288/10000 [30:13:30<6:09:57, 12.97s/it] 83%|████████▎ | 8289/10000 [30:13:43<6:09:10, 12.95s/it] {'loss': 0.0038, 'learning_rate': 8.635e-06, 'epoch': 3.12} 83%|████████▎ | 8289/10000 [30:13:43<6:09:10, 12.95s/it] 83%|████████▎ | 8290/10000 [30:13:56<6:08:00, 12.91s/it] {'loss': 0.0041, 'learning_rate': 8.63e-06, 'epoch': 3.12} 83%|████████▎ | 8290/10000 [30:13:56<6:08:00, 12.91s/it] 83%|████████▎ | 8291/10000 [30:14:08<6:07:26, 12.90s/it] {'loss': 0.0038, 'learning_rate': 8.625e-06, 'epoch': 3.12} 83%|████████▎ | 8291/10000 [30:14:08<6:07:26, 12.90s/it] 83%|████████▎ | 8292/10000 [30:14:21<6:07:48, 12.92s/it] {'loss': 0.0036, 'learning_rate': 8.62e-06, 'epoch': 3.12} 83%|████████▎ | 8292/10000 [30:14:21<6:07:48, 12.92s/it] 83%|████████▎ | 8293/10000 [30:14:34<6:07:24, 
12.91s/it] {'loss': 0.0038, 'learning_rate': 8.615000000000001e-06, 'epoch': 3.12} 83%|████████▎ | 8293/10000 [30:14:34<6:07:24, 12.91s/it] 83%|████████▎ | 8294/10000 [30:14:47<6:06:50, 12.90s/it] {'loss': 0.0043, 'learning_rate': 8.61e-06, 'epoch': 3.13} 83%|████████▎ | 8294/10000 [30:14:47<6:06:50, 12.90s/it] 83%|████████▎ | 8295/10000 [30:15:00<6:06:56, 12.91s/it] {'loss': 0.0045, 'learning_rate': 8.605e-06, 'epoch': 3.13} 83%|████████▎ | 8295/10000 [30:15:00<6:06:56, 12.91s/it] 83%|████████▎ | 8296/10000 [30:15:13<6:06:53, 12.92s/it] {'loss': 0.0037, 'learning_rate': 8.599999999999999e-06, 'epoch': 3.13} 83%|████████▎ | 8296/10000 [30:15:13<6:06:53, 12.92s/it] 83%|████████▎ | 8297/10000 [30:15:26<6:07:15, 12.94s/it] {'loss': 0.0036, 'learning_rate': 8.595e-06, 'epoch': 3.13} 83%|████████▎ | 8297/10000 [30:15:26<6:07:15, 12.94s/it] 83%|████████▎ | 8298/10000 [30:15:39<6:07:28, 12.95s/it] {'loss': 0.0047, 'learning_rate': 8.59e-06, 'epoch': 3.13} 83%|████████▎ | 8298/10000 [30:15:39<6:07:28, 12.95s/it] 83%|████████▎ | 8299/10000 [30:15:52<6:07:03, 12.95s/it] {'loss': 0.0031, 'learning_rate': 8.585e-06, 'epoch': 3.13} 83%|████████▎ | 8299/10000 [30:15:52<6:07:03, 12.95s/it] 83%|████████▎ | 8300/10000 [30:16:05<6:05:59, 12.92s/it] {'loss': 0.0032, 'learning_rate': 8.580000000000001e-06, 'epoch': 3.13} 83%|████████▎ | 8300/10000 [30:16:05<6:05:59, 12.92s/it] 83%|████████▎ | 8301/10000 [30:16:18<6:05:41, 12.91s/it] {'loss': 0.0036, 'learning_rate': 8.575000000000002e-06, 'epoch': 3.13} 83%|████████▎ | 8301/10000 [30:16:18<6:05:41, 12.91s/it] 83%|████████▎ | 8302/10000 [30:16:31<6:05:21, 12.91s/it] {'loss': 0.0041, 'learning_rate': 8.570000000000001e-06, 'epoch': 3.13} 83%|████████▎ | 8302/10000 [30:16:31<6:05:21, 12.91s/it] 83%|████████▎ | 8303/10000 [30:16:44<6:05:03, 12.91s/it] {'loss': 0.0027, 'learning_rate': 8.565e-06, 'epoch': 3.13} 83%|████████▎ | 8303/10000 [30:16:44<6:05:03, 12.91s/it] 83%|████████▎ | 8304/10000 [30:16:56<6:04:55, 12.91s/it] {'loss': 0.0025, 
'learning_rate': 8.56e-06, 'epoch': 3.13} 83%|████████▎ | 8304/10000 [30:16:56<6:04:55, 12.91s/it] 83%|████████▎ | 8305/10000 [30:17:09<6:04:19, 12.90s/it] {'loss': 0.0027, 'learning_rate': 8.555e-06, 'epoch': 3.13} 83%|████████▎ | 8305/10000 [30:17:09<6:04:19, 12.90s/it] 83%|████████▎ | 8306/10000 [30:17:22<6:03:59, 12.89s/it] {'loss': 0.0036, 'learning_rate': 8.550000000000001e-06, 'epoch': 3.13} 83%|████████▎ | 8306/10000 [30:17:22<6:03:59, 12.89s/it] 83%|████████▎ | 8307/10000 [30:17:35<6:04:06, 12.90s/it] {'loss': 0.0038, 'learning_rate': 8.545e-06, 'epoch': 3.13} 83%|████████▎ | 8307/10000 [30:17:35<6:04:06, 12.90s/it] 83%|████████▎ | 8308/10000 [30:17:48<6:05:10, 12.95s/it] {'loss': 0.0032, 'learning_rate': 8.540000000000001e-06, 'epoch': 3.13} 83%|████████▎ | 8308/10000 [30:17:48<6:05:10, 12.95s/it] 83%|████████▎ | 8309/10000 [30:18:01<6:05:37, 12.97s/it] {'loss': 0.0033, 'learning_rate': 8.535e-06, 'epoch': 3.13} 83%|████████▎ | 8309/10000 [30:18:01<6:05:37, 12.97s/it] 83%|████████▎ | 8310/10000 [30:18:14<6:05:24, 12.97s/it] {'loss': 0.0036, 'learning_rate': 8.53e-06, 'epoch': 3.13} 83%|████████▎ | 8310/10000 [30:18:14<6:05:24, 12.97s/it] 83%|████████▎ | 8311/10000 [30:18:27<6:05:10, 12.97s/it] {'loss': 0.0039, 'learning_rate': 8.525e-06, 'epoch': 3.13} 83%|████████▎ | 8311/10000 [30:18:27<6:05:10, 12.97s/it] 83%|████████▎ | 8312/10000 [30:18:40<6:05:04, 12.98s/it] {'loss': 0.004, 'learning_rate': 8.52e-06, 'epoch': 3.13} 83%|████████▎ | 8312/10000 [30:18:40<6:05:04, 12.98s/it] 83%|████████▎ | 8313/10000 [30:18:53<6:03:49, 12.94s/it] {'loss': 0.0028, 'learning_rate': 8.515e-06, 'epoch': 3.13} 83%|████████▎ | 8313/10000 [30:18:53<6:03:49, 12.94s/it] 83%|████████▎ | 8314/10000 [30:19:06<6:04:30, 12.97s/it] {'loss': 0.0034, 'learning_rate': 8.51e-06, 'epoch': 3.13} 83%|████████▎ | 8314/10000 [30:19:06<6:04:30, 12.97s/it] 83%|████████▎ | 8315/10000 [30:19:19<6:04:04, 12.96s/it] {'loss': 0.0045, 'learning_rate': 8.505e-06, 'epoch': 3.13} 83%|████████▎ | 
8315/10000 [30:19:19<6:04:04, 12.96s/it] 83%|████████▎ | 8316/10000 [30:19:32<6:03:40, 12.96s/it] {'loss': 0.0038, 'learning_rate': 8.500000000000002e-06, 'epoch': 3.13} 83%|████████▎ | 8316/10000 [30:19:32<6:03:40, 12.96s/it] 83%|████████▎ | 8317/10000 [30:19:45<6:02:35, 12.93s/it] {'loss': 0.0032, 'learning_rate': 8.495e-06, 'epoch': 3.13} 83%|████████▎ | 8317/10000 [30:19:45<6:02:35, 12.93s/it] 83%|████████▎ | 8318/10000 [30:19:58<6:02:50, 12.94s/it] {'loss': 0.0033, 'learning_rate': 8.49e-06, 'epoch': 3.13} 83%|████████▎ | 8318/10000 [30:19:58<6:02:50, 12.94s/it] 83%|████████▎ | 8319/10000 [30:20:11<6:03:18, 12.97s/it] {'loss': 0.0051, 'learning_rate': 8.485e-06, 'epoch': 3.13} 83%|████████▎ | 8319/10000 [30:20:11<6:03:18, 12.97s/it] 83%|████████▎ | 8320/10000 [30:20:24<6:02:44, 12.95s/it] {'loss': 0.0032, 'learning_rate': 8.48e-06, 'epoch': 3.13} 83%|████████▎ | 8320/10000 [30:20:24<6:02:44, 12.95s/it] 83%|████████▎ | 8321/10000 [30:20:37<6:01:59, 12.94s/it] {'loss': 0.0058, 'learning_rate': 8.475000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8321/10000 [30:20:37<6:01:59, 12.94s/it] 83%|████████▎ | 8322/10000 [30:20:49<6:01:09, 12.91s/it] {'loss': 0.004, 'learning_rate': 8.47e-06, 'epoch': 3.14} 83%|████████▎ | 8322/10000 [30:20:49<6:01:09, 12.91s/it] 83%|████████▎ | 8323/10000 [30:21:02<6:00:12, 12.89s/it] {'loss': 0.0043, 'learning_rate': 8.465000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8323/10000 [30:21:02<6:00:12, 12.89s/it] 83%|████████▎ | 8324/10000 [30:21:15<5:59:43, 12.88s/it] {'loss': 0.0035, 'learning_rate': 8.46e-06, 'epoch': 3.14} 83%|████████▎ | 8324/10000 [30:21:15<5:59:43, 12.88s/it] 83%|████████▎ | 8325/10000 [30:21:28<5:59:13, 12.87s/it] {'loss': 0.004, 'learning_rate': 8.455000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8325/10000 [30:21:28<5:59:13, 12.87s/it] 83%|████████▎ | 8326/10000 [30:21:41<5:59:16, 12.88s/it] {'loss': 0.0035, 'learning_rate': 8.45e-06, 'epoch': 3.14} 83%|████████▎ | 8326/10000 [30:21:41<5:59:16, 12.88s/it] 
83%|████████▎ | 8327/10000 [30:21:54<5:59:39, 12.90s/it] {'loss': 0.005, 'learning_rate': 8.445e-06, 'epoch': 3.14} 83%|████████▎ | 8327/10000 [30:21:54<5:59:39, 12.90s/it] 83%|████████▎ | 8328/10000 [30:22:07<5:59:12, 12.89s/it] {'loss': 0.0042, 'learning_rate': 8.44e-06, 'epoch': 3.14} 83%|████████▎ | 8328/10000 [30:22:07<5:59:12, 12.89s/it] 83%|████████▎ | 8329/10000 [30:22:20<5:59:40, 12.91s/it] {'loss': 0.0043, 'learning_rate': 8.435e-06, 'epoch': 3.14} 83%|████████▎ | 8329/10000 [30:22:20<5:59:40, 12.91s/it] 83%|████████▎ | 8330/10000 [30:22:33<5:59:03, 12.90s/it] {'loss': 0.0053, 'learning_rate': 8.43e-06, 'epoch': 3.14} 83%|████████▎ | 8330/10000 [30:22:33<5:59:03, 12.90s/it] 83%|████████▎ | 8331/10000 [30:22:45<5:58:27, 12.89s/it] {'loss': 0.0029, 'learning_rate': 8.425000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8331/10000 [30:22:45<5:58:27, 12.89s/it] 83%|████████▎ | 8332/10000 [30:22:58<5:58:31, 12.90s/it] {'loss': 0.0037, 'learning_rate': 8.42e-06, 'epoch': 3.14} 83%|████████▎ | 8332/10000 [30:22:58<5:58:31, 12.90s/it] 83%|████████▎ | 8333/10000 [30:23:11<5:57:53, 12.88s/it] {'loss': 0.0036, 'learning_rate': 8.415e-06, 'epoch': 3.14} 83%|████████▎ | 8333/10000 [30:23:11<5:57:53, 12.88s/it] 83%|████████▎ | 8334/10000 [30:23:24<5:57:52, 12.89s/it] {'loss': 0.0025, 'learning_rate': 8.409999999999999e-06, 'epoch': 3.14} 83%|████████▎ | 8334/10000 [30:23:24<5:57:52, 12.89s/it] 83%|████████▎ | 8335/10000 [30:23:37<5:57:27, 12.88s/it] {'loss': 0.0039, 'learning_rate': 8.405e-06, 'epoch': 3.14} 83%|████████▎ | 8335/10000 [30:23:37<5:57:27, 12.88s/it] 83%|████████▎ | 8336/10000 [30:23:50<5:57:41, 12.90s/it] {'loss': 0.0034, 'learning_rate': 8.400000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8336/10000 [30:23:50<5:57:41, 12.90s/it] 83%|████████▎ | 8337/10000 [30:24:03<5:58:03, 12.92s/it] {'loss': 0.0032, 'learning_rate': 8.395e-06, 'epoch': 3.14} 83%|████████▎ | 8337/10000 [30:24:03<5:58:03, 12.92s/it] 83%|████████▎ | 8338/10000 [30:24:16<5:58:16, 
12.93s/it] {'loss': 0.0031, 'learning_rate': 8.390000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8338/10000 [30:24:16<5:58:16, 12.93s/it] 83%|████████▎ | 8339/10000 [30:24:29<5:57:47, 12.92s/it] {'loss': 0.0041, 'learning_rate': 8.385e-06, 'epoch': 3.14} 83%|████████▎ | 8339/10000 [30:24:29<5:57:47, 12.92s/it] 83%|████████▎ | 8340/10000 [30:24:42<5:56:58, 12.90s/it] {'loss': 0.0043, 'learning_rate': 8.380000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8340/10000 [30:24:42<5:56:58, 12.90s/it] 83%|████████▎ | 8341/10000 [30:24:54<5:56:56, 12.91s/it] {'loss': 0.0031, 'learning_rate': 8.375e-06, 'epoch': 3.14} 83%|████████▎ | 8341/10000 [30:24:55<5:56:56, 12.91s/it] 83%|████████▎ | 8342/10000 [30:25:07<5:57:30, 12.94s/it] {'loss': 0.0032, 'learning_rate': 8.37e-06, 'epoch': 3.14} 83%|████████▎ | 8342/10000 [30:25:08<5:57:30, 12.94s/it] 83%|████████▎ | 8343/10000 [30:25:20<5:57:32, 12.95s/it] {'loss': 0.0034, 'learning_rate': 8.365e-06, 'epoch': 3.14} 83%|████████▎ | 8343/10000 [30:25:20<5:57:32, 12.95s/it] 83%|████████▎ | 8344/10000 [30:25:33<5:57:00, 12.94s/it] {'loss': 0.003, 'learning_rate': 8.36e-06, 'epoch': 3.14} 83%|████████▎ | 8344/10000 [30:25:33<5:57:00, 12.94s/it] 83%|████████▎ | 8345/10000 [30:25:46<5:57:53, 12.97s/it] {'loss': 0.0043, 'learning_rate': 8.355e-06, 'epoch': 3.14} 83%|████████▎ | 8345/10000 [30:25:46<5:57:53, 12.97s/it] 83%|████████▎ | 8346/10000 [30:25:59<5:57:30, 12.97s/it] {'loss': 0.0035, 'learning_rate': 8.350000000000001e-06, 'epoch': 3.14} 83%|████████▎ | 8346/10000 [30:25:59<5:57:30, 12.97s/it] 83%|████████▎ | 8347/10000 [30:26:12<5:56:40, 12.95s/it] {'loss': 0.004, 'learning_rate': 8.345e-06, 'epoch': 3.15} 83%|████████▎ | 8347/10000 [30:26:12<5:56:40, 12.95s/it] 83%|████████▎ | 8348/10000 [30:26:25<5:56:00, 12.93s/it] {'loss': 0.0038, 'learning_rate': 8.34e-06, 'epoch': 3.15} 83%|████████▎ | 8348/10000 [30:26:25<5:56:00, 12.93s/it] 83%|████████▎ | 8349/10000 [30:26:38<5:54:55, 12.90s/it] {'loss': 0.0046, 'learning_rate': 
8.334999999999999e-06, 'epoch': 3.15} 83%|████████▎ | 8349/10000 [30:26:38<5:54:55, 12.90s/it] 84%|████████▎ | 8350/10000 [30:26:51<5:54:54, 12.91s/it] {'loss': 0.0032, 'learning_rate': 8.33e-06, 'epoch': 3.15} 84%|████████▎ | 8350/10000 [30:26:51<5:54:54, 12.91s/it] 84%|████████▎ | 8351/10000 [30:27:04<5:54:44, 12.91s/it] {'loss': 0.0035, 'learning_rate': 8.325e-06, 'epoch': 3.15} 84%|████████▎ | 8351/10000 [30:27:04<5:54:44, 12.91s/it] 84%|████████▎ | 8352/10000 [30:27:17<5:54:37, 12.91s/it] {'loss': 0.0035, 'learning_rate': 8.32e-06, 'epoch': 3.15} 84%|████████▎ | 8352/10000 [30:27:17<5:54:37, 12.91s/it] 84%|████████▎ | 8353/10000 [30:27:30<5:54:12, 12.90s/it] {'loss': 0.0046, 'learning_rate': 8.315000000000001e-06, 'epoch': 3.15} 84%|████████▎ | 8353/10000 [30:27:30<5:54:12, 12.90s/it] 84%|████████▎ | 8354/10000 [30:27:43<5:54:08, 12.91s/it] {'loss': 0.0027, 'learning_rate': 8.31e-06, 'epoch': 3.15} 84%|████████▎ | 8354/10000 [30:27:43<5:54:08, 12.91s/it] 84%|████████▎ | 8355/10000 [30:27:55<5:54:08, 12.92s/it] {'loss': 0.0037, 'learning_rate': 8.305000000000001e-06, 'epoch': 3.15} 84%|████████▎ | 8355/10000 [30:27:56<5:54:08, 12.92s/it] 84%|████████▎ | 8356/10000 [30:28:08<5:53:36, 12.91s/it] {'loss': 0.0036, 'learning_rate': 8.3e-06, 'epoch': 3.15} 84%|████████▎ | 8356/10000 [30:28:08<5:53:36, 12.91s/it] 84%|████████▎ | 8357/10000 [30:28:21<5:53:08, 12.90s/it] {'loss': 0.0045, 'learning_rate': 8.295e-06, 'epoch': 3.15} 84%|████████▎ | 8357/10000 [30:28:21<5:53:08, 12.90s/it] 84%|████████▎ | 8358/10000 [30:28:34<5:52:52, 12.89s/it] {'loss': 0.0045, 'learning_rate': 8.29e-06, 'epoch': 3.15} 84%|████████▎ | 8358/10000 [30:28:34<5:52:52, 12.89s/it] 84%|████████▎ | 8359/10000 [30:28:47<5:52:26, 12.89s/it] {'loss': 0.0043, 'learning_rate': 8.285e-06, 'epoch': 3.15} 84%|████████▎ | 8359/10000 [30:28:47<5:52:26, 12.89s/it] 84%|████████▎ | 8360/10000 [30:29:00<5:52:38, 12.90s/it] {'loss': 0.0041, 'learning_rate': 8.28e-06, 'epoch': 3.15} 84%|████████▎ | 8360/10000 
[30:29:00<5:52:38, 12.90s/it] 84%|████████▎ | 8361/10000 [30:29:13<5:52:17, 12.90s/it] {'loss': 0.004, 'learning_rate': 8.275000000000001e-06, 'epoch': 3.15} 84%|████████▎ | 8361/10000 [30:29:13<5:52:17, 12.90s/it] 84%|████████▎ | 8362/10000 [30:29:26<5:52:04, 12.90s/it] {'loss': 0.0037, 'learning_rate': 8.27e-06, 'epoch': 3.15} 84%|████████▎ | 8362/10000 [30:29:26<5:52:04, 12.90s/it] 84%|████████▎ | 8363/10000 [30:29:39<5:52:08, 12.91s/it] {'loss': 0.0038, 'learning_rate': 8.265000000000001e-06, 'epoch': 3.15} 84%|████████▎ | 8363/10000 [30:29:39<5:52:08, 12.91s/it] 84%|████████▎ | 8364/10000 [30:29:52<5:51:47, 12.90s/it] {'loss': 0.0031, 'learning_rate': 8.26e-06, 'epoch': 3.15} 84%|████████▎ | 8364/10000 [30:29:52<5:51:47, 12.90s/it] 84%|████████▎ | 8365/10000 [30:30:04<5:51:13, 12.89s/it] {'loss': 0.0044, 'learning_rate': 8.255e-06, 'epoch': 3.15} 84%|████████▎ | 8365/10000 [30:30:04<5:51:13, 12.89s/it] 84%|████████▎ | 8366/10000 [30:30:17<5:50:39, 12.88s/it] {'loss': 0.0043, 'learning_rate': 8.25e-06, 'epoch': 3.15} 84%|████████▎ | 8366/10000 [30:30:17<5:50:39, 12.88s/it] 84%|████████▎ | 8367/10000 [30:30:30<5:50:37, 12.88s/it] {'loss': 0.0052, 'learning_rate': 8.245e-06, 'epoch': 3.15} 84%|████████▎ | 8367/10000 [30:30:30<5:50:37, 12.88s/it] 84%|████████▎ | 8368/10000 [30:30:43<5:50:20, 12.88s/it] {'loss': 0.004, 'learning_rate': 8.24e-06, 'epoch': 3.15} 84%|████████▎ | 8368/10000 [30:30:43<5:50:20, 12.88s/it] 84%|████████▎ | 8369/10000 [30:30:56<5:50:11, 12.88s/it] {'loss': 0.0047, 'learning_rate': 8.235000000000002e-06, 'epoch': 3.15} 84%|████████▎ | 8369/10000 [30:30:56<5:50:11, 12.88s/it] 84%|████████▎ | 8370/10000 [30:31:09<5:49:44, 12.87s/it] {'loss': 0.0045, 'learning_rate': 8.23e-06, 'epoch': 3.15} 84%|████████▎ | 8370/10000 [30:31:09<5:49:44, 12.87s/it] 84%|████████▎ | 8371/10000 [30:31:22<5:49:35, 12.88s/it] {'loss': 0.0033, 'learning_rate': 8.225e-06, 'epoch': 3.15} 84%|████████▎ | 8371/10000 [30:31:22<5:49:35, 12.88s/it] 84%|████████▎ | 8372/10000 
[30:31:35<5:49:44, 12.89s/it] {'loss': 0.0029, 'learning_rate': 8.22e-06, 'epoch': 3.15} 84%|████████▎ | 8372/10000 [30:31:35<5:49:44, 12.89s/it] 84%|████████▎ | 8373/10000 [30:31:47<5:49:53, 12.90s/it] {'loss': 0.0044, 'learning_rate': 8.215e-06, 'epoch': 3.15} 84%|████████▎ | 8373/10000 [30:31:48<5:49:53, 12.90s/it] 84%|████████▎ | 8374/10000 [30:32:00<5:49:28, 12.90s/it] {'loss': 0.0045, 'learning_rate': 8.210000000000001e-06, 'epoch': 3.16} 84%|████████▎ | 8374/10000 [30:32:00<5:49:28, 12.90s/it] 84%|████████▍ | 8375/10000 [30:32:13<5:49:25, 12.90s/it] {'loss': 0.0037, 'learning_rate': 8.205e-06, 'epoch': 3.16} 84%|████████▍ | 8375/10000 [30:32:13<5:49:25, 12.90s/it] 84%|████████▍ | 8376/10000 [30:32:26<5:48:58, 12.89s/it] {'loss': 0.0039, 'learning_rate': 8.200000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8376/10000 [30:32:26<5:48:58, 12.89s/it] 84%|████████▍ | 8377/10000 [30:32:39<5:48:59, 12.90s/it] {'loss': 0.0042, 'learning_rate': 8.195e-06, 'epoch': 3.16} 84%|████████▍ | 8377/10000 [30:32:39<5:48:59, 12.90s/it] 84%|████████▍ | 8378/10000 [30:32:52<5:49:14, 12.92s/it] {'loss': 0.0041, 'learning_rate': 8.190000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8378/10000 [30:32:52<5:49:14, 12.92s/it] 84%|████████▍ | 8379/10000 [30:33:05<5:49:21, 12.93s/it] {'loss': 0.0038, 'learning_rate': 8.185e-06, 'epoch': 3.16} 84%|████████▍ | 8379/10000 [30:33:05<5:49:21, 12.93s/it] 84%|████████▍ | 8380/10000 [30:33:18<5:50:01, 12.96s/it] {'loss': 0.0031, 'learning_rate': 8.18e-06, 'epoch': 3.16} 84%|████████▍ | 8380/10000 [30:33:18<5:50:01, 12.96s/it] 84%|████████▍ | 8381/10000 [30:33:31<5:50:04, 12.97s/it] {'loss': 0.0037, 'learning_rate': 8.175e-06, 'epoch': 3.16} 84%|████████▍ | 8381/10000 [30:33:31<5:50:04, 12.97s/it] 84%|████████▍ | 8382/10000 [30:33:44<5:49:17, 12.95s/it] {'loss': 0.0029, 'learning_rate': 8.17e-06, 'epoch': 3.16} 84%|████████▍ | 8382/10000 [30:33:44<5:49:17, 12.95s/it] 84%|████████▍ | 8383/10000 [30:33:57<5:48:48, 12.94s/it] {'loss': 0.0035, 
'learning_rate': 8.165e-06, 'epoch': 3.16} 84%|████████▍ | 8383/10000 [30:33:57<5:48:48, 12.94s/it] 84%|████████▍ | 8384/10000 [30:34:10<5:47:58, 12.92s/it] {'loss': 0.0031, 'learning_rate': 8.160000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8384/10000 [30:34:10<5:47:58, 12.92s/it] 84%|████████▍ | 8385/10000 [30:34:23<5:47:25, 12.91s/it] {'loss': 0.0041, 'learning_rate': 8.155e-06, 'epoch': 3.16} 84%|████████▍ | 8385/10000 [30:34:23<5:47:25, 12.91s/it] 84%|████████▍ | 8386/10000 [30:34:36<5:47:25, 12.92s/it] {'loss': 0.0035, 'learning_rate': 8.15e-06, 'epoch': 3.16} 84%|████████▍ | 8386/10000 [30:34:36<5:47:25, 12.92s/it] 84%|████████▍ | 8387/10000 [30:34:48<5:46:40, 12.90s/it] {'loss': 0.0042, 'learning_rate': 8.144999999999999e-06, 'epoch': 3.16} 84%|████████▍ | 8387/10000 [30:34:48<5:46:40, 12.90s/it] 84%|████████▍ | 8388/10000 [30:35:01<5:46:47, 12.91s/it] {'loss': 0.0035, 'learning_rate': 8.14e-06, 'epoch': 3.16} 84%|████████▍ | 8388/10000 [30:35:01<5:46:47, 12.91s/it] 84%|████████▍ | 8389/10000 [30:35:14<5:46:22, 12.90s/it] {'loss': 0.004, 'learning_rate': 8.135000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8389/10000 [30:35:14<5:46:22, 12.90s/it] 84%|████████▍ | 8390/10000 [30:35:27<5:46:21, 12.91s/it] {'loss': 0.0036, 'learning_rate': 8.13e-06, 'epoch': 3.16} 84%|████████▍ | 8390/10000 [30:35:27<5:46:21, 12.91s/it] 84%|████████▍ | 8391/10000 [30:35:40<5:46:31, 12.92s/it] {'loss': 0.0034, 'learning_rate': 8.125000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8391/10000 [30:35:40<5:46:31, 12.92s/it] 84%|████████▍ | 8392/10000 [30:35:53<5:45:56, 12.91s/it] {'loss': 0.0038, 'learning_rate': 8.12e-06, 'epoch': 3.16} 84%|████████▍ | 8392/10000 [30:35:53<5:45:56, 12.91s/it] 84%|████████▍ | 8393/10000 [30:36:06<5:44:54, 12.88s/it] {'loss': 0.0053, 'learning_rate': 8.115000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8393/10000 [30:36:06<5:44:54, 12.88s/it] 84%|████████▍ | 8394/10000 [30:36:19<5:45:35, 12.91s/it] {'loss': 0.0041, 'learning_rate': 8.11e-06, 
'epoch': 3.16} 84%|████████▍ | 8394/10000 [30:36:19<5:45:35, 12.91s/it] 84%|████████▍ | 8395/10000 [30:36:32<5:45:52, 12.93s/it] {'loss': 0.0043, 'learning_rate': 8.105e-06, 'epoch': 3.16} 84%|████████▍ | 8395/10000 [30:36:32<5:45:52, 12.93s/it] 84%|████████▍ | 8396/10000 [30:36:45<5:46:04, 12.95s/it] {'loss': 0.0047, 'learning_rate': 8.1e-06, 'epoch': 3.16} 84%|████████▍ | 8396/10000 [30:36:45<5:46:04, 12.95s/it] 84%|████████▍ | 8397/10000 [30:36:58<5:45:18, 12.93s/it] {'loss': 0.0038, 'learning_rate': 8.095e-06, 'epoch': 3.16} 84%|████████▍ | 8397/10000 [30:36:58<5:45:18, 12.93s/it] 84%|████████▍ | 8398/10000 [30:37:10<5:44:19, 12.90s/it] {'loss': 0.0033, 'learning_rate': 8.09e-06, 'epoch': 3.16} 84%|████████▍ | 8398/10000 [30:37:10<5:44:19, 12.90s/it] 84%|████████▍ | 8399/10000 [30:37:23<5:44:03, 12.89s/it] {'loss': 0.0038, 'learning_rate': 8.085000000000001e-06, 'epoch': 3.16} 84%|████████▍ | 8399/10000 [30:37:23<5:44:03, 12.89s/it] 84%|████████▍ | 8400/10000 [30:37:36<5:45:15, 12.95s/it] {'loss': 0.0025, 'learning_rate': 8.08e-06, 'epoch': 3.17} 84%|████████▍ | 8400/10000 [30:37:36<5:45:15, 12.95s/it] 84%|████████▍ | 8401/10000 [30:37:49<5:44:27, 12.93s/it] {'loss': 0.0045, 'learning_rate': 8.075000000000001e-06, 'epoch': 3.17} 84%|████████▍ | 8401/10000 [30:37:49<5:44:27, 12.93s/it] 84%|████████▍ | 8402/10000 [30:38:02<5:43:49, 12.91s/it] {'loss': 0.0033, 'learning_rate': 8.069999999999999e-06, 'epoch': 3.17} 84%|████████▍ | 8402/10000 [30:38:02<5:43:49, 12.91s/it] 84%|████████▍ | 8403/10000 [30:38:15<5:44:13, 12.93s/it] {'loss': 0.0035, 'learning_rate': 8.065e-06, 'epoch': 3.17} 84%|████████▍ | 8403/10000 [30:38:15<5:44:13, 12.93s/it] 84%|████████▍ | 8404/10000 [30:38:28<5:43:23, 12.91s/it] {'loss': 0.0037, 'learning_rate': 8.06e-06, 'epoch': 3.17} 84%|████████▍ | 8404/10000 [30:38:28<5:43:23, 12.91s/it] 84%|████████▍ | 8405/10000 [30:38:41<5:43:35, 12.92s/it] {'loss': 0.0032, 'learning_rate': 8.055e-06, 'epoch': 3.17} 84%|████████▍ | 8405/10000 
[30:38:41<5:43:35, 12.92s/it] 84%|████████▍ | 8406/10000 [30:38:54<5:43:05, 12.91s/it] {'loss': 0.0034, 'learning_rate': 8.050000000000001e-06, 'epoch': 3.17} 84%|████████▍ | 8406/10000 [30:38:54<5:43:05, 12.91s/it] 84%|████████▍ | 8407/10000 [30:39:07<5:43:28, 12.94s/it] {'loss': 0.003, 'learning_rate': 8.045e-06, 'epoch': 3.17} 84%|████████▍ | 8407/10000 [30:39:07<5:43:28, 12.94s/it] 84%|████████▍ | 8408/10000 [30:39:20<5:44:16, 12.98s/it] {'loss': 0.0037, 'learning_rate': 8.040000000000001e-06, 'epoch': 3.17} 84%|████████▍ | 8408/10000 [30:39:20<5:44:16, 12.98s/it] 84%|████████▍ | 8409/10000 [30:39:33<5:43:35, 12.96s/it] {'loss': 0.0034, 'learning_rate': 8.035e-06, 'epoch': 3.17} 84%|████████▍ | 8409/10000 [30:39:33<5:43:35, 12.96s/it] 84%|████████▍ | 8410/10000 [30:39:46<5:43:17, 12.95s/it] {'loss': 0.0052, 'learning_rate': 8.03e-06, 'epoch': 3.17} 84%|████████▍ | 8410/10000 [30:39:46<5:43:17, 12.95s/it] 84%|████████▍ | 8411/10000 [30:39:59<5:44:00, 12.99s/it] {'loss': 0.0036, 'learning_rate': 8.025e-06, 'epoch': 3.17} 84%|████████▍ | 8411/10000 [30:39:59<5:44:00, 12.99s/it] 84%|████████▍ | 8412/10000 [30:40:12<5:44:17, 13.01s/it] {'loss': 0.0035, 'learning_rate': 8.02e-06, 'epoch': 3.17} 84%|████████▍ | 8412/10000 [30:40:12<5:44:17, 13.01s/it] 84%|████████▍ | 8413/10000 [30:40:25<5:44:36, 13.03s/it] {'loss': 0.0024, 'learning_rate': 8.015e-06, 'epoch': 3.17} 84%|████████▍ | 8413/10000 [30:40:25<5:44:36, 13.03s/it] 84%|████████▍ | 8414/10000 [30:40:38<5:43:08, 12.98s/it] {'loss': 0.0038, 'learning_rate': 8.010000000000001e-06, 'epoch': 3.17} 84%|████████▍ | 8414/10000 [30:40:38<5:43:08, 12.98s/it] 84%|████████▍ | 8415/10000 [30:40:51<5:42:44, 12.97s/it] {'loss': 0.0036, 'learning_rate': 8.005e-06, 'epoch': 3.17} 84%|████████▍ | 8415/10000 [30:40:51<5:42:44, 12.97s/it] 84%|████████▍ | 8416/10000 [30:41:04<5:42:14, 12.96s/it] {'loss': 0.0037, 'learning_rate': 8.000000000000001e-06, 'epoch': 3.17} 84%|████████▍ | 8416/10000 [30:41:04<5:42:14, 12.96s/it] 
84%|████████▍ | 8417/10000 [30:41:17<5:41:20, 12.94s/it] {'loss': 0.0033, 'learning_rate': 7.995e-06, 'epoch': 3.17} 84%|████████▍ | 8417/10000 [30:41:17<5:41:20, 12.94s/it] 84%|████████▍ | 8418/10000 [30:41:29<5:40:35, 12.92s/it] {'loss': 0.0035, 'learning_rate': 7.99e-06, 'epoch': 3.17} 84%|████████▍ | 8418/10000 [30:41:29<5:40:35, 12.92s/it] 84%|████████▍ | 8419/10000 [30:41:42<5:40:28, 12.92s/it] {'loss': 0.0042, 'learning_rate': 7.985e-06, 'epoch': 3.17} 84%|████████▍ | 8419/10000 [30:41:42<5:40:28, 12.92s/it] 84%|████████▍ | 8420/10000 [30:41:55<5:40:03, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.98e-06, 'epoch': 3.17} 84%|████████▍ | 8420/10000 [30:41:55<5:40:03, 12.91s/it] 84%|████████▍ | 8421/10000 [30:42:08<5:39:21, 12.90s/it] {'loss': 0.0038, 'learning_rate': 7.975e-06, 'epoch': 3.17} 84%|████████▍ | 8421/10000 [30:42:08<5:39:21, 12.90s/it] 84%|████████▍ | 8422/10000 [30:42:21<5:39:37, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.97e-06, 'epoch': 3.17} 84%|████████▍ | 8422/10000 [30:42:21<5:39:37, 12.91s/it] 84%|████████▍ | 8423/10000 [30:42:34<5:39:36, 12.92s/it] {'loss': 0.0037, 'learning_rate': 7.965e-06, 'epoch': 3.17} 84%|████████▍ | 8423/10000 [30:42:34<5:39:36, 12.92s/it] 84%|████████▍ | 8424/10000 [30:42:47<5:39:26, 12.92s/it] {'loss': 0.0038, 'learning_rate': 7.96e-06, 'epoch': 3.17} 84%|████████▍ | 8424/10000 [30:42:47<5:39:26, 12.92s/it] 84%|████████▍ | 8425/10000 [30:43:00<5:39:28, 12.93s/it] {'loss': 0.0036, 'learning_rate': 7.955e-06, 'epoch': 3.17} 84%|████████▍ | 8425/10000 [30:43:00<5:39:28, 12.93s/it] 84%|████████▍ | 8426/10000 [30:43:13<5:39:25, 12.94s/it] {'loss': 0.0041, 'learning_rate': 7.95e-06, 'epoch': 3.17} 84%|████████▍ | 8426/10000 [30:43:13<5:39:25, 12.94s/it] 84%|████████▍ | 8427/10000 [30:43:26<5:39:27, 12.95s/it] {'loss': 0.0047, 'learning_rate': 7.945000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8427/10000 [30:43:26<5:39:27, 12.95s/it] 84%|████████▍ | 8428/10000 [30:43:39<5:39:53, 12.97s/it] {'loss': 0.0033, 
'learning_rate': 7.94e-06, 'epoch': 3.18} 84%|████████▍ | 8428/10000 [30:43:39<5:39:53, 12.97s/it] 84%|████████▍ | 8429/10000 [30:43:52<5:40:02, 12.99s/it] {'loss': 0.0028, 'learning_rate': 7.935000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8429/10000 [30:43:52<5:40:02, 12.99s/it] 84%|████████▍ | 8430/10000 [30:44:05<5:39:10, 12.96s/it] {'loss': 0.0032, 'learning_rate': 7.93e-06, 'epoch': 3.18} 84%|████████▍ | 8430/10000 [30:44:05<5:39:10, 12.96s/it] 84%|████████▍ | 8431/10000 [30:44:18<5:38:47, 12.96s/it] {'loss': 0.0046, 'learning_rate': 7.925000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8431/10000 [30:44:18<5:38:47, 12.96s/it] 84%|████████▍ | 8432/10000 [30:44:31<5:39:09, 12.98s/it] {'loss': 0.0031, 'learning_rate': 7.92e-06, 'epoch': 3.18} 84%|████████▍ | 8432/10000 [30:44:31<5:39:09, 12.98s/it] 84%|████████▍ | 8433/10000 [30:44:44<5:38:41, 12.97s/it] {'loss': 0.0041, 'learning_rate': 7.915e-06, 'epoch': 3.18} 84%|████████▍ | 8433/10000 [30:44:44<5:38:41, 12.97s/it] 84%|████████▍ | 8434/10000 [30:44:57<5:37:59, 12.95s/it] {'loss': 0.0033, 'learning_rate': 7.91e-06, 'epoch': 3.18} 84%|████████▍ | 8434/10000 [30:44:57<5:37:59, 12.95s/it] 84%|████████▍ | 8435/10000 [30:45:10<5:37:30, 12.94s/it] {'loss': 0.0047, 'learning_rate': 7.905e-06, 'epoch': 3.18} 84%|████████▍ | 8435/10000 [30:45:10<5:37:30, 12.94s/it] 84%|████████▍ | 8436/10000 [30:45:23<5:37:34, 12.95s/it] {'loss': 0.0026, 'learning_rate': 7.9e-06, 'epoch': 3.18} 84%|████████▍ | 8436/10000 [30:45:23<5:37:34, 12.95s/it] 84%|████████▍ | 8437/10000 [30:45:35<5:36:51, 12.93s/it] {'loss': 0.0024, 'learning_rate': 7.895000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8437/10000 [30:45:35<5:36:51, 12.93s/it] 84%|████████▍ | 8438/10000 [30:45:48<5:37:41, 12.97s/it] {'loss': 0.0034, 'learning_rate': 7.89e-06, 'epoch': 3.18} 84%|████████▍ | 8438/10000 [30:45:48<5:37:41, 12.97s/it] 84%|████████▍ | 8439/10000 [30:46:01<5:37:36, 12.98s/it] {'loss': 0.0049, 'learning_rate': 7.885e-06, 'epoch': 3.18} 84%|████████▍ 
| 8439/10000 [30:46:01<5:37:36, 12.98s/it] 84%|████████▍ | 8440/10000 [30:46:14<5:36:53, 12.96s/it] {'loss': 0.0044, 'learning_rate': 7.879999999999999e-06, 'epoch': 3.18} 84%|████████▍ | 8440/10000 [30:46:14<5:36:53, 12.96s/it] 84%|████████▍ | 8441/10000 [30:46:27<5:35:54, 12.93s/it] {'loss': 0.0038, 'learning_rate': 7.875e-06, 'epoch': 3.18} 84%|████████▍ | 8441/10000 [30:46:27<5:35:54, 12.93s/it] 84%|████████▍ | 8442/10000 [30:46:40<5:35:39, 12.93s/it] {'loss': 0.0038, 'learning_rate': 7.870000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8442/10000 [30:46:40<5:35:39, 12.93s/it] 84%|████████▍ | 8443/10000 [30:46:53<5:35:22, 12.92s/it] {'loss': 0.0031, 'learning_rate': 7.865e-06, 'epoch': 3.18} 84%|████████▍ | 8443/10000 [30:46:53<5:35:22, 12.92s/it] 84%|████████▍ | 8444/10000 [30:47:06<5:34:47, 12.91s/it] {'loss': 0.0039, 'learning_rate': 7.860000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8444/10000 [30:47:06<5:34:47, 12.91s/it] 84%|████████▍ | 8445/10000 [30:47:19<5:34:35, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.855e-06, 'epoch': 3.18} 84%|████████▍ | 8445/10000 [30:47:19<5:34:35, 12.91s/it] 84%|████████▍ | 8446/10000 [30:47:32<5:34:47, 12.93s/it] {'loss': 0.0043, 'learning_rate': 7.850000000000001e-06, 'epoch': 3.18} 84%|████████▍ | 8446/10000 [30:47:32<5:34:47, 12.93s/it] 84%|████████▍ | 8447/10000 [30:47:45<5:34:26, 12.92s/it] {'loss': 0.0036, 'learning_rate': 7.845e-06, 'epoch': 3.18} 84%|████████▍ | 8447/10000 [30:47:45<5:34:26, 12.92s/it] 84%|████████▍ | 8448/10000 [30:47:58<5:33:56, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.84e-06, 'epoch': 3.18} 84%|████████▍ | 8448/10000 [30:47:58<5:33:56, 12.91s/it] 84%|████████▍ | 8449/10000 [30:48:11<5:33:44, 12.91s/it] {'loss': 0.0042, 'learning_rate': 7.835e-06, 'epoch': 3.18} 84%|████████▍ | 8449/10000 [30:48:11<5:33:44, 12.91s/it] 84%|████████▍ | 8450/10000 [30:48:23<5:33:36, 12.91s/it] {'loss': 0.0043, 'learning_rate': 7.83e-06, 'epoch': 3.18} 84%|████████▍ | 8450/10000 [30:48:23<5:33:36, 
12.91s/it] 85%|████████▍ | 8451/10000 [30:48:36<5:33:38, 12.92s/it] {'loss': 0.0033, 'learning_rate': 7.825e-06, 'epoch': 3.18} 85%|████████▍ | 8451/10000 [30:48:36<5:33:38, 12.92s/it] 85%|████████▍ | 8452/10000 [30:48:49<5:33:41, 12.93s/it] {'loss': 0.0048, 'learning_rate': 7.820000000000001e-06, 'epoch': 3.18} 85%|████████▍ | 8452/10000 [30:48:49<5:33:41, 12.93s/it] 85%|████████▍ | 8453/10000 [30:49:02<5:33:50, 12.95s/it] {'loss': 0.0037, 'learning_rate': 7.815e-06, 'epoch': 3.19} 85%|████████▍ | 8453/10000 [30:49:02<5:33:50, 12.95s/it] 85%|████████▍ | 8454/10000 [30:49:15<5:34:05, 12.97s/it] {'loss': 0.0037, 'learning_rate': 7.810000000000001e-06, 'epoch': 3.19} 85%|████████▍ | 8454/10000 [30:49:15<5:34:05, 12.97s/it] 85%|████████▍ | 8455/10000 [30:49:28<5:33:47, 12.96s/it] {'loss': 0.003, 'learning_rate': 7.805e-06, 'epoch': 3.19} 85%|████████▍ | 8455/10000 [30:49:28<5:33:47, 12.96s/it] 85%|████████▍ | 8456/10000 [30:49:41<5:32:46, 12.93s/it] {'loss': 0.0034, 'learning_rate': 7.8e-06, 'epoch': 3.19} 85%|████████▍ | 8456/10000 [30:49:41<5:32:46, 12.93s/it] 85%|████████▍ | 8457/10000 [30:49:54<5:32:35, 12.93s/it] {'loss': 0.0037, 'learning_rate': 7.795e-06, 'epoch': 3.19} 85%|████████▍ | 8457/10000 [30:49:54<5:32:35, 12.93s/it] 85%|████████▍ | 8458/10000 [30:50:07<5:32:41, 12.95s/it] {'loss': 0.0044, 'learning_rate': 7.79e-06, 'epoch': 3.19} 85%|████████▍ | 8458/10000 [30:50:07<5:32:41, 12.95s/it] 85%|████████▍ | 8459/10000 [30:50:20<5:32:41, 12.95s/it] {'loss': 0.0026, 'learning_rate': 7.785000000000001e-06, 'epoch': 3.19} 85%|████████▍ | 8459/10000 [30:50:20<5:32:41, 12.95s/it] 85%|████████▍ | 8460/10000 [30:50:33<5:32:05, 12.94s/it] {'loss': 0.0034, 'learning_rate': 7.78e-06, 'epoch': 3.19} 85%|████████▍ | 8460/10000 [30:50:33<5:32:05, 12.94s/it] 85%|████████▍ | 8461/10000 [30:50:46<5:32:17, 12.95s/it] {'loss': 0.0046, 'learning_rate': 7.775000000000001e-06, 'epoch': 3.19} 85%|████████▍ | 8461/10000 [30:50:46<5:32:17, 12.95s/it] 85%|████████▍ | 8462/10000 
[30:50:59<5:31:59, 12.95s/it] {'loss': 0.0042, 'learning_rate': 7.77e-06, 'epoch': 3.19} 85%|████████▍ | 8462/10000 [30:50:59<5:31:59, 12.95s/it] 85%|████████▍ | 8463/10000 [30:51:12<5:31:04, 12.92s/it] {'loss': 0.0038, 'learning_rate': 7.765e-06, 'epoch': 3.19} 85%|████████▍ | 8463/10000 [30:51:12<5:31:04, 12.92s/it] 85%|████████▍ | 8464/10000 [30:51:25<5:30:35, 12.91s/it] {'loss': 0.0036, 'learning_rate': 7.76e-06, 'epoch': 3.19} 85%|████████▍ | 8464/10000 [30:51:25<5:30:35, 12.91s/it] 85%|████████▍ | 8465/10000 [30:51:38<5:30:19, 12.91s/it] {'loss': 0.003, 'learning_rate': 7.755e-06, 'epoch': 3.19} 85%|████████▍ | 8465/10000 [30:51:38<5:30:19, 12.91s/it] 85%|████████▍ | 8466/10000 [30:51:50<5:29:44, 12.90s/it] {'loss': 0.0046, 'learning_rate': 7.75e-06, 'epoch': 3.19} 85%|████████▍ | 8466/10000 [30:51:50<5:29:44, 12.90s/it] 85%|████████▍ | 8467/10000 [30:52:03<5:29:31, 12.90s/it] {'loss': 0.0044, 'learning_rate': 7.745000000000001e-06, 'epoch': 3.19} 85%|████████▍ | 8467/10000 [30:52:03<5:29:31, 12.90s/it] 85%|████████▍ | 8468/10000 [30:52:16<5:29:04, 12.89s/it] {'loss': 0.0045, 'learning_rate': 7.74e-06, 'epoch': 3.19} 85%|████████▍ | 8468/10000 [30:52:16<5:29:04, 12.89s/it] 85%|████████▍ | 8469/10000 [30:52:29<5:28:55, 12.89s/it] {'loss': 0.0037, 'learning_rate': 7.735000000000001e-06, 'epoch': 3.19} 85%|████████▍ | 8469/10000 [30:52:29<5:28:55, 12.89s/it] 85%|████████▍ | 8470/10000 [30:52:42<5:28:30, 12.88s/it] {'loss': 0.0043, 'learning_rate': 7.73e-06, 'epoch': 3.19} 85%|████████▍ | 8470/10000 [30:52:42<5:28:30, 12.88s/it] 85%|████████▍ | 8471/10000 [30:52:55<5:28:18, 12.88s/it] {'loss': 0.0037, 'learning_rate': 7.725e-06, 'epoch': 3.19} 85%|████████▍ | 8471/10000 [30:52:55<5:28:18, 12.88s/it] 85%|████████▍ | 8472/10000 [30:53:08<5:28:09, 12.89s/it] {'loss': 0.0037, 'learning_rate': 7.72e-06, 'epoch': 3.19} 85%|████████▍ | 8472/10000 [30:53:08<5:28:09, 12.89s/it] 85%|████████▍ | 8473/10000 [30:53:21<5:28:03, 12.89s/it] {'loss': 0.0036, 'learning_rate': 
7.715e-06, 'epoch': 3.19} 85%|████████▍ | 8473/10000 [30:53:21<5:28:03, 12.89s/it] 85%|████████▍ | 8474/10000 [30:53:34<5:28:00, 12.90s/it] {'loss': 0.0031, 'learning_rate': 7.71e-06, 'epoch': 3.19} 85%|████████▍ | 8474/10000 [30:53:34<5:28:00, 12.90s/it] 85%|████████▍ | 8475/10000 [30:53:46<5:27:47, 12.90s/it] {'loss': 0.0038, 'learning_rate': 7.705e-06, 'epoch': 3.19} 85%|████████▍ | 8475/10000 [30:53:46<5:27:47, 12.90s/it] 85%|████████▍ | 8476/10000 [30:53:59<5:27:48, 12.91s/it] {'loss': 0.0036, 'learning_rate': 7.7e-06, 'epoch': 3.19} 85%|████████▍ | 8476/10000 [30:53:59<5:27:48, 12.91s/it] 85%|████████▍ | 8477/10000 [30:54:12<5:27:50, 12.92s/it] {'loss': 0.0036, 'learning_rate': 7.695e-06, 'epoch': 3.19} 85%|████████▍ | 8477/10000 [30:54:12<5:27:50, 12.92s/it] 85%|████████▍ | 8478/10000 [30:54:25<5:27:17, 12.90s/it] {'loss': 0.0048, 'learning_rate': 7.69e-06, 'epoch': 3.19} 85%|████████▍ | 8478/10000 [30:54:25<5:27:17, 12.90s/it] 85%|████████▍ | 8479/10000 [30:54:38<5:27:25, 12.92s/it] {'loss': 0.0032, 'learning_rate': 7.685e-06, 'epoch': 3.19} 85%|████████▍ | 8479/10000 [30:54:38<5:27:25, 12.92s/it] 85%|████████▍ | 8480/10000 [30:54:51<5:27:02, 12.91s/it] {'loss': 0.0048, 'learning_rate': 7.68e-06, 'epoch': 3.2} 85%|████████▍ | 8480/10000 [30:54:51<5:27:02, 12.91s/it] 85%|████████▍ | 8481/10000 [30:55:04<5:27:17, 12.93s/it] {'loss': 0.0028, 'learning_rate': 7.675e-06, 'epoch': 3.2} 85%|████████▍ | 8481/10000 [30:55:04<5:27:17, 12.93s/it] 85%|████████▍ | 8482/10000 [30:55:17<5:27:19, 12.94s/it] {'loss': 0.0034, 'learning_rate': 7.670000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8482/10000 [30:55:17<5:27:19, 12.94s/it] 85%|████████▍ | 8483/10000 [30:55:30<5:26:42, 12.92s/it] {'loss': 0.0038, 'learning_rate': 7.665e-06, 'epoch': 3.2} 85%|████████▍ | 8483/10000 [30:55:30<5:26:42, 12.92s/it] 85%|████████▍ | 8484/10000 [30:55:43<5:26:12, 12.91s/it] {'loss': 0.0035, 'learning_rate': 7.660000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8484/10000 [30:55:43<5:26:12, 
12.91s/it] 85%|████████▍ | 8485/10000 [30:55:56<5:27:19, 12.96s/it] {'loss': 0.0033, 'learning_rate': 7.655e-06, 'epoch': 3.2} 85%|████████▍ | 8485/10000 [30:55:56<5:27:19, 12.96s/it] 85%|████████▍ | 8486/10000 [30:56:09<5:26:23, 12.93s/it] {'loss': 0.0045, 'learning_rate': 7.65e-06, 'epoch': 3.2} 85%|████████▍ | 8486/10000 [30:56:09<5:26:23, 12.93s/it] 85%|████████▍ | 8487/10000 [30:56:21<5:25:27, 12.91s/it] {'loss': 0.0044, 'learning_rate': 7.645e-06, 'epoch': 3.2} 85%|████████▍ | 8487/10000 [30:56:21<5:25:27, 12.91s/it] 85%|████████▍ | 8488/10000 [30:56:34<5:25:14, 12.91s/it] {'loss': 0.0037, 'learning_rate': 7.64e-06, 'epoch': 3.2} 85%|████████▍ | 8488/10000 [30:56:34<5:25:14, 12.91s/it] 85%|████████▍ | 8489/10000 [30:56:47<5:25:29, 12.92s/it] {'loss': 0.0033, 'learning_rate': 7.635e-06, 'epoch': 3.2} 85%|████████▍ | 8489/10000 [30:56:47<5:25:29, 12.92s/it] 85%|████████▍ | 8490/10000 [30:57:00<5:25:04, 12.92s/it] {'loss': 0.0036, 'learning_rate': 7.630000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8490/10000 [30:57:00<5:25:04, 12.92s/it] 85%|████████▍ | 8491/10000 [30:57:13<5:25:20, 12.94s/it] {'loss': 0.0043, 'learning_rate': 7.625e-06, 'epoch': 3.2} 85%|████████▍ | 8491/10000 [30:57:13<5:25:20, 12.94s/it] 85%|████████▍ | 8492/10000 [30:57:26<5:24:46, 12.92s/it] {'loss': 0.0033, 'learning_rate': 7.620000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8492/10000 [30:57:26<5:24:46, 12.92s/it] 85%|████████▍ | 8493/10000 [30:57:39<5:24:16, 12.91s/it] {'loss': 0.0037, 'learning_rate': 7.615e-06, 'epoch': 3.2} 85%|████████▍ | 8493/10000 [30:57:39<5:24:16, 12.91s/it] 85%|████████▍ | 8494/10000 [30:57:52<5:24:28, 12.93s/it] {'loss': 0.0032, 'learning_rate': 7.610000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8494/10000 [30:57:52<5:24:28, 12.93s/it] 85%|████████▍ | 8495/10000 [30:58:05<5:24:15, 12.93s/it] {'loss': 0.0037, 'learning_rate': 7.605000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8495/10000 [30:58:05<5:24:15, 12.93s/it] 85%|████████▍ | 8496/10000 
[30:58:18<5:24:34, 12.95s/it] {'loss': 0.004, 'learning_rate': 7.6e-06, 'epoch': 3.2} 85%|████████▍ | 8496/10000 [30:58:18<5:24:34, 12.95s/it] 85%|████████▍ | 8497/10000 [30:58:31<5:24:11, 12.94s/it] {'loss': 0.0035, 'learning_rate': 7.595000000000001e-06, 'epoch': 3.2} 85%|████████▍ | 8497/10000 [30:58:31<5:24:11, 12.94s/it] 85%|████████▍ | 8498/10000 [30:58:44<5:24:02, 12.94s/it] {'loss': 0.0038, 'learning_rate': 7.59e-06, 'epoch': 3.2} 85%|████████▍ | 8498/10000 [30:58:44<5:24:02, 12.94s/it] 85%|████████▍ | 8499/10000 [30:58:57<5:23:39, 12.94s/it] {'loss': 0.0047, 'learning_rate': 7.585e-06, 'epoch': 3.2} 85%|████████▍ | 8499/10000 [30:58:57<5:23:39, 12.94s/it] 85%|████████▌ | 8500/10000 [30:59:10<5:23:04, 12.92s/it] {'loss': 0.004, 'learning_rate': 7.580000000000001e-06, 'epoch': 3.2} 85%|████████▌ | 8500/10000 [30:59:10<5:23:04, 12.92s/it] 85%|████████▌ | 8501/10000 [30:59:22<5:22:21, 12.90s/it] {'loss': 0.0039, 'learning_rate': 7.575e-06, 'epoch': 3.2} 85%|████████▌ | 8501/10000 [30:59:22<5:22:21, 12.90s/it] 85%|████████▌ | 8502/10000 [30:59:35<5:22:40, 12.92s/it] {'loss': 0.0025, 'learning_rate': 7.57e-06, 'epoch': 3.2} 85%|████████▌ | 8502/10000 [30:59:35<5:22:40, 12.92s/it] 85%|████████▌ | 8503/10000 [30:59:48<5:22:16, 12.92s/it] {'loss': 0.0044, 'learning_rate': 7.5649999999999996e-06, 'epoch': 3.2} 85%|████████▌ | 8503/10000 [30:59:48<5:22:16, 12.92s/it] 85%|████████▌ | 8504/10000 [31:00:01<5:21:25, 12.89s/it] {'loss': 0.0029, 'learning_rate': 7.5600000000000005e-06, 'epoch': 3.2} 85%|████████▌ | 8504/10000 [31:00:01<5:21:25, 12.89s/it] 85%|████████▌ | 8505/10000 [31:00:14<5:21:30, 12.90s/it] {'loss': 0.0028, 'learning_rate': 7.555000000000001e-06, 'epoch': 3.2} 85%|████████▌ | 8505/10000 [31:00:14<5:21:30, 12.90s/it] 85%|████████▌ | 8506/10000 [31:00:27<5:21:35, 12.92s/it] {'loss': 0.0039, 'learning_rate': 7.55e-06, 'epoch': 3.2} 85%|████████▌ | 8506/10000 [31:00:27<5:21:35, 12.92s/it] 85%|████████▌ | 8507/10000 [31:00:40<5:21:11, 12.91s/it] {'loss': 
0.0032, 'learning_rate': 7.545000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8507/10000 [31:00:40<5:21:11, 12.91s/it] 85%|████████▌ | 8508/10000 [31:00:53<5:20:55, 12.91s/it] {'loss': 0.0037, 'learning_rate': 7.54e-06, 'epoch': 3.21} 85%|████████▌ | 8508/10000 [31:00:53<5:20:55, 12.91s/it] 85%|████████▌ | 8509/10000 [31:01:06<5:20:53, 12.91s/it] {'loss': 0.0048, 'learning_rate': 7.535000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8509/10000 [31:01:06<5:20:53, 12.91s/it] 85%|████████▌ | 8510/10000 [31:01:19<5:21:06, 12.93s/it] {'loss': 0.0036, 'learning_rate': 7.530000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8510/10000 [31:01:19<5:21:06, 12.93s/it] 85%|████████▌ | 8511/10000 [31:01:32<5:20:14, 12.90s/it] {'loss': 0.0032, 'learning_rate': 7.525e-06, 'epoch': 3.21} 85%|████████▌ | 8511/10000 [31:01:32<5:20:14, 12.90s/it] 85%|████████▌ | 8512/10000 [31:01:44<5:19:28, 12.88s/it] {'loss': 0.0032, 'learning_rate': 7.520000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8512/10000 [31:01:44<5:19:28, 12.88s/it] 85%|████████▌ | 8513/10000 [31:01:57<5:18:59, 12.87s/it] {'loss': 0.0045, 'learning_rate': 7.515e-06, 'epoch': 3.21} 85%|████████▌ | 8513/10000 [31:01:57<5:18:59, 12.87s/it] 85%|████████▌ | 8514/10000 [31:02:10<5:19:26, 12.90s/it] {'loss': 0.0033, 'learning_rate': 7.51e-06, 'epoch': 3.21} 85%|████████▌ | 8514/10000 [31:02:10<5:19:26, 12.90s/it] 85%|████████▌ | 8515/10000 [31:02:23<5:18:53, 12.88s/it] {'loss': 0.0032, 'learning_rate': 7.505000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8515/10000 [31:02:23<5:18:53, 12.88s/it] 85%|████████▌ | 8516/10000 [31:02:36<5:18:23, 12.87s/it] {'loss': 0.0034, 'learning_rate': 7.5e-06, 'epoch': 3.21} 85%|████████▌ | 8516/10000 [31:02:36<5:18:23, 12.87s/it] 85%|████████▌ | 8517/10000 [31:02:49<5:18:41, 12.89s/it] {'loss': 0.003, 'learning_rate': 7.495e-06, 'epoch': 3.21} 85%|████████▌ | 8517/10000 [31:02:49<5:18:41, 12.89s/it] 85%|████████▌ | 8518/10000 [31:03:02<5:18:41, 12.90s/it] {'loss': 0.0051, 'learning_rate': 
7.4899999999999994e-06, 'epoch': 3.21} 85%|████████▌ | 8518/10000 [31:03:02<5:18:41, 12.90s/it] 85%|████████▌ | 8519/10000 [31:03:15<5:18:42, 12.91s/it] {'loss': 0.0036, 'learning_rate': 7.485e-06, 'epoch': 3.21} 85%|████████▌ | 8519/10000 [31:03:15<5:18:42, 12.91s/it] 85%|████████▌ | 8520/10000 [31:03:28<5:18:20, 12.91s/it] {'loss': 0.0027, 'learning_rate': 7.480000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8520/10000 [31:03:28<5:18:20, 12.91s/it] 85%|████████▌ | 8521/10000 [31:03:41<5:18:26, 12.92s/it] {'loss': 0.0031, 'learning_rate': 7.4750000000000004e-06, 'epoch': 3.21} 85%|████████▌ | 8521/10000 [31:03:41<5:18:26, 12.92s/it] 85%|████████▌ | 8522/10000 [31:03:53<5:18:08, 12.92s/it] {'loss': 0.0034, 'learning_rate': 7.4700000000000005e-06, 'epoch': 3.21} 85%|████████▌ | 8522/10000 [31:03:53<5:18:08, 12.92s/it] 85%|████████▌ | 8523/10000 [31:04:06<5:17:57, 12.92s/it] {'loss': 0.0035, 'learning_rate': 7.465e-06, 'epoch': 3.21} 85%|████████▌ | 8523/10000 [31:04:06<5:17:57, 12.92s/it] 85%|████████▌ | 8524/10000 [31:04:19<5:17:26, 12.90s/it] {'loss': 0.0039, 'learning_rate': 7.4600000000000006e-06, 'epoch': 3.21} 85%|████████▌ | 8524/10000 [31:04:19<5:17:26, 12.90s/it] 85%|████████▌ | 8525/10000 [31:04:32<5:17:03, 12.90s/it] {'loss': 0.0034, 'learning_rate': 7.455000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8525/10000 [31:04:32<5:17:03, 12.90s/it] 85%|████████▌ | 8526/10000 [31:04:45<5:16:35, 12.89s/it] {'loss': 0.0039, 'learning_rate': 7.45e-06, 'epoch': 3.21} 85%|████████▌ | 8526/10000 [31:04:45<5:16:35, 12.89s/it] 85%|████████▌ | 8527/10000 [31:04:58<5:17:17, 12.92s/it] {'loss': 0.0036, 'learning_rate': 7.445000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8527/10000 [31:04:58<5:17:17, 12.92s/it] 85%|████████▌ | 8528/10000 [31:05:11<5:16:45, 12.91s/it] {'loss': 0.0043, 'learning_rate': 7.44e-06, 'epoch': 3.21} 85%|████████▌ | 8528/10000 [31:05:11<5:16:45, 12.91s/it] 85%|████████▌ | 8529/10000 [31:05:24<5:16:13, 12.90s/it] {'loss': 0.0041, 'learning_rate': 
7.435e-06, 'epoch': 3.21} 85%|████████▌ | 8529/10000 [31:05:24<5:16:13, 12.90s/it] 85%|████████▌ | 8530/10000 [31:05:37<5:15:39, 12.88s/it] {'loss': 0.0026, 'learning_rate': 7.430000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8530/10000 [31:05:37<5:15:39, 12.88s/it] 85%|████████▌ | 8531/10000 [31:05:49<5:15:08, 12.87s/it] {'loss': 0.0041, 'learning_rate': 7.425e-06, 'epoch': 3.21} 85%|████████▌ | 8531/10000 [31:05:49<5:15:08, 12.87s/it] 85%|████████▌ | 8532/10000 [31:06:02<5:15:01, 12.88s/it] {'loss': 0.0031, 'learning_rate': 7.420000000000001e-06, 'epoch': 3.21} 85%|████████▌ | 8532/10000 [31:06:02<5:15:01, 12.88s/it] 85%|████████▌ | 8533/10000 [31:06:15<5:15:49, 12.92s/it] {'loss': 0.004, 'learning_rate': 7.414999999999999e-06, 'epoch': 3.22} 85%|████████▌ | 8533/10000 [31:06:15<5:15:49, 12.92s/it] 85%|████████▌ | 8534/10000 [31:06:28<5:15:20, 12.91s/it] {'loss': 0.0043, 'learning_rate': 7.41e-06, 'epoch': 3.22} 85%|████████▌ | 8534/10000 [31:06:28<5:15:20, 12.91s/it] 85%|████████▌ | 8535/10000 [31:06:41<5:15:11, 12.91s/it] {'loss': 0.0032, 'learning_rate': 7.405000000000001e-06, 'epoch': 3.22} 85%|████████▌ | 8535/10000 [31:06:41<5:15:11, 12.91s/it] 85%|████████▌ | 8536/10000 [31:06:54<5:14:53, 12.91s/it] {'loss': 0.0033, 'learning_rate': 7.4e-06, 'epoch': 3.22} 85%|████████▌ | 8536/10000 [31:06:54<5:14:53, 12.91s/it] 85%|████████▌ | 8537/10000 [31:07:07<5:14:32, 12.90s/it] {'loss': 0.0038, 'learning_rate': 7.395e-06, 'epoch': 3.22} 85%|████████▌ | 8537/10000 [31:07:07<5:14:32, 12.90s/it] 85%|████████▌ | 8538/10000 [31:07:20<5:14:39, 12.91s/it] {'loss': 0.0036, 'learning_rate': 7.3899999999999995e-06, 'epoch': 3.22} 85%|████████▌ | 8538/10000 [31:07:20<5:14:39, 12.91s/it] 85%|████████▌ | 8539/10000 [31:07:33<5:14:28, 12.91s/it] {'loss': 0.0026, 'learning_rate': 7.3850000000000004e-06, 'epoch': 3.22} 85%|████████▌ | 8539/10000 [31:07:33<5:14:28, 12.91s/it] 85%|████████▌ | 8540/10000 [31:07:46<5:14:09, 12.91s/it] {'loss': 0.0057, 'learning_rate': 
7.3800000000000005e-06, 'epoch': 3.22} 85%|████████▌ | 8540/10000 [31:07:46<5:14:09, 12.91s/it] 85%|████████▌ | 8541/10000 [31:07:59<5:14:27, 12.93s/it] {'loss': 0.003, 'learning_rate': 7.375e-06, 'epoch': 3.22} 85%|████████▌ | 8541/10000 [31:07:59<5:14:27, 12.93s/it] 85%|████████▌ | 8542/10000 [31:08:12<5:13:58, 12.92s/it] {'loss': 0.0054, 'learning_rate': 7.370000000000001e-06, 'epoch': 3.22} 85%|████████▌ | 8542/10000 [31:08:12<5:13:58, 12.92s/it] 85%|████████▌ | 8543/10000 [31:08:24<5:13:36, 12.91s/it] {'loss': 0.0047, 'learning_rate': 7.365e-06, 'epoch': 3.22} 85%|████████▌ | 8543/10000 [31:08:24<5:13:36, 12.91s/it] 85%|████████▌ | 8544/10000 [31:08:37<5:13:36, 12.92s/it] {'loss': 0.0037, 'learning_rate': 7.36e-06, 'epoch': 3.22} 85%|████████▌ | 8544/10000 [31:08:37<5:13:36, 12.92s/it] 85%|████████▌ | 8545/10000 [31:08:50<5:13:31, 12.93s/it] {'loss': 0.0041, 'learning_rate': 7.355000000000001e-06, 'epoch': 3.22} 85%|████████▌ | 8545/10000 [31:08:50<5:13:31, 12.93s/it] 85%|████████▌ | 8546/10000 [31:09:03<5:12:55, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.35e-06, 'epoch': 3.22} 85%|████████▌ | 8546/10000 [31:09:03<5:12:55, 12.91s/it] 85%|████████▌ | 8547/10000 [31:09:16<5:12:22, 12.90s/it] {'loss': 0.0042, 'learning_rate': 7.345000000000001e-06, 'epoch': 3.22} 85%|████████▌ | 8547/10000 [31:09:16<5:12:22, 12.90s/it] 85%|████████▌ | 8548/10000 [31:09:29<5:12:16, 12.90s/it] {'loss': 0.0032, 'learning_rate': 7.340000000000001e-06, 'epoch': 3.22} 85%|████████▌ | 8548/10000 [31:09:29<5:12:16, 12.90s/it] 85%|████████▌ | 8549/10000 [31:09:42<5:12:32, 12.92s/it] {'loss': 0.0041, 'learning_rate': 7.335e-06, 'epoch': 3.22} 85%|████████▌ | 8549/10000 [31:09:42<5:12:32, 12.92s/it] 86%|████████▌ | 8550/10000 [31:09:55<5:12:55, 12.95s/it] {'loss': 0.0038, 'learning_rate': 7.330000000000001e-06, 'epoch': 3.22} 86%|████████▌ | 8550/10000 [31:09:55<5:12:55, 12.95s/it] 86%|████████▌ | 8551/10000 [31:10:08<5:12:44, 12.95s/it] {'loss': 0.0049, 'learning_rate': 7.325e-06, 
'epoch': 3.22} 86%|████████▌ | 8551/10000 [31:10:08<5:12:44, 12.95s/it] 86%|████████▌ | 8552/10000 [31:10:21<5:12:32, 12.95s/it] {'loss': 0.0036, 'learning_rate': 7.32e-06, 'epoch': 3.22} 86%|████████▌ | 8552/10000 [31:10:21<5:12:32, 12.95s/it] 86%|████████▌ | 8553/10000 [31:10:34<5:11:49, 12.93s/it] {'loss': 0.0032, 'learning_rate': 7.315000000000001e-06, 'epoch': 3.22} 86%|████████▌ | 8553/10000 [31:10:34<5:11:49, 12.93s/it] 86%|████████▌ | 8554/10000 [31:10:47<5:11:56, 12.94s/it] {'loss': 0.0032, 'learning_rate': 7.31e-06, 'epoch': 3.22} 86%|████████▌ | 8554/10000 [31:10:47<5:11:56, 12.94s/it] 86%|████████▌ | 8555/10000 [31:11:00<5:11:19, 12.93s/it] {'loss': 0.0039, 'learning_rate': 7.305e-06, 'epoch': 3.22} 86%|████████▌ | 8555/10000 [31:11:00<5:11:19, 12.93s/it] 86%|████████▌ | 8556/10000 [31:11:13<5:11:15, 12.93s/it] {'loss': 0.0037, 'learning_rate': 7.2999999999999996e-06, 'epoch': 3.22} 86%|████████▌ | 8556/10000 [31:11:13<5:11:15, 12.93s/it] 86%|████████▌ | 8557/10000 [31:11:25<5:10:49, 12.92s/it] {'loss': 0.0032, 'learning_rate': 7.2950000000000005e-06, 'epoch': 3.22} 86%|████████▌ | 8557/10000 [31:11:25<5:10:49, 12.92s/it] 86%|████████▌ | 8558/10000 [31:11:38<5:10:23, 12.92s/it] {'loss': 0.0046, 'learning_rate': 7.290000000000001e-06, 'epoch': 3.22} 86%|████████▌ | 8558/10000 [31:11:38<5:10:23, 12.92s/it] 86%|████████▌ | 8559/10000 [31:11:51<5:09:43, 12.90s/it] {'loss': 0.0033, 'learning_rate': 7.2850000000000006e-06, 'epoch': 3.22} 86%|████████▌ | 8559/10000 [31:11:51<5:09:43, 12.90s/it] 86%|████████▌ | 8560/10000 [31:12:04<5:10:16, 12.93s/it] {'loss': 0.0043, 'learning_rate': 7.280000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8560/10000 [31:12:04<5:10:16, 12.93s/it] 86%|████████▌ | 8561/10000 [31:12:17<5:09:55, 12.92s/it] {'loss': 0.0039, 'learning_rate': 7.275e-06, 'epoch': 3.23} 86%|████████▌ | 8561/10000 [31:12:17<5:09:55, 12.92s/it] 86%|████████▌ | 8562/10000 [31:12:30<5:09:21, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.270000000000001e-06, 
'epoch': 3.23} 86%|████████▌ | 8562/10000 [31:12:30<5:09:21, 12.91s/it] 86%|████████▌ | 8563/10000 [31:12:43<5:09:39, 12.93s/it] {'loss': 0.0037, 'learning_rate': 7.265000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8563/10000 [31:12:43<5:09:39, 12.93s/it] 86%|████████▌ | 8564/10000 [31:12:56<5:08:52, 12.91s/it] {'loss': 0.0036, 'learning_rate': 7.26e-06, 'epoch': 3.23} 86%|████████▌ | 8564/10000 [31:12:56<5:08:52, 12.91s/it] 86%|████████▌ | 8565/10000 [31:13:09<5:08:33, 12.90s/it] {'loss': 0.0046, 'learning_rate': 7.255000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8565/10000 [31:13:09<5:08:33, 12.90s/it] 86%|████████▌ | 8566/10000 [31:13:22<5:08:34, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.25e-06, 'epoch': 3.23} 86%|████████▌ | 8566/10000 [31:13:22<5:08:34, 12.91s/it] 86%|████████▌ | 8567/10000 [31:13:35<5:08:22, 12.91s/it] {'loss': 0.0042, 'learning_rate': 7.245e-06, 'epoch': 3.23} 86%|████████▌ | 8567/10000 [31:13:35<5:08:22, 12.91s/it] 86%|████████▌ | 8568/10000 [31:13:47<5:07:57, 12.90s/it] {'loss': 0.0044, 'learning_rate': 7.240000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8568/10000 [31:13:47<5:07:57, 12.90s/it] 86%|████████▌ | 8569/10000 [31:14:00<5:08:01, 12.91s/it] {'loss': 0.0054, 'learning_rate': 7.235e-06, 'epoch': 3.23} 86%|████████▌ | 8569/10000 [31:14:00<5:08:01, 12.91s/it] 86%|████████▌ | 8570/10000 [31:14:13<5:07:52, 12.92s/it] {'loss': 0.0039, 'learning_rate': 7.230000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8570/10000 [31:14:13<5:07:52, 12.92s/it] 86%|████████▌ | 8571/10000 [31:14:26<5:07:19, 12.90s/it] {'loss': 0.0036, 'learning_rate': 7.2249999999999994e-06, 'epoch': 3.23} 86%|████████▌ | 8571/10000 [31:14:26<5:07:19, 12.90s/it] 86%|████████▌ | 8572/10000 [31:14:39<5:07:54, 12.94s/it] {'loss': 0.0045, 'learning_rate': 7.22e-06, 'epoch': 3.23} 86%|████████▌ | 8572/10000 [31:14:39<5:07:54, 12.94s/it] 86%|████████▌ | 8573/10000 [31:14:52<5:07:15, 12.92s/it] {'loss': 0.0032, 'learning_rate': 7.215000000000001e-06, 'epoch': 3.23} 
86%|████████▌ | 8573/10000 [31:14:52<5:07:15, 12.92s/it] 86%|████████▌ | 8574/10000 [31:15:05<5:06:57, 12.92s/it] {'loss': 0.0043, 'learning_rate': 7.2100000000000004e-06, 'epoch': 3.23} 86%|████████▌ | 8574/10000 [31:15:05<5:06:57, 12.92s/it] 86%|████████▌ | 8575/10000 [31:15:18<5:06:55, 12.92s/it] {'loss': 0.0049, 'learning_rate': 7.2050000000000005e-06, 'epoch': 3.23} 86%|████████▌ | 8575/10000 [31:15:18<5:06:55, 12.92s/it] 86%|████████▌ | 8576/10000 [31:15:31<5:06:42, 12.92s/it] {'loss': 0.004, 'learning_rate': 7.2e-06, 'epoch': 3.23} 86%|████████▌ | 8576/10000 [31:15:31<5:06:42, 12.92s/it] 86%|████████▌ | 8577/10000 [31:15:44<5:06:59, 12.94s/it] {'loss': 0.0035, 'learning_rate': 7.1950000000000006e-06, 'epoch': 3.23} 86%|████████▌ | 8577/10000 [31:15:44<5:06:59, 12.94s/it] 86%|████████▌ | 8578/10000 [31:15:57<5:06:27, 12.93s/it] {'loss': 0.004, 'learning_rate': 7.190000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8578/10000 [31:15:57<5:06:27, 12.93s/it] 86%|████████▌ | 8579/10000 [31:16:10<5:06:19, 12.93s/it] {'loss': 0.0044, 'learning_rate': 7.185e-06, 'epoch': 3.23} 86%|████████▌ | 8579/10000 [31:16:10<5:06:19, 12.93s/it] 86%|████████▌ | 8580/10000 [31:16:23<5:05:31, 12.91s/it] {'loss': 0.0041, 'learning_rate': 7.180000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8580/10000 [31:16:23<5:05:31, 12.91s/it] 86%|████████▌ | 8581/10000 [31:16:35<5:05:02, 12.90s/it] {'loss': 0.0038, 'learning_rate': 7.175e-06, 'epoch': 3.23} 86%|████████▌ | 8581/10000 [31:16:35<5:05:02, 12.90s/it] 86%|████████▌ | 8582/10000 [31:16:48<5:04:59, 12.91s/it] {'loss': 0.0034, 'learning_rate': 7.17e-06, 'epoch': 3.23} 86%|████████▌ | 8582/10000 [31:16:48<5:04:59, 12.91s/it] 86%|████████▌ | 8583/10000 [31:17:01<5:05:11, 12.92s/it] {'loss': 0.004, 'learning_rate': 7.165000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8583/10000 [31:17:01<5:05:11, 12.92s/it] 86%|████████▌ | 8584/10000 [31:17:14<5:04:57, 12.92s/it] {'loss': 0.0035, 'learning_rate': 7.16e-06, 'epoch': 3.23} 86%|████████▌ | 
8584/10000 [31:17:14<5:04:57, 12.92s/it] 86%|████████▌ | 8585/10000 [31:17:27<5:05:06, 12.94s/it] {'loss': 0.0039, 'learning_rate': 7.155000000000001e-06, 'epoch': 3.23} 86%|████████▌ | 8585/10000 [31:17:27<5:05:06, 12.94s/it] 86%|████████▌ | 8586/10000 [31:17:40<5:04:40, 12.93s/it] {'loss': 0.0035, 'learning_rate': 7.15e-06, 'epoch': 3.24} 86%|████████▌ | 8586/10000 [31:17:40<5:04:40, 12.93s/it] 86%|████████▌ | 8587/10000 [31:17:53<5:04:30, 12.93s/it] {'loss': 0.0051, 'learning_rate': 7.145e-06, 'epoch': 3.24} 86%|████████▌ | 8587/10000 [31:17:53<5:04:30, 12.93s/it] 86%|████████▌ | 8588/10000 [31:18:06<5:04:11, 12.93s/it] {'loss': 0.0041, 'learning_rate': 7.140000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8588/10000 [31:18:06<5:04:11, 12.93s/it] 86%|████████▌ | 8589/10000 [31:18:19<5:03:54, 12.92s/it] {'loss': 0.0038, 'learning_rate': 7.135e-06, 'epoch': 3.24} 86%|████████▌ | 8589/10000 [31:18:19<5:03:54, 12.92s/it] 86%|████████▌ | 8590/10000 [31:18:32<5:03:45, 12.93s/it] {'loss': 0.0044, 'learning_rate': 7.13e-06, 'epoch': 3.24} 86%|████████▌ | 8590/10000 [31:18:32<5:03:45, 12.93s/it] 86%|████████▌ | 8591/10000 [31:18:45<5:03:34, 12.93s/it] {'loss': 0.0031, 'learning_rate': 7.1249999999999995e-06, 'epoch': 3.24} 86%|████████▌ | 8591/10000 [31:18:45<5:03:34, 12.93s/it] 86%|████████▌ | 8592/10000 [31:18:58<5:03:10, 12.92s/it] {'loss': 0.0036, 'learning_rate': 7.1200000000000004e-06, 'epoch': 3.24} 86%|████████▌ | 8592/10000 [31:18:58<5:03:10, 12.92s/it] 86%|████████▌ | 8593/10000 [31:19:10<5:02:26, 12.90s/it] {'loss': 0.0032, 'learning_rate': 7.1150000000000005e-06, 'epoch': 3.24} 86%|████████▌ | 8593/10000 [31:19:11<5:02:26, 12.90s/it] 86%|████████▌ | 8594/10000 [31:19:23<5:01:54, 12.88s/it] {'loss': 0.0038, 'learning_rate': 7.11e-06, 'epoch': 3.24} 86%|████████▌ | 8594/10000 [31:19:23<5:01:54, 12.88s/it] 86%|████████▌ | 8595/10000 [31:19:36<5:01:40, 12.88s/it] {'loss': 0.0037, 'learning_rate': 7.105000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8595/10000 
[31:19:36<5:01:40, 12.88s/it] 86%|████████▌ | 8596/10000 [31:19:49<5:01:20, 12.88s/it] {'loss': 0.0039, 'learning_rate': 7.1e-06, 'epoch': 3.24} 86%|████████▌ | 8596/10000 [31:19:49<5:01:20, 12.88s/it] 86%|████████▌ | 8597/10000 [31:20:02<5:01:47, 12.91s/it] {'loss': 0.003, 'learning_rate': 7.095000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8597/10000 [31:20:02<5:01:47, 12.91s/it] 86%|████████▌ | 8598/10000 [31:20:15<5:01:10, 12.89s/it] {'loss': 0.004, 'learning_rate': 7.090000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8598/10000 [31:20:15<5:01:10, 12.89s/it] 86%|████████▌ | 8599/10000 [31:20:28<5:01:33, 12.91s/it] {'loss': 0.0032, 'learning_rate': 7.085e-06, 'epoch': 3.24} 86%|████████▌ | 8599/10000 [31:20:28<5:01:33, 12.91s/it] 86%|████████▌ | 8600/10000 [31:20:41<5:01:31, 12.92s/it] {'loss': 0.0038, 'learning_rate': 7.080000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8600/10000 [31:20:41<5:01:31, 12.92s/it] 86%|████████▌ | 8601/10000 [31:20:54<5:01:07, 12.91s/it] {'loss': 0.0036, 'learning_rate': 7.075e-06, 'epoch': 3.24} 86%|████████▌ | 8601/10000 [31:20:54<5:01:07, 12.91s/it] 86%|████████▌ | 8602/10000 [31:21:07<5:00:48, 12.91s/it] {'loss': 0.0035, 'learning_rate': 7.07e-06, 'epoch': 3.24} 86%|████████▌ | 8602/10000 [31:21:07<5:00:48, 12.91s/it] 86%|████████▌ | 8603/10000 [31:21:19<5:00:13, 12.89s/it] {'loss': 0.0038, 'learning_rate': 7.065000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8603/10000 [31:21:19<5:00:13, 12.89s/it] 86%|████████▌ | 8604/10000 [31:21:32<5:00:04, 12.90s/it] {'loss': 0.0031, 'learning_rate': 7.06e-06, 'epoch': 3.24} 86%|████████▌ | 8604/10000 [31:21:32<5:00:04, 12.90s/it] 86%|████████▌ | 8605/10000 [31:21:45<4:59:34, 12.89s/it] {'loss': 0.004, 'learning_rate': 7.055e-06, 'epoch': 3.24} 86%|████████▌ | 8605/10000 [31:21:45<4:59:34, 12.89s/it] 86%|████████▌ | 8606/10000 [31:21:58<4:59:18, 12.88s/it] {'loss': 0.0036, 'learning_rate': 7.049999999999999e-06, 'epoch': 3.24} 86%|████████▌ | 8606/10000 [31:21:58<4:59:18, 12.88s/it] 
86%|████████▌ | 8607/10000 [31:22:11<4:59:15, 12.89s/it] {'loss': 0.0031, 'learning_rate': 7.045e-06, 'epoch': 3.24} 86%|████████▌ | 8607/10000 [31:22:11<4:59:15, 12.89s/it] 86%|████████▌ | 8608/10000 [31:22:24<4:59:17, 12.90s/it] {'loss': 0.0036, 'learning_rate': 7.04e-06, 'epoch': 3.24} 86%|████████▌ | 8608/10000 [31:22:24<4:59:17, 12.90s/it] 86%|████████▌ | 8609/10000 [31:22:37<4:59:21, 12.91s/it] {'loss': 0.0035, 'learning_rate': 7.0349999999999996e-06, 'epoch': 3.24} 86%|████████▌ | 8609/10000 [31:22:37<4:59:21, 12.91s/it] 86%|████████▌ | 8610/10000 [31:22:50<4:58:46, 12.90s/it] {'loss': 0.0037, 'learning_rate': 7.0300000000000005e-06, 'epoch': 3.24} 86%|████████▌ | 8610/10000 [31:22:50<4:58:46, 12.90s/it] 86%|████████▌ | 8611/10000 [31:23:03<4:58:31, 12.89s/it] {'loss': 0.0039, 'learning_rate': 7.025000000000001e-06, 'epoch': 3.24} 86%|████████▌ | 8611/10000 [31:23:03<4:58:31, 12.89s/it] 86%|████████▌ | 8612/10000 [31:23:16<4:58:23, 12.90s/it] {'loss': 0.0028, 'learning_rate': 7.0200000000000006e-06, 'epoch': 3.24} 86%|████████▌ | 8612/10000 [31:23:16<4:58:23, 12.90s/it] 86%|████████▌ | 8613/10000 [31:23:28<4:57:50, 12.88s/it] {'loss': 0.0034, 'learning_rate': 7.015000000000001e-06, 'epoch': 3.25} 86%|████████▌ | 8613/10000 [31:23:28<4:57:50, 12.88s/it] 86%|████████▌ | 8614/10000 [31:23:41<4:57:43, 12.89s/it] {'loss': 0.0035, 'learning_rate': 7.01e-06, 'epoch': 3.25} 86%|████████▌ | 8614/10000 [31:23:41<4:57:43, 12.89s/it] 86%|████████▌ | 8615/10000 [31:23:54<4:57:38, 12.89s/it] {'loss': 0.0052, 'learning_rate': 7.005000000000001e-06, 'epoch': 3.25} 86%|████████▌ | 8615/10000 [31:23:54<4:57:38, 12.89s/it] 86%|████████▌ | 8616/10000 [31:24:07<4:58:26, 12.94s/it] {'loss': 0.0056, 'learning_rate': 7.000000000000001e-06, 'epoch': 3.25} 86%|████████▌ | 8616/10000 [31:24:07<4:58:26, 12.94s/it] 86%|████████▌ | 8617/10000 [31:24:20<4:57:45, 12.92s/it] {'loss': 0.0045, 'learning_rate': 6.995e-06, 'epoch': 3.25} 86%|████████▌ | 8617/10000 [31:24:20<4:57:45, 12.92s/it] 
86%|████████▌ | 8618/10000 [31:24:33<4:57:13, 12.90s/it] {'loss': 0.0058, 'learning_rate': 6.990000000000001e-06, 'epoch': 3.25} 86%|████████▌ | 8618/10000 [31:24:33<4:57:13, 12.90s/it] 86%|████████▌ | 8619/10000 [31:24:46<4:57:04, 12.91s/it] {'loss': 0.0039, 'learning_rate': 6.985e-06, 'epoch': 3.25} 86%|████████▌ | 8619/10000 [31:24:46<4:57:04, 12.91s/it] 86%|████████▌ | 8620/10000 [31:24:59<4:56:55, 12.91s/it] {'loss': 0.0043, 'learning_rate': 6.98e-06, 'epoch': 3.25} 86%|████████▌ | 8620/10000 [31:24:59<4:56:55, 12.91s/it] 86%|████████▌ | 8621/10000 [31:25:12<4:57:24, 12.94s/it] {'loss': 0.0033, 'learning_rate': 6.975000000000001e-06, 'epoch': 3.25} 86%|████████▌ | 8621/10000 [31:25:12<4:57:24, 12.94s/it] 86%|████████▌ | 8622/10000 [31:25:25<4:56:45, 12.92s/it] {'loss': 0.0041, 'learning_rate': 6.97e-06, 'epoch': 3.25} 86%|████████▌ | 8622/10000 [31:25:25<4:56:45, 12.92s/it] 86%|████████▌ | 8623/10000 [31:25:38<4:56:27, 12.92s/it] {'loss': 0.0036, 'learning_rate': 6.965000000000001e-06, 'epoch': 3.25} 86%|████████▌ | 8623/10000 [31:25:38<4:56:27, 12.92s/it] 86%|████████▌ | 8624/10000 [31:25:50<4:56:06, 12.91s/it] {'loss': 0.0033, 'learning_rate': 6.9599999999999994e-06, 'epoch': 3.25} 86%|████████▌ | 8624/10000 [31:25:51<4:56:06, 12.91s/it] 86%|████████▋ | 8625/10000 [31:26:03<4:55:19, 12.89s/it] {'loss': 0.0036, 'learning_rate': 6.955e-06, 'epoch': 3.25} 86%|████████▋ | 8625/10000 [31:26:03<4:55:19, 12.89s/it] 86%|████████▋ | 8626/10000 [31:26:16<4:55:33, 12.91s/it] {'loss': 0.0036, 'learning_rate': 6.950000000000001e-06, 'epoch': 3.25} 86%|████████▋ | 8626/10000 [31:26:16<4:55:33, 12.91s/it] 86%|████████▋ | 8627/10000 [31:26:29<4:55:06, 12.90s/it] {'loss': 0.004, 'learning_rate': 6.945e-06, 'epoch': 3.25} 86%|████████▋ | 8627/10000 [31:26:29<4:55:06, 12.90s/it] 86%|████████▋ | 8628/10000 [31:26:42<4:55:16, 12.91s/it] {'loss': 0.0042, 'learning_rate': 6.9400000000000005e-06, 'epoch': 3.25} 86%|████████▋ | 8628/10000 [31:26:42<4:55:16, 12.91s/it] 86%|████████▋ 
| 8629/10000 [31:26:55<4:55:06, 12.91s/it] {'loss': 0.0039, 'learning_rate': 6.935e-06, 'epoch': 3.25} 86%|████████▋ | 8629/10000 [31:26:55<4:55:06, 12.91s/it] 86%|████████▋ | 8630/10000 [31:27:08<4:54:57, 12.92s/it] {'loss': 0.0048, 'learning_rate': 6.9300000000000006e-06, 'epoch': 3.25} 86%|████████▋ | 8630/10000 [31:27:08<4:54:57, 12.92s/it] 86%|████████▋ | 8631/10000 [31:27:21<4:54:36, 12.91s/it] {'loss': 0.0034, 'learning_rate': 6.925000000000001e-06, 'epoch': 3.25} 86%|████████▋ | 8631/10000 [31:27:21<4:54:36, 12.91s/it] 86%|████████▋ | 8632/10000 [31:27:34<4:54:42, 12.93s/it] {'loss': 0.0048, 'learning_rate': 6.92e-06, 'epoch': 3.25} 86%|████████▋ | 8632/10000 [31:27:34<4:54:42, 12.93s/it] 86%|████████▋ | 8633/10000 [31:27:47<4:54:53, 12.94s/it] {'loss': 0.0032, 'learning_rate': 6.915000000000001e-06, 'epoch': 3.25} 86%|████████▋ | 8633/10000 [31:27:47<4:54:53, 12.94s/it] 86%|████████▋ | 8634/10000 [31:28:00<4:54:47, 12.95s/it] {'loss': 0.0039, 'learning_rate': 6.91e-06, 'epoch': 3.25} 86%|████████▋ | 8634/10000 [31:28:00<4:54:47, 12.95s/it] 86%|████████▋ | 8635/10000 [31:28:13<4:54:19, 12.94s/it] {'loss': 0.004, 'learning_rate': 6.905e-06, 'epoch': 3.25} 86%|████████▋ | 8635/10000 [31:28:13<4:54:19, 12.94s/it] 86%|████████▋ | 8636/10000 [31:28:26<4:53:38, 12.92s/it] {'loss': 0.005, 'learning_rate': 6.900000000000001e-06, 'epoch': 3.25} 86%|████████▋ | 8636/10000 [31:28:26<4:53:38, 12.92s/it] 86%|████████▋ | 8637/10000 [31:28:38<4:53:34, 12.92s/it] {'loss': 0.0036, 'learning_rate': 6.895e-06, 'epoch': 3.25} 86%|████████▋ | 8637/10000 [31:28:39<4:53:34, 12.92s/it] 86%|████████▋ | 8638/10000 [31:28:51<4:53:20, 12.92s/it] {'loss': 0.003, 'learning_rate': 6.890000000000001e-06, 'epoch': 3.25} 86%|████████▋ | 8638/10000 [31:28:51<4:53:20, 12.92s/it] 86%|████████▋ | 8639/10000 [31:29:04<4:53:40, 12.95s/it] {'loss': 0.0046, 'learning_rate': 6.885e-06, 'epoch': 3.26} 86%|████████▋ | 8639/10000 [31:29:04<4:53:40, 12.95s/it] 86%|████████▋ | 8640/10000 
[31:29:17<4:53:17, 12.94s/it] {'loss': 0.0036, 'learning_rate': 6.88e-06, 'epoch': 3.26} 86%|████████▋ | 8640/10000 [31:29:17<4:53:17, 12.94s/it] 86%|████████▋ | 8641/10000 [31:29:30<4:52:18, 12.91s/it] {'loss': 0.0032, 'learning_rate': 6.875000000000001e-06, 'epoch': 3.26} 86%|████████▋ | 8641/10000 [31:29:30<4:52:18, 12.91s/it] 86%|████████▋ | 8642/10000 [31:29:43<4:51:44, 12.89s/it] {'loss': 0.0036, 'learning_rate': 6.87e-06, 'epoch': 3.26} 86%|████████▋ | 8642/10000 [31:29:43<4:51:44, 12.89s/it] 86%|████████▋ | 8643/10000 [31:29:56<4:51:17, 12.88s/it] {'loss': 0.0048, 'learning_rate': 6.865e-06, 'epoch': 3.26} 86%|████████▋ | 8643/10000 [31:29:56<4:51:17, 12.88s/it] 86%|████████▋ | 8644/10000 [31:30:09<4:50:36, 12.86s/it] {'loss': 0.0038, 'learning_rate': 6.8599999999999995e-06, 'epoch': 3.26} 86%|████████▋ | 8644/10000 [31:30:09<4:50:36, 12.86s/it] 86%|████████▋ | 8645/10000 [31:30:22<4:51:33, 12.91s/it] {'loss': 0.0055, 'learning_rate': 6.8550000000000004e-06, 'epoch': 3.26} 86%|████████▋ | 8645/10000 [31:30:22<4:51:33, 12.91s/it] 86%|████████▋ | 8646/10000 [31:30:35<4:51:24, 12.91s/it] {'loss': 0.0034, 'learning_rate': 6.8500000000000005e-06, 'epoch': 3.26} 86%|████████▋ | 8646/10000 [31:30:35<4:51:24, 12.91s/it] 86%|████████▋ | 8647/10000 [31:30:48<4:51:57, 12.95s/it] {'loss': 0.0027, 'learning_rate': 6.845e-06, 'epoch': 3.26} 86%|████████▋ | 8647/10000 [31:30:48<4:51:57, 12.95s/it] 86%|████████▋ | 8648/10000 [31:31:01<4:51:36, 12.94s/it] {'loss': 0.0038, 'learning_rate': 6.840000000000001e-06, 'epoch': 3.26} 86%|████████▋ | 8648/10000 [31:31:01<4:51:36, 12.94s/it] 86%|████████▋ | 8649/10000 [31:31:13<4:50:56, 12.92s/it] {'loss': 0.0032, 'learning_rate': 6.835e-06, 'epoch': 3.26} 86%|████████▋ | 8649/10000 [31:31:13<4:50:56, 12.92s/it] 86%|████████▋ | 8650/10000 [31:31:26<4:50:33, 12.91s/it] {'loss': 0.0042, 'learning_rate': 6.830000000000001e-06, 'epoch': 3.26} 86%|████████▋ | 8650/10000 [31:31:26<4:50:33, 12.91s/it] 87%|████████▋ | 8651/10000 
[31:31:39<4:50:28, 12.92s/it] {'loss': 0.0038, 'learning_rate': 6.825000000000001e-06, 'epoch': 3.26} 87%|████████▋ | 8651/10000 [31:31:39<4:50:28, 12.92s/it] 87%|████████▋ | 8652/10000 [31:31:52<4:50:21, 12.92s/it] {'loss': 0.0031, 'learning_rate': 6.82e-06, 'epoch': 3.26} 87%|████████▋ | 8652/10000 [31:31:52<4:50:21, 12.92s/it] 87%|████████▋ | 8653/10000 [31:32:05<4:50:56, 12.96s/it] {'loss': 0.0031, 'learning_rate': 6.815000000000001e-06, 'epoch': 3.26} 87%|████████▋ | 8653/10000 [31:32:05<4:50:56, 12.96s/it] 87%|████████▋ | 8654/10000 [31:32:18<4:50:30, 12.95s/it] {'loss': 0.0046, 'learning_rate': 6.81e-06, 'epoch': 3.26} 87%|████████▋ | 8654/10000 [31:32:18<4:50:30, 12.95s/it] 87%|████████▋ | 8655/10000 [31:32:31<4:50:22, 12.95s/it] {'loss': 0.0041, 'learning_rate': 6.805e-06, 'epoch': 3.26} 87%|████████▋ | 8655/10000 [31:32:31<4:50:22, 12.95s/it] 87%|████████▋ | 8656/10000 [31:32:44<4:49:46, 12.94s/it] {'loss': 0.0039, 'learning_rate': 6.800000000000001e-06, 'epoch': 3.26} 87%|████████▋ | 8656/10000 [31:32:44<4:49:46, 12.94s/it] 87%|████████▋ | 8657/10000 [31:32:57<4:49:29, 12.93s/it] {'loss': 0.0045, 'learning_rate': 6.795e-06, 'epoch': 3.26} 87%|████████▋ | 8657/10000 [31:32:57<4:49:29, 12.93s/it] 87%|████████▋ | 8658/10000 [31:33:10<4:48:44, 12.91s/it] {'loss': 0.0036, 'learning_rate': 6.79e-06, 'epoch': 3.26} 87%|████████▋ | 8658/10000 [31:33:10<4:48:44, 12.91s/it] 87%|████████▋ | 8659/10000 [31:33:23<4:48:44, 12.92s/it] {'loss': 0.0039, 'learning_rate': 6.784999999999999e-06, 'epoch': 3.26} 87%|████████▋ | 8659/10000 [31:33:23<4:48:44, 12.92s/it] 87%|████████▋ | 8660/10000 [31:33:36<4:47:45, 12.88s/it] {'loss': 0.0044, 'learning_rate': 6.78e-06, 'epoch': 3.26} 87%|████████▋ | 8660/10000 [31:33:36<4:47:45, 12.88s/it] 87%|████████▋ | 8661/10000 [31:33:48<4:47:37, 12.89s/it] {'loss': 0.0036, 'learning_rate': 6.775000000000001e-06, 'epoch': 3.26} 87%|████████▋ | 8661/10000 [31:33:48<4:47:37, 12.89s/it] 87%|████████▋ | 8662/10000 [31:34:01<4:47:24, 12.89s/it] 
{'loss': 0.0046, 'learning_rate': 6.7699999999999996e-06, 'epoch': 3.26} 87%|████████▋ | 8662/10000 [31:34:01<4:47:24, 12.89s/it] 87%|████████▋ | 8663/10000 [31:34:14<4:47:25, 12.90s/it] {'loss': 0.004, 'learning_rate': 6.7650000000000005e-06, 'epoch': 3.26} 87%|████████▋ | 8663/10000 [31:34:14<4:47:25, 12.90s/it] 87%|████████▋ | 8664/10000 [31:34:27<4:47:17, 12.90s/it] {'loss': 0.0042, 'learning_rate': 6.76e-06, 'epoch': 3.26} 87%|████████▋ | 8664/10000 [31:34:27<4:47:17, 12.90s/it] 87%|████████▋ | 8665/10000 [31:34:40<4:47:02, 12.90s/it] {'loss': 0.0038, 'learning_rate': 6.7550000000000005e-06, 'epoch': 3.26} 87%|████████▋ | 8665/10000 [31:34:40<4:47:02, 12.90s/it] 87%|████████▋ | 8666/10000 [31:34:53<4:47:17, 12.92s/it] {'loss': 0.0035, 'learning_rate': 6.750000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8666/10000 [31:34:53<4:47:17, 12.92s/it] 87%|████████▋ | 8667/10000 [31:35:06<4:47:04, 12.92s/it] {'loss': 0.0029, 'learning_rate': 6.745e-06, 'epoch': 3.27} 87%|████████▋ | 8667/10000 [31:35:06<4:47:04, 12.92s/it] 87%|████████▋ | 8668/10000 [31:35:19<4:47:01, 12.93s/it] {'loss': 0.0032, 'learning_rate': 6.740000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8668/10000 [31:35:19<4:47:01, 12.93s/it] 87%|████████▋ | 8669/10000 [31:35:32<4:46:55, 12.93s/it] {'loss': 0.0035, 'learning_rate': 6.735e-06, 'epoch': 3.27} 87%|████████▋ | 8669/10000 [31:35:32<4:46:55, 12.93s/it] 87%|████████▋ | 8670/10000 [31:35:45<4:46:58, 12.95s/it] {'loss': 0.0038, 'learning_rate': 6.73e-06, 'epoch': 3.27} 87%|████████▋ | 8670/10000 [31:35:45<4:46:58, 12.95s/it] 87%|████████▋ | 8671/10000 [31:35:58<4:46:37, 12.94s/it] {'loss': 0.0031, 'learning_rate': 6.725000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8671/10000 [31:35:58<4:46:37, 12.94s/it] 87%|████████▋ | 8672/10000 [31:36:11<4:45:45, 12.91s/it] {'loss': 0.0043, 'learning_rate': 6.72e-06, 'epoch': 3.27} 87%|████████▋ | 8672/10000 [31:36:11<4:45:45, 12.91s/it] 87%|████████▋ | 8673/10000 [31:36:23<4:45:22, 12.90s/it] {'loss': 
0.0031, 'learning_rate': 6.715e-06, 'epoch': 3.27} 87%|████████▋ | 8673/10000 [31:36:24<4:45:22, 12.90s/it] 87%|████████▋ | 8674/10000 [31:36:36<4:45:18, 12.91s/it] {'loss': 0.0038, 'learning_rate': 6.710000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8674/10000 [31:36:36<4:45:18, 12.91s/it] 87%|████████▋ | 8675/10000 [31:36:49<4:45:12, 12.91s/it] {'loss': 0.0037, 'learning_rate': 6.705e-06, 'epoch': 3.27} 87%|████████▋ | 8675/10000 [31:36:49<4:45:12, 12.91s/it] 87%|████████▋ | 8676/10000 [31:37:02<4:45:29, 12.94s/it] {'loss': 0.0039, 'learning_rate': 6.700000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8676/10000 [31:37:02<4:45:29, 12.94s/it] 87%|████████▋ | 8677/10000 [31:37:15<4:45:36, 12.95s/it] {'loss': 0.0043, 'learning_rate': 6.695e-06, 'epoch': 3.27} 87%|████████▋ | 8677/10000 [31:37:15<4:45:36, 12.95s/it] 87%|████████▋ | 8678/10000 [31:37:28<4:45:06, 12.94s/it] {'loss': 0.0037, 'learning_rate': 6.69e-06, 'epoch': 3.27} 87%|████████▋ | 8678/10000 [31:37:28<4:45:06, 12.94s/it] 87%|████████▋ | 8679/10000 [31:37:41<4:45:09, 12.95s/it] {'loss': 0.0055, 'learning_rate': 6.685000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8679/10000 [31:37:41<4:45:09, 12.95s/it] 87%|████████▋ | 8680/10000 [31:37:54<4:44:32, 12.93s/it] {'loss': 0.0044, 'learning_rate': 6.68e-06, 'epoch': 3.27} 87%|████████▋ | 8680/10000 [31:37:54<4:44:32, 12.93s/it] 87%|████████▋ | 8681/10000 [31:38:07<4:44:05, 12.92s/it] {'loss': 0.0053, 'learning_rate': 6.6750000000000005e-06, 'epoch': 3.27} 87%|████████▋ | 8681/10000 [31:38:07<4:44:05, 12.92s/it] 87%|████████▋ | 8682/10000 [31:38:20<4:43:37, 12.91s/it] {'loss': 0.004, 'learning_rate': 6.67e-06, 'epoch': 3.27} 87%|████████▋ | 8682/10000 [31:38:20<4:43:37, 12.91s/it] 87%|████████▋ | 8683/10000 [31:38:33<4:43:15, 12.90s/it] {'loss': 0.0039, 'learning_rate': 6.6650000000000006e-06, 'epoch': 3.27} 87%|████████▋ | 8683/10000 [31:38:33<4:43:15, 12.90s/it] 87%|████████▋ | 8684/10000 [31:38:46<4:42:27, 12.88s/it] {'loss': 0.0048, 'learning_rate': 
6.660000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8684/10000 [31:38:46<4:42:27, 12.88s/it] 87%|████████▋ | 8685/10000 [31:38:58<4:42:25, 12.89s/it] {'loss': 0.0039, 'learning_rate': 6.655e-06, 'epoch': 3.27} 87%|████████▋ | 8685/10000 [31:38:59<4:42:25, 12.89s/it] 87%|████████▋ | 8686/10000 [31:39:11<4:42:13, 12.89s/it] {'loss': 0.0039, 'learning_rate': 6.650000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8686/10000 [31:39:11<4:42:13, 12.89s/it] 87%|████████▋ | 8687/10000 [31:39:24<4:42:24, 12.91s/it] {'loss': 0.0036, 'learning_rate': 6.645e-06, 'epoch': 3.27} 87%|████████▋ | 8687/10000 [31:39:24<4:42:24, 12.91s/it] 87%|████████▋ | 8688/10000 [31:39:37<4:41:38, 12.88s/it] {'loss': 0.0045, 'learning_rate': 6.640000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8688/10000 [31:39:37<4:41:38, 12.88s/it] 87%|████████▋ | 8689/10000 [31:39:50<4:41:39, 12.89s/it] {'loss': 0.0036, 'learning_rate': 6.635000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8689/10000 [31:39:50<4:41:39, 12.89s/it] 87%|████████▋ | 8690/10000 [31:40:03<4:41:51, 12.91s/it] {'loss': 0.0041, 'learning_rate': 6.63e-06, 'epoch': 3.27} 87%|████████▋ | 8690/10000 [31:40:03<4:41:51, 12.91s/it] 87%|████████▋ | 8691/10000 [31:40:16<4:41:15, 12.89s/it] {'loss': 0.0035, 'learning_rate': 6.625000000000001e-06, 'epoch': 3.27} 87%|████████▋ | 8691/10000 [31:40:16<4:41:15, 12.89s/it] 87%|████████▋ | 8692/10000 [31:40:29<4:41:03, 12.89s/it] {'loss': 0.0035, 'learning_rate': 6.62e-06, 'epoch': 3.28} 87%|████████▋ | 8692/10000 [31:40:29<4:41:03, 12.89s/it] 87%|████████▋ | 8693/10000 [31:40:42<4:41:15, 12.91s/it] {'loss': 0.004, 'learning_rate': 6.615e-06, 'epoch': 3.28} 87%|████████▋ | 8693/10000 [31:40:42<4:41:15, 12.91s/it] 87%|████████▋ | 8694/10000 [31:40:55<4:41:38, 12.94s/it] {'loss': 0.0029, 'learning_rate': 6.610000000000001e-06, 'epoch': 3.28} 87%|████████▋ | 8694/10000 [31:40:55<4:41:38, 12.94s/it] 87%|████████▋ | 8695/10000 [31:41:08<4:40:45, 12.91s/it] {'loss': 0.003, 'learning_rate': 6.605e-06, 
'epoch': 3.28} 87%|████████▋ | 8695/10000 [31:41:08<4:40:45, 12.91s/it] 87%|████████▋ | 8696/10000 [31:41:20<4:39:54, 12.88s/it] {'loss': 0.0028, 'learning_rate': 6.6e-06, 'epoch': 3.28} 87%|████████▋ | 8696/10000 [31:41:20<4:39:54, 12.88s/it] 87%|████████▋ | 8697/10000 [31:41:33<4:39:13, 12.86s/it] {'loss': 0.0044, 'learning_rate': 6.5949999999999995e-06, 'epoch': 3.28} 87%|████████▋ | 8697/10000 [31:41:33<4:39:13, 12.86s/it] 87%|████████▋ | 8698/10000 [31:41:46<4:38:57, 12.85s/it] {'loss': 0.0034, 'learning_rate': 6.5900000000000004e-06, 'epoch': 3.28} 87%|████████▋ | 8698/10000 [31:41:46<4:38:57, 12.85s/it] 87%|████████▋ | 8699/10000 [31:41:59<4:38:54, 12.86s/it] {'loss': 0.0048, 'learning_rate': 6.5850000000000005e-06, 'epoch': 3.28} 87%|████████▋ | 8699/10000 [31:41:59<4:38:54, 12.86s/it] 87%|████████▋ | 8700/10000 [31:42:12<4:39:23, 12.90s/it] {'loss': 0.0035, 'learning_rate': 6.58e-06, 'epoch': 3.28} 87%|████████▋ | 8700/10000 [31:42:12<4:39:23, 12.90s/it] 87%|████████▋ | 8701/10000 [31:42:25<4:39:20, 12.90s/it] {'loss': 0.0036, 'learning_rate': 6.5750000000000006e-06, 'epoch': 3.28} 87%|████████▋ | 8701/10000 [31:42:25<4:39:20, 12.90s/it] 87%|████████▋ | 8702/10000 [31:42:38<4:38:57, 12.89s/it] {'loss': 0.0037, 'learning_rate': 6.57e-06, 'epoch': 3.28} 87%|████████▋ | 8702/10000 [31:42:38<4:38:57, 12.89s/it] 87%|████████▋ | 8703/10000 [31:42:51<4:38:21, 12.88s/it] {'loss': 0.0029, 'learning_rate': 6.565000000000001e-06, 'epoch': 3.28} 87%|████████▋ | 8703/10000 [31:42:51<4:38:21, 12.88s/it] 87%|████████▋ | 8704/10000 [31:43:03<4:38:18, 12.88s/it] {'loss': 0.0039, 'learning_rate': 6.560000000000001e-06, 'epoch': 3.28} 87%|████████▋ | 8704/10000 [31:43:03<4:38:18, 12.88s/it] 87%|████████▋ | 8705/10000 [31:43:16<4:38:26, 12.90s/it] {'loss': 0.004, 'learning_rate': 6.555e-06, 'epoch': 3.28} 87%|████████▋ | 8705/10000 [31:43:16<4:38:26, 12.90s/it] 87%|████████▋ | 8706/10000 [31:43:29<4:38:07, 12.90s/it] {'loss': 0.0037, 'learning_rate': 6.550000000000001e-06, 
'epoch': 3.28} 87%|████████▋ | 8706/10000 [31:43:29<4:38:07, 12.90s/it] 87%|████████▋ | 8707/10000 [31:43:42<4:37:46, 12.89s/it] {'loss': 0.0047, 'learning_rate': 6.545e-06, 'epoch': 3.28} 87%|████████▋ | 8707/10000 [31:43:42<4:37:46, 12.89s/it] 87%|████████▋ | 8708/10000 [31:43:55<4:37:46, 12.90s/it] {'loss': 0.0041, 'learning_rate': 6.54e-06, 'epoch': 3.28} 87%|████████▋ | 8708/10000 [31:43:55<4:37:46, 12.90s/it] 87%|████████▋ | 8709/10000 [31:44:08<4:37:55, 12.92s/it] {'loss': 0.0038, 'learning_rate': 6.535000000000001e-06, 'epoch': 3.28} 87%|████████▋ | 8709/10000 [31:44:08<4:37:55, 12.92s/it] 87%|████████▋ | 8710/10000 [31:44:21<4:37:52, 12.92s/it] {'loss': 0.0047, 'learning_rate': 6.53e-06, 'epoch': 3.28} 87%|████████▋ | 8710/10000 [31:44:21<4:37:52, 12.92s/it] 87%|████████▋ | 8711/10000 [31:44:34<4:37:35, 12.92s/it] {'loss': 0.0049, 'learning_rate': 6.525e-06, 'epoch': 3.28} 87%|████████▋ | 8711/10000 [31:44:34<4:37:35, 12.92s/it] 87%|████████▋ | 8712/10000 [31:44:47<4:37:36, 12.93s/it] {'loss': 0.0043, 'learning_rate': 6.519999999999999e-06, 'epoch': 3.28} 87%|████████▋ | 8712/10000 [31:44:47<4:37:36, 12.93s/it] 87%|████████▋ | 8713/10000 [31:45:00<4:37:43, 12.95s/it] {'loss': 0.0035, 'learning_rate': 6.515e-06, 'epoch': 3.28} 87%|████████▋ | 8713/10000 [31:45:00<4:37:43, 12.95s/it] 87%|████████▋ | 8714/10000 [31:45:13<4:37:15, 12.94s/it] {'loss': 0.0053, 'learning_rate': 6.510000000000001e-06, 'epoch': 3.28} 87%|████████▋ | 8714/10000 [31:45:13<4:37:15, 12.94s/it] 87%|████████▋ | 8715/10000 [31:45:26<4:36:56, 12.93s/it] {'loss': 0.0043, 'learning_rate': 6.505e-06, 'epoch': 3.28} 87%|████████▋ | 8715/10000 [31:45:26<4:36:56, 12.93s/it] 87%|████████▋ | 8716/10000 [31:45:39<4:36:42, 12.93s/it] {'loss': 0.003, 'learning_rate': 6.5000000000000004e-06, 'epoch': 3.28} 87%|████████▋ | 8716/10000 [31:45:39<4:36:42, 12.93s/it] 87%|████████▋ | 8717/10000 [31:45:51<4:36:38, 12.94s/it] {'loss': 0.0043, 'learning_rate': 6.495e-06, 'epoch': 3.28} 87%|████████▋ | 
8717/10000 [31:45:52<4:36:38, 12.94s/it] 87%|████████▋ | 8718/10000 [31:46:05<4:36:54, 12.96s/it] {'loss': 0.0036, 'learning_rate': 6.4900000000000005e-06, 'epoch': 3.28} 87%|████████▋ | 8718/10000 [31:46:05<4:36:54, 12.96s/it] 87%|████████▋ | 8719/10000 [31:46:18<4:37:01, 12.98s/it] {'loss': 0.003, 'learning_rate': 6.485000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8719/10000 [31:46:18<4:37:01, 12.98s/it] 87%|████████▋ | 8720/10000 [31:46:31<4:36:52, 12.98s/it] {'loss': 0.0031, 'learning_rate': 6.48e-06, 'epoch': 3.29} 87%|████████▋ | 8720/10000 [31:46:31<4:36:52, 12.98s/it] 87%|████████▋ | 8721/10000 [31:46:43<4:36:13, 12.96s/it] {'loss': 0.0045, 'learning_rate': 6.475000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8721/10000 [31:46:43<4:36:13, 12.96s/it] 87%|████████▋ | 8722/10000 [31:46:56<4:35:50, 12.95s/it] {'loss': 0.0032, 'learning_rate': 6.47e-06, 'epoch': 3.29} 87%|████████▋ | 8722/10000 [31:46:56<4:35:50, 12.95s/it] 87%|████████▋ | 8723/10000 [31:47:09<4:35:28, 12.94s/it] {'loss': 0.0037, 'learning_rate': 6.465e-06, 'epoch': 3.29} 87%|████████▋ | 8723/10000 [31:47:09<4:35:28, 12.94s/it] 87%|████████▋ | 8724/10000 [31:47:22<4:34:35, 12.91s/it] {'loss': 0.0045, 'learning_rate': 6.460000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8724/10000 [31:47:22<4:34:35, 12.91s/it] 87%|████████▋ | 8725/10000 [31:47:35<4:34:08, 12.90s/it] {'loss': 0.0035, 'learning_rate': 6.455e-06, 'epoch': 3.29} 87%|████████▋ | 8725/10000 [31:47:35<4:34:08, 12.90s/it] 87%|████████▋ | 8726/10000 [31:47:48<4:33:49, 12.90s/it] {'loss': 0.0042, 'learning_rate': 6.45e-06, 'epoch': 3.29} 87%|████████▋ | 8726/10000 [31:47:48<4:33:49, 12.90s/it] 87%|████████▋ | 8727/10000 [31:48:01<4:33:29, 12.89s/it] {'loss': 0.0043, 'learning_rate': 6.444999999999999e-06, 'epoch': 3.29} 87%|████████▋ | 8727/10000 [31:48:01<4:33:29, 12.89s/it] 87%|████████▋ | 8728/10000 [31:48:14<4:33:23, 12.90s/it] {'loss': 0.004, 'learning_rate': 6.44e-06, 'epoch': 3.29} 87%|████████▋ | 8728/10000 [31:48:14<4:33:23, 
12.90s/it] 87%|████████▋ | 8729/10000 [31:48:27<4:33:52, 12.93s/it] {'loss': 0.003, 'learning_rate': 6.435000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8729/10000 [31:48:27<4:33:52, 12.93s/it] 87%|████████▋ | 8730/10000 [31:48:40<4:33:39, 12.93s/it] {'loss': 0.0041, 'learning_rate': 6.43e-06, 'epoch': 3.29} 87%|████████▋ | 8730/10000 [31:48:40<4:33:39, 12.93s/it] 87%|████████▋ | 8731/10000 [31:48:53<4:33:49, 12.95s/it] {'loss': 0.0036, 'learning_rate': 6.425e-06, 'epoch': 3.29} 87%|████████▋ | 8731/10000 [31:48:53<4:33:49, 12.95s/it] 87%|████████▋ | 8732/10000 [31:49:06<4:33:43, 12.95s/it] {'loss': 0.0036, 'learning_rate': 6.4199999999999995e-06, 'epoch': 3.29} 87%|████████▋ | 8732/10000 [31:49:06<4:33:43, 12.95s/it] 87%|████████▋ | 8733/10000 [31:49:19<4:33:33, 12.95s/it] {'loss': 0.0053, 'learning_rate': 6.415e-06, 'epoch': 3.29} 87%|████████▋ | 8733/10000 [31:49:19<4:33:33, 12.95s/it] 87%|████████▋ | 8734/10000 [31:49:31<4:33:04, 12.94s/it] {'loss': 0.0034, 'learning_rate': 6.4100000000000005e-06, 'epoch': 3.29} 87%|████████▋ | 8734/10000 [31:49:31<4:33:04, 12.94s/it] 87%|████████▋ | 8735/10000 [31:49:44<4:32:26, 12.92s/it] {'loss': 0.0038, 'learning_rate': 6.405e-06, 'epoch': 3.29} 87%|████████▋ | 8735/10000 [31:49:44<4:32:26, 12.92s/it] 87%|████████▋ | 8736/10000 [31:49:57<4:32:08, 12.92s/it] {'loss': 0.0041, 'learning_rate': 6.4000000000000006e-06, 'epoch': 3.29} 87%|████████▋ | 8736/10000 [31:49:57<4:32:08, 12.92s/it] 87%|████████▋ | 8737/10000 [31:50:10<4:31:23, 12.89s/it] {'loss': 0.0035, 'learning_rate': 6.395000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8737/10000 [31:50:10<4:31:23, 12.89s/it] 87%|████████▋ | 8738/10000 [31:50:23<4:30:51, 12.88s/it] {'loss': 0.0049, 'learning_rate': 6.39e-06, 'epoch': 3.29} 87%|████████▋ | 8738/10000 [31:50:23<4:30:51, 12.88s/it] 87%|████████▋ | 8739/10000 [31:50:36<4:30:42, 12.88s/it] {'loss': 0.0034, 'learning_rate': 6.385000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8739/10000 [31:50:36<4:30:42, 12.88s/it] 
87%|████████▋ | 8740/10000 [31:50:49<4:30:32, 12.88s/it] {'loss': 0.0049, 'learning_rate': 6.38e-06, 'epoch': 3.29} 87%|████████▋ | 8740/10000 [31:50:49<4:30:32, 12.88s/it] 87%|████████▋ | 8741/10000 [31:51:02<4:30:17, 12.88s/it] {'loss': 0.0031, 'learning_rate': 6.375000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8741/10000 [31:51:02<4:30:17, 12.88s/it] 87%|████████▋ | 8742/10000 [31:51:14<4:29:57, 12.88s/it] {'loss': 0.0036, 'learning_rate': 6.370000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8742/10000 [31:51:14<4:29:57, 12.88s/it] 87%|████████▋ | 8743/10000 [31:51:27<4:29:32, 12.87s/it] {'loss': 0.0037, 'learning_rate': 6.365e-06, 'epoch': 3.29} 87%|████████▋ | 8743/10000 [31:51:27<4:29:32, 12.87s/it] 87%|████████▋ | 8744/10000 [31:51:40<4:29:22, 12.87s/it] {'loss': 0.0035, 'learning_rate': 6.360000000000001e-06, 'epoch': 3.29} 87%|████████▋ | 8744/10000 [31:51:40<4:29:22, 12.87s/it] 87%|████████▋ | 8745/10000 [31:51:53<4:29:15, 12.87s/it] {'loss': 0.003, 'learning_rate': 6.355e-06, 'epoch': 3.3} 87%|████████▋ | 8745/10000 [31:51:53<4:29:15, 12.87s/it] 87%|████████▋ | 8746/10000 [31:52:06<4:29:00, 12.87s/it] {'loss': 0.0031, 'learning_rate': 6.35e-06, 'epoch': 3.3} 87%|████████▋ | 8746/10000 [31:52:06<4:29:00, 12.87s/it] 87%|████████▋ | 8747/10000 [31:52:19<4:28:57, 12.88s/it] {'loss': 0.0051, 'learning_rate': 6.345000000000001e-06, 'epoch': 3.3} 87%|████████▋ | 8747/10000 [31:52:19<4:28:57, 12.88s/it] 87%|████████▋ | 8748/10000 [31:52:32<4:28:55, 12.89s/it] {'loss': 0.0026, 'learning_rate': 6.34e-06, 'epoch': 3.3} 87%|████████▋ | 8748/10000 [31:52:32<4:28:55, 12.89s/it] 87%|████████▋ | 8749/10000 [31:52:45<4:29:05, 12.91s/it] {'loss': 0.0033, 'learning_rate': 6.335e-06, 'epoch': 3.3} 87%|████████▋ | 8749/10000 [31:52:45<4:29:05, 12.91s/it] 88%|████████▊ | 8750/10000 [31:52:58<4:28:57, 12.91s/it] {'loss': 0.0046, 'learning_rate': 6.3299999999999995e-06, 'epoch': 3.3} 88%|████████▊ | 8750/10000 [31:52:58<4:28:57, 12.91s/it] 88%|████████▊ | 8751/10000 
[31:53:10<4:28:38, 12.90s/it] {'loss': 0.0038, 'learning_rate': 6.3250000000000004e-06, 'epoch': 3.3} 88%|████████▊ | 8751/10000 [31:53:10<4:28:38, 12.90s/it] 88%|████████▊ | 8752/10000 [31:53:23<4:28:44, 12.92s/it] {'loss': 0.0045, 'learning_rate': 6.320000000000001e-06, 'epoch': 3.3} 88%|████████▊ | 8752/10000 [31:53:23<4:28:44, 12.92s/it] 88%|████████▊ | 8753/10000 [31:53:36<4:28:20, 12.91s/it] {'loss': 0.0036, 'learning_rate': 6.315e-06, 'epoch': 3.3} 88%|████████▊ | 8753/10000 [31:53:36<4:28:20, 12.91s/it] 88%|████████▊ | 8754/10000 [31:53:49<4:28:09, 12.91s/it] {'loss': 0.0032, 'learning_rate': 6.3100000000000006e-06, 'epoch': 3.3} 88%|████████▊ | 8754/10000 [31:53:49<4:28:09, 12.91s/it] 88%|████████▊ | 8755/10000 [31:54:02<4:28:17, 12.93s/it] {'loss': 0.0038, 'learning_rate': 6.305e-06, 'epoch': 3.3} 88%|████████▊ | 8755/10000 [31:54:02<4:28:17, 12.93s/it] 88%|████████▊ | 8756/10000 [31:54:15<4:28:31, 12.95s/it] {'loss': 0.0037, 'learning_rate': 6.300000000000001e-06, 'epoch': 3.3} 88%|████████▊ | 8756/10000 [31:54:15<4:28:31, 12.95s/it] 88%|████████▊ | 8757/10000 [31:54:28<4:29:02, 12.99s/it] {'loss': 0.0037, 'learning_rate': 6.295000000000001e-06, 'epoch': 3.3} 88%|████████▊ | 8757/10000 [31:54:28<4:29:02, 12.99s/it] 88%|████████▊ | 8758/10000 [31:54:41<4:28:45, 12.98s/it] {'loss': 0.0043, 'learning_rate': 6.29e-06, 'epoch': 3.3} 88%|████████▊ | 8758/10000 [31:54:41<4:28:45, 12.98s/it] 88%|████████▊ | 8759/10000 [31:54:54<4:28:19, 12.97s/it] {'loss': 0.0031, 'learning_rate': 6.285000000000001e-06, 'epoch': 3.3} 88%|████████▊ | 8759/10000 [31:54:54<4:28:19, 12.97s/it] 88%|████████▊ | 8760/10000 [31:55:07<4:28:06, 12.97s/it] {'loss': 0.0031, 'learning_rate': 6.28e-06, 'epoch': 3.3} 88%|████████▊ | 8760/10000 [31:55:07<4:28:06, 12.97s/it] 88%|████████▊ | 8761/10000 [31:55:20<4:27:33, 12.96s/it] {'loss': 0.0038, 'learning_rate': 6.275e-06, 'epoch': 3.3} 88%|████████▊ | 8761/10000 [31:55:20<4:27:33, 12.96s/it] 88%|████████▊ | 8762/10000 [31:55:33<4:26:48, 
12.93s/it] {'loss': 0.004, 'learning_rate': 6.270000000000001e-06, 'epoch': 3.3} 88%|████████▊ | 8762/10000 [31:55:33<4:26:48, 12.93s/it] 88%|████████▊ | 8763/10000 [31:55:46<4:26:02, 12.90s/it] {'loss': 0.0038, 'learning_rate': 6.265e-06, 'epoch': 3.3} 88%|████████▊ | 8763/10000 [31:55:46<4:26:02, 12.90s/it] 88%|████████▊ | 8764/10000 [31:55:59<4:25:59, 12.91s/it] {'loss': 0.0039, 'learning_rate': 6.26e-06, 'epoch': 3.3} 88%|████████▊ | 8764/10000 [31:55:59<4:25:59, 12.91s/it] 88%|████████▊ | 8765/10000 [31:56:12<4:25:39, 12.91s/it] {'loss': 0.0058, 'learning_rate': 6.254999999999999e-06, 'epoch': 3.3} 88%|████████▊ | 8765/10000 [31:56:12<4:25:39, 12.91s/it] 88%|████████▊ | 8766/10000 [31:56:24<4:25:17, 12.90s/it] {'loss': 0.0034, 'learning_rate': 6.25e-06, 'epoch': 3.3} 88%|████████▊ | 8766/10000 [31:56:24<4:25:17, 12.90s/it] 88%|████████▊ | 8767/10000 [31:56:37<4:25:01, 12.90s/it] {'loss': 0.0041, 'learning_rate': 6.245e-06, 'epoch': 3.3} 88%|████████▊ | 8767/10000 [31:56:37<4:25:01, 12.90s/it] 88%|████████▊ | 8768/10000 [31:56:50<4:25:00, 12.91s/it] {'loss': 0.0038, 'learning_rate': 6.24e-06, 'epoch': 3.3} 88%|████████▊ | 8768/10000 [31:56:50<4:25:00, 12.91s/it] 88%|████████▊ | 8769/10000 [31:57:03<4:24:41, 12.90s/it] {'loss': 0.0032, 'learning_rate': 6.2350000000000004e-06, 'epoch': 3.3} 88%|████████▊ | 8769/10000 [31:57:03<4:24:41, 12.90s/it] 88%|████████▊ | 8770/10000 [31:57:16<4:24:17, 12.89s/it] {'loss': 0.0041, 'learning_rate': 6.2300000000000005e-06, 'epoch': 3.3} 88%|████████▊ | 8770/10000 [31:57:16<4:24:17, 12.89s/it] 88%|████████▊ | 8771/10000 [31:57:29<4:24:04, 12.89s/it] {'loss': 0.0044, 'learning_rate': 6.2250000000000005e-06, 'epoch': 3.3} 88%|████████▊ | 8771/10000 [31:57:29<4:24:04, 12.89s/it] 88%|████████▊ | 8772/10000 [31:57:42<4:23:26, 12.87s/it] {'loss': 0.0044, 'learning_rate': 6.22e-06, 'epoch': 3.31} 88%|████████▊ | 8772/10000 [31:57:42<4:23:26, 12.87s/it] 88%|████████▊ | 8773/10000 [31:57:55<4:23:30, 12.89s/it] {'loss': 0.0046, 
'learning_rate': 6.215e-06, 'epoch': 3.31} 88%|████████▊ | 8773/10000 [31:57:55<4:23:30, 12.89s/it] 88%|████████▊ | 8774/10000 [31:58:08<4:23:45, 12.91s/it] {'loss': 0.0035, 'learning_rate': 6.210000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8774/10000 [31:58:08<4:23:45, 12.91s/it] 88%|████████▊ | 8775/10000 [31:58:20<4:23:08, 12.89s/it] {'loss': 0.0034, 'learning_rate': 6.205000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8775/10000 [31:58:21<4:23:08, 12.89s/it] 88%|████████▊ | 8776/10000 [31:58:33<4:22:28, 12.87s/it] {'loss': 0.0042, 'learning_rate': 6.2e-06, 'epoch': 3.31} 88%|████████▊ | 8776/10000 [31:58:33<4:22:28, 12.87s/it] 88%|████████▊ | 8777/10000 [31:58:46<4:22:15, 12.87s/it] {'loss': 0.0048, 'learning_rate': 6.195e-06, 'epoch': 3.31} 88%|████████▊ | 8777/10000 [31:58:46<4:22:15, 12.87s/it] 88%|████████▊ | 8778/10000 [31:58:59<4:21:49, 12.86s/it] {'loss': 0.0043, 'learning_rate': 6.19e-06, 'epoch': 3.31} 88%|████████▊ | 8778/10000 [31:58:59<4:21:49, 12.86s/it] 88%|████████▊ | 8779/10000 [31:59:12<4:21:42, 12.86s/it] {'loss': 0.0035, 'learning_rate': 6.185000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8779/10000 [31:59:12<4:21:42, 12.86s/it] 88%|████████▊ | 8780/10000 [31:59:25<4:21:30, 12.86s/it] {'loss': 0.0058, 'learning_rate': 6.18e-06, 'epoch': 3.31} 88%|████████▊ | 8780/10000 [31:59:25<4:21:30, 12.86s/it] 88%|████████▊ | 8781/10000 [31:59:38<4:21:39, 12.88s/it] {'loss': 0.0036, 'learning_rate': 6.175e-06, 'epoch': 3.31} 88%|████████▊ | 8781/10000 [31:59:38<4:21:39, 12.88s/it] 88%|████████▊ | 8782/10000 [31:59:51<4:21:45, 12.89s/it] {'loss': 0.0048, 'learning_rate': 6.17e-06, 'epoch': 3.31} 88%|████████▊ | 8782/10000 [31:59:51<4:21:45, 12.89s/it] 88%|████████▊ | 8783/10000 [32:00:03<4:21:14, 12.88s/it] {'loss': 0.0037, 'learning_rate': 6.165e-06, 'epoch': 3.31} 88%|████████▊ | 8783/10000 [32:00:03<4:21:14, 12.88s/it] 88%|████████▊ | 8784/10000 [32:00:16<4:21:05, 12.88s/it] {'loss': 0.0032, 'learning_rate': 6.16e-06, 'epoch': 3.31} 
88%|████████▊ | 8784/10000 [32:00:16<4:21:05, 12.88s/it] 88%|████████▊ | 8785/10000 [32:00:29<4:21:33, 12.92s/it] {'loss': 0.0036, 'learning_rate': 6.155e-06, 'epoch': 3.31} 88%|████████▊ | 8785/10000 [32:00:29<4:21:33, 12.92s/it] 88%|████████▊ | 8786/10000 [32:00:42<4:21:45, 12.94s/it] {'loss': 0.0041, 'learning_rate': 6.15e-06, 'epoch': 3.31} 88%|████████▊ | 8786/10000 [32:00:42<4:21:45, 12.94s/it] 88%|████████▊ | 8787/10000 [32:00:55<4:21:26, 12.93s/it] {'loss': 0.0032, 'learning_rate': 6.1450000000000005e-06, 'epoch': 3.31} 88%|████████▊ | 8787/10000 [32:00:55<4:21:26, 12.93s/it] 88%|████████▊ | 8788/10000 [32:01:08<4:21:05, 12.92s/it] {'loss': 0.0038, 'learning_rate': 6.1400000000000005e-06, 'epoch': 3.31} 88%|████████▊ | 8788/10000 [32:01:08<4:21:05, 12.92s/it] 88%|████████▊ | 8789/10000 [32:01:21<4:20:15, 12.89s/it] {'loss': 0.0041, 'learning_rate': 6.1350000000000006e-06, 'epoch': 3.31} 88%|████████▊ | 8789/10000 [32:01:21<4:20:15, 12.89s/it] 88%|████████▊ | 8790/10000 [32:01:34<4:19:52, 12.89s/it] {'loss': 0.0032, 'learning_rate': 6.130000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8790/10000 [32:01:34<4:19:52, 12.89s/it] 88%|████████▊ | 8791/10000 [32:01:47<4:19:34, 12.88s/it] {'loss': 0.0039, 'learning_rate': 6.125e-06, 'epoch': 3.31} 88%|████████▊ | 8791/10000 [32:01:47<4:19:34, 12.88s/it] 88%|████████▊ | 8792/10000 [32:02:00<4:19:17, 12.88s/it] {'loss': 0.0026, 'learning_rate': 6.12e-06, 'epoch': 3.31} 88%|████████▊ | 8792/10000 [32:02:00<4:19:17, 12.88s/it] 88%|████████▊ | 8793/10000 [32:02:12<4:18:57, 12.87s/it] {'loss': 0.0045, 'learning_rate': 6.115000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8793/10000 [32:02:12<4:18:57, 12.87s/it] 88%|████████▊ | 8794/10000 [32:02:25<4:19:11, 12.90s/it] {'loss': 0.0027, 'learning_rate': 6.110000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8794/10000 [32:02:25<4:19:11, 12.90s/it] 88%|████████▊ | 8795/10000 [32:02:38<4:19:19, 12.91s/it] {'loss': 0.0035, 'learning_rate': 6.105e-06, 'epoch': 3.31} 
88%|████████▊ | 8795/10000 [32:02:38<4:19:19, 12.91s/it] 88%|████████▊ | 8796/10000 [32:02:51<4:19:12, 12.92s/it] {'loss': 0.004, 'learning_rate': 6.1e-06, 'epoch': 3.31} 88%|████████▊ | 8796/10000 [32:02:51<4:19:12, 12.92s/it] 88%|████████▊ | 8797/10000 [32:03:04<4:19:37, 12.95s/it] {'loss': 0.004, 'learning_rate': 6.095e-06, 'epoch': 3.31} 88%|████████▊ | 8797/10000 [32:03:04<4:19:37, 12.95s/it] 88%|████████▊ | 8798/10000 [32:03:17<4:19:03, 12.93s/it] {'loss': 0.003, 'learning_rate': 6.090000000000001e-06, 'epoch': 3.31} 88%|████████▊ | 8798/10000 [32:03:17<4:19:03, 12.93s/it] 88%|████████▊ | 8799/10000 [32:03:30<4:18:32, 12.92s/it] {'loss': 0.003, 'learning_rate': 6.085e-06, 'epoch': 3.32} 88%|████████▊ | 8799/10000 [32:03:30<4:18:32, 12.92s/it] 88%|████████▊ | 8800/10000 [32:03:43<4:17:52, 12.89s/it] {'loss': 0.0054, 'learning_rate': 6.08e-06, 'epoch': 3.32} 88%|████████▊ | 8800/10000 [32:03:43<4:17:52, 12.89s/it] 88%|████████▊ | 8801/10000 [32:03:56<4:17:34, 12.89s/it] {'loss': 0.0035, 'learning_rate': 6.075e-06, 'epoch': 3.32} 88%|████████▊ | 8801/10000 [32:03:56<4:17:34, 12.89s/it] 88%|████████▊ | 8802/10000 [32:04:09<4:17:35, 12.90s/it] {'loss': 0.0045, 'learning_rate': 6.07e-06, 'epoch': 3.32} 88%|████████▊ | 8802/10000 [32:04:09<4:17:35, 12.90s/it] 88%|████████▊ | 8803/10000 [32:04:22<4:17:34, 12.91s/it] {'loss': 0.0035, 'learning_rate': 6.065e-06, 'epoch': 3.32} 88%|████████▊ | 8803/10000 [32:04:22<4:17:34, 12.91s/it] 88%|████████▊ | 8804/10000 [32:04:35<4:17:24, 12.91s/it] {'loss': 0.0037, 'learning_rate': 6.0600000000000004e-06, 'epoch': 3.32} 88%|████████▊ | 8804/10000 [32:04:35<4:17:24, 12.91s/it] 88%|████████▊ | 8805/10000 [32:04:47<4:17:21, 12.92s/it] {'loss': 0.0039, 'learning_rate': 6.0550000000000005e-06, 'epoch': 3.32} 88%|████████▊ | 8805/10000 [32:04:47<4:17:21, 12.92s/it] 88%|████████▊ | 8806/10000 [32:05:00<4:17:07, 12.92s/it] {'loss': 0.004, 'learning_rate': 6.0500000000000005e-06, 'epoch': 3.32} 88%|████████▊ | 8806/10000 
[32:05:00<4:17:07, 12.92s/it] 88%|████████▊ | 8807/10000 [32:05:13<4:16:32, 12.90s/it] {'loss': 0.0056, 'learning_rate': 6.045e-06, 'epoch': 3.32} 88%|████████▊ | 8807/10000 [32:05:13<4:16:32, 12.90s/it] 88%|████████▊ | 8808/10000 [32:05:26<4:16:22, 12.91s/it] {'loss': 0.0034, 'learning_rate': 6.040000000000001e-06, 'epoch': 3.32} 88%|████████▊ | 8808/10000 [32:05:26<4:16:22, 12.91s/it] 88%|████████▊ | 8809/10000 [32:05:39<4:16:46, 12.94s/it] {'loss': 0.0028, 'learning_rate': 6.035000000000001e-06, 'epoch': 3.32} 88%|████████▊ | 8809/10000 [32:05:39<4:16:46, 12.94s/it] 88%|████████▊ | 8810/10000 [32:05:52<4:16:55, 12.95s/it] {'loss': 0.0036, 'learning_rate': 6.03e-06, 'epoch': 3.32} 88%|████████▊ | 8810/10000 [32:05:52<4:16:55, 12.95s/it] 88%|████████▊ | 8811/10000 [32:06:05<4:16:27, 12.94s/it] {'loss': 0.0038, 'learning_rate': 6.025e-06, 'epoch': 3.32} 88%|████████▊ | 8811/10000 [32:06:05<4:16:27, 12.94s/it] 88%|████████▊ | 8812/10000 [32:06:18<4:17:00, 12.98s/it] {'loss': 0.0039, 'learning_rate': 6.02e-06, 'epoch': 3.32} 88%|████████▊ | 8812/10000 [32:06:18<4:17:00, 12.98s/it] 88%|████████▊ | 8813/10000 [32:06:31<4:16:29, 12.96s/it] {'loss': 0.0037, 'learning_rate': 6.015000000000001e-06, 'epoch': 3.32} 88%|████████▊ | 8813/10000 [32:06:31<4:16:29, 12.96s/it] 88%|████████▊ | 8814/10000 [32:06:44<4:15:43, 12.94s/it] {'loss': 0.0033, 'learning_rate': 6.01e-06, 'epoch': 3.32} 88%|████████▊ | 8814/10000 [32:06:44<4:15:43, 12.94s/it] 88%|████████▊ | 8815/10000 [32:06:57<4:15:17, 12.93s/it] {'loss': 0.0033, 'learning_rate': 6.005e-06, 'epoch': 3.32} 88%|████████▊ | 8815/10000 [32:06:57<4:15:17, 12.93s/it] 88%|████████▊ | 8816/10000 [32:07:10<4:14:39, 12.90s/it] {'loss': 0.0038, 'learning_rate': 6e-06, 'epoch': 3.32} 88%|████████▊ | 8816/10000 [32:07:10<4:14:39, 12.90s/it] 88%|████████▊ | 8817/10000 [32:07:23<4:14:48, 12.92s/it] {'loss': 0.0042, 'learning_rate': 5.995e-06, 'epoch': 3.32} 88%|████████▊ | 8817/10000 [32:07:23<4:14:48, 12.92s/it] 88%|████████▊ | 8818/10000 
[32:07:36<4:15:03, 12.95s/it] {'loss': 0.004, 'learning_rate': 5.99e-06, 'epoch': 3.32} 88%|████████▊ | 8818/10000 [32:07:36<4:15:03, 12.95s/it] 88%|████████▊ | 8819/10000 [32:07:49<4:14:29, 12.93s/it] {'loss': 0.0043, 'learning_rate': 5.985e-06, 'epoch': 3.32} 88%|████████▊ | 8819/10000 [32:07:49<4:14:29, 12.93s/it] 88%|████████▊ | 8820/10000 [32:08:01<4:14:11, 12.93s/it] {'loss': 0.003, 'learning_rate': 5.98e-06, 'epoch': 3.32} 88%|████████▊ | 8820/10000 [32:08:02<4:14:11, 12.93s/it] 88%|████████▊ | 8821/10000 [32:08:14<4:14:08, 12.93s/it] {'loss': 0.0036, 'learning_rate': 5.975e-06, 'epoch': 3.32} 88%|████████▊ | 8821/10000 [32:08:14<4:14:08, 12.93s/it] 88%|████████▊ | 8822/10000 [32:08:27<4:13:23, 12.91s/it] {'loss': 0.0041, 'learning_rate': 5.9700000000000004e-06, 'epoch': 3.32} 88%|████████▊ | 8822/10000 [32:08:27<4:13:23, 12.91s/it] 88%|████████▊ | 8823/10000 [32:08:40<4:13:17, 12.91s/it] {'loss': 0.0041, 'learning_rate': 5.9650000000000005e-06, 'epoch': 3.32} 88%|████████▊ | 8823/10000 [32:08:40<4:13:17, 12.91s/it] 88%|████████▊ | 8824/10000 [32:08:53<4:13:35, 12.94s/it] {'loss': 0.0032, 'learning_rate': 5.9600000000000005e-06, 'epoch': 3.32} 88%|████████▊ | 8824/10000 [32:08:53<4:13:35, 12.94s/it] 88%|████████▊ | 8825/10000 [32:09:06<4:13:36, 12.95s/it] {'loss': 0.003, 'learning_rate': 5.955000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8825/10000 [32:09:06<4:13:36, 12.95s/it] 88%|████████▊ | 8826/10000 [32:09:19<4:13:37, 12.96s/it] {'loss': 0.0042, 'learning_rate': 5.95e-06, 'epoch': 3.33} 88%|████████▊ | 8826/10000 [32:09:19<4:13:37, 12.96s/it] 88%|████████▊ | 8827/10000 [32:09:32<4:13:44, 12.98s/it] {'loss': 0.0041, 'learning_rate': 5.945000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8827/10000 [32:09:32<4:13:44, 12.98s/it] 88%|████████▊ | 8828/10000 [32:09:45<4:12:41, 12.94s/it] {'loss': 0.0035, 'learning_rate': 5.940000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8828/10000 [32:09:45<4:12:41, 12.94s/it] 88%|████████▊ | 8829/10000 
[32:09:58<4:11:57, 12.91s/it] {'loss': 0.0033, 'learning_rate': 5.935e-06, 'epoch': 3.33} 88%|████████▊ | 8829/10000 [32:09:58<4:11:57, 12.91s/it] 88%|████████▊ | 8830/10000 [32:10:11<4:11:38, 12.90s/it] {'loss': 0.0031, 'learning_rate': 5.93e-06, 'epoch': 3.33} 88%|████████▊ | 8830/10000 [32:10:11<4:11:38, 12.90s/it] 88%|████████▊ | 8831/10000 [32:10:24<4:11:16, 12.90s/it] {'loss': 0.003, 'learning_rate': 5.925e-06, 'epoch': 3.33} 88%|████████▊ | 8831/10000 [32:10:24<4:11:16, 12.90s/it] 88%|████████▊ | 8832/10000 [32:10:37<4:10:57, 12.89s/it] {'loss': 0.0037, 'learning_rate': 5.920000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8832/10000 [32:10:37<4:10:57, 12.89s/it] 88%|████████▊ | 8833/10000 [32:10:49<4:10:33, 12.88s/it] {'loss': 0.0029, 'learning_rate': 5.915e-06, 'epoch': 3.33} 88%|████████▊ | 8833/10000 [32:10:49<4:10:33, 12.88s/it] 88%|████████▊ | 8834/10000 [32:11:02<4:10:38, 12.90s/it] {'loss': 0.0034, 'learning_rate': 5.91e-06, 'epoch': 3.33} 88%|████████▊ | 8834/10000 [32:11:02<4:10:38, 12.90s/it] 88%|████████▊ | 8835/10000 [32:11:15<4:10:59, 12.93s/it] {'loss': 0.0025, 'learning_rate': 5.905e-06, 'epoch': 3.33} 88%|████████▊ | 8835/10000 [32:11:15<4:10:59, 12.93s/it] 88%|████████▊ | 8836/10000 [32:11:28<4:10:14, 12.90s/it] {'loss': 0.0037, 'learning_rate': 5.9e-06, 'epoch': 3.33} 88%|████████▊ | 8836/10000 [32:11:28<4:10:14, 12.90s/it] 88%|████████▊ | 8837/10000 [32:11:41<4:09:43, 12.88s/it] {'loss': 0.0034, 'learning_rate': 5.895e-06, 'epoch': 3.33} 88%|████████▊ | 8837/10000 [32:11:41<4:09:43, 12.88s/it] 88%|████████▊ | 8838/10000 [32:11:54<4:09:47, 12.90s/it] {'loss': 0.0029, 'learning_rate': 5.89e-06, 'epoch': 3.33} 88%|████████▊ | 8838/10000 [32:11:54<4:09:47, 12.90s/it] 88%|████████▊ | 8839/10000 [32:12:07<4:09:47, 12.91s/it] {'loss': 0.003, 'learning_rate': 5.885e-06, 'epoch': 3.33} 88%|████████▊ | 8839/10000 [32:12:07<4:09:47, 12.91s/it] 88%|████████▊ | 8840/10000 [32:12:20<4:10:20, 12.95s/it] {'loss': 0.0036, 'learning_rate': 
5.8800000000000005e-06, 'epoch': 3.33} 88%|████████▊ | 8840/10000 [32:12:20<4:10:20, 12.95s/it] 88%|████████▊ | 8841/10000 [32:12:33<4:09:38, 12.92s/it] {'loss': 0.0039, 'learning_rate': 5.875e-06, 'epoch': 3.33} 88%|████████▊ | 8841/10000 [32:12:33<4:09:38, 12.92s/it] 88%|████████▊ | 8842/10000 [32:12:46<4:09:00, 12.90s/it] {'loss': 0.0043, 'learning_rate': 5.8700000000000005e-06, 'epoch': 3.33} 88%|████████▊ | 8842/10000 [32:12:46<4:09:00, 12.90s/it] 88%|████████▊ | 8843/10000 [32:12:59<4:08:43, 12.90s/it] {'loss': 0.0043, 'learning_rate': 5.865000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8843/10000 [32:12:59<4:08:43, 12.90s/it] 88%|████████▊ | 8844/10000 [32:13:11<4:08:42, 12.91s/it] {'loss': 0.0033, 'learning_rate': 5.86e-06, 'epoch': 3.33} 88%|████████▊ | 8844/10000 [32:13:11<4:08:42, 12.91s/it] 88%|████████▊ | 8845/10000 [32:13:24<4:08:18, 12.90s/it] {'loss': 0.0029, 'learning_rate': 5.855e-06, 'epoch': 3.33} 88%|████████▊ | 8845/10000 [32:13:24<4:08:18, 12.90s/it] 88%|████████▊ | 8846/10000 [32:13:37<4:08:00, 12.89s/it] {'loss': 0.0036, 'learning_rate': 5.850000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8846/10000 [32:13:37<4:08:00, 12.89s/it] 88%|████████▊ | 8847/10000 [32:13:50<4:08:00, 12.91s/it] {'loss': 0.0027, 'learning_rate': 5.845000000000001e-06, 'epoch': 3.33} 88%|████████▊ | 8847/10000 [32:13:50<4:08:00, 12.91s/it] 88%|████████▊ | 8848/10000 [32:14:03<4:07:42, 12.90s/it] {'loss': 0.0035, 'learning_rate': 5.84e-06, 'epoch': 3.33} 88%|████████▊ | 8848/10000 [32:14:03<4:07:42, 12.90s/it] 88%|████████▊ | 8849/10000 [32:14:16<4:07:46, 12.92s/it] {'loss': 0.0036, 'learning_rate': 5.835e-06, 'epoch': 3.33} 88%|████████▊ | 8849/10000 [32:14:16<4:07:46, 12.92s/it] 88%|████████▊ | 8850/10000 [32:14:29<4:07:49, 12.93s/it] {'loss': 0.0042, 'learning_rate': 5.83e-06, 'epoch': 3.33} 88%|████████▊ | 8850/10000 [32:14:29<4:07:49, 12.93s/it] 89%|████████▊ | 8851/10000 [32:14:42<4:07:46, 12.94s/it] {'loss': 0.004, 'learning_rate': 5.825000000000001e-06, 
'epoch': 3.33} 89%|████████▊ | 8851/10000 [32:14:42<4:07:46, 12.94s/it] 89%|████████▊ | 8852/10000 [32:14:55<4:07:33, 12.94s/it] {'loss': 0.0032, 'learning_rate': 5.82e-06, 'epoch': 3.34} 89%|████████▊ | 8852/10000 [32:14:55<4:07:33, 12.94s/it] 89%|████████▊ | 8853/10000 [32:15:08<4:07:22, 12.94s/it] {'loss': 0.0042, 'learning_rate': 5.815e-06, 'epoch': 3.34} 89%|████████▊ | 8853/10000 [32:15:08<4:07:22, 12.94s/it] 89%|████████▊ | 8854/10000 [32:15:21<4:06:41, 12.92s/it] {'loss': 0.0056, 'learning_rate': 5.81e-06, 'epoch': 3.34} 89%|████████▊ | 8854/10000 [32:15:21<4:06:41, 12.92s/it] 89%|████████▊ | 8855/10000 [32:15:34<4:06:45, 12.93s/it] {'loss': 0.0046, 'learning_rate': 5.805e-06, 'epoch': 3.34} 89%|████████▊ | 8855/10000 [32:15:34<4:06:45, 12.93s/it] 89%|████████▊ | 8856/10000 [32:15:46<4:06:15, 12.92s/it] {'loss': 0.0032, 'learning_rate': 5.8e-06, 'epoch': 3.34} 89%|████████▊ | 8856/10000 [32:15:47<4:06:15, 12.92s/it] 89%|████████▊ | 8857/10000 [32:15:59<4:05:51, 12.91s/it] {'loss': 0.0038, 'learning_rate': 5.795e-06, 'epoch': 3.34} 89%|████████▊ | 8857/10000 [32:15:59<4:05:51, 12.91s/it] 89%|████████▊ | 8858/10000 [32:16:12<4:05:29, 12.90s/it] {'loss': 0.0043, 'learning_rate': 5.7900000000000005e-06, 'epoch': 3.34} 89%|████████▊ | 8858/10000 [32:16:12<4:05:29, 12.90s/it] 89%|████████▊ | 8859/10000 [32:16:25<4:04:58, 12.88s/it] {'loss': 0.0037, 'learning_rate': 5.7850000000000005e-06, 'epoch': 3.34} 89%|████████▊ | 8859/10000 [32:16:25<4:04:58, 12.88s/it] 89%|████████▊ | 8860/10000 [32:16:38<4:04:58, 12.89s/it] {'loss': 0.0033, 'learning_rate': 5.78e-06, 'epoch': 3.34} 89%|████████▊ | 8860/10000 [32:16:38<4:04:58, 12.89s/it] 89%|████████▊ | 8861/10000 [32:16:51<4:04:57, 12.90s/it] {'loss': 0.003, 'learning_rate': 5.775000000000001e-06, 'epoch': 3.34} 89%|████████▊ | 8861/10000 [32:16:51<4:04:57, 12.90s/it] 89%|████████▊ | 8862/10000 [32:17:04<4:05:01, 12.92s/it] {'loss': 0.0039, 'learning_rate': 5.770000000000001e-06, 'epoch': 3.34} 89%|████████▊ | 8862/10000 
[32:17:04<4:05:01, 12.92s/it] 89%|████████▊ | 8863/10000 [32:17:17<4:04:37, 12.91s/it] {'loss': 0.0034, 'learning_rate': 5.765e-06, 'epoch': 3.34} 89%|████████▊ | 8863/10000 [32:17:17<4:04:37, 12.91s/it] 89%|████████▊ | 8864/10000 [32:17:30<4:03:51, 12.88s/it] {'loss': 0.0037, 'learning_rate': 5.76e-06, 'epoch': 3.34} 89%|████████▊ | 8864/10000 [32:17:30<4:03:51, 12.88s/it] 89%|████████▊ | 8865/10000 [32:17:43<4:04:00, 12.90s/it] {'loss': 0.0034, 'learning_rate': 5.755e-06, 'epoch': 3.34} 89%|████████▊ | 8865/10000 [32:17:43<4:04:00, 12.90s/it] 89%|████████▊ | 8866/10000 [32:17:55<4:03:47, 12.90s/it] {'loss': 0.0035, 'learning_rate': 5.750000000000001e-06, 'epoch': 3.34} 89%|████████▊ | 8866/10000 [32:17:55<4:03:47, 12.90s/it] 89%|████████▊ | 8867/10000 [32:18:08<4:03:48, 12.91s/it] {'loss': 0.0042, 'learning_rate': 5.745e-06, 'epoch': 3.34} 89%|████████▊ | 8867/10000 [32:18:08<4:03:48, 12.91s/it] 89%|████████▊ | 8868/10000 [32:18:21<4:03:19, 12.90s/it] {'loss': 0.0027, 'learning_rate': 5.74e-06, 'epoch': 3.34} 89%|████████▊ | 8868/10000 [32:18:21<4:03:19, 12.90s/it] 89%|████████▊ | 8869/10000 [32:18:34<4:03:01, 12.89s/it] {'loss': 0.0036, 'learning_rate': 5.735e-06, 'epoch': 3.34} 89%|████████▊ | 8869/10000 [32:18:34<4:03:01, 12.89s/it] 89%|████████▊ | 8870/10000 [32:18:47<4:02:55, 12.90s/it] {'loss': 0.0029, 'learning_rate': 5.73e-06, 'epoch': 3.34} 89%|████████▊ | 8870/10000 [32:18:47<4:02:55, 12.90s/it] 89%|████████▊ | 8871/10000 [32:19:00<4:02:33, 12.89s/it] {'loss': 0.0048, 'learning_rate': 5.725e-06, 'epoch': 3.34} 89%|████████▊ | 8871/10000 [32:19:00<4:02:33, 12.89s/it] 89%|████████▊ | 8872/10000 [32:19:13<4:02:13, 12.88s/it] {'loss': 0.0048, 'learning_rate': 5.72e-06, 'epoch': 3.34} 89%|████████▊ | 8872/10000 [32:19:13<4:02:13, 12.88s/it] 89%|████████▊ | 8873/10000 [32:19:26<4:01:55, 12.88s/it] {'loss': 0.0025, 'learning_rate': 5.715e-06, 'epoch': 3.34} 89%|████████▊ | 8873/10000 [32:19:26<4:01:55, 12.88s/it] 89%|████████▊ | 8874/10000 [32:19:39<4:01:36, 
12.87s/it] {'loss': 0.0031, 'learning_rate': 5.71e-06, 'epoch': 3.34} 89%|████████▊ | 8874/10000 [32:19:39<4:01:36, 12.87s/it] 89%|████████▉ | 8875/10000 [32:19:51<4:01:29, 12.88s/it] {'loss': 0.005, 'learning_rate': 5.705e-06, 'epoch': 3.34} 89%|████████▉ | 8875/10000 [32:19:51<4:01:29, 12.88s/it] 89%|████████▉ | 8876/10000 [32:20:04<4:01:00, 12.86s/it] {'loss': 0.0042, 'learning_rate': 5.7000000000000005e-06, 'epoch': 3.34} 89%|████████▉ | 8876/10000 [32:20:04<4:01:00, 12.86s/it] 89%|████████▉ | 8877/10000 [32:20:17<4:00:49, 12.87s/it] {'loss': 0.004, 'learning_rate': 5.6950000000000005e-06, 'epoch': 3.34} 89%|████████▉ | 8877/10000 [32:20:17<4:00:49, 12.87s/it] 89%|████████▉ | 8878/10000 [32:20:30<4:00:43, 12.87s/it] {'loss': 0.0052, 'learning_rate': 5.690000000000001e-06, 'epoch': 3.35} 89%|████████▉ | 8878/10000 [32:20:30<4:00:43, 12.87s/it] 89%|████████▉ | 8879/10000 [32:20:43<4:00:24, 12.87s/it] {'loss': 0.0035, 'learning_rate': 5.685e-06, 'epoch': 3.35} 89%|████████▉ | 8879/10000 [32:20:43<4:00:24, 12.87s/it] 89%|████████▉ | 8880/10000 [32:20:56<4:00:12, 12.87s/it] {'loss': 0.0027, 'learning_rate': 5.680000000000001e-06, 'epoch': 3.35} 89%|████████▉ | 8880/10000 [32:20:56<4:00:12, 12.87s/it] 89%|████████▉ | 8881/10000 [32:21:09<4:00:18, 12.89s/it] {'loss': 0.0041, 'learning_rate': 5.675000000000001e-06, 'epoch': 3.35} 89%|████████▉ | 8881/10000 [32:21:09<4:00:18, 12.89s/it] 89%|████████▉ | 8882/10000 [32:21:22<4:00:25, 12.90s/it] {'loss': 0.0039, 'learning_rate': 5.67e-06, 'epoch': 3.35} 89%|████████▉ | 8882/10000 [32:21:22<4:00:25, 12.90s/it] 89%|████████▉ | 8883/10000 [32:21:35<4:00:35, 12.92s/it] {'loss': 0.004, 'learning_rate': 5.665e-06, 'epoch': 3.35} 89%|████████▉ | 8883/10000 [32:21:35<4:00:35, 12.92s/it] 89%|████████▉ | 8884/10000 [32:21:48<4:00:29, 12.93s/it] {'loss': 0.0047, 'learning_rate': 5.66e-06, 'epoch': 3.35} 89%|████████▉ | 8884/10000 [32:21:48<4:00:29, 12.93s/it] 89%|████████▉ | 8885/10000 [32:22:00<3:59:53, 12.91s/it] {'loss': 0.003, 
'learning_rate': 5.655000000000001e-06, 'epoch': 3.35} 89%|████████▉ | 8885/10000 [32:22:00<3:59:53, 12.91s/it] 89%|████████▉ | 8886/10000 [32:22:13<3:59:28, 12.90s/it] {'loss': 0.004, 'learning_rate': 5.65e-06, 'epoch': 3.35} 89%|████████▉ | 8886/10000 [32:22:13<3:59:28, 12.90s/it] 89%|████████▉ | 8887/10000 [32:22:26<3:58:31, 12.86s/it] {'loss': 0.0056, 'learning_rate': 5.645e-06, 'epoch': 3.35} 89%|████████▉ | 8887/10000 [32:22:26<3:58:31, 12.86s/it] 89%|████████▉ | 8888/10000 [32:22:39<3:58:30, 12.87s/it] {'loss': 0.0026, 'learning_rate': 5.64e-06, 'epoch': 3.35} 89%|████████▉ | 8888/10000 [32:22:39<3:58:30, 12.87s/it] 89%|████████▉ | 8889/10000 [32:22:52<3:57:54, 12.85s/it] {'loss': 0.0041, 'learning_rate': 5.635e-06, 'epoch': 3.35} 89%|████████▉ | 8889/10000 [32:22:52<3:57:54, 12.85s/it] 89%|████████▉ | 8890/10000 [32:23:05<3:58:04, 12.87s/it] {'loss': 0.0035, 'learning_rate': 5.63e-06, 'epoch': 3.35} 89%|████████▉ | 8890/10000 [32:23:05<3:58:04, 12.87s/it] 89%|████████▉ | 8891/10000 [32:23:18<3:58:01, 12.88s/it] {'loss': 0.0039, 'learning_rate': 5.625e-06, 'epoch': 3.35} 89%|████████▉ | 8891/10000 [32:23:18<3:58:01, 12.88s/it] 89%|████████▉ | 8892/10000 [32:23:30<3:57:56, 12.89s/it] {'loss': 0.0044, 'learning_rate': 5.62e-06, 'epoch': 3.35} 89%|████████▉ | 8892/10000 [32:23:30<3:57:56, 12.89s/it] 89%|████████▉ | 8893/10000 [32:23:43<3:57:36, 12.88s/it] {'loss': 0.0043, 'learning_rate': 5.6150000000000005e-06, 'epoch': 3.35} 89%|████████▉ | 8893/10000 [32:23:43<3:57:36, 12.88s/it] 89%|████████▉ | 8894/10000 [32:23:56<3:57:23, 12.88s/it] {'loss': 0.0038, 'learning_rate': 5.61e-06, 'epoch': 3.35} 89%|████████▉ | 8894/10000 [32:23:56<3:57:23, 12.88s/it] 89%|████████▉ | 8895/10000 [32:24:09<3:57:14, 12.88s/it] {'loss': 0.0031, 'learning_rate': 5.6050000000000005e-06, 'epoch': 3.35} 89%|████████▉ | 8895/10000 [32:24:09<3:57:14, 12.88s/it] 89%|████████▉ | 8896/10000 [32:24:22<3:57:02, 12.88s/it] {'loss': 0.0036, 'learning_rate': 5.600000000000001e-06, 'epoch': 
3.35} 89%|████████▉ | 8896/10000 [32:24:22<3:57:02, 12.88s/it] 89%|████████▉ | 8897/10000 [32:24:35<3:56:50, 12.88s/it] {'loss': 0.0042, 'learning_rate': 5.595000000000001e-06, 'epoch': 3.35} 89%|████████▉ | 8897/10000 [32:24:35<3:56:50, 12.88s/it] 89%|████████▉ | 8898/10000 [32:24:48<3:56:17, 12.86s/it] {'loss': 0.0041, 'learning_rate': 5.59e-06, 'epoch': 3.35} 89%|████████▉ | 8898/10000 [32:24:48<3:56:17, 12.86s/it] 89%|████████▉ | 8899/10000 [32:25:01<3:56:28, 12.89s/it] {'loss': 0.0037, 'learning_rate': 5.585e-06, 'epoch': 3.35} 89%|████████▉ | 8899/10000 [32:25:01<3:56:28, 12.89s/it] 89%|████████▉ | 8900/10000 [32:25:13<3:56:12, 12.88s/it] {'loss': 0.0031, 'learning_rate': 5.580000000000001e-06, 'epoch': 3.35} 89%|████████▉ | 8900/10000 [32:25:13<3:56:12, 12.88s/it] 89%|████████▉ | 8901/10000 [32:25:26<3:56:19, 12.90s/it] {'loss': 0.0043, 'learning_rate': 5.575e-06, 'epoch': 3.35} 89%|████████▉ | 8901/10000 [32:25:26<3:56:19, 12.90s/it] 89%|████████▉ | 8902/10000 [32:25:39<3:55:46, 12.88s/it] {'loss': 0.0034, 'learning_rate': 5.57e-06, 'epoch': 3.35} 89%|████████▉ | 8902/10000 [32:25:39<3:55:46, 12.88s/it] 89%|████████▉ | 8903/10000 [32:25:52<3:55:04, 12.86s/it] {'loss': 0.0039, 'learning_rate': 5.565e-06, 'epoch': 3.35} 89%|████████▉ | 8903/10000 [32:25:52<3:55:04, 12.86s/it] 89%|████████▉ | 8904/10000 [32:26:05<3:55:06, 12.87s/it] {'loss': 0.0031, 'learning_rate': 5.56e-06, 'epoch': 3.35} 89%|████████▉ | 8904/10000 [32:26:05<3:55:06, 12.87s/it] 89%|████████▉ | 8905/10000 [32:26:18<3:55:06, 12.88s/it] {'loss': 0.0041, 'learning_rate': 5.555e-06, 'epoch': 3.36} 89%|████████▉ | 8905/10000 [32:26:18<3:55:06, 12.88s/it] 89%|████████▉ | 8906/10000 [32:26:31<3:54:53, 12.88s/it] {'loss': 0.0036, 'learning_rate': 5.55e-06, 'epoch': 3.36} 89%|████████▉ | 8906/10000 [32:26:31<3:54:53, 12.88s/it] 89%|████████▉ | 8907/10000 [32:26:44<3:54:38, 12.88s/it] {'loss': 0.0056, 'learning_rate': 5.545e-06, 'epoch': 3.36} 89%|████████▉ | 8907/10000 [32:26:44<3:54:38, 12.88s/it] 
89%|████████▉ | 8908/10000 [32:26:56<3:54:21, 12.88s/it] {'loss': 0.0042, 'learning_rate': 5.54e-06, 'epoch': 3.36} 89%|████████▉ | 8908/10000 [32:26:56<3:54:21, 12.88s/it] 89%|████████▉ | 8909/10000 [32:27:09<3:53:56, 12.87s/it] {'loss': 0.0034, 'learning_rate': 5.535e-06, 'epoch': 3.36} 89%|████████▉ | 8909/10000 [32:27:09<3:53:56, 12.87s/it] 89%|████████▉ | 8910/10000 [32:27:22<3:53:46, 12.87s/it] {'loss': 0.0036, 'learning_rate': 5.53e-06, 'epoch': 3.36} 89%|████████▉ | 8910/10000 [32:27:22<3:53:46, 12.87s/it] 89%|████████▉ | 8911/10000 [32:27:35<3:53:52, 12.89s/it] {'loss': 0.0032, 'learning_rate': 5.5250000000000005e-06, 'epoch': 3.36} 89%|████████▉ | 8911/10000 [32:27:35<3:53:52, 12.89s/it] 89%|████████▉ | 8912/10000 [32:27:48<3:53:25, 12.87s/it] {'loss': 0.0042, 'learning_rate': 5.5200000000000005e-06, 'epoch': 3.36} 89%|████████▉ | 8912/10000 [32:27:48<3:53:25, 12.87s/it] 89%|████████▉ | 8913/10000 [32:28:01<3:53:30, 12.89s/it] {'loss': 0.0035, 'learning_rate': 5.515e-06, 'epoch': 3.36} 89%|████████▉ | 8913/10000 [32:28:01<3:53:30, 12.89s/it] 89%|████████▉ | 8914/10000 [32:28:14<3:53:01, 12.87s/it] {'loss': 0.0057, 'learning_rate': 5.510000000000001e-06, 'epoch': 3.36} 89%|████████▉ | 8914/10000 [32:28:14<3:53:01, 12.87s/it] 89%|████████▉ | 8915/10000 [32:28:27<3:52:56, 12.88s/it] {'loss': 0.0042, 'learning_rate': 5.505000000000001e-06, 'epoch': 3.36} 89%|████████▉ | 8915/10000 [32:28:27<3:52:56, 12.88s/it] 89%|████████▉ | 8916/10000 [32:28:40<3:53:03, 12.90s/it] {'loss': 0.0046, 'learning_rate': 5.500000000000001e-06, 'epoch': 3.36} 89%|████████▉ | 8916/10000 [32:28:40<3:53:03, 12.90s/it] 89%|████████▉ | 8917/10000 [32:28:52<3:53:03, 12.91s/it] {'loss': 0.003, 'learning_rate': 5.495e-06, 'epoch': 3.36} 89%|████████▉ | 8917/10000 [32:28:53<3:53:03, 12.91s/it] 89%|████████▉ | 8918/10000 [32:29:05<3:52:44, 12.91s/it] {'loss': 0.0049, 'learning_rate': 5.49e-06, 'epoch': 3.36} 89%|████████▉ | 8918/10000 [32:29:05<3:52:44, 12.91s/it] 89%|████████▉ | 8919/10000 
[32:29:18<3:52:18, 12.89s/it] {'loss': 0.0036, 'learning_rate': 5.485000000000001e-06, 'epoch': 3.36} 89%|████████▉ | 8919/10000 [32:29:18<3:52:18, 12.89s/it] 89%|████████▉ | 8920/10000 [32:29:31<3:52:16, 12.90s/it] {'loss': 0.003, 'learning_rate': 5.48e-06, 'epoch': 3.36} 89%|████████▉ | 8920/10000 [32:29:31<3:52:16, 12.90s/it] 89%|████████▉ | 8921/10000 [32:29:44<3:52:14, 12.91s/it] {'loss': 0.0036, 'learning_rate': 5.475e-06, 'epoch': 3.36} 89%|████████▉ | 8921/10000 [32:29:44<3:52:14, 12.91s/it] 89%|████████▉ | 8922/10000 [32:29:57<3:51:52, 12.91s/it] {'loss': 0.0038, 'learning_rate': 5.47e-06, 'epoch': 3.36} 89%|████████▉ | 8922/10000 [32:29:57<3:51:52, 12.91s/it] 89%|████████▉ | 8923/10000 [32:30:10<3:51:31, 12.90s/it] {'loss': 0.004, 'learning_rate': 5.465e-06, 'epoch': 3.36} 89%|████████▉ | 8923/10000 [32:30:10<3:51:31, 12.90s/it] 89%|████████▉ | 8924/10000 [32:30:23<3:51:27, 12.91s/it] {'loss': 0.0041, 'learning_rate': 5.46e-06, 'epoch': 3.36} 89%|████████▉ | 8924/10000 [32:30:23<3:51:27, 12.91s/it] 89%|████████▉ | 8925/10000 [32:30:36<3:51:12, 12.90s/it] {'loss': 0.0032, 'learning_rate': 5.455e-06, 'epoch': 3.36} 89%|████████▉ | 8925/10000 [32:30:36<3:51:12, 12.90s/it] 89%|████████▉ | 8926/10000 [32:30:49<3:51:20, 12.92s/it] {'loss': 0.0045, 'learning_rate': 5.45e-06, 'epoch': 3.36} 89%|████████▉ | 8926/10000 [32:30:49<3:51:20, 12.92s/it] 89%|████████▉ | 8927/10000 [32:31:02<3:50:57, 12.91s/it] {'loss': 0.0044, 'learning_rate': 5.445e-06, 'epoch': 3.36} 89%|████████▉ | 8927/10000 [32:31:02<3:50:57, 12.91s/it] 89%|████████▉ | 8928/10000 [32:31:15<3:50:52, 12.92s/it] {'loss': 0.0035, 'learning_rate': 5.44e-06, 'epoch': 3.36} 89%|████████▉ | 8928/10000 [32:31:15<3:50:52, 12.92s/it] 89%|████████▉ | 8929/10000 [32:31:28<3:51:10, 12.95s/it] {'loss': 0.0037, 'learning_rate': 5.4350000000000005e-06, 'epoch': 3.36} 89%|████████▉ | 8929/10000 [32:31:28<3:51:10, 12.95s/it] 89%|████████▉ | 8930/10000 [32:31:40<3:50:57, 12.95s/it] {'loss': 0.0031, 'learning_rate': 
5.4300000000000005e-06, 'epoch': 3.36} 89%|████████▉ | 8930/10000 [32:31:41<3:50:57, 12.95s/it] 89%|████████▉ | 8931/10000 [32:31:53<3:51:01, 12.97s/it] {'loss': 0.0035, 'learning_rate': 5.4250000000000006e-06, 'epoch': 3.37} 89%|████████▉ | 8931/10000 [32:31:54<3:51:01, 12.97s/it] 89%|████████▉ | 8932/10000 [32:32:06<3:50:46, 12.96s/it] {'loss': 0.0036, 'learning_rate': 5.42e-06, 'epoch': 3.37} 89%|████████▉ | 8932/10000 [32:32:06<3:50:46, 12.96s/it] 89%|████████▉ | 8933/10000 [32:32:19<3:50:02, 12.94s/it] {'loss': 0.0047, 'learning_rate': 5.415e-06, 'epoch': 3.37} 89%|████████▉ | 8933/10000 [32:32:19<3:50:02, 12.94s/it] 89%|████████▉ | 8934/10000 [32:32:32<3:49:55, 12.94s/it] {'loss': 0.0029, 'learning_rate': 5.410000000000001e-06, 'epoch': 3.37} 89%|████████▉ | 8934/10000 [32:32:32<3:49:55, 12.94s/it] 89%|████████▉ | 8935/10000 [32:32:45<3:49:15, 12.92s/it] {'loss': 0.004, 'learning_rate': 5.405e-06, 'epoch': 3.37} 89%|████████▉ | 8935/10000 [32:32:45<3:49:15, 12.92s/it] 89%|████████▉ | 8936/10000 [32:32:58<3:49:16, 12.93s/it] {'loss': 0.0028, 'learning_rate': 5.4e-06, 'epoch': 3.37} 89%|████████▉ | 8936/10000 [32:32:58<3:49:16, 12.93s/it] 89%|████████▉ | 8937/10000 [32:33:11<3:49:07, 12.93s/it] {'loss': 0.004, 'learning_rate': 5.395e-06, 'epoch': 3.37} 89%|████████▉ | 8937/10000 [32:33:11<3:49:07, 12.93s/it] 89%|████████▉ | 8938/10000 [32:33:24<3:48:46, 12.93s/it] {'loss': 0.0033, 'learning_rate': 5.390000000000001e-06, 'epoch': 3.37} 89%|████████▉ | 8938/10000 [32:33:24<3:48:46, 12.93s/it] 89%|████████▉ | 8939/10000 [32:33:37<3:48:28, 12.92s/it] {'loss': 0.0035, 'learning_rate': 5.385e-06, 'epoch': 3.37} 89%|████████▉ | 8939/10000 [32:33:37<3:48:28, 12.92s/it] 89%|████████▉ | 8940/10000 [32:33:50<3:48:10, 12.92s/it] {'loss': 0.0041, 'learning_rate': 5.38e-06, 'epoch': 3.37} 89%|████████▉ | 8940/10000 [32:33:50<3:48:10, 12.92s/it] 89%|████████▉ | 8941/10000 [32:34:03<3:47:52, 12.91s/it] {'loss': 0.0044, 'learning_rate': 5.375e-06, 'epoch': 3.37} 89%|████████▉ | 
8941/10000 [32:34:03<3:47:52, 12.91s/it] 89%|████████▉ | 8942/10000 [32:34:16<3:47:42, 12.91s/it] {'loss': 0.0038, 'learning_rate': 5.37e-06, 'epoch': 3.37} 89%|████████▉ | 8942/10000 [32:34:16<3:47:42, 12.91s/it] 89%|████████▉ | 8943/10000 [32:34:28<3:47:12, 12.90s/it] {'loss': 0.0041, 'learning_rate': 5.365e-06, 'epoch': 3.37} 89%|████████▉ | 8943/10000 [32:34:28<3:47:12, 12.90s/it] 89%|████████▉ | 8944/10000 [32:34:41<3:47:10, 12.91s/it] {'loss': 0.0035, 'learning_rate': 5.36e-06, 'epoch': 3.37} 89%|████████▉ | 8944/10000 [32:34:41<3:47:10, 12.91s/it] 89%|████████▉ | 8945/10000 [32:34:54<3:47:02, 12.91s/it] {'loss': 0.0037, 'learning_rate': 5.355e-06, 'epoch': 3.37} 89%|████████▉ | 8945/10000 [32:34:54<3:47:02, 12.91s/it] 89%|████████▉ | 8946/10000 [32:35:07<3:46:47, 12.91s/it] {'loss': 0.0036, 'learning_rate': 5.3500000000000004e-06, 'epoch': 3.37} 89%|████████▉ | 8946/10000 [32:35:07<3:46:47, 12.91s/it] 89%|████████▉ | 8947/10000 [32:35:20<3:46:35, 12.91s/it] {'loss': 0.004, 'learning_rate': 5.345e-06, 'epoch': 3.37} 89%|████████▉ | 8947/10000 [32:35:20<3:46:35, 12.91s/it] 89%|████████▉ | 8948/10000 [32:35:33<3:45:56, 12.89s/it] {'loss': 0.0033, 'learning_rate': 5.3400000000000005e-06, 'epoch': 3.37} 89%|████████▉ | 8948/10000 [32:35:33<3:45:56, 12.89s/it] 89%|████████▉ | 8949/10000 [32:35:46<3:45:51, 12.89s/it] {'loss': 0.0028, 'learning_rate': 5.335000000000001e-06, 'epoch': 3.37} 89%|████████▉ | 8949/10000 [32:35:46<3:45:51, 12.89s/it] 90%|████████▉ | 8950/10000 [32:35:59<3:45:14, 12.87s/it] {'loss': 0.0041, 'learning_rate': 5.330000000000001e-06, 'epoch': 3.37} 90%|████████▉ | 8950/10000 [32:35:59<3:45:14, 12.87s/it] 90%|████████▉ | 8951/10000 [32:36:12<3:45:10, 12.88s/it] {'loss': 0.003, 'learning_rate': 5.325e-06, 'epoch': 3.37} 90%|████████▉ | 8951/10000 [32:36:12<3:45:10, 12.88s/it] 90%|████████▉ | 8952/10000 [32:36:24<3:44:54, 12.88s/it] {'loss': 0.0037, 'learning_rate': 5.32e-06, 'epoch': 3.37} 90%|████████▉ | 8952/10000 [32:36:24<3:44:54, 12.88s/it] 
90%|████████▉ | 8953/10000 [32:36:37<3:44:57, 12.89s/it] {'loss': 0.0047, 'learning_rate': 5.315000000000001e-06, 'epoch': 3.37} 90%|████████▉ | 8953/10000 [32:36:37<3:44:57, 12.89s/it] 90%|████████▉ | 8954/10000 [32:36:50<3:44:56, 12.90s/it] {'loss': 0.0044, 'learning_rate': 5.31e-06, 'epoch': 3.37} 90%|████████▉ | 8954/10000 [32:36:50<3:44:56, 12.90s/it] 90%|████████▉ | 8955/10000 [32:37:03<3:45:18, 12.94s/it] {'loss': 0.0028, 'learning_rate': 5.305e-06, 'epoch': 3.37} 90%|████████▉ | 8955/10000 [32:37:03<3:45:18, 12.94s/it] 90%|████████▉ | 8956/10000 [32:37:16<3:45:10, 12.94s/it] {'loss': 0.003, 'learning_rate': 5.3e-06, 'epoch': 3.37} 90%|████████▉ | 8956/10000 [32:37:16<3:45:10, 12.94s/it] 90%|████████▉ | 8957/10000 [32:37:29<3:45:00, 12.94s/it] {'loss': 0.0034, 'learning_rate': 5.295e-06, 'epoch': 3.37} 90%|████████▉ | 8957/10000 [32:37:29<3:45:00, 12.94s/it] 90%|████████▉ | 8958/10000 [32:37:42<3:44:47, 12.94s/it] {'loss': 0.0041, 'learning_rate': 5.29e-06, 'epoch': 3.38} 90%|████████▉ | 8958/10000 [32:37:42<3:44:47, 12.94s/it] 90%|████████▉ | 8959/10000 [32:37:55<3:44:44, 12.95s/it] {'loss': 0.0037, 'learning_rate': 5.285e-06, 'epoch': 3.38} 90%|████████▉ | 8959/10000 [32:37:55<3:44:44, 12.95s/it] 90%|████████▉ | 8960/10000 [32:38:08<3:43:50, 12.91s/it] {'loss': 0.0041, 'learning_rate': 5.28e-06, 'epoch': 3.38} 90%|████████▉ | 8960/10000 [32:38:08<3:43:50, 12.91s/it] 90%|████████▉ | 8961/10000 [32:38:21<3:43:41, 12.92s/it] {'loss': 0.004, 'learning_rate': 5.275e-06, 'epoch': 3.38} 90%|████████▉ | 8961/10000 [32:38:21<3:43:41, 12.92s/it] 90%|████████▉ | 8962/10000 [32:38:34<3:43:19, 12.91s/it] {'loss': 0.0058, 'learning_rate': 5.2699999999999995e-06, 'epoch': 3.38} 90%|████████▉ | 8962/10000 [32:38:34<3:43:19, 12.91s/it] 90%|████████▉ | 8963/10000 [32:38:47<3:43:23, 12.93s/it] {'loss': 0.0035, 'learning_rate': 5.265e-06, 'epoch': 3.38} 90%|████████▉ | 8963/10000 [32:38:47<3:43:23, 12.93s/it] 90%|████████▉ | 8964/10000 [32:39:00<3:43:13, 12.93s/it] {'loss': 
0.0039, 'learning_rate': 5.2600000000000005e-06, 'epoch': 3.38} 90%|████████▉ | 8964/10000 [32:39:00<3:43:13, 12.93s/it] 90%|████████▉ | 8965/10000 [32:39:13<3:42:45, 12.91s/it] {'loss': 0.0041, 'learning_rate': 5.2550000000000005e-06, 'epoch': 3.38} 90%|████████▉ | 8965/10000 [32:39:13<3:42:45, 12.91s/it] 90%|████████▉ | 8966/10000 [32:39:26<3:43:05, 12.95s/it] {'loss': 0.0035, 'learning_rate': 5.25e-06, 'epoch': 3.38} 90%|████████▉ | 8966/10000 [32:39:26<3:43:05, 12.95s/it] 90%|████████▉ | 8967/10000 [32:39:38<3:42:37, 12.93s/it] {'loss': 0.004, 'learning_rate': 5.245e-06, 'epoch': 3.38} 90%|████████▉ | 8967/10000 [32:39:38<3:42:37, 12.93s/it] 90%|████████▉ | 8968/10000 [32:39:51<3:42:21, 12.93s/it] {'loss': 0.0039, 'learning_rate': 5.240000000000001e-06, 'epoch': 3.38} 90%|████████▉ | 8968/10000 [32:39:51<3:42:21, 12.93s/it] 90%|████████▉ | 8969/10000 [32:40:04<3:41:45, 12.91s/it] {'loss': 0.0038, 'learning_rate': 5.235000000000001e-06, 'epoch': 3.38} 90%|████████▉ | 8969/10000 [32:40:04<3:41:45, 12.91s/it] 90%|████████▉ | 8970/10000 [32:40:17<3:41:35, 12.91s/it] {'loss': 0.0028, 'learning_rate': 5.23e-06, 'epoch': 3.38} 90%|████████▉ | 8970/10000 [32:40:17<3:41:35, 12.91s/it] 90%|████████▉ | 8971/10000 [32:40:30<3:41:35, 12.92s/it] {'loss': 0.0034, 'learning_rate': 5.225e-06, 'epoch': 3.38} 90%|████████▉ | 8971/10000 [32:40:30<3:41:35, 12.92s/it] 90%|████████▉ | 8972/10000 [32:40:43<3:41:22, 12.92s/it] {'loss': 0.0062, 'learning_rate': 5.220000000000001e-06, 'epoch': 3.38} 90%|████████▉ | 8972/10000 [32:40:43<3:41:22, 12.92s/it] 90%|████████▉ | 8973/10000 [32:40:56<3:40:58, 12.91s/it] {'loss': 0.0039, 'learning_rate': 5.215e-06, 'epoch': 3.38} 90%|████████▉ | 8973/10000 [32:40:56<3:40:58, 12.91s/it] 90%|████████▉ | 8974/10000 [32:41:09<3:40:41, 12.91s/it] {'loss': 0.0034, 'learning_rate': 5.21e-06, 'epoch': 3.38} 90%|████████▉ | 8974/10000 [32:41:09<3:40:41, 12.91s/it] 90%|████████▉ | 8975/10000 [32:41:22<3:40:07, 12.89s/it] {'loss': 0.0038, 'learning_rate': 
5.205e-06, 'epoch': 3.38} 90%|████████▉ | 8975/10000 [32:41:22<3:40:07, 12.89s/it] 90%|████████▉ | 8976/10000 [32:41:35<3:40:21, 12.91s/it] {'loss': 0.0035, 'learning_rate': 5.2e-06, 'epoch': 3.38} 90%|████████▉ | 8976/10000 [32:41:35<3:40:21, 12.91s/it] 90%|████████▉ | 8977/10000 [32:41:48<3:40:31, 12.93s/it] {'loss': 0.0033, 'learning_rate': 5.195e-06, 'epoch': 3.38} 90%|████████▉ | 8977/10000 [32:41:48<3:40:31, 12.93s/it] 90%|████████▉ | 8978/10000 [32:42:00<3:40:08, 12.92s/it] {'loss': 0.0047, 'learning_rate': 5.19e-06, 'epoch': 3.38} 90%|████████▉ | 8978/10000 [32:42:01<3:40:08, 12.92s/it] 90%|████████▉ | 8979/10000 [32:42:14<3:40:21, 12.95s/it] {'loss': 0.0041, 'learning_rate': 5.185e-06, 'epoch': 3.38} 90%|████████▉ | 8979/10000 [32:42:14<3:40:21, 12.95s/it] 90%|████████▉ | 8980/10000 [32:42:26<3:40:00, 12.94s/it] {'loss': 0.0035, 'learning_rate': 5.18e-06, 'epoch': 3.38} 90%|████████▉ | 8980/10000 [32:42:26<3:40:00, 12.94s/it] 90%|████████▉ | 8981/10000 [32:42:39<3:40:01, 12.95s/it] {'loss': 0.0031, 'learning_rate': 5.175e-06, 'epoch': 3.38} 90%|████████▉ | 8981/10000 [32:42:39<3:40:01, 12.95s/it] 90%|████████▉ | 8982/10000 [32:42:52<3:39:16, 12.92s/it] {'loss': 0.0025, 'learning_rate': 5.1700000000000005e-06, 'epoch': 3.38} 90%|████████▉ | 8982/10000 [32:42:52<3:39:16, 12.92s/it] 90%|████████▉ | 8983/10000 [32:43:05<3:39:01, 12.92s/it] {'loss': 0.0042, 'learning_rate': 5.1650000000000005e-06, 'epoch': 3.38} 90%|████████▉ | 8983/10000 [32:43:05<3:39:01, 12.92s/it] 90%|████████▉ | 8984/10000 [32:43:18<3:38:52, 12.93s/it] {'loss': 0.0046, 'learning_rate': 5.1600000000000006e-06, 'epoch': 3.39} 90%|████████▉ | 8984/10000 [32:43:18<3:38:52, 12.93s/it] 90%|████████▉ | 8985/10000 [32:43:31<3:38:28, 12.91s/it] {'loss': 0.0034, 'learning_rate': 5.155e-06, 'epoch': 3.39} 90%|████████▉ | 8985/10000 [32:43:31<3:38:28, 12.91s/it] 90%|████████▉ | 8986/10000 [32:43:44<3:38:40, 12.94s/it] {'loss': 0.004, 'learning_rate': 5.15e-06, 'epoch': 3.39} 90%|████████▉ | 8986/10000 
[32:43:44<3:38:40, 12.94s/it] 90%|████████▉ | 8987/10000 [32:43:57<3:38:25, 12.94s/it] {'loss': 0.0035, 'learning_rate': 5.145000000000001e-06, 'epoch': 3.39} 90%|████████▉ | 8987/10000 [32:43:57<3:38:25, 12.94s/it] 90%|████████▉ | 8988/10000 [32:44:10<3:38:05, 12.93s/it] {'loss': 0.0041, 'learning_rate': 5.140000000000001e-06, 'epoch': 3.39} 90%|████████▉ | 8988/10000 [32:44:10<3:38:05, 12.93s/it] 90%|████████▉ | 8989/10000 [32:44:23<3:37:42, 12.92s/it] {'loss': 0.0038, 'learning_rate': 5.135e-06, 'epoch': 3.39} 90%|████████▉ | 8989/10000 [32:44:23<3:37:42, 12.92s/it] 90%|████████▉ | 8990/10000 [32:44:36<3:37:18, 12.91s/it] {'loss': 0.0037, 'learning_rate': 5.13e-06, 'epoch': 3.39} 90%|████████▉ | 8990/10000 [32:44:36<3:37:18, 12.91s/it] 90%|████████▉ | 8991/10000 [32:44:48<3:36:53, 12.90s/it] {'loss': 0.0035, 'learning_rate': 5.125e-06, 'epoch': 3.39} 90%|████████▉ | 8991/10000 [32:44:49<3:36:53, 12.90s/it] 90%|████████▉ | 8992/10000 [32:45:01<3:36:40, 12.90s/it] {'loss': 0.0024, 'learning_rate': 5.12e-06, 'epoch': 3.39} 90%|████████▉ | 8992/10000 [32:45:01<3:36:40, 12.90s/it] 90%|████████▉ | 8993/10000 [32:45:14<3:36:37, 12.91s/it] {'loss': 0.004, 'learning_rate': 5.115e-06, 'epoch': 3.39} 90%|████████▉ | 8993/10000 [32:45:14<3:36:37, 12.91s/it] 90%|████████▉ | 8994/10000 [32:45:27<3:35:59, 12.88s/it] {'loss': 0.0044, 'learning_rate': 5.11e-06, 'epoch': 3.39} 90%|████████▉ | 8994/10000 [32:45:27<3:35:59, 12.88s/it] 90%|████████▉ | 8995/10000 [32:45:40<3:35:48, 12.88s/it] {'loss': 0.0052, 'learning_rate': 5.105e-06, 'epoch': 3.39} 90%|████████▉ | 8995/10000 [32:45:40<3:35:48, 12.88s/it] 90%|████████▉ | 8996/10000 [32:45:53<3:35:43, 12.89s/it] {'loss': 0.0045, 'learning_rate': 5.1e-06, 'epoch': 3.39} 90%|████████▉ | 8996/10000 [32:45:53<3:35:43, 12.89s/it] 90%|████████▉ | 8997/10000 [32:46:06<3:35:36, 12.90s/it] {'loss': 0.0038, 'learning_rate': 5.095e-06, 'epoch': 3.39} 90%|████████▉ | 8997/10000 [32:46:06<3:35:36, 12.90s/it] 90%|████████▉ | 8998/10000 
[32:46:19<3:35:16, 12.89s/it] {'loss': 0.0037, 'learning_rate': 5.09e-06, 'epoch': 3.39} 90%|████████▉ | 8998/10000 [32:46:19<3:35:16, 12.89s/it] 90%|████████▉ | 8999/10000 [32:46:32<3:35:19, 12.91s/it] {'loss': 0.0039, 'learning_rate': 5.0850000000000004e-06, 'epoch': 3.39} 90%|████████▉ | 8999/10000 [32:46:32<3:35:19, 12.91s/it] 90%|█████████ | 9000/10000 [32:46:45<3:35:08, 12.91s/it] {'loss': 0.0026, 'learning_rate': 5.08e-06, 'epoch': 3.39} 90%|█████████ | 9000/10000 [32:46:45<3:35:08, 12.91s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-07 05:11:43,169 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/config.json [INFO|configuration_utils.py:364] 2024-11-07 05:11:43,171 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-07 05:12:28,518 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-07 05:12:28,521 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-07 05:12:28,522 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/special_tokens_map.json [2024-11-07 05:12:28,531] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step9000 is about to be saved! [2024-11-07 05:12:28,569] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/global_step9000/mp_rank_00_model_states.pt [2024-11-07 05:12:28,569] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/global_step9000/mp_rank_00_model_states.pt... 
[2024-11-07 05:13:18,631] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/global_step9000/mp_rank_00_model_states.pt. [2024-11-07 05:13:18,714] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-07 05:15:04,003] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-07 05:15:04,101] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-07 05:15:04,101] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step9000 is ready now! 90%|█████████ | 9001/10000 [32:50:23<20:41:04, 74.54s/it] {'loss': 0.0039, 'learning_rate': 5.0750000000000005e-06, 'epoch': 3.39} 90%|█████████ | 9001/10000 [32:50:23<20:41:04, 74.54s/it] 90%|█████████ | 9002/10000 [32:50:36<15:31:30, 56.00s/it] {'loss': 0.0031, 'learning_rate': 5.070000000000001e-06, 'epoch': 3.39} 90%|█████████ | 9002/10000 [32:50:36<15:31:30, 56.00s/it] 90%|█████████ | 9003/10000 [32:50:48<11:55:00, 43.03s/it] {'loss': 0.0032, 'learning_rate': 5.065000000000001e-06, 'epoch': 3.39} 90%|█████████ | 9003/10000 [32:50:48<11:55:00, 43.03s/it] 90%|█████████ | 9004/10000 [32:51:01<9:23:17, 33.93s/it] {'loss': 0.0046, 'learning_rate': 5.06e-06, 'epoch': 3.39} 90%|█████████ | 9004/10000 [32:51:01<9:23:17, 33.93s/it] 90%|█████████ | 9005/10000 [32:51:14<7:37:33, 27.59s/it] {'loss': 0.0043, 'learning_rate': 5.055e-06, 'epoch': 3.39} 90%|█████████ | 9005/10000 [32:51:14<7:37:33, 27.59s/it] 90%|█████████ | 9006/10000 [32:51:27<6:23:18, 23.14s/it] {'loss': 0.0041, 'learning_rate': 5.050000000000001e-06, 'epoch': 3.39} 90%|█████████ | 
9006/10000 [32:51:27<6:23:18, 23.14s/it] 90%|█████████ | 9007/10000 [32:51:39<5:31:33, 20.03s/it] {'loss': 0.0036, 'learning_rate': 5.045000000000001e-06, 'epoch': 3.39} 90%|█████████ | 9007/10000 [32:51:39<5:31:33, 20.03s/it] 90%|█████████ | 9008/10000 [32:51:52<4:55:22, 17.87s/it] {'loss': 0.0037, 'learning_rate': 5.04e-06, 'epoch': 3.39} 90%|█████████ | 9008/10000 [32:51:52<4:55:22, 17.87s/it] 90%|█████████ | 9009/10000 [32:52:05<4:29:58, 16.35s/it] {'loss': 0.0051, 'learning_rate': 5.035e-06, 'epoch': 3.39} 90%|█████████ | 9009/10000 [32:52:05<4:29:58, 16.35s/it] 90%|█████████ | 9010/10000 [32:52:18<4:12:25, 15.30s/it] {'loss': 0.0041, 'learning_rate': 5.03e-06, 'epoch': 3.39} 90%|█████████ | 9010/10000 [32:52:18<4:12:25, 15.30s/it] 90%|█████████ | 9011/10000 [32:52:31<3:59:54, 14.55s/it] {'loss': 0.003, 'learning_rate': 5.025e-06, 'epoch': 3.4} 90%|█████████ | 9011/10000 [32:52:31<3:59:54, 14.55s/it] 90%|█████████ | 9012/10000 [32:52:44<3:51:31, 14.06s/it] {'loss': 0.0037, 'learning_rate': 5.02e-06, 'epoch': 3.4} 90%|█████████ | 9012/10000 [32:52:44<3:51:31, 14.06s/it] 90%|█████████ | 9013/10000 [32:52:57<3:45:29, 13.71s/it] {'loss': 0.0036, 'learning_rate': 5.015e-06, 'epoch': 3.4} 90%|█████████ | 9013/10000 [32:52:57<3:45:29, 13.71s/it] 90%|█████████ | 9014/10000 [32:53:10<3:41:32, 13.48s/it] {'loss': 0.003, 'learning_rate': 5.01e-06, 'epoch': 3.4} 90%|█████████ | 9014/10000 [32:53:10<3:41:32, 13.48s/it] 90%|█████████ | 9015/10000 [32:53:22<3:38:05, 13.28s/it] {'loss': 0.003, 'learning_rate': 5.005e-06, 'epoch': 3.4} 90%|█████████ | 9015/10000 [32:53:22<3:38:05, 13.28s/it] 90%|█████████ | 9016/10000 [32:53:35<3:35:56, 13.17s/it] {'loss': 0.0043, 'learning_rate': 5e-06, 'epoch': 3.4} 90%|█████████ | 9016/10000 [32:53:35<3:35:56, 13.17s/it] 90%|█████████ | 9017/10000 [32:53:48<3:34:13, 13.08s/it] {'loss': 0.0035, 'learning_rate': 4.9950000000000005e-06, 'epoch': 3.4} 90%|█████████ | 9017/10000 [32:53:48<3:34:13, 13.08s/it] 90%|█████████ | 9018/10000 
[32:54:01<3:32:36, 12.99s/it] {'loss': 0.0042, 'learning_rate': 4.9900000000000005e-06, 'epoch': 3.4} 90%|█████████ | 9018/10000 [32:54:01<3:32:36, 12.99s/it] 90%|█████████ | 9019/10000 [32:54:14<3:31:53, 12.96s/it] {'loss': 0.0033, 'learning_rate': 4.985e-06, 'epoch': 3.4} 90%|█████████ | 9019/10000 [32:54:14<3:31:53, 12.96s/it] 90%|█████████ | 9020/10000 [32:54:27<3:31:41, 12.96s/it] {'loss': 0.0028, 'learning_rate': 4.98e-06, 'epoch': 3.4} 90%|█████████ | 9020/10000 [32:54:27<3:31:41, 12.96s/it] 90%|█████████ | 9021/10000 [32:54:40<3:31:15, 12.95s/it] {'loss': 0.0028, 'learning_rate': 4.975000000000001e-06, 'epoch': 3.4} 90%|█████████ | 9021/10000 [32:54:40<3:31:15, 12.95s/it] 90%|█████████ | 9022/10000 [32:54:53<3:31:25, 12.97s/it] {'loss': 0.0036, 'learning_rate': 4.970000000000001e-06, 'epoch': 3.4} 90%|█████████ | 9022/10000 [32:54:53<3:31:25, 12.97s/it] 90%|█████████ | 9023/10000 [32:55:06<3:31:06, 12.96s/it] {'loss': 0.0038, 'learning_rate': 4.965e-06, 'epoch': 3.4} 90%|█████████ | 9023/10000 [32:55:06<3:31:06, 12.96s/it] 90%|█████████ | 9024/10000 [32:55:19<3:30:56, 12.97s/it] {'loss': 0.0042, 'learning_rate': 4.96e-06, 'epoch': 3.4} 90%|█████████ | 9024/10000 [32:55:19<3:30:56, 12.97s/it] 90%|█████████ | 9025/10000 [32:55:31<3:30:18, 12.94s/it] {'loss': 0.0039, 'learning_rate': 4.955e-06, 'epoch': 3.4} 90%|█████████ | 9025/10000 [32:55:31<3:30:18, 12.94s/it] 90%|█████████ | 9026/10000 [32:55:44<3:29:45, 12.92s/it] {'loss': 0.0031, 'learning_rate': 4.950000000000001e-06, 'epoch': 3.4} 90%|█████████ | 9026/10000 [32:55:44<3:29:45, 12.92s/it] 90%|█████████ | 9027/10000 [32:55:57<3:30:17, 12.97s/it] {'loss': 0.0049, 'learning_rate': 4.945e-06, 'epoch': 3.4} 90%|█████████ | 9027/10000 [32:55:57<3:30:17, 12.97s/it] 90%|█████████ | 9028/10000 [32:56:10<3:30:01, 12.96s/it] {'loss': 0.0035, 'learning_rate': 4.94e-06, 'epoch': 3.4} 90%|█████████ | 9028/10000 [32:56:10<3:30:01, 12.96s/it] 90%|█████████ | 9029/10000 [32:56:23<3:29:29, 12.95s/it] {'loss': 0.0036, 
'learning_rate': 4.935e-06, 'epoch': 3.4} 90%|█████████ | 9029/10000 [32:56:23<3:29:29, 12.95s/it] 90%|█████████ | 9030/10000 [32:56:36<3:28:58, 12.93s/it] {'loss': 0.0043, 'learning_rate': 4.93e-06, 'epoch': 3.4} 90%|█████████ | 9030/10000 [32:56:36<3:28:58, 12.93s/it] 90%|█████████ | 9031/10000 [32:56:49<3:28:34, 12.91s/it] {'loss': 0.0035, 'learning_rate': 4.925e-06, 'epoch': 3.4} 90%|█████████ | 9031/10000 [32:56:49<3:28:34, 12.91s/it] 90%|█████████ | 9032/10000 [32:57:02<3:28:36, 12.93s/it] {'loss': 0.003, 'learning_rate': 4.92e-06, 'epoch': 3.4} 90%|█████████ | 9032/10000 [32:57:02<3:28:36, 12.93s/it] 90%|█████████ | 9033/10000 [32:57:15<3:27:57, 12.90s/it] {'loss': 0.004, 'learning_rate': 4.915e-06, 'epoch': 3.4} 90%|█████████ | 9033/10000 [32:57:15<3:27:57, 12.90s/it][2024-11-07 05:22:24,953] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 90%|█████████ | 9034/10000 [32:57:26<3:21:12, 12.50s/it] {'loss': 0.0045, 'learning_rate': 4.915e-06, 'epoch': 3.4} 90%|█████████ | 9034/10000 [32:57:26<3:21:12, 12.50s/it][2024-11-07 05:22:36,569] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 90%|█████████ | 9035/10000 [32:57:38<3:16:44, 12.23s/it] {'loss': 0.0031, 'learning_rate': 4.915e-06, 'epoch': 3.4} 90%|█████████ | 9035/10000 [32:57:38<3:16:44, 12.23s/it] 90%|█████████ | 9036/10000 [32:57:51<3:19:36, 12.42s/it] {'loss': 0.0042, 'learning_rate': 4.9100000000000004e-06, 'epoch': 3.4} 90%|█████████ | 9036/10000 [32:57:51<3:19:36, 12.42s/it] 90%|█████████ | 9037/10000 [32:58:04<3:22:06, 12.59s/it] {'loss': 0.0032, 'learning_rate': 4.9050000000000005e-06, 'epoch': 3.41} 90%|█████████ | 9037/10000 [32:58:04<3:22:06, 12.59s/it] 90%|█████████ | 9038/10000 [32:58:17<3:23:10, 12.67s/it] {'loss': 0.0046, 'learning_rate': 4.9000000000000005e-06, 'epoch': 3.41} 90%|█████████ | 9038/10000 [32:58:17<3:23:10, 12.67s/it] 90%|█████████ | 9039/10000 [32:58:30<3:23:44, 12.72s/it] {'loss': 0.0029, 'learning_rate': 4.8950000000000006e-06, 'epoch': 3.41} 90%|█████████ | 9039/10000 [32:58:30<3:23:44, 12.72s/it] 90%|█████████ | 9040/10000 [32:58:43<3:24:58, 12.81s/it] {'loss': 0.0032, 'learning_rate': 4.89e-06, 'epoch': 3.41} 90%|█████████ | 9040/10000 [32:58:43<3:24:58, 12.81s/it] 90%|█████████ | 9041/10000 [32:58:55<3:25:08, 12.83s/it] {'loss': 0.0028, 'learning_rate': 4.885e-06, 'epoch': 3.41} 90%|█████████ | 9041/10000 [32:58:56<3:25:08, 12.83s/it] 90%|█████████ | 9042/10000 [32:59:08<3:25:12, 12.85s/it] {'loss': 0.0036, 'learning_rate': 4.880000000000001e-06, 'epoch': 3.41} 90%|█████████ | 9042/10000 [32:59:08<3:25:12, 12.85s/it] 90%|█████████ | 9043/10000 [32:59:21<3:25:11, 12.86s/it] {'loss': 0.004, 'learning_rate': 4.875000000000001e-06, 'epoch': 3.41} 90%|█████████ | 9043/10000 [32:59:21<3:25:11, 12.86s/it] 90%|█████████ | 9044/10000 [32:59:34<3:25:22, 12.89s/it] {'loss': 0.0031, 'learning_rate': 4.87e-06, 'epoch': 3.41} 90%|█████████ | 9044/10000 [32:59:34<3:25:22, 12.89s/it] 90%|█████████ | 9045/10000 [32:59:47<3:25:13, 12.89s/it] {'loss': 0.003, 'learning_rate': 4.865e-06, 'epoch': 3.41} 90%|█████████ | 
9045/10000 [32:59:47<3:25:13, 12.89s/it] 90%|█████████ | 9046/10000 [33:00:00<3:25:35, 12.93s/it] {'loss': 0.004, 'learning_rate': 4.86e-06, 'epoch': 3.41} 90%|█████████ | 9046/10000 [33:00:00<3:25:35, 12.93s/it] 90%|█████████ | 9047/10000 [33:00:13<3:25:46, 12.96s/it] {'loss': 0.0028, 'learning_rate': 4.855e-06, 'epoch': 3.41} 90%|█████████ | 9047/10000 [33:00:13<3:25:46, 12.96s/it] 90%|█████████ | 9048/10000 [33:00:26<3:25:04, 12.93s/it] {'loss': 0.0034, 'learning_rate': 4.85e-06, 'epoch': 3.41} 90%|█████████ | 9048/10000 [33:00:26<3:25:04, 12.93s/it] 90%|█████████ | 9049/10000 [33:00:39<3:24:45, 12.92s/it] {'loss': 0.0039, 'learning_rate': 4.845e-06, 'epoch': 3.41} 90%|█████████ | 9049/10000 [33:00:39<3:24:45, 12.92s/it] 90%|█████████ | 9050/10000 [33:00:52<3:24:11, 12.90s/it] {'loss': 0.0039, 'learning_rate': 4.84e-06, 'epoch': 3.41} 90%|█████████ | 9050/10000 [33:00:52<3:24:11, 12.90s/it] 91%|█████████ | 9051/10000 [33:01:05<3:24:10, 12.91s/it] {'loss': 0.0032, 'learning_rate': 4.835e-06, 'epoch': 3.41} 91%|█████████ | 9051/10000 [33:01:05<3:24:10, 12.91s/it] 91%|█████████ | 9052/10000 [33:01:18<3:23:55, 12.91s/it] {'loss': 0.0022, 'learning_rate': 4.83e-06, 'epoch': 3.41} 91%|█████████ | 9052/10000 [33:01:18<3:23:55, 12.91s/it] 91%|█████████ | 9053/10000 [33:01:30<3:23:32, 12.90s/it] {'loss': 0.004, 'learning_rate': 4.825e-06, 'epoch': 3.41} 91%|█████████ | 9053/10000 [33:01:30<3:23:32, 12.90s/it] 91%|█████████ | 9054/10000 [33:01:43<3:23:28, 12.91s/it] {'loss': 0.0042, 'learning_rate': 4.8200000000000004e-06, 'epoch': 3.41} 91%|█████████ | 9054/10000 [33:01:43<3:23:28, 12.91s/it] 91%|█████████ | 9055/10000 [33:01:56<3:23:07, 12.90s/it] {'loss': 0.0046, 'learning_rate': 4.8150000000000005e-06, 'epoch': 3.41} 91%|█████████ | 9055/10000 [33:01:56<3:23:07, 12.90s/it] 91%|█████████ | 9056/10000 [33:02:09<3:22:45, 12.89s/it] {'loss': 0.0044, 'learning_rate': 4.81e-06, 'epoch': 3.41} 91%|█████████ | 9056/10000 [33:02:09<3:22:45, 12.89s/it] 91%|█████████ | 
9057/10000 [33:02:22<3:22:42, 12.90s/it] {'loss': 0.0039, 'learning_rate': 4.805000000000001e-06, 'epoch': 3.41} 91%|█████████ | 9057/10000 [33:02:22<3:22:42, 12.90s/it] 91%|█████████ | 9058/10000 [33:02:35<3:22:55, 12.93s/it] {'loss': 0.0044, 'learning_rate': 4.800000000000001e-06, 'epoch': 3.41} 91%|█████████ | 9058/10000 [33:02:35<3:22:55, 12.93s/it] 91%|█████████ | 9059/10000 [33:02:48<3:22:47, 12.93s/it] {'loss': 0.0038, 'learning_rate': 4.795e-06, 'epoch': 3.41} 91%|█████████ | 9059/10000 [33:02:48<3:22:47, 12.93s/it] 91%|█████████ | 9060/10000 [33:03:01<3:22:31, 12.93s/it] {'loss': 0.0027, 'learning_rate': 4.79e-06, 'epoch': 3.41} 91%|█████████ | 9060/10000 [33:03:01<3:22:31, 12.93s/it] 91%|█████████ | 9061/10000 [33:03:14<3:21:48, 12.90s/it] {'loss': 0.0055, 'learning_rate': 4.785e-06, 'epoch': 3.41} 91%|█████████ | 9061/10000 [33:03:14<3:21:48, 12.90s/it] 91%|█████████ | 9062/10000 [33:03:27<3:21:52, 12.91s/it] {'loss': 0.0043, 'learning_rate': 4.780000000000001e-06, 'epoch': 3.41} 91%|█████████ | 9062/10000 [33:03:27<3:21:52, 12.91s/it] 91%|█████████ | 9063/10000 [33:03:40<3:21:22, 12.90s/it] {'loss': 0.0041, 'learning_rate': 4.775e-06, 'epoch': 3.41} 91%|█████████ | 9063/10000 [33:03:40<3:21:22, 12.90s/it] 91%|█████████ | 9064/10000 [33:03:52<3:21:16, 12.90s/it] {'loss': 0.0028, 'learning_rate': 4.77e-06, 'epoch': 3.42} 91%|█████████ | 9064/10000 [33:03:52<3:21:16, 12.90s/it] 91%|█████████ | 9065/10000 [33:04:05<3:20:58, 12.90s/it] {'loss': 0.0024, 'learning_rate': 4.765e-06, 'epoch': 3.42} 91%|█████████ | 9065/10000 [33:04:05<3:20:58, 12.90s/it] 91%|█████████ | 9066/10000 [33:04:18<3:20:26, 12.88s/it] {'loss': 0.0037, 'learning_rate': 4.76e-06, 'epoch': 3.42} 91%|█████████ | 9066/10000 [33:04:18<3:20:26, 12.88s/it] 91%|█████████ | 9067/10000 [33:04:31<3:20:21, 12.89s/it] {'loss': 0.0038, 'learning_rate': 4.755e-06, 'epoch': 3.42} 91%|█████████ | 9067/10000 [33:04:31<3:20:21, 12.89s/it] 91%|█████████ | 9068/10000 [33:04:44<3:20:08, 12.88s/it] {'loss': 
0.0048, 'learning_rate': 4.75e-06, 'epoch': 3.42} 91%|█████████ | 9068/10000 [33:04:44<3:20:08, 12.88s/it] 91%|█████████ | 9069/10000 [33:04:57<3:20:03, 12.89s/it] {'loss': 0.0035, 'learning_rate': 4.745e-06, 'epoch': 3.42} 91%|█████████ | 9069/10000 [33:04:57<3:20:03, 12.89s/it] 91%|█████████ | 9070/10000 [33:05:10<3:19:28, 12.87s/it] {'loss': 0.0041, 'learning_rate': 4.74e-06, 'epoch': 3.42} 91%|█████████ | 9070/10000 [33:05:10<3:19:28, 12.87s/it] 91%|█████████ | 9071/10000 [33:05:23<3:19:38, 12.89s/it] {'loss': 0.0027, 'learning_rate': 4.735e-06, 'epoch': 3.42} 91%|█████████ | 9071/10000 [33:05:23<3:19:38, 12.89s/it] 91%|█████████ | 9072/10000 [33:05:35<3:19:14, 12.88s/it] {'loss': 0.0042, 'learning_rate': 4.7300000000000005e-06, 'epoch': 3.42} 91%|█████████ | 9072/10000 [33:05:36<3:19:14, 12.88s/it] 91%|█████████ | 9073/10000 [33:05:48<3:19:16, 12.90s/it] {'loss': 0.0033, 'learning_rate': 4.7250000000000005e-06, 'epoch': 3.42} 91%|█████████ | 9073/10000 [33:05:48<3:19:16, 12.90s/it] 91%|█████████ | 9074/10000 [33:06:01<3:19:06, 12.90s/it] {'loss': 0.0033, 'learning_rate': 4.72e-06, 'epoch': 3.42} 91%|█████████ | 9074/10000 [33:06:01<3:19:06, 12.90s/it] 91%|█████████ | 9075/10000 [33:06:14<3:19:12, 12.92s/it] {'loss': 0.0031, 'learning_rate': 4.715e-06, 'epoch': 3.42} 91%|█████████ | 9075/10000 [33:06:14<3:19:12, 12.92s/it] 91%|█████████ | 9076/10000 [33:06:27<3:19:22, 12.95s/it] {'loss': 0.0034, 'learning_rate': 4.710000000000001e-06, 'epoch': 3.42} 91%|█████████ | 9076/10000 [33:06:27<3:19:22, 12.95s/it] 91%|█████████ | 9077/10000 [33:06:40<3:19:09, 12.95s/it] {'loss': 0.0031, 'learning_rate': 4.705000000000001e-06, 'epoch': 3.42} 91%|█████████ | 9077/10000 [33:06:40<3:19:09, 12.95s/it] 91%|█████████ | 9078/10000 [33:06:53<3:18:48, 12.94s/it] {'loss': 0.004, 'learning_rate': 4.7e-06, 'epoch': 3.42} 91%|█████████ | 9078/10000 [33:06:53<3:18:48, 12.94s/it] 91%|█████████ | 9079/10000 [33:07:06<3:18:25, 12.93s/it] {'loss': 0.0036, 'learning_rate': 4.695e-06, 
'epoch': 3.42} 91%|█████████ | 9079/10000 [33:07:06<3:18:25, 12.93s/it] 91%|█████████ | 9080/10000 [33:07:19<3:18:13, 12.93s/it] {'loss': 0.004, 'learning_rate': 4.69e-06, 'epoch': 3.42} 91%|█████████ | 9080/10000 [33:07:19<3:18:13, 12.93s/it] 91%|█████████ | 9081/10000 [33:07:32<3:17:39, 12.91s/it] {'loss': 0.0029, 'learning_rate': 4.685000000000001e-06, 'epoch': 3.42} 91%|█████████ | 9081/10000 [33:07:32<3:17:39, 12.91s/it] 91%|█████████ | 9082/10000 [33:07:45<3:17:22, 12.90s/it] {'loss': 0.0061, 'learning_rate': 4.68e-06, 'epoch': 3.42} 91%|█████████ | 9082/10000 [33:07:45<3:17:22, 12.90s/it] 91%|█████████ | 9083/10000 [33:07:58<3:17:10, 12.90s/it] {'loss': 0.0043, 'learning_rate': 4.675e-06, 'epoch': 3.42} 91%|█████████ | 9083/10000 [33:07:58<3:17:10, 12.90s/it] 91%|█████████ | 9084/10000 [33:08:11<3:17:36, 12.94s/it] {'loss': 0.0031, 'learning_rate': 4.67e-06, 'epoch': 3.42} 91%|█████████ | 9084/10000 [33:08:11<3:17:36, 12.94s/it] 91%|█████████ | 9085/10000 [33:08:24<3:17:11, 12.93s/it] {'loss': 0.0041, 'learning_rate': 4.665e-06, 'epoch': 3.42} 91%|█████████ | 9085/10000 [33:08:24<3:17:11, 12.93s/it] 91%|█████████ | 9086/10000 [33:08:36<3:16:49, 12.92s/it] {'loss': 0.0032, 'learning_rate': 4.66e-06, 'epoch': 3.42} 91%|█████████ | 9086/10000 [33:08:37<3:16:49, 12.92s/it] 91%|█████████ | 9087/10000 [33:08:49<3:16:17, 12.90s/it] {'loss': 0.0038, 'learning_rate': 4.655e-06, 'epoch': 3.42} 91%|█████████ | 9087/10000 [33:08:49<3:16:17, 12.90s/it] 91%|█████████ | 9088/10000 [33:09:02<3:16:13, 12.91s/it] {'loss': 0.0037, 'learning_rate': 4.65e-06, 'epoch': 3.42} 91%|█████████ | 9088/10000 [33:09:02<3:16:13, 12.91s/it] 91%|█████████ | 9089/10000 [33:09:15<3:16:12, 12.92s/it] {'loss': 0.0041, 'learning_rate': 4.645e-06, 'epoch': 3.42} 91%|█████████ | 9089/10000 [33:09:15<3:16:12, 12.92s/it] 91%|█████████ | 9090/10000 [33:09:28<3:15:40, 12.90s/it] {'loss': 0.0051, 'learning_rate': 4.64e-06, 'epoch': 3.43} 91%|█████████ | 9090/10000 [33:09:28<3:15:40, 12.90s/it] 
91%|█████████ | 9091/10000 [33:09:41<3:15:06, 12.88s/it] {'loss': 0.0038, 'learning_rate': 4.6350000000000005e-06, 'epoch': 3.43} 91%|█████████ | 9091/10000 [33:09:41<3:15:06, 12.88s/it] 91%|█████████ | 9092/10000 [33:09:54<3:14:55, 12.88s/it] {'loss': 0.0039, 'learning_rate': 4.6300000000000006e-06, 'epoch': 3.43} 91%|█████████ | 9092/10000 [33:09:54<3:14:55, 12.88s/it] 91%|█████████ | 9093/10000 [33:10:07<3:14:47, 12.89s/it] {'loss': 0.0035, 'learning_rate': 4.625e-06, 'epoch': 3.43} 91%|█████████ | 9093/10000 [33:10:07<3:14:47, 12.89s/it] 91%|█████████ | 9094/10000 [33:10:20<3:14:57, 12.91s/it] {'loss': 0.0027, 'learning_rate': 4.62e-06, 'epoch': 3.43} 91%|█████████ | 9094/10000 [33:10:20<3:14:57, 12.91s/it] 91%|█████████ | 9095/10000 [33:10:33<3:15:04, 12.93s/it] {'loss': 0.004, 'learning_rate': 4.615e-06, 'epoch': 3.43} 91%|█████████ | 9095/10000 [33:10:33<3:15:04, 12.93s/it] 91%|█████████ | 9096/10000 [33:10:46<3:14:49, 12.93s/it] {'loss': 0.0034, 'learning_rate': 4.610000000000001e-06, 'epoch': 3.43} 91%|█████████ | 9096/10000 [33:10:46<3:14:49, 12.93s/it] 91%|█████████ | 9097/10000 [33:10:59<3:14:38, 12.93s/it] {'loss': 0.0032, 'learning_rate': 4.605e-06, 'epoch': 3.43} 91%|█████████ | 9097/10000 [33:10:59<3:14:38, 12.93s/it] 91%|█████████ | 9098/10000 [33:11:11<3:14:32, 12.94s/it] {'loss': 0.0034, 'learning_rate': 4.6e-06, 'epoch': 3.43} 91%|█████████ | 9098/10000 [33:11:11<3:14:32, 12.94s/it] 91%|█████████ | 9099/10000 [33:11:24<3:14:31, 12.95s/it] {'loss': 0.0037, 'learning_rate': 4.595e-06, 'epoch': 3.43} 91%|█████████ | 9099/10000 [33:11:24<3:14:31, 12.95s/it] 91%|█████████ | 9100/10000 [33:11:37<3:14:20, 12.96s/it] {'loss': 0.0046, 'learning_rate': 4.590000000000001e-06, 'epoch': 3.43} 91%|█████████ | 9100/10000 [33:11:37<3:14:20, 12.96s/it] 91%|█████████ | 9101/10000 [33:11:50<3:14:14, 12.96s/it] {'loss': 0.0041, 'learning_rate': 4.585e-06, 'epoch': 3.43} 91%|█████████ | 9101/10000 [33:11:50<3:14:14, 12.96s/it] 91%|█████████ | 9102/10000 
[33:12:03<3:13:54, 12.96s/it] {'loss': 0.0022, 'learning_rate': 4.58e-06, 'epoch': 3.43} 91%|█████████ | 9102/10000 [33:12:03<3:13:54, 12.96s/it] 91%|█████████ | 9103/10000 [33:12:16<3:13:24, 12.94s/it] {'loss': 0.0038, 'learning_rate': 4.575e-06, 'epoch': 3.43} 91%|█████████ | 9103/10000 [33:12:16<3:13:24, 12.94s/it] 91%|█████████ | 9104/10000 [33:12:29<3:13:01, 12.93s/it] {'loss': 0.0041, 'learning_rate': 4.57e-06, 'epoch': 3.43} 91%|█████████ | 9104/10000 [33:12:29<3:13:01, 12.93s/it] 91%|█████████ | 9105/10000 [33:12:42<3:12:49, 12.93s/it] {'loss': 0.0028, 'learning_rate': 4.565e-06, 'epoch': 3.43} 91%|█████████ | 9105/10000 [33:12:42<3:12:49, 12.93s/it] 91%|█████████ | 9106/10000 [33:12:55<3:12:23, 12.91s/it] {'loss': 0.0039, 'learning_rate': 4.56e-06, 'epoch': 3.43} 91%|█████████ | 9106/10000 [33:12:55<3:12:23, 12.91s/it] 91%|█████████ | 9107/10000 [33:13:08<3:11:58, 12.90s/it] {'loss': 0.0048, 'learning_rate': 4.5550000000000004e-06, 'epoch': 3.43} 91%|█████████ | 9107/10000 [33:13:08<3:11:58, 12.90s/it] 91%|█████████ | 9108/10000 [33:13:21<3:12:04, 12.92s/it] {'loss': 0.0033, 'learning_rate': 4.5500000000000005e-06, 'epoch': 3.43} 91%|█████████ | 9108/10000 [33:13:21<3:12:04, 12.92s/it] 91%|█████████ | 9109/10000 [33:13:34<3:12:21, 12.95s/it] {'loss': 0.0033, 'learning_rate': 4.545e-06, 'epoch': 3.43} 91%|█████████ | 9109/10000 [33:13:34<3:12:21, 12.95s/it] 91%|█████████ | 9110/10000 [33:13:47<3:11:52, 12.94s/it] {'loss': 0.0048, 'learning_rate': 4.540000000000001e-06, 'epoch': 3.43} 91%|█████████ | 9110/10000 [33:13:47<3:11:52, 12.94s/it] 91%|█████████ | 9111/10000 [33:14:00<3:11:45, 12.94s/it] {'loss': 0.0031, 'learning_rate': 4.535000000000001e-06, 'epoch': 3.43} 91%|█████████ | 9111/10000 [33:14:00<3:11:45, 12.94s/it] 91%|█████████ | 9112/10000 [33:14:13<3:11:19, 12.93s/it] {'loss': 0.004, 'learning_rate': 4.53e-06, 'epoch': 3.43} 91%|█████████ | 9112/10000 [33:14:13<3:11:19, 12.93s/it] 91%|█████████ | 9113/10000 [33:14:25<3:10:53, 12.91s/it] {'loss': 
0.0046, 'learning_rate': 4.525e-06, 'epoch': 3.43} 91%|█████████ | 9113/10000 [33:14:25<3:10:53, 12.91s/it] 91%|█████████ | 9114/10000 [33:14:38<3:10:26, 12.90s/it] {'loss': 0.0041, 'learning_rate': 4.52e-06, 'epoch': 3.43} 91%|█████████ | 9114/10000 [33:14:38<3:10:26, 12.90s/it] 91%|█████████ | 9115/10000 [33:14:51<3:10:16, 12.90s/it] {'loss': 0.0032, 'learning_rate': 4.515000000000001e-06, 'epoch': 3.43} 91%|█████████ | 9115/10000 [33:14:51<3:10:16, 12.90s/it] 91%|█████████ | 9116/10000 [33:15:04<3:10:05, 12.90s/it] {'loss': 0.004, 'learning_rate': 4.51e-06, 'epoch': 3.43} 91%|█████████ | 9116/10000 [33:15:04<3:10:05, 12.90s/it] 91%|█████████ | 9117/10000 [33:15:17<3:10:00, 12.91s/it] {'loss': 0.0034, 'learning_rate': 4.505e-06, 'epoch': 3.44} 91%|█████████ | 9117/10000 [33:15:17<3:10:00, 12.91s/it] 91%|█████████ | 9118/10000 [33:15:30<3:10:15, 12.94s/it] {'loss': 0.0029, 'learning_rate': 4.5e-06, 'epoch': 3.44} 91%|█████████ | 9118/10000 [33:15:30<3:10:15, 12.94s/it] 91%|█████████ | 9119/10000 [33:15:43<3:09:31, 12.91s/it] {'loss': 0.0045, 'learning_rate': 4.495e-06, 'epoch': 3.44} 91%|█████████ | 9119/10000 [33:15:43<3:09:31, 12.91s/it] 91%|█████████ | 9120/10000 [33:15:56<3:09:12, 12.90s/it] {'loss': 0.0034, 'learning_rate': 4.49e-06, 'epoch': 3.44} 91%|█████████ | 9120/10000 [33:15:56<3:09:12, 12.90s/it] 91%|█████████ | 9121/10000 [33:16:09<3:08:49, 12.89s/it] {'loss': 0.0034, 'learning_rate': 4.485e-06, 'epoch': 3.44} 91%|█████████ | 9121/10000 [33:16:09<3:08:49, 12.89s/it] 91%|█████████ | 9122/10000 [33:16:22<3:08:55, 12.91s/it] {'loss': 0.0039, 'learning_rate': 4.48e-06, 'epoch': 3.44} 91%|█████████ | 9122/10000 [33:16:22<3:08:55, 12.91s/it] 91%|█████████ | 9123/10000 [33:16:34<3:08:22, 12.89s/it] {'loss': 0.0037, 'learning_rate': 4.475e-06, 'epoch': 3.44} 91%|█████████ | 9123/10000 [33:16:34<3:08:22, 12.89s/it] 91%|█████████ | 9124/10000 [33:16:47<3:08:07, 12.89s/it] {'loss': 0.0057, 'learning_rate': 4.4699999999999996e-06, 'epoch': 3.44} 91%|█████████ | 
9124/10000 [33:16:47<3:08:07, 12.89s/it] 91%|█████████▏| 9125/10000 [33:17:00<3:08:00, 12.89s/it] {'loss': 0.0034, 'learning_rate': 4.4650000000000004e-06, 'epoch': 3.44} 91%|█████████▏| 9125/10000 [33:17:00<3:08:00, 12.89s/it] 91%|█████████▏| 9126/10000 [33:17:13<3:07:40, 12.88s/it] {'loss': 0.0043, 'learning_rate': 4.4600000000000005e-06, 'epoch': 3.44} 91%|█████████▏| 9126/10000 [33:17:13<3:07:40, 12.88s/it] 91%|█████████▏| 9127/10000 [33:17:26<3:07:22, 12.88s/it] {'loss': 0.0043, 'learning_rate': 4.4550000000000005e-06, 'epoch': 3.44} 91%|█████████▏| 9127/10000 [33:17:26<3:07:22, 12.88s/it] 91%|█████████▏| 9128/10000 [33:17:39<3:07:44, 12.92s/it] {'loss': 0.0031, 'learning_rate': 4.45e-06, 'epoch': 3.44} 91%|█████████▏| 9128/10000 [33:17:39<3:07:44, 12.92s/it] 91%|█████████▏| 9129/10000 [33:17:52<3:07:21, 12.91s/it] {'loss': 0.0042, 'learning_rate': 4.445000000000001e-06, 'epoch': 3.44} 91%|█████████▏| 9129/10000 [33:17:52<3:07:21, 12.91s/it] 91%|█████████▏| 9130/10000 [33:18:05<3:06:54, 12.89s/it] {'loss': 0.0032, 'learning_rate': 4.440000000000001e-06, 'epoch': 3.44} 91%|█████████▏| 9130/10000 [33:18:05<3:06:54, 12.89s/it] 91%|█████████▏| 9131/10000 [33:18:18<3:06:45, 12.90s/it] {'loss': 0.0042, 'learning_rate': 4.435e-06, 'epoch': 3.44} 91%|█████████▏| 9131/10000 [33:18:18<3:06:45, 12.90s/it] 91%|█████████▏| 9132/10000 [33:18:31<3:06:55, 12.92s/it] {'loss': 0.0036, 'learning_rate': 4.43e-06, 'epoch': 3.44} 91%|█████████▏| 9132/10000 [33:18:31<3:06:55, 12.92s/it] 91%|█████████▏| 9133/10000 [33:18:44<3:07:01, 12.94s/it] {'loss': 0.0036, 'learning_rate': 4.425e-06, 'epoch': 3.44} 91%|█████████▏| 9133/10000 [33:18:44<3:07:01, 12.94s/it] 91%|█████████▏| 9134/10000 [33:18:57<3:06:55, 12.95s/it] {'loss': 0.0045, 'learning_rate': 4.420000000000001e-06, 'epoch': 3.44} 91%|█████████▏| 9134/10000 [33:18:57<3:06:55, 12.95s/it] 91%|█████████▏| 9135/10000 [33:19:09<3:06:44, 12.95s/it] {'loss': 0.0056, 'learning_rate': 4.415e-06, 'epoch': 3.44} 91%|█████████▏| 9135/10000 
[33:19:10<3:06:44, 12.95s/it] 91%|█████████▏| 9136/10000 [33:19:22<3:06:06, 12.92s/it] {'loss': 0.0041, 'learning_rate': 4.41e-06, 'epoch': 3.44} 91%|█████████▏| 9136/10000 [33:19:22<3:06:06, 12.92s/it] 91%|█████████▏| 9137/10000 [33:19:35<3:06:04, 12.94s/it] {'loss': 0.0037, 'learning_rate': 4.405e-06, 'epoch': 3.44} 91%|█████████▏| 9137/10000 [33:19:35<3:06:04, 12.94s/it] 91%|█████████▏| 9138/10000 [33:19:48<3:05:51, 12.94s/it] {'loss': 0.0036, 'learning_rate': 4.4e-06, 'epoch': 3.44} 91%|█████████▏| 9138/10000 [33:19:48<3:05:51, 12.94s/it] 91%|█████████▏| 9139/10000 [33:20:01<3:05:49, 12.95s/it] {'loss': 0.0046, 'learning_rate': 4.395e-06, 'epoch': 3.44} 91%|█████████▏| 9139/10000 [33:20:01<3:05:49, 12.95s/it] 91%|█████████▏| 9140/10000 [33:20:14<3:05:38, 12.95s/it] {'loss': 0.0047, 'learning_rate': 4.39e-06, 'epoch': 3.44} 91%|█████████▏| 9140/10000 [33:20:14<3:05:38, 12.95s/it] 91%|█████████▏| 9141/10000 [33:20:27<3:05:31, 12.96s/it] {'loss': 0.0038, 'learning_rate': 4.385e-06, 'epoch': 3.44} 91%|█████████▏| 9141/10000 [33:20:27<3:05:31, 12.96s/it] 91%|█████████▏| 9142/10000 [33:20:40<3:05:29, 12.97s/it] {'loss': 0.0039, 'learning_rate': 4.38e-06, 'epoch': 3.44} 91%|█████████▏| 9142/10000 [33:20:40<3:05:29, 12.97s/it] 91%|█████████▏| 9143/10000 [33:20:53<3:05:18, 12.97s/it] {'loss': 0.0041, 'learning_rate': 4.375e-06, 'epoch': 3.44} 91%|█████████▏| 9143/10000 [33:20:53<3:05:18, 12.97s/it] 91%|█████████▏| 9144/10000 [33:21:06<3:05:02, 12.97s/it] {'loss': 0.0034, 'learning_rate': 4.3700000000000005e-06, 'epoch': 3.45} 91%|█████████▏| 9144/10000 [33:21:06<3:05:02, 12.97s/it] 91%|█████████▏| 9145/10000 [33:21:19<3:04:54, 12.98s/it] {'loss': 0.0048, 'learning_rate': 4.3650000000000006e-06, 'epoch': 3.45} 91%|█████████▏| 9145/10000 [33:21:19<3:04:54, 12.98s/it] 91%|█████████▏| 9146/10000 [33:21:32<3:04:27, 12.96s/it] {'loss': 0.0043, 'learning_rate': 4.360000000000001e-06, 'epoch': 3.45} 91%|█████████▏| 9146/10000 [33:21:32<3:04:27, 12.96s/it] 91%|█████████▏| 
9147/10000 [33:21:45<3:04:06, 12.95s/it] {'loss': 0.0032, 'learning_rate': 4.355e-06, 'epoch': 3.45} 91%|█████████▏| 9147/10000 [33:21:45<3:04:06, 12.95s/it] 91%|█████████▏| 9148/10000 [33:21:58<3:03:49, 12.95s/it] {'loss': 0.0038, 'learning_rate': 4.35e-06, 'epoch': 3.45} 91%|█████████▏| 9148/10000 [33:21:58<3:03:49, 12.95s/it] 91%|█████████▏| 9149/10000 [33:22:11<3:03:29, 12.94s/it] {'loss': 0.004, 'learning_rate': 4.345000000000001e-06, 'epoch': 3.45} 91%|█████████▏| 9149/10000 [33:22:11<3:03:29, 12.94s/it] 92%|█████████▏| 9150/10000 [33:22:24<3:03:04, 12.92s/it] {'loss': 0.004, 'learning_rate': 4.34e-06, 'epoch': 3.45} 92%|█████████▏| 9150/10000 [33:22:24<3:03:04, 12.92s/it] 92%|█████████▏| 9151/10000 [33:22:37<3:02:45, 12.92s/it] {'loss': 0.0029, 'learning_rate': 4.335e-06, 'epoch': 3.45} 92%|█████████▏| 9151/10000 [33:22:37<3:02:45, 12.92s/it] 92%|█████████▏| 9152/10000 [33:22:49<3:02:29, 12.91s/it] {'loss': 0.0036, 'learning_rate': 4.33e-06, 'epoch': 3.45} 92%|█████████▏| 9152/10000 [33:22:50<3:02:29, 12.91s/it] 92%|█████████▏| 9153/10000 [33:23:02<3:02:17, 12.91s/it] {'loss': 0.0039, 'learning_rate': 4.325e-06, 'epoch': 3.45} 92%|█████████▏| 9153/10000 [33:23:02<3:02:17, 12.91s/it] 92%|█████████▏| 9154/10000 [33:23:15<3:02:06, 12.92s/it] {'loss': 0.0045, 'learning_rate': 4.32e-06, 'epoch': 3.45} 92%|█████████▏| 9154/10000 [33:23:15<3:02:06, 12.92s/it] 92%|█████████▏| 9155/10000 [33:23:28<3:02:20, 12.95s/it] {'loss': 0.0039, 'learning_rate': 4.315e-06, 'epoch': 3.45} 92%|█████████▏| 9155/10000 [33:23:28<3:02:20, 12.95s/it] 92%|█████████▏| 9156/10000 [33:23:41<3:02:03, 12.94s/it] {'loss': 0.0035, 'learning_rate': 4.31e-06, 'epoch': 3.45} 92%|█████████▏| 9156/10000 [33:23:41<3:02:03, 12.94s/it] 92%|█████████▏| 9157/10000 [33:23:54<3:01:24, 12.91s/it] {'loss': 0.0049, 'learning_rate': 4.305e-06, 'epoch': 3.45} 92%|█████████▏| 9157/10000 [33:23:54<3:01:24, 12.91s/it] 92%|█████████▏| 9158/10000 [33:24:07<3:01:20, 12.92s/it] {'loss': 0.0036, 'learning_rate': 
4.2999999999999995e-06, 'epoch': 3.45} 92%|█████████▏| 9158/10000 [33:24:07<3:01:20, 12.92s/it] 92%|█████████▏| 9159/10000 [33:24:20<3:00:51, 12.90s/it] {'loss': 0.0038, 'learning_rate': 4.295e-06, 'epoch': 3.45} 92%|█████████▏| 9159/10000 [33:24:20<3:00:51, 12.90s/it] 92%|█████████▏| 9160/10000 [33:24:33<3:00:31, 12.89s/it] {'loss': 0.0039, 'learning_rate': 4.2900000000000004e-06, 'epoch': 3.45} 92%|█████████▏| 9160/10000 [33:24:33<3:00:31, 12.89s/it] 92%|█████████▏| 9161/10000 [33:24:46<3:00:34, 12.91s/it] {'loss': 0.0039, 'learning_rate': 4.2850000000000005e-06, 'epoch': 3.45} 92%|█████████▏| 9161/10000 [33:24:46<3:00:34, 12.91s/it] 92%|█████████▏| 9162/10000 [33:24:59<3:00:28, 12.92s/it] {'loss': 0.0039, 'learning_rate': 4.28e-06, 'epoch': 3.45} 92%|█████████▏| 9162/10000 [33:24:59<3:00:28, 12.92s/it] 92%|█████████▏| 9163/10000 [33:25:12<3:00:25, 12.93s/it] {'loss': 0.0036, 'learning_rate': 4.2750000000000006e-06, 'epoch': 3.45} 92%|█████████▏| 9163/10000 [33:25:12<3:00:25, 12.93s/it] 92%|█████████▏| 9164/10000 [33:25:25<3:00:10, 12.93s/it] {'loss': 0.0037, 'learning_rate': 4.270000000000001e-06, 'epoch': 3.45} 92%|█████████▏| 9164/10000 [33:25:25<3:00:10, 12.93s/it] 92%|█████████▏| 9165/10000 [33:25:37<2:59:55, 12.93s/it] {'loss': 0.0026, 'learning_rate': 4.265e-06, 'epoch': 3.45} 92%|█████████▏| 9165/10000 [33:25:38<2:59:55, 12.93s/it] 92%|█████████▏| 9166/10000 [33:25:50<2:59:25, 12.91s/it] {'loss': 0.0046, 'learning_rate': 4.26e-06, 'epoch': 3.45} 92%|█████████▏| 9166/10000 [33:25:50<2:59:25, 12.91s/it] 92%|█████████▏| 9167/10000 [33:26:03<2:58:48, 12.88s/it] {'loss': 0.0045, 'learning_rate': 4.255e-06, 'epoch': 3.45} 92%|█████████▏| 9167/10000 [33:26:03<2:58:48, 12.88s/it] 92%|█████████▏| 9168/10000 [33:26:16<2:58:50, 12.90s/it] {'loss': 0.004, 'learning_rate': 4.250000000000001e-06, 'epoch': 3.45} 92%|█████████▏| 9168/10000 [33:26:16<2:58:50, 12.90s/it] 92%|█████████▏| 9169/10000 [33:26:29<2:58:34, 12.89s/it] {'loss': 0.0048, 'learning_rate': 4.245e-06, 
'epoch': 3.45} 92%|█████████▏| 9169/10000 [33:26:29<2:58:34, 12.89s/it] 92%|█████████▏| 9170/10000 [33:26:42<2:58:17, 12.89s/it] {'loss': 0.0047, 'learning_rate': 4.24e-06, 'epoch': 3.46} 92%|█████████▏| 9170/10000 [33:26:42<2:58:17, 12.89s/it] 92%|█████████▏| 9171/10000 [33:26:55<2:57:52, 12.87s/it] {'loss': 0.004, 'learning_rate': 4.235e-06, 'epoch': 3.46} 92%|█████████▏| 9171/10000 [33:26:55<2:57:52, 12.87s/it] 92%|█████████▏| 9172/10000 [33:27:08<2:57:48, 12.88s/it] {'loss': 0.0032, 'learning_rate': 4.23e-06, 'epoch': 3.46} 92%|█████████▏| 9172/10000 [33:27:08<2:57:48, 12.88s/it] 92%|█████████▏| 9173/10000 [33:27:20<2:57:34, 12.88s/it] {'loss': 0.003, 'learning_rate': 4.225e-06, 'epoch': 3.46} 92%|█████████▏| 9173/10000 [33:27:21<2:57:34, 12.88s/it] 92%|█████████▏| 9174/10000 [33:27:33<2:57:31, 12.90s/it] {'loss': 0.0039, 'learning_rate': 4.22e-06, 'epoch': 3.46} 92%|█████████▏| 9174/10000 [33:27:33<2:57:31, 12.90s/it] 92%|█████████▏| 9175/10000 [33:27:46<2:57:05, 12.88s/it] {'loss': 0.0033, 'learning_rate': 4.215e-06, 'epoch': 3.46} 92%|█████████▏| 9175/10000 [33:27:46<2:57:05, 12.88s/it] 92%|█████████▏| 9176/10000 [33:27:59<2:56:43, 12.87s/it] {'loss': 0.004, 'learning_rate': 4.21e-06, 'epoch': 3.46} 92%|█████████▏| 9176/10000 [33:27:59<2:56:43, 12.87s/it] 92%|█████████▏| 9177/10000 [33:28:12<2:56:44, 12.89s/it] {'loss': 0.0033, 'learning_rate': 4.2049999999999996e-06, 'epoch': 3.46} 92%|█████████▏| 9177/10000 [33:28:12<2:56:44, 12.89s/it] 92%|█████████▏| 9178/10000 [33:28:25<2:56:34, 12.89s/it] {'loss': 0.0036, 'learning_rate': 4.2000000000000004e-06, 'epoch': 3.46} 92%|█████████▏| 9178/10000 [33:28:25<2:56:34, 12.89s/it] 92%|█████████▏| 9179/10000 [33:28:38<2:56:23, 12.89s/it] {'loss': 0.0037, 'learning_rate': 4.1950000000000005e-06, 'epoch': 3.46} 92%|█████████▏| 9179/10000 [33:28:38<2:56:23, 12.89s/it] 92%|█████████▏| 9180/10000 [33:28:51<2:56:03, 12.88s/it] {'loss': 0.003, 'learning_rate': 4.1900000000000005e-06, 'epoch': 3.46} 92%|█████████▏| 9180/10000 
[33:28:51<2:56:03, 12.88s/it] 92%|█████████▏| 9181/10000 [33:29:04<2:55:51, 12.88s/it] {'loss': 0.0037, 'learning_rate': 4.185e-06, 'epoch': 3.46} 92%|█████████▏| 9181/10000 [33:29:04<2:55:51, 12.88s/it] 92%|█████████▏| 9182/10000 [33:29:16<2:55:27, 12.87s/it] {'loss': 0.0036, 'learning_rate': 4.18e-06, 'epoch': 3.46} 92%|█████████▏| 9182/10000 [33:29:16<2:55:27, 12.87s/it] 92%|█████████▏| 9183/10000 [33:29:29<2:55:04, 12.86s/it] {'loss': 0.0041, 'learning_rate': 4.175000000000001e-06, 'epoch': 3.46} 92%|█████████▏| 9183/10000 [33:29:29<2:55:04, 12.86s/it] 92%|█████████▏| 9184/10000 [33:29:42<2:55:16, 12.89s/it] {'loss': 0.0046, 'learning_rate': 4.17e-06, 'epoch': 3.46} 92%|█████████▏| 9184/10000 [33:29:42<2:55:16, 12.89s/it] 92%|█████████▏| 9185/10000 [33:29:55<2:55:27, 12.92s/it] {'loss': 0.0035, 'learning_rate': 4.165e-06, 'epoch': 3.46} 92%|█████████▏| 9185/10000 [33:29:55<2:55:27, 12.92s/it] 92%|█████████▏| 9186/10000 [33:30:08<2:55:30, 12.94s/it] {'loss': 0.003, 'learning_rate': 4.16e-06, 'epoch': 3.46} 92%|█████████▏| 9186/10000 [33:30:08<2:55:30, 12.94s/it] 92%|█████████▏| 9187/10000 [33:30:21<2:55:27, 12.95s/it] {'loss': 0.003, 'learning_rate': 4.155e-06, 'epoch': 3.46} 92%|█████████▏| 9187/10000 [33:30:21<2:55:27, 12.95s/it] 92%|█████████▏| 9188/10000 [33:30:34<2:55:17, 12.95s/it] {'loss': 0.0027, 'learning_rate': 4.15e-06, 'epoch': 3.46} 92%|█████████▏| 9188/10000 [33:30:34<2:55:17, 12.95s/it] 92%|█████████▏| 9189/10000 [33:30:47<2:54:43, 12.93s/it] {'loss': 0.0043, 'learning_rate': 4.145e-06, 'epoch': 3.46} 92%|█████████▏| 9189/10000 [33:30:47<2:54:43, 12.93s/it] 92%|█████████▏| 9190/10000 [33:31:00<2:54:49, 12.95s/it] {'loss': 0.0033, 'learning_rate': 4.14e-06, 'epoch': 3.46} 92%|█████████▏| 9190/10000 [33:31:00<2:54:49, 12.95s/it] 92%|█████████▏| 9191/10000 [33:31:13<2:54:33, 12.95s/it] {'loss': 0.0038, 'learning_rate': 4.135e-06, 'epoch': 3.46} 92%|█████████▏| 9191/10000 [33:31:13<2:54:33, 12.95s/it] 92%|█████████▏| 9192/10000 [33:31:26<2:53:54, 
12.91s/it] {'loss': 0.0043, 'learning_rate': 4.13e-06, 'epoch': 3.46} 92%|█████████▏| 9192/10000 [33:31:26<2:53:54, 12.91s/it] 92%|█████████▏| 9193/10000 [33:31:39<2:53:29, 12.90s/it] {'loss': 0.0039, 'learning_rate': 4.125e-06, 'epoch': 3.46} 92%|█████████▏| 9193/10000 [33:31:39<2:53:29, 12.90s/it] 92%|█████████▏| 9194/10000 [33:31:51<2:53:00, 12.88s/it] {'loss': 0.0023, 'learning_rate': 4.12e-06, 'epoch': 3.46} 92%|█████████▏| 9194/10000 [33:31:51<2:53:00, 12.88s/it] 92%|█████████▏| 9195/10000 [33:32:04<2:52:53, 12.89s/it] {'loss': 0.0031, 'learning_rate': 4.115e-06, 'epoch': 3.46} 92%|█████████▏| 9195/10000 [33:32:04<2:52:53, 12.89s/it] 92%|█████████▏| 9196/10000 [33:32:17<2:52:24, 12.87s/it] {'loss': 0.0044, 'learning_rate': 4.11e-06, 'epoch': 3.46} 92%|█████████▏| 9196/10000 [33:32:17<2:52:24, 12.87s/it] 92%|█████████▏| 9197/10000 [33:32:30<2:52:20, 12.88s/it] {'loss': 0.0034, 'learning_rate': 4.1050000000000005e-06, 'epoch': 3.47} 92%|█████████▏| 9197/10000 [33:32:30<2:52:20, 12.88s/it] 92%|█████████▏| 9198/10000 [33:32:43<2:52:10, 12.88s/it] {'loss': 0.0037, 'learning_rate': 4.1000000000000006e-06, 'epoch': 3.47} 92%|█████████▏| 9198/10000 [33:32:43<2:52:10, 12.88s/it] 92%|█████████▏| 9199/10000 [33:32:56<2:52:01, 12.89s/it] {'loss': 0.004, 'learning_rate': 4.095000000000001e-06, 'epoch': 3.47} 92%|█████████▏| 9199/10000 [33:32:56<2:52:01, 12.89s/it] 92%|█████████▏| 9200/10000 [33:33:09<2:51:56, 12.90s/it] {'loss': 0.0041, 'learning_rate': 4.09e-06, 'epoch': 3.47} 92%|█████████▏| 9200/10000 [33:33:09<2:51:56, 12.90s/it] 92%|█████████▏| 9201/10000 [33:33:22<2:51:31, 12.88s/it] {'loss': 0.003, 'learning_rate': 4.085e-06, 'epoch': 3.47} 92%|█████████▏| 9201/10000 [33:33:22<2:51:31, 12.88s/it] 92%|█████████▏| 9202/10000 [33:33:35<2:51:48, 12.92s/it] {'loss': 0.003, 'learning_rate': 4.080000000000001e-06, 'epoch': 3.47} 92%|█████████▏| 9202/10000 [33:33:35<2:51:48, 12.92s/it] 92%|█████████▏| 9203/10000 [33:33:48<2:51:39, 12.92s/it] {'loss': 0.0034, 
'learning_rate': 4.075e-06, 'epoch': 3.47} 92%|█████████▏| 9203/10000 [33:33:48<2:51:39, 12.92s/it] 92%|█████████▏| 9204/10000 [33:34:00<2:51:13, 12.91s/it] {'loss': 0.0031, 'learning_rate': 4.07e-06, 'epoch': 3.47} 92%|█████████▏| 9204/10000 [33:34:00<2:51:13, 12.91s/it] 92%|█████████▏| 9205/10000 [33:34:13<2:51:06, 12.91s/it] {'loss': 0.0029, 'learning_rate': 4.065e-06, 'epoch': 3.47} 92%|█████████▏| 9205/10000 [33:34:13<2:51:06, 12.91s/it] 92%|█████████▏| 9206/10000 [33:34:26<2:50:52, 12.91s/it] {'loss': 0.0039, 'learning_rate': 4.06e-06, 'epoch': 3.47} 92%|█████████▏| 9206/10000 [33:34:26<2:50:52, 12.91s/it] 92%|█████████▏| 9207/10000 [33:34:39<2:50:38, 12.91s/it] {'loss': 0.0028, 'learning_rate': 4.055e-06, 'epoch': 3.47} 92%|█████████▏| 9207/10000 [33:34:39<2:50:38, 12.91s/it] 92%|█████████▏| 9208/10000 [33:34:52<2:50:42, 12.93s/it] {'loss': 0.0035, 'learning_rate': 4.05e-06, 'epoch': 3.47} 92%|█████████▏| 9208/10000 [33:34:52<2:50:42, 12.93s/it] 92%|█████████▏| 9209/10000 [33:35:05<2:50:25, 12.93s/it] {'loss': 0.0063, 'learning_rate': 4.045e-06, 'epoch': 3.47} 92%|█████████▏| 9209/10000 [33:35:05<2:50:25, 12.93s/it] 92%|█████████▏| 9210/10000 [33:35:18<2:50:07, 12.92s/it] {'loss': 0.0032, 'learning_rate': 4.04e-06, 'epoch': 3.47} 92%|█████████▏| 9210/10000 [33:35:18<2:50:07, 12.92s/it] 92%|█████████▏| 9211/10000 [33:35:31<2:49:54, 12.92s/it] {'loss': 0.0032, 'learning_rate': 4.0349999999999995e-06, 'epoch': 3.47} 92%|█████████▏| 9211/10000 [33:35:31<2:49:54, 12.92s/it] 92%|█████████▏| 9212/10000 [33:35:44<2:49:27, 12.90s/it] {'loss': 0.0036, 'learning_rate': 4.03e-06, 'epoch': 3.47} 92%|█████████▏| 9212/10000 [33:35:44<2:49:27, 12.90s/it] 92%|█████████▏| 9213/10000 [33:35:57<2:49:15, 12.90s/it] {'loss': 0.0033, 'learning_rate': 4.0250000000000004e-06, 'epoch': 3.47} 92%|█████████▏| 9213/10000 [33:35:57<2:49:15, 12.90s/it] 92%|█████████▏| 9214/10000 [33:36:10<2:48:53, 12.89s/it] {'loss': 0.004, 'learning_rate': 4.0200000000000005e-06, 'epoch': 3.47} 
92%|█████████▏| 9214/10000 [33:36:10<2:48:53, 12.89s/it] 92%|█████████▏| 9215/10000 [33:36:23<2:49:10, 12.93s/it] {'loss': 0.0031, 'learning_rate': 4.015e-06, 'epoch': 3.47} 92%|█████████▏| 9215/10000 [33:36:23<2:49:10, 12.93s/it] 92%|█████████▏| 9216/10000 [33:36:36<2:49:05, 12.94s/it] {'loss': 0.0039, 'learning_rate': 4.01e-06, 'epoch': 3.47} 92%|█████████▏| 9216/10000 [33:36:36<2:49:05, 12.94s/it] 92%|█████████▏| 9217/10000 [33:36:48<2:48:36, 12.92s/it] {'loss': 0.0033, 'learning_rate': 4.005000000000001e-06, 'epoch': 3.47} 92%|█████████▏| 9217/10000 [33:36:48<2:48:36, 12.92s/it] 92%|█████████▏| 9218/10000 [33:37:01<2:48:20, 12.92s/it] {'loss': 0.0029, 'learning_rate': 4.000000000000001e-06, 'epoch': 3.47} 92%|█████████▏| 9218/10000 [33:37:01<2:48:20, 12.92s/it] 92%|█████████▏| 9219/10000 [33:37:14<2:47:57, 12.90s/it] {'loss': 0.0037, 'learning_rate': 3.995e-06, 'epoch': 3.47} 92%|█████████▏| 9219/10000 [33:37:14<2:47:57, 12.90s/it] 92%|█████████▏| 9220/10000 [33:37:27<2:47:32, 12.89s/it] {'loss': 0.006, 'learning_rate': 3.99e-06, 'epoch': 3.47} 92%|█████████▏| 9220/10000 [33:37:27<2:47:32, 12.89s/it] 92%|█████████▏| 9221/10000 [33:37:40<2:47:16, 12.88s/it] {'loss': 0.0036, 'learning_rate': 3.985e-06, 'epoch': 3.47} 92%|█████████▏| 9221/10000 [33:37:40<2:47:16, 12.88s/it] 92%|█████████▏| 9222/10000 [33:37:53<2:46:45, 12.86s/it] {'loss': 0.0036, 'learning_rate': 3.98e-06, 'epoch': 3.47} 92%|█████████▏| 9222/10000 [33:37:53<2:46:45, 12.86s/it] 92%|█████████▏| 9223/10000 [33:38:06<2:46:44, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.975e-06, 'epoch': 3.48} 92%|█████████▏| 9223/10000 [33:38:06<2:46:44, 12.88s/it] 92%|█████████▏| 9224/10000 [33:38:19<2:46:36, 12.88s/it] {'loss': 0.0032, 'learning_rate': 3.97e-06, 'epoch': 3.48} 92%|█████████▏| 9224/10000 [33:38:19<2:46:36, 12.88s/it] 92%|█████████▏| 9225/10000 [33:38:31<2:46:19, 12.88s/it] {'loss': 0.0035, 'learning_rate': 3.965e-06, 'epoch': 3.48} 92%|█████████▏| 9225/10000 [33:38:31<2:46:19, 12.88s/it] 
92%|█████████▏| 9226/10000 [33:38:44<2:46:16, 12.89s/it] {'loss': 0.0027, 'learning_rate': 3.96e-06, 'epoch': 3.48} 92%|█████████▏| 9226/10000 [33:38:44<2:46:16, 12.89s/it] 92%|█████████▏| 9227/10000 [33:38:57<2:46:14, 12.90s/it] {'loss': 0.0032, 'learning_rate': 3.955e-06, 'epoch': 3.48} 92%|█████████▏| 9227/10000 [33:38:57<2:46:14, 12.90s/it] 92%|█████████▏| 9228/10000 [33:39:10<2:45:51, 12.89s/it] {'loss': 0.0059, 'learning_rate': 3.95e-06, 'epoch': 3.48} 92%|█████████▏| 9228/10000 [33:39:10<2:45:51, 12.89s/it] 92%|█████████▏| 9229/10000 [33:39:23<2:45:12, 12.86s/it] {'loss': 0.0031, 'learning_rate': 3.945e-06, 'epoch': 3.48} 92%|█████████▏| 9229/10000 [33:39:23<2:45:12, 12.86s/it] 92%|█████████▏| 9230/10000 [33:39:36<2:45:09, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.9399999999999995e-06, 'epoch': 3.48} 92%|█████████▏| 9230/10000 [33:39:36<2:45:09, 12.87s/it] 92%|█████████▏| 9231/10000 [33:39:49<2:45:02, 12.88s/it] {'loss': 0.0051, 'learning_rate': 3.9350000000000004e-06, 'epoch': 3.48} 92%|█████████▏| 9231/10000 [33:39:49<2:45:02, 12.88s/it] 92%|█████████▏| 9232/10000 [33:40:02<2:44:50, 12.88s/it] {'loss': 0.0031, 'learning_rate': 3.9300000000000005e-06, 'epoch': 3.48} 92%|█████████▏| 9232/10000 [33:40:02<2:44:50, 12.88s/it] 92%|█████████▏| 9233/10000 [33:40:14<2:44:48, 12.89s/it] {'loss': 0.0034, 'learning_rate': 3.9250000000000005e-06, 'epoch': 3.48} 92%|█████████▏| 9233/10000 [33:40:14<2:44:48, 12.89s/it] 92%|█████████▏| 9234/10000 [33:40:27<2:44:36, 12.89s/it] {'loss': 0.0036, 'learning_rate': 3.92e-06, 'epoch': 3.48} 92%|█████████▏| 9234/10000 [33:40:27<2:44:36, 12.89s/it] 92%|█████████▏| 9235/10000 [33:40:40<2:44:07, 12.87s/it] {'loss': 0.0041, 'learning_rate': 3.915e-06, 'epoch': 3.48} 92%|█████████▏| 9235/10000 [33:40:40<2:44:07, 12.87s/it] 92%|█████████▏| 9236/10000 [33:40:53<2:44:06, 12.89s/it] {'loss': 0.0032, 'learning_rate': 3.910000000000001e-06, 'epoch': 3.48} 92%|█████████▏| 9236/10000 [33:40:53<2:44:06, 12.89s/it] 92%|█████████▏| 
9237/10000 [33:41:06<2:43:50, 12.88s/it] {'loss': 0.0046, 'learning_rate': 3.905000000000001e-06, 'epoch': 3.48} 92%|█████████▏| 9237/10000 [33:41:06<2:43:50, 12.88s/it] 92%|█████████▏| 9238/10000 [33:41:19<2:43:41, 12.89s/it] {'loss': 0.003, 'learning_rate': 3.9e-06, 'epoch': 3.48} 92%|█████████▏| 9238/10000 [33:41:19<2:43:41, 12.89s/it] 92%|█████████▏| 9239/10000 [33:41:32<2:43:25, 12.89s/it] {'loss': 0.0038, 'learning_rate': 3.895e-06, 'epoch': 3.48} 92%|█████████▏| 9239/10000 [33:41:32<2:43:25, 12.89s/it] 92%|█████████▏| 9240/10000 [33:41:45<2:43:02, 12.87s/it] {'loss': 0.0034, 'learning_rate': 3.89e-06, 'epoch': 3.48} 92%|█████████▏| 9240/10000 [33:41:45<2:43:02, 12.87s/it] 92%|█████████▏| 9241/10000 [33:41:58<2:42:58, 12.88s/it] {'loss': 0.0052, 'learning_rate': 3.885e-06, 'epoch': 3.48} 92%|█████████▏| 9241/10000 [33:41:58<2:42:58, 12.88s/it] 92%|█████████▏| 9242/10000 [33:42:10<2:42:35, 12.87s/it] {'loss': 0.0041, 'learning_rate': 3.88e-06, 'epoch': 3.48} 92%|█████████▏| 9242/10000 [33:42:10<2:42:35, 12.87s/it] 92%|█████████▏| 9243/10000 [33:42:23<2:42:35, 12.89s/it] {'loss': 0.0042, 'learning_rate': 3.875e-06, 'epoch': 3.48} 92%|█████████▏| 9243/10000 [33:42:23<2:42:35, 12.89s/it] 92%|█████████▏| 9244/10000 [33:42:36<2:42:25, 12.89s/it] {'loss': 0.0036, 'learning_rate': 3.87e-06, 'epoch': 3.48} 92%|█████████▏| 9244/10000 [33:42:36<2:42:25, 12.89s/it] 92%|█████████▏| 9245/10000 [33:42:49<2:42:12, 12.89s/it] {'loss': 0.0045, 'learning_rate': 3.865e-06, 'epoch': 3.48} 92%|█████████▏| 9245/10000 [33:42:49<2:42:12, 12.89s/it] 92%|█████████▏| 9246/10000 [33:43:02<2:41:58, 12.89s/it] {'loss': 0.0039, 'learning_rate': 3.86e-06, 'epoch': 3.48} 92%|█████████▏| 9246/10000 [33:43:02<2:41:58, 12.89s/it] 92%|█████████▏| 9247/10000 [33:43:15<2:41:55, 12.90s/it] {'loss': 0.0046, 'learning_rate': 3.855e-06, 'epoch': 3.48} 92%|█████████▏| 9247/10000 [33:43:15<2:41:55, 12.90s/it] 92%|█████████▏| 9248/10000 [33:43:28<2:41:41, 12.90s/it] {'loss': 0.0031, 'learning_rate': 
3.85e-06, 'epoch': 3.48} 92%|█████████▏| 9248/10000 [33:43:28<2:41:41, 12.90s/it] 92%|█████████▏| 9249/10000 [33:43:41<2:41:38, 12.91s/it] {'loss': 0.0044, 'learning_rate': 3.845e-06, 'epoch': 3.48} 92%|█████████▏| 9249/10000 [33:43:41<2:41:38, 12.91s/it] 92%|█████████▎| 9250/10000 [33:43:54<2:41:29, 12.92s/it] {'loss': 0.0034, 'learning_rate': 3.84e-06, 'epoch': 3.49} 92%|█████████▎| 9250/10000 [33:43:54<2:41:29, 12.92s/it] 93%|█████████▎| 9251/10000 [33:44:07<2:41:10, 12.91s/it] {'loss': 0.0035, 'learning_rate': 3.8350000000000006e-06, 'epoch': 3.49} 93%|█████████▎| 9251/10000 [33:44:07<2:41:10, 12.91s/it] 93%|█████████▎| 9252/10000 [33:44:20<2:41:08, 12.93s/it] {'loss': 0.0023, 'learning_rate': 3.830000000000001e-06, 'epoch': 3.49} 93%|█████████▎| 9252/10000 [33:44:20<2:41:08, 12.93s/it] 93%|█████████▎| 9253/10000 [33:44:32<2:40:49, 12.92s/it] {'loss': 0.0038, 'learning_rate': 3.825e-06, 'epoch': 3.49} 93%|█████████▎| 9253/10000 [33:44:32<2:40:49, 12.92s/it] 93%|█████████▎| 9254/10000 [33:44:45<2:40:40, 12.92s/it] {'loss': 0.0036, 'learning_rate': 3.82e-06, 'epoch': 3.49} 93%|█████████▎| 9254/10000 [33:44:45<2:40:40, 12.92s/it] 93%|█████████▎| 9255/10000 [33:44:58<2:40:22, 12.92s/it] {'loss': 0.0036, 'learning_rate': 3.815000000000001e-06, 'epoch': 3.49} 93%|█████████▎| 9255/10000 [33:44:58<2:40:22, 12.92s/it] 93%|█████████▎| 9256/10000 [33:45:11<2:40:02, 12.91s/it] {'loss': 0.0033, 'learning_rate': 3.8100000000000004e-06, 'epoch': 3.49} 93%|█████████▎| 9256/10000 [33:45:11<2:40:02, 12.91s/it] 93%|█████████▎| 9257/10000 [33:45:24<2:39:54, 12.91s/it] {'loss': 0.0032, 'learning_rate': 3.8050000000000004e-06, 'epoch': 3.49} 93%|█████████▎| 9257/10000 [33:45:24<2:39:54, 12.91s/it] 93%|█████████▎| 9258/10000 [33:45:37<2:39:14, 12.88s/it] {'loss': 0.003, 'learning_rate': 3.8e-06, 'epoch': 3.49} 93%|█████████▎| 9258/10000 [33:45:37<2:39:14, 12.88s/it] 93%|█████████▎| 9259/10000 [33:45:50<2:39:08, 12.89s/it] {'loss': 0.0038, 'learning_rate': 3.795e-06, 'epoch': 3.49} 
93%|█████████▎| 9259/10000 [33:45:50<2:39:08, 12.89s/it] 93%|█████████▎| 9260/10000 [33:46:03<2:38:53, 12.88s/it] {'loss': 0.0037, 'learning_rate': 3.7900000000000006e-06, 'epoch': 3.49} 93%|█████████▎| 9260/10000 [33:46:03<2:38:53, 12.88s/it] 93%|█████████▎| 9261/10000 [33:46:16<2:38:44, 12.89s/it] {'loss': 0.0045, 'learning_rate': 3.785e-06, 'epoch': 3.49} 93%|█████████▎| 9261/10000 [33:46:16<2:38:44, 12.89s/it] 93%|█████████▎| 9262/10000 [33:46:28<2:38:27, 12.88s/it] {'loss': 0.0032, 'learning_rate': 3.7800000000000002e-06, 'epoch': 3.49} 93%|█████████▎| 9262/10000 [33:46:28<2:38:27, 12.88s/it] 93%|█████████▎| 9263/10000 [33:46:41<2:38:09, 12.88s/it] {'loss': 0.0048, 'learning_rate': 3.775e-06, 'epoch': 3.49} 93%|█████████▎| 9263/10000 [33:46:41<2:38:09, 12.88s/it] 93%|█████████▎| 9264/10000 [33:46:54<2:37:55, 12.87s/it] {'loss': 0.0034, 'learning_rate': 3.77e-06, 'epoch': 3.49} 93%|█████████▎| 9264/10000 [33:46:54<2:37:55, 12.87s/it] 93%|█████████▎| 9265/10000 [33:47:07<2:37:40, 12.87s/it] {'loss': 0.0027, 'learning_rate': 3.7650000000000004e-06, 'epoch': 3.49} 93%|█████████▎| 9265/10000 [33:47:07<2:37:40, 12.87s/it] 93%|█████████▎| 9266/10000 [33:47:20<2:37:24, 12.87s/it] {'loss': 0.0042, 'learning_rate': 3.7600000000000004e-06, 'epoch': 3.49} 93%|█████████▎| 9266/10000 [33:47:20<2:37:24, 12.87s/it] 93%|█████████▎| 9267/10000 [33:47:33<2:37:09, 12.86s/it] {'loss': 0.0025, 'learning_rate': 3.755e-06, 'epoch': 3.49} 93%|█████████▎| 9267/10000 [33:47:33<2:37:09, 12.86s/it] 93%|█████████▎| 9268/10000 [33:47:46<2:36:50, 12.86s/it] {'loss': 0.0039, 'learning_rate': 3.75e-06, 'epoch': 3.49} 93%|█████████▎| 9268/10000 [33:47:46<2:36:50, 12.86s/it] 93%|█████████▎| 9269/10000 [33:47:58<2:36:43, 12.86s/it] {'loss': 0.0037, 'learning_rate': 3.7449999999999997e-06, 'epoch': 3.49} 93%|█████████▎| 9269/10000 [33:47:58<2:36:43, 12.86s/it] 93%|█████████▎| 9270/10000 [33:48:11<2:36:48, 12.89s/it] {'loss': 0.0049, 'learning_rate': 3.7400000000000006e-06, 'epoch': 3.49} 
93%|█████████▎| 9270/10000 [33:48:11<2:36:48, 12.89s/it] 93%|█████████▎| 9271/10000 [33:48:24<2:37:11, 12.94s/it] {'loss': 0.0035, 'learning_rate': 3.7350000000000002e-06, 'epoch': 3.49} 93%|█████████▎| 9271/10000 [33:48:24<2:37:11, 12.94s/it] 93%|█████████▎| 9272/10000 [33:48:37<2:37:05, 12.95s/it] {'loss': 0.0032, 'learning_rate': 3.7300000000000003e-06, 'epoch': 3.49} 93%|█████████▎| 9272/10000 [33:48:37<2:37:05, 12.95s/it] 93%|█████████▎| 9273/10000 [33:48:50<2:36:36, 12.92s/it] {'loss': 0.0044, 'learning_rate': 3.725e-06, 'epoch': 3.49} 93%|█████████▎| 9273/10000 [33:48:50<2:36:36, 12.92s/it] 93%|█████████▎| 9274/10000 [33:49:03<2:36:37, 12.94s/it] {'loss': 0.0038, 'learning_rate': 3.72e-06, 'epoch': 3.49} 93%|█████████▎| 9274/10000 [33:49:03<2:36:37, 12.94s/it] 93%|█████████▎| 9275/10000 [33:49:16<2:36:30, 12.95s/it] {'loss': 0.0038, 'learning_rate': 3.7150000000000004e-06, 'epoch': 3.49} 93%|█████████▎| 9275/10000 [33:49:16<2:36:30, 12.95s/it] 93%|█████████▎| 9276/10000 [33:49:29<2:36:14, 12.95s/it] {'loss': 0.0041, 'learning_rate': 3.7100000000000005e-06, 'epoch': 3.5} 93%|█████████▎| 9276/10000 [33:49:29<2:36:14, 12.95s/it] 93%|█████████▎| 9277/10000 [33:49:42<2:36:01, 12.95s/it] {'loss': 0.0039, 'learning_rate': 3.705e-06, 'epoch': 3.5} 93%|█████████▎| 9277/10000 [33:49:42<2:36:01, 12.95s/it] 93%|█████████▎| 9278/10000 [33:49:55<2:35:39, 12.94s/it] {'loss': 0.0035, 'learning_rate': 3.7e-06, 'epoch': 3.5} 93%|█████████▎| 9278/10000 [33:49:55<2:35:39, 12.94s/it] 93%|█████████▎| 9279/10000 [33:50:08<2:35:07, 12.91s/it] {'loss': 0.0041, 'learning_rate': 3.6949999999999998e-06, 'epoch': 3.5} 93%|█████████▎| 9279/10000 [33:50:08<2:35:07, 12.91s/it] 93%|█████████▎| 9280/10000 [33:50:21<2:34:57, 12.91s/it] {'loss': 0.0037, 'learning_rate': 3.6900000000000002e-06, 'epoch': 3.5} 93%|█████████▎| 9280/10000 [33:50:21<2:34:57, 12.91s/it] 93%|█████████▎| 9281/10000 [33:50:34<2:34:42, 12.91s/it] {'loss': 0.0039, 'learning_rate': 3.6850000000000003e-06, 'epoch': 3.5} 
93%|█████████▎| 9281/10000 [33:50:34<2:34:42, 12.91s/it] 93%|█████████▎| 9282/10000 [33:50:47<2:34:24, 12.90s/it] {'loss': 0.0052, 'learning_rate': 3.68e-06, 'epoch': 3.5} 93%|█████████▎| 9282/10000 [33:50:47<2:34:24, 12.90s/it] 93%|█████████▎| 9283/10000 [33:51:00<2:34:17, 12.91s/it] {'loss': 0.0034, 'learning_rate': 3.675e-06, 'epoch': 3.5} 93%|█████████▎| 9283/10000 [33:51:00<2:34:17, 12.91s/it] 93%|█████████▎| 9284/10000 [33:51:12<2:34:12, 12.92s/it] {'loss': 0.0039, 'learning_rate': 3.6700000000000004e-06, 'epoch': 3.5} 93%|█████████▎| 9284/10000 [33:51:13<2:34:12, 12.92s/it] 93%|█████████▎| 9285/10000 [33:51:25<2:33:54, 12.92s/it] {'loss': 0.0044, 'learning_rate': 3.6650000000000005e-06, 'epoch': 3.5} 93%|█████████▎| 9285/10000 [33:51:25<2:33:54, 12.92s/it] 93%|█████████▎| 9286/10000 [33:51:38<2:33:40, 12.91s/it] {'loss': 0.0052, 'learning_rate': 3.66e-06, 'epoch': 3.5} 93%|█████████▎| 9286/10000 [33:51:38<2:33:40, 12.91s/it] 93%|█████████▎| 9287/10000 [33:51:51<2:33:23, 12.91s/it] {'loss': 0.0055, 'learning_rate': 3.655e-06, 'epoch': 3.5} 93%|█████████▎| 9287/10000 [33:51:51<2:33:23, 12.91s/it] 93%|█████████▎| 9288/10000 [33:52:04<2:32:58, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.6499999999999998e-06, 'epoch': 3.5} 93%|█████████▎| 9288/10000 [33:52:04<2:32:58, 12.89s/it] 93%|█████████▎| 9289/10000 [33:52:17<2:32:44, 12.89s/it] {'loss': 0.003, 'learning_rate': 3.6450000000000007e-06, 'epoch': 3.5} 93%|█████████▎| 9289/10000 [33:52:17<2:32:44, 12.89s/it] 93%|█████████▎| 9290/10000 [33:52:30<2:32:27, 12.88s/it] {'loss': 0.0036, 'learning_rate': 3.6400000000000003e-06, 'epoch': 3.5} 93%|█████████▎| 9290/10000 [33:52:30<2:32:27, 12.88s/it] 93%|█████████▎| 9291/10000 [33:52:43<2:32:46, 12.93s/it] {'loss': 0.0034, 'learning_rate': 3.6350000000000003e-06, 'epoch': 3.5} 93%|█████████▎| 9291/10000 [33:52:43<2:32:46, 12.93s/it] 93%|█████████▎| 9292/10000 [33:52:56<2:32:44, 12.94s/it] {'loss': 0.0031, 'learning_rate': 3.63e-06, 'epoch': 3.5} 93%|█████████▎| 
9292/10000 [33:52:56<2:32:44, 12.94s/it] 93%|█████████▎| 9293/10000 [33:53:09<2:32:33, 12.95s/it] {'loss': 0.0046, 'learning_rate': 3.625e-06, 'epoch': 3.5} 93%|█████████▎| 9293/10000 [33:53:09<2:32:33, 12.95s/it] 93%|█████████▎| 9294/10000 [33:53:22<2:32:57, 13.00s/it] {'loss': 0.0033, 'learning_rate': 3.6200000000000005e-06, 'epoch': 3.5} 93%|█████████▎| 9294/10000 [33:53:22<2:32:57, 13.00s/it] 93%|█████████▎| 9295/10000 [33:53:35<2:32:38, 12.99s/it] {'loss': 0.0044, 'learning_rate': 3.6150000000000005e-06, 'epoch': 3.5} 93%|█████████▎| 9295/10000 [33:53:35<2:32:38, 12.99s/it] 93%|█████████▎| 9296/10000 [33:53:48<2:32:23, 12.99s/it] {'loss': 0.0051, 'learning_rate': 3.61e-06, 'epoch': 3.5} 93%|█████████▎| 9296/10000 [33:53:48<2:32:23, 12.99s/it] 93%|█████████▎| 9297/10000 [33:54:01<2:32:09, 12.99s/it] {'loss': 0.0037, 'learning_rate': 3.6050000000000002e-06, 'epoch': 3.5} 93%|█████████▎| 9297/10000 [33:54:01<2:32:09, 12.99s/it] 93%|█████████▎| 9298/10000 [33:54:14<2:31:53, 12.98s/it] {'loss': 0.0034, 'learning_rate': 3.6e-06, 'epoch': 3.5} 93%|█████████▎| 9298/10000 [33:54:14<2:31:53, 12.98s/it] 93%|█████████▎| 9299/10000 [33:54:27<2:31:48, 12.99s/it] {'loss': 0.0024, 'learning_rate': 3.5950000000000003e-06, 'epoch': 3.5} 93%|█████████▎| 9299/10000 [33:54:27<2:31:48, 12.99s/it] 93%|█████████▎| 9300/10000 [33:54:40<2:31:19, 12.97s/it] {'loss': 0.0034, 'learning_rate': 3.5900000000000004e-06, 'epoch': 3.5} 93%|█████████▎| 9300/10000 [33:54:40<2:31:19, 12.97s/it] 93%|█████████▎| 9301/10000 [33:54:53<2:31:17, 12.99s/it] {'loss': 0.0037, 'learning_rate': 3.585e-06, 'epoch': 3.5} 93%|█████████▎| 9301/10000 [33:54:53<2:31:17, 12.99s/it] 93%|█████████▎| 9302/10000 [33:55:06<2:30:31, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.58e-06, 'epoch': 3.5} 93%|█████████▎| 9302/10000 [33:55:06<2:30:31, 12.94s/it] 93%|█████████▎| 9303/10000 [33:55:19<2:30:23, 12.95s/it] {'loss': 0.0038, 'learning_rate': 3.575e-06, 'epoch': 3.51} 93%|█████████▎| 9303/10000 [33:55:19<2:30:23, 
12.95s/it] 93%|█████████▎| 9304/10000 [33:55:31<2:30:08, 12.94s/it] {'loss': 0.0052, 'learning_rate': 3.5700000000000005e-06, 'epoch': 3.51} 93%|█████████▎| 9304/10000 [33:55:31<2:30:08, 12.94s/it] 93%|█████████▎| 9305/10000 [33:55:44<2:29:37, 12.92s/it] {'loss': 0.0033, 'learning_rate': 3.565e-06, 'epoch': 3.51} 93%|█████████▎| 9305/10000 [33:55:44<2:29:37, 12.92s/it] 93%|█████████▎| 9306/10000 [33:55:57<2:29:41, 12.94s/it] {'loss': 0.0027, 'learning_rate': 3.5600000000000002e-06, 'epoch': 3.51} 93%|█████████▎| 9306/10000 [33:55:57<2:29:41, 12.94s/it] 93%|█████████▎| 9307/10000 [33:56:10<2:29:15, 12.92s/it] {'loss': 0.0042, 'learning_rate': 3.555e-06, 'epoch': 3.51} 93%|█████████▎| 9307/10000 [33:56:10<2:29:15, 12.92s/it] 93%|█████████▎| 9308/10000 [33:56:23<2:28:52, 12.91s/it] {'loss': 0.0047, 'learning_rate': 3.55e-06, 'epoch': 3.51} 93%|█████████▎| 9308/10000 [33:56:23<2:28:52, 12.91s/it] 93%|█████████▎| 9309/10000 [33:56:36<2:28:58, 12.94s/it] {'loss': 0.0038, 'learning_rate': 3.5450000000000004e-06, 'epoch': 3.51} 93%|█████████▎| 9309/10000 [33:56:36<2:28:58, 12.94s/it] 93%|█████████▎| 9310/10000 [33:56:49<2:28:45, 12.93s/it] {'loss': 0.004, 'learning_rate': 3.5400000000000004e-06, 'epoch': 3.51} 93%|█████████▎| 9310/10000 [33:56:49<2:28:45, 12.93s/it] 93%|█████████▎| 9311/10000 [33:57:02<2:28:18, 12.91s/it] {'loss': 0.0036, 'learning_rate': 3.535e-06, 'epoch': 3.51} 93%|█████████▎| 9311/10000 [33:57:02<2:28:18, 12.91s/it] 93%|█████████▎| 9312/10000 [33:57:15<2:28:07, 12.92s/it] {'loss': 0.0034, 'learning_rate': 3.53e-06, 'epoch': 3.51} 93%|█████████▎| 9312/10000 [33:57:15<2:28:07, 12.92s/it] 93%|█████████▎| 9313/10000 [33:57:28<2:27:47, 12.91s/it] {'loss': 0.004, 'learning_rate': 3.5249999999999997e-06, 'epoch': 3.51} 93%|█████████▎| 9313/10000 [33:57:28<2:27:47, 12.91s/it] 93%|█████████▎| 9314/10000 [33:57:41<2:27:36, 12.91s/it] {'loss': 0.0032, 'learning_rate': 3.52e-06, 'epoch': 3.51} 93%|█████████▎| 9314/10000 [33:57:41<2:27:36, 12.91s/it] 
93%|█████████▎| 9315/10000 [33:57:53<2:27:18, 12.90s/it] {'loss': 0.0026, 'learning_rate': 3.5150000000000002e-06, 'epoch': 3.51} 93%|█████████▎| 9315/10000 [33:57:54<2:27:18, 12.90s/it] 93%|█████████▎| 9316/10000 [33:58:06<2:27:11, 12.91s/it] {'loss': 0.0025, 'learning_rate': 3.5100000000000003e-06, 'epoch': 3.51} 93%|█████████▎| 9316/10000 [33:58:06<2:27:11, 12.91s/it] 93%|█████████▎| 9317/10000 [33:58:19<2:26:53, 12.90s/it] {'loss': 0.0035, 'learning_rate': 3.505e-06, 'epoch': 3.51} 93%|█████████▎| 9317/10000 [33:58:19<2:26:53, 12.90s/it] 93%|█████████▎| 9318/10000 [33:58:32<2:26:39, 12.90s/it] {'loss': 0.0042, 'learning_rate': 3.5000000000000004e-06, 'epoch': 3.51} 93%|█████████▎| 9318/10000 [33:58:32<2:26:39, 12.90s/it] 93%|█████████▎| 9319/10000 [33:58:45<2:26:19, 12.89s/it] {'loss': 0.0042, 'learning_rate': 3.4950000000000004e-06, 'epoch': 3.51} 93%|█████████▎| 9319/10000 [33:58:45<2:26:19, 12.89s/it] 93%|█████████▎| 9320/10000 [33:58:58<2:26:00, 12.88s/it] {'loss': 0.0043, 'learning_rate': 3.49e-06, 'epoch': 3.51} 93%|█████████▎| 9320/10000 [33:58:58<2:26:00, 12.88s/it] 93%|█████████▎| 9321/10000 [33:59:11<2:25:41, 12.87s/it] {'loss': 0.0032, 'learning_rate': 3.485e-06, 'epoch': 3.51} 93%|█████████▎| 9321/10000 [33:59:11<2:25:41, 12.87s/it] 93%|█████████▎| 9322/10000 [33:59:24<2:25:36, 12.89s/it] {'loss': 0.0031, 'learning_rate': 3.4799999999999997e-06, 'epoch': 3.51} 93%|█████████▎| 9322/10000 [33:59:24<2:25:36, 12.89s/it] 93%|█████████▎| 9323/10000 [33:59:37<2:25:26, 12.89s/it] {'loss': 0.0033, 'learning_rate': 3.4750000000000006e-06, 'epoch': 3.51} 93%|█████████▎| 9323/10000 [33:59:37<2:25:26, 12.89s/it] 93%|█████████▎| 9324/10000 [33:59:50<2:25:24, 12.91s/it] {'loss': 0.0037, 'learning_rate': 3.4700000000000002e-06, 'epoch': 3.51} 93%|█████████▎| 9324/10000 [33:59:50<2:25:24, 12.91s/it] 93%|█████████▎| 9325/10000 [34:00:02<2:25:18, 12.92s/it] {'loss': 0.0038, 'learning_rate': 3.4650000000000003e-06, 'epoch': 3.51} 93%|█████████▎| 9325/10000 
[34:00:03<2:25:18, 12.92s/it] 93%|█████████▎| 9326/10000 [34:00:15<2:25:14, 12.93s/it] {'loss': 0.0041, 'learning_rate': 3.46e-06, 'epoch': 3.51} 93%|█████████▎| 9326/10000 [34:00:15<2:25:14, 12.93s/it] 93%|█████████▎| 9327/10000 [34:00:28<2:25:04, 12.93s/it] {'loss': 0.004, 'learning_rate': 3.455e-06, 'epoch': 3.51} 93%|█████████▎| 9327/10000 [34:00:28<2:25:04, 12.93s/it] 93%|█████████▎| 9328/10000 [34:00:41<2:24:40, 12.92s/it] {'loss': 0.0032, 'learning_rate': 3.4500000000000004e-06, 'epoch': 3.51} 93%|█████████▎| 9328/10000 [34:00:41<2:24:40, 12.92s/it] 93%|█████████▎| 9329/10000 [34:00:54<2:24:12, 12.90s/it] {'loss': 0.0036, 'learning_rate': 3.4450000000000005e-06, 'epoch': 3.52} 93%|█████████▎| 9329/10000 [34:00:54<2:24:12, 12.90s/it] 93%|█████████▎| 9330/10000 [34:01:07<2:23:57, 12.89s/it] {'loss': 0.004, 'learning_rate': 3.44e-06, 'epoch': 3.52} 93%|█████████▎| 9330/10000 [34:01:07<2:23:57, 12.89s/it] 93%|█████████▎| 9331/10000 [34:01:20<2:23:35, 12.88s/it] {'loss': 0.0035, 'learning_rate': 3.435e-06, 'epoch': 3.52} 93%|█████████▎| 9331/10000 [34:01:20<2:23:35, 12.88s/it] 93%|█████████▎| 9332/10000 [34:01:33<2:23:16, 12.87s/it] {'loss': 0.0039, 'learning_rate': 3.4299999999999998e-06, 'epoch': 3.52} 93%|█████████▎| 9332/10000 [34:01:33<2:23:16, 12.87s/it] 93%|█████████▎| 9333/10000 [34:01:46<2:23:17, 12.89s/it] {'loss': 0.0038, 'learning_rate': 3.4250000000000002e-06, 'epoch': 3.52} 93%|█████████▎| 9333/10000 [34:01:46<2:23:17, 12.89s/it] 93%|█████████▎| 9334/10000 [34:01:58<2:23:02, 12.89s/it] {'loss': 0.0037, 'learning_rate': 3.4200000000000003e-06, 'epoch': 3.52} 93%|█████████▎| 9334/10000 [34:01:59<2:23:02, 12.89s/it] 93%|█████████▎| 9335/10000 [34:02:11<2:23:07, 12.91s/it] {'loss': 0.0033, 'learning_rate': 3.4150000000000003e-06, 'epoch': 3.52} 93%|█████████▎| 9335/10000 [34:02:11<2:23:07, 12.91s/it] 93%|█████████▎| 9336/10000 [34:02:24<2:22:41, 12.89s/it] {'loss': 0.0033, 'learning_rate': 3.41e-06, 'epoch': 3.52} 93%|█████████▎| 9336/10000 
[34:02:24<2:22:41, 12.89s/it] 93%|█████████▎| 9337/10000 [34:02:37<2:22:40, 12.91s/it] {'loss': 0.0045, 'learning_rate': 3.405e-06, 'epoch': 3.52} 93%|█████████▎| 9337/10000 [34:02:37<2:22:40, 12.91s/it] 93%|█████████▎| 9338/10000 [34:02:50<2:22:34, 12.92s/it] {'loss': 0.0036, 'learning_rate': 3.4000000000000005e-06, 'epoch': 3.52} 93%|█████████▎| 9338/10000 [34:02:50<2:22:34, 12.92s/it] 93%|█████████▎| 9339/10000 [34:03:03<2:22:12, 12.91s/it] {'loss': 0.0037, 'learning_rate': 3.395e-06, 'epoch': 3.52} 93%|█████████▎| 9339/10000 [34:03:03<2:22:12, 12.91s/it] 93%|█████████▎| 9340/10000 [34:03:16<2:21:59, 12.91s/it] {'loss': 0.0037, 'learning_rate': 3.39e-06, 'epoch': 3.52} 93%|█████████▎| 9340/10000 [34:03:16<2:21:59, 12.91s/it] 93%|█████████▎| 9341/10000 [34:03:29<2:21:55, 12.92s/it] {'loss': 0.004, 'learning_rate': 3.3849999999999998e-06, 'epoch': 3.52} 93%|█████████▎| 9341/10000 [34:03:29<2:21:55, 12.92s/it] 93%|█████████▎| 9342/10000 [34:03:42<2:21:39, 12.92s/it] {'loss': 0.0029, 'learning_rate': 3.38e-06, 'epoch': 3.52} 93%|█████████▎| 9342/10000 [34:03:42<2:21:39, 12.92s/it] 93%|█████████▎| 9343/10000 [34:03:55<2:21:45, 12.95s/it] {'loss': 0.0033, 'learning_rate': 3.3750000000000003e-06, 'epoch': 3.52} 93%|█████████▎| 9343/10000 [34:03:55<2:21:45, 12.95s/it] 93%|█████████▎| 9344/10000 [34:04:08<2:21:17, 12.92s/it] {'loss': 0.0031, 'learning_rate': 3.3700000000000003e-06, 'epoch': 3.52} 93%|█████████▎| 9344/10000 [34:04:08<2:21:17, 12.92s/it] 93%|█████████▎| 9345/10000 [34:04:21<2:21:09, 12.93s/it] {'loss': 0.0041, 'learning_rate': 3.365e-06, 'epoch': 3.52} 93%|█████████▎| 9345/10000 [34:04:21<2:21:09, 12.93s/it] 93%|█████████▎| 9346/10000 [34:04:34<2:20:45, 12.91s/it] {'loss': 0.0034, 'learning_rate': 3.36e-06, 'epoch': 3.52} 93%|█████████▎| 9346/10000 [34:04:34<2:20:45, 12.91s/it] 93%|█████████▎| 9347/10000 [34:04:46<2:20:23, 12.90s/it] {'loss': 0.0052, 'learning_rate': 3.3550000000000005e-06, 'epoch': 3.52} 93%|█████████▎| 9347/10000 [34:04:46<2:20:23, 
12.90s/it] 93%|█████████▎| 9348/10000 [34:04:59<2:20:07, 12.89s/it] {'loss': 0.0041, 'learning_rate': 3.3500000000000005e-06, 'epoch': 3.52} 93%|█████████▎| 9348/10000 [34:04:59<2:20:07, 12.89s/it] 93%|█████████▎| 9349/10000 [34:05:12<2:20:04, 12.91s/it] {'loss': 0.0042, 'learning_rate': 3.345e-06, 'epoch': 3.52} 93%|█████████▎| 9349/10000 [34:05:12<2:20:04, 12.91s/it] 94%|█████████▎| 9350/10000 [34:05:25<2:19:49, 12.91s/it] {'loss': 0.0039, 'learning_rate': 3.34e-06, 'epoch': 3.52} 94%|█████████▎| 9350/10000 [34:05:25<2:19:49, 12.91s/it] 94%|█████████▎| 9351/10000 [34:05:38<2:19:48, 12.93s/it] {'loss': 0.0034, 'learning_rate': 3.335e-06, 'epoch': 3.52} 94%|█████████▎| 9351/10000 [34:05:38<2:19:48, 12.93s/it] 94%|█████████▎| 9352/10000 [34:05:51<2:19:51, 12.95s/it] {'loss': 0.0031, 'learning_rate': 3.3300000000000003e-06, 'epoch': 3.52} 94%|█████████▎| 9352/10000 [34:05:51<2:19:51, 12.95s/it] 94%|█████████▎| 9353/10000 [34:06:04<2:19:41, 12.95s/it] {'loss': 0.0033, 'learning_rate': 3.3250000000000004e-06, 'epoch': 3.52} 94%|█████████▎| 9353/10000 [34:06:04<2:19:41, 12.95s/it] 94%|█████████▎| 9354/10000 [34:06:17<2:19:33, 12.96s/it] {'loss': 0.0045, 'learning_rate': 3.3200000000000004e-06, 'epoch': 3.52} 94%|█████████▎| 9354/10000 [34:06:17<2:19:33, 12.96s/it] 94%|█████████▎| 9355/10000 [34:06:30<2:19:19, 12.96s/it] {'loss': 0.0039, 'learning_rate': 3.315e-06, 'epoch': 3.52} 94%|█████████▎| 9355/10000 [34:06:30<2:19:19, 12.96s/it] 94%|█████████▎| 9356/10000 [34:06:43<2:18:59, 12.95s/it] {'loss': 0.0033, 'learning_rate': 3.31e-06, 'epoch': 3.53} 94%|█████████▎| 9356/10000 [34:06:43<2:18:59, 12.95s/it] 94%|█████████▎| 9357/10000 [34:06:56<2:18:27, 12.92s/it] {'loss': 0.0058, 'learning_rate': 3.3050000000000005e-06, 'epoch': 3.53} 94%|█████████▎| 9357/10000 [34:06:56<2:18:27, 12.92s/it] 94%|█████████▎| 9358/10000 [34:07:09<2:17:56, 12.89s/it] {'loss': 0.0041, 'learning_rate': 3.3e-06, 'epoch': 3.53} 94%|█████████▎| 9358/10000 [34:07:09<2:17:56, 12.89s/it] 
94%|█████████▎| 9359/10000 [34:07:21<2:17:30, 12.87s/it] {'loss': 0.0046, 'learning_rate': 3.2950000000000002e-06, 'epoch': 3.53} 94%|█████████▎| 9359/10000 [34:07:21<2:17:30, 12.87s/it] 94%|█████████▎| 9360/10000 [34:07:34<2:17:24, 12.88s/it] {'loss': 0.0037, 'learning_rate': 3.29e-06, 'epoch': 3.53} 94%|█████████▎| 9360/10000 [34:07:34<2:17:24, 12.88s/it] 94%|█████████▎| 9361/10000 [34:07:47<2:17:14, 12.89s/it] {'loss': 0.0038, 'learning_rate': 3.285e-06, 'epoch': 3.53} 94%|█████████▎| 9361/10000 [34:07:47<2:17:14, 12.89s/it] 94%|█████████▎| 9362/10000 [34:08:00<2:17:24, 12.92s/it] {'loss': 0.0044, 'learning_rate': 3.2800000000000004e-06, 'epoch': 3.53} 94%|█████████▎| 9362/10000 [34:08:00<2:17:24, 12.92s/it] 94%|█████████▎| 9363/10000 [34:08:13<2:17:06, 12.92s/it] {'loss': 0.0039, 'learning_rate': 3.2750000000000004e-06, 'epoch': 3.53} 94%|█████████▎| 9363/10000 [34:08:13<2:17:06, 12.92s/it] 94%|█████████▎| 9364/10000 [34:08:26<2:16:48, 12.91s/it] {'loss': 0.0032, 'learning_rate': 3.27e-06, 'epoch': 3.53} 94%|█████████▎| 9364/10000 [34:08:26<2:16:48, 12.91s/it] 94%|█████████▎| 9365/10000 [34:08:39<2:16:41, 12.92s/it] {'loss': 0.0038, 'learning_rate': 3.265e-06, 'epoch': 3.53} 94%|█████████▎| 9365/10000 [34:08:39<2:16:41, 12.92s/it] 94%|█████████▎| 9366/10000 [34:08:52<2:16:28, 12.92s/it] {'loss': 0.0029, 'learning_rate': 3.2599999999999997e-06, 'epoch': 3.53} 94%|█████████▎| 9366/10000 [34:08:52<2:16:28, 12.92s/it] 94%|█████████▎| 9367/10000 [34:09:05<2:16:12, 12.91s/it] {'loss': 0.0042, 'learning_rate': 3.2550000000000006e-06, 'epoch': 3.53} 94%|█████████▎| 9367/10000 [34:09:05<2:16:12, 12.91s/it] 94%|█████████▎| 9368/10000 [34:09:18<2:15:48, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.2500000000000002e-06, 'epoch': 3.53} 94%|█████████▎| 9368/10000 [34:09:18<2:15:48, 12.89s/it] 94%|█████████▎| 9369/10000 [34:09:31<2:15:39, 12.90s/it] {'loss': 0.0041, 'learning_rate': 3.2450000000000003e-06, 'epoch': 3.53} 94%|█████████▎| 9369/10000 [34:09:31<2:15:39, 
12.90s/it] 94%|█████████▎| 9370/10000 [34:09:44<2:15:31, 12.91s/it] {'loss': 0.0054, 'learning_rate': 3.24e-06, 'epoch': 3.53} 94%|█████████▎| 9370/10000 [34:09:44<2:15:31, 12.91s/it] 94%|█████████▎| 9371/10000 [34:09:56<2:15:08, 12.89s/it] {'loss': 0.0055, 'learning_rate': 3.235e-06, 'epoch': 3.53} 94%|█████████▎| 9371/10000 [34:09:56<2:15:08, 12.89s/it] 94%|█████████▎| 9372/10000 [34:10:09<2:15:01, 12.90s/it] {'loss': 0.0024, 'learning_rate': 3.2300000000000004e-06, 'epoch': 3.53} 94%|█████████▎| 9372/10000 [34:10:09<2:15:01, 12.90s/it] 94%|█████████▎| 9373/10000 [34:10:22<2:14:50, 12.90s/it] {'loss': 0.0037, 'learning_rate': 3.225e-06, 'epoch': 3.53} 94%|█████████▎| 9373/10000 [34:10:22<2:14:50, 12.90s/it] 94%|█████████▎| 9374/10000 [34:10:35<2:14:34, 12.90s/it] {'loss': 0.0035, 'learning_rate': 3.22e-06, 'epoch': 3.53} 94%|█████████▎| 9374/10000 [34:10:35<2:14:34, 12.90s/it] 94%|█████████▍| 9375/10000 [34:10:48<2:14:20, 12.90s/it] {'loss': 0.0026, 'learning_rate': 3.215e-06, 'epoch': 3.53} 94%|█████████▍| 9375/10000 [34:10:48<2:14:20, 12.90s/it] 94%|█████████▍| 9376/10000 [34:11:01<2:14:01, 12.89s/it] {'loss': 0.0037, 'learning_rate': 3.2099999999999998e-06, 'epoch': 3.53} 94%|█████████▍| 9376/10000 [34:11:01<2:14:01, 12.89s/it] 94%|█████████▍| 9377/10000 [34:11:14<2:13:47, 12.88s/it] {'loss': 0.0033, 'learning_rate': 3.2050000000000002e-06, 'epoch': 3.53} 94%|█████████▍| 9377/10000 [34:11:14<2:13:47, 12.88s/it] 94%|█████████▍| 9378/10000 [34:11:27<2:13:27, 12.87s/it] {'loss': 0.0034, 'learning_rate': 3.2000000000000003e-06, 'epoch': 3.53} 94%|█████████▍| 9378/10000 [34:11:27<2:13:27, 12.87s/it] 94%|█████████▍| 9379/10000 [34:11:40<2:13:28, 12.90s/it] {'loss': 0.0032, 'learning_rate': 3.195e-06, 'epoch': 3.53} 94%|█████████▍| 9379/10000 [34:11:40<2:13:28, 12.90s/it] 94%|█████████▍| 9380/10000 [34:11:52<2:13:17, 12.90s/it] {'loss': 0.0033, 'learning_rate': 3.19e-06, 'epoch': 3.53} 94%|█████████▍| 9380/10000 [34:11:52<2:13:17, 12.90s/it] 94%|█████████▍| 
9381/10000 [34:12:05<2:13:05, 12.90s/it] {'loss': 0.0033, 'learning_rate': 3.1850000000000004e-06, 'epoch': 3.53} 94%|█████████▍| 9381/10000 [34:12:05<2:13:05, 12.90s/it] 94%|█████████▍| 9382/10000 [34:12:18<2:13:04, 12.92s/it] {'loss': 0.0045, 'learning_rate': 3.1800000000000005e-06, 'epoch': 3.54} 94%|█████████▍| 9382/10000 [34:12:18<2:13:04, 12.92s/it] 94%|█████████▍| 9383/10000 [34:12:31<2:12:50, 12.92s/it] {'loss': 0.0034, 'learning_rate': 3.175e-06, 'epoch': 3.54} 94%|█████████▍| 9383/10000 [34:12:31<2:12:50, 12.92s/it] 94%|█████████▍| 9384/10000 [34:12:44<2:12:36, 12.92s/it] {'loss': 0.0026, 'learning_rate': 3.17e-06, 'epoch': 3.54} 94%|█████████▍| 9384/10000 [34:12:44<2:12:36, 12.92s/it] 94%|█████████▍| 9385/10000 [34:12:57<2:12:02, 12.88s/it] {'loss': 0.0039, 'learning_rate': 3.1649999999999998e-06, 'epoch': 3.54} 94%|█████████▍| 9385/10000 [34:12:57<2:12:02, 12.88s/it] 94%|█████████▍| 9386/10000 [34:13:10<2:12:00, 12.90s/it] {'loss': 0.006, 'learning_rate': 3.1600000000000007e-06, 'epoch': 3.54} 94%|█████████▍| 9386/10000 [34:13:10<2:12:00, 12.90s/it] 94%|█████████▍| 9387/10000 [34:13:23<2:11:49, 12.90s/it] {'loss': 0.0024, 'learning_rate': 3.1550000000000003e-06, 'epoch': 3.54} 94%|█████████▍| 9387/10000 [34:13:23<2:11:49, 12.90s/it] 94%|█████████▍| 9388/10000 [34:13:36<2:11:26, 12.89s/it] {'loss': 0.004, 'learning_rate': 3.1500000000000003e-06, 'epoch': 3.54} 94%|█████████▍| 9388/10000 [34:13:36<2:11:26, 12.89s/it] 94%|█████████▍| 9389/10000 [34:13:49<2:11:20, 12.90s/it] {'loss': 0.0042, 'learning_rate': 3.145e-06, 'epoch': 3.54} 94%|█████████▍| 9389/10000 [34:13:49<2:11:20, 12.90s/it] 94%|█████████▍| 9390/10000 [34:14:01<2:11:05, 12.89s/it] {'loss': 0.0037, 'learning_rate': 3.14e-06, 'epoch': 3.54} 94%|█████████▍| 9390/10000 [34:14:01<2:11:05, 12.89s/it] 94%|█████████▍| 9391/10000 [34:14:14<2:10:54, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.1350000000000005e-06, 'epoch': 3.54} 94%|█████████▍| 9391/10000 [34:14:14<2:10:54, 12.90s/it] 
94%|█████████▍| 9392/10000 [34:14:27<2:10:36, 12.89s/it] {'loss': 0.0031, 'learning_rate': 3.13e-06, 'epoch': 3.54} 94%|█████████▍| 9392/10000 [34:14:27<2:10:36, 12.89s/it] 94%|█████████▍| 9393/10000 [34:14:40<2:10:22, 12.89s/it] {'loss': 0.004, 'learning_rate': 3.125e-06, 'epoch': 3.54} 94%|█████████▍| 9393/10000 [34:14:40<2:10:22, 12.89s/it] 94%|█████████▍| 9394/10000 [34:14:53<2:10:20, 12.90s/it] {'loss': 0.0038, 'learning_rate': 3.12e-06, 'epoch': 3.54} 94%|█████████▍| 9394/10000 [34:14:53<2:10:20, 12.90s/it] 94%|█████████▍| 9395/10000 [34:15:06<2:10:00, 12.89s/it] {'loss': 0.0031, 'learning_rate': 3.1150000000000002e-06, 'epoch': 3.54} 94%|█████████▍| 9395/10000 [34:15:06<2:10:00, 12.89s/it] 94%|█████████▍| 9396/10000 [34:15:19<2:09:53, 12.90s/it] {'loss': 0.0036, 'learning_rate': 3.11e-06, 'epoch': 3.54} 94%|█████████▍| 9396/10000 [34:15:19<2:09:53, 12.90s/it] 94%|█████████▍| 9397/10000 [34:15:32<2:09:29, 12.88s/it] {'loss': 0.0041, 'learning_rate': 3.1050000000000003e-06, 'epoch': 3.54} 94%|█████████▍| 9397/10000 [34:15:32<2:09:29, 12.88s/it] 94%|█████████▍| 9398/10000 [34:15:45<2:09:25, 12.90s/it] {'loss': 0.0031, 'learning_rate': 3.1e-06, 'epoch': 3.54} 94%|█████████▍| 9398/10000 [34:15:45<2:09:25, 12.90s/it] 94%|█████████▍| 9399/10000 [34:15:57<2:08:59, 12.88s/it] {'loss': 0.0035, 'learning_rate': 3.095e-06, 'epoch': 3.54} 94%|█████████▍| 9399/10000 [34:15:57<2:08:59, 12.88s/it] 94%|█████████▍| 9400/10000 [34:16:10<2:08:57, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.09e-06, 'epoch': 3.54} 94%|█████████▍| 9400/10000 [34:16:10<2:08:57, 12.90s/it] 94%|█████████▍| 9401/10000 [34:16:23<2:08:31, 12.87s/it] {'loss': 0.0035, 'learning_rate': 3.085e-06, 'epoch': 3.54} 94%|█████████▍| 9401/10000 [34:16:23<2:08:31, 12.87s/it] 94%|█████████▍| 9402/10000 [34:16:36<2:08:16, 12.87s/it] {'loss': 0.0036, 'learning_rate': 3.08e-06, 'epoch': 3.54} 94%|█████████▍| 9402/10000 [34:16:36<2:08:16, 12.87s/it] 94%|█████████▍| 9403/10000 [34:16:49<2:08:30, 12.92s/it] {'loss': 
0.0034, 'learning_rate': 3.075e-06, 'epoch': 3.54} 94%|█████████▍| 9403/10000 [34:16:49<2:08:30, 12.92s/it] 94%|█████████▍| 9404/10000 [34:17:02<2:08:26, 12.93s/it] {'loss': 0.0038, 'learning_rate': 3.0700000000000003e-06, 'epoch': 3.54} 94%|█████████▍| 9404/10000 [34:17:02<2:08:26, 12.93s/it] 94%|█████████▍| 9405/10000 [34:17:15<2:08:01, 12.91s/it] {'loss': 0.0041, 'learning_rate': 3.0650000000000003e-06, 'epoch': 3.54} 94%|█████████▍| 9405/10000 [34:17:15<2:08:01, 12.91s/it] 94%|█████████▍| 9406/10000 [34:17:28<2:07:52, 12.92s/it] {'loss': 0.0049, 'learning_rate': 3.06e-06, 'epoch': 3.54} 94%|█████████▍| 9406/10000 [34:17:28<2:07:52, 12.92s/it] 94%|█████████▍| 9407/10000 [34:17:41<2:07:38, 12.92s/it] {'loss': 0.0039, 'learning_rate': 3.0550000000000004e-06, 'epoch': 3.54} 94%|█████████▍| 9407/10000 [34:17:41<2:07:38, 12.92s/it] 94%|█████████▍| 9408/10000 [34:17:54<2:07:30, 12.92s/it] {'loss': 0.0034, 'learning_rate': 3.05e-06, 'epoch': 3.54} 94%|█████████▍| 9408/10000 [34:17:54<2:07:30, 12.92s/it] 94%|█████████▍| 9409/10000 [34:18:07<2:07:02, 12.90s/it] {'loss': 0.0044, 'learning_rate': 3.0450000000000005e-06, 'epoch': 3.55} 94%|█████████▍| 9409/10000 [34:18:07<2:07:02, 12.90s/it] 94%|█████████▍| 9410/10000 [34:18:19<2:06:52, 12.90s/it] {'loss': 0.0034, 'learning_rate': 3.04e-06, 'epoch': 3.55} 94%|█████████▍| 9410/10000 [34:18:19<2:06:52, 12.90s/it] 94%|█████████▍| 9411/10000 [34:18:32<2:06:39, 12.90s/it] {'loss': 0.0026, 'learning_rate': 3.035e-06, 'epoch': 3.55} 94%|█████████▍| 9411/10000 [34:18:32<2:06:39, 12.90s/it] 94%|█████████▍| 9412/10000 [34:18:45<2:06:29, 12.91s/it] {'loss': 0.0031, 'learning_rate': 3.0300000000000002e-06, 'epoch': 3.55} 94%|█████████▍| 9412/10000 [34:18:45<2:06:29, 12.91s/it] 94%|█████████▍| 9413/10000 [34:18:58<2:06:15, 12.91s/it] {'loss': 0.0027, 'learning_rate': 3.0250000000000003e-06, 'epoch': 3.55} 94%|█████████▍| 9413/10000 [34:18:58<2:06:15, 12.91s/it] 94%|█████████▍| 9414/10000 [34:19:11<2:05:52, 12.89s/it] {'loss': 0.004, 
'learning_rate': 3.0200000000000003e-06, 'epoch': 3.55} 94%|█████████▍| 9414/10000 [34:19:11<2:05:52, 12.89s/it] 94%|█████████▍| 9415/10000 [34:19:24<2:05:54, 12.91s/it] {'loss': 0.0032, 'learning_rate': 3.015e-06, 'epoch': 3.55} 94%|█████████▍| 9415/10000 [34:19:24<2:05:54, 12.91s/it] 94%|█████████▍| 9416/10000 [34:19:37<2:05:28, 12.89s/it] {'loss': 0.0043, 'learning_rate': 3.01e-06, 'epoch': 3.55} 94%|█████████▍| 9416/10000 [34:19:37<2:05:28, 12.89s/it] 94%|█████████▍| 9417/10000 [34:19:50<2:05:21, 12.90s/it] {'loss': 0.0035, 'learning_rate': 3.005e-06, 'epoch': 3.55} 94%|█████████▍| 9417/10000 [34:19:50<2:05:21, 12.90s/it] 94%|█████████▍| 9418/10000 [34:20:03<2:04:59, 12.89s/it] {'loss': 0.0036, 'learning_rate': 3e-06, 'epoch': 3.55} 94%|█████████▍| 9418/10000 [34:20:03<2:04:59, 12.89s/it] 94%|█████████▍| 9419/10000 [34:20:15<2:04:34, 12.86s/it] {'loss': 0.0041, 'learning_rate': 2.995e-06, 'epoch': 3.55} 94%|█████████▍| 9419/10000 [34:20:15<2:04:34, 12.86s/it] 94%|█████████▍| 9420/10000 [34:20:28<2:04:33, 12.89s/it] {'loss': 0.0036, 'learning_rate': 2.99e-06, 'epoch': 3.55} 94%|█████████▍| 9420/10000 [34:20:28<2:04:33, 12.89s/it] 94%|█████████▍| 9421/10000 [34:20:41<2:04:19, 12.88s/it] {'loss': 0.0026, 'learning_rate': 2.9850000000000002e-06, 'epoch': 3.55} 94%|█████████▍| 9421/10000 [34:20:41<2:04:19, 12.88s/it] 94%|█████████▍| 9422/10000 [34:20:54<2:04:09, 12.89s/it] {'loss': 0.0033, 'learning_rate': 2.9800000000000003e-06, 'epoch': 3.55} 94%|█████████▍| 9422/10000 [34:20:54<2:04:09, 12.89s/it] 94%|█████████▍| 9423/10000 [34:21:07<2:03:50, 12.88s/it] {'loss': 0.004, 'learning_rate': 2.975e-06, 'epoch': 3.55} 94%|█████████▍| 9423/10000 [34:21:07<2:03:50, 12.88s/it] 94%|█████████▍| 9424/10000 [34:21:20<2:03:47, 12.89s/it] {'loss': 0.0032, 'learning_rate': 2.9700000000000004e-06, 'epoch': 3.55} 94%|█████████▍| 9424/10000 [34:21:20<2:03:47, 12.89s/it] 94%|█████████▍| 9425/10000 [34:21:33<2:03:37, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.965e-06, 'epoch': 
3.55} 94%|█████████▍| 9425/10000 [34:21:33<2:03:37, 12.90s/it] 94%|█████████▍| 9426/10000 [34:21:46<2:03:32, 12.91s/it] {'loss': 0.0041, 'learning_rate': 2.9600000000000005e-06, 'epoch': 3.55} 94%|█████████▍| 9426/10000 [34:21:46<2:03:32, 12.91s/it] 94%|█████████▍| 9427/10000 [34:21:59<2:03:15, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.955e-06, 'epoch': 3.55} 94%|█████████▍| 9427/10000 [34:21:59<2:03:15, 12.91s/it] 94%|█████████▍| 9428/10000 [34:22:12<2:02:58, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.95e-06, 'epoch': 3.55} 94%|█████████▍| 9428/10000 [34:22:12<2:02:58, 12.90s/it] 94%|█████████▍| 9429/10000 [34:22:24<2:02:40, 12.89s/it] {'loss': 0.0036, 'learning_rate': 2.945e-06, 'epoch': 3.55} 94%|█████████▍| 9429/10000 [34:22:24<2:02:40, 12.89s/it] 94%|█████████▍| 9430/10000 [34:22:37<2:02:35, 12.90s/it] {'loss': 0.0041, 'learning_rate': 2.9400000000000002e-06, 'epoch': 3.55} 94%|█████████▍| 9430/10000 [34:22:37<2:02:35, 12.90s/it] 94%|█████████▍| 9431/10000 [34:22:50<2:02:22, 12.90s/it] {'loss': 0.004, 'learning_rate': 2.9350000000000003e-06, 'epoch': 3.55} 94%|█████████▍| 9431/10000 [34:22:50<2:02:22, 12.90s/it] 94%|█████████▍| 9432/10000 [34:23:03<2:02:12, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.93e-06, 'epoch': 3.55} 94%|█████████▍| 9432/10000 [34:23:03<2:02:12, 12.91s/it] 94%|█████████▍| 9433/10000 [34:23:16<2:01:58, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.9250000000000004e-06, 'epoch': 3.55} 94%|█████████▍| 9433/10000 [34:23:16<2:01:58, 12.91s/it] 94%|█████████▍| 9434/10000 [34:23:29<2:01:40, 12.90s/it] {'loss': 0.0042, 'learning_rate': 2.92e-06, 'epoch': 3.55} 94%|█████████▍| 9434/10000 [34:23:29<2:01:40, 12.90s/it] 94%|█████████▍| 9435/10000 [34:23:42<2:01:32, 12.91s/it] {'loss': 0.0048, 'learning_rate': 2.915e-06, 'epoch': 3.56} 94%|█████████▍| 9435/10000 [34:23:42<2:01:32, 12.91s/it] 94%|█████████▍| 9436/10000 [34:23:55<2:01:24, 12.92s/it] {'loss': 0.0028, 'learning_rate': 2.91e-06, 'epoch': 3.56} 94%|█████████▍| 9436/10000 
[34:23:55<2:01:24, 12.92s/it] 94%|█████████▍| 9437/10000 [34:24:08<2:01:15, 12.92s/it] {'loss': 0.0044, 'learning_rate': 2.905e-06, 'epoch': 3.56} 94%|█████████▍| 9437/10000 [34:24:08<2:01:15, 12.92s/it] 94%|█████████▍| 9438/10000 [34:24:21<2:00:53, 12.91s/it] {'loss': 0.004, 'learning_rate': 2.9e-06, 'epoch': 3.56} 94%|█████████▍| 9438/10000 [34:24:21<2:00:53, 12.91s/it] 94%|█████████▍| 9439/10000 [34:24:34<2:00:42, 12.91s/it] {'loss': 0.0028, 'learning_rate': 2.8950000000000002e-06, 'epoch': 3.56} 94%|█████████▍| 9439/10000 [34:24:34<2:00:42, 12.91s/it] 94%|█████████▍| 9440/10000 [34:24:46<2:00:31, 12.91s/it] {'loss': 0.0029, 'learning_rate': 2.89e-06, 'epoch': 3.56} 94%|█████████▍| 9440/10000 [34:24:46<2:00:31, 12.91s/it] 94%|█████████▍| 9441/10000 [34:24:59<2:00:19, 12.91s/it] {'loss': 0.0034, 'learning_rate': 2.8850000000000003e-06, 'epoch': 3.56} 94%|█████████▍| 9441/10000 [34:24:59<2:00:19, 12.91s/it] 94%|█████████▍| 9442/10000 [34:25:12<2:00:08, 12.92s/it] {'loss': 0.0039, 'learning_rate': 2.88e-06, 'epoch': 3.56} 94%|█████████▍| 9442/10000 [34:25:12<2:00:08, 12.92s/it] 94%|█████████▍| 9443/10000 [34:25:25<1:59:47, 12.90s/it] {'loss': 0.0044, 'learning_rate': 2.8750000000000004e-06, 'epoch': 3.56} 94%|█████████▍| 9443/10000 [34:25:25<1:59:47, 12.90s/it] 94%|█████████▍| 9444/10000 [34:25:38<1:59:23, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.87e-06, 'epoch': 3.56} 94%|█████████▍| 9444/10000 [34:25:38<1:59:23, 12.88s/it] 94%|█████████▍| 9445/10000 [34:25:51<1:59:04, 12.87s/it] {'loss': 0.0042, 'learning_rate': 2.865e-06, 'epoch': 3.56} 94%|█████████▍| 9445/10000 [34:25:51<1:59:04, 12.87s/it] 94%|█████████▍| 9446/10000 [34:26:04<1:58:49, 12.87s/it] {'loss': 0.0048, 'learning_rate': 2.86e-06, 'epoch': 3.56} 94%|█████████▍| 9446/10000 [34:26:04<1:58:49, 12.87s/it] 94%|█████████▍| 9447/10000 [34:26:17<1:58:57, 12.91s/it] {'loss': 0.0025, 'learning_rate': 2.855e-06, 'epoch': 3.56} 94%|█████████▍| 9447/10000 [34:26:17<1:58:57, 12.91s/it] 94%|█████████▍| 
9448/10000 [34:26:30<1:58:32, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.8500000000000002e-06, 'epoch': 3.56} 94%|█████████▍| 9448/10000 [34:26:30<1:58:32, 12.89s/it] 94%|█████████▍| 9449/10000 [34:26:43<1:58:37, 12.92s/it] {'loss': 0.0025, 'learning_rate': 2.8450000000000003e-06, 'epoch': 3.56} 94%|█████████▍| 9449/10000 [34:26:43<1:58:37, 12.92s/it] 94%|█████████▍| 9450/10000 [34:26:55<1:58:24, 12.92s/it] {'loss': 0.003, 'learning_rate': 2.8400000000000003e-06, 'epoch': 3.56} 94%|█████████▍| 9450/10000 [34:26:55<1:58:24, 12.92s/it] 95%|█████████▍| 9451/10000 [34:27:08<1:58:21, 12.94s/it] {'loss': 0.0035, 'learning_rate': 2.835e-06, 'epoch': 3.56} 95%|█████████▍| 9451/10000 [34:27:09<1:58:21, 12.94s/it] 95%|█████████▍| 9452/10000 [34:27:21<1:58:11, 12.94s/it] {'loss': 0.0026, 'learning_rate': 2.83e-06, 'epoch': 3.56} 95%|█████████▍| 9452/10000 [34:27:21<1:58:11, 12.94s/it] 95%|█████████▍| 9453/10000 [34:27:34<1:57:46, 12.92s/it] {'loss': 0.0047, 'learning_rate': 2.825e-06, 'epoch': 3.56} 95%|█████████▍| 9453/10000 [34:27:34<1:57:46, 12.92s/it] 95%|█████████▍| 9454/10000 [34:27:47<1:57:29, 12.91s/it] {'loss': 0.0035, 'learning_rate': 2.82e-06, 'epoch': 3.56} 95%|█████████▍| 9454/10000 [34:27:47<1:57:29, 12.91s/it] 95%|█████████▍| 9455/10000 [34:28:00<1:57:18, 12.92s/it] {'loss': 0.0032, 'learning_rate': 2.815e-06, 'epoch': 3.56} 95%|█████████▍| 9455/10000 [34:28:00<1:57:18, 12.92s/it] 95%|█████████▍| 9456/10000 [34:28:13<1:56:51, 12.89s/it] {'loss': 0.0045, 'learning_rate': 2.81e-06, 'epoch': 3.56} 95%|█████████▍| 9456/10000 [34:28:13<1:56:51, 12.89s/it] 95%|█████████▍| 9457/10000 [34:28:26<1:56:33, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.805e-06, 'epoch': 3.56} 95%|█████████▍| 9457/10000 [34:28:26<1:56:33, 12.88s/it] 95%|█████████▍| 9458/10000 [34:28:39<1:56:32, 12.90s/it] {'loss': 0.0037, 'learning_rate': 2.8000000000000003e-06, 'epoch': 3.56} 95%|█████████▍| 9458/10000 [34:28:39<1:56:32, 12.90s/it] 95%|█████████▍| 9459/10000 [34:28:52<1:56:17, 
12.90s/it] {'loss': 0.003, 'learning_rate': 2.795e-06, 'epoch': 3.56} 95%|█████████▍| 9459/10000 [34:28:52<1:56:17, 12.90s/it] 95%|█████████▍| 9460/10000 [34:29:05<1:56:09, 12.91s/it] {'loss': 0.0038, 'learning_rate': 2.7900000000000004e-06, 'epoch': 3.56} 95%|█████████▍| 9460/10000 [34:29:05<1:56:09, 12.91s/it] 95%|█████████▍| 9461/10000 [34:29:17<1:55:59, 12.91s/it] {'loss': 0.0039, 'learning_rate': 2.785e-06, 'epoch': 3.56} 95%|█████████▍| 9461/10000 [34:29:17<1:55:59, 12.91s/it] 95%|█████████▍| 9462/10000 [34:29:30<1:55:49, 12.92s/it] {'loss': 0.0029, 'learning_rate': 2.78e-06, 'epoch': 3.57} 95%|█████████▍| 9462/10000 [34:29:30<1:55:49, 12.92s/it] 95%|█████████▍| 9463/10000 [34:29:43<1:55:44, 12.93s/it] {'loss': 0.0039, 'learning_rate': 2.775e-06, 'epoch': 3.57} 95%|█████████▍| 9463/10000 [34:29:43<1:55:44, 12.93s/it] 95%|█████████▍| 9464/10000 [34:29:56<1:55:32, 12.93s/it] {'loss': 0.0032, 'learning_rate': 2.77e-06, 'epoch': 3.57} 95%|█████████▍| 9464/10000 [34:29:56<1:55:32, 12.93s/it] 95%|█████████▍| 9465/10000 [34:30:09<1:55:23, 12.94s/it] {'loss': 0.0031, 'learning_rate': 2.765e-06, 'epoch': 3.57} 95%|█████████▍| 9465/10000 [34:30:09<1:55:23, 12.94s/it] 95%|█████████▍| 9466/10000 [34:30:22<1:55:19, 12.96s/it] {'loss': 0.0045, 'learning_rate': 2.7600000000000003e-06, 'epoch': 3.57} 95%|█████████▍| 9466/10000 [34:30:22<1:55:19, 12.96s/it] 95%|█████████▍| 9467/10000 [34:30:35<1:55:00, 12.95s/it] {'loss': 0.0027, 'learning_rate': 2.7550000000000003e-06, 'epoch': 3.57} 95%|█████████▍| 9467/10000 [34:30:35<1:55:00, 12.95s/it] 95%|█████████▍| 9468/10000 [34:30:48<1:54:37, 12.93s/it] {'loss': 0.0037, 'learning_rate': 2.7500000000000004e-06, 'epoch': 3.57} 95%|█████████▍| 9468/10000 [34:30:48<1:54:37, 12.93s/it] 95%|█████████▍| 9469/10000 [34:31:01<1:54:14, 12.91s/it] {'loss': 0.0044, 'learning_rate': 2.745e-06, 'epoch': 3.57} 95%|█████████▍| 9469/10000 [34:31:01<1:54:14, 12.91s/it] 95%|█████████▍| 9470/10000 [34:31:14<1:54:00, 12.91s/it] {'loss': 0.0034, 
'learning_rate': 2.74e-06, 'epoch': 3.57} 95%|█████████▍| 9470/10000 [34:31:14<1:54:00, 12.91s/it] 95%|█████████▍| 9471/10000 [34:31:27<1:53:45, 12.90s/it] {'loss': 0.0039, 'learning_rate': 2.735e-06, 'epoch': 3.57} 95%|█████████▍| 9471/10000 [34:31:27<1:53:45, 12.90s/it] 95%|█████████▍| 9472/10000 [34:31:40<1:53:31, 12.90s/it] {'loss': 0.0043, 'learning_rate': 2.73e-06, 'epoch': 3.57} 95%|█████████▍| 9472/10000 [34:31:40<1:53:31, 12.90s/it] 95%|█████████▍| 9473/10000 [34:31:52<1:53:13, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.725e-06, 'epoch': 3.57} 95%|█████████▍| 9473/10000 [34:31:52<1:53:13, 12.89s/it] 95%|█████████▍| 9474/10000 [34:32:05<1:53:06, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.72e-06, 'epoch': 3.57} 95%|█████████▍| 9474/10000 [34:32:05<1:53:06, 12.90s/it] 95%|█████████▍| 9475/10000 [34:32:18<1:52:53, 12.90s/it] {'loss': 0.0038, 'learning_rate': 2.7150000000000003e-06, 'epoch': 3.57} 95%|█████████▍| 9475/10000 [34:32:18<1:52:53, 12.90s/it] 95%|█████████▍| 9476/10000 [34:32:31<1:52:47, 12.92s/it] {'loss': 0.0036, 'learning_rate': 2.71e-06, 'epoch': 3.57} 95%|█████████▍| 9476/10000 [34:32:31<1:52:47, 12.92s/it] 95%|█████████▍| 9477/10000 [34:32:44<1:52:38, 12.92s/it] {'loss': 0.0029, 'learning_rate': 2.7050000000000004e-06, 'epoch': 3.57} 95%|█████████▍| 9477/10000 [34:32:44<1:52:38, 12.92s/it] 95%|█████████▍| 9478/10000 [34:32:57<1:52:30, 12.93s/it] {'loss': 0.0033, 'learning_rate': 2.7e-06, 'epoch': 3.57} 95%|█████████▍| 9478/10000 [34:32:57<1:52:30, 12.93s/it] 95%|█████████▍| 9479/10000 [34:33:10<1:52:16, 12.93s/it] {'loss': 0.0032, 'learning_rate': 2.6950000000000005e-06, 'epoch': 3.57} 95%|█████████▍| 9479/10000 [34:33:10<1:52:16, 12.93s/it] 95%|█████████▍| 9480/10000 [34:33:23<1:51:54, 12.91s/it] {'loss': 0.0034, 'learning_rate': 2.69e-06, 'epoch': 3.57} 95%|█████████▍| 9480/10000 [34:33:23<1:51:54, 12.91s/it] 95%|█████████▍| 9481/10000 [34:33:36<1:51:21, 12.87s/it] {'loss': 0.004, 'learning_rate': 2.685e-06, 'epoch': 3.57} 
95%|█████████▍| 9481/10000 [34:33:36<1:51:21, 12.87s/it] 95%|█████████▍| 9482/10000 [34:33:49<1:51:09, 12.87s/it] {'loss': 0.0035, 'learning_rate': 2.68e-06, 'epoch': 3.57} 95%|█████████▍| 9482/10000 [34:33:49<1:51:09, 12.87s/it] 95%|█████████▍| 9483/10000 [34:34:01<1:50:54, 12.87s/it] {'loss': 0.0038, 'learning_rate': 2.6750000000000002e-06, 'epoch': 3.57} 95%|█████████▍| 9483/10000 [34:34:01<1:50:54, 12.87s/it] 95%|█████████▍| 9484/10000 [34:34:14<1:50:39, 12.87s/it] {'loss': 0.0034, 'learning_rate': 2.6700000000000003e-06, 'epoch': 3.57} 95%|█████████▍| 9484/10000 [34:34:14<1:50:39, 12.87s/it] 95%|█████████▍| 9485/10000 [34:34:27<1:50:28, 12.87s/it] {'loss': 0.0026, 'learning_rate': 2.6650000000000003e-06, 'epoch': 3.57} 95%|█████████▍| 9485/10000 [34:34:27<1:50:28, 12.87s/it] 95%|█████████▍| 9486/10000 [34:34:40<1:50:26, 12.89s/it] {'loss': 0.003, 'learning_rate': 2.66e-06, 'epoch': 3.57} 95%|█████████▍| 9486/10000 [34:34:40<1:50:26, 12.89s/it] 95%|█████████▍| 9487/10000 [34:34:53<1:50:08, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.655e-06, 'epoch': 3.57} 95%|█████████▍| 9487/10000 [34:34:53<1:50:08, 12.88s/it] 95%|█████████▍| 9488/10000 [34:35:06<1:49:49, 12.87s/it] {'loss': 0.0025, 'learning_rate': 2.65e-06, 'epoch': 3.57} 95%|█████████▍| 9488/10000 [34:35:06<1:49:49, 12.87s/it] 95%|█████████▍| 9489/10000 [34:35:19<1:49:30, 12.86s/it] {'loss': 0.0036, 'learning_rate': 2.645e-06, 'epoch': 3.58} 95%|█████████▍| 9489/10000 [34:35:19<1:49:30, 12.86s/it] 95%|█████████▍| 9490/10000 [34:35:32<1:49:29, 12.88s/it] {'loss': 0.0042, 'learning_rate': 2.64e-06, 'epoch': 3.58} 95%|█████████▍| 9490/10000 [34:35:32<1:49:29, 12.88s/it] 95%|█████████▍| 9491/10000 [34:35:44<1:49:15, 12.88s/it] {'loss': 0.0034, 'learning_rate': 2.6349999999999998e-06, 'epoch': 3.58} 95%|█████████▍| 9491/10000 [34:35:44<1:49:15, 12.88s/it] 95%|█████████▍| 9492/10000 [34:35:57<1:49:01, 12.88s/it] {'loss': 0.0038, 'learning_rate': 2.6300000000000002e-06, 'epoch': 3.58} 95%|█████████▍| 
9492/10000 [34:35:57<1:49:01, 12.88s/it] 95%|█████████▍| 9493/10000 [34:36:10<1:48:49, 12.88s/it] {'loss': 0.0039, 'learning_rate': 2.625e-06, 'epoch': 3.58} 95%|█████████▍| 9493/10000 [34:36:10<1:48:49, 12.88s/it] 95%|█████████▍| 9494/10000 [34:36:23<1:48:32, 12.87s/it] {'loss': 0.0034, 'learning_rate': 2.6200000000000003e-06, 'epoch': 3.58} 95%|█████████▍| 9494/10000 [34:36:23<1:48:32, 12.87s/it] 95%|█████████▍| 9495/10000 [34:36:36<1:48:31, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.615e-06, 'epoch': 3.58} 95%|█████████▍| 9495/10000 [34:36:36<1:48:31, 12.89s/it] 95%|█████████▍| 9496/10000 [34:36:49<1:48:27, 12.91s/it] {'loss': 0.0036, 'learning_rate': 2.6100000000000004e-06, 'epoch': 3.58} 95%|█████████▍| 9496/10000 [34:36:49<1:48:27, 12.91s/it] 95%|█████████▍| 9497/10000 [34:37:02<1:48:16, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.605e-06, 'epoch': 3.58} 95%|█████████▍| 9497/10000 [34:37:02<1:48:16, 12.92s/it] 95%|█████████▍| 9498/10000 [34:37:15<1:47:58, 12.90s/it] {'loss': 0.0032, 'learning_rate': 2.6e-06, 'epoch': 3.58} 95%|█████████▍| 9498/10000 [34:37:15<1:47:58, 12.90s/it] 95%|█████████▍| 9499/10000 [34:37:28<1:47:40, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.595e-06, 'epoch': 3.58} 95%|█████████▍| 9499/10000 [34:37:28<1:47:40, 12.90s/it] 95%|█████████▌| 9500/10000 [34:37:41<1:47:24, 12.89s/it] {'loss': 0.0028, 'learning_rate': 2.59e-06, 'epoch': 3.58} 95%|█████████▌| 9500/10000 [34:37:41<1:47:24, 12.89s/it] 95%|█████████▌| 9501/10000 [34:37:53<1:46:59, 12.86s/it] {'loss': 0.005, 'learning_rate': 2.5850000000000002e-06, 'epoch': 3.58} 95%|█████████▌| 9501/10000 [34:37:53<1:46:59, 12.86s/it] 95%|█████████▌| 9502/10000 [34:38:06<1:46:46, 12.86s/it] {'loss': 0.0038, 'learning_rate': 2.5800000000000003e-06, 'epoch': 3.58} 95%|█████████▌| 9502/10000 [34:38:06<1:46:46, 12.86s/it] 95%|█████████▌| 9503/10000 [34:38:19<1:46:30, 12.86s/it] {'loss': 0.0037, 'learning_rate': 2.575e-06, 'epoch': 3.58} 95%|█████████▌| 9503/10000 [34:38:19<1:46:30, 
12.86s/it] 95%|█████████▌| 9504/10000 [34:38:32<1:46:24, 12.87s/it] {'loss': 0.0037, 'learning_rate': 2.5700000000000004e-06, 'epoch': 3.58} 95%|█████████▌| 9504/10000 [34:38:32<1:46:24, 12.87s/it] 95%|█████████▌| 9505/10000 [34:38:45<1:46:17, 12.88s/it] {'loss': 0.0021, 'learning_rate': 2.565e-06, 'epoch': 3.58} 95%|█████████▌| 9505/10000 [34:38:45<1:46:17, 12.88s/it] 95%|█████████▌| 9506/10000 [34:38:58<1:46:10, 12.89s/it] {'loss': 0.0033, 'learning_rate': 2.56e-06, 'epoch': 3.58} 95%|█████████▌| 9506/10000 [34:38:58<1:46:10, 12.89s/it] 95%|█████████▌| 9507/10000 [34:39:11<1:45:57, 12.90s/it] {'loss': 0.0039, 'learning_rate': 2.555e-06, 'epoch': 3.58} 95%|█████████▌| 9507/10000 [34:39:11<1:45:57, 12.90s/it] 95%|█████████▌| 9508/10000 [34:39:24<1:45:51, 12.91s/it] {'loss': 0.0039, 'learning_rate': 2.55e-06, 'epoch': 3.58} 95%|█████████▌| 9508/10000 [34:39:24<1:45:51, 12.91s/it] 95%|█████████▌| 9509/10000 [34:39:37<1:45:40, 12.91s/it] {'loss': 0.003, 'learning_rate': 2.545e-06, 'epoch': 3.58} 95%|█████████▌| 9509/10000 [34:39:37<1:45:40, 12.91s/it] 95%|█████████▌| 9510/10000 [34:39:49<1:45:24, 12.91s/it] {'loss': 0.0032, 'learning_rate': 2.54e-06, 'epoch': 3.58} 95%|█████████▌| 9510/10000 [34:39:49<1:45:24, 12.91s/it] 95%|█████████▌| 9511/10000 [34:40:02<1:45:03, 12.89s/it] {'loss': 0.0032, 'learning_rate': 2.5350000000000003e-06, 'epoch': 3.58} 95%|█████████▌| 9511/10000 [34:40:02<1:45:03, 12.89s/it] 95%|█████████▌| 9512/10000 [34:40:15<1:44:59, 12.91s/it] {'loss': 0.0032, 'learning_rate': 2.53e-06, 'epoch': 3.58} 95%|█████████▌| 9512/10000 [34:40:15<1:44:59, 12.91s/it] 95%|█████████▌| 9513/10000 [34:40:28<1:44:52, 12.92s/it] {'loss': 0.0034, 'learning_rate': 2.5250000000000004e-06, 'epoch': 3.58} 95%|█████████▌| 9513/10000 [34:40:28<1:44:52, 12.92s/it] 95%|█████████▌| 9514/10000 [34:40:41<1:44:45, 12.93s/it] {'loss': 0.0041, 'learning_rate': 2.52e-06, 'epoch': 3.58} 95%|█████████▌| 9514/10000 [34:40:41<1:44:45, 12.93s/it] 95%|█████████▌| 9515/10000 
[34:40:54<1:44:37, 12.94s/it] {'loss': 0.0053, 'learning_rate': 2.515e-06, 'epoch': 3.59} 95%|█████████▌| 9515/10000 [34:40:54<1:44:37, 12.94s/it] 95%|█████████▌| 9516/10000 [34:41:07<1:44:33, 12.96s/it] {'loss': 0.0038, 'learning_rate': 2.51e-06, 'epoch': 3.59} 95%|█████████▌| 9516/10000 [34:41:07<1:44:33, 12.96s/it] 95%|█████████▌| 9517/10000 [34:41:20<1:44:32, 12.99s/it] {'loss': 0.003, 'learning_rate': 2.505e-06, 'epoch': 3.59} 95%|█████████▌| 9517/10000 [34:41:20<1:44:32, 12.99s/it] 95%|█████████▌| 9518/10000 [34:41:33<1:44:08, 12.96s/it] {'loss': 0.0035, 'learning_rate': 2.5e-06, 'epoch': 3.59} 95%|█████████▌| 9518/10000 [34:41:33<1:44:08, 12.96s/it] 95%|█████████▌| 9519/10000 [34:41:46<1:44:02, 12.98s/it] {'loss': 0.0041, 'learning_rate': 2.4950000000000003e-06, 'epoch': 3.59} 95%|█████████▌| 9519/10000 [34:41:46<1:44:02, 12.98s/it] 95%|█████████▌| 9520/10000 [34:41:59<1:43:36, 12.95s/it] {'loss': 0.0043, 'learning_rate': 2.49e-06, 'epoch': 3.59} 95%|█████████▌| 9520/10000 [34:41:59<1:43:36, 12.95s/it] 95%|█████████▌| 9521/10000 [34:42:12<1:43:22, 12.95s/it] {'loss': 0.0034, 'learning_rate': 2.4850000000000003e-06, 'epoch': 3.59} 95%|█████████▌| 9521/10000 [34:42:12<1:43:22, 12.95s/it] 95%|█████████▌| 9522/10000 [34:42:25<1:43:10, 12.95s/it] {'loss': 0.0046, 'learning_rate': 2.48e-06, 'epoch': 3.59} 95%|█████████▌| 9522/10000 [34:42:25<1:43:10, 12.95s/it] 95%|█████████▌| 9523/10000 [34:42:38<1:42:55, 12.95s/it] {'loss': 0.0038, 'learning_rate': 2.4750000000000004e-06, 'epoch': 3.59} 95%|█████████▌| 9523/10000 [34:42:38<1:42:55, 12.95s/it] 95%|█████████▌| 9524/10000 [34:42:51<1:42:49, 12.96s/it] {'loss': 0.0043, 'learning_rate': 2.47e-06, 'epoch': 3.59} 95%|█████████▌| 9524/10000 [34:42:51<1:42:49, 12.96s/it] 95%|█████████▌| 9525/10000 [34:43:04<1:42:51, 12.99s/it] {'loss': 0.0029, 'learning_rate': 2.465e-06, 'epoch': 3.59} 95%|█████████▌| 9525/10000 [34:43:04<1:42:51, 12.99s/it] 95%|█████████▌| 9526/10000 [34:43:17<1:42:39, 12.99s/it] {'loss': 0.0052, 
'learning_rate': 2.46e-06, 'epoch': 3.59} 95%|█████████▌| 9526/10000 [34:43:17<1:42:39, 12.99s/it] 95%|█████████▌| 9527/10000 [34:43:30<1:42:20, 12.98s/it] {'loss': 0.0039, 'learning_rate': 2.4550000000000002e-06, 'epoch': 3.59} 95%|█████████▌| 9527/10000 [34:43:30<1:42:20, 12.98s/it] 95%|█████████▌| 9528/10000 [34:43:43<1:41:59, 12.97s/it] {'loss': 0.004, 'learning_rate': 2.4500000000000003e-06, 'epoch': 3.59} 95%|█████████▌| 9528/10000 [34:43:43<1:41:59, 12.97s/it] 95%|█████████▌| 9529/10000 [34:43:56<1:41:34, 12.94s/it] {'loss': 0.0043, 'learning_rate': 2.445e-06, 'epoch': 3.59} 95%|█████████▌| 9529/10000 [34:43:56<1:41:34, 12.94s/it] 95%|█████████▌| 9530/10000 [34:44:09<1:41:17, 12.93s/it] {'loss': 0.0038, 'learning_rate': 2.4400000000000004e-06, 'epoch': 3.59} 95%|█████████▌| 9530/10000 [34:44:09<1:41:17, 12.93s/it] 95%|█████████▌| 9531/10000 [34:44:22<1:41:11, 12.95s/it] {'loss': 0.0028, 'learning_rate': 2.435e-06, 'epoch': 3.59} 95%|█████████▌| 9531/10000 [34:44:22<1:41:11, 12.95s/it] 95%|█████████▌| 9532/10000 [34:44:35<1:41:08, 12.97s/it] {'loss': 0.0035, 'learning_rate': 2.43e-06, 'epoch': 3.59} 95%|█████████▌| 9532/10000 [34:44:35<1:41:08, 12.97s/it] 95%|█████████▌| 9533/10000 [34:44:48<1:40:54, 12.97s/it] {'loss': 0.0027, 'learning_rate': 2.425e-06, 'epoch': 3.59} 95%|█████████▌| 9533/10000 [34:44:48<1:40:54, 12.97s/it] 95%|█████████▌| 9534/10000 [34:45:00<1:40:29, 12.94s/it] {'loss': 0.0043, 'learning_rate': 2.42e-06, 'epoch': 3.59} 95%|█████████▌| 9534/10000 [34:45:00<1:40:29, 12.94s/it] 95%|█████████▌| 9535/10000 [34:45:13<1:40:20, 12.95s/it] {'loss': 0.003, 'learning_rate': 2.415e-06, 'epoch': 3.59} 95%|█████████▌| 9535/10000 [34:45:13<1:40:20, 12.95s/it] 95%|█████████▌| 9536/10000 [34:45:26<1:40:00, 12.93s/it] {'loss': 0.0038, 'learning_rate': 2.4100000000000002e-06, 'epoch': 3.59} 95%|█████████▌| 9536/10000 [34:45:26<1:40:00, 12.93s/it] 95%|█████████▌| 9537/10000 [34:45:39<1:39:51, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.405e-06, 'epoch': 
3.59} 95%|█████████▌| 9537/10000 [34:45:39<1:39:51, 12.94s/it] 95%|█████████▌| 9538/10000 [34:45:52<1:39:30, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.4000000000000003e-06, 'epoch': 3.59} 95%|█████████▌| 9538/10000 [34:45:52<1:39:30, 12.92s/it] 95%|█████████▌| 9539/10000 [34:46:05<1:39:18, 12.92s/it] {'loss': 0.0035, 'learning_rate': 2.395e-06, 'epoch': 3.59} 95%|█████████▌| 9539/10000 [34:46:05<1:39:18, 12.92s/it] 95%|█████████▌| 9540/10000 [34:46:18<1:39:03, 12.92s/it] {'loss': 0.0025, 'learning_rate': 2.3900000000000004e-06, 'epoch': 3.59} 95%|█████████▌| 9540/10000 [34:46:18<1:39:03, 12.92s/it] 95%|█████████▌| 9541/10000 [34:46:31<1:39:00, 12.94s/it] {'loss': 0.0032, 'learning_rate': 2.385e-06, 'epoch': 3.59} 95%|█████████▌| 9541/10000 [34:46:31<1:39:00, 12.94s/it] 95%|█████████▌| 9542/10000 [34:46:44<1:38:39, 12.92s/it] {'loss': 0.004, 'learning_rate': 2.38e-06, 'epoch': 3.6} 95%|█████████▌| 9542/10000 [34:46:44<1:38:39, 12.92s/it] 95%|█████████▌| 9543/10000 [34:46:57<1:38:27, 12.93s/it] {'loss': 0.003, 'learning_rate': 2.375e-06, 'epoch': 3.6} 95%|█████████▌| 9543/10000 [34:46:57<1:38:27, 12.93s/it] 95%|█████████▌| 9544/10000 [34:47:10<1:38:27, 12.96s/it] {'loss': 0.003, 'learning_rate': 2.37e-06, 'epoch': 3.6} 95%|█████████▌| 9544/10000 [34:47:10<1:38:27, 12.96s/it] 95%|█████████▌| 9545/10000 [34:47:23<1:38:18, 12.96s/it] {'loss': 0.0028, 'learning_rate': 2.3650000000000002e-06, 'epoch': 3.6} 95%|█████████▌| 9545/10000 [34:47:23<1:38:18, 12.96s/it] 95%|█████████▌| 9546/10000 [34:47:36<1:37:56, 12.94s/it] {'loss': 0.0034, 'learning_rate': 2.36e-06, 'epoch': 3.6} 95%|█████████▌| 9546/10000 [34:47:36<1:37:56, 12.94s/it] 95%|█████████▌| 9547/10000 [34:47:48<1:37:29, 12.91s/it] {'loss': 0.0043, 'learning_rate': 2.3550000000000003e-06, 'epoch': 3.6} 95%|█████████▌| 9547/10000 [34:47:49<1:37:29, 12.91s/it] 95%|█████████▌| 9548/10000 [34:48:01<1:37:08, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.35e-06, 'epoch': 3.6} 95%|█████████▌| 9548/10000 
[34:48:01<1:37:08, 12.90s/it] 95%|█████████▌| 9549/10000 [34:48:14<1:36:51, 12.89s/it] {'loss': 0.0044, 'learning_rate': 2.345e-06, 'epoch': 3.6} 95%|█████████▌| 9549/10000 [34:48:14<1:36:51, 12.89s/it] 96%|█████████▌| 9550/10000 [34:48:27<1:36:49, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.34e-06, 'epoch': 3.6} 96%|█████████▌| 9550/10000 [34:48:27<1:36:49, 12.91s/it] 96%|█████████▌| 9551/10000 [34:48:40<1:36:38, 12.91s/it] {'loss': 0.0033, 'learning_rate': 2.335e-06, 'epoch': 3.6} 96%|█████████▌| 9551/10000 [34:48:40<1:36:38, 12.91s/it] 96%|█████████▌| 9552/10000 [34:48:53<1:36:28, 12.92s/it] {'loss': 0.0035, 'learning_rate': 2.33e-06, 'epoch': 3.6} 96%|█████████▌| 9552/10000 [34:48:53<1:36:28, 12.92s/it] 96%|█████████▌| 9553/10000 [34:49:06<1:36:23, 12.94s/it] {'loss': 0.0039, 'learning_rate': 2.325e-06, 'epoch': 3.6} 96%|█████████▌| 9553/10000 [34:49:06<1:36:23, 12.94s/it] 96%|█████████▌| 9554/10000 [34:49:19<1:36:07, 12.93s/it] {'loss': 0.0034, 'learning_rate': 2.32e-06, 'epoch': 3.6} 96%|█████████▌| 9554/10000 [34:49:19<1:36:07, 12.93s/it] 96%|█████████▌| 9555/10000 [34:49:32<1:35:50, 12.92s/it] {'loss': 0.0031, 'learning_rate': 2.3150000000000003e-06, 'epoch': 3.6} 96%|█████████▌| 9555/10000 [34:49:32<1:35:50, 12.92s/it] 96%|█████████▌| 9556/10000 [34:49:45<1:35:38, 12.92s/it] {'loss': 0.0045, 'learning_rate': 2.31e-06, 'epoch': 3.6} 96%|█████████▌| 9556/10000 [34:49:45<1:35:38, 12.92s/it] 96%|█████████▌| 9557/10000 [34:49:58<1:35:22, 12.92s/it] {'loss': 0.0036, 'learning_rate': 2.3050000000000004e-06, 'epoch': 3.6} 96%|█████████▌| 9557/10000 [34:49:58<1:35:22, 12.92s/it] 96%|█████████▌| 9558/10000 [34:50:11<1:35:05, 12.91s/it] {'loss': 0.0029, 'learning_rate': 2.3e-06, 'epoch': 3.6} 96%|█████████▌| 9558/10000 [34:50:11<1:35:05, 12.91s/it] 96%|█████████▌| 9559/10000 [34:50:23<1:34:52, 12.91s/it] {'loss': 0.0035, 'learning_rate': 2.2950000000000005e-06, 'epoch': 3.6} 96%|█████████▌| 9559/10000 [34:50:23<1:34:52, 12.91s/it] 96%|█████████▌| 9560/10000 
[34:50:36<1:34:35, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.29e-06, 'epoch': 3.6} 96%|█████████▌| 9560/10000 [34:50:36<1:34:35, 12.90s/it] 96%|█████████▌| 9561/10000 [34:50:49<1:34:36, 12.93s/it] {'loss': 0.0032, 'learning_rate': 2.285e-06, 'epoch': 3.6} 96%|█████████▌| 9561/10000 [34:50:49<1:34:36, 12.93s/it] 96%|█████████▌| 9562/10000 [34:51:02<1:34:20, 12.92s/it] {'loss': 0.0036, 'learning_rate': 2.28e-06, 'epoch': 3.6} 96%|█████████▌| 9562/10000 [34:51:02<1:34:20, 12.92s/it] 96%|█████████▌| 9563/10000 [34:51:15<1:34:04, 12.92s/it] {'loss': 0.0041, 'learning_rate': 2.2750000000000002e-06, 'epoch': 3.6} 96%|█████████▌| 9563/10000 [34:51:15<1:34:04, 12.92s/it] 96%|█████████▌| 9564/10000 [34:51:28<1:33:44, 12.90s/it] {'loss': 0.0035, 'learning_rate': 2.2700000000000003e-06, 'epoch': 3.6} 96%|█████████▌| 9564/10000 [34:51:28<1:33:44, 12.90s/it] 96%|█████████▌| 9565/10000 [34:51:41<1:33:39, 12.92s/it] {'loss': 0.0028, 'learning_rate': 2.265e-06, 'epoch': 3.6} 96%|█████████▌| 9565/10000 [34:51:41<1:33:39, 12.92s/it] 96%|█████████▌| 9566/10000 [34:51:54<1:33:18, 12.90s/it] {'loss': 0.0058, 'learning_rate': 2.26e-06, 'epoch': 3.6} 96%|█████████▌| 9566/10000 [34:51:54<1:33:18, 12.90s/it] 96%|█████████▌| 9567/10000 [34:52:07<1:33:01, 12.89s/it] {'loss': 0.003, 'learning_rate': 2.255e-06, 'epoch': 3.6} 96%|█████████▌| 9567/10000 [34:52:07<1:33:01, 12.89s/it] 96%|█████████▌| 9568/10000 [34:52:20<1:33:00, 12.92s/it] {'loss': 0.0031, 'learning_rate': 2.25e-06, 'epoch': 3.61} 96%|█████████▌| 9568/10000 [34:52:20<1:33:00, 12.92s/it] 96%|█████████▌| 9569/10000 [34:52:33<1:32:50, 12.92s/it] {'loss': 0.004, 'learning_rate': 2.245e-06, 'epoch': 3.61} 96%|█████████▌| 9569/10000 [34:52:33<1:32:50, 12.92s/it] 96%|█████████▌| 9570/10000 [34:52:46<1:32:46, 12.95s/it] {'loss': 0.0032, 'learning_rate': 2.24e-06, 'epoch': 3.61} 96%|█████████▌| 9570/10000 [34:52:46<1:32:46, 12.95s/it] 96%|█████████▌| 9571/10000 [34:52:59<1:32:30, 12.94s/it] {'loss': 0.0027, 'learning_rate': 
2.2349999999999998e-06, 'epoch': 3.61} 96%|█████████▌| 9571/10000 [34:52:59<1:32:30, 12.94s/it] 96%|█████████▌| 9572/10000 [34:53:11<1:32:12, 12.93s/it] {'loss': 0.0041, 'learning_rate': 2.2300000000000002e-06, 'epoch': 3.61} 96%|█████████▌| 9572/10000 [34:53:11<1:32:12, 12.93s/it] 96%|█████████▌| 9573/10000 [34:53:24<1:31:50, 12.91s/it] {'loss': 0.005, 'learning_rate': 2.225e-06, 'epoch': 3.61} 96%|█████████▌| 9573/10000 [34:53:24<1:31:50, 12.91s/it] 96%|█████████▌| 9574/10000 [34:53:37<1:31:36, 12.90s/it] {'loss': 0.003, 'learning_rate': 2.2200000000000003e-06, 'epoch': 3.61} 96%|█████████▌| 9574/10000 [34:53:37<1:31:36, 12.90s/it] 96%|█████████▌| 9575/10000 [34:53:50<1:31:34, 12.93s/it] {'loss': 0.0036, 'learning_rate': 2.215e-06, 'epoch': 3.61} 96%|█████████▌| 9575/10000 [34:53:50<1:31:34, 12.93s/it] 96%|█████████▌| 9576/10000 [34:54:03<1:31:27, 12.94s/it] {'loss': 0.0033, 'learning_rate': 2.2100000000000004e-06, 'epoch': 3.61} 96%|█████████▌| 9576/10000 [34:54:03<1:31:27, 12.94s/it] 96%|█████████▌| 9577/10000 [34:54:16<1:31:27, 12.97s/it] {'loss': 0.0034, 'learning_rate': 2.205e-06, 'epoch': 3.61} 96%|█████████▌| 9577/10000 [34:54:16<1:31:27, 12.97s/it] 96%|█████████▌| 9578/10000 [34:54:29<1:31:07, 12.96s/it] {'loss': 0.0042, 'learning_rate': 2.2e-06, 'epoch': 3.61} 96%|█████████▌| 9578/10000 [34:54:29<1:31:07, 12.96s/it] 96%|█████████▌| 9579/10000 [34:54:42<1:30:58, 12.96s/it] {'loss': 0.0044, 'learning_rate': 2.195e-06, 'epoch': 3.61} 96%|█████████▌| 9579/10000 [34:54:42<1:30:58, 12.96s/it] 96%|█████████▌| 9580/10000 [34:54:55<1:30:30, 12.93s/it] {'loss': 0.0044, 'learning_rate': 2.19e-06, 'epoch': 3.61} 96%|█████████▌| 9580/10000 [34:54:55<1:30:30, 12.93s/it] 96%|█████████▌| 9581/10000 [34:55:08<1:30:20, 12.94s/it] {'loss': 0.0033, 'learning_rate': 2.1850000000000003e-06, 'epoch': 3.61} 96%|█████████▌| 9581/10000 [34:55:08<1:30:20, 12.94s/it] 96%|█████████▌| 9582/10000 [34:55:21<1:30:07, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.1800000000000003e-06, 
'epoch': 3.61} 96%|█████████▌| 9582/10000 [34:55:21<1:30:07, 12.94s/it] 96%|█████████▌| 9583/10000 [34:55:34<1:29:51, 12.93s/it] {'loss': 0.0026, 'learning_rate': 2.175e-06, 'epoch': 3.61} 96%|█████████▌| 9583/10000 [34:55:34<1:29:51, 12.93s/it] 96%|█████████▌| 9584/10000 [34:55:47<1:29:34, 12.92s/it] {'loss': 0.0032, 'learning_rate': 2.17e-06, 'epoch': 3.61} 96%|█████████▌| 9584/10000 [34:55:47<1:29:34, 12.92s/it] 96%|█████████▌| 9585/10000 [34:56:00<1:29:15, 12.90s/it] {'loss': 0.0049, 'learning_rate': 2.165e-06, 'epoch': 3.61} 96%|█████████▌| 9585/10000 [34:56:00<1:29:15, 12.90s/it] 96%|█████████▌| 9586/10000 [34:56:12<1:29:07, 12.92s/it] {'loss': 0.003, 'learning_rate': 2.16e-06, 'epoch': 3.61} 96%|█████████▌| 9586/10000 [34:56:13<1:29:07, 12.92s/it] 96%|█████████▌| 9587/10000 [34:56:25<1:28:52, 12.91s/it] {'loss': 0.003, 'learning_rate': 2.155e-06, 'epoch': 3.61} 96%|█████████▌| 9587/10000 [34:56:25<1:28:52, 12.91s/it] 96%|█████████▌| 9588/10000 [34:56:38<1:28:29, 12.89s/it] {'loss': 0.0043, 'learning_rate': 2.1499999999999997e-06, 'epoch': 3.61} 96%|█████████▌| 9588/10000 [34:56:38<1:28:29, 12.89s/it] 96%|█████████▌| 9589/10000 [34:56:51<1:28:27, 12.91s/it] {'loss': 0.0037, 'learning_rate': 2.1450000000000002e-06, 'epoch': 3.61} 96%|█████████▌| 9589/10000 [34:56:51<1:28:27, 12.91s/it] 96%|█████████▌| 9590/10000 [34:57:04<1:28:09, 12.90s/it] {'loss': 0.0037, 'learning_rate': 2.14e-06, 'epoch': 3.61} 96%|█████████▌| 9590/10000 [34:57:04<1:28:09, 12.90s/it] 96%|█████████▌| 9591/10000 [34:57:17<1:28:07, 12.93s/it] {'loss': 0.0036, 'learning_rate': 2.1350000000000003e-06, 'epoch': 3.61} 96%|█████████▌| 9591/10000 [34:57:17<1:28:07, 12.93s/it] 96%|█████████▌| 9592/10000 [34:57:30<1:28:00, 12.94s/it] {'loss': 0.0038, 'learning_rate': 2.13e-06, 'epoch': 3.61} 96%|█████████▌| 9592/10000 [34:57:30<1:28:00, 12.94s/it] 96%|█████████▌| 9593/10000 [34:57:43<1:27:35, 12.91s/it] {'loss': 0.0028, 'learning_rate': 2.1250000000000004e-06, 'epoch': 3.61} 96%|█████████▌| 
9593/10000 [34:57:43<1:27:35, 12.91s/it] 96%|█████████▌| 9594/10000 [34:57:56<1:27:19, 12.91s/it] {'loss': 0.0024, 'learning_rate': 2.12e-06, 'epoch': 3.61} 96%|█████████▌| 9594/10000 [34:57:56<1:27:19, 12.91s/it] 96%|█████████▌| 9595/10000 [34:58:09<1:27:08, 12.91s/it] {'loss': 0.0026, 'learning_rate': 2.115e-06, 'epoch': 3.62} 96%|█████████▌| 9595/10000 [34:58:09<1:27:08, 12.91s/it] 96%|█████████▌| 9596/10000 [34:58:22<1:26:56, 12.91s/it] {'loss': 0.0031, 'learning_rate': 2.11e-06, 'epoch': 3.62} 96%|█████████▌| 9596/10000 [34:58:22<1:26:56, 12.91s/it] 96%|█████████▌| 9597/10000 [34:58:34<1:26:40, 12.90s/it] {'loss': 0.0038, 'learning_rate': 2.105e-06, 'epoch': 3.62} 96%|█████████▌| 9597/10000 [34:58:34<1:26:40, 12.90s/it] 96%|█████████▌| 9598/10000 [34:58:47<1:26:23, 12.89s/it] {'loss': 0.0036, 'learning_rate': 2.1000000000000002e-06, 'epoch': 3.62} 96%|█████████▌| 9598/10000 [34:58:47<1:26:23, 12.89s/it] 96%|█████████▌| 9599/10000 [34:59:00<1:26:13, 12.90s/it] {'loss': 0.0026, 'learning_rate': 2.0950000000000003e-06, 'epoch': 3.62} 96%|█████████▌| 9599/10000 [34:59:00<1:26:13, 12.90s/it] 96%|█████████▌| 9600/10000 [34:59:13<1:25:50, 12.88s/it] {'loss': 0.006, 'learning_rate': 2.09e-06, 'epoch': 3.62} 96%|█████████▌| 9600/10000 [34:59:13<1:25:50, 12.88s/it] 96%|█████████▌| 9601/10000 [34:59:26<1:25:42, 12.89s/it] {'loss': 0.0041, 'learning_rate': 2.085e-06, 'epoch': 3.62} 96%|█████████▌| 9601/10000 [34:59:26<1:25:42, 12.89s/it] 96%|█████████▌| 9602/10000 [34:59:39<1:25:32, 12.90s/it] {'loss': 0.005, 'learning_rate': 2.08e-06, 'epoch': 3.62} 96%|█████████▌| 9602/10000 [34:59:39<1:25:32, 12.90s/it] 96%|█████████▌| 9603/10000 [34:59:52<1:25:18, 12.89s/it] {'loss': 0.003, 'learning_rate': 2.075e-06, 'epoch': 3.62} 96%|█████████▌| 9603/10000 [34:59:52<1:25:18, 12.89s/it] 96%|█████████▌| 9604/10000 [35:00:05<1:25:00, 12.88s/it] {'loss': 0.0044, 'learning_rate': 2.07e-06, 'epoch': 3.62} 96%|█████████▌| 9604/10000 [35:00:05<1:25:00, 12.88s/it] 96%|█████████▌| 9605/10000 
[35:00:18<1:24:51, 12.89s/it] {'loss': 0.0031, 'learning_rate': 2.065e-06, 'epoch': 3.62} 96%|█████████▌| 9605/10000 [35:00:18<1:24:51, 12.89s/it] 96%|█████████▌| 9606/10000 [35:00:30<1:24:42, 12.90s/it] {'loss': 0.0042, 'learning_rate': 2.06e-06, 'epoch': 3.62} 96%|█████████▌| 9606/10000 [35:00:30<1:24:42, 12.90s/it] 96%|█████████▌| 9607/10000 [35:00:43<1:24:36, 12.92s/it] {'loss': 0.0036, 'learning_rate': 2.055e-06, 'epoch': 3.62} 96%|█████████▌| 9607/10000 [35:00:43<1:24:36, 12.92s/it] 96%|█████████▌| 9608/10000 [35:00:56<1:24:13, 12.89s/it] {'loss': 0.0051, 'learning_rate': 2.0500000000000003e-06, 'epoch': 3.62} 96%|█████████▌| 9608/10000 [35:00:56<1:24:13, 12.89s/it] 96%|█████████▌| 9609/10000 [35:01:09<1:24:08, 12.91s/it] {'loss': 0.0041, 'learning_rate': 2.045e-06, 'epoch': 3.62} 96%|█████████▌| 9609/10000 [35:01:09<1:24:08, 12.91s/it] 96%|█████████▌| 9610/10000 [35:01:22<1:24:04, 12.93s/it] {'loss': 0.0027, 'learning_rate': 2.0400000000000004e-06, 'epoch': 3.62} 96%|█████████▌| 9610/10000 [35:01:22<1:24:04, 12.93s/it] 96%|█████████▌| 9611/10000 [35:01:35<1:23:56, 12.95s/it] {'loss': 0.0038, 'learning_rate': 2.035e-06, 'epoch': 3.62} 96%|█████████▌| 9611/10000 [35:01:35<1:23:56, 12.95s/it] 96%|█████████▌| 9612/10000 [35:01:48<1:23:59, 12.99s/it] {'loss': 0.0029, 'learning_rate': 2.03e-06, 'epoch': 3.62} 96%|█████████▌| 9612/10000 [35:01:48<1:23:59, 12.99s/it] 96%|█████████▌| 9613/10000 [35:02:01<1:23:41, 12.98s/it] {'loss': 0.0042, 'learning_rate': 2.025e-06, 'epoch': 3.62} 96%|█████████▌| 9613/10000 [35:02:01<1:23:41, 12.98s/it] 96%|█████████▌| 9614/10000 [35:02:14<1:23:30, 12.98s/it] {'loss': 0.0031, 'learning_rate': 2.02e-06, 'epoch': 3.62} 96%|█████████▌| 9614/10000 [35:02:14<1:23:30, 12.98s/it] 96%|█████████▌| 9615/10000 [35:02:27<1:23:14, 12.97s/it] {'loss': 0.0041, 'learning_rate': 2.015e-06, 'epoch': 3.62} 96%|█████████▌| 9615/10000 [35:02:27<1:23:14, 12.97s/it] 96%|█████████▌| 9616/10000 [35:02:40<1:22:48, 12.94s/it] {'loss': 0.0041, 
'learning_rate': 2.0100000000000002e-06, 'epoch': 3.62} 96%|█████████▌| 9616/10000 [35:02:40<1:22:48, 12.94s/it] 96%|█████████▌| 9617/10000 [35:02:53<1:22:33, 12.93s/it] {'loss': 0.0033, 'learning_rate': 2.005e-06, 'epoch': 3.62} 96%|█████████▌| 9617/10000 [35:02:53<1:22:33, 12.93s/it] 96%|█████████▌| 9618/10000 [35:03:06<1:22:14, 12.92s/it] {'loss': 0.0032, 'learning_rate': 2.0000000000000003e-06, 'epoch': 3.62} 96%|█████████▌| 9618/10000 [35:03:06<1:22:14, 12.92s/it] 96%|█████████▌| 9619/10000 [35:03:19<1:21:52, 12.89s/it] {'loss': 0.0033, 'learning_rate': 1.995e-06, 'epoch': 3.62} 96%|█████████▌| 9619/10000 [35:03:19<1:21:52, 12.89s/it] 96%|█████████▌| 9620/10000 [35:03:32<1:21:42, 12.90s/it] {'loss': 0.003, 'learning_rate': 1.99e-06, 'epoch': 3.62} 96%|█████████▌| 9620/10000 [35:03:32<1:21:42, 12.90s/it] 96%|█████████▌| 9621/10000 [35:03:44<1:21:28, 12.90s/it] {'loss': 0.003, 'learning_rate': 1.985e-06, 'epoch': 3.63} 96%|█████████▌| 9621/10000 [35:03:44<1:21:28, 12.90s/it] 96%|█████████▌| 9622/10000 [35:03:57<1:21:16, 12.90s/it] {'loss': 0.0042, 'learning_rate': 1.98e-06, 'epoch': 3.63} 96%|█████████▌| 9622/10000 [35:03:57<1:21:16, 12.90s/it] 96%|█████████▌| 9623/10000 [35:04:10<1:20:59, 12.89s/it] {'loss': 0.0034, 'learning_rate': 1.975e-06, 'epoch': 3.63} 96%|█████████▌| 9623/10000 [35:04:10<1:20:59, 12.89s/it] 96%|█████████▌| 9624/10000 [35:04:23<1:20:55, 12.91s/it] {'loss': 0.0029, 'learning_rate': 1.9699999999999998e-06, 'epoch': 3.63} 96%|█████████▌| 9624/10000 [35:04:23<1:20:55, 12.91s/it] 96%|█████████▋| 9625/10000 [35:04:36<1:20:46, 12.92s/it] {'loss': 0.0029, 'learning_rate': 1.9650000000000002e-06, 'epoch': 3.63} 96%|█████████▋| 9625/10000 [35:04:36<1:20:46, 12.92s/it] 96%|█████████▋| 9626/10000 [35:04:49<1:20:38, 12.94s/it] {'loss': 0.0024, 'learning_rate': 1.96e-06, 'epoch': 3.63} 96%|█████████▋| 9626/10000 [35:04:49<1:20:38, 12.94s/it] 96%|█████████▋| 9627/10000 [35:05:02<1:20:20, 12.92s/it] {'loss': 0.0038, 'learning_rate': 
1.9550000000000003e-06, 'epoch': 3.63} 96%|█████████▋| 9627/10000 [35:05:02<1:20:20, 12.92s/it] 96%|█████████▋| 9628/10000 [35:05:15<1:20:11, 12.94s/it] {'loss': 0.0058, 'learning_rate': 1.95e-06, 'epoch': 3.63} 96%|█████████▋| 9628/10000 [35:05:15<1:20:11, 12.94s/it] 96%|█████████▋| 9629/10000 [35:05:28<1:19:56, 12.93s/it] {'loss': 0.0032, 'learning_rate': 1.945e-06, 'epoch': 3.63} 96%|█████████▋| 9629/10000 [35:05:28<1:19:56, 12.93s/it] 96%|█████████▋| 9630/10000 [35:05:41<1:19:46, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.94e-06, 'epoch': 3.63} 96%|█████████▋| 9630/10000 [35:05:41<1:19:46, 12.94s/it] 96%|█████████▋| 9631/10000 [35:05:54<1:19:28, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.935e-06, 'epoch': 3.63} 96%|█████████▋| 9631/10000 [35:05:54<1:19:28, 12.92s/it] 96%|█████████▋| 9632/10000 [35:06:07<1:19:14, 12.92s/it] {'loss': 0.0039, 'learning_rate': 1.93e-06, 'epoch': 3.63} 96%|█████████▋| 9632/10000 [35:06:07<1:19:14, 12.92s/it] 96%|█████████▋| 9633/10000 [35:06:20<1:19:01, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.925e-06, 'epoch': 3.63} 96%|█████████▋| 9633/10000 [35:06:20<1:19:01, 12.92s/it] 96%|█████████▋| 9634/10000 [35:06:32<1:18:47, 12.92s/it] {'loss': 0.0046, 'learning_rate': 1.92e-06, 'epoch': 3.63} 96%|█████████▋| 9634/10000 [35:06:32<1:18:47, 12.92s/it] 96%|█████████▋| 9635/10000 [35:06:45<1:18:32, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.9150000000000003e-06, 'epoch': 3.63} 96%|█████████▋| 9635/10000 [35:06:45<1:18:32, 12.91s/it] 96%|█████████▋| 9636/10000 [35:06:58<1:18:27, 12.93s/it] {'loss': 0.0026, 'learning_rate': 1.91e-06, 'epoch': 3.63} 96%|█████████▋| 9636/10000 [35:06:58<1:18:27, 12.93s/it] 96%|█████████▋| 9637/10000 [35:07:11<1:18:19, 12.95s/it] {'loss': 0.0031, 'learning_rate': 1.9050000000000002e-06, 'epoch': 3.63} 96%|█████████▋| 9637/10000 [35:07:11<1:18:19, 12.95s/it] 96%|█████████▋| 9638/10000 [35:07:24<1:18:07, 12.95s/it] {'loss': 0.0034, 'learning_rate': 1.9e-06, 'epoch': 3.63} 96%|█████████▋| 9638/10000 
[35:07:24<1:18:07, 12.95s/it] 96%|█████████▋| 9639/10000 [35:07:37<1:17:51, 12.94s/it] {'loss': 0.0034, 'learning_rate': 1.8950000000000003e-06, 'epoch': 3.63} 96%|█████████▋| 9639/10000 [35:07:37<1:17:51, 12.94s/it] 96%|█████████▋| 9640/10000 [35:07:50<1:17:39, 12.94s/it] {'loss': 0.0043, 'learning_rate': 1.8900000000000001e-06, 'epoch': 3.63} 96%|█████████▋| 9640/10000 [35:07:50<1:17:39, 12.94s/it] 96%|█████████▋| 9641/10000 [35:08:03<1:17:26, 12.94s/it] {'loss': 0.0037, 'learning_rate': 1.885e-06, 'epoch': 3.63} 96%|█████████▋| 9641/10000 [35:08:03<1:17:26, 12.94s/it] 96%|█████████▋| 9642/10000 [35:08:16<1:16:59, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.8800000000000002e-06, 'epoch': 3.63} 96%|█████████▋| 9642/10000 [35:08:16<1:16:59, 12.90s/it] 96%|█████████▋| 9643/10000 [35:08:29<1:16:45, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.875e-06, 'epoch': 3.63} 96%|█████████▋| 9643/10000 [35:08:29<1:16:45, 12.90s/it] 96%|█████████▋| 9644/10000 [35:08:42<1:16:32, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.8700000000000003e-06, 'epoch': 3.63} 96%|█████████▋| 9644/10000 [35:08:42<1:16:32, 12.90s/it] 96%|█████████▋| 9645/10000 [35:08:55<1:16:21, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.8650000000000001e-06, 'epoch': 3.63} 96%|█████████▋| 9645/10000 [35:08:55<1:16:21, 12.90s/it] 96%|█████████▋| 9646/10000 [35:09:08<1:16:13, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.86e-06, 'epoch': 3.63} 96%|█████████▋| 9646/10000 [35:09:08<1:16:13, 12.92s/it] 96%|█████████▋| 9647/10000 [35:09:21<1:16:04, 12.93s/it] {'loss': 0.0044, 'learning_rate': 1.8550000000000002e-06, 'epoch': 3.63} 96%|█████████▋| 9647/10000 [35:09:21<1:16:04, 12.93s/it] 96%|█████████▋| 9648/10000 [35:09:33<1:15:52, 12.93s/it] {'loss': 0.003, 'learning_rate': 1.85e-06, 'epoch': 3.64} 96%|█████████▋| 9648/10000 [35:09:34<1:15:52, 12.93s/it] 96%|█████████▋| 9649/10000 [35:09:46<1:15:35, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.8450000000000001e-06, 'epoch': 3.64} 96%|█████████▋| 
9649/10000 [35:09:46<1:15:35, 12.92s/it] 96%|█████████▋| 9650/10000 [35:09:59<1:15:21, 12.92s/it] {'loss': 0.0035, 'learning_rate': 1.84e-06, 'epoch': 3.64} 96%|█████████▋| 9650/10000 [35:09:59<1:15:21, 12.92s/it] 97%|█████████▋| 9651/10000 [35:10:12<1:15:03, 12.90s/it] {'loss': 0.0046, 'learning_rate': 1.8350000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9651/10000 [35:10:12<1:15:03, 12.90s/it] 97%|█████████▋| 9652/10000 [35:10:25<1:14:44, 12.89s/it] {'loss': 0.0029, 'learning_rate': 1.83e-06, 'epoch': 3.64} 97%|█████████▋| 9652/10000 [35:10:25<1:14:44, 12.89s/it] 97%|█████████▋| 9653/10000 [35:10:38<1:14:26, 12.87s/it] {'loss': 0.0035, 'learning_rate': 1.8249999999999999e-06, 'epoch': 3.64} 97%|█████████▋| 9653/10000 [35:10:38<1:14:26, 12.87s/it] 97%|█████████▋| 9654/10000 [35:10:51<1:14:13, 12.87s/it] {'loss': 0.0038, 'learning_rate': 1.8200000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9654/10000 [35:10:51<1:14:13, 12.87s/it] 97%|█████████▋| 9655/10000 [35:11:04<1:13:59, 12.87s/it] {'loss': 0.0038, 'learning_rate': 1.815e-06, 'epoch': 3.64} 97%|█████████▋| 9655/10000 [35:11:04<1:13:59, 12.87s/it] 97%|█████████▋| 9656/10000 [35:11:16<1:13:49, 12.88s/it] {'loss': 0.0036, 'learning_rate': 1.8100000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9656/10000 [35:11:16<1:13:49, 12.88s/it] 97%|█████████▋| 9657/10000 [35:11:29<1:13:33, 12.87s/it] {'loss': 0.0032, 'learning_rate': 1.805e-06, 'epoch': 3.64} 97%|█████████▋| 9657/10000 [35:11:29<1:13:33, 12.87s/it] 97%|█████████▋| 9658/10000 [35:11:42<1:13:21, 12.87s/it] {'loss': 0.0039, 'learning_rate': 1.8e-06, 'epoch': 3.64} 97%|█████████▋| 9658/10000 [35:11:42<1:13:21, 12.87s/it] 97%|█████████▋| 9659/10000 [35:11:55<1:13:08, 12.87s/it] {'loss': 0.0031, 'learning_rate': 1.7950000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9659/10000 [35:11:55<1:13:08, 12.87s/it] 97%|█████████▋| 9660/10000 [35:12:08<1:12:53, 12.86s/it] {'loss': 0.0033, 'learning_rate': 1.79e-06, 'epoch': 3.64} 97%|█████████▋| 9660/10000 
[35:12:08<1:12:53, 12.86s/it] 97%|█████████▋| 9661/10000 [35:12:21<1:12:41, 12.86s/it] {'loss': 0.0038, 'learning_rate': 1.7850000000000003e-06, 'epoch': 3.64} 97%|█████████▋| 9661/10000 [35:12:21<1:12:41, 12.86s/it] 97%|█████████▋| 9662/10000 [35:12:34<1:12:33, 12.88s/it] {'loss': 0.0038, 'learning_rate': 1.7800000000000001e-06, 'epoch': 3.64} 97%|█████████▋| 9662/10000 [35:12:34<1:12:33, 12.88s/it] 97%|█████████▋| 9663/10000 [35:12:47<1:12:24, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.775e-06, 'epoch': 3.64} 97%|█████████▋| 9663/10000 [35:12:47<1:12:24, 12.89s/it] 97%|█████████▋| 9664/10000 [35:12:59<1:12:09, 12.89s/it] {'loss': 0.004, 'learning_rate': 1.7700000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9664/10000 [35:12:59<1:12:09, 12.89s/it] 97%|█████████▋| 9665/10000 [35:13:12<1:11:54, 12.88s/it] {'loss': 0.0033, 'learning_rate': 1.765e-06, 'epoch': 3.64} 97%|█████████▋| 9665/10000 [35:13:12<1:11:54, 12.88s/it] 97%|█████████▋| 9666/10000 [35:13:25<1:11:46, 12.89s/it] {'loss': 0.0031, 'learning_rate': 1.76e-06, 'epoch': 3.64} 97%|█████████▋| 9666/10000 [35:13:25<1:11:46, 12.89s/it] 97%|█████████▋| 9667/10000 [35:13:38<1:11:29, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.7550000000000001e-06, 'epoch': 3.64} 97%|█████████▋| 9667/10000 [35:13:38<1:11:29, 12.88s/it] 97%|█████████▋| 9668/10000 [35:13:51<1:11:08, 12.86s/it] {'loss': 0.0037, 'learning_rate': 1.7500000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9668/10000 [35:13:51<1:11:08, 12.86s/it] 97%|█████████▋| 9669/10000 [35:14:04<1:10:55, 12.86s/it] {'loss': 0.0031, 'learning_rate': 1.745e-06, 'epoch': 3.64} 97%|█████████▋| 9669/10000 [35:14:04<1:10:55, 12.86s/it] 97%|█████████▋| 9670/10000 [35:14:17<1:10:48, 12.88s/it] {'loss': 0.0034, 'learning_rate': 1.7399999999999999e-06, 'epoch': 3.64} 97%|█████████▋| 9670/10000 [35:14:17<1:10:48, 12.88s/it] 97%|█████████▋| 9671/10000 [35:14:30<1:10:42, 12.89s/it] {'loss': 0.0054, 'learning_rate': 1.7350000000000001e-06, 'epoch': 3.64} 97%|█████████▋| 
9671/10000 [35:14:30<1:10:42, 12.89s/it] 97%|█████████▋| 9672/10000 [35:14:43<1:10:39, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.73e-06, 'epoch': 3.64} 97%|█████████▋| 9672/10000 [35:14:43<1:10:39, 12.93s/it] 97%|█████████▋| 9673/10000 [35:14:56<1:10:29, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.7250000000000002e-06, 'epoch': 3.64} 97%|█████████▋| 9673/10000 [35:14:56<1:10:29, 12.93s/it] 97%|█████████▋| 9674/10000 [35:15:08<1:10:15, 12.93s/it] {'loss': 0.0026, 'learning_rate': 1.72e-06, 'epoch': 3.65} 97%|█████████▋| 9674/10000 [35:15:09<1:10:15, 12.93s/it] 97%|█████████▋| 9675/10000 [35:15:21<1:09:59, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.7149999999999999e-06, 'epoch': 3.65} 97%|█████████▋| 9675/10000 [35:15:21<1:09:59, 12.92s/it] 97%|█████████▋| 9676/10000 [35:15:34<1:09:49, 12.93s/it] {'loss': 0.0028, 'learning_rate': 1.7100000000000001e-06, 'epoch': 3.65} 97%|█████████▋| 9676/10000 [35:15:34<1:09:49, 12.93s/it] 97%|█████████▋| 9677/10000 [35:15:47<1:09:36, 12.93s/it] {'loss': 0.0042, 'learning_rate': 1.705e-06, 'epoch': 3.65} 97%|█████████▋| 9677/10000 [35:15:47<1:09:36, 12.93s/it] 97%|█████████▋| 9678/10000 [35:16:00<1:09:20, 12.92s/it] {'loss': 0.0032, 'learning_rate': 1.7000000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9678/10000 [35:16:00<1:09:20, 12.92s/it] 97%|█████████▋| 9679/10000 [35:16:13<1:09:00, 12.90s/it] {'loss': 0.0039, 'learning_rate': 1.695e-06, 'epoch': 3.65} 97%|█████████▋| 9679/10000 [35:16:13<1:09:00, 12.90s/it] 97%|█████████▋| 9680/10000 [35:16:26<1:08:44, 12.89s/it] {'loss': 0.0035, 'learning_rate': 1.69e-06, 'epoch': 3.65} 97%|█████████▋| 9680/10000 [35:16:26<1:08:44, 12.89s/it] 97%|█████████▋| 9681/10000 [35:16:39<1:08:34, 12.90s/it] {'loss': 0.0037, 'learning_rate': 1.6850000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9681/10000 [35:16:39<1:08:34, 12.90s/it] 97%|█████████▋| 9682/10000 [35:16:52<1:08:28, 12.92s/it] {'loss': 0.0031, 'learning_rate': 1.68e-06, 'epoch': 3.65} 97%|█████████▋| 9682/10000 
[35:16:52<1:08:28, 12.92s/it] 97%|█████████▋| 9683/10000 [35:17:05<1:08:11, 12.91s/it] {'loss': 0.0029, 'learning_rate': 1.6750000000000003e-06, 'epoch': 3.65} 97%|█████████▋| 9683/10000 [35:17:05<1:08:11, 12.91s/it] 97%|█████████▋| 9684/10000 [35:17:18<1:07:59, 12.91s/it] {'loss': 0.0032, 'learning_rate': 1.67e-06, 'epoch': 3.65} 97%|█████████▋| 9684/10000 [35:17:18<1:07:59, 12.91s/it] 97%|█████████▋| 9685/10000 [35:17:31<1:07:49, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.6650000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9685/10000 [35:17:31<1:07:49, 12.92s/it] 97%|█████████▋| 9686/10000 [35:17:43<1:07:28, 12.89s/it] {'loss': 0.0053, 'learning_rate': 1.6600000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9686/10000 [35:17:43<1:07:28, 12.89s/it] 97%|█████████▋| 9687/10000 [35:17:56<1:07:16, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.655e-06, 'epoch': 3.65} 97%|█████████▋| 9687/10000 [35:17:56<1:07:16, 12.90s/it] 97%|█████████▋| 9688/10000 [35:18:09<1:07:00, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.65e-06, 'epoch': 3.65} 97%|█████████▋| 9688/10000 [35:18:09<1:07:00, 12.89s/it] 97%|█████████▋| 9689/10000 [35:18:22<1:06:51, 12.90s/it] {'loss': 0.0026, 'learning_rate': 1.645e-06, 'epoch': 3.65} 97%|█████████▋| 9689/10000 [35:18:22<1:06:51, 12.90s/it] 97%|█████████▋| 9690/10000 [35:18:35<1:06:32, 12.88s/it] {'loss': 0.0043, 'learning_rate': 1.6400000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9690/10000 [35:18:35<1:06:32, 12.88s/it] 97%|█████████▋| 9691/10000 [35:18:48<1:06:21, 12.88s/it] {'loss': 0.0053, 'learning_rate': 1.635e-06, 'epoch': 3.65} 97%|█████████▋| 9691/10000 [35:18:48<1:06:21, 12.88s/it] 97%|█████████▋| 9692/10000 [35:19:01<1:06:06, 12.88s/it] {'loss': 0.0043, 'learning_rate': 1.6299999999999999e-06, 'epoch': 3.65} 97%|█████████▋| 9692/10000 [35:19:01<1:06:06, 12.88s/it] 97%|█████████▋| 9693/10000 [35:19:14<1:05:56, 12.89s/it] {'loss': 0.0042, 'learning_rate': 1.6250000000000001e-06, 'epoch': 3.65} 97%|█████████▋| 9693/10000 
[35:19:14<1:05:56, 12.89s/it] 97%|█████████▋| 9694/10000 [35:19:27<1:05:53, 12.92s/it] {'loss': 0.0033, 'learning_rate': 1.62e-06, 'epoch': 3.65} 97%|█████████▋| 9694/10000 [35:19:27<1:05:53, 12.92s/it] 97%|█████████▋| 9695/10000 [35:19:39<1:05:41, 12.92s/it] {'loss': 0.0045, 'learning_rate': 1.6150000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9695/10000 [35:19:39<1:05:41, 12.92s/it] 97%|█████████▋| 9696/10000 [35:19:52<1:05:28, 12.92s/it] {'loss': 0.005, 'learning_rate': 1.61e-06, 'epoch': 3.65} 97%|█████████▋| 9696/10000 [35:19:52<1:05:28, 12.92s/it] 97%|█████████▋| 9697/10000 [35:20:05<1:05:13, 12.92s/it] {'loss': 0.0028, 'learning_rate': 1.6049999999999999e-06, 'epoch': 3.65} 97%|█████████▋| 9697/10000 [35:20:05<1:05:13, 12.92s/it] 97%|█████████▋| 9698/10000 [35:20:18<1:04:57, 12.91s/it] {'loss': 0.004, 'learning_rate': 1.6000000000000001e-06, 'epoch': 3.65} 97%|█████████▋| 9698/10000 [35:20:18<1:04:57, 12.91s/it] 97%|█████████▋| 9699/10000 [35:20:31<1:04:44, 12.90s/it] {'loss': 0.0028, 'learning_rate': 1.595e-06, 'epoch': 3.65} 97%|█████████▋| 9699/10000 [35:20:31<1:04:44, 12.90s/it] 97%|█████████▋| 9700/10000 [35:20:44<1:04:42, 12.94s/it] {'loss': 0.0031, 'learning_rate': 1.5900000000000002e-06, 'epoch': 3.65} 97%|█████████▋| 9700/10000 [35:20:44<1:04:42, 12.94s/it] 97%|█████████▋| 9701/10000 [35:20:57<1:04:28, 12.94s/it] {'loss': 0.0045, 'learning_rate': 1.585e-06, 'epoch': 3.66} 97%|█████████▋| 9701/10000 [35:20:57<1:04:28, 12.94s/it] 97%|█████████▋| 9702/10000 [35:21:10<1:04:08, 12.91s/it] {'loss': 0.0036, 'learning_rate': 1.5800000000000003e-06, 'epoch': 3.66} 97%|█████████▋| 9702/10000 [35:21:10<1:04:08, 12.91s/it] 97%|█████████▋| 9703/10000 [35:21:23<1:03:53, 12.91s/it] {'loss': 0.0032, 'learning_rate': 1.5750000000000002e-06, 'epoch': 3.66} 97%|█████████▋| 9703/10000 [35:21:23<1:03:53, 12.91s/it] 97%|█████████▋| 9704/10000 [35:21:36<1:03:38, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.57e-06, 'epoch': 3.66} 97%|█████████▋| 9704/10000 
[35:21:36<1:03:38, 12.90s/it] 97%|█████████▋| 9705/10000 [35:21:49<1:03:21, 12.89s/it] {'loss': 0.0031, 'learning_rate': 1.565e-06, 'epoch': 3.66} 97%|█████████▋| 9705/10000 [35:21:49<1:03:21, 12.89s/it] 97%|█████████▋| 9706/10000 [35:22:02<1:03:19, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.56e-06, 'epoch': 3.66} 97%|█████████▋| 9706/10000 [35:22:02<1:03:19, 12.92s/it] 97%|█████████▋| 9707/10000 [35:22:14<1:03:03, 12.91s/it] {'loss': 0.0035, 'learning_rate': 1.555e-06, 'epoch': 3.66} 97%|█████████▋| 9707/10000 [35:22:14<1:03:03, 12.91s/it] 97%|█████████▋| 9708/10000 [35:22:27<1:02:48, 12.91s/it] {'loss': 0.0031, 'learning_rate': 1.55e-06, 'epoch': 3.66} 97%|█████████▋| 9708/10000 [35:22:27<1:02:48, 12.91s/it] 97%|█████████▋| 9709/10000 [35:22:40<1:02:37, 12.91s/it] {'loss': 0.0036, 'learning_rate': 1.545e-06, 'epoch': 3.66} 97%|█████████▋| 9709/10000 [35:22:40<1:02:37, 12.91s/it] 97%|█████████▋| 9710/10000 [35:22:53<1:02:21, 12.90s/it] {'loss': 0.0043, 'learning_rate': 1.54e-06, 'epoch': 3.66} 97%|█████████▋| 9710/10000 [35:22:53<1:02:21, 12.90s/it] 97%|█████████▋| 9711/10000 [35:23:06<1:02:10, 12.91s/it] {'loss': 0.0032, 'learning_rate': 1.5350000000000001e-06, 'epoch': 3.66} 97%|█████████▋| 9711/10000 [35:23:06<1:02:10, 12.91s/it] 97%|█████████▋| 9712/10000 [35:23:19<1:01:52, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.53e-06, 'epoch': 3.66} 97%|█████████▋| 9712/10000 [35:23:19<1:01:52, 12.89s/it] 97%|█████████▋| 9713/10000 [35:23:32<1:01:39, 12.89s/it] {'loss': 0.0033, 'learning_rate': 1.525e-06, 'epoch': 3.66} 97%|█████████▋| 9713/10000 [35:23:32<1:01:39, 12.89s/it] 97%|█████████▋| 9714/10000 [35:23:45<1:01:29, 12.90s/it] {'loss': 0.0045, 'learning_rate': 1.52e-06, 'epoch': 3.66} 97%|█████████▋| 9714/10000 [35:23:45<1:01:29, 12.90s/it] 97%|█████████▋| 9715/10000 [35:23:58<1:01:15, 12.90s/it] {'loss': 0.0027, 'learning_rate': 1.5150000000000001e-06, 'epoch': 3.66} 97%|█████████▋| 9715/10000 [35:23:58<1:01:15, 12.90s/it] 97%|█████████▋| 9716/10000 
[35:24:10<1:01:00, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.5100000000000002e-06, 'epoch': 3.66} 97%|█████████▋| 9716/10000 [35:24:11<1:01:00, 12.89s/it] 97%|█████████▋| 9717/10000 [35:24:23<1:00:53, 12.91s/it] {'loss': 0.0027, 'learning_rate': 1.505e-06, 'epoch': 3.66} 97%|█████████▋| 9717/10000 [35:24:23<1:00:53, 12.91s/it] 97%|█████████▋| 9718/10000 [35:24:36<1:00:46, 12.93s/it] {'loss': 0.0027, 'learning_rate': 1.5e-06, 'epoch': 3.66} 97%|█████████▋| 9718/10000 [35:24:36<1:00:46, 12.93s/it] 97%|█████████▋| 9719/10000 [35:24:49<1:00:28, 12.91s/it] {'loss': 0.0048, 'learning_rate': 1.495e-06, 'epoch': 3.66} 97%|█████████▋| 9719/10000 [35:24:49<1:00:28, 12.91s/it] 97%|█████████▋| 9720/10000 [35:25:02<1:00:29, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.4900000000000001e-06, 'epoch': 3.66} 97%|█████████▋| 9720/10000 [35:25:02<1:00:29, 12.96s/it] 97%|█████████▋| 9721/10000 [35:25:15<1:00:20, 12.98s/it] {'loss': 0.0033, 'learning_rate': 1.4850000000000002e-06, 'epoch': 3.66} 97%|█████████▋| 9721/10000 [35:25:15<1:00:20, 12.98s/it] 97%|█████████▋| 9722/10000 [35:25:28<1:00:09, 12.98s/it] {'loss': 0.0033, 'learning_rate': 1.4800000000000002e-06, 'epoch': 3.66} 97%|█████████▋| 9722/10000 [35:25:28<1:00:09, 12.98s/it] 97%|█████████▋| 9723/10000 [35:25:41<59:47, 12.95s/it] {'loss': 0.0049, 'learning_rate': 1.475e-06, 'epoch': 3.66} 97%|█████████▋| 9723/10000 [35:25:41<59:47, 12.95s/it] 97%|█████████▋| 9724/10000 [35:25:54<59:26, 12.92s/it] {'loss': 0.0043, 'learning_rate': 1.4700000000000001e-06, 'epoch': 3.66} 97%|█████████▋| 9724/10000 [35:25:54<59:26, 12.92s/it] 97%|█████████▋| 9725/10000 [35:26:07<59:15, 12.93s/it] {'loss': 0.0031, 'learning_rate': 1.465e-06, 'epoch': 3.66} 97%|█████████▋| 9725/10000 [35:26:07<59:15, 12.93s/it] 97%|█████████▋| 9726/10000 [35:26:20<58:51, 12.89s/it] {'loss': 0.0036, 'learning_rate': 1.46e-06, 'epoch': 3.66} 97%|█████████▋| 9726/10000 [35:26:20<58:51, 12.89s/it] 97%|█████████▋| 9727/10000 [35:26:33<58:32, 12.87s/it] {'loss': 
0.0037, 'learning_rate': 1.455e-06, 'epoch': 3.67} 97%|█████████▋| 9727/10000 [35:26:33<58:32, 12.87s/it] 97%|█████████▋| 9728/10000 [35:26:46<58:19, 12.86s/it] {'loss': 0.0045, 'learning_rate': 1.45e-06, 'epoch': 3.67} 97%|█████████▋| 9728/10000 [35:26:46<58:19, 12.86s/it] 97%|█████████▋| 9729/10000 [35:26:58<58:06, 12.87s/it] {'loss': 0.0041, 'learning_rate': 1.445e-06, 'epoch': 3.67} 97%|█████████▋| 9729/10000 [35:26:58<58:06, 12.87s/it] 97%|█████████▋| 9730/10000 [35:27:11<57:57, 12.88s/it] {'loss': 0.004, 'learning_rate': 1.44e-06, 'epoch': 3.67} 97%|█████████▋| 9730/10000 [35:27:11<57:57, 12.88s/it] 97%|█████████▋| 9731/10000 [35:27:24<57:51, 12.91s/it] {'loss': 0.0046, 'learning_rate': 1.435e-06, 'epoch': 3.67} 97%|█████████▋| 9731/10000 [35:27:24<57:51, 12.91s/it] 97%|█████████▋| 9732/10000 [35:27:37<57:43, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.43e-06, 'epoch': 3.67} 97%|█████████▋| 9732/10000 [35:27:37<57:43, 12.92s/it] 97%|█████████▋| 9733/10000 [35:27:50<57:37, 12.95s/it] {'loss': 0.0023, 'learning_rate': 1.4250000000000001e-06, 'epoch': 3.67} 97%|█████████▋| 9733/10000 [35:27:50<57:37, 12.95s/it] 97%|█████████▋| 9734/10000 [35:28:03<57:28, 12.96s/it] {'loss': 0.0043, 'learning_rate': 1.4200000000000002e-06, 'epoch': 3.67} 97%|█████████▋| 9734/10000 [35:28:03<57:28, 12.96s/it] 97%|█████████▋| 9735/10000 [35:28:16<57:16, 12.97s/it] {'loss': 0.0032, 'learning_rate': 1.415e-06, 'epoch': 3.67} 97%|█████████▋| 9735/10000 [35:28:16<57:16, 12.97s/it] 97%|█████████▋| 9736/10000 [35:28:29<57:05, 12.98s/it] {'loss': 0.0031, 'learning_rate': 1.41e-06, 'epoch': 3.67} 97%|█████████▋| 9736/10000 [35:28:29<57:05, 12.98s/it] 97%|█████████▋| 9737/10000 [35:28:42<56:52, 12.98s/it] {'loss': 0.0033, 'learning_rate': 1.405e-06, 'epoch': 3.67} 97%|█████████▋| 9737/10000 [35:28:42<56:52, 12.98s/it] 97%|█████████▋| 9738/10000 [35:28:55<56:36, 12.96s/it] {'loss': 0.0035, 'learning_rate': 1.4000000000000001e-06, 'epoch': 3.67} 97%|█████████▋| 9738/10000 [35:28:55<56:36, 
12.96s/it] 97%|█████████▋| 9739/10000 [35:29:08<56:13, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.3950000000000002e-06, 'epoch': 3.67} 97%|█████████▋| 9739/10000 [35:29:08<56:13, 12.93s/it] 97%|█████████▋| 9740/10000 [35:29:21<55:58, 12.92s/it] {'loss': 0.0026, 'learning_rate': 1.39e-06, 'epoch': 3.67} 97%|█████████▋| 9740/10000 [35:29:21<55:58, 12.92s/it] 97%|█████████▋| 9741/10000 [35:29:34<55:49, 12.93s/it] {'loss': 0.0034, 'learning_rate': 1.385e-06, 'epoch': 3.67} 97%|█████████▋| 9741/10000 [35:29:34<55:49, 12.93s/it] 97%|█████████▋| 9742/10000 [35:29:47<55:38, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.3800000000000001e-06, 'epoch': 3.67} 97%|█████████▋| 9742/10000 [35:29:47<55:38, 12.94s/it] 97%|█████████▋| 9743/10000 [35:30:00<55:26, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.3750000000000002e-06, 'epoch': 3.67} 97%|█████████▋| 9743/10000 [35:30:00<55:26, 12.94s/it] 97%|█████████▋| 9744/10000 [35:30:13<55:10, 12.93s/it] {'loss': 0.004, 'learning_rate': 1.37e-06, 'epoch': 3.67} 97%|█████████▋| 9744/10000 [35:30:13<55:10, 12.93s/it] 97%|█████████▋| 9745/10000 [35:30:26<54:54, 12.92s/it] {'loss': 0.0042, 'learning_rate': 1.365e-06, 'epoch': 3.67} 97%|█████████▋| 9745/10000 [35:30:26<54:54, 12.92s/it] 97%|█████████▋| 9746/10000 [35:30:38<54:43, 12.93s/it] {'loss': 0.0025, 'learning_rate': 1.36e-06, 'epoch': 3.67} 97%|█████████▋| 9746/10000 [35:30:38<54:43, 12.93s/it] 97%|█████████▋| 9747/10000 [35:30:51<54:26, 12.91s/it] {'loss': 0.0028, 'learning_rate': 1.355e-06, 'epoch': 3.67} 97%|█████████▋| 9747/10000 [35:30:51<54:26, 12.91s/it] 97%|█████████▋| 9748/10000 [35:31:04<54:24, 12.95s/it] {'loss': 0.0027, 'learning_rate': 1.35e-06, 'epoch': 3.67} 97%|█████████▋| 9748/10000 [35:31:04<54:24, 12.95s/it] 97%|█████████▋| 9749/10000 [35:31:17<54:12, 12.96s/it] {'loss': 0.0042, 'learning_rate': 1.345e-06, 'epoch': 3.67} 97%|█████████▋| 9749/10000 [35:31:17<54:12, 12.96s/it] 98%|█████████▊| 9750/10000 [35:31:30<53:53, 12.93s/it] {'loss': 0.003, 
'learning_rate': 1.34e-06, 'epoch': 3.67} 98%|█████████▊| 9750/10000 [35:31:30<53:53, 12.93s/it] 98%|█████████▊| 9751/10000 [35:31:43<53:37, 12.92s/it] {'loss': 0.0039, 'learning_rate': 1.3350000000000001e-06, 'epoch': 3.67} 98%|█████████▊| 9751/10000 [35:31:43<53:37, 12.92s/it] 98%|█████████▊| 9752/10000 [35:31:56<53:21, 12.91s/it] {'loss': 0.0031, 'learning_rate': 1.33e-06, 'epoch': 3.67} 98%|█████████▊| 9752/10000 [35:31:56<53:21, 12.91s/it] 98%|█████████▊| 9753/10000 [35:32:09<53:07, 12.90s/it] {'loss': 0.0032, 'learning_rate': 1.325e-06, 'epoch': 3.67} 98%|█████████▊| 9753/10000 [35:32:09<53:07, 12.90s/it] 98%|█████████▊| 9754/10000 [35:32:22<52:51, 12.89s/it] {'loss': 0.0041, 'learning_rate': 1.32e-06, 'epoch': 3.68} 98%|█████████▊| 9754/10000 [35:32:22<52:51, 12.89s/it] 98%|█████████▊| 9755/10000 [35:32:35<52:35, 12.88s/it] {'loss': 0.0047, 'learning_rate': 1.3150000000000001e-06, 'epoch': 3.68} 98%|█████████▊| 9755/10000 [35:32:35<52:35, 12.88s/it] 98%|█████████▊| 9756/10000 [35:32:48<52:31, 12.92s/it] {'loss': 0.0029, 'learning_rate': 1.3100000000000002e-06, 'epoch': 3.68} 98%|█████████▊| 9756/10000 [35:32:48<52:31, 12.92s/it] 98%|█████████▊| 9757/10000 [35:33:00<52:12, 12.89s/it] {'loss': 0.0037, 'learning_rate': 1.3050000000000002e-06, 'epoch': 3.68} 98%|█████████▊| 9757/10000 [35:33:00<52:12, 12.89s/it] 98%|█████████▊| 9758/10000 [35:33:13<52:07, 12.92s/it] {'loss': 0.0039, 'learning_rate': 1.3e-06, 'epoch': 3.68} 98%|█████████▊| 9758/10000 [35:33:13<52:07, 12.92s/it] 98%|█████████▊| 9759/10000 [35:33:26<51:55, 12.93s/it] {'loss': 0.0034, 'learning_rate': 1.295e-06, 'epoch': 3.68} 98%|█████████▊| 9759/10000 [35:33:26<51:55, 12.93s/it] 98%|█████████▊| 9760/10000 [35:33:39<51:41, 12.92s/it] {'loss': 0.0031, 'learning_rate': 1.2900000000000001e-06, 'epoch': 3.68} 98%|█████████▊| 9760/10000 [35:33:39<51:41, 12.92s/it] 98%|█████████▊| 9761/10000 [35:33:52<51:25, 12.91s/it] {'loss': 0.0028, 'learning_rate': 1.2850000000000002e-06, 'epoch': 3.68} 
98%|█████████▊| 9761/10000 [35:33:52<51:25, 12.91s/it] 98%|█████████▊| 9762/10000 [35:34:05<51:16, 12.93s/it] {'loss': 0.0028, 'learning_rate': 1.28e-06, 'epoch': 3.68} 98%|█████████▊| 9762/10000 [35:34:05<51:16, 12.93s/it] 98%|█████████▊| 9763/10000 [35:34:18<51:10, 12.95s/it] {'loss': 0.0029, 'learning_rate': 1.275e-06, 'epoch': 3.68} 98%|█████████▊| 9763/10000 [35:34:18<51:10, 12.95s/it] 98%|█████████▊| 9764/10000 [35:34:31<50:57, 12.96s/it] {'loss': 0.0035, 'learning_rate': 1.27e-06, 'epoch': 3.68} 98%|█████████▊| 9764/10000 [35:34:31<50:57, 12.96s/it] 98%|█████████▊| 9765/10000 [35:34:44<50:43, 12.95s/it] {'loss': 0.0043, 'learning_rate': 1.265e-06, 'epoch': 3.68} 98%|█████████▊| 9765/10000 [35:34:44<50:43, 12.95s/it] 98%|█████████▊| 9766/10000 [35:34:57<50:30, 12.95s/it] {'loss': 0.003, 'learning_rate': 1.26e-06, 'epoch': 3.68} 98%|█████████▊| 9766/10000 [35:34:57<50:30, 12.95s/it] 98%|█████████▊| 9767/10000 [35:35:10<50:19, 12.96s/it] {'loss': 0.0041, 'learning_rate': 1.255e-06, 'epoch': 3.68} 98%|█████████▊| 9767/10000 [35:35:10<50:19, 12.96s/it] 98%|█████████▊| 9768/10000 [35:35:23<50:01, 12.94s/it] {'loss': 0.0031, 'learning_rate': 1.25e-06, 'epoch': 3.68} 98%|█████████▊| 9768/10000 [35:35:23<50:01, 12.94s/it] 98%|█████████▊| 9769/10000 [35:35:36<49:50, 12.95s/it] {'loss': 0.0035, 'learning_rate': 1.245e-06, 'epoch': 3.68} 98%|█████████▊| 9769/10000 [35:35:36<49:50, 12.95s/it] 98%|█████████▊| 9770/10000 [35:35:49<49:33, 12.93s/it] {'loss': 0.0033, 'learning_rate': 1.24e-06, 'epoch': 3.68} 98%|█████████▊| 9770/10000 [35:35:49<49:33, 12.93s/it] 98%|█████████▊| 9771/10000 [35:36:02<49:19, 12.92s/it] {'loss': 0.0033, 'learning_rate': 1.235e-06, 'epoch': 3.68} 98%|█████████▊| 9771/10000 [35:36:02<49:19, 12.92s/it] 98%|█████████▊| 9772/10000 [35:36:15<49:02, 12.91s/it] {'loss': 0.0029, 'learning_rate': 1.23e-06, 'epoch': 3.68} 98%|█████████▊| 9772/10000 [35:36:15<49:02, 12.91s/it] 98%|█████████▊| 9773/10000 [35:36:27<48:49, 12.91s/it] {'loss': 0.0042, 
'learning_rate': 1.2250000000000001e-06, 'epoch': 3.68} 98%|█████████▊| 9773/10000 [35:36:27<48:49, 12.91s/it] 98%|█████████▊| 9774/10000 [35:36:40<48:37, 12.91s/it] {'loss': 0.0028, 'learning_rate': 1.2200000000000002e-06, 'epoch': 3.68} 98%|█████████▊| 9774/10000 [35:36:40<48:37, 12.91s/it] 98%|█████████▊| 9775/10000 [35:36:53<48:22, 12.90s/it] {'loss': 0.0028, 'learning_rate': 1.215e-06, 'epoch': 3.68} 98%|█████████▊| 9775/10000 [35:36:53<48:22, 12.90s/it] 98%|█████████▊| 9776/10000 [35:37:06<48:09, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.21e-06, 'epoch': 3.68} 98%|█████████▊| 9776/10000 [35:37:06<48:09, 12.90s/it] 98%|█████████▊| 9777/10000 [35:37:19<47:58, 12.91s/it] {'loss': 0.0047, 'learning_rate': 1.2050000000000001e-06, 'epoch': 3.68} 98%|█████████▊| 9777/10000 [35:37:19<47:58, 12.91s/it] 98%|█████████▊| 9778/10000 [35:37:32<47:40, 12.88s/it] {'loss': 0.0046, 'learning_rate': 1.2000000000000002e-06, 'epoch': 3.68} 98%|█████████▊| 9778/10000 [35:37:32<47:40, 12.88s/it] 98%|█████████▊| 9779/10000 [35:37:45<47:22, 12.86s/it] {'loss': 0.0033, 'learning_rate': 1.1950000000000002e-06, 'epoch': 3.68} 98%|█████████▊| 9779/10000 [35:37:45<47:22, 12.86s/it] 98%|█████████▊| 9780/10000 [35:37:58<47:13, 12.88s/it] {'loss': 0.0035, 'learning_rate': 1.19e-06, 'epoch': 3.69} 98%|█████████▊| 9780/10000 [35:37:58<47:13, 12.88s/it] 98%|█████████▊| 9781/10000 [35:38:10<46:57, 12.86s/it] {'loss': 0.0032, 'learning_rate': 1.185e-06, 'epoch': 3.69} 98%|█████████▊| 9781/10000 [35:38:10<46:57, 12.86s/it] 98%|█████████▊| 9782/10000 [35:38:23<46:43, 12.86s/it] {'loss': 0.0042, 'learning_rate': 1.18e-06, 'epoch': 3.69} 98%|█████████▊| 9782/10000 [35:38:23<46:43, 12.86s/it] 98%|█████████▊| 9783/10000 [35:38:36<46:32, 12.87s/it] {'loss': 0.0047, 'learning_rate': 1.175e-06, 'epoch': 3.69} 98%|█████████▊| 9783/10000 [35:38:36<46:32, 12.87s/it] 98%|█████████▊| 9784/10000 [35:38:49<46:21, 12.88s/it] {'loss': 0.0032, 'learning_rate': 1.17e-06, 'epoch': 3.69} 98%|█████████▊| 
9784/10000 [35:38:49<46:21, 12.88s/it] 98%|█████████▊| 9785/10000 [35:39:02<46:09, 12.88s/it] {'loss': 0.003, 'learning_rate': 1.165e-06, 'epoch': 3.69} 98%|█████████▊| 9785/10000 [35:39:02<46:09, 12.88s/it] 98%|█████████▊| 9786/10000 [35:39:15<45:55, 12.88s/it] {'loss': 0.0029, 'learning_rate': 1.16e-06, 'epoch': 3.69} 98%|█████████▊| 9786/10000 [35:39:15<45:55, 12.88s/it] 98%|█████████▊| 9787/10000 [35:39:28<45:44, 12.88s/it] {'loss': 0.0024, 'learning_rate': 1.155e-06, 'epoch': 3.69} 98%|█████████▊| 9787/10000 [35:39:28<45:44, 12.88s/it] 98%|█████████▊| 9788/10000 [35:39:41<45:30, 12.88s/it] {'loss': 0.003, 'learning_rate': 1.15e-06, 'epoch': 3.69} 98%|█████████▊| 9788/10000 [35:39:41<45:30, 12.88s/it] 98%|█████████▊| 9789/10000 [35:39:53<45:17, 12.88s/it] {'loss': 0.0035, 'learning_rate': 1.145e-06, 'epoch': 3.69} 98%|█████████▊| 9789/10000 [35:39:53<45:17, 12.88s/it] 98%|█████████▊| 9790/10000 [35:40:06<45:02, 12.87s/it] {'loss': 0.005, 'learning_rate': 1.14e-06, 'epoch': 3.69} 98%|█████████▊| 9790/10000 [35:40:06<45:02, 12.87s/it] 98%|█████████▊| 9791/10000 [35:40:19<44:55, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.1350000000000001e-06, 'epoch': 3.69} 98%|█████████▊| 9791/10000 [35:40:19<44:55, 12.90s/it] 98%|█████████▊| 9792/10000 [35:40:32<44:42, 12.90s/it] {'loss': 0.0027, 'learning_rate': 1.13e-06, 'epoch': 3.69} 98%|█████████▊| 9792/10000 [35:40:32<44:42, 12.90s/it] 98%|█████████▊| 9793/10000 [35:40:45<44:28, 12.89s/it] {'loss': 0.0043, 'learning_rate': 1.125e-06, 'epoch': 3.69} 98%|█████████▊| 9793/10000 [35:40:45<44:28, 12.89s/it] 98%|█████████▊| 9794/10000 [35:40:58<44:19, 12.91s/it] {'loss': 0.0039, 'learning_rate': 1.12e-06, 'epoch': 3.69} 98%|█████████▊| 9794/10000 [35:40:58<44:19, 12.91s/it] 98%|█████████▊| 9795/10000 [35:41:11<44:05, 12.91s/it] {'loss': 0.0033, 'learning_rate': 1.1150000000000001e-06, 'epoch': 3.69} 98%|█████████▊| 9795/10000 [35:41:11<44:05, 12.91s/it] 98%|█████████▊| 9796/10000 [35:41:24<43:54, 12.91s/it] {'loss': 0.0033, 
'learning_rate': 1.1100000000000002e-06, 'epoch': 3.69} 98%|█████████▊| 9796/10000 [35:41:24<43:54, 12.91s/it] 98%|█████████▊| 9797/10000 [35:41:37<43:43, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.1050000000000002e-06, 'epoch': 3.69} 98%|█████████▊| 9797/10000 [35:41:37<43:43, 12.92s/it] 98%|█████████▊| 9798/10000 [35:41:50<43:28, 12.91s/it] {'loss': 0.0042, 'learning_rate': 1.1e-06, 'epoch': 3.69} 98%|█████████▊| 9798/10000 [35:41:50<43:28, 12.91s/it] 98%|█████████▊| 9799/10000 [35:42:03<43:21, 12.94s/it] {'loss': 0.0029, 'learning_rate': 1.095e-06, 'epoch': 3.69} 98%|█████████▊| 9799/10000 [35:42:03<43:21, 12.94s/it] 98%|█████████▊| 9800/10000 [35:42:16<43:04, 12.92s/it] {'loss': 0.004, 'learning_rate': 1.0900000000000002e-06, 'epoch': 3.69} 98%|█████████▊| 9800/10000 [35:42:16<43:04, 12.92s/it] 98%|█████████▊| 9801/10000 [35:42:29<42:56, 12.95s/it] {'loss': 0.0036, 'learning_rate': 1.085e-06, 'epoch': 3.69} 98%|█████████▊| 9801/10000 [35:42:29<42:56, 12.95s/it] 98%|█████████▊| 9802/10000 [35:42:41<42:42, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.08e-06, 'epoch': 3.69} 98%|█████████▊| 9802/10000 [35:42:42<42:42, 12.94s/it] 98%|█████████▊| 9803/10000 [35:42:54<42:28, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.0749999999999999e-06, 'epoch': 3.69} 98%|█████████▊| 9803/10000 [35:42:54<42:28, 12.93s/it] 98%|█████████▊| 9804/10000 [35:43:07<42:16, 12.94s/it] {'loss': 0.0041, 'learning_rate': 1.07e-06, 'epoch': 3.69} 98%|█████████▊| 9804/10000 [35:43:07<42:16, 12.94s/it] 98%|█████████▊| 9805/10000 [35:43:20<42:03, 12.94s/it] {'loss': 0.0038, 'learning_rate': 1.065e-06, 'epoch': 3.69} 98%|█████████▊| 9805/10000 [35:43:20<42:03, 12.94s/it] 98%|█████████▊| 9806/10000 [35:43:33<41:47, 12.93s/it] {'loss': 0.0029, 'learning_rate': 1.06e-06, 'epoch': 3.69} 98%|█████████▊| 9806/10000 [35:43:33<41:47, 12.93s/it] 98%|█████████▊| 9807/10000 [35:43:46<41:33, 12.92s/it] {'loss': 0.0044, 'learning_rate': 1.055e-06, 'epoch': 3.7} 98%|█████████▊| 9807/10000 
[35:43:46<41:33, 12.92s/it] 98%|█████████▊| 9808/10000 [35:43:59<41:19, 12.92s/it] {'loss': 0.0028, 'learning_rate': 1.0500000000000001e-06, 'epoch': 3.7} 98%|█████████▊| 9808/10000 [35:43:59<41:19, 12.92s/it] 98%|█████████▊| 9809/10000 [35:44:12<41:07, 12.92s/it] {'loss': 0.0031, 'learning_rate': 1.045e-06, 'epoch': 3.7} 98%|█████████▊| 9809/10000 [35:44:12<41:07, 12.92s/it] 98%|█████████▊| 9810/10000 [35:44:25<40:52, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.04e-06, 'epoch': 3.7} 98%|█████████▊| 9810/10000 [35:44:25<40:52, 12.91s/it] 98%|█████████▊| 9811/10000 [35:44:38<40:34, 12.88s/it] {'loss': 0.0039, 'learning_rate': 1.035e-06, 'epoch': 3.7} 98%|█████████▊| 9811/10000 [35:44:38<40:34, 12.88s/it] 98%|█████████▊| 9812/10000 [35:44:51<40:23, 12.89s/it] {'loss': 0.0038, 'learning_rate': 1.03e-06, 'epoch': 3.7} 98%|█████████▊| 9812/10000 [35:44:51<40:23, 12.89s/it] 98%|█████████▊| 9813/10000 [35:45:03<40:08, 12.88s/it] {'loss': 0.0032, 'learning_rate': 1.0250000000000001e-06, 'epoch': 3.7} 98%|█████████▊| 9813/10000 [35:45:03<40:08, 12.88s/it] 98%|█████████▊| 9814/10000 [35:45:16<39:57, 12.89s/it] {'loss': 0.0028, 'learning_rate': 1.0200000000000002e-06, 'epoch': 3.7} 98%|█████████▊| 9814/10000 [35:45:16<39:57, 12.89s/it] 98%|█████████▊| 9815/10000 [35:45:29<39:46, 12.90s/it] {'loss': 0.004, 'learning_rate': 1.015e-06, 'epoch': 3.7} 98%|█████████▊| 9815/10000 [35:45:29<39:46, 12.90s/it] 98%|█████████▊| 9816/10000 [35:45:42<39:33, 12.90s/it] {'loss': 0.0032, 'learning_rate': 1.01e-06, 'epoch': 3.7} 98%|█████████▊| 9816/10000 [35:45:42<39:33, 12.90s/it] 98%|█████████▊| 9817/10000 [35:45:55<39:21, 12.90s/it] {'loss': 0.0038, 'learning_rate': 1.0050000000000001e-06, 'epoch': 3.7} 98%|█████████▊| 9817/10000 [35:45:55<39:21, 12.90s/it] 98%|█████████▊| 9818/10000 [35:46:08<39:01, 12.87s/it] {'loss': 0.0043, 'learning_rate': 1.0000000000000002e-06, 'epoch': 3.7} 98%|█████████▊| 9818/10000 [35:46:08<39:01, 12.87s/it] 98%|█████████▊| 9819/10000 [35:46:21<38:49, 
12.87s/it] {'loss': 0.0039, 'learning_rate': 9.95e-07, 'epoch': 3.7} 98%|█████████▊| 9819/10000 [35:46:21<38:49, 12.87s/it] 98%|█████████▊| 9820/10000 [35:46:34<38:35, 12.87s/it] {'loss': 0.0028, 'learning_rate': 9.9e-07, 'epoch': 3.7} 98%|█████████▊| 9820/10000 [35:46:34<38:35, 12.87s/it] 98%|█████████▊| 9821/10000 [35:46:46<38:25, 12.88s/it] {'loss': 0.0031, 'learning_rate': 9.849999999999999e-07, 'epoch': 3.7} 98%|█████████▊| 9821/10000 [35:46:46<38:25, 12.88s/it] 98%|█████████▊| 9822/10000 [35:46:59<38:11, 12.87s/it] {'loss': 0.0032, 'learning_rate': 9.8e-07, 'epoch': 3.7} 98%|█████████▊| 9822/10000 [35:46:59<38:11, 12.87s/it] 98%|█████████▊| 9823/10000 [35:47:12<37:57, 12.87s/it] {'loss': 0.003, 'learning_rate': 9.75e-07, 'epoch': 3.7} 98%|█████████▊| 9823/10000 [35:47:12<37:57, 12.87s/it] 98%|█████████▊| 9824/10000 [35:47:25<37:50, 12.90s/it] {'loss': 0.0046, 'learning_rate': 9.7e-07, 'epoch': 3.7} 98%|█████████▊| 9824/10000 [35:47:25<37:50, 12.90s/it] 98%|█████████▊| 9825/10000 [35:47:38<37:37, 12.90s/it] {'loss': 0.0052, 'learning_rate': 9.65e-07, 'epoch': 3.7} 98%|█████████▊| 9825/10000 [35:47:38<37:37, 12.90s/it] 98%|█████████▊| 9826/10000 [35:47:51<37:27, 12.92s/it] {'loss': 0.0028, 'learning_rate': 9.6e-07, 'epoch': 3.7} 98%|█████████▊| 9826/10000 [35:47:51<37:27, 12.92s/it] 98%|█████████▊| 9827/10000 [35:48:04<37:14, 12.92s/it] {'loss': 0.0037, 'learning_rate': 9.55e-07, 'epoch': 3.7} 98%|█████████▊| 9827/10000 [35:48:04<37:14, 12.92s/it] 98%|█████████▊| 9828/10000 [35:48:17<36:59, 12.90s/it] {'loss': 0.0029, 'learning_rate': 9.5e-07, 'epoch': 3.7} 98%|█████████▊| 9828/10000 [35:48:17<36:59, 12.90s/it] 98%|█████████▊| 9829/10000 [35:48:30<36:46, 12.91s/it] {'loss': 0.003, 'learning_rate': 9.450000000000001e-07, 'epoch': 3.7} 98%|█████████▊| 9829/10000 [35:48:30<36:46, 12.91s/it] 98%|█████████▊| 9830/10000 [35:48:43<36:31, 12.89s/it] {'loss': 0.0039, 'learning_rate': 9.400000000000001e-07, 'epoch': 3.7} 98%|█████████▊| 9830/10000 [35:48:43<36:31, 
12.89s/it] 98%|█████████▊| 9831/10000 [35:48:55<36:16, 12.88s/it] {'loss': 0.0027, 'learning_rate': 9.350000000000002e-07, 'epoch': 3.7} 98%|█████████▊| 9831/10000 [35:48:55<36:16, 12.88s/it] 98%|█████████▊| 9832/10000 [35:49:08<36:04, 12.88s/it] {'loss': 0.0058, 'learning_rate': 9.3e-07, 'epoch': 3.7} 98%|█████████▊| 9832/10000 [35:49:08<36:04, 12.88s/it] 98%|█████████▊| 9833/10000 [35:49:21<35:50, 12.88s/it] {'loss': 0.0035, 'learning_rate': 9.25e-07, 'epoch': 3.7} 98%|█████████▊| 9833/10000 [35:49:21<35:50, 12.88s/it] 98%|█████████▊| 9834/10000 [35:49:34<35:37, 12.88s/it] {'loss': 0.0024, 'learning_rate': 9.2e-07, 'epoch': 3.71} 98%|█████████▊| 9834/10000 [35:49:34<35:37, 12.88s/it] 98%|█████████▊| 9835/10000 [35:49:47<35:26, 12.89s/it] {'loss': 0.0039, 'learning_rate': 9.15e-07, 'epoch': 3.71} 98%|█████████▊| 9835/10000 [35:49:47<35:26, 12.89s/it] 98%|█████████▊| 9836/10000 [35:50:00<35:11, 12.88s/it] {'loss': 0.003, 'learning_rate': 9.100000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9836/10000 [35:50:00<35:11, 12.88s/it] 98%|█████████▊| 9837/10000 [35:50:13<34:58, 12.87s/it] {'loss': 0.0025, 'learning_rate': 9.050000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9837/10000 [35:50:13<34:58, 12.87s/it] 98%|█████████▊| 9838/10000 [35:50:26<34:51, 12.91s/it] {'loss': 0.003, 'learning_rate': 9e-07, 'epoch': 3.71} 98%|█████████▊| 9838/10000 [35:50:26<34:51, 12.91s/it] 98%|█████████▊| 9839/10000 [35:50:39<34:37, 12.90s/it] {'loss': 0.0037, 'learning_rate': 8.95e-07, 'epoch': 3.71} 98%|█████████▊| 9839/10000 [35:50:39<34:37, 12.90s/it] 98%|█████████▊| 9840/10000 [35:50:51<34:22, 12.89s/it] {'loss': 0.0046, 'learning_rate': 8.900000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9840/10000 [35:50:51<34:22, 12.89s/it] 98%|█████████▊| 9841/10000 [35:51:04<34:08, 12.89s/it] {'loss': 0.0043, 'learning_rate': 8.850000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9841/10000 [35:51:04<34:08, 12.89s/it] 98%|█████████▊| 9842/10000 [35:51:17<33:53, 12.87s/it] {'loss': 0.0027, 
'learning_rate': 8.8e-07, 'epoch': 3.71} 98%|█████████▊| 9842/10000 [35:51:17<33:53, 12.87s/it] 98%|█████████▊| 9843/10000 [35:51:30<33:40, 12.87s/it] {'loss': 0.004, 'learning_rate': 8.750000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9843/10000 [35:51:30<33:40, 12.87s/it] 98%|█████████▊| 9844/10000 [35:51:43<33:33, 12.91s/it] {'loss': 0.0035, 'learning_rate': 8.699999999999999e-07, 'epoch': 3.71} 98%|█████████▊| 9844/10000 [35:51:43<33:33, 12.91s/it] 98%|█████████▊| 9845/10000 [35:51:56<33:18, 12.90s/it] {'loss': 0.0043, 'learning_rate': 8.65e-07, 'epoch': 3.71} 98%|█████████▊| 9845/10000 [35:51:56<33:18, 12.90s/it] 98%|█████████▊| 9846/10000 [35:52:09<33:03, 12.88s/it] {'loss': 0.003, 'learning_rate': 8.6e-07, 'epoch': 3.71} 98%|█████████▊| 9846/10000 [35:52:09<33:03, 12.88s/it] 98%|█████████▊| 9847/10000 [35:52:22<32:51, 12.89s/it] {'loss': 0.0028, 'learning_rate': 8.550000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9847/10000 [35:52:22<32:51, 12.89s/it] 98%|█████████▊| 9848/10000 [35:52:34<32:37, 12.88s/it] {'loss': 0.0054, 'learning_rate': 8.500000000000001e-07, 'epoch': 3.71} 98%|█████████▊| 9848/10000 [35:52:35<32:37, 12.88s/it] 98%|█████████▊| 9849/10000 [35:52:47<32:24, 12.88s/it] {'loss': 0.0039, 'learning_rate': 8.45e-07, 'epoch': 3.71} 98%|█████████▊| 9849/10000 [35:52:47<32:24, 12.88s/it] 98%|█████████▊| 9850/10000 [35:53:00<32:13, 12.89s/it] {'loss': 0.0031, 'learning_rate': 8.4e-07, 'epoch': 3.71} 98%|█████████▊| 9850/10000 [35:53:00<32:13, 12.89s/it] 99%|█████████▊| 9851/10000 [35:53:13<31:59, 12.88s/it] {'loss': 0.0039, 'learning_rate': 8.35e-07, 'epoch': 3.71} 99%|█████████▊| 9851/10000 [35:53:13<31:59, 12.88s/it] 99%|█████████▊| 9852/10000 [35:53:26<31:45, 12.87s/it] {'loss': 0.0024, 'learning_rate': 8.300000000000001e-07, 'epoch': 3.71} 99%|█████████▊| 9852/10000 [35:53:26<31:45, 12.87s/it] 99%|█████████▊| 9853/10000 [35:53:39<31:37, 12.91s/it] {'loss': 0.0037, 'learning_rate': 8.25e-07, 'epoch': 3.71} 99%|█████████▊| 9853/10000 
[35:53:39<31:37, 12.91s/it] 99%|█████████▊| 9854/10000 [35:53:52<31:23, 12.90s/it] {'loss': 0.0029, 'learning_rate': 8.200000000000001e-07, 'epoch': 3.71} 99%|█████████▊| 9854/10000 [35:53:52<31:23, 12.90s/it] 99%|█████████▊| 9855/10000 [35:54:05<31:11, 12.91s/it] {'loss': 0.0042, 'learning_rate': 8.149999999999999e-07, 'epoch': 3.71} 99%|█████████▊| 9855/10000 [35:54:05<31:11, 12.91s/it] 99%|█████████▊| 9856/10000 [35:54:18<30:57, 12.90s/it] {'loss': 0.004, 'learning_rate': 8.1e-07, 'epoch': 3.71} 99%|█████████▊| 9856/10000 [35:54:18<30:57, 12.90s/it] 99%|█████████▊| 9857/10000 [35:54:31<30:44, 12.90s/it] {'loss': 0.003, 'learning_rate': 8.05e-07, 'epoch': 3.71} 99%|█████████▊| 9857/10000 [35:54:31<30:44, 12.90s/it] 99%|█████████▊| 9858/10000 [35:54:44<30:35, 12.93s/it] {'loss': 0.0028, 'learning_rate': 8.000000000000001e-07, 'epoch': 3.71} 99%|█████████▊| 9858/10000 [35:54:44<30:35, 12.93s/it] 99%|█████████▊| 9859/10000 [35:54:57<30:24, 12.94s/it] {'loss': 0.0033, 'learning_rate': 7.950000000000001e-07, 'epoch': 3.71} 99%|█████████▊| 9859/10000 [35:54:57<30:24, 12.94s/it] 99%|█████████▊| 9860/10000 [35:55:10<30:16, 12.98s/it] {'loss': 0.0036, 'learning_rate': 7.900000000000002e-07, 'epoch': 3.72} 99%|█████████▊| 9860/10000 [35:55:10<30:16, 12.98s/it] 99%|█████████▊| 9861/10000 [35:55:22<30:00, 12.96s/it] {'loss': 0.0032, 'learning_rate': 7.85e-07, 'epoch': 3.72} 99%|█████████▊| 9861/10000 [35:55:23<30:00, 12.96s/it] 99%|█████████▊| 9862/10000 [35:55:35<29:45, 12.94s/it] {'loss': 0.0037, 'learning_rate': 7.8e-07, 'epoch': 3.72} 99%|█████████▊| 9862/10000 [35:55:35<29:45, 12.94s/it] 99%|█████████▊| 9863/10000 [35:55:48<29:37, 12.98s/it] {'loss': 0.0022, 'learning_rate': 7.75e-07, 'epoch': 3.72} 99%|█████████▊| 9863/10000 [35:55:48<29:37, 12.98s/it] 99%|█████████▊| 9864/10000 [35:56:01<29:23, 12.97s/it] {'loss': 0.0036, 'learning_rate': 7.7e-07, 'epoch': 3.72} 99%|█████████▊| 9864/10000 [35:56:01<29:23, 12.97s/it] 99%|█████████▊| 9865/10000 [35:56:14<29:11, 
12.97s/it] {'loss': 0.0027, 'learning_rate': 7.65e-07, 'epoch': 3.72} 99%|█████████▊| 9865/10000 [35:56:14<29:11, 12.97s/it] 99%|█████████▊| 9866/10000 [35:56:27<28:54, 12.94s/it] {'loss': 0.0035, 'learning_rate': 7.6e-07, 'epoch': 3.72} 99%|█████████▊| 9866/10000 [35:56:27<28:54, 12.94s/it] 99%|█████████▊| 9867/10000 [35:56:40<28:42, 12.95s/it] {'loss': 0.003, 'learning_rate': 7.550000000000001e-07, 'epoch': 3.72} 99%|█████████▊| 9867/10000 [35:56:40<28:42, 12.95s/it] 99%|█████████▊| 9868/10000 [35:56:53<28:28, 12.94s/it] {'loss': 0.0033, 'learning_rate': 7.5e-07, 'epoch': 3.72} 99%|█████████▊| 9868/10000 [35:56:53<28:28, 12.94s/it] 99%|█████████▊| 9869/10000 [35:57:06<28:14, 12.94s/it] {'loss': 0.0033, 'learning_rate': 7.450000000000001e-07, 'epoch': 3.72} 99%|█████████▊| 9869/10000 [35:57:06<28:14, 12.94s/it] 99%|█████████▊| 9870/10000 [35:57:19<28:01, 12.94s/it] {'loss': 0.0041, 'learning_rate': 7.400000000000001e-07, 'epoch': 3.72} 99%|█████████▊| 9870/10000 [35:57:19<28:01, 12.94s/it] 99%|█████████▊| 9871/10000 [35:57:32<27:47, 12.92s/it] {'loss': 0.0043, 'learning_rate': 7.350000000000001e-07, 'epoch': 3.72} 99%|█████████▊| 9871/10000 [35:57:32<27:47, 12.92s/it] 99%|█████████▊| 9872/10000 [35:57:45<27:31, 12.90s/it] {'loss': 0.0037, 'learning_rate': 7.3e-07, 'epoch': 3.72} 99%|█████████▊| 9872/10000 [35:57:45<27:31, 12.90s/it] 99%|█████████▊| 9873/10000 [35:57:58<27:16, 12.89s/it] {'loss': 0.0051, 'learning_rate': 7.25e-07, 'epoch': 3.72} 99%|█████████▊| 9873/10000 [35:57:58<27:16, 12.89s/it] 99%|█████████▊| 9874/10000 [35:58:10<27:03, 12.88s/it] {'loss': 0.0037, 'learning_rate': 7.2e-07, 'epoch': 3.72} 99%|█████████▊| 9874/10000 [35:58:11<27:03, 12.88s/it] 99%|█████████▉| 9875/10000 [35:58:23<26:51, 12.89s/it] {'loss': 0.0031, 'learning_rate': 7.15e-07, 'epoch': 3.72} 99%|█████████▉| 9875/10000 [35:58:23<26:51, 12.89s/it] 99%|█████████▉| 9876/10000 [35:58:36<26:36, 12.87s/it] {'loss': 0.0034, 'learning_rate': 7.100000000000001e-07, 'epoch': 3.72} 
99%|█████████▉| 9876/10000 [35:58:36<26:36, 12.87s/it] 99%|█████████▉| 9877/10000 [35:58:49<26:24, 12.88s/it] {'loss': 0.0053, 'learning_rate': 7.05e-07, 'epoch': 3.72} 99%|█████████▉| 9877/10000 [35:58:49<26:24, 12.88s/it] 99%|█████████▉| 9878/10000 [35:59:02<26:11, 12.88s/it] {'loss': 0.0037, 'learning_rate': 7.000000000000001e-07, 'epoch': 3.72} 99%|█████████▉| 9878/10000 [35:59:02<26:11, 12.88s/it] 99%|█████████▉| 9879/10000 [35:59:15<25:58, 12.88s/it] {'loss': 0.0034, 'learning_rate': 6.95e-07, 'epoch': 3.72} 99%|█████████▉| 9879/10000 [35:59:15<25:58, 12.88s/it] 99%|█████████▉| 9880/10000 [35:59:28<25:47, 12.89s/it] {'loss': 0.0037, 'learning_rate': 6.900000000000001e-07, 'epoch': 3.72} 99%|█████████▉| 9880/10000 [35:59:28<25:47, 12.89s/it] 99%|█████████▉| 9881/10000 [35:59:41<25:32, 12.88s/it] {'loss': 0.0027, 'learning_rate': 6.85e-07, 'epoch': 3.72} 99%|█████████▉| 9881/10000 [35:59:41<25:32, 12.88s/it] 99%|█████████▉| 9882/10000 [35:59:54<25:22, 12.90s/it] {'loss': 0.0031, 'learning_rate': 6.8e-07, 'epoch': 3.72} 99%|█████████▉| 9882/10000 [35:59:54<25:22, 12.90s/it] 99%|█████████▉| 9883/10000 [36:00:07<25:10, 12.91s/it] {'loss': 0.003, 'learning_rate': 6.75e-07, 'epoch': 3.72} 99%|█████████▉| 9883/10000 [36:00:07<25:10, 12.91s/it] 99%|█████████▉| 9884/10000 [36:00:19<24:59, 12.92s/it] {'loss': 0.0033, 'learning_rate': 6.7e-07, 'epoch': 3.72} 99%|█████████▉| 9884/10000 [36:00:20<24:59, 12.92s/it] 99%|█████████▉| 9885/10000 [36:00:32<24:47, 12.93s/it] {'loss': 0.0025, 'learning_rate': 6.65e-07, 'epoch': 3.72} 99%|█████████▉| 9885/10000 [36:00:32<24:47, 12.93s/it] 99%|█████████▉| 9886/10000 [36:00:45<24:32, 12.92s/it] {'loss': 0.0031, 'learning_rate': 6.6e-07, 'epoch': 3.72} 99%|█████████▉| 9886/10000 [36:00:45<24:32, 12.92s/it] 99%|█████████▉| 9887/10000 [36:00:58<24:19, 12.92s/it] {'loss': 0.0048, 'learning_rate': 6.550000000000001e-07, 'epoch': 3.73} 99%|█████████▉| 9887/10000 [36:00:58<24:19, 12.92s/it] 99%|█████████▉| 9888/10000 [36:01:11<24:06, 
12.92s/it] {'loss': 0.0031, 'learning_rate': 6.5e-07, 'epoch': 3.73} 99%|█████████▉| 9888/10000 [36:01:11<24:06, 12.92s/it] 99%|█████████▉| 9889/10000 [36:01:24<23:57, 12.95s/it] {'loss': 0.003, 'learning_rate': 6.450000000000001e-07, 'epoch': 3.73} 99%|█████████▉| 9889/10000 [36:01:24<23:57, 12.95s/it] 99%|█████████▉| 9890/10000 [36:01:37<23:45, 12.96s/it] {'loss': 0.0028, 'learning_rate': 6.4e-07, 'epoch': 3.73} 99%|█████████▉| 9890/10000 [36:01:37<23:45, 12.96s/it] 99%|█████████▉| 9891/10000 [36:01:50<23:31, 12.95s/it] {'loss': 0.004, 'learning_rate': 6.35e-07, 'epoch': 3.73} 99%|█████████▉| 9891/10000 [36:01:50<23:31, 12.95s/it] 99%|█████████▉| 9892/10000 [36:02:03<23:17, 12.94s/it] {'loss': 0.0033, 'learning_rate': 6.3e-07, 'epoch': 3.73} 99%|█████████▉| 9892/10000 [36:02:03<23:17, 12.94s/it] 99%|█████████▉| 9893/10000 [36:02:16<23:04, 12.94s/it] {'loss': 0.0049, 'learning_rate': 6.25e-07, 'epoch': 3.73} 99%|█████████▉| 9893/10000 [36:02:16<23:04, 12.94s/it] 99%|█████████▉| 9894/10000 [36:02:29<22:49, 12.92s/it] {'loss': 0.0038, 'learning_rate': 6.2e-07, 'epoch': 3.73} 99%|█████████▉| 9894/10000 [36:02:29<22:49, 12.92s/it] 99%|█████████▉| 9895/10000 [36:02:42<22:37, 12.92s/it] {'loss': 0.0033, 'learning_rate': 6.15e-07, 'epoch': 3.73} 99%|█████████▉| 9895/10000 [36:02:42<22:37, 12.92s/it] 99%|█████████▉| 9896/10000 [36:02:55<22:26, 12.95s/it] {'loss': 0.003, 'learning_rate': 6.100000000000001e-07, 'epoch': 3.73} 99%|█████████▉| 9896/10000 [36:02:55<22:26, 12.95s/it] 99%|█████████▉| 9897/10000 [36:03:08<22:15, 12.96s/it] {'loss': 0.0022, 'learning_rate': 6.05e-07, 'epoch': 3.73} 99%|█████████▉| 9897/10000 [36:03:08<22:15, 12.96s/it] 99%|█████████▉| 9898/10000 [36:03:21<22:02, 12.96s/it] {'loss': 0.0034, 'learning_rate': 6.000000000000001e-07, 'epoch': 3.73} 99%|█████████▉| 9898/10000 [36:03:21<22:02, 12.96s/it] 99%|█████████▉| 9899/10000 [36:03:34<21:48, 12.95s/it] {'loss': 0.0037, 'learning_rate': 5.95e-07, 'epoch': 3.73} 99%|█████████▉| 9899/10000 
[36:03:34<21:48, 12.95s/it] 99%|█████████▉| 9900/10000 [36:03:47<21:34, 12.95s/it] {'loss': 0.004, 'learning_rate': 5.9e-07, 'epoch': 3.73} 99%|█████████▉| 9900/10000 [36:03:47<21:34, 12.95s/it] 99%|█████████▉| 9901/10000 [36:03:59<21:19, 12.93s/it] {'loss': 0.0041, 'learning_rate': 5.85e-07, 'epoch': 3.73} 99%|█████████▉| 9901/10000 [36:04:00<21:19, 12.93s/it] 99%|█████████▉| 9902/10000 [36:04:13<21:10, 12.97s/it] {'loss': 0.0039, 'learning_rate': 5.8e-07, 'epoch': 3.73} 99%|█████████▉| 9902/10000 [36:04:13<21:10, 12.97s/it] 99%|█████████▉| 9903/10000 [36:04:25<20:56, 12.95s/it] {'loss': 0.0028, 'learning_rate': 5.75e-07, 'epoch': 3.73} 99%|█████████▉| 9903/10000 [36:04:25<20:56, 12.95s/it] 99%|█████████▉| 9904/10000 [36:04:38<20:42, 12.94s/it] {'loss': 0.0037, 'learning_rate': 5.7e-07, 'epoch': 3.73} 99%|█████████▉| 9904/10000 [36:04:38<20:42, 12.94s/it] 99%|█████████▉| 9905/10000 [36:04:51<20:28, 12.93s/it] {'loss': 0.0044, 'learning_rate': 5.65e-07, 'epoch': 3.73} 99%|█████████▉| 9905/10000 [36:04:51<20:28, 12.93s/it] 99%|█████████▉| 9906/10000 [36:05:04<20:14, 12.92s/it] {'loss': 0.0042, 'learning_rate': 5.6e-07, 'epoch': 3.73} 99%|█████████▉| 9906/10000 [36:05:04<20:14, 12.92s/it] 99%|█████████▉| 9907/10000 [36:05:17<20:01, 12.92s/it] {'loss': 0.0036, 'learning_rate': 5.550000000000001e-07, 'epoch': 3.73} 99%|█████████▉| 9907/10000 [36:05:17<20:01, 12.92s/it] 99%|█████████▉| 9908/10000 [36:05:30<19:48, 12.92s/it] {'loss': 0.0038, 'learning_rate': 5.5e-07, 'epoch': 3.73} 99%|█████████▉| 9908/10000 [36:05:30<19:48, 12.92s/it] 99%|█████████▉| 9909/10000 [36:05:43<19:35, 12.92s/it] {'loss': 0.0038, 'learning_rate': 5.450000000000001e-07, 'epoch': 3.73} 99%|█████████▉| 9909/10000 [36:05:43<19:35, 12.92s/it] 99%|█████████▉| 9910/10000 [36:05:56<19:24, 12.94s/it] {'loss': 0.0041, 'learning_rate': 5.4e-07, 'epoch': 3.73} 99%|█████████▉| 9910/10000 [36:05:56<19:24, 12.94s/it] 99%|█████████▉| 9911/10000 [36:06:09<19:11, 12.94s/it] {'loss': 0.0042, 'learning_rate': 
5.35e-07, 'epoch': 3.73} 99%|█████████▉| 9911/10000 [36:06:09<19:11, 12.94s/it] 99%|█████████▉| 9912/10000 [36:06:22<18:57, 12.93s/it] {'loss': 0.0054, 'learning_rate': 5.3e-07, 'epoch': 3.73} 99%|█████████▉| 9912/10000 [36:06:22<18:57, 12.93s/it] 99%|█████████▉| 9913/10000 [36:06:35<18:43, 12.92s/it] {'loss': 0.0032, 'learning_rate': 5.250000000000001e-07, 'epoch': 3.74} 99%|█████████▉| 9913/10000 [36:06:35<18:43, 12.92s/it] 99%|█████████▉| 9914/10000 [36:06:48<18:31, 12.92s/it] {'loss': 0.0051, 'learning_rate': 5.2e-07, 'epoch': 3.74} 99%|█████████▉| 9914/10000 [36:06:48<18:31, 12.92s/it] 99%|█████████▉| 9915/10000 [36:07:00<18:18, 12.92s/it] {'loss': 0.0044, 'learning_rate': 5.15e-07, 'epoch': 3.74} 99%|█████████▉| 9915/10000 [36:07:01<18:18, 12.92s/it] 99%|█████████▉| 9916/10000 [36:07:13<18:04, 12.92s/it] {'loss': 0.0032, 'learning_rate': 5.100000000000001e-07, 'epoch': 3.74} 99%|█████████▉| 9916/10000 [36:07:13<18:04, 12.92s/it] 99%|█████████▉| 9917/10000 [36:07:26<17:56, 12.97s/it] {'loss': 0.0045, 'learning_rate': 5.05e-07, 'epoch': 3.74} 99%|█████████▉| 9917/10000 [36:07:27<17:56, 12.97s/it] 99%|█████████▉| 9918/10000 [36:07:39<17:42, 12.96s/it] {'loss': 0.0041, 'learning_rate': 5.000000000000001e-07, 'epoch': 3.74} 99%|█████████▉| 9918/10000 [36:07:39<17:42, 12.96s/it] 99%|█████████▉| 9919/10000 [36:07:52<17:27, 12.93s/it] {'loss': 0.0044, 'learning_rate': 4.95e-07, 'epoch': 3.74} 99%|█████████▉| 9919/10000 [36:07:52<17:27, 12.93s/it] 99%|█████████▉| 9920/10000 [36:08:05<17:13, 12.92s/it] {'loss': 0.0028, 'learning_rate': 4.9e-07, 'epoch': 3.74} 99%|█████████▉| 9920/10000 [36:08:05<17:13, 12.92s/it] 99%|█████████▉| 9921/10000 [36:08:18<16:59, 12.91s/it] {'loss': 0.0046, 'learning_rate': 4.85e-07, 'epoch': 3.74} 99%|█████████▉| 9921/10000 [36:08:18<16:59, 12.91s/it] 99%|█████████▉| 9922/10000 [36:08:31<16:44, 12.88s/it] {'loss': 0.0049, 'learning_rate': 4.8e-07, 'epoch': 3.74} 99%|█████████▉| 9922/10000 [36:08:31<16:44, 12.88s/it] 99%|█████████▉| 
9923/10000 [36:08:44<16:31, 12.88s/it] {'loss': 0.004, 'learning_rate': 4.75e-07, 'epoch': 3.74} 99%|█████████▉| 9923/10000 [36:08:44<16:31, 12.88s/it] 99%|█████████▉| 9924/10000 [36:08:57<16:18, 12.87s/it] {'loss': 0.0038, 'learning_rate': 4.7000000000000005e-07, 'epoch': 3.74} 99%|█████████▉| 9924/10000 [36:08:57<16:18, 12.87s/it] 99%|█████████▉| 9925/10000 [36:09:09<16:05, 12.88s/it] {'loss': 0.0046, 'learning_rate': 4.65e-07, 'epoch': 3.74} 99%|█████████▉| 9925/10000 [36:09:10<16:05, 12.88s/it] 99%|█████████▉| 9926/10000 [36:09:22<15:53, 12.89s/it] {'loss': 0.0032, 'learning_rate': 4.6e-07, 'epoch': 3.74} 99%|█████████▉| 9926/10000 [36:09:22<15:53, 12.89s/it] 99%|█████████▉| 9927/10000 [36:09:35<15:41, 12.90s/it] {'loss': 0.0033, 'learning_rate': 4.5500000000000004e-07, 'epoch': 3.74} 99%|█████████▉| 9927/10000 [36:09:35<15:41, 12.90s/it] 99%|█████████▉| 9928/10000 [36:09:48<15:28, 12.90s/it] {'loss': 0.0033, 'learning_rate': 4.5e-07, 'epoch': 3.74} 99%|█████████▉| 9928/10000 [36:09:48<15:28, 12.90s/it] 99%|█████████▉| 9929/10000 [36:10:01<15:15, 12.90s/it] {'loss': 0.003, 'learning_rate': 4.4500000000000003e-07, 'epoch': 3.74} 99%|█████████▉| 9929/10000 [36:10:01<15:15, 12.90s/it] 99%|█████████▉| 9930/10000 [36:10:14<15:04, 12.92s/it] {'loss': 0.0035, 'learning_rate': 4.4e-07, 'epoch': 3.74} 99%|█████████▉| 9930/10000 [36:10:14<15:04, 12.92s/it] 99%|█████████▉| 9931/10000 [36:10:27<14:51, 12.92s/it] {'loss': 0.0039, 'learning_rate': 4.3499999999999996e-07, 'epoch': 3.74} 99%|█████████▉| 9931/10000 [36:10:27<14:51, 12.92s/it] 99%|█████████▉| 9932/10000 [36:10:40<14:38, 12.92s/it] {'loss': 0.0027, 'learning_rate': 4.3e-07, 'epoch': 3.74} 99%|█████████▉| 9932/10000 [36:10:40<14:38, 12.92s/it] 99%|█████████▉| 9933/10000 [36:10:53<14:26, 12.93s/it] {'loss': 0.0033, 'learning_rate': 4.2500000000000006e-07, 'epoch': 3.74} 99%|█████████▉| 9933/10000 [36:10:53<14:26, 12.93s/it] 99%|█████████▉| 9934/10000 [36:11:06<14:13, 12.94s/it] {'loss': 0.0037, 'learning_rate': 
4.2e-07, 'epoch': 3.74} 99%|█████████▉| 9934/10000 [36:11:06<14:13, 12.94s/it] 99%|█████████▉| 9935/10000 [36:11:19<13:59, 12.91s/it] {'loss': 0.0034, 'learning_rate': 4.1500000000000005e-07, 'epoch': 3.74} 99%|█████████▉| 9935/10000 [36:11:19<13:59, 12.91s/it] 99%|█████████▉| 9936/10000 [36:11:32<13:46, 12.91s/it] {'loss': 0.0049, 'learning_rate': 4.1000000000000004e-07, 'epoch': 3.74} 99%|█████████▉| 9936/10000 [36:11:32<13:46, 12.91s/it] 99%|█████████▉| 9937/10000 [36:11:44<13:32, 12.89s/it] {'loss': 0.0039, 'learning_rate': 4.05e-07, 'epoch': 3.74} 99%|█████████▉| 9937/10000 [36:11:44<13:32, 12.89s/it] 99%|█████████▉| 9938/10000 [36:11:57<13:20, 12.91s/it] {'loss': 0.0033, 'learning_rate': 4.0000000000000003e-07, 'epoch': 3.74} 99%|█████████▉| 9938/10000 [36:11:57<13:20, 12.91s/it] 99%|█████████▉| 9939/10000 [36:12:10<13:06, 12.89s/it] {'loss': 0.0033, 'learning_rate': 3.950000000000001e-07, 'epoch': 3.74} 99%|█████████▉| 9939/10000 [36:12:10<13:06, 12.89s/it] 99%|█████████▉| 9940/10000 [36:12:23<12:52, 12.88s/it] {'loss': 0.004, 'learning_rate': 3.9e-07, 'epoch': 3.75} 99%|█████████▉| 9940/10000 [36:12:23<12:52, 12.88s/it] 99%|█████████▉| 9941/10000 [36:12:36<12:40, 12.89s/it] {'loss': 0.0037, 'learning_rate': 3.85e-07, 'epoch': 3.75} 99%|█████████▉| 9941/10000 [36:12:36<12:40, 12.89s/it] 99%|█████████▉| 9942/10000 [36:12:49<12:26, 12.87s/it] {'loss': 0.0036, 'learning_rate': 3.8e-07, 'epoch': 3.75} 99%|█████████▉| 9942/10000 [36:12:49<12:26, 12.87s/it] 99%|█████████▉| 9943/10000 [36:13:02<12:13, 12.86s/it] {'loss': 0.0039, 'learning_rate': 3.75e-07, 'epoch': 3.75} 99%|█████████▉| 9943/10000 [36:13:02<12:13, 12.86s/it] 99%|█████████▉| 9944/10000 [36:13:15<12:01, 12.88s/it] {'loss': 0.0047, 'learning_rate': 3.7000000000000006e-07, 'epoch': 3.75} 99%|█████████▉| 9944/10000 [36:13:15<12:01, 12.88s/it] 99%|█████████▉| 9945/10000 [36:13:28<11:48, 12.89s/it] {'loss': 0.0039, 'learning_rate': 3.65e-07, 'epoch': 3.75} 99%|█████████▉| 9945/10000 [36:13:28<11:48, 
12.89s/it] 99%|█████████▉| 9946/10000 [36:13:40<11:35, 12.88s/it] {'loss': 0.0038, 'learning_rate': 3.6e-07, 'epoch': 3.75} 99%|█████████▉| 9946/10000 [36:13:40<11:35, 12.88s/it] 99%|█████████▉| 9947/10000 [36:13:53<11:21, 12.87s/it] {'loss': 0.0038, 'learning_rate': 3.5500000000000004e-07, 'epoch': 3.75} 99%|█████████▉| 9947/10000 [36:13:53<11:21, 12.87s/it] 99%|█████████▉| 9948/10000 [36:14:06<11:09, 12.87s/it] {'loss': 0.003, 'learning_rate': 3.5000000000000004e-07, 'epoch': 3.75} 99%|█████████▉| 9948/10000 [36:14:06<11:09, 12.87s/it] 99%|█████████▉| 9949/10000 [36:14:19<10:56, 12.88s/it] {'loss': 0.0038, 'learning_rate': 3.4500000000000003e-07, 'epoch': 3.75} 99%|█████████▉| 9949/10000 [36:14:19<10:56, 12.88s/it] 100%|█████████▉| 9950/10000 [36:14:32<10:44, 12.90s/it] {'loss': 0.0036, 'learning_rate': 3.4e-07, 'epoch': 3.75} 100%|█████████▉| 9950/10000 [36:14:32<10:44, 12.90s/it] 100%|█████████▉| 9951/10000 [36:14:45<10:31, 12.90s/it] {'loss': 0.0058, 'learning_rate': 3.35e-07, 'epoch': 3.75} 100%|█████████▉| 9951/10000 [36:14:45<10:31, 12.90s/it] 100%|█████████▉| 9952/10000 [36:14:58<10:19, 12.90s/it] {'loss': 0.0036, 'learning_rate': 3.3e-07, 'epoch': 3.75} 100%|█████████▉| 9952/10000 [36:14:58<10:19, 12.90s/it] 100%|█████████▉| 9953/10000 [36:15:11<10:07, 12.92s/it] {'loss': 0.0041, 'learning_rate': 3.25e-07, 'epoch': 3.75} 100%|█████████▉| 9953/10000 [36:15:11<10:07, 12.92s/it] 100%|█████████▉| 9954/10000 [36:15:24<09:55, 12.94s/it] {'loss': 0.0025, 'learning_rate': 3.2e-07, 'epoch': 3.75} 100%|█████████▉| 9954/10000 [36:15:24<09:55, 12.94s/it] 100%|█████████▉| 9955/10000 [36:15:37<09:41, 12.93s/it] {'loss': 0.004, 'learning_rate': 3.15e-07, 'epoch': 3.75} 100%|█████████▉| 9955/10000 [36:15:37<09:41, 12.93s/it] 100%|█████████▉| 9956/10000 [36:15:49<09:28, 12.92s/it] {'loss': 0.0035, 'learning_rate': 3.1e-07, 'epoch': 3.75} 100%|█████████▉| 9956/10000 [36:15:50<09:28, 12.92s/it] 100%|█████████▉| 9957/10000 [36:16:02<09:15, 12.93s/it] {'loss': 0.0039, 
'learning_rate': 3.0500000000000004e-07, 'epoch': 3.75} 100%|█████████▉| 9957/10000 [36:16:02<09:15, 12.93s/it] 100%|█████████▉| 9958/10000 [36:16:15<09:03, 12.93s/it] {'loss': 0.0037, 'learning_rate': 3.0000000000000004e-07, 'epoch': 3.75} 100%|█████████▉| 9958/10000 [36:16:15<09:03, 12.93s/it] 100%|█████████▉| 9959/10000 [36:16:28<08:49, 12.92s/it] {'loss': 0.0048, 'learning_rate': 2.95e-07, 'epoch': 3.75} 100%|█████████▉| 9959/10000 [36:16:28<08:49, 12.92s/it] 100%|█████████▉| 9960/10000 [36:16:41<08:36, 12.91s/it] {'loss': 0.0031, 'learning_rate': 2.9e-07, 'epoch': 3.75} 100%|█████████▉| 9960/10000 [36:16:41<08:36, 12.91s/it] 100%|█████████▉| 9961/10000 [36:16:54<08:23, 12.91s/it] {'loss': 0.0032, 'learning_rate': 2.85e-07, 'epoch': 3.75} 100%|█████████▉| 9961/10000 [36:16:54<08:23, 12.91s/it] 100%|█████████▉| 9962/10000 [36:17:07<08:11, 12.93s/it] {'loss': 0.003, 'learning_rate': 2.8e-07, 'epoch': 3.75} 100%|█████████▉| 9962/10000 [36:17:07<08:11, 12.93s/it] 100%|█████████▉| 9963/10000 [36:17:20<07:58, 12.94s/it] {'loss': 0.0044, 'learning_rate': 2.75e-07, 'epoch': 3.75} 100%|█████████▉| 9963/10000 [36:17:20<07:58, 12.94s/it] 100%|█████████▉| 9964/10000 [36:17:33<07:46, 12.95s/it] {'loss': 0.0037, 'learning_rate': 2.7e-07, 'epoch': 3.75} 100%|█████████▉| 9964/10000 [36:17:33<07:46, 12.95s/it] 100%|█████████▉| 9965/10000 [36:17:46<07:34, 12.98s/it] {'loss': 0.0029, 'learning_rate': 2.65e-07, 'epoch': 3.75} 100%|█████████▉| 9965/10000 [36:17:46<07:34, 12.98s/it] 100%|█████████▉| 9966/10000 [36:17:59<07:21, 12.99s/it] {'loss': 0.003, 'learning_rate': 2.6e-07, 'epoch': 3.76} 100%|█████████▉| 9966/10000 [36:17:59<07:21, 12.99s/it] 100%|█████████▉| 9967/10000 [36:18:12<07:07, 12.97s/it] {'loss': 0.0041, 'learning_rate': 2.5500000000000005e-07, 'epoch': 3.76} 100%|█████████▉| 9967/10000 [36:18:12<07:07, 12.97s/it] 100%|█████████▉| 9968/10000 [36:18:25<06:55, 12.99s/it] {'loss': 0.0034, 'learning_rate': 2.5000000000000004e-07, 'epoch': 3.76} 100%|█████████▉| 
9968/10000 [36:18:25<06:55, 12.99s/it] 100%|█████████▉| 9969/10000 [36:18:38<06:42, 12.99s/it] {'loss': 0.0039, 'learning_rate': 2.45e-07, 'epoch': 3.76} 100%|█████████▉| 9969/10000 [36:18:38<06:42, 12.99s/it] 100%|█████████▉| 9970/10000 [36:18:51<06:29, 12.99s/it] {'loss': 0.0031, 'learning_rate': 2.4e-07, 'epoch': 3.76} 100%|█████████▉| 9970/10000 [36:18:51<06:29, 12.99s/it] 100%|█████████▉| 9971/10000 [36:19:04<06:15, 12.95s/it] {'loss': 0.0034, 'learning_rate': 2.3500000000000003e-07, 'epoch': 3.76} 100%|█████████▉| 9971/10000 [36:19:04<06:15, 12.95s/it] 100%|█████████▉| 9972/10000 [36:19:17<06:01, 12.92s/it] {'loss': 0.004, 'learning_rate': 2.3e-07, 'epoch': 3.76} 100%|█████████▉| 9972/10000 [36:19:17<06:01, 12.92s/it] 100%|█████████▉| 9973/10000 [36:19:30<05:48, 12.91s/it] {'loss': 0.0039, 'learning_rate': 2.25e-07, 'epoch': 3.76} 100%|█████████▉| 9973/10000 [36:19:30<05:48, 12.91s/it] 100%|█████████▉| 9974/10000 [36:19:43<05:36, 12.93s/it] {'loss': 0.0024, 'learning_rate': 2.2e-07, 'epoch': 3.76} 100%|█████████▉| 9974/10000 [36:19:43<05:36, 12.93s/it] 100%|█████████▉| 9975/10000 [36:19:56<05:23, 12.95s/it] {'loss': 0.0022, 'learning_rate': 2.15e-07, 'epoch': 3.76} 100%|█████████▉| 9975/10000 [36:19:56<05:23, 12.95s/it] 100%|█████████▉| 9976/10000 [36:20:08<05:10, 12.94s/it] {'loss': 0.0032, 'learning_rate': 2.1e-07, 'epoch': 3.76} 100%|█████████▉| 9976/10000 [36:20:09<05:10, 12.94s/it] 100%|█████████▉| 9977/10000 [36:20:21<04:57, 12.94s/it] {'loss': 0.0041, 'learning_rate': 2.0500000000000002e-07, 'epoch': 3.76} 100%|█████████▉| 9977/10000 [36:20:21<04:57, 12.94s/it] 100%|█████████▉| 9978/10000 [36:20:34<04:44, 12.93s/it] {'loss': 0.0032, 'learning_rate': 2.0000000000000002e-07, 'epoch': 3.76} 100%|█████████▉| 9978/10000 [36:20:34<04:44, 12.93s/it] 100%|█████████▉| 9979/10000 [36:20:47<04:31, 12.91s/it] {'loss': 0.0044, 'learning_rate': 1.95e-07, 'epoch': 3.76} 100%|█████████▉| 9979/10000 [36:20:47<04:31, 12.91s/it] 100%|█████████▉| 9980/10000 
[36:21:00<04:18, 12.91s/it] {'loss': 0.0043, 'learning_rate': 1.9e-07, 'epoch': 3.76} 100%|█████████▉| 9980/10000 [36:21:00<04:18, 12.91s/it] 100%|█████████▉| 9981/10000 [36:21:13<04:05, 12.90s/it] {'loss': 0.0032, 'learning_rate': 1.8500000000000003e-07, 'epoch': 3.76} 100%|█████████▉| 9981/10000 [36:21:13<04:05, 12.90s/it] 100%|█████████▉| 9982/10000 [36:21:26<03:52, 12.92s/it] {'loss': 0.0034, 'learning_rate': 1.8e-07, 'epoch': 3.76} 100%|█████████▉| 9982/10000 [36:21:26<03:52, 12.92s/it] 100%|█████████▉| 9983/10000 [36:21:39<03:39, 12.94s/it] {'loss': 0.0033, 'learning_rate': 1.7500000000000002e-07, 'epoch': 3.76} 100%|█████████▉| 9983/10000 [36:21:39<03:39, 12.94s/it] 100%|█████████▉| 9984/10000 [36:21:52<03:27, 12.95s/it] {'loss': 0.0029, 'learning_rate': 1.7e-07, 'epoch': 3.76} 100%|█████████▉| 9984/10000 [36:21:52<03:27, 12.95s/it] 100%|█████████▉| 9985/10000 [36:22:05<03:14, 12.96s/it] {'loss': 0.003, 'learning_rate': 1.65e-07, 'epoch': 3.76} 100%|█████████▉| 9985/10000 [36:22:05<03:14, 12.96s/it] 100%|█████████▉| 9986/10000 [36:22:18<03:00, 12.93s/it] {'loss': 0.0036, 'learning_rate': 1.6e-07, 'epoch': 3.76} 100%|█████████▉| 9986/10000 [36:22:18<03:00, 12.93s/it] 100%|█████████▉| 9987/10000 [36:22:31<02:47, 12.92s/it] {'loss': 0.0049, 'learning_rate': 1.55e-07, 'epoch': 3.76} 100%|█████████▉| 9987/10000 [36:22:31<02:47, 12.92s/it] 100%|█████████▉| 9988/10000 [36:22:44<02:35, 12.92s/it] {'loss': 0.0038, 'learning_rate': 1.5000000000000002e-07, 'epoch': 3.76} 100%|█████████▉| 9988/10000 [36:22:44<02:35, 12.92s/it] 100%|█████████▉| 9989/10000 [36:22:56<02:22, 12.92s/it] {'loss': 0.0035, 'learning_rate': 1.45e-07, 'epoch': 3.76} 100%|█████████▉| 9989/10000 [36:22:57<02:22, 12.92s/it] 100%|█████████▉| 9990/10000 [36:23:09<02:09, 12.94s/it] {'loss': 0.0035, 'learning_rate': 1.4e-07, 'epoch': 3.76} 100%|█████████▉| 9990/10000 [36:23:09<02:09, 12.94s/it] 100%|█████████▉| 9991/10000 [36:23:22<01:56, 12.93s/it] {'loss': 0.0035, 'learning_rate': 1.35e-07, 'epoch': 
3.76} 100%|█████████▉| 9991/10000 [36:23:22<01:56, 12.93s/it] 100%|█████████▉| 9992/10000 [36:23:35<01:43, 12.93s/it] {'loss': 0.0053, 'learning_rate': 1.3e-07, 'epoch': 3.76} 100%|█████████▉| 9992/10000 [36:23:35<01:43, 12.93s/it] 100%|█████████▉| 9993/10000 [36:23:48<01:30, 12.94s/it] {'loss': 0.0024, 'learning_rate': 1.2500000000000002e-07, 'epoch': 3.77} 100%|█████████▉| 9993/10000 [36:23:48<01:30, 12.94s/it] 100%|█████████▉| 9994/10000 [36:24:01<01:17, 12.92s/it] {'loss': 0.0037, 'learning_rate': 1.2e-07, 'epoch': 3.77} 100%|█████████▉| 9994/10000 [36:24:01<01:17, 12.92s/it] 100%|█████████▉| 9995/10000 [36:24:14<01:04, 12.92s/it] {'loss': 0.003, 'learning_rate': 1.15e-07, 'epoch': 3.77} 100%|█████████▉| 9995/10000 [36:24:14<01:04, 12.92s/it] 100%|█████████▉| 9996/10000 [36:24:27<00:51, 12.90s/it] {'loss': 0.0041, 'learning_rate': 1.1e-07, 'epoch': 3.77} 100%|█████████▉| 9996/10000 [36:24:27<00:51, 12.90s/it] 100%|█████████▉| 9997/10000 [36:24:40<00:38, 12.92s/it] {'loss': 0.0041, 'learning_rate': 1.05e-07, 'epoch': 3.77} 100%|█████████▉| 9997/10000 [36:24:40<00:38, 12.92s/it] 100%|█████████▉| 9998/10000 [36:24:53<00:25, 12.92s/it] {'loss': 0.0036, 'learning_rate': 1.0000000000000001e-07, 'epoch': 3.77} 100%|█████████▉| 9998/10000 [36:24:53<00:25, 12.92s/it] 100%|█████████▉| 9999/10000 [36:25:06<00:12, 12.92s/it] {'loss': 0.0033, 'learning_rate': 9.5e-08, 'epoch': 3.77} 100%|█████████▉| 9999/10000 [36:25:06<00:12, 12.92s/it] 100%|██████████| 10000/10000 [36:25:19<00:00, 12.96s/it] {'loss': 0.0029, 'learning_rate': 9e-08, 'epoch': 3.77} 100%|██████████| 10000/10000 [36:25:19<00:00, 12.96s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-07 08:50:17,369 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/config.json [INFO|configuration_utils.py:364] 2024-11-07 08:50:17,371 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/generation_config.json 
[INFO|modeling_utils.py:1853] 2024-11-07 08:51:04,756 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-07 08:51:04,759 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-07 08:51:04,760 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/special_tokens_map.json
[2024-11-07 08:51:04,769] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is about to be saved!
[2024-11-07 08:51:04,805] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/global_step10000/mp_rank_00_model_states.pt
[2024-11-07 08:51:04,805] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/global_step10000/mp_rank_00_model_states.pt...
[2024-11-07 08:51:59,296] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/global_step10000/mp_rank_00_model_states.pt.
[2024-11-07 08:51:59,386] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-07 08:53:19,857] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-07 08:53:20,220] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/Chemgpt_brain_v2-20241105-200852-1e-4/checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-07 08:53:20,221] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step10000 is ready now!
[INFO|trainer.py:2053] 2024-11-07 08:53:47,136 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 131329.0922, 'train_samples_per_second': 3.731, 'train_steps_per_second': 0.076, 'train_loss': 0.007247873665951193, 'epoch': 3.77}
100%|██████████| 10000/10000 [36:28:49<00:00, 12.96s/it]
100%|██████████| 10000/10000 [36:28:49<00:00, 13.13s/it]
Saving the whole model
[INFO|configuration_utils.py:458] 2024-11-07 08:53:47,152 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/config.json
[INFO|configuration_utils.py:364] 2024-11-07 08:53:47,154 >> Configuration saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-07 08:54:24,986 >> Model weights saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-07 08:54:24,990 >> tokenizer config file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-07 08:54:24,991 >> Special tokens file saved in output/Chemgpt_brain_v2-20241105-200852-1e-4/special_tokens_map.json