[2025-10-14 02:31:22,833] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:23,540] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [INFO|2025-10-14 02:31:25] llamafactory.cli:143 >> Initializing 8 distributed tasks at: 127.0.0.1:49583 W1014 02:31:26.703000 222161 site-packages/torch/distributed/run.py:793] W1014 02:31:26.703000 222161 site-packages/torch/distributed/run.py:793] ***************************************** W1014 02:31:26.703000 222161 site-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W1014 02:31:26.703000 222161 site-packages/torch/distributed/run.py:793] ***************************************** [2025-10-14 02:31:31,787] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,788] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,789] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,791] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,845] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,852] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,858] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:31,858] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-10-14 02:31:32,645] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] 
Initialized with serialization = False [2025-10-14 02:31:32,648] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:32,649] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:32,679] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:32,687] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:32,699] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:32,703] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:32,706] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False [2025-10-14 02:31:33,644] [INFO] [comm.py:821:init_distributed] cdb=None [2025-10-14 02:31:33,646] [INFO] [comm.py:821:init_distributed] cdb=None [2025-10-14 02:31:33,646] [INFO] [comm.py:852:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [2025-10-14 02:31:33,647] [INFO] [comm.py:821:init_distributed] cdb=None [W1014 02:31:33.156712707 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [W1014 02:31:33.157343750 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [W1014 02:31:33.157943548 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [2025-10-14 02:31:33,667] [INFO] [comm.py:821:init_distributed] cdb=None [W1014 02:31:33.177464774 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [2025-10-14 02:31:33,685] [INFO] [comm.py:821:init_distributed] cdb=None [W1014 
02:31:33.194671733 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [2025-10-14 02:31:33,695] [INFO] [comm.py:821:init_distributed] cdb=None [W1014 02:31:33.206253710 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [2025-10-14 02:31:33,803] [INFO] [comm.py:821:init_distributed] cdb=None [W1014 02:31:33.314327871 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [2025-10-14 02:31:33,807] [INFO] [comm.py:821:init_distributed] cdb=None [W1014 02:31:33.317907568 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 6, world size: 8, device: cuda:6, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 3, world size: 8, device: cuda:3, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 1, world size: 8, device: cuda:1, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 7, world size: 8, device: cuda:7, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 4, world size: 8, device: cuda:4, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 5, world size: 8, device: cuda:5, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 2, world size: 8, device: cuda:2, distributed training: True, compute dtype: torch.bfloat16 [INFO|2025-10-14 02:31:40] llamafactory.hparams.parser:401 >> Process rank: 
0, world size: 8, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16 [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file vocab.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file merges.txt [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,155 >> loading file chat_template.jinja Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. 
You'll still be able to use a slow processor with `use_fast=False`. [INFO|tokenization_utils_base.py:2299] 2025-10-14 02:31:40,503 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|image_processing_base.py:378] 2025-10-14 02:31:40,504 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/preprocessor_config.json [INFO|image_processing_base.py:378] 2025-10-14 02:31:40,506 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/preprocessor_config.json [WARNING|logging.py:328] 2025-10-14 02:31:40,506 >> Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. 
[INFO|image_processing_base.py:433] 2025-10-14 02:31:40,508 >> Image processor Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "temporal_patch_size": 2 } Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`. [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file vocab.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file merges.txt [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2021] 2025-10-14 02:31:40,508 >> loading file chat_template.jinja You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. 
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. [INFO|tokenization_utils_base.py:2299] 2025-10-14 02:31:40,752 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [WARNING|logging.py:328] 2025-10-14 02:31:40,753 >> You have video processor config saved in `preprocessor.json` file which is deprecated. 
Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. [INFO|video_processing_utils.py:627] 2025-10-14 02:31:40,753 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/preprocessor_config.json [INFO|configuration_utils.py:696] 2025-10-14 02:31:40,753 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/config.json [INFO|configuration_utils.py:770] 2025-10-14 02:31:40,755 >> Model config Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": null, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "torch_dtype": "bfloat16", "use_cache": true, 
"use_sliding_window": false, "video_token_id": null, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.52.1", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 } [INFO|video_processing_utils.py:627] 2025-10-14 02:31:40,756 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/preprocessor_config.json [INFO|video_processing_utils.py:683] 2025-10-14 02:31:40,756 >> Video processor Qwen2VLVideoProcessor { "_valid_kwargs_names": [ "do_convert_rgb", "do_resize", "size", "size_divisor", "default_to_square", "resample", "do_rescale", "rescale_factor", "do_normalize", "image_mean", "image_std", "do_pad", "do_center_crop", "crop_size", "data_format", "input_data_format", "device", "min_pixels", "max_pixels", "patch_size", "temporal_patch_size", "merge_size" ], "crop_size": null, "data_format": "channels_first", "default_to_square": true, "device": null, "do_center_crop": null, "do_convert_rgb": true, "do_normalize": true, "do_pad": null, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "input_data_format": null, "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, 
"model_valid_processing_keys": [ "do_convert_rgb", "do_resize", "size", "size_divisor", "default_to_square", "resample", "do_rescale", "rescale_factor", "do_normalize", "image_mean", "image_std", "do_pad", "do_center_crop", "crop_size", "data_format", "input_data_format", "device", "min_pixels", "max_pixels", "patch_size", "temporal_patch_size", "merge_size" ], "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "size_divisor": null, "temporal_patch_size": 2, "video_processor_type": "Qwen2VLVideoProcessor" } You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0. [INFO|processing_utils.py:990] 2025-10-14 02:31:41,114 >> Processor Qwen2_5_VLProcessor: - image_processor: Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "temporal_patch_size": 2 } - tokenizer: Qwen2TokenizerFast(name_or_path='/inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', 
'<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={ 151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151657: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151658: 
AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), } ) - video_processor: Qwen2VLVideoProcessor { "_valid_kwargs_names": [ "do_convert_rgb", "do_resize", "size", "size_divisor", "default_to_square", "resample", "do_rescale", "rescale_factor", "do_normalize", "image_mean", "image_std", "do_pad", "do_center_crop", "crop_size", "data_format", "input_data_format", "device", "min_pixels", "max_pixels", "patch_size", "temporal_patch_size", "merge_size" ], "crop_size": null, "data_format": "channels_first", "default_to_square": true, "device": null, "do_center_crop": null, "do_convert_rgb": true, "do_normalize": true, "do_pad": null, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "input_data_format": null, "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "model_valid_processing_keys": [ "do_convert_rgb", "do_resize", "size", "size_divisor", "default_to_square", "resample", "do_rescale", "rescale_factor", "do_normalize", "image_mean", "image_std", "do_pad", "do_center_crop", "crop_size", "data_format", "input_data_format", "device", "min_pixels", "max_pixels", "patch_size", 
"temporal_patch_size", "merge_size" ], "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "size_divisor": null, "temporal_patch_size": 2, "video_processor_type": "Qwen2VLVideoProcessor" } { "processor_class": "Qwen2_5_VLProcessor" } [INFO|2025-10-14 02:31:41] llamafactory.data.loader:143 >> Loading dataset /inspire/hdd/global_user/zhangkaipeng-24043/val_global/dataset/PyVision_SFT/sharegpt/sft_data_vsi_wo_video_hint/sft_data_vsi_wo_video_hint_sharegpt.json... Converting format of dataset (num_proc=256): 0%| | 0/21939 [00:00> Loading dataset /inspire/hdd/global_user/zhangkaipeng-24043/val_global/dataset/PyVision_SFT/sharegpt/sft_data_gmai_reasoning_wo_image_hint/sft_data_gmai_reasoning_wo_image_hint_sharegpt.json... Converting format of dataset (num_proc=256): 0%| | 0/5661 [00:00> Loading dataset /inspire/hdd/global_user/zhangkaipeng-24043/val_global/dataset/PyVision_SFT/sharegpt/sft_data_mmpr_wo_image_hint/sft_data_mmpr_wo_image_hint_sharegpt.json... Converting format of dataset (num_proc=256): 0%| | 0/27156 [00:00> Loading dataset /inspire/hdd/global_user/zhangkaipeng-24043/val_global/dataset/PyVision_SFT/sharegpt/sft_data_mmk12_wo_image_hint/sft_data_mmk12_wo_image_hint_sharegpt.json... Converting format of dataset (num_proc=256): 0%| | 0/10635 [00:00> Loading dataset /inspire/hdd/global_user/zhangkaipeng-24043/val_global/dataset/PyVision_SFT/sharegpt/sft_data_longvila_wo_video_hint/sft_data_longvila_wo_video_hint_sharegpt.json... Converting format of dataset (num_proc=256): 0%| | 0/22204 [00:00> Loading dataset /inspire/hdd/global_user/zhangkaipeng-24043/val_global/dataset/PyVision_SFT/sharegpt/sft_data_cosyn_chart_wo_image_hint/sft_data_cosyn_chart_wo_image_hint_sharegpt.json... 
Converting format of dataset (num_proc=256): 0%| | 0/22560 [00:00system You are a helpful assistant.<|im_end|> <|im_start|>user You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved. Solve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. The Python code will be executed by an external sandbox. You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully. For all the provided videos, in order, the j-th video has already been read into the global variable `video_clue_j` using the VideoReader() function. When writing Python code, you can directly use these variables without needing to read them again. Since you are dealing with the vision-related question answering task, you MUST use the Python tool (e.g., matplotlib library) to analyze or transform images whenever it could improve your understanding or aid your reasoning. This includes but is not limited to zooming in, rotating, adjusting contrast, computing statistics, or isolating features. For the videos, you can also use the Python tool (e.g., decord library) to sample frames from the video, helping your reasoning. Note: 1. When you use matplotlib to visualize data or further process images, you need to use plt.show() to display these images; there is no need to save them. 2. Do not use image processing libraries like cv2 or PIL. 3. Remember you CAN NOT see the video directly. Thus, if you need to reason based on the video, you MUST sample frames and use plt.show() to display these frames, helping your reasoning. 4. 
If you want to check the value of a variable, you MUST use print() to check it.
5. If you think the init provided frames are not enough to solve the question. just sample more fames from the `video_clue_0` using Python code.

The output (wrapped in `output_str`) can be returned to aid your reasoning and help you arrive at the final answer. The Python code should be complete scripts, including necessary imports.
Each code snippet is wrapped with ` ```python code snippet ``` `.
The last part of your response should be in the following format:
\boxed{'The final answer goes here.'}

*Video Information:*
Frame Width: 1296; Frame Height: 968; Video Length: 2675; Sample FPS: 24.00
The original video has been read into the global variable `video_clue_0`.

*User Question:*
Answer the following Problem with an image provided and put the answer in the format of \boxed{answer}
If I am standing by the couch and facing the table, is the jacket to my front-left, front-right, back-left, or back-right? Directions refer to the quadrants of a Cartesian plane (assuming I am at the origin and facing along the positive y-axis).
A. back-right
B. front-right
C. front-left
D. back-left

Remember to place the final answer in the last part using the format:
\boxed{'The final answer goes here.'}
<|im_end|>
<|im_start|>assistant
Let's break down the problem and plan our approach:

## Step 1: Understanding the Problem
- The user is "standing by the couch and facing the table."
- The question is about the location of the "jacket" relative to the user's position and facing direction.
- Directions are defined as quadrants: front-left, front-right, back-left, back-right, with "front" being the direction the user is facing (positive y-axis).

## Step 2: What Do We Need to Find?
- Identify the locations of the couch, table, and jacket in the video.
- Determine their spatial relationships.
- Infer the relative position of the jacket when standing by the couch and facing the table.
## Step 3: Planning the Investigation
1. **Sample Frames:** We need to sample frames from `video_clue_0` to visually locate the couch, table, and jacket.
2. **Mark Objects:** Use matplotlib to highlight and compare the locations of these items.
3. **Analyze Relative Positions:** Based on the video, map the positions to our coordinate system.
4. **Conclude:** Use the mapped positions to answer the question.

## Step 4: Initial Frame Sampling
Since we cannot see the video directly, let's sample a few frames evenly across the video to get diverse views. We'll display them to analyze object locations.

### Next Steps
- Sample 5 frames (start, 25%, 50%, 75%, end) and display them.
- Try to identify the couch, table, and jacket in these frames.

Let's write code for this.

```python
import matplotlib.pyplot as plt

# Sample 5 frames from different points in the video
frame_indices = [0, int(0.25*2675), int(0.5*2675), int(0.75*2675), 2674]
frames = [video_clue_0[i].asnumpy() for i in frame_indices]

plt.figure(figsize=(15, 10))
for idx, frame in enumerate(frames):
    plt.subplot(1, 5, idx+1)
    plt.imshow(frame)
    plt.title(f'Frame {frame_indices[idx]}')
    plt.axis('off')
plt.tight_layout()
plt.show()
```
<|im_end|>
<|im_start|>user
Text Result:
None
Image Result:
<|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_
pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|imag
e_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> <|im_end|> <|im_start|>assistant Let's analyze the sampled frames: - **Frame 0:** Kitchen area (stove, counter) — not relevant for couch/table/jacket. - **Frame 668:** Table with two chairs; there is a dark object (likely the jacket) on the table. - **Frame 1337:** Couch is visible, with a white rectangular object (possibly a coffee table) in front of it. - **Frame 2006:** View of the kitchen area, adjacent to a carpeted area (possibly near the couch). - **Frame 2674:** Sink area, not relevant. ### Key Observations: - **Jacket:** The dark object on the table in Frame 668 is likely the jacket. - **Table:** Also in Frame 668. - **Couch:** Clearly visible in Frame 1337, with the door in the background. The table visible in frame 1337 is likely a coffee table, not the dining table from frame 668. #### Next Steps: We need to determine the spatial relationship between the couch, the (dining) table, and the jacket. Ideally, we need a frame that shows both the couch and the table together or allows us to infer their relative positions. ### Plan: - Sample additional frames between 668 and 1337 to try to get a frame that shows both the table (with jacket) and the couch, or at least provides more spatial context. - Focus on the transition between these two locations. Let's sample 3 more frames at indices: 900, 1100, and 1200, and display them for analysis. 
```python additional_indices = [900, 1100, 1200] additional_frames = [video_clue_0[i].asnumpy() for i in additional_indices] plt.figure(figsize=(12, 6)) for idx, frame in enumerate(additional_frames): plt.subplot(1, 3, idx+1) plt.imshow(frame) plt.title(f'Frame {additional_indices[idx]}') plt.axis('off') plt.tight_layout() plt.show() ``` <|im_end|> <|im_start|>user Text Result: None Image Result: <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad
|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_p
ad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> <|im_end|> <|im_start|>assistant Let's analyze the new frames: - **Frame 900:** Shows part of the table (with jacket) and the closet, with the door and a bit of the carpeted area visible. - **Frame 1100:** Shows the door and the right arm of the couch. - **Frame 1200:** Shows the closet and the door, carpeted area. ### Spatial Relationship Synthesis From all frames so far: - The **table with the jacket** is close to the closet and near the kitchen area (Frame 668, 900). - The **couch** is adjacent to the door and on the carpet (Frame 1337, 1100). - The door and carpet area serve as a transition between the kitchen/table area and the couch. - Frames 900 and 1200 show you can see both the table (left) and the closet/door (center/right) in one shot, indicating proximity. Thus, the **table (with jacket)** is in the dining area, adjacent to the kitchen, and the **couch** is in the living area, near the door. ### Next Step: Determining Relative Directions If you are "standing by the couch and facing the table," your back is to the door, and you are looking toward the dining area (table, kitchen). The table is in front of you. The jacket is on the table. 
Now, we need to consider: Is the jacket on the left or right side of the table from this perspective? #### Let's zoom in on Frame 668 (table with jacket) and Frame 1337 (couch) for more details. ```python plt.figure(figsize=(10, 5)) # Zoom on Frame 668 (table with jacket) plt.subplot(1, 2, 1) plt.imshow(frames[1]) # Frame 668 plt.title('Frame 668 (Table & Jacket)') plt.axis('off') # Zoom on Frame 1337 (couch) plt.subplot(1, 2, 2) plt.imshow(frames[2]) # Frame 1337 plt.title('Frame 1337 (Couch)') plt.axis('off') plt.tight_layout() plt.show() ``` <|im_end|> <|im_start|>user Text Result: None Image Result: <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_
pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|imag
e_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> <|im_end|> <|im_start|>assistant Let's reflect on what we've learned from these zoomed frames: - **Frame 668 (Table & Jacket):** The jacket is placed on the left side of the table (from the perspective in this frame), and the closet/door is to the right. - **Frame 1337 (Couch):** The couch is against the wall, near the door. The carpeted area leads toward the table (which is out of frame). ### Synthesizing the Room Layout From the sequence and the angles: - The table and jacket are in the dining area, with the closet/door to the right (from the table's perspective). - The couch is in the living area, with the door nearby. - If standing by the couch and facing the table, you would be looking toward the kitchen/dining area. ### Determining Jacket's Relative Position If you are standing by the couch and facing the table: - The table is in front of you. - The closet/door is to the right of the table (from the table's perspective). 
- The jacket is on the left side of the table (from the table's perspective). So, from your position by the couch, facing the table: - The jacket is on your **front-left**. ### Final Check Let's confirm: - In Frame 668, the jacket is on the left side of the table, with the closet/door to the right. - In Frame 1337, the couch is positioned such that facing the table from the couch puts the jacket on your front-left. ## Final Answer \boxed{'front-left'} <|im_end|> label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 10061, 594, 1438, 1495, 279, 3491, 323, 3119, 1039, 5486, 1447, 565, 14822, 220, 16, 25, 45451, 279, 22079, 271, 12, 576, 1196, 374, 330, 10070, 553, 279, 26148, 323, 12880, 279, 1965, 10040, 12, 576, 3405, 374, 911, 279, 3728, 315, 279, 330, 73, 5709, 1, 8674, 311, 279, 1196, 594, 2309, 323, 12880, 5106, 624, 12, 53486, 525, 4512, 438, 29136, 1783, 25, 4065, 7950, 11, 4065, 6701, 11, 1182, 7950, 11, 1182, 6701, 11, 448, 330, 6951, 1, 1660, 279, 5106, 279, 1196, 374, 12880, 320, 30487, 379, 35321, 3593, 565, 14822, 220, 17, 25, 3555, 3155, 1205, 14656, 311, 7379, 1939, 12, 64547, 279, 10468, 315, 279, 26148, 11, 1965, 11, 323, 26208, 304, 279, 2766, 624, 12, 29901, 862, 27979, 11871, 624, 12, 62658, 279, 8674, 2309, 315, 279, 26208, 979, 11259, 553, 279, 26148, 323, 12880, 279, 1965, 382, 565, 14822, 220, 18, 25, 27680, 279, 46897, 271, 16, 13, 3070, 17571, 64597, 66963, 1205, 1184, 311, 6077, 14087, 504, 1565, 9986, 6794, 361, 62, 15, 63, 311, 42295, 24523, 279, 26148, 11, 1965, 11, 323, 26208, 624, 17, 13, 3070, 8949, 24114, 66963, 5443, 16801, 311, 11167, 323, 9429, 279, 10468, 315, 1493, 3589, 624, 18, 13, 3070, 
2082, 55856, 39402, 75383, 66963, 20205, 389, 279, 2766, 11, 2415, 279, 9892, 311, 1039, 16184, 1849, 624, 19, 13, 3070, 1109, 857, 66963, 5443, 279, 23844, 9892, 311, 4226, 279, 3405, 382, 565, 14822, 220, 19, 25, 4127, 16321, 95309, 271, 12549, 582, 4157, 1490, 279, 2766, 5961, 11, 1077, 594, 6077, 264, 2421, 14087, 41047, 3941, 279, 2766, 311, 633, 16807, 6194, 13, 1205, 3278, 3037, 1105, 311, 23643, 1633, 10468, 382, 14374, 9295, 39861, 271, 12, 19143, 220, 20, 14087, 320, 2468, 11, 220, 17, 20, 13384, 220, 20, 15, 13384, 220, 22, 20, 13384, 835, 8, 323, 3037, 1105, 624, 12, 9735, 311, 10542, 279, 26148, 11, 1965, 11, 323, 26208, 304, 1493, 14087, 382, 10061, 594, 3270, 2038, 369, 419, 382, 27, 1851, 397, 73594, 12669, 198, 474, 16801, 23716, 438, 6516, 271, 2, 19143, 220, 20, 14087, 504, 2155, 3501, 304, 279, 2766, 198, 6763, 18333, 284, 508, 15, 11, 526, 7, 15, 13, 17, 20, 9, 17, 21, 22, 20, 701, 526, 7, 15, 13, 20, 9, 17, 21, 22, 20, 701, 526, 7, 15, 13, 22, 20, 9, 17, 21, 22, 20, 701, 220, 17, 21, 22, 19, 921, 23719, 284, 508, 9986, 6794, 361, 62, 15, 989, 936, 300, 35083, 368, 369, 600, 304, 4034, 18333, 2533, 9476, 26504, 48683, 4539, 16, 20, 11, 220, 16, 15, 1171, 1958, 7187, 11, 4034, 304, 13252, 96133, 982, 262, 6516, 45667, 7, 16, 11, 220, 20, 11, 7187, 10, 16, 340, 262, 6516, 29086, 15046, 340, 262, 6516, 6067, 955, 6, 4369, 314, 6763, 18333, 11597, 13989, 1305, 262, 6516, 35151, 492, 1847, 1305, 9476, 75229, 14466, 741, 9476, 5460, 741, 13874, 3989, 522, 1851, 397, 151645, 198, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 10061, 594, 23643, 279, 48876, 14087, 1447, 12, 3070, 4369, 220, 15, 66963, 19354, 3082, 320, 267, 994, 11, 5546, 8, 1959, 537, 9760, 369, 26148, 45326, 4437, 5709, 624, 12, 3070, 4369, 220, 21, 21, 23, 66963, 6633, 448, 1378, 
25904, 26, 1052, 374, 264, 6319, 1633, 320, 14282, 279, 26208, 8, 389, 279, 1965, 624, 12, 3070, 4369, 220, 16, 18, 18, 22, 66963, 61128, 374, 9434, 11, 448, 264, 4158, 51424, 1633, 320, 53128, 264, 10799, 1965, 8, 304, 4065, 315, 432, 624, 12, 3070, 4369, 220, 17, 15, 15, 21, 66963, 2738, 315, 279, 9780, 3082, 11, 23942, 311, 264, 27854, 291, 3082, 320, 53128, 3143, 279, 26148, 4292, 12, 3070, 4369, 220, 17, 21, 22, 19, 66963, 56451, 3082, 11, 537, 9760, 382, 14374, 5309, 30843, 804, 510, 12, 3070, 41, 5709, 66963, 576, 6319, 1633, 389, 279, 1965, 304, 16321, 220, 21, 21, 23, 374, 4363, 279, 26208, 624, 12, 3070, 2556, 66963, 7281, 304, 16321, 220, 21, 21, 23, 624, 12, 3070, 34, 3026, 66963, 53404, 9434, 304, 16321, 220, 16, 18, 18, 22, 11, 448, 279, 6006, 304, 279, 4004, 13, 576, 1965, 9434, 304, 4034, 220, 16, 18, 18, 22, 374, 4363, 264, 10799, 1965, 11, 537, 279, 17914, 1965, 504, 4034, 220, 21, 21, 23, 382, 820, 9295, 39861, 510, 1654, 1184, 311, 8253, 279, 27979, 5025, 1948, 279, 26148, 11, 279, 320, 67, 5740, 8, 1965, 11, 323, 279, 26208, 13, 66701, 11, 582, 1184, 264, 4034, 429, 4933, 2176, 279, 26148, 323, 279, 1965, 3786, 476, 6147, 601, 311, 23583, 862, 8674, 9892, 382, 14374, 9680, 510, 12, 19143, 5107, 14087, 1948, 220, 21, 21, 23, 323, 220, 16, 18, 18, 22, 311, 1430, 311, 633, 264, 4034, 429, 4933, 2176, 279, 1965, 320, 4197, 26208, 8, 323, 279, 26148, 11, 476, 518, 3245, 5707, 803, 27979, 2266, 624, 12, 25806, 389, 279, 9142, 1948, 1493, 1378, 10468, 382, 10061, 594, 6077, 220, 18, 803, 14087, 518, 14937, 25, 220, 24, 15, 15, 11, 220, 16, 16, 15, 15, 11, 323, 220, 16, 17, 15, 15, 11, 323, 3037, 1105, 369, 6358, 382, 27, 1851, 397, 73594, 12669, 198, 35499, 18333, 284, 508, 24, 15, 15, 11, 220, 16, 16, 15, 15, 11, 220, 16, 17, 15, 15, 921, 35499, 29319, 284, 508, 9986, 6794, 361, 62, 15, 989, 936, 300, 35083, 368, 369, 600, 304, 5107, 18333, 2533, 9476, 26504, 48683, 4539, 16, 17, 11, 220, 21, 1171, 1958, 7187, 11, 4034, 304, 13252, 25906, 3005, 
29319, 982, 262, 6516, 45667, 7, 16, 11, 220, 18, 11, 7187, 10, 16, 340, 262, 6516, 29086, 15046, 340, 262, 6516, 6067, 955, 6, 4369, 314, 35499, 18333, 11597, 13989, 1305, 262, 6516, 35151, 492, 1847, 1305, 9476, 75229, 14466, 741, 9476, 5460, 741, 13874, 3989, 522, 1851, 397, 151645, 198, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 10061, 594, 23643, 279, 501, 14087, 1447, 12, 3070, 4369, 220, 24, 15, 15, 66963, 36280, 949, 315, 279, 1965, 320, 4197, 26208, 8, 323, 279, 31944, 11, 448, 279, 6006, 323, 264, 2699, 315, 279, 27854, 291, 3082, 9434, 624, 12, 3070, 4369, 220, 16, 16, 15, 15, 66963, 36280, 279, 6006, 323, 279, 1290, 6773, 315, 279, 26148, 624, 12, 3070, 4369, 220, 16, 17, 15, 15, 66963, 36280, 279, 31944, 323, 279, 6006, 11, 27854, 291, 3082, 382, 14374, 74697, 32807, 23153, 13189, 271, 3830, 678, 14087, 773, 3041, 510, 12, 576, 3070, 2005, 448, 279, 26208, 334, 374, 3265, 311, 279, 31944, 323, 3143, 279, 9780, 3082, 320, 4369, 220, 21, 21, 23, 11, 220, 24, 15, 15, 4292, 12, 576, 3070, 66, 3026, 334, 374, 23942, 311, 279, 6006, 323, 389, 279, 27854, 320, 4369, 220, 16, 18, 18, 22, 11, 220, 16, 16, 15, 15, 4292, 12, 576, 6006, 323, 27854, 3082, 8683, 438, 264, 9142, 1948, 279, 9780, 45326, 3082, 323, 279, 26148, 624, 12, 64597, 220, 24, 15, 15, 323, 220, 16, 17, 15, 15, 1473, 498, 646, 1490, 2176, 279, 1965, 320, 2359, 8, 323, 279, 31944, 14, 10787, 320, 3057, 73101, 8, 304, 825, 6552, 11, 18860, 36743, 382, 44500, 11, 279, 3070, 2005, 320, 4197, 26208, 32295, 374, 304, 279, 17914, 3082, 11, 23942, 311, 279, 9780, 11, 323, 279, 3070, 66, 3026, 334, 374, 304, 279, 5382, 3082, 11, 3143, 279, 6006, 382, 14374, 9295, 14822, 25, 34907, 287, 39402, 53486, 271, 2679, 498, 525, 330, 10070, 553, 279, 26148, 
323, 12880, 279, 1965, 1335, 697, 1182, 374, 311, 279, 6006, 11, 323, 498, 525, 3330, 8841, 279, 17914, 3082, 320, 2005, 11, 9780, 568, 576, 1965, 374, 304, 4065, 315, 498, 13, 576, 26208, 374, 389, 279, 1965, 382, 7039, 11, 582, 1184, 311, 2908, 25, 2160, 279, 26208, 389, 279, 2115, 476, 1290, 3108, 315, 279, 1965, 504, 419, 13057, 1939, 820, 6771, 594, 15562, 304, 389, 16321, 220, 21, 21, 23, 320, 2005, 448, 26208, 8, 323, 16321, 220, 16, 18, 18, 22, 320, 66, 3026, 8, 369, 803, 3565, 382, 27, 1851, 397, 73594, 12669, 198, 9476, 26504, 48683, 4539, 16, 15, 11, 220, 20, 4390, 2, 37529, 389, 16321, 220, 21, 21, 23, 320, 2005, 448, 26208, 340, 9476, 45667, 7, 16, 11, 220, 17, 11, 220, 16, 340, 9476, 29086, 96133, 58, 16, 2467, 220, 671, 16321, 220, 21, 21, 23, 198, 9476, 6067, 492, 4369, 220, 21, 21, 23, 320, 2556, 609, 54770, 33013, 9476, 35151, 492, 1847, 4610, 2, 37529, 389, 16321, 220, 16, 18, 18, 22, 320, 66, 3026, 340, 9476, 45667, 7, 16, 11, 220, 17, 11, 220, 17, 340, 9476, 29086, 96133, 58, 17, 2467, 220, 671, 16321, 220, 16, 18, 18, 22, 198, 9476, 6067, 492, 4369, 220, 16, 18, 18, 22, 320, 34, 3026, 33013, 9476, 35151, 492, 1847, 4610, 9476, 75229, 14466, 741, 9476, 5460, 741, 13874, 3989, 522, 1851, 397, 151645, 198, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 10061, 594, 8708, 389, 1128, 582, 3003, 9498, 504, 1493, 15562, 291, 14087, 1447, 12, 3070, 4369, 220, 21, 21, 23, 320, 2556, 609, 54770, 1648, 334, 2303, 220, 576, 26208, 374, 9099, 389, 279, 2115, 3108, 315, 279, 1965, 320, 1499, 279, 13057, 304, 419, 4034, 701, 323, 279, 31944, 14, 10787, 374, 311, 279, 1290, 382, 12, 3070, 4369, 220, 16, 18, 18, 22, 320, 34, 3026, 1648, 334, 2303, 220, 576, 26148, 374, 2348, 279, 7002, 11, 3143, 279, 6006, 13, 
576, 27854, 291, 3082, 11508, 8841, 279, 1965, 320, 8206, 374, 700, 315, 4034, 3593, 14374, 23153, 6375, 4849, 279, 10420, 13821, 271, 3830, 279, 8500, 323, 279, 25941, 510, 12, 576, 1965, 323, 26208, 525, 304, 279, 17914, 3082, 11, 448, 279, 31944, 14, 10787, 311, 279, 1290, 320, 1499, 279, 1965, 594, 13057, 4292, 12, 576, 26148, 374, 304, 279, 5382, 3082, 11, 448, 279, 6006, 14046, 624, 12, 1416, 11259, 553, 279, 26148, 323, 12880, 279, 1965, 11, 498, 1035, 387, 3330, 8841, 279, 9780, 3446, 5740, 3082, 382, 14374, 34907, 287, 54770, 594, 39402, 12380, 271, 2679, 498, 525, 11259, 553, 279, 26148, 323, 12880, 279, 1965, 510, 12, 576, 1965, 374, 304, 4065, 315, 498, 624, 12, 576, 31944, 14, 10787, 374, 311, 279, 1290, 315, 279, 1965, 320, 1499, 279, 1965, 594, 13057, 4292, 12, 576, 26208, 374, 389, 279, 2115, 3108, 315, 279, 1965, 320, 1499, 279, 1965, 594, 13057, 3593, 4416, 11, 504, 697, 2309, 553, 279, 26148, 11, 12880, 279, 1965, 510, 12, 576, 26208, 374, 389, 697, 3070, 6951, 7950, 334, 382, 14374, 13023, 4248, 271, 10061, 594, 7683, 510, 12, 758, 16321, 220, 21, 21, 23, 11, 279, 26208, 374, 389, 279, 2115, 3108, 315, 279, 1965, 11, 448, 279, 31944, 14, 10787, 311, 279, 1290, 624, 12, 758, 16321, 220, 16, 18, 18, 22, 11, 279, 26148, 374, 34228, 1741, 429, 12880, 279, 1965, 504, 279, 26148, 9521, 279, 26208, 389, 697, 4065, 7950, 382, 565, 13023, 21806, 271, 27, 9217, 397, 59, 79075, 13608, 6951, 7950, 16418, 522, 9217, 29, 151645, 198] labels: Let's break down the problem and plan our approach: ## Step 1: Understanding the Problem - The user is "standing by the couch and facing the table." - The question is about the location of the "jacket" relative to the user's position and facing direction. - Directions are defined as quadrants: front-left, front-right, back-left, back-right, with "front" being the direction the user is facing (positive y-axis). ## Step 2: What Do We Need to Find? - Identify the locations of the couch, table, and jacket in the video. 
- Determine their spatial relationships. - Infer the relative position of the jacket when standing by the couch and facing the table. ## Step 3: Planning the Investigation 1. **Sample Frames:** We need to sample frames from `video_clue_0` to visually locate the couch, table, and jacket. 2. **Mark Objects:** Use matplotlib to highlight and compare the locations of these items. 3. **Analyze Relative Positions:** Based on the video, map the positions to our coordinate system. 4. **Conclude:** Use the mapped positions to answer the question. ## Step 4: Initial Frame Sampling Since we cannot see the video directly, let's sample a few frames evenly across the video to get diverse views. We'll display them to analyze object locations. ### Next Steps - Sample 5 frames (start, 25%, 50%, 75%, end) and display them. - Try to identify the couch, table, and jacket in these frames. Let's write code for this. ```python import matplotlib.pyplot as plt # Sample 5 frames from different points in the video frame_indices = [0, int(0.25*2675), int(0.5*2675), int(0.75*2675), 2674] frames = [video_clue_0[i].asnumpy() for i in frame_indices] plt.figure(figsize=(15, 10)) for idx, frame in enumerate(frames): plt.subplot(1, 5, idx+1) plt.imshow(frame) plt.title(f'Frame {frame_indices[idx]}') plt.axis('off') plt.tight_layout() plt.show() ``` <|im_end|> Let's analyze the sampled frames: - **Frame 0:** Kitchen area (stove, counter) — not relevant for couch/table/jacket. - **Frame 668:** Table with two chairs; there is a dark object (likely the jacket) on the table. - **Frame 1337:** Couch is visible, with a white rectangular object (possibly a coffee table) in front of it. - **Frame 2006:** View of the kitchen area, adjacent to a carpeted area (possibly near the couch). - **Frame 2674:** Sink area, not relevant. ### Key Observations: - **Jacket:** The dark object on the table in Frame 668 is likely the jacket. - **Table:** Also in Frame 668. 
- **Couch:** Clearly visible in Frame 1337, with the door in the background. The table visible in frame 1337 is likely a coffee table, not the dining table from frame 668. #### Next Steps: We need to determine the spatial relationship between the couch, the (dining) table, and the jacket. Ideally, we need a frame that shows both the couch and the table together or allows us to infer their relative positions. ### Plan: - Sample additional frames between 668 and 1337 to try to get a frame that shows both the table (with jacket) and the couch, or at least provides more spatial context. - Focus on the transition between these two locations. Let's sample 3 more frames at indices: 900, 1100, and 1200, and display them for analysis. ```python additional_indices = [900, 1100, 1200] additional_frames = [video_clue_0[i].asnumpy() for i in additional_indices] plt.figure(figsize=(12, 6)) for idx, frame in enumerate(additional_frames): plt.subplot(1, 3, idx+1) plt.imshow(frame) plt.title(f'Frame {additional_indices[idx]}') plt.axis('off') plt.tight_layout() plt.show() ``` <|im_end|> Let's analyze the new frames: - **Frame 900:** Shows part of the table (with jacket) and the closet, with the door and a bit of the carpeted area visible. - **Frame 1100:** Shows the door and the right arm of the couch. - **Frame 1200:** Shows the closet and the door, carpeted area. ### Spatial Relationship Synthesis From all frames so far: - The **table with the jacket** is close to the closet and near the kitchen area (Frame 668, 900). - The **couch** is adjacent to the door and on the carpet (Frame 1337, 1100). - The door and carpet area serve as a transition between the kitchen/table area and the couch. - Frames 900 and 1200 show you can see both the table (left) and the closet/door (center/right) in one shot, indicating proximity. Thus, the **table (with jacket)** is in the dining area, adjacent to the kitchen, and the **couch** is in the living area, near the door. 
### Next Step: Determining Relative Directions If you are "standing by the couch and facing the table," your back is to the door, and you are looking toward the dining area (table, kitchen). The table is in front of you. The jacket is on the table. Now, we need to consider: Is the jacket on the left or right side of the table from this perspective? #### Let's zoom in on Frame 668 (table with jacket) and Frame 1337 (couch) for more details. ```python plt.figure(figsize=(10, 5)) # Zoom on Frame 668 (table with jacket) plt.subplot(1, 2, 1) plt.imshow(frames[1]) # Frame 668 plt.title('Frame 668 (Table & Jacket)') plt.axis('off') # Zoom on Frame 1337 (couch) plt.subplot(1, 2, 2) plt.imshow(frames[2]) # Frame 1337 plt.title('Frame 1337 (Couch)') plt.axis('off') plt.tight_layout() plt.show() ``` <|im_end|> Let's reflect on what we've learned from these zoomed frames: - **Frame 668 (Table & Jacket):** The jacket is placed on the left side of the table (from the perspective in this frame), and the closet/door is to the right. - **Frame 1337 (Couch):** The couch is against the wall, near the door. The carpeted area leads toward the table (which is out of frame). ### Synthesizing the Room Layout From the sequence and the angles: - The table and jacket are in the dining area, with the closet/door to the right (from the table's perspective). - The couch is in the living area, with the door nearby. - If standing by the couch and facing the table, you would be looking toward the kitchen/dining area. ### Determining Jacket's Relative Position If you are standing by the couch and facing the table: - The table is in front of you. - The closet/door is to the right of the table (from the table's perspective). - The jacket is on the left side of the table (from the table's perspective). So, from your position by the couch, facing the table: - The jacket is on your **front-left**. 
### Final Check Let's confirm: - In Frame 668, the jacket is on the left side of the table, with the closet/door to the right. - In Frame 1337, the couch is positioned such that facing the table from the couch puts the jacket on your front-left. ## Final Answer \boxed{'front-left'} <|im_end|> [INFO|configuration_utils.py:696] 2025-10-14 02:37:27,070 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/config.json [INFO|configuration_utils.py:770] 2025-10-14 02:37:27,072 >> Model config Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "text_config": { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": null, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl_text", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "torch_dtype": "bfloat16", "use_cache": true, "use_sliding_window": false, "video_token_id": null, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }, 
"tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.52.1", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "initializer_range": 0.02, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 } [INFO|2025-10-14 02:37:27] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training. [INFO|modeling_utils.py:1146] 2025-10-14 02:37:27,102 >> loading weights file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/model.safetensors.index.json [INFO|modeling_utils.py:3878] 2025-10-14 02:37:27,102 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model [2025-10-14 02:37:27,103] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [INFO|configuration_utils.py:1135] 2025-10-14 02:37:27,113 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645, "use_cache": false } [INFO|modeling_utils.py:2239] 2025-10-14 02:37:27,114 >> Instantiating Qwen2_5_VisionTransformerPretrainedModel model under default dtype torch.float32. [INFO|modeling_utils.py:2239] 2025-10-14 02:37:27,225 >> Instantiating Qwen2_5_VLTextModel model under default dtype torch.float32. 
[2025-10-14 02:37:27,870] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:27,870] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:27,870] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:27,870] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:27,871] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:27,881] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:27,916] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 Loading checkpoint shards: 0%| | 0/5 [00:00> All model checkpoint weights were used when initializing Qwen2_5_VLForConditionalGeneration. [INFO|modeling_utils.py:5178] 2025-10-14 02:37:35,813 >> All the weights of Qwen2_5_VLForConditionalGeneration were initialized from the model checkpoint at /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2_5_VLForConditionalGeneration for predictions without further training. [INFO|configuration_utils.py:1088] 2025-10-14 02:37:35,817 >> loading configuration file /inspire/hdd/global_user/zhangkaipeng-24043/val_global/checkpoints/Qwen/Qwen2.5-VL-7B-Instruct/generation_config.json [INFO|configuration_utils.py:1135] 2025-10-14 02:37:35,817 >> Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 1e-06 } [INFO|2025-10-14 02:37:35] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled. [INFO|2025-10-14 02:37:35] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference. 
[INFO|2025-10-14 02:37:35] llamafactory.model.adapter:143 >> DeepSpeed ZeRO3 detected, remaining trainable params in float32. [INFO|2025-10-14 02:37:35] llamafactory.model.adapter:143 >> Fine-tuning method: Full [INFO|2025-10-14 02:37:35] llamafactory.model.model_utils.visual:143 >> Set vision model not trainable: ['visual.patch_embed', 'visual.blocks']. [INFO|2025-10-14 02:37:35] llamafactory.model.model_utils.visual:143 >> Set multi model projector not trainable: visual.merger. [INFO|2025-10-14 02:37:35] llamafactory.model.loader:143 >> trainable params: 7,615,616,512 || all params: 8,292,166,656 || trainable%: 91.8411 [INFO|trainer.py:756] 2025-10-14 02:37:35,838 >> Using auto half precision backend [2025-10-14 02:37:36,077] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed info: version=0.17.5, git-hash=unknown, git-branch=unknown [2025-10-14 02:37:36,077] [INFO] [config.py:684:__init__] Config mesh_device None world_size = 8 [2025-10-14 02:37:38,116] [INFO] [engine.py:1356:_configure_distributed_model] ********** distributed groups summary ********** self.dp_world_size=8 self.mp_world_size=1 self.seq_dp_world_size=8 self.sequence_parallel_size=1 *********************************************** [2025-10-14 02:37:38,119] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2025-10-14 02:37:38,121] [INFO] [logging.py:107:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2025-10-14 02:37:38,121] [INFO] [logging.py:107:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2025-10-14 02:37:38,143] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW [2025-10-14 02:37:38,143] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type= [2025-10-14 02:37:38,143] [INFO] [logging.py:107:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False [2025-10-14 02:37:38,143] [INFO] 
[logging.py:107:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer [2025-10-14 02:37:38,334] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning [2025-10-14 02:37:38,334] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 4.85 GB CA 1.93 GB Max_CA 5 GB [2025-10-14 02:37:38,335] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.08 GB, percent = 7.2% [2025-10-14 02:37:38,337] [INFO] [stage3.py:186:__init__] Reduce bucket size 12845056 [2025-10-14 02:37:38,337] [INFO] [stage3.py:187:__init__] Prefetch bucket size 11560550 [2025-10-14 02:37:38,434] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin] [2025-10-14 02:37:38,434] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.93 GB Max_CA 2 GB [2025-10-14 02:37:38,434] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.08 GB, percent = 7.2% Parameter Offload - Persistent parameters statistics: param_count = 368, numel = 848896 [2025-10-14 02:37:38,595] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end] [2025-10-14 02:37:38,595] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.93 GB Max_CA 2 GB [2025-10-14 02:37:38,596] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.09 GB, percent = 7.2% [2025-10-14 02:37:38,710] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions [2025-10-14 02:37:38,710] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.93 GB Max_CA 2 GB [2025-10-14 02:37:38,710] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.09 GB, percent = 7.2% [2025-10-14 02:37:40,282] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 2 [2025-10-14 02:37:40,282] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.97 GB Max_CA 2 GB [2025-10-14 02:37:40,282] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 156.15 GB, percent = 
7.7% [2025-10-14 02:37:40,390] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions [2025-10-14 02:37:40,390] [INFO] [utils.py:782:see_memory_usage] MA 1.93 GB Max_MA 1.93 GB CA 1.97 GB Max_CA 2 GB [2025-10-14 02:37:40,390] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 147.53 GB, percent = 7.3% [2025-10-14 02:37:40,551] [INFO] [utils.py:781:see_memory_usage] After creating fp32 partitions [2025-10-14 02:37:40,552] [INFO] [utils.py:782:see_memory_usage] MA 5.48 GB Max_MA 7.25 GB CA 7.28 GB Max_CA 7 GB [2025-10-14 02:37:40,552] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.17 GB, percent = 7.3% [2025-10-14 02:37:40,653] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states [2025-10-14 02:37:40,654] [INFO] [utils.py:782:see_memory_usage] MA 5.48 GB Max_MA 5.48 GB CA 7.28 GB Max_CA 7 GB [2025-10-14 02:37:40,654] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.19 GB, percent = 7.3% [2025-10-14 02:37:40,790] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states [2025-10-14 02:37:40,790] [INFO] [utils.py:782:see_memory_usage] MA 5.48 GB Max_MA 9.02 GB CA 10.84 GB Max_CA 11 GB [2025-10-14 02:37:40,790] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.2 GB, percent = 7.3% [2025-10-14 02:37:40,791] [INFO] [stage3.py:554:_setup_for_real_optimizer] optimizer state initialized [2025-10-14 02:37:40,975] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer [2025-10-14 02:37:40,975] [INFO] [utils.py:782:see_memory_usage] MA 7.27 GB Max_MA 9.3 GB CA 10.84 GB Max_CA 11 GB [2025-10-14 02:37:40,975] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 146.22 GB, percent = 7.3% [2025-10-14 02:37:40,975] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer_Stage3 [2025-10-14 02:37:40,975] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed using configured LR scheduler 
= None [2025-10-14 02:37:40,975] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed LR Scheduler = None [2025-10-14 02:37:40,976] [INFO] [logging.py:107:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2025-10-14 02:37:40,977] [INFO] [logging.py:107:log_dist] [Rank 0] [TorchCheckpointEngine] Initialized with serialization = True [2025-10-14 02:37:40,977] [INFO] [config.py:954:print] DeepSpeedEngine configuration: [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'intra_op_parallelism': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False} [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] amp_enabled .................. False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] amp_params ................... False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] bfloat16_config .............. 
enabled=True immediate_grad_update=False check_grad_overflow=False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] checkpoint_config ............ {'tag_validation': 'WARN', 'checkpoint_serialization': True, 'writer': None} [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] checkpoint_parallel_write_pipeline False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] checkpoint_tag_validation_enabled True [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] checkpoint_tag_validation_fail False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] comms_config ................. [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] communication_data_type ...... None [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] compile_config ............... deepcompile=False free_activation=False offload_activation=False offload_opt_states=False double_buffer=True symmetric_memory=False debug_log=False offload_parameters=False sync_before_reduce=False sync_after_reduce=False sync_before_allgather=False sync_after_allgather=False keep_int_input_tensors=True keep_all_input_tensors=False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] curriculum_enabled_legacy .... False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] curriculum_params_legacy ..... False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'pin_memory': False, 'curriculum_learning': {'enabled': False}, 'dynamic_batching': {'enabled': False, 'lr_scaling_method': 'linear', 'min_batch_size': 1, 'max_batch_size': None, 'sequence_picking_order': 'dataloader', 'verbose': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] data_efficiency_enabled ...... False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] dataloader_drop_last ......... 
False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] disable_allgather ............ False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] dump_state ................... False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_enabled ........... False [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_gas_boundary_resolution 1 [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_layer_name ........ bert.encoder.layer [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_layer_num ......... 0 [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_max_iter .......... 100 [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_stability ......... 1e-06 [2025-10-14 02:37:40,977] [INFO] [config.py:958:print] eigenvalue_tol ............... 0.01 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] eigenvalue_verbose ........... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] elasticity_enabled ........... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] float16_config ............... enabled=False auto_cast=False loss_scale=0.0 initial_scale_power=16 loss_scale_window=1000 hysteresis=2 consecutive_hysteresis=False min_loss_scale=1 fp16_master_weights_and_grads=False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] global_rank .................. 0 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] grad_accum_dtype ............. None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] gradient_accumulation_steps .. 1 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] gradient_clipping ............ 1.0 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] gradient_predivide_factor .... 
1.0 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] graph_harvesting ............. False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] load_universal_checkpoint .... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] memory_breakdown ............. False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] mics_hierarchial_params_gather False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] mics_shard_size .............. -1 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] optimizer_legacy_fusion ...... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] optimizer_name ............... None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] optimizer_params ............. None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] pld_enabled .................. False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] pld_params ................... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] prescale_gradients ........... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] scheduler_name ............... None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] scheduler_params ............. None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] seq_parallel_communication_data_type torch.float32 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] sparse_attention ............. None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] sparse_gradients_enabled ..... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] steps_per_print .............. inf [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] tensor_parallel_config ....... dtype=torch.float16 autotp_size=0 tp_overlap_comm=False tensor_parallel=TPConfig(tp_size=1, tp_grain_size=1, mpu=None, tp_group=None) injection_policy_tuple=None keep_module_on_host=False replace_with_kernel_inject=False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] timers_config ................ enabled=True synchronized=True [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] torch_autocast_dtype ......... None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] torch_autocast_enabled ....... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] torch_autocast_lower_precision_safe_modules None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] train_batch_size ............. 
16 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] train_micro_batch_size_per_gpu 2 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] use_data_before_expert_parallel_ False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] use_node_local_storage ....... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] wall_clock_breakdown ......... False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] weight_quantization_config ... None [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] world_size ................... 8 [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] zero_allow_untested_optimizer True [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=12845056 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None zenflow=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=11560550 param_persistence_threshold=35840 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True log_trace_cache_warnings=False [2025-10-14 02:37:40,978] [INFO] [config.py:958:print] zero_enabled ................. 
True
[2025-10-14 02:37:40,978] [INFO] [config.py:958:print] zero_force_ds_cpu_optimizer .. True
[2025-10-14 02:37:40,978] [INFO] [config.py:958:print] zero_optimization_stage ...... 3
[2025-10-14 02:37:40,978] [INFO] [config.py:944:print_user_config] json = {
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": false,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": false,
        "contiguous_gradients": true,
        "sub_group_size": 1.000000e+09,
        "reduce_bucket_size": 1.284506e+07,
        "stage3_prefetch_bucket_size": 1.156055e+07,
        "stage3_param_persistence_threshold": 3.584000e+04,
        "stage3_max_live_parameters": 1.000000e+09,
        "stage3_max_reuse_distance": 1.000000e+09,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "steps_per_print": inf
}
[INFO|trainer.py:2409] 2025-10-14 02:37:40,979 >> ***** Running training *****
[INFO|trainer.py:2410] 2025-10-14 02:37:40,979 >> Num examples = 110,155
[INFO|trainer.py:2411] 2025-10-14 02:37:40,979 >> Num Epochs = 1
[INFO|trainer.py:2412] 2025-10-14 02:37:40,980 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2415] 2025-10-14 02:37:40,980 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2416] 2025-10-14 02:37:40,980 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2417] 2025-10-14 02:37:40,980 >> Total optimization steps = 6,885
[INFO|trainer.py:2418] 2025-10-14 02:37:40,981 >> Number of trainable parameters = 7,615,616,512
[INFO|integration_utils.py:832] 2025-10-14 02:37:40,982 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Tracking run with wandb version 0.21.3
wandb: W&B syncing is set to `offline` in this directory. Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
wandb: Run data is saved locally in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/wandb/offline-run-20251014_023741-21vuy1wg
0%| | 0/6885 [00:00>
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
[INFO|2025-10-14 02:37:46] llamafactory.train.callbacks:143 >> EMA state initialized.
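Note on the figures reported above: the total train batch size and the number of optimization steps appear to follow directly from the per-device batch size, the accumulation steps, the world size, and the example count. The sketch below redoes that arithmetic with values copied from this log. It also estimates the per-step learning-rate increment from the values the run logs at steps 10 and 20, and compares it with a linear warmup to 1e-5 over 689 steps. That peak rate and warmup length are assumptions for illustration only; neither is printed anywhere in this log.

```python
import math

# Values copied from the config dump and the "Running training" summary.
micro_batch_per_gpu = 2
grad_accum_steps = 1
world_size = 8
num_examples = 110_155

# Global batch = per-device batch * accumulation steps * number of ranks.
global_batch = micro_batch_per_gpu * grad_accum_steps * world_size
# One epoch, last partial batch kept, so round up.
steps_per_epoch = math.ceil(num_examples / global_batch)

# Learning rates as logged by the run at steps 10 and 20.
lr_step_10 = 1.3062409288824383e-07
lr_step_20 = 2.757619738751814e-07
lr_increment_per_step = (lr_step_20 - lr_step_10) / 10

# ASSUMPTION (not stated in the log): linear warmup to 1e-5 over 689 steps.
assumed_peak_lr = 1e-5
assumed_warmup_steps = 689
assumed_increment = assumed_peak_lr / assumed_warmup_steps

print(f"global batch size   : {global_batch}")
print(f"optimization steps  : {steps_per_epoch}")
print(f"lr increment (log)  : {lr_increment_per_step:.6e}")
print(f"lr increment (guess): {assumed_increment:.6e}")
```

If the two increments agree, the logged schedule is at least consistent with that guess, but the actual scheduler settings should be read from the training arguments rather than inferred from the log.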
0%| | 1/6885 [00:05<10:10:33, 5.32s/it] 0%| | 2/6885 [00:10<10:20:16, 5.41s/it] 0%| | 3/6885 [00:13<8:06:04, 4.24s/it] 0%| | 4/6885 [00:18<8:14:24, 4.31s/it] 0%| | 5/6885 [00:22<8:02:10, 4.20s/it] 0%| | 6/6885 [00:26<8:28:09, 4.43s/it] 0%| | 7/6885 [00:29<7:18:36, 3.83s/it] 0%| | 8/6885 [00:32<6:52:53, 3.60s/it] 0%| | 9/6885 [00:41<9:55:33, 5.20s/it] 0%| | 10/6885 [00:44<8:32:32, 4.47s/it] {'loss': 0.9607, 'grad_norm': 4.328955480739728, 'learning_rate': 1.3062409288824383e-07, 'epoch': 0.0} 0%| | 10/6885 [00:44<8:32:32, 4.47s/it] 0%| | 11/6885 [00:46<7:13:15, 3.78s/it] 0%| | 12/6885 [00:48<5:56:53, 3.12s/it] 0%| | 13/6885 [00:49<5:16:17, 2.76s/it] 0%| | 14/6885 [00:52<5:18:36, 2.78s/it] 0%| | 15/6885 [00:54<4:46:32, 2.50s/it] 0%| | 16/6885 [00:57<5:05:15, 2.67s/it] 0%| | 17/6885 [00:59<4:22:25, 2.29s/it] 0%| | 18/6885 [01:01<4:19:23, 2.27s/it] 0%| | 19/6885 [01:05<5:13:23, 2.74s/it] 0%| | 20/6885 [01:07<4:48:02, 2.52s/it] {'loss': 0.9859, 'grad_norm': 4.469323164876104, 'learning_rate': 2.757619738751814e-07, 'epoch': 0.0} 0%| | 20/6885 [01:07<4:48:02, 2.52s/it] 0%| | 21/6885 [01:09<4:27:03, 2.33s/it] 0%| | 22/6885 [01:11<4:37:25, 2.43s/it] 0%| | 23/6885 [01:17<6:16:30, 3.29s/it] 0%| | 24/6885 [01:19<5:56:35, 3.12s/it] 0%| | 25/6885 [01:21<5:13:47, 2.74s/it] 0%| | 26/6885 [01:23<4:52:43, 2.56s/it] 0%| | 27/6885 [01:26<5:14:32, 2.75s/it] 0%| | 28/6885 [01:30<5:51:46, 3.08s/it] 0%| | 29/6885 [01:32<5:08:12, 2.70s/it] 0%| | 30/6885 [01:35<5:08:21, 2.70s/it] {'loss': 0.9872, 'grad_norm': 4.000416594025176, 'learning_rate': 4.2089985486211904e-07, 'epoch': 0.0} 0%| | 30/6885 [01:35<5:08:21, 2.70s/it] 0%| | 31/6885 [01:37<4:59:14, 2.62s/it] 0%| | 32/6885 [01:40<4:50:42, 2.55s/it] 0%| | 33/6885 [01:42<4:55:39, 2.59s/it] 0%| | 34/6885 [01:48<6:27:19, 3.39s/it] 1%| | 35/6885 [01:54<8:28:28, 4.45s/it] 1%| | 36/6885 [01:57<7:28:49, 3.93s/it] 1%| | 37/6885 [02:00<6:42:22, 3.53s/it] 1%| | 38/6885 [02:02<5:56:59, 3.13s/it] 1%| | 39/6885 [02:04<5:35:14, 2.94s/it] 1%| | 40/6885 
[02:07<5:14:53, 2.76s/it] {'loss': 0.9191, 'grad_norm': 3.1566001029759914, 'learning_rate': 5.660377358490567e-07, 'epoch': 0.01} 1%| | 40/6885 [9:57:25<5:14:53, 2.76s/it] 1%| | 41/6885 [9:57:27<20376:14:32, 10718.07s/it] 1%| | 42/6885 [9:57:30<14262:51:47, 7503.48s/it] 1%| | 43/6885 [9:57:33<9984:08:00, 5253.27s/it] 1%| | 44/6885 [9:57:35<6989:21:00, 3678.07s/it] 1%| | 45/6885 [9:57:37<4892:55:09, 2575.22s/it] 1%| | 46/6885 [9:57:39<3425:39:14, 1803.24s/it] 1%| | 47/6885 [9:57:43<2399:28:36, 1263.25s/it] 1%| | 48/6885 [9:57:45<1680:52:05, 885.06s/it] 1%| | 49/6885 [9:57:49<1178:21:09, 620.55s/it] 1%| | 50/6885 [9:57:52<826:42:56, 435.43s/it] {'loss': 0.866, 'grad_norm': 2.000776925354802, 'learning_rate': 7.111756168359943e-07, 'epoch': 0.01} 1%| | 50/6885 [9:57:52<826:42:56, 435.43s/it] 1%| | 51/6885 [9:57:54<579:58:51, 305.52s/it] 1%| | 52/6885 [9:57:59<408:31:48, 215.24s/it] 1%| | 53/6885 [9:58:02<287:30:23, 151.50s/it] 1%| | 54/6885 [9:58:05<202:54:07, 106.93s/it] 1%| | 55/6885 [9:58:08<143:44:18, 75.76s/it] 1%| | 56/6885 [9:58:12<102:45:19, 54.17s/it] 1%| | 57/6885 [9:58:14<73:25:56, 38.72s/it] 1%| | 58/6885 [9:58:18<53:34:19, 28.25s/it] 1%| | 59/6885 [9:58:23<40:07:45, 21.16s/it] 1%| | 60/6885 [9:58:26<30:08:19, 15.90s/it] {'loss': 0.8475, 'grad_norm': 2.03383269865318, 'learning_rate': 8.563134978229319e-07, 'epoch': 0.01} 1%| | 60/6885 [9:58:26<30:08:19, 15.90s/it] 1%| | 61/6885 [9:58:30<23:12:36, 12.24s/it] 1%| | 62/6885 [9:58:33<17:46:55, 9.38s/it] 1%| | 63/6885 [9:58:35<13:34:01, 7.16s/it] 1%| | 64/6885 [9:58:36<10:28:49, 5.53s/it] 1%| | 65/6885 [9:58:39<8:58:49, 4.74s/it] 1%| | 66/6885 [9:58:42<7:40:50, 4.05s/it] 1%| | 67/6885 [9:58:45<7:00:58, 3.70s/it] 1%| | 68/6885 [9:58:47<6:26:46, 3.40s/it] 1%| | 69/6885 [9:58:50<6:03:37, 3.20s/it] 1%| | 70/6885 [9:58:54<6:17:14, 3.32s/it] {'loss': 0.8145, 'grad_norm': 1.981671850063017, 'learning_rate': 1.0014513788098695e-06, 'epoch': 0.01} 1%| | 70/6885 [9:58:54<6:17:14, 3.32s/it] 1%| | 71/6885 
[9:58:57<6:20:54, 3.35s/it] 1%| | 72/6885 [9:59:01<6:39:48, 3.52s/it] 1%| | 73/6885 [9:59:04<6:32:51, 3.46s/it] 1%| | 74/6885 [9:59:09<7:05:01, 3.74s/it] 1%| | 75/6885 [9:59:11<6:16:45, 3.32s/it] 1%| | 76/6885 [9:59:13<5:33:10, 2.94s/it] 1%| | 77/6885 [9:59:16<5:34:27, 2.95s/it] 1%| | 78/6885 [9:59:19<5:28:30, 2.90s/it] 1%| | 79/6885 [9:59:22<5:24:43, 2.86s/it] 1%| | 80/6885 [9:59:23<4:42:19, 2.49s/it] {'loss': 0.7874, 'grad_norm': 1.9935447101504142, 'learning_rate': 1.146589259796807e-06, 'epoch': 0.01} 1%| | 80/6885 [9:59:23<4:42:19, 2.49s/it] 1%| | 81/6885 [9:59:26<4:39:29, 2.46s/it] 1%| | 82/6885 [9:59:29<4:53:30, 2.59s/it] 1%| | 83/6885 [9:59:31<4:42:17, 2.49s/it] 1%| | 84/6885 [9:59:33<4:41:45, 2.49s/it] 1%| | 85/6885 [9:59:36<4:59:04, 2.64s/it] 1%| | 86/6885 [9:59:41<5:55:22, 3.14s/it] 1%|▏ | 87/6885 [9:59:43<5:18:37, 2.81s/it] 1%|▏ | 88/6885 [9:59:46<5:40:05, 3.00s/it] 1%|▏ | 89/6885 [9:59:49<5:53:16, 3.12s/it] 1%|▏ | 90/6885 [9:59:52<5:40:04, 3.00s/it] {'loss': 0.7606, 'grad_norm': 1.696794144473072, 'learning_rate': 1.2917271407837448e-06, 'epoch': 0.01} 1%|▏ | 90/6885 [9:59:52<5:40:04, 3.00s/it] 1%|▏ | 91/6885 [9:59:55<5:25:29, 2.87s/it] 1%|▏ | 92/6885 [9:59:57<5:06:30, 2.71s/it] 1%|▏ | 93/6885 [10:00:02<6:38:40, 3.52s/it] 1%|▏ | 94/6885 [10:00:05<6:21:30, 3.37s/it] 1%|▏ | 95/6885 [10:00:10<7:09:39, 3.80s/it] 1%|▏ | 96/6885 [10:00:13<6:23:29, 3.39s/it] 1%|▏ | 97/6885 [10:00:15<5:55:32, 3.14s/it] 1%|▏ | 98/6885 [10:00:17<5:06:39, 2.71s/it] 1%|▏ | 99/6885 [10:00:20<5:30:47, 2.92s/it] 1%|▏ | 100/6885 [10:00:23<5:16:17, 2.80s/it] {'loss': 0.7505, 'grad_norm': 1.8441704167155635, 'learning_rate': 1.4368650217706823e-06, 'epoch': 0.01} 1%|▏ | 100/6885 [10:00:23<5:16:17, 2.80s/it] 1%|▏ | 101/6885 [10:00:26<5:18:00, 2.81s/it] 1%|▏ | 102/6885 [10:00:29<5:26:01, 2.88s/it] 1%|▏ | 103/6885 [10:00:33<6:05:00, 3.23s/it] 2%|▏ | 104/6885 [10:00:35<5:33:59, 2.96s/it] 2%|▏ | 105/6885 [10:00:39<5:51:55, 3.11s/it] 2%|▏ | 106/6885 [10:00:41<5:21:05, 2.84s/it] 2%|▏ | 
107/6885 [10:00:44<5:33:09, 2.95s/it] 2%|▏ | 108/6885 [10:00:47<5:35:00, 2.97s/it] 2%|▏ | 109/6885 [10:00:49<5:11:41, 2.76s/it] 2%|▏ | 110/6885 [10:00:53<5:44:11, 3.05s/it] {'loss': 0.7432, 'grad_norm': 1.6167640330505846, 'learning_rate': 1.5820029027576197e-06, 'epoch': 0.02} 2%|▏ | 110/6885 [10:00:53<5:44:11, 3.05s/it] 2%|▏ | 111/6885 [10:00:57<5:57:46, 3.17s/it] 2%|▏ | 112/6885 [10:01:00<5:54:45, 3.14s/it] 2%|▏ | 113/6885 [10:01:03<6:02:33, 3.21s/it] 2%|▏ | 114/6885 [10:01:10<8:03:35, 4.29s/it] 2%|▏ | 115/6885 [10:01:12<6:49:53, 3.63s/it] 2%|▏ | 116/6885 [10:01:15<6:25:40, 3.42s/it] 2%|▏ | 117/6885 [10:01:17<5:52:08, 3.12s/it] 2%|▏ | 118/6885 [10:01:19<5:21:23, 2.85s/it] 2%|▏ | 119/6885 [10:01:22<5:25:00, 2.88s/it] 2%|▏ | 120/6885 [10:01:25<5:18:40, 2.83s/it] {'loss': 0.7502, 'grad_norm': 1.7310300613256226, 'learning_rate': 1.7271407837445576e-06, 'epoch': 0.02} 2%|▏ | 120/6885 [10:01:25<5:18:40, 2.83s/it] 2%|▏ | 121/6885 [10:01:29<5:39:34, 3.01s/it] 2%|▏ | 122/6885 [10:01:31<5:25:15, 2.89s/it] 2%|▏ | 123/6885 [10:01:34<5:16:27, 2.81s/it] 2%|▏ | 124/6885 [10:01:36<4:47:32, 2.55s/it] 2%|▏ | 125/6885 [10:01:39<5:02:06, 2.68s/it] 2%|▏ | 126/6885 [10:01:42<5:33:58, 2.96s/it] 2%|▏ | 127/6885 [10:01:45<5:37:16, 2.99s/it] 2%|▏ | 128/6885 [10:01:48<5:07:48, 2.73s/it] 2%|▏ | 129/6885 [10:01:50<4:45:35, 2.54s/it] 2%|▏ | 130/6885 [10:01:53<5:07:22, 2.73s/it] {'loss': 0.7075, 'grad_norm': 1.5504171157690307, 'learning_rate': 1.872278664731495e-06, 'epoch': 0.02} 2%|▏ | 130/6885 [10:01:53<5:07:22, 2.73s/it] 2%|▏ | 131/6885 [10:01:55<4:39:47, 2.49s/it] 2%|▏ | 132/6885 [10:01:57<4:26:16, 2.37s/it] 2%|▏ | 133/6885 [10:02:01<5:17:57, 2.83s/it] 2%|▏ | 134/6885 [10:02:03<5:02:17, 2.69s/it] 2%|▏ | 135/6885 [10:02:05<4:37:11, 2.46s/it] 2%|▏ | 136/6885 [10:02:09<5:42:00, 3.04s/it] 2%|▏ | 137/6885 [10:02:12<5:13:00, 2.78s/it] 2%|▏ | 138/6885 [10:02:14<5:11:23, 2.77s/it] 2%|▏ | 139/6885 [10:02:18<5:46:23, 3.08s/it] 2%|▏ | 140/6885 [10:02:23<6:45:12, 3.60s/it] {'loss': 0.7242, 
'grad_norm': 1.5001595551333269, 'learning_rate': 2.0174165457184327e-06, 'epoch': 0.02} 2%|▏ | 140/6885 [10:02:23<6:45:12, 3.60s/it] 2%|▏ | 141/6885 [10:02:26<6:39:52, 3.56s/it] 2%|▏ | 142/6885 [10:02:29<6:13:04, 3.32s/it] 2%|▏ | 143/6885 [10:02:33<6:43:56, 3.59s/it] 2%|▏ | 144/6885 [10:02:38<7:17:22, 3.89s/it] 2%|▏ | 145/6885 [10:02:42<7:13:40, 3.86s/it] 2%|▏ | 146/6885 [10:02:45<6:40:59, 3.57s/it] 2%|▏ | 147/6885 [10:02:48<6:24:51, 3.43s/it] 2%|▏ | 148/6885 [10:02:51<6:28:00, 3.46s/it] 2%|▏ | 149/6885 [10:02:54<5:49:03, 3.11s/it] 2%|▏ | 150/6885 [10:02:56<5:29:44, 2.94s/it] {'loss': 0.7299, 'grad_norm': 1.7680255328873922, 'learning_rate': 2.1625544267053704e-06, 'epoch': 0.02} 2%|▏ | 150/6885 [10:02:56<5:29:44, 2.94s/it] 2%|▏ | 151/6885 [10:02:59<5:20:18, 2.85s/it] 2%|▏ | 152/6885 [10:03:02<5:26:11, 2.91s/it] 2%|▏ | 153/6885 [10:03:05<5:49:35, 3.12s/it] 2%|▏ | 154/6885 [10:03:08<5:28:09, 2.93s/it] 2%|▏ | 155/6885 [10:03:10<5:13:24, 2.79s/it] 2%|▏ | 156/6885 [10:03:13<4:52:39, 2.61s/it] 2%|▏ | 157/6885 [10:03:15<4:51:45, 2.60s/it] 2%|▏ | 158/6885 [10:03:18<4:56:46, 2.65s/it] 2%|▏ | 159/6885 [10:03:20<4:48:42, 2.58s/it] 2%|▏ | 160/6885 [10:03:22<4:28:29, 2.40s/it] {'loss': 0.7074, 'grad_norm': 1.9776874021989124, 'learning_rate': 2.307692307692308e-06, 'epoch': 0.02} 2%|▏ | 160/6885 [10:03:22<4:28:29, 2.40s/it] 2%|▏ | 161/6885 [10:03:25<4:39:25, 2.49s/it] 2%|▏ | 162/6885 [10:03:28<4:51:21, 2.60s/it] 2%|▏ | 163/6885 [10:03:30<4:39:09, 2.49s/it] 2%|▏ | 164/6885 [10:03:32<4:26:55, 2.38s/it] 2%|▏ | 165/6885 [10:03:35<4:43:16, 2.53s/it] 2%|▏ | 166/6885 [10:03:37<4:21:27, 2.33s/it] 2%|▏ | 167/6885 [10:03:39<4:17:34, 2.30s/it] 2%|▏ | 168/6885 [10:03:46<6:35:04, 3.53s/it] 2%|▏ | 169/6885 [10:03:47<5:40:36, 3.04s/it] 2%|▏ | 170/6885 [10:03:51<5:44:36, 3.08s/it] {'loss': 0.7003, 'grad_norm': 1.645294675336186, 'learning_rate': 2.4528301886792453e-06, 'epoch': 0.02} 2%|▏ | 170/6885 [10:03:51<5:44:36, 3.08s/it] 2%|▏ | 171/6885 [10:03:54<6:02:55, 3.24s/it] 2%|▏ | 172/6885 
[10:03:57<5:34:31, 2.99s/it] 3%|▎ | 173/6885 [10:04:01<6:09:33, 3.30s/it] 3%|▎ | 174/6885 [10:04:03<5:35:12, 3.00s/it] 3%|▎ | 175/6885 [10:04:06<5:23:31, 2.89s/it] 3%|▎ | 176/6885 [10:04:08<5:21:18, 2.87s/it] 3%|▎ | 177/6885 [10:04:11<4:59:39, 2.68s/it] 3%|▎ | 178/6885 [10:04:14<5:37:30, 3.02s/it] 3%|▎ | 179/6885 [10:04:17<5:10:51, 2.78s/it] 3%|▎ | 180/6885 [10:04:19<4:42:30, 2.53s/it] {'loss': 0.6935, 'grad_norm': 1.903626800669526, 'learning_rate': 2.597968069666183e-06, 'epoch': 0.03} 3%|▎ | 180/6885 [10:04:19<4:42:30, 2.53s/it] 3%|▎ | 181/6885 [10:04:20<4:19:07, 2.32s/it] 3%|▎ | 182/6885 [10:04:24<4:49:14, 2.59s/it] 3%|▎ | 183/6885 [10:04:28<5:55:06, 3.18s/it] 3%|▎ | 184/6885 [10:04:31<5:57:08, 3.20s/it] 3%|▎ | 185/6885 [10:04:36<6:27:08, 3.47s/it] 3%|▎ | 186/6885 [10:04:38<5:52:46, 3.16s/it] 3%|▎ | 187/6885 [10:04:41<5:41:00, 3.05s/it] 3%|▎ | 188/6885 [10:04:43<5:13:12, 2.81s/it] 3%|▎ | 189/6885 [10:04:47<5:36:39, 3.02s/it] 3%|▎ | 190/6885 [10:04:50<5:44:43, 3.09s/it] {'loss': 0.7099, 'grad_norm': 1.6296522016767983, 'learning_rate': 2.7431059506531207e-06, 'epoch': 0.03} 3%|▎ | 190/6885 [10:04:50<5:44:43, 3.09s/it] 3%|▎ | 191/6885 [10:04:52<5:11:26, 2.79s/it] 3%|▎ | 192/6885 [10:04:55<5:16:39, 2.84s/it] 3%|▎ | 193/6885 [10:05:00<6:26:44, 3.47s/it] 3%|▎ | 194/6885 [10:05:01<5:21:22, 2.88s/it] 3%|▎ | 195/6885 [10:05:04<5:30:59, 2.97s/it] 3%|▎ | 196/6885 [10:05:11<7:30:07, 4.04s/it] 3%|▎ | 197/6885 [10:05:13<6:28:57, 3.49s/it] 3%|▎ | 198/6885 [10:05:19<7:40:25, 4.13s/it] 3%|▎ | 199/6885 [10:05:21<6:49:29, 3.67s/it] 3%|▎ | 200/6885 [10:05:25<6:42:27, 3.61s/it] {'loss': 0.7082, 'grad_norm': 1.5624745122869332, 'learning_rate': 2.8882438316400583e-06, 'epoch': 0.03} 3%|▎ | 200/6885 [10:05:25<6:42:27, 3.61s/it] 3%|▎ | 201/6885 [10:05:28<6:19:03, 3.40s/it] 3%|▎ | 202/6885 [10:05:30<5:31:48, 2.98s/it] 3%|▎ | 203/6885 [10:05:33<5:47:04, 3.12s/it] 3%|▎ | 204/6885 [10:05:36<5:26:54, 2.94s/it] 3%|▎ | 205/6885 [10:05:39<5:43:55, 3.09s/it] 3%|▎ | 206/6885 [10:05:42<5:24:33, 
2.92s/it] 3%|▎ | 207/6885 [10:05:44<5:02:20, 2.72s/it] 3%|▎ | 208/6885 [10:05:47<5:00:22, 2.70s/it] 3%|▎ | 209/6885 [10:05:50<5:23:09, 2.90s/it] 3%|▎ | 210/6885 [10:05:54<5:54:21, 3.19s/it] {'loss': 0.6847, 'grad_norm': 1.5327148829437787, 'learning_rate': 3.033381712626996e-06, 'epoch': 0.03} 3%|▎ | 210/6885 [10:05:54<5:54:21, 3.19s/it] 3%|▎ | 211/6885 [10:05:56<5:15:37, 2.84s/it] 3%|▎ | 212/6885 [10:05:59<5:24:58, 2.92s/it] 3%|▎ | 213/6885 [10:06:01<4:54:31, 2.65s/it] 3%|▎ | 214/6885 [10:06:03<4:35:10, 2.48s/it] 3%|▎ | 215/6885 [10:06:06<4:54:06, 2.65s/it] 3%|▎ | 216/6885 [10:06:11<5:53:54, 3.18s/it] 3%|▎ | 217/6885 [10:06:13<5:32:57, 3.00s/it] 3%|▎ | 218/6885 [10:06:16<5:25:44, 2.93s/it] 3%|▎ | 219/6885 [10:06:19<5:17:17, 2.86s/it] 3%|▎ | 220/6885 [10:06:23<6:17:40, 3.40s/it] {'loss': 0.6997, 'grad_norm': 1.4217156007581908, 'learning_rate': 3.1785195936139337e-06, 'epoch': 0.03} 3%|▎ | 220/6885 [10:06:23<6:17:40, 3.40s/it] 3%|▎ | 221/6885 [10:06:29<7:25:27, 4.01s/it] 3%|▎ | 222/6885 [10:06:33<7:24:43, 4.00s/it] 3%|▎ | 223/6885 [10:06:35<6:39:45, 3.60s/it] 3%|▎ | 224/6885 [10:06:38<6:01:35, 3.26s/it] 3%|▎ | 225/6885 [10:06:41<5:46:19, 3.12s/it] 3%|▎ | 226/6885 [10:06:43<5:20:01, 2.88s/it] 3%|▎ | 227/6885 [10:06:48<6:32:50, 3.54s/it] 3%|▎ | 228/6885 [10:06:51<6:04:35, 3.29s/it] 3%|▎ | 229/6885 [10:06:54<6:16:05, 3.39s/it] 3%|▎ | 230/6885 [10:06:57<5:46:43, 3.13s/it] {'loss': 0.6922, 'grad_norm': 1.678714535521671, 'learning_rate': 3.323657474600871e-06, 'epoch': 0.03} 3%|▎ | 230/6885 [10:06:57<5:46:43, 3.13s/it] 3%|▎ | 231/6885 [10:06:59<5:02:52, 2.73s/it] 3%|▎ | 232/6885 [10:07:02<5:08:20, 2.78s/it] 3%|▎ | 233/6885 [10:07:04<4:42:49, 2.55s/it] 3%|▎ | 234/6885 [10:07:06<4:39:56, 2.53s/it] 3%|▎ | 235/6885 [10:07:09<4:45:33, 2.58s/it] 3%|▎ | 236/6885 [10:07:12<5:04:27, 2.75s/it] 3%|▎ | 237/6885 [10:07:16<5:36:40, 3.04s/it] 3%|▎ | 238/6885 [10:07:19<5:45:46, 3.12s/it] 3%|▎ | 239/6885 [10:07:21<5:13:31, 2.83s/it] 3%|▎ | 240/6885 [10:07:23<4:50:22, 2.62s/it] {'loss': 
0.6764, 'grad_norm': 1.6893028132334575, 'learning_rate': 3.4687953555878086e-06, 'epoch': 0.03} 3%|▎ | 240/6885 [10:07:23<4:50:22, 2.62s/it] 4%|▎ | 241/6885 [10:07:26<4:57:14, 2.68s/it] 4%|▎ | 242/6885 [10:07:28<4:48:58, 2.61s/it] 4%|▎ | 243/6885 [10:07:30<4:25:32, 2.40s/it] 4%|▎ | 244/6885 [10:07:34<4:58:40, 2.70s/it] 4%|▎ | 245/6885 [10:07:36<4:45:36, 2.58s/it] 4%|▎ | 246/6885 [10:07:39<4:49:28, 2.62s/it] 4%|▎ | 247/6885 [10:07:43<5:40:48, 3.08s/it] 4%|▎ | 248/6885 [10:07:45<5:16:03, 2.86s/it] 4%|▎ | 249/6885 [10:07:49<5:41:29, 3.09s/it] 4%|▎ | 250/6885 [10:07:51<5:22:02, 2.91s/it] {'loss': 0.6838, 'grad_norm': 1.6842923668045748, 'learning_rate': 3.6139332365747467e-06, 'epoch': 0.04} 4%|▎ | 250/6885 [10:07:51<5:22:02, 2.91s/it] 4%|▎ | 251/6885 [10:07:54<5:08:55, 2.79s/it] 4%|▎ | 252/6885 [10:07:56<4:53:08, 2.65s/it] 4%|▎ | 253/6885 [10:08:00<5:22:56, 2.92s/it] 4%|▎ | 254/6885 [10:08:03<5:23:46, 2.93s/it] 4%|▎ | 255/6885 [10:08:06<5:27:34, 2.96s/it] 4%|▎ | 256/6885 [10:08:08<5:14:36, 2.85s/it] 4%|▎ | 257/6885 [10:08:11<4:52:20, 2.65s/it] 4%|▎ | 258/6885 [10:08:13<4:36:07, 2.50s/it] 4%|▍ | 259/6885 [10:08:16<5:12:34, 2.83s/it] 4%|▍ | 260/6885 [10:08:18<4:40:28, 2.54s/it] {'loss': 0.6961, 'grad_norm': 2.0758637079489306, 'learning_rate': 3.759071117561684e-06, 'epoch': 0.04} 4%|▍ | 260/6885 [10:08:18<4:40:28, 2.54s/it] 4%|▍ | 261/6885 [10:08:24<6:28:49, 3.52s/it] 4%|▍ | 262/6885 [10:08:27<6:17:29, 3.42s/it] 4%|▍ | 263/6885 [10:08:29<5:40:58, 3.09s/it] 4%|▍ | 264/6885 [10:08:32<5:08:16, 2.79s/it] 4%|▍ | 265/6885 [10:08:33<4:35:16, 2.49s/it] 4%|▍ | 266/6885 [10:08:36<4:49:03, 2.62s/it] 4%|▍ | 267/6885 [10:08:39<4:39:45, 2.54s/it] 4%|▍ | 268/6885 [10:08:44<5:59:07, 3.26s/it] 4%|▍ | 269/6885 [10:08:46<5:30:55, 3.00s/it] 4%|▍ | 270/6885 [10:08:49<5:19:31, 2.90s/it] {'loss': 0.6619, 'grad_norm': 1.651886885559497, 'learning_rate': 3.904208998548621e-06, 'epoch': 0.04} 4%|▍ | 270/6885 [10:08:49<5:19:31, 2.90s/it] 4%|▍ | 271/6885 [10:08:51<5:02:10, 2.74s/it] 4%|▍ | 
272/6885 [10:08:53<4:53:25, 2.66s/it] 4%|▍ | 273/6885 [10:08:57<5:12:23, 2.83s/it] 4%|▍ | 274/6885 [10:09:00<5:28:41, 2.98s/it] 4%|▍ | 275/6885 [10:09:02<5:03:25, 2.75s/it] 4%|▍ | 276/6885 [10:09:05<4:49:02, 2.62s/it] 4%|▍ | 277/6885 [10:09:08<5:24:30, 2.95s/it] 4%|▍ | 278/6885 [10:09:11<5:18:50, 2.90s/it] 4%|▍ | 279/6885 [10:09:14<5:23:41, 2.94s/it] 4%|▍ | 280/6885 [10:09:18<5:39:53, 3.09s/it] {'loss': 0.691, 'grad_norm': 1.6813735734416895, 'learning_rate': 4.049346879535559e-06, 'epoch': 0.04} 4%|▍ | 280/6885 [10:09:18<5:39:53, 3.09s/it] 4%|▍ | 281/6885 [10:09:20<5:19:20, 2.90s/it] 4%|▍ | 282/6885 [10:09:23<5:18:22, 2.89s/it] 4%|▍ | 283/6885 [10:09:25<4:57:48, 2.71s/it] 4%|▍ | 284/6885 [10:09:27<4:43:48, 2.58s/it] 4%|▍ | 285/6885 [10:09:31<5:02:20, 2.75s/it] 4%|▍ | 286/6885 [10:09:33<4:37:51, 2.53s/it] 4%|▍ | 287/6885 [10:09:34<4:15:52, 2.33s/it] 4%|▍ | 288/6885 [10:09:37<4:35:47, 2.51s/it] 4%|▍ | 289/6885 [10:09:40<4:37:05, 2.52s/it] 4%|▍ | 290/6885 [10:09:44<5:13:18, 2.85s/it] {'loss': 0.6646, 'grad_norm': 1.8001370749006687, 'learning_rate': 4.194484760522497e-06, 'epoch': 0.04} 4%|▍ | 290/6885 [10:09:44<5:13:18, 2.85s/it] 4%|▍ | 291/6885 [10:09:47<5:22:48, 2.94s/it] 4%|▍ | 292/6885 [10:09:52<6:49:44, 3.73s/it] 4%|▍ | 293/6885 [10:09:55<6:19:57, 3.46s/it] 4%|▍ | 294/6885 [10:09:57<5:40:00, 3.10s/it] 4%|▍ | 295/6885 [10:10:02<6:22:02, 3.48s/it] 4%|▍ | 296/6885 [10:10:04<5:46:08, 3.15s/it] 4%|▍ | 297/6885 [10:10:06<5:18:02, 2.90s/it] 4%|▍ | 298/6885 [10:10:10<5:30:51, 3.01s/it] 4%|▍ | 299/6885 [10:10:12<5:23:38, 2.95s/it] 4%|▍ | 300/6885 [10:10:16<5:39:56, 3.10s/it] {'loss': 0.6595, 'grad_norm': 1.8255351447030483, 'learning_rate': 4.339622641509435e-06, 'epoch': 0.04} 4%|▍ | 300/6885 [10:10:16<5:39:56, 3.10s/it] 4%|▍ | 301/6885 [10:10:19<5:36:53, 3.07s/it] 4%|▍ | 302/6885 [10:10:22<5:35:57, 3.06s/it] 4%|▍ | 303/6885 [10:10:24<4:58:21, 2.72s/it] 4%|▍ | 304/6885 [10:10:26<4:42:58, 2.58s/it] 4%|▍ | 305/6885 [10:10:29<4:53:10, 2.67s/it] 4%|▍ | 306/6885 
[10:10:31<4:43:15, 2.58s/it] 4%|▍ | 307/6885 [10:10:34<4:36:51, 2.53s/it] 4%|▍ | 308/6885 [10:10:38<5:17:58, 2.90s/it] 4%|▍ | 309/6885 [10:10:40<5:06:04, 2.79s/it] 5%|▍ | 310/6885 [10:10:45<6:18:37, 3.46s/it] {'loss': 0.6555, 'grad_norm': 1.7918481140936697, 'learning_rate': 4.484760522496372e-06, 'epoch': 0.05} 5%|▍ | 310/6885 [10:10:45<6:18:37, 3.46s/it] 5%|▍ | 311/6885 [10:10:48<5:53:23, 3.23s/it] 5%|▍ | 312/6885 [10:10:51<5:36:51, 3.07s/it] 5%|▍ | 313/6885 [10:10:53<5:16:08, 2.89s/it] 5%|▍ | 314/6885 [10:10:56<5:04:48, 2.78s/it] 5%|▍ | 315/6885 [10:10:59<5:27:12, 2.99s/it] 5%|▍ | 316/6885 [10:11:02<5:12:15, 2.85s/it] 5%|▍ | 317/6885 [10:11:05<5:41:43, 3.12s/it] 5%|▍ | 318/6885 [10:11:08<5:12:07, 2.85s/it] 5%|▍ | 319/6885 [10:11:10<4:46:33, 2.62s/it] 5%|▍ | 320/6885 [10:11:13<5:17:24, 2.90s/it] {'loss': 0.6734, 'grad_norm': 1.6697318257583398, 'learning_rate': 4.629898403483309e-06, 'epoch': 0.05} 5%|▍ | 320/6885 [10:11:13<5:17:24, 2.90s/it] 5%|▍ | 321/6885 [10:11:16<5:31:04, 3.03s/it] 5%|▍ | 322/6885 [10:11:20<5:49:29, 3.20s/it] 5%|▍ | 323/6885 [10:11:27<8:06:23, 4.45s/it] 5%|▍ | 324/6885 [10:11:31<7:44:50, 4.25s/it] 5%|▍ | 325/6885 [10:11:34<6:53:43, 3.78s/it] 5%|▍ | 326/6885 [10:11:38<6:59:28, 3.84s/it] 5%|▍ | 327/6885 [10:11:41<6:27:09, 3.54s/it] 5%|▍ | 328/6885 [10:11:44<6:05:42, 3.35s/it] 5%|▍ | 329/6885 [10:11:48<6:42:58, 3.69s/it] 5%|▍ | 330/6885 [10:11:52<6:38:20, 3.65s/it] {'loss': 0.6511, 'grad_norm': 1.5656777878920214, 'learning_rate': 4.775036284470247e-06, 'epoch': 0.05} 5%|▍ | 330/6885 [10:11:52<6:38:20, 3.65s/it] 5%|▍ | 331/6885 [10:11:54<6:07:57, 3.37s/it] 5%|▍ | 332/6885 [10:11:57<5:39:43, 3.11s/it] 5%|▍ | 333/6885 [10:12:00<5:44:45, 3.16s/it] 5%|▍ | 334/6885 [10:12:03<5:21:05, 2.94s/it] 5%|▍ | 335/6885 [10:12:05<5:19:02, 2.92s/it] 5%|▍ | 336/6885 [10:12:08<5:01:26, 2.76s/it] 5%|▍ | 337/6885 [10:12:11<5:14:43, 2.88s/it] 5%|▍ | 338/6885 [10:12:12<4:27:22, 2.45s/it] 5%|▍ | 339/6885 [10:12:16<5:03:39, 2.78s/it] 5%|▍ | 340/6885 [10:12:18<4:43:06, 
2.60s/it] {'loss': 0.6651, 'grad_norm': 1.6515736055504289, 'learning_rate': 4.920174165457185e-06, 'epoch': 0.05} 5%|▍ | 340/6885 [10:12:18<4:43:06, 2.60s/it] 5%|▍ | 341/6885 [10:12:21<5:02:33, 2.77s/it] 5%|▍ | 342/6885 [10:12:25<5:24:59, 2.98s/it] 5%|▍ | 343/6885 [10:12:28<5:18:09, 2.92s/it] 5%|▍ | 344/6885 [10:12:30<5:04:30, 2.79s/it] 5%|▌ | 345/6885 [10:12:32<4:47:04, 2.63s/it] 5%|▌ | 346/6885 [10:12:35<4:49:02, 2.65s/it] 5%|▌ | 347/6885 [10:12:38<5:02:56, 2.78s/it] 5%|▌ | 348/6885 [10:12:41<5:01:15, 2.77s/it] 5%|▌ | 349/6885 [10:12:44<5:19:02, 2.93s/it] 5%|▌ | 350/6885 [10:12:47<5:20:07, 2.94s/it] {'loss': 0.665, 'grad_norm': 1.6517233906536315, 'learning_rate': 5.065312046444122e-06, 'epoch': 0.05} 5%|▌ | 350/6885 [10:12:47<5:20:07, 2.94s/it] 5%|▌ | 351/6885 [10:12:49<4:50:28, 2.67s/it] 5%|▌ | 352/6885 [10:12:52<4:49:20, 2.66s/it] 5%|▌ | 353/6885 [10:12:54<4:27:09, 2.45s/it] 5%|▌ | 354/6885 [10:12:57<4:53:25, 2.70s/it] 5%|▌ | 355/6885 [10:13:00<5:09:31, 2.84s/it] 5%|▌ | 356/6885 [10:13:03<4:53:19, 2.70s/it] 5%|▌ | 357/6885 [10:13:05<4:54:33, 2.71s/it] 5%|▌ | 358/6885 [10:13:09<5:24:55, 2.99s/it] 5%|▌ | 359/6885 [10:13:13<5:53:07, 3.25s/it] 5%|▌ | 360/6885 [10:13:15<5:30:07, 3.04s/it] {'loss': 0.6632, 'grad_norm': 1.6987223199576384, 'learning_rate': 5.210449927431061e-06, 'epoch': 0.05} 5%|▌ | 360/6885 [10:13:15<5:30:07, 3.04s/it] 5%|▌ | 361/6885 [10:13:18<5:01:31, 2.77s/it] 5%|▌ | 362/6885 [10:13:20<4:47:34, 2.65s/it] 5%|▌ | 363/6885 [10:13:23<5:04:47, 2.80s/it] 5%|▌ | 364/6885 [10:13:26<4:54:45, 2.71s/it] 5%|▌ | 365/6885 [10:13:31<6:30:11, 3.59s/it] 5%|▌ | 366/6885 [10:13:37<7:32:43, 4.17s/it] 5%|▌ | 367/6885 [10:13:40<7:14:38, 4.00s/it] 5%|▌ | 368/6885 [10:13:44<6:53:28, 3.81s/it] 5%|▌ | 369/6885 [10:13:45<5:35:05, 3.09s/it] 5%|▌ | 370/6885 [10:13:48<5:25:23, 3.00s/it] {'loss': 0.665, 'grad_norm': 1.578744968443496, 'learning_rate': 5.355587808417998e-06, 'epoch': 0.05} 5%|▌ | 370/6885 [10:13:48<5:25:23, 3.00s/it] 5%|▌ | 371/6885 [10:13:50<5:12:21, 
2.88s/it] 5%|▌ | 372/6885 [10:13:52<4:43:34, 2.61s/it] 5%|▌ | 373/6885 [10:13:55<4:49:40, 2.67s/it] 5%|▌ | 374/6885 [10:13:59<5:09:45, 2.85s/it] 5%|▌ | 375/6885 [10:14:01<5:07:33, 2.83s/it] 5%|▌ | 376/6885 [10:14:03<4:41:21, 2.59s/it] 5%|▌ | 377/6885 [10:14:07<5:08:21, 2.84s/it] 5%|▌ | 378/6885 [10:14:09<4:52:34, 2.70s/it] 6%|▌ | 379/6885 [10:14:13<5:16:04, 2.91s/it] 6%|▌ | 380/6885 [10:14:16<5:20:37, 2.96s/it] {'loss': 0.6511, 'grad_norm': 1.4975426293081397, 'learning_rate': 5.500725689404935e-06, 'epoch': 0.06} 6%|▌ | 380/6885 [10:14:16<5:20:37, 2.96s/it] 6%|▌ | 381/6885 [10:14:18<5:09:18, 2.85s/it] 6%|▌ | 382/6885 [10:14:22<5:25:43, 3.01s/it] 6%|▌ | 383/6885 [10:14:24<5:03:01, 2.80s/it] 6%|▌ | 384/6885 [10:14:26<4:27:12, 2.47s/it] 6%|▌ | 385/6885 [10:14:28<4:17:22, 2.38s/it] 6%|▌ | 386/6885 [10:14:30<4:00:04, 2.22s/it] 6%|▌ | 387/6885 [10:14:35<5:33:51, 3.08s/it] 6%|▌ | 388/6885 [10:14:36<4:44:34, 2.63s/it] 6%|▌ | 389/6885 [10:14:39<4:41:57, 2.60s/it] 6%|▌ | 390/6885 [10:14:41<4:30:26, 2.50s/it] {'loss': 0.6676, 'grad_norm': 1.7386717568110297, 'learning_rate': 5.645863570391873e-06, 'epoch': 0.06} 6%|▌ | 390/6885 [10:14:41<4:30:26, 2.50s/it] 6%|▌ | 391/6885 [10:14:44<4:49:43, 2.68s/it] 6%|▌ | 392/6885 [10:14:49<5:53:26, 3.27s/it] 6%|▌ | 393/6885 [10:14:52<5:42:07, 3.16s/it] 6%|▌ | 394/6885 [10:14:55<5:41:14, 3.15s/it] 6%|▌ | 395/6885 [10:14:57<5:16:05, 2.92s/it] 6%|▌ | 396/6885 [10:15:01<5:45:47, 3.20s/it] 6%|▌ | 397/6885 [10:15:05<5:58:20, 3.31s/it] 6%|▌ | 398/6885 [10:15:10<6:52:50, 3.82s/it] 6%|▌ | 399/6885 [10:15:13<6:23:42, 3.55s/it] 6%|▌ | 400/6885 [10:15:15<5:38:59, 3.14s/it] {'loss': 0.6635, 'grad_norm': 1.5916583497500596, 'learning_rate': 5.7910014513788105e-06, 'epoch': 0.06} 6%|▌ | 400/6885 [10:15:15<5:38:59, 3.14s/it] 6%|▌ | 401/6885 [10:15:17<5:21:37, 2.98s/it] 6%|▌ | 402/6885 [10:15:20<5:15:23, 2.92s/it] 6%|▌ | 403/6885 [10:15:24<5:34:17, 3.09s/it] 6%|▌ | 404/6885 [10:15:29<6:32:03, 3.63s/it] 6%|▌ | 405/6885 [10:15:33<6:55:27, 3.85s/it] 6%|▌ | 
406/6885 [10:15:36<6:41:14, 3.72s/it] 6%|▌ | 407/6885 [10:15:39<6:00:13, 3.34s/it] 6%|▌ | 408/6885 [10:15:42<5:52:16, 3.26s/it] 6%|▌ | 409/6885 [10:15:44<5:23:32, 3.00s/it] 6%|▌ | 410/6885 [10:15:49<6:25:56, 3.58s/it] {'loss': 0.6668, 'grad_norm': 1.6931617934865184, 'learning_rate': 5.936139332365748e-06, 'epoch': 0.06} 6%|▌ | 410/6885 [10:15:49<6:25:56, 3.58s/it] 6%|▌ | 411/6885 [10:15:51<5:41:31, 3.17s/it] 6%|▌ | 412/6885 [10:15:55<5:58:11, 3.32s/it] 6%|▌ | 413/6885 [10:15:57<5:26:59, 3.03s/it] 6%|▌ | 414/6885 [10:16:01<5:52:13, 3.27s/it] 6%|▌ | 415/6885 [10:16:04<5:29:22, 3.05s/it] 6%|▌ | 416/6885 [10:16:06<5:04:23, 2.82s/it] 6%|▌ | 417/6885 [10:16:09<5:17:05, 2.94s/it] 6%|▌ | 418/6885 [10:16:13<5:29:37, 3.06s/it] 6%|▌ | 419/6885 [10:16:16<5:25:43, 3.02s/it] 6%|▌ | 420/6885 [10:16:18<5:11:10, 2.89s/it] {'loss': 0.6685, 'grad_norm': 1.5616372247201953, 'learning_rate': 6.081277213352685e-06, 'epoch': 0.06} 6%|▌ | 420/6885 [10:16:18<5:11:10, 2.89s/it] 6%|▌ | 421/6885 [10:16:21<5:08:53, 2.87s/it] 6%|▌ | 422/6885 [10:16:24<5:17:16, 2.95s/it] 6%|▌ | 423/6885 [10:16:27<5:03:00, 2.81s/it] 6%|▌ | 424/6885 [10:16:31<5:58:37, 3.33s/it] 6%|▌ | 425/6885 [10:16:35<6:19:47, 3.53s/it] 6%|▌ | 426/6885 [10:16:37<5:30:50, 3.07s/it] 6%|▌ | 427/6885 [10:16:40<5:38:16, 3.14s/it] 6%|▌ | 428/6885 [10:16:42<4:56:53, 2.76s/it] 6%|▌ | 429/6885 [10:16:46<5:17:06, 2.95s/it] 6%|▌ | 430/6885 [10:16:49<5:29:52, 3.07s/it] {'loss': 0.659, 'grad_norm': 1.5424914283941253, 'learning_rate': 6.226415094339623e-06, 'epoch': 0.06} 6%|▌ | 430/6885 [10:16:49<5:29:52, 3.07s/it] 6%|▋ | 431/6885 [10:16:51<5:05:15, 2.84s/it] 6%|▋ | 432/6885 [10:16:55<5:24:43, 3.02s/it] 6%|▋ | 433/6885 [10:16:59<6:18:25, 3.52s/it] 6%|▋ | 434/6885 [10:17:03<6:13:14, 3.47s/it] 6%|▋ | 435/6885 [10:17:06<5:50:43, 3.26s/it] 6%|▋ | 436/6885 [10:17:08<5:17:30, 2.95s/it] 6%|▋ | 437/6885 [10:17:11<5:13:15, 2.91s/it] 6%|▋ | 438/6885 [10:17:13<5:12:08, 2.90s/it] 6%|▋ | 439/6885 [10:17:17<5:47:06, 3.23s/it] 6%|▋ | 440/6885 
[10:17:22<6:33:09, 3.66s/it] {'loss': 0.6453, 'grad_norm': 1.6468311050594455, 'learning_rate': 6.37155297532656e-06, 'epoch': 0.06} 6%|▋ | 440/6885 [10:17:22<6:33:09, 3.66s/it] 6%|▋ | 441/6885 [10:17:24<5:47:16, 3.23s/it] 6%|▋ | 442/6885 [10:17:28<5:48:45, 3.25s/it] 6%|▋ | 443/6885 [10:17:30<5:07:48, 2.87s/it] 6%|▋ | 444/6885 [10:17:33<5:14:40, 2.93s/it] 6%|▋ | 445/6885 [10:17:36<5:31:36, 3.09s/it] 6%|▋ | 446/6885 [10:17:40<6:01:28, 3.37s/it] 6%|▋ | 447/6885 [10:17:43<5:33:38, 3.11s/it] 7%|▋ | 448/6885 [10:17:46<5:42:50, 3.20s/it] 7%|▋ | 449/6885 [10:17:49<5:18:13, 2.97s/it] 7%|▋ | 450/6885 [10:17:51<4:55:41, 2.76s/it] {'loss': 0.6598, 'grad_norm': 1.5765402125957226, 'learning_rate': 6.5166908563134976e-06, 'epoch': 0.07} 7%|▋ | 450/6885 [10:17:51<4:55:41, 2.76s/it] 7%|▋ | 451/6885 [10:17:58<7:18:17, 4.09s/it] 7%|▋ | 452/6885 [10:18:00<6:16:05, 3.51s/it] 7%|▋ | 453/6885 [10:18:04<6:36:25, 3.70s/it] 7%|▋ | 454/6885 [10:18:09<6:56:13, 3.88s/it] 7%|▋ | 455/6885 [10:18:11<6:06:15, 3.42s/it] 7%|▋ | 456/6885 [10:18:14<5:43:58, 3.21s/it] 7%|▋ | 457/6885 [10:18:16<5:22:13, 3.01s/it] 7%|▋ | 458/6885 [10:18:20<5:47:34, 3.24s/it] 7%|▋ | 459/6885 [10:18:23<5:25:04, 3.04s/it] 7%|▋ | 460/6885 [10:18:26<5:22:48, 3.01s/it] {'loss': 0.6619, 'grad_norm': 1.7349394887283642, 'learning_rate': 6.6618287373004365e-06, 'epoch': 0.07} 7%|▋ | 460/6885 [10:18:26<5:22:48, 3.01s/it] 7%|▋ | 461/6885 [10:18:28<5:08:35, 2.88s/it] 7%|▋ | 462/6885 [10:18:32<5:38:18, 3.16s/it] 7%|▋ | 463/6885 [10:18:35<5:37:43, 3.16s/it] 7%|▋ | 464/6885 [10:18:38<5:29:15, 3.08s/it] 7%|▋ | 465/6885 [10:18:41<5:22:18, 3.01s/it] 7%|▋ | 466/6885 [10:18:44<5:24:09, 3.03s/it] 7%|▋ | 467/6885 [10:18:47<5:20:49, 3.00s/it] 7%|▋ | 468/6885 [10:18:51<5:43:53, 3.22s/it] 7%|▋ | 469/6885 [10:18:54<6:02:47, 3.39s/it] 7%|▋ | 470/6885 [10:18:58<6:27:58, 3.63s/it] {'loss': 0.6692, 'grad_norm': 1.6385635232751372, 'learning_rate': 6.806966618287374e-06, 'epoch': 0.07} 7%|▋ | 470/6885 [10:18:59<6:27:58, 3.63s/it] 7%|▋ | 471/6885 
[10:19:01<5:48:31, 3.26s/it] 7%|▋ | 472/6885 [10:19:04<5:45:27, 3.23s/it] 7%|▋ | 473/6885 [10:19:07<5:43:36, 3.22s/it] 7%|▋ | 474/6885 [10:19:09<5:02:07, 2.83s/it] 7%|▋ | 475/6885 [10:19:12<5:03:47, 2.84s/it] 7%|▋ | 476/6885 [10:19:15<5:09:36, 2.90s/it] 7%|▋ | 477/6885 [10:19:19<5:47:09, 3.25s/it] 7%|▋ | 478/6885 [10:19:23<6:22:26, 3.58s/it] 7%|▋ | 479/6885 [10:19:26<5:54:41, 3.32s/it] 7%|▋ | 480/6885 [10:19:29<5:23:45, 3.03s/it] {'loss': 0.6484, 'grad_norm': 1.4945507177883908, 'learning_rate': 6.952104499274311e-06, 'epoch': 0.07} 7%|▋ | 480/6885 [10:19:29<5:23:45, 3.03s/it] 7%|▋ | 481/6885 [10:19:30<4:46:16, 2.68s/it] 7%|▋ | 482/6885 [10:19:33<4:47:09, 2.69s/it] 7%|▋ | 483/6885 [10:19:36<4:52:34, 2.74s/it] 7%|▋ | 484/6885 [10:19:39<4:58:34, 2.80s/it] 7%|▋ | 485/6885 [10:19:41<4:48:50, 2.71s/it] 7%|▋ | 486/6885 [10:19:44<4:35:39, 2.58s/it] 7%|▋ | 487/6885 [10:19:45<4:03:55, 2.29s/it] 7%|▋ | 488/6885 [10:19:47<3:53:52, 2.19s/it] 7%|▋ | 489/6885 [10:19:49<3:46:20, 2.12s/it] 7%|▋ | 490/6885 [10:19:54<5:03:10, 2.84s/it] {'loss': 0.657, 'grad_norm': 1.583857774726375, 'learning_rate': 7.097242380261249e-06, 'epoch': 0.07} 7%|▋ | 490/6885 [10:19:54<5:03:10, 2.84s/it] 7%|▋ | 491/6885 [10:19:58<5:52:52, 3.31s/it] 7%|▋ | 492/6885 [10:20:01<5:24:07, 3.04s/it] 7%|▋ | 493/6885 [10:20:03<5:01:21, 2.83s/it] 7%|▋ | 494/6885 [10:20:06<5:05:42, 2.87s/it] 7%|▋ | 495/6885 [10:20:09<5:26:10, 3.06s/it] 7%|▋ | 496/6885 [10:20:12<5:08:04, 2.89s/it] 7%|▋ | 497/6885 [10:20:14<4:42:11, 2.65s/it] 7%|▋ | 498/6885 [10:20:18<5:14:27, 2.95s/it] 7%|▋ | 499/6885 [10:20:20<5:00:06, 2.82s/it] 7%|▋ | 500/6885 [10:20:23<5:00:03, 2.82s/it] {'loss': 0.6601, 'grad_norm': 1.8780189334850588, 'learning_rate': 7.242380261248186e-06, 'epoch': 0.07} 7%|▋ | 500/6885 [10:20:23<5:00:03, 2.82s/it] 7%|▋ | 501/6885 [10:20:28<6:22:47, 3.60s/it] 7%|▋ | 502/6885 [10:20:31<5:55:05, 3.34s/it] 7%|▋ | 503/6885 [10:20:36<6:28:40, 3.65s/it] 7%|▋ | 504/6885 [10:20:39<6:11:54, 3.50s/it] 7%|▋ | 505/6885 [10:20:41<5:32:00, 
3.12s/it] 7%|▋ | 506/6885 [10:20:44<5:20:00, 3.01s/it] 7%|▋ | 507/6885 [10:20:46<4:45:24, 2.68s/it] 7%|▋ | 508/6885 [10:20:49<5:20:48, 3.02s/it] 7%|▋ | 509/6885 [10:20:52<5:10:39, 2.92s/it] 7%|▋ | 510/6885 [10:20:55<4:58:49, 2.81s/it] {'loss': 0.6542, 'grad_norm': 1.5153409007972507, 'learning_rate': 7.387518142235124e-06, 'epoch': 0.07} 7%|▋ | 510/6885 [10:20:55<4:58:49, 2.81s/it] 7%|▋ | 511/6885 [10:20:57<4:54:06, 2.77s/it] 7%|▋ | 512/6885 [10:21:00<4:39:53, 2.64s/it] 7%|▋ | 513/6885 [10:21:03<4:57:02, 2.80s/it] 7%|▋ | 514/6885 [10:21:05<4:32:22, 2.57s/it] 7%|▋ | 515/6885 [10:21:08<4:56:21, 2.79s/it] 7%|▋ | 516/6885 [10:21:10<4:38:54, 2.63s/it] 8%|▊ | 517/6885 [10:21:13<4:38:24, 2.62s/it] 8%|▊ | 518/6885 [10:21:15<4:16:28, 2.42s/it] 8%|▊ | 519/6885 [10:21:20<5:37:38, 3.18s/it] 8%|▊ | 520/6885 [10:21:23<5:33:57, 3.15s/it] {'loss': 0.6476, 'grad_norm': 1.5243833834622142, 'learning_rate': 7.532656023222062e-06, 'epoch': 0.08} 8%|▊ | 520/6885 [10:21:23<5:33:57, 3.15s/it] 8%|▊ | 521/6885 [10:21:25<5:13:48, 2.96s/it] 8%|▊ | 522/6885 [10:21:28<4:45:30, 2.69s/it] 8%|▊ | 523/6885 [10:21:32<5:34:43, 3.16s/it] 8%|▊ | 524/6885 [10:21:33<4:42:36, 2.67s/it] 8%|▊ | 525/6885 [10:21:36<4:34:19, 2.59s/it] 8%|▊ | 526/6885 [10:21:39<5:10:45, 2.93s/it] 8%|▊ | 527/6885 [10:21:42<4:51:36, 2.75s/it] 8%|▊ | 528/6885 [10:21:45<4:51:12, 2.75s/it] 8%|▊ | 529/6885 [10:21:48<5:24:00, 3.06s/it] 8%|▊ | 530/6885 [10:21:52<5:41:33, 3.22s/it] {'loss': 0.6451, 'grad_norm': 1.6429693792028686, 'learning_rate': 7.677793904208998e-06, 'epoch': 0.08} 8%|▊ | 530/6885 [10:21:52<5:41:33, 3.22s/it] 8%|▊ | 531/6885 [10:21:56<6:14:02, 3.53s/it] 8%|▊ | 532/6885 [10:21:58<5:15:43, 2.98s/it] 8%|▊ | 533/6885 [10:22:03<6:10:18, 3.50s/it] 8%|▊ | 534/6885 [10:22:06<6:18:07, 3.57s/it] 8%|▊ | 535/6885 [10:22:08<5:30:43, 3.12s/it] 8%|▊ | 536/6885 [10:22:12<5:34:08, 3.16s/it] 8%|▊ | 537/6885 [10:22:14<5:24:33, 3.07s/it] 8%|▊ | 538/6885 [10:22:17<4:52:14, 2.76s/it] 8%|▊ | 539/6885 [10:22:19<4:45:08, 2.70s/it] 8%|▊ | 
540/6885 [10:22:21<4:22:17, 2.48s/it] {'loss': 0.6527, 'grad_norm': 1.802860360098263, 'learning_rate': 7.822931785195936e-06, 'epoch': 0.08} 8%|▊ | 540/6885 [10:22:21<4:22:17, 2.48s/it] 8%|▊ | 541/6885 [10:22:24<4:46:39, 2.71s/it] 8%|▊ | 542/6885 [10:22:27<4:32:05, 2.57s/it] 8%|▊ | 543/6885 [10:22:30<4:48:54, 2.73s/it] 8%|▊ | 544/6885 [10:22:34<5:47:52, 3.29s/it] 8%|▊ | 545/6885 [10:22:40<6:57:00, 3.95s/it] 8%|▊ | 546/6885 [10:22:42<6:04:10, 3.45s/it] 8%|▊ | 547/6885 [10:22:45<5:49:30, 3.31s/it] 8%|▊ | 548/6885 [10:22:48<5:42:41, 3.24s/it] 8%|▊ | 549/6885 [10:22:51<5:25:47, 3.09s/it] 8%|▊ | 550/6885 [10:22:54<5:13:24, 2.97s/it] {'loss': 0.661, 'grad_norm': 1.6594363957156038, 'learning_rate': 7.968069666182874e-06, 'epoch': 0.08} 8%|▊ | 550/6885 [10:22:54<5:13:24, 2.97s/it] 8%|▊ | 551/6885 [10:22:56<5:02:26, 2.86s/it] 8%|▊ | 552/6885 [10:22:58<4:43:33, 2.69s/it] 8%|▊ | 553/6885 [10:23:04<6:05:35, 3.46s/it] 8%|▊ | 554/6885 [10:23:07<5:52:42, 3.34s/it] 8%|▊ | 555/6885 [10:23:09<5:15:13, 2.99s/it] 8%|▊ | 556/6885 [10:23:12<5:15:48, 2.99s/it] 8%|▊ | 557/6885 [10:23:15<5:33:34, 3.16s/it] 8%|▊ | 558/6885 [10:23:18<5:24:17, 3.08s/it] 8%|▊ | 559/6885 [10:23:22<5:29:51, 3.13s/it] 8%|▊ | 560/6885 [10:23:25<5:24:55, 3.08s/it] {'loss': 0.6547, 'grad_norm': 1.5938255936259151, 'learning_rate': 8.113207547169812e-06, 'epoch': 0.08} 8%|▊ | 560/6885 [10:23:25<5:24:55, 3.08s/it] 8%|▊ | 561/6885 [10:23:27<4:58:12, 2.83s/it] 8%|▊ | 562/6885 [10:23:29<4:43:50, 2.69s/it] 8%|▊ | 563/6885 [10:23:31<4:29:43, 2.56s/it] 8%|▊ | 564/6885 [10:23:35<4:53:53, 2.79s/it] 8%|▊ | 565/6885 [10:23:38<5:03:38, 2.88s/it] 8%|▊ | 566/6885 [10:23:41<5:24:03, 3.08s/it] 8%|▊ | 567/6885 [10:23:44<5:24:43, 3.08s/it] 8%|▊ | 568/6885 [10:23:50<6:29:18, 3.70s/it] 8%|▊ | 569/6885 [10:23:52<5:48:46, 3.31s/it] 8%|▊ | 570/6885 [10:23:55<5:42:28, 3.25s/it] {'loss': 0.6609, 'grad_norm': 1.3939924292770436, 'learning_rate': 8.25834542815675e-06, 'epoch': 0.08} 8%|▊ | 570/6885 [10:23:55<5:42:28, 3.25s/it] 8%|▊ | 
571/6885 [10:23:57<5:02:39, 2.88s/it] 8%|▊ | 572/6885 [10:24:01<5:19:36, 3.04s/it] 8%|▊ | 573/6885 [10:24:05<6:10:22, 3.52s/it] 8%|▊ | 574/6885 [10:24:07<5:29:50, 3.14s/it] 8%|▊ | 575/6885 [10:24:10<5:09:13, 2.94s/it] 8%|▊ | 576/6885 [10:24:12<4:37:06, 2.64s/it] 8%|▊ | 577/6885 [10:24:15<5:03:42, 2.89s/it] 8%|▊ | 578/6885 [10:24:18<5:06:25, 2.92s/it] 8%|▊ | 579/6885 [10:24:21<4:56:44, 2.82s/it] 8%|▊ | 580/6885 [10:24:24<5:01:09, 2.87s/it] {'loss': 0.6419, 'grad_norm': 1.5321796462771227, 'learning_rate': 8.403483309143687e-06, 'epoch': 0.08} 8%|▊ | 580/6885 [10:24:24<5:01:09, 2.87s/it] 8%|▊ | 581/6885 [10:24:28<5:55:27, 3.38s/it] 8%|▊ | 582/6885 [10:24:30<5:12:12, 2.97s/it] 8%|▊ | 583/6885 [10:24:32<4:37:55, 2.65s/it] 8%|▊ | 584/6885 [10:24:36<5:01:08, 2.87s/it] 8%|▊ | 585/6885 [10:24:39<5:16:21, 3.01s/it] 9%|▊ | 586/6885 [10:24:42<5:19:32, 3.04s/it] 9%|▊ | 587/6885 [10:24:44<4:55:34, 2.82s/it] 9%|▊ | 588/6885 [10:24:57<10:06:56, 5.78s/it] 9%|▊ | 589/6885 [10:25:00<8:27:17, 4.83s/it] 9%|▊ | 590/6885 [10:25:04<8:20:38, 4.77s/it] {'loss': 0.625, 'grad_norm': 1.5907007682060863, 'learning_rate': 8.548621190130625e-06, 'epoch': 0.09} 9%|▊ | 590/6885 [10:25:04<8:20:38, 4.77s/it] 9%|▊ | 591/6885 [10:25:07<7:06:14, 4.06s/it] 9%|▊ | 592/6885 [10:25:09<6:17:37, 3.60s/it] 9%|▊ | 593/6885 [10:25:14<6:36:08, 3.78s/it] 9%|▊ | 594/6885 [10:25:18<6:41:32, 3.83s/it] 9%|▊ | 595/6885 [10:25:21<6:18:17, 3.61s/it] 9%|▊ | 596/6885 [10:25:24<6:16:50, 3.60s/it] 9%|▊ | 597/6885 [10:25:26<5:27:07, 3.12s/it] 9%|▊ | 598/6885 [10:25:29<5:11:32, 2.97s/it] 9%|▊ | 599/6885 [10:25:32<5:31:34, 3.16s/it] 9%|▊ | 600/6885 [10:25:35<5:21:43, 3.07s/it] {'loss': 0.658, 'grad_norm': 1.6048966671231157, 'learning_rate': 8.693759071117563e-06, 'epoch': 0.09} 9%|▊ | 600/6885 [10:25:35<5:21:43, 3.07s/it] 9%|▊ | 601/6885 [10:25:38<5:20:37, 3.06s/it] 9%|▊ | 602/6885 [10:25:41<5:04:33, 2.91s/it] 9%|▉ | 603/6885 [10:25:45<5:34:41, 3.20s/it] 9%|▉ | 604/6885 [10:25:48<5:22:18, 3.08s/it] 9%|▉ | 605/6885 
[10:25:51<5:40:45, 3.26s/it] 9%|▉ | 606/6885 [10:25:55<6:02:49, 3.47s/it] 9%|▉ | 607/6885 [10:25:57<5:03:45, 2.90s/it] 9%|▉ | 608/6885 [10:26:00<5:01:09, 2.88s/it] 9%|▉ | 609/6885 [10:26:02<4:54:18, 2.81s/it] 9%|▉ | 610/6885 [10:26:05<4:48:56, 2.76s/it] {'loss': 0.6456, 'grad_norm': 1.457751877262412, 'learning_rate': 8.8388969521045e-06, 'epoch': 0.09} 9%|▉ | 610/6885 [10:26:05<4:48:56, 2.76s/it] 9%|▉ | 611/6885 [10:26:08<4:46:10, 2.74s/it] 9%|▉ | 612/6885 [10:26:09<4:17:04, 2.46s/it] 9%|▉ | 613/6885 [10:26:12<4:37:06, 2.65s/it] 9%|▉ | 614/6885 [10:26:16<4:50:07, 2.78s/it] 9%|▉ | 615/6885 [10:26:19<5:03:49, 2.91s/it] 9%|▉ | 616/6885 [10:26:21<4:46:49, 2.75s/it] 9%|▉ | 617/6885 [10:26:27<6:14:00, 3.58s/it] 9%|▉ | 618/6885 [10:26:29<5:40:12, 3.26s/it] 9%|▉ | 619/6885 [10:26:31<5:03:06, 2.90s/it] 9%|▉ | 620/6885 [10:26:34<4:56:13, 2.84s/it] {'loss': 0.6494, 'grad_norm': 1.3925725985786772, 'learning_rate': 8.984034833091438e-06, 'epoch': 0.09} 9%|▉ | 620/6885 [10:26:34<4:56:13, 2.84s/it] 9%|▉ | 621/6885 [10:26:37<4:49:03, 2.77s/it] 9%|▉ | 622/6885 [10:26:40<4:59:16, 2.87s/it] 9%|▉ | 623/6885 [10:26:42<4:50:55, 2.79s/it] 9%|▉ | 624/6885 [10:26:46<5:08:09, 2.95s/it] 9%|▉ | 625/6885 [10:26:48<4:52:55, 2.81s/it] 9%|▉ | 626/6885 [10:26:50<4:33:01, 2.62s/it] 9%|▉ | 627/6885 [10:26:53<4:45:46, 2.74s/it] 9%|▉ | 628/6885 [10:26:58<5:40:27, 3.26s/it] 9%|▉ | 629/6885 [10:27:00<5:16:42, 3.04s/it] 9%|▉ | 630/6885 [10:27:04<5:29:02, 3.16s/it] {'loss': 0.6604, 'grad_norm': 1.6476815627809678, 'learning_rate': 9.129172714078376e-06, 'epoch': 0.09} 9%|▉ | 630/6885 [10:27:04<5:29:02, 3.16s/it] 9%|▉ | 631/6885 [10:27:08<6:10:11, 3.55s/it] 9%|▉ | 632/6885 [10:27:11<5:36:19, 3.23s/it] 9%|▉ | 633/6885 [10:27:14<5:35:10, 3.22s/it] 9%|▉ | 634/6885 [10:27:19<6:42:41, 3.87s/it] 9%|▉ | 635/6885 [10:27:22<5:55:41, 3.41s/it] 9%|▉ | 636/6885 [10:27:26<6:34:45, 3.79s/it] 9%|▉ | 637/6885 [10:27:28<5:47:46, 3.34s/it] 9%|▉ | 638/6885 [10:27:33<6:19:30, 3.65s/it] 9%|▉ | 639/6885 [10:27:35<5:44:39, 
3.31s/it] 9%|▉ | 640/6885 [10:27:40<6:09:44, 3.55s/it] {'loss': 0.6462, 'grad_norm': 1.4844043302240553, 'learning_rate': 9.274310595065312e-06, 'epoch': 0.09} 9%|▉ | 640/6885 [10:27:40<6:09:44, 3.55s/it] 9%|▉ | 641/6885 [10:27:42<5:35:04, 3.22s/it] 9%|▉ | 642/6885 [10:27:45<5:17:11, 3.05s/it] 9%|▉ | 643/6885 [10:27:47<4:45:32, 2.74s/it] 9%|▉ | 644/6885 [10:27:52<6:15:04, 3.61s/it] 9%|▉ | 645/6885 [10:27:55<5:44:36, 3.31s/it] 9%|▉ | 646/6885 [10:27:57<5:22:04, 3.10s/it] 9%|▉ | 647/6885 [10:28:00<5:15:04, 3.03s/it] 9%|▉ | 648/6885 [10:28:04<5:20:40, 3.08s/it] 9%|▉ | 649/6885 [10:28:05<4:44:59, 2.74s/it] 9%|▉ | 650/6885 [10:28:07<4:20:52, 2.51s/it] {'loss': 0.6464, 'grad_norm': 1.5541257847812342, 'learning_rate': 9.41944847605225e-06, 'epoch': 0.09} 9%|▉ | 650/6885 [10:28:07<4:20:52, 2.51s/it] 9%|▉ | 651/6885 [10:28:10<4:16:52, 2.47s/it] 9%|▉ | 652/6885 [10:28:15<5:45:26, 3.33s/it] 9%|▉ | 653/6885 [10:28:18<5:20:48, 3.09s/it] 9%|▉ | 654/6885 [10:28:21<5:12:25, 3.01s/it] 10%|▉ | 655/6885 [10:28:23<5:03:24, 2.92s/it] 10%|▉ | 656/6885 [10:28:27<5:19:25, 3.08s/it] 10%|▉ | 657/6885 [10:28:29<4:45:11, 2.75s/it] 10%|▉ | 658/6885 [10:28:32<4:57:17, 2.86s/it] 10%|▉ | 659/6885 [10:28:34<4:39:01, 2.69s/it] 10%|▉ | 660/6885 [10:28:37<4:36:22, 2.66s/it] {'loss': 0.6471, 'grad_norm': 1.5339956751582804, 'learning_rate': 9.564586357039188e-06, 'epoch': 0.1} 10%|▉ | 660/6885 [10:28:37<4:36:22, 2.66s/it] 10%|▉ | 661/6885 [10:28:41<5:27:00, 3.15s/it] 10%|▉ | 662/6885 [10:28:44<5:19:55, 3.08s/it] 10%|▉ | 663/6885 [10:28:47<5:33:27, 3.22s/it] 10%|▉ | 664/6885 [10:28:50<5:07:20, 2.96s/it] 10%|▉ | 665/6885 [10:28:52<4:57:52, 2.87s/it] 10%|▉ | 666/6885 [10:28:55<4:43:38, 2.74s/it] 10%|▉ | 667/6885 [10:28:57<4:11:25, 2.43s/it] 10%|▉ | 668/6885 [10:28:59<4:19:38, 2.51s/it] 10%|▉ | 669/6885 [10:29:01<4:08:04, 2.39s/it] 10%|▉ | 670/6885 [10:29:03<3:47:14, 2.19s/it] {'loss': 0.6519, 'grad_norm': 1.550006983868159, 'learning_rate': 9.709724238026126e-06, 'epoch': 0.1} 10%|▉ | 670/6885 
[10:29:03<3:47:14, 2.19s/it] 10%|▉ | 671/6885 [10:29:06<3:55:51, 2.28s/it] 10%|▉ | 672/6885 [10:29:08<3:57:32, 2.29s/it] 10%|▉ | 673/6885 [10:29:11<4:26:59, 2.58s/it] 10%|▉ | 674/6885 [10:29:13<4:07:02, 2.39s/it] 10%|▉ | 675/6885 [10:29:16<4:33:30, 2.64s/it] 10%|▉ | 676/6885 [10:29:19<4:47:16, 2.78s/it] 10%|▉ | 677/6885 [10:29:23<5:08:02, 2.98s/it] 10%|▉ | 678/6885 [10:29:25<4:53:06, 2.83s/it] 10%|▉ | 679/6885 [10:29:28<4:32:04, 2.63s/it] 10%|▉ | 680/6885 [10:29:30<4:30:38, 2.62s/it] {'loss': 0.6508, 'grad_norm': 1.298622779401985, 'learning_rate': 9.854862119013063e-06, 'epoch': 0.1} 10%|▉ | 680/6885 [10:29:30<4:30:38, 2.62s/it] 10%|▉ | 681/6885 [10:29:32<4:07:27, 2.39s/it] 10%|▉ | 682/6885 [10:29:34<3:44:04, 2.17s/it] 10%|▉ | 683/6885 [10:29:36<4:05:03, 2.37s/it] 10%|▉ | 684/6885 [10:29:40<4:29:56, 2.61s/it] 10%|▉ | 685/6885 [10:29:43<4:41:10, 2.72s/it] 10%|▉ | 686/6885 [10:29:45<4:37:08, 2.68s/it] 10%|▉ | 687/6885 [10:29:47<4:21:20, 2.53s/it] 10%|▉ | 688/6885 [10:29:50<4:10:18, 2.42s/it] 10%|█ | 689/6885 [10:29:52<4:05:42, 2.38s/it] 10%|█ | 690/6885 [10:29:54<3:54:57, 2.28s/it] {'loss': 0.6483, 'grad_norm': 1.4545201677417376, 'learning_rate': 1e-05, 'epoch': 0.1} 10%|█ | 690/6885 [10:29:54<3:54:57, 2.28s/it] 10%|█ | 691/6885 [10:29:56<3:45:59, 2.19s/it] 10%|█ | 692/6885 [10:29:58<3:46:05, 2.19s/it] 10%|█ | 693/6885 [10:30:00<3:49:21, 2.22s/it] 10%|█ | 694/6885 [10:30:04<4:40:13, 2.72s/it] 10%|█ | 695/6885 [10:30:08<5:15:32, 3.06s/it] 10%|█ | 696/6885 [10:30:12<5:37:55, 3.28s/it] 10%|█ | 697/6885 [10:30:14<5:01:50, 2.93s/it] 10%|█ | 698/6885 [10:30:20<6:43:35, 3.91s/it] 10%|█ | 699/6885 [10:30:23<6:20:29, 3.69s/it] 10%|█ | 700/6885 [10:30:26<5:56:47, 3.46s/it] {'loss': 0.6517, 'grad_norm': 1.7514454450540817, 'learning_rate': 9.999935728859667e-06, 'epoch': 0.1} 10%|█ | 700/6885 [10:30:26<5:56:47, 3.46s/it] 10%|█ | 701/6885 [10:30:31<6:31:00, 3.79s/it] 10%|█ | 702/6885 [10:30:34<6:12:46, 3.62s/it] 10%|█ | 703/6885 [10:30:37<5:49:32, 3.39s/it] 10%|█ | 704/6885 
[10:30:40<5:52:40, 3.42s/it] 10%|█ | 705/6885 [10:30:42<5:04:38, 2.96s/it] 10%|█ | 706/6885 [10:30:45<4:59:48, 2.91s/it] 10%|█ | 707/6885 [10:30:47<4:41:23, 2.73s/it] 10%|█ | 708/6885 [10:30:50<4:39:18, 2.71s/it] 10%|█ | 709/6885 [10:30:53<4:45:56, 2.78s/it] 10%|█ | 710/6885 [10:30:56<4:46:25, 2.78s/it] {'loss': 0.6435, 'grad_norm': 1.3010290416328456, 'learning_rate': 9.999742917090981e-06, 'epoch': 0.1} 10%|█ | 710/6885 [10:30:56<4:46:25, 2.78s/it] 10%|█ | 711/6885 [10:30:59<4:53:34, 2.85s/it] 10%|█ | 712/6885 [10:31:01<4:24:45, 2.57s/it] 10%|█ | 713/6885 [10:31:03<4:18:22, 2.51s/it] 10%|█ | 714/6885 [10:31:06<4:26:55, 2.60s/it] 10%|█ | 715/6885 [10:31:08<4:16:35, 2.50s/it] 10%|█ | 716/6885 [10:31:11<4:36:53, 2.69s/it] 10%|█ | 717/6885 [10:31:13<4:14:27, 2.48s/it] 10%|█ | 718/6885 [10:31:16<4:21:25, 2.54s/it] 10%|█ | 719/6885 [10:31:19<4:38:44, 2.71s/it] 10%|█ | 720/6885 [10:31:22<4:41:22, 2.74s/it] {'loss': 0.6355, 'grad_norm': 1.5222737445349914, 'learning_rate': 9.999421569650833e-06, 'epoch': 0.1} 10%|█ | 720/6885 [10:31:22<4:41:22, 2.74s/it] 10%|█ | 721/6885 [10:31:25<4:51:02, 2.83s/it] 10%|█ | 722/6885 [10:31:28<4:52:54, 2.85s/it] 11%|█ | 723/6885 [10:31:30<4:38:44, 2.71s/it] 11%|█ | 724/6885 [10:31:34<5:21:31, 3.13s/it] 11%|█ | 725/6885 [10:31:38<5:37:26, 3.29s/it] 11%|█ | 726/6885 [10:31:41<5:20:38, 3.12s/it] 11%|█ | 727/6885 [10:31:43<4:51:00, 2.84s/it] 11%|█ | 728/6885 [10:31:45<4:28:20, 2.61s/it] 11%|█ | 729/6885 [10:31:48<4:50:27, 2.83s/it] 11%|█ | 730/6885 [10:31:52<5:07:44, 3.00s/it] {'loss': 0.6414, 'grad_norm': 1.5758824439402839, 'learning_rate': 9.99897169480057e-06, 'epoch': 0.11} 11%|█ | 730/6885 [10:31:52<5:07:44, 3.00s/it] 11%|█ | 731/6885 [10:31:54<4:57:10, 2.90s/it] 11%|█ | 732/6885 [10:31:56<4:30:58, 2.64s/it] 11%|█ | 733/6885 [10:31:59<4:39:26, 2.73s/it] 11%|█ | 734/6885 [10:32:02<4:28:36, 2.62s/it] 11%|█ | 735/6885 [10:32:05<4:42:25, 2.76s/it] 11%|█ | 736/6885 [10:32:08<4:58:11, 2.91s/it] 11%|█ | 737/6885 [10:32:11<4:55:34, 2.88s/it] 
11%|█ | 738/6885 [10:32:14<5:04:39, 2.97s/it] 11%|█ | 739/6885 [10:32:20<6:37:35, 3.88s/it] 11%|█ | 740/6885 [10:32:22<5:52:01, 3.44s/it] {'loss': 0.6416, 'grad_norm': 1.3245458819453462, 'learning_rate': 9.99839330410578e-06, 'epoch': 0.11} 11%|█ | 740/6885 [10:32:22<5:52:01, 3.44s/it] 11%|█ | 741/6885 [10:32:25<5:26:09, 3.19s/it] 11%|█ | 742/6885 [10:32:28<5:12:26, 3.05s/it] 11%|█ | 743/6885 [10:32:32<5:48:24, 3.40s/it] 11%|█ | 744/6885 [10:32:36<5:53:22, 3.45s/it] 11%|█ | 745/6885 [10:32:37<5:02:25, 2.96s/it] 11%|█ | 746/6885 [10:32:40<5:00:08, 2.93s/it] 11%|█ | 747/6885 [10:32:42<4:27:06, 2.61s/it] 11%|█ | 748/6885 [10:32:45<4:30:36, 2.65s/it] 11%|█ | 749/6885 [10:32:48<4:52:24, 2.86s/it] 11%|█ | 750/6885 [10:32:50<4:27:53, 2.62s/it] {'loss': 0.6381, 'grad_norm': 1.4753577499137038, 'learning_rate': 9.997686412435996e-06, 'epoch': 0.11} 11%|█ | 750/6885 [10:32:50<4:27:53, 2.62s/it] 11%|█ | 751/6885 [10:32:52<4:12:17, 2.47s/it] 11%|█ | 752/6885 [10:32:55<4:23:27, 2.58s/it] 11%|█ | 753/6885 [10:32:57<3:59:45, 2.35s/it] 11%|█ | 754/6885 [10:33:00<4:29:55, 2.64s/it] 11%|█ | 755/6885 [10:33:03<4:18:03, 2.53s/it] 11%|█ | 756/6885 [10:33:06<4:46:48, 2.81s/it] 11%|█ | 757/6885 [10:33:09<4:47:54, 2.82s/it] 11%|█ | 758/6885 [10:33:14<5:45:58, 3.39s/it] 11%|█ | 759/6885 [10:33:17<5:49:28, 3.42s/it] 11%|█ | 760/6885 [10:33:19<5:13:27, 3.07s/it] {'loss': 0.6369, 'grad_norm': 1.4578988593383, 'learning_rate': 9.99685103796431e-06, 'epoch': 0.11} 11%|█ | 760/6885 [10:33:19<5:13:27, 3.07s/it] 11%|█ | 761/6885 [10:33:21<4:37:52, 2.72s/it] 11%|█ | 762/6885 [10:33:24<4:47:06, 2.81s/it] 11%|█ | 763/6885 [10:33:26<4:15:21, 2.50s/it] 11%|█ | 764/6885 [10:33:29<4:13:48, 2.49s/it] 11%|█ | 765/6885 [10:33:33<5:18:00, 3.12s/it] 11%|█ | 766/6885 [10:33:36<5:13:53, 3.08s/it] 11%|█ | 767/6885 [10:33:38<4:40:14, 2.75s/it] 11%|█ | 768/6885 [10:33:41<4:39:34, 2.74s/it] 11%|█ | 769/6885 [10:33:44<4:38:51, 2.74s/it] 11%|█ | 770/6885 [10:33:49<6:13:44, 3.67s/it] {'loss': 0.6622, 'grad_norm': 
1.389881220599468, 'learning_rate': 9.99588720216691e-06, 'epoch': 0.11} 11%|█ | 770/6885 [10:33:49<6:13:44, 3.67s/it] 11%|█ | 771/6885 [10:33:53<6:22:44, 3.76s/it] 11%|█ | 772/6885 [10:33:56<5:39:06, 3.33s/it] 11%|█ | 773/6885 [10:33:58<5:03:34, 2.98s/it] 11%|█ | 774/6885 [10:34:00<4:35:30, 2.71s/it] 11%|█▏ | 775/6885 [10:34:02<4:16:27, 2.52s/it] 11%|█▏ | 776/6885 [10:34:05<4:23:37, 2.59s/it] 11%|█▏ | 777/6885 [10:34:07<4:19:49, 2.55s/it] 11%|█▏ | 778/6885 [10:34:10<4:38:26, 2.74s/it] 11%|█▏ | 779/6885 [10:34:13<4:24:50, 2.60s/it] 11%|█▏ | 780/6885 [10:34:16<4:49:32, 2.85s/it] {'loss': 0.6279, 'grad_norm': 1.2318560606230133, 'learning_rate': 9.994794929822527e-06, 'epoch': 0.11} 11%|█▏ | 780/6885 [10:34:16<4:49:32, 2.85s/it] 11%|█▏ | 781/6885 [10:34:21<5:52:29, 3.46s/it] 11%|█▏ | 782/6885 [10:34:24<5:22:56, 3.17s/it] 11%|█▏ | 783/6885 [10:34:26<5:08:52, 3.04s/it] 11%|█▏ | 784/6885 [10:34:29<5:09:11, 3.04s/it] 11%|█▏ | 785/6885 [10:34:32<4:58:32, 2.94s/it] 11%|█▏ | 786/6885 [10:34:36<5:19:03, 3.14s/it] 11%|█▏ | 787/6885 [10:34:38<4:51:17, 2.87s/it] 11%|█▏ | 788/6885 [10:34:40<4:39:30, 2.75s/it] 11%|█▏ | 789/6885 [10:34:42<4:18:30, 2.54s/it] 11%|█▏ | 790/6885 [10:34:45<4:23:06, 2.59s/it] {'loss': 0.641, 'grad_norm': 1.355472620629438, 'learning_rate': 9.993574249011797e-06, 'epoch': 0.11} 11%|█▏ | 790/6885 [10:34:45<4:23:06, 2.59s/it] 11%|█▏ | 791/6885 [10:34:48<4:36:11, 2.72s/it] 12%|█▏ | 792/6885 [10:34:53<5:34:17, 3.29s/it] 12%|█▏ | 793/6885 [10:34:56<5:29:11, 3.24s/it] 12%|█▏ | 794/6885 [10:34:57<4:40:37, 2.76s/it] 12%|█▏ | 795/6885 [10:35:00<4:28:24, 2.64s/it] 12%|█▏ | 796/6885 [10:35:03<4:49:11, 2.85s/it] 12%|█▏ | 797/6885 [10:35:09<6:05:40, 3.60s/it] 12%|█▏ | 798/6885 [10:35:11<5:35:35, 3.31s/it] 12%|█▏ | 799/6885 [10:35:14<5:34:22, 3.30s/it] 12%|█▏ | 800/6885 [10:35:17<5:07:26, 3.03s/it] {'loss': 0.6439, 'grad_norm': 1.4379602146139996, 'learning_rate': 9.992225191116538e-06, 'epoch': 0.12} 12%|█▏ | 800/6885 [10:35:17<5:07:26, 3.03s/it] 12%|█▏ | 801/6885 
[10:35:20<5:05:05, 3.01s/it] 12%|█▏ | 802/6885 [10:35:22<4:47:23, 2.83s/it] 12%|█▏ | 803/6885 [10:35:25<4:42:55, 2.79s/it] 12%|█▏ | 804/6885 [10:35:26<4:01:54, 2.39s/it] 12%|█▏ | 805/6885 [10:35:29<3:56:22, 2.33s/it] 12%|█▏ | 806/6885 [10:35:32<4:27:30, 2.64s/it] 12%|█▏ | 807/6885 [10:35:34<4:24:44, 2.61s/it] 12%|█▏ | 808/6885 [10:35:38<5:02:48, 2.99s/it] 12%|█▏ | 809/6885 [10:35:41<4:47:17, 2.84s/it] 12%|█▏ | 810/6885 [10:35:44<4:47:02, 2.83s/it] {'loss': 0.6457, 'grad_norm': 1.4777958226910466, 'learning_rate': 9.990747790818946e-06, 'epoch': 0.12} 12%|█▏ | 810/6885 [10:35:44<4:47:02, 2.83s/it] 12%|█▏ | 811/6885 [10:35:46<4:34:47, 2.71s/it] 12%|█▏ | 812/6885 [10:35:50<5:07:26, 3.04s/it] 12%|█▏ | 813/6885 [10:35:53<5:12:55, 3.09s/it] 12%|█▏ | 814/6885 [10:35:55<4:36:01, 2.73s/it] 12%|█▏ | 815/6885 [10:35:58<4:31:21, 2.68s/it] 12%|█▏ | 816/6885 [10:35:59<3:57:06, 2.34s/it] 12%|█▏ | 817/6885 [10:36:04<5:22:46, 3.19s/it] 12%|█▏ | 818/6885 [10:36:07<4:54:04, 2.91s/it] 12%|█▏ | 819/6885 [10:36:09<4:27:54, 2.65s/it] 12%|█▏ | 820/6885 [10:36:11<4:18:32, 2.56s/it] {'loss': 0.6483, 'grad_norm': 1.2895229336241503, 'learning_rate': 9.989142086100703e-06, 'epoch': 0.12} 12%|█▏ | 820/6885 [10:36:11<4:18:32, 2.56s/it] 12%|█▏ | 821/6885 [10:36:14<4:48:27, 2.85s/it] 12%|█▏ | 822/6885 [10:36:18<5:06:51, 3.04s/it] 12%|█▏ | 823/6885 [10:36:21<4:55:46, 2.93s/it] 12%|█▏ | 824/6885 [10:36:23<4:31:39, 2.69s/it] 12%|█▏ | 825/6885 [10:36:25<4:17:34, 2.55s/it] 12%|█▏ | 826/6885 [10:36:29<4:53:36, 2.91s/it] 12%|█▏ | 827/6885 [10:36:31<4:24:27, 2.62s/it] 12%|█▏ | 828/6885 [10:36:33<4:14:10, 2.52s/it] 12%|█▏ | 829/6885 [10:36:36<4:40:03, 2.77s/it] 12%|█▏ | 830/6885 [10:36:38<4:17:06, 2.55s/it] {'loss': 0.6509, 'grad_norm': 1.4811460587250382, 'learning_rate': 9.987408118241995e-06, 'epoch': 0.12} 12%|█▏ | 830/6885 [10:36:38<4:17:06, 2.55s/it] 12%|█▏ | 831/6885 [10:36:42<4:48:52, 2.86s/it] 12%|█▏ | 832/6885 [10:36:46<5:25:31, 3.23s/it] 12%|█▏ | 833/6885 [10:36:49<5:22:42, 3.20s/it] 12%|█▏ | 
834/6885 [10:36:52<5:11:35, 3.09s/it] 12%|█▏ | 835/6885 [10:36:54<4:40:27, 2.78s/it] 12%|█▏ | 836/6885 [10:36:57<4:45:34, 2.83s/it] 12%|█▏ | 837/6885 [10:36:59<4:19:43, 2.58s/it] 12%|█▏ | 838/6885 [10:37:02<4:47:40, 2.85s/it] 12%|█▏ | 839/6885 [10:37:05<4:27:09, 2.65s/it] 12%|█▏ | 840/6885 [10:37:09<5:09:31, 3.07s/it] {'loss': 0.6181, 'grad_norm': 1.3189208191268318, 'learning_rate': 9.985545931820463e-06, 'epoch': 0.12} 12%|█▏ | 840/6885 [10:37:09<5:09:31, 3.07s/it] 12%|█▏ | 841/6885 [10:37:11<5:00:16, 2.98s/it] 12%|█▏ | 842/6885 [10:37:14<4:59:51, 2.98s/it] 12%|█▏ | 843/6885 [10:37:18<5:05:34, 3.03s/it] 12%|█▏ | 844/6885 [10:37:21<5:02:51, 3.01s/it] 12%|█▏ | 845/6885 [10:37:27<6:41:09, 3.99s/it] 12%|█▏ | 846/6885 [10:37:30<6:09:54, 3.68s/it] 12%|█▏ | 847/6885 [10:37:34<6:26:39, 3.84s/it] 12%|█▏ | 848/6885 [10:37:37<6:02:45, 3.61s/it] 12%|█▏ | 849/6885 [10:37:40<5:33:05, 3.31s/it] 12%|█▏ | 850/6885 [10:37:47<7:20:14, 4.38s/it] {'loss': 0.6274, 'grad_norm': 1.3731300368595278, 'learning_rate': 9.983555574710043e-06, 'epoch': 0.12} 12%|█▏ | 850/6885 [10:37:47<7:20:14, 4.38s/it] 12%|█▏ | 851/6885 [10:37:49<6:16:43, 3.75s/it] 12%|█▏ | 852/6885 [10:37:52<5:59:22, 3.57s/it] 12%|█▏ | 853/6885 [10:37:54<5:07:31, 3.06s/it] 12%|█▏ | 854/6885 [10:37:57<5:08:06, 3.07s/it] 12%|█▏ | 855/6885 [10:38:00<5:14:16, 3.13s/it] 12%|█▏ | 856/6885 [10:38:03<4:58:50, 2.97s/it] 12%|█▏ | 857/6885 [10:38:05<4:36:00, 2.75s/it] 12%|█▏ | 858/6885 [10:38:08<4:40:02, 2.79s/it] 12%|█▏ | 859/6885 [10:38:13<5:37:59, 3.37s/it] 12%|█▏ | 860/6885 [10:38:16<5:37:44, 3.36s/it] {'loss': 0.6398, 'grad_norm': 1.4055775942483093, 'learning_rate': 9.981437098079743e-06, 'epoch': 0.12} 12%|█▏ | 860/6885 [10:38:16<5:37:44, 3.36s/it] 13%|█▎ | 861/6885 [10:38:18<4:54:09, 2.93s/it] 13%|█▎ | 862/6885 [10:38:20<4:20:45, 2.60s/it] 13%|█▎ | 863/6885 [10:38:21<3:49:35, 2.29s/it] 13%|█▎ | 864/6885 [10:38:24<3:51:15, 2.30s/it] 13%|█▎ | 865/6885 [10:38:26<3:40:50, 2.20s/it] 13%|█▎ | 866/6885 [10:38:28<3:51:42, 2.31s/it] 
13%|█▎ | 867/6885 [10:38:33<5:05:00, 3.04s/it] 13%|█▎ | 868/6885 [10:38:35<4:50:44, 2.90s/it] 13%|█▎ | 869/6885 [10:38:38<4:26:48, 2.66s/it] 13%|█▎ | 870/6885 [10:38:40<4:04:39, 2.44s/it] {'loss': 0.6393, 'grad_norm': 1.3307192435974602, 'learning_rate': 9.979190556392326e-06, 'epoch': 0.13} 13%|█▎ | 870/6885 [10:38:40<4:04:39, 2.44s/it] 13%|█▎ | 871/6885 [10:38:43<4:34:33, 2.74s/it] 13%|█▎ | 872/6885 [10:38:45<4:08:05, 2.48s/it] 13%|█▎ | 873/6885 [10:38:47<4:12:52, 2.52s/it] 13%|█▎ | 874/6885 [10:38:52<5:06:51, 3.06s/it] 13%|█▎ | 875/6885 [10:38:55<5:07:27, 3.07s/it] 13%|█▎ | 876/6885 [10:38:58<4:58:11, 2.98s/it] 13%|█▎ | 877/6885 [10:39:00<4:25:55, 2.66s/it] 13%|█▎ | 878/6885 [10:39:02<4:11:03, 2.51s/it] 13%|█▎ | 879/6885 [10:39:05<4:22:30, 2.62s/it] 13%|█▎ | 880/6885 [10:39:07<4:13:57, 2.54s/it] {'loss': 0.6456, 'grad_norm': 1.5622917958142868, 'learning_rate': 9.976816007402912e-06, 'epoch': 0.13} 13%|█▎ | 880/6885 [10:39:07<4:13:57, 2.54s/it] 13%|█▎ | 881/6885 [10:39:11<4:47:46, 2.88s/it] 13%|█▎ | 882/6885 [10:39:13<4:35:28, 2.75s/it] 13%|█▎ | 883/6885 [10:39:15<4:07:47, 2.48s/it] 13%|█▎ | 884/6885 [10:39:18<4:39:35, 2.80s/it] 13%|█▎ | 885/6885 [10:39:21<4:39:00, 2.79s/it] 13%|█▎ | 886/6885 [10:39:25<5:15:05, 3.15s/it] 13%|█▎ | 887/6885 [10:39:28<4:52:12, 2.92s/it] 13%|█▎ | 888/6885 [10:39:30<4:48:45, 2.89s/it] 13%|█▎ | 889/6885 [10:39:33<4:29:10, 2.69s/it] 13%|█▎ | 890/6885 [10:39:36<4:45:13, 2.85s/it] {'loss': 0.6288, 'grad_norm': 1.390636406480548, 'learning_rate': 9.974313512157488e-06, 'epoch': 0.13} 13%|█▎ | 890/6885 [10:39:36<4:45:13, 2.85s/it] 13%|█▎ | 891/6885 [10:39:39<4:40:58, 2.81s/it] 13%|█▎ | 892/6885 [10:39:42<5:06:35, 3.07s/it] 13%|█▎ | 893/6885 [10:39:47<5:48:31, 3.49s/it] 13%|█▎ | 894/6885 [10:39:49<5:15:19, 3.16s/it] 13%|█▎ | 895/6885 [10:39:52<5:05:34, 3.06s/it] 13%|█▎ | 896/6885 [10:39:55<5:14:01, 3.15s/it] 13%|█▎ | 897/6885 [10:39:58<5:13:26, 3.14s/it] 13%|█▎ | 898/6885 [10:40:07<8:11:10, 4.92s/it] 13%|█▎ | 899/6885 [10:40:11<7:31:23, 
4.52s/it] 13%|█▎ | 900/6885 [10:40:13<6:23:39, 3.85s/it] {'loss': 0.6266, 'grad_norm': 1.4427250843896926, 'learning_rate': 9.971683134991344e-06, 'epoch': 0.13} 13%|█▎ | 900/6885 [10:40:13<6:23:39, 3.85s/it] 13%|█▎ | 901/6885 [10:40:17<6:15:58, 3.77s/it] 13%|█▎ | 902/6885 [10:40:20<5:55:33, 3.57s/it] 13%|█▎ | 903/6885 [10:40:23<5:37:11, 3.38s/it] 13%|█▎ | 904/6885 [10:40:26<5:28:40, 3.30s/it] 13%|█▎ | 905/6885 [10:40:29<5:31:35, 3.33s/it] 13%|█▎ | 906/6885 [10:40:31<4:50:54, 2.92s/it] 13%|█▎ | 907/6885 [10:40:35<4:58:15, 2.99s/it] 13%|█▎ | 908/6885 [10:40:38<5:07:40, 3.09s/it] 13%|█▎ | 909/6885 [10:40:41<5:21:09, 3.22s/it] 13%|█▎ | 910/6885 [10:40:43<4:39:24, 2.81s/it] {'loss': 0.6411, 'grad_norm': 1.4098179198178282, 'learning_rate': 9.968924943527418e-06, 'epoch': 0.13} 13%|█▎ | 910/6885 [10:40:43<4:39:24, 2.81s/it] 13%|█▎ | 911/6885 [10:40:46<4:46:57, 2.88s/it] 13%|█▎ | 912/6885 [10:40:49<4:32:45, 2.74s/it] 13%|█▎ | 913/6885 [10:40:51<4:14:26, 2.56s/it] 13%|█▎ | 914/6885 [10:40:54<4:28:53, 2.70s/it] 13%|█▎ | 915/6885 [10:40:57<4:34:25, 2.76s/it] 13%|█▎ | 916/6885 [10:41:00<4:53:33, 2.95s/it] 13%|█▎ | 917/6885 [10:41:03<4:47:50, 2.89s/it] 13%|█▎ | 918/6885 [10:41:05<4:24:14, 2.66s/it] 13%|█▎ | 919/6885 [10:41:07<4:16:46, 2.58s/it] 13%|█▎ | 920/6885 [10:41:11<4:53:16, 2.95s/it] {'loss': 0.6315, 'grad_norm': 1.4962238363929918, 'learning_rate': 9.96603900867455e-06, 'epoch': 0.13} 13%|█▎ | 920/6885 [10:41:11<4:53:16, 2.95s/it] 13%|█▎ | 921/6885 [10:41:15<5:01:16, 3.03s/it] 13%|█▎ | 922/6885 [10:41:18<5:00:50, 3.03s/it] 13%|█▎ | 923/6885 [10:41:21<5:10:01, 3.12s/it] 13%|█▎ | 924/6885 [10:41:26<5:56:36, 3.59s/it] 13%|█▎ | 925/6885 [10:41:28<5:25:16, 3.27s/it] 13%|█▎ | 926/6885 [10:41:31<5:08:42, 3.11s/it] 13%|█▎ | 927/6885 [10:41:33<4:31:56, 2.74s/it] 13%|█▎ | 928/6885 [10:41:36<4:41:47, 2.84s/it] 13%|█▎ | 929/6885 [10:41:38<4:10:51, 2.53s/it] 14%|█▎ | 930/6885 [10:41:44<5:57:52, 3.61s/it] {'loss': 0.6423, 'grad_norm': 1.3209044251278015, 'learning_rate': 
9.963025404625673e-06, 'epoch': 0.14} 14%|█▎ | 930/6885 [10:41:44<5:57:52, 3.61s/it] 14%|█▎ | 931/6885 [10:41:46<5:13:05, 3.16s/it] 14%|█▎ | 932/6885 [10:41:49<5:19:50, 3.22s/it] 14%|█▎ | 933/6885 [10:41:52<5:00:53, 3.03s/it] 14%|█▎ | 934/6885 [10:41:55<5:21:22, 3.24s/it] 14%|█▎ | 935/6885 [10:41:58<4:51:24, 2.94s/it] 14%|█▎ | 936/6885 [10:42:00<4:40:01, 2.82s/it] 14%|█▎ | 937/6885 [10:42:03<4:34:41, 2.77s/it] 14%|█▎ | 938/6885 [10:42:05<4:21:29, 2.64s/it] 14%|█▎ | 939/6885 [10:42:10<5:11:27, 3.14s/it] 14%|█▎ | 940/6885 [10:42:13<5:08:54, 3.12s/it] {'loss': 0.6361, 'grad_norm': 1.39955503516968, 'learning_rate': 9.959884208855893e-06, 'epoch': 0.14} 14%|█▎ | 940/6885 [10:42:13<5:08:54, 3.12s/it] 14%|█▎ | 941/6885 [10:42:15<4:56:02, 2.99s/it] 14%|█▎ | 942/6885 [10:42:18<5:01:14, 3.04s/it] 14%|█▎ | 943/6885 [10:42:22<5:03:24, 3.06s/it] 14%|█▎ | 944/6885 [10:42:24<4:55:15, 2.98s/it] 14%|█▎ | 945/6885 [10:42:27<4:47:57, 2.91s/it] 14%|█▎ | 946/6885 [10:42:29<4:20:32, 2.63s/it] 14%|█▍ | 947/6885 [10:42:31<4:12:12, 2.55s/it] 14%|█▍ | 948/6885 [10:42:34<4:05:31, 2.48s/it] 14%|█▍ | 949/6885 [10:42:36<3:50:20, 2.33s/it] 14%|█▍ | 950/6885 [10:42:37<3:32:34, 2.15s/it] {'loss': 0.6241, 'grad_norm': 1.5348970475105241, 'learning_rate': 9.956615502120504e-06, 'epoch': 0.14} 14%|█▍ | 950/6885 [10:42:37<3:32:34, 2.15s/it] 14%|█▍ | 951/6885 [10:42:39<3:27:51, 2.10s/it] 14%|█▍ | 952/6885 [10:42:41<3:20:17, 2.03s/it] 14%|█▍ | 953/6885 [10:42:43<3:17:16, 2.00s/it] 14%|█▍ | 954/6885 [10:42:46<3:39:45, 2.22s/it] 14%|█▍ | 955/6885 [10:42:49<4:06:01, 2.49s/it] 14%|█▍ | 956/6885 [10:42:52<4:17:12, 2.60s/it] 14%|█▍ | 957/6885 [10:42:56<4:58:02, 3.02s/it] 14%|█▍ | 958/6885 [10:42:58<4:28:40, 2.72s/it] 14%|█▍ | 959/6885 [10:43:01<4:39:14, 2.83s/it] 14%|█▍ | 960/6885 [10:43:03<4:08:03, 2.51s/it] {'loss': 0.631, 'grad_norm': 1.48874630945738, 'learning_rate': 9.953219368452908e-06, 'epoch': 0.14} 14%|█▍ | 960/6885 [10:43:03<4:08:03, 2.51s/it] 14%|█▍ | 961/6885 [10:43:06<4:15:22, 2.59s/it] 14%|█▍ 
| 962/6885 [10:43:12<6:19:26, 3.84s/it] 14%|█▍ | 963/6885 [10:43:15<5:43:16, 3.48s/it] 14%|█▍ | 964/6885 [10:43:17<5:09:01, 3.13s/it] 14%|█▍ | 965/6885 [10:43:20<4:42:16, 2.86s/it] 14%|█▍ | 966/6885 [10:43:21<4:14:31, 2.58s/it] 14%|█▍ | 967/6885 [10:43:24<4:14:45, 2.58s/it] 14%|█▍ | 968/6885 [10:43:27<4:28:04, 2.72s/it] 14%|█▍ | 969/6885 [10:43:31<4:53:10, 2.97s/it] 14%|█▍ | 970/6885 [10:43:34<5:05:32, 3.10s/it] {'loss': 0.627, 'grad_norm': 1.310857282598366, 'learning_rate': 9.949695895162464e-06, 'epoch': 0.14} 14%|█▍ | 970/6885 [10:43:34<5:05:32, 3.10s/it] 14%|█▍ | 971/6885 [10:43:37<4:49:17, 2.93s/it] 14%|█▍ | 972/6885 [10:43:40<4:54:57, 2.99s/it] 14%|█▍ | 973/6885 [10:43:43<4:59:04, 3.04s/it] 14%|█▍ | 974/6885 [10:43:46<4:52:37, 2.97s/it] 14%|█▍ | 975/6885 [10:43:50<5:26:33, 3.32s/it] 14%|█▍ | 976/6885 [10:43:54<5:39:34, 3.45s/it] 14%|█▍ | 977/6885 [10:43:56<5:04:32, 3.09s/it] 14%|█▍ | 978/6885 [10:43:58<4:35:13, 2.80s/it] 14%|█▍ | 979/6885 [10:44:01<4:52:20, 2.97s/it] 14%|█▍ | 980/6885 [10:44:03<4:24:32, 2.69s/it] {'loss': 0.6387, 'grad_norm': 1.3619342578169393, 'learning_rate': 9.946045172832224e-06, 'epoch': 0.14} 14%|█▍ | 980/6885 [10:44:03<4:24:32, 2.69s/it] 14%|█▍ | 981/6885 [10:44:06<4:34:46, 2.79s/it] 14%|█▍ | 982/6885 [10:44:10<4:59:32, 3.04s/it] 14%|█▍ | 983/6885 [10:44:14<5:15:57, 3.21s/it] 14%|█▍ | 984/6885 [10:44:16<4:40:59, 2.86s/it] 14%|█▍ | 985/6885 [10:44:18<4:26:53, 2.71s/it] 14%|█▍ | 986/6885 [10:44:20<4:10:55, 2.55s/it] 14%|█▍ | 987/6885 [10:44:22<3:58:50, 2.43s/it] 14%|█▍ | 988/6885 [10:44:25<3:59:56, 2.44s/it] 14%|█▍ | 989/6885 [10:44:27<3:49:47, 2.34s/it] 14%|█▍ | 990/6885 [10:44:29<3:36:29, 2.20s/it] {'loss': 0.6331, 'grad_norm': 1.4936986486504984, 'learning_rate': 9.942267295316625e-06, 'epoch': 0.14} 14%|█▍ | 990/6885 [10:44:29<3:36:29, 2.20s/it] 14%|█▍ | 991/6885 [10:44:33<4:23:18, 2.68s/it] 14%|█▍ | 992/6885 [10:44:35<4:10:13, 2.55s/it] 14%|█▍ | 993/6885 [10:44:38<4:14:41, 2.59s/it] 14%|█▍ | 994/6885 [10:44:40<3:59:28, 2.44s/it] 
14%|█▍ | 995/6885 [10:44:43<4:25:11, 2.70s/it] 14%|█▍ | 996/6885 [10:44:45<4:00:26, 2.45s/it] 14%|█▍ | 997/6885 [10:44:47<3:54:08, 2.39s/it] 14%|█▍ | 998/6885 [10:44:51<4:26:57, 2.72s/it] 15%|█▍ | 999/6885 [10:44:55<5:18:35, 3.25s/it] 15%|█▍ | 1000/6885 [10:44:57<4:45:43, 2.91s/it] {'loss': 0.626, 'grad_norm': 1.32511584393411, 'learning_rate': 9.938362359739068e-06, 'epoch': 0.15} 15%|█▍ | 1000/6885 [10:44:57<4:45:43, 2.91s/it] 15%|█▍ | 1001/6885 [10:44:59<4:13:53, 2.59s/it] 15%|█▍ | 1002/6885 [10:45:01<4:04:53, 2.50s/it] 15%|█▍ | 1003/6885 [10:45:05<4:41:47, 2.87s/it] 15%|█▍ | 1004/6885 [10:45:08<4:59:03, 3.05s/it] 15%|█▍ | 1005/6885 [10:45:10<4:20:35, 2.66s/it] 15%|█▍ | 1006/6885 [10:45:13<4:33:35, 2.79s/it] 15%|█▍ | 1007/6885 [10:45:19<5:48:57, 3.56s/it] 15%|█▍ | 1008/6885 [10:45:23<6:07:08, 3.75s/it] 15%|█▍ | 1009/6885 [10:45:26<5:59:22, 3.67s/it] 15%|█▍ | 1010/6885 [10:45:29<5:32:26, 3.40s/it] {'loss': 0.6451, 'grad_norm': 1.3291454266011833, 'learning_rate': 9.934330466489414e-06, 'epoch': 0.15} 15%|█▍ | 1010/6885 [10:45:29<5:32:26, 3.40s/it] 15%|█▍ | 1011/6885 [10:45:32<5:12:07, 3.19s/it] 15%|█▍ | 1012/6885 [10:45:34<4:46:39, 2.93s/it] 15%|█▍ | 1013/6885 [10:45:38<5:17:37, 3.25s/it] 15%|█▍ | 1014/6885 [10:45:41<5:15:54, 3.23s/it] 15%|█▍ | 1015/6885 [10:45:43<4:39:57, 2.86s/it] 15%|█▍ | 1016/6885 [10:45:46<4:46:18, 2.93s/it] 15%|█▍ | 1017/6885 [10:45:49<4:24:43, 2.71s/it] 15%|█▍ | 1018/6885 [10:45:51<4:12:54, 2.59s/it] 15%|█▍ | 1019/6885 [10:45:53<4:12:07, 2.58s/it] 15%|█▍ | 1020/6885 [10:45:57<4:51:55, 2.99s/it] {'loss': 0.6333, 'grad_norm': 1.3289648153139675, 'learning_rate': 9.930171719221418e-06, 'epoch': 0.15} 15%|█▍ | 1020/6885 [10:45:57<4:51:55, 2.99s/it] 15%|█▍ | 1021/6885 [10:46:01<5:23:14, 3.31s/it] 15%|█▍ | 1022/6885 [10:46:05<5:36:34, 3.44s/it] 15%|█▍ | 1023/6885 [10:46:08<5:23:33, 3.31s/it] 15%|█▍ | 1024/6885 [10:46:12<5:49:53, 3.58s/it] 15%|█▍ | 1025/6885 [10:46:16<5:35:07, 3.43s/it] 15%|█▍ | 1026/6885 [10:46:17<4:50:58, 2.98s/it] 15%|█▍ | 
1027/6885 [10:46:20<4:33:22, 2.80s/it] 15%|█▍ | 1028/6885 [10:46:23<4:54:44, 3.02s/it] 15%|█▍ | 1029/6885 [10:46:25<4:15:15, 2.62s/it] 15%|█▍ | 1030/6885 [10:46:27<4:04:25, 2.50s/it] {'loss': 0.6329, 'grad_norm': 1.3388955314518605, 'learning_rate': 9.925886224850047e-06, 'epoch': 0.15} 15%|█▍ | 1030/6885 [10:46:27<4:04:25, 2.50s/it] 15%|█▍ | 1031/6885 [10:46:30<4:06:26, 2.53s/it] 15%|█▍ | 1032/6885 [10:46:32<4:10:05, 2.56s/it] 15%|█▌ | 1033/6885 [10:46:34<3:48:26, 2.34s/it] 15%|█▌ | 1034/6885 [10:46:37<4:07:20, 2.54s/it] 15%|█▌ | 1035/6885 [10:46:39<3:49:40, 2.36s/it] 15%|█▌ | 1036/6885 [10:47:00<12:36:46, 7.76s/it] 15%|█▌ | 1037/6885 [10:47:04<10:50:37, 6.68s/it] 15%|█▌ | 1038/6885 [10:47:07<9:07:15, 5.62s/it] 15%|█▌ | 1039/6885 [10:47:09<7:37:25, 4.69s/it] 15%|█▌ | 1040/6885 [10:47:12<6:31:52, 4.02s/it] {'loss': 0.6308, 'grad_norm': 1.3788458990043229, 'learning_rate': 9.921474093548748e-06, 'epoch': 0.15} 15%|█▌ | 1040/6885 [10:47:12<6:31:52, 4.02s/it] 15%|█▌ | 1041/6885 [10:47:14<5:28:08, 3.37s/it] 15%|█▌ | 1042/6885 [10:47:16<4:59:39, 3.08s/it] 15%|█▌ | 1043/6885 [10:47:19<5:02:52, 3.11s/it] 15%|█▌ | 1044/6885 [10:47:22<4:57:54, 3.06s/it] 15%|█▌ | 1045/6885 [10:47:25<4:46:47, 2.95s/it] 15%|█▌ | 1046/6885 [10:47:30<5:50:45, 3.60s/it] 15%|█▌ | 1047/6885 [10:47:32<5:13:51, 3.23s/it] 15%|█▌ | 1048/6885 [10:47:34<4:34:16, 2.82s/it] 15%|█▌ | 1049/6885 [10:47:37<4:36:23, 2.84s/it] 15%|█▌ | 1050/6885 [10:47:39<4:20:35, 2.68s/it] {'loss': 0.6366, 'grad_norm': 1.2630947233952987, 'learning_rate': 9.916935438746604e-06, 'epoch': 0.15} 15%|█▌ | 1050/6885 [10:47:40<4:20:35, 2.68s/it] 15%|█▌ | 1051/6885 [10:47:43<4:40:16, 2.88s/it] 15%|█▌ | 1052/6885 [10:47:45<4:22:29, 2.70s/it] 15%|█▌ | 1053/6885 [10:47:48<4:15:34, 2.63s/it] 15%|█▌ | 1054/6885 [10:47:52<4:57:22, 3.06s/it] 15%|█▌ | 1055/6885 [10:47:56<5:40:14, 3.50s/it] 15%|█▌ | 1056/6885 [10:47:59<5:29:42, 3.39s/it] 15%|█▌ | 1057/6885 [10:48:03<5:52:10, 3.63s/it] 15%|█▌ | 1058/6885 [10:48:06<5:16:33, 3.26s/it] 15%|█▌ | 
1059/6885 [10:48:08<4:51:20, 3.00s/it] 15%|█▌ | 1060/6885 [10:48:12<5:13:52, 3.23s/it] {'loss': 0.6224, 'grad_norm': 1.2586848110727198, 'learning_rate': 9.912270377125424e-06, 'epoch': 0.15} 15%|█▌ | 1060/6885 [10:48:12<5:13:52, 3.23s/it] 15%|█▌ | 1061/6885 [10:48:14<4:46:37, 2.95s/it] 15%|█▌ | 1062/6885 [10:48:18<5:02:46, 3.12s/it] 15%|█▌ | 1063/6885 [10:48:20<4:20:04, 2.68s/it] 15%|█▌ | 1064/6885 [10:48:23<4:31:24, 2.80s/it] 15%|█▌ | 1065/6885 [10:48:25<4:28:25, 2.77s/it] 15%|█▌ | 1066/6885 [10:48:28<4:34:27, 2.83s/it] 15%|█▌ | 1067/6885 [10:48:31<4:44:56, 2.94s/it] 16%|█▌ | 1068/6885 [10:48:34<4:37:30, 2.86s/it] 16%|█▌ | 1069/6885 [10:48:37<4:45:47, 2.95s/it] 16%|█▌ | 1070/6885 [10:48:40<4:38:34, 2.87s/it] {'loss': 0.6261, 'grad_norm': 1.5648142512317709, 'learning_rate': 9.90747902861674e-06, 'epoch': 0.16} 16%|█▌ | 1070/6885 [10:48:40<4:38:34, 2.87s/it] 16%|█▌ | 1071/6885 [10:48:42<4:17:15, 2.65s/it] 16%|█▌ | 1072/6885 [10:48:44<3:54:02, 2.42s/it] 16%|█▌ | 1073/6885 [10:48:46<3:48:37, 2.36s/it] 16%|█▌ | 1074/6885 [10:48:48<3:42:10, 2.29s/it] 16%|█▌ | 1075/6885 [10:48:51<3:49:35, 2.37s/it] 16%|█▌ | 1076/6885 [10:48:54<4:01:44, 2.50s/it] 16%|█▌ | 1077/6885 [10:48:56<3:57:03, 2.45s/it] 16%|█▌ | 1078/6885 [10:49:00<4:26:25, 2.75s/it] 16%|█▌ | 1079/6885 [10:49:03<5:01:36, 3.12s/it] 16%|█▌ | 1080/6885 [10:49:05<4:21:41, 2.70s/it] {'loss': 0.6207, 'grad_norm': 1.477705850244199, 'learning_rate': 9.902561516398723e-06, 'epoch': 0.16} 16%|█▌ | 1080/6885 [10:49:05<4:21:41, 2.70s/it] 16%|█▌ | 1081/6885 [10:49:07<4:01:11, 2.49s/it] 16%|█▌ | 1082/6885 [10:49:11<4:35:45, 2.85s/it] 16%|█▌ | 1083/6885 [10:49:14<4:36:08, 2.86s/it] 16%|█▌ | 1084/6885 [10:49:17<4:36:36, 2.86s/it] 16%|█▌ | 1085/6885 [10:49:19<4:20:02, 2.69s/it] 16%|█▌ | 1086/6885 [10:49:22<4:30:37, 2.80s/it] 16%|█▌ | 1087/6885 [10:49:24<4:13:31, 2.62s/it] 16%|█▌ | 1088/6885 [10:49:27<4:25:48, 2.75s/it] 16%|█▌ | 1089/6885 [10:49:32<5:11:37, 3.23s/it] 16%|█▌ | 1090/6885 [10:49:34<4:59:20, 3.10s/it] {'loss': 
0.6218, 'grad_norm': 1.2950681154644361, 'learning_rate': 9.897517966893023e-06, 'epoch': 0.16} 16%|█▌ | 1090/6885 [10:49:34<4:59:20, 3.10s/it] 16%|█▌ | 1091/6885 [10:49:36<4:19:43, 2.69s/it] 16%|█▌ | 1092/6885 [10:49:38<3:47:27, 2.36s/it] 16%|█▌ | 1093/6885 [10:49:40<3:48:11, 2.36s/it] 16%|█▌ | 1094/6885 [10:49:44<4:42:12, 2.92s/it] 16%|█▌ | 1095/6885 [10:49:48<5:07:32, 3.19s/it] 16%|█▌ | 1096/6885 [10:49:50<4:40:18, 2.91s/it] 16%|█▌ | 1097/6885 [10:49:54<4:57:21, 3.08s/it] 16%|█▌ | 1098/6885 [10:49:57<5:02:24, 3.14s/it] 16%|█▌ | 1099/6885 [10:50:00<4:51:18, 3.02s/it] 16%|█▌ | 1100/6885 [10:50:02<4:27:08, 2.77s/it] {'loss': 0.6237, 'grad_norm': 1.4613516139089748, 'learning_rate': 9.892348509761509e-06, 'epoch': 0.16} 16%|█▌ | 1100/6885 [10:50:02<4:27:08, 2.77s/it] 16%|█▌ | 1101/6885 [10:50:05<4:23:38, 2.73s/it] 16%|█▌ | 1102/6885 [10:50:09<5:02:57, 3.14s/it] 16%|█▌ | 1103/6885 [10:50:13<5:21:52, 3.34s/it] 16%|█▌ | 1104/6885 [10:50:16<5:30:23, 3.43s/it] 16%|█▌ | 1105/6885 [10:50:19<5:16:47, 3.29s/it] 16%|█▌ | 1106/6885 [10:50:21<4:47:03, 2.98s/it] 16%|█▌ | 1107/6885 [10:50:25<4:58:01, 3.09s/it] 16%|█▌ | 1108/6885 [10:50:30<5:48:48, 3.62s/it] 16%|█▌ | 1109/6885 [10:50:32<5:15:54, 3.28s/it] 16%|█▌ | 1110/6885 [10:50:36<5:37:08, 3.50s/it] {'loss': 0.6425, 'grad_norm': 1.2641419484176866, 'learning_rate': 9.887053277902943e-06, 'epoch': 0.16} 16%|█▌ | 1110/6885 [10:50:36<5:37:08, 3.50s/it] 16%|█▌ | 1111/6885 [10:50:40<5:54:29, 3.68s/it] 16%|█▌ | 1112/6885 [10:50:44<5:51:19, 3.65s/it] 16%|█▌ | 1113/6885 [10:50:47<5:26:09, 3.39s/it] 16%|█▌ | 1114/6885 [10:50:49<4:58:44, 3.11s/it] 16%|█▌ | 1115/6885 [10:50:52<4:54:40, 3.06s/it] 16%|█▌ | 1116/6885 [10:50:56<5:09:50, 3.22s/it] 16%|█▌ | 1117/6885 [10:51:00<5:39:18, 3.53s/it] 16%|█▌ | 1118/6885 [10:51:04<5:46:05, 3.60s/it] 16%|█▋ | 1119/6885 [10:51:06<5:07:11, 3.20s/it] 16%|█▋ | 1120/6885 [10:51:08<4:42:56, 2.94s/it] {'loss': 0.6423, 'grad_norm': 1.2419109246681843, 'learning_rate': 9.881632407449561e-06, 'epoch': 0.16} 
16%|█▋ | 1120/6885 [10:51:08<4:42:56, 2.94s/it] 16%|█▋ | 1121/6885 [10:51:10<4:20:08, 2.71s/it] 16%|█▋ | 1122/6885 [10:51:13<4:01:47, 2.52s/it] 16%|█▋ | 1123/6885 [10:51:15<4:02:53, 2.53s/it] 16%|█▋ | 1124/6885 [10:51:18<4:12:25, 2.63s/it] 16%|█▋ | 1125/6885 [10:51:21<4:39:19, 2.91s/it] 16%|█▋ | 1126/6885 [10:51:27<5:50:28, 3.65s/it] 16%|█▋ | 1127/6885 [10:51:30<5:32:09, 3.46s/it] 16%|█▋ | 1128/6885 [10:51:32<4:55:38, 3.08s/it] 16%|█▋ | 1129/6885 [10:51:36<5:06:31, 3.20s/it] 16%|█▋ | 1130/6885 [10:51:38<4:41:20, 2.93s/it] {'loss': 0.6383, 'grad_norm': 1.4096648257937974, 'learning_rate': 9.876086037763575e-06, 'epoch': 0.16} 16%|█▋ | 1130/6885 [10:51:38<4:41:20, 2.93s/it] 16%|█▋ | 1131/6885 [10:51:41<4:42:42, 2.95s/it] 16%|█▋ | 1132/6885 [10:51:43<4:22:23, 2.74s/it] 16%|█▋ | 1133/6885 [10:51:45<4:12:01, 2.63s/it] 16%|█▋ | 1134/6885 [10:51:49<4:39:36, 2.92s/it] 16%|█▋ | 1135/6885 [10:51:52<4:32:20, 2.84s/it] 16%|█▋ | 1136/6885 [10:51:54<4:14:42, 2.66s/it] 17%|█▋ | 1137/6885 [10:51:56<3:53:16, 2.43s/it] 17%|█▋ | 1138/6885 [10:51:58<3:42:31, 2.32s/it] 17%|█▋ | 1139/6885 [10:52:01<3:55:25, 2.46s/it] 17%|█▋ | 1140/6885 [10:52:03<3:44:42, 2.35s/it] {'loss': 0.6059, 'grad_norm': 1.2574892255736747, 'learning_rate': 9.870414311433585e-06, 'epoch': 0.17} 17%|█▋ | 1140/6885 [10:52:03<3:44:42, 2.35s/it] 17%|█▋ | 1141/6885 [10:52:05<3:51:54, 2.42s/it] 17%|█▋ | 1142/6885 [10:52:09<4:12:37, 2.64s/it] 17%|█▋ | 1143/6885 [10:52:11<4:01:59, 2.53s/it] 17%|█▋ | 1144/6885 [10:52:13<4:05:44, 2.57s/it] 17%|█▋ | 1145/6885 [10:52:16<3:59:21, 2.50s/it] 17%|█▋ | 1146/6885 [10:52:18<3:57:39, 2.48s/it] 17%|█▋ | 1147/6885 [10:52:22<4:37:41, 2.90s/it] 17%|█▋ | 1148/6885 [10:52:24<4:11:31, 2.63s/it] 17%|█▋ | 1149/6885 [10:52:26<3:48:57, 2.40s/it] 17%|█▋ | 1150/6885 [10:52:28<3:45:26, 2.36s/it] {'loss': 0.6098, 'grad_norm': 1.2716145459010044, 'learning_rate': 9.86461737427092e-06, 'epoch': 0.17} 17%|█▋ | 1150/6885 [10:52:28<3:45:26, 2.36s/it] 17%|█▋ | 1151/6885 [10:52:30<3:41:36, 2.32s/it] 
17%|█▋ | 1152/6885 [10:52:33<3:46:40, 2.37s/it] 17%|█▋ | 1153/6885 [10:52:35<3:49:54, 2.41s/it] 17%|█▋ | 1154/6885 [10:52:38<3:41:43, 2.32s/it] 17%|█▋ | 1155/6885 [10:52:41<4:26:12, 2.79s/it] 17%|█▋ | 1156/6885 [10:52:43<3:57:18, 2.49s/it] 17%|█▋ | 1157/6885 [10:52:47<4:28:56, 2.82s/it] 17%|█▋ | 1158/6885 [10:52:49<4:14:40, 2.67s/it] 17%|█▋ | 1159/6885 [10:52:52<4:19:24, 2.72s/it] 17%|█▋ | 1160/6885 [10:52:54<4:07:10, 2.59s/it] {'loss': 0.6214, 'grad_norm': 1.1998298755084313, 'learning_rate': 9.858695375305885e-06, 'epoch': 0.17} 17%|█▋ | 1160/6885 [10:52:54<4:07:10, 2.59s/it] 17%|█▋ | 1161/6885 [10:52:58<4:39:33, 2.93s/it] 17%|█▋ | 1162/6885 [10:53:00<4:20:10, 2.73s/it] 17%|█▋ | 1163/6885 [10:53:02<3:55:37, 2.47s/it] 17%|█▋ | 1164/6885 [10:53:05<4:06:16, 2.58s/it] 17%|█▋ | 1165/6885 [10:53:08<4:12:28, 2.65s/it] 17%|█▋ | 1166/6885 [10:53:11<4:38:40, 2.92s/it] 17%|█▋ | 1167/6885 [10:53:14<4:35:03, 2.89s/it] 17%|█▋ | 1168/6885 [10:53:17<4:24:17, 2.77s/it] 17%|█▋ | 1169/6885 [10:53:19<4:17:34, 2.70s/it] 17%|█▋ | 1170/6885 [10:53:22<4:25:00, 2.78s/it] {'loss': 0.6241, 'grad_norm': 1.4281449888166444, 'learning_rate': 9.852648466783927e-06, 'epoch': 0.17} 17%|█▋ | 1170/6885 [10:53:22<4:25:00, 2.78s/it] 17%|█▋ | 1171/6885 [10:53:25<4:39:23, 2.93s/it] 17%|█▋ | 1172/6885 [10:53:28<4:30:31, 2.84s/it] 17%|█▋ | 1173/6885 [10:53:32<4:48:13, 3.03s/it] 17%|█▋ | 1174/6885 [10:53:35<4:59:54, 3.15s/it] 17%|█▋ | 1175/6885 [10:53:39<5:15:07, 3.31s/it] 17%|█▋ | 1176/6885 [10:53:42<5:24:20, 3.41s/it] 17%|█▋ | 1177/6885 [10:53:46<5:40:41, 3.58s/it] 17%|█▋ | 1178/6885 [10:53:49<5:03:27, 3.19s/it] 17%|█▋ | 1179/6885 [10:53:52<5:10:34, 3.27s/it] 17%|█▋ | 1180/6885 [10:53:54<4:46:15, 3.01s/it] {'loss': 0.6474, 'grad_norm': 1.4071764477667867, 'learning_rate': 9.84647680416173e-06, 'epoch': 0.17} 17%|█▋ | 1180/6885 [10:53:54<4:46:15, 3.01s/it] 17%|█▋ | 1181/6885 [10:53:58<5:15:56, 3.32s/it] 17%|█▋ | 1182/6885 [10:54:00<4:37:21, 2.92s/it] 17%|█▋ | 1183/6885 [10:54:03<4:28:04, 2.82s/it] 
17%|█▋ | 1184/6885 [10:54:06<4:30:10, 2.84s/it] 17%|█▋ | 1185/6885 [10:54:08<4:17:19, 2.71s/it] 17%|█▋ | 1186/6885 [10:54:11<4:11:34, 2.65s/it] 17%|█▋ | 1187/6885 [10:54:13<4:06:08, 2.59s/it] 17%|█▋ | 1188/6885 [10:54:16<4:17:01, 2.71s/it] 17%|█▋ | 1189/6885 [10:54:20<4:37:31, 2.92s/it] 17%|█▋ | 1190/6885 [10:54:23<4:46:22, 3.02s/it] {'loss': 0.6326, 'grad_norm': 1.2174453861834778, 'learning_rate': 9.840180546103215e-06, 'epoch': 0.17} 17%|█▋ | 1190/6885 [10:54:23<4:46:22, 3.02s/it] 17%|█▋ | 1191/6885 [10:54:25<4:30:56, 2.85s/it] 17%|█▋ | 1192/6885 [10:54:29<4:58:07, 3.14s/it] 17%|█▋ | 1193/6885 [10:54:31<4:16:22, 2.70s/it] 17%|█▋ | 1194/6885 [10:54:33<3:57:06, 2.50s/it] 17%|█▋ | 1195/6885 [10:54:36<4:23:26, 2.78s/it] 17%|█▋ | 1196/6885 [10:54:39<4:12:55, 2.67s/it] 17%|█▋ | 1197/6885 [10:54:42<4:42:57, 2.98s/it] 17%|█▋ | 1198/6885 [10:54:45<4:25:05, 2.80s/it] 17%|█▋ | 1199/6885 [10:54:47<4:01:24, 2.55s/it] 17%|█▋ | 1200/6885 [10:54:49<4:01:55, 2.55s/it] {'loss': 0.6185, 'grad_norm': 1.3029300772595094, 'learning_rate': 9.833759854475453e-06, 'epoch': 0.17} 17%|█▋ | 1200/6885 [10:54:49<4:01:55, 2.55s/it] 17%|█▋ | 1201/6885 [10:54:52<4:06:59, 2.61s/it] 17%|█▋ | 1202/6885 [10:54:55<4:09:33, 2.63s/it] 17%|█▋ | 1203/6885 [10:54:58<4:12:23, 2.67s/it] 17%|█▋ | 1204/6885 [10:55:00<3:58:05, 2.51s/it] 18%|█▊ | 1205/6885 [10:55:02<3:47:00, 2.40s/it] 18%|█▊ | 1206/6885 [10:55:05<3:58:37, 2.52s/it] 18%|█▊ | 1207/6885 [10:55:07<4:04:31, 2.58s/it] 18%|█▊ | 1208/6885 [10:55:11<4:30:32, 2.86s/it] 18%|█▊ | 1209/6885 [10:55:13<4:05:45, 2.60s/it] 18%|█▊ | 1210/6885 [10:55:16<4:30:55, 2.86s/it] {'loss': 0.6301, 'grad_norm': 1.271112016193465, 'learning_rate': 9.827214894344514e-06, 'epoch': 0.18} 18%|█▊ | 1210/6885 [10:55:16<4:30:55, 2.86s/it] 18%|█▊ | 1211/6885 [10:55:20<4:48:24, 3.05s/it] 18%|█▊ | 1212/6885 [10:55:25<6:02:31, 3.83s/it] 18%|█▊ | 1213/6885 [10:55:28<5:10:30, 3.28s/it] 18%|█▊ | 1214/6885 [10:55:29<4:27:16, 2.83s/it] 18%|█▊ | 1215/6885 [10:55:33<5:06:15, 3.24s/it] 
18%|█▊ | 1216/6885 [10:55:35<4:31:55, 2.88s/it] 18%|█▊ | 1217/6885 [10:55:39<4:41:01, 2.97s/it] 18%|█▊ | 1218/6885 [10:55:42<5:03:05, 3.21s/it] 18%|█▊ | 1219/6885 [10:55:47<5:34:34, 3.54s/it] 18%|█▊ | 1220/6885 [10:55:51<5:40:01, 3.60s/it] {'loss': 0.6317, 'grad_norm': 1.2997276991719462, 'learning_rate': 9.82054583397122e-06, 'epoch': 0.18} 18%|█▊ | 1220/6885 [10:55:51<5:40:01, 3.60s/it] 18%|█▊ | 1221/6885 [10:55:55<5:58:56, 3.80s/it] 18%|█▊ | 1222/6885 [10:55:57<5:11:43, 3.30s/it] 18%|█▊ | 1223/6885 [10:56:00<5:08:19, 3.27s/it] 18%|█▊ | 1224/6885 [10:56:02<4:31:11, 2.87s/it] 18%|█▊ | 1225/6885 [10:56:04<4:05:50, 2.61s/it] 18%|█▊ | 1226/6885 [10:56:12<6:40:47, 4.25s/it] 18%|█▊ | 1227/6885 [10:56:14<5:32:49, 3.53s/it] 18%|█▊ | 1228/6885 [10:56:17<5:22:05, 3.42s/it] 18%|█▊ | 1229/6885 [10:56:20<5:09:41, 3.29s/it] 18%|█▊ | 1230/6885 [10:56:25<5:47:20, 3.69s/it] {'loss': 0.6159, 'grad_norm': 1.2096030387104992, 'learning_rate': 9.813752844806814e-06, 'epoch': 0.18} 18%|█▊ | 1230/6885 [10:56:25<5:47:20, 3.69s/it] 18%|█▊ | 1231/6885 [10:56:27<5:11:25, 3.30s/it] 18%|█▊ | 1232/6885 [10:56:29<4:26:11, 2.83s/it] 18%|█▊ | 1233/6885 [10:56:32<4:45:37, 3.03s/it] 18%|█▊ | 1234/6885 [10:56:36<5:16:04, 3.36s/it] 18%|█▊ | 1235/6885 [10:56:40<5:08:32, 3.28s/it] 18%|█▊ | 1236/6885 [10:56:43<5:09:10, 3.28s/it] 18%|█▊ | 1237/6885 [10:56:47<5:33:24, 3.54s/it] 18%|█▊ | 1238/6885 [10:56:50<5:19:09, 3.39s/it] 18%|█▊ | 1239/6885 [10:56:52<4:45:21, 3.03s/it] 18%|█▊ | 1240/6885 [10:56:55<4:31:43, 2.89s/it] {'loss': 0.6289, 'grad_norm': 1.2973416257944899, 'learning_rate': 9.806836101488561e-06, 'epoch': 0.18} 18%|█▊ | 1240/6885 [10:56:55<4:31:43, 2.89s/it] 18%|█▊ | 1241/6885 [10:56:57<4:03:16, 2.59s/it] 18%|█▊ | 1242/6885 [10:57:02<5:06:29, 3.26s/it] 18%|█▊ | 1243/6885 [10:57:07<6:23:19, 4.08s/it] 18%|█▊ | 1244/6885 [10:57:11<6:11:24, 3.95s/it] 18%|█▊ | 1245/6885 [10:57:15<6:12:25, 3.96s/it] 18%|█▊ | 1246/6885 [10:57:18<5:40:06, 3.62s/it] 18%|█▊ | 1247/6885 [10:57:20<4:54:15, 3.13s/it] 
18%|█▊ | 1248/6885 [10:57:23<4:54:08, 3.13s/it] 18%|█▊ | 1249/6885 [10:57:25<4:30:10, 2.88s/it] 18%|█▊ | 1250/6885 [10:57:28<4:29:22, 2.87s/it] {'loss': 0.6088, 'grad_norm': 1.3197440048632956, 'learning_rate': 9.799795781835253e-06, 'epoch': 0.18} 18%|█▊ | 1250/6885 [10:57:28<4:29:22, 2.87s/it] 18%|█▊ | 1251/6885 [10:57:33<5:36:11, 3.58s/it] 18%|█▊ | 1252/6885 [10:57:35<4:49:27, 3.08s/it] 18%|█▊ | 1253/6885 [10:57:38<4:36:05, 2.94s/it] 18%|█▊ | 1254/6885 [10:57:40<4:21:16, 2.78s/it] 18%|█▊ | 1255/6885 [10:57:43<4:15:50, 2.73s/it] 18%|█▊ | 1256/6885 [10:57:45<4:00:26, 2.56s/it] 18%|█▊ | 1257/6885 [10:57:50<4:52:08, 3.11s/it] 18%|█▊ | 1258/6885 [10:57:52<4:46:07, 3.05s/it] 18%|█▊ | 1259/6885 [10:57:55<4:22:24, 2.80s/it] 18%|█▊ | 1260/6885 [10:57:58<4:28:57, 2.87s/it] {'loss': 0.6206, 'grad_norm': 1.2535036782710556, 'learning_rate': 9.79263206684264e-06, 'epoch': 0.18} 18%|█▊ | 1260/6885 [10:57:58<4:28:57, 2.87s/it] 18%|█▊ | 1261/6885 [10:58:00<4:07:48, 2.64s/it] 18%|█▊ | 1262/6885 [10:58:02<3:56:45, 2.53s/it] 18%|█▊ | 1263/6885 [10:58:05<4:13:20, 2.70s/it] 18%|█▊ | 1264/6885 [10:58:10<5:00:10, 3.20s/it] 18%|█▊ | 1265/6885 [10:58:11<4:19:33, 2.77s/it] 18%|█▊ | 1266/6885 [10:58:14<4:08:08, 2.65s/it] 18%|█▊ | 1267/6885 [10:58:16<4:07:46, 2.65s/it] 18%|█▊ | 1268/6885 [10:58:18<3:50:51, 2.47s/it] 18%|█▊ | 1269/6885 [10:58:21<4:03:13, 2.60s/it] 18%|█▊ | 1270/6885 [10:58:23<3:51:44, 2.48s/it] {'loss': 0.6149, 'grad_norm': 1.3190252094745194, 'learning_rate': 9.785345140678775e-06, 'epoch': 0.18} 18%|█▊ | 1270/6885 [10:58:23<3:51:44, 2.48s/it] 18%|█▊ | 1271/6885 [10:58:27<4:20:21, 2.78s/it] 18%|█▊ | 1272/6885 [10:58:31<4:53:21, 3.14s/it] 18%|█▊ | 1273/6885 [10:58:34<4:50:29, 3.11s/it] 19%|█▊ | 1274/6885 [10:58:36<4:26:23, 2.85s/it] 19%|█▊ | 1275/6885 [10:58:39<4:15:02, 2.73s/it] 19%|█▊ | 1276/6885 [10:58:41<4:08:06, 2.65s/it] 19%|█▊ | 1277/6885 [10:58:43<3:56:25, 2.53s/it] 19%|█▊ | 1278/6885 [10:58:45<3:30:49, 2.26s/it] 19%|█▊ | 1279/6885 [10:58:48<3:54:14, 2.51s/it] 
19%|█▊ | 1280/6885 [10:58:50<3:46:14, 2.42s/it] {'loss': 0.6134, 'grad_norm': 1.3148617882447478, 'learning_rate': 9.777935190679277e-06, 'epoch': 0.19} 19%|█▊ | 1280/6885 [10:58:50<3:46:14, 2.42s/it] 19%|█▊ | 1281/6885 [10:58:52<3:32:54, 2.28s/it] 19%|█▊ | 1282/6885 [10:58:55<3:50:14, 2.47s/it] 19%|█▊ | 1283/6885 [10:58:58<4:12:13, 2.70s/it] 19%|█▊ | 1284/6885 [10:59:01<4:14:24, 2.73s/it] 19%|█▊ | 1285/6885 [10:59:05<4:44:10, 3.04s/it] 19%|█▊ | 1286/6885 [10:59:09<4:59:50, 3.21s/it] 19%|█▊ | 1287/6885 [10:59:12<4:53:59, 3.15s/it] 19%|█▊ | 1288/6885 [10:59:14<4:24:23, 2.83s/it] 19%|█▊ | 1289/6885 [10:59:16<4:08:17, 2.66s/it] 19%|█▊ | 1290/6885 [10:59:19<4:12:52, 2.71s/it] {'loss': 0.6258, 'grad_norm': 1.3368521794263946, 'learning_rate': 9.770402407342524e-06, 'epoch': 0.19} 19%|█▊ | 1290/6885 [10:59:19<4:12:52, 2.71s/it] 19%|█▉ | 1291/6885 [10:59:21<4:08:36, 2.67s/it] 19%|█▉ | 1292/6885 [10:59:24<4:09:41, 2.68s/it] 19%|█▉ | 1293/6885 [10:59:26<4:00:32, 2.58s/it] 19%|█▉ | 1294/6885 [10:59:29<3:57:25, 2.55s/it] 19%|█▉ | 1295/6885 [10:59:33<4:52:53, 3.14s/it] 19%|█▉ | 1296/6885 [10:59:36<4:41:15, 3.02s/it] 19%|█▉ | 1297/6885 [10:59:38<4:07:42, 2.66s/it] 19%|█▉ | 1298/6885 [10:59:43<5:18:19, 3.42s/it] 19%|█▉ | 1299/6885 [10:59:46<4:54:48, 3.17s/it] 19%|█▉ | 1300/6885 [10:59:48<4:42:54, 3.04s/it] {'loss': 0.6191, 'grad_norm': 1.3941700458180073, 'learning_rate': 9.762746984324743e-06, 'epoch': 0.19} 19%|█▉ | 1300/6885 [10:59:48<4:42:54, 3.04s/it] 19%|█▉ | 1301/6885 [10:59:51<4:16:05, 2.75s/it] 19%|█▉ | 1302/6885 [10:59:53<3:53:39, 2.51s/it] 19%|█▉ | 1303/6885 [10:59:56<4:30:32, 2.91s/it] 19%|█▉ | 1304/6885 [10:59:59<4:11:39, 2.71s/it] 19%|█▉ | 1305/6885 [11:00:02<4:31:56, 2.92s/it] 19%|█▉ | 1306/6885 [11:00:04<4:16:00, 2.75s/it] 19%|█▉ | 1307/6885 [11:00:06<3:34:41, 2.31s/it] 19%|█▉ | 1308/6885 [11:00:09<3:57:56, 2.56s/it] 19%|█▉ | 1309/6885 [11:00:12<4:09:20, 2.68s/it] 19%|█▉ | 1310/6885 [11:00:14<4:04:04, 2.63s/it] {'loss': 0.6446, 'grad_norm': 1.3152403546822757, 
'learning_rate': 9.754969118435043e-06, 'epoch': 0.19} 19%|█▉ | 1310/6885 [11:00:14<4:04:04, 2.63s/it] 19%|█▉ | 1311/6885 [11:00:17<4:02:16, 2.61s/it] 19%|█▉ | 1312/6885 [11:00:21<4:53:48, 3.16s/it] 19%|█▉ | 1313/6885 [11:00:25<4:57:46, 3.21s/it] 19%|█▉ | 1314/6885 [11:00:27<4:25:55, 2.86s/it] 19%|█▉ | 1315/6885 [11:00:29<4:24:29, 2.85s/it] 19%|█▉ | 1316/6885 [11:00:32<4:10:40, 2.70s/it] 19%|█▉ | 1317/6885 [11:00:34<4:07:07, 2.66s/it] 19%|█▉ | 1318/6885 [11:00:37<3:52:17, 2.50s/it] 19%|█▉ | 1319/6885 [11:00:40<4:08:30, 2.68s/it] 19%|█▉ | 1320/6885 [11:00:43<4:25:43, 2.87s/it] {'loss': 0.6312, 'grad_norm': 1.3013626770341264, 'learning_rate': 9.747069009630347e-06, 'epoch': 0.19} 19%|█▉ | 1320/6885 [11:00:43<4:25:43, 2.87s/it] 19%|█▉ | 1321/6885 [11:00:47<4:57:44, 3.21s/it] 19%|█▉ | 1322/6885 [11:00:51<5:28:06, 3.54s/it] 19%|█▉ | 1323/6885 [11:00:55<5:39:04, 3.66s/it] 19%|█▉ | 1324/6885 [11:00:59<5:34:46, 3.61s/it] 19%|█▉ | 1325/6885 [11:01:01<4:51:17, 3.14s/it] 19%|█▉ | 1326/6885 [11:01:03<4:22:29, 2.83s/it] 19%|█▉ | 1327/6885 [11:01:06<4:43:52, 3.06s/it] 19%|█▉ | 1328/6885 [11:01:09<4:23:14, 2.84s/it] 19%|█▉ | 1329/6885 [11:01:12<4:36:40, 2.99s/it] 19%|█▉ | 1330/6885 [11:01:14<4:03:21, 2.63s/it] {'loss': 0.6207, 'grad_norm': 1.3966383885583535, 'learning_rate': 9.739046861010255e-06, 'epoch': 0.19} 19%|█▉ | 1330/6885 [11:01:14<4:03:21, 2.63s/it] 19%|█▉ | 1331/6885 [11:01:17<4:04:28, 2.64s/it] 19%|█▉ | 1332/6885 [11:01:20<4:20:04, 2.81s/it] 19%|█▉ | 1333/6885 [11:01:24<5:06:17, 3.31s/it] 19%|█▉ | 1334/6885 [11:01:27<4:43:58, 3.07s/it] 19%|█▉ | 1335/6885 [11:01:29<4:27:34, 2.89s/it] 19%|█▉ | 1336/6885 [11:01:31<4:00:43, 2.60s/it] 19%|█▉ | 1337/6885 [11:01:34<4:12:52, 2.73s/it] 19%|█▉ | 1338/6885 [11:01:37<4:08:38, 2.69s/it] 19%|█▉ | 1339/6885 [11:01:41<5:02:32, 3.27s/it] 19%|█▉ | 1340/6885 [11:01:44<4:41:47, 3.05s/it] {'loss': 0.6144, 'grad_norm': 1.1439991746974036, 'learning_rate': 9.730902878811825e-06, 'epoch': 0.19} 19%|█▉ | 1340/6885 [11:01:44<4:41:47, 
3.05s/it] 19%|█▉ | 1341/6885 [11:01:47<4:30:39, 2.93s/it] 19%|█▉ | 1342/6885 [11:01:49<4:20:52, 2.82s/it] 20%|█▉ | 1343/6885 [11:01:51<3:55:20, 2.55s/it] 20%|█▉ | 1344/6885 [11:01:55<4:21:26, 2.83s/it] 20%|█▉ | 1345/6885 [11:01:57<4:20:32, 2.82s/it] 20%|█▉ | 1346/6885 [11:02:01<4:30:36, 2.93s/it] 20%|█▉ | 1347/6885 [11:02:04<4:36:49, 3.00s/it] 20%|█▉ | 1348/6885 [11:02:07<4:39:24, 3.03s/it] 20%|█▉ | 1349/6885 [11:02:10<4:42:21, 3.06s/it] 20%|█▉ | 1350/6885 [11:02:12<4:22:11, 2.84s/it] {'loss': 0.6044, 'grad_norm': 1.3540894709055364, 'learning_rate': 9.722637272404263e-06, 'epoch': 0.2} 20%|█▉ | 1350/6885 [11:02:12<4:22:11, 2.84s/it] 20%|█▉ | 1351/6885 [11:02:15<4:05:18, 2.66s/it] 20%|█▉ | 1352/6885 [11:02:17<3:56:04, 2.56s/it] 20%|█▉ | 1353/6885 [11:02:21<4:46:05, 3.10s/it] 20%|█▉ | 1354/6885 [11:02:24<4:24:18, 2.87s/it] 20%|█▉ | 1355/6885 [11:02:25<3:47:42, 2.47s/it] 20%|█▉ | 1356/6885 [11:02:28<3:54:28, 2.54s/it] 20%|█▉ | 1357/6885 [11:02:30<3:51:16, 2.51s/it] 20%|█▉ | 1358/6885 [11:02:33<3:56:53, 2.57s/it] 20%|█▉ | 1359/6885 [11:02:35<3:45:31, 2.45s/it] 20%|█▉ | 1360/6885 [11:02:39<4:21:09, 2.84s/it] {'loss': 0.6036, 'grad_norm': 1.100639588271217, 'learning_rate': 9.71425025428355e-06, 'epoch': 0.2} 20%|█▉ | 1360/6885 [11:02:39<4:21:09, 2.84s/it] 20%|█▉ | 1361/6885 [11:02:42<4:19:53, 2.82s/it] 20%|█▉ | 1362/6885 [11:02:44<3:58:00, 2.59s/it] 20%|█▉ | 1363/6885 [11:02:46<3:51:12, 2.51s/it] 20%|█▉ | 1364/6885 [11:02:49<4:03:36, 2.65s/it] 20%|█▉ | 1365/6885 [11:02:52<4:18:50, 2.81s/it] 20%|█▉ | 1366/6885 [11:02:55<4:10:28, 2.72s/it] 20%|█▉ | 1367/6885 [11:02:59<4:50:33, 3.16s/it] 20%|█▉ | 1368/6885 [11:03:01<4:36:00, 3.00s/it] 20%|█▉ | 1369/6885 [11:03:04<4:18:37, 2.81s/it] 20%|█▉ | 1370/6885 [11:03:07<4:31:38, 2.96s/it] {'loss': 0.6039, 'grad_norm': 1.1874319432290736, 'learning_rate': 9.705742040066977e-06, 'epoch': 0.2} 20%|█▉ | 1370/6885 [11:03:07<4:31:38, 2.96s/it] 20%|█▉ | 1371/6885 [11:03:10<4:29:58, 2.94s/it] 20%|█▉ | 1372/6885 [11:03:12<4:11:23, 2.74s/it] 
20%|█▉ | 1373/6885 [11:03:17<5:17:04, 3.45s/it] 20%|█▉ | 1374/6885 [11:03:20<4:50:44, 3.17s/it] 20%|█▉ | 1375/6885 [11:03:23<4:36:38, 3.01s/it] 20%|█▉ | 1376/6885 [11:03:25<4:21:21, 2.85s/it] 20%|██ | 1377/6885 [11:03:27<3:57:36, 2.59s/it] 20%|██ | 1378/6885 [11:03:30<4:19:24, 2.83s/it] 20%|██ | 1379/6885 [11:03:33<4:14:29, 2.77s/it] 20%|██ | 1380/6885 [11:03:36<4:22:45, 2.86s/it] {'loss': 0.6376, 'grad_norm': 1.1767671647303808, 'learning_rate': 9.697112848487591e-06, 'epoch': 0.2} 20%|██ | 1380/6885 [11:03:36<4:22:45, 2.86s/it] 20%|██ | 1381/6885 [11:03:38<4:03:42, 2.66s/it] 20%|██ | 1382/6885 [11:03:42<4:43:55, 3.10s/it] 20%|██ | 1383/6885 [11:03:45<4:20:56, 2.85s/it] 20%|██ | 1384/6885 [11:03:48<4:31:30, 2.96s/it] 20%|██ | 1385/6885 [11:03:51<4:25:21, 2.89s/it] 20%|██ | 1386/6885 [11:03:53<4:19:50, 2.84s/it] 20%|██ | 1387/6885 [11:03:57<4:38:39, 3.04s/it] 20%|██ | 1388/6885 [11:03:59<4:12:19, 2.75s/it] 20%|██ | 1389/6885 [11:04:02<4:10:27, 2.73s/it] 20%|██ | 1390/6885 [11:04:05<4:18:13, 2.82s/it] {'loss': 0.6035, 'grad_norm': 1.135879944041461, 'learning_rate': 9.688362901388586e-06, 'epoch': 0.2} 20%|██ | 1390/6885 [11:04:05<4:18:13, 2.82s/it] 20%|██ | 1391/6885 [11:04:08<4:22:33, 2.87s/it] 20%|██ | 1392/6885 [11:04:11<4:22:27, 2.87s/it] 20%|██ | 1393/6885 [11:04:13<4:05:27, 2.68s/it] 20%|██ | 1394/6885 [11:04:15<3:55:48, 2.58s/it] 20%|██ | 1395/6885 [11:04:19<4:42:35, 3.09s/it] 20%|██ | 1396/6885 [11:04:23<5:05:56, 3.34s/it] 20%|██ | 1397/6885 [11:04:26<4:50:16, 3.17s/it] 20%|██ | 1398/6885 [11:04:36<8:06:23, 5.32s/it] 20%|██ | 1399/6885 [11:04:39<6:44:00, 4.42s/it] 20%|██ | 1400/6885 [11:04:42<6:01:16, 3.95s/it] {'loss': 0.6098, 'grad_norm': 1.2315910796359388, 'learning_rate': 9.679492423717596e-06, 'epoch': 0.2} 20%|██ | 1400/6885 [11:04:42<6:01:16, 3.95s/it] 20%|██ | 1401/6885 [11:04:44<5:27:18, 3.58s/it] 20%|██ | 1402/6885 [11:04:49<5:59:46, 3.94s/it] 20%|██ | 1403/6885 [11:04:52<5:25:05, 3.56s/it] 20%|██ | 1404/6885 [11:04:54<4:49:30, 3.17s/it] 20%|██ | 
1405/6885 [11:04:58<5:22:24, 3.53s/it] 20%|██ | 1406/6885 [11:05:01<4:53:01, 3.21s/it] 20%|██ | 1407/6885 [11:05:03<4:19:27, 2.84s/it] 20%|██ | 1408/6885 [11:05:07<4:55:13, 3.23s/it] 20%|██ | 1409/6885 [11:05:10<4:41:49, 3.09s/it] 20%|██ | 1410/6885 [11:05:12<4:22:05, 2.87s/it] {'loss': 0.6203, 'grad_norm': 1.4949408462288012, 'learning_rate': 9.670501643520904e-06, 'epoch': 0.2} 20%|██ | 1410/6885 [11:05:12<4:22:05, 2.87s/it] 20%|██ | 1411/6885 [11:05:16<4:37:12, 3.04s/it] 21%|██ | 1412/6885 [11:05:18<4:30:37, 2.97s/it] 21%|██ | 1413/6885 [11:05:23<5:06:51, 3.36s/it] 21%|██ | 1414/6885 [11:05:26<4:55:36, 3.24s/it] 21%|██ | 1415/6885 [11:05:28<4:37:36, 3.05s/it] 21%|██ | 1416/6885 [11:05:33<5:20:03, 3.51s/it] 21%|██ | 1417/6885 [11:05:36<5:07:49, 3.38s/it] 21%|██ | 1418/6885 [11:05:38<4:45:55, 3.14s/it] 21%|██ | 1419/6885 [11:05:41<4:41:51, 3.09s/it] 21%|██ | 1420/6885 [11:05:44<4:29:45, 2.96s/it] {'loss': 0.6286, 'grad_norm': 1.3180181445795711, 'learning_rate': 9.66139079193759e-06, 'epoch': 0.21} 21%|██ | 1420/6885 [11:05:44<4:29:45, 2.96s/it] 21%|██ | 1421/6885 [11:05:46<4:05:49, 2.70s/it] 21%|██ | 1422/6885 [11:05:50<4:50:56, 3.20s/it] 21%|██ | 1423/6885 [11:05:53<4:26:11, 2.92s/it] 21%|██ | 1424/6885 [11:05:55<4:17:11, 2.83s/it] 21%|██ | 1425/6885 [11:05:57<3:56:24, 2.60s/it] 21%|██ | 1426/6885 [11:06:00<3:59:56, 2.64s/it] 21%|██ | 1427/6885 [11:06:03<3:57:09, 2.61s/it] 21%|██ | 1428/6885 [11:06:05<3:43:36, 2.46s/it] 21%|██ | 1429/6885 [11:06:08<4:00:38, 2.65s/it] 21%|██ | 1430/6885 [11:06:13<5:13:17, 3.45s/it] {'loss': 0.6274, 'grad_norm': 1.2616556885045909, 'learning_rate': 9.652160103193583e-06, 'epoch': 0.21} 21%|██ | 1430/6885 [11:06:13<5:13:17, 3.45s/it] 21%|██ | 1431/6885 [11:06:16<4:42:40, 3.11s/it] 21%|██ | 1432/6885 [11:06:18<4:17:07, 2.83s/it] 21%|██ | 1433/6885 [11:06:20<4:05:20, 2.70s/it] 21%|██ | 1434/6885 [11:06:22<3:39:03, 2.41s/it] 21%|██ | 1435/6885 [11:06:24<3:42:40, 2.45s/it] 21%|██ | 1436/6885 [11:06:27<3:45:32, 2.48s/it] 21%|██ | 
1437/6885 [11:06:31<4:40:21, 3.09s/it] 21%|██ | 1438/6885 [11:06:36<5:19:15, 3.52s/it] 21%|██ | 1439/6885 [11:06:38<4:48:04, 3.17s/it] 21%|██ | 1440/6885 [11:06:43<5:23:11, 3.56s/it] {'loss': 0.6136, 'grad_norm': 1.3174449455574337, 'learning_rate': 9.642809814595637e-06, 'epoch': 0.21} 21%|██ | 1440/6885 [11:06:43<5:23:11, 3.56s/it] 21%|██ | 1441/6885 [11:06:45<4:42:12, 3.11s/it] 21%|██ | 1442/6885 [11:06:48<4:33:21, 3.01s/it] 21%|██ | 1443/6885 [11:06:50<4:23:04, 2.90s/it] 21%|██ | 1444/6885 [11:06:53<4:30:00, 2.98s/it] 21%|██ | 1445/6885 [11:06:57<4:36:49, 3.05s/it] 21%|██ | 1446/6885 [11:06:59<4:23:16, 2.90s/it] 21%|██ | 1447/6885 [11:07:01<4:03:05, 2.68s/it] 21%|██ | 1448/6885 [11:07:04<4:12:39, 2.79s/it] 21%|██ | 1449/6885 [11:07:07<4:07:30, 2.73s/it] 21%|██ | 1450/6885 [11:07:09<3:54:43, 2.59s/it] {'loss': 0.6145, 'grad_norm': 1.296735377133819, 'learning_rate': 9.633340166525238e-06, 'epoch': 0.21} 21%|██ | 1450/6885 [11:07:09<3:54:43, 2.59s/it] 21%|██ | 1451/6885 [11:07:12<4:01:42, 2.67s/it] 21%|██ | 1452/6885 [11:07:15<4:14:08, 2.81s/it] 21%|██ | 1453/6885 [11:07:19<4:31:31, 3.00s/it] 21%|██ | 1454/6885 [11:07:21<4:16:51, 2.84s/it] 21%|██ | 1455/6885 [11:07:24<4:24:02, 2.92s/it] 21%|██ | 1456/6885 [11:07:27<4:15:00, 2.82s/it] 21%|██ | 1457/6885 [11:07:29<4:08:09, 2.74s/it] 21%|██ | 1458/6885 [11:07:32<3:51:23, 2.56s/it] 21%|██ | 1459/6885 [11:07:36<4:32:08, 3.01s/it] 21%|██ | 1460/6885 [11:07:38<4:18:07, 2.85s/it] {'loss': 0.6031, 'grad_norm': 1.2502497833244608, 'learning_rate': 9.62375140243242e-06, 'epoch': 0.21} 21%|██ | 1460/6885 [11:07:38<4:18:07, 2.85s/it] 21%|██ | 1461/6885 [11:07:42<4:33:26, 3.02s/it] 21%|██ | 1462/6885 [11:07:45<4:32:16, 3.01s/it] 21%|██ | 1463/6885 [11:07:48<4:50:46, 3.22s/it] 21%|██▏ | 1464/6885 [11:07:51<4:36:46, 3.06s/it] 21%|██▏ | 1465/6885 [11:07:57<5:49:06, 3.86s/it] 21%|██▏ | 1466/6885 [11:07:59<5:07:27, 3.40s/it] 21%|██▏ | 1467/6885 [11:08:02<4:49:33, 3.21s/it] 21%|██▏ | 1468/6885 [11:08:07<5:53:44, 3.92s/it] 21%|██▏ | 
1469/6885 [11:08:10<5:13:24, 3.47s/it] 21%|██▏ | 1470/6885 [11:08:12<4:51:58, 3.24s/it] {'loss': 0.6128, 'grad_norm': 1.2288830705505374, 'learning_rate': 9.6140437688295e-06, 'epoch': 0.21} 21%|██▏ | 1470/6885 [11:08:12<4:51:58, 3.24s/it] 21%|██▏ | 1471/6885 [11:08:17<5:30:55, 3.67s/it] 21%|██▏ | 1472/6885 [11:08:21<5:39:20, 3.76s/it] 21%|██▏ | 1473/6885 [11:08:24<5:09:19, 3.43s/it] 21%|██▏ | 1474/6885 [11:08:33<7:42:47, 5.13s/it] 21%|██▏ | 1475/6885 [11:08:38<7:33:45, 5.03s/it] 21%|██▏ | 1476/6885 [11:08:40<6:31:09, 4.34s/it] 21%|██▏ | 1477/6885 [11:08:42<5:14:53, 3.49s/it] 21%|██▏ | 1478/6885 [11:08:45<5:05:30, 3.39s/it] 21%|██▏ | 1479/6885 [11:08:48<4:44:52, 3.16s/it] 21%|██▏ | 1480/6885 [11:08:51<4:55:51, 3.28s/it] {'loss': 0.6171, 'grad_norm': 1.1119473380240397, 'learning_rate': 9.604217515284753e-06, 'epoch': 0.21} 21%|██▏ | 1480/6885 [11:08:51<4:55:51, 3.28s/it] 22%|██▏ | 1481/6885 [11:08:53<4:25:55, 2.95s/it] 22%|██▏ | 1482/6885 [11:08:55<3:54:02, 2.60s/it] 22%|██▏ | 1483/6885 [11:08:57<3:44:28, 2.49s/it] 22%|██▏ | 1484/6885 [11:09:00<3:54:12, 2.60s/it] 22%|██▏ | 1485/6885 [11:09:05<4:55:38, 3.28s/it] 22%|██▏ | 1486/6885 [11:09:08<4:29:52, 3.00s/it] 22%|██▏ | 1487/6885 [11:09:10<4:21:14, 2.90s/it] 22%|██▏ | 1488/6885 [11:09:14<4:38:12, 3.09s/it] 22%|██▏ | 1489/6885 [11:09:19<5:36:32, 3.74s/it] 22%|██▏ | 1490/6885 [11:09:22<5:08:23, 3.43s/it] {'loss': 0.6238, 'grad_norm': 1.2070397164389806, 'learning_rate': 9.594272894415986e-06, 'epoch': 0.22} 22%|██▏ | 1490/6885 [11:09:22<5:08:23, 3.43s/it] 22%|██▏ | 1491/6885 [11:09:26<5:30:10, 3.67s/it] 22%|██▏ | 1492/6885 [11:09:29<5:06:18, 3.41s/it] 22%|██▏ | 1493/6885 [11:09:31<4:38:11, 3.10s/it] 22%|██▏ | 1494/6885 [11:09:34<4:27:33, 2.98s/it] 22%|██▏ | 1495/6885 [11:09:39<5:38:00, 3.76s/it] 22%|██▏ | 1496/6885 [11:09:43<5:33:34, 3.71s/it] 22%|██▏ | 1497/6885 [11:09:47<5:33:22, 3.71s/it] 22%|██▏ | 1498/6885 [11:09:49<4:47:46, 3.21s/it] 22%|██▏ | 1499/6885 [11:09:51<4:25:31, 2.96s/it] 22%|██▏ | 1500/6885 
[11:09:53<4:07:19, 2.76s/it] {'loss': 0.6163, 'grad_norm': 1.3345637205372078, 'learning_rate': 9.584210161884049e-06, 'epoch': 0.22} 22%|██▏ | 1500/6885 [11:09:53<4:07:19, 2.76s/it] 22%|██▏ | 1501/6885 [11:09:56<4:12:02, 2.81s/it] 22%|██▏ | 1502/6885 [11:09:58<3:52:14, 2.59s/it] 22%|██▏ | 1503/6885 [11:10:01<3:59:42, 2.67s/it] 22%|██▏ | 1504/6885 [11:10:04<3:50:49, 2.57s/it] 22%|██▏ | 1505/6885 [11:10:08<4:29:52, 3.01s/it] 22%|██▏ | 1506/6885 [11:10:10<4:19:42, 2.90s/it] 22%|██▏ | 1507/6885 [11:10:13<4:20:20, 2.90s/it] 22%|██▏ | 1508/6885 [11:10:15<3:54:35, 2.62s/it] 22%|██▏ | 1509/6885 [11:10:18<4:08:37, 2.77s/it] 22%|██▏ | 1510/6885 [11:10:21<3:57:49, 2.65s/it] {'loss': 0.6083, 'grad_norm': 1.1385043759036517, 'learning_rate': 9.57402957638626e-06, 'epoch': 0.22} 22%|██▏ | 1510/6885 [11:10:21<3:57:49, 2.65s/it] 22%|██▏ | 1511/6885 [11:10:23<3:52:10, 2.59s/it] 22%|██▏ | 1512/6885 [11:10:25<3:32:02, 2.37s/it] 22%|██▏ | 1513/6885 [11:10:27<3:27:02, 2.31s/it] 22%|██▏ | 1514/6885 [11:10:31<4:04:28, 2.73s/it] 22%|██▏ | 1515/6885 [11:10:36<5:15:08, 3.52s/it] 22%|██▏ | 1516/6885 [11:10:39<4:59:45, 3.35s/it] 22%|██▏ | 1517/6885 [11:10:42<4:41:17, 3.14s/it] 22%|██▏ | 1518/6885 [11:10:44<4:07:25, 2.77s/it] 22%|██▏ | 1519/6885 [11:10:47<4:10:38, 2.80s/it] 22%|██▏ | 1520/6885 [11:10:49<4:11:37, 2.81s/it] {'loss': 0.5992, 'grad_norm': 1.1936988121465326, 'learning_rate': 9.563731399649756e-06, 'epoch': 0.22} 22%|██▏ | 1520/6885 [11:10:49<4:11:37, 2.81s/it] 22%|██▏ | 1521/6885 [11:10:51<3:50:32, 2.58s/it] 22%|██▏ | 1522/6885 [11:10:55<4:08:24, 2.78s/it] 22%|██▏ | 1523/6885 [11:10:58<4:28:56, 3.01s/it] 22%|██▏ | 1524/6885 [11:11:03<5:10:30, 3.48s/it] 22%|██▏ | 1525/6885 [11:11:06<4:59:55, 3.36s/it] 22%|██▏ | 1526/6885 [11:11:09<4:54:16, 3.29s/it] 22%|██▏ | 1527/6885 [11:11:12<4:36:12, 3.09s/it] 22%|██▏ | 1528/6885 [11:11:16<5:07:36, 3.45s/it] 22%|██▏ | 1529/6885 [11:11:21<6:01:07, 4.05s/it] 22%|██▏ | 1530/6885 [11:11:25<5:54:48, 3.98s/it] {'loss': 0.6054, 'grad_norm': 
1.4103572503621762, 'learning_rate': 9.553315896424758e-06, 'epoch': 0.22} 22%|██▏ | 1530/6885 [11:11:25<5:54:48, 3.98s/it] 22%|██▏ | 1531/6885 [11:11:28<5:12:33, 3.50s/it] 22%|██▏ | 1532/6885 [11:11:31<5:02:43, 3.39s/it] 22%|██▏ | 1533/6885 [11:11:35<5:31:05, 3.71s/it] 22%|██▏ | 1534/6885 [11:11:38<5:19:34, 3.58s/it] 22%|██▏ | 1535/6885 [11:11:41<4:38:53, 3.13s/it] 22%|██▏ | 1536/6885 [11:11:46<5:38:39, 3.80s/it] 22%|██▏ | 1537/6885 [11:11:51<6:18:41, 4.25s/it] 22%|██▏ | 1538/6885 [11:11:54<5:31:14, 3.72s/it] 22%|██▏ | 1539/6885 [11:11:56<4:52:00, 3.28s/it] 22%|██▏ | 1540/6885 [11:11:58<4:15:11, 2.86s/it] {'loss': 0.596, 'grad_norm': 1.3209719950503893, 'learning_rate': 9.54278333447778e-06, 'epoch': 0.22} 22%|██▏ | 1540/6885 [11:11:58<4:15:11, 2.86s/it] 22%|██▏ | 1541/6885 [11:12:01<4:12:45, 2.84s/it] 22%|██▏ | 1542/6885 [11:12:03<3:52:06, 2.61s/it] 22%|██▏ | 1543/6885 [11:12:06<4:01:57, 2.72s/it] 22%|██▏ | 1544/6885 [11:12:08<3:43:48, 2.51s/it] 22%|██▏ | 1545/6885 [11:12:10<3:42:01, 2.49s/it] 22%|██▏ | 1546/6885 [11:12:16<5:06:14, 3.44s/it] 22%|██▏ | 1547/6885 [11:12:19<4:47:43, 3.23s/it] 22%|██▏ | 1548/6885 [11:12:24<5:41:03, 3.83s/it] 22%|██▏ | 1549/6885 [11:12:26<4:54:01, 3.31s/it] 23%|██▎ | 1550/6885 [11:12:28<4:26:44, 3.00s/it] {'loss': 0.6323, 'grad_norm': 1.1693016501696898, 'learning_rate': 9.532133984584721e-06, 'epoch': 0.23} 23%|██▎ | 1550/6885 [11:12:28<4:26:44, 3.00s/it] 23%|██▎ | 1551/6885 [11:12:30<3:56:55, 2.67s/it] 23%|██▎ | 1552/6885 [11:12:33<4:13:39, 2.85s/it] 23%|██▎ | 1553/6885 [11:12:37<4:24:12, 2.97s/it] 23%|██▎ | 1554/6885 [11:12:39<4:22:02, 2.95s/it] 23%|██▎ | 1555/6885 [11:12:42<4:17:10, 2.89s/it] 23%|██▎ | 1556/6885 [11:12:45<4:16:18, 2.89s/it] 23%|██▎ | 1557/6885 [11:12:48<4:13:01, 2.85s/it] 23%|██▎ | 1558/6885 [11:12:51<4:11:59, 2.84s/it] 23%|██▎ | 1559/6885 [11:12:54<4:13:54, 2.86s/it] 23%|██▎ | 1560/6885 [11:12:58<4:54:57, 3.32s/it] {'loss': 0.6027, 'grad_norm': 1.1691510921859125, 'learning_rate': 9.521368120523931e-06, 'epoch': 
0.23} 23%|██▎ | 1560/6885 [11:12:58<4:54:57, 3.32s/it] 23%|██▎ | 1561/6885 [11:13:01<4:50:30, 3.27s/it] 23%|██▎ | 1562/6885 [11:13:05<5:17:17, 3.58s/it] 23%|██▎ | 1563/6885 [11:13:08<4:58:52, 3.37s/it] 23%|██▎ | 1564/6885 [11:13:10<4:22:29, 2.96s/it] 23%|██▎ | 1565/6885 [11:13:13<4:27:24, 3.02s/it] 23%|██▎ | 1566/6885 [11:13:17<4:40:28, 3.16s/it] 23%|██▎ | 1567/6885 [11:13:20<4:31:57, 3.07s/it] 23%|██▎ | 1568/6885 [11:13:23<4:27:20, 3.02s/it] 23%|██▎ | 1569/6885 [11:13:26<4:28:44, 3.03s/it] 23%|██▎ | 1570/6885 [11:13:29<4:24:46, 2.99s/it] {'loss': 0.6245, 'grad_norm': 1.2114364957172101, 'learning_rate': 9.510486019069154e-06, 'epoch': 0.23} 23%|██▎ | 1570/6885 [11:13:29<4:24:46, 2.99s/it] 23%|██▎ | 1571/6885 [11:13:33<5:03:37, 3.43s/it] 23%|██▎ | 1572/6885 [11:13:37<5:09:41, 3.50s/it] 23%|██▎ | 1573/6885 [11:13:40<4:50:04, 3.28s/it] 23%|██▎ | 1574/6885 [11:13:43<4:48:49, 3.26s/it] 23%|██▎ | 1575/6885 [11:13:45<4:13:00, 2.86s/it] 23%|██▎ | 1576/6885 [11:13:46<3:41:23, 2.50s/it] 23%|██▎ | 1577/6885 [11:13:49<3:35:18, 2.43s/it] 23%|██▎ | 1578/6885 [11:13:51<3:37:03, 2.45s/it] 23%|██▎ | 1579/6885 [11:13:53<3:25:14, 2.32s/it] 23%|██▎ | 1580/6885 [11:13:59<4:53:48, 3.32s/it] {'loss': 0.6189, 'grad_norm': 1.265123327235345, 'learning_rate': 9.499487959982415e-06, 'epoch': 0.23} 23%|██▎ | 1580/6885 [11:13:59<4:53:48, 3.32s/it] 23%|██▎ | 1581/6885 [11:14:01<4:29:10, 3.04s/it] 23%|██▎ | 1582/6885 [11:14:03<4:05:31, 2.78s/it] 23%|██▎ | 1583/6885 [11:14:06<4:00:44, 2.72s/it] 23%|██▎ | 1584/6885 [11:14:09<4:00:28, 2.72s/it] 23%|██▎ | 1585/6885 [11:14:11<3:56:07, 2.67s/it] 23%|██▎ | 1586/6885 [11:14:15<4:37:39, 3.14s/it] 23%|██▎ | 1587/6885 [11:14:18<4:16:56, 2.91s/it] 23%|██▎ | 1588/6885 [11:14:21<4:17:55, 2.92s/it] 23%|██▎ | 1589/6885 [11:14:24<4:23:37, 2.99s/it] 23%|██▎ | 1590/6885 [11:14:28<4:40:52, 3.18s/it] {'loss': 0.6106, 'grad_norm': 1.3773059483594046, 'learning_rate': 9.488374226006836e-06, 'epoch': 0.23} 23%|██▎ | 1590/6885 [11:14:28<4:40:52, 3.18s/it] 23%|██▎ | 
1591/6885 [11:14:30<4:20:23, 2.95s/it] 23%|██▎ | 1592/6885 [11:14:32<3:56:50, 2.68s/it] 23%|██▎ | 1593/6885 [11:14:36<4:31:39, 3.08s/it] 23%|██▎ | 1594/6885 [11:14:39<4:31:45, 3.08s/it] 23%|██▎ | 1595/6885 [11:14:42<4:22:59, 2.98s/it] 23%|██▎ | 1596/6885 [11:14:46<4:52:19, 3.32s/it] 23%|██▎ | 1597/6885 [11:14:49<4:43:58, 3.22s/it] 23%|██▎ | 1598/6885 [11:14:53<5:15:15, 3.58s/it] 23%|██▎ | 1599/6885 [11:14:56<4:51:56, 3.31s/it] 23%|██▎ | 1600/6885 [11:14:59<4:50:37, 3.30s/it] {'loss': 0.6115, 'grad_norm': 1.2737618179619303, 'learning_rate': 9.477145102859357e-06, 'epoch': 0.23} 23%|██▎ | 1600/6885 [11:14:59<4:50:37, 3.30s/it] 23%|██▎ | 1601/6885 [11:15:02<4:39:45, 3.18s/it] 23%|██▎ | 1602/6885 [11:15:04<4:08:16, 2.82s/it] 23%|██▎ | 1603/6885 [11:15:06<3:45:23, 2.56s/it] 23%|██▎ | 1604/6885 [11:15:08<3:31:47, 2.41s/it] 23%|██▎ | 1605/6885 [11:15:12<3:59:29, 2.72s/it] 23%|██▎ | 1606/6885 [11:15:16<4:40:05, 3.18s/it] 23%|██▎ | 1607/6885 [11:15:19<4:34:40, 3.12s/it] 23%|██▎ | 1608/6885 [11:15:21<4:18:11, 2.94s/it] 23%|██▎ | 1609/6885 [11:15:24<4:09:19, 2.84s/it] 23%|██▎ | 1610/6885 [11:15:27<4:12:07, 2.87s/it] {'loss': 0.609, 'grad_norm': 1.3066121502077, 'learning_rate': 9.4658008792234e-06, 'epoch': 0.23} 23%|██▎ | 1610/6885 [11:15:27<4:12:07, 2.87s/it] 23%|██▎ | 1611/6885 [11:15:30<4:16:56, 2.92s/it] 23%|██▎ | 1612/6885 [11:15:32<3:59:04, 2.72s/it] 23%|██▎ | 1613/6885 [11:15:35<3:55:25, 2.68s/it] 23%|██▎ | 1614/6885 [11:15:39<4:43:11, 3.22s/it] 23%|██▎ | 1615/6885 [11:15:43<4:52:56, 3.34s/it] 23%|██▎ | 1616/6885 [11:15:46<4:57:01, 3.38s/it] 23%|██▎ | 1617/6885 [11:15:48<4:16:45, 2.92s/it] 24%|██▎ | 1618/6885 [11:15:50<3:42:30, 2.53s/it] 24%|██▎ | 1619/6885 [11:15:55<4:51:23, 3.32s/it] 24%|██▎ | 1620/6885 [11:15:59<5:13:09, 3.57s/it] {'loss': 0.6, 'grad_norm': 1.242518893517758, 'learning_rate': 9.45434184674144e-06, 'epoch': 0.24} 24%|██▎ | 1620/6885 [11:15:59<5:13:09, 3.57s/it] 24%|██▎ | 1621/6885 [11:16:01<4:34:23, 3.13s/it] 24%|██▎ | 1622/6885 [11:16:04<4:32:37, 
3.11s/it] 24%|██▎ | 1623/6885 [11:16:07<4:09:18, 2.84s/it] 24%|██▎ | 1624/6885 [11:16:08<3:44:13, 2.56s/it] 24%|██▎ | 1625/6885 [11:16:10<3:26:42, 2.36s/it] 24%|██▎ | 1626/6885 [11:16:14<3:48:38, 2.61s/it] 24%|██▎ | 1627/6885 [11:16:17<4:22:41, 3.00s/it] 24%|██▎ | 1628/6885 [11:16:21<4:29:26, 3.08s/it] 24%|██▎ | 1629/6885 [11:16:23<4:19:44, 2.97s/it] 24%|██▎ | 1630/6885 [11:16:27<4:24:13, 3.02s/it] {'loss': 0.6144, 'grad_norm': 1.2493334973003818, 'learning_rate': 9.442768300007511e-06, 'epoch': 0.24} 24%|██▎ | 1630/6885 [11:16:27<4:24:13, 3.02s/it] 24%|██▎ | 1631/6885 [11:16:30<4:46:54, 3.28s/it] 24%|██▎ | 1632/6885 [11:16:33<4:31:58, 3.11s/it] 24%|██▎ | 1633/6885 [11:16:36<4:26:37, 3.05s/it] 24%|██▎ | 1634/6885 [11:16:39<4:13:34, 2.90s/it] 24%|██▎ | 1635/6885 [11:16:41<4:04:18, 2.79s/it] 24%|██▍ | 1636/6885 [11:16:45<4:23:20, 3.01s/it] 24%|██▍ | 1637/6885 [11:16:49<4:48:13, 3.30s/it] 24%|██▍ | 1638/6885 [11:16:52<4:40:34, 3.21s/it] 24%|██▍ | 1639/6885 [11:16:56<5:09:17, 3.54s/it] 24%|██▍ | 1640/6885 [11:16:59<5:08:57, 3.53s/it] {'loss': 0.6245, 'grad_norm': 1.2775874117960886, 'learning_rate': 9.431080536559631e-06, 'epoch': 0.24} 24%|██▍ | 1640/6885 [11:16:59<5:08:57, 3.53s/it] 24%|██▍ | 1641/6885 [11:17:02<4:48:14, 3.30s/it] 24%|██▍ | 1642/6885 [11:17:05<4:36:04, 3.16s/it] 24%|██▍ | 1643/6885 [11:17:07<4:11:26, 2.88s/it] 24%|██▍ | 1644/6885 [11:17:10<4:18:37, 2.96s/it] 24%|██▍ | 1645/6885 [11:17:14<4:26:52, 3.06s/it] 24%|██▍ | 1646/6885 [11:17:17<4:33:30, 3.13s/it] 24%|██▍ | 1647/6885 [11:17:20<4:17:59, 2.96s/it] 24%|██▍ | 1648/6885 [11:17:22<3:58:08, 2.73s/it] 24%|██▍ | 1649/6885 [11:17:24<3:38:28, 2.50s/it] 24%|██▍ | 1650/6885 [11:17:27<3:53:31, 2.68s/it] {'loss': 0.6279, 'grad_norm': 1.247039996382283, 'learning_rate': 9.419278856872154e-06, 'epoch': 0.24} 24%|██▍ | 1650/6885 [11:17:27<3:53:31, 2.68s/it] 24%|██▍ | 1651/6885 [11:17:30<4:08:19, 2.85s/it] 24%|██▍ | 1652/6885 [11:17:32<3:54:11, 2.69s/it] 24%|██▍ | 1653/6885 [11:17:36<4:19:00, 2.97s/it] 24%|██▍ | 
1654/6885 [11:17:38<3:58:46, 2.74s/it] 24%|██▍ | 1655/6885 [11:17:40<3:37:13, 2.49s/it] 24%|██▍ | 1656/6885 [11:17:42<3:30:48, 2.42s/it] 24%|██▍ | 1657/6885 [11:17:45<3:30:57, 2.42s/it] 24%|██▍ | 1658/6885 [11:17:47<3:38:22, 2.51s/it] 24%|██▍ | 1659/6885 [11:17:52<4:19:22, 2.98s/it] 24%|██▍ | 1660/6885 [11:17:55<4:36:33, 3.18s/it] {'loss': 0.5933, 'grad_norm': 1.302601682600637, 'learning_rate': 9.407363564348047e-06, 'epoch': 0.24} 24%|██▍ | 1660/6885 [11:17:55<4:36:33, 3.18s/it] 24%|██▍ | 1661/6885 [11:18:01<5:38:52, 3.89s/it] 24%|██▍ | 1662/6885 [11:18:03<5:00:25, 3.45s/it] 24%|██▍ | 1663/6885 [11:18:06<4:48:36, 3.32s/it] 24%|██▍ | 1664/6885 [11:18:09<4:23:34, 3.03s/it] 24%|██▍ | 1665/6885 [11:18:10<3:50:40, 2.65s/it] 24%|██▍ | 1666/6885 [11:18:13<3:46:22, 2.60s/it] 24%|██▍ | 1667/6885 [11:18:16<3:54:34, 2.70s/it] 24%|██▍ | 1668/6885 [11:18:19<4:08:27, 2.86s/it] 24%|██▍ | 1669/6885 [11:18:21<3:59:27, 2.75s/it] 24%|██▍ | 1670/6885 [11:18:23<3:33:14, 2.45s/it] {'loss': 0.6171, 'grad_norm': 1.431347455463815, 'learning_rate': 9.39533496531108e-06, 'epoch': 0.24} 24%|██▍ | 1670/6885 [11:18:23<3:33:14, 2.45s/it] 24%|██▍ | 1671/6885 [11:18:26<3:44:16, 2.58s/it] 24%|██▍ | 1672/6885 [11:18:29<3:41:59, 2.56s/it] 24%|██▍ | 1673/6885 [11:18:31<3:33:49, 2.46s/it] 24%|██▍ | 1674/6885 [11:18:33<3:25:14, 2.36s/it] 24%|██▍ | 1675/6885 [11:18:36<3:53:44, 2.69s/it] 24%|██▍ | 1676/6885 [11:18:39<3:38:11, 2.51s/it] 24%|██▍ | 1677/6885 [11:18:41<3:44:13, 2.58s/it] 24%|██▍ | 1678/6885 [11:18:44<3:38:40, 2.52s/it] 24%|██▍ | 1679/6885 [11:18:50<5:08:18, 3.55s/it] 24%|██▍ | 1680/6885 [11:18:53<4:52:35, 3.37s/it] {'loss': 0.6099, 'grad_norm': 1.2527655662771335, 'learning_rate': 9.38319336899797e-06, 'epoch': 0.24} 24%|██▍ | 1680/6885 [11:18:53<4:52:35, 3.37s/it] 24%|██▍ | 1681/6885 [11:18:56<4:43:24, 3.27s/it] 24%|██▍ | 1682/6885 [11:18:58<4:09:44, 2.88s/it] 24%|██▍ | 1683/6885 [11:18:59<3:37:27, 2.51s/it] 24%|██▍ | 1684/6885 [11:19:02<3:36:32, 2.50s/it] 24%|██▍ | 1685/6885 
[11:19:05<3:53:38, 2.70s/it] 24%|██▍ | 1686/6885 [11:19:07<3:46:47, 2.62s/it] 25%|██▍ | 1687/6885 [11:19:11<4:10:10, 2.89s/it] 25%|██▍ | 1688/6885 [11:19:14<4:09:32, 2.88s/it] 25%|██▍ | 1689/6885 [11:19:17<4:32:26, 3.15s/it] 25%|██▍ | 1690/6885 [11:19:20<4:24:29, 3.05s/it] {'loss': 0.6077, 'grad_norm': 1.205551788839019, 'learning_rate': 9.370939087550407e-06, 'epoch': 0.25} 25%|██▍ | 1690/6885 [11:19:20<4:24:29, 3.05s/it] 25%|██▍ | 1691/6885 [11:19:23<4:14:58, 2.95s/it] 25%|██▍ | 1692/6885 [11:19:25<3:43:10, 2.58s/it] 25%|██▍ | 1693/6885 [11:19:28<4:02:11, 2.80s/it] 25%|██▍ | 1694/6885 [11:19:30<3:34:56, 2.48s/it] 25%|██▍ | 1695/6885 [11:19:32<3:18:49, 2.30s/it] 25%|██▍ | 1696/6885 [11:19:34<3:14:14, 2.25s/it] 25%|██▍ | 1697/6885 [11:19:36<3:26:54, 2.39s/it] 25%|██▍ | 1698/6885 [11:19:40<3:46:04, 2.62s/it] 25%|██▍ | 1699/6885 [11:19:41<3:27:14, 2.40s/it] 25%|██▍ | 1700/6885 [11:19:45<3:53:41, 2.70s/it] {'loss': 0.6126, 'grad_norm': 1.332981320431861, 'learning_rate': 9.358572436007052e-06, 'epoch': 0.25} 25%|██▍ | 1700/6885 [11:19:45<3:53:41, 2.70s/it] 25%|██▍ | 1701/6885 [11:19:48<3:58:38, 2.76s/it] 25%|██▍ | 1702/6885 [11:19:52<4:41:37, 3.26s/it] 25%|██▍ | 1703/6885 [11:19:55<4:23:49, 3.05s/it] 25%|██▍ | 1704/6885 [11:20:00<5:14:18, 3.64s/it] 25%|██▍ | 1705/6885 [11:20:05<6:02:12, 4.20s/it] 25%|██▍ | 1706/6885 [11:20:08<5:24:59, 3.77s/it] 25%|██▍ | 1707/6885 [11:20:11<5:00:26, 3.48s/it] 25%|██▍ | 1708/6885 [11:20:14<4:55:43, 3.43s/it] 25%|██▍ | 1709/6885 [11:20:18<4:56:48, 3.44s/it] 25%|██▍ | 1710/6885 [11:20:20<4:22:15, 3.04s/it] {'loss': 0.6141, 'grad_norm': 1.2112905977700383, 'learning_rate': 9.346093732295422e-06, 'epoch': 0.25} 25%|██▍ | 1710/6885 [11:20:20<4:22:15, 3.04s/it] 25%|██▍ | 1711/6885 [11:20:22<4:04:18, 2.83s/it] 25%|██▍ | 1712/6885 [11:20:24<3:52:17, 2.69s/it] 25%|██▍ | 1713/6885 [11:20:31<5:28:07, 3.81s/it] 25%|██▍ | 1714/6885 [11:20:35<5:28:14, 3.81s/it] 25%|██▍ | 1715/6885 [11:20:40<6:02:33, 4.21s/it] 25%|██▍ | 1716/6885 [11:20:43<5:22:46, 
3.75s/it] 25%|██▍ | 1717/6885 [11:20:45<4:55:25, 3.43s/it] 25%|██▍ | 1718/6885 [11:20:48<4:44:00, 3.30s/it] 25%|██▍ | 1719/6885 [11:20:50<4:07:26, 2.87s/it] 25%|██▍ | 1720/6885 [11:20:53<4:07:07, 2.87s/it] {'loss': 0.5977, 'grad_norm': 1.1741115783770129, 'learning_rate': 9.333503297223725e-06, 'epoch': 0.25} 25%|██▍ | 1720/6885 [11:20:53<4:07:07, 2.87s/it] 25%|██▍ | 1721/6885 [11:20:56<4:01:02, 2.80s/it] 25%|██▌ | 1722/6885 [11:20:59<4:14:56, 2.96s/it] 25%|██▌ | 1723/6885 [11:21:02<4:23:27, 3.06s/it] 25%|██▌ | 1724/6885 [11:21:06<4:38:30, 3.24s/it] 25%|██▌ | 1725/6885 [11:21:09<4:35:41, 3.21s/it] 25%|██▌ | 1726/6885 [11:21:12<4:24:56, 3.08s/it] 25%|██▌ | 1727/6885 [11:21:15<4:17:50, 3.00s/it] 25%|██▌ | 1728/6885 [11:21:17<3:51:57, 2.70s/it] 25%|██▌ | 1729/6885 [11:21:19<3:44:04, 2.61s/it] 25%|██▌ | 1730/6885 [11:21:23<4:25:42, 3.09s/it] {'loss': 0.6213, 'grad_norm': 1.2308239868942004, 'learning_rate': 9.320801454472607e-06, 'epoch': 0.25} 25%|██▌ | 1730/6885 [11:21:23<4:25:42, 3.09s/it] 25%|██▌ | 1731/6885 [11:21:25<4:03:03, 2.83s/it] 25%|██▌ | 1732/6885 [11:21:27<3:41:17, 2.58s/it] 25%|██▌ | 1733/6885 [11:21:31<4:20:03, 3.03s/it] 25%|██▌ | 1734/6885 [11:21:35<4:35:46, 3.21s/it] 25%|██▌ | 1735/6885 [11:21:39<4:49:50, 3.38s/it] 25%|██▌ | 1736/6885 [11:21:41<4:24:42, 3.08s/it] 25%|██▌ | 1737/6885 [11:21:44<4:12:00, 2.94s/it] 25%|██▌ | 1738/6885 [11:21:47<4:23:22, 3.07s/it] 25%|██▌ | 1739/6885 [11:21:50<4:24:04, 3.08s/it] 25%|██▌ | 1740/6885 [11:21:52<3:59:36, 2.79s/it] {'loss': 0.6217, 'grad_norm': 1.3933258283474292, 'learning_rate': 9.30798853058684e-06, 'epoch': 0.25} 25%|██▌ | 1740/6885 [11:21:52<3:59:36, 2.79s/it] 25%|██▌ | 1741/6885 [11:21:55<4:01:30, 2.82s/it] 25%|██▌ | 1742/6885 [11:21:57<3:37:11, 2.53s/it] 25%|██▌ | 1743/6885 [11:21:59<3:17:16, 2.30s/it] 25%|██▌ | 1744/6885 [11:22:04<4:25:19, 3.10s/it] 25%|██▌ | 1745/6885 [11:22:07<4:16:50, 3.00s/it] 25%|██▌ | 1746/6885 [11:22:10<4:16:43, 3.00s/it] 25%|██▌ | 1747/6885 [11:22:12<4:09:45, 2.92s/it] 25%|██▌ | 
1748/6885 [11:22:15<3:57:42, 2.78s/it] 25%|██▌ | 1749/6885 [11:22:17<3:50:00, 2.69s/it] 25%|██▌ | 1750/6885 [11:22:21<4:07:49, 2.90s/it] {'loss': 0.6089, 'grad_norm': 1.2467959691205432, 'learning_rate': 9.29506485496691e-06, 'epoch': 0.25} 25%|██▌ | 1750/6885 [11:22:21<4:07:49, 2.90s/it] 25%|██▌ | 1751/6885 [11:22:24<4:26:33, 3.12s/it] 25%|██▌ | 1752/6885 [11:22:26<3:50:39, 2.70s/it] 25%|██▌ | 1753/6885 [11:22:30<4:13:53, 2.97s/it] 25%|██▌ | 1754/6885 [11:22:32<4:07:00, 2.89s/it] 25%|██▌ | 1755/6885 [11:22:35<3:55:55, 2.76s/it] 26%|██▌ | 1756/6885 [11:22:38<4:08:34, 2.91s/it] 26%|██▌ | 1757/6885 [11:22:41<4:03:02, 2.84s/it] 26%|██▌ | 1758/6885 [11:22:44<4:17:51, 3.02s/it] 26%|██▌ | 1759/6885 [11:22:47<4:03:55, 2.86s/it] 26%|██▌ | 1760/6885 [11:22:50<4:16:35, 3.00s/it] {'loss': 0.6113, 'grad_norm': 1.106847677662664, 'learning_rate': 9.282030759860566e-06, 'epoch': 0.26} 26%|██▌ | 1760/6885 [11:22:50<4:16:35, 3.00s/it] 26%|██▌ | 1761/6885 [11:22:51<3:33:06, 2.50s/it] 26%|██▌ | 1762/6885 [11:22:55<4:10:22, 2.93s/it] 26%|██▌ | 1763/6885 [11:22:58<3:57:47, 2.79s/it] 26%|██▌ | 1764/6885 [11:23:00<3:35:35, 2.53s/it] 26%|██▌ | 1765/6885 [11:23:02<3:22:49, 2.38s/it] 26%|██▌ | 1766/6885 [11:23:06<3:59:18, 2.80s/it] 26%|██▌ | 1767/6885 [11:23:12<5:30:39, 3.88s/it] 26%|██▌ | 1768/6885 [11:23:15<5:06:59, 3.60s/it] 26%|██▌ | 1769/6885 [11:23:20<5:40:59, 4.00s/it] 26%|██▌ | 1770/6885 [11:23:23<5:10:15, 3.64s/it] {'loss': 0.6041, 'grad_norm': 1.225606521070107, 'learning_rate': 9.268886580354272e-06, 'epoch': 0.26} 26%|██▌ | 1770/6885 [11:23:23<5:10:15, 3.64s/it] 26%|██▌ | 1771/6885 [11:23:26<5:03:21, 3.56s/it] 26%|██▌ | 1772/6885 [11:23:28<4:30:25, 3.17s/it] 26%|██▌ | 1773/6885 [11:23:32<4:49:24, 3.40s/it] 26%|██▌ | 1774/6885 [11:23:35<4:39:09, 3.28s/it] 26%|██▌ | 1775/6885 [11:23:40<5:09:56, 3.64s/it] 26%|██▌ | 1776/6885 [11:23:43<5:00:07, 3.52s/it] 26%|██▌ | 1777/6885 [11:23:46<4:43:02, 3.32s/it] 26%|██▌ | 1778/6885 [11:23:48<4:14:34, 2.99s/it] 26%|██▌ | 1779/6885 
[11:23:50<3:57:38, 2.79s/it] 26%|██▌ | 1780/6885 [11:23:52<3:33:15, 2.51s/it] {'loss': 0.6112, 'grad_norm': 1.1249241718792773, 'learning_rate': 9.255632654364591e-06, 'epoch': 0.26} 26%|██▌ | 1780/6885 [11:23:52<3:33:15, 2.51s/it] 26%|██▌ | 1781/6885 [11:23:56<4:14:19, 2.99s/it] 26%|██▌ | 1782/6885 [11:24:00<4:27:01, 3.14s/it] 26%|██▌ | 1783/6885 [11:24:03<4:25:34, 3.12s/it] 26%|██▌ | 1784/6885 [11:24:05<3:57:52, 2.80s/it] 26%|██▌ | 1785/6885 [11:24:07<3:36:18, 2.54s/it] 26%|██▌ | 1786/6885 [11:24:09<3:34:32, 2.52s/it] 26%|██▌ | 1787/6885 [11:24:12<3:47:10, 2.67s/it] 26%|██▌ | 1788/6885 [11:24:15<3:48:36, 2.69s/it] 26%|██▌ | 1789/6885 [11:24:18<4:02:28, 2.85s/it] 26%|██▌ | 1790/6885 [11:24:21<4:04:35, 2.88s/it] {'loss': 0.6003, 'grad_norm': 1.2347205288363368, 'learning_rate': 9.242269322629494e-06, 'epoch': 0.26} 26%|██▌ | 1790/6885 [11:24:21<4:04:35, 2.88s/it] 26%|██▌ | 1791/6885 [11:24:24<4:02:00, 2.85s/it] 26%|██▌ | 1792/6885 [11:24:27<4:02:10, 2.85s/it] 26%|██▌ | 1793/6885 [11:24:29<3:45:14, 2.65s/it] 26%|██▌ | 1794/6885 [11:24:32<4:05:34, 2.89s/it] 26%|██▌ | 1795/6885 [11:24:34<3:40:23, 2.60s/it] 26%|██▌ | 1796/6885 [11:24:38<4:07:42, 2.92s/it] 26%|██▌ | 1797/6885 [11:24:41<4:00:36, 2.84s/it] 26%|██▌ | 1798/6885 [11:24:43<3:56:07, 2.78s/it] 26%|██▌ | 1799/6885 [11:24:46<4:01:05, 2.84s/it] 26%|██▌ | 1800/6885 [11:24:48<3:36:45, 2.56s/it] {'loss': 0.6187, 'grad_norm': 1.3040805105750026, 'learning_rate': 9.228796928699613e-06, 'epoch': 0.26} 26%|██▌ | 1800/6885 [11:24:48<3:36:45, 2.56s/it] 26%|██▌ | 1801/6885 [11:24:51<3:46:22, 2.67s/it] 26%|██▌ | 1802/6885 [11:24:55<4:22:55, 3.10s/it] 26%|██▌ | 1803/6885 [11:24:58<4:23:06, 3.11s/it] 26%|██▌ | 1804/6885 [11:25:04<5:15:41, 3.73s/it] 26%|██▌ | 1805/6885 [11:25:05<4:27:36, 3.16s/it] 26%|██▌ | 1806/6885 [11:25:10<5:08:12, 3.64s/it] 26%|██▌ | 1807/6885 [11:25:12<4:32:45, 3.22s/it] 26%|██▋ | 1808/6885 [11:25:15<4:05:03, 2.90s/it] 26%|██▋ | 1809/6885 [11:25:18<4:12:41, 2.99s/it] 26%|██▋ | 1810/6885 [11:25:20<3:42:49, 
2.63s/it] {'loss': 0.612, 'grad_norm': 1.4585670240799034, 'learning_rate': 9.215215818929392e-06, 'epoch': 0.26} 26%|██▋ | 1810/6885 [11:25:20<3:42:49, 2.63s/it] 26%|██▋ | 1811/6885 [11:25:23<4:02:25, 2.87s/it] 26%|██▋ | 1812/6885 [11:25:25<3:47:24, 2.69s/it] 26%|██▋ | 1813/6885 [11:25:30<4:40:23, 3.32s/it] 26%|██▋ | 1814/6885 [11:25:32<4:03:28, 2.88s/it] 26%|██▋ | 1815/6885 [11:25:35<4:16:50, 3.04s/it] 26%|██▋ | 1816/6885 [11:25:38<4:19:07, 3.07s/it] 26%|██▋ | 1817/6885 [11:25:41<4:11:49, 2.98s/it] 26%|██▋ | 1818/6885 [11:25:44<4:00:34, 2.85s/it] 26%|██▋ | 1819/6885 [11:25:47<3:58:09, 2.82s/it] 26%|██▋ | 1820/6885 [11:25:49<3:58:53, 2.83s/it] {'loss': 0.6124, 'grad_norm': 1.0974130075617774, 'learning_rate': 9.201526342468202e-06, 'epoch': 0.26} 26%|██▋ | 1820/6885 [11:25:49<3:58:53, 2.83s/it] 26%|██▋ | 1821/6885 [11:25:53<4:11:26, 2.98s/it] 26%|██▋ | 1822/6885 [11:25:56<4:21:02, 3.09s/it] 26%|██▋ | 1823/6885 [11:25:58<3:53:06, 2.76s/it] 26%|██▋ | 1824/6885 [11:26:01<4:02:55, 2.88s/it] 27%|██▋ | 1825/6885 [11:26:03<3:36:59, 2.57s/it] 27%|██▋ | 1826/6885 [11:26:05<3:09:47, 2.25s/it] 27%|██▋ | 1827/6885 [11:26:10<4:39:28, 3.32s/it] 27%|██▋ | 1828/6885 [11:26:13<4:24:27, 3.14s/it] 27%|██▋ | 1829/6885 [11:26:16<4:19:23, 3.08s/it] 27%|██▋ | 1830/6885 [11:26:19<4:17:25, 3.06s/it] {'loss': 0.6055, 'grad_norm': 1.2918051377461068, 'learning_rate': 9.18772885125134e-06, 'epoch': 0.27} 27%|██▋ | 1830/6885 [11:26:19<4:17:25, 3.06s/it] 27%|██▋ | 1831/6885 [11:26:22<4:22:00, 3.11s/it] 27%|██▋ | 1832/6885 [11:26:25<4:07:55, 2.94s/it] 27%|██▋ | 1833/6885 [11:26:27<3:46:40, 2.69s/it] 27%|██▋ | 1834/6885 [11:26:29<3:40:48, 2.62s/it] 27%|██▋ | 1835/6885 [11:26:31<3:25:36, 2.44s/it] 27%|██▋ | 1836/6885 [11:26:35<3:50:51, 2.74s/it] 27%|██▋ | 1837/6885 [11:26:38<3:51:44, 2.75s/it] 27%|██▋ | 1838/6885 [11:26:42<4:31:43, 3.23s/it] 27%|██▋ | 1839/6885 [11:26:46<4:45:24, 3.39s/it] 27%|██▋ | 1840/6885 [11:26:48<4:21:20, 3.11s/it] {'loss': 0.6086, 'grad_norm': 1.199609927095931, 
'learning_rate': 9.17382369999101e-06, 'epoch': 0.27} 27%|██▋ | 1840/6885 [11:26:48<4:21:20, 3.11s/it] 27%|██▋ | 1841/6885 [11:26:51<4:11:44, 2.99s/it] 27%|██▋ | 1842/6885 [11:26:54<4:25:06, 3.15s/it] 27%|██▋ | 1843/6885 [11:26:59<4:58:40, 3.55s/it] 27%|██▋ | 1844/6885 [11:27:02<4:47:38, 3.42s/it] 27%|██▋ | 1845/6885 [11:27:05<4:29:14, 3.21s/it] 27%|██▋ | 1846/6885 [11:27:08<4:20:41, 3.10s/it] 27%|██▋ | 1847/6885 [11:27:10<4:05:30, 2.92s/it] 27%|██▋ | 1848/6885 [11:27:13<3:52:18, 2.77s/it] 27%|██▋ | 1849/6885 [11:27:15<3:47:59, 2.72s/it] 27%|██▋ | 1850/6885 [11:27:20<4:32:42, 3.25s/it] {'loss': 0.6111, 'grad_norm': 1.2736244478450063, 'learning_rate': 9.159811246167182e-06, 'epoch': 0.27} 27%|██▋ | 1850/6885 [11:27:20<4:32:42, 3.25s/it] 27%|██▋ | 1851/6885 [11:27:22<4:23:00, 3.13s/it] 27%|██▋ | 1852/6885 [11:27:24<3:53:02, 2.78s/it] 27%|██▋ | 1853/6885 [11:27:27<3:54:41, 2.80s/it] 27%|██▋ | 1854/6885 [11:27:30<3:40:59, 2.64s/it] 27%|██▋ | 1855/6885 [11:27:32<3:24:29, 2.44s/it] 27%|██▋ | 1856/6885 [11:27:34<3:31:39, 2.53s/it] 27%|██▋ | 1857/6885 [11:27:38<4:02:30, 2.89s/it] 27%|██▋ | 1858/6885 [11:27:40<3:37:57, 2.60s/it] 27%|██▋ | 1859/6885 [11:27:43<3:44:16, 2.68s/it] 27%|██▋ | 1860/6885 [11:27:45<3:26:16, 2.46s/it] {'loss': 0.5951, 'grad_norm': 1.2484696326393374, 'learning_rate': 9.14569185001841e-06, 'epoch': 0.27} 27%|██▋ | 1860/6885 [11:27:45<3:26:16, 2.46s/it] 27%|██▋ | 1861/6885 [11:27:47<3:27:19, 2.48s/it] 27%|██▋ | 1862/6885 [11:27:52<4:13:50, 3.03s/it] 27%|██▋ | 1863/6885 [11:27:55<4:13:28, 3.03s/it] 27%|██▋ | 1864/6885 [11:27:59<4:37:07, 3.31s/it] 27%|██▋ | 1865/6885 [11:28:02<4:31:25, 3.24s/it] 27%|██▋ | 1866/6885 [11:28:05<4:43:20, 3.39s/it] 27%|██▋ | 1867/6885 [11:28:08<4:18:43, 3.09s/it] 27%|██▋ | 1868/6885 [11:28:10<3:50:55, 2.76s/it] 27%|██▋ | 1869/6885 [11:28:14<4:31:23, 3.25s/it] 27%|██▋ | 1870/6885 [11:28:17<4:23:00, 3.15s/it] {'loss': 0.5861, 'grad_norm': 1.3221301583704237, 'learning_rate': 9.131465874532568e-06, 'epoch': 0.27} 27%|██▋ | 
1870/6885 [11:28:17<4:23:00, 3.15s/it] 27%|██▋ | 1871/6885 [11:28:19<4:03:40, 2.92s/it] 27%|██▋ | 1872/6885 [11:28:23<4:09:15, 2.98s/it] 27%|██▋ | 1873/6885 [11:28:25<4:01:52, 2.90s/it] 27%|██▋ | 1874/6885 [11:28:32<5:43:25, 4.11s/it] 27%|██▋ | 1875/6885 [11:28:35<5:08:09, 3.69s/it] 27%|██▋ | 1876/6885 [11:28:39<5:28:37, 3.94s/it] 27%|██▋ | 1877/6885 [11:28:42<4:57:27, 3.56s/it] 27%|██▋ | 1878/6885 [11:28:46<5:13:04, 3.75s/it] 27%|██▋ | 1879/6885 [11:28:48<4:29:42, 3.23s/it] 27%|██▋ | 1880/6885 [11:28:51<4:24:33, 3.17s/it] {'loss': 0.6073, 'grad_norm': 1.2578322361866867, 'learning_rate': 9.117133685437524e-06, 'epoch': 0.27} 27%|██▋ | 1880/6885 [11:28:51<4:24:33, 3.17s/it] 27%|██▋ | 1881/6885 [11:28:55<4:43:30, 3.40s/it] 27%|██▋ | 1882/6885 [11:28:58<4:28:23, 3.22s/it] 27%|██▋ | 1883/6885 [11:29:01<4:15:53, 3.07s/it] 27%|██▋ | 1884/6885 [11:29:03<4:02:16, 2.91s/it] 27%|██▋ | 1885/6885 [11:29:06<4:08:17, 2.98s/it] 27%|██▋ | 1886/6885 [11:29:09<3:44:12, 2.69s/it] 27%|██▋ | 1887/6885 [11:29:11<3:36:49, 2.60s/it] 27%|██▋ | 1888/6885 [11:29:14<3:45:43, 2.71s/it] 27%|██▋ | 1889/6885 [11:29:16<3:27:05, 2.49s/it] 27%|██▋ | 1890/6885 [11:29:19<3:36:34, 2.60s/it] {'loss': 0.5838, 'grad_norm': 1.3260698149158467, 'learning_rate': 9.102695651191737e-06, 'epoch': 0.27} 27%|██▋ | 1890/6885 [11:29:19<3:36:34, 2.60s/it] 27%|██▋ | 1891/6885 [11:29:22<3:55:33, 2.83s/it] 27%|██▋ | 1892/6885 [11:29:25<4:04:28, 2.94s/it] 27%|██▋ | 1893/6885 [11:29:28<4:01:11, 2.90s/it] 28%|██▊ | 1894/6885 [11:29:30<3:27:34, 2.50s/it] 28%|██▊ | 1895/6885 [11:29:33<3:39:29, 2.64s/it] 28%|██▊ | 1896/6885 [11:29:34<3:12:49, 2.32s/it] 28%|██▊ | 1897/6885 [11:29:37<3:36:37, 2.61s/it] 28%|██▊ | 1898/6885 [11:29:41<4:02:55, 2.92s/it] 28%|██▊ | 1899/6885 [11:29:44<3:53:05, 2.80s/it] 28%|██▊ | 1900/6885 [11:29:46<3:46:49, 2.73s/it] {'loss': 0.6013, 'grad_norm': 1.2373193794097532, 'learning_rate': 9.088152142974771e-06, 'epoch': 0.28} 28%|██▊ | 1900/6885 [11:29:46<3:46:49, 2.73s/it] 28%|██▊ | 1901/6885 
[11:29:50<4:12:52, 3.04s/it] 28%|██▊ | 1902/6885 [11:29:53<4:22:38, 3.16s/it] 28%|██▊ | 1903/6885 [11:29:57<4:32:52, 3.29s/it] 28%|██▊ | 1904/6885 [11:30:00<4:36:47, 3.33s/it] 28%|██▊ | 1905/6885 [11:30:05<5:03:23, 3.66s/it] 28%|██▊ | 1906/6885 [11:30:08<4:44:34, 3.43s/it] 28%|██▊ | 1907/6885 [11:30:11<4:38:27, 3.36s/it] 28%|██▊ | 1908/6885 [11:30:14<4:19:41, 3.13s/it] 28%|██▊ | 1909/6885 [11:30:18<4:53:47, 3.54s/it] 28%|██▊ | 1910/6885 [11:30:20<4:25:30, 3.20s/it] {'loss': 0.6219, 'grad_norm': 1.1997047870357698, 'learning_rate': 9.073503534677773e-06, 'epoch': 0.28} 28%|██▊ | 1910/6885 [11:30:20<4:25:30, 3.20s/it] 28%|██▊ | 1911/6885 [11:30:25<4:59:31, 3.61s/it] 28%|██▊ | 1912/6885 [11:30:28<4:35:53, 3.33s/it] 28%|██▊ | 1913/6885 [11:30:30<4:20:50, 3.15s/it] 28%|██▊ | 1914/6885 [11:30:35<4:56:20, 3.58s/it] 28%|██▊ | 1915/6885 [11:30:39<5:01:51, 3.64s/it] 28%|██▊ | 1916/6885 [11:30:44<5:31:19, 4.00s/it] 28%|██▊ | 1917/6885 [11:30:46<4:53:51, 3.55s/it] 28%|██▊ | 1918/6885 [11:30:49<4:27:59, 3.24s/it] 28%|██▊ | 1919/6885 [11:30:53<4:45:42, 3.45s/it] 28%|██▊ | 1920/6885 [11:30:54<3:58:17, 2.88s/it] {'loss': 0.6052, 'grad_norm': 1.2769112952981858, 'learning_rate': 9.058750202893844e-06, 'epoch': 0.28} 28%|██▊ | 1920/6885 [11:30:54<3:58:17, 2.88s/it] 28%|██▊ | 1921/6885 [11:30:59<4:46:15, 3.46s/it] 28%|██▊ | 1922/6885 [11:31:03<5:02:27, 3.66s/it] 28%|██▊ | 1923/6885 [11:31:05<4:22:15, 3.17s/it] 28%|██▊ | 1924/6885 [11:31:09<4:35:03, 3.33s/it] 28%|██▊ | 1925/6885 [11:31:12<4:38:46, 3.37s/it] 28%|██▊ | 1926/6885 [11:31:14<4:09:54, 3.02s/it] 28%|██▊ | 1927/6885 [11:31:17<3:59:38, 2.90s/it] 28%|██▊ | 1928/6885 [11:31:20<4:12:21, 3.05s/it] 28%|██▊ | 1929/6885 [11:31:24<4:17:17, 3.11s/it] 28%|██▊ | 1930/6885 [11:31:28<4:54:05, 3.56s/it] {'loss': 0.6124, 'grad_norm': 1.2302296498321919, 'learning_rate': 9.04389252690837e-06, 'epoch': 0.28} 28%|██▊ | 1930/6885 [11:31:28<4:54:05, 3.56s/it] 28%|██▊ | 1931/6885 [11:31:32<4:51:53, 3.54s/it] 28%|██▊ | 1932/6885 [11:31:34<4:25:41, 
3.22s/it] 28%|██▊ | 1933/6885 [11:31:37<4:03:38, 2.95s/it] 28%|██▊ | 1934/6885 [11:31:39<3:58:24, 2.89s/it] 28%|██▊ | 1935/6885 [11:31:42<4:00:22, 2.91s/it] 28%|██▊ | 1936/6885 [11:31:44<3:29:42, 2.54s/it] 28%|██▊ | 1937/6885 [11:31:47<3:35:24, 2.61s/it] 28%|██▊ | 1938/6885 [11:31:49<3:36:42, 2.63s/it] 28%|██▊ | 1939/6885 [11:31:52<3:26:43, 2.51s/it] 28%|██▊ | 1940/6885 [11:31:55<3:34:46, 2.61s/it] {'loss': 0.604, 'grad_norm': 1.2009594091858158, 'learning_rate': 9.02893088868926e-06, 'epoch': 0.28} 28%|██▊ | 1940/6885 [11:31:55<3:34:46, 2.61s/it] 28%|██▊ | 1941/6885 [11:31:59<4:32:36, 3.31s/it] 28%|██▊ | 1942/6885 [11:32:02<4:17:49, 3.13s/it] 28%|██▊ | 1943/6885 [11:32:05<4:13:11, 3.07s/it] 28%|██▊ | 1944/6885 [11:32:08<4:11:15, 3.05s/it] 28%|██▊ | 1945/6885 [11:32:11<4:13:54, 3.08s/it] 28%|██▊ | 1946/6885 [11:32:14<4:06:14, 2.99s/it] 28%|██▊ | 1947/6885 [11:32:17<4:16:38, 3.12s/it] 28%|██▊ | 1948/6885 [11:32:19<3:46:46, 2.76s/it] 28%|██▊ | 1949/6885 [11:32:21<3:26:18, 2.51s/it] 28%|██▊ | 1950/6885 [11:32:25<4:00:01, 2.92s/it] {'loss': 0.6052, 'grad_norm': 1.0539872600155336, 'learning_rate': 9.013865672877133e-06, 'epoch': 0.28} 28%|██▊ | 1950/6885 [11:32:25<4:00:01, 2.92s/it] 28%|██▊ | 1951/6885 [11:32:27<3:40:27, 2.68s/it] 28%|██▊ | 1952/6885 [11:32:29<3:24:40, 2.49s/it] 28%|██▊ | 1953/6885 [11:32:31<3:11:56, 2.34s/it] 28%|██▊ | 1954/6885 [11:32:34<3:21:25, 2.45s/it] 28%|██▊ | 1955/6885 [11:32:38<3:59:31, 2.92s/it] 28%|██▊ | 1956/6885 [11:32:40<3:45:26, 2.74s/it] 28%|██▊ | 1957/6885 [11:32:43<3:38:34, 2.66s/it] 28%|██▊ | 1958/6885 [11:32:45<3:29:41, 2.55s/it] 28%|██▊ | 1959/6885 [11:32:48<3:38:57, 2.67s/it] 28%|██▊ | 1960/6885 [11:32:50<3:30:27, 2.56s/it] {'loss': 0.6077, 'grad_norm': 1.2561895098497668, 'learning_rate': 8.998697266775433e-06, 'epoch': 0.28} 28%|██▊ | 1960/6885 [11:32:50<3:30:27, 2.56s/it] 28%|██▊ | 1961/6885 [11:32:53<3:19:03, 2.43s/it] 28%|██▊ | 1962/6885 [11:32:55<3:11:17, 2.33s/it] 29%|██▊ | 1963/6885 [11:32:57<3:09:44, 2.31s/it] 29%|██▊ | 
1964/6885 [11:32:59<3:15:43, 2.39s/it] 29%|██▊ | 1965/6885 [11:33:03<3:51:23, 2.82s/it] 29%|██▊ | 1966/6885 [11:33:07<4:03:40, 2.97s/it] 29%|██▊ | 1967/6885 [11:33:10<4:06:59, 3.01s/it] 29%|██▊ | 1968/6885 [11:33:12<3:43:32, 2.73s/it] 29%|██▊ | 1969/6885 [11:33:15<4:03:52, 2.98s/it] 29%|██▊ | 1970/6885 [11:33:18<3:50:09, 2.81s/it] {'loss': 0.6059, 'grad_norm': 1.2763583417414128, 'learning_rate': 8.98342606034046e-06, 'epoch': 0.29} 29%|██▊ | 1970/6885 [11:33:18<3:50:09, 2.81s/it] 29%|██▊ | 1971/6885 [11:33:20<3:29:52, 2.56s/it] 29%|██▊ | 1972/6885 [11:33:22<3:19:56, 2.44s/it] 29%|██▊ | 1973/6885 [11:33:24<3:22:04, 2.47s/it] 29%|██▊ | 1974/6885 [11:33:27<3:30:18, 2.57s/it] 29%|██▊ | 1975/6885 [11:33:30<3:36:49, 2.65s/it] 29%|██▊ | 1976/6885 [11:33:33<3:55:01, 2.87s/it] 29%|██▊ | 1977/6885 [11:33:37<4:15:16, 3.12s/it] 29%|██▊ | 1978/6885 [11:33:40<4:02:40, 2.97s/it] 29%|██▊ | 1979/6885 [11:33:42<3:53:43, 2.86s/it] 29%|██▉ | 1980/6885 [11:33:46<4:10:28, 3.06s/it] {'loss': 0.6183, 'grad_norm': 1.1463184995763767, 'learning_rate': 8.96805244617135e-06, 'epoch': 0.29} 29%|██▉ | 1980/6885 [11:33:46<4:10:28, 3.06s/it] 29%|██▉ | 1981/6885 [11:33:51<4:52:17, 3.58s/it] 29%|██▉ | 1982/6885 [11:33:55<5:05:45, 3.74s/it] 29%|██▉ | 1983/6885 [11:33:57<4:32:01, 3.33s/it] 29%|██▉ | 1984/6885 [11:34:01<4:49:55, 3.55s/it] 29%|██▉ | 1985/6885 [11:34:05<4:48:09, 3.53s/it] 29%|██▉ | 1986/6885 [11:34:07<4:20:10, 3.19s/it] 29%|██▉ | 1987/6885 [11:34:09<3:54:16, 2.87s/it] 29%|██▉ | 1988/6885 [11:34:12<3:53:28, 2.86s/it] 29%|██▉ | 1989/6885 [11:34:14<3:30:32, 2.58s/it] 29%|██▉ | 1990/6885 [11:34:18<4:10:08, 3.07s/it] {'loss': 0.602, 'grad_norm': 1.1421597790792624, 'learning_rate': 8.952576819499998e-06, 'epoch': 0.29} 29%|██▉ | 1990/6885 [11:34:18<4:10:08, 3.07s/it] 29%|██▉ | 1991/6885 [11:34:22<4:39:05, 3.42s/it] 29%|██▉ | 1992/6885 [11:34:25<4:18:33, 3.17s/it] 29%|██▉ | 1993/6885 [11:34:29<4:26:52, 3.27s/it] 29%|██▉ | 1994/6885 [11:34:32<4:27:35, 3.28s/it] 29%|██▉ | 1995/6885 
[11:34:35<4:33:15, 3.35s/it] 29%|██▉ | 1996/6885 [11:34:38<4:25:16, 3.26s/it] 29%|██▉ | 1997/6885 [11:34:41<4:03:24, 2.99s/it] 29%|██▉ | 1998/6885 [11:34:44<4:01:46, 2.97s/it] 29%|██▉ | 1999/6885 [11:34:45<3:30:14, 2.58s/it] 29%|██▉ | 2000/6885 [11:34:48<3:27:17, 2.55s/it] {'loss': 0.5925, 'grad_norm': 1.3046866547593934, 'learning_rate': 8.93699957818087e-06, 'epoch': 0.29} 29%|██▉ | 2000/6885 [11:34:48<3:27:17, 2.55s/it] 29%|██▉ | 2001/6885 [11:34:50<3:23:17, 2.50s/it] 29%|██▉ | 2002/6885 [11:34:54<3:47:20, 2.79s/it] 29%|██▉ | 2003/6885 [11:34:57<3:50:05, 2.83s/it] 29%|██▉ | 2004/6885 [11:34:59<3:45:01, 2.77s/it] 29%|██▉ | 2005/6885 [11:35:02<3:44:50, 2.76s/it] 29%|██▉ | 2006/6885 [11:35:06<4:03:36, 3.00s/it] 29%|██▉ | 2007/6885 [11:35:10<4:42:15, 3.47s/it] 29%|██▉ | 2008/6885 [11:35:12<3:53:22, 2.87s/it] 29%|██▉ | 2009/6885 [11:35:15<3:57:47, 2.93s/it] 29%|██▉ | 2010/6885 [11:35:18<4:04:18, 3.01s/it] {'loss': 0.6037, 'grad_norm': 1.27239619384718, 'learning_rate': 8.921321122680789e-06, 'epoch': 0.29} 29%|██▉ | 2010/6885 [11:35:18<4:04:18, 3.01s/it] 29%|██▉ | 2011/6885 [11:35:20<3:50:22, 2.84s/it] 29%|██▉ | 2012/6885 [11:35:23<3:39:21, 2.70s/it] 29%|██▉ | 2013/6885 [11:35:26<3:58:08, 2.93s/it] 29%|██▉ | 2014/6885 [11:35:30<4:09:07, 3.07s/it] 29%|██▉ | 2015/6885 [11:35:33<4:24:41, 3.26s/it] 29%|██▉ | 2016/6885 [11:35:37<4:27:30, 3.30s/it] 29%|██▉ | 2017/6885 [11:35:39<3:56:51, 2.92s/it] 29%|██▉ | 2018/6885 [11:35:42<4:07:16, 3.05s/it] 29%|██▉ | 2019/6885 [11:35:45<4:07:03, 3.05s/it] 29%|██▉ | 2020/6885 [11:35:47<3:46:38, 2.80s/it] {'loss': 0.6077, 'grad_norm': 1.3073284462474046, 'learning_rate': 8.905541856068641e-06, 'epoch': 0.29} 29%|██▉ | 2020/6885 [11:35:47<3:46:38, 2.80s/it] 29%|██▉ | 2021/6885 [11:35:50<3:56:28, 2.92s/it] 29%|██▉ | 2022/6885 [11:35:54<4:11:07, 3.10s/it] 29%|██▉ | 2023/6885 [11:35:56<3:55:08, 2.90s/it] 29%|██▉ | 2024/6885 [11:36:00<4:03:13, 3.00s/it] 29%|██▉ | 2025/6885 [11:36:06<5:35:49, 4.15s/it] 29%|██▉ | 2026/6885 [11:36:09<4:45:57, 
3.53s/it] 29%|██▉ | 2027/6885 [11:36:12<4:36:59, 3.42s/it] 29%|██▉ | 2028/6885 [11:36:15<4:26:15, 3.29s/it] 29%|██▉ | 2029/6885 [11:36:17<4:09:08, 3.08s/it] 29%|██▉ | 2030/6885 [11:36:20<4:11:00, 3.10s/it] {'loss': 0.6076, 'grad_norm': 1.2694028140938955, 'learning_rate': 8.889662184005007e-06, 'epoch': 0.29} 29%|██▉ | 2030/6885 [11:36:20<4:11:00, 3.10s/it] 29%|██▉ | 2031/6885 [11:36:24<4:23:40, 3.26s/it] 30%|██▉ | 2032/6885 [11:36:28<4:37:46, 3.43s/it] 30%|██▉ | 2033/6885 [11:36:30<4:04:05, 3.02s/it] 30%|██▉ | 2034/6885 [11:36:33<4:01:23, 2.99s/it] 30%|██▉ | 2035/6885 [11:36:35<3:42:12, 2.75s/it] 30%|██▉ | 2036/6885 [11:36:37<3:14:07, 2.40s/it] 30%|██▉ | 2037/6885 [11:36:39<3:17:33, 2.44s/it] 30%|██▉ | 2038/6885 [11:36:44<4:23:51, 3.27s/it] 30%|██▉ | 2039/6885 [11:36:47<3:59:07, 2.96s/it] 30%|██▉ | 2040/6885 [11:36:49<3:49:45, 2.85s/it] {'loss': 0.5986, 'grad_norm': 1.1075058528848678, 'learning_rate': 8.873682514731746e-06, 'epoch': 0.3} 30%|██▉ | 2040/6885 [11:36:49<3:49:45, 2.85s/it] 30%|██▉ | 2041/6885 [11:36:55<5:05:04, 3.78s/it] 30%|██▉ | 2042/6885 [11:36:57<4:18:20, 3.20s/it] 30%|██▉ | 2043/6885 [11:37:00<4:05:08, 3.04s/it] 30%|██▉ | 2044/6885 [11:37:03<4:02:40, 3.01s/it] 30%|██▉ | 2045/6885 [11:37:05<3:58:46, 2.96s/it] 30%|██▉ | 2046/6885 [11:37:09<4:05:49, 3.05s/it] 30%|██▉ | 2047/6885 [11:37:11<3:45:33, 2.80s/it] 30%|██▉ | 2048/6885 [11:37:13<3:19:38, 2.48s/it] 30%|██▉ | 2049/6885 [11:37:17<4:03:52, 3.03s/it] 30%|██▉ | 2050/6885 [11:37:20<4:03:42, 3.02s/it] {'loss': 0.5911, 'grad_norm': 1.25011183641691, 'learning_rate': 8.85760325906148e-06, 'epoch': 0.3} 30%|██▉ | 2050/6885 [11:37:20<4:03:42, 3.02s/it] 30%|██▉ | 2051/6885 [11:37:24<4:16:31, 3.18s/it] 30%|██▉ | 2052/6885 [11:37:26<4:03:02, 3.02s/it] 30%|██▉ | 2053/6885 [11:37:29<4:07:47, 3.08s/it] 30%|██▉ | 2054/6885 [11:37:33<4:18:45, 3.21s/it] 30%|██▉ | 2055/6885 [11:37:39<5:25:37, 4.04s/it] 30%|██▉ | 2056/6885 [11:37:43<5:15:34, 3.92s/it] 30%|██▉ | 2057/6885 [11:37:45<4:41:27, 3.50s/it] 30%|██▉ | 
2058/6885 [11:37:49<4:57:47, 3.70s/it] 30%|██▉ | 2059/6885 [11:37:53<4:47:02, 3.57s/it] 30%|██▉ | 2060/6885 [11:37:55<4:11:28, 3.13s/it] {'loss': 0.5918, 'grad_norm': 1.230690665069067, 'learning_rate': 8.841424830367051e-06, 'epoch': 0.3} 30%|██▉ | 2060/6885 [11:37:55<4:11:28, 3.13s/it] 30%|██▉ | 2061/6885 [11:37:58<4:17:29, 3.20s/it] 30%|██▉ | 2062/6885 [11:38:05<5:53:44, 4.40s/it] 30%|██▉ | 2063/6885 [11:38:08<5:15:53, 3.93s/it] 30%|██▉ | 2064/6885 [11:38:11<4:58:45, 3.72s/it] 30%|██▉ | 2065/6885 [11:38:15<4:49:41, 3.61s/it] 30%|███ | 2066/6885 [11:38:18<4:35:31, 3.43s/it] 30%|███ | 2067/6885 [11:38:20<4:05:06, 3.05s/it] 30%|███ | 2068/6885 [11:38:22<3:56:08, 2.94s/it] 30%|███ | 2069/6885 [11:38:25<3:49:45, 2.86s/it] 30%|███ | 2070/6885 [11:38:28<3:41:30, 2.76s/it] {'loss': 0.6026, 'grad_norm': 1.2143851276582127, 'learning_rate': 8.82514764457088e-06, 'epoch': 0.3} 30%|███ | 2070/6885 [11:38:28<3:41:30, 2.76s/it] 30%|███ | 2071/6885 [11:38:31<4:07:15, 3.08s/it] 30%|███ | 2072/6885 [11:38:36<4:48:30, 3.60s/it] 30%|███ | 2073/6885 [11:38:38<4:08:56, 3.10s/it] 30%|███ | 2074/6885 [11:38:42<4:15:48, 3.19s/it] 30%|███ | 2075/6885 [11:38:44<3:48:17, 2.85s/it] 30%|███ | 2076/6885 [11:38:47<3:52:58, 2.91s/it] 30%|███ | 2077/6885 [11:38:49<3:38:24, 2.73s/it] 30%|███ | 2078/6885 [11:38:51<3:18:36, 2.48s/it] 30%|███ | 2079/6885 [11:38:53<3:13:45, 2.42s/it] 30%|███ | 2080/6885 [11:38:57<3:45:22, 2.81s/it] {'loss': 0.6208, 'grad_norm': 1.1711415813258073, 'learning_rate': 8.808772120134286e-06, 'epoch': 0.3} 30%|███ | 2080/6885 [11:38:57<3:45:22, 2.81s/it] 30%|███ | 2081/6885 [11:39:00<3:53:01, 2.91s/it] 30%|███ | 2082/6885 [11:39:03<3:48:12, 2.85s/it] 30%|███ | 2083/6885 [11:39:05<3:39:18, 2.74s/it] 30%|███ | 2084/6885 [11:39:07<3:25:53, 2.57s/it] 30%|███ | 2085/6885 [11:39:10<3:35:44, 2.70s/it] 30%|███ | 2086/6885 [11:39:14<3:52:42, 2.91s/it] 30%|███ | 2087/6885 [11:39:19<4:54:11, 3.68s/it] 30%|███ | 2088/6885 [11:39:21<4:16:26, 3.21s/it] 30%|███ | 2089/6885 
[11:39:24<3:55:29, 2.95s/it] 30%|███ | 2090/6885 [11:39:26<3:44:08, 2.80s/it] {'loss': 0.6178, 'grad_norm': 1.2105658122447378, 'learning_rate': 8.79229867804672e-06, 'epoch': 0.3} 30%|███ | 2090/6885 [11:39:26<3:44:08, 2.80s/it] 30%|███ | 2091/6885 [11:39:30<4:16:31, 3.21s/it] 30%|███ | 2092/6885 [11:39:32<3:48:08, 2.86s/it] 30%|███ | 2093/6885 [11:39:37<4:36:08, 3.46s/it] 30%|███ | 2094/6885 [11:39:41<4:53:17, 3.67s/it] 30%|███ | 2095/6885 [11:39:44<4:29:04, 3.37s/it] 30%|███ | 2096/6885 [11:39:46<4:02:07, 3.03s/it] 30%|███ | 2097/6885 [11:39:51<4:41:53, 3.53s/it] 30%|███ | 2098/6885 [11:39:53<4:12:18, 3.16s/it] 30%|███ | 2099/6885 [11:39:56<3:50:26, 2.89s/it] 31%|███ | 2100/6885 [11:40:02<5:05:27, 3.83s/it] {'loss': 0.6033, 'grad_norm': 1.260614604486508, 'learning_rate': 8.775727741814945e-06, 'epoch': 0.31} 31%|███ | 2100/6885 [11:40:02<5:05:27, 3.83s/it] 31%|███ | 2101/6885 [11:40:04<4:30:41, 3.40s/it] 31%|███ | 2102/6885 [11:40:07<4:23:52, 3.31s/it] 31%|███ | 2103/6885 [11:40:10<4:10:31, 3.14s/it] 31%|███ | 2104/6885 [11:40:13<3:58:25, 2.99s/it] 31%|███ | 2105/6885 [11:40:16<4:15:15, 3.20s/it] 31%|███ | 2106/6885 [11:40:19<4:02:09, 3.04s/it] 31%|███ | 2107/6885 [11:40:22<4:05:58, 3.09s/it] 31%|███ | 2108/6885 [11:40:24<3:47:29, 2.86s/it] 31%|███ | 2109/6885 [11:40:27<3:49:49, 2.89s/it] 31%|███ | 2110/6885 [11:40:29<3:26:53, 2.60s/it] {'loss': 0.5954, 'grad_norm': 1.1949196588242055, 'learning_rate': 8.75905973745215e-06, 'epoch': 0.31} 31%|███ | 2110/6885 [11:40:29<3:26:53, 2.60s/it] 31%|███ | 2111/6885 [11:40:31<3:15:28, 2.46s/it] 31%|███ | 2112/6885 [11:40:34<3:24:05, 2.57s/it] 31%|███ | 2113/6885 [11:40:38<3:51:13, 2.91s/it] 31%|███ | 2114/6885 [11:40:41<4:01:45, 3.04s/it] 31%|███ | 2115/6885 [11:40:43<3:36:40, 2.73s/it] 31%|███ | 2116/6885 [11:40:47<3:49:27, 2.89s/it] 31%|███ | 2117/6885 [11:40:50<4:09:38, 3.14s/it] 31%|███ | 2118/6885 [11:40:53<4:00:47, 3.03s/it] 31%|███ | 2119/6885 [11:40:55<3:41:52, 2.79s/it] 31%|███ | 2120/6885 [11:40:58<3:49:17, 
2.89s/it] {'loss': 0.5929, 'grad_norm': 1.2358431757504627, 'learning_rate': 8.742295093466993e-06, 'epoch': 0.31} 31%|███ | 2120/6885 [11:40:58<3:49:17, 2.89s/it] 31%|███ | 2121/6885 [11:41:01<3:49:14, 2.89s/it] 31%|███ | 2122/6885 [11:41:04<3:35:15, 2.71s/it] 31%|███ | 2123/6885 [11:41:08<4:07:34, 3.12s/it] 31%|███ | 2124/6885 [11:41:11<4:05:05, 3.09s/it] 31%|███ | 2125/6885 [11:41:13<3:52:50, 2.94s/it] 31%|███ | 2126/6885 [11:41:15<3:27:12, 2.61s/it] 31%|███ | 2127/6885 [11:41:17<3:16:19, 2.48s/it] 31%|███ | 2128/6885 [11:41:21<3:44:51, 2.84s/it] 31%|███ | 2129/6885 [11:41:23<3:30:06, 2.65s/it] 31%|███ | 2130/6885 [11:41:26<3:28:28, 2.63s/it] {'loss': 0.6014, 'grad_norm': 1.1788915626896657, 'learning_rate': 8.725434240852586e-06, 'epoch': 0.31} 31%|███ | 2130/6885 [11:41:26<3:28:28, 2.63s/it] 31%|███ | 2131/6885 [11:41:29<3:52:47, 2.94s/it] 31%|███ | 2132/6885 [11:41:34<4:33:12, 3.45s/it] 31%|███ | 2133/6885 [11:41:37<4:12:14, 3.18s/it] 31%|███ | 2134/6885 [11:41:40<4:22:20, 3.31s/it] 31%|███ | 2135/6885 [11:41:44<4:36:48, 3.50s/it] 31%|███ | 2136/6885 [11:41:47<4:18:25, 3.26s/it] 31%|███ | 2137/6885 [11:41:49<3:48:07, 2.88s/it] 31%|███ | 2138/6885 [11:41:53<4:14:18, 3.21s/it] 31%|███ | 2139/6885 [11:41:57<4:45:13, 3.61s/it] 31%|███ | 2140/6885 [11:42:00<4:15:59, 3.24s/it] {'loss': 0.588, 'grad_norm': 1.2899429468502281, 'learning_rate': 8.708477613075422e-06, 'epoch': 0.31} 31%|███ | 2140/6885 [11:42:00<4:15:59, 3.24s/it] 31%|███ | 2141/6885 [11:42:02<3:49:29, 2.90s/it] 31%|███ | 2142/6885 [11:42:04<3:40:32, 2.79s/it] 31%|███ | 2143/6885 [11:42:07<3:31:22, 2.67s/it] 31%|███ | 2144/6885 [11:42:09<3:23:02, 2.57s/it] 31%|███ | 2145/6885 [11:42:13<3:46:58, 2.87s/it] 31%|███ | 2146/6885 [11:42:15<3:25:17, 2.60s/it] 31%|███ | 2147/6885 [11:42:18<3:35:03, 2.72s/it] 31%|███ | 2148/6885 [11:42:20<3:15:42, 2.48s/it] 31%|███ | 2149/6885 [11:42:22<3:06:28, 2.36s/it] 31%|███ | 2150/6885 [11:42:25<3:20:18, 2.54s/it] {'loss': 0.6128, 'grad_norm': 1.0436767601630443, 
'learning_rate': 8.691425646064222e-06, 'epoch': 0.31} 31%|███ | 2150/6885 [11:42:25<3:20:18, 2.54s/it] 31%|███ | 2151/6885 [11:42:27<3:12:58, 2.45s/it] 31%|███▏ | 2152/6885 [11:42:29<3:13:29, 2.45s/it] 31%|███▏ | 2153/6885 [11:42:32<3:27:10, 2.63s/it] 31%|███▏ | 2154/6885 [11:42:34<3:10:35, 2.42s/it] 31%|███▏ | 2155/6885 [11:42:38<3:36:45, 2.75s/it] 31%|███▏ | 2156/6885 [11:42:41<3:54:11, 2.97s/it] 31%|███▏ | 2157/6885 [11:42:43<3:35:08, 2.73s/it] 31%|███▏ | 2158/6885 [11:42:45<3:09:34, 2.41s/it] 31%|███▏ | 2159/6885 [11:42:49<3:37:46, 2.76s/it] 31%|███▏ | 2160/6885 [11:42:52<3:42:29, 2.83s/it] {'loss': 0.5939, 'grad_norm': 1.1823668694466984, 'learning_rate': 8.674278778198731e-06, 'epoch': 0.31} 31%|███▏ | 2160/6885 [11:42:52<3:42:29, 2.83s/it] 31%|███▏ | 2161/6885 [11:42:54<3:26:26, 2.62s/it] 31%|███▏ | 2162/6885 [11:42:57<3:47:59, 2.90s/it] 31%|███▏ | 2163/6885 [11:43:01<3:56:24, 3.00s/it] 31%|███▏ | 2164/6885 [11:43:04<4:01:51, 3.07s/it] 31%|███▏ | 2165/6885 [11:43:07<4:02:10, 3.08s/it] 31%|███▏ | 2166/6885 [11:43:10<3:57:45, 3.02s/it] 31%|███▏ | 2167/6885 [11:43:12<3:36:14, 2.75s/it] 31%|███▏ | 2168/6885 [11:43:14<3:15:47, 2.49s/it] 32%|███▏ | 2169/6885 [11:43:16<3:07:17, 2.38s/it] 32%|███▏ | 2170/6885 [11:43:18<3:06:39, 2.38s/it] {'loss': 0.5942, 'grad_norm': 1.2287777612088193, 'learning_rate': 8.657037450298449e-06, 'epoch': 0.32} 32%|███▏ | 2170/6885 [11:43:18<3:06:39, 2.38s/it] 32%|███▏ | 2171/6885 [11:43:20<2:57:48, 2.26s/it] 32%|███▏ | 2172/6885 [11:43:23<3:01:42, 2.31s/it] 32%|███▏ | 2173/6885 [11:43:26<3:23:53, 2.60s/it] 32%|███▏ | 2174/6885 [11:43:29<3:26:02, 2.62s/it] 32%|███▏ | 2175/6885 [11:43:32<3:53:13, 2.97s/it] 32%|███▏ | 2176/6885 [11:43:37<4:28:38, 3.42s/it] 32%|███▏ | 2177/6885 [11:43:43<5:37:08, 4.30s/it] 32%|███▏ | 2178/6885 [11:43:47<5:15:44, 4.02s/it] 32%|███▏ | 2179/6885 [11:43:49<4:34:19, 3.50s/it] 32%|███▏ | 2180/6885 [11:43:53<4:52:58, 3.74s/it] {'loss': 0.6068, 'grad_norm': 1.1210160142803036, 'learning_rate': 
8.6397021056113e-06, 'epoch': 0.32} 32%|███▏ | 2180/6885 [11:43:53<4:52:58, 3.74s/it] 32%|███▏ | 2181/6885 [11:43:57<5:00:30, 3.83s/it] 32%|███▏ | 2182/6885 [11:44:00<4:33:55, 3.49s/it] 32%|███▏ | 2183/6885 [11:44:02<4:03:14, 3.10s/it] 32%|███▏ | 2184/6885 [11:44:06<4:14:50, 3.25s/it] 32%|███▏ | 2185/6885 [11:44:08<3:55:46, 3.01s/it] 32%|███▏ | 2186/6885 [11:44:10<3:34:33, 2.74s/it] 32%|███▏ | 2187/6885 [11:44:13<3:41:58, 2.83s/it] 32%|███▏ | 2188/6885 [11:44:16<3:30:13, 2.69s/it] 32%|███▏ | 2189/6885 [11:44:18<3:12:00, 2.45s/it] 32%|███▏ | 2190/6885 [11:44:19<2:52:17, 2.20s/it] {'loss': 0.6099, 'grad_norm': 1.176574092958882, 'learning_rate': 8.622273189802231e-06, 'epoch': 0.32} 32%|███▏ | 2190/6885 [11:44:19<2:52:17, 2.20s/it] 32%|███▏ | 2191/6885 [11:44:23<3:22:27, 2.59s/it] 32%|███▏ | 2192/6885 [11:44:25<3:20:36, 2.56s/it] 32%|███▏ | 2193/6885 [11:44:28<3:15:06, 2.49s/it] 32%|███▏ | 2194/6885 [11:44:30<3:02:40, 2.34s/it] 32%|███▏ | 2195/6885 [11:44:34<3:41:41, 2.84s/it] 32%|███▏ | 2196/6885 [11:44:36<3:27:06, 2.65s/it] 32%|███▏ | 2197/6885 [11:44:38<3:09:20, 2.42s/it] 32%|███▏ | 2198/6885 [11:44:41<3:25:54, 2.64s/it] 32%|███▏ | 2199/6885 [11:44:43<3:20:23, 2.57s/it] 32%|███▏ | 2200/6885 [11:44:47<3:51:15, 2.96s/it] {'loss': 0.598, 'grad_norm': 1.2276623152067967, 'learning_rate': 8.604751150941758e-06, 'epoch': 0.32} 32%|███▏ | 2200/6885 [11:44:47<3:51:15, 2.96s/it] 32%|███▏ | 2201/6885 [11:44:49<3:34:44, 2.75s/it] 32%|███▏ | 2202/6885 [11:44:52<3:32:16, 2.72s/it] 32%|███▏ | 2203/6885 [11:44:55<3:45:00, 2.88s/it] 32%|███▏ | 2204/6885 [11:44:59<4:11:54, 3.23s/it] 32%|███▏ | 2205/6885 [11:45:02<4:01:32, 3.10s/it] 32%|███▏ | 2206/6885 [11:45:05<3:49:32, 2.94s/it] 32%|███▏ | 2207/6885 [11:45:08<3:53:26, 2.99s/it] 32%|███▏ | 2208/6885 [11:45:10<3:43:28, 2.87s/it] 32%|███▏ | 2209/6885 [11:45:14<3:52:48, 2.99s/it] 32%|███▏ | 2210/6885 [11:45:16<3:41:37, 2.84s/it] {'loss': 0.5934, 'grad_norm': 1.2049029589388036, 'learning_rate': 8.58713643949445e-06, 'epoch': 0.32} 
32%|███▏ | 2210/6885 [11:45:16<3:41:37, 2.84s/it] 32%|███▏ | 2211/6885 [11:45:19<3:32:08, 2.72s/it] 32%|███▏ | 2212/6885 [11:45:21<3:20:20, 2.57s/it] 32%|███▏ | 2213/6885 [11:45:25<3:51:50, 2.98s/it] 32%|███▏ | 2214/6885 [11:45:28<3:59:01, 3.07s/it] 32%|███▏ | 2215/6885 [11:45:31<4:08:09, 3.19s/it] 32%|███▏ | 2216/6885 [11:45:33<3:38:19, 2.81s/it] 32%|███▏ | 2217/6885 [11:45:40<5:08:59, 3.97s/it] 32%|███▏ | 2218/6885 [11:45:43<4:51:30, 3.75s/it] 32%|███▏ | 2219/6885 [11:45:46<4:16:14, 3.29s/it] 32%|███▏ | 2220/6885 [11:45:48<3:58:59, 3.07s/it] {'loss': 0.6039, 'grad_norm': 1.2650704032924422, 'learning_rate': 8.569429508307345e-06, 'epoch': 0.32} 32%|███▏ | 2220/6885 [11:45:48<3:58:59, 3.07s/it] 32%|███▏ | 2221/6885 [11:45:51<4:03:17, 3.13s/it] 32%|███▏ | 2222/6885 [11:45:53<3:35:37, 2.77s/it] 32%|███▏ | 2223/6885 [11:45:56<3:26:41, 2.66s/it] 32%|███▏ | 2224/6885 [11:45:58<3:14:09, 2.50s/it] 32%|███▏ | 2225/6885 [11:46:01<3:23:32, 2.62s/it] 32%|███▏ | 2226/6885 [11:46:04<3:39:53, 2.83s/it] 32%|███▏ | 2227/6885 [11:46:06<3:25:29, 2.65s/it] 32%|███▏ | 2228/6885 [11:46:11<4:04:57, 3.16s/it] 32%|███▏ | 2229/6885 [11:46:13<3:53:16, 3.01s/it] 32%|███▏ | 2230/6885 [11:46:17<4:16:53, 3.31s/it] {'loss': 0.6038, 'grad_norm': 1.088534753663297, 'learning_rate': 8.551630812598303e-06, 'epoch': 0.32} 32%|███▏ | 2230/6885 [11:46:17<4:16:53, 3.31s/it] 32%|███▏ | 2231/6885 [11:46:21<4:22:07, 3.38s/it] 32%|███▏ | 2232/6885 [11:46:23<3:59:43, 3.09s/it] 32%|███▏ | 2233/6885 [11:46:26<3:56:57, 3.06s/it] 32%|███▏ | 2234/6885 [11:46:29<3:46:32, 2.92s/it] 32%|███▏ | 2235/6885 [11:46:31<3:31:48, 2.73s/it] 32%|███▏ | 2236/6885 [11:46:34<3:45:46, 2.91s/it] 32%|███▏ | 2237/6885 [11:46:37<3:38:25, 2.82s/it] 33%|███▎ | 2238/6885 [11:46:40<3:43:01, 2.88s/it] 33%|███▎ | 2239/6885 [11:46:43<3:52:15, 3.00s/it] 33%|███▎ | 2240/6885 [11:46:47<3:57:53, 3.07s/it] {'loss': 0.6084, 'grad_norm': 1.1678210415173849, 'learning_rate': 8.533740809944317e-06, 'epoch': 0.33} 33%|███▎ | 2240/6885 
[11:46:47<3:57:53, 3.07s/it] 33%|███▎ | 2241/6885 [11:46:50<3:55:03, 3.04s/it] 33%|███▎ | 2242/6885 [11:46:53<3:53:43, 3.02s/it] 33%|███▎ | 2243/6885 [11:46:56<3:56:09, 3.05s/it] 33%|███▎ | 2244/6885 [11:46:58<3:34:57, 2.78s/it] 33%|███▎ | 2245/6885 [11:47:02<4:00:25, 3.11s/it] 33%|███▎ | 2246/6885 [11:47:05<4:00:21, 3.11s/it] 33%|███▎ | 2247/6885 [11:47:08<3:58:50, 3.09s/it] 33%|███▎ | 2248/6885 [11:47:11<4:09:09, 3.22s/it] 33%|███▎ | 2249/6885 [11:47:14<4:05:40, 3.18s/it] 33%|███▎ | 2250/6885 [11:47:17<3:58:08, 3.08s/it] {'loss': 0.5975, 'grad_norm': 1.251355519441971, 'learning_rate': 8.515759960269731e-06, 'epoch': 0.33} 33%|███▎ | 2250/6885 [11:47:17<3:58:08, 3.08s/it] 33%|███▎ | 2251/6885 [11:47:22<4:25:28, 3.44s/it] 33%|███▎ | 2252/6885 [11:47:26<4:47:17, 3.72s/it] 33%|███▎ | 2253/6885 [11:47:33<6:13:01, 4.83s/it] 33%|███▎ | 2254/6885 [11:47:36<5:13:58, 4.07s/it] 33%|███▎ | 2255/6885 [11:47:38<4:26:11, 3.45s/it] 33%|███▎ | 2256/6885 [11:47:40<3:58:56, 3.10s/it] 33%|███▎ | 2257/6885 [11:47:42<3:44:02, 2.90s/it] 33%|███▎ | 2258/6885 [11:47:45<3:32:26, 2.75s/it] 33%|███▎ | 2259/6885 [11:47:48<3:38:43, 2.84s/it] 33%|███▎ | 2260/6885 [11:47:50<3:26:06, 2.67s/it] {'loss': 0.6106, 'grad_norm': 1.1662322522769242, 'learning_rate': 8.497688725834432e-06, 'epoch': 0.33} 33%|███▎ | 2260/6885 [11:47:50<3:26:06, 2.67s/it] 33%|███▎ | 2261/6885 [11:47:52<3:10:14, 2.47s/it] 33%|███▎ | 2262/6885 [11:47:55<3:14:26, 2.52s/it] 33%|███▎ | 2263/6885 [11:47:57<2:55:35, 2.28s/it] 33%|███▎ | 2264/6885 [11:47:59<2:53:11, 2.25s/it] 33%|███▎ | 2265/6885 [11:48:01<3:02:27, 2.37s/it] 33%|███▎ | 2266/6885 [11:48:04<3:13:56, 2.52s/it] 33%|███▎ | 2267/6885 [11:48:06<3:08:39, 2.45s/it] 33%|███▎ | 2268/6885 [11:48:10<3:21:42, 2.62s/it] 33%|███▎ | 2269/6885 [11:48:11<3:03:16, 2.38s/it] 33%|███▎ | 2270/6885 [11:48:15<3:22:56, 2.64s/it] {'loss': 0.6224, 'grad_norm': 1.336372713961502, 'learning_rate': 8.479527571221957e-06, 'epoch': 0.33} 33%|███▎ | 2270/6885 [11:48:15<3:22:56, 2.64s/it] 
33%|███▎ | 2271/6885 [11:48:17<3:19:58, 2.60s/it] 33%|███▎ | 2272/6885 [11:48:20<3:20:52, 2.61s/it] 33%|███▎ | 2273/6885 [11:48:23<3:27:41, 2.70s/it] 33%|███▎ | 2274/6885 [11:48:25<3:18:54, 2.59s/it] 33%|███▎ | 2275/6885 [11:48:28<3:18:56, 2.59s/it] 33%|███▎ | 2276/6885 [11:48:31<3:44:04, 2.92s/it] 33%|███▎ | 2277/6885 [11:48:34<3:36:56, 2.82s/it] 33%|███▎ | 2278/6885 [11:48:38<3:58:41, 3.11s/it] 33%|███▎ | 2279/6885 [11:48:42<4:35:05, 3.58s/it] 33%|███▎ | 2280/6885 [11:48:44<3:59:38, 3.12s/it] {'loss': 0.607, 'grad_norm': 1.148371532122775, 'learning_rate': 8.461276963327555e-06, 'epoch': 0.33} 33%|███▎ | 2280/6885 [11:48:44<3:59:38, 3.12s/it] 33%|███▎ | 2281/6885 [11:48:47<3:47:24, 2.96s/it] 33%|███▎ | 2282/6885 [11:48:49<3:22:32, 2.64s/it] 33%|███▎ | 2283/6885 [11:48:51<3:01:04, 2.36s/it] 33%|███▎ | 2284/6885 [11:48:54<3:21:50, 2.63s/it] 33%|███▎ | 2285/6885 [11:48:57<3:27:03, 2.70s/it] 33%|███▎ | 2286/6885 [11:49:01<3:56:30, 3.09s/it] 33%|███▎ | 2287/6885 [11:49:06<4:44:18, 3.71s/it] 33%|███▎ | 2288/6885 [11:49:11<5:21:36, 4.20s/it] 33%|███▎ | 2289/6885 [11:49:13<4:24:31, 3.45s/it] 33%|███▎ | 2290/6885 [11:49:17<4:30:11, 3.53s/it] {'loss': 0.6001, 'grad_norm': 1.3691981401078914, 'learning_rate': 8.442937371346174e-06, 'epoch': 0.33} 33%|███▎ | 2290/6885 [11:49:17<4:30:11, 3.53s/it] 33%|███▎ | 2291/6885 [11:49:22<5:07:13, 4.01s/it] 33%|███▎ | 2292/6885 [11:49:24<4:36:18, 3.61s/it] 33%|███▎ | 2293/6885 [11:49:26<3:46:40, 2.96s/it] 33%|███▎ | 2294/6885 [11:49:29<3:53:20, 3.05s/it] 33%|███▎ | 2295/6885 [11:49:32<3:58:48, 3.12s/it] 33%|███▎ | 2296/6885 [11:49:35<3:51:52, 3.03s/it] 33%|███▎ | 2297/6885 [11:49:37<3:25:26, 2.69s/it] 33%|███▎ | 2298/6885 [11:49:39<3:13:58, 2.54s/it] 33%|███▎ | 2299/6885 [11:49:42<3:21:32, 2.64s/it] 33%|███▎ | 2300/6885 [11:49:44<3:03:20, 2.40s/it] {'loss': 0.6009, 'grad_norm': 1.3343569533197541, 'learning_rate': 8.424509266760413e-06, 'epoch': 0.33} 33%|███▎ | 2300/6885 [11:49:44<3:03:20, 2.40s/it] 33%|███▎ | 2301/6885 
[11:49:46<2:50:54, 2.24s/it] 33%|███▎ | 2302/6885 [11:49:50<3:39:19, 2.87s/it] 33%|███▎ | 2303/6885 [11:49:54<4:08:08, 3.25s/it] 33%|███▎ | 2304/6885 [11:49:57<4:06:04, 3.22s/it] 33%|███▎ | 2305/6885 [11:50:00<3:46:53, 2.97s/it] 33%|███▎ | 2306/6885 [11:50:05<4:26:08, 3.49s/it] 34%|███▎ | 2307/6885 [11:50:08<4:33:13, 3.58s/it] 34%|███▎ | 2308/6885 [11:50:12<4:27:33, 3.51s/it] 34%|███▎ | 2309/6885 [11:50:13<3:45:51, 2.96s/it] 34%|███▎ | 2310/6885 [11:50:18<4:33:46, 3.59s/it] {'loss': 0.5852, 'grad_norm': 1.0903008241967769, 'learning_rate': 8.405993123328388e-06, 'epoch': 0.34} 34%|███▎ | 2310/6885 [11:50:18<4:33:46, 3.59s/it] 34%|███▎ | 2311/6885 [11:50:21<4:05:01, 3.21s/it] 34%|███▎ | 2312/6885 [11:50:24<3:56:53, 3.11s/it] 34%|███▎ | 2313/6885 [11:50:28<4:25:12, 3.48s/it] 34%|███▎ | 2314/6885 [11:50:31<4:15:55, 3.36s/it] 34%|███▎ | 2315/6885 [11:50:33<3:47:26, 2.99s/it] 34%|███▎ | 2316/6885 [11:50:36<3:53:48, 3.07s/it] 34%|███▎ | 2317/6885 [11:50:39<3:33:41, 2.81s/it] 34%|███▎ | 2318/6885 [11:50:41<3:21:52, 2.65s/it] 34%|███▎ | 2319/6885 [11:50:44<3:29:48, 2.76s/it] 34%|███▎ | 2320/6885 [11:50:46<3:18:48, 2.61s/it] {'loss': 0.5967, 'grad_norm': 1.2770798153391716, 'learning_rate': 8.387389417071565e-06, 'epoch': 0.34} 34%|███▎ | 2320/6885 [11:50:46<3:18:48, 2.61s/it] 34%|███▎ | 2321/6885 [11:50:51<4:04:48, 3.22s/it] 34%|███▎ | 2322/6885 [11:50:54<4:13:14, 3.33s/it] 34%|███▎ | 2323/6885 [11:50:57<3:50:11, 3.03s/it] 34%|███▍ | 2324/6885 [11:50:59<3:34:52, 2.83s/it] 34%|███▍ | 2325/6885 [11:51:04<4:11:20, 3.31s/it] 34%|███▍ | 2326/6885 [11:51:09<4:54:55, 3.88s/it] 34%|███▍ | 2327/6885 [11:51:12<4:38:05, 3.66s/it] 34%|███▍ | 2328/6885 [11:51:15<4:19:11, 3.41s/it] 34%|███▍ | 2329/6885 [11:51:18<4:21:47, 3.45s/it] 34%|███▍ | 2330/6885 [11:51:23<4:45:51, 3.77s/it] {'loss': 0.5906, 'grad_norm': 1.1893611624135727, 'learning_rate': 8.368698626262506e-06, 'epoch': 0.34} 34%|███▍ | 2330/6885 [11:51:23<4:45:51, 3.77s/it] 34%|███▍ | 2331/6885 [11:51:25<4:19:34, 3.42s/it] 
34%|███▍ | 2332/6885 [11:51:28<4:06:15, 3.25s/it] 34%|███▍ | 2333/6885 [11:51:30<3:40:09, 2.90s/it] 34%|███▍ | 2334/6885 [11:51:33<3:25:56, 2.72s/it] 34%|███▍ | 2335/6885 [11:51:35<3:24:05, 2.69s/it] 34%|███▍ | 2336/6885 [11:51:38<3:19:27, 2.63s/it] 34%|███▍ | 2337/6885 [11:51:40<3:07:15, 2.47s/it] 34%|███▍ | 2338/6885 [11:51:42<3:06:22, 2.46s/it] 34%|███▍ | 2339/6885 [11:51:45<3:16:19, 2.59s/it] 34%|███▍ | 2340/6885 [11:51:49<3:41:41, 2.93s/it] {'loss': 0.6144, 'grad_norm': 1.1182656055274527, 'learning_rate': 8.349921231412588e-06, 'epoch': 0.34} 34%|███▍ | 2340/6885 [11:51:49<3:41:41, 2.93s/it] 34%|███▍ | 2341/6885 [11:51:51<3:17:58, 2.61s/it] 34%|███▍ | 2342/6885 [11:51:54<3:28:45, 2.76s/it] 34%|███▍ | 2343/6885 [11:51:56<3:21:52, 2.67s/it] 34%|███▍ | 2344/6885 [11:52:00<3:55:58, 3.12s/it] 34%|███▍ | 2345/6885 [11:52:03<3:33:51, 2.83s/it] 34%|███▍ | 2346/6885 [11:52:05<3:25:24, 2.72s/it] 34%|███▍ | 2347/6885 [11:52:10<4:17:08, 3.40s/it] 34%|███▍ | 2348/6885 [11:52:13<3:56:50, 3.13s/it] 34%|███▍ | 2349/6885 [11:52:15<3:47:04, 3.00s/it] 34%|███▍ | 2350/6885 [11:52:18<3:40:32, 2.92s/it] {'loss': 0.5945, 'grad_norm': 1.1569225334439495, 'learning_rate': 8.331057715259643e-06, 'epoch': 0.34} 34%|███▍ | 2350/6885 [11:52:18<3:40:32, 2.92s/it] 34%|███▍ | 2351/6885 [11:52:22<4:02:08, 3.20s/it] 34%|███▍ | 2352/6885 [11:52:24<3:37:15, 2.88s/it] 34%|███▍ | 2353/6885 [11:52:27<3:41:34, 2.93s/it] 34%|███▍ | 2354/6885 [11:52:29<3:30:17, 2.78s/it] 34%|███▍ | 2355/6885 [11:52:32<3:31:11, 2.80s/it] 34%|███▍ | 2356/6885 [11:52:35<3:35:52, 2.86s/it] 34%|███▍ | 2357/6885 [11:52:41<4:37:18, 3.67s/it] 34%|███▍ | 2358/6885 [11:52:43<4:06:42, 3.27s/it] 34%|███▍ | 2359/6885 [11:52:45<3:28:19, 2.76s/it] 34%|███▍ | 2360/6885 [11:52:48<3:39:23, 2.91s/it] {'loss': 0.6012, 'grad_norm': 1.0553585361032343, 'learning_rate': 8.312108562755547e-06, 'epoch': 0.34} 34%|███▍ | 2360/6885 [11:52:48<3:39:23, 2.91s/it] 34%|███▍ | 2361/6885 [11:52:50<3:19:48, 2.65s/it] 34%|███▍ | 2362/6885 
[11:52:52<2:52:27, 2.29s/it] 34%|███▍ | 2363/6885 [11:52:55<3:11:33, 2.54s/it] 34%|███▍ | 2364/6885 [11:52:57<3:06:24, 2.47s/it] 34%|███▍ | 2365/6885 [11:52:59<2:55:16, 2.33s/it] 34%|███▍ | 2366/6885 [11:53:01<2:58:20, 2.37s/it] 34%|███▍ | 2367/6885 [11:53:05<3:22:54, 2.69s/it] 34%|███▍ | 2368/6885 [11:53:09<3:53:05, 3.10s/it] 34%|███▍ | 2369/6885 [11:53:12<3:43:00, 2.96s/it] 34%|███▍ | 2370/6885 [11:53:19<5:16:37, 4.21s/it] {'loss': 0.602, 'grad_norm': 1.0429439932782214, 'learning_rate': 8.29307426105376e-06, 'epoch': 0.34} 34%|███▍ | 2370/6885 [11:53:19<5:16:37, 4.21s/it] 34%|███▍ | 2371/6885 [11:53:22<4:48:16, 3.83s/it] 34%|███▍ | 2372/6885 [11:53:24<4:22:03, 3.48s/it] 34%|███▍ | 2373/6885 [11:53:26<3:44:59, 2.99s/it] 34%|███▍ | 2374/6885 [11:53:28<3:17:48, 2.63s/it] 34%|███▍ | 2375/6885 [11:53:31<3:18:51, 2.65s/it] 35%|███▍ | 2376/6885 [11:53:33<3:08:29, 2.51s/it] 35%|███▍ | 2377/6885 [11:53:35<3:03:16, 2.44s/it] 35%|███▍ | 2378/6885 [11:53:38<3:19:48, 2.66s/it] 35%|███▍ | 2379/6885 [11:53:42<3:37:44, 2.90s/it] 35%|███▍ | 2380/6885 [11:53:44<3:31:05, 2.81s/it] {'loss': 0.5932, 'grad_norm': 1.0397368512389722, 'learning_rate': 8.273955299496787e-06, 'epoch': 0.35} 35%|███▍ | 2380/6885 [11:53:44<3:31:05, 2.81s/it] 35%|███▍ | 2381/6885 [11:53:46<3:11:58, 2.56s/it] 35%|███▍ | 2382/6885 [11:53:51<3:51:56, 3.09s/it] 35%|███▍ | 2383/6885 [11:53:53<3:38:01, 2.91s/it] 35%|███▍ | 2384/6885 [11:53:55<3:16:03, 2.61s/it] 35%|███▍ | 2385/6885 [11:53:57<3:06:50, 2.49s/it] 35%|███▍ | 2386/6885 [11:54:05<5:04:01, 4.05s/it] 35%|███▍ | 2387/6885 [11:54:07<4:26:23, 3.55s/it] 35%|███▍ | 2388/6885 [11:54:10<4:09:57, 3.33s/it] 35%|███▍ | 2389/6885 [11:54:12<3:46:09, 3.02s/it] 35%|███▍ | 2390/6885 [11:54:15<3:43:54, 2.99s/it] {'loss': 0.5987, 'grad_norm': 1.0989788243486265, 'learning_rate': 8.254752169603614e-06, 'epoch': 0.35} 35%|███▍ | 2390/6885 [11:54:15<3:43:54, 2.99s/it] 35%|███▍ | 2391/6885 [11:54:20<4:11:00, 3.35s/it] 35%|███▍ | 2392/6885 [11:54:22<3:56:25, 3.16s/it] 
35%|███▍ | 2393/6885 [11:54:24<3:21:19, 2.69s/it] 35%|███▍ | 2394/6885 [11:54:26<3:19:42, 2.67s/it] 35%|███▍ | 2395/6885 [11:54:31<4:03:54, 3.26s/it] 35%|███▍ | 2396/6885 [11:54:34<4:03:07, 3.25s/it] 35%|███▍ | 2397/6885 [11:54:37<3:57:06, 3.17s/it] 35%|███▍ | 2398/6885 [11:54:40<3:38:57, 2.93s/it] 35%|███▍ | 2399/6885 [11:54:42<3:20:00, 2.68s/it] 35%|███▍ | 2400/6885 [11:54:46<4:05:38, 3.29s/it] {'loss': 0.597, 'grad_norm': 1.2513128657031618, 'learning_rate': 8.235465365057067e-06, 'epoch': 0.35} 35%|███▍ | 2400/6885 [11:54:46<4:05:38, 3.29s/it] 35%|███▍ | 2401/6885 [11:54:49<3:45:41, 3.02s/it] 35%|███▍ | 2402/6885 [11:54:53<4:10:37, 3.35s/it] 35%|███▍ | 2403/6885 [11:54:57<4:16:54, 3.44s/it] 35%|███▍ | 2404/6885 [11:55:02<4:53:34, 3.93s/it] 35%|███▍ | 2405/6885 [11:55:04<4:14:30, 3.41s/it] 35%|███▍ | 2406/6885 [11:55:07<3:57:06, 3.18s/it] 35%|███▍ | 2407/6885 [11:55:09<3:32:14, 2.84s/it] 35%|███▍ | 2408/6885 [11:55:12<3:44:36, 3.01s/it] 35%|███▍ | 2409/6885 [11:55:14<3:20:52, 2.69s/it] 35%|███▌ | 2410/6885 [11:55:17<3:17:52, 2.65s/it] {'loss': 0.5962, 'grad_norm': 1.2696804086094644, 'learning_rate': 8.21609538169111e-06, 'epoch': 0.35} 35%|███▌ | 2410/6885 [11:55:17<3:17:52, 2.65s/it] 35%|███▌ | 2411/6885 [11:55:20<3:30:55, 2.83s/it] 35%|███▌ | 2412/6885 [11:55:22<3:17:18, 2.65s/it] 35%|███▌ | 2413/6885 [11:55:24<3:01:29, 2.44s/it] 35%|███▌ | 2414/6885 [11:55:27<3:11:02, 2.56s/it] 35%|███▌ | 2415/6885 [11:55:31<3:55:06, 3.16s/it] 35%|███▌ | 2416/6885 [11:55:34<3:36:01, 2.90s/it] 35%|███▌ | 2417/6885 [11:55:36<3:30:22, 2.82s/it] 35%|███▌ | 2418/6885 [11:55:40<3:49:31, 3.08s/it] 35%|███▌ | 2419/6885 [11:55:44<4:20:11, 3.50s/it] 35%|███▌ | 2420/6885 [11:55:46<3:34:02, 2.88s/it] {'loss': 0.6083, 'grad_norm': 1.3765675743894579, 'learning_rate': 8.196642717478113e-06, 'epoch': 0.35} 35%|███▌ | 2420/6885 [11:55:46<3:34:02, 2.88s/it] 35%|███▌ | 2421/6885 [11:55:49<3:35:58, 2.90s/it] 35%|███▌ | 2422/6885 [11:55:51<3:19:05, 2.68s/it] 35%|███▌ | 2423/6885 
[11:55:54<3:24:47, 2.75s/it] 35%|███▌ | 2424/6885 [11:55:59<4:13:56, 3.42s/it] 35%|███▌ | 2425/6885 [11:56:02<4:18:00, 3.47s/it] 35%|███▌ | 2426/6885 [11:56:05<4:05:46, 3.31s/it] 35%|███▌ | 2427/6885 [11:56:08<3:49:10, 3.08s/it] 35%|███▌ | 2428/6885 [11:56:12<4:17:12, 3.46s/it] 35%|███▌ | 2429/6885 [11:56:15<3:59:27, 3.22s/it] 35%|███▌ | 2430/6885 [11:56:18<3:54:15, 3.16s/it] {'loss': 0.5912, 'grad_norm': 1.1525716644685924, 'learning_rate': 8.177107872516041e-06, 'epoch': 0.35} 35%|███▌ | 2430/6885 [11:56:18<3:54:15, 3.16s/it] 35%|███▌ | 2431/6885 [11:56:21<3:47:34, 3.07s/it] 35%|███▌ | 2432/6885 [11:56:24<3:42:45, 3.00s/it] 35%|███▌ | 2433/6885 [11:56:26<3:24:35, 2.76s/it] 35%|███▌ | 2434/6885 [11:56:29<3:30:10, 2.83s/it] 35%|███▌ | 2435/6885 [11:56:32<3:27:16, 2.79s/it] 35%|███▌ | 2436/6885 [11:56:34<3:28:40, 2.81s/it] 35%|███▌ | 2437/6885 [11:56:37<3:16:24, 2.65s/it] 35%|███▌ | 2438/6885 [11:56:43<4:40:21, 3.78s/it] 35%|███▌ | 2439/6885 [11:56:46<4:11:21, 3.39s/it] 35%|███▌ | 2440/6885 [11:56:49<4:16:36, 3.46s/it] {'loss': 0.601, 'grad_norm': 1.1930516036081553, 'learning_rate': 8.157491349015599e-06, 'epoch': 0.35} 35%|███▌ | 2440/6885 [11:56:49<4:16:36, 3.46s/it] 35%|███▌ | 2441/6885 [11:56:51<3:41:53, 3.00s/it] 35%|███▌ | 2442/6885 [11:56:57<4:48:37, 3.90s/it] 35%|███▌ | 2443/6885 [11:56:59<4:10:32, 3.38s/it] 35%|███▌ | 2444/6885 [11:57:04<4:31:40, 3.67s/it] 36%|███▌ | 2445/6885 [11:57:08<4:36:54, 3.74s/it] 36%|███▌ | 2446/6885 [11:57:11<4:31:42, 3.67s/it] 36%|███▌ | 2447/6885 [11:57:14<4:20:12, 3.52s/it] 36%|███▌ | 2448/6885 [11:57:17<4:12:21, 3.41s/it] 36%|███▌ | 2449/6885 [11:57:20<3:49:02, 3.10s/it] 36%|███▌ | 2450/6885 [11:57:24<4:23:51, 3.57s/it] {'loss': 0.62, 'grad_norm': 1.3453249916774477, 'learning_rate': 8.137793651287317e-06, 'epoch': 0.36} 36%|███▌ | 2450/6885 [11:57:24<4:23:51, 3.57s/it] 36%|███▌ | 2451/6885 [11:57:27<4:00:39, 3.26s/it] 36%|███▌ | 2452/6885 [11:57:30<4:03:17, 3.29s/it] 36%|███▌ | 2453/6885 [11:57:33<3:39:49, 2.98s/it] 36%|███▌ 
| 2454/6885 [11:57:35<3:21:32, 2.73s/it] 36%|███▌ | 2455/6885 [11:57:38<3:38:11, 2.96s/it] 36%|███▌ | 2456/6885 [11:57:41<3:35:31, 2.92s/it] 36%|███▌ | 2457/6885 [11:57:43<3:20:52, 2.72s/it] 36%|███▌ | 2458/6885 [11:57:46<3:29:42, 2.84s/it] 36%|███▌ | 2459/6885 [11:57:49<3:19:59, 2.71s/it] 36%|███▌ | 2460/6885 [11:57:51<3:04:29, 2.50s/it] {'loss': 0.6037, 'grad_norm': 1.216543063547056, 'learning_rate': 8.118015285728598e-06, 'epoch': 0.36} 36%|███▌ | 2460/6885 [11:57:51<3:04:29, 2.50s/it] 36%|███▌ | 2461/6885 [11:57:53<2:51:49, 2.33s/it] 36%|███▌ | 2462/6885 [11:57:56<3:03:26, 2.49s/it] 36%|███▌ | 2463/6885 [11:57:59<3:26:21, 2.80s/it] 36%|███▌ | 2464/6885 [11:58:01<3:14:33, 2.64s/it] 36%|███▌ | 2465/6885 [11:58:05<3:35:12, 2.92s/it] 36%|███▌ | 2466/6885 [11:58:09<3:57:03, 3.22s/it] 36%|███▌ | 2467/6885 [11:58:11<3:40:38, 3.00s/it] 36%|███▌ | 2468/6885 [11:58:15<3:44:10, 3.05s/it] 36%|███▌ | 2469/6885 [11:58:18<3:50:06, 3.13s/it] 36%|███▌ | 2470/6885 [11:58:23<4:31:45, 3.69s/it] {'loss': 0.598, 'grad_norm': 1.129394528084983, 'learning_rate': 8.098156760810683e-06, 'epoch': 0.36} 36%|███▌ | 2470/6885 [11:58:23<4:31:45, 3.69s/it] 36%|███▌ | 2471/6885 [11:58:28<5:06:56, 4.17s/it] 36%|███▌ | 2472/6885 [11:58:31<4:35:22, 3.74s/it] 36%|███▌ | 2473/6885 [11:58:34<4:28:30, 3.65s/it] 36%|███▌ | 2474/6885 [11:58:37<3:59:12, 3.25s/it] 36%|███▌ | 2475/6885 [11:58:43<5:02:10, 4.11s/it] 36%|███▌ | 2476/6885 [11:58:46<4:32:49, 3.71s/it] 36%|███▌ | 2477/6885 [11:58:48<4:09:20, 3.39s/it] 36%|███▌ | 2478/6885 [11:58:51<3:51:07, 3.15s/it] 36%|███▌ | 2479/6885 [11:58:53<3:25:26, 2.80s/it] 36%|███▌ | 2480/6885 [11:58:55<3:18:30, 2.70s/it] {'loss': 0.5813, 'grad_norm': 1.124156367954234, 'learning_rate': 8.078218587065589e-06, 'epoch': 0.36} 36%|███▌ | 2480/6885 [11:58:55<3:18:30, 2.70s/it] 36%|███▌ | 2481/6885 [11:58:58<3:18:14, 2.70s/it] 36%|███▌ | 2482/6885 [11:59:00<3:08:58, 2.58s/it] 36%|███▌ | 2483/6885 [11:59:03<3:05:08, 2.52s/it] 36%|███▌ | 2484/6885 [11:59:05<2:57:43, 
2.42s/it] 36%|███▌ | 2485/6885 [11:59:10<3:58:25, 3.25s/it] 36%|███▌ | 2486/6885 [11:59:13<3:49:52, 3.14s/it] 36%|███▌ | 2487/6885 [11:59:15<3:37:22, 2.97s/it] 36%|███▌ | 2488/6885 [11:59:19<3:44:45, 3.07s/it] 36%|███▌ | 2489/6885 [11:59:22<3:37:56, 2.97s/it] 36%|███▌ | 2490/6885 [11:59:24<3:29:07, 2.85s/it] {'loss': 0.5876, 'grad_norm': 1.2039082584679666, 'learning_rate': 8.058201277072981e-06, 'epoch': 0.36} 36%|███▌ | 2490/6885 [11:59:24<3:29:07, 2.85s/it] 36%|███▌ | 2491/6885 [11:59:28<3:55:44, 3.22s/it] 36%|███▌ | 2492/6885 [11:59:31<3:36:16, 2.95s/it] 36%|███▌ | 2493/6885 [11:59:33<3:25:45, 2.81s/it] 36%|███▌ | 2494/6885 [11:59:35<3:12:55, 2.64s/it] 36%|███▌ | 2495/6885 [11:59:38<3:23:52, 2.79s/it] 36%|███▋ | 2496/6885 [11:59:42<3:48:40, 3.13s/it] 36%|███▋ | 2497/6885 [11:59:45<3:44:49, 3.07s/it] 36%|███▋ | 2498/6885 [11:59:48<3:33:42, 2.92s/it] 36%|███▋ | 2499/6885 [11:59:50<3:22:11, 2.77s/it] 36%|███▋ | 2500/6885 [11:59:52<3:10:52, 2.61s/it] {'loss': 0.6115, 'grad_norm': 1.1919842026488203, 'learning_rate': 8.038105345446994e-06, 'epoch': 0.36} 36%|███▋ | 2500/6885 [11:59:52<3:10:52, 2.61s/it] 36%|███▋ | 2501/6885 [11:59:56<3:28:32, 2.85s/it] 36%|███▋ | 2502/6885 [11:59:59<3:44:19, 3.07s/it] 36%|███▋ | 2503/6885 [12:00:02<3:40:09, 3.01s/it] 36%|███▋ | 2504/6885 [12:00:05<3:42:43, 3.05s/it] 36%|███▋ | 2505/6885 [12:00:10<4:07:52, 3.40s/it] 36%|███▋ | 2506/6885 [12:00:12<3:52:17, 3.18s/it] 36%|███▋ | 2507/6885 [12:00:15<3:33:01, 2.92s/it] 36%|███▋ | 2508/6885 [12:00:18<3:31:50, 2.90s/it] 36%|███▋ | 2509/6885 [12:00:21<3:49:33, 3.15s/it] 36%|███▋ | 2510/6885 [12:00:24<3:33:32, 2.93s/it] {'loss': 0.592, 'grad_norm': 1.2851968482663827, 'learning_rate': 8.017931308823006e-06, 'epoch': 0.36} 36%|███▋ | 2510/6885 [12:00:24<3:33:32, 2.93s/it] 36%|███▋ | 2511/6885 [12:00:26<3:16:18, 2.69s/it] 36%|███▋ | 2512/6885 [12:00:29<3:32:30, 2.92s/it] 36%|███▋ | 2513/6885 [12:00:31<3:06:02, 2.55s/it] 37%|███▋ | 2514/6885 [12:00:34<3:21:59, 2.77s/it] 37%|███▋ | 2515/6885 
[12:00:38<3:44:44, 3.09s/it] 37%|███▋ | 2516/6885 [12:00:41<3:51:40, 3.18s/it] 37%|███▋ | 2517/6885 [12:00:45<3:59:06, 3.28s/it] 37%|███▋ | 2518/6885 [12:00:48<3:49:10, 3.15s/it] 37%|███▋ | 2519/6885 [12:00:50<3:33:03, 2.93s/it] 37%|███▋ | 2520/6885 [12:00:53<3:19:32, 2.74s/it] {'loss': 0.5867, 'grad_norm': 1.1538243634302991, 'learning_rate': 7.997679685844353e-06, 'epoch': 0.37} 37%|███▋ | 2520/6885 [12:00:53<3:19:32, 2.74s/it] 37%|███▋ | 2521/6885 [12:00:57<3:47:11, 3.12s/it] 37%|███▋ | 2522/6885 [12:00:59<3:32:19, 2.92s/it] 37%|███▋ | 2523/6885 [12:01:03<3:49:11, 3.15s/it] 37%|███▋ | 2524/6885 [12:01:05<3:33:52, 2.94s/it] 37%|███▋ | 2525/6885 [12:01:07<3:18:59, 2.74s/it] 37%|███▋ | 2526/6885 [12:01:09<2:56:52, 2.43s/it] 37%|███▋ | 2527/6885 [12:01:12<2:57:10, 2.44s/it] 37%|███▋ | 2528/6885 [12:01:15<3:14:28, 2.68s/it] 37%|███▋ | 2529/6885 [12:01:19<3:38:39, 3.01s/it] 37%|███▋ | 2530/6885 [12:01:22<3:45:57, 3.11s/it] {'loss': 0.6007, 'grad_norm': 1.0704432112589999, 'learning_rate': 7.977350997148994e-06, 'epoch': 0.37} 37%|███▋ | 2530/6885 [12:01:22<3:45:57, 3.11s/it] 37%|███▋ | 2531/6885 [12:01:25<3:34:39, 2.96s/it] 37%|███▋ | 2532/6885 [12:01:27<3:14:53, 2.69s/it] 37%|███▋ | 2533/6885 [12:01:29<2:58:38, 2.46s/it] 37%|███▋ | 2534/6885 [12:01:32<3:14:06, 2.68s/it] 37%|███▋ | 2535/6885 [12:01:34<3:13:01, 2.66s/it] 37%|███▋ | 2536/6885 [12:01:37<3:20:34, 2.77s/it] 37%|███▋ | 2537/6885 [12:01:39<3:03:50, 2.54s/it] 37%|███▋ | 2538/6885 [12:01:41<2:52:42, 2.38s/it] 37%|███▋ | 2539/6885 [12:01:44<2:58:25, 2.46s/it] 37%|███▋ | 2540/6885 [12:01:46<2:51:12, 2.36s/it] {'loss': 0.5746, 'grad_norm': 1.2707334756597408, 'learning_rate': 7.956945765356133e-06, 'epoch': 0.37} 37%|███▋ | 2540/6885 [12:01:46<2:51:12, 2.36s/it] 37%|███▋ | 2541/6885 [12:01:52<4:02:01, 3.34s/it] 37%|███▋ | 2542/6885 [12:01:54<3:36:31, 2.99s/it] 37%|███▋ | 2543/6885 [12:01:59<4:18:53, 3.58s/it] 37%|███▋ | 2544/6885 [12:02:01<3:52:37, 3.22s/it] 37%|███▋ | 2545/6885 [12:02:04<3:36:34, 2.99s/it] 
37%|███▋ | 2546/6885 [12:02:10<4:55:37, 4.09s/it] 37%|███▋ | 2547/6885 [12:02:14<4:49:47, 4.01s/it] 37%|███▋ | 2548/6885 [12:02:17<4:31:28, 3.76s/it] 37%|███▋ | 2549/6885 [12:02:20<4:04:04, 3.38s/it] 37%|███▋ | 2550/6885 [12:02:22<3:29:57, 2.91s/it] {'loss': 0.601, 'grad_norm': 1.2061421625898763, 'learning_rate': 7.936464515052776e-06, 'epoch': 0.37} 37%|███▋ | 2550/6885 [12:02:22<3:29:57, 2.91s/it] 37%|███▋ | 2551/6885 [12:02:25<3:28:42, 2.89s/it] 37%|███▋ | 2552/6885 [12:02:27<3:22:03, 2.80s/it] 37%|███▋ | 2553/6885 [12:02:30<3:24:12, 2.83s/it] 37%|███▋ | 2554/6885 [12:02:33<3:17:06, 2.73s/it] 37%|███▋ | 2555/6885 [12:02:35<3:18:58, 2.76s/it] 37%|███▋ | 2556/6885 [12:02:39<3:33:38, 2.96s/it] 37%|███▋ | 2557/6885 [12:02:42<3:36:52, 3.01s/it] 37%|███▋ | 2558/6885 [12:02:44<3:23:53, 2.83s/it] 37%|███▋ | 2559/6885 [12:02:48<3:35:57, 3.00s/it] 37%|███▋ | 2560/6885 [12:02:56<5:27:24, 4.54s/it] {'loss': 0.6081, 'grad_norm': 1.318015728266432, 'learning_rate': 7.915907772780244e-06, 'epoch': 0.37} 37%|███▋ | 2560/6885 [12:02:56<5:27:24, 4.54s/it] 37%|███▋ | 2561/6885 [12:02:58<4:46:40, 3.98s/it] 37%|███▋ | 2562/6885 [12:03:01<4:16:24, 3.56s/it] 37%|███▋ | 2563/6885 [12:03:05<4:15:04, 3.54s/it] 37%|███▋ | 2564/6885 [12:03:07<3:56:09, 3.28s/it] 37%|███▋ | 2565/6885 [12:03:10<3:51:31, 3.22s/it] 37%|███▋ | 2566/6885 [12:03:12<3:27:25, 2.88s/it] 37%|███▋ | 2567/6885 [12:03:14<3:09:49, 2.64s/it] 37%|███▋ | 2568/6885 [12:03:17<3:08:07, 2.61s/it] 37%|███▋ | 2569/6885 [12:03:20<3:12:43, 2.68s/it] 37%|███▋ | 2570/6885 [12:03:22<3:05:43, 2.58s/it] {'loss': 0.6046, 'grad_norm': 1.253197445356757, 'learning_rate': 7.89527606702065e-06, 'epoch': 0.37} 37%|███▋ | 2570/6885 [12:03:22<3:05:43, 2.58s/it] 37%|███▋ | 2571/6885 [12:03:25<3:11:59, 2.67s/it] 37%|███▋ | 2572/6885 [12:03:27<3:01:05, 2.52s/it] 37%|███▋ | 2573/6885 [12:03:32<3:46:36, 3.15s/it] 37%|███▋ | 2574/6885 [12:03:36<3:57:05, 3.30s/it] 37%|███▋ | 2575/6885 [12:03:39<3:58:01, 3.31s/it] 37%|███▋ | 2576/6885 
[12:03:41<3:36:30, 3.01s/it] 37%|███▋ | 2577/6885 [12:03:44<3:24:37, 2.85s/it] 37%|███▋ | 2578/6885 [12:03:46<3:04:23, 2.57s/it] 37%|███▋ | 2579/6885 [12:03:50<3:39:23, 3.06s/it] 37%|███▋ | 2580/6885 [12:03:53<3:46:51, 3.16s/it] {'loss': 0.5986, 'grad_norm': 1.190199765539676, 'learning_rate': 7.87456992818329e-06, 'epoch': 0.37} 37%|███▋ | 2580/6885 [12:03:53<3:46:51, 3.16s/it] 37%|███▋ | 2581/6885 [12:03:55<3:26:07, 2.87s/it] 38%|███▊ | 2582/6885 [12:03:58<3:24:23, 2.85s/it] 38%|███▊ | 2583/6885 [12:04:00<3:09:33, 2.64s/it] 38%|███▊ | 2584/6885 [12:04:03<3:11:22, 2.67s/it] 38%|███▊ | 2585/6885 [12:04:06<3:13:03, 2.69s/it] 38%|███▊ | 2586/6885 [12:04:09<3:18:10, 2.77s/it] 38%|███▊ | 2587/6885 [12:04:11<3:02:43, 2.55s/it] 38%|███▊ | 2588/6885 [12:04:13<2:58:05, 2.49s/it] 38%|███▊ | 2589/6885 [12:04:18<3:42:13, 3.10s/it] 38%|███▊ | 2590/6885 [12:04:20<3:14:39, 2.72s/it] {'loss': 0.5889, 'grad_norm': 1.193398450040499, 'learning_rate': 7.853789888591032e-06, 'epoch': 0.38} 38%|███▊ | 2590/6885 [12:04:20<3:14:39, 2.72s/it] 38%|███▊ | 2591/6885 [12:04:22<3:19:18, 2.79s/it] 38%|███▊ | 2592/6885 [12:04:25<3:20:38, 2.80s/it] 38%|███▊ | 2593/6885 [12:04:28<3:26:38, 2.89s/it] 38%|███▊ | 2594/6885 [12:04:31<3:21:25, 2.82s/it] 38%|███▊ | 2595/6885 [12:04:34<3:19:29, 2.79s/it] 38%|███▊ | 2596/6885 [12:04:36<3:06:13, 2.61s/it] 38%|███▊ | 2597/6885 [12:04:38<3:03:02, 2.56s/it] 38%|███▊ | 2598/6885 [12:04:43<3:50:10, 3.22s/it] 38%|███▊ | 2599/6885 [12:04:46<3:40:14, 3.08s/it] 38%|███▊ | 2600/6885 [12:04:52<4:40:18, 3.92s/it] {'loss': 0.5934, 'grad_norm': 1.035053671117003, 'learning_rate': 7.832936482466612e-06, 'epoch': 0.38} 38%|███▊ | 2600/6885 [12:04:52<4:40:18, 3.92s/it] 38%|███▊ | 2601/6885 [12:04:55<4:13:52, 3.56s/it] 38%|███▊ | 2602/6885 [12:04:57<3:43:45, 3.13s/it] 38%|███▊ | 2603/6885 [12:04:59<3:26:57, 2.90s/it] 38%|███▊ | 2604/6885 [12:05:02<3:22:28, 2.84s/it] 38%|███▊ | 2605/6885 [12:05:05<3:36:38, 3.04s/it] 38%|███▊ | 2606/6885 [12:05:07<3:09:17, 2.65s/it] 38%|███▊ 
| 2607/6885 [12:05:10<3:23:20, 2.85s/it] 38%|███▊ | 2608/6885 [12:05:13<3:25:35, 2.88s/it] 38%|███▊ | 2609/6885 [12:05:16<3:29:24, 2.94s/it] 38%|███▊ | 2610/6885 [12:05:19<3:28:32, 2.93s/it] {'loss': 0.586, 'grad_norm': 1.1386993400574172, 'learning_rate': 7.812010245918903e-06, 'epoch': 0.38} 38%|███▊ | 2610/6885 [12:05:19<3:28:32, 2.93s/it] 38%|███▊ | 2611/6885 [12:05:22<3:20:31, 2.81s/it] 38%|███▊ | 2612/6885 [12:05:24<3:15:22, 2.74s/it] 38%|███▊ | 2613/6885 [12:05:27<3:17:43, 2.78s/it] 38%|███▊ | 2614/6885 [12:05:29<3:07:04, 2.63s/it] 38%|███▊ | 2615/6885 [12:05:32<2:54:44, 2.46s/it] 38%|███▊ | 2616/6885 [12:05:34<2:51:56, 2.42s/it] 38%|███▊ | 2617/6885 [12:05:38<3:25:20, 2.89s/it] 38%|███▊ | 2618/6885 [12:05:41<3:31:01, 2.97s/it] 38%|███▊ | 2619/6885 [12:05:44<3:40:36, 3.10s/it] 38%|███▊ | 2620/6885 [12:05:49<4:05:56, 3.46s/it] {'loss': 0.5806, 'grad_norm': 1.1022458257608025, 'learning_rate': 7.79101171692914e-06, 'epoch': 0.38} 38%|███▊ | 2620/6885 [12:05:49<4:05:56, 3.46s/it] 38%|███▊ | 2621/6885 [12:05:51<3:47:51, 3.21s/it] 38%|███▊ | 2622/6885 [12:05:55<3:47:58, 3.21s/it] 38%|███▊ | 2623/6885 [12:05:57<3:39:01, 3.08s/it] 38%|███▊ | 2624/6885 [12:06:00<3:23:18, 2.86s/it] 38%|███▊ | 2625/6885 [12:06:04<3:46:47, 3.19s/it] 38%|███▊ | 2626/6885 [12:06:05<3:17:15, 2.78s/it] 38%|███▊ | 2627/6885 [12:06:10<3:55:17, 3.32s/it] 38%|███▊ | 2628/6885 [12:06:14<4:01:27, 3.40s/it] 38%|███▊ | 2629/6885 [12:06:18<4:24:01, 3.72s/it] 38%|███▊ | 2630/6885 [12:06:20<3:50:49, 3.25s/it] {'loss': 0.5618, 'grad_norm': 1.1758543851880188, 'learning_rate': 7.769941435337083e-06, 'epoch': 0.38} 38%|███▊ | 2630/6885 [12:06:20<3:50:49, 3.25s/it] 38%|███▊ | 2631/6885 [12:06:22<3:21:21, 2.84s/it] 38%|███▊ | 2632/6885 [12:06:25<3:17:02, 2.78s/it] 38%|███▊ | 2633/6885 [12:06:27<3:05:50, 2.62s/it] 38%|███▊ | 2634/6885 [12:06:30<3:19:12, 2.81s/it] 38%|███▊ | 2635/6885 [12:06:34<3:45:37, 3.19s/it] 38%|███▊ | 2636/6885 [12:06:37<3:26:22, 2.91s/it] 38%|███▊ | 2637/6885 [12:06:40<3:27:03, 
2.92s/it] 38%|███▊ | 2638/6885 [12:06:42<3:22:31, 2.86s/it] 38%|███▊ | 2639/6885 [12:06:44<3:06:08, 2.63s/it] 38%|███▊ | 2640/6885 [12:06:49<3:44:37, 3.17s/it] {'loss': 0.6012, 'grad_norm': 1.2426818455480244, 'learning_rate': 7.748799942827147e-06, 'epoch': 0.38} 38%|███▊ | 2640/6885 [12:06:49<3:44:37, 3.17s/it] 38%|███▊ | 2641/6885 [12:06:52<3:50:36, 3.26s/it] 38%|███▊ | 2642/6885 [12:06:55<3:39:52, 3.11s/it] 38%|███▊ | 2643/6885 [12:06:59<3:48:08, 3.23s/it] 38%|███▊ | 2644/6885 [12:07:01<3:29:32, 2.96s/it] 38%|███▊ | 2645/6885 [12:07:06<4:18:53, 3.66s/it] 38%|███▊ | 2646/6885 [12:07:09<4:05:06, 3.47s/it] 38%|███▊ | 2647/6885 [12:07:12<3:42:06, 3.14s/it] 38%|███▊ | 2648/6885 [12:07:14<3:31:51, 3.00s/it] 38%|███▊ | 2649/6885 [12:07:18<3:39:27, 3.11s/it] 38%|███▊ | 2650/6885 [12:07:21<3:40:43, 3.13s/it] {'loss': 0.5887, 'grad_norm': 1.0718204571931684, 'learning_rate': 7.72758778291446e-06, 'epoch': 0.38} 38%|███▊ | 2650/6885 [12:07:21<3:40:43, 3.13s/it] 39%|███▊ | 2651/6885 [12:07:24<3:34:57, 3.05s/it] 39%|███▊ | 2652/6885 [12:07:27<3:33:36, 3.03s/it] 39%|███▊ | 2653/6885 [12:07:31<4:10:02, 3.55s/it] 39%|███▊ | 2654/6885 [12:07:34<3:47:50, 3.23s/it] 39%|███▊ | 2655/6885 [12:07:36<3:20:58, 2.85s/it] 39%|███▊ | 2656/6885 [12:07:38<3:08:59, 2.68s/it] 39%|███▊ | 2657/6885 [12:07:41<3:18:11, 2.81s/it] 39%|███▊ | 2658/6885 [12:07:44<3:27:26, 2.94s/it] 39%|███▊ | 2659/6885 [12:07:48<3:44:02, 3.18s/it] 39%|███▊ | 2660/6885 [12:07:51<3:39:06, 3.11s/it] {'loss': 0.6037, 'grad_norm': 1.0289005823465374, 'learning_rate': 7.706305500930909e-06, 'epoch': 0.39} 39%|███▊ | 2660/6885 [12:07:51<3:39:06, 3.11s/it] 39%|███▊ | 2661/6885 [12:07:54<3:32:20, 3.02s/it] 39%|███▊ | 2662/6885 [12:07:56<3:07:40, 2.67s/it] 39%|███▊ | 2663/6885 [12:08:00<3:30:46, 3.00s/it] 39%|███▊ | 2664/6885 [12:08:02<3:09:14, 2.69s/it] 39%|███▊ | 2665/6885 [12:08:04<3:02:49, 2.60s/it] 39%|███▊ | 2666/6885 [12:08:08<3:31:49, 3.01s/it] 39%|███▊ | 2667/6885 [12:08:10<3:13:00, 2.75s/it] 39%|███▉ | 2668/6885 
[12:08:13<3:24:52, 2.92s/it] 39%|███▉ | 2669/6885 [12:08:16<3:24:18, 2.91s/it] 39%|███▉ | 2670/6885 [12:08:20<3:52:31, 3.31s/it] {'loss': 0.584, 'grad_norm': 1.2478985029233107, 'learning_rate': 7.684953644011103e-06, 'epoch': 0.39} 39%|███▉ | 2670/6885 [12:08:20<3:52:31, 3.31s/it] 39%|███▉ | 2671/6885 [12:08:24<4:02:03, 3.45s/it] 39%|███▉ | 2672/6885 [12:08:28<4:08:53, 3.54s/it] 39%|███▉ | 2673/6885 [12:08:30<3:43:09, 3.18s/it] 39%|███▉ | 2674/6885 [12:08:32<3:16:46, 2.80s/it] 39%|███▉ | 2675/6885 [12:08:35<3:22:03, 2.88s/it] 39%|███▉ | 2676/6885 [12:08:38<3:14:39, 2.77s/it] 39%|███▉ | 2677/6885 [12:08:42<3:45:12, 3.21s/it] 39%|███▉ | 2678/6885 [12:08:45<3:40:00, 3.14s/it] 39%|███▉ | 2679/6885 [12:08:48<3:32:02, 3.02s/it] 39%|███▉ | 2680/6885 [12:08:52<3:51:41, 3.31s/it] {'loss': 0.6007, 'grad_norm': 1.1066991243562059, 'learning_rate': 7.66353276107832e-06, 'epoch': 0.39} 39%|███▉ | 2680/6885 [12:08:52<3:51:41, 3.31s/it] 39%|███▉ | 2681/6885 [12:08:54<3:22:58, 2.90s/it] 39%|███▉ | 2682/6885 [12:08:59<4:03:55, 3.48s/it] 39%|███▉ | 2683/6885 [12:09:01<3:48:14, 3.26s/it] 39%|███▉ | 2684/6885 [12:09:04<3:31:58, 3.03s/it] 39%|███▉ | 2685/6885 [12:09:06<3:13:40, 2.77s/it] 39%|███▉ | 2686/6885 [12:09:09<3:10:20, 2.72s/it] 39%|███▉ | 2687/6885 [12:09:11<3:12:00, 2.74s/it] 39%|███▉ | 2688/6885 [12:09:15<3:34:36, 3.07s/it] 39%|███▉ | 2689/6885 [12:09:18<3:22:35, 2.90s/it] 39%|███▉ | 2690/6885 [12:09:20<3:10:08, 2.72s/it] {'loss': 0.6033, 'grad_norm': 1.2345614999374477, 'learning_rate': 7.64204340283039e-06, 'epoch': 0.39} 39%|███▉ | 2690/6885 [12:09:20<3:10:08, 2.72s/it] 39%|███▉ | 2691/6885 [12:09:24<3:41:10, 3.16s/it] 39%|███▉ | 2692/6885 [12:09:28<4:01:44, 3.46s/it] 39%|███▉ | 2693/6885 [12:09:31<3:47:58, 3.26s/it] 39%|███▉ | 2694/6885 [12:09:34<3:44:51, 3.22s/it] 39%|███▉ | 2695/6885 [12:09:37<3:44:08, 3.21s/it] 39%|███▉ | 2696/6885 [12:09:40<3:30:15, 3.01s/it] 39%|███▉ | 2697/6885 [12:09:43<3:21:36, 2.89s/it] 39%|███▉ | 2698/6885 [12:09:46<3:34:43, 3.08s/it] 39%|███▉ 
| 2699/6885 [12:09:49<3:37:03, 3.11s/it] 39%|███▉ | 2700/6885 [12:09:54<4:09:28, 3.58s/it] {'loss': 0.59, 'grad_norm': 1.0798799696274017, 'learning_rate': 7.620486121725536e-06, 'epoch': 0.39} 39%|███▉ | 2700/6885 [12:09:54<4:09:28, 3.58s/it] 39%|███▉ | 2701/6885 [12:09:58<4:17:05, 3.69s/it] 39%|███▉ | 2702/6885 [12:10:00<3:45:24, 3.23s/it] 39%|███▉ | 2703/6885 [12:10:05<4:13:41, 3.64s/it] 39%|███▉ | 2704/6885 [12:10:07<3:52:07, 3.33s/it] 39%|███▉ | 2705/6885 [12:10:11<4:08:34, 3.57s/it] 39%|███▉ | 2706/6885 [12:10:13<3:32:01, 3.04s/it] 39%|███▉ | 2707/6885 [12:10:16<3:32:22, 3.05s/it] 39%|███▉ | 2708/6885 [12:10:19<3:22:38, 2.91s/it] 39%|███▉ | 2709/6885 [12:10:21<3:12:55, 2.77s/it] 39%|███▉ | 2710/6885 [12:10:26<4:01:47, 3.47s/it] {'loss': 0.5948, 'grad_norm': 1.1600968806836478, 'learning_rate': 7.598861471968174e-06, 'epoch': 0.39} 39%|███▉ | 2710/6885 [12:10:26<4:01:47, 3.47s/it] 39%|███▉ | 2711/6885 [12:10:28<3:22:34, 2.91s/it] 39%|███▉ | 2712/6885 [12:10:31<3:17:10, 2.84s/it] 39%|███▉ | 2713/6885 [12:10:33<3:07:49, 2.70s/it] 39%|███▉ | 2714/6885 [12:10:36<3:14:57, 2.80s/it] 39%|███▉ | 2715/6885 [12:10:39<3:09:34, 2.73s/it] 39%|███▉ | 2716/6885 [12:10:42<3:18:36, 2.86s/it] 39%|███▉ | 2717/6885 [12:10:44<3:02:51, 2.63s/it] 39%|███▉ | 2718/6885 [12:10:48<3:29:13, 3.01s/it] 39%|███▉ | 2719/6885 [12:10:50<3:12:23, 2.77s/it] 40%|███▉ | 2720/6885 [12:10:52<3:00:05, 2.59s/it] {'loss': 0.5981, 'grad_norm': 1.1860847221048887, 'learning_rate': 7.577170009494665e-06, 'epoch': 0.4} 40%|███▉ | 2720/6885 [12:10:52<3:00:05, 2.59s/it] 40%|███▉ | 2721/6885 [12:10:56<3:28:46, 3.01s/it] 40%|███▉ | 2722/6885 [12:10:59<3:20:02, 2.88s/it] 40%|███▉ | 2723/6885 [12:11:02<3:19:47, 2.88s/it] 40%|███▉ | 2724/6885 [12:11:04<3:00:54, 2.61s/it] 40%|███▉ | 2725/6885 [12:11:06<3:05:26, 2.67s/it] 40%|███▉ | 2726/6885 [12:11:09<2:53:07, 2.50s/it] 40%|███▉ | 2727/6885 [12:11:12<3:12:06, 2.77s/it] 40%|███▉ | 2728/6885 [12:11:15<3:25:15, 2.96s/it] 40%|███▉ | 2729/6885 [12:11:19<3:35:00, 
3.10s/it] 40%|███▉ | 2730/6885 [12:11:22<3:43:54, 3.23s/it] {'loss': 0.5772, 'grad_norm': 1.0670434364146835, 'learning_rate': 7.555412291959018e-06, 'epoch': 0.4} 40%|███▉ | 2730/6885 [12:11:22<3:43:54, 3.23s/it] 40%|███▉ | 2731/6885 [12:11:25<3:25:34, 2.97s/it] 40%|███▉ | 2732/6885 [12:11:28<3:22:31, 2.93s/it] 40%|███▉ | 2733/6885 [12:11:31<3:36:20, 3.13s/it] 40%|███▉ | 2734/6885 [12:11:33<3:20:22, 2.90s/it] 40%|███▉ | 2735/6885 [12:11:38<3:45:54, 3.27s/it] 40%|███▉ | 2736/6885 [12:11:39<3:15:50, 2.83s/it] 40%|███▉ | 2737/6885 [12:11:44<3:44:08, 3.24s/it] 40%|███▉ | 2738/6885 [12:11:46<3:34:45, 3.11s/it] 40%|███▉ | 2739/6885 [12:11:49<3:20:44, 2.91s/it] 40%|███▉ | 2740/6885 [12:11:51<3:04:15, 2.67s/it] {'loss': 0.584, 'grad_norm': 1.1865817610815497, 'learning_rate': 7.533588878718561e-06, 'epoch': 0.4} 40%|███▉ | 2740/6885 [12:11:51<3:04:15, 2.67s/it] 40%|███▉ | 2741/6885 [12:11:54<3:01:21, 2.63s/it] 40%|███▉ | 2742/6885 [12:11:58<3:40:01, 3.19s/it] 40%|███▉ | 2743/6885 [12:12:01<3:35:06, 3.12s/it] 40%|███▉ | 2744/6885 [12:12:05<3:47:21, 3.29s/it] 40%|███▉ | 2745/6885 [12:12:08<3:40:23, 3.19s/it] 40%|███▉ | 2746/6885 [12:12:11<3:33:46, 3.10s/it] 40%|███▉ | 2747/6885 [12:12:16<4:24:51, 3.84s/it] 40%|███▉ | 2748/6885 [12:12:19<4:07:28, 3.59s/it] 40%|███▉ | 2749/6885 [12:12:22<4:02:01, 3.51s/it] 40%|███▉ | 2750/6885 [12:12:24<3:26:25, 3.00s/it] {'loss': 0.5832, 'grad_norm': 1.2092053148497965, 'learning_rate': 7.511700330819556e-06, 'epoch': 0.4} 40%|███▉ | 2750/6885 [12:12:24<3:26:25, 3.00s/it] 40%|███▉ | 2751/6885 [12:12:26<3:10:12, 2.76s/it] 40%|███▉ | 2752/6885 [12:12:29<3:08:15, 2.73s/it] 40%|███▉ | 2753/6885 [12:12:32<3:10:26, 2.77s/it] 40%|████ | 2754/6885 [12:12:36<3:27:50, 3.02s/it] 40%|████ | 2755/6885 [12:12:38<3:24:44, 2.97s/it] 40%|████ | 2756/6885 [12:12:41<3:14:09, 2.82s/it] 40%|████ | 2757/6885 [12:12:43<2:52:56, 2.51s/it] 40%|████ | 2758/6885 [12:12:45<2:42:45, 2.37s/it] 40%|████ | 2759/6885 [12:12:49<3:30:42, 3.06s/it] 40%|████ | 2760/6885 
[12:12:54<3:54:41, 3.41s/it] {'loss': 0.5984, 'grad_norm': 1.1770338237370501, 'learning_rate': 7.489747210982777e-06, 'epoch': 0.4} 40%|████ | 2760/6885 [12:12:54<3:54:41, 3.41s/it] 40%|████ | 2761/6885 [12:12:56<3:36:37, 3.15s/it] 40%|████ | 2762/6885 [12:13:00<3:48:03, 3.32s/it] 40%|████ | 2763/6885 [12:13:02<3:16:04, 2.85s/it] 40%|████ | 2764/6885 [12:13:04<2:57:05, 2.58s/it] 40%|████ | 2765/6885 [12:13:06<3:00:08, 2.62s/it] 40%|████ | 2766/6885 [12:13:08<2:49:26, 2.47s/it] 40%|████ | 2767/6885 [12:13:14<3:52:31, 3.39s/it] 40%|████ | 2768/6885 [12:13:16<3:26:02, 3.00s/it] 40%|████ | 2769/6885 [12:13:19<3:20:45, 2.93s/it] 40%|████ | 2770/6885 [12:13:22<3:19:02, 2.90s/it] {'loss': 0.5755, 'grad_norm': 1.1434774901575833, 'learning_rate': 7.4677300835890424e-06, 'epoch': 0.4} 40%|████ | 2770/6885 [12:13:22<3:19:02, 2.90s/it] 40%|████ | 2771/6885 [12:13:25<3:30:02, 3.06s/it] 40%|████ | 2772/6885 [12:13:28<3:21:37, 2.94s/it] 40%|████ | 2773/6885 [12:13:30<3:14:56, 2.84s/it] 40%|████ | 2774/6885 [12:13:34<3:25:41, 3.00s/it] 40%|████ | 2775/6885 [12:13:36<3:16:54, 2.87s/it] 40%|████ | 2776/6885 [12:13:40<3:26:58, 3.02s/it] 40%|████ | 2777/6885 [12:13:43<3:28:47, 3.05s/it] 40%|████ | 2778/6885 [12:13:46<3:29:29, 3.06s/it] 40%|████ | 2779/6885 [12:13:48<3:15:55, 2.86s/it] 40%|████ | 2780/6885 [12:13:54<4:22:59, 3.84s/it] {'loss': 0.5886, 'grad_norm': 1.0366368031771818, 'learning_rate': 7.445649514664703e-06, 'epoch': 0.4} 40%|████ | 2780/6885 [12:13:54<4:22:59, 3.84s/it] 40%|████ | 2781/6885 [12:13:58<4:25:33, 3.88s/it] 40%|████ | 2782/6885 [12:14:05<5:17:54, 4.65s/it] 40%|████ | 2783/6885 [12:14:07<4:22:24, 3.84s/it] 40%|████ | 2784/6885 [12:14:09<3:44:49, 3.29s/it] 40%|████ | 2785/6885 [12:14:11<3:15:07, 2.86s/it] 40%|████ | 2786/6885 [12:14:15<3:50:04, 3.37s/it] 40%|████ | 2787/6885 [12:14:18<3:30:32, 3.08s/it] 40%|████ | 2788/6885 [12:14:20<3:17:58, 2.90s/it] 41%|████ | 2789/6885 [12:14:23<3:26:38, 3.03s/it] 41%|████ | 2790/6885 [12:14:26<3:12:56, 2.83s/it] 
{'loss': 0.6134, 'grad_norm': 1.2729396302065998, 'learning_rate': 7.423506071867101e-06, 'epoch': 0.41} 41%|████ | 2790/6885 [12:14:26<3:12:56, 2.83s/it] 41%|████ | 2791/6885 [12:14:28<2:53:26, 2.54s/it] 41%|████ | 2792/6885 [12:14:30<2:46:56, 2.45s/it] 41%|████ | 2793/6885 [12:14:33<3:00:27, 2.65s/it] 41%|████ | 2794/6885 [12:14:38<3:46:04, 3.32s/it] 41%|████ | 2795/6885 [12:14:42<3:55:32, 3.46s/it] 41%|████ | 2796/6885 [12:14:44<3:40:00, 3.23s/it] 41%|████ | 2797/6885 [12:14:47<3:27:29, 3.05s/it] 41%|████ | 2798/6885 [12:14:49<3:03:05, 2.69s/it] 41%|████ | 2799/6885 [12:14:52<3:12:05, 2.82s/it] 41%|████ | 2800/6885 [12:14:55<3:16:44, 2.89s/it] {'loss': 0.5737, 'grad_norm': 1.0518352889412923, 'learning_rate': 7.401300324469961e-06, 'epoch': 0.41} 41%|████ | 2800/6885 [12:14:55<3:16:44, 2.89s/it] 41%|████ | 2801/6885 [12:14:57<3:01:06, 2.66s/it] 41%|████ | 2802/6885 [12:15:01<3:28:00, 3.06s/it] 41%|████ | 2803/6885 [12:15:04<3:31:32, 3.11s/it] 41%|████ | 2804/6885 [12:15:07<3:27:57, 3.06s/it] 41%|████ | 2805/6885 [12:15:09<3:09:08, 2.78s/it] 41%|████ | 2806/6885 [12:15:12<3:08:02, 2.77s/it] 41%|████ | 2807/6885 [12:15:14<2:51:13, 2.52s/it] 41%|████ | 2808/6885 [12:15:19<3:34:29, 3.16s/it] 41%|████ | 2809/6885 [12:15:21<3:24:30, 3.01s/it] 41%|████ | 2810/6885 [12:15:25<3:30:10, 3.09s/it] {'loss': 0.5874, 'grad_norm': 1.2001944481237583, 'learning_rate': 7.3790328433487665e-06, 'epoch': 0.41} 41%|████ | 2810/6885 [12:15:25<3:30:10, 3.09s/it] 41%|████ | 2811/6885 [12:15:28<3:37:58, 3.21s/it] 41%|████ | 2812/6885 [12:15:30<3:14:57, 2.87s/it] 41%|████ | 2813/6885 [12:15:33<3:17:10, 2.91s/it] 41%|████ | 2814/6885 [12:15:40<4:36:26, 4.07s/it] 41%|████ | 2815/6885 [12:15:44<4:34:52, 4.05s/it] 41%|████ | 2816/6885 [12:15:46<4:00:28, 3.55s/it] 41%|████ | 2817/6885 [12:15:49<3:35:34, 3.18s/it] 41%|████ | 2818/6885 [12:15:53<3:55:15, 3.47s/it] 41%|████ | 2819/6885 [12:15:56<3:49:34, 3.39s/it] 41%|████ | 2820/6885 [12:15:58<3:16:01, 2.89s/it] {'loss': 0.5862, 'grad_norm': 
1.250231920993964, 'learning_rate': 7.3567042009660786e-06, 'epoch': 0.41} 41%|████ | 2820/6885 [12:15:58<3:16:01, 2.89s/it] 41%|████ | 2821/6885 [12:16:02<3:45:33, 3.33s/it] 41%|████ | 2822/6885 [12:16:05<3:41:13, 3.27s/it] 41%|████ | 2823/6885 [12:16:11<4:39:08, 4.12s/it] 41%|████ | 2824/6885 [12:16:14<4:06:19, 3.64s/it] 41%|████ | 2825/6885 [12:16:17<3:48:05, 3.37s/it] 41%|████ | 2826/6885 [12:16:20<3:54:22, 3.46s/it] 41%|████ | 2827/6885 [12:16:24<4:04:13, 3.61s/it] 41%|████ | 2828/6885 [12:16:27<3:36:42, 3.21s/it] 41%|████ | 2829/6885 [12:16:30<3:51:34, 3.43s/it] 41%|████ | 2830/6885 [12:16:34<3:55:36, 3.49s/it] {'loss': 0.593, 'grad_norm': 1.1512872210708966, 'learning_rate': 7.3343149713568215e-06, 'epoch': 0.41} 41%|████ | 2830/6885 [12:16:34<3:55:36, 3.49s/it] 41%|████ | 2831/6885 [12:16:36<3:33:21, 3.16s/it] 41%|████ | 2832/6885 [12:16:39<3:26:03, 3.05s/it] 41%|████ | 2833/6885 [12:16:42<3:10:06, 2.82s/it] 41%|████ | 2834/6885 [12:16:48<4:21:55, 3.88s/it] 41%|████ | 2835/6885 [12:16:50<3:51:49, 3.43s/it] 41%|████ | 2836/6885 [12:16:53<3:34:42, 3.18s/it] 41%|████ | 2837/6885 [12:16:56<3:35:04, 3.19s/it] 41%|████ | 2838/6885 [12:16:59<3:24:13, 3.03s/it] 41%|████ | 2839/6885 [12:17:02<3:38:35, 3.24s/it] 41%|████ | 2840/6885 [12:17:06<3:38:42, 3.24s/it] {'loss': 0.5939, 'grad_norm': 1.1605256860138091, 'learning_rate': 7.311865730113525e-06, 'epoch': 0.41} 41%|████ | 2840/6885 [12:17:06<3:38:42, 3.24s/it] 41%|████▏ | 2841/6885 [12:17:08<3:23:01, 3.01s/it] 41%|████▏ | 2842/6885 [12:17:11<3:18:30, 2.95s/it] 41%|████▏ | 2843/6885 [12:17:14<3:24:09, 3.03s/it] 41%|████▏ | 2844/6885 [12:17:17<3:11:13, 2.84s/it] 41%|████▏ | 2845/6885 [12:17:19<3:11:37, 2.85s/it] 41%|████▏ | 2846/6885 [12:17:23<3:28:19, 3.09s/it] 41%|████▏ | 2847/6885 [12:17:26<3:23:16, 3.02s/it] 41%|████▏ | 2848/6885 [12:17:29<3:16:43, 2.92s/it] 41%|████▏ | 2849/6885 [12:17:31<3:04:49, 2.75s/it] 41%|████▏ | 2850/6885 [12:17:33<2:49:19, 2.52s/it] {'loss': 0.6028, 'grad_norm': 1.3940208410225592, 
'learning_rate': 7.2893570543715174e-06, 'epoch': 0.41} 41%|████▏ | 2850/6885 [12:17:33<2:49:19, 2.52s/it] 41%|████▏ | 2851/6885 [12:17:37<3:25:17, 3.05s/it] 41%|████▏ | 2852/6885 [12:17:42<3:54:06, 3.48s/it] 41%|████▏ | 2853/6885 [12:17:45<3:40:47, 3.29s/it] 41%|████▏ | 2854/6885 [12:17:47<3:30:55, 3.14s/it] 41%|████▏ | 2855/6885 [12:17:51<3:37:48, 3.24s/it] 41%|████▏ | 2856/6885 [12:17:54<3:35:19, 3.21s/it] 41%|████▏ | 2857/6885 [12:17:58<3:51:33, 3.45s/it] 42%|████▏ | 2858/6885 [12:18:01<3:34:10, 3.19s/it] 42%|████▏ | 2859/6885 [12:18:04<3:30:06, 3.13s/it] 42%|████▏ | 2860/6885 [12:18:06<3:18:08, 2.95s/it] {'loss': 0.6065, 'grad_norm': 1.1976078557092422, 'learning_rate': 7.266789522794104e-06, 'epoch': 0.42} 42%|████▏ | 2860/6885 [12:18:06<3:18:08, 2.95s/it] 42%|████▏ | 2861/6885 [12:18:08<2:53:31, 2.59s/it] 42%|████▏ | 2862/6885 [12:18:12<3:17:34, 2.95s/it] 42%|████▏ | 2863/6885 [12:18:15<3:34:16, 3.20s/it] 42%|████▏ | 2864/6885 [12:18:18<3:22:40, 3.02s/it] 42%|████▏ | 2865/6885 [12:18:21<3:10:35, 2.84s/it] 42%|████▏ | 2866/6885 [12:18:24<3:16:44, 2.94s/it] 42%|████▏ | 2867/6885 [12:18:29<3:59:32, 3.58s/it] 42%|████▏ | 2868/6885 [12:18:32<3:51:17, 3.45s/it] 42%|████▏ | 2869/6885 [12:18:35<3:37:40, 3.25s/it] 42%|████▏ | 2870/6885 [12:18:38<3:40:53, 3.30s/it] {'loss': 0.5915, 'grad_norm': 1.035110243445679, 'learning_rate': 7.244163715557683e-06, 'epoch': 0.42} 42%|████▏ | 2870/6885 [12:18:38<3:40:53, 3.30s/it] 42%|████▏ | 2871/6885 [12:18:40<3:13:55, 2.90s/it] 42%|████▏ | 2872/6885 [12:18:43<3:04:59, 2.77s/it] 42%|████▏ | 2873/6885 [12:18:47<3:47:49, 3.41s/it] 42%|████▏ | 2874/6885 [12:18:50<3:38:34, 3.27s/it] 42%|████▏ | 2875/6885 [12:18:53<3:21:54, 3.02s/it] 42%|████▏ | 2876/6885 [12:18:55<3:14:23, 2.91s/it] 42%|████▏ | 2877/6885 [12:18:58<3:10:58, 2.86s/it] 42%|████▏ | 2878/6885 [12:19:00<2:47:41, 2.51s/it] 42%|████▏ | 2879/6885 [12:19:02<2:44:55, 2.47s/it] 42%|████▏ | 2880/6885 [12:19:05<2:47:12, 2.51s/it] {'loss': 0.5961, 'grad_norm': 1.1865073190747897, 
'learning_rate': 7.2214802143368225e-06, 'epoch': 0.42} 42%|████▏ | 2880/6885 [12:19:05<2:47:12, 2.51s/it] 42%|████▏ | 2881/6885 [12:19:08<3:05:21, 2.78s/it] 42%|████▏ | 2882/6885 [12:19:11<2:57:45, 2.66s/it] 42%|████▏ | 2883/6885 [12:19:15<3:29:50, 3.15s/it] 42%|████▏ | 2884/6885 [12:19:17<3:14:21, 2.91s/it] 42%|████▏ | 2885/6885 [12:19:20<3:03:47, 2.76s/it] 42%|████▏ | 2886/6885 [12:19:22<2:45:37, 2.48s/it] 42%|████▏ | 2887/6885 [12:19:25<3:03:31, 2.75s/it] 42%|████▏ | 2888/6885 [12:19:29<3:22:43, 3.04s/it] 42%|████▏ | 2889/6885 [12:19:34<3:59:04, 3.59s/it] 42%|████▏ | 2890/6885 [12:19:36<3:45:26, 3.39s/it] {'loss': 0.5857, 'grad_norm': 1.0991372561424138, 'learning_rate': 7.1987396022893216e-06, 'epoch': 0.42} 42%|████▏ | 2890/6885 [12:19:36<3:45:26, 3.39s/it] 42%|████▏ | 2891/6885 [12:19:40<3:46:57, 3.41s/it] 42%|████▏ | 2892/6885 [12:19:42<3:19:56, 3.00s/it] 42%|████▏ | 2893/6885 [12:19:45<3:11:16, 2.87s/it] 42%|████▏ | 2894/6885 [12:19:48<3:18:42, 2.99s/it] 42%|████▏ | 2895/6885 [12:19:51<3:28:57, 3.14s/it] 42%|████▏ | 2896/6885 [12:19:57<4:16:26, 3.86s/it] 42%|████▏ | 2897/6885 [12:20:00<4:00:06, 3.61s/it] 42%|████▏ | 2898/6885 [12:20:05<4:31:38, 4.09s/it] 42%|████▏ | 2899/6885 [12:20:07<3:52:18, 3.50s/it] 42%|████▏ | 2900/6885 [12:20:11<4:05:09, 3.69s/it] {'loss': 0.5829, 'grad_norm': 1.0801243737112538, 'learning_rate': 7.175942464041209e-06, 'epoch': 0.42} 42%|████▏ | 2900/6885 [12:20:11<4:05:09, 3.69s/it] 42%|████▏ | 2901/6885 [12:20:14<3:48:19, 3.44s/it] 42%|████▏ | 2902/6885 [12:20:18<3:59:57, 3.61s/it] 42%|████▏ | 2903/6885 [12:20:20<3:27:35, 3.13s/it] 42%|████▏ | 2904/6885 [12:20:23<3:17:15, 2.97s/it] 42%|████▏ | 2905/6885 [12:20:25<3:04:14, 2.78s/it] 42%|████▏ | 2906/6885 [12:20:28<3:10:58, 2.88s/it] 42%|████▏ | 2907/6885 [12:20:32<3:30:17, 3.17s/it] 42%|████▏ | 2908/6885 [12:20:35<3:19:41, 3.01s/it] 42%|████▏ | 2909/6885 [12:20:37<3:04:36, 2.79s/it] 42%|████▏ | 2910/6885 [12:20:39<2:46:28, 2.51s/it] {'loss': 0.5869, 'grad_norm': 1.3295568712189132, 
'learning_rate': 7.15308938567171e-06, 'epoch': 0.42} 42%|████▏ | 2910/6885 [12:20:39<2:46:28, 2.51s/it] 42%|████▏ | 2911/6885 [12:20:41<2:43:48, 2.47s/it] 42%|████▏ | 2912/6885 [12:20:44<2:56:38, 2.67s/it] 42%|████▏ | 2913/6885 [12:20:47<2:55:00, 2.64s/it] 42%|████▏ | 2914/6885 [12:20:49<2:41:14, 2.44s/it] 42%|████▏ | 2915/6885 [12:20:52<2:56:17, 2.66s/it] 42%|████▏ | 2916/6885 [12:20:56<3:18:49, 3.01s/it] 42%|████▏ | 2917/6885 [12:20:59<3:15:59, 2.96s/it] 42%|████▏ | 2918/6885 [12:21:01<3:11:32, 2.90s/it] 42%|████▏ | 2919/6885 [12:21:03<2:50:41, 2.58s/it] 42%|████▏ | 2920/6885 [12:21:06<2:50:30, 2.58s/it] {'loss': 0.5842, 'grad_norm': 1.0402363831702612, 'learning_rate': 7.130180954698187e-06, 'epoch': 0.42} 42%|████▏ | 2920/6885 [12:21:06<2:50:30, 2.58s/it] 42%|████▏ | 2921/6885 [12:21:10<3:25:49, 3.12s/it] 42%|████▏ | 2922/6885 [12:21:12<3:02:00, 2.76s/it] 42%|████▏ | 2923/6885 [12:21:15<3:05:56, 2.82s/it] 42%|████▏ | 2924/6885 [12:21:19<3:24:00, 3.09s/it] 42%|████▏ | 2925/6885 [12:21:22<3:31:16, 3.20s/it] 42%|████▏ | 2926/6885 [12:21:25<3:28:29, 3.16s/it] 43%|████▎ | 2927/6885 [12:21:28<3:08:10, 2.85s/it] 43%|████▎ | 2928/6885 [12:21:31<3:12:04, 2.91s/it] 43%|████▎ | 2929/6885 [12:21:33<2:53:38, 2.63s/it] 43%|████▎ | 2930/6885 [12:21:36<3:16:41, 2.98s/it] {'loss': 0.5923, 'grad_norm': 1.1031276144488775, 'learning_rate': 7.107217760061036e-06, 'epoch': 0.43} 43%|████▎ | 2930/6885 [12:21:36<3:16:41, 2.98s/it] 43%|████▎ | 2931/6885 [12:21:40<3:27:22, 3.15s/it] 43%|████▎ | 2932/6885 [12:21:42<2:59:53, 2.73s/it] 43%|████▎ | 2933/6885 [12:21:44<2:45:46, 2.52s/it] 43%|████▎ | 2934/6885 [12:21:46<2:33:48, 2.34s/it] 43%|████▎ | 2935/6885 [12:21:51<3:27:42, 3.16s/it] 43%|████▎ | 2936/6885 [12:21:54<3:29:28, 3.18s/it] 43%|████▎ | 2937/6885 [12:21:55<2:55:30, 2.67s/it] 43%|████▎ | 2938/6885 [12:21:58<3:00:25, 2.74s/it] 43%|████▎ | 2939/6885 [12:22:02<3:12:40, 2.93s/it] 43%|████▎ | 2940/6885 [12:22:07<4:04:30, 3.72s/it] {'loss': 0.6053, 'grad_norm': 1.183086396688286, 
'learning_rate': 7.0842003921085376e-06, 'epoch': 0.43} 43%|████▎ | 2940/6885 [12:22:07<4:04:30, 3.72s/it] 43%|████▎ | 2941/6885 [12:22:10<3:52:54, 3.54s/it] 43%|████▎ | 2942/6885 [12:22:14<3:47:30, 3.46s/it] 43%|████▎ | 2943/6885 [12:22:16<3:21:38, 3.07s/it] 43%|████▎ | 2944/6885 [12:22:19<3:19:14, 3.03s/it] 43%|████▎ | 2945/6885 [12:22:21<3:02:25, 2.78s/it] 43%|████▎ | 2946/6885 [12:22:24<3:15:07, 2.97s/it] 43%|████▎ | 2947/6885 [12:22:29<3:52:01, 3.54s/it] 43%|████▎ | 2948/6885 [12:22:31<3:27:13, 3.16s/it] 43%|████▎ | 2949/6885 [12:22:34<3:10:09, 2.90s/it] 43%|████▎ | 2950/6885 [12:22:37<3:14:00, 2.96s/it] {'loss': 0.5924, 'grad_norm': 1.244303339507363, 'learning_rate': 7.061129442581685e-06, 'epoch': 0.43} 43%|████▎ | 2950/6885 [12:22:37<3:14:00, 2.96s/it] 43%|████▎ | 2951/6885 [12:22:39<2:56:24, 2.69s/it] 43%|████▎ | 2952/6885 [12:22:41<2:49:02, 2.58s/it] 43%|████▎ | 2953/6885 [12:22:43<2:40:54, 2.46s/it] 43%|████▎ | 2954/6885 [12:22:46<2:50:10, 2.60s/it] 43%|████▎ | 2955/6885 [12:22:48<2:36:05, 2.38s/it] 43%|████▎ | 2956/6885 [12:22:50<2:32:33, 2.33s/it] 43%|████▎ | 2957/6885 [12:22:53<2:34:06, 2.35s/it] 43%|████▎ | 2958/6885 [12:22:55<2:37:56, 2.41s/it] 43%|████▎ | 2959/6885 [12:23:00<3:12:01, 2.93s/it] 43%|████▎ | 2960/6885 [12:23:02<3:07:45, 2.87s/it] {'loss': 0.5922, 'grad_norm': 1.2478572360385807, 'learning_rate': 7.038005504598975e-06, 'epoch': 0.43} 43%|████▎ | 2960/6885 [12:23:02<3:07:45, 2.87s/it] 43%|████▎ | 2961/6885 [12:23:06<3:24:18, 3.12s/it] 43%|████▎ | 2962/6885 [12:23:09<3:18:52, 3.04s/it] 43%|████▎ | 2963/6885 [12:23:11<3:00:33, 2.76s/it] 43%|████▎ | 2964/6885 [12:23:13<2:55:25, 2.68s/it] 43%|████▎ | 2965/6885 [12:23:15<2:41:18, 2.47s/it] 43%|████▎ | 2966/6885 [12:23:19<3:10:47, 2.92s/it] 43%|████▎ | 2967/6885 [12:23:23<3:16:24, 3.01s/it] 43%|████▎ | 2968/6885 [12:23:25<3:10:46, 2.92s/it] 43%|████▎ | 2969/6885 [12:23:28<2:59:46, 2.75s/it] 43%|████▎ | 2970/6885 [12:23:30<2:58:26, 2.73s/it] {'loss': 0.5825, 'grad_norm': 1.0447681879549313, 
'learning_rate': 7.0148291726411486e-06, 'epoch': 0.43} 43%|████▎ | 2970/6885 [12:23:30<2:58:26, 2.73s/it] 43%|████▎ | 2971/6885 [12:23:34<3:08:27, 2.89s/it] 43%|████▎ | 2972/6885 [12:23:35<2:46:39, 2.56s/it] 43%|████▎ | 2973/6885 [12:23:38<2:56:08, 2.70s/it] 43%|████▎ | 2974/6885 [12:23:42<3:10:24, 2.92s/it] 43%|████▎ | 2975/6885 [12:23:46<3:41:42, 3.40s/it] 43%|████▎ | 2976/6885 [12:23:49<3:23:18, 3.12s/it] 43%|████▎ | 2977/6885 [12:23:53<3:47:37, 3.49s/it] 43%|████▎ | 2978/6885 [12:23:55<3:21:54, 3.10s/it] 43%|████▎ | 2979/6885 [12:23:59<3:33:43, 3.28s/it] 43%|████▎ | 2980/6885 [12:24:03<3:39:05, 3.37s/it] {'loss': 0.5956, 'grad_norm': 1.1025428022026995, 'learning_rate': 6.9916010425359214e-06, 'epoch': 0.43} 43%|████▎ | 2980/6885 [12:24:03<3:39:05, 3.37s/it] 43%|████▎ | 2981/6885 [12:24:05<3:21:32, 3.10s/it] 43%|████▎ | 2982/6885 [12:24:10<3:48:35, 3.51s/it] 43%|████▎ | 2983/6885 [12:24:12<3:34:52, 3.30s/it] 43%|████▎ | 2984/6885 [12:24:17<4:00:53, 3.70s/it] 43%|████▎ | 2985/6885 [12:24:20<3:48:51, 3.52s/it] 43%|████▎ | 2986/6885 [12:24:24<4:01:01, 3.71s/it] 43%|████▎ | 2987/6885 [12:24:27<3:36:01, 3.33s/it] 43%|████▎ | 2988/6885 [12:24:30<3:37:04, 3.34s/it] 43%|████▎ | 2989/6885 [12:24:32<3:17:09, 3.04s/it] 43%|████▎ | 2990/6885 [12:24:34<2:50:26, 2.63s/it] {'loss': 0.5772, 'grad_norm': 1.329010163267056, 'learning_rate': 6.968321711442658e-06, 'epoch': 0.43} 43%|████▎ | 2990/6885 [12:24:34<2:50:26, 2.63s/it] 43%|████▎ | 2991/6885 [12:24:37<3:02:20, 2.81s/it] 43%|████▎ | 2992/6885 [12:24:40<2:52:01, 2.65s/it] 43%|████▎ | 2993/6885 [12:24:42<2:48:31, 2.60s/it] 43%|████▎ | 2994/6885 [12:24:46<3:18:42, 3.06s/it] 44%|████▎ | 2995/6885 [12:24:49<3:10:23, 2.94s/it] 44%|████▎ | 2996/6885 [12:24:51<2:57:45, 2.74s/it] 44%|████▎ | 2997/6885 [12:24:53<2:46:06, 2.56s/it] 44%|████▎ | 2998/6885 [12:24:56<2:47:56, 2.59s/it] 44%|████▎ | 2999/6885 [12:24:59<2:46:11, 2.57s/it] 44%|████▎ | 3000/6885 [12:25:00<2:30:13, 2.32s/it] {'loss': 0.5933, 'grad_norm': 1.2330587975332181, 
'learning_rate': 6.9449917778370216e-06, 'epoch': 0.44} 44%|████▎ | 3000/6885 [12:25:00<2:30:13, 2.32s/it] 44%|████▎ | 3001/6885 [12:25:03<2:37:32, 2.43s/it] 44%|████▎ | 3002/6885 [12:25:05<2:36:47, 2.42s/it] 44%|████▎ | 3003/6885 [12:25:07<2:28:35, 2.30s/it] 44%|████▎ | 3004/6885 [12:25:09<2:18:21, 2.14s/it] 44%|████▎ | 3005/6885 [12:25:12<2:41:19, 2.49s/it] 44%|████▎ | 3006/6885 [12:25:15<2:40:25, 2.48s/it] 44%|████▎ | 3007/6885 [12:25:17<2:34:15, 2.39s/it] 44%|████▎ | 3008/6885 [12:25:19<2:31:09, 2.34s/it] 44%|████▎ | 3009/6885 [12:25:23<2:51:21, 2.65s/it] 44%|████▎ | 3010/6885 [12:25:25<2:40:55, 2.49s/it] {'loss': 0.5922, 'grad_norm': 1.1656344009683823, 'learning_rate': 6.921611841495584e-06, 'epoch': 0.44} 44%|████▎ | 3010/6885 [12:25:25<2:40:55, 2.49s/it] 44%|████▎ | 3011/6885 [12:25:28<2:47:16, 2.59s/it] 44%|████▎ | 3012/6885 [12:25:31<2:54:58, 2.71s/it] 44%|████▍ | 3013/6885 [12:25:33<2:46:24, 2.58s/it] 44%|████▍ | 3014/6885 [12:26:11<14:10:14, 13.18s/it] 44%|████▍ | 3015/6885 [12:26:13<10:46:24, 10.02s/it] 44%|████▍ | 3016/6885 [12:26:16<8:27:34, 7.87s/it] 44%|████▍ | 3017/6885 [12:26:18<6:37:45, 6.17s/it] 44%|████▍ | 3018/6885 [12:26:21<5:26:53, 5.07s/it] 44%|████▍ | 3019/6885 [12:26:24<4:39:11, 4.33s/it] 44%|████▍ | 3020/6885 [12:26:25<3:50:26, 3.58s/it] {'loss': 0.5911, 'grad_norm': 1.2709734185927093, 'learning_rate': 6.898182503480414e-06, 'epoch': 0.44} 44%|████▍ | 3020/6885 [12:26:25<3:50:26, 3.58s/it] 44%|████▍ | 3021/6885 [12:26:28<3:31:08, 3.28s/it] 44%|████▍ | 3022/6885 [12:26:31<3:23:28, 3.16s/it] 44%|████▍ | 3023/6885 [12:26:33<2:53:41, 2.70s/it] 44%|████▍ | 3024/6885 [12:26:36<3:04:10, 2.86s/it] 44%|████▍ | 3025/6885 [12:26:41<3:54:40, 3.65s/it] 44%|████▍ | 3026/6885 [12:26:48<4:47:14, 4.47s/it] 44%|████▍ | 3027/6885 [12:26:53<5:10:38, 4.83s/it] 44%|████▍ | 3028/6885 [12:26:56<4:22:33, 4.08s/it] 44%|████▍ | 3029/6885 [12:26:58<3:47:09, 3.53s/it] 44%|████▍ | 3030/6885 [12:27:00<3:13:28, 3.01s/it] {'loss': 0.6103, 'grad_norm': 
1.269770194129687, 'learning_rate': 6.8747043661236215e-06, 'epoch': 0.44} 44%|████▍ | 3030/6885 [12:27:00<3:13:28, 3.01s/it] 44%|████▍ | 3031/6885 [12:27:01<2:45:27, 2.58s/it] 44%|████▍ | 3032/6885 [12:27:03<2:38:15, 2.46s/it] 44%|████▍ | 3033/6885 [12:27:05<2:25:53, 2.27s/it] 44%|████▍ | 3034/6885 [12:27:07<2:23:08, 2.23s/it] 44%|████▍ | 3035/6885 [12:27:13<3:27:06, 3.23s/it] 44%|████▍ | 3036/6885 [12:27:17<3:40:57, 3.44s/it] 44%|████▍ | 3037/6885 [12:27:20<3:27:37, 3.24s/it] 44%|████▍ | 3038/6885 [12:27:22<3:19:27, 3.11s/it] 44%|████▍ | 3039/6885 [12:27:26<3:25:21, 3.20s/it] 44%|████▍ | 3040/6885 [12:27:29<3:24:44, 3.19s/it] {'loss': 0.5997, 'grad_norm': 1.106713465551905, 'learning_rate': 6.851178033011869e-06, 'epoch': 0.44} 44%|████▍ | 3040/6885 [12:27:29<3:24:44, 3.19s/it] 44%|████▍ | 3041/6885 [12:27:32<3:18:54, 3.10s/it] 44%|████▍ | 3042/6885 [12:27:35<3:19:08, 3.11s/it] 44%|████▍ | 3043/6885 [12:27:38<3:20:47, 3.14s/it] 44%|████▍ | 3044/6885 [12:27:40<3:01:06, 2.83s/it] 44%|████▍ | 3045/6885 [12:27:43<3:06:04, 2.91s/it] 44%|████▍ | 3046/6885 [12:27:47<3:09:12, 2.96s/it] 44%|████▍ | 3047/6885 [12:27:49<3:01:34, 2.84s/it] 44%|████▍ | 3048/6885 [12:27:51<2:41:01, 2.52s/it] 44%|████▍ | 3049/6885 [12:27:54<2:48:41, 2.64s/it] 44%|████▍ | 3050/6885 [12:27:56<2:46:17, 2.60s/it] {'loss': 0.5727, 'grad_norm': 1.1985970638971495, 'learning_rate': 6.82760410897086e-06, 'epoch': 0.44} 44%|████▍ | 3050/6885 [12:27:56<2:46:17, 2.60s/it] 44%|████▍ | 3051/6885 [12:27:59<2:51:15, 2.68s/it] 44%|████▍ | 3052/6885 [12:28:02<2:59:06, 2.80s/it] 44%|████▍ | 3053/6885 [12:28:05<2:52:38, 2.70s/it] 44%|████▍ | 3054/6885 [12:28:08<2:54:22, 2.73s/it] 44%|████▍ | 3055/6885 [12:28:13<3:40:00, 3.45s/it] 44%|████▍ | 3056/6885 [12:28:16<3:30:40, 3.30s/it] 44%|████▍ | 3057/6885 [12:28:20<3:51:59, 3.64s/it] 44%|████▍ | 3058/6885 [12:28:22<3:28:45, 3.27s/it] 44%|████▍ | 3059/6885 [12:28:26<3:25:52, 3.23s/it] 44%|████▍ | 3060/6885 [12:28:28<3:13:48, 3.04s/it] {'loss': 0.5983, 'grad_norm': 
1.1259472634689607, 'learning_rate': 6.8039832000497865e-06, 'epoch': 0.44} 44%|████▍ | 3060/6885 [12:28:28<3:13:48, 3.04s/it] 44%|████▍ | 3061/6885 [12:28:33<3:40:25, 3.46s/it] 44%|████▍ | 3062/6885 [12:28:36<3:38:52, 3.44s/it] 44%|████▍ | 3063/6885 [12:28:39<3:30:34, 3.31s/it] 45%|████▍ | 3064/6885 [12:28:42<3:16:14, 3.08s/it] 45%|████▍ | 3065/6885 [12:28:43<2:46:05, 2.61s/it] 45%|████▍ | 3066/6885 [12:28:46<2:47:44, 2.64s/it] 45%|████▍ | 3067/6885 [12:28:50<3:15:20, 3.07s/it] 45%|████▍ | 3068/6885 [12:28:55<3:48:13, 3.59s/it] 45%|████▍ | 3069/6885 [12:28:58<3:44:42, 3.53s/it] 45%|████▍ | 3070/6885 [12:29:00<3:23:10, 3.20s/it] {'loss': 0.5958, 'grad_norm': 1.212189906596056, 'learning_rate': 6.78031591350575e-06, 'epoch': 0.45} 45%|████▍ | 3070/6885 [12:29:00<3:23:10, 3.20s/it] 45%|████▍ | 3071/6885 [12:29:05<3:46:12, 3.56s/it] 45%|████▍ | 3072/6885 [12:29:09<4:02:30, 3.82s/it] 45%|████▍ | 3073/6885 [12:29:11<3:17:32, 3.11s/it] 45%|████▍ | 3074/6885 [12:29:14<3:14:08, 3.06s/it] 45%|████▍ | 3075/6885 [12:29:17<3:19:33, 3.14s/it] 45%|████▍ | 3076/6885 [12:29:21<3:37:53, 3.43s/it] 45%|████▍ | 3077/6885 [12:29:23<3:07:39, 2.96s/it] 45%|████▍ | 3078/6885 [12:29:25<2:55:26, 2.76s/it] 45%|████▍ | 3079/6885 [12:29:29<3:06:36, 2.94s/it] 45%|████▍ | 3080/6885 [12:29:33<3:34:35, 3.38s/it] {'loss': 0.5717, 'grad_norm': 1.0999728539824523, 'learning_rate': 6.756602857788148e-06, 'epoch': 0.45} 45%|████▍ | 3080/6885 [12:29:33<3:34:35, 3.38s/it] 45%|████▍ | 3081/6885 [12:29:36<3:17:07, 3.11s/it] 45%|████▍ | 3082/6885 [12:29:39<3:18:12, 3.13s/it] 45%|████▍ | 3083/6885 [12:29:41<2:56:48, 2.79s/it] 45%|████▍ | 3084/6885 [12:29:47<3:54:20, 3.70s/it] 45%|████▍ | 3085/6885 [12:29:50<3:45:30, 3.56s/it] 45%|████▍ | 3086/6885 [12:29:53<3:33:11, 3.37s/it] 45%|████▍ | 3087/6885 [12:29:56<3:29:24, 3.31s/it] 45%|████▍ | 3088/6885 [12:30:00<3:43:49, 3.54s/it] 45%|████▍ | 3089/6885 [12:30:02<3:13:11, 3.05s/it] 45%|████▍ | 3090/6885 [12:30:05<3:15:29, 3.09s/it] {'loss': 0.5793, 'grad_norm': 
1.1130187014726358, 'learning_rate': 6.732844642523032e-06, 'epoch': 0.45} 45%|████▍ | 3090/6885 [12:30:05<3:15:29, 3.09s/it] 45%|████▍ | 3091/6885 [12:30:07<2:52:24, 2.73s/it] 45%|████▍ | 3092/6885 [12:30:09<2:36:01, 2.47s/it] 45%|████▍ | 3093/6885 [12:30:11<2:35:43, 2.46s/it] 45%|████▍ | 3094/6885 [12:30:14<2:36:37, 2.48s/it] 45%|████▍ | 3095/6885 [12:30:17<2:50:01, 2.69s/it] 45%|████▍ | 3096/6885 [12:30:19<2:43:14, 2.58s/it] 45%|████▍ | 3097/6885 [12:30:23<3:00:36, 2.86s/it] 45%|████▍ | 3098/6885 [12:30:26<3:10:20, 3.02s/it] 45%|████▌ | 3099/6885 [12:30:31<3:45:07, 3.57s/it] 45%|████▌ | 3100/6885 [12:30:35<3:50:08, 3.65s/it] {'loss': 0.562, 'grad_norm': 1.075132513625087, 'learning_rate': 6.70904187849744e-06, 'epoch': 0.45} 45%|████▌ | 3100/6885 [12:30:35<3:50:08, 3.65s/it] 45%|████▌ | 3101/6885 [12:30:37<3:19:03, 3.16s/it] 45%|████▌ | 3102/6885 [12:30:39<3:08:28, 2.99s/it] 45%|████▌ | 3103/6885 [12:30:42<3:04:44, 2.93s/it] 45%|████▌ | 3104/6885 [12:30:45<2:55:59, 2.79s/it] 45%|████▌ | 3105/6885 [12:30:49<3:22:52, 3.22s/it] 45%|████▌ | 3106/6885 [12:30:51<3:07:21, 2.97s/it] 45%|████▌ | 3107/6885 [12:30:53<2:51:28, 2.72s/it] 45%|████▌ | 3108/6885 [12:30:56<2:49:43, 2.70s/it] 45%|████▌ | 3109/6885 [12:30:58<2:33:12, 2.43s/it] 45%|████▌ | 3110/6885 [12:31:02<3:01:10, 2.88s/it] {'loss': 0.5978, 'grad_norm': 1.2147850552839328, 'learning_rate': 6.685195177643684e-06, 'epoch': 0.45} 45%|████▌ | 3110/6885 [12:31:02<3:01:10, 2.88s/it] 45%|████▌ | 3111/6885 [12:31:04<2:49:37, 2.70s/it] 45%|████▌ | 3112/6885 [12:31:08<3:19:43, 3.18s/it] 45%|████▌ | 3113/6885 [12:31:13<3:52:09, 3.69s/it] 45%|████▌ | 3114/6885 [12:31:17<3:47:13, 3.62s/it] 45%|████▌ | 3115/6885 [12:31:19<3:26:10, 3.28s/it] 45%|████▌ | 3116/6885 [12:31:21<3:05:24, 2.95s/it] 45%|████▌ | 3117/6885 [12:31:25<3:11:06, 3.04s/it] 45%|████▌ | 3118/6885 [12:31:27<2:59:20, 2.86s/it] 45%|████▌ | 3119/6885 [12:31:29<2:41:06, 2.57s/it] 45%|████▌ | 3120/6885 [12:31:31<2:30:19, 2.40s/it] {'loss': 0.5912, 'grad_norm': 
1.2836246837826484, 'learning_rate': 6.661305153023628e-06, 'epoch': 0.45} 45%|████▌ | 3120/6885 [12:31:31<2:30:19, 2.40s/it] 45%|████▌ | 3121/6885 [12:31:33<2:30:20, 2.40s/it] 45%|████▌ | 3122/6885 [12:31:37<2:44:19, 2.62s/it] 45%|████▌ | 3123/6885 [12:31:39<2:46:32, 2.66s/it] 45%|████▌ | 3124/6885 [12:31:41<2:30:20, 2.40s/it] 45%|████▌ | 3125/6885 [12:31:44<2:31:08, 2.41s/it] 45%|████▌ | 3126/6885 [12:31:46<2:36:30, 2.50s/it] 45%|████▌ | 3127/6885 [12:31:49<2:42:07, 2.59s/it] 45%|████▌ | 3128/6885 [12:31:52<2:43:05, 2.60s/it] 45%|████▌ | 3129/6885 [12:31:54<2:40:55, 2.57s/it] 45%|████▌ | 3130/6885 [12:31:56<2:24:27, 2.31s/it] {'loss': 0.586, 'grad_norm': 1.1766776836268427, 'learning_rate': 6.637372418812921e-06, 'epoch': 0.45} 45%|████▌ | 3130/6885 [12:31:56<2:24:27, 2.31s/it] 45%|████▌ | 3131/6885 [12:31:59<2:37:27, 2.52s/it] 45%|████▌ | 3132/6885 [12:32:01<2:24:49, 2.32s/it] 46%|████▌ | 3133/6885 [12:32:03<2:31:30, 2.42s/it] 46%|████▌ | 3134/6885 [12:32:06<2:33:46, 2.46s/it] 46%|████▌ | 3135/6885 [12:32:09<2:38:21, 2.53s/it] 46%|████▌ | 3136/6885 [12:32:13<3:16:38, 3.15s/it] 46%|████▌ | 3137/6885 [12:32:16<3:02:38, 2.92s/it] 46%|████▌ | 3138/6885 [12:32:20<3:38:04, 3.49s/it] 46%|████▌ | 3139/6885 [12:32:23<3:22:34, 3.24s/it] 46%|████▌ | 3140/6885 [12:32:25<2:54:06, 2.79s/it] {'loss': 0.5998, 'grad_norm': 1.3613669267848012, 'learning_rate': 6.613397590285211e-06, 'epoch': 0.46} 46%|████▌ | 3140/6885 [12:32:25<2:54:06, 2.79s/it] 46%|████▌ | 3141/6885 [12:32:27<2:45:00, 2.64s/it] 46%|████▌ | 3142/6885 [12:32:30<2:52:25, 2.76s/it] 46%|████▌ | 3143/6885 [12:32:36<3:43:24, 3.58s/it] 46%|████▌ | 3144/6885 [12:32:38<3:15:46, 3.14s/it] 46%|████▌ | 3145/6885 [12:32:40<2:51:48, 2.76s/it] 46%|████▌ | 3146/6885 [12:32:44<3:20:58, 3.23s/it] 46%|████▌ | 3147/6885 [12:32:46<3:02:33, 2.93s/it] 46%|████▌ | 3148/6885 [12:32:50<3:17:25, 3.17s/it] 46%|████▌ | 3149/6885 [12:32:53<3:19:10, 3.20s/it] 46%|████▌ | 3150/6885 [12:32:56<3:18:06, 3.18s/it] {'loss': 0.5812, 'grad_norm': 
1.2051701552338834, 'learning_rate': 6.589381283796325e-06, 'epoch': 0.46} 46%|████▌ | 3150/6885 [12:32:56<3:18:06, 3.18s/it] 46%|████▌ | 3151/6885 [12:33:00<3:25:55, 3.31s/it] 46%|████▌ | 3152/6885 [12:33:02<3:10:47, 3.07s/it] 46%|████▌ | 3153/6885 [12:33:05<3:04:24, 2.96s/it] 46%|████▌ | 3154/6885 [12:33:08<3:02:36, 2.94s/it] 46%|████▌ | 3155/6885 [12:33:11<3:05:17, 2.98s/it] 46%|████▌ | 3156/6885 [12:33:14<2:55:02, 2.82s/it] 46%|████▌ | 3157/6885 [12:33:17<3:00:48, 2.91s/it] 46%|████▌ | 3158/6885 [12:33:21<3:25:57, 3.32s/it] 46%|████▌ | 3159/6885 [12:33:23<3:06:57, 3.01s/it] 46%|████▌ | 3160/6885 [12:33:25<2:49:46, 2.73s/it] {'loss': 0.583, 'grad_norm': 1.1519365736041338, 'learning_rate': 6.565324116768428e-06, 'epoch': 0.46} 46%|████▌ | 3160/6885 [12:33:25<2:49:46, 2.73s/it] 46%|████▌ | 3161/6885 [12:33:27<2:36:14, 2.52s/it] 46%|████▌ | 3162/6885 [12:33:29<2:20:20, 2.26s/it] 46%|████▌ | 3163/6885 [12:33:32<2:37:47, 2.54s/it] 46%|████▌ | 3164/6885 [12:33:36<2:53:51, 2.80s/it] 46%|████▌ | 3165/6885 [12:33:37<2:32:51, 2.47s/it] 46%|████▌ | 3166/6885 [12:33:43<3:32:22, 3.43s/it] 46%|████▌ | 3167/6885 [12:33:46<3:16:42, 3.17s/it] 46%|████▌ | 3168/6885 [12:33:48<3:06:15, 3.01s/it] 46%|████▌ | 3169/6885 [12:33:51<3:03:59, 2.97s/it] 46%|████▌ | 3170/6885 [12:33:54<2:56:50, 2.86s/it] {'loss': 0.5765, 'grad_norm': 1.1475917123110242, 'learning_rate': 6.54122670767414e-06, 'epoch': 0.46} 46%|████▌ | 3170/6885 [12:33:54<2:56:50, 2.86s/it] 46%|████▌ | 3171/6885 [12:33:56<2:52:10, 2.78s/it] 46%|████▌ | 3172/6885 [12:33:58<2:37:28, 2.54s/it] 46%|████▌ | 3173/6885 [12:34:00<2:22:57, 2.31s/it] 46%|████▌ | 3174/6885 [12:34:03<2:30:31, 2.43s/it] 46%|████▌ | 3175/6885 [12:34:07<3:08:47, 3.05s/it] 46%|████▌ | 3176/6885 [12:34:09<2:49:32, 2.74s/it] 46%|████▌ | 3177/6885 [12:34:12<2:49:05, 2.74s/it] 46%|████▌ | 3178/6885 [12:34:16<3:07:13, 3.03s/it] 46%|████▌ | 3179/6885 [12:34:18<2:57:56, 2.88s/it] 46%|████▌ | 3180/6885 [12:34:24<3:55:24, 3.81s/it] {'loss': 0.5997, 'grad_norm': 
1.088676956077236, 'learning_rate': 6.517089676020648e-06, 'epoch': 0.46} 46%|████▌ | 3180/6885 [12:34:24<3:55:24, 3.81s/it] 46%|████▌ | 3181/6885 [12:34:28<3:58:40, 3.87s/it] 46%|████▌ | 3182/6885 [12:34:31<3:45:14, 3.65s/it] 46%|████▌ | 3183/6885 [12:34:35<3:38:04, 3.53s/it] 46%|████▌ | 3184/6885 [12:34:38<3:35:08, 3.49s/it] 46%|████▋ | 3185/6885 [12:34:42<3:36:19, 3.51s/it] 46%|████▋ | 3186/6885 [12:34:44<3:13:49, 3.14s/it] 46%|████▋ | 3187/6885 [12:34:48<3:42:08, 3.60s/it] 46%|████▋ | 3188/6885 [12:34:51<3:16:47, 3.19s/it] 46%|████▋ | 3189/6885 [12:34:53<3:01:26, 2.95s/it] 46%|████▋ | 3190/6885 [12:34:56<3:01:12, 2.94s/it] {'loss': 0.565, 'grad_norm': 1.1195203213303881, 'learning_rate': 6.492913642333768e-06, 'epoch': 0.46} 46%|████▋ | 3190/6885 [12:34:56<3:01:12, 2.94s/it] 46%|████▋ | 3191/6885 [12:35:00<3:13:53, 3.15s/it] 46%|████▋ | 3192/6885 [12:35:02<2:57:59, 2.89s/it] 46%|████▋ | 3193/6885 [12:35:05<2:51:42, 2.79s/it] 46%|████▋ | 3194/6885 [12:35:07<2:51:27, 2.79s/it] 46%|████▋ | 3195/6885 [12:35:09<2:40:36, 2.61s/it] 46%|████▋ | 3196/6885 [12:35:12<2:43:12, 2.65s/it] 46%|████▋ | 3197/6885 [12:35:14<2:28:23, 2.41s/it] 46%|████▋ | 3198/6885 [12:35:17<2:32:27, 2.48s/it] 46%|████▋ | 3199/6885 [12:35:18<2:15:27, 2.20s/it] 46%|████▋ | 3200/6885 [12:35:21<2:19:35, 2.27s/it] {'loss': 0.5988, 'grad_norm': 1.0927178103796473, 'learning_rate': 6.468699228142004e-06, 'epoch': 0.46} 46%|████▋ | 3200/6885 [12:35:21<2:19:35, 2.27s/it] 46%|████▋ | 3201/6885 [12:35:23<2:26:56, 2.39s/it] 47%|████▋ | 3202/6885 [12:35:27<2:54:15, 2.84s/it] 47%|████▋ | 3203/6885 [12:35:29<2:37:13, 2.56s/it] 47%|████▋ | 3204/6885 [12:35:32<2:48:23, 2.74s/it] 47%|████▋ | 3205/6885 [12:35:35<2:40:28, 2.62s/it] 47%|████▋ | 3206/6885 [12:35:37<2:40:14, 2.61s/it] 47%|████▋ | 3207/6885 [12:35:41<2:57:36, 2.90s/it] 47%|████▋ | 3208/6885 [12:35:44<2:57:36, 2.90s/it] 47%|████▋ | 3209/6885 [12:35:47<2:55:12, 2.86s/it] 47%|████▋ | 3210/6885 [12:35:50<3:07:39, 3.06s/it] {'loss': 0.6034, 'grad_norm': 
1.1180323598233408, 'learning_rate': 6.444447055960559e-06, 'epoch': 0.47} 47%|████▋ | 3210/6885 [12:35:50<3:07:39, 3.06s/it] 47%|████▋ | 3211/6885 [12:35:53<3:02:43, 2.98s/it] 47%|████▋ | 3212/6885 [12:35:55<2:51:45, 2.81s/it] 47%|████▋ | 3213/6885 [12:35:59<3:00:53, 2.96s/it] 47%|████▋ | 3214/6885 [12:36:03<3:27:42, 3.39s/it] 47%|████▋ | 3215/6885 [12:36:06<3:22:16, 3.31s/it] 47%|████▋ | 3216/6885 [12:36:09<3:21:58, 3.30s/it] 47%|████▋ | 3217/6885 [12:36:13<3:30:09, 3.44s/it] 47%|████▋ | 3218/6885 [12:36:15<2:59:33, 2.94s/it] 47%|████▋ | 3219/6885 [12:36:18<2:56:45, 2.89s/it] 47%|████▋ | 3220/6885 [12:36:20<2:48:56, 2.77s/it] {'loss': 0.5792, 'grad_norm': 1.1581218721076667, 'learning_rate': 6.420157749275341e-06, 'epoch': 0.47} 47%|████▋ | 3220/6885 [12:36:20<2:48:56, 2.77s/it] 47%|████▋ | 3221/6885 [12:36:22<2:35:16, 2.54s/it] 47%|████▋ | 3222/6885 [12:36:25<2:36:14, 2.56s/it] 47%|████▋ | 3223/6885 [12:36:29<3:04:53, 3.03s/it] 47%|████▋ | 3224/6885 [12:36:32<3:08:48, 3.09s/it] 47%|████▋ | 3225/6885 [12:36:35<3:04:56, 3.03s/it] 47%|████▋ | 3226/6885 [12:36:37<2:52:36, 2.83s/it] 47%|████▋ | 3227/6885 [12:36:40<2:57:19, 2.91s/it] 47%|████▋ | 3228/6885 [12:36:42<2:35:25, 2.55s/it] 47%|████▋ | 3229/6885 [12:36:45<2:33:33, 2.52s/it] 47%|████▋ | 3230/6885 [12:36:47<2:31:26, 2.49s/it] {'loss': 0.5914, 'grad_norm': 1.2355006071990586, 'learning_rate': 6.395831932526924e-06, 'epoch': 0.47} 47%|████▋ | 3230/6885 [12:36:47<2:31:26, 2.49s/it] 47%|████▋ | 3231/6885 [12:36:49<2:17:33, 2.26s/it] 47%|████▋ | 3232/6885 [12:36:52<2:30:24, 2.47s/it] 47%|████▋ | 3233/6885 [12:36:54<2:26:29, 2.41s/it] 47%|████▋ | 3234/6885 [12:36:57<2:35:30, 2.56s/it] 47%|████▋ | 3235/6885 [12:37:00<2:49:40, 2.79s/it] 47%|████▋ | 3236/6885 [12:37:03<2:54:03, 2.86s/it] 47%|████▋ | 3237/6885 [12:37:08<3:23:56, 3.35s/it] 47%|████▋ | 3238/6885 [12:37:12<3:33:22, 3.51s/it] 47%|████▋ | 3239/6885 [12:37:15<3:31:08, 3.47s/it] 47%|████▋ | 3240/6885 [12:37:19<3:31:43, 3.49s/it] {'loss': 0.5972, 'grad_norm': 
1.2628642644632941, 'learning_rate': 6.371470231094498e-06, 'epoch': 0.47} 47%|████▋ | 3240/6885 [12:37:19<3:31:43, 3.49s/it] 47%|████▋ | 3241/6885 [12:37:25<4:26:38, 4.39s/it] 47%|████▋ | 3242/6885 [12:37:28<4:04:16, 4.02s/it] 47%|████▋ | 3243/6885 [12:37:30<3:31:49, 3.49s/it] 47%|████▋ | 3244/6885 [12:37:33<3:21:43, 3.32s/it] 47%|████▋ | 3245/6885 [12:37:38<3:41:44, 3.66s/it] 47%|████▋ | 3246/6885 [12:37:40<3:13:54, 3.20s/it] 47%|████▋ | 3247/6885 [12:37:44<3:22:23, 3.34s/it] 47%|████▋ | 3248/6885 [12:37:46<3:05:34, 3.06s/it] 47%|████▋ | 3249/6885 [12:37:49<3:05:55, 3.07s/it] 47%|████▋ | 3250/6885 [12:37:53<3:14:26, 3.21s/it] {'loss': 0.5943, 'grad_norm': 1.30372441555249, 'learning_rate': 6.3470732712798e-06, 'epoch': 0.47} 47%|████▋ | 3250/6885 [12:37:53<3:14:26, 3.21s/it] 47%|████▋ | 3251/6885 [12:37:56<3:09:21, 3.13s/it] 47%|████▋ | 3252/6885 [12:37:58<3:02:38, 3.02s/it] 47%|████▋ | 3253/6885 [12:38:00<2:46:38, 2.75s/it] 47%|████▋ | 3254/6885 [12:38:05<3:21:37, 3.33s/it] 47%|████▋ | 3255/6885 [12:38:10<3:44:13, 3.71s/it] 47%|████▋ | 3256/6885 [12:38:12<3:12:47, 3.19s/it] 47%|████▋ | 3257/6885 [12:38:15<3:10:35, 3.15s/it] 47%|████▋ | 3258/6885 [12:38:17<2:46:37, 2.76s/it] 47%|████▋ | 3259/6885 [12:38:19<2:33:33, 2.54s/it] 47%|████▋ | 3260/6885 [12:38:23<3:01:54, 3.01s/it] {'loss': 0.59, 'grad_norm': 1.2732465621842586, 'learning_rate': 6.322641680290997e-06, 'epoch': 0.47} 47%|████▋ | 3260/6885 [12:38:23<3:01:54, 3.01s/it] 47%|████▋ | 3261/6885 [12:38:26<3:03:30, 3.04s/it] 47%|████▋ | 3262/6885 [12:38:31<3:33:50, 3.54s/it] 47%|████▋ | 3263/6885 [12:38:35<3:52:30, 3.85s/it] 47%|████▋ | 3264/6885 [12:38:38<3:41:17, 3.67s/it] 47%|████▋ | 3265/6885 [12:38:41<3:20:13, 3.32s/it] 47%|████▋ | 3266/6885 [12:38:43<2:50:53, 2.83s/it] 47%|████▋ | 3267/6885 [12:38:46<3:07:15, 3.11s/it] 47%|████▋ | 3268/6885 [12:38:49<2:55:45, 2.92s/it] 47%|████▋ | 3269/6885 [12:38:52<3:09:01, 3.14s/it] 47%|████▋ | 3270/6885 [12:38:55<3:01:48, 3.02s/it] {'loss': 0.5908, 'grad_norm': 
1.1957460012906904, 'learning_rate': 6.298176086226577e-06, 'epoch': 0.47} 47%|████▋ | 3270/6885 [12:38:55<3:01:48, 3.02s/it] 48%|████▊ | 3271/6885 [12:38:58<2:53:00, 2.87s/it] 48%|████▊ | 3272/6885 [12:39:01<2:55:39, 2.92s/it] 48%|████▊ | 3273/6885 [12:39:04<2:53:08, 2.88s/it] 48%|████▊ | 3274/6885 [12:39:07<2:56:10, 2.93s/it] 48%|████▊ | 3275/6885 [12:39:09<2:50:37, 2.84s/it] 48%|████▊ | 3276/6885 [12:39:12<2:43:32, 2.72s/it] 48%|████▊ | 3277/6885 [12:39:14<2:34:50, 2.57s/it] 48%|████▊ | 3278/6885 [12:39:17<2:40:06, 2.66s/it] 48%|████▊ | 3279/6885 [12:39:20<2:42:11, 2.70s/it] 48%|████▊ | 3280/6885 [12:39:22<2:32:09, 2.53s/it] {'loss': 0.579, 'grad_norm': 1.2666436895215651, 'learning_rate': 6.273677118059192e-06, 'epoch': 0.48} 48%|████▊ | 3280/6885 [12:39:22<2:32:09, 2.53s/it] 48%|████▊ | 3281/6885 [12:39:25<2:46:37, 2.77s/it] 48%|████▊ | 3282/6885 [12:39:28<2:51:46, 2.86s/it] 48%|████▊ | 3283/6885 [12:39:31<2:45:23, 2.76s/it] 48%|████▊ | 3284/6885 [12:39:34<2:47:57, 2.80s/it] 48%|████▊ | 3285/6885 [12:39:37<2:56:45, 2.95s/it] 48%|████▊ | 3286/6885 [12:39:39<2:48:55, 2.82s/it] 48%|████▊ | 3287/6885 [12:39:47<4:24:39, 4.41s/it] 48%|████▊ | 3288/6885 [12:39:54<4:53:57, 4.90s/it] 48%|████▊ | 3289/6885 [12:39:56<4:07:49, 4.14s/it] 48%|████▊ | 3290/6885 [12:39:59<3:43:29, 3.73s/it] {'loss': 0.5849, 'grad_norm': 1.1740612442844354, 'learning_rate': 6.24914540561949e-06, 'epoch': 0.48} 48%|████▊ | 3290/6885 [12:39:59<3:43:29, 3.73s/it] 48%|████▊ | 3291/6885 [12:40:00<3:04:44, 3.08s/it] 48%|████▊ | 3292/6885 [12:40:03<2:59:16, 2.99s/it] 48%|████▊ | 3293/6885 [12:40:05<2:43:02, 2.72s/it] 48%|████▊ | 3294/6885 [12:40:08<2:54:43, 2.92s/it] 48%|████▊ | 3295/6885 [12:40:11<2:48:36, 2.82s/it] 48%|████▊ | 3296/6885 [12:40:13<2:31:56, 2.54s/it] 48%|████▊ | 3297/6885 [12:40:17<2:59:50, 3.01s/it] 48%|████▊ | 3298/6885 [12:40:19<2:47:59, 2.81s/it] 48%|████▊ | 3299/6885 [12:40:22<2:41:38, 2.70s/it] 48%|████▊ | 3300/6885 [12:40:24<2:36:16, 2.62s/it] {'loss': 0.5914, 'grad_norm': 
1.170368029656733, 'learning_rate': 6.2245815795799235e-06, 'epoch': 0.48} 48%|████▊ | 3300/6885 [12:40:24<2:36:16, 2.62s/it] 48%|████▊ | 3301/6885 [12:40:26<2:25:04, 2.43s/it] 48%|████▊ | 3302/6885 [12:40:30<2:42:35, 2.72s/it] 48%|████▊ | 3303/6885 [12:40:32<2:32:25, 2.55s/it] 48%|████▊ | 3304/6885 [12:40:35<2:37:43, 2.64s/it] 48%|████▊ | 3305/6885 [12:40:37<2:29:56, 2.51s/it] 48%|████▊ | 3306/6885 [12:40:39<2:24:38, 2.42s/it] 48%|████▊ | 3307/6885 [12:40:44<3:02:42, 3.06s/it] 48%|████▊ | 3308/6885 [12:40:47<3:09:07, 3.17s/it] 48%|████▊ | 3309/6885 [12:40:55<4:39:19, 4.69s/it] 48%|████▊ | 3310/6885 [12:40:59<4:18:36, 4.34s/it] {'loss': 0.5692, 'grad_norm': 1.060432274782722, 'learning_rate': 6.199986271438536e-06, 'epoch': 0.48} 48%|████▊ | 3310/6885 [12:40:59<4:18:36, 4.34s/it] 48%|████▊ | 3311/6885 [12:41:02<3:51:39, 3.89s/it] 48%|████▊ | 3312/6885 [12:41:04<3:20:22, 3.36s/it] 48%|████▊ | 3313/6885 [12:41:07<3:13:56, 3.26s/it] 48%|████▊ | 3314/6885 [12:41:10<3:14:07, 3.26s/it] 48%|████▊ | 3315/6885 [12:41:13<3:03:25, 3.08s/it] 48%|████▊ | 3316/6885 [12:41:17<3:22:22, 3.40s/it] 48%|████▊ | 3317/6885 [12:41:21<3:31:07, 3.55s/it] 48%|████▊ | 3318/6885 [12:41:26<4:08:28, 4.18s/it] 48%|████▊ | 3319/6885 [12:41:28<3:30:30, 3.54s/it] 48%|████▊ | 3320/6885 [12:41:32<3:37:26, 3.66s/it] {'loss': 0.5789, 'grad_norm': 1.133481629336483, 'learning_rate': 6.17536011350273e-06, 'epoch': 0.48} 48%|████▊ | 3320/6885 [12:41:32<3:37:26, 3.66s/it] 48%|████▊ | 3321/6885 [12:41:35<3:19:34, 3.36s/it] 48%|████▊ | 3322/6885 [12:41:39<3:26:06, 3.47s/it] 48%|████▊ | 3323/6885 [12:41:42<3:22:44, 3.42s/it] 48%|████▊ | 3324/6885 [12:41:45<3:19:40, 3.36s/it] 48%|████▊ | 3325/6885 [12:41:48<3:15:25, 3.29s/it] 48%|████▊ | 3326/6885 [12:41:51<3:05:14, 3.12s/it] 48%|████▊ | 3327/6885 [12:41:54<3:07:33, 3.16s/it] 48%|████▊ | 3328/6885 [12:41:57<2:59:46, 3.03s/it] 48%|████▊ | 3329/6885 [12:41:59<2:44:13, 2.77s/it] 48%|████▊ | 3330/6885 [12:42:03<3:00:30, 3.05s/it] {'loss': 0.5815, 'grad_norm': 
1.0779584839433474, 'learning_rate': 6.150703738873004e-06, 'epoch': 0.48} 48%|████▊ | 3330/6885 [12:42:03<3:00:30, 3.05s/it] 48%|████▊ | 3331/6885 [12:42:05<2:46:27, 2.81s/it] 48%|████▊ | 3332/6885 [12:42:08<2:37:10, 2.65s/it] 48%|████▊ | 3333/6885 [12:42:10<2:29:59, 2.53s/it] 48%|████▊ | 3334/6885 [12:42:13<2:33:37, 2.60s/it] 48%|████▊ | 3335/6885 [12:42:15<2:23:18, 2.42s/it] 48%|████▊ | 3336/6885 [12:42:18<2:32:31, 2.58s/it] 48%|████▊ | 3337/6885 [12:42:21<2:40:58, 2.72s/it] 48%|████▊ | 3338/6885 [12:42:23<2:31:48, 2.57s/it] 48%|████▊ | 3339/6885 [12:42:25<2:20:05, 2.37s/it] 49%|████▊ | 3340/6885 [12:42:28<2:29:55, 2.54s/it] {'loss': 0.5754, 'grad_norm': 1.138478981177591, 'learning_rate': 6.1260177814266855e-06, 'epoch': 0.49} 49%|████▊ | 3340/6885 [12:42:28<2:29:55, 2.54s/it] 49%|████▊ | 3341/6885 [12:42:30<2:23:48, 2.43s/it] 49%|████▊ | 3342/6885 [12:42:32<2:25:29, 2.46s/it] 49%|████▊ | 3343/6885 [12:42:36<2:52:53, 2.93s/it] 49%|████▊ | 3344/6885 [12:42:39<2:45:10, 2.80s/it] 49%|████▊ | 3345/6885 [12:42:41<2:28:51, 2.52s/it] 49%|████▊ | 3346/6885 [12:42:43<2:24:46, 2.45s/it] 49%|████▊ | 3347/6885 [12:42:47<2:50:19, 2.89s/it] 49%|████▊ | 3348/6885 [12:42:51<3:18:45, 3.37s/it] 49%|████▊ | 3349/6885 [12:42:54<3:01:25, 3.08s/it] 49%|████▊ | 3350/6885 [12:42:58<3:17:41, 3.36s/it] {'loss': 0.5778, 'grad_norm': 1.1290987276585867, 'learning_rate': 6.101302875801628e-06, 'epoch': 0.49} 49%|████▊ | 3350/6885 [12:42:58<3:17:41, 3.36s/it] 49%|████▊ | 3351/6885 [12:43:03<3:41:54, 3.77s/it] 49%|████▊ | 3352/6885 [12:43:06<3:27:20, 3.52s/it] 49%|████▊ | 3353/6885 [12:43:08<3:02:13, 3.10s/it] 49%|████▊ | 3354/6885 [12:43:10<2:52:19, 2.93s/it] 49%|████▊ | 3355/6885 [12:43:13<2:54:16, 2.96s/it] 49%|████▊ | 3356/6885 [12:43:17<3:02:52, 3.11s/it] 49%|████▉ | 3357/6885 [12:43:20<3:08:50, 3.21s/it] 49%|████▉ | 3358/6885 [12:43:25<3:31:10, 3.59s/it] 49%|████▉ | 3359/6885 [12:43:27<3:02:33, 3.11s/it] 49%|████▉ | 3360/6885 [12:43:29<2:55:05, 2.98s/it] {'loss': 0.5689, 'grad_norm': 
1.1468009205478524, 'learning_rate': 6.0765596573798994e-06, 'epoch': 0.49} 49%|████▉ | 3360/6885 [12:43:29<2:55:05, 2.98s/it] 49%|████▉ | 3361/6885 [12:43:31<2:38:54, 2.71s/it] 49%|████▉ | 3362/6885 [12:43:34<2:39:14, 2.71s/it] 49%|████▉ | 3363/6885 [12:43:37<2:36:15, 2.66s/it] 49%|████▉ | 3364/6885 [12:43:40<2:55:27, 2.99s/it] 49%|████▉ | 3365/6885 [12:43:43<2:42:45, 2.77s/it] 49%|████▉ | 3366/6885 [12:43:46<2:46:45, 2.84s/it] 49%|████▉ | 3367/6885 [12:43:49<2:48:55, 2.88s/it] 49%|████▉ | 3368/6885 [12:43:51<2:33:35, 2.62s/it] 49%|████▉ | 3369/6885 [12:43:55<3:02:25, 3.11s/it] 49%|████▉ | 3370/6885 [12:43:58<3:03:24, 3.13s/it] {'loss': 0.5692, 'grad_norm': 1.0683998313181482, 'learning_rate': 6.051788762271442e-06, 'epoch': 0.49} 49%|████▉ | 3370/6885 [12:43:58<3:03:24, 3.13s/it] 49%|████▉ | 3371/6885 [12:44:02<3:23:34, 3.48s/it] 49%|████▉ | 3372/6885 [12:44:05<3:03:28, 3.13s/it] 49%|████▉ | 3373/6885 [12:44:08<3:14:40, 3.33s/it] 49%|████▉ | 3374/6885 [12:44:12<3:20:32, 3.43s/it] 49%|████▉ | 3375/6885 [12:44:14<2:55:42, 3.00s/it] 49%|████▉ | 3376/6885 [12:44:19<3:27:28, 3.55s/it] 49%|████▉ | 3377/6885 [12:44:23<3:32:28, 3.63s/it] 49%|████▉ | 3378/6885 [12:44:25<3:03:18, 3.14s/it] 49%|████▉ | 3379/6885 [12:44:27<2:46:37, 2.85s/it] 49%|████▉ | 3380/6885 [12:44:29<2:36:46, 2.68s/it] {'loss': 0.5808, 'grad_norm': 1.1889646870467425, 'learning_rate': 6.0269908272977295e-06, 'epoch': 0.49} 49%|████▉ | 3380/6885 [12:44:29<2:36:46, 2.68s/it] 49%|████▉ | 3381/6885 [12:44:35<3:33:35, 3.66s/it] 49%|████▉ | 3382/6885 [12:44:37<3:09:27, 3.24s/it] 49%|████▉ | 3383/6885 [12:44:41<3:08:15, 3.23s/it] 49%|████▉ | 3384/6885 [12:44:43<2:58:36, 3.06s/it] 49%|████▉ | 3385/6885 [12:44:46<2:52:04, 2.95s/it] 49%|████▉ | 3386/6885 [12:44:48<2:39:48, 2.74s/it] 49%|████▉ | 3387/6885 [12:44:51<2:43:40, 2.81s/it] 49%|████▉ | 3388/6885 [12:44:54<2:42:17, 2.78s/it] 49%|████▉ | 3389/6885 [12:44:56<2:27:25, 2.53s/it] 49%|████▉ | 3390/6885 [12:44:59<2:43:56, 2.81s/it] {'loss': 0.5772, 'grad_norm': 
1.2529890364621932, 'learning_rate': 6.002166489975385e-06, 'epoch': 0.49} 49%|████▉ | 3390/6885 [12:44:59<2:43:56, 2.81s/it] 49%|████▉ | 3391/6885 [12:45:02<2:40:08, 2.75s/it] 49%|████▉ | 3392/6885 [12:45:04<2:28:40, 2.55s/it] 49%|████▉ | 3393/6885 [12:45:07<2:36:08, 2.68s/it] 49%|████▉ | 3394/6885 [12:45:12<3:19:47, 3.43s/it] 49%|████▉ | 3395/6885 [12:45:17<3:38:41, 3.76s/it] 49%|████▉ | 3396/6885 [12:45:22<4:01:56, 4.16s/it] 49%|████▉ | 3397/6885 [12:45:24<3:33:01, 3.66s/it] 49%|████▉ | 3398/6885 [12:45:27<3:22:21, 3.48s/it] 49%|████▉ | 3399/6885 [12:45:29<2:55:15, 3.02s/it] 49%|████▉ | 3400/6885 [12:45:32<2:43:24, 2.81s/it] {'loss': 0.5862, 'grad_norm': 1.1925487080641164, 'learning_rate': 5.977316388499794e-06, 'epoch': 0.49} 49%|████▉ | 3400/6885 [12:45:32<2:43:24, 2.81s/it] 49%|████▉ | 3401/6885 [12:45:35<3:00:00, 3.10s/it] 49%|████▉ | 3402/6885 [12:45:38<2:45:02, 2.84s/it] 49%|████▉ | 3403/6885 [12:45:41<2:58:17, 3.07s/it] 49%|████▉ | 3404/6885 [12:45:43<2:35:34, 2.68s/it] 49%|████▉ | 3405/6885 [12:45:47<3:01:04, 3.12s/it] 49%|████▉ | 3406/6885 [12:45:50<3:03:06, 3.16s/it] 49%|████▉ | 3407/6885 [12:45:53<2:50:04, 2.93s/it] 49%|████▉ | 3408/6885 [12:45:55<2:31:22, 2.61s/it] 50%|████▉ | 3409/6885 [12:45:58<2:43:40, 2.83s/it] 50%|████▉ | 3410/6885 [12:46:01<2:39:16, 2.75s/it] {'loss': 0.5662, 'grad_norm': 1.1372201366075154, 'learning_rate': 5.952441161728701e-06, 'epoch': 0.5} 50%|████▉ | 3410/6885 [12:46:01<2:39:16, 2.75s/it] 50%|████▉ | 3411/6885 [12:46:04<2:45:57, 2.87s/it] 50%|████▉ | 3412/6885 [12:46:07<2:50:22, 2.94s/it] 50%|████▉ | 3413/6885 [12:46:10<2:56:30, 3.05s/it] 50%|████▉ | 3414/6885 [12:46:12<2:39:56, 2.76s/it] 50%|████▉ | 3415/6885 [12:46:15<2:39:37, 2.76s/it] 50%|████▉ | 3416/6885 [12:46:18<2:39:29, 2.76s/it] 50%|████▉ | 3417/6885 [12:46:20<2:31:41, 2.62s/it] 50%|████▉ | 3418/6885 [12:46:22<2:27:52, 2.56s/it] 50%|████▉ | 3419/6885 [12:46:24<2:17:15, 2.38s/it] 50%|████▉ | 3420/6885 [12:46:28<2:30:18, 2.60s/it] {'loss': 0.5682, 'grad_norm': 
1.2981299245914195, 'learning_rate': 5.927541449165783e-06, 'epoch': 0.5} 50%|████▉ | 3420/6885 [12:46:28<2:30:18, 2.60s/it] 50%|████▉ | 3421/6885 [12:46:31<2:41:52, 2.80s/it] 50%|████▉ | 3422/6885 [12:46:34<2:48:52, 2.93s/it] 50%|████▉ | 3423/6885 [12:46:37<2:47:57, 2.91s/it] 50%|████▉ | 3424/6885 [12:46:41<3:02:02, 3.16s/it] 50%|████▉ | 3425/6885 [12:46:45<3:22:17, 3.51s/it] 50%|████▉ | 3426/6885 [12:46:49<3:29:37, 3.64s/it] 50%|████▉ | 3427/6885 [12:46:51<3:06:36, 3.24s/it] 50%|████▉ | 3428/6885 [12:46:54<3:02:03, 3.16s/it] 50%|████▉ | 3429/6885 [12:46:57<2:50:53, 2.97s/it] 50%|████▉ | 3430/6885 [12:47:00<2:50:35, 2.96s/it] {'loss': 0.5894, 'grad_norm': 1.1198285033650917, 'learning_rate': 5.902617890944207e-06, 'epoch': 0.5} 50%|████▉ | 3430/6885 [12:47:00<2:50:35, 2.96s/it] 50%|████▉ | 3431/6885 [12:47:02<2:46:15, 2.89s/it] 50%|████▉ | 3432/6885 [12:47:05<2:49:39, 2.95s/it] 50%|████▉ | 3433/6885 [12:47:07<2:33:19, 2.67s/it] 50%|████▉ | 3434/6885 [12:47:11<2:56:09, 3.06s/it] 50%|████▉ | 3435/6885 [12:47:13<2:37:37, 2.74s/it] 50%|████▉ | 3436/6885 [12:47:16<2:29:44, 2.61s/it] 50%|████▉ | 3437/6885 [12:47:19<2:37:07, 2.73s/it] 50%|████▉ | 3438/6885 [12:47:21<2:24:23, 2.51s/it] 50%|████▉ | 3439/6885 [12:47:24<2:38:49, 2.77s/it] 50%|████▉ | 3440/6885 [12:47:26<2:29:11, 2.60s/it] {'loss': 0.5735, 'grad_norm': 1.1442459802118357, 'learning_rate': 5.8776711278101765e-06, 'epoch': 0.5} 50%|████▉ | 3440/6885 [12:47:26<2:29:11, 2.60s/it] 50%|████▉ | 3441/6885 [12:47:29<2:27:30, 2.57s/it] 50%|████▉ | 3442/6885 [12:47:36<3:49:01, 3.99s/it] 50%|█████ | 3443/6885 [12:47:40<3:50:57, 4.03s/it] 50%|█████ | 3444/6885 [12:47:42<3:11:14, 3.33s/it] 50%|█████ | 3445/6885 [12:47:45<3:00:59, 3.16s/it] 50%|█████ | 3446/6885 [12:47:49<3:16:34, 3.43s/it] 50%|█████ | 3447/6885 [12:47:51<2:55:28, 3.06s/it] 50%|█████ | 3448/6885 [12:47:54<2:57:56, 3.11s/it] 50%|█████ | 3449/6885 [12:47:57<2:47:12, 2.92s/it] 50%|█████ | 3450/6885 [12:47:59<2:32:33, 2.66s/it] {'loss': 0.5838, 'grad_norm': 
1.10045421098352, 'learning_rate': 5.852701801106458e-06, 'epoch': 0.5} 50%|█████ | 3450/6885 [12:47:59<2:32:33, 2.66s/it] 50%|█████ | 3451/6885 [12:48:02<2:34:38, 2.70s/it] 50%|█████ | 3452/6885 [12:48:04<2:36:45, 2.74s/it] 50%|█████ | 3453/6885 [12:48:07<2:30:32, 2.63s/it] 50%|█████ | 3454/6885 [12:48:09<2:27:22, 2.58s/it] 50%|█████ | 3455/6885 [12:48:12<2:39:08, 2.78s/it] 50%|█████ | 3456/6885 [12:48:15<2:42:18, 2.84s/it] 50%|█████ | 3457/6885 [12:48:18<2:29:15, 2.61s/it] 50%|█████ | 3458/6885 [12:48:20<2:30:10, 2.63s/it] 50%|█████ | 3459/6885 [12:48:23<2:30:51, 2.64s/it] 50%|█████ | 3460/6885 [12:48:25<2:19:44, 2.45s/it] {'loss': 0.5847, 'grad_norm': 1.1675311387395517, 'learning_rate': 5.82771055275589e-06, 'epoch': 0.5} 50%|█████ | 3460/6885 [12:48:25<2:19:44, 2.45s/it] 50%|█████ | 3461/6885 [12:48:28<2:27:35, 2.59s/it] 50%|█████ | 3462/6885 [12:48:32<2:51:25, 3.00s/it] 50%|█████ | 3463/6885 [12:48:34<2:45:29, 2.90s/it] 50%|█████ | 3464/6885 [12:48:37<2:42:35, 2.85s/it] 50%|█████ | 3465/6885 [12:48:40<2:36:33, 2.75s/it] 50%|█████ | 3466/6885 [12:48:42<2:31:30, 2.66s/it] 50%|█████ | 3467/6885 [12:48:46<2:45:07, 2.90s/it] 50%|█████ | 3468/6885 [12:48:48<2:30:47, 2.65s/it] 50%|█████ | 3469/6885 [12:48:50<2:27:50, 2.60s/it] 50%|█████ | 3470/6885 [12:48:53<2:26:15, 2.57s/it] {'loss': 0.5656, 'grad_norm': 1.0028532762834719, 'learning_rate': 5.802698025244886e-06, 'epoch': 0.5} 50%|█████ | 3470/6885 [12:48:53<2:26:15, 2.57s/it] 50%|█████ | 3471/6885 [12:48:56<2:37:44, 2.77s/it] 50%|█████ | 3472/6885 [12:48:58<2:34:17, 2.71s/it] 50%|█████ | 3473/6885 [12:49:01<2:35:58, 2.74s/it] 50%|█████ | 3474/6885 [12:49:04<2:42:12, 2.85s/it] 50%|█████ | 3475/6885 [12:49:07<2:38:22, 2.79s/it] 50%|█████ | 3476/6885 [12:49:10<2:46:44, 2.93s/it] 51%|█████ | 3477/6885 [12:49:13<2:42:33, 2.86s/it] 51%|█████ | 3478/6885 [12:49:17<2:55:04, 3.08s/it] 51%|█████ | 3479/6885 [12:49:19<2:42:30, 2.86s/it] 51%|█████ | 3480/6885 [12:49:22<2:41:43, 2.85s/it] {'loss': 0.5871, 'grad_norm': 
1.028656973511835, 'learning_rate': 5.777664861606912e-06, 'epoch': 0.51} 51%|█████ | 3480/6885 [12:49:22<2:41:43, 2.85s/it] 51%|█████ | 3481/6885 [12:49:24<2:36:10, 2.75s/it] 51%|█████ | 3482/6885 [12:49:27<2:28:41, 2.62s/it] 51%|█████ | 3483/6885 [12:49:29<2:23:58, 2.54s/it] 51%|█████ | 3484/6885 [12:49:32<2:30:42, 2.66s/it] 51%|█████ | 3485/6885 [12:49:35<2:36:40, 2.76s/it] 51%|█████ | 3486/6885 [12:49:37<2:33:49, 2.72s/it] 51%|█████ | 3487/6885 [12:49:41<2:44:57, 2.91s/it] 51%|█████ | 3488/6885 [12:49:43<2:36:35, 2.77s/it] 51%|█████ | 3489/6885 [12:49:48<3:08:29, 3.33s/it] 51%|█████ | 3490/6885 [12:49:51<3:06:55, 3.30s/it] {'loss': 0.5895, 'grad_norm': 1.2007383871296113, 'learning_rate': 5.752611705405957e-06, 'epoch': 0.51} 51%|█████ | 3490/6885 [12:49:51<3:06:55, 3.30s/it] 51%|█████ | 3491/6885 [12:49:55<3:21:26, 3.56s/it] 51%|█████ | 3492/6885 [12:49:57<2:56:16, 3.12s/it] 51%|█████ | 3493/6885 [12:50:00<2:41:01, 2.85s/it] 51%|█████ | 3494/6885 [12:50:03<2:46:34, 2.95s/it] 51%|█████ | 3495/6885 [12:50:06<2:53:40, 3.07s/it] 51%|█████ | 3496/6885 [12:50:08<2:32:07, 2.69s/it] 51%|█████ | 3497/6885 [12:50:10<2:20:12, 2.48s/it] 51%|█████ | 3498/6885 [12:50:15<3:08:21, 3.34s/it] 51%|█████ | 3499/6885 [12:50:17<2:44:12, 2.91s/it] 51%|█████ | 3500/6885 [12:50:19<2:27:23, 2.61s/it] {'loss': 0.573, 'grad_norm': 1.1281898149999334, 'learning_rate': 5.7275392007199896e-06, 'epoch': 0.51} 51%|█████ | 3500/6885 [12:50:19<2:27:23, 2.61s/it] 51%|█████ | 3501/6885 [12:50:21<2:22:53, 2.53s/it] 51%|█████ | 3502/6885 [12:50:25<2:45:36, 2.94s/it] 51%|█████ | 3503/6885 [12:50:28<2:36:23, 2.77s/it] 51%|█████ | 3504/6885 [12:50:30<2:35:00, 2.75s/it] 51%|█████ | 3505/6885 [12:50:33<2:34:21, 2.74s/it] 51%|█████ | 3506/6885 [12:50:35<2:27:33, 2.62s/it] 51%|█████ | 3507/6885 [12:50:38<2:30:46, 2.68s/it] 51%|█████ | 3508/6885 [12:50:41<2:36:32, 2.78s/it] 51%|█████ | 3509/6885 [12:50:43<2:22:12, 2.53s/it] 51%|█████ | 3510/6885 [12:50:47<2:38:30, 2.82s/it] {'loss': 0.57, 'grad_norm': 
1.282146433020574, 'learning_rate': 5.702447992124394e-06, 'epoch': 0.51} 51%|█████ | 3510/6885 [12:50:47<2:38:30, 2.82s/it] 51%|█████ | 3511/6885 [12:50:50<2:52:29, 3.07s/it] 51%|█████ | 3512/6885 [12:50:53<2:42:13, 2.89s/it] 51%|█████ | 3513/6885 [12:50:55<2:26:10, 2.60s/it] 51%|█████ | 3514/6885 [12:50:57<2:20:54, 2.51s/it] 51%|█████ | 3515/6885 [12:51:00<2:25:28, 2.59s/it] 51%|█████ | 3516/6885 [12:51:02<2:26:14, 2.60s/it] 51%|█████ | 3517/6885 [12:51:06<2:40:17, 2.86s/it] 51%|█████ | 3518/6885 [12:51:09<2:43:07, 2.91s/it] 51%|█████ | 3519/6885 [12:51:14<3:26:16, 3.68s/it] 51%|█████ | 3520/6885 [12:51:18<3:23:07, 3.62s/it] {'loss': 0.5751, 'grad_norm': 1.05801689608913, 'learning_rate': 5.677338724675406e-06, 'epoch': 0.51} 51%|█████ | 3520/6885 [12:51:18<3:23:07, 3.62s/it] 51%|█████ | 3521/6885 [12:51:21<3:08:12, 3.36s/it] 51%|█████ | 3522/6885 [12:51:23<2:47:15, 2.98s/it] 51%|█████ | 3523/6885 [12:51:26<2:49:32, 3.03s/it] 51%|█████ | 3524/6885 [12:51:30<2:59:26, 3.20s/it] 51%|█████ | 3525/6885 [12:51:32<2:40:02, 2.86s/it] 51%|█████ | 3526/6885 [12:51:34<2:29:10, 2.66s/it] 51%|█████ | 3527/6885 [12:51:37<2:44:03, 2.93s/it] 51%|█████ | 3528/6885 [12:51:40<2:45:53, 2.97s/it] 51%|█████▏ | 3529/6885 [12:51:43<2:35:24, 2.78s/it] 51%|█████▏ | 3530/6885 [12:51:45<2:30:38, 2.69s/it] {'loss': 0.5805, 'grad_norm': 1.2511793245069922, 'learning_rate': 5.652212043893528e-06, 'epoch': 0.51} 51%|█████▏ | 3530/6885 [12:51:45<2:30:38, 2.69s/it] 51%|█████▏ | 3531/6885 [12:51:47<2:22:42, 2.55s/it] 51%|█████▏ | 3532/6885 [12:51:50<2:22:24, 2.55s/it] 51%|█████▏ | 3533/6885 [12:51:52<2:12:27, 2.37s/it] 51%|█████▏ | 3534/6885 [12:51:55<2:17:20, 2.46s/it] 51%|█████▏ | 3535/6885 [12:51:57<2:20:00, 2.51s/it] 51%|█████▏ | 3536/6885 [12:52:00<2:16:03, 2.44s/it] 51%|█████▏ | 3537/6885 [12:52:04<2:51:30, 3.07s/it] 51%|█████▏ | 3538/6885 [12:52:07<2:53:53, 3.12s/it] 51%|█████▏ | 3539/6885 [12:52:11<3:03:59, 3.30s/it] 51%|█████▏ | 3540/6885 [12:52:13<2:37:59, 2.83s/it] {'loss': 0.5734, 
'grad_norm': 1.2496537928999953, 'learning_rate': 5.627068595746931e-06, 'epoch': 0.51} 51%|█████▏ | 3540/6885 [12:52:13<2:37:59, 2.83s/it] 51%|█████▏ | 3541/6885 [12:52:15<2:28:50, 2.67s/it] 51%|█████▏ | 3542/6885 [12:52:18<2:32:48, 2.74s/it] 51%|█████▏ | 3543/6885 [12:52:21<2:33:17, 2.75s/it] 51%|█████▏ | 3544/6885 [12:52:23<2:21:13, 2.54s/it] 51%|█████▏ | 3545/6885 [12:52:26<2:41:03, 2.89s/it] 52%|█████▏ | 3546/6885 [12:52:30<2:54:18, 3.13s/it] 52%|█████▏ | 3547/6885 [12:52:32<2:34:17, 2.77s/it] 52%|█████▏ | 3548/6885 [12:52:34<2:22:54, 2.57s/it] 52%|█████▏ | 3549/6885 [12:52:38<2:36:19, 2.81s/it] 52%|█████▏ | 3550/6885 [12:52:40<2:30:35, 2.71s/it] {'loss': 0.573, 'grad_norm': 1.0586939290192166, 'learning_rate': 5.601909026634846e-06, 'epoch': 0.52} 52%|█████▏ | 3550/6885 [12:52:40<2:30:35, 2.71s/it] 52%|█████▏ | 3551/6885 [12:52:43<2:34:29, 2.78s/it] 52%|█████▏ | 3552/6885 [12:52:46<2:30:28, 2.71s/it] 52%|█████▏ | 3553/6885 [12:52:48<2:18:59, 2.50s/it] 52%|█████▏ | 3554/6885 [12:52:50<2:15:55, 2.45s/it] 52%|█████▏ | 3555/6885 [12:52:55<3:03:23, 3.30s/it] 52%|█████▏ | 3556/6885 [12:52:58<2:52:06, 3.10s/it] 52%|█████▏ | 3557/6885 [12:53:00<2:42:29, 2.93s/it] 52%|█████▏ | 3558/6885 [12:53:03<2:39:04, 2.87s/it] 52%|█████▏ | 3559/6885 [12:53:08<3:18:45, 3.59s/it] 52%|█████▏ | 3560/6885 [12:53:11<2:55:26, 3.17s/it] {'loss': 0.5696, 'grad_norm': 1.2135072197108623, 'learning_rate': 5.576733983370955e-06, 'epoch': 0.52} 52%|█████▏ | 3560/6885 [12:53:11<2:55:26, 3.17s/it] 52%|█████▏ | 3561/6885 [12:53:15<3:21:27, 3.64s/it] 52%|█████▏ | 3562/6885 [12:53:17<2:54:19, 3.15s/it] 52%|█████▏ | 3563/6885 [12:53:20<2:43:28, 2.95s/it] 52%|█████▏ | 3564/6885 [12:53:22<2:32:00, 2.75s/it] 52%|█████▏ | 3565/6885 [12:53:25<2:39:29, 2.88s/it] 52%|█████▏ | 3566/6885 [12:53:28<2:30:25, 2.72s/it] 52%|█████▏ | 3567/6885 [12:53:32<2:56:03, 3.18s/it] 52%|█████▏ | 3568/6885 [12:53:35<2:48:21, 3.05s/it] 52%|█████▏ | 3569/6885 [12:53:37<2:45:47, 3.00s/it] 52%|█████▏ | 3570/6885 
[12:53:42<3:11:20, 3.46s/it] {'loss': 0.5764, 'grad_norm': 1.096951604322022, 'learning_rate': 5.551544113166752e-06, 'epoch': 0.52} 52%|█████▏ | 3570/6885 [12:53:42<3:11:20, 3.46s/it] 52%|█████▏ | 3571/6885 [12:53:47<3:33:27, 3.86s/it] 52%|█████▏ | 3572/6885 [12:53:49<3:11:32, 3.47s/it] 52%|█████▏ | 3573/6885 [12:53:52<2:54:24, 3.16s/it] 52%|█████▏ | 3574/6885 [12:53:55<2:55:27, 3.18s/it] 52%|█████▏ | 3575/6885 [12:53:59<3:13:55, 3.52s/it] 52%|█████▏ | 3576/6885 [12:54:05<3:51:47, 4.20s/it] 52%|█████▏ | 3577/6885 [12:54:07<3:15:52, 3.55s/it] 52%|█████▏ | 3578/6885 [12:54:09<2:51:07, 3.10s/it] 52%|█████▏ | 3579/6885 [12:54:12<2:51:07, 3.11s/it] 52%|█████▏ | 3580/6885 [12:54:16<3:00:03, 3.27s/it] {'loss': 0.5945, 'grad_norm': 1.067656908278471, 'learning_rate': 5.5263400636149104e-06, 'epoch': 0.52} 52%|█████▏ | 3580/6885 [12:54:16<3:00:03, 3.27s/it] 52%|█████▏ | 3581/6885 [12:54:18<2:43:06, 2.96s/it] 52%|█████▏ | 3582/6885 [12:54:21<2:37:44, 2.87s/it] 52%|█████▏ | 3583/6885 [12:54:23<2:30:49, 2.74s/it] 52%|█████▏ | 3584/6885 [12:54:26<2:38:19, 2.88s/it] 52%|█████▏ | 3585/6885 [12:54:30<2:56:22, 3.21s/it] 52%|█████▏ | 3586/6885 [12:54:34<2:55:38, 3.19s/it] 52%|█████▏ | 3587/6885 [12:54:37<2:57:25, 3.23s/it] 52%|█████▏ | 3588/6885 [12:54:40<2:54:10, 3.17s/it] 52%|█████▏ | 3589/6885 [12:54:43<2:45:12, 3.01s/it] 52%|█████▏ | 3590/6885 [12:54:45<2:32:04, 2.77s/it] {'loss': 0.5698, 'grad_norm': 1.2528345132805765, 'learning_rate': 5.50112248267263e-06, 'epoch': 0.52} 52%|█████▏ | 3590/6885 [12:54:45<2:32:04, 2.77s/it] 52%|█████▏ | 3591/6885 [12:54:48<2:34:12, 2.81s/it] 52%|█████▏ | 3592/6885 [12:54:52<2:52:04, 3.14s/it] 52%|█████▏ | 3593/6885 [12:54:53<2:25:11, 2.65s/it] 52%|█████▏ | 3594/6885 [12:54:56<2:23:49, 2.62s/it] 52%|█████▏ | 3595/6885 [12:54:59<2:39:42, 2.91s/it] 52%|█████▏ | 3596/6885 [12:55:02<2:29:27, 2.73s/it] 52%|█████▏ | 3597/6885 [12:55:06<3:02:05, 3.32s/it] 52%|█████▏ | 3598/6885 [12:55:11<3:30:50, 3.85s/it] 52%|█████▏ | 3599/6885 [12:55:15<3:32:19, 
3.88s/it] 52%|█████▏ | 3600/6885 [12:55:18<3:17:28, 3.61s/it] {'loss': 0.5939, 'grad_norm': 1.153586426579592, 'learning_rate': 5.475892018644989e-06, 'epoch': 0.52} 52%|█████▏ | 3600/6885 [12:55:18<3:17:28, 3.61s/it] 52%|█████▏ | 3601/6885 [12:55:21<3:07:02, 3.42s/it] 52%|█████▏ | 3602/6885 [12:55:24<2:52:35, 3.15s/it] 52%|█████▏ | 3603/6885 [12:55:28<3:13:30, 3.54s/it] 52%|█████▏ | 3604/6885 [12:55:31<3:02:01, 3.33s/it] 52%|█████▏ | 3605/6885 [12:55:35<3:10:20, 3.48s/it] 52%|█████▏ | 3606/6885 [12:55:38<2:57:49, 3.25s/it] 52%|█████▏ | 3607/6885 [12:55:41<2:57:04, 3.24s/it] 52%|█████▏ | 3608/6885 [12:55:43<2:41:27, 2.96s/it] 52%|█████▏ | 3609/6885 [12:55:45<2:25:02, 2.66s/it] 52%|█████▏ | 3610/6885 [12:55:47<2:07:59, 2.34s/it] {'loss': 0.5764, 'grad_norm': 1.321281822598792, 'learning_rate': 5.450649320168263e-06, 'epoch': 0.52} 52%|█████▏ | 3610/6885 [12:55:47<2:07:59, 2.34s/it] 52%|█████▏ | 3611/6885 [12:55:49<2:12:30, 2.43s/it] 52%|█████▏ | 3612/6885 [12:55:51<2:02:06, 2.24s/it] 52%|█████▏ | 3613/6885 [12:55:53<1:57:53, 2.16s/it] 52%|█████▏ | 3614/6885 [12:55:55<2:00:28, 2.21s/it] 53%|█████▎ | 3615/6885 [12:55:57<1:55:49, 2.13s/it] 53%|█████▎ | 3616/6885 [12:56:01<2:26:45, 2.69s/it] 53%|█████▎ | 3617/6885 [12:56:05<2:49:50, 3.12s/it] 53%|█████▎ | 3618/6885 [12:56:08<2:33:24, 2.82s/it] 53%|█████▎ | 3619/6885 [12:56:12<3:02:25, 3.35s/it] 53%|█████▎ | 3620/6885 [12:56:14<2:38:22, 2.91s/it] {'loss': 0.5698, 'grad_norm': 1.1546247883125684, 'learning_rate': 5.4253950361932565e-06, 'epoch': 0.53} 53%|█████▎ | 3620/6885 [12:56:14<2:38:22, 2.91s/it] 53%|█████▎ | 3621/6885 [12:56:21<3:38:53, 4.02s/it] 53%|█████▎ | 3622/6885 [12:56:24<3:23:29, 3.74s/it] 53%|█████▎ | 3623/6885 [12:56:28<3:32:19, 3.91s/it] 53%|█████▎ | 3624/6885 [12:56:33<3:44:47, 4.14s/it] 53%|█████▎ | 3625/6885 [12:56:36<3:24:19, 3.76s/it] 53%|█████▎ | 3626/6885 [12:56:40<3:26:39, 3.80s/it] 53%|█████▎ | 3627/6885 [12:56:45<3:59:37, 4.41s/it] 53%|█████▎ | 3628/6885 [12:56:48<3:36:48, 3.99s/it] 53%|█████▎ 
| 3629/6885 [12:56:51<3:19:11, 3.67s/it] 53%|█████▎ | 3630/6885 [12:56:55<3:14:55, 3.59s/it] {'loss': 0.58, 'grad_norm': 1.3090075714265825, 'learning_rate': 5.400129815968623e-06, 'epoch': 0.53} 53%|█████▎ | 3630/6885 [12:56:55<3:14:55, 3.59s/it] 53%|█████▎ | 3631/6885 [12:57:02<4:12:24, 4.65s/it] 53%|█████▎ | 3632/6885 [12:57:04<3:27:53, 3.83s/it] 53%|█████▎ | 3633/6885 [12:57:07<3:20:29, 3.70s/it] 53%|█████▎ | 3634/6885 [12:57:10<3:04:35, 3.41s/it] 53%|█████▎ | 3635/6885 [12:57:13<2:54:25, 3.22s/it] 53%|█████▎ | 3636/6885 [12:57:15<2:47:03, 3.08s/it] 53%|█████▎ | 3637/6885 [12:57:18<2:43:31, 3.02s/it] 53%|█████▎ | 3638/6885 [12:57:20<2:24:57, 2.68s/it] 53%|█████▎ | 3639/6885 [12:57:22<2:07:29, 2.36s/it] 53%|█████▎ | 3640/6885 [12:57:24<1:59:17, 2.21s/it] {'loss': 0.5906, 'grad_norm': 1.3546772950978652, 'learning_rate': 5.374854309024167e-06, 'epoch': 0.53} 53%|█████▎ | 3640/6885 [12:57:24<1:59:17, 2.21s/it] 53%|█████▎ | 3641/6885 [12:57:26<1:59:26, 2.21s/it] 53%|█████▎ | 3642/6885 [12:57:28<2:01:05, 2.24s/it] 53%|█████▎ | 3643/6885 [12:57:31<2:14:02, 2.48s/it] 53%|█████▎ | 3644/6885 [12:57:35<2:38:00, 2.93s/it] 53%|█████▎ | 3645/6885 [12:57:38<2:30:06, 2.78s/it] 53%|█████▎ | 3646/6885 [12:57:41<2:32:58, 2.83s/it] 53%|█████▎ | 3647/6885 [12:57:43<2:23:10, 2.65s/it] 53%|█████▎ | 3648/6885 [12:57:45<2:11:50, 2.44s/it] 53%|█████▎ | 3649/6885 [12:57:48<2:24:47, 2.68s/it] 53%|█████▎ | 3650/6885 [12:57:50<2:20:09, 2.60s/it] {'loss': 0.5617, 'grad_norm': 1.0728126839197956, 'learning_rate': 5.349569165154153e-06, 'epoch': 0.53} 53%|█████▎ | 3650/6885 [12:57:50<2:20:09, 2.60s/it] 53%|█████▎ | 3651/6885 [12:57:54<2:33:03, 2.84s/it] 53%|█████▎ | 3652/6885 [12:57:57<2:34:03, 2.86s/it] 53%|█████▎ | 3653/6885 [12:57:59<2:22:25, 2.64s/it] 53%|█████▎ | 3654/6885 [12:58:01<2:14:07, 2.49s/it] 53%|█████▎ | 3655/6885 [12:58:06<3:00:17, 3.35s/it] 53%|█████▎ | 3656/6885 [12:58:09<2:44:42, 3.06s/it] 53%|█████▎ | 3657/6885 [12:58:12<2:51:07, 3.18s/it] 53%|█████▎ | 3658/6885 
[12:58:15<2:49:02, 3.14s/it] 53%|█████▎ | 3659/6885 [12:58:18<2:46:58, 3.11s/it] 53%|█████▎ | 3660/6885 [12:58:23<3:14:39, 3.62s/it] {'loss': 0.5752, 'grad_norm': 1.0481388119854531, 'learning_rate': 5.32427503440059e-06, 'epoch': 0.53} 53%|█████▎ | 3660/6885 [12:58:23<3:14:39, 3.62s/it] 53%|█████▎ | 3661/6885 [12:58:26<3:05:38, 3.45s/it] 53%|█████▎ | 3662/6885 [12:58:30<3:04:11, 3.43s/it] 53%|█████▎ | 3663/6885 [12:58:33<3:03:37, 3.42s/it] 53%|█████▎ | 3664/6885 [12:58:36<3:05:58, 3.46s/it] 53%|█████▎ | 3665/6885 [12:58:39<2:49:48, 3.16s/it] 53%|█████▎ | 3666/6885 [12:58:43<3:05:54, 3.47s/it] 53%|█████▎ | 3667/6885 [12:58:45<2:48:16, 3.14s/it] 53%|█████▎ | 3668/6885 [12:58:48<2:32:49, 2.85s/it] 53%|█████▎ | 3669/6885 [12:58:51<2:36:15, 2.92s/it] 53%|█████▎ | 3670/6885 [12:58:54<2:36:50, 2.93s/it] {'loss': 0.577, 'grad_norm': 1.251734474368655, 'learning_rate': 5.29897256703653e-06, 'epoch': 0.53} 53%|█████▎ | 3670/6885 [12:58:54<2:36:50, 2.93s/it] 53%|█████▎ | 3671/6885 [12:58:55<2:16:15, 2.54s/it] 53%|█████▎ | 3672/6885 [12:59:00<2:45:48, 3.10s/it] 53%|█████▎ | 3673/6885 [12:59:02<2:25:56, 2.73s/it] 53%|█████▎ | 3674/6885 [12:59:05<2:41:07, 3.01s/it] 53%|█████▎ | 3675/6885 [12:59:10<3:13:15, 3.61s/it] 53%|█████▎ | 3676/6885 [12:59:13<2:56:25, 3.30s/it] 53%|█████▎ | 3677/6885 [12:59:15<2:45:00, 3.09s/it] 53%|█████▎ | 3678/6885 [12:59:19<2:51:07, 3.20s/it] 53%|█████▎ | 3679/6885 [12:59:21<2:36:35, 2.93s/it] 53%|█████▎ | 3680/6885 [12:59:23<2:20:56, 2.64s/it] {'loss': 0.5604, 'grad_norm': 1.1273771235496188, 'learning_rate': 5.2736624135493465e-06, 'epoch': 0.53} 53%|█████▎ | 3680/6885 [12:59:23<2:20:56, 2.64s/it] 53%|█████▎ | 3681/6885 [12:59:25<2:13:56, 2.51s/it] 53%|█████▎ | 3682/6885 [12:59:28<2:14:47, 2.52s/it] 53%|█████▎ | 3683/6885 [12:59:31<2:24:33, 2.71s/it] 54%|█████▎ | 3684/6885 [12:59:34<2:24:53, 2.72s/it] 54%|█████▎ | 3685/6885 [12:59:37<2:38:40, 2.98s/it] 54%|█████▎ | 3686/6885 [12:59:39<2:16:40, 2.56s/it] 54%|█████▎ | 3687/6885 [12:59:43<2:44:15, 
3.08s/it] 54%|█████▎ | 3688/6885 [12:59:45<2:23:21, 2.69s/it] 54%|█████▎ | 3689/6885 [12:59:49<2:38:23, 2.97s/it] 54%|█████▎ | 3690/6885 [12:59:52<2:47:26, 3.14s/it] {'loss': 0.5799, 'grad_norm': 1.1728285082039356, 'learning_rate': 5.248345224624007e-06, 'epoch': 0.54} 54%|█████▎ | 3690/6885 [12:59:52<2:47:26, 3.14s/it] 54%|█████▎ | 3691/6885 [12:59:58<3:21:47, 3.79s/it] 54%|█████▎ | 3692/6885 [13:00:00<2:57:35, 3.34s/it] 54%|█████▎ | 3693/6885 [13:00:02<2:44:17, 3.09s/it] 54%|█████▎ | 3694/6885 [13:00:06<2:50:00, 3.20s/it] 54%|█████▎ | 3695/6885 [13:00:08<2:35:07, 2.92s/it] 54%|█████▎ | 3696/6885 [13:00:10<2:16:27, 2.57s/it] 54%|█████▎ | 3697/6885 [13:00:13<2:27:36, 2.78s/it] 54%|█████▎ | 3698/6885 [13:00:16<2:36:56, 2.95s/it] 54%|█████▎ | 3699/6885 [13:00:19<2:37:48, 2.97s/it] 54%|█████▎ | 3700/6885 [13:00:22<2:30:07, 2.83s/it] {'loss': 0.5792, 'grad_norm': 1.1207082347004158, 'learning_rate': 5.223021651126356e-06, 'epoch': 0.54} 54%|█████▎ | 3700/6885 [13:00:22<2:30:07, 2.83s/it] 54%|█████▍ | 3701/6885 [13:00:25<2:33:35, 2.89s/it] 54%|█████▍ | 3702/6885 [13:00:29<2:52:19, 3.25s/it] 54%|█████▍ | 3703/6885 [13:00:31<2:35:32, 2.93s/it] 54%|█████▍ | 3704/6885 [13:00:35<2:41:03, 3.04s/it] 54%|█████▍ | 3705/6885 [13:00:37<2:36:32, 2.95s/it] 54%|█████▍ | 3706/6885 [13:00:40<2:29:22, 2.82s/it] 54%|█████▍ | 3707/6885 [13:00:42<2:19:14, 2.63s/it] 54%|█████▍ | 3708/6885 [13:00:44<2:08:00, 2.42s/it] 54%|█████▍ | 3709/6885 [13:00:49<2:49:44, 3.21s/it] 54%|█████▍ | 3710/6885 [13:00:51<2:33:16, 2.90s/it] {'loss': 0.582, 'grad_norm': 1.096111126610637, 'learning_rate': 5.197692344086369e-06, 'epoch': 0.54} 54%|█████▍ | 3710/6885 [13:00:51<2:33:16, 2.90s/it] 54%|█████▍ | 3711/6885 [13:00:54<2:27:38, 2.79s/it] 54%|█████▍ | 3712/6885 [13:00:57<2:36:53, 2.97s/it] 54%|█████▍ | 3713/6885 [13:01:01<2:50:34, 3.23s/it] 54%|█████▍ | 3714/6885 [13:01:05<3:06:20, 3.53s/it] 54%|█████▍ | 3715/6885 [13:01:07<2:47:08, 3.16s/it] 54%|█████▍ | 3716/6885 [13:01:10<2:34:35, 2.93s/it] 54%|█████▍ | 
3717/6885 [13:01:13<2:38:40, 3.01s/it] 54%|█████▍ | 3718/6885 [13:01:16<2:32:48, 2.89s/it] 54%|█████▍ | 3719/6885 [13:01:19<2:41:03, 3.05s/it] 54%|█████▍ | 3720/6885 [13:01:22<2:33:16, 2.91s/it] {'loss': 0.5669, 'grad_norm': 1.1432895144261512, 'learning_rate': 5.172357954681427e-06, 'epoch': 0.54} 54%|█████▍ | 3720/6885 [13:01:22<2:33:16, 2.91s/it] 54%|█████▍ | 3721/6885 [13:01:25<2:33:33, 2.91s/it] 54%|█████▍ | 3722/6885 [13:01:27<2:26:28, 2.78s/it] 54%|█████▍ | 3723/6885 [13:01:32<2:55:40, 3.33s/it] 54%|█████▍ | 3724/6885 [13:01:35<2:54:13, 3.31s/it] 54%|█████▍ | 3725/6885 [13:01:38<2:53:00, 3.28s/it] 54%|█████▍ | 3726/6885 [13:01:40<2:36:37, 2.97s/it] 54%|█████▍ | 3727/6885 [13:01:44<2:50:03, 3.23s/it] 54%|█████▍ | 3728/6885 [13:01:47<2:35:46, 2.96s/it] 54%|█████▍ | 3729/6885 [13:01:49<2:25:20, 2.76s/it] 54%|█████▍ | 3730/6885 [13:01:51<2:10:33, 2.48s/it] {'loss': 0.5727, 'grad_norm': 1.2795186578480655, 'learning_rate': 5.147019134219569e-06, 'epoch': 0.54} 54%|█████▍ | 3730/6885 [13:01:51<2:10:33, 2.48s/it] 54%|█████▍ | 3731/6885 [13:01:52<1:58:49, 2.26s/it] 54%|█████▍ | 3732/6885 [13:01:54<1:49:25, 2.08s/it] 54%|█████▍ | 3733/6885 [13:01:57<2:05:17, 2.39s/it] 54%|█████▍ | 3734/6885 [13:02:03<2:59:27, 3.42s/it] 54%|█████▍ | 3735/6885 [13:02:05<2:36:49, 2.99s/it] 54%|█████▍ | 3736/6885 [13:02:10<3:14:35, 3.71s/it] 54%|█████▍ | 3737/6885 [13:02:13<2:58:05, 3.39s/it] 54%|█████▍ | 3738/6885 [13:02:18<3:30:09, 4.01s/it] 54%|█████▍ | 3739/6885 [13:02:21<2:59:52, 3.43s/it] 54%|█████▍ | 3740/6885 [13:02:24<3:00:58, 3.45s/it] {'loss': 0.5665, 'grad_norm': 1.1497619263404009, 'learning_rate': 5.121676534122746e-06, 'epoch': 0.54} 54%|█████▍ | 3740/6885 [13:02:24<3:00:58, 3.45s/it] 54%|█████▍ | 3741/6885 [13:02:27<2:46:09, 3.17s/it] 54%|█████▍ | 3742/6885 [13:02:30<2:49:35, 3.24s/it] 54%|█████▍ | 3743/6885 [13:02:32<2:36:40, 2.99s/it] 54%|█████▍ | 3744/6885 [13:02:36<2:52:55, 3.30s/it] 54%|█████▍ | 3745/6885 [13:02:39<2:42:09, 3.10s/it] 54%|█████▍ | 3746/6885 
[13:02:45<3:25:05, 3.92s/it] 54%|█████▍ | 3747/6885 [13:02:47<2:52:06, 3.29s/it] 54%|█████▍ | 3748/6885 [13:02:49<2:34:49, 2.96s/it] 54%|█████▍ | 3749/6885 [13:02:57<4:00:18, 4.60s/it] 54%|█████▍ | 3750/6885 [13:03:00<3:35:22, 4.12s/it] {'loss': 0.5758, 'grad_norm': 1.053760679670929, 'learning_rate': 5.096330805910085e-06, 'epoch': 0.54} 54%|█████▍ | 3750/6885 [13:03:00<3:35:22, 4.12s/it] 54%|█████▍ | 3751/6885 [13:03:05<3:47:50, 4.36s/it] 54%|█████▍ | 3752/6885 [13:03:07<3:11:07, 3.66s/it] 55%|█████▍ | 3753/6885 [13:03:11<3:10:14, 3.64s/it] 55%|█████▍ | 3754/6885 [13:03:13<2:45:22, 3.17s/it] 55%|█████▍ | 3755/6885 [13:03:15<2:35:31, 2.98s/it] 55%|█████▍ | 3756/6885 [13:03:18<2:25:45, 2.79s/it] 55%|█████▍ | 3757/6885 [13:03:21<2:36:53, 3.01s/it] 55%|█████▍ | 3758/6885 [13:03:25<2:47:08, 3.21s/it] 55%|█████▍ | 3759/6885 [13:03:27<2:33:26, 2.95s/it] 55%|█████▍ | 3760/6885 [13:03:29<2:16:20, 2.62s/it] {'loss': 0.5715, 'grad_norm': 1.2455461930319618, 'learning_rate': 5.0709826011811246e-06, 'epoch': 0.55} 55%|█████▍ | 3760/6885 [13:03:29<2:16:20, 2.62s/it] 55%|█████▍ | 3761/6885 [13:03:32<2:21:49, 2.72s/it] 55%|█████▍ | 3762/6885 [13:03:35<2:31:42, 2.91s/it] 55%|█████▍ | 3763/6885 [13:03:37<2:10:45, 2.51s/it] 55%|█████▍ | 3764/6885 [13:03:39<2:06:33, 2.43s/it] 55%|█████▍ | 3765/6885 [13:03:42<2:11:55, 2.54s/it] 55%|█████▍ | 3766/6885 [13:03:46<2:37:51, 3.04s/it] 55%|█████▍ | 3767/6885 [13:03:50<2:42:32, 3.13s/it] 55%|█████▍ | 3768/6885 [13:03:52<2:34:23, 2.97s/it] 55%|█████▍ | 3769/6885 [13:03:55<2:32:25, 2.93s/it] 55%|█████▍ | 3770/6885 [13:03:57<2:13:26, 2.57s/it] {'loss': 0.5764, 'grad_norm': 1.2714142743729588, 'learning_rate': 5.045632571599076e-06, 'epoch': 0.55} 55%|█████▍ | 3770/6885 [13:03:57<2:13:26, 2.57s/it] 55%|█████▍ | 3771/6885 [13:03:59<2:12:49, 2.56s/it] 55%|█████▍ | 3772/6885 [13:04:01<2:04:03, 2.39s/it] 55%|█████▍ | 3773/6885 [13:04:04<2:08:37, 2.48s/it] 55%|█████▍ | 3774/6885 [13:04:06<2:04:25, 2.40s/it] 55%|█████▍ | 3775/6885 [13:04:12<2:52:41, 
3.33s/it] 55%|█████▍ | 3776/6885 [13:04:15<2:55:19, 3.38s/it] 55%|█████▍ | 3777/6885 [13:04:20<3:18:56, 3.84s/it] 55%|█████▍ | 3778/6885 [13:04:26<3:44:52, 4.34s/it] 55%|█████▍ | 3779/6885 [13:04:30<3:45:45, 4.36s/it] 55%|█████▍ | 3780/6885 [13:04:33<3:26:14, 3.99s/it] {'loss': 0.5777, 'grad_norm': 1.2596602396359573, 'learning_rate': 5.020281368874063e-06, 'epoch': 0.55} 55%|█████▍ | 3780/6885 [13:04:33<3:26:14, 3.99s/it] 55%|█████▍ | 3781/6885 [13:04:36<3:11:28, 3.70s/it] 55%|█████▍ | 3782/6885 [13:04:40<3:16:01, 3.79s/it] 55%|█████▍ | 3783/6885 [13:04:42<2:49:08, 3.27s/it] 55%|█████▍ | 3784/6885 [13:04:45<2:33:10, 2.96s/it] 55%|█████▍ | 3785/6885 [13:04:47<2:32:10, 2.95s/it] 55%|█████▍ | 3786/6885 [13:04:50<2:20:39, 2.72s/it] 55%|█████▌ | 3787/6885 [13:04:52<2:09:58, 2.52s/it] 55%|█████▌ | 3788/6885 [13:04:56<2:34:34, 2.99s/it] 55%|█████▌ | 3789/6885 [13:04:58<2:16:40, 2.65s/it] 55%|█████▌ | 3790/6885 [13:05:00<2:18:18, 2.68s/it] {'loss': 0.5752, 'grad_norm': 1.096076072807335, 'learning_rate': 4.994929644746366e-06, 'epoch': 0.55} 55%|█████▌ | 3790/6885 [13:05:00<2:18:18, 2.68s/it] 55%|█████▌ | 3791/6885 [13:05:03<2:21:48, 2.75s/it] 55%|█████▌ | 3792/6885 [13:05:06<2:17:58, 2.68s/it] 55%|█████▌ | 3793/6885 [13:05:08<2:15:10, 2.62s/it] 55%|█████▌ | 3794/6885 [13:05:11<2:19:09, 2.70s/it] 55%|█████▌ | 3795/6885 [13:05:13<2:08:29, 2.49s/it] 55%|█████▌ | 3796/6885 [13:05:16<2:08:17, 2.49s/it] 55%|█████▌ | 3797/6885 [13:05:19<2:15:08, 2.63s/it] 55%|█████▌ | 3798/6885 [13:05:21<2:10:50, 2.54s/it] 55%|█████▌ | 3799/6885 [13:05:24<2:12:01, 2.57s/it] 55%|█████▌ | 3800/6885 [13:05:27<2:19:52, 2.72s/it] {'loss': 0.5783, 'grad_norm': 1.1180419407959938, 'learning_rate': 4.969578050969675e-06, 'epoch': 0.55} 55%|█████▌ | 3800/6885 [13:05:27<2:19:52, 2.72s/it] 55%|█████▌ | 3801/6885 [13:05:30<2:36:09, 3.04s/it] 55%|█████▌ | 3802/6885 [13:05:34<2:38:42, 3.09s/it] 55%|█████▌ | 3803/6885 [13:05:36<2:32:14, 2.96s/it] 55%|█████▌ | 3804/6885 [13:05:42<3:07:46, 3.66s/it] 55%|█████▌ 
| 3805/6885 [13:05:45<2:58:08, 3.47s/it] 55%|█████▌ | 3806/6885 [13:05:47<2:43:22, 3.18s/it] 55%|█████▌ | 3807/6885 [13:05:50<2:37:47, 3.08s/it] 55%|█████▌ | 3808/6885 [13:05:54<2:55:11, 3.42s/it] 55%|█████▌ | 3809/6885 [13:05:56<2:37:19, 3.07s/it] 55%|█████▌ | 3810/6885 [13:06:03<3:25:22, 4.01s/it] {'loss': 0.5706, 'grad_norm': 1.1457632992717688, 'learning_rate': 4.944227239294327e-06, 'epoch': 0.55} 55%|█████▌ | 3810/6885 [13:06:03<3:25:22, 4.01s/it] 55%|█████▌ | 3811/6885 [13:06:05<3:01:55, 3.55s/it] 55%|█████▌ | 3812/6885 [13:06:07<2:35:36, 3.04s/it] 55%|█████▌ | 3813/6885 [13:06:10<2:28:02, 2.89s/it] 55%|█████▌ | 3814/6885 [13:06:12<2:22:28, 2.78s/it] 55%|█████▌ | 3815/6885 [13:06:14<2:13:01, 2.60s/it] 55%|█████▌ | 3816/6885 [13:06:16<2:05:59, 2.46s/it] 55%|█████▌ | 3817/6885 [13:06:20<2:18:53, 2.72s/it] 55%|█████▌ | 3818/6885 [13:06:23<2:25:55, 2.85s/it] 55%|█████▌ | 3819/6885 [13:06:26<2:33:00, 2.99s/it] 55%|█████▌ | 3820/6885 [13:06:30<2:44:30, 3.22s/it] {'loss': 0.5629, 'grad_norm': 1.0431686309314605, 'learning_rate': 4.918877861450553e-06, 'epoch': 0.55} 55%|█████▌ | 3820/6885 [13:06:30<2:44:30, 3.22s/it] 55%|█████▌ | 3821/6885 [13:06:34<3:05:03, 3.62s/it] 56%|█████▌ | 3822/6885 [13:06:37<2:41:30, 3.16s/it] 56%|█████▌ | 3823/6885 [13:06:42<3:09:45, 3.72s/it] 56%|█████▌ | 3824/6885 [13:06:45<3:06:50, 3.66s/it] 56%|█████▌ | 3825/6885 [13:06:48<3:00:01, 3.53s/it] 56%|█████▌ | 3826/6885 [13:06:52<3:08:24, 3.70s/it] 56%|█████▌ | 3827/6885 [13:06:55<2:52:32, 3.39s/it] 56%|█████▌ | 3828/6885 [13:06:58<2:50:34, 3.35s/it] 56%|█████▌ | 3829/6885 [13:07:00<2:25:16, 2.85s/it] 56%|█████▌ | 3830/6885 [13:07:04<2:40:02, 3.14s/it] {'loss': 0.5611, 'grad_norm': 1.1033442319502207, 'learning_rate': 4.893530569131716e-06, 'epoch': 0.56} 56%|█████▌ | 3830/6885 [13:07:04<2:40:02, 3.14s/it] 56%|█████▌ | 3831/6885 [13:07:07<2:35:13, 3.05s/it] 56%|█████▌ | 3832/6885 [13:07:10<2:31:33, 2.98s/it] 56%|█████▌ | 3833/6885 [13:07:17<3:38:25, 4.29s/it] 56%|█████▌ | 3834/6885 
[13:07:19<3:11:05, 3.76s/it] 56%|█████▌ | 3835/6885 [13:07:24<3:17:29, 3.89s/it] 56%|█████▌ | 3836/6885 [13:07:27<3:03:25, 3.61s/it] 56%|█████▌ | 3837/6885 [13:07:29<2:50:29, 3.36s/it] 56%|█████▌ | 3838/6885 [13:07:32<2:43:56, 3.23s/it] 56%|█████▌ | 3839/6885 [13:07:35<2:32:36, 3.01s/it] 56%|█████▌ | 3840/6885 [13:07:37<2:28:32, 2.93s/it] {'loss': 0.568, 'grad_norm': 1.1929600913303742, 'learning_rate': 4.8681860139775745e-06, 'epoch': 0.56} 56%|█████▌ | 3840/6885 [13:07:37<2:28:32, 2.93s/it] 56%|█████▌ | 3841/6885 [13:07:40<2:27:05, 2.90s/it] 56%|█████▌ | 3842/6885 [13:07:43<2:24:12, 2.84s/it] 56%|█████▌ | 3843/6885 [13:07:46<2:20:54, 2.78s/it] 56%|█████▌ | 3844/6885 [13:07:49<2:22:52, 2.82s/it] 56%|█████▌ | 3845/6885 [13:07:55<3:17:13, 3.89s/it] 56%|█████▌ | 3846/6885 [13:07:57<2:55:58, 3.47s/it] 56%|█████▌ | 3847/6885 [13:08:00<2:38:10, 3.12s/it] 56%|█████▌ | 3848/6885 [13:08:02<2:20:46, 2.78s/it] 56%|█████▌ | 3849/6885 [13:08:03<2:04:03, 2.45s/it] 56%|█████▌ | 3850/6885 [13:08:05<1:51:36, 2.21s/it] {'loss': 0.5882, 'grad_norm': 1.281488846532093, 'learning_rate': 4.842844847557508e-06, 'epoch': 0.56} 56%|█████▌ | 3850/6885 [13:08:05<1:51:36, 2.21s/it] 56%|█████▌ | 3851/6885 [13:08:08<1:57:22, 2.32s/it] 56%|█████▌ | 3852/6885 [13:08:10<1:57:11, 2.32s/it] 56%|█████▌ | 3853/6885 [13:08:13<2:08:07, 2.54s/it] 56%|█████▌ | 3854/6885 [13:08:16<2:14:06, 2.65s/it] 56%|█████▌ | 3855/6885 [13:08:20<2:31:36, 3.00s/it] 56%|█████▌ | 3856/6885 [13:08:22<2:20:28, 2.78s/it] 56%|█████▌ | 3857/6885 [13:08:24<2:07:41, 2.53s/it] 56%|█████▌ | 3858/6885 [13:08:27<2:22:55, 2.83s/it] 56%|█████▌ | 3859/6885 [13:08:31<2:32:27, 3.02s/it] 56%|█████▌ | 3860/6885 [13:08:34<2:39:23, 3.16s/it] {'loss': 0.596, 'grad_norm': 1.1195048036816224, 'learning_rate': 4.817507721353785e-06, 'epoch': 0.56} 56%|█████▌ | 3860/6885 [13:08:34<2:39:23, 3.16s/it] 56%|█████▌ | 3861/6885 [13:08:37<2:36:43, 3.11s/it] 56%|█████▌ | 3862/6885 [13:08:40<2:35:40, 3.09s/it] 56%|█████▌ | 3863/6885 [13:08:44<2:43:34, 
3.25s/it] 56%|█████▌ | 3864/6885 [13:08:47<2:33:59, 3.06s/it] 56%|█████▌ | 3865/6885 [13:08:50<2:31:15, 3.01s/it] 56%|█████▌ | 3866/6885 [13:08:51<2:12:46, 2.64s/it] 56%|█████▌ | 3867/6885 [13:08:57<2:55:51, 3.50s/it] 56%|█████▌ | 3868/6885 [13:09:00<2:49:52, 3.38s/it] 56%|█████▌ | 3869/6885 [13:09:03<2:37:32, 3.13s/it] 56%|█████▌ | 3870/6885 [13:09:06<2:41:50, 3.22s/it] {'loss': 0.5747, 'grad_norm': 1.1077419816516767, 'learning_rate': 4.792175286744802e-06, 'epoch': 0.56} 56%|█████▌ | 3870/6885 [13:09:06<2:41:50, 3.22s/it] 56%|█████▌ | 3871/6885 [13:09:09<2:40:56, 3.20s/it] 56%|█████▌ | 3872/6885 [13:09:13<2:43:55, 3.26s/it] 56%|█████▋ | 3873/6885 [13:09:16<2:48:55, 3.37s/it] 56%|█████▋ | 3874/6885 [13:09:18<2:26:17, 2.92s/it] 56%|█████▋ | 3875/6885 [13:09:20<2:17:29, 2.74s/it] 56%|█████▋ | 3876/6885 [13:09:23<2:17:47, 2.75s/it] 56%|█████▋ | 3877/6885 [13:09:26<2:16:04, 2.71s/it] 56%|█████▋ | 3878/6885 [13:09:28<2:08:17, 2.56s/it] 56%|█████▋ | 3879/6885 [13:09:32<2:30:21, 3.00s/it] 56%|█████▋ | 3880/6885 [13:09:35<2:28:19, 2.96s/it] {'loss': 0.5915, 'grad_norm': 1.3502747193694702, 'learning_rate': 4.766848194988344e-06, 'epoch': 0.56} 56%|█████▋ | 3880/6885 [13:09:35<2:28:19, 2.96s/it] 56%|█████▋ | 3881/6885 [13:09:38<2:35:35, 3.11s/it] 56%|█████▋ | 3882/6885 [13:09:42<2:43:50, 3.27s/it] 56%|█████▋ | 3883/6885 [13:09:45<2:44:56, 3.30s/it] 56%|█████▋ | 3884/6885 [13:09:51<3:27:35, 4.15s/it] 56%|█████▋ | 3885/6885 [13:09:54<3:07:39, 3.75s/it] 56%|█████▋ | 3886/6885 [13:09:58<3:09:27, 3.79s/it] 56%|█████▋ | 3887/6885 [13:10:01<3:00:38, 3.62s/it] 56%|█████▋ | 3888/6885 [13:10:04<2:45:35, 3.32s/it] 56%|█████▋ | 3889/6885 [13:10:07<2:41:06, 3.23s/it] 56%|█████▋ | 3890/6885 [13:10:11<2:49:08, 3.39s/it] {'loss': 0.5732, 'grad_norm': 1.001203957804234, 'learning_rate': 4.741527097204837e-06, 'epoch': 0.56} 56%|█████▋ | 3890/6885 [13:10:11<2:49:08, 3.39s/it] 57%|█████▋ | 3891/6885 [13:10:13<2:33:47, 3.08s/it] 57%|█████▋ | 3892/6885 [13:10:17<2:38:52, 3.18s/it] 57%|█████▋ 
| 3893/6885 [13:10:20<2:42:04, 3.25s/it] 57%|█████▋ | 3894/6885 [13:10:23<2:42:10, 3.25s/it] 57%|█████▋ | 3895/6885 [13:10:27<2:43:12, 3.28s/it] 57%|█████▋ | 3896/6885 [13:10:29<2:26:38, 2.94s/it] 57%|█████▋ | 3897/6885 [13:10:33<2:41:05, 3.23s/it] 57%|█████▋ | 3898/6885 [13:10:35<2:32:05, 3.06s/it] 57%|█████▋ | 3899/6885 [13:10:40<3:04:29, 3.71s/it] 57%|█████▋ | 3900/6885 [13:10:45<3:18:30, 3.99s/it] {'loss': 0.5682, 'grad_norm': 1.1428305709772093, 'learning_rate': 4.7162126443606145e-06, 'epoch': 0.57} 57%|█████▋ | 3900/6885 [13:10:45<3:18:30, 3.99s/it] 57%|█████▋ | 3901/6885 [13:10:47<2:42:02, 3.26s/it] 57%|█████▋ | 3902/6885 [13:10:49<2:34:23, 3.11s/it] 57%|█████▋ | 3903/6885 [13:10:55<3:16:59, 3.96s/it] 57%|█████▋ | 3904/6885 [13:10:58<2:57:35, 3.57s/it] 57%|█████▋ | 3905/6885 [13:11:02<3:02:26, 3.67s/it] 57%|█████▋ | 3906/6885 [13:11:04<2:35:52, 3.14s/it] 57%|█████▋ | 3907/6885 [13:11:07<2:30:24, 3.03s/it] 57%|█████▋ | 3908/6885 [13:11:11<2:52:56, 3.49s/it] 57%|█████▋ | 3909/6885 [13:11:14<2:37:24, 3.17s/it] 57%|█████▋ | 3910/6885 [13:11:16<2:32:55, 3.08s/it] {'loss': 0.5695, 'grad_norm': 1.220191866232699, 'learning_rate': 4.690905487251174e-06, 'epoch': 0.57} 57%|█████▋ | 3910/6885 [13:11:16<2:32:55, 3.08s/it] 57%|█████▋ | 3911/6885 [13:11:21<2:54:04, 3.51s/it] 57%|█████▋ | 3912/6885 [13:11:24<2:43:48, 3.31s/it] 57%|█████▋ | 3913/6885 [13:11:27<2:40:47, 3.25s/it] 57%|█████▋ | 3914/6885 [13:11:30<2:34:13, 3.11s/it] 57%|█████▋ | 3915/6885 [13:11:32<2:17:22, 2.78s/it] 57%|█████▋ | 3916/6885 [13:11:34<2:07:51, 2.58s/it] 57%|█████▋ | 3917/6885 [13:11:37<2:10:00, 2.63s/it] 57%|█████▋ | 3918/6885 [13:11:39<2:04:50, 2.52s/it] 57%|█████▋ | 3919/6885 [13:11:42<2:12:09, 2.67s/it] 57%|█████▋ | 3920/6885 [13:11:46<2:30:18, 3.04s/it] {'loss': 0.5684, 'grad_norm': 1.0555952997249456, 'learning_rate': 4.665606276484455e-06, 'epoch': 0.57} 57%|█████▋ | 3920/6885 [13:11:46<2:30:18, 3.04s/it] 57%|█████▋ | 3921/6885 [13:11:48<2:19:41, 2.83s/it] 57%|█████▋ | 3922/6885 
[13:11:50<2:10:26, 2.64s/it] 57%|█████▋ | 3923/6885 [13:11:54<2:30:24, 3.05s/it] 57%|█████▋ | 3924/6885 [13:11:57<2:25:54, 2.96s/it] 57%|█████▋ | 3925/6885 [13:12:01<2:46:35, 3.38s/it] 57%|█████▋ | 3926/6885 [13:12:04<2:39:40, 3.24s/it] 57%|█████▋ | 3927/6885 [13:12:07<2:31:31, 3.07s/it] 57%|█████▋ | 3928/6885 [13:12:09<2:19:55, 2.84s/it] 57%|█████▋ | 3929/6885 [13:12:11<2:06:44, 2.57s/it] 57%|█████▋ | 3930/6885 [13:12:14<2:02:52, 2.49s/it] {'loss': 0.5876, 'grad_norm': 1.1675138439049109, 'learning_rate': 4.6403156624641085e-06, 'epoch': 0.57} 57%|█████▋ | 3930/6885 [13:12:14<2:02:52, 2.49s/it] 57%|█████▋ | 3931/6885 [13:12:18<2:26:02, 2.97s/it] 57%|█████▋ | 3932/6885 [13:12:21<2:37:51, 3.21s/it] 57%|█████▋ | 3933/6885 [13:12:24<2:27:39, 3.00s/it] 57%|█████▋ | 3934/6885 [13:12:27<2:33:34, 3.12s/it] 57%|█████▋ | 3935/6885 [13:12:31<2:42:02, 3.30s/it] 57%|█████▋ | 3936/6885 [13:12:36<3:05:57, 3.78s/it] 57%|█████▋ | 3937/6885 [13:12:41<3:18:02, 4.03s/it] 57%|█████▋ | 3938/6885 [13:12:43<2:54:20, 3.55s/it] 57%|█████▋ | 3939/6885 [13:12:46<2:43:30, 3.33s/it] 57%|█████▋ | 3940/6885 [13:12:49<2:35:24, 3.17s/it] {'loss': 0.5838, 'grad_norm': 1.2418849374572543, 'learning_rate': 4.615034295372777e-06, 'epoch': 0.57} 57%|█████▋ | 3940/6885 [13:12:49<2:35:24, 3.17s/it] 57%|█████▋ | 3941/6885 [13:12:53<2:54:10, 3.55s/it] 57%|█████▋ | 3942/6885 [13:12:58<3:12:16, 3.92s/it] 57%|█████▋ | 3943/6885 [13:13:01<3:04:32, 3.76s/it] 57%|█████▋ | 3944/6885 [13:13:04<2:55:53, 3.59s/it] 57%|█████▋ | 3945/6885 [13:13:07<2:36:57, 3.20s/it] 57%|█████▋ | 3946/6885 [13:13:09<2:24:13, 2.94s/it] 57%|█████▋ | 3947/6885 [13:13:12<2:19:15, 2.84s/it] 57%|█████▋ | 3948/6885 [13:13:16<2:35:16, 3.17s/it] 57%|█████▋ | 3949/6885 [13:13:20<2:50:35, 3.49s/it] 57%|█████▋ | 3950/6885 [13:13:23<2:50:27, 3.48s/it] {'loss': 0.57, 'grad_norm': 1.0616817293128535, 'learning_rate': 4.589762825155374e-06, 'epoch': 0.57} 57%|█████▋ | 3950/6885 [13:13:23<2:50:27, 3.48s/it] 57%|█████▋ | 3951/6885 [13:13:26<2:39:11, 
3.26s/it] 57%|█████▋ | 3952/6885 [13:13:29<2:39:21, 3.26s/it] 57%|█████▋ | 3953/6885 [13:13:32<2:36:12, 3.20s/it] 57%|█████▋ | 3954/6885 [13:13:36<2:37:06, 3.22s/it] 57%|█████▋ | 3955/6885 [13:13:39<2:37:35, 3.23s/it] 57%|█████▋ | 3956/6885 [13:13:41<2:19:42, 2.86s/it] 57%|█████▋ | 3957/6885 [13:13:43<2:14:18, 2.75s/it] 57%|█████▋ | 3958/6885 [13:13:47<2:20:29, 2.88s/it] 58%|█████▊ | 3959/6885 [13:13:51<2:42:13, 3.33s/it] 58%|█████▊ | 3960/6885 [13:13:53<2:27:11, 3.02s/it] {'loss': 0.5521, 'grad_norm': 1.2414737852232787, 'learning_rate': 4.564501901502386e-06, 'epoch': 0.58} 58%|█████▊ | 3960/6885 [13:13:53<2:27:11, 3.02s/it] 58%|█████▊ | 3961/6885 [13:13:55<2:12:54, 2.73s/it] 58%|█████▊ | 3962/6885 [13:13:58<2:14:27, 2.76s/it] 58%|█████▊ | 3963/6885 [13:14:01<2:22:38, 2.93s/it] 58%|█████▊ | 3964/6885 [13:14:05<2:33:33, 3.15s/it] 58%|█████▊ | 3965/6885 [13:14:07<2:13:56, 2.75s/it] 58%|█████▊ | 3966/6885 [13:14:15<3:37:03, 4.46s/it] 58%|█████▊ | 3967/6885 [13:14:18<3:15:51, 4.03s/it] 58%|█████▊ | 3968/6885 [13:14:23<3:20:29, 4.12s/it] 58%|█████▊ | 3969/6885 [13:14:27<3:18:12, 4.08s/it] 58%|█████▊ | 3970/6885 [13:14:30<3:10:53, 3.93s/it] {'loss': 0.5761, 'grad_norm': 1.0962764476368352, 'learning_rate': 4.5392521738331585e-06, 'epoch': 0.58} 58%|█████▊ | 3970/6885 [13:14:30<3:10:53, 3.93s/it] 58%|█████▊ | 3971/6885 [13:14:33<2:59:01, 3.69s/it] 58%|█████▊ | 3972/6885 [13:14:36<2:40:15, 3.30s/it] 58%|█████▊ | 3973/6885 [13:14:41<3:02:47, 3.77s/it] 58%|█████▊ | 3974/6885 [13:14:46<3:20:28, 4.13s/it] 58%|█████▊ | 3975/6885 [13:14:49<3:07:00, 3.86s/it] 58%|█████▊ | 3976/6885 [13:14:51<2:40:40, 3.31s/it] 58%|█████▊ | 3977/6885 [13:14:53<2:29:10, 3.08s/it] 58%|█████▊ | 3978/6885 [13:14:56<2:19:01, 2.87s/it] 58%|█████▊ | 3979/6885 [13:14:59<2:16:58, 2.83s/it] 58%|█████▊ | 3980/6885 [13:15:00<2:03:00, 2.54s/it] {'loss': 0.5612, 'grad_norm': 1.2445755051746221, 'learning_rate': 4.514014291279208e-06, 'epoch': 0.58} 58%|█████▊ | 3980/6885 [13:15:00<2:03:00, 2.54s/it] 
58%|█████▊ | 3981/6885 [13:15:03<2:06:18, 2.61s/it] 58%|█████▊ | 3982/6885 [13:15:06<2:13:33, 2.76s/it] 58%|█████▊ | 3983/6885 [13:15:11<2:40:19, 3.31s/it] 58%|█████▊ | 3984/6885 [13:15:14<2:31:59, 3.14s/it] 58%|█████▊ | 3985/6885 [13:15:17<2:33:25, 3.17s/it] 58%|█████▊ | 3986/6885 [13:15:20<2:27:28, 3.05s/it] 58%|█████▊ | 3987/6885 [13:15:23<2:24:35, 2.99s/it] 58%|█████▊ | 3988/6885 [13:15:24<2:05:14, 2.59s/it] 58%|█████▊ | 3989/6885 [13:15:26<1:50:03, 2.28s/it] 58%|█████▊ | 3990/6885 [13:15:30<2:12:25, 2.74s/it] {'loss': 0.5651, 'grad_norm': 1.1248791169953434, 'learning_rate': 4.488788902667534e-06, 'epoch': 0.58} 58%|█████▊ | 3990/6885 [13:15:30<2:12:25, 2.74s/it] 58%|█████▊ | 3991/6885 [13:15:33<2:16:45, 2.84s/it] 58%|█████▊ | 3992/6885 [13:15:35<2:17:06, 2.84s/it] 58%|█████▊ | 3993/6885 [13:15:41<2:49:04, 3.51s/it] 58%|█████▊ | 3994/6885 [13:15:42<2:25:00, 3.01s/it] 58%|█████▊ | 3995/6885 [13:15:46<2:27:17, 3.06s/it] 58%|█████▊ | 3996/6885 [13:15:47<2:09:51, 2.70s/it] 58%|█████▊ | 3997/6885 [13:15:52<2:34:17, 3.21s/it] 58%|█████▊ | 3998/6885 [13:15:55<2:32:17, 3.16s/it] 58%|█████▊ | 3999/6885 [13:15:57<2:20:17, 2.92s/it] 58%|█████▊ | 4000/6885 [13:15:59<2:10:42, 2.72s/it] {'loss': 0.5624, 'grad_norm': 1.1052395709597995, 'learning_rate': 4.463576656503927e-06, 'epoch': 0.58} 58%|█████▊ | 4000/6885 [13:15:59<2:10:42, 2.72s/it] 58%|█████▊ | 4001/6885 [13:16:02<2:01:00, 2.52s/it] 58%|█████▊ | 4002/6885 [13:16:04<2:02:06, 2.54s/it] 58%|█████▊ | 4003/6885 [13:16:06<1:59:37, 2.49s/it] 58%|█████▊ | 4004/6885 [13:16:09<1:58:18, 2.46s/it] 58%|█████▊ | 4005/6885 [13:16:12<2:04:36, 2.60s/it] 58%|█████▊ | 4006/6885 [13:16:15<2:17:16, 2.86s/it] 58%|█████▊ | 4007/6885 [13:16:17<2:05:03, 2.61s/it] 58%|█████▊ | 4008/6885 [13:16:20<2:02:34, 2.56s/it] 58%|█████▊ | 4009/6885 [13:16:25<2:37:58, 3.30s/it] 58%|█████▊ | 4010/6885 [13:16:28<2:30:51, 3.15s/it] {'loss': 0.5747, 'grad_norm': 1.0979993545936089, 'learning_rate': 4.438378200956318e-06, 'epoch': 0.58} 58%|█████▊ | 
4010/6885 [13:16:28<2:30:51, 3.15s/it] 58%|█████▊ | 4011/6885 [13:16:30<2:15:32, 2.83s/it] 58%|█████▊ | 4012/6885 [13:16:32<2:11:48, 2.75s/it] 58%|█████▊ | 4013/6885 [13:16:36<2:21:52, 2.96s/it] 58%|█████▊ | 4014/6885 [13:16:39<2:28:45, 3.11s/it] 58%|█████▊ | 4015/6885 [13:16:43<2:35:16, 3.25s/it] 58%|█████▊ | 4016/6885 [13:16:45<2:24:46, 3.03s/it] 58%|█████▊ | 4017/6885 [13:16:48<2:23:16, 3.00s/it] 58%|█████▊ | 4018/6885 [13:16:52<2:40:33, 3.36s/it] 58%|█████▊ | 4019/6885 [13:16:55<2:32:47, 3.20s/it] 58%|█████▊ | 4020/6885 [13:16:59<2:41:43, 3.39s/it] {'loss': 0.5757, 'grad_norm': 1.1585156096079503, 'learning_rate': 4.413194183838091e-06, 'epoch': 0.58} 58%|█████▊ | 4020/6885 [13:16:59<2:41:43, 3.39s/it] 58%|█████▊ | 4021/6885 [13:17:04<3:07:37, 3.93s/it] 58%|█████▊ | 4022/6885 [13:17:07<2:57:08, 3.71s/it] 58%|█████▊ | 4023/6885 [13:17:12<3:04:48, 3.87s/it] 58%|█████▊ | 4024/6885 [13:17:14<2:38:25, 3.32s/it] 58%|█████▊ | 4025/6885 [13:17:16<2:23:22, 3.01s/it] 58%|█████▊ | 4026/6885 [13:17:23<3:16:59, 4.13s/it] 58%|█████▊ | 4027/6885 [13:17:26<3:00:10, 3.78s/it] 59%|█████▊ | 4028/6885 [13:17:28<2:39:11, 3.34s/it] 59%|█████▊ | 4029/6885 [13:17:31<2:34:59, 3.26s/it] 59%|█████▊ | 4030/6885 [13:17:35<2:46:51, 3.51s/it] {'loss': 0.5826, 'grad_norm': 1.0657343307419072, 'learning_rate': 4.388025252591448e-06, 'epoch': 0.59} 59%|█████▊ | 4030/6885 [13:17:35<2:46:51, 3.51s/it] 59%|█████▊ | 4031/6885 [13:17:39<2:50:15, 3.58s/it] 59%|█████▊ | 4032/6885 [13:17:41<2:31:11, 3.18s/it] 59%|█████▊ | 4033/6885 [13:17:44<2:25:53, 3.07s/it] 59%|█████▊ | 4034/6885 [13:17:48<2:40:10, 3.37s/it] 59%|█████▊ | 4035/6885 [13:17:51<2:30:18, 3.16s/it] 59%|█████▊ | 4036/6885 [13:17:53<2:15:20, 2.85s/it] 59%|█████▊ | 4037/6885 [13:17:56<2:17:41, 2.90s/it] 59%|█████▊ | 4038/6885 [13:17:58<2:02:26, 2.58s/it] 59%|█████▊ | 4039/6885 [13:18:01<2:16:26, 2.88s/it] 59%|█████▊ | 4040/6885 [13:18:06<2:36:51, 3.31s/it] {'loss': 0.561, 'grad_norm': 1.1584399941372348, 'learning_rate': 
4.362872054270753e-06, 'epoch': 0.59} 59%|█████▊ | 4040/6885 [13:18:06<2:36:51, 3.31s/it] 59%|█████▊ | 4041/6885 [13:18:09<2:41:09, 3.40s/it] 59%|█████▊ | 4042/6885 [13:18:13<2:52:04, 3.63s/it] 59%|█████▊ | 4043/6885 [13:18:18<3:09:54, 4.01s/it] 59%|█████▊ | 4044/6885 [13:18:20<2:42:00, 3.42s/it] 59%|█████▉ | 4045/6885 [13:18:23<2:38:17, 3.34s/it] 59%|█████▉ | 4046/6885 [13:18:26<2:29:24, 3.16s/it] 59%|█████▉ | 4047/6885 [13:18:29<2:20:08, 2.96s/it] 59%|█████▉ | 4048/6885 [13:18:33<2:46:40, 3.52s/it] 59%|█████▉ | 4049/6885 [13:18:36<2:35:14, 3.28s/it] 59%|█████▉ | 4050/6885 [13:18:39<2:30:19, 3.18s/it] {'loss': 0.5801, 'grad_norm': 1.1136815017444102, 'learning_rate': 4.337735235525904e-06, 'epoch': 0.59} 59%|█████▉ | 4050/6885 [13:18:39<2:30:19, 3.18s/it] 59%|█████▉ | 4051/6885 [13:18:41<2:17:15, 2.91s/it] 59%|█████▉ | 4052/6885 [13:18:44<2:09:23, 2.74s/it] 59%|█████▉ | 4053/6885 [13:18:47<2:11:09, 2.78s/it] 59%|█████▉ | 4054/6885 [13:18:50<2:12:54, 2.82s/it] 59%|█████▉ | 4055/6885 [13:18:54<2:31:57, 3.22s/it] 59%|█████▉ | 4056/6885 [13:18:58<2:47:42, 3.56s/it] 59%|█████▉ | 4057/6885 [13:19:01<2:32:25, 3.23s/it] 59%|█████▉ | 4058/6885 [13:19:03<2:19:40, 2.96s/it] 59%|█████▉ | 4059/6885 [13:19:06<2:15:25, 2.88s/it] 59%|█████▉ | 4060/6885 [13:19:08<2:08:26, 2.73s/it] {'loss': 0.5748, 'grad_norm': 1.2048049573288624, 'learning_rate': 4.312615442585699e-06, 'epoch': 0.59} 59%|█████▉ | 4060/6885 [13:19:08<2:08:26, 2.73s/it] 59%|█████▉ | 4061/6885 [13:19:11<2:13:26, 2.84s/it] 59%|█████▉ | 4062/6885 [13:19:13<2:04:58, 2.66s/it] 59%|█████▉ | 4063/6885 [13:19:15<1:54:20, 2.43s/it] 59%|█████▉ | 4064/6885 [13:19:18<2:01:26, 2.58s/it] 59%|█████▉ | 4065/6885 [13:19:20<1:52:59, 2.40s/it] 59%|█████▉ | 4066/6885 [13:19:23<2:00:26, 2.56s/it] 59%|█████▉ | 4067/6885 [13:19:26<2:12:56, 2.83s/it] 59%|█████▉ | 4068/6885 [13:19:29<2:03:47, 2.64s/it] 59%|█████▉ | 4069/6885 [13:19:31<2:06:11, 2.69s/it] 59%|█████▉ | 4070/6885 [13:19:36<2:29:28, 3.19s/it] {'loss': 0.5665, 'grad_norm': 
1.106968794623351, 'learning_rate': 4.287513321241237e-06, 'epoch': 0.59} 59%|█████▉ | 4070/6885 [13:19:36<2:29:28, 3.19s/it] 59%|█████▉ | 4071/6885 [13:19:38<2:18:15, 2.95s/it] 59%|█████▉ | 4072/6885 [13:19:41<2:18:53, 2.96s/it] 59%|█████▉ | 4073/6885 [13:19:45<2:25:37, 3.11s/it] 59%|█████▉ | 4074/6885 [13:19:49<2:38:52, 3.39s/it] 59%|█████▉ | 4075/6885 [13:19:51<2:25:05, 3.10s/it] 59%|█████▉ | 4076/6885 [13:19:54<2:23:01, 3.06s/it] 59%|█████▉ | 4077/6885 [13:19:58<2:30:14, 3.21s/it] 59%|█████▉ | 4078/6885 [13:20:00<2:12:05, 2.82s/it] 59%|█████▉ | 4079/6885 [13:20:02<2:04:11, 2.66s/it] 59%|█████▉ | 4080/6885 [13:20:06<2:30:00, 3.21s/it] {'loss': 0.5739, 'grad_norm': 1.0773536810915454, 'learning_rate': 4.262429516829299e-06, 'epoch': 0.59} 59%|█████▉ | 4080/6885 [13:20:06<2:30:00, 3.21s/it] 59%|█████▉ | 4081/6885 [13:20:09<2:21:24, 3.03s/it] 59%|█████▉ | 4082/6885 [13:20:12<2:19:32, 2.99s/it] 59%|█████▉ | 4083/6885 [13:20:14<2:10:44, 2.80s/it] 59%|█████▉ | 4084/6885 [13:20:20<2:57:13, 3.80s/it] 59%|█████▉ | 4085/6885 [13:20:23<2:41:32, 3.46s/it] 59%|█████▉ | 4086/6885 [13:20:25<2:21:44, 3.04s/it] 59%|█████▉ | 4087/6885 [13:20:27<2:04:16, 2.67s/it] 59%|█████▉ | 4088/6885 [13:20:29<2:02:31, 2.63s/it] 59%|█████▉ | 4089/6885 [13:20:32<2:05:10, 2.69s/it] 59%|█████▉ | 4090/6885 [13:20:35<2:05:04, 2.69s/it] {'loss': 0.573, 'grad_norm': 1.2780512286596586, 'learning_rate': 4.237364674215774e-06, 'epoch': 0.59} 59%|█████▉ | 4090/6885 [13:20:35<2:05:04, 2.69s/it] 59%|█████▉ | 4091/6885 [13:20:37<1:54:33, 2.46s/it] 59%|█████▉ | 4092/6885 [13:20:41<2:19:48, 3.00s/it] 59%|█████▉ | 4093/6885 [13:20:44<2:17:45, 2.96s/it] 59%|█████▉ | 4094/6885 [13:20:48<2:39:34, 3.43s/it] 59%|█████▉ | 4095/6885 [13:20:51<2:30:16, 3.23s/it] 59%|█████▉ | 4096/6885 [13:20:55<2:42:13, 3.49s/it] 60%|█████▉ | 4097/6885 [13:20:58<2:33:55, 3.31s/it] 60%|█████▉ | 4098/6885 [13:21:02<2:39:06, 3.43s/it] 60%|█████▉ | 4099/6885 [13:21:05<2:40:01, 3.45s/it] 60%|█████▉ | 4100/6885 [13:21:11<3:06:23, 4.02s/it] 
{'loss': 0.5637, 'grad_norm': 1.015175880325257, 'learning_rate': 4.212319437779066e-06, 'epoch': 0.6} 60%|█████▉ | 4100/6885 [13:21:11<3:06:23, 4.02s/it] 60%|█████▉ | 4101/6885 [13:21:13<2:44:51, 3.55s/it] 60%|█████▉ | 4102/6885 [13:21:17<2:51:49, 3.70s/it] 60%|█████▉ | 4103/6885 [13:21:20<2:39:48, 3.45s/it] 60%|█████▉ | 4104/6885 [13:21:25<2:54:19, 3.76s/it] 60%|█████▉ | 4105/6885 [13:21:27<2:31:48, 3.28s/it] 60%|█████▉ | 4106/6885 [13:21:29<2:14:43, 2.91s/it] 60%|█████▉ | 4107/6885 [13:21:32<2:16:20, 2.94s/it] 60%|█████▉ | 4108/6885 [13:21:34<2:06:40, 2.74s/it] 60%|█████▉ | 4109/6885 [13:21:36<1:57:30, 2.54s/it] 60%|█████▉ | 4110/6885 [13:21:39<2:06:51, 2.74s/it] {'loss': 0.5807, 'grad_norm': 1.1403330329394572, 'learning_rate': 4.187294451393541e-06, 'epoch': 0.6} 60%|█████▉ | 4110/6885 [13:21:39<2:06:51, 2.74s/it] 60%|█████▉ | 4111/6885 [13:21:42<2:06:34, 2.74s/it] 60%|█████▉ | 4112/6885 [13:21:47<2:32:57, 3.31s/it] 60%|█████▉ | 4113/6885 [13:21:49<2:16:30, 2.95s/it] 60%|█████▉ | 4114/6885 [13:21:51<2:06:14, 2.73s/it] 60%|█████▉ | 4115/6885 [13:21:53<2:00:54, 2.62s/it] 60%|█████▉ | 4116/6885 [13:21:56<2:02:59, 2.67s/it] 60%|█████▉ | 4117/6885 [13:21:59<2:01:39, 2.64s/it] 60%|█████▉ | 4118/6885 [13:22:02<2:11:05, 2.84s/it] 60%|█████▉ | 4119/6885 [13:22:06<2:29:09, 3.24s/it] 60%|█████▉ | 4120/6885 [13:22:11<2:55:04, 3.80s/it] {'loss': 0.5704, 'grad_norm': 1.1083139371642667, 'learning_rate': 4.162290358412962e-06, 'epoch': 0.6} 60%|█████▉ | 4120/6885 [13:22:11<2:55:04, 3.80s/it] 60%|█████▉ | 4121/6885 [13:22:14<2:45:08, 3.58s/it] 60%|█████▉ | 4122/6885 [13:22:17<2:24:12, 3.13s/it] 60%|█████▉ | 4123/6885 [13:22:20<2:25:04, 3.15s/it] 60%|█████▉ | 4124/6885 [13:22:25<2:49:58, 3.69s/it] 60%|█████▉ | 4125/6885 [13:22:30<3:08:32, 4.10s/it] 60%|█████▉ | 4126/6885 [13:22:33<2:52:44, 3.76s/it] 60%|█████▉ | 4127/6885 [13:22:35<2:31:39, 3.30s/it] 60%|█████▉ | 4128/6885 [13:22:39<2:38:27, 3.45s/it] 60%|█████▉ | 4129/6885 [13:22:41<2:26:36, 3.19s/it] 60%|█████▉ | 4130/6885 
[13:22:44<2:16:11, 2.97s/it] {'loss': 0.5559, 'grad_norm': 1.1372343052927192, 'learning_rate': 4.1373078016539535e-06, 'epoch': 0.6} 60%|█████▉ | 4130/6885 [13:22:44<2:16:11, 2.97s/it] 60%|██████ | 4131/6885 [13:22:47<2:25:30, 3.17s/it] 60%|██████ | 4132/6885 [13:22:50<2:17:30, 3.00s/it] 60%|██████ | 4133/6885 [13:22:55<2:45:24, 3.61s/it] 60%|██████ | 4134/6885 [13:22:58<2:34:12, 3.36s/it] 60%|██████ | 4135/6885 [13:23:02<2:44:44, 3.59s/it] 60%|██████ | 4136/6885 [13:23:04<2:28:46, 3.25s/it] 60%|██████ | 4137/6885 [13:23:08<2:34:32, 3.37s/it] 60%|██████ | 4138/6885 [13:23:11<2:26:11, 3.19s/it] 60%|██████ | 4139/6885 [13:23:14<2:30:13, 3.28s/it] 60%|██████ | 4140/6885 [13:23:16<2:14:09, 2.93s/it] {'loss': 0.5588, 'grad_norm': 1.2137905963682751, 'learning_rate': 4.1123474233794845e-06, 'epoch': 0.6} 60%|██████ | 4140/6885 [13:23:16<2:14:09, 2.93s/it] 60%|██████ | 4141/6885 [13:23:19<2:06:46, 2.77s/it] 60%|██████ | 4142/6885 [13:23:22<2:07:41, 2.79s/it] 60%|██████ | 4143/6885 [13:23:25<2:09:08, 2.83s/it] 60%|██████ | 4144/6885 [13:23:27<2:06:28, 2.77s/it] 60%|██████ | 4145/6885 [13:23:30<2:05:56, 2.76s/it] 60%|██████ | 4146/6885 [13:23:32<1:54:19, 2.50s/it] 60%|██████ | 4147/6885 [13:23:34<1:43:41, 2.27s/it] 60%|██████ | 4148/6885 [13:23:36<1:50:26, 2.42s/it] 60%|██████ | 4149/6885 [13:23:39<1:51:19, 2.44s/it] 60%|██████ | 4150/6885 [13:23:41<1:47:35, 2.36s/it] {'loss': 0.5776, 'grad_norm': 1.2130103389722957, 'learning_rate': 4.087409865282341e-06, 'epoch': 0.6} 60%|██████ | 4150/6885 [13:23:41<1:47:35, 2.36s/it] 60%|██████ | 4151/6885 [13:23:44<1:53:42, 2.50s/it] 60%|██████ | 4152/6885 [13:23:48<2:14:01, 2.94s/it] 60%|██████ | 4153/6885 [13:23:50<2:07:10, 2.79s/it] 60%|██████ | 4154/6885 [13:23:54<2:13:56, 2.94s/it] 60%|██████ | 4155/6885 [13:23:57<2:17:40, 3.03s/it] 60%|██████ | 4156/6885 [13:23:59<2:09:53, 2.86s/it] 60%|██████ | 4157/6885 [13:24:03<2:19:38, 3.07s/it] 60%|██████ | 4158/6885 [13:24:06<2:19:01, 3.06s/it] 60%|██████ | 4159/6885 [13:24:08<2:10:12, 
2.87s/it] 60%|██████ | 4160/6885 [13:24:11<2:10:14, 2.87s/it] {'loss': 0.5618, 'grad_norm': 1.21914550825707, 'learning_rate': 4.062495768468646e-06, 'epoch': 0.6} 60%|██████ | 4160/6885 [13:24:11<2:10:14, 2.87s/it] 60%|██████ | 4161/6885 [13:24:14<2:03:57, 2.73s/it] 60%|██████ | 4162/6885 [13:24:16<1:56:29, 2.57s/it] 60%|██████ | 4163/6885 [13:24:18<1:47:27, 2.37s/it] 60%|██████ | 4164/6885 [13:24:20<1:41:49, 2.25s/it] 60%|██████ | 4165/6885 [13:24:22<1:42:05, 2.25s/it] 61%|██████ | 4166/6885 [13:24:26<2:03:57, 2.74s/it] 61%|██████ | 4167/6885 [13:24:28<1:57:53, 2.60s/it] 61%|██████ | 4168/6885 [13:24:30<1:48:05, 2.39s/it] 61%|██████ | 4169/6885 [13:24:32<1:48:10, 2.39s/it] 61%|██████ | 4170/6885 [13:24:35<1:48:30, 2.40s/it] {'loss': 0.5784, 'grad_norm': 1.1540562248868875, 'learning_rate': 4.03760577344136e-06, 'epoch': 0.61} 61%|██████ | 4170/6885 [13:24:35<1:48:30, 2.40s/it] 61%|██████ | 4171/6885 [13:24:37<1:45:16, 2.33s/it] 61%|██████ | 4172/6885 [13:24:40<2:00:05, 2.66s/it] 61%|██████ | 4173/6885 [13:24:43<2:00:59, 2.68s/it] 61%|██████ | 4174/6885 [13:24:48<2:28:11, 3.28s/it] 61%|██████ | 4175/6885 [13:24:51<2:23:17, 3.17s/it] 61%|██████ | 4176/6885 [13:24:53<2:14:52, 2.99s/it] 61%|██████ | 4177/6885 [13:24:57<2:29:45, 3.32s/it] 61%|██████ | 4178/6885 [13:24:59<2:12:59, 2.95s/it] 61%|██████ | 4179/6885 [13:25:02<2:03:20, 2.73s/it] 61%|██████ | 4180/6885 [13:25:06<2:27:40, 3.28s/it] {'loss': 0.5814, 'grad_norm': 1.214796762228358, 'learning_rate': 4.012740520083832e-06, 'epoch': 0.61} 61%|██████ | 4180/6885 [13:25:06<2:27:40, 3.28s/it] 61%|██████ | 4181/6885 [13:25:09<2:22:14, 3.16s/it] 61%|██████ | 4182/6885 [13:25:12<2:16:16, 3.02s/it] 61%|██████ | 4183/6885 [13:25:14<2:05:41, 2.79s/it] 61%|██████ | 4184/6885 [13:25:16<1:58:47, 2.64s/it] 61%|██████ | 4185/6885 [13:25:18<1:51:53, 2.49s/it] 61%|██████ | 4186/6885 [13:25:21<1:48:38, 2.42s/it] 61%|██████ | 4187/6885 [13:25:24<1:59:14, 2.65s/it] 61%|██████ | 4188/6885 [13:25:26<1:47:14, 2.39s/it] 61%|██████ | 
4189/6885 [13:25:27<1:37:59, 2.18s/it] 61%|██████ | 4190/6885 [13:25:31<2:02:02, 2.72s/it] {'loss': 0.5791, 'grad_norm': 1.157806370832285, 'learning_rate': 3.987900647643334e-06, 'epoch': 0.61} 61%|██████ | 4190/6885 [13:25:31<2:02:02, 2.72s/it] 61%|██████ | 4191/6885 [13:25:33<1:51:59, 2.49s/it] 61%|██████ | 4192/6885 [13:25:43<3:25:55, 4.59s/it] 61%|██████ | 4193/6885 [13:25:46<3:02:48, 4.07s/it] 61%|██████ | 4194/6885 [13:25:48<2:42:18, 3.62s/it] 61%|██████ | 4195/6885 [13:25:53<2:56:18, 3.93s/it] 61%|██████ | 4196/6885 [13:25:55<2:33:07, 3.42s/it] 61%|██████ | 4197/6885 [13:25:58<2:23:52, 3.21s/it] 61%|██████ | 4198/6885 [13:26:00<2:04:57, 2.79s/it] 61%|██████ | 4199/6885 [13:26:02<2:05:32, 2.80s/it] 61%|██████ | 4200/6885 [13:26:05<2:05:17, 2.80s/it] {'loss': 0.5652, 'grad_norm': 1.1517956672556253, 'learning_rate': 3.963086794714639e-06, 'epoch': 0.61} 61%|██████ | 4200/6885 [13:26:05<2:05:17, 2.80s/it] 61%|██████ | 4201/6885 [13:26:07<1:56:00, 2.59s/it] 61%|██████ | 4202/6885 [13:26:09<1:41:02, 2.26s/it] 61%|██████ | 4203/6885 [13:26:12<1:55:29, 2.58s/it] 61%|██████ | 4204/6885 [13:26:14<1:51:04, 2.49s/it] 61%|██████ | 4205/6885 [13:26:18<2:04:12, 2.78s/it] 61%|██████ | 4206/6885 [13:26:21<2:08:40, 2.88s/it] 61%|██████ | 4207/6885 [13:26:23<1:57:41, 2.64s/it] 61%|██████ | 4208/6885 [13:26:25<1:48:27, 2.43s/it] 61%|██████ | 4209/6885 [13:26:29<2:12:27, 2.97s/it] 61%|██████ | 4210/6885 [13:26:35<2:44:13, 3.68s/it] {'loss': 0.5728, 'grad_norm': 1.1605789001720612, 'learning_rate': 3.9382995992235955e-06, 'epoch': 0.61} 61%|██████ | 4210/6885 [13:26:35<2:44:13, 3.68s/it] 61%|██████ | 4211/6885 [13:26:38<2:36:46, 3.52s/it] 61%|██████ | 4212/6885 [13:26:41<2:39:02, 3.57s/it] 61%|██████ | 4213/6885 [13:26:44<2:27:53, 3.32s/it] 61%|██████ | 4214/6885 [13:26:46<2:12:51, 2.98s/it] 61%|██████ | 4215/6885 [13:26:49<2:05:54, 2.83s/it] 61%|██████ | 4216/6885 [13:26:51<1:58:49, 2.67s/it] 61%|██████ | 4217/6885 [13:26:53<1:52:22, 2.53s/it] 61%|██████▏ | 4218/6885 
[13:26:57<2:03:21, 2.78s/it] 61%|██████▏ | 4219/6885 [13:26:59<2:01:10, 2.73s/it] 61%|██████▏ | 4220/6885 [13:27:03<2:17:20, 3.09s/it] {'loss': 0.5684, 'grad_norm': 1.0630436480054268, 'learning_rate': 3.913539698410734e-06, 'epoch': 0.61} 61%|██████▏ | 4220/6885 [13:27:03<2:17:20, 3.09s/it] 61%|██████▏ | 4221/6885 [13:27:06<2:11:19, 2.96s/it] 61%|██████▏ | 4222/6885 [13:27:09<2:15:01, 3.04s/it] 61%|██████▏ | 4223/6885 [13:27:12<2:11:52, 2.97s/it] 61%|██████▏ | 4224/6885 [13:27:14<1:58:58, 2.68s/it] 61%|██████▏ | 4225/6885 [13:27:16<1:55:05, 2.60s/it] 61%|██████▏ | 4226/6885 [13:27:19<2:00:44, 2.72s/it] 61%|██████▏ | 4227/6885 [13:27:22<1:58:49, 2.68s/it] 61%|██████▏ | 4228/6885 [13:27:24<1:56:43, 2.64s/it] 61%|██████▏ | 4229/6885 [13:27:27<1:55:36, 2.61s/it] 61%|██████▏ | 4230/6885 [13:27:30<1:56:39, 2.64s/it] {'loss': 0.5664, 'grad_norm': 1.175513347812724, 'learning_rate': 3.888807728814874e-06, 'epoch': 0.61} 61%|██████▏ | 4230/6885 [13:27:30<1:56:39, 2.64s/it] 61%|██████▏ | 4231/6885 [13:27:33<2:08:03, 2.89s/it] 61%|██████▏ | 4232/6885 [13:27:36<2:05:42, 2.84s/it] 61%|██████▏ | 4233/6885 [13:27:42<2:43:15, 3.69s/it] 61%|██████▏ | 4234/6885 [13:27:46<2:46:15, 3.76s/it] 62%|██████▏ | 4235/6885 [13:27:48<2:23:31, 3.25s/it] 62%|██████▏ | 4236/6885 [13:27:51<2:22:58, 3.24s/it] 62%|██████▏ | 4237/6885 [13:27:54<2:28:34, 3.37s/it] 62%|██████▏ | 4238/6885 [13:27:58<2:30:52, 3.42s/it] 62%|██████▏ | 4239/6885 [13:28:02<2:35:16, 3.52s/it] 62%|██████▏ | 4240/6885 [13:28:07<3:04:45, 4.19s/it] {'loss': 0.5805, 'grad_norm': 1.1583525329647688, 'learning_rate': 3.864104326256775e-06, 'epoch': 0.62} 62%|██████▏ | 4240/6885 [13:28:07<3:04:45, 4.19s/it] 62%|██████▏ | 4241/6885 [13:28:12<3:02:25, 4.14s/it] 62%|██████▏ | 4242/6885 [13:28:13<2:31:40, 3.44s/it] 62%|██████▏ | 4243/6885 [13:28:17<2:28:44, 3.38s/it] 62%|██████▏ | 4244/6885 [13:28:20<2:27:25, 3.35s/it] 62%|██████▏ | 4245/6885 [13:28:23<2:18:38, 3.15s/it] 62%|██████▏ | 4246/6885 [13:28:25<2:07:32, 2.90s/it] 62%|██████▏ | 
4247/6885 [13:28:29<2:21:26, 3.22s/it] 62%|██████▏ | 4248/6885 [13:28:31<2:03:17, 2.81s/it] 62%|██████▏ | 4249/6885 [13:28:36<2:32:40, 3.48s/it] 62%|██████▏ | 4250/6885 [13:28:39<2:29:33, 3.41s/it] {'loss': 0.5622, 'grad_norm': 1.1058170223844426, 'learning_rate': 3.8394301258227756e-06, 'epoch': 0.62} 62%|██████▏ | 4250/6885 [13:28:39<2:29:33, 3.41s/it] 62%|██████▏ | 4251/6885 [13:28:41<2:12:13, 3.01s/it] 62%|██████▏ | 4252/6885 [13:28:45<2:19:06, 3.17s/it] 62%|██████▏ | 4253/6885 [13:28:47<2:09:42, 2.96s/it] 62%|██████▏ | 4254/6885 [13:28:51<2:18:27, 3.16s/it] 62%|██████▏ | 4255/6885 [13:28:53<2:12:25, 3.02s/it] 62%|██████▏ | 4256/6885 [13:28:58<2:33:06, 3.49s/it] 62%|██████▏ | 4257/6885 [13:29:01<2:23:10, 3.27s/it] 62%|██████▏ | 4258/6885 [13:29:03<2:08:10, 2.93s/it] 62%|██████▏ | 4259/6885 [13:29:06<2:15:12, 3.09s/it] 62%|██████▏ | 4260/6885 [13:29:10<2:17:17, 3.14s/it] {'loss': 0.5583, 'grad_norm': 1.2295319541574912, 'learning_rate': 3.814785761848475e-06, 'epoch': 0.62} 62%|██████▏ | 4260/6885 [13:29:10<2:17:17, 3.14s/it] 62%|██████▏ | 4261/6885 [13:29:13<2:22:00, 3.25s/it] 62%|██████▏ | 4262/6885 [13:29:16<2:15:46, 3.11s/it] 62%|██████▏ | 4263/6885 [13:29:18<2:06:31, 2.90s/it] 62%|██████▏ | 4264/6885 [13:29:22<2:13:54, 3.07s/it] 62%|██████▏ | 4265/6885 [13:29:24<2:09:42, 2.97s/it] 62%|██████▏ | 4266/6885 [13:29:29<2:28:20, 3.40s/it] 62%|██████▏ | 4267/6885 [13:29:32<2:22:39, 3.27s/it] 62%|██████▏ | 4268/6885 [13:29:35<2:16:23, 3.13s/it] 62%|██████▏ | 4269/6885 [13:29:37<2:06:11, 2.89s/it] 62%|██████▏ | 4270/6885 [13:29:40<2:05:10, 2.87s/it] {'loss': 0.5755, 'grad_norm': 1.092280135001415, 'learning_rate': 3.790171867902426e-06, 'epoch': 0.62} 62%|██████▏ | 4270/6885 [13:29:40<2:05:10, 2.87s/it] 62%|██████▏ | 4271/6885 [13:29:42<1:58:52, 2.73s/it] 62%|██████▏ | 4272/6885 [13:29:44<1:43:15, 2.37s/it] 62%|██████▏ | 4273/6885 [13:29:47<1:50:55, 2.55s/it] 62%|██████▏ | 4274/6885 [13:29:49<1:42:20, 2.35s/it] 62%|██████▏ | 4275/6885 [13:29:51<1:49:33, 2.52s/it] 
62%|██████▏ | 4276/6885 [13:29:56<2:13:15, 3.06s/it] 62%|██████▏ | 4277/6885 [13:29:59<2:14:52, 3.10s/it] 62%|██████▏ | 4278/6885 [13:30:02<2:08:54, 2.97s/it] 62%|██████▏ | 4279/6885 [13:30:07<2:34:23, 3.55s/it] 62%|██████▏ | 4280/6885 [13:30:09<2:13:24, 3.07s/it] {'loss': 0.5729, 'grad_norm': 1.274653674496685, 'learning_rate': 3.7655890767698384e-06, 'epoch': 0.62} 62%|██████▏ | 4280/6885 [13:30:09<2:13:24, 3.07s/it] 62%|██████▏ | 4281/6885 [13:30:13<2:31:27, 3.49s/it] 62%|██████▏ | 4282/6885 [13:30:16<2:30:33, 3.47s/it] 62%|██████▏ | 4283/6885 [13:30:19<2:23:04, 3.30s/it] 62%|██████▏ | 4284/6885 [13:30:22<2:18:28, 3.19s/it] 62%|██████▏ | 4285/6885 [13:30:28<2:50:38, 3.94s/it] 62%|██████▏ | 4286/6885 [13:30:31<2:34:17, 3.56s/it] 62%|██████▏ | 4287/6885 [13:30:34<2:29:44, 3.46s/it] 62%|██████▏ | 4288/6885 [13:30:37<2:29:15, 3.45s/it] 62%|██████▏ | 4289/6885 [13:30:40<2:20:26, 3.25s/it] 62%|██████▏ | 4290/6885 [13:30:43<2:16:53, 3.17s/it] {'loss': 0.5572, 'grad_norm': 1.2166924621577075, 'learning_rate': 3.741038020436323e-06, 'epoch': 0.62} 62%|██████▏ | 4290/6885 [13:30:43<2:16:53, 3.17s/it] 62%|██████▏ | 4291/6885 [13:30:46<2:16:29, 3.16s/it] 62%|██████▏ | 4292/6885 [13:30:49<2:14:21, 3.11s/it] 62%|██████▏ | 4293/6885 [13:30:52<2:16:16, 3.15s/it] 62%|██████▏ | 4294/6885 [13:30:56<2:24:28, 3.35s/it] 62%|██████▏ | 4295/6885 [13:30:59<2:11:57, 3.06s/it] 62%|██████▏ | 4296/6885 [13:31:01<1:59:16, 2.76s/it] 62%|██████▏ | 4297/6885 [13:31:03<1:52:33, 2.61s/it] 62%|██████▏ | 4298/6885 [13:31:06<1:58:45, 2.75s/it] 62%|██████▏ | 4299/6885 [13:31:11<2:23:21, 3.33s/it] 62%|██████▏ | 4300/6885 [13:31:13<2:16:58, 3.18s/it] {'loss': 0.5664, 'grad_norm': 1.0296689666125658, 'learning_rate': 3.7165193300716297e-06, 'epoch': 0.62} 62%|██████▏ | 4300/6885 [13:31:13<2:16:58, 3.18s/it] 62%|██████▏ | 4301/6885 [13:31:16<2:13:12, 3.09s/it] 62%|██████▏ | 4302/6885 [13:31:19<2:11:02, 3.04s/it] 62%|██████▏ | 4303/6885 [13:31:21<1:55:20, 2.68s/it] 63%|██████▎ | 4304/6885 
[13:31:24<2:02:37, 2.85s/it] 63%|██████▎ | 4305/6885 [13:31:27<1:57:34, 2.73s/it] 63%|██████▎ | 4306/6885 [13:31:29<1:48:58, 2.54s/it] 63%|██████▎ | 4307/6885 [13:31:32<1:50:36, 2.57s/it] 63%|██████▎ | 4308/6885 [13:31:34<1:48:20, 2.52s/it] 63%|██████▎ | 4309/6885 [13:31:36<1:44:07, 2.43s/it] 63%|██████▎ | 4310/6885 [13:31:40<2:00:08, 2.80s/it] {'loss': 0.5679, 'grad_norm': 1.0530929308425294, 'learning_rate': 3.6920336360134378e-06, 'epoch': 0.63} 63%|██████▎ | 4310/6885 [13:31:40<2:00:08, 2.80s/it] 63%|██████▎ | 4311/6885 [13:31:42<1:47:11, 2.50s/it] 63%|██████▎ | 4312/6885 [13:31:44<1:47:13, 2.50s/it] 63%|██████▎ | 4313/6885 [13:31:46<1:43:05, 2.40s/it] 63%|██████▎ | 4314/6885 [13:31:56<3:17:10, 4.60s/it] 63%|██████▎ | 4315/6885 [13:31:58<2:44:29, 3.84s/it] 63%|██████▎ | 4316/6885 [13:32:01<2:28:23, 3.47s/it] 63%|██████▎ | 4317/6885 [13:32:03<2:08:06, 2.99s/it] 63%|██████▎ | 4318/6885 [13:32:05<1:54:25, 2.67s/it] 63%|██████▎ | 4319/6885 [13:32:07<1:54:53, 2.69s/it] 63%|██████▎ | 4320/6885 [13:32:10<1:53:45, 2.66s/it] {'loss': 0.5607, 'grad_norm': 1.1137539642969592, 'learning_rate': 3.6675815677511382e-06, 'epoch': 0.63} 63%|██████▎ | 4320/6885 [13:32:10<1:53:45, 2.66s/it] 63%|██████▎ | 4321/6885 [13:32:13<2:01:40, 2.85s/it] 63%|██████▎ | 4322/6885 [13:32:17<2:09:04, 3.02s/it] 63%|██████▎ | 4323/6885 [13:32:20<2:07:51, 2.99s/it] 63%|██████▎ | 4324/6885 [13:32:24<2:24:39, 3.39s/it] 63%|██████▎ | 4325/6885 [13:32:27<2:24:27, 3.39s/it] 63%|██████▎ | 4326/6885 [13:32:30<2:12:52, 3.12s/it] 63%|██████▎ | 4327/6885 [13:32:35<2:37:40, 3.70s/it] 63%|██████▎ | 4328/6885 [13:32:38<2:30:06, 3.52s/it] 63%|██████▎ | 4329/6885 [13:32:42<2:38:58, 3.73s/it] 63%|██████▎ | 4330/6885 [13:32:45<2:24:48, 3.40s/it] {'loss': 0.5691, 'grad_norm': 1.0875536687719785, 'learning_rate': 3.6431637539096565e-06, 'epoch': 0.63} 63%|██████▎ | 4330/6885 [13:32:45<2:24:48, 3.40s/it] 63%|██████▎ | 4331/6885 [13:32:47<2:14:26, 3.16s/it] 63%|██████▎ | 4332/6885 [13:32:50<2:06:43, 2.98s/it] 
63%|██████▎ | 4333/6885 [13:32:53<2:12:22, 3.11s/it] 63%|██████▎ | 4334/6885 [13:32:56<2:10:48, 3.08s/it] 63%|██████▎ | 4335/6885 [13:32:59<2:01:51, 2.87s/it] 63%|██████▎ | 4336/6885 [13:33:00<1:47:34, 2.53s/it] 63%|██████▎ | 4337/6885 [13:33:04<1:55:03, 2.71s/it] 63%|██████▎ | 4338/6885 [13:33:07<2:08:54, 3.04s/it] 63%|██████▎ | 4339/6885 [13:33:10<2:03:32, 2.91s/it] 63%|██████▎ | 4340/6885 [13:33:13<2:09:13, 3.05s/it] {'loss': 0.5668, 'grad_norm': 1.1268225507247402, 'learning_rate': 3.6187808222332852e-06, 'epoch': 0.63} 63%|██████▎ | 4340/6885 [13:33:13<2:09:13, 3.05s/it] 63%|██████▎ | 4341/6885 [13:33:19<2:36:47, 3.70s/it] 63%|██████▎ | 4342/6885 [13:33:21<2:19:09, 3.28s/it] 63%|██████▎ | 4343/6885 [13:33:23<2:05:24, 2.96s/it] 63%|██████▎ | 4344/6885 [13:33:27<2:22:04, 3.35s/it] 63%|██████▎ | 4345/6885 [13:33:31<2:20:06, 3.31s/it] 63%|██████▎ | 4346/6885 [13:33:34<2:16:26, 3.22s/it] 63%|██████▎ | 4347/6885 [13:33:36<2:09:20, 3.06s/it] 63%|██████▎ | 4348/6885 [13:33:40<2:12:54, 3.14s/it] 63%|██████▎ | 4349/6885 [13:33:42<2:06:58, 3.00s/it] 63%|██████▎ | 4350/6885 [13:33:45<2:07:06, 3.01s/it] {'loss': 0.5551, 'grad_norm': 1.1757316218974525, 'learning_rate': 3.594433399569559e-06, 'epoch': 0.63} 63%|██████▎ | 4350/6885 [13:33:45<2:07:06, 3.01s/it] 63%|██████▎ | 4351/6885 [13:33:48<1:58:48, 2.81s/it] 63%|██████▎ | 4352/6885 [13:33:50<1:53:09, 2.68s/it] 63%|██████▎ | 4353/6885 [13:33:52<1:42:24, 2.43s/it] 63%|██████▎ | 4354/6885 [13:33:54<1:44:39, 2.48s/it] 63%|██████▎ | 4355/6885 [13:33:58<1:56:57, 2.77s/it] 63%|██████▎ | 4356/6885 [13:34:00<1:51:10, 2.64s/it] 63%|██████▎ | 4357/6885 [13:34:03<1:49:43, 2.60s/it] 63%|██████▎ | 4358/6885 [13:34:06<1:54:16, 2.71s/it] 63%|██████▎ | 4359/6885 [13:34:08<1:53:02, 2.69s/it] 63%|██████▎ | 4360/6885 [13:34:12<2:00:06, 2.85s/it] {'loss': 0.5785, 'grad_norm': 1.1554119314408926, 'learning_rate': 3.5701221118531195e-06, 'epoch': 0.63} 63%|██████▎ | 4360/6885 [13:34:12<2:00:06, 2.85s/it] 63%|██████▎ | 4361/6885 
[13:34:16<2:20:14, 3.33s/it] 63%|██████▎ | 4362/6885 [13:34:19<2:12:52, 3.16s/it] 63%|██████▎ | 4363/6885 [13:34:21<2:05:29, 2.99s/it] 63%|██████▎ | 4364/6885 [13:34:24<2:07:12, 3.03s/it] 63%|██████▎ | 4365/6885 [13:34:29<2:25:17, 3.46s/it] 63%|██████▎ | 4366/6885 [13:34:33<2:28:17, 3.53s/it] 63%|██████▎ | 4367/6885 [13:34:35<2:12:43, 3.16s/it] 63%|██████▎ | 4368/6885 [13:34:37<1:55:42, 2.76s/it] 63%|██████▎ | 4369/6885 [13:34:40<1:58:26, 2.82s/it] 63%|██████▎ | 4370/6885 [13:34:43<2:03:19, 2.94s/it] {'loss': 0.5677, 'grad_norm': 1.0947128171930913, 'learning_rate': 3.5458475840896434e-06, 'epoch': 0.63} 63%|██████▎ | 4370/6885 [13:34:43<2:03:19, 2.94s/it] 63%|██████▎ | 4371/6885 [13:34:48<2:23:58, 3.44s/it] 64%|██████▎ | 4372/6885 [13:34:50<2:12:20, 3.16s/it] 64%|██████▎ | 4373/6885 [13:34:52<1:58:50, 2.84s/it] 64%|██████▎ | 4374/6885 [13:34:54<1:48:34, 2.59s/it] 64%|██████▎ | 4375/6885 [13:34:57<1:47:10, 2.56s/it] 64%|██████▎ | 4376/6885 [13:34:59<1:49:55, 2.63s/it] 64%|██████▎ | 4377/6885 [13:35:01<1:38:50, 2.36s/it] 64%|██████▎ | 4378/6885 [13:35:04<1:44:35, 2.50s/it] 64%|██████▎ | 4379/6885 [13:35:07<1:56:23, 2.79s/it] 64%|██████▎ | 4380/6885 [13:35:11<2:01:51, 2.92s/it] {'loss': 0.5504, 'grad_norm': 1.2477952532418557, 'learning_rate': 3.5216104403397623e-06, 'epoch': 0.64} 64%|██████▎ | 4380/6885 [13:35:11<2:01:51, 2.92s/it] 64%|██████▎ | 4381/6885 [13:35:13<1:58:36, 2.84s/it] 64%|██████▎ | 4382/6885 [13:35:16<1:55:16, 2.76s/it] 64%|██████▎ | 4383/6885 [13:35:18<1:51:54, 2.68s/it] 64%|██████▎ | 4384/6885 [13:35:22<2:03:57, 2.97s/it] 64%|██████▎ | 4385/6885 [13:35:25<2:00:13, 2.89s/it] 64%|██████▎ | 4386/6885 [13:35:28<2:08:45, 3.09s/it] 64%|██████▎ | 4387/6885 [13:35:30<1:54:25, 2.75s/it] 64%|██████▎ | 4388/6885 [13:35:33<1:49:46, 2.64s/it] 64%|██████▎ | 4389/6885 [13:35:35<1:45:52, 2.55s/it] 64%|██████▍ | 4390/6885 [13:35:39<2:00:48, 2.91s/it] {'loss': 0.5753, 'grad_norm': 1.1149755483280817, 'learning_rate': 3.4974113037030257e-06, 'epoch': 0.64} 
64%|██████▍ | 4390/6885 [13:35:39<2:00:48, 2.91s/it] 64%|██████▍ | 4391/6885 [13:35:51<3:57:09, 5.71s/it] 64%|██████▍ | 4392/6885 [13:35:53<3:12:30, 4.63s/it] 64%|██████▍ | 4393/6885 [13:35:56<2:51:12, 4.12s/it] 64%|██████▍ | 4394/6885 [13:36:00<2:50:21, 4.10s/it] 64%|██████▍ | 4395/6885 [13:36:05<3:01:31, 4.37s/it] 64%|██████▍ | 4396/6885 [13:36:08<2:44:38, 3.97s/it] 64%|██████▍ | 4397/6885 [13:36:11<2:36:54, 3.78s/it] 64%|██████▍ | 4398/6885 [13:36:14<2:23:46, 3.47s/it] 64%|██████▍ | 4399/6885 [13:36:17<2:16:28, 3.29s/it] 64%|██████▍ | 4400/6885 [13:36:20<2:15:42, 3.28s/it] {'loss': 0.5669, 'grad_norm': 1.214526641921585, 'learning_rate': 3.473250796301874e-06, 'epoch': 0.64} 64%|██████▍ | 4400/6885 [13:36:20<2:15:42, 3.28s/it] 64%|██████▍ | 4401/6885 [13:36:24<2:22:45, 3.45s/it] 64%|██████▍ | 4402/6885 [13:36:27<2:10:39, 3.16s/it] 64%|██████▍ | 4403/6885 [13:36:29<2:04:34, 3.01s/it] 64%|██████▍ | 4404/6885 [13:36:31<1:53:34, 2.75s/it] 64%|██████▍ | 4405/6885 [13:36:34<1:56:15, 2.81s/it] 64%|██████▍ | 4406/6885 [13:36:40<2:29:53, 3.63s/it] 64%|██████▍ | 4407/6885 [13:36:44<2:28:58, 3.61s/it] 64%|██████▍ | 4408/6885 [13:36:46<2:18:02, 3.34s/it] 64%|██████▍ | 4409/6885 [13:36:49<2:15:53, 3.29s/it] 64%|██████▍ | 4410/6885 [13:36:52<2:11:33, 3.19s/it] {'loss': 0.5604, 'grad_norm': 1.1149175312128623, 'learning_rate': 3.4491295392656497e-06, 'epoch': 0.64} 64%|██████▍ | 4410/6885 [13:36:52<2:11:33, 3.19s/it] 64%|██████▍ | 4411/6885 [13:36:54<1:58:13, 2.87s/it] 64%|██████▍ | 4412/6885 [13:36:58<2:10:27, 3.17s/it] 64%|██████▍ | 4413/6885 [13:37:01<1:58:37, 2.88s/it] 64%|██████▍ | 4414/6885 [13:37:04<2:06:01, 3.06s/it] 64%|██████▍ | 4415/6885 [13:37:07<2:04:42, 3.03s/it] 64%|██████▍ | 4416/6885 [13:37:09<1:52:22, 2.73s/it] 64%|██████▍ | 4417/6885 [13:37:11<1:43:40, 2.52s/it] 64%|██████▍ | 4418/6885 [13:37:13<1:36:37, 2.35s/it] 64%|██████▍ | 4419/6885 [13:37:18<2:04:33, 3.03s/it] 64%|██████▍ | 4420/6885 [13:37:22<2:15:23, 3.30s/it] {'loss': 0.5651, 'grad_norm': 
1.1763746140746527, 'learning_rate': 3.425048152714635e-06, 'epoch': 0.64} 64%|██████▍ | 4420/6885 [13:37:22<2:15:23, 3.30s/it] 64%|██████▍ | 4421/6885 [13:37:26<2:31:48, 3.70s/it] 64%|██████▍ | 4422/6885 [13:37:29<2:26:15, 3.56s/it] 64%|██████▍ | 4423/6885 [13:37:32<2:13:36, 3.26s/it] 64%|██████▍ | 4424/6885 [13:37:34<2:00:13, 2.93s/it] 64%|██████▍ | 4425/6885 [13:37:40<2:31:57, 3.71s/it] 64%|██████▍ | 4426/6885 [13:37:42<2:11:17, 3.20s/it] 64%|██████▍ | 4427/6885 [13:37:45<2:14:21, 3.28s/it] 64%|██████▍ | 4428/6885 [13:37:47<2:01:02, 2.96s/it] 64%|██████▍ | 4429/6885 [13:37:50<1:56:43, 2.85s/it] 64%|██████▍ | 4430/6885 [13:37:52<1:51:10, 2.72s/it] {'loss': 0.5685, 'grad_norm': 1.169802661186734, 'learning_rate': 3.4010072557440967e-06, 'epoch': 0.64} 64%|██████▍ | 4430/6885 [13:37:52<1:51:10, 2.72s/it] 64%|██████▍ | 4431/6885 [13:37:56<2:08:09, 3.13s/it] 64%|██████▍ | 4432/6885 [13:37:59<2:05:12, 3.06s/it] 64%|██████▍ | 4433/6885 [13:38:03<2:13:14, 3.26s/it] 64%|██████▍ | 4434/6885 [13:38:05<2:01:47, 2.98s/it] 64%|██████▍ | 4435/6885 [13:38:07<1:49:07, 2.67s/it] 64%|██████▍ | 4436/6885 [13:38:10<1:44:09, 2.55s/it] 64%|██████▍ | 4437/6885 [13:38:14<2:02:24, 3.00s/it] 64%|██████▍ | 4438/6885 [13:38:16<1:58:59, 2.92s/it] 64%|██████▍ | 4439/6885 [13:38:18<1:45:08, 2.58s/it] 64%|██████▍ | 4440/6885 [13:38:21<1:49:54, 2.70s/it] {'loss': 0.577, 'grad_norm': 1.1404701148865375, 'learning_rate': 3.3770074664083827e-06, 'epoch': 0.64} 64%|██████▍ | 4440/6885 [13:38:21<1:49:54, 2.70s/it] 65%|██████▍ | 4441/6885 [13:38:24<1:49:52, 2.70s/it] 65%|██████▍ | 4442/6885 [13:38:26<1:43:40, 2.55s/it] 65%|██████▍ | 4443/6885 [13:38:30<1:56:38, 2.87s/it] 65%|██████▍ | 4444/6885 [13:38:34<2:18:10, 3.40s/it] 65%|██████▍ | 4445/6885 [13:38:37<2:15:22, 3.33s/it] 65%|██████▍ | 4446/6885 [13:38:40<2:07:45, 3.14s/it] 65%|██████▍ | 4447/6885 [13:38:43<2:05:46, 3.10s/it] 65%|██████▍ | 4448/6885 [13:38:46<2:00:29, 2.97s/it] 65%|██████▍ | 4449/6885 [13:38:49<2:04:28, 3.07s/it] 65%|██████▍ | 
4450/6885 [13:38:51<1:54:21, 2.82s/it] {'loss': 0.5546, 'grad_norm': 1.2951511455390947, 'learning_rate': 3.353049401705022e-06, 'epoch': 0.65} 65%|██████▍ | 4450/6885 [13:38:51<1:54:21, 2.82s/it] 65%|██████▍ | 4451/6885 [13:38:53<1:44:42, 2.58s/it] 65%|██████▍ | 4452/6885 [13:38:56<1:46:53, 2.64s/it] 65%|██████▍ | 4453/6885 [13:38:59<1:53:54, 2.81s/it] 65%|██████▍ | 4454/6885 [13:39:02<1:48:46, 2.68s/it] 65%|██████▍ | 4455/6885 [13:39:04<1:42:47, 2.54s/it] 65%|██████▍ | 4456/6885 [13:39:07<1:43:30, 2.56s/it] 65%|██████▍ | 4457/6885 [13:39:09<1:36:45, 2.39s/it] 65%|██████▍ | 4458/6885 [13:39:11<1:36:16, 2.38s/it] 65%|██████▍ | 4459/6885 [13:39:13<1:35:42, 2.37s/it] 65%|██████▍ | 4460/6885 [13:39:16<1:38:58, 2.45s/it] {'loss': 0.5697, 'grad_norm': 1.2188858191779428, 'learning_rate': 3.329133677558873e-06, 'epoch': 0.65} 65%|██████▍ | 4460/6885 [13:39:16<1:38:58, 2.45s/it] 65%|██████▍ | 4461/6885 [13:39:18<1:40:35, 2.49s/it] 65%|██████▍ | 4462/6885 [13:39:27<2:58:50, 4.43s/it] 65%|██████▍ | 4463/6885 [13:39:31<2:53:14, 4.29s/it] 65%|██████▍ | 4464/6885 [13:39:35<2:38:36, 3.93s/it] 65%|██████▍ | 4465/6885 [13:39:38<2:27:49, 3.66s/it] 65%|██████▍ | 4466/6885 [13:39:40<2:14:49, 3.34s/it] 65%|██████▍ | 4467/6885 [13:39:43<2:03:39, 3.07s/it] 65%|██████▍ | 4468/6885 [13:39:47<2:25:23, 3.61s/it] 65%|██████▍ | 4469/6885 [13:39:49<2:04:29, 3.09s/it] 65%|██████▍ | 4470/6885 [13:39:53<2:06:48, 3.15s/it] {'loss': 0.5901, 'grad_norm': 1.1239635889524127, 'learning_rate': 3.3052609088062767e-06, 'epoch': 0.65} 65%|██████▍ | 4470/6885 [13:39:53<2:06:48, 3.15s/it] 65%|██████▍ | 4471/6885 [13:39:56<2:15:09, 3.36s/it] 65%|██████▍ | 4472/6885 [13:40:00<2:13:59, 3.33s/it] 65%|██████▍ | 4473/6885 [13:40:02<2:03:23, 3.07s/it] 65%|██████▍ | 4474/6885 [13:40:04<1:49:36, 2.73s/it] 65%|██████▍ | 4475/6885 [13:40:09<2:10:08, 3.24s/it] 65%|██████▌ | 4476/6885 [13:40:11<2:01:21, 3.02s/it] 65%|██████▌ | 4477/6885 [13:40:13<1:46:49, 2.66s/it] 65%|██████▌ | 4478/6885 [13:40:17<1:58:15, 2.95s/it] 
65%|██████▌ | 4479/6885 [13:40:19<1:50:15, 2.75s/it] 65%|██████▌ | 4480/6885 [13:40:21<1:46:23, 2.65s/it] {'loss': 0.566, 'grad_norm': 1.0931476283773633, 'learning_rate': 3.281431709179264e-06, 'epoch': 0.65} 65%|██████▌ | 4480/6885 [13:40:21<1:46:23, 2.65s/it] 65%|██████▌ | 4481/6885 [13:40:24<1:43:09, 2.57s/it] 65%|██████▌ | 4482/6885 [13:40:26<1:36:33, 2.41s/it] 65%|██████▌ | 4483/6885 [13:40:30<2:01:49, 3.04s/it] 65%|██████▌ | 4484/6885 [13:40:33<2:02:14, 3.05s/it] 65%|██████▌ | 4485/6885 [13:40:38<2:20:39, 3.52s/it] 65%|██████▌ | 4486/6885 [13:40:41<2:13:47, 3.35s/it] 65%|██████▌ | 4487/6885 [13:40:43<2:02:34, 3.07s/it] 65%|██████▌ | 4488/6885 [13:40:45<1:48:11, 2.71s/it] 65%|██████▌ | 4489/6885 [13:40:47<1:41:57, 2.55s/it] 65%|██████▌ | 4490/6885 [13:40:49<1:35:30, 2.39s/it] {'loss': 0.5761, 'grad_norm': 1.4718901865939953, 'learning_rate': 3.2576466912897674e-06, 'epoch': 0.65} 65%|██████▌ | 4490/6885 [13:40:49<1:35:30, 2.39s/it] 65%|██████▌ | 4491/6885 [13:40:54<1:58:54, 2.98s/it] 65%|██████▌ | 4492/6885 [13:40:56<1:46:41, 2.67s/it] 65%|██████▌ | 4493/6885 [13:40:58<1:44:33, 2.62s/it] 65%|██████▌ | 4494/6885 [13:41:01<1:51:08, 2.79s/it] 65%|██████▌ | 4495/6885 [13:41:07<2:21:26, 3.55s/it] 65%|██████▌ | 4496/6885 [13:41:09<2:06:45, 3.18s/it] 65%|██████▌ | 4497/6885 [13:41:11<1:56:32, 2.93s/it] 65%|██████▌ | 4498/6885 [13:41:14<1:52:27, 2.83s/it] 65%|██████▌ | 4499/6885 [13:41:17<2:00:56, 3.04s/it] 65%|██████▌ | 4500/6885 [13:41:21<2:07:21, 3.20s/it] {'loss': 0.5757, 'grad_norm': 1.2062192465520678, 'learning_rate': 3.2339064666138783e-06, 'epoch': 0.65} 65%|██████▌ | 4500/6885 [13:41:21<2:07:21, 3.20s/it] 65%|██████▌ | 4501/6885 [13:41:24<1:59:47, 3.01s/it] 65%|██████▌ | 4502/6885 [13:41:26<1:50:28, 2.78s/it] 65%|██████▌ | 4503/6885 [13:41:30<2:04:06, 3.13s/it] 65%|██████▌ | 4504/6885 [13:41:33<2:03:21, 3.11s/it] 65%|██████▌ | 4505/6885 [13:41:35<1:51:03, 2.80s/it] 65%|██████▌ | 4506/6885 [13:41:38<1:53:31, 2.86s/it] 65%|██████▌ | 4507/6885 
[13:41:42<2:07:56, 3.23s/it] 65%|██████▌ | 4508/6885 [13:41:44<1:59:31, 3.02s/it] 65%|██████▌ | 4509/6885 [13:41:47<1:58:03, 2.98s/it] 66%|██████▌ | 4510/6885 [13:41:50<1:48:05, 2.73s/it] {'loss': 0.5615, 'grad_norm': 1.2732571104572175, 'learning_rate': 3.2102116454761168e-06, 'epoch': 0.66} 66%|██████▌ | 4510/6885 [13:41:50<1:48:05, 2.73s/it] 66%|██████▌ | 4511/6885 [13:41:53<1:54:10, 2.89s/it] 66%|██████▌ | 4512/6885 [13:41:55<1:43:57, 2.63s/it] 66%|██████▌ | 4513/6885 [13:41:58<1:48:48, 2.75s/it] 66%|██████▌ | 4514/6885 [13:42:02<2:06:15, 3.20s/it] 66%|██████▌ | 4515/6885 [13:42:05<2:06:33, 3.20s/it] 66%|██████▌ | 4516/6885 [13:42:10<2:20:01, 3.55s/it] 66%|██████▌ | 4517/6885 [13:42:15<2:37:35, 3.99s/it] 66%|██████▌ | 4518/6885 [13:42:18<2:25:53, 3.70s/it] 66%|██████▌ | 4519/6885 [13:42:20<2:13:40, 3.39s/it] 66%|██████▌ | 4520/6885 [13:42:23<1:59:19, 3.03s/it] {'loss': 0.5632, 'grad_norm': 1.198522063919598, 'learning_rate': 3.1865628370337575e-06, 'epoch': 0.66} 66%|██████▌ | 4520/6885 [13:42:23<1:59:19, 3.03s/it] 66%|██████▌ | 4521/6885 [13:42:25<1:50:43, 2.81s/it] 66%|██████▌ | 4522/6885 [13:42:27<1:43:05, 2.62s/it] 66%|██████▌ | 4523/6885 [13:42:29<1:38:19, 2.50s/it] 66%|██████▌ | 4524/6885 [13:42:32<1:36:40, 2.46s/it] 66%|██████▌ | 4525/6885 [13:42:33<1:29:30, 2.28s/it] 66%|██████▌ | 4526/6885 [13:42:36<1:33:05, 2.37s/it] 66%|██████▌ | 4527/6885 [13:42:38<1:33:37, 2.38s/it] 66%|██████▌ | 4528/6885 [13:42:41<1:33:24, 2.38s/it] 66%|██████▌ | 4529/6885 [13:42:44<1:39:05, 2.52s/it] 66%|██████▌ | 4530/6885 [13:42:46<1:38:31, 2.51s/it] {'loss': 0.5472, 'grad_norm': 1.208764455797361, 'learning_rate': 3.162960649261152e-06, 'epoch': 0.66} 66%|██████▌ | 4530/6885 [13:42:46<1:38:31, 2.51s/it] 66%|██████▌ | 4531/6885 [13:42:49<1:43:36, 2.64s/it] 66%|██████▌ | 4532/6885 [13:42:52<1:49:11, 2.78s/it] 66%|██████▌ | 4533/6885 [13:42:55<1:48:59, 2.78s/it] 66%|██████▌ | 4534/6885 [13:42:58<1:51:08, 2.84s/it] 66%|██████▌ | 4535/6885 [13:43:01<1:57:41, 3.01s/it] 66%|██████▌ 
| 4536/6885 [13:43:04<1:52:37, 2.88s/it] 66%|██████▌ | 4537/6885 [13:43:08<2:03:46, 3.16s/it] 66%|██████▌ | 4538/6885 [13:43:11<2:00:09, 3.07s/it] 66%|██████▌ | 4539/6885 [13:43:13<1:51:02, 2.84s/it] 66%|██████▌ | 4540/6885 [13:43:17<2:00:20, 3.08s/it] {'loss': 0.5737, 'grad_norm': 1.2300085896818644, 'learning_rate': 3.1394056889341086e-06, 'epoch': 0.66} 66%|██████▌ | 4540/6885 [13:43:17<2:00:20, 3.08s/it] 66%|██████▌ | 4541/6885 [13:43:19<1:48:56, 2.79s/it] 66%|██████▌ | 4542/6885 [13:43:22<1:52:00, 2.87s/it] 66%|██████▌ | 4543/6885 [13:43:24<1:42:03, 2.61s/it] 66%|██████▌ | 4544/6885 [13:43:27<1:51:07, 2.85s/it] 66%|██████▌ | 4545/6885 [13:43:30<1:49:54, 2.82s/it] 66%|██████▌ | 4546/6885 [13:43:33<1:52:05, 2.88s/it] 66%|██████▌ | 4547/6885 [13:43:35<1:48:13, 2.78s/it] 66%|██████▌ | 4548/6885 [13:43:39<1:53:22, 2.91s/it] 66%|██████▌ | 4549/6885 [13:43:42<1:58:52, 3.05s/it] 66%|██████▌ | 4550/6885 [13:43:44<1:47:25, 2.76s/it] {'loss': 0.5467, 'grad_norm': 1.2362227883984134, 'learning_rate': 3.1158985616142944e-06, 'epoch': 0.66} 66%|██████▌ | 4550/6885 [13:43:44<1:47:25, 2.76s/it] 66%|██████▌ | 4551/6885 [13:43:47<1:45:30, 2.71s/it] 66%|██████▌ | 4552/6885 [13:43:49<1:40:00, 2.57s/it] 66%|██████▌ | 4553/6885 [13:43:54<2:05:50, 3.24s/it] 66%|██████▌ | 4554/6885 [13:43:58<2:13:10, 3.43s/it] 66%|██████▌ | 4555/6885 [13:44:00<2:03:38, 3.18s/it] 66%|██████▌ | 4556/6885 [13:44:03<1:57:49, 3.04s/it] 66%|██████▌ | 4557/6885 [13:44:05<1:40:56, 2.60s/it] 66%|██████▌ | 4558/6885 [13:44:06<1:33:00, 2.40s/it] 66%|██████▌ | 4559/6885 [13:44:09<1:40:33, 2.59s/it] 66%|██████▌ | 4560/6885 [13:44:12<1:40:30, 2.59s/it] {'loss': 0.5652, 'grad_norm': 1.2577141886691818, 'learning_rate': 3.092439871633658e-06, 'epoch': 0.66} 66%|██████▌ | 4560/6885 [13:44:12<1:40:30, 2.59s/it] 66%|██████▌ | 4561/6885 [13:44:14<1:34:25, 2.44s/it] 66%|██████▋ | 4562/6885 [13:44:19<2:06:17, 3.26s/it] 66%|██████▋ | 4563/6885 [13:44:22<2:01:28, 3.14s/it] 66%|██████▋ | 4564/6885 [13:44:24<1:43:29, 
2.68s/it] 66%|██████▋ | 4565/6885 [13:44:26<1:43:44, 2.68s/it] 66%|██████▋ | 4566/6885 [13:44:29<1:39:49, 2.58s/it] 66%|██████▋ | 4567/6885 [13:44:33<1:59:56, 3.10s/it] 66%|██████▋ | 4568/6885 [13:44:38<2:14:25, 3.48s/it] 66%|██████▋ | 4569/6885 [13:44:40<2:00:42, 3.13s/it] 66%|██████▋ | 4570/6885 [13:44:43<1:55:30, 2.99s/it] {'loss': 0.564, 'grad_norm': 1.2246719550977323, 'learning_rate': 3.0690302220789036e-06, 'epoch': 0.66} 66%|██████▋ | 4570/6885 [13:44:43<1:55:30, 2.99s/it] 66%|██████▋ | 4571/6885 [13:44:45<1:48:34, 2.82s/it] 66%|██████▋ | 4572/6885 [13:44:52<2:34:33, 4.01s/it] 66%|██████▋ | 4573/6885 [13:44:54<2:13:33, 3.47s/it] 66%|██████▋ | 4574/6885 [13:44:57<2:08:18, 3.33s/it] 66%|██████▋ | 4575/6885 [13:44:59<1:50:11, 2.86s/it] 66%|██████▋ | 4576/6885 [13:45:02<1:58:18, 3.07s/it] 66%|██████▋ | 4577/6885 [13:45:06<2:06:58, 3.30s/it] 66%|██████▋ | 4578/6885 [13:45:15<3:13:01, 5.02s/it] 67%|██████▋ | 4579/6885 [13:45:18<2:45:48, 4.31s/it] 67%|██████▋ | 4580/6885 [13:45:22<2:46:33, 4.34s/it] {'loss': 0.5538, 'grad_norm': 0.952770111510269, 'learning_rate': 3.0456702147759797e-06, 'epoch': 0.67} 67%|██████▋ | 4580/6885 [13:45:22<2:46:33, 4.34s/it] 67%|██████▋ | 4581/6885 [13:45:25<2:24:43, 3.77s/it] 67%|██████▋ | 4582/6885 [13:45:27<2:10:21, 3.40s/it] 67%|██████▋ | 4583/6885 [13:45:30<1:58:39, 3.09s/it] 67%|██████▋ | 4584/6885 [13:45:32<1:55:49, 3.02s/it] 67%|██████▋ | 4585/6885 [13:45:35<1:47:22, 2.80s/it] 67%|██████▋ | 4586/6885 [13:45:37<1:38:39, 2.57s/it] 67%|██████▋ | 4587/6885 [13:45:40<1:45:59, 2.77s/it] 67%|██████▋ | 4588/6885 [13:45:42<1:37:07, 2.54s/it] 67%|██████▋ | 4589/6885 [13:45:44<1:34:02, 2.46s/it] 67%|██████▋ | 4590/6885 [13:45:46<1:27:58, 2.30s/it] {'loss': 0.5624, 'grad_norm': 1.2114290005968387, 'learning_rate': 3.0223604502746097e-06, 'epoch': 0.67} 67%|██████▋ | 4590/6885 [13:45:46<1:27:58, 2.30s/it] 67%|██████▋ | 4591/6885 [13:45:48<1:26:53, 2.27s/it] 67%|██████▋ | 4592/6885 [13:45:51<1:34:12, 2.47s/it] 67%|██████▋ | 4593/6885 
[13:45:53<1:29:17, 2.34s/it] 67%|██████▋ | 4594/6885 [13:45:56<1:34:04, 2.46s/it] 67%|██████▋ | 4595/6885 [13:45:59<1:42:57, 2.70s/it] 67%|██████▋ | 4596/6885 [13:46:02<1:39:40, 2.61s/it] 67%|██████▋ | 4597/6885 [13:46:04<1:35:36, 2.51s/it] 67%|██████▋ | 4598/6885 [13:46:08<1:47:55, 2.83s/it] 67%|██████▋ | 4599/6885 [13:46:10<1:46:32, 2.80s/it] 67%|██████▋ | 4600/6885 [13:46:13<1:50:51, 2.91s/it] {'loss': 0.5581, 'grad_norm': 1.2379634249474247, 'learning_rate': 2.999101527832849e-06, 'epoch': 0.67} 67%|██████▋ | 4600/6885 [13:46:13<1:50:51, 2.91s/it] 67%|██████▋ | 4601/6885 [13:46:17<1:57:37, 3.09s/it] 67%|██████▋ | 4602/6885 [13:46:19<1:43:03, 2.71s/it] 67%|██████▋ | 4603/6885 [13:46:21<1:38:06, 2.58s/it] 67%|██████▋ | 4604/6885 [13:46:23<1:28:21, 2.32s/it] 67%|██████▋ | 4605/6885 [13:46:28<1:57:03, 3.08s/it] 67%|██████▋ | 4606/6885 [13:46:30<1:47:23, 2.83s/it] 67%|██████▋ | 4607/6885 [13:46:33<1:50:50, 2.92s/it] 67%|██████▋ | 4608/6885 [13:46:37<2:05:52, 3.32s/it] 67%|██████▋ | 4609/6885 [13:46:41<2:07:47, 3.37s/it] 67%|██████▋ | 4610/6885 [13:46:43<1:55:02, 3.03s/it] {'loss': 0.5519, 'grad_norm': 1.2432970361649818, 'learning_rate': 2.9758940454016893e-06, 'epoch': 0.67} 67%|██████▋ | 4610/6885 [13:46:43<1:55:02, 3.03s/it] 67%|██████▋ | 4611/6885 [13:46:45<1:46:00, 2.80s/it] 67%|██████▋ | 4612/6885 [13:46:47<1:34:22, 2.49s/it] 67%|██████▋ | 4613/6885 [13:46:50<1:38:04, 2.59s/it] 67%|██████▋ | 4614/6885 [13:46:55<2:04:09, 3.28s/it] 67%|██████▋ | 4615/6885 [13:46:58<2:06:02, 3.33s/it] 67%|██████▋ | 4616/6885 [13:47:01<1:56:54, 3.09s/it] 67%|██████▋ | 4617/6885 [13:47:04<1:57:44, 3.11s/it] 67%|██████▋ | 4618/6885 [13:47:08<2:11:39, 3.48s/it] 67%|██████▋ | 4619/6885 [13:47:11<1:58:31, 3.14s/it] 67%|██████▋ | 4620/6885 [13:47:13<1:52:56, 2.99s/it] {'loss': 0.5512, 'grad_norm': 1.1827840525798392, 'learning_rate': 2.9527385996096702e-06, 'epoch': 0.67} 67%|██████▋ | 4620/6885 [13:47:13<1:52:56, 2.99s/it] 67%|██████▋ | 4621/6885 [13:47:16<1:50:29, 2.93s/it] 
67%|██████▋ | 4622/6885 [13:47:18<1:34:57, 2.52s/it] 67%|██████▋ | 4623/6885 [13:47:21<1:47:41, 2.86s/it] 67%|██████▋ | 4624/6885 [13:47:24<1:46:22, 2.82s/it] 67%|██████▋ | 4625/6885 [13:47:27<1:51:48, 2.97s/it] 67%|██████▋ | 4626/6885 [13:47:29<1:34:27, 2.51s/it] 67%|██████▋ | 4627/6885 [13:47:31<1:32:22, 2.45s/it] 67%|██████▋ | 4628/6885 [13:47:35<1:51:00, 2.95s/it] 67%|██████▋ | 4629/6885 [13:47:38<1:48:35, 2.89s/it] 67%|██████▋ | 4630/6885 [13:47:40<1:42:43, 2.73s/it] {'loss': 0.5615, 'grad_norm': 1.1313263342846276, 'learning_rate': 2.929635785747558e-06, 'epoch': 0.67} 67%|██████▋ | 4630/6885 [13:47:40<1:42:43, 2.73s/it] 67%|██████▋ | 4631/6885 [13:47:42<1:34:40, 2.52s/it] 67%|██████▋ | 4632/6885 [13:47:45<1:39:36, 2.65s/it] 67%|██████▋ | 4633/6885 [13:47:48<1:41:13, 2.70s/it] 67%|██████▋ | 4634/6885 [13:47:51<1:46:02, 2.83s/it] 67%|██████▋ | 4635/6885 [13:47:57<2:15:23, 3.61s/it] 67%|██████▋ | 4636/6885 [13:48:00<2:14:06, 3.58s/it] 67%|██████▋ | 4637/6885 [13:48:03<2:03:14, 3.29s/it] 67%|██████▋ | 4638/6885 [13:48:06<1:59:25, 3.19s/it] 67%|██████▋ | 4639/6885 [13:48:08<1:47:29, 2.87s/it] 67%|██████▋ | 4640/6885 [13:48:10<1:43:27, 2.77s/it] {'loss': 0.5577, 'grad_norm': 1.0718626125088186, 'learning_rate': 2.9065861977530263e-06, 'epoch': 0.67} 67%|██████▋ | 4640/6885 [13:48:10<1:43:27, 2.77s/it] 67%|██████▋ | 4641/6885 [13:48:13<1:44:57, 2.81s/it] 67%|██████▋ | 4642/6885 [13:48:16<1:43:40, 2.77s/it] 67%|██████▋ | 4643/6885 [13:48:19<1:48:07, 2.89s/it] 67%|██████▋ | 4644/6885 [13:48:21<1:40:54, 2.70s/it] 67%|██████▋ | 4645/6885 [13:48:23<1:33:56, 2.52s/it] 67%|██████▋ | 4646/6885 [13:48:26<1:34:55, 2.54s/it] 67%|██████▋ | 4647/6885 [13:48:28<1:33:40, 2.51s/it] 68%|██████▊ | 4648/6885 [13:48:31<1:37:09, 2.61s/it] 68%|██████▊ | 4649/6885 [13:48:34<1:42:50, 2.76s/it] 68%|██████▊ | 4650/6885 [13:48:37<1:35:29, 2.56s/it] {'loss': 0.5543, 'grad_norm': 1.2058366328226908, 'learning_rate': 2.8835904281953984e-06, 'epoch': 0.68} 68%|██████▊ | 4650/6885 
[13:48:37<1:35:29, 2.56s/it] 68%|██████▊ | 4651/6885 [13:48:39<1:36:46, 2.60s/it] 68%|██████▊ | 4652/6885 [13:48:42<1:36:00, 2.58s/it] 68%|██████▊ | 4653/6885 [13:48:46<1:56:59, 3.15s/it] 68%|██████▊ | 4654/6885 [13:48:52<2:25:28, 3.91s/it] 68%|██████▊ | 4655/6885 [13:48:56<2:30:56, 4.06s/it] 68%|██████▊ | 4656/6885 [13:49:01<2:35:47, 4.19s/it] 68%|██████▊ | 4657/6885 [13:49:03<2:12:06, 3.56s/it] 68%|██████▊ | 4658/6885 [13:49:05<1:56:59, 3.15s/it] 68%|██████▊ | 4659/6885 [13:49:09<2:03:43, 3.34s/it] 68%|██████▊ | 4660/6885 [13:49:11<1:51:12, 3.00s/it] {'loss': 0.563, 'grad_norm': 1.2044090066060698, 'learning_rate': 2.8606490682604083e-06, 'epoch': 0.68} 68%|██████▊ | 4660/6885 [13:49:11<1:51:12, 3.00s/it] 68%|██████▊ | 4661/6885 [13:49:14<1:46:16, 2.87s/it] 68%|██████▊ | 4662/6885 [13:49:16<1:43:29, 2.79s/it] 68%|██████▊ | 4663/6885 [13:49:18<1:30:27, 2.44s/it] 68%|██████▊ | 4664/6885 [13:49:20<1:31:49, 2.48s/it] 68%|██████▊ | 4665/6885 [13:49:23<1:34:01, 2.54s/it] 68%|██████▊ | 4666/6885 [13:49:26<1:36:17, 2.60s/it] 68%|██████▊ | 4667/6885 [13:49:28<1:33:27, 2.53s/it] 68%|██████▊ | 4668/6885 [13:49:32<1:50:19, 2.99s/it] 68%|██████▊ | 4669/6885 [13:49:35<1:45:32, 2.86s/it] 68%|██████▊ | 4670/6885 [13:49:37<1:33:03, 2.52s/it] {'loss': 0.5678, 'grad_norm': 1.2440783490748353, 'learning_rate': 2.837762707734999e-06, 'epoch': 0.68} 68%|██████▊ | 4670/6885 [13:49:37<1:33:03, 2.52s/it] 68%|██████▊ | 4671/6885 [13:49:40<1:38:54, 2.68s/it] 68%|██████▊ | 4672/6885 [13:49:43<1:43:38, 2.81s/it] 68%|██████▊ | 4673/6885 [13:49:45<1:43:04, 2.80s/it] 68%|██████▊ | 4674/6885 [13:49:48<1:39:03, 2.69s/it] 68%|██████▊ | 4675/6885 [13:49:50<1:27:20, 2.37s/it] 68%|██████▊ | 4676/6885 [13:49:52<1:28:43, 2.41s/it] 68%|██████▊ | 4677/6885 [13:49:56<1:42:00, 2.77s/it] 68%|██████▊ | 4678/6885 [13:49:57<1:30:14, 2.45s/it] 68%|██████▊ | 4679/6885 [13:50:01<1:48:02, 2.94s/it] 68%|██████▊ | 4680/6885 [13:50:04<1:41:39, 2.77s/it] {'loss': 0.5443, 'grad_norm': 1.1447619754452882, 
'learning_rate': 2.8149319349921678e-06, 'epoch': 0.68} 68%|██████▊ | 4680/6885 [13:50:04<1:41:39, 2.77s/it] 68%|██████▊ | 4681/6885 [13:50:06<1:40:08, 2.73s/it] 68%|██████▊ | 4682/6885 [13:50:09<1:40:42, 2.74s/it] 68%|██████▊ | 4683/6885 [13:50:12<1:42:30, 2.79s/it] 68%|██████▊ | 4684/6885 [13:50:16<1:52:18, 3.06s/it] 68%|██████▊ | 4685/6885 [13:50:19<1:48:41, 2.96s/it] 68%|██████▊ | 4686/6885 [13:50:22<1:52:59, 3.08s/it] 68%|██████▊ | 4687/6885 [13:50:25<1:51:16, 3.04s/it] 68%|██████▊ | 4688/6885 [13:50:27<1:45:11, 2.87s/it] 68%|██████▊ | 4689/6885 [13:50:30<1:37:52, 2.67s/it] 68%|██████▊ | 4690/6885 [13:50:33<1:46:38, 2.91s/it] {'loss': 0.5548, 'grad_norm': 1.0682059420594845, 'learning_rate': 2.7921573369758344e-06, 'epoch': 0.68} 68%|██████▊ | 4690/6885 [13:50:33<1:46:38, 2.91s/it] 68%|██████▊ | 4691/6885 [13:50:35<1:38:15, 2.69s/it] 68%|██████▊ | 4692/6885 [13:50:39<1:50:55, 3.04s/it] 68%|██████▊ | 4693/6885 [13:50:41<1:41:47, 2.79s/it] 68%|██████▊ | 4694/6885 [13:50:44<1:42:28, 2.81s/it] 68%|██████▊ | 4695/6885 [13:50:46<1:33:54, 2.57s/it] 68%|██████▊ | 4696/6885 [13:50:48<1:25:50, 2.35s/it] 68%|██████▊ | 4697/6885 [13:50:50<1:21:59, 2.25s/it] 68%|██████▊ | 4698/6885 [13:50:52<1:21:44, 2.24s/it] 68%|██████▊ | 4699/6885 [13:50:55<1:28:24, 2.43s/it] 68%|██████▊ | 4700/6885 [13:50:59<1:41:26, 2.79s/it] {'loss': 0.557, 'grad_norm': 1.0786981942796325, 'learning_rate': 2.769439499185752e-06, 'epoch': 0.68} 68%|██████▊ | 4700/6885 [13:50:59<1:41:26, 2.79s/it] 68%|██████▊ | 4701/6885 [13:51:02<1:52:25, 3.09s/it] 68%|██████▊ | 4702/6885 [13:51:07<2:09:02, 3.55s/it] 68%|██████▊ | 4703/6885 [13:51:10<2:05:49, 3.46s/it] 68%|██████▊ | 4704/6885 [13:51:19<2:58:49, 4.92s/it] 68%|██████▊ | 4705/6885 [13:51:22<2:39:41, 4.40s/it] 68%|██████▊ | 4706/6885 [13:51:24<2:19:56, 3.85s/it] 68%|██████▊ | 4707/6885 [13:51:28<2:13:06, 3.67s/it] 68%|██████▊ | 4708/6885 [13:51:30<2:00:15, 3.31s/it] 68%|██████▊ | 4709/6885 [13:51:32<1:45:48, 2.92s/it] 68%|██████▊ | 4710/6885 
[13:51:35<1:40:24, 2.77s/it] {'loss': 0.5641, 'grad_norm': 1.1021974391300458, 'learning_rate': 2.7467790056624565e-06, 'epoch': 0.68} 68%|██████▊ | 4710/6885 [13:51:35<1:40:24, 2.77s/it] 68%|██████▊ | 4711/6885 [13:51:38<1:50:58, 3.06s/it] 68%|██████▊ | 4712/6885 [13:51:42<2:02:49, 3.39s/it] 68%|██████▊ | 4713/6885 [13:51:45<1:50:50, 3.06s/it] 68%|██████▊ | 4714/6885 [13:51:48<1:56:33, 3.22s/it] 68%|██████▊ | 4715/6885 [13:51:50<1:43:46, 2.87s/it] 68%|██████▊ | 4716/6885 [13:51:55<2:05:15, 3.46s/it] 69%|██████▊ | 4717/6885 [13:51:59<2:05:16, 3.47s/it] 69%|██████▊ | 4718/6885 [13:52:04<2:27:23, 4.08s/it] 69%|██████▊ | 4719/6885 [13:52:06<2:06:15, 3.50s/it] 69%|██████▊ | 4720/6885 [13:52:09<1:52:03, 3.11s/it] {'loss': 0.5579, 'grad_norm': 1.172642324603278, 'learning_rate': 2.7241764389722536e-06, 'epoch': 0.69} 69%|██████▊ | 4720/6885 [13:52:09<1:52:03, 3.11s/it] 69%|██████▊ | 4721/6885 [13:52:11<1:44:34, 2.90s/it] 69%|██████▊ | 4722/6885 [13:52:14<1:47:46, 2.99s/it] 69%|██████▊ | 4723/6885 [13:52:17<1:49:19, 3.03s/it] 69%|██████▊ | 4724/6885 [13:52:20<1:41:32, 2.82s/it] 69%|██████▊ | 4725/6885 [13:52:22<1:32:15, 2.56s/it] 69%|██████▊ | 4726/6885 [13:52:24<1:27:13, 2.42s/it] 69%|██████▊ | 4727/6885 [13:52:27<1:39:25, 2.76s/it] 69%|██████▊ | 4728/6885 [13:52:30<1:42:15, 2.84s/it] 69%|██████▊ | 4729/6885 [13:52:33<1:43:00, 2.87s/it] 69%|██████▊ | 4730/6885 [13:52:35<1:34:37, 2.63s/it] {'loss': 0.5426, 'grad_norm': 1.1739344769196898, 'learning_rate': 2.7016323801922327e-06, 'epoch': 0.69} 69%|██████▊ | 4730/6885 [13:52:35<1:34:37, 2.63s/it] 69%|██████▊ | 4731/6885 [13:52:38<1:36:32, 2.69s/it] 69%|██████▊ | 4732/6885 [13:52:41<1:36:12, 2.68s/it] 69%|██████▊ | 4733/6885 [13:52:43<1:33:19, 2.60s/it] 69%|██████▉ | 4734/6885 [13:52:46<1:37:42, 2.73s/it] 69%|██████▉ | 4735/6885 [13:52:49<1:35:20, 2.66s/it] 69%|██████▉ | 4736/6885 [13:52:51<1:33:55, 2.62s/it] 69%|██████▉ | 4737/6885 [13:52:53<1:26:28, 2.42s/it] 69%|██████▉ | 4738/6885 [13:53:00<2:17:07, 3.83s/it] 
69%|██████▉ | 4739/6885 [13:53:04<2:17:18, 3.84s/it] 69%|██████▉ | 4740/6885 [13:53:07<2:07:33, 3.57s/it] {'loss': 0.5667, 'grad_norm': 1.0908808031509236, 'learning_rate': 2.679147408895349e-06, 'epoch': 0.69} 69%|██████▉ | 4740/6885 [13:53:07<2:07:33, 3.57s/it] 69%|██████▉ | 4741/6885 [13:53:11<2:08:29, 3.60s/it] 69%|██████▉ | 4742/6885 [13:53:13<1:52:30, 3.15s/it] 69%|██████▉ | 4743/6885 [13:53:16<1:57:02, 3.28s/it] 69%|██████▉ | 4744/6885 [13:53:19<1:44:34, 2.93s/it] 69%|██████▉ | 4745/6885 [13:53:21<1:40:01, 2.80s/it] 69%|██████▉ | 4746/6885 [13:53:24<1:43:32, 2.90s/it] 69%|██████▉ | 4747/6885 [13:53:27<1:43:24, 2.90s/it] 69%|██████▉ | 4748/6885 [13:53:31<1:49:14, 3.07s/it] 69%|██████▉ | 4749/6885 [13:53:33<1:37:18, 2.73s/it] 69%|██████▉ | 4750/6885 [13:53:35<1:33:51, 2.64s/it] {'loss': 0.5639, 'grad_norm': 1.1345661062696517, 'learning_rate': 2.6567221031354907e-06, 'epoch': 0.69} 69%|██████▉ | 4750/6885 [13:53:35<1:33:51, 2.64s/it] 69%|██████▉ | 4751/6885 [13:53:37<1:24:44, 2.38s/it] 69%|██████▉ | 4752/6885 [13:53:39<1:27:44, 2.47s/it] 69%|██████▉ | 4753/6885 [13:53:43<1:42:30, 2.88s/it] 69%|██████▉ | 4754/6885 [13:53:45<1:32:59, 2.62s/it] 69%|██████▉ | 4755/6885 [13:53:48<1:31:14, 2.57s/it] 69%|██████▉ | 4756/6885 [13:53:50<1:33:13, 2.63s/it] 69%|██████▉ | 4757/6885 [13:53:53<1:28:57, 2.51s/it] 69%|██████▉ | 4758/6885 [13:53:56<1:38:19, 2.77s/it] 69%|██████▉ | 4759/6885 [13:53:58<1:31:38, 2.59s/it] 69%|██████▉ | 4760/6885 [13:54:04<2:06:03, 3.56s/it] {'loss': 0.5648, 'grad_norm': 1.0249096917283105, 'learning_rate': 2.634357039432656e-06, 'epoch': 0.69} 69%|██████▉ | 4760/6885 [13:54:04<2:06:03, 3.56s/it] 69%|██████▉ | 4761/6885 [13:54:06<1:52:52, 3.19s/it] 69%|██████▉ | 4762/6885 [13:54:08<1:39:18, 2.81s/it] 69%|██████▉ | 4763/6885 [13:54:11<1:39:46, 2.82s/it] 69%|██████▉ | 4764/6885 [13:54:13<1:33:46, 2.65s/it] 69%|██████▉ | 4765/6885 [13:54:16<1:35:31, 2.70s/it] 69%|██████▉ | 4766/6885 [13:54:19<1:34:32, 2.68s/it] 69%|██████▉ | 4767/6885 
[13:54:23<1:52:04, 3.17s/it] 69%|██████▉ | 4768/6885 [13:54:27<1:56:04, 3.29s/it] 69%|██████▉ | 4769/6885 [13:54:29<1:47:27, 3.05s/it] 69%|██████▉ | 4770/6885 [13:54:32<1:42:34, 2.91s/it] {'loss': 0.5651, 'grad_norm': 1.1583880032183098, 'learning_rate': 2.612052792758095e-06, 'epoch': 0.69} 69%|██████▉ | 4770/6885 [13:54:32<1:42:34, 2.91s/it] 69%|██████▉ | 4771/6885 [13:54:36<1:59:18, 3.39s/it] 69%|██████▉ | 4772/6885 [13:54:39<1:48:36, 3.08s/it] 69%|██████▉ | 4773/6885 [13:54:42<1:49:32, 3.11s/it] 69%|██████▉ | 4774/6885 [13:54:45<1:47:59, 3.07s/it] 69%|██████▉ | 4775/6885 [13:54:48<1:49:48, 3.12s/it] 69%|██████▉ | 4776/6885 [13:54:51<1:45:19, 3.00s/it] 69%|██████▉ | 4777/6885 [13:54:54<1:46:02, 3.02s/it] 69%|██████▉ | 4778/6885 [13:54:57<1:44:52, 2.99s/it] 69%|██████▉ | 4779/6885 [13:55:00<1:42:37, 2.92s/it] 69%|██████▉ | 4780/6885 [13:55:02<1:36:55, 2.76s/it] {'loss': 0.5722, 'grad_norm': 1.069684864764473, 'learning_rate': 2.5898099365195626e-06, 'epoch': 0.69} 69%|██████▉ | 4780/6885 [13:55:02<1:36:55, 2.76s/it] 69%|██████▉ | 4781/6885 [13:55:04<1:25:30, 2.44s/it] 69%|██████▉ | 4782/6885 [13:55:06<1:29:41, 2.56s/it] 69%|██████▉ | 4783/6885 [13:55:11<1:54:38, 3.27s/it] 69%|██████▉ | 4784/6885 [13:55:15<1:52:35, 3.22s/it] 69%|██████▉ | 4785/6885 [13:55:18<1:58:13, 3.38s/it] 70%|██████▉ | 4786/6885 [13:55:21<1:49:46, 3.14s/it] 70%|██████▉ | 4787/6885 [13:55:23<1:39:38, 2.85s/it] 70%|██████▉ | 4788/6885 [13:55:27<1:51:02, 3.18s/it] 70%|██████▉ | 4789/6885 [13:55:30<1:46:43, 3.06s/it] 70%|██████▉ | 4790/6885 [13:55:32<1:42:26, 2.93s/it] {'loss': 0.5664, 'grad_norm': 1.0867414593247826, 'learning_rate': 2.5676290425465496e-06, 'epoch': 0.7} 70%|██████▉ | 4790/6885 [13:55:32<1:42:26, 2.93s/it] 70%|██████▉ | 4791/6885 [13:55:35<1:42:01, 2.92s/it] 70%|██████▉ | 4792/6885 [13:55:41<2:13:02, 3.81s/it] 70%|██████▉ | 4793/6885 [13:55:43<1:56:06, 3.33s/it] 70%|██████▉ | 4794/6885 [13:55:45<1:42:37, 2.94s/it] 70%|██████▉ | 4795/6885 [13:55:47<1:31:31, 2.63s/it] 70%|██████▉ 
| 4796/6885 [13:55:50<1:30:26, 2.60s/it] 70%|██████▉ | 4797/6885 [13:55:53<1:33:58, 2.70s/it] 70%|██████▉ | 4798/6885 [13:55:56<1:34:44, 2.72s/it] 70%|██████▉ | 4799/6885 [13:55:57<1:25:31, 2.46s/it] 70%|██████▉ | 4800/6885 [13:56:00<1:21:51, 2.36s/it] {'loss': 0.5585, 'grad_norm': 1.1375716473128172, 'learning_rate': 2.5455106810755957e-06, 'epoch': 0.7} 70%|██████▉ | 4800/6885 [13:56:00<1:21:51, 2.36s/it] 70%|██████▉ | 4801/6885 [13:56:03<1:37:04, 2.79s/it] 70%|██████▉ | 4802/6885 [13:56:08<1:58:49, 3.42s/it] 70%|██████▉ | 4803/6885 [13:56:12<2:04:55, 3.60s/it] 70%|██████▉ | 4804/6885 [13:56:14<1:49:06, 3.15s/it] 70%|██████▉ | 4805/6885 [13:56:17<1:45:13, 3.04s/it] 70%|██████▉ | 4806/6885 [13:56:19<1:33:24, 2.70s/it] 70%|██████▉ | 4807/6885 [13:56:22<1:32:06, 2.66s/it] 70%|██████▉ | 4808/6885 [13:56:24<1:26:52, 2.51s/it] 70%|██████▉ | 4809/6885 [13:56:26<1:29:00, 2.57s/it] 70%|██████▉ | 4810/6885 [13:56:31<1:46:09, 3.07s/it] {'loss': 0.5722, 'grad_norm': 1.034623153574018, 'learning_rate': 2.5234554207356266e-06, 'epoch': 0.7} 70%|██████▉ | 4810/6885 [13:56:31<1:46:09, 3.07s/it] 70%|██████▉ | 4811/6885 [13:56:33<1:35:28, 2.76s/it] 70%|██████▉ | 4812/6885 [13:56:34<1:21:49, 2.37s/it] 70%|██████▉ | 4813/6885 [13:56:36<1:19:55, 2.31s/it] 70%|██████▉ | 4814/6885 [13:56:40<1:34:50, 2.75s/it] 70%|██████▉ | 4815/6885 [13:56:43<1:39:02, 2.87s/it] 70%|██████▉ | 4816/6885 [13:56:47<1:44:12, 3.02s/it] 70%|██████▉ | 4817/6885 [13:56:51<1:52:46, 3.27s/it] 70%|██████▉ | 4818/6885 [13:56:53<1:46:24, 3.09s/it] 70%|██████▉ | 4819/6885 [13:56:56<1:39:32, 2.89s/it] 70%|███████ | 4820/6885 [13:56:59<1:43:44, 3.01s/it] {'loss': 0.5643, 'grad_norm': 1.0654655922639538, 'learning_rate': 2.5014638285333357e-06, 'epoch': 0.7} 70%|███████ | 4820/6885 [13:56:59<1:43:44, 3.01s/it] 70%|███████ | 4821/6885 [13:57:01<1:29:58, 2.62s/it] 70%|███████ | 4822/6885 [13:57:04<1:40:30, 2.92s/it] 70%|███████ | 4823/6885 [13:57:07<1:42:35, 2.99s/it] 70%|███████ | 4824/6885 [13:57:10<1:41:22, 2.95s/it] 
70%|███████ | 4825/6885 [13:57:13<1:39:13, 2.89s/it] 70%|███████ | 4826/6885 [13:57:15<1:32:57, 2.71s/it] 70%|███████ | 4827/6885 [13:57:18<1:36:37, 2.82s/it] 70%|███████ | 4828/6885 [13:57:21<1:35:15, 2.78s/it] 70%|███████ | 4829/6885 [13:57:24<1:38:00, 2.86s/it] 70%|███████ | 4830/6885 [13:57:28<1:46:12, 3.10s/it] {'loss': 0.5635, 'grad_norm': 1.0988829596394427, 'learning_rate': 2.479536469838606e-06, 'epoch': 0.7} 70%|███████ | 4830/6885 [13:57:28<1:46:12, 3.10s/it] 70%|███████ | 4831/6885 [13:57:29<1:28:43, 2.59s/it] 70%|███████ | 4832/6885 [13:57:31<1:23:08, 2.43s/it] 70%|███████ | 4833/6885 [13:57:35<1:36:30, 2.82s/it] 70%|███████ | 4834/6885 [13:57:38<1:43:29, 3.03s/it] 70%|███████ | 4835/6885 [13:57:42<1:48:28, 3.18s/it] 70%|███████ | 4836/6885 [13:57:45<1:44:54, 3.07s/it] 70%|███████ | 4837/6885 [13:57:49<1:58:51, 3.48s/it] 70%|███████ | 4838/6885 [13:57:52<1:52:20, 3.29s/it] 70%|███████ | 4839/6885 [13:57:56<1:55:33, 3.39s/it] 70%|███████ | 4840/6885 [13:57:58<1:48:47, 3.19s/it] {'loss': 0.55, 'grad_norm': 1.050301540250255, 'learning_rate': 2.4576739083699764e-06, 'epoch': 0.7} 70%|███████ | 4840/6885 [13:57:58<1:48:47, 3.19s/it] 70%|███████ | 4841/6885 [13:58:00<1:34:53, 2.79s/it] 70%|███████ | 4842/6885 [13:58:03<1:31:58, 2.70s/it] 70%|███████ | 4843/6885 [13:58:05<1:29:21, 2.63s/it] 70%|███████ | 4844/6885 [13:58:08<1:32:36, 2.72s/it] 70%|███████ | 4845/6885 [13:58:11<1:31:45, 2.70s/it] 70%|███████ | 4846/6885 [13:58:14<1:36:49, 2.85s/it] 70%|███████ | 4847/6885 [13:58:16<1:26:51, 2.56s/it] 70%|███████ | 4848/6885 [13:58:19<1:29:09, 2.63s/it] 70%|███████ | 4849/6885 [13:58:21<1:24:01, 2.48s/it] 70%|███████ | 4850/6885 [13:58:24<1:32:55, 2.74s/it] {'loss': 0.5686, 'grad_norm': 1.3185971209726384, 'learning_rate': 2.43587670618015e-06, 'epoch': 0.7} 70%|███████ | 4850/6885 [13:58:24<1:32:55, 2.74s/it] 70%|███████ | 4851/6885 [13:58:26<1:22:56, 2.45s/it] 70%|███████ | 4852/6885 [13:58:29<1:26:51, 2.56s/it] 70%|███████ | 4853/6885 [13:58:31<1:22:49, 
2.45s/it] 71%|███████ | 4854/6885 [13:58:33<1:22:40, 2.44s/it] 71%|███████ | 4855/6885 [13:58:36<1:29:16, 2.64s/it] 71%|███████ | 4856/6885 [13:58:39<1:30:49, 2.69s/it] 71%|███████ | 4857/6885 [13:58:42<1:30:59, 2.69s/it] 71%|███████ | 4858/6885 [13:58:44<1:27:05, 2.58s/it] 71%|███████ | 4859/6885 [13:58:47<1:23:39, 2.48s/it] 71%|███████ | 4860/6885 [13:58:51<1:47:53, 3.20s/it] {'loss': 0.5617, 'grad_norm': 1.1036440984293434, 'learning_rate': 2.4141454236415428e-06, 'epoch': 0.71} 71%|███████ | 4860/6885 [13:58:51<1:47:53, 3.20s/it] 71%|███████ | 4861/6885 [13:58:54<1:40:51, 2.99s/it] 71%|███████ | 4862/6885 [13:58:56<1:31:59, 2.73s/it] 71%|███████ | 4863/6885 [13:59:00<1:39:38, 2.96s/it] 71%|███████ | 4864/6885 [13:59:03<1:47:38, 3.20s/it] 71%|███████ | 4865/6885 [13:59:06<1:40:15, 2.98s/it] 71%|███████ | 4866/6885 [13:59:09<1:43:30, 3.08s/it] 71%|███████ | 4867/6885 [13:59:11<1:36:43, 2.88s/it] 71%|███████ | 4868/6885 [13:59:14<1:37:46, 2.91s/it] 71%|███████ | 4869/6885 [13:59:17<1:29:55, 2.68s/it] 71%|███████ | 4870/6885 [13:59:20<1:41:00, 3.01s/it] {'loss': 0.5416, 'grad_norm': 1.0669150287420783, 'learning_rate': 2.392480619431879e-06, 'epoch': 0.71} 71%|███████ | 4870/6885 [13:59:20<1:41:00, 3.01s/it] 71%|███████ | 4871/6885 [13:59:23<1:35:48, 2.85s/it] 71%|███████ | 4872/6885 [13:59:26<1:34:31, 2.82s/it] 71%|███████ | 4873/6885 [13:59:30<1:54:10, 3.40s/it] 71%|███████ | 4874/6885 [13:59:33<1:46:48, 3.19s/it] 71%|███████ | 4875/6885 [13:59:36<1:43:54, 3.10s/it] 71%|███████ | 4876/6885 [13:59:38<1:35:56, 2.87s/it] 71%|███████ | 4877/6885 [13:59:41<1:36:06, 2.87s/it] 71%|███████ | 4878/6885 [13:59:44<1:31:31, 2.74s/it] 71%|███████ | 4879/6885 [13:59:46<1:29:25, 2.67s/it] 71%|███████ | 4880/6885 [13:59:48<1:22:45, 2.48s/it] {'loss': 0.5777, 'grad_norm': 1.0472161733755885, 'learning_rate': 2.3708828505198265e-06, 'epoch': 0.71} 71%|███████ | 4880/6885 [13:59:48<1:22:45, 2.48s/it] 71%|███████ | 4881/6885 [13:59:51<1:28:09, 2.64s/it] 71%|███████ | 4882/6885 
[13:59:54<1:35:25, 2.86s/it] 71%|███████ | 4883/6885 [13:59:58<1:38:29, 2.95s/it] 71%|███████ | 4884/6885 [14:00:00<1:28:36, 2.66s/it] 71%|███████ | 4885/6885 [14:00:03<1:32:31, 2.78s/it] 71%|███████ | 4886/6885 [14:00:06<1:33:10, 2.80s/it] 71%|███████ | 4887/6885 [14:00:08<1:29:14, 2.68s/it] 71%|███████ | 4888/6885 [14:00:10<1:23:52, 2.52s/it] 71%|███████ | 4889/6885 [14:00:12<1:21:24, 2.45s/it] 71%|███████ | 4890/6885 [14:00:14<1:15:46, 2.28s/it] {'loss': 0.5535, 'grad_norm': 1.1252884484776227, 'learning_rate': 2.349352672150681e-06, 'epoch': 0.71} 71%|███████ | 4890/6885 [14:00:14<1:15:46, 2.28s/it] 71%|███████ | 4891/6885 [14:00:16<1:15:12, 2.26s/it] 71%|███████ | 4892/6885 [14:00:19<1:17:19, 2.33s/it] 71%|███████ | 4893/6885 [14:00:21<1:18:06, 2.35s/it] 71%|███████ | 4894/6885 [14:00:25<1:31:04, 2.74s/it] 71%|███████ | 4895/6885 [14:00:28<1:35:07, 2.87s/it] 71%|███████ | 4896/6885 [14:00:31<1:37:53, 2.95s/it] 71%|███████ | 4897/6885 [14:00:34<1:35:32, 2.88s/it] 71%|███████ | 4898/6885 [14:00:37<1:35:21, 2.88s/it] 71%|███████ | 4899/6885 [14:00:40<1:39:54, 3.02s/it] 71%|███████ | 4900/6885 [14:00:43<1:33:35, 2.83s/it] {'loss': 0.5598, 'grad_norm': 1.1423409076437527, 'learning_rate': 2.3278906378320854e-06, 'epoch': 0.71} 71%|███████ | 4900/6885 [14:00:43<1:33:35, 2.83s/it] 71%|███████ | 4901/6885 [14:00:45<1:25:38, 2.59s/it] 71%|███████ | 4902/6885 [14:00:48<1:35:09, 2.88s/it] 71%|███████ | 4903/6885 [14:00:51<1:34:51, 2.87s/it] 71%|███████ | 4904/6885 [14:00:53<1:26:50, 2.63s/it] 71%|███████ | 4905/6885 [14:00:56<1:24:08, 2.55s/it] 71%|███████▏ | 4906/6885 [14:00:59<1:35:18, 2.89s/it] 71%|███████▏ | 4907/6885 [14:01:04<1:50:00, 3.34s/it] 71%|███████▏ | 4908/6885 [14:01:06<1:42:00, 3.10s/it] 71%|███████▏ | 4909/6885 [14:01:10<1:51:02, 3.37s/it] 71%|███████▏ | 4910/6885 [14:01:13<1:50:42, 3.36s/it] {'loss': 0.5551, 'grad_norm': 0.9801237939355479, 'learning_rate': 2.306497299319814e-06, 'epoch': 0.71} 71%|███████▏ | 4910/6885 [14:01:13<1:50:42, 3.36s/it] 
71%|███████▏ | 4911/6885 [14:01:18<2:00:56, 3.68s/it] 71%|███████▏ | 4912/6885 [14:01:21<1:53:15, 3.44s/it] 71%|███████▏ | 4913/6885 [14:01:26<2:13:46, 4.07s/it] 71%|███████▏ | 4914/6885 [14:01:29<1:59:25, 3.64s/it] 71%|███████▏ | 4915/6885 [14:01:32<1:54:33, 3.49s/it] 71%|███████▏ | 4916/6885 [14:01:34<1:41:56, 3.11s/it] 71%|███████▏ | 4917/6885 [14:01:37<1:33:48, 2.86s/it] 71%|███████▏ | 4918/6885 [14:01:39<1:32:31, 2.82s/it] 71%|███████▏ | 4919/6885 [14:01:42<1:34:47, 2.89s/it] 71%|███████▏ | 4920/6885 [14:01:45<1:33:42, 2.86s/it] {'loss': 0.5683, 'grad_norm': 1.0526887175825372, 'learning_rate': 2.285173206603564e-06, 'epoch': 0.71} 71%|███████▏ | 4920/6885 [14:01:45<1:33:42, 2.86s/it] 71%|███████▏ | 4921/6885 [14:01:48<1:36:16, 2.94s/it] 71%|███████▏ | 4922/6885 [14:01:53<1:54:58, 3.51s/it] 72%|███████▏ | 4923/6885 [14:01:55<1:39:26, 3.04s/it] 72%|███████▏ | 4924/6885 [14:01:57<1:32:45, 2.84s/it] 72%|███████▏ | 4925/6885 [14:02:00<1:25:42, 2.62s/it] 72%|███████▏ | 4926/6885 [14:02:03<1:30:29, 2.77s/it] 72%|███████▏ | 4927/6885 [14:02:05<1:24:36, 2.59s/it] 72%|███████▏ | 4928/6885 [14:02:07<1:23:58, 2.57s/it] 72%|███████▏ | 4929/6885 [14:02:10<1:28:21, 2.71s/it] 72%|███████▏ | 4930/6885 [14:02:12<1:18:43, 2.42s/it] {'loss': 0.5581, 'grad_norm': 1.1758853714133906, 'learning_rate': 2.2639189078928453e-06, 'epoch': 0.72} 72%|███████▏ | 4930/6885 [14:02:12<1:18:43, 2.42s/it] 72%|███████▏ | 4931/6885 [14:02:16<1:32:34, 2.84s/it] 72%|███████▏ | 4932/6885 [14:02:19<1:32:36, 2.85s/it] 72%|███████▏ | 4933/6885 [14:02:23<1:47:12, 3.30s/it] 72%|███████▏ | 4934/6885 [14:02:25<1:37:22, 2.99s/it] 72%|███████▏ | 4935/6885 [14:02:27<1:27:08, 2.68s/it] 72%|███████▏ | 4936/6885 [14:02:30<1:28:09, 2.71s/it] 72%|███████▏ | 4937/6885 [14:02:33<1:29:47, 2.77s/it] 72%|███████▏ | 4938/6885 [14:02:35<1:21:37, 2.52s/it] 72%|███████▏ | 4939/6885 [14:02:38<1:21:44, 2.52s/it] 72%|███████▏ | 4940/6885 [14:02:43<1:47:52, 3.33s/it] {'loss': 0.5448, 'grad_norm': 1.107044757903735, 
'learning_rate': 2.242734949602856e-06, 'epoch': 0.72} 72%|███████▏ | 4940/6885 [14:02:43<1:47:52, 3.33s/it] 72%|███████▏ | 4941/6885 [14:02:45<1:39:51, 3.08s/it] 72%|███████▏ | 4942/6885 [14:02:47<1:27:22, 2.70s/it] 72%|███████▏ | 4943/6885 [14:02:51<1:34:45, 2.93s/it] 72%|███████▏ | 4944/6885 [14:02:53<1:25:53, 2.66s/it] 72%|███████▏ | 4945/6885 [14:02:56<1:34:01, 2.91s/it] 72%|███████▏ | 4946/6885 [14:02:59<1:33:03, 2.88s/it] 72%|███████▏ | 4947/6885 [14:03:01<1:26:58, 2.69s/it] 72%|███████▏ | 4948/6885 [14:03:06<1:51:11, 3.44s/it] 72%|███████▏ | 4949/6885 [14:03:12<2:11:53, 4.09s/it] 72%|███████▏ | 4950/6885 [14:03:14<1:51:39, 3.46s/it] {'loss': 0.5531, 'grad_norm': 1.2037164103649114, 'learning_rate': 2.2216218763404647e-06, 'epoch': 0.72} 72%|███████▏ | 4950/6885 [14:03:14<1:51:39, 3.46s/it] 72%|███████▏ | 4951/6885 [14:03:16<1:38:24, 3.05s/it] 72%|███████▏ | 4952/6885 [14:03:19<1:35:54, 2.98s/it] 72%|███████▏ | 4953/6885 [14:03:22<1:35:30, 2.97s/it] 72%|███████▏ | 4954/6885 [14:03:25<1:42:22, 3.18s/it] 72%|███████▏ | 4955/6885 [14:03:28<1:38:04, 3.05s/it] 72%|███████▏ | 4956/6885 [14:03:31<1:31:41, 2.85s/it] 72%|███████▏ | 4957/6885 [14:03:33<1:28:05, 2.74s/it] 72%|███████▏ | 4958/6885 [14:03:35<1:24:26, 2.63s/it] 72%|███████▏ | 4959/6885 [14:03:38<1:26:07, 2.68s/it] 72%|███████▏ | 4960/6885 [14:03:41<1:22:09, 2.56s/it] {'loss': 0.5501, 'grad_norm': 1.0588992084011324, 'learning_rate': 2.200580230890188e-06, 'epoch': 0.72} 72%|███████▏ | 4960/6885 [14:03:41<1:22:09, 2.56s/it] 72%|███████▏ | 4961/6885 [14:03:43<1:18:38, 2.45s/it] 72%|███████▏ | 4962/6885 [14:03:45<1:18:50, 2.46s/it] 72%|███████▏ | 4963/6885 [14:03:47<1:16:01, 2.37s/it] 72%|███████▏ | 4964/6885 [14:03:52<1:37:00, 3.03s/it] 72%|███████▏ | 4965/6885 [14:03:55<1:37:34, 3.05s/it] 72%|███████▏ | 4966/6885 [14:03:58<1:41:08, 3.16s/it] 72%|███████▏ | 4967/6885 [14:04:02<1:40:33, 3.15s/it] 72%|███████▏ | 4968/6885 [14:04:04<1:35:04, 2.98s/it] 72%|███████▏ | 4969/6885 [14:04:07<1:33:31, 2.93s/it] 
72%|███████▏ | 4970/6885 [14:04:09<1:25:13, 2.67s/it] {'loss': 0.5769, 'grad_norm': 1.2543824405997601, 'learning_rate': 2.17961055420024e-06, 'epoch': 0.72} 72%|███████▏ | 4970/6885 [14:04:09<1:25:13, 2.67s/it] 72%|███████▏ | 4971/6885 [14:04:12<1:23:49, 2.63s/it] 72%|███████▏ | 4972/6885 [14:04:14<1:21:33, 2.56s/it] 72%|███████▏ | 4973/6885 [14:04:17<1:23:42, 2.63s/it] 72%|███████▏ | 4974/6885 [14:04:19<1:19:57, 2.51s/it] 72%|███████▏ | 4975/6885 [14:04:21<1:17:43, 2.44s/it] 72%|███████▏ | 4976/6885 [14:04:23<1:14:46, 2.35s/it] 72%|███████▏ | 4977/6885 [14:04:27<1:26:44, 2.73s/it] 72%|███████▏ | 4978/6885 [14:04:30<1:26:09, 2.71s/it] 72%|███████▏ | 4979/6885 [14:04:32<1:25:09, 2.68s/it] 72%|███████▏ | 4980/6885 [14:04:34<1:20:34, 2.54s/it] {'loss': 0.5683, 'grad_norm': 1.1899069770329052, 'learning_rate': 2.1587133853686422e-06, 'epoch': 0.72} 72%|███████▏ | 4980/6885 [14:04:34<1:20:34, 2.54s/it] 72%|███████▏ | 4981/6885 [14:04:37<1:16:45, 2.42s/it] 72%|███████▏ | 4982/6885 [14:04:39<1:16:24, 2.41s/it] 72%|███████▏ | 4983/6885 [14:04:43<1:33:15, 2.94s/it] 72%|███████▏ | 4984/6885 [14:04:46<1:32:43, 2.93s/it] 72%|███████▏ | 4985/6885 [14:04:49<1:33:00, 2.94s/it] 72%|███████▏ | 4986/6885 [14:04:54<1:48:30, 3.43s/it] 72%|███████▏ | 4987/6885 [14:04:55<1:32:09, 2.91s/it] 72%|███████▏ | 4988/6885 [14:04:59<1:42:25, 3.24s/it] 72%|███████▏ | 4989/6885 [14:05:01<1:28:39, 2.81s/it] 72%|███████▏ | 4990/6885 [14:05:03<1:18:18, 2.48s/it] {'loss': 0.5648, 'grad_norm': 1.144536370052011, 'learning_rate': 2.137889261629334e-06, 'epoch': 0.72} 72%|███████▏ | 4990/6885 [14:05:03<1:18:18, 2.48s/it] 72%|███████▏ | 4991/6885 [14:05:05<1:17:46, 2.46s/it] 73%|███████▎ | 4992/6885 [14:05:09<1:28:42, 2.81s/it] 73%|███████▎ | 4993/6885 [14:05:11<1:24:10, 2.67s/it] 73%|███████▎ | 4994/6885 [14:05:14<1:26:49, 2.76s/it] 73%|███████▎ | 4995/6885 [14:05:17<1:30:27, 2.87s/it] 73%|███████▎ | 4996/6885 [14:05:20<1:30:25, 2.87s/it] 73%|███████▎ | 4997/6885 [14:05:24<1:37:59, 3.11s/it] 
73%|███████▎ | 4998/6885 [14:05:28<1:51:33, 3.55s/it] 73%|███████▎ | 4999/6885 [14:05:31<1:42:23, 3.26s/it] 73%|███████▎ | 5000/6885 [14:05:33<1:31:24, 2.91s/it] {'loss': 0.5646, 'grad_norm': 1.1936078152653293, 'learning_rate': 2.1171387183383936e-06, 'epoch': 0.73} 73%|███████▎ | 5000/6885 [14:05:33<1:31:24, 2.91s/it] 73%|███████▎ | 5001/6885 [14:05:37<1:41:25, 3.23s/it] 73%|███████▎ | 5002/6885 [14:05:39<1:30:58, 2.90s/it] 73%|███████▎ | 5003/6885 [14:05:41<1:22:19, 2.62s/it] 73%|███████▎ | 5004/6885 [14:05:43<1:18:11, 2.49s/it] 73%|███████▎ | 5005/6885 [14:05:47<1:27:32, 2.79s/it] 73%|███████▎ | 5006/6885 [14:05:50<1:25:56, 2.74s/it] 73%|███████▎ | 5007/6885 [14:05:52<1:22:11, 2.63s/it] 73%|███████▎ | 5008/6885 [14:05:55<1:22:49, 2.65s/it] 73%|███████▎ | 5009/6885 [14:05:57<1:16:25, 2.44s/it] 73%|███████▎ | 5010/6885 [14:05:59<1:15:27, 2.41s/it] {'loss': 0.5682, 'grad_norm': 1.26324013915445, 'learning_rate': 2.096462288960251e-06, 'epoch': 0.73} 73%|███████▎ | 5010/6885 [14:05:59<1:15:27, 2.41s/it] 73%|███████▎ | 5011/6885 [14:06:01<1:14:38, 2.39s/it] 73%|███████▎ | 5012/6885 [14:06:04<1:18:01, 2.50s/it] 73%|███████▎ | 5013/6885 [14:06:07<1:18:49, 2.53s/it] 73%|███████▎ | 5014/6885 [14:06:09<1:17:09, 2.47s/it] 73%|███████▎ | 5015/6885 [14:06:14<1:38:19, 3.16s/it] 73%|███████▎ | 5016/6885 [14:06:18<1:54:08, 3.66s/it] 73%|███████▎ | 5017/6885 [14:06:21<1:42:11, 3.28s/it] 73%|███████▎ | 5018/6885 [14:06:23<1:32:58, 2.99s/it] 73%|███████▎ | 5019/6885 [14:06:27<1:40:38, 3.24s/it] 73%|███████▎ | 5020/6885 [14:06:30<1:35:00, 3.06s/it] {'loss': 0.5571, 'grad_norm': 1.1381437228179463, 'learning_rate': 2.0758605050539836e-06, 'epoch': 0.73} 73%|███████▎ | 5020/6885 [14:06:30<1:35:00, 3.06s/it] 73%|███████▎ | 5021/6885 [14:06:33<1:38:49, 3.18s/it] 73%|███████▎ | 5022/6885 [14:06:35<1:27:21, 2.81s/it] 73%|███████▎ | 5023/6885 [14:06:38<1:27:07, 2.81s/it] 73%|███████▎ | 5024/6885 [14:06:40<1:23:07, 2.68s/it] 73%|███████▎ | 5025/6885 [14:06:43<1:20:02, 2.58s/it] 
73%|███████▎ | 5026/6885 [14:06:45<1:20:36, 2.60s/it] 73%|███████▎ | 5027/6885 [14:06:47<1:16:55, 2.48s/it] 73%|███████▎ | 5028/6885 [14:06:53<1:42:53, 3.32s/it] 73%|███████▎ | 5029/6885 [14:06:55<1:30:45, 2.93s/it] 73%|███████▎ | 5030/6885 [14:06:57<1:25:42, 2.77s/it] {'loss': 0.5716, 'grad_norm': 1.3500933515295954, 'learning_rate': 2.0553338962596492e-06, 'epoch': 0.73} 73%|███████▎ | 5030/6885 [14:06:57<1:25:42, 2.77s/it] 73%|███████▎ | 5031/6885 [14:07:00<1:23:31, 2.70s/it] 73%|███████▎ | 5032/6885 [14:07:03<1:25:10, 2.76s/it] 73%|███████▎ | 5033/6885 [14:07:05<1:18:48, 2.55s/it] 73%|███████▎ | 5034/6885 [14:07:07<1:19:07, 2.56s/it] 73%|███████▎ | 5035/6885 [14:07:11<1:32:58, 3.02s/it] 73%|███████▎ | 5036/6885 [14:07:15<1:35:18, 3.09s/it] 73%|███████▎ | 5037/6885 [14:07:19<1:43:37, 3.36s/it] 73%|███████▎ | 5038/6885 [14:07:22<1:47:44, 3.50s/it] 73%|███████▎ | 5039/6885 [14:07:25<1:37:16, 3.16s/it] 73%|███████▎ | 5040/6885 [14:07:27<1:28:25, 2.88s/it] {'loss': 0.5626, 'grad_norm': 1.0940717331908218, 'learning_rate': 2.03488299028467e-06, 'epoch': 0.73} 73%|███████▎ | 5040/6885 [14:07:27<1:28:25, 2.88s/it] 73%|███████▎ | 5041/6885 [14:07:30<1:25:08, 2.77s/it] 73%|███████▎ | 5042/6885 [14:07:33<1:34:19, 3.07s/it] 73%|███████▎ | 5043/6885 [14:07:38<1:45:32, 3.44s/it] 73%|███████▎ | 5044/6885 [14:07:40<1:33:08, 3.04s/it] 73%|███████▎ | 5045/6885 [14:07:45<1:52:05, 3.66s/it] 73%|███████▎ | 5046/6885 [14:07:48<1:44:05, 3.40s/it] 73%|███████▎ | 5047/6885 [14:07:51<1:41:57, 3.33s/it] 73%|███████▎ | 5048/6885 [14:07:54<1:43:21, 3.38s/it] 73%|███████▎ | 5049/6885 [14:07:56<1:30:56, 2.97s/it] 73%|███████▎ | 5050/6885 [14:08:00<1:33:58, 3.07s/it] {'loss': 0.5625, 'grad_norm': 1.1116999445105729, 'learning_rate': 2.0145083128902647e-06, 'epoch': 0.73} 73%|███████▎ | 5050/6885 [14:08:00<1:33:58, 3.07s/it] 73%|███████▎ | 5051/6885 [14:08:02<1:29:37, 2.93s/it] 73%|███████▎ | 5052/6885 [14:08:07<1:45:25, 3.45s/it] 73%|███████▎ | 5053/6885 [14:08:12<1:58:33, 3.88s/it] 
73%|███████▎ | 5054/6885 [14:08:13<1:38:23, 3.22s/it] 73%|███████▎ | 5055/6885 [14:08:16<1:31:32, 3.00s/it] 73%|███████▎ | 5056/6885 [14:08:18<1:21:54, 2.69s/it] 73%|███████▎ | 5057/6885 [14:08:21<1:24:04, 2.76s/it] 73%|███████▎ | 5058/6885 [14:08:23<1:21:09, 2.67s/it] 73%|███████▎ | 5059/6885 [14:08:25<1:14:29, 2.45s/it] 73%|███████▎ | 5060/6885 [14:08:28<1:14:38, 2.45s/it] {'loss': 0.5601, 'grad_norm': 1.144025480175903, 'learning_rate': 1.9942103878779335e-06, 'epoch': 0.73} 73%|███████▎ | 5060/6885 [14:08:28<1:14:38, 2.45s/it] 74%|███████▎ | 5061/6885 [14:08:30<1:13:09, 2.41s/it] 74%|███████▎ | 5062/6885 [14:08:33<1:17:26, 2.55s/it] 74%|███████▎ | 5063/6885 [14:08:35<1:15:03, 2.47s/it] 74%|███████▎ | 5064/6885 [14:08:40<1:35:25, 3.14s/it] 74%|███████▎ | 5065/6885 [14:08:42<1:26:23, 2.85s/it] 74%|███████▎ | 5066/6885 [14:08:45<1:26:16, 2.85s/it] 74%|███████▎ | 5067/6885 [14:08:48<1:25:42, 2.83s/it] 74%|███████▎ | 5068/6885 [14:08:53<1:50:47, 3.66s/it] 74%|███████▎ | 5069/6885 [14:08:57<1:48:17, 3.58s/it] 74%|███████▎ | 5070/6885 [14:08:59<1:34:50, 3.14s/it] {'loss': 0.5523, 'grad_norm': 1.0557283567612936, 'learning_rate': 1.9739897370759886e-06, 'epoch': 0.74} 74%|███████▎ | 5070/6885 [14:08:59<1:34:50, 3.14s/it] 74%|███████▎ | 5071/6885 [14:09:01<1:30:42, 3.00s/it] 74%|███████▎ | 5072/6885 [14:09:04<1:27:30, 2.90s/it] 74%|███████▎ | 5073/6885 [14:09:06<1:18:56, 2.61s/it] 74%|███████▎ | 5074/6885 [14:09:08<1:09:15, 2.29s/it] 74%|███████▎ | 5075/6885 [14:09:11<1:19:00, 2.62s/it] 74%|███████▎ | 5076/6885 [14:09:16<1:44:13, 3.46s/it] 74%|███████▎ | 5077/6885 [14:09:20<1:41:54, 3.38s/it] 74%|███████▍ | 5078/6885 [14:09:23<1:45:29, 3.50s/it] 74%|███████▍ | 5079/6885 [14:09:25<1:31:25, 3.04s/it] 74%|███████▍ | 5080/6885 [14:09:27<1:23:28, 2.77s/it] {'loss': 0.5521, 'grad_norm': 1.243995372081041, 'learning_rate': 1.9538468803261514e-06, 'epoch': 0.74} 74%|███████▍ | 5080/6885 [14:09:27<1:23:28, 2.77s/it] 74%|███████▍ | 5081/6885 [14:09:30<1:21:39, 2.72s/it] 
74%|███████▍ | 5082/6885 [14:09:32<1:19:16, 2.64s/it] 74%|███████▍ | 5083/6885 [14:09:35<1:15:35, 2.52s/it] 74%|███████▍ | 5084/6885 [14:09:38<1:18:42, 2.62s/it] 74%|███████▍ | 5085/6885 [14:09:41<1:26:06, 2.87s/it] 74%|███████▍ | 5086/6885 [14:09:43<1:20:50, 2.70s/it] 74%|███████▍ | 5087/6885 [14:09:47<1:30:10, 3.01s/it] 74%|███████▍ | 5088/6885 [14:09:49<1:21:44, 2.73s/it] 74%|███████▍ | 5089/6885 [14:09:51<1:13:05, 2.44s/it] 74%|███████▍ | 5090/6885 [14:09:54<1:16:59, 2.57s/it] {'loss': 0.5615, 'grad_norm': 1.1122614530495916, 'learning_rate': 1.9337823354701617e-06, 'epoch': 0.74} 74%|███████▍ | 5090/6885 [14:09:54<1:16:59, 2.57s/it] 74%|███████▍ | 5091/6885 [14:09:55<1:09:10, 2.31s/it] 74%|███████▍ | 5092/6885 [14:09:58<1:10:01, 2.34s/it] 74%|███████▍ | 5093/6885 [14:10:01<1:15:00, 2.51s/it] 74%|███████▍ | 5094/6885 [14:10:02<1:06:54, 2.24s/it] 74%|███████▍ | 5095/6885 [14:10:05<1:09:43, 2.34s/it] 74%|███████▍ | 5096/6885 [14:10:09<1:20:45, 2.71s/it] 74%|███████▍ | 5097/6885 [14:10:11<1:16:52, 2.58s/it] 74%|███████▍ | 5098/6885 [14:10:13<1:16:16, 2.56s/it] 74%|███████▍ | 5099/6885 [14:10:16<1:13:09, 2.46s/it] 74%|███████▍ | 5100/6885 [14:10:19<1:22:42, 2.78s/it] {'loss': 0.5514, 'grad_norm': 1.012804702506735, 'learning_rate': 1.913796618336499e-06, 'epoch': 0.74} 74%|███████▍ | 5100/6885 [14:10:19<1:22:42, 2.78s/it] 74%|███████▍ | 5101/6885 [14:10:22<1:20:36, 2.71s/it] 74%|███████▍ | 5102/6885 [14:10:25<1:25:56, 2.89s/it] 74%|███████▍ | 5103/6885 [14:10:28<1:25:40, 2.88s/it] 74%|███████▍ | 5104/6885 [14:10:30<1:23:21, 2.81s/it] 74%|███████▍ | 5105/6885 [14:10:36<1:45:11, 3.55s/it] 74%|███████▍ | 5106/6885 [14:10:38<1:29:55, 3.03s/it] 74%|███████▍ | 5107/6885 [14:10:44<2:02:53, 4.15s/it] 74%|███████▍ | 5108/6885 [14:10:48<1:57:51, 3.98s/it] 74%|███████▍ | 5109/6885 [14:10:50<1:41:40, 3.43s/it] 74%|███████▍ | 5110/6885 [14:10:53<1:39:13, 3.35s/it] {'loss': 0.5595, 'grad_norm': 1.1487569184157758, 'learning_rate': 1.8938902427270905e-06, 'epoch': 0.74} 
74%|███████▍ | 5110/6885 [14:10:53<1:39:13, 3.35s/it] 74%|███████▍ | 5111/6885 [14:10:55<1:26:37, 2.93s/it] 74%|███████▍ | 5112/6885 [14:10:57<1:17:56, 2.64s/it] 74%|███████▍ | 5113/6885 [14:10:59<1:14:10, 2.51s/it] 74%|███████▍ | 5114/6885 [14:11:02<1:15:38, 2.56s/it] 74%|███████▍ | 5115/6885 [14:11:04<1:14:20, 2.52s/it] 74%|███████▍ | 5116/6885 [14:11:07<1:15:13, 2.55s/it] 74%|███████▍ | 5117/6885 [14:11:10<1:17:46, 2.64s/it] 74%|███████▍ | 5118/6885 [14:11:13<1:22:50, 2.81s/it] 74%|███████▍ | 5119/6885 [14:11:16<1:20:48, 2.75s/it] 74%|███████▍ | 5120/6885 [14:11:18<1:15:04, 2.55s/it] {'loss': 0.5645, 'grad_norm': 1.222308594990331, 'learning_rate': 1.8740637204041195e-06, 'epoch': 0.74} 74%|███████▍ | 5120/6885 [14:11:18<1:15:04, 2.55s/it] 74%|███████▍ | 5121/6885 [14:11:22<1:29:27, 3.04s/it] 74%|███████▍ | 5122/6885 [14:11:24<1:24:07, 2.86s/it] 74%|███████▍ | 5123/6885 [14:11:27<1:17:06, 2.63s/it] 74%|███████▍ | 5124/6885 [14:11:28<1:11:24, 2.43s/it] 74%|███████▍ | 5125/6885 [14:11:31<1:11:59, 2.45s/it] 74%|███████▍ | 5126/6885 [14:11:34<1:19:17, 2.70s/it] 74%|███████▍ | 5127/6885 [14:11:38<1:27:50, 3.00s/it] 74%|███████▍ | 5128/6885 [14:11:42<1:34:13, 3.22s/it] 74%|███████▍ | 5129/6885 [14:11:44<1:27:37, 2.99s/it] 75%|███████▍ | 5130/6885 [14:11:48<1:31:28, 3.13s/it] {'loss': 0.5607, 'grad_norm': 1.1354476091482255, 'learning_rate': 1.8543175610768715e-06, 'epoch': 0.75} 75%|███████▍ | 5130/6885 [14:11:48<1:31:28, 3.13s/it] 75%|███████▍ | 5131/6885 [14:11:53<1:48:07, 3.70s/it] 75%|███████▍ | 5132/6885 [14:11:55<1:33:13, 3.19s/it] 75%|███████▍ | 5133/6885 [14:12:00<1:50:02, 3.77s/it] 75%|███████▍ | 5134/6885 [14:12:02<1:33:46, 3.21s/it] 75%|███████▍ | 5135/6885 [14:12:09<2:07:33, 4.37s/it] 75%|███████▍ | 5136/6885 [14:12:12<1:57:37, 4.04s/it] 75%|███████▍ | 5137/6885 [14:12:15<1:52:32, 3.86s/it] 75%|███████▍ | 5138/6885 [14:12:21<2:04:04, 4.26s/it] 75%|███████▍ | 5139/6885 [14:12:25<2:01:04, 4.16s/it] 75%|███████▍ | 5140/6885 [14:12:27<1:45:21, 3.62s/it] 
{'loss': 0.542, 'grad_norm': 1.2205544178436005, 'learning_rate': 1.83465227238861e-06, 'epoch': 0.75} 75%|███████▍ | 5140/6885 [14:12:27<1:45:21, 3.62s/it] 75%|███████▍ | 5141/6885 [14:12:29<1:32:11, 3.17s/it] 75%|███████▍ | 5142/6885 [14:12:33<1:36:06, 3.31s/it] 75%|███████▍ | 5143/6885 [14:12:36<1:32:41, 3.19s/it] 75%|███████▍ | 5144/6885 [14:12:38<1:29:35, 3.09s/it] 75%|███████▍ | 5145/6885 [14:12:42<1:35:10, 3.28s/it] 75%|███████▍ | 5146/6885 [14:12:44<1:24:17, 2.91s/it] 75%|███████▍ | 5147/6885 [14:12:47<1:24:07, 2.90s/it] 75%|███████▍ | 5148/6885 [14:12:51<1:30:23, 3.12s/it] 75%|███████▍ | 5149/6885 [14:12:54<1:34:33, 3.27s/it] 75%|███████▍ | 5150/6885 [14:12:57<1:29:50, 3.11s/it] {'loss': 0.5606, 'grad_norm': 1.2462160753237452, 'learning_rate': 1.8150683599035517e-06, 'epoch': 0.75} 75%|███████▍ | 5150/6885 [14:12:57<1:29:50, 3.11s/it] 75%|███████▍ | 5151/6885 [14:13:00<1:30:06, 3.12s/it] 75%|███████▍ | 5152/6885 [14:13:02<1:17:45, 2.69s/it] 75%|███████▍ | 5153/6885 [14:13:04<1:15:40, 2.62s/it] 75%|███████▍ | 5154/6885 [14:13:07<1:19:09, 2.74s/it] 75%|███████▍ | 5155/6885 [14:13:10<1:17:57, 2.70s/it] 75%|███████▍ | 5156/6885 [14:13:13<1:19:48, 2.77s/it] 75%|███████▍ | 5157/6885 [14:13:16<1:17:58, 2.71s/it] 75%|███████▍ | 5158/6885 [14:13:19<1:21:00, 2.81s/it] 75%|███████▍ | 5159/6885 [14:13:25<1:50:52, 3.85s/it] 75%|███████▍ | 5160/6885 [14:13:29<1:55:34, 4.02s/it] {'loss': 0.5689, 'grad_norm': 1.1396860492016365, 'learning_rate': 1.7955663270938501e-06, 'epoch': 0.75} 75%|███████▍ | 5160/6885 [14:13:29<1:55:34, 4.02s/it] 75%|███████▍ | 5161/6885 [14:13:32<1:43:28, 3.60s/it] 75%|███████▍ | 5162/6885 [14:13:34<1:34:03, 3.28s/it] 75%|███████▍ | 5163/6885 [14:13:38<1:37:49, 3.41s/it] 75%|███████▌ | 5164/6885 [14:13:41<1:31:50, 3.20s/it] 75%|███████▌ | 5165/6885 [14:13:44<1:28:53, 3.10s/it] 75%|███████▌ | 5166/6885 [14:13:47<1:30:52, 3.17s/it] 75%|███████▌ | 5167/6885 [14:13:52<1:43:33, 3.62s/it] 75%|███████▌ | 5168/6885 [14:13:54<1:31:29, 3.20s/it] 
75%|███████▌ | 5169/6885 [14:13:57<1:27:04, 3.04s/it] 75%|███████▌ | 5170/6885 [14:13:59<1:24:16, 2.95s/it] {'loss': 0.5625, 'grad_norm': 1.1228524828818305, 'learning_rate': 1.7761466753266598e-06, 'epoch': 0.75} 75%|███████▌ | 5170/6885 [14:13:59<1:24:16, 2.95s/it] 75%|███████▌ | 5171/6885 [14:14:04<1:37:37, 3.42s/it] 75%|███████▌ | 5172/6885 [14:14:06<1:24:49, 2.97s/it] 75%|███████▌ | 5173/6885 [14:14:10<1:37:33, 3.42s/it] 75%|███████▌ | 5174/6885 [14:14:14<1:37:39, 3.42s/it] 75%|███████▌ | 5175/6885 [14:14:16<1:26:33, 3.04s/it] 75%|███████▌ | 5176/6885 [14:14:20<1:35:46, 3.36s/it] 75%|███████▌ | 5177/6885 [14:14:23<1:29:30, 3.14s/it] 75%|███████▌ | 5178/6885 [14:14:25<1:24:45, 2.98s/it] 75%|███████▌ | 5179/6885 [14:14:28<1:21:06, 2.85s/it] 75%|███████▌ | 5180/6885 [14:14:32<1:31:30, 3.22s/it] {'loss': 0.5724, 'grad_norm': 1.1360291736903685, 'learning_rate': 1.7568099038512466e-06, 'epoch': 0.75} 75%|███████▌ | 5180/6885 [14:14:32<1:31:30, 3.22s/it] 75%|███████▌ | 5181/6885 [14:14:34<1:19:49, 2.81s/it] 75%|███████▌ | 5182/6885 [14:14:37<1:21:51, 2.88s/it] 75%|███████▌ | 5183/6885 [14:14:39<1:17:10, 2.72s/it] 75%|███████▌ | 5184/6885 [14:14:41<1:07:43, 2.39s/it] 75%|███████▌ | 5185/6885 [14:14:43<1:10:23, 2.48s/it] 75%|███████▌ | 5186/6885 [14:14:48<1:28:00, 3.11s/it] 75%|███████▌ | 5187/6885 [14:14:51<1:24:36, 2.99s/it] 75%|███████▌ | 5188/6885 [14:14:55<1:31:55, 3.25s/it] 75%|███████▌ | 5189/6885 [14:14:57<1:21:34, 2.89s/it] 75%|███████▌ | 5190/6885 [14:15:00<1:28:14, 3.12s/it] {'loss': 0.5653, 'grad_norm': 1.226701284666325, 'learning_rate': 1.7375565097861518e-06, 'epoch': 0.75} 75%|███████▌ | 5190/6885 [14:15:00<1:28:14, 3.12s/it] 75%|███████▌ | 5191/6885 [14:15:04<1:30:09, 3.19s/it] 75%|███████▌ | 5192/6885 [14:15:06<1:27:03, 3.09s/it] 75%|███████▌ | 5193/6885 [14:15:09<1:24:03, 2.98s/it] 75%|███████▌ | 5194/6885 [14:15:11<1:17:07, 2.74s/it] 75%|███████▌ | 5195/6885 [14:15:14<1:14:22, 2.64s/it] 75%|███████▌ | 5196/6885 [14:15:16<1:13:13, 2.60s/it] 
75%|███████▌ | 5197/6885 [14:15:20<1:20:02, 2.85s/it] 75%|███████▌ | 5198/6885 [14:15:23<1:23:27, 2.97s/it] 76%|███████▌ | 5199/6885 [14:15:25<1:14:26, 2.65s/it] 76%|███████▌ | 5200/6885 [14:15:27<1:13:42, 2.62s/it] {'loss': 0.5681, 'grad_norm': 1.1971595467490777, 'learning_rate': 1.7183869881064125e-06, 'epoch': 0.76} 76%|███████▌ | 5200/6885 [14:15:27<1:13:42, 2.62s/it] 76%|███████▌ | 5201/6885 [14:15:30<1:14:09, 2.64s/it] 76%|███████▌ | 5202/6885 [14:15:33<1:15:24, 2.69s/it] 76%|███████▌ | 5203/6885 [14:15:36<1:22:57, 2.96s/it] 76%|███████▌ | 5204/6885 [14:15:39<1:15:58, 2.71s/it] 76%|███████▌ | 5205/6885 [14:15:41<1:14:24, 2.66s/it] 76%|███████▌ | 5206/6885 [14:15:45<1:22:44, 2.96s/it] 76%|███████▌ | 5207/6885 [14:15:49<1:37:18, 3.48s/it] 76%|███████▌ | 5208/6885 [14:15:52<1:28:35, 3.17s/it] 76%|███████▌ | 5209/6885 [14:15:54<1:20:07, 2.87s/it] 76%|███████▌ | 5210/6885 [14:15:59<1:34:49, 3.40s/it] {'loss': 0.5497, 'grad_norm': 1.003433379963408, 'learning_rate': 1.6993018316308351e-06, 'epoch': 0.76} 76%|███████▌ | 5210/6885 [14:15:59<1:34:49, 3.40s/it] 76%|███████▌ | 5211/6885 [14:16:03<1:40:24, 3.60s/it] 76%|███████▌ | 5212/6885 [14:16:06<1:37:31, 3.50s/it] 76%|███████▌ | 5213/6885 [14:16:08<1:22:25, 2.96s/it] 76%|███████▌ | 5214/6885 [14:16:10<1:19:39, 2.86s/it] 76%|███████▌ | 5215/6885 [14:16:14<1:23:32, 3.00s/it] 76%|███████▌ | 5216/6885 [14:16:16<1:21:21, 2.92s/it] 76%|███████▌ | 5217/6885 [14:16:19<1:20:17, 2.89s/it] 76%|███████▌ | 5218/6885 [14:16:22<1:18:16, 2.82s/it] 76%|███████▌ | 5219/6885 [14:16:24<1:12:55, 2.63s/it] 76%|███████▌ | 5220/6885 [14:16:29<1:29:59, 3.24s/it] {'loss': 0.5663, 'grad_norm': 1.0677706687056256, 'learning_rate': 1.6803015310093286e-06, 'epoch': 0.76} 76%|███████▌ | 5220/6885 [14:16:29<1:29:59, 3.24s/it] 76%|███████▌ | 5221/6885 [14:16:32<1:31:01, 3.28s/it] 76%|███████▌ | 5222/6885 [14:16:36<1:36:33, 3.48s/it] 76%|███████▌ | 5223/6885 [14:16:40<1:40:45, 3.64s/it] 76%|███████▌ | 5224/6885 [14:16:42<1:26:23, 3.12s/it] 
76%|███████▌ | 5225/6885 [14:16:45<1:21:39, 2.95s/it] 76%|███████▌ | 5226/6885 [14:16:46<1:12:06, 2.61s/it] 76%|███████▌ | 5227/6885 [14:16:50<1:19:30, 2.88s/it] 76%|███████▌ | 5228/6885 [14:16:53<1:19:59, 2.90s/it] 76%|███████▌ | 5229/6885 [14:16:55<1:17:23, 2.80s/it] 76%|███████▌ | 5230/6885 [14:16:58<1:16:56, 2.79s/it] {'loss': 0.5566, 'grad_norm': 1.1960572257973088, 'learning_rate': 1.6613865747102876e-06, 'epoch': 0.76} 76%|███████▌ | 5230/6885 [14:16:58<1:16:56, 2.79s/it] 76%|███████▌ | 5231/6885 [14:17:01<1:18:07, 2.83s/it] 76%|███████▌ | 5232/6885 [14:17:05<1:25:10, 3.09s/it] 76%|███████▌ | 5233/6885 [14:17:08<1:23:55, 3.05s/it] 76%|███████▌ | 5234/6885 [14:17:11<1:21:58, 2.98s/it] 76%|███████▌ | 5235/6885 [14:17:15<1:29:57, 3.27s/it] 76%|███████▌ | 5236/6885 [14:17:18<1:34:41, 3.45s/it] 76%|███████▌ | 5237/6885 [14:17:21<1:29:59, 3.28s/it] 76%|███████▌ | 5238/6885 [14:17:24<1:26:29, 3.15s/it] 76%|███████▌ | 5239/6885 [14:17:26<1:19:44, 2.91s/it] 76%|███████▌ | 5240/6885 [14:17:31<1:31:36, 3.34s/it] {'loss': 0.5474, 'grad_norm': 1.1110041512712467, 'learning_rate': 1.6425574490080355e-06, 'epoch': 0.76} 76%|███████▌ | 5240/6885 [14:17:31<1:31:36, 3.34s/it] 76%|███████▌ | 5241/6885 [14:17:33<1:24:24, 3.08s/it] 76%|███████▌ | 5242/6885 [14:17:36<1:19:58, 2.92s/it] 76%|███████▌ | 5243/6885 [14:17:39<1:26:07, 3.15s/it] 76%|███████▌ | 5244/6885 [14:17:41<1:14:51, 2.74s/it] 76%|███████▌ | 5245/6885 [14:17:45<1:25:56, 3.14s/it] 76%|███████▌ | 5246/6885 [14:17:49<1:25:58, 3.15s/it] 76%|███████▌ | 5247/6885 [14:17:51<1:17:24, 2.84s/it] 76%|███████▌ | 5248/6885 [14:17:53<1:10:05, 2.57s/it] 76%|███████▌ | 5249/6885 [14:17:55<1:10:53, 2.60s/it] 76%|███████▋ | 5250/6885 [14:18:00<1:29:45, 3.29s/it] {'loss': 0.5602, 'grad_norm': 1.1953866183465143, 'learning_rate': 1.6238146379703257e-06, 'epoch': 0.76} 76%|███████▋ | 5250/6885 [14:18:00<1:29:45, 3.29s/it] 76%|███████▋ | 5251/6885 [14:18:04<1:37:54, 3.60s/it] 76%|███████▋ | 5252/6885 [14:18:07<1:31:00, 3.34s/it] 
76%|███████▋ | 5253/6885 [14:18:10<1:24:40, 3.11s/it] 76%|███████▋ | 5254/6885 [14:18:12<1:16:07, 2.80s/it] 76%|███████▋ | 5255/6885 [14:18:15<1:22:26, 3.03s/it] 76%|███████▋ | 5256/6885 [14:18:18<1:16:35, 2.82s/it] 76%|███████▋ | 5257/6885 [14:18:20<1:10:07, 2.58s/it] 76%|███████▋ | 5258/6885 [14:18:23<1:12:18, 2.67s/it] 76%|███████▋ | 5259/6885 [14:18:25<1:12:25, 2.67s/it] 76%|███████▋ | 5260/6885 [14:18:27<1:05:40, 2.42s/it] {'loss': 0.558, 'grad_norm': 1.184221410195916, 'learning_rate': 1.6051586234458932e-06, 'epoch': 0.76} 76%|███████▋ | 5260/6885 [14:18:27<1:05:40, 2.42s/it] 76%|███████▋ | 5261/6885 [14:18:31<1:12:55, 2.69s/it] 76%|███████▋ | 5262/6885 [14:18:34<1:19:07, 2.93s/it] 76%|███████▋ | 5263/6885 [14:18:37<1:17:03, 2.85s/it] 76%|███████▋ | 5264/6885 [14:18:41<1:26:36, 3.21s/it] 76%|███████▋ | 5265/6885 [14:18:48<1:56:24, 4.31s/it] 76%|███████▋ | 5266/6885 [14:18:51<1:45:52, 3.92s/it] 76%|███████▋ | 5267/6885 [14:18:53<1:34:04, 3.49s/it] 77%|███████▋ | 5268/6885 [14:18:57<1:33:51, 3.48s/it] 77%|███████▋ | 5269/6885 [14:19:00<1:30:15, 3.35s/it] 77%|███████▋ | 5270/6885 [14:19:02<1:20:10, 2.98s/it] {'loss': 0.573, 'grad_norm': 1.1917994670950118, 'learning_rate': 1.5865898850520671e-06, 'epoch': 0.77} 77%|███████▋ | 5270/6885 [14:19:02<1:20:10, 2.98s/it] 77%|███████▋ | 5271/6885 [14:19:04<1:15:14, 2.80s/it] 77%|███████▋ | 5272/6885 [14:19:07<1:13:14, 2.72s/it] 77%|███████▋ | 5273/6885 [14:19:09<1:08:05, 2.53s/it] 77%|███████▋ | 5274/6885 [14:19:12<1:10:32, 2.63s/it] 77%|███████▋ | 5275/6885 [14:19:19<1:47:00, 3.99s/it] 77%|███████▋ | 5276/6885 [14:19:21<1:32:24, 3.45s/it] 77%|███████▋ | 5277/6885 [14:19:24<1:28:40, 3.31s/it] 77%|███████▋ | 5278/6885 [14:19:26<1:22:36, 3.08s/it] 77%|███████▋ | 5279/6885 [14:19:29<1:17:56, 2.91s/it] 77%|███████▋ | 5280/6885 [14:19:31<1:14:29, 2.78s/it] {'loss': 0.5565, 'grad_norm': 1.205079091727242, 'learning_rate': 1.5681089001624488e-06, 'epoch': 0.77} 77%|███████▋ | 5280/6885 [14:19:31<1:14:29, 2.78s/it] 
77%|███████▋ | 5281/6885 [14:19:34<1:15:56, 2.84s/it] 77%|███████▋ | 5282/6885 [14:19:37<1:11:03, 2.66s/it] 77%|███████▋ | 5283/6885 [14:19:40<1:14:11, 2.78s/it] 77%|███████▋ | 5284/6885 [14:19:43<1:16:10, 2.85s/it] 77%|███████▋ | 5285/6885 [14:19:45<1:11:14, 2.67s/it] 77%|███████▋ | 5286/6885 [14:19:48<1:10:13, 2.64s/it] 77%|███████▋ | 5287/6885 [14:19:51<1:20:23, 3.02s/it] 77%|███████▋ | 5288/6885 [14:19:55<1:26:37, 3.25s/it] 77%|███████▋ | 5289/6885 [14:19:58<1:25:04, 3.20s/it] 77%|███████▋ | 5290/6885 [14:20:02<1:28:48, 3.34s/it] {'loss': 0.5537, 'grad_norm': 1.0590014592765518, 'learning_rate': 1.5497161438946218e-06, 'epoch': 0.77} 77%|███████▋ | 5290/6885 [14:20:02<1:28:48, 3.34s/it] 77%|███████▋ | 5291/6885 [14:20:06<1:35:35, 3.60s/it] 77%|███████▋ | 5292/6885 [14:20:09<1:30:31, 3.41s/it] 77%|███████▋ | 5293/6885 [14:20:12<1:24:13, 3.17s/it] 77%|███████▋ | 5294/6885 [14:20:14<1:18:42, 2.97s/it] 77%|███████▋ | 5295/6885 [14:20:18<1:23:35, 3.15s/it] 77%|███████▋ | 5296/6885 [14:20:20<1:18:12, 2.95s/it] 77%|███████▋ | 5297/6885 [14:20:26<1:37:15, 3.67s/it] 77%|███████▋ | 5298/6885 [14:20:28<1:29:52, 3.40s/it] 77%|███████▋ | 5299/6885 [14:20:31<1:21:03, 3.07s/it] 77%|███████▋ | 5300/6885 [14:20:33<1:12:33, 2.75s/it] {'loss': 0.5608, 'grad_norm': 1.3045355829406655, 'learning_rate': 1.5314120890979596e-06, 'epoch': 0.77} 77%|███████▋ | 5300/6885 [14:20:33<1:12:33, 2.75s/it] 77%|███████▋ | 5301/6885 [14:20:36<1:12:50, 2.76s/it] 77%|███████▋ | 5302/6885 [14:20:38<1:09:59, 2.65s/it] 77%|███████▋ | 5303/6885 [14:20:41<1:09:44, 2.65s/it] 77%|███████▋ | 5304/6885 [14:20:44<1:15:34, 2.87s/it] 77%|███████▋ | 5305/6885 [14:20:46<1:08:30, 2.60s/it] 77%|███████▋ | 5306/6885 [14:20:48<1:03:46, 2.42s/it] 77%|███████▋ | 5307/6885 [14:20:51<1:09:34, 2.65s/it] 77%|███████▋ | 5308/6885 [14:20:54<1:07:37, 2.57s/it] 77%|███████▋ | 5309/6885 [14:20:56<1:10:12, 2.67s/it] 77%|███████▋ | 5310/6885 [14:21:00<1:15:51, 2.89s/it] {'loss': 0.563, 'grad_norm': 1.227226173650366, 
'learning_rate': 1.5131972063414451e-06, 'epoch': 0.77} 77%|███████▋ | 5310/6885 [14:21:00<1:15:51, 2.89s/it] 77%|███████▋ | 5311/6885 [14:21:04<1:23:39, 3.19s/it] 77%|███████▋ | 5312/6885 [14:21:06<1:14:14, 2.83s/it] 77%|███████▋ | 5313/6885 [14:21:09<1:18:00, 2.98s/it] 77%|███████▋ | 5314/6885 [14:21:11<1:07:31, 2.58s/it] 77%|███████▋ | 5315/6885 [14:21:12<59:20, 2.27s/it] 77%|███████▋ | 5316/6885 [14:21:15<1:00:29, 2.31s/it] 77%|███████▋ | 5317/6885 [14:21:19<1:13:37, 2.82s/it] 77%|███████▋ | 5318/6885 [14:21:21<1:13:20, 2.81s/it] 77%|███████▋ | 5319/6885 [14:21:25<1:16:39, 2.94s/it] 77%|███████▋ | 5320/6885 [14:21:30<1:31:57, 3.53s/it] {'loss': 0.5618, 'grad_norm': 1.1505400844326525, 'learning_rate': 1.4950719639015987e-06, 'epoch': 0.77} 77%|███████▋ | 5320/6885 [14:21:30<1:31:57, 3.53s/it] 77%|███████▋ | 5321/6885 [14:21:33<1:27:41, 3.36s/it] 77%|███████▋ | 5322/6885 [14:21:36<1:30:21, 3.47s/it] 77%|███████▋ | 5323/6885 [14:21:39<1:26:10, 3.31s/it] 77%|███████▋ | 5324/6885 [14:21:41<1:16:29, 2.94s/it] 77%|███████▋ | 5325/6885 [14:21:44<1:11:02, 2.73s/it] 77%|███████▋ | 5326/6885 [14:21:48<1:24:00, 3.23s/it] 77%|███████▋ | 5327/6885 [14:21:55<1:52:12, 4.32s/it] 77%|███████▋ | 5328/6885 [14:21:57<1:37:29, 3.76s/it] 77%|███████▋ | 5329/6885 [14:22:00<1:27:20, 3.37s/it] 77%|███████▋ | 5330/6885 [14:22:02<1:16:59, 2.97s/it] {'loss': 0.5559, 'grad_norm': 1.1971910791582392, 'learning_rate': 1.4770368277504183e-06, 'epoch': 0.77} 77%|███████▋ | 5330/6885 [14:22:02<1:16:59, 2.97s/it] 77%|███████▋ | 5331/6885 [14:22:05<1:17:47, 3.00s/it] 77%|███████▋ | 5332/6885 [14:22:08<1:17:11, 2.98s/it] 77%|███████▋ | 5333/6885 [14:22:10<1:13:00, 2.82s/it] 77%|███████▋ | 5334/6885 [14:22:13<1:13:37, 2.85s/it] 77%|███████▋ | 5335/6885 [14:22:15<1:07:37, 2.62s/it] 78%|███████▊ | 5336/6885 [14:22:18<1:08:43, 2.66s/it] 78%|███████▊ | 5337/6885 [14:22:20<1:05:55, 2.56s/it] 78%|███████▊ | 5338/6885 [14:22:22<1:01:15, 2.38s/it] 78%|███████▊ | 5339/6885 [14:22:29<1:34:57, 3.69s/it] 
78%|███████▊ | 5340/6885 [14:22:31<1:23:17, 3.23s/it] {'loss': 0.5757, 'grad_norm': 1.1465426761189066, 'learning_rate': 1.45909226154341e-06, 'epoch': 0.78} 78%|███████▊ | 5340/6885 [14:22:31<1:23:17, 3.23s/it] 78%|███████▊ | 5341/6885 [14:22:33<1:15:27, 2.93s/it] 78%|███████▊ | 5342/6885 [14:22:37<1:20:54, 3.15s/it] 78%|███████▊ | 5343/6885 [14:22:40<1:19:37, 3.10s/it] 78%|███████▊ | 5344/6885 [14:22:43<1:22:28, 3.21s/it] 78%|███████▊ | 5345/6885 [14:22:46<1:13:57, 2.88s/it] 78%|███████▊ | 5346/6885 [14:22:48<1:11:24, 2.78s/it] 78%|███████▊ | 5347/6885 [14:22:50<1:07:27, 2.63s/it] 78%|███████▊ | 5348/6885 [14:22:54<1:17:36, 3.03s/it] 78%|███████▊ | 5349/6885 [14:22:57<1:13:40, 2.88s/it] 78%|███████▊ | 5350/6885 [14:23:00<1:15:49, 2.96s/it] {'loss': 0.5699, 'grad_norm': 1.0530342043982832, 'learning_rate': 1.4412387266076677e-06, 'epoch': 0.78} 78%|███████▊ | 5350/6885 [14:23:00<1:15:49, 2.96s/it] 78%|███████▊ | 5351/6885 [14:23:03<1:16:07, 2.98s/it] 78%|███████▊ | 5352/6885 [14:23:05<1:09:34, 2.72s/it] 78%|███████▊ | 5353/6885 [14:23:07<1:03:38, 2.49s/it] 78%|███████▊ | 5354/6885 [14:23:10<1:04:30, 2.53s/it] 78%|███████▊ | 5355/6885 [14:23:12<59:46, 2.34s/it] 78%|███████▊ | 5356/6885 [14:23:14<58:02, 2.28s/it] 78%|███████▊ | 5357/6885 [14:23:17<1:06:07, 2.60s/it] 78%|███████▊ | 5358/6885 [14:23:20<1:08:29, 2.69s/it] 78%|███████▊ | 5359/6885 [14:23:23<1:09:34, 2.74s/it] 78%|███████▊ | 5360/6885 [14:23:25<1:04:17, 2.53s/it] {'loss': 0.5592, 'grad_norm': 1.1921772808125664, 'learning_rate': 1.4234766819300106e-06, 'epoch': 0.78} 78%|███████▊ | 5360/6885 [14:23:25<1:04:17, 2.53s/it] 78%|███████▊ | 5361/6885 [14:23:29<1:17:15, 3.04s/it] 78%|███████▊ | 5362/6885 [14:23:32<1:16:02, 3.00s/it] 78%|███████▊ | 5363/6885 [14:23:35<1:14:12, 2.93s/it] 78%|███████▊ | 5364/6885 [14:23:37<1:05:43, 2.59s/it] 78%|███████▊ | 5365/6885 [14:23:39<1:01:07, 2.41s/it] 78%|███████▊ | 5366/6885 [14:23:42<1:08:33, 2.71s/it] 78%|███████▊ | 5367/6885 [14:23:45<1:09:58, 2.77s/it] 78%|███████▊ 
| 5368/6885 [14:23:48<1:11:03, 2.81s/it] 78%|███████▊ | 5369/6885 [14:23:51<1:14:24, 2.94s/it] 78%|███████▊ | 5370/6885 [14:23:54<1:12:20, 2.87s/it] {'loss': 0.5658, 'grad_norm': 1.1969217401024441, 'learning_rate': 1.4058065841451856e-06, 'epoch': 0.78} 78%|███████▊ | 5370/6885 [14:23:54<1:12:20, 2.87s/it] 78%|███████▊ | 5371/6885 [14:23:57<1:17:23, 3.07s/it] 78%|███████▊ | 5372/6885 [14:24:00<1:15:12, 2.98s/it] 78%|███████▊ | 5373/6885 [14:24:03<1:15:24, 2.99s/it] 78%|███████▊ | 5374/6885 [14:24:05<1:09:42, 2.77s/it] 78%|███████▊ | 5375/6885 [14:24:10<1:20:51, 3.21s/it] 78%|███████▊ | 5376/6885 [14:24:13<1:21:07, 3.23s/it] 78%|███████▊ | 5377/6885 [14:24:16<1:18:23, 3.12s/it] 78%|███████▊ | 5378/6885 [14:24:18<1:13:26, 2.92s/it] 78%|███████▊ | 5379/6885 [14:24:22<1:19:09, 3.15s/it] 78%|███████▊ | 5380/6885 [14:24:26<1:29:13, 3.56s/it] {'loss': 0.5523, 'grad_norm': 1.1371738180522346, 'learning_rate': 1.3882288875241262e-06, 'epoch': 0.78} 78%|███████▊ | 5380/6885 [14:24:26<1:29:13, 3.56s/it] 78%|███████▊ | 5381/6885 [14:24:30<1:27:24, 3.49s/it] 78%|███████▊ | 5382/6885 [14:24:33<1:27:30, 3.49s/it] 78%|███████▊ | 5383/6885 [14:24:35<1:13:45, 2.95s/it] 78%|███████▊ | 5384/6885 [14:24:37<1:09:27, 2.78s/it] 78%|███████▊ | 5385/6885 [14:24:40<1:05:41, 2.63s/it] 78%|███████▊ | 5386/6885 [14:24:42<1:02:53, 2.52s/it] 78%|███████▊ | 5387/6885 [14:24:45<1:08:43, 2.75s/it] 78%|███████▊ | 5388/6885 [14:24:48<1:11:09, 2.85s/it] 78%|███████▊ | 5389/6885 [14:24:51<1:11:21, 2.86s/it] 78%|███████▊ | 5390/6885 [14:24:54<1:10:19, 2.82s/it] {'loss': 0.5501, 'grad_norm': 1.119312116230787, 'learning_rate': 1.3707440439622754e-06, 'epoch': 0.78} 78%|███████▊ | 5390/6885 [14:24:54<1:10:19, 2.82s/it] 78%|███████▊ | 5391/6885 [14:24:56<1:03:43, 2.56s/it] 78%|███████▊ | 5392/6885 [14:24:58<1:00:05, 2.41s/it] 78%|███████▊ | 5393/6885 [14:25:01<1:02:07, 2.50s/it] 78%|███████▊ | 5394/6885 [14:25:04<1:09:18, 2.79s/it] 78%|███████▊ | 5395/6885 [14:25:06<1:03:33, 2.56s/it] 78%|███████▊ | 
5396/6885 [14:25:09<1:04:43, 2.61s/it] 78%|███████▊ | 5397/6885 [14:25:11<1:03:12, 2.55s/it] 78%|███████▊ | 5398/6885 [14:25:14<1:05:11, 2.63s/it] 78%|███████▊ | 5399/6885 [14:25:17<1:10:52, 2.86s/it] 78%|███████▊ | 5400/6885 [14:25:20<1:07:06, 2.71s/it] {'loss': 0.5393, 'grad_norm': 1.200972988458609, 'learning_rate': 1.353352502967966e-06, 'epoch': 0.78} 78%|███████▊ | 5400/6885 [14:25:20<1:07:06, 2.71s/it] 78%|███████▊ | 5401/6885 [14:25:22<1:00:58, 2.47s/it] 78%|███████▊ | 5402/6885 [14:25:24<59:07, 2.39s/it] 78%|███████▊ | 5403/6885 [14:25:27<1:07:29, 2.73s/it] 78%|███████▊ | 5404/6885 [14:25:30<1:03:34, 2.58s/it] 79%|███████▊ | 5405/6885 [14:25:33<1:08:20, 2.77s/it] 79%|███████▊ | 5406/6885 [14:25:36<1:12:35, 2.95s/it] 79%|███████▊ | 5407/6885 [14:25:39<1:12:01, 2.92s/it] 79%|███████▊ | 5408/6885 [14:25:43<1:22:05, 3.33s/it] 79%|███████▊ | 5409/6885 [14:25:46<1:15:44, 3.08s/it] 79%|███████▊ | 5410/6885 [14:25:50<1:26:23, 3.51s/it] {'loss': 0.5552, 'grad_norm': 1.005244568846047, 'learning_rate': 1.336054711650867e-06, 'epoch': 0.79} 79%|███████▊ | 5410/6885 [14:25:50<1:26:23, 3.51s/it] 79%|███████▊ | 5411/6885 [14:25:52<1:14:39, 3.04s/it] 79%|███████▊ | 5412/6885 [14:25:54<1:07:38, 2.76s/it] 79%|███████▊ | 5413/6885 [14:25:59<1:19:20, 3.23s/it] 79%|███████▊ | 5414/6885 [14:26:01<1:13:16, 2.99s/it] 79%|███████▊ | 5415/6885 [14:26:04<1:15:15, 3.07s/it] 79%|███████▊ | 5416/6885 [14:26:07<1:10:41, 2.89s/it] 79%|███████▊ | 5417/6885 [14:26:09<1:06:20, 2.71s/it] 79%|███████▊ | 5418/6885 [14:26:12<1:10:07, 2.87s/it] 79%|███████▊ | 5419/6885 [14:26:14<1:04:23, 2.64s/it] 79%|███████▊ | 5420/6885 [14:26:17<1:06:12, 2.71s/it] {'loss': 0.5615, 'grad_norm': 0.9811514201367332, 'learning_rate': 1.3188511147104882e-06, 'epoch': 0.79} 79%|███████▊ | 5420/6885 [14:26:17<1:06:12, 2.71s/it] 79%|███████▊ | 5421/6885 [14:26:20<1:03:41, 2.61s/it] 79%|███████▉ | 5422/6885 [14:26:22<1:01:59, 2.54s/it] 79%|███████▉ | 5423/6885 [14:26:25<1:03:38, 2.61s/it] 79%|███████▉ | 5424/6885 
[14:26:27<1:01:35, 2.53s/it] 79%|███████▉ | 5425/6885 [14:26:32<1:17:23, 3.18s/it] 79%|███████▉ | 5426/6885 [14:26:34<1:12:40, 2.99s/it] 79%|███████▉ | 5427/6885 [14:26:37<1:09:59, 2.88s/it] 79%|███████▉ | 5428/6885 [14:26:40<1:07:35, 2.78s/it] 79%|███████▉ | 5429/6885 [14:26:45<1:28:45, 3.66s/it] 79%|███████▉ | 5430/6885 [14:26:49<1:28:44, 3.66s/it] {'loss': 0.5731, 'grad_norm': 1.2124333619418073, 'learning_rate': 1.3017421544247466e-06, 'epoch': 0.79} 79%|███████▉ | 5430/6885 [14:26:49<1:28:44, 3.66s/it] 79%|███████▉ | 5431/6885 [14:26:51<1:17:56, 3.22s/it] 79%|███████▉ | 5432/6885 [14:26:53<1:10:14, 2.90s/it] 79%|███████▉ | 5433/6885 [14:26:56<1:08:25, 2.83s/it] 79%|███████▉ | 5434/6885 [14:26:59<1:06:19, 2.74s/it] 79%|███████▉ | 5435/6885 [14:27:01<1:07:26, 2.79s/it] 79%|███████▉ | 5436/6885 [14:27:05<1:09:24, 2.87s/it] 79%|███████▉ | 5437/6885 [14:27:07<1:09:21, 2.87s/it] 79%|███████▉ | 5438/6885 [14:27:10<1:03:39, 2.64s/it] 79%|███████▉ | 5439/6885 [14:27:13<1:06:16, 2.75s/it] 79%|███████▉ | 5440/6885 [14:27:16<1:07:52, 2.82s/it] {'loss': 0.5449, 'grad_norm': 1.0164638888045425, 'learning_rate': 1.2847282706385962e-06, 'epoch': 0.79} 79%|███████▉ | 5440/6885 [14:27:16<1:07:52, 2.82s/it] 79%|███████▉ | 5441/6885 [14:27:17<1:01:26, 2.55s/it] 79%|███████▉ | 5442/6885 [14:27:20<1:03:08, 2.63s/it] 79%|███████▉ | 5443/6885 [14:27:23<1:02:58, 2.62s/it] 79%|███████▉ | 5444/6885 [14:27:25<1:00:54, 2.54s/it] 79%|███████▉ | 5445/6885 [14:27:29<1:11:48, 2.99s/it] 79%|███████▉ | 5446/6885 [14:27:31<1:02:17, 2.60s/it] 79%|███████▉ | 5447/6885 [14:27:34<1:07:45, 2.83s/it] 79%|███████▉ | 5448/6885 [14:27:36<58:39, 2.45s/it] 79%|███████▉ | 5449/6885 [14:27:40<1:07:46, 2.83s/it] 79%|███████▉ | 5450/6885 [14:27:42<1:07:11, 2.81s/it] {'loss': 0.5581, 'grad_norm': 1.0692055130184748, 'learning_rate': 1.267809900752725e-06, 'epoch': 0.79} 79%|███████▉ | 5450/6885 [14:27:42<1:07:11, 2.81s/it] 79%|███████▉ | 5451/6885 [14:27:45<1:02:49, 2.63s/it] 79%|███████▉ | 5452/6885 
[14:27:47<1:03:57, 2.68s/it] 79%|███████▉ | 5453/6885 [14:27:49<59:54, 2.51s/it] 79%|███████▉ | 5454/6885 [14:27:51<56:30, 2.37s/it] 79%|███████▉ | 5455/6885 [14:27:53<53:15, 2.23s/it] 79%|███████▉ | 5456/6885 [14:27:56<52:36, 2.21s/it] 79%|███████▉ | 5457/6885 [14:27:59<1:01:46, 2.60s/it] 79%|███████▉ | 5458/6885 [14:28:02<1:01:10, 2.57s/it] 79%|███████▉ | 5459/6885 [14:28:04<59:28, 2.50s/it] 79%|███████▉ | 5460/6885 [14:28:06<57:55, 2.44s/it] {'loss': 0.5694, 'grad_norm': 1.2243966381535343, 'learning_rate': 1.2509874797122983e-06, 'epoch': 0.79} 79%|███████▉ | 5460/6885 [14:28:06<57:55, 2.44s/it] 79%|███████▉ | 5461/6885 [14:28:09<57:03, 2.40s/it] 79%|███████▉ | 5462/6885 [14:28:12<1:06:23, 2.80s/it] 79%|███████▉ | 5463/6885 [14:28:14<1:00:54, 2.57s/it] 79%|███████▉ | 5464/6885 [14:28:17<59:08, 2.50s/it] 79%|███████▉ | 5465/6885 [14:28:21<1:09:48, 2.95s/it] 79%|███████▉ | 5466/6885 [14:28:24<1:10:06, 2.96s/it] 79%|███████▉ | 5467/6885 [14:28:27<1:12:54, 3.08s/it] 79%|███████▉ | 5468/6885 [14:28:30<1:12:57, 3.09s/it] 79%|███████▉ | 5469/6885 [14:28:34<1:21:41, 3.46s/it] 79%|███████▉ | 5470/6885 [14:28:38<1:20:43, 3.42s/it] {'loss': 0.5601, 'grad_norm': 1.1192058071022615, 'learning_rate': 1.2342614399957952e-06, 'epoch': 0.79} 79%|███████▉ | 5470/6885 [14:28:38<1:20:43, 3.42s/it] 79%|███████▉ | 5471/6885 [14:28:40<1:09:57, 2.97s/it] 79%|███████▉ | 5472/6885 [14:28:42<1:07:20, 2.86s/it] 79%|███████▉ | 5473/6885 [14:28:45<1:07:47, 2.88s/it] 80%|███████▉ | 5474/6885 [14:28:47<1:00:17, 2.56s/it] 80%|███████▉ | 5475/6885 [14:28:51<1:09:15, 2.95s/it] 80%|███████▉ | 5476/6885 [14:28:53<1:00:31, 2.58s/it] 80%|███████▉ | 5477/6885 [14:28:55<56:47, 2.42s/it] 80%|███████▉ | 5478/6885 [14:28:56<51:39, 2.20s/it] 80%|███████▉ | 5479/6885 [14:29:00<1:05:08, 2.78s/it] 80%|███████▉ | 5480/6885 [14:29:02<57:49, 2.47s/it] {'loss': 0.5383, 'grad_norm': 1.210664779695526, 'learning_rate': 1.217632211603868e-06, 'epoch': 0.8} 80%|███████▉ | 5480/6885 [14:29:02<57:49, 2.47s/it] 
80%|███████▉ | 5481/6885 [14:29:04<56:04, 2.40s/it] 80%|███████▉ | 5482/6885 [14:29:07<57:47, 2.47s/it] 80%|███████▉ | 5483/6885 [14:29:10<1:03:51, 2.73s/it] 80%|███████▉ | 5484/6885 [14:29:14<1:10:21, 3.01s/it] 80%|███████▉ | 5485/6885 [14:29:18<1:13:28, 3.15s/it] 80%|███████▉ | 5486/6885 [14:29:20<1:09:16, 2.97s/it] 80%|███████▉ | 5487/6885 [14:29:24<1:19:15, 3.40s/it] 80%|███████▉ | 5488/6885 [14:29:27<1:09:34, 2.99s/it] 80%|███████▉ | 5489/6885 [14:29:29<1:07:01, 2.88s/it] 80%|███████▉ | 5490/6885 [14:29:31<58:37, 2.52s/it] {'loss': 0.5503, 'grad_norm': 1.2306429782422048, 'learning_rate': 1.2011002220483099e-06, 'epoch': 0.8} 80%|███████▉ | 5490/6885 [14:29:31<58:37, 2.52s/it] 80%|███████▉ | 5491/6885 [14:29:33<53:52, 2.32s/it] 80%|███████▉ | 5492/6885 [14:29:35<52:25, 2.26s/it] 80%|███████▉ | 5493/6885 [14:29:38<1:02:04, 2.68s/it] 80%|███████▉ | 5494/6885 [14:29:41<59:50, 2.58s/it] 80%|███████▉ | 5495/6885 [14:29:44<1:04:17, 2.78s/it] 80%|███████▉ | 5496/6885 [14:29:46<1:01:39, 2.66s/it] 80%|███████▉ | 5497/6885 [14:29:51<1:11:21, 3.08s/it] 80%|███████▉ | 5498/6885 [14:29:53<1:04:26, 2.79s/it] 80%|███████▉ | 5499/6885 [14:29:57<1:15:10, 3.25s/it] 80%|███████▉ | 5500/6885 [14:30:01<1:19:00, 3.42s/it] {'loss': 0.561, 'grad_norm': 1.1449496150562748, 'learning_rate': 1.1846658963410472e-06, 'epoch': 0.8} 80%|███████▉ | 5500/6885 [14:30:01<1:19:00, 3.42s/it] 80%|███████▉ | 5501/6885 [14:30:04<1:15:10, 3.26s/it] 80%|███████▉ | 5502/6885 [14:30:06<1:10:14, 3.05s/it] 80%|███████▉ | 5503/6885 [14:30:09<1:05:39, 2.85s/it] 80%|███████▉ | 5504/6885 [14:30:11<1:02:16, 2.71s/it] 80%|███████▉ | 5505/6885 [14:30:16<1:20:38, 3.51s/it] 80%|███████▉ | 5506/6885 [14:30:19<1:14:59, 3.26s/it] 80%|███████▉ | 5507/6885 [14:30:22<1:11:43, 3.12s/it] 80%|████████ | 5508/6885 [14:30:25<1:11:19, 3.11s/it] 80%|████████ | 5509/6885 [14:30:27<1:06:51, 2.92s/it] 80%|████████ | 5510/6885 [14:30:29<1:00:54, 2.66s/it] {'loss': 0.5489, 'grad_norm': 1.1809146975647171, 'learning_rate': 
1.168329656983222e-06, 'epoch': 0.8} 80%|████████ | 5510/6885 [14:30:29<1:00:54, 2.66s/it] 80%|████████ | 5511/6885 [14:30:33<1:07:40, 2.96s/it] 80%|████████ | 5512/6885 [14:30:36<1:05:11, 2.85s/it] 80%|████████ | 5513/6885 [14:30:39<1:11:10, 3.11s/it] 80%|████████ | 5514/6885 [14:30:43<1:16:53, 3.36s/it] 80%|████████ | 5515/6885 [14:30:46<1:10:20, 3.08s/it] 80%|████████ | 5516/6885 [14:30:50<1:15:44, 3.32s/it] 80%|████████ | 5517/6885 [14:30:53<1:13:04, 3.20s/it] 80%|████████ | 5518/6885 [14:30:57<1:21:06, 3.56s/it] 80%|████████ | 5519/6885 [14:31:00<1:15:56, 3.34s/it] 80%|████████ | 5520/6885 [14:31:02<1:09:26, 3.05s/it] {'loss': 0.5443, 'grad_norm': 1.1865786985653701, 'learning_rate': 1.1520919239543272e-06, 'epoch': 0.8} 80%|████████ | 5520/6885 [14:31:02<1:09:26, 3.05s/it] 80%|████████ | 5521/6885 [14:31:04<1:04:16, 2.83s/it] 80%|████████ | 5522/6885 [14:31:10<1:24:19, 3.71s/it] 80%|████████ | 5523/6885 [14:31:14<1:26:41, 3.82s/it] 80%|████████ | 5524/6885 [14:31:17<1:21:07, 3.58s/it] 80%|████████ | 5525/6885 [14:31:21<1:19:20, 3.50s/it] 80%|████████ | 5526/6885 [14:31:24<1:21:18, 3.59s/it] 80%|████████ | 5527/6885 [14:31:27<1:13:41, 3.26s/it] 80%|████████ | 5528/6885 [14:31:29<1:02:56, 2.78s/it] 80%|████████ | 5529/6885 [14:31:32<1:05:32, 2.90s/it] 80%|████████ | 5530/6885 [14:31:34<58:25, 2.59s/it] {'loss': 0.5784, 'grad_norm': 1.2819514449232758, 'learning_rate': 1.1359531147014102e-06, 'epoch': 0.8} 80%|████████ | 5530/6885 [14:31:34<58:25, 2.59s/it] 80%|████████ | 5531/6885 [14:31:38<1:07:26, 2.99s/it] 80%|████████ | 5532/6885 [14:31:40<1:01:46, 2.74s/it] 80%|████████ | 5533/6885 [14:31:42<55:24, 2.46s/it] 80%|████████ | 5534/6885 [14:31:44<53:26, 2.37s/it] 80%|████████ | 5535/6885 [14:31:46<52:11, 2.32s/it] 80%|████████ | 5536/6885 [14:31:49<56:31, 2.51s/it] 80%|████████ | 5537/6885 [14:31:52<57:26, 2.56s/it] 80%|████████ | 5538/6885 [14:31:54<53:59, 2.40s/it] 80%|████████ | 5539/6885 [14:31:57<1:03:37, 2.84s/it] 80%|████████ | 5540/6885 
[14:32:01<1:05:33, 2.92s/it] {'loss': 0.5472, 'grad_norm': 1.140249494732679, 'learning_rate': 1.11991364412834e-06, 'epoch': 0.8} 80%|████████ | 5540/6885 [14:32:01<1:05:33, 2.92s/it] 80%|████████ | 5541/6885 [14:32:04<1:07:28, 3.01s/it] 80%|████████ | 5542/6885 [14:32:07<1:06:35, 2.98s/it] 81%|████████ | 5543/6885 [14:32:10<1:09:19, 3.10s/it] 81%|████████ | 5544/6885 [14:32:17<1:36:28, 4.32s/it] 81%|████████ | 5545/6885 [14:32:21<1:35:49, 4.29s/it] 81%|████████ | 5546/6885 [14:32:24<1:25:08, 3.82s/it] 81%|████████ | 5547/6885 [14:32:27<1:16:37, 3.44s/it] 81%|████████ | 5548/6885 [14:32:29<1:09:57, 3.14s/it] 81%|████████ | 5549/6885 [14:32:32<1:06:24, 2.98s/it] 81%|████████ | 5550/6885 [14:32:36<1:11:42, 3.22s/it] {'loss': 0.5614, 'grad_norm': 1.0963574239357976, 'learning_rate': 1.1039739245851426e-06, 'epoch': 0.81} 81%|████████ | 5550/6885 [14:32:36<1:11:42, 3.22s/it] 81%|████████ | 5551/6885 [14:32:38<1:08:26, 3.08s/it] 81%|████████ | 5552/6885 [14:32:41<1:06:57, 3.01s/it] 81%|████████ | 5553/6885 [14:32:44<1:03:25, 2.86s/it] 81%|████████ | 5554/6885 [14:32:46<1:01:40, 2.78s/it] 81%|████████ | 5555/6885 [14:32:50<1:05:51, 2.97s/it] 81%|████████ | 5556/6885 [14:32:55<1:24:32, 3.82s/it] 81%|████████ | 5557/6885 [14:32:58<1:18:22, 3.54s/it] 81%|████████ | 5558/6885 [14:33:00<1:08:05, 3.08s/it] 81%|████████ | 5559/6885 [14:33:03<1:08:19, 3.09s/it] 81%|████████ | 5560/6885 [14:33:06<1:06:42, 3.02s/it] {'loss': 0.5516, 'grad_norm': 1.1963836912036798, 'learning_rate': 1.088134365857399e-06, 'epoch': 0.81} 81%|████████ | 5560/6885 [14:33:06<1:06:42, 3.02s/it] 81%|████████ | 5561/6885 [14:33:10<1:10:52, 3.21s/it] 81%|████████ | 5562/6885 [14:33:12<1:01:42, 2.80s/it] 81%|████████ | 5563/6885 [14:33:16<1:09:29, 3.15s/it] 81%|████████ | 5564/6885 [14:33:21<1:22:28, 3.75s/it] 81%|████████ | 5565/6885 [14:33:23<1:10:34, 3.21s/it] 81%|████████ | 5566/6885 [14:33:26<1:08:05, 3.10s/it] 81%|████████ | 5567/6885 [14:33:30<1:15:53, 3.46s/it] 81%|████████ | 5568/6885 
[14:33:34<1:19:59, 3.64s/it] 81%|████████ | 5569/6885 [14:33:37<1:14:07, 3.38s/it] 81%|████████ | 5570/6885 [14:33:39<1:06:26, 3.03s/it] {'loss': 0.5643, 'grad_norm': 1.320400739555157, 'learning_rate': 1.0723953751557098e-06, 'epoch': 0.81} 81%|████████ | 5570/6885 [14:33:39<1:06:26, 3.03s/it] 81%|████████ | 5571/6885 [14:33:43<1:14:49, 3.42s/it] 81%|████████ | 5572/6885 [14:33:46<1:12:02, 3.29s/it] 81%|████████ | 5573/6885 [14:33:48<1:03:10, 2.89s/it] 81%|████████ | 5574/6885 [14:33:51<1:00:17, 2.76s/it] 81%|████████ | 5575/6885 [14:33:55<1:09:35, 3.19s/it] 81%|████████ | 5576/6885 [14:34:01<1:26:18, 3.96s/it] 81%|████████ | 5577/6885 [14:34:03<1:14:17, 3.41s/it] 81%|████████ | 5578/6885 [14:34:06<1:10:22, 3.23s/it] 81%|████████ | 5579/6885 [14:34:08<1:07:19, 3.09s/it] 81%|████████ | 5580/6885 [14:34:12<1:10:10, 3.23s/it] {'loss': 0.545, 'grad_norm': 1.2261172403861758, 'learning_rate': 1.0567573571052265e-06, 'epoch': 0.81} 81%|████████ | 5580/6885 [14:34:12<1:10:10, 3.23s/it] 81%|████████ | 5581/6885 [14:34:14<1:02:26, 2.87s/it] 81%|████████ | 5582/6885 [14:34:17<1:00:37, 2.79s/it] 81%|████████ | 5583/6885 [14:34:19<58:10, 2.68s/it] 81%|████████ | 5584/6885 [14:34:22<1:02:21, 2.88s/it] 81%|████████ | 5585/6885 [14:34:26<1:04:41, 2.99s/it] 81%|████████ | 5586/6885 [14:34:31<1:18:40, 3.63s/it] 81%|████████ | 5587/6885 [14:34:33<1:07:41, 3.13s/it] 81%|████████ | 5588/6885 [14:34:37<1:13:36, 3.40s/it] 81%|████████ | 5589/6885 [14:34:40<1:12:50, 3.37s/it] 81%|████████ | 5590/6885 [14:34:43<1:07:11, 3.11s/it] {'loss': 0.5562, 'grad_norm': 1.1363072652624087, 'learning_rate': 1.0412207137352504e-06, 'epoch': 0.81} 81%|████████ | 5590/6885 [14:34:43<1:07:11, 3.11s/it] 81%|████████ | 5591/6885 [14:34:45<1:01:01, 2.83s/it] 81%|████████ | 5592/6885 [14:34:49<1:08:37, 3.18s/it] 81%|████████ | 5593/6885 [14:34:53<1:17:24, 3.59s/it] 81%|████████ | 5594/6885 [14:34:59<1:32:02, 4.28s/it] 81%|████████▏ | 5595/6885 [14:35:02<1:24:49, 3.95s/it] 81%|████████▏ | 5596/6885 
[14:35:04<1:12:43, 3.39s/it] 81%|████████▏ | 5597/6885 [14:35:07<1:07:16, 3.13s/it] 81%|████████▏ | 5598/6885 [14:35:09<59:07, 2.76s/it] 81%|████████▏ | 5599/6885 [14:35:12<1:03:18, 2.95s/it] 81%|████████▏ | 5600/6885 [14:35:16<1:10:35, 3.30s/it] {'loss': 0.5584, 'grad_norm': 1.0696753091917897, 'learning_rate': 1.0257858444688968e-06, 'epoch': 0.81} 81%|████████▏ | 5600/6885 [14:35:16<1:10:35, 3.30s/it] 81%|████████▏ | 5601/6885 [14:35:19<1:03:58, 2.99s/it] 81%|████████▏ | 5602/6885 [14:35:20<56:47, 2.66s/it] 81%|████████▏ | 5603/6885 [14:35:23<58:55, 2.76s/it] 81%|████████▏ | 5604/6885 [14:35:26<58:07, 2.72s/it] 81%|████████▏ | 5605/6885 [14:35:28<51:55, 2.43s/it] 81%|████████▏ | 5606/6885 [14:35:30<52:09, 2.45s/it] 81%|████████▏ | 5607/6885 [14:35:33<55:03, 2.58s/it] 81%|████████▏ | 5608/6885 [14:35:36<56:35, 2.66s/it] 81%|████████▏ | 5609/6885 [14:35:39<1:00:39, 2.85s/it] 81%|████████▏ | 5610/6885 [14:35:43<1:04:21, 3.03s/it] {'loss': 0.5509, 'grad_norm': 1.092336652561905, 'learning_rate': 1.0104531461128224e-06, 'epoch': 0.81} 81%|████████▏ | 5610/6885 [14:35:43<1:04:21, 3.03s/it] 81%|████████▏ | 5611/6885 [14:35:45<59:19, 2.79s/it] 82%|████████▏ | 5612/6885 [14:35:48<59:26, 2.80s/it] 82%|████████▏ | 5613/6885 [14:35:51<1:03:05, 2.98s/it] 82%|████████▏ | 5614/6885 [14:35:54<1:03:18, 2.99s/it] 82%|████████▏ | 5615/6885 [14:35:57<1:03:17, 2.99s/it] 82%|████████▏ | 5616/6885 [14:36:00<1:00:25, 2.86s/it] 82%|████████▏ | 5617/6885 [14:36:02<55:24, 2.62s/it] 82%|████████▏ | 5618/6885 [14:36:05<1:00:10, 2.85s/it] 82%|████████▏ | 5619/6885 [14:36:08<1:02:09, 2.95s/it] 82%|████████▏ | 5620/6885 [14:36:11<57:09, 2.71s/it] {'loss': 0.5552, 'grad_norm': 1.2190453226296554, 'learning_rate': 9.952230128470358e-07, 'epoch': 0.82} 82%|████████▏ | 5620/6885 [14:36:11<57:09, 2.71s/it] 82%|████████▏ | 5621/6885 [14:36:14<59:59, 2.85s/it] 82%|████████▏ | 5622/6885 [14:36:17<59:22, 2.82s/it] 82%|████████▏ | 5623/6885 [14:36:20<1:04:23, 3.06s/it] 82%|████████▏ | 5624/6885 
[14:36:22<59:06, 2.81s/it] 82%|████████▏ | 5625/6885 [14:36:25<54:48, 2.61s/it] 82%|████████▏ | 5626/6885 [14:36:28<1:00:20, 2.88s/it] 82%|████████▏ | 5627/6885 [14:36:30<54:23, 2.59s/it] 82%|████████▏ | 5628/6885 [14:36:35<1:09:21, 3.31s/it] 82%|████████▏ | 5629/6885 [14:36:39<1:13:45, 3.52s/it] 82%|████████▏ | 5630/6885 [14:36:42<1:12:39, 3.47s/it] {'loss': 0.5611, 'grad_norm': 1.1756174285580154, 'learning_rate': 9.800958362147433e-07, 'epoch': 0.82} 82%|████████▏ | 5630/6885 [14:36:42<1:12:39, 3.47s/it] 82%|████████▏ | 5631/6885 [14:36:48<1:24:27, 4.04s/it] 82%|████████▏ | 5632/6885 [14:36:51<1:17:19, 3.70s/it] 82%|████████▏ | 5633/6885 [14:36:53<1:08:56, 3.30s/it] 82%|████████▏ | 5634/6885 [14:36:56<1:05:35, 3.15s/it] 82%|████████▏ | 5635/6885 [14:36:58<57:40, 2.77s/it] 82%|████████▏ | 5636/6885 [14:37:01<58:24, 2.81s/it] 82%|████████▏ | 5637/6885 [14:37:04<1:00:13, 2.90s/it] 82%|████████▏ | 5638/6885 [14:37:07<1:00:48, 2.93s/it] 82%|████████▏ | 5639/6885 [14:37:10<1:00:16, 2.90s/it] 82%|████████▏ | 5640/6885 [14:37:12<1:00:01, 2.89s/it] {'loss': 0.5536, 'grad_norm': 1.050298389841538, 'learning_rate': 9.65072005112308e-07, 'epoch': 0.82} 82%|████████▏ | 5640/6885 [14:37:12<1:00:01, 2.89s/it] 82%|████████▏ | 5641/6885 [14:37:15<1:00:51, 2.93s/it] 82%|████████▏ | 5642/6885 [14:37:18<57:26, 2.77s/it] 82%|████████▏ | 5643/6885 [14:37:20<54:34, 2.64s/it] 82%|████████▏ | 5644/6885 [14:37:22<49:47, 2.41s/it] 82%|████████▏ | 5645/6885 [14:37:25<55:41, 2.69s/it] 82%|████████▏ | 5646/6885 [14:37:28<54:36, 2.64s/it] 82%|████████▏ | 5647/6885 [14:37:30<51:16, 2.49s/it] 82%|████████▏ | 5648/6885 [14:37:34<57:30, 2.79s/it] 82%|████████▏ | 5649/6885 [14:37:38<1:05:56, 3.20s/it] 82%|████████▏ | 5650/6885 [14:37:41<1:06:56, 3.25s/it] {'loss': 0.5495, 'grad_norm': 1.2990174959407426, 'learning_rate': 9.501519057792275e-07, 'epoch': 0.82} 82%|████████▏ | 5650/6885 [14:37:41<1:06:56, 3.25s/it] 82%|████████▏ | 5651/6885 [14:37:44<1:04:20, 3.13s/it] 82%|████████▏ | 5652/6885 
[14:37:46<59:10, 2.88s/it] 82%|████████▏ | 5653/6885 [14:37:48<53:20, 2.60s/it] 82%|████████▏ | 5654/6885 [14:37:51<53:25, 2.60s/it] 82%|████████▏ | 5655/6885 [14:37:54<56:05, 2.74s/it] 82%|████████▏ | 5656/6885 [14:37:56<55:16, 2.70s/it] 82%|████████▏ | 5657/6885 [14:37:59<55:20, 2.70s/it] 82%|████████▏ | 5658/6885 [14:38:03<1:03:36, 3.11s/it] 82%|████████▏ | 5659/6885 [14:38:08<1:12:43, 3.56s/it] 82%|████████▏ | 5660/6885 [14:38:10<1:02:43, 3.07s/it] {'loss': 0.5557, 'grad_norm': 1.1318695700100998, 'learning_rate': 9.353359217882241e-07, 'epoch': 0.82} 82%|████████▏ | 5660/6885 [14:38:10<1:02:43, 3.07s/it] 82%|████████▏ | 5661/6885 [14:38:12<58:57, 2.89s/it] 82%|████████▏ | 5662/6885 [14:38:15<55:29, 2.72s/it] 82%|████████▏ | 5663/6885 [14:38:16<49:49, 2.45s/it] 82%|████████▏ | 5664/6885 [14:38:18<46:05, 2.27s/it] 82%|████████▏ | 5665/6885 [14:38:20<44:23, 2.18s/it] 82%|████████▏ | 5666/6885 [14:38:22<43:58, 2.16s/it] 82%|████████▏ | 5667/6885 [14:38:25<44:58, 2.22s/it] 82%|████████▏ | 5668/6885 [14:38:27<46:00, 2.27s/it] 82%|████████▏ | 5669/6885 [14:38:33<1:07:21, 3.32s/it] 82%|████████▏ | 5670/6885 [14:38:35<58:59, 2.91s/it] {'loss': 0.5703, 'grad_norm': 1.1818056539247317, 'learning_rate': 9.206244340353732e-07, 'epoch': 0.82} 82%|████████▏ | 5670/6885 [14:38:35<58:59, 2.91s/it] 82%|████████▏ | 5671/6885 [14:38:37<57:12, 2.83s/it] 82%|████████▏ | 5672/6885 [14:38:40<55:03, 2.72s/it] 82%|████████▏ | 5673/6885 [14:38:42<53:16, 2.64s/it] 82%|████████▏ | 5674/6885 [14:38:45<53:58, 2.67s/it] 82%|████████▏ | 5675/6885 [14:38:47<49:20, 2.45s/it] 82%|████████▏ | 5676/6885 [14:38:50<50:00, 2.48s/it] 82%|████████▏ | 5677/6885 [14:38:55<1:07:04, 3.33s/it] 82%|████████▏ | 5678/6885 [14:38:57<1:01:59, 3.08s/it] 82%|████████▏ | 5679/6885 [14:38:59<55:35, 2.77s/it] 82%|████████▏ | 5680/6885 [14:39:04<1:04:00, 3.19s/it] {'loss': 0.5543, 'grad_norm': 1.191491253002993, 'learning_rate': 9.060178207303077e-07, 'epoch': 0.82} 82%|████████▏ | 5680/6885 [14:39:04<1:04:00, 
3.19s/it] 83%|████████▎ | 5681/6885 [14:39:07<1:06:32, 3.32s/it] 83%|████████▎ | 5682/6885 [14:39:09<59:39, 2.98s/it] 83%|████████▎ | 5683/6885 [14:39:11<51:39, 2.58s/it] 83%|████████▎ | 5684/6885 [14:39:14<52:15, 2.61s/it] 83%|████████▎ | 5685/6885 [14:39:16<51:13, 2.56s/it] 83%|████████▎ | 5686/6885 [14:39:18<47:06, 2.36s/it] 83%|████████▎ | 5687/6885 [14:39:21<50:30, 2.53s/it] 83%|████████▎ | 5688/6885 [14:39:23<47:38, 2.39s/it] 83%|████████▎ | 5689/6885 [14:39:26<49:54, 2.50s/it] 83%|████████▎ | 5690/6885 [14:39:28<51:01, 2.56s/it] {'loss': 0.5673, 'grad_norm': 1.2775803771232788, 'learning_rate': 8.915164573865109e-07, 'epoch': 0.83} 83%|████████▎ | 5690/6885 [14:39:28<51:01, 2.56s/it] 83%|████████▎ | 5691/6885 [14:39:31<52:28, 2.64s/it] 83%|████████▎ | 5692/6885 [14:39:38<1:17:05, 3.88s/it] 83%|████████▎ | 5693/6885 [14:39:43<1:24:41, 4.26s/it] 83%|████████▎ | 5694/6885 [14:39:45<1:12:50, 3.67s/it] 83%|████████▎ | 5695/6885 [14:39:49<1:13:04, 3.68s/it] 83%|████████▎ | 5696/6885 [14:39:52<1:08:16, 3.45s/it] 83%|████████▎ | 5697/6885 [14:39:55<1:07:59, 3.43s/it] 83%|████████▎ | 5698/6885 [14:39:59<1:06:55, 3.38s/it] 83%|████████▎ | 5699/6885 [14:40:02<1:04:41, 3.27s/it] 83%|████████▎ | 5700/6885 [14:40:04<59:59, 3.04s/it] {'loss': 0.5526, 'grad_norm': 1.0993365384271814, 'learning_rate': 8.771207168116407e-07, 'epoch': 0.83} 83%|████████▎ | 5700/6885 [14:40:04<59:59, 3.04s/it] 83%|████████▎ | 5701/6885 [14:40:07<57:06, 2.89s/it] 83%|████████▎ | 5702/6885 [14:40:11<1:02:02, 3.15s/it] 83%|████████▎ | 5703/6885 [14:40:15<1:08:52, 3.50s/it] 83%|████████▎ | 5704/6885 [14:40:17<1:02:10, 3.16s/it] 83%|████████▎ | 5705/6885 [14:40:20<1:00:13, 3.06s/it] 83%|████████▎ | 5706/6885 [14:40:23<1:00:18, 3.07s/it] 83%|████████▎ | 5707/6885 [14:40:28<1:07:43, 3.45s/it] 83%|████████▎ | 5708/6885 [14:40:31<1:08:09, 3.47s/it] 83%|████████▎ | 5709/6885 [14:40:37<1:24:37, 4.32s/it] 83%|████████▎ | 5710/6885 [14:40:39<1:10:49, 3.62s/it] {'loss': 0.5465, 'grad_norm': 
1.2010857578242673, 'learning_rate': 8.628309690979658e-07, 'epoch': 0.83} 83%|████████▎ | 5710/6885 [14:40:39<1:10:49, 3.62s/it] 83%|████████▎ | 5711/6885 [14:40:43<1:08:45, 3.51s/it] 83%|████████▎ | 5712/6885 [14:40:46<1:10:55, 3.63s/it] 83%|████████▎ | 5713/6885 [14:40:51<1:13:18, 3.75s/it] 83%|████████▎ | 5714/6885 [14:40:54<1:10:40, 3.62s/it] 83%|████████▎ | 5715/6885 [14:40:56<1:02:43, 3.22s/it] 83%|████████▎ | 5716/6885 [14:40:58<57:07, 2.93s/it] 83%|████████▎ | 5717/6885 [14:41:02<1:00:48, 3.12s/it] 83%|████████▎ | 5718/6885 [14:41:05<57:44, 2.97s/it] 83%|████████▎ | 5719/6885 [14:41:07<55:21, 2.85s/it] 83%|████████▎ | 5720/6885 [14:41:10<57:38, 2.97s/it] {'loss': 0.5522, 'grad_norm': 1.1363204888828164, 'learning_rate': 8.486475816128376e-07, 'epoch': 0.83} 83%|████████▎ | 5720/6885 [14:41:10<57:38, 2.97s/it] 83%|████████▎ | 5721/6885 [14:41:15<1:05:54, 3.40s/it] 83%|████████▎ | 5722/6885 [14:41:18<1:06:30, 3.43s/it] 83%|████████▎ | 5723/6885 [14:41:21<1:03:59, 3.30s/it] 83%|████████▎ | 5724/6885 [14:41:26<1:11:14, 3.68s/it] 83%|████████▎ | 5725/6885 [14:41:28<1:01:02, 3.16s/it] 83%|████████▎ | 5726/6885 [14:41:30<55:59, 2.90s/it] 83%|████████▎ | 5727/6885 [14:41:32<51:47, 2.68s/it] 83%|████████▎ | 5728/6885 [14:41:35<50:42, 2.63s/it] 83%|████████▎ | 5729/6885 [14:41:38<53:02, 2.75s/it] 83%|████████▎ | 5730/6885 [14:41:40<49:46, 2.59s/it] {'loss': 0.5377, 'grad_norm': 1.237168492535083, 'learning_rate': 8.345709189892504e-07, 'epoch': 0.83} 83%|████████▎ | 5730/6885 [14:41:40<49:46, 2.59s/it] 83%|████████▎ | 5731/6885 [14:41:42<48:06, 2.50s/it] 83%|████████▎ | 5732/6885 [14:41:46<55:07, 2.87s/it] 83%|████████▎ | 5733/6885 [14:41:50<1:02:02, 3.23s/it] 83%|████████▎ | 5734/6885 [14:41:53<57:43, 3.01s/it] 83%|████████▎ | 5735/6885 [14:41:55<56:22, 2.94s/it] 83%|████████▎ | 5736/6885 [14:42:00<1:04:58, 3.39s/it] 83%|████████▎ | 5737/6885 [14:42:03<1:04:50, 3.39s/it] 83%|████████▎ | 5738/6885 [14:42:06<59:58, 3.14s/it] 83%|████████▎ | 5739/6885 [14:42:08<53:52, 
2.82s/it] 83%|████████▎ | 5740/6885 [14:42:10<51:21, 2.69s/it] {'loss': 0.5613, 'grad_norm': 1.1890926723132464, 'learning_rate': 8.206013431164683e-07, 'epoch': 0.83} 83%|████████▎ | 5740/6885 [14:42:10<51:21, 2.69s/it] 83%|████████▎ | 5741/6885 [14:42:13<50:53, 2.67s/it] 83%|████████▎ | 5742/6885 [14:42:15<46:07, 2.42s/it] 83%|████████▎ | 5743/6885 [14:42:19<57:25, 3.02s/it] 83%|████████▎ | 5744/6885 [14:42:22<57:47, 3.04s/it] 83%|████████▎ | 5745/6885 [14:42:25<53:57, 2.84s/it] 83%|████████▎ | 5746/6885 [14:42:29<1:01:03, 3.22s/it] 83%|████████▎ | 5747/6885 [14:42:31<58:32, 3.09s/it] 83%|████████▎ | 5748/6885 [14:42:34<56:36, 2.99s/it] 84%|████████▎ | 5749/6885 [14:42:37<55:52, 2.95s/it] 84%|████████▎ | 5750/6885 [14:42:40<56:28, 2.99s/it] {'loss': 0.5562, 'grad_norm': 1.2611972496063513, 'learning_rate': 8.0673921313072e-07, 'epoch': 0.84} 84%|████████▎ | 5750/6885 [14:42:40<56:28, 2.99s/it] 84%|████████▎ | 5751/6885 [14:42:44<59:38, 3.16s/it] 84%|████████▎ | 5752/6885 [14:42:48<1:07:34, 3.58s/it] 84%|████████▎ | 5753/6885 [14:42:52<1:10:55, 3.76s/it] 84%|████████▎ | 5754/6885 [14:42:55<1:04:37, 3.43s/it] 84%|████████▎ | 5755/6885 [14:42:57<58:17, 3.09s/it] 84%|████████▎ | 5756/6885 [14:42:59<50:58, 2.71s/it] 84%|████████▎ | 5757/6885 [14:43:02<50:20, 2.68s/it] 84%|████████▎ | 5758/6885 [14:43:07<1:04:24, 3.43s/it] 84%|████████▎ | 5759/6885 [14:43:10<1:01:11, 3.26s/it] 84%|████████▎ | 5760/6885 [14:43:12<55:55, 2.98s/it] {'loss': 0.5469, 'grad_norm': 1.1453681982727373, 'learning_rate': 7.929848854059663e-07, 'epoch': 0.84} 84%|████████▎ | 5760/6885 [14:43:12<55:55, 2.98s/it] 84%|████████▎ | 5761/6885 [14:43:15<52:19, 2.79s/it] 84%|████████▎ | 5762/6885 [14:43:17<50:22, 2.69s/it] 84%|████████▎ | 5763/6885 [14:43:19<48:22, 2.59s/it] 84%|████████▎ | 5764/6885 [14:43:22<46:24, 2.48s/it] 84%|████████▎ | 5765/6885 [14:43:23<42:56, 2.30s/it] 84%|████████▎ | 5766/6885 [14:43:28<53:23, 2.86s/it] 84%|████████▍ | 5767/6885 [14:43:30<49:36, 2.66s/it] 84%|████████▍ | 
5768/6885 [14:43:35<1:01:09, 3.28s/it] 84%|████████▍ | 5769/6885 [14:43:38<1:03:02, 3.39s/it] 84%|████████▍ | 5770/6885 [14:43:41<59:59, 3.23s/it] {'loss': 0.5688, 'grad_norm': 1.1161546893459802, 'learning_rate': 7.793387135447372e-07, 'epoch': 0.84} 84%|████████▍ | 5770/6885 [14:43:41<59:59, 3.23s/it] 84%|████████▍ | 5771/6885 [14:43:45<1:02:50, 3.39s/it] 84%|████████▍ | 5772/6885 [14:43:46<53:18, 2.87s/it] 84%|████████▍ | 5773/6885 [14:43:49<49:49, 2.69s/it] 84%|████████▍ | 5774/6885 [14:43:52<50:13, 2.71s/it] 84%|████████▍ | 5775/6885 [14:43:54<50:49, 2.75s/it] 84%|████████▍ | 5776/6885 [14:44:03<1:21:50, 4.43s/it] 84%|████████▍ | 5777/6885 [14:44:07<1:21:53, 4.43s/it] 84%|████████▍ | 5778/6885 [14:44:09<1:09:36, 3.77s/it] 84%|████████▍ | 5779/6885 [14:44:13<1:11:04, 3.86s/it] 84%|████████▍ | 5780/6885 [14:44:15<1:00:00, 3.26s/it] {'loss': 0.5516, 'grad_norm': 1.242951008236561, 'learning_rate': 7.658010483690431e-07, 'epoch': 0.84} 84%|████████▍ | 5780/6885 [14:44:15<1:00:00, 3.26s/it] 84%|████████▍ | 5781/6885 [14:44:18<56:30, 3.07s/it] 84%|████████▍ | 5782/6885 [14:44:21<55:03, 3.00s/it] 84%|████████▍ | 5783/6885 [14:44:24<53:48, 2.93s/it] 84%|████████▍ | 5784/6885 [14:44:26<52:12, 2.85s/it] 84%|████████▍ | 5785/6885 [14:44:30<57:04, 3.11s/it] 84%|████████▍ | 5786/6885 [14:44:35<1:10:39, 3.86s/it] 84%|████████▍ | 5787/6885 [14:44:39<1:09:24, 3.79s/it] 84%|████████▍ | 5788/6885 [14:44:42<1:03:56, 3.50s/it] 84%|████████▍ | 5789/6885 [14:44:44<55:31, 3.04s/it] 84%|████████▍ | 5790/6885 [14:44:47<55:22, 3.03s/it] {'loss': 0.5558, 'grad_norm': 1.1291848404892897, 'learning_rate': 7.52372237911358e-07, 'epoch': 0.84} 84%|████████▍ | 5790/6885 [14:44:47<55:22, 3.03s/it] 84%|████████▍ | 5791/6885 [14:44:50<57:50, 3.17s/it] 84%|████████▍ | 5792/6885 [14:44:54<1:00:41, 3.33s/it] 84%|████████▍ | 5793/6885 [14:44:58<1:00:59, 3.35s/it] 84%|████████▍ | 5794/6885 [14:45:01<1:03:20, 3.48s/it] 84%|████████▍ | 5795/6885 [14:45:03<55:07, 3.03s/it] 84%|████████▍ | 5796/6885 
[14:45:07<59:27, 3.28s/it] 84%|████████▍ | 5797/6885 [14:45:11<1:03:39, 3.51s/it] 84%|████████▍ | 5798/6885 [14:45:15<1:03:33, 3.51s/it] 84%|████████▍ | 5799/6885 [14:45:18<1:04:40, 3.57s/it] 84%|████████▍ | 5800/6885 [14:45:20<55:18, 3.06s/it] {'loss': 0.5368, 'grad_norm': 1.1344340429459099, 'learning_rate': 7.390526274056625e-07, 'epoch': 0.84} 84%|████████▍ | 5800/6885 [14:45:20<55:18, 3.06s/it] 84%|████████▍ | 5801/6885 [14:45:23<50:51, 2.81s/it] 84%|████████▍ | 5802/6885 [14:45:26<56:45, 3.14s/it] 84%|████████▍ | 5803/6885 [14:45:29<51:17, 2.84s/it] 84%|████████▍ | 5804/6885 [14:45:32<52:29, 2.91s/it] 84%|████████▍ | 5805/6885 [14:45:37<1:03:10, 3.51s/it] 84%|████████▍ | 5806/6885 [14:45:39<57:58, 3.22s/it] 84%|████████▍ | 5807/6885 [14:45:41<50:52, 2.83s/it] 84%|████████▍ | 5808/6885 [14:45:44<51:49, 2.89s/it] 84%|████████▍ | 5809/6885 [14:45:47<49:31, 2.76s/it] 84%|████████▍ | 5810/6885 [14:45:49<45:35, 2.54s/it] {'loss': 0.5438, 'grad_norm': 1.2369341276497008, 'learning_rate': 7.25842559278584e-07, 'epoch': 0.84} 84%|████████▍ | 5810/6885 [14:45:49<45:35, 2.54s/it] 84%|████████▍ | 5811/6885 [14:45:53<57:59, 3.24s/it] 84%|████████▍ | 5812/6885 [14:45:57<1:01:33, 3.44s/it] 84%|████████▍ | 5813/6885 [14:46:00<58:11, 3.26s/it] 84%|████████▍ | 5814/6885 [14:46:03<53:57, 3.02s/it] 84%|████████▍ | 5815/6885 [14:46:05<52:26, 2.94s/it] 84%|████████▍ | 5816/6885 [14:46:08<52:27, 2.94s/it] 84%|████████▍ | 5817/6885 [14:46:11<48:40, 2.73s/it] 85%|████████▍ | 5818/6885 [14:46:12<43:38, 2.45s/it] 85%|████████▍ | 5819/6885 [14:46:15<45:21, 2.55s/it] 85%|████████▍ | 5820/6885 [14:46:17<42:27, 2.39s/it] {'loss': 0.5524, 'grad_norm': 1.161564478717058, 'learning_rate': 7.127423731405747e-07, 'epoch': 0.85} 85%|████████▍ | 5820/6885 [14:46:17<42:27, 2.39s/it] 85%|████████▍ | 5821/6885 [14:46:21<50:22, 2.84s/it] 85%|████████▍ | 5822/6885 [14:46:23<43:21, 2.45s/it] 85%|████████▍ | 5823/6885 [14:46:24<39:26, 2.23s/it] 85%|████████▍ | 5824/6885 [14:46:26<38:32, 2.18s/it] 
85%|████████▍ | 5825/6885 [14:46:29<41:06, 2.33s/it] 85%|████████▍ | 5826/6885 [14:46:31<37:42, 2.14s/it] 85%|████████▍ | 5827/6885 [14:46:34<41:32, 2.36s/it] 85%|████████▍ | 5828/6885 [14:46:36<40:02, 2.27s/it] 85%|████████▍ | 5829/6885 [14:46:39<43:33, 2.48s/it] 85%|████████▍ | 5830/6885 [14:46:41<40:57, 2.33s/it] {'loss': 0.5411, 'grad_norm': 1.3389378618000198, 'learning_rate': 6.997524057771964e-07, 'epoch': 0.85} 85%|████████▍ | 5830/6885 [14:46:41<40:57, 2.33s/it] 85%|████████▍ | 5831/6885 [14:46:44<43:47, 2.49s/it] 85%|████████▍ | 5832/6885 [14:46:47<50:05, 2.85s/it] 85%|████████▍ | 5833/6885 [14:46:50<48:10, 2.75s/it] 85%|████████▍ | 5834/6885 [14:46:52<46:29, 2.65s/it] 85%|████████▍ | 5835/6885 [14:46:54<42:45, 2.44s/it] 85%|████████▍ | 5836/6885 [14:46:57<43:22, 2.48s/it] 85%|████████▍ | 5837/6885 [14:46:59<43:07, 2.47s/it] 85%|████████▍ | 5838/6885 [14:47:02<46:07, 2.64s/it] 85%|████████▍ | 5839/6885 [14:47:04<41:04, 2.36s/it] 85%|████████▍ | 5840/6885 [14:47:08<50:12, 2.88s/it] {'loss': 0.5594, 'grad_norm': 1.2324708082947882, 'learning_rate': 6.868729911404582e-07, 'epoch': 0.85} 85%|████████▍ | 5840/6885 [14:47:08<50:12, 2.88s/it] 85%|████████▍ | 5841/6885 [14:47:10<47:01, 2.70s/it] 85%|████████▍ | 5842/6885 [14:47:13<47:42, 2.74s/it] 85%|████████▍ | 5843/6885 [14:47:17<54:39, 3.15s/it] 85%|████████▍ | 5844/6885 [14:47:20<53:07, 3.06s/it] 85%|████████▍ | 5845/6885 [14:47:23<53:25, 3.08s/it] 85%|████████▍ | 5846/6885 [14:47:26<50:29, 2.92s/it] 85%|████████▍ | 5847/6885 [14:47:28<49:34, 2.87s/it] 85%|████████▍ | 5848/6885 [14:47:31<48:21, 2.80s/it] 85%|████████▍ | 5849/6885 [14:47:34<47:58, 2.78s/it] 85%|████████▍ | 5850/6885 [14:47:39<58:26, 3.39s/it] {'loss': 0.5394, 'grad_norm': 1.0931906751127958, 'learning_rate': 6.741044603402214e-07, 'epoch': 0.85} 85%|████████▍ | 5850/6885 [14:47:39<58:26, 3.39s/it] 85%|████████▍ | 5851/6885 [14:47:41<54:37, 3.17s/it] 85%|████████▍ | 5852/6885 [14:47:46<1:00:04, 3.49s/it] 85%|████████▌ | 5853/6885 
[14:47:47<49:42, 2.89s/it] 85%|████████▌ | 5854/6885 [14:47:50<52:32, 3.06s/it] 85%|████████▌ | 5855/6885 [14:47:55<57:46, 3.37s/it] 85%|████████▌ | 5856/6885 [14:47:57<52:56, 3.09s/it] 85%|████████▌ | 5857/6885 [14:47:59<49:30, 2.89s/it] 85%|████████▌ | 5858/6885 [14:48:02<46:04, 2.69s/it] 85%|████████▌ | 5859/6885 [14:48:05<47:15, 2.76s/it] 85%|████████▌ | 5860/6885 [14:48:07<44:28, 2.60s/it] {'loss': 0.5517, 'grad_norm': 1.1045798920330345, 'learning_rate': 6.614471416357055e-07, 'epoch': 0.85} 85%|████████▌ | 5860/6885 [14:48:07<44:28, 2.60s/it] 85%|████████▌ | 5861/6885 [14:48:10<46:56, 2.75s/it] 85%|████████▌ | 5862/6885 [14:48:13<48:16, 2.83s/it] 85%|████████▌ | 5863/6885 [14:48:16<49:00, 2.88s/it] 85%|████████▌ | 5864/6885 [14:48:19<48:20, 2.84s/it] 85%|████████▌ | 5865/6885 [14:48:22<49:21, 2.90s/it] 85%|████████▌ | 5866/6885 [14:48:25<51:17, 3.02s/it] 85%|████████▌ | 5867/6885 [14:48:27<47:48, 2.82s/it] 85%|████████▌ | 5868/6885 [14:48:31<49:48, 2.94s/it] 85%|████████▌ | 5869/6885 [14:48:34<50:04, 2.96s/it] 85%|████████▌ | 5870/6885 [14:48:37<50:03, 2.96s/it] {'loss': 0.5432, 'grad_norm': 1.1003308882789462, 'learning_rate': 6.489013604270277e-07, 'epoch': 0.85} 85%|████████▌ | 5870/6885 [14:48:37<50:03, 2.96s/it] 85%|████████▌ | 5871/6885 [14:48:42<1:04:01, 3.79s/it] 85%|████████▌ | 5872/6885 [14:48:45<59:51, 3.55s/it] 85%|████████▌ | 5873/6885 [14:48:48<56:33, 3.35s/it] 85%|████████▌ | 5874/6885 [14:48:53<1:05:57, 3.91s/it] 85%|████████▌ | 5875/6885 [14:48:57<1:06:48, 3.97s/it] 85%|████████▌ | 5876/6885 [14:49:00<59:02, 3.51s/it] 85%|████████▌ | 5877/6885 [14:49:02<52:22, 3.12s/it] 85%|████████▌ | 5878/6885 [14:49:07<1:01:28, 3.66s/it] 85%|████████▌ | 5879/6885 [14:49:09<52:31, 3.13s/it] 85%|████████▌ | 5880/6885 [14:49:12<52:56, 3.16s/it] {'loss': 0.5543, 'grad_norm': 1.1511825195957979, 'learning_rate': 6.364674392468578e-07, 'epoch': 0.85} 85%|████████▌ | 5880/6885 [14:49:12<52:56, 3.16s/it] 85%|████████▌ | 5881/6885 [14:49:15<51:18, 3.07s/it] 
85%|████████▌ | 5882/6885 [14:49:19<56:08, 3.36s/it] 85%|████████▌ | 5883/6885 [14:49:22<53:23, 3.20s/it] 85%|████████▌ | 5884/6885 [14:49:24<46:25, 2.78s/it] 85%|████████▌ | 5885/6885 [14:49:27<46:45, 2.81s/it] 85%|████████▌ | 5886/6885 [14:49:29<47:11, 2.83s/it] 86%|████████▌ | 5887/6885 [14:49:32<48:20, 2.91s/it] 86%|████████▌ | 5888/6885 [14:49:39<1:08:27, 4.12s/it] 86%|████████▌ | 5889/6885 [14:49:42<1:02:17, 3.75s/it] 86%|████████▌ | 5890/6885 [14:49:45<54:37, 3.29s/it] {'loss': 0.5511, 'grad_norm': 1.1016772920186344, 'learning_rate': 6.241456977521115e-07, 'epoch': 0.86} 86%|████████▌ | 5890/6885 [14:49:45<54:37, 3.29s/it] 86%|████████▌ | 5891/6885 [14:49:46<47:22, 2.86s/it] 86%|████████▌ | 5892/6885 [14:49:49<47:45, 2.89s/it] 86%|████████▌ | 5893/6885 [14:49:52<47:16, 2.86s/it] 86%|████████▌ | 5894/6885 [14:49:54<44:04, 2.67s/it] 86%|████████▌ | 5895/6885 [14:49:57<45:02, 2.73s/it] 86%|████████▌ | 5896/6885 [14:50:01<51:39, 3.13s/it] 86%|████████▌ | 5897/6885 [14:50:03<44:46, 2.72s/it] 86%|████████▌ | 5898/6885 [14:50:06<44:17, 2.69s/it] 86%|████████▌ | 5899/6885 [14:50:08<44:10, 2.69s/it] 86%|████████▌ | 5900/6885 [14:50:10<40:45, 2.48s/it] {'loss': 0.5546, 'grad_norm': 1.2345711604547172, 'learning_rate': 6.119364527157401e-07, 'epoch': 0.86} 86%|████████▌ | 5900/6885 [14:50:10<40:45, 2.48s/it] 86%|████████▌ | 5901/6885 [14:50:14<44:37, 2.72s/it] 86%|████████▌ | 5902/6885 [14:50:17<48:17, 2.95s/it] 86%|████████▌ | 5903/6885 [14:50:20<48:35, 2.97s/it] 86%|████████▌ | 5904/6885 [14:50:23<47:01, 2.88s/it] 86%|████████▌ | 5905/6885 [14:50:25<41:50, 2.56s/it] 86%|████████▌ | 5906/6885 [14:50:27<39:22, 2.41s/it] 86%|████████▌ | 5907/6885 [14:50:29<39:13, 2.41s/it] 86%|████████▌ | 5908/6885 [14:50:33<45:43, 2.81s/it] 86%|████████▌ | 5909/6885 [14:50:36<49:35, 3.05s/it] 86%|████████▌ | 5910/6885 [14:50:39<46:26, 2.86s/it] {'loss': 0.5534, 'grad_norm': 1.1026866190660687, 'learning_rate': 5.998400180185838e-07, 'epoch': 0.86} 86%|████████▌ | 5910/6885 
[14:50:39<46:26, 2.86s/it] 86%|████████▌ | 5911/6885 [14:50:43<50:13, 3.09s/it] 86%|████████▌ | 5912/6885 [14:50:46<52:45, 3.25s/it] 86%|████████▌ | 5913/6885 [14:50:50<56:50, 3.51s/it] 86%|████████▌ | 5914/6885 [14:50:55<1:03:27, 3.92s/it] 86%|████████▌ | 5915/6885 [14:50:57<54:25, 3.37s/it] 86%|████████▌ | 5916/6885 [14:50:59<46:50, 2.90s/it] 86%|████████▌ | 5917/6885 [14:51:03<51:38, 3.20s/it] 86%|████████▌ | 5918/6885 [14:51:06<48:48, 3.03s/it] 86%|████████▌ | 5919/6885 [14:51:07<42:34, 2.64s/it] 86%|████████▌ | 5920/6885 [14:51:11<49:16, 3.06s/it] {'loss': 0.5431, 'grad_norm': 1.0696348901565953, 'learning_rate': 5.878567046413025e-07, 'epoch': 0.86} 86%|████████▌ | 5920/6885 [14:51:11<49:16, 3.06s/it] 86%|████████▌ | 5921/6885 [14:51:14<47:43, 2.97s/it] 86%|████████▌ | 5922/6885 [14:51:16<43:21, 2.70s/it] 86%|████████▌ | 5923/6885 [14:51:19<42:43, 2.66s/it] 86%|████████▌ | 5924/6885 [14:51:23<49:15, 3.08s/it] 86%|████████▌ | 5925/6885 [14:51:25<47:19, 2.96s/it] 86%|████████▌ | 5926/6885 [14:51:28<44:03, 2.76s/it] 86%|████████▌ | 5927/6885 [14:51:30<39:57, 2.50s/it] 86%|████████▌ | 5928/6885 [14:51:31<36:19, 2.28s/it] 86%|████████▌ | 5929/6885 [14:51:34<38:37, 2.42s/it] 86%|████████▌ | 5930/6885 [14:51:37<42:18, 2.66s/it] {'loss': 0.5564, 'grad_norm': 1.074925388402079, 'learning_rate': 5.759868206563834e-07, 'epoch': 0.86} 86%|████████▌ | 5930/6885 [14:51:37<42:18, 2.66s/it] 86%|████████▌ | 5931/6885 [14:51:39<37:46, 2.38s/it] 86%|████████▌ | 5932/6885 [14:51:41<36:21, 2.29s/it] 86%|████████▌ | 5933/6885 [14:51:45<44:37, 2.81s/it] 86%|████████▌ | 5934/6885 [14:51:50<54:57, 3.47s/it] 86%|████████▌ | 5935/6885 [14:51:52<49:10, 3.11s/it] 86%|████████▌ | 5936/6885 [14:51:55<47:42, 3.02s/it] 86%|████████▌ | 5937/6885 [14:51:57<43:14, 2.74s/it] 86%|████████▌ | 5938/6885 [14:52:00<41:22, 2.62s/it] 86%|████████▋ | 5939/6885 [14:52:02<40:33, 2.57s/it] 86%|████████▋ | 5940/6885 [14:52:04<38:52, 2.47s/it] {'loss': 0.56, 'grad_norm': 1.1892355845709555, 'learning_rate': 
5.642306712202183e-07, 'epoch': 0.86} 86%|████████▋ | 5940/6885 [14:52:04<38:52, 2.47s/it] 86%|████████▋ | 5941/6885 [14:52:07<40:00, 2.54s/it] 86%|████████▋ | 5942/6885 [14:52:10<43:08, 2.74s/it] 86%|████████▋ | 5943/6885 [14:52:13<44:23, 2.83s/it] 86%|████████▋ | 5944/6885 [14:52:16<43:48, 2.79s/it] 86%|████████▋ | 5945/6885 [14:52:18<41:41, 2.66s/it] 86%|████████▋ | 5946/6885 [14:52:21<43:04, 2.75s/it] 86%|████████▋ | 5947/6885 [14:52:24<42:17, 2.71s/it] 86%|████████▋ | 5948/6885 [14:52:27<45:17, 2.90s/it] 86%|████████▋ | 5949/6885 [14:52:30<42:09, 2.70s/it] 86%|████████▋ | 5950/6885 [14:52:31<37:22, 2.40s/it] {'loss': 0.5477, 'grad_norm': 1.1714018297678883, 'learning_rate': 5.525885585652591e-07, 'epoch': 0.86} 86%|████████▋ | 5950/6885 [14:52:31<37:22, 2.40s/it] 86%|████████▋ | 5951/6885 [14:52:34<36:59, 2.38s/it] 86%|████████▋ | 5952/6885 [14:52:35<33:31, 2.16s/it] 86%|████████▋ | 5953/6885 [14:52:38<34:31, 2.22s/it] 86%|████████▋ | 5954/6885 [14:52:40<33:12, 2.14s/it] 86%|████████▋ | 5955/6885 [14:52:42<36:50, 2.38s/it] 87%|████████▋ | 5956/6885 [14:52:44<34:05, 2.20s/it] 87%|████████▋ | 5957/6885 [14:52:51<55:02, 3.56s/it] 87%|████████▋ | 5958/6885 [14:52:55<55:01, 3.56s/it] 87%|████████▋ | 5959/6885 [14:52:58<55:30, 3.60s/it] 87%|████████▋ | 5960/6885 [14:53:00<49:06, 3.19s/it] {'loss': 0.5561, 'grad_norm': 1.2243789216177572, 'learning_rate': 5.410607819922481e-07, 'epoch': 0.87} 87%|████████▋ | 5960/6885 [14:53:00<49:06, 3.19s/it] 87%|████████▋ | 5961/6885 [14:53:03<46:12, 3.00s/it] 87%|████████▋ | 5962/6885 [14:53:05<43:19, 2.82s/it] 87%|████████▋ | 5963/6885 [14:53:08<44:07, 2.87s/it] 87%|████████▋ | 5964/6885 [14:53:12<46:13, 3.01s/it] 87%|████████▋ | 5965/6885 [14:53:14<44:13, 2.88s/it] 87%|████████▋ | 5966/6885 [14:53:17<44:39, 2.92s/it] 87%|████████▋ | 5967/6885 [14:53:20<42:50, 2.80s/it] 87%|████████▋ | 5968/6885 [14:53:22<39:57, 2.62s/it] 87%|████████▋ | 5969/6885 [14:53:25<41:40, 2.73s/it] 87%|████████▋ | 5970/6885 [14:53:27<38:12, 2.51s/it] 
{'loss': 0.5246, 'grad_norm': 1.158429282768604, 'learning_rate': 5.296476378625237e-07, 'epoch': 0.87} 87%|████████▋ | 5970/6885 [14:53:27<38:12, 2.51s/it] 87%|████████▋ | 5971/6885 [14:53:31<44:42, 2.94s/it] 87%|████████▋ | 5972/6885 [14:53:33<41:44, 2.74s/it] 87%|████████▋ | 5973/6885 [14:53:35<38:38, 2.54s/it] 87%|████████▋ | 5974/6885 [14:53:39<42:06, 2.77s/it] 87%|████████▋ | 5975/6885 [14:53:41<39:05, 2.58s/it] 87%|████████▋ | 5976/6885 [14:53:44<41:11, 2.72s/it] 87%|████████▋ | 5977/6885 [14:53:49<50:48, 3.36s/it] 87%|████████▋ | 5978/6885 [14:53:52<51:47, 3.43s/it] 87%|████████▋ | 5979/6885 [14:53:55<49:33, 3.28s/it] 87%|████████▋ | 5980/6885 [14:53:57<43:49, 2.91s/it] {'loss': 0.5434, 'grad_norm': 1.2064879125921322, 'learning_rate': 5.183494195904015e-07, 'epoch': 0.87} 87%|████████▋ | 5980/6885 [14:53:57<43:49, 2.91s/it] 87%|████████▋ | 5981/6885 [14:53:59<39:35, 2.63s/it] 87%|████████▋ | 5982/6885 [14:54:01<36:04, 2.40s/it] 87%|████████▋ | 5983/6885 [14:54:03<34:28, 2.29s/it] 87%|████████▋ | 5984/6885 [14:54:09<48:45, 3.25s/it] 87%|████████▋ | 5985/6885 [14:54:12<48:56, 3.26s/it] 87%|████████▋ | 5986/6885 [14:54:15<46:20, 3.09s/it] 87%|████████▋ | 5987/6885 [14:54:17<41:37, 2.78s/it] 87%|████████▋ | 5988/6885 [14:54:19<40:02, 2.68s/it] 87%|████████▋ | 5989/6885 [14:54:21<37:47, 2.53s/it] 87%|████████▋ | 5990/6885 [14:54:25<41:19, 2.77s/it] {'loss': 0.556, 'grad_norm': 1.0370084252960212, 'learning_rate': 5.071664176356294e-07, 'epoch': 0.87} 87%|████████▋ | 5990/6885 [14:54:25<41:19, 2.77s/it] 87%|████████▋ | 5991/6885 [14:54:26<36:49, 2.47s/it] 87%|████████▋ | 5992/6885 [14:54:32<52:04, 3.50s/it] 87%|████████▋ | 5993/6885 [14:54:35<47:58, 3.23s/it] 87%|████████▋ | 5994/6885 [14:54:37<44:03, 2.97s/it] 87%|████████▋ | 5995/6885 [14:54:40<41:43, 2.81s/it] 87%|████████▋ | 5996/6885 [14:54:43<44:46, 3.02s/it] 87%|████████▋ | 5997/6885 [14:54:46<41:35, 2.81s/it] 87%|████████▋ | 5998/6885 [14:54:49<44:34, 3.02s/it] 87%|████████▋ | 5999/6885 [14:54:53<47:35, 
3.22s/it] 87%|████████▋ | 6000/6885 [14:54:59<1:01:58, 4.20s/it] {'loss': 0.5349, 'grad_norm': 1.1529022886105922, 'learning_rate': 4.960989194959225e-07, 'epoch': 0.87} 87%|████████▋ | 6000/6885 [14:54:59<1:01:58, 4.20s/it] 87%|████████▋ | 6001/6885 [14:55:03<58:01, 3.94s/it] 87%|████████▋ | 6002/6885 [14:55:06<55:06, 3.74s/it] 87%|████████▋ | 6003/6885 [14:55:08<49:42, 3.38s/it] 87%|████████▋ | 6004/6885 [14:55:11<45:16, 3.08s/it] 87%|████████▋ | 6005/6885 [14:55:14<48:06, 3.28s/it] 87%|████████▋ | 6006/6885 [14:55:17<46:02, 3.14s/it] 87%|████████▋ | 6007/6885 [14:55:21<49:08, 3.36s/it] 87%|████████▋ | 6008/6885 [14:55:24<48:00, 3.28s/it] 87%|████████▋ | 6009/6885 [14:55:28<50:10, 3.44s/it] 87%|████████▋ | 6010/6885 [14:55:31<48:39, 3.34s/it] {'loss': 0.5641, 'grad_norm': 1.0702466803229502, 'learning_rate': 4.851472096995741e-07, 'epoch': 0.87} 87%|████████▋ | 6010/6885 [14:55:31<48:39, 3.34s/it] 87%|████████▋ | 6011/6885 [14:55:34<48:31, 3.33s/it] 87%|████████▋ | 6012/6885 [14:55:37<46:50, 3.22s/it] 87%|████████▋ | 6013/6885 [14:55:40<44:45, 3.08s/it] 87%|████████▋ | 6014/6885 [14:55:45<51:35, 3.55s/it] 87%|████████▋ | 6015/6885 [14:55:47<47:20, 3.26s/it] 87%|████████▋ | 6016/6885 [14:55:51<48:00, 3.31s/it] 87%|████████▋ | 6017/6885 [14:55:55<53:25, 3.69s/it] 87%|████████▋ | 6018/6885 [14:55:58<46:34, 3.22s/it] 87%|████████▋ | 6019/6885 [14:55:59<40:02, 2.77s/it] 87%|████████▋ | 6020/6885 [14:56:03<43:54, 3.05s/it] {'loss': 0.5627, 'grad_norm': 1.195504112892932, 'learning_rate': 4.7431156979813097e-07, 'epoch': 0.87} 87%|████████▋ | 6020/6885 [14:56:03<43:54, 3.05s/it] 87%|████████▋ | 6021/6885 [14:56:06<42:20, 2.94s/it] 87%|████████▋ | 6022/6885 [14:56:10<46:09, 3.21s/it] 87%|████████▋ | 6023/6885 [14:56:14<50:08, 3.49s/it] 87%|████████▋ | 6024/6885 [14:56:16<46:08, 3.22s/it] 88%|████████▊ | 6025/6885 [14:56:19<45:56, 3.21s/it] 88%|████████▊ | 6026/6885 [14:56:21<40:36, 2.84s/it] 88%|████████▊ | 6027/6885 [14:56:23<36:39, 2.56s/it] 88%|████████▊ | 6028/6885 
[14:56:25<34:30, 2.42s/it] 88%|████████▊ | 6029/6885 [14:56:29<37:44, 2.64s/it] 88%|████████▊ | 6030/6885 [14:56:32<41:21, 2.90s/it] {'loss': 0.5457, 'grad_norm': 1.0424744381436926, 'learning_rate': 4.6359227835916954e-07, 'epoch': 0.88} 88%|████████▊ | 6030/6885 [14:56:32<41:21, 2.90s/it] 88%|████████▊ | 6031/6885 [14:56:35<40:55, 2.87s/it] 88%|████████▊ | 6032/6885 [14:56:37<39:11, 2.76s/it] 88%|████████▊ | 6033/6885 [14:56:40<37:51, 2.67s/it] 88%|████████▊ | 6034/6885 [14:56:42<37:10, 2.62s/it] 88%|████████▊ | 6035/6885 [14:56:44<33:46, 2.38s/it] 88%|████████▊ | 6036/6885 [14:56:47<35:51, 2.53s/it] 88%|████████▊ | 6037/6885 [14:56:50<39:10, 2.77s/it] 88%|████████▊ | 6038/6885 [14:56:53<40:34, 2.87s/it] 88%|████████▊ | 6039/6885 [14:56:56<38:17, 2.72s/it] 88%|████████▊ | 6040/6885 [14:56:59<39:47, 2.83s/it] {'loss': 0.5536, 'grad_norm': 1.136106426677912, 'learning_rate': 4.529896109591203e-07, 'epoch': 0.88} 88%|████████▊ | 6040/6885 [14:56:59<39:47, 2.83s/it] 88%|████████▊ | 6041/6885 [14:57:02<39:33, 2.81s/it] 88%|████████▊ | 6042/6885 [14:57:04<39:25, 2.81s/it] 88%|████████▊ | 6043/6885 [14:57:06<34:42, 2.47s/it] 88%|████████▊ | 6044/6885 [14:57:09<35:57, 2.57s/it] 88%|████████▊ | 6045/6885 [14:57:13<42:37, 3.05s/it] 88%|████████▊ | 6046/6885 [14:57:16<40:04, 2.87s/it] 88%|████████▊ | 6047/6885 [14:57:18<39:04, 2.80s/it] 88%|████████▊ | 6048/6885 [14:57:21<37:08, 2.66s/it] 88%|████████▊ | 6049/6885 [14:57:23<36:37, 2.63s/it] 88%|████████▊ | 6050/6885 [14:57:26<38:26, 2.76s/it] {'loss': 0.5512, 'grad_norm': 1.1941194023099557, 'learning_rate': 4.425038401761961e-07, 'epoch': 0.88} 88%|████████▊ | 6050/6885 [14:57:26<38:26, 2.76s/it] 88%|████████▊ | 6051/6885 [14:57:30<43:22, 3.12s/it] 88%|████████▊ | 6052/6885 [14:57:32<36:40, 2.64s/it] 88%|████████▊ | 6053/6885 [14:57:35<38:08, 2.75s/it] 88%|████████▊ | 6054/6885 [14:57:41<53:26, 3.86s/it] 88%|████████▊ | 6055/6885 [14:57:44<50:12, 3.63s/it] 88%|████████▊ | 6056/6885 [14:57:46<42:35, 3.08s/it] 88%|████████▊ 
| 6057/6885 [14:57:49<40:13, 2.91s/it] 88%|████████▊ | 6058/6885 [14:57:51<38:47, 2.81s/it] 88%|████████▊ | 6059/6885 [14:57:55<41:38, 3.02s/it] 88%|████████▊ | 6060/6885 [14:57:58<42:07, 3.06s/it] {'loss': 0.5522, 'grad_norm': 1.1005592964409183, 'learning_rate': 4.3213523558337354e-07, 'epoch': 0.88} 88%|████████▊ | 6060/6885 [14:57:58<42:07, 3.06s/it] 88%|████████▊ | 6061/6885 [14:57:59<35:55, 2.62s/it] 88%|████████▊ | 6062/6885 [14:58:01<32:51, 2.40s/it] 88%|████████▊ | 6063/6885 [14:58:04<35:04, 2.56s/it] 88%|████████▊ | 6064/6885 [14:58:06<32:39, 2.39s/it] 88%|████████▊ | 6065/6885 [14:58:09<34:32, 2.53s/it] 88%|████████▊ | 6066/6885 [14:58:11<33:48, 2.48s/it] 88%|████████▊ | 6067/6885 [14:58:14<35:47, 2.63s/it] 88%|████████▊ | 6068/6885 [14:58:17<36:44, 2.70s/it] 88%|████████▊ | 6069/6885 [14:58:22<43:24, 3.19s/it] 88%|████████▊ | 6070/6885 [14:58:24<40:22, 2.97s/it] {'loss': 0.5389, 'grad_norm': 1.3046172497671011, 'learning_rate': 4.218840637414695e-07, 'epoch': 0.88} 88%|████████▊ | 6070/6885 [14:58:24<40:22, 2.97s/it] 88%|████████▊ | 6071/6885 [14:58:27<38:31, 2.84s/it] 88%|████████▊ | 6072/6885 [14:58:30<40:29, 2.99s/it] 88%|████████▊ | 6073/6885 [14:58:33<40:44, 3.01s/it] 88%|████████▊ | 6074/6885 [14:58:36<40:50, 3.02s/it] 88%|████████▊ | 6075/6885 [14:58:39<39:46, 2.95s/it] 88%|████████▊ | 6076/6885 [14:58:42<38:52, 2.88s/it] 88%|████████▊ | 6077/6885 [14:58:44<35:50, 2.66s/it] 88%|████████▊ | 6078/6885 [14:58:48<40:54, 3.04s/it] 88%|████████▊ | 6079/6885 [14:58:51<42:31, 3.17s/it] 88%|████████▊ | 6080/6885 [14:58:53<38:38, 2.88s/it] {'loss': 0.5637, 'grad_norm': 1.2050786337197097, 'learning_rate': 4.117505881922856e-07, 'epoch': 0.88} 88%|████████▊ | 6080/6885 [14:58:53<38:38, 2.88s/it] 88%|████████▊ | 6081/6885 [14:58:56<36:38, 2.73s/it] 88%|████████▊ | 6082/6885 [14:58:59<37:38, 2.81s/it] 88%|████████▊ | 6083/6885 [14:59:01<37:28, 2.80s/it] 88%|████████▊ | 6084/6885 [14:59:05<39:42, 2.97s/it] 88%|████████▊ | 6085/6885 [14:59:08<39:10, 2.94s/it] 
88%|████████▊ | 6086/6885 [14:59:11<38:53, 2.92s/it] 88%|████████▊ | 6087/6885 [14:59:13<35:33, 2.67s/it] 88%|████████▊ | 6088/6885 [14:59:15<35:44, 2.69s/it] 88%|████████▊ | 6089/6885 [14:59:18<36:20, 2.74s/it] 88%|████████▊ | 6090/6885 [14:59:21<38:27, 2.90s/it] {'loss': 0.5637, 'grad_norm': 1.1086711189663023, 'learning_rate': 4.0173506945183295e-07, 'epoch': 0.88} 88%|████████▊ | 6090/6885 [14:59:21<38:27, 2.90s/it] 88%|████████▊ | 6091/6885 [14:59:24<37:22, 2.82s/it] 88%|████████▊ | 6092/6885 [14:59:28<40:41, 3.08s/it] 88%|████████▊ | 6093/6885 [14:59:30<36:10, 2.74s/it] 89%|████████▊ | 6094/6885 [14:59:33<37:01, 2.81s/it] 89%|████████▊ | 6095/6885 [14:59:35<34:54, 2.65s/it] 89%|████████▊ | 6096/6885 [14:59:39<38:13, 2.91s/it] 89%|████████▊ | 6097/6885 [14:59:41<38:26, 2.93s/it] 89%|████████▊ | 6098/6885 [14:59:44<37:00, 2.82s/it] 89%|████████▊ | 6099/6885 [14:59:47<36:07, 2.76s/it] 89%|████████▊ | 6100/6885 [14:59:49<36:13, 2.77s/it] {'loss': 0.5639, 'grad_norm': 1.142760086036647, 'learning_rate': 3.9183776500363593e-07, 'epoch': 0.89} 89%|████████▊ | 6100/6885 [14:59:49<36:13, 2.77s/it] 89%|████████▊ | 6101/6885 [14:59:52<33:19, 2.55s/it] 89%|████████▊ | 6102/6885 [14:59:55<35:45, 2.74s/it] 89%|████████▊ | 6103/6885 [14:59:57<34:59, 2.68s/it] 89%|████████▊ | 6104/6885 [15:00:00<36:50, 2.83s/it] 89%|████████▊ | 6105/6885 [15:00:03<34:11, 2.63s/it] 89%|████████▊ | 6106/6885 [15:00:05<34:13, 2.64s/it] 89%|████████▊ | 6107/6885 [15:00:08<33:06, 2.55s/it] 89%|████████▊ | 6108/6885 [15:00:12<42:00, 3.24s/it] 89%|████████▊ | 6109/6885 [15:00:15<41:00, 3.17s/it] 89%|████████▊ | 6110/6885 [15:00:20<44:39, 3.46s/it] {'loss': 0.5534, 'grad_norm': 1.211597985547058, 'learning_rate': 3.8205892929211175e-07, 'epoch': 0.89} 89%|████████▊ | 6110/6885 [15:00:20<44:39, 3.46s/it] 89%|████████▉ | 6111/6885 [15:00:23<43:50, 3.40s/it] 89%|████████▉ | 6112/6885 [15:00:27<45:34, 3.54s/it] 89%|████████▉ | 6113/6885 [15:00:29<39:17, 3.05s/it] 89%|████████▉ | 6114/6885 
[15:00:31<37:57, 2.95s/it] 89%|████████▉ | 6115/6885 [15:00:36<45:28, 3.54s/it] 89%|████████▉ | 6116/6885 [15:00:40<46:35, 3.64s/it] 89%|████████▉ | 6117/6885 [15:00:43<43:21, 3.39s/it] 89%|████████▉ | 6118/6885 [15:00:47<46:57, 3.67s/it] 89%|████████▉ | 6119/6885 [15:00:50<41:53, 3.28s/it] 89%|████████▉ | 6120/6885 [15:00:55<49:40, 3.90s/it] {'loss': 0.5514, 'grad_norm': 1.125094111731544, 'learning_rate': 3.7239881371603005e-07, 'epoch': 0.89} 89%|████████▉ | 6120/6885 [15:00:55<49:40, 3.90s/it] 89%|████████▉ | 6121/6885 [15:00:57<43:25, 3.41s/it] 89%|████████▉ | 6122/6885 [15:01:02<47:46, 3.76s/it] 89%|████████▉ | 6123/6885 [15:01:05<44:57, 3.54s/it] 89%|████████▉ | 6124/6885 [15:01:08<43:03, 3.40s/it] 89%|████████▉ | 6125/6885 [15:01:10<39:35, 3.13s/it] 89%|████████▉ | 6126/6885 [15:01:14<42:50, 3.39s/it] 89%|████████▉ | 6127/6885 [15:01:17<39:37, 3.14s/it] 89%|████████▉ | 6128/6885 [15:01:20<37:47, 3.00s/it] 89%|████████▉ | 6129/6885 [15:01:23<39:53, 3.17s/it] 89%|████████▉ | 6130/6885 [15:01:26<37:43, 3.00s/it] {'loss': 0.5593, 'grad_norm': 1.1253410539349802, 'learning_rate': 3.6285766662204735e-07, 'epoch': 0.89} 89%|████████▉ | 6130/6885 [15:01:26<37:43, 3.00s/it] 89%|████████▉ | 6131/6885 [15:01:29<36:43, 2.92s/it] 89%|████████▉ | 6132/6885 [15:01:31<36:22, 2.90s/it] 89%|████████▉ | 6133/6885 [15:01:36<41:43, 3.33s/it] 89%|████████▉ | 6134/6885 [15:01:38<36:59, 2.96s/it] 89%|████████▉ | 6135/6885 [15:01:40<34:10, 2.73s/it] 89%|████████▉ | 6136/6885 [15:01:42<31:58, 2.56s/it] 89%|████████▉ | 6137/6885 [15:01:45<32:46, 2.63s/it] 89%|████████▉ | 6138/6885 [15:01:48<33:47, 2.71s/it] 89%|████████▉ | 6139/6885 [15:01:51<33:42, 2.71s/it] 89%|████████▉ | 6140/6885 [15:01:53<32:31, 2.62s/it] {'loss': 0.5494, 'grad_norm': 1.076054931723469, 'learning_rate': 3.534357332983257e-07, 'epoch': 0.89} 89%|████████▉ | 6140/6885 [15:01:53<32:31, 2.62s/it] 89%|████████▉ | 6141/6885 [15:01:56<34:14, 2.76s/it] 89%|████████▉ | 6142/6885 [15:02:00<39:01, 3.15s/it] 89%|████████▉ 
| 6143/6885 [15:02:03<36:58, 2.99s/it] 89%|████████▉ | 6144/6885 [15:02:06<36:23, 2.95s/it] 89%|████████▉ | 6145/6885 [15:02:09<36:17, 2.94s/it] 89%|████████▉ | 6146/6885 [15:02:10<30:52, 2.51s/it] 89%|████████▉ | 6147/6885 [15:02:12<29:29, 2.40s/it] 89%|████████▉ | 6148/6885 [15:02:16<35:23, 2.88s/it] 89%|████████▉ | 6149/6885 [15:02:18<31:38, 2.58s/it] 89%|████████▉ | 6150/6885 [15:02:21<31:31, 2.57s/it] {'loss': 0.5507, 'grad_norm': 1.2433138382241562, 'learning_rate': 3.441332559682242e-07, 'epoch': 0.89} 89%|████████▉ | 6150/6885 [15:02:21<31:31, 2.57s/it] 89%|████████▉ | 6151/6885 [15:02:23<29:35, 2.42s/it] 89%|████████▉ | 6152/6885 [15:02:25<29:28, 2.41s/it] 89%|████████▉ | 6153/6885 [15:02:28<32:38, 2.68s/it] 89%|████████▉ | 6154/6885 [15:02:33<38:42, 3.18s/it] 89%|████████▉ | 6155/6885 [15:02:35<34:47, 2.86s/it] 89%|████████▉ | 6156/6885 [15:02:38<34:50, 2.87s/it] 89%|████████▉ | 6157/6885 [15:02:40<32:35, 2.69s/it] 89%|████████▉ | 6158/6885 [15:02:44<36:40, 3.03s/it] 89%|████████▉ | 6159/6885 [15:02:48<42:47, 3.54s/it] 89%|████████▉ | 6160/6885 [15:02:51<37:44, 3.12s/it] {'loss': 0.5632, 'grad_norm': 1.172111145318429, 'learning_rate': 3.349504737840742e-07, 'epoch': 0.89} 89%|████████▉ | 6160/6885 [15:02:51<37:44, 3.12s/it] 89%|████████▉ | 6161/6885 [15:02:56<44:23, 3.68s/it] 89%|████████▉ | 6162/6885 [15:02:59<43:56, 3.65s/it] 90%|████████▉ | 6163/6885 [15:03:02<40:13, 3.34s/it] 90%|████████▉ | 6164/6885 [15:03:06<43:19, 3.61s/it] 90%|████████▉ | 6165/6885 [15:03:11<47:12, 3.93s/it] 90%|████████▉ | 6166/6885 [15:03:14<45:44, 3.82s/it] 90%|████████▉ | 6167/6885 [15:03:18<44:07, 3.69s/it] 90%|████████▉ | 6168/6885 [15:03:22<45:45, 3.83s/it] 90%|████████▉ | 6169/6885 [15:03:25<45:00, 3.77s/it] 90%|████████▉ | 6170/6885 [15:03:28<39:41, 3.33s/it] {'loss': 0.5381, 'grad_norm': 1.2018077073853302, 'learning_rate': 3.258876228210267e-07, 'epoch': 0.9} 90%|████████▉ | 6170/6885 [15:03:28<39:41, 3.33s/it] 90%|████████▉ | 6171/6885 [15:03:32<41:59, 3.53s/it] 
90%|████████▉ | 6172/6885 [15:03:37<47:41, 4.01s/it] 90%|████████▉ | 6173/6885 [15:03:44<58:29, 4.93s/it] 90%|████████▉ | 6174/6885 [15:03:46<47:37, 4.02s/it] 90%|████████▉ | 6175/6885 [15:03:48<40:55, 3.46s/it] 90%|████████▉ | 6176/6885 [15:03:50<35:51, 3.03s/it] 90%|████████▉ | 6177/6885 [15:03:52<32:25, 2.75s/it] 90%|████████▉ | 6178/6885 [15:03:54<28:35, 2.43s/it] 90%|████████▉ | 6179/6885 [15:03:56<27:39, 2.35s/it] 90%|████████▉ | 6180/6885 [15:03:59<29:25, 2.50s/it] {'loss': 0.5651, 'grad_norm': 1.1218901853415595, 'learning_rate': 3.169449360709914e-07, 'epoch': 0.9} 90%|████████▉ | 6180/6885 [15:03:59<29:25, 2.50s/it] 90%|████████▉ | 6181/6885 [15:04:02<30:55, 2.64s/it] 90%|████████▉ | 6182/6885 [15:04:05<33:33, 2.86s/it] 90%|████████▉ | 6183/6885 [15:04:07<31:04, 2.66s/it] 90%|████████▉ | 6184/6885 [15:04:10<30:34, 2.62s/it] 90%|████████▉ | 6185/6885 [15:04:15<39:12, 3.36s/it] 90%|████████▉ | 6186/6885 [15:04:18<38:06, 3.27s/it] 90%|████████▉ | 6187/6885 [15:04:20<34:25, 2.96s/it] 90%|████████▉ | 6188/6885 [15:04:22<31:30, 2.71s/it] 90%|████████▉ | 6189/6885 [15:04:27<37:44, 3.25s/it] 90%|████████▉ | 6190/6885 [15:04:31<40:03, 3.46s/it] {'loss': 0.5518, 'grad_norm': 1.075452696669577, 'learning_rate': 3.0812264343663467e-07, 'epoch': 0.9} 90%|████████▉ | 6190/6885 [15:04:31<40:03, 3.46s/it] 90%|████████▉ | 6191/6885 [15:04:34<40:10, 3.47s/it] 90%|████████▉ | 6192/6885 [15:04:38<39:43, 3.44s/it] 90%|████████▉ | 6193/6885 [15:04:40<36:05, 3.13s/it] 90%|████████▉ | 6194/6885 [15:04:43<34:02, 2.96s/it] 90%|████████▉ | 6195/6885 [15:04:46<36:38, 3.19s/it] 90%|████████▉ | 6196/6885 [15:04:48<31:59, 2.79s/it] 90%|█████████ | 6197/6885 [15:04:51<30:29, 2.66s/it] 90%|█████████ | 6198/6885 [15:04:54<31:32, 2.75s/it] 90%|█████████ | 6199/6885 [15:04:56<31:35, 2.76s/it] 90%|█████████ | 6200/6885 [15:04:59<30:06, 2.64s/it] {'loss': 0.5535, 'grad_norm': 1.2898875627777047, 'learning_rate': 2.99420971725482e-07, 'epoch': 0.9} 90%|█████████ | 6200/6885 [15:04:59<30:06, 
2.64s/it] 90%|█████████ | 6201/6885 [15:05:02<31:29, 2.76s/it] 90%|█████████ | 6202/6885 [15:05:04<29:53, 2.63s/it] 90%|█████████ | 6203/6885 [15:05:06<26:24, 2.32s/it] 90%|█████████ | 6204/6885 [15:05:07<24:06, 2.12s/it] 90%|█████████ | 6205/6885 [15:05:10<25:44, 2.27s/it] 90%|█████████ | 6206/6885 [15:05:13<27:26, 2.42s/it] 90%|█████████ | 6207/6885 [15:05:16<30:50, 2.73s/it] 90%|█████████ | 6208/6885 [15:05:20<34:02, 3.02s/it] 90%|█████████ | 6209/6885 [15:05:22<29:12, 2.59s/it] 90%|█████████ | 6210/6885 [15:05:25<32:13, 2.86s/it] {'loss': 0.551, 'grad_norm': 1.064409341720963, 'learning_rate': 2.9084014464407837e-07, 'epoch': 0.9} 90%|█████████ | 6210/6885 [15:05:25<32:13, 2.86s/it] 90%|█████████ | 6211/6885 [15:05:27<30:38, 2.73s/it] 90%|█████████ | 6212/6885 [15:05:30<29:43, 2.65s/it] 90%|█████████ | 6213/6885 [15:05:33<31:25, 2.81s/it] 90%|█████████ | 6214/6885 [15:05:36<31:39, 2.83s/it] 90%|█████████ | 6215/6885 [15:05:38<29:53, 2.68s/it] 90%|█████████ | 6216/6885 [15:05:42<33:07, 2.97s/it] 90%|█████████ | 6217/6885 [15:05:44<31:07, 2.80s/it] 90%|█████████ | 6218/6885 [15:05:49<36:04, 3.24s/it] 90%|█████████ | 6219/6885 [15:05:50<30:56, 2.79s/it] 90%|█████████ | 6220/6885 [15:05:53<29:20, 2.65s/it] {'loss': 0.5351, 'grad_norm': 1.1430289990560287, 'learning_rate': 2.8238038279224e-07, 'epoch': 0.9} 90%|█████████ | 6220/6885 [15:05:53<29:20, 2.65s/it] 90%|█████████ | 6221/6885 [15:05:55<26:48, 2.42s/it] 90%|█████████ | 6222/6885 [15:05:58<30:11, 2.73s/it] 90%|█████████ | 6223/6885 [15:06:02<34:10, 3.10s/it] 90%|█████████ | 6224/6885 [15:06:06<38:28, 3.49s/it] 90%|█████████ | 6225/6885 [15:06:09<35:59, 3.27s/it] 90%|█████████ | 6226/6885 [15:06:11<31:47, 2.89s/it] 90%|█████████ | 6227/6885 [15:06:14<32:36, 2.97s/it] 90%|█████████ | 6228/6885 [15:06:17<31:10, 2.85s/it] 90%|█████████ | 6229/6885 [15:06:19<30:15, 2.77s/it] 90%|█████████ | 6230/6885 [15:06:22<29:14, 2.68s/it] {'loss': 0.5628, 'grad_norm': 1.0942084433621513, 'learning_rate': 
2.740419036573844e-07, 'epoch': 0.9} 90%|█████████ | 6230/6885 [15:06:22<29:14, 2.68s/it] 91%|█████████ | 6231/6885 [15:06:25<29:29, 2.71s/it] 91%|█████████ | 6232/6885 [15:06:27<26:52, 2.47s/it] 91%|█████████ | 6233/6885 [15:06:29<27:16, 2.51s/it] 91%|█████████ | 6234/6885 [15:06:33<31:07, 2.87s/it] 91%|█████████ | 6235/6885 [15:06:36<32:47, 3.03s/it] 91%|█████████ | 6236/6885 [15:06:40<34:48, 3.22s/it] 91%|█████████ | 6237/6885 [15:06:43<35:06, 3.25s/it] 91%|█████████ | 6238/6885 [15:06:46<33:43, 3.13s/it] 91%|█████████ | 6239/6885 [15:06:48<30:56, 2.87s/it] 91%|█████████ | 6240/6885 [15:06:51<28:32, 2.65s/it] {'loss': 0.5698, 'grad_norm': 1.1827726416299507, 'learning_rate': 2.6582492160893536e-07, 'epoch': 0.91} 91%|█████████ | 6240/6885 [15:06:51<28:32, 2.65s/it] 91%|█████████ | 6241/6885 [15:06:53<27:25, 2.55s/it] 91%|█████████ | 6242/6885 [15:06:56<29:22, 2.74s/it] 91%|█████████ | 6243/6885 [15:07:00<34:33, 3.23s/it] 91%|█████████ | 6244/6885 [15:07:05<39:43, 3.72s/it] 91%|█████████ | 6245/6885 [15:07:09<39:08, 3.67s/it] 91%|█████████ | 6246/6885 [15:07:14<42:25, 3.98s/it] 91%|█████████ | 6247/6885 [15:07:15<35:37, 3.35s/it] 91%|█████████ | 6248/6885 [15:07:19<36:44, 3.46s/it] 91%|█████████ | 6249/6885 [15:07:22<35:33, 3.36s/it] 91%|█████████ | 6250/6885 [15:07:24<31:47, 3.00s/it] {'loss': 0.539, 'grad_norm': 1.0512203056975564, 'learning_rate': 2.5772964789281593e-07, 'epoch': 0.91} 91%|█████████ | 6250/6885 [15:07:24<31:47, 3.00s/it] 91%|█████████ | 6251/6885 [15:07:27<31:56, 3.02s/it] 91%|█████████ | 6252/6885 [15:07:32<36:47, 3.49s/it] 91%|█████████ | 6253/6885 [15:07:36<36:48, 3.49s/it] 91%|█████████ | 6254/6885 [15:07:38<32:47, 3.12s/it] 91%|█████████ | 6255/6885 [15:07:41<32:31, 3.10s/it] 91%|█████████ | 6256/6885 [15:07:43<29:09, 2.78s/it] 91%|█████████ | 6257/6885 [15:07:45<27:03, 2.58s/it] 91%|█████████ | 6258/6885 [15:07:47<26:00, 2.49s/it] 91%|█████████ | 6259/6885 [15:07:49<24:59, 2.40s/it] 91%|█████████ | 6260/6885 [15:07:52<25:21, 2.43s/it] 
{'loss': 0.5475, 'grad_norm': 1.177449766279641, 'learning_rate': 2.4975629062601534e-07, 'epoch': 0.91} 91%|█████████ | 6260/6885 [15:07:52<25:21, 2.43s/it] 91%|█████████ | 6261/6885 [15:07:54<25:08, 2.42s/it] 91%|█████████ | 6262/6885 [15:07:57<24:39, 2.38s/it] 91%|█████████ | 6263/6885 [15:07:59<23:11, 2.24s/it] 91%|█████████ | 6264/6885 [15:08:01<25:07, 2.43s/it] 91%|█████████ | 6265/6885 [15:08:04<25:11, 2.44s/it] 91%|█████████ | 6266/6885 [15:08:07<27:29, 2.66s/it] 91%|█████████ | 6267/6885 [15:08:10<28:34, 2.77s/it] 91%|█████████ | 6268/6885 [15:08:13<27:53, 2.71s/it] 91%|█████████ | 6269/6885 [15:08:17<33:20, 3.25s/it] 91%|█████████ | 6270/6885 [15:08:20<33:14, 3.24s/it] {'loss': 0.541, 'grad_norm': 1.2124754199233574, 'learning_rate': 2.419050547912388e-07, 'epoch': 0.91} 91%|█████████ | 6270/6885 [15:08:20<33:14, 3.24s/it] 91%|█████████ | 6271/6885 [15:08:25<36:11, 3.54s/it] 91%|█████████ | 6272/6885 [15:08:28<35:38, 3.49s/it] 91%|█████████ | 6273/6885 [15:08:31<34:55, 3.42s/it] 91%|█████████ | 6274/6885 [15:08:38<44:08, 4.34s/it] 91%|█████████ | 6275/6885 [15:08:40<36:13, 3.56s/it] 91%|█████████ | 6276/6885 [15:08:42<32:06, 3.16s/it] 91%|█████████ | 6277/6885 [15:08:44<28:47, 2.84s/it] 91%|█████████ | 6278/6885 [15:08:48<32:04, 3.17s/it] 91%|█████████ | 6279/6885 [15:08:52<35:11, 3.48s/it] 91%|█████████ | 6280/6885 [15:08:58<42:09, 4.18s/it] {'loss': 0.5588, 'grad_norm': 1.3580937630552576, 'learning_rate': 2.3417614223163908e-07, 'epoch': 0.91} 91%|█████████ | 6280/6885 [15:08:58<42:09, 4.18s/it] 91%|█████████ | 6281/6885 [15:09:00<34:44, 3.45s/it] 91%|█████████ | 6282/6885 [15:09:02<32:13, 3.21s/it] 91%|█████████▏| 6283/6885 [15:09:04<28:09, 2.81s/it] 91%|█████████▏| 6284/6885 [15:09:06<26:39, 2.66s/it] 91%|█████████▏| 6285/6885 [15:09:10<28:46, 2.88s/it] 91%|█████████▏| 6286/6885 [15:09:13<29:31, 2.96s/it] 91%|█████████▏| 6287/6885 [15:09:15<28:14, 2.83s/it] 91%|█████████▏| 6288/6885 [15:09:20<33:44, 3.39s/it] 91%|█████████▏| 6289/6885 
[15:09:24<34:15, 3.45s/it] 91%|█████████▏| 6290/6885 [15:09:27<32:22, 3.26s/it] {'loss': 0.5436, 'grad_norm': 1.1170472146222037, 'learning_rate': 2.26569751645625e-07, 'epoch': 0.91} 91%|█████████▏| 6290/6885 [15:09:27<32:22, 3.26s/it] 91%|█████████▏| 6291/6885 [15:09:29<28:56, 2.92s/it] 91%|█████████▏| 6292/6885 [15:09:30<25:21, 2.57s/it] 91%|█████████▏| 6293/6885 [15:09:35<32:11, 3.26s/it] 91%|█████████▏| 6294/6885 [15:09:39<33:27, 3.40s/it] 91%|█████████▏| 6295/6885 [15:09:42<32:12, 3.28s/it] 91%|█████████▏| 6296/6885 [15:09:46<33:35, 3.42s/it] 91%|█████████▏| 6297/6885 [15:09:50<35:27, 3.62s/it] 91%|█████████▏| 6298/6885 [15:09:54<35:52, 3.67s/it] 91%|█████████▏| 6299/6885 [15:09:58<36:37, 3.75s/it] 92%|█████████▏| 6300/6885 [15:10:01<34:14, 3.51s/it] {'loss': 0.5377, 'grad_norm': 1.1184802548299553, 'learning_rate': 2.1908607858175612e-07, 'epoch': 0.92} 92%|█████████▏| 6300/6885 [15:10:01<34:14, 3.51s/it] 92%|█████████▏| 6301/6885 [15:10:05<36:18, 3.73s/it] 92%|█████████▏| 6302/6885 [15:10:08<34:24, 3.54s/it] 92%|█████████▏| 6303/6885 [15:10:10<30:05, 3.10s/it] 92%|█████████▏| 6304/6885 [15:10:13<29:52, 3.08s/it] 92%|█████████▏| 6305/6885 [15:10:17<31:14, 3.23s/it] 92%|█████████▏| 6306/6885 [15:10:19<28:45, 2.98s/it] 92%|█████████▏| 6307/6885 [15:10:21<26:55, 2.79s/it] 92%|█████████▏| 6308/6885 [15:10:25<30:23, 3.16s/it] 92%|█████████▏| 6309/6885 [15:10:30<33:26, 3.48s/it] 92%|█████████▏| 6310/6885 [15:10:32<31:04, 3.24s/it] {'loss': 0.5683, 'grad_norm': 1.1396702009546613, 'learning_rate': 2.117253154337118e-07, 'epoch': 0.92} 92%|█████████▏| 6310/6885 [15:10:32<31:04, 3.24s/it] 92%|█████████▏| 6311/6885 [15:10:35<29:14, 3.06s/it] 92%|█████████▏| 6312/6885 [15:10:37<26:23, 2.76s/it] 92%|█████████▏| 6313/6885 [15:10:39<25:37, 2.69s/it] 92%|█████████▏| 6314/6885 [15:10:42<25:06, 2.64s/it] 92%|█████████▏| 6315/6885 [15:10:45<25:31, 2.69s/it] 92%|█████████▏| 6316/6885 [15:10:48<26:34, 2.80s/it] 92%|█████████▏| 6317/6885 [15:10:50<25:29, 2.69s/it] 
92%|█████████▏| 6318/6885 [15:10:53<24:31, 2.60s/it] 92%|█████████▏| 6319/6885 [15:10:56<25:27, 2.70s/it] 92%|█████████▏| 6320/6885 [15:10:58<23:36, 2.51s/it] {'loss': 0.5668, 'grad_norm': 1.2119088736658123, 'learning_rate': 2.0448765143534942e-07, 'epoch': 0.92} 92%|█████████▏| 6320/6885 [15:10:58<23:36, 2.51s/it] 92%|█████████▏| 6321/6885 [15:11:03<31:49, 3.38s/it] 92%|█████████▏| 6322/6885 [15:11:06<29:35, 3.15s/it] 92%|█████████▏| 6323/6885 [15:11:09<30:34, 3.26s/it] 92%|█████████▏| 6324/6885 [15:11:12<29:16, 3.13s/it] 92%|█████████▏| 6325/6885 [15:11:14<27:15, 2.92s/it] 92%|█████████▏| 6326/6885 [15:11:17<25:12, 2.71s/it] 92%|█████████▏| 6327/6885 [15:11:19<25:09, 2.70s/it] 92%|█████████▏| 6328/6885 [15:11:23<28:05, 3.03s/it] 92%|█████████▏| 6329/6885 [15:11:25<25:23, 2.74s/it] 92%|█████████▏| 6330/6885 [15:11:28<25:34, 2.76s/it] {'loss': 0.5437, 'grad_norm': 1.0448734314632342, 'learning_rate': 1.973732726558364e-07, 'epoch': 0.92} 92%|█████████▏| 6330/6885 [15:11:28<25:34, 2.76s/it] 92%|█████████▏| 6331/6885 [15:11:31<26:25, 2.86s/it] 92%|█████████▏| 6332/6885 [15:11:33<24:19, 2.64s/it] 92%|█████████▏| 6333/6885 [15:11:36<23:27, 2.55s/it] 92%|█████████▏| 6334/6885 [15:11:38<23:37, 2.57s/it] 92%|█████████▏| 6335/6885 [15:11:41<24:18, 2.65s/it] 92%|█████████▏| 6336/6885 [15:11:43<22:17, 2.44s/it] 92%|█████████▏| 6337/6885 [15:11:46<22:54, 2.51s/it] 92%|█████████▏| 6338/6885 [15:11:47<20:29, 2.25s/it] 92%|█████████▏| 6339/6885 [15:11:49<20:09, 2.22s/it] 92%|█████████▏| 6340/6885 [15:11:51<18:28, 2.03s/it] {'loss': 0.5622, 'grad_norm': 1.2851112602098311, 'learning_rate': 1.9038236199486693e-07, 'epoch': 0.92} 92%|█████████▏| 6340/6885 [15:11:51<18:28, 2.03s/it] 92%|█████████▏| 6341/6885 [15:11:55<23:00, 2.54s/it] 92%|█████████▏| 6342/6885 [15:11:57<22:38, 2.50s/it] 92%|█████████▏| 6343/6885 [15:11:59<20:35, 2.28s/it] 92%|█████████▏| 6344/6885 [15:12:03<25:30, 2.83s/it] 92%|█████████▏| 6345/6885 [15:12:05<23:18, 2.59s/it] 92%|█████████▏| 6346/6885 
[15:12:07<22:36, 2.52s/it] 92%|█████████▏| 6347/6885 [15:12:09<20:48, 2.32s/it] 92%|█████████▏| 6348/6885 [15:12:14<25:48, 2.88s/it] 92%|█████████▏| 6349/6885 [15:12:16<24:34, 2.75s/it] 92%|█████████▏| 6350/6885 [15:12:19<23:59, 2.69s/it] {'loss': 0.542, 'grad_norm': 1.1700640178574329, 'learning_rate': 1.8351509917796218e-07, 'epoch': 0.92} 92%|█████████▏| 6350/6885 [15:12:19<23:59, 2.69s/it] 92%|█████████▏| 6351/6885 [15:12:23<27:28, 3.09s/it] 92%|█████████▏| 6352/6885 [15:12:24<24:08, 2.72s/it] 92%|█████████▏| 6353/6885 [15:12:27<23:24, 2.64s/it] 92%|█████████▏| 6354/6885 [15:12:31<26:13, 2.96s/it] 92%|█████████▏| 6355/6885 [15:12:33<25:05, 2.84s/it] 92%|█████████▏| 6356/6885 [15:12:35<22:36, 2.56s/it] 92%|█████████▏| 6357/6885 [15:12:38<22:41, 2.58s/it] 92%|█████████▏| 6358/6885 [15:12:41<25:25, 2.89s/it] 92%|█████████▏| 6359/6885 [15:12:43<23:11, 2.65s/it] 92%|█████████▏| 6360/6885 [15:12:48<27:52, 3.19s/it] {'loss': 0.5529, 'grad_norm': 1.1416778336018678, 'learning_rate': 1.7677166075184548e-07, 'epoch': 0.92} 92%|█████████▏| 6360/6885 [15:12:48<27:52, 3.19s/it] 92%|█████████▏| 6361/6885 [15:12:52<30:28, 3.49s/it] 92%|█████████▏| 6362/6885 [15:12:54<26:58, 3.09s/it] 92%|█████████▏| 6363/6885 [15:12:57<26:26, 3.04s/it] 92%|█████████▏| 6364/6885 [15:13:00<25:34, 2.95s/it] 92%|█████████▏| 6365/6885 [15:13:03<27:17, 3.15s/it] 92%|█████████▏| 6366/6885 [15:13:08<30:44, 3.55s/it] 92%|█████████▏| 6367/6885 [15:13:11<29:08, 3.38s/it] 92%|█████████▏| 6368/6885 [15:13:13<25:10, 2.92s/it] 93%|█████████▎| 6369/6885 [15:13:16<26:27, 3.08s/it] 93%|█████████▎| 6370/6885 [15:13:18<24:11, 2.82s/it] {'loss': 0.5559, 'grad_norm': 1.1230308913216087, 'learning_rate': 1.7015222007990883e-07, 'epoch': 0.93} 93%|█████████▎| 6370/6885 [15:13:18<24:11, 2.82s/it] 93%|█████████▎| 6371/6885 [15:13:22<25:26, 2.97s/it] 93%|█████████▎| 6372/6885 [15:13:24<23:44, 2.78s/it] 93%|█████████▎| 6373/6885 [15:13:28<26:22, 3.09s/it] 93%|█████████▎| 6374/6885 [15:13:30<24:21, 2.86s/it] 
93%|█████████▎| 6375/6885 [15:13:33<25:04, 2.95s/it] 93%|█████████▎| 6376/6885 [15:13:35<22:11, 2.62s/it] 93%|█████████▎| 6377/6885 [15:13:39<25:44, 3.04s/it] 93%|█████████▎| 6378/6885 [15:13:41<23:12, 2.75s/it] 93%|█████████▎| 6379/6885 [15:13:45<25:03, 2.97s/it] 93%|█████████▎| 6380/6885 [15:13:47<23:09, 2.75s/it] {'loss': 0.5507, 'grad_norm': 1.1568250466964043, 'learning_rate': 1.6365694733775305e-07, 'epoch': 0.93} 93%|█████████▎| 6380/6885 [15:13:47<23:09, 2.75s/it] 93%|█████████▎| 6381/6885 [15:13:51<26:24, 3.14s/it] 93%|█████████▎| 6382/6885 [15:13:53<23:51, 2.85s/it] 93%|█████████▎| 6383/6885 [15:13:56<23:28, 2.81s/it] 93%|█████████▎| 6384/6885 [15:13:58<20:47, 2.49s/it] 93%|█████████▎| 6385/6885 [15:14:00<19:49, 2.38s/it] 93%|█████████▎| 6386/6885 [15:14:04<23:49, 2.87s/it] 93%|█████████▎| 6387/6885 [15:14:06<23:17, 2.81s/it] 93%|█████████▎| 6388/6885 [15:14:08<20:44, 2.50s/it] 93%|█████████▎| 6389/6885 [15:14:11<21:32, 2.61s/it] 93%|█████████▎| 6390/6885 [15:14:14<23:04, 2.80s/it] {'loss': 0.552, 'grad_norm': 1.1602815569402067, 'learning_rate': 1.572860095088108e-07, 'epoch': 0.93} 93%|█████████▎| 6390/6885 [15:14:14<23:04, 2.80s/it] 93%|█████████▎| 6391/6885 [15:14:17<22:55, 2.79s/it] 93%|█████████▎| 6392/6885 [15:14:20<22:22, 2.72s/it] 93%|█████████▎| 6393/6885 [15:14:23<24:49, 3.03s/it] 93%|█████████▎| 6394/6885 [15:14:28<28:18, 3.46s/it] 93%|█████████▎| 6395/6885 [15:14:31<26:44, 3.27s/it] 93%|█████████▎| 6396/6885 [15:14:33<25:06, 3.08s/it] 93%|█████████▎| 6397/6885 [15:14:37<25:56, 3.19s/it] 93%|█████████▎| 6398/6885 [15:14:39<24:15, 2.99s/it] 93%|█████████▎| 6399/6885 [15:14:44<28:14, 3.49s/it] 93%|█████████▎| 6400/6885 [15:14:46<25:49, 3.19s/it] {'loss': 0.5446, 'grad_norm': 1.0423401424679095, 'learning_rate': 1.5103957038005935e-07, 'epoch': 0.93} 93%|█████████▎| 6400/6885 [15:14:46<25:49, 3.19s/it] 93%|█████████▎| 6401/6885 [15:14:50<27:23, 3.40s/it] 93%|█████████▎| 6402/6885 [15:14:52<24:13, 3.01s/it] 93%|█████████▎| 6403/6885 
[15:14:55<23:22, 2.91s/it] 93%|█████████▎| 6404/6885 [15:14:58<23:50, 2.97s/it] 93%|█████████▎| 6405/6885 [15:15:00<21:50, 2.73s/it] 93%|█████████▎| 6406/6885 [15:15:03<21:14, 2.66s/it] 93%|█████████▎| 6407/6885 [15:15:09<29:55, 3.76s/it] 93%|█████████▎| 6408/6885 [15:15:12<26:40, 3.36s/it] 93%|█████████▎| 6409/6885 [15:15:15<26:06, 3.29s/it] 93%|█████████▎| 6410/6885 [15:15:17<23:49, 3.01s/it] {'loss': 0.5473, 'grad_norm': 1.1374874233890928, 'learning_rate': 1.4491779053780298e-07, 'epoch': 0.93} 93%|█████████▎| 6410/6885 [15:15:17<23:49, 3.01s/it] 93%|█████████▎| 6411/6885 [15:15:20<23:29, 2.97s/it] 93%|█████████▎| 6412/6885 [15:15:24<26:18, 3.34s/it] 93%|█████████▎| 6413/6885 [15:15:27<23:45, 3.02s/it] 93%|█████████▎| 6414/6885 [15:15:29<22:30, 2.87s/it] 93%|█████████▎| 6415/6885 [15:15:31<19:15, 2.46s/it] 93%|█████████▎| 6416/6885 [15:15:34<21:16, 2.72s/it] 93%|█████████▎| 6417/6885 [15:15:37<23:14, 2.98s/it] 93%|█████████▎| 6418/6885 [15:15:43<28:45, 3.69s/it] 93%|█████████▎| 6419/6885 [15:15:45<24:40, 3.18s/it] 93%|█████████▎| 6420/6885 [15:15:48<24:16, 3.13s/it] {'loss': 0.5486, 'grad_norm': 1.1755709384042587, 'learning_rate': 1.3892082736355283e-07, 'epoch': 0.93} 93%|█████████▎| 6420/6885 [15:15:48<24:16, 3.13s/it] 93%|█████████▎| 6421/6885 [15:15:50<22:49, 2.95s/it] 93%|█████████▎| 6422/6885 [15:15:53<21:16, 2.76s/it] 93%|█████████▎| 6423/6885 [15:15:55<19:55, 2.59s/it] 93%|█████████▎| 6424/6885 [15:15:57<18:56, 2.46s/it] 93%|█████████▎| 6425/6885 [15:15:59<17:27, 2.28s/it] 93%|█████████▎| 6426/6885 [15:16:01<16:06, 2.11s/it] 93%|█████████▎| 6427/6885 [15:16:04<18:35, 2.44s/it] 93%|█████████▎| 6428/6885 [15:16:07<21:18, 2.80s/it] 93%|█████████▎| 6429/6885 [15:16:10<20:17, 2.67s/it] 93%|█████████▎| 6430/6885 [15:16:13<21:00, 2.77s/it] {'loss': 0.5518, 'grad_norm': 1.1744643775241368, 'learning_rate': 1.3304883502997133e-07, 'epoch': 0.93} 93%|█████████▎| 6430/6885 [15:16:13<21:00, 2.77s/it] 93%|█████████▎| 6431/6885 [15:16:15<20:30, 2.71s/it] 
93%|█████████▎| 6432/6885 [15:16:18<19:48, 2.62s/it] 93%|█████████▎| 6433/6885 [15:16:21<20:09, 2.68s/it] 93%|█████████▎| 6434/6885 [15:16:23<19:06, 2.54s/it] 93%|█████████▎| 6435/6885 [15:16:25<17:20, 2.31s/it] 93%|█████████▎| 6436/6885 [15:16:27<16:57, 2.27s/it] 93%|█████████▎| 6437/6885 [15:16:30<18:25, 2.47s/it] 94%|█████████▎| 6438/6885 [15:16:32<18:56, 2.54s/it] 94%|█████████▎| 6439/6885 [15:16:35<19:29, 2.62s/it] 94%|█████████▎| 6440/6885 [15:16:39<21:47, 2.94s/it] {'loss': 0.5492, 'grad_norm': 1.1216236591765696, 'learning_rate': 1.2730196449691756e-07, 'epoch': 0.94} 94%|█████████▎| 6440/6885 [15:16:39<21:47, 2.94s/it] 94%|█████████▎| 6441/6885 [15:16:48<35:24, 4.79s/it] 94%|█████████▎| 6442/6885 [15:16:50<28:51, 3.91s/it] 94%|█████████▎| 6443/6885 [15:16:53<26:31, 3.60s/it] 94%|█████████▎| 6444/6885 [15:16:55<24:04, 3.28s/it] 94%|█████████▎| 6445/6885 [15:16:58<23:25, 3.19s/it] 94%|█████████▎| 6446/6885 [15:17:01<22:55, 3.13s/it] 94%|█████████▎| 6447/6885 [15:17:05<23:46, 3.26s/it] 94%|█████████▎| 6448/6885 [15:17:09<25:13, 3.46s/it] 94%|█████████▎| 6449/6885 [15:17:11<22:28, 3.09s/it] 94%|█████████▎| 6450/6885 [15:17:14<21:45, 3.00s/it] {'loss': 0.5322, 'grad_norm': 1.1470393369010776, 'learning_rate': 1.2168036350755975e-07, 'epoch': 0.94} 94%|█████████▎| 6450/6885 [15:17:14<21:45, 3.00s/it] 94%|█████████▎| 6451/6885 [15:17:16<20:34, 2.85s/it] 94%|█████████▎| 6452/6885 [15:17:18<18:16, 2.53s/it] 94%|█████████▎| 6453/6885 [15:17:21<19:19, 2.68s/it] 94%|█████████▎| 6454/6885 [15:17:24<19:00, 2.65s/it] 94%|█████████▍| 6455/6885 [15:17:27<19:44, 2.75s/it] 94%|█████████▍| 6456/6885 [15:17:30<20:43, 2.90s/it] 94%|█████████▍| 6457/6885 [15:17:33<20:42, 2.90s/it] 94%|█████████▍| 6458/6885 [15:17:35<20:04, 2.82s/it] 94%|█████████▍| 6459/6885 [15:17:38<19:13, 2.71s/it] 94%|█████████▍| 6460/6885 [15:17:39<16:41, 2.36s/it] {'loss': 0.5616, 'grad_norm': 1.1985354195876317, 'learning_rate': 1.1618417658458003e-07, 'epoch': 0.94} 94%|█████████▍| 6460/6885 
[15:17:39<16:41, 2.36s/it] 94%|█████████▍| 6461/6885 [15:17:42<16:32, 2.34s/it] 94%|█████████▍| 6462/6885 [15:17:45<17:32, 2.49s/it] 94%|█████████▍| 6463/6885 [15:17:47<17:59, 2.56s/it] 94%|█████████▍| 6464/6885 [15:17:50<18:50, 2.69s/it] 94%|█████████▍| 6465/6885 [15:17:53<18:54, 2.70s/it] 94%|█████████▍| 6466/6885 [15:17:57<21:06, 3.02s/it] 94%|█████████▍| 6467/6885 [15:18:00<20:37, 2.96s/it] 94%|█████████▍| 6468/6885 [15:18:04<23:57, 3.45s/it] 94%|█████████▍| 6469/6885 [15:18:06<20:30, 2.96s/it] 94%|█████████▍| 6470/6885 [15:18:08<19:38, 2.84s/it] {'loss': 0.5531, 'grad_norm': 1.1475497479759824, 'learning_rate': 1.1081354502645913e-07, 'epoch': 0.94} 94%|█████████▍| 6470/6885 [15:18:09<19:38, 2.84s/it] 94%|█████████▍| 6471/6885 [15:18:11<19:48, 2.87s/it] 94%|█████████▍| 6472/6885 [15:18:14<18:28, 2.69s/it] 94%|█████████▍| 6473/6885 [15:18:16<17:38, 2.57s/it] 94%|█████████▍| 6474/6885 [15:18:20<20:22, 2.97s/it] 94%|█████████▍| 6475/6885 [15:18:23<19:48, 2.90s/it] 94%|█████████▍| 6476/6885 [15:18:25<18:08, 2.66s/it] 94%|█████████▍| 6477/6885 [15:18:27<17:49, 2.62s/it] 94%|█████████▍| 6478/6885 [15:18:31<19:27, 2.87s/it] 94%|█████████▍| 6479/6885 [15:18:33<18:33, 2.74s/it] 94%|█████████▍| 6480/6885 [15:18:35<16:22, 2.43s/it] {'loss': 0.5472, 'grad_norm': 1.1396353932104606, 'learning_rate': 1.0556860690384252e-07, 'epoch': 0.94} 94%|█████████▍| 6480/6885 [15:18:35<16:22, 2.43s/it] 94%|█████████▍| 6481/6885 [15:18:37<16:23, 2.43s/it] 94%|█████████▍| 6482/6885 [15:18:41<18:43, 2.79s/it] 94%|█████████▍| 6483/6885 [15:18:43<18:07, 2.70s/it] 94%|█████████▍| 6484/6885 [15:18:46<18:10, 2.72s/it] 94%|█████████▍| 6485/6885 [15:18:49<17:58, 2.70s/it] 94%|█████████▍| 6486/6885 [15:18:52<18:49, 2.83s/it] 94%|█████████▍| 6487/6885 [15:18:54<17:38, 2.66s/it] 94%|█████████▍| 6488/6885 [15:18:58<18:51, 2.85s/it] 94%|█████████▍| 6489/6885 [15:19:03<23:23, 3.55s/it] 94%|█████████▍| 6490/6885 [15:19:06<22:10, 3.37s/it] {'loss': 0.5429, 'grad_norm': 1.1215848254083782, 
'learning_rate': 1.0044949705599216e-07, 'epoch': 0.94} 94%|█████████▍| 6490/6885 [15:19:06<22:10, 3.37s/it] 94%|█████████▍| 6491/6885 [15:19:10<23:22, 3.56s/it] 94%|█████████▍| 6492/6885 [15:19:16<29:13, 4.46s/it] 94%|█████████▍| 6493/6885 [15:19:19<25:33, 3.91s/it] 94%|█████████▍| 6494/6885 [15:19:21<22:11, 3.40s/it] 94%|█████████▍| 6495/6885 [15:19:25<22:32, 3.47s/it] 94%|█████████▍| 6496/6885 [15:19:29<25:05, 3.87s/it] 94%|█████████▍| 6497/6885 [15:19:32<22:54, 3.54s/it] 94%|█████████▍| 6498/6885 [15:19:34<19:46, 3.07s/it] 94%|█████████▍| 6499/6885 [15:19:36<18:05, 2.81s/it] 94%|█████████▍| 6500/6885 [15:19:40<18:45, 2.92s/it] {'loss': 0.5418, 'grad_norm': 1.005591582016032, 'learning_rate': 9.545634708731988e-08, 'epoch': 0.94} 94%|█████████▍| 6500/6885 [15:19:40<18:45, 2.92s/it] 94%|█████████▍| 6501/6885 [15:19:42<18:07, 2.83s/it] 94%|█████████▍| 6502/6885 [15:19:45<17:34, 2.75s/it] 94%|█████████▍| 6503/6885 [15:19:48<18:00, 2.83s/it] 94%|█████████▍| 6504/6885 [15:19:51<19:03, 3.00s/it] 94%|█████████▍| 6505/6885 [15:19:54<18:09, 2.87s/it] 94%|█████████▍| 6506/6885 [15:19:57<18:28, 2.93s/it] 95%|█████████▍| 6507/6885 [15:20:00<19:39, 3.12s/it] 95%|█████████▍| 6508/6885 [15:20:03<17:47, 2.83s/it] 95%|█████████▍| 6509/6885 [15:20:07<21:17, 3.40s/it] 95%|█████████▍| 6510/6885 [15:20:10<19:08, 3.06s/it] {'loss': 0.5578, 'grad_norm': 1.215225242394237, 'learning_rate': 9.058928536400058e-08, 'epoch': 0.95} 95%|█████████▍| 6510/6885 [15:20:10<19:08, 3.06s/it] 95%|█████████▍| 6511/6885 [15:20:14<21:47, 3.50s/it] 95%|█████████▍| 6512/6885 [15:20:18<23:04, 3.71s/it] 95%|█████████▍| 6513/6885 [15:20:21<21:27, 3.46s/it] 95%|█████████▍| 6514/6885 [15:20:23<18:05, 2.92s/it] 95%|█████████▍| 6515/6885 [15:20:25<16:34, 2.69s/it] 95%|█████████▍| 6516/6885 [15:20:29<18:54, 3.07s/it] 95%|█████████▍| 6517/6885 [15:20:31<17:11, 2.80s/it] 95%|█████████▍| 6518/6885 [15:20:33<14:55, 2.44s/it] 95%|█████████▍| 6519/6885 [15:20:36<15:44, 2.58s/it] 95%|█████████▍| 6520/6885 
[15:20:41<20:01, 3.29s/it] {'loss': 0.5404, 'grad_norm': 1.152537711229488, 'learning_rate': 8.584843701067935e-08, 'epoch': 0.95} 95%|█████████▍| 6520/6885 [15:20:41<20:01, 3.29s/it] 95%|█████████▍| 6521/6885 [15:20:43<18:05, 2.98s/it] 95%|█████████▍| 6522/6885 [15:20:47<19:48, 3.27s/it] 95%|█████████▍| 6523/6885 [15:20:50<19:56, 3.31s/it] 95%|█████████▍| 6524/6885 [15:20:53<19:31, 3.25s/it] 95%|█████████▍| 6525/6885 [15:20:55<17:34, 2.93s/it] 95%|█████████▍| 6526/6885 [15:20:58<16:23, 2.74s/it] 95%|█████████▍| 6527/6885 [15:21:04<23:17, 3.90s/it] 95%|█████████▍| 6528/6885 [15:21:07<20:44, 3.49s/it] 95%|█████████▍| 6529/6885 [15:21:10<19:16, 3.25s/it] 95%|█████████▍| 6530/6885 [15:21:12<17:06, 2.89s/it] {'loss': 0.5522, 'grad_norm': 1.175848365037797, 'learning_rate': 8.123392390724682e-08, 'epoch': 0.95} 95%|█████████▍| 6530/6885 [15:21:12<17:06, 2.89s/it] 95%|█████████▍| 6531/6885 [15:21:14<15:45, 2.67s/it] 95%|█████████▍| 6532/6885 [15:21:16<14:44, 2.51s/it] 95%|█████████▍| 6533/6885 [15:21:19<15:25, 2.63s/it] 95%|█████████▍| 6534/6885 [15:21:21<15:11, 2.60s/it] 95%|█████████▍| 6535/6885 [15:21:24<16:02, 2.75s/it] 95%|█████████▍| 6536/6885 [15:21:28<16:58, 2.92s/it] 95%|█████████▍| 6537/6885 [15:21:31<18:05, 3.12s/it] 95%|█████████▍| 6538/6885 [15:21:33<15:52, 2.74s/it] 95%|█████████▍| 6539/6885 [15:21:36<16:13, 2.81s/it] 95%|█████████▍| 6540/6885 [15:21:41<18:56, 3.29s/it] {'loss': 0.5564, 'grad_norm': 1.0183498527962453, 'learning_rate': 7.674586468570999e-08, 'epoch': 0.95} 95%|█████████▍| 6540/6885 [15:21:41<18:56, 3.29s/it] 95%|█████████▌| 6541/6885 [15:21:44<18:29, 3.22s/it] 95%|█████████▌| 6542/6885 [15:21:47<18:18, 3.20s/it] 95%|█████████▌| 6543/6885 [15:21:49<15:57, 2.80s/it] 95%|█████████▌| 6544/6885 [15:21:51<15:52, 2.79s/it] 95%|█████████▌| 6545/6885 [15:21:54<14:46, 2.61s/it] 95%|█████████▌| 6546/6885 [15:21:57<15:34, 2.76s/it] 95%|█████████▌| 6547/6885 [15:21:59<14:54, 2.65s/it] 95%|█████████▌| 6548/6885 [15:22:03<16:52, 3.00s/it] 95%|█████████▌| 
6549/6885 [15:22:05<15:01, 2.68s/it] 95%|█████████▌| 6550/6885 [15:22:06<13:05, 2.34s/it] {'loss': 0.5561, 'grad_norm': 1.2151729065782833, 'learning_rate': 7.238437472714466e-08, 'epoch': 0.95} 95%|█████████▌| 6550/6885 [15:22:06<13:05, 2.34s/it] 95%|█████████▌| 6551/6885 [15:22:10<14:28, 2.60s/it] 95%|█████████▌| 6552/6885 [15:22:13<16:19, 2.94s/it] 95%|█████████▌| 6553/6885 [15:22:15<14:43, 2.66s/it] 95%|█████████▌| 6554/6885 [15:22:18<13:57, 2.53s/it] 95%|█████████▌| 6555/6885 [15:22:22<16:50, 3.06s/it] 95%|█████████▌| 6556/6885 [15:22:25<17:21, 3.16s/it] 95%|█████████▌| 6557/6885 [15:22:29<18:24, 3.37s/it] 95%|█████████▌| 6558/6885 [15:22:32<18:06, 3.32s/it] 95%|█████████▌| 6559/6885 [15:22:36<18:18, 3.37s/it] 95%|█████████▌| 6560/6885 [15:22:39<18:07, 3.34s/it] {'loss': 0.5411, 'grad_norm': 1.1402236462651618, 'learning_rate': 6.81495661587217e-08, 'epoch': 0.95} 95%|█████████▌| 6560/6885 [15:22:39<18:07, 3.34s/it] 95%|█████████▌| 6561/6885 [15:22:41<15:20, 2.84s/it] 95%|█████████▌| 6562/6885 [15:22:43<13:54, 2.58s/it] 95%|█████████▌| 6563/6885 [15:22:45<13:13, 2.46s/it] 95%|█████████▌| 6564/6885 [15:22:47<13:00, 2.43s/it] 95%|█████████▌| 6565/6885 [15:22:50<13:41, 2.57s/it] 95%|█████████▌| 6566/6885 [15:22:53<13:42, 2.58s/it] 95%|█████████▌| 6567/6885 [15:22:57<15:27, 2.92s/it] 95%|█████████▌| 6568/6885 [15:23:00<16:38, 3.15s/it] 95%|█████████▌| 6569/6885 [15:23:03<16:05, 3.06s/it] 95%|█████████▌| 6570/6885 [15:23:06<16:10, 3.08s/it] {'loss': 0.5539, 'grad_norm': 1.1521868862152016, 'learning_rate': 6.404154785083383e-08, 'epoch': 0.95} 95%|█████████▌| 6570/6885 [15:23:06<16:10, 3.08s/it] 95%|█████████▌| 6571/6885 [15:23:09<16:03, 3.07s/it] 95%|█████████▌| 6572/6885 [15:23:12<16:10, 3.10s/it] 95%|█████████▌| 6573/6885 [15:23:15<15:12, 2.92s/it] 95%|█████████▌| 6574/6885 [15:23:17<14:11, 2.74s/it] 95%|█████████▌| 6575/6885 [15:23:23<19:03, 3.69s/it] 96%|█████████▌| 6576/6885 [15:23:25<16:42, 3.24s/it] 96%|█████████▌| 6577/6885 [15:23:27<14:45, 2.87s/it] 
96%|█████████▌| 6578/6885 [15:23:30<14:20, 2.80s/it] 96%|█████████▌| 6579/6885 [15:23:33<14:45, 2.89s/it] 96%|█████████▌| 6580/6885 [15:23:36<14:39, 2.88s/it] {'loss': 0.5532, 'grad_norm': 1.1258302178296054, 'learning_rate': 6.006042541428669e-08, 'epoch': 0.96} 96%|█████████▌| 6580/6885 [15:23:36<14:39, 2.88s/it] 96%|█████████▌| 6581/6885 [15:23:39<14:54, 2.94s/it] 96%|█████████▌| 6582/6885 [15:23:41<13:54, 2.75s/it] 96%|█████████▌| 6583/6885 [15:23:45<14:27, 2.87s/it] 96%|█████████▌| 6584/6885 [15:23:47<13:56, 2.78s/it] 96%|█████████▌| 6585/6885 [15:23:49<13:13, 2.65s/it] 96%|█████████▌| 6586/6885 [15:23:51<12:04, 2.42s/it] 96%|█████████▌| 6587/6885 [15:23:54<12:48, 2.58s/it] 96%|█████████▌| 6588/6885 [15:23:57<13:13, 2.67s/it] 96%|█████████▌| 6589/6885 [15:24:00<13:04, 2.65s/it] 96%|█████████▌| 6590/6885 [15:24:03<14:06, 2.87s/it] {'loss': 0.5505, 'grad_norm': 1.173412519187008, 'learning_rate': 5.6206301197594404e-08, 'epoch': 0.96} 96%|█████████▌| 6590/6885 [15:24:03<14:06, 2.87s/it] 96%|█████████▌| 6591/6885 [15:24:06<13:48, 2.82s/it] 96%|█████████▌| 6592/6885 [15:24:09<13:44, 2.82s/it] 96%|█████████▌| 6593/6885 [15:24:11<13:22, 2.75s/it] 96%|█████████▌| 6594/6885 [15:24:13<12:17, 2.53s/it] 96%|█████████▌| 6595/6885 [15:24:16<12:12, 2.52s/it] 96%|█████████▌| 6596/6885 [15:24:19<12:32, 2.60s/it] 96%|█████████▌| 6597/6885 [15:24:20<11:28, 2.39s/it] 96%|█████████▌| 6598/6885 [15:24:23<12:18, 2.57s/it] 96%|█████████▌| 6599/6885 [15:24:27<14:13, 2.98s/it] 96%|█████████▌| 6600/6885 [15:24:32<16:41, 3.51s/it] {'loss': 0.5435, 'grad_norm': 1.136513704911577, 'learning_rate': 5.247927428433885e-08, 'epoch': 0.96} 96%|█████████▌| 6600/6885 [15:24:32<16:41, 3.51s/it] 96%|█████████▌| 6601/6885 [15:24:35<14:59, 3.17s/it] 96%|█████████▌| 6602/6885 [15:24:36<12:52, 2.73s/it] 96%|█████████▌| 6603/6885 [15:24:40<14:13, 3.03s/it] 96%|█████████▌| 6604/6885 [15:24:43<13:58, 2.98s/it] 96%|█████████▌| 6605/6885 [15:24:45<13:11, 2.83s/it] 96%|█████████▌| 6606/6885 [15:24:48<12:59, 
2.79s/it] 96%|█████████▌| 6607/6885 [15:24:51<13:02, 2.82s/it] 96%|█████████▌| 6608/6885 [15:24:54<13:44, 2.98s/it] 96%|█████████▌| 6609/6885 [15:24:58<15:17, 3.32s/it] 96%|█████████▌| 6610/6885 [15:25:01<14:12, 3.10s/it] {'loss': 0.548, 'grad_norm': 1.1972723133655234, 'learning_rate': 4.887944049062843e-08, 'epoch': 0.96} 96%|█████████▌| 6610/6885 [15:25:01<14:12, 3.10s/it] 96%|█████████▌| 6611/6885 [15:25:04<14:02, 3.07s/it] 96%|█████████▌| 6612/6885 [15:25:06<13:15, 2.91s/it] 96%|█████████▌| 6613/6885 [15:25:09<12:13, 2.70s/it] 96%|█████████▌| 6614/6885 [15:25:12<12:25, 2.75s/it] 96%|█████████▌| 6615/6885 [15:25:14<12:22, 2.75s/it] 96%|█████████▌| 6616/6885 [15:25:18<14:11, 3.16s/it] 96%|█████████▌| 6617/6885 [15:25:25<18:05, 4.05s/it] 96%|█████████▌| 6618/6885 [15:25:26<15:09, 3.40s/it] 96%|█████████▌| 6619/6885 [15:25:29<13:39, 3.08s/it] 96%|█████████▌| 6620/6885 [15:25:31<13:05, 2.96s/it] {'loss': 0.5538, 'grad_norm': 1.240930781464282, 'learning_rate': 4.5406892362632185e-08, 'epoch': 0.96} 96%|█████████▌| 6620/6885 [15:25:31<13:05, 2.96s/it] 96%|█████████▌| 6621/6885 [15:25:34<11:58, 2.72s/it] 96%|█████████▌| 6622/6885 [15:25:38<14:22, 3.28s/it] 96%|█████████▌| 6623/6885 [15:25:40<12:39, 2.90s/it] 96%|█████████▌| 6624/6885 [15:25:42<11:29, 2.64s/it] 96%|█████████▌| 6625/6885 [15:25:46<12:44, 2.94s/it] 96%|█████████▌| 6626/6885 [15:25:48<11:45, 2.73s/it] 96%|█████████▋| 6627/6885 [15:25:51<11:46, 2.74s/it] 96%|█████████▋| 6628/6885 [15:25:54<11:55, 2.78s/it] 96%|█████████▋| 6629/6885 [15:25:56<11:12, 2.63s/it] 96%|█████████▋| 6630/6885 [15:25:59<11:34, 2.72s/it] {'loss': 0.5616, 'grad_norm': 1.2645184421648727, 'learning_rate': 4.206171917420121e-08, 'epoch': 0.96} 96%|█████████▋| 6630/6885 [15:25:59<11:34, 2.72s/it] 96%|█████████▋| 6631/6885 [15:26:01<11:15, 2.66s/it] 96%|█████████▋| 6632/6885 [15:26:04<10:49, 2.57s/it] 96%|█████████▋| 6633/6885 [15:26:06<10:41, 2.55s/it] 96%|█████████▋| 6634/6885 [15:26:09<10:28, 2.50s/it] 96%|█████████▋| 6635/6885 
[15:26:12<11:28, 2.75s/it] 96%|█████████▋| 6636/6885 [15:26:15<11:12, 2.70s/it] 96%|█████████▋| 6637/6885 [15:26:17<10:30, 2.54s/it] 96%|█████████▋| 6638/6885 [15:26:20<10:59, 2.67s/it] 96%|█████████▋| 6639/6885 [15:26:22<10:27, 2.55s/it] 96%|█████████▋| 6640/6885 [15:26:24<10:11, 2.49s/it] {'loss': 0.5578, 'grad_norm': 1.1619344530688336, 'learning_rate': 3.884400692457435e-08, 'epoch': 0.96} 96%|█████████▋| 6640/6885 [15:26:24<10:11, 2.49s/it] 96%|█████████▋| 6641/6885 [15:26:28<11:13, 2.76s/it] 96%|█████████▋| 6642/6885 [15:26:31<11:06, 2.74s/it] 96%|█████████▋| 6643/6885 [15:26:33<10:34, 2.62s/it] 96%|█████████▋| 6644/6885 [15:26:35<09:56, 2.48s/it] 97%|█████████▋| 6645/6885 [15:26:39<11:46, 2.94s/it] 97%|█████████▋| 6646/6885 [15:26:43<12:26, 3.12s/it] 97%|█████████▋| 6647/6885 [15:26:45<11:51, 2.99s/it] 97%|█████████▋| 6648/6885 [15:26:48<11:58, 3.03s/it] 97%|█████████▋| 6649/6885 [15:26:51<11:18, 2.88s/it] 97%|█████████▋| 6650/6885 [15:26:55<12:29, 3.19s/it] {'loss': 0.536, 'grad_norm': 1.0415045949293107, 'learning_rate': 3.575383833616497e-08, 'epoch': 0.97} 97%|█████████▋| 6650/6885 [15:26:55<12:29, 3.19s/it] 97%|█████████▋| 6651/6885 [15:26:56<10:39, 2.73s/it] 97%|█████████▋| 6652/6885 [15:27:01<12:58, 3.34s/it] 97%|█████████▋| 6653/6885 [15:27:03<11:06, 2.87s/it] 97%|█████████▋| 6654/6885 [15:27:06<10:55, 2.84s/it] 97%|█████████▋| 6655/6885 [15:27:09<11:39, 3.04s/it] 97%|█████████▋| 6656/6885 [15:27:11<10:13, 2.68s/it] 97%|█████████▋| 6657/6885 [15:27:15<11:07, 2.93s/it] 97%|█████████▋| 6658/6885 [15:27:17<10:40, 2.82s/it] 97%|█████████▋| 6659/6885 [15:27:20<10:10, 2.70s/it] 97%|█████████▋| 6660/6885 [15:27:23<11:01, 2.94s/it] {'loss': 0.5444, 'grad_norm': 1.1707683296063809, 'learning_rate': 3.2791292852437096e-08, 'epoch': 0.97} 97%|█████████▋| 6660/6885 [15:27:23<11:01, 2.94s/it] 97%|█████████▋| 6661/6885 [15:27:26<11:11, 3.00s/it] 97%|█████████▋| 6662/6885 [15:27:29<10:26, 2.81s/it] 97%|█████████▋| 6663/6885 [15:27:31<09:32, 2.58s/it] 
97%|█████████▋| 6664/6885 [15:27:35<11:14, 3.05s/it] 97%|█████████▋| 6665/6885 [15:27:37<10:39, 2.91s/it] 97%|█████████▋| 6666/6885 [15:27:40<10:42, 2.93s/it] 97%|█████████▋| 6667/6885 [15:27:43<10:26, 2.87s/it] 97%|█████████▋| 6668/6885 [15:27:45<09:23, 2.60s/it] 97%|█████████▋| 6669/6885 [15:27:48<10:14, 2.84s/it] 97%|█████████▋| 6670/6885 [15:27:53<11:35, 3.24s/it] {'loss': 0.5604, 'grad_norm': 0.9579807050337852, 'learning_rate': 2.99564466358615e-08, 'epoch': 0.97} 97%|█████████▋| 6670/6885 [15:27:53<11:35, 3.24s/it] 97%|█████████▋| 6671/6885 [15:27:56<11:46, 3.30s/it] 97%|█████████▋| 6672/6885 [15:27:59<11:06, 3.13s/it] 97%|█████████▋| 6673/6885 [15:28:01<10:12, 2.89s/it] 97%|█████████▋| 6674/6885 [15:28:04<09:54, 2.82s/it] 97%|█████████▋| 6675/6885 [15:28:09<11:58, 3.42s/it] 97%|█████████▋| 6676/6885 [15:28:11<10:30, 3.02s/it] 97%|█████████▋| 6677/6885 [15:28:13<09:17, 2.68s/it] 97%|█████████▋| 6678/6885 [15:28:16<10:27, 3.03s/it] 97%|█████████▋| 6679/6885 [15:28:19<09:57, 2.90s/it] 97%|█████████▋| 6680/6885 [15:28:21<09:17, 2.72s/it] {'loss': 0.5495, 'grad_norm': 1.155540906901066, 'learning_rate': 2.7249372565957277e-08, 'epoch': 0.97} 97%|█████████▋| 6680/6885 [15:28:21<09:17, 2.72s/it] 97%|█████████▋| 6681/6885 [15:28:25<10:05, 2.97s/it] 97%|█████████▋| 6682/6885 [15:28:27<09:24, 2.78s/it] 97%|█████████▋| 6683/6885 [15:28:31<09:57, 2.96s/it] 97%|█████████▋| 6684/6885 [15:28:33<09:43, 2.90s/it] 97%|█████████▋| 6685/6885 [15:28:35<08:33, 2.57s/it] 97%|█████████▋| 6686/6885 [15:28:39<10:00, 3.02s/it] 97%|█████████▋| 6687/6885 [15:28:41<09:07, 2.76s/it] 97%|█████████▋| 6688/6885 [15:28:45<09:41, 2.95s/it] 97%|█████████▋| 6689/6885 [15:28:47<08:48, 2.69s/it] 97%|█████████▋| 6690/6885 [15:28:50<09:07, 2.81s/it] {'loss': 0.5483, 'grad_norm': 1.0959456715901421, 'learning_rate': 2.4670140237419428e-08, 'epoch': 0.97} 97%|█████████▋| 6690/6885 [15:28:50<09:07, 2.81s/it] 97%|█████████▋| 6691/6885 [15:28:53<09:11, 2.84s/it] 97%|█████████▋| 6692/6885 
[15:28:57<09:53, 3.08s/it] 97%|█████████▋| 6693/6885 [15:29:01<11:33, 3.61s/it] 97%|█████████▋| 6694/6885 [15:29:03<09:34, 3.01s/it] 97%|█████████▋| 6695/6885 [15:29:06<09:23, 2.96s/it] 97%|█████████▋| 6696/6885 [15:29:09<09:04, 2.88s/it] 97%|█████████▋| 6697/6885 [15:29:11<08:29, 2.71s/it] 97%|█████████▋| 6698/6885 [15:29:14<09:11, 2.95s/it] 97%|█████████▋| 6699/6885 [15:29:18<10:03, 3.25s/it] 97%|█████████▋| 6700/6885 [15:29:23<11:32, 3.74s/it] {'loss': 0.5497, 'grad_norm': 1.0366185075689953, 'learning_rate': 2.2218815958329754e-08, 'epoch': 0.97} 97%|█████████▋| 6700/6885 [15:29:23<11:32, 3.74s/it] 97%|█████████▋| 6701/6885 [15:29:29<13:01, 4.25s/it] 97%|█████████▋| 6702/6885 [15:29:31<11:38, 3.82s/it] 97%|█████████▋| 6703/6885 [15:29:34<10:16, 3.39s/it] 97%|█████████▋| 6704/6885 [15:29:37<10:21, 3.43s/it] 97%|█████████▋| 6705/6885 [15:29:41<10:34, 3.53s/it] 97%|█████████▋| 6706/6885 [15:29:45<10:41, 3.59s/it] 97%|█████████▋| 6707/6885 [15:29:48<10:01, 3.38s/it] 97%|█████████▋| 6708/6885 [15:29:50<09:13, 3.13s/it] 97%|█████████▋| 6709/6885 [15:29:53<08:33, 2.92s/it] 97%|█████████▋| 6710/6885 [15:29:55<07:57, 2.73s/it] {'loss': 0.5634, 'grad_norm': 1.0759294981597065, 'learning_rate': 1.9895462748450444e-08, 'epoch': 0.97} 97%|█████████▋| 6710/6885 [15:29:55<07:57, 2.73s/it] 97%|█████████▋| 6711/6885 [15:29:57<07:05, 2.44s/it] 97%|█████████▋| 6712/6885 [15:29:59<07:17, 2.53s/it] 98%|█████████▊| 6713/6885 [15:30:02<06:57, 2.43s/it] 98%|█████████▊| 6714/6885 [15:30:05<07:52, 2.76s/it] 98%|█████████▊| 6715/6885 [15:30:08<07:25, 2.62s/it] 98%|█████████▊| 6716/6885 [15:30:10<07:35, 2.70s/it] 98%|█████████▊| 6717/6885 [15:30:14<08:24, 3.00s/it] 98%|█████████▊| 6718/6885 [15:30:17<08:10, 2.94s/it] 98%|█████████▊| 6719/6885 [15:30:21<09:27, 3.42s/it] 98%|█████████▊| 6720/6885 [15:30:23<08:13, 2.99s/it] {'loss': 0.5508, 'grad_norm': 1.1209995693338786, 'learning_rate': 1.770014033760592e-08, 'epoch': 0.98} 98%|█████████▊| 6720/6885 [15:30:23<08:13, 2.99s/it] 
98%|█████████▊| 6721/6885 [15:30:25<07:06, 2.60s/it] 98%|█████████▊| 6722/6885 [15:30:28<07:24, 2.73s/it] 98%|█████████▊| 6723/6885 [15:30:30<07:02, 2.61s/it] 98%|█████████▊| 6724/6885 [15:30:32<06:11, 2.31s/it] 98%|█████████▊| 6725/6885 [15:30:36<07:33, 2.83s/it] 98%|█████████▊| 6726/6885 [15:30:40<08:08, 3.07s/it] 98%|█████████▊| 6727/6885 [15:30:42<07:16, 2.76s/it] 98%|█████████▊| 6728/6885 [15:30:46<08:03, 3.08s/it] 98%|█████████▊| 6729/6885 [15:30:48<07:14, 2.78s/it] 98%|█████████▊| 6730/6885 [15:30:51<07:15, 2.81s/it] {'loss': 0.5813, 'grad_norm': 1.210238366549934, 'learning_rate': 1.5632905164145173e-08, 'epoch': 0.98} 98%|█████████▊| 6730/6885 [15:30:51<07:15, 2.81s/it] 98%|█████████▊| 6731/6885 [15:30:53<06:45, 2.63s/it] 98%|█████████▊| 6732/6885 [15:30:55<06:33, 2.57s/it] 98%|█████████▊| 6733/6885 [15:30:58<06:48, 2.69s/it] 98%|█████████▊| 6734/6885 [15:31:01<06:33, 2.61s/it] 98%|█████████▊| 6735/6885 [15:31:03<06:03, 2.42s/it] 98%|█████████▊| 6736/6885 [15:31:05<06:02, 2.43s/it] 98%|█████████▊| 6737/6885 [15:31:10<07:56, 3.22s/it] 98%|█████████▊| 6738/6885 [15:31:13<07:17, 2.97s/it] 98%|█████████▊| 6739/6885 [15:31:14<06:21, 2.61s/it] 98%|█████████▊| 6740/6885 [15:31:18<06:50, 2.83s/it] {'loss': 0.5421, 'grad_norm': 1.15542524575641, 'learning_rate': 1.3693810373494598e-08, 'epoch': 0.98} 98%|█████████▊| 6740/6885 [15:31:18<06:50, 2.83s/it] 98%|█████████▊| 6741/6885 [15:31:20<06:38, 2.77s/it] 98%|█████████▊| 6742/6885 [15:31:24<07:31, 3.16s/it] 98%|█████████▊| 6743/6885 [15:31:27<07:00, 2.96s/it] 98%|█████████▊| 6744/6885 [15:31:29<06:36, 2.81s/it] 98%|█████████▊| 6745/6885 [15:31:31<06:00, 2.57s/it] 98%|█████████▊| 6746/6885 [15:31:34<05:52, 2.54s/it] 98%|█████████▊| 6747/6885 [15:31:38<07:01, 3.06s/it] 98%|█████████▊| 6748/6885 [15:31:41<06:51, 3.00s/it] 98%|█████████▊| 6749/6885 [15:31:44<07:12, 3.18s/it] 98%|█████████▊| 6750/6885 [15:31:47<07:02, 3.13s/it] {'loss': 0.5586, 'grad_norm': 1.194050906215969, 'learning_rate': 1.188290581678575e-08, 
'epoch': 0.98} 98%|█████████▊| 6750/6885 [15:31:47<07:02, 3.13s/it] 98%|█████████▊| 6751/6885 [15:31:49<06:04, 2.72s/it] 98%|█████████▊| 6752/6885 [15:31:51<05:28, 2.47s/it] 98%|█████████▊| 6753/6885 [15:31:53<04:57, 2.26s/it] 98%|█████████▊| 6754/6885 [15:31:55<04:41, 2.15s/it] 98%|█████████▊| 6755/6885 [15:31:59<05:41, 2.63s/it] 98%|█████████▊| 6756/6885 [15:32:00<05:08, 2.39s/it] 98%|█████████▊| 6757/6885 [15:32:02<04:46, 2.24s/it] 98%|█████████▊| 6758/6885 [15:32:05<05:04, 2.39s/it] 98%|█████████▊| 6759/6885 [15:32:08<05:31, 2.63s/it] 98%|█████████▊| 6760/6885 [15:32:11<05:24, 2.59s/it] {'loss': 0.5632, 'grad_norm': 1.1566645017111077, 'learning_rate': 1.0200238049580258e-08, 'epoch': 0.98} 98%|█████████▊| 6760/6885 [15:32:11<05:24, 2.59s/it] 98%|█████████▊| 6761/6885 [15:32:14<06:04, 2.94s/it] 98%|█████████▊| 6762/6885 [15:32:17<05:40, 2.77s/it] 98%|█████████▊| 6763/6885 [15:32:20<05:57, 2.93s/it] 98%|█████████▊| 6764/6885 [15:32:23<06:03, 3.00s/it] 98%|█████████▊| 6765/6885 [15:32:26<05:36, 2.80s/it] 98%|█████████▊| 6766/6885 [15:32:28<05:24, 2.73s/it] 98%|█████████▊| 6767/6885 [15:32:31<05:31, 2.81s/it] 98%|█████████▊| 6768/6885 [15:32:35<06:08, 3.15s/it] 98%|█████████▊| 6769/6885 [15:32:37<05:34, 2.89s/it] 98%|█████████▊| 6770/6885 [15:32:42<06:43, 3.51s/it] {'loss': 0.5368, 'grad_norm': 1.0710546930410338, 'learning_rate': 8.645850330668559e-09, 'epoch': 0.98} 98%|█████████▊| 6770/6885 [15:32:42<06:43, 3.51s/it] 98%|█████████▊| 6771/6885 [15:32:45<06:22, 3.35s/it] 98%|█████████▊| 6772/6885 [15:32:48<05:50, 3.10s/it] 98%|█████████▊| 6773/6885 [15:32:50<05:06, 2.74s/it] 98%|█████████▊| 6774/6885 [15:32:52<04:59, 2.70s/it] 98%|█████████▊| 6775/6885 [15:32:55<04:41, 2.56s/it] 98%|█████████▊| 6776/6885 [15:32:58<05:02, 2.78s/it] 98%|█████████▊| 6777/6885 [15:33:01<05:07, 2.85s/it] 98%|█████████▊| 6778/6885 [15:33:05<05:31, 3.10s/it] 98%|█████████▊| 6779/6885 [15:33:09<06:00, 3.40s/it] 98%|█████████▊| 6780/6885 [15:33:12<05:45, 3.29s/it] {'loss': 0.5388, 
'grad_norm': 1.175731861197897, 'learning_rate': 7.219782620958571e-09, 'epoch': 0.98} 98%|█████████▊| 6780/6885 [15:33:12<05:45, 3.29s/it] 98%|█████████▊| 6781/6885 [15:33:15<05:27, 3.15s/it] 99%|█████████▊| 6782/6885 [15:33:18<05:19, 3.10s/it] 99%|█████████▊| 6783/6885 [15:33:20<05:01, 2.95s/it] 99%|█████████▊| 6784/6885 [15:33:29<07:47, 4.63s/it] 99%|█████████▊| 6785/6885 [15:33:33<07:30, 4.50s/it] 99%|█████████▊| 6786/6885 [15:33:37<07:22, 4.47s/it] 99%|█████████▊| 6787/6885 [15:33:40<06:27, 3.95s/it] 99%|█████████▊| 6788/6885 [15:33:42<05:32, 3.43s/it] 99%|█████████▊| 6789/6885 [15:33:45<05:00, 3.13s/it] 99%|█████████▊| 6790/6885 [15:33:48<04:52, 3.08s/it] {'loss': 0.5585, 'grad_norm': 1.0791848418311811, 'learning_rate': 5.922071582449285e-09, 'epoch': 0.99} 99%|█████████▊| 6790/6885 [15:33:48<04:52, 3.08s/it] 99%|█████████▊| 6791/6885 [15:33:50<04:32, 2.90s/it] 99%|█████████▊| 6792/6885 [15:33:54<05:00, 3.23s/it] 99%|█████████▊| 6793/6885 [15:33:58<05:15, 3.43s/it] 99%|█████████▊| 6794/6885 [15:34:00<04:32, 2.99s/it] 99%|█████████▊| 6795/6885 [15:34:05<05:26, 3.63s/it] 99%|█████████▊| 6796/6885 [15:34:08<05:09, 3.48s/it] 99%|█████████▊| 6797/6885 [15:34:11<04:34, 3.12s/it] 99%|█████████▊| 6798/6885 [15:34:15<05:16, 3.64s/it] 99%|█████████▉| 6799/6885 [15:34:19<05:12, 3.63s/it] 99%|█████████▉| 6800/6885 [15:34:22<04:41, 3.31s/it] {'loss': 0.5603, 'grad_norm': 1.21651622954666, 'learning_rate': 4.752750577288745e-09, 'epoch': 0.99} 99%|█████████▉| 6800/6885 [15:34:22<04:41, 3.31s/it] 99%|█████████▉| 6801/6885 [15:34:24<04:13, 3.01s/it] 99%|█████████▉| 6802/6885 [15:34:30<05:29, 3.97s/it] 99%|█████████▉| 6803/6885 [15:34:33<05:03, 3.70s/it] 99%|█████████▉| 6804/6885 [15:34:35<04:23, 3.26s/it] 99%|█████████▉| 6805/6885 [15:34:44<06:20, 4.75s/it] 99%|█████████▉| 6806/6885 [15:34:47<05:37, 4.27s/it] 99%|█████████▉| 6807/6885 [15:34:49<04:44, 3.65s/it] 99%|█████████▉| 6808/6885 [15:34:54<05:11, 4.04s/it] 99%|█████████▉| 6809/6885 [15:34:56<04:15, 3.37s/it] 
99%|█████████▉| 6810/6885 [15:34:58<03:57, 3.16s/it] {'loss': 0.5713, 'grad_norm': 1.294701087862953, 'learning_rate': 3.711849666914735e-09, 'epoch': 0.99} 99%|█████████▉| 6810/6885 [15:34:58<03:57, 3.16s/it] 99%|█████████▉| 6811/6885 [15:35:02<03:55, 3.18s/it] 99%|█████████▉| 6812/6885 [15:35:04<03:43, 3.06s/it] 99%|█████████▉| 6813/6885 [15:35:10<04:43, 3.94s/it] 99%|█████████▉| 6814/6885 [15:35:13<04:20, 3.67s/it] 99%|█████████▉| 6815/6885 [15:35:16<03:51, 3.30s/it] 99%|█████████▉| 6816/6885 [15:35:19<03:43, 3.24s/it] 99%|█████████▉| 6817/6885 [15:35:21<03:13, 2.84s/it] 99%|█████████▉| 6818/6885 [15:35:25<03:32, 3.17s/it] 99%|█████████▉| 6819/6885 [15:35:27<03:07, 2.84s/it] 99%|█████████▉| 6820/6885 [15:35:30<03:07, 2.89s/it] {'loss': 0.5587, 'grad_norm': 1.100757408335571, 'learning_rate': 2.799395611281508e-09, 'epoch': 0.99} 99%|█████████▉| 6820/6885 [15:35:30<03:07, 2.89s/it] 99%|█████████▉| 6821/6885 [15:35:32<02:59, 2.80s/it] 99%|█████████▉| 6822/6885 [15:35:34<02:40, 2.55s/it] 99%|█████████▉| 6823/6885 [15:35:37<02:41, 2.60s/it] 99%|█████████▉| 6824/6885 [15:35:41<03:03, 3.00s/it] 99%|█████████▉| 6825/6885 [15:35:44<02:51, 2.86s/it] 99%|█████████▉| 6826/6885 [15:35:46<02:40, 2.72s/it] 99%|█████████▉| 6827/6885 [15:35:48<02:29, 2.58s/it] 99%|█████████▉| 6828/6885 [15:35:50<02:15, 2.38s/it] 99%|█████████▉| 6829/6885 [15:35:52<02:11, 2.34s/it] 99%|█████████▉| 6830/6885 [15:35:54<02:03, 2.24s/it] {'loss': 0.5588, 'grad_norm': 1.282263624241459, 'learning_rate': 2.0154118681753322e-09, 'epoch': 0.99} 99%|█████████▉| 6830/6885 [15:35:54<02:03, 2.24s/it] 99%|█████████▉| 6831/6885 [15:35:57<01:59, 2.22s/it] 99%|█████████▉| 6832/6885 [15:35:58<01:49, 2.07s/it] 99%|█████████▉| 6833/6885 [15:36:01<02:03, 2.38s/it] 99%|█████████▉| 6834/6885 [15:36:04<02:07, 2.50s/it] 99%|█████████▉| 6835/6885 [15:36:07<02:04, 2.49s/it] 99%|█████████▉| 6836/6885 [15:36:09<01:53, 2.32s/it] 99%|█████████▉| 6837/6885 [15:36:15<02:44, 3.43s/it] 99%|█████████▉| 6838/6885 [15:36:18<02:35, 
3.32s/it] 99%|█████████▉| 6839/6885 [15:36:20<02:21, 3.07s/it] 99%|█████████▉| 6840/6885 [15:36:23<02:15, 3.01s/it] {'loss': 0.5724, 'grad_norm': 1.0975199346392859, 'learning_rate': 1.3599185926072012e-09, 'epoch': 0.99} 99%|█████████▉| 6840/6885 [15:36:23<02:15, 3.01s/it] 99%|█████████▉| 6841/6885 [15:36:27<02:26, 3.32s/it] 99%|█████████▉| 6842/6885 [15:36:31<02:26, 3.41s/it] 99%|█████████▉| 6843/6885 [15:36:37<03:01, 4.32s/it] 99%|█████████▉| 6844/6885 [15:36:39<02:26, 3.58s/it] 99%|█████████▉| 6845/6885 [15:36:41<02:03, 3.09s/it] 99%|█████████▉| 6846/6885 [15:36:43<01:46, 2.72s/it] 99%|█████████▉| 6847/6885 [15:36:45<01:40, 2.66s/it] 99%|█████████▉| 6848/6885 [15:36:48<01:43, 2.79s/it] 99%|█████████▉| 6849/6885 [15:36:51<01:41, 2.81s/it] 99%|█████████▉| 6850/6885 [15:36:53<01:32, 2.64s/it] {'loss': 0.5621, 'grad_norm': 1.1620574281790235, 'learning_rate': 8.329326362976897e-10, 'epoch': 0.99} 99%|█████████▉| 6850/6885 [15:36:53<01:32, 2.64s/it] 100%|█████████▉| 6851/6885 [15:36:57<01:33, 2.75s/it] 100%|█████████▉| 6852/6885 [15:36:59<01:24, 2.57s/it] 100%|█████████▉| 6853/6885 [15:37:02<01:29, 2.78s/it] 100%|█████████▉| 6854/6885 [15:37:04<01:21, 2.62s/it] 100%|█████████▉| 6855/6885 [15:37:06<01:11, 2.40s/it] 100%|█████████▉| 6856/6885 [15:37:08<01:06, 2.30s/it] 100%|█████████▉| 6857/6885 [15:37:11<01:10, 2.51s/it] 100%|█████████▉| 6858/6885 [15:37:14<01:10, 2.60s/it] 100%|█████████▉| 6859/6885 [15:37:16<01:01, 2.37s/it] 100%|█████████▉| 6860/6885 [15:37:17<00:53, 2.14s/it] {'loss': 0.5506, 'grad_norm': 1.1717561623715795, 'learning_rate': 4.34467547242301e-10, 'epoch': 1.0} 100%|█████████▉| 6860/6885 [15:37:17<00:53, 2.14s/it] 100%|█████████▉| 6861/6885 [15:37:20<00:52, 2.19s/it] 100%|█████████▉| 6862/6885 [15:37:23<00:57, 2.52s/it] 100%|█████████▉| 6863/6885 [15:37:26<01:00, 2.75s/it] 100%|█████████▉| 6864/6885 [15:37:30<01:04, 3.06s/it] 100%|█████████▉| 6865/6885 [15:37:32<00:54, 2.71s/it] 100%|█████████▉| 6866/6885 [15:37:35<00:51, 2.70s/it] 
100%|█████████▉| 6867/6885 [15:37:38<00:52, 2.91s/it] 100%|█████████▉| 6868/6885 [15:37:40<00:46, 2.75s/it] 100%|█████████▉| 6869/6885 [15:37:43<00:43, 2.69s/it] 100%|█████████▉| 6870/6885 [15:37:46<00:39, 2.66s/it] {'loss': 0.5533, 'grad_norm': 1.155270191238308, 'learning_rate': 1.645335693623018e-10, 'epoch': 1.0} 100%|█████████▉| 6870/6885 [15:37:46<00:39, 2.66s/it] 100%|█████████▉| 6871/6885 [15:37:48<00:36, 2.59s/it] 100%|█████████▉| 6872/6885 [15:37:51<00:36, 2.80s/it] 100%|█████████▉| 6873/6885 [15:37:54<00:31, 2.66s/it] 100%|█████████▉| 6874/6885 [15:37:57<00:30, 2.75s/it] 100%|█████████▉| 6875/6885 [15:38:01<00:32, 3.24s/it] 100%|█████████▉| 6876/6885 [15:38:04<00:28, 3.16s/it] 100%|█████████▉| 6877/6885 [15:38:06<00:23, 2.98s/it] 100%|█████████▉| 6878/6885 [15:38:10<00:22, 3.23s/it] 100%|█████████▉| 6879/6885 [15:38:14<00:19, 3.24s/it] 100%|█████████▉| 6880/6885 [15:38:16<00:14, 2.99s/it] {'loss': 0.5538, 'grad_norm': 1.240301119345841, 'learning_rate': 2.3137642244375202e-11, 'epoch': 1.0} 100%|█████████▉| 6880/6885 [15:38:16<00:14, 2.99s/it] 100%|█████████▉| 6881/6885 [15:38:18<00:11, 2.80s/it] 100%|█████████▉| 6882/6885 [15:38:20<00:07, 2.40s/it] 100%|█████████▉| 6883/6885 [15:38:22<00:04, 2.27s/it] 100%|█████████▉| 6884/6885 [15:38:24<00:02, 2.34s/it] 100%|██████████| 6885/6885 [15:38:27<00:00, 2.55s/it]
[INFO|trainer.py:3993] 2025-10-14 18:16:17,635 >> Saving model checkpoint to /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885
[INFO|configuration_utils.py:424] 2025-10-14 18:16:17,642 >> Configuration saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/config.json
[INFO|configuration_utils.py:904] 2025-10-14 18:16:17,644 >> Configuration saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/generation_config.json
[INFO|modeling_utils.py:3730] 2025-10-14 18:16:37,886 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2356] 2025-10-14 18:16:37,890 >> chat template saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/chat_template.jinja
[INFO|tokenization_utils_base.py:2525] 2025-10-14 18:16:37,893 >> tokenizer config file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/tokenizer_config.json
[INFO|tokenization_utils_base.py:2534] 2025-10-14 18:16:37,895 >> Special tokens file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/special_tokens_map.json
[2025-10-14 18:16:38,628] [INFO] [logging.py:107:log_dist] [Rank 0] [Torch] Checkpoint global_step6885 is begin to save!
[2025-10-14 18:16:38,694] [INFO] [logging.py:107:log_dist] [Rank 0] Saving model checkpoint: /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/global_step6885/zero_pp_rank_0_mp_rank_00_model_states.pt
[INFO|2025-10-14 18:16:57] llamafactory.train.callbacks:143 >> EMA model saved at: /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/ema
[INFO|image_processing_base.py:260] 2025-10-14 18:16:57,142 >> Image processor saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/preprocessor_config.json
[INFO|tokenization_utils_base.py:2356] 2025-10-14 18:16:57,143 >> chat template saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/chat_template.jinja
[INFO|tokenization_utils_base.py:2525] 2025-10-14 18:16:57,146 >> tokenizer config file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/tokenizer_config.json
[INFO|tokenization_utils_base.py:2534] 2025-10-14 18:16:57,147 >> Special tokens file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/special_tokens_map.json
[INFO|video_processing_utils.py:491] 2025-10-14 18:16:57,275 >> Video processor saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/video_preprocessor_config.json
[INFO|processing_utils.py:674] 2025-10-14 18:16:57,577 >> chat template saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/checkpoint-6885/chat_template.jinja
[INFO|trainer.py:2676] 2025-10-14 18:16:57,578 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 56356.5973, 'train_samples_per_second': 1.955, 'train_steps_per_second': 0.122, 'train_loss': 0.5927019230420812, 'epoch': 1.0}
100%|██████████| 6885/6885 [15:39:15<00:00, 2.55s/it]
[INFO|2025-10-14 18:16:57] llamafactory.train.callbacks:143 >> EMA model saved at: /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/ema
100%|██████████| 6885/6885 [15:39:15<00:00, 8.19s/it]
[INFO|image_processing_base.py:260] 2025-10-14 18:16:57,604 >> Image processor saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/preprocessor_config.json
[INFO|tokenization_utils_base.py:2356] 2025-10-14 18:16:57,606 >> chat template saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/chat_template.jinja
[INFO|tokenization_utils_base.py:2525] 2025-10-14 18:16:57,607 >> tokenizer config file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/tokenizer_config.json
[INFO|tokenization_utils_base.py:2534] 2025-10-14 18:16:57,608 >> Special tokens file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/special_tokens_map.json
[INFO|video_processing_utils.py:491] 2025-10-14 18:16:57,707 >> Video processor saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/video_preprocessor_config.json
[INFO|processing_utils.py:674] 2025-10-14 18:16:57,982 >> chat template saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/chat_template.jinja
[INFO|trainer.py:3993] 2025-10-14 18:17:05,389 >> Saving model checkpoint to /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA
[INFO|configuration_utils.py:424] 2025-10-14 18:17:05,396 >> Configuration saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/config.json
[INFO|configuration_utils.py:904] 2025-10-14 18:17:05,398 >> Configuration saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/generation_config.json
[INFO|modeling_utils.py:3730] 2025-10-14 18:17:24,931 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2356] 2025-10-14 18:17:24,932 >> chat template saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/chat_template.jinja
[INFO|tokenization_utils_base.py:2525] 2025-10-14 18:17:24,935 >> tokenizer config file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/tokenizer_config.json
[INFO|tokenization_utils_base.py:2534] 2025-10-14 18:17:24,935 >> Special tokens file saved in /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/special_tokens_map.json
***** train metrics *****
  epoch                    =         1.0
  total_flos               =   1808934GF
  train_loss               =      0.5927
  train_runtime            = 15:39:16.59
  train_samples_per_second =       1.955
  train_steps_per_second   =       0.122
Figure saved at: /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/training_loss.png
[WARNING|2025-10-14 18:17:25] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-10-14 18:17:25] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|modelcard.py:450] 2025-10-14 18:17:25,669 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /inspire/hdd/project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/wandb/offline-run-20251014_023741-21vuy1wg
wandb: Find logs at: ../../../../../../project/agileapplication/zhangkaipeng-24043/lsh_appl/train/pyvision/qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA/wandb/offline-run-20251014_023741-21vuy1wg/logs