Start time: Thu Oct 30 17:02:24 CST 2025
Node list: SH-IDCA1404-10-140-54-103
Total processes: 8
Current job ID: 6166706
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
[INFO|2025-10-30 09:02:42] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:17821
W1030 09:02:46.863000 57640 site-packages/torch/distributed/run.py:792]
W1030 09:02:46.863000 57640 site-packages/torch/distributed/run.py:792] *****************************************
W1030 09:02:46.863000 57640 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1030 09:02:46.863000 57640 site-packages/torch/distributed/run.py:792] *****************************************
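torchrun sets OMP_NUM_THREADS to 1 for every worker and asks you to tune it. A common heuristic is to split the CPU cores visible to the job evenly across the 8 local processes. That heuristic is an assumption: neither torchrun nor LLaMA-Factory prescribes it. Here is a minimal sketch:

```python
import os


def available_cpus() -> int:
    """CPUs this process may run on.

    On Linux the affinity mask reflects CPU pinning (e.g. Slurm cpusets);
    elsewhere fall back to os.cpu_count(), which may return None.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def omp_threads_per_process(total_cpus: int, procs_per_node: int) -> int:
    """Heuristic only: share CPU cores evenly among local ranks, never below 1."""
    if procs_per_node < 1:
        raise ValueError("procs_per_node must be >= 1")
    return max(1, total_cpus // procs_per_node)


if __name__ == "__main__":
    # 8 matches "Total processes: 8" reported for this job.
    print(f"export OMP_NUM_THREADS={omp_threads_per_process(available_cpus(), 8)}")
```

Export the printed value in the job script before launching.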
[2025-10-30 09:03:07,123] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,123] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,123] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,123] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,124] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,125] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,126] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 09:03:07,126] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
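This warning comes from jieba's own `_compat.py`, not from the training code. Short of pinning `setuptools<81` as the message suggests, it cannot be fixed locally. If the noise is unwanted, one option is to filter only this specific warning before jieba is imported. Whether that is acceptable in this setup is an assumption. New code that needs package versions can use the standard-library `importlib.metadata` instead of `pkg_resources`. A sketch, with hypothetical helper names:

```python
import warnings
from importlib import metadata


def silence_pkg_resources_deprecation() -> None:
    """Ignore only the pkg_resources deprecation UserWarning; other warnings still show.

    `message` is a regex matched against the start of the warning text.
    Must run in every worker process before `import jieba`.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"pkg_resources is deprecated",
        category=UserWarning,
    )


def installed_version(dist_name: str):
    """Standard-library replacement for pkg_resources.get_distribution(name).version."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    silence_pkg_resources_deprecation()
    # import jieba  # would now import without the deprecation warning
    print("setuptools:", installed_version("setuptools"))
```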
[2025-10-30 09:03:17,614] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 09:03:17,614] [INFO] [comm.py:669:init_distributed] cdb=None
[INFO|2025-10-30 09:03:18] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 8, device: cuda:1, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 09:03:18] llamafactory.hparams.parser:423 >> Process rank: 4, world size: 8, device: cuda:4, distributed training: True, compute dtype: torch.bfloat16
[2025-10-30 09:03:18,383] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 09:03:18,383] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 09:03:18,383] [INFO] [comm.py:700:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-10-30 09:03:18,397] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 09:03:18,398] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 09:03:18,399] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 09:03:18,401] [INFO] [comm.py:669:init_distributed] cdb=None
[INFO|2025-10-30 09:03:18] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 8, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,922 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,923 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,924 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,925 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,927 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,928 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:18,930 >> loading file chat_template.jinja
[INFO|2025-10-30 09:03:19] llamafactory.hparams.parser:423 >> Process rank: 2, world size: 8, device: cuda:2, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 09:03:19] llamafactory.hparams.parser:423 >> Process rank: 3, world size: 8, device: cuda:3, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 09:03:19] llamafactory.hparams.parser:423 >> Process rank: 5, world size: 8, device: cuda:5, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 09:03:19] llamafactory.hparams.parser:423 >> Process rank: 6, world size: 8, device: cuda:6, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 09:03:19] llamafactory.hparams.parser:423 >> Process rank: 7, world size: 8, device: cuda:7, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2364] 2025-10-30 09:03:19,408 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|image_processing_base.py:381] 2025-10-30 09:03:19,420 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|image_processing_base.py:381] 2025-10-30 09:03:19,442 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|image_processing_base.py:428] 2025-10-30 09:03:19,471 >> Image processor Qwen2VLImageProcessorFast {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"disable_grouping": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_pad": null,
"do_rescale": true,
"do_resize": true,
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessorFast",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_tensors": null,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2
}
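In this image processor config, `patch_size` is 14 and `merge_size` is 2. Each visual token therefore covers a 28x28-pixel area (2x2 merged 14x14 patches). `min_pixels` and `max_pixels` bound the resized image area: 3136 = 56² and 12845056 = 3584². The sketch below is modeled on the Qwen2-VL `smart_resize` rule: it rounds each side to a multiple of 28, then rescales if the area falls outside the bounds. It is not the transformers code. LLaMA-Factory may also apply its own pixel limits before this step, so treat the counts as estimates.

```python
import math

PATCH_SIZE = 14          # "patch_size" in the config above
MERGE_SIZE = 2           # "merge_size": 2x2 patches are merged into one LLM token
FACTOR = PATCH_SIZE * MERGE_SIZE  # 28 pixels per token side
MIN_PIXELS = 3136        # 56 * 56
MAX_PIXELS = 12845056    # 3584 * 3584


def resize_dims(height: int, width: int) -> tuple[int, int]:
    """Approximate Qwen2-VL-style resize (sketch, not the library code):
    sides become multiples of FACTOR, area kept within [MIN_PIXELS, MAX_PIXELS]."""
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > MAX_PIXELS:
        # Shrink both sides by the same ratio, rounding down to stay under the cap.
        beta = math.sqrt(height * width / MAX_PIXELS)
        h = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h * w < MIN_PIXELS:
        # Enlarge both sides by the same ratio, rounding up to reach the floor.
        beta = math.sqrt(MIN_PIXELS / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w


def image_token_count(height: int, width: int) -> int:
    """Number of <|IMAGE|> placeholder tokens for one image after resizing."""
    h, w = resize_dims(height, width)
    return (h // FACTOR) * (w // FACTOR)


if __name__ == "__main__":
    print(resize_dims(480, 640), image_token_count(480, 640))  # (476, 644) 391
```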
[INFO|video_processing_utils.py:724] 2025-10-30 09:03:19,482 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|video_processing_utils.py:770] 2025-10-30 09:03:19,485 >> Video processor Qwen2VLVideoProcessor {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"do_sample_frames": false,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"fps": null,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_frames": 768,
"max_pixels": 12845056,
"merge_size": 2,
"min_frames": 4,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"num_frames": null,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_metadata": false,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2,
"video_metadata": null,
"video_processor_type": "Qwen2VLVideoProcessor"
}
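The video processor adds `temporal_patch_size` 2 to the image settings, so consecutive frames are grouped in pairs before the spatial 28x28 merge. The training example printed later in this log uses `<|IMAGE|>` placeholders, so its frames appear to be passed as separate images rather than as a video. For comparison, here is a rough token estimate for the video path. It assumes frames are already resized to multiples of 28 and that an odd frame count is padded to a full pair. Both are assumptions about the implementation.

```python
import math

FACTOR = 14 * 2          # patch_size * merge_size from the video processor config
TEMPORAL_PATCH_SIZE = 2  # "temporal_patch_size": frames grouped per temporal patch


def video_token_count(num_frames: int, height: int, width: int) -> int:
    """Estimate <|VIDEO|> tokens for frames already resized to multiples of FACTOR.

    Assumes a partial final group is padded (e.g. by repeating the last frame),
    so the temporal grid is ceil(num_frames / TEMPORAL_PATCH_SIZE).
    """
    if height % FACTOR or width % FACTOR:
        raise ValueError(f"height and width must be multiples of {FACTOR}")
    grid_t = math.ceil(num_frames / TEMPORAL_PATCH_SIZE)
    return grid_t * (height // FACTOR) * (width // FACTOR)


if __name__ == "__main__":
    # Four 476x644 frames: 2 temporal groups * 17 * 23 spatial tokens.
    print(video_token_count(4, 476, 644))  # 782
```

Under these assumptions, the video path costs about half the tokens of sending the same four frames as individual images (4 x 391 = 1564 at 476x644).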
[INFO|feature_extraction_utils.py:556] 2025-10-30 09:03:19,512 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|feature_extraction_utils.py:597] 2025-10-30 09:03:19,514 >> Feature extractor WhisperFeatureExtractor {
"chunk_length": 300,
"dither": 0.0,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"return_attention_mask": true,
"sampling_rate": 16000,
"temporal_patch_size": 2
}
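The audio fields of the WhisperFeatureExtractor are internally consistent. `n_samples` and `nb_max_frames` follow directly from `chunk_length`, `sampling_rate`, and `hop_length`. Note that `chunk_length` here is 300 s, ten times the 30 s default of WhisperFeatureExtractor. The arithmetic, worked out:

```python
SAMPLING_RATE = 16000  # "sampling_rate" (Hz)
CHUNK_LENGTH = 300     # "chunk_length" (seconds of audio per chunk)
HOP_LENGTH = 160       # "hop_length" (samples between STFT frames)
N_FFT = 400            # "n_fft" (STFT window length in samples)
FEATURE_SIZE = 128     # "feature_size" (mel bins)

n_samples = CHUNK_LENGTH * SAMPLING_RATE     # samples per padded chunk
nb_max_frames = n_samples // HOP_LENGTH      # mel frames per chunk
hop_ms = 1000 * HOP_LENGTH / SAMPLING_RATE   # time step between frames
window_ms = 1000 * N_FFT / SAMPLING_RATE     # analysis window length
feature_shape = (FEATURE_SIZE, nb_max_frames)

print(n_samples, nb_max_frames, hop_ms, window_ms, feature_shape)
# 4800000 30000 10.0 25.0 (128, 30000)
```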
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,533 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,534 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,534 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,535 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,535 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,536 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 09:03:19,536 >> loading file chat_template.jinja
[rank1]:[W1030 09:03:19.486491330 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank4]:[W1030 09:03:19.557434759 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[INFO|tokenization_utils_base.py:2364] 2025-10-30 09:03:20,025 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:1114] 2025-10-30 09:03:20,039 >> loading configuration file None
[INFO|processing_utils.py:1199] 2025-10-30 09:03:20,731 >> Processor Qwen2_5OmniProcessor:
- image_processor: Qwen2VLImageProcessorFast {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"disable_grouping": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_pad": null,
"do_rescale": true,
"do_resize": true,
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessorFast",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_tensors": null,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2
}
- video_processor: Qwen2VLVideoProcessor {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"do_sample_frames": false,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"fps": null,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_frames": 768,
"max_pixels": 12845056,
"merge_size": 2,
"min_frames": 4,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"num_frames": null,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_metadata": false,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2,
"video_metadata": null,
"video_processor_type": "Qwen2VLVideoProcessor"
}
- feature_extractor: WhisperFeatureExtractor {
"chunk_length": 300,
"dither": 0.0,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"return_attention_mask": true,
"sampling_rate": 16000,
"temporal_patch_size": 2
}
- tokenizer: Qwen2TokenizerFast(name_or_path='ckpts/Qwen2.5-Omni-3B', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|AUDIO|>', '<|audio_bos|>', '<|audio_eos|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_bos|>', '<|vision_eos|>', '<|vision_pad|>', '<|IMAGE|>', '<|VIDEO|>'], 'image_token': '<|IMAGE|>', 'audio_token': '<|AUDIO|>', 'video_token': '<|VIDEO|>', 'vision_bos_token': '<|vision_bos|>', 'vision_eos_token': '<|vision_eos|>', 'audio_bos_token': '<|audio_bos|>', 'audio_eos_token': '<|audio_eos|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151646: AddedToken("<|AUDIO|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151647: AddedToken("<|audio_bos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151648: AddedToken("<|audio_eos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151652: AddedToken("<|vision_bos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151653: AddedToken("<|vision_eos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151655: AddedToken("<|IMAGE|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151656: AddedToken("<|VIDEO|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)
{
"processor_class": "Qwen2_5OmniProcessor"
}
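The `training example` dump later in this log is dominated by long runs of 151655 (`<|IMAGE|>`). Each run is bracketed by 151652 (`<|vision_bos|>`) and 151653 (`<|vision_eos|>`), per the added_tokens_decoder table above. The hypothetical helper below is not part of LLaMA-Factory. It makes such dumps readable by collapsing runs and naming the special tokens listed in that table.

```python
from itertools import groupby

# Special-token IDs copied from the added_tokens_decoder printed above.
SPECIAL_TOKENS = {
    151643: "<|endoftext|>",
    151644: "<|im_start|>",
    151645: "<|im_end|>",
    151646: "<|AUDIO|>",
    151652: "<|vision_bos|>",
    151653: "<|vision_eos|>",
    151655: "<|IMAGE|>",
    151656: "<|VIDEO|>",
}
VISION_BOS = 151652


def summarize_ids(input_ids: list[int]) -> list[str]:
    """Collapse runs of identical IDs and name known special tokens, e.g. '<|IMAGE|> x391'."""
    summary = []
    for token_id, run in groupby(input_ids):
        count = sum(1 for _ in run)
        name = SPECIAL_TOKENS.get(token_id, str(token_id))
        summary.append(name if count == 1 else f"{name} x{count}")
    return summary


def count_visual_inputs(input_ids: list[int]) -> int:
    """Number of <|vision_bos|> markers, i.e. images or videos in the sequence."""
    return input_ids.count(VISION_BOS)


if __name__ == "__main__":
    ids = [151644, 872, 198, 151652] + [151655] * 3 + [151653, 198]
    print(summarize_ids(ids))
    print(count_visual_inputs(ids))  # 1
```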
[INFO|2025-10-30 09:03:20] llamafactory.data.loader:143 >> Loading dataset VG-LLM-train/scannet_det_train_4frames.json...
`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'json' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.
[rank5]:[W1030 09:03:20.451250051 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank2]:[W1030 09:03:20.537862631 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank3]:[W1030 09:03:20.608079592 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank7]:[W1030 09:03:21.746086343 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank6]:[W1030 09:03:21.799332944 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank0]:[W1030 09:03:21.571601438 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
training example:
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 57193, 279, 220, 18, 35, 30618, 14697, 304, 279, 6249, 16184, 1849, 315, 279, 1156, 4034, 624, 5097, 264, 2951, 1140, 1380, 1817, 4343, 5610, 279, 1633, 829, 304, 330, 1502, 1, 323, 1181, 220, 18, 35, 30618, 3745, 304, 330, 2011, 62, 18, 67, 22956, 785, 220, 18, 35, 30618, 3745, 3561, 1265, 387, 508, 87, 21087, 11, 379, 21087, 11, 1147, 21087, 11, 856, 2368, 11, 379, 2368, 11, 1147, 2368, 11, 45672, 11, 9649, 11, 6502, 936, 151645, 198, 151644, 77091, 198, 73594, 2236, 198, 9640, 197, 4913, 1502, 788, 330, 2005, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 18, 17, 11, 220, 15, 13, 21, 11, 220, 16, 13, 15, 20, 11, 220, 15, 13, 23, 21, 11, 220, 16, 13, 22, 11, 220, 15, 13, 23, 22, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 1419, 4748, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 16, 20, 11, 220, 15, 13, 17, 18, 11, 220, 15, 13, 22, 23, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 20, 16, 11, 220, 15, 13, 17, 17, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 17, 11, 481, 15, 13, 16, 17, 11, 220, 16, 13, 21, 21, 11, 220, 15, 13, 17, 21, 11, 220, 15, 13, 21, 23, 11, 220, 16, 13, 17, 18, 11, 481, 18, 13, 
15, 24, 11, 220, 16, 13, 16, 24, 11, 481, 18, 13, 15, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 15, 11, 481, 15, 13, 17, 19, 11, 220, 17, 13, 17, 19, 11, 220, 15, 13, 19, 11, 220, 15, 13, 23, 23, 11, 220, 16, 13, 19, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 11, 481, 15, 13, 20, 19, 11, 220, 18, 13, 15, 18, 11, 220, 15, 13, 17, 17, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 22, 24, 11, 481, 17, 13, 24, 16, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 17, 18, 11, 481, 15, 13, 20, 17, 11, 220, 18, 13, 15, 22, 11, 220, 15, 13, 18, 22, 11, 220, 16, 13, 22, 16, 11, 220, 16, 13, 16, 21, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 53950, 497, 330, 58456, 62, 18, 67, 788, 508, 17, 13, 17, 20, 11, 220, 15, 13, 15, 20, 11, 220, 15, 13, 20, 24, 11, 220, 15, 13, 23, 16, 11, 220, 22, 13, 16, 19, 11, 220, 16, 13, 23, 24, 11, 220, 15, 13, 17, 22, 11, 220, 16, 13, 16, 24, 11, 481, 17, 13, 22, 22, 23439, 60, 73594, 151645, 198]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
Detect the 3D bounding boxes in the camera coordinate system of the first frame.
Output a json list where each entry contains the object name in "label" and its 3D bounding box in "box_3d".
The 3D bounding box format should be [x_center, y_center, z_center, x_size, y_size, z_size, yaw, pitch, roll].<|im_end|>
<|im_start|>assistant
```json
[
{"label": "table", "bbox_3d": [0.32, 0.6, 1.05, 0.86, 1.7, 0.87, -1.34, 1.08, -2.84]},
{"label": "backpack", "bbox_3d": [0.15, 0.23, 0.78, 0.49, 0.51, 0.22, -1.34, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.62, -0.12, 1.66, 0.26, 0.68, 1.23, -3.09, 1.19, -3.04]},
{"label": "blackboard", "bbox_3d": [0.0, -0.24, 2.24, 0.4, 0.88, 1.4, 1.8, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.6, -0.54, 3.03, 0.22, 0.49, 0.79, -2.91, 1.08, -2.84]},
{"label": "blackboard", "bbox_3d": [0.23, -0.52, 3.07, 0.37, 1.71, 1.16, 1.8, 1.08, -2.84]},
{"label": "shelf", "bbox_3d": [2.25, 0.05, 0.59, 0.81, 7.14, 1.89, 0.27, 1.19, -2.77]}
]```<|im_end|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 73594, 2236, 198, 9640, 197, 4913, 1502, 788, 330, 2005, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 18, 17, 11, 220, 15, 13, 21, 11, 220, 16, 13, 15, 20, 11, 220, 15, 13, 23, 21, 11, 220, 16, 13, 22, 11, 220, 15, 13, 23, 22, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 1419, 4748, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 16, 20, 11, 220, 15, 13, 17, 18, 11, 220, 15, 13, 22, 23, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 20, 16, 11, 220, 15, 13, 17, 17, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 17, 11, 481, 15, 13, 16, 17, 11, 220, 16, 13, 21, 21, 11, 220, 15, 13, 17, 21, 11, 220, 15, 13, 21, 23, 11, 220, 16, 13, 17, 18, 11, 481, 18, 13, 15, 24, 11, 220, 16, 13, 16, 24, 11, 481, 18, 13, 15, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 15, 11, 481, 15, 13, 17, 19, 11, 220, 17, 13, 17, 19, 11, 220, 15, 13, 19, 11, 220, 15, 13, 23, 23, 11, 220, 16, 13, 19, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 11, 481, 15, 13, 20, 19, 11, 220, 18, 13, 15, 18, 11, 220, 15, 13, 17, 17, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 22, 24, 11, 481, 17, 13, 24, 16, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 17, 18, 11, 481, 15, 13, 20, 17, 11, 220, 18, 13, 15, 22, 11, 220, 15, 13, 18, 22, 11, 220, 16, 13, 22, 16, 11, 220, 16, 13, 16, 21, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 53950, 497, 330, 58456, 62, 18, 67, 788, 508, 17, 13, 17, 20, 11, 220, 15, 13, 15, 20, 11, 220, 15, 13, 20, 24, 11, 220, 15, 
13, 23, 16, 11, 220, 22, 13, 16, 19, 11, 220, 16, 13, 23, 24, 11, 220, 15, 13, 17, 22, 11, 220, 16, 13, 16, 24, 11, 481, 17, 13, 22, 22, 23439, 60, 73594, 151645, 198]
labels:
```json
[
{"label": "table", "bbox_3d": [0.32, 0.6, 1.05, 0.86, 1.7, 0.87, -1.34, 1.08, -2.84]},
{"label": "backpack", "bbox_3d": [0.15, 0.23, 0.78, 0.49, 0.51, 0.22, -1.34, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.62, -0.12, 1.66, 0.26, 0.68, 1.23, -3.09, 1.19, -3.04]},
{"label": "blackboard", "bbox_3d": [0.0, -0.24, 2.24, 0.4, 0.88, 1.4, 1.8, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.6, -0.54, 3.03, 0.22, 0.49, 0.79, -2.91, 1.08, -2.84]},
{"label": "blackboard", "bbox_3d": [0.23, -0.52, 3.07, 0.37, 1.71, 1.16, 1.8, 1.08, -2.84]},
{"label": "shelf", "bbox_3d": [2.25, 0.05, 0.59, 0.81, 7.14, 1.89, 0.27, 1.19, -2.77]}
]```<|im_end|>
[INFO|configuration_utils.py:763] 2025-10-30 09:03:33,716 >> loading configuration file ckpts/Qwen2.5-Omni-3B/config.json
[WARNING|modeling_rope_utils.py:557] 2025-10-30 09:03:33,721 >> Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO|configuration_qwen2_5_omni.py:1059] 2025-10-30 09:03:33,726 >> thinker_config is None. Initializing thinker model with default values
[INFO|configuration_qwen2_5_omni.py:1063] 2025-10-30 09:03:33,727 >> talker_config is None. Initializing talker model with default values
[INFO|configuration_qwen2_5_omni.py:1067] 2025-10-30 09:03:33,727 >> token2wav_config is None. Initializing token2wav model with default values
[INFO|configuration_utils.py:839] 2025-10-30 09:03:33,734 >> Model config Qwen2_5OmniConfig {
"architectures": [
"Qwen2_5OmniForConditionalGeneration"
],
"dtype": "bfloat16",
"enable_audio_output": true,
"enable_talker": true,
"model_type": "qwen2_5_omni",
"talker_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "Qwen2.5-Omni-3B/talker",
"architectures": [
"Qwen2OmniTalkerForConditionalGeneration"
],
"attention_dropout": 0.0,
"audio_end_token_id": 151648,
"audio_start_token_id": 151647,
"audio_token_index": 151646,
"dtype": "bfloat16",
"embedding_size": 2048,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 896,
"image_token_index": 151655,
"init_std": 0.02,
"initializer_range": 0.02,
"intermediate_size": 4864,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2_5_omni_talker",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"position_id_per_seconds": 25,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_section": [
16,
16,
0
],
"rope_type": "default",
"type": "default"
},
"rope_theta": 1000000.0,
"seconds_per_chunk": 2,
"sliding_window": null,
"spatial_merge_size": 2,
"tts_codec_end_token_id": 8294,
"tts_codec_mask_token_id": 8296,
"tts_codec_pad_token_id": 8292,
"tts_codec_start_token_id": 8293,
"tts_text_end_token_id": 151861,
"tts_text_pad_token_id": 151859,
"tts_text_start_token_id": 151860,
"use_cache": true,
"use_sliding_window": false,
"video_token_index": 151656,
"vision_end_token_id": 151653,
"vision_start_token_id": 151652,
"vocab_size": 8448
},
"thinker_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "Qwen2.5-Omni-3B/thinker",
"architectures": [
"Qwen2OmniNaViTThinkerForConditionalGeneration"
],
"audio_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"activation_dropout": 0.0,
"activation_function": "gelu",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"d_model": 1280,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"dtype": null,
"early_stopping": false,
"encoder_attention_heads": 20,
"encoder_ffn_dim": 5120,
"encoder_layerdrop": 0.0,
"encoder_layers": 32,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"init_std": 0.02,
"initializer_range": 0.02,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"max_source_positions": 1500,
"min_length": 0,
"model_type": "qwen2_5_omni_audio_encoder",
"n_window": 100,
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 32,
"num_mel_bins": 128,
"num_return_sequences": 1,
"output_attentions": false,
"output_dim": 2048,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"scale_embedding": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false
},
"audio_end_token_id": 151648,
"audio_start_token_id": 151647,
"audio_token_index": 151646,
"bos_token_id": 151644,
"dtype": "bfloat16",
"eos_token_id": 151645,
"ignore_index": -100,
"image_token_index": 151655,
"init_std": 0.02,
"initializer_range": 0.02,
"model_type": "qwen2_5_omni_thinker",
"pad_token_id": 151643,
"position_id_per_seconds": 25,
"seconds_per_chunk": 2,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dtype": null,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "silu",
"hidden_size": 2048,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"init_std": 0.02,
"initializer_range": 0.02,
"intermediate_size": 11008,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 32768,
"max_window_layers": 70,
"min_length": 0,
"model_type": "qwen2_5_omni_text",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 36,
"num_key_value_heads": 2,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_section": [
16,
24,
24
],
"rope_type": "default",
"type": "default"
},
"rope_theta": 1000000.0,
"sep_token_id": null,
"sliding_window": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": false,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false,
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
},
"user_token_id": 872,
"video_token_index": 151656,
"vision_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"depth": 32,
"diversity_penalty": 0.0,
"do_sample": false,
"dtype": null,
"early_stopping": false,
"embed_dim": 1280,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"fullatt_block_indexes": [
7,
15,
23,
31
],
"hidden_act": "silu",
"hidden_size": 1280,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"in_channels": 3,
"in_chans": 3,
"init_std": 0.02,
"initializer_range": 0.02,
"intermediate_size": 3420,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "qwen2_5_omni_vision_encoder",
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_heads": 16,
"num_return_sequences": 1,
"out_hidden_size": 2048,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"spatial_merge_size": 2,
"spatial_patch_size": 14,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"temporal_patch_size": 2,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"tokens_per_second": 25,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false,
"window_size": 112
},
"vision_end_token_id": 151653,
"vision_start_token_id": 151652,
"vision_token_id": 151654
},
"token2wav_config": {
"_attn_implementation_autoset": true,
"bigvgan_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dtype": null,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"mel_dim": 80,
"min_length": 0,
"model_type": "qwen2_5_omni_bigvgan",
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"resblock_dilation_sizes": [
[
1,
3,
5
],
[
1,
3,
5
],
[
1,
3,
5
]
],
"resblock_kernel_sizes": [
3,
7,
11
],
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"upsample_initial_channel": 1536,
"upsample_kernel_sizes": [
11,
7,
4,
4,
4,
4
],
"upsample_rates": [
5,
3,
2,
2,
2,
2
],
"use_bfloat16": false,
"use_bias_at_final": false
},
"dit_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"block_size": 24,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"depth": 22,
"dim": 1024,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.1,
"dtype": "float32",
"early_stopping": false,
"emb_dim": 512,
"enc_attention_channels": 64,
"enc_channels": [
256,
256,
256,
256,
768
],
"enc_dilations": [
1,
2,
3,
4,
1
],
"enc_dim": 128,
"enc_emb_dim": 192,
"enc_global_context": true,
"enc_kernel_sizes": [
5,
3,
3,
3,
1
],
"enc_lin_neurons": 192,
"enc_res2net_scale": 2,
"enc_se_channels": 64,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"ff_mult": 2,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"head_dim": 64,
"heads": 16,
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"look_ahead_layers": [
10
],
"look_backward_layers": [
0,
20
],
"max_length": 20,
"max_position_embeddings": 32768,
"mel_dim": 80,
"min_length": 0,
"model_type": "qwen2_5_omni_dit",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_embeds": 8193,
"num_hidden_layers": 22,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repeats": 2,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"rope_theta": 10000.0,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false
},
"model_type": "qwen2_5_omni_token2wav"
},
"transformers_version": "4.57.1"
}
[INFO|2025-10-30 09:03:33] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[WARNING|logging.py:328] 2025-10-30 09:03:35,554 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|modeling_utils.py:1169] 2025-10-30 09:03:35,558 >> loading weights file ckpts/Qwen2.5-Omni-3B/model.safetensors.index.json
[INFO|modeling_utils.py:2341] 2025-10-30 09:03:35,563 >> Instantiating Qwen2_5OmniForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:986] 2025-10-30 09:03:35,567 >> Generate config GenerationConfig {
"use_cache": false
}
[INFO|configuration_utils.py:986] 2025-10-30 09:03:35,569 >> Generate config GenerationConfig {
"bos_token_id": 151644,
"eos_token_id": 151645,
"pad_token_id": 151643
}
[INFO|configuration_utils.py:986] 2025-10-30 09:03:35,640 >> Generate config GenerationConfig {}
[INFO|modeling_utils.py:2341] 2025-10-30 09:03:35,657 >> Instantiating Qwen2_5OmniToken2WavDiTModel model under default dtype torch.float32.
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:04, 2.46s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.69s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.64s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.67s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:08, 4.40s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:08, 4.42s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:08, 4.40s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.22s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:08, 4.47s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:04<00:02, 2.38s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:05<00:02, 2.50s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:05<00:02, 2.49s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.75s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00, 1.90s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00, 1.84s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00, 2.02s/it]
[INFO|configuration_utils.py:939] 2025-10-30 09:03:42,034 >> loading configuration file ckpts/Qwen2.5-Omni-3B/generation_config.json
[INFO|configuration_utils.py:986] 2025-10-30 09:03:42,034 >> Generate config GenerationConfig {}
[INFO|dynamic_module_utils.py:423] 2025-10-30 09:03:42,037 >> Could not locate the custom_generate/generate.py inside ckpts/Qwen2.5-Omni-3B.
[INFO|modeling_qwen2_5_omni.py:3727] 2025-10-30 09:03:42,091 >> Speaker ['Ethan', 'Chelsie'] loaded
[INFO|2025-10-30 09:03:42] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
[INFO|2025-10-30 09:03:42] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-10-30 09:03:42] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
[INFO|2025-10-30 09:03:42] llamafactory.model.adapter:143 >> Fine-tuning method: Full
[INFO|2025-10-30 09:03:42] llamafactory.model.model_utils.visual:143 >> Set vision model not trainable: ['visual.patch_embed', 'visual.blocks', 'audio_tower'].
[INFO|2025-10-30 09:03:42] llamafactory.model.model_utils.visual:143 >> Set multi model projector not trainable: visual.merger.
[INFO|2025-10-30 09:03:42] llamafactory.model.loader:143 >> trainable params: 3,397,103,616 || all params: 4,703,464,448 || trainable%: 72.2256
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
[INFO|trainer.py:749] 2025-10-30 09:03:42,429 >> Using auto half precision backend
[WARNING|trainer.py:982] 2025-10-30 09:03:42,432 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00, 1.89s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00, 2.07s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00, 1.90s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00, 2.07s/it]
Gradient accumulation steps mismatch: GradientAccumulationPlugin has 1, DeepSpeed config has 4. Using DeepSpeed's value.
[2025-10-30 09:03:42,838] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed info: version=0.16.9, git-hash=unknown, git-branch=unknown
[2025-10-30 09:03:42,838] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:07<00:03, 3.44s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:07<00:03, 3.45s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:07<00:03, 3.46s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:07<00:03, 3.53s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.32s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.45s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.33s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.46s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.33s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.46s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.38s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00, 3.52s/it]
[2025-10-30 09:03:51,380] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2025-10-30 09:03:51,387] [INFO] [logging.py:107:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2025-10-30 09:03:51,388] [INFO] [logging.py:107:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2025-10-30 09:03:51,423] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2025-10-30 09:03:51,423] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
[2025-10-30 09:03:51,424] [INFO] [logging.py:107:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2025-10-30 09:03:51,424] [INFO] [stage_1_and_2.py:150:__init__] Reduce bucket size 500000000
[2025-10-30 09:03:51,425] [INFO] [stage_1_and_2.py:151:__init__] Allgather bucket size 500000000
[2025-10-30 09:03:51,425] [INFO] [stage_1_and_2.py:152:__init__] CPU Offload: False
[2025-10-30 09:03:51,426] [INFO] [stage_1_and_2.py:153:__init__] Round robin gradient partitioning: True
[2025-10-30 09:04:07,863] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2025-10-30 09:04:07,864] [INFO] [utils.py:782:see_memory_usage] MA 10.35 GB Max_MA 10.35 GB CA 10.38 GB Max_CA 10 GB
[2025-10-30 09:04:07,865] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 144.07 GB, percent = 14.3%
[2025-10-30 09:04:08,304] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2025-10-30 09:04:08,306] [INFO] [utils.py:782:see_memory_usage] MA 10.35 GB Max_MA 11.93 GB CA 11.96 GB Max_CA 12 GB
[2025-10-30 09:04:08,307] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 150.8 GB, percent = 15.0%
[2025-10-30 09:04:08,308] [INFO] [stage_1_and_2.py:557:__init__] optimizer state initialized
[2025-10-30 09:04:08,909] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2025-10-30 09:04:08,912] [INFO] [utils.py:782:see_memory_usage] MA 10.35 GB Max_MA 10.35 GB CA 11.96 GB Max_CA 12 GB
[2025-10-30 09:04:08,915] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 155.87 GB, percent = 15.5%
[2025-10-30 09:04:08,922] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2025-10-30 09:04:08,923] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = None
[2025-10-30 09:04:08,924] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2025-10-30 09:04:08,924] [INFO] [logging.py:107:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2025-10-30 09:04:08,934] [INFO] [config.py:1003:print] DeepSpeedEngine configuration:
[2025-10-30 09:04:08,936] [INFO] [config.py:1007:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2025-10-30 09:04:08,936] [INFO] [config.py:1007:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'intra_op_parallelism': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2025-10-30 09:04:08,937] [INFO] [config.py:1007:print] amp_enabled .................. False
[2025-10-30 09:04:08,937] [INFO] [config.py:1007:print] amp_params ................... False
[2025-10-30 09:04:08,939] [INFO] [config.py:1007:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2025-10-30 09:04:08,939] [INFO] [config.py:1007:print] bfloat16_enabled ............. True
[2025-10-30 09:04:08,940] [INFO] [config.py:1007:print] bfloat16_immediate_grad_update True
[2025-10-30 09:04:08,940] [INFO] [config.py:1007:print] checkpoint_parallel_write_pipeline False
[2025-10-30 09:04:08,941] [INFO] [config.py:1007:print] checkpoint_tag_validation_enabled True
[2025-10-30 09:04:08,941] [INFO] [config.py:1007:print] checkpoint_tag_validation_fail False
[2025-10-30 09:04:08,942] [INFO] [config.py:1007:print] comms_config .................
[2025-10-30 09:04:08,942] [INFO] [config.py:1007:print] communication_data_type ...... None
[2025-10-30 09:04:08,943] [INFO] [config.py:1007:print] compile_config ............... deepcompile=False free_activation=False offload_activation=False offload_opt_states=False double_buffer=True symmetric_memory=False debug_log=False offload_parameters=False sync_before_reduce=False sync_after_reduce=False sync_before_allgather=False sync_after_allgather=False
[2025-10-30 09:04:08,944] [INFO] [config.py:1007:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-10-30 09:04:08,944] [INFO] [config.py:1007:print] curriculum_enabled_legacy .... False
[2025-10-30 09:04:08,945] [INFO] [config.py:1007:print] curriculum_params_legacy ..... False
[2025-10-30 09:04:08,945] [INFO] [config.py:1007:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'pin_memory': False, 'curriculum_learning': {'enabled': False}, 'dynamic_batching': {'enabled': False, 'lr_scaling_method': 'linear', 'min_batch_size': 1, 'max_batch_size': None, 'sequence_picking_order': 'dataloader', 'verbose': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-10-30 09:04:08,946] [INFO] [config.py:1007:print] data_efficiency_enabled ...... False
[2025-10-30 09:04:08,946] [INFO] [config.py:1007:print] dataloader_drop_last ......... False
[2025-10-30 09:04:08,947] [INFO] [config.py:1007:print] disable_allgather ............ False
[2025-10-30 09:04:08,947] [INFO] [config.py:1007:print] dump_state ................... False
[2025-10-30 09:04:08,947] [INFO] [config.py:1007:print] dynamic_loss_scale_args ...... None
[2025-10-30 09:04:08,948] [INFO] [config.py:1007:print] eigenvalue_enabled ........... False
[2025-10-30 09:04:08,948] [INFO] [config.py:1007:print] eigenvalue_gas_boundary_resolution 1
[2025-10-30 09:04:08,949] [INFO] [config.py:1007:print] eigenvalue_layer_name ........ bert.encoder.layer
[2025-10-30 09:04:08,949] [INFO] [config.py:1007:print] eigenvalue_layer_num ......... 0
[2025-10-30 09:04:08,950] [INFO] [config.py:1007:print] eigenvalue_max_iter .......... 100
[2025-10-30 09:04:08,950] [INFO] [config.py:1007:print] eigenvalue_stability ......... 1e-06
[2025-10-30 09:04:08,951] [INFO] [config.py:1007:print] eigenvalue_tol ............... 0.01
[2025-10-30 09:04:08,951] [INFO] [config.py:1007:print] eigenvalue_verbose ........... False
[2025-10-30 09:04:08,952] [INFO] [config.py:1007:print] elasticity_enabled ........... False
[2025-10-30 09:04:08,952] [INFO] [config.py:1007:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2025-10-30 09:04:08,953] [INFO] [config.py:1007:print] fp16_auto_cast ............... None
[2025-10-30 09:04:08,953] [INFO] [config.py:1007:print] fp16_enabled ................. False
[2025-10-30 09:04:08,954] [INFO] [config.py:1007:print] fp16_master_weights_and_gradients False
[2025-10-30 09:04:08,954] [INFO] [config.py:1007:print] global_rank .................. 0
[2025-10-30 09:04:08,955] [INFO] [config.py:1007:print] grad_accum_dtype ............. None
[2025-10-30 09:04:08,955] [INFO] [config.py:1007:print] gradient_accumulation_steps .. 4
[2025-10-30 09:04:08,956] [INFO] [config.py:1007:print] gradient_clipping ............ 1.0
[2025-10-30 09:04:08,956] [INFO] [config.py:1007:print] gradient_predivide_factor .... 1.0
[2025-10-30 09:04:08,956] [INFO] [config.py:1007:print] graph_harvesting ............. False
[2025-10-30 09:04:08,957] [INFO] [config.py:1007:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-10-30 09:04:08,957] [INFO] [config.py:1007:print] initial_dynamic_scale ........ 1
[2025-10-30 09:04:08,958] [INFO] [config.py:1007:print] load_universal_checkpoint .... False
[2025-10-30 09:04:08,958] [INFO] [config.py:1007:print] loss_scale ................... 1.0
[2025-10-30 09:04:08,959] [INFO] [config.py:1007:print] memory_breakdown ............. False
[2025-10-30 09:04:08,959] [INFO] [config.py:1007:print] mics_hierarchial_params_gather False
[2025-10-30 09:04:08,960] [INFO] [config.py:1007:print] mics_shard_size .............. -1
[2025-10-30 09:04:08,960] [INFO] [config.py:1007:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2025-10-30 09:04:08,961] [INFO] [config.py:1007:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2025-10-30 09:04:08,961] [INFO] [config.py:1007:print] optimizer_legacy_fusion ...... False
[2025-10-30 09:04:08,962] [INFO] [config.py:1007:print] optimizer_name ............... None
[2025-10-30 09:04:08,962] [INFO] [config.py:1007:print] optimizer_params ............. None
[2025-10-30 09:04:08,963] [INFO] [config.py:1007:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-10-30 09:04:08,963] [INFO] [config.py:1007:print] pld_enabled .................. False
[2025-10-30 09:04:08,964] [INFO] [config.py:1007:print] pld_params ................... False
[2025-10-30 09:04:08,964] [INFO] [config.py:1007:print] prescale_gradients ........... False
[2025-10-30 09:04:08,965] [INFO] [config.py:1007:print] scheduler_name ............... None
[2025-10-30 09:04:08,965] [INFO] [config.py:1007:print] scheduler_params ............. None
[2025-10-30 09:04:08,966] [INFO] [config.py:1007:print] seq_parallel_communication_data_type torch.float32
[2025-10-30 09:04:08,966] [INFO] [config.py:1007:print] sparse_attention ............. None
[2025-10-30 09:04:08,967] [INFO] [config.py:1007:print] sparse_gradients_enabled ..... False
[2025-10-30 09:04:08,967] [INFO] [config.py:1007:print] steps_per_print .............. inf
[2025-10-30 09:04:08,968] [INFO] [config.py:1007:print] tensor_parallel_config ....... dtype=torch.float16 autotp_size=0 tp_overlap_comm=False tensor_parallel=TPConfig(tp_size=1, tp_grain_size=1, mpu=None, tp_group=None) injection_policy_tuple=None keep_module_on_host=False replace_with_kernel_inject=False
[2025-10-30 09:04:08,968] [INFO] [config.py:1007:print] timers_config ................ enabled=True synchronized=True
[2025-10-30 09:04:08,969] [INFO] [config.py:1007:print] train_batch_size ............. 32
[2025-10-30 09:04:08,969] [INFO] [config.py:1007:print] train_micro_batch_size_per_gpu 1
[2025-10-30 09:04:08,970] [INFO] [config.py:1007:print] use_data_before_expert_parallel_ False
[2025-10-30 09:04:08,970] [INFO] [config.py:1007:print] use_node_local_storage ....... False
[2025-10-30 09:04:08,971] [INFO] [config.py:1007:print] wall_clock_breakdown ......... False
[2025-10-30 09:04:08,971] [INFO] [config.py:1007:print] weight_quantization_config ... None
[2025-10-30 09:04:08,972] [INFO] [config.py:1007:print] world_size ................... 8
[2025-10-30 09:04:08,972] [INFO] [config.py:1007:print] zero_allow_untested_optimizer True
[2025-10-30 09:04:08,973] [INFO] [config.py:1007:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=True zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True log_trace_cache_warnings=False
[2025-10-30 09:04:08,973] [INFO] [config.py:1007:print] zero_enabled ................. True
[2025-10-30 09:04:08,974] [INFO] [config.py:1007:print] zero_force_ds_cpu_optimizer .. True
[2025-10-30 09:04:08,974] [INFO] [config.py:1007:print] zero_optimization_stage ...... 2
[2025-10-30 09:04:08,975] [INFO] [config.py:993:print_user_config] json = {
"train_batch_size": 32,
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 4,
"gradient_clipping": 1.0,
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": false,
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": true
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5.000000e+08,
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 5.000000e+08,
"contiguous_gradients": true,
"round_robin_gradients": true
},
"steps_per_print": inf
}
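The DeepSpeed config printed above should satisfy DeepSpeed's batch-size relation, train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size. The minimal Python sketch below recomputes it from the values in this log. The helper name `expected_train_batch_size` is hypothetical and not part of the DeepSpeed API.

```python
# Values copied from the DeepSpeed config printed in this log.
ds_config = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
}
WORLD_SIZE = 8  # "world_size ... 8" in the same dump


def expected_train_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """Hypothetical helper: global batch implied by per-GPU micro-batch, accumulation, and ranks."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


implied = expected_train_batch_size(
    ds_config["train_micro_batch_size_per_gpu"],
    ds_config["gradient_accumulation_steps"],
    WORLD_SIZE,
)
# DeepSpeed rejects a config where these values disagree; here they match.
assert implied == ds_config["train_batch_size"], (implied, ds_config["train_batch_size"])
print(f"train_batch_size = {implied}")  # prints "train_batch_size = 32"
```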
[INFO|trainer.py:2519] 2025-10-30 09:04:08,991 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-10-30 09:04:08,991 >> Num examples = 144,164
[INFO|trainer.py:2521] 2025-10-30 09:04:08,992 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-10-30 09:04:08,992 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-10-30 09:04:08,993 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-10-30 09:04:08,993 >> Gradient Accumulation steps = 4
[INFO|trainer.py:2527] 2025-10-30 09:04:08,994 >> Total optimization steps = 4,506
[INFO|trainer.py:2528] 2025-10-30 09:04:09,005 >> Number of trainable parameters = 3,397,103,616
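The 4,506 optimization steps in the Trainer summary can be reproduced from Num examples = 144,164, a per-device batch of 1, 8 ranks, 4 accumulation steps and 1 epoch. This is a sketch under stated assumptions: it assumes the distributed sampler pads each rank to ceil(144,164 / 8) samples (dataloader_drop_last is False in the dump above) and that a trailing partial accumulation window still counts as one optimizer step. Whether the installed transformers version computes the count exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def total_optimization_steps(num_examples: int, per_device_batch: int, world_size: int,
                             grad_accum: int, epochs: int) -> int:
    """Hypothetical helper reproducing the Trainer's step count under stated assumptions."""
    # Assumption: the distributed sampler pads so every rank gets the same number of samples.
    samples_per_rank = math.ceil(num_examples / world_size)
    batches_per_rank = math.ceil(samples_per_rank / per_device_batch)
    # Assumption: a trailing partial accumulation window still triggers one optimizer step.
    steps_per_epoch = math.ceil(batches_per_rank / grad_accum)
    return steps_per_epoch * epochs


steps = total_optimization_steps(144_164, per_device_batch=1, world_size=8, grad_accum=4, epochs=1)
print(steps)  # 4506, matching "Total optimization steps = 4,506"
```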
0%| | 0/4506 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|logging.py:328] 2025-10-30 09:04:17,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
0%| | 1/4506 [00:26<33:04:02, 26.42s/it]
{'loss': 0.8387, 'grad_norm': 7.749425888061523, 'learning_rate': 0.0, 'epoch': 0.0}
0%| | 2/4506 [00:41<24:28:46, 19.57s/it]
{'loss': 0.7774, 'grad_norm': 6.958076000213623, 'learning_rate': 1.1086474501108648e-07, 'epoch': 0.0}
slurmstepd: error: *** JOB 6166706 ON SH-IDCA1404-10-140-54-103 CANCELLED AT 2025-10-30T17:04:51 DUE TO PREEMPTION ***
W1030 09:04:51.291000 57640 site-packages/torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
W1030 09:04:51.295000 57640 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 57717 closing signal SIGTERM
W1030 09:04:51.298000 57640 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 57718 closing signal SIGTERM
W1030 09:04:51.303000 57640 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 57719 closing signal SIGTERM
W1030 09:04:51.312000 57640 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 57720 closing signal SIGTERM
[rank6]:[E1030 09:14:52.381864367 ProcessGroupNCCL.cpp:629] [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600057 milliseconds before timing out.
[rank7]:[E1030 09:14:52.382558443 ProcessGroupNCCL.cpp:629] [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600060 milliseconds before timing out.
[rank6]:[E1030 09:14:52.382560688 ProcessGroupNCCL.cpp:2168] [PG ID 1 PG GUID 1 Rank 6] failure detected by watchdog at work sequence id: 1509 PG status: last enqueued work: 1510, last completed work: 1508
[rank7]:[E1030 09:14:52.383147328 ProcessGroupNCCL.cpp:2168] [PG ID 1 PG GUID 1 Rank 7] failure detected by watchdog at work sequence id: 1509 PG status: last enqueued work: 1510, last completed work: 1508
[rank6]:[E1030 09:14:52.383436059 ProcessGroupNCCL.cpp:667] Stack trace of the failed collective not found, potentially because FlightRecorder is disabled. You can enable it by setting TORCH_NCCL_TRACE_BUFFER_SIZE to a non-zero value.
[rank7]:[E1030 09:14:52.383834403 ProcessGroupNCCL.cpp:667] Stack trace of the failed collective not found, potentially because FlightRecorder is disabled. You can enable it by setting TORCH_NCCL_TRACE_BUFFER_SIZE to a non-zero value.
[rank4]:[E1030 09:14:52.388451945 ProcessGroupNCCL.cpp:629] [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600068 milliseconds before timing out.
[rank4]:[E1030 09:14:52.389021753 ProcessGroupNCCL.cpp:2168] [PG ID 1 PG GUID 1 Rank 4] failure detected by watchdog at work sequence id: 1509 PG status: last enqueued work: 1510, last completed work: 1508
[rank4]:[E1030 09:14:52.389467660 ProcessGroupNCCL.cpp:667] Stack trace of the failed collective not found, potentially because FlightRecorder is disabled. You can enable it by setting TORCH_NCCL_TRACE_BUFFER_SIZE to a non-zero value.
[rank5]:[E1030 09:14:52.418050292 ProcessGroupNCCL.cpp:629] [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600085 milliseconds before timing out.
[rank5]:[E1030 09:14:52.418678947 ProcessGroupNCCL.cpp:2168] [PG ID 1 PG GUID 1 Rank 5] failure detected by watchdog at work sequence id: 1509 PG status: last enqueued work: 1510, last completed work: 1508
[rank5]:[E1030 09:14:52.419093875 ProcessGroupNCCL.cpp:667] Stack trace of the failed collective not found, potentially because FlightRecorder is disabled. You can enable it by setting TORCH_NCCL_TRACE_BUFFER_SIZE to a non-zero value.
[rank4]:[E1030 09:14:54.206212910 ProcessGroupNCCL.cpp:681] [Rank 4] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank4]:[E1030 09:14:54.206731112 ProcessGroupNCCL.cpp:695] [Rank 4] To avoid data inconsistency, we are taking the entire process down.
[rank4]:[E1030 09:14:54.209022092 ProcessGroupNCCL.cpp:1895] [PG ID 1 PG GUID 1 Rank 4] Process group watchdog thread terminated with exception: [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600068 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f69f2d6c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7f69a101bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7f69a101d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f69a101e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7f69f326d5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7f69f3a6bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7f69f3afca04 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 1 PG GUID 1 Rank 4] Process group watchdog thread terminated with exception: [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600068 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f69f2d6c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7f69a101bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7f69a101d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f69a101e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7f69f326d5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7f69f3a6bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7f69f3afca04 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1901 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f69f2d6c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe5c6fc (0x7f69a0c796fc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7f69f326d5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7f69f3a6bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: clone + 0x44 (0x7f69f3afca04 in /lib/x86_64-linux-gnu/libc.so.6)
[rank6]:[E1030 09:14:54.216094150 ProcessGroupNCCL.cpp:681] [Rank 6] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank6]:[E1030 09:14:54.216515223 ProcessGroupNCCL.cpp:695] [Rank 6] To avoid data inconsistency, we are taking the entire process down.
[rank6]:[E1030 09:14:54.218573355 ProcessGroupNCCL.cpp:1895] [PG ID 1 PG GUID 1 Rank 6] Process group watchdog thread terminated with exception: [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600057 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa81736c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7fa7c561bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7fa7c561d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fa7c561e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7fa8177df5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7fa81806bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7fa8180fca04 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
[rank7]:[E1030 09:14:54.219529250 ProcessGroupNCCL.cpp:681] [Rank 7] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank7]:[E1030 09:14:54.220493314 ProcessGroupNCCL.cpp:695] [Rank 7] To avoid data inconsistency, we are taking the entire process down.
what(): [PG ID 1 PG GUID 1 Rank 6] Process group watchdog thread terminated with exception: [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600057 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa81736c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7fa7c561bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7fa7c561d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fa7c561e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7fa8177df5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7fa81806bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7fa8180fca04 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1901 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa81736c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe5c6fc (0x7fa7c52796fc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7fa8177df5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7fa81806bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: clone + 0x44 (0x7fa8180fca04 in /lib/x86_64-linux-gnu/libc.so.6)
[rank7]:[E1030 09:14:54.222412174 ProcessGroupNCCL.cpp:1895] [PG ID 1 PG GUID 1 Rank 7] Process group watchdog thread terminated with exception: [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600060 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f234f76c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7f22fda1bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7f22fda1d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f22fda1e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7f234fc5e5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7f235046bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7f23504fca04 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 1 PG GUID 1 Rank 7] Process group watchdog thread terminated with exception: [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600060 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f234f76c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7f22fda1bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7f22fda1d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f22fda1e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7f234fc5e5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7f235046bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7f23504fca04 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1901 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f234f76c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe5c6fc (0x7f22fd6796fc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7f234fc5e5c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7f235046bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: clone + 0x44 (0x7f23504fca04 in /lib/x86_64-linux-gnu/libc.so.6)
[rank5]:[E1030 09:14:54.303436604 ProcessGroupNCCL.cpp:681] [Rank 5] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank5]:[E1030 09:14:54.303934785 ProcessGroupNCCL.cpp:695] [Rank 5] To avoid data inconsistency, we are taking the entire process down.
[rank5]:[E1030 09:14:54.306150731 ProcessGroupNCCL.cpp:1895] [PG ID 1 PG GUID 1 Rank 5] Process group watchdog thread terminated with exception: [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600085 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fd23e16c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7fd1ec81bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7fd1ec81d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fd1ec81e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7fd23e5e35c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7fd23f06bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7fd23f0fca04 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 1 PG GUID 1 Rank 5] Process group watchdog thread terminated with exception: [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1509, OpType=ALLREDUCE, NumelIn=494450176, NumelOut=494450176, Timeout(ms)=600000) ran for 600085 milliseconds before timing out.
Exception raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:632 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fd23e16c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x2b4 (0x7fd1ec81bc74 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x890 (0x7fd1ec81d7d0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fd1ec81e6ed in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x145c0 (0x7fd23e5e35c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x94ac3 (0x7fd23f06bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7fd23f0fca04 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1901 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fd23e16c1b6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe5c6fc (0x7fd1ec4796fc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7fd23e5e35c0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7fd23f06bac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: clone + 0x44 (0x7fd23f0fca04 in /lib/x86_64-linux-gnu/libc.so.6)
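The timestamps above are consistent with ranks 4 to 7 sitting in the ALLREDUCE (SeqNum=1509) until the NCCL watchdog's Timeout(ms)=600000 expired after the preemption SIGTERMs were sent. The sketch below computes that gap from timestamps copied out of this log. It assumes the torch/elastic lines are in UTC, which the 8-hour offset from the CST time in the slurmstepd line suggests; the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from this log; the torch/elastic lines appear to be UTC.
FMT = "%H:%M:%S.%f"
sigterm_sent = datetime.strptime("09:04:51.295000", FMT)    # first "Sending process ... SIGTERM" line
watchdog_fired = datetime.strptime("09:14:52.381864", FMT)  # first watchdog timeout line (rank 6)
nccl_timeout_s = 600_000 / 1000                             # Timeout(ms)=600000 in the WorkNCCL record

# Both timestamps share the same default date, so the subtraction is a plain interval.
gap_s = (watchdog_fired - sigterm_sent).total_seconds()
print(f"gap = {gap_s:.3f} s, NCCL timeout = {nccl_timeout_s:.0f} s")
# prints "gap = 601.087 s, NCCL timeout = 600 s"
```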
Start time: Thu Oct 30 23:14:07 CST 2025
Node list: SH-IDCA1404-10-140-54-99
Total processes: 8
Current job ID: 6166706
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
[INFO|2025-10-30 15:14:28] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:17821
W1030 15:14:32.420000 113787 site-packages/torch/distributed/run.py:792]
W1030 15:14:32.420000 113787 site-packages/torch/distributed/run.py:792] *****************************************
W1030 15:14:32.420000 113787 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1030 15:14:32.420000 113787 site-packages/torch/distributed/run.py:792] *****************************************
[2025-10-30 15:14:53,109] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,109] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,109] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,110] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,110] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,110] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,110] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-10-30 15:14:53,110] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
[2025-10-30 15:15:03,295] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 15:15:03,300] [INFO] [comm.py:669:init_distributed] cdb=None
[INFO|2025-10-30 15:15:03] llamafactory.hparams.parser:423 >> Process rank: 2, world size: 8, device: cuda:2, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 15:15:03] llamafactory.hparams.parser:423 >> Process rank: 5, world size: 8, device: cuda:5, distributed training: True, compute dtype: torch.bfloat16
[2025-10-30 15:15:03,784] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 15:15:03,784] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 15:15:03,784] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 15:15:03,785] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 15:15:03,786] [INFO] [comm.py:700:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-10-30 15:15:03,788] [INFO] [comm.py:669:init_distributed] cdb=None
[2025-10-30 15:15:03,790] [INFO] [comm.py:669:init_distributed] cdb=None
[INFO|2025-10-30 15:15:04] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 8, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,184 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,185 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,185 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,186 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,186 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,187 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,187 >> loading file chat_template.jinja
[INFO|2025-10-30 15:15:04] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 8, device: cuda:1, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 15:15:04] llamafactory.hparams.parser:423 >> Process rank: 7, world size: 8, device: cuda:7, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 15:15:04] llamafactory.hparams.parser:423 >> Process rank: 3, world size: 8, device: cuda:3, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 15:15:04] llamafactory.hparams.parser:423 >> Process rank: 4, world size: 8, device: cuda:4, distributed training: True, compute dtype: torch.bfloat16
[INFO|2025-10-30 15:15:04] llamafactory.hparams.parser:423 >> Process rank: 6, world size: 8, device: cuda:6, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2364] 2025-10-30 15:15:04,576 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|image_processing_base.py:381] 2025-10-30 15:15:04,589 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|image_processing_base.py:381] 2025-10-30 15:15:04,610 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|image_processing_base.py:428] 2025-10-30 15:15:04,640 >> Image processor Qwen2VLImageProcessorFast {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"disable_grouping": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_pad": null,
"do_rescale": true,
"do_resize": true,
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessorFast",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_tensors": null,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2
}
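The image processor fields above (patch_size 14, merge_size 2, min_pixels 3136, max_pixels 12845056) set how many `<|IMAGE|>` placeholder tokens each image produces. The sketch below models the Qwen2-VL resize rule: both sides are rounded to multiples of patch_size × merge_size = 28 and rescaled into the pixel budget, and then 2×2 patches are merged into one token. It is not the library's own function, and the helper names are hypothetical. LLaMA-Factory may also resize images itself before they reach this processor, so the token counts in this log need not match.

```python
import math

# Values copied from the Qwen2VLImageProcessorFast config logged above.
PATCH_SIZE = 14
MERGE_SIZE = 2
MIN_PIXELS = 3136
MAX_PIXELS = 12845056


def smart_resize(height: int, width: int,
                 factor: int = PATCH_SIZE * MERGE_SIZE,
                 min_pixels: int = MIN_PIXELS,
                 max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Round (height, width) to multiples of `factor` and fit the pixel budget.

    Sketch of the Qwen2-VL resize rule; not the transformers implementation.
    """
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink, keeping the aspect ratio and rounding down.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: enlarge, keeping the aspect ratio and rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def image_token_count(height: int, width: int) -> int:
    """Number of <|IMAGE|> tokens for one image after 2x2 patch merging."""
    h_bar, w_bar = smart_resize(height, width)
    grid_h, grid_w = h_bar // PATCH_SIZE, w_bar // PATCH_SIZE
    return (grid_h * grid_w) // (MERGE_SIZE ** 2)


print(smart_resize(480, 640), image_token_count(480, 640))
```

For a 480×640 input, this rule resizes to 476×644, which gives a 34×46 patch grid and 391 merged tokens.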
[INFO|video_processing_utils.py:724] 2025-10-30 15:15:04,653 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|video_processing_utils.py:770] 2025-10-30 15:15:04,656 >> Video processor Qwen2VLVideoProcessor {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"do_sample_frames": false,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"fps": null,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_frames": 768,
"max_pixels": 12845056,
"merge_size": 2,
"min_frames": 4,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"num_frames": null,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_metadata": false,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2,
"video_metadata": null,
"video_processor_type": "Qwen2VLVideoProcessor"
}
[INFO|feature_extraction_utils.py:556] 2025-10-30 15:15:04,686 >> loading configuration file ckpts/Qwen2.5-Omni-3B/preprocessor_config.json
[INFO|feature_extraction_utils.py:597] 2025-10-30 15:15:04,687 >> Feature extractor WhisperFeatureExtractor {
"chunk_length": 300,
"dither": 0.0,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"return_attention_mask": true,
"sampling_rate": 16000,
"temporal_patch_size": 2
}
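The WhisperFeatureExtractor values above are mutually consistent. At sampling_rate 16000, a chunk_length of 300 s gives n_samples = 4,800,000. A hop_length of 160 samples (10 ms) then gives nb_max_frames = 30,000 log-mel frames, each with feature_size 128 bins. The helper name in this small check is illustrative.

```python
# Values copied from the WhisperFeatureExtractor config logged above.
SAMPLING_RATE = 16000  # Hz
CHUNK_LENGTH = 300     # seconds per padded audio chunk
HOP_LENGTH = 160       # samples between successive STFT frames
N_FFT = 400            # STFT window length in samples


def feature_frames(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Number of mel frames for a clip of `num_samples` samples."""
    return num_samples // hop_length


n_samples = SAMPLING_RATE * CHUNK_LENGTH
print(n_samples, feature_frames(n_samples))
# Hop and window lengths expressed in milliseconds.
print(1000 * HOP_LENGTH / SAMPLING_RATE, 1000 * N_FFT / SAMPLING_RATE)
```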
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,716 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,717 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,717 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,718 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,718 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,719 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2093] 2025-10-30 15:15:04,719 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2364] 2025-10-30 15:15:05,105 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:1114] 2025-10-30 15:15:05,118 >> loading configuration file None
[rank5]:[W1030 15:15:05.896585856 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank2]:[W1030 15:15:05.962326035 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[INFO|processing_utils.py:1199] 2025-10-30 15:15:05,624 >> Processor Qwen2_5OmniProcessor:
- image_processor: Qwen2VLImageProcessorFast {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"disable_grouping": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_pad": null,
"do_rescale": true,
"do_resize": true,
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessorFast",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_tensors": null,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2
}
- video_processor: Qwen2VLVideoProcessor {
"chunk_length": 300,
"crop_size": null,
"data_format": "channels_first",
"default_to_square": true,
"device": null,
"dither": 0.0,
"do_center_crop": null,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"do_sample_frames": false,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"fps": null,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"input_data_format": null,
"max_frames": 768,
"max_pixels": 12845056,
"merge_size": 2,
"min_frames": 4,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"num_frames": null,
"pad_size": null,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"return_attention_mask": true,
"return_metadata": false,
"sampling_rate": 16000,
"size": {
"longest_edge": 12845056,
"shortest_edge": 3136
},
"temporal_patch_size": 2,
"video_metadata": null,
"video_processor_type": "Qwen2VLVideoProcessor"
}
- feature_extractor: WhisperFeatureExtractor {
"chunk_length": 300,
"dither": 0.0,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"hop_length": 160,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "Qwen2VLImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"max_pixels": 12845056,
"merge_size": 2,
"min_pixels": 3136,
"n_fft": 400,
"n_samples": 4800000,
"nb_max_frames": 30000,
"padding_side": "right",
"padding_value": 0.0,
"patch_size": 14,
"processor_class": "Qwen2_5OmniProcessor",
"return_attention_mask": true,
"sampling_rate": 16000,
"temporal_patch_size": 2
}
- tokenizer: Qwen2TokenizerFast(name_or_path='ckpts/Qwen2.5-Omni-3B', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|AUDIO|>', '<|audio_bos|>', '<|audio_eos|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_bos|>', '<|vision_eos|>', '<|vision_pad|>', '<|IMAGE|>', '<|VIDEO|>'], 'image_token': '<|IMAGE|>', 'audio_token': '<|AUDIO|>', 'video_token': '<|VIDEO|>', 'vision_bos_token': '<|vision_bos|>', 'vision_eos_token': '<|vision_eos|>', 'audio_bos_token': '<|audio_bos|>', 'audio_eos_token': '<|audio_eos|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151646: AddedToken("<|AUDIO|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151647: AddedToken("<|audio_bos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151648: AddedToken("<|audio_eos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151652: AddedToken("<|vision_bos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151653: AddedToken("<|vision_eos|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151655: AddedToken("<|IMAGE|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151656: AddedToken("<|VIDEO|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
151657: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151658: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)
{
"processor_class": "Qwen2_5OmniProcessor"
}
[INFO|2025-10-30 15:15:05] llamafactory.data.loader:143 >> Loading dataset VG-LLM-train/scannet_det_train_4frames.json...
`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'json' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.
[rank1]:[W1030 15:15:05.517533392 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank6]:[W1030 15:15:05.550564857 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank4]:[W1030 15:15:05.554627333 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank7]:[W1030 15:15:05.601412883 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank3]:[W1030 15:15:05.680445238 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank0]:[W1030 15:15:07.569669251 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
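The warnings above recommend binding each rank to its GPU. In this run, DeepSpeed creates the process group ("Initializing TorchBackend in DeepSpeed with backend nccl"), so the following is only a standalone sketch for a plain torchrun script, not a patch to LLaMA-Factory. It assumes a PyTorch release whose `init_process_group` accepts `device_id` (recent 2.x versions), and both helper names are hypothetical.

```python
import os


def local_rank_from_env(env=None) -> int:
    """Read the LOCAL_RANK that torchrun exports for each worker (default 0)."""
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", "0"))


def init_bound_process_group() -> None:
    """Create an NCCL process group bound to this rank's GPU.

    Hypothetical helper; must run under torchrun on a machine with GPUs.
    """
    import torch
    import torch.distributed as dist

    local_rank = local_rank_from_env()
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)
    # device_id tells the NCCL backend up front which GPU this rank owns.
    dist.init_process_group(backend="nccl", device_id=device)
    # Naming the device explicitly avoids the guessed rank-to-GPU mapping.
    dist.barrier(device_ids=[local_rank])


if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    init_bound_process_group()
```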
training example:
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 
151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151653, 198, 57193, 279, 220, 18, 35, 30618, 14697, 304, 279, 6249, 16184, 1849, 315, 279, 1156, 4034, 624, 5097, 264, 2951, 1140, 1380, 1817, 4343, 5610, 279, 1633, 829, 304, 330, 1502, 1, 323, 1181, 220, 18, 35, 30618, 3745, 304, 330, 2011, 62, 18, 67, 22956, 785, 220, 18, 35, 30618, 3745, 3561, 1265, 387, 508, 87, 21087, 11, 379, 21087, 11, 1147, 21087, 11, 856, 2368, 11, 379, 2368, 11, 1147, 2368, 11, 45672, 11, 9649, 11, 6502, 936, 151645, 198, 151644, 77091, 198, 73594, 2236, 198, 9640, 197, 4913, 1502, 788, 330, 2005, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 18, 17, 11, 220, 15, 13, 21, 11, 220, 16, 13, 15, 20, 11, 220, 15, 13, 23, 21, 11, 220, 16, 13, 22, 11, 220, 15, 13, 23, 22, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 1419, 4748, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 16, 20, 11, 220, 15, 13, 17, 18, 11, 220, 15, 13, 22, 23, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 20, 16, 11, 220, 15, 13, 17, 17, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 17, 11, 481, 15, 13, 16, 17, 11, 220, 16, 13, 21, 21, 11, 220, 15, 13, 17, 21, 11, 220, 15, 13, 21, 23, 11, 220, 16, 13, 17, 18, 11, 481, 18, 13, 
15, 24, 11, 220, 16, 13, 16, 24, 11, 481, 18, 13, 15, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 15, 11, 481, 15, 13, 17, 19, 11, 220, 17, 13, 17, 19, 11, 220, 15, 13, 19, 11, 220, 15, 13, 23, 23, 11, 220, 16, 13, 19, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 11, 481, 15, 13, 20, 19, 11, 220, 18, 13, 15, 18, 11, 220, 15, 13, 17, 17, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 22, 24, 11, 481, 17, 13, 24, 16, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 17, 18, 11, 481, 15, 13, 20, 17, 11, 220, 18, 13, 15, 22, 11, 220, 15, 13, 18, 22, 11, 220, 16, 13, 22, 16, 11, 220, 16, 13, 16, 21, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 53950, 497, 330, 58456, 62, 18, 67, 788, 508, 17, 13, 17, 20, 11, 220, 15, 13, 15, 20, 11, 220, 15, 13, 20, 24, 11, 220, 15, 13, 23, 16, 11, 220, 22, 13, 16, 19, 11, 220, 16, 13, 23, 24, 11, 220, 15, 13, 17, 22, 11, 220, 16, 13, 16, 24, 11, 481, 17, 13, 22, 22, 23439, 60, 73594, 151645, 198]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
<|vision_bos|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAG
E|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|IMAGE|><|vision_eos|>
Detect the 3D bounding boxes in the camera coordinate system of the first frame.
Output a json list where each entry contains the object name in "label" and its 3D bounding box in "box_3d".
The 3D bounding box format should be [x_center, y_center, z_center, x_size, y_size, z_size, yaw, pitch, roll].<|im_end|>
<|im_start|>assistant
```json
[
{"label": "table", "bbox_3d": [0.32, 0.6, 1.05, 0.86, 1.7, 0.87, -1.34, 1.08, -2.84]},
{"label": "backpack", "bbox_3d": [0.15, 0.23, 0.78, 0.49, 0.51, 0.22, -1.34, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.62, -0.12, 1.66, 0.26, 0.68, 1.23, -3.09, 1.19, -3.04]},
{"label": "blackboard", "bbox_3d": [0.0, -0.24, 2.24, 0.4, 0.88, 1.4, 1.8, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.6, -0.54, 3.03, 0.22, 0.49, 0.79, -2.91, 1.08, -2.84]},
{"label": "blackboard", "bbox_3d": [0.23, -0.52, 3.07, 0.37, 1.71, 1.16, 1.8, 1.08, -2.84]},
{"label": "shelf", "bbox_3d": [2.25, 0.05, 0.59, 0.81, 7.14, 1.89, 0.27, 1.19, -2.77]}
]```<|im_end|>
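In this sample the instruction asks for the key "box_3d", but the assistant answer uses "bbox_3d". Each box is meant to carry nine values in the order [x_center, y_center, z_center, x_size, y_size, z_size, yaw, pitch, roll].

Below is a minimal sketch of a parser that would make key mismatches or wrong box lengths fail loudly. It assumes the answer is wrapped as "```json ... ```<|im_end|>", as in the dump. The helper `parse_boxes` and the trimmed `SAMPLE_ANSWER` are hypothetical illustrations, not part of LLaMA-Factory.

```python
import json
from typing import Dict, List

# Field order taken from the instruction text in the sample above.
BOX_FIELDS = ["x_center", "y_center", "z_center",
              "x_size", "y_size", "z_size",
              "yaw", "pitch", "roll"]

# Trimmed copy of the assistant answer from the dump (first and last entries only).
SAMPLE_ANSWER = (
    '```json\n[\n'
    '{"label": "table", "bbox_3d": [0.32, 0.6, 1.05, 0.86, 1.7, 0.87, -1.34, 1.08, -2.84]},\n'
    '{"label": "shelf", "bbox_3d": [2.25, 0.05, 0.59, 0.81, 7.14, 1.89, 0.27, 1.19, -2.77]}\n'
    ']```<|im_end|>'
)


def parse_boxes(answer: str, key: str = "bbox_3d") -> List[Dict[str, object]]:
    """Hypothetical helper: turn a fenced JSON answer into named box fields."""
    text = answer.strip()
    end_token = "<|im_end|>"
    if text.endswith(end_token):
        text = text[: -len(end_token)]
    if text.startswith("```"):
        text = text.split("\n", 1)[1]  # drop the opening ```json line
    if text.endswith("```"):
        text = text[:-3]  # drop the closing fence
    boxes = []
    for entry in json.loads(text):
        values = entry[key]  # KeyError if the answer uses a different key
        if len(values) != len(BOX_FIELDS):
            raise ValueError(f"{entry['label']}: expected 9 values, got {len(values)}")
        boxes.append({"label": entry["label"], **dict(zip(BOX_FIELDS, values))})
    return boxes


if __name__ == "__main__":
    for box in parse_boxes(SAMPLE_ANSWER):
        print(box["label"], box["x_center"], box["roll"])
```

Calling `parse_boxes(SAMPLE_ANSWER, key="box_3d")`, the key named in the instruction, raises `KeyError` on this sample. That surfaces the prompt/target mismatch instead of silently training on it.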
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 73594, 2236, 198, 9640, 197, 4913, 1502, 788, 330, 2005, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 18, 17, 11, 220, 15, 13, 21, 11, 220, 16, 13, 15, 20, 11, 220, 15, 13, 23, 21, 11, 220, 16, 13, 22, 11, 220, 15, 13, 23, 22, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 1419, 4748, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 16, 20, 11, 220, 15, 13, 17, 18, 11, 220, 15, 13, 22, 23, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 20, 16, 11, 220, 15, 13, 17, 17, 11, 481, 16, 13, 18, 19, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 17, 11, 481, 15, 13, 16, 17, 11, 220, 16, 13, 21, 21, 11, 220, 15, 13, 17, 21, 11, 220, 15, 13, 21, 23, 11, 220, 16, 13, 17, 18, 11, 481, 18, 13, 15, 24, 11, 220, 16, 13, 16, 24, 11, 481, 18, 13, 15, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 15, 11, 481, 15, 13, 17, 19, 11, 220, 17, 13, 17, 19, 11, 220, 15, 13, 19, 11, 220, 15, 13, 23, 23, 11, 220, 16, 13, 19, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 5507, 497, 330, 58456, 62, 18, 67, 788, 10055, 15, 13, 21, 11, 481, 15, 13, 20, 19, 11, 220, 18, 13, 15, 18, 11, 220, 15, 13, 17, 17, 11, 220, 15, 13, 19, 24, 11, 220, 15, 13, 22, 24, 11, 481, 17, 13, 24, 16, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 11453, 2482, 497, 330, 58456, 62, 18, 67, 788, 508, 15, 13, 17, 18, 11, 481, 15, 13, 20, 17, 11, 220, 18, 13, 15, 22, 11, 220, 15, 13, 18, 22, 11, 220, 16, 13, 22, 16, 11, 220, 16, 13, 16, 21, 11, 220, 16, 13, 23, 11, 220, 16, 13, 15, 23, 11, 481, 17, 13, 23, 19, 57352, 197, 4913, 1502, 788, 330, 53950, 497, 330, 58456, 62, 18, 67, 788, 508, 17, 13, 17, 20, 11, 220, 15, 13, 15, 20, 11, 220, 15, 13, 20, 24, 11, 220, 15, 
13, 23, 16, 11, 220, 22, 13, 16, 19, 11, 220, 16, 13, 23, 24, 11, 220, 15, 13, 17, 22, 11, 220, 16, 13, 16, 24, 11, 481, 17, 13, 22, 22, 23439, 60, 73594, 151645, 198]
labels:
```json
[
{"label": "table", "bbox_3d": [0.32, 0.6, 1.05, 0.86, 1.7, 0.87, -1.34, 1.08, -2.84]},
{"label": "backpack", "bbox_3d": [0.15, 0.23, 0.78, 0.49, 0.51, 0.22, -1.34, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.62, -0.12, 1.66, 0.26, 0.68, 1.23, -3.09, 1.19, -3.04]},
{"label": "blackboard", "bbox_3d": [0.0, -0.24, 2.24, 0.4, 0.88, 1.4, 1.8, 1.08, -2.84]},
{"label": "window", "bbox_3d": [-0.6, -0.54, 3.03, 0.22, 0.49, 0.79, -2.91, 1.08, -2.84]},
{"label": "blackboard", "bbox_3d": [0.23, -0.52, 3.07, 0.37, 1.71, 1.16, 1.8, 1.08, -2.84]},
{"label": "shelf", "bbox_3d": [2.25, 0.05, 0.59, 0.81, 7.14, 1.89, 0.27, 1.19, -2.77]}
]```<|im_end|>
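In `label_ids`, every prompt position carries `-100`: the system turn, the `<|IMAGE|>` placeholders and the user instruction. Only the assistant reply keeps real token IDs, which is why the decoded `labels` show just the JSON answer. `-100` is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so masked positions contribute nothing to the loss.

Below is a minimal sketch of that masking and of recovering the supervised span. The helper names and the toy prompt IDs are hypothetical, not LLaMA-Factory's API. The response IDs are copied from the start and end of the supervised span in the dump above.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_label_ids(prompt_ids: List[int], response_ids: List[int]) -> List[int]:
    """Hypothetical helper: mask every prompt position so only the response is supervised."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)


def supervised_ids(label_ids: List[int]) -> List[int]:
    """Drop masked positions, mirroring how the decoded `labels` view is produced."""
    return [t for t in label_ids if t != IGNORE_INDEX]


if __name__ == "__main__":
    prompt = [1, 2, 3]  # toy prompt IDs (illustrative only)
    response = [73594, 2236, 198, 73594, 151645, 198]  # head and tail of the span in the dump
    labels = build_label_ids(prompt, response)
    assert len(labels) == len(prompt) + len(response)  # labels stay aligned with input_ids
    print(supervised_ids(labels))
```

Keeping `label_ids` the same length as `input_ids` preserves position alignment for the shifted next-token loss. Masking, rather than truncating, is what lets the long image-token prefix stay in the input without being trained on.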
[INFO|configuration_utils.py:763] 2025-10-30 15:15:18,251 >> loading configuration file ckpts/Qwen2.5-Omni-3B/config.json
[WARNING|modeling_rope_utils.py:557] 2025-10-30 15:15:18,256 >> Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO|configuration_qwen2_5_omni.py:1059] 2025-10-30 15:15:18,259 >> thinker_config is None. Initializing thinker model with default values
[INFO|configuration_qwen2_5_omni.py:1063] 2025-10-30 15:15:18,259 >> talker_config is None. Initializing talker model with default values
[INFO|configuration_qwen2_5_omni.py:1067] 2025-10-30 15:15:18,260 >> token2wav_config is None. Initializing token2wav model with default values
[INFO|configuration_utils.py:839] 2025-10-30 15:15:18,264 >> Model config Qwen2_5OmniConfig {
"architectures": [
"Qwen2_5OmniForConditionalGeneration"
],
"dtype": "bfloat16",
"enable_audio_output": true,
"enable_talker": true,
"model_type": "qwen2_5_omni",
"talker_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "Qwen2.5-Omni-3B/talker",
"architectures": [
"Qwen2OmniTalkerForConditionalGeneration"
],
"attention_dropout": 0.0,
"audio_end_token_id": 151648,
"audio_start_token_id": 151647,
"audio_token_index": 151646,
"dtype": "bfloat16",
"embedding_size": 2048,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 896,
"image_token_index": 151655,
"init_std": 0.02,
"initializer_range": 0.02,
"intermediate_size": 4864,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2_5_omni_talker",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"position_id_per_seconds": 25,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_section": [
16,
16,
0
],
"rope_type": "default",
"type": "default"
},
"rope_theta": 1000000.0,
"seconds_per_chunk": 2,
"sliding_window": null,
"spatial_merge_size": 2,
"tts_codec_end_token_id": 8294,
"tts_codec_mask_token_id": 8296,
"tts_codec_pad_token_id": 8292,
"tts_codec_start_token_id": 8293,
"tts_text_end_token_id": 151861,
"tts_text_pad_token_id": 151859,
"tts_text_start_token_id": 151860,
"use_cache": true,
"use_sliding_window": false,
"video_token_index": 151656,
"vision_end_token_id": 151653,
"vision_start_token_id": 151652,
"vocab_size": 8448
},
"thinker_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "Qwen2.5-Omni-3B/thinker",
"architectures": [
"Qwen2OmniNaViTThinkerForConditionalGeneration"
],
"audio_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"activation_dropout": 0.0,
"activation_function": "gelu",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"d_model": 1280,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.0,
"dtype": null,
"early_stopping": false,
"encoder_attention_heads": 20,
"encoder_ffn_dim": 5120,
"encoder_layerdrop": 0.0,
"encoder_layers": 32,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"init_std": 0.02,
"initializer_range": 0.02,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"max_source_positions": 1500,
"min_length": 0,
"model_type": "qwen2_5_omni_audio_encoder",
"n_window": 100,
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 32,
"num_mel_bins": 128,
"num_return_sequences": 1,
"output_attentions": false,
"output_dim": 2048,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"scale_embedding": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false
},
"audio_end_token_id": 151648,
"audio_start_token_id": 151647,
"audio_token_index": 151646,
"bos_token_id": 151644,
"dtype": "bfloat16",
"eos_token_id": 151645,
"ignore_index": -100,
"image_token_index": 151655,
"init_std": 0.02,
"initializer_range": 0.02,
"model_type": "qwen2_5_omni_thinker",
"pad_token_id": 151643,
"position_id_per_seconds": 25,
"seconds_per_chunk": 2,
"text_config": {
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"attention_dropout": 0.0,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dtype": null,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"hidden_act": "silu",
"hidden_size": 2048,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"init_std": 0.02,
"initializer_range": 0.02,
"intermediate_size": 11008,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 32768,
"max_window_layers": 70,
"min_length": 0,
"model_type": "qwen2_5_omni_text",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_hidden_layers": 36,
"num_key_value_heads": 2,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_section": [
16,
24,
24
],
"rope_type": "default",
"type": "default"
},
"rope_theta": 1000000.0,
"sep_token_id": null,
"sliding_window": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": false,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false,
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
},
"user_token_id": 872,
"video_token_index": 151656,
"vision_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"depth": 32,
"diversity_penalty": 0.0,
"do_sample": false,
"dtype": null,
"early_stopping": false,
"embed_dim": 1280,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"fullatt_block_indexes": [
7,
15,
23,
31
],
"hidden_act": "silu",
"hidden_size": 1280,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"in_channels": 3,
"in_chans": 3,
"init_std": 0.02,
"initializer_range": 0.02,
"intermediate_size": 3420,
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"min_length": 0,
"model_type": "qwen2_5_omni_vision_encoder",
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_heads": 16,
"num_return_sequences": 1,
"out_hidden_size": 2048,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"patch_size": 14,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"spatial_merge_size": 2,
"spatial_patch_size": 14,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"temporal_patch_size": 2,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"tokens_per_second": 25,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false,
"window_size": 112
},
"vision_end_token_id": 151653,
"vision_start_token_id": 151652,
"vision_token_id": 151654
},
"token2wav_config": {
"_attn_implementation_autoset": true,
"bigvgan_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"diversity_penalty": 0.0,
"do_sample": false,
"dtype": null,
"early_stopping": false,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"max_length": 20,
"mel_dim": 80,
"min_length": 0,
"model_type": "qwen2_5_omni_bigvgan",
"no_repeat_ngram_size": 0,
"num_beam_groups": 1,
"num_beams": 1,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repetition_penalty": 1.0,
"resblock_dilation_sizes": [
[
1,
3,
5
],
[
1,
3,
5
],
[
1,
3,
5
]
],
"resblock_kernel_sizes": [
3,
7,
11
],
"return_dict": true,
"return_dict_in_generate": false,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"upsample_initial_channel": 1536,
"upsample_kernel_sizes": [
11,
7,
4,
4,
4,
4
],
"upsample_rates": [
5,
3,
2,
2,
2,
2
],
"use_bfloat16": false,
"use_bias_at_final": false
},
"dit_config": {
"_attn_implementation_autoset": true,
"_name_or_path": "",
"add_cross_attention": false,
"architectures": null,
"bad_words_ids": null,
"begin_suppress_tokens": null,
"block_size": 24,
"bos_token_id": null,
"chunk_size_feed_forward": 0,
"cross_attention_hidden_size": null,
"decoder_start_token_id": null,
"depth": 22,
"dim": 1024,
"diversity_penalty": 0.0,
"do_sample": false,
"dropout": 0.1,
"dtype": "float32",
"early_stopping": false,
"emb_dim": 512,
"enc_attention_channels": 64,
"enc_channels": [
256,
256,
256,
256,
768
],
"enc_dilations": [
1,
2,
3,
4,
1
],
"enc_dim": 128,
"enc_emb_dim": 192,
"enc_global_context": true,
"enc_kernel_sizes": [
5,
3,
3,
3,
1
],
"enc_lin_neurons": 192,
"enc_res2net_scale": 2,
"enc_se_channels": 64,
"encoder_no_repeat_ngram_size": 0,
"eos_token_id": null,
"exponential_decay_length_penalty": null,
"ff_mult": 2,
"finetuning_task": null,
"forced_bos_token_id": null,
"forced_eos_token_id": null,
"head_dim": 64,
"heads": 16,
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"is_decoder": false,
"is_encoder_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"length_penalty": 1.0,
"look_ahead_layers": [
10
],
"look_backward_layers": [
0,
20
],
"max_length": 20,
"max_position_embeddings": 32768,
"mel_dim": 80,
"min_length": 0,
"model_type": "qwen2_5_omni_dit",
"no_repeat_ngram_size": 0,
"num_attention_heads": 16,
"num_beam_groups": 1,
"num_beams": 1,
"num_embeds": 8193,
"num_hidden_layers": 22,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"pad_token_id": null,
"prefix": null,
"problem_type": null,
"pruned_heads": {},
"remove_invalid_values": false,
"repeats": 2,
"repetition_penalty": 1.0,
"return_dict": true,
"return_dict_in_generate": false,
"rope_theta": 10000.0,
"sep_token_id": null,
"suppress_tokens": null,
"task_specific_params": null,
"temperature": 1.0,
"tf_legacy_loss": false,
"tie_encoder_decoder": false,
"tie_word_embeddings": true,
"tokenizer_class": null,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"typical_p": 1.0,
"use_bfloat16": false
},
"model_type": "qwen2_5_omni_token2wav"
},
"transformers_version": "4.57.1"
}
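Several quantities follow directly from the `vision_config` and `bigvgan_config` values above. The sketch assumes two things that the dump does not state. First, the vision encoder tiles an image into `patch_size` × `patch_size` patches and then merges `spatial_merge_size` × `spatial_merge_size` neighbouring patches into one LLM token. Second, the attention window is measured in merged tokens as `window_size // spatial_merge_size // patch_size`, as in Qwen2.5-VL's vision encoder. The vocoder figure relies on the usual HiFi-GAN/BigVGAN design, in which the product of `upsample_rates` equals the number of waveform samples produced per mel frame.

```python
import math

# Values copied from the Model config dump above.
PATCH_SIZE = 14                      # vision_config.patch_size
SPATIAL_MERGE_SIZE = 2               # vision_config.spatial_merge_size
WINDOW_SIZE = 112                    # vision_config.window_size
DEPTH = 32                           # vision_config.depth
FULLATT_BLOCKS = [7, 15, 23, 31]     # vision_config.fullatt_block_indexes
UPSAMPLE_RATES = [5, 3, 2, 2, 2, 2]  # bigvgan_config.upsample_rates


def image_tokens(height, width):
    """LLM tokens for one image, assuming both sides are multiples of the merged-token size."""
    side = PATCH_SIZE * SPATIAL_MERGE_SIZE  # 28 pixels per merged-token side
    return (height // side) * (width // side)


window_in_merged_tokens = WINDOW_SIZE // SPATIAL_MERGE_SIZE // PATCH_SIZE
windowed_blocks = [i for i in range(DEPTH) if i not in FULLATT_BLOCKS]
samples_per_mel_frame = math.prod(UPSAMPLE_RATES)

print(image_tokens(448, 448))   # 256
print(window_in_merged_tokens)  # 4
print(len(windowed_blocks))     # 28
print(samples_per_mel_frame)    # 240
```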
[INFO|2025-10-30 15:15:18] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
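The `Unrecognized keys in rope_scaling` warning comes from the generic RoPE config check. That check does not recognize the multimodal `mrope_section` key for `rope_type='default'`. The key itself is still kept, since it appears in both `rope_scaling` blocks of the config dump. Assume that M-RoPE splits the `head_dim / 2` rotary frequencies of each head into temporal, height, and width sections, as in the Qwen2-VL family. Under that assumption, each `mrope_section` should sum to `head_dim / 2`. The check below uses only numbers from the dump. The thinker's head size is derived as `hidden_size // num_attention_heads` because its text config does not list `head_dim`.

```python
# Attention geometry copied from the Model config dump above.
CONFIGS = {
    "thinker.text_config": {
        "hidden_size": 2048, "num_attention_heads": 16,
        "num_key_value_heads": 2, "mrope_section": [16, 24, 24],
    },
    "talker_config": {
        "hidden_size": 896, "num_attention_heads": 14, "head_dim": 64,
        "num_key_value_heads": 2, "mrope_section": [16, 16, 0],
    },
}


def attention_geometry(cfg):
    """Return (head_dim, query heads per KV head, whether mrope_section covers head_dim / 2)."""
    head_dim = cfg.get("head_dim", cfg["hidden_size"] // cfg["num_attention_heads"])
    gqa_group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    mrope_ok = sum(cfg["mrope_section"]) == head_dim // 2
    return head_dim, gqa_group, mrope_ok


for name, cfg in CONFIGS.items():
    print(name, attention_geometry(cfg))
# thinker.text_config (128, 8, True)
# talker_config (64, 7, True)
```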
[WARNING|logging.py:328] 2025-10-30 15:15:20,035 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|modeling_utils.py:1169] 2025-10-30 15:15:20,039 >> loading weights file ckpts/Qwen2.5-Omni-3B/model.safetensors.index.json
[INFO|modeling_utils.py:2341] 2025-10-30 15:15:20,044 >> Instantiating Qwen2_5OmniForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:986] 2025-10-30 15:15:20,048 >> Generate config GenerationConfig {
"use_cache": false
}
[INFO|configuration_utils.py:986] 2025-10-30 15:15:20,050 >> Generate config GenerationConfig {
"bos_token_id": 151644,
"eos_token_id": 151645,
"pad_token_id": 151643
}
[INFO|configuration_utils.py:986] 2025-10-30 15:15:20,131 >> Generate config GenerationConfig {}
[INFO|modeling_utils.py:2341] 2025-10-30 15:15:20,145 >> Instantiating Qwen2_5OmniToken2WavDiTModel model under default dtype torch.float32.
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 3/3 [00:15<00:00, 5.04s/it]
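The shard count of 3 comes from `model.safetensors.index.json`. In the Hugging Face sharded-checkpoint format, this index maps every parameter name to the shard file that stores it under `weight_map`, and records the byte total under `metadata.total_size`. The sketch below counts shards from such an index. The toy index, including its parameter names, file names, and size, is made up for illustration and is not read from `ckpts/Qwen2.5-Omni-3B`.

```python
import json
from collections import Counter


def summarize_index(index_json):
    """Return (shard file -> tensor count, total_size) for a sharded safetensors index."""
    index = json.loads(index_json)
    per_shard = Counter(index["weight_map"].values())
    return dict(per_shard), index.get("metadata", {}).get("total_size")


# Hypothetical toy index with three shards (names and sizes are illustrative).
toy_index = json.dumps({
    "metadata": {"total_size": 1234},
    "weight_map": {
        "thinker.model.embed_tokens.weight": "model-00001-of-00003.safetensors",
        "thinker.model.layers.0.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
        "thinker.lm_head.weight": "model-00002-of-00003.safetensors",
        "talker.model.norm.weight": "model-00003-of-00003.safetensors",
    },
})
shards, total = summarize_index(toy_index)
print(len(shards), total)  # 3 1234
```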
[INFO|configuration_utils.py:939] 2025-10-30 15:15:35,810 >> loading configuration file ckpts/Qwen2.5-Omni-3B/generation_config.json
[INFO|configuration_utils.py:986] 2025-10-30 15:15:35,811 >> Generate config GenerationConfig {}
[INFO|dynamic_module_utils.py:423] 2025-10-30 15:15:35,817 >> Could not locate the custom_generate/generate.py inside ckpts/Qwen2.5-Omni-3B.
[INFO|modeling_qwen2_5_omni.py:3727] 2025-10-30 15:15:35,865 >> Speaker ['Ethan', 'Chelsie'] loaded
[INFO|2025-10-30 15:15:35] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
[INFO|2025-10-30 15:15:35] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-10-30 15:15:35] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
[INFO|2025-10-30 15:15:35] llamafactory.model.adapter:143 >> Fine-tuning method: Full
[INFO|2025-10-30 15:15:35] llamafactory.model.model_utils.visual:143 >> Set vision model not trainable: ['visual.patch_embed', 'visual.blocks', 'audio_tower'].
[INFO|2025-10-30 15:15:35] llamafactory.model.model_utils.visual:143 >> Set multi model projector not trainable: visual.merger.
[INFO|2025-10-30 15:15:36] llamafactory.model.loader:143 >> trainable params: 3,397,103,616 || all params: 4,703,464,448 || trainable%: 72.2256
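The trainable-parameter summary can be checked by hand. The percentage is trainable / all × 100, and the difference between the two counts is the number of parameters left frozen.

```python
# Counts copied from the llamafactory.model.loader line above.
TRAINABLE = 3_397_103_616
ALL_PARAMS = 4_703_464_448

frozen = ALL_PARAMS - TRAINABLE
trainable_pct = 100 * TRAINABLE / ALL_PARAMS

print(f"frozen params: {frozen:,}")        # frozen params: 1,306,360,832
print(f"trainable%: {trainable_pct:.4f}")  # trainable%: 72.2256
```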
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
[INFO|trainer.py:749] 2025-10-30 15:15:36,289 >> Using auto half precision backend
[WARNING|trainer.py:982] 2025-10-30 15:15:36,292 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
Gradient accumulation steps mismatch: GradientAccumulationPlugin has 1, DeepSpeed config has 4. Using DeepSpeed's value.
[2025-10-30 15:15:36,609] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed info: version=0.16.9, git-hash=unknown, git-branch=unknown
[2025-10-30 15:15:36,610] [INFO] [config.py:735:__init__] Config mesh_device None world_size = 8
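DeepSpeed's `gradient_accumulation_steps = 4` takes precedence here, and `world_size = 8`. The number of samples per optimizer step is therefore the per-GPU micro-batch × 4 × 8, and DeepSpeed requires `train_batch_size` to equal that product. This part of the log does not print the per-GPU micro-batch, so the sketch takes it as a parameter. The helper name is hypothetical.

```python
def samples_per_optimizer_step(micro_batch_per_gpu, grad_accum_steps=4, world_size=8):
    """train_batch_size = micro-batch per GPU x gradient accumulation steps x data-parallel ranks."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


for micro in (1, 2, 4):
    print(micro, samples_per_optimizer_step(micro))
# 1 32
# 2 64
# 4 128
```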
[2025-10-30 15:15:38,572] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2025-10-30 15:15:38,576] [INFO] [logging.py:107:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2025-10-30 15:15:38,577] [INFO] [logging.py:107:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2025-10-30 15:15:38,613] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2025-10-30 15:15:38,613] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
[2025-10-30 15:15:38,614] [INFO] [logging.py:107:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2025-10-30 15:15:38,614] [INFO] [stage_1_and_2.py:150:__init__] Reduce bucket size 500000000
[2025-10-30 15:15:38,615] [INFO] [stage_1_and_2.py:151:__init__] Allgather bucket size 500000000
[2025-10-30 15:15:38,615] [INFO] [stage_1_and_2.py:152:__init__] CPU Offload: False
[2025-10-30 15:15:38,616] [INFO] [stage_1_and_2.py:153:__init__] Round robin gradient partitioning: True
[2025-10-30 15:15:50,847] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2025-10-30 15:15:50,848] [INFO] [utils.py:782:see_memory_usage] MA 10.35 GB Max_MA 10.35 GB CA 10.38 GB Max_CA 10 GB
[2025-10-30 15:15:50,850] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 81.42 GB, percent = 8.1%
[2025-10-30 15:15:51,185] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2025-10-30 15:15:51,187] [INFO] [utils.py:782:see_memory_usage] MA 10.35 GB Max_MA 11.93 GB CA 11.96 GB Max_CA 12 GB
[2025-10-30 15:15:51,188] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 85.19 GB, percent = 8.5%
[2025-10-30 15:15:51,189] [INFO] [stage_1_and_2.py:557:__init__] optimizer state initialized
[2025-10-30 15:15:51,683] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2025-10-30 15:15:51,685] [INFO] [utils.py:782:see_memory_usage] MA 10.35 GB Max_MA 10.35 GB CA 11.96 GB Max_CA 12 GB
[2025-10-30 15:15:51,686] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 90.83 GB, percent = 9.0%
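The ZeRO paper's accounting for mixed-precision Adam gives a rough per-rank budget under stage 2, useful for orientation. The 16-bit weights are replicated on every rank. The 16-bit gradients are partitioned across ranks, as is the fp32 optimizer state (master weights, momentum, and variance), which costs 12 bytes per trainable parameter. This is a textbook estimate and does not reconstruct the `MA`/`CA` figures above. Those figures reflect what DeepSpeed had actually allocated at each point. The estimate also ignores activations, buffers, and allocator overhead, and it reports sizes in GiB.

```python
GIB = 2 ** 30

# Counts from the trainable-params line; world_size from the DeepSpeed config line.
ALL_PARAMS = 4_703_464_448
TRAINABLE = 3_397_103_616
WORLD_SIZE = 8


def zero2_per_rank_bytes(all_params, trainable, world_size):
    """Rough ZeRO stage-2 model-state bytes per rank for 16-bit weights with fp32 Adam.

    Weights (2 B/param) are replicated; gradients (2 B/param) and optimizer state
    (4 B master copy + 4 B momentum + 4 B variance) are partitioned across ranks.
    Activations, buffers and allocator overhead are ignored.
    """
    weights = 2 * all_params
    grads = 2 * trainable // world_size
    optim = 12 * trainable // world_size
    return {"weights": weights, "grads": grads, "optimizer": optim}


for part, nbytes in zero2_per_rank_bytes(ALL_PARAMS, TRAINABLE, WORLD_SIZE).items():
    print(f"{part:9s} {nbytes / GIB:6.2f} GiB")
```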
[2025-10-30 15:15:51,691] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2025-10-30 15:15:51,692] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = None
[2025-10-30 15:15:51,692] [INFO] [logging.py:107:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2025-10-30 15:15:51,693] [INFO] [logging.py:107:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2025-10-30 15:15:51,699] [INFO] [config.py:1003:print] DeepSpeedEngine configuration:
[2025-10-30 15:15:51,701] [INFO] [config.py:1007:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2025-10-30 15:15:51,701] [INFO] [config.py:1007:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'intra_op_parallelism': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2025-10-30 15:15:51,702] [INFO] [config.py:1007:print] amp_enabled .................. False
[2025-10-30 15:15:51,702] [INFO] [config.py:1007:print] amp_params ................... False
[2025-10-30 15:15:51,704] [INFO] [config.py:1007:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2025-10-30 15:15:51,704] [INFO] [config.py:1007:print] bfloat16_enabled ............. True
[2025-10-30 15:15:51,705] [INFO] [config.py:1007:print] bfloat16_immediate_grad_update True
[2025-10-30 15:15:51,705] [INFO] [config.py:1007:print] checkpoint_parallel_write_pipeline False
[2025-10-30 15:15:51,706] [INFO] [config.py:1007:print] checkpoint_tag_validation_enabled True
[2025-10-30 15:15:51,706] [INFO] [config.py:1007:print] checkpoint_tag_validation_fail False
[2025-10-30 15:15:51,707] [INFO] [config.py:1007:print] comms_config .................
[2025-10-30 15:15:51,707] [INFO] [config.py:1007:print] communication_data_type ...... None
[2025-10-30 15:15:51,708] [INFO] [config.py:1007:print] compile_config ............... deepcompile=False free_activation=False offload_activation=False offload_opt_states=False double_buffer=True symmetric_memory=False debug_log=False offload_parameters=False sync_before_reduce=False sync_after_reduce=False sync_before_allgather=False sync_after_allgather=False
[2025-10-30 15:15:51,709] [INFO] [config.py:1007:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-10-30 15:15:51,709] [INFO] [config.py:1007:print] curriculum_enabled_legacy .... False
[2025-10-30 15:15:51,710] [INFO] [config.py:1007:print] curriculum_params_legacy ..... False
[2025-10-30 15:15:51,710] [INFO] [config.py:1007:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'pin_memory': False, 'curriculum_learning': {'enabled': False}, 'dynamic_batching': {'enabled': False, 'lr_scaling_method': 'linear', 'min_batch_size': 1, 'max_batch_size': None, 'sequence_picking_order': 'dataloader', 'verbose': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-10-30 15:15:51,711] [INFO] [config.py:1007:print] data_efficiency_enabled ...... False
[2025-10-30 15:15:51,711] [INFO] [config.py:1007:print] dataloader_drop_last ......... False
[2025-10-30 15:15:51,712] [INFO] [config.py:1007:print] disable_allgather ............ False
[2025-10-30 15:15:51,712] [INFO] [config.py:1007:print] dump_state ................... False
[2025-10-30 15:15:51,713] [INFO] [config.py:1007:print] dynamic_loss_scale_args ...... None
[2025-10-30 15:15:51,713] [INFO] [config.py:1007:print] eigenvalue_enabled ........... False
[2025-10-30 15:15:51,714] [INFO] [config.py:1007:print] eigenvalue_gas_boundary_resolution 1
[2025-10-30 15:15:51,714] [INFO] [config.py:1007:print] eigenvalue_layer_name ........ bert.encoder.layer
[2025-10-30 15:15:51,715] [INFO] [config.py:1007:print] eigenvalue_layer_num ......... 0
[2025-10-30 15:15:51,715] [INFO] [config.py:1007:print] eigenvalue_max_iter .......... 100
[2025-10-30 15:15:51,716] [INFO] [config.py:1007:print] eigenvalue_stability ......... 1e-06
[2025-10-30 15:15:51,716] [INFO] [config.py:1007:print] eigenvalue_tol ............... 0.01
[2025-10-30 15:15:51,717] [INFO] [config.py:1007:print] eigenvalue_verbose ........... False
[2025-10-30 15:15:51,717] [INFO] [config.py:1007:print] elasticity_enabled ........... False
[2025-10-30 15:15:51,718] [INFO] [config.py:1007:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2025-10-30 15:15:51,718] [INFO] [config.py:1007:print] fp16_auto_cast ............... None
[2025-10-30 15:15:51,719] [INFO] [config.py:1007:print] fp16_enabled ................. False
[2025-10-30 15:15:51,720] [INFO] [config.py:1007:print] fp16_master_weights_and_gradients False
[2025-10-30 15:15:51,720] [INFO] [config.py:1007:print] global_rank .................. 0
[2025-10-30 15:15:51,721] [INFO] [config.py:1007:print] grad_accum_dtype ............. None
[2025-10-30 15:15:51,721] [INFO] [config.py:1007:print] gradient_accumulation_steps .. 4
[2025-10-30 15:15:51,722] [INFO] [config.py:1007:print] gradient_clipping ............ 1.0
[2025-10-30 15:15:51,722] [INFO] [config.py:1007:print] gradient_predivide_factor .... 1.0
[2025-10-30 15:15:51,723] [INFO] [config.py:1007:print] graph_harvesting ............. False
[2025-10-30 15:15:51,723] [INFO] [config.py:1007:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-10-30 15:15:51,724] [INFO] [config.py:1007:print] initial_dynamic_scale ........ 1
[2025-10-30 15:15:51,724] [INFO] [config.py:1007:print] load_universal_checkpoint .... False
[2025-10-30 15:15:51,725] [INFO] [config.py:1007:print] loss_scale ................... 1.0
[2025-10-30 15:15:51,725] [INFO] [config.py:1007:print] memory_breakdown ............. False
[2025-10-30 15:15:51,726] [INFO] [config.py:1007:print] mics_hierarchial_params_gather False
[2025-10-30 15:15:51,726] [INFO] [config.py:1007:print] mics_shard_size .............. -1
[2025-10-30 15:15:51,727] [INFO] [config.py:1007:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2025-10-30 15:15:51,728] [INFO] [config.py:1007:print] nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2025-10-30 15:15:51,728] [INFO] [config.py:1007:print] optimizer_legacy_fusion ...... False
[2025-10-30 15:15:51,729] [INFO] [config.py:1007:print] optimizer_name ............... None
[2025-10-30 15:15:51,729] [INFO] [config.py:1007:print] optimizer_params ............. None
[2025-10-30 15:15:51,730] [INFO] [config.py:1007:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-10-30 15:15:51,730] [INFO] [config.py:1007:print] pld_enabled .................. False
[2025-10-30 15:15:51,731] [INFO] [config.py:1007:print] pld_params ................... False
[2025-10-30 15:15:51,731] [INFO] [config.py:1007:print] prescale_gradients ........... False
[2025-10-30 15:15:51,732] [INFO] [config.py:1007:print] scheduler_name ............... None
[2025-10-30 15:15:51,732] [INFO] [config.py:1007:print] scheduler_params ............. None
[2025-10-30 15:15:51,732] [INFO] [config.py:1007:print] seq_parallel_communication_data_type torch.float32
[2025-10-30 15:15:51,733] [INFO] [config.py:1007:print] sparse_attention ............. None
[2025-10-30 15:15:51,733] [INFO] [config.py:1007:print] sparse_gradients_enabled ..... False
[2025-10-30 15:15:51,734] [INFO] [config.py:1007:print] steps_per_print .............. inf
[2025-10-30 15:15:51,734] [INFO] [config.py:1007:print] tensor_parallel_config ....... dtype=torch.float16 autotp_size=0 tp_overlap_comm=False tensor_parallel=TPConfig(tp_size=1, tp_grain_size=1, mpu=None, tp_group=None) injection_policy_tuple=None keep_module_on_host=False replace_with_kernel_inject=False
[2025-10-30 15:15:51,735] [INFO] [config.py:1007:print] timers_config ................ enabled=True synchronized=True
[2025-10-30 15:15:51,735] [INFO] [config.py:1007:print] train_batch_size ............. 32
[2025-10-30 15:15:51,736] [INFO] [config.py:1007:print] train_micro_batch_size_per_gpu 1
[2025-10-30 15:15:51,736] [INFO] [config.py:1007:print] use_data_before_expert_parallel_ False
[2025-10-30 15:15:51,736] [INFO] [config.py:1007:print] use_node_local_storage ....... False
[2025-10-30 15:15:51,737] [INFO] [config.py:1007:print] wall_clock_breakdown ......... False
[2025-10-30 15:15:51,737] [INFO] [config.py:1007:print] weight_quantization_config ... None
[2025-10-30 15:15:51,738] [INFO] [config.py:1007:print] world_size ................... 8
[2025-10-30 15:15:51,738] [INFO] [config.py:1007:print] zero_allow_untested_optimizer True
[2025-10-30 15:15:51,739] [INFO] [config.py:1007:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False module_granularity_threshold=0 use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=True zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False zeropp_loco_param=None mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True log_trace_cache_warnings=False
[2025-10-30 15:15:51,739] [INFO] [config.py:1007:print] zero_enabled ................. True
[2025-10-30 15:15:51,740] [INFO] [config.py:1007:print] zero_force_ds_cpu_optimizer .. True
[2025-10-30 15:15:51,740] [INFO] [config.py:1007:print] zero_optimization_stage ...... 2
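With zero_optimization_stage 2, DeepSpeed keeps a full copy of the bf16 weights on every rank but partitions gradients and optimizer states across the 8 data-parallel ranks. The sketch below gives a back-of-the-envelope estimate of per-GPU model-state memory using the byte accounting from the ZeRO paper: 2 bytes per weight, 2 per gradient, and 12 for fp32 master weights plus Adam moments. It assumes an Adam-style optimizer (optimizer_name is None here, presumably because the trainer rather than the DeepSpeed config supplied the optimizer). It also uses the trainable-parameter count the trainer reports when training starts. Frozen parameters, if any, add 2 bytes each on top, and activations are not counted.

```python
def zero2_model_state_bytes(num_params: int, dp_world_size: int, k: int = 12) -> float:
    """Rough per-GPU bytes for model states under ZeRO stage 2.

    Assumes 16-bit (bf16) weights and gradients plus K bytes per parameter of
    optimizer state; K=12 covers fp32 master weights and Adam's two moments.
    Activations, communication buckets and allocator overhead are ignored.
    """
    weights = 2 * num_params                      # full bf16 copy on every rank
    grads = 2 * num_params / dp_world_size        # stage 2 partitions gradients
    optim = k * num_params / dp_world_size        # stage 1+ partitions optimizer state
    return weights + grads + optim


trainable = 3_397_103_616   # "Number of trainable parameters" reported by the trainer
per_gpu = zero2_model_state_bytes(trainable, dp_world_size=8)
print(f"{per_gpu / 2**30:.1f} GiB per GPU")   # 11.9 GiB per GPU
```

The replicated bf16 weights (about 6.3 GiB) dominate. Moving to stage 3 would partition those as well, at the cost of extra all-gathers.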
[2025-10-30 15:15:51,741] [INFO] [config.py:993:print_user_config] json = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "gradient_clipping": 1.0,
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": false,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5.000000e+08,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 5.000000e+08,
        "contiguous_gradients": true,
        "round_robin_gradients": true
    },
    "steps_per_print": inf
}
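DeepSpeed requires the global batch size to equal the per-GPU micro-batch times the gradient accumulation steps times the data-parallel world size. Here 1 × 4 × 8 = 32, which matches train_batch_size. The following is a standalone restatement of that consistency rule, an illustration rather than DeepSpeed's own code:

```python
def check_batch_config(train_batch_size: int, micro_batch_per_gpu: int,
                       grad_accum_steps: int, world_size: int) -> None:
    """Raise ValueError unless global batch = micro batch * accumulation * ranks."""
    expected = micro_batch_per_gpu * grad_accum_steps * world_size
    if train_batch_size != expected:
        raise ValueError(
            f"train_batch_size={train_batch_size} but "
            f"{micro_batch_per_gpu} * {grad_accum_steps} * {world_size} = {expected}"
        )


# Values printed above: micro batch 1, 4 accumulation steps, world_size 8.
check_batch_config(32, 1, 4, 8)   # passes silently
```

With tensor or pipeline parallelism enabled, world_size would have to be replaced by the data-parallel degree only. In this run tp_size is 1, so the two coincide.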
[INFO|trainer.py:2519] 2025-10-30 15:15:51,746 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-10-30 15:15:51,746 >> Num examples = 144,164
[INFO|trainer.py:2521] 2025-10-30 15:15:51,747 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-10-30 15:15:51,747 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-10-30 15:15:51,748 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-10-30 15:15:51,748 >> Gradient Accumulation steps = 4
[INFO|trainer.py:2527] 2025-10-30 15:15:51,749 >> Total optimization steps = 4,506
[INFO|trainer.py:2528] 2025-10-30 15:15:51,755 >> Number of trainable parameters = 3,397,103,616
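The 4,506 total optimization steps follow from the dataset size and the batch settings. The sketch below rests on two assumptions. First, each of the 8 ranks receives ceil(144,164 / 8) = 18,021 samples because the distributed sampler pads the last shard. Second, a trailing partial accumulation window still counts as an optimizer step. Flooring that last division would give 4,505, so the logged figure is consistent with rounding up. The same number also equals ceil(144,164 / 32).

```python
import math


def optimizer_steps_per_epoch(num_examples: int, world_size: int,
                              micro_batch: int, grad_accum: int) -> int:
    # Samples per rank, assuming the sampler pads so every rank gets the same count
    samples_per_rank = math.ceil(num_examples / world_size)
    # Micro-batches per rank (the per-rank dataloader length)
    batches_per_rank = math.ceil(samples_per_rank / micro_batch)
    # Assume an incomplete accumulation window at the end still triggers an update
    return math.ceil(batches_per_rank / grad_accum)


steps = optimizer_steps_per_epoch(144_164, world_size=8, micro_batch=1, grad_accum=4)
print(steps)  # 4506
```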
0%| | 0/4506 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|logging.py:328] 2025-10-30 15:15:56,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
0%| | 1/4506 [00:09<11:30:28, 9.20s/it]
{'loss': 0.8387, 'grad_norm': 7.749441146850586, 'learning_rate': 0.0, 'epoch': 0.0}
0%| | 1/4506 [00:09<11:30:28, 9.20s/it]
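The config sets gradient_clipping to 1.0, yet the early steps report grad_norm values of roughly 6 to 8. That is expected if, as in most trainers, the reported value is the global L2 norm measured before clipping. The update itself would then use gradients rescaled to norm 1.0. Below is a pure-Python sketch of global-norm clipping, with nested lists standing in for tensors. It illustrates the idea behind torch.nn.utils.clip_grad_norm_ rather than reproducing DeepSpeed's own code.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[List[float]], max_norm: float,
                        eps: float = 1e-6) -> Tuple[List[List[float]], float]:
    """Scale all gradient groups so their joint L2 norm is at most max_norm.

    Returns the clipped gradients and the pre-clipping global norm, the
    quantity that typically appears as grad_norm in training logs.
    """
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    coef = max_norm / (total_norm + eps)
    if coef >= 1.0:
        # Already within budget: return an unchanged copy
        return [list(group) for group in grads], total_norm
    return [[g * coef for g in group] for group in grads], total_norm


grads = [[3.0, 4.0], [0.0, 12.0]]          # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)                                  # 13.0
```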
0%| | 2/4506 [00:12<7:20:08, 5.86s/it]
{'loss': 0.7774, 'grad_norm': 6.957911968231201, 'learning_rate': 1.1086474501108648e-07, 'epoch': 0.0}
0%| | 2/4506 [00:12<7:20:08, 5.86s/it]
0%| | 3/4506 [00:17<6:26:40, 5.15s/it]
{'loss': 0.7747, 'grad_norm': 6.345953464508057, 'learning_rate': 2.2172949002217296e-07, 'epoch': 0.0}
0%| | 3/4506 [00:17<6:26:40, 5.15s/it]
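The logged learning rate is exactly 0.0 at step 1 and then rises by the same increment, about 1.1086e-07, at every step. That pattern matches a linear warmup lr = peak × (step − 1) / W, with W = ceil(0.1 × 4,506) = 451 warmup steps and peak = 5e-5, since 5e-5 / 451 ≈ 1.1086e-07. Neither the peak rate nor the warmup ratio is printed in this excerpt. Both values below are inferred from the logged numbers and should be checked against the run's own arguments. The sketch mirrors only the warmup branch of a linear-warmup scheduler.

```python
import math

TOTAL_STEPS = 4506      # "Total optimization steps" from the log
WARMUP_RATIO = 0.1      # inferred from the logged LRs, not shown in this excerpt
PEAK_LR = 5e-5          # inferred from the logged LRs, not shown in this excerpt


def warmup_steps(total: int, ratio: float) -> int:
    return math.ceil(total * ratio)          # 451 for these values


def logged_lr(step: int) -> float:
    """LR reported at 1-based step `step` during linear warmup.

    The value logged at step k appears to be the one used for that update,
    i.e. scheduler index k - 1, which is why step 1 shows 0.0.
    Only valid while k - 1 < warmup_steps.
    """
    w = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    idx = step - 1
    return PEAK_LR * (idx / max(1, w))
```

For example, logged_lr(92) comes out at about 1.0089e-05, matching the value logged at step 92 further down.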
0%| | 4/4506 [00:21<5:58:14, 4.77s/it]
{'loss': 0.7963, 'grad_norm': 6.663708686828613, 'learning_rate': 3.3259423503325944e-07, 'epoch': 0.0}
0%| | 4/4506 [00:21<5:58:14, 4.77s/it]
0%| | 5/4506 [00:25<5:37:33, 4.50s/it]
{'loss': 0.7882, 'grad_norm': 6.499878883361816, 'learning_rate': 4.434589800443459e-07, 'epoch': 0.0}
0%| | 5/4506 [00:25<5:37:33, 4.50s/it]
0%| | 6/4506 [00:29<5:26:42, 4.36s/it]
{'loss': 0.8035, 'grad_norm': 7.092200756072998, 'learning_rate': 5.543237250554324e-07, 'epoch': 0.0}
0%| | 6/4506 [00:29<5:26:42, 4.36s/it]
0%| | 7/4506 [00:33<5:12:50, 4.17s/it]
{'loss': 0.8067, 'grad_norm': 7.032195091247559, 'learning_rate': 6.651884700665189e-07, 'epoch': 0.0}
0%| | 7/4506 [00:33<5:12:50, 4.17s/it]
0%| | 8/4506 [00:36<5:01:37, 4.02s/it]
{'loss': 0.7832, 'grad_norm': 6.209845542907715, 'learning_rate': 7.760532150776054e-07, 'epoch': 0.0}
0%| | 8/4506 [00:36<5:01:37, 4.02s/it]
0%| | 9/4506 [00:40<5:01:31, 4.02s/it]
{'loss': 0.8068, 'grad_norm': 6.654215335845947, 'learning_rate': 8.869179600886918e-07, 'epoch': 0.0}
0%| | 9/4506 [00:40<5:01:31, 4.02s/it]
0%| | 10/4506 [00:44<5:03:51, 4.05s/it]
{'loss': 0.7751, 'grad_norm': 5.93585205078125, 'learning_rate': 9.977827050997782e-07, 'epoch': 0.0}
0%| | 10/4506 [00:44<5:03:51, 4.05s/it]
0%| | 11/4506 [00:49<5:07:30, 4.10s/it]
{'loss': 0.7595, 'grad_norm': 4.94903039932251, 'learning_rate': 1.1086474501108648e-06, 'epoch': 0.0}
0%| | 11/4506 [00:49<5:07:30, 4.10s/it]
0%| | 12/4506 [00:53<5:14:56, 4.20s/it]
{'loss': 0.6998, 'grad_norm': 4.450592041015625, 'learning_rate': 1.2195121951219514e-06, 'epoch': 0.0}
0%| | 12/4506 [00:53<5:14:56, 4.20s/it]
0%| | 13/4506 [00:57<5:15:10, 4.21s/it]
{'loss': 0.7643, 'grad_norm': 5.181246757507324, 'learning_rate': 1.3303769401330377e-06, 'epoch': 0.0}
0%| | 13/4506 [00:57<5:15:10, 4.21s/it]
0%| | 14/4506 [01:01<5:10:33, 4.15s/it]
{'loss': 0.6956, 'grad_norm': 2.7396697998046875, 'learning_rate': 1.4412416851441241e-06, 'epoch': 0.0}
0%| | 14/4506 [01:01<5:10:33, 4.15s/it]
0%| | 15/4506 [01:06<5:14:51, 4.21s/it]
{'loss': 0.7748, 'grad_norm': 6.235905647277832, 'learning_rate': 1.5521064301552107e-06, 'epoch': 0.0}
0%| | 15/4506 [01:06<5:14:51, 4.21s/it]
0%| | 16/4506 [01:10<5:12:17, 4.17s/it]
{'loss': 0.684, 'grad_norm': 2.209026575088501, 'learning_rate': 1.662971175166297e-06, 'epoch': 0.0}
0%| | 16/4506 [01:10<5:12:17, 4.17s/it]
0%| | 17/4506 [01:14<5:13:05, 4.18s/it]
{'loss': 0.6972, 'grad_norm': 2.4540529251098633, 'learning_rate': 1.7738359201773837e-06, 'epoch': 0.0}
0%| | 17/4506 [01:14<5:13:05, 4.18s/it]
0%| | 18/4506 [01:18<5:15:42, 4.22s/it]
{'loss': 0.6821, 'grad_norm': 1.872214436531067, 'learning_rate': 1.8847006651884702e-06, 'epoch': 0.0}
0%| | 18/4506 [01:18<5:15:42, 4.22s/it]
0%| | 19/4506 [01:22<5:14:40, 4.21s/it]
{'loss': 0.6267, 'grad_norm': 1.580664038658142, 'learning_rate': 1.9955654101995564e-06, 'epoch': 0.0}
0%| | 19/4506 [01:22<5:14:40, 4.21s/it]
0%| | 20/4506 [01:27<5:16:52, 4.24s/it]
{'loss': 0.6471, 'grad_norm': 1.7943265438079834, 'learning_rate': 2.106430155210643e-06, 'epoch': 0.0}
0%| | 20/4506 [01:27<5:16:52, 4.24s/it]
0%| | 21/4506 [01:31<5:19:30, 4.27s/it]
{'loss': 0.6412, 'grad_norm': 1.7004420757293701, 'learning_rate': 2.2172949002217296e-06, 'epoch': 0.0}
0%| | 21/4506 [01:31<5:19:30, 4.27s/it]
0%| | 22/4506 [01:35<5:14:12, 4.20s/it]
{'loss': 0.6382, 'grad_norm': 1.5448575019836426, 'learning_rate': 2.328159645232816e-06, 'epoch': 0.0}
0%| | 22/4506 [01:35<5:14:12, 4.20s/it]
1%| | 23/4506 [01:39<5:11:22, 4.17s/it]
{'loss': 0.6064, 'grad_norm': 1.235308051109314, 'learning_rate': 2.4390243902439027e-06, 'epoch': 0.01}
1%| | 23/4506 [01:39<5:11:22, 4.17s/it]
1%| | 24/4506 [01:43<5:11:08, 4.17s/it]
{'loss': 0.6087, 'grad_norm': 1.1577008962631226, 'learning_rate': 2.549889135254989e-06, 'epoch': 0.01}
1%| | 24/4506 [01:43<5:11:08, 4.17s/it]
1%| | 25/4506 [01:47<5:01:44, 4.04s/it]
{'loss': 0.6231, 'grad_norm': 1.1030522584915161, 'learning_rate': 2.6607538802660755e-06, 'epoch': 0.01}
1%| | 25/4506 [01:47<5:01:44, 4.04s/it]
1%| | 26/4506 [01:51<5:01:23, 4.04s/it]
{'loss': 0.5953, 'grad_norm': 1.5744059085845947, 'learning_rate': 2.7716186252771623e-06, 'epoch': 0.01}
1%| | 26/4506 [01:51<5:01:23, 4.04s/it]
1%| | 27/4506 [01:55<5:05:46, 4.10s/it]
{'loss': 0.6082, 'grad_norm': 1.2497130632400513, 'learning_rate': 2.8824833702882482e-06, 'epoch': 0.01}
1%| | 27/4506 [01:55<5:05:46, 4.10s/it]
1%| | 28/4506 [02:00<5:06:06, 4.10s/it]
{'loss': 0.5918, 'grad_norm': 1.3776960372924805, 'learning_rate': 2.993348115299335e-06, 'epoch': 0.01}
1%| | 28/4506 [02:00<5:06:06, 4.10s/it]
1%| | 29/4506 [02:04<5:04:32, 4.08s/it]
{'loss': 0.5775, 'grad_norm': 0.8967052698135376, 'learning_rate': 3.1042128603104214e-06, 'epoch': 0.01}
1%| | 29/4506 [02:04<5:04:32, 4.08s/it]
1%| | 30/4506 [02:08<5:05:09, 4.09s/it]
{'loss': 0.5779, 'grad_norm': 1.3759279251098633, 'learning_rate': 3.2150776053215078e-06, 'epoch': 0.01}
1%| | 30/4506 [02:08<5:05:09, 4.09s/it]
1%| | 31/4506 [02:12<4:59:47, 4.02s/it]
{'loss': 0.5795, 'grad_norm': 0.8057426810264587, 'learning_rate': 3.325942350332594e-06, 'epoch': 0.01}
1%| | 31/4506 [02:12<4:59:47, 4.02s/it]
1%| | 32/4506 [02:16<4:58:58, 4.01s/it]
{'loss': 0.5952, 'grad_norm': 0.91196209192276, 'learning_rate': 3.436807095343681e-06, 'epoch': 0.01}
1%| | 32/4506 [02:16<4:58:58, 4.01s/it]
1%| | 33/4506 [02:19<4:56:06, 3.97s/it]
{'loss': 0.5669, 'grad_norm': 0.7868587970733643, 'learning_rate': 3.5476718403547673e-06, 'epoch': 0.01}
1%| | 33/4506 [02:19<4:56:06, 3.97s/it]
1%| | 34/4506 [02:23<4:56:55, 3.98s/it]
{'loss': 0.5707, 'grad_norm': 0.6470670104026794, 'learning_rate': 3.6585365853658537e-06, 'epoch': 0.01}
1%| | 34/4506 [02:23<4:56:55, 3.98s/it]
1%| | 35/4506 [02:28<4:58:50, 4.01s/it]
{'loss': 0.5708, 'grad_norm': 0.6626003384590149, 'learning_rate': 3.7694013303769405e-06, 'epoch': 0.01}
1%| | 35/4506 [02:28<4:58:50, 4.01s/it]
1%| | 36/4506 [02:32<5:04:45, 4.09s/it]
{'loss': 0.5783, 'grad_norm': 0.7512629628181458, 'learning_rate': 3.8802660753880264e-06, 'epoch': 0.01}
1%| | 36/4506 [02:32<5:04:45, 4.09s/it]
1%| | 37/4506 [02:36<5:07:22, 4.13s/it]
{'loss': 0.5688, 'grad_norm': 0.8607787489891052, 'learning_rate': 3.991130820399113e-06, 'epoch': 0.01}
1%| | 37/4506 [02:36<5:07:22, 4.13s/it]
1%| | 38/4506 [02:40<5:07:23, 4.13s/it]
{'loss': 0.5704, 'grad_norm': 0.7380874752998352, 'learning_rate': 4.1019955654102e-06, 'epoch': 0.01}
1%| | 38/4506 [02:40<5:07:23, 4.13s/it]
1%| | 39/4506 [02:45<5:17:15, 4.26s/it]
{'loss': 0.5711, 'grad_norm': 0.8002908229827881, 'learning_rate': 4.212860310421286e-06, 'epoch': 0.01}
1%| | 39/4506 [02:45<5:17:15, 4.26s/it]
1%| | 40/4506 [02:49<5:13:30, 4.21s/it]
{'loss': 0.552, 'grad_norm': 0.6222218871116638, 'learning_rate': 4.323725055432373e-06, 'epoch': 0.01}
1%| | 40/4506 [02:49<5:13:30, 4.21s/it]
1%| | 41/4506 [02:53<5:13:09, 4.21s/it]
{'loss': 0.5525, 'grad_norm': 0.6954725980758667, 'learning_rate': 4.434589800443459e-06, 'epoch': 0.01}
1%| | 41/4506 [02:53<5:13:09, 4.21s/it]
1%| | 42/4506 [02:57<5:11:57, 4.19s/it]
{'loss': 0.5545, 'grad_norm': 0.6371685266494751, 'learning_rate': 4.5454545454545455e-06, 'epoch': 0.01}
1%| | 42/4506 [02:57<5:11:57, 4.19s/it]
1%| | 43/4506 [03:01<5:02:15, 4.06s/it]
{'loss': 0.5771, 'grad_norm': 0.8137165307998657, 'learning_rate': 4.656319290465632e-06, 'epoch': 0.01}
1%| | 43/4506 [03:01<5:02:15, 4.06s/it]
1%| | 44/4506 [03:05<4:54:01, 3.95s/it]
{'loss': 0.557, 'grad_norm': 0.6092925667762756, 'learning_rate': 4.767184035476718e-06, 'epoch': 0.01}
1%| | 44/4506 [03:05<4:54:01, 3.95s/it]
1%| | 45/4506 [03:09<4:54:22, 3.96s/it]
{'loss': 0.5668, 'grad_norm': 0.6992784142494202, 'learning_rate': 4.8780487804878055e-06, 'epoch': 0.01}
1%| | 45/4506 [03:09<4:54:22, 3.96s/it]
1%| | 46/4506 [03:12<4:52:45, 3.94s/it]
{'loss': 0.5607, 'grad_norm': 1.0127744674682617, 'learning_rate': 4.988913525498892e-06, 'epoch': 0.01}
1%| | 46/4506 [03:12<4:52:45, 3.94s/it]
1%| | 47/4506 [03:17<4:56:27, 3.99s/it]
{'loss': 0.5593, 'grad_norm': 0.791454553604126, 'learning_rate': 5.099778270509978e-06, 'epoch': 0.01}
1%| | 47/4506 [03:17<4:56:27, 3.99s/it]
1%| | 48/4506 [03:20<4:52:26, 3.94s/it]
{'loss': 0.5685, 'grad_norm': 1.0631412267684937, 'learning_rate': 5.210643015521065e-06, 'epoch': 0.01}
1%| | 48/4506 [03:20<4:52:26, 3.94s/it]
1%| | 49/4506 [03:25<4:58:18, 4.02s/it]
{'loss': 0.535, 'grad_norm': 0.7461488842964172, 'learning_rate': 5.321507760532151e-06, 'epoch': 0.01}
1%| | 49/4506 [03:25<4:58:18, 4.02s/it]
1%| | 50/4506 [03:29<4:59:39, 4.03s/it]
{'loss': 0.5523, 'grad_norm': 0.8169209957122803, 'learning_rate': 5.432372505543237e-06, 'epoch': 0.01}
1%| | 50/4506 [03:29<4:59:39, 4.03s/it]
1%| | 51/4506 [03:33<5:01:21, 4.06s/it]
{'loss': 0.5529, 'grad_norm': 0.9432432651519775, 'learning_rate': 5.5432372505543246e-06, 'epoch': 0.01}
1%| | 51/4506 [03:33<5:01:21, 4.06s/it]
1%| | 52/4506 [03:37<5:04:53, 4.11s/it]
{'loss': 0.545, 'grad_norm': 0.7050830721855164, 'learning_rate': 5.65410199556541e-06, 'epoch': 0.01}
1%| | 52/4506 [03:37<5:04:53, 4.11s/it]
1%| | 53/4506 [03:41<5:10:41, 4.19s/it]
{'loss': 0.5489, 'grad_norm': 0.7369768023490906, 'learning_rate': 5.7649667405764965e-06, 'epoch': 0.01}
1%| | 53/4506 [03:41<5:10:41, 4.19s/it]
1%| | 54/4506 [03:46<5:11:10, 4.19s/it]
{'loss': 0.5592, 'grad_norm': 0.649141252040863, 'learning_rate': 5.875831485587584e-06, 'epoch': 0.01}
1%| | 54/4506 [03:46<5:11:10, 4.19s/it]
1%| | 55/4506 [03:50<5:09:55, 4.18s/it]
{'loss': 0.5411, 'grad_norm': 0.7262722253799438, 'learning_rate': 5.98669623059867e-06, 'epoch': 0.01}
1%| | 55/4506 [03:50<5:09:55, 4.18s/it]
1%| | 56/4506 [03:54<5:14:29, 4.24s/it]
{'loss': 0.5272, 'grad_norm': 0.8773645162582397, 'learning_rate': 6.0975609756097564e-06, 'epoch': 0.01}
1%| | 56/4506 [03:54<5:14:29, 4.24s/it]
1%|▏ | 57/4506 [03:58<5:10:21, 4.19s/it]
{'loss': 0.5285, 'grad_norm': 0.6162415742874146, 'learning_rate': 6.208425720620843e-06, 'epoch': 0.01}
1%|▏ | 57/4506 [03:58<5:10:21, 4.19s/it]
1%|▏ | 58/4506 [04:02<5:12:41, 4.22s/it]
{'loss': 0.5357, 'grad_norm': 0.7336971163749695, 'learning_rate': 6.319290465631929e-06, 'epoch': 0.01}
1%|▏ | 58/4506 [04:02<5:12:41, 4.22s/it]
1%|▏ | 59/4506 [04:07<5:11:37, 4.20s/it]
{'loss': 0.5324, 'grad_norm': 0.7085797190666199, 'learning_rate': 6.4301552106430155e-06, 'epoch': 0.01}
1%|▏ | 59/4506 [04:07<5:11:37, 4.20s/it]
1%|▏ | 60/4506 [04:11<5:13:14, 4.23s/it]
{'loss': 0.5483, 'grad_norm': 0.9946093559265137, 'learning_rate': 6.541019955654103e-06, 'epoch': 0.01}
1%|▏ | 60/4506 [04:11<5:13:14, 4.23s/it]
1%|▏ | 61/4506 [04:15<5:04:51, 4.12s/it]
{'loss': 0.5499, 'grad_norm': 0.7174414992332458, 'learning_rate': 6.651884700665188e-06, 'epoch': 0.01}
1%|▏ | 61/4506 [04:15<5:04:51, 4.12s/it]
1%|▏ | 62/4506 [04:19<5:12:17, 4.22s/it]
{'loss': 0.5375, 'grad_norm': 0.9220353960990906, 'learning_rate': 6.7627494456762755e-06, 'epoch': 0.01}
1%|▏ | 62/4506 [04:19<5:12:17, 4.22s/it]
1%|▏ | 63/4506 [04:24<5:17:08, 4.28s/it]
{'loss': 0.5479, 'grad_norm': 0.7135265469551086, 'learning_rate': 6.873614190687362e-06, 'epoch': 0.01}
1%|▏ | 63/4506 [04:24<5:17:08, 4.28s/it]
1%|▏ | 64/4506 [04:28<5:10:16, 4.19s/it]
{'loss': 0.5281, 'grad_norm': 0.6784458160400391, 'learning_rate': 6.984478935698447e-06, 'epoch': 0.01}
1%|▏ | 64/4506 [04:28<5:10:16, 4.19s/it]
1%|▏ | 65/4506 [04:31<5:02:46, 4.09s/it]
{'loss': 0.5496, 'grad_norm': 0.774141788482666, 'learning_rate': 7.095343680709535e-06, 'epoch': 0.01}
1%|▏ | 65/4506 [04:32<5:02:46, 4.09s/it]
1%|▏ | 66/4506 [04:35<4:56:11, 4.00s/it]
{'loss': 0.5463, 'grad_norm': 0.912402868270874, 'learning_rate': 7.206208425720622e-06, 'epoch': 0.01}
1%|▏ | 66/4506 [04:35<4:56:11, 4.00s/it]
1%|▏ | 67/4506 [04:40<5:02:34, 4.09s/it]
{'loss': 0.5539, 'grad_norm': 0.8502076864242554, 'learning_rate': 7.317073170731707e-06, 'epoch': 0.01}
1%|▏ | 67/4506 [04:40<5:02:34, 4.09s/it]
2%|▏ | 68/4506 [04:43<4:55:42, 4.00s/it]
{'loss': 0.5264, 'grad_norm': 0.9313775300979614, 'learning_rate': 7.427937915742795e-06, 'epoch': 0.02}
2%|▏ | 68/4506 [04:43<4:55:42, 4.00s/it]
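The epoch field changes from 0.0 to 0.01 at step 23, to 0.02 at step 68 and to 0.03 at step 113. Those transitions agree with reporting the completed fraction of the single epoch, step / 4,506, rounded to two decimals. This is an interpretation of the log, not the trainer's code; counting micro-batches (4 × step / 18,021) instead gives the same rounded values here.

```python
TOTAL_STEPS = 4506  # one epoch, from "Total optimization steps"


def logged_epoch(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Fractional epoch after `step` optimizer updates, rounded to 2 decimals."""
    return round(step / total_steps, 2)


for s in (22, 23, 67, 68, 112, 113):
    print(s, logged_epoch(s))
```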
2%|▏ | 69/4506 [04:48<4:58:52, 4.04s/it]
{'loss': 0.5387, 'grad_norm': 0.8367621302604675, 'learning_rate': 7.538802660753881e-06, 'epoch': 0.02}
2%|▏ | 69/4506 [04:48<4:58:52, 4.04s/it]
2%|▏ | 70/4506 [04:52<5:01:56, 4.08s/it]
{'loss': 0.5312, 'grad_norm': 0.8713820576667786, 'learning_rate': 7.649667405764967e-06, 'epoch': 0.02}
2%|▏ | 70/4506 [04:52<5:01:56, 4.08s/it]
2%|▏ | 71/4506 [04:56<5:06:05, 4.14s/it]
{'loss': 0.523, 'grad_norm': 0.7605623006820679, 'learning_rate': 7.760532150776053e-06, 'epoch': 0.02}
2%|▏ | 71/4506 [04:56<5:06:05, 4.14s/it]
2%|▏ | 72/4506 [05:00<5:05:39, 4.14s/it]
{'loss': 0.5274, 'grad_norm': 0.5791651010513306, 'learning_rate': 7.87139689578714e-06, 'epoch': 0.02}
2%|▏ | 72/4506 [05:00<5:05:39, 4.14s/it]
2%|▏ | 73/4506 [05:04<5:07:25, 4.16s/it]
{'loss': 0.5465, 'grad_norm': 0.7534204721450806, 'learning_rate': 7.982261640798226e-06, 'epoch': 0.02}
2%|▏ | 73/4506 [05:04<5:07:25, 4.16s/it]
2%|▏ | 74/4506 [05:09<5:08:55, 4.18s/it]
{'loss': 0.5197, 'grad_norm': 0.6784512400627136, 'learning_rate': 8.093126385809313e-06, 'epoch': 0.02}
2%|▏ | 74/4506 [05:09<5:08:55, 4.18s/it]
2%|▏ | 75/4506 [05:13<5:05:27, 4.14s/it]
{'loss': 0.5185, 'grad_norm': 0.5945080518722534, 'learning_rate': 8.2039911308204e-06, 'epoch': 0.02}
2%|▏ | 75/4506 [05:13<5:05:27, 4.14s/it]
2%|▏ | 76/4506 [05:16<4:58:34, 4.04s/it]
{'loss': 0.5191, 'grad_norm': 0.6193498969078064, 'learning_rate': 8.314855875831486e-06, 'epoch': 0.02}
2%|▏ | 76/4506 [05:16<4:58:34, 4.04s/it]
2%|▏ | 77/4506 [05:20<4:59:38, 4.06s/it]
{'loss': 0.5266, 'grad_norm': 0.68009352684021, 'learning_rate': 8.425720620842573e-06, 'epoch': 0.02}
2%|▏ | 77/4506 [05:21<4:59:38, 4.06s/it]
2%|▏ | 78/4506 [05:24<4:53:06, 3.97s/it]
{'loss': 0.5266, 'grad_norm': 0.8341363668441772, 'learning_rate': 8.53658536585366e-06, 'epoch': 0.02}
2%|▏ | 78/4506 [05:24<4:53:06, 3.97s/it]
2%|▏ | 79/4506 [05:29<5:05:43, 4.14s/it]
{'loss': 0.5191, 'grad_norm': 2.0721189975738525, 'learning_rate': 8.647450110864746e-06, 'epoch': 0.02}
2%|▏ | 79/4506 [05:29<5:05:43, 4.14s/it]
2%|▏ | 80/4506 [05:32<4:52:44, 3.97s/it]
{'loss': 0.5229, 'grad_norm': 1.040513515472412, 'learning_rate': 8.758314855875833e-06, 'epoch': 0.02}
2%|▏ | 80/4506 [05:32<4:52:44, 3.97s/it]
2%|▏ | 81/4506 [05:37<4:58:57, 4.05s/it]
{'loss': 0.5652, 'grad_norm': 3.047673463821411, 'learning_rate': 8.869179600886918e-06, 'epoch': 0.02}
2%|▏ | 81/4506 [05:37<4:58:57, 4.05s/it]
2%|▏ | 82/4506 [05:41<4:57:31, 4.04s/it]
{'loss': 0.5491, 'grad_norm': 5.054272174835205, 'learning_rate': 8.980044345898004e-06, 'epoch': 0.02}
2%|▏ | 82/4506 [05:41<4:57:31, 4.04s/it]
2%|▏ | 83/4506 [05:45<4:55:01, 4.00s/it]
{'loss': 0.5248, 'grad_norm': 1.0403252840042114, 'learning_rate': 9.090909090909091e-06, 'epoch': 0.02}
2%|▏ | 83/4506 [05:45<4:55:01, 4.00s/it]
2%|▏ | 84/4506 [05:49<4:59:37, 4.07s/it]
{'loss': 0.5209, 'grad_norm': 0.9126943349838257, 'learning_rate': 9.201773835920177e-06, 'epoch': 0.02}
2%|▏ | 84/4506 [05:49<4:59:37, 4.07s/it]
2%|▏ | 85/4506 [05:53<5:05:58, 4.15s/it]
{'loss': 0.5333, 'grad_norm': 1.0895047187805176, 'learning_rate': 9.312638580931264e-06, 'epoch': 0.02}
2%|▏ | 85/4506 [05:53<5:05:58, 4.15s/it]
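Around steps 79 to 82, grad_norm jumps to 2.07, 3.05 and 5.05 while neighbouring steps stay near 0.6 to 1.1, and the loss at step 81 rises to 0.5652. In a long run, steps like these can be flagged by comparing each grad_norm with the median of a trailing window. The window of 10 and the 2.5× factor below are arbitrary illustration values, not settings from this run.

```python
from statistics import median
from typing import Dict, List


def flag_spikes(grad_norms: Dict[int, float], window: int = 10,
                factor: float = 2.5) -> List[int]:
    """Return steps whose grad_norm exceeds `factor` x the trailing-window median."""
    steps = sorted(grad_norms)
    flagged = []
    for i, step in enumerate(steps):
        history = [grad_norms[s] for s in steps[max(0, i - window):i]]
        if len(history) < window:
            continue  # not enough context yet
        if grad_norms[step] > factor * median(history):
            flagged.append(step)
    return flagged


# grad_norm values for steps 69-85, copied from the log above (rounded to 4 places)
norms = {69: 0.8368, 70: 0.8714, 71: 0.7606, 72: 0.5792, 73: 0.7534,
         74: 0.6785, 75: 0.5945, 76: 0.6193, 77: 0.6801, 78: 0.8341,
         79: 2.0721, 80: 1.0405, 81: 3.0477, 82: 5.0543, 83: 1.0403,
         84: 0.9127, 85: 1.0895}
print(flag_spikes(norms))
```

With these values the sketch flags steps 79, 81 and 82. Step 80 (1.04) stays under its threshold of about 1.79.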
2%|▏ | 86/4506 [05:57<4:58:11, 4.05s/it]
{'loss': 0.5376, 'grad_norm': 0.7024878263473511, 'learning_rate': 9.423503325942351e-06, 'epoch': 0.02}
2%|▏ | 86/4506 [05:57<4:58:11, 4.05s/it]
2%|▏ | 87/4506 [06:01<4:55:55, 4.02s/it]
{'loss': 0.5233, 'grad_norm': 1.3156445026397705, 'learning_rate': 9.534368070953437e-06, 'epoch': 0.02}
2%|▏ | 87/4506 [06:01<4:55:55, 4.02s/it]
2%|▏ | 88/4506 [06:05<4:52:40, 3.97s/it]
{'loss': 0.5327, 'grad_norm': 1.5042088031768799, 'learning_rate': 9.645232815964524e-06, 'epoch': 0.02}
2%|▏ | 88/4506 [06:05<4:52:40, 3.97s/it]
2%|▏ | 89/4506 [06:09<4:50:04, 3.94s/it]
{'loss': 0.539, 'grad_norm': 0.7844274640083313, 'learning_rate': 9.756097560975611e-06, 'epoch': 0.02}
2%|▏ | 89/4506 [06:09<4:50:04, 3.94s/it]
2%|▏ | 90/4506 [06:13<4:53:54, 3.99s/it]
{'loss': 0.5378, 'grad_norm': 0.9690947532653809, 'learning_rate': 9.866962305986696e-06, 'epoch': 0.02}
2%|▏ | 90/4506 [06:13<4:53:54, 3.99s/it]
2%|▏ | 91/4506 [06:17<4:53:11, 3.98s/it]
{'loss': 0.5229, 'grad_norm': 0.9600121378898621, 'learning_rate': 9.977827050997784e-06, 'epoch': 0.02}
2%|▏ | 91/4506 [06:17<4:53:11, 3.98s/it]
2%|▏ | 92/4506 [06:21<4:57:56, 4.05s/it]
{'loss': 0.5331, 'grad_norm': 0.7447946071624756, 'learning_rate': 1.008869179600887e-05, 'epoch': 0.02}
2%|▏ | 92/4506 [06:21<4:57:56, 4.05s/it]
2%|▏ | 93/4506 [06:25<4:58:40, 4.06s/it]
{'loss': 0.5066, 'grad_norm': 0.8912707567214966, 'learning_rate': 1.0199556541019956e-05, 'epoch': 0.02}
2%|▏ | 93/4506 [06:25<4:58:40, 4.06s/it]
2%|▏ | 94/4506 [06:29<5:00:31, 4.09s/it]
{'loss': 0.5201, 'grad_norm': 0.8096697926521301, 'learning_rate': 1.0310421286031042e-05, 'epoch': 0.02}
2%|▏ | 94/4506 [06:29<5:00:31, 4.09s/it]
2%|▏ | 95/4506 [06:34<5:10:16, 4.22s/it]
{'loss': 0.5332, 'grad_norm': 0.7825638055801392, 'learning_rate': 1.042128603104213e-05, 'epoch': 0.02}
2%|▏ | 95/4506 [06:34<5:10:16, 4.22s/it]
2%|▏ | 96/4506 [06:38<5:18:43, 4.34s/it]
{'loss': 0.5177, 'grad_norm': 0.7641898393630981, 'learning_rate': 1.0532150776053215e-05, 'epoch': 0.02}
2%|▏ | 96/4506 [06:38<5:18:43, 4.34s/it]
2%|▏ | 97/4506 [06:43<5:16:45, 4.31s/it]
{'loss': 0.5155, 'grad_norm': 0.9750529527664185, 'learning_rate': 1.0643015521064302e-05, 'epoch': 0.02}
2%|▏ | 97/4506 [06:43<5:16:45, 4.31s/it]
2%|▏ | 98/4506 [06:47<5:13:43, 4.27s/it]
{'loss': 0.5039, 'grad_norm': 0.7133203148841858, 'learning_rate': 1.075388026607539e-05, 'epoch': 0.02}
2%|▏ | 98/4506 [06:47<5:13:43, 4.27s/it]
2%|▏ | 99/4506 [06:51<5:22:01, 4.38s/it]
{'loss': 0.5386, 'grad_norm': 2.1950747966766357, 'learning_rate': 1.0864745011086475e-05, 'epoch': 0.02}
2%|▏ | 99/4506 [06:51<5:22:01, 4.38s/it]
2%|▏ | 100/4506 [06:55<5:14:19, 4.28s/it]
{'loss': 0.5208, 'grad_norm': 1.3711824417114258, 'learning_rate': 1.0975609756097562e-05, 'epoch': 0.02}
2%|▏ | 100/4506 [06:55<5:14:19, 4.28s/it]
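tqdm's remaining-time field is roughly (total − n) × seconds per iteration. At step 100 the bar shows 4.28 s/it and 5:14:19 remaining, and 4,406 × 4.28 s ≈ 18,858 s, or about 5 h 14 min. tqdm smooths the rate (an exponential moving average by default) and prints rounded values, so a recomputation from the displayed rate only matches to within a few seconds.

```python
def eta_seconds(done: int, total: int, sec_per_it: float) -> float:
    """Remaining time assuming the current per-iteration rate holds."""
    return (total - done) * sec_per_it


def hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, like tqdm's remaining-time field."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


remaining = eta_seconds(100, 4506, 4.28)
print(hms(remaining))   # 5:14:18, versus 5:14:19 in the bar
```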
2%|▏ | 101/4506 [07:00<5:12:02, 4.25s/it]
{'loss': 0.5186, 'grad_norm': 0.9618781805038452, 'learning_rate': 1.1086474501108649e-05, 'epoch': 0.02}
2%|▏ | 101/4506 [07:00<5:12:02, 4.25s/it]
2%|▏ | 102/4506 [07:04<5:12:01, 4.25s/it]
{'loss': 0.5159, 'grad_norm': 0.7472499012947083, 'learning_rate': 1.1197339246119735e-05, 'epoch': 0.02}
2%|▏ | 102/4506 [07:04<5:12:01, 4.25s/it]
2%|▏ | 103/4506 [07:08<5:13:58, 4.28s/it]
{'loss': 0.515, 'grad_norm': 0.7514169216156006, 'learning_rate': 1.130820399113082e-05, 'epoch': 0.02}
2%|▏ | 103/4506 [07:08<5:13:58, 4.28s/it]
2%|▏ | 104/4506 [07:12<5:09:36, 4.22s/it]
{'loss': 0.5166, 'grad_norm': 0.8120065331459045, 'learning_rate': 1.1419068736141907e-05, 'epoch': 0.02}
2%|▏ | 104/4506 [07:12<5:09:36, 4.22s/it]
2%|▏ | 105/4506 [07:16<5:04:58, 4.16s/it]
{'loss': 0.5117, 'grad_norm': 0.8643534183502197, 'learning_rate': 1.1529933481152993e-05, 'epoch': 0.02}
2%|▏ | 105/4506 [07:16<5:04:58, 4.16s/it]
2%|▏ | 106/4506 [07:20<5:03:11, 4.13s/it]
{'loss': 0.506, 'grad_norm': 0.7540009021759033, 'learning_rate': 1.164079822616408e-05, 'epoch': 0.02}
2%|▏ | 106/4506 [07:20<5:03:11, 4.13s/it]
2%|▏ | 107/4506 [07:24<4:59:51, 4.09s/it]
{'loss': 0.5272, 'grad_norm': 0.773687481880188, 'learning_rate': 1.1751662971175167e-05, 'epoch': 0.02}
2%|▏ | 107/4506 [07:24<4:59:51, 4.09s/it]
2%|▏ | 108/4506 [07:28<4:56:04, 4.04s/it]
{'loss': 0.5346, 'grad_norm': 0.9719457030296326, 'learning_rate': 1.1862527716186253e-05, 'epoch': 0.02}
2%|▏ | 108/4506 [07:28<4:56:04, 4.04s/it]
2%|▏ | 109/4506 [07:32<4:58:37, 4.07s/it]
{'loss': 0.5244, 'grad_norm': 0.6876107454299927, 'learning_rate': 1.197339246119734e-05, 'epoch': 0.02}
2%|▏ | 109/4506 [07:32<4:58:37, 4.07s/it]
2%|▏ | 110/4506 [07:37<5:03:18, 4.14s/it]
{'loss': 0.5209, 'grad_norm': 0.6858226656913757, 'learning_rate': 1.2084257206208427e-05, 'epoch': 0.02}
2%|▏ | 110/4506 [07:37<5:03:18, 4.14s/it]
2%|▏ | 111/4506 [07:41<5:01:32, 4.12s/it]
{'loss': 0.5151, 'grad_norm': 0.6834162473678589, 'learning_rate': 1.2195121951219513e-05, 'epoch': 0.02}
2%|▏ | 111/4506 [07:41<5:01:32, 4.12s/it]
2%|▏ | 112/4506 [07:45<5:10:09, 4.24s/it]
{'loss': 0.513, 'grad_norm': 0.7963821291923523, 'learning_rate': 1.23059866962306e-05, 'epoch': 0.02}
2%|▏ | 112/4506 [07:45<5:10:09, 4.24s/it]
3%|▎ | 113/4506 [07:50<5:17:24, 4.34s/it]
{'loss': 0.5065, 'grad_norm': 0.68403559923172, 'learning_rate': 1.2416851441241686e-05, 'epoch': 0.03}
3%|▎ | 113/4506 [07:50<5:17:24, 4.34s/it]
3%|▎ | 114/4506 [07:54<5:08:25, 4.21s/it]
{'loss': 0.5055, 'grad_norm': 0.692649781703949, 'learning_rate': 1.2527716186252773e-05, 'epoch': 0.03}
3%|▎ | 114/4506 [07:54<5:08:25, 4.21s/it]
3%|▎ | 115/4506 [07:58<5:06:03, 4.18s/it]
{'loss': 0.5086, 'grad_norm': 0.665569543838501, 'learning_rate': 1.2638580931263858e-05, 'epoch': 0.03}
3%|▎ | 115/4506 [07:58<5:06:03, 4.18s/it]
3%|▎ | 116/4506 [08:02<5:05:17, 4.17s/it]
{'loss': 0.5175, 'grad_norm': 0.6402637362480164, 'learning_rate': 1.2749445676274946e-05, 'epoch': 0.03}
3%|▎ | 116/4506 [08:02<5:05:17, 4.17s/it]
3%|▎ | 117/4506 [08:06<5:05:55, 4.18s/it]
{'loss': 0.506, 'grad_norm': 0.6624858975410461, 'learning_rate': 1.2860310421286031e-05, 'epoch': 0.03}
3%|▎ | 117/4506 [08:06<5:05:55, 4.18s/it]
3%|▎ | 118/4506 [08:10<4:57:50, 4.07s/it]
{'loss': 0.5037, 'grad_norm': 0.6470703482627869, 'learning_rate': 1.2971175166297117e-05, 'epoch': 0.03}
3%|▎ | 118/4506 [08:10<4:57:50, 4.07s/it]
3%|▎ | 119/4506 [08:14<5:04:08, 4.16s/it]
{'loss': 0.5149, 'grad_norm': 0.6009573936462402, 'learning_rate': 1.3082039911308206e-05, 'epoch': 0.03}
3%|▎ | 119/4506 [08:14<5:04:08, 4.16s/it]
3%|▎ | 120/4506 [08:19<5:04:13, 4.16s/it]
{'loss': 0.5187, 'grad_norm': 0.8120008111000061, 'learning_rate': 1.3192904656319291e-05, 'epoch': 0.03}
3%|▎ | 120/4506 [08:19<5:04:13, 4.16s/it]
3%|▎ | 121/4506 [08:22<4:56:58, 4.06s/it]
{'loss': 0.5047, 'grad_norm': 0.6205000281333923, 'learning_rate': 1.3303769401330377e-05, 'epoch': 0.03}
3%|▎ | 121/4506 [08:22<4:56:58, 4.06s/it]
3%|▎ | 122/4506 [08:27<4:58:09, 4.08s/it]
{'loss': 0.512, 'grad_norm': 0.7457971572875977, 'learning_rate': 1.3414634146341466e-05, 'epoch': 0.03}
3%|▎ | 122/4506 [08:27<4:58:09, 4.08s/it]
3%|▎ | 123/4506 [08:30<4:54:51, 4.04s/it]
{'loss': 0.5063, 'grad_norm': 0.7526416182518005, 'learning_rate': 1.3525498891352551e-05, 'epoch': 0.03}
3%|▎ | 123/4506 [08:30<4:54:51, 4.04s/it]
3%|▎ | 124/4506 [08:34<4:49:02, 3.96s/it]
{'loss': 0.5144, 'grad_norm': 0.7152613401412964, 'learning_rate': 1.3636363636363637e-05, 'epoch': 0.03}
3%|▎ | 124/4506 [08:34<4:49:02, 3.96s/it]
3%|▎ | 125/4506 [08:38<4:51:13, 3.99s/it]
{'loss': 0.5099, 'grad_norm': 1.1524215936660767, 'learning_rate': 1.3747228381374724e-05, 'epoch': 0.03}
3%|▎ | 125/4506 [08:38<4:51:13, 3.99s/it]
3%|▎ | 126/4506 [08:43<4:56:41, 4.06s/it]
{'loss': 0.5143, 'grad_norm': 0.6255645155906677, 'learning_rate': 1.385809312638581e-05, 'epoch': 0.03}
3%|▎ | 126/4506 [08:43<4:56:41, 4.06s/it]
3%|▎ | 127/4506 [08:47<5:03:38, 4.16s/it]
{'loss': 0.5073, 'grad_norm': 0.6595082879066467, 'learning_rate': 1.3968957871396895e-05, 'epoch': 0.03}
3%|▎ | 127/4506 [08:47<5:03:38, 4.16s/it]
3%|▎ | 128/4506 [08:51<5:06:23, 4.20s/it]
{'loss': 0.522, 'grad_norm': 0.7125035524368286, 'learning_rate': 1.4079822616407984e-05, 'epoch': 0.03}
3%|▎ | 128/4506 [08:51<5:06:23, 4.20s/it]
3%|▎ | 129/4506 [08:55<5:01:50, 4.14s/it]
{'loss': 0.5093, 'grad_norm': 0.6468531489372253, 'learning_rate': 1.419068736141907e-05, 'epoch': 0.03}
3%|▎ | 129/4506 [08:55<5:01:50, 4.14s/it]
3%|▎ | 130/4506 [08:59<4:58:33, 4.09s/it]
{'loss': 0.5107, 'grad_norm': 0.7116678357124329, 'learning_rate': 1.4301552106430155e-05, 'epoch': 0.03}
3%|▎ | 130/4506 [08:59<4:58:33, 4.09s/it]
3%|▎ | 131/4506 [09:03<5:01:34, 4.14s/it]
{'loss': 0.5147, 'grad_norm': 0.6179465055465698, 'learning_rate': 1.4412416851441244e-05, 'epoch': 0.03}
3%|▎ | 131/4506 [09:03<5:01:34, 4.14s/it]
3%|▎ | 132/4506 [09:08<5:02:55, 4.16s/it]
{'loss': 0.4927, 'grad_norm': 0.58305424451828, 'learning_rate': 1.452328159645233e-05, 'epoch': 0.03}
3%|▎ | 132/4506 [09:08<5:02:55, 4.16s/it]
3%|▎ | 133/4506 [09:12<4:57:58, 4.09s/it]
{'loss': 0.5146, 'grad_norm': 0.6888719797134399, 'learning_rate': 1.4634146341463415e-05, 'epoch': 0.03}
3%|▎ | 133/4506 [09:12<4:57:58, 4.09s/it]
3%|▎ | 134/4506 [09:16<4:56:09, 4.06s/it]
{'loss': 0.5263, 'grad_norm': 0.837751567363739, 'learning_rate': 1.4745011086474502e-05, 'epoch': 0.03}
3%|▎ | 134/4506 [09:16<4:56:09, 4.06s/it]
3%|▎ | 135/4506 [09:19<4:49:41, 3.98s/it]
{'loss': 0.5027, 'grad_norm': 0.6558955311775208, 'learning_rate': 1.485587583148559e-05, 'epoch': 0.03}
3%|▎ | 135/4506 [09:19<4:49:41, 3.98s/it]
3%|▎ | 136/4506 [09:23<4:49:24, 3.97s/it]
{'loss': 0.5073, 'grad_norm': 0.6571597456932068, 'learning_rate': 1.4966740576496675e-05, 'epoch': 0.03}
3%|▎ | 136/4506 [09:23<4:49:24, 3.97s/it]
3%|▎ | 137/4506 [09:27<4:50:46, 3.99s/it]
{'loss': 0.5001, 'grad_norm': 0.6853216886520386, 'learning_rate': 1.5077605321507762e-05, 'epoch': 0.03}
3%|▎ | 137/4506 [09:27<4:50:46, 3.99s/it]
3%|▎ | 138/4506 [09:32<5:06:19, 4.21s/it]
{'loss': 0.5069, 'grad_norm': 0.7141135334968567, 'learning_rate': 1.5188470066518847e-05, 'epoch': 0.03}
3%|▎ | 139/4506 [09:36<5:02:57, 4.16s/it]
{'loss': 0.5, 'grad_norm': 0.6511688828468323, 'learning_rate': 1.5299334811529935e-05, 'epoch': 0.03}
3%|▎ | 140/4506 [09:40<5:01:19, 4.14s/it]
{'loss': 0.5049, 'grad_norm': 0.6858503222465515, 'learning_rate': 1.541019955654102e-05, 'epoch': 0.03}
3%|▎ | 141/4506 [09:44<4:59:35, 4.12s/it]
{'loss': 0.4983, 'grad_norm': 0.711287260055542, 'learning_rate': 1.5521064301552106e-05, 'epoch': 0.03}
3%|▎ | 142/4506 [09:48<4:55:41, 4.07s/it]
{'loss': 0.5107, 'grad_norm': 0.6876451373100281, 'learning_rate': 1.563192904656319e-05, 'epoch': 0.03}
3%|▎ | 143/4506 [09:52<4:51:43, 4.01s/it]
{'loss': 0.5191, 'grad_norm': 0.7538036108016968, 'learning_rate': 1.574279379157428e-05, 'epoch': 0.03}
3%|▎ | 144/4506 [09:56<4:49:21, 3.98s/it]
{'loss': 0.5056, 'grad_norm': 0.808566153049469, 'learning_rate': 1.5853658536585366e-05, 'epoch': 0.03}
3%|▎ | 145/4506 [10:00<4:54:06, 4.05s/it]
{'loss': 0.496, 'grad_norm': 0.673305094242096, 'learning_rate': 1.596452328159645e-05, 'epoch': 0.03}
3%|▎ | 146/4506 [10:04<4:52:43, 4.03s/it]
{'loss': 0.5011, 'grad_norm': 0.7931214570999146, 'learning_rate': 1.607538802660754e-05, 'epoch': 0.03}
3%|▎ | 147/4506 [10:08<4:55:38, 4.07s/it]
{'loss': 0.5088, 'grad_norm': 0.575067400932312, 'learning_rate': 1.6186252771618626e-05, 'epoch': 0.03}
3%|▎ | 148/4506 [10:13<5:03:42, 4.18s/it]
{'loss': 0.5212, 'grad_norm': 0.8837129473686218, 'learning_rate': 1.629711751662971e-05, 'epoch': 0.03}
3%|▎ | 149/4506 [10:17<5:02:07, 4.16s/it]
{'loss': 0.5058, 'grad_norm': 0.6617403626441956, 'learning_rate': 1.64079822616408e-05, 'epoch': 0.03}
3%|▎ | 150/4506 [10:21<4:59:32, 4.13s/it]
{'loss': 0.4971, 'grad_norm': 0.7399684190750122, 'learning_rate': 1.6518847006651886e-05, 'epoch': 0.03}
3%|▎ | 151/4506 [10:25<5:03:14, 4.18s/it]
{'loss': 0.4909, 'grad_norm': 0.7428039908409119, 'learning_rate': 1.662971175166297e-05, 'epoch': 0.03}
3%|▎ | 152/4506 [10:29<4:57:58, 4.11s/it]
{'loss': 0.5141, 'grad_norm': 0.7002363204956055, 'learning_rate': 1.674057649667406e-05, 'epoch': 0.03}
3%|▎ | 153/4506 [10:33<4:53:54, 4.05s/it]
{'loss': 0.4976, 'grad_norm': 0.7918694615364075, 'learning_rate': 1.6851441241685146e-05, 'epoch': 0.03}
3%|▎ | 154/4506 [10:37<4:59:08, 4.12s/it]
{'loss': 0.4996, 'grad_norm': 0.5637157559394836, 'learning_rate': 1.696230598669623e-05, 'epoch': 0.03}
3%|▎ | 155/4506 [10:41<4:54:11, 4.06s/it]
{'loss': 0.5038, 'grad_norm': 0.6881914734840393, 'learning_rate': 1.707317073170732e-05, 'epoch': 0.03}
3%|▎ | 156/4506 [10:45<4:55:29, 4.08s/it]
{'loss': 0.4982, 'grad_norm': 0.9357012510299683, 'learning_rate': 1.7184035476718406e-05, 'epoch': 0.03}
3%|▎ | 157/4506 [10:49<4:54:42, 4.07s/it]
{'loss': 0.5057, 'grad_norm': 0.5885743498802185, 'learning_rate': 1.729490022172949e-05, 'epoch': 0.03}
4%|▎ | 158/4506 [10:54<4:59:33, 4.13s/it]
{'loss': 0.4925, 'grad_norm': 0.8538793325424194, 'learning_rate': 1.740576496674058e-05, 'epoch': 0.04}
4%|▎ | 159/4506 [10:57<4:48:24, 3.98s/it]
{'loss': 0.4976, 'grad_norm': 0.764680027961731, 'learning_rate': 1.7516629711751666e-05, 'epoch': 0.04}
4%|▎ | 160/4506 [11:02<4:56:40, 4.10s/it]
{'loss': 0.5103, 'grad_norm': 0.9471895098686218, 'learning_rate': 1.762749445676275e-05, 'epoch': 0.04}
4%|▎ | 161/4506 [11:06<4:51:47, 4.03s/it]
{'loss': 0.5035, 'grad_norm': 0.6463961005210876, 'learning_rate': 1.7738359201773837e-05, 'epoch': 0.04}
4%|▎ | 162/4506 [11:09<4:48:16, 3.98s/it]
{'loss': 0.494, 'grad_norm': 0.8441608548164368, 'learning_rate': 1.7849223946784922e-05, 'epoch': 0.04}
4%|▎ | 163/4506 [11:14<4:59:43, 4.14s/it]
{'loss': 0.4811, 'grad_norm': 0.6206681132316589, 'learning_rate': 1.7960088691796008e-05, 'epoch': 0.04}
4%|▎ | 164/4506 [11:18<4:56:50, 4.10s/it]
{'loss': 0.4976, 'grad_norm': 0.8829758167266846, 'learning_rate': 1.8070953436807093e-05, 'epoch': 0.04}
4%|▎ | 165/4506 [11:22<4:56:39, 4.10s/it]
{'loss': 0.493, 'grad_norm': 0.6485555171966553, 'learning_rate': 1.8181818181818182e-05, 'epoch': 0.04}
4%|▎ | 166/4506 [11:27<5:04:41, 4.21s/it]
{'loss': 0.4977, 'grad_norm': 0.6904866099357605, 'learning_rate': 1.8292682926829268e-05, 'epoch': 0.04}
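In this stretch the logged `learning_rate` rises by the same amount at every step, about 1.1086e-07, which points to a linear warmup. One fit reproduces the logged values to float precision: lr = 5e-5 · (step − 1) / 451, where `step` is the counter on the progress bar printed just before each metrics dict.

The peak of 5e-5 and the 451 warmup steps are inferences, not values read from a config. 451 happens to equal ceil(0.1 × 4506), which would be consistent with a warmup ratio of 0.1, but any pair with the same ratio fits this excerpt equally well. The sketch checks only the warmup segment; nothing in this excerpt shows which schedule follows it.

```python
import math


def warmup_lr(step, peak_lr=5e-5, warmup_steps=451):
    """Linear-warmup learning rate fitted to this log (peak_lr and warmup_steps are assumptions)."""
    return peak_lr * (step - 1) / warmup_steps


# (progress-bar step, logged learning_rate), copied from the log above
OBSERVED = {
    128: 1.4079822616407984e-05,
    129: 1.419068736141907e-05,
    166: 1.8292682926829268e-05,
}

for step, logged in OBSERVED.items():
    assert math.isclose(warmup_lr(step), logged, rel_tol=1e-9), step

# Constant per-step increment during warmup
print(f"{warmup_lr(2) - warmup_lr(1):.4e}")  # 1.1086e-07
```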
4%|▎ | 167/4506 [11:31<5:01:25, 4.17s/it]
{'loss': 0.4846, 'grad_norm': 0.7005682587623596, 'learning_rate': 1.8403547671840353e-05, 'epoch': 0.04}
4%|▎ | 168/4506 [11:35<4:58:06, 4.12s/it]
{'loss': 0.5072, 'grad_norm': 0.732580840587616, 'learning_rate': 1.8514412416851442e-05, 'epoch': 0.04}
4%|▍ | 169/4506 [11:39<4:57:02, 4.11s/it]
{'loss': 0.4967, 'grad_norm': 0.5651941895484924, 'learning_rate': 1.8625277161862528e-05, 'epoch': 0.04}
4%|▍ | 170/4506 [11:43<5:01:53, 4.18s/it]
{'loss': 0.5013, 'grad_norm': 0.7341728210449219, 'learning_rate': 1.8736141906873613e-05, 'epoch': 0.04}
4%|▍ | 171/4506 [11:47<4:53:48, 4.07s/it]
{'loss': 0.5017, 'grad_norm': 0.7666785717010498, 'learning_rate': 1.8847006651884702e-05, 'epoch': 0.04}
4%|▍ | 172/4506 [11:51<4:47:09, 3.98s/it]
{'loss': 0.4929, 'grad_norm': 0.6863864660263062, 'learning_rate': 1.8957871396895788e-05, 'epoch': 0.04}
4%|▍ | 173/4506 [11:54<4:43:02, 3.92s/it]
{'loss': 0.4861, 'grad_norm': 0.636673629283905, 'learning_rate': 1.9068736141906873e-05, 'epoch': 0.04}
4%|▍ | 174/4506 [11:58<4:38:42, 3.86s/it]
{'loss': 0.502, 'grad_norm': 0.6833817958831787, 'learning_rate': 1.9179600886917962e-05, 'epoch': 0.04}
4%|▍ | 175/4506 [12:02<4:48:07, 3.99s/it]
{'loss': 0.4987, 'grad_norm': 0.8306127786636353, 'learning_rate': 1.9290465631929047e-05, 'epoch': 0.04}
4%|▍ | 176/4506 [12:06<4:38:59, 3.87s/it]
{'loss': 0.4954, 'grad_norm': 0.7580655217170715, 'learning_rate': 1.9401330376940133e-05, 'epoch': 0.04}
4%|▍ | 177/4506 [12:10<4:47:28, 3.98s/it]
{'loss': 0.489, 'grad_norm': 0.6834363341331482, 'learning_rate': 1.9512195121951222e-05, 'epoch': 0.04}
4%|▍ | 178/4506 [12:15<4:58:57, 4.14s/it]
{'loss': 0.4928, 'grad_norm': 0.7275435328483582, 'learning_rate': 1.9623059866962307e-05, 'epoch': 0.04}
4%|▍ | 179/4506 [12:19<4:58:59, 4.15s/it]
{'loss': 0.4918, 'grad_norm': 0.6124159097671509, 'learning_rate': 1.9733924611973393e-05, 'epoch': 0.04}
4%|▍ | 180/4506 [12:23<4:56:55, 4.12s/it]
{'loss': 0.4932, 'grad_norm': 0.688004732131958, 'learning_rate': 1.9844789356984482e-05, 'epoch': 0.04}
4%|▍ | 181/4506 [12:27<4:53:09, 4.07s/it]
{'loss': 0.5074, 'grad_norm': 0.8179774284362793, 'learning_rate': 1.9955654101995567e-05, 'epoch': 0.04}
4%|▍ | 182/4506 [12:31<4:59:05, 4.15s/it]
{'loss': 0.4897, 'grad_norm': 0.7262698411941528, 'learning_rate': 2.0066518847006653e-05, 'epoch': 0.04}
4%|▍ | 183/4506 [12:35<4:52:50, 4.06s/it]
{'loss': 0.5041, 'grad_norm': 0.7454146146774292, 'learning_rate': 2.017738359201774e-05, 'epoch': 0.04}
4%|▍ | 184/4506 [12:39<4:56:59, 4.12s/it]
{'loss': 0.4966, 'grad_norm': 0.6987413167953491, 'learning_rate': 2.0288248337028824e-05, 'epoch': 0.04}
4%|▍ | 185/4506 [12:43<4:55:23, 4.10s/it]
{'loss': 0.506, 'grad_norm': 1.2635070085525513, 'learning_rate': 2.0399113082039913e-05, 'epoch': 0.04}
4%|▍ | 186/4506 [12:48<4:57:40, 4.13s/it]
{'loss': 0.4856, 'grad_norm': 0.935310423374176, 'learning_rate': 2.0509977827051e-05, 'epoch': 0.04}
4%|▍ | 187/4506 [12:52<4:51:44, 4.05s/it]
{'loss': 0.5072, 'grad_norm': 0.7651014924049377, 'learning_rate': 2.0620842572062084e-05, 'epoch': 0.04}
4%|▍ | 188/4506 [12:56<4:56:31, 4.12s/it]
{'loss': 0.5054, 'grad_norm': 0.7485103011131287, 'learning_rate': 2.073170731707317e-05, 'epoch': 0.04}
4%|▍ | 189/4506 [13:00<4:53:18, 4.08s/it]
{'loss': 0.4786, 'grad_norm': 0.6164509654045105, 'learning_rate': 2.084257206208426e-05, 'epoch': 0.04}
4%|▍ | 190/4506 [13:04<4:55:01, 4.10s/it]
{'loss': 0.4987, 'grad_norm': 0.9426259994506836, 'learning_rate': 2.0953436807095344e-05, 'epoch': 0.04}
4%|▍ | 191/4506 [13:09<5:12:47, 4.35s/it]
{'loss': 0.5034, 'grad_norm': 0.6564449667930603, 'learning_rate': 2.106430155210643e-05, 'epoch': 0.04}
4%|▍ | 192/4506 [13:13<5:15:35, 4.39s/it]
{'loss': 0.4769, 'grad_norm': 0.6232126951217651, 'learning_rate': 2.117516629711752e-05, 'epoch': 0.04}
4%|▍ | 193/4506 [13:18<5:12:56, 4.35s/it]
{'loss': 0.5027, 'grad_norm': 0.791825532913208, 'learning_rate': 2.1286031042128604e-05, 'epoch': 0.04}
4%|▍ | 194/4506 [13:22<5:15:04, 4.38s/it]
{'loss': 0.5021, 'grad_norm': 0.7323757410049438, 'learning_rate': 2.139689578713969e-05, 'epoch': 0.04}
4%|▍ | 195/4506 [13:26<5:05:45, 4.26s/it]
{'loss': 0.4979, 'grad_norm': 0.6726215481758118, 'learning_rate': 2.150776053215078e-05, 'epoch': 0.04}
4%|▍ | 196/4506 [13:30<5:08:01, 4.29s/it]
{'loss': 0.4943, 'grad_norm': 0.7152880430221558, 'learning_rate': 2.1618625277161864e-05, 'epoch': 0.04}
4%|▍ | 197/4506 [13:35<5:08:30, 4.30s/it]
{'loss': 0.4904, 'grad_norm': 0.5969955325126648, 'learning_rate': 2.172949002217295e-05, 'epoch': 0.04}
4%|▍ | 198/4506 [13:39<5:04:14, 4.24s/it]
{'loss': 0.4911, 'grad_norm': 2.0753326416015625, 'learning_rate': 2.1840354767184038e-05, 'epoch': 0.04}
4%|▍ | 199/4506 [13:43<4:59:02, 4.17s/it]
{'loss': 0.4876, 'grad_norm': 0.674692690372467, 'learning_rate': 2.1951219512195124e-05, 'epoch': 0.04}
4%|▍ | 200/4506 [13:47<4:54:17, 4.10s/it]
{'loss': 0.4791, 'grad_norm': 0.622216522693634, 'learning_rate': 2.206208425720621e-05, 'epoch': 0.04}
4%|▍ | 201/4506 [13:51<4:49:07, 4.03s/it]
{'loss': 0.4797, 'grad_norm': 0.6506837606430054, 'learning_rate': 2.2172949002217298e-05, 'epoch': 0.04}
4%|▍ | 202/4506 [13:55<4:48:10, 4.02s/it]
{'loss': 0.4843, 'grad_norm': 0.5556838512420654, 'learning_rate': 2.2283813747228384e-05, 'epoch': 0.04}
5%|▍ | 203/4506 [13:59<4:49:02, 4.03s/it]
{'loss': 0.5139, 'grad_norm': 0.675977349281311, 'learning_rate': 2.239467849223947e-05, 'epoch': 0.05}
5%|▍ | 204/4506 [14:03<4:57:31, 4.15s/it]
{'loss': 0.4993, 'grad_norm': 0.7002516388893127, 'learning_rate': 2.2505543237250555e-05, 'epoch': 0.05}
5%|▍ | 205/4506 [14:07<4:55:11, 4.12s/it]
{'loss': 0.4905, 'grad_norm': 0.8939037322998047, 'learning_rate': 2.261640798226164e-05, 'epoch': 0.05}
5%|▍ | 206/4506 [14:12<5:09:41, 4.32s/it]
{'loss': 0.4841, 'grad_norm': 0.6225322484970093, 'learning_rate': 2.272727272727273e-05, 'epoch': 0.05}
5%|▍ | 207/4506 [14:17<5:21:27, 4.49s/it]
{'loss': 0.4787, 'grad_norm': 0.5085726380348206, 'learning_rate': 2.2838137472283815e-05, 'epoch': 0.05}
5%|▍ | 208/4506 [14:21<5:21:46, 4.49s/it]
{'loss': 0.4826, 'grad_norm': 1.1057159900665283, 'learning_rate': 2.29490022172949e-05, 'epoch': 0.05}
5%|▍ | 209/4506 [14:26<5:16:10, 4.41s/it]
{'loss': 0.4781, 'grad_norm': 0.5425624847412109, 'learning_rate': 2.3059866962305986e-05, 'epoch': 0.05}
5%|▍ | 210/4506 [14:30<5:16:13, 4.42s/it]
{'loss': 0.4921, 'grad_norm': 1.1150624752044678, 'learning_rate': 2.3170731707317075e-05, 'epoch': 0.05}
5%|▍ | 211/4506 [14:34<5:06:27, 4.28s/it]
{'loss': 0.4791, 'grad_norm': 0.6691420078277588, 'learning_rate': 2.328159645232816e-05, 'epoch': 0.05}
5%|▍ | 212/4506 [14:38<5:03:47, 4.24s/it]
{'loss': 0.4936, 'grad_norm': 0.6617951989173889, 'learning_rate': 2.3392461197339246e-05, 'epoch': 0.05}
5%|▍ | 213/4506 [14:42<4:59:06, 4.18s/it]
{'loss': 0.4995, 'grad_norm': 0.7723509073257446, 'learning_rate': 2.3503325942350335e-05, 'epoch': 0.05}
5%|▍ | 214/4506 [14:46<4:55:39, 4.13s/it]
{'loss': 0.4877, 'grad_norm': 0.7247833609580994, 'learning_rate': 2.361419068736142e-05, 'epoch': 0.05}
5%|▍ | 215/4506 [14:50<4:51:15, 4.07s/it]
{'loss': 0.4902, 'grad_norm': 0.7771150469779968, 'learning_rate': 2.3725055432372506e-05, 'epoch': 0.05}
5%|▍ | 216/4506 [14:54<4:47:40, 4.02s/it]
{'loss': 0.4648, 'grad_norm': 0.7120174765586853, 'learning_rate': 2.3835920177383595e-05, 'epoch': 0.05}
5%|▍ | 217/4506 [14:58<4:46:23, 4.01s/it]
{'loss': 0.484, 'grad_norm': 0.7036386728286743, 'learning_rate': 2.394678492239468e-05, 'epoch': 0.05}
5%|▍ | 218/4506 [15:02<4:43:41, 3.97s/it]
{'loss': 0.5, 'grad_norm': 0.7409453392028809, 'learning_rate': 2.4057649667405766e-05, 'epoch': 0.05}
5%|▍ | 219/4506 [15:06<4:40:07, 3.92s/it]
{'loss': 0.4931, 'grad_norm': 0.7029383778572083, 'learning_rate': 2.4168514412416855e-05, 'epoch': 0.05}
5%|▍ | 220/4506 [15:10<4:44:32, 3.98s/it]
{'loss': 0.4901, 'grad_norm': 0.7503450512886047, 'learning_rate': 2.427937915742794e-05, 'epoch': 0.05}
5%|▍ | 221/4506 [15:14<4:49:22, 4.05s/it]
{'loss': 0.4969, 'grad_norm': 0.6956893801689148, 'learning_rate': 2.4390243902439026e-05, 'epoch': 0.05}
5%|▍ | 222/4506 [15:19<5:01:21, 4.22s/it]
{'loss': 0.4712, 'grad_norm': 0.7169675230979919, 'learning_rate': 2.4501108647450115e-05, 'epoch': 0.05}
5%|▍ | 223/4506 [15:22<4:51:57, 4.09s/it]
{'loss': 0.4824, 'grad_norm': 0.7318289279937744, 'learning_rate': 2.46119733924612e-05, 'epoch': 0.05}
5%|▍ | 224/4506 [15:27<5:03:39, 4.25s/it]
{'loss': 0.4963, 'grad_norm': 0.5716328024864197, 'learning_rate': 2.4722838137472286e-05, 'epoch': 0.05}
5%|▍ | 225/4506 [15:31<4:58:34, 4.18s/it]
{'loss': 0.4828, 'grad_norm': 0.5573404431343079, 'learning_rate': 2.483370288248337e-05, 'epoch': 0.05}
5%|▌ | 226/4506 [15:35<4:57:01, 4.16s/it]
{'loss': 0.5016, 'grad_norm': 0.6081146001815796, 'learning_rate': 2.4944567627494457e-05, 'epoch': 0.05}
5%|▌ | 227/4506 [15:39<4:51:07, 4.08s/it]
{'loss': 0.4786, 'grad_norm': 0.6197226643562317, 'learning_rate': 2.5055432372505546e-05, 'epoch': 0.05}
5%|▌ | 228/4506 [15:43<4:51:47, 4.09s/it]
{'loss': 0.5055, 'grad_norm': 0.6215401291847229, 'learning_rate': 2.516629711751663e-05, 'epoch': 0.05}
5%|▌ | 229/4506 [15:47<4:47:37, 4.04s/it]
{'loss': 0.4934, 'grad_norm': 0.6373041868209839, 'learning_rate': 2.5277161862527717e-05, 'epoch': 0.05}
5%|▌ | 230/4506 [15:51<4:47:59, 4.04s/it]
{'loss': 0.4708, 'grad_norm': 0.6692588329315186, 'learning_rate': 2.5388026607538806e-05, 'epoch': 0.05}
5%|▌ | 231/4506 [15:56<4:58:19, 4.19s/it]
{'loss': 0.4798, 'grad_norm': 0.6730163097381592, 'learning_rate': 2.549889135254989e-05, 'epoch': 0.05}
5%|▌ | 232/4506 [16:00<5:04:26, 4.27s/it]
{'loss': 0.4952, 'grad_norm': 0.6426482796669006, 'learning_rate': 2.5609756097560977e-05, 'epoch': 0.05}
5%|▌ | 233/4506 [16:04<4:59:08, 4.20s/it]
{'loss': 0.4966, 'grad_norm': 0.6740127205848694, 'learning_rate': 2.5720620842572062e-05, 'epoch': 0.05}
5%|▌ | 234/4506 [16:08<4:57:03, 4.17s/it]
{'loss': 0.4852, 'grad_norm': 0.6730912923812866, 'learning_rate': 2.5831485587583148e-05, 'epoch': 0.05}
5%|▌ | 235/4506 [16:12<4:53:54, 4.13s/it]
{'loss': 0.4859, 'grad_norm': 0.7001057267189026, 'learning_rate': 2.5942350332594233e-05, 'epoch': 0.05}
5%|▌ | 236/4506 [16:16<4:53:49, 4.13s/it]
{'loss': 0.4893, 'grad_norm': 0.6092996597290039, 'learning_rate': 2.6053215077605326e-05, 'epoch': 0.05}
5%|▌ | 237/4506 [16:21<4:55:01, 4.15s/it]
{'loss': 0.4892, 'grad_norm': 0.6854727268218994, 'learning_rate': 2.616407982261641e-05, 'epoch': 0.05}
5%|▌ | 238/4506 [16:25<5:09:10, 4.35s/it]
{'loss': 0.4771, 'grad_norm': 0.5869104266166687, 'learning_rate': 2.6274944567627497e-05, 'epoch': 0.05}
5%|▌ | 239/4506 [16:30<5:06:51, 4.31s/it]
{'loss': 0.4942, 'grad_norm': 0.7085840702056885, 'learning_rate': 2.6385809312638582e-05, 'epoch': 0.05}
5%|▌ | 240/4506 [16:34<5:01:08, 4.24s/it]
{'loss': 0.4894, 'grad_norm': 0.7140604853630066, 'learning_rate': 2.6496674057649668e-05, 'epoch': 0.05}
5%|▌ | 241/4506 [16:38<4:56:45, 4.17s/it]
{'loss': 0.4861, 'grad_norm': 0.6964391469955444, 'learning_rate': 2.6607538802660753e-05, 'epoch': 0.05}
5%|▌ | 242/4506 [16:42<5:00:27, 4.23s/it]
{'loss': 0.4895, 'grad_norm': 0.6332617402076721, 'learning_rate': 2.6718403547671845e-05, 'epoch': 0.05}
5%|▌ | 243/4506 [16:46<4:55:20, 4.16s/it]
{'loss': 0.4852, 'grad_norm': 0.658176839351654, 'learning_rate': 2.682926829268293e-05, 'epoch': 0.05}
5%|▌ | 244/4506 [16:50<4:51:12, 4.10s/it]
{'loss': 0.5003, 'grad_norm': 0.577104389667511, 'learning_rate': 2.6940133037694017e-05, 'epoch': 0.05}
5%|▌ | 245/4506 [16:54<4:56:05, 4.17s/it]
{'loss': 0.4872, 'grad_norm': 0.84919673204422, 'learning_rate': 2.7050997782705102e-05, 'epoch': 0.05}
5%|▌ | 246/4506 [16:58<4:46:57, 4.04s/it]
{'loss': 0.4773, 'grad_norm': 0.7139866352081299, 'learning_rate': 2.7161862527716188e-05, 'epoch': 0.05}
5%|▌ | 247/4506 [17:03<4:58:42, 4.21s/it]
{'loss': 0.4939, 'grad_norm': 0.7911086082458496, 'learning_rate': 2.7272727272727273e-05, 'epoch': 0.05}
6%|▌ | 248/4506 [17:07<4:50:43, 4.10s/it]
{'loss': 0.4832, 'grad_norm': 0.7042136192321777, 'learning_rate': 2.7383592017738362e-05, 'epoch': 0.06}
6%|▌ | 249/4506 [17:11<4:47:41, 4.05s/it]
{'loss': 0.4937, 'grad_norm': 0.7779251933097839, 'learning_rate': 2.7494456762749448e-05, 'epoch': 0.06}
6%|▌ | 250/4506 [17:15<4:46:26, 4.04s/it]
{'loss': 0.5073, 'grad_norm': 0.6762179732322693, 'learning_rate': 2.7605321507760533e-05, 'epoch': 0.06}
6%|▌ | 251/4506 [17:18<4:41:30, 3.97s/it]
{'loss': 0.4772, 'grad_norm': 0.5835009813308716, 'learning_rate': 2.771618625277162e-05, 'epoch': 0.06}
6%|▌ | 252/4506 [17:22<4:40:39, 3.96s/it]
{'loss': 0.4945, 'grad_norm': 0.6470304131507874, 'learning_rate': 2.7827050997782704e-05, 'epoch': 0.06}
6%|▌ | 253/4506 [17:26<4:43:16, 4.00s/it]
{'loss': 0.4833, 'grad_norm': 0.6158888936042786, 'learning_rate': 2.793791574279379e-05, 'epoch': 0.06}
6%|▌ | 254/4506 [17:30<4:41:53, 3.98s/it]
{'loss': 0.4982, 'grad_norm': 0.6625139117240906, 'learning_rate': 2.8048780487804882e-05, 'epoch': 0.06}
6%|▌ | 255/4506 [17:34<4:45:14, 4.03s/it]
{'loss': 0.4816, 'grad_norm': 0.5282764434814453, 'learning_rate': 2.8159645232815967e-05, 'epoch': 0.06}
6%|▌ | 256/4506 [17:39<4:48:33, 4.07s/it]
{'loss': 0.4853, 'grad_norm': 0.5397368669509888, 'learning_rate': 2.8270509977827053e-05, 'epoch': 0.06}
6%|▌ | 257/4506 [17:43<4:54:20, 4.16s/it]
{'loss': 0.5541, 'grad_norm': 49.477108001708984, 'learning_rate': 2.838137472283814e-05, 'epoch': 0.06}
6%|▌ | 258/4506 [17:47<4:53:57, 4.15s/it]
{'loss': 0.4917, 'grad_norm': 0.727167010307312, 'learning_rate': 2.8492239467849224e-05, 'epoch': 0.06}
6%|▌ | 259/4506 [17:51<4:57:46, 4.21s/it]
{'loss': 0.4869, 'grad_norm': 0.5855398774147034, 'learning_rate': 2.860310421286031e-05, 'epoch': 0.06}
6%|▌ | 260/4506 [17:55<4:52:36, 4.13s/it]
{'loss': 0.4863, 'grad_norm': 0.5604298710823059, 'learning_rate': 2.8713968957871395e-05, 'epoch': 0.06}
6%|▌ | 261/4506 [18:00<4:55:58, 4.18s/it]
{'loss': 0.4766, 'grad_norm': 0.5564908385276794, 'learning_rate': 2.8824833702882487e-05, 'epoch': 0.06}
6%|▌ | 262/4506 [18:04<4:55:16, 4.17s/it]
{'loss': 0.4913, 'grad_norm': 4.946478843688965, 'learning_rate': 2.8935698447893573e-05, 'epoch': 0.06}
6%|▌ | 263/4506 [18:08<4:54:38, 4.17s/it]
{'loss': 0.4968, 'grad_norm': 0.7509699463844299, 'learning_rate': 2.904656319290466e-05, 'epoch': 0.06}
6%|▌ | 264/4506 [18:12<4:50:30, 4.11s/it]
{'loss': 0.487, 'grad_norm': 0.6557166576385498, 'learning_rate': 2.9157427937915744e-05, 'epoch': 0.06}
6%|▌ | 265/4506 [18:16<4:42:37, 4.00s/it]
{'loss': 0.4838, 'grad_norm': 0.6967031955718994, 'learning_rate': 2.926829268292683e-05, 'epoch': 0.06}
6%|▌ | 266/4506 [18:20<4:47:33, 4.07s/it]
{'loss': 0.4896, 'grad_norm': 0.6567521691322327, 'learning_rate': 2.9379157427937915e-05, 'epoch': 0.06}
6%|▌ | 267/4506 [18:24<4:52:47, 4.14s/it]
{'loss': 0.4784, 'grad_norm': 0.6312602758407593, 'learning_rate': 2.9490022172949004e-05, 'epoch': 0.06}
6%|▌ | 268/4506 [18:28<4:49:23, 4.10s/it]
{'loss': 0.4681, 'grad_norm': 0.6965785026550293, 'learning_rate': 2.960088691796009e-05, 'epoch': 0.06}
6%|▌ | 269/4506 [18:32<4:46:56, 4.06s/it]
{'loss': 0.4875, 'grad_norm': 0.673069953918457, 'learning_rate': 2.971175166297118e-05, 'epoch': 0.06}
6%|▌ | 270/4506 [18:36<4:49:18, 4.10s/it]
{'loss': 0.4811, 'grad_norm': 0.5619593858718872, 'learning_rate': 2.9822616407982264e-05, 'epoch': 0.06}
6%|▌ | 271/4506 [18:41<4:51:34, 4.13s/it]
{'loss': 0.4758, 'grad_norm': 0.6835150718688965, 'learning_rate': 2.993348115299335e-05, 'epoch': 0.06}
6%|▌ | 272/4506 [18:44<4:45:26, 4.04s/it]
{'loss': 0.5333, 'grad_norm': 5.422232151031494, 'learning_rate': 3.0044345898004435e-05, 'epoch': 0.06}
6%|▌ | 272/4506 [18:45<4:45:26, 4.04s/it]
6%|▌ | 273/4506 [18:48<4:43:10, 4.01s/it]
{'loss': 0.486, 'grad_norm': 0.9272562265396118, 'learning_rate': 3.0155210643015524e-05, 'epoch': 0.06}
6%|▌ | 274/4506 [18:52<4:43:59, 4.03s/it]
{'loss': 0.5305, 'grad_norm': 2.5507068634033203, 'learning_rate': 3.026607538802661e-05, 'epoch': 0.06}
6%|▌ | 275/4506 [18:56<4:38:11, 3.95s/it]
{'loss': 0.4771, 'grad_norm': 0.7423598766326904, 'learning_rate': 3.0376940133037695e-05, 'epoch': 0.06}
6%|▌ | 276/4506 [19:00<4:43:27, 4.02s/it]
{'loss': 0.4821, 'grad_norm': 0.8148553967475891, 'learning_rate': 3.048780487804878e-05, 'epoch': 0.06}
6%|▌ | 277/4506 [19:05<4:44:21, 4.03s/it]
{'loss': 0.4735, 'grad_norm': 0.6945447325706482, 'learning_rate': 3.059866962305987e-05, 'epoch': 0.06}
6%|▌ | 278/4506 [19:09<4:49:35, 4.11s/it]
{'loss': 0.5009, 'grad_norm': 0.8545969724655151, 'learning_rate': 3.070953436807095e-05, 'epoch': 0.06}
6%|▌ | 279/4506 [19:13<4:46:34, 4.07s/it]
{'loss': 0.4903, 'grad_norm': 0.6980259418487549, 'learning_rate': 3.082039911308204e-05, 'epoch': 0.06}
6%|▌ | 280/4506 [19:17<4:46:33, 4.07s/it]
{'loss': 0.4929, 'grad_norm': 0.9629960060119629, 'learning_rate': 3.093126385809313e-05, 'epoch': 0.06}
6%|▌ | 281/4506 [19:21<4:43:27, 4.03s/it]
{'loss': 0.4771, 'grad_norm': 0.7634316682815552, 'learning_rate': 3.104212860310421e-05, 'epoch': 0.06}
6%|▋ | 282/4506 [19:25<4:42:29, 4.01s/it]
{'loss': 0.4914, 'grad_norm': 0.8118604421615601, 'learning_rate': 3.11529933481153e-05, 'epoch': 0.06}
6%|▋ | 283/4506 [19:29<4:46:18, 4.07s/it]
{'loss': 0.483, 'grad_norm': 0.587049663066864, 'learning_rate': 3.126385809312638e-05, 'epoch': 0.06}
6%|▋ | 284/4506 [19:33<4:45:25, 4.06s/it]
{'loss': 0.4742, 'grad_norm': 0.6681234240531921, 'learning_rate': 3.137472283813747e-05, 'epoch': 0.06}
6%|▋ | 285/4506 [19:37<4:42:26, 4.01s/it]
{'loss': 0.484, 'grad_norm': 0.6477012634277344, 'learning_rate': 3.148558758314856e-05, 'epoch': 0.06}
6%|▋ | 286/4506 [19:41<4:47:37, 4.09s/it]
{'loss': 0.4951, 'grad_norm': 0.6526032090187073, 'learning_rate': 3.159645232815965e-05, 'epoch': 0.06}
6%|▋ | 287/4506 [19:45<4:44:58, 4.05s/it]
{'loss': 0.4695, 'grad_norm': 0.7159047722816467, 'learning_rate': 3.170731707317073e-05, 'epoch': 0.06}
6%|▋ | 288/4506 [19:49<4:41:46, 4.01s/it]
{'loss': 0.4765, 'grad_norm': 0.5336236953735352, 'learning_rate': 3.181818181818182e-05, 'epoch': 0.06}
6%|▋ | 289/4506 [19:53<4:40:10, 3.99s/it]
{'loss': 0.5536, 'grad_norm': 6.016000270843506, 'learning_rate': 3.19290465631929e-05, 'epoch': 0.06}
6%|▋ | 290/4506 [19:57<4:33:30, 3.89s/it]
{'loss': 0.4847, 'grad_norm': 0.9864018559455872, 'learning_rate': 3.203991130820399e-05, 'epoch': 0.06}
6%|▋ | 291/4506 [20:01<4:43:41, 4.04s/it]
{'loss': 0.4768, 'grad_norm': 0.7144032716751099, 'learning_rate': 3.215077605321508e-05, 'epoch': 0.06}
6%|▋ | 292/4506 [20:05<4:40:11, 3.99s/it]
{'loss': 0.5096, 'grad_norm': 7.4400715827941895, 'learning_rate': 3.226164079822617e-05, 'epoch': 0.06}
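Most `grad_norm` values in this excerpt sit between roughly 0.5 and 1.0, but a few steps jump well above that:

- 49.48 at step 257.
- About 4.9 to 7.4 at steps 262, 272, 289 and 292.
- Smaller excursions elsewhere, such as 2.08 at step 198 and 2.55 at step 274.

Training carries on after each spike, and the loss returns to its earlier range at the next logged step.

One simple way to pick such steps out of a long log is to compare each value with the median of the few values before it. The sketch below does that. The window of 5 and the factor of 3 are arbitrary choices, not settings of the training framework. The sample is the (step, grad_norm) pairs for steps 250 to 263, copied from the log.

```python
from collections import deque
from statistics import median


def flag_spikes(records, window=5, factor=3.0):
    """Return steps whose grad_norm exceeds `factor` x the median of the
    preceding `window` values (a heuristic; both thresholds are arbitrary)."""
    history = deque(maxlen=window)
    flagged = []
    for step, norm in records:
        if len(history) == window and norm > factor * median(history):
            flagged.append(step)
        history.append(norm)
    return flagged


# (progress-bar step, grad_norm) for steps 250-263, copied from the log
SAMPLE = [
    (250, 0.6762179732322693), (251, 0.5835009813308716),
    (252, 0.6470304131507874), (253, 0.6158888936042786),
    (254, 0.6625139117240906), (255, 0.5282764434814453),
    (256, 0.5397368669509888), (257, 49.477108001708984),
    (258, 0.727167010307312), (259, 0.5855398774147034),
    (260, 0.5604298710823059), (261, 0.5564908385276794),
    (262, 4.946478843688965), (263, 0.7509699463844299),
]

print(flag_spikes(SAMPLE))  # [257, 262]
```

Using the median rather than the mean keeps a single large spike such as step 257 from inflating the baseline for the steps that follow it.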
7%|▋ | 293/4506 [20:09<4:35:34, 3.92s/it]
{'loss': 0.498, 'grad_norm': 1.2048771381378174, 'learning_rate': 3.237250554323725e-05, 'epoch': 0.07}
7%|▋ | 294/4506 [20:13<4:39:13, 3.98s/it]
{'loss': 0.4818, 'grad_norm': 0.5369740128517151, 'learning_rate': 3.248337028824834e-05, 'epoch': 0.07}
7%|▋ | 295/4506 [20:18<4:56:00, 4.22s/it]
{'loss': 0.4892, 'grad_norm': 0.7335840463638306, 'learning_rate': 3.259423503325942e-05, 'epoch': 0.07}
7%|▋ | 296/4506 [20:21<4:49:29, 4.13s/it]
{'loss': 0.4688, 'grad_norm': 0.8398157358169556, 'learning_rate': 3.270509977827051e-05, 'epoch': 0.07}
7%|▋ | 297/4506 [20:26<4:49:26, 4.13s/it]
{'loss': 0.4709, 'grad_norm': 0.4955570101737976, 'learning_rate': 3.28159645232816e-05, 'epoch': 0.07}
7%|▋ | 298/4506 [20:30<5:02:22, 4.31s/it]
{'loss': 0.4848, 'grad_norm': 0.7331514358520508, 'learning_rate': 3.292682926829269e-05, 'epoch': 0.07}
7%|▋ | 299/4506 [20:34<4:56:33, 4.23s/it]
{'loss': 0.4977, 'grad_norm': 0.8306925296783447, 'learning_rate': 3.303769401330377e-05, 'epoch': 0.07}
7%|▋ | 300/4506 [20:39<5:02:16, 4.31s/it]
{'loss': 0.4784, 'grad_norm': 0.48811081051826477, 'learning_rate': 3.314855875831486e-05, 'epoch': 0.07}
7%|▋ | 301/4506 [20:43<4:56:48, 4.24s/it]
{'loss': 0.4815, 'grad_norm': 0.8901100158691406, 'learning_rate': 3.325942350332594e-05, 'epoch': 0.07}
7%|▋ | 302/4506 [20:47<4:55:04, 4.21s/it]
{'loss': 0.4771, 'grad_norm': 0.6370069980621338, 'learning_rate': 3.337028824833703e-05, 'epoch': 0.07}
7%|▋ | 303/4506 [20:51<4:47:44, 4.11s/it]
{'loss': 0.476, 'grad_norm': 0.5985918641090393, 'learning_rate': 3.348115299334812e-05, 'epoch': 0.07}
7%|▋ | 304/4506 [20:55<4:41:30, 4.02s/it]
{'loss': 0.4805, 'grad_norm': 0.6268394589424133, 'learning_rate': 3.35920177383592e-05, 'epoch': 0.07}
7%|▋ | 305/4506 [20:59<4:38:25, 3.98s/it]
{'loss': 0.471, 'grad_norm': 0.6944514513015747, 'learning_rate': 3.370288248337029e-05, 'epoch': 0.07}
7%|▋ | 306/4506 [21:03<4:46:55, 4.10s/it]
{'loss': 0.4741, 'grad_norm': 0.5834444761276245, 'learning_rate': 3.381374722838137e-05, 'epoch': 0.07}
7%|▋ | 307/4506 [21:07<4:51:23, 4.16s/it]
{'loss': 0.4693, 'grad_norm': 0.5606842041015625, 'learning_rate': 3.392461197339246e-05, 'epoch': 0.07}
7%|▋ | 307/4506 [21:07<4:51:23, 4.16s/it]
7%|▋ | 308/4506 [21:12<5:02:51, 4.33s/it]
{'loss': 0.4642, 'grad_norm': 0.6367583274841309, 'learning_rate': 3.4035476718403544e-05, 'epoch': 0.07}
7%|▋ | 308/4506 [21:12<5:02:51, 4.33s/it]
7%|▋ | 309/4506 [21:16<5:00:47, 4.30s/it]
{'loss': 0.4693, 'grad_norm': 0.6626463532447815, 'learning_rate': 3.414634146341464e-05, 'epoch': 0.07}
7%|▋ | 309/4506 [21:16<5:00:47, 4.30s/it]
7%|▋ | 310/4506 [21:21<5:03:02, 4.33s/it]
{'loss': 0.4676, 'grad_norm': 0.4992898404598236, 'learning_rate': 3.425720620842572e-05, 'epoch': 0.07}
7%|▋ | 310/4506 [21:21<5:03:02, 4.33s/it]
7%|▋ | 311/4506 [21:25<5:03:00, 4.33s/it]
{'loss': 0.4702, 'grad_norm': 0.5672375559806824, 'learning_rate': 3.436807095343681e-05, 'epoch': 0.07}
7%|▋ | 311/4506 [21:25<5:03:00, 4.33s/it]
7%|▋ | 312/4506 [21:29<4:58:29, 4.27s/it]
{'loss': 0.458, 'grad_norm': 0.6133443117141724, 'learning_rate': 3.447893569844789e-05, 'epoch': 0.07}
7%|▋ | 312/4506 [21:29<4:58:29, 4.27s/it]
7%|▋ | 313/4506 [21:33<4:53:25, 4.20s/it]
{'loss': 0.4781, 'grad_norm': 0.6221413612365723, 'learning_rate': 3.458980044345898e-05, 'epoch': 0.07}
7%|▋ | 313/4506 [21:33<4:53:25, 4.20s/it]
7%|▋ | 314/4506 [21:37<4:48:34, 4.13s/it]
{'loss': 0.4768, 'grad_norm': 0.6324403882026672, 'learning_rate': 3.4700665188470064e-05, 'epoch': 0.07}
7%|▋ | 314/4506 [21:37<4:48:34, 4.13s/it]
7%|▋ | 315/4506 [21:42<4:54:33, 4.22s/it]
{'loss': 0.4662, 'grad_norm': 0.5514945387840271, 'learning_rate': 3.481152993348116e-05, 'epoch': 0.07}
7%|▋ | 315/4506 [21:42<4:54:33, 4.22s/it]
7%|▋ | 316/4506 [21:46<4:53:30, 4.20s/it]
{'loss': 0.4781, 'grad_norm': 0.5889636278152466, 'learning_rate': 3.492239467849224e-05, 'epoch': 0.07}
7%|▋ | 316/4506 [21:46<4:53:30, 4.20s/it]
7%|▋ | 317/4506 [21:50<4:49:00, 4.14s/it]
{'loss': 0.4675, 'grad_norm': 0.6650471091270447, 'learning_rate': 3.503325942350333e-05, 'epoch': 0.07}
7%|▋ | 317/4506 [21:50<4:49:00, 4.14s/it]
7%|▋ | 318/4506 [21:54<4:44:40, 4.08s/it]
{'loss': 0.4746, 'grad_norm': 0.6731651425361633, 'learning_rate': 3.514412416851441e-05, 'epoch': 0.07}
7%|▋ | 318/4506 [21:54<4:44:40, 4.08s/it]
7%|▋ | 319/4506 [21:58<4:49:48, 4.15s/it]
{'loss': 0.4655, 'grad_norm': 0.5708182454109192, 'learning_rate': 3.52549889135255e-05, 'epoch': 0.07}
7%|▋ | 319/4506 [21:58<4:49:48, 4.15s/it]
7%|▋ | 320/4506 [22:02<4:55:02, 4.23s/it]
{'loss': 0.4654, 'grad_norm': 0.4560811221599579, 'learning_rate': 3.5365853658536584e-05, 'epoch': 0.07}
7%|▋ | 320/4506 [22:02<4:55:02, 4.23s/it]
7%|▋ | 321/4506 [22:07<4:55:46, 4.24s/it]
{'loss': 0.4727, 'grad_norm': 0.6515731811523438, 'learning_rate': 3.547671840354767e-05, 'epoch': 0.07}
7%|▋ | 321/4506 [22:07<4:55:46, 4.24s/it]
7%|▋ | 322/4506 [22:12<5:15:15, 4.52s/it]
{'loss': 0.474, 'grad_norm': 0.5765541791915894, 'learning_rate': 3.558758314855876e-05, 'epoch': 0.07}
7%|▋ | 322/4506 [22:12<5:15:15, 4.52s/it]
7%|▋ | 323/4506 [22:16<5:01:57, 4.33s/it]
{'loss': 0.4763, 'grad_norm': 0.6130799651145935, 'learning_rate': 3.5698447893569844e-05, 'epoch': 0.07}
7%|▋ | 323/4506 [22:16<5:01:57, 4.33s/it]
7%|▋ | 324/4506 [22:20<4:57:53, 4.27s/it]
{'loss': 0.4697, 'grad_norm': 0.5536300539970398, 'learning_rate': 3.580931263858093e-05, 'epoch': 0.07}
7%|▋ | 324/4506 [22:20<4:57:53, 4.27s/it]
7%|▋ | 325/4506 [22:24<4:47:12, 4.12s/it]
{'loss': 0.4874, 'grad_norm': 0.5790297985076904, 'learning_rate': 3.5920177383592015e-05, 'epoch': 0.07}
7%|▋ | 325/4506 [22:24<4:47:12, 4.12s/it]
7%|▋ | 326/4506 [22:27<4:41:38, 4.04s/it]
{'loss': 0.4827, 'grad_norm': 0.5863263607025146, 'learning_rate': 3.6031042128603104e-05, 'epoch': 0.07}
7%|▋ | 326/4506 [22:28<4:41:38, 4.04s/it]
7%|▋ | 327/4506 [22:31<4:37:40, 3.99s/it]
{'loss': 0.4851, 'grad_norm': 0.585790753364563, 'learning_rate': 3.6141906873614186e-05, 'epoch': 0.07}
7%|▋ | 327/4506 [22:31<4:37:40, 3.99s/it]
7%|▋ | 328/4506 [22:35<4:38:09, 3.99s/it]
{'loss': 0.4712, 'grad_norm': 0.5194576382637024, 'learning_rate': 3.625277161862528e-05, 'epoch': 0.07}
7%|▋ | 328/4506 [22:35<4:38:09, 3.99s/it]
7%|▋ | 329/4506 [22:39<4:35:03, 3.95s/it]
{'loss': 0.4648, 'grad_norm': 0.5439262390136719, 'learning_rate': 3.6363636363636364e-05, 'epoch': 0.07}
7%|▋ | 329/4506 [22:39<4:35:03, 3.95s/it]
7%|▋ | 330/4506 [22:43<4:39:39, 4.02s/it]
{'loss': 0.4808, 'grad_norm': 0.5636749863624573, 'learning_rate': 3.647450110864745e-05, 'epoch': 0.07}
7%|▋ | 330/4506 [22:43<4:39:39, 4.02s/it]
7%|▋ | 331/4506 [22:47<4:38:09, 4.00s/it]
{'loss': 0.4555, 'grad_norm': 0.56549471616745, 'learning_rate': 3.6585365853658535e-05, 'epoch': 0.07}
7%|▋ | 331/4506 [22:47<4:38:09, 4.00s/it]
7%|▋ | 332/4506 [22:51<4:38:24, 4.00s/it]
{'loss': 0.4773, 'grad_norm': 0.6048386693000793, 'learning_rate': 3.6696230598669624e-05, 'epoch': 0.07}
7%|▋ | 332/4506 [22:51<4:38:24, 4.00s/it]
7%|▋ | 333/4506 [22:56<4:42:41, 4.06s/it]
{'loss': 0.4779, 'grad_norm': 0.6526768207550049, 'learning_rate': 3.6807095343680706e-05, 'epoch': 0.07}
7%|▋ | 333/4506 [22:56<4:42:41, 4.06s/it]
7%|▋ | 334/4506 [23:00<4:44:13, 4.09s/it]
{'loss': 0.4753, 'grad_norm': 0.6674452424049377, 'learning_rate': 3.69179600886918e-05, 'epoch': 0.07}
7%|▋ | 334/4506 [23:00<4:44:13, 4.09s/it]
7%|▋ | 335/4506 [23:04<4:50:36, 4.18s/it]
{'loss': 0.4648, 'grad_norm': 0.5995166301727295, 'learning_rate': 3.7028824833702884e-05, 'epoch': 0.07}
7%|▋ | 335/4506 [23:04<4:50:36, 4.18s/it]
7%|▋ | 336/4506 [23:08<4:48:17, 4.15s/it]
{'loss': 0.4769, 'grad_norm': 0.6123847961425781, 'learning_rate': 3.713968957871397e-05, 'epoch': 0.07}
7%|▋ | 336/4506 [23:08<4:48:17, 4.15s/it]
7%|▋ | 337/4506 [23:13<4:53:07, 4.22s/it]
{'loss': 0.4701, 'grad_norm': 0.6568610668182373, 'learning_rate': 3.7250554323725055e-05, 'epoch': 0.07}
7%|▋ | 337/4506 [23:13<4:53:07, 4.22s/it]
8%|▊ | 338/4506 [23:17<4:50:50, 4.19s/it]
{'loss': 0.4651, 'grad_norm': 0.6314013600349426, 'learning_rate': 3.7361419068736144e-05, 'epoch': 0.08}
8%|▊ | 338/4506 [23:17<4:50:50, 4.19s/it]
8%|▊ | 339/4506 [23:21<4:49:06, 4.16s/it]
{'loss': 0.4624, 'grad_norm': 0.646771252155304, 'learning_rate': 3.7472283813747226e-05, 'epoch': 0.08}
8%|▊ | 339/4506 [23:21<4:49:06, 4.16s/it]
8%|▊ | 340/4506 [23:25<4:41:56, 4.06s/it]
{'loss': 0.4799, 'grad_norm': 0.6134341359138489, 'learning_rate': 3.758314855875832e-05, 'epoch': 0.08}
8%|▊ | 340/4506 [23:25<4:41:56, 4.06s/it]
8%|▊ | 341/4506 [23:29<4:42:01, 4.06s/it]
{'loss': 0.4684, 'grad_norm': 0.5610846877098083, 'learning_rate': 3.7694013303769404e-05, 'epoch': 0.08}
8%|▊ | 341/4506 [23:29<4:42:01, 4.06s/it]
8%|▊ | 342/4506 [23:32<4:36:53, 3.99s/it]
{'loss': 0.4735, 'grad_norm': 0.5979098081588745, 'learning_rate': 3.780487804878049e-05, 'epoch': 0.08}
8%|▊ | 342/4506 [23:32<4:36:53, 3.99s/it]
8%|▊ | 343/4506 [23:37<4:43:05, 4.08s/it]
{'loss': 0.4707, 'grad_norm': 0.5827330350875854, 'learning_rate': 3.7915742793791575e-05, 'epoch': 0.08}
8%|▊ | 343/4506 [23:37<4:43:05, 4.08s/it]
8%|▊ | 344/4506 [23:41<4:40:50, 4.05s/it]
{'loss': 0.4778, 'grad_norm': 0.6543785929679871, 'learning_rate': 3.8026607538802664e-05, 'epoch': 0.08}
8%|▊ | 344/4506 [23:41<4:40:50, 4.05s/it]
8%|▊ | 345/4506 [23:45<4:37:19, 4.00s/it]
{'loss': 0.4682, 'grad_norm': 0.5134230256080627, 'learning_rate': 3.8137472283813746e-05, 'epoch': 0.08}
8%|▊ | 345/4506 [23:45<4:37:19, 4.00s/it]
8%|▊ | 346/4506 [23:49<4:44:03, 4.10s/it]
{'loss': 0.4689, 'grad_norm': 0.5412847399711609, 'learning_rate': 3.8248337028824835e-05, 'epoch': 0.08}
8%|▊ | 346/4506 [23:49<4:44:03, 4.10s/it]
8%|▊ | 347/4506 [23:53<4:40:13, 4.04s/it]
{'loss': 0.4672, 'grad_norm': 0.6407479643821716, 'learning_rate': 3.8359201773835924e-05, 'epoch': 0.08}
8%|▊ | 347/4506 [23:53<4:40:13, 4.04s/it]
8%|▊ | 348/4506 [23:57<4:45:56, 4.13s/it]
{'loss': 0.4832, 'grad_norm': 0.6856405735015869, 'learning_rate': 3.8470066518847006e-05, 'epoch': 0.08}
8%|▊ | 348/4506 [23:57<4:45:56, 4.13s/it]
8%|▊ | 349/4506 [24:01<4:47:11, 4.15s/it]
{'loss': 0.4607, 'grad_norm': 0.5314794778823853, 'learning_rate': 3.8580931263858095e-05, 'epoch': 0.08}
8%|▊ | 349/4506 [24:01<4:47:11, 4.15s/it]
8%|▊ | 350/4506 [24:05<4:44:03, 4.10s/it]
{'loss': 0.4707, 'grad_norm': 0.6006532907485962, 'learning_rate': 3.869179600886918e-05, 'epoch': 0.08}
8%|▊ | 350/4506 [24:05<4:44:03, 4.10s/it]
8%|▊ | 351/4506 [24:10<4:44:59, 4.12s/it]
{'loss': 0.4843, 'grad_norm': 0.5731856226921082, 'learning_rate': 3.8802660753880266e-05, 'epoch': 0.08}
8%|▊ | 351/4506 [24:10<4:44:59, 4.12s/it]
8%|▊ | 352/4506 [24:14<4:45:37, 4.13s/it]
{'loss': 0.4585, 'grad_norm': 0.4726095199584961, 'learning_rate': 3.8913525498891355e-05, 'epoch': 0.08}
8%|▊ | 352/4506 [24:14<4:45:37, 4.13s/it]
8%|▊ | 353/4506 [24:18<4:42:59, 4.09s/it]
{'loss': 0.4771, 'grad_norm': 0.59394770860672, 'learning_rate': 3.9024390243902444e-05, 'epoch': 0.08}
8%|▊ | 353/4506 [24:18<4:42:59, 4.09s/it]
8%|▊ | 354/4506 [24:21<4:36:46, 4.00s/it]
{'loss': 0.4607, 'grad_norm': 0.6210277676582336, 'learning_rate': 3.9135254988913526e-05, 'epoch': 0.08}
8%|▊ | 354/4506 [24:21<4:36:46, 4.00s/it]
8%|▊ | 355/4506 [24:26<4:39:27, 4.04s/it]
{'loss': 0.4587, 'grad_norm': 0.5042370557785034, 'learning_rate': 3.9246119733924615e-05, 'epoch': 0.08}
8%|▊ | 355/4506 [24:26<4:39:27, 4.04s/it]
8%|▊ | 356/4506 [24:30<4:36:28, 4.00s/it]
{'loss': 0.4629, 'grad_norm': 0.5638478398323059, 'learning_rate': 3.93569844789357e-05, 'epoch': 0.08}
8%|▊ | 356/4506 [24:30<4:36:28, 4.00s/it]
8%|▊ | 357/4506 [24:33<4:33:22, 3.95s/it]
{'loss': 0.4641, 'grad_norm': 0.5535477995872498, 'learning_rate': 3.9467849223946786e-05, 'epoch': 0.08}
8%|▊ | 357/4506 [24:33<4:33:22, 3.95s/it]
8%|▊ | 358/4506 [24:38<4:41:25, 4.07s/it]
{'loss': 0.4758, 'grad_norm': 0.5657721757888794, 'learning_rate': 3.9578713968957875e-05, 'epoch': 0.08}
8%|▊ | 358/4506 [24:38<4:41:25, 4.07s/it]
8%|▊ | 359/4506 [24:42<4:42:34, 4.09s/it]
{'loss': 0.4617, 'grad_norm': 0.5227282643318176, 'learning_rate': 3.9689578713968964e-05, 'epoch': 0.08}
8%|▊ | 359/4506 [24:42<4:42:34, 4.09s/it]
8%|▊ | 360/4506 [24:46<4:36:42, 4.00s/it]
{'loss': 0.4579, 'grad_norm': 0.5598239302635193, 'learning_rate': 3.9800443458980046e-05, 'epoch': 0.08}
8%|▊ | 360/4506 [24:46<4:36:42, 4.00s/it]
8%|▊ | 361/4506 [24:50<4:40:09, 4.06s/it]
{'loss': 0.4511, 'grad_norm': 0.5228368043899536, 'learning_rate': 3.9911308203991135e-05, 'epoch': 0.08}
8%|▊ | 361/4506 [24:50<4:40:09, 4.06s/it]
8%|▊ | 362/4506 [24:54<4:35:30, 3.99s/it]
{'loss': 0.4545, 'grad_norm': 0.6421412229537964, 'learning_rate': 4.002217294900222e-05, 'epoch': 0.08}
8%|▊ | 362/4506 [24:54<4:35:30, 3.99s/it]
8%|▊ | 363/4506 [24:58<4:38:24, 4.03s/it]
{'loss': 0.4626, 'grad_norm': 0.5603579878807068, 'learning_rate': 4.0133037694013306e-05, 'epoch': 0.08}
8%|▊ | 363/4506 [24:58<4:38:24, 4.03s/it]
8%|▊ | 364/4506 [25:02<4:44:30, 4.12s/it]
{'loss': 0.4818, 'grad_norm': 0.6214726567268372, 'learning_rate': 4.0243902439024395e-05, 'epoch': 0.08}
8%|▊ | 364/4506 [25:02<4:44:30, 4.12s/it]
8%|▊ | 365/4506 [25:06<4:45:22, 4.13s/it]
{'loss': 0.4644, 'grad_norm': 0.5321371555328369, 'learning_rate': 4.035476718403548e-05, 'epoch': 0.08}
8%|▊ | 365/4506 [25:06<4:45:22, 4.13s/it]
8%|▊ | 366/4506 [25:10<4:39:55, 4.06s/it]
{'loss': 0.4512, 'grad_norm': 0.593797504901886, 'learning_rate': 4.0465631929046566e-05, 'epoch': 0.08}
8%|▊ | 366/4506 [25:10<4:39:55, 4.06s/it]
8%|▊ | 367/4506 [25:14<4:40:30, 4.07s/it]
{'loss': 0.4592, 'grad_norm': 0.6430811285972595, 'learning_rate': 4.057649667405765e-05, 'epoch': 0.08}
8%|▊ | 367/4506 [25:14<4:40:30, 4.07s/it]
8%|▊ | 368/4506 [25:19<4:46:55, 4.16s/it]
{'loss': 0.463, 'grad_norm': 0.5156310796737671, 'learning_rate': 4.068736141906874e-05, 'epoch': 0.08}
8%|▊ | 368/4506 [25:19<4:46:55, 4.16s/it]
8%|▊ | 369/4506 [25:23<4:47:39, 4.17s/it]
{'loss': 0.4741, 'grad_norm': 0.5599780082702637, 'learning_rate': 4.0798226164079826e-05, 'epoch': 0.08}
8%|▊ | 369/4506 [25:23<4:47:39, 4.17s/it]
8%|▊ | 370/4506 [25:27<4:55:01, 4.28s/it]
{'loss': 0.4612, 'grad_norm': 0.5433686375617981, 'learning_rate': 4.0909090909090915e-05, 'epoch': 0.08}
8%|▊ | 370/4506 [25:27<4:55:01, 4.28s/it]
8%|▊ | 371/4506 [25:32<4:58:35, 4.33s/it]
{'loss': 0.4721, 'grad_norm': 0.6916232109069824, 'learning_rate': 4.1019955654102e-05, 'epoch': 0.08}
8%|▊ | 371/4506 [25:32<4:58:35, 4.33s/it]
8%|▊ | 372/4506 [25:36<4:59:38, 4.35s/it]
{'loss': 0.4656, 'grad_norm': 0.6830759048461914, 'learning_rate': 4.1130820399113086e-05, 'epoch': 0.08}
8%|▊ | 372/4506 [25:36<4:59:38, 4.35s/it]
8%|▊ | 373/4506 [25:41<5:06:24, 4.45s/it]
{'loss': 0.4613, 'grad_norm': 0.43557506799697876, 'learning_rate': 4.124168514412417e-05, 'epoch': 0.08}
8%|▊ | 373/4506 [25:41<5:06:24, 4.45s/it]
8%|▊ | 374/4506 [25:45<4:57:37, 4.32s/it]
{'loss': 0.4638, 'grad_norm': 0.6481852531433105, 'learning_rate': 4.135254988913526e-05, 'epoch': 0.08}
8%|▊ | 374/4506 [25:45<4:57:37, 4.32s/it]
8%|▊ | 375/4506 [25:49<4:54:13, 4.27s/it]
{'loss': 0.4546, 'grad_norm': 0.6400026679039001, 'learning_rate': 4.146341463414634e-05, 'epoch': 0.08}
8%|▊ | 375/4506 [25:49<4:54:13, 4.27s/it]
8%|▊ | 376/4506 [25:53<4:49:53, 4.21s/it]
{'loss': 0.474, 'grad_norm': 0.5715425610542297, 'learning_rate': 4.1574279379157435e-05, 'epoch': 0.08}
8%|▊ | 376/4506 [25:53<4:49:53, 4.21s/it]
8%|▊ | 377/4506 [25:57<4:44:22, 4.13s/it]
{'loss': 0.4597, 'grad_norm': 0.6540743112564087, 'learning_rate': 4.168514412416852e-05, 'epoch': 0.08}
8%|▊ | 377/4506 [25:57<4:44:22, 4.13s/it]
8%|▊ | 378/4506 [26:01<4:38:21, 4.05s/it]
{'loss': 0.4636, 'grad_norm': 0.5579650402069092, 'learning_rate': 4.1796008869179606e-05, 'epoch': 0.08}
8%|▊ | 378/4506 [26:01<4:38:21, 4.05s/it]
8%|▊ | 379/4506 [26:05<4:34:38, 3.99s/it]
{'loss': 0.4597, 'grad_norm': 0.5883092284202576, 'learning_rate': 4.190687361419069e-05, 'epoch': 0.08}
8%|▊ | 379/4506 [26:05<4:34:38, 3.99s/it]
8%|▊ | 380/4506 [26:09<4:36:18, 4.02s/it]
{'loss': 0.4671, 'grad_norm': 0.5526081323623657, 'learning_rate': 4.201773835920178e-05, 'epoch': 0.08}
8%|▊ | 380/4506 [26:09<4:36:18, 4.02s/it]
8%|▊ | 381/4506 [26:13<4:37:04, 4.03s/it]
{'loss': 0.4626, 'grad_norm': 0.5996270179748535, 'learning_rate': 4.212860310421286e-05, 'epoch': 0.08}
8%|▊ | 381/4506 [26:13<4:37:04, 4.03s/it]
8%|▊ | 382/4506 [26:17<4:37:45, 4.04s/it]
{'loss': 0.4624, 'grad_norm': 0.5177364945411682, 'learning_rate': 4.2239467849223955e-05, 'epoch': 0.08}
8%|▊ | 382/4506 [26:17<4:37:45, 4.04s/it]
8%|▊ | 383/4506 [26:21<4:42:29, 4.11s/it]
{'loss': 0.4719, 'grad_norm': 0.5866743922233582, 'learning_rate': 4.235033259423504e-05, 'epoch': 0.09}
8%|▊ | 383/4506 [26:21<4:42:29, 4.11s/it]
9%|▊ | 384/4506 [26:25<4:40:04, 4.08s/it]
{'loss': 0.4622, 'grad_norm': 0.5588046312332153, 'learning_rate': 4.2461197339246126e-05, 'epoch': 0.09}
9%|▊ | 384/4506 [26:25<4:40:04, 4.08s/it]
9%|▊ | 385/4506 [26:30<4:53:55, 4.28s/it]
{'loss': 0.4599, 'grad_norm': 0.7789252400398254, 'learning_rate': 4.257206208425721e-05, 'epoch': 0.09}
9%|▊ | 385/4506 [26:30<4:53:55, 4.28s/it]
9%|▊ | 386/4506 [26:34<4:51:54, 4.25s/it]
{'loss': 0.4652, 'grad_norm': 0.4761052131652832, 'learning_rate': 4.26829268292683e-05, 'epoch': 0.09}
9%|▊ | 386/4506 [26:34<4:51:54, 4.25s/it]
9%|▊ | 387/4506 [26:38<4:51:39, 4.25s/it]
{'loss': 0.4754, 'grad_norm': 0.7659537196159363, 'learning_rate': 4.279379157427938e-05, 'epoch': 0.09}
9%|▊ | 387/4506 [26:38<4:51:39, 4.25s/it]
9%|▊ | 388/4506 [26:43<4:53:14, 4.27s/it]
{'loss': 0.4748, 'grad_norm': 0.5786538124084473, 'learning_rate': 4.290465631929047e-05, 'epoch': 0.09}
9%|▊ | 388/4506 [26:43<4:53:14, 4.27s/it]
9%|▊ | 389/4506 [26:46<4:41:57, 4.11s/it]
{'loss': 0.47, 'grad_norm': 0.6295309066772461, 'learning_rate': 4.301552106430156e-05, 'epoch': 0.09}
9%|▊ | 389/4506 [26:47<4:41:57, 4.11s/it]
9%|▊ | 390/4506 [26:51<4:46:53, 4.18s/it]
{'loss': 0.4578, 'grad_norm': 0.5258036255836487, 'learning_rate': 4.312638580931264e-05, 'epoch': 0.09}
9%|▊ | 390/4506 [26:51<4:46:53, 4.18s/it]
9%|▊ | 391/4506 [26:55<4:48:39, 4.21s/it]
{'loss': 0.4556, 'grad_norm': 0.5534862279891968, 'learning_rate': 4.323725055432373e-05, 'epoch': 0.09}
9%|▊ | 391/4506 [26:55<4:48:39, 4.21s/it]
9%|▊ | 392/4506 [26:59<4:40:01, 4.08s/it]
{'loss': 0.4556, 'grad_norm': 0.5713557004928589, 'learning_rate': 4.334811529933481e-05, 'epoch': 0.09}
9%|▊ | 392/4506 [26:59<4:40:01, 4.08s/it]
9%|▊ | 393/4506 [27:03<4:45:45, 4.17s/it]
{'loss': 0.4497, 'grad_norm': 0.7118195295333862, 'learning_rate': 4.34589800443459e-05, 'epoch': 0.09}
9%|▊ | 393/4506 [27:03<4:45:45, 4.17s/it]
9%|▊ | 394/4506 [27:07<4:43:56, 4.14s/it]
{'loss': 0.46, 'grad_norm': 0.5879712104797363, 'learning_rate': 4.356984478935698e-05, 'epoch': 0.09}
9%|▊ | 394/4506 [27:07<4:43:56, 4.14s/it]
9%|▉ | 395/4506 [27:12<4:44:16, 4.15s/it]
{'loss': 0.4557, 'grad_norm': 0.8116681575775146, 'learning_rate': 4.3680709534368077e-05, 'epoch': 0.09}
9%|▉ | 395/4506 [27:12<4:44:16, 4.15s/it]
9%|▉ | 396/4506 [27:16<4:50:32, 4.24s/it]
{'loss': 0.4606, 'grad_norm': 0.5611497759819031, 'learning_rate': 4.379157427937916e-05, 'epoch': 0.09}
9%|▉ | 396/4506 [27:16<4:50:32, 4.24s/it]
9%|▉ | 397/4506 [27:20<4:51:35, 4.26s/it]
{'loss': 0.4606, 'grad_norm': 0.6730551719665527, 'learning_rate': 4.390243902439025e-05, 'epoch': 0.09}
9%|▉ | 397/4506 [27:20<4:51:35, 4.26s/it]
9%|▉ | 398/4506 [27:24<4:50:02, 4.24s/it]
{'loss': 0.4513, 'grad_norm': 0.6595414876937866, 'learning_rate': 4.401330376940133e-05, 'epoch': 0.09}
9%|▉ | 398/4506 [27:24<4:50:02, 4.24s/it]
9%|▉ | 399/4506 [27:29<4:49:23, 4.23s/it]
{'loss': 0.4606, 'grad_norm': 0.5522336363792419, 'learning_rate': 4.412416851441242e-05, 'epoch': 0.09}
9%|▉ | 399/4506 [27:29<4:49:23, 4.23s/it]
9%|▉ | 400/4506 [27:32<4:40:59, 4.11s/it]
{'loss': 0.478, 'grad_norm': 0.6960667371749878, 'learning_rate': 4.42350332594235e-05, 'epoch': 0.09}
9%|▉ | 400/4506 [27:33<4:40:59, 4.11s/it]
9%|▉ | 401/4506 [27:37<4:39:58, 4.09s/it]
{'loss': 0.4523, 'grad_norm': 0.49169501662254333, 'learning_rate': 4.4345898004434597e-05, 'epoch': 0.09}
9%|▉ | 401/4506 [27:37<4:39:58, 4.09s/it]
9%|▉ | 402/4506 [27:41<4:45:11, 4.17s/it]
{'loss': 0.467, 'grad_norm': 0.7127147912979126, 'learning_rate': 4.445676274944568e-05, 'epoch': 0.09}
9%|▉ | 402/4506 [27:41<4:45:11, 4.17s/it]
9%|▉ | 403/4506 [27:45<4:38:03, 4.07s/it]
{'loss': 0.438, 'grad_norm': 0.567373514175415, 'learning_rate': 4.456762749445677e-05, 'epoch': 0.09}
9%|▉ | 403/4506 [27:45<4:38:03, 4.07s/it]
9%|▉ | 404/4506 [27:49<4:37:00, 4.05s/it]
{'loss': 0.4613, 'grad_norm': 0.603082001209259, 'learning_rate': 4.467849223946785e-05, 'epoch': 0.09}
9%|▉ | 404/4506 [27:49<4:37:00, 4.05s/it]
9%|▉ | 405/4506 [27:53<4:33:28, 4.00s/it]
{'loss': 0.4711, 'grad_norm': 0.6929937601089478, 'learning_rate': 4.478935698447894e-05, 'epoch': 0.09}
9%|▉ | 405/4506 [27:53<4:33:28, 4.00s/it]
9%|▉ | 406/4506 [27:57<4:34:51, 4.02s/it]
{'loss': 0.4649, 'grad_norm': 0.5277724862098694, 'learning_rate': 4.490022172949002e-05, 'epoch': 0.09}
9%|▉ | 406/4506 [27:57<4:34:51, 4.02s/it]
9%|▉ | 407/4506 [28:01<4:39:09, 4.09s/it]
{'loss': 0.4568, 'grad_norm': 0.7047582864761353, 'learning_rate': 4.501108647450111e-05, 'epoch': 0.09}
9%|▉ | 407/4506 [28:01<4:39:09, 4.09s/it]
9%|▉ | 408/4506 [28:05<4:33:41, 4.01s/it]
{'loss': 0.446, 'grad_norm': 0.5928276777267456, 'learning_rate': 4.51219512195122e-05, 'epoch': 0.09}
9%|▉ | 408/4506 [28:05<4:33:41, 4.01s/it]
9%|▉ | 409/4506 [28:09<4:38:03, 4.07s/it]
{'loss': 0.4452, 'grad_norm': 0.5233811140060425, 'learning_rate': 4.523281596452328e-05, 'epoch': 0.09}
9%|▉ | 409/4506 [28:09<4:38:03, 4.07s/it]
9%|▉ | 410/4506 [28:13<4:36:03, 4.04s/it]
{'loss': 0.4656, 'grad_norm': 0.7053755521774292, 'learning_rate': 4.534368070953437e-05, 'epoch': 0.09}
9%|▉ | 410/4506 [28:13<4:36:03, 4.04s/it]
9%|▉ | 411/4506 [28:17<4:32:46, 4.00s/it]
{'loss': 0.4675, 'grad_norm': 0.5023159384727478, 'learning_rate': 4.545454545454546e-05, 'epoch': 0.09}
9%|▉ | 411/4506 [28:17<4:32:46, 4.00s/it]
9%|▉ | 412/4506 [28:21<4:35:21, 4.04s/it]
{'loss': 0.4594, 'grad_norm': 0.5829842686653137, 'learning_rate': 4.556541019955654e-05, 'epoch': 0.09}
9%|▉ | 412/4506 [28:21<4:35:21, 4.04s/it]
9%|▉ | 413/4506 [28:25<4:36:33, 4.05s/it]
{'loss': 0.4679, 'grad_norm': 0.5290495753288269, 'learning_rate': 4.567627494456763e-05, 'epoch': 0.09}
9%|▉ | 413/4506 [28:25<4:36:33, 4.05s/it]
9%|▉ | 414/4506 [28:29<4:38:22, 4.08s/it]
{'loss': 0.4549, 'grad_norm': 0.5022586584091187, 'learning_rate': 4.578713968957872e-05, 'epoch': 0.09}
9%|▉ | 414/4506 [28:29<4:38:22, 4.08s/it]
9%|▉ | 415/4506 [28:33<4:32:30, 4.00s/it]
{'loss': 0.4524, 'grad_norm': 0.5855683088302612, 'learning_rate': 4.58980044345898e-05, 'epoch': 0.09}
9%|▉ | 415/4506 [28:33<4:32:30, 4.00s/it]
9%|▉ | 416/4506 [28:37<4:40:57, 4.12s/it]
{'loss': 0.4755, 'grad_norm': 0.6429371237754822, 'learning_rate': 4.600886917960089e-05, 'epoch': 0.09}
9%|▉ | 416/4506 [28:37<4:40:57, 4.12s/it]
9%|▉ | 417/4506 [28:41<4:31:49, 3.99s/it]
{'loss': 0.4578, 'grad_norm': 0.5348767042160034, 'learning_rate': 4.611973392461197e-05, 'epoch': 0.09}
9%|▉ | 417/4506 [28:41<4:31:49, 3.99s/it]
9%|▉ | 418/4506 [28:45<4:29:10, 3.95s/it]
{'loss': 0.4684, 'grad_norm': 0.6719628572463989, 'learning_rate': 4.623059866962306e-05, 'epoch': 0.09}
9%|▉ | 418/4506 [28:45<4:29:10, 3.95s/it]
9%|▉ | 419/4506 [28:49<4:33:12, 4.01s/it]
{'loss': 0.4456, 'grad_norm': 0.574863612651825, 'learning_rate': 4.634146341463415e-05, 'epoch': 0.09}
9%|▉ | 419/4506 [28:49<4:33:12, 4.01s/it]
9%|▉ | 420/4506 [28:53<4:28:20, 3.94s/it]
{'loss': 0.4553, 'grad_norm': 0.5323691368103027, 'learning_rate': 4.645232815964524e-05, 'epoch': 0.09}
9%|▉ | 420/4506 [28:53<4:28:20, 3.94s/it]
9%|▉ | 421/4506 [28:57<4:39:08, 4.10s/it]
{'loss': 0.4557, 'grad_norm': 0.626629114151001, 'learning_rate': 4.656319290465632e-05, 'epoch': 0.09}
9%|▉ | 421/4506 [28:57<4:39:08, 4.10s/it]
9%|▉ | 422/4506 [29:02<4:45:54, 4.20s/it]
{'loss': 0.4705, 'grad_norm': 0.5961509943008423, 'learning_rate': 4.667405764966741e-05, 'epoch': 0.09}
9%|▉ | 422/4506 [29:02<4:45:54, 4.20s/it]
9%|▉ | 423/4506 [29:06<4:48:39, 4.24s/it]
{'loss': 0.4605, 'grad_norm': 0.6836445331573486, 'learning_rate': 4.678492239467849e-05, 'epoch': 0.09}
9%|▉ | 423/4506 [29:06<4:48:39, 4.24s/it]
9%|▉ | 424/4506 [29:10<4:42:30, 4.15s/it]
{'loss': 0.4612, 'grad_norm': 0.6358264684677124, 'learning_rate': 4.689578713968958e-05, 'epoch': 0.09}
9%|▉ | 424/4506 [29:10<4:42:30, 4.15s/it]
9%|▉ | 425/4506 [29:14<4:44:23, 4.18s/it]
{'loss': 0.4482, 'grad_norm': 0.5614078044891357, 'learning_rate': 4.700665188470067e-05, 'epoch': 0.09}
9%|▉ | 425/4506 [29:14<4:44:23, 4.18s/it]
9%|▉ | 426/4506 [29:19<4:54:39, 4.33s/it]
{'loss': 0.4482, 'grad_norm': 0.5856594443321228, 'learning_rate': 4.711751662971176e-05, 'epoch': 0.09}
9%|▉ | 426/4506 [29:19<4:54:39, 4.33s/it]
9%|▉ | 427/4506 [29:23<4:51:49, 4.29s/it]
{'loss': 0.4462, 'grad_norm': 0.5428962111473083, 'learning_rate': 4.722838137472284e-05, 'epoch': 0.09}
9%|▉ | 427/4506 [29:23<4:51:49, 4.29s/it]
9%|▉ | 428/4506 [29:27<4:46:45, 4.22s/it]
{'loss': 0.4669, 'grad_norm': 0.5739588141441345, 'learning_rate': 4.733924611973393e-05, 'epoch': 0.1}
9%|▉ | 428/4506 [29:27<4:46:45, 4.22s/it]
10%|▉ | 429/4506 [29:31<4:45:04, 4.20s/it]
{'loss': 0.4416, 'grad_norm': 0.7170345783233643, 'learning_rate': 4.745011086474501e-05, 'epoch': 0.1}
10%|▉ | 429/4506 [29:31<4:45:04, 4.20s/it]
10%|▉ | 430/4506 [29:35<4:42:20, 4.16s/it]
{'loss': 0.4722, 'grad_norm': 0.7232305407524109, 'learning_rate': 4.75609756097561e-05, 'epoch': 0.1}
10%|▉ | 430/4506 [29:35<4:42:20, 4.16s/it]
10%|▉ | 431/4506 [29:40<4:44:11, 4.18s/it]
{'loss': 0.4599, 'grad_norm': 0.6060721278190613, 'learning_rate': 4.767184035476719e-05, 'epoch': 0.1}
10%|▉ | 431/4506 [29:40<4:44:11, 4.18s/it]
10%|▉ | 432/4506 [29:44<4:37:33, 4.09s/it]
{'loss': 0.4599, 'grad_norm': 0.5401850342750549, 'learning_rate': 4.778270509977827e-05, 'epoch': 0.1}
10%|▉ | 432/4506 [29:44<4:37:33, 4.09s/it]
10%|▉ | 433/4506 [29:48<4:38:30, 4.10s/it]
{'loss': 0.4558, 'grad_norm': 0.643409788608551, 'learning_rate': 4.789356984478936e-05, 'epoch': 0.1}
10%|▉ | 433/4506 [29:48<4:38:30, 4.10s/it]
10%|▉ | 434/4506 [29:52<4:33:54, 4.04s/it]
{'loss': 0.4576, 'grad_norm': 0.7528185844421387, 'learning_rate': 4.800443458980044e-05, 'epoch': 0.1}
10%|▉ | 434/4506 [29:52<4:33:54, 4.04s/it]
10%|▉ | 435/4506 [29:56<4:32:01, 4.01s/it]
{'loss': 0.4614, 'grad_norm': 0.5888720154762268, 'learning_rate': 4.811529933481153e-05, 'epoch': 0.1}
10%|▉ | 435/4506 [29:56<4:32:01, 4.01s/it]
10%|▉ | 436/4506 [30:00<4:37:05, 4.08s/it]
{'loss': 0.4453, 'grad_norm': 0.644278883934021, 'learning_rate': 4.8226164079822614e-05, 'epoch': 0.1}
10%|▉ | 436/4506 [30:00<4:37:05, 4.08s/it]
10%|▉ | 437/4506 [30:04<4:34:15, 4.04s/it]
{'loss': 0.4497, 'grad_norm': 0.5477158427238464, 'learning_rate': 4.833702882483371e-05, 'epoch': 0.1}
10%|▉ | 437/4506 [30:04<4:34:15, 4.04s/it]
10%|▉ | 438/4506 [30:08<4:29:33, 3.98s/it]
{'loss': 0.4616, 'grad_norm': 0.5223158001899719, 'learning_rate': 4.844789356984479e-05, 'epoch': 0.1}
10%|▉ | 438/4506 [30:08<4:29:33, 3.98s/it]
10%|▉ | 439/4506 [30:11<4:28:03, 3.95s/it]
{'loss': 0.4651, 'grad_norm': 0.5338672399520874, 'learning_rate': 4.855875831485588e-05, 'epoch': 0.1}
10%|▉ | 439/4506 [30:11<4:28:03, 3.95s/it]
10%|▉ | 440/4506 [30:15<4:24:33, 3.90s/it]
{'loss': 0.4602, 'grad_norm': 0.5525108575820923, 'learning_rate': 4.866962305986696e-05, 'epoch': 0.1}
10%|▉ | 440/4506 [30:15<4:24:33, 3.90s/it]
10%|▉ | 441/4506 [30:20<4:33:00, 4.03s/it]
{'loss': 0.4493, 'grad_norm': 0.46514490246772766, 'learning_rate': 4.878048780487805e-05, 'epoch': 0.1}
10%|▉ | 441/4506 [30:20<4:33:00, 4.03s/it]
10%|▉ | 442/4506 [30:24<4:43:31, 4.19s/it]
{'loss': 0.456, 'grad_norm': 0.5159311890602112, 'learning_rate': 4.8891352549889134e-05, 'epoch': 0.1}
10%|▉ | 442/4506 [30:24<4:43:31, 4.19s/it]
10%|▉ | 443/4506 [30:28<4:45:36, 4.22s/it]
{'loss': 0.4597, 'grad_norm': 0.5232892036437988, 'learning_rate': 4.900221729490023e-05, 'epoch': 0.1}
10%|▉ | 443/4506 [30:28<4:45:36, 4.22s/it]
10%|▉ | 444/4506 [30:32<4:41:39, 4.16s/it]
{'loss': 0.4549, 'grad_norm': 1.1367723941802979, 'learning_rate': 4.911308203991131e-05, 'epoch': 0.1}
10%|▉ | 444/4506 [30:32<4:41:39, 4.16s/it]
10%|▉ | 445/4506 [30:36<4:38:15, 4.11s/it]
{'loss': 0.4496, 'grad_norm': 0.47729188203811646, 'learning_rate': 4.92239467849224e-05, 'epoch': 0.1}
10%|▉ | 445/4506 [30:36<4:38:15, 4.11s/it]
10%|▉ | 446/4506 [30:40<4:34:59, 4.06s/it]
{'loss': 0.4428, 'grad_norm': 0.5833290815353394, 'learning_rate': 4.933481152993348e-05, 'epoch': 0.1}
10%|▉ | 446/4506 [30:40<4:34:59, 4.06s/it]
10%|▉ | 447/4506 [30:45<4:38:31, 4.12s/it]
{'loss': 0.4597, 'grad_norm': 0.491079181432724, 'learning_rate': 4.944567627494457e-05, 'epoch': 0.1}
10%|▉ | 447/4506 [30:45<4:38:31, 4.12s/it]
10%|▉ | 448/4506 [30:49<4:34:52, 4.06s/it]
{'loss': 0.4516, 'grad_norm': 0.49322178959846497, 'learning_rate': 4.9556541019955654e-05, 'epoch': 0.1}
10%|▉ | 448/4506 [30:49<4:34:52, 4.06s/it]
10%|▉ | 449/4506 [30:53<4:36:41, 4.09s/it]
{'loss': 0.456, 'grad_norm': 0.5450277924537659, 'learning_rate': 4.966740576496674e-05, 'epoch': 0.1}
10%|▉ | 449/4506 [30:53<4:36:41, 4.09s/it]
10%|▉ | 450/4506 [30:57<4:38:56, 4.13s/it]
{'loss': 0.4484, 'grad_norm': 0.5215233564376831, 'learning_rate': 4.977827050997783e-05, 'epoch': 0.1}
10%|▉ | 450/4506 [30:57<4:38:56, 4.13s/it]
10%|█ | 451/4506 [31:01<4:37:34, 4.11s/it]
{'loss': 0.4655, 'grad_norm': 0.5275602340698242, 'learning_rate': 4.9889135254988913e-05, 'epoch': 0.1}
10%|█ | 451/4506 [31:01<4:37:34, 4.11s/it]
10%|█ | 452/4506 [31:05<4:42:19, 4.18s/it]
{'loss': 0.4561, 'grad_norm': 0.4633048176765442, 'learning_rate': 5e-05, 'epoch': 0.1}
10%|█ | 452/4506 [31:05<4:42:19, 4.18s/it]
10%|█ | 453/4506 [31:09<4:35:43, 4.08s/it]
{'loss': 0.437, 'grad_norm': 0.5585088133811951, 'learning_rate': 4.999999249711967e-05, 'epoch': 0.1}
10%|█ | 453/4506 [31:09<4:35:43, 4.08s/it]
10%|█ | 454/4506 [31:14<4:44:29, 4.21s/it]
{'loss': 0.4563, 'grad_norm': 0.550626277923584, 'learning_rate': 4.9999969988483185e-05, 'epoch': 0.1}
10%|█ | 454/4506 [31:14<4:44:29, 4.21s/it]
10%|█ | 455/4506 [31:18<4:44:20, 4.21s/it]
{'loss': 0.4534, 'grad_norm': 0.5013534426689148, 'learning_rate': 4.999993247410405e-05, 'epoch': 0.1}
10%|█ | 455/4506 [31:18<4:44:20, 4.21s/it]
10%|█ | 456/4506 [31:22<4:43:34, 4.20s/it]
{'loss': 0.4443, 'grad_norm': 0.7509421706199646, 'learning_rate': 4.9999879954004784e-05, 'epoch': 0.1}
10%|█ | 456/4506 [31:22<4:43:34, 4.20s/it]
10%|█ | 457/4506 [31:26<4:46:54, 4.25s/it]
{'loss': 0.4564, 'grad_norm': 0.5595318675041199, 'learning_rate': 4.999981242821692e-05, 'epoch': 0.1}
10%|█ | 457/4506 [31:27<4:46:54, 4.25s/it]
10%|█ | 458/4506 [31:30<4:40:37, 4.16s/it]
{'loss': 0.4756, 'grad_norm': 2.0854523181915283, 'learning_rate': 4.9999729896780975e-05, 'epoch': 0.1}
10%|█ | 458/4506 [31:30<4:40:37, 4.16s/it]
10%|█ | 459/4506 [31:35<4:43:13, 4.20s/it]
{'loss': 0.5182, 'grad_norm': 3.7414886951446533, 'learning_rate': 4.9999632359746496e-05, 'epoch': 0.1}
10%|█ | 459/4506 [31:35<4:43:13, 4.20s/it]
10%|█ | 460/4506 [31:39<4:46:06, 4.24s/it]
{'loss': 0.4506, 'grad_norm': 0.7605077624320984, 'learning_rate': 4.999951981717203e-05, 'epoch': 0.1}
10%|█ | 460/4506 [31:39<4:46:06, 4.24s/it]
10%|█ | 461/4506 [31:43<4:39:28, 4.15s/it]
{'loss': 0.4578, 'grad_norm': 1.907209038734436, 'learning_rate': 4.9999392269125114e-05, 'epoch': 0.1}
10%|█ | 461/4506 [31:43<4:39:28, 4.15s/it]
10%|█ | 462/4506 [31:47<4:37:07, 4.11s/it]
{'loss': 0.4553, 'grad_norm': 0.6861857771873474, 'learning_rate': 4.9999249715682316e-05, 'epoch': 0.1}
10%|█ | 462/4506 [31:47<4:37:07, 4.11s/it]
10%|█ | 463/4506 [31:52<4:44:30, 4.22s/it]
{'loss': 0.4741, 'grad_norm': 0.7866166234016418, 'learning_rate': 4.99990921569292e-05, 'epoch': 0.1}
10%|█ | 463/4506 [31:52<4:44:30, 4.22s/it]
10%|█ | 464/4506 [31:56<4:44:51, 4.23s/it]
{'loss': 0.4541, 'grad_norm': 0.5293951034545898, 'learning_rate': 4.999891959296035e-05, 'epoch': 0.1}
10%|█ | 464/4506 [31:56<4:44:51, 4.23s/it]
10%|█ | 465/4506 [32:00<4:41:43, 4.18s/it]
{'loss': 0.4535, 'grad_norm': 0.6158657670021057, 'learning_rate': 4.999873202387933e-05, 'epoch': 0.1}
10%|█ | 465/4506 [32:00<4:41:43, 4.18s/it]
10%|█ | 466/4506 [32:04<4:41:44, 4.18s/it]
{'loss': 0.4646, 'grad_norm': 0.5599068999290466, 'learning_rate': 4.9998529449798725e-05, 'epoch': 0.1}
10%|█ | 466/4506 [32:04<4:41:44, 4.18s/it]
10%|█ | 467/4506 [32:09<4:51:03, 4.32s/it]
{'loss': 0.4499, 'grad_norm': 0.5410224199295044, 'learning_rate': 4.9998311870840125e-05, 'epoch': 0.1}
10%|█ | 467/4506 [32:09<4:51:03, 4.32s/it]
10%|█ | 468/4506 [32:13<4:46:50, 4.26s/it]
{'loss': 0.439, 'grad_norm': 0.5516248345375061, 'learning_rate': 4.9998079287134134e-05, 'epoch': 0.1}
10%|█ | 468/4506 [32:13<4:46:50, 4.26s/it]
10%|█ | 469/4506 [32:17<4:42:46, 4.20s/it]
{'loss': 0.4494, 'grad_norm': 0.58585125207901, 'learning_rate': 4.999783169882035e-05, 'epoch': 0.1}
10%|█ | 469/4506 [32:17<4:42:46, 4.20s/it]
10%|█ | 470/4506 [32:21<4:38:18, 4.14s/it]
{'loss': 0.4383, 'grad_norm': 0.5318825840950012, 'learning_rate': 4.999756910604739e-05, 'epoch': 0.1}
10%|█ | 470/4506 [32:21<4:38:18, 4.14s/it]
10%|█ | 471/4506 [32:25<4:38:43, 4.14s/it]
{'loss': 0.4317, 'grad_norm': 0.4958344101905823, 'learning_rate': 4.999729150897287e-05, 'epoch': 0.1}
10%|█ | 471/4506 [32:25<4:38:43, 4.14s/it]
10%|█ | 472/4506 [32:29<4:42:32, 4.20s/it]
{'loss': 0.4316, 'grad_norm': 0.584888756275177, 'learning_rate': 4.999699890776339e-05, 'epoch': 0.1}
10%|█ | 472/4506 [32:29<4:42:32, 4.20s/it]
10%|█ | 473/4506 [32:33<4:39:08, 4.15s/it]
{'loss': 0.4584, 'grad_norm': 0.6207486987113953, 'learning_rate': 4.999669130259461e-05, 'epoch': 0.1}
10%|█ | 473/4506 [32:33<4:39:08, 4.15s/it]
11%|█ | 474/4506 [32:37<4:35:28, 4.10s/it]
{'loss': 0.4331, 'grad_norm': 0.5452698469161987, 'learning_rate': 4.999636869365115e-05, 'epoch': 0.11}
11%|█ | 474/4506 [32:37<4:35:28, 4.10s/it]
11%|█ | 475/4506 [32:41<4:33:36, 4.07s/it]
{'loss': 0.4304, 'grad_norm': 0.5363308191299438, 'learning_rate': 4.9996031081126646e-05, 'epoch': 0.11}
11%|█ | 475/4506 [32:41<4:33:36, 4.07s/it]
11%|█ | 476/4506 [32:46<4:37:10, 4.13s/it]
{'loss': 0.4505, 'grad_norm': 0.5662781000137329, 'learning_rate': 4.999567846522375e-05, 'epoch': 0.11}
11%|█ | 476/4506 [32:46<4:37:10, 4.13s/it]
11%|█ | 477/4506 [32:50<4:38:44, 4.15s/it]
{'loss': 0.4316, 'grad_norm': 0.5242763161659241, 'learning_rate': 4.999531084615411e-05, 'epoch': 0.11}
11%|█ | 478/4506 [32:54<4:36:28, 4.12s/it]
{'loss': 0.4565, 'grad_norm': 0.5541630983352661, 'learning_rate': 4.999492822413839e-05, 'epoch': 0.11}
11%|█ | 479/4506 [32:58<4:46:48, 4.27s/it]
{'loss': 0.4579, 'grad_norm': 0.5700659155845642, 'learning_rate': 4.999453059940623e-05, 'epoch': 0.11}
11%|█ | 480/4506 [33:02<4:39:00, 4.16s/it]
{'loss': 0.4562, 'grad_norm': 0.5246980786323547, 'learning_rate': 4.999411797219632e-05, 'epoch': 0.11}
11%|█ | 481/4506 [33:06<4:32:37, 4.06s/it]
{'loss': 0.4553, 'grad_norm': 0.6420440673828125, 'learning_rate': 4.9993690342756315e-05, 'epoch': 0.11}
11%|█ | 482/4506 [33:10<4:29:27, 4.02s/it]
{'loss': 0.4416, 'grad_norm': 0.5616031885147095, 'learning_rate': 4.999324771134291e-05, 'epoch': 0.11}
11%|█ | 483/4506 [33:14<4:33:55, 4.09s/it]
{'loss': 0.4625, 'grad_norm': 0.5313547849655151, 'learning_rate': 4.999279007822176e-05, 'epoch': 0.11}
11%|█ | 484/4506 [33:18<4:30:43, 4.04s/it]
{'loss': 0.4398, 'grad_norm': 0.5547383427619934, 'learning_rate': 4.999231744366756e-05, 'epoch': 0.11}
11%|█ | 485/4506 [33:22<4:30:17, 4.03s/it]
{'loss': 0.4409, 'grad_norm': 0.5973032116889954, 'learning_rate': 4.999182980796402e-05, 'epoch': 0.11}
11%|█ | 486/4506 [33:26<4:28:40, 4.01s/it]
{'loss': 0.4549, 'grad_norm': 0.6893261075019836, 'learning_rate': 4.99913271714038e-05, 'epoch': 0.11}
11%|█ | 487/4506 [33:31<4:35:56, 4.12s/it]
{'loss': 0.4525, 'grad_norm': 0.5233190655708313, 'learning_rate': 4.999080953428863e-05, 'epoch': 0.11}
11%|█ | 488/4506 [33:35<4:34:07, 4.09s/it]
{'loss': 0.4333, 'grad_norm': 0.5408821105957031, 'learning_rate': 4.999027689692919e-05, 'epoch': 0.11}
11%|█ | 489/4506 [33:39<4:36:27, 4.13s/it]
{'loss': 0.4447, 'grad_norm': 0.4986744225025177, 'learning_rate': 4.998972925964519e-05, 'epoch': 0.11}
11%|█ | 490/4506 [33:44<4:46:21, 4.28s/it]
{'loss': 0.4563, 'grad_norm': 0.5063315629959106, 'learning_rate': 4.998916662276534e-05, 'epoch': 0.11}
11%|█ | 491/4506 [33:48<4:42:50, 4.23s/it]
{'loss': 0.4506, 'grad_norm': 0.5945531725883484, 'learning_rate': 4.9988588986627357e-05, 'epoch': 0.11}
11%|█ | 492/4506 [33:52<4:37:17, 4.14s/it]
{'loss': 0.4518, 'grad_norm': 0.5567687153816223, 'learning_rate': 4.998799635157795e-05, 'epoch': 0.11}
11%|█ | 493/4506 [33:55<4:30:20, 4.04s/it]
{'loss': 0.4553, 'grad_norm': 0.6393666863441467, 'learning_rate': 4.998738871797283e-05, 'epoch': 0.11}
11%|█ | 494/4506 [34:00<4:37:01, 4.14s/it]
{'loss': 0.4474, 'grad_norm': 0.5485759973526001, 'learning_rate': 4.9986766086176726e-05, 'epoch': 0.11}
11%|█ | 495/4506 [34:04<4:35:08, 4.12s/it]
{'loss': 0.4626, 'grad_norm': 0.5786741375923157, 'learning_rate': 4.998612845656337e-05, 'epoch': 0.11}
11%|█ | 496/4506 [34:07<4:26:17, 3.98s/it]
{'loss': 0.4408, 'grad_norm': 0.565970778465271, 'learning_rate': 4.998547582951547e-05, 'epoch': 0.11}
11%|█ | 497/4506 [34:12<4:28:25, 4.02s/it]
{'loss': 0.4427, 'grad_norm': 0.5865330696105957, 'learning_rate': 4.998480820542476e-05, 'epoch': 0.11}
11%|█ | 498/4506 [34:16<4:32:46, 4.08s/it]
{'loss': 0.4605, 'grad_norm': 0.6171113848686218, 'learning_rate': 4.998412558469196e-05, 'epoch': 0.11}
11%|█ | 499/4506 [34:20<4:34:16, 4.11s/it]
{'loss': 0.4441, 'grad_norm': 0.531599760055542, 'learning_rate': 4.9983427967726815e-05, 'epoch': 0.11}
11%|█ | 500/4506 [34:24<4:34:26, 4.11s/it]
{'loss': 0.4513, 'grad_norm': 0.5382117629051208, 'learning_rate': 4.998271535494804e-05, 'epoch': 0.11}
11%|█ | 501/4506 [34:28<4:36:21, 4.14s/it]
{'loss': 0.4257, 'grad_norm': 0.6095223426818848, 'learning_rate': 4.9981987746783374e-05, 'epoch': 0.11}
11%|█ | 502/4506 [34:33<4:49:58, 4.35s/it]
{'loss': 0.4408, 'grad_norm': 0.5668097734451294, 'learning_rate': 4.998124514366956e-05, 'epoch': 0.11}
11%|█ | 503/4506 [34:37<4:37:36, 4.16s/it]
{'loss': 0.4192, 'grad_norm': 0.5142541527748108, 'learning_rate': 4.998048754605231e-05, 'epoch': 0.11}
11%|█ | 504/4506 [34:41<4:33:22, 4.10s/it]
{'loss': 0.4658, 'grad_norm': 0.6339344382286072, 'learning_rate': 4.9979714954386374e-05, 'epoch': 0.11}
11%|█ | 505/4506 [34:45<4:28:22, 4.02s/it]
{'loss': 0.4296, 'grad_norm': 0.6426340937614441, 'learning_rate': 4.997892736913548e-05, 'epoch': 0.11}
11%|█ | 506/4506 [34:49<4:32:49, 4.09s/it]
{'loss': 0.4477, 'grad_norm': 0.6418610215187073, 'learning_rate': 4.9978124790772356e-05, 'epoch': 0.11}
11%|█▏ | 507/4506 [34:53<4:39:32, 4.19s/it]
{'loss': 0.452, 'grad_norm': 0.5652244687080383, 'learning_rate': 4.997730721977874e-05, 'epoch': 0.11}
11%|█▏ | 508/4506 [34:57<4:36:36, 4.15s/it]
{'loss': 0.4346, 'grad_norm': 0.6343207955360413, 'learning_rate': 4.997647465664536e-05, 'epoch': 0.11}
11%|█▏ | 509/4506 [35:02<4:45:22, 4.28s/it]
{'loss': 0.4395, 'grad_norm': 0.6175341010093689, 'learning_rate': 4.9975627101871945e-05, 'epoch': 0.11}
11%|█▏ | 510/4506 [35:06<4:40:07, 4.21s/it]
{'loss': 0.4408, 'grad_norm': 0.5581393241882324, 'learning_rate': 4.997476455596722e-05, 'epoch': 0.11}
11%|█▏ | 511/4506 [35:10<4:35:50, 4.14s/it]
{'loss': 0.4572, 'grad_norm': 0.6580193042755127, 'learning_rate': 4.997388701944893e-05, 'epoch': 0.11}
11%|█▏ | 512/4506 [35:14<4:34:02, 4.12s/it]
{'loss': 0.448, 'grad_norm': 0.6082578897476196, 'learning_rate': 4.997299449284377e-05, 'epoch': 0.11}
11%|█▏ | 513/4506 [35:18<4:29:35, 4.05s/it]
{'loss': 0.4487, 'grad_norm': 0.5900601148605347, 'learning_rate': 4.9972086976687485e-05, 'epoch': 0.11}
11%|█▏ | 514/4506 [35:22<4:24:46, 3.98s/it]
{'loss': 0.4246, 'grad_norm': 0.5502161979675293, 'learning_rate': 4.997116447152478e-05, 'epoch': 0.11}
11%|█▏ | 515/4506 [35:26<4:26:45, 4.01s/it]
{'loss': 0.4416, 'grad_norm': 0.5540690422058105, 'learning_rate': 4.997022697790938e-05, 'epoch': 0.11}
11%|█▏ | 516/4506 [35:30<4:20:42, 3.92s/it]
{'loss': 0.4326, 'grad_norm': 0.581132173538208, 'learning_rate': 4.9969274496403996e-05, 'epoch': 0.11}
11%|█▏ | 517/4506 [35:33<4:19:58, 3.91s/it]
{'loss': 0.4418, 'grad_norm': 0.5767580270767212, 'learning_rate': 4.996830702758033e-05, 'epoch': 0.11}
11%|█▏ | 518/4506 [35:37<4:16:57, 3.87s/it]
{'loss': 0.4435, 'grad_norm': 0.6371181011199951, 'learning_rate': 4.996732457201909e-05, 'epoch': 0.11}
12%|█▏ | 519/4506 [35:41<4:17:26, 3.87s/it]
{'loss': 0.4561, 'grad_norm': 0.6188710331916809, 'learning_rate': 4.996632713030997e-05, 'epoch': 0.12}
12%|█▏ | 520/4506 [35:45<4:23:48, 3.97s/it]
{'loss': 0.4312, 'grad_norm': 0.5682732462882996, 'learning_rate': 4.996531470305168e-05, 'epoch': 0.12}
12%|█▏ | 521/4506 [35:50<4:28:25, 4.04s/it]
{'loss': 0.4386, 'grad_norm': 0.538855254650116, 'learning_rate': 4.996428729085189e-05, 'epoch': 0.12}
12%|█▏ | 522/4506 [35:53<4:21:26, 3.94s/it]
{'loss': 0.4238, 'grad_norm': 0.5303306579589844, 'learning_rate': 4.99632448943273e-05, 'epoch': 0.12}
12%|█▏ | 523/4506 [35:57<4:20:28, 3.92s/it]
{'loss': 0.4178, 'grad_norm': 0.5037055015563965, 'learning_rate': 4.996218751410358e-05, 'epoch': 0.12}
12%|█▏ | 524/4506 [36:01<4:23:54, 3.98s/it]
{'loss': 0.4534, 'grad_norm': 0.6870015859603882, 'learning_rate': 4.996111515081541e-05, 'epoch': 0.12}
12%|█▏ | 525/4506 [36:05<4:29:07, 4.06s/it]
{'loss': 0.4496, 'grad_norm': 0.5975680947303772, 'learning_rate': 4.996002780510644e-05, 'epoch': 0.12}
12%|█▏ | 526/4506 [36:10<4:35:53, 4.16s/it]
{'loss': 0.4249, 'grad_norm': 0.5034270286560059, 'learning_rate': 4.9958925477629345e-05, 'epoch': 0.12}
12%|█▏ | 527/4506 [36:14<4:39:03, 4.21s/it]
{'loss': 0.4472, 'grad_norm': 0.47337606549263, 'learning_rate': 4.995780816904576e-05, 'epoch': 0.12}
12%|█▏ | 528/4506 [36:18<4:40:21, 4.23s/it]
{'loss': 0.4231, 'grad_norm': 0.4643701910972595, 'learning_rate': 4.995667588002634e-05, 'epoch': 0.12}
12%|█▏ | 529/4506 [36:23<4:44:52, 4.30s/it]
{'loss': 0.422, 'grad_norm': 0.5718487501144409, 'learning_rate': 4.995552861125071e-05, 'epoch': 0.12}
12%|█▏ | 530/4506 [36:27<4:36:59, 4.18s/it]
{'loss': 0.4248, 'grad_norm': 0.43830007314682007, 'learning_rate': 4.995436636340751e-05, 'epoch': 0.12}
12%|█▏ | 531/4506 [36:31<4:37:52, 4.19s/it]
{'loss': 0.4364, 'grad_norm': 0.5556172728538513, 'learning_rate': 4.995318913719434e-05, 'epoch': 0.12}
12%|█▏ | 532/4506 [36:35<4:42:57, 4.27s/it]
{'loss': 0.4433, 'grad_norm': 0.5517470240592957, 'learning_rate': 4.995199693331781e-05, 'epoch': 0.12}
12%|█▏ | 533/4506 [36:40<4:44:01, 4.29s/it]
{'loss': 0.436, 'grad_norm': 0.47722265124320984, 'learning_rate': 4.995078975249353e-05, 'epoch': 0.12}
12%|█▏ | 534/4506 [36:44<4:39:42, 4.23s/it]
{'loss': 0.4401, 'grad_norm': 0.472236305475235, 'learning_rate': 4.9949567595446066e-05, 'epoch': 0.12}
12%|█▏ | 535/4506 [36:49<4:51:00, 4.40s/it]
{'loss': 0.4273, 'grad_norm': 0.5037781000137329, 'learning_rate': 4.9948330462909014e-05, 'epoch': 0.12}
12%|█▏ | 536/4506 [36:53<4:39:33, 4.22s/it]
{'loss': 0.4159, 'grad_norm': 0.505709171295166, 'learning_rate': 4.994707835562493e-05, 'epoch': 0.12}
12%|█▏ | 537/4506 [36:57<4:38:29, 4.21s/it]
{'loss': 0.4312, 'grad_norm': 0.5447461009025574, 'learning_rate': 4.994581127434536e-05, 'epoch': 0.12}
12%|█▏ | 538/4506 [37:00<4:28:17, 4.06s/it]
{'loss': 0.4368, 'grad_norm': 0.5553063750267029, 'learning_rate': 4.994452921983086e-05, 'epoch': 0.12}
12%|█▏ | 539/4506 [37:05<4:29:22, 4.07s/it]
{'loss': 0.418, 'grad_norm': 0.5296281576156616, 'learning_rate': 4.9943232192850944e-05, 'epoch': 0.12}
12%|█▏ | 540/4506 [37:09<4:28:45, 4.07s/it]
{'loss': 0.4241, 'grad_norm': 0.5289497375488281, 'learning_rate': 4.994192019418413e-05, 'epoch': 0.12}
12%|█▏ | 541/4506 [37:13<4:30:36, 4.09s/it]
{'loss': 0.4349, 'grad_norm': 0.5554969906806946, 'learning_rate': 4.994059322461793e-05, 'epoch': 0.12}
12%|█▏ | 542/4506 [37:17<4:31:35, 4.11s/it]
{'loss': 0.4284, 'grad_norm': 0.5366103053092957, 'learning_rate': 4.993925128494881e-05, 'epoch': 0.12}
12%|█▏ | 543/4506 [37:22<4:42:46, 4.28s/it]
{'loss': 0.445, 'grad_norm': 0.5923882722854614, 'learning_rate': 4.9937894375982264e-05, 'epoch': 0.12}
12%|█▏ | 544/4506 [37:25<4:33:18, 4.14s/it]
{'loss': 0.4457, 'grad_norm': 0.6029394865036011, 'learning_rate': 4.9936522498532746e-05, 'epoch': 0.12}
12%|█▏ | 545/4506 [37:30<4:36:46, 4.19s/it]
{'loss': 0.4345, 'grad_norm': 0.5009217858314514, 'learning_rate': 4.993513565342369e-05, 'epoch': 0.12}
12%|█▏ | 546/4506 [37:34<4:42:23, 4.28s/it]
{'loss': 0.4301, 'grad_norm': 0.47245654463768005, 'learning_rate': 4.9933733841487534e-05, 'epoch': 0.12}
12%|█▏ | 547/4506 [37:39<4:48:10, 4.37s/it]
{'loss': 0.4276, 'grad_norm': 0.49395090341567993, 'learning_rate': 4.993231706356568e-05, 'epoch': 0.12}
12%|█▏ | 548/4506 [37:43<4:43:42, 4.30s/it]
{'loss': 0.4092, 'grad_norm': 0.5548378825187683, 'learning_rate': 4.9930885320508525e-05, 'epoch': 0.12}
12%|█▏ | 549/4506 [37:47<4:41:23, 4.27s/it]
{'loss': 0.4215, 'grad_norm': 0.6328645348548889, 'learning_rate': 4.992943861317544e-05, 'epoch': 0.12}
12%|█▏ | 550/4506 [37:51<4:35:11, 4.17s/it]
{'loss': 0.4263, 'grad_norm': 0.6018776297569275, 'learning_rate': 4.9927976942434785e-05, 'epoch': 0.12}
12%|█▏ | 551/4506 [37:55<4:24:48, 4.02s/it]
{'loss': 0.4336, 'grad_norm': 0.5549276471138, 'learning_rate': 4.99265003091639e-05, 'epoch': 0.12}
12%|█▏ | 552/4506 [37:59<4:27:33, 4.06s/it]
{'loss': 0.437, 'grad_norm': 0.638979971408844, 'learning_rate': 4.99250087142491e-05, 'epoch': 0.12}
12%|█▏ | 553/4506 [38:03<4:24:43, 4.02s/it]
{'loss': 0.4427, 'grad_norm': 1.465622067451477, 'learning_rate': 4.99235021585857e-05, 'epoch': 0.12}
12%|█▏ | 554/4506 [38:07<4:22:47, 3.99s/it]
{'loss': 0.4295, 'grad_norm': 0.6905245780944824, 'learning_rate': 4.992198064307797e-05, 'epoch': 0.12}
12%|█▏ | 555/4506 [38:11<4:24:38, 4.02s/it]
{'loss': 0.4365, 'grad_norm': 0.7170051336288452, 'learning_rate': 4.992044416863917e-05, 'epoch': 0.12}
12%|█▏ | 556/4506 [38:15<4:22:33, 3.99s/it]
{'loss': 0.4459, 'grad_norm': 0.5665454864501953, 'learning_rate': 4.9918892736191536e-05, 'epoch': 0.12}
12%|█▏ | 557/4506 [38:19<4:28:55, 4.09s/it]
{'loss': 0.4316, 'grad_norm': 0.48568466305732727, 'learning_rate': 4.9917326346666294e-05, 'epoch': 0.12}
12%|█▏ | 558/4506 [38:23<4:28:30, 4.08s/it]
{'loss': 0.4276, 'grad_norm': 0.6732491254806519, 'learning_rate': 4.9915745001003636e-05, 'epoch': 0.12}
12%|█▏ | 559/4506 [38:27<4:32:28, 4.14s/it]
{'loss': 0.4401, 'grad_norm': 0.5691626071929932, 'learning_rate': 4.9914148700152726e-05, 'epoch': 0.12}
12%|█▏ | 560/4506 [38:31<4:27:39, 4.07s/it]
{'loss': 0.4401, 'grad_norm': 0.6619053483009338, 'learning_rate': 4.9912537445071715e-05, 'epoch': 0.12}
12%|█▏ | 561/4506 [38:35<4:26:41, 4.06s/it]
{'loss': 0.4336, 'grad_norm': 0.5587584376335144, 'learning_rate': 4.991091123672774e-05, 'epoch': 0.12}
12%|█▏ | 562/4506 [38:40<4:36:30, 4.21s/it]
{'loss': 0.4181, 'grad_norm': 0.5611889958381653, 'learning_rate': 4.990927007609688e-05, 'epoch': 0.12}
12%|█▏ | 563/4506 [38:44<4:36:50, 4.21s/it]
{'loss': 0.4148, 'grad_norm': 0.510606050491333, 'learning_rate': 4.9907613964164226e-05, 'epoch': 0.12}
13%|█▎ | 564/4506 [38:48<4:38:23, 4.24s/it]
{'loss': 0.4445, 'grad_norm': 0.6370822787284851, 'learning_rate': 4.990594290192382e-05, 'epoch': 0.13}
13%|█▎ | 565/4506 [38:53<4:37:48, 4.23s/it]
{'loss': 0.4292, 'grad_norm': 0.5120978951454163, 'learning_rate': 4.990425689037869e-05, 'epoch': 0.13}
13%|█▎ | 566/4506 [38:57<4:37:49, 4.23s/it]
{'loss': 0.4218, 'grad_norm': 0.6295232772827148, 'learning_rate': 4.9902555930540824e-05, 'epoch': 0.13}
13%|█▎ | 567/4506 [39:01<4:43:28, 4.32s/it]
{'loss': 0.4276, 'grad_norm': 0.4928165376186371, 'learning_rate': 4.990084002343119e-05, 'epoch': 0.13}
13%|█▎ | 568/4506 [39:05<4:37:47, 4.23s/it]
{'loss': 0.3995, 'grad_norm': 0.5832208395004272, 'learning_rate': 4.989910917007973e-05, 'epoch': 0.13}
13%|█▎ | 569/4506 [39:09<4:32:53, 4.16s/it]
{'loss': 0.4431, 'grad_norm': 0.588647186756134, 'learning_rate': 4.989736337152536e-05, 'epoch': 0.13}
13%|█▎ | 570/4506 [39:14<4:42:20, 4.30s/it]
{'loss': 0.4509, 'grad_norm': 0.592066764831543, 'learning_rate': 4.989560262881595e-05, 'epoch': 0.13}
13%|█▎ | 571/4506 [39:18<4:36:18, 4.21s/it]
{'loss': 0.4447, 'grad_norm': 0.5233760476112366, 'learning_rate': 4.989382694300837e-05, 'epoch': 0.13}
13%|█▎ | 572/4506 [39:22<4:29:40, 4.11s/it]
{'loss': 0.4204, 'grad_norm': 0.4952084720134735, 'learning_rate': 4.9892036315168425e-05, 'epoch': 0.13}
13%|█▎ | 573/4506 [39:26<4:28:41, 4.10s/it]
{'loss': 0.4223, 'grad_norm': 0.5447104573249817, 'learning_rate': 4.9890230746370895e-05, 'epoch': 0.13}
13%|█▎ | 574/4506 [39:30<4:25:20, 4.05s/it]
{'loss': 0.4225, 'grad_norm': 0.3968985378742218, 'learning_rate': 4.9888410237699565e-05, 'epoch': 0.13}
13%|█▎ | 575/4506 [39:34<4:32:03, 4.15s/it]
{'loss': 0.4214, 'grad_norm': 0.5541154146194458, 'learning_rate': 4.988657479024714e-05, 'epoch': 0.13}
13%|█▎ | 576/4506 [39:38<4:31:48, 4.15s/it]
{'loss': 0.445, 'grad_norm': 0.5268145203590393, 'learning_rate': 4.988472440511531e-05, 'epoch': 0.13}
13%|█▎ | 577/4506 [39:42<4:29:42, 4.12s/it]
{'loss': 0.41, 'grad_norm': 0.5802878141403198, 'learning_rate': 4.9882859083414745e-05, 'epoch': 0.13}
13%|█▎ | 578/4506 [39:46<4:28:17, 4.10s/it]
{'loss': 0.41, 'grad_norm': 0.548310399055481, 'learning_rate': 4.988097882626507e-05, 'epoch': 0.13}
13%|█▎ | 579/4506 [39:51<4:28:07, 4.10s/it]
{'loss': 0.4038, 'grad_norm': 0.458487331867218, 'learning_rate': 4.987908363479485e-05, 'epoch': 0.13}
13%|█▎ | 580/4506 [39:54<4:24:19, 4.04s/it]
{'loss': 0.419, 'grad_norm': 0.5269064903259277, 'learning_rate': 4.987717351014166e-05, 'epoch': 0.13}
13%|█▎ | 581/4506 [39:59<4:26:17, 4.07s/it]
{'loss': 0.4305, 'grad_norm': 0.6245625019073486, 'learning_rate': 4.9875248453452005e-05, 'epoch': 0.13}
13%|█▎ | 582/4506 [40:03<4:33:50, 4.19s/it]
{'loss': 0.415, 'grad_norm': 0.5550088286399841, 'learning_rate': 4.9873308465881366e-05, 'epoch': 0.13}
13%|█▎ | 583/4506 [40:07<4:29:36, 4.12s/it]
{'loss': 0.4184, 'grad_norm': 0.683721125125885, 'learning_rate': 4.9871353548594166e-05, 'epoch': 0.13}
13%|█▎ | 584/4506 [40:11<4:25:31, 4.06s/it]
{'loss': 0.4295, 'grad_norm': 0.6568419933319092, 'learning_rate': 4.986938370276384e-05, 'epoch': 0.13}
13%|█▎ | 585/4506 [40:15<4:26:11, 4.07s/it]
{'loss': 0.3936, 'grad_norm': 0.5536210536956787, 'learning_rate': 4.9867398929572714e-05, 'epoch': 0.13}
13%|█▎ | 586/4506 [40:19<4:24:54, 4.05s/it]
{'loss': 0.4197, 'grad_norm': 0.6341694593429565, 'learning_rate': 4.9865399230212126e-05, 'epoch': 0.13}
13%|█▎ | 587/4506 [40:23<4:30:25, 4.14s/it]
{'loss': 0.4085, 'grad_norm': 0.4903506934642792, 'learning_rate': 4.986338460588236e-05, 'epoch': 0.13}
13%|█▎ | 588/4506 [40:27<4:24:21, 4.05s/it]
{'loss': 0.4228, 'grad_norm': 0.623181164264679, 'learning_rate': 4.9861355057792645e-05, 'epoch': 0.13}
13%|█▎ | 589/4506 [40:31<4:26:54, 4.09s/it]
{'loss': 0.4389, 'grad_norm': 0.713523268699646, 'learning_rate': 4.9859310587161185e-05, 'epoch': 0.13}
13%|█▎ | 590/4506 [40:36<4:27:51, 4.10s/it]
{'loss': 0.4205, 'grad_norm': 0.5094652771949768, 'learning_rate': 4.985725119521513e-05, 'epoch': 0.13}
13%|█▎ | 591/4506 [40:40<4:29:09, 4.12s/it]
{'loss': 0.4059, 'grad_norm': 0.6505284309387207, 'learning_rate': 4.985517688319059e-05, 'epoch': 0.13}
13%|█▎ | 592/4506 [40:44<4:26:45, 4.09s/it]
{'loss': 0.4152, 'grad_norm': 0.6586400866508484, 'learning_rate': 4.985308765233263e-05, 'epoch': 0.13}
13%|█▎ | 593/4506 [40:48<4:24:04, 4.05s/it]
{'loss': 0.418, 'grad_norm': 0.5554739832878113, 'learning_rate': 4.985098350389527e-05, 'epoch': 0.13}
13%|█▎ | 594/4506 [40:52<4:28:41, 4.12s/it]
{'loss': 0.4266, 'grad_norm': 0.5518794059753418, 'learning_rate': 4.984886443914149e-05, 'epoch': 0.13}
13%|█▎ | 595/4506 [40:56<4:25:01, 4.07s/it]
{'loss': 0.4065, 'grad_norm': 0.585888683795929, 'learning_rate': 4.984673045934321e-05, 'epoch': 0.13}
13%|█▎ | 596/4506 [41:00<4:20:28, 4.00s/it]
{'loss': 0.4023, 'grad_norm': 0.6191440224647522, 'learning_rate': 4.984458156578131e-05, 'epoch': 0.13}
13%|█▎ | 597/4506 [41:04<4:22:33, 4.03s/it]
{'loss': 0.3974, 'grad_norm': 0.5325369238853455, 'learning_rate': 4.984241775974562e-05, 'epoch': 0.13}
13%|█▎ | 598/4506 [41:08<4:28:50, 4.13s/it]
{'loss': 0.419, 'grad_norm': 0.5010392665863037, 'learning_rate': 4.984023904253493e-05, 'epoch': 0.13}
13%|█▎ | 599/4506 [41:12<4:29:08, 4.13s/it]
{'loss': 0.4264, 'grad_norm': 0.5388311147689819, 'learning_rate': 4.983804541545696e-05, 'epoch': 0.13}
13%|█▎ | 600/4506 [41:16<4:26:20, 4.09s/it]
{'loss': 0.392, 'grad_norm': 0.591433048248291, 'learning_rate': 4.98358368798284e-05, 'epoch': 0.13}
13%|█▎ | 601/4506 [41:21<4:33:32, 4.20s/it]
{'loss': 0.4299, 'grad_norm': 0.5803235769271851, 'learning_rate': 4.9833613436974884e-05, 'epoch': 0.13}
13%|█▎ | 602/4506 [41:25<4:34:47, 4.22s/it]
{'loss': 0.4143, 'grad_norm': 0.6365665197372437, 'learning_rate': 4.983137508823098e-05, 'epoch': 0.13}
13%|█▎ | 603/4506 [41:29<4:32:29, 4.19s/it]
{'loss': 0.3979, 'grad_norm': 0.5281859040260315, 'learning_rate': 4.982912183494022e-05, 'epoch': 0.13}
13%|█▎ | 604/4506 [41:33<4:32:13, 4.19s/it]
{'loss': 0.418, 'grad_norm': 0.47640854120254517, 'learning_rate': 4.9826853678455075e-05, 'epoch': 0.13}
13%|█▎ | 605/4506 [41:37<4:26:19, 4.10s/it]
{'loss': 0.4073, 'grad_norm': 0.5732517242431641, 'learning_rate': 4.982457062013696e-05, 'epoch': 0.13}
13%|█▎ | 606/4506 [41:42<4:39:19, 4.30s/it]
{'loss': 0.4089, 'grad_norm': 0.47420451045036316, 'learning_rate': 4.982227266135624e-05, 'epoch': 0.13}
13%|█▎ | 607/4506 [41:46<4:31:02, 4.17s/it]
{'loss': 0.3992, 'grad_norm': 0.5526474118232727, 'learning_rate': 4.981995980349221e-05, 'epoch': 0.13}
13%|█▎ | 608/4506 [41:51<4:40:09, 4.31s/it]
{'loss': 0.4125, 'grad_norm': 0.4662491977214813, 'learning_rate': 4.981763204793312e-05, 'epoch': 0.13}
14%|█▎ | 609/4506 [41:55<4:36:01, 4.25s/it]
{'loss': 0.4158, 'grad_norm': 0.5810238122940063, 'learning_rate': 4.981528939607617e-05, 'epoch': 0.14}
14%|█▎ | 610/4506 [41:59<4:35:10, 4.24s/it]
{'loss': 0.4229, 'grad_norm': 0.5863488912582397, 'learning_rate': 4.981293184932748e-05, 'epoch': 0.14}
14%|█▎ | 611/4506 [42:03<4:25:22, 4.09s/it]
{'loss': 0.4087, 'grad_norm': 0.5276097059249878, 'learning_rate': 4.981055940910212e-05, 'epoch': 0.14}
14%|█▎ | 612/4506 [42:07<4:26:23, 4.10s/it]
{'loss': 0.422, 'grad_norm': 0.5822830200195312, 'learning_rate': 4.980817207682412e-05, 'epoch': 0.14}
14%|█▎ | 613/4506 [42:11<4:22:47, 4.05s/it]
{'loss': 0.412, 'grad_norm': 0.5592841506004333, 'learning_rate': 4.980576985392641e-05, 'epoch': 0.14}
14%|█▎ | 614/4506 [42:15<4:18:52, 3.99s/it]
{'loss': 0.4192, 'grad_norm': 0.6463163495063782, 'learning_rate': 4.980335274185087e-05, 'epoch': 0.14}
14%|█▎ | 615/4506 [42:19<4:34:23, 4.23s/it]
{'loss': 0.41, 'grad_norm': 0.5158575177192688, 'learning_rate': 4.980092074204835e-05, 'epoch': 0.14}
14%|█▎ | 616/4506 [42:23<4:25:14, 4.09s/it]
{'loss': 0.4163, 'grad_norm': 0.5175809264183044, 'learning_rate': 4.97984738559786e-05, 'epoch': 0.14}
14%|█▎ | 617/4506 [42:27<4:24:06, 4.07s/it]
{'loss': 0.4122, 'grad_norm': 0.5382671356201172, 'learning_rate': 4.979601208511031e-05, 'epoch': 0.14}
14%|█▎ | 618/4506 [42:31<4:21:08, 4.03s/it]
{'loss': 0.4095, 'grad_norm': 0.5065286159515381, 'learning_rate': 4.979353543092111e-05, 'epoch': 0.14}
14%|█▎ | 619/4506 [42:35<4:26:48, 4.12s/it]
{'loss': 0.3978, 'grad_norm': 0.48070114850997925, 'learning_rate': 4.979104389489757e-05, 'epoch': 0.14}
14%|█▍ | 620/4506 [42:39<4:25:15, 4.10s/it]
{'loss': 0.4024, 'grad_norm': 0.5691619515419006, 'learning_rate': 4.978853747853517e-05, 'epoch': 0.14}
14%|█▍ | 621/4506 [42:43<4:17:35, 3.98s/it]
{'loss': 0.4048, 'grad_norm': 0.5295706391334534, 'learning_rate': 4.9786016183338355e-05, 'epoch': 0.14}
14%|█▍ | 622/4506 [42:47<4:19:29, 4.01s/it]
{'loss': 0.4005, 'grad_norm': 0.5799340605735779, 'learning_rate': 4.978348001082048e-05, 'epoch': 0.14}
14%|█▍ | 623/4506 [42:52<4:25:44, 4.11s/it]
{'loss': 0.4172, 'grad_norm': 0.5518925786018372, 'learning_rate': 4.978092896250383e-05, 'epoch': 0.14}
14%|█▍ | 624/4506 [42:55<4:19:21, 4.01s/it]
{'loss': 0.4204, 'grad_norm': 0.6683689951896667, 'learning_rate': 4.977836303991962e-05, 'epoch': 0.14}
14%|█▍ | 625/4506 [42:59<4:18:52, 4.00s/it]
{'loss': 0.3944, 'grad_norm': 0.5765993595123291, 'learning_rate': 4.9775782244608e-05, 'epoch': 0.14}
14%|█▍ | 626/4506 [43:04<4:27:49, 4.14s/it]
{'loss': 0.4162, 'grad_norm': 0.5606405735015869, 'learning_rate': 4.977318657811803e-05, 'epoch': 0.14}
14%|█▍ | 627/4506 [43:08<4:29:12, 4.16s/it]
{'loss': 0.4209, 'grad_norm': 0.5206716656684875, 'learning_rate': 4.977057604200773e-05, 'epoch': 0.14}
14%|█▍ | 628/4506 [43:12<4:29:38, 4.17s/it]
{'loss': 0.4145, 'grad_norm': 0.49110639095306396, 'learning_rate': 4.9767950637844e-05, 'epoch': 0.14}
14%|█▍ | 629/4506 [43:16<4:31:06, 4.20s/it]
{'loss': 0.3929, 'grad_norm': 0.47708895802497864, 'learning_rate': 4.976531036720269e-05, 'epoch': 0.14}
14%|█▍ | 630/4506 [43:21<4:28:54, 4.16s/it]
{'loss': 0.4208, 'grad_norm': 1.0365325212478638, 'learning_rate': 4.9762655231668594e-05, 'epoch': 0.14}
14%|█▍ | 631/4506 [43:25<4:29:59, 4.18s/it]
{'loss': 0.3923, 'grad_norm': 0.4928903877735138, 'learning_rate': 4.975998523283538e-05, 'epoch': 0.14}
14%|█▍ | 632/4506 [43:29<4:24:53, 4.10s/it]
{'loss': 0.412, 'grad_norm': 0.5961422920227051, 'learning_rate': 4.9757300372305674e-05, 'epoch': 0.14}
14%|█▍ | 633/4506 [43:33<4:26:36, 4.13s/it]
{'loss': 0.4234, 'grad_norm': 0.4902186989784241, 'learning_rate': 4.975460065169101e-05, 'epoch': 0.14}
14%|█▍ | 634/4506 [43:37<4:26:07, 4.12s/it]
{'loss': 0.414, 'grad_norm': 0.5538730025291443, 'learning_rate': 4.9751886072611834e-05, 'epoch': 0.14}
14%|█▍ | 635/4506 [43:41<4:25:06, 4.11s/it]
{'loss': 0.4092, 'grad_norm': 0.5165281891822815, 'learning_rate': 4.9749156636697523e-05, 'epoch': 0.14}
14%|█▍ | 636/4506 [43:45<4:30:30, 4.19s/it]
{'loss': 0.3987, 'grad_norm': 0.48674455285072327, 'learning_rate': 4.974641234558638e-05, 'epoch': 0.14}
14%|█▍ | 637/4506 [43:50<4:29:14, 4.18s/it]
{'loss': 0.4096, 'grad_norm': 0.5631784200668335, 'learning_rate': 4.97436532009256e-05, 'epoch': 0.14}
14%|█▍ | 638/4506 [43:54<4:26:17, 4.13s/it]
{'loss': 0.3947, 'grad_norm': 0.5054290294647217, 'learning_rate': 4.974087920437131e-05, 'epoch': 0.14}
14%|█▍ | 639/4506 [43:57<4:20:39, 4.04s/it]
{'loss': 0.4078, 'grad_norm': 0.5229886770248413, 'learning_rate': 4.973809035758854e-05, 'epoch': 0.14}
14%|█▍ | 640/4506 [44:01<4:17:40, 4.00s/it]
{'loss': 0.4081, 'grad_norm': 0.48206889629364014, 'learning_rate': 4.9735286662251246e-05, 'epoch': 0.14}
14%|█▍ | 641/4506 [44:05<4:15:31, 3.97s/it]
{'loss': 0.4015, 'grad_norm': 0.55205237865448, 'learning_rate': 4.97324681200423e-05, 'epoch': 0.14}
14%|█▍ | 642/4506 [44:09<4:13:22, 3.93s/it]
{'loss': 0.4073, 'grad_norm': 0.5415014028549194, 'learning_rate': 4.9729634732653466e-05, 'epoch': 0.14}
14%|█▍ | 643/4506 [44:13<4:16:54, 3.99s/it]
{'loss': 0.4012, 'grad_norm': 0.5368225574493408, 'learning_rate': 4.9726786501785416e-05, 'epoch': 0.14}
14%|█▍ | 643/4506 [44:13<4:16:54, 3.99s/it]
14%|█▍ | 644/4506 [44:18<4:26:38, 4.14s/it]
{'loss': 0.41, 'grad_norm': 0.44099292159080505, 'learning_rate': 4.9723923429147775e-05, 'epoch': 0.14}
14%|█▍ | 644/4506 [44:18<4:26:38, 4.14s/it]
14%|█▍ | 645/4506 [44:22<4:28:19, 4.17s/it]
{'loss': 0.4177, 'grad_norm': 0.4954772889614105, 'learning_rate': 4.9721045516459024e-05, 'epoch': 0.14}
14%|█▍ | 645/4506 [44:22<4:28:19, 4.17s/it]
14%|█▍ | 646/4506 [44:26<4:32:27, 4.24s/it]
{'loss': 0.4255, 'grad_norm': 0.5059422850608826, 'learning_rate': 4.971815276544659e-05, 'epoch': 0.14}
14%|█▍ | 646/4506 [44:26<4:32:27, 4.24s/it]
14%|█▍ | 647/4506 [44:30<4:25:10, 4.12s/it]
{'loss': 0.3942, 'grad_norm': 0.49065259099006653, 'learning_rate': 4.971524517784677e-05, 'epoch': 0.14}
14%|█▍ | 647/4506 [44:30<4:25:10, 4.12s/it]
14%|█▍ | 648/4506 [44:34<4:22:39, 4.08s/it]
{'loss': 0.3847, 'grad_norm': 0.5355893969535828, 'learning_rate': 4.971232275540481e-05, 'epoch': 0.14}
14%|█▍ | 648/4506 [44:34<4:22:39, 4.08s/it]
14%|█▍ | 649/4506 [44:38<4:20:17, 4.05s/it]
{'loss': 0.399, 'grad_norm': 0.5791782140731812, 'learning_rate': 4.970938549987481e-05, 'epoch': 0.14}
14%|█▍ | 649/4506 [44:38<4:20:17, 4.05s/it]
14%|█▍ | 650/4506 [44:42<4:24:50, 4.12s/it]
{'loss': 0.3861, 'grad_norm': 0.4678512513637543, 'learning_rate': 4.9706433413019824e-05, 'epoch': 0.14}
14%|█▍ | 650/4506 [44:42<4:24:50, 4.12s/it]
14%|█▍ | 651/4506 [44:46<4:18:20, 4.02s/it]
{'loss': 0.4005, 'grad_norm': 0.5278710722923279, 'learning_rate': 4.9703466496611774e-05, 'epoch': 0.14}
14%|█▍ | 651/4506 [44:46<4:18:20, 4.02s/it]
14%|█▍ | 652/4506 [44:50<4:15:47, 3.98s/it]
{'loss': 0.4113, 'grad_norm': 0.5542044043540955, 'learning_rate': 4.970048475243149e-05, 'epoch': 0.14}
14%|█▍ | 652/4506 [44:50<4:15:47, 3.98s/it]
14%|█▍ | 653/4506 [44:54<4:11:05, 3.91s/it]
{'loss': 0.3989, 'grad_norm': 0.6361961364746094, 'learning_rate': 4.9697488182268714e-05, 'epoch': 0.14}
14%|█▍ | 653/4506 [44:54<4:11:05, 3.91s/it]
15%|█▍ | 654/4506 [44:58<4:09:34, 3.89s/it]
{'loss': 0.4091, 'grad_norm': 0.551453173160553, 'learning_rate': 4.969447678792207e-05, 'epoch': 0.15}
15%|█▍ | 654/4506 [44:58<4:09:34, 3.89s/it]
15%|█▍ | 655/4506 [45:02<4:07:51, 3.86s/it]
{'loss': 0.4003, 'grad_norm': 0.525257408618927, 'learning_rate': 4.969145057119911e-05, 'epoch': 0.15}
15%|█▍ | 655/4506 [45:02<4:07:51, 3.86s/it]
15%|█▍ | 656/4506 [45:06<4:16:05, 3.99s/it]
{'loss': 0.4209, 'grad_norm': 0.5540181994438171, 'learning_rate': 4.968840953391622e-05, 'epoch': 0.15}
15%|█▍ | 656/4506 [45:06<4:16:05, 3.99s/it]
15%|█▍ | 657/4506 [45:10<4:15:48, 3.99s/it]
{'loss': 0.3988, 'grad_norm': 0.5221325755119324, 'learning_rate': 4.968535367789877e-05, 'epoch': 0.15}
15%|█▍ | 657/4506 [45:10<4:15:48, 3.99s/it]
15%|█▍ | 658/4506 [45:13<4:08:00, 3.87s/it]
{'loss': 0.3653, 'grad_norm': 0.5504676699638367, 'learning_rate': 4.9682283004980944e-05, 'epoch': 0.15}
15%|█▍ | 658/4506 [45:13<4:08:00, 3.87s/it]
15%|█▍ | 659/4506 [45:18<4:13:56, 3.96s/it]
{'loss': 0.4169, 'grad_norm': 0.5263819098472595, 'learning_rate': 4.9679197517005874e-05, 'epoch': 0.15}
15%|█▍ | 659/4506 [45:18<4:13:56, 3.96s/it]
15%|█▍ | 660/4506 [45:22<4:17:59, 4.02s/it]
{'loss': 0.3941, 'grad_norm': 0.5558189749717712, 'learning_rate': 4.967609721582555e-05, 'epoch': 0.15}
15%|█▍ | 660/4506 [45:22<4:17:59, 4.02s/it]
15%|█▍ | 661/4506 [45:26<4:32:03, 4.25s/it]
{'loss': 0.4118, 'grad_norm': 0.5839881300926208, 'learning_rate': 4.967298210330086e-05, 'epoch': 0.15}
15%|█▍ | 661/4506 [45:26<4:32:03, 4.25s/it]
15%|█▍ | 662/4506 [45:31<4:28:03, 4.18s/it]
{'loss': 0.4182, 'grad_norm': 0.49037978053092957, 'learning_rate': 4.9669852181301614e-05, 'epoch': 0.15}
15%|█▍ | 662/4506 [45:31<4:28:03, 4.18s/it]
15%|█▍ | 663/4506 [45:35<4:26:40, 4.16s/it]
{'loss': 0.3985, 'grad_norm': 0.4857046604156494, 'learning_rate': 4.966670745170647e-05, 'epoch': 0.15}
15%|█▍ | 663/4506 [45:35<4:26:40, 4.16s/it]
15%|█▍ | 664/4506 [45:39<4:30:49, 4.23s/it]
{'loss': 0.4072, 'grad_norm': 0.5025992393493652, 'learning_rate': 4.966354791640299e-05, 'epoch': 0.15}
15%|█▍ | 664/4506 [45:39<4:30:49, 4.23s/it]
15%|█▍ | 665/4506 [45:43<4:28:43, 4.20s/it]
{'loss': 0.4077, 'grad_norm': 0.4906552731990814, 'learning_rate': 4.966037357728763e-05, 'epoch': 0.15}
15%|█▍ | 665/4506 [45:43<4:28:43, 4.20s/it]
15%|█▍ | 666/4506 [45:47<4:22:07, 4.10s/it]
{'loss': 0.3901, 'grad_norm': 0.5134648084640503, 'learning_rate': 4.965718443626572e-05, 'epoch': 0.15}
15%|█▍ | 666/4506 [45:47<4:22:07, 4.10s/it]
15%|█▍ | 667/4506 [45:51<4:19:39, 4.06s/it]
{'loss': 0.3925, 'grad_norm': 0.4564010500907898, 'learning_rate': 4.965398049525149e-05, 'epoch': 0.15}
15%|█▍ | 667/4506 [45:51<4:19:39, 4.06s/it]
15%|█▍ | 668/4506 [45:55<4:17:28, 4.03s/it]
{'loss': 0.4001, 'grad_norm': 0.4659903645515442, 'learning_rate': 4.965076175616803e-05, 'epoch': 0.15}
15%|█▍ | 668/4506 [45:55<4:17:28, 4.03s/it]
15%|█▍ | 669/4506 [45:59<4:22:58, 4.11s/it]
{'loss': 0.4169, 'grad_norm': 0.5288462042808533, 'learning_rate': 4.964752822094732e-05, 'epoch': 0.15}
15%|█▍ | 669/4506 [45:59<4:22:58, 4.11s/it]
15%|█▍ | 670/4506 [46:03<4:14:36, 3.98s/it]
{'loss': 0.3923, 'grad_norm': 0.4964834451675415, 'learning_rate': 4.964427989153025e-05, 'epoch': 0.15}
15%|█▍ | 670/4506 [46:03<4:14:36, 3.98s/it]
15%|█▍ | 671/4506 [46:07<4:21:40, 4.09s/it]
{'loss': 0.4026, 'grad_norm': 0.5649354457855225, 'learning_rate': 4.964101676986654e-05, 'epoch': 0.15}
15%|█▍ | 671/4506 [46:07<4:21:40, 4.09s/it]
15%|█▍ | 672/4506 [46:11<4:17:40, 4.03s/it]
{'loss': 0.3993, 'grad_norm': 0.5773479342460632, 'learning_rate': 4.963773885791484e-05, 'epoch': 0.15}
15%|█▍ | 672/4506 [46:11<4:17:40, 4.03s/it]
15%|█▍ | 673/4506 [46:15<4:18:26, 4.05s/it]
{'loss': 0.3799, 'grad_norm': 0.5021446943283081, 'learning_rate': 4.9634446157642636e-05, 'epoch': 0.15}
15%|█▍ | 673/4506 [46:15<4:18:26, 4.05s/it]
15%|█▍ | 674/4506 [46:19<4:22:06, 4.10s/it]
{'loss': 0.3888, 'grad_norm': 0.45748090744018555, 'learning_rate': 4.96311386710263e-05, 'epoch': 0.15}
15%|█▍ | 674/4506 [46:19<4:22:06, 4.10s/it]
15%|█▍ | 675/4506 [46:24<4:22:23, 4.11s/it]
{'loss': 0.4039, 'grad_norm': 0.5797020792961121, 'learning_rate': 4.9627816400051096e-05, 'epoch': 0.15}
15%|█▍ | 675/4506 [46:24<4:22:23, 4.11s/it]
15%|█▌ | 676/4506 [46:28<4:20:05, 4.07s/it]
{'loss': 0.399, 'grad_norm': 0.5664475560188293, 'learning_rate': 4.962447934671116e-05, 'epoch': 0.15}
15%|█▌ | 676/4506 [46:28<4:20:05, 4.07s/it]
15%|█▌ | 677/4506 [46:31<4:16:22, 4.02s/it]
{'loss': 0.3808, 'grad_norm': 0.4936218559741974, 'learning_rate': 4.962112751300949e-05, 'epoch': 0.15}
15%|█▌ | 677/4506 [46:31<4:16:22, 4.02s/it]
15%|█▌ | 678/4506 [46:35<4:13:18, 3.97s/it]
{'loss': 0.3932, 'grad_norm': 0.4908999502658844, 'learning_rate': 4.9617760900957946e-05, 'epoch': 0.15}
15%|█▌ | 678/4506 [46:35<4:13:18, 3.97s/it]
15%|█▌ | 679/4506 [46:40<4:24:53, 4.15s/it]
{'loss': 0.3923, 'grad_norm': 0.4419223964214325, 'learning_rate': 4.961437951257728e-05, 'epoch': 0.15}
15%|█▌ | 679/4506 [46:40<4:24:53, 4.15s/it]
15%|█▌ | 680/4506 [46:44<4:18:18, 4.05s/it]
{'loss': 0.391, 'grad_norm': 0.4983472228050232, 'learning_rate': 4.96109833498971e-05, 'epoch': 0.15}
15%|█▌ | 680/4506 [46:44<4:18:18, 4.05s/it]
15%|█▌ | 681/4506 [46:48<4:24:02, 4.14s/it]
{'loss': 0.3863, 'grad_norm': 0.4638632833957672, 'learning_rate': 4.96075724149559e-05, 'epoch': 0.15}
15%|█▌ | 681/4506 [46:48<4:24:02, 4.14s/it]
15%|█▌ | 682/4506 [46:52<4:24:55, 4.16s/it]
{'loss': 0.3915, 'grad_norm': 0.47884106636047363, 'learning_rate': 4.9604146709801e-05, 'epoch': 0.15}
15%|█▌ | 682/4506 [46:52<4:24:55, 4.16s/it]
15%|█▌ | 683/4506 [46:57<4:30:32, 4.25s/it]
{'loss': 0.3797, 'grad_norm': 0.4923678934574127, 'learning_rate': 4.9600706236488635e-05, 'epoch': 0.15}
15%|█▌ | 683/4506 [46:57<4:30:32, 4.25s/it]
15%|█▌ | 684/4506 [47:01<4:24:53, 4.16s/it]
{'loss': 0.4018, 'grad_norm': 0.45877334475517273, 'learning_rate': 4.959725099708388e-05, 'epoch': 0.15}
15%|█▌ | 684/4506 [47:01<4:24:53, 4.16s/it]
15%|█▌ | 685/4506 [47:05<4:22:27, 4.12s/it]
{'loss': 0.3892, 'grad_norm': 0.5378092527389526, 'learning_rate': 4.959378099366067e-05, 'epoch': 0.15}
15%|█▌ | 685/4506 [47:05<4:22:27, 4.12s/it]
15%|█▌ | 686/4506 [47:09<4:23:06, 4.13s/it]
{'loss': 0.397, 'grad_norm': 0.5089632272720337, 'learning_rate': 4.95902962283018e-05, 'epoch': 0.15}
15%|█▌ | 686/4506 [47:09<4:23:06, 4.13s/it]
15%|█▌ | 687/4506 [47:13<4:22:26, 4.12s/it]
{'loss': 0.4029, 'grad_norm': 0.5566222667694092, 'learning_rate': 4.958679670309895e-05, 'epoch': 0.15}
15%|█▌ | 687/4506 [47:13<4:22:26, 4.12s/it]
15%|█▌ | 688/4506 [47:17<4:18:23, 4.06s/it]
{'loss': 0.3824, 'grad_norm': 0.5199006795883179, 'learning_rate': 4.958328242015262e-05, 'epoch': 0.15}
15%|█▌ | 688/4506 [47:17<4:18:23, 4.06s/it]
15%|█▌ | 689/4506 [47:21<4:20:48, 4.10s/it]
{'loss': 0.4105, 'grad_norm': 0.6496845483779907, 'learning_rate': 4.957975338157221e-05, 'epoch': 0.15}
15%|█▌ | 689/4506 [47:21<4:20:48, 4.10s/it]
15%|█▌ | 690/4506 [47:25<4:26:22, 4.19s/it]
{'loss': 0.3842, 'grad_norm': 0.6465082168579102, 'learning_rate': 4.957620958947594e-05, 'epoch': 0.15}
15%|█▌ | 690/4506 [47:25<4:26:22, 4.19s/it]
15%|█▌ | 691/4506 [47:29<4:20:51, 4.10s/it]
{'loss': 0.3685, 'grad_norm': 0.5241184234619141, 'learning_rate': 4.957265104599091e-05, 'epoch': 0.15}
15%|█▌ | 691/4506 [47:29<4:20:51, 4.10s/it]
15%|█▌ | 692/4506 [47:34<4:31:56, 4.28s/it]
{'loss': 0.3947, 'grad_norm': 0.5372939109802246, 'learning_rate': 4.956907775325306e-05, 'epoch': 0.15}
15%|█▌ | 692/4506 [47:34<4:31:56, 4.28s/it]
15%|█▌ | 693/4506 [47:38<4:27:40, 4.21s/it]
{'loss': 0.3998, 'grad_norm': 0.6064289808273315, 'learning_rate': 4.95654897134072e-05, 'epoch': 0.15}
15%|█▌ | 693/4506 [47:38<4:27:40, 4.21s/it]
15%|█▌ | 694/4506 [47:42<4:21:41, 4.12s/it]
{'loss': 0.3805, 'grad_norm': 0.5411776900291443, 'learning_rate': 4.956188692860697e-05, 'epoch': 0.15}
15%|█▌ | 694/4506 [47:42<4:21:41, 4.12s/it]
15%|█▌ | 695/4506 [47:46<4:14:14, 4.00s/it]
{'loss': 0.3826, 'grad_norm': 0.46976393461227417, 'learning_rate': 4.955826940101488e-05, 'epoch': 0.15}
15%|█▌ | 695/4506 [47:46<4:14:14, 4.00s/it]
15%|█▌ | 696/4506 [47:50<4:17:30, 4.06s/it]
{'loss': 0.3852, 'grad_norm': 0.470256507396698, 'learning_rate': 4.955463713280227e-05, 'epoch': 0.15}
15%|█▌ | 696/4506 [47:50<4:17:30, 4.06s/it]
15%|█▌ | 697/4506 [47:54<4:11:58, 3.97s/it]
{'loss': 0.3783, 'grad_norm': 0.5101056098937988, 'learning_rate': 4.955099012614934e-05, 'epoch': 0.15}
15%|█▌ | 697/4506 [47:54<4:11:58, 3.97s/it]
15%|█▌ | 698/4506 [47:58<4:13:06, 3.99s/it]
{'loss': 0.3961, 'grad_norm': 0.5139530301094055, 'learning_rate': 4.954732838324514e-05, 'epoch': 0.15}
15%|█▌ | 698/4506 [47:58<4:13:06, 3.99s/it]
16%|█▌ | 699/4506 [48:02<4:13:46, 4.00s/it]
{'loss': 0.4028, 'grad_norm': 0.48362240195274353, 'learning_rate': 4.954365190628756e-05, 'epoch': 0.16}
16%|█▌ | 699/4506 [48:02<4:13:46, 4.00s/it]
16%|█▌ | 700/4506 [48:06<4:19:48, 4.10s/it]
{'loss': 0.4003, 'grad_norm': 0.5126010179519653, 'learning_rate': 4.953996069748333e-05, 'epoch': 0.16}
16%|█▌ | 700/4506 [48:06<4:19:48, 4.10s/it]
16%|█▌ | 701/4506 [48:10<4:17:44, 4.06s/it]
{'loss': 0.3913, 'grad_norm': 0.5110991597175598, 'learning_rate': 4.9536254759048026e-05, 'epoch': 0.16}
16%|█▌ | 701/4506 [48:10<4:17:44, 4.06s/it]
16%|█▌ | 702/4506 [48:14<4:24:11, 4.17s/it]
{'loss': 0.3919, 'grad_norm': 0.46405288577079773, 'learning_rate': 4.953253409320606e-05, 'epoch': 0.16}
16%|█▌ | 702/4506 [48:14<4:24:11, 4.17s/it]
16%|█▌ | 703/4506 [48:18<4:17:30, 4.06s/it]
{'loss': 0.3838, 'grad_norm': 0.5090782046318054, 'learning_rate': 4.952879870219071e-05, 'epoch': 0.16}
16%|█▌ | 703/4506 [48:18<4:17:30, 4.06s/it]
16%|█▌ | 704/4506 [48:23<4:26:26, 4.20s/it]
{'loss': 0.384, 'grad_norm': 0.5336664319038391, 'learning_rate': 4.952504858824404e-05, 'epoch': 0.16}
16%|█▌ | 704/4506 [48:23<4:26:26, 4.20s/it]
16%|█▌ | 705/4506 [48:27<4:19:18, 4.09s/it]
{'loss': 0.3904, 'grad_norm': 0.5330934524536133, 'learning_rate': 4.9521283753617e-05, 'epoch': 0.16}
16%|█▌ | 705/4506 [48:27<4:19:18, 4.09s/it]
16%|█▌ | 706/4506 [48:31<4:17:37, 4.07s/it]
{'loss': 0.3904, 'grad_norm': 0.481336772441864, 'learning_rate': 4.951750420056936e-05, 'epoch': 0.16}
16%|█▌ | 706/4506 [48:31<4:17:37, 4.07s/it]
16%|█▌ | 707/4506 [48:35<4:24:06, 4.17s/it]
{'loss': 0.3826, 'grad_norm': 0.5783835053443909, 'learning_rate': 4.951370993136971e-05, 'epoch': 0.16}
16%|█▌ | 707/4506 [48:35<4:24:06, 4.17s/it]
16%|█▌ | 708/4506 [48:39<4:19:08, 4.09s/it]
{'loss': 0.3786, 'grad_norm': 0.5915576815605164, 'learning_rate': 4.9509900948295504e-05, 'epoch': 0.16}
16%|█▌ | 708/4506 [48:39<4:19:08, 4.09s/it]
16%|█▌ | 709/4506 [48:43<4:17:11, 4.06s/it]
{'loss': 0.3985, 'grad_norm': 0.5210710763931274, 'learning_rate': 4.9506077253633e-05, 'epoch': 0.16}
16%|█▌ | 709/4506 [48:43<4:17:11, 4.06s/it]
16%|█▌ | 710/4506 [48:47<4:15:07, 4.03s/it]
{'loss': 0.3706, 'grad_norm': 0.5837641358375549, 'learning_rate': 4.9502238849677295e-05, 'epoch': 0.16}
16%|█▌ | 710/4506 [48:47<4:15:07, 4.03s/it]
16%|█▌ | 711/4506 [48:51<4:10:02, 3.95s/it]
{'loss': 0.3872, 'grad_norm': 0.49958252906799316, 'learning_rate': 4.949838573873231e-05, 'epoch': 0.16}
16%|█▌ | 711/4506 [48:51<4:10:02, 3.95s/it]
16%|█▌ | 712/4506 [48:55<4:18:31, 4.09s/it]
{'loss': 0.3763, 'grad_norm': 0.49639952182769775, 'learning_rate': 4.9494517923110816e-05, 'epoch': 0.16}
16%|█▌ | 712/4506 [48:55<4:18:31, 4.09s/it]
16%|█▌ | 713/4506 [48:59<4:20:55, 4.13s/it]
{'loss': 0.3787, 'grad_norm': 0.48319607973098755, 'learning_rate': 4.949063540513438e-05, 'epoch': 0.16}
16%|█▌ | 713/4506 [48:59<4:20:55, 4.13s/it]
16%|█▌ | 714/4506 [49:04<4:23:05, 4.16s/it]
{'loss': 0.367, 'grad_norm': 0.5306775569915771, 'learning_rate': 4.9486738187133416e-05, 'epoch': 0.16}
16%|█▌ | 714/4506 [49:04<4:23:05, 4.16s/it]
16%|█▌ | 715/4506 [49:07<4:14:51, 4.03s/it]
{'loss': 0.3818, 'grad_norm': 0.6840843558311462, 'learning_rate': 4.948282627144714e-05, 'epoch': 0.16}
16%|█▌ | 715/4506 [49:07<4:14:51, 4.03s/it]
16%|█▌ | 716/4506 [49:12<4:18:50, 4.10s/it]
{'loss': 0.3907, 'grad_norm': 0.5596795678138733, 'learning_rate': 4.9478899660423615e-05, 'epoch': 0.16}
16%|█▌ | 716/4506 [49:12<4:18:50, 4.10s/it]
16%|█▌ | 717/4506 [49:16<4:15:40, 4.05s/it]
{'loss': 0.3722, 'grad_norm': 0.4834854006767273, 'learning_rate': 4.947495835641971e-05, 'epoch': 0.16}
16%|█▌ | 717/4506 [49:16<4:15:40, 4.05s/it]
16%|█▌ | 718/4506 [49:20<4:17:09, 4.07s/it]
{'loss': 0.3847, 'grad_norm': 0.543215274810791, 'learning_rate': 4.947100236180111e-05, 'epoch': 0.16}
16%|█▌ | 718/4506 [49:20<4:17:09, 4.07s/it]
16%|█▌ | 719/4506 [49:24<4:22:01, 4.15s/it]
{'loss': 0.3912, 'grad_norm': 0.5494294166564941, 'learning_rate': 4.946703167894233e-05, 'epoch': 0.16}
16%|█▌ | 719/4506 [49:24<4:22:01, 4.15s/it]
16%|█▌ | 720/4506 [49:29<4:29:58, 4.28s/it]
{'loss': 0.3861, 'grad_norm': 0.5505370497703552, 'learning_rate': 4.946304631022669e-05, 'epoch': 0.16}
16%|█▌ | 720/4506 [49:29<4:29:58, 4.28s/it]
16%|█▌ | 721/4506 [49:33<4:34:04, 4.34s/it]
{'loss': 0.3771, 'grad_norm': 0.49660196900367737, 'learning_rate': 4.945904625804634e-05, 'epoch': 0.16}
16%|█▌ | 721/4506 [49:33<4:34:04, 4.34s/it]
16%|█▌ | 722/4506 [49:37<4:25:10, 4.20s/it]
{'loss': 0.3765, 'grad_norm': 0.5223884582519531, 'learning_rate': 4.945503152480222e-05, 'epoch': 0.16}
16%|█▌ | 722/4506 [49:37<4:25:10, 4.20s/it]
16%|█▌ | 723/4506 [49:41<4:25:35, 4.21s/it]
{'loss': 0.3872, 'grad_norm': 0.4947271943092346, 'learning_rate': 4.9451002112904095e-05, 'epoch': 0.16}
16%|█▌ | 723/4506 [49:41<4:25:35, 4.21s/it]
16%|█▌ | 724/4506 [49:45<4:25:09, 4.21s/it]
{'loss': 0.3984, 'grad_norm': 0.5458205342292786, 'learning_rate': 4.944695802477055e-05, 'epoch': 0.16}
16%|█▌ | 724/4506 [49:45<4:25:09, 4.21s/it]
16%|█▌ | 725/4506 [49:50<4:24:07, 4.19s/it]
{'loss': 0.3959, 'grad_norm': 0.502316415309906, 'learning_rate': 4.944289926282896e-05, 'epoch': 0.16}
16%|█▌ | 725/4506 [49:50<4:24:07, 4.19s/it]
16%|█▌ | 726/4506 [49:53<4:15:51, 4.06s/it]
{'loss': 0.3929, 'grad_norm': 0.5949967503547668, 'learning_rate': 4.943882582951553e-05, 'epoch': 0.16}
16%|█▌ | 726/4506 [49:53<4:15:51, 4.06s/it]
16%|█▌ | 727/4506 [49:57<4:18:35, 4.11s/it]
{'loss': 0.4007, 'grad_norm': 0.549598753452301, 'learning_rate': 4.9434737727275246e-05, 'epoch': 0.16}
16%|█▌ | 727/4506 [49:57<4:18:35, 4.11s/it]
16%|█▌ | 728/4506 [50:01<4:13:57, 4.03s/it]
{'loss': 0.3752, 'grad_norm': 0.5035743117332458, 'learning_rate': 4.943063495856192e-05, 'epoch': 0.16}
16%|█▌ | 728/4506 [50:01<4:13:57, 4.03s/it]
16%|█▌ | 729/4506 [50:05<4:14:30, 4.04s/it]
{'loss': 0.4001, 'grad_norm': 0.5346064567565918, 'learning_rate': 4.942651752583815e-05, 'epoch': 0.16}
16%|█▌ | 729/4506 [50:05<4:14:30, 4.04s/it]
16%|█▌ | 730/4506 [50:09<4:14:10, 4.04s/it]
{'loss': 0.3688, 'grad_norm': 0.49941617250442505, 'learning_rate': 4.9422385431575344e-05, 'epoch': 0.16}
16%|█▌ | 730/4506 [50:09<4:14:10, 4.04s/it]
16%|█▌ | 731/4506 [50:14<4:22:41, 4.18s/it]
{'loss': 0.3836, 'grad_norm': 0.5015337467193604, 'learning_rate': 4.941823867825373e-05, 'epoch': 0.16}
16%|█▌ | 731/4506 [50:14<4:22:41, 4.18s/it]
16%|█▌ | 732/4506 [50:18<4:22:05, 4.17s/it]
{'loss': 0.3672, 'grad_norm': 0.4860544502735138, 'learning_rate': 4.94140772683623e-05, 'epoch': 0.16}
16%|█▌ | 732/4506 [50:18<4:22:05, 4.17s/it]
16%|█▋ | 733/4506 [50:22<4:21:05, 4.15s/it]
{'loss': 0.4075, 'grad_norm': 0.5173107385635376, 'learning_rate': 4.940990120439885e-05, 'epoch': 0.16}
16%|█▋ | 733/4506 [50:22<4:21:05, 4.15s/it]
16%|█▋ | 734/4506 [50:26<4:13:14, 4.03s/it]
{'loss': 0.3986, 'grad_norm': 0.6023046970367432, 'learning_rate': 4.940571048887e-05, 'epoch': 0.16}
16%|█▋ | 734/4506 [50:26<4:13:14, 4.03s/it]
16%|█▋ | 735/4506 [50:30<4:10:01, 3.98s/it]
{'loss': 0.3859, 'grad_norm': 0.5498964786529541, 'learning_rate': 4.940150512429114e-05, 'epoch': 0.16}
16%|█▋ | 735/4506 [50:30<4:10:01, 3.98s/it]
16%|█▋ | 736/4506 [50:34<4:10:48, 3.99s/it]
{'loss': 0.3838, 'grad_norm': 0.48322203755378723, 'learning_rate': 4.939728511318644e-05, 'epoch': 0.16}
16%|█▋ | 736/4506 [50:34<4:10:48, 3.99s/it]
16%|█▋ | 737/4506 [50:38<4:14:16, 4.05s/it]
{'loss': 0.3785, 'grad_norm': 0.6017166376113892, 'learning_rate': 4.93930504580889e-05, 'epoch': 0.16}
16%|█▋ | 737/4506 [50:38<4:14:16, 4.05s/it]
16%|█▋ | 738/4506 [50:42<4:13:43, 4.04s/it]
{'loss': 0.371, 'grad_norm': 0.45803678035736084, 'learning_rate': 4.938880116154028e-05, 'epoch': 0.16}
16%|█▋ | 738/4506 [50:42<4:13:43, 4.04s/it]
16%|█▋ | 739/4506 [50:46<4:10:36, 3.99s/it]
{'loss': 0.3896, 'grad_norm': 0.49550312757492065, 'learning_rate': 4.938453722609114e-05, 'epoch': 0.16}
16%|█▋ | 739/4506 [50:46<4:10:36, 3.99s/it]
16%|█▋ | 740/4506 [50:50<4:16:16, 4.08s/it]
{'loss': 0.3669, 'grad_norm': 0.6141790151596069, 'learning_rate': 4.938025865430082e-05, 'epoch': 0.16}
16%|█▋ | 740/4506 [50:50<4:16:16, 4.08s/it]
16%|█▋ | 741/4506 [50:54<4:13:44, 4.04s/it]
{'loss': 0.371, 'grad_norm': 0.5205312371253967, 'learning_rate': 4.937596544873746e-05, 'epoch': 0.16}
16%|█▋ | 741/4506 [50:54<4:13:44, 4.04s/it]
16%|█▋ | 742/4506 [50:58<4:13:14, 4.04s/it]
{'loss': 0.3632, 'grad_norm': 0.43938136100769043, 'learning_rate': 4.937165761197796e-05, 'epoch': 0.16}
16%|█▋ | 742/4506 [50:58<4:13:14, 4.04s/it]
16%|█▋ | 743/4506 [51:02<4:11:59, 4.02s/it]
{'loss': 0.3886, 'grad_norm': 0.5825845003128052, 'learning_rate': 4.936733514660802e-05, 'epoch': 0.16}
16%|█▋ | 743/4506 [51:02<4:11:59, 4.02s/it]
17%|█▋ | 744/4506 [51:06<4:13:18, 4.04s/it]
{'loss': 0.3854, 'grad_norm': 0.5853646397590637, 'learning_rate': 4.936299805522211e-05, 'epoch': 0.17}
17%|█▋ | 744/4506 [51:06<4:13:18, 4.04s/it]
17%|█▋ | 745/4506 [51:10<4:11:29, 4.01s/it]
{'loss': 0.3695, 'grad_norm': 0.4748672544956207, 'learning_rate': 4.9358646340423495e-05, 'epoch': 0.17}
17%|█▋ | 745/4506 [51:10<4:11:29, 4.01s/it]
17%|█▋ | 746/4506 [51:14<4:10:51, 4.00s/it]
{'loss': 0.3731, 'grad_norm': 0.47750934958457947, 'learning_rate': 4.935428000482419e-05, 'epoch': 0.17}
17%|█▋ | 746/4506 [51:14<4:10:51, 4.00s/it]
17%|█▋ | 747/4506 [51:18<4:11:17, 4.01s/it]
{'loss': 0.3879, 'grad_norm': 0.5159927606582642, 'learning_rate': 4.934989905104502e-05, 'epoch': 0.17}
17%|█▋ | 747/4506 [51:18<4:11:17, 4.01s/it]
17%|█▋ | 748/4506 [51:23<4:23:21, 4.20s/it]
{'loss': 0.3868, 'grad_norm': 0.49754849076271057, 'learning_rate': 4.934550348171556e-05, 'epoch': 0.17}
17%|█▋ | 748/4506 [51:23<4:23:21, 4.20s/it]
17%|█▋ | 749/4506 [51:27<4:21:17, 4.17s/it]
{'loss': 0.3582, 'grad_norm': 0.4771589934825897, 'learning_rate': 4.9341093299474165e-05, 'epoch': 0.17}
17%|█▋ | 749/4506 [51:27<4:21:17, 4.17s/it]
17%|█▋ | 750/4506 [51:31<4:23:53, 4.22s/it]
{'loss': 0.3975, 'grad_norm': 0.5197660326957703, 'learning_rate': 4.933666850696795e-05, 'epoch': 0.17}
17%|█▋ | 750/4506 [51:31<4:23:53, 4.22s/it]
17%|█▋ | 751/4506 [51:35<4:24:04, 4.22s/it]
{'loss': 0.3773, 'grad_norm': 0.4919314384460449, 'learning_rate': 4.9332229106852835e-05, 'epoch': 0.17}
17%|█▋ | 751/4506 [51:35<4:24:04, 4.22s/it]
17%|█▋ | 752/4506 [51:40<4:26:26, 4.26s/it]
{'loss': 0.3688, 'grad_norm': 0.5061005353927612, 'learning_rate': 4.9327775101793456e-05, 'epoch': 0.17}
17%|█▋ | 752/4506 [51:40<4:26:26, 4.26s/it]
17%|█▋ | 753/4506 [51:44<4:22:22, 4.19s/it]
{'loss': 0.3676, 'grad_norm': 0.5608893036842346, 'learning_rate': 4.932330649446325e-05, 'epoch': 0.17}
17%|█▋ | 753/4506 [51:44<4:22:22, 4.19s/it]
17%|█▋ | 754/4506 [51:48<4:18:50, 4.14s/it]
{'loss': 0.3856, 'grad_norm': 0.6008087992668152, 'learning_rate': 4.9318823287544425e-05, 'epoch': 0.17}
17%|█▋ | 754/4506 [51:48<4:18:50, 4.14s/it]
17%|█▋ | 755/4506 [51:52<4:20:37, 4.17s/it]
{'loss': 0.3748, 'grad_norm': 0.5113583207130432, 'learning_rate': 4.9314325483727924e-05, 'epoch': 0.17}
17%|█▋ | 755/4506 [51:52<4:20:37, 4.17s/it]
17%|█▋ | 756/4506 [51:56<4:17:33, 4.12s/it]
{'loss': 0.3532, 'grad_norm': 0.4759223461151123, 'learning_rate': 4.930981308571347e-05, 'epoch': 0.17}
17%|█▋ | 756/4506 [51:56<4:17:33, 4.12s/it]
17%|█▋ | 757/4506 [52:00<4:18:12, 4.13s/it]
{'loss': 0.3929, 'grad_norm': 0.4983096122741699, 'learning_rate': 4.930528609620954e-05, 'epoch': 0.17}
17%|█▋ | 757/4506 [52:00<4:18:12, 4.13s/it]
17%|█▋ | 758/4506 [52:04<4:16:14, 4.10s/it]
{'loss': 0.3703, 'grad_norm': 0.5244458913803101, 'learning_rate': 4.9300744517933374e-05, 'epoch': 0.17}
17%|█▋ | 758/4506 [52:04<4:16:14, 4.10s/it]
17%|█▋ | 759/4506 [52:09<4:23:33, 4.22s/it]
{'loss': 0.3934, 'grad_norm': 0.5128511786460876, 'learning_rate': 4.9296188353610964e-05, 'epoch': 0.17}
17%|█▋ | 759/4506 [52:09<4:23:33, 4.22s/it]
17%|█▋ | 760/4506 [52:13<4:20:50, 4.18s/it]
{'loss': 0.383, 'grad_norm': 0.5535969138145447, 'learning_rate': 4.9291617605977054e-05, 'epoch': 0.17}
17%|█▋ | 760/4506 [52:13<4:20:50, 4.18s/it]
17%|█▋ | 761/4506 [52:17<4:18:30, 4.14s/it]
{'loss': 0.3721, 'grad_norm': 0.502324104309082, 'learning_rate': 4.928703227777515e-05, 'epoch': 0.17}
17%|█▋ | 761/4506 [52:17<4:18:30, 4.14s/it]
17%|█▋ | 762/4506 [52:21<4:15:48, 4.10s/it]
{'loss': 0.3794, 'grad_norm': 0.6088352799415588, 'learning_rate': 4.928243237175751e-05, 'epoch': 0.17}
17%|█▋ | 762/4506 [52:21<4:15:48, 4.10s/it]
17%|█▋ | 763/4506 [52:25<4:15:49, 4.10s/it]
{'loss': 0.3714, 'grad_norm': 0.5537615418434143, 'learning_rate': 4.9277817890685127e-05, 'epoch': 0.17}
17%|█▋ | 763/4506 [52:25<4:15:49, 4.10s/it]
17%|█▋ | 764/4506 [52:29<4:08:03, 3.98s/it]
{'loss': 0.3751, 'grad_norm': 0.6296548247337341, 'learning_rate': 4.927318883732777e-05, 'epoch': 0.17}
17%|█▋ | 764/4506 [52:29<4:08:03, 3.98s/it]
17%|█▋ | 765/4506 [52:33<4:13:54, 4.07s/it]
{'loss': 0.3841, 'grad_norm': 0.564932107925415, 'learning_rate': 4.926854521446391e-05, 'epoch': 0.17}
17%|█▋ | 765/4506 [52:33<4:13:54, 4.07s/it]
17%|█▋ | 766/4506 [52:38<4:23:06, 4.22s/it]
{'loss': 0.3808, 'grad_norm': 0.49444276094436646, 'learning_rate': 4.926388702488082e-05, 'epoch': 0.17}
17%|█▋ | 766/4506 [52:38<4:23:06, 4.22s/it]
17%|█▋ | 767/4506 [52:42<4:21:33, 4.20s/it]
{'loss': 0.3427, 'grad_norm': 0.4603702425956726, 'learning_rate': 4.9259214271374465e-05, 'epoch': 0.17}
17%|█▋ | 767/4506 [52:42<4:21:33, 4.20s/it]
17%|█▋ | 768/4506 [52:46<4:18:23, 4.15s/it]
{'loss': 0.3801, 'grad_norm': 0.5568057894706726, 'learning_rate': 4.925452695674959e-05, 'epoch': 0.17}
17%|█▋ | 768/4506 [52:46<4:18:23, 4.15s/it]
17%|█▋ | 769/4506 [52:50<4:17:10, 4.13s/it]
{'loss': 0.3692, 'grad_norm': 0.5383927226066589, 'learning_rate': 4.9249825083819655e-05, 'epoch': 0.17}
17%|█▋ | 769/4506 [52:50<4:17:10, 4.13s/it]
17%|█▋ | 770/4506 [52:54<4:15:06, 4.10s/it]
{'loss': 0.3858, 'grad_norm': 0.504489541053772, 'learning_rate': 4.924510865540688e-05, 'epoch': 0.17}
17%|█▋ | 770/4506 [52:54<4:15:06, 4.10s/it]
17%|█▋ | 771/4506 [52:58<4:06:50, 3.97s/it]
{'loss': 0.3797, 'grad_norm': 0.48509106040000916, 'learning_rate': 4.924037767434219e-05, 'epoch': 0.17}
17%|█▋ | 771/4506 [52:58<4:06:50, 3.97s/it]
17%|█▋ | 772/4506 [53:02<4:09:46, 4.01s/it]
{'loss': 0.374, 'grad_norm': 0.5809385180473328, 'learning_rate': 4.923563214346526e-05, 'epoch': 0.17}
17%|█▋ | 772/4506 [53:02<4:09:46, 4.01s/it]
17%|█▋ | 773/4506 [53:06<4:08:43, 4.00s/it]
{'loss': 0.3585, 'grad_norm': 0.5350124835968018, 'learning_rate': 4.923087206562453e-05, 'epoch': 0.17}
17%|█▋ | 773/4506 [53:06<4:08:43, 4.00s/it]
17%|█▋ | 774/4506 [53:09<4:05:12, 3.94s/it]
{'loss': 0.39, 'grad_norm': 0.6187903881072998, 'learning_rate': 4.922609744367712e-05, 'epoch': 0.17}
17%|█▋ | 774/4506 [53:09<4:05:12, 3.94s/it]
17%|█▋ | 775/4506 [53:14<4:07:12, 3.98s/it]
{'loss': 0.3746, 'grad_norm': 0.5877137780189514, 'learning_rate': 4.922130828048891e-05, 'epoch': 0.17}
17%|█▋ | 775/4506 [53:14<4:07:12, 3.98s/it]
17%|█▋ | 776/4506 [53:18<4:07:34, 3.98s/it]
{'loss': 0.3818, 'grad_norm': 0.48224908113479614, 'learning_rate': 4.921650457893451e-05, 'epoch': 0.17}
17%|█▋ | 776/4506 [53:18<4:07:34, 3.98s/it]
17%|█▋ | 777/4506 [53:21<4:07:44, 3.99s/it]
{'loss': 0.3761, 'grad_norm': 0.5171534419059753, 'learning_rate': 4.9211686341897236e-05, 'epoch': 0.17}
17%|█▋ | 777/4506 [53:22<4:07:44, 3.99s/it]
17%|█▋ | 778/4506 [53:25<4:07:40, 3.99s/it]
{'loss': 0.3721, 'grad_norm': 0.4825461208820343, 'learning_rate': 4.920685357226914e-05, 'epoch': 0.17}
17%|█▋ | 778/4506 [53:25<4:07:40, 3.99s/it]
17%|█▋ | 779/4506 [53:30<4:09:17, 4.01s/it]
{'loss': 0.3741, 'grad_norm': 0.546470582485199, 'learning_rate': 4.9202006272951004e-05, 'epoch': 0.17}
17%|█▋ | 779/4506 [53:30<4:09:17, 4.01s/it]
17%|█▋ | 780/4506 [53:33<4:04:34, 3.94s/it]
{'loss': 0.3761, 'grad_norm': 0.6550987958908081, 'learning_rate': 4.9197144446852323e-05, 'epoch': 0.17}
17%|█▋ | 780/4506 [53:33<4:04:34, 3.94s/it]
17%|█▋ | 781/4506 [53:38<4:10:45, 4.04s/it]
{'loss': 0.385, 'grad_norm': 0.48123714327812195, 'learning_rate': 4.919226809689131e-05, 'epoch': 0.17}
17%|█▋ | 781/4506 [53:38<4:10:45, 4.04s/it]
17%|█▋ | 782/4506 [53:42<4:17:34, 4.15s/it]
{'loss': 0.3844, 'grad_norm': 0.5807216763496399, 'learning_rate': 4.918737722599491e-05, 'epoch': 0.17}
17%|█▋ | 782/4506 [53:42<4:17:34, 4.15s/it]
17%|█▋ | 783/4506 [53:46<4:13:44, 4.09s/it]
{'loss': 0.3587, 'grad_norm': 0.49145615100860596, 'learning_rate': 4.9182471837098755e-05, 'epoch': 0.17}
17%|█▋ | 783/4506 [53:46<4:13:44, 4.09s/it]
17%|█▋ | 784/4506 [53:50<4:09:44, 4.03s/it]
{'loss': 0.3749, 'grad_norm': 0.5149043202400208, 'learning_rate': 4.9177551933147224e-05, 'epoch': 0.17}
17%|█▋ | 784/4506 [53:50<4:09:44, 4.03s/it]
17%|█▋ | 785/4506 [53:54<4:08:59, 4.01s/it]
{'loss': 0.3683, 'grad_norm': 0.6276089549064636, 'learning_rate': 4.917261751709338e-05, 'epoch': 0.17}
17%|█▋ | 785/4506 [53:54<4:08:59, 4.01s/it]
17%|█▋ | 786/4506 [53:58<4:07:29, 3.99s/it]
{'loss': 0.3714, 'grad_norm': 0.5314478874206543, 'learning_rate': 4.916766859189902e-05, 'epoch': 0.17}
17%|█▋ | 786/4506 [53:58<4:07:29, 3.99s/it]
17%|█▋ | 787/4506 [54:02<4:12:38, 4.08s/it]
{'loss': 0.3653, 'grad_norm': 0.5453343391418457, 'learning_rate': 4.9162705160534634e-05, 'epoch': 0.17}
17%|█▋ | 787/4506 [54:02<4:12:38, 4.08s/it]
17%|█▋ | 788/4506 [54:06<4:12:02, 4.07s/it]
{'loss': 0.3749, 'grad_norm': 1.0192408561706543, 'learning_rate': 4.9157727225979424e-05, 'epoch': 0.17}
17%|█▋ | 788/4506 [54:06<4:12:02, 4.07s/it]
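At step 788, `grad_norm` jumps to about 1.02, roughly double its neighbours, while the loss on that step (0.3749) stays in line with the surrounding values. The sketch below flags isolated spikes like this automatically. It compares each value against the median of the preceding logged values. `flag_grad_spikes`, its `window` and its `factor` are hypothetical, illustrative choices and are not LlamaFactory settings.

```python
import statistics


def flag_grad_spikes(grad_norms: dict[int, float], window: int = 20,
                     factor: float = 1.5) -> list[int]:
    """Return steps whose grad_norm exceeds `factor` times the median of the
    previous `window` logged values (hypothetical thresholds)."""
    steps = sorted(grad_norms)
    flagged = []
    for i, step in enumerate(steps):
        history = [grad_norms[s] for s in steps[max(0, i - window):i]]
        if len(history) < 5:  # too little history for a stable median
            continue
        if grad_norms[step] > factor * statistics.median(history):
            flagged.append(step)
    return flagged


# grad_norm values copied from steps 780-788 of the log above.
grad_norms = {
    780: 0.6550987958908081, 781: 0.48123714327812195,
    782: 0.5807216763496399, 783: 0.49145615100860596,
    784: 0.5149043202400208, 785: 0.6276089549064636,
    786: 0.5314478874206543, 787: 0.5453343391418457,
    788: 1.0192408561706543,
}
print(flag_grad_spikes(grad_norms))  # -> [788]
```

A trailing median is less sensitive than a mean to the spike itself. This keeps one outlier from raising the threshold for the steps that follow it.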
18%|█▊ | 789/4506 [54:10<4:03:56, 3.94s/it]
{'loss': 0.38, 'grad_norm': 0.6032074689865112, 'learning_rate': 4.91527347912213e-05, 'epoch': 0.18}
18%|█▊ | 789/4506 [54:10<4:03:56, 3.94s/it]
18%|█▊ | 790/4506 [54:14<4:07:34, 4.00s/it]
{'loss': 0.3682, 'grad_norm': 0.582206666469574, 'learning_rate': 4.914772785925688e-05, 'epoch': 0.18}
18%|█▊ | 790/4506 [54:14<4:07:34, 4.00s/it]
18%|█▊ | 791/4506 [54:18<4:06:58, 3.99s/it]
{'loss': 0.3711, 'grad_norm': 0.5845838189125061, 'learning_rate': 4.914270643309146e-05, 'epoch': 0.18}
18%|█▊ | 791/4506 [54:18<4:06:58, 3.99s/it]
18%|█▊ | 792/4506 [54:22<4:09:33, 4.03s/it]
{'loss': 0.3605, 'grad_norm': 0.47450676560401917, 'learning_rate': 4.913767051573907e-05, 'epoch': 0.18}
18%|█▊ | 792/4506 [54:22<4:09:33, 4.03s/it]
18%|█▊ | 793/4506 [54:26<4:11:49, 4.07s/it]
{'loss': 0.3725, 'grad_norm': 0.5910004377365112, 'learning_rate': 4.913262011022241e-05, 'epoch': 0.18}
18%|█▊ | 793/4506 [54:26<4:11:49, 4.07s/it]
18%|█▊ | 794/4506 [54:30<4:13:18, 4.09s/it]
{'loss': 0.3878, 'grad_norm': 0.5231766700744629, 'learning_rate': 4.91275552195729e-05, 'epoch': 0.18}
18%|█▊ | 794/4506 [54:30<4:13:18, 4.09s/it]
18%|█▊ | 795/4506 [54:35<4:19:37, 4.20s/it]
{'loss': 0.3727, 'grad_norm': 0.5057073831558228, 'learning_rate': 4.9122475846830616e-05, 'epoch': 0.18}
18%|█▊ | 795/4506 [54:35<4:19:37, 4.20s/it]
18%|█▊ | 796/4506 [54:39<4:18:38, 4.18s/it]
{'loss': 0.3561, 'grad_norm': 0.5611690878868103, 'learning_rate': 4.911738199504438e-05, 'epoch': 0.18}
18%|█▊ | 796/4506 [54:39<4:18:38, 4.18s/it]
18%|█▊ | 797/4506 [54:43<4:15:57, 4.14s/it]
{'loss': 0.3705, 'grad_norm': 0.5204333662986755, 'learning_rate': 4.911227366727166e-05, 'epoch': 0.18}
18%|█▊ | 797/4506 [54:43<4:15:57, 4.14s/it]
18%|█▊ | 798/4506 [54:47<4:15:57, 4.14s/it]
{'loss': 0.3739, 'grad_norm': 0.45705562829971313, 'learning_rate': 4.910715086657863e-05, 'epoch': 0.18}
18%|█▊ | 798/4506 [54:47<4:15:57, 4.14s/it]
18%|█▊ | 799/4506 [54:51<4:14:09, 4.11s/it]
{'loss': 0.3544, 'grad_norm': 0.4872042238712311, 'learning_rate': 4.9102013596040165e-05, 'epoch': 0.18}
18%|█▊ | 799/4506 [54:51<4:14:09, 4.11s/it]
18%|█▊ | 800/4506 [54:55<4:08:35, 4.02s/it]
{'loss': 0.3666, 'grad_norm': 0.7378105521202087, 'learning_rate': 4.90968618587398e-05, 'epoch': 0.18}
18%|█▊ | 800/4506 [54:55<4:08:35, 4.02s/it]
18%|█▊ | 801/4506 [54:59<4:10:33, 4.06s/it]
{'loss': 0.3631, 'grad_norm': 0.5583272576332092, 'learning_rate': 4.909169565776976e-05, 'epoch': 0.18}
18%|█▊ | 801/4506 [54:59<4:10:33, 4.06s/it]
18%|█▊ | 802/4506 [55:03<4:08:51, 4.03s/it]
{'loss': 0.3696, 'grad_norm': 0.6440222859382629, 'learning_rate': 4.908651499623097e-05, 'epoch': 0.18}
18%|█▊ | 802/4506 [55:03<4:08:51, 4.03s/it]
18%|█▊ | 803/4506 [55:07<4:01:54, 3.92s/it]
{'loss': 0.3645, 'grad_norm': 0.5123050212860107, 'learning_rate': 4.9081319877233e-05, 'epoch': 0.18}
18%|█▊ | 803/4506 [55:07<4:01:54, 3.92s/it]
18%|█▊ | 804/4506 [55:11<4:13:25, 4.11s/it]
{'loss': 0.3761, 'grad_norm': 0.47441282868385315, 'learning_rate': 4.907611030389414e-05, 'epoch': 0.18}
18%|█▊ | 804/4506 [55:11<4:13:25, 4.11s/it]
18%|█▊ | 805/4506 [55:15<4:15:35, 4.14s/it]
{'loss': 0.372, 'grad_norm': 0.5241721868515015, 'learning_rate': 4.907088627934133e-05, 'epoch': 0.18}
18%|█▊ | 805/4506 [55:15<4:15:35, 4.14s/it]
18%|█▊ | 806/4506 [55:19<4:13:00, 4.10s/it]
{'loss': 0.3699, 'grad_norm': 0.4699397385120392, 'learning_rate': 4.906564780671018e-05, 'epoch': 0.18}
18%|█▊ | 806/4506 [55:19<4:13:00, 4.10s/it]
18%|█▊ | 807/4506 [55:24<4:12:55, 4.10s/it]
{'loss': 0.3637, 'grad_norm': 0.56128990650177, 'learning_rate': 4.906039488914498e-05, 'epoch': 0.18}
18%|█▊ | 807/4506 [55:24<4:12:55, 4.10s/it]
18%|█▊ | 808/4506 [55:28<4:13:30, 4.11s/it]
{'loss': 0.3602, 'grad_norm': 0.5387994050979614, 'learning_rate': 4.90551275297987e-05, 'epoch': 0.18}
18%|█▊ | 808/4506 [55:28<4:13:30, 4.11s/it]
18%|█▊ | 809/4506 [55:32<4:14:35, 4.13s/it]
{'loss': 0.3625, 'grad_norm': 0.5961061716079712, 'learning_rate': 4.9049845731832965e-05, 'epoch': 0.18}
18%|█▊ | 810/4506 [55:36<4:12:33, 4.10s/it]
{'loss': 0.3623, 'grad_norm': 0.5614018440246582, 'learning_rate': 4.9044549498418074e-05, 'epoch': 0.18}
18%|█▊ | 811/4506 [55:40<4:18:01, 4.19s/it]
{'loss': 0.3669, 'grad_norm': 0.5512754321098328, 'learning_rate': 4.903923883273298e-05, 'epoch': 0.18}
18%|█▊ | 812/4506 [55:44<4:10:51, 4.07s/it]
{'loss': 0.3393, 'grad_norm': 0.5767865777015686, 'learning_rate': 4.903391373796531e-05, 'epoch': 0.18}
18%|█▊ | 813/4506 [55:48<4:11:53, 4.09s/it]
{'loss': 0.3602, 'grad_norm': 0.5709583759307861, 'learning_rate': 4.902857421731135e-05, 'epoch': 0.18}
18%|█▊ | 814/4506 [55:52<4:11:31, 4.09s/it]
{'loss': 0.3582, 'grad_norm': 0.5585377812385559, 'learning_rate': 4.902322027397604e-05, 'epoch': 0.18}
18%|█▊ | 815/4506 [55:57<4:17:41, 4.19s/it]
{'loss': 0.375, 'grad_norm': 0.5992165803909302, 'learning_rate': 4.901785191117299e-05, 'epoch': 0.18}
18%|█▊ | 816/4506 [56:01<4:15:24, 4.15s/it]
{'loss': 0.3629, 'grad_norm': 0.49100685119628906, 'learning_rate': 4.901246913212444e-05, 'epoch': 0.18}
18%|█▊ | 817/4506 [56:05<4:17:22, 4.19s/it]
{'loss': 0.3502, 'grad_norm': 0.5162002444267273, 'learning_rate': 4.90070719400613e-05, 'epoch': 0.18}
18%|█▊ | 818/4506 [56:09<4:11:18, 4.09s/it]
{'loss': 0.3613, 'grad_norm': 0.551860511302948, 'learning_rate': 4.9001660338223135e-05, 'epoch': 0.18}
18%|█▊ | 819/4506 [56:13<4:11:10, 4.09s/it]
{'loss': 0.3525, 'grad_norm': 0.5077301263809204, 'learning_rate': 4.8996234329858147e-05, 'epoch': 0.18}
18%|█▊ | 820/4506 [56:17<4:11:52, 4.10s/it]
{'loss': 0.3486, 'grad_norm': 0.47484511137008667, 'learning_rate': 4.89907939182232e-05, 'epoch': 0.18}
18%|█▊ | 821/4506 [56:21<4:13:44, 4.13s/it]
{'loss': 0.3842, 'grad_norm': 0.5109031200408936, 'learning_rate': 4.898533910658379e-05, 'epoch': 0.18}
18%|█▊ | 822/4506 [56:26<4:16:32, 4.18s/it]
{'loss': 0.3685, 'grad_norm': 0.5190032720565796, 'learning_rate': 4.8979869898214056e-05, 'epoch': 0.18}
18%|█▊ | 823/4506 [56:30<4:11:18, 4.09s/it]
{'loss': 0.3606, 'grad_norm': 0.6140127778053284, 'learning_rate': 4.897438629639678e-05, 'epoch': 0.18}
18%|█▊ | 824/4506 [56:34<4:19:35, 4.23s/it]
{'loss': 0.3812, 'grad_norm': 0.6212425827980042, 'learning_rate': 4.896888830442341e-05, 'epoch': 0.18}
18%|█▊ | 825/4506 [56:38<4:17:36, 4.20s/it]
{'loss': 0.3542, 'grad_norm': 0.4717709422111511, 'learning_rate': 4.896337592559398e-05, 'epoch': 0.18}
18%|█▊ | 826/4506 [56:42<4:13:26, 4.13s/it]
{'loss': 0.3774, 'grad_norm': 0.47531428933143616, 'learning_rate': 4.8957849163217206e-05, 'epoch': 0.18}
18%|█▊ | 827/4506 [56:47<4:18:11, 4.21s/it]
{'loss': 0.3705, 'grad_norm': 0.5891035199165344, 'learning_rate': 4.8952308020610416e-05, 'epoch': 0.18}
18%|█▊ | 828/4506 [56:51<4:17:05, 4.19s/it]
{'loss': 0.3613, 'grad_norm': 0.500027060508728, 'learning_rate': 4.8946752501099554e-05, 'epoch': 0.18}
18%|█▊ | 829/4506 [56:55<4:11:32, 4.10s/it]
{'loss': 0.3487, 'grad_norm': 0.4815438687801361, 'learning_rate': 4.8941182608019246e-05, 'epoch': 0.18}
18%|█▊ | 830/4506 [56:59<4:07:23, 4.04s/it]
{'loss': 0.3605, 'grad_norm': 0.5220358371734619, 'learning_rate': 4.893559834471268e-05, 'epoch': 0.18}
18%|█▊ | 831/4506 [57:02<4:05:51, 4.01s/it]
{'loss': 0.3532, 'grad_norm': 0.5614737868309021, 'learning_rate': 4.892999971453172e-05, 'epoch': 0.18}
18%|█▊ | 832/4506 [57:06<4:02:34, 3.96s/it]
{'loss': 0.3653, 'grad_norm': 0.5823272466659546, 'learning_rate': 4.892438672083682e-05, 'epoch': 0.18}
18%|█▊ | 833/4506 [57:11<4:13:00, 4.13s/it]
{'loss': 0.358, 'grad_norm': 0.5183751583099365, 'learning_rate': 4.891875936699708e-05, 'epoch': 0.18}
19%|█▊ | 834/4506 [57:15<4:08:10, 4.06s/it]
{'loss': 0.3521, 'grad_norm': 0.5866208076477051, 'learning_rate': 4.89131176563902e-05, 'epoch': 0.19}
19%|█▊ | 835/4506 [57:19<4:13:02, 4.14s/it]
{'loss': 0.3641, 'grad_norm': 0.6370099186897278, 'learning_rate': 4.8907461592402516e-05, 'epoch': 0.19}
19%|█▊ | 836/4506 [57:23<4:14:04, 4.15s/it]
{'loss': 0.3545, 'grad_norm': 0.5929888486862183, 'learning_rate': 4.890179117842897e-05, 'epoch': 0.19}
19%|█▊ | 837/4506 [57:28<4:17:01, 4.20s/it]
{'loss': 0.3627, 'grad_norm': 0.45720919966697693, 'learning_rate': 4.889610641787311e-05, 'epoch': 0.19}
19%|█▊ | 838/4506 [57:32<4:15:56, 4.19s/it]
{'loss': 0.3514, 'grad_norm': 0.4659166634082794, 'learning_rate': 4.88904073141471e-05, 'epoch': 0.19}
19%|█▊ | 839/4506 [57:36<4:15:26, 4.18s/it]
{'loss': 0.3674, 'grad_norm': 0.4884556531906128, 'learning_rate': 4.888469387067173e-05, 'epoch': 0.19}
19%|█▊ | 840/4506 [57:40<4:13:08, 4.14s/it]
{'loss': 0.3669, 'grad_norm': 0.44020143151283264, 'learning_rate': 4.887896609087637e-05, 'epoch': 0.19}
19%|█▊ | 841/4506 [57:44<4:11:32, 4.12s/it]
{'loss': 0.3611, 'grad_norm': 0.4937373101711273, 'learning_rate': 4.8873223978199e-05, 'epoch': 0.19}
19%|█▊ | 842/4506 [57:48<4:10:33, 4.10s/it]
{'loss': 0.3696, 'grad_norm': 0.5400654077529907, 'learning_rate': 4.886746753608623e-05, 'epoch': 0.19}
19%|█▊ | 843/4506 [57:52<4:06:32, 4.04s/it]
{'loss': 0.348, 'grad_norm': 0.4701763093471527, 'learning_rate': 4.886169676799325e-05, 'epoch': 0.19}
19%|█▊ | 844/4506 [57:56<4:07:17, 4.05s/it]
{'loss': 0.3715, 'grad_norm': 0.5148012638092041, 'learning_rate': 4.885591167738384e-05, 'epoch': 0.19}
19%|█▉ | 845/4506 [58:00<4:08:41, 4.08s/it]
{'loss': 0.3605, 'grad_norm': 0.5245968699455261, 'learning_rate': 4.8850112267730385e-05, 'epoch': 0.19}
19%|█▉ | 846/4506 [58:04<4:08:08, 4.07s/it]
{'loss': 0.3485, 'grad_norm': 0.502842128276825, 'learning_rate': 4.884429854251388e-05, 'epoch': 0.19}
19%|█▉ | 847/4506 [58:08<4:06:15, 4.04s/it]
{'loss': 0.3412, 'grad_norm': 0.4999314546585083, 'learning_rate': 4.883847050522389e-05, 'epoch': 0.19}
19%|█▉ | 848/4506 [58:12<4:09:54, 4.10s/it]
{'loss': 0.3597, 'grad_norm': 0.5491294264793396, 'learning_rate': 4.883262815935858e-05, 'epoch': 0.19}
19%|█▉ | 849/4506 [58:16<4:09:38, 4.10s/it]
{'loss': 0.361, 'grad_norm': 0.5315783023834229, 'learning_rate': 4.882677150842471e-05, 'epoch': 0.19}
19%|█▉ | 850/4506 [58:21<4:13:13, 4.16s/it]
{'loss': 0.3526, 'grad_norm': 0.477104514837265, 'learning_rate': 4.882090055593761e-05, 'epoch': 0.19}
19%|█▉ | 851/4506 [58:25<4:13:38, 4.16s/it]
{'loss': 0.3439, 'grad_norm': 0.5451700091362, 'learning_rate': 4.881501530542122e-05, 'epoch': 0.19}
19%|█▉ | 852/4506 [58:29<4:18:56, 4.25s/it]
{'loss': 0.3607, 'grad_norm': 0.5385378003120422, 'learning_rate': 4.880911576040804e-05, 'epoch': 0.19}
19%|█▉ | 853/4506 [58:33<4:13:53, 4.17s/it]
{'loss': 0.3483, 'grad_norm': 0.496227502822876, 'learning_rate': 4.880320192443916e-05, 'epoch': 0.19}
19%|█▉ | 854/4506 [58:37<4:12:20, 4.15s/it]
{'loss': 0.3511, 'grad_norm': 0.4724280834197998, 'learning_rate': 4.8797273801064226e-05, 'epoch': 0.19}
19%|█▉ | 855/4506 [58:43<4:28:15, 4.41s/it]
{'loss': 0.3548, 'grad_norm': 0.49358275532722473, 'learning_rate': 4.8791331393841494e-05, 'epoch': 0.19}
19%|█▉ | 856/4506 [58:46<4:16:55, 4.22s/it]
{'loss': 0.3574, 'grad_norm': 0.5620616674423218, 'learning_rate': 4.878537470633777e-05, 'epoch': 0.19}
19%|█▉ | 857/4506 [58:50<4:14:20, 4.18s/it]
{'loss': 0.38, 'grad_norm': 0.5207251906394958, 'learning_rate': 4.877940374212845e-05, 'epoch': 0.19}
19%|█▉ | 858/4506 [58:54<4:07:39, 4.07s/it]
{'loss': 0.3628, 'grad_norm': 0.5132770538330078, 'learning_rate': 4.877341850479748e-05, 'epoch': 0.19}
19%|█▉ | 859/4506 [58:58<4:08:01, 4.08s/it]
{'loss': 0.3442, 'grad_norm': 0.5137561559677124, 'learning_rate': 4.876741899793739e-05, 'epoch': 0.19}
19%|█▉ | 860/4506 [59:02<4:07:08, 4.07s/it]
{'loss': 0.359, 'grad_norm': 0.45541784167289734, 'learning_rate': 4.876140522514925e-05, 'epoch': 0.19}
19%|█▉ | 861/4506 [59:06<3:59:45, 3.95s/it]
{'loss': 0.3701, 'grad_norm': 0.4960521161556244, 'learning_rate': 4.8755377190042726e-05, 'epoch': 0.19}
19%|█▉ | 862/4506 [59:10<4:06:59, 4.07s/it]
{'loss': 0.3465, 'grad_norm': 0.5311806797981262, 'learning_rate': 4.874933489623602e-05, 'epoch': 0.19}
19%|█▉ | 863/4506 [59:14<4:06:34, 4.06s/it]
{'loss': 0.356, 'grad_norm': 0.4795973300933838, 'learning_rate': 4.87432783473559e-05, 'epoch': 0.19}
19%|█▉ | 864/4506 [59:19<4:10:56, 4.13s/it]
{'loss': 0.3618, 'grad_norm': 0.5357212424278259, 'learning_rate': 4.8737207547037686e-05, 'epoch': 0.19}
19%|█▉ | 865/4506 [59:23<4:08:04, 4.09s/it]
{'loss': 0.3461, 'grad_norm': 0.534323513507843, 'learning_rate': 4.8731122498925274e-05, 'epoch': 0.19}
19%|█▉ | 866/4506 [59:27<4:05:50, 4.05s/it]
{'loss': 0.3674, 'grad_norm': 0.49289223551750183, 'learning_rate': 4.872502320667108e-05, 'epoch': 0.19}
19%|█▉ | 867/4506 [59:31<4:02:21, 4.00s/it]
{'loss': 0.3621, 'grad_norm': 0.5112399458885193, 'learning_rate': 4.8718909673936096e-05, 'epoch': 0.19}
19%|█▉ | 868/4506 [59:35<4:11:35, 4.15s/it]
{'loss': 0.3495, 'grad_norm': 0.47224465012550354, 'learning_rate': 4.871278190438984e-05, 'epoch': 0.19}
19%|█▉ | 869/4506 [59:39<4:02:00, 3.99s/it]
{'loss': 0.3581, 'grad_norm': 0.5469279885292053, 'learning_rate': 4.8706639901710387e-05, 'epoch': 0.19}
19%|█▉ | 870/4506 [59:42<3:58:03, 3.93s/it]
{'loss': 0.3553, 'grad_norm': 0.5315027236938477, 'learning_rate': 4.870048366958436e-05, 'epoch': 0.19}
19%|█▉ | 871/4506 [59:46<3:56:03, 3.90s/it]
{'loss': 0.3758, 'grad_norm': 0.5056049823760986, 'learning_rate': 4.8694313211706915e-05, 'epoch': 0.19}
19%|█▉ | 872/4506 [59:50<4:01:28, 3.99s/it]
{'loss': 0.356, 'grad_norm': 0.487179160118103, 'learning_rate': 4.868812853178175e-05, 'epoch': 0.19}
19%|█▉ | 873/4506 [59:55<4:02:25, 4.00s/it]
{'loss': 0.3631, 'grad_norm': 0.4481738209724426, 'learning_rate': 4.8681929633521086e-05, 'epoch': 0.19}
19%|█▉ | 874/4506 [59:58<4:00:53, 3.98s/it]
{'loss': 0.3622, 'grad_norm': 0.4769934117794037, 'learning_rate': 4.8675716520645704e-05, 'epoch': 0.19}
19%|█▉ | 875/4506 [1:00:03<4:04:11, 4.04s/it]
{'loss': 0.343, 'grad_norm': 0.4811052083969116, 'learning_rate': 4.8669489196884896e-05, 'epoch': 0.19}
19%|█▉ | 876/4506 [1:00:07<4:04:10, 4.04s/it]
{'loss': 0.3336, 'grad_norm': 0.5428720116615295, 'learning_rate': 4.8663247665976504e-05, 'epoch': 0.19}
19%|█▉ | 877/4506 [1:00:11<4:04:03, 4.04s/it]
{'loss': 0.3587, 'grad_norm': 0.4879562556743622, 'learning_rate': 4.865699193166686e-05, 'epoch': 0.19}
19%|█▉ | 878/4506 [1:00:15<4:08:00, 4.10s/it]
{'loss': 0.3507, 'grad_norm': 0.5487425923347473, 'learning_rate': 4.8650721997710875e-05, 'epoch': 0.19}
20%|█▉ | 879/4506 [1:00:19<4:08:00, 4.10s/it]
{'loss': 0.3467, 'grad_norm': 0.5050961375236511, 'learning_rate': 4.8644437867871936e-05, 'epoch': 0.2}
20%|█▉ | 880/4506 [1:00:23<4:05:09, 4.06s/it]
{'loss': 0.3359, 'grad_norm': 0.47842079401016235, 'learning_rate': 4.863813954592197e-05, 'epoch': 0.2}
20%|█▉ | 881/4506 [1:00:27<4:05:52, 4.07s/it]
{'loss': 0.363, 'grad_norm': 0.48359063267707825, 'learning_rate': 4.863182703564142e-05, 'epoch': 0.2}
20%|█▉ | 882/4506 [1:00:31<4:07:46, 4.10s/it]
{'loss': 0.367, 'grad_norm': 0.4940342605113983, 'learning_rate': 4.862550034081926e-05, 'epoch': 0.2}
20%|█▉ | 883/4506 [1:00:36<4:18:55, 4.29s/it]
{'loss': 0.3323, 'grad_norm': 0.4696371257305145, 'learning_rate': 4.8619159465252956e-05, 'epoch': 0.2}
20%|█▉ | 884/4506 [1:00:40<4:20:40, 4.32s/it]
{'loss': 0.3622, 'grad_norm': 0.5181319713592529, 'learning_rate': 4.861280441274849e-05, 'epoch': 0.2}
20%|█▉ | 885/4506 [1:00:44<4:10:34, 4.15s/it]
{'loss': 0.3626, 'grad_norm': 0.6101993918418884, 'learning_rate': 4.860643518712037e-05, 'epoch': 0.2}
20%|█▉ | 886/4506 [1:00:48<4:04:04, 4.05s/it]
{'loss': 0.354, 'grad_norm': 0.5382705926895142, 'learning_rate': 4.8600051792191584e-05, 'epoch': 0.2}
20%|█▉ | 887/4506 [1:00:52<4:09:12, 4.13s/it]
{'loss': 0.3576, 'grad_norm': 0.4512154161930084, 'learning_rate': 4.859365423179365e-05, 'epoch': 0.2}
20%|█▉ | 888/4506 [1:00:56<4:02:03, 4.01s/it]
{'loss': 0.3717, 'grad_norm': 0.5500571727752686, 'learning_rate': 4.8587242509766575e-05, 'epoch': 0.2}
20%|█▉ | 889/4506 [1:01:00<3:54:31, 3.89s/it]
{'loss': 0.3365, 'grad_norm': 0.46582743525505066, 'learning_rate': 4.858081662995887e-05, 'epoch': 0.2}
20%|█▉ | 890/4506 [1:01:04<3:56:11, 3.92s/it]
{'loss': 0.3597, 'grad_norm': 0.6411876082420349, 'learning_rate': 4.8574376596227536e-05, 'epoch': 0.2}
20%|█▉ | 891/4506 [1:01:08<3:56:19, 3.92s/it]
{'loss': 0.3519, 'grad_norm': 0.44683220982551575, 'learning_rate': 4.8567922412438096e-05, 'epoch': 0.2}
20%|█▉ | 892/4506 [1:01:11<3:55:21, 3.91s/it]
{'loss': 0.369, 'grad_norm': 0.4959127604961395, 'learning_rate': 4.856145408246453e-05, 'epoch': 0.2}
20%|█▉ | 893/4506 [1:01:16<4:00:47, 4.00s/it]
{'loss': 0.3509, 'grad_norm': 0.46811968088150024, 'learning_rate': 4.8554971610189334e-05, 'epoch': 0.2}
20%|█▉ | 894/4506 [1:01:20<4:01:58, 4.02s/it]
{'loss': 0.3288, 'grad_norm': 0.45717111229896545, 'learning_rate': 4.854847499950349e-05, 'epoch': 0.2}
20%|█▉ | 895/4506 [1:01:24<4:02:13, 4.02s/it]
{'loss': 0.3422, 'grad_norm': 0.4503939151763916, 'learning_rate': 4.854196425430645e-05, 'epoch': 0.2}
20%|█▉ | 896/4506 [1:01:28<4:05:22, 4.08s/it]
{'loss': 0.3445, 'grad_norm': 0.4640904366970062, 'learning_rate': 4.853543937850617e-05, 'epoch': 0.2}
20%|█▉ | 897/4506 [1:01:33<4:14:49, 4.24s/it]
{'loss': 0.3525, 'grad_norm': 0.4766049385070801, 'learning_rate': 4.852890037601907e-05, 'epoch': 0.2}
20%|█▉ | 898/4506 [1:01:37<4:20:20, 4.33s/it]
{'loss': 0.3431, 'grad_norm': 0.44297584891319275, 'learning_rate': 4.852234725077007e-05, 'epoch': 0.2}
20%|█▉ | 899/4506 [1:01:41<4:09:27, 4.15s/it]
{'loss': 0.3411, 'grad_norm': 0.4860449433326721, 'learning_rate': 4.851578000669255e-05, 'epoch': 0.2}
20%|█▉ | 900/4506 [1:01:45<4:11:06, 4.18s/it]
{'loss': 0.3473, 'grad_norm': 0.4506709575653076, 'learning_rate': 4.850919864772836e-05, 'epoch': 0.2}
20%|█▉ | 901/4506 [1:01:49<4:11:39, 4.19s/it]
{'loss': 0.3602, 'grad_norm': 0.4923262298107147, 'learning_rate': 4.850260317782785e-05, 'epoch': 0.2}
20%|██ | 902/4506 [1:01:53<4:07:36, 4.12s/it]
{'loss': 0.3463, 'grad_norm': 0.5111044645309448, 'learning_rate': 4.849599360094981e-05, 'epoch': 0.2}
20%|██ | 903/4506 [1:01:57<4:08:34, 4.14s/it]
{'loss': 0.3599, 'grad_norm': 0.526656448841095, 'learning_rate': 4.848936992106151e-05, 'epoch': 0.2}
20%|██ | 904/4506 [1:02:01<4:03:47, 4.06s/it]
{'loss': 0.3547, 'grad_norm': 0.4895065128803253, 'learning_rate': 4.8482732142138685e-05, 'epoch': 0.2}
20%|██ | 905/4506 [1:02:05<4:04:12, 4.07s/it]
{'loss': 0.3571, 'grad_norm': 0.5365212559700012, 'learning_rate': 4.8476080268165536e-05, 'epoch': 0.2}
20%|██ | 906/4506 [1:02:09<3:57:37, 3.96s/it]
{'loss': 0.3577, 'grad_norm': 0.5706275105476379, 'learning_rate': 4.846941430313472e-05, 'epoch': 0.2}
20%|██ | 907/4506 [1:02:13<3:51:13, 3.85s/it]
{'loss': 0.3519, 'grad_norm': 0.573901891708374, 'learning_rate': 4.8462734251047344e-05, 'epoch': 0.2}
20%|██ | 908/4506 [1:02:17<3:55:53, 3.93s/it]
{'loss': 0.3343, 'grad_norm': 0.5520500540733337, 'learning_rate': 4.8456040115912984e-05, 'epoch': 0.2}
20%|██ | 909/4506 [1:02:21<3:55:25, 3.93s/it]
{'loss': 0.3419, 'grad_norm': 0.5229018926620483, 'learning_rate': 4.8449331901749665e-05, 'epoch': 0.2}
20%|██ | 910/4506 [1:02:25<3:58:29, 3.98s/it]
{'loss': 0.3384, 'grad_norm': 0.4812384247779846, 'learning_rate': 4.844260961258386e-05, 'epoch': 0.2}
20%|██ | 911/4506 [1:02:29<4:04:01, 4.07s/it]
{'loss': 0.3258, 'grad_norm': 0.41060665249824524, 'learning_rate': 4.843587325245048e-05, 'epoch': 0.2}
20%|██ | 912/4506 [1:02:33<4:03:58, 4.07s/it]
{'loss': 0.3549, 'grad_norm': 0.5399783849716187, 'learning_rate': 4.8429122825392916e-05, 'epoch': 0.2}
20%|██ | 913/4506 [1:02:37<4:00:02, 4.01s/it]
{'loss': 0.3555, 'grad_norm': 0.5642484426498413, 'learning_rate': 4.8422358335462965e-05, 'epoch': 0.2}
20%|██ | 914/4506 [1:02:41<3:54:57, 3.92s/it]
{'loss': 0.3483, 'grad_norm': 0.558139979839325, 'learning_rate': 4.8415579786720875e-05, 'epoch': 0.2}
20%|██ | 915/4506 [1:02:45<4:02:56, 4.06s/it]
{'loss': 0.3502, 'grad_norm': 0.5005956292152405, 'learning_rate': 4.840878718323536e-05, 'epoch': 0.2}
20%|██ | 916/4506 [1:02:49<4:05:13, 4.10s/it]
{'loss': 0.3628, 'grad_norm': 0.5238382816314697, 'learning_rate': 4.840198052908352e-05, 'epoch': 0.2}
20%|██ | 917/4506 [1:02:53<4:04:23, 4.09s/it]
{'loss': 0.3326, 'grad_norm': 0.5273259282112122, 'learning_rate': 4.839515982835093e-05, 'epoch': 0.2}
20%|██ | 918/4506 [1:02:57<4:01:22, 4.04s/it]
{'loss': 0.3508, 'grad_norm': 0.5948821306228638, 'learning_rate': 4.838832508513158e-05, 'epoch': 0.2}
20%|██ | 919/4506 [1:03:01<4:00:41, 4.03s/it]
{'loss': 0.3591, 'grad_norm': 0.527251124382019, 'learning_rate': 4.838147630352789e-05, 'epoch': 0.2}
20%|██ | 920/4506 [1:03:06<4:04:35, 4.09s/it]
{'loss': 0.3337, 'grad_norm': 0.6014145016670227, 'learning_rate': 4.837461348765071e-05, 'epoch': 0.2}
20%|██ | 921/4506 [1:03:10<4:07:12, 4.14s/it]
{'loss': 0.3491, 'grad_norm': 0.5146040320396423, 'learning_rate': 4.836773664161931e-05, 'epoch': 0.2}
20%|██ | 922/4506 [1:03:14<4:07:37, 4.15s/it]
{'loss': 0.3497, 'grad_norm': 0.45452776551246643, 'learning_rate': 4.836084576956138e-05, 'epoch': 0.2}
20%|██ | 923/4506 [1:03:18<4:11:14, 4.21s/it]
{'loss': 0.3532, 'grad_norm': 0.5670636892318726, 'learning_rate': 4.835394087561303e-05, 'epoch': 0.2}
21%|██ | 924/4506 [1:03:22<4:04:48, 4.10s/it]
{'loss': 0.3597, 'grad_norm': 0.5843803882598877, 'learning_rate': 4.83470219639188e-05, 'epoch': 0.21}
21%|██ | 925/4506 [1:03:26<4:03:35, 4.08s/it]
{'loss': 0.3647, 'grad_norm': 0.5659400224685669, 'learning_rate': 4.8340089038631605e-05, 'epoch': 0.21}
21%|██ | 926/4506 [1:03:30<4:01:20, 4.04s/it]
{'loss': 0.353, 'grad_norm': 0.5342445969581604, 'learning_rate': 4.8333142103912824e-05, 'epoch': 0.21}
21%|██ | 927/4506 [1:03:34<4:00:01, 4.02s/it]
{'loss': 0.3416, 'grad_norm': 0.5100891590118408, 'learning_rate': 4.83261811639322e-05, 'epoch': 0.21}
21%|██ | 928/4506 [1:03:38<4:02:21, 4.06s/it]
{'loss': 0.3528, 'grad_norm': 0.549517035484314, 'learning_rate': 4.831920622286792e-05, 'epoch': 0.21}
21%|██ | 929/4506 [1:03:43<4:14:41, 4.27s/it]
{'loss': 0.3392, 'grad_norm': 0.4837353527545929, 'learning_rate': 4.831221728490654e-05, 'epoch': 0.21}
21%|██ | 930/4506 [1:03:48<4:22:58, 4.41s/it]
{'loss': 0.3658, 'grad_norm': 0.4514012932777405, 'learning_rate': 4.830521435424305e-05, 'epoch': 0.21}
21%|██ | 931/4506 [1:03:52<4:14:00, 4.26s/it]
{'loss': 0.3387, 'grad_norm': 0.4873148202896118, 'learning_rate': 4.829819743508079e-05, 'epoch': 0.21}
21%|██ | 932/4506 [1:03:56<4:13:54, 4.26s/it]
{'loss': 0.3477, 'grad_norm': 0.4864300787448883, 'learning_rate': 4.829116653163155e-05, 'epoch': 0.21}
21%|██ | 933/4506 [1:04:00<4:08:38, 4.18s/it]
{'loss': 0.3695, 'grad_norm': 0.5428628921508789, 'learning_rate': 4.8284121648115504e-05, 'epoch': 0.21}
21%|██ | 934/4506 [1:04:04<4:11:43, 4.23s/it]
{'loss': 0.3571, 'grad_norm': 0.47131454944610596, 'learning_rate': 4.827706278876118e-05, 'epoch': 0.21}
21%|██ | 935/4506 [1:04:08<4:09:56, 4.20s/it]
{'loss': 0.3645, 'grad_norm': 0.5590959787368774, 'learning_rate': 4.8269989957805545e-05, 'epoch': 0.21}
21%|██ | 936/4506 [1:04:13<4:07:56, 4.17s/it]
{'loss': 0.3579, 'grad_norm': 0.48026955127716064, 'learning_rate': 4.826290315949391e-05, 'epoch': 0.21}
21%|██ | 937/4506 [1:04:16<4:04:15, 4.11s/it]
{'loss': 0.3476, 'grad_norm': 0.4822919964790344, 'learning_rate': 4.8255802398079985e-05, 'epoch': 0.21}
21%|██ | 938/4506 [1:04:20<4:02:48, 4.08s/it]
{'loss': 0.3336, 'grad_norm': 0.4845527708530426, 'learning_rate': 4.8248687677825874e-05, 'epoch': 0.21}
21%|██ | 939/4506 [1:04:25<4:07:47, 4.17s/it]
{'loss': 0.3447, 'grad_norm': 0.4498257637023926, 'learning_rate': 4.8241559003002044e-05, 'epoch': 0.21}
21%|██ | 940/4506 [1:04:29<4:03:42, 4.10s/it]
{'loss': 0.3503, 'grad_norm': 0.4560561776161194, 'learning_rate': 4.823441637788735e-05, 'epoch': 0.21}
21%|██ | 941/4506 [1:04:33<4:07:16, 4.16s/it]
{'loss': 0.3419, 'grad_norm': 0.5141482949256897, 'learning_rate': 4.8227259806769e-05, 'epoch': 0.21}
21%|██ | 942/4506 [1:04:37<4:01:18, 4.06s/it]
{'loss': 0.3353, 'grad_norm': 0.4651973843574524, 'learning_rate': 4.82200892939426e-05, 'epoch': 0.21}
21%|██ | 943/4506 [1:04:41<4:01:41, 4.07s/it]
{'loss': 0.3379, 'grad_norm': 0.44625386595726013, 'learning_rate': 4.821290484371208e-05, 'epoch': 0.21}
21%|██ | 944/4506 [1:04:45<4:00:01, 4.04s/it]
{'loss': 0.3377, 'grad_norm': 0.4548671543598175, 'learning_rate': 4.8205706460389805e-05, 'epoch': 0.21}
21%|██ | 945/4506 [1:04:49<3:59:44, 4.04s/it]
{'loss': 0.3538, 'grad_norm': 0.49626263976097107, 'learning_rate': 4.8198494148296434e-05, 'epoch': 0.21}
21%|██ | 946/4506 [1:04:53<3:53:22, 3.93s/it]
{'loss': 0.3476, 'grad_norm': 0.5538795590400696, 'learning_rate': 4.819126791176104e-05, 'epoch': 0.21}
21%|██ | 947/4506 [1:04:57<3:56:17, 3.98s/it]
{'loss': 0.3325, 'grad_norm': 0.4691388010978699, 'learning_rate': 4.8184027755121015e-05, 'epoch': 0.21}
21%|██ | 948/4506 [1:05:01<3:51:28, 3.90s/it]
{'loss': 0.328, 'grad_norm': 0.49976280331611633, 'learning_rate': 4.8176773682722124e-05, 'epoch': 0.21}
21%|██ | 949/4506 [1:05:05<4:06:40, 4.16s/it]
{'loss': 0.3318, 'grad_norm': 0.45006391406059265, 'learning_rate': 4.816950569891848e-05, 'epoch': 0.21}
21%|██ | 950/4506 [1:05:10<4:10:20, 4.22s/it]
{'loss': 0.3402, 'grad_norm': 0.5148518085479736, 'learning_rate': 4.816222380807255e-05, 'epoch': 0.21}
21%|██ | 951/4506 [1:05:14<4:03:38, 4.11s/it]
{'loss': 0.3355, 'grad_norm': 0.5268358588218689, 'learning_rate': 4.815492801455515e-05, 'epoch': 0.21}
21%|██ | 952/4506 [1:05:18<4:07:33, 4.18s/it]
{'loss': 0.3481, 'grad_norm': 0.5018934011459351, 'learning_rate': 4.814761832274543e-05, 'epoch': 0.21}
21%|██ | 953/4506 [1:05:22<4:03:54, 4.12s/it]
{'loss': 0.3367, 'grad_norm': 0.4892141819000244, 'learning_rate': 4.814029473703089e-05, 'epoch': 0.21}
21%|██ | 954/4506 [1:05:26<4:05:54, 4.15s/it]
{'loss': 0.3663, 'grad_norm': 0.5614310503005981, 'learning_rate': 4.8132957261807385e-05, 'epoch': 0.21}
21%|██ | 955/4506 [1:05:30<4:01:16, 4.08s/it]
{'loss': 0.354, 'grad_norm': 0.5788315534591675, 'learning_rate': 4.812560590147907e-05, 'epoch': 0.21}
21%|██ | 956/4506 [1:05:34<4:01:22, 4.08s/it]
{'loss': 0.3383, 'grad_norm': 0.5178674459457397, 'learning_rate': 4.811824066045846e-05, 'epoch': 0.21}
21%|██ | 957/4506 [1:05:38<3:59:14, 4.04s/it]
{'loss': 0.33, 'grad_norm': 0.4888642132282257, 'learning_rate': 4.811086154316641e-05, 'epoch': 0.21}
21%|██▏ | 958/4506 [1:05:42<3:54:47, 3.97s/it]
{'loss': 0.3666, 'grad_norm': 0.6172877550125122, 'learning_rate': 4.810346855403207e-05, 'epoch': 0.21}
21%|██▏ | 959/4506 [1:05:46<3:52:46, 3.94s/it]
{'loss': 0.3337, 'grad_norm': 0.4769025146961212, 'learning_rate': 4.8096061697492955e-05, 'epoch': 0.21}
21%|██▏ | 960/4506 [1:05:50<3:55:04, 3.98s/it]
{'loss': 0.348, 'grad_norm': 0.47906795144081116, 'learning_rate': 4.808864097799488e-05, 'epoch': 0.21}
21%|██▏ | 961/4506 [1:05:54<3:58:51, 4.04s/it]
{'loss': 0.3268, 'grad_norm': 0.48362088203430176, 'learning_rate': 4.808120639999198e-05, 'epoch': 0.21}
21%|██▏ | 962/4506 [1:05:58<4:00:23, 4.07s/it]
{'loss': 0.3312, 'grad_norm': 0.43774309754371643, 'learning_rate': 4.8073757967946734e-05, 'epoch': 0.21}
21%|██▏ | 963/4506 [1:06:02<4:02:20, 4.10s/it]
{'loss': 0.3569, 'grad_norm': 0.5678131580352783, 'learning_rate': 4.806629568632989e-05, 'epoch': 0.21}
21%|██▏ | 964/4506 [1:06:06<3:58:46, 4.04s/it]
{'loss': 0.3544, 'grad_norm': 0.6362777352333069, 'learning_rate': 4.8058819559620547e-05, 'epoch': 0.21}
21%|██▏ | 965/4506 [1:06:10<4:03:28, 4.13s/it]
{'loss': 0.3352, 'grad_norm': 0.5175965428352356, 'learning_rate': 4.8051329592306116e-05, 'epoch': 0.21}
21%|██▏ | 966/4506 [1:06:15<4:04:17, 4.14s/it]
{'loss': 0.3387, 'grad_norm': 0.5319568514823914, 'learning_rate': 4.804382578888229e-05, 'epoch': 0.21}
21%|██▏ | 967/4506 [1:06:18<3:57:52, 4.03s/it]
{'loss': 0.3379, 'grad_norm': 0.528131365776062, 'learning_rate': 4.803630815385309e-05, 'epoch': 0.21}
21%|██▏ | 968/4506 [1:06:23<3:58:38, 4.05s/it]
{'loss': 0.3438, 'grad_norm': 0.503853440284729, 'learning_rate': 4.8028776691730816e-05, 'epoch': 0.21}
22%|██▏ | 969/4506 [1:06:27<3:59:09, 4.06s/it]
{'loss': 0.3277, 'grad_norm': 0.46644800901412964, 'learning_rate': 4.802123140703609e-05, 'epoch': 0.22}
22%|██▏ | 970/4506 [1:06:31<3:59:57, 4.07s/it]
{'loss': 0.3412, 'grad_norm': 0.5294196605682373, 'learning_rate': 4.801367230429783e-05, 'epoch': 0.22}
22%|██▏ | 971/4506 [1:06:36<4:17:25, 4.37s/it]
{'loss': 0.3234, 'grad_norm': 0.5439848303794861, 'learning_rate': 4.8006099388053215e-05, 'epoch': 0.22}
22%|██▏ | 972/4506 [1:06:40<4:10:57, 4.26s/it]
{'loss': 0.3422, 'grad_norm': 0.5305350422859192, 'learning_rate': 4.799851266284776e-05, 'epoch': 0.22}
22%|██▏ | 972/4506 [1:06:40<4:10:57, 4.26s/it]
22%|██▏ | 973/4506 [1:06:44<4:04:51, 4.16s/it]
{'loss': 0.3375, 'grad_norm': 0.5044872760772705, 'learning_rate': 4.799091213323524e-05, 'epoch': 0.22}
22%|██▏ | 973/4506 [1:06:44<4:04:51, 4.16s/it]
22%|██▏ | 974/4506 [1:06:48<4:06:42, 4.19s/it]
{'loss': 0.3492, 'grad_norm': 0.4778393805027008, 'learning_rate': 4.798329780377773e-05, 'epoch': 0.22}
22%|██▏ | 974/4506 [1:06:48<4:06:42, 4.19s/it]
22%|██▏ | 975/4506 [1:06:52<4:04:08, 4.15s/it]
{'loss': 0.3423, 'grad_norm': 0.46754327416419983, 'learning_rate': 4.7975669679045576e-05, 'epoch': 0.22}
22%|██▏ | 975/4506 [1:06:52<4:04:08, 4.15s/it]
22%|██▏ | 976/4506 [1:06:56<4:01:11, 4.10s/it]
{'loss': 0.3539, 'grad_norm': 0.49049869179725647, 'learning_rate': 4.796802776361741e-05, 'epoch': 0.22}
22%|██▏ | 976/4506 [1:06:56<4:01:11, 4.10s/it]
22%|██▏ | 977/4506 [1:07:01<4:11:14, 4.27s/it]
{'loss': 0.3515, 'grad_norm': 0.46700748801231384, 'learning_rate': 4.796037206208015e-05, 'epoch': 0.22}
22%|██▏ | 977/4506 [1:07:01<4:11:14, 4.27s/it]
22%|██▏ | 978/4506 [1:07:04<4:02:26, 4.12s/it]
{'loss': 0.3514, 'grad_norm': 0.5266455411911011, 'learning_rate': 4.7952702579028976e-05, 'epoch': 0.22}
22%|██▏ | 978/4506 [1:07:04<4:02:26, 4.12s/it]
22%|██▏ | 979/4506 [1:07:09<4:06:10, 4.19s/it]
{'loss': 0.3446, 'grad_norm': 0.47876670956611633, 'learning_rate': 4.794501931906735e-05, 'epoch': 0.22}
22%|██▏ | 979/4506 [1:07:09<4:06:10, 4.19s/it]
22%|██▏ | 980/4506 [1:07:13<4:05:34, 4.18s/it]
{'loss': 0.3442, 'grad_norm': 0.5344295501708984, 'learning_rate': 4.793732228680698e-05, 'epoch': 0.22}
22%|██▏ | 980/4506 [1:07:13<4:05:34, 4.18s/it]
22%|██▏ | 981/4506 [1:07:17<3:56:15, 4.02s/it]
{'loss': 0.3213, 'grad_norm': 0.47935086488723755, 'learning_rate': 4.792961148686789e-05, 'epoch': 0.22}
22%|██▏ | 981/4506 [1:07:17<3:56:15, 4.02s/it]
22%|██▏ | 982/4506 [1:07:20<3:52:00, 3.95s/it]
{'loss': 0.3473, 'grad_norm': 0.47288310527801514, 'learning_rate': 4.792188692387831e-05, 'epoch': 0.22}
22%|██▏ | 982/4506 [1:07:20<3:52:00, 3.95s/it]
22%|██▏ | 983/4506 [1:07:24<3:53:30, 3.98s/it]
{'loss': 0.3334, 'grad_norm': 0.5839184522628784, 'learning_rate': 4.7914148602474776e-05, 'epoch': 0.22}
22%|██▏ | 983/4506 [1:07:24<3:53:30, 3.98s/it]
22%|██▏ | 984/4506 [1:07:29<3:58:30, 4.06s/it]
{'loss': 0.3357, 'grad_norm': 0.45943039655685425, 'learning_rate': 4.790639652730205e-05, 'epoch': 0.22}
22%|██▏ | 984/4506 [1:07:29<3:58:30, 4.06s/it]
22%|██▏ | 985/4506 [1:07:33<3:54:16, 3.99s/it]
{'loss': 0.3331, 'grad_norm': 0.5600707530975342, 'learning_rate': 4.789863070301316e-05, 'epoch': 0.22}
22%|██▏ | 985/4506 [1:07:33<3:54:16, 3.99s/it]
22%|██▏ | 986/4506 [1:07:36<3:53:07, 3.97s/it]
{'loss': 0.3417, 'grad_norm': 0.5005229115486145, 'learning_rate': 4.7890851134269405e-05, 'epoch': 0.22}
22%|██▏ | 986/4506 [1:07:36<3:53:07, 3.97s/it]
22%|██▏ | 987/4506 [1:07:40<3:51:09, 3.94s/it]
{'loss': 0.3375, 'grad_norm': 0.5214852094650269, 'learning_rate': 4.788305782574031e-05, 'epoch': 0.22}
22%|██▏ | 987/4506 [1:07:40<3:51:09, 3.94s/it]
22%|██▏ | 988/4506 [1:07:44<3:51:00, 3.94s/it]
{'loss': 0.3246, 'grad_norm': 0.45248714089393616, 'learning_rate': 4.7875250782103666e-05, 'epoch': 0.22}
22%|██▏ | 988/4506 [1:07:44<3:51:00, 3.94s/it]
22%|██▏ | 989/4506 [1:07:48<3:53:59, 3.99s/it]
{'loss': 0.3128, 'grad_norm': 0.43983832001686096, 'learning_rate': 4.786743000804548e-05, 'epoch': 0.22}
22%|██▏ | 989/4506 [1:07:48<3:53:59, 3.99s/it]
22%|██▏ | 990/4506 [1:07:53<4:01:02, 4.11s/it]
{'loss': 0.328, 'grad_norm': 0.4780474007129669, 'learning_rate': 4.785959550826004e-05, 'epoch': 0.22}
22%|██▏ | 990/4506 [1:07:53<4:01:02, 4.11s/it]
22%|██▏ | 991/4506 [1:07:57<3:56:49, 4.04s/it]
{'loss': 0.3292, 'grad_norm': 0.5202018022537231, 'learning_rate': 4.7851747287449836e-05, 'epoch': 0.22}
22%|██▏ | 991/4506 [1:07:57<3:56:49, 4.04s/it]
22%|██▏ | 992/4506 [1:08:01<3:59:30, 4.09s/it]
{'loss': 0.3284, 'grad_norm': 0.48413076996803284, 'learning_rate': 4.7843885350325614e-05, 'epoch': 0.22}
22%|██▏ | 992/4506 [1:08:01<3:59:30, 4.09s/it]
22%|██▏ | 993/4506 [1:08:05<3:56:55, 4.05s/it]
{'loss': 0.3473, 'grad_norm': 0.5167239904403687, 'learning_rate': 4.783600970160634e-05, 'epoch': 0.22}
22%|██▏ | 993/4506 [1:08:05<3:56:55, 4.05s/it]
22%|██▏ | 994/4506 [1:08:09<3:54:40, 4.01s/it]
{'loss': 0.3269, 'grad_norm': 0.45683905482292175, 'learning_rate': 4.782812034601923e-05, 'epoch': 0.22}
22%|██▏ | 994/4506 [1:08:09<3:54:40, 4.01s/it]
22%|██▏ | 995/4506 [1:08:13<3:55:09, 4.02s/it]
{'loss': 0.3374, 'grad_norm': 0.4455682039260864, 'learning_rate': 4.782021728829971e-05, 'epoch': 0.22}
22%|██▏ | 995/4506 [1:08:13<3:55:09, 4.02s/it]
22%|██▏ | 996/4506 [1:08:17<4:06:13, 4.21s/it]
{'loss': 0.3513, 'grad_norm': 0.5147238969802856, 'learning_rate': 4.781230053319144e-05, 'epoch': 0.22}
22%|██▏ | 996/4506 [1:08:17<4:06:13, 4.21s/it]
22%|██▏ | 997/4506 [1:08:21<3:59:30, 4.10s/it]
{'loss': 0.321, 'grad_norm': 0.5629688501358032, 'learning_rate': 4.7804370085446296e-05, 'epoch': 0.22}
22%|██▏ | 997/4506 [1:08:21<3:59:30, 4.10s/it]
22%|██▏ | 998/4506 [1:08:26<4:03:05, 4.16s/it]
{'loss': 0.3417, 'grad_norm': 0.5234940648078918, 'learning_rate': 4.7796425949824357e-05, 'epoch': 0.22}
22%|██▏ | 998/4506 [1:08:26<4:03:05, 4.16s/it]
22%|██▏ | 999/4506 [1:08:30<4:06:53, 4.22s/it]
{'loss': 0.33, 'grad_norm': 0.47641825675964355, 'learning_rate': 4.778846813109396e-05, 'epoch': 0.22}
22%|██▏ | 999/4506 [1:08:30<4:06:53, 4.22s/it]
22%|██▏ | 1000/4506 [1:08:34<4:04:46, 4.19s/it]
{'loss': 0.3208, 'grad_norm': 0.48093581199645996, 'learning_rate': 4.7780496634031615e-05, 'epoch': 0.22}
22%|██▏ | 1000/4506 [1:08:34<4:04:46, 4.19s/it]
22%|██▏ | 1001/4506 [1:08:38<4:00:37, 4.12s/it]
{'loss': 0.3412, 'grad_norm': 0.4855981767177582, 'learning_rate': 4.7772511463422067e-05, 'epoch': 0.22}
22%|██▏ | 1001/4506 [1:08:38<4:00:37, 4.12s/it]
22%|██▏ | 1002/4506 [1:08:42<3:57:08, 4.06s/it]
{'loss': 0.3308, 'grad_norm': 0.5081371068954468, 'learning_rate': 4.7764512624058245e-05, 'epoch': 0.22}
22%|██▏ | 1002/4506 [1:08:42<3:57:08, 4.06s/it]
22%|██▏ | 1003/4506 [1:08:46<3:58:09, 4.08s/it]
{'loss': 0.3317, 'grad_norm': 0.5346722602844238, 'learning_rate': 4.775650012074131e-05, 'epoch': 0.22}
22%|██▏ | 1003/4506 [1:08:46<3:58:09, 4.08s/it]
22%|██▏ | 1004/4506 [1:08:50<3:59:51, 4.11s/it]
{'loss': 0.334, 'grad_norm': 0.47872957587242126, 'learning_rate': 4.77484739582806e-05, 'epoch': 0.22}
22%|██▏ | 1004/4506 [1:08:50<3:59:51, 4.11s/it]
22%|██▏ | 1005/4506 [1:08:54<3:57:21, 4.07s/it]
{'loss': 0.3446, 'grad_norm': 0.4837908148765564, 'learning_rate': 4.774043414149366e-05, 'epoch': 0.22}
22%|██▏ | 1005/4506 [1:08:54<3:57:21, 4.07s/it]
22%|██▏ | 1006/4506 [1:08:58<3:59:25, 4.10s/it]
{'loss': 0.3265, 'grad_norm': 0.5037238597869873, 'learning_rate': 4.7732380675206245e-05, 'epoch': 0.22}
22%|██▏ | 1006/4506 [1:08:58<3:59:25, 4.10s/it]
22%|██▏ | 1007/4506 [1:09:02<3:59:45, 4.11s/it]
{'loss': 0.3164, 'grad_norm': 0.4370296001434326, 'learning_rate': 4.772431356425229e-05, 'epoch': 0.22}
22%|██▏ | 1007/4506 [1:09:02<3:59:45, 4.11s/it]
22%|██▏ | 1008/4506 [1:09:07<4:03:24, 4.18s/it]
{'loss': 0.3347, 'grad_norm': 0.4778129458427429, 'learning_rate': 4.7716232813473905e-05, 'epoch': 0.22}
22%|██▏ | 1008/4506 [1:09:07<4:03:24, 4.18s/it]
22%|██▏ | 1009/4506 [1:09:11<4:03:04, 4.17s/it]
{'loss': 0.3256, 'grad_norm': 0.4933745861053467, 'learning_rate': 4.770813842772142e-05, 'epoch': 0.22}
22%|██▏ | 1009/4506 [1:09:11<4:03:04, 4.17s/it]
22%|██▏ | 1010/4506 [1:09:15<4:00:15, 4.12s/it]
{'loss': 0.3328, 'grad_norm': 0.4944606125354767, 'learning_rate': 4.7700030411853314e-05, 'epoch': 0.22}
22%|██▏ | 1010/4506 [1:09:15<4:00:15, 4.12s/it]
22%|██▏ | 1011/4506 [1:09:19<4:04:52, 4.20s/it]
{'loss': 0.348, 'grad_norm': 0.5137065649032593, 'learning_rate': 4.769190877073627e-05, 'epoch': 0.22}
22%|██▏ | 1011/4506 [1:09:19<4:04:52, 4.20s/it]
22%|██▏ | 1012/4506 [1:09:24<4:03:48, 4.19s/it]
{'loss': 0.3533, 'grad_norm': 0.5130427479743958, 'learning_rate': 4.768377350924516e-05, 'epoch': 0.22}
22%|██▏ | 1012/4506 [1:09:24<4:03:48, 4.19s/it]
22%|██▏ | 1013/4506 [1:09:27<3:59:06, 4.11s/it]
{'loss': 0.3425, 'grad_norm': 0.4496150612831116, 'learning_rate': 4.7675624632263e-05, 'epoch': 0.22}
22%|██▏ | 1013/4506 [1:09:27<3:59:06, 4.11s/it]
23%|██▎ | 1014/4506 [1:09:32<4:00:09, 4.13s/it]
{'loss': 0.3317, 'grad_norm': 0.7348712086677551, 'learning_rate': 4.7667462144681e-05, 'epoch': 0.23}
23%|██▎ | 1014/4506 [1:09:32<4:00:09, 4.13s/it]
23%|██▎ | 1015/4506 [1:09:36<4:01:11, 4.15s/it]
{'loss': 0.3352, 'grad_norm': 0.4478227496147156, 'learning_rate': 4.765928605139852e-05, 'epoch': 0.23}
23%|██▎ | 1015/4506 [1:09:36<4:01:11, 4.15s/it]
23%|██▎ | 1016/4506 [1:09:40<3:58:48, 4.11s/it]
{'loss': 0.3342, 'grad_norm': 0.49006763100624084, 'learning_rate': 4.765109635732312e-05, 'epoch': 0.23}
23%|██▎ | 1016/4506 [1:09:40<3:58:48, 4.11s/it]
23%|██▎ | 1017/4506 [1:09:44<3:55:03, 4.04s/it]
{'loss': 0.333, 'grad_norm': 0.45797547698020935, 'learning_rate': 4.76428930673705e-05, 'epoch': 0.23}
23%|██▎ | 1017/4506 [1:09:44<3:55:03, 4.04s/it]
23%|██▎ | 1018/4506 [1:09:48<3:58:15, 4.10s/it]
{'loss': 0.3229, 'grad_norm': 0.47444579005241394, 'learning_rate': 4.7634676186464506e-05, 'epoch': 0.23}
23%|██▎ | 1018/4506 [1:09:48<3:58:15, 4.10s/it]
23%|██▎ | 1019/4506 [1:09:52<3:57:50, 4.09s/it]
{'loss': 0.3324, 'grad_norm': 0.5201365351676941, 'learning_rate': 4.762644571953718e-05, 'epoch': 0.23}
23%|██▎ | 1019/4506 [1:09:52<3:57:50, 4.09s/it]
23%|██▎ | 1020/4506 [1:09:56<3:54:16, 4.03s/it]
{'loss': 0.325, 'grad_norm': 0.5018166899681091, 'learning_rate': 4.761820167152869e-05, 'epoch': 0.23}
23%|██▎ | 1020/4506 [1:09:56<3:54:16, 4.03s/it]
23%|██▎ | 1021/4506 [1:10:00<3:54:42, 4.04s/it]
{'loss': 0.3461, 'grad_norm': 0.5344541072845459, 'learning_rate': 4.760994404738737e-05, 'epoch': 0.23}
23%|██▎ | 1021/4506 [1:10:00<3:54:42, 4.04s/it]
23%|██▎ | 1022/4506 [1:10:04<3:59:22, 4.12s/it]
{'loss': 0.3163, 'grad_norm': 0.4347385764122009, 'learning_rate': 4.760167285206968e-05, 'epoch': 0.23}
23%|██▎ | 1022/4506 [1:10:04<3:59:22, 4.12s/it]
23%|██▎ | 1023/4506 [1:10:08<3:51:47, 3.99s/it]
{'loss': 0.3214, 'grad_norm': 0.5251062512397766, 'learning_rate': 4.759338809054027e-05, 'epoch': 0.23}
23%|██▎ | 1023/4506 [1:10:08<3:51:47, 3.99s/it]
23%|██▎ | 1024/4506 [1:10:12<3:52:53, 4.01s/it]
{'loss': 0.3289, 'grad_norm': 0.510996401309967, 'learning_rate': 4.758508976777188e-05, 'epoch': 0.23}
23%|██▎ | 1024/4506 [1:10:12<3:52:53, 4.01s/it]
23%|██▎ | 1025/4506 [1:10:16<3:48:01, 3.93s/it]
{'loss': 0.3579, 'grad_norm': 0.56916743516922, 'learning_rate': 4.7576777888745436e-05, 'epoch': 0.23}
23%|██▎ | 1025/4506 [1:10:16<3:48:01, 3.93s/it]
23%|██▎ | 1026/4506 [1:10:20<3:57:29, 4.09s/it]
{'loss': 0.3473, 'grad_norm': 0.5006111860275269, 'learning_rate': 4.7568452458449975e-05, 'epoch': 0.23}
23%|██▎ | 1026/4506 [1:10:20<3:57:29, 4.09s/it]
23%|██▎ | 1027/4506 [1:10:24<3:55:07, 4.06s/it]
{'loss': 0.3525, 'grad_norm': 0.521885097026825, 'learning_rate': 4.7560113481882676e-05, 'epoch': 0.23}
23%|██▎ | 1027/4506 [1:10:24<3:55:07, 4.06s/it]
23%|██▎ | 1028/4506 [1:10:28<3:48:00, 3.93s/it]
{'loss': 0.332, 'grad_norm': 0.4769597053527832, 'learning_rate': 4.755176096404883e-05, 'epoch': 0.23}
23%|██▎ | 1028/4506 [1:10:28<3:48:00, 3.93s/it]
23%|██▎ | 1029/4506 [1:10:32<3:46:39, 3.91s/it]
{'loss': 0.3311, 'grad_norm': 0.4636962115764618, 'learning_rate': 4.7543394909961884e-05, 'epoch': 0.23}
23%|██▎ | 1029/4506 [1:10:32<3:46:39, 3.91s/it]
23%|██▎ | 1030/4506 [1:10:36<3:44:40, 3.88s/it]
{'loss': 0.3341, 'grad_norm': 0.5036718249320984, 'learning_rate': 4.753501532464341e-05, 'epoch': 0.23}
23%|██▎ | 1030/4506 [1:10:36<3:44:40, 3.88s/it]
23%|██▎ | 1031/4506 [1:10:39<3:42:32, 3.84s/it]
{'loss': 0.3303, 'grad_norm': 0.5656996369361877, 'learning_rate': 4.7526622213123065e-05, 'epoch': 0.23}
23%|██▎ | 1031/4506 [1:10:39<3:42:32, 3.84s/it]
23%|██▎ | 1032/4506 [1:10:43<3:42:34, 3.84s/it]
{'loss': 0.3407, 'grad_norm': 0.5450674891471863, 'learning_rate': 4.751821558043866e-05, 'epoch': 0.23}
23%|██▎ | 1032/4506 [1:10:43<3:42:34, 3.84s/it]
23%|██▎ | 1033/4506 [1:10:47<3:46:02, 3.91s/it]
{'loss': 0.3475, 'grad_norm': 0.5897658467292786, 'learning_rate': 4.750979543163613e-05, 'epoch': 0.23}
23%|██▎ | 1033/4506 [1:10:47<3:46:02, 3.91s/it]
23%|██▎ | 1034/4506 [1:10:51<3:49:11, 3.96s/it]
{'loss': 0.3285, 'grad_norm': 0.49034908413887024, 'learning_rate': 4.750136177176948e-05, 'epoch': 0.23}
23%|██▎ | 1034/4506 [1:10:51<3:49:11, 3.96s/it]
23%|██▎ | 1035/4506 [1:10:55<3:51:16, 4.00s/it]
{'loss': 0.339, 'grad_norm': 0.5617565512657166, 'learning_rate': 4.749291460590086e-05, 'epoch': 0.23}
23%|██▎ | 1035/4506 [1:10:55<3:51:16, 4.00s/it]
23%|██▎ | 1036/4506 [1:10:59<3:51:08, 4.00s/it]
{'loss': 0.3281, 'grad_norm': 0.46420377492904663, 'learning_rate': 4.7484453939100516e-05, 'epoch': 0.23}
23%|██▎ | 1036/4506 [1:10:59<3:51:08, 4.00s/it]
23%|██▎ | 1037/4506 [1:11:04<3:59:36, 4.14s/it]
{'loss': 0.3427, 'grad_norm': 0.46072348952293396, 'learning_rate': 4.74759797764468e-05, 'epoch': 0.23}
23%|██▎ | 1037/4506 [1:11:04<3:59:36, 4.14s/it]
23%|██▎ | 1038/4506 [1:11:08<3:55:30, 4.07s/it]
{'loss': 0.3329, 'grad_norm': 0.5428562164306641, 'learning_rate': 4.746749212302615e-05, 'epoch': 0.23}
23%|██▎ | 1038/4506 [1:11:08<3:55:30, 4.07s/it]
23%|██▎ | 1039/4506 [1:11:12<3:53:38, 4.04s/it]
{'loss': 0.3362, 'grad_norm': 0.4781440794467926, 'learning_rate': 4.745899098393313e-05, 'epoch': 0.23}
23%|██▎ | 1039/4506 [1:11:12<3:53:38, 4.04s/it]
23%|██▎ | 1040/4506 [1:11:16<3:52:52, 4.03s/it]
{'loss': 0.3322, 'grad_norm': 0.47443804144859314, 'learning_rate': 4.745047636427037e-05, 'epoch': 0.23}
23%|██▎ | 1040/4506 [1:11:16<3:52:52, 4.03s/it]
23%|██▎ | 1041/4506 [1:11:20<3:49:27, 3.97s/it]
{'loss': 0.3206, 'grad_norm': 0.4460103511810303, 'learning_rate': 4.7441948269148604e-05, 'epoch': 0.23}
23%|██▎ | 1041/4506 [1:11:20<3:49:27, 3.97s/it]
23%|██▎ | 1042/4506 [1:11:24<4:04:13, 4.23s/it]
{'loss': 0.3447, 'grad_norm': 0.5517868995666504, 'learning_rate': 4.743340670368667e-05, 'epoch': 0.23}
23%|██▎ | 1042/4506 [1:11:24<4:04:13, 4.23s/it]
23%|██▎ | 1043/4506 [1:11:28<4:00:45, 4.17s/it]
{'loss': 0.34, 'grad_norm': 0.5375611186027527, 'learning_rate': 4.742485167301146e-05, 'epoch': 0.23}
23%|██▎ | 1043/4506 [1:11:28<4:00:45, 4.17s/it]
23%|██▎ | 1044/4506 [1:11:32<3:55:44, 4.09s/it]
{'loss': 0.3333, 'grad_norm': 0.5106503963470459, 'learning_rate': 4.7416283182257965e-05, 'epoch': 0.23}
23%|██▎ | 1044/4506 [1:11:32<3:55:44, 4.09s/it]
23%|██▎ | 1045/4506 [1:11:37<4:06:41, 4.28s/it]
{'loss': 0.3554, 'grad_norm': 0.5061504244804382, 'learning_rate': 4.7407701236569254e-05, 'epoch': 0.23}
23%|██▎ | 1045/4506 [1:11:37<4:06:41, 4.28s/it]
23%|██▎ | 1046/4506 [1:11:41<4:04:54, 4.25s/it]
{'loss': 0.3143, 'grad_norm': 0.47808587551116943, 'learning_rate': 4.739910584109648e-05, 'epoch': 0.23}
23%|██▎ | 1046/4506 [1:11:41<4:04:54, 4.25s/it]
23%|██▎ | 1047/4506 [1:11:45<4:03:33, 4.22s/it]
{'loss': 0.3284, 'grad_norm': 0.4767548739910126, 'learning_rate': 4.739049700099886e-05, 'epoch': 0.23}
23%|██▎ | 1047/4506 [1:11:45<4:03:33, 4.22s/it]
23%|██▎ | 1048/4506 [1:11:49<3:58:52, 4.14s/it]
{'loss': 0.3182, 'grad_norm': 0.4805763363838196, 'learning_rate': 4.738187472144367e-05, 'epoch': 0.23}
23%|██▎ | 1048/4506 [1:11:49<3:58:52, 4.14s/it]
23%|██▎ | 1049/4506 [1:11:53<3:56:57, 4.11s/it]
{'loss': 0.3351, 'grad_norm': 0.5595769882202148, 'learning_rate': 4.737323900760628e-05, 'epoch': 0.23}
23%|██▎ | 1049/4506 [1:11:53<3:56:57, 4.11s/it]
23%|██▎ | 1050/4506 [1:11:58<4:02:37, 4.21s/it]
{'loss': 0.3362, 'grad_norm': 0.49028271436691284, 'learning_rate': 4.7364589864670085e-05, 'epoch': 0.23}
23%|██▎ | 1050/4506 [1:11:58<4:02:37, 4.21s/it]
23%|██▎ | 1051/4506 [1:12:02<4:02:16, 4.21s/it]
{'loss': 0.3209, 'grad_norm': 0.4317382574081421, 'learning_rate': 4.735592729782659e-05, 'epoch': 0.23}
23%|██▎ | 1051/4506 [1:12:02<4:02:16, 4.21s/it]
23%|██▎ | 1052/4506 [1:12:06<4:02:21, 4.21s/it]
{'loss': 0.3238, 'grad_norm': 0.4985319972038269, 'learning_rate': 4.734725131227532e-05, 'epoch': 0.23}
23%|██▎ | 1052/4506 [1:12:06<4:02:21, 4.21s/it]
23%|██▎ | 1053/4506 [1:12:10<3:58:19, 4.14s/it]
{'loss': 0.3415, 'grad_norm': 0.4906201660633087, 'learning_rate': 4.7338561913223864e-05, 'epoch': 0.23}
23%|██▎ | 1053/4506 [1:12:10<3:58:19, 4.14s/it]
23%|██▎ | 1054/4506 [1:12:14<3:56:57, 4.12s/it]
{'loss': 0.3216, 'grad_norm': 0.5531723499298096, 'learning_rate': 4.732985910588786e-05, 'epoch': 0.23}
23%|██▎ | 1054/4506 [1:12:14<3:56:57, 4.12s/it]
23%|██▎ | 1055/4506 [1:12:19<4:03:27, 4.23s/it]
{'loss': 0.3452, 'grad_norm': 0.5443403720855713, 'learning_rate': 4.732114289549101e-05, 'epoch': 0.23}
23%|██▎ | 1055/4506 [1:12:19<4:03:27, 4.23s/it]
23%|██▎ | 1056/4506 [1:12:23<3:56:57, 4.12s/it]
{'loss': 0.3236, 'grad_norm': 0.47934669256210327, 'learning_rate': 4.731241328726503e-05, 'epoch': 0.23}
23%|██▎ | 1056/4506 [1:12:23<3:56:57, 4.12s/it]
23%|██▎ | 1057/4506 [1:12:27<4:01:47, 4.21s/it]
{'loss': 0.3058, 'grad_norm': 0.4135453701019287, 'learning_rate': 4.730367028644972e-05, 'epoch': 0.23}
23%|██▎ | 1057/4506 [1:12:27<4:01:47, 4.21s/it]
23%|██▎ | 1058/4506 [1:12:31<3:58:56, 4.16s/it]
{'loss': 0.3318, 'grad_norm': 0.5203267931938171, 'learning_rate': 4.729491389829288e-05, 'epoch': 0.23}
23%|██▎ | 1058/4506 [1:12:31<3:58:56, 4.16s/it]
24%|██▎ | 1059/4506 [1:12:35<3:54:33, 4.08s/it]
{'loss': 0.3199, 'grad_norm': 0.4467572569847107, 'learning_rate': 4.728614412805037e-05, 'epoch': 0.24}
24%|██▎ | 1059/4506 [1:12:35<3:54:33, 4.08s/it]
24%|██▎ | 1060/4506 [1:12:39<3:54:07, 4.08s/it]
{'loss': 0.3237, 'grad_norm': 0.4934834837913513, 'learning_rate': 4.727736098098605e-05, 'epoch': 0.24}
24%|██▎ | 1060/4506 [1:12:39<3:54:07, 4.08s/it]
24%|██▎ | 1061/4506 [1:12:43<3:49:40, 4.00s/it]
{'loss': 0.3323, 'grad_norm': 0.5561627745628357, 'learning_rate': 4.7268564462371865e-05, 'epoch': 0.24}
24%|██▎ | 1061/4506 [1:12:43<3:49:40, 4.00s/it]
24%|██▎ | 1062/4506 [1:12:47<3:52:49, 4.06s/it]
{'loss': 0.3229, 'grad_norm': 0.5162324905395508, 'learning_rate': 4.725975457748773e-05, 'epoch': 0.24}
24%|██▎ | 1062/4506 [1:12:47<3:52:49, 4.06s/it]
24%|██▎ | 1063/4506 [1:12:52<4:02:35, 4.23s/it]
{'loss': 0.3314, 'grad_norm': 0.5172332525253296, 'learning_rate': 4.7250931331621615e-05, 'epoch': 0.24}
24%|██▎ | 1063/4506 [1:12:52<4:02:35, 4.23s/it]
24%|██▎ | 1064/4506 [1:12:56<3:58:12, 4.15s/it]
{'loss': 0.3285, 'grad_norm': 0.4816140830516815, 'learning_rate': 4.72420947300695e-05, 'epoch': 0.24}
24%|██▎ | 1064/4506 [1:12:56<3:58:12, 4.15s/it]
24%|██▎ | 1065/4506 [1:13:00<3:53:44, 4.08s/it]
{'loss': 0.3306, 'grad_norm': 0.5057365894317627, 'learning_rate': 4.7233244778135376e-05, 'epoch': 0.24}
24%|██▎ | 1065/4506 [1:13:00<3:53:44, 4.08s/it]
24%|██▎ | 1066/4506 [1:13:04<3:53:16, 4.07s/it]
{'loss': 0.3067, 'grad_norm': 0.4274426996707916, 'learning_rate': 4.7224381481131264e-05, 'epoch': 0.24}
24%|██▎ | 1066/4506 [1:13:04<3:53:16, 4.07s/it]
24%|██▎ | 1067/4506 [1:13:08<3:52:04, 4.05s/it]
{'loss': 0.3264, 'grad_norm': 0.4109852910041809, 'learning_rate': 4.721550484437717e-05, 'epoch': 0.24}
24%|██▎ | 1067/4506 [1:13:08<3:52:04, 4.05s/it]
24%|██▎ | 1068/4506 [1:13:12<3:52:56, 4.07s/it]
{'loss': 0.3257, 'grad_norm': 0.47978517413139343, 'learning_rate': 4.720661487320114e-05, 'epoch': 0.24}
24%|██▎ | 1068/4506 [1:13:12<3:52:56, 4.07s/it]
24%|██▎ | 1069/4506 [1:13:16<3:58:50, 4.17s/it]
{'loss': 0.3224, 'grad_norm': 0.46805286407470703, 'learning_rate': 4.7197711572939185e-05, 'epoch': 0.24}
24%|██▎ | 1069/4506 [1:13:16<3:58:50, 4.17s/it]
24%|██▎ | 1070/4506 [1:13:20<3:52:24, 4.06s/it]
{'loss': 0.3282, 'grad_norm': 0.5648146271705627, 'learning_rate': 4.7188794948935356e-05, 'epoch': 0.24}
24%|██▎ | 1070/4506 [1:13:20<3:52:24, 4.06s/it]
24%|██▍ | 1071/4506 [1:13:24<3:51:51, 4.05s/it]
{'loss': 0.3155, 'grad_norm': 0.470395028591156, 'learning_rate': 4.7179865006541666e-05, 'epoch': 0.24}
24%|██▍ | 1071/4506 [1:13:24<3:51:51, 4.05s/it]
24%|██▍ | 1072/4506 [1:13:28<3:49:24, 4.01s/it]
{'loss': 0.3259, 'grad_norm': 0.4746118187904358, 'learning_rate': 4.7170921751118145e-05, 'epoch': 0.24}
24%|██▍ | 1072/4506 [1:13:28<3:49:24, 4.01s/it]
24%|██▍ | 1073/4506 [1:13:32<3:54:10, 4.09s/it]
{'loss': 0.3214, 'grad_norm': 0.4902520179748535, 'learning_rate': 4.7161965188032814e-05, 'epoch': 0.24}
24%|██▍ | 1073/4506 [1:13:32<3:54:10, 4.09s/it]
24%|██▍ | 1074/4506 [1:13:36<3:53:17, 4.08s/it]
{'loss': 0.3175, 'grad_norm': 0.45181456208229065, 'learning_rate': 4.7152995322661664e-05, 'epoch': 0.24}
24%|██▍ | 1074/4506 [1:13:36<3:53:17, 4.08s/it]
24%|██▍ | 1075/4506 [1:13:40<3:56:16, 4.13s/it]
{'loss': 0.3301, 'grad_norm': 0.50949627161026, 'learning_rate': 4.7144012160388684e-05, 'epoch': 0.24}
24%|██▍ | 1075/4506 [1:13:40<3:56:16, 4.13s/it]
24%|██▍ | 1076/4506 [1:13:44<3:52:28, 4.07s/it]
{'loss': 0.317, 'grad_norm': 0.66161048412323, 'learning_rate': 4.7135015706605845e-05, 'epoch': 0.24}
24%|██▍ | 1076/4506 [1:13:44<3:52:28, 4.07s/it]
24%|██▍ | 1077/4506 [1:13:49<3:53:23, 4.08s/it]
{'loss': 0.3347, 'grad_norm': 0.4967692792415619, 'learning_rate': 4.712600596671309e-05, 'epoch': 0.24}
24%|██▍ | 1077/4506 [1:13:49<3:53:23, 4.08s/it]
24%|██▍ | 1078/4506 [1:13:52<3:47:49, 3.99s/it]
{'loss': 0.3206, 'grad_norm': 0.48877108097076416, 'learning_rate': 4.711698294611834e-05, 'epoch': 0.24}
24%|██▍ | 1078/4506 [1:13:52<3:47:49, 3.99s/it]
24%|██▍ | 1079/4506 [1:13:56<3:49:27, 4.02s/it]
{'loss': 0.3208, 'grad_norm': 0.43694400787353516, 'learning_rate': 4.7107946650237476e-05, 'epoch': 0.24}
24%|██▍ | 1079/4506 [1:13:56<3:49:27, 4.02s/it]
24%|██▍ | 1080/4506 [1:14:00<3:46:51, 3.97s/it]
{'loss': 0.3096, 'grad_norm': 0.45256170630455017, 'learning_rate': 4.709889708449438e-05, 'epoch': 0.24}
24%|██▍ | 1080/4506 [1:14:00<3:46:51, 3.97s/it]
24%|██▍ | 1081/4506 [1:14:04<3:48:03, 4.00s/it]
{'loss': 0.3076, 'grad_norm': 0.5378621816635132, 'learning_rate': 4.7089834254320854e-05, 'epoch': 0.24}
24%|██▍ | 1081/4506 [1:14:04<3:48:03, 4.00s/it]
24%|██▍ | 1082/4506 [1:14:08<3:51:11, 4.05s/it]
{'loss': 0.3165, 'grad_norm': 0.4611240327358246, 'learning_rate': 4.708075816515669e-05, 'epoch': 0.24}
24%|██▍ | 1082/4506 [1:14:08<3:51:11, 4.05s/it]
24%|██▍ | 1083/4506 [1:14:13<3:57:31, 4.16s/it]
{'loss': 0.3269, 'grad_norm': 0.5139226913452148, 'learning_rate': 4.707166882244964e-05, 'epoch': 0.24}
24%|██▍ | 1083/4506 [1:14:13<3:57:31, 4.16s/it]
24%|██▍ | 1084/4506 [1:14:17<3:54:59, 4.12s/it]
{'loss': 0.3101, 'grad_norm': 0.43561989068984985, 'learning_rate': 4.7062566231655406e-05, 'epoch': 0.24}
24%|██▍ | 1084/4506 [1:14:17<3:54:59, 4.12s/it]
24%|██▍ | 1085/4506 [1:14:21<3:58:22, 4.18s/it]
{'loss': 0.3118, 'grad_norm': 0.4405480921268463, 'learning_rate': 4.705345039823763e-05, 'epoch': 0.24}
24%|██▍ | 1085/4506 [1:14:21<3:58:22, 4.18s/it]
24%|██▍ | 1086/4506 [1:14:25<3:52:51, 4.09s/it]
{'loss': 0.329, 'grad_norm': 0.508369505405426, 'learning_rate': 4.704432132766792e-05, 'epoch': 0.24}
24%|██▍ | 1086/4506 [1:14:25<3:52:51, 4.09s/it]
24%|██▍ | 1087/4506 [1:14:29<3:52:03, 4.07s/it]
{'loss': 0.3226, 'grad_norm': 0.5172295570373535, 'learning_rate': 4.7035179025425824e-05, 'epoch': 0.24}
24%|██▍ | 1087/4506 [1:14:29<3:52:03, 4.07s/it]
24%|██▍ | 1088/4506 [1:14:33<3:51:38, 4.07s/it]
{'loss': 0.335, 'grad_norm': 0.5109332203865051, 'learning_rate': 4.7026023496998814e-05, 'epoch': 0.24}
24%|██▍ | 1088/4506 [1:14:33<3:51:38, 4.07s/it]
24%|██▍ | 1089/4506 [1:14:38<3:56:41, 4.16s/it]
{'loss': 0.3235, 'grad_norm': 0.4930461049079895, 'learning_rate': 4.701685474788234e-05, 'epoch': 0.24}
24%|██▍ | 1089/4506 [1:14:38<3:56:41, 4.16s/it]
24%|██▍ | 1090/4506 [1:14:41<3:51:32, 4.07s/it]
{'loss': 0.3366, 'grad_norm': 0.4896828532218933, 'learning_rate': 4.700767278357975e-05, 'epoch': 0.24}
24%|██▍ | 1090/4506 [1:14:41<3:51:32, 4.07s/it]
24%|██▍ | 1091/4506 [1:14:46<4:06:28, 4.33s/it]
{'loss': 0.324, 'grad_norm': 0.49440789222717285, 'learning_rate': 4.699847760960233e-05, 'epoch': 0.24}
24%|██▍ | 1091/4506 [1:14:46<4:06:28, 4.33s/it]
24%|██▍ | 1092/4506 [1:14:51<4:07:27, 4.35s/it]
{'loss': 0.3197, 'grad_norm': 0.4724733233451843, 'learning_rate': 4.6989269231469326e-05, 'epoch': 0.24}
24%|██▍ | 1092/4506 [1:14:51<4:07:27, 4.35s/it]
24%|██▍ | 1093/4506 [1:14:55<4:00:49, 4.23s/it]
{'loss': 0.3353, 'grad_norm': 0.4885287880897522, 'learning_rate': 4.698004765470787e-05, 'epoch': 0.24}
24%|██▍ | 1093/4506 [1:14:55<4:00:49, 4.23s/it]
24%|██▍ | 1094/4506 [1:14:59<4:02:05, 4.26s/it]
{'loss': 0.3372, 'grad_norm': 0.44743451476097107, 'learning_rate': 4.697081288485303e-05, 'epoch': 0.24}
24%|██▍ | 1094/4506 [1:14:59<4:02:05, 4.26s/it]
24%|██▍ | 1095/4506 [1:15:03<4:03:59, 4.29s/it]
{'loss': 0.3299, 'grad_norm': 0.4971596598625183, 'learning_rate': 4.696156492744782e-05, 'epoch': 0.24}
24%|██▍ | 1095/4506 [1:15:03<4:03:59, 4.29s/it]
24%|██▍ | 1096/4506 [1:15:08<4:02:44, 4.27s/it]
{'loss': 0.3272, 'grad_norm': 0.562667965888977, 'learning_rate': 4.6952303788043116e-05, 'epoch': 0.24}
24%|██▍ | 1096/4506 [1:15:08<4:02:44, 4.27s/it]
24%|██▍ | 1097/4506 [1:15:12<3:56:53, 4.17s/it]
{'loss': 0.3168, 'grad_norm': 0.46656858921051025, 'learning_rate': 4.6943029472197755e-05, 'epoch': 0.24}
24%|██▍ | 1097/4506 [1:15:12<3:56:53, 4.17s/it]
24%|██▍ | 1098/4506 [1:15:15<3:52:15, 4.09s/it]
{'loss': 0.3222, 'grad_norm': 0.44530314207077026, 'learning_rate': 4.693374198547845e-05, 'epoch': 0.24}
24%|██▍ | 1098/4506 [1:15:15<3:52:15, 4.09s/it]
24%|██▍ | 1099/4506 [1:15:20<3:55:11, 4.14s/it]
{'loss': 0.3318, 'grad_norm': 0.6667030453681946, 'learning_rate': 4.692444133345984e-05, 'epoch': 0.24}
24%|██▍ | 1099/4506 [1:15:20<3:55:11, 4.14s/it]
24%|██▍ | 1100/4506 [1:15:24<3:54:33, 4.13s/it]
{'loss': 0.3281, 'grad_norm': 0.5857779383659363, 'learning_rate': 4.691512752172447e-05, 'epoch': 0.24}
24%|██▍ | 1100/4506 [1:15:24<3:54:33, 4.13s/it]
24%|██▍ | 1101/4506 [1:15:28<3:55:33, 4.15s/it]
{'loss': 0.3319, 'grad_norm': 0.6606380343437195, 'learning_rate': 4.690580055586274e-05, 'epoch': 0.24}
24%|██▍ | 1101/4506 [1:15:28<3:55:33, 4.15s/it]
24%|██▍ | 1102/4506 [1:15:32<3:56:28, 4.17s/it]
{'loss': 0.3036, 'grad_norm': 0.4373786151409149, 'learning_rate': 4.689646044147302e-05, 'epoch': 0.24}
24%|██▍ | 1102/4506 [1:15:32<3:56:28, 4.17s/it]
24%|██▍ | 1103/4506 [1:15:36<3:56:16, 4.17s/it]
{'loss': 0.2995, 'grad_norm': 0.47336363792419434, 'learning_rate': 4.688710718416152e-05, 'epoch': 0.24}
24%|██▍ | 1103/4506 [1:15:36<3:56:16, 4.17s/it]
25%|██▍ | 1104/4506 [1:15:41<3:57:50, 4.19s/it]
{'loss': 0.328, 'grad_norm': 0.5019001960754395, 'learning_rate': 4.6877740789542324e-05, 'epoch': 0.25}
25%|██▍ | 1104/4506 [1:15:41<3:57:50, 4.19s/it]
25%|██▍ | 1105/4506 [1:15:45<3:59:17, 4.22s/it]
{'loss': 0.3134, 'grad_norm': 0.4733075201511383, 'learning_rate': 4.686836126323745e-05, 'epoch': 0.25}
25%|██▍ | 1105/4506 [1:15:45<3:59:17, 4.22s/it]
25%|██▍ | 1106/4506 [1:15:50<4:05:24, 4.33s/it]
{'loss': 0.3251, 'grad_norm': 0.46504664421081543, 'learning_rate': 4.685896861087677e-05, 'epoch': 0.25}
25%|██▍ | 1106/4506 [1:15:50<4:05:24, 4.33s/it]
25%|██▍ | 1107/4506 [1:15:54<4:04:09, 4.31s/it]
{'loss': 0.3126, 'grad_norm': 0.42828941345214844, 'learning_rate': 4.684956283809804e-05, 'epoch': 0.25}
25%|██▍ | 1107/4506 [1:15:54<4:04:09, 4.31s/it]
25%|██▍ | 1108/4506 [1:15:58<3:56:01, 4.17s/it]
{'loss': 0.3178, 'grad_norm': 0.47993552684783936, 'learning_rate': 4.684014395054689e-05, 'epoch': 0.25}
25%|██▍ | 1108/4506 [1:15:58<3:56:01, 4.17s/it]
25%|██▍ | 1109/4506 [1:16:02<4:00:56, 4.26s/it]
{'loss': 0.304, 'grad_norm': 0.5155718922615051, 'learning_rate': 4.6830711953876825e-05, 'epoch': 0.25}
25%|██▍ | 1109/4506 [1:16:02<4:00:56, 4.26s/it]
25%|██▍ | 1110/4506 [1:16:06<3:51:02, 4.08s/it]
{'loss': 0.3166, 'grad_norm': 0.5086363554000854, 'learning_rate': 4.682126685374921e-05, 'epoch': 0.25}
25%|██▍ | 1110/4506 [1:16:06<3:51:02, 4.08s/it]
25%|██▍ | 1111/4506 [1:16:10<3:52:41, 4.11s/it]
{'loss': 0.3391, 'grad_norm': 0.4864940643310547, 'learning_rate': 4.68118086558333e-05, 'epoch': 0.25}
25%|██▍ | 1111/4506 [1:16:10<3:52:41, 4.11s/it]
25%|██▍ | 1112/4506 [1:16:14<3:51:39, 4.10s/it]
{'loss': 0.3154, 'grad_norm': 0.47457051277160645, 'learning_rate': 4.6802337365806166e-05, 'epoch': 0.25}
25%|██▍ | 1112/4506 [1:16:14<3:51:39, 4.10s/it]
25%|██▍ | 1113/4506 [1:16:18<3:48:45, 4.05s/it]
{'loss': 0.3272, 'grad_norm': 0.5324479341506958, 'learning_rate': 4.6792852989352784e-05, 'epoch': 0.25}
25%|██▍ | 1113/4506 [1:16:18<3:48:45, 4.05s/it]
25%|██▍ | 1114/4506 [1:16:23<3:59:08, 4.23s/it]
{'loss': 0.3234, 'grad_norm': 0.4954822063446045, 'learning_rate': 4.678335553216595e-05, 'epoch': 0.25}
25%|██▍ | 1114/4506 [1:16:23<3:59:08, 4.23s/it]
25%|██▍ | 1115/4506 [1:16:26<3:52:08, 4.11s/it]
{'loss': 0.3252, 'grad_norm': 0.4830090403556824, 'learning_rate': 4.6773844999946345e-05, 'epoch': 0.25}
25%|██▍ | 1115/4506 [1:16:26<3:52:08, 4.11s/it]
25%|██▍ | 1116/4506 [1:16:31<3:53:03, 4.12s/it]
{'loss': 0.3114, 'grad_norm': 0.45952528715133667, 'learning_rate': 4.676432139840247e-05, 'epoch': 0.25}
25%|██▍ | 1116/4506 [1:16:31<3:53:03, 4.12s/it]
25%|██▍ | 1117/4506 [1:16:34<3:48:26, 4.04s/it]
{'loss': 0.3238, 'grad_norm': 0.4149326682090759, 'learning_rate': 4.675478473325068e-05, 'epoch': 0.25}
25%|██▍ | 1117/4506 [1:16:34<3:48:26, 4.04s/it]
25%|██▍ | 1118/4506 [1:16:38<3:45:13, 3.99s/it]
{'loss': 0.3278, 'grad_norm': 0.5789448618888855, 'learning_rate': 4.674523501021517e-05, 'epoch': 0.25}
25%|██▍ | 1118/4506 [1:16:38<3:45:13, 3.99s/it]
25%|██▍ | 1119/4506 [1:16:43<3:52:07, 4.11s/it]
{'loss': 0.3161, 'grad_norm': 0.4935961663722992, 'learning_rate': 4.6735672235027984e-05, 'epoch': 0.25}
25%|██▍ | 1119/4506 [1:16:43<3:52:07, 4.11s/it]
25%|██▍ | 1120/4506 [1:16:47<3:54:53, 4.16s/it]
{'loss': 0.3179, 'grad_norm': 0.39668312668800354, 'learning_rate': 4.672609641342898e-05, 'epoch': 0.25}
25%|██▍ | 1120/4506 [1:16:47<3:54:53, 4.16s/it]
25%|██▍ | 1121/4506 [1:16:51<3:58:29, 4.23s/it]
{'loss': 0.3247, 'grad_norm': 0.555705726146698, 'learning_rate': 4.671650755116586e-05, 'epoch': 0.25}
25%|██▍ | 1121/4506 [1:16:51<3:58:29, 4.23s/it]
25%|██▍ | 1122/4506 [1:16:56<3:59:32, 4.25s/it]
{'loss': 0.3175, 'grad_norm': 0.525314450263977, 'learning_rate': 4.670690565399415e-05, 'epoch': 0.25}
25%|██▍ | 1122/4506 [1:16:56<3:59:32, 4.25s/it]
25%|██▍ | 1123/4506 [1:17:00<3:53:18, 4.14s/it]
{'loss': 0.3046, 'grad_norm': 0.45874539017677307, 'learning_rate': 4.669729072767721e-05, 'epoch': 0.25}
25%|██▍ | 1123/4506 [1:17:00<3:53:18, 4.14s/it]
25%|██▍ | 1124/4506 [1:17:04<3:57:56, 4.22s/it]
{'loss': 0.3118, 'grad_norm': 0.4433734118938446, 'learning_rate': 4.66876627779862e-05, 'epoch': 0.25}
25%|██▍ | 1124/4506 [1:17:04<3:57:56, 4.22s/it]
25%|██▍ | 1125/4506 [1:17:08<4:01:05, 4.28s/it]
{'loss': 0.3215, 'grad_norm': 0.5004403591156006, 'learning_rate': 4.667802181070011e-05, 'epoch': 0.25}
25%|██▍ | 1125/4506 [1:17:08<4:01:05, 4.28s/it]
25%|██▍ | 1126/4506 [1:17:12<3:54:56, 4.17s/it]
{'loss': 0.3093, 'grad_norm': 0.47928571701049805, 'learning_rate': 4.666836783160575e-05, 'epoch': 0.25}
25%|██▍ | 1126/4506 [1:17:12<3:54:56, 4.17s/it]
25%|██▌ | 1127/4506 [1:17:16<3:53:41, 4.15s/it]
{'loss': 0.3231, 'grad_norm': 0.5049845576286316, 'learning_rate': 4.6658700846497725e-05, 'epoch': 0.25}
25%|██▌ | 1127/4506 [1:17:16<3:53:41, 4.15s/it]
25%|██▌ | 1128/4506 [1:17:21<3:55:55, 4.19s/it]
{'loss': 0.3149, 'grad_norm': 0.43471163511276245, 'learning_rate': 4.664902086117845e-05, 'epoch': 0.25}
25%|██▌ | 1128/4506 [1:17:21<3:55:55, 4.19s/it]
25%|██▌ | 1129/4506 [1:17:25<3:50:05, 4.09s/it]
{'loss': 0.3083, 'grad_norm': 0.47622624039649963, 'learning_rate': 4.663932788145816e-05, 'epoch': 0.25}
25%|██▌ | 1129/4506 [1:17:25<3:50:05, 4.09s/it]
25%|██▌ | 1130/4506 [1:17:29<3:53:39, 4.15s/it]
{'loss': 0.3242, 'grad_norm': 0.4784678816795349, 'learning_rate': 4.662962191315486e-05, 'epoch': 0.25}
25%|██▌ | 1130/4506 [1:17:29<3:53:39, 4.15s/it]
25%|██▌ | 1131/4506 [1:17:33<3:56:54, 4.21s/it]
{'loss': 0.3223, 'grad_norm': 0.497305303812027, 'learning_rate': 4.661990296209439e-05, 'epoch': 0.25}
25%|██▌ | 1131/4506 [1:17:33<3:56:54, 4.21s/it]
25%|██▌ | 1132/4506 [1:17:37<3:52:09, 4.13s/it]
{'loss': 0.3033, 'grad_norm': 0.4262154698371887, 'learning_rate': 4.661017103411033e-05, 'epoch': 0.25}
25%|██▌ | 1133/4506 [1:17:41<3:55:29, 4.19s/it]
{'loss': 0.332, 'grad_norm': 0.4893549084663391, 'learning_rate': 4.660042613504411e-05, 'epoch': 0.25}
25%|██▌ | 1134/4506 [1:17:46<3:57:27, 4.23s/it]
{'loss': 0.3117, 'grad_norm': 0.4383372962474823, 'learning_rate': 4.6590668270744886e-05, 'epoch': 0.25}
25%|██▌ | 1135/4506 [1:17:50<3:53:05, 4.15s/it]
{'loss': 0.3046, 'grad_norm': 0.5229694247245789, 'learning_rate': 4.6580897447069646e-05, 'epoch': 0.25}
25%|██▌ | 1136/4506 [1:17:54<3:50:52, 4.11s/it]
{'loss': 0.304, 'grad_norm': 0.48106953501701355, 'learning_rate': 4.657111366988313e-05, 'epoch': 0.25}
25%|██▌ | 1137/4506 [1:17:58<3:49:48, 4.09s/it]
{'loss': 0.3141, 'grad_norm': 0.44744187593460083, 'learning_rate': 4.6561316945057856e-05, 'epoch': 0.25}
25%|██▌ | 1138/4506 [1:18:02<3:47:19, 4.05s/it]
{'loss': 0.2967, 'grad_norm': 0.4895053803920746, 'learning_rate': 4.655150727847412e-05, 'epoch': 0.25}
25%|██▌ | 1139/4506 [1:18:06<3:50:37, 4.11s/it]
{'loss': 0.323, 'grad_norm': 0.466607004404068, 'learning_rate': 4.654168467601997e-05, 'epoch': 0.25}
25%|██▌ | 1140/4506 [1:18:10<3:50:20, 4.11s/it]
{'loss': 0.3323, 'grad_norm': 0.5484172105789185, 'learning_rate': 4.653184914359124e-05, 'epoch': 0.25}
25%|██▌ | 1141/4506 [1:18:14<3:51:20, 4.12s/it]
{'loss': 0.3302, 'grad_norm': 0.4452046751976013, 'learning_rate': 4.652200068709153e-05, 'epoch': 0.25}
25%|██▌ | 1142/4506 [1:18:18<3:47:51, 4.06s/it]
{'loss': 0.3376, 'grad_norm': 0.4811553657054901, 'learning_rate': 4.6512139312432166e-05, 'epoch': 0.25}
25%|██▌ | 1143/4506 [1:18:23<3:53:38, 4.17s/it]
{'loss': 0.3124, 'grad_norm': 0.4843727648258209, 'learning_rate': 4.650226502553225e-05, 'epoch': 0.25}
25%|██▌ | 1144/4506 [1:18:27<3:53:09, 4.16s/it]
{'loss': 0.3115, 'grad_norm': 0.48672017455101013, 'learning_rate': 4.649237783231862e-05, 'epoch': 0.25}
25%|██▌ | 1145/4506 [1:18:30<3:46:20, 4.04s/it]
{'loss': 0.3172, 'grad_norm': 0.4570588767528534, 'learning_rate': 4.648247773872589e-05, 'epoch': 0.25}
25%|██▌ | 1146/4506 [1:18:35<3:48:27, 4.08s/it]
{'loss': 0.3046, 'grad_norm': 0.4522120952606201, 'learning_rate': 4.647256475069638e-05, 'epoch': 0.25}
25%|██▌ | 1147/4506 [1:18:39<3:49:44, 4.10s/it]
{'loss': 0.3068, 'grad_norm': 0.44540682435035706, 'learning_rate': 4.646263887418017e-05, 'epoch': 0.25}
25%|██▌ | 1148/4506 [1:18:43<3:44:11, 4.01s/it]
{'loss': 0.3106, 'grad_norm': 0.4274246394634247, 'learning_rate': 4.645270011513509e-05, 'epoch': 0.25}
25%|██▌ | 1149/4506 [1:18:46<3:40:10, 3.94s/it]
{'loss': 0.3025, 'grad_norm': 0.440784215927124, 'learning_rate': 4.644274847952666e-05, 'epoch': 0.26}
26%|██▌ | 1150/4506 [1:18:50<3:41:46, 3.96s/it]
{'loss': 0.315, 'grad_norm': 0.46294790506362915, 'learning_rate': 4.6432783973328176e-05, 'epoch': 0.26}
26%|██▌ | 1151/4506 [1:18:54<3:40:06, 3.94s/it]
{'loss': 0.317, 'grad_norm': 0.4532794952392578, 'learning_rate': 4.642280660252062e-05, 'epoch': 0.26}
26%|██▌ | 1152/4506 [1:18:58<3:41:51, 3.97s/it]
{'loss': 0.3253, 'grad_norm': 0.43824851512908936, 'learning_rate': 4.6412816373092725e-05, 'epoch': 0.26}
26%|██▌ | 1153/4506 [1:19:03<3:47:16, 4.07s/it]
{'loss': 0.3085, 'grad_norm': 0.4374365210533142, 'learning_rate': 4.6402813291040925e-05, 'epoch': 0.26}
26%|██▌ | 1154/4506 [1:19:07<3:44:28, 4.02s/it]
{'loss': 0.3052, 'grad_norm': 0.4575224220752716, 'learning_rate': 4.639279736236938e-05, 'epoch': 0.26}
26%|██▌ | 1155/4506 [1:19:11<3:47:51, 4.08s/it]
{'loss': 0.3024, 'grad_norm': 0.4961683452129364, 'learning_rate': 4.6382768593089954e-05, 'epoch': 0.26}
26%|██▌ | 1156/4506 [1:19:15<3:43:07, 4.00s/it]
{'loss': 0.3052, 'grad_norm': 0.4924643337726593, 'learning_rate': 4.637272698922221e-05, 'epoch': 0.26}
26%|██▌ | 1157/4506 [1:19:19<3:44:06, 4.02s/it]
{'loss': 0.306, 'grad_norm': 0.4154771864414215, 'learning_rate': 4.6362672556793434e-05, 'epoch': 0.26}
26%|██▌ | 1158/4506 [1:19:23<3:49:36, 4.11s/it]
{'loss': 0.3047, 'grad_norm': 0.42166584730148315, 'learning_rate': 4.6352605301838606e-05, 'epoch': 0.26}
26%|██▌ | 1159/4506 [1:19:27<3:52:53, 4.17s/it]
{'loss': 0.3236, 'grad_norm': 0.5753137469291687, 'learning_rate': 4.634252523040038e-05, 'epoch': 0.26}
26%|██▌ | 1160/4506 [1:19:31<3:51:28, 4.15s/it]
{'loss': 0.3102, 'grad_norm': 0.5063140392303467, 'learning_rate': 4.633243234852914e-05, 'epoch': 0.26}
26%|██▌ | 1161/4506 [1:19:35<3:47:43, 4.08s/it]
{'loss': 0.3029, 'grad_norm': 0.44359344244003296, 'learning_rate': 4.6322326662282945e-05, 'epoch': 0.26}
26%|██▌ | 1162/4506 [1:19:39<3:44:12, 4.02s/it]
{'loss': 0.3195, 'grad_norm': 0.4243679344654083, 'learning_rate': 4.631220817772751e-05, 'epoch': 0.26}
26%|██▌ | 1163/4506 [1:19:43<3:40:11, 3.95s/it]
{'loss': 0.3276, 'grad_norm': 0.5187814831733704, 'learning_rate': 4.630207690093628e-05, 'epoch': 0.26}
26%|██▌ | 1164/4506 [1:19:47<3:40:22, 3.96s/it]
{'loss': 0.3195, 'grad_norm': 0.4969107210636139, 'learning_rate': 4.6291932837990346e-05, 'epoch': 0.26}
26%|██▌ | 1165/4506 [1:19:51<3:43:45, 4.02s/it]
{'loss': 0.3119, 'grad_norm': 0.4842053949832916, 'learning_rate': 4.6281775994978486e-05, 'epoch': 0.26}
26%|██▌ | 1166/4506 [1:19:55<3:44:19, 4.03s/it]
{'loss': 0.3178, 'grad_norm': 0.4339182674884796, 'learning_rate': 4.627160637799714e-05, 'epoch': 0.26}
26%|██▌ | 1167/4506 [1:20:00<3:51:58, 4.17s/it]
{'loss': 0.3133, 'grad_norm': 0.42004629969596863, 'learning_rate': 4.626142399315044e-05, 'epoch': 0.26}
26%|██▌ | 1168/4506 [1:20:04<3:53:24, 4.20s/it]
{'loss': 0.3012, 'grad_norm': 0.40808773040771484, 'learning_rate': 4.6251228846550144e-05, 'epoch': 0.26}
26%|██▌ | 1169/4506 [1:20:08<3:52:25, 4.18s/it]
{'loss': 0.314, 'grad_norm': 0.4979538321495056, 'learning_rate': 4.624102094431569e-05, 'epoch': 0.26}
26%|██▌ | 1170/4506 [1:20:12<3:49:52, 4.13s/it]
{'loss': 0.3241, 'grad_norm': 0.5400025248527527, 'learning_rate': 4.6230800292574184e-05, 'epoch': 0.26}
26%|██▌ | 1171/4506 [1:20:16<3:48:55, 4.12s/it]
{'loss': 0.3083, 'grad_norm': 0.4893593490123749, 'learning_rate': 4.622056689746036e-05, 'epoch': 0.26}
26%|██▌ | 1172/4506 [1:20:20<3:47:18, 4.09s/it]
{'loss': 0.2989, 'grad_norm': 0.46208029985427856, 'learning_rate': 4.621032076511662e-05, 'epoch': 0.26}
26%|██▌ | 1173/4506 [1:20:24<3:41:31, 3.99s/it]
{'loss': 0.3036, 'grad_norm': 0.44538649916648865, 'learning_rate': 4.620006190169301e-05, 'epoch': 0.26}
26%|██▌ | 1174/4506 [1:20:28<3:42:10, 4.00s/it]
{'loss': 0.3226, 'grad_norm': 0.47975683212280273, 'learning_rate': 4.618979031334719e-05, 'epoch': 0.26}
26%|██▌ | 1175/4506 [1:20:32<3:45:14, 4.06s/it]
{'loss': 0.3018, 'grad_norm': 0.45620638132095337, 'learning_rate': 4.6179506006244514e-05, 'epoch': 0.26}
26%|██▌ | 1176/4506 [1:20:37<3:53:24, 4.21s/it]
{'loss': 0.3069, 'grad_norm': 0.44692733883857727, 'learning_rate': 4.61692089865579e-05, 'epoch': 0.26}
26%|██▌ | 1177/4506 [1:20:41<3:53:45, 4.21s/it]
{'loss': 0.2976, 'grad_norm': 0.42561545968055725, 'learning_rate': 4.615889926046795e-05, 'epoch': 0.26}
26%|██▌ | 1178/4506 [1:20:45<3:47:34, 4.10s/it]
{'loss': 0.3108, 'grad_norm': 0.5181128978729248, 'learning_rate': 4.614857683416288e-05, 'epoch': 0.26}
26%|██▌ | 1179/4506 [1:20:49<3:43:10, 4.02s/it]
{'loss': 0.3372, 'grad_norm': 0.4861530661582947, 'learning_rate': 4.613824171383852e-05, 'epoch': 0.26}
26%|██▌ | 1180/4506 [1:20:53<3:45:15, 4.06s/it]
{'loss': 0.2966, 'grad_norm': 0.4490821063518524, 'learning_rate': 4.612789390569832e-05, 'epoch': 0.26}
26%|██▌ | 1181/4506 [1:20:57<3:43:51, 4.04s/it]
{'loss': 0.3133, 'grad_norm': 0.5004748106002808, 'learning_rate': 4.6117533415953354e-05, 'epoch': 0.26}
26%|██▌ | 1182/4506 [1:21:01<3:46:05, 4.08s/it]
{'loss': 0.3214, 'grad_norm': 0.5080822706222534, 'learning_rate': 4.610716025082229e-05, 'epoch': 0.26}
26%|██▋ | 1183/4506 [1:21:05<3:42:12, 4.01s/it]
{'loss': 0.3141, 'grad_norm': 0.46324193477630615, 'learning_rate': 4.609677441653144e-05, 'epoch': 0.26}
26%|██▋ | 1184/4506 [1:21:09<3:46:36, 4.09s/it]
{'loss': 0.3222, 'grad_norm': 0.5207367539405823, 'learning_rate': 4.608637591931467e-05, 'epoch': 0.26}
26%|██▋ | 1185/4506 [1:21:14<3:54:40, 4.24s/it]
{'loss': 0.3286, 'grad_norm': 0.462596595287323, 'learning_rate': 4.60759647654135e-05, 'epoch': 0.26}
26%|██▋ | 1186/4506 [1:21:18<3:56:05, 4.27s/it]
{'loss': 0.3125, 'grad_norm': 0.42869889736175537, 'learning_rate': 4.606554096107702e-05, 'epoch': 0.26}
26%|██▋ | 1187/4506 [1:21:22<3:47:36, 4.11s/it]
{'loss': 0.3012, 'grad_norm': 0.4347342848777771, 'learning_rate': 4.60551045125619e-05, 'epoch': 0.26}
26%|██▋ | 1188/4506 [1:21:26<3:47:58, 4.12s/it]
{'loss': 0.3122, 'grad_norm': 0.5339630246162415, 'learning_rate': 4.604465542613242e-05, 'epoch': 0.26}
26%|██▋ | 1189/4506 [1:21:30<3:46:03, 4.09s/it]
{'loss': 0.3014, 'grad_norm': 0.4248190224170685, 'learning_rate': 4.6034193708060436e-05, 'epoch': 0.26}
26%|██▋ | 1190/4506 [1:21:34<3:47:00, 4.11s/it]
{'loss': 0.309, 'grad_norm': 0.5132874846458435, 'learning_rate': 4.602371936462539e-05, 'epoch': 0.26}
26%|██▋ | 1191/4506 [1:21:38<3:43:51, 4.05s/it]
{'loss': 0.3197, 'grad_norm': 0.488154798746109, 'learning_rate': 4.601323240211431e-05, 'epoch': 0.26}
26%|██▋ | 1192/4506 [1:21:42<3:41:23, 4.01s/it]
{'loss': 0.3042, 'grad_norm': 0.506653904914856, 'learning_rate': 4.600273282682179e-05, 'epoch': 0.26}
26%|██▋ | 1193/4506 [1:21:46<3:43:37, 4.05s/it]
{'loss': 0.3209, 'grad_norm': 0.5156626105308533, 'learning_rate': 4.599222064504999e-05, 'epoch': 0.26}
26%|██▋ | 1194/4506 [1:21:50<3:42:51, 4.04s/it]
{'loss': 0.3279, 'grad_norm': 0.4896182119846344, 'learning_rate': 4.598169586310863e-05, 'epoch': 0.27}
27%|██▋ | 1195/4506 [1:21:54<3:39:39, 3.98s/it]
{'loss': 0.31, 'grad_norm': 0.5663788318634033, 'learning_rate': 4.5971158487315025e-05, 'epoch': 0.27}
27%|██▋ | 1196/4506 [1:21:58<3:39:31, 3.98s/it]
{'loss': 0.3137, 'grad_norm': 0.4232974052429199, 'learning_rate': 4.596060852399401e-05, 'epoch': 0.27}
27%|██▋ | 1197/4506 [1:22:02<3:37:01, 3.94s/it]
{'loss': 0.3154, 'grad_norm': 0.4978485107421875, 'learning_rate': 4.595004597947801e-05, 'epoch': 0.27}
27%|██▋ | 1198/4506 [1:22:06<3:39:05, 3.97s/it]
{'loss': 0.3203, 'grad_norm': 0.5396599173545837, 'learning_rate': 4.5939470860106976e-05, 'epoch': 0.27}
27%|██▋ | 1199/4506 [1:22:10<3:35:59, 3.92s/it]
{'loss': 0.2989, 'grad_norm': 0.49390164017677307, 'learning_rate': 4.592888317222842e-05, 'epoch': 0.27}
27%|██▋ | 1200/4506 [1:22:13<3:36:41, 3.93s/it]
{'loss': 0.3097, 'grad_norm': 0.40581533312797546, 'learning_rate': 4.5918282922197385e-05, 'epoch': 0.27}
27%|██▋ | 1201/4506 [1:22:18<3:39:31, 3.99s/it]
{'loss': 0.3072, 'grad_norm': 0.45461612939834595, 'learning_rate': 4.590767011637648e-05, 'epoch': 0.27}
27%|██▋ | 1202/4506 [1:22:21<3:33:20, 3.87s/it]
{'loss': 0.32, 'grad_norm': 0.5212808847427368, 'learning_rate': 4.589704476113582e-05, 'epoch': 0.27}
27%|██▋ | 1203/4506 [1:22:26<3:50:20, 4.18s/it]
{'loss': 0.301, 'grad_norm': 0.46869096159935, 'learning_rate': 4.5886406862853074e-05, 'epoch': 0.27}
27%|██▋ | 1204/4506 [1:22:30<3:46:39, 4.12s/it]
{'loss': 0.3074, 'grad_norm': 0.515602171421051, 'learning_rate': 4.587575642791343e-05, 'epoch': 0.27}
27%|██▋ | 1205/4506 [1:22:34<3:49:05, 4.16s/it]
{'loss': 0.3063, 'grad_norm': 0.4596552550792694, 'learning_rate': 4.5865093462709606e-05, 'epoch': 0.27}
27%|██▋ | 1206/4506 [1:22:38<3:42:58, 4.05s/it]
{'loss': 0.2918, 'grad_norm': 0.43495452404022217, 'learning_rate': 4.585441797364184e-05, 'epoch': 0.27}
27%|██▋ | 1207/4506 [1:22:42<3:40:51, 4.02s/it]
{'loss': 0.3017, 'grad_norm': 0.4465106427669525, 'learning_rate': 4.5843729967117874e-05, 'epoch': 0.27}
27%|██▋ | 1208/4506 [1:22:46<3:41:08, 4.02s/it]
{'loss': 0.3122, 'grad_norm': 0.5026631355285645, 'learning_rate': 4.583302944955298e-05, 'epoch': 0.27}
27%|██▋ | 1209/4506 [1:22:50<3:38:12, 3.97s/it]
{'loss': 0.3177, 'grad_norm': 0.45290762186050415, 'learning_rate': 4.582231642736994e-05, 'epoch': 0.27}
27%|██▋ | 1210/4506 [1:22:54<3:39:34, 4.00s/it]
{'loss': 0.3013, 'grad_norm': 0.43277230858802795, 'learning_rate': 4.5811590906999024e-05, 'epoch': 0.27}
27%|██▋ | 1211/4506 [1:22:58<3:37:30, 3.96s/it]
{'loss': 0.3006, 'grad_norm': 0.4648941457271576, 'learning_rate': 4.580085289487803e-05, 'epoch': 0.27}
27%|██▋ | 1212/4506 [1:23:02<3:39:39, 4.00s/it]
{'loss': 0.2987, 'grad_norm': 0.47127729654312134, 'learning_rate': 4.579010239745224e-05, 'epoch': 0.27}
27%|██▋ | 1213/4506 [1:23:06<3:36:52, 3.95s/it]
{'loss': 0.304, 'grad_norm': 0.48492392897605896, 'learning_rate': 4.577933942117441e-05, 'epoch': 0.27}
27%|██▋ | 1214/4506 [1:23:10<3:43:11, 4.07s/it]
{'loss': 0.3035, 'grad_norm': 0.47185268998146057, 'learning_rate': 4.576856397250483e-05, 'epoch': 0.27}
27%|██▋ | 1215/4506 [1:23:14<3:37:01, 3.96s/it]
{'loss': 0.2988, 'grad_norm': 0.43287765979766846, 'learning_rate': 4.575777605791123e-05, 'epoch': 0.27}
27%|██▋ | 1216/4506 [1:23:18<3:45:36, 4.11s/it]
{'loss': 0.314, 'grad_norm': 0.49892476201057434, 'learning_rate': 4.5746975683868854e-05, 'epoch': 0.27}
27%|██▋ | 1217/4506 [1:23:23<3:46:33, 4.13s/it]
{'loss': 0.3045, 'grad_norm': 0.42882972955703735, 'learning_rate': 4.573616285686043e-05, 'epoch': 0.27}
27%|██▋ | 1218/4506 [1:23:27<3:46:54, 4.14s/it]
{'loss': 0.3087, 'grad_norm': 0.41911768913269043, 'learning_rate': 4.5725337583376116e-05, 'epoch': 0.27}
27%|██▋ | 1219/4506 [1:23:30<3:38:49, 3.99s/it]
{'loss': 0.2916, 'grad_norm': 0.4812401235103607, 'learning_rate': 4.57144998699136e-05, 'epoch': 0.27}
27%|██▋ | 1220/4506 [1:23:34<3:38:10, 3.98s/it]
{'loss': 0.2983, 'grad_norm': 0.42219650745391846, 'learning_rate': 4.570364972297798e-05, 'epoch': 0.27}
27%|██▋ | 1221/4506 [1:23:39<3:49:54, 4.20s/it]
{'loss': 0.3112, 'grad_norm': 0.4856937527656555, 'learning_rate': 4.569278714908187e-05, 'epoch': 0.27}
27%|██▋ | 1222/4506 [1:23:43<3:44:34, 4.10s/it]
{'loss': 0.2913, 'grad_norm': 0.5092552304267883, 'learning_rate': 4.56819121547453e-05, 'epoch': 0.27}
27%|██▋ | 1223/4506 [1:23:47<3:42:18, 4.06s/it]
{'loss': 0.3037, 'grad_norm': 0.4024399220943451, 'learning_rate': 4.5671024746495784e-05, 'epoch': 0.27}
27%|██▋ | 1224/4506 [1:23:51<3:40:26, 4.03s/it]
{'loss': 0.3071, 'grad_norm': 0.5040232539176941, 'learning_rate': 4.566012493086826e-05, 'epoch': 0.27}
27%|██▋ | 1225/4506 [1:23:55<3:39:54, 4.02s/it]
{'loss': 0.2977, 'grad_norm': 0.4346095323562622, 'learning_rate': 4.5649212714405155e-05, 'epoch': 0.27}
27%|██▋ | 1226/4506 [1:23:59<3:41:54, 4.06s/it]
{'loss': 0.3252, 'grad_norm': 0.4710376262664795, 'learning_rate': 4.563828810365628e-05, 'epoch': 0.27}
27%|██▋ | 1227/4506 [1:24:03<3:38:45, 4.00s/it]
{'loss': 0.3017, 'grad_norm': 0.502086341381073, 'learning_rate': 4.562735110517894e-05, 'epoch': 0.27}
27%|██▋ | 1228/4506 [1:24:07<3:35:21, 3.94s/it]
{'loss': 0.3069, 'grad_norm': 0.5745216608047485, 'learning_rate': 4.5616401725537864e-05, 'epoch': 0.27}
27%|██▋ | 1229/4506 [1:24:11<3:35:49, 3.95s/it]
{'loss': 0.3108, 'grad_norm': 0.471181720495224, 'learning_rate': 4.560543997130518e-05, 'epoch': 0.27}
27%|██▋ | 1230/4506 [1:24:15<3:38:57, 4.01s/it]
{'loss': 0.3117, 'grad_norm': 0.4565134346485138, 'learning_rate': 4.559446584906046e-05, 'epoch': 0.27}
27%|██▋ | 1231/4506 [1:24:19<3:35:30, 3.95s/it]
{'loss': 0.2879, 'grad_norm': 0.45314785838127136, 'learning_rate': 4.558347936539074e-05, 'epoch': 0.27}
27%|██▋ | 1232/4506 [1:24:23<3:39:07, 4.02s/it]
{'loss': 0.3131, 'grad_norm': 0.4501608908176422, 'learning_rate': 4.557248052689042e-05, 'epoch': 0.27}
27%|██▋ | 1233/4506 [1:24:27<3:41:31, 4.06s/it]
{'loss': 0.2936, 'grad_norm': 0.44640493392944336, 'learning_rate': 4.556146934016134e-05, 'epoch': 0.27}
27%|██▋ | 1234/4506 [1:24:31<3:40:48, 4.05s/it]
{'loss': 0.2917, 'grad_norm': 0.4426289498806, 'learning_rate': 4.555044581181275e-05, 'epoch': 0.27}
27%|██▋ | 1235/4506 [1:24:35<3:40:16, 4.04s/it]
{'loss': 0.3051, 'grad_norm': 0.479555606842041, 'learning_rate': 4.5539409948461305e-05, 'epoch': 0.27}
27%|██▋ | 1236/4506 [1:24:39<3:46:19, 4.15s/it]
{'loss': 0.3048, 'grad_norm': 0.4850950539112091, 'learning_rate': 4.552836175673107e-05, 'epoch': 0.27}
27%|██▋ | 1237/4506 [1:24:44<3:47:09, 4.17s/it]
{'loss': 0.3172, 'grad_norm': 0.4811118245124817, 'learning_rate': 4.5517301243253505e-05, 'epoch': 0.27}
27%|██▋ | 1238/4506 [1:24:48<3:45:16, 4.14s/it]
{'loss': 0.2905, 'grad_norm': 0.42009562253952026, 'learning_rate': 4.5506228414667464e-05, 'epoch': 0.27}
27%|██▋ | 1239/4506 [1:24:52<3:43:18, 4.10s/it]
{'loss': 0.3103, 'grad_norm': 0.49892109632492065, 'learning_rate': 4.54951432776192e-05, 'epoch': 0.28}
28%|██▊ | 1240/4506 [1:24:55<3:38:50, 4.02s/it]
{'loss': 0.299, 'grad_norm': 0.47566038370132446, 'learning_rate': 4.548404583876234e-05, 'epoch': 0.28}
28%|██▊ | 1241/4506 [1:24:59<3:34:58, 3.95s/it]
{'loss': 0.2943, 'grad_norm': 0.4702947437763214, 'learning_rate': 4.547293610475791e-05, 'epoch': 0.28}
28%|██▊ | 1242/4506 [1:25:03<3:39:37, 4.04s/it]
{'loss': 0.3082, 'grad_norm': 0.47652745246887207, 'learning_rate': 4.546181408227432e-05, 'epoch': 0.28}
28%|██▊ | 1243/4506 [1:25:08<3:39:56, 4.04s/it]
{'loss': 0.3078, 'grad_norm': 0.48918959498405457, 'learning_rate': 4.545067977798734e-05, 'epoch': 0.28}
28%|██▊ | 1244/4506 [1:25:12<3:43:47, 4.12s/it]
{'loss': 0.3209, 'grad_norm': 0.4656687378883362, 'learning_rate': 4.5439533198580106e-05, 'epoch': 0.28}
28%|██▊ | 1245/4506 [1:25:16<3:49:15, 4.22s/it]
{'loss': 0.306, 'grad_norm': 0.39380133152008057, 'learning_rate': 4.542837435074315e-05, 'epoch': 0.28}
28%|██▊ | 1246/4506 [1:25:21<3:49:59, 4.23s/it]
{'loss': 0.2961, 'grad_norm': 0.4204758405685425, 'learning_rate': 4.541720324117435e-05, 'epoch': 0.28}
28%|██▊ | 1247/4506 [1:25:25<3:51:18, 4.26s/it]
{'loss': 0.308, 'grad_norm': 0.4546031951904297, 'learning_rate': 4.540601987657894e-05, 'epoch': 0.28}
28%|██▊ | 1248/4506 [1:25:29<3:45:33, 4.15s/it]
{'loss': 0.309, 'grad_norm': 0.45970991253852844, 'learning_rate': 4.539482426366952e-05, 'epoch': 0.28}
28%|██▊ | 1249/4506 [1:25:33<3:41:56, 4.09s/it]
{'loss': 0.3253, 'grad_norm': 0.4588169455528259, 'learning_rate': 4.538361640916603e-05, 'epoch': 0.28}
28%|██▊ | 1250/4506 [1:25:37<3:37:53, 4.02s/it]
{'loss': 0.2996, 'grad_norm': 0.4395948350429535, 'learning_rate': 4.537239631979577e-05, 'epoch': 0.28}
28%|██▊ | 1251/4506 [1:25:41<3:41:36, 4.08s/it]
{'loss': 0.2988, 'grad_norm': 0.4137611389160156, 'learning_rate': 4.536116400229338e-05, 'epoch': 0.28}
28%|██▊ | 1252/4506 [1:25:45<3:46:24, 4.17s/it]
{'loss': 0.3153, 'grad_norm': 0.43192553520202637, 'learning_rate': 4.5349919463400846e-05, 'epoch': 0.28}
28%|██▊ | 1253/4506 [1:25:50<3:49:48, 4.24s/it]
{'loss': 0.3203, 'grad_norm': 0.4224728047847748, 'learning_rate': 4.5338662709867464e-05, 'epoch': 0.28}
28%|██▊ | 1254/4506 [1:25:54<3:49:03, 4.23s/it]
{'loss': 0.291, 'grad_norm': 0.4032132923603058, 'learning_rate': 4.5327393748449896e-05, 'epoch': 0.28}
28%|██▊ | 1255/4506 [1:25:58<3:51:03, 4.26s/it]
{'loss': 0.3176, 'grad_norm': 0.38616666197776794, 'learning_rate': 4.5316112585912115e-05, 'epoch': 0.28}
28%|██▊ | 1256/4506 [1:26:02<3:50:22, 4.25s/it]
{'loss': 0.3115, 'grad_norm': 0.4377109110355377, 'learning_rate': 4.530481922902541e-05, 'epoch': 0.28}
28%|██▊ | 1257/4506 [1:26:07<3:53:22, 4.31s/it]
{'loss': 0.2805, 'grad_norm': 0.4178774654865265, 'learning_rate': 4.52935136845684e-05, 'epoch': 0.28}
28%|██▊ | 1258/4506 [1:26:11<3:54:10, 4.33s/it]
{'loss': 0.3182, 'grad_norm': 0.4675861895084381, 'learning_rate': 4.528219595932702e-05, 'epoch': 0.28}
28%|██▊ | 1259/4506 [1:26:15<3:46:47, 4.19s/it]
{'loss': 0.3138, 'grad_norm': 0.5781945586204529, 'learning_rate': 4.52708660600945e-05, 'epoch': 0.28}
28%|██▊ | 1260/4506 [1:26:19<3:47:16, 4.20s/it]
{'loss': 0.3006, 'grad_norm': 0.4703930616378784, 'learning_rate': 4.525952399367142e-05, 'epoch': 0.28}
28%|██▊ | 1261/4506 [1:26:23<3:39:26, 4.06s/it]
{'loss': 0.3042, 'grad_norm': 0.47113409638404846, 'learning_rate': 4.5248169766865595e-05, 'epoch': 0.28}
28%|██▊ | 1262/4506 [1:26:27<3:37:29, 4.02s/it]
{'loss': 0.3057, 'grad_norm': 0.5159133672714233, 'learning_rate': 4.5236803386492205e-05, 'epoch': 0.28}
28%|██▊ | 1263/4506 [1:26:31<3:37:39, 4.03s/it]
{'loss': 0.3064, 'grad_norm': 0.4508405327796936, 'learning_rate': 4.522542485937369e-05, 'epoch': 0.28}
28%|██▊ | 1264/4506 [1:26:35<3:40:02, 4.07s/it]
{'loss': 0.2995, 'grad_norm': 0.4242812693119049, 'learning_rate': 4.521403419233978e-05, 'epoch': 0.28}
28%|██▊ | 1265/4506 [1:26:39<3:38:19, 4.04s/it]
{'loss': 0.3192, 'grad_norm': 0.5267782807350159, 'learning_rate': 4.5202631392227515e-05, 'epoch': 0.28}
28%|██▊ | 1266/4506 [1:26:44<3:47:10, 4.21s/it]
{'loss': 0.3058, 'grad_norm': 0.4350382387638092, 'learning_rate': 4.519121646588119e-05, 'epoch': 0.28}
28%|██▊ | 1267/4506 [1:26:48<3:45:36, 4.18s/it]
{'loss': 0.3003, 'grad_norm': 0.431316614151001, 'learning_rate': 4.5179789420152405e-05, 'epoch': 0.28}
28%|██▊ | 1268/4506 [1:26:52<3:41:53, 4.11s/it]
{'loss': 0.2897, 'grad_norm': 0.45277538895606995, 'learning_rate': 4.51683502619e-05, 'epoch': 0.28}
28%|██▊ | 1269/4506 [1:26:56<3:43:20, 4.14s/it]
{'loss': 0.3136, 'grad_norm': 0.49750030040740967, 'learning_rate': 4.515689899799012e-05, 'epoch': 0.28}
28%|██▊ | 1270/4506 [1:27:00<3:37:39, 4.04s/it]
{'loss': 0.3011, 'grad_norm': 0.4495680034160614, 'learning_rate': 4.514543563529616e-05, 'epoch': 0.28}
28%|██▊ | 1271/4506 [1:27:04<3:41:04, 4.10s/it]
{'loss': 0.3118, 'grad_norm': 0.5736351609230042, 'learning_rate': 4.513396018069878e-05, 'epoch': 0.28}
28%|██▊ | 1272/4506 [1:27:08<3:39:42, 4.08s/it]
{'loss': 0.3166, 'grad_norm': 0.5435466766357422, 'learning_rate': 4.512247264108589e-05, 'epoch': 0.28}
28%|██▊ | 1273/4506 [1:27:12<3:37:06, 4.03s/it]
{'loss': 0.2978, 'grad_norm': 0.4261557161808014, 'learning_rate': 4.511097302335267e-05, 'epoch': 0.28}
28%|██▊ | 1274/4506 [1:27:16<3:36:08, 4.01s/it]
{'loss': 0.3099, 'grad_norm': 0.48008522391319275, 'learning_rate': 4.509946133440154e-05, 'epoch': 0.28}
28%|██▊ | 1275/4506 [1:27:20<3:38:04, 4.05s/it]
{'loss': 0.3013, 'grad_norm': 0.43412289023399353, 'learning_rate': 4.5087937581142156e-05, 'epoch': 0.28}
28%|██▊ | 1276/4506 [1:27:24<3:43:14, 4.15s/it]
{'loss': 0.2935, 'grad_norm': 0.4202919006347656, 'learning_rate': 4.5076401770491435e-05, 'epoch': 0.28}
28%|██▊ | 1277/4506 [1:27:28<3:40:31, 4.10s/it]
{'loss': 0.2975, 'grad_norm': 0.4248502254486084, 'learning_rate': 4.5064853909373516e-05, 'epoch': 0.28}
28%|██▊ | 1278/4506 [1:27:32<3:34:53, 3.99s/it]
{'loss': 0.2933, 'grad_norm': 0.47866714000701904, 'learning_rate': 4.5053294004719784e-05, 'epoch': 0.28}
28%|██▊ | 1279/4506 [1:27:36<3:33:59, 3.98s/it]
{'loss': 0.2844, 'grad_norm': 0.4137709438800812, 'learning_rate': 4.5041722063468836e-05, 'epoch': 0.28}
28%|██▊ | 1280/4506 [1:27:40<3:36:38, 4.03s/it]
{'loss': 0.3063, 'grad_norm': 0.47934070229530334, 'learning_rate': 4.503013809256651e-05, 'epoch': 0.28}
28%|██▊ | 1281/4506 [1:27:45<3:39:56, 4.09s/it]
{'loss': 0.2976, 'grad_norm': 0.5419443845748901, 'learning_rate': 4.501854209896586e-05, 'epoch': 0.28}
28%|██▊ | 1282/4506 [1:27:48<3:35:56, 4.02s/it]
{'loss': 0.2956, 'grad_norm': 0.4500564932823181, 'learning_rate': 4.5006934089627135e-05, 'epoch': 0.28}
28%|██▊ | 1283/4506 [1:27:52<3:34:10, 3.99s/it]
{'loss': 0.2987, 'grad_norm': 0.4670591652393341, 'learning_rate': 4.4995314071517835e-05, 'epoch': 0.28}
28%|██▊ | 1284/4506 [1:27:57<3:40:32, 4.11s/it]
{'loss': 0.3096, 'grad_norm': 0.4222911298274994, 'learning_rate': 4.498368205161265e-05, 'epoch': 0.29}
29%|██▊ | 1285/4506 [1:28:00<3:35:39, 4.02s/it]
{'loss': 0.3014, 'grad_norm': 0.47070783376693726, 'learning_rate': 4.497203803689346e-05, 'epoch': 0.29}
29%|██▊ | 1286/4506 [1:28:05<3:40:45, 4.11s/it]
{'loss': 0.2906, 'grad_norm': 0.3955143690109253, 'learning_rate': 4.496038203434937e-05, 'epoch': 0.29}
29%|██▊ | 1287/4506 [1:28:09<3:42:22, 4.14s/it]
{'loss': 0.309, 'grad_norm': 0.4559836685657501, 'learning_rate': 4.494871405097665e-05, 'epoch': 0.29}
29%|██▊ | 1288/4506 [1:28:13<3:37:00, 4.05s/it]
{'loss': 0.322, 'grad_norm': 0.4488150477409363, 'learning_rate': 4.49370340937788e-05, 'epoch': 0.29}
29%|██▊ | 1289/4506 [1:28:17<3:42:35, 4.15s/it]
{'loss': 0.307, 'grad_norm': 0.43103447556495667, 'learning_rate': 4.492534216976646e-05, 'epoch': 0.29}
29%|██▊ | 1290/4506 [1:28:21<3:42:28, 4.15s/it]
{'loss': 0.3167, 'grad_norm': 0.453130841255188, 'learning_rate': 4.4913638285957503e-05, 'epoch': 0.29}
29%|██▊ | 1291/4506 [1:28:26<3:47:38, 4.25s/it]
{'loss': 0.2999, 'grad_norm': 0.4483618140220642, 'learning_rate': 4.490192244937694e-05, 'epoch': 0.29}
29%|██▊ | 1291/4506 [1:28:26<3:47:38, 4.25s/it]
29%|██▊ | 1292/4506 [1:28:30<3:43:49, 4.18s/it]
{'loss': 0.306, 'grad_norm': 0.4692663252353668, 'learning_rate': 4.489019466705698e-05, 'epoch': 0.29}
29%|██▊ | 1292/4506 [1:28:30<3:43:49, 4.18s/it]
29%|██▊ | 1293/4506 [1:28:34<3:45:07, 4.20s/it]
{'loss': 0.2912, 'grad_norm': 0.44557440280914307, 'learning_rate': 4.4878454946037e-05, 'epoch': 0.29}
29%|██▊ | 1293/4506 [1:28:34<3:45:07, 4.20s/it]
29%|██▊ | 1294/4506 [1:28:38<3:43:44, 4.18s/it]
{'loss': 0.3022, 'grad_norm': 0.4994179308414459, 'learning_rate': 4.486670329336353e-05, 'epoch': 0.29}
29%|██▊ | 1294/4506 [1:28:38<3:43:44, 4.18s/it]
29%|██▊ | 1295/4506 [1:28:42<3:42:13, 4.15s/it]
{'loss': 0.3073, 'grad_norm': 0.4087050259113312, 'learning_rate': 4.485493971609026e-05, 'epoch': 0.29}
29%|██▊ | 1295/4506 [1:28:42<3:42:13, 4.15s/it]
29%|██▉ | 1296/4506 [1:28:46<3:39:09, 4.10s/it]
{'loss': 0.298, 'grad_norm': 0.43001312017440796, 'learning_rate': 4.4843164221278067e-05, 'epoch': 0.29}
29%|██▉ | 1296/4506 [1:28:46<3:39:09, 4.10s/it]
29%|██▉ | 1297/4506 [1:28:50<3:40:15, 4.12s/it]
{'loss': 0.3218, 'grad_norm': 0.49909013509750366, 'learning_rate': 4.483137681599495e-05, 'epoch': 0.29}
29%|██▉ | 1297/4506 [1:28:50<3:40:15, 4.12s/it]
29%|██▉ | 1298/4506 [1:28:54<3:34:10, 4.01s/it]
{'loss': 0.2911, 'grad_norm': 0.5056494474411011, 'learning_rate': 4.481957750731607e-05, 'epoch': 0.29}
29%|██▉ | 1298/4506 [1:28:54<3:34:10, 4.01s/it]
29%|██▉ | 1299/4506 [1:28:58<3:29:09, 3.91s/it]
{'loss': 0.2837, 'grad_norm': 0.474931925535202, 'learning_rate': 4.480776630232373e-05, 'epoch': 0.29}
29%|██▉ | 1299/4506 [1:28:58<3:29:09, 3.91s/it]
29%|██▉ | 1300/4506 [1:29:02<3:31:06, 3.95s/it]
{'loss': 0.3078, 'grad_norm': 0.559751570224762, 'learning_rate': 4.479594320810738e-05, 'epoch': 0.29}
29%|██▉ | 1300/4506 [1:29:02<3:31:06, 3.95s/it]
29%|██▉ | 1301/4506 [1:29:06<3:29:46, 3.93s/it]
{'loss': 0.2872, 'grad_norm': 0.5379100441932678, 'learning_rate': 4.4784108231763595e-05, 'epoch': 0.29}
29%|██▉ | 1301/4506 [1:29:06<3:29:46, 3.93s/it]
29%|██▉ | 1302/4506 [1:29:10<3:28:34, 3.91s/it]
{'loss': 0.2936, 'grad_norm': 0.45703497529029846, 'learning_rate': 4.4772261380396084e-05, 'epoch': 0.29}
29%|██▉ | 1302/4506 [1:29:10<3:28:34, 3.91s/it]
29%|██▉ | 1303/4506 [1:29:14<3:27:46, 3.89s/it]
{'loss': 0.292, 'grad_norm': 0.4594429135322571, 'learning_rate': 4.4760402661115705e-05, 'epoch': 0.29}
29%|██▉ | 1303/4506 [1:29:14<3:27:46, 3.89s/it]
29%|██▉ | 1304/4506 [1:29:18<3:33:14, 4.00s/it]
{'loss': 0.3173, 'grad_norm': 0.4194205701351166, 'learning_rate': 4.47485320810404e-05, 'epoch': 0.29}
29%|██▉ | 1304/4506 [1:29:18<3:33:14, 4.00s/it]
29%|██▉ | 1305/4506 [1:29:22<3:31:11, 3.96s/it]
{'loss': 0.2898, 'grad_norm': 0.4280904233455658, 'learning_rate': 4.473664964729526e-05, 'epoch': 0.29}
29%|██▉ | 1305/4506 [1:29:22<3:31:11, 3.96s/it]
29%|██▉ | 1306/4506 [1:29:26<3:31:46, 3.97s/it]
{'loss': 0.2966, 'grad_norm': 0.44454094767570496, 'learning_rate': 4.47247553670125e-05, 'epoch': 0.29}
29%|██▉ | 1306/4506 [1:29:26<3:31:46, 3.97s/it]
29%|██▉ | 1307/4506 [1:29:29<3:26:21, 3.87s/it]
{'loss': 0.2995, 'grad_norm': 0.5232909321784973, 'learning_rate': 4.471284924733141e-05, 'epoch': 0.29}
29%|██▉ | 1307/4506 [1:29:29<3:26:21, 3.87s/it]
29%|██▉ | 1308/4506 [1:29:33<3:30:38, 3.95s/it]
{'loss': 0.3023, 'grad_norm': 0.4379391074180603, 'learning_rate': 4.4700931295398406e-05, 'epoch': 0.29}
29%|██▉ | 1308/4506 [1:29:33<3:30:38, 3.95s/it]
29%|██▉ | 1309/4506 [1:29:37<3:31:25, 3.97s/it]
{'loss': 0.3035, 'grad_norm': 0.46518751978874207, 'learning_rate': 4.468900151836701e-05, 'epoch': 0.29}
29%|██▉ | 1309/4506 [1:29:37<3:31:25, 3.97s/it]
29%|██▉ | 1310/4506 [1:29:42<3:33:55, 4.02s/it]
{'loss': 0.2801, 'grad_norm': 0.41732057929039, 'learning_rate': 4.4677059923397834e-05, 'epoch': 0.29}
29%|██▉ | 1310/4506 [1:29:42<3:33:55, 4.02s/it]
29%|██▉ | 1311/4506 [1:29:45<3:32:00, 3.98s/it]
{'loss': 0.2892, 'grad_norm': 0.4601612389087677, 'learning_rate': 4.466510651765859e-05, 'epoch': 0.29}
29%|██▉ | 1311/4506 [1:29:45<3:32:00, 3.98s/it]
29%|██▉ | 1312/4506 [1:29:50<3:39:49, 4.13s/it]
{'loss': 0.2921, 'grad_norm': 0.3945634961128235, 'learning_rate': 4.4653141308324086e-05, 'epoch': 0.29}
29%|██▉ | 1312/4506 [1:29:50<3:39:49, 4.13s/it]
29%|██▉ | 1313/4506 [1:29:54<3:31:30, 3.97s/it]
{'loss': 0.2798, 'grad_norm': 0.46990934014320374, 'learning_rate': 4.464116430257619e-05, 'epoch': 0.29}
29%|██▉ | 1313/4506 [1:29:54<3:31:30, 3.97s/it]
29%|██▉ | 1314/4506 [1:29:58<3:33:35, 4.01s/it]
{'loss': 0.2963, 'grad_norm': 0.45729413628578186, 'learning_rate': 4.462917550760386e-05, 'epoch': 0.29}
29%|██▉ | 1314/4506 [1:29:58<3:33:35, 4.01s/it]
29%|██▉ | 1315/4506 [1:30:02<3:37:09, 4.08s/it]
{'loss': 0.2964, 'grad_norm': 0.5361517667770386, 'learning_rate': 4.4617174930603155e-05, 'epoch': 0.29}
29%|██▉ | 1315/4506 [1:30:02<3:37:09, 4.08s/it]
29%|██▉ | 1316/4506 [1:30:06<3:35:31, 4.05s/it]
{'loss': 0.2968, 'grad_norm': 0.4740641713142395, 'learning_rate': 4.460516257877717e-05, 'epoch': 0.29}
29%|██▉ | 1316/4506 [1:30:06<3:35:31, 4.05s/it]
29%|██▉ | 1317/4506 [1:30:10<3:33:41, 4.02s/it]
{'loss': 0.3181, 'grad_norm': 0.5360444188117981, 'learning_rate': 4.459313845933609e-05, 'epoch': 0.29}
29%|██▉ | 1317/4506 [1:30:10<3:33:41, 4.02s/it]
29%|██▉ | 1318/4506 [1:30:14<3:34:56, 4.05s/it]
{'loss': 0.3019, 'grad_norm': 0.5090695023536682, 'learning_rate': 4.4581102579497155e-05, 'epoch': 0.29}
29%|██▉ | 1318/4506 [1:30:14<3:34:56, 4.05s/it]
29%|██▉ | 1319/4506 [1:30:18<3:34:46, 4.04s/it]
{'loss': 0.3049, 'grad_norm': 0.4110558032989502, 'learning_rate': 4.456905494648468e-05, 'epoch': 0.29}
29%|██▉ | 1319/4506 [1:30:18<3:34:46, 4.04s/it]
29%|██▉ | 1320/4506 [1:30:22<3:36:25, 4.08s/it]
{'loss': 0.3044, 'grad_norm': 0.42327794432640076, 'learning_rate': 4.4556995567530005e-05, 'epoch': 0.29}
29%|██▉ | 1320/4506 [1:30:22<3:36:25, 4.08s/it]
29%|██▉ | 1321/4506 [1:30:26<3:31:43, 3.99s/it]
{'loss': 0.3069, 'grad_norm': 0.44034039974212646, 'learning_rate': 4.454492444987154e-05, 'epoch': 0.29}
29%|██▉ | 1321/4506 [1:30:26<3:31:43, 3.99s/it]
29%|██▉ | 1322/4506 [1:30:30<3:35:35, 4.06s/it]
{'loss': 0.3187, 'grad_norm': 0.48336511850357056, 'learning_rate': 4.453284160075473e-05, 'epoch': 0.29}
29%|██▉ | 1322/4506 [1:30:30<3:35:35, 4.06s/it]
29%|██▉ | 1323/4506 [1:30:34<3:36:34, 4.08s/it]
{'loss': 0.3012, 'grad_norm': 0.4136994183063507, 'learning_rate': 4.452074702743209e-05, 'epoch': 0.29}
29%|██▉ | 1323/4506 [1:30:34<3:36:34, 4.08s/it]
29%|██▉ | 1324/4506 [1:30:39<3:38:51, 4.13s/it]
{'loss': 0.3016, 'grad_norm': 0.39516907930374146, 'learning_rate': 4.450864073716313e-05, 'epoch': 0.29}
29%|██▉ | 1324/4506 [1:30:39<3:38:51, 4.13s/it]
29%|██▉ | 1325/4506 [1:30:43<3:38:40, 4.12s/it]
{'loss': 0.3118, 'grad_norm': 0.40165168046951294, 'learning_rate': 4.4496522737214424e-05, 'epoch': 0.29}
29%|██▉ | 1325/4506 [1:30:43<3:38:40, 4.12s/it]
29%|██▉ | 1326/4506 [1:30:46<3:34:19, 4.04s/it]
{'loss': 0.2906, 'grad_norm': 0.5898318886756897, 'learning_rate': 4.4484393034859564e-05, 'epoch': 0.29}
29%|██▉ | 1326/4506 [1:30:47<3:34:19, 4.04s/it]
29%|██▉ | 1327/4506 [1:30:51<3:38:17, 4.12s/it]
{'loss': 0.295, 'grad_norm': 0.3737768232822418, 'learning_rate': 4.447225163737916e-05, 'epoch': 0.29}
29%|██▉ | 1327/4506 [1:30:51<3:38:17, 4.12s/it]
29%|██▉ | 1328/4506 [1:30:55<3:32:47, 4.02s/it]
{'loss': 0.278, 'grad_norm': 0.44697651267051697, 'learning_rate': 4.4460098552060845e-05, 'epoch': 0.29}
29%|██▉ | 1328/4506 [1:30:55<3:32:47, 4.02s/it]
29%|██▉ | 1329/4506 [1:30:59<3:31:44, 4.00s/it]
{'loss': 0.2954, 'grad_norm': 0.43553024530410767, 'learning_rate': 4.4447933786199294e-05, 'epoch': 0.29}
29%|██▉ | 1329/4506 [1:30:59<3:31:44, 4.00s/it]
30%|██▉ | 1330/4506 [1:31:02<3:23:49, 3.85s/it]
{'loss': 0.2702, 'grad_norm': 0.42473194003105164, 'learning_rate': 4.443575734709614e-05, 'epoch': 0.3}
30%|██▉ | 1330/4506 [1:31:02<3:23:49, 3.85s/it]
30%|██▉ | 1331/4506 [1:31:06<3:27:07, 3.91s/it]
{'loss': 0.311, 'grad_norm': 0.43714439868927, 'learning_rate': 4.4423569242060076e-05, 'epoch': 0.3}
30%|██▉ | 1331/4506 [1:31:06<3:27:07, 3.91s/it]
30%|██▉ | 1332/4506 [1:31:10<3:31:09, 3.99s/it]
{'loss': 0.2927, 'grad_norm': 0.4780579209327698, 'learning_rate': 4.441136947840676e-05, 'epoch': 0.3}
30%|██▉ | 1332/4506 [1:31:10<3:31:09, 3.99s/it]
30%|██▉ | 1333/4506 [1:31:14<3:30:27, 3.98s/it]
{'loss': 0.3025, 'grad_norm': 0.4852457046508789, 'learning_rate': 4.439915806345886e-05, 'epoch': 0.3}
30%|██▉ | 1333/4506 [1:31:14<3:30:27, 3.98s/it]
30%|██▉ | 1334/4506 [1:31:18<3:32:33, 4.02s/it]
{'loss': 0.2966, 'grad_norm': 0.4361115097999573, 'learning_rate': 4.438693500454605e-05, 'epoch': 0.3}
30%|██▉ | 1334/4506 [1:31:18<3:32:33, 4.02s/it]
30%|██▉ | 1335/4506 [1:31:22<3:34:03, 4.05s/it]
{'loss': 0.2896, 'grad_norm': 0.46155181527137756, 'learning_rate': 4.4374700309004965e-05, 'epoch': 0.3}
30%|██▉ | 1335/4506 [1:31:22<3:34:03, 4.05s/it]
30%|██▉ | 1336/4506 [1:31:26<3:32:59, 4.03s/it]
{'loss': 0.311, 'grad_norm': 0.43667811155319214, 'learning_rate': 4.436245398417926e-05, 'epoch': 0.3}
30%|██▉ | 1336/4506 [1:31:26<3:32:59, 4.03s/it]
30%|██▉ | 1337/4506 [1:31:30<3:32:25, 4.02s/it]
{'loss': 0.2823, 'grad_norm': 0.4230649471282959, 'learning_rate': 4.435019603741955e-05, 'epoch': 0.3}
30%|██▉ | 1337/4506 [1:31:30<3:32:25, 4.02s/it]
30%|██▉ | 1338/4506 [1:31:34<3:29:35, 3.97s/it]
{'loss': 0.2847, 'grad_norm': 0.4176054894924164, 'learning_rate': 4.4337926476083405e-05, 'epoch': 0.3}
30%|██▉ | 1338/4506 [1:31:34<3:29:35, 3.97s/it]
30%|██▉ | 1339/4506 [1:31:38<3:30:53, 4.00s/it]
{'loss': 0.2839, 'grad_norm': 0.44221192598342896, 'learning_rate': 4.4325645307535414e-05, 'epoch': 0.3}
30%|██▉ | 1339/4506 [1:31:38<3:30:53, 4.00s/it]
30%|██▉ | 1340/4506 [1:31:42<3:32:17, 4.02s/it]
{'loss': 0.2895, 'grad_norm': 0.4152820408344269, 'learning_rate': 4.43133525391471e-05, 'epoch': 0.3}
30%|██▉ | 1340/4506 [1:31:42<3:32:17, 4.02s/it]
30%|██▉ | 1341/4506 [1:31:47<3:33:08, 4.04s/it]
{'loss': 0.2888, 'grad_norm': 0.41931501030921936, 'learning_rate': 4.430104817829696e-05, 'epoch': 0.3}
30%|██▉ | 1341/4506 [1:31:47<3:33:08, 4.04s/it]
30%|██▉ | 1342/4506 [1:31:51<3:35:41, 4.09s/it]
{'loss': 0.3044, 'grad_norm': 0.4354287087917328, 'learning_rate': 4.428873223237043e-05, 'epoch': 0.3}
30%|██▉ | 1342/4506 [1:31:51<3:35:41, 4.09s/it]
30%|██▉ | 1343/4506 [1:31:54<3:30:36, 4.00s/it]
{'loss': 0.3043, 'grad_norm': 0.4430081248283386, 'learning_rate': 4.4276404708759934e-05, 'epoch': 0.3}
30%|██▉ | 1343/4506 [1:31:54<3:30:36, 4.00s/it]
30%|██▉ | 1344/4506 [1:31:59<3:41:59, 4.21s/it]
{'loss': 0.2965, 'grad_norm': 0.4152029752731323, 'learning_rate': 4.4264065614864814e-05, 'epoch': 0.3}
30%|██▉ | 1344/4506 [1:31:59<3:41:59, 4.21s/it]
30%|██▉ | 1345/4506 [1:32:03<3:37:58, 4.14s/it]
{'loss': 0.2875, 'grad_norm': 0.47227442264556885, 'learning_rate': 4.425171495809138e-05, 'epoch': 0.3}
30%|██▉ | 1345/4506 [1:32:03<3:37:58, 4.14s/it]
30%|██▉ | 1346/4506 [1:32:07<3:33:30, 4.05s/it]
{'loss': 0.2945, 'grad_norm': 0.4436900019645691, 'learning_rate': 4.423935274585287e-05, 'epoch': 0.3}
30%|██▉ | 1346/4506 [1:32:07<3:33:30, 4.05s/it]
30%|██▉ | 1347/4506 [1:32:11<3:29:48, 3.98s/it]
{'loss': 0.3025, 'grad_norm': 0.42855745553970337, 'learning_rate': 4.4226978985569457e-05, 'epoch': 0.3}
30%|██▉ | 1347/4506 [1:32:11<3:29:48, 3.98s/it]
30%|██▉ | 1348/4506 [1:32:15<3:34:35, 4.08s/it]
{'loss': 0.2998, 'grad_norm': 0.47486740350723267, 'learning_rate': 4.421459368466825e-05, 'epoch': 0.3}
30%|██▉ | 1348/4506 [1:32:15<3:34:35, 4.08s/it]
30%|██▉ | 1349/4506 [1:32:19<3:34:32, 4.08s/it]
{'loss': 0.292, 'grad_norm': 0.47146910429000854, 'learning_rate': 4.420219685058327e-05, 'epoch': 0.3}
30%|██▉ | 1349/4506 [1:32:19<3:34:32, 4.08s/it]
30%|██▉ | 1350/4506 [1:32:23<3:32:41, 4.04s/it]
{'loss': 0.3066, 'grad_norm': 0.4614923894405365, 'learning_rate': 4.4189788490755496e-05, 'epoch': 0.3}
30%|██▉ | 1350/4506 [1:32:23<3:32:41, 4.04s/it]
30%|██▉ | 1351/4506 [1:32:27<3:30:05, 4.00s/it]
{'loss': 0.2908, 'grad_norm': 0.49508193135261536, 'learning_rate': 4.417736861263279e-05, 'epoch': 0.3}
30%|██▉ | 1351/4506 [1:32:27<3:30:05, 4.00s/it]
30%|███ | 1352/4506 [1:32:31<3:30:16, 4.00s/it]
{'loss': 0.306, 'grad_norm': 0.5096343159675598, 'learning_rate': 4.416493722366994e-05, 'epoch': 0.3}
30%|███ | 1352/4506 [1:32:31<3:30:16, 4.00s/it]
30%|███ | 1353/4506 [1:32:35<3:33:31, 4.06s/it]
{'loss': 0.2976, 'grad_norm': 0.48638662695884705, 'learning_rate': 4.4152494331328665e-05, 'epoch': 0.3}
30%|███ | 1353/4506 [1:32:35<3:33:31, 4.06s/it]
30%|███ | 1354/4506 [1:32:40<3:49:26, 4.37s/it]
{'loss': 0.2856, 'grad_norm': 0.4078595042228699, 'learning_rate': 4.414003994307754e-05, 'epoch': 0.3}
30%|███ | 1354/4506 [1:32:40<3:49:26, 4.37s/it]
30%|███ | 1355/4506 [1:32:44<3:42:54, 4.24s/it]
{'loss': 0.2949, 'grad_norm': 0.42007631063461304, 'learning_rate': 4.412757406639207e-05, 'epoch': 0.3}
30%|███ | 1355/4506 [1:32:44<3:42:54, 4.24s/it]
30%|███ | 1356/4506 [1:32:49<3:44:59, 4.29s/it]
{'loss': 0.3041, 'grad_norm': 0.45570048689842224, 'learning_rate': 4.4115096708754666e-05, 'epoch': 0.3}
30%|███ | 1356/4506 [1:32:49<3:44:59, 4.29s/it]
30%|███ | 1357/4506 [1:32:53<3:37:49, 4.15s/it]
{'loss': 0.2906, 'grad_norm': 0.46211594343185425, 'learning_rate': 4.4102607877654624e-05, 'epoch': 0.3}
30%|███ | 1357/4506 [1:32:53<3:37:49, 4.15s/it]
30%|███ | 1358/4506 [1:32:57<3:36:04, 4.12s/it]
{'loss': 0.301, 'grad_norm': 0.41566503047943115, 'learning_rate': 4.40901075805881e-05, 'epoch': 0.3}
30%|███ | 1358/4506 [1:32:57<3:36:04, 4.12s/it]
30%|███ | 1359/4506 [1:33:01<3:35:49, 4.11s/it]
{'loss': 0.2863, 'grad_norm': 0.44228196144104004, 'learning_rate': 4.407759582505817e-05, 'epoch': 0.3}
30%|███ | 1359/4506 [1:33:01<3:35:49, 4.11s/it]
30%|███ | 1360/4506 [1:33:05<3:42:03, 4.23s/it]
{'loss': 0.2903, 'grad_norm': 0.4794284403324127, 'learning_rate': 4.4065072618574754e-05, 'epoch': 0.3}
30%|███ | 1360/4506 [1:33:05<3:42:03, 4.23s/it]
30%|███ | 1361/4506 [1:33:10<3:43:32, 4.26s/it]
{'loss': 0.2846, 'grad_norm': 0.4407685697078705, 'learning_rate': 4.4052537968654674e-05, 'epoch': 0.3}
30%|███ | 1361/4506 [1:33:10<3:43:32, 4.26s/it]
30%|███ | 1362/4506 [1:33:14<3:49:37, 4.38s/it]
{'loss': 0.2893, 'grad_norm': 0.4645780622959137, 'learning_rate': 4.40399918828216e-05, 'epoch': 0.3}
30%|███ | 1362/4506 [1:33:14<3:49:37, 4.38s/it]
30%|███ | 1363/4506 [1:33:18<3:43:45, 4.27s/it]
{'loss': 0.2903, 'grad_norm': 0.4895080626010895, 'learning_rate': 4.4027434368606076e-05, 'epoch': 0.3}
30%|███ | 1363/4506 [1:33:18<3:43:45, 4.27s/it]
30%|███ | 1364/4506 [1:33:23<3:46:31, 4.33s/it]
{'loss': 0.2999, 'grad_norm': 0.46498361229896545, 'learning_rate': 4.401486543354553e-05, 'epoch': 0.3}
30%|███ | 1364/4506 [1:33:23<3:46:31, 4.33s/it]
30%|███ | 1365/4506 [1:33:27<3:38:54, 4.18s/it]
{'loss': 0.2978, 'grad_norm': 0.5184839963912964, 'learning_rate': 4.4002285085184175e-05, 'epoch': 0.3}
30%|███ | 1365/4506 [1:33:27<3:38:54, 4.18s/it]
30%|███ | 1366/4506 [1:33:31<3:38:00, 4.17s/it]
{'loss': 0.2953, 'grad_norm': 0.415974497795105, 'learning_rate': 4.3989693331073145e-05, 'epoch': 0.3}
30%|███ | 1366/4506 [1:33:31<3:38:00, 4.17s/it]
30%|███ | 1367/4506 [1:33:35<3:35:28, 4.12s/it]
{'loss': 0.3023, 'grad_norm': 0.509651780128479, 'learning_rate': 4.39770901787704e-05, 'epoch': 0.3}
30%|███ | 1367/4506 [1:33:35<3:35:28, 4.12s/it]
30%|███ | 1368/4506 [1:33:39<3:35:20, 4.12s/it]
{'loss': 0.2956, 'grad_norm': 0.43216925859451294, 'learning_rate': 4.3964475635840705e-05, 'epoch': 0.3}
30%|███ | 1368/4506 [1:33:39<3:35:20, 4.12s/it]
30%|███ | 1369/4506 [1:33:43<3:37:53, 4.17s/it]
{'loss': 0.2989, 'grad_norm': 0.3737112283706665, 'learning_rate': 4.3951849709855727e-05, 'epoch': 0.3}
30%|███ | 1369/4506 [1:33:43<3:37:53, 4.17s/it]
30%|███ | 1370/4506 [1:33:47<3:32:52, 4.07s/it]
{'loss': 0.2976, 'grad_norm': 0.4529306888580322, 'learning_rate': 4.393921240839391e-05, 'epoch': 0.3}
30%|███ | 1370/4506 [1:33:47<3:32:52, 4.07s/it]
30%|███ | 1371/4506 [1:33:51<3:29:10, 4.00s/it]
{'loss': 0.279, 'grad_norm': 0.4485408663749695, 'learning_rate': 4.3926563739040555e-05, 'epoch': 0.3}
30%|███ | 1371/4506 [1:33:51<3:29:10, 4.00s/it]
30%|███ | 1372/4506 [1:33:55<3:27:37, 3.97s/it]
{'loss': 0.2865, 'grad_norm': 0.42905765771865845, 'learning_rate': 4.391390370938778e-05, 'epoch': 0.3}
30%|███ | 1372/4506 [1:33:55<3:27:37, 3.97s/it]
30%|███ | 1373/4506 [1:33:59<3:28:13, 3.99s/it]
{'loss': 0.2969, 'grad_norm': 0.5390808582305908, 'learning_rate': 4.390123232703451e-05, 'epoch': 0.3}
30%|███ | 1373/4506 [1:33:59<3:28:13, 3.99s/it]
30%|███ | 1374/4506 [1:34:03<3:32:04, 4.06s/it]
{'loss': 0.2905, 'grad_norm': 0.4293787181377411, 'learning_rate': 4.38885495995865e-05, 'epoch': 0.3}
30%|███ | 1374/4506 [1:34:03<3:32:04, 4.06s/it]
31%|███ | 1375/4506 [1:34:07<3:36:04, 4.14s/it]
{'loss': 0.2916, 'grad_norm': 0.44349405169487, 'learning_rate': 4.387585553465631e-05, 'epoch': 0.31}
31%|███ | 1375/4506 [1:34:07<3:36:04, 4.14s/it]
31%|███ | 1376/4506 [1:34:11<3:37:56, 4.18s/it]
{'loss': 0.2812, 'grad_norm': 0.44540131092071533, 'learning_rate': 4.386315013986331e-05, 'epoch': 0.31}
31%|███ | 1376/4506 [1:34:11<3:37:56, 4.18s/it]
31%|███ | 1377/4506 [1:34:16<3:38:25, 4.19s/it]
{'loss': 0.2911, 'grad_norm': 0.42032262682914734, 'learning_rate': 4.3850433422833656e-05, 'epoch': 0.31}
31%|███ | 1377/4506 [1:34:16<3:38:25, 4.19s/it]
31%|███ | 1378/4506 [1:34:20<3:40:03, 4.22s/it]
{'loss': 0.3093, 'grad_norm': 0.4468022286891937, 'learning_rate': 4.38377053912003e-05, 'epoch': 0.31}
31%|███ | 1378/4506 [1:34:20<3:40:03, 4.22s/it]
31%|███ | 1379/4506 [1:34:24<3:35:59, 4.14s/it]
{'loss': 0.2781, 'grad_norm': 0.39601245522499084, 'learning_rate': 4.382496605260302e-05, 'epoch': 0.31}
31%|███ | 1379/4506 [1:34:24<3:35:59, 4.14s/it]
31%|███ | 1380/4506 [1:34:28<3:37:05, 4.17s/it]
{'loss': 0.2933, 'grad_norm': 0.4641726016998291, 'learning_rate': 4.381221541468833e-05, 'epoch': 0.31}
31%|███ | 1380/4506 [1:34:28<3:37:05, 4.17s/it]
31%|███ | 1381/4506 [1:34:32<3:36:05, 4.15s/it]
{'loss': 0.2801, 'grad_norm': 0.41258493065834045, 'learning_rate': 4.3799453485109556e-05, 'epoch': 0.31}
31%|███ | 1381/4506 [1:34:32<3:36:05, 4.15s/it]
31%|███ | 1382/4506 [1:34:36<3:35:02, 4.13s/it]
{'loss': 0.2968, 'grad_norm': 0.4763171672821045, 'learning_rate': 4.37866802715268e-05, 'epoch': 0.31}
31%|███ | 1382/4506 [1:34:36<3:35:02, 4.13s/it]
31%|███ | 1383/4506 [1:34:40<3:29:04, 4.02s/it]
{'loss': 0.2933, 'grad_norm': 0.5321487188339233, 'learning_rate': 4.377389578160694e-05, 'epoch': 0.31}
31%|███ | 1383/4506 [1:34:40<3:29:04, 4.02s/it]
31%|███ | 1384/4506 [1:34:44<3:28:53, 4.01s/it]
{'loss': 0.2811, 'grad_norm': 0.45973673462867737, 'learning_rate': 4.376110002302361e-05, 'epoch': 0.31}
31%|███ | 1384/4506 [1:34:44<3:28:53, 4.01s/it]
31%|███ | 1385/4506 [1:34:48<3:31:06, 4.06s/it]
{'loss': 0.2922, 'grad_norm': 0.41204938292503357, 'learning_rate': 4.374829300345721e-05, 'epoch': 0.31}
31%|███ | 1385/4506 [1:34:48<3:31:06, 4.06s/it]
31%|███ | 1386/4506 [1:34:53<3:33:25, 4.10s/it]
{'loss': 0.288, 'grad_norm': 0.4298037886619568, 'learning_rate': 4.373547473059491e-05, 'epoch': 0.31}
31%|███ | 1386/4506 [1:34:53<3:33:25, 4.10s/it]
31%|███ | 1387/4506 [1:34:57<3:35:32, 4.15s/it]
{'loss': 0.2905, 'grad_norm': 0.39057034254074097, 'learning_rate': 4.3722645212130616e-05, 'epoch': 0.31}
31%|███ | 1387/4506 [1:34:57<3:35:32, 4.15s/it]
31%|███ | 1388/4506 [1:35:01<3:39:10, 4.22s/it]
{'loss': 0.2779, 'grad_norm': 0.39799848198890686, 'learning_rate': 4.370980445576501e-05, 'epoch': 0.31}
31%|███ | 1388/4506 [1:35:01<3:39:10, 4.22s/it]
31%|███ | 1389/4506 [1:35:05<3:40:02, 4.24s/it]
{'loss': 0.2965, 'grad_norm': 0.42089200019836426, 'learning_rate': 4.369695246920549e-05, 'epoch': 0.31}
31%|███ | 1389/4506 [1:35:05<3:40:02, 4.24s/it]
31%|███ | 1390/4506 [1:35:09<3:34:56, 4.14s/it]
{'loss': 0.2743, 'grad_norm': 0.39049068093299866, 'learning_rate': 4.3684089260166225e-05, 'epoch': 0.31}
31%|███ | 1390/4506 [1:35:09<3:34:56, 4.14s/it]
31%|███ | 1391/4506 [1:35:13<3:29:07, 4.03s/it]
{'loss': 0.2829, 'grad_norm': 0.4372801184654236, 'learning_rate': 4.367121483636809e-05, 'epoch': 0.31}
31%|███ | 1391/4506 [1:35:13<3:29:07, 4.03s/it]
31%|███ | 1392/4506 [1:35:17<3:31:57, 4.08s/it]
{'loss': 0.2871, 'grad_norm': 0.39304834604263306, 'learning_rate': 4.365832920553872e-05, 'epoch': 0.31}
31%|███ | 1392/4506 [1:35:17<3:31:57, 4.08s/it]
31%|███ | 1393/4506 [1:35:21<3:28:12, 4.01s/it]
{'loss': 0.2843, 'grad_norm': 0.42192304134368896, 'learning_rate': 4.3645432375412444e-05, 'epoch': 0.31}
31%|███ | 1393/4506 [1:35:21<3:28:12, 4.01s/it]
31%|███ | 1394/4506 [1:35:25<3:24:46, 3.95s/it]
{'loss': 0.2789, 'grad_norm': 0.4429652690887451, 'learning_rate': 4.363252435373034e-05, 'epoch': 0.31}
31%|███ | 1394/4506 [1:35:25<3:24:46, 3.95s/it]
31%|███ | 1395/4506 [1:35:29<3:27:59, 4.01s/it]
{'loss': 0.2924, 'grad_norm': 0.48326560854911804, 'learning_rate': 4.3619605148240204e-05, 'epoch': 0.31}
31%|███ | 1395/4506 [1:35:29<3:27:59, 4.01s/it]
31%|███ | 1396/4506 [1:35:33<3:26:40, 3.99s/it]
{'loss': 0.2805, 'grad_norm': 0.4045831561088562, 'learning_rate': 4.3606674766696534e-05, 'epoch': 0.31}
31%|███ | 1396/4506 [1:35:33<3:26:40, 3.99s/it]
31%|███ | 1397/4506 [1:35:37<3:26:32, 3.99s/it]
{'loss': 0.2917, 'grad_norm': 0.44054165482521057, 'learning_rate': 4.359373321686053e-05, 'epoch': 0.31}
31%|███ | 1397/4506 [1:35:37<3:26:32, 3.99s/it]
31%|███ | 1398/4506 [1:35:41<3:26:36, 3.99s/it]
{'loss': 0.2907, 'grad_norm': 0.46501925587654114, 'learning_rate': 4.358078050650011e-05, 'epoch': 0.31}
31%|███ | 1398/4506 [1:35:41<3:26:36, 3.99s/it]
31%|███ | 1399/4506 [1:35:46<3:35:12, 4.16s/it]
{'loss': 0.2759, 'grad_norm': 0.4099593162536621, 'learning_rate': 4.356781664338988e-05, 'epoch': 0.31}
31%|███ | 1399/4506 [1:35:46<3:35:12, 4.16s/it]
31%|███ | 1400/4506 [1:35:50<3:32:39, 4.11s/it]
{'loss': 0.279, 'grad_norm': 0.428849995136261, 'learning_rate': 4.355484163531115e-05, 'epoch': 0.31}
31%|███ | 1400/4506 [1:35:50<3:32:39, 4.11s/it]
31%|███ | 1401/4506 [1:35:54<3:30:23, 4.07s/it]
{'loss': 0.297, 'grad_norm': 0.4679866135120392, 'learning_rate': 4.354185549005192e-05, 'epoch': 0.31}
31%|███ | 1401/4506 [1:35:54<3:30:23, 4.07s/it]
31%|███ | 1402/4506 [1:35:57<3:26:43, 4.00s/it]
{'loss': 0.2829, 'grad_norm': 0.4609648287296295, 'learning_rate': 4.3528858215406856e-05, 'epoch': 0.31}
31%|███ | 1402/4506 [1:35:57<3:26:43, 4.00s/it]
31%|███ | 1403/4506 [1:36:01<3:26:17, 3.99s/it]
{'loss': 0.2855, 'grad_norm': 0.4549332857131958, 'learning_rate': 4.351584981917732e-05, 'epoch': 0.31}
31%|███ | 1403/4506 [1:36:01<3:26:17, 3.99s/it]
31%|███ | 1404/4506 [1:36:05<3:26:40, 4.00s/it]
{'loss': 0.3147, 'grad_norm': 0.4687715172767639, 'learning_rate': 4.350283030917136e-05, 'epoch': 0.31}
31%|███ | 1404/4506 [1:36:05<3:26:40, 4.00s/it]
31%|███ | 1405/4506 [1:36:09<3:27:39, 4.02s/it]
{'loss': 0.2896, 'grad_norm': 0.41976436972618103, 'learning_rate': 4.348979969320367e-05, 'epoch': 0.31}
31%|███ | 1405/4506 [1:36:09<3:27:39, 4.02s/it]
31%|███ | 1406/4506 [1:36:13<3:26:17, 3.99s/it]
{'loss': 0.2749, 'grad_norm': 0.415494829416275, 'learning_rate': 4.347675797909562e-05, 'epoch': 0.31}
31%|███ | 1406/4506 [1:36:13<3:26:17, 3.99s/it]
31%|███ | 1407/4506 [1:36:18<3:29:10, 4.05s/it]
{'loss': 0.2828, 'grad_norm': 0.39504656195640564, 'learning_rate': 4.3463705174675265e-05, 'epoch': 0.31}
31%|███ | 1407/4506 [1:36:18<3:29:10, 4.05s/it]
31%|███ | 1408/4506 [1:36:21<3:24:59, 3.97s/it]
{'loss': 0.3027, 'grad_norm': 0.4614517092704773, 'learning_rate': 4.345064128777726e-05, 'epoch': 0.31}
31%|███ | 1408/4506 [1:36:21<3:24:59, 3.97s/it]
31%|███▏ | 1409/4506 [1:36:26<3:29:02, 4.05s/it]
{'loss': 0.2953, 'grad_norm': 0.453668475151062, 'learning_rate': 4.343756632624298e-05, 'epoch': 0.31}
31%|███▏ | 1409/4506 [1:36:26<3:29:02, 4.05s/it]
31%|███▏ | 1410/4506 [1:36:30<3:36:41, 4.20s/it]
{'loss': 0.2924, 'grad_norm': 0.5394361019134521, 'learning_rate': 4.34244802979204e-05, 'epoch': 0.31}
31%|███▏ | 1410/4506 [1:36:30<3:36:41, 4.20s/it]
31%|███▏ | 1411/4506 [1:36:34<3:32:43, 4.12s/it]
{'loss': 0.2811, 'grad_norm': 0.5101924538612366, 'learning_rate': 4.3411383210664144e-05, 'epoch': 0.31}
31%|███▏ | 1411/4506 [1:36:34<3:32:43, 4.12s/it]
31%|███▏ | 1412/4506 [1:36:38<3:34:26, 4.16s/it]
{'loss': 0.2894, 'grad_norm': 0.4451570510864258, 'learning_rate': 4.33982750723355e-05, 'epoch': 0.31}
31%|███▏ | 1412/4506 [1:36:38<3:34:26, 4.16s/it]
31%|███▏ | 1413/4506 [1:36:42<3:32:39, 4.13s/it]
{'loss': 0.3012, 'grad_norm': 0.479181170463562, 'learning_rate': 4.338515589080237e-05, 'epoch': 0.31}
31%|███▏ | 1413/4506 [1:36:42<3:32:39, 4.13s/it]
31%|███▏ | 1414/4506 [1:36:46<3:27:04, 4.02s/it]
{'loss': 0.2839, 'grad_norm': 0.4510266184806824, 'learning_rate': 4.337202567393928e-05, 'epoch': 0.31}
31%|███▏ | 1414/4506 [1:36:46<3:27:04, 4.02s/it]
31%|███▏ | 1415/4506 [1:36:50<3:28:09, 4.04s/it]
{'loss': 0.2843, 'grad_norm': 0.4351036250591278, 'learning_rate': 4.3358884429627375e-05, 'epoch': 0.31}
31%|███▏ | 1415/4506 [1:36:50<3:28:09, 4.04s/it]
31%|███▏ | 1416/4506 [1:36:54<3:28:24, 4.05s/it]
{'loss': 0.2952, 'grad_norm': 0.47945636510849, 'learning_rate': 4.334573216575445e-05, 'epoch': 0.31}
31%|███▏ | 1416/4506 [1:36:54<3:28:24, 4.05s/it]
31%|███▏ | 1417/4506 [1:36:58<3:27:43, 4.03s/it]
{'loss': 0.2966, 'grad_norm': 0.4493899643421173, 'learning_rate': 4.3332568890214875e-05, 'epoch': 0.31}
31%|███▏ | 1417/4506 [1:36:58<3:27:43, 4.03s/it]
31%|███▏ | 1418/4506 [1:37:02<3:28:38, 4.05s/it]
{'loss': 0.2778, 'grad_norm': 0.41709545254707336, 'learning_rate': 4.331939461090966e-05, 'epoch': 0.31}
31%|███▏ | 1418/4506 [1:37:02<3:28:38, 4.05s/it]
31%|███▏ | 1419/4506 [1:37:06<3:29:36, 4.07s/it]
{'loss': 0.3021, 'grad_norm': 0.41885852813720703, 'learning_rate': 4.330620933574641e-05, 'epoch': 0.31}
31%|███▏ | 1419/4506 [1:37:07<3:29:36, 4.07s/it]
32%|███▏ | 1420/4506 [1:37:10<3:24:35, 3.98s/it]
{'loss': 0.2803, 'grad_norm': 0.45554599165916443, 'learning_rate': 4.329301307263932e-05, 'epoch': 0.32}
32%|███▏ | 1420/4506 [1:37:10<3:24:35, 3.98s/it]
32%|███▏ | 1421/4506 [1:37:14<3:19:29, 3.88s/it]
{'loss': 0.2793, 'grad_norm': 0.41274869441986084, 'learning_rate': 4.3279805829509196e-05, 'epoch': 0.32}
32%|███▏ | 1421/4506 [1:37:14<3:19:29, 3.88s/it]
32%|███▏ | 1422/4506 [1:37:18<3:18:18, 3.86s/it]
{'loss': 0.2872, 'grad_norm': 0.4615424573421478, 'learning_rate': 4.3266587614283426e-05, 'epoch': 0.32}
32%|███▏ | 1422/4506 [1:37:18<3:18:18, 3.86s/it]
32%|███▏ | 1423/4506 [1:37:22<3:19:01, 3.87s/it]
{'loss': 0.2826, 'grad_norm': 0.4373774230480194, 'learning_rate': 4.3253358434895975e-05, 'epoch': 0.32}
32%|███▏ | 1423/4506 [1:37:22<3:19:01, 3.87s/it]
32%|███▏ | 1424/4506 [1:37:26<3:22:39, 3.95s/it]
{'loss': 0.2882, 'grad_norm': 0.4928945004940033, 'learning_rate': 4.324011829928741e-05, 'epoch': 0.32}
32%|███▏ | 1424/4506 [1:37:26<3:22:39, 3.95s/it]
32%|███▏ | 1425/4506 [1:37:30<3:31:46, 4.12s/it]
{'loss': 0.295, 'grad_norm': 0.48207998275756836, 'learning_rate': 4.3226867215404864e-05, 'epoch': 0.32}
32%|███▏ | 1425/4506 [1:37:30<3:31:46, 4.12s/it]
32%|███▏ | 1426/4506 [1:37:34<3:30:20, 4.10s/it]
{'loss': 0.2991, 'grad_norm': 0.47239843010902405, 'learning_rate': 4.321360519120204e-05, 'epoch': 0.32}
32%|███▏ | 1426/4506 [1:37:34<3:30:20, 4.10s/it]
32%|███▏ | 1427/4506 [1:37:38<3:28:31, 4.06s/it]
{'loss': 0.3034, 'grad_norm': 0.5176689624786377, 'learning_rate': 4.32003322346392e-05, 'epoch': 0.32}
32%|███▏ | 1427/4506 [1:37:38<3:28:31, 4.06s/it]
32%|███▏ | 1428/4506 [1:37:42<3:29:37, 4.09s/it]
{'loss': 0.287, 'grad_norm': 0.4696263074874878, 'learning_rate': 4.318704835368318e-05, 'epoch': 0.32}
32%|███▏ | 1428/4506 [1:37:42<3:29:37, 4.09s/it]
32%|███▏ | 1429/4506 [1:37:46<3:27:46, 4.05s/it]
{'loss': 0.2924, 'grad_norm': 0.47677236795425415, 'learning_rate': 4.3173753556307375e-05, 'epoch': 0.32}
32%|███▏ | 1429/4506 [1:37:46<3:27:46, 4.05s/it]
32%|███▏ | 1430/4506 [1:37:50<3:27:12, 4.04s/it]
{'loss': 0.2886, 'grad_norm': 0.4515426754951477, 'learning_rate': 4.3160447850491725e-05, 'epoch': 0.32}
32%|███▏ | 1430/4506 [1:37:50<3:27:12, 4.04s/it]
32%|███▏ | 1431/4506 [1:37:54<3:27:15, 4.04s/it]
{'loss': 0.2897, 'grad_norm': 0.43672919273376465, 'learning_rate': 4.314713124422271e-05, 'epoch': 0.32}
32%|███▏ | 1431/4506 [1:37:54<3:27:15, 4.04s/it]
32%|███▏ | 1432/4506 [1:37:58<3:25:43, 4.02s/it]
{'loss': 0.2832, 'grad_norm': 0.42106086015701294, 'learning_rate': 4.313380374549338e-05, 'epoch': 0.32}
32%|███▏ | 1432/4506 [1:37:58<3:25:43, 4.02s/it]
32%|███▏ | 1433/4506 [1:38:03<3:28:55, 4.08s/it]
{'loss': 0.3067, 'grad_norm': 0.41535916924476624, 'learning_rate': 4.3120465362303285e-05, 'epoch': 0.32}
32%|███▏ | 1433/4506 [1:38:03<3:28:55, 4.08s/it]
32%|███▏ | 1434/4506 [1:38:07<3:39:22, 4.28s/it]
{'loss': 0.2998, 'grad_norm': 0.41927671432495117, 'learning_rate': 4.3107116102658546e-05, 'epoch': 0.32}
32%|███▏ | 1434/4506 [1:38:07<3:39:22, 4.28s/it]
32%|███▏ | 1435/4506 [1:38:11<3:32:09, 4.15s/it]
{'loss': 0.2841, 'grad_norm': 0.451346755027771, 'learning_rate': 4.309375597457178e-05, 'epoch': 0.32}
32%|███▏ | 1435/4506 [1:38:11<3:32:09, 4.15s/it]
32%|███▏ | 1436/4506 [1:38:15<3:29:56, 4.10s/it]
{'loss': 0.2884, 'grad_norm': 0.3795832395553589, 'learning_rate': 4.308038498606216e-05, 'epoch': 0.32}
32%|███▏ | 1436/4506 [1:38:15<3:29:56, 4.10s/it]
32%|███▏ | 1437/4506 [1:38:19<3:26:27, 4.04s/it]
{'loss': 0.2854, 'grad_norm': 0.43456488847732544, 'learning_rate': 4.3067003145155346e-05, 'epoch': 0.32}
32%|███▏ | 1437/4506 [1:38:19<3:26:27, 4.04s/it]
32%|███▏ | 1438/4506 [1:38:23<3:23:07, 3.97s/it]
{'loss': 0.2872, 'grad_norm': 0.43175190687179565, 'learning_rate': 4.3053610459883526e-05, 'epoch': 0.32}
32%|███▏ | 1438/4506 [1:38:23<3:23:07, 3.97s/it]
32%|███▏ | 1439/4506 [1:38:27<3:23:42, 3.99s/it]
{'loss': 0.2685, 'grad_norm': 0.3729560077190399, 'learning_rate': 4.30402069382854e-05, 'epoch': 0.32}
32%|███▏ | 1439/4506 [1:38:27<3:23:42, 3.99s/it]
32%|███▏ | 1440/4506 [1:38:31<3:27:15, 4.06s/it]
{'loss': 0.3086, 'grad_norm': 0.481825590133667, 'learning_rate': 4.302679258840617e-05, 'epoch': 0.32}
32%|███▏ | 1440/4506 [1:38:31<3:27:15, 4.06s/it]
32%|███▏ | 1441/4506 [1:38:35<3:25:03, 4.01s/it]
{'loss': 0.2862, 'grad_norm': 0.4446410536766052, 'learning_rate': 4.301336741829755e-05, 'epoch': 0.32}
32%|███▏ | 1441/4506 [1:38:35<3:25:03, 4.01s/it]
32%|███▏ | 1442/4506 [1:38:39<3:27:16, 4.06s/it]
{'loss': 0.2837, 'grad_norm': 0.45960086584091187, 'learning_rate': 4.299993143601772e-05, 'epoch': 0.32}
32%|███▏ | 1442/4506 [1:38:39<3:27:16, 4.06s/it]
32%|███▏ | 1443/4506 [1:38:43<3:24:29, 4.01s/it]
{'loss': 0.2991, 'grad_norm': 0.47001349925994873, 'learning_rate': 4.298648464963136e-05, 'epoch': 0.32}
32%|███▏ | 1443/4506 [1:38:43<3:24:29, 4.01s/it]
32%|███▏ | 1444/4506 [1:38:47<3:28:11, 4.08s/it]
{'loss': 0.2811, 'grad_norm': 0.4191378355026245, 'learning_rate': 4.2973027067209656e-05, 'epoch': 0.32}
32%|███▏ | 1444/4506 [1:38:47<3:28:11, 4.08s/it]
32%|███▏ | 1445/4506 [1:38:52<3:30:09, 4.12s/it]
{'loss': 0.2806, 'grad_norm': 0.40677228569984436, 'learning_rate': 4.295955869683024e-05, 'epoch': 0.32}
32%|███▏ | 1445/4506 [1:38:52<3:30:09, 4.12s/it]
32%|███▏ | 1446/4506 [1:38:55<3:26:47, 4.05s/it]
{'loss': 0.2761, 'grad_norm': 0.4451577961444855, 'learning_rate': 4.2946079546577264e-05, 'epoch': 0.32}
32%|███▏ | 1446/4506 [1:38:56<3:26:47, 4.05s/it]
32%|███▏ | 1447/4506 [1:39:00<3:28:22, 4.09s/it]
{'loss': 0.3035, 'grad_norm': 0.5018994808197021, 'learning_rate': 4.29325896245413e-05, 'epoch': 0.32}
32%|███▏ | 1447/4506 [1:39:00<3:28:22, 4.09s/it]
32%|███▏ | 1448/4506 [1:39:04<3:29:09, 4.10s/it]
{'loss': 0.2933, 'grad_norm': 0.43188396096229553, 'learning_rate': 4.291908893881942e-05, 'epoch': 0.32}
32%|███▏ | 1448/4506 [1:39:04<3:29:09, 4.10s/it]
32%|███▏ | 1449/4506 [1:39:08<3:26:47, 4.06s/it]
{'loss': 0.2855, 'grad_norm': 0.4166548550128937, 'learning_rate': 4.290557749751515e-05, 'epoch': 0.32}
32%|███▏ | 1450/4506 [1:39:12<3:27:15, 4.07s/it]
{'loss': 0.286, 'grad_norm': 0.42256397008895874, 'learning_rate': 4.289205530873845e-05, 'epoch': 0.32}
32%|███▏ | 1451/4506 [1:39:16<3:25:45, 4.04s/it]
{'loss': 0.2927, 'grad_norm': 0.48743876814842224, 'learning_rate': 4.287852238060578e-05, 'epoch': 0.32}
32%|███▏ | 1452/4506 [1:39:20<3:25:31, 4.04s/it]
{'loss': 0.2899, 'grad_norm': 0.47351935505867004, 'learning_rate': 4.286497872123999e-05, 'epoch': 0.32}
32%|███▏ | 1453/4506 [1:39:24<3:32:06, 4.17s/it]
{'loss': 0.2801, 'grad_norm': 0.5064451098442078, 'learning_rate': 4.28514243387704e-05, 'epoch': 0.32}
32%|███▏ | 1454/4506 [1:39:29<3:37:50, 4.28s/it]
{'loss': 0.2882, 'grad_norm': 0.4268536865711212, 'learning_rate': 4.283785924133277e-05, 'epoch': 0.32}
32%|███▏ | 1455/4506 [1:39:33<3:34:16, 4.21s/it]
{'loss': 0.2961, 'grad_norm': 0.5009930729866028, 'learning_rate': 4.282428343706928e-05, 'epoch': 0.32}
32%|███▏ | 1456/4506 [1:39:37<3:28:44, 4.11s/it]
{'loss': 0.272, 'grad_norm': 0.4918985366821289, 'learning_rate': 4.281069693412855e-05, 'epoch': 0.32}
32%|███▏ | 1457/4506 [1:39:41<3:37:14, 4.28s/it]
{'loss': 0.3009, 'grad_norm': 0.42856651544570923, 'learning_rate': 4.27970997406656e-05, 'epoch': 0.32}
32%|███▏ | 1458/4506 [1:39:45<3:33:16, 4.20s/it]
{'loss': 0.2921, 'grad_norm': 0.42977920174598694, 'learning_rate': 4.278349186484188e-05, 'epoch': 0.32}
32%|███▏ | 1459/4506 [1:39:50<3:32:39, 4.19s/it]
{'loss': 0.2707, 'grad_norm': 0.4430248439311981, 'learning_rate': 4.276987331482526e-05, 'epoch': 0.32}
32%|███▏ | 1460/4506 [1:39:54<3:28:16, 4.10s/it]
{'loss': 0.2816, 'grad_norm': 0.40392762422561646, 'learning_rate': 4.275624409879e-05, 'epoch': 0.32}
32%|███▏ | 1461/4506 [1:39:58<3:33:18, 4.20s/it]
{'loss': 0.2849, 'grad_norm': 0.4854315519332886, 'learning_rate': 4.274260422491677e-05, 'epoch': 0.32}
32%|███▏ | 1462/4506 [1:40:02<3:31:28, 4.17s/it]
{'loss': 0.2953, 'grad_norm': 0.49528682231903076, 'learning_rate': 4.272895370139264e-05, 'epoch': 0.32}
32%|███▏ | 1463/4506 [1:40:06<3:27:41, 4.10s/it]
{'loss': 0.2942, 'grad_norm': 0.421474814414978, 'learning_rate': 4.271529253641107e-05, 'epoch': 0.32}
32%|███▏ | 1464/4506 [1:40:10<3:24:04, 4.03s/it]
{'loss': 0.2873, 'grad_norm': 0.4067288935184479, 'learning_rate': 4.270162073817191e-05, 'epoch': 0.32}
33%|███▎ | 1465/4506 [1:40:14<3:27:23, 4.09s/it]
{'loss': 0.2939, 'grad_norm': 0.535398006439209, 'learning_rate': 4.268793831488139e-05, 'epoch': 0.33}
33%|███▎ | 1466/4506 [1:40:18<3:29:10, 4.13s/it]
{'loss': 0.2837, 'grad_norm': 0.4701130986213684, 'learning_rate': 4.26742452747521e-05, 'epoch': 0.33}
33%|███▎ | 1467/4506 [1:40:22<3:24:51, 4.04s/it]
{'loss': 0.2863, 'grad_norm': 0.5158503651618958, 'learning_rate': 4.266054162600304e-05, 'epoch': 0.33}
33%|███▎ | 1468/4506 [1:40:26<3:21:02, 3.97s/it]
{'loss': 0.2779, 'grad_norm': 0.48617658019065857, 'learning_rate': 4.264682737685955e-05, 'epoch': 0.33}
33%|███▎ | 1469/4506 [1:40:30<3:25:28, 4.06s/it]
{'loss': 0.2857, 'grad_norm': 0.4630263149738312, 'learning_rate': 4.263310253555334e-05, 'epoch': 0.33}
33%|███▎ | 1470/4506 [1:40:34<3:22:57, 4.01s/it]
{'loss': 0.2935, 'grad_norm': 0.4781331419944763, 'learning_rate': 4.2619367110322474e-05, 'epoch': 0.33}
33%|███▎ | 1471/4506 [1:40:38<3:21:22, 3.98s/it]
{'loss': 0.2848, 'grad_norm': 0.4882846772670746, 'learning_rate': 4.2605621109411376e-05, 'epoch': 0.33}
33%|███▎ | 1472/4506 [1:40:42<3:26:10, 4.08s/it]
{'loss': 0.2834, 'grad_norm': 0.48853927850723267, 'learning_rate': 4.2591864541070806e-05, 'epoch': 0.33}
33%|███▎ | 1473/4506 [1:40:46<3:25:21, 4.06s/it]
{'loss': 0.2732, 'grad_norm': 0.3932627737522125, 'learning_rate': 4.2578097413557883e-05, 'epoch': 0.33}
33%|███▎ | 1474/4506 [1:40:51<3:27:37, 4.11s/it]
{'loss': 0.2837, 'grad_norm': 0.41786786913871765, 'learning_rate': 4.256431973513606e-05, 'epoch': 0.33}
33%|███▎ | 1475/4506 [1:40:54<3:24:33, 4.05s/it]
{'loss': 0.287, 'grad_norm': 0.4347076416015625, 'learning_rate': 4.25505315140751e-05, 'epoch': 0.33}
33%|███▎ | 1476/4506 [1:40:59<3:24:56, 4.06s/it]
{'loss': 0.284, 'grad_norm': 0.4144408106803894, 'learning_rate': 4.2536732758651134e-05, 'epoch': 0.33}
33%|███▎ | 1477/4506 [1:41:03<3:33:08, 4.22s/it]
{'loss': 0.3094, 'grad_norm': 0.48144808411598206, 'learning_rate': 4.2522923477146584e-05, 'epoch': 0.33}
33%|███▎ | 1478/4506 [1:41:07<3:32:09, 4.20s/it]
{'loss': 0.2699, 'grad_norm': 0.3908189833164215, 'learning_rate': 4.25091036778502e-05, 'epoch': 0.33}
33%|███▎ | 1479/4506 [1:41:11<3:30:43, 4.18s/it]
{'loss': 0.2877, 'grad_norm': 0.4085959494113922, 'learning_rate': 4.2495273369057065e-05, 'epoch': 0.33}
33%|███▎ | 1480/4506 [1:41:15<3:23:33, 4.04s/it]
{'loss': 0.2917, 'grad_norm': 0.4771279990673065, 'learning_rate': 4.2481432559068515e-05, 'epoch': 0.33}
33%|███▎ | 1481/4506 [1:41:19<3:24:56, 4.06s/it]
{'loss': 0.2892, 'grad_norm': 0.46541643142700195, 'learning_rate': 4.246758125619226e-05, 'epoch': 0.33}
33%|███▎ | 1482/4506 [1:41:23<3:19:50, 3.97s/it]
{'loss': 0.2794, 'grad_norm': 0.46138375997543335, 'learning_rate': 4.245371946874225e-05, 'epoch': 0.33}
33%|███▎ | 1483/4506 [1:41:27<3:20:25, 3.98s/it]
{'loss': 0.2744, 'grad_norm': 0.4698708951473236, 'learning_rate': 4.243984720503876e-05, 'epoch': 0.33}
33%|███▎ | 1484/4506 [1:41:31<3:18:44, 3.95s/it]
{'loss': 0.2872, 'grad_norm': 0.45664364099502563, 'learning_rate': 4.242596447340835e-05, 'epoch': 0.33}
33%|███▎ | 1485/4506 [1:41:35<3:23:21, 4.04s/it]
{'loss': 0.2882, 'grad_norm': 0.4918450117111206, 'learning_rate': 4.241207128218386e-05, 'epoch': 0.33}
33%|███▎ | 1486/4506 [1:41:39<3:23:16, 4.04s/it]
{'loss': 0.2844, 'grad_norm': 0.4586839973926544, 'learning_rate': 4.239816763970439e-05, 'epoch': 0.33}
33%|███▎ | 1487/4506 [1:41:43<3:23:11, 4.04s/it]
{'loss': 0.2843, 'grad_norm': 0.4232316017150879, 'learning_rate': 4.238425355431535e-05, 'epoch': 0.33}
33%|███▎ | 1488/4506 [1:41:47<3:20:25, 3.98s/it]
{'loss': 0.2938, 'grad_norm': 0.4273756146430969, 'learning_rate': 4.237032903436837e-05, 'epoch': 0.33}
33%|███▎ | 1489/4506 [1:41:51<3:15:38, 3.89s/it]
{'loss': 0.278, 'grad_norm': 0.43254563212394714, 'learning_rate': 4.23563940882214e-05, 'epoch': 0.33}
33%|███▎ | 1490/4506 [1:41:55<3:23:29, 4.05s/it]
{'loss': 0.2898, 'grad_norm': 0.47836869955062866, 'learning_rate': 4.2342448724238595e-05, 'epoch': 0.33}
33%|███▎ | 1491/4506 [1:41:59<3:20:40, 3.99s/it]
{'loss': 0.2745, 'grad_norm': 0.4416758120059967, 'learning_rate': 4.23284929507904e-05, 'epoch': 0.33}
33%|███▎ | 1492/4506 [1:42:03<3:20:32, 3.99s/it]
{'loss': 0.2809, 'grad_norm': 0.47329697012901306, 'learning_rate': 4.2314526776253486e-05, 'epoch': 0.33}
33%|███▎ | 1493/4506 [1:42:07<3:17:10, 3.93s/it]
{'loss': 0.2617, 'grad_norm': 0.39817437529563904, 'learning_rate': 4.230055020901079e-05, 'epoch': 0.33}
33%|███▎ | 1494/4506 [1:42:11<3:14:45, 3.88s/it]
{'loss': 0.2653, 'grad_norm': 0.45211055874824524, 'learning_rate': 4.2286563257451464e-05, 'epoch': 0.33}
33%|███▎ | 1495/4506 [1:42:14<3:13:11, 3.85s/it]
{'loss': 0.295, 'grad_norm': 0.46953845024108887, 'learning_rate': 4.22725659299709e-05, 'epoch': 0.33}
33%|███▎ | 1496/4506 [1:42:19<3:18:50, 3.96s/it]
{'loss': 0.2876, 'grad_norm': 0.42169779539108276, 'learning_rate': 4.225855823497072e-05, 'epoch': 0.33}
33%|███▎ | 1497/4506 [1:42:23<3:18:33, 3.96s/it]
{'loss': 0.2711, 'grad_norm': 0.4345304071903229, 'learning_rate': 4.224454018085878e-05, 'epoch': 0.33}
33%|███▎ | 1498/4506 [1:42:27<3:20:18, 4.00s/it]
{'loss': 0.2722, 'grad_norm': 0.4456804394721985, 'learning_rate': 4.223051177604913e-05, 'epoch': 0.33}
33%|███▎ | 1499/4506 [1:42:31<3:23:23, 4.06s/it]
{'loss': 0.2842, 'grad_norm': 0.4300602972507477, 'learning_rate': 4.221647302896205e-05, 'epoch': 0.33}
33%|███▎ | 1500/4506 [1:42:35<3:24:51, 4.09s/it]
{'loss': 0.2956, 'grad_norm': 0.46429285407066345, 'learning_rate': 4.220242394802402e-05, 'epoch': 0.33}
33%|███▎ | 1501/4506 [1:42:39<3:27:01, 4.13s/it]
{'loss': 0.2852, 'grad_norm': 0.42715057730674744, 'learning_rate': 4.218836454166773e-05, 'epoch': 0.33}
33%|███▎ | 1502/4506 [1:42:44<3:31:04, 4.22s/it]
{'loss': 0.2886, 'grad_norm': 0.42493367195129395, 'learning_rate': 4.217429481833206e-05, 'epoch': 0.33}
33%|███▎ | 1503/4506 [1:42:48<3:25:53, 4.11s/it]
{'loss': 0.2965, 'grad_norm': 0.4661273658275604, 'learning_rate': 4.216021478646209e-05, 'epoch': 0.33}
33%|███▎ | 1504/4506 [1:42:52<3:26:36, 4.13s/it]
{'loss': 0.2928, 'grad_norm': 0.4129681885242462, 'learning_rate': 4.214612445450908e-05, 'epoch': 0.33}
33%|███▎ | 1505/4506 [1:42:56<3:24:25, 4.09s/it]
{'loss': 0.2747, 'grad_norm': 0.4300139546394348, 'learning_rate': 4.213202383093048e-05, 'epoch': 0.33}
33%|███▎ | 1506/4506 [1:42:59<3:18:23, 3.97s/it]
{'loss': 0.2782, 'grad_norm': 0.4340522289276123, 'learning_rate': 4.211791292418991e-05, 'epoch': 0.33}
33%|███▎ | 1507/4506 [1:43:04<3:21:05, 4.02s/it]
{'loss': 0.303, 'grad_norm': 0.42604538798332214, 'learning_rate': 4.210379174275716e-05, 'epoch': 0.33}
33%|███▎ | 1508/4506 [1:43:08<3:21:01, 4.02s/it]
{'loss': 0.2987, 'grad_norm': 0.5013821721076965, 'learning_rate': 4.2089660295108205e-05, 'epoch': 0.33}
33%|███▎ | 1509/4506 [1:43:12<3:24:30, 4.09s/it]
{'loss': 0.2902, 'grad_norm': 0.42848628759384155, 'learning_rate': 4.207551858972515e-05, 'epoch': 0.33}
34%|███▎ | 1510/4506 [1:43:16<3:22:06, 4.05s/it]
{'loss': 0.2786, 'grad_norm': 0.4353961646556854, 'learning_rate': 4.20613666350963e-05, 'epoch': 0.34}
34%|███▎ | 1511/4506 [1:43:20<3:21:42, 4.04s/it]
{'loss': 0.2866, 'grad_norm': 0.42212337255477905, 'learning_rate': 4.2047204439716084e-05, 'epoch': 0.34}
34%|███▎ | 1512/4506 [1:43:24<3:24:11, 4.09s/it]
{'loss': 0.2947, 'grad_norm': 0.5090945959091187, 'learning_rate': 4.2033032012085076e-05, 'epoch': 0.34}
34%|███▎ | 1513/4506 [1:43:28<3:20:19, 4.02s/it]
{'loss': 0.2966, 'grad_norm': 0.4677547812461853, 'learning_rate': 4.201884936071e-05, 'epoch': 0.34}
34%|███▎ | 1514/4506 [1:43:32<3:17:42, 3.96s/it]
{'loss': 0.2859, 'grad_norm': 0.39564019441604614, 'learning_rate': 4.2004656494103714e-05, 'epoch': 0.34}
34%|███▎ | 1515/4506 [1:43:36<3:19:29, 4.00s/it]
{'loss': 0.2834, 'grad_norm': 0.4473416805267334, 'learning_rate': 4.19904534207852e-05, 'epoch': 0.34}
34%|███▎ | 1516/4506 [1:43:40<3:20:08, 4.02s/it]
{'loss': 0.281, 'grad_norm': 0.4003050625324249, 'learning_rate': 4.19762401492796e-05, 'epoch': 0.34}
34%|███▎ | 1517/4506 [1:43:44<3:19:48, 4.01s/it]
{'loss': 0.28, 'grad_norm': 0.398598849773407, 'learning_rate': 4.196201668811813e-05, 'epoch': 0.34}
34%|███▎ | 1518/4506 [1:43:48<3:20:55, 4.03s/it]
{'loss': 0.2882, 'grad_norm': 0.42848291993141174, 'learning_rate': 4.194778304583815e-05, 'epoch': 0.34}
34%|███▎ | 1519/4506 [1:43:52<3:23:37, 4.09s/it]
{'loss': 0.293, 'grad_norm': 0.41862010955810547, 'learning_rate': 4.193353923098312e-05, 'epoch': 0.34}
34%|███▎ | 1520/4506 [1:43:56<3:23:52, 4.10s/it]
{'loss': 0.2891, 'grad_norm': 0.4129178524017334, 'learning_rate': 4.191928525210262e-05, 'epoch': 0.34}
34%|███▍ | 1521/4506 [1:44:00<3:23:56, 4.10s/it]
{'loss': 0.3008, 'grad_norm': 0.3978825509548187, 'learning_rate': 4.190502111775233e-05, 'epoch': 0.34}
34%|███▍ | 1522/4506 [1:44:04<3:19:17, 4.01s/it]
{'loss': 0.2793, 'grad_norm': 0.45631855726242065, 'learning_rate': 4.189074683649399e-05, 'epoch': 0.34}
34%|███▍ | 1523/4506 [1:44:08<3:18:46, 4.00s/it]
{'loss': 0.2838, 'grad_norm': 0.41053307056427, 'learning_rate': 4.187646241689548e-05, 'epoch': 0.34}
34%|███▍ | 1524/4506 [1:44:12<3:23:49, 4.10s/it]
{'loss': 0.2843, 'grad_norm': 0.47488394379615784, 'learning_rate': 4.186216786753074e-05, 'epoch': 0.34}
34%|███▍ | 1525/4506 [1:44:17<3:23:57, 4.11s/it]
{'loss': 0.2793, 'grad_norm': 0.46201545000076294, 'learning_rate': 4.1847863196979784e-05, 'epoch': 0.34}
34%|███▍ | 1526/4506 [1:44:21<3:22:16, 4.07s/it]
{'loss': 0.2824, 'grad_norm': 0.4046635925769806, 'learning_rate': 4.1833548413828724e-05, 'epoch': 0.34}
34%|███▍ | 1527/4506 [1:44:24<3:17:23, 3.98s/it]
{'loss': 0.2901, 'grad_norm': 0.4311058819293976, 'learning_rate': 4.1819223526669715e-05, 'epoch': 0.34}
34%|███▍ | 1528/4506 [1:44:29<3:27:19, 4.18s/it]
{'loss': 0.2733, 'grad_norm': 0.41044649481773376, 'learning_rate': 4.1804888544100996e-05, 'epoch': 0.34}
34%|███▍ | 1529/4506 [1:44:33<3:25:30, 4.14s/it]
{'loss': 0.269, 'grad_norm': 0.4165918231010437, 'learning_rate': 4.179054347472686e-05, 'epoch': 0.34}
34%|███▍ | 1530/4506 [1:44:37<3:24:27, 4.12s/it]
{'loss': 0.2721, 'grad_norm': 0.4376011788845062, 'learning_rate': 4.177618832715766e-05, 'epoch': 0.34}
34%|███▍ | 1531/4506 [1:44:41<3:23:54, 4.11s/it]
{'loss': 0.2975, 'grad_norm': 0.4824669659137726, 'learning_rate': 4.1761823110009786e-05, 'epoch': 0.34}
34%|███▍ | 1532/4506 [1:44:45<3:23:06, 4.10s/it]
{'loss': 0.2835, 'grad_norm': 0.4169347286224365, 'learning_rate': 4.174744783190567e-05, 'epoch': 0.34}
34%|███▍ | 1533/4506 [1:44:49<3:23:10, 4.10s/it]
{'loss': 0.2634, 'grad_norm': 0.42513883113861084, 'learning_rate': 4.1733062501473806e-05, 'epoch': 0.34}
34%|███▍ | 1534/4506 [1:44:53<3:20:39, 4.05s/it]
{'loss': 0.2721, 'grad_norm': 0.4609147608280182, 'learning_rate': 4.171866712734871e-05, 'epoch': 0.34}
34%|███▍ | 1535/4506 [1:44:58<3:29:33, 4.23s/it]
{'loss': 0.2752, 'grad_norm': 0.402935266494751, 'learning_rate': 4.1704261718170904e-05, 'epoch': 0.34}
34%|███▍ | 1536/4506 [1:45:02<3:29:54, 4.24s/it]
{'loss': 0.297, 'grad_norm': 0.9569360613822937, 'learning_rate': 4.168984628258697e-05, 'epoch': 0.34}
34%|███▍ | 1537/4506 [1:45:07<3:34:21, 4.33s/it]
{'loss': 0.2839, 'grad_norm': 0.4887223541736603, 'learning_rate': 4.1675420829249476e-05, 'epoch': 0.34}
34%|███▍ | 1538/4506 [1:45:11<3:37:41, 4.40s/it]
{'loss': 0.2808, 'grad_norm': 0.47963598370552063, 'learning_rate': 4.166098536681704e-05, 'epoch': 0.34}
34%|███▍ | 1539/4506 [1:45:16<3:35:26, 4.36s/it]
{'loss': 0.2777, 'grad_norm': 0.5310475826263428, 'learning_rate': 4.164653990395425e-05, 'epoch': 0.34}
34%|███▍ | 1540/4506 [1:45:20<3:37:17, 4.40s/it]
{'loss': 0.2935, 'grad_norm': 0.45913976430892944, 'learning_rate': 4.163208444933171e-05, 'epoch': 0.34}
34%|███▍ | 1541/4506 [1:45:24<3:28:08, 4.21s/it]
{'loss': 0.2809, 'grad_norm': 0.5106010437011719, 'learning_rate': 4.161761901162603e-05, 'epoch': 0.34}
34%|███▍ | 1542/4506 [1:45:28<3:24:19, 4.14s/it]
{'loss': 0.277, 'grad_norm': 0.45751670002937317, 'learning_rate': 4.16031435995198e-05, 'epoch': 0.34}
34%|███▍ | 1543/4506 [1:45:32<3:27:13, 4.20s/it]
{'loss': 0.2825, 'grad_norm': 0.4055114984512329, 'learning_rate': 4.158865822170161e-05, 'epoch': 0.34}
34%|███▍ | 1544/4506 [1:45:36<3:25:47, 4.17s/it]
{'loss': 0.2853, 'grad_norm': 0.4883459508419037, 'learning_rate': 4.157416288686602e-05, 'epoch': 0.34}
34%|███▍ | 1545/4506 [1:45:41<3:30:11, 4.26s/it]
{'loss': 0.283, 'grad_norm': 0.43390917778015137, 'learning_rate': 4.155965760371359e-05, 'epoch': 0.34}
34%|███▍ | 1546/4506 [1:45:45<3:29:13, 4.24s/it]
{'loss': 0.2771, 'grad_norm': 0.45208436250686646, 'learning_rate': 4.1545142380950796e-05, 'epoch': 0.34}
34%|███▍ | 1547/4506 [1:45:49<3:23:40, 4.13s/it]
{'loss': 0.2778, 'grad_norm': 0.4612308442592621, 'learning_rate': 4.1530617227290136e-05, 'epoch': 0.34}
34%|███▍ | 1548/4506 [1:45:53<3:21:42, 4.09s/it]
{'loss': 0.2844, 'grad_norm': 0.4830743968486786, 'learning_rate': 4.151608215145005e-05, 'epoch': 0.34}
34%|███▍ | 1549/4506 [1:45:57<3:24:10, 4.14s/it]
{'loss': 0.2944, 'grad_norm': 0.4472517967224121, 'learning_rate': 4.1501537162154934e-05, 'epoch': 0.34}
34%|███▍ | 1550/4506 [1:46:01<3:22:46, 4.12s/it]
{'loss': 0.285, 'grad_norm': 0.40040329098701477, 'learning_rate': 4.148698226813514e-05, 'epoch': 0.34}
34%|███▍ | 1551/4506 [1:46:05<3:19:50, 4.06s/it]
{'loss': 0.2633, 'grad_norm': 0.3971138894557953, 'learning_rate': 4.147241747812694e-05, 'epoch': 0.34}
34%|███▍ | 1552/4506 [1:46:09<3:17:02, 4.00s/it]
{'loss': 0.292, 'grad_norm': 0.4465784728527069, 'learning_rate': 4.145784280087257e-05, 'epoch': 0.34}
34%|███▍ | 1553/4506 [1:46:13<3:14:21, 3.95s/it]
{'loss': 0.2707, 'grad_norm': 0.5190557837486267, 'learning_rate': 4.14432582451202e-05, 'epoch': 0.34}
34%|███▍ | 1554/4506 [1:46:17<3:17:32, 4.02s/it]
{'loss': 0.2717, 'grad_norm': 0.4318910837173462, 'learning_rate': 4.142866381962393e-05, 'epoch': 0.34}
35%|███▍ | 1555/4506 [1:46:21<3:16:36, 4.00s/it]
{'loss': 0.3022, 'grad_norm': 0.5107045769691467, 'learning_rate': 4.1414059533143755e-05, 'epoch': 0.35}
35%|███▍ | 1556/4506 [1:46:25<3:19:03, 4.05s/it]
{'loss': 0.2705, 'grad_norm': 0.44626376032829285, 'learning_rate': 4.139944539444564e-05, 'epoch': 0.35}
35%|███▍ | 1557/4506 [1:46:29<3:19:28, 4.06s/it]
{'loss': 0.2697, 'grad_norm': 0.38844987750053406, 'learning_rate': 4.138482141230141e-05, 'epoch': 0.35}
35%|███▍ | 1558/4506 [1:46:33<3:22:27, 4.12s/it]
{'loss': 0.2873, 'grad_norm': 0.45312634110450745, 'learning_rate': 4.137018759548885e-05, 'epoch': 0.35}
35%|███▍ | 1559/4506 [1:46:37<3:20:09, 4.08s/it]
{'loss': 0.2671, 'grad_norm': 0.35628706216812134, 'learning_rate': 4.13555439527916e-05, 'epoch': 0.35}
35%|███▍ | 1560/4506 [1:46:41<3:19:35, 4.07s/it]
{'loss': 0.287, 'grad_norm': 0.4286841154098511, 'learning_rate': 4.134089049299923e-05, 'epoch': 0.35}
35%|███▍ | 1561/4506 [1:46:45<3:17:42, 4.03s/it]
{'loss': 0.2821, 'grad_norm': 0.4071606695652008, 'learning_rate': 4.13262272249072e-05, 'epoch': 0.35}
35%|███▍ | 1562/4506 [1:46:49<3:19:26, 4.06s/it]
{'loss': 0.2853, 'grad_norm': 0.4861789643764496, 'learning_rate': 4.1311554157316834e-05, 'epoch': 0.35}
35%|███▍ | 1563/4506 [1:46:53<3:16:14, 4.00s/it]
{'loss': 0.2846, 'grad_norm': 0.42296212911605835, 'learning_rate': 4.1296871299035355e-05, 'epoch': 0.35}
35%|███▍ | 1564/4506 [1:46:58<3:19:35, 4.07s/it]
{'loss': 0.2789, 'grad_norm': 0.41625919938087463, 'learning_rate': 4.1282178658875876e-05, 'epoch': 0.35}
35%|███▍ | 1565/4506 [1:47:01<3:17:10, 4.02s/it]
{'loss': 0.2896, 'grad_norm': 0.44237378239631653, 'learning_rate': 4.1267476245657354e-05, 'epoch': 0.35}
35%|███▍ | 1566/4506 [1:47:05<3:14:50, 3.98s/it]
{'loss': 0.278, 'grad_norm': 0.4950725734233856, 'learning_rate': 4.125276406820463e-05, 'epoch': 0.35}
35%|███▍ | 1567/4506 [1:47:09<3:17:45, 4.04s/it]
{'loss': 0.2889, 'grad_norm': 0.4341103136539459, 'learning_rate': 4.123804213534839e-05, 'epoch': 0.35}
35%|███▍ | 1568/4506 [1:47:13<3:17:31, 4.03s/it]
{'loss': 0.284, 'grad_norm': 0.4430128335952759, 'learning_rate': 4.1223310455925204e-05, 'epoch': 0.35}
35%|███▍ | 1569/4506 [1:47:18<3:19:33, 4.08s/it]
{'loss': 0.2769, 'grad_norm': 0.41687318682670593, 'learning_rate': 4.1208569038777465e-05, 'epoch': 0.35}
35%|███▍ | 1570/4506 [1:47:22<3:18:51, 4.06s/it]
{'loss': 0.2883, 'grad_norm': 0.49631837010383606, 'learning_rate': 4.119381789275342e-05, 'epoch': 0.35}
35%|███▍ | 1571/4506 [1:47:26<3:16:24, 4.02s/it]
{'loss': 0.2701, 'grad_norm': 0.43180757761001587, 'learning_rate': 4.1179057026707155e-05, 'epoch': 0.35}
35%|███▍ | 1572/4506 [1:47:29<3:12:31, 3.94s/it]
{'loss': 0.2811, 'grad_norm': 0.46380746364593506, 'learning_rate': 4.116428644949859e-05, 'epoch': 0.35}
35%|███▍ | 1573/4506 [1:47:34<3:15:51, 4.01s/it]
{'loss': 0.2797, 'grad_norm': 0.4304683208465576, 'learning_rate': 4.114950616999348e-05, 'epoch': 0.35}
35%|███▍ | 1574/4506 [1:47:38<3:20:37, 4.11s/it]
{'loss': 0.2804, 'grad_norm': 0.42848068475723267, 'learning_rate': 4.113471619706339e-05, 'epoch': 0.35}
35%|███▍ | 1575/4506 [1:47:42<3:22:16, 4.14s/it]
{'loss': 0.2816, 'grad_norm': 0.4025889039039612, 'learning_rate': 4.111991653958572e-05, 'epoch': 0.35}
35%|███▍ | 1576/4506 [1:47:46<3:20:26, 4.10s/it]
{'loss': 0.2805, 'grad_norm': 0.4112975299358368, 'learning_rate': 4.1105107206443674e-05, 'epoch': 0.35}
35%|███▍ | 1577/4506 [1:47:50<3:22:22, 4.15s/it]
{'loss': 0.2725, 'grad_norm': 0.3812086582183838, 'learning_rate': 4.109028820652625e-05, 'epoch': 0.35}
35%|███▌ | 1578/4506 [1:47:55<3:26:54, 4.24s/it]
{'loss': 0.2848, 'grad_norm': 0.4458532929420471, 'learning_rate': 4.107545954872829e-05, 'epoch': 0.35}
35%|███▌ | 1579/4506 [1:47:59<3:24:09, 4.19s/it]
{'loss': 0.2796, 'grad_norm': 0.4084831476211548, 'learning_rate': 4.106062124195038e-05, 'epoch': 0.35}
35%|███▌ | 1580/4506 [1:48:03<3:27:36, 4.26s/it]
{'loss': 0.3023, 'grad_norm': 0.43972647190093994, 'learning_rate': 4.104577329509894e-05, 'epoch': 0.35}
35%|███▌ | 1581/4506 [1:48:08<3:28:34, 4.28s/it]
{'loss': 0.2894, 'grad_norm': 0.388906329870224, 'learning_rate': 4.103091571708615e-05, 'epoch': 0.35}
35%|███▌ | 1582/4506 [1:48:12<3:31:14, 4.33s/it]
{'loss': 0.285, 'grad_norm': 0.48577889800071716, 'learning_rate': 4.101604851682997e-05, 'epoch': 0.35}
35%|███▌ | 1583/4506 [1:48:16<3:29:01, 4.29s/it]
{'loss': 0.2846, 'grad_norm': 0.4223753809928894, 'learning_rate': 4.1001171703254184e-05, 'epoch': 0.35}
35%|███▌ | 1584/4506 [1:48:21<3:29:29, 4.30s/it]
{'loss': 0.2781, 'grad_norm': 0.4361390769481659, 'learning_rate': 4.098628528528827e-05, 'epoch': 0.35}
35%|███▌ | 1585/4506 [1:48:25<3:31:25, 4.34s/it]
{'loss': 0.292, 'grad_norm': 0.4611336588859558, 'learning_rate': 4.097138927186752e-05, 'epoch': 0.35}
35%|███▌ | 1586/4506 [1:48:29<3:30:35, 4.33s/it]
{'loss': 0.2796, 'grad_norm': 0.41695350408554077, 'learning_rate': 4.0956483671932975e-05, 'epoch': 0.35}
35%|███▌ | 1587/4506 [1:48:33<3:23:41, 4.19s/it]
{'loss': 0.2816, 'grad_norm': 0.43777358531951904, 'learning_rate': 4.094156849443144e-05, 'epoch': 0.35}
35%|███▌ | 1588/4506 [1:48:37<3:24:58, 4.21s/it]
{'loss': 0.2811, 'grad_norm': 0.49022236466407776, 'learning_rate': 4.092664374831545e-05, 'epoch': 0.35}
35%|███▌ | 1589/4506 [1:48:42<3:25:23, 4.22s/it]
{'loss': 0.2779, 'grad_norm': 0.37504854798316956, 'learning_rate': 4.091170944254329e-05, 'epoch': 0.35}
35%|███▌ | 1590/4506 [1:48:46<3:21:10, 4.14s/it]
{'loss': 0.2803, 'grad_norm': 0.4536045491695404, 'learning_rate': 4.0896765586078985e-05, 'epoch': 0.35}
35%|███▌ | 1591/4506 [1:48:50<3:24:37, 4.21s/it]
{'loss': 0.2897, 'grad_norm': 0.3809104263782501, 'learning_rate': 4.088181218789229e-05, 'epoch': 0.35}
35%|███▌ | 1592/4506 [1:48:54<3:18:56, 4.10s/it]
{'loss': 0.287, 'grad_norm': 0.4549267590045929, 'learning_rate': 4.0866849256958694e-05, 'epoch': 0.35}
35%|███▌ | 1593/4506 [1:48:58<3:19:06, 4.10s/it]
{'loss': 0.2526, 'grad_norm': 0.3958105444908142, 'learning_rate': 4.0851876802259404e-05, 'epoch': 0.35}
35%|███▌ | 1594/4506 [1:49:02<3:18:58, 4.10s/it]
{'loss': 0.2763, 'grad_norm': 0.4432293474674225, 'learning_rate': 4.0836894832781346e-05, 'epoch': 0.35}
35%|███▌ | 1595/4506 [1:49:07<3:25:51, 4.24s/it]
{'loss': 0.2845, 'grad_norm': 0.40832918882369995, 'learning_rate': 4.082190335751714e-05, 'epoch': 0.35}
35%|███▌ | 1596/4506 [1:49:10<3:19:22, 4.11s/it]
{'loss': 0.287, 'grad_norm': 0.44586682319641113, 'learning_rate': 4.080690238546514e-05, 'epoch': 0.35}
35%|███▌ | 1597/4506 [1:49:14<3:14:40, 4.02s/it]
{'loss': 0.2782, 'grad_norm': 0.4135918617248535, 'learning_rate': 4.0791891925629385e-05, 'epoch': 0.35}
35%|███▌ | 1598/4506 [1:49:18<3:10:37, 3.93s/it]
{'loss': 0.2731, 'grad_norm': 0.4487631916999817, 'learning_rate': 4.07768719870196e-05, 'epoch': 0.35}
35%|███▌ | 1599/4506 [1:49:22<3:13:54, 4.00s/it]
{'loss': 0.2774, 'grad_norm': 0.41095781326293945, 'learning_rate': 4.076184257865122e-05, 'epoch': 0.35}
36%|███▌ | 1600/4506 [1:49:26<3:11:28, 3.95s/it]
{'loss': 0.2861, 'grad_norm': 0.45891281962394714, 'learning_rate': 4.074680370954534e-05, 'epoch': 0.36}
36%|███▌ | 1601/4506 [1:49:30<3:14:05, 4.01s/it]
{'loss': 0.2795, 'grad_norm': 0.4345782399177551, 'learning_rate': 4.0731755388728755e-05, 'epoch': 0.36}
36%|███▌ | 1602/4506 [1:49:35<3:20:09, 4.14s/it]
{'loss': 0.2864, 'grad_norm': 0.43300318717956543, 'learning_rate': 4.071669762523393e-05, 'epoch': 0.36}
36%|███▌ | 1603/4506 [1:49:39<3:19:14, 4.12s/it]
{'loss': 0.2842, 'grad_norm': 0.45081084966659546, 'learning_rate': 4.070163042809898e-05, 'epoch': 0.36}
36%|███▌ | 1604/4506 [1:49:43<3:17:30, 4.08s/it]
{'loss': 0.2726, 'grad_norm': 0.4639222323894501, 'learning_rate': 4.068655380636771e-05, 'epoch': 0.36}
36%|███▌ | 1605/4506 [1:49:47<3:25:40, 4.25s/it]
{'loss': 0.2903, 'grad_norm': 0.4599519371986389, 'learning_rate': 4.067146776908956e-05, 'epoch': 0.36}
36%|███▌ | 1606/4506 [1:49:51<3:21:49, 4.18s/it]
{'loss': 0.2676, 'grad_norm': 0.396931529045105, 'learning_rate': 4.065637232531963e-05, 'epoch': 0.36}
36%|███▌ | 1607/4506 [1:49:55<3:19:17, 4.12s/it]
{'loss': 0.2815, 'grad_norm': 0.4335518181324005, 'learning_rate': 4.0641267484118655e-05, 'epoch': 0.36}
36%|███▌ | 1608/4506 [1:49:59<3:16:19, 4.06s/it]
{'loss': 0.2809, 'grad_norm': 0.4439251720905304, 'learning_rate': 4.062615325455304e-05, 'epoch': 0.36}
36%|███▌ | 1609/4506 [1:50:03<3:13:08, 4.00s/it]
{'loss': 0.2776, 'grad_norm': 0.44261056184768677, 'learning_rate': 4.0611029645694775e-05, 'epoch': 0.36}
36%|███▌ | 1610/4506 [1:50:07<3:12:24, 3.99s/it]
{'loss': 0.2802, 'grad_norm': 0.397990345954895, 'learning_rate': 4.059589666662155e-05, 'epoch': 0.36}
36%|███▌ | 1611/4506 [1:50:11<3:09:48, 3.93s/it]
{'loss': 0.2762, 'grad_norm': 0.434385746717453, 'learning_rate': 4.0580754326416605e-05, 'epoch': 0.36}
36%|███▌ | 1612/4506 [1:50:15<3:09:43, 3.93s/it]
{'loss': 0.283, 'grad_norm': 0.3673551678657532, 'learning_rate': 4.056560263416884e-05, 'epoch': 0.36}
36%|███▌ | 1613/4506 [1:50:19<3:11:55, 3.98s/it]
{'loss': 0.2697, 'grad_norm': 0.41635674238204956, 'learning_rate': 4.0550441598972785e-05, 'epoch': 0.36}
36%|███▌ | 1614/4506 [1:50:23<3:16:35, 4.08s/it]
{'loss': 0.2804, 'grad_norm': 0.41972190141677856, 'learning_rate': 4.053527122992853e-05, 'epoch': 0.36}
36%|███▌ | 1615/4506 [1:50:27<3:18:44, 4.12s/it]
{'loss': 0.2873, 'grad_norm': 0.4269864559173584, 'learning_rate': 4.0520091536141803e-05, 'epoch': 0.36}
36%|███▌ | 1616/4506 [1:50:32<3:19:53, 4.15s/it]
{'loss': 0.2824, 'grad_norm': 0.4154677987098694, 'learning_rate': 4.050490252672391e-05, 'epoch': 0.36}
36%|███▌ | 1617/4506 [1:50:36<3:19:40, 4.15s/it]
{'loss': 0.2786, 'grad_norm': 0.4013412594795227, 'learning_rate': 4.048970421079177e-05, 'epoch': 0.36}
36%|███▌ | 1618/4506 [1:50:40<3:20:51, 4.17s/it]
{'loss': 0.283, 'grad_norm': 0.40638142824172974, 'learning_rate': 4.047449659746786e-05, 'epoch': 0.36}
36%|███▌ | 1619/4506 [1:50:44<3:20:55, 4.18s/it]
{'loss': 0.2594, 'grad_norm': 0.3726873993873596, 'learning_rate': 4.045927969588026e-05, 'epoch': 0.36}
36%|███▌ | 1620/4506 [1:50:48<3:22:50, 4.22s/it]
{'loss': 0.2737, 'grad_norm': 0.38343316316604614, 'learning_rate': 4.044405351516262e-05, 'epoch': 0.36}
36%|███▌ | 1621/4506 [1:50:53<3:20:59, 4.18s/it]
{'loss': 0.281, 'grad_norm': 0.421086847782135, 'learning_rate': 4.042881806445414e-05, 'epoch': 0.36}
36%|███▌ | 1622/4506 [1:50:57<3:17:49, 4.12s/it]
{'loss': 0.2705, 'grad_norm': 0.4429261386394501, 'learning_rate': 4.041357335289962e-05, 'epoch': 0.36}
36%|███▌ | 1623/4506 [1:51:01<3:22:18, 4.21s/it]
{'loss': 0.2794, 'grad_norm': 0.43456265330314636, 'learning_rate': 4.039831938964941e-05, 'epoch': 0.36}
36%|███▌ | 1624/4506 [1:51:05<3:19:44, 4.16s/it]
{'loss': 0.2644, 'grad_norm': 0.45219284296035767, 'learning_rate': 4.0383056183859366e-05, 'epoch': 0.36}
36%|███▌ | 1625/4506 [1:51:09<3:16:28, 4.09s/it]
{'loss': 0.2798, 'grad_norm': 0.408048540353775, 'learning_rate': 4.0367783744690954e-05, 'epoch': 0.36}
36%|███▌ | 1626/4506 [1:51:13<3:11:24, 3.99s/it]
{'loss': 0.2663, 'grad_norm': 0.3993772566318512, 'learning_rate': 4.035250208131116e-05, 'epoch': 0.36}
36%|███▌ | 1627/4506 [1:51:17<3:13:13, 4.03s/it]
{'loss': 0.2891, 'grad_norm': 0.43439218401908875, 'learning_rate': 4.0337211202892486e-05, 'epoch': 0.36}
36%|███▌ | 1628/4506 [1:51:21<3:10:07, 3.96s/it]
{'loss': 0.2794, 'grad_norm': 0.3963658809661865, 'learning_rate': 4.0321911118613e-05, 'epoch': 0.36}
36%|███▌ | 1629/4506 [1:51:25<3:14:22, 4.05s/it]
{'loss': 0.2562, 'grad_norm': 0.39873236417770386, 'learning_rate': 4.030660183765625e-05, 'epoch': 0.36}
36%|███▌ | 1630/4506 [1:51:29<3:19:11, 4.16s/it]
{'loss': 0.2736, 'grad_norm': 0.4174105226993561, 'learning_rate': 4.0291283369211374e-05, 'epoch': 0.36}
36%|███▌ | 1631/4506 [1:51:33<3:19:47, 4.17s/it]
{'loss': 0.2917, 'grad_norm': 0.42155569791793823, 'learning_rate': 4.027595572247296e-05, 'epoch': 0.36}
36%|███▌ | 1632/4506 [1:51:39<3:35:23, 4.50s/it]
{'loss': 0.2804, 'grad_norm': 0.40601375699043274, 'learning_rate': 4.026061890664112e-05, 'epoch': 0.36}
36%|███▌ | 1633/4506 [1:51:42<3:24:28, 4.27s/it]
{'loss': 0.2922, 'grad_norm': 0.4298681318759918, 'learning_rate': 4.024527293092149e-05, 'epoch': 0.36}
36%|███▋ | 1634/4506 [1:51:47<3:23:00, 4.24s/it]
{'loss': 0.2794, 'grad_norm': 0.38963019847869873, 'learning_rate': 4.0229917804525174e-05, 'epoch': 0.36}
36%|███▋ | 1635/4506 [1:51:51<3:18:46, 4.15s/it]
{'loss': 0.2836, 'grad_norm': 0.4115995466709137, 'learning_rate': 4.021455353666882e-05, 'epoch': 0.36}
36%|███▋ | 1636/4506 [1:51:54<3:14:52, 4.07s/it]
{'loss': 0.2784, 'grad_norm': 0.4356158971786499, 'learning_rate': 4.01991801365745e-05, 'epoch': 0.36}
36%|███▋ | 1637/4506 [1:51:59<3:16:04, 4.10s/it]
{'loss': 0.2745, 'grad_norm': 0.3956626355648041, 'learning_rate': 4.0183797613469804e-05, 'epoch': 0.36}
36%|███▋ | 1638/4506 [1:52:03<3:14:06, 4.06s/it]
{'loss': 0.274, 'grad_norm': 0.40470635890960693, 'learning_rate': 4.016840597658779e-05, 'epoch': 0.36}
36%|███▋ | 1639/4506 [1:52:07<3:19:13, 4.17s/it]
{'loss': 0.2829, 'grad_norm': 0.3984769284725189, 'learning_rate': 4.0153005235166995e-05, 'epoch': 0.36}
36%|███▋ | 1640/4506 [1:52:11<3:19:29, 4.18s/it]
{'loss': 0.2777, 'grad_norm': 0.4629054069519043, 'learning_rate': 4.0137595398451406e-05, 'epoch': 0.36}
36%|███▋ | 1641/4506 [1:52:15<3:16:35, 4.12s/it]
{'loss': 0.2713, 'grad_norm': 0.4180014431476593, 'learning_rate': 4.012217647569048e-05, 'epoch': 0.36}
36%|███▋ | 1642/4506 [1:52:19<3:15:52, 4.10s/it]
{'loss': 0.2842, 'grad_norm': 0.4852376878261566, 'learning_rate': 4.0106748476139105e-05, 'epoch': 0.36}
36%|███▋ | 1643/4506 [1:52:23<3:10:21, 3.99s/it]
{'loss': 0.2822, 'grad_norm': 0.49879786372184753, 'learning_rate': 4.009131140905765e-05, 'epoch': 0.36}
36%|███▋ | 1644/4506 [1:52:27<3:13:03, 4.05s/it]
{'loss': 0.2727, 'grad_norm': 0.40830329060554504, 'learning_rate': 4.007586528371193e-05, 'epoch': 0.36}
37%|███▋ | 1645/4506 [1:52:31<3:08:42, 3.96s/it]
{'loss': 0.2727, 'grad_norm': 0.46293291449546814, 'learning_rate': 4.006041010937315e-05, 'epoch': 0.37}
37%|███▋ | 1646/4506 [1:52:35<3:06:59, 3.92s/it]
{'loss': 0.2727, 'grad_norm': 0.4142531156539917, 'learning_rate': 4.004494589531799e-05, 'epoch': 0.37}
37%|███▋ | 1647/4506 [1:52:39<3:12:05, 4.03s/it]
{'loss': 0.2716, 'grad_norm': 0.44545018672943115, 'learning_rate': 4.002947265082854e-05, 'epoch': 0.37}
37%|███▋ | 1648/4506 [1:52:43<3:08:19, 3.95s/it]
{'loss': 0.2666, 'grad_norm': 0.40547481179237366, 'learning_rate': 4.001399038519231e-05, 'epoch': 0.37}
37%|███▋ | 1649/4506 [1:52:47<3:11:12, 4.02s/it]
{'loss': 0.2681, 'grad_norm': 0.41998258233070374, 'learning_rate': 3.9998499107702235e-05, 'epoch': 0.37}
37%|███▋ | 1650/4506 [1:52:51<3:13:15, 4.06s/it]
{'loss': 0.2847, 'grad_norm': 0.47467777132987976, 'learning_rate': 3.9982998827656636e-05, 'epoch': 0.37}
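The `learning_rate` column decays smoothly. This part of the log does not print the scheduler settings, so what follows is only a sketch under stated assumptions. It models a linear-warmup plus cosine-decay schedule, the same shape as Hugging Face's `get_cosine_schedule_with_warmup`, with these assumed settings:

- a peak learning rate of 5e-5;
- 4506 total steps, as shown on the progress bar;
- 451 warmup steps, i.e. a warmup ratio of 0.1 rounded up;
- the value printed after step s is the one computed for scheduler step s - 1.

Under these assumptions, the model agrees with the logged values for steps 1597, 1600 and 1650 to well within 1e-9. For comparison, adjacent steps differ by about 1.5e-8.

```python
import math

# Assumed configuration; none of these values are printed in this part of the log.
PEAK_LR = 5e-5       # assumed peak learning rate
TOTAL_STEPS = 4506   # total optimizer steps, as shown by the progress bar
WARMUP_STEPS = 451   # assumed: ceil(0.1 * 4506), i.e. a warmup ratio of 0.1


def cosine_with_warmup(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# learning_rate values copied from the log, keyed by the step on the preceding progress bar.
logged = {
    1597: 4.0791891925629385e-05,
    1600: 4.074680370954534e-05,
    1650: 3.9982998827656636e-05,
}

for step, lr in logged.items():
    # Assumption: the value printed with step s was computed for scheduler step s - 1.
    predicted = cosine_with_warmup(step - 1)
    print(f"step {step}: logged {lr:.10e}, predicted {predicted:.10e}")
```

If the real run used different settings, only the three constants at the top need to change. A mismatch of roughly 1.5e-8 would point to an off-by-one in the step offset rather than a different schedule shape.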
37%|███▋ | 1651/4506 [1:52:55<3:09:33, 3.98s/it]
{'loss': 0.285, 'grad_norm': 0.42333394289016724, 'learning_rate': 3.996748955435927e-05, 'epoch': 0.37}
37%|███▋ | 1652/4506 [1:52:59<3:08:38, 3.97s/it]
{'loss': 0.2689, 'grad_norm': 0.42535048723220825, 'learning_rate': 3.995197129711926e-05, 'epoch': 0.37}
37%|███▋ | 1653/4506 [1:53:03<3:07:58, 3.95s/it]
{'loss': 0.2904, 'grad_norm': 0.5055444240570068, 'learning_rate': 3.993644406525114e-05, 'epoch': 0.37}
37%|███▋ | 1654/4506 [1:53:07<3:09:32, 3.99s/it]
{'loss': 0.2794, 'grad_norm': 0.46418192982673645, 'learning_rate': 3.992090786807484e-05, 'epoch': 0.37}
37%|███▋ | 1655/4506 [1:53:11<3:08:22, 3.96s/it]
{'loss': 0.2918, 'grad_norm': 0.41347718238830566, 'learning_rate': 3.9905362714915643e-05, 'epoch': 0.37}
37%|███▋ | 1656/4506 [1:53:15<3:09:25, 3.99s/it]
{'loss': 0.2709, 'grad_norm': 0.39393168687820435, 'learning_rate': 3.988980861510423e-05, 'epoch': 0.37}
37%|███▋ | 1657/4506 [1:53:19<3:10:39, 4.02s/it]
{'loss': 0.2688, 'grad_norm': 0.4021659791469574, 'learning_rate': 3.9874245577976647e-05, 'epoch': 0.37}
37%|███▋ | 1658/4506 [1:53:23<3:17:07, 4.15s/it]
{'loss': 0.2814, 'grad_norm': 0.420153945684433, 'learning_rate': 3.9858673612874285e-05, 'epoch': 0.37}
37%|███▋ | 1659/4506 [1:53:28<3:17:46, 4.17s/it]
{'loss': 0.2852, 'grad_norm': 0.5180535316467285, 'learning_rate': 3.9843092729143935e-05, 'epoch': 0.37}
37%|███▋ | 1660/4506 [1:53:32<3:19:18, 4.20s/it]
{'loss': 0.2754, 'grad_norm': 0.48659148812294006, 'learning_rate': 3.98275029361377e-05, 'epoch': 0.37}
37%|███▋ | 1661/4506 [1:53:36<3:22:35, 4.27s/it]
{'loss': 0.2763, 'grad_norm': 0.4392460584640503, 'learning_rate': 3.981190424321306e-05, 'epoch': 0.37}
37%|███▋ | 1662/4506 [1:53:40<3:16:14, 4.14s/it]
{'loss': 0.2607, 'grad_norm': 0.4334415793418884, 'learning_rate': 3.9796296659732813e-05, 'epoch': 0.37}
37%|███▋ | 1663/4506 [1:53:44<3:18:15, 4.18s/it]
{'loss': 0.2762, 'grad_norm': 0.4602874219417572, 'learning_rate': 3.978068019506511e-05, 'epoch': 0.37}
37%|███▋ | 1664/4506 [1:53:49<3:21:16, 4.25s/it]
{'loss': 0.2858, 'grad_norm': 0.4471319317817688, 'learning_rate': 3.976505485858344e-05, 'epoch': 0.37}
37%|███▋ | 1665/4506 [1:53:53<3:16:30, 4.15s/it]
{'loss': 0.2616, 'grad_norm': 0.42613428831100464, 'learning_rate': 3.97494206596666e-05, 'epoch': 0.37}
37%|███▋ | 1666/4506 [1:53:57<3:14:48, 4.12s/it]
{'loss': 0.2667, 'grad_norm': 0.409416526556015, 'learning_rate': 3.97337776076987e-05, 'epoch': 0.37}
37%|███▋ | 1667/4506 [1:54:01<3:16:08, 4.15s/it]
{'loss': 0.2729, 'grad_norm': 0.41452547907829285, 'learning_rate': 3.9718125712069174e-05, 'epoch': 0.37}
37%|███▋ | 1668/4506 [1:54:05<3:14:23, 4.11s/it]
{'loss': 0.2686, 'grad_norm': 0.4259277284145355, 'learning_rate': 3.9702464982172795e-05, 'epoch': 0.37}
37%|███▋ | 1669/4506 [1:54:09<3:16:48, 4.16s/it]
{'loss': 0.2693, 'grad_norm': 0.45460304617881775, 'learning_rate': 3.968679542740958e-05, 'epoch': 0.37}
37%|███▋ | 1670/4506 [1:54:13<3:09:59, 4.02s/it]
{'loss': 0.276, 'grad_norm': 0.43217530846595764, 'learning_rate': 3.967111705718488e-05, 'epoch': 0.37}
37%|███▋ | 1671/4506 [1:54:17<3:05:54, 3.93s/it]
{'loss': 0.2821, 'grad_norm': 0.4341748058795929, 'learning_rate': 3.9655429880909345e-05, 'epoch': 0.37}
37%|███▋ | 1672/4506 [1:54:21<3:10:22, 4.03s/it]
{'loss': 0.2803, 'grad_norm': 0.5258432626724243, 'learning_rate': 3.963973390799888e-05, 'epoch': 0.37}
37%|███▋ | 1673/4506 [1:54:25<3:10:51, 4.04s/it]
{'loss': 0.2636, 'grad_norm': 0.36714527010917664, 'learning_rate': 3.962402914787468e-05, 'epoch': 0.37}
37%|███▋ | 1674/4506 [1:54:29<3:10:08, 4.03s/it]
{'loss': 0.2672, 'grad_norm': 0.35594990849494934, 'learning_rate': 3.960831560996323e-05, 'epoch': 0.37}
37%|███▋ | 1675/4506 [1:54:33<3:07:31, 3.97s/it]
{'loss': 0.2661, 'grad_norm': 0.41446855664253235, 'learning_rate': 3.959259330369628e-05, 'epoch': 0.37}
37%|███▋ | 1676/4506 [1:54:37<3:05:33, 3.93s/it]
{'loss': 0.2715, 'grad_norm': 0.4916912019252777, 'learning_rate': 3.957686223851082e-05, 'epoch': 0.37}
37%|███▋ | 1677/4506 [1:54:40<3:02:47, 3.88s/it]
{'loss': 0.2691, 'grad_norm': 0.39813104271888733, 'learning_rate': 3.956112242384913e-05, 'epoch': 0.37}
37%|███▋ | 1678/4506 [1:54:44<3:03:36, 3.90s/it]
{'loss': 0.2788, 'grad_norm': 0.476639986038208, 'learning_rate': 3.954537386915871e-05, 'epoch': 0.37}
37%|███▋ | 1679/4506 [1:54:48<3:00:15, 3.83s/it]
{'loss': 0.2693, 'grad_norm': 0.4441368281841278, 'learning_rate': 3.9529616583892335e-05, 'epoch': 0.37}
37%|███▋ | 1680/4506 [1:54:52<3:01:23, 3.85s/it]
{'loss': 0.2812, 'grad_norm': 0.4640970826148987, 'learning_rate': 3.9513850577508e-05, 'epoch': 0.37}
37%|███▋ | 1681/4506 [1:54:57<3:10:30, 4.05s/it]
{'loss': 0.2699, 'grad_norm': 0.4150484502315521, 'learning_rate': 3.949807585946894e-05, 'epoch': 0.37}
37%|███▋ | 1682/4506 [1:55:01<3:10:54, 4.06s/it]
{'loss': 0.2692, 'grad_norm': 0.41347476840019226, 'learning_rate': 3.948229243924364e-05, 'epoch': 0.37}
37%|███▋ | 1683/4506 [1:55:04<3:06:43, 3.97s/it]
{'loss': 0.2717, 'grad_norm': 0.41369298100471497, 'learning_rate': 3.946650032630576e-05, 'epoch': 0.37}
37%|███▋ | 1684/4506 [1:55:08<3:08:45, 4.01s/it]
{'loss': 0.2772, 'grad_norm': 0.4026239812374115, 'learning_rate': 3.945069953013423e-05, 'epoch': 0.37}
37%|███▋ | 1685/4506 [1:55:13<3:10:01, 4.04s/it]
{'loss': 0.2727, 'grad_norm': 0.42533934116363525, 'learning_rate': 3.943489006021315e-05, 'epoch': 0.37}
37%|███▋ | 1686/4506 [1:55:17<3:16:41, 4.19s/it]
{'loss': 0.2655, 'grad_norm': 0.3788037598133087, 'learning_rate': 3.941907192603185e-05, 'epoch': 0.37}
37%|███▋ | 1687/4506 [1:55:22<3:20:34, 4.27s/it]
{'loss': 0.2809, 'grad_norm': 0.4213291108608246, 'learning_rate': 3.940324513708487e-05, 'epoch': 0.37}
37%|███▋ | 1688/4506 [1:55:26<3:17:06, 4.20s/it]
{'loss': 0.2955, 'grad_norm': 0.46356824040412903, 'learning_rate': 3.938740970287192e-05, 'epoch': 0.37}
37%|███▋ | 1689/4506 [1:55:30<3:15:35, 4.17s/it]
{'loss': 0.2749, 'grad_norm': 0.42626723647117615, 'learning_rate': 3.93715656328979e-05, 'epoch': 0.37}
38%|███▊ | 1690/4506 [1:55:35<3:25:12, 4.37s/it]
{'loss': 0.2717, 'grad_norm': 0.38759860396385193, 'learning_rate': 3.935571293667292e-05, 'epoch': 0.38}
38%|███▊ | 1691/4506 [1:55:39<3:21:25, 4.29s/it]
{'loss': 0.2637, 'grad_norm': 0.41424402594566345, 'learning_rate': 3.9339851623712235e-05, 'epoch': 0.38}
38%|███▊ | 1692/4506 [1:55:43<3:22:55, 4.33s/it]
{'loss': 0.2702, 'grad_norm': 0.37975674867630005, 'learning_rate': 3.93239817035363e-05, 'epoch': 0.38}
38%|███▊ | 1693/4506 [1:55:47<3:22:17, 4.31s/it]
{'loss': 0.2667, 'grad_norm': 0.40637457370758057, 'learning_rate': 3.930810318567072e-05, 'epoch': 0.38}
38%|███▊ | 1694/4506 [1:55:51<3:19:45, 4.26s/it]
{'loss': 0.2685, 'grad_norm': 0.42446455359458923, 'learning_rate': 3.929221607964626e-05, 'epoch': 0.38}
38%|███▊ | 1695/4506 [1:55:56<3:20:42, 4.28s/it]
{'loss': 0.2807, 'grad_norm': 0.4148784875869751, 'learning_rate': 3.927632039499885e-05, 'epoch': 0.38}
38%|███▊ | 1696/4506 [1:56:00<3:20:14, 4.28s/it]
{'loss': 0.2775, 'grad_norm': 0.41834744811058044, 'learning_rate': 3.926041614126956e-05, 'epoch': 0.38}
38%|███▊ | 1697/4506 [1:56:04<3:19:15, 4.26s/it]
{'loss': 0.276, 'grad_norm': 0.3909909725189209, 'learning_rate': 3.924450332800461e-05, 'epoch': 0.38}
38%|███▊ | 1698/4506 [1:56:08<3:12:03, 4.10s/it]
{'loss': 0.2836, 'grad_norm': 0.4073463976383209, 'learning_rate': 3.922858196475535e-05, 'epoch': 0.38}
38%|███▊ | 1699/4506 [1:56:12<3:06:50, 3.99s/it]
{'loss': 0.2625, 'grad_norm': 0.39504778385162354, 'learning_rate': 3.921265206107827e-05, 'epoch': 0.38}
38%|███▊ | 1700/4506 [1:56:16<3:09:42, 4.06s/it]
{'loss': 0.288, 'grad_norm': 0.45575228333473206, 'learning_rate': 3.919671362653499e-05, 'epoch': 0.38}
38%|███▊ | 1701/4506 [1:56:20<3:06:49, 4.00s/it]
{'loss': 0.2688, 'grad_norm': 0.4160563051700592, 'learning_rate': 3.918076667069223e-05, 'epoch': 0.38}
38%|███▊ | 1702/4506 [1:56:24<3:05:10, 3.96s/it]
{'loss': 0.2802, 'grad_norm': 0.44372615218162537, 'learning_rate': 3.916481120312184e-05, 'epoch': 0.38}
38%|███▊ | 1703/4506 [1:56:28<3:12:23, 4.12s/it]
{'loss': 0.2696, 'grad_norm': 0.3806222677230835, 'learning_rate': 3.91488472334008e-05, 'epoch': 0.38}
38%|███▊ | 1704/4506 [1:56:32<3:09:11, 4.05s/it]
{'loss': 0.2676, 'grad_norm': 0.37891826033592224, 'learning_rate': 3.9132874771111136e-05, 'epoch': 0.38}
38%|███▊ | 1705/4506 [1:56:36<3:13:34, 4.15s/it]
{'loss': 0.2652, 'grad_norm': 0.40809252858161926, 'learning_rate': 3.911689382584002e-05, 'epoch': 0.38}
38%|███▊ | 1706/4506 [1:56:41<3:12:19, 4.12s/it]
{'loss': 0.2817, 'grad_norm': 0.39688146114349365, 'learning_rate': 3.910090440717971e-05, 'epoch': 0.38}
38%|███▊ | 1707/4506 [1:56:44<3:09:58, 4.07s/it]
{'loss': 0.2718, 'grad_norm': 0.3793061673641205, 'learning_rate': 3.908490652472754e-05, 'epoch': 0.38}
38%|███▊ | 1708/4506 [1:56:49<3:11:35, 4.11s/it]
{'loss': 0.2671, 'grad_norm': 0.38611507415771484, 'learning_rate': 3.906890018808591e-05, 'epoch': 0.38}
38%|███▊ | 1709/4506 [1:56:53<3:17:26, 4.24s/it]
{'loss': 0.2685, 'grad_norm': 0.37487655878067017, 'learning_rate': 3.905288540686232e-05, 'epoch': 0.38}
38%|███▊ | 1710/4506 [1:56:58<3:18:24, 4.26s/it]
{'loss': 0.2603, 'grad_norm': 0.4218653440475464, 'learning_rate': 3.9036862190669334e-05, 'epoch': 0.38}
38%|███▊ | 1711/4506 [1:57:01<3:13:28, 4.15s/it]
{'loss': 0.265, 'grad_norm': 0.3630189895629883, 'learning_rate': 3.902083054912457e-05, 'epoch': 0.38}
38%|███▊ | 1712/4506 [1:57:06<3:14:10, 4.17s/it]
{'loss': 0.2586, 'grad_norm': 0.38403549790382385, 'learning_rate': 3.900479049185071e-05, 'epoch': 0.38}
38%|███▊ | 1713/4506 [1:57:09<3:09:20, 4.07s/it]
{'loss': 0.2504, 'grad_norm': 0.3878001868724823, 'learning_rate': 3.8988742028475476e-05, 'epoch': 0.38}
38%|███▊ | 1714/4506 [1:57:13<3:09:00, 4.06s/it]
{'loss': 0.2668, 'grad_norm': 0.4071967303752899, 'learning_rate': 3.897268516863165e-05, 'epoch': 0.38}
38%|███▊ | 1715/4506 [1:57:18<3:14:08, 4.17s/it]
{'loss': 0.2662, 'grad_norm': 0.40801334381103516, 'learning_rate': 3.895661992195704e-05, 'epoch': 0.38}
38%|███▊ | 1716/4506 [1:57:22<3:11:41, 4.12s/it]
{'loss': 0.2796, 'grad_norm': 0.4710800051689148, 'learning_rate': 3.894054629809451e-05, 'epoch': 0.38}
38%|███▊ | 1717/4506 [1:57:26<3:13:45, 4.17s/it]
{'loss': 0.2722, 'grad_norm': 0.4263559579849243, 'learning_rate': 3.892446430669193e-05, 'epoch': 0.38}
38%|███▊ | 1718/4506 [1:57:30<3:14:21, 4.18s/it]
{'loss': 0.2597, 'grad_norm': 0.38025882840156555, 'learning_rate': 3.890837395740221e-05, 'epoch': 0.38}
38%|███▊ | 1719/4506 [1:57:34<3:09:39, 4.08s/it]
{'loss': 0.2621, 'grad_norm': 0.3884318470954895, 'learning_rate': 3.8892275259883246e-05, 'epoch': 0.38}
38%|███▊ | 1720/4506 [1:57:38<3:03:37, 3.95s/it]
{'loss': 0.2792, 'grad_norm': 0.4628463387489319, 'learning_rate': 3.887616822379798e-05, 'epoch': 0.38}
38%|███▊ | 1721/4506 [1:57:42<3:02:05, 3.92s/it]
{'loss': 0.2542, 'grad_norm': 0.4226265549659729, 'learning_rate': 3.8860052858814355e-05, 'epoch': 0.38}
38%|███▊ | 1722/4506 [1:57:46<3:02:53, 3.94s/it]
{'loss': 0.2614, 'grad_norm': 0.44041720032691956, 'learning_rate': 3.8843929174605287e-05, 'epoch': 0.38}
38%|███▊ | 1723/4506 [1:57:50<3:07:41, 4.05s/it]
{'loss': 0.2719, 'grad_norm': 0.4075077474117279, 'learning_rate': 3.88277971808487e-05, 'epoch': 0.38}
38%|███▊ | 1724/4506 [1:57:55<3:13:42, 4.18s/it]
{'loss': 0.2637, 'grad_norm': 0.46693915128707886, 'learning_rate': 3.881165688722752e-05, 'epoch': 0.38}
38%|███▊ | 1725/4506 [1:57:58<3:09:29, 4.09s/it]
{'loss': 0.2536, 'grad_norm': 0.40043172240257263, 'learning_rate': 3.879550830342964e-05, 'epoch': 0.38}
38%|███▊ | 1726/4506 [1:58:02<3:07:48, 4.05s/it]
{'loss': 0.268, 'grad_norm': 0.45123299956321716, 'learning_rate': 3.877935143914793e-05, 'epoch': 0.38}
38%|███▊ | 1727/4506 [1:58:06<3:08:25, 4.07s/it]
{'loss': 0.2573, 'grad_norm': 0.3941057324409485, 'learning_rate': 3.8763186304080224e-05, 'epoch': 0.38}
38%|███▊ | 1728/4506 [1:58:10<3:04:56, 3.99s/it]
{'loss': 0.2713, 'grad_norm': 0.44847625494003296, 'learning_rate': 3.874701290792934e-05, 'epoch': 0.38}
38%|███▊ | 1729/4506 [1:58:14<3:03:14, 3.96s/it]
{'loss': 0.2672, 'grad_norm': 0.40534254908561707, 'learning_rate': 3.8730831260403024e-05, 'epoch': 0.38}
38%|███▊ | 1730/4506 [1:58:18<3:06:51, 4.04s/it]
{'loss': 0.2753, 'grad_norm': 0.4218928813934326, 'learning_rate': 3.871464137121401e-05, 'epoch': 0.38}
38%|███▊ | 1731/4506 [1:58:23<3:12:41, 4.17s/it]
{'loss': 0.2803, 'grad_norm': 0.4546772837638855, 'learning_rate': 3.8698443250079965e-05, 'epoch': 0.38}
38%|███▊ | 1732/4506 [1:58:27<3:16:14, 4.24s/it]
{'loss': 0.2647, 'grad_norm': 0.4816248118877411, 'learning_rate': 3.868223690672348e-05, 'epoch': 0.38}
38%|███▊ | 1733/4506 [1:58:31<3:09:42, 4.10s/it]
{'loss': 0.2589, 'grad_norm': 0.45744702219963074, 'learning_rate': 3.86660223508721e-05, 'epoch': 0.38}
38%|███▊ | 1734/4506 [1:58:35<3:07:59, 4.07s/it]
{'loss': 0.2692, 'grad_norm': 0.4488428831100464, 'learning_rate': 3.86497995922583e-05, 'epoch': 0.38}
39%|███▊ | 1735/4506 [1:58:39<3:09:45, 4.11s/it]
{'loss': 0.2776, 'grad_norm': 0.4407889246940613, 'learning_rate': 3.863356864061947e-05, 'epoch': 0.39}
39%|███▊ | 1736/4506 [1:58:43<3:08:20, 4.08s/it]
{'loss': 0.2834, 'grad_norm': 0.4241366386413574, 'learning_rate': 3.861732950569793e-05, 'epoch': 0.39}
39%|███▊ | 1737/4506 [1:58:47<3:07:50, 4.07s/it]
{'loss': 0.2592, 'grad_norm': 0.4954528212547302, 'learning_rate': 3.860108219724088e-05, 'epoch': 0.39}
39%|███▊ | 1738/4506 [1:58:51<3:04:33, 4.00s/it]
{'loss': 0.2658, 'grad_norm': 0.4829043745994568, 'learning_rate': 3.8584826725000465e-05, 'epoch': 0.39}
39%|███▊ | 1739/4506 [1:58:55<3:08:43, 4.09s/it]
{'loss': 0.2564, 'grad_norm': 0.40796345472335815, 'learning_rate': 3.8568563098733725e-05, 'epoch': 0.39}
39%|███▊ | 1740/4506 [1:59:00<3:12:22, 4.17s/it]
{'loss': 0.2736, 'grad_norm': 0.4182085394859314, 'learning_rate': 3.855229132820256e-05, 'epoch': 0.39}
39%|███▊ | 1741/4506 [1:59:04<3:12:21, 4.17s/it]
{'loss': 0.2779, 'grad_norm': 0.4693056046962738, 'learning_rate': 3.853601142317379e-05, 'epoch': 0.39}
39%|███▊ | 1742/4506 [1:59:08<3:10:00, 4.12s/it]
{'loss': 0.2822, 'grad_norm': 0.42991626262664795, 'learning_rate': 3.8519723393419116e-05, 'epoch': 0.39}
39%|███▊ | 1743/4506 [1:59:12<3:10:43, 4.14s/it]
{'loss': 0.2669, 'grad_norm': 0.39782053232192993, 'learning_rate': 3.8503427248715106e-05, 'epoch': 0.39}
39%|███▊ | 1744/4506 [1:59:16<3:10:11, 4.13s/it]
{'loss': 0.2551, 'grad_norm': 0.4261806905269623, 'learning_rate': 3.848712299884319e-05, 'epoch': 0.39}
39%|███▊ | 1745/4506 [1:59:20<3:10:24, 4.14s/it]
{'loss': 0.2642, 'grad_norm': 0.45957431197166443, 'learning_rate': 3.84708106535897e-05, 'epoch': 0.39}
39%|███▊ | 1746/4506 [1:59:24<3:05:28, 4.03s/it]
{'loss': 0.2756, 'grad_norm': 0.4474402964115143, 'learning_rate': 3.845449022274578e-05, 'epoch': 0.39}
39%|███▉ | 1747/4506 [1:59:28<3:00:45, 3.93s/it]
{'loss': 0.271, 'grad_norm': 0.46406224370002747, 'learning_rate': 3.843816171610746e-05, 'epoch': 0.39}
39%|███▉ | 1748/4506 [1:59:32<3:00:43, 3.93s/it]
{'loss': 0.2686, 'grad_norm': 0.38210973143577576, 'learning_rate': 3.84218251434756e-05, 'epoch': 0.39}
39%|███▉ | 1749/4506 [1:59:36<2:58:46, 3.89s/it]
{'loss': 0.272, 'grad_norm': 0.4310205280780792, 'learning_rate': 3.8405480514655906e-05, 'epoch': 0.39}
39%|███▉ | 1750/4506 [1:59:40<2:59:32, 3.91s/it]
{'loss': 0.2679, 'grad_norm': 0.45385491847991943, 'learning_rate': 3.838912783945893e-05, 'epoch': 0.39}
39%|███▉ | 1751/4506 [1:59:44<3:03:55, 4.01s/it]
{'loss': 0.2566, 'grad_norm': 0.3901473879814148, 'learning_rate': 3.837276712770004e-05, 'epoch': 0.39}
39%|███▉ | 1752/4506 [1:59:48<3:03:26, 4.00s/it]
{'loss': 0.2555, 'grad_norm': 0.45843201875686646, 'learning_rate': 3.835639838919944e-05, 'epoch': 0.39}
39%|███▉ | 1753/4506 [1:59:52<3:11:41, 4.18s/it]
{'loss': 0.2636, 'grad_norm': 0.42575857043266296, 'learning_rate': 3.834002163378213e-05, 'epoch': 0.39}
39%|███▉ | 1754/4506 [1:59:56<3:08:11, 4.10s/it]
{'loss': 0.2727, 'grad_norm': 0.4373457729816437, 'learning_rate': 3.832363687127795e-05, 'epoch': 0.39}
39%|███▉ | 1755/4506 [2:00:01<3:09:42, 4.14s/it]
{'loss': 0.2642, 'grad_norm': 0.4793025851249695, 'learning_rate': 3.8307244111521535e-05, 'epoch': 0.39}
39%|███▉ | 1756/4506 [2:00:05<3:17:11, 4.30s/it]
{'loss': 0.2696, 'grad_norm': 0.3832084834575653, 'learning_rate': 3.8290843364352314e-05, 'epoch': 0.39}
39%|███▉ | 1757/4506 [2:00:09<3:10:23, 4.16s/it]
{'loss': 0.2622, 'grad_norm': 0.38058310747146606, 'learning_rate': 3.82744346396145e-05, 'epoch': 0.39}
39%|███▉ | 1758/4506 [2:00:13<3:09:27, 4.14s/it]
{'loss': 0.2661, 'grad_norm': 0.443510502576828, 'learning_rate': 3.825801794715713e-05, 'epoch': 0.39}
39%|███▉ | 1759/4506 [2:00:17<3:09:24, 4.14s/it]
{'loss': 0.2752, 'grad_norm': 0.38904476165771484, 'learning_rate': 3.824159329683399e-05, 'epoch': 0.39}
39%|███▉ | 1760/4506 [2:00:21<3:06:06, 4.07s/it]
{'loss': 0.264, 'grad_norm': 0.431026428937912, 'learning_rate': 3.822516069850367e-05, 'epoch': 0.39}
39%|███▉ | 1761/4506 [2:00:25<3:02:40, 3.99s/it]
{'loss': 0.2702, 'grad_norm': 0.4238075613975525, 'learning_rate': 3.82087201620295e-05, 'epoch': 0.39}
39%|███▉ | 1762/4506 [2:00:29<3:05:19, 4.05s/it]
{'loss': 0.2764, 'grad_norm': 0.4036318063735962, 'learning_rate': 3.819227169727959e-05, 'epoch': 0.39}
39%|███▉ | 1763/4506 [2:00:33<3:03:27, 4.01s/it]
{'loss': 0.2653, 'grad_norm': 0.4031672775745392, 'learning_rate': 3.817581531412681e-05, 'epoch': 0.39}
39%|███▉ | 1764/4506 [2:00:37<3:02:45, 4.00s/it]
{'loss': 0.2679, 'grad_norm': 0.3743205964565277, 'learning_rate': 3.815935102244879e-05, 'epoch': 0.39}
39%|███▉ | 1764/4506 [2:00:37<3:02:45, 4.00s/it]
39%|███▉ | 1765/4506 [2:00:41<3:05:31, 4.06s/it]
{'loss': 0.261, 'grad_norm': 0.4048526883125305, 'learning_rate': 3.81428788321279e-05, 'epoch': 0.39}
39%|███▉ | 1765/4506 [2:00:41<3:05:31, 4.06s/it]
39%|███▉ | 1766/4506 [2:00:46<3:09:48, 4.16s/it]
{'loss': 0.262, 'grad_norm': 0.3591105043888092, 'learning_rate': 3.8126398753051236e-05, 'epoch': 0.39}
39%|███▉ | 1766/4506 [2:00:46<3:09:48, 4.16s/it]
39%|███▉ | 1767/4506 [2:00:50<3:10:46, 4.18s/it]
{'loss': 0.2582, 'grad_norm': 0.39895421266555786, 'learning_rate': 3.8109910795110654e-05, 'epoch': 0.39}
39%|███▉ | 1767/4506 [2:00:50<3:10:46, 4.18s/it]
39%|███▉ | 1768/4506 [2:00:54<3:11:29, 4.20s/it]
{'loss': 0.2669, 'grad_norm': 0.41338586807250977, 'learning_rate': 3.809341496820272e-05, 'epoch': 0.39}
39%|███▉ | 1768/4506 [2:00:54<3:11:29, 4.20s/it]
39%|███▉ | 1769/4506 [2:00:58<3:10:31, 4.18s/it]
{'loss': 0.2571, 'grad_norm': 0.4069066643714905, 'learning_rate': 3.807691128222875e-05, 'epoch': 0.39}
39%|███▉ | 1769/4506 [2:00:58<3:10:31, 4.18s/it]
39%|███▉ | 1770/4506 [2:01:02<3:08:18, 4.13s/it]
{'loss': 0.25, 'grad_norm': 0.34866952896118164, 'learning_rate': 3.806039974709473e-05, 'epoch': 0.39}
39%|███▉ | 1770/4506 [2:01:02<3:08:18, 4.13s/it]
39%|███▉ | 1771/4506 [2:01:07<3:10:10, 4.17s/it]
{'loss': 0.2538, 'grad_norm': 0.4452736973762512, 'learning_rate': 3.804388037271141e-05, 'epoch': 0.39}
39%|███▉ | 1771/4506 [2:01:07<3:10:10, 4.17s/it]
39%|███▉ | 1772/4506 [2:01:11<3:08:50, 4.14s/it]
{'loss': 0.2666, 'grad_norm': 0.5069514513015747, 'learning_rate': 3.8027353168994206e-05, 'epoch': 0.39}
39%|███▉ | 1772/4506 [2:01:11<3:08:50, 4.14s/it]
39%|███▉ | 1773/4506 [2:01:15<3:08:09, 4.13s/it]
{'loss': 0.2824, 'grad_norm': 0.4386875629425049, 'learning_rate': 3.8010818145863256e-05, 'epoch': 0.39}
39%|███▉ | 1773/4506 [2:01:15<3:08:09, 4.13s/it]
39%|███▉ | 1774/4506 [2:01:19<3:07:45, 4.12s/it]
{'loss': 0.2806, 'grad_norm': 0.3897395431995392, 'learning_rate': 3.799427531324339e-05, 'epoch': 0.39}
39%|███▉ | 1774/4506 [2:01:19<3:07:45, 4.12s/it]
39%|███▉ | 1775/4506 [2:01:23<3:07:58, 4.13s/it]
{'loss': 0.2836, 'grad_norm': 0.4107520580291748, 'learning_rate': 3.79777246810641e-05, 'epoch': 0.39}
39%|███▉ | 1775/4506 [2:01:23<3:07:58, 4.13s/it]
39%|███▉ | 1776/4506 [2:01:27<3:07:27, 4.12s/it]
{'loss': 0.2768, 'grad_norm': 0.41834747791290283, 'learning_rate': 3.796116625925959e-05, 'epoch': 0.39}
39%|███▉ | 1776/4506 [2:01:27<3:07:27, 4.12s/it]
39%|███▉ | 1777/4506 [2:01:32<3:11:46, 4.22s/it]
{'loss': 0.2566, 'grad_norm': 0.37912702560424805, 'learning_rate': 3.794460005776874e-05, 'epoch': 0.39}
39%|███▉ | 1777/4506 [2:01:32<3:11:46, 4.22s/it]
39%|███▉ | 1778/4506 [2:01:35<3:06:34, 4.10s/it]
{'loss': 0.2636, 'grad_norm': 0.37124714255332947, 'learning_rate': 3.792802608653507e-05, 'epoch': 0.39}
39%|███▉ | 1778/4506 [2:01:35<3:06:34, 4.10s/it]
39%|███▉ | 1779/4506 [2:01:39<3:02:03, 4.01s/it]
{'loss': 0.2726, 'grad_norm': 0.4941209852695465, 'learning_rate': 3.79114443555068e-05, 'epoch': 0.39}
39%|███▉ | 1779/4506 [2:01:39<3:02:03, 4.01s/it]
40%|███▉ | 1780/4506 [2:01:43<3:03:27, 4.04s/it]
{'loss': 0.2561, 'grad_norm': 0.37318143248558044, 'learning_rate': 3.7894854874636765e-05, 'epoch': 0.4}
40%|███▉ | 1780/4506 [2:01:43<3:03:27, 4.04s/it]
40%|███▉ | 1781/4506 [2:01:48<3:12:22, 4.24s/it]
{'loss': 0.265, 'grad_norm': 0.3855264484882355, 'learning_rate': 3.78782576538825e-05, 'epoch': 0.4}
40%|███▉ | 1781/4506 [2:01:48<3:12:22, 4.24s/it]
40%|███▉ | 1782/4506 [2:01:52<3:08:48, 4.16s/it]
{'loss': 0.2734, 'grad_norm': 0.42289063334465027, 'learning_rate': 3.786165270320614e-05, 'epoch': 0.4}
40%|███▉ | 1782/4506 [2:01:52<3:08:48, 4.16s/it]
40%|███▉ | 1783/4506 [2:01:56<3:06:12, 4.10s/it]
{'loss': 0.2581, 'grad_norm': 0.40189552307128906, 'learning_rate': 3.7845040032574506e-05, 'epoch': 0.4}
40%|███▉ | 1783/4506 [2:01:56<3:06:12, 4.10s/it]
40%|███▉ | 1784/4506 [2:02:00<3:06:08, 4.10s/it]
{'loss': 0.2794, 'grad_norm': 0.4201652407646179, 'learning_rate': 3.7828419651959014e-05, 'epoch': 0.4}
40%|███▉ | 1784/4506 [2:02:00<3:06:08, 4.10s/it]
40%|███▉ | 1785/4506 [2:02:04<3:09:42, 4.18s/it]
{'loss': 0.2666, 'grad_norm': 0.39466777443885803, 'learning_rate': 3.781179157133571e-05, 'epoch': 0.4}
40%|███▉ | 1785/4506 [2:02:04<3:09:42, 4.18s/it]
40%|███▉ | 1786/4506 [2:02:09<3:13:07, 4.26s/it]
{'loss': 0.2637, 'grad_norm': 0.39358288049697876, 'learning_rate': 3.77951558006853e-05, 'epoch': 0.4}
40%|███▉ | 1786/4506 [2:02:09<3:13:07, 4.26s/it]
40%|███▉ | 1787/4506 [2:02:13<3:08:30, 4.16s/it]
{'loss': 0.2729, 'grad_norm': 0.404852032661438, 'learning_rate': 3.777851234999307e-05, 'epoch': 0.4}
40%|███▉ | 1787/4506 [2:02:13<3:08:30, 4.16s/it]
40%|███▉ | 1788/4506 [2:02:17<3:09:14, 4.18s/it]
{'loss': 0.2617, 'grad_norm': 0.36283716559410095, 'learning_rate': 3.776186122924891e-05, 'epoch': 0.4}
40%|███▉ | 1788/4506 [2:02:17<3:09:14, 4.18s/it]
40%|███▉ | 1789/4506 [2:02:21<3:12:29, 4.25s/it]
{'loss': 0.2617, 'grad_norm': 0.3696580231189728, 'learning_rate': 3.774520244844736e-05, 'epoch': 0.4}
40%|███▉ | 1789/4506 [2:02:21<3:12:29, 4.25s/it]
40%|███▉ | 1790/4506 [2:02:26<3:17:22, 4.36s/it]
{'loss': 0.2564, 'grad_norm': 0.3612099289894104, 'learning_rate': 3.77285360175875e-05, 'epoch': 0.4}
40%|███▉ | 1790/4506 [2:02:26<3:17:22, 4.36s/it]
40%|███▉ | 1791/4506 [2:02:30<3:14:21, 4.30s/it]
{'loss': 0.2656, 'grad_norm': 0.4231267273426056, 'learning_rate': 3.771186194667304e-05, 'epoch': 0.4}
40%|███▉ | 1791/4506 [2:02:30<3:14:21, 4.30s/it]
40%|███▉ | 1792/4506 [2:02:35<3:14:51, 4.31s/it]
{'loss': 0.2648, 'grad_norm': 0.426922470331192, 'learning_rate': 3.769518024571226e-05, 'epoch': 0.4}
40%|███▉ | 1792/4506 [2:02:35<3:14:51, 4.31s/it]
40%|███▉ | 1793/4506 [2:02:39<3:12:01, 4.25s/it]
{'loss': 0.2687, 'grad_norm': 0.435257226228714, 'learning_rate': 3.767849092471803e-05, 'epoch': 0.4}
40%|███▉ | 1793/4506 [2:02:39<3:12:01, 4.25s/it]
40%|███▉ | 1794/4506 [2:02:43<3:09:15, 4.19s/it]
{'loss': 0.2706, 'grad_norm': 0.4166106581687927, 'learning_rate': 3.766179399370778e-05, 'epoch': 0.4}
40%|███▉ | 1794/4506 [2:02:43<3:09:15, 4.19s/it]
40%|███▉ | 1795/4506 [2:02:47<3:09:45, 4.20s/it]
{'loss': 0.2698, 'grad_norm': 0.43554484844207764, 'learning_rate': 3.764508946270353e-05, 'epoch': 0.4}
40%|███▉ | 1795/4506 [2:02:47<3:09:45, 4.20s/it]
40%|███▉ | 1796/4506 [2:02:51<3:08:55, 4.18s/it]
{'loss': 0.2648, 'grad_norm': 0.3885402977466583, 'learning_rate': 3.762837734173185e-05, 'epoch': 0.4}
40%|███▉ | 1796/4506 [2:02:51<3:08:55, 4.18s/it]
40%|███▉ | 1797/4506 [2:02:55<3:05:11, 4.10s/it]
{'loss': 0.26, 'grad_norm': 0.41504162549972534, 'learning_rate': 3.761165764082383e-05, 'epoch': 0.4}
40%|███▉ | 1797/4506 [2:02:55<3:05:11, 4.10s/it]
40%|███▉ | 1798/4506 [2:02:59<3:02:46, 4.05s/it]
{'loss': 0.2778, 'grad_norm': 0.3840697407722473, 'learning_rate': 3.759493037001518e-05, 'epoch': 0.4}
40%|███▉ | 1798/4506 [2:02:59<3:02:46, 4.05s/it]
40%|███▉ | 1799/4506 [2:03:03<3:04:40, 4.09s/it]
{'loss': 0.2502, 'grad_norm': 0.3600037395954132, 'learning_rate': 3.75781955393461e-05, 'epoch': 0.4}
40%|███▉ | 1799/4506 [2:03:03<3:04:40, 4.09s/it]
40%|███▉ | 1800/4506 [2:03:07<3:01:39, 4.03s/it]
{'loss': 0.2602, 'grad_norm': 0.39864954352378845, 'learning_rate': 3.756145315886135e-05, 'epoch': 0.4}
40%|███▉ | 1800/4506 [2:03:07<3:01:39, 4.03s/it]
40%|███▉ | 1801/4506 [2:03:12<3:08:53, 4.19s/it]
{'loss': 0.2747, 'grad_norm': 0.4234963059425354, 'learning_rate': 3.7544703238610206e-05, 'epoch': 0.4}
40%|███▉ | 1801/4506 [2:03:12<3:08:53, 4.19s/it]
40%|███▉ | 1802/4506 [2:03:16<3:09:13, 4.20s/it]
{'loss': 0.2529, 'grad_norm': 0.3316763639450073, 'learning_rate': 3.75279457886465e-05, 'epoch': 0.4}
40%|███▉ | 1802/4506 [2:03:16<3:09:13, 4.20s/it]
40%|████ | 1803/4506 [2:03:20<3:07:16, 4.16s/it]
{'loss': 0.2638, 'grad_norm': 0.37085333466529846, 'learning_rate': 3.751118081902855e-05, 'epoch': 0.4}
40%|████ | 1803/4506 [2:03:20<3:07:16, 4.16s/it]
40%|████ | 1804/4506 [2:03:24<3:13:16, 4.29s/it]
{'loss': 0.2692, 'grad_norm': 0.43486490845680237, 'learning_rate': 3.749440833981919e-05, 'epoch': 0.4}
40%|████ | 1804/4506 [2:03:24<3:13:16, 4.29s/it]
40%|████ | 1805/4506 [2:03:28<3:08:28, 4.19s/it]
{'loss': 0.2765, 'grad_norm': 0.45183852314949036, 'learning_rate': 3.7477628361085793e-05, 'epoch': 0.4}
40%|████ | 1805/4506 [2:03:28<3:08:28, 4.19s/it]
40%|████ | 1806/4506 [2:03:32<3:05:22, 4.12s/it]
{'loss': 0.2635, 'grad_norm': 0.41193148493766785, 'learning_rate': 3.74608408929002e-05, 'epoch': 0.4}
40%|████ | 1806/4506 [2:03:32<3:05:22, 4.12s/it]
40%|████ | 1807/4506 [2:03:36<3:04:31, 4.10s/it]
{'loss': 0.2547, 'grad_norm': 0.4177568256855011, 'learning_rate': 3.744404594533877e-05, 'epoch': 0.4}
40%|████ | 1807/4506 [2:03:36<3:04:31, 4.10s/it]
40%|████ | 1808/4506 [2:03:41<3:05:30, 4.13s/it]
{'loss': 0.2526, 'grad_norm': 0.3777453601360321, 'learning_rate': 3.742724352848233e-05, 'epoch': 0.4}
40%|████ | 1808/4506 [2:03:41<3:05:30, 4.13s/it]
40%|████ | 1809/4506 [2:03:44<3:01:34, 4.04s/it]
{'loss': 0.2549, 'grad_norm': 0.36194801330566406, 'learning_rate': 3.741043365241621e-05, 'epoch': 0.4}
40%|████ | 1809/4506 [2:03:44<3:01:34, 4.04s/it]
40%|████ | 1810/4506 [2:03:48<2:59:52, 4.00s/it]
{'loss': 0.2779, 'grad_norm': 0.4343928396701813, 'learning_rate': 3.7393616327230204e-05, 'epoch': 0.4}
40%|████ | 1810/4506 [2:03:48<2:59:52, 4.00s/it]
40%|████ | 1811/4506 [2:03:54<3:16:14, 4.37s/it]
{'loss': 0.2594, 'grad_norm': 0.3989998400211334, 'learning_rate': 3.7376791563018585e-05, 'epoch': 0.4}
40%|████ | 1811/4506 [2:03:54<3:16:14, 4.37s/it]
40%|████ | 1812/4506 [2:03:57<3:10:09, 4.24s/it]
{'loss': 0.2638, 'grad_norm': 0.4210521876811981, 'learning_rate': 3.73599593698801e-05, 'epoch': 0.4}
40%|████ | 1812/4506 [2:03:57<3:10:09, 4.24s/it]
40%|████ | 1813/4506 [2:04:02<3:12:04, 4.28s/it]
{'loss': 0.2675, 'grad_norm': 0.3939293622970581, 'learning_rate': 3.7343119757917916e-05, 'epoch': 0.4}
40%|████ | 1813/4506 [2:04:02<3:12:04, 4.28s/it]
40%|████ | 1814/4506 [2:04:07<3:21:58, 4.50s/it]
{'loss': 0.2633, 'grad_norm': 0.39479485154151917, 'learning_rate': 3.73262727372397e-05, 'epoch': 0.4}
40%|████ | 1814/4506 [2:04:07<3:21:58, 4.50s/it]
40%|████ | 1815/4506 [2:04:11<3:17:02, 4.39s/it]
{'loss': 0.2749, 'grad_norm': 0.42943450808525085, 'learning_rate': 3.730941831795755e-05, 'epoch': 0.4}
40%|████ | 1815/4506 [2:04:11<3:17:02, 4.39s/it]
40%|████ | 1816/4506 [2:04:15<3:13:15, 4.31s/it]
{'loss': 0.273, 'grad_norm': 0.46199551224708557, 'learning_rate': 3.7292556510187984e-05, 'epoch': 0.4}
40%|████ | 1816/4506 [2:04:15<3:13:15, 4.31s/it]
40%|████ | 1817/4506 [2:04:19<3:13:13, 4.31s/it]
{'loss': 0.2546, 'grad_norm': 0.33806219696998596, 'learning_rate': 3.7275687324051994e-05, 'epoch': 0.4}
40%|████ | 1817/4506 [2:04:19<3:13:13, 4.31s/it]
40%|████ | 1818/4506 [2:04:23<3:07:43, 4.19s/it]
{'loss': 0.2645, 'grad_norm': 0.39223966002464294, 'learning_rate': 3.7258810769674963e-05, 'epoch': 0.4}
40%|████ | 1818/4506 [2:04:23<3:07:43, 4.19s/it]
40%|████ | 1819/4506 [2:04:27<3:04:30, 4.12s/it]
{'loss': 0.2617, 'grad_norm': 0.3939347565174103, 'learning_rate': 3.7241926857186715e-05, 'epoch': 0.4}
40%|████ | 1819/4506 [2:04:27<3:04:30, 4.12s/it]
40%|████ | 1820/4506 [2:04:31<3:01:33, 4.06s/it]
{'loss': 0.2742, 'grad_norm': 0.45769283175468445, 'learning_rate': 3.722503559672148e-05, 'epoch': 0.4}
40%|████ | 1820/4506 [2:04:31<3:01:33, 4.06s/it]
40%|████ | 1821/4506 [2:04:36<3:05:42, 4.15s/it]
{'loss': 0.2644, 'grad_norm': 0.39914944767951965, 'learning_rate': 3.7208136998417934e-05, 'epoch': 0.4}
40%|████ | 1821/4506 [2:04:36<3:05:42, 4.15s/it]
40%|████ | 1822/4506 [2:04:40<3:06:42, 4.17s/it]
{'loss': 0.2691, 'grad_norm': 0.4238964915275574, 'learning_rate': 3.7191231072419096e-05, 'epoch': 0.4}
40%|████ | 1822/4506 [2:04:40<3:06:42, 4.17s/it]
40%|████ | 1823/4506 [2:04:44<3:05:05, 4.14s/it]
{'loss': 0.2661, 'grad_norm': 0.41020599007606506, 'learning_rate': 3.717431782887244e-05, 'epoch': 0.4}
40%|████ | 1823/4506 [2:04:44<3:05:05, 4.14s/it]
40%|████ | 1824/4506 [2:04:48<3:01:13, 4.05s/it]
{'loss': 0.2744, 'grad_norm': 0.45625972747802734, 'learning_rate': 3.715739727792981e-05, 'epoch': 0.4}
40%|████ | 1824/4506 [2:04:48<3:01:13, 4.05s/it]
41%|████ | 1825/4506 [2:04:52<3:00:19, 4.04s/it]
{'loss': 0.2788, 'grad_norm': 0.4523107707500458, 'learning_rate': 3.714046942974741e-05, 'epoch': 0.41}
41%|████ | 1825/4506 [2:04:52<3:00:19, 4.04s/it]
41%|████ | 1826/4506 [2:04:56<2:58:25, 3.99s/it]
{'loss': 0.242, 'grad_norm': 0.36873936653137207, 'learning_rate': 3.7123534294485876e-05, 'epoch': 0.41}
41%|████ | 1826/4506 [2:04:56<2:58:25, 3.99s/it]
41%|████ | 1827/4506 [2:05:00<2:58:33, 4.00s/it]
{'loss': 0.2752, 'grad_norm': 0.40327051281929016, 'learning_rate': 3.710659188231018e-05, 'epoch': 0.41}
41%|████ | 1827/4506 [2:05:00<2:58:33, 4.00s/it]
41%|████ | 1828/4506 [2:05:03<2:54:38, 3.91s/it]
{'loss': 0.2627, 'grad_norm': 0.4269372522830963, 'learning_rate': 3.7089642203389686e-05, 'epoch': 0.41}
41%|████ | 1828/4506 [2:05:03<2:54:38, 3.91s/it]
41%|████ | 1829/4506 [2:05:07<2:55:30, 3.93s/it]
{'loss': 0.2789, 'grad_norm': 0.48834624886512756, 'learning_rate': 3.7072685267898084e-05, 'epoch': 0.41}
41%|████ | 1829/4506 [2:05:07<2:55:30, 3.93s/it]
41%|████ | 1830/4506 [2:05:11<2:56:48, 3.96s/it]
{'loss': 0.2686, 'grad_norm': 0.3985850512981415, 'learning_rate': 3.705572108601346e-05, 'epoch': 0.41}
41%|████ | 1830/4506 [2:05:11<2:56:48, 3.96s/it]
41%|████ | 1831/4506 [2:05:16<3:00:29, 4.05s/it]
{'loss': 0.2719, 'grad_norm': 0.4328080415725708, 'learning_rate': 3.703874966791823e-05, 'epoch': 0.41}
41%|████ | 1831/4506 [2:05:16<3:00:29, 4.05s/it]
41%|████ | 1832/4506 [2:05:20<3:08:32, 4.23s/it]
{'loss': 0.2738, 'grad_norm': 0.39362633228302, 'learning_rate': 3.702177102379915e-05, 'epoch': 0.41}
41%|████ | 1832/4506 [2:05:20<3:08:32, 4.23s/it]
41%|████ | 1833/4506 [2:05:24<3:07:43, 4.21s/it]
{'loss': 0.271, 'grad_norm': 0.40826159715652466, 'learning_rate': 3.700478516384732e-05, 'epoch': 0.41}
41%|████ | 1833/4506 [2:05:24<3:07:43, 4.21s/it]
41%|████ | 1834/4506 [2:05:29<3:06:40, 4.19s/it]
{'loss': 0.2693, 'grad_norm': 0.41558393836021423, 'learning_rate': 3.698779209825818e-05, 'epoch': 0.41}
41%|████ | 1834/4506 [2:05:29<3:06:40, 4.19s/it]
41%|████ | 1835/4506 [2:05:33<3:03:34, 4.12s/it]
{'loss': 0.2743, 'grad_norm': 0.45315730571746826, 'learning_rate': 3.697079183723147e-05, 'epoch': 0.41}
41%|████ | 1835/4506 [2:05:33<3:03:34, 4.12s/it]
41%|████ | 1836/4506 [2:05:36<2:59:24, 4.03s/it]
{'loss': 0.2753, 'grad_norm': 0.4166724979877472, 'learning_rate': 3.6953784390971274e-05, 'epoch': 0.41}
41%|████ | 1836/4506 [2:05:36<2:59:24, 4.03s/it]
41%|████ | 1837/4506 [2:05:40<3:00:10, 4.05s/it]
{'loss': 0.2773, 'grad_norm': 0.41672277450561523, 'learning_rate': 3.693676976968598e-05, 'epoch': 0.41}
41%|████ | 1837/4506 [2:05:40<3:00:10, 4.05s/it]
41%|████ | 1838/4506 [2:05:44<2:58:30, 4.01s/it]
{'loss': 0.2708, 'grad_norm': 0.4019433259963989, 'learning_rate': 3.691974798358827e-05, 'epoch': 0.41}
41%|████ | 1838/4506 [2:05:44<2:58:30, 4.01s/it]
41%|████ | 1839/4506 [2:05:48<2:58:36, 4.02s/it]
{'loss': 0.2567, 'grad_norm': 0.4231611490249634, 'learning_rate': 3.6902719042895145e-05, 'epoch': 0.41}
41%|████ | 1839/4506 [2:05:48<2:58:36, 4.02s/it]
41%|████ | 1840/4506 [2:05:53<3:06:23, 4.19s/it]
{'loss': 0.2606, 'grad_norm': 0.3350728452205658, 'learning_rate': 3.68856829578279e-05, 'epoch': 0.41}
41%|████ | 1840/4506 [2:05:53<3:06:23, 4.19s/it]
41%|████ | 1841/4506 [2:05:57<3:00:37, 4.07s/it]
{'loss': 0.2615, 'grad_norm': 0.3629390597343445, 'learning_rate': 3.68686397386121e-05, 'epoch': 0.41}
41%|████ | 1841/4506 [2:05:57<3:00:37, 4.07s/it]
41%|████ | 1842/4506 [2:06:01<3:01:04, 4.08s/it]
{'loss': 0.2747, 'grad_norm': 0.38898301124572754, 'learning_rate': 3.685158939547761e-05, 'epoch': 0.41}
41%|████ | 1842/4506 [2:06:01<3:01:04, 4.08s/it]
41%|████ | 1843/4506 [2:06:05<3:06:18, 4.20s/it]
{'loss': 0.2726, 'grad_norm': 0.3987950384616852, 'learning_rate': 3.683453193865857e-05, 'epoch': 0.41}
41%|████ | 1843/4506 [2:06:05<3:06:18, 4.20s/it]
41%|████ | 1844/4506 [2:06:09<3:01:39, 4.09s/it]
{'loss': 0.2616, 'grad_norm': 0.38525620102882385, 'learning_rate': 3.681746737839336e-05, 'epoch': 0.41}
41%|████ | 1844/4506 [2:06:09<3:01:39, 4.09s/it]
41%|████ | 1845/4506 [2:06:13<2:59:29, 4.05s/it]
{'loss': 0.262, 'grad_norm': 0.3820343017578125, 'learning_rate': 3.680039572492468e-05, 'epoch': 0.41}
41%|████ | 1845/4506 [2:06:13<2:59:29, 4.05s/it]
41%|████ | 1846/4506 [2:06:18<3:03:49, 4.15s/it]
{'loss': 0.2619, 'grad_norm': 0.37897059321403503, 'learning_rate': 3.678331698849944e-05, 'epoch': 0.41}
41%|████ | 1846/4506 [2:06:18<3:03:49, 4.15s/it]
41%|████ | 1847/4506 [2:06:22<3:06:42, 4.21s/it]
{'loss': 0.269, 'grad_norm': 0.3549080789089203, 'learning_rate': 3.676623117936882e-05, 'epoch': 0.41}
41%|████ | 1847/4506 [2:06:22<3:06:42, 4.21s/it]
41%|████ | 1848/4506 [2:06:26<3:01:49, 4.10s/it]
{'loss': 0.2584, 'grad_norm': 0.38662174344062805, 'learning_rate': 3.674913830778823e-05, 'epoch': 0.41}
41%|████ | 1848/4506 [2:06:26<3:01:49, 4.10s/it]
41%|████ | 1849/4506 [2:06:30<3:04:45, 4.17s/it]
{'loss': 0.268, 'grad_norm': 0.38018491864204407, 'learning_rate': 3.6732038384017366e-05, 'epoch': 0.41}
41%|████ | 1849/4506 [2:06:30<3:04:45, 4.17s/it]
41%|████ | 1850/4506 [2:06:34<3:03:20, 4.14s/it]
{'loss': 0.2558, 'grad_norm': 0.40226709842681885, 'learning_rate': 3.6714931418320084e-05, 'epoch': 0.41}
41%|████ | 1850/4506 [2:06:34<3:03:20, 4.14s/it]
41%|████ | 1851/4506 [2:06:38<3:03:00, 4.14s/it]
{'loss': 0.2729, 'grad_norm': 0.4312264025211334, 'learning_rate': 3.669781742096452e-05, 'epoch': 0.41}
41%|████ | 1851/4506 [2:06:38<3:03:00, 4.14s/it]
41%|████ | 1852/4506 [2:06:43<3:05:00, 4.18s/it]
{'loss': 0.2726, 'grad_norm': 0.4582519829273224, 'learning_rate': 3.6680696402223026e-05, 'epoch': 0.41}
41%|████ | 1852/4506 [2:06:43<3:05:00, 4.18s/it]
41%|████ | 1853/4506 [2:06:47<3:07:50, 4.25s/it]
{'loss': 0.2459, 'grad_norm': 0.35827144980430603, 'learning_rate': 3.666356837237215e-05, 'epoch': 0.41}
41%|████ | 1853/4506 [2:06:47<3:07:50, 4.25s/it]
41%|████ | 1854/4506 [2:06:51<3:02:27, 4.13s/it]
{'loss': 0.2565, 'grad_norm': 0.3752121925354004, 'learning_rate': 3.664643334169264e-05, 'epoch': 0.41}
41%|████ | 1854/4506 [2:06:51<3:02:27, 4.13s/it]
41%|████ | 1855/4506 [2:06:55<3:02:13, 4.12s/it]
{'loss': 0.2728, 'grad_norm': 0.3843683898448944, 'learning_rate': 3.6629291320469495e-05, 'epoch': 0.41}
41%|████ | 1855/4506 [2:06:55<3:02:13, 4.12s/it]
41%|████ | 1856/4506 [2:06:59<3:02:04, 4.12s/it]
{'loss': 0.2626, 'grad_norm': 0.4344431161880493, 'learning_rate': 3.661214231899186e-05, 'epoch': 0.41}
41%|████ | 1856/4506 [2:06:59<3:02:04, 4.12s/it]
41%|████ | 1857/4506 [2:07:03<2:59:31, 4.07s/it]
{'loss': 0.26, 'grad_norm': 0.3760468065738678, 'learning_rate': 3.659498634755309e-05, 'epoch': 0.41}
41%|████ | 1857/4506 [2:07:03<2:59:31, 4.07s/it]
41%|████ | 1858/4506 [2:07:07<2:57:34, 4.02s/it]
{'loss': 0.2662, 'grad_norm': 0.4396817684173584, 'learning_rate': 3.657782341645072e-05, 'epoch': 0.41}
41%|████ | 1858/4506 [2:07:07<2:57:34, 4.02s/it]
41%|████▏ | 1859/4506 [2:07:11<2:58:18, 4.04s/it]
{'loss': 0.2495, 'grad_norm': 0.4093157947063446, 'learning_rate': 3.656065353598645e-05, 'epoch': 0.41}
41%|████▏ | 1859/4506 [2:07:11<2:58:18, 4.04s/it]
41%|████▏ | 1860/4506 [2:07:15<3:01:30, 4.12s/it]
{'loss': 0.2726, 'grad_norm': 0.41827285289764404, 'learning_rate': 3.6543476716466194e-05, 'epoch': 0.41}
41%|████▏ | 1860/4506 [2:07:15<3:01:30, 4.12s/it]
41%|████▏ | 1861/4506 [2:07:20<3:03:21, 4.16s/it]
{'loss': 0.26, 'grad_norm': 0.4022930860519409, 'learning_rate': 3.652629296819998e-05, 'epoch': 0.41}
41%|████▏ | 1861/4506 [2:07:20<3:03:21, 4.16s/it]
41%|████▏ | 1862/4506 [2:07:24<3:03:53, 4.17s/it]
{'loss': 0.284, 'grad_norm': 0.46989119052886963, 'learning_rate': 3.650910230150203e-05, 'epoch': 0.41}
41%|████▏ | 1862/4506 [2:07:24<3:03:53, 4.17s/it]
41%|████▏ | 1863/4506 [2:07:28<3:05:15, 4.21s/it]
{'loss': 0.2707, 'grad_norm': 0.4272128939628601, 'learning_rate': 3.649190472669069e-05, 'epoch': 0.41}
41%|████▏ | 1863/4506 [2:07:28<3:05:15, 4.21s/it]
41%|████▏ | 1864/4506 [2:07:32<3:05:08, 4.20s/it]
{'loss': 0.2537, 'grad_norm': 0.4030905067920685, 'learning_rate': 3.6474700254088476e-05, 'epoch': 0.41}
41%|████▏ | 1864/4506 [2:07:32<3:05:08, 4.20s/it]
41%|████▏ | 1865/4506 [2:07:36<3:01:29, 4.12s/it]
{'loss': 0.2542, 'grad_norm': 0.389523983001709, 'learning_rate': 3.6457488894022034e-05, 'epoch': 0.41}
41%|████▏ | 1865/4506 [2:07:36<3:01:29, 4.12s/it]
41%|████▏ | 1866/4506 [2:07:40<2:57:30, 4.03s/it]
{'loss': 0.2625, 'grad_norm': 0.4305983781814575, 'learning_rate': 3.644027065682215e-05, 'epoch': 0.41}
41%|████▏ | 1866/4506 [2:07:40<2:57:30, 4.03s/it]
41%|████▏ | 1867/4506 [2:07:44<2:56:49, 4.02s/it]
{'loss': 0.2686, 'grad_norm': 0.4207035303115845, 'learning_rate': 3.6423045552823734e-05, 'epoch': 0.41}
41%|████▏ | 1867/4506 [2:07:44<2:56:49, 4.02s/it]
41%|████▏ | 1868/4506 [2:07:48<2:59:56, 4.09s/it]
{'loss': 0.2755, 'grad_norm': 0.4195449948310852, 'learning_rate': 3.640581359236581e-05, 'epoch': 0.41}
41%|████▏ | 1868/4506 [2:07:48<2:59:56, 4.09s/it]
41%|████▏ | 1869/4506 [2:07:52<2:57:09, 4.03s/it]
{'loss': 0.2559, 'grad_norm': 0.4394044876098633, 'learning_rate': 3.638857478579153e-05, 'epoch': 0.41}
41%|████▏ | 1869/4506 [2:07:52<2:57:09, 4.03s/it]
42%|████▏ | 1870/4506 [2:07:56<2:53:26, 3.95s/it]
{'loss': 0.2663, 'grad_norm': 0.44062402844429016, 'learning_rate': 3.6371329143448155e-05, 'epoch': 0.42}
42%|████▏ | 1870/4506 [2:07:56<2:53:26, 3.95s/it]
42%|████▏ | 1871/4506 [2:08:00<2:54:31, 3.97s/it]
{'loss': 0.2554, 'grad_norm': 0.38147276639938354, 'learning_rate': 3.635407667568703e-05, 'epoch': 0.42}
42%|████▏ | 1871/4506 [2:08:00<2:54:31, 3.97s/it]
42%|████▏ | 1872/4506 [2:08:04<2:53:09, 3.94s/it]
{'loss': 0.2605, 'grad_norm': 0.4147888422012329, 'learning_rate': 3.633681739286363e-05, 'epoch': 0.42}
42%|████▏ | 1872/4506 [2:08:04<2:53:09, 3.94s/it]
42%|████▏ | 1873/4506 [2:08:08<2:56:18, 4.02s/it]
{'loss': 0.2808, 'grad_norm': 0.5476221442222595, 'learning_rate': 3.6319551305337484e-05, 'epoch': 0.42}
42%|████▏ | 1873/4506 [2:08:08<2:56:18, 4.02s/it]
42%|████▏ | 1874/4506 [2:08:12<2:59:04, 4.08s/it]
{'loss': 0.2559, 'grad_norm': 0.3991256356239319, 'learning_rate': 3.6302278423472244e-05, 'epoch': 0.42}
42%|████▏ | 1874/4506 [2:08:12<2:59:04, 4.08s/it]
42%|████▏ | 1875/4506 [2:08:16<2:56:43, 4.03s/it]
{'loss': 0.2705, 'grad_norm': 0.43614378571510315, 'learning_rate': 3.6284998757635596e-05, 'epoch': 0.42}
42%|████▏ | 1875/4506 [2:08:16<2:56:43, 4.03s/it]
42%|████▏ | 1876/4506 [2:08:20<2:58:30, 4.07s/it]
{'loss': 0.2583, 'grad_norm': 0.44261234998703003, 'learning_rate': 3.626771231819933e-05, 'epoch': 0.42}
42%|████▏ | 1876/4506 [2:08:20<2:58:30, 4.07s/it]
42%|████▏ | 1877/4506 [2:08:24<2:59:07, 4.09s/it]
{'loss': 0.2715, 'grad_norm': 0.45064276456832886, 'learning_rate': 3.6250419115539305e-05, 'epoch': 0.42}
42%|████▏ | 1877/4506 [2:08:24<2:59:07, 4.09s/it]
42%|████▏ | 1878/4506 [2:08:28<2:57:52, 4.06s/it]
{'loss': 0.2633, 'grad_norm': 0.40960413217544556, 'learning_rate': 3.6233119160035406e-05, 'epoch': 0.42}
42%|████▏ | 1878/4506 [2:08:28<2:57:52, 4.06s/it]
42%|████▏ | 1879/4506 [2:08:32<2:57:05, 4.04s/it]
{'loss': 0.2706, 'grad_norm': 0.393454372882843, 'learning_rate': 3.62158124620716e-05, 'epoch': 0.42}
42%|████▏ | 1879/4506 [2:08:32<2:57:05, 4.04s/it]
42%|████▏ | 1880/4506 [2:08:37<2:58:19, 4.07s/it]
{'loss': 0.2474, 'grad_norm': 0.38157665729522705, 'learning_rate': 3.619849903203591e-05, 'epoch': 0.42}
42%|████▏ | 1880/4506 [2:08:37<2:58:19, 4.07s/it]
42%|████▏ | 1881/4506 [2:08:41<3:00:23, 4.12s/it]
{'loss': 0.2576, 'grad_norm': 0.36371150612831116, 'learning_rate': 3.6181178880320365e-05, 'epoch': 0.42}
42%|████▏ | 1881/4506 [2:08:41<3:00:23, 4.12s/it]
42%|████▏ | 1882/4506 [2:08:45<2:58:35, 4.08s/it]
{'loss': 0.2708, 'grad_norm': 0.4583682417869568, 'learning_rate': 3.616385201732105e-05, 'epoch': 0.42}
42%|████▏ | 1882/4506 [2:08:45<2:58:35, 4.08s/it]
42%|████▏ | 1883/4506 [2:08:49<2:58:40, 4.09s/it]
{'loss': 0.2585, 'grad_norm': 0.3509986400604248, 'learning_rate': 3.614651845343808e-05, 'epoch': 0.42}
42%|████▏ | 1883/4506 [2:08:49<2:58:40, 4.09s/it]
42%|████▏ | 1884/4506 [2:08:53<2:55:57, 4.03s/it]
{'loss': 0.2494, 'grad_norm': 0.40995079278945923, 'learning_rate': 3.612917819907559e-05, 'epoch': 0.42}
42%|████▏ | 1884/4506 [2:08:53<2:55:57, 4.03s/it]
42%|████▏ | 1885/4506 [2:08:57<2:58:10, 4.08s/it]
{'loss': 0.2709, 'grad_norm': 0.47396400570869446, 'learning_rate': 3.611183126464172e-05, 'epoch': 0.42}
42%|████▏ | 1885/4506 [2:08:57<2:58:10, 4.08s/it]
42%|████▏ | 1886/4506 [2:09:01<3:00:41, 4.14s/it]
{'loss': 0.251, 'grad_norm': 0.4358019232749939, 'learning_rate': 3.609447766054863e-05, 'epoch': 0.42}
42%|████▏ | 1886/4506 [2:09:01<3:00:41, 4.14s/it]
42%|████▏ | 1887/4506 [2:09:06<3:05:37, 4.25s/it]
{'loss': 0.2556, 'grad_norm': 0.37337374687194824, 'learning_rate': 3.607711739721248e-05, 'epoch': 0.42}
42%|████▏ | 1887/4506 [2:09:06<3:05:37, 4.25s/it]
42%|████▏ | 1888/4506 [2:09:10<3:06:00, 4.26s/it]
{'loss': 0.2578, 'grad_norm': 0.4606708288192749, 'learning_rate': 3.6059750485053444e-05, 'epoch': 0.42}
42%|████▏ | 1888/4506 [2:09:10<3:06:00, 4.26s/it]
42%|████▏ | 1889/4506 [2:09:14<3:01:21, 4.16s/it]
{'loss': 0.2532, 'grad_norm': 0.42600956559181213, 'learning_rate': 3.6042376934495645e-05, 'epoch': 0.42}
42%|████▏ | 1889/4506 [2:09:14<3:01:21, 4.16s/it]
42%|████▏ | 1890/4506 [2:09:18<3:02:30, 4.19s/it]
{'loss': 0.2586, 'grad_norm': 0.4503241777420044, 'learning_rate': 3.602499675596724e-05, 'epoch': 0.42}
42%|████▏ | 1890/4506 [2:09:18<3:02:30, 4.19s/it]
42%|████▏ | 1891/4506 [2:09:22<2:58:24, 4.09s/it]
{'loss': 0.2588, 'grad_norm': 0.3448338508605957, 'learning_rate': 3.6007609959900327e-05, 'epoch': 0.42}
42%|████▏ | 1891/4506 [2:09:22<2:58:24, 4.09s/it]
42%|████▏ | 1892/4506 [2:09:26<2:58:48, 4.10s/it]
{'loss': 0.256, 'grad_norm': 0.3965805470943451, 'learning_rate': 3.5990216556730995e-05, 'epoch': 0.42}
42%|████▏ | 1892/4506 [2:09:26<2:58:48, 4.10s/it]
42%|████▏ | 1893/4506 [2:09:30<3:00:28, 4.14s/it]
{'loss': 0.2493, 'grad_norm': 0.38253000378608704, 'learning_rate': 3.597281655689929e-05, 'epoch': 0.42}
42%|████▏ | 1893/4506 [2:09:30<3:00:28, 4.14s/it]
42%|████▏ | 1894/4506 [2:09:35<3:00:25, 4.14s/it]
{'loss': 0.2731, 'grad_norm': 0.4551195204257965, 'learning_rate': 3.5955409970849224e-05, 'epoch': 0.42}
42%|████▏ | 1894/4506 [2:09:35<3:00:25, 4.14s/it]
42%|████▏ | 1895/4506 [2:09:39<3:02:55, 4.20s/it]
{'loss': 0.2729, 'grad_norm': 0.42120760679244995, 'learning_rate': 3.593799680902876e-05, 'epoch': 0.42}
42%|████▏ | 1895/4506 [2:09:39<3:02:55, 4.20s/it]
42%|████▏ | 1896/4506 [2:09:43<3:00:12, 4.14s/it]
{'loss': 0.2512, 'grad_norm': 0.37180545926094055, 'learning_rate': 3.5920577081889804e-05, 'epoch': 0.42}
42%|████▏ | 1896/4506 [2:09:43<3:00:12, 4.14s/it]
42%|████▏ | 1897/4506 [2:09:47<2:59:15, 4.12s/it]
{'loss': 0.2558, 'grad_norm': 0.37840017676353455, 'learning_rate': 3.590315079988822e-05, 'epoch': 0.42}
42%|████▏ | 1897/4506 [2:09:47<2:59:15, 4.12s/it]
42%|████▏ | 1898/4506 [2:09:51<3:01:30, 4.18s/it]
{'loss': 0.2621, 'grad_norm': 0.3954865038394928, 'learning_rate': 3.5885717973483766e-05, 'epoch': 0.42}
42%|████▏ | 1898/4506 [2:09:51<3:01:30, 4.18s/it]
42%|████▏ | 1899/4506 [2:09:55<3:00:21, 4.15s/it]
{'loss': 0.2758, 'grad_norm': 0.39180275797843933, 'learning_rate': 3.5868278613140184e-05, 'epoch': 0.42}
42%|████▏ | 1899/4506 [2:09:55<3:00:21, 4.15s/it]
42%|████▏ | 1900/4506 [2:09:59<2:53:52, 4.00s/it]
{'loss': 0.2595, 'grad_norm': 0.4280647039413452, 'learning_rate': 3.585083272932509e-05, 'epoch': 0.42}
42%|████▏ | 1900/4506 [2:09:59<2:53:52, 4.00s/it]
42%|████▏ | 1901/4506 [2:10:03<2:52:23, 3.97s/it]
{'loss': 0.2591, 'grad_norm': 0.45643824338912964, 'learning_rate': 3.5833380332510034e-05, 'epoch': 0.42}
42%|████▏ | 1901/4506 [2:10:03<2:52:23, 3.97s/it]
42%|████▏ | 1902/4506 [2:10:07<2:56:34, 4.07s/it]
{'loss': 0.2614, 'grad_norm': 0.3840964734554291, 'learning_rate': 3.5815921433170496e-05, 'epoch': 0.42}
42%|████▏ | 1902/4506 [2:10:07<2:56:34, 4.07s/it]
42%|████▏ | 1903/4506 [2:10:11<2:55:08, 4.04s/it]
{'loss': 0.2656, 'grad_norm': 0.42561131715774536, 'learning_rate': 3.5798456041785815e-05, 'epoch': 0.42}
42%|████▏ | 1903/4506 [2:10:11<2:55:08, 4.04s/it]
42%|████▏ | 1904/4506 [2:10:16<3:00:01, 4.15s/it]
{'loss': 0.2459, 'grad_norm': 0.3841196298599243, 'learning_rate': 3.578098416883926e-05, 'epoch': 0.42}
42%|████▏ | 1904/4506 [2:10:16<3:00:01, 4.15s/it]
42%|████▏ | 1905/4506 [2:10:19<2:53:55, 4.01s/it]
{'loss': 0.2712, 'grad_norm': 0.4630531072616577, 'learning_rate': 3.5763505824817974e-05, 'epoch': 0.42}
42%|████▏ | 1905/4506 [2:10:19<2:53:55, 4.01s/it]
42%|████▏ | 1906/4506 [2:10:23<2:53:08, 4.00s/it]
{'loss': 0.2495, 'grad_norm': 0.4386773407459259, 'learning_rate': 3.574602102021301e-05, 'epoch': 0.42}
42%|████▏ | 1906/4506 [2:10:23<2:53:08, 4.00s/it]
42%|████▏ | 1907/4506 [2:10:28<2:59:48, 4.15s/it]
{'loss': 0.2586, 'grad_norm': 0.36939170956611633, 'learning_rate': 3.572852976551926e-05, 'epoch': 0.42}
42%|████▏ | 1907/4506 [2:10:28<2:59:48, 4.15s/it]
42%|████▏ | 1908/4506 [2:10:32<3:01:06, 4.18s/it]
{'loss': 0.2653, 'grad_norm': 0.40132319927215576, 'learning_rate': 3.5711032071235517e-05, 'epoch': 0.42}
42%|████▏ | 1908/4506 [2:10:32<3:01:06, 4.18s/it]
42%|████▏ | 1909/4506 [2:10:36<3:01:40, 4.20s/it]
{'loss': 0.2672, 'grad_norm': 0.37684229016304016, 'learning_rate': 3.569352794786443e-05, 'epoch': 0.42}
42%|████▏ | 1909/4506 [2:10:36<3:01:40, 4.20s/it]
42%|████▏ | 1910/4506 [2:10:40<2:56:37, 4.08s/it]
{'loss': 0.2653, 'grad_norm': 0.44058704376220703, 'learning_rate': 3.5676017405912495e-05, 'epoch': 0.42}
42%|████▏ | 1910/4506 [2:10:40<2:56:37, 4.08s/it]
42%|████▏ | 1911/4506 [2:10:44<2:58:55, 4.14s/it]
{'loss': 0.2551, 'grad_norm': 0.3697770833969116, 'learning_rate': 3.565850045589008e-05, 'epoch': 0.42}
42%|████▏ | 1911/4506 [2:10:44<2:58:55, 4.14s/it]
42%|████▏ | 1912/4506 [2:10:49<3:00:33, 4.18s/it]
{'loss': 0.2703, 'grad_norm': 0.4546809196472168, 'learning_rate': 3.5640977108311394e-05, 'epoch': 0.42}
42%|████▏ | 1912/4506 [2:10:49<3:00:33, 4.18s/it]
42%|████▏ | 1913/4506 [2:10:53<2:58:17, 4.13s/it]
{'loss': 0.244, 'grad_norm': 0.39333364367485046, 'learning_rate': 3.562344737369448e-05, 'epoch': 0.42}
42%|████▏ | 1913/4506 [2:10:53<2:58:17, 4.13s/it]
42%|████▏ | 1914/4506 [2:10:57<2:58:07, 4.12s/it]
{'loss': 0.2627, 'grad_norm': 0.41628846526145935, 'learning_rate': 3.5605911262561214e-05, 'epoch': 0.42}
42%|████▏ | 1914/4506 [2:10:57<2:58:07, 4.12s/it]
42%|████▏ | 1915/4506 [2:11:01<2:58:20, 4.13s/it]
{'loss': 0.2558, 'grad_norm': 0.41487011313438416, 'learning_rate': 3.558836878543731e-05, 'epoch': 0.43}
42%|████▏ | 1915/4506 [2:11:01<2:58:20, 4.13s/it]
43%|████▎ | 1916/4506 [2:11:05<2:55:40, 4.07s/it]
{'loss': 0.2628, 'grad_norm': 0.4284682273864746, 'learning_rate': 3.55708199528523e-05, 'epoch': 0.43}
43%|████▎ | 1916/4506 [2:11:05<2:55:40, 4.07s/it]
43%|████▎ | 1917/4506 [2:11:09<2:54:21, 4.04s/it]
{'loss': 0.2608, 'grad_norm': 0.36829373240470886, 'learning_rate': 3.5553264775339515e-05, 'epoch': 0.43}
43%|████▎ | 1917/4506 [2:11:09<2:54:21, 4.04s/it]
43%|████▎ | 1918/4506 [2:11:13<2:55:40, 4.07s/it]
{'loss': 0.2673, 'grad_norm': 0.44768190383911133, 'learning_rate': 3.5535703263436124e-05, 'epoch': 0.43}
43%|████▎ | 1918/4506 [2:11:13<2:55:40, 4.07s/it]
43%|████▎ | 1919/4506 [2:11:17<2:57:16, 4.11s/it]
{'loss': 0.2586, 'grad_norm': 0.3927031457424164, 'learning_rate': 3.5518135427683066e-05, 'epoch': 0.43}
43%|████▎ | 1919/4506 [2:11:17<2:57:16, 4.11s/it]
43%|████▎ | 1920/4506 [2:11:21<2:56:28, 4.09s/it]
{'loss': 0.247, 'grad_norm': 0.42848289012908936, 'learning_rate': 3.550056127862509e-05, 'epoch': 0.43}
43%|████▎ | 1920/4506 [2:11:21<2:56:28, 4.09s/it]
43%|████▎ | 1921/4506 [2:11:25<2:58:20, 4.14s/it]
{'loss': 0.2579, 'grad_norm': 0.37700292468070984, 'learning_rate': 3.5482980826810745e-05, 'epoch': 0.43}
43%|████▎ | 1922/4506 [2:11:30<3:02:37, 4.24s/it]
{'loss': 0.2591, 'grad_norm': 0.39541929960250854, 'learning_rate': 3.546539408279235e-05, 'epoch': 0.43}
43%|████▎ | 1923/4506 [2:11:34<2:58:25, 4.14s/it]
{'loss': 0.2572, 'grad_norm': 0.4695752263069153, 'learning_rate': 3.5447801057126e-05, 'epoch': 0.43}
43%|████▎ | 1924/4506 [2:11:38<2:58:31, 4.15s/it]
{'loss': 0.258, 'grad_norm': 0.4429291784763336, 'learning_rate': 3.5430201760371564e-05, 'epoch': 0.43}
43%|████▎ | 1925/4506 [2:11:42<2:58:57, 4.16s/it]
{'loss': 0.2718, 'grad_norm': 0.4495207667350769, 'learning_rate': 3.5412596203092686e-05, 'epoch': 0.43}
43%|████▎ | 1926/4506 [2:11:47<3:05:33, 4.32s/it]
{'loss': 0.2618, 'grad_norm': 0.41560953855514526, 'learning_rate': 3.539498439585674e-05, 'epoch': 0.43}
43%|████▎ | 1927/4506 [2:11:51<3:09:03, 4.40s/it]
{'loss': 0.251, 'grad_norm': 0.38970357179641724, 'learning_rate': 3.5377366349234874e-05, 'epoch': 0.43}
43%|████▎ | 1928/4506 [2:11:55<3:03:30, 4.27s/it]
{'loss': 0.2495, 'grad_norm': 0.42076820135116577, 'learning_rate': 3.5359742073801995e-05, 'epoch': 0.43}
43%|████▎ | 1929/4506 [2:12:00<3:01:11, 4.22s/it]
{'loss': 0.2577, 'grad_norm': 0.37506812810897827, 'learning_rate': 3.53421115801367e-05, 'epoch': 0.43}
43%|████▎ | 1930/4506 [2:12:04<3:04:06, 4.29s/it]
{'loss': 0.2688, 'grad_norm': 0.41497594118118286, 'learning_rate': 3.532447487882136e-05, 'epoch': 0.43}
43%|████▎ | 1931/4506 [2:12:08<2:58:08, 4.15s/it]
{'loss': 0.2651, 'grad_norm': 0.4181379973888397, 'learning_rate': 3.530683198044207e-05, 'epoch': 0.43}
43%|████▎ | 1932/4506 [2:12:12<2:57:21, 4.13s/it]
{'loss': 0.2667, 'grad_norm': 0.405754953622818, 'learning_rate': 3.528918289558862e-05, 'epoch': 0.43}
43%|████▎ | 1933/4506 [2:12:16<2:57:45, 4.15s/it]
{'loss': 0.2516, 'grad_norm': 0.44212955236434937, 'learning_rate': 3.527152763485453e-05, 'epoch': 0.43}
43%|████▎ | 1934/4506 [2:12:20<2:51:38, 4.00s/it]
{'loss': 0.2675, 'grad_norm': 0.4427857995033264, 'learning_rate': 3.5253866208837035e-05, 'epoch': 0.43}
43%|████▎ | 1935/4506 [2:12:24<2:51:58, 4.01s/it]
{'loss': 0.2817, 'grad_norm': 0.4446980357170105, 'learning_rate': 3.523619862813704e-05, 'epoch': 0.43}
43%|████▎ | 1936/4506 [2:12:28<2:57:43, 4.15s/it]
{'loss': 0.2592, 'grad_norm': 0.4648580551147461, 'learning_rate': 3.521852490335919e-05, 'epoch': 0.43}
43%|████▎ | 1937/4506 [2:12:33<2:59:14, 4.19s/it]
{'loss': 0.2485, 'grad_norm': 0.382507860660553, 'learning_rate': 3.520084504511178e-05, 'epoch': 0.43}
43%|████▎ | 1938/4506 [2:12:37<3:00:14, 4.21s/it]
{'loss': 0.2648, 'grad_norm': 0.36499375104904175, 'learning_rate': 3.518315906400679e-05, 'epoch': 0.43}
43%|████▎ | 1939/4506 [2:12:41<2:55:21, 4.10s/it]
{'loss': 0.2503, 'grad_norm': 0.4205186367034912, 'learning_rate': 3.51654669706599e-05, 'epoch': 0.43}
43%|████▎ | 1940/4506 [2:12:45<2:53:33, 4.06s/it]
{'loss': 0.2463, 'grad_norm': 0.42919641733169556, 'learning_rate': 3.514776877569044e-05, 'epoch': 0.43}
43%|████▎ | 1941/4506 [2:12:49<2:52:10, 4.03s/it]
{'loss': 0.2552, 'grad_norm': 0.38295748829841614, 'learning_rate': 3.5130064489721395e-05, 'epoch': 0.43}
43%|████▎ | 1942/4506 [2:12:53<2:52:31, 4.04s/it]
{'loss': 0.26, 'grad_norm': 0.41744399070739746, 'learning_rate': 3.5112354123379416e-05, 'epoch': 0.43}
43%|████▎ | 1943/4506 [2:12:57<2:57:22, 4.15s/it]
{'loss': 0.2614, 'grad_norm': 0.3694899082183838, 'learning_rate': 3.509463768729482e-05, 'epoch': 0.43}
43%|████▎ | 1944/4506 [2:13:01<2:54:01, 4.08s/it]
{'loss': 0.2658, 'grad_norm': 0.442281037569046, 'learning_rate': 3.5076915192101533e-05, 'epoch': 0.43}
43%|████▎ | 1945/4506 [2:13:05<2:53:27, 4.06s/it]
{'loss': 0.2731, 'grad_norm': 0.377446711063385, 'learning_rate': 3.5059186648437135e-05, 'epoch': 0.43}
43%|████▎ | 1946/4506 [2:13:09<2:54:42, 4.09s/it]
{'loss': 0.2569, 'grad_norm': 0.37426093220710754, 'learning_rate': 3.504145206694286e-05, 'epoch': 0.43}
43%|████▎ | 1947/4506 [2:13:14<2:58:29, 4.19s/it]
{'loss': 0.267, 'grad_norm': 0.402997761964798, 'learning_rate': 3.5023711458263525e-05, 'epoch': 0.43}
43%|████▎ | 1948/4506 [2:13:18<2:57:39, 4.17s/it]
{'loss': 0.2478, 'grad_norm': 0.38478171825408936, 'learning_rate': 3.500596483304759e-05, 'epoch': 0.43}
43%|████▎ | 1949/4506 [2:13:21<2:53:06, 4.06s/it]
{'loss': 0.2639, 'grad_norm': 0.45663246512413025, 'learning_rate': 3.4988212201947106e-05, 'epoch': 0.43}
43%|████▎ | 1950/4506 [2:13:26<2:58:11, 4.18s/it]
{'loss': 0.2725, 'grad_norm': 0.38877856731414795, 'learning_rate': 3.4970453575617765e-05, 'epoch': 0.43}
43%|████▎ | 1951/4506 [2:13:30<2:59:15, 4.21s/it]
{'loss': 0.2555, 'grad_norm': 0.41236937046051025, 'learning_rate': 3.495268896471882e-05, 'epoch': 0.43}
43%|████▎ | 1952/4506 [2:13:34<3:00:02, 4.23s/it]
{'loss': 0.2501, 'grad_norm': 0.3559125065803528, 'learning_rate': 3.493491837991312e-05, 'epoch': 0.43}
43%|████▎ | 1953/4506 [2:13:39<3:04:26, 4.33s/it]
{'loss': 0.2705, 'grad_norm': 0.3834999203681946, 'learning_rate': 3.491714183186714e-05, 'epoch': 0.43}
43%|████▎ | 1954/4506 [2:13:43<2:59:02, 4.21s/it]
{'loss': 0.2558, 'grad_norm': 0.3718677759170532, 'learning_rate': 3.4899359331250883e-05, 'epoch': 0.43}
43%|████▎ | 1955/4506 [2:13:48<3:03:05, 4.31s/it]
{'loss': 0.2585, 'grad_norm': 0.3733175992965698, 'learning_rate': 3.488157088873795e-05, 'epoch': 0.43}
43%|████▎ | 1956/4506 [2:13:52<3:01:05, 4.26s/it]
{'loss': 0.2583, 'grad_norm': 0.40043818950653076, 'learning_rate': 3.4863776515005516e-05, 'epoch': 0.43}
43%|████▎ | 1957/4506 [2:13:56<3:01:54, 4.28s/it]
{'loss': 0.2624, 'grad_norm': 0.40227338671684265, 'learning_rate': 3.48459762207343e-05, 'epoch': 0.43}
43%|████▎ | 1958/4506 [2:14:00<2:59:12, 4.22s/it]
{'loss': 0.2463, 'grad_norm': 0.3511274456977844, 'learning_rate': 3.482817001660857e-05, 'epoch': 0.43}
43%|████▎ | 1959/4506 [2:14:04<2:57:59, 4.19s/it]
{'loss': 0.257, 'grad_norm': 0.3931587338447571, 'learning_rate': 3.481035791331617e-05, 'epoch': 0.43}
43%|████▎ | 1960/4506 [2:14:08<2:54:01, 4.10s/it]
{'loss': 0.2534, 'grad_norm': 0.3815319240093231, 'learning_rate': 3.479253992154845e-05, 'epoch': 0.44}
44%|████▎ | 1961/4506 [2:14:12<2:53:12, 4.08s/it]
{'loss': 0.2621, 'grad_norm': 0.4034101665019989, 'learning_rate': 3.4774716052000316e-05, 'epoch': 0.44}
44%|████▎ | 1962/4506 [2:14:16<2:50:18, 4.02s/it]
{'loss': 0.2483, 'grad_norm': 0.3892239034175873, 'learning_rate': 3.47568863153702e-05, 'epoch': 0.44}
44%|████▎ | 1963/4506 [2:14:20<2:51:52, 4.06s/it]
{'loss': 0.2627, 'grad_norm': 0.40673914551734924, 'learning_rate': 3.4739050722360056e-05, 'epoch': 0.44}
44%|████▎ | 1964/4506 [2:14:24<2:50:10, 4.02s/it]
{'loss': 0.2738, 'grad_norm': 0.4837701916694641, 'learning_rate': 3.472120928367533e-05, 'epoch': 0.44}
44%|████▎ | 1965/4506 [2:14:28<2:53:38, 4.10s/it]
{'loss': 0.2528, 'grad_norm': 0.35203319787979126, 'learning_rate': 3.470336201002502e-05, 'epoch': 0.44}
44%|████▎ | 1966/4506 [2:14:32<2:52:36, 4.08s/it]
{'loss': 0.2556, 'grad_norm': 0.4024696946144104, 'learning_rate': 3.4685508912121595e-05, 'epoch': 0.44}
44%|████▎ | 1967/4506 [2:14:36<2:52:33, 4.08s/it]
{'loss': 0.2603, 'grad_norm': 0.37382248044013977, 'learning_rate': 3.4667650000681025e-05, 'epoch': 0.44}
44%|████▎ | 1968/4506 [2:14:41<2:52:31, 4.08s/it]
{'loss': 0.2577, 'grad_norm': 0.4114155173301697, 'learning_rate': 3.464978528642276e-05, 'epoch': 0.44}
44%|████▎ | 1969/4506 [2:14:45<2:51:38, 4.06s/it]
{'loss': 0.2699, 'grad_norm': 0.4109683632850647, 'learning_rate': 3.4631914780069776e-05, 'epoch': 0.44}
44%|████▎ | 1970/4506 [2:14:49<2:51:02, 4.05s/it]
{'loss': 0.2536, 'grad_norm': 0.37575745582580566, 'learning_rate': 3.4614038492348466e-05, 'epoch': 0.44}
44%|████▎ | 1971/4506 [2:14:52<2:47:32, 3.97s/it]
{'loss': 0.2769, 'grad_norm': 0.403748482465744, 'learning_rate': 3.459615643398873e-05, 'epoch': 0.44}
44%|████▍ | 1972/4506 [2:14:56<2:49:22, 4.01s/it]
{'loss': 0.257, 'grad_norm': 0.39869827032089233, 'learning_rate': 3.457826861572393e-05, 'epoch': 0.44}
44%|████▍ | 1973/4506 [2:15:00<2:47:35, 3.97s/it]
{'loss': 0.25, 'grad_norm': 0.4020322859287262, 'learning_rate': 3.456037504829088e-05, 'epoch': 0.44}
44%|████▍ | 1974/4506 [2:15:04<2:45:18, 3.92s/it]
{'loss': 0.2564, 'grad_norm': 0.42751288414001465, 'learning_rate': 3.454247574242983e-05, 'epoch': 0.44}
44%|████▍ | 1975/4506 [2:15:08<2:44:04, 3.89s/it]
{'loss': 0.2554, 'grad_norm': 0.3666369318962097, 'learning_rate': 3.45245707088845e-05, 'epoch': 0.44}
44%|████▍ | 1976/4506 [2:15:12<2:46:37, 3.95s/it]
{'loss': 0.2551, 'grad_norm': 0.39394065737724304, 'learning_rate': 3.4506659958402026e-05, 'epoch': 0.44}
44%|████▍ | 1977/4506 [2:15:16<2:45:22, 3.92s/it]
{'loss': 0.2414, 'grad_norm': 0.3805224299430847, 'learning_rate': 3.4488743501733e-05, 'epoch': 0.44}
44%|████▍ | 1978/4506 [2:15:20<2:47:00, 3.96s/it]
{'loss': 0.2664, 'grad_norm': 0.4080048203468323, 'learning_rate': 3.44708213496314e-05, 'epoch': 0.44}
44%|████▍ | 1979/4506 [2:15:24<2:44:34, 3.91s/it]
{'loss': 0.2474, 'grad_norm': 0.37454643845558167, 'learning_rate': 3.4452893512854676e-05, 'epoch': 0.44}
44%|████▍ | 1980/4506 [2:15:28<2:45:09, 3.92s/it]
{'loss': 0.241, 'grad_norm': 0.38283151388168335, 'learning_rate': 3.443496000216365e-05, 'epoch': 0.44}
44%|████▍ | 1981/4506 [2:15:32<2:44:11, 3.90s/it]
{'loss': 0.2531, 'grad_norm': 0.3912610411643982, 'learning_rate': 3.441702082832255e-05, 'epoch': 0.44}
44%|████▍ | 1982/4506 [2:15:36<2:46:38, 3.96s/it]
{'loss': 0.2637, 'grad_norm': 0.44410577416419983, 'learning_rate': 3.439907600209903e-05, 'epoch': 0.44}
44%|████▍ | 1983/4506 [2:15:40<2:55:44, 4.18s/it]
{'loss': 0.2522, 'grad_norm': 0.3989620804786682, 'learning_rate': 3.4381125534264104e-05, 'epoch': 0.44}
44%|████▍ | 1984/4506 [2:15:44<2:53:24, 4.13s/it]
{'loss': 0.2629, 'grad_norm': 0.4064871668815613, 'learning_rate': 3.436316943559221e-05, 'epoch': 0.44}
44%|████▍ | 1985/4506 [2:15:48<2:52:19, 4.10s/it]
{'loss': 0.2454, 'grad_norm': 0.3741840422153473, 'learning_rate': 3.434520771686113e-05, 'epoch': 0.44}
44%|████▍ | 1986/4506 [2:15:53<2:54:24, 4.15s/it]
{'loss': 0.2676, 'grad_norm': 0.46394816040992737, 'learning_rate': 3.432724038885203e-05, 'epoch': 0.44}
44%|████▍ | 1987/4506 [2:15:57<2:57:40, 4.23s/it]
{'loss': 0.2473, 'grad_norm': 0.40162113308906555, 'learning_rate': 3.4309267462349455e-05, 'epoch': 0.44}
44%|████▍ | 1988/4506 [2:16:01<2:56:01, 4.19s/it]
{'loss': 0.2663, 'grad_norm': 0.48350709676742554, 'learning_rate': 3.42912889481413e-05, 'epoch': 0.44}
44%|████▍ | 1989/4506 [2:16:05<2:53:48, 4.14s/it]
{'loss': 0.244, 'grad_norm': 0.4002487063407898, 'learning_rate': 3.427330485701883e-05, 'epoch': 0.44}
44%|████▍ | 1990/4506 [2:16:09<2:54:40, 4.17s/it]
{'loss': 0.2504, 'grad_norm': 0.37526869773864746, 'learning_rate': 3.4255315199776615e-05, 'epoch': 0.44}
44%|████▍ | 1991/4506 [2:16:13<2:50:20, 4.06s/it]
{'loss': 0.2546, 'grad_norm': 0.39025166630744934, 'learning_rate': 3.423731998721262e-05, 'epoch': 0.44}
44%|████▍ | 1992/4506 [2:16:17<2:50:12, 4.06s/it]
{'loss': 0.2548, 'grad_norm': 0.41367942094802856, 'learning_rate': 3.421931923012812e-05, 'epoch': 0.44}
44%|████▍ | 1993/4506 [2:16:21<2:46:20, 3.97s/it]
{'loss': 0.2542, 'grad_norm': 0.40431904792785645, 'learning_rate': 3.42013129393277e-05, 'epoch': 0.44}
44%|████▍ | 1994/4506 [2:16:25<2:44:52, 3.94s/it]
{'loss': 0.243, 'grad_norm': 0.3858115077018738, 'learning_rate': 3.418330112561928e-05, 'epoch': 0.44}
44%|████▍ | 1995/4506 [2:16:29<2:44:45, 3.94s/it]
{'loss': 0.2494, 'grad_norm': 0.4767802953720093, 'learning_rate': 3.4165283799814116e-05, 'epoch': 0.44}
44%|████▍ | 1996/4506 [2:16:33<2:43:31, 3.91s/it]
{'loss': 0.2535, 'grad_norm': 0.4230082035064697, 'learning_rate': 3.414726097272675e-05, 'epoch': 0.44}
44%|████▍ | 1997/4506 [2:16:37<2:46:45, 3.99s/it]
{'loss': 0.2478, 'grad_norm': 0.39223232865333557, 'learning_rate': 3.412923265517503e-05, 'epoch': 0.44}
44%|████▍ | 1998/4506 [2:16:41<2:46:26, 3.98s/it]
{'loss': 0.2472, 'grad_norm': 0.41262102127075195, 'learning_rate': 3.4111198857980104e-05, 'epoch': 0.44}
44%|████▍ | 1999/4506 [2:16:45<2:44:02, 3.93s/it]
{'loss': 0.2601, 'grad_norm': 0.4410969614982605, 'learning_rate': 3.409315959196639e-05, 'epoch': 0.44}
44%|████▍ | 2000/4506 [2:16:49<2:43:49, 3.92s/it]
{'loss': 0.2545, 'grad_norm': 0.38851019740104675, 'learning_rate': 3.407511486796163e-05, 'epoch': 0.44}
44%|████▍ | 2001/4506 [2:16:52<2:43:25, 3.91s/it]
{'loss': 0.25, 'grad_norm': 0.43023088574409485, 'learning_rate': 3.40570646967968e-05, 'epoch': 0.44}
44%|████▍ | 2002/4506 [2:16:57<2:47:48, 4.02s/it]
{'loss': 0.2602, 'grad_norm': 0.43958404660224915, 'learning_rate': 3.4039009089306155e-05, 'epoch': 0.44}
44%|████▍ | 2003/4506 [2:17:01<2:52:23, 4.13s/it]
{'loss': 0.2479, 'grad_norm': 0.40625226497650146, 'learning_rate': 3.402094805632724e-05, 'epoch': 0.44}
44%|████▍ | 2004/4506 [2:17:05<2:52:47, 4.14s/it]
{'loss': 0.2464, 'grad_norm': 0.3457852303981781, 'learning_rate': 3.400288160870083e-05, 'epoch': 0.44}
44%|████▍ | 2005/4506 [2:17:10<3:02:13, 4.37s/it]
{'loss': 0.2625, 'grad_norm': 0.33506688475608826, 'learning_rate': 3.398480975727094e-05, 'epoch': 0.45}
45%|████▍ | 2006/4506 [2:17:14<2:59:44, 4.31s/it]
{'loss': 0.2463, 'grad_norm': 0.4360909163951874, 'learning_rate': 3.396673251288486e-05, 'epoch': 0.45}
45%|████▍ | 2007/4506 [2:17:19<3:02:45, 4.39s/it]
{'loss': 0.2744, 'grad_norm': 0.4324704706668854, 'learning_rate': 3.3948649886393114e-05, 'epoch': 0.45}
45%|████▍ | 2008/4506 [2:17:23<2:58:27, 4.29s/it]
{'loss': 0.2563, 'grad_norm': 0.4225405752658844, 'learning_rate': 3.393056188864942e-05, 'epoch': 0.45}
45%|████▍ | 2009/4506 [2:17:27<2:56:18, 4.24s/it]
{'loss': 0.2435, 'grad_norm': 0.335217148065567, 'learning_rate': 3.391246853051076e-05, 'epoch': 0.45}
45%|████▍ | 2010/4506 [2:17:31<2:55:20, 4.21s/it]
{'loss': 0.2556, 'grad_norm': 0.351423442363739, 'learning_rate': 3.3894369822837316e-05, 'epoch': 0.45}
45%|████▍ | 2011/4506 [2:17:35<2:50:36, 4.10s/it]
{'loss': 0.2668, 'grad_norm': 0.4181399345397949, 'learning_rate': 3.3876265776492474e-05, 'epoch': 0.45}
45%|████▍ | 2012/4506 [2:17:39<2:44:15, 3.95s/it]
{'loss': 0.26, 'grad_norm': 0.45916420221328735, 'learning_rate': 3.385815640234283e-05, 'epoch': 0.45}
45%|████▍ | 2013/4506 [2:17:43<2:44:12, 3.95s/it]
{'loss': 0.2451, 'grad_norm': 0.34632086753845215, 'learning_rate': 3.384004171125821e-05, 'epoch': 0.45}
45%|████▍ | 2014/4506 [2:17:47<2:47:20, 4.03s/it]
{'loss': 0.2717, 'grad_norm': 0.4601442515850067, 'learning_rate': 3.3821921714111577e-05, 'epoch': 0.45}
45%|████▍ | 2015/4506 [2:17:51<2:48:32, 4.06s/it]
{'loss': 0.263, 'grad_norm': 0.39501628279685974, 'learning_rate': 3.3803796421779106e-05, 'epoch': 0.45}
45%|████▍ | 2016/4506 [2:17:55<2:48:39, 4.06s/it]
{'loss': 0.2505, 'grad_norm': 0.44839173555374146, 'learning_rate': 3.378566584514016e-05, 'epoch': 0.45}
45%|████▍ | 2017/4506 [2:17:59<2:48:48, 4.07s/it]
{'loss': 0.2484, 'grad_norm': 0.37156280875205994, 'learning_rate': 3.376752999507726e-05, 'epoch': 0.45}
45%|████▍ | 2018/4506 [2:18:03<2:51:43, 4.14s/it]
{'loss': 0.2451, 'grad_norm': 0.41512787342071533, 'learning_rate': 3.3749388882476085e-05, 'epoch': 0.45}
45%|████▍ | 2019/4506 [2:18:07<2:50:16, 4.11s/it]
{'loss': 0.2356, 'grad_norm': 0.40941932797431946, 'learning_rate': 3.37312425182255e-05, 'epoch': 0.45}
45%|████▍ | 2020/4506 [2:18:12<2:50:40, 4.12s/it]
{'loss': 0.2503, 'grad_norm': 0.4214937686920166, 'learning_rate': 3.3713090913217486e-05, 'epoch': 0.45}
45%|████▍ | 2021/4506 [2:18:16<2:52:13, 4.16s/it]
{'loss': 0.2461, 'grad_norm': 0.40697941184043884, 'learning_rate': 3.3694934078347195e-05, 'epoch': 0.45}
45%|████▍ | 2022/4506 [2:18:21<2:59:55, 4.35s/it]
{'loss': 0.2555, 'grad_norm': 0.4042433798313141, 'learning_rate': 3.367677202451292e-05, 'epoch': 0.45}
45%|████▍ | 2023/4506 [2:18:25<2:57:17, 4.28s/it]
{'loss': 0.2585, 'grad_norm': 0.41805702447891235, 'learning_rate': 3.365860476261608e-05, 'epoch': 0.45}
45%|████▍ | 2024/4506 [2:18:29<2:54:57, 4.23s/it]
{'loss': 0.2533, 'grad_norm': 0.43195343017578125, 'learning_rate': 3.364043230356121e-05, 'epoch': 0.45}
45%|████▍ | 2025/4506 [2:18:33<2:53:13, 4.19s/it]
{'loss': 0.2561, 'grad_norm': 0.3939242959022522, 'learning_rate': 3.362225465825597e-05, 'epoch': 0.45}
45%|████▍ | 2026/4506 [2:18:37<2:52:07, 4.16s/it]
{'loss': 0.2621, 'grad_norm': 0.4112301468849182, 'learning_rate': 3.360407183761114e-05, 'epoch': 0.45}
45%|████▍ | 2027/4506 [2:18:41<2:49:39, 4.11s/it]
{'loss': 0.2559, 'grad_norm': 0.4229515492916107, 'learning_rate': 3.358588385254059e-05, 'epoch': 0.45}
45%|████▌ | 2028/4506 [2:18:45<2:45:54, 4.02s/it]
{'loss': 0.2464, 'grad_norm': 0.42176738381385803, 'learning_rate': 3.3567690713961333e-05, 'epoch': 0.45}
45%|████▌ | 2029/4506 [2:18:49<2:41:47, 3.92s/it]
{'loss': 0.2482, 'grad_norm': 0.4100395143032074, 'learning_rate': 3.3549492432793415e-05, 'epoch': 0.45}
45%|████▌ | 2030/4506 [2:18:53<2:47:08, 4.05s/it]
{'loss': 0.2562, 'grad_norm': 0.3944317698478699, 'learning_rate': 3.353128901996001e-05, 'epoch': 0.45}
45%|████▌ | 2031/4506 [2:18:57<2:47:24, 4.06s/it]
{'loss': 0.256, 'grad_norm': 0.37291955947875977, 'learning_rate': 3.351308048638735e-05, 'epoch': 0.45}
45%|████▌ | 2032/4506 [2:19:01<2:48:51, 4.10s/it]
{'loss': 0.2475, 'grad_norm': 0.39556023478507996, 'learning_rate': 3.3494866843004774e-05, 'epoch': 0.45}
45%|████▌ | 2033/4506 [2:19:06<2:54:41, 4.24s/it]
{'loss': 0.2526, 'grad_norm': 0.4197218418121338, 'learning_rate': 3.3476648100744644e-05, 'epoch': 0.45}
45%|████▌ | 2034/4506 [2:19:10<2:54:34, 4.24s/it]
{'loss': 0.2462, 'grad_norm': 0.4202274680137634, 'learning_rate': 3.345842427054241e-05, 'epoch': 0.45}
45%|████▌ | 2035/4506 [2:19:14<2:55:53, 4.27s/it]
{'loss': 0.2481, 'grad_norm': 0.4259864389896393, 'learning_rate': 3.344019536333657e-05, 'epoch': 0.45}
45%|████▌ | 2036/4506 [2:19:19<2:56:39, 4.29s/it]
{'loss': 0.2518, 'grad_norm': 0.4763185679912567, 'learning_rate': 3.342196139006867e-05, 'epoch': 0.45}
45%|████▌ | 2037/4506 [2:19:23<3:01:56, 4.42s/it]
{'loss': 0.2545, 'grad_norm': 0.39212578535079956, 'learning_rate': 3.3403722361683286e-05, 'epoch': 0.45}
45%|████▌ | 2038/4506 [2:19:28<3:01:33, 4.41s/it]
{'loss': 0.2674, 'grad_norm': 0.42872345447540283, 'learning_rate': 3.338547828912805e-05, 'epoch': 0.45}
45%|████▌ | 2039/4506 [2:19:32<3:00:28, 4.39s/it]
{'loss': 0.2731, 'grad_norm': 0.37955793738365173, 'learning_rate': 3.336722918335361e-05, 'epoch': 0.45}
45%|████▌ | 2040/4506 [2:19:36<2:56:01, 4.28s/it]
{'loss': 0.2561, 'grad_norm': 0.3850400745868683, 'learning_rate': 3.334897505531362e-05, 'epoch': 0.45}
45%|████▌ | 2041/4506 [2:19:40<2:52:25, 4.20s/it]
{'loss': 0.2669, 'grad_norm': 0.40773409605026245, 'learning_rate': 3.333071591596478e-05, 'epoch': 0.45}
45%|████▌ | 2042/4506 [2:19:45<2:56:56, 4.31s/it]
{'loss': 0.2579, 'grad_norm': 0.42485854029655457, 'learning_rate': 3.3312451776266776e-05, 'epoch': 0.45}
45%|████▌ | 2043/4506 [2:19:48<2:48:55, 4.12s/it]
{'loss': 0.2601, 'grad_norm': 0.4380251467227936, 'learning_rate': 3.329418264718229e-05, 'epoch': 0.45}
45%|████▌ | 2044/4506 [2:19:52<2:48:19, 4.10s/it]
{'loss': 0.2459, 'grad_norm': 0.37804675102233887, 'learning_rate': 3.327590853967702e-05, 'epoch': 0.45}
45%|████▌ | 2045/4506 [2:19:57<2:48:00, 4.10s/it]
{'loss': 0.2575, 'grad_norm': 0.37272387742996216, 'learning_rate': 3.325762946471964e-05, 'epoch': 0.45}
45%|████▌ | 2046/4506 [2:20:00<2:44:55, 4.02s/it]
{'loss': 0.2566, 'grad_norm': 0.3994593024253845, 'learning_rate': 3.3239345433281796e-05, 'epoch': 0.45}
45%|████▌ | 2047/4506 [2:20:05<2:49:20, 4.13s/it]
{'loss': 0.2614, 'grad_norm': 0.4362177848815918, 'learning_rate': 3.322105645633813e-05, 'epoch': 0.45}
45%|████▌ | 2048/4506 [2:20:09<2:48:49, 4.12s/it]
{'loss': 0.2573, 'grad_norm': 0.41346976161003113, 'learning_rate': 3.320276254486626e-05, 'epoch': 0.45}
45%|████▌ | 2049/4506 [2:20:13<2:45:54, 4.05s/it]
{'loss': 0.2703, 'grad_norm': 0.4314185082912445, 'learning_rate': 3.318446370984671e-05, 'epoch': 0.45}
45%|████▌ | 2050/4506 [2:20:17<2:45:01, 4.03s/it]
{'loss': 0.2562, 'grad_norm': 0.35202592611312866, 'learning_rate': 3.316615996226302e-05, 'epoch': 0.46}
46%|████▌ | 2051/4506 [2:20:21<2:46:45, 4.08s/it]
{'loss': 0.2834, 'grad_norm': 0.37307265400886536, 'learning_rate': 3.3147851313101664e-05, 'epoch': 0.46}
46%|████▌ | 2052/4506 [2:20:25<2:45:47, 4.05s/it]
{'loss': 0.2575, 'grad_norm': 0.4709285795688629, 'learning_rate': 3.3129537773352034e-05, 'epoch': 0.46}
46%|████▌ | 2053/4506 [2:20:29<2:48:33, 4.12s/it]
{'loss': 0.2521, 'grad_norm': 0.3939027190208435, 'learning_rate': 3.311121935400647e-05, 'epoch': 0.46}
46%|████▌ | 2054/4506 [2:20:33<2:45:57, 4.06s/it]
{'loss': 0.2548, 'grad_norm': 0.38840481638908386, 'learning_rate': 3.309289606606027e-05, 'epoch': 0.46}
46%|████▌ | 2055/4506 [2:20:37<2:41:53, 3.96s/it]
{'loss': 0.2441, 'grad_norm': 0.38471025228500366, 'learning_rate': 3.3074567920511605e-05, 'epoch': 0.46}
46%|████▌ | 2056/4506 [2:20:41<2:38:33, 3.88s/it]
{'loss': 0.2443, 'grad_norm': 0.3807336390018463, 'learning_rate': 3.30562349283616e-05, 'epoch': 0.46}
46%|████▌ | 2057/4506 [2:20:45<2:41:29, 3.96s/it]
{'loss': 0.2517, 'grad_norm': 0.41136205196380615, 'learning_rate': 3.303789710061426e-05, 'epoch': 0.46}
46%|████▌ | 2058/4506 [2:20:49<2:44:22, 4.03s/it]
{'loss': 0.243, 'grad_norm': 0.39967504143714905, 'learning_rate': 3.3019554448276526e-05, 'epoch': 0.46}
46%|████▌ | 2059/4506 [2:20:53<2:43:55, 4.02s/it]
{'loss': 0.2638, 'grad_norm': 0.41100844740867615, 'learning_rate': 3.30012069823582e-05, 'epoch': 0.46}
46%|████▌ | 2060/4506 [2:20:57<2:48:33, 4.13s/it]
{'loss': 0.2455, 'grad_norm': 0.4292071759700775, 'learning_rate': 3.2982854713871995e-05, 'epoch': 0.46}
46%|████▌ | 2061/4506 [2:21:01<2:47:11, 4.10s/it]
{'loss': 0.239, 'grad_norm': 0.507513165473938, 'learning_rate': 3.2964497653833505e-05, 'epoch': 0.46}
46%|████▌ | 2062/4506 [2:21:05<2:46:56, 4.10s/it]
{'loss': 0.2606, 'grad_norm': 0.3991277813911438, 'learning_rate': 3.294613581326118e-05, 'epoch': 0.46}
46%|████▌ | 2063/4506 [2:21:09<2:45:23, 4.06s/it]
{'loss': 0.2507, 'grad_norm': 0.40984395146369934, 'learning_rate': 3.292776920317638e-05, 'epoch': 0.46}
46%|████▌ | 2064/4506 [2:21:14<2:46:22, 4.09s/it]
{'loss': 0.2506, 'grad_norm': 0.44637227058410645, 'learning_rate': 3.2909397834603286e-05, 'epoch': 0.46}
46%|████▌ | 2065/4506 [2:21:18<2:45:51, 4.08s/it]
{'loss': 0.2341, 'grad_norm': 0.3716481029987335, 'learning_rate': 3.2891021718568954e-05, 'epoch': 0.46}
46%|████▌ | 2066/4506 [2:21:22<2:46:01, 4.08s/it]
{'loss': 0.2484, 'grad_norm': 0.40785038471221924, 'learning_rate': 3.2872640866103295e-05, 'epoch': 0.46}
46%|████▌ | 2067/4506 [2:21:26<2:53:05, 4.26s/it]
{'loss': 0.255, 'grad_norm': 0.3664761483669281, 'learning_rate': 3.2854255288239056e-05, 'epoch': 0.46}
46%|████▌ | 2068/4506 [2:21:30<2:50:11, 4.19s/it]
{'loss': 0.2542, 'grad_norm': 0.4231211245059967, 'learning_rate': 3.283586499601181e-05, 'epoch': 0.46}
46%|████▌ | 2069/4506 [2:21:35<2:51:40, 4.23s/it]
{'loss': 0.2582, 'grad_norm': 0.380021333694458, 'learning_rate': 3.281747000045997e-05, 'epoch': 0.46}
46%|████▌ | 2070/4506 [2:21:39<2:49:36, 4.18s/it]
{'loss': 0.2584, 'grad_norm': 0.6830818057060242, 'learning_rate': 3.279907031262479e-05, 'epoch': 0.46}
46%|████▌ | 2071/4506 [2:21:43<2:48:33, 4.15s/it]
{'loss': 0.2383, 'grad_norm': 0.36357930302619934, 'learning_rate': 3.2780665943550305e-05, 'epoch': 0.46}
46%|████▌ | 2072/4506 [2:21:47<2:44:28, 4.05s/it]
{'loss': 0.2576, 'grad_norm': 0.41865235567092896, 'learning_rate': 3.2762256904283385e-05, 'epoch': 0.46}
46%|████▌ | 2073/4506 [2:21:51<2:47:50, 4.14s/it]
{'loss': 0.256, 'grad_norm': 0.4387216866016388, 'learning_rate': 3.274384320587369e-05, 'epoch': 0.46}
46%|████▌ | 2074/4506 [2:21:55<2:46:55, 4.12s/it]
{'loss': 0.2622, 'grad_norm': 0.46080824732780457, 'learning_rate': 3.272542485937369e-05, 'epoch': 0.46}
46%|████▌ | 2075/4506 [2:21:59<2:45:19, 4.08s/it]
{'loss': 0.2629, 'grad_norm': 0.48594996333122253, 'learning_rate': 3.270700187583863e-05, 'epoch': 0.46}
46%|████▌ | 2076/4506 [2:22:03<2:46:27, 4.11s/it]
{'loss': 0.2574, 'grad_norm': 0.4150388836860657, 'learning_rate': 3.268857426632655e-05, 'epoch': 0.46}
46%|████▌ | 2077/4506 [2:22:07<2:44:09, 4.06s/it]
{'loss': 0.2656, 'grad_norm': 0.5267914533615112, 'learning_rate': 3.267014204189825e-05, 'epoch': 0.46}
46%|████▌ | 2077/4506 [2:22:07<2:44:09, 4.06s/it]
46%|████▌ | 2078/4506 [2:22:11<2:41:58, 4.00s/it]
{'loss': 0.2512, 'grad_norm': 0.3734939396381378, 'learning_rate': 3.265170521361734e-05, 'epoch': 0.46}
46%|████▌ | 2078/4506 [2:22:11<2:41:58, 4.00s/it]
46%|████▌ | 2079/4506 [2:22:15<2:43:30, 4.04s/it]
{'loss': 0.2574, 'grad_norm': 0.3736480474472046, 'learning_rate': 3.263326379255013e-05, 'epoch': 0.46}
46%|████▌ | 2079/4506 [2:22:15<2:43:30, 4.04s/it]
46%|████▌ | 2080/4506 [2:22:19<2:43:13, 4.04s/it]
{'loss': 0.2747, 'grad_norm': 0.4752572178840637, 'learning_rate': 3.2614817789765746e-05, 'epoch': 0.46}
46%|████▌ | 2080/4506 [2:22:19<2:43:13, 4.04s/it]
46%|████▌ | 2081/4506 [2:22:23<2:41:54, 4.01s/it]
{'loss': 0.2614, 'grad_norm': 0.4089973568916321, 'learning_rate': 3.2596367216336025e-05, 'epoch': 0.46}
46%|████▌ | 2081/4506 [2:22:23<2:41:54, 4.01s/it]
46%|████▌ | 2082/4506 [2:22:27<2:39:14, 3.94s/it]
{'loss': 0.2412, 'grad_norm': 0.36880427598953247, 'learning_rate': 3.257791208333558e-05, 'epoch': 0.46}
46%|████▌ | 2082/4506 [2:22:27<2:39:14, 3.94s/it]
46%|████▌ | 2083/4506 [2:22:31<2:41:47, 4.01s/it]
{'loss': 0.2594, 'grad_norm': 0.41687071323394775, 'learning_rate': 3.255945240184173e-05, 'epoch': 0.46}
46%|████▌ | 2083/4506 [2:22:31<2:41:47, 4.01s/it]
46%|████▌ | 2084/4506 [2:22:35<2:42:01, 4.01s/it]
{'loss': 0.2673, 'grad_norm': 0.4219221770763397, 'learning_rate': 3.254098818293454e-05, 'epoch': 0.46}
46%|████▌ | 2084/4506 [2:22:35<2:42:01, 4.01s/it]
46%|████▋ | 2085/4506 [2:22:39<2:41:10, 3.99s/it]
{'loss': 0.2592, 'grad_norm': 0.37493234872817993, 'learning_rate': 3.25225194376968e-05, 'epoch': 0.46}
46%|████▋ | 2085/4506 [2:22:39<2:41:10, 3.99s/it]
46%|████▋ | 2086/4506 [2:22:43<2:40:28, 3.98s/it]
{'loss': 0.2591, 'grad_norm': 0.35754862427711487, 'learning_rate': 3.250404617721401e-05, 'epoch': 0.46}
46%|████▋ | 2086/4506 [2:22:43<2:40:28, 3.98s/it]
46%|████▋ | 2087/4506 [2:22:47<2:43:11, 4.05s/it]
{'loss': 0.256, 'grad_norm': 0.3617333173751831, 'learning_rate': 3.248556841257438e-05, 'epoch': 0.46}
46%|████▋ | 2087/4506 [2:22:47<2:43:11, 4.05s/it]
46%|████▋ | 2088/4506 [2:22:51<2:44:36, 4.08s/it]
{'loss': 0.2364, 'grad_norm': 0.36003872752189636, 'learning_rate': 3.246708615486883e-05, 'epoch': 0.46}
46%|████▋ | 2088/4506 [2:22:51<2:44:36, 4.08s/it]
46%|████▋ | 2089/4506 [2:22:55<2:41:12, 4.00s/it]
{'loss': 0.2463, 'grad_norm': 0.38532623648643494, 'learning_rate': 3.244859941519097e-05, 'epoch': 0.46}
46%|████▋ | 2089/4506 [2:22:55<2:41:12, 4.00s/it]
46%|████▋ | 2090/4506 [2:22:59<2:40:33, 3.99s/it]
{'loss': 0.2389, 'grad_norm': 0.3454495668411255, 'learning_rate': 3.243010820463712e-05, 'epoch': 0.46}
46%|████▋ | 2090/4506 [2:22:59<2:40:33, 3.99s/it]
46%|████▋ | 2091/4506 [2:23:03<2:42:13, 4.03s/it]
{'loss': 0.2464, 'grad_norm': 0.38770192861557007, 'learning_rate': 3.241161253430624e-05, 'epoch': 0.46}
46%|████▋ | 2091/4506 [2:23:03<2:42:13, 4.03s/it]
46%|████▋ | 2092/4506 [2:23:08<2:46:13, 4.13s/it]
{'loss': 0.2482, 'grad_norm': 0.41136911511421204, 'learning_rate': 3.2393112415300016e-05, 'epoch': 0.46}
46%|████▋ | 2092/4506 [2:23:08<2:46:13, 4.13s/it]
46%|████▋ | 2093/4506 [2:23:12<2:46:17, 4.13s/it]
{'loss': 0.258, 'grad_norm': 0.4225878417491913, 'learning_rate': 3.2374607858722774e-05, 'epoch': 0.46}
46%|████▋ | 2093/4506 [2:23:12<2:46:17, 4.13s/it]
46%|████▋ | 2094/4506 [2:23:16<2:48:48, 4.20s/it]
{'loss': 0.2592, 'grad_norm': 0.4375823438167572, 'learning_rate': 3.2356098875681514e-05, 'epoch': 0.46}
46%|████▋ | 2094/4506 [2:23:16<2:48:48, 4.20s/it]
46%|████▋ | 2095/4506 [2:23:21<2:52:11, 4.29s/it]
{'loss': 0.2557, 'grad_norm': 0.39656561613082886, 'learning_rate': 3.233758547728588e-05, 'epoch': 0.47}
46%|████▋ | 2095/4506 [2:23:21<2:52:11, 4.29s/it]
47%|████▋ | 2096/4506 [2:23:25<2:49:36, 4.22s/it]
{'loss': 0.2537, 'grad_norm': 0.4049898684024811, 'learning_rate': 3.231906767464819e-05, 'epoch': 0.47}
47%|████▋ | 2096/4506 [2:23:25<2:49:36, 4.22s/it]
47%|████▋ | 2097/4506 [2:23:29<2:50:28, 4.25s/it]
{'loss': 0.2486, 'grad_norm': 0.3591095507144928, 'learning_rate': 3.2300545478883395e-05, 'epoch': 0.47}
47%|████▋ | 2097/4506 [2:23:29<2:50:28, 4.25s/it]
47%|████▋ | 2098/4506 [2:23:33<2:51:08, 4.26s/it]
{'loss': 0.2503, 'grad_norm': 0.39816781878471375, 'learning_rate': 3.2282018901109065e-05, 'epoch': 0.47}
47%|████▋ | 2098/4506 [2:23:33<2:51:08, 4.26s/it]
47%|████▋ | 2099/4506 [2:23:37<2:46:14, 4.14s/it]
{'loss': 0.2588, 'grad_norm': 0.4223647713661194, 'learning_rate': 3.226348795244544e-05, 'epoch': 0.47}
47%|████▋ | 2099/4506 [2:23:37<2:46:14, 4.14s/it]
47%|████▋ | 2100/4506 [2:23:41<2:45:48, 4.13s/it]
{'loss': 0.2429, 'grad_norm': 0.35304272174835205, 'learning_rate': 3.224495264401533e-05, 'epoch': 0.47}
47%|████▋ | 2100/4506 [2:23:41<2:45:48, 4.13s/it]
47%|████▋ | 2101/4506 [2:23:45<2:45:46, 4.14s/it]
{'loss': 0.2648, 'grad_norm': 0.45954257249832153, 'learning_rate': 3.22264129869442e-05, 'epoch': 0.47}
47%|████▋ | 2101/4506 [2:23:45<2:45:46, 4.14s/it]
47%|████▋ | 2102/4506 [2:23:49<2:43:36, 4.08s/it]
{'loss': 0.2547, 'grad_norm': 0.38920852541923523, 'learning_rate': 3.220786899236014e-05, 'epoch': 0.47}
47%|████▋ | 2102/4506 [2:23:49<2:43:36, 4.08s/it]
47%|████▋ | 2103/4506 [2:23:54<2:48:13, 4.20s/it]
{'loss': 0.2433, 'grad_norm': 0.4036330580711365, 'learning_rate': 3.218932067139378e-05, 'epoch': 0.47}
47%|████▋ | 2103/4506 [2:23:54<2:48:13, 4.20s/it]
47%|████▋ | 2104/4506 [2:23:58<2:41:41, 4.04s/it]
{'loss': 0.2463, 'grad_norm': 0.42686915397644043, 'learning_rate': 3.217076803517842e-05, 'epoch': 0.47}
47%|████▋ | 2104/4506 [2:23:58<2:41:41, 4.04s/it]
47%|████▋ | 2105/4506 [2:24:02<2:46:02, 4.15s/it]
{'loss': 0.2526, 'grad_norm': 0.41542351245880127, 'learning_rate': 3.2152211094849906e-05, 'epoch': 0.47}
47%|████▋ | 2105/4506 [2:24:02<2:46:02, 4.15s/it]
47%|████▋ | 2106/4506 [2:24:06<2:44:12, 4.11s/it]
{'loss': 0.2381, 'grad_norm': 0.3872690200805664, 'learning_rate': 3.2133649861546675e-05, 'epoch': 0.47}
47%|████▋ | 2106/4506 [2:24:06<2:44:12, 4.11s/it]
47%|████▋ | 2107/4506 [2:24:10<2:48:49, 4.22s/it]
{'loss': 0.2592, 'grad_norm': 0.42578473687171936, 'learning_rate': 3.211508434640974e-05, 'epoch': 0.47}
47%|████▋ | 2107/4506 [2:24:10<2:48:49, 4.22s/it]
47%|████▋ | 2108/4506 [2:24:15<2:46:50, 4.17s/it]
{'loss': 0.2429, 'grad_norm': 0.4262234568595886, 'learning_rate': 3.20965145605827e-05, 'epoch': 0.47}
47%|████▋ | 2108/4506 [2:24:15<2:46:50, 4.17s/it]
47%|████▋ | 2109/4506 [2:24:18<2:42:52, 4.08s/it]
{'loss': 0.2508, 'grad_norm': 0.37991732358932495, 'learning_rate': 3.2077940515211696e-05, 'epoch': 0.47}
47%|████▋ | 2109/4506 [2:24:18<2:42:52, 4.08s/it]
47%|████▋ | 2110/4506 [2:24:23<2:44:34, 4.12s/it]
{'loss': 0.2534, 'grad_norm': 0.4044076204299927, 'learning_rate': 3.2059362221445444e-05, 'epoch': 0.47}
47%|████▋ | 2110/4506 [2:24:23<2:44:34, 4.12s/it]
47%|████▋ | 2111/4506 [2:24:27<2:44:32, 4.12s/it]
{'loss': 0.2457, 'grad_norm': 0.4074120819568634, 'learning_rate': 3.204077969043519e-05, 'epoch': 0.47}
47%|████▋ | 2111/4506 [2:24:27<2:44:32, 4.12s/it]
47%|████▋ | 2112/4506 [2:24:31<2:43:16, 4.09s/it]
{'loss': 0.235, 'grad_norm': 0.33903637528419495, 'learning_rate': 3.202219293333474e-05, 'epoch': 0.47}
47%|████▋ | 2112/4506 [2:24:31<2:43:16, 4.09s/it]
47%|████▋ | 2113/4506 [2:24:35<2:43:31, 4.10s/it]
{'loss': 0.2527, 'grad_norm': 0.3661743402481079, 'learning_rate': 3.200360196130043e-05, 'epoch': 0.47}
47%|████▋ | 2113/4506 [2:24:35<2:43:31, 4.10s/it]
47%|████▋ | 2114/4506 [2:24:39<2:43:54, 4.11s/it]
{'loss': 0.2584, 'grad_norm': 0.41749560832977295, 'learning_rate': 3.1985006785491134e-05, 'epoch': 0.47}
47%|████▋ | 2114/4506 [2:24:39<2:43:54, 4.11s/it]
47%|████▋ | 2115/4506 [2:24:43<2:39:40, 4.01s/it]
{'loss': 0.2484, 'grad_norm': 0.4656373858451843, 'learning_rate': 3.1966407417068234e-05, 'epoch': 0.47}
47%|████▋ | 2115/4506 [2:24:43<2:39:40, 4.01s/it]
47%|████▋ | 2116/4506 [2:24:47<2:38:11, 3.97s/it]
{'loss': 0.2435, 'grad_norm': 0.36479654908180237, 'learning_rate': 3.194780386719564e-05, 'epoch': 0.47}
47%|████▋ | 2116/4506 [2:24:47<2:38:11, 3.97s/it]
47%|████▋ | 2117/4506 [2:24:51<2:38:40, 3.99s/it]
{'loss': 0.2537, 'grad_norm': 0.37985074520111084, 'learning_rate': 3.1929196147039764e-05, 'epoch': 0.47}
47%|████▋ | 2117/4506 [2:24:51<2:38:40, 3.99s/it]
47%|████▋ | 2118/4506 [2:24:55<2:40:27, 4.03s/it]
{'loss': 0.246, 'grad_norm': 0.3830971419811249, 'learning_rate': 3.1910584267769526e-05, 'epoch': 0.47}
47%|████▋ | 2118/4506 [2:24:55<2:40:27, 4.03s/it]
47%|████▋ | 2119/4506 [2:24:59<2:39:41, 4.01s/it]
{'loss': 0.2777, 'grad_norm': 0.46288779377937317, 'learning_rate': 3.189196824055635e-05, 'epoch': 0.47}
47%|████▋ | 2119/4506 [2:24:59<2:39:41, 4.01s/it]
47%|████▋ | 2120/4506 [2:25:03<2:43:11, 4.10s/it]
{'loss': 0.2587, 'grad_norm': 0.3840714395046234, 'learning_rate': 3.187334807657415e-05, 'epoch': 0.47}
47%|████▋ | 2120/4506 [2:25:03<2:43:11, 4.10s/it]
47%|████▋ | 2121/4506 [2:25:07<2:43:27, 4.11s/it]
{'loss': 0.2353, 'grad_norm': 0.3597859740257263, 'learning_rate': 3.185472378699929e-05, 'epoch': 0.47}
47%|████▋ | 2121/4506 [2:25:07<2:43:27, 4.11s/it]
47%|████▋ | 2122/4506 [2:25:11<2:43:32, 4.12s/it]
{'loss': 0.2495, 'grad_norm': 0.4119827449321747, 'learning_rate': 3.183609538301065e-05, 'epoch': 0.47}
47%|████▋ | 2122/4506 [2:25:11<2:43:32, 4.12s/it]
47%|████▋ | 2123/4506 [2:25:15<2:41:35, 4.07s/it]
{'loss': 0.2562, 'grad_norm': 0.3778342306613922, 'learning_rate': 3.181746287578957e-05, 'epoch': 0.47}
47%|████▋ | 2123/4506 [2:25:15<2:41:35, 4.07s/it]
47%|████▋ | 2124/4506 [2:25:19<2:40:41, 4.05s/it]
{'loss': 0.2647, 'grad_norm': 0.4246066212654114, 'learning_rate': 3.179882627651983e-05, 'epoch': 0.47}
47%|████▋ | 2124/4506 [2:25:19<2:40:41, 4.05s/it]
47%|████▋ | 2125/4506 [2:25:23<2:37:26, 3.97s/it]
{'loss': 0.2471, 'grad_norm': 0.41600003838539124, 'learning_rate': 3.178018559638771e-05, 'epoch': 0.47}
47%|████▋ | 2125/4506 [2:25:23<2:37:26, 3.97s/it]
47%|████▋ | 2126/4506 [2:25:27<2:37:24, 3.97s/it]
{'loss': 0.2575, 'grad_norm': 0.43981003761291504, 'learning_rate': 3.1761540846581885e-05, 'epoch': 0.47}
47%|████▋ | 2126/4506 [2:25:27<2:37:24, 3.97s/it]
47%|████▋ | 2127/4506 [2:25:31<2:35:40, 3.93s/it]
{'loss': 0.255, 'grad_norm': 0.41938716173171997, 'learning_rate': 3.174289203829352e-05, 'epoch': 0.47}
47%|████▋ | 2127/4506 [2:25:31<2:35:40, 3.93s/it]
47%|████▋ | 2128/4506 [2:25:35<2:39:17, 4.02s/it]
{'loss': 0.25, 'grad_norm': 0.431244432926178, 'learning_rate': 3.1724239182716184e-05, 'epoch': 0.47}
47%|████▋ | 2128/4506 [2:25:35<2:39:17, 4.02s/it]
47%|████▋ | 2129/4506 [2:25:39<2:43:16, 4.12s/it]
{'loss': 0.2561, 'grad_norm': 0.40489625930786133, 'learning_rate': 3.170558229104591e-05, 'epoch': 0.47}
47%|████▋ | 2129/4506 [2:25:40<2:43:16, 4.12s/it]
47%|████▋ | 2130/4506 [2:25:44<2:48:31, 4.26s/it]
{'loss': 0.2383, 'grad_norm': 0.38826027512550354, 'learning_rate': 3.168692137448111e-05, 'epoch': 0.47}
47%|████▋ | 2130/4506 [2:25:44<2:48:31, 4.26s/it]
47%|████▋ | 2131/4506 [2:25:48<2:50:23, 4.30s/it]
{'loss': 0.2583, 'grad_norm': 0.39746832847595215, 'learning_rate': 3.166825644422264e-05, 'epoch': 0.47}
47%|████▋ | 2131/4506 [2:25:48<2:50:23, 4.30s/it]
47%|████▋ | 2132/4506 [2:25:53<2:48:23, 4.26s/it]
{'loss': 0.2527, 'grad_norm': 0.44394275546073914, 'learning_rate': 3.1649587511473764e-05, 'epoch': 0.47}
47%|████▋ | 2132/4506 [2:25:53<2:48:23, 4.26s/it]
47%|████▋ | 2133/4506 [2:25:57<2:44:52, 4.17s/it]
{'loss': 0.2509, 'grad_norm': 0.4356195628643036, 'learning_rate': 3.163091458744014e-05, 'epoch': 0.47}
47%|████▋ | 2133/4506 [2:25:57<2:44:52, 4.17s/it]
47%|████▋ | 2134/4506 [2:26:01<2:42:45, 4.12s/it]
{'loss': 0.257, 'grad_norm': 0.42369577288627625, 'learning_rate': 3.161223768332983e-05, 'epoch': 0.47}
47%|████▋ | 2134/4506 [2:26:01<2:42:45, 4.12s/it]
47%|████▋ | 2135/4506 [2:26:05<2:42:08, 4.10s/it]
{'loss': 0.2376, 'grad_norm': 0.41797977685928345, 'learning_rate': 3.1593556810353264e-05, 'epoch': 0.47}
47%|████▋ | 2135/4506 [2:26:05<2:42:08, 4.10s/it]
47%|████▋ | 2136/4506 [2:26:09<2:40:08, 4.05s/it]
{'loss': 0.2569, 'grad_norm': 0.38939669728279114, 'learning_rate': 3.157487197972329e-05, 'epoch': 0.47}
47%|████▋ | 2136/4506 [2:26:09<2:40:08, 4.05s/it]
47%|████▋ | 2137/4506 [2:26:13<2:42:08, 4.11s/it]
{'loss': 0.2536, 'grad_norm': 0.3667083978652954, 'learning_rate': 3.155618320265511e-05, 'epoch': 0.47}
47%|████▋ | 2137/4506 [2:26:13<2:42:08, 4.11s/it]
47%|████▋ | 2138/4506 [2:26:17<2:37:46, 4.00s/it]
{'loss': 0.236, 'grad_norm': 0.49005603790283203, 'learning_rate': 3.153749049036627e-05, 'epoch': 0.47}
47%|████▋ | 2138/4506 [2:26:17<2:37:46, 4.00s/it]
47%|████▋ | 2139/4506 [2:26:20<2:35:40, 3.95s/it]
{'loss': 0.2543, 'grad_norm': 0.3728054463863373, 'learning_rate': 3.1518793854076746e-05, 'epoch': 0.47}
47%|████▋ | 2139/4506 [2:26:20<2:35:40, 3.95s/it]
47%|████▋ | 2140/4506 [2:26:24<2:35:30, 3.94s/it]
{'loss': 0.2537, 'grad_norm': 0.36343690752983093, 'learning_rate': 3.15000933050088e-05, 'epoch': 0.48}
47%|████▋ | 2140/4506 [2:26:24<2:35:30, 3.94s/it]
48%|████▊ | 2141/4506 [2:26:29<2:38:55, 4.03s/it]
{'loss': 0.2468, 'grad_norm': 0.41467049717903137, 'learning_rate': 3.148138885438707e-05, 'epoch': 0.48}
48%|████▊ | 2141/4506 [2:26:29<2:38:55, 4.03s/it]
48%|████▊ | 2142/4506 [2:26:32<2:37:36, 4.00s/it]
{'loss': 0.2601, 'grad_norm': 0.3886142373085022, 'learning_rate': 3.146268051343856e-05, 'epoch': 0.48}
48%|████▊ | 2142/4506 [2:26:33<2:37:36, 4.00s/it]
48%|████▊ | 2143/4506 [2:26:37<2:39:20, 4.05s/it]
{'loss': 0.2487, 'grad_norm': 0.36736559867858887, 'learning_rate': 3.144396829339257e-05, 'epoch': 0.48}
48%|████▊ | 2143/4506 [2:26:37<2:39:20, 4.05s/it]
48%|████▊ | 2144/4506 [2:26:40<2:36:41, 3.98s/it]
{'loss': 0.2472, 'grad_norm': 0.3995389938354492, 'learning_rate': 3.142525220548073e-05, 'epoch': 0.48}
48%|████▊ | 2144/4506 [2:26:40<2:36:41, 3.98s/it]
48%|████▊ | 2145/4506 [2:26:44<2:35:29, 3.95s/it]
{'loss': 0.2501, 'grad_norm': 0.42196670174598694, 'learning_rate': 3.140653226093702e-05, 'epoch': 0.48}
48%|████▊ | 2145/4506 [2:26:44<2:35:29, 3.95s/it]
48%|████▊ | 2146/4506 [2:26:49<2:43:09, 4.15s/it]
{'loss': 0.2553, 'grad_norm': 0.3880380094051361, 'learning_rate': 3.1387808470997725e-05, 'epoch': 0.48}
48%|████▊ | 2146/4506 [2:26:49<2:43:09, 4.15s/it]
48%|████▊ | 2147/4506 [2:26:53<2:41:30, 4.11s/it]
{'loss': 0.2639, 'grad_norm': 0.4654938876628876, 'learning_rate': 3.136908084690142e-05, 'epoch': 0.48}
48%|████▊ | 2147/4506 [2:26:53<2:41:30, 4.11s/it]
48%|████▊ | 2148/4506 [2:26:57<2:41:08, 4.10s/it]
{'loss': 0.249, 'grad_norm': 0.388139009475708, 'learning_rate': 3.135034939988901e-05, 'epoch': 0.48}
48%|████▊ | 2148/4506 [2:26:57<2:41:08, 4.10s/it]
48%|████▊ | 2149/4506 [2:27:01<2:39:44, 4.07s/it]
{'loss': 0.2529, 'grad_norm': 0.46134695410728455, 'learning_rate': 3.1331614141203674e-05, 'epoch': 0.48}
48%|████▊ | 2149/4506 [2:27:01<2:39:44, 4.07s/it]
48%|████▊ | 2150/4506 [2:27:05<2:40:03, 4.08s/it]
{'loss': 0.2571, 'grad_norm': 0.39415082335472107, 'learning_rate': 3.131287508209088e-05, 'epoch': 0.48}
48%|████▊ | 2150/4506 [2:27:05<2:40:03, 4.08s/it]
48%|████▊ | 2151/4506 [2:27:09<2:40:38, 4.09s/it]
{'loss': 0.2413, 'grad_norm': 0.43433064222335815, 'learning_rate': 3.1294132233798376e-05, 'epoch': 0.48}
48%|████▊ | 2151/4506 [2:27:09<2:40:38, 4.09s/it]
48%|████▊ | 2152/4506 [2:27:14<2:43:22, 4.16s/it]
{'loss': 0.2524, 'grad_norm': 0.419862300157547, 'learning_rate': 3.12753856075762e-05, 'epoch': 0.48}
48%|████▊ | 2152/4506 [2:27:14<2:43:22, 4.16s/it]
48%|████▊ | 2153/4506 [2:27:18<2:42:30, 4.14s/it]
{'loss': 0.2458, 'grad_norm': 0.35035932064056396, 'learning_rate': 3.1256635214676656e-05, 'epoch': 0.48}
48%|████▊ | 2153/4506 [2:27:18<2:42:30, 4.14s/it]
48%|████▊ | 2154/4506 [2:27:22<2:47:04, 4.26s/it]
{'loss': 0.2505, 'grad_norm': 0.39417582750320435, 'learning_rate': 3.123788106635427e-05, 'epoch': 0.48}
48%|████▊ | 2154/4506 [2:27:22<2:47:04, 4.26s/it]
48%|████▊ | 2155/4506 [2:27:26<2:42:39, 4.15s/it]
{'loss': 0.2498, 'grad_norm': 0.40317466855049133, 'learning_rate': 3.121912317386589e-05, 'epoch': 0.48}
48%|████▊ | 2155/4506 [2:27:26<2:42:39, 4.15s/it]
48%|████▊ | 2156/4506 [2:27:30<2:39:12, 4.06s/it]
{'loss': 0.2378, 'grad_norm': 0.3693580627441406, 'learning_rate': 3.1200361548470544e-05, 'epoch': 0.48}
48%|████▊ | 2156/4506 [2:27:30<2:39:12, 4.06s/it]
48%|████▊ | 2157/4506 [2:27:34<2:37:27, 4.02s/it]
{'loss': 0.2426, 'grad_norm': 0.378089964389801, 'learning_rate': 3.118159620142954e-05, 'epoch': 0.48}
48%|████▊ | 2157/4506 [2:27:34<2:37:27, 4.02s/it]
48%|████▊ | 2158/4506 [2:27:38<2:34:35, 3.95s/it]
{'loss': 0.2619, 'grad_norm': 0.41399720311164856, 'learning_rate': 3.116282714400642e-05, 'epoch': 0.48}
48%|████▊ | 2158/4506 [2:27:38<2:34:35, 3.95s/it]
48%|████▊ | 2159/4506 [2:27:42<2:35:45, 3.98s/it]
{'loss': 0.2525, 'grad_norm': 0.4617007374763489, 'learning_rate': 3.1144054387466934e-05, 'epoch': 0.48}
48%|████▊ | 2159/4506 [2:27:42<2:35:45, 3.98s/it]
48%|████▊ | 2160/4506 [2:27:46<2:35:54, 3.99s/it]
{'loss': 0.2517, 'grad_norm': 0.3866349458694458, 'learning_rate': 3.1125277943079066e-05, 'epoch': 0.48}
48%|████▊ | 2160/4506 [2:27:46<2:35:54, 3.99s/it]
48%|████▊ | 2161/4506 [2:27:50<2:36:37, 4.01s/it]
{'loss': 0.2482, 'grad_norm': 0.39325833320617676, 'learning_rate': 3.1106497822113004e-05, 'epoch': 0.48}
48%|████▊ | 2161/4506 [2:27:50<2:36:37, 4.01s/it]
48%|████▊ | 2162/4506 [2:27:54<2:37:30, 4.03s/it]
{'loss': 0.2533, 'grad_norm': 0.4436732828617096, 'learning_rate': 3.108771403584115e-05, 'epoch': 0.48}
48%|████▊ | 2162/4506 [2:27:54<2:37:30, 4.03s/it]
48%|████▊ | 2163/4506 [2:27:58<2:35:57, 3.99s/it]
{'loss': 0.2422, 'grad_norm': 0.4079633951187134, 'learning_rate': 3.106892659553809e-05, 'epoch': 0.48}
48%|████▊ | 2163/4506 [2:27:58<2:35:57, 3.99s/it]
48%|████▊ | 2164/4506 [2:28:02<2:35:00, 3.97s/it]
{'loss': 0.2448, 'grad_norm': 0.38326960802078247, 'learning_rate': 3.1050135512480654e-05, 'epoch': 0.48}
48%|████▊ | 2164/4506 [2:28:02<2:35:00, 3.97s/it]
48%|████▊ | 2165/4506 [2:28:06<2:32:51, 3.92s/it]
{'loss': 0.2459, 'grad_norm': 0.380143404006958, 'learning_rate': 3.1031340797947786e-05, 'epoch': 0.48}
48%|████▊ | 2165/4506 [2:28:06<2:32:51, 3.92s/it]
48%|████▊ | 2166/4506 [2:28:09<2:33:23, 3.93s/it]
{'loss': 0.2458, 'grad_norm': 0.44335946440696716, 'learning_rate': 3.1012542463220665e-05, 'epoch': 0.48}
48%|████▊ | 2166/4506 [2:28:10<2:33:23, 3.93s/it]
48%|████▊ | 2167/4506 [2:28:14<2:35:00, 3.98s/it]
{'loss': 0.2542, 'grad_norm': 0.48410114645957947, 'learning_rate': 3.099374051958262e-05, 'epoch': 0.48}
48%|████▊ | 2167/4506 [2:28:14<2:35:00, 3.98s/it]
48%|████▊ | 2168/4506 [2:28:18<2:35:47, 4.00s/it]
{'loss': 0.2584, 'grad_norm': 0.414028137922287, 'learning_rate': 3.097493497831914e-05, 'epoch': 0.48}
48%|████▊ | 2168/4506 [2:28:18<2:35:47, 4.00s/it]
48%|████▊ | 2169/4506 [2:28:22<2:35:54, 4.00s/it]
{'loss': 0.2606, 'grad_norm': 0.457526832818985, 'learning_rate': 3.095612585071789e-05, 'epoch': 0.48}
48%|████▊ | 2169/4506 [2:28:22<2:35:54, 4.00s/it]
48%|████▊ | 2170/4506 [2:28:26<2:35:41, 4.00s/it]
{'loss': 0.2418, 'grad_norm': 0.37319040298461914, 'learning_rate': 3.093731314806868e-05, 'epoch': 0.48}
48%|████▊ | 2170/4506 [2:28:26<2:35:41, 4.00s/it]
48%|████▊ | 2171/4506 [2:28:30<2:36:45, 4.03s/it]
{'loss': 0.2561, 'grad_norm': 0.3584696054458618, 'learning_rate': 3.091849688166347e-05, 'epoch': 0.48}
48%|████▊ | 2171/4506 [2:28:30<2:36:45, 4.03s/it]
48%|████▊ | 2172/4506 [2:28:34<2:37:01, 4.04s/it]
{'loss': 0.2566, 'grad_norm': 0.4011690318584442, 'learning_rate': 3.0899677062796355e-05, 'epoch': 0.48}
48%|████▊ | 2172/4506 [2:28:34<2:37:01, 4.04s/it]
48%|████▊ | 2173/4506 [2:28:38<2:35:37, 4.00s/it]
{'loss': 0.2501, 'grad_norm': 0.37106576561927795, 'learning_rate': 3.0880853702763554e-05, 'epoch': 0.48}
48%|████▊ | 2173/4506 [2:28:38<2:35:37, 4.00s/it]
48%|████▊ | 2174/4506 [2:28:42<2:37:19, 4.05s/it]
{'loss': 0.2528, 'grad_norm': 0.39304226636886597, 'learning_rate': 3.0862026812863436e-05, 'epoch': 0.48}
48%|████▊ | 2174/4506 [2:28:42<2:37:19, 4.05s/it]
48%|████▊ | 2175/4506 [2:28:46<2:36:52, 4.04s/it]
{'loss': 0.2486, 'grad_norm': 0.3504863381385803, 'learning_rate': 3.0843196404396465e-05, 'epoch': 0.48}
48%|████▊ | 2175/4506 [2:28:46<2:36:52, 4.04s/it]
48%|████▊ | 2176/4506 [2:28:50<2:35:16, 4.00s/it]
{'loss': 0.2489, 'grad_norm': 0.41466450691223145, 'learning_rate': 3.082436248866521e-05, 'epoch': 0.48}
48%|████▊ | 2176/4506 [2:28:50<2:35:16, 4.00s/it]
48%|████▊ | 2177/4506 [2:28:54<2:36:09, 4.02s/it]
{'loss': 0.2448, 'grad_norm': 0.38814330101013184, 'learning_rate': 3.0805525076974394e-05, 'epoch': 0.48}
48%|████▊ | 2177/4506 [2:28:54<2:36:09, 4.02s/it]
48%|████▊ | 2178/4506 [2:28:58<2:38:26, 4.08s/it]
{'loss': 0.2492, 'grad_norm': 0.3730236291885376, 'learning_rate': 3.078668418063078e-05, 'epoch': 0.48}
48%|████▊ | 2178/4506 [2:28:58<2:38:26, 4.08s/it]
48%|████▊ | 2179/4506 [2:29:02<2:40:29, 4.14s/it]
{'loss': 0.2554, 'grad_norm': 0.3719511926174164, 'learning_rate': 3.0767839810943246e-05, 'epoch': 0.48}
48%|████▊ | 2179/4506 [2:29:02<2:40:29, 4.14s/it]
48%|████▊ | 2180/4506 [2:29:06<2:36:41, 4.04s/it]
{'loss': 0.2532, 'grad_norm': 0.3820197582244873, 'learning_rate': 3.074899197922277e-05, 'epoch': 0.48}
48%|████▊ | 2180/4506 [2:29:06<2:36:41, 4.04s/it]
48%|████▊ | 2181/4506 [2:29:10<2:39:06, 4.11s/it]
{'loss': 0.2583, 'grad_norm': 0.41058826446533203, 'learning_rate': 3.073014069678238e-05, 'epoch': 0.48}
48%|████▊ | 2181/4506 [2:29:10<2:39:06, 4.11s/it]
48%|████▊ | 2182/4506 [2:29:14<2:35:55, 4.03s/it]
{'loss': 0.2284, 'grad_norm': 0.42215147614479065, 'learning_rate': 3.0711285974937197e-05, 'epoch': 0.48}
48%|████▊ | 2182/4506 [2:29:14<2:35:55, 4.03s/it]
48%|████▊ | 2183/4506 [2:29:18<2:35:31, 4.02s/it]
{'loss': 0.2498, 'grad_norm': 0.39704084396362305, 'learning_rate': 3.06924278250044e-05, 'epoch': 0.48}
48%|████▊ | 2183/4506 [2:29:18<2:35:31, 4.02s/it]
48%|████▊ | 2184/4506 [2:29:22<2:35:25, 4.02s/it]
{'loss': 0.2384, 'grad_norm': 0.3572736084461212, 'learning_rate': 3.067356625830322e-05, 'epoch': 0.48}
48%|████▊ | 2184/4506 [2:29:22<2:35:25, 4.02s/it]
48%|████▊ | 2185/4506 [2:29:26<2:37:33, 4.07s/it]
{'loss': 0.244, 'grad_norm': 0.3862534165382385, 'learning_rate': 3.065470128615495e-05, 'epoch': 0.48}
48%|████▊ | 2185/4506 [2:29:26<2:37:33, 4.07s/it]
49%|████▊ | 2186/4506 [2:29:31<2:46:02, 4.29s/it]
{'loss': 0.2547, 'grad_norm': 0.3762350380420685, 'learning_rate': 3.0635832919882914e-05, 'epoch': 0.49}
49%|████▊ | 2186/4506 [2:29:31<2:46:02, 4.29s/it]
49%|████▊ | 2187/4506 [2:29:35<2:42:16, 4.20s/it]
{'loss': 0.245, 'grad_norm': 0.4026622474193573, 'learning_rate': 3.061696117081248e-05, 'epoch': 0.49}
49%|████▊ | 2187/4506 [2:29:35<2:42:16, 4.20s/it]
49%|████▊ | 2188/4506 [2:29:40<2:43:27, 4.23s/it]
{'loss': 0.2484, 'grad_norm': 0.3962830603122711, 'learning_rate': 3.059808605027105e-05, 'epoch': 0.49}
49%|████▊ | 2188/4506 [2:29:40<2:43:27, 4.23s/it]
49%|████▊ | 2189/4506 [2:29:44<2:45:24, 4.28s/it]
{'loss': 0.2512, 'grad_norm': 0.3809361755847931, 'learning_rate': 3.0579207569588037e-05, 'epoch': 0.49}
49%|████▊ | 2189/4506 [2:29:44<2:45:24, 4.28s/it]
49%|████▊ | 2190/4506 [2:29:48<2:38:53, 4.12s/it]
{'loss': 0.2634, 'grad_norm': 0.4752698540687561, 'learning_rate': 3.056032574009488e-05, 'epoch': 0.49}
49%|████▊ | 2190/4506 [2:29:48<2:38:53, 4.12s/it]
49%|████▊ | 2191/4506 [2:29:52<2:35:49, 4.04s/it]
{'loss': 0.2506, 'grad_norm': 0.34913232922554016, 'learning_rate': 3.054144057312505e-05, 'epoch': 0.49}
49%|████▊ | 2191/4506 [2:29:52<2:35:49, 4.04s/it]
49%|████▊ | 2192/4506 [2:29:55<2:34:36, 4.01s/it]
{'loss': 0.2453, 'grad_norm': 0.41559743881225586, 'learning_rate': 3.052255208001397e-05, 'epoch': 0.49}
49%|████▊ | 2192/4506 [2:29:56<2:34:36, 4.01s/it]
49%|████▊ | 2193/4506 [2:30:00<2:34:57, 4.02s/it]
{'loss': 0.2466, 'grad_norm': 0.36383679509162903, 'learning_rate': 3.050366027209911e-05, 'epoch': 0.49}
49%|████▊ | 2193/4506 [2:30:00<2:34:57, 4.02s/it]
49%|████▊ | 2194/4506 [2:30:03<2:33:48, 3.99s/it]
{'loss': 0.2518, 'grad_norm': 0.40172746777534485, 'learning_rate': 3.048476516071989e-05, 'epoch': 0.49}
49%|████▊ | 2194/4506 [2:30:03<2:33:48, 3.99s/it]
49%|████▊ | 2195/4506 [2:30:07<2:33:03, 3.97s/it]
{'loss': 0.2406, 'grad_norm': 0.4141163229942322, 'learning_rate': 3.0465866757217744e-05, 'epoch': 0.49}
49%|████▊ | 2195/4506 [2:30:07<2:33:03, 3.97s/it]
49%|████▊ | 2196/4506 [2:30:12<2:35:25, 4.04s/it]
{'loss': 0.2417, 'grad_norm': 0.3682010769844055, 'learning_rate': 3.044696507293606e-05, 'epoch': 0.49}
49%|████▊ | 2196/4506 [2:30:12<2:35:25, 4.04s/it]
49%|████▉ | 2197/4506 [2:30:15<2:33:17, 3.98s/it]
{'loss': 0.2472, 'grad_norm': 0.36639028787612915, 'learning_rate': 3.0428060119220207e-05, 'epoch': 0.49}
49%|████▉ | 2197/4506 [2:30:15<2:33:17, 3.98s/it]
49%|████▉ | 2198/4506 [2:30:20<2:35:16, 4.04s/it]
{'loss': 0.2472, 'grad_norm': 0.43716520071029663, 'learning_rate': 3.0409151907417516e-05, 'epoch': 0.49}
49%|████▉ | 2198/4506 [2:30:20<2:35:16, 4.04s/it]
49%|████▉ | 2199/4506 [2:30:24<2:37:48, 4.10s/it]
{'loss': 0.2535, 'grad_norm': 0.40827375650405884, 'learning_rate': 3.0390240448877265e-05, 'epoch': 0.49}
49%|████▉ | 2199/4506 [2:30:24<2:37:48, 4.10s/it]
49%|████▉ | 2200/4506 [2:30:28<2:39:46, 4.16s/it]
{'loss': 0.244, 'grad_norm': 0.4177387058734894, 'learning_rate': 3.0371325754950698e-05, 'epoch': 0.49}
49%|████▉ | 2200/4506 [2:30:28<2:39:46, 4.16s/it]
49%|████▉ | 2201/4506 [2:30:32<2:38:48, 4.13s/it]
{'loss': 0.2467, 'grad_norm': 0.4242149889469147, 'learning_rate': 3.0352407836990972e-05, 'epoch': 0.49}
49%|████▉ | 2201/4506 [2:30:32<2:38:48, 4.13s/it]
49%|████▉ | 2202/4506 [2:30:36<2:36:31, 4.08s/it]
{'loss': 0.2588, 'grad_norm': 0.46424615383148193, 'learning_rate': 3.0333486706353214e-05, 'epoch': 0.49}
49%|████▉ | 2202/4506 [2:30:36<2:36:31, 4.08s/it]
49%|████▉ | 2203/4506 [2:30:41<2:40:28, 4.18s/it]
{'loss': 0.2493, 'grad_norm': 0.41435444355010986, 'learning_rate': 3.031456237439446e-05, 'epoch': 0.49}
49%|████▉ | 2203/4506 [2:30:41<2:40:28, 4.18s/it]
49%|████▉ | 2204/4506 [2:30:45<2:39:55, 4.17s/it]
{'loss': 0.2473, 'grad_norm': 0.371319979429245, 'learning_rate': 3.0295634852473658e-05, 'epoch': 0.49}
49%|████▉ | 2204/4506 [2:30:45<2:39:55, 4.17s/it]
49%|████▉ | 2205/4506 [2:30:49<2:38:09, 4.12s/it]
{'loss': 0.2457, 'grad_norm': 0.39288145303726196, 'learning_rate': 3.0276704151951695e-05, 'epoch': 0.49}
49%|████▉ | 2205/4506 [2:30:49<2:38:09, 4.12s/it]
49%|████▉ | 2206/4506 [2:30:53<2:39:55, 4.17s/it]
{'loss': 0.2505, 'grad_norm': 0.40301552414894104, 'learning_rate': 3.025777028419135e-05, 'epoch': 0.49}
49%|████▉ | 2206/4506 [2:30:53<2:39:55, 4.17s/it]
49%|████▉ | 2207/4506 [2:30:57<2:38:26, 4.13s/it]
{'loss': 0.2522, 'grad_norm': 0.4476172924041748, 'learning_rate': 3.02388332605573e-05, 'epoch': 0.49}
49%|████▉ | 2207/4506 [2:30:57<2:38:26, 4.13s/it]
49%|████▉ | 2208/4506 [2:31:01<2:37:52, 4.12s/it]
{'loss': 0.2451, 'grad_norm': 0.45383769273757935, 'learning_rate': 3.021989309241613e-05, 'epoch': 0.49}
49%|████▉ | 2208/4506 [2:31:01<2:37:52, 4.12s/it]
49%|████▉ | 2209/4506 [2:31:05<2:36:34, 4.09s/it]
{'loss': 0.2536, 'grad_norm': 0.41694387793540955, 'learning_rate': 3.0200949791136306e-05, 'epoch': 0.49}
49%|████▉ | 2209/4506 [2:31:05<2:36:34, 4.09s/it]
49%|████▉ | 2210/4506 [2:31:09<2:33:31, 4.01s/it]
{'loss': 0.2571, 'grad_norm': 0.42766883969306946, 'learning_rate': 3.0182003368088167e-05, 'epoch': 0.49}
49%|████▉ | 2210/4506 [2:31:09<2:33:31, 4.01s/it]
49%|████▉ | 2211/4506 [2:31:13<2:35:59, 4.08s/it]
{'loss': 0.255, 'grad_norm': 0.3705843389034271, 'learning_rate': 3.0163053834643946e-05, 'epoch': 0.49}
49%|████▉ | 2211/4506 [2:31:13<2:35:59, 4.08s/it]
49%|████▉ | 2212/4506 [2:31:17<2:36:25, 4.09s/it]
{'loss': 0.2418, 'grad_norm': 0.3514150083065033, 'learning_rate': 3.014410120217771e-05, 'epoch': 0.49}
49%|████▉ | 2212/4506 [2:31:17<2:36:25, 4.09s/it]
49%|████▉ | 2213/4506 [2:31:21<2:34:14, 4.04s/it]
{'loss': 0.2552, 'grad_norm': 0.47182491421699524, 'learning_rate': 3.012514548206542e-05, 'epoch': 0.49}
49%|████▉ | 2213/4506 [2:31:21<2:34:14, 4.04s/it]
49%|████▉ | 2214/4506 [2:31:25<2:34:29, 4.04s/it]
{'loss': 0.2575, 'grad_norm': 0.42265358567237854, 'learning_rate': 3.0106186685684868e-05, 'epoch': 0.49}
49%|████▉ | 2214/4506 [2:31:25<2:34:29, 4.04s/it]
49%|████▉ | 2215/4506 [2:31:29<2:34:21, 4.04s/it]
{'loss': 0.2414, 'grad_norm': 0.39296501874923706, 'learning_rate': 3.0087224824415712e-05, 'epoch': 0.49}
49%|████▉ | 2215/4506 [2:31:29<2:34:21, 4.04s/it]
49%|████▉ | 2216/4506 [2:31:34<2:41:59, 4.24s/it]
{'loss': 0.2495, 'grad_norm': 0.36326345801353455, 'learning_rate': 3.0068259909639425e-05, 'epoch': 0.49}
49%|████▉ | 2216/4506 [2:31:34<2:41:59, 4.24s/it]
49%|████▉ | 2217/4506 [2:31:38<2:37:06, 4.12s/it]
{'loss': 0.2416, 'grad_norm': 0.44379547238349915, 'learning_rate': 3.0049291952739328e-05, 'epoch': 0.49}
49%|████▉ | 2217/4506 [2:31:38<2:37:06, 4.12s/it]
49%|████▉ | 2218/4506 [2:31:42<2:31:52, 3.98s/it]
{'loss': 0.2447, 'grad_norm': 0.4015536904335022, 'learning_rate': 3.0030320965100572e-05, 'epoch': 0.49}
49%|████▉ | 2218/4506 [2:31:42<2:31:52, 3.98s/it]
49%|████▉ | 2219/4506 [2:31:46<2:32:46, 4.01s/it]
{'loss': 0.2472, 'grad_norm': 0.41901251673698425, 'learning_rate': 3.001134695811012e-05, 'epoch': 0.49}
49%|████▉ | 2219/4506 [2:31:46<2:32:46, 4.01s/it]
49%|████▉ | 2220/4506 [2:31:49<2:30:03, 3.94s/it]
{'loss': 0.2424, 'grad_norm': 0.4139098525047302, 'learning_rate': 2.9992369943156746e-05, 'epoch': 0.49}
49%|████▉ | 2220/4506 [2:31:49<2:30:03, 3.94s/it]
49%|████▉ | 2221/4506 [2:31:53<2:28:07, 3.89s/it]
{'loss': 0.2485, 'grad_norm': 0.41024264693260193, 'learning_rate': 2.997338993163103e-05, 'epoch': 0.49}
49%|████▉ | 2221/4506 [2:31:53<2:28:07, 3.89s/it]
49%|████▉ | 2222/4506 [2:31:57<2:28:38, 3.90s/it]
{'loss': 0.2422, 'grad_norm': 0.40633806586265564, 'learning_rate': 2.9954406934925354e-05, 'epoch': 0.49}
49%|████▉ | 2222/4506 [2:31:57<2:28:38, 3.90s/it]
49%|████▉ | 2223/4506 [2:32:01<2:28:21, 3.90s/it]
{'loss': 0.2477, 'grad_norm': 0.42935875058174133, 'learning_rate': 2.9935420964433892e-05, 'epoch': 0.49}
49%|████▉ | 2223/4506 [2:32:01<2:28:21, 3.90s/it]
49%|████▉ | 2224/4506 [2:32:05<2:29:42, 3.94s/it]
{'loss': 0.24, 'grad_norm': 0.40537136793136597, 'learning_rate': 2.9916432031552605e-05, 'epoch': 0.49}
49%|████▉ | 2224/4506 [2:32:05<2:29:42, 3.94s/it]
49%|████▉ | 2225/4506 [2:32:09<2:33:10, 4.03s/it]
{'loss': 0.2474, 'grad_norm': 0.4462307393550873, 'learning_rate': 2.9897440147679217e-05, 'epoch': 0.49}
49%|████▉ | 2225/4506 [2:32:09<2:33:10, 4.03s/it]
49%|████▉ | 2226/4506 [2:32:13<2:32:32, 4.01s/it]
{'loss': 0.2453, 'grad_norm': 0.3820648193359375, 'learning_rate': 2.987844532421324e-05, 'epoch': 0.49}
49%|████▉ | 2226/4506 [2:32:13<2:32:32, 4.01s/it]
49%|████▉ | 2227/4506 [2:32:17<2:33:00, 4.03s/it]
{'loss': 0.2585, 'grad_norm': 0.4307577908039093, 'learning_rate': 2.985944757255595e-05, 'epoch': 0.49}
49%|████▉ | 2227/4506 [2:32:17<2:33:00, 4.03s/it]
49%|████▉ | 2228/4506 [2:32:21<2:32:04, 4.01s/it]
{'loss': 0.2488, 'grad_norm': 0.4196356534957886, 'learning_rate': 2.9840446904110377e-05, 'epoch': 0.49}
49%|████▉ | 2228/4506 [2:32:21<2:32:04, 4.01s/it]
49%|████▉ | 2229/4506 [2:32:25<2:33:48, 4.05s/it]
{'loss': 0.2548, 'grad_norm': 0.4064522683620453, 'learning_rate': 2.9821443330281283e-05, 'epoch': 0.49}
49%|████▉ | 2229/4506 [2:32:25<2:33:48, 4.05s/it]
49%|████▉ | 2230/4506 [2:32:29<2:33:11, 4.04s/it]
{'loss': 0.2416, 'grad_norm': 0.39461150765419006, 'learning_rate': 2.9802436862475208e-05, 'epoch': 0.49}
49%|████▉ | 2230/4506 [2:32:29<2:33:11, 4.04s/it]
50%|████▉ | 2231/4506 [2:32:34<2:35:33, 4.10s/it]
{'loss': 0.2541, 'grad_norm': 0.42251837253570557, 'learning_rate': 2.978342751210041e-05, 'epoch': 0.5}
50%|████▉ | 2231/4506 [2:32:34<2:35:33, 4.10s/it]
50%|████▉ | 2232/4506 [2:32:38<2:35:58, 4.12s/it]
{'loss': 0.2489, 'grad_norm': 0.5170906186103821, 'learning_rate': 2.9764415290566867e-05, 'epoch': 0.5}
50%|████▉ | 2232/4506 [2:32:38<2:35:58, 4.12s/it]
50%|████▉ | 2233/4506 [2:32:42<2:39:51, 4.22s/it]
{'loss': 0.2698, 'grad_norm': 0.4759027063846588, 'learning_rate': 2.974540020928631e-05, 'epoch': 0.5}
50%|████▉ | 2233/4506 [2:32:42<2:39:51, 4.22s/it]
50%|████▉ | 2234/4506 [2:32:46<2:39:01, 4.20s/it]
{'loss': 0.2373, 'grad_norm': 0.3708154857158661, 'learning_rate': 2.972638227967216e-05, 'epoch': 0.5}
50%|████▉ | 2234/4506 [2:32:46<2:39:01, 4.20s/it]
50%|████▉ | 2235/4506 [2:32:51<2:38:17, 4.18s/it]
{'loss': 0.2426, 'grad_norm': 0.3823888599872589, 'learning_rate': 2.9707361513139566e-05, 'epoch': 0.5}
50%|████▉ | 2235/4506 [2:32:51<2:38:17, 4.18s/it]
50%|████▉ | 2236/4506 [2:32:55<2:36:17, 4.13s/it]
{'loss': 0.237, 'grad_norm': 0.34161633253097534, 'learning_rate': 2.968833792110535e-05, 'epoch': 0.5}
50%|████▉ | 2236/4506 [2:32:55<2:36:17, 4.13s/it]
50%|████▉ | 2237/4506 [2:32:59<2:37:39, 4.17s/it]
{'loss': 0.2717, 'grad_norm': 0.4650087356567383, 'learning_rate': 2.9669311514988074e-05, 'epoch': 0.5}
50%|████▉ | 2237/4506 [2:32:59<2:37:39, 4.17s/it]
50%|████▉ | 2238/4506 [2:33:03<2:39:28, 4.22s/it]
{'loss': 0.2373, 'grad_norm': 0.3542432188987732, 'learning_rate': 2.9650282306207954e-05, 'epoch': 0.5}
50%|████▉ | 2238/4506 [2:33:03<2:39:28, 4.22s/it]
50%|████▉ | 2239/4506 [2:33:07<2:38:27, 4.19s/it]
{'loss': 0.2562, 'grad_norm': 0.48596253991127014, 'learning_rate': 2.9631250306186898e-05, 'epoch': 0.5}
50%|████▉ | 2239/4506 [2:33:07<2:38:27, 4.19s/it]
50%|████▉ | 2240/4506 [2:33:12<2:38:38, 4.20s/it]
{'loss': 0.2568, 'grad_norm': 0.4225861728191376, 'learning_rate': 2.96122155263485e-05, 'epoch': 0.5}
50%|████▉ | 2240/4506 [2:33:12<2:38:38, 4.20s/it]
50%|████▉ | 2241/4506 [2:33:16<2:42:05, 4.29s/it]
{'loss': 0.2504, 'grad_norm': 0.37981313467025757, 'learning_rate': 2.9593177978118015e-05, 'epoch': 0.5}
50%|████▉ | 2241/4506 [2:33:16<2:42:05, 4.29s/it]
50%|████▉ | 2242/4506 [2:33:20<2:41:08, 4.27s/it]
{'loss': 0.2461, 'grad_norm': 0.5013424754142761, 'learning_rate': 2.9574137672922343e-05, 'epoch': 0.5}
50%|████▉ | 2242/4506 [2:33:20<2:41:08, 4.27s/it]
50%|████▉ | 2243/4506 [2:33:24<2:35:39, 4.13s/it]
{'loss': 0.2285, 'grad_norm': 0.3454064130783081, 'learning_rate': 2.9555094622190072e-05, 'epoch': 0.5}
50%|████▉ | 2243/4506 [2:33:24<2:35:39, 4.13s/it]
50%|████▉ | 2244/4506 [2:33:28<2:35:11, 4.12s/it]
{'loss': 0.2463, 'grad_norm': 0.3542492985725403, 'learning_rate': 2.9536048837351416e-05, 'epoch': 0.5}
50%|████▉ | 2244/4506 [2:33:28<2:35:11, 4.12s/it]
50%|████▉ | 2245/4506 [2:33:33<2:38:33, 4.21s/it]
{'loss': 0.2406, 'grad_norm': 0.32544368505477905, 'learning_rate': 2.9517000329838223e-05, 'epoch': 0.5}
50%|████▉ | 2245/4506 [2:33:33<2:38:33, 4.21s/it]
50%|████▉ | 2246/4506 [2:33:37<2:36:19, 4.15s/it]
{'loss': 0.2462, 'grad_norm': 0.3565821051597595, 'learning_rate': 2.9497949111084005e-05, 'epoch': 0.5}
50%|████▉ | 2246/4506 [2:33:37<2:36:19, 4.15s/it]
50%|████▉ | 2247/4506 [2:33:41<2:34:45, 4.11s/it]
{'loss': 0.2559, 'grad_norm': 0.41575315594673157, 'learning_rate': 2.947889519252387e-05, 'epoch': 0.5}
50%|████▉ | 2247/4506 [2:33:41<2:34:45, 4.11s/it]
50%|████▉ | 2248/4506 [2:33:45<2:34:30, 4.11s/it]
{'loss': 0.2467, 'grad_norm': 0.45649024844169617, 'learning_rate': 2.9459838585594564e-05, 'epoch': 0.5}
50%|████▉ | 2248/4506 [2:33:45<2:34:30, 4.11s/it]
50%|████▉ | 2249/4506 [2:33:49<2:37:46, 4.19s/it]
{'loss': 0.2482, 'grad_norm': 0.5252575278282166, 'learning_rate': 2.944077930173444e-05, 'epoch': 0.5}
50%|████▉ | 2249/4506 [2:33:49<2:37:46, 4.19s/it]
50%|████▉ | 2250/4506 [2:33:53<2:36:57, 4.17s/it]
{'loss': 0.2537, 'grad_norm': 0.3846352696418762, 'learning_rate': 2.9421717352383466e-05, 'epoch': 0.5}
50%|████▉ | 2250/4506 [2:33:53<2:36:57, 4.17s/it]
50%|████▉ | 2251/4506 [2:33:58<2:38:44, 4.22s/it]
{'loss': 0.262, 'grad_norm': 0.3799417316913605, 'learning_rate': 2.9402652748983196e-05, 'epoch': 0.5}
50%|████▉ | 2251/4506 [2:33:58<2:38:44, 4.22s/it]
50%|████▉ | 2252/4506 [2:34:02<2:38:01, 4.21s/it]
{'loss': 0.2522, 'grad_norm': 0.37249505519866943, 'learning_rate': 2.938358550297679e-05, 'epoch': 0.5}
50%|████▉ | 2252/4506 [2:34:02<2:38:01, 4.21s/it]
50%|█████ | 2253/4506 [2:34:07<2:44:31, 4.38s/it]
{'loss': 0.2719, 'grad_norm': 0.4342379868030548, 'learning_rate': 2.9364515625808993e-05, 'epoch': 0.5}
50%|█████ | 2253/4506 [2:34:07<2:44:31, 4.38s/it]
50%|█████ | 2254/4506 [2:34:11<2:41:54, 4.31s/it]
{'loss': 0.2488, 'grad_norm': 0.3898887634277344, 'learning_rate': 2.9345443128926115e-05, 'epoch': 0.5}
50%|█████ | 2254/4506 [2:34:11<2:41:54, 4.31s/it]
50%|█████ | 2255/4506 [2:34:15<2:42:48, 4.34s/it]
{'loss': 0.2497, 'grad_norm': 0.39047709107398987, 'learning_rate': 2.9326368023776052e-05, 'epoch': 0.5}
50%|█████ | 2255/4506 [2:34:15<2:42:48, 4.34s/it]
50%|█████ | 2256/4506 [2:34:20<2:43:24, 4.36s/it]
{'loss': 0.2381, 'grad_norm': 0.4213413596153259, 'learning_rate': 2.9307290321808273e-05, 'epoch': 0.5}
50%|█████ | 2256/4506 [2:34:20<2:43:24, 4.36s/it]
50%|█████ | 2257/4506 [2:34:24<2:40:08, 4.27s/it]
{'loss': 0.2519, 'grad_norm': 0.37872645258903503, 'learning_rate': 2.9288210034473783e-05, 'epoch': 0.5}
50%|█████ | 2257/4506 [2:34:24<2:40:08, 4.27s/it]
50%|█████ | 2258/4506 [2:34:28<2:40:43, 4.29s/it]
{'loss': 0.2528, 'grad_norm': 0.3748287856578827, 'learning_rate': 2.9269127173225154e-05, 'epoch': 0.5}
50%|█████ | 2258/4506 [2:34:28<2:40:43, 4.29s/it]
50%|█████ | 2259/4506 [2:34:32<2:39:25, 4.26s/it]
{'loss': 0.2492, 'grad_norm': 0.3867661952972412, 'learning_rate': 2.9250041749516505e-05, 'epoch': 0.5}
50%|█████ | 2259/4506 [2:34:32<2:39:25, 4.26s/it]
50%|█████ | 2260/4506 [2:34:36<2:37:21, 4.20s/it]
{'loss': 0.2387, 'grad_norm': 0.3517872989177704, 'learning_rate': 2.9230953774803487e-05, 'epoch': 0.5}
50%|█████ | 2260/4506 [2:34:36<2:37:21, 4.20s/it]
50%|█████ | 2261/4506 [2:34:41<2:40:45, 4.30s/it]
{'loss': 0.2452, 'grad_norm': 0.3200223445892334, 'learning_rate': 2.9211863260543275e-05, 'epoch': 0.5}
50%|█████ | 2261/4506 [2:34:41<2:40:45, 4.30s/it]
50%|█████ | 2262/4506 [2:34:44<2:33:39, 4.11s/it]
{'loss': 0.2349, 'grad_norm': 0.3737230896949768, 'learning_rate': 2.9192770218194587e-05, 'epoch': 0.5}
50%|█████ | 2262/4506 [2:34:44<2:33:39, 4.11s/it]
50%|█████ | 2263/4506 [2:34:48<2:33:25, 4.10s/it]
{'loss': 0.2471, 'grad_norm': 0.41296255588531494, 'learning_rate': 2.9173674659217642e-05, 'epoch': 0.5}
50%|█████ | 2263/4506 [2:34:48<2:33:25, 4.10s/it]
50%|█████ | 2264/4506 [2:34:53<2:35:56, 4.17s/it]
{'loss': 0.2466, 'grad_norm': 0.33212554454803467, 'learning_rate': 2.915457659507417e-05, 'epoch': 0.5}
50%|█████ | 2264/4506 [2:34:53<2:35:56, 4.17s/it]
50%|█████ | 2265/4506 [2:34:57<2:31:37, 4.06s/it]
{'loss': 0.2364, 'grad_norm': 0.3784218728542328, 'learning_rate': 2.913547603722742e-05, 'epoch': 0.5}
50%|█████ | 2265/4506 [2:34:57<2:31:37, 4.06s/it]
50%|█████ | 2266/4506 [2:35:00<2:28:20, 3.97s/it]
{'loss': 0.2526, 'grad_norm': 0.3899935483932495, 'learning_rate': 2.9116372997142132e-05, 'epoch': 0.5}
50%|█████ | 2266/4506 [2:35:00<2:28:20, 3.97s/it]
50%|█████ | 2267/4506 [2:35:05<2:32:39, 4.09s/it]
{'loss': 0.2382, 'grad_norm': 0.3330913484096527, 'learning_rate': 2.909726748628451e-05, 'epoch': 0.5}
50%|█████ | 2267/4506 [2:35:05<2:32:39, 4.09s/it]
50%|█████ | 2268/4506 [2:35:09<2:29:42, 4.01s/it]
{'loss': 0.247, 'grad_norm': 0.3778986632823944, 'learning_rate': 2.9078159516122294e-05, 'epoch': 0.5}
50%|█████ | 2268/4506 [2:35:09<2:29:42, 4.01s/it]
50%|█████ | 2269/4506 [2:35:13<2:29:57, 4.02s/it]
{'loss': 0.2429, 'grad_norm': 0.33747169375419617, 'learning_rate': 2.9059049098124645e-05, 'epoch': 0.5}
50%|█████ | 2269/4506 [2:35:13<2:29:57, 4.02s/it]
50%|█████ | 2270/4506 [2:35:17<2:31:37, 4.07s/it]
{'loss': 0.2549, 'grad_norm': 0.4128037691116333, 'learning_rate': 2.9039936243762223e-05, 'epoch': 0.5}
50%|█████ | 2270/4506 [2:35:17<2:31:37, 4.07s/it]
50%|█████ | 2271/4506 [2:35:21<2:37:34, 4.23s/it]
{'loss': 0.2439, 'grad_norm': 0.36352723836898804, 'learning_rate': 2.9020820964507143e-05, 'epoch': 0.5}
50%|█████ | 2271/4506 [2:35:21<2:37:34, 4.23s/it]
50%|█████ | 2272/4506 [2:35:25<2:35:26, 4.17s/it]
{'loss': 0.2416, 'grad_norm': 0.3819003701210022, 'learning_rate': 2.900170327183299e-05, 'epoch': 0.5}
50%|█████ | 2272/4506 [2:35:25<2:35:26, 4.17s/it]
50%|█████ | 2273/4506 [2:35:30<2:38:06, 4.25s/it]
{'loss': 0.2424, 'grad_norm': 0.4016568064689636, 'learning_rate': 2.8982583177214772e-05, 'epoch': 0.5}
50%|█████ | 2273/4506 [2:35:30<2:38:06, 4.25s/it]
50%|█████ | 2274/4506 [2:35:34<2:34:16, 4.15s/it]
{'loss': 0.2475, 'grad_norm': 0.460296630859375, 'learning_rate': 2.8963460692128953e-05, 'epoch': 0.5}
50%|█████ | 2274/4506 [2:35:34<2:34:16, 4.15s/it]
50%|█████ | 2275/4506 [2:35:38<2:29:58, 4.03s/it]
{'loss': 0.2514, 'grad_norm': 0.3977344334125519, 'learning_rate': 2.894433582805344e-05, 'epoch': 0.5}
50%|█████ | 2275/4506 [2:35:38<2:29:58, 4.03s/it]
51%|█████ | 2276/4506 [2:35:41<2:28:36, 4.00s/it]
{'loss': 0.2397, 'grad_norm': 0.4084092974662781, 'learning_rate': 2.8925208596467544e-05, 'epoch': 0.51}
51%|█████ | 2276/4506 [2:35:41<2:28:36, 4.00s/it]
51%|█████ | 2277/4506 [2:35:46<2:30:03, 4.04s/it]
{'loss': 0.246, 'grad_norm': 0.37970665097236633, 'learning_rate': 2.8906079008852016e-05, 'epoch': 0.51}
51%|█████ | 2277/4506 [2:35:46<2:30:03, 4.04s/it]
51%|█████ | 2278/4506 [2:35:50<2:34:08, 4.15s/it]
{'loss': 0.2366, 'grad_norm': 0.35096946358680725, 'learning_rate': 2.888694707668903e-05, 'epoch': 0.51}
51%|█████ | 2278/4506 [2:35:50<2:34:08, 4.15s/it]
51%|█████ | 2279/4506 [2:35:54<2:30:02, 4.04s/it]
{'loss': 0.2547, 'grad_norm': 0.5079193115234375, 'learning_rate': 2.8867812811462135e-05, 'epoch': 0.51}
51%|█████ | 2279/4506 [2:35:54<2:30:02, 4.04s/it]
51%|█████ | 2280/4506 [2:35:58<2:28:40, 4.01s/it]
{'loss': 0.2331, 'grad_norm': 0.40208929777145386, 'learning_rate': 2.8848676224656307e-05, 'epoch': 0.51}
51%|█████ | 2280/4506 [2:35:58<2:28:40, 4.01s/it]
51%|█████ | 2281/4506 [2:36:02<2:31:55, 4.10s/it]
{'loss': 0.2407, 'grad_norm': 0.3782118558883667, 'learning_rate': 2.8829537327757912e-05, 'epoch': 0.51}
51%|█████ | 2281/4506 [2:36:02<2:31:55, 4.10s/it]
51%|█████ | 2282/4506 [2:36:06<2:33:41, 4.15s/it]
{'loss': 0.2462, 'grad_norm': 0.38425227999687195, 'learning_rate': 2.881039613225469e-05, 'epoch': 0.51}
51%|█████ | 2282/4506 [2:36:06<2:33:41, 4.15s/it]
51%|█████ | 2283/4506 [2:36:10<2:30:46, 4.07s/it]
{'loss': 0.2436, 'grad_norm': 0.4412653148174286, 'learning_rate': 2.879125264963577e-05, 'epoch': 0.51}
51%|█████ | 2283/4506 [2:36:10<2:30:46, 4.07s/it]
51%|█████ | 2284/4506 [2:36:14<2:29:58, 4.05s/it]
{'loss': 0.2538, 'grad_norm': 0.3862152099609375, 'learning_rate': 2.8772106891391658e-05, 'epoch': 0.51}
51%|█████ | 2284/4506 [2:36:14<2:29:58, 4.05s/it]
51%|█████ | 2285/4506 [2:36:18<2:32:33, 4.12s/it]
{'loss': 0.24, 'grad_norm': 0.3425889015197754, 'learning_rate': 2.8752958869014224e-05, 'epoch': 0.51}
51%|█████ | 2285/4506 [2:36:18<2:32:33, 4.12s/it]
51%|█████ | 2286/4506 [2:36:22<2:29:14, 4.03s/it]
{'loss': 0.2479, 'grad_norm': 0.534942090511322, 'learning_rate': 2.8733808593996675e-05, 'epoch': 0.51}
51%|█████ | 2286/4506 [2:36:22<2:29:14, 4.03s/it]
51%|█████ | 2287/4506 [2:36:26<2:30:07, 4.06s/it]
{'loss': 0.246, 'grad_norm': 0.42242372035980225, 'learning_rate': 2.871465607783361e-05, 'epoch': 0.51}
51%|█████ | 2287/4506 [2:36:26<2:30:07, 4.06s/it]
51%|█████ | 2288/4506 [2:36:30<2:28:15, 4.01s/it]
{'loss': 0.2365, 'grad_norm': 0.3766038417816162, 'learning_rate': 2.8695501332020946e-05, 'epoch': 0.51}
51%|█████ | 2288/4506 [2:36:30<2:28:15, 4.01s/it]
51%|█████ | 2289/4506 [2:36:34<2:29:46, 4.05s/it]
{'loss': 0.2435, 'grad_norm': 0.4078232944011688, 'learning_rate': 2.867634436805593e-05, 'epoch': 0.51}
51%|█████ | 2289/4506 [2:36:34<2:29:46, 4.05s/it]
51%|█████ | 2290/4506 [2:36:39<2:34:01, 4.17s/it]
{'loss': 0.2347, 'grad_norm': 0.39711859822273254, 'learning_rate': 2.8657185197437176e-05, 'epoch': 0.51}
51%|█████ | 2290/4506 [2:36:39<2:34:01, 4.17s/it]
51%|█████ | 2291/4506 [2:36:43<2:34:14, 4.18s/it]
{'loss': 0.2372, 'grad_norm': 0.4321627914905548, 'learning_rate': 2.863802383166459e-05, 'epoch': 0.51}
51%|█████ | 2291/4506 [2:36:43<2:34:14, 4.18s/it]
51%|█████ | 2292/4506 [2:36:48<2:38:45, 4.30s/it]
{'loss': 0.2562, 'grad_norm': 0.47147196531295776, 'learning_rate': 2.8618860282239413e-05, 'epoch': 0.51}
51%|█████ | 2292/4506 [2:36:48<2:38:45, 4.30s/it]
51%|█████ | 2293/4506 [2:36:52<2:39:55, 4.34s/it]
{'loss': 0.2527, 'grad_norm': 0.38323545455932617, 'learning_rate': 2.8599694560664176e-05, 'epoch': 0.51}
51%|█████ | 2293/4506 [2:36:52<2:39:55, 4.34s/it]
51%|█████ | 2294/4506 [2:36:56<2:32:20, 4.13s/it]
{'loss': 0.2509, 'grad_norm': 0.4608871340751648, 'learning_rate': 2.858052667844274e-05, 'epoch': 0.51}
51%|█████ | 2294/4506 [2:36:56<2:32:20, 4.13s/it]
51%|█████ | 2295/4506 [2:37:00<2:29:53, 4.07s/it]
{'loss': 0.2348, 'grad_norm': 0.33736929297447205, 'learning_rate': 2.856135664708025e-05, 'epoch': 0.51}
51%|█████ | 2295/4506 [2:37:00<2:29:53, 4.07s/it]
51%|█████ | 2296/4506 [2:37:04<2:27:36, 4.01s/it]
{'loss': 0.2396, 'grad_norm': 0.3455488681793213, 'learning_rate': 2.8542184478083145e-05, 'epoch': 0.51}
51%|█████ | 2296/4506 [2:37:04<2:27:36, 4.01s/it]
51%|█████ | 2297/4506 [2:37:08<2:28:49, 4.04s/it]
{'loss': 0.25, 'grad_norm': 0.38293832540512085, 'learning_rate': 2.8523010182959142e-05, 'epoch': 0.51}
51%|█████ | 2297/4506 [2:37:08<2:28:49, 4.04s/it]
51%|█████ | 2298/4506 [2:37:12<2:30:00, 4.08s/it]
{'loss': 0.2384, 'grad_norm': 0.3611268699169159, 'learning_rate': 2.8503833773217224e-05, 'epoch': 0.51}
51%|█████ | 2298/4506 [2:37:12<2:30:00, 4.08s/it]
51%|█████ | 2299/4506 [2:37:16<2:34:16, 4.19s/it]
{'loss': 0.2514, 'grad_norm': 0.3238224685192108, 'learning_rate': 2.8484655260367678e-05, 'epoch': 0.51}
51%|█████ | 2299/4506 [2:37:16<2:34:16, 4.19s/it]
51%|█████ | 2300/4506 [2:37:20<2:34:22, 4.20s/it]
{'loss': 0.2468, 'grad_norm': 0.38368943333625793, 'learning_rate': 2.846547465592201e-05, 'epoch': 0.51}
51%|█████ | 2300/4506 [2:37:21<2:34:22, 4.20s/it]
51%|█████ | 2301/4506 [2:37:25<2:32:57, 4.16s/it]
{'loss': 0.2354, 'grad_norm': 0.4212316572666168, 'learning_rate': 2.8446291971393018e-05, 'epoch': 0.51}
51%|█████ | 2301/4506 [2:37:25<2:32:57, 4.16s/it]
51%|█████ | 2302/4506 [2:37:29<2:33:39, 4.18s/it]
{'loss': 0.2479, 'grad_norm': 0.41213932633399963, 'learning_rate': 2.842710721829472e-05, 'epoch': 0.51}
51%|█████ | 2302/4506 [2:37:29<2:33:39, 4.18s/it]
51%|█████ | 2303/4506 [2:37:33<2:30:54, 4.11s/it]
{'loss': 0.2398, 'grad_norm': 0.4082527160644531, 'learning_rate': 2.8407920408142396e-05, 'epoch': 0.51}
51%|█████ | 2303/4506 [2:37:33<2:30:54, 4.11s/it]
51%|█████ | 2304/4506 [2:37:37<2:29:50, 4.08s/it]
{'loss': 0.2391, 'grad_norm': 0.3985230326652527, 'learning_rate': 2.8388731552452557e-05, 'epoch': 0.51}
51%|█████ | 2304/4506 [2:37:37<2:29:50, 4.08s/it]
51%|█████ | 2305/4506 [2:37:41<2:27:10, 4.01s/it]
{'loss': 0.2447, 'grad_norm': 0.45081380009651184, 'learning_rate': 2.8369540662742928e-05, 'epoch': 0.51}
51%|█████ | 2305/4506 [2:37:41<2:27:10, 4.01s/it]
51%|█████ | 2306/4506 [2:37:44<2:25:21, 3.96s/it]
{'loss': 0.2485, 'grad_norm': 0.4279423654079437, 'learning_rate': 2.8350347750532473e-05, 'epoch': 0.51}
51%|█████ | 2306/4506 [2:37:44<2:25:21, 3.96s/it]
51%|█████ | 2307/4506 [2:37:49<2:26:41, 4.00s/it]
{'loss': 0.2422, 'grad_norm': 0.34796953201293945, 'learning_rate': 2.8331152827341362e-05, 'epoch': 0.51}
51%|█████ | 2307/4506 [2:37:49<2:26:41, 4.00s/it]
51%|█████ | 2308/4506 [2:37:53<2:30:23, 4.11s/it]
{'loss': 0.2363, 'grad_norm': 0.37949177622795105, 'learning_rate': 2.831195590469096e-05, 'epoch': 0.51}
51%|█████ | 2308/4506 [2:37:53<2:30:23, 4.11s/it]
51%|█████ | 2309/4506 [2:37:57<2:34:06, 4.21s/it]
{'loss': 0.2429, 'grad_norm': 0.44372957944869995, 'learning_rate': 2.8292756994103858e-05, 'epoch': 0.51}
51%|█████ | 2309/4506 [2:37:57<2:34:06, 4.21s/it]
51%|█████▏ | 2310/4506 [2:38:02<2:34:30, 4.22s/it]
{'loss': 0.2315, 'grad_norm': 0.38682740926742554, 'learning_rate': 2.8273556107103826e-05, 'epoch': 0.51}
51%|█████▏ | 2310/4506 [2:38:02<2:34:30, 4.22s/it]
51%|█████▏ | 2311/4506 [2:38:06<2:31:48, 4.15s/it]
{'loss': 0.2367, 'grad_norm': 0.38111791014671326, 'learning_rate': 2.8254353255215814e-05, 'epoch': 0.51}
51%|█████▏ | 2311/4506 [2:38:06<2:31:48, 4.15s/it]
51%|█████▏ | 2312/4506 [2:38:10<2:35:01, 4.24s/it]
{'loss': 0.2429, 'grad_norm': 0.3674730956554413, 'learning_rate': 2.823514844996596e-05, 'epoch': 0.51}
51%|█████▏ | 2312/4506 [2:38:10<2:35:01, 4.24s/it]
51%|█████▏ | 2313/4506 [2:38:14<2:35:11, 4.25s/it]
{'loss': 0.2517, 'grad_norm': 0.4444069564342499, 'learning_rate': 2.821594170288157e-05, 'epoch': 0.51}
51%|█████▏ | 2313/4506 [2:38:14<2:35:11, 4.25s/it]
51%|█████▏ | 2314/4506 [2:38:18<2:32:30, 4.17s/it]
{'loss': 0.2439, 'grad_norm': 0.4689241051673889, 'learning_rate': 2.819673302549112e-05, 'epoch': 0.51}
51%|█████▏ | 2314/4506 [2:38:18<2:32:30, 4.17s/it]
51%|█████▏ | 2315/4506 [2:38:22<2:31:05, 4.14s/it]
{'loss': 0.2409, 'grad_norm': 0.4124879837036133, 'learning_rate': 2.8177522429324242e-05, 'epoch': 0.51}
51%|█████▏ | 2315/4506 [2:38:22<2:31:05, 4.14s/it]
51%|█████▏ | 2316/4506 [2:38:26<2:30:28, 4.12s/it]
{'loss': 0.2452, 'grad_norm': 0.41344788670539856, 'learning_rate': 2.8158309925911724e-05, 'epoch': 0.51}
51%|█████▏ | 2316/4506 [2:38:26<2:30:28, 4.12s/it]
51%|█████▏ | 2317/4506 [2:38:30<2:28:59, 4.08s/it]
{'loss': 0.233, 'grad_norm': 0.39901649951934814, 'learning_rate': 2.8139095526785493e-05, 'epoch': 0.51}
51%|█████▏ | 2317/4506 [2:38:30<2:28:59, 4.08s/it]
51%|█████▏ | 2318/4506 [2:38:34<2:27:36, 4.05s/it]
{'loss': 0.2347, 'grad_norm': 0.37716513872146606, 'learning_rate': 2.811987924347861e-05, 'epoch': 0.51}
51%|█████▏ | 2318/4506 [2:38:34<2:27:36, 4.05s/it]
51%|█████▏ | 2319/4506 [2:38:38<2:25:53, 4.00s/it]
{'loss': 0.2366, 'grad_norm': 0.4741862714290619, 'learning_rate': 2.8100661087525283e-05, 'epoch': 0.51}
51%|█████▏ | 2319/4506 [2:38:38<2:25:53, 4.00s/it]
51%|█████▏ | 2320/4506 [2:38:42<2:26:14, 4.01s/it]
{'loss': 0.2581, 'grad_norm': 0.4753527343273163, 'learning_rate': 2.808144107046083e-05, 'epoch': 0.51}
51%|█████▏ | 2320/4506 [2:38:42<2:26:14, 4.01s/it]
52%|█████▏ | 2321/4506 [2:38:46<2:25:14, 3.99s/it]
{'loss': 0.2368, 'grad_norm': 0.378119558095932, 'learning_rate': 2.8062219203821683e-05, 'epoch': 0.52}
52%|█████▏ | 2321/4506 [2:38:46<2:25:14, 3.99s/it]
52%|█████▏ | 2322/4506 [2:38:50<2:23:46, 3.95s/it]
{'loss': 0.2393, 'grad_norm': 0.4192379415035248, 'learning_rate': 2.80429954991454e-05, 'epoch': 0.52}
52%|█████▏ | 2322/4506 [2:38:50<2:23:46, 3.95s/it]
52%|█████▏ | 2323/4506 [2:38:54<2:23:39, 3.95s/it]
{'loss': 0.2394, 'grad_norm': 0.398423433303833, 'learning_rate': 2.8023769967970637e-05, 'epoch': 0.52}
52%|█████▏ | 2323/4506 [2:38:54<2:23:39, 3.95s/it]
52%|█████▏ | 2324/4506 [2:38:58<2:23:59, 3.96s/it]
{'loss': 0.244, 'grad_norm': 0.40275418758392334, 'learning_rate': 2.8004542621837127e-05, 'epoch': 0.52}
52%|█████▏ | 2324/4506 [2:38:58<2:23:59, 3.96s/it]
52%|█████▏ | 2325/4506 [2:39:02<2:25:20, 4.00s/it]
{'loss': 0.247, 'grad_norm': 0.38691890239715576, 'learning_rate': 2.7985313472285724e-05, 'epoch': 0.52}
52%|█████▏ | 2325/4506 [2:39:02<2:25:20, 4.00s/it]
52%|█████▏ | 2326/4506 [2:39:06<2:26:51, 4.04s/it]
{'loss': 0.24, 'grad_norm': 0.34117287397384644, 'learning_rate': 2.7966082530858344e-05, 'epoch': 0.52}
52%|█████▏ | 2326/4506 [2:39:06<2:26:51, 4.04s/it]
52%|█████▏ | 2327/4506 [2:39:10<2:27:19, 4.06s/it]
{'loss': 0.2502, 'grad_norm': 0.38896259665489197, 'learning_rate': 2.7946849809097976e-05, 'epoch': 0.52}
52%|█████▏ | 2327/4506 [2:39:10<2:27:19, 4.06s/it]
52%|█████▏ | 2328/4506 [2:39:15<2:31:20, 4.17s/it]
{'loss': 0.2562, 'grad_norm': 0.40764328837394714, 'learning_rate': 2.7927615318548696e-05, 'epoch': 0.52}
52%|█████▏ | 2328/4506 [2:39:15<2:31:20, 4.17s/it]
52%|█████▏ | 2329/4506 [2:39:19<2:28:37, 4.10s/it]
{'loss': 0.248, 'grad_norm': 0.36362361907958984, 'learning_rate': 2.790837907075563e-05, 'epoch': 0.52}
52%|█████▏ | 2329/4506 [2:39:19<2:28:37, 4.10s/it]
52%|█████▏ | 2330/4506 [2:39:23<2:27:50, 4.08s/it]
{'loss': 0.237, 'grad_norm': 0.3617875874042511, 'learning_rate': 2.7889141077264945e-05, 'epoch': 0.52}
52%|█████▏ | 2330/4506 [2:39:23<2:27:50, 4.08s/it]
52%|█████▏ | 2331/4506 [2:39:27<2:33:37, 4.24s/it]
{'loss': 0.2303, 'grad_norm': 0.35051724314689636, 'learning_rate': 2.7869901349623882e-05, 'epoch': 0.52}
52%|█████▏ | 2331/4506 [2:39:27<2:33:37, 4.24s/it]
52%|█████▏ | 2332/4506 [2:39:31<2:32:02, 4.20s/it]
{'loss': 0.2477, 'grad_norm': 0.3925064504146576, 'learning_rate': 2.785065989938071e-05, 'epoch': 0.52}
52%|█████▏ | 2332/4506 [2:39:31<2:32:02, 4.20s/it]
52%|█████▏ | 2333/4506 [2:39:35<2:27:38, 4.08s/it]
{'loss': 0.2368, 'grad_norm': 0.4011475443840027, 'learning_rate': 2.7831416738084727e-05, 'epoch': 0.52}
52%|█████▏ | 2333/4506 [2:39:35<2:27:38, 4.08s/it]
52%|█████▏ | 2334/4506 [2:39:40<2:29:22, 4.13s/it]
{'loss': 0.2411, 'grad_norm': 0.3487144410610199, 'learning_rate': 2.781217187728627e-05, 'epoch': 0.52}
52%|█████▏ | 2334/4506 [2:39:40<2:29:22, 4.13s/it]
52%|█████▏ | 2335/4506 [2:39:44<2:31:33, 4.19s/it]
{'loss': 0.2435, 'grad_norm': 0.3932376205921173, 'learning_rate': 2.7792925328536685e-05, 'epoch': 0.52}
52%|█████▏ | 2335/4506 [2:39:44<2:31:33, 4.19s/it]
52%|█████▏ | 2336/4506 [2:39:48<2:29:56, 4.15s/it]
{'loss': 0.2576, 'grad_norm': 0.414094477891922, 'learning_rate': 2.7773677103388345e-05, 'epoch': 0.52}
52%|█████▏ | 2336/4506 [2:39:48<2:29:56, 4.15s/it]
52%|█████▏ | 2337/4506 [2:39:52<2:31:58, 4.20s/it]
{'loss': 0.2477, 'grad_norm': 0.38700854778289795, 'learning_rate': 2.7754427213394607e-05, 'epoch': 0.52}
52%|█████▏ | 2337/4506 [2:39:52<2:31:58, 4.20s/it]
52%|█████▏ | 2338/4506 [2:39:57<2:32:39, 4.22s/it]
{'loss': 0.2341, 'grad_norm': 0.3618897795677185, 'learning_rate': 2.7735175670109852e-05, 'epoch': 0.52}
52%|█████▏ | 2338/4506 [2:39:57<2:32:39, 4.22s/it]
52%|█████▏ | 2339/4506 [2:40:01<2:32:49, 4.23s/it]
{'loss': 0.2338, 'grad_norm': 0.36121341586112976, 'learning_rate': 2.771592248508944e-05, 'epoch': 0.52}
52%|█████▏ | 2339/4506 [2:40:01<2:32:49, 4.23s/it]
52%|█████▏ | 2340/4506 [2:40:05<2:33:33, 4.25s/it]
{'loss': 0.2515, 'grad_norm': 0.3642018139362335, 'learning_rate': 2.7696667669889713e-05, 'epoch': 0.52}
52%|█████▏ | 2340/4506 [2:40:05<2:33:33, 4.25s/it]
52%|█████▏ | 2341/4506 [2:40:09<2:31:04, 4.19s/it]
{'loss': 0.245, 'grad_norm': 0.39908647537231445, 'learning_rate': 2.7677411236068e-05, 'epoch': 0.52}
52%|█████▏ | 2341/4506 [2:40:09<2:31:04, 4.19s/it]
52%|█████▏ | 2342/4506 [2:40:13<2:28:56, 4.13s/it]
{'loss': 0.2305, 'grad_norm': 0.33812880516052246, 'learning_rate': 2.7658153195182605e-05, 'epoch': 0.52}
52%|█████▏ | 2342/4506 [2:40:13<2:28:56, 4.13s/it]
52%|█████▏ | 2343/4506 [2:40:17<2:27:54, 4.10s/it]
{'loss': 0.2342, 'grad_norm': 0.37086886167526245, 'learning_rate': 2.7638893558792777e-05, 'epoch': 0.52}
52%|█████▏ | 2343/4506 [2:40:17<2:27:54, 4.10s/it]
52%|█████▏ | 2344/4506 [2:40:21<2:28:16, 4.12s/it]
{'loss': 0.2488, 'grad_norm': 0.38126274943351746, 'learning_rate': 2.761963233845875e-05, 'epoch': 0.52}
52%|█████▏ | 2344/4506 [2:40:21<2:28:16, 4.12s/it]
52%|█████▏ | 2345/4506 [2:40:25<2:26:18, 4.06s/it]
{'loss': 0.2464, 'grad_norm': 0.37178245186805725, 'learning_rate': 2.7600369545741687e-05, 'epoch': 0.52}
52%|█████▏ | 2345/4506 [2:40:25<2:26:18, 4.06s/it]
52%|█████▏ | 2346/4506 [2:40:29<2:25:11, 4.03s/it]
{'loss': 0.2242, 'grad_norm': 0.4239112436771393, 'learning_rate': 2.7581105192203698e-05, 'epoch': 0.52}
52%|█████▏ | 2346/4506 [2:40:29<2:25:11, 4.03s/it]
52%|█████▏ | 2347/4506 [2:40:33<2:24:18, 4.01s/it]
{'loss': 0.2387, 'grad_norm': 0.3798506557941437, 'learning_rate': 2.7561839289407842e-05, 'epoch': 0.52}
52%|█████▏ | 2347/4506 [2:40:33<2:24:18, 4.01s/it]
52%|█████▏ | 2348/4506 [2:40:37<2:22:50, 3.97s/it]
{'loss': 0.2409, 'grad_norm': 0.3977767825126648, 'learning_rate': 2.7542571848918097e-05, 'epoch': 0.52}
52%|█████▏ | 2348/4506 [2:40:37<2:22:50, 3.97s/it]
52%|█████▏ | 2349/4506 [2:40:41<2:22:48, 3.97s/it]
{'loss': 0.2328, 'grad_norm': 0.3408187925815582, 'learning_rate': 2.752330288229936e-05, 'epoch': 0.52}
52%|█████▏ | 2349/4506 [2:40:41<2:22:48, 3.97s/it]
52%|█████▏ | 2350/4506 [2:40:45<2:21:12, 3.93s/it]
{'loss': 0.2484, 'grad_norm': 0.3931172490119934, 'learning_rate': 2.750403240111747e-05, 'epoch': 0.52}
52%|█████▏ | 2350/4506 [2:40:45<2:21:12, 3.93s/it]
52%|█████▏ | 2351/4506 [2:40:49<2:27:11, 4.10s/it]
{'loss': 0.2418, 'grad_norm': 0.38469570875167847, 'learning_rate': 2.7484760416939136e-05, 'epoch': 0.52}
52%|█████▏ | 2351/4506 [2:40:49<2:27:11, 4.10s/it]
52%|█████▏ | 2352/4506 [2:40:53<2:24:18, 4.02s/it]
{'loss': 0.2305, 'grad_norm': 0.3609306812286377, 'learning_rate': 2.7465486941332e-05, 'epoch': 0.52}
52%|█████▏ | 2352/4506 [2:40:53<2:24:18, 4.02s/it]
52%|█████▏ | 2353/4506 [2:40:57<2:24:42, 4.03s/it]
{'loss': 0.2342, 'grad_norm': 0.38945209980010986, 'learning_rate': 2.7446211985864583e-05, 'epoch': 0.52}
52%|█████▏ | 2353/4506 [2:40:57<2:24:42, 4.03s/it]
52%|█████▏ | 2354/4506 [2:41:01<2:26:28, 4.08s/it]
{'loss': 0.2323, 'grad_norm': 0.32619011402130127, 'learning_rate': 2.7426935562106303e-05, 'epoch': 0.52}
52%|█████▏ | 2354/4506 [2:41:01<2:26:28, 4.08s/it]
52%|█████▏ | 2355/4506 [2:41:05<2:25:59, 4.07s/it]
{'loss': 0.2349, 'grad_norm': 0.3748278319835663, 'learning_rate': 2.7407657681627458e-05, 'epoch': 0.52}
52%|█████▏ | 2355/4506 [2:41:05<2:25:59, 4.07s/it]
52%|█████▏ | 2356/4506 [2:41:10<2:30:15, 4.19s/it]
{'loss': 0.2526, 'grad_norm': 0.36334025859832764, 'learning_rate': 2.738837835599921e-05, 'epoch': 0.52}
52%|█████▏ | 2356/4506 [2:41:10<2:30:15, 4.19s/it]
52%|█████▏ | 2357/4506 [2:41:14<2:31:41, 4.24s/it]
{'loss': 0.2405, 'grad_norm': 0.35385632514953613, 'learning_rate': 2.7369097596793614e-05, 'epoch': 0.52}
52%|█████▏ | 2357/4506 [2:41:14<2:31:41, 4.24s/it]
52%|█████▏ | 2358/4506 [2:41:18<2:27:55, 4.13s/it]
{'loss': 0.2386, 'grad_norm': 0.4570431113243103, 'learning_rate': 2.734981541558355e-05, 'epoch': 0.52}
52%|█████▏ | 2358/4506 [2:41:18<2:27:55, 4.13s/it]
52%|█████▏ | 2359/4506 [2:41:22<2:27:06, 4.11s/it]
{'loss': 0.244, 'grad_norm': 0.3529464304447174, 'learning_rate': 2.733053182394277e-05, 'epoch': 0.52}
52%|█████▏ | 2359/4506 [2:41:22<2:27:06, 4.11s/it]
52%|█████▏ | 2360/4506 [2:41:26<2:27:42, 4.13s/it]
{'loss': 0.2476, 'grad_norm': 0.43442389369010925, 'learning_rate': 2.7311246833445898e-05, 'epoch': 0.52}
52%|█████▏ | 2360/4506 [2:41:26<2:27:42, 4.13s/it]
52%|█████▏ | 2361/4506 [2:41:31<2:33:04, 4.28s/it]
{'loss': 0.239, 'grad_norm': 0.36261269450187683, 'learning_rate': 2.7291960455668343e-05, 'epoch': 0.52}
52%|█████▏ | 2361/4506 [2:41:31<2:33:04, 4.28s/it]
52%|█████▏ | 2362/4506 [2:41:35<2:31:27, 4.24s/it]
{'loss': 0.2304, 'grad_norm': 0.3458215892314911, 'learning_rate': 2.7272672702186387e-05, 'epoch': 0.52}
52%|█████▏ | 2362/4506 [2:41:35<2:31:27, 4.24s/it]
52%|█████▏ | 2363/4506 [2:41:39<2:29:08, 4.18s/it]
{'loss': 0.2454, 'grad_norm': 0.38985803723335266, 'learning_rate': 2.725338358457713e-05, 'epoch': 0.52}
52%|█████▏ | 2363/4506 [2:41:39<2:29:08, 4.18s/it]
52%|█████▏ | 2364/4506 [2:41:43<2:26:45, 4.11s/it]
{'loss': 0.2426, 'grad_norm': 0.4328736960887909, 'learning_rate': 2.723409311441848e-05, 'epoch': 0.52}
52%|█████▏ | 2364/4506 [2:41:43<2:26:45, 4.11s/it]
52%|█████▏ | 2365/4506 [2:41:47<2:23:52, 4.03s/it]
{'loss': 0.2443, 'grad_norm': 0.37221992015838623, 'learning_rate': 2.721480130328916e-05, 'epoch': 0.52}
52%|█████▏ | 2365/4506 [2:41:47<2:23:52, 4.03s/it]
53%|█████▎ | 2366/4506 [2:41:51<2:26:36, 4.11s/it]
{'loss': 0.2475, 'grad_norm': 0.42343321442604065, 'learning_rate': 2.7195508162768717e-05, 'epoch': 0.53}
53%|█████▎ | 2366/4506 [2:41:51<2:26:36, 4.11s/it]
53%|█████▎ | 2367/4506 [2:41:55<2:23:53, 4.04s/it]
{'loss': 0.2331, 'grad_norm': 0.4077184200286865, 'learning_rate': 2.7176213704437476e-05, 'epoch': 0.53}
53%|█████▎ | 2367/4506 [2:41:55<2:23:53, 4.04s/it]
53%|█████▎ | 2368/4506 [2:41:59<2:22:40, 4.00s/it]
{'loss': 0.2403, 'grad_norm': 0.3931230306625366, 'learning_rate': 2.715691793987654e-05, 'epoch': 0.53}
53%|█████▎ | 2368/4506 [2:41:59<2:22:40, 4.00s/it]
53%|█████▎ | 2369/4506 [2:42:03<2:20:08, 3.93s/it]
{'loss': 0.2438, 'grad_norm': 0.388919860124588, 'learning_rate': 2.7137620880667847e-05, 'epoch': 0.53}
53%|█████▎ | 2369/4506 [2:42:03<2:20:08, 3.93s/it]
53%|█████▎ | 2370/4506 [2:42:07<2:23:14, 4.02s/it]
{'loss': 0.242, 'grad_norm': 0.4057837426662445, 'learning_rate': 2.7118322538394054e-05, 'epoch': 0.53}
53%|█████▎ | 2370/4506 [2:42:07<2:23:14, 4.02s/it]
53%|█████▎ | 2371/4506 [2:42:11<2:23:44, 4.04s/it]
{'loss': 0.2359, 'grad_norm': 0.36278486251831055, 'learning_rate': 2.7099022924638617e-05, 'epoch': 0.53}
53%|█████▎ | 2371/4506 [2:42:11<2:23:44, 4.04s/it]
53%|█████▎ | 2372/4506 [2:42:15<2:21:40, 3.98s/it]
{'loss': 0.2367, 'grad_norm': 0.36411604285240173, 'learning_rate': 2.707972205098576e-05, 'epoch': 0.53}
53%|█████▎ | 2372/4506 [2:42:15<2:21:40, 3.98s/it]
53%|█████▎ | 2373/4506 [2:42:19<2:24:21, 4.06s/it]
{'loss': 0.2281, 'grad_norm': 0.35479599237442017, 'learning_rate': 2.706041992902045e-05, 'epoch': 0.53}
53%|█████▎ | 2373/4506 [2:42:19<2:24:21, 4.06s/it]
53%|█████▎ | 2374/4506 [2:42:23<2:21:29, 3.98s/it]
{'loss': 0.2471, 'grad_norm': 0.425899863243103, 'learning_rate': 2.7041116570328406e-05, 'epoch': 0.53}
53%|█████▎ | 2374/4506 [2:42:23<2:21:29, 3.98s/it]
53%|█████▎ | 2375/4506 [2:42:27<2:22:58, 4.03s/it]
{'loss': 0.2463, 'grad_norm': 0.3340379297733307, 'learning_rate': 2.7021811986496088e-05, 'epoch': 0.53}
53%|█████▎ | 2375/4506 [2:42:27<2:22:58, 4.03s/it]
53%|█████▎ | 2376/4506 [2:42:32<2:27:28, 4.15s/it]
{'loss': 0.2412, 'grad_norm': 0.3460337221622467, 'learning_rate': 2.7002506189110704e-05, 'epoch': 0.53}
53%|█████▎ | 2376/4506 [2:42:32<2:27:28, 4.15s/it]
53%|█████▎ | 2377/4506 [2:42:36<2:30:04, 4.23s/it]
{'loss': 0.2304, 'grad_norm': 0.4068567156791687, 'learning_rate': 2.6983199189760178e-05, 'epoch': 0.53}
53%|█████▎ | 2377/4506 [2:42:36<2:30:04, 4.23s/it]
53%|█████▎ | 2378/4506 [2:42:40<2:30:37, 4.25s/it]
{'loss': 0.2474, 'grad_norm': 0.3970955014228821, 'learning_rate': 2.6963891000033153e-05, 'epoch': 0.53}
53%|█████▎ | 2378/4506 [2:42:40<2:30:37, 4.25s/it]
53%|█████▎ | 2379/4506 [2:42:44<2:27:59, 4.17s/it]
{'loss': 0.2477, 'grad_norm': 0.37900784611701965, 'learning_rate': 2.6944581631519e-05, 'epoch': 0.53}
53%|█████▎ | 2379/4506 [2:42:44<2:27:59, 4.17s/it]
53%|█████▎ | 2380/4506 [2:42:48<2:24:16, 4.07s/it]
{'loss': 0.2469, 'grad_norm': 0.36667555570602417, 'learning_rate': 2.692527109580778e-05, 'epoch': 0.53}
53%|█████▎ | 2380/4506 [2:42:48<2:24:16, 4.07s/it]
53%|█████▎ | 2381/4506 [2:42:52<2:23:50, 4.06s/it]
{'loss': 0.2277, 'grad_norm': 0.355446457862854, 'learning_rate': 2.690595940449027e-05, 'epoch': 0.53}
53%|█████▎ | 2381/4506 [2:42:52<2:23:50, 4.06s/it]
53%|█████▎ | 2382/4506 [2:42:57<2:30:36, 4.25s/it]
{'loss': 0.2509, 'grad_norm': 0.383597731590271, 'learning_rate': 2.6886646569157934e-05, 'epoch': 0.53}
53%|█████▎ | 2382/4506 [2:42:57<2:30:36, 4.25s/it]
53%|█████▎ | 2383/4506 [2:43:01<2:27:57, 4.18s/it]
{'loss': 0.2255, 'grad_norm': 0.3788282871246338, 'learning_rate': 2.6867332601402927e-05, 'epoch': 0.53}
53%|█████▎ | 2383/4506 [2:43:01<2:27:57, 4.18s/it]
53%|█████▎ | 2384/4506 [2:43:05<2:26:10, 4.13s/it]
{'loss': 0.2386, 'grad_norm': 0.34341737627983093, 'learning_rate': 2.6848017512818074e-05, 'epoch': 0.53}
53%|█████▎ | 2384/4506 [2:43:05<2:26:10, 4.13s/it]
53%|█████▎ | 2385/4506 [2:43:09<2:25:21, 4.11s/it]
{'loss': 0.2341, 'grad_norm': 0.3519901931285858, 'learning_rate': 2.6828701314996885e-05, 'epoch': 0.53}
53%|█████▎ | 2385/4506 [2:43:09<2:25:21, 4.11s/it]
53%|█████▎ | 2386/4506 [2:43:13<2:23:04, 4.05s/it]
{'loss': 0.2261, 'grad_norm': 0.3662409484386444, 'learning_rate': 2.680938401953352e-05, 'epoch': 0.53}
53%|█████▎ | 2386/4506 [2:43:13<2:23:04, 4.05s/it]
53%|█████▎ | 2387/4506 [2:43:17<2:22:15, 4.03s/it]
{'loss': 0.2339, 'grad_norm': 0.4173281490802765, 'learning_rate': 2.679006563802281e-05, 'epoch': 0.53}
53%|█████▎ | 2387/4506 [2:43:17<2:22:15, 4.03s/it]
53%|█████▎ | 2388/4506 [2:43:21<2:21:46, 4.02s/it]
{'loss': 0.231, 'grad_norm': 0.36791589856147766, 'learning_rate': 2.6770746182060245e-05, 'epoch': 0.53}
53%|█████▎ | 2389/4506 [2:43:25<2:20:30, 3.98s/it]
{'loss': 0.2283, 'grad_norm': 0.3583594262599945, 'learning_rate': 2.6751425663241947e-05, 'epoch': 0.53}
53%|█████▎ | 2390/4506 [2:43:29<2:21:18, 4.01s/it]
{'loss': 0.2361, 'grad_norm': 0.42934954166412354, 'learning_rate': 2.6732104093164668e-05, 'epoch': 0.53}
53%|█████▎ | 2391/4506 [2:43:33<2:21:02, 4.00s/it]
{'loss': 0.2385, 'grad_norm': 0.3751201331615448, 'learning_rate': 2.6712781483425814e-05, 'epoch': 0.53}
53%|█████▎ | 2392/4506 [2:43:38<2:28:06, 4.20s/it]
{'loss': 0.2365, 'grad_norm': 0.3806377649307251, 'learning_rate': 2.6693457845623403e-05, 'epoch': 0.53}
53%|█████▎ | 2393/4506 [2:43:41<2:25:29, 4.13s/it]
{'loss': 0.2451, 'grad_norm': 0.4681822955608368, 'learning_rate': 2.6674133191356065e-05, 'epoch': 0.53}
53%|█████▎ | 2394/4506 [2:43:46<2:26:31, 4.16s/it]
{'loss': 0.2311, 'grad_norm': 0.4141519069671631, 'learning_rate': 2.6654807532223046e-05, 'epoch': 0.53}
53%|█████▎ | 2395/4506 [2:43:50<2:25:13, 4.13s/it]
{'loss': 0.237, 'grad_norm': 0.3942056894302368, 'learning_rate': 2.66354808798242e-05, 'epoch': 0.53}
53%|█████▎ | 2396/4506 [2:43:54<2:29:34, 4.25s/it]
{'loss': 0.2347, 'grad_norm': 0.4149496853351593, 'learning_rate': 2.6616153245759968e-05, 'epoch': 0.53}
53%|█████▎ | 2397/4506 [2:43:58<2:26:56, 4.18s/it]
{'loss': 0.2453, 'grad_norm': 0.4085482656955719, 'learning_rate': 2.6596824641631386e-05, 'epoch': 0.53}
53%|█████▎ | 2398/4506 [2:44:02<2:25:44, 4.15s/it]
{'loss': 0.2338, 'grad_norm': 0.3782094120979309, 'learning_rate': 2.657749507904006e-05, 'epoch': 0.53}
53%|█████▎ | 2399/4506 [2:44:07<2:29:49, 4.27s/it]
{'loss': 0.2299, 'grad_norm': 0.3834865987300873, 'learning_rate': 2.6558164569588194e-05, 'epoch': 0.53}
53%|█████▎ | 2400/4506 [2:44:11<2:24:39, 4.12s/it]
{'loss': 0.2558, 'grad_norm': 0.436441034078598, 'learning_rate': 2.6538833124878543e-05, 'epoch': 0.53}
53%|█████▎ | 2401/4506 [2:44:15<2:23:21, 4.09s/it]
{'loss': 0.2519, 'grad_norm': 0.40440839529037476, 'learning_rate': 2.651950075651444e-05, 'epoch': 0.53}
53%|█████▎ | 2402/4506 [2:44:19<2:25:26, 4.15s/it]
{'loss': 0.2361, 'grad_norm': 0.35513782501220703, 'learning_rate': 2.6500167476099734e-05, 'epoch': 0.53}
53%|█████▎ | 2403/4506 [2:44:23<2:28:19, 4.23s/it]
{'loss': 0.2506, 'grad_norm': 0.36580049991607666, 'learning_rate': 2.6480833295238873e-05, 'epoch': 0.53}
53%|█████▎ | 2404/4506 [2:44:28<2:28:35, 4.24s/it]
{'loss': 0.229, 'grad_norm': 0.3311828672885895, 'learning_rate': 2.646149822553681e-05, 'epoch': 0.53}
53%|█████▎ | 2405/4506 [2:44:32<2:28:57, 4.25s/it]
{'loss': 0.2397, 'grad_norm': 0.3860957622528076, 'learning_rate': 2.644216227859904e-05, 'epoch': 0.53}
53%|█████▎ | 2406/4506 [2:44:36<2:23:41, 4.11s/it]
{'loss': 0.2422, 'grad_norm': 0.3599430322647095, 'learning_rate': 2.6422825466031594e-05, 'epoch': 0.53}
53%|█████▎ | 2407/4506 [2:44:40<2:23:25, 4.10s/it]
{'loss': 0.2351, 'grad_norm': 0.3504411280155182, 'learning_rate': 2.6403487799441012e-05, 'epoch': 0.53}
53%|█████▎ | 2408/4506 [2:44:44<2:23:33, 4.11s/it]
{'loss': 0.2352, 'grad_norm': 0.3817390203475952, 'learning_rate': 2.638414929043435e-05, 'epoch': 0.53}
53%|█████▎ | 2409/4506 [2:44:48<2:20:22, 4.02s/it]
{'loss': 0.2436, 'grad_norm': 0.4327779710292816, 'learning_rate': 2.6364809950619168e-05, 'epoch': 0.53}
53%|█████▎ | 2410/4506 [2:44:52<2:20:31, 4.02s/it]
{'loss': 0.2307, 'grad_norm': 0.41373321413993835, 'learning_rate': 2.6345469791603527e-05, 'epoch': 0.53}
54%|█████▎ | 2411/4506 [2:44:56<2:21:06, 4.04s/it]
{'loss': 0.2386, 'grad_norm': 0.39319437742233276, 'learning_rate': 2.632612882499598e-05, 'epoch': 0.54}
54%|█████▎ | 2412/4506 [2:45:00<2:21:37, 4.06s/it]
{'loss': 0.2397, 'grad_norm': 0.3630939722061157, 'learning_rate': 2.630678706240557e-05, 'epoch': 0.54}
54%|█████▎ | 2413/4506 [2:45:04<2:19:35, 4.00s/it]
{'loss': 0.2317, 'grad_norm': 0.40441280603408813, 'learning_rate': 2.6287444515441794e-05, 'epoch': 0.54}
54%|█████▎ | 2414/4506 [2:45:08<2:21:51, 4.07s/it]
{'loss': 0.2429, 'grad_norm': 0.37040022015571594, 'learning_rate': 2.6268101195714656e-05, 'epoch': 0.54}
54%|█████▎ | 2415/4506 [2:45:12<2:20:48, 4.04s/it]
{'loss': 0.2403, 'grad_norm': 0.4040074646472931, 'learning_rate': 2.624875711483459e-05, 'epoch': 0.54}
54%|█████▎ | 2416/4506 [2:45:16<2:23:19, 4.11s/it]
{'loss': 0.2413, 'grad_norm': 0.43541091680526733, 'learning_rate': 2.6229412284412508e-05, 'epoch': 0.54}
54%|█████▎ | 2417/4506 [2:45:20<2:23:09, 4.11s/it]
{'loss': 0.2467, 'grad_norm': 0.37483492493629456, 'learning_rate': 2.6210066716059768e-05, 'epoch': 0.54}
54%|█████▎ | 2418/4506 [2:45:25<2:24:26, 4.15s/it]
{'loss': 0.2307, 'grad_norm': 0.3342697024345398, 'learning_rate': 2.6190720421388166e-05, 'epoch': 0.54}
54%|█████▎ | 2419/4506 [2:45:29<2:23:52, 4.14s/it]
{'loss': 0.2443, 'grad_norm': 0.37761080265045166, 'learning_rate': 2.6171373412009935e-05, 'epoch': 0.54}
54%|█████▎ | 2420/4506 [2:45:33<2:22:35, 4.10s/it]
{'loss': 0.2432, 'grad_norm': 0.36461901664733887, 'learning_rate': 2.6152025699537753e-05, 'epoch': 0.54}
54%|█████▎ | 2421/4506 [2:45:37<2:23:53, 4.14s/it]
{'loss': 0.2337, 'grad_norm': 0.34763646125793457, 'learning_rate': 2.6132677295584685e-05, 'epoch': 0.54}
54%|█████▍ | 2422/4506 [2:45:41<2:21:46, 4.08s/it]
{'loss': 0.2433, 'grad_norm': 0.37950387597084045, 'learning_rate': 2.6113328211764237e-05, 'epoch': 0.54}
54%|█████▍ | 2423/4506 [2:45:45<2:21:40, 4.08s/it]
{'loss': 0.2368, 'grad_norm': 0.45222318172454834, 'learning_rate': 2.609397845969033e-05, 'epoch': 0.54}
54%|█████▍ | 2424/4506 [2:45:49<2:23:25, 4.13s/it]
{'loss': 0.2418, 'grad_norm': 0.36968740820884705, 'learning_rate': 2.607462805097726e-05, 'epoch': 0.54}
54%|█████▍ | 2425/4506 [2:45:53<2:22:59, 4.12s/it]
{'loss': 0.2284, 'grad_norm': 0.36592164635658264, 'learning_rate': 2.605527699723974e-05, 'epoch': 0.54}
54%|█████▍ | 2426/4506 [2:45:58<2:24:30, 4.17s/it]
{'loss': 0.2451, 'grad_norm': 0.3927489221096039, 'learning_rate': 2.6035925310092857e-05, 'epoch': 0.54}
54%|█████▍ | 2427/4506 [2:46:03<2:31:12, 4.36s/it]
{'loss': 0.2543, 'grad_norm': 0.4346103370189667, 'learning_rate': 2.601657300115209e-05, 'epoch': 0.54}
54%|█████▍ | 2428/4506 [2:46:07<2:33:04, 4.42s/it]
{'loss': 0.239, 'grad_norm': 0.400358647108078, 'learning_rate': 2.5997220082033264e-05, 'epoch': 0.54}
54%|█████▍ | 2429/4506 [2:46:11<2:26:12, 4.22s/it]
{'loss': 0.24, 'grad_norm': 0.41191086173057556, 'learning_rate': 2.597786656435261e-05, 'epoch': 0.54}
54%|█████▍ | 2430/4506 [2:46:15<2:27:18, 4.26s/it]
{'loss': 0.2332, 'grad_norm': 0.39181581139564514, 'learning_rate': 2.5958512459726697e-05, 'epoch': 0.54}
54%|█████▍ | 2431/4506 [2:46:19<2:25:47, 4.22s/it]
{'loss': 0.2401, 'grad_norm': 0.40420854091644287, 'learning_rate': 2.5939157779772432e-05, 'epoch': 0.54}
54%|█████▍ | 2432/4506 [2:46:23<2:23:55, 4.16s/it]
{'loss': 0.2364, 'grad_norm': 0.43274298310279846, 'learning_rate': 2.5919802536107096e-05, 'epoch': 0.54}
54%|█████▍ | 2433/4506 [2:46:27<2:22:53, 4.14s/it]
{'loss': 0.2443, 'grad_norm': 0.41763076186180115, 'learning_rate': 2.5900446740348293e-05, 'epoch': 0.54}
54%|█████▍ | 2434/4506 [2:46:32<2:25:17, 4.21s/it]
{'loss': 0.2392, 'grad_norm': 0.37047940492630005, 'learning_rate': 2.5881090404113955e-05, 'epoch': 0.54}
54%|█████▍ | 2435/4506 [2:46:36<2:23:09, 4.15s/it]
{'loss': 0.2505, 'grad_norm': 0.42542946338653564, 'learning_rate': 2.5861733539022352e-05, 'epoch': 0.54}
54%|█████▍ | 2436/4506 [2:46:39<2:18:40, 4.02s/it]
{'loss': 0.2355, 'grad_norm': 0.39750874042510986, 'learning_rate': 2.5842376156692062e-05, 'epoch': 0.54}
54%|█████▍ | 2437/4506 [2:46:44<2:21:58, 4.12s/it]
{'loss': 0.2447, 'grad_norm': 0.3703449070453644, 'learning_rate': 2.582301826874196e-05, 'epoch': 0.54}
54%|█████▍ | 2438/4506 [2:46:48<2:19:59, 4.06s/it]
{'loss': 0.2408, 'grad_norm': 0.42183199524879456, 'learning_rate': 2.5803659886791264e-05, 'epoch': 0.54}
54%|█████▍ | 2439/4506 [2:46:52<2:21:05, 4.10s/it]
{'loss': 0.2333, 'grad_norm': 0.3213618993759155, 'learning_rate': 2.5784301022459444e-05, 'epoch': 0.54}
54%|█████▍ | 2440/4506 [2:46:56<2:17:09, 3.98s/it]
{'loss': 0.2264, 'grad_norm': 0.37858784198760986, 'learning_rate': 2.5764941687366295e-05, 'epoch': 0.54}
54%|█████▍ | 2441/4506 [2:47:00<2:16:32, 3.97s/it]
{'loss': 0.2342, 'grad_norm': 0.33925309777259827, 'learning_rate': 2.574558189313186e-05, 'epoch': 0.54}
54%|█████▍ | 2442/4506 [2:47:04<2:22:48, 4.15s/it]
{'loss': 0.234, 'grad_norm': 0.43977710604667664, 'learning_rate': 2.5726221651376497e-05, 'epoch': 0.54}
54%|█████▍ | 2443/4506 [2:47:08<2:22:03, 4.13s/it]
{'loss': 0.2346, 'grad_norm': 0.33608686923980713, 'learning_rate': 2.57068609737208e-05, 'epoch': 0.54}
54%|█████▍ | 2444/4506 [2:47:12<2:19:05, 4.05s/it]
{'loss': 0.2397, 'grad_norm': 0.3378828763961792, 'learning_rate': 2.568749987178563e-05, 'epoch': 0.54}
54%|█████▍ | 2445/4506 [2:47:16<2:17:58, 4.02s/it]
{'loss': 0.2487, 'grad_norm': 0.40919655561447144, 'learning_rate': 2.566813835719213e-05, 'epoch': 0.54}
54%|█████▍ | 2446/4506 [2:47:20<2:21:46, 4.13s/it]
{'loss': 0.2472, 'grad_norm': 0.4161156713962555, 'learning_rate': 2.5648776441561662e-05, 'epoch': 0.54}
54%|█████▍ | 2447/4506 [2:47:25<2:21:58, 4.14s/it]
{'loss': 0.2384, 'grad_norm': 0.4469252824783325, 'learning_rate': 2.5629414136515826e-05, 'epoch': 0.54}
54%|█████▍ | 2448/4506 [2:47:29<2:21:42, 4.13s/it]
{'loss': 0.2331, 'grad_norm': 0.3525707423686981, 'learning_rate': 2.5610051453676474e-05, 'epoch': 0.54}
54%|█████▍ | 2449/4506 [2:47:33<2:19:59, 4.08s/it]
{'loss': 0.2485, 'grad_norm': 0.3925364017486572, 'learning_rate': 2.5590688404665687e-05, 'epoch': 0.54}
54%|█████▍ | 2450/4506 [2:47:37<2:20:24, 4.10s/it]
{'loss': 0.2374, 'grad_norm': 0.352683424949646, 'learning_rate': 2.5571325001105732e-05, 'epoch': 0.54}
54%|█████▍ | 2451/4506 [2:47:41<2:20:26, 4.10s/it]
{'loss': 0.2324, 'grad_norm': 0.3567129075527191, 'learning_rate': 2.555196125461914e-05, 'epoch': 0.54}
54%|█████▍ | 2452/4506 [2:47:45<2:21:37, 4.14s/it]
{'loss': 0.2469, 'grad_norm': 0.34768596291542053, 'learning_rate': 2.5532597176828606e-05, 'epoch': 0.54}
54%|█████▍ | 2453/4506 [2:47:49<2:20:11, 4.10s/it]
{'loss': 0.2443, 'grad_norm': 0.4225701093673706, 'learning_rate': 2.5513232779357033e-05, 'epoch': 0.54}
54%|█████▍ | 2454/4506 [2:47:53<2:19:39, 4.08s/it]
{'loss': 0.2416, 'grad_norm': 0.37238362431526184, 'learning_rate': 2.5493868073827536e-05, 'epoch': 0.54}
54%|█████▍ | 2455/4506 [2:47:57<2:16:45, 4.00s/it]
{'loss': 0.2485, 'grad_norm': 0.43071213364601135, 'learning_rate': 2.54745030718634e-05, 'epoch': 0.54}
55%|█████▍ | 2456/4506 [2:48:01<2:15:57, 3.98s/it]
{'loss': 0.2337, 'grad_norm': 0.35716119408607483, 'learning_rate': 2.5455137785088073e-05, 'epoch': 0.55}
55%|█████▍ | 2457/4506 [2:48:05<2:17:33, 4.03s/it]
{'loss': 0.2402, 'grad_norm': 0.3672958314418793, 'learning_rate': 2.543577222512521e-05, 'epoch': 0.55}
55%|█████▍ | 2458/4506 [2:48:09<2:15:34, 3.97s/it]
{'loss': 0.2278, 'grad_norm': 0.35634511709213257, 'learning_rate': 2.541640640359859e-05, 'epoch': 0.55}
55%|█████▍ | 2459/4506 [2:48:13<2:15:12, 3.96s/it]
{'loss': 0.243, 'grad_norm': 0.38952627778053284, 'learning_rate': 2.5397040332132183e-05, 'epoch': 0.55}
55%|█████▍ | 2460/4506 [2:48:17<2:15:28, 3.97s/it]
{'loss': 0.2374, 'grad_norm': 0.44269999861717224, 'learning_rate': 2.5377674022350077e-05, 'epoch': 0.55}
55%|█████▍ | 2461/4506 [2:48:21<2:16:05, 3.99s/it]
{'loss': 0.2366, 'grad_norm': 0.3887978792190552, 'learning_rate': 2.5358307485876543e-05, 'epoch': 0.55}
55%|█████▍ | 2462/4506 [2:48:25<2:15:17, 3.97s/it]
{'loss': 0.2216, 'grad_norm': 0.33021676540374756, 'learning_rate': 2.5338940734335954e-05, 'epoch': 0.55}
55%|█████▍ | 2463/4506 [2:48:29<2:17:36, 4.04s/it]
{'loss': 0.2342, 'grad_norm': 0.4106321632862091, 'learning_rate': 2.5319573779352823e-05, 'epoch': 0.55}
55%|█████▍ | 2464/4506 [2:48:33<2:18:38, 4.07s/it]
{'loss': 0.2442, 'grad_norm': 0.44007638096809387, 'learning_rate': 2.5300206632551783e-05, 'epoch': 0.55}
55%|█████▍ | 2465/4506 [2:48:38<2:21:06, 4.15s/it]
{'loss': 0.2418, 'grad_norm': 0.37916022539138794, 'learning_rate': 2.528083930555759e-05, 'epoch': 0.55}
55%|█████▍ | 2466/4506 [2:48:42<2:23:49, 4.23s/it]
{'loss': 0.2296, 'grad_norm': 0.35611438751220703, 'learning_rate': 2.526147180999509e-05, 'epoch': 0.55}
55%|█████▍ | 2467/4506 [2:48:46<2:21:31, 4.16s/it]
{'loss': 0.2334, 'grad_norm': 0.3807065486907959, 'learning_rate': 2.5242104157489267e-05, 'epoch': 0.55}
55%|█████▍ | 2468/4506 [2:48:50<2:18:39, 4.08s/it]
{'loss': 0.2364, 'grad_norm': 0.3898019790649414, 'learning_rate': 2.522273635966516e-05, 'epoch': 0.55}
55%|█████▍ | 2469/4506 [2:48:54<2:18:15, 4.07s/it]
{'loss': 0.2426, 'grad_norm': 0.3537741005420685, 'learning_rate': 2.5203368428147907e-05, 'epoch': 0.55}
55%|█████▍ | 2470/4506 [2:48:58<2:16:48, 4.03s/it]
{'loss': 0.231, 'grad_norm': 0.38833606243133545, 'learning_rate': 2.5184000374562738e-05, 'epoch': 0.55}
55%|█████▍ | 2471/4506 [2:49:02<2:19:05, 4.10s/it]
{'loss': 0.2339, 'grad_norm': 0.37451472878456116, 'learning_rate': 2.516463221053495e-05, 'epoch': 0.55}
55%|█████▍ | 2472/4506 [2:49:06<2:19:03, 4.10s/it]
{'loss': 0.2396, 'grad_norm': 0.36287838220596313, 'learning_rate': 2.5145263947689894e-05, 'epoch': 0.55}
55%|█████▍ | 2473/4506 [2:49:11<2:21:58, 4.19s/it]
{'loss': 0.2344, 'grad_norm': 0.3623603284358978, 'learning_rate': 2.5125895597653003e-05, 'epoch': 0.55}
55%|█████▍ | 2474/4506 [2:49:15<2:21:05, 4.17s/it]
{'loss': 0.2408, 'grad_norm': 0.4023028612136841, 'learning_rate': 2.5106527172049744e-05, 'epoch': 0.55}
55%|█████▍ | 2475/4506 [2:49:19<2:19:05, 4.11s/it]
{'loss': 0.2258, 'grad_norm': 0.40840476751327515, 'learning_rate': 2.5087158682505624e-05, 'epoch': 0.55}
55%|█████▍ | 2476/4506 [2:49:23<2:16:14, 4.03s/it]
{'loss': 0.2291, 'grad_norm': 0.3515484929084778, 'learning_rate': 2.5067790140646226e-05, 'epoch': 0.55}
55%|█████▍ | 2477/4506 [2:49:27<2:18:58, 4.11s/it]
{'loss': 0.2279, 'grad_norm': 0.3829165995121002, 'learning_rate': 2.5048421558097113e-05, 'epoch': 0.55}
55%|█████▍ | 2478/4506 [2:49:31<2:24:14, 4.27s/it]
{'loss': 0.2467, 'grad_norm': 0.403791606426239, 'learning_rate': 2.502905294648391e-05, 'epoch': 0.55}
55%|█████▌ | 2479/4506 [2:49:36<2:23:03, 4.23s/it]
{'loss': 0.2376, 'grad_norm': 0.4148090183734894, 'learning_rate': 2.5009684317432243e-05, 'epoch': 0.55}
55%|█████▌ | 2480/4506 [2:49:40<2:19:39, 4.14s/it]
{'loss': 0.2348, 'grad_norm': 0.37204453349113464, 'learning_rate': 2.499031568256776e-05, 'epoch': 0.55}
55%|█████▌ | 2481/4506 [2:49:44<2:22:36, 4.23s/it]
{'loss': 0.2367, 'grad_norm': 0.37942788004875183, 'learning_rate': 2.4970947053516094e-05, 'epoch': 0.55}
55%|█████▌ | 2482/4506 [2:49:48<2:22:51, 4.23s/it]
{'loss': 0.2443, 'grad_norm': 0.39656880497932434, 'learning_rate': 2.4951578441902886e-05, 'epoch': 0.55}
55%|█████▌ | 2483/4506 [2:49:53<2:25:34, 4.32s/it]
{'loss': 0.2315, 'grad_norm': 0.37516388297080994, 'learning_rate': 2.4932209859353783e-05, 'epoch': 0.55}
55%|█████▌ | 2484/4506 [2:49:57<2:22:55, 4.24s/it]
{'loss': 0.2356, 'grad_norm': 0.33395901322364807, 'learning_rate': 2.4912841317494375e-05, 'epoch': 0.55}
55%|█████▌ | 2485/4506 [2:50:01<2:19:50, 4.15s/it]
{'loss': 0.2468, 'grad_norm': 0.8175527453422546, 'learning_rate': 2.4893472827950262e-05, 'epoch': 0.55}
55%|█████▌ | 2486/4506 [2:50:05<2:20:52, 4.18s/it]
{'loss': 0.2369, 'grad_norm': 0.3391227722167969, 'learning_rate': 2.4874104402347e-05, 'epoch': 0.55}
55%|█████▌ | 2487/4506 [2:50:09<2:20:24, 4.17s/it]
{'loss': 0.2338, 'grad_norm': 0.32947078347206116, 'learning_rate': 2.4854736052310108e-05, 'epoch': 0.55}
55%|█████▌ | 2488/4506 [2:50:14<2:24:22, 4.29s/it]
{'loss': 0.2408, 'grad_norm': 0.4009400010108948, 'learning_rate': 2.4835367789465057e-05, 'epoch': 0.55}
55%|█████▌ | 2489/4506 [2:50:18<2:26:37, 4.36s/it]
{'loss': 0.244, 'grad_norm': 0.3658909201622009, 'learning_rate': 2.4815999625437265e-05, 'epoch': 0.55}
55%|█████▌ | 2490/4506 [2:50:22<2:24:04, 4.29s/it]
{'loss': 0.237, 'grad_norm': 0.34380680322647095, 'learning_rate': 2.47966315718521e-05, 'epoch': 0.55}
55%|█████▌ | 2491/4506 [2:50:26<2:21:35, 4.22s/it]
{'loss': 0.2275, 'grad_norm': 0.3195788562297821, 'learning_rate': 2.4777263640334844e-05, 'epoch': 0.55}
55%|█████▌ | 2492/4506 [2:50:31<2:21:04, 4.20s/it]
{'loss': 0.2311, 'grad_norm': 0.34549951553344727, 'learning_rate': 2.4757895842510742e-05, 'epoch': 0.55}
55%|█████▌ | 2493/4506 [2:50:35<2:19:29, 4.16s/it]
{'loss': 0.2389, 'grad_norm': 0.36969393491744995, 'learning_rate': 2.4738528190004913e-05, 'epoch': 0.55}
55%|█████▌ | 2494/4506 [2:50:38<2:14:49, 4.02s/it]
{'loss': 0.2314, 'grad_norm': 0.35439589619636536, 'learning_rate': 2.471916069444242e-05, 'epoch': 0.55}
55%|█████▌ | 2495/4506 [2:50:42<2:12:46, 3.96s/it]
{'loss': 0.2479, 'grad_norm': 0.3897327780723572, 'learning_rate': 2.4699793367448223e-05, 'epoch': 0.55}
55%|█████▌ | 2496/4506 [2:50:47<2:17:47, 4.11s/it]
{'loss': 0.2475, 'grad_norm': 0.4265716075897217, 'learning_rate': 2.468042622064719e-05, 'epoch': 0.55}
55%|█████▌ | 2497/4506 [2:50:51<2:17:12, 4.10s/it]
{'loss': 0.2347, 'grad_norm': 0.38560348749160767, 'learning_rate': 2.4661059265664052e-05, 'epoch': 0.55}
55%|█████▌ | 2498/4506 [2:50:55<2:18:43, 4.14s/it]
{'loss': 0.2453, 'grad_norm': 0.4356807768344879, 'learning_rate': 2.464169251412346e-05, 'epoch': 0.55}
55%|█████▌ | 2499/4506 [2:50:59<2:17:29, 4.11s/it]
{'loss': 0.2251, 'grad_norm': 0.37705981731414795, 'learning_rate': 2.462232597764992e-05, 'epoch': 0.55}
55%|█████▌ | 2500/4506 [2:51:03<2:21:42, 4.24s/it]
{'loss': 0.241, 'grad_norm': 0.381303995847702, 'learning_rate': 2.4602959667867823e-05, 'epoch': 0.55}
56%|█████▌ | 2501/4506 [2:51:08<2:19:34, 4.18s/it]
{'loss': 0.2412, 'grad_norm': 0.34141016006469727, 'learning_rate': 2.4583593596401408e-05, 'epoch': 0.56}
56%|█████▌ | 2502/4506 [2:51:11<2:16:23, 4.08s/it]
{'loss': 0.2424, 'grad_norm': 0.39120879769325256, 'learning_rate': 2.4564227774874798e-05, 'epoch': 0.56}
56%|█████▌ | 2503/4506 [2:51:15<2:13:35, 4.00s/it]
{'loss': 0.2461, 'grad_norm': 0.4279358386993408, 'learning_rate': 2.4544862214911926e-05, 'epoch': 0.56}
56%|█████▌ | 2504/4506 [2:51:19<2:10:49, 3.92s/it]
{'loss': 0.2354, 'grad_norm': 0.3745264708995819, 'learning_rate': 2.4525496928136603e-05, 'epoch': 0.56}
56%|█████▌ | 2505/4506 [2:51:23<2:12:23, 3.97s/it]
{'loss': 0.2289, 'grad_norm': 0.3896828591823578, 'learning_rate': 2.450613192617247e-05, 'epoch': 0.56}
56%|█████▌ | 2506/4506 [2:51:28<2:19:00, 4.17s/it]
{'loss': 0.2425, 'grad_norm': 0.3695014417171478, 'learning_rate': 2.4486767220642973e-05, 'epoch': 0.56}
56%|█████▌ | 2507/4506 [2:51:32<2:16:01, 4.08s/it]
{'loss': 0.2509, 'grad_norm': 0.4117131531238556, 'learning_rate': 2.4467402823171407e-05, 'epoch': 0.56}
56%|█████▌ | 2508/4506 [2:51:36<2:15:58, 4.08s/it]
{'loss': 0.248, 'grad_norm': 0.4185125231742859, 'learning_rate': 2.4448038745380866e-05, 'epoch': 0.56}
56%|█████▌ | 2509/4506 [2:51:40<2:15:43, 4.08s/it]
{'loss': 0.2382, 'grad_norm': 0.39710694551467896, 'learning_rate': 2.442867499889427e-05, 'epoch': 0.56}
56%|█████▌ | 2510/4506 [2:51:44<2:14:46, 4.05s/it]
{'loss': 0.237, 'grad_norm': 0.41450434923171997, 'learning_rate': 2.4409311595334322e-05, 'epoch': 0.56}
56%|█████▌ | 2511/4506 [2:51:48<2:15:27, 4.07s/it]
{'loss': 0.2467, 'grad_norm': 0.41210630536079407, 'learning_rate': 2.4389948546323535e-05, 'epoch': 0.56}
56%|█████▌ | 2512/4506 [2:51:52<2:13:39, 4.02s/it]
{'loss': 0.2258, 'grad_norm': 0.3939395248889923, 'learning_rate': 2.437058586348418e-05, 'epoch': 0.56}
56%|█████▌ | 2513/4506 [2:51:56<2:14:26, 4.05s/it]
{'loss': 0.2484, 'grad_norm': 0.39773988723754883, 'learning_rate': 2.4351223558438347e-05, 'epoch': 0.56}
56%|█████▌ | 2514/4506 [2:51:59<2:09:35, 3.90s/it]
{'loss': 0.2202, 'grad_norm': 0.35645389556884766, 'learning_rate': 2.433186164280787e-05, 'epoch': 0.56}
56%|█████▌ | 2515/4506 [2:52:03<2:10:43, 3.94s/it]
{'loss': 0.2321, 'grad_norm': 0.35999608039855957, 'learning_rate': 2.4312500128214372e-05, 'epoch': 0.56}
56%|█████▌ | 2516/4506 [2:52:07<2:12:11, 3.99s/it]
{'loss': 0.2408, 'grad_norm': 0.34132155776023865, 'learning_rate': 2.4293139026279207e-05, 'epoch': 0.56}
56%|█████▌ | 2517/4506 [2:52:11<2:11:48, 3.98s/it]
{'loss': 0.2488, 'grad_norm': 0.4253360331058502, 'learning_rate': 2.427377834862351e-05, 'epoch': 0.56}
56%|█████▌ | 2518/4506 [2:52:15<2:12:38, 4.00s/it]
{'loss': 0.234, 'grad_norm': 0.3637419641017914, 'learning_rate': 2.4254418106868136e-05, 'epoch': 0.56}
56%|█████▌ | 2519/4506 [2:52:20<2:14:49, 4.07s/it]
{'loss': 0.2259, 'grad_norm': 0.3821588456630707, 'learning_rate': 2.423505831263371e-05, 'epoch': 0.56}
56%|█████▌ | 2520/4506 [2:52:23<2:11:03, 3.96s/it]
{'loss': 0.2251, 'grad_norm': 0.4065185785293579, 'learning_rate': 2.4215698977540552e-05, 'epoch': 0.56}
56%|█████▌ | 2521/4506 [2:52:27<2:09:39, 3.92s/it]
{'loss': 0.2398, 'grad_norm': 0.3469301760196686, 'learning_rate': 2.4196340113208742e-05, 'epoch': 0.56}
56%|█████▌ | 2522/4506 [2:52:31<2:12:26, 4.01s/it]
{'loss': 0.2332, 'grad_norm': 0.37017446756362915, 'learning_rate': 2.4176981731258045e-05, 'epoch': 0.56}
56%|█████▌ | 2523/4506 [2:52:36<2:13:33, 4.04s/it]
{'loss': 0.2359, 'grad_norm': 0.3668469488620758, 'learning_rate': 2.4157623843307947e-05, 'epoch': 0.56}
56%|█████▌ | 2524/4506 [2:52:40<2:12:36, 4.01s/it]
{'loss': 0.2258, 'grad_norm': 0.3720451295375824, 'learning_rate': 2.413826646097766e-05, 'epoch': 0.56}
56%|█████▌ | 2525/4506 [2:52:43<2:11:16, 3.98s/it]
{'loss': 0.2344, 'grad_norm': 0.38657498359680176, 'learning_rate': 2.411890959588605e-05, 'epoch': 0.56}
56%|█████▌ | 2526/4506 [2:52:48<2:12:53, 4.03s/it]
{'loss': 0.2348, 'grad_norm': 0.37200361490249634, 'learning_rate': 2.4099553259651716e-05, 'epoch': 0.56}
56%|█████▌ | 2527/4506 [2:52:51<2:11:39, 3.99s/it]
{'loss': 0.2381, 'grad_norm': 0.4238567650318146, 'learning_rate': 2.4080197463892907e-05, 'epoch': 0.56}
56%|█████▌ | 2528/4506 [2:52:55<2:09:11, 3.92s/it]
{'loss': 0.2307, 'grad_norm': 0.38158831000328064, 'learning_rate': 2.4060842220227574e-05, 'epoch': 0.56}
56%|█████▌ | 2529/4506 [2:52:59<2:10:58, 3.97s/it]
{'loss': 0.2278, 'grad_norm': 0.36272531747817993, 'learning_rate': 2.404148754027331e-05, 'epoch': 0.56}
56%|█████▌ | 2530/4506 [2:53:03<2:11:45, 4.00s/it]
{'loss': 0.2407, 'grad_norm': 0.3729516267776489, 'learning_rate': 2.4022133435647397e-05, 'epoch': 0.56}
56%|█████▌ | 2531/4506 [2:53:07<2:10:37, 3.97s/it]
{'loss': 0.2293, 'grad_norm': 0.40093132853507996, 'learning_rate': 2.4002779917966735e-05, 'epoch': 0.56}
56%|█████▌ | 2532/4506 [2:53:12<2:14:41, 4.09s/it]
{'loss': 0.2324, 'grad_norm': 0.3658462166786194, 'learning_rate': 2.398342699884792e-05, 'epoch': 0.56}
56%|█████▌ | 2533/4506 [2:53:16<2:13:08, 4.05s/it]
{'loss': 0.2307, 'grad_norm': 0.36212408542633057, 'learning_rate': 2.396407468990714e-05, 'epoch': 0.56}
56%|█████▌ | 2534/4506 [2:53:20<2:14:17, 4.09s/it]
{'loss': 0.2295, 'grad_norm': 0.41075748205184937, 'learning_rate': 2.3944723002760268e-05, 'epoch': 0.56}
56%|█████▋ | 2535/4506 [2:53:24<2:20:04, 4.26s/it]
{'loss': 0.2523, 'grad_norm': 0.3905392289161682, 'learning_rate': 2.392537194902274e-05, 'epoch': 0.56}
56%|█████▋ | 2536/4506 [2:53:28<2:15:03, 4.11s/it]
{'loss': 0.239, 'grad_norm': 0.4372241795063019, 'learning_rate': 2.3906021540309675e-05, 'epoch': 0.56}
56%|█████▋ | 2537/4506 [2:53:32<2:14:54, 4.11s/it]
{'loss': 0.2403, 'grad_norm': 0.38821303844451904, 'learning_rate': 2.3886671788235762e-05, 'epoch': 0.56}
56%|█████▋ | 2538/4506 [2:53:36<2:12:50, 4.05s/it]
{'loss': 0.2227, 'grad_norm': 0.38297832012176514, 'learning_rate': 2.386732270441532e-05, 'epoch': 0.56}
56%|█████▋ | 2539/4506 [2:53:40<2:13:58, 4.09s/it]
{'loss': 0.2532, 'grad_norm': 0.4212227165699005, 'learning_rate': 2.3847974300462263e-05, 'epoch': 0.56}
56%|█████▋ | 2540/4506 [2:53:44<2:12:42, 4.05s/it]
{'loss': 0.2352, 'grad_norm': 0.36015042662620544, 'learning_rate': 2.3828626587990067e-05, 'epoch': 0.56}
56%|█████▋ | 2541/4506 [2:53:49<2:15:15, 4.13s/it]
{'loss': 0.237, 'grad_norm': 0.40305182337760925, 'learning_rate': 2.3809279578611844e-05, 'epoch': 0.56}
56%|█████▋ | 2542/4506 [2:53:53<2:12:54, 4.06s/it]
{'loss': 0.2235, 'grad_norm': 0.45331865549087524, 'learning_rate': 2.3789933283940234e-05, 'epoch': 0.56}
56%|█████▋ | 2542/4506 [2:53:53<2:12:54, 4.06s/it]
56%|█████▋ | 2543/4506 [2:53:57<2:13:26, 4.08s/it]
{'loss': 0.2298, 'grad_norm': 0.38774335384368896, 'learning_rate': 2.3770587715587505e-05, 'epoch': 0.56}
56%|█████▋ | 2543/4506 [2:53:57<2:13:26, 4.08s/it]
56%|█████▋ | 2544/4506 [2:54:01<2:14:37, 4.12s/it]
{'loss': 0.2392, 'grad_norm': 0.37722957134246826, 'learning_rate': 2.375124288516542e-05, 'epoch': 0.56}
56%|█████▋ | 2544/4506 [2:54:01<2:14:37, 4.12s/it]
56%|█████▋ | 2545/4506 [2:54:05<2:15:39, 4.15s/it]
{'loss': 0.2514, 'grad_norm': 0.39813360571861267, 'learning_rate': 2.3731898804285356e-05, 'epoch': 0.56}
56%|█████▋ | 2545/4506 [2:54:05<2:15:39, 4.15s/it]
57%|█████▋ | 2546/4506 [2:54:10<2:18:46, 4.25s/it]
{'loss': 0.2345, 'grad_norm': 0.38910555839538574, 'learning_rate': 2.371255548455821e-05, 'epoch': 0.57}
57%|█████▋ | 2546/4506 [2:54:10<2:18:46, 4.25s/it]
57%|█████▋ | 2547/4506 [2:54:14<2:19:22, 4.27s/it]
{'loss': 0.2362, 'grad_norm': 0.35680505633354187, 'learning_rate': 2.369321293759444e-05, 'epoch': 0.57}
57%|█████▋ | 2547/4506 [2:54:14<2:19:22, 4.27s/it]
57%|█████▋ | 2548/4506 [2:54:19<2:22:48, 4.38s/it]
{'loss': 0.2392, 'grad_norm': 0.39728134870529175, 'learning_rate': 2.3673871175004018e-05, 'epoch': 0.57}
57%|█████▋ | 2548/4506 [2:54:19<2:22:48, 4.38s/it]
57%|█████▋ | 2549/4506 [2:54:23<2:20:19, 4.30s/it]
{'loss': 0.2416, 'grad_norm': 0.4608924388885498, 'learning_rate': 2.365453020839648e-05, 'epoch': 0.57}
57%|█████▋ | 2549/4506 [2:54:23<2:20:19, 4.30s/it]
57%|█████▋ | 2550/4506 [2:54:27<2:16:58, 4.20s/it]
{'loss': 0.232, 'grad_norm': 0.4218224287033081, 'learning_rate': 2.3635190049380835e-05, 'epoch': 0.57}
57%|█████▋ | 2550/4506 [2:54:27<2:16:58, 4.20s/it]
57%|█████▋ | 2551/4506 [2:54:31<2:18:34, 4.25s/it]
{'loss': 0.2398, 'grad_norm': 0.3466421365737915, 'learning_rate': 2.3615850709565655e-05, 'epoch': 0.57}
57%|█████▋ | 2551/4506 [2:54:31<2:18:34, 4.25s/it]
57%|█████▋ | 2552/4506 [2:54:35<2:16:50, 4.20s/it]
{'loss': 0.2251, 'grad_norm': 0.4162239134311676, 'learning_rate': 2.3596512200558987e-05, 'epoch': 0.57}
57%|█████▋ | 2552/4506 [2:54:35<2:16:50, 4.20s/it]
57%|█████▋ | 2553/4506 [2:54:39<2:13:47, 4.11s/it]
{'loss': 0.2359, 'grad_norm': 0.42474204301834106, 'learning_rate': 2.3577174533968412e-05, 'epoch': 0.57}
57%|█████▋ | 2553/4506 [2:54:39<2:13:47, 4.11s/it]
57%|█████▋ | 2554/4506 [2:54:43<2:14:13, 4.13s/it]
{'loss': 0.2425, 'grad_norm': 0.4160434901714325, 'learning_rate': 2.3557837721400964e-05, 'epoch': 0.57}
57%|█████▋ | 2554/4506 [2:54:43<2:14:13, 4.13s/it]
57%|█████▋ | 2555/4506 [2:54:47<2:14:44, 4.14s/it]
{'loss': 0.2286, 'grad_norm': 0.3442448079586029, 'learning_rate': 2.3538501774463197e-05, 'epoch': 0.57}
57%|█████▋ | 2555/4506 [2:54:47<2:14:44, 4.14s/it]
57%|█████▋ | 2556/4506 [2:54:51<2:11:18, 4.04s/it]
{'loss': 0.2314, 'grad_norm': 0.42377498745918274, 'learning_rate': 2.3519166704761143e-05, 'epoch': 0.57}
57%|█████▋ | 2556/4506 [2:54:51<2:11:18, 4.04s/it]
57%|█████▋ | 2557/4506 [2:54:55<2:10:20, 4.01s/it]
{'loss': 0.2381, 'grad_norm': 0.39344069361686707, 'learning_rate': 2.349983252390027e-05, 'epoch': 0.57}
57%|█████▋ | 2557/4506 [2:54:55<2:10:20, 4.01s/it]
57%|█████▋ | 2558/4506 [2:55:00<2:15:51, 4.18s/it]
{'loss': 0.2401, 'grad_norm': 0.40064918994903564, 'learning_rate': 2.3480499243485578e-05, 'epoch': 0.57}
57%|█████▋ | 2558/4506 [2:55:00<2:15:51, 4.18s/it]
57%|█████▋ | 2559/4506 [2:55:04<2:14:17, 4.14s/it]
{'loss': 0.2303, 'grad_norm': 0.36166784167289734, 'learning_rate': 2.346116687512146e-05, 'epoch': 0.57}
57%|█████▋ | 2559/4506 [2:55:04<2:14:17, 4.14s/it]
57%|█████▋ | 2560/4506 [2:55:08<2:14:27, 4.15s/it]
{'loss': 0.229, 'grad_norm': 0.419587641954422, 'learning_rate': 2.3441835430411812e-05, 'epoch': 0.57}
57%|█████▋ | 2560/4506 [2:55:08<2:14:27, 4.15s/it]
57%|█████▋ | 2561/4506 [2:55:12<2:13:17, 4.11s/it]
{'loss': 0.2467, 'grad_norm': 0.44946563243865967, 'learning_rate': 2.3422504920959942e-05, 'epoch': 0.57}
57%|█████▋ | 2561/4506 [2:55:12<2:13:17, 4.11s/it]
57%|█████▋ | 2562/4506 [2:55:16<2:17:02, 4.23s/it]
{'loss': 0.2364, 'grad_norm': 0.3835078775882721, 'learning_rate': 2.340317535836863e-05, 'epoch': 0.57}
57%|█████▋ | 2562/4506 [2:55:16<2:17:02, 4.23s/it]
57%|█████▋ | 2563/4506 [2:55:20<2:12:17, 4.09s/it]
{'loss': 0.2311, 'grad_norm': 0.42812392115592957, 'learning_rate': 2.3383846754240038e-05, 'epoch': 0.57}
57%|█████▋ | 2563/4506 [2:55:20<2:12:17, 4.09s/it]
57%|█████▋ | 2564/4506 [2:55:24<2:11:34, 4.07s/it]
{'loss': 0.2236, 'grad_norm': 0.33638060092926025, 'learning_rate': 2.3364519120175806e-05, 'epoch': 0.57}
57%|█████▋ | 2564/4506 [2:55:24<2:11:34, 4.07s/it]
57%|█████▋ | 2565/4506 [2:55:29<2:14:08, 4.15s/it]
{'loss': 0.2343, 'grad_norm': 0.35015207529067993, 'learning_rate': 2.3345192467776957e-05, 'epoch': 0.57}
57%|█████▋ | 2565/4506 [2:55:29<2:14:08, 4.15s/it]
57%|█████▋ | 2566/4506 [2:55:33<2:15:18, 4.19s/it]
{'loss': 0.2201, 'grad_norm': 0.33513763546943665, 'learning_rate': 2.3325866808643937e-05, 'epoch': 0.57}
57%|█████▋ | 2566/4506 [2:55:33<2:15:18, 4.19s/it]
57%|█████▋ | 2567/4506 [2:55:37<2:12:42, 4.11s/it]
{'loss': 0.2292, 'grad_norm': 0.38199692964553833, 'learning_rate': 2.3306542154376596e-05, 'epoch': 0.57}
57%|█████▋ | 2567/4506 [2:55:37<2:12:42, 4.11s/it]
57%|█████▋ | 2568/4506 [2:55:41<2:13:57, 4.15s/it]
{'loss': 0.2373, 'grad_norm': 0.3660190999507904, 'learning_rate': 2.328721851657419e-05, 'epoch': 0.57}
57%|█████▋ | 2568/4506 [2:55:41<2:13:57, 4.15s/it]
57%|█████▋ | 2569/4506 [2:55:45<2:10:29, 4.04s/it]
{'loss': 0.2257, 'grad_norm': 0.3711663484573364, 'learning_rate': 2.326789590683533e-05, 'epoch': 0.57}
57%|█████▋ | 2569/4506 [2:55:45<2:10:29, 4.04s/it]
57%|█████▋ | 2570/4506 [2:55:49<2:12:08, 4.10s/it]
{'loss': 0.2311, 'grad_norm': 0.369683176279068, 'learning_rate': 2.324857433675806e-05, 'epoch': 0.57}
57%|█████▋ | 2570/4506 [2:55:49<2:12:08, 4.10s/it]
57%|█████▋ | 2571/4506 [2:55:54<2:17:37, 4.27s/it]
{'loss': 0.2404, 'grad_norm': 0.4091588258743286, 'learning_rate': 2.3229253817939754e-05, 'epoch': 0.57}
57%|█████▋ | 2571/4506 [2:55:54<2:17:37, 4.27s/it]
57%|█████▋ | 2572/4506 [2:55:58<2:17:47, 4.27s/it]
{'loss': 0.2529, 'grad_norm': 0.36540907621383667, 'learning_rate': 2.3209934361977195e-05, 'epoch': 0.57}
57%|█████▋ | 2572/4506 [2:55:58<2:17:47, 4.27s/it]
57%|█████▋ | 2573/4506 [2:56:02<2:12:58, 4.13s/it]
{'loss': 0.2258, 'grad_norm': 0.33905407786369324, 'learning_rate': 2.319061598046649e-05, 'epoch': 0.57}
57%|█████▋ | 2573/4506 [2:56:02<2:12:58, 4.13s/it]
57%|█████▋ | 2574/4506 [2:56:06<2:12:53, 4.13s/it]
{'loss': 0.2352, 'grad_norm': 0.3749675154685974, 'learning_rate': 2.3171298685003124e-05, 'epoch': 0.57}
57%|█████▋ | 2574/4506 [2:56:06<2:12:53, 4.13s/it]
57%|█████▋ | 2575/4506 [2:56:10<2:11:17, 4.08s/it]
{'loss': 0.2274, 'grad_norm': 0.36114898324012756, 'learning_rate': 2.315198248718194e-05, 'epoch': 0.57}
57%|█████▋ | 2575/4506 [2:56:10<2:11:17, 4.08s/it]
57%|█████▋ | 2576/4506 [2:56:14<2:07:22, 3.96s/it]
{'loss': 0.2339, 'grad_norm': 0.36217746138572693, 'learning_rate': 2.3132667398597075e-05, 'epoch': 0.57}
57%|█████▋ | 2576/4506 [2:56:14<2:07:22, 3.96s/it]
57%|█████▋ | 2577/4506 [2:56:18<2:09:57, 4.04s/it]
{'loss': 0.2379, 'grad_norm': 0.381683349609375, 'learning_rate': 2.311335343084207e-05, 'epoch': 0.57}
57%|█████▋ | 2577/4506 [2:56:18<2:09:57, 4.04s/it]
57%|█████▋ | 2578/4506 [2:56:22<2:07:45, 3.98s/it]
{'loss': 0.2326, 'grad_norm': 0.32297879457473755, 'learning_rate': 2.3094040595509735e-05, 'epoch': 0.57}
57%|█████▋ | 2578/4506 [2:56:22<2:07:45, 3.98s/it]
57%|█████▋ | 2579/4506 [2:56:26<2:10:03, 4.05s/it]
{'loss': 0.2288, 'grad_norm': 0.32745566964149475, 'learning_rate': 2.3074728904192226e-05, 'epoch': 0.57}
57%|█████▋ | 2579/4506 [2:56:26<2:10:03, 4.05s/it]
57%|█████▋ | 2580/4506 [2:56:30<2:09:55, 4.05s/it]
{'loss': 0.2362, 'grad_norm': 0.35809239745140076, 'learning_rate': 2.3055418368481e-05, 'epoch': 0.57}
57%|█████▋ | 2580/4506 [2:56:30<2:09:55, 4.05s/it]
57%|█████▋ | 2581/4506 [2:56:34<2:10:45, 4.08s/it]
{'loss': 0.2281, 'grad_norm': 0.39250946044921875, 'learning_rate': 2.3036108999966853e-05, 'epoch': 0.57}
57%|█████▋ | 2581/4506 [2:56:34<2:10:45, 4.08s/it]
57%|█████▋ | 2582/4506 [2:56:38<2:12:59, 4.15s/it]
{'loss': 0.2264, 'grad_norm': 0.35919854044914246, 'learning_rate': 2.3016800810239828e-05, 'epoch': 0.57}
57%|█████▋ | 2582/4506 [2:56:38<2:12:59, 4.15s/it]
57%|█████▋ | 2583/4506 [2:56:42<2:10:51, 4.08s/it]
{'loss': 0.2398, 'grad_norm': 0.3681199252605438, 'learning_rate': 2.29974938108893e-05, 'epoch': 0.57}
57%|█████▋ | 2583/4506 [2:56:42<2:10:51, 4.08s/it]
57%|█████▋ | 2584/4506 [2:56:46<2:10:31, 4.07s/it]
{'loss': 0.2419, 'grad_norm': 0.4121382236480713, 'learning_rate': 2.297818801350391e-05, 'epoch': 0.57}
57%|█████▋ | 2584/4506 [2:56:46<2:10:31, 4.07s/it]
57%|█████▋ | 2585/4506 [2:56:50<2:06:42, 3.96s/it]
{'loss': 0.242, 'grad_norm': 0.4062412977218628, 'learning_rate': 2.29588834296716e-05, 'epoch': 0.57}
57%|█████▋ | 2585/4506 [2:56:50<2:06:42, 3.96s/it]
57%|█████▋ | 2586/4506 [2:56:54<2:09:31, 4.05s/it]
{'loss': 0.2366, 'grad_norm': 0.37294894456863403, 'learning_rate': 2.293958007097955e-05, 'epoch': 0.57}
57%|█████▋ | 2586/4506 [2:56:54<2:09:31, 4.05s/it]
57%|█████▋ | 2587/4506 [2:56:58<2:11:11, 4.10s/it]
{'loss': 0.2327, 'grad_norm': 0.41117990016937256, 'learning_rate': 2.2920277949014246e-05, 'epoch': 0.57}
57%|█████▋ | 2587/4506 [2:56:58<2:11:11, 4.10s/it]
57%|█████▋ | 2588/4506 [2:57:03<2:10:46, 4.09s/it]
{'loss': 0.2258, 'grad_norm': 0.372989296913147, 'learning_rate': 2.2900977075361382e-05, 'epoch': 0.57}
57%|█████▋ | 2588/4506 [2:57:03<2:10:46, 4.09s/it]
57%|█████▋ | 2589/4506 [2:57:06<2:08:41, 4.03s/it]
{'loss': 0.2424, 'grad_norm': 0.414748877286911, 'learning_rate': 2.288167746160595e-05, 'epoch': 0.57}
57%|█████▋ | 2589/4506 [2:57:06<2:08:41, 4.03s/it]
57%|█████▋ | 2590/4506 [2:57:10<2:08:33, 4.03s/it]
{'loss': 0.2225, 'grad_norm': 0.40251415967941284, 'learning_rate': 2.2862379119332166e-05, 'epoch': 0.57}
57%|█████▋ | 2590/4506 [2:57:10<2:08:33, 4.03s/it]
58%|█████▊ | 2591/4506 [2:57:15<2:11:24, 4.12s/it]
{'loss': 0.2406, 'grad_norm': 0.3867556154727936, 'learning_rate': 2.284308206012346e-05, 'epoch': 0.58}
58%|█████▊ | 2591/4506 [2:57:15<2:11:24, 4.12s/it]
58%|█████▊ | 2592/4506 [2:57:18<2:07:52, 4.01s/it]
{'loss': 0.2262, 'grad_norm': 0.3776716887950897, 'learning_rate': 2.2823786295562536e-05, 'epoch': 0.58}
58%|█████▊ | 2592/4506 [2:57:19<2:07:52, 4.01s/it]
58%|█████▊ | 2593/4506 [2:57:22<2:06:42, 3.97s/it]
{'loss': 0.2475, 'grad_norm': 0.42327818274497986, 'learning_rate': 2.280449183723129e-05, 'epoch': 0.58}
58%|█████▊ | 2593/4506 [2:57:22<2:06:42, 3.97s/it]
58%|█████▊ | 2594/4506 [2:57:26<2:07:24, 4.00s/it]
{'loss': 0.2392, 'grad_norm': 0.4017113149166107, 'learning_rate': 2.278519869671085e-05, 'epoch': 0.58}
58%|█████▊ | 2594/4506 [2:57:26<2:07:24, 4.00s/it]
58%|█████▊ | 2595/4506 [2:57:30<2:05:42, 3.95s/it]
{'loss': 0.242, 'grad_norm': 0.40357857942581177, 'learning_rate': 2.276590688558153e-05, 'epoch': 0.58}
58%|█████▊ | 2595/4506 [2:57:30<2:05:42, 3.95s/it]
58%|█████▊ | 2596/4506 [2:57:34<2:05:23, 3.94s/it]
{'loss': 0.2395, 'grad_norm': 0.40712621808052063, 'learning_rate': 2.2746616415422885e-05, 'epoch': 0.58}
58%|█████▊ | 2596/4506 [2:57:34<2:05:23, 3.94s/it]
58%|█████▊ | 2597/4506 [2:57:38<2:06:34, 3.98s/it]
{'loss': 0.2311, 'grad_norm': 0.3702675700187683, 'learning_rate': 2.2727327297813615e-05, 'epoch': 0.58}
58%|█████▊ | 2597/4506 [2:57:38<2:06:34, 3.98s/it]
58%|█████▊ | 2598/4506 [2:57:42<2:08:10, 4.03s/it]
{'loss': 0.2291, 'grad_norm': 0.3540802597999573, 'learning_rate': 2.2708039544331663e-05, 'epoch': 0.58}
58%|█████▊ | 2598/4506 [2:57:42<2:08:10, 4.03s/it]
58%|█████▊ | 2599/4506 [2:57:47<2:12:48, 4.18s/it]
{'loss': 0.2322, 'grad_norm': 0.4103766083717346, 'learning_rate': 2.2688753166554104e-05, 'epoch': 0.58}
58%|█████▊ | 2599/4506 [2:57:47<2:12:48, 4.18s/it]
58%|█████▊ | 2600/4506 [2:57:51<2:09:32, 4.08s/it]
{'loss': 0.2307, 'grad_norm': 0.4010927379131317, 'learning_rate': 2.266946817605723e-05, 'epoch': 0.58}
58%|█████▊ | 2600/4506 [2:57:51<2:09:32, 4.08s/it]
58%|█████▊ | 2601/4506 [2:57:55<2:10:44, 4.12s/it]
{'loss': 0.2133, 'grad_norm': 0.3894166052341461, 'learning_rate': 2.2650184584416452e-05, 'epoch': 0.58}
58%|█████▊ | 2601/4506 [2:57:55<2:10:44, 4.12s/it]
58%|█████▊ | 2602/4506 [2:57:59<2:12:46, 4.18s/it]
{'loss': 0.2253, 'grad_norm': 0.33512449264526367, 'learning_rate': 2.263090240320639e-05, 'epoch': 0.58}
58%|█████▊ | 2602/4506 [2:57:59<2:12:46, 4.18s/it]
58%|█████▊ | 2603/4506 [2:58:04<2:15:57, 4.29s/it]
{'loss': 0.2434, 'grad_norm': 0.38897520303726196, 'learning_rate': 2.2611621644000786e-05, 'epoch': 0.58}
58%|█████▊ | 2603/4506 [2:58:04<2:15:57, 4.29s/it]
58%|█████▊ | 2604/4506 [2:58:08<2:13:27, 4.21s/it]
{'loss': 0.245, 'grad_norm': 0.40468311309814453, 'learning_rate': 2.2592342318372545e-05, 'epoch': 0.58}
58%|█████▊ | 2604/4506 [2:58:08<2:13:27, 4.21s/it]
58%|█████▊ | 2605/4506 [2:58:12<2:10:47, 4.13s/it]
{'loss': 0.2262, 'grad_norm': 0.42546743154525757, 'learning_rate': 2.2573064437893696e-05, 'epoch': 0.58}
58%|█████▊ | 2605/4506 [2:58:12<2:10:47, 4.13s/it]
58%|█████▊ | 2606/4506 [2:58:16<2:09:14, 4.08s/it]
{'loss': 0.2274, 'grad_norm': 0.3923092782497406, 'learning_rate': 2.2553788014135423e-05, 'epoch': 0.58}
58%|█████▊ | 2606/4506 [2:58:16<2:09:14, 4.08s/it]
58%|█████▊ | 2607/4506 [2:58:20<2:09:35, 4.09s/it]
{'loss': 0.237, 'grad_norm': 0.3663574457168579, 'learning_rate': 2.253451305866801e-05, 'epoch': 0.58}
58%|█████▊ | 2607/4506 [2:58:20<2:09:35, 4.09s/it]
58%|█████▊ | 2608/4506 [2:58:24<2:05:58, 3.98s/it]
{'loss': 0.2251, 'grad_norm': 0.3986949622631073, 'learning_rate': 2.251523958306087e-05, 'epoch': 0.58}
58%|█████▊ | 2608/4506 [2:58:24<2:05:58, 3.98s/it]
58%|█████▊ | 2609/4506 [2:58:28<2:07:15, 4.02s/it]
{'loss': 0.2353, 'grad_norm': 0.39307692646980286, 'learning_rate': 2.2495967598882545e-05, 'epoch': 0.58}
58%|█████▊ | 2609/4506 [2:58:28<2:07:15, 4.02s/it]
58%|█████▊ | 2610/4506 [2:58:32<2:08:21, 4.06s/it]
{'loss': 0.2233, 'grad_norm': 0.3975258767604828, 'learning_rate': 2.2476697117700642e-05, 'epoch': 0.58}
58%|█████▊ | 2610/4506 [2:58:32<2:08:21, 4.06s/it]
58%|█████▊ | 2611/4506 [2:58:36<2:04:33, 3.94s/it]
{'loss': 0.222, 'grad_norm': 0.5034211874008179, 'learning_rate': 2.2457428151081912e-05, 'epoch': 0.58}
58%|█████▊ | 2611/4506 [2:58:36<2:04:33, 3.94s/it]
58%|█████▊ | 2612/4506 [2:58:40<2:06:45, 4.02s/it]
{'loss': 0.2301, 'grad_norm': 0.40435564517974854, 'learning_rate': 2.2438160710592164e-05, 'epoch': 0.58}
58%|█████▊ | 2612/4506 [2:58:40<2:06:45, 4.02s/it]
58%|█████▊ | 2613/4506 [2:58:44<2:09:19, 4.10s/it]
{'loss': 0.2302, 'grad_norm': 0.3942118287086487, 'learning_rate': 2.241889480779631e-05, 'epoch': 0.58}
58%|█████▊ | 2613/4506 [2:58:44<2:09:19, 4.10s/it]
58%|█████▊ | 2614/4506 [2:58:48<2:07:38, 4.05s/it]
{'loss': 0.2312, 'grad_norm': 0.3664464056491852, 'learning_rate': 2.2399630454258315e-05, 'epoch': 0.58}
58%|█████▊ | 2614/4506 [2:58:48<2:07:38, 4.05s/it]
58%|█████▊ | 2615/4506 [2:58:52<2:10:22, 4.14s/it]
{'loss': 0.2246, 'grad_norm': 0.42012330889701843, 'learning_rate': 2.2380367661541256e-05, 'epoch': 0.58}
58%|█████▊ | 2615/4506 [2:58:52<2:10:22, 4.14s/it]
58%|█████▊ | 2616/4506 [2:58:57<2:14:08, 4.26s/it]
{'loss': 0.2376, 'grad_norm': 0.38689446449279785, 'learning_rate': 2.2361106441207222e-05, 'epoch': 0.58}
58%|█████▊ | 2616/4506 [2:58:57<2:14:08, 4.26s/it]
58%|█████▊ | 2617/4506 [2:59:01<2:12:23, 4.21s/it]
{'loss': 0.2505, 'grad_norm': 0.47242170572280884, 'learning_rate': 2.2341846804817397e-05, 'epoch': 0.58}
58%|█████▊ | 2617/4506 [2:59:01<2:12:23, 4.21s/it]
58%|█████▊ | 2618/4506 [2:59:05<2:09:41, 4.12s/it]
{'loss': 0.2381, 'grad_norm': 0.3682031035423279, 'learning_rate': 2.2322588763931998e-05, 'epoch': 0.58}
58%|█████▊ | 2618/4506 [2:59:05<2:09:41, 4.12s/it]
58%|█████▊ | 2619/4506 [2:59:09<2:11:24, 4.18s/it]
{'loss': 0.2381, 'grad_norm': 0.4559778869152069, 'learning_rate': 2.2303332330110293e-05, 'epoch': 0.58}
58%|█████▊ | 2619/4506 [2:59:09<2:11:24, 4.18s/it]
58%|█████▊ | 2620/4506 [2:59:13<2:06:45, 4.03s/it]
{'loss': 0.2205, 'grad_norm': 0.3758236765861511, 'learning_rate': 2.2284077514910566e-05, 'epoch': 0.58}
58%|█████▊ | 2620/4506 [2:59:13<2:06:45, 4.03s/it]
58%|█████▊ | 2621/4506 [2:59:17<2:09:58, 4.14s/it]
{'loss': 0.2342, 'grad_norm': 0.3676392734050751, 'learning_rate': 2.226482432989015e-05, 'epoch': 0.58}
58%|█████▊ | 2621/4506 [2:59:17<2:09:58, 4.14s/it]
58%|█████▊ | 2622/4506 [2:59:21<2:09:29, 4.12s/it]
{'loss': 0.2354, 'grad_norm': 0.3508048355579376, 'learning_rate': 2.2245572786605392e-05, 'epoch': 0.58}
58%|█████▊ | 2622/4506 [2:59:21<2:09:29, 4.12s/it]
58%|█████▊ | 2623/4506 [2:59:25<2:08:42, 4.10s/it]
{'loss': 0.2315, 'grad_norm': 0.3740125000476837, 'learning_rate': 2.222632289661166e-05, 'epoch': 0.58}
58%|█████▊ | 2623/4506 [2:59:25<2:08:42, 4.10s/it]
58%|█████▊ | 2624/4506 [2:59:29<2:08:11, 4.09s/it]
{'loss': 0.2387, 'grad_norm': 0.37275591492652893, 'learning_rate': 2.2207074671463324e-05, 'epoch': 0.58}
58%|█████▊ | 2624/4506 [2:59:29<2:08:11, 4.09s/it]
58%|█████▊ | 2625/4506 [2:59:34<2:07:55, 4.08s/it]
{'loss': 0.2315, 'grad_norm': 0.3624538779258728, 'learning_rate': 2.218782812271374e-05, 'epoch': 0.58}
58%|█████▊ | 2625/4506 [2:59:34<2:07:55, 4.08s/it]
58%|█████▊ | 2626/4506 [2:59:38<2:10:15, 4.16s/it]
{'loss': 0.2416, 'grad_norm': 0.3714732527732849, 'learning_rate': 2.2168583261915283e-05, 'epoch': 0.58}
58%|█████▊ | 2626/4506 [2:59:38<2:10:15, 4.16s/it]
58%|█████▊ | 2627/4506 [2:59:42<2:06:14, 4.03s/it]
{'loss': 0.2252, 'grad_norm': 0.37292543053627014, 'learning_rate': 2.2149340100619295e-05, 'epoch': 0.58}
58%|█████▊ | 2627/4506 [2:59:42<2:06:14, 4.03s/it]
58%|█████▊ | 2628/4506 [2:59:46<2:05:41, 4.02s/it]
{'loss': 0.2231, 'grad_norm': 0.3609100878238678, 'learning_rate': 2.213009865037613e-05, 'epoch': 0.58}
58%|█████▊ | 2628/4506 [2:59:46<2:05:41, 4.02s/it]
58%|█████▊ | 2629/4506 [2:59:49<2:03:45, 3.96s/it]
{'loss': 0.2255, 'grad_norm': 0.34824132919311523, 'learning_rate': 2.211085892273506e-05, 'epoch': 0.58}
58%|█████▊ | 2629/4506 [2:59:49<2:03:45, 3.96s/it]
58%|█████▊ | 2630/4506 [2:59:53<2:03:46, 3.96s/it]
{'loss': 0.2328, 'grad_norm': 0.3847554326057434, 'learning_rate': 2.209162092924438e-05, 'epoch': 0.58}
58%|█████▊ | 2630/4506 [2:59:53<2:03:46, 3.96s/it]
58%|█████▊ | 2631/4506 [2:59:58<2:10:16, 4.17s/it]
{'loss': 0.2369, 'grad_norm': 0.4156075119972229, 'learning_rate': 2.2072384681451303e-05, 'epoch': 0.58}
58%|█████▊ | 2631/4506 [2:59:58<2:10:16, 4.17s/it]
58%|█████▊ | 2632/4506 [3:00:02<2:09:43, 4.15s/it]
{'loss': 0.2415, 'grad_norm': 0.37574583292007446, 'learning_rate': 2.2053150190902027e-05, 'epoch': 0.58}
58%|█████▊ | 2632/4506 [3:00:02<2:09:43, 4.15s/it]
58%|█████▊ | 2633/4506 [3:00:06<2:07:30, 4.08s/it]
{'loss': 0.2212, 'grad_norm': 0.3476894497871399, 'learning_rate': 2.2033917469141655e-05, 'epoch': 0.58}
58%|█████▊ | 2633/4506 [3:00:06<2:07:30, 4.08s/it]
58%|█████▊ | 2634/4506 [3:00:10<2:04:33, 3.99s/it]
{'loss': 0.2375, 'grad_norm': 0.3837743401527405, 'learning_rate': 2.201468652771428e-05, 'epoch': 0.58}
58%|█████▊ | 2634/4506 [3:00:10<2:04:33, 3.99s/it]
58%|█████▊ | 2635/4506 [3:00:14<2:04:01, 3.98s/it]
{'loss': 0.241, 'grad_norm': 0.38108402490615845, 'learning_rate': 2.199545737816287e-05, 'epoch': 0.58}
58%|█████▊ | 2635/4506 [3:00:14<2:04:01, 3.98s/it]
58%|█████▊ | 2636/4506 [3:00:18<2:04:35, 4.00s/it]
{'loss': 0.2238, 'grad_norm': 0.3684070408344269, 'learning_rate': 2.197623003202937e-05, 'epoch': 0.59}
58%|█████▊ | 2636/4506 [3:00:18<2:04:35, 4.00s/it]
59%|█████▊ | 2637/4506 [3:00:22<2:06:05, 4.05s/it]
{'loss': 0.2269, 'grad_norm': 0.33985117077827454, 'learning_rate': 2.1957004500854598e-05, 'epoch': 0.59}
59%|█████▊ | 2637/4506 [3:00:22<2:06:05, 4.05s/it]
59%|█████▊ | 2638/4506 [3:00:26<2:04:57, 4.01s/it]
{'loss': 0.2242, 'grad_norm': 0.34467262029647827, 'learning_rate': 2.1937780796178323e-05, 'epoch': 0.59}
59%|█████▊ | 2638/4506 [3:00:26<2:04:57, 4.01s/it]
59%|█████▊ | 2639/4506 [3:00:30<2:04:36, 4.00s/it]
{'loss': 0.2334, 'grad_norm': 0.3313126862049103, 'learning_rate': 2.1918558929539176e-05, 'epoch': 0.59}
59%|█████▊ | 2639/4506 [3:00:30<2:04:36, 4.00s/it]
59%|█████▊ | 2640/4506 [3:00:34<2:03:47, 3.98s/it]
{'loss': 0.2251, 'grad_norm': 0.45885711908340454, 'learning_rate': 2.1899338912474722e-05, 'epoch': 0.59}
59%|█████▊ | 2640/4506 [3:00:34<2:03:47, 3.98s/it]
59%|█████▊ | 2641/4506 [3:00:38<2:07:48, 4.11s/it]
{'loss': 0.2375, 'grad_norm': 0.33629468083381653, 'learning_rate': 2.18801207565214e-05, 'epoch': 0.59}
59%|█████▊ | 2641/4506 [3:00:38<2:07:48, 4.11s/it]
59%|█████▊ | 2642/4506 [3:00:42<2:06:18, 4.07s/it]
{'loss': 0.23, 'grad_norm': 0.3803044259548187, 'learning_rate': 2.1860904473214513e-05, 'epoch': 0.59}
59%|█████▊ | 2642/4506 [3:00:42<2:06:18, 4.07s/it]
59%|█████▊ | 2643/4506 [3:00:47<2:09:47, 4.18s/it]
{'loss': 0.2297, 'grad_norm': 0.3575625419616699, 'learning_rate': 2.1841690074088285e-05, 'epoch': 0.59}
59%|█████▊ | 2643/4506 [3:00:47<2:09:47, 4.18s/it]
59%|█████▊ | 2644/4506 [3:00:51<2:08:03, 4.13s/it]
{'loss': 0.239, 'grad_norm': 0.35111451148986816, 'learning_rate': 2.1822477570675763e-05, 'epoch': 0.59}
59%|█████▊ | 2644/4506 [3:00:51<2:08:03, 4.13s/it]
59%|█████▊ | 2645/4506 [3:00:55<2:05:45, 4.05s/it]
{'loss': 0.2259, 'grad_norm': 0.35805681347846985, 'learning_rate': 2.180326697450889e-05, 'epoch': 0.59}
59%|█████▊ | 2645/4506 [3:00:55<2:05:45, 4.05s/it]
59%|█████▊ | 2646/4506 [3:00:59<2:05:17, 4.04s/it]
{'loss': 0.2176, 'grad_norm': 0.3602406680583954, 'learning_rate': 2.1784058297118438e-05, 'epoch': 0.59}
59%|█████▊ | 2646/4506 [3:00:59<2:05:17, 4.04s/it]
59%|█████▊ | 2647/4506 [3:01:02<2:03:38, 3.99s/it]
{'loss': 0.2428, 'grad_norm': 0.5423498749732971, 'learning_rate': 2.1764851550034053e-05, 'epoch': 0.59}
59%|█████▊ | 2647/4506 [3:01:02<2:03:38, 3.99s/it]
59%|█████▉ | 2648/4506 [3:01:07<2:04:32, 4.02s/it]
{'loss': 0.2285, 'grad_norm': 0.38414323329925537, 'learning_rate': 2.174564674478419e-05, 'epoch': 0.59}
59%|█████▉ | 2648/4506 [3:01:07<2:04:32, 4.02s/it]
59%|█████▉ | 2649/4506 [3:01:11<2:05:39, 4.06s/it]
{'loss': 0.2432, 'grad_norm': 0.3963199257850647, 'learning_rate': 2.172644389289618e-05, 'epoch': 0.59}
59%|█████▉ | 2649/4506 [3:01:11<2:05:39, 4.06s/it]
59%|█████▉ | 2650/4506 [3:01:15<2:03:44, 4.00s/it]
{'loss': 0.2198, 'grad_norm': 0.3181029260158539, 'learning_rate': 2.1707243005896137e-05, 'epoch': 0.59}
59%|█████▉ | 2650/4506 [3:01:15<2:03:44, 4.00s/it]
59%|█████▉ | 2651/4506 [3:01:19<2:04:04, 4.01s/it]
{'loss': 0.2362, 'grad_norm': 0.3818005323410034, 'learning_rate': 2.1688044095309045e-05, 'epoch': 0.59}
59%|█████▉ | 2651/4506 [3:01:19<2:04:04, 4.01s/it]
59%|█████▉ | 2652/4506 [3:01:22<2:00:43, 3.91s/it]
{'loss': 0.2319, 'grad_norm': 0.3741357624530792, 'learning_rate': 2.166884717265864e-05, 'epoch': 0.59}
59%|█████▉ | 2652/4506 [3:01:22<2:00:43, 3.91s/it]
59%|█████▉ | 2653/4506 [3:01:27<2:05:08, 4.05s/it]
{'loss': 0.2249, 'grad_norm': 0.33457130193710327, 'learning_rate': 2.1649652249467533e-05, 'epoch': 0.59}
59%|█████▉ | 2653/4506 [3:01:27<2:05:08, 4.05s/it]
59%|█████▉ | 2654/4506 [3:01:31<2:05:35, 4.07s/it]
{'loss': 0.238, 'grad_norm': 0.3713972866535187, 'learning_rate': 2.163045933725707e-05, 'epoch': 0.59}
59%|█████▉ | 2654/4506 [3:01:31<2:05:35, 4.07s/it]
59%|█████▉ | 2655/4506 [3:01:35<2:03:11, 3.99s/it]
{'loss': 0.2308, 'grad_norm': 0.4079037606716156, 'learning_rate': 2.161126844754745e-05, 'epoch': 0.59}
59%|█████▉ | 2655/4506 [3:01:35<2:03:11, 3.99s/it]
59%|█████▉ | 2656/4506 [3:01:39<2:03:19, 4.00s/it]
{'loss': 0.2288, 'grad_norm': 0.35551294684410095, 'learning_rate': 2.1592079591857603e-05, 'epoch': 0.59}
59%|█████▉ | 2656/4506 [3:01:39<2:03:19, 4.00s/it]
59%|█████▉ | 2657/4506 [3:01:43<2:04:51, 4.05s/it]
{'loss': 0.23, 'grad_norm': 0.33647772669792175, 'learning_rate': 2.1572892781705287e-05, 'epoch': 0.59}
59%|█████▉ | 2657/4506 [3:01:43<2:04:51, 4.05s/it]
59%|█████▉ | 2658/4506 [3:01:47<2:07:59, 4.16s/it]
{'loss': 0.2416, 'grad_norm': 0.38248908519744873, 'learning_rate': 2.1553708028606995e-05, 'epoch': 0.59}
59%|█████▉ | 2658/4506 [3:01:47<2:07:59, 4.16s/it]
59%|█████▉ | 2659/4506 [3:01:51<2:06:40, 4.12s/it]
{'loss': 0.243, 'grad_norm': 0.40745946764945984, 'learning_rate': 2.1534525344077994e-05, 'epoch': 0.59}
59%|█████▉ | 2659/4506 [3:01:51<2:06:40, 4.12s/it]
59%|█████▉ | 2660/4506 [3:01:55<2:06:10, 4.10s/it]
{'loss': 0.2279, 'grad_norm': 0.32479551434516907, 'learning_rate': 2.151534473963234e-05, 'epoch': 0.59}
59%|█████▉ | 2660/4506 [3:01:55<2:06:10, 4.10s/it]
59%|█████▉ | 2661/4506 [3:02:00<2:08:21, 4.17s/it]
{'loss': 0.2315, 'grad_norm': 0.33735230565071106, 'learning_rate': 2.149616622678278e-05, 'epoch': 0.59}
59%|█████▉ | 2661/4506 [3:02:00<2:08:21, 4.17s/it]
59%|█████▉ | 2662/4506 [3:02:04<2:11:09, 4.27s/it]
{'loss': 0.2327, 'grad_norm': 0.3253001272678375, 'learning_rate': 2.147698981704087e-05, 'epoch': 0.59}
59%|█████▉ | 2662/4506 [3:02:04<2:11:09, 4.27s/it]
59%|█████▉ | 2663/4506 [3:02:08<2:07:49, 4.16s/it]
{'loss': 0.2479, 'grad_norm': 0.38232070207595825, 'learning_rate': 2.1457815521916858e-05, 'epoch': 0.59}
59%|█████▉ | 2663/4506 [3:02:08<2:07:49, 4.16s/it]
59%|█████▉ | 2664/4506 [3:02:13<2:11:53, 4.30s/it]
{'loss': 0.2375, 'grad_norm': 0.39692267775535583, 'learning_rate': 2.1438643352919753e-05, 'epoch': 0.59}
59%|█████▉ | 2664/4506 [3:02:13<2:11:53, 4.30s/it]
59%|█████▉ | 2665/4506 [3:02:17<2:12:23, 4.31s/it]
{'loss': 0.2332, 'grad_norm': 0.341194212436676, 'learning_rate': 2.141947332155726e-05, 'epoch': 0.59}
59%|█████▉ | 2665/4506 [3:02:17<2:12:23, 4.31s/it]
59%|█████▉ | 2666/4506 [3:02:21<2:08:12, 4.18s/it]
{'loss': 0.2198, 'grad_norm': 0.3354461193084717, 'learning_rate': 2.1400305439335833e-05, 'epoch': 0.59}
59%|█████▉ | 2666/4506 [3:02:21<2:08:12, 4.18s/it]
59%|█████▉ | 2667/4506 [3:02:25<2:08:20, 4.19s/it]
{'loss': 0.2343, 'grad_norm': 0.3618800938129425, 'learning_rate': 2.1381139717760596e-05, 'epoch': 0.59}
59%|█████▉ | 2667/4506 [3:02:25<2:08:20, 4.19s/it]
59%|█████▉ | 2668/4506 [3:02:29<2:04:36, 4.07s/it]
{'loss': 0.2322, 'grad_norm': 0.3762582838535309, 'learning_rate': 2.1361976168335412e-05, 'epoch': 0.59}
59%|█████▉ | 2668/4506 [3:02:29<2:04:36, 4.07s/it]
59%|█████▉ | 2669/4506 [3:02:33<2:05:45, 4.11s/it]
{'loss': 0.2214, 'grad_norm': 0.39824584126472473, 'learning_rate': 2.1342814802562823e-05, 'epoch': 0.59}
59%|█████▉ | 2669/4506 [3:02:33<2:05:45, 4.11s/it]
59%|█████▉ | 2670/4506 [3:02:37<2:06:30, 4.13s/it]
{'loss': 0.2316, 'grad_norm': 0.39767149090766907, 'learning_rate': 2.1323655631944073e-05, 'epoch': 0.59}
59%|█████▉ | 2670/4506 [3:02:37<2:06:30, 4.13s/it]
59%|█████▉ | 2671/4506 [3:02:41<2:06:54, 4.15s/it]
{'loss': 0.2184, 'grad_norm': 0.3258102536201477, 'learning_rate': 2.1304498667979056e-05, 'epoch': 0.59}
59%|█████▉ | 2671/4506 [3:02:41<2:06:54, 4.15s/it]
59%|█████▉ | 2672/4506 [3:02:45<2:02:06, 3.99s/it]
{'loss': 0.2239, 'grad_norm': 0.337798148393631, 'learning_rate': 2.1285343922166395e-05, 'epoch': 0.59}
59%|█████▉ | 2672/4506 [3:02:45<2:02:06, 3.99s/it]
59%|█████▉ | 2673/4506 [3:02:49<2:05:20, 4.10s/it]
{'loss': 0.2337, 'grad_norm': 0.3758181035518646, 'learning_rate': 2.126619140600333e-05, 'epoch': 0.59}
59%|█████▉ | 2673/4506 [3:02:49<2:05:20, 4.10s/it]
59%|█████▉ | 2674/4506 [3:02:53<2:05:26, 4.11s/it]
{'loss': 0.2415, 'grad_norm': 0.4237552583217621, 'learning_rate': 2.1247041130985785e-05, 'epoch': 0.59}
59%|█████▉ | 2674/4506 [3:02:54<2:05:26, 4.11s/it]
59%|█████▉ | 2675/4506 [3:02:58<2:08:49, 4.22s/it]
{'loss': 0.2249, 'grad_norm': 0.3838600814342499, 'learning_rate': 2.122789310860835e-05, 'epoch': 0.59}
59%|█████▉ | 2675/4506 [3:02:58<2:08:49, 4.22s/it]
59%|█████▉ | 2676/4506 [3:03:02<2:08:53, 4.23s/it]
{'loss': 0.2192, 'grad_norm': 0.33149340748786926, 'learning_rate': 2.1208747350364236e-05, 'epoch': 0.59}
59%|█████▉ | 2676/4506 [3:03:02<2:08:53, 4.23s/it]
59%|█████▉ | 2677/4506 [3:03:06<2:09:21, 4.24s/it]
{'loss': 0.2381, 'grad_norm': 0.3754129409790039, 'learning_rate': 2.1189603867745318e-05, 'epoch': 0.59}
59%|█████▉ | 2677/4506 [3:03:07<2:09:21, 4.24s/it]
59%|█████▉ | 2678/4506 [3:03:11<2:09:40, 4.26s/it]
{'loss': 0.2312, 'grad_norm': 0.3920048773288727, 'learning_rate': 2.117046267224209e-05, 'epoch': 0.59}
59%|█████▉ | 2678/4506 [3:03:11<2:09:40, 4.26s/it]
59%|█████▉ | 2679/4506 [3:03:15<2:07:15, 4.18s/it]
{'loss': 0.2238, 'grad_norm': 0.3415831923484802, 'learning_rate': 2.1151323775343702e-05, 'epoch': 0.59}
59%|█████▉ | 2679/4506 [3:03:15<2:07:15, 4.18s/it]
59%|█████▉ | 2680/4506 [3:03:18<2:02:41, 4.03s/it]
{'loss': 0.2139, 'grad_norm': 0.4050319790840149, 'learning_rate': 2.113218718853787e-05, 'epoch': 0.59}
59%|█████▉ | 2680/4506 [3:03:18<2:02:41, 4.03s/it]
59%|█████▉ | 2681/4506 [3:03:23<2:06:28, 4.16s/it]
{'loss': 0.2354, 'grad_norm': 0.4226042628288269, 'learning_rate': 2.1113052923310976e-05, 'epoch': 0.6}
59%|█████▉ | 2681/4506 [3:03:23<2:06:28, 4.16s/it]
60%|█████▉ | 2682/4506 [3:03:27<2:04:23, 4.09s/it]
{'loss': 0.2306, 'grad_norm': 0.36140942573547363, 'learning_rate': 2.109392099114798e-05, 'epoch': 0.6}
60%|█████▉ | 2682/4506 [3:03:27<2:04:23, 4.09s/it]
60%|█████▉ | 2683/4506 [3:03:31<2:02:37, 4.04s/it]
{'loss': 0.2254, 'grad_norm': 0.408518522977829, 'learning_rate': 2.1074791403532458e-05, 'epoch': 0.6}
60%|█████▉ | 2683/4506 [3:03:31<2:02:37, 4.04s/it]
60%|█████▉ | 2684/4506 [3:03:35<2:03:18, 4.06s/it]
{'loss': 0.234, 'grad_norm': 0.40419983863830566, 'learning_rate': 2.105566417194656e-05, 'epoch': 0.6}
60%|█████▉ | 2684/4506 [3:03:35<2:03:18, 4.06s/it]
60%|█████▉ | 2685/4506 [3:03:39<2:02:33, 4.04s/it]
{'loss': 0.2244, 'grad_norm': 0.35313233733177185, 'learning_rate': 2.103653930787105e-05, 'epoch': 0.6}
60%|█████▉ | 2685/4506 [3:03:39<2:02:33, 4.04s/it]
60%|█████▉ | 2686/4506 [3:03:43<2:00:58, 3.99s/it]
{'loss': 0.2316, 'grad_norm': 0.42978084087371826, 'learning_rate': 2.101741682278523e-05, 'epoch': 0.6}
60%|█████▉ | 2686/4506 [3:03:43<2:00:58, 3.99s/it]
60%|█████▉ | 2687/4506 [3:03:47<2:01:45, 4.02s/it]
{'loss': 0.2263, 'grad_norm': 0.4029468595981598, 'learning_rate': 2.0998296728167012e-05, 'epoch': 0.6}
60%|█████▉ | 2687/4506 [3:03:47<2:01:45, 4.02s/it]
60%|█████▉ | 2688/4506 [3:03:51<2:07:14, 4.20s/it]
{'loss': 0.2345, 'grad_norm': 0.34251952171325684, 'learning_rate': 2.0979179035492852e-05, 'epoch': 0.6}
60%|█████▉ | 2688/4506 [3:03:51<2:07:14, 4.20s/it]
60%|█████▉ | 2689/4506 [3:03:56<2:07:16, 4.20s/it]
{'loss': 0.228, 'grad_norm': 0.3387608826160431, 'learning_rate': 2.0960063756237786e-05, 'epoch': 0.6}
60%|█████▉ | 2689/4506 [3:03:56<2:07:16, 4.20s/it]
60%|█████▉ | 2690/4506 [3:04:00<2:10:30, 4.31s/it]
{'loss': 0.2329, 'grad_norm': 0.3990435302257538, 'learning_rate': 2.0940950901875367e-05, 'epoch': 0.6}
60%|█████▉ | 2690/4506 [3:04:00<2:10:30, 4.31s/it]
60%|█████▉ | 2691/4506 [3:04:05<2:11:06, 4.33s/it]
{'loss': 0.2248, 'grad_norm': 0.33626365661621094, 'learning_rate': 2.0921840483877715e-05, 'epoch': 0.6}
60%|█████▉ | 2691/4506 [3:04:05<2:11:06, 4.33s/it]
60%|█████▉ | 2692/4506 [3:04:09<2:08:01, 4.23s/it]
{'loss': 0.2159, 'grad_norm': 0.3443509042263031, 'learning_rate': 2.090273251371549e-05, 'epoch': 0.6}
60%|█████▉ | 2692/4506 [3:04:09<2:08:01, 4.23s/it]
60%|█████▉ | 2693/4506 [3:04:13<2:05:24, 4.15s/it]
{'loss': 0.2266, 'grad_norm': 0.33548983931541443, 'learning_rate': 2.0883627002857874e-05, 'epoch': 0.6}
60%|█████▉ | 2693/4506 [3:04:13<2:05:24, 4.15s/it]
60%|█████▉ | 2694/4506 [3:04:17<2:04:33, 4.12s/it]
{'loss': 0.2266, 'grad_norm': 0.36838749051094055, 'learning_rate': 2.0864523962772586e-05, 'epoch': 0.6}
60%|█████▉ | 2694/4506 [3:04:17<2:04:33, 4.12s/it]
60%|█████▉ | 2695/4506 [3:04:21<2:04:41, 4.13s/it]
{'loss': 0.2367, 'grad_norm': 0.3675439953804016, 'learning_rate': 2.0845423404925833e-05, 'epoch': 0.6}
60%|█████▉ | 2695/4506 [3:04:21<2:04:41, 4.13s/it]
60%|█████▉ | 2696/4506 [3:04:25<2:03:57, 4.11s/it]
{'loss': 0.2254, 'grad_norm': 0.3232535421848297, 'learning_rate': 2.0826325340782367e-05, 'epoch': 0.6}
60%|█████▉ | 2697/4506 [3:04:29<2:01:44, 4.04s/it]
{'loss': 0.2224, 'grad_norm': 0.3882271647453308, 'learning_rate': 2.080722978180542e-05, 'epoch': 0.6}
60%|█████▉ | 2698/4506 [3:04:33<2:00:29, 4.00s/it]
{'loss': 0.2211, 'grad_norm': 0.36537227034568787, 'learning_rate': 2.0788136739456734e-05, 'epoch': 0.6}
60%|█████▉ | 2699/4506 [3:04:37<2:01:02, 4.02s/it]
{'loss': 0.2317, 'grad_norm': 0.4052889943122864, 'learning_rate': 2.076904622519652e-05, 'epoch': 0.6}
60%|█████▉ | 2700/4506 [3:04:40<1:57:59, 3.92s/it]
{'loss': 0.2299, 'grad_norm': 0.39879778027534485, 'learning_rate': 2.07499582504835e-05, 'epoch': 0.6}
60%|█████▉ | 2701/4506 [3:04:44<1:57:57, 3.92s/it]
{'loss': 0.2286, 'grad_norm': 0.34466665983200073, 'learning_rate': 2.0730872826774845e-05, 'epoch': 0.6}
60%|█████▉ | 2702/4506 [3:04:49<2:01:23, 4.04s/it]
{'loss': 0.219, 'grad_norm': 0.3453252613544464, 'learning_rate': 2.071178996552622e-05, 'epoch': 0.6}
60%|█████▉ | 2703/4506 [3:04:52<1:58:29, 3.94s/it]
{'loss': 0.2324, 'grad_norm': 0.3529214560985565, 'learning_rate': 2.069270967819173e-05, 'epoch': 0.6}
60%|██████ | 2704/4506 [3:04:56<1:59:18, 3.97s/it]
{'loss': 0.2243, 'grad_norm': 0.3554275333881378, 'learning_rate': 2.0673631976223953e-05, 'epoch': 0.6}
60%|██████ | 2705/4506 [3:05:00<1:58:55, 3.96s/it]
{'loss': 0.2282, 'grad_norm': 0.4303394854068756, 'learning_rate': 2.065455687107389e-05, 'epoch': 0.6}
60%|██████ | 2706/4506 [3:05:05<2:02:25, 4.08s/it]
{'loss': 0.2299, 'grad_norm': 0.3283340632915497, 'learning_rate': 2.0635484374191016e-05, 'epoch': 0.6}
60%|██████ | 2707/4506 [3:05:09<2:02:26, 4.08s/it]
{'loss': 0.2309, 'grad_norm': 0.35055968165397644, 'learning_rate': 2.0616414497023222e-05, 'epoch': 0.6}
60%|██████ | 2708/4506 [3:05:13<2:04:02, 4.14s/it]
{'loss': 0.2156, 'grad_norm': 0.3491853177547455, 'learning_rate': 2.0597347251016813e-05, 'epoch': 0.6}
60%|██████ | 2709/4506 [3:05:17<2:04:05, 4.14s/it]
{'loss': 0.2218, 'grad_norm': 0.3524864912033081, 'learning_rate': 2.0578282647616546e-05, 'epoch': 0.6}
60%|██████ | 2710/4506 [3:05:22<2:07:35, 4.26s/it]
{'loss': 0.2354, 'grad_norm': 0.4139131009578705, 'learning_rate': 2.0559220698265565e-05, 'epoch': 0.6}
60%|██████ | 2711/4506 [3:05:26<2:07:55, 4.28s/it]
{'loss': 0.2292, 'grad_norm': 0.3764800727367401, 'learning_rate': 2.0540161414405445e-05, 'epoch': 0.6}
60%|██████ | 2712/4506 [3:05:30<2:04:33, 4.17s/it]
{'loss': 0.231, 'grad_norm': 0.42316681146621704, 'learning_rate': 2.0521104807476133e-05, 'epoch': 0.6}
60%|██████ | 2713/4506 [3:05:34<2:00:15, 4.02s/it]
{'loss': 0.2205, 'grad_norm': 0.3862617611885071, 'learning_rate': 2.0502050888916004e-05, 'epoch': 0.6}
60%|██████ | 2714/4506 [3:05:38<2:01:31, 4.07s/it]
{'loss': 0.2352, 'grad_norm': 0.3643556833267212, 'learning_rate': 2.048299967016178e-05, 'epoch': 0.6}
60%|██████ | 2715/4506 [3:05:42<2:00:09, 4.03s/it]
{'loss': 0.2176, 'grad_norm': 0.3715202212333679, 'learning_rate': 2.0463951162648593e-05, 'epoch': 0.6}
60%|██████ | 2716/4506 [3:05:46<1:59:40, 4.01s/it]
{'loss': 0.233, 'grad_norm': 0.3644055724143982, 'learning_rate': 2.0444905377809927e-05, 'epoch': 0.6}
60%|██████ | 2717/4506 [3:05:50<1:59:42, 4.01s/it]
{'loss': 0.2303, 'grad_norm': 0.41091758012771606, 'learning_rate': 2.0425862327077663e-05, 'epoch': 0.6}
60%|██████ | 2718/4506 [3:05:54<2:04:39, 4.18s/it]
{'loss': 0.2347, 'grad_norm': 0.3868355453014374, 'learning_rate': 2.0406822021881994e-05, 'epoch': 0.6}
60%|██████ | 2719/4506 [3:05:59<2:04:55, 4.19s/it]
{'loss': 0.2282, 'grad_norm': 0.35180771350860596, 'learning_rate': 2.0387784473651504e-05, 'epoch': 0.6}
60%|██████ | 2720/4506 [3:06:03<2:03:10, 4.14s/it]
{'loss': 0.2245, 'grad_norm': 0.3540983498096466, 'learning_rate': 2.03687496938131e-05, 'epoch': 0.6}
60%|██████ | 2721/4506 [3:06:06<2:00:49, 4.06s/it]
{'loss': 0.2281, 'grad_norm': 0.4117898941040039, 'learning_rate': 2.0349717693792052e-05, 'epoch': 0.6}
60%|██████ | 2722/4506 [3:06:10<1:59:45, 4.03s/it]
{'loss': 0.2185, 'grad_norm': 0.347328245639801, 'learning_rate': 2.033068848501193e-05, 'epoch': 0.6}
60%|██████ | 2723/4506 [3:06:15<2:01:39, 4.09s/it]
{'loss': 0.2339, 'grad_norm': 0.35616427659988403, 'learning_rate': 2.0311662078894655e-05, 'epoch': 0.6}
60%|██████ | 2724/4506 [3:06:18<1:58:29, 3.99s/it]
{'loss': 0.2377, 'grad_norm': 0.3983999788761139, 'learning_rate': 2.029263848686045e-05, 'epoch': 0.6}
60%|██████ | 2725/4506 [3:06:23<2:00:05, 4.05s/it]
{'loss': 0.2379, 'grad_norm': 0.35695362091064453, 'learning_rate': 2.027361772032784e-05, 'epoch': 0.6}
60%|██████ | 2726/4506 [3:06:27<2:03:14, 4.15s/it]
{'loss': 0.2295, 'grad_norm': 0.3697837293148041, 'learning_rate': 2.02545997907137e-05, 'epoch': 0.61}
61%|██████ | 2727/4506 [3:06:31<2:04:11, 4.19s/it]
{'loss': 0.2331, 'grad_norm': 0.35742294788360596, 'learning_rate': 2.0235584709433136e-05, 'epoch': 0.61}
61%|██████ | 2728/4506 [3:06:35<2:01:20, 4.09s/it]
{'loss': 0.2305, 'grad_norm': 0.4013253450393677, 'learning_rate': 2.0216572487899603e-05, 'epoch': 0.61}
61%|██████ | 2729/4506 [3:06:39<2:03:33, 4.17s/it]
{'loss': 0.2322, 'grad_norm': 0.3872532844543457, 'learning_rate': 2.0197563137524798e-05, 'epoch': 0.61}
61%|██████ | 2730/4506 [3:06:43<2:01:57, 4.12s/it]
{'loss': 0.2356, 'grad_norm': 0.39964860677719116, 'learning_rate': 2.0178556669718722e-05, 'epoch': 0.61}
61%|██████ | 2731/4506 [3:06:48<2:05:41, 4.25s/it]
{'loss': 0.2214, 'grad_norm': 0.4151087701320648, 'learning_rate': 2.015955309588963e-05, 'epoch': 0.61}
61%|██████ | 2732/4506 [3:06:52<2:01:53, 4.12s/it]
{'loss': 0.2277, 'grad_norm': 0.4693710505962372, 'learning_rate': 2.0140552427444057e-05, 'epoch': 0.61}
61%|██████ | 2733/4506 [3:06:56<2:02:47, 4.16s/it]
{'loss': 0.2194, 'grad_norm': 0.32201170921325684, 'learning_rate': 2.012155467578676e-05, 'epoch': 0.61}
61%|██████ | 2734/4506 [3:07:00<2:01:54, 4.13s/it]
{'loss': 0.2206, 'grad_norm': 0.38227513432502747, 'learning_rate': 2.010255985232079e-05, 'epoch': 0.61}
61%|██████ | 2735/4506 [3:07:04<1:59:16, 4.04s/it]
{'loss': 0.2269, 'grad_norm': 0.36979711055755615, 'learning_rate': 2.0083567968447397e-05, 'epoch': 0.61}
61%|██████ | 2736/4506 [3:07:08<2:00:31, 4.09s/it]
{'loss': 0.2217, 'grad_norm': 0.30391791462898254, 'learning_rate': 2.0064579035566114e-05, 'epoch': 0.61}
61%|██████ | 2737/4506 [3:07:12<2:00:03, 4.07s/it]
{'loss': 0.2418, 'grad_norm': 0.38340121507644653, 'learning_rate': 2.004559306507465e-05, 'epoch': 0.61}
61%|██████ | 2738/4506 [3:07:16<1:58:04, 4.01s/it]
{'loss': 0.2303, 'grad_norm': 0.3980295956134796, 'learning_rate': 2.0026610068368976e-05, 'epoch': 0.61}
61%|██████ | 2739/4506 [3:07:20<1:55:27, 3.92s/it]
{'loss': 0.2369, 'grad_norm': 0.41443610191345215, 'learning_rate': 2.0007630056843256e-05, 'epoch': 0.61}
61%|██████ | 2740/4506 [3:07:24<1:56:23, 3.95s/it]
{'loss': 0.2146, 'grad_norm': 0.3637952208518982, 'learning_rate': 1.998865304188988e-05, 'epoch': 0.61}
61%|██████ | 2741/4506 [3:07:28<1:58:19, 4.02s/it]
{'loss': 0.238, 'grad_norm': 0.37051570415496826, 'learning_rate': 1.9969679034899437e-05, 'epoch': 0.61}
61%|██████ | 2742/4506 [3:07:32<1:57:24, 3.99s/it]
{'loss': 0.2202, 'grad_norm': 0.3659733831882477, 'learning_rate': 1.9950708047260678e-05, 'epoch': 0.61}
61%|██████ | 2743/4506 [3:07:36<1:55:33, 3.93s/it]
{'loss': 0.2167, 'grad_norm': 0.3511994481086731, 'learning_rate': 1.9931740090360584e-05, 'epoch': 0.61}
61%|██████ | 2744/4506 [3:07:40<1:57:42, 4.01s/it]
{'loss': 0.2299, 'grad_norm': 0.3964718282222748, 'learning_rate': 1.9912775175584294e-05, 'epoch': 0.61}
61%|██████ | 2745/4506 [3:07:44<1:57:49, 4.01s/it]
{'loss': 0.2212, 'grad_norm': 0.34527134895324707, 'learning_rate': 1.989381331431514e-05, 'epoch': 0.61}
61%|██████ | 2746/4506 [3:07:48<1:57:35, 4.01s/it]
{'loss': 0.2245, 'grad_norm': 0.3823343813419342, 'learning_rate': 1.987485451793459e-05, 'epoch': 0.61}
61%|██████ | 2747/4506 [3:07:52<1:59:31, 4.08s/it]
{'loss': 0.2438, 'grad_norm': 0.42726609110832214, 'learning_rate': 1.9855898797822297e-05, 'epoch': 0.61}
61%|██████ | 2748/4506 [3:07:56<1:57:21, 4.01s/it]
{'loss': 0.2147, 'grad_norm': 0.36794164776802063, 'learning_rate': 1.9836946165356063e-05, 'epoch': 0.61}
61%|██████ | 2749/4506 [3:08:00<1:56:39, 3.98s/it]
{'loss': 0.2236, 'grad_norm': 0.353590190410614, 'learning_rate': 1.9817996631911835e-05, 'epoch': 0.61}
61%|██████ | 2750/4506 [3:08:04<1:54:48, 3.92s/it]
{'loss': 0.2236, 'grad_norm': 0.37881284952163696, 'learning_rate': 1.9799050208863696e-05, 'epoch': 0.61}
61%|██████ | 2751/4506 [3:08:08<1:56:56, 4.00s/it]
{'loss': 0.2351, 'grad_norm': 0.40788644552230835, 'learning_rate': 1.9780106907583877e-05, 'epoch': 0.61}
61%|██████ | 2752/4506 [3:08:12<1:55:26, 3.95s/it]
{'loss': 0.2402, 'grad_norm': 0.42155420780181885, 'learning_rate': 1.97611667394427e-05, 'epoch': 0.61}
61%|██████ | 2753/4506 [3:08:16<1:58:23, 4.05s/it]
{'loss': 0.231, 'grad_norm': 0.385510116815567, 'learning_rate': 1.9742229715808656e-05, 'epoch': 0.61}
61%|██████ | 2754/4506 [3:08:20<1:59:44, 4.10s/it]
{'loss': 0.2314, 'grad_norm': 0.38623231649398804, 'learning_rate': 1.9723295848048307e-05, 'epoch': 0.61}
61%|██████ | 2755/4506 [3:08:24<2:00:50, 4.14s/it]
{'loss': 0.2177, 'grad_norm': 0.4205824136734009, 'learning_rate': 1.9704365147526348e-05, 'epoch': 0.61}
61%|██████ | 2756/4506 [3:08:29<2:00:25, 4.13s/it]
{'loss': 0.2156, 'grad_norm': 0.3714348077774048, 'learning_rate': 1.9685437625605546e-05, 'epoch': 0.61}
61%|██████ | 2757/4506 [3:08:33<2:02:16, 4.19s/it]
{'loss': 0.2377, 'grad_norm': 0.42034730315208435, 'learning_rate': 1.9666513293646788e-05, 'epoch': 0.61}
61%|██████ | 2758/4506 [3:08:37<1:58:56, 4.08s/it]
{'loss': 0.2195, 'grad_norm': 0.4379158020019531, 'learning_rate': 1.964759216300903e-05, 'epoch': 0.61}
61%|██████ | 2759/4506 [3:08:41<2:00:06, 4.13s/it]
{'loss': 0.2243, 'grad_norm': 0.42134907841682434, 'learning_rate': 1.9628674245049308e-05, 'epoch': 0.61}
61%|██████▏ | 2760/4506 [3:08:45<2:01:15, 4.17s/it]
{'loss': 0.2194, 'grad_norm': 0.399128258228302, 'learning_rate': 1.9609759551122744e-05, 'epoch': 0.61}
61%|██████▏ | 2761/4506 [3:08:49<2:00:37, 4.15s/it]
{'loss': 0.2221, 'grad_norm': 0.3557426631450653, 'learning_rate': 1.9590848092582494e-05, 'epoch': 0.61}
61%|██████▏ | 2762/4506 [3:08:53<1:58:37, 4.08s/it]
{'loss': 0.2324, 'grad_norm': 0.3684883713722229, 'learning_rate': 1.9571939880779802e-05, 'epoch': 0.61}
61%|██████▏ | 2763/4506 [3:08:57<1:57:53, 4.06s/it]
{'loss': 0.2296, 'grad_norm': 0.39197584986686707, 'learning_rate': 1.9553034927063944e-05, 'epoch': 0.61}
61%|██████▏ | 2764/4506 [3:09:01<1:58:05, 4.07s/it]
{'loss': 0.2149, 'grad_norm': 0.3586570918560028, 'learning_rate': 1.953413324278227e-05, 'epoch': 0.61}
61%|██████▏ | 2765/4506 [3:09:05<1:57:05, 4.04s/it]
{'loss': 0.227, 'grad_norm': 0.38371917605400085, 'learning_rate': 1.9515234839280114e-05, 'epoch': 0.61}
61%|██████▏ | 2766/4506 [3:09:10<1:58:48, 4.10s/it]
{'loss': 0.2198, 'grad_norm': 0.39168548583984375, 'learning_rate': 1.94963397279009e-05, 'epoch': 0.61}
61%|██████▏ | 2767/4506 [3:09:13<1:57:21, 4.05s/it]
{'loss': 0.2336, 'grad_norm': 0.38677895069122314, 'learning_rate': 1.9477447919986028e-05, 'epoch': 0.61}
61%|██████▏ | 2768/4506 [3:09:17<1:57:17, 4.05s/it]
{'loss': 0.2187, 'grad_norm': 0.38089290261268616, 'learning_rate': 1.9458559426874954e-05, 'epoch': 0.61}
61%|██████▏ | 2769/4506 [3:09:21<1:55:18, 3.98s/it]
{'loss': 0.2187, 'grad_norm': 0.34671831130981445, 'learning_rate': 1.943967425990511e-05, 'epoch': 0.61}
61%|██████▏ | 2770/4506 [3:09:26<1:56:54, 4.04s/it]
{'loss': 0.222, 'grad_norm': 0.36765530705451965, 'learning_rate': 1.942079243041197e-05, 'epoch': 0.61}
61%|██████▏ | 2771/4506 [3:09:30<1:56:52, 4.04s/it]
{'loss': 0.2327, 'grad_norm': 0.39598050713539124, 'learning_rate': 1.9401913949728957e-05, 'epoch': 0.62}
62%|██████▏ | 2772/4506 [3:09:33<1:55:25, 3.99s/it]
{'loss': 0.2183, 'grad_norm': 0.34709984064102173, 'learning_rate': 1.9383038829187526e-05, 'epoch': 0.62}
62%|██████▏ | 2773/4506 [3:09:38<1:58:15, 4.09s/it]
{'loss': 0.2265, 'grad_norm': 0.4047681987285614, 'learning_rate': 1.936416708011709e-05, 'epoch': 0.62}
62%|██████▏ | 2774/4506 [3:09:42<1:57:44, 4.08s/it]
{'loss': 0.2231, 'grad_norm': 0.36418673396110535, 'learning_rate': 1.934529871384506e-05, 'epoch': 0.62}
62%|██████▏ | 2775/4506 [3:09:46<1:59:51, 4.15s/it]
{'loss': 0.2275, 'grad_norm': 0.35344934463500977, 'learning_rate': 1.9326433741696786e-05, 'epoch': 0.62}
62%|██████▏ | 2776/4506 [3:09:50<1:58:54, 4.12s/it]
{'loss': 0.2275, 'grad_norm': 0.4103547930717468, 'learning_rate': 1.9307572174995606e-05, 'epoch': 0.62}
62%|██████▏ | 2777/4506 [3:09:54<1:54:44, 3.98s/it]
{'loss': 0.2264, 'grad_norm': 0.41914018988609314, 'learning_rate': 1.928871402506281e-05, 'epoch': 0.62}
62%|██████▏ | 2778/4506 [3:09:58<1:57:22, 4.08s/it]
{'loss': 0.2331, 'grad_norm': 0.3892504870891571, 'learning_rate': 1.9269859303217623e-05, 'epoch': 0.62}
62%|██████▏ | 2779/4506 [3:10:03<2:02:46, 4.27s/it]
{'loss': 0.2285, 'grad_norm': 0.4330151379108429, 'learning_rate': 1.9251008020777245e-05, 'epoch': 0.62}
62%|██████▏ | 2780/4506 [3:10:07<1:58:47, 4.13s/it]
{'loss': 0.2369, 'grad_norm': 0.4440554976463318, 'learning_rate': 1.9232160189056757e-05, 'epoch': 0.62}
62%|██████▏ | 2781/4506 [3:10:11<2:03:32, 4.30s/it]
{'loss': 0.2323, 'grad_norm': 0.35650405287742615, 'learning_rate': 1.9213315819369233e-05, 'epoch': 0.62}
62%|██████▏ | 2782/4506 [3:10:15<2:00:50, 4.21s/it]
{'loss': 0.2185, 'grad_norm': 0.3627731204032898, 'learning_rate': 1.919447492302561e-05, 'epoch': 0.62}
62%|██████▏ | 2783/4506 [3:10:19<1:59:31, 4.16s/it]
{'loss': 0.2273, 'grad_norm': 0.3827498257160187, 'learning_rate': 1.917563751133479e-05, 'epoch': 0.62}
62%|██████▏ | 2784/4506 [3:10:23<1:56:32, 4.06s/it]
{'loss': 0.2273, 'grad_norm': 0.36920034885406494, 'learning_rate': 1.9156803595603544e-05, 'epoch': 0.62}
62%|██████▏ | 2785/4506 [3:10:28<1:58:39, 4.14s/it]
{'loss': 0.2148, 'grad_norm': 0.37509018182754517, 'learning_rate': 1.913797318713657e-05, 'epoch': 0.62}
62%|██████▏ | 2786/4506 [3:10:32<1:58:58, 4.15s/it]
{'loss': 0.2235, 'grad_norm': 0.3783295452594757, 'learning_rate': 1.9119146297236442e-05, 'epoch': 0.62}
62%|██████▏ | 2787/4506 [3:10:36<2:00:38, 4.21s/it]
{'loss': 0.2165, 'grad_norm': 0.3194473385810852, 'learning_rate': 1.9100322937203647e-05, 'epoch': 0.62}
62%|██████▏ | 2788/4506 [3:10:40<1:55:41, 4.04s/it]
{'loss': 0.2189, 'grad_norm': 0.3976116180419922, 'learning_rate': 1.908150311833653e-05, 'epoch': 0.62}
62%|██████▏ | 2789/4506 [3:10:44<1:56:40, 4.08s/it]
{'loss': 0.2255, 'grad_norm': 0.36024588346481323, 'learning_rate': 1.9062686851931323e-05, 'epoch': 0.62}
62%|██████▏ | 2790/4506 [3:10:48<1:55:19, 4.03s/it]
{'loss': 0.2219, 'grad_norm': 0.4477747678756714, 'learning_rate': 1.9043874149282115e-05, 'epoch': 0.62}
62%|██████▏ | 2791/4506 [3:10:52<1:57:34, 4.11s/it]
{'loss': 0.2217, 'grad_norm': 0.36945709586143494, 'learning_rate': 1.9025065021680868e-05, 'epoch': 0.62}
62%|██████▏ | 2792/4506 [3:10:56<1:55:13, 4.03s/it]
{'loss': 0.2275, 'grad_norm': 0.3992622196674347, 'learning_rate': 1.9006259480417395e-05, 'epoch': 0.62}
62%|██████▏ | 2793/4506 [3:11:00<1:56:24, 4.08s/it]
{'loss': 0.2242, 'grad_norm': 0.3951168656349182, 'learning_rate': 1.8987457536779344e-05, 'epoch': 0.62}
62%|██████▏ | 2794/4506 [3:11:04<1:56:12, 4.07s/it]
{'loss': 0.2357, 'grad_norm': 0.3944301903247833, 'learning_rate': 1.8968659202052223e-05, 'epoch': 0.62}
62%|██████▏ | 2795/4506 [3:11:08<1:54:09, 4.00s/it]
{'loss': 0.2113, 'grad_norm': 0.41091448068618774, 'learning_rate': 1.894986448751935e-05, 'epoch': 0.62}
62%|██████▏ | 2796/4506 [3:11:12<1:55:30, 4.05s/it]
{'loss': 0.214, 'grad_norm': 0.363637238740921, 'learning_rate': 1.8931073404461907e-05, 'epoch': 0.62}
62%|██████▏ | 2797/4506 [3:11:16<1:56:27, 4.09s/it]
{'loss': 0.2258, 'grad_norm': 0.3748008608818054, 'learning_rate': 1.8912285964158858e-05, 'epoch': 0.62}
62%|██████▏ | 2798/4506 [3:11:20<1:56:31, 4.09s/it]
{'loss': 0.2176, 'grad_norm': 0.36724644899368286, 'learning_rate': 1.8893502177887005e-05, 'epoch': 0.62}
62%|██████▏ | 2799/4506 [3:11:25<1:55:58, 4.08s/it]
{'loss': 0.2296, 'grad_norm': 0.362766832113266, 'learning_rate': 1.887472205692094e-05, 'epoch': 0.62}
62%|██████▏ | 2800/4506 [3:11:28<1:52:54, 3.97s/it]
{'loss': 0.224, 'grad_norm': 0.3806685507297516, 'learning_rate': 1.885594561253307e-05, 'epoch': 0.62}
62%|██████▏ | 2801/4506 [3:11:32<1:54:08, 4.02s/it]
{'loss': 0.2234, 'grad_norm': 0.3919810354709625, 'learning_rate': 1.883717285599358e-05, 'epoch': 0.62}
62%|██████▏ | 2802/4506 [3:11:36<1:53:50, 4.01s/it]
{'loss': 0.212, 'grad_norm': 0.37551259994506836, 'learning_rate': 1.881840379857046e-05, 'epoch': 0.62}
62%|██████▏ | 2803/4506 [3:11:40<1:52:28, 3.96s/it]
{'loss': 0.214, 'grad_norm': 0.3697497844696045, 'learning_rate': 1.8799638451529462e-05, 'epoch': 0.62}
62%|██████▏ | 2804/4506 [3:11:44<1:54:23, 4.03s/it]
{'loss': 0.2119, 'grad_norm': 0.3755195438861847, 'learning_rate': 1.878087682613412e-05, 'epoch': 0.62}
62%|██████▏ | 2805/4506 [3:11:48<1:54:25, 4.04s/it]
{'loss': 0.2185, 'grad_norm': 0.35124656558036804, 'learning_rate': 1.876211893364573e-05, 'epoch': 0.62}
62%|██████▏ | 2806/4506 [3:11:53<1:55:00, 4.06s/it]
{'loss': 0.2341, 'grad_norm': 0.3858252167701721, 'learning_rate': 1.8743364785323354e-05, 'epoch': 0.62}
62%|██████▏ | 2807/4506 [3:11:57<1:54:39, 4.05s/it]
{'loss': 0.2295, 'grad_norm': 0.4030131697654724, 'learning_rate': 1.8724614392423793e-05, 'epoch': 0.62}
62%|██████▏ | 2808/4506 [3:12:01<1:58:15, 4.18s/it]
{'loss': 0.2357, 'grad_norm': 0.37402021884918213, 'learning_rate': 1.870586776620163e-05, 'epoch': 0.62}
62%|██████▏ | 2809/4506 [3:12:05<1:56:58, 4.14s/it]
{'loss': 0.2226, 'grad_norm': 0.33263787627220154, 'learning_rate': 1.8687124917909132e-05, 'epoch': 0.62}
62%|██████▏ | 2810/4506 [3:12:09<1:57:31, 4.16s/it]
{'loss': 0.221, 'grad_norm': 0.36642882227897644, 'learning_rate': 1.8668385858796332e-05, 'epoch': 0.62}
62%|██████▏ | 2811/4506 [3:12:14<2:01:10, 4.29s/it]
{'loss': 0.2408, 'grad_norm': 0.34143438935279846, 'learning_rate': 1.8649650600110997e-05, 'epoch': 0.62}
62%|██████▏ | 2812/4506 [3:12:18<1:59:30, 4.23s/it]
{'loss': 0.2309, 'grad_norm': 0.45311662554740906, 'learning_rate': 1.863091915309858e-05, 'epoch': 0.62}
62%|██████▏ | 2813/4506 [3:12:22<1:56:38, 4.13s/it]
{'loss': 0.2213, 'grad_norm': 0.3800541162490845, 'learning_rate': 1.861219152900228e-05, 'epoch': 0.62}
62%|██████▏ | 2814/4506 [3:12:26<1:56:12, 4.12s/it]
{'loss': 0.2317, 'grad_norm': 0.3824986219406128, 'learning_rate': 1.859346773906298e-05, 'epoch': 0.62}
62%|██████▏ | 2815/4506 [3:12:30<1:58:09, 4.19s/it]
{'loss': 0.223, 'grad_norm': 0.3771374225616455, 'learning_rate': 1.8574747794519275e-05, 'epoch': 0.62}
62%|██████▏ | 2816/4506 [3:12:34<1:56:52, 4.15s/it]
{'loss': 0.2268, 'grad_norm': 0.35336166620254517, 'learning_rate': 1.8556031706607442e-05, 'epoch': 0.63}
63%|██████▎ | 2817/4506 [3:12:39<1:56:37, 4.14s/it]
{'loss': 0.2191, 'grad_norm': 0.35110732913017273, 'learning_rate': 1.8537319486561446e-05, 'epoch': 0.63}
63%|██████▎ | 2818/4506 [3:12:43<1:59:34, 4.25s/it]
{'loss': 0.2383, 'grad_norm': 0.36507853865623474, 'learning_rate': 1.8518611145612925e-05, 'epoch': 0.63}
63%|██████▎ | 2819/4506 [3:12:47<1:59:10, 4.24s/it]
{'loss': 0.2313, 'grad_norm': 0.37641915678977966, 'learning_rate': 1.8499906694991203e-05, 'epoch': 0.63}
63%|██████▎ | 2820/4506 [3:12:52<2:01:22, 4.32s/it]
{'loss': 0.2424, 'grad_norm': 0.39206501841545105, 'learning_rate': 1.8481206145923257e-05, 'epoch': 0.63}
63%|██████▎ | 2821/4506 [3:12:55<1:56:24, 4.15s/it]
{'loss': 0.2202, 'grad_norm': 0.3920734226703644, 'learning_rate': 1.846250950963373e-05, 'epoch': 0.63}
63%|██████▎ | 2822/4506 [3:12:59<1:53:55, 4.06s/it]
{'loss': 0.2335, 'grad_norm': 0.41775068640708923, 'learning_rate': 1.84438167973449e-05, 'epoch': 0.63}
63%|██████▎ | 2823/4506 [3:13:03<1:54:28, 4.08s/it]
{'loss': 0.2198, 'grad_norm': 0.3483288288116455, 'learning_rate': 1.8425128020276716e-05, 'epoch': 0.63}
63%|██████▎ | 2824/4506 [3:13:08<1:55:19, 4.11s/it]
{'loss': 0.2216, 'grad_norm': 0.35823097825050354, 'learning_rate': 1.8406443189646735e-05, 'epoch': 0.63}
63%|██████▎ | 2825/4506 [3:13:12<1:56:41, 4.17s/it]
{'loss': 0.2281, 'grad_norm': 0.3587222993373871, 'learning_rate': 1.838776231667018e-05, 'epoch': 0.63}
63%|██████▎ | 2826/4506 [3:13:16<1:54:35, 4.09s/it]
{'loss': 0.2251, 'grad_norm': 0.4402826130390167, 'learning_rate': 1.836908541255987e-05, 'epoch': 0.63}
63%|██████▎ | 2827/4506 [3:13:20<1:55:47, 4.14s/it]
{'loss': 0.2223, 'grad_norm': 0.35297390818595886, 'learning_rate': 1.835041248852624e-05, 'epoch': 0.63}
63%|██████▎ | 2828/4506 [3:13:24<1:56:59, 4.18s/it]
{'loss': 0.2259, 'grad_norm': 0.35042837262153625, 'learning_rate': 1.833174355577737e-05, 'epoch': 0.63}
63%|██████▎ | 2829/4506 [3:13:29<1:56:50, 4.18s/it]
{'loss': 0.2161, 'grad_norm': 0.35324206948280334, 'learning_rate': 1.8313078625518896e-05, 'epoch': 0.63}
63%|██████▎ | 2830/4506 [3:13:33<1:57:17, 4.20s/it]
{'loss': 0.2258, 'grad_norm': 0.3600199222564697, 'learning_rate': 1.8294417708954104e-05, 'epoch': 0.63}
63%|██████▎ | 2831/4506 [3:13:37<2:00:00, 4.30s/it]
{'loss': 0.2213, 'grad_norm': 0.36942991614341736, 'learning_rate': 1.8275760817283815e-05, 'epoch': 0.63}
63%|██████▎ | 2832/4506 [3:13:41<1:57:02, 4.19s/it]
{'loss': 0.2271, 'grad_norm': 0.3977261185646057, 'learning_rate': 1.8257107961706488e-05, 'epoch': 0.63}
63%|██████▎ | 2833/4506 [3:13:45<1:54:37, 4.11s/it]
{'loss': 0.2269, 'grad_norm': 0.372539222240448, 'learning_rate': 1.823845915341812e-05, 'epoch': 0.63}
63%|██████▎ | 2834/4506 [3:13:50<1:56:29, 4.18s/it]
{'loss': 0.2433, 'grad_norm': 0.4076315462589264, 'learning_rate': 1.82198144036123e-05, 'epoch': 0.63}
63%|██████▎ | 2835/4506 [3:13:53<1:53:46, 4.09s/it]
{'loss': 0.2137, 'grad_norm': 0.36041292548179626, 'learning_rate': 1.8201173723480165e-05, 'epoch': 0.63}
63%|██████▎ | 2836/4506 [3:13:57<1:52:46, 4.05s/it]
{'loss': 0.2334, 'grad_norm': 0.40010371804237366, 'learning_rate': 1.8182537124210438e-05, 'epoch': 0.63}
63%|██████▎ | 2837/4506 [3:14:01<1:50:56, 3.99s/it]
{'loss': 0.2194, 'grad_norm': 0.38571998476982117, 'learning_rate': 1.816390461698935e-05, 'epoch': 0.63}
63%|██████▎ | 2838/4506 [3:14:05<1:51:50, 4.02s/it]
{'loss': 0.2259, 'grad_norm': 0.36522749066352844, 'learning_rate': 1.8145276213000714e-05, 'epoch': 0.63}
63%|██████▎ | 2839/4506 [3:14:09<1:49:04, 3.93s/it]
{'loss': 0.2311, 'grad_norm': 0.4368633031845093, 'learning_rate': 1.8126651923425852e-05, 'epoch': 0.63}
63%|██████▎ | 2840/4506 [3:14:13<1:50:08, 3.97s/it]
{'loss': 0.2214, 'grad_norm': 0.3671899735927582, 'learning_rate': 1.810803175944365e-05, 'epoch': 0.63}
63%|██████▎ | 2841/4506 [3:14:18<1:54:26, 4.12s/it]
{'loss': 0.2153, 'grad_norm': 0.33318257331848145, 'learning_rate': 1.8089415732230473e-05, 'epoch': 0.63}
63%|██████▎ | 2842/4506 [3:14:22<1:54:12, 4.12s/it]
{'loss': 0.2139, 'grad_norm': 0.3798915147781372, 'learning_rate': 1.8070803852960245e-05, 'epoch': 0.63}
63%|██████▎ | 2843/4506 [3:14:26<1:55:54, 4.18s/it]
{'loss': 0.2258, 'grad_norm': 0.3991449177265167, 'learning_rate': 1.8052196132804377e-05, 'epoch': 0.63}
63%|██████▎ | 2844/4506 [3:14:30<1:56:45, 4.21s/it]
{'loss': 0.2216, 'grad_norm': 0.3422580361366272, 'learning_rate': 1.8033592582931775e-05, 'epoch': 0.63}
63%|██████▎ | 2845/4506 [3:14:34<1:52:52, 4.08s/it]
{'loss': 0.2251, 'grad_norm': 0.39455947279930115, 'learning_rate': 1.801499321450888e-05, 'epoch': 0.63}
63%|██████▎ | 2846/4506 [3:14:38<1:51:03, 4.01s/it]
{'loss': 0.2232, 'grad_norm': 0.40924298763275146, 'learning_rate': 1.7996398038699574e-05, 'epoch': 0.63}
63%|██████▎ | 2847/4506 [3:14:42<1:52:20, 4.06s/it]
{'loss': 0.2215, 'grad_norm': 0.34420546889305115, 'learning_rate': 1.797780706666527e-05, 'epoch': 0.63}
63%|██████▎ | 2848/4506 [3:14:47<1:54:59, 4.16s/it]
{'loss': 0.228, 'grad_norm': 0.3563390374183655, 'learning_rate': 1.7959220309564817e-05, 'epoch': 0.63}
63%|██████▎ | 2849/4506 [3:14:50<1:52:13, 4.06s/it]
{'loss': 0.22, 'grad_norm': 0.3631713092327118, 'learning_rate': 1.794063777855457e-05, 'epoch': 0.63}
63%|██████▎ | 2850/4506 [3:14:55<1:53:53, 4.13s/it]
{'loss': 0.2275, 'grad_norm': 0.3830614984035492, 'learning_rate': 1.792205948478831e-05, 'epoch': 0.63}
63%|██████▎ | 2850/4506 [3:14:55<1:53:53, 4.13s/it]
63%|██████▎ | 2851/4506 [3:14:59<1:53:41, 4.12s/it]
{'loss': 0.219, 'grad_norm': 0.37886396050453186, 'learning_rate': 1.7903485439417306e-05, 'epoch': 0.63}
63%|██████▎ | 2851/4506 [3:14:59<1:53:41, 4.12s/it]
63%|██████▎ | 2852/4506 [3:15:03<1:53:04, 4.10s/it]
{'loss': 0.232, 'grad_norm': 0.4140421450138092, 'learning_rate': 1.7884915653590263e-05, 'epoch': 0.63}
63%|██████▎ | 2852/4506 [3:15:03<1:53:04, 4.10s/it]
63%|██████▎ | 2853/4506 [3:15:07<1:52:11, 4.07s/it]
{'loss': 0.2253, 'grad_norm': 0.41444095969200134, 'learning_rate': 1.786635013845333e-05, 'epoch': 0.63}
63%|██████▎ | 2853/4506 [3:15:07<1:52:11, 4.07s/it]
63%|██████▎ | 2854/4506 [3:15:11<1:54:03, 4.14s/it]
{'loss': 0.2202, 'grad_norm': 0.406109482049942, 'learning_rate': 1.7847788905150096e-05, 'epoch': 0.63}
63%|██████▎ | 2854/4506 [3:15:11<1:54:03, 4.14s/it]
63%|██████▎ | 2855/4506 [3:15:15<1:54:09, 4.15s/it]
{'loss': 0.2175, 'grad_norm': 0.3299678564071655, 'learning_rate': 1.7829231964821586e-05, 'epoch': 0.63}
63%|██████▎ | 2855/4506 [3:15:15<1:54:09, 4.15s/it]
63%|██████▎ | 2856/4506 [3:15:19<1:52:45, 4.10s/it]
{'loss': 0.2206, 'grad_norm': 0.41963157057762146, 'learning_rate': 1.781067932860622e-05, 'epoch': 0.63}
63%|██████▎ | 2856/4506 [3:15:19<1:52:45, 4.10s/it]
63%|██████▎ | 2857/4506 [3:15:23<1:50:43, 4.03s/it]
{'loss': 0.2223, 'grad_norm': 0.3903002440929413, 'learning_rate': 1.779213100763987e-05, 'epoch': 0.63}
63%|██████▎ | 2857/4506 [3:15:23<1:50:43, 4.03s/it]
63%|██████▎ | 2858/4506 [3:15:27<1:50:48, 4.03s/it]
{'loss': 0.2427, 'grad_norm': 0.41930079460144043, 'learning_rate': 1.7773587013055796e-05, 'epoch': 0.63}
63%|██████▎ | 2858/4506 [3:15:27<1:50:48, 4.03s/it]
63%|██████▎ | 2859/4506 [3:15:31<1:51:28, 4.06s/it]
{'loss': 0.2264, 'grad_norm': 0.408077597618103, 'learning_rate': 1.7755047355984682e-05, 'epoch': 0.63}
63%|██████▎ | 2859/4506 [3:15:31<1:51:28, 4.06s/it]
63%|██████▎ | 2860/4506 [3:15:36<1:52:57, 4.12s/it]
{'loss': 0.2323, 'grad_norm': 0.34277015924453735, 'learning_rate': 1.7736512047554576e-05, 'epoch': 0.63}
63%|██████▎ | 2860/4506 [3:15:36<1:52:57, 4.12s/it]
63%|██████▎ | 2861/4506 [3:15:40<1:56:19, 4.24s/it]
{'loss': 0.2226, 'grad_norm': 0.3471219837665558, 'learning_rate': 1.7717981098890937e-05, 'epoch': 0.64}
63%|██████▎ | 2861/4506 [3:15:40<1:56:19, 4.24s/it]
64%|██████▎ | 2862/4506 [3:15:45<1:59:04, 4.35s/it]
{'loss': 0.2315, 'grad_norm': 0.38462817668914795, 'learning_rate': 1.7699454521116617e-05, 'epoch': 0.64}
64%|██████▎ | 2862/4506 [3:15:45<1:59:04, 4.35s/it]
64%|██████▎ | 2863/4506 [3:15:49<1:55:30, 4.22s/it]
{'loss': 0.2287, 'grad_norm': 0.3490491509437561, 'learning_rate': 1.7680932325351806e-05, 'epoch': 0.64}
64%|██████▎ | 2863/4506 [3:15:49<1:55:30, 4.22s/it]
64%|██████▎ | 2864/4506 [3:15:52<1:52:12, 4.10s/it]
{'loss': 0.2403, 'grad_norm': 0.4114421308040619, 'learning_rate': 1.7662414522714125e-05, 'epoch': 0.64}
64%|██████▎ | 2864/4506 [3:15:52<1:52:12, 4.10s/it]
64%|██████▎ | 2865/4506 [3:15:57<1:52:05, 4.10s/it]
{'loss': 0.2286, 'grad_norm': 0.36508962512016296, 'learning_rate': 1.7643901124318495e-05, 'epoch': 0.64}
64%|██████▎ | 2865/4506 [3:15:57<1:52:05, 4.10s/it]
64%|██████▎ | 2866/4506 [3:16:00<1:49:19, 4.00s/it]
{'loss': 0.2208, 'grad_norm': 0.3808446526527405, 'learning_rate': 1.762539214127723e-05, 'epoch': 0.64}
64%|██████▎ | 2866/4506 [3:16:00<1:49:19, 4.00s/it]
64%|██████▎ | 2867/4506 [3:16:04<1:47:31, 3.94s/it]
{'loss': 0.2273, 'grad_norm': 0.43225398659706116, 'learning_rate': 1.7606887584699986e-05, 'epoch': 0.64}
64%|██████▎ | 2867/4506 [3:16:04<1:47:31, 3.94s/it]
64%|██████▎ | 2868/4506 [3:16:08<1:47:07, 3.92s/it]
{'loss': 0.226, 'grad_norm': 0.36807945370674133, 'learning_rate': 1.7588387465693765e-05, 'epoch': 0.64}
64%|██████▎ | 2868/4506 [3:16:08<1:47:07, 3.92s/it]
64%|██████▎ | 2869/4506 [3:16:12<1:48:06, 3.96s/it]
{'loss': 0.2195, 'grad_norm': 0.34015539288520813, 'learning_rate': 1.7569891795362885e-05, 'epoch': 0.64}
64%|██████▎ | 2869/4506 [3:16:12<1:48:06, 3.96s/it]
64%|██████▎ | 2870/4506 [3:16:16<1:46:08, 3.89s/it]
{'loss': 0.2366, 'grad_norm': 0.4066300392150879, 'learning_rate': 1.755140058480903e-05, 'epoch': 0.64}
64%|██████▎ | 2870/4506 [3:16:16<1:46:08, 3.89s/it]
64%|██████▎ | 2871/4506 [3:16:20<1:47:17, 3.94s/it]
{'loss': 0.2302, 'grad_norm': 0.4191570281982422, 'learning_rate': 1.753291384513117e-05, 'epoch': 0.64}
64%|██████▎ | 2871/4506 [3:16:20<1:47:17, 3.94s/it]
64%|██████▎ | 2872/4506 [3:16:24<1:46:02, 3.89s/it]
{'loss': 0.2084, 'grad_norm': 0.3379266858100891, 'learning_rate': 1.7514431587425624e-05, 'epoch': 0.64}
64%|██████▎ | 2872/4506 [3:16:24<1:46:02, 3.89s/it]
64%|██████▍ | 2873/4506 [3:16:28<1:51:50, 4.11s/it]
{'loss': 0.2341, 'grad_norm': 0.3622596859931946, 'learning_rate': 1.7495953822785993e-05, 'epoch': 0.64}
64%|██████▍ | 2873/4506 [3:16:28<1:51:50, 4.11s/it]
64%|██████▍ | 2874/4506 [3:16:33<1:54:33, 4.21s/it]
{'loss': 0.2262, 'grad_norm': 0.4136495292186737, 'learning_rate': 1.7477480562303207e-05, 'epoch': 0.64}
64%|██████▍ | 2874/4506 [3:16:33<1:54:33, 4.21s/it]
64%|██████▍ | 2875/4506 [3:16:36<1:49:57, 4.05s/it]
{'loss': 0.2224, 'grad_norm': 0.3920751214027405, 'learning_rate': 1.7459011817065467e-05, 'epoch': 0.64}
64%|██████▍ | 2875/4506 [3:16:36<1:49:57, 4.05s/it]
64%|██████▍ | 2876/4506 [3:16:41<1:51:14, 4.10s/it]
{'loss': 0.2301, 'grad_norm': 0.34613728523254395, 'learning_rate': 1.7440547598158278e-05, 'epoch': 0.64}
64%|██████▍ | 2876/4506 [3:16:41<1:51:14, 4.10s/it]
64%|██████▍ | 2877/4506 [3:16:45<1:50:50, 4.08s/it]
{'loss': 0.2238, 'grad_norm': 0.3580951392650604, 'learning_rate': 1.7422087916664434e-05, 'epoch': 0.64}
64%|██████▍ | 2877/4506 [3:16:45<1:50:50, 4.08s/it]
64%|██████▍ | 2878/4506 [3:16:49<1:49:37, 4.04s/it]
{'loss': 0.225, 'grad_norm': 0.4619707763195038, 'learning_rate': 1.740363278366398e-05, 'epoch': 0.64}
64%|██████▍ | 2878/4506 [3:16:49<1:49:37, 4.04s/it]
64%|██████▍ | 2879/4506 [3:16:52<1:49:03, 4.02s/it]
{'loss': 0.2254, 'grad_norm': 0.3600691556930542, 'learning_rate': 1.738518221023427e-05, 'epoch': 0.64}
64%|██████▍ | 2879/4506 [3:16:52<1:49:03, 4.02s/it]
64%|██████▍ | 2880/4506 [3:16:56<1:46:30, 3.93s/it]
{'loss': 0.22, 'grad_norm': 0.3616017997264862, 'learning_rate': 1.7366736207449878e-05, 'epoch': 0.64}
64%|██████▍ | 2880/4506 [3:16:56<1:46:30, 3.93s/it]
64%|██████▍ | 2881/4506 [3:17:01<1:50:01, 4.06s/it]
{'loss': 0.2185, 'grad_norm': 0.370971143245697, 'learning_rate': 1.734829478638268e-05, 'epoch': 0.64}
64%|██████▍ | 2881/4506 [3:17:01<1:50:01, 4.06s/it]
64%|██████▍ | 2882/4506 [3:17:05<1:50:13, 4.07s/it]
{'loss': 0.2114, 'grad_norm': 0.3616202473640442, 'learning_rate': 1.7329857958101746e-05, 'epoch': 0.64}
64%|██████▍ | 2882/4506 [3:17:05<1:50:13, 4.07s/it]
64%|██████▍ | 2883/4506 [3:17:09<1:50:52, 4.10s/it]
{'loss': 0.2267, 'grad_norm': 0.4083409011363983, 'learning_rate': 1.7311425733673464e-05, 'epoch': 0.64}
64%|██████▍ | 2883/4506 [3:17:09<1:50:52, 4.10s/it]
64%|██████▍ | 2884/4506 [3:17:13<1:49:15, 4.04s/it]
{'loss': 0.2072, 'grad_norm': 0.3644458055496216, 'learning_rate': 1.7292998124161376e-05, 'epoch': 0.64}
64%|██████▍ | 2884/4506 [3:17:13<1:49:15, 4.04s/it]
64%|██████▍ | 2885/4506 [3:17:17<1:52:04, 4.15s/it]
{'loss': 0.2256, 'grad_norm': 0.3963865637779236, 'learning_rate': 1.7274575140626318e-05, 'epoch': 0.64}
64%|██████▍ | 2885/4506 [3:17:17<1:52:04, 4.15s/it]
64%|██████▍ | 2886/4506 [3:17:21<1:53:08, 4.19s/it]
{'loss': 0.2182, 'grad_norm': 0.32897910475730896, 'learning_rate': 1.725615679412631e-05, 'epoch': 0.64}
64%|██████▍ | 2886/4506 [3:17:21<1:53:08, 4.19s/it]
64%|██████▍ | 2887/4506 [3:17:25<1:50:49, 4.11s/it]
{'loss': 0.2205, 'grad_norm': 0.42552420496940613, 'learning_rate': 1.7237743095716624e-05, 'epoch': 0.64}
64%|██████▍ | 2887/4506 [3:17:25<1:50:49, 4.11s/it]
64%|██████▍ | 2888/4506 [3:17:29<1:51:07, 4.12s/it]
{'loss': 0.227, 'grad_norm': 0.381478488445282, 'learning_rate': 1.7219334056449697e-05, 'epoch': 0.64}
64%|██████▍ | 2888/4506 [3:17:29<1:51:07, 4.12s/it]
64%|██████▍ | 2889/4506 [3:17:34<1:53:47, 4.22s/it]
{'loss': 0.2236, 'grad_norm': 1.7214782238006592, 'learning_rate': 1.720092968737522e-05, 'epoch': 0.64}
64%|██████▍ | 2889/4506 [3:17:34<1:53:47, 4.22s/it]
64%|██████▍ | 2890/4506 [3:17:38<1:50:29, 4.10s/it]
{'loss': 0.2196, 'grad_norm': 0.3803940713405609, 'learning_rate': 1.7182529999540026e-05, 'epoch': 0.64}
64%|██████▍ | 2890/4506 [3:17:38<1:50:29, 4.10s/it]
64%|██████▍ | 2891/4506 [3:17:42<1:50:33, 4.11s/it]
{'loss': 0.2198, 'grad_norm': 0.3649773895740509, 'learning_rate': 1.7164135003988197e-05, 'epoch': 0.64}
64%|██████▍ | 2891/4506 [3:17:42<1:50:33, 4.11s/it]
64%|██████▍ | 2892/4506 [3:17:46<1:51:39, 4.15s/it]
{'loss': 0.2189, 'grad_norm': 0.3840683102607727, 'learning_rate': 1.7145744711760957e-05, 'epoch': 0.64}
64%|██████▍ | 2892/4506 [3:17:46<1:51:39, 4.15s/it]
64%|██████▍ | 2893/4506 [3:17:50<1:52:21, 4.18s/it]
{'loss': 0.2317, 'grad_norm': 0.38084980845451355, 'learning_rate': 1.7127359133896708e-05, 'epoch': 0.64}
64%|██████▍ | 2893/4506 [3:17:50<1:52:21, 4.18s/it]
64%|██████▍ | 2894/4506 [3:17:54<1:51:05, 4.14s/it]
{'loss': 0.2256, 'grad_norm': 0.39669016003608704, 'learning_rate': 1.7108978281431052e-05, 'epoch': 0.64}
64%|██████▍ | 2894/4506 [3:17:54<1:51:05, 4.14s/it]
64%|██████▍ | 2895/4506 [3:17:58<1:50:36, 4.12s/it]
{'loss': 0.2223, 'grad_norm': 0.34870487451553345, 'learning_rate': 1.709060216539672e-05, 'epoch': 0.64}
64%|██████▍ | 2895/4506 [3:17:59<1:50:36, 4.12s/it]
64%|██████▍ | 2896/4506 [3:18:02<1:49:20, 4.07s/it]
{'loss': 0.2272, 'grad_norm': 0.33539772033691406, 'learning_rate': 1.7072230796823634e-05, 'epoch': 0.64}
64%|██████▍ | 2896/4506 [3:18:02<1:49:20, 4.07s/it]
64%|██████▍ | 2897/4506 [3:18:07<1:50:00, 4.10s/it]
{'loss': 0.2241, 'grad_norm': 0.36561158299446106, 'learning_rate': 1.7053864186738823e-05, 'epoch': 0.64}
64%|██████▍ | 2897/4506 [3:18:07<1:50:00, 4.10s/it]
64%|██████▍ | 2898/4506 [3:18:11<1:51:13, 4.15s/it]
{'loss': 0.2215, 'grad_norm': 0.3742469847202301, 'learning_rate': 1.703550234616651e-05, 'epoch': 0.64}
64%|██████▍ | 2898/4506 [3:18:11<1:51:13, 4.15s/it]
64%|██████▍ | 2899/4506 [3:18:15<1:50:37, 4.13s/it]
{'loss': 0.2265, 'grad_norm': 0.36947426199913025, 'learning_rate': 1.701714528612801e-05, 'epoch': 0.64}
64%|██████▍ | 2899/4506 [3:18:15<1:50:37, 4.13s/it]
64%|██████▍ | 2900/4506 [3:18:19<1:51:39, 4.17s/it]
{'loss': 0.2289, 'grad_norm': 0.41648203134536743, 'learning_rate': 1.699879301764181e-05, 'epoch': 0.64}
64%|██████▍ | 2900/4506 [3:18:19<1:51:39, 4.17s/it]
64%|██████▍ | 2901/4506 [3:18:23<1:49:47, 4.10s/it]
{'loss': 0.2319, 'grad_norm': 0.39993128180503845, 'learning_rate': 1.698044555172348e-05, 'epoch': 0.64}
64%|██████▍ | 2901/4506 [3:18:23<1:49:47, 4.10s/it]
64%|██████▍ | 2902/4506 [3:18:28<1:53:12, 4.23s/it]
{'loss': 0.2236, 'grad_norm': 0.36589279770851135, 'learning_rate': 1.6962102899385747e-05, 'epoch': 0.64}
64%|██████▍ | 2902/4506 [3:18:28<1:53:12, 4.23s/it]
64%|██████▍ | 2903/4506 [3:18:32<1:50:41, 4.14s/it]
{'loss': 0.2102, 'grad_norm': 0.3641602098941803, 'learning_rate': 1.6943765071638407e-05, 'epoch': 0.64}
64%|██████▍ | 2903/4506 [3:18:32<1:50:41, 4.14s/it]
64%|██████▍ | 2904/4506 [3:18:35<1:48:01, 4.05s/it]
{'loss': 0.2207, 'grad_norm': 0.39523324370384216, 'learning_rate': 1.6925432079488397e-05, 'epoch': 0.64}
64%|██████▍ | 2904/4506 [3:18:35<1:48:01, 4.05s/it]
64%|██████▍ | 2905/4506 [3:18:40<1:51:17, 4.17s/it]
{'loss': 0.2209, 'grad_norm': 0.43173229694366455, 'learning_rate': 1.690710393393973e-05, 'epoch': 0.64}
64%|██████▍ | 2905/4506 [3:18:40<1:51:17, 4.17s/it]
64%|██████▍ | 2906/4506 [3:18:44<1:51:23, 4.18s/it]
{'loss': 0.2108, 'grad_norm': 0.3661336898803711, 'learning_rate': 1.6888780645993528e-05, 'epoch': 0.65}
64%|██████▍ | 2906/4506 [3:18:44<1:51:23, 4.18s/it]
65%|██████▍ | 2907/4506 [3:18:48<1:50:36, 4.15s/it]
{'loss': 0.2468, 'grad_norm': 0.41545945405960083, 'learning_rate': 1.687046222664797e-05, 'epoch': 0.65}
65%|██████▍ | 2907/4506 [3:18:48<1:50:36, 4.15s/it]
65%|██████▍ | 2908/4506 [3:18:53<1:52:00, 4.21s/it]
{'loss': 0.2276, 'grad_norm': 0.40673115849494934, 'learning_rate': 1.6852148686898345e-05, 'epoch': 0.65}
65%|██████▍ | 2908/4506 [3:18:53<1:52:00, 4.21s/it]
65%|██████▍ | 2909/4506 [3:18:57<1:50:48, 4.16s/it]
{'loss': 0.2265, 'grad_norm': 0.3796389102935791, 'learning_rate': 1.6833840037736986e-05, 'epoch': 0.65}
65%|██████▍ | 2909/4506 [3:18:57<1:50:48, 4.16s/it]
65%|██████▍ | 2910/4506 [3:19:01<1:51:00, 4.17s/it]
{'loss': 0.2166, 'grad_norm': 0.3527551591396332, 'learning_rate': 1.6815536290153295e-05, 'epoch': 0.65}
65%|██████▍ | 2910/4506 [3:19:01<1:51:00, 4.17s/it]
65%|██████▍ | 2911/4506 [3:19:05<1:49:35, 4.12s/it]
{'loss': 0.2221, 'grad_norm': 0.34149810671806335, 'learning_rate': 1.6797237455133757e-05, 'epoch': 0.65}
65%|██████▍ | 2911/4506 [3:19:05<1:49:35, 4.12s/it]
65%|██████▍ | 2912/4506 [3:19:09<1:49:32, 4.12s/it]
{'loss': 0.2214, 'grad_norm': 0.33631834387779236, 'learning_rate': 1.6778943543661867e-05, 'epoch': 0.65}
65%|██████▍ | 2912/4506 [3:19:09<1:49:32, 4.12s/it]
65%|██████▍ | 2913/4506 [3:19:13<1:50:43, 4.17s/it]
{'loss': 0.2284, 'grad_norm': 0.3595815896987915, 'learning_rate': 1.676065456671821e-05, 'epoch': 0.65}
65%|██████▍ | 2913/4506 [3:19:13<1:50:43, 4.17s/it]
65%|██████▍ | 2914/4506 [3:19:18<1:51:48, 4.21s/it]
{'loss': 0.2171, 'grad_norm': 0.34147265553474426, 'learning_rate': 1.674237053528037e-05, 'epoch': 0.65}
65%|██████▍ | 2914/4506 [3:19:18<1:51:48, 4.21s/it]
65%|██████▍ | 2915/4506 [3:19:21<1:47:48, 4.07s/it]
{'loss': 0.2213, 'grad_norm': 0.3530040681362152, 'learning_rate': 1.6724091460322993e-05, 'epoch': 0.65}
65%|██████▍ | 2915/4506 [3:19:21<1:47:48, 4.07s/it]
65%|██████▍ | 2916/4506 [3:19:25<1:47:33, 4.06s/it]
{'loss': 0.2197, 'grad_norm': 0.3973182141780853, 'learning_rate': 1.6705817352817715e-05, 'epoch': 0.65}
65%|██████▍ | 2916/4506 [3:19:25<1:47:33, 4.06s/it]
65%|██████▍ | 2917/4506 [3:19:30<1:49:56, 4.15s/it]
{'loss': 0.2141, 'grad_norm': 0.325665146112442, 'learning_rate': 1.6687548223733233e-05, 'epoch': 0.65}
65%|██████▍ | 2917/4506 [3:19:30<1:49:56, 4.15s/it]
65%|██████▍ | 2918/4506 [3:19:34<1:48:03, 4.08s/it]
{'loss': 0.2348, 'grad_norm': 0.44941142201423645, 'learning_rate': 1.666928408403522e-05, 'epoch': 0.65}
65%|██████▍ | 2918/4506 [3:19:34<1:48:03, 4.08s/it]
65%|██████▍ | 2919/4506 [3:19:38<1:47:46, 4.07s/it]
{'loss': 0.2261, 'grad_norm': 0.3510751724243164, 'learning_rate': 1.6651024944686382e-05, 'epoch': 0.65}
65%|██████▍ | 2919/4506 [3:19:38<1:47:46, 4.07s/it]
65%|██████▍ | 2920/4506 [3:19:42<1:46:57, 4.05s/it]
{'loss': 0.2168, 'grad_norm': 0.3711263835430145, 'learning_rate': 1.6632770816646388e-05, 'epoch': 0.65}
65%|██████▍ | 2920/4506 [3:19:42<1:46:57, 4.05s/it]
65%|██████▍ | 2921/4506 [3:19:46<1:45:56, 4.01s/it]
{'loss': 0.2149, 'grad_norm': 0.38901790976524353, 'learning_rate': 1.6614521710871953e-05, 'epoch': 0.65}
65%|██████▍ | 2921/4506 [3:19:46<1:45:56, 4.01s/it]
65%|██████▍ | 2922/4506 [3:19:50<1:45:50, 4.01s/it]
{'loss': 0.2119, 'grad_norm': 0.3497355580329895, 'learning_rate': 1.6596277638316713e-05, 'epoch': 0.65}
65%|██████▍ | 2922/4506 [3:19:50<1:45:50, 4.01s/it]
65%|██████▍ | 2923/4506 [3:19:53<1:45:03, 3.98s/it]
{'loss': 0.2189, 'grad_norm': 0.4186893403530121, 'learning_rate': 1.6578038609931336e-05, 'epoch': 0.65}
65%|██████▍ | 2923/4506 [3:19:53<1:45:03, 3.98s/it]
65%|██████▍ | 2924/4506 [3:19:58<1:47:40, 4.08s/it]
{'loss': 0.2283, 'grad_norm': 0.39323538541793823, 'learning_rate': 1.655980463666343e-05, 'epoch': 0.65}
65%|██████▍ | 2924/4506 [3:19:58<1:47:40, 4.08s/it]
65%|██████▍ | 2925/4506 [3:20:02<1:46:58, 4.06s/it]
{'loss': 0.2195, 'grad_norm': 0.4128832221031189, 'learning_rate': 1.6541575729457594e-05, 'epoch': 0.65}
65%|██████▍ | 2925/4506 [3:20:02<1:46:58, 4.06s/it]
65%|██████▍ | 2926/4506 [3:20:06<1:45:14, 4.00s/it]
{'loss': 0.2278, 'grad_norm': 0.3923792243003845, 'learning_rate': 1.6523351899255362e-05, 'epoch': 0.65}
65%|██████▍ | 2926/4506 [3:20:06<1:45:14, 4.00s/it]
65%|██████▍ | 2927/4506 [3:20:10<1:45:31, 4.01s/it]
{'loss': 0.2208, 'grad_norm': 0.34026095271110535, 'learning_rate': 1.6505133156995228e-05, 'epoch': 0.65}
65%|██████▍ | 2927/4506 [3:20:10<1:45:31, 4.01s/it]
65%|██████▍ | 2928/4506 [3:20:14<1:45:33, 4.01s/it]
{'loss': 0.2395, 'grad_norm': 0.36347007751464844, 'learning_rate': 1.6486919513612653e-05, 'epoch': 0.65}
65%|██████▍ | 2928/4506 [3:20:14<1:45:33, 4.01s/it]
65%|██████▌ | 2929/4506 [3:20:18<1:44:39, 3.98s/it]
{'loss': 0.22, 'grad_norm': 0.408549964427948, 'learning_rate': 1.6468710980039994e-05, 'epoch': 0.65}
65%|██████▌ | 2929/4506 [3:20:18<1:44:39, 3.98s/it]
65%|██████▌ | 2930/4506 [3:20:22<1:46:03, 4.04s/it]
{'loss': 0.2207, 'grad_norm': 0.437284380197525, 'learning_rate': 1.6450507567206598e-05, 'epoch': 0.65}
65%|██████▌ | 2930/4506 [3:20:22<1:46:03, 4.04s/it]
65%|██████▌ | 2931/4506 [3:20:26<1:44:58, 4.00s/it]
{'loss': 0.2326, 'grad_norm': 0.4149101674556732, 'learning_rate': 1.6432309286038676e-05, 'epoch': 0.65}
65%|██████▌ | 2931/4506 [3:20:26<1:44:58, 4.00s/it]
65%|██████▌ | 2932/4506 [3:20:30<1:47:46, 4.11s/it]
{'loss': 0.2251, 'grad_norm': 0.43288326263427734, 'learning_rate': 1.6414116147459413e-05, 'epoch': 0.65}
65%|██████▌ | 2932/4506 [3:20:30<1:47:46, 4.11s/it]
65%|██████▌ | 2933/4506 [3:20:34<1:47:59, 4.12s/it]
{'loss': 0.2232, 'grad_norm': 0.40796253085136414, 'learning_rate': 1.6395928162388867e-05, 'epoch': 0.65}
65%|██████▌ | 2933/4506 [3:20:34<1:47:59, 4.12s/it]
65%|██████▌ | 2934/4506 [3:20:38<1:48:06, 4.13s/it]
{'loss': 0.2198, 'grad_norm': 0.3531845510005951, 'learning_rate': 1.6377745341744046e-05, 'epoch': 0.65}
65%|██████▌ | 2934/4506 [3:20:38<1:48:06, 4.13s/it]
65%|██████▌ | 2935/4506 [3:20:43<1:49:50, 4.20s/it]
{'loss': 0.2138, 'grad_norm': 0.3229581117630005, 'learning_rate': 1.63595676964388e-05, 'epoch': 0.65}
65%|██████▌ | 2935/4506 [3:20:43<1:49:50, 4.20s/it]
65%|██████▌ | 2936/4506 [3:20:46<1:46:19, 4.06s/it]
{'loss': 0.231, 'grad_norm': 0.5171562433242798, 'learning_rate': 1.6341395237383923e-05, 'epoch': 0.65}
65%|██████▌ | 2936/4506 [3:20:46<1:46:19, 4.06s/it]
65%|██████▌ | 2937/4506 [3:20:51<1:46:49, 4.09s/it]
{'loss': 0.2172, 'grad_norm': 0.43893709778785706, 'learning_rate': 1.6323227975487073e-05, 'epoch': 0.65}
65%|██████▌ | 2937/4506 [3:20:51<1:46:49, 4.09s/it]
65%|██████▌ | 2938/4506 [3:20:55<1:47:17, 4.11s/it]
{'loss': 0.2289, 'grad_norm': 0.3693413734436035, 'learning_rate': 1.6305065921652808e-05, 'epoch': 0.65}
65%|██████▌ | 2938/4506 [3:20:55<1:47:17, 4.11s/it]
65%|██████▌ | 2939/4506 [3:20:59<1:45:47, 4.05s/it]
{'loss': 0.2214, 'grad_norm': 0.4547670781612396, 'learning_rate': 1.6286909086782516e-05, 'epoch': 0.65}
65%|██████▌ | 2939/4506 [3:20:59<1:45:47, 4.05s/it]
65%|██████▌ | 2940/4506 [3:21:02<1:43:36, 3.97s/it]
{'loss': 0.2243, 'grad_norm': 0.4113937020301819, 'learning_rate': 1.626875748177451e-05, 'epoch': 0.65}
65%|██████▌ | 2940/4506 [3:21:02<1:43:36, 3.97s/it]
65%|██████▌ | 2941/4506 [3:21:07<1:45:28, 4.04s/it]
{'loss': 0.2243, 'grad_norm': 0.39952313899993896, 'learning_rate': 1.6250611117523914e-05, 'epoch': 0.65}
65%|██████▌ | 2941/4506 [3:21:07<1:45:28, 4.04s/it]
65%|██████▌ | 2942/4506 [3:21:11<1:46:45, 4.10s/it]
{'loss': 0.2395, 'grad_norm': 0.3929509222507477, 'learning_rate': 1.6232470004922744e-05, 'epoch': 0.65}
65%|██████▌ | 2942/4506 [3:21:11<1:46:45, 4.10s/it]
65%|██████▌ | 2943/4506 [3:21:15<1:47:55, 4.14s/it]
{'loss': 0.2276, 'grad_norm': 0.40536585450172424, 'learning_rate': 1.621433415485985e-05, 'epoch': 0.65}
65%|██████▌ | 2943/4506 [3:21:15<1:47:55, 4.14s/it]
65%|██████▌ | 2944/4506 [3:21:19<1:49:05, 4.19s/it]
{'loss': 0.221, 'grad_norm': 0.3845546841621399, 'learning_rate': 1.6196203578220896e-05, 'epoch': 0.65}
65%|██████▌ | 2944/4506 [3:21:19<1:49:05, 4.19s/it]
65%|██████▌ | 2945/4506 [3:21:23<1:46:39, 4.10s/it]
{'loss': 0.2185, 'grad_norm': 0.4308321177959442, 'learning_rate': 1.6178078285888436e-05, 'epoch': 0.65}
65%|██████▌ | 2945/4506 [3:21:23<1:46:39, 4.10s/it]
65%|██████▌ | 2946/4506 [3:21:28<1:48:12, 4.16s/it]
{'loss': 0.2086, 'grad_norm': 0.31609147787094116, 'learning_rate': 1.6159958288741798e-05, 'epoch': 0.65}
65%|██████▌ | 2946/4506 [3:21:28<1:48:12, 4.16s/it]
65%|██████▌ | 2947/4506 [3:21:32<1:51:31, 4.29s/it]
{'loss': 0.2135, 'grad_norm': 0.3521648049354553, 'learning_rate': 1.6141843597657174e-05, 'epoch': 0.65}
65%|██████▌ | 2947/4506 [3:21:32<1:51:31, 4.29s/it]
65%|██████▌ | 2948/4506 [3:21:36<1:50:31, 4.26s/it]
{'loss': 0.22, 'grad_norm': 0.42461666464805603, 'learning_rate': 1.6123734223507535e-05, 'epoch': 0.65}
65%|██████▌ | 2948/4506 [3:21:36<1:50:31, 4.26s/it]
65%|██████▌ | 2949/4506 [3:21:40<1:47:32, 4.14s/it]
{'loss': 0.2248, 'grad_norm': 0.4339952766895294, 'learning_rate': 1.61056301771627e-05, 'epoch': 0.65}
65%|██████▌ | 2949/4506 [3:21:40<1:47:32, 4.14s/it]
65%|██████▌ | 2950/4506 [3:21:44<1:45:28, 4.07s/it]
{'loss': 0.2284, 'grad_norm': 0.39901745319366455, 'learning_rate': 1.6087531469489247e-05, 'epoch': 0.65}
65%|██████▌ | 2950/4506 [3:21:44<1:45:28, 4.07s/it]
65%|██████▌ | 2951/4506 [3:21:48<1:43:23, 3.99s/it]
{'loss': 0.2143, 'grad_norm': 0.41603225469589233, 'learning_rate': 1.6069438111350585e-05, 'epoch': 0.66}
65%|██████▌ | 2951/4506 [3:21:48<1:43:23, 3.99s/it]
66%|██████▌ | 2952/4506 [3:21:52<1:40:26, 3.88s/it]
{'loss': 0.2238, 'grad_norm': 0.44647637009620667, 'learning_rate': 1.6051350113606888e-05, 'epoch': 0.66}
66%|██████▌ | 2952/4506 [3:21:52<1:40:26, 3.88s/it]
66%|██████▌ | 2953/4506 [3:21:56<1:42:25, 3.96s/it]
{'loss': 0.2159, 'grad_norm': 0.3627834916114807, 'learning_rate': 1.603326748711514e-05, 'epoch': 0.66}
66%|██████▌ | 2953/4506 [3:21:56<1:42:25, 3.96s/it]
66%|██████▌ | 2954/4506 [3:22:00<1:43:03, 3.98s/it]
{'loss': 0.217, 'grad_norm': 0.3475397825241089, 'learning_rate': 1.6015190242729063e-05, 'epoch': 0.66}
66%|██████▌ | 2954/4506 [3:22:00<1:43:03, 3.98s/it]
66%|██████▌ | 2955/4506 [3:22:04<1:45:19, 4.07s/it]
{'loss': 0.2309, 'grad_norm': 0.3532862365245819, 'learning_rate': 1.599711839129918e-05, 'epoch': 0.66}
66%|██████▌ | 2955/4506 [3:22:04<1:45:19, 4.07s/it]
66%|██████▌ | 2956/4506 [3:22:08<1:45:49, 4.10s/it]
{'loss': 0.2266, 'grad_norm': 0.44549301266670227, 'learning_rate': 1.597905194367276e-05, 'epoch': 0.66}
66%|██████▌ | 2956/4506 [3:22:08<1:45:49, 4.10s/it]
66%|██████▌ | 2957/4506 [3:22:12<1:45:32, 4.09s/it]
{'loss': 0.2207, 'grad_norm': 0.42029470205307007, 'learning_rate': 1.5960990910693847e-05, 'epoch': 0.66}
66%|██████▌ | 2957/4506 [3:22:12<1:45:32, 4.09s/it]
66%|██████▌ | 2958/4506 [3:22:16<1:44:16, 4.04s/it]
{'loss': 0.2138, 'grad_norm': 0.35013172030448914, 'learning_rate': 1.59429353032032e-05, 'epoch': 0.66}
66%|██████▌ | 2958/4506 [3:22:16<1:44:16, 4.04s/it]
66%|██████▌ | 2959/4506 [3:22:20<1:45:01, 4.07s/it]
{'loss': 0.2187, 'grad_norm': 0.3742494583129883, 'learning_rate': 1.5924885132038375e-05, 'epoch': 0.66}
66%|██████▌ | 2959/4506 [3:22:20<1:45:01, 4.07s/it]
66%|██████▌ | 2960/4506 [3:22:24<1:44:03, 4.04s/it]
{'loss': 0.2345, 'grad_norm': 0.4579346776008606, 'learning_rate': 1.5906840408033614e-05, 'epoch': 0.66}
66%|██████▌ | 2960/4506 [3:22:24<1:44:03, 4.04s/it]
66%|██████▌ | 2961/4506 [3:22:28<1:43:21, 4.01s/it]
{'loss': 0.2141, 'grad_norm': 0.42398935556411743, 'learning_rate': 1.5888801142019906e-05, 'epoch': 0.66}
66%|██████▌ | 2961/4506 [3:22:28<1:43:21, 4.01s/it]
66%|██████▌ | 2962/4506 [3:22:33<1:44:59, 4.08s/it]
{'loss': 0.2183, 'grad_norm': 0.41237467527389526, 'learning_rate': 1.587076734482498e-05, 'epoch': 0.66}
66%|██████▌ | 2962/4506 [3:22:33<1:44:59, 4.08s/it]
66%|██████▌ | 2963/4506 [3:22:36<1:41:44, 3.96s/it]
{'loss': 0.2346, 'grad_norm': 0.45704206824302673, 'learning_rate': 1.5852739027273256e-05, 'epoch': 0.66}
66%|██████▌ | 2963/4506 [3:22:36<1:41:44, 3.96s/it]
66%|██████▌ | 2964/4506 [3:22:40<1:41:24, 3.95s/it]
{'loss': 0.2247, 'grad_norm': 0.3850363492965698, 'learning_rate': 1.583471620018589e-05, 'epoch': 0.66}
66%|██████▌ | 2964/4506 [3:22:40<1:41:24, 3.95s/it]
66%|██████▌ | 2965/4506 [3:22:44<1:43:38, 4.04s/it]
{'loss': 0.2372, 'grad_norm': 0.38302892446517944, 'learning_rate': 1.5816698874380722e-05, 'epoch': 0.66}
66%|██████▌ | 2965/4506 [3:22:44<1:43:38, 4.04s/it]
66%|██████▌ | 2966/4506 [3:22:48<1:42:09, 3.98s/it]
{'loss': 0.2317, 'grad_norm': 0.3481009602546692, 'learning_rate': 1.579868706067232e-05, 'epoch': 0.66}
66%|██████▌ | 2966/4506 [3:22:48<1:42:09, 3.98s/it]
66%|██████▌ | 2967/4506 [3:22:52<1:42:19, 3.99s/it]
{'loss': 0.2262, 'grad_norm': 0.34376829862594604, 'learning_rate': 1.5780680769871886e-05, 'epoch': 0.66}
66%|██████▌ | 2967/4506 [3:22:52<1:42:19, 3.99s/it]
66%|██████▌ | 2968/4506 [3:22:56<1:43:55, 4.05s/it]
{'loss': 0.227, 'grad_norm': 0.4170290231704712, 'learning_rate': 1.5762680012787378e-05, 'epoch': 0.66}
66%|██████▌ | 2968/4506 [3:22:56<1:43:55, 4.05s/it]
66%|██████▌ | 2969/4506 [3:23:01<1:44:18, 4.07s/it]
{'loss': 0.2287, 'grad_norm': 0.3428385853767395, 'learning_rate': 1.574468480022338e-05, 'epoch': 0.66}
66%|██████▌ | 2969/4506 [3:23:01<1:44:18, 4.07s/it]
66%|██████▌ | 2970/4506 [3:23:04<1:42:30, 4.00s/it]
{'loss': 0.2166, 'grad_norm': 0.37294164299964905, 'learning_rate': 1.5726695142981177e-05, 'epoch': 0.66}
66%|██████▌ | 2970/4506 [3:23:04<1:42:30, 4.00s/it]
66%|██████▌ | 2971/4506 [3:23:09<1:45:33, 4.13s/it]
{'loss': 0.2188, 'grad_norm': 0.3533340096473694, 'learning_rate': 1.5708711051858692e-05, 'epoch': 0.66}
66%|██████▌ | 2971/4506 [3:23:09<1:45:33, 4.13s/it]
66%|██████▌ | 2972/4506 [3:23:13<1:47:01, 4.19s/it]
{'loss': 0.2243, 'grad_norm': 0.3414674401283264, 'learning_rate': 1.5690732537650548e-05, 'epoch': 0.66}
66%|██████▌ | 2972/4506 [3:23:13<1:47:01, 4.19s/it]
66%|██████▌ | 2973/4506 [3:23:17<1:43:09, 4.04s/it]
{'loss': 0.2166, 'grad_norm': 0.4385148882865906, 'learning_rate': 1.5672759611147974e-05, 'epoch': 0.66}
66%|██████▌ | 2973/4506 [3:23:17<1:43:09, 4.04s/it]
66%|██████▌ | 2974/4506 [3:23:21<1:43:05, 4.04s/it]
{'loss': 0.2233, 'grad_norm': 0.37263190746307373, 'learning_rate': 1.5654792283138882e-05, 'epoch': 0.66}
66%|██████▌ | 2974/4506 [3:23:21<1:43:05, 4.04s/it]
66%|██████▌ | 2975/4506 [3:23:25<1:42:43, 4.03s/it]
{'loss': 0.2289, 'grad_norm': 0.4040169417858124, 'learning_rate': 1.5636830564407795e-05, 'epoch': 0.66}
66%|██████▌ | 2975/4506 [3:23:25<1:42:43, 4.03s/it]
66%|██████▌ | 2976/4506 [3:23:29<1:43:17, 4.05s/it]
{'loss': 0.2192, 'grad_norm': 0.4094831943511963, 'learning_rate': 1.56188744657359e-05, 'epoch': 0.66}
66%|██████▌ | 2976/4506 [3:23:29<1:43:17, 4.05s/it]
66%|██████▌ | 2977/4506 [3:23:33<1:42:55, 4.04s/it]
{'loss': 0.2196, 'grad_norm': 0.3828543722629547, 'learning_rate': 1.5600923997900982e-05, 'epoch': 0.66}
66%|██████▌ | 2977/4506 [3:23:33<1:42:55, 4.04s/it]
66%|██████▌ | 2978/4506 [3:23:37<1:43:31, 4.06s/it]
{'loss': 0.2291, 'grad_norm': 0.3624255955219269, 'learning_rate': 1.5582979171677453e-05, 'epoch': 0.66}
66%|██████▌ | 2978/4506 [3:23:37<1:43:31, 4.06s/it]
66%|██████▌ | 2979/4506 [3:23:41<1:41:44, 4.00s/it]
{'loss': 0.2271, 'grad_norm': 0.39003458619117737, 'learning_rate': 1.556503999783636e-05, 'epoch': 0.66}
66%|██████▌ | 2979/4506 [3:23:41<1:41:44, 4.00s/it]
66%|██████▌ | 2980/4506 [3:23:45<1:40:58, 3.97s/it]
{'loss': 0.2186, 'grad_norm': 0.3437541127204895, 'learning_rate': 1.5547106487145323e-05, 'epoch': 0.66}
66%|██████▌ | 2980/4506 [3:23:45<1:40:58, 3.97s/it]
66%|██████▌ | 2981/4506 [3:23:49<1:40:58, 3.97s/it]
{'loss': 0.2257, 'grad_norm': 0.3395892381668091, 'learning_rate': 1.5529178650368604e-05, 'epoch': 0.66}
66%|██████▌ | 2981/4506 [3:23:49<1:40:58, 3.97s/it]
66%|██████▌ | 2982/4506 [3:23:53<1:40:38, 3.96s/it]
{'loss': 0.2179, 'grad_norm': 0.35656705498695374, 'learning_rate': 1.5511256498267014e-05, 'epoch': 0.66}
66%|██████▌ | 2982/4506 [3:23:53<1:40:38, 3.96s/it]
66%|██████▌ | 2983/4506 [3:23:57<1:41:05, 3.98s/it]
{'loss': 0.212, 'grad_norm': 0.32110095024108887, 'learning_rate': 1.549334004159798e-05, 'epoch': 0.66}
66%|██████▌ | 2983/4506 [3:23:57<1:41:05, 3.98s/it]
66%|██████▌ | 2984/4506 [3:24:01<1:43:20, 4.07s/it]
{'loss': 0.2268, 'grad_norm': 0.4122837781906128, 'learning_rate': 1.5475429291115507e-05, 'epoch': 0.66}
66%|██████▌ | 2984/4506 [3:24:01<1:43:20, 4.07s/it]
66%|██████▌ | 2985/4506 [3:24:05<1:41:09, 3.99s/it]
{'loss': 0.2247, 'grad_norm': 0.361045777797699, 'learning_rate': 1.5457524257570184e-05, 'epoch': 0.66}
66%|██████▌ | 2985/4506 [3:24:05<1:41:09, 3.99s/it]
66%|██████▋ | 2986/4506 [3:24:09<1:41:14, 4.00s/it]
{'loss': 0.2153, 'grad_norm': 0.42888757586479187, 'learning_rate': 1.5439624951709126e-05, 'epoch': 0.66}
66%|██████▋ | 2986/4506 [3:24:09<1:41:14, 4.00s/it]
66%|██████▋ | 2987/4506 [3:24:13<1:41:16, 4.00s/it]
{'loss': 0.2224, 'grad_norm': 0.42364567518234253, 'learning_rate': 1.5421731384276076e-05, 'epoch': 0.66}
66%|██████▋ | 2987/4506 [3:24:13<1:41:16, 4.00s/it]
66%|██████▋ | 2988/4506 [3:24:17<1:41:39, 4.02s/it]
{'loss': 0.2147, 'grad_norm': 0.41127708554267883, 'learning_rate': 1.540384356601127e-05, 'epoch': 0.66}
66%|██████▋ | 2988/4506 [3:24:17<1:41:39, 4.02s/it]
66%|██████▋ | 2989/4506 [3:24:21<1:40:19, 3.97s/it]
{'loss': 0.2265, 'grad_norm': 0.3596389889717102, 'learning_rate': 1.538596150765154e-05, 'epoch': 0.66}
66%|██████▋ | 2989/4506 [3:24:21<1:40:19, 3.97s/it]
66%|██████▋ | 2990/4506 [3:24:25<1:44:59, 4.16s/it]
{'loss': 0.2329, 'grad_norm': 0.35791391134262085, 'learning_rate': 1.536808521993023e-05, 'epoch': 0.66}
66%|██████▋ | 2990/4506 [3:24:25<1:44:59, 4.16s/it]
66%|██████▋ | 2991/4506 [3:24:30<1:45:41, 4.19s/it]
{'loss': 0.2087, 'grad_norm': 0.41625529527664185, 'learning_rate': 1.535021471357724e-05, 'epoch': 0.66}
66%|██████▋ | 2991/4506 [3:24:30<1:45:41, 4.19s/it]
66%|██████▋ | 2992/4506 [3:24:34<1:45:44, 4.19s/it]
{'loss': 0.2144, 'grad_norm': 0.36615437269210815, 'learning_rate': 1.533234999931898e-05, 'epoch': 0.66}
66%|██████▋ | 2992/4506 [3:24:34<1:45:44, 4.19s/it]
66%|██████▋ | 2993/4506 [3:24:38<1:48:19, 4.30s/it]
{'loss': 0.2218, 'grad_norm': 0.36903566122055054, 'learning_rate': 1.531449108787841e-05, 'epoch': 0.66}
66%|██████▋ | 2993/4506 [3:24:38<1:48:19, 4.30s/it]
66%|██████▋ | 2994/4506 [3:24:42<1:45:50, 4.20s/it]
{'loss': 0.2236, 'grad_norm': 0.3571740686893463, 'learning_rate': 1.5296637989974988e-05, 'epoch': 0.66}
66%|██████▋ | 2994/4506 [3:24:42<1:45:50, 4.20s/it]
66%|██████▋ | 2995/4506 [3:24:46<1:43:14, 4.10s/it]
{'loss': 0.2306, 'grad_norm': 0.45321956276893616, 'learning_rate': 1.5278790716324674e-05, 'epoch': 0.66}
66%|██████▋ | 2995/4506 [3:24:46<1:43:14, 4.10s/it]
66%|██████▋ | 2996/4506 [3:24:50<1:44:04, 4.14s/it]
{'loss': 0.2231, 'grad_norm': 0.3684597611427307, 'learning_rate': 1.5260949277639957e-05, 'epoch': 0.67}
66%|██████▋ | 2996/4506 [3:24:50<1:44:04, 4.14s/it]
67%|██████▋ | 2997/4506 [3:24:54<1:42:37, 4.08s/it]
{'loss': 0.2228, 'grad_norm': 0.3663969933986664, 'learning_rate': 1.5243113684629801e-05, 'epoch': 0.67}
67%|██████▋ | 2997/4506 [3:24:54<1:42:37, 4.08s/it]
67%|██████▋ | 2998/4506 [3:24:59<1:43:03, 4.10s/it]
{'loss': 0.2234, 'grad_norm': 0.38783976435661316, 'learning_rate': 1.5225283947999686e-05, 'epoch': 0.67}
67%|██████▋ | 2998/4506 [3:24:59<1:43:03, 4.10s/it]
67%|██████▋ | 2999/4506 [3:25:03<1:43:16, 4.11s/it]
{'loss': 0.2146, 'grad_norm': 0.3255474865436554, 'learning_rate': 1.5207460078451551e-05, 'epoch': 0.67}
67%|██████▋ | 2999/4506 [3:25:03<1:43:16, 4.11s/it]
67%|██████▋ | 3000/4506 [3:25:07<1:40:52, 4.02s/it]
{'loss': 0.229, 'grad_norm': 0.38356390595436096, 'learning_rate': 1.5189642086683838e-05, 'epoch': 0.67}
67%|██████▋ | 3000/4506 [3:25:07<1:40:52, 4.02s/it]
67%|██████▋ | 3001/4506 [3:25:11<1:45:16, 4.20s/it]
{'loss': 0.2338, 'grad_norm': 0.40235891938209534, 'learning_rate': 1.5171829983391429e-05, 'epoch': 0.67}
67%|██████▋ | 3001/4506 [3:25:11<1:45:16, 4.20s/it]
67%|██████▋ | 3002/4506 [3:25:15<1:42:03, 4.07s/it]
{'loss': 0.2247, 'grad_norm': 0.40508273243904114, 'learning_rate': 1.5154023779265706e-05, 'epoch': 0.67}
67%|██████▋ | 3003/4506 [3:25:19<1:44:14, 4.16s/it]
{'loss': 0.2141, 'grad_norm': 0.3318498432636261, 'learning_rate': 1.5136223484994483e-05, 'epoch': 0.67}
67%|██████▋ | 3004/4506 [3:25:23<1:42:40, 4.10s/it]
{'loss': 0.21, 'grad_norm': 0.3509775996208191, 'learning_rate': 1.511842911126205e-05, 'epoch': 0.67}
67%|██████▋ | 3005/4506 [3:25:28<1:44:04, 4.16s/it]
{'loss': 0.2132, 'grad_norm': 0.3575451672077179, 'learning_rate': 1.5100640668749119e-05, 'epoch': 0.67}
67%|██████▋ | 3006/4506 [3:25:31<1:42:20, 4.09s/it]
{'loss': 0.2288, 'grad_norm': 0.40029701590538025, 'learning_rate': 1.5082858168132866e-05, 'epoch': 0.67}
67%|██████▋ | 3007/4506 [3:25:36<1:42:19, 4.10s/it]
{'loss': 0.2119, 'grad_norm': 0.3539907932281494, 'learning_rate': 1.5065081620086877e-05, 'epoch': 0.67}
67%|██████▋ | 3008/4506 [3:25:39<1:40:57, 4.04s/it]
{'loss': 0.2264, 'grad_norm': 0.3596251904964447, 'learning_rate': 1.504731103528119e-05, 'epoch': 0.67}
67%|██████▋ | 3009/4506 [3:25:43<1:38:59, 3.97s/it]
{'loss': 0.2259, 'grad_norm': 0.46509650349617004, 'learning_rate': 1.5029546424382238e-05, 'epoch': 0.67}
67%|██████▋ | 3010/4506 [3:25:48<1:41:04, 4.05s/it]
{'loss': 0.2261, 'grad_norm': 0.41474950313568115, 'learning_rate': 1.5011787798052898e-05, 'epoch': 0.67}
67%|██████▋ | 3011/4506 [3:25:52<1:42:25, 4.11s/it]
{'loss': 0.2209, 'grad_norm': 0.3834429383277893, 'learning_rate': 1.4994035166952421e-05, 'epoch': 0.67}
67%|██████▋ | 3012/4506 [3:25:56<1:40:44, 4.05s/it]
{'loss': 0.2114, 'grad_norm': 0.3778914511203766, 'learning_rate': 1.4976288541736477e-05, 'epoch': 0.67}
67%|██████▋ | 3013/4506 [3:25:59<1:37:31, 3.92s/it]
{'loss': 0.2134, 'grad_norm': 0.4105207324028015, 'learning_rate': 1.4958547933057149e-05, 'epoch': 0.67}
67%|██████▋ | 3014/4506 [3:26:03<1:39:08, 3.99s/it]
{'loss': 0.2064, 'grad_norm': 0.35180696845054626, 'learning_rate': 1.4940813351562866e-05, 'epoch': 0.67}
67%|██████▋ | 3015/4506 [3:26:08<1:39:47, 4.02s/it]
{'loss': 0.2299, 'grad_norm': 0.41399624943733215, 'learning_rate': 1.4923084807898479e-05, 'epoch': 0.67}
67%|██████▋ | 3016/4506 [3:26:11<1:38:59, 3.99s/it]
{'loss': 0.2166, 'grad_norm': 0.3395949900150299, 'learning_rate': 1.490536231270519e-05, 'epoch': 0.67}
67%|██████▋ | 3017/4506 [3:26:15<1:37:33, 3.93s/it]
{'loss': 0.2258, 'grad_norm': 0.4222262501716614, 'learning_rate': 1.4887645876620587e-05, 'epoch': 0.67}
67%|██████▋ | 3018/4506 [3:26:19<1:37:50, 3.95s/it]
{'loss': 0.2144, 'grad_norm': 0.33339181542396545, 'learning_rate': 1.486993551027861e-05, 'epoch': 0.67}
67%|██████▋ | 3019/4506 [3:26:23<1:37:54, 3.95s/it]
{'loss': 0.2133, 'grad_norm': 0.394127756357193, 'learning_rate': 1.485223122430957e-05, 'epoch': 0.67}
67%|██████▋ | 3020/4506 [3:26:27<1:38:27, 3.98s/it]
{'loss': 0.2177, 'grad_norm': 0.4080653786659241, 'learning_rate': 1.4834533029340104e-05, 'epoch': 0.67}
67%|██████▋ | 3021/4506 [3:26:31<1:35:46, 3.87s/it]
{'loss': 0.2223, 'grad_norm': 0.42131903767585754, 'learning_rate': 1.4816840935993209e-05, 'epoch': 0.67}
67%|██████▋ | 3022/4506 [3:26:35<1:36:05, 3.88s/it]
{'loss': 0.2254, 'grad_norm': 0.3885858356952667, 'learning_rate': 1.4799154954888222e-05, 'epoch': 0.67}
67%|██████▋ | 3023/4506 [3:26:39<1:36:44, 3.91s/it]
{'loss': 0.2127, 'grad_norm': 0.3729693293571472, 'learning_rate': 1.4781475096640815e-05, 'epoch': 0.67}
67%|██████▋ | 3024/4506 [3:26:43<1:35:59, 3.89s/it]
{'loss': 0.218, 'grad_norm': 0.38233786821365356, 'learning_rate': 1.4763801371862956e-05, 'epoch': 0.67}
67%|██████▋ | 3025/4506 [3:26:47<1:39:44, 4.04s/it]
{'loss': 0.2166, 'grad_norm': 0.3478695750236511, 'learning_rate': 1.4746133791162976e-05, 'epoch': 0.67}
67%|██████▋ | 3026/4506 [3:26:51<1:41:18, 4.11s/it]
{'loss': 0.2094, 'grad_norm': 0.3242485821247101, 'learning_rate': 1.4728472365145471e-05, 'epoch': 0.67}
67%|██████▋ | 3027/4506 [3:26:55<1:41:47, 4.13s/it]
{'loss': 0.2251, 'grad_norm': 0.3843131363391876, 'learning_rate': 1.4710817104411386e-05, 'epoch': 0.67}
67%|██████▋ | 3028/4506 [3:26:59<1:38:28, 4.00s/it]
{'loss': 0.2089, 'grad_norm': 0.3701571226119995, 'learning_rate': 1.4693168019557943e-05, 'epoch': 0.67}
67%|██████▋ | 3029/4506 [3:27:03<1:39:07, 4.03s/it]
{'loss': 0.2292, 'grad_norm': 0.37307319045066833, 'learning_rate': 1.4675525121178646e-05, 'epoch': 0.67}
67%|██████▋ | 3030/4506 [3:27:07<1:38:14, 3.99s/it]
{'loss': 0.2232, 'grad_norm': 0.4099442660808563, 'learning_rate': 1.4657888419863308e-05, 'epoch': 0.67}
67%|██████▋ | 3031/4506 [3:27:11<1:37:07, 3.95s/it]
{'loss': 0.2113, 'grad_norm': 0.38284018635749817, 'learning_rate': 1.4640257926198014e-05, 'epoch': 0.67}
67%|██████▋ | 3032/4506 [3:27:15<1:39:16, 4.04s/it]
{'loss': 0.2198, 'grad_norm': 0.33967670798301697, 'learning_rate': 1.4622633650765134e-05, 'epoch': 0.67}
67%|██████▋ | 3033/4506 [3:27:19<1:38:27, 4.01s/it]
{'loss': 0.2075, 'grad_norm': 0.33085015416145325, 'learning_rate': 1.4605015604143269e-05, 'epoch': 0.67}
67%|██████▋ | 3034/4506 [3:27:23<1:38:21, 4.01s/it]
{'loss': 0.2233, 'grad_norm': 0.41126805543899536, 'learning_rate': 1.4587403796907328e-05, 'epoch': 0.67}
67%|██████▋ | 3035/4506 [3:27:28<1:41:53, 4.16s/it]
{'loss': 0.2226, 'grad_norm': 0.4214102029800415, 'learning_rate': 1.4569798239628431e-05, 'epoch': 0.67}
67%|██████▋ | 3036/4506 [3:27:32<1:39:41, 4.07s/it]
{'loss': 0.2171, 'grad_norm': 0.3715529441833496, 'learning_rate': 1.4552198942874004e-05, 'epoch': 0.67}
67%|██████▋ | 3037/4506 [3:27:36<1:42:39, 4.19s/it]
{'loss': 0.2176, 'grad_norm': 0.4166722595691681, 'learning_rate': 1.453460591720765e-05, 'epoch': 0.67}
67%|██████▋ | 3038/4506 [3:27:40<1:41:33, 4.15s/it]
{'loss': 0.2249, 'grad_norm': 0.3877812922000885, 'learning_rate': 1.4517019173189258e-05, 'epoch': 0.67}
67%|██████▋ | 3039/4506 [3:27:44<1:42:14, 4.18s/it]
{'loss': 0.2123, 'grad_norm': 0.3639332354068756, 'learning_rate': 1.4499438721374908e-05, 'epoch': 0.67}
67%|██████▋ | 3040/4506 [3:27:48<1:40:02, 4.09s/it]
{'loss': 0.2243, 'grad_norm': 0.37953490018844604, 'learning_rate': 1.4481864572316945e-05, 'epoch': 0.67}
67%|██████▋ | 3041/4506 [3:27:53<1:42:41, 4.21s/it]
{'loss': 0.2271, 'grad_norm': 0.3758881390094757, 'learning_rate': 1.4464296736563881e-05, 'epoch': 0.67}
68%|██████▊ | 3042/4506 [3:27:57<1:43:49, 4.25s/it]
{'loss': 0.2246, 'grad_norm': 0.4163985252380371, 'learning_rate': 1.4446735224660479e-05, 'epoch': 0.68}
68%|██████▊ | 3043/4506 [3:28:01<1:42:20, 4.20s/it]
{'loss': 0.2165, 'grad_norm': 0.34291839599609375, 'learning_rate': 1.4429180047147698e-05, 'epoch': 0.68}
68%|██████▊ | 3044/4506 [3:28:05<1:42:00, 4.19s/it]
{'loss': 0.2267, 'grad_norm': 0.4473089277744293, 'learning_rate': 1.4411631214562691e-05, 'epoch': 0.68}
68%|██████▊ | 3045/4506 [3:28:09<1:40:51, 4.14s/it]
{'loss': 0.2095, 'grad_norm': 0.36842402815818787, 'learning_rate': 1.4394088737438795e-05, 'epoch': 0.68}
68%|██████▊ | 3046/4506 [3:28:14<1:42:32, 4.21s/it]
{'loss': 0.2212, 'grad_norm': 0.3897131085395813, 'learning_rate': 1.437655262630553e-05, 'epoch': 0.68}
68%|██████▊ | 3047/4506 [3:28:17<1:38:33, 4.05s/it]
{'loss': 0.2118, 'grad_norm': 0.35742485523223877, 'learning_rate': 1.4359022891688612e-05, 'epoch': 0.68}
68%|██████▊ | 3048/4506 [3:28:21<1:37:53, 4.03s/it]
{'loss': 0.2172, 'grad_norm': 0.3991433382034302, 'learning_rate': 1.434149954410992e-05, 'epoch': 0.68}
68%|██████▊ | 3049/4506 [3:28:26<1:41:15, 4.17s/it]
{'loss': 0.2175, 'grad_norm': 0.3923925757408142, 'learning_rate': 1.4323982594087514e-05, 'epoch': 0.68}
68%|██████▊ | 3050/4506 [3:28:30<1:41:34, 4.19s/it]
{'loss': 0.2358, 'grad_norm': 0.42051270604133606, 'learning_rate': 1.4306472052135577e-05, 'epoch': 0.68}
68%|██████▊ | 3051/4506 [3:28:34<1:40:08, 4.13s/it]
{'loss': 0.2168, 'grad_norm': 0.3524439334869385, 'learning_rate': 1.4288967928764491e-05, 'epoch': 0.68}
68%|██████▊ | 3052/4506 [3:28:38<1:39:58, 4.13s/it]
{'loss': 0.2179, 'grad_norm': 0.42019638419151306, 'learning_rate': 1.4271470234480744e-05, 'epoch': 0.68}
68%|██████▊ | 3053/4506 [3:28:42<1:39:33, 4.11s/it]
{'loss': 0.2318, 'grad_norm': 0.37379515171051025, 'learning_rate': 1.4253978979787e-05, 'epoch': 0.68}
68%|██████▊ | 3054/4506 [3:28:47<1:42:35, 4.24s/it]
{'loss': 0.2147, 'grad_norm': 0.4212897717952728, 'learning_rate': 1.4236494175182016e-05, 'epoch': 0.68}
68%|██████▊ | 3055/4506 [3:28:51<1:42:03, 4.22s/it]
{'loss': 0.2117, 'grad_norm': 0.3164077401161194, 'learning_rate': 1.4219015831160742e-05, 'epoch': 0.68}
68%|██████▊ | 3056/4506 [3:28:55<1:41:00, 4.18s/it]
{'loss': 0.214, 'grad_norm': 0.3872397840023041, 'learning_rate': 1.4201543958214186e-05, 'epoch': 0.68}
68%|██████▊ | 3057/4506 [3:29:00<1:43:34, 4.29s/it]
{'loss': 0.2195, 'grad_norm': 0.3856452405452728, 'learning_rate': 1.4184078566829512e-05, 'epoch': 0.68}
68%|██████▊ | 3058/4506 [3:29:04<1:41:46, 4.22s/it]
{'loss': 0.2164, 'grad_norm': 0.3520124554634094, 'learning_rate': 1.4166619667489961e-05, 'epoch': 0.68}
68%|██████▊ | 3059/4506 [3:29:08<1:45:48, 4.39s/it]
{'loss': 0.221, 'grad_norm': 0.34653833508491516, 'learning_rate': 1.4149167270674918e-05, 'epoch': 0.68}
68%|██████▊ | 3060/4506 [3:29:12<1:42:34, 4.26s/it]
{'loss': 0.2188, 'grad_norm': 0.3974013328552246, 'learning_rate': 1.4131721386859825e-05, 'epoch': 0.68}
68%|██████▊ | 3061/4506 [3:29:16<1:39:54, 4.15s/it]
{'loss': 0.217, 'grad_norm': 0.3854479193687439, 'learning_rate': 1.4114282026516235e-05, 'epoch': 0.68}
68%|██████▊ | 3062/4506 [3:29:20<1:40:19, 4.17s/it]
{'loss': 0.2106, 'grad_norm': 0.3681335151195526, 'learning_rate': 1.4096849200111794e-05, 'epoch': 0.68}
68%|██████▊ | 3063/4506 [3:29:24<1:37:05, 4.04s/it]
{'loss': 0.2127, 'grad_norm': 0.3879750669002533, 'learning_rate': 1.4079422918110199e-05, 'epoch': 0.68}
68%|██████▊ | 3064/4506 [3:29:28<1:37:58, 4.08s/it]
{'loss': 0.2132, 'grad_norm': 0.35643890500068665, 'learning_rate': 1.406200319097125e-05, 'epoch': 0.68}
68%|██████▊ | 3065/4506 [3:29:33<1:39:44, 4.15s/it]
{'loss': 0.2186, 'grad_norm': 0.3789287805557251, 'learning_rate': 1.4044590029150783e-05, 'epoch': 0.68}
68%|██████▊ | 3066/4506 [3:29:37<1:39:05, 4.13s/it]
{'loss': 0.209, 'grad_norm': 0.3912944495677948, 'learning_rate': 1.4027183443100716e-05, 'epoch': 0.68}
68%|██████▊ | 3067/4506 [3:29:41<1:39:23, 4.14s/it]
{'loss': 0.2191, 'grad_norm': 0.3307666778564453, 'learning_rate': 1.4009783443269009e-05, 'epoch': 0.68}
68%|██████▊ | 3068/4506 [3:29:45<1:38:31, 4.11s/it]
{'loss': 0.2124, 'grad_norm': 0.3732035458087921, 'learning_rate': 1.399239004009968e-05, 'epoch': 0.68}
68%|██████▊ | 3069/4506 [3:29:49<1:39:23, 4.15s/it]
{'loss': 0.2239, 'grad_norm': 0.4669533669948578, 'learning_rate': 1.3975003244032764e-05, 'epoch': 0.68}
68%|██████▊ | 3070/4506 [3:29:53<1:38:19, 4.11s/it]
{'loss': 0.226, 'grad_norm': 0.3655960261821747, 'learning_rate': 1.395762306550436e-05, 'epoch': 0.68}
68%|██████▊ | 3071/4506 [3:29:57<1:38:37, 4.12s/it]
{'loss': 0.2137, 'grad_norm': 0.35749465227127075, 'learning_rate': 1.3940249514946563e-05, 'epoch': 0.68}
68%|██████▊ | 3072/4506 [3:30:02<1:38:49, 4.14s/it]
{'loss': 0.2168, 'grad_norm': 0.385691374540329, 'learning_rate': 1.3922882602787523e-05, 'epoch': 0.68}
68%|██████▊ | 3073/4506 [3:30:06<1:37:17, 4.07s/it]
{'loss': 0.2095, 'grad_norm': 0.3642350435256958, 'learning_rate': 1.3905522339451365e-05, 'epoch': 0.68}
68%|██████▊ | 3074/4506 [3:30:10<1:39:12, 4.16s/it]
{'loss': 0.21, 'grad_norm': 0.34934002161026, 'learning_rate': 1.3888168735358285e-05, 'epoch': 0.68}
68%|██████▊ | 3075/4506 [3:30:14<1:38:35, 4.13s/it]
{'loss': 0.2134, 'grad_norm': 0.3577991724014282, 'learning_rate': 1.387082180092441e-05, 'epoch': 0.68}
68%|██████▊ | 3076/4506 [3:30:18<1:36:41, 4.06s/it]
{'loss': 0.2148, 'grad_norm': 0.3720307946205139, 'learning_rate': 1.3853481546561924e-05, 'epoch': 0.68}
68%|██████▊ | 3077/4506 [3:30:22<1:38:50, 4.15s/it]
{'loss': 0.2176, 'grad_norm': 0.3440043330192566, 'learning_rate': 1.383614798267896e-05, 'epoch': 0.68}
68%|██████▊ | 3078/4506 [3:30:26<1:36:54, 4.07s/it]
{'loss': 0.222, 'grad_norm': 0.40477636456489563, 'learning_rate': 1.3818821119679643e-05, 'epoch': 0.68}
68%|██████▊ | 3079/4506 [3:30:30<1:36:16, 4.05s/it]
{'loss': 0.2077, 'grad_norm': 0.36231574416160583, 'learning_rate': 1.3801500967964095e-05, 'epoch': 0.68}
68%|██████▊ | 3080/4506 [3:30:34<1:36:03, 4.04s/it]
{'loss': 0.226, 'grad_norm': 0.41371914744377136, 'learning_rate': 1.3784187537928394e-05, 'epoch': 0.68}
68%|██████▊ | 3081/4506 [3:30:38<1:32:41, 3.90s/it]
{'loss': 0.2168, 'grad_norm': 0.4094100594520569, 'learning_rate': 1.3766880839964602e-05, 'epoch': 0.68}
68%|██████▊ | 3082/4506 [3:30:42<1:35:39, 4.03s/it]
{'loss': 0.2173, 'grad_norm': 0.38340896368026733, 'learning_rate': 1.3749580884460706e-05, 'epoch': 0.68}
68%|██████▊ | 3083/4506 [3:30:46<1:35:35, 4.03s/it]
{'loss': 0.2177, 'grad_norm': 0.38461804389953613, 'learning_rate': 1.3732287681800679e-05, 'epoch': 0.68}
68%|██████▊ | 3084/4506 [3:30:50<1:33:55, 3.96s/it]
{'loss': 0.2131, 'grad_norm': 0.3918326497077942, 'learning_rate': 1.3715001242364411e-05, 'epoch': 0.68}
68%|██████▊ | 3085/4506 [3:30:54<1:35:41, 4.04s/it]
{'loss': 0.2262, 'grad_norm': 0.34038445353507996, 'learning_rate': 1.3697721576527761e-05, 'epoch': 0.68}
68%|██████▊ | 3086/4506 [3:30:58<1:37:39, 4.13s/it]
{'loss': 0.2051, 'grad_norm': 0.33987611532211304, 'learning_rate': 1.3680448694662513e-05, 'epoch': 0.68}
69%|██████▊ | 3087/4506 [3:31:03<1:38:24, 4.16s/it]
{'loss': 0.2148, 'grad_norm': 0.3805027902126312, 'learning_rate': 1.3663182607136377e-05, 'epoch': 0.69}
69%|██████▊ | 3088/4506 [3:31:07<1:39:24, 4.21s/it]
{'loss': 0.2179, 'grad_norm': 0.38012486696243286, 'learning_rate': 1.3645923324312968e-05, 'epoch': 0.69}
69%|██████▊ | 3089/4506 [3:31:11<1:38:43, 4.18s/it]
{'loss': 0.2108, 'grad_norm': 0.4126494824886322, 'learning_rate': 1.3628670856551856e-05, 'epoch': 0.69}
69%|██████▊ | 3090/4506 [3:31:15<1:37:43, 4.14s/it]
{'loss': 0.2172, 'grad_norm': 0.44266921281814575, 'learning_rate': 1.3611425214208473e-05, 'epoch': 0.69}
69%|██████▊ | 3091/4506 [3:31:20<1:40:36, 4.27s/it]
{'loss': 0.218, 'grad_norm': 0.3651542663574219, 'learning_rate': 1.3594186407634202e-05, 'epoch': 0.69}
69%|██████▊ | 3092/4506 [3:31:24<1:38:07, 4.16s/it]
{'loss': 0.2256, 'grad_norm': 0.38825151324272156, 'learning_rate': 1.3576954447176265e-05, 'epoch': 0.69}
69%|██████▊ | 3093/4506 [3:31:28<1:39:36, 4.23s/it]
{'loss': 0.2337, 'grad_norm': 0.48349717259407043, 'learning_rate': 1.3559729343177851e-05, 'epoch': 0.69}
69%|██████▊ | 3094/4506 [3:31:32<1:36:19, 4.09s/it]
{'loss': 0.2247, 'grad_norm': 0.46677517890930176, 'learning_rate': 1.3542511105977974e-05, 'epoch': 0.69}
69%|██████▊ | 3095/4506 [3:31:36<1:34:42, 4.03s/it]
{'loss': 0.2101, 'grad_norm': 0.4047313928604126, 'learning_rate': 1.3525299745911533e-05, 'epoch': 0.69}
69%|██████▊ | 3096/4506 [3:31:40<1:33:46, 3.99s/it]
{'loss': 0.2147, 'grad_norm': 0.40130364894866943, 'learning_rate': 1.3508095273309324e-05, 'epoch': 0.69}
69%|██████▊ | 3097/4506 [3:31:43<1:32:16, 3.93s/it]
{'loss': 0.2101, 'grad_norm': 0.4196714758872986, 'learning_rate': 1.3490897698497985e-05, 'epoch': 0.69}
69%|██████▉ | 3098/4506 [3:31:47<1:32:45, 3.95s/it]
{'loss': 0.2218, 'grad_norm': 0.3737318217754364, 'learning_rate': 1.3473707031800023e-05, 'epoch': 0.69}
69%|██████▉ | 3099/4506 [3:31:51<1:33:59, 4.01s/it]
{'loss': 0.2286, 'grad_norm': 0.3767370879650116, 'learning_rate': 1.3456523283533807e-05, 'epoch': 0.69}
69%|██████▉ | 3100/4506 [3:31:56<1:34:36, 4.04s/it]
{'loss': 0.2165, 'grad_norm': 0.36635157465934753, 'learning_rate': 1.3439346464013552e-05, 'epoch': 0.69}
69%|██████▉ | 3101/4506 [3:32:00<1:35:34, 4.08s/it]
{'loss': 0.2149, 'grad_norm': 0.4551670551300049, 'learning_rate': 1.342217658354929e-05, 'epoch': 0.69}
69%|██████▉ | 3102/4506 [3:32:04<1:35:50, 4.10s/it]
{'loss': 0.213, 'grad_norm': 0.3522827625274658, 'learning_rate': 1.3405013652446925e-05, 'epoch': 0.69}
69%|██████▉ | 3103/4506 [3:32:08<1:33:38, 4.00s/it]
{'loss': 0.2189, 'grad_norm': 0.38395634293556213, 'learning_rate': 1.3387857681008145e-05, 'epoch': 0.69}
69%|██████▉ | 3104/4506 [3:32:12<1:35:44, 4.10s/it]
{'loss': 0.2215, 'grad_norm': 0.33808889985084534, 'learning_rate': 1.3370708679530502e-05, 'epoch': 0.69}
69%|██████▉ | 3105/4506 [3:32:16<1:34:36, 4.05s/it]
{'loss': 0.2182, 'grad_norm': 0.3900679349899292, 'learning_rate': 1.3353566658307351e-05, 'epoch': 0.69}
69%|██████▉ | 3106/4506 [3:32:20<1:32:08, 3.95s/it]
{'loss': 0.2148, 'grad_norm': 0.39984315633773804, 'learning_rate': 1.333643162762786e-05, 'epoch': 0.69}
69%|██████▉ | 3107/4506 [3:32:24<1:32:52, 3.98s/it]
{'loss': 0.2295, 'grad_norm': 0.3701474666595459, 'learning_rate': 1.3319303597776978e-05, 'epoch': 0.69}
69%|██████▉ | 3108/4506 [3:32:28<1:35:46, 4.11s/it]
{'loss': 0.2087, 'grad_norm': 0.3700549006462097, 'learning_rate': 1.3302182579035482e-05, 'epoch': 0.69}
69%|██████▉ | 3109/4506 [3:32:32<1:34:42, 4.07s/it]
{'loss': 0.2175, 'grad_norm': 0.3483114242553711, 'learning_rate': 1.3285068581679922e-05, 'epoch': 0.69}
69%|██████▉ | 3110/4506 [3:32:36<1:36:39, 4.15s/it]
{'loss': 0.2199, 'grad_norm': 0.42252710461616516, 'learning_rate': 1.326796161598265e-05, 'epoch': 0.69}
69%|██████▉ | 3111/4506 [3:32:41<1:38:50, 4.25s/it]
{'loss': 0.2252, 'grad_norm': 0.4239134192466736, 'learning_rate': 1.325086169221177e-05, 'epoch': 0.69}
69%|██████▉ | 3112/4506 [3:32:45<1:36:03, 4.13s/it]
{'loss': 0.2111, 'grad_norm': 0.38118427991867065, 'learning_rate': 1.3233768820631184e-05, 'epoch': 0.69}
69%|██████▉ | 3113/4506 [3:32:49<1:35:19, 4.11s/it]
{'loss': 0.218, 'grad_norm': 0.4022994935512543, 'learning_rate': 1.321668301150057e-05, 'epoch': 0.69}
69%|██████▉ | 3114/4506 [3:32:53<1:33:04, 4.01s/it]
{'loss': 0.2135, 'grad_norm': 0.3920004367828369, 'learning_rate': 1.3199604275075326e-05, 'epoch': 0.69}
69%|██████▉ | 3115/4506 [3:32:56<1:32:09, 3.98s/it]
{'loss': 0.2164, 'grad_norm': 0.4231116473674774, 'learning_rate': 1.3182532621606647e-05, 'epoch': 0.69}
69%|██████▉ | 3116/4506 [3:33:00<1:31:52, 3.97s/it]
{'loss': 0.2072, 'grad_norm': 0.479007363319397, 'learning_rate': 1.3165468061341445e-05, 'epoch': 0.69}
69%|██████▉ | 3117/4506 [3:33:04<1:32:17, 3.99s/it]
{'loss': 0.23, 'grad_norm': 0.4014793038368225, 'learning_rate': 1.3148410604522393e-05, 'epoch': 0.69}
69%|██████▉ | 3118/4506 [3:33:09<1:34:19, 4.08s/it]
{'loss': 0.2105, 'grad_norm': 0.35468918085098267, 'learning_rate': 1.3131360261387898e-05, 'epoch': 0.69}
69%|██████▉ | 3119/4506 [3:33:13<1:37:05, 4.20s/it]
{'loss': 0.214, 'grad_norm': 0.4124232828617096, 'learning_rate': 1.3114317042172106e-05, 'epoch': 0.69}
69%|██████▉ | 3120/4506 [3:33:18<1:39:24, 4.30s/it]
{'loss': 0.2244, 'grad_norm': 0.4211292862892151, 'learning_rate': 1.3097280957104851e-05, 'epoch': 0.69}
69%|██████▉ | 3121/4506 [3:33:22<1:37:34, 4.23s/it]
{'loss': 0.2276, 'grad_norm': 0.4520873725414276, 'learning_rate': 1.3080252016411737e-05, 'epoch': 0.69}
69%|██████▉ | 3122/4506 [3:33:26<1:35:43, 4.15s/it]
{'loss': 0.2038, 'grad_norm': 0.3736242651939392, 'learning_rate': 1.3063230230314028e-05, 'epoch': 0.69}
69%|██████▉ | 3123/4506 [3:33:30<1:33:06, 4.04s/it]
{'loss': 0.2129, 'grad_norm': 0.4281829595565796, 'learning_rate': 1.3046215609028722e-05, 'epoch': 0.69}
69%|██████▉ | 3124/4506 [3:33:33<1:31:32, 3.97s/it]
{'loss': 0.2256, 'grad_norm': 0.44390588998794556, 'learning_rate': 1.3029208162768525e-05, 'epoch': 0.69}
69%|██████▉ | 3125/4506 [3:33:37<1:30:15, 3.92s/it]
{'loss': 0.2216, 'grad_norm': 0.38088127970695496, 'learning_rate': 1.3012207901741824e-05, 'epoch': 0.69}
69%|██████▉ | 3126/4506 [3:33:41<1:29:02, 3.87s/it]
{'loss': 0.2192, 'grad_norm': 0.4054166376590729, 'learning_rate': 1.2995214836152676e-05, 'epoch': 0.69}
69%|██████▉ | 3127/4506 [3:33:45<1:30:38, 3.94s/it]
{'loss': 0.2032, 'grad_norm': 0.3545200228691101, 'learning_rate': 1.2978228976200858e-05, 'epoch': 0.69}
69%|██████▉ | 3128/4506 [3:33:49<1:28:50, 3.87s/it]
{'loss': 0.2032, 'grad_norm': 0.4265173673629761, 'learning_rate': 1.2961250332081782e-05, 'epoch': 0.69}
69%|██████▉ | 3129/4506 [3:33:53<1:28:15, 3.85s/it]
{'loss': 0.2278, 'grad_norm': 0.4238649904727936, 'learning_rate': 1.2944278913986549e-05, 'epoch': 0.69}
69%|██████▉ | 3130/4506 [3:33:57<1:29:30, 3.90s/it]
{'loss': 0.2323, 'grad_norm': 0.4618402421474457, 'learning_rate': 1.2927314732101922e-05, 'epoch': 0.69}
69%|██████▉ | 3131/4506 [3:34:01<1:31:52, 4.01s/it]
{'loss': 0.2094, 'grad_norm': 0.3910558223724365, 'learning_rate': 1.2910357796610323e-05, 'epoch': 0.69}
70%|██████▉ | 3132/4506 [3:34:05<1:34:39, 4.13s/it]
{'loss': 0.2178, 'grad_norm': 0.3838731646537781, 'learning_rate': 1.2893408117689825e-05, 'epoch': 0.7}
70%|██████▉ | 3133/4506 [3:34:09<1:33:22, 4.08s/it]
{'loss': 0.2053, 'grad_norm': 0.4419490694999695, 'learning_rate': 1.2876465705514129e-05, 'epoch': 0.7}
70%|██████▉ | 3134/4506 [3:34:14<1:35:12, 4.16s/it]
{'loss': 0.2037, 'grad_norm': 0.3260953426361084, 'learning_rate': 1.28595305702526e-05, 'epoch': 0.7}
70%|██████▉ | 3135/4506 [3:34:18<1:35:28, 4.18s/it]
{'loss': 0.215, 'grad_norm': 0.46018239855766296, 'learning_rate': 1.2842602722070207e-05, 'epoch': 0.7}
70%|██████▉ | 3136/4506 [3:34:22<1:34:22, 4.13s/it]
{'loss': 0.2101, 'grad_norm': 0.35907313227653503, 'learning_rate': 1.2825682171127563e-05, 'epoch': 0.7}
70%|██████▉ | 3137/4506 [3:34:26<1:35:47, 4.20s/it]
{'loss': 0.2207, 'grad_norm': 0.40936365723609924, 'learning_rate': 1.2808768927580899e-05, 'epoch': 0.7}
70%|██████▉ | 3138/4506 [3:34:30<1:34:49, 4.16s/it]
{'loss': 0.2316, 'grad_norm': 0.4725489616394043, 'learning_rate': 1.2791863001582077e-05, 'epoch': 0.7}
70%|██████▉ | 3139/4506 [3:34:34<1:33:26, 4.10s/it]
{'loss': 0.2155, 'grad_norm': 0.3548602759838104, 'learning_rate': 1.2774964403278516e-05, 'epoch': 0.7}
70%|██████▉ | 3140/4506 [3:34:38<1:33:23, 4.10s/it]
{'loss': 0.2173, 'grad_norm': 0.37274304032325745, 'learning_rate': 1.2758073142813296e-05, 'epoch': 0.7}
70%|██████▉ | 3141/4506 [3:34:42<1:32:20, 4.06s/it]
{'loss': 0.2056, 'grad_norm': 0.3589990437030792, 'learning_rate': 1.2741189230325042e-05, 'epoch': 0.7}
70%|██████▉ | 3142/4506 [3:34:46<1:30:57, 4.00s/it]
{'loss': 0.225, 'grad_norm': 0.4090579152107239, 'learning_rate': 1.2724312675948008e-05, 'epoch': 0.7}
70%|██████▉ | 3143/4506 [3:34:50<1:29:11, 3.93s/it]
{'loss': 0.2179, 'grad_norm': 0.4188744127750397, 'learning_rate': 1.2707443489812007e-05, 'epoch': 0.7}
70%|██████▉ | 3144/4506 [3:34:54<1:31:04, 4.01s/it]
{'loss': 0.2111, 'grad_norm': 0.3537863492965698, 'learning_rate': 1.2690581682042452e-05, 'epoch': 0.7}
70%|██████▉ | 3145/4506 [3:34:58<1:30:15, 3.98s/it]
{'loss': 0.2195, 'grad_norm': 0.3991938829421997, 'learning_rate': 1.2673727262760304e-05, 'epoch': 0.7}
70%|██████▉ | 3146/4506 [3:35:02<1:32:05, 4.06s/it]
{'loss': 0.2192, 'grad_norm': 0.34051334857940674, 'learning_rate': 1.2656880242082089e-05, 'epoch': 0.7}
70%|██████▉ | 3147/4506 [3:35:06<1:31:37, 4.05s/it]
{'loss': 0.2148, 'grad_norm': 0.38634195923805237, 'learning_rate': 1.2640040630119917e-05, 'epoch': 0.7}
70%|██████▉ | 3148/4506 [3:35:10<1:31:46, 4.06s/it]
{'loss': 0.2242, 'grad_norm': 0.3919277489185333, 'learning_rate': 1.2623208436981421e-05, 'epoch': 0.7}
70%|██████▉ | 3149/4506 [3:35:15<1:32:32, 4.09s/it]
{'loss': 0.2201, 'grad_norm': 0.3989778757095337, 'learning_rate': 1.26063836727698e-05, 'epoch': 0.7}
70%|██████▉ | 3150/4506 [3:35:19<1:33:58, 4.16s/it]
{'loss': 0.22, 'grad_norm': 0.3680225610733032, 'learning_rate': 1.2589566347583793e-05, 'epoch': 0.7}
70%|██████▉ | 3151/4506 [3:35:22<1:30:01, 3.99s/it]
{'loss': 0.221, 'grad_norm': 0.43468716740608215, 'learning_rate': 1.2572756471517677e-05, 'epoch': 0.7}
70%|██████▉ | 3152/4506 [3:35:27<1:30:40, 4.02s/it]
{'loss': 0.2071, 'grad_norm': 0.3242214620113373, 'learning_rate': 1.2555954054661239e-05, 'epoch': 0.7}
70%|██████▉ | 3153/4506 [3:35:31<1:31:28, 4.06s/it]
{'loss': 0.228, 'grad_norm': 0.39319539070129395, 'learning_rate': 1.253915910709981e-05, 'epoch': 0.7}
70%|██████▉ | 3154/4506 [3:35:35<1:32:47, 4.12s/it]
{'loss': 0.2103, 'grad_norm': 0.3524678349494934, 'learning_rate': 1.2522371638914216e-05, 'epoch': 0.7}
70%|███████ | 3155/4506 [3:35:39<1:33:51, 4.17s/it]
{'loss': 0.2394, 'grad_norm': 0.43534085154533386, 'learning_rate': 1.2505591660180816e-05, 'epoch': 0.7}
70%|███████ | 3155/4506 [3:35:39<1:33:51, 4.17s/it]
70%|███████ | 3156/4506 [3:35:44<1:37:14, 4.32s/it]
{'loss': 0.2279, 'grad_norm': 0.38532233238220215, 'learning_rate': 1.2488819180971456e-05, 'epoch': 0.7}
70%|███████ | 3156/4506 [3:35:44<1:37:14, 4.32s/it]
70%|███████ | 3157/4506 [3:35:48<1:34:02, 4.18s/it]
{'loss': 0.2171, 'grad_norm': 0.38325977325439453, 'learning_rate': 1.2472054211353507e-05, 'epoch': 0.7}
70%|███████ | 3157/4506 [3:35:48<1:34:02, 4.18s/it]
70%|███████ | 3158/4506 [3:35:52<1:37:01, 4.32s/it]
{'loss': 0.2115, 'grad_norm': 0.36837121844291687, 'learning_rate': 1.2455296761389793e-05, 'epoch': 0.7}
70%|███████ | 3158/4506 [3:35:52<1:37:01, 4.32s/it]
70%|███████ | 3159/4506 [3:35:56<1:35:33, 4.26s/it]
{'loss': 0.2196, 'grad_norm': 0.3583650588989258, 'learning_rate': 1.2438546841138659e-05, 'epoch': 0.7}
70%|███████ | 3159/4506 [3:35:57<1:35:33, 4.26s/it]
70%|███████ | 3160/4506 [3:36:01<1:34:40, 4.22s/it]
{'loss': 0.228, 'grad_norm': 0.42000478506088257, 'learning_rate': 1.2421804460653904e-05, 'epoch': 0.7}
70%|███████ | 3160/4506 [3:36:01<1:34:40, 4.22s/it]
70%|███████ | 3161/4506 [3:36:05<1:35:36, 4.27s/it]
{'loss': 0.2338, 'grad_norm': 0.4003738760948181, 'learning_rate': 1.2405069629984823e-05, 'epoch': 0.7}
70%|███████ | 3161/4506 [3:36:05<1:35:36, 4.27s/it]
70%|███████ | 3162/4506 [3:36:09<1:35:13, 4.25s/it]
{'loss': 0.2177, 'grad_norm': 0.4031253159046173, 'learning_rate': 1.2388342359176176e-05, 'epoch': 0.7}
70%|███████ | 3162/4506 [3:36:09<1:35:13, 4.25s/it]
70%|███████ | 3163/4506 [3:36:13<1:33:33, 4.18s/it]
{'loss': 0.216, 'grad_norm': 0.42333418130874634, 'learning_rate': 1.2371622658268162e-05, 'epoch': 0.7}
70%|███████ | 3163/4506 [3:36:13<1:33:33, 4.18s/it]
70%|███████ | 3164/4506 [3:36:17<1:32:06, 4.12s/it]
{'loss': 0.2114, 'grad_norm': 0.3379456102848053, 'learning_rate': 1.2354910537296475e-05, 'epoch': 0.7}
70%|███████ | 3164/4506 [3:36:17<1:32:06, 4.12s/it]
70%|███████ | 3165/4506 [3:36:22<1:34:19, 4.22s/it]
{'loss': 0.217, 'grad_norm': 0.36261484026908875, 'learning_rate': 1.233820600629222e-05, 'epoch': 0.7}
70%|███████ | 3165/4506 [3:36:22<1:34:19, 4.22s/it]
70%|███████ | 3166/4506 [3:36:26<1:34:09, 4.22s/it]
{'loss': 0.22, 'grad_norm': 0.4104851186275482, 'learning_rate': 1.2321509075281981e-05, 'epoch': 0.7}
70%|███████ | 3166/4506 [3:36:26<1:34:09, 4.22s/it]
70%|███████ | 3167/4506 [3:36:30<1:33:34, 4.19s/it]
{'loss': 0.2182, 'grad_norm': 0.37079674005508423, 'learning_rate': 1.2304819754287747e-05, 'epoch': 0.7}
70%|███████ | 3167/4506 [3:36:30<1:33:34, 4.19s/it]
70%|███████ | 3168/4506 [3:36:34<1:32:01, 4.13s/it]
{'loss': 0.2133, 'grad_norm': 0.346151739358902, 'learning_rate': 1.2288138053326967e-05, 'epoch': 0.7}
70%|███████ | 3168/4506 [3:36:34<1:32:01, 4.13s/it]
70%|███████ | 3169/4506 [3:36:38<1:33:29, 4.20s/it]
{'loss': 0.2031, 'grad_norm': 0.35405096411705017, 'learning_rate': 1.2271463982412502e-05, 'epoch': 0.7}
70%|███████ | 3169/4506 [3:36:38<1:33:29, 4.20s/it]
70%|███████ | 3170/4506 [3:36:42<1:31:29, 4.11s/it]
{'loss': 0.2142, 'grad_norm': 0.34323981404304504, 'learning_rate': 1.225479755155265e-05, 'epoch': 0.7}
70%|███████ | 3170/4506 [3:36:42<1:31:29, 4.11s/it]
70%|███████ | 3171/4506 [3:36:46<1:31:10, 4.10s/it]
{'loss': 0.2156, 'grad_norm': 0.36681145429611206, 'learning_rate': 1.2238138770751087e-05, 'epoch': 0.7}
70%|███████ | 3171/4506 [3:36:46<1:31:10, 4.10s/it]
70%|███████ | 3172/4506 [3:36:50<1:29:07, 4.01s/it]
{'loss': 0.2091, 'grad_norm': 0.36411234736442566, 'learning_rate': 1.222148765000694e-05, 'epoch': 0.7}
70%|███████ | 3172/4506 [3:36:50<1:29:07, 4.01s/it]
70%|███████ | 3173/4506 [3:36:55<1:31:35, 4.12s/it]
{'loss': 0.213, 'grad_norm': 0.3364955186843872, 'learning_rate': 1.2204844199314705e-05, 'epoch': 0.7}
70%|███████ | 3173/4506 [3:36:55<1:31:35, 4.12s/it]
70%|███████ | 3174/4506 [3:36:59<1:32:23, 4.16s/it]
{'loss': 0.2188, 'grad_norm': 0.4012974798679352, 'learning_rate': 1.2188208428664289e-05, 'epoch': 0.7}
70%|███████ | 3174/4506 [3:36:59<1:32:23, 4.16s/it]
70%|███████ | 3175/4506 [3:37:03<1:31:36, 4.13s/it]
{'loss': 0.2185, 'grad_norm': 0.3935071527957916, 'learning_rate': 1.2171580348040992e-05, 'epoch': 0.7}
70%|███████ | 3175/4506 [3:37:03<1:31:36, 4.13s/it]
70%|███████ | 3176/4506 [3:37:07<1:32:11, 4.16s/it]
{'loss': 0.2138, 'grad_norm': 0.362449049949646, 'learning_rate': 1.2154959967425503e-05, 'epoch': 0.7}
70%|███████ | 3176/4506 [3:37:07<1:32:11, 4.16s/it]
71%|███████ | 3177/4506 [3:37:11<1:32:51, 4.19s/it]
{'loss': 0.2223, 'grad_norm': 0.3988601267337799, 'learning_rate': 1.2138347296793859e-05, 'epoch': 0.71}
71%|███████ | 3177/4506 [3:37:11<1:32:51, 4.19s/it]
71%|███████ | 3178/4506 [3:37:16<1:33:07, 4.21s/it]
{'loss': 0.2243, 'grad_norm': 0.374472975730896, 'learning_rate': 1.2121742346117513e-05, 'epoch': 0.71}
71%|███████ | 3178/4506 [3:37:16<1:33:07, 4.21s/it]
71%|███████ | 3179/4506 [3:37:20<1:32:40, 4.19s/it]
{'loss': 0.2106, 'grad_norm': 0.38424408435821533, 'learning_rate': 1.2105145125363238e-05, 'epoch': 0.71}
71%|███████ | 3179/4506 [3:37:20<1:32:40, 4.19s/it]
71%|███████ | 3180/4506 [3:37:24<1:31:44, 4.15s/it]
{'loss': 0.2116, 'grad_norm': 0.35822808742523193, 'learning_rate': 1.2088555644493205e-05, 'epoch': 0.71}
71%|███████ | 3180/4506 [3:37:24<1:31:44, 4.15s/it]
71%|███████ | 3181/4506 [3:37:28<1:32:15, 4.18s/it]
{'loss': 0.2218, 'grad_norm': 0.4106532335281372, 'learning_rate': 1.2071973913464932e-05, 'epoch': 0.71}
71%|███████ | 3181/4506 [3:37:28<1:32:15, 4.18s/it]
71%|███████ | 3182/4506 [3:37:32<1:28:48, 4.02s/it]
{'loss': 0.2151, 'grad_norm': 0.3917100727558136, 'learning_rate': 1.205539994223126e-05, 'epoch': 0.71}
71%|███████ | 3182/4506 [3:37:32<1:28:48, 4.02s/it]
71%|███████ | 3183/4506 [3:37:36<1:27:41, 3.98s/it]
{'loss': 0.2142, 'grad_norm': 0.3965434432029724, 'learning_rate': 1.2038833740740413e-05, 'epoch': 0.71}
71%|███████ | 3183/4506 [3:37:36<1:27:41, 3.98s/it]
71%|███████ | 3184/4506 [3:37:40<1:28:10, 4.00s/it]
{'loss': 0.1983, 'grad_norm': 0.3262520134449005, 'learning_rate': 1.2022275318935904e-05, 'epoch': 0.71}
71%|███████ | 3184/4506 [3:37:40<1:28:10, 4.00s/it]
71%|███████ | 3185/4506 [3:37:44<1:27:34, 3.98s/it]
{'loss': 0.218, 'grad_norm': 0.4352213740348816, 'learning_rate': 1.2005724686756626e-05, 'epoch': 0.71}
71%|███████ | 3185/4506 [3:37:44<1:27:34, 3.98s/it]
71%|███████ | 3186/4506 [3:37:47<1:26:52, 3.95s/it]
{'loss': 0.211, 'grad_norm': 0.3467179238796234, 'learning_rate': 1.1989181854136747e-05, 'epoch': 0.71}
71%|███████ | 3186/4506 [3:37:47<1:26:52, 3.95s/it]
71%|███████ | 3187/4506 [3:37:51<1:26:19, 3.93s/it]
{'loss': 0.2118, 'grad_norm': 0.3492768406867981, 'learning_rate': 1.1972646831005797e-05, 'epoch': 0.71}
71%|███████ | 3187/4506 [3:37:51<1:26:19, 3.93s/it]
71%|███████ | 3188/4506 [3:37:55<1:27:28, 3.98s/it]
{'loss': 0.2005, 'grad_norm': 0.393365740776062, 'learning_rate': 1.195611962728859e-05, 'epoch': 0.71}
71%|███████ | 3188/4506 [3:37:55<1:27:28, 3.98s/it]
71%|███████ | 3189/4506 [3:37:59<1:26:12, 3.93s/it]
{'loss': 0.2256, 'grad_norm': 0.38690099120140076, 'learning_rate': 1.1939600252905273e-05, 'epoch': 0.71}
71%|███████ | 3189/4506 [3:37:59<1:26:12, 3.93s/it]
71%|███████ | 3190/4506 [3:38:03<1:26:23, 3.94s/it]
{'loss': 0.2142, 'grad_norm': 0.37729957699775696, 'learning_rate': 1.1923088717771256e-05, 'epoch': 0.71}
71%|███████ | 3190/4506 [3:38:03<1:26:23, 3.94s/it]
71%|███████ | 3191/4506 [3:38:08<1:29:54, 4.10s/it]
{'loss': 0.2178, 'grad_norm': 0.4304269254207611, 'learning_rate': 1.1906585031797284e-05, 'epoch': 0.71}
71%|███████ | 3191/4506 [3:38:08<1:29:54, 4.10s/it]
71%|███████ | 3192/4506 [3:38:12<1:28:51, 4.06s/it]
{'loss': 0.2134, 'grad_norm': 0.3739289939403534, 'learning_rate': 1.1890089204889352e-05, 'epoch': 0.71}
71%|███████ | 3192/4506 [3:38:12<1:28:51, 4.06s/it]
71%|███████ | 3193/4506 [3:38:16<1:28:23, 4.04s/it]
{'loss': 0.2109, 'grad_norm': 0.37471383810043335, 'learning_rate': 1.1873601246948765e-05, 'epoch': 0.71}
71%|███████ | 3193/4506 [3:38:16<1:28:23, 4.04s/it]
71%|███████ | 3194/4506 [3:38:20<1:30:19, 4.13s/it]
{'loss': 0.2112, 'grad_norm': 0.43773940205574036, 'learning_rate': 1.18571211678721e-05, 'epoch': 0.71}
71%|███████ | 3194/4506 [3:38:20<1:30:19, 4.13s/it]
71%|███████ | 3195/4506 [3:38:24<1:30:10, 4.13s/it]
{'loss': 0.2148, 'grad_norm': 0.4167539179325104, 'learning_rate': 1.1840648977551213e-05, 'epoch': 0.71}
71%|███████ | 3195/4506 [3:38:24<1:30:10, 4.13s/it]
71%|███████ | 3196/4506 [3:38:28<1:31:50, 4.21s/it]
{'loss': 0.2235, 'grad_norm': 0.5008059144020081, 'learning_rate': 1.18241846858732e-05, 'epoch': 0.71}
71%|███████ | 3196/4506 [3:38:28<1:31:50, 4.21s/it]
71%|███████ | 3197/4506 [3:38:33<1:31:10, 4.18s/it]
{'loss': 0.2083, 'grad_norm': 0.3385120928287506, 'learning_rate': 1.180772830272042e-05, 'epoch': 0.71}
71%|███████ | 3197/4506 [3:38:33<1:31:10, 4.18s/it]
71%|███████ | 3198/4506 [3:38:37<1:31:33, 4.20s/it]
{'loss': 0.2165, 'grad_norm': 0.43278229236602783, 'learning_rate': 1.1791279837970509e-05, 'epoch': 0.71}
71%|███████ | 3198/4506 [3:38:37<1:31:33, 4.20s/it]
71%|███████ | 3199/4506 [3:38:41<1:29:29, 4.11s/it]
{'loss': 0.2122, 'grad_norm': 0.3867837190628052, 'learning_rate': 1.1774839301496332e-05, 'epoch': 0.71}
71%|███████ | 3199/4506 [3:38:41<1:29:29, 4.11s/it]
71%|███████ | 3200/4506 [3:38:45<1:29:30, 4.11s/it]
{'loss': 0.2198, 'grad_norm': 0.3768579959869385, 'learning_rate': 1.1758406703166011e-05, 'epoch': 0.71}
71%|███████ | 3200/4506 [3:38:45<1:29:30, 4.11s/it]
71%|███████ | 3201/4506 [3:38:49<1:28:54, 4.09s/it]
{'loss': 0.2114, 'grad_norm': 0.3513030707836151, 'learning_rate': 1.174198205284287e-05, 'epoch': 0.71}
71%|███████ | 3201/4506 [3:38:49<1:28:54, 4.09s/it]
71%|███████ | 3202/4506 [3:38:53<1:30:24, 4.16s/it]
{'loss': 0.221, 'grad_norm': 0.3808373808860779, 'learning_rate': 1.1725565360385505e-05, 'epoch': 0.71}
71%|███████ | 3202/4506 [3:38:53<1:30:24, 4.16s/it]
71%|███████ | 3203/4506 [3:38:57<1:28:11, 4.06s/it]
{'loss': 0.2019, 'grad_norm': 0.42069926857948303, 'learning_rate': 1.1709156635647694e-05, 'epoch': 0.71}
71%|███████ | 3203/4506 [3:38:57<1:28:11, 4.06s/it]
71%|███████ | 3204/4506 [3:39:02<1:31:54, 4.24s/it]
{'loss': 0.2226, 'grad_norm': 0.35286182165145874, 'learning_rate': 1.1692755888478474e-05, 'epoch': 0.71}
71%|███████ | 3204/4506 [3:39:02<1:31:54, 4.24s/it]
71%|███████ | 3205/4506 [3:39:06<1:30:04, 4.15s/it]
{'loss': 0.2112, 'grad_norm': 0.33698806166648865, 'learning_rate': 1.1676363128722051e-05, 'epoch': 0.71}
71%|███████ | 3205/4506 [3:39:06<1:30:04, 4.15s/it]
71%|███████ | 3206/4506 [3:39:10<1:28:56, 4.10s/it]
{'loss': 0.2148, 'grad_norm': 0.4023645520210266, 'learning_rate': 1.1659978366217871e-05, 'epoch': 0.71}
71%|███████ | 3206/4506 [3:39:10<1:28:56, 4.10s/it]
71%|███████ | 3207/4506 [3:39:14<1:29:24, 4.13s/it]
{'loss': 0.2157, 'grad_norm': 0.3809047043323517, 'learning_rate': 1.1643601610800563e-05, 'epoch': 0.71}
71%|███████ | 3207/4506 [3:39:14<1:29:24, 4.13s/it]
71%|███████ | 3208/4506 [3:39:18<1:31:54, 4.25s/it]
{'loss': 0.2085, 'grad_norm': 0.36608925461769104, 'learning_rate': 1.1627232872299964e-05, 'epoch': 0.71}
71%|███████ | 3208/4506 [3:39:18<1:31:54, 4.25s/it]
71%|███████ | 3209/4506 [3:39:22<1:31:12, 4.22s/it]
{'loss': 0.2093, 'grad_norm': 0.3971116244792938, 'learning_rate': 1.1610872160541073e-05, 'epoch': 0.71}
71%|███████ | 3209/4506 [3:39:22<1:31:12, 4.22s/it]
71%|███████ | 3210/4506 [3:39:27<1:29:55, 4.16s/it]
{'loss': 0.2202, 'grad_norm': 0.40227222442626953, 'learning_rate': 1.1594519485344105e-05, 'epoch': 0.71}
71%|███████ | 3210/4506 [3:39:27<1:29:55, 4.16s/it]
71%|███████▏ | 3211/4506 [3:39:31<1:29:53, 4.16s/it]
{'loss': 0.2107, 'grad_norm': 0.33067038655281067, 'learning_rate': 1.157817485652441e-05, 'epoch': 0.71}
71%|███████▏ | 3211/4506 [3:39:31<1:29:53, 4.16s/it]
71%|███████▏ | 3212/4506 [3:39:35<1:30:52, 4.21s/it]
{'loss': 0.2134, 'grad_norm': 0.4594621956348419, 'learning_rate': 1.1561838283892546e-05, 'epoch': 0.71}
71%|███████▏ | 3212/4506 [3:39:35<1:30:52, 4.21s/it]
71%|███████▏ | 3213/4506 [3:39:39<1:31:11, 4.23s/it]
{'loss': 0.2194, 'grad_norm': 0.4117177724838257, 'learning_rate': 1.1545509777254229e-05, 'epoch': 0.71}
71%|███████▏ | 3213/4506 [3:39:39<1:31:11, 4.23s/it]
71%|███████▏ | 3214/4506 [3:39:43<1:30:55, 4.22s/it]
{'loss': 0.2149, 'grad_norm': 0.39676520228385925, 'learning_rate': 1.1529189346410305e-05, 'epoch': 0.71}
71%|███████▏ | 3214/4506 [3:39:43<1:30:55, 4.22s/it]
71%|███████▏ | 3215/4506 [3:39:47<1:28:30, 4.11s/it]
{'loss': 0.2098, 'grad_norm': 0.35972175002098083, 'learning_rate': 1.1512877001156813e-05, 'epoch': 0.71}
71%|███████▏ | 3215/4506 [3:39:47<1:28:30, 4.11s/it]
71%|███████▏ | 3216/4506 [3:39:51<1:28:37, 4.12s/it]
{'loss': 0.2048, 'grad_norm': 0.36657142639160156, 'learning_rate': 1.1496572751284901e-05, 'epoch': 0.71}
71%|███████▏ | 3216/4506 [3:39:51<1:28:37, 4.12s/it]
71%|███████▏ | 3217/4506 [3:39:55<1:27:17, 4.06s/it]
{'loss': 0.214, 'grad_norm': 0.39456743001937866, 'learning_rate': 1.1480276606580887e-05, 'epoch': 0.71}
71%|███████▏ | 3217/4506 [3:39:55<1:27:17, 4.06s/it]
71%|███████▏ | 3218/4506 [3:39:59<1:26:40, 4.04s/it]
{'loss': 0.217, 'grad_norm': 0.42045289278030396, 'learning_rate': 1.1463988576826206e-05, 'epoch': 0.71}
71%|███████▏ | 3218/4506 [3:39:59<1:26:40, 4.04s/it]
71%|███████▏ | 3219/4506 [3:40:04<1:27:25, 4.08s/it]
{'loss': 0.222, 'grad_norm': 0.3711325228214264, 'learning_rate': 1.1447708671797447e-05, 'epoch': 0.71}
71%|███████▏ | 3219/4506 [3:40:04<1:27:25, 4.08s/it]
71%|███████▏ | 3220/4506 [3:40:07<1:26:20, 4.03s/it]
{'loss': 0.2063, 'grad_norm': 0.33941397070884705, 'learning_rate': 1.1431436901266279e-05, 'epoch': 0.71}
71%|███████▏ | 3220/4506 [3:40:07<1:26:20, 4.03s/it]
71%|███████▏ | 3221/4506 [3:40:12<1:28:07, 4.12s/it]
{'loss': 0.2248, 'grad_norm': 0.40645524859428406, 'learning_rate': 1.1415173274999536e-05, 'epoch': 0.71}
71%|███████▏ | 3221/4506 [3:40:12<1:28:07, 4.12s/it]
72%|███████▏ | 3222/4506 [3:40:16<1:25:48, 4.01s/it]
{'loss': 0.2229, 'grad_norm': 0.4214262068271637, 'learning_rate': 1.1398917802759121e-05, 'epoch': 0.72}
72%|███████▏ | 3222/4506 [3:40:16<1:25:48, 4.01s/it]
72%|███████▏ | 3223/4506 [3:40:20<1:25:53, 4.02s/it]
{'loss': 0.2135, 'grad_norm': 0.3489350378513336, 'learning_rate': 1.1382670494302083e-05, 'epoch': 0.72}
72%|███████▏ | 3223/4506 [3:40:20<1:25:53, 4.02s/it]
72%|███████▏ | 3224/4506 [3:40:24<1:28:52, 4.16s/it]
{'loss': 0.2179, 'grad_norm': 0.3891071677207947, 'learning_rate': 1.1366431359380533e-05, 'epoch': 0.72}
72%|███████▏ | 3224/4506 [3:40:24<1:28:52, 4.16s/it]
72%|███████▏ | 3225/4506 [3:40:28<1:25:31, 4.01s/it]
{'loss': 0.2183, 'grad_norm': 0.3716537654399872, 'learning_rate': 1.1350200407741702e-05, 'epoch': 0.72}
72%|███████▏ | 3225/4506 [3:40:28<1:25:31, 4.01s/it]
72%|███████▏ | 3226/4506 [3:40:31<1:23:13, 3.90s/it]
{'loss': 0.2166, 'grad_norm': 0.38603270053863525, 'learning_rate': 1.1333977649127898e-05, 'epoch': 0.72}
72%|███████▏ | 3226/4506 [3:40:31<1:23:13, 3.90s/it]
72%|███████▏ | 3227/4506 [3:40:35<1:24:18, 3.96s/it]
{'loss': 0.2154, 'grad_norm': 0.3769650161266327, 'learning_rate': 1.131776309327653e-05, 'epoch': 0.72}
72%|███████▏ | 3227/4506 [3:40:35<1:24:18, 3.96s/it]
72%|███████▏ | 3228/4506 [3:40:39<1:22:16, 3.86s/it]
{'loss': 0.2156, 'grad_norm': 0.4009692966938019, 'learning_rate': 1.1301556749920042e-05, 'epoch': 0.72}
72%|███████▏ | 3228/4506 [3:40:39<1:22:16, 3.86s/it]
72%|███████▏ | 3229/4506 [3:40:43<1:24:55, 3.99s/it]
{'loss': 0.2106, 'grad_norm': 0.36123189330101013, 'learning_rate': 1.1285358628785996e-05, 'epoch': 0.72}
72%|███████▏ | 3229/4506 [3:40:43<1:24:55, 3.99s/it]
72%|███████▏ | 3230/4506 [3:40:47<1:24:30, 3.97s/it]
{'loss': 0.201, 'grad_norm': 0.3566931188106537, 'learning_rate': 1.1269168739596984e-05, 'epoch': 0.72}
72%|███████▏ | 3230/4506 [3:40:47<1:24:30, 3.97s/it]
72%|███████▏ | 3231/4506 [3:40:51<1:24:16, 3.97s/it]
{'loss': 0.2062, 'grad_norm': 0.3486084043979645, 'learning_rate': 1.1252987092070672e-05, 'epoch': 0.72}
72%|███████▏ | 3231/4506 [3:40:51<1:24:16, 3.97s/it]
72%|███████▏ | 3232/4506 [3:40:56<1:26:32, 4.08s/it]
{'loss': 0.2149, 'grad_norm': 0.3418000340461731, 'learning_rate': 1.1236813695919787e-05, 'epoch': 0.72}
72%|███████▏ | 3232/4506 [3:40:56<1:26:32, 4.08s/it]
72%|███████▏ | 3233/4506 [3:41:00<1:26:41, 4.09s/it]
{'loss': 0.2199, 'grad_norm': 0.37676897644996643, 'learning_rate': 1.1220648560852078e-05, 'epoch': 0.72}
72%|███████▏ | 3233/4506 [3:41:00<1:26:41, 4.09s/it]
72%|███████▏ | 3234/4506 [3:41:04<1:25:35, 4.04s/it]
{'loss': 0.2204, 'grad_norm': 0.35772162675857544, 'learning_rate': 1.1204491696570371e-05, 'epoch': 0.72}
72%|███████▏ | 3234/4506 [3:41:04<1:25:35, 4.04s/it]
72%|███████▏ | 3235/4506 [3:41:08<1:30:21, 4.27s/it]
{'loss': 0.2208, 'grad_norm': 0.44408929347991943, 'learning_rate': 1.1188343112772485e-05, 'epoch': 0.72}
72%|███████▏ | 3235/4506 [3:41:08<1:30:21, 4.27s/it]
72%|███████▏ | 3236/4506 [3:41:13<1:29:02, 4.21s/it]
{'loss': 0.2174, 'grad_norm': 0.3579804599285126, 'learning_rate': 1.1172202819151301e-05, 'epoch': 0.72}
72%|███████▏ | 3236/4506 [3:41:13<1:29:02, 4.21s/it]
72%|███████▏ | 3237/4506 [3:41:16<1:26:36, 4.09s/it]
{'loss': 0.2068, 'grad_norm': 0.34146443009376526, 'learning_rate': 1.1156070825394716e-05, 'epoch': 0.72}
72%|███████▏ | 3237/4506 [3:41:16<1:26:36, 4.09s/it]
72%|███████▏ | 3238/4506 [3:41:21<1:27:02, 4.12s/it]
{'loss': 0.2089, 'grad_norm': 0.3367658257484436, 'learning_rate': 1.113994714118565e-05, 'epoch': 0.72}
72%|███████▏ | 3238/4506 [3:41:21<1:27:02, 4.12s/it]
72%|███████▏ | 3239/4506 [3:41:25<1:26:21, 4.09s/it]
{'loss': 0.2083, 'grad_norm': 0.3750361502170563, 'learning_rate': 1.1123831776202014e-05, 'epoch': 0.72}
72%|███████▏ | 3239/4506 [3:41:25<1:26:21, 4.09s/it]
72%|███████▏ | 3240/4506 [3:41:29<1:29:18, 4.23s/it]
{'loss': 0.2081, 'grad_norm': 0.3638611137866974, 'learning_rate': 1.110772474011676e-05, 'epoch': 0.72}
72%|███████▏ | 3240/4506 [3:41:29<1:29:18, 4.23s/it]
72%|███████▏ | 3241/4506 [3:41:33<1:28:01, 4.18s/it]
{'loss': 0.2153, 'grad_norm': 0.3613526225090027, 'learning_rate': 1.1091626042597797e-05, 'epoch': 0.72}
72%|███████▏ | 3241/4506 [3:41:33<1:28:01, 4.18s/it]
72%|███████▏ | 3242/4506 [3:41:37<1:27:11, 4.14s/it]
{'loss': 0.2204, 'grad_norm': 0.3614828288555145, 'learning_rate': 1.1075535693308075e-05, 'epoch': 0.72}
72%|███████▏ | 3242/4506 [3:41:37<1:27:11, 4.14s/it]
72%|███████▏ | 3243/4506 [3:41:41<1:27:32, 4.16s/it]
{'loss': 0.2193, 'grad_norm': 0.3722996115684509, 'learning_rate': 1.1059453701905493e-05, 'epoch': 0.72}
72%|███████▏ | 3243/4506 [3:41:41<1:27:32, 4.16s/it]
72%|███████▏ | 3244/4506 [3:41:46<1:27:14, 4.15s/it]
{'loss': 0.2129, 'grad_norm': 0.4170643091201782, 'learning_rate': 1.1043380078042958e-05, 'epoch': 0.72}
72%|███████▏ | 3244/4506 [3:41:46<1:27:14, 4.15s/it]
72%|███████▏ | 3245/4506 [3:41:50<1:26:55, 4.14s/it]
{'loss': 0.197, 'grad_norm': 0.4705902934074402, 'learning_rate': 1.1027314831368354e-05, 'epoch': 0.72}
72%|███████▏ | 3245/4506 [3:41:50<1:26:55, 4.14s/it]
72%|███████▏ | 3246/4506 [3:41:53<1:25:04, 4.05s/it]
{'loss': 0.2215, 'grad_norm': 0.47365444898605347, 'learning_rate': 1.1011257971524534e-05, 'epoch': 0.72}
72%|███████▏ | 3246/4506 [3:41:53<1:25:04, 4.05s/it]
72%|███████▏ | 3247/4506 [3:41:58<1:27:12, 4.16s/it]
{'loss': 0.2183, 'grad_norm': 0.32114318013191223, 'learning_rate': 1.0995209508149306e-05, 'epoch': 0.72}
72%|███████▏ | 3247/4506 [3:41:58<1:27:12, 4.16s/it]
72%|███████▏ | 3248/4506 [3:42:02<1:25:14, 4.07s/it]
{'loss': 0.2192, 'grad_norm': 0.4271710515022278, 'learning_rate': 1.097916945087544e-05, 'epoch': 0.72}
72%|███████▏ | 3248/4506 [3:42:02<1:25:14, 4.07s/it]
72%|███████▏ | 3249/4506 [3:42:06<1:26:46, 4.14s/it]
{'loss': 0.2203, 'grad_norm': 0.3849189877510071, 'learning_rate': 1.096313780933067e-05, 'epoch': 0.72}
72%|███████▏ | 3249/4506 [3:42:06<1:26:46, 4.14s/it]
72%|███████▏ | 3250/4506 [3:42:11<1:28:47, 4.24s/it]
{'loss': 0.2251, 'grad_norm': 0.34452149271965027, 'learning_rate': 1.094711459313768e-05, 'epoch': 0.72}
72%|███████▏ | 3250/4506 [3:42:11<1:28:47, 4.24s/it]
72%|███████▏ | 3251/4506 [3:42:15<1:27:14, 4.17s/it]
{'loss': 0.2184, 'grad_norm': 0.39014747738838196, 'learning_rate': 1.09310998119141e-05, 'epoch': 0.72}
72%|███████▏ | 3251/4506 [3:42:15<1:27:14, 4.17s/it]
72%|███████▏ | 3252/4506 [3:42:19<1:26:58, 4.16s/it]
{'loss': 0.2175, 'grad_norm': 0.4096516966819763, 'learning_rate': 1.091509347527247e-05, 'epoch': 0.72}
72%|███████▏ | 3252/4506 [3:42:19<1:26:58, 4.16s/it]
72%|███████▏ | 3253/4506 [3:42:22<1:24:32, 4.05s/it]
{'loss': 0.2186, 'grad_norm': 0.37610724568367004, 'learning_rate': 1.08990955928203e-05, 'epoch': 0.72}
72%|███████▏ | 3253/4506 [3:42:22<1:24:32, 4.05s/it]
72%|███████▏ | 3254/4506 [3:42:26<1:22:49, 3.97s/it]
{'loss': 0.2092, 'grad_norm': 0.37422502040863037, 'learning_rate': 1.0883106174159981e-05, 'epoch': 0.72}
72%|███████▏ | 3254/4506 [3:42:26<1:22:49, 3.97s/it]
72%|███████▏ | 3255/4506 [3:42:30<1:23:03, 3.98s/it]
{'loss': 0.2125, 'grad_norm': 0.36742734909057617, 'learning_rate': 1.086712522888887e-05, 'epoch': 0.72}
72%|███████▏ | 3255/4506 [3:42:30<1:23:03, 3.98s/it]
72%|███████▏ | 3256/4506 [3:42:35<1:25:27, 4.10s/it]
{'loss': 0.2125, 'grad_norm': 0.3536536991596222, 'learning_rate': 1.0851152766599204e-05, 'epoch': 0.72}
72%|███████▏ | 3256/4506 [3:42:35<1:25:27, 4.10s/it]
72%|███████▏ | 3257/4506 [3:42:39<1:24:10, 4.04s/it]
{'loss': 0.2128, 'grad_norm': 0.35689181089401245, 'learning_rate': 1.0835188796878156e-05, 'epoch': 0.72}
72%|███████▏ | 3257/4506 [3:42:39<1:24:10, 4.04s/it]
72%|███████▏ | 3258/4506 [3:42:43<1:25:11, 4.10s/it]
{'loss': 0.2147, 'grad_norm': 0.37328529357910156, 'learning_rate': 1.0819233329307768e-05, 'epoch': 0.72}
72%|███████▏ | 3258/4506 [3:42:43<1:25:11, 4.10s/it]
72%|███████▏ | 3259/4506 [3:42:47<1:26:24, 4.16s/it]
{'loss': 0.2226, 'grad_norm': 0.391278475522995, 'learning_rate': 1.0803286373465016e-05, 'epoch': 0.72}
72%|███████▏ | 3259/4506 [3:42:47<1:26:24, 4.16s/it]
72%|███████▏ | 3260/4506 [3:42:51<1:24:41, 4.08s/it]
{'loss': 0.2202, 'grad_norm': 0.4161718785762787, 'learning_rate': 1.078734793892173e-05, 'epoch': 0.72}
72%|███████▏ | 3260/4506 [3:42:51<1:24:41, 4.08s/it]
72%|███████▏ | 3261/4506 [3:42:55<1:24:27, 4.07s/it]
{'loss': 0.2081, 'grad_norm': 0.4306774437427521, 'learning_rate': 1.0771418035244657e-05, 'epoch': 0.72}
72%|███████▏ | 3261/4506 [3:42:55<1:24:27, 4.07s/it]
72%|███████▏ | 3262/4506 [3:43:00<1:27:19, 4.21s/it]
{'loss': 0.2258, 'grad_norm': 0.3757742941379547, 'learning_rate': 1.0755496671995396e-05, 'epoch': 0.72}
72%|███████▏ | 3262/4506 [3:43:00<1:27:19, 4.21s/it]
72%|███████▏ | 3263/4506 [3:43:04<1:27:32, 4.23s/it]
{'loss': 0.2195, 'grad_norm': 0.45358890295028687, 'learning_rate': 1.0739583858730443e-05, 'epoch': 0.72}
72%|███████▏ | 3263/4506 [3:43:04<1:27:32, 4.23s/it]
72%|███████▏ | 3264/4506 [3:43:08<1:25:07, 4.11s/it]
{'loss': 0.2027, 'grad_norm': 0.3982643187046051, 'learning_rate': 1.0723679605001161e-05, 'epoch': 0.72}
72%|███████▏ | 3264/4506 [3:43:08<1:25:07, 4.11s/it]
72%|███████▏ | 3265/4506 [3:43:12<1:25:56, 4.16s/it]
{'loss': 0.2102, 'grad_norm': 0.3671136796474457, 'learning_rate': 1.0707783920353745e-05, 'epoch': 0.72}
72%|███████▏ | 3265/4506 [3:43:12<1:25:56, 4.16s/it]
72%|███████▏ | 3266/4506 [3:43:16<1:24:18, 4.08s/it]
{'loss': 0.2117, 'grad_norm': 0.3705672025680542, 'learning_rate': 1.069189681432929e-05, 'epoch': 0.72}
72%|███████▏ | 3266/4506 [3:43:16<1:24:18, 4.08s/it]
73%|███████▎ | 3267/4506 [3:43:20<1:24:36, 4.10s/it]
{'loss': 0.2126, 'grad_norm': 0.41242191195487976, 'learning_rate': 1.0676018296463708e-05, 'epoch': 0.73}
73%|███████▎ | 3267/4506 [3:43:20<1:24:36, 4.10s/it]
73%|███████▎ | 3268/4506 [3:43:24<1:23:37, 4.05s/it]
{'loss': 0.2204, 'grad_norm': 0.40421023964881897, 'learning_rate': 1.0660148376287768e-05, 'epoch': 0.73}
73%|███████▎ | 3268/4506 [3:43:24<1:23:37, 4.05s/it]
73%|███████▎ | 3269/4506 [3:43:28<1:22:42, 4.01s/it]
{'loss': 0.2155, 'grad_norm': 0.3603297173976898, 'learning_rate': 1.0644287063327082e-05, 'epoch': 0.73}
73%|███████▎ | 3269/4506 [3:43:28<1:22:42, 4.01s/it]
73%|███████▎ | 3270/4506 [3:43:32<1:21:12, 3.94s/it]
{'loss': 0.2079, 'grad_norm': 0.408769816160202, 'learning_rate': 1.0628434367102105e-05, 'epoch': 0.73}
73%|███████▎ | 3270/4506 [3:43:32<1:21:12, 3.94s/it]
73%|███████▎ | 3271/4506 [3:43:35<1:20:43, 3.92s/it]
{'loss': 0.2109, 'grad_norm': 0.3880594074726105, 'learning_rate': 1.0612590297128085e-05, 'epoch': 0.73}
73%|███████▎ | 3271/4506 [3:43:35<1:20:43, 3.92s/it]
73%|███████▎ | 3272/4506 [3:43:40<1:23:16, 4.05s/it]
{'loss': 0.207, 'grad_norm': 0.3600222170352936, 'learning_rate': 1.0596754862915138e-05, 'epoch': 0.73}
73%|███████▎ | 3272/4506 [3:43:40<1:23:16, 4.05s/it]
73%|███████▎ | 3273/4506 [3:43:44<1:23:23, 4.06s/it]
{'loss': 0.2083, 'grad_norm': 0.3771316707134247, 'learning_rate': 1.0580928073968149e-05, 'epoch': 0.73}
73%|███████▎ | 3273/4506 [3:43:44<1:23:23, 4.06s/it]
73%|███████▎ | 3274/4506 [3:43:48<1:24:51, 4.13s/it]
{'loss': 0.217, 'grad_norm': 0.37316980957984924, 'learning_rate': 1.0565109939786854e-05, 'epoch': 0.73}
73%|███████▎ | 3274/4506 [3:43:48<1:24:51, 4.13s/it]
73%|███████▎ | 3275/4506 [3:43:53<1:26:19, 4.21s/it]
{'loss': 0.2199, 'grad_norm': 0.42670291662216187, 'learning_rate': 1.0549300469865772e-05, 'epoch': 0.73}
73%|███████▎ | 3275/4506 [3:43:53<1:26:19, 4.21s/it]
73%|███████▎ | 3276/4506 [3:43:56<1:23:52, 4.09s/it]
{'loss': 0.2128, 'grad_norm': 0.38896629214286804, 'learning_rate': 1.0533499673694243e-05, 'epoch': 0.73}
73%|███████▎ | 3276/4506 [3:43:56<1:23:52, 4.09s/it]
73%|███████▎ | 3277/4506 [3:44:00<1:23:24, 4.07s/it]
{'loss': 0.2072, 'grad_norm': 0.3462109863758087, 'learning_rate': 1.0517707560756362e-05, 'epoch': 0.73}
73%|███████▎ | 3277/4506 [3:44:00<1:23:24, 4.07s/it]
73%|███████▎ | 3278/4506 [3:44:04<1:22:35, 4.04s/it]
{'loss': 0.216, 'grad_norm': 0.3881259262561798, 'learning_rate': 1.0501924140531058e-05, 'epoch': 0.73}
73%|███████▎ | 3278/4506 [3:44:04<1:22:35, 4.04s/it]
73%|███████▎ | 3279/4506 [3:44:08<1:20:35, 3.94s/it]
{'loss': 0.2155, 'grad_norm': 0.4211081862449646, 'learning_rate': 1.0486149422492011e-05, 'epoch': 0.73}
73%|███████▎ | 3279/4506 [3:44:08<1:20:35, 3.94s/it]
73%|███████▎ | 3280/4506 [3:44:12<1:19:20, 3.88s/it]
{'loss': 0.2097, 'grad_norm': 0.40098145604133606, 'learning_rate': 1.0470383416107674e-05, 'epoch': 0.73}
73%|███████▎ | 3280/4506 [3:44:12<1:19:20, 3.88s/it]
73%|███████▎ | 3281/4506 [3:44:16<1:21:11, 3.98s/it]
{'loss': 0.2192, 'grad_norm': 0.39687830209732056, 'learning_rate': 1.0454626130841294e-05, 'epoch': 0.73}
73%|███████▎ | 3281/4506 [3:44:16<1:21:11, 3.98s/it]
73%|███████▎ | 3282/4506 [3:44:20<1:20:06, 3.93s/it]
{'loss': 0.2173, 'grad_norm': 0.41515761613845825, 'learning_rate': 1.0438877576150877e-05, 'epoch': 0.73}
73%|███████▎ | 3282/4506 [3:44:20<1:20:06, 3.93s/it]
73%|███████▎ | 3283/4506 [3:44:24<1:20:31, 3.95s/it]
{'loss': 0.216, 'grad_norm': 0.4411446750164032, 'learning_rate': 1.0423137761489187e-05, 'epoch': 0.73}
73%|███████▎ | 3283/4506 [3:44:24<1:20:31, 3.95s/it]
73%|███████▎ | 3284/4506 [3:44:28<1:24:24, 4.14s/it]
{'loss': 0.2109, 'grad_norm': 0.35643166303634644, 'learning_rate': 1.0407406696303728e-05, 'epoch': 0.73}
73%|███████▎ | 3284/4506 [3:44:28<1:24:24, 4.14s/it]
73%|███████▎ | 3285/4506 [3:44:33<1:25:04, 4.18s/it]
{'loss': 0.2073, 'grad_norm': 0.34536847472190857, 'learning_rate': 1.039168439003678e-05, 'epoch': 0.73}
73%|███████▎ | 3285/4506 [3:44:33<1:25:04, 4.18s/it]
73%|███████▎ | 3286/4506 [3:44:37<1:25:19, 4.20s/it]
{'loss': 0.2066, 'grad_norm': 0.3675990402698517, 'learning_rate': 1.037597085212533e-05, 'epoch': 0.73}
73%|███████▎ | 3286/4506 [3:44:37<1:25:19, 4.20s/it]
73%|███████▎ | 3287/4506 [3:44:42<1:27:56, 4.33s/it]
{'loss': 0.2226, 'grad_norm': 0.3633042871952057, 'learning_rate': 1.036026609200113e-05, 'epoch': 0.73}
73%|███████▎ | 3287/4506 [3:44:42<1:27:56, 4.33s/it]
73%|███████▎ | 3288/4506 [3:44:46<1:25:35, 4.22s/it]
{'loss': 0.2134, 'grad_norm': 0.39344871044158936, 'learning_rate': 1.0344570119090658e-05, 'epoch': 0.73}
73%|███████▎ | 3288/4506 [3:44:46<1:25:35, 4.22s/it]
73%|███████▎ | 3289/4506 [3:44:50<1:26:06, 4.25s/it]
{'loss': 0.2194, 'grad_norm': 0.3989395201206207, 'learning_rate': 1.0328882942815119e-05, 'epoch': 0.73}
73%|███████▎ | 3289/4506 [3:44:50<1:26:06, 4.25s/it]
73%|███████▎ | 3290/4506 [3:44:54<1:22:53, 4.09s/it]
{'loss': 0.2127, 'grad_norm': 0.40439558029174805, 'learning_rate': 1.031320457259042e-05, 'epoch': 0.73}
73%|███████▎ | 3290/4506 [3:44:54<1:22:53, 4.09s/it]
73%|███████▎ | 3291/4506 [3:44:58<1:22:33, 4.08s/it]
{'loss': 0.2105, 'grad_norm': 0.3884015679359436, 'learning_rate': 1.0297535017827214e-05, 'epoch': 0.73}
73%|███████▎ | 3291/4506 [3:44:58<1:22:33, 4.08s/it]
73%|███████▎ | 3292/4506 [3:45:02<1:22:10, 4.06s/it]
{'loss': 0.2171, 'grad_norm': 0.377779483795166, 'learning_rate': 1.0281874287930823e-05, 'epoch': 0.73}
73%|███████▎ | 3292/4506 [3:45:02<1:22:10, 4.06s/it]
73%|███████▎ | 3293/4506 [3:45:06<1:24:02, 4.16s/it]
{'loss': 0.204, 'grad_norm': 0.38145026564598083, 'learning_rate': 1.0266222392301302e-05, 'epoch': 0.73}
73%|███████▎ | 3293/4506 [3:45:06<1:24:02, 4.16s/it]
73%|███████▎ | 3294/4506 [3:45:10<1:24:55, 4.20s/it]
{'loss': 0.2153, 'grad_norm': 0.3240818977355957, 'learning_rate': 1.0250579340333402e-05, 'epoch': 0.73}
73%|███████▎ | 3294/4506 [3:45:10<1:24:55, 4.20s/it]
73%|███████▎ | 3295/4506 [3:45:15<1:24:56, 4.21s/it]
{'loss': 0.2086, 'grad_norm': 0.36979183554649353, 'learning_rate': 1.0234945141416561e-05, 'epoch': 0.73}
73%|███████▎ | 3295/4506 [3:45:15<1:24:56, 4.21s/it]
73%|███████▎ | 3296/4506 [3:45:19<1:26:39, 4.30s/it]
{'loss': 0.2128, 'grad_norm': 0.37222644686698914, 'learning_rate': 1.0219319804934894e-05, 'epoch': 0.73}
73%|███████▎ | 3296/4506 [3:45:19<1:26:39, 4.30s/it]
73%|███████▎ | 3297/4506 [3:45:23<1:24:11, 4.18s/it]
{'loss': 0.2212, 'grad_norm': 0.3836853504180908, 'learning_rate': 1.0203703340267192e-05, 'epoch': 0.73}
73%|███████▎ | 3297/4506 [3:45:23<1:24:11, 4.18s/it]
73%|███████▎ | 3298/4506 [3:45:27<1:23:25, 4.14s/it]
{'loss': 0.2065, 'grad_norm': 0.3445141017436981, 'learning_rate': 1.0188095756786955e-05, 'epoch': 0.73}
73%|███████▎ | 3298/4506 [3:45:27<1:23:25, 4.14s/it]
73%|███████▎ | 3299/4506 [3:45:31<1:21:21, 4.04s/it]
{'loss': 0.2062, 'grad_norm': 0.3251948654651642, 'learning_rate': 1.017249706386231e-05, 'epoch': 0.73}
73%|███████▎ | 3299/4506 [3:45:31<1:21:21, 4.04s/it]
73%|███████▎ | 3300/4506 [3:45:35<1:20:04, 3.98s/it]
{'loss': 0.2105, 'grad_norm': 0.37994107604026794, 'learning_rate': 1.0156907270856073e-05, 'epoch': 0.73}
73%|███████▎ | 3300/4506 [3:45:35<1:20:04, 3.98s/it]
73%|███████▎ | 3301/4506 [3:45:39<1:20:32, 4.01s/it]
{'loss': 0.2036, 'grad_norm': 0.3658817410469055, 'learning_rate': 1.0141326387125716e-05, 'epoch': 0.73}
73%|███████▎ | 3301/4506 [3:45:39<1:20:32, 4.01s/it]
73%|███████▎ | 3302/4506 [3:45:43<1:21:33, 4.06s/it]
{'loss': 0.2106, 'grad_norm': 0.36395955085754395, 'learning_rate': 1.0125754422023364e-05, 'epoch': 0.73}
73%|███████▎ | 3302/4506 [3:45:43<1:21:33, 4.06s/it]
73%|███████▎ | 3303/4506 [3:45:47<1:19:56, 3.99s/it]
{'loss': 0.2154, 'grad_norm': 0.3509896397590637, 'learning_rate': 1.011019138489577e-05, 'epoch': 0.73}
73%|███████▎ | 3303/4506 [3:45:47<1:19:56, 3.99s/it]
73%|███████▎ | 3304/4506 [3:45:51<1:19:10, 3.95s/it]
{'loss': 0.2082, 'grad_norm': 0.370594322681427, 'learning_rate': 1.009463728508436e-05, 'epoch': 0.73}
73%|███████▎ | 3304/4506 [3:45:51<1:19:10, 3.95s/it]
73%|███████▎ | 3305/4506 [3:45:54<1:17:00, 3.85s/it]
{'loss': 0.2103, 'grad_norm': 0.41770634055137634, 'learning_rate': 1.0079092131925161e-05, 'epoch': 0.73}
73%|███████▎ | 3305/4506 [3:45:54<1:17:00, 3.85s/it]
73%|███████▎ | 3306/4506 [3:45:58<1:16:33, 3.83s/it]
{'loss': 0.2241, 'grad_norm': 0.3962084650993347, 'learning_rate': 1.0063555934748853e-05, 'epoch': 0.73}
73%|███████▎ | 3306/4506 [3:45:58<1:16:33, 3.83s/it]
73%|███████▎ | 3307/4506 [3:46:02<1:18:32, 3.93s/it]
{'loss': 0.2107, 'grad_norm': 0.38591131567955017, 'learning_rate': 1.0048028702880736e-05, 'epoch': 0.73}
73%|███████▎ | 3307/4506 [3:46:02<1:18:32, 3.93s/it]
73%|███████▎ | 3308/4506 [3:46:06<1:20:05, 4.01s/it]
{'loss': 0.2007, 'grad_norm': 0.38031306862831116, 'learning_rate': 1.0032510445640734e-05, 'epoch': 0.73}
73%|███████▎ | 3308/4506 [3:46:06<1:20:05, 4.01s/it]
73%|███████▎ | 3309/4506 [3:46:11<1:20:46, 4.05s/it]
{'loss': 0.2229, 'grad_norm': 0.3537936210632324, 'learning_rate': 1.001700117234336e-05, 'epoch': 0.73}
73%|███████▎ | 3309/4506 [3:46:11<1:20:46, 4.05s/it]
73%|███████▎ | 3310/4506 [3:46:14<1:19:53, 4.01s/it]
{'loss': 0.2127, 'grad_norm': 0.3872355818748474, 'learning_rate': 1.0001500892297772e-05, 'epoch': 0.73}
73%|███████▎ | 3310/4506 [3:46:14<1:19:53, 4.01s/it]
73%|███████▎ | 3311/4506 [3:46:19<1:20:12, 4.03s/it]
{'loss': 0.2148, 'grad_norm': 0.3760296404361725, 'learning_rate': 9.98600961480769e-06, 'epoch': 0.73}
73%|███████▎ | 3311/4506 [3:46:19<1:20:12, 4.03s/it]
74%|███████▎ | 3312/4506 [3:46:23<1:24:09, 4.23s/it]
{'loss': 0.226, 'grad_norm': 0.3883407413959503, 'learning_rate': 9.97052734917146e-06, 'epoch': 0.74}
74%|███████▎ | 3312/4506 [3:46:23<1:24:09, 4.23s/it]
74%|███████▎ | 3313/4506 [3:46:28<1:25:07, 4.28s/it]
{'loss': 0.2208, 'grad_norm': 0.4494079649448395, 'learning_rate': 9.955054104682015e-06, 'epoch': 0.74}
74%|███████▎ | 3313/4506 [3:46:28<1:25:07, 4.28s/it]
74%|███████▎ | 3314/4506 [3:46:32<1:26:52, 4.37s/it]
{'loss': 0.2192, 'grad_norm': 0.33868756890296936, 'learning_rate': 9.939589890626852e-06, 'epoch': 0.74}
74%|███████▎ | 3314/4506 [3:46:32<1:26:52, 4.37s/it]
74%|███████▎ | 3315/4506 [3:46:36<1:24:49, 4.27s/it]
{'loss': 0.2132, 'grad_norm': 0.37957528233528137, 'learning_rate': 9.92413471628808e-06, 'epoch': 0.74}
74%|███████▎ | 3315/4506 [3:46:36<1:24:49, 4.27s/it]
74%|███████▎ | 3316/4506 [3:46:40<1:23:12, 4.20s/it]
{'loss': 0.2045, 'grad_norm': 0.3398580253124237, 'learning_rate': 9.908688590942347e-06, 'epoch': 0.74}
74%|███████▎ | 3316/4506 [3:46:40<1:23:12, 4.20s/it]
74%|███████▎ | 3317/4506 [3:46:44<1:22:42, 4.17s/it]
{'loss': 0.2053, 'grad_norm': 0.4173705279827118, 'learning_rate': 9.893251523860908e-06, 'epoch': 0.74}
74%|███████▎ | 3317/4506 [3:46:44<1:22:42, 4.17s/it]
74%|███████▎ | 3318/4506 [3:46:48<1:21:58, 4.14s/it]
{'loss': 0.2026, 'grad_norm': 0.4231821894645691, 'learning_rate': 9.877823524309537e-06, 'epoch': 0.74}
74%|███████▎ | 3318/4506 [3:46:48<1:21:58, 4.14s/it]
74%|███████▎ | 3319/4506 [3:46:52<1:21:20, 4.11s/it]
{'loss': 0.2062, 'grad_norm': 0.34548860788345337, 'learning_rate': 9.8624046015486e-06, 'epoch': 0.74}
74%|███████▎ | 3319/4506 [3:46:53<1:21:20, 4.11s/it]
74%|███████▎ | 3320/4506 [3:46:56<1:20:09, 4.05s/it]
{'loss': 0.2049, 'grad_norm': 0.32372787594795227, 'learning_rate': 9.846994764833007e-06, 'epoch': 0.74}
74%|███████▎ | 3320/4506 [3:46:56<1:20:09, 4.05s/it]
74%|███████▎ | 3321/4506 [3:47:00<1:18:09, 3.96s/it]
{'loss': 0.2174, 'grad_norm': 0.42124494910240173, 'learning_rate': 9.831594023412214e-06, 'epoch': 0.74}
74%|███████▎ | 3321/4506 [3:47:00<1:18:09, 3.96s/it]
74%|███████▎ | 3322/4506 [3:47:04<1:19:52, 4.05s/it]
{'loss': 0.2174, 'grad_norm': 0.3929750323295593, 'learning_rate': 9.816202386530199e-06, 'epoch': 0.74}
74%|███████▎ | 3322/4506 [3:47:04<1:19:52, 4.05s/it]
74%|███████▎ | 3323/4506 [3:47:08<1:19:19, 4.02s/it]
{'loss': 0.2153, 'grad_norm': 0.3650379478931427, 'learning_rate': 9.800819863425511e-06, 'epoch': 0.74}
74%|███████▎ | 3323/4506 [3:47:08<1:19:19, 4.02s/it]
74%|███████▍ | 3324/4506 [3:47:13<1:21:09, 4.12s/it]
{'loss': 0.2192, 'grad_norm': 0.4038201868534088, 'learning_rate': 9.785446463331188e-06, 'epoch': 0.74}
74%|███████▍ | 3324/4506 [3:47:13<1:21:09, 4.12s/it]
74%|███████▍ | 3325/4506 [3:47:17<1:20:50, 4.11s/it]
{'loss': 0.2104, 'grad_norm': 0.3209807872772217, 'learning_rate': 9.770082195474822e-06, 'epoch': 0.74}
74%|███████▍ | 3325/4506 [3:47:17<1:20:50, 4.11s/it]
74%|███████▍ | 3326/4506 [3:47:21<1:20:53, 4.11s/it]
{'loss': 0.2191, 'grad_norm': 0.4338976740837097, 'learning_rate': 9.754727069078516e-06, 'epoch': 0.74}
74%|███████▍ | 3326/4506 [3:47:21<1:20:53, 4.11s/it]
74%|███████▍ | 3327/4506 [3:47:25<1:19:11, 4.03s/it]
{'loss': 0.2202, 'grad_norm': 0.4799514412879944, 'learning_rate': 9.739381093358887e-06, 'epoch': 0.74}
74%|███████▍ | 3327/4506 [3:47:25<1:19:11, 4.03s/it]
74%|███████▍ | 3328/4506 [3:47:29<1:17:30, 3.95s/it]
{'loss': 0.2141, 'grad_norm': 0.3754061758518219, 'learning_rate': 9.724044277527048e-06, 'epoch': 0.74}
74%|███████▍ | 3328/4506 [3:47:29<1:17:30, 3.95s/it]
74%|███████▍ | 3329/4506 [3:47:32<1:16:01, 3.88s/it]
{'loss': 0.2068, 'grad_norm': 0.38133084774017334, 'learning_rate': 9.70871663078863e-06, 'epoch': 0.74}
74%|███████▍ | 3329/4506 [3:47:32<1:16:01, 3.88s/it]
74%|███████▍ | 3330/4506 [3:47:37<1:19:23, 4.05s/it]
{'loss': 0.221, 'grad_norm': 0.3910427987575531, 'learning_rate': 9.693398162343753e-06, 'epoch': 0.74}
74%|███████▍ | 3330/4506 [3:47:37<1:19:23, 4.05s/it]
74%|███████▍ | 3331/4506 [3:47:41<1:20:30, 4.11s/it]
{'loss': 0.2155, 'grad_norm': 0.35757729411125183, 'learning_rate': 9.678088881387005e-06, 'epoch': 0.74}
74%|███████▍ | 3331/4506 [3:47:41<1:20:30, 4.11s/it]
74%|███████▍ | 3332/4506 [3:47:45<1:19:28, 4.06s/it]
{'loss': 0.2024, 'grad_norm': 0.3634088635444641, 'learning_rate': 9.66278879710752e-06, 'epoch': 0.74}
74%|███████▍ | 3332/4506 [3:47:45<1:19:28, 4.06s/it]
74%|███████▍ | 3333/4506 [3:47:49<1:18:54, 4.04s/it]
{'loss': 0.2119, 'grad_norm': 0.43884339928627014, 'learning_rate': 9.647497918688843e-06, 'epoch': 0.74}
74%|███████▍ | 3333/4506 [3:47:49<1:18:54, 4.04s/it]
74%|███████▍ | 3334/4506 [3:47:53<1:18:46, 4.03s/it]
{'loss': 0.2179, 'grad_norm': 0.3782673478126526, 'learning_rate': 9.632216255309052e-06, 'epoch': 0.74}
74%|███████▍ | 3334/4506 [3:47:53<1:18:46, 4.03s/it]
74%|███████▍ | 3335/4506 [3:47:57<1:19:03, 4.05s/it]
{'loss': 0.2157, 'grad_norm': 0.42243775725364685, 'learning_rate': 9.61694381614064e-06, 'epoch': 0.74}
74%|███████▍ | 3335/4506 [3:47:57<1:19:03, 4.05s/it]
74%|███████▍ | 3336/4506 [3:48:01<1:18:55, 4.05s/it]
{'loss': 0.2095, 'grad_norm': 0.3325609266757965, 'learning_rate': 9.601680610350609e-06, 'epoch': 0.74}
74%|███████▍ | 3336/4506 [3:48:01<1:18:55, 4.05s/it]
74%|███████▍ | 3337/4506 [3:48:05<1:18:58, 4.05s/it]
{'loss': 0.2152, 'grad_norm': 0.35153523087501526, 'learning_rate': 9.58642664710038e-06, 'epoch': 0.74}
74%|███████▍ | 3337/4506 [3:48:05<1:18:58, 4.05s/it]
74%|███████▍ | 3338/4506 [3:48:09<1:18:00, 4.01s/it]
{'loss': 0.2251, 'grad_norm': 0.3784453570842743, 'learning_rate': 9.57118193554586e-06, 'epoch': 0.74}
74%|███████▍ | 3338/4506 [3:48:09<1:18:00, 4.01s/it]
74%|███████▍ | 3339/4506 [3:48:13<1:17:06, 3.96s/it]
{'loss': 0.2155, 'grad_norm': 0.44643595814704895, 'learning_rate': 9.555946484837385e-06, 'epoch': 0.74}
74%|███████▍ | 3339/4506 [3:48:13<1:17:06, 3.96s/it]
74%|███████▍ | 3340/4506 [3:48:17<1:16:00, 3.91s/it]
{'loss': 0.2103, 'grad_norm': 0.34682491421699524, 'learning_rate': 9.540720304119746e-06, 'epoch': 0.74}
74%|███████▍ | 3340/4506 [3:48:17<1:16:00, 3.91s/it]
74%|███████▍ | 3341/4506 [3:48:21<1:16:07, 3.92s/it]
{'loss': 0.2114, 'grad_norm': 0.36600932478904724, 'learning_rate': 9.525503402532142e-06, 'epoch': 0.74}
74%|███████▍ | 3341/4506 [3:48:21<1:16:07, 3.92s/it]
74%|███████▍ | 3342/4506 [3:48:24<1:14:36, 3.85s/it]
{'loss': 0.2132, 'grad_norm': 0.38448306918144226, 'learning_rate': 9.51029578920824e-06, 'epoch': 0.74}
74%|███████▍ | 3342/4506 [3:48:24<1:14:36, 3.85s/it]
74%|███████▍ | 3343/4506 [3:48:28<1:16:29, 3.95s/it]
{'loss': 0.2095, 'grad_norm': 0.4093492329120636, 'learning_rate': 9.495097473276093e-06, 'epoch': 0.74}
74%|███████▍ | 3343/4506 [3:48:28<1:16:29, 3.95s/it]
74%|███████▍ | 3344/4506 [3:48:33<1:17:11, 3.99s/it]
{'loss': 0.2115, 'grad_norm': 0.3810185194015503, 'learning_rate': 9.4799084638582e-06, 'epoch': 0.74}
74%|███████▍ | 3344/4506 [3:48:33<1:17:11, 3.99s/it]
74%|███████▍ | 3345/4506 [3:48:37<1:18:54, 4.08s/it]
{'loss': 0.2151, 'grad_norm': 0.37332043051719666, 'learning_rate': 9.464728770071469e-06, 'epoch': 0.74}
74%|███████▍ | 3345/4506 [3:48:37<1:18:54, 4.08s/it]
74%|███████▍ | 3346/4506 [3:48:41<1:19:27, 4.11s/it]
{'loss': 0.2104, 'grad_norm': 0.382207989692688, 'learning_rate': 9.44955840102722e-06, 'epoch': 0.74}
74%|███████▍ | 3346/4506 [3:48:41<1:19:27, 4.11s/it]
74%|███████▍ | 3347/4506 [3:48:45<1:21:37, 4.23s/it]
{'loss': 0.2222, 'grad_norm': 0.3798463046550751, 'learning_rate': 9.434397365831162e-06, 'epoch': 0.74}
74%|███████▍ | 3347/4506 [3:48:45<1:21:37, 4.23s/it]
74%|███████▍ | 3348/4506 [3:48:50<1:20:53, 4.19s/it]
{'loss': 0.2115, 'grad_norm': 0.3395776152610779, 'learning_rate': 9.419245673583404e-06, 'epoch': 0.74}
74%|███████▍ | 3348/4506 [3:48:50<1:20:53, 4.19s/it]
74%|███████▍ | 3349/4506 [3:48:54<1:20:16, 4.16s/it]
{'loss': 0.2079, 'grad_norm': 0.41155314445495605, 'learning_rate': 9.40410333337847e-06, 'epoch': 0.74}
74%|███████▍ | 3349/4506 [3:48:54<1:20:16, 4.16s/it]
74%|███████▍ | 3350/4506 [3:48:58<1:20:05, 4.16s/it]
{'loss': 0.2153, 'grad_norm': 0.4110398292541504, 'learning_rate': 9.38897035430522e-06, 'epoch': 0.74}
74%|███████▍ | 3350/4506 [3:48:58<1:20:05, 4.16s/it]
74%|███████▍ | 3351/4506 [3:49:02<1:19:28, 4.13s/it]
{'loss': 0.2288, 'grad_norm': 0.46331799030303955, 'learning_rate': 9.373846745446974e-06, 'epoch': 0.74}
74%|███████▍ | 3351/4506 [3:49:02<1:19:28, 4.13s/it]
74%|███████▍ | 3352/4506 [3:49:06<1:17:07, 4.01s/it]
{'loss': 0.2159, 'grad_norm': 0.37932175397872925, 'learning_rate': 9.358732515881347e-06, 'epoch': 0.74}
74%|███████▍ | 3352/4506 [3:49:06<1:17:07, 4.01s/it]
74%|███████▍ | 3353/4506 [3:49:10<1:16:58, 4.01s/it]
{'loss': 0.2137, 'grad_norm': 0.4032853841781616, 'learning_rate': 9.343627674680381e-06, 'epoch': 0.74}
74%|███████▍ | 3353/4506 [3:49:10<1:16:58, 4.01s/it]
74%|███████▍ | 3354/4506 [3:49:14<1:18:30, 4.09s/it]
{'loss': 0.2188, 'grad_norm': 0.39813047647476196, 'learning_rate': 9.328532230910444e-06, 'epoch': 0.74}
74%|███████▍ | 3354/4506 [3:49:14<1:18:30, 4.09s/it]
74%|███████▍ | 3355/4506 [3:49:19<1:23:34, 4.36s/it]
{'loss': 0.2091, 'grad_norm': 0.365141898393631, 'learning_rate': 9.313446193632296e-06, 'epoch': 0.74}
74%|███████▍ | 3355/4506 [3:49:19<1:23:34, 4.36s/it]
74%|███████▍ | 3356/4506 [3:49:23<1:21:42, 4.26s/it]
{'loss': 0.2111, 'grad_norm': 0.3701182007789612, 'learning_rate': 9.298369571901022e-06, 'epoch': 0.74}
74%|███████▍ | 3356/4506 [3:49:23<1:21:42, 4.26s/it]
75%|███████▍ | 3357/4506 [3:49:27<1:18:16, 4.09s/it]
{'loss': 0.212, 'grad_norm': 0.4324359893798828, 'learning_rate': 9.283302374766074e-06, 'epoch': 0.75}
75%|███████▍ | 3357/4506 [3:49:27<1:18:16, 4.09s/it]
75%|███████▍ | 3358/4506 [3:49:30<1:16:48, 4.01s/it]
{'loss': 0.2061, 'grad_norm': 0.36057186126708984, 'learning_rate': 9.268244611271243e-06, 'epoch': 0.75}
75%|███████▍ | 3358/4506 [3:49:30<1:16:48, 4.01s/it]
75%|███████▍ | 3359/4506 [3:49:34<1:14:31, 3.90s/it]
{'loss': 0.2216, 'grad_norm': 0.4110294580459595, 'learning_rate': 9.253196290454666e-06, 'epoch': 0.75}
75%|███████▍ | 3359/4506 [3:49:34<1:14:31, 3.90s/it]
75%|███████▍ | 3360/4506 [3:49:39<1:17:32, 4.06s/it]
{'loss': 0.2122, 'grad_norm': 0.392994225025177, 'learning_rate': 9.238157421348786e-06, 'epoch': 0.75}
75%|███████▍ | 3360/4506 [3:49:39<1:17:32, 4.06s/it]
75%|███████▍ | 3361/4506 [3:49:43<1:17:36, 4.07s/it]
{'loss': 0.2141, 'grad_norm': 0.3368234932422638, 'learning_rate': 9.223128012980409e-06, 'epoch': 0.75}
75%|███████▍ | 3361/4506 [3:49:43<1:17:36, 4.07s/it]
75%|███████▍ | 3362/4506 [3:49:47<1:18:54, 4.14s/it]
{'loss': 0.2024, 'grad_norm': 0.34357523918151855, 'learning_rate': 9.208108074370622e-06, 'epoch': 0.75}
75%|███████▍ | 3362/4506 [3:49:47<1:18:54, 4.14s/it]
75%|███████▍ | 3363/4506 [3:49:51<1:18:34, 4.12s/it]
{'loss': 0.2175, 'grad_norm': 0.39655181765556335, 'learning_rate': 9.19309761453486e-06, 'epoch': 0.75}
75%|███████▍ | 3363/4506 [3:49:51<1:18:34, 4.12s/it]
75%|███████▍ | 3364/4506 [3:49:55<1:18:44, 4.14s/it]
{'loss': 0.2128, 'grad_norm': 0.4185037910938263, 'learning_rate': 9.178096642482864e-06, 'epoch': 0.75}
75%|███████▍ | 3364/4506 [3:49:55<1:18:44, 4.14s/it]
75%|███████▍ | 3365/4506 [3:49:59<1:18:22, 4.12s/it]
{'loss': 0.2104, 'grad_norm': 0.3650191128253937, 'learning_rate': 9.16310516721866e-06, 'epoch': 0.75}
75%|███████▍ | 3365/4506 [3:49:59<1:18:22, 4.12s/it]
75%|███████▍ | 3366/4506 [3:50:03<1:18:12, 4.12s/it]
{'loss': 0.2072, 'grad_norm': 0.3800249993801117, 'learning_rate': 9.148123197740601e-06, 'epoch': 0.75}
75%|███████▍ | 3366/4506 [3:50:03<1:18:12, 4.12s/it]
75%|███████▍ | 3367/4506 [3:50:08<1:19:26, 4.18s/it]
{'loss': 0.2139, 'grad_norm': 0.421546995639801, 'learning_rate': 9.13315074304131e-06, 'epoch': 0.75}
75%|███████▍ | 3367/4506 [3:50:08<1:19:26, 4.18s/it]
75%|███████▍ | 3368/4506 [3:50:12<1:18:25, 4.14s/it]
{'loss': 0.2243, 'grad_norm': 0.35988181829452515, 'learning_rate': 9.11818781210772e-06, 'epoch': 0.75}
75%|███████▍ | 3368/4506 [3:50:12<1:18:25, 4.14s/it]
75%|███████▍ | 3369/4506 [3:50:16<1:18:00, 4.12s/it]
{'loss': 0.2104, 'grad_norm': 0.39235687255859375, 'learning_rate': 9.103234413921016e-06, 'epoch': 0.75}
75%|███████▍ | 3369/4506 [3:50:16<1:18:00, 4.12s/it]
75%|███████▍ | 3370/4506 [3:50:20<1:18:36, 4.15s/it]
{'loss': 0.223, 'grad_norm': 0.3331841230392456, 'learning_rate': 9.088290557456716e-06, 'epoch': 0.75}
75%|███████▍ | 3370/4506 [3:50:20<1:18:36, 4.15s/it]
75%|███████▍ | 3371/4506 [3:50:24<1:19:02, 4.18s/it]
{'loss': 0.1998, 'grad_norm': 0.3105921745300293, 'learning_rate': 9.073356251684551e-06, 'epoch': 0.75}
75%|███████▍ | 3371/4506 [3:50:24<1:19:02, 4.18s/it]
75%|███████▍ | 3372/4506 [3:50:28<1:16:27, 4.05s/it]
{'loss': 0.2039, 'grad_norm': 0.36133450269699097, 'learning_rate': 9.058431505568563e-06, 'epoch': 0.75}
75%|███████▍ | 3372/4506 [3:50:28<1:16:27, 4.05s/it]
75%|███████▍ | 3373/4506 [3:50:32<1:15:42, 4.01s/it]
{'loss': 0.215, 'grad_norm': 0.34356194734573364, 'learning_rate': 9.043516328067022e-06, 'epoch': 0.75}
75%|███████▍ | 3373/4506 [3:50:32<1:15:42, 4.01s/it]
75%|███████▍ | 3374/4506 [3:50:36<1:17:36, 4.11s/it]
{'loss': 0.2126, 'grad_norm': 0.399649977684021, 'learning_rate': 9.028610728132489e-06, 'epoch': 0.75}
75%|███████▍ | 3374/4506 [3:50:36<1:17:36, 4.11s/it]
75%|███████▍ | 3375/4506 [3:50:40<1:14:23, 3.95s/it]
{'loss': 0.2033, 'grad_norm': 0.3650927245616913, 'learning_rate': 9.013714714711738e-06, 'epoch': 0.75}
75%|███████▍ | 3375/4506 [3:50:40<1:14:23, 3.95s/it]
75%|███████▍ | 3376/4506 [3:50:44<1:14:59, 3.98s/it]
{'loss': 0.2091, 'grad_norm': 0.3393290340900421, 'learning_rate': 8.998828296745824e-06, 'epoch': 0.75}
75%|███████▍ | 3376/4506 [3:50:44<1:14:59, 3.98s/it]
75%|███████▍ | 3377/4506 [3:50:48<1:13:44, 3.92s/it]
{'loss': 0.2146, 'grad_norm': 0.38827311992645264, 'learning_rate': 8.98395148317002e-06, 'epoch': 0.75}
75%|███████▍ | 3377/4506 [3:50:48<1:13:44, 3.92s/it]
75%|███████▍ | 3378/4506 [3:50:52<1:14:30, 3.96s/it]
{'loss': 0.2156, 'grad_norm': 0.3541666865348816, 'learning_rate': 8.96908428291386e-06, 'epoch': 0.75}
75%|███████▍ | 3378/4506 [3:50:52<1:14:30, 3.96s/it]
75%|███████▍ | 3379/4506 [3:50:56<1:13:35, 3.92s/it]
{'loss': 0.2239, 'grad_norm': 0.38739219307899475, 'learning_rate': 8.954226704901064e-06, 'epoch': 0.75}
75%|███████▍ | 3379/4506 [3:50:56<1:13:35, 3.92s/it]
75%|███████▌ | 3380/4506 [3:50:59<1:13:31, 3.92s/it]
{'loss': 0.2055, 'grad_norm': 0.3753780126571655, 'learning_rate': 8.939378758049627e-06, 'epoch': 0.75}
75%|███████▌ | 3380/4506 [3:50:59<1:13:31, 3.92s/it]
75%|███████▌ | 3381/4506 [3:51:03<1:12:14, 3.85s/it]
{'loss': 0.2194, 'grad_norm': 0.47665324807167053, 'learning_rate': 8.924540451271717e-06, 'epoch': 0.75}
75%|███████▌ | 3381/4506 [3:51:03<1:12:14, 3.85s/it]
75%|███████▌ | 3382/4506 [3:51:07<1:11:53, 3.84s/it]
{'loss': 0.2057, 'grad_norm': 0.34479856491088867, 'learning_rate': 8.909711793473748e-06, 'epoch': 0.75}
75%|███████▌ | 3382/4506 [3:51:07<1:11:53, 3.84s/it]
75%|███████▌ | 3383/4506 [3:51:12<1:17:36, 4.15s/it]
{'loss': 0.21, 'grad_norm': 0.40048837661743164, 'learning_rate': 8.894892793556339e-06, 'epoch': 0.75}
75%|███████▌ | 3383/4506 [3:51:12<1:17:36, 4.15s/it]
75%|███████▌ | 3384/4506 [3:51:16<1:16:11, 4.07s/it]
{'loss': 0.2164, 'grad_norm': 0.40829533338546753, 'learning_rate': 8.880083460414284e-06, 'epoch': 0.75}
75%|███████▌ | 3384/4506 [3:51:16<1:16:11, 4.07s/it]
75%|███████▌ | 3385/4506 [3:51:20<1:16:58, 4.12s/it]
{'loss': 0.215, 'grad_norm': 0.3705514371395111, 'learning_rate': 8.865283802936618e-06, 'epoch': 0.75}
75%|███████▌ | 3385/4506 [3:51:20<1:16:58, 4.12s/it]
75%|███████▌ | 3386/4506 [3:51:24<1:16:46, 4.11s/it]
{'loss': 0.2114, 'grad_norm': 0.4021587371826172, 'learning_rate': 8.850493830006528e-06, 'epoch': 0.75}
75%|███████▌ | 3386/4506 [3:51:24<1:16:46, 4.11s/it]
75%|███████▌ | 3387/4506 [3:51:28<1:15:27, 4.05s/it]
{'loss': 0.2041, 'grad_norm': 0.3856029212474823, 'learning_rate': 8.835713550501423e-06, 'epoch': 0.75}
75%|███████▌ | 3387/4506 [3:51:28<1:15:27, 4.05s/it]
75%|███████▌ | 3388/4506 [3:51:32<1:16:47, 4.12s/it]
{'loss': 0.2112, 'grad_norm': 0.4872981905937195, 'learning_rate': 8.820942973292848e-06, 'epoch': 0.75}
75%|███████▌ | 3388/4506 [3:51:32<1:16:47, 4.12s/it]
75%|███████▌ | 3389/4506 [3:51:37<1:17:55, 4.19s/it]
{'loss': 0.2072, 'grad_norm': 0.39684274792671204, 'learning_rate': 8.806182107246585e-06, 'epoch': 0.75}
75%|███████▌ | 3389/4506 [3:51:37<1:17:55, 4.19s/it]
75%|███████▌ | 3390/4506 [3:51:40<1:14:37, 4.01s/it]
{'loss': 0.1969, 'grad_norm': 0.3579533100128174, 'learning_rate': 8.791430961222535e-06, 'epoch': 0.75}
75%|███████▌ | 3390/4506 [3:51:40<1:14:37, 4.01s/it]
75%|███████▌ | 3391/4506 [3:51:45<1:16:56, 4.14s/it]
{'loss': 0.205, 'grad_norm': 0.348061740398407, 'learning_rate': 8.7766895440748e-06, 'epoch': 0.75}
75%|███████▌ | 3391/4506 [3:51:45<1:16:56, 4.14s/it]
75%|███████▌ | 3392/4506 [3:51:49<1:18:48, 4.24s/it]
{'loss': 0.2119, 'grad_norm': 0.38433128595352173, 'learning_rate': 8.76195786465161e-06, 'epoch': 0.75}
75%|███████▌ | 3392/4506 [3:51:49<1:18:48, 4.24s/it]
75%|███████▌ | 3393/4506 [3:51:53<1:18:32, 4.23s/it]
{'loss': 0.2065, 'grad_norm': 0.3924587368965149, 'learning_rate': 8.74723593179538e-06, 'epoch': 0.75}
75%|███████▌ | 3393/4506 [3:51:53<1:18:32, 4.23s/it]
75%|███████▌ | 3394/4506 [3:51:58<1:19:30, 4.29s/it]
{'loss': 0.2038, 'grad_norm': 0.37298932671546936, 'learning_rate': 8.732523754342653e-06, 'epoch': 0.75}
75%|███████▌ | 3394/4506 [3:51:58<1:19:30, 4.29s/it]
75%|███████▌ | 3395/4506 [3:52:02<1:18:59, 4.27s/it]
{'loss': 0.2139, 'grad_norm': 0.4127388000488281, 'learning_rate': 8.717821341124128e-06, 'epoch': 0.75}
75%|███████▌ | 3395/4506 [3:52:02<1:18:59, 4.27s/it]
75%|███████▌ | 3396/4506 [3:52:06<1:17:29, 4.19s/it]
{'loss': 0.2086, 'grad_norm': 0.3893918991088867, 'learning_rate': 8.703128700964641e-06, 'epoch': 0.75}
75%|███████▌ | 3396/4506 [3:52:06<1:17:29, 4.19s/it]
75%|███████▌ | 3397/4506 [3:52:10<1:16:56, 4.16s/it]
{'loss': 0.2138, 'grad_norm': 0.3735790252685547, 'learning_rate': 8.688445842683173e-06, 'epoch': 0.75}
75%|███████▌ | 3397/4506 [3:52:10<1:16:56, 4.16s/it]
75%|███████▌ | 3398/4506 [3:52:14<1:16:24, 4.14s/it]
{'loss': 0.207, 'grad_norm': 0.38607826828956604, 'learning_rate': 8.673772775092814e-06, 'epoch': 0.75}
75%|███████▌ | 3398/4506 [3:52:14<1:16:24, 4.14s/it]
75%|███████▌ | 3399/4506 [3:52:18<1:15:33, 4.10s/it]
{'loss': 0.2105, 'grad_norm': 0.37970641255378723, 'learning_rate': 8.659109507000774e-06, 'epoch': 0.75}
75%|███████▌ | 3399/4506 [3:52:18<1:15:33, 4.10s/it]
75%|███████▌ | 3400/4506 [3:52:22<1:14:28, 4.04s/it]
{'loss': 0.2137, 'grad_norm': 0.42413198947906494, 'learning_rate': 8.644456047208402e-06, 'epoch': 0.75}
75%|███████▌ | 3400/4506 [3:52:22<1:14:28, 4.04s/it]
75%|███████▌ | 3401/4506 [3:52:26<1:15:18, 4.09s/it]
{'loss': 0.2063, 'grad_norm': 0.37411636114120483, 'learning_rate': 8.629812404511153e-06, 'epoch': 0.75}
75%|███████▌ | 3401/4506 [3:52:26<1:15:18, 4.09s/it]
75%|███████▌ | 3402/4506 [3:52:31<1:16:20, 4.15s/it]
{'loss': 0.2124, 'grad_norm': 0.4401175379753113, 'learning_rate': 8.615178587698591e-06, 'epoch': 0.76}
75%|███████▌ | 3402/4506 [3:52:31<1:16:20, 4.15s/it]
76%|███████▌ | 3403/4506 [3:52:35<1:17:22, 4.21s/it]
{'loss': 0.2066, 'grad_norm': 0.3790542185306549, 'learning_rate': 8.600554605554367e-06, 'epoch': 0.76}
76%|███████▌ | 3403/4506 [3:52:35<1:17:22, 4.21s/it]
76%|███████▌ | 3404/4506 [3:52:39<1:17:21, 4.21s/it]
{'loss': 0.1984, 'grad_norm': 0.3400149941444397, 'learning_rate': 8.585940466856252e-06, 'epoch': 0.76}
76%|███████▌ | 3404/4506 [3:52:39<1:17:21, 4.21s/it]
76%|███████▌ | 3405/4506 [3:52:43<1:17:02, 4.20s/it]
{'loss': 0.2047, 'grad_norm': 0.37747660279273987, 'learning_rate': 8.57133618037608e-06, 'epoch': 0.76}
76%|███████▌ | 3405/4506 [3:52:43<1:17:02, 4.20s/it]
76%|███████▌ | 3406/4506 [3:52:47<1:14:27, 4.06s/it]
{'loss': 0.2078, 'grad_norm': 0.41998937726020813, 'learning_rate': 8.556741754879807e-06, 'epoch': 0.76}
76%|███████▌ | 3406/4506 [3:52:47<1:14:27, 4.06s/it]
76%|███████▌ | 3407/4506 [3:52:51<1:13:43, 4.03s/it]
{'loss': 0.209, 'grad_norm': 0.36205846071243286, 'learning_rate': 8.542157199127426e-06, 'epoch': 0.76}
76%|███████▌ | 3407/4506 [3:52:51<1:13:43, 4.03s/it]
76%|███████▌ | 3408/4506 [3:52:55<1:12:53, 3.98s/it]
{'loss': 0.2081, 'grad_norm': 0.37404900789260864, 'learning_rate': 8.527582521873066e-06, 'epoch': 0.76}
76%|███████▌ | 3408/4506 [3:52:55<1:12:53, 3.98s/it]
76%|███████▌ | 3409/4506 [3:52:59<1:13:26, 4.02s/it]
{'loss': 0.2202, 'grad_norm': 0.3912821114063263, 'learning_rate': 8.513017731864862e-06, 'epoch': 0.76}
76%|███████▌ | 3409/4506 [3:52:59<1:13:26, 4.02s/it]
76%|███████▌ | 3410/4506 [3:53:03<1:14:34, 4.08s/it]
{'loss': 0.2062, 'grad_norm': 0.3729149103164673, 'learning_rate': 8.498462837845065e-06, 'epoch': 0.76}
76%|███████▌ | 3410/4506 [3:53:03<1:14:34, 4.08s/it]
76%|███████▌ | 3411/4506 [3:53:07<1:13:47, 4.04s/it]
{'loss': 0.212, 'grad_norm': 0.35722899436950684, 'learning_rate': 8.483917848549946e-06, 'epoch': 0.76}
76%|███████▌ | 3411/4506 [3:53:07<1:13:47, 4.04s/it]
76%|███████▌ | 3412/4506 [3:53:11<1:13:27, 4.03s/it]
{'loss': 0.2038, 'grad_norm': 0.37256160378456116, 'learning_rate': 8.469382772709868e-06, 'epoch': 0.76}
76%|███████▌ | 3412/4506 [3:53:11<1:13:27, 4.03s/it]
76%|███████▌ | 3413/4506 [3:53:15<1:12:07, 3.96s/it]
{'loss': 0.2036, 'grad_norm': 0.38661620020866394, 'learning_rate': 8.454857619049212e-06, 'epoch': 0.76}
76%|███████▌ | 3413/4506 [3:53:15<1:12:07, 3.96s/it]
76%|███████▌ | 3414/4506 [3:53:19<1:13:51, 4.06s/it]
{'loss': 0.2035, 'grad_norm': 0.3541903793811798, 'learning_rate': 8.440342396286419e-06, 'epoch': 0.76}
76%|███████▌ | 3414/4506 [3:53:19<1:13:51, 4.06s/it]
76%|███████▌ | 3415/4506 [3:53:23<1:14:22, 4.09s/it]
{'loss': 0.2201, 'grad_norm': 0.42492642998695374, 'learning_rate': 8.42583711313398e-06, 'epoch': 0.76}
76%|███████▌ | 3415/4506 [3:53:23<1:14:22, 4.09s/it]
76%|███████▌ | 3416/4506 [3:53:27<1:14:03, 4.08s/it]
{'loss': 0.2036, 'grad_norm': 0.3643840253353119, 'learning_rate': 8.41134177829839e-06, 'epoch': 0.76}
76%|███████▌ | 3416/4506 [3:53:27<1:14:03, 4.08s/it]
76%|███████▌ | 3417/4506 [3:53:31<1:12:39, 4.00s/it]
{'loss': 0.2095, 'grad_norm': 0.3677407205104828, 'learning_rate': 8.396856400480207e-06, 'epoch': 0.76}
76%|███████▌ | 3417/4506 [3:53:31<1:12:39, 4.00s/it]
76%|███████▌ | 3418/4506 [3:53:35<1:11:52, 3.96s/it]
{'loss': 0.2001, 'grad_norm': 0.3813495337963104, 'learning_rate': 8.382380988373977e-06, 'epoch': 0.76}
76%|███████▌ | 3418/4506 [3:53:35<1:11:52, 3.96s/it]
76%|███████▌ | 3419/4506 [3:53:40<1:15:21, 4.16s/it]
{'loss': 0.2146, 'grad_norm': 0.3937419056892395, 'learning_rate': 8.367915550668295e-06, 'epoch': 0.76}
76%|███████▌ | 3419/4506 [3:53:40<1:15:21, 4.16s/it]
76%|███████▌ | 3420/4506 [3:53:44<1:15:55, 4.19s/it]
{'loss': 0.2128, 'grad_norm': 0.34046274423599243, 'learning_rate': 8.353460096045753e-06, 'epoch': 0.76}
76%|███████▌ | 3420/4506 [3:53:44<1:15:55, 4.19s/it]
76%|███████▌ | 3421/4506 [3:53:48<1:15:58, 4.20s/it]
{'loss': 0.2052, 'grad_norm': 0.35032063722610474, 'learning_rate': 8.339014633182965e-06, 'epoch': 0.76}
76%|███████▌ | 3421/4506 [3:53:48<1:15:58, 4.20s/it]
76%|███████▌ | 3422/4506 [3:53:52<1:14:49, 4.14s/it]
{'loss': 0.2166, 'grad_norm': 0.44930222630500793, 'learning_rate': 8.324579170750518e-06, 'epoch': 0.76}
76%|███████▌ | 3422/4506 [3:53:52<1:14:49, 4.14s/it]
76%|███████▌ | 3423/4506 [3:53:57<1:16:20, 4.23s/it]
{'loss': 0.2156, 'grad_norm': 0.3891095519065857, 'learning_rate': 8.310153717413035e-06, 'epoch': 0.76}
76%|███████▌ | 3423/4506 [3:53:57<1:16:20, 4.23s/it]
76%|███████▌ | 3424/4506 [3:54:01<1:15:22, 4.18s/it]
{'loss': 0.1995, 'grad_norm': 0.3300117552280426, 'learning_rate': 8.295738281829096e-06, 'epoch': 0.76}
76%|███████▌ | 3424/4506 [3:54:01<1:15:22, 4.18s/it]
76%|███████▌ | 3425/4506 [3:54:05<1:13:39, 4.09s/it]
{'loss': 0.2012, 'grad_norm': 0.3492822051048279, 'learning_rate': 8.281332872651302e-06, 'epoch': 0.76}
76%|███████▌ | 3425/4506 [3:54:05<1:13:39, 4.09s/it]
76%|███████▌ | 3426/4506 [3:54:09<1:12:46, 4.04s/it]
{'loss': 0.2035, 'grad_norm': 0.34967803955078125, 'learning_rate': 8.266937498526186e-06, 'epoch': 0.76}
76%|███████▌ | 3426/4506 [3:54:09<1:12:46, 4.04s/it]
76%|███████▌ | 3427/4506 [3:54:13<1:13:15, 4.07s/it]
{'loss': 0.2022, 'grad_norm': 0.3942825496196747, 'learning_rate': 8.252552168094327e-06, 'epoch': 0.76}
76%|███████▌ | 3427/4506 [3:54:13<1:13:15, 4.07s/it]
76%|███████▌ | 3428/4506 [3:54:17<1:13:12, 4.07s/it]
{'loss': 0.2124, 'grad_norm': 0.3506634533405304, 'learning_rate': 8.238176889990216e-06, 'epoch': 0.76}
76%|███████▌ | 3428/4506 [3:54:17<1:13:12, 4.07s/it]
76%|███████▌ | 3429/4506 [3:54:21<1:14:31, 4.15s/it]
{'loss': 0.2209, 'grad_norm': 0.36857885122299194, 'learning_rate': 8.223811672842344e-06, 'epoch': 0.76}
76%|███████▌ | 3429/4506 [3:54:21<1:14:31, 4.15s/it]
76%|███████▌ | 3430/4506 [3:54:26<1:16:50, 4.28s/it]
{'loss': 0.2048, 'grad_norm': 0.3610490560531616, 'learning_rate': 8.20945652527314e-06, 'epoch': 0.76}
76%|███████▌ | 3430/4506 [3:54:26<1:16:50, 4.28s/it]
76%|███████▌ | 3431/4506 [3:54:30<1:14:45, 4.17s/it]
{'loss': 0.1979, 'grad_norm': 0.3698340356349945, 'learning_rate': 8.195111455899013e-06, 'epoch': 0.76}
76%|███████▌ | 3431/4506 [3:54:30<1:14:45, 4.17s/it]
76%|███████▌ | 3432/4506 [3:54:34<1:15:34, 4.22s/it]
{'loss': 0.2041, 'grad_norm': 0.35797983407974243, 'learning_rate': 8.180776473330292e-06, 'epoch': 0.76}
76%|███████▌ | 3432/4506 [3:54:34<1:15:34, 4.22s/it]
76%|███████▌ | 3433/4506 [3:54:38<1:14:25, 4.16s/it]
{'loss': 0.21, 'grad_norm': 0.3905852138996124, 'learning_rate': 8.166451586171284e-06, 'epoch': 0.76}
76%|███████▌ | 3433/4506 [3:54:38<1:14:25, 4.16s/it]
76%|███████▌ | 3434/4506 [3:54:42<1:12:10, 4.04s/it]
{'loss': 0.2128, 'grad_norm': 0.37763726711273193, 'learning_rate': 8.152136803020224e-06, 'epoch': 0.76}
76%|███████▌ | 3434/4506 [3:54:42<1:12:10, 4.04s/it]
76%|███████▌ | 3435/4506 [3:54:46<1:13:12, 4.10s/it]
{'loss': 0.2092, 'grad_norm': 0.35327863693237305, 'learning_rate': 8.137832132469268e-06, 'epoch': 0.76}
76%|███████▌ | 3435/4506 [3:54:46<1:13:12, 4.10s/it]
76%|███████▋ | 3436/4506 [3:54:51<1:15:40, 4.24s/it]
{'loss': 0.2028, 'grad_norm': 0.4324641227722168, 'learning_rate': 8.123537583104529e-06, 'epoch': 0.76}
76%|███████▋ | 3436/4506 [3:54:51<1:15:40, 4.24s/it]
76%|███████▋ | 3437/4506 [3:54:55<1:15:17, 4.23s/it]
{'loss': 0.2096, 'grad_norm': 0.3822529911994934, 'learning_rate': 8.109253163506014e-06, 'epoch': 0.76}
76%|███████▋ | 3437/4506 [3:54:55<1:15:17, 4.23s/it]
76%|███████▋ | 3438/4506 [3:54:58<1:12:37, 4.08s/it]
{'loss': 0.2034, 'grad_norm': 0.3999866843223572, 'learning_rate': 8.094978882247677e-06, 'epoch': 0.76}
76%|███████▋ | 3438/4506 [3:54:58<1:12:37, 4.08s/it]
76%|███████▋ | 3439/4506 [3:55:02<1:11:26, 4.02s/it]
{'loss': 0.221, 'grad_norm': 0.49271899461746216, 'learning_rate': 8.080714747897375e-06, 'epoch': 0.76}
76%|███████▋ | 3439/4506 [3:55:02<1:11:26, 4.02s/it]
76%|███████▋ | 3440/4506 [3:55:06<1:10:15, 3.95s/it]
{'loss': 0.2081, 'grad_norm': 0.37048646807670593, 'learning_rate': 8.066460769016881e-06, 'epoch': 0.76}
76%|███████▋ | 3440/4506 [3:55:06<1:10:15, 3.95s/it]
76%|███████▋ | 3441/4506 [3:55:10<1:12:07, 4.06s/it]
{'loss': 0.2097, 'grad_norm': 0.3827546238899231, 'learning_rate': 8.052216954161854e-06, 'epoch': 0.76}
76%|███████▋ | 3441/4506 [3:55:10<1:12:07, 4.06s/it]
76%|███████▋ | 3442/4506 [3:55:15<1:11:55, 4.06s/it]
{'loss': 0.2192, 'grad_norm': 0.4205770492553711, 'learning_rate': 8.037983311881878e-06, 'epoch': 0.76}
76%|███████▋ | 3442/4506 [3:55:15<1:11:55, 4.06s/it]
76%|███████▋ | 3443/4506 [3:55:18<1:11:22, 4.03s/it]
{'loss': 0.2052, 'grad_norm': 0.39807042479515076, 'learning_rate': 8.023759850720406e-06, 'epoch': 0.76}
76%|███████▋ | 3443/4506 [3:55:18<1:11:22, 4.03s/it]
76%|███████▋ | 3444/4506 [3:55:23<1:12:06, 4.07s/it]
{'loss': 0.2141, 'grad_norm': 0.3783402442932129, 'learning_rate': 8.009546579214801e-06, 'epoch': 0.76}
76%|███████▋ | 3444/4506 [3:55:23<1:12:06, 4.07s/it]
76%|███████▋ | 3445/4506 [3:55:27<1:12:32, 4.10s/it]
{'loss': 0.2003, 'grad_norm': 0.3816852569580078, 'learning_rate': 7.995343505896286e-06, 'epoch': 0.76}
76%|███████▋ | 3445/4506 [3:55:27<1:12:32, 4.10s/it]
76%|███████▋ | 3446/4506 [3:55:31<1:12:33, 4.11s/it]
{'loss': 0.1974, 'grad_norm': 0.4594497084617615, 'learning_rate': 7.981150639290005e-06, 'epoch': 0.76}
76%|███████▋ | 3446/4506 [3:55:31<1:12:33, 4.11s/it]
76%|███████▋ | 3447/4506 [3:55:35<1:14:33, 4.22s/it]
{'loss': 0.221, 'grad_norm': 0.41770896315574646, 'learning_rate': 7.966967987914933e-06, 'epoch': 0.77}
76%|███████▋ | 3447/4506 [3:55:35<1:14:33, 4.22s/it]
77%|███████▋ | 3448/4506 [3:55:39<1:12:55, 4.14s/it]
{'loss': 0.2102, 'grad_norm': 0.3607460856437683, 'learning_rate': 7.952795560283922e-06, 'epoch': 0.77}
77%|███████▋ | 3448/4506 [3:55:39<1:12:55, 4.14s/it]
77%|███████▋ | 3449/4506 [3:55:44<1:14:44, 4.24s/it]
{'loss': 0.2013, 'grad_norm': 0.38026687502861023, 'learning_rate': 7.938633364903705e-06, 'epoch': 0.77}
77%|███████▋ | 3449/4506 [3:55:44<1:14:44, 4.24s/it]
77%|███████▋ | 3450/4506 [3:55:48<1:12:21, 4.11s/it]
{'loss': 0.2004, 'grad_norm': 0.3609796464443207, 'learning_rate': 7.924481410274853e-06, 'epoch': 0.77}
77%|███████▋ | 3450/4506 [3:55:48<1:12:21, 4.11s/it]
77%|███████▋ | 3451/4506 [3:55:52<1:11:53, 4.09s/it]
{'loss': 0.2035, 'grad_norm': 0.3577798902988434, 'learning_rate': 7.910339704891806e-06, 'epoch': 0.77}
77%|███████▋ | 3451/4506 [3:55:52<1:11:53, 4.09s/it]
77%|███████▋ | 3452/4506 [3:55:56<1:11:26, 4.07s/it]
{'loss': 0.2152, 'grad_norm': 0.3763330280780792, 'learning_rate': 7.896208257242847e-06, 'epoch': 0.77}
77%|███████▋ | 3452/4506 [3:55:56<1:11:26, 4.07s/it]
77%|███████▋ | 3453/4506 [3:56:00<1:11:36, 4.08s/it]
{'loss': 0.1991, 'grad_norm': 0.33439329266548157, 'learning_rate': 7.882087075810104e-06, 'epoch': 0.77}
77%|███████▋ | 3453/4506 [3:56:00<1:11:36, 4.08s/it]
77%|███████▋ | 3454/4506 [3:56:04<1:10:03, 4.00s/it]
{'loss': 0.1957, 'grad_norm': 0.410939484834671, 'learning_rate': 7.867976169069527e-06, 'epoch': 0.77}
77%|███████▋ | 3454/4506 [3:56:04<1:10:03, 4.00s/it]
77%|███████▋ | 3455/4506 [3:56:08<1:10:16, 4.01s/it]
{'loss': 0.217, 'grad_norm': 0.35160255432128906, 'learning_rate': 7.85387554549093e-06, 'epoch': 0.77}
77%|███████▋ | 3455/4506 [3:56:08<1:10:16, 4.01s/it]
77%|███████▋ | 3456/4506 [3:56:12<1:12:02, 4.12s/it]
{'loss': 0.2194, 'grad_norm': 0.5290538668632507, 'learning_rate': 7.839785213537917e-06, 'epoch': 0.77}
77%|███████▋ | 3456/4506 [3:56:12<1:12:02, 4.12s/it]
77%|███████▋ | 3457/4506 [3:56:16<1:12:06, 4.12s/it]
{'loss': 0.1985, 'grad_norm': 0.42305463552474976, 'learning_rate': 7.825705181667941e-06, 'epoch': 0.77}
77%|███████▋ | 3457/4506 [3:56:16<1:12:06, 4.12s/it]
77%|███████▋ | 3458/4506 [3:56:20<1:13:03, 4.18s/it]
{'loss': 0.219, 'grad_norm': 0.4171411395072937, 'learning_rate': 7.81163545833227e-06, 'epoch': 0.77}
77%|███████▋ | 3458/4506 [3:56:21<1:13:03, 4.18s/it]
77%|███████▋ | 3459/4506 [3:56:25<1:13:42, 4.22s/it]
{'loss': 0.2129, 'grad_norm': 0.43767601251602173, 'learning_rate': 7.797576051975982e-06, 'epoch': 0.77}
77%|███████▋ | 3459/4506 [3:56:25<1:13:42, 4.22s/it]
77%|███████▋ | 3460/4506 [3:56:29<1:12:43, 4.17s/it]
{'loss': 0.2101, 'grad_norm': 0.38753077387809753, 'learning_rate': 7.783526971037953e-06, 'epoch': 0.77}
77%|███████▋ | 3460/4506 [3:56:29<1:12:43, 4.17s/it]
77%|███████▋ | 3461/4506 [3:56:33<1:12:48, 4.18s/it]
{'loss': 0.2165, 'grad_norm': 0.4159576892852783, 'learning_rate': 7.769488223950874e-06, 'epoch': 0.77}
77%|███████▋ | 3461/4506 [3:56:33<1:12:48, 4.18s/it]
77%|███████▋ | 3462/4506 [3:56:37<1:11:29, 4.11s/it]
{'loss': 0.1993, 'grad_norm': 0.3706046938896179, 'learning_rate': 7.75545981914122e-06, 'epoch': 0.77}
77%|███████▋ | 3462/4506 [3:56:37<1:11:29, 4.11s/it]
77%|███████▋ | 3463/4506 [3:56:41<1:10:47, 4.07s/it]
{'loss': 0.2134, 'grad_norm': 0.4188576638698578, 'learning_rate': 7.741441765029281e-06, 'epoch': 0.77}
77%|███████▋ | 3463/4506 [3:56:41<1:10:47, 4.07s/it]
77%|███████▋ | 3464/4506 [3:56:45<1:08:57, 3.97s/it]
{'loss': 0.1966, 'grad_norm': 0.4097137153148651, 'learning_rate': 7.727434070029102e-06, 'epoch': 0.77}
77%|███████▋ | 3464/4506 [3:56:45<1:08:57, 3.97s/it]
77%|███████▋ | 3465/4506 [3:56:49<1:10:36, 4.07s/it]
{'loss': 0.1971, 'grad_norm': 0.3763822317123413, 'learning_rate': 7.713436742548538e-06, 'epoch': 0.77}
77%|███████▋ | 3465/4506 [3:56:49<1:10:36, 4.07s/it]
77%|███████▋ | 3466/4506 [3:56:53<1:09:57, 4.04s/it]
{'loss': 0.2019, 'grad_norm': 0.3472703695297241, 'learning_rate': 7.699449790989216e-06, 'epoch': 0.77}
77%|███████▋ | 3466/4506 [3:56:53<1:09:57, 4.04s/it]
77%|███████▋ | 3467/4506 [3:56:57<1:08:46, 3.97s/it]
{'loss': 0.2027, 'grad_norm': 0.4164162576198578, 'learning_rate': 7.685473223746515e-06, 'epoch': 0.77}
77%|███████▋ | 3467/4506 [3:56:57<1:08:46, 3.97s/it]
77%|███████▋ | 3468/4506 [3:57:01<1:08:49, 3.98s/it]
{'loss': 0.2, 'grad_norm': 0.3902263641357422, 'learning_rate': 7.671507049209614e-06, 'epoch': 0.77}
77%|███████▋ | 3468/4506 [3:57:01<1:08:49, 3.98s/it]
77%|███████▋ | 3469/4506 [3:57:05<1:08:31, 3.97s/it]
{'loss': 0.2057, 'grad_norm': 0.38339462876319885, 'learning_rate': 7.657551275761415e-06, 'epoch': 0.77}
77%|███████▋ | 3469/4506 [3:57:05<1:08:31, 3.97s/it]
77%|███████▋ | 3470/4506 [3:57:08<1:07:20, 3.90s/it]
{'loss': 0.2123, 'grad_norm': 0.3587760627269745, 'learning_rate': 7.643605911778612e-06, 'epoch': 0.77}
77%|███████▋ | 3470/4506 [3:57:08<1:07:20, 3.90s/it]
77%|███████▋ | 3471/4506 [3:57:13<1:09:39, 4.04s/it]
{'loss': 0.2049, 'grad_norm': 0.35009506344795227, 'learning_rate': 7.629670965631633e-06, 'epoch': 0.77}
77%|███████▋ | 3471/4506 [3:57:13<1:09:39, 4.04s/it]
77%|███████▋ | 3472/4506 [3:57:17<1:08:11, 3.96s/it]
{'loss': 0.1965, 'grad_norm': 0.3620169460773468, 'learning_rate': 7.615746445684666e-06, 'epoch': 0.77}
77%|███████▋ | 3472/4506 [3:57:17<1:08:11, 3.96s/it]
77%|███████▋ | 3473/4506 [3:57:21<1:08:47, 4.00s/it]
{'loss': 0.2075, 'grad_norm': 0.40615084767341614, 'learning_rate': 7.601832360295613e-06, 'epoch': 0.77}
77%|███████▋ | 3473/4506 [3:57:21<1:08:47, 4.00s/it]
77%|███████▋ | 3474/4506 [3:57:25<1:08:52, 4.00s/it]
{'loss': 0.1964, 'grad_norm': 0.368762344121933, 'learning_rate': 7.587928717816151e-06, 'epoch': 0.77}
77%|███████▋ | 3474/4506 [3:57:25<1:08:52, 4.00s/it]
77%|███████▋ | 3475/4506 [3:57:29<1:10:05, 4.08s/it]
{'loss': 0.2212, 'grad_norm': 0.40976202487945557, 'learning_rate': 7.574035526591649e-06, 'epoch': 0.77}
77%|███████▋ | 3475/4506 [3:57:29<1:10:05, 4.08s/it]
77%|███████▋ | 3476/4506 [3:57:33<1:11:05, 4.14s/it]
{'loss': 0.2052, 'grad_norm': 0.3493613004684448, 'learning_rate': 7.560152794961236e-06, 'epoch': 0.77}
77%|███████▋ | 3476/4506 [3:57:33<1:11:05, 4.14s/it]
77%|███████▋ | 3477/4506 [3:57:37<1:09:53, 4.08s/it]
{'loss': 0.2067, 'grad_norm': 0.41631704568862915, 'learning_rate': 7.546280531257746e-06, 'epoch': 0.77}
77%|███████▋ | 3477/4506 [3:57:37<1:09:53, 4.08s/it]
77%|███████▋ | 3478/4506 [3:57:41<1:09:28, 4.06s/it]
{'loss': 0.2081, 'grad_norm': 0.34037309885025024, 'learning_rate': 7.532418743807745e-06, 'epoch': 0.77}
77%|███████▋ | 3478/4506 [3:57:41<1:09:28, 4.06s/it]
77%|███████▋ | 3479/4506 [3:57:46<1:11:04, 4.15s/it]
{'loss': 0.207, 'grad_norm': 0.35516947507858276, 'learning_rate': 7.518567440931481e-06, 'epoch': 0.77}
77%|███████▋ | 3479/4506 [3:57:46<1:11:04, 4.15s/it]
77%|███████▋ | 3480/4506 [3:57:50<1:11:57, 4.21s/it]
{'loss': 0.1927, 'grad_norm': 0.41376420855522156, 'learning_rate': 7.504726630942946e-06, 'epoch': 0.77}
77%|███████▋ | 3480/4506 [3:57:50<1:11:57, 4.21s/it]
77%|███████▋ | 3481/4506 [3:57:54<1:10:58, 4.15s/it]
{'loss': 0.2076, 'grad_norm': 0.36875030398368835, 'learning_rate': 7.490896322149804e-06, 'epoch': 0.77}
77%|███████▋ | 3481/4506 [3:57:54<1:10:58, 4.15s/it]
77%|███████▋ | 3482/4506 [3:57:58<1:09:03, 4.05s/it]
{'loss': 0.2058, 'grad_norm': 0.3908901810646057, 'learning_rate': 7.477076522853421e-06, 'epoch': 0.77}
77%|███████▋ | 3482/4506 [3:57:58<1:09:03, 4.05s/it]
77%|███████▋ | 3483/4506 [3:58:02<1:10:39, 4.14s/it]
{'loss': 0.2206, 'grad_norm': 0.394504576921463, 'learning_rate': 7.46326724134887e-06, 'epoch': 0.77}
77%|███████▋ | 3483/4506 [3:58:02<1:10:39, 4.14s/it]
77%|███████▋ | 3484/4506 [3:58:06<1:09:29, 4.08s/it]
{'loss': 0.1931, 'grad_norm': 0.3979129195213318, 'learning_rate': 7.4494684859248985e-06, 'epoch': 0.77}
77%|███████▋ | 3484/4506 [3:58:06<1:09:29, 4.08s/it]
77%|███████▋ | 3485/4506 [3:58:10<1:08:25, 4.02s/it]
{'loss': 0.2057, 'grad_norm': 0.4611952006816864, 'learning_rate': 7.4356802648639535e-06, 'epoch': 0.77}
77%|███████▋ | 3485/4506 [3:58:10<1:08:25, 4.02s/it]
77%|███████▋ | 3486/4506 [3:58:14<1:07:42, 3.98s/it]
{'loss': 0.2093, 'grad_norm': 0.46486133337020874, 'learning_rate': 7.421902586442122e-06, 'epoch': 0.77}
77%|███████▋ | 3486/4506 [3:58:14<1:07:42, 3.98s/it]
77%|███████▋ | 3487/4506 [3:58:18<1:08:38, 4.04s/it]
{'loss': 0.1975, 'grad_norm': 0.4179832339286804, 'learning_rate': 7.408135458929205e-06, 'epoch': 0.77}
77%|███████▋ | 3487/4506 [3:58:18<1:08:38, 4.04s/it]
77%|███████▋ | 3488/4506 [3:58:22<1:08:41, 4.05s/it]
{'loss': 0.2079, 'grad_norm': 0.35864755511283875, 'learning_rate': 7.3943788905886356e-06, 'epoch': 0.77}
77%|███████▋ | 3488/4506 [3:58:22<1:08:41, 4.05s/it]
77%|███████▋ | 3489/4506 [3:58:26<1:08:34, 4.05s/it]
{'loss': 0.2129, 'grad_norm': 0.39475348591804504, 'learning_rate': 7.38063288967753e-06, 'epoch': 0.77}
77%|███████▋ | 3489/4506 [3:58:26<1:08:34, 4.05s/it]
77%|███████▋ | 3490/4506 [3:58:30<1:07:46, 4.00s/it]
{'loss': 0.2102, 'grad_norm': 0.46979930996894836, 'learning_rate': 7.366897464446662e-06, 'epoch': 0.77}
77%|███████▋ | 3490/4506 [3:58:30<1:07:46, 4.00s/it]
77%|███████▋ | 3491/4506 [3:58:34<1:09:37, 4.12s/it]
{'loss': 0.208, 'grad_norm': 0.43178462982177734, 'learning_rate': 7.353172623140453e-06, 'epoch': 0.77}
77%|███████▋ | 3491/4506 [3:58:34<1:09:37, 4.12s/it]
77%|███████▋ | 3492/4506 [3:58:39<1:10:16, 4.16s/it]
{'loss': 0.2104, 'grad_norm': 0.3636956214904785, 'learning_rate': 7.339458373996961e-06, 'epoch': 0.78}
77%|███████▋ | 3492/4506 [3:58:39<1:10:16, 4.16s/it]
78%|███████▊ | 3493/4506 [3:58:43<1:09:57, 4.14s/it]
{'loss': 0.2112, 'grad_norm': 0.4130551517009735, 'learning_rate': 7.325754725247905e-06, 'epoch': 0.78}
78%|███████▊ | 3493/4506 [3:58:43<1:09:57, 4.14s/it]
78%|███████▊ | 3494/4506 [3:58:47<1:08:13, 4.05s/it]
{'loss': 0.2029, 'grad_norm': 0.3478567600250244, 'learning_rate': 7.312061685118618e-06, 'epoch': 0.78}
78%|███████▊ | 3494/4506 [3:58:47<1:08:13, 4.05s/it]
78%|███████▊ | 3495/4506 [3:58:51<1:08:04, 4.04s/it]
{'loss': 0.2017, 'grad_norm': 0.4079001247882843, 'learning_rate': 7.298379261828089e-06, 'epoch': 0.78}
78%|███████▊ | 3495/4506 [3:58:51<1:08:04, 4.04s/it]
78%|███████▊ | 3496/4506 [3:58:54<1:06:46, 3.97s/it]
{'loss': 0.2035, 'grad_norm': 0.4204656779766083, 'learning_rate': 7.284707463588927e-06, 'epoch': 0.78}
78%|███████▊ | 3496/4506 [3:58:54<1:06:46, 3.97s/it]
78%|███████▊ | 3497/4506 [3:58:59<1:08:15, 4.06s/it]
{'loss': 0.2092, 'grad_norm': 0.4159030020236969, 'learning_rate': 7.2710462986073645e-06, 'epoch': 0.78}
78%|███████▊ | 3497/4506 [3:58:59<1:08:15, 4.06s/it]
78%|███████▊ | 3498/4506 [3:59:03<1:11:16, 4.24s/it]
{'loss': 0.2038, 'grad_norm': 0.3551419675350189, 'learning_rate': 7.257395775083242e-06, 'epoch': 0.78}
78%|███████▊ | 3498/4506 [3:59:03<1:11:16, 4.24s/it]
78%|███████▊ | 3499/4506 [3:59:07<1:09:28, 4.14s/it]
{'loss': 0.2138, 'grad_norm': 0.40538904070854187, 'learning_rate': 7.243755901210012e-06, 'epoch': 0.78}
78%|███████▊ | 3499/4506 [3:59:07<1:09:28, 4.14s/it]
78%|███████▊ | 3500/4506 [3:59:11<1:09:09, 4.13s/it]
{'loss': 0.2248, 'grad_norm': 0.3958415985107422, 'learning_rate': 7.230126685174754e-06, 'epoch': 0.78}
78%|███████▊ | 3500/4506 [3:59:11<1:09:09, 4.13s/it]
78%|███████▊ | 3501/4506 [3:59:16<1:10:39, 4.22s/it]
{'loss': 0.2026, 'grad_norm': 0.39075997471809387, 'learning_rate': 7.216508135158129e-06, 'epoch': 0.78}
78%|███████▊ | 3501/4506 [3:59:16<1:10:39, 4.22s/it]
78%|███████▊ | 3502/4506 [3:59:20<1:09:17, 4.14s/it]
{'loss': 0.2067, 'grad_norm': 0.3567182719707489, 'learning_rate': 7.202900259334408e-06, 'epoch': 0.78}
78%|███████▊ | 3502/4506 [3:59:20<1:09:17, 4.14s/it]
78%|███████▊ | 3503/4506 [3:59:24<1:10:17, 4.20s/it]
{'loss': 0.2051, 'grad_norm': 0.3486519455909729, 'learning_rate': 7.189303065871452e-06, 'epoch': 0.78}
78%|███████▊ | 3503/4506 [3:59:24<1:10:17, 4.20s/it]
78%|███████▊ | 3504/4506 [3:59:28<1:08:37, 4.11s/it]
{'loss': 0.2149, 'grad_norm': 0.4065150022506714, 'learning_rate': 7.17571656293072e-06, 'epoch': 0.78}
78%|███████▊ | 3504/4506 [3:59:28<1:08:37, 4.11s/it]
78%|███████▊ | 3505/4506 [3:59:32<1:06:28, 3.98s/it]
{'loss': 0.2156, 'grad_norm': 0.37230879068374634, 'learning_rate': 7.162140758667229e-06, 'epoch': 0.78}
78%|███████▊ | 3505/4506 [3:59:32<1:06:28, 3.98s/it]
78%|███████▊ | 3506/4506 [3:59:36<1:06:32, 3.99s/it]
{'loss': 0.2156, 'grad_norm': 0.4416612982749939, 'learning_rate': 7.148575661229606e-06, 'epoch': 0.78}
78%|███████▊ | 3506/4506 [3:59:36<1:06:32, 3.99s/it]
78%|███████▊ | 3507/4506 [3:59:40<1:06:40, 4.00s/it]
{'loss': 0.2072, 'grad_norm': 0.3854820728302002, 'learning_rate': 7.135021278760018e-06, 'epoch': 0.78}
78%|███████▊ | 3507/4506 [3:59:40<1:06:40, 4.00s/it]
78%|███████▊ | 3508/4506 [3:59:44<1:06:49, 4.02s/it]
{'loss': 0.2145, 'grad_norm': 0.4203706979751587, 'learning_rate': 7.121477619394223e-06, 'epoch': 0.78}
78%|███████▊ | 3508/4506 [3:59:44<1:06:49, 4.02s/it]
78%|███████▊ | 3509/4506 [3:59:48<1:05:54, 3.97s/it]
{'loss': 0.1962, 'grad_norm': 0.36022722721099854, 'learning_rate': 7.107944691261542e-06, 'epoch': 0.78}
78%|███████▊ | 3509/4506 [3:59:48<1:05:54, 3.97s/it]
78%|███████▊ | 3510/4506 [3:59:51<1:05:30, 3.95s/it]
{'loss': 0.2183, 'grad_norm': 0.4426419734954834, 'learning_rate': 7.094422502484857e-06, 'epoch': 0.78}
78%|███████▊ | 3510/4506 [3:59:52<1:05:30, 3.95s/it]
78%|███████▊ | 3511/4506 [3:59:56<1:06:49, 4.03s/it]
{'loss': 0.2033, 'grad_norm': 0.37246230244636536, 'learning_rate': 7.080911061180581e-06, 'epoch': 0.78}
78%|███████▊ | 3511/4506 [3:59:56<1:06:49, 4.03s/it]
78%|███████▊ | 3512/4506 [3:59:59<1:05:29, 3.95s/it]
{'loss': 0.2096, 'grad_norm': 0.4228118062019348, 'learning_rate': 7.067410375458708e-06, 'epoch': 0.78}
78%|███████▊ | 3512/4506 [3:59:59<1:05:29, 3.95s/it]
78%|███████▊ | 3513/4506 [4:00:04<1:06:37, 4.03s/it]
{'loss': 0.2103, 'grad_norm': 0.4170245826244354, 'learning_rate': 7.053920453422744e-06, 'epoch': 0.78}
78%|███████▊ | 3513/4506 [4:00:04<1:06:37, 4.03s/it]
78%|███████▊ | 3514/4506 [4:00:07<1:05:23, 3.95s/it]
{'loss': 0.2148, 'grad_norm': 0.3635629117488861, 'learning_rate': 7.040441303169756e-06, 'epoch': 0.78}
78%|███████▊ | 3514/4506 [4:00:07<1:05:23, 3.95s/it]
78%|███████▊ | 3515/4506 [4:00:11<1:05:34, 3.97s/it]
{'loss': 0.2026, 'grad_norm': 0.3894003629684448, 'learning_rate': 7.026972932790354e-06, 'epoch': 0.78}
78%|███████▊ | 3515/4506 [4:00:11<1:05:34, 3.97s/it]
78%|███████▊ | 3516/4506 [4:00:16<1:06:26, 4.03s/it]
{'loss': 0.2027, 'grad_norm': 0.4018757939338684, 'learning_rate': 7.013515350368644e-06, 'epoch': 0.78}
78%|███████▊ | 3516/4506 [4:00:16<1:06:26, 4.03s/it]
78%|███████▊ | 3517/4506 [4:00:20<1:06:08, 4.01s/it]
{'loss': 0.2059, 'grad_norm': 0.4022873342037201, 'learning_rate': 7.000068563982293e-06, 'epoch': 0.78}
78%|███████▊ | 3517/4506 [4:00:20<1:06:08, 4.01s/it]
78%|███████▊ | 3518/4506 [4:00:23<1:05:01, 3.95s/it]
{'loss': 0.2066, 'grad_norm': 0.36481422185897827, 'learning_rate': 6.986632581702454e-06, 'epoch': 0.78}
78%|███████▊ | 3518/4506 [4:00:23<1:05:01, 3.95s/it]
78%|███████▊ | 3519/4506 [4:00:28<1:06:52, 4.07s/it]
{'loss': 0.231, 'grad_norm': 0.472859263420105, 'learning_rate': 6.973207411593832e-06, 'epoch': 0.78}
78%|███████▊ | 3519/4506 [4:00:28<1:06:52, 4.07s/it]
78%|███████▊ | 3520/4506 [4:00:32<1:06:44, 4.06s/it]
{'loss': 0.2055, 'grad_norm': 0.4023250341415405, 'learning_rate': 6.9597930617146036e-06, 'epoch': 0.78}
78%|███████▊ | 3520/4506 [4:00:32<1:06:44, 4.06s/it]
78%|███████▊ | 3521/4506 [4:00:36<1:06:28, 4.05s/it]
{'loss': 0.2056, 'grad_norm': 0.35611703991889954, 'learning_rate': 6.94638954011648e-06, 'epoch': 0.78}
78%|███████▊ | 3521/4506 [4:00:36<1:06:28, 4.05s/it]
78%|███████▊ | 3522/4506 [4:00:40<1:07:58, 4.14s/it]
{'loss': 0.2047, 'grad_norm': 0.33819419145584106, 'learning_rate': 6.932996854844659e-06, 'epoch': 0.78}
78%|███████▊ | 3522/4506 [4:00:40<1:07:58, 4.14s/it]
78%|███████▊ | 3523/4506 [4:00:44<1:07:01, 4.09s/it]
{'loss': 0.1986, 'grad_norm': 0.35603123903274536, 'learning_rate': 6.919615013937847e-06, 'epoch': 0.78}
78%|███████▊ | 3523/4506 [4:00:44<1:07:01, 4.09s/it]
78%|███████▊ | 3524/4506 [4:00:48<1:08:00, 4.15s/it]
{'loss': 0.206, 'grad_norm': 0.35020965337753296, 'learning_rate': 6.906244025428218e-06, 'epoch': 0.78}
78%|███████▊ | 3524/4506 [4:00:48<1:08:00, 4.15s/it]
78%|███████▊ | 3525/4506 [4:00:52<1:06:12, 4.05s/it]
{'loss': 0.2079, 'grad_norm': 0.39215388894081116, 'learning_rate': 6.892883897341462e-06, 'epoch': 0.78}
78%|███████▊ | 3525/4506 [4:00:52<1:06:12, 4.05s/it]
78%|███████▊ | 3526/4506 [4:00:56<1:05:54, 4.03s/it]
{'loss': 0.2067, 'grad_norm': 0.4564109742641449, 'learning_rate': 6.879534637696719e-06, 'epoch': 0.78}
78%|███████▊ | 3526/4506 [4:00:56<1:05:54, 4.03s/it]
78%|███████▊ | 3527/4506 [4:01:00<1:06:02, 4.05s/it]
{'loss': 0.2086, 'grad_norm': 0.377254456281662, 'learning_rate': 6.866196254506626e-06, 'epoch': 0.78}
78%|███████▊ | 3527/4506 [4:01:00<1:06:02, 4.05s/it]
78%|███████▊ | 3528/4506 [4:01:04<1:06:16, 4.07s/it]
{'loss': 0.2009, 'grad_norm': 0.43810808658599854, 'learning_rate': 6.852868755777286e-06, 'epoch': 0.78}
78%|███████▊ | 3528/4506 [4:01:04<1:06:16, 4.07s/it]
78%|███████▊ | 3529/4506 [4:01:09<1:07:14, 4.13s/it]
{'loss': 0.2014, 'grad_norm': 0.3842511475086212, 'learning_rate': 6.839552149508283e-06, 'epoch': 0.78}
78%|███████▊ | 3529/4506 [4:01:09<1:07:14, 4.13s/it]
78%|███████▊ | 3530/4506 [4:01:13<1:08:10, 4.19s/it]
{'loss': 0.212, 'grad_norm': 0.36979013681411743, 'learning_rate': 6.826246443692625e-06, 'epoch': 0.78}
78%|███████▊ | 3530/4506 [4:01:13<1:08:10, 4.19s/it]
78%|███████▊ | 3531/4506 [4:01:17<1:06:33, 4.10s/it]
{'loss': 0.2124, 'grad_norm': 0.39036673307418823, 'learning_rate': 6.812951646316823e-06, 'epoch': 0.78}
78%|███████▊ | 3531/4506 [4:01:17<1:06:33, 4.10s/it]
78%|███████▊ | 3532/4506 [4:01:21<1:08:04, 4.19s/it]
{'loss': 0.2034, 'grad_norm': 0.430909663438797, 'learning_rate': 6.7996677653608125e-06, 'epoch': 0.78}
78%|███████▊ | 3532/4506 [4:01:21<1:08:04, 4.19s/it]
78%|███████▊ | 3533/4506 [4:01:25<1:06:45, 4.12s/it]
{'loss': 0.1986, 'grad_norm': 0.4003376364707947, 'learning_rate': 6.786394808797961e-06, 'epoch': 0.78}
78%|███████▊ | 3533/4506 [4:01:25<1:06:45, 4.12s/it]
78%|███████▊ | 3534/4506 [4:01:29<1:06:31, 4.11s/it]
{'loss': 0.2087, 'grad_norm': 0.3824915885925293, 'learning_rate': 6.773132784595138e-06, 'epoch': 0.78}
78%|███████▊ | 3534/4506 [4:01:29<1:06:31, 4.11s/it]
78%|███████▊ | 3535/4506 [4:01:33<1:05:11, 4.03s/it]
{'loss': 0.2071, 'grad_norm': 0.4336482286453247, 'learning_rate': 6.759881700712586e-06, 'epoch': 0.78}
78%|███████▊ | 3535/4506 [4:01:33<1:05:11, 4.03s/it]
78%|███████▊ | 3536/4506 [4:01:37<1:06:10, 4.09s/it]
{'loss': 0.2102, 'grad_norm': 0.44910213351249695, 'learning_rate': 6.746641565104028e-06, 'epoch': 0.78}
78%|███████▊ | 3536/4506 [4:01:37<1:06:10, 4.09s/it]
78%|███████▊ | 3537/4506 [4:01:41<1:05:09, 4.03s/it]
{'loss': 0.2119, 'grad_norm': 0.4048246443271637, 'learning_rate': 6.733412385716581e-06, 'epoch': 0.79}
78%|███████▊ | 3537/4506 [4:01:41<1:05:09, 4.03s/it]
79%|███████▊ | 3538/4506 [4:01:45<1:05:12, 4.04s/it]
{'loss': 0.2027, 'grad_norm': 0.3862391412258148, 'learning_rate': 6.720194170490812e-06, 'epoch': 0.79}
79%|███████▊ | 3538/4506 [4:01:45<1:05:12, 4.04s/it]
79%|███████▊ | 3539/4506 [4:01:50<1:05:28, 4.06s/it]
{'loss': 0.2082, 'grad_norm': 0.42197495698928833, 'learning_rate': 6.706986927360684e-06, 'epoch': 0.79}
79%|███████▊ | 3539/4506 [4:01:50<1:05:28, 4.06s/it]
79%|███████▊ | 3540/4506 [4:01:53<1:03:26, 3.94s/it]
{'loss': 0.2104, 'grad_norm': 0.42211097478866577, 'learning_rate': 6.693790664253593e-06, 'epoch': 0.79}
79%|███████▊ | 3540/4506 [4:01:53<1:03:26, 3.94s/it]
79%|███████▊ | 3541/4506 [4:01:57<1:05:02, 4.04s/it]
{'loss': 0.2013, 'grad_norm': 0.3892677426338196, 'learning_rate': 6.680605389090339e-06, 'epoch': 0.79}
79%|███████▊ | 3541/4506 [4:01:57<1:05:02, 4.04s/it]
79%|███████▊ | 3542/4506 [4:02:02<1:04:53, 4.04s/it]
{'loss': 0.2077, 'grad_norm': 0.41695448756217957, 'learning_rate': 6.66743110978513e-06, 'epoch': 0.79}
79%|███████▊ | 3542/4506 [4:02:02<1:04:53, 4.04s/it]
79%|███████▊ | 3543/4506 [4:02:06<1:05:25, 4.08s/it]
{'loss': 0.2081, 'grad_norm': 0.355471670627594, 'learning_rate': 6.6542678342455565e-06, 'epoch': 0.79}
79%|███████▊ | 3543/4506 [4:02:06<1:05:25, 4.08s/it]
79%|███████▊ | 3544/4506 [4:02:10<1:04:16, 4.01s/it]
{'loss': 0.2096, 'grad_norm': 0.3763100206851959, 'learning_rate': 6.641115570372633e-06, 'epoch': 0.79}
79%|███████▊ | 3544/4506 [4:02:10<1:04:16, 4.01s/it]
79%|███████▊ | 3545/4506 [4:02:14<1:04:18, 4.01s/it]
{'loss': 0.2005, 'grad_norm': 0.3654741644859314, 'learning_rate': 6.627974326060729e-06, 'epoch': 0.79}
79%|███████▊ | 3545/4506 [4:02:14<1:04:18, 4.01s/it]
79%|███████▊ | 3546/4506 [4:02:18<1:04:19, 4.02s/it]
{'loss': 0.2069, 'grad_norm': 0.38082948327064514, 'learning_rate': 6.614844109197632e-06, 'epoch': 0.79}
79%|███████▊ | 3546/4506 [4:02:18<1:04:19, 4.02s/it]
79%|███████▊ | 3547/4506 [4:02:22<1:05:01, 4.07s/it]
{'loss': 0.1983, 'grad_norm': 0.34757500886917114, 'learning_rate': 6.601724927664493e-06, 'epoch': 0.79}
79%|███████▊ | 3547/4506 [4:02:22<1:05:01, 4.07s/it]
79%|███████▊ | 3548/4506 [4:02:26<1:04:30, 4.04s/it]
{'loss': 0.207, 'grad_norm': 0.43289220333099365, 'learning_rate': 6.588616789335855e-06, 'epoch': 0.79}
79%|███████▊ | 3548/4506 [4:02:26<1:04:30, 4.04s/it]
79%|███████▉ | 3549/4506 [4:02:30<1:04:00, 4.01s/it]
{'loss': 0.2154, 'grad_norm': 0.37712785601615906, 'learning_rate': 6.575519702079611e-06, 'epoch': 0.79}
79%|███████▉ | 3549/4506 [4:02:30<1:04:00, 4.01s/it]
79%|███████▉ | 3550/4506 [4:02:34<1:05:20, 4.10s/it]
{'loss': 0.2208, 'grad_norm': 0.41256850957870483, 'learning_rate': 6.562433673757026e-06, 'epoch': 0.79}
79%|███████▉ | 3550/4506 [4:02:34<1:05:20, 4.10s/it]
79%|███████▉ | 3551/4506 [4:02:38<1:04:07, 4.03s/it]
{'loss': 0.2018, 'grad_norm': 0.38639071583747864, 'learning_rate': 6.549358712222747e-06, 'epoch': 0.79}
79%|███████▉ | 3551/4506 [4:02:38<1:04:07, 4.03s/it]
79%|███████▉ | 3552/4506 [4:02:42<1:05:32, 4.12s/it]
{'loss': 0.2133, 'grad_norm': 0.39160722494125366, 'learning_rate': 6.536294825324743e-06, 'epoch': 0.79}
79%|███████▉ | 3552/4506 [4:02:42<1:05:32, 4.12s/it]
79%|███████▉ | 3553/4506 [4:02:46<1:04:55, 4.09s/it]
{'loss': 0.2157, 'grad_norm': 0.42323818802833557, 'learning_rate': 6.523242020904383e-06, 'epoch': 0.79}
79%|███████▉ | 3553/4506 [4:02:46<1:04:55, 4.09s/it]
79%|███████▉ | 3554/4506 [4:02:50<1:05:05, 4.10s/it]
{'loss': 0.2004, 'grad_norm': 0.37543466687202454, 'learning_rate': 6.5102003067963355e-06, 'epoch': 0.79}
79%|███████▉ | 3554/4506 [4:02:50<1:05:05, 4.10s/it]
79%|███████▉ | 3555/4506 [4:02:54<1:05:15, 4.12s/it]
{'loss': 0.2001, 'grad_norm': 0.386454701423645, 'learning_rate': 6.49716969082865e-06, 'epoch': 0.79}
79%|███████▉ | 3555/4506 [4:02:55<1:05:15, 4.12s/it]
79%|███████▉ | 3556/4506 [4:02:58<1:04:17, 4.06s/it]
{'loss': 0.2069, 'grad_norm': 0.40777483582496643, 'learning_rate': 6.484150180822681e-06, 'epoch': 0.79}
79%|███████▉ | 3556/4506 [4:02:58<1:04:17, 4.06s/it]
79%|███████▉ | 3557/4506 [4:03:03<1:05:23, 4.13s/it]
{'loss': 0.2058, 'grad_norm': 0.4155120253562927, 'learning_rate': 6.471141784593157e-06, 'epoch': 0.79}
79%|███████▉ | 3557/4506 [4:03:03<1:05:23, 4.13s/it]
79%|███████▉ | 3558/4506 [4:03:07<1:05:45, 4.16s/it]
{'loss': 0.1999, 'grad_norm': 0.37485402822494507, 'learning_rate': 6.458144509948089e-06, 'epoch': 0.79}
79%|███████▉ | 3558/4506 [4:03:07<1:05:45, 4.16s/it]
79%|███████▉ | 3559/4506 [4:03:11<1:04:20, 4.08s/it]
{'loss': 0.203, 'grad_norm': 0.38350507616996765, 'learning_rate': 6.445158364688849e-06, 'epoch': 0.79}
79%|███████▉ | 3559/4506 [4:03:11<1:04:20, 4.08s/it]
79%|███████▉ | 3560/4506 [4:03:15<1:04:25, 4.09s/it]
{'loss': 0.2023, 'grad_norm': 0.36769017577171326, 'learning_rate': 6.432183356610119e-06, 'epoch': 0.79}
79%|███████▉ | 3560/4506 [4:03:15<1:04:25, 4.09s/it]
79%|███████▉ | 3561/4506 [4:03:19<1:04:01, 4.07s/it]
{'loss': 0.1972, 'grad_norm': 0.38584986329078674, 'learning_rate': 6.419219493499895e-06, 'epoch': 0.79}
79%|███████▉ | 3561/4506 [4:03:19<1:04:01, 4.07s/it]
79%|███████▉ | 3562/4506 [4:03:23<1:04:39, 4.11s/it]
{'loss': 0.2018, 'grad_norm': 0.3760276436805725, 'learning_rate': 6.406266783139473e-06, 'epoch': 0.79}
79%|███████▉ | 3562/4506 [4:03:23<1:04:39, 4.11s/it]
79%|███████▉ | 3563/4506 [4:03:27<1:04:10, 4.08s/it]
{'loss': 0.207, 'grad_norm': 0.3973504900932312, 'learning_rate': 6.393325233303474e-06, 'epoch': 0.79}
79%|███████▉ | 3563/4506 [4:03:27<1:04:10, 4.08s/it]
79%|███████▉ | 3564/4506 [4:03:31<1:04:40, 4.12s/it]
{'loss': 0.2123, 'grad_norm': 0.3733390271663666, 'learning_rate': 6.380394851759797e-06, 'epoch': 0.79}
79%|███████▉ | 3564/4506 [4:03:31<1:04:40, 4.12s/it]
79%|███████▉ | 3565/4506 [4:03:35<1:04:31, 4.11s/it]
{'loss': 0.2082, 'grad_norm': 0.3762107193470001, 'learning_rate': 6.367475646269658e-06, 'epoch': 0.79}
79%|███████▉ | 3565/4506 [4:03:36<1:04:31, 4.11s/it]
79%|███████▉ | 3566/4506 [4:03:40<1:04:31, 4.12s/it]
{'loss': 0.2107, 'grad_norm': 0.3841451108455658, 'learning_rate': 6.354567624587565e-06, 'epoch': 0.79}
79%|███████▉ | 3566/4506 [4:03:40<1:04:31, 4.12s/it]
79%|███████▉ | 3567/4506 [4:03:43<1:01:44, 3.95s/it]
{'loss': 0.1995, 'grad_norm': 0.4105494022369385, 'learning_rate': 6.341670794461291e-06, 'epoch': 0.79}
79%|███████▉ | 3567/4506 [4:03:43<1:01:44, 3.95s/it]
79%|███████▉ | 3568/4506 [4:03:47<1:00:23, 3.86s/it]
{'loss': 0.2105, 'grad_norm': 0.41631099581718445, 'learning_rate': 6.328785163631917e-06, 'epoch': 0.79}
79%|███████▉ | 3568/4506 [4:03:47<1:00:23, 3.86s/it]
79%|███████▉ | 3569/4506 [4:03:51<1:00:21, 3.86s/it]
{'loss': 0.2095, 'grad_norm': 0.39808762073516846, 'learning_rate': 6.315910739833783e-06, 'epoch': 0.79}
79%|███████▉ | 3569/4506 [4:03:51<1:00:21, 3.86s/it]
79%|███████▉ | 3570/4506 [4:03:55<1:00:44, 3.89s/it]
{'loss': 0.2105, 'grad_norm': 0.38834118843078613, 'learning_rate': 6.303047530794518e-06, 'epoch': 0.79}
79%|███████▉ | 3570/4506 [4:03:55<1:00:44, 3.89s/it]
79%|███████▉ | 3571/4506 [4:03:59<1:02:34, 4.02s/it]
{'loss': 0.2154, 'grad_norm': 0.36266231536865234, 'learning_rate': 6.290195544234989e-06, 'epoch': 0.79}
79%|███████▉ | 3571/4506 [4:03:59<1:02:34, 4.02s/it]
79%|███████▉ | 3572/4506 [4:04:04<1:05:19, 4.20s/it]
{'loss': 0.2091, 'grad_norm': 0.35417547821998596, 'learning_rate': 6.277354787869386e-06, 'epoch': 0.79}
79%|███████▉ | 3572/4506 [4:04:04<1:05:19, 4.20s/it]
79%|███████▉ | 3573/4506 [4:04:08<1:05:48, 4.23s/it]
{'loss': 0.2061, 'grad_norm': 0.32720640301704407, 'learning_rate': 6.264525269405092e-06, 'epoch': 0.79}
79%|███████▉ | 3573/4506 [4:04:08<1:05:48, 4.23s/it]
79%|███████▉ | 3574/4506 [4:04:12<1:06:00, 4.25s/it]
{'loss': 0.2172, 'grad_norm': 0.3811240792274475, 'learning_rate': 6.251706996542794e-06, 'epoch': 0.79}
79%|███████▉ | 3574/4506 [4:04:12<1:06:00, 4.25s/it]
79%|███████▉ | 3575/4506 [4:04:16<1:04:50, 4.18s/it]
{'loss': 0.1966, 'grad_norm': 0.3593485951423645, 'learning_rate': 6.238899976976392e-06, 'epoch': 0.79}
79%|███████▉ | 3575/4506 [4:04:16<1:04:50, 4.18s/it]
79%|███████▉ | 3576/4506 [4:04:20<1:03:29, 4.10s/it]
{'loss': 0.2103, 'grad_norm': 0.3700810372829437, 'learning_rate': 6.226104218393064e-06, 'epoch': 0.79}
79%|███████▉ | 3576/4506 [4:04:20<1:03:29, 4.10s/it]
79%|███████▉ | 3577/4506 [4:04:24<1:03:20, 4.09s/it]
{'loss': 0.2217, 'grad_norm': 0.4116648733615875, 'learning_rate': 6.213319728473199e-06, 'epoch': 0.79}
79%|███████▉ | 3577/4506 [4:04:24<1:03:20, 4.09s/it]
79%|███████▉ | 3578/4506 [4:04:28<1:01:38, 3.99s/it]
{'loss': 0.2088, 'grad_norm': 0.4022841453552246, 'learning_rate': 6.200546514890446e-06, 'epoch': 0.79}
79%|███████▉ | 3578/4506 [4:04:28<1:01:38, 3.99s/it]
79%|███████▉ | 3579/4506 [4:04:32<1:01:21, 3.97s/it]
{'loss': 0.2018, 'grad_norm': 0.447321355342865, 'learning_rate': 6.187784585311673e-06, 'epoch': 0.79}
79%|███████▉ | 3579/4506 [4:04:32<1:01:21, 3.97s/it]
79%|███████▉ | 3580/4506 [4:04:36<1:02:57, 4.08s/it]
{'loss': 0.1986, 'grad_norm': 0.3626938760280609, 'learning_rate': 6.175033947396988e-06, 'epoch': 0.79}
79%|███████▉ | 3580/4506 [4:04:36<1:02:57, 4.08s/it]
79%|███████▉ | 3581/4506 [4:04:40<1:01:38, 4.00s/it]
{'loss': 0.2081, 'grad_norm': 0.4838167726993561, 'learning_rate': 6.162294608799698e-06, 'epoch': 0.79}
79%|███████▉ | 3581/4506 [4:04:40<1:01:38, 4.00s/it]
79%|███████▉ | 3582/4506 [4:04:44<1:00:25, 3.92s/it]
{'loss': 0.1949, 'grad_norm': 0.3676987886428833, 'learning_rate': 6.1495665771663545e-06, 'epoch': 0.8}
79%|███████▉ | 3582/4506 [4:04:44<1:00:25, 3.92s/it]
80%|███████▉ | 3583/4506 [4:04:48<1:01:02, 3.97s/it]
{'loss': 0.1999, 'grad_norm': 0.3774416148662567, 'learning_rate': 6.136849860136695e-06, 'epoch': 0.8}
80%|███████▉ | 3583/4506 [4:04:48<1:01:02, 3.97s/it]
80%|███████▉ | 3584/4506 [4:04:52<1:02:22, 4.06s/it]
{'loss': 0.2183, 'grad_norm': 0.47211864590644836, 'learning_rate': 6.124144465343687e-06, 'epoch': 0.8}
80%|███████▉ | 3584/4506 [4:04:52<1:02:22, 4.06s/it]
80%|███████▉ | 3585/4506 [4:04:56<1:03:07, 4.11s/it]
{'loss': 0.2101, 'grad_norm': 0.4649568796157837, 'learning_rate': 6.111450400413504e-06, 'epoch': 0.8}
80%|███████▉ | 3585/4506 [4:04:56<1:03:07, 4.11s/it]
80%|███████▉ | 3586/4506 [4:05:00<1:03:05, 4.11s/it]
{'loss': 0.2111, 'grad_norm': 0.384360671043396, 'learning_rate': 6.098767672965494e-06, 'epoch': 0.8}
80%|███████▉ | 3586/4506 [4:05:00<1:03:05, 4.11s/it]
80%|███████▉ | 3587/4506 [4:05:05<1:03:31, 4.15s/it]
{'loss': 0.2112, 'grad_norm': 0.43448400497436523, 'learning_rate': 6.086096290612231e-06, 'epoch': 0.8}
80%|███████▉ | 3587/4506 [4:05:05<1:03:31, 4.15s/it]
80%|███████▉ | 3588/4506 [4:05:09<1:02:45, 4.10s/it]
{'loss': 0.2102, 'grad_norm': 0.3849923014640808, 'learning_rate': 6.07343626095945e-06, 'epoch': 0.8}
80%|███████▉ | 3588/4506 [4:05:09<1:02:45, 4.10s/it]
80%|███████▉ | 3589/4506 [4:05:12<1:01:09, 4.00s/it]
{'loss': 0.207, 'grad_norm': 0.38627830147743225, 'learning_rate': 6.060787591606099e-06, 'epoch': 0.8}
80%|███████▉ | 3589/4506 [4:05:12<1:01:09, 4.00s/it]
80%|███████▉ | 3590/4506 [4:05:16<1:00:06, 3.94s/it]
{'loss': 0.1986, 'grad_norm': 0.4213692843914032, 'learning_rate': 6.048150290144275e-06, 'epoch': 0.8}
80%|███████▉ | 3590/4506 [4:05:16<1:00:06, 3.94s/it]
80%|███████▉ | 3591/4506 [4:05:20<58:46, 3.85s/it]
{'loss': 0.2044, 'grad_norm': 0.4314192533493042, 'learning_rate': 6.035524364159298e-06, 'epoch': 0.8}
80%|███████▉ | 3591/4506 [4:05:20<58:46, 3.85s/it]
80%|███████▉ | 3592/4506 [4:05:24<59:13, 3.89s/it]
{'loss': 0.2024, 'grad_norm': 0.442989706993103, 'learning_rate': 6.022909821229611e-06, 'epoch': 0.8}
80%|███████▉ | 3592/4506 [4:05:24<59:13, 3.89s/it]
80%|███████▉ | 3593/4506 [4:05:28<1:01:08, 4.02s/it]
{'loss': 0.2065, 'grad_norm': 0.3601630628108978, 'learning_rate': 6.01030666892686e-06, 'epoch': 0.8}
80%|███████▉ | 3593/4506 [4:05:28<1:01:08, 4.02s/it]
80%|███████▉ | 3594/4506 [4:05:33<1:03:38, 4.19s/it]
{'loss': 0.1985, 'grad_norm': 0.4009852707386017, 'learning_rate': 5.997714914815827e-06, 'epoch': 0.8}
80%|███████▉ | 3594/4506 [4:05:33<1:03:38, 4.19s/it]
80%|███████▉ | 3595/4506 [4:05:37<1:02:30, 4.12s/it]
{'loss': 0.2157, 'grad_norm': 0.4147739112377167, 'learning_rate': 5.985134566454484e-06, 'epoch': 0.8}
80%|███████▉ | 3595/4506 [4:05:37<1:02:30, 4.12s/it]
80%|███████▉ | 3596/4506 [4:05:40<1:00:39, 4.00s/it]
{'loss': 0.1954, 'grad_norm': 0.38223373889923096, 'learning_rate': 5.97256563139392e-06, 'epoch': 0.8}
80%|███████▉ | 3596/4506 [4:05:40<1:00:39, 4.00s/it]
80%|███████▉ | 3597/4506 [4:05:45<1:01:07, 4.03s/it]
{'loss': 0.209, 'grad_norm': 0.40460196137428284, 'learning_rate': 5.960008117178401e-06, 'epoch': 0.8}
80%|███████▉ | 3597/4506 [4:05:45<1:01:07, 4.03s/it]
80%|███████▉ | 3598/4506 [4:05:49<1:00:44, 4.01s/it]
{'loss': 0.1993, 'grad_norm': 0.3659716248512268, 'learning_rate': 5.94746203134533e-06, 'epoch': 0.8}
80%|███████▉ | 3598/4506 [4:05:49<1:00:44, 4.01s/it]
80%|███████▉ | 3599/4506 [4:05:53<1:01:57, 4.10s/it]
{'loss': 0.2037, 'grad_norm': 0.3108455240726471, 'learning_rate': 5.9349273814252536e-06, 'epoch': 0.8}
80%|███████▉ | 3599/4506 [4:05:53<1:01:57, 4.10s/it]
80%|███████▉ | 3600/4506 [4:05:57<1:01:50, 4.10s/it]
{'loss': 0.2134, 'grad_norm': 0.410951167345047, 'learning_rate': 5.922404174941845e-06, 'epoch': 0.8}
80%|███████▉ | 3600/4506 [4:05:57<1:01:50, 4.10s/it]
80%|███████▉ | 3601/4506 [4:06:01<1:01:25, 4.07s/it]
{'loss': 0.2177, 'grad_norm': 0.4264029562473297, 'learning_rate': 5.909892419411908e-06, 'epoch': 0.8}
80%|███████▉ | 3601/4506 [4:06:01<1:01:25, 4.07s/it]
80%|███████▉ | 3602/4506 [4:06:05<1:02:28, 4.15s/it]
{'loss': 0.2118, 'grad_norm': 0.3814660608768463, 'learning_rate': 5.897392122345382e-06, 'epoch': 0.8}
80%|███████▉ | 3602/4506 [4:06:05<1:02:28, 4.15s/it]
80%|███████▉ | 3603/4506 [4:06:09<1:02:49, 4.17s/it]
{'loss': 0.2174, 'grad_norm': 0.44075196981430054, 'learning_rate': 5.884903291245331e-06, 'epoch': 0.8}
80%|███████▉ | 3603/4506 [4:06:09<1:02:49, 4.17s/it]
80%|███████▉ | 3604/4506 [4:06:13<1:01:07, 4.07s/it]
{'loss': 0.2147, 'grad_norm': 0.36956602334976196, 'learning_rate': 5.872425933607933e-06, 'epoch': 0.8}
80%|███████▉ | 3604/4506 [4:06:13<1:01:07, 4.07s/it]
80%|████████ | 3605/4506 [4:06:17<1:01:05, 4.07s/it]
{'loss': 0.2102, 'grad_norm': 0.39562591910362244, 'learning_rate': 5.859960056922467e-06, 'epoch': 0.8}
80%|████████ | 3605/4506 [4:06:17<1:01:05, 4.07s/it]
80%|████████ | 3606/4506 [4:06:22<1:03:09, 4.21s/it]
{'loss': 0.2075, 'grad_norm': 0.3550536334514618, 'learning_rate': 5.847505668671347e-06, 'epoch': 0.8}
80%|████████ | 3606/4506 [4:06:22<1:03:09, 4.21s/it]
80%|████████ | 3607/4506 [4:06:26<1:02:55, 4.20s/it]
{'loss': 0.2204, 'grad_norm': 0.37708401679992676, 'learning_rate': 5.835062776330058e-06, 'epoch': 0.8}
80%|████████ | 3607/4506 [4:06:26<1:02:55, 4.20s/it]
80%|████████ | 3608/4506 [4:06:31<1:03:56, 4.27s/it]
{'loss': 0.212, 'grad_norm': 0.4027252793312073, 'learning_rate': 5.822631387367216e-06, 'epoch': 0.8}
80%|████████ | 3608/4506 [4:06:31<1:03:56, 4.27s/it]
80%|████████ | 3609/4506 [4:06:35<1:03:17, 4.23s/it]
{'loss': 0.2038, 'grad_norm': 0.34642675518989563, 'learning_rate': 5.810211509244504e-06, 'epoch': 0.8}
80%|████████ | 3609/4506 [4:06:35<1:03:17, 4.23s/it]
80%|████████ | 3610/4506 [4:06:39<1:01:54, 4.15s/it]
{'loss': 0.1934, 'grad_norm': 0.3699454963207245, 'learning_rate': 5.7978031494167335e-06, 'epoch': 0.8}
80%|████████ | 3610/4506 [4:06:39<1:01:54, 4.15s/it]
80%|████████ | 3611/4506 [4:06:43<1:02:41, 4.20s/it]
{'loss': 0.2231, 'grad_norm': 0.3395445644855499, 'learning_rate': 5.785406315331756e-06, 'epoch': 0.8}
80%|████████ | 3612/4506 [4:06:47<1:02:49, 4.22s/it]
{'loss': 0.1962, 'grad_norm': 0.3565334677696228, 'learning_rate': 5.773021014430552e-06, 'epoch': 0.8}
80%|████████ | 3613/4506 [4:06:52<1:03:28, 4.27s/it]
{'loss': 0.2173, 'grad_norm': 0.367290198802948, 'learning_rate': 5.760647254147133e-06, 'epoch': 0.8}
80%|████████ | 3614/4506 [4:06:56<1:03:26, 4.27s/it]
{'loss': 0.2067, 'grad_norm': 0.4304642677307129, 'learning_rate': 5.748285041908624e-06, 'epoch': 0.8}
80%|████████ | 3615/4506 [4:07:00<1:02:18, 4.20s/it]
{'loss': 0.2095, 'grad_norm': 0.4121472239494324, 'learning_rate': 5.735934385135189e-06, 'epoch': 0.8}
80%|████████ | 3616/4506 [4:07:04<1:02:39, 4.22s/it]
{'loss': 0.2063, 'grad_norm': 0.43774211406707764, 'learning_rate': 5.723595291240069e-06, 'epoch': 0.8}
80%|████████ | 3617/4506 [4:07:08<1:01:11, 4.13s/it]
{'loss': 0.2006, 'grad_norm': 0.3579534590244293, 'learning_rate': 5.711267767629577e-06, 'epoch': 0.8}
80%|████████ | 3618/4506 [4:07:12<1:01:36, 4.16s/it]
{'loss': 0.2014, 'grad_norm': 0.42382776737213135, 'learning_rate': 5.69895182170305e-06, 'epoch': 0.8}
80%|████████ | 3619/4506 [4:07:17<1:02:19, 4.22s/it]
{'loss': 0.1994, 'grad_norm': 0.41958358883857727, 'learning_rate': 5.686647460852909e-06, 'epoch': 0.8}
80%|████████ | 3620/4506 [4:07:21<1:01:51, 4.19s/it]
{'loss': 0.2046, 'grad_norm': 0.4394287168979645, 'learning_rate': 5.67435469246459e-06, 'epoch': 0.8}
80%|████████ | 3621/4506 [4:07:25<1:00:58, 4.13s/it]
{'loss': 0.1967, 'grad_norm': 0.3956090807914734, 'learning_rate': 5.662073523916597e-06, 'epoch': 0.8}
80%|████████ | 3622/4506 [4:07:29<1:00:18, 4.09s/it]
{'loss': 0.199, 'grad_norm': 0.3437924385070801, 'learning_rate': 5.6498039625804576e-06, 'epoch': 0.8}
80%|████████ | 3623/4506 [4:07:33<59:54, 4.07s/it]
{'loss': 0.2051, 'grad_norm': 0.39603191614151, 'learning_rate': 5.637546015820744e-06, 'epoch': 0.8}
80%|████████ | 3624/4506 [4:07:37<1:00:41, 4.13s/it]
{'loss': 0.215, 'grad_norm': 0.42664211988449097, 'learning_rate': 5.625299690995034e-06, 'epoch': 0.8}
80%|████████ | 3625/4506 [4:07:41<1:00:12, 4.10s/it]
{'loss': 0.2132, 'grad_norm': 0.4655003845691681, 'learning_rate': 5.6130649954539595e-06, 'epoch': 0.8}
80%|████████ | 3626/4506 [4:07:45<59:42, 4.07s/it]
{'loss': 0.201, 'grad_norm': 0.3806888163089752, 'learning_rate': 5.6008419365411426e-06, 'epoch': 0.8}
80%|████████ | 3627/4506 [4:07:49<59:49, 4.08s/it]
{'loss': 0.2018, 'grad_norm': 0.41884127259254456, 'learning_rate': 5.588630521593252e-06, 'epoch': 0.81}
81%|████████ | 3628/4506 [4:07:54<1:00:50, 4.16s/it]
{'loss': 0.2165, 'grad_norm': 0.4627501666545868, 'learning_rate': 5.576430757939924e-06, 'epoch': 0.81}
81%|████████ | 3629/4506 [4:07:57<59:41, 4.08s/it]
{'loss': 0.2033, 'grad_norm': 0.45944899320602417, 'learning_rate': 5.564242652903859e-06, 'epoch': 0.81}
81%|████████ | 3630/4506 [4:08:01<58:55, 4.04s/it]
{'loss': 0.2035, 'grad_norm': 0.40661370754241943, 'learning_rate': 5.552066213800708e-06, 'epoch': 0.81}
81%|████████ | 3631/4506 [4:08:05<58:25, 4.01s/it]
{'loss': 0.2129, 'grad_norm': 0.3820823132991791, 'learning_rate': 5.539901447939155e-06, 'epoch': 0.81}
81%|████████ | 3632/4506 [4:08:09<57:17, 3.93s/it]
{'loss': 0.203, 'grad_norm': 0.36888256669044495, 'learning_rate': 5.527748362620847e-06, 'epoch': 0.81}
81%|████████ | 3633/4506 [4:08:13<57:33, 3.96s/it]
{'loss': 0.1998, 'grad_norm': 0.3752046227455139, 'learning_rate': 5.5156069651404465e-06, 'epoch': 0.81}
81%|████████ | 3634/4506 [4:08:17<57:36, 3.96s/it]
{'loss': 0.2025, 'grad_norm': 0.39529964327812195, 'learning_rate': 5.5034772627855805e-06, 'epoch': 0.81}
81%|████████ | 3635/4506 [4:08:21<58:52, 4.06s/it]
{'loss': 0.2078, 'grad_norm': 0.39593183994293213, 'learning_rate': 5.491359262836873e-06, 'epoch': 0.81}
81%|████████ | 3636/4506 [4:08:26<59:46, 4.12s/it]
{'loss': 0.1982, 'grad_norm': 0.39468637108802795, 'learning_rate': 5.479252972567919e-06, 'epoch': 0.81}
81%|████████ | 3637/4506 [4:08:30<59:22, 4.10s/it]
{'loss': 0.2051, 'grad_norm': 0.39424437284469604, 'learning_rate': 5.4671583992452706e-06, 'epoch': 0.81}
81%|████████ | 3638/4506 [4:08:34<1:00:54, 4.21s/it]
{'loss': 0.2105, 'grad_norm': 0.38109061121940613, 'learning_rate': 5.455075550128471e-06, 'epoch': 0.81}
81%|████████ | 3639/4506 [4:08:38<59:24, 4.11s/it]
{'loss': 0.2158, 'grad_norm': 0.4030088186264038, 'learning_rate': 5.443004432470003e-06, 'epoch': 0.81}
81%|████████ | 3640/4506 [4:08:42<58:56, 4.08s/it]
{'loss': 0.2097, 'grad_norm': 0.36867794394493103, 'learning_rate': 5.430945053515324e-06, 'epoch': 0.81}
81%|████████ | 3641/4506 [4:08:46<58:39, 4.07s/it]
{'loss': 0.1995, 'grad_norm': 0.35069581866264343, 'learning_rate': 5.418897420502839e-06, 'epoch': 0.81}
81%|████████ | 3642/4506 [4:08:51<1:00:26, 4.20s/it]
{'loss': 0.1956, 'grad_norm': 0.3587309718132019, 'learning_rate': 5.4068615406639126e-06, 'epoch': 0.81}
81%|████████ | 3643/4506 [4:08:55<59:49, 4.16s/it]
{'loss': 0.2051, 'grad_norm': 0.38451460003852844, 'learning_rate': 5.394837421222835e-06, 'epoch': 0.81}
81%|████████ | 3644/4506 [4:08:59<59:05, 4.11s/it]
{'loss': 0.2079, 'grad_norm': 0.35474374890327454, 'learning_rate': 5.382825069396855e-06, 'epoch': 0.81}
81%|████████ | 3645/4506 [4:09:03<58:13, 4.06s/it]
{'loss': 0.2156, 'grad_norm': 0.37302955985069275, 'learning_rate': 5.370824492396143e-06, 'epoch': 0.81}
81%|████████ | 3646/4506 [4:09:07<58:03, 4.05s/it]
{'loss': 0.1987, 'grad_norm': 0.3544074296951294, 'learning_rate': 5.358835697423823e-06, 'epoch': 0.81}
81%|████████ | 3647/4506 [4:09:10<57:12, 4.00s/it]
{'loss': 0.2054, 'grad_norm': 0.35527503490448, 'learning_rate': 5.346858691675916e-06, 'epoch': 0.81}
81%|████████ | 3648/4506 [4:09:14<56:23, 3.94s/it]
{'loss': 0.1861, 'grad_norm': 0.35249894857406616, 'learning_rate': 5.334893482341408e-06, 'epoch': 0.81}
81%|████████ | 3649/4506 [4:09:19<58:25, 4.09s/it]
{'loss': 0.2018, 'grad_norm': 0.3517704904079437, 'learning_rate': 5.3229400766021705e-06, 'epoch': 0.81}
81%|████████ | 3650/4506 [4:09:23<57:43, 4.05s/it]
{'loss': 0.2109, 'grad_norm': 0.39654114842414856, 'learning_rate': 5.310998481632995e-06, 'epoch': 0.81}
81%|████████ | 3651/4506 [4:09:27<58:14, 4.09s/it]
{'loss': 0.2035, 'grad_norm': 0.3544260263442993, 'learning_rate': 5.299068704601607e-06, 'epoch': 0.81}
81%|████████ | 3652/4506 [4:09:31<56:49, 3.99s/it]
{'loss': 0.207, 'grad_norm': 0.431440532207489, 'learning_rate': 5.287150752668601e-06, 'epoch': 0.81}
81%|████████ | 3653/4506 [4:09:35<56:56, 4.01s/it]
{'loss': 0.2041, 'grad_norm': 0.3816622495651245, 'learning_rate': 5.275244632987506e-06, 'epoch': 0.81}
81%|████████ | 3654/4506 [4:09:39<58:00, 4.09s/it]
{'loss': 0.1946, 'grad_norm': 0.38153550028800964, 'learning_rate': 5.263350352704735e-06, 'epoch': 0.81}
81%|████████ | 3655/4506 [4:09:43<56:59, 4.02s/it]
{'loss': 0.2057, 'grad_norm': 0.40503913164138794, 'learning_rate': 5.251467918959605e-06, 'epoch': 0.81}
81%|████████ | 3656/4506 [4:09:47<59:43, 4.22s/it]
{'loss': 0.198, 'grad_norm': 0.37235864996910095, 'learning_rate': 5.2395973388843e-06, 'epoch': 0.81}
81%|████████ | 3657/4506 [4:09:52<1:00:52, 4.30s/it]
{'loss': 0.2049, 'grad_norm': 0.5598635673522949, 'learning_rate': 5.22773861960392e-06, 'epoch': 0.81}
81%|████████ | 3658/4506 [4:09:56<58:42, 4.15s/it]
{'loss': 0.1983, 'grad_norm': 0.4199615716934204, 'learning_rate': 5.215891768236411e-06, 'epoch': 0.81}
81%|████████ | 3659/4506 [4:10:00<59:53, 4.24s/it]
{'loss': 0.209, 'grad_norm': 0.35466986894607544, 'learning_rate': 5.204056791892622e-06, 'epoch': 0.81}
81%|████████ | 3660/4506 [4:10:05<1:00:04, 4.26s/it]
{'loss': 0.215, 'grad_norm': 0.4250589907169342, 'learning_rate': 5.192233697676266e-06, 'epoch': 0.81}
81%|████████ | 3661/4506 [4:10:09<1:01:22, 4.36s/it]
{'loss': 0.2058, 'grad_norm': 0.42654091119766235, 'learning_rate': 5.180422492683931e-06, 'epoch': 0.81}
81%|████████▏ | 3662/4506 [4:10:13<58:59, 4.19s/it]
{'loss': 0.1924, 'grad_norm': 0.3803257942199707, 'learning_rate': 5.16862318400505e-06, 'epoch': 0.81}
81%|████████▏ | 3663/4506 [4:10:17<58:34, 4.17s/it]
{'loss': 0.2143, 'grad_norm': 0.45863911509513855, 'learning_rate': 5.156835778721936e-06, 'epoch': 0.81}
81%|████████▏ | 3664/4506 [4:10:21<58:04, 4.14s/it]
{'loss': 0.2051, 'grad_norm': 0.5134897232055664, 'learning_rate': 5.145060283909739e-06, 'epoch': 0.81}
81%|████████▏ | 3665/4506 [4:10:25<57:57, 4.14s/it]
{'loss': 0.2067, 'grad_norm': 0.3604978621006012, 'learning_rate': 5.133296706636481e-06, 'epoch': 0.81}
81%|████████▏ | 3666/4506 [4:10:30<58:45, 4.20s/it]
{'loss': 0.2044, 'grad_norm': 0.3640682101249695, 'learning_rate': 5.121545053963006e-06, 'epoch': 0.81}
81%|████████▏ | 3667/4506 [4:10:34<59:26, 4.25s/it]
{'loss': 0.2025, 'grad_norm': 0.36440393328666687, 'learning_rate': 5.109805332943019e-06, 'epoch': 0.81}
81%|████████▏ | 3668/4506 [4:10:38<59:59, 4.30s/it]
{'loss': 0.2112, 'grad_norm': 0.3749324679374695, 'learning_rate': 5.098077550623065e-06, 'epoch': 0.81}
81%|████████▏ | 3669/4506 [4:10:42<57:30, 4.12s/it]
{'loss': 0.1977, 'grad_norm': 0.41334447264671326, 'learning_rate': 5.086361714042503e-06, 'epoch': 0.81}
81%|████████▏ | 3670/4506 [4:10:46<56:44, 4.07s/it]
{'loss': 0.2005, 'grad_norm': 0.38194945454597473, 'learning_rate': 5.074657830233548e-06, 'epoch': 0.81}
81%|████████▏ | 3671/4506 [4:10:50<57:24, 4.13s/it]
{'loss': 0.1973, 'grad_norm': 0.36459559202194214, 'learning_rate': 5.062965906221212e-06, 'epoch': 0.81}
81%|████████▏ | 3672/4506 [4:10:54<56:39, 4.08s/it]
{'loss': 0.2043, 'grad_norm': 0.4094129502773285, 'learning_rate': 5.051285949023354e-06, 'epoch': 0.82}
82%|████████▏ | 3673/4506 [4:10:59<57:42, 4.16s/it]
{'loss': 0.1936, 'grad_norm': 0.3318234384059906, 'learning_rate': 5.039617965650636e-06, 'epoch': 0.82}
82%|████████▏ | 3674/4506 [4:11:03<56:56, 4.11s/it]
{'loss': 0.1999, 'grad_norm': 0.37800484895706177, 'learning_rate': 5.027961963106545e-06, 'epoch': 0.82}
82%|████████▏ | 3675/4506 [4:11:07<56:26, 4.08s/it]
{'loss': 0.2061, 'grad_norm': 0.3698611557483673, 'learning_rate': 5.016317948387356e-06, 'epoch': 0.82}
82%|████████▏ | 3676/4506 [4:11:10<55:11, 3.99s/it]
{'loss': 0.2094, 'grad_norm': 0.4952230155467987, 'learning_rate': 5.00468592848217e-06, 'epoch': 0.82}
82%|████████▏ | 3677/4506 [4:11:14<55:14, 4.00s/it]
{'loss': 0.1955, 'grad_norm': 0.3610292673110962, 'learning_rate': 4.993065910372871e-06, 'epoch': 0.82}
82%|████████▏ | 3678/4506 [4:11:18<55:02, 3.99s/it]
{'loss': 0.1967, 'grad_norm': 0.38175609707832336, 'learning_rate': 4.981457901034153e-06, 'epoch': 0.82}
82%|████████▏ | 3679/4506 [4:11:22<55:17, 4.01s/it]
{'loss': 0.2079, 'grad_norm': 0.4061267077922821, 'learning_rate': 4.969861907433493e-06, 'epoch': 0.82}
82%|████████▏ | 3680/4506 [4:11:26<54:27, 3.96s/it]
{'loss': 0.2043, 'grad_norm': 0.5348223447799683, 'learning_rate': 4.958277936531169e-06, 'epoch': 0.82}
82%|████████▏ | 3681/4506 [4:11:31<55:44, 4.05s/it]
{'loss': 0.2101, 'grad_norm': 0.44455355405807495, 'learning_rate': 4.946705995280221e-06, 'epoch': 0.82}
82%|████████▏ | 3682/4506 [4:11:34<54:09, 3.94s/it]
{'loss': 0.2035, 'grad_norm': 0.4680754244327545, 'learning_rate': 4.935146090626488e-06, 'epoch': 0.82}
82%|████████▏ | 3683/4506 [4:11:38<53:38, 3.91s/it]
{'loss': 0.2153, 'grad_norm': 0.39728617668151855, 'learning_rate': 4.923598229508575e-06, 'epoch': 0.82}
82%|████████▏ | 3684/4506 [4:11:42<54:02, 3.94s/it]
{'loss': 0.2073, 'grad_norm': 0.40467408299446106, 'learning_rate': 4.912062418857849e-06, 'epoch': 0.82}
82%|████████▏ | 3685/4506 [4:11:47<56:10, 4.10s/it]
{'loss': 0.2094, 'grad_norm': 0.4491935074329376, 'learning_rate': 4.900538665598467e-06, 'epoch': 0.82}
82%|████████▏ | 3686/4506 [4:11:50<55:04, 4.03s/it]
{'loss': 0.1972, 'grad_norm': 0.4623575210571289, 'learning_rate': 4.889026976647329e-06, 'epoch': 0.82}
82%|████████▏ | 3687/4506 [4:11:55<55:20, 4.05s/it]
{'loss': 0.206, 'grad_norm': 0.3959204852581024, 'learning_rate': 4.8775273589141135e-06, 'epoch': 0.82}
82%|████████▏ | 3688/4506 [4:11:59<55:34, 4.08s/it]
{'loss': 0.1942, 'grad_norm': 0.3745720088481903, 'learning_rate': 4.866039819301224e-06, 'epoch': 0.82}
82%|████████▏ | 3689/4506 [4:12:03<57:06, 4.19s/it]
{'loss': 0.2226, 'grad_norm': 0.3758023679256439, 'learning_rate': 4.854564364703848e-06, 'epoch': 0.82}
82%|████████▏ | 3690/4506 [4:12:07<56:56, 4.19s/it]
{'loss': 0.2062, 'grad_norm': 0.38719701766967773, 'learning_rate': 4.8431010020098835e-06, 'epoch': 0.82}
82%|████████▏ | 3691/4506 [4:12:11<55:57, 4.12s/it]
{'loss': 0.1916, 'grad_norm': 0.3526093065738678, 'learning_rate': 4.831649738100003e-06, 'epoch': 0.82}
82%|████████▏ | 3692/4506 [4:12:16<58:44, 4.33s/it]
{'loss': 0.2108, 'grad_norm': 0.4006465971469879, 'learning_rate': 4.820210579847601e-06, 'epoch': 0.82}
82%|████████▏ | 3693/4506 [4:12:20<57:31, 4.24s/it]
{'loss': 0.2076, 'grad_norm': 0.40269935131073, 'learning_rate': 4.808783534118813e-06, 'epoch': 0.82}
82%|████████▏ | 3694/4506 [4:12:24<55:54, 4.13s/it]
{'loss': 0.2042, 'grad_norm': 0.38969191908836365, 'learning_rate': 4.7973686077724874e-06, 'epoch': 0.82}
82%|████████▏ | 3695/4506 [4:12:28<54:32, 4.04s/it]
{'loss': 0.1991, 'grad_norm': 0.3925114572048187, 'learning_rate': 4.785965807660223e-06, 'epoch': 0.82}
82%|████████▏ | 3696/4506 [4:12:32<55:56, 4.14s/it]
{'loss': 0.21, 'grad_norm': 0.35177963972091675, 'learning_rate': 4.7745751406263165e-06, 'epoch': 0.82}
82%|████████▏ | 3697/4506 [4:12:36<56:05, 4.16s/it]
{'loss': 0.2038, 'grad_norm': 0.39017847180366516, 'learning_rate': 4.763196613507798e-06, 'epoch': 0.82}
82%|████████▏ | 3698/4506 [4:12:40<55:29, 4.12s/it]
{'loss': 0.1994, 'grad_norm': 0.383574515581131, 'learning_rate': 4.751830233134405e-06, 'epoch': 0.82}
82%|████████▏ | 3699/4506 [4:12:44<55:10, 4.10s/it]
{'loss': 0.1986, 'grad_norm': 0.39307138323783875, 'learning_rate': 4.740476006328592e-06, 'epoch': 0.82}
82%|████████▏ | 3700/4506 [4:12:49<55:28, 4.13s/it]
{'loss': 0.2013, 'grad_norm': 0.3770471513271332, 'learning_rate': 4.7291339399055025e-06, 'epoch': 0.82}
82%|████████▏ | 3701/4506 [4:12:53<55:05, 4.11s/it]
{'loss': 0.2041, 'grad_norm': 0.47397395968437195, 'learning_rate': 4.717804040672991e-06, 'epoch': 0.82}
82%|████████▏ | 3702/4506 [4:12:57<54:21, 4.06s/it]
{'loss': 0.1984, 'grad_norm': 0.41140878200531006, 'learning_rate': 4.70648631543161e-06, 'epoch': 0.82}
82%|████████▏ | 3703/4506 [4:13:01<54:17, 4.06s/it]
{'loss': 0.2047, 'grad_norm': 0.37271884083747864, 'learning_rate': 4.695180770974597e-06, 'epoch': 0.82}
82%|████████▏ | 3704/4506 [4:13:05<54:30, 4.08s/it]
{'loss': 0.1977, 'grad_norm': 0.3811433017253876, 'learning_rate': 4.6838874140878894e-06, 'epoch': 0.82}
82%|████████▏ | 3705/4506 [4:13:09<53:54, 4.04s/it]
{'loss': 0.201, 'grad_norm': 0.3668990135192871, 'learning_rate': 4.672606251550102e-06, 'epoch': 0.82}
82%|████████▏ | 3706/4506 [4:13:13<55:48, 4.19s/it]
{'loss': 0.2173, 'grad_norm': 0.4222773611545563, 'learning_rate': 4.661337290132539e-06, 'epoch': 0.82}
82%|████████▏ | 3707/4506 [4:13:18<56:14, 4.22s/it]
{'loss': 0.2019, 'grad_norm': 0.43602702021598816, 'learning_rate': 4.650080536599161e-06, 'epoch': 0.82}
82%|████████▏ | 3708/4506 [4:13:22<54:46, 4.12s/it]
{'loss': 0.1987, 'grad_norm': 0.41318583488464355, 'learning_rate': 4.638835997706625e-06, 'epoch': 0.82}
82%|████████▏ | 3709/4506 [4:13:25<54:00, 4.07s/it]
{'loss': 0.2107, 'grad_norm': 0.37680450081825256, 'learning_rate': 4.627603680204234e-06, 'epoch': 0.82}
82%|████████▏ | 3710/4506 [4:13:29<53:31, 4.03s/it]
{'loss': 0.1916, 'grad_norm': 0.38246938586235046, 'learning_rate': 4.6163835908339755e-06, 'epoch': 0.82}
82%|████████▏ | 3711/4506 [4:13:34<54:40, 4.13s/it]
{'loss': 0.2063, 'grad_norm': 0.42162051796913147, 'learning_rate': 4.605175736330486e-06, 'epoch': 0.82}
82%|████████▏ | 3712/4506 [4:13:38<54:43, 4.13s/it]
{'loss': 0.2048, 'grad_norm': 0.4375093877315521, 'learning_rate': 4.5939801234210656e-06, 'epoch': 0.82}
82%|████████▏ | 3713/4506 [4:13:42<55:08, 4.17s/it]
{'loss': 0.2084, 'grad_norm': 0.3868264853954315, 'learning_rate': 4.582796758825653e-06, 'epoch': 0.82}
82%|████████▏ | 3714/4506 [4:13:46<55:33, 4.21s/it]
{'loss': 0.2104, 'grad_norm': 0.42120563983917236, 'learning_rate': 4.571625649256852e-06, 'epoch': 0.82}
82%|████████▏ | 3715/4506 [4:13:51<54:55, 4.17s/it]
{'loss': 0.2094, 'grad_norm': 0.402847558259964, 'learning_rate': 4.560466801419894e-06, 'epoch': 0.82}
82%|████████▏ | 3716/4506 [4:13:55<55:04, 4.18s/it]
{'loss': 0.2068, 'grad_norm': 0.432616651058197, 'learning_rate': 4.549320222012665e-06, 'epoch': 0.82}
82%|████████▏ | 3717/4506 [4:13:59<55:03, 4.19s/it]
{'loss': 0.1979, 'grad_norm': 0.4007754325866699, 'learning_rate': 4.538185917725685e-06, 'epoch': 0.83}
83%|████████▎ | 3718/4506 [4:14:03<54:11, 4.13s/it]
{'loss': 0.202, 'grad_norm': 0.393523246049881, 'learning_rate': 4.52706389524209e-06, 'epoch': 0.83}
83%|████████▎ | 3719/4506 [4:14:07<53:32, 4.08s/it]
{'loss': 0.2127, 'grad_norm': 0.37434375286102295, 'learning_rate': 4.515954161237668e-06, 'epoch': 0.83}
83%|████████▎ | 3720/4506 [4:14:11<53:52, 4.11s/it]
{'loss': 0.2114, 'grad_norm': 0.40534713864326477, 'learning_rate': 4.50485672238081e-06, 'epoch': 0.83}
83%|████████▎ | 3721/4506 [4:14:15<53:54, 4.12s/it]
{'loss': 0.1964, 'grad_norm': 0.38506636023521423, 'learning_rate': 4.493771585332548e-06, 'epoch': 0.83}
83%|████████▎ | 3722/4506 [4:14:19<53:31, 4.10s/it]
{'loss': 0.1985, 'grad_norm': 0.38570085167884827, 'learning_rate': 4.482698756746506e-06, 'epoch': 0.83}
83%|████████▎ | 3723/4506 [4:14:23<53:00, 4.06s/it]
{'loss': 0.195, 'grad_norm': 0.3863520920276642, 'learning_rate': 4.471638243268936e-06, 'epoch': 0.83}
83%|████████▎ | 3724/4506 [4:14:27<51:34, 3.96s/it]
{'loss': 0.1864, 'grad_norm': 0.3737875819206238, 'learning_rate': 4.460590051538699e-06, 'epoch': 0.83}
83%|████████▎ | 3725/4506 [4:14:31<53:25, 4.10s/it]
{'loss': 0.2051, 'grad_norm': 0.38112109899520874, 'learning_rate': 4.449554188187258e-06, 'epoch': 0.83}
83%|████████▎ | 3726/4506 [4:14:35<52:51, 4.07s/it]
{'loss': 0.1923, 'grad_norm': 0.3570330739021301, 'learning_rate': 4.438530659838666e-06, 'epoch': 0.83}
83%|████████▎ | 3727/4506 [4:14:40<54:07, 4.17s/it]
{'loss': 0.1935, 'grad_norm': 0.36594370007514954, 'learning_rate': 4.427519473109587e-06, 'epoch': 0.83}
83%|████████▎ | 3728/4506 [4:14:44<55:14, 4.26s/it]
{'loss': 0.1998, 'grad_norm': 0.4202220141887665, 'learning_rate': 4.416520634609264e-06, 'epoch': 0.83}
83%|████████▎ | 3729/4506 [4:14:48<54:35, 4.21s/it]
{'loss': 0.2009, 'grad_norm': 0.3200521469116211, 'learning_rate': 4.405534150939536e-06, 'epoch': 0.83}
83%|████████▎ | 3730/4506 [4:14:52<52:59, 4.10s/it]
{'loss': 0.1994, 'grad_norm': 0.3823970556259155, 'learning_rate': 4.3945600286948276e-06, 'epoch': 0.83}
83%|████████▎ | 3731/4506 [4:14:56<53:00, 4.10s/it]
{'loss': 0.194, 'grad_norm': 0.36014172434806824, 'learning_rate': 4.383598274462145e-06, 'epoch': 0.83}
83%|████████▎ | 3732/4506 [4:15:00<52:40, 4.08s/it]
{'loss': 0.2004, 'grad_norm': 0.35475072264671326, 'learning_rate': 4.372648894821058e-06, 'epoch': 0.83}
83%|████████▎ | 3733/4506 [4:15:04<51:26, 3.99s/it]
{'loss': 0.1941, 'grad_norm': 0.3941681683063507, 'learning_rate': 4.3617118963437235e-06, 'epoch': 0.83}
83%|████████▎ | 3734/4506 [4:15:08<50:33, 3.93s/it]
{'loss': 0.2007, 'grad_norm': 0.4779421389102936, 'learning_rate': 4.350787285594854e-06, 'epoch': 0.83}
83%|████████▎ | 3735/4506 [4:15:12<49:43, 3.87s/it]
{'loss': 0.2038, 'grad_norm': 0.42954206466674805, 'learning_rate': 4.3398750691317355e-06, 'epoch': 0.83}
83%|████████▎ | 3736/4506 [4:15:16<51:38, 4.02s/it]
{'loss': 0.2086, 'grad_norm': 0.3808591067790985, 'learning_rate': 4.328975253504222e-06, 'epoch': 0.83}
83%|████████▎ | 3737/4506 [4:15:20<51:00, 3.98s/it]
{'loss': 0.201, 'grad_norm': 0.39955756068229675, 'learning_rate': 4.318087845254698e-06, 'epoch': 0.83}
83%|████████▎ | 3738/4506 [4:15:24<50:38, 3.96s/it]
{'loss': 0.2083, 'grad_norm': 0.43919745087623596, 'learning_rate': 4.307212850918135e-06, 'epoch': 0.83}
83%|████████▎ | 3739/4506 [4:15:28<49:57, 3.91s/it]
{'loss': 0.2084, 'grad_norm': 0.3629416525363922, 'learning_rate': 4.296350277022018e-06, 'epoch': 0.83}
83%|████████▎ | 3740/4506 [4:15:32<51:44, 4.05s/it]
{'loss': 0.1965, 'grad_norm': 0.40593940019607544, 'learning_rate': 4.285500130086412e-06, 'epoch': 0.83}
83%|████████▎ | 3741/4506 [4:15:36<52:47, 4.14s/it]
{'loss': 0.2002, 'grad_norm': 0.36334341764450073, 'learning_rate': 4.274662416623887e-06, 'epoch': 0.83}
83%|████████▎ | 3742/4506 [4:15:41<53:56, 4.24s/it]
{'loss': 0.2065, 'grad_norm': 0.4138539731502533, 'learning_rate': 4.263837143139579e-06, 'epoch': 0.83}
83%|████████▎ | 3743/4506 [4:15:45<52:12, 4.11s/it]
{'loss': 0.1997, 'grad_norm': 0.414740651845932, 'learning_rate': 4.253024316131143e-06, 'epoch': 0.83}
83%|████████▎ | 3744/4506 [4:15:49<51:46, 4.08s/it]
{'loss': 0.2025, 'grad_norm': 0.4083847403526306, 'learning_rate': 4.242223942088777e-06, 'epoch': 0.83}
83%|████████▎ | 3745/4506 [4:15:52<50:42, 4.00s/it]
{'loss': 0.2049, 'grad_norm': 0.4124525487422943, 'learning_rate': 4.231436027495178e-06, 'epoch': 0.83}
83%|████████▎ | 3746/4506 [4:15:57<52:40, 4.16s/it]
{'loss': 0.2055, 'grad_norm': 0.4158831536769867, 'learning_rate': 4.2206605788255946e-06, 'epoch': 0.83}
83%|████████▎ | 3747/4506 [4:16:01<52:36, 4.16s/it]
{'loss': 0.2021, 'grad_norm': 0.3930809497833252, 'learning_rate': 4.209897602547769e-06, 'epoch': 0.83}
83%|████████▎ | 3748/4506 [4:16:05<52:23, 4.15s/it]
{'loss': 0.2074, 'grad_norm': 0.47036081552505493, 'learning_rate': 4.199147105121967e-06, 'epoch': 0.83}
83%|████████▎ | 3749/4506 [4:16:09<52:10, 4.14s/it]
{'loss': 0.2022, 'grad_norm': 0.4299858808517456, 'learning_rate': 4.188409093000973e-06, 'epoch': 0.83}
83%|████████▎ | 3750/4506 [4:16:14<52:18, 4.15s/it]
{'loss': 0.2113, 'grad_norm': 0.407943457365036, 'learning_rate': 4.1776835726300646e-06, 'epoch': 0.83}
83%|████████▎ | 3751/4506 [4:16:18<52:26, 4.17s/it]
{'loss': 0.2028, 'grad_norm': 0.3209913372993469, 'learning_rate': 4.166970550447028e-06, 'epoch': 0.83}
83%|████████▎ | 3752/4506 [4:16:22<51:34, 4.10s/it]
{'loss': 0.1909, 'grad_norm': 0.4040665030479431, 'learning_rate': 4.156270032882134e-06, 'epoch': 0.83}
83%|████████▎ | 3753/4506 [4:16:26<51:48, 4.13s/it]
{'loss': 0.1872, 'grad_norm': 0.41325992345809937, 'learning_rate': 4.145582026358167e-06, 'epoch': 0.83}
83%|████████▎ | 3754/4506 [4:16:30<50:37, 4.04s/it]
{'loss': 0.201, 'grad_norm': 0.39546695351600647, 'learning_rate': 4.134906537290392e-06, 'epoch': 0.83}
83%|████████▎ | 3755/4506 [4:16:33<49:26, 3.95s/it]
{'loss': 0.1921, 'grad_norm': 0.3794635236263275, 'learning_rate': 4.12424357208657e-06, 'epoch': 0.83}
83%|████████▎ | 3756/4506 [4:16:38<50:16, 4.02s/it]
{'loss': 0.1954, 'grad_norm': 0.3320651352405548, 'learning_rate': 4.113593137146926e-06, 'epoch': 0.83}
83%|████████▎ | 3757/4506 [4:16:42<50:18, 4.03s/it]
{'loss': 0.1961, 'grad_norm': 0.3911479413509369, 'learning_rate': 4.102955238864184e-06, 'epoch': 0.83}
83%|████████▎ | 3758/4506 [4:16:46<50:14, 4.03s/it]
{'loss': 0.1911, 'grad_norm': 0.34134313464164734, 'learning_rate': 4.0923298836235245e-06, 'epoch': 0.83}
83%|████████▎ | 3759/4506 [4:16:50<52:24, 4.21s/it]
{'loss': 0.1892, 'grad_norm': 0.34274905920028687, 'learning_rate': 4.081717077802619e-06, 'epoch': 0.83}
83%|████████▎ | 3760/4506 [4:16:54<50:48, 4.09s/it]
{'loss': 0.2139, 'grad_norm': 0.4107447862625122, 'learning_rate': 4.071116827771587e-06, 'epoch': 0.83}
83%|████████▎ | 3761/4506 [4:16:58<50:04, 4.03s/it]
{'loss': 0.1988, 'grad_norm': 0.46029534935951233, 'learning_rate': 4.060529139893027e-06, 'epoch': 0.83}
83%|████████▎ | 3762/4506 [4:17:02<50:19, 4.06s/it]
{'loss': 0.1941, 'grad_norm': 0.3969859480857849, 'learning_rate': 4.04995402052199e-06, 'epoch': 0.84}
84%|████████▎ | 3763/4506 [4:17:07<52:04, 4.21s/it]
{'loss': 0.1999, 'grad_norm': 0.36285102367401123, 'learning_rate': 4.039391476005991e-06, 'epoch': 0.84}
84%|████████▎ | 3764/4506 [4:17:11<51:08, 4.14s/it]
{'loss': 0.1939, 'grad_norm': 0.3656004071235657, 'learning_rate': 4.028841512684978e-06, 'epoch': 0.84}
84%|████████▎ | 3765/4506 [4:17:15<51:46, 4.19s/it]
{'loss': 0.2009, 'grad_norm': 0.3896752893924713, 'learning_rate': 4.018304136891371e-06, 'epoch': 0.84}
84%|████████▎ | 3766/4506 [4:17:19<50:37, 4.11s/it]
{'loss': 0.1941, 'grad_norm': 0.38017603754997253, 'learning_rate': 4.007779354950015e-06, 'epoch': 0.84}
84%|████████▎ | 3767/4506 [4:17:23<50:32, 4.10s/it]
{'loss': 0.1907, 'grad_norm': 0.40470558404922485, 'learning_rate': 3.9972671731782065e-06, 'epoch': 0.84}
84%|████████▎ | 3768/4506 [4:17:27<49:17, 4.01s/it]
{'loss': 0.1996, 'grad_norm': 0.47038939595222473, 'learning_rate': 3.986767597885688e-06, 'epoch': 0.84}
84%|████████▎ | 3769/4506 [4:17:31<49:42, 4.05s/it]
{'loss': 0.2022, 'grad_norm': 0.40358641743659973, 'learning_rate': 3.976280635374604e-06, 'epoch': 0.84}
84%|████████▎ | 3770/4506 [4:17:35<49:55, 4.07s/it]
{'loss': 0.2015, 'grad_norm': 0.414459764957428, 'learning_rate': 3.965806291939569e-06, 'epoch': 0.84}
84%|████████▎ | 3771/4506 [4:17:39<50:18, 4.11s/it]
{'loss': 0.2046, 'grad_norm': 0.4046570360660553, 'learning_rate': 3.955344573867587e-06, 'epoch': 0.84}
84%|████████▎ | 3772/4506 [4:17:44<50:45, 4.15s/it]
{'loss': 0.2036, 'grad_norm': 0.4170626401901245, 'learning_rate': 3.944895487438102e-06, 'epoch': 0.84}
84%|████████▎ | 3773/4506 [4:17:47<48:32, 3.97s/it]
{'loss': 0.2157, 'grad_norm': 0.42144089937210083, 'learning_rate': 3.9344590389229805e-06, 'epoch': 0.84}
84%|████████▍ | 3774/4506 [4:17:51<48:16, 3.96s/it]
{'loss': 0.204, 'grad_norm': 0.41799694299697876, 'learning_rate': 3.924035234586498e-06, 'epoch': 0.84}
84%|████████▍ | 3775/4506 [4:17:55<48:40, 4.00s/it]
{'loss': 0.1963, 'grad_norm': 0.3765568435192108, 'learning_rate': 3.913624080685327e-06, 'epoch': 0.84}
84%|████████▍ | 3776/4506 [4:17:59<47:44, 3.92s/it]
{'loss': 0.1971, 'grad_norm': 0.5146211981773376, 'learning_rate': 3.90322558346857e-06, 'epoch': 0.84}
84%|████████▍ | 3777/4506 [4:18:02<46:32, 3.83s/it]
{'loss': 0.1939, 'grad_norm': 0.38932350277900696, 'learning_rate': 3.8928397491777155e-06, 'epoch': 0.84}
84%|████████▍ | 3778/4506 [4:18:06<45:47, 3.77s/it]
{'loss': 0.1993, 'grad_norm': 0.3803618848323822, 'learning_rate': 3.882466584046663e-06, 'epoch': 0.84}
84%|████████▍ | 3779/4506 [4:18:10<45:58, 3.79s/it]
{'loss': 0.2046, 'grad_norm': 0.43547314405441284, 'learning_rate': 3.872106094301689e-06, 'epoch': 0.84}
84%|████████▍ | 3780/4506 [4:18:14<46:56, 3.88s/it]
{'loss': 0.2022, 'grad_norm': 0.3746822476387024, 'learning_rate': 3.861758286161485e-06, 'epoch': 0.84}
84%|████████▍ | 3781/4506 [4:18:18<47:10, 3.90s/it]
{'loss': 0.1952, 'grad_norm': 0.4226939380168915, 'learning_rate': 3.85142316583712e-06, 'epoch': 0.84}
84%|████████▍ | 3782/4506 [4:18:22<47:31, 3.94s/it]
{'loss': 0.1944, 'grad_norm': 0.3984280526638031, 'learning_rate': 3.8411007395320525e-06, 'epoch': 0.84}
84%|████████▍ | 3783/4506 [4:18:26<47:19, 3.93s/it]
{'loss': 0.2082, 'grad_norm': 0.3619217276573181, 'learning_rate': 3.830791013442103e-06, 'epoch': 0.84}
84%|████████▍ | 3784/4506 [4:18:30<47:35, 3.96s/it]
{'loss': 0.191, 'grad_norm': 0.3607044219970703, 'learning_rate': 3.820493993755497e-06, 'epoch': 0.84}
84%|████████▍ | 3785/4506 [4:18:34<47:15, 3.93s/it]
{'loss': 0.1993, 'grad_norm': 0.45264488458633423, 'learning_rate': 3.8102096866528076e-06, 'epoch': 0.84}
84%|████████▍ | 3786/4506 [4:18:38<47:08, 3.93s/it]
{'loss': 0.2045, 'grad_norm': 0.3820101022720337, 'learning_rate': 3.7999380983069933e-06, 'epoch': 0.84}
84%|████████▍ | 3787/4506 [4:18:42<49:13, 4.11s/it]
{'loss': 0.2019, 'grad_norm': 0.35779324173927307, 'learning_rate': 3.789679234883381e-06, 'epoch': 0.84}
84%|████████▍ | 3788/4506 [4:18:46<49:08, 4.11s/it]
{'loss': 0.205, 'grad_norm': 0.3650997579097748, 'learning_rate': 3.7794331025396394e-06, 'epoch': 0.84}
84%|████████▍ | 3789/4506 [4:18:50<48:23, 4.05s/it]
{'loss': 0.196, 'grad_norm': 0.41300055384635925, 'learning_rate': 3.769199707425822e-06, 'epoch': 0.84}
84%|████████▍ | 3790/4506 [4:18:54<47:29, 3.98s/it]
{'loss': 0.1957, 'grad_norm': 0.40586116909980774, 'learning_rate': 3.7589790556843114e-06, 'epoch': 0.84}
84%|████████▍ | 3791/4506 [4:18:58<47:45, 4.01s/it]
{'loss': 0.2095, 'grad_norm': 0.4161042273044586, 'learning_rate': 3.7487711534498595e-06, 'epoch': 0.84}
84%|████████▍ | 3792/4506 [4:19:03<48:55, 4.11s/it]
{'loss': 0.2062, 'grad_norm': 0.3594443202018738, 'learning_rate': 3.738576006849559e-06, 'epoch': 0.84}
84%|████████▍ | 3793/4506 [4:19:07<51:25, 4.33s/it]
{'loss': 0.1948, 'grad_norm': 0.3987312316894531, 'learning_rate': 3.728393622002857e-06, 'epoch': 0.84}
84%|████████▍ | 3794/4506 [4:19:12<51:25, 4.33s/it]
{'loss': 0.216, 'grad_norm': 0.4791152775287628, 'learning_rate': 3.7182240050215146e-06, 'epoch': 0.84}
84%|████████▍ | 3795/4506 [4:19:16<51:27, 4.34s/it]
{'loss': 0.2061, 'grad_norm': 0.33805134892463684, 'learning_rate': 3.708067162009657e-06, 'epoch': 0.84}
84%|████████▍ | 3796/4506 [4:19:20<51:23, 4.34s/it]
{'loss': 0.1964, 'grad_norm': 0.39482933282852173, 'learning_rate': 3.6979230990637213e-06, 'epoch': 0.84}
84%|████████▍ | 3797/4506 [4:19:24<49:55, 4.23s/it]
{'loss': 0.1997, 'grad_norm': 0.3859730362892151, 'learning_rate': 3.6877918222724933e-06, 'epoch': 0.84}
84%|████████▍ | 3798/4506 [4:19:29<50:08, 4.25s/it]
{'loss': 0.1955, 'grad_norm': 0.37244659662246704, 'learning_rate': 3.6776733377170635e-06, 'epoch': 0.84}
84%|████████▍ | 3799/4506 [4:19:33<50:58, 4.33s/it]
{'loss': 0.1963, 'grad_norm': 0.3475169539451599, 'learning_rate': 3.6675676514708575e-06, 'epoch': 0.84}
84%|████████▍ | 3800/4506 [4:19:38<51:26, 4.37s/it]
{'loss': 0.2057, 'grad_norm': 0.4249792993068695, 'learning_rate': 3.657474769599617e-06, 'epoch': 0.84}
84%|████████▍ | 3801/4506 [4:19:42<49:48, 4.24s/it]
{'loss': 0.2008, 'grad_norm': 0.4018096625804901, 'learning_rate': 3.647394698161402e-06, 'epoch': 0.84}
84%|████████▍ | 3802/4506 [4:19:45<48:30, 4.13s/it]
{'loss': 0.2063, 'grad_norm': 0.39480334520339966, 'learning_rate': 3.6373274432065727e-06, 'epoch': 0.84}
84%|████████▍ | 3803/4506 [4:19:50<48:25, 4.13s/it]
{'loss': 0.1998, 'grad_norm': 0.45322051644325256, 'learning_rate': 3.6272730107777957e-06, 'epoch': 0.84}
84%|████████▍ | 3804/4506 [4:19:54<48:03, 4.11s/it]
{'loss': 0.1927, 'grad_norm': 0.3803762197494507, 'learning_rate': 3.6172314069100514e-06, 'epoch': 0.84}
84%|████████▍ | 3805/4506 [4:19:58<47:57, 4.10s/it]
{'loss': 0.2015, 'grad_norm': 0.3795751929283142, 'learning_rate': 3.6072026376306216e-06, 'epoch': 0.84}
84%|████████▍ | 3806/4506 [4:20:02<47:01, 4.03s/it]
{'loss': 0.2024, 'grad_norm': 0.3985573351383209, 'learning_rate': 3.5971867089590773e-06, 'epoch': 0.84}
84%|████████▍ | 3807/4506 [4:20:06<47:36, 4.09s/it]
{'loss': 0.1958, 'grad_norm': 0.40519627928733826, 'learning_rate': 3.5871836269072784e-06, 'epoch': 0.85}
85%|████████▍ | 3808/4506 [4:20:10<47:13, 4.06s/it]
{'loss': 0.199, 'grad_norm': 0.41512709856033325, 'learning_rate': 3.5771933974793862e-06, 'epoch': 0.85}
85%|████████▍ | 3809/4506 [4:20:14<47:55, 4.13s/it]
{'loss': 0.2034, 'grad_norm': 0.4102889597415924, 'learning_rate': 3.5672160266718295e-06, 'epoch': 0.85}
85%|████████▍ | 3810/4506 [4:20:18<47:12, 4.07s/it]
{'loss': 0.2045, 'grad_norm': 0.40011733770370483, 'learning_rate': 3.557251520473337e-06, 'epoch': 0.85}
85%|████████▍ | 3811/4506 [4:20:22<46:49, 4.04s/it]
{'loss': 0.1938, 'grad_norm': 0.35875290632247925, 'learning_rate': 3.5472998848649103e-06, 'epoch': 0.85}
85%|████████▍ | 3812/4506 [4:20:26<46:28, 4.02s/it]
{'loss': 0.2065, 'grad_norm': 0.3568592965602875, 'learning_rate': 3.5373611258198243e-06, 'epoch': 0.85}
85%|████████▍ | 3813/4506 [4:20:30<46:33, 4.03s/it]
{'loss': 0.197, 'grad_norm': 0.40942084789276123, 'learning_rate': 3.527435249303618e-06, 'epoch': 0.85}
85%|████████▍ | 3814/4506 [4:20:34<46:53, 4.07s/it]
{'loss': 0.209, 'grad_norm': 0.3822757303714752, 'learning_rate': 3.5175222612741142e-06, 'epoch': 0.85}
85%|████████▍ | 3815/4506 [4:20:38<47:30, 4.13s/it]
{'loss': 0.2047, 'grad_norm': 0.3825817108154297, 'learning_rate': 3.5076221676813763e-06, 'epoch': 0.85}
85%|████████▍ | 3816/4506 [4:20:42<46:31, 4.05s/it]
{'loss': 0.1981, 'grad_norm': 0.38021156191825867, 'learning_rate': 3.497734974467759e-06, 'epoch': 0.85}
85%|████████▍ | 3817/4506 [4:20:46<46:45, 4.07s/it]
{'loss': 0.2123, 'grad_norm': 0.38847872614860535, 'learning_rate': 3.4878606875678372e-06, 'epoch': 0.85}
85%|████████▍ | 3818/4506 [4:20:51<46:58, 4.10s/it]
{'loss': 0.2029, 'grad_norm': 0.3843871057033539, 'learning_rate': 3.4779993129084725e-06, 'epoch': 0.85}
85%|████████▍ | 3819/4506 [4:20:55<48:01, 4.19s/it]
{'loss': 0.2033, 'grad_norm': 0.3577982485294342, 'learning_rate': 3.4681508564087583e-06, 'epoch': 0.85}
85%|████████▍ | 3820/4506 [4:20:59<46:38, 4.08s/it]
{'loss': 0.205, 'grad_norm': 0.38260653614997864, 'learning_rate': 3.4583153239800355e-06, 'epoch': 0.85}
85%|████████▍ | 3821/4506 [4:21:03<46:03, 4.03s/it]
{'loss': 0.2008, 'grad_norm': 0.4549785554409027, 'learning_rate': 3.448492721525895e-06, 'epoch': 0.85}
85%|████████▍ | 3822/4506 [4:21:07<47:02, 4.13s/it]
{'loss': 0.1905, 'grad_norm': 0.35348761081695557, 'learning_rate': 3.4386830549421547e-06, 'epoch': 0.85}
85%|████████▍ | 3823/4506 [4:21:11<46:41, 4.10s/it]
{'loss': 0.1961, 'grad_norm': 0.3715651035308838, 'learning_rate': 3.4288863301168762e-06, 'epoch': 0.85}
85%|████████▍ | 3824/4506 [4:21:16<47:45, 4.20s/it]
{'loss': 0.1964, 'grad_norm': 0.3533250689506531, 'learning_rate': 3.419102552930356e-06, 'epoch': 0.85}
85%|████████▍ | 3825/4506 [4:21:20<47:13, 4.16s/it]
{'loss': 0.2134, 'grad_norm': 0.428076833486557, 'learning_rate': 3.409331729255119e-06, 'epoch': 0.85}
85%|████████▍ | 3826/4506 [4:21:24<48:17, 4.26s/it]
{'loss': 0.2051, 'grad_norm': 0.345644474029541, 'learning_rate': 3.399573864955899e-06, 'epoch': 0.85}
85%|████████▍ | 3827/4506 [4:21:28<47:35, 4.21s/it]
{'loss': 0.2186, 'grad_norm': 0.43049994111061096, 'learning_rate': 3.3898289658896744e-06, 'epoch': 0.85}
85%|████████▍ | 3828/4506 [4:21:33<48:10, 4.26s/it]
{'loss': 0.1998, 'grad_norm': 0.40994831919670105, 'learning_rate': 3.380097037905616e-06, 'epoch': 0.85}
85%|████████▍ | 3829/4506 [4:21:37<48:08, 4.27s/it]
{'loss': 0.1953, 'grad_norm': 0.3926750421524048, 'learning_rate': 3.370378086845136e-06, 'epoch': 0.85}
85%|████████▍ | 3830/4506 [4:21:41<46:51, 4.16s/it]
{'loss': 0.1931, 'grad_norm': 0.3723422884941101, 'learning_rate': 3.3606721185418382e-06, 'epoch': 0.85}
85%|████████▌ | 3831/4506 [4:21:45<47:46, 4.25s/it]
{'loss': 0.197, 'grad_norm': 0.36233118176460266, 'learning_rate': 3.350979138821547e-06, 'epoch': 0.85}
85%|████████▌ | 3832/4506 [4:21:49<47:33, 4.23s/it]
{'loss': 0.1976, 'grad_norm': 0.4087424874305725, 'learning_rate': 3.341299153502275e-06, 'epoch': 0.85}
85%|████████▌ | 3833/4506 [4:21:54<47:17, 4.22s/it]
{'loss': 0.2087, 'grad_norm': 0.42482733726501465, 'learning_rate': 3.3316321683942526e-06, 'epoch': 0.85}
85%|████████▌ | 3834/4506 [4:21:58<47:20, 4.23s/it]
{'loss': 0.2002, 'grad_norm': 0.3971657156944275, 'learning_rate': 3.321978189299896e-06, 'epoch': 0.85}
85%|████████▌ | 3835/4506 [4:22:02<46:41, 4.17s/it]
{'loss': 0.1991, 'grad_norm': 0.417209655046463, 'learning_rate': 3.312337222013806e-06, 'epoch': 0.85}
85%|████████▌ | 3836/4506 [4:22:06<47:49, 4.28s/it]
{'loss': 0.2072, 'grad_norm': 0.389072984457016, 'learning_rate': 3.302709272322796e-06, 'epoch': 0.85}
85%|████████▌ | 3837/4506 [4:22:10<45:58, 4.12s/it]
{'loss': 0.1971, 'grad_norm': 0.3980267345905304, 'learning_rate': 3.2930943460058482e-06, 'epoch': 0.85}
85%|████████▌ | 3838/4506 [4:22:14<45:18, 4.07s/it]
{'loss': 0.1925, 'grad_norm': 0.39212939143180847, 'learning_rate': 3.2834924488341467e-06, 'epoch': 0.85}
85%|████████▌ | 3839/4506 [4:22:18<44:52, 4.04s/it]
{'loss': 0.1857, 'grad_norm': 0.3779745101928711, 'learning_rate': 3.2739035865710253e-06, 'epoch': 0.85}
85%|████████▌ | 3840/4506 [4:22:22<44:22, 4.00s/it]
{'loss': 0.1966, 'grad_norm': 0.37853601574897766, 'learning_rate': 3.264327764972025e-06, 'epoch': 0.85}
85%|████████▌ | 3841/4506 [4:22:26<45:55, 4.14s/it]
{'loss': 0.2041, 'grad_norm': 0.43422186374664307, 'learning_rate': 3.2547649897848308e-06, 'epoch': 0.85}
85%|████████▌ | 3842/4506 [4:22:30<44:48, 4.05s/it]
{'loss': 0.1948, 'grad_norm': 0.37832188606262207, 'learning_rate': 3.2452152667493243e-06, 'epoch': 0.85}
85%|████████▌ | 3843/4506 [4:22:34<44:32, 4.03s/it]
{'loss': 0.2028, 'grad_norm': 0.42336955666542053, 'learning_rate': 3.2356786015975305e-06, 'epoch': 0.85}
85%|████████▌ | 3844/4506 [4:22:38<44:01, 3.99s/it]
{'loss': 0.1941, 'grad_norm': 0.41590479016304016, 'learning_rate': 3.2261550000536574e-06, 'epoch': 0.85}
85%|████████▌ | 3845/4506 [4:22:42<44:25, 4.03s/it]
{'loss': 0.195, 'grad_norm': 0.3668734133243561, 'learning_rate': 3.2166444678340487e-06, 'epoch': 0.85}
85%|████████▌ | 3846/4506 [4:22:47<45:18, 4.12s/it]
{'loss': 0.2021, 'grad_norm': 0.41019541025161743, 'learning_rate': 3.2071470106472266e-06, 'epoch': 0.85}
85%|████████▌ | 3847/4506 [4:22:51<46:25, 4.23s/it]
{'loss': 0.2081, 'grad_norm': 0.3969305157661438, 'learning_rate': 3.19766263419384e-06, 'epoch': 0.85}
85%|████████▌ | 3848/4506 [4:22:55<44:57, 4.10s/it]
{'loss': 0.1966, 'grad_norm': 0.47270673513412476, 'learning_rate': 3.1881913441667077e-06, 'epoch': 0.85}
85%|████████▌ | 3849/4506 [4:22:59<45:03, 4.12s/it]
{'loss': 0.2071, 'grad_norm': 0.3815246522426605, 'learning_rate': 3.178733146250784e-06, 'epoch': 0.85}
85%|████████▌ | 3850/4506 [4:23:04<46:33, 4.26s/it]
{'loss': 0.2019, 'grad_norm': 0.37185606360435486, 'learning_rate': 3.1692880461231784e-06, 'epoch': 0.85}
85%|████████▌ | 3851/4506 [4:23:08<45:08, 4.13s/it]
{'loss': 0.2278, 'grad_norm': 0.6947259306907654, 'learning_rate': 3.1598560494531136e-06, 'epoch': 0.85}
85%|████████▌ | 3852/4506 [4:23:11<44:24, 4.07s/it]
{'loss': 0.1944, 'grad_norm': 0.4304612874984741, 'learning_rate': 3.1504371619019647e-06, 'epoch': 0.86}
86%|████████▌ | 3853/4506 [4:23:15<43:31, 4.00s/it]
{'loss': 0.2056, 'grad_norm': 0.45168083906173706, 'learning_rate': 3.1410313891232364e-06, 'epoch': 0.86}
86%|████████▌ | 3854/4506 [4:23:19<43:35, 4.01s/it]
{'loss': 0.1965, 'grad_norm': 0.3454737663269043, 'learning_rate': 3.131638736762557e-06, 'epoch': 0.86}
86%|████████▌ | 3855/4506 [4:23:23<43:17, 3.99s/it]
{'loss': 0.2057, 'grad_norm': 0.4032440185546875, 'learning_rate': 3.1222592104576813e-06, 'epoch': 0.86}
86%|████████▌ | 3856/4506 [4:23:27<43:12, 3.99s/it]
{'loss': 0.2058, 'grad_norm': 0.4103756248950958, 'learning_rate': 3.112892815838489e-06, 'epoch': 0.86}
86%|████████▌ | 3857/4506 [4:23:31<42:52, 3.96s/it]
{'loss': 0.1953, 'grad_norm': 0.3880676329135895, 'learning_rate': 3.10353955852698e-06, 'epoch': 0.86}
86%|████████▌ | 3858/4506 [4:23:35<42:27, 3.93s/it]
{'loss': 0.2151, 'grad_norm': 0.40610212087631226, 'learning_rate': 3.094199444137255e-06, 'epoch': 0.86}
86%|████████▌ | 3859/4506 [4:23:39<43:16, 4.01s/it]
{'loss': 0.214, 'grad_norm': 0.4008800685405731, 'learning_rate': 3.084872478275544e-06, 'epoch': 0.86}
86%|████████▌ | 3860/4506 [4:23:43<43:20, 4.03s/it]
{'loss': 0.1936, 'grad_norm': 0.3623058795928955, 'learning_rate': 3.0755586665401627e-06, 'epoch': 0.86}
86%|████████▌ | 3861/4506 [4:23:47<42:50, 3.98s/it]
{'loss': 0.2067, 'grad_norm': 0.41149166226387024, 'learning_rate': 3.066258014521556e-06, 'epoch': 0.86}
86%|████████▌ | 3862/4506 [4:23:51<43:28, 4.05s/it]
{'loss': 0.2172, 'grad_norm': 0.44605448842048645, 'learning_rate': 3.0569705278022525e-06, 'epoch': 0.86}
86%|████████▌ | 3863/4506 [4:23:55<43:36, 4.07s/it]
{'loss': 0.1904, 'grad_norm': 0.3838931918144226, 'learning_rate': 3.047696211956891e-06, 'epoch': 0.86}
86%|████████▌ | 3864/4506 [4:24:00<43:52, 4.10s/it]
{'loss': 0.1881, 'grad_norm': 0.3886209726333618, 'learning_rate': 3.038435072552187e-06, 'epoch': 0.86}
86%|████████▌ | 3865/4506 [4:24:04<44:24, 4.16s/it]
{'loss': 0.2058, 'grad_norm': 0.37291616201400757, 'learning_rate': 3.0291871151469696e-06, 'epoch': 0.86}
86%|████████▌ | 3866/4506 [4:24:08<43:50, 4.11s/it]
{'loss': 0.2013, 'grad_norm': 0.3809332847595215, 'learning_rate': 3.0199523452921346e-06, 'epoch': 0.86}
86%|████████▌ | 3867/4506 [4:24:12<43:37, 4.10s/it]
{'loss': 0.1955, 'grad_norm': 0.3844633400440216, 'learning_rate': 3.0107307685306755e-06, 'epoch': 0.86}
86%|████████▌ | 3868/4506 [4:24:16<42:55, 4.04s/it]
{'loss': 0.1947, 'grad_norm': 0.3919549584388733, 'learning_rate': 3.0015223903976706e-06, 'epoch': 0.86}
86%|████████▌ | 3869/4506 [4:24:20<43:06, 4.06s/it]
{'loss': 0.2034, 'grad_norm': 0.41285160183906555, 'learning_rate': 2.992327216420257e-06, 'epoch': 0.86}
86%|████████▌ | 3870/4506 [4:24:24<42:39, 4.02s/it]
{'loss': 0.1963, 'grad_norm': 0.4005133807659149, 'learning_rate': 2.9831452521176665e-06, 'epoch': 0.86}
86%|████████▌ | 3871/4506 [4:24:28<43:25, 4.10s/it]
{'loss': 0.193, 'grad_norm': 0.4370368421077728, 'learning_rate': 2.9739765030011857e-06, 'epoch': 0.86}
86%|████████▌ | 3872/4506 [4:24:32<42:50, 4.05s/it]
{'loss': 0.2071, 'grad_norm': 0.41795778274536133, 'learning_rate': 2.9648209745741838e-06, 'epoch': 0.86}
86%|████████▌ | 3873/4506 [4:24:37<44:01, 4.17s/it]
{'loss': 0.202, 'grad_norm': 0.35793906450271606, 'learning_rate': 2.9556786723320823e-06, 'epoch': 0.86}
86%|████████▌ | 3874/4506 [4:24:41<43:11, 4.10s/it]
{'loss': 0.2021, 'grad_norm': 0.43959617614746094, 'learning_rate': 2.946549601762369e-06, 'epoch': 0.86}
86%|████████▌ | 3875/4506 [4:24:45<42:52, 4.08s/it]
{'loss': 0.2046, 'grad_norm': 0.35007718205451965, 'learning_rate': 2.937433768344594e-06, 'epoch': 0.86}
86%|████████▌ | 3876/4506 [4:24:49<43:22, 4.13s/it]
{'loss': 0.2075, 'grad_norm': 0.3687748908996582, 'learning_rate': 2.9283311775503612e-06, 'epoch': 0.86}
86%|████████▌ | 3877/4506 [4:24:53<43:33, 4.15s/it]
{'loss': 0.1975, 'grad_norm': 0.3757188618183136, 'learning_rate': 2.9192418348433114e-06, 'epoch': 0.86}
86%|████████▌ | 3878/4506 [4:24:57<43:08, 4.12s/it]
{'loss': 0.2055, 'grad_norm': 0.3951666057109833, 'learning_rate': 2.910165745679158e-06, 'epoch': 0.86}
86%|████████▌ | 3879/4506 [4:25:01<41:56, 4.01s/it]
{'loss': 0.196, 'grad_norm': 0.3644367754459381, 'learning_rate': 2.9011029155056322e-06, 'epoch': 0.86}
86%|████████▌ | 3880/4506 [4:25:05<42:08, 4.04s/it]
{'loss': 0.2034, 'grad_norm': 0.37177303433418274, 'learning_rate': 2.8920533497625248e-06, 'epoch': 0.86}
86%|████████▌ | 3881/4506 [4:25:09<41:12, 3.96s/it]
{'loss': 0.2086, 'grad_norm': 0.4236007630825043, 'learning_rate': 2.883017053881665e-06, 'epoch': 0.86}
86%|████████▌ | 3882/4506 [4:25:13<40:39, 3.91s/it]
{'loss': 0.2023, 'grad_norm': 0.39456483721733093, 'learning_rate': 2.873994033286917e-06, 'epoch': 0.86}
86%|████████▌ | 3883/4506 [4:25:17<41:17, 3.98s/it]
{'loss': 0.1932, 'grad_norm': 0.35379937291145325, 'learning_rate': 2.8649842933941572e-06, 'epoch': 0.86}
86%|████████▌ | 3884/4506 [4:25:20<40:12, 3.88s/it]
{'loss': 0.1995, 'grad_norm': 0.4143606722354889, 'learning_rate': 2.8559878396113184e-06, 'epoch': 0.86}
86%|████████▌ | 3885/4506 [4:25:24<40:39, 3.93s/it]
{'loss': 0.1922, 'grad_norm': 0.42542997002601624, 'learning_rate': 2.8470046773383443e-06, 'epoch': 0.86}
86%|████████▌ | 3886/4506 [4:25:29<41:42, 4.04s/it]
{'loss': 0.1977, 'grad_norm': 0.36953455209732056, 'learning_rate': 2.8380348119671883e-06, 'epoch': 0.86}
86%|████████▋ | 3887/4506 [4:25:32<41:05, 3.98s/it]
{'loss': 0.1877, 'grad_norm': 0.4050857722759247, 'learning_rate': 2.8290782488818558e-06, 'epoch': 0.86}
86%|████████▋ | 3888/4506 [4:25:37<42:02, 4.08s/it]
{'loss': 0.204, 'grad_norm': 0.43167492747306824, 'learning_rate': 2.8201349934583373e-06, 'epoch': 0.86}
86%|████████▋ | 3889/4506 [4:25:41<41:52, 4.07s/it]
{'loss': 0.2005, 'grad_norm': 0.4039248526096344, 'learning_rate': 2.8112050510646524e-06, 'epoch': 0.86}
86%|████████▋ | 3890/4506 [4:25:45<42:10, 4.11s/it]
{'loss': 0.201, 'grad_norm': 0.4206696152687073, 'learning_rate': 2.8022884270608173e-06, 'epoch': 0.86}
86%|████████▋ | 3891/4506 [4:25:49<42:32, 4.15s/it]
{'loss': 0.195, 'grad_norm': 0.37689366936683655, 'learning_rate': 2.7933851267988726e-06, 'epoch': 0.86}
86%|████████▋ | 3892/4506 [4:25:54<42:57, 4.20s/it]
{'loss': 0.1923, 'grad_norm': 0.3832522928714752, 'learning_rate': 2.784495155622835e-06, 'epoch': 0.86}
86%|████████▋ | 3893/4506 [4:25:57<41:54, 4.10s/it]
{'loss': 0.2089, 'grad_norm': 0.4639732539653778, 'learning_rate': 2.7756185188687443e-06, 'epoch': 0.86}
86%|████████▋ | 3894/4506 [4:26:02<42:18, 4.15s/it]
{'loss': 0.1878, 'grad_norm': 0.37587207555770874, 'learning_rate': 2.7667552218646254e-06, 'epoch': 0.86}
86%|████████▋ | 3895/4506 [4:26:06<41:47, 4.10s/it]
{'loss': 0.1979, 'grad_norm': 0.39409253001213074, 'learning_rate': 2.757905269930508e-06, 'epoch': 0.86}
86%|████████▋ | 3896/4506 [4:26:10<43:40, 4.30s/it]
{'loss': 0.1984, 'grad_norm': 0.38683027029037476, 'learning_rate': 2.7490686683783877e-06, 'epoch': 0.86}
86%|████████▋ | 3897/4506 [4:26:14<42:08, 4.15s/it]
{'loss': 0.2217, 'grad_norm': 0.44417333602905273, 'learning_rate': 2.7402454225122748e-06, 'epoch': 0.86}
87%|████████▋ | 3898/4506 [4:26:18<40:44, 4.02s/it]
{'loss': 0.2038, 'grad_norm': 0.37804439663887024, 'learning_rate': 2.731435537628138e-06, 'epoch': 0.87}
87%|████████▋ | 3899/4506 [4:26:22<41:08, 4.07s/it]
{'loss': 0.2028, 'grad_norm': 0.49353644251823425, 'learning_rate': 2.722639019013945e-06, 'epoch': 0.87}
87%|████████▋ | 3900/4506 [4:26:26<41:15, 4.09s/it]
{'loss': 0.1977, 'grad_norm': 0.3652610778808594, 'learning_rate': 2.713855871949636e-06, 'epoch': 0.87}
87%|████████▋ | 3901/4506 [4:26:30<40:46, 4.04s/it]
{'loss': 0.1992, 'grad_norm': 0.44013720750808716, 'learning_rate': 2.7050861017071218e-06, 'epoch': 0.87}
87%|████████▋ | 3902/4506 [4:26:34<39:36, 3.93s/it]
{'loss': 0.2008, 'grad_norm': 0.3935183584690094, 'learning_rate': 2.6963297135502856e-06, 'epoch': 0.87}
87%|████████▋ | 3903/4506 [4:26:38<40:00, 3.98s/it]
{'loss': 0.2132, 'grad_norm': 0.4488699436187744, 'learning_rate': 2.6875867127349683e-06, 'epoch': 0.87}
87%|████████▋ | 3904/4506 [4:26:42<39:35, 3.95s/it]
{'loss': 0.1982, 'grad_norm': 0.40771374106407166, 'learning_rate': 2.678857104509e-06, 'epoch': 0.87}
87%|████████▋ | 3905/4506 [4:26:46<39:13, 3.92s/it]
{'loss': 0.197, 'grad_norm': 0.39425233006477356, 'learning_rate': 2.670140894112141e-06, 'epoch': 0.87}
87%|████████▋ | 3906/4506 [4:26:49<38:24, 3.84s/it]
{'loss': 0.1977, 'grad_norm': 0.4352598786354065, 'learning_rate': 2.661438086776144e-06, 'epoch': 0.87}
87%|████████▋ | 3907/4506 [4:26:53<38:27, 3.85s/it]
{'loss': 0.2048, 'grad_norm': 0.38223108649253845, 'learning_rate': 2.6527486877246847e-06, 'epoch': 0.87}
87%|████████▋ | 3908/4506 [4:26:57<38:54, 3.90s/it]
{'loss': 0.1929, 'grad_norm': 0.3546290993690491, 'learning_rate': 2.644072702173417e-06, 'epoch': 0.87}
87%|████████▋ | 3909/4506 [4:27:01<39:05, 3.93s/it]
{'loss': 0.1896, 'grad_norm': 0.3983183205127716, 'learning_rate': 2.635410135329916e-06, 'epoch': 0.87}
87%|████████▋ | 3910/4506 [4:27:05<39:19, 3.96s/it]
{'loss': 0.206, 'grad_norm': 0.44413360953330994, 'learning_rate': 2.6267609923937314e-06, 'epoch': 0.87}
87%|████████▋ | 3911/4506 [4:27:09<39:49, 4.02s/it]
{'loss': 0.2019, 'grad_norm': 0.4121004641056061, 'learning_rate': 2.6181252785563355e-06, 'epoch': 0.87}
87%|████████▋ | 3912/4506 [4:27:14<40:09, 4.06s/it]
{'loss': 0.1997, 'grad_norm': 0.3981860280036926, 'learning_rate': 2.6095029990011453e-06, 'epoch': 0.87}
87%|████████▋ | 3913/4506 [4:27:18<40:03, 4.05s/it]
{'loss': 0.2043, 'grad_norm': 0.3859134614467621, 'learning_rate': 2.6008941589035162e-06, 'epoch': 0.87}
87%|████████▋ | 3914/4506 [4:27:22<39:22, 3.99s/it]
{'loss': 0.2044, 'grad_norm': 0.36432817578315735, 'learning_rate': 2.592298763430745e-06, 'epoch': 0.87}
87%|████████▋ | 3915/4506 [4:27:25<38:48, 3.94s/it]
{'loss': 0.207, 'grad_norm': 0.3973345458507538, 'learning_rate': 2.583716817742038e-06, 'epoch': 0.87}
87%|████████▋ | 3916/4506 [4:27:30<40:18, 4.10s/it]
{'loss': 0.203, 'grad_norm': 0.39738592505455017, 'learning_rate': 2.5751483269885466e-06, 'epoch': 0.87}
87%|████████▋ | 3917/4506 [4:27:34<39:45, 4.05s/it]
{'loss': 0.1903, 'grad_norm': 0.4196644127368927, 'learning_rate': 2.5665932963133327e-06, 'epoch': 0.87}
87%|████████▋ | 3918/4506 [4:27:38<39:39, 4.05s/it]
{'loss': 0.2021, 'grad_norm': 0.3816438317298889, 'learning_rate': 2.5580517308513936e-06, 'epoch': 0.87}
87%|████████▋ | 3919/4506 [4:27:42<40:27, 4.14s/it]
{'loss': 0.1952, 'grad_norm': 0.3690407872200012, 'learning_rate': 2.5495236357296365e-06, 'epoch': 0.87}
87%|████████▋ | 3920/4506 [4:27:46<40:20, 4.13s/it]
{'loss': 0.2064, 'grad_norm': 0.4065913259983063, 'learning_rate': 2.5410090160668754e-06, 'epoch': 0.87}
87%|████████▋ | 3921/4506 [4:27:51<40:40, 4.17s/it]
{'loss': 0.2164, 'grad_norm': 0.4267270267009735, 'learning_rate': 2.5325078769738555e-06, 'epoch': 0.87}
87%|████████▋ | 3922/4506 [4:27:55<41:15, 4.24s/it]
{'loss': 0.1964, 'grad_norm': 0.3830542266368866, 'learning_rate': 2.5240202235532083e-06, 'epoch': 0.87}
87%|████████▋ | 3923/4506 [4:27:59<41:37, 4.28s/it]
{'loss': 0.1945, 'grad_norm': 0.376214861869812, 'learning_rate': 2.5155460608994902e-06, 'epoch': 0.87}
87%|████████▋ | 3924/4506 [4:28:03<40:52, 4.21s/it]
{'loss': 0.2027, 'grad_norm': 0.3478774428367615, 'learning_rate': 2.50708539409914e-06, 'epoch': 0.87}
87%|████████▋ | 3925/4506 [4:28:07<39:54, 4.12s/it]
{'loss': 0.1908, 'grad_norm': 0.37937819957733154, 'learning_rate': 2.4986382282305237e-06, 'epoch': 0.87}
87%|████████▋ | 3926/4506 [4:28:11<39:35, 4.10s/it]
{'loss': 0.2071, 'grad_norm': 0.38575634360313416, 'learning_rate': 2.4902045683638743e-06, 'epoch': 0.87}
87%|████████▋ | 3927/4506 [4:28:16<39:55, 4.14s/it]
{'loss': 0.2089, 'grad_norm': 0.4028584659099579, 'learning_rate': 2.4817844195613393e-06, 'epoch': 0.87}
87%|████████▋ | 3928/4506 [4:28:19<39:17, 4.08s/it]
{'loss': 0.2105, 'grad_norm': 0.4180997312068939, 'learning_rate': 2.4733777868769404e-06, 'epoch': 0.87}
87%|████████▋ | 3929/4506 [4:28:23<38:27, 4.00s/it]
{'loss': 0.1973, 'grad_norm': 0.3794504702091217, 'learning_rate': 2.464984675356605e-06, 'epoch': 0.87}
87%|████████▋ | 3930/4506 [4:28:28<40:13, 4.19s/it]
{'loss': 0.2068, 'grad_norm': 0.4276103675365448, 'learning_rate': 2.4566050900381194e-06, 'epoch': 0.87}
87%|████████▋ | 3931/4506 [4:28:32<39:42, 4.14s/it]
{'loss': 0.1956, 'grad_norm': 0.42524832487106323, 'learning_rate': 2.4482390359511748e-06, 'epoch': 0.87}
87%|████████▋ | 3932/4506 [4:28:36<38:53, 4.07s/it]
{'loss': 0.2039, 'grad_norm': 0.43391311168670654, 'learning_rate': 2.439886518117332e-06, 'epoch': 0.87}
87%|████████▋ | 3933/4506 [4:28:40<39:19, 4.12s/it]
{'loss': 0.2172, 'grad_norm': 0.41322392225265503, 'learning_rate': 2.4315475415500275e-06, 'epoch': 0.87}
87%|████████▋ | 3934/4506 [4:28:45<40:13, 4.22s/it]
{'loss': 0.2118, 'grad_norm': 0.3965466618537903, 'learning_rate': 2.423222111254561e-06, 'epoch': 0.87}
87%|████████▋ | 3935/4506 [4:28:49<40:13, 4.23s/it]
{'loss': 0.1998, 'grad_norm': 0.3951808214187622, 'learning_rate': 2.414910232228118e-06, 'epoch': 0.87}
87%|████████▋ | 3936/4506 [4:28:53<40:15, 4.24s/it]
{'loss': 0.1989, 'grad_norm': 0.4005270004272461, 'learning_rate': 2.406611909459733e-06, 'epoch': 0.87}
87%|████████▋ | 3937/4506 [4:28:57<39:53, 4.21s/it]
{'loss': 0.206, 'grad_norm': 0.3960319757461548, 'learning_rate': 2.398327147930318e-06, 'epoch': 0.87}
87%|████████▋ | 3938/4506 [4:29:01<40:07, 4.24s/it]
{'loss': 0.1979, 'grad_norm': 0.39988332986831665, 'learning_rate': 2.3900559526126383e-06, 'epoch': 0.87}
87%|████████▋ | 3939/4506 [4:29:06<39:32, 4.18s/it]
{'loss': 0.1992, 'grad_norm': 0.3958902060985565, 'learning_rate': 2.381798328471313e-06, 'epoch': 0.87}
87%|████████▋ | 3940/4506 [4:29:09<38:48, 4.11s/it]
{'loss': 0.2053, 'grad_norm': 0.3816908895969391, 'learning_rate': 2.3735542804628254e-06, 'epoch': 0.87}
87%|████████▋ | 3941/4506 [4:29:14<38:34, 4.10s/it]
{'loss': 0.2013, 'grad_norm': 0.44131040573120117, 'learning_rate': 2.365323813535497e-06, 'epoch': 0.87}
87%|████████▋ | 3942/4506 [4:29:18<38:43, 4.12s/it]
{'loss': 0.198, 'grad_norm': 0.3780421316623688, 'learning_rate': 2.357106932629513e-06, 'epoch': 0.87}
88%|████████▊ | 3943/4506 [4:29:22<38:45, 4.13s/it]
{'loss': 0.1998, 'grad_norm': 0.3585613965988159, 'learning_rate': 2.348903642676878e-06, 'epoch': 0.88}
88%|████████▊ | 3944/4506 [4:29:26<38:05, 4.07s/it]
{'loss': 0.1868, 'grad_norm': 0.4202667474746704, 'learning_rate': 2.3407139486014807e-06, 'epoch': 0.88}
88%|████████▊ | 3945/4506 [4:29:30<37:41, 4.03s/it]
{'loss': 0.2062, 'grad_norm': 0.4310460090637207, 'learning_rate': 2.3325378553190056e-06, 'epoch': 0.88}
88%|████████▊ | 3946/4506 [4:29:34<36:55, 3.96s/it]
{'loss': 0.1918, 'grad_norm': 0.4178878664970398, 'learning_rate': 2.3243753677370057e-06, 'epoch': 0.88}
88%|████████▊ | 3947/4506 [4:29:37<36:53, 3.96s/it]
{'loss': 0.1961, 'grad_norm': 0.39103808999061584, 'learning_rate': 2.316226490754844e-06, 'epoch': 0.88}
88%|████████▊ | 3948/4506 [4:29:41<36:18, 3.90s/it]
{'loss': 0.2018, 'grad_norm': 0.4502004384994507, 'learning_rate': 2.308091229263731e-06, 'epoch': 0.88}
88%|████████▊ | 3949/4506 [4:29:45<36:35, 3.94s/it]
{'loss': 0.1952, 'grad_norm': 0.4046587646007538, 'learning_rate': 2.2999695881466945e-06, 'epoch': 0.88}
88%|████████▊ | 3950/4506 [4:29:49<36:13, 3.91s/it]
{'loss': 0.1902, 'grad_norm': 0.3811034560203552, 'learning_rate': 2.291861572278589e-06, 'epoch': 0.88}
88%|████████▊ | 3951/4506 [4:29:53<37:20, 4.04s/it]
{'loss': 0.202, 'grad_norm': 0.38074228167533875, 'learning_rate': 2.283767186526098e-06, 'epoch': 0.88}
88%|████████▊ | 3952/4506 [4:29:57<37:15, 4.04s/it]
{'loss': 0.1992, 'grad_norm': 0.3860480785369873, 'learning_rate': 2.2756864357477176e-06, 'epoch': 0.88}
88%|████████▊ | 3953/4506 [4:30:01<36:30, 3.96s/it]
{'loss': 0.2021, 'grad_norm': 0.4496752619743347, 'learning_rate': 2.267619324793757e-06, 'epoch': 0.88}
88%|████████▊ | 3954/4506 [4:30:06<37:27, 4.07s/it]
{'loss': 0.2029, 'grad_norm': 0.395587682723999, 'learning_rate': 2.2595658585063406e-06, 'epoch': 0.88}
88%|████████▊ | 3955/4506 [4:30:10<38:09, 4.15s/it]
{'loss': 0.2, 'grad_norm': 0.4270792007446289, 'learning_rate': 2.251526041719404e-06, 'epoch': 0.88}
88%|████████▊ | 3956/4506 [4:30:14<37:01, 4.04s/it]
{'loss': 0.1976, 'grad_norm': 0.4154841899871826, 'learning_rate': 2.243499879258695e-06, 'epoch': 0.88}
88%|████████▊ | 3957/4506 [4:30:18<37:19, 4.08s/it]
{'loss': 0.217, 'grad_norm': 0.4278455674648285, 'learning_rate': 2.2354873759417584e-06, 'epoch': 0.88}
88%|████████▊ | 3958/4506 [4:30:22<37:50, 4.14s/it]
{'loss': 0.1823, 'grad_norm': 0.3884335458278656, 'learning_rate': 2.2274885365779375e-06, 'epoch': 0.88}
88%|████████▊ | 3959/4506 [4:30:27<39:28, 4.33s/it]
{'loss': 0.2078, 'grad_norm': 0.38561949133872986, 'learning_rate': 2.2195033659683894e-06, 'epoch': 0.88}
88%|████████▊ | 3960/4506 [4:30:31<38:35, 4.24s/it]
{'loss': 0.2063, 'grad_norm': 0.43420448899269104, 'learning_rate': 2.2115318689060444e-06, 'epoch': 0.88}
88%|████████▊ | 3961/4506 [4:30:35<38:53, 4.28s/it]
{'loss': 0.209, 'grad_norm': 0.38449013233184814, 'learning_rate': 2.2035740501756467e-06, 'epoch': 0.88}
88%|████████▊ | 3962/4506 [4:30:40<38:39, 4.26s/it]
{'loss': 0.2027, 'grad_norm': 0.3911384344100952, 'learning_rate': 2.1956299145537123e-06, 'epoch': 0.88}
88%|████████▊ | 3963/4506 [4:30:43<37:18, 4.12s/it]
{'loss': 0.1915, 'grad_norm': 0.39664581418037415, 'learning_rate': 2.1876994668085626e-06, 'epoch': 0.88}
88%|████████▊ | 3964/4506 [4:30:47<36:38, 4.06s/it]
{'loss': 0.2063, 'grad_norm': 0.43464747071266174, 'learning_rate': 2.17978271170029e-06, 'epoch': 0.88}
88%|████████▊ | 3965/4506 [4:30:52<37:01, 4.11s/it]
{'loss': 0.2109, 'grad_norm': 0.4122372269630432, 'learning_rate': 2.171879653980771e-06, 'epoch': 0.88}
88%|████████▊ | 3966/4506 [4:30:56<36:49, 4.09s/it]
{'loss': 0.1975, 'grad_norm': 0.3620370924472809, 'learning_rate': 2.1639902983936616e-06, 'epoch': 0.88}
88%|████████▊ | 3967/4506 [4:31:00<36:39, 4.08s/it]
{'loss': 0.1903, 'grad_norm': 0.3707556426525116, 'learning_rate': 2.1561146496743954e-06, 'epoch': 0.88}
88%|████████▊ | 3968/4506 [4:31:04<36:11, 4.04s/it]
{'loss': 0.1966, 'grad_norm': 0.37400028109550476, 'learning_rate': 2.1482527125501693e-06, 'epoch': 0.88}
88%|████████▊ | 3969/4506 [4:31:08<36:22, 4.06s/it]
{'loss': 0.2092, 'grad_norm': 0.47427406907081604, 'learning_rate': 2.140404491739964e-06, 'epoch': 0.88}
88%|████████▊ | 3970/4506 [4:31:12<36:54, 4.13s/it]
{'loss': 0.1926, 'grad_norm': 0.3607529401779175, 'learning_rate': 2.132569991954522e-06, 'epoch': 0.88}
88%|████████▊ | 3971/4506 [4:31:16<36:20, 4.08s/it]
{'loss': 0.2152, 'grad_norm': 0.3910280168056488, 'learning_rate': 2.1247492178963407e-06, 'epoch': 0.88}
88%|████████▊ | 3972/4506 [4:31:20<35:45, 4.02s/it]
{'loss': 0.2017, 'grad_norm': 0.3972361981868744, 'learning_rate': 2.116942174259692e-06, 'epoch': 0.88}
88%|████████▊ | 3973/4506 [4:31:24<35:40, 4.02s/it]
{'loss': 0.1949, 'grad_norm': 0.40474119782447815, 'learning_rate': 2.1091488657306006e-06, 'epoch': 0.88}
88%|████████▊ | 3974/4506 [4:31:28<35:59, 4.06s/it]
{'loss': 0.2011, 'grad_norm': 0.35254520177841187, 'learning_rate': 2.1013692969868437e-06, 'epoch': 0.88}
88%|████████▊ | 3975/4506 [4:31:32<36:33, 4.13s/it]
{'loss': 0.21, 'grad_norm': 0.47802141308784485, 'learning_rate': 2.093603472697958e-06, 'epoch': 0.88}
88%|████████▊ | 3976/4506 [4:31:36<36:34, 4.14s/it]
{'loss': 0.1975, 'grad_norm': 0.3712274730205536, 'learning_rate': 2.0858513975252345e-06, 'epoch': 0.88}
88%|████████▊ | 3977/4506 [4:31:41<36:39, 4.16s/it]
{'loss': 0.2089, 'grad_norm': 0.3540850877761841, 'learning_rate': 2.0781130761216903e-06, 'epoch': 0.88}
88%|████████▊ | 3978/4506 [4:31:45<36:58, 4.20s/it]
{'loss': 0.2015, 'grad_norm': 0.3773611783981323, 'learning_rate': 2.0703885131321154e-06, 'epoch': 0.88}
88%|████████▊ | 3979/4506 [4:31:49<36:48, 4.19s/it]
{'loss': 0.212, 'grad_norm': 0.4519721269607544, 'learning_rate': 2.0626777131930148e-06, 'epoch': 0.88}
88%|████████▊ | 3980/4506 [4:31:54<37:47, 4.31s/it]
{'loss': 0.2062, 'grad_norm': 0.4347835183143616, 'learning_rate': 2.0549806809326583e-06, 'epoch': 0.88}
88%|████████▊ | 3981/4506 [4:31:58<36:46, 4.20s/it]
{'loss': 0.1995, 'grad_norm': 0.36647742986679077, 'learning_rate': 2.0472974209710198e-06, 'epoch': 0.88}
88%|████████▊ | 3982/4506 [4:32:01<35:45, 4.10s/it]
{'loss': 0.1921, 'grad_norm': 0.40723979473114014, 'learning_rate': 2.0396279379198498e-06, 'epoch': 0.88}
88%|████████▊ | 3983/4506 [4:32:06<36:15, 4.16s/it]
{'loss': 0.2019, 'grad_norm': 0.38673919439315796, 'learning_rate': 2.0319722363825873e-06, 'epoch': 0.88}
88%|████████▊ | 3984/4506 [4:32:10<35:36, 4.09s/it]
{'loss': 0.2073, 'grad_norm': 0.47420820593833923, 'learning_rate': 2.024330320954429e-06, 'epoch': 0.88}
88%|████████▊ | 3985/4506 [4:32:14<35:29, 4.09s/it]
{'loss': 0.2011, 'grad_norm': 0.416032999753952, 'learning_rate': 2.0167021962222725e-06, 'epoch': 0.88}
88%|████████▊ | 3986/4506 [4:32:18<34:54, 4.03s/it]
{'loss': 0.2043, 'grad_norm': 0.40173789858818054, 'learning_rate': 2.009087866764764e-06, 'epoch': 0.88}
88%|████████▊ | 3987/4506 [4:32:21<34:03, 3.94s/it]
{'loss': 0.1952, 'grad_norm': 0.4011593461036682, 'learning_rate': 2.001487337152244e-06, 'epoch': 0.88}
89%|████████▊ | 3988/4506 [4:32:26<34:27, 3.99s/it]
{'loss': 0.2071, 'grad_norm': 0.37691590189933777, 'learning_rate': 1.993900611946789e-06, 'epoch': 0.89}
89%|████████▊ | 3989/4506 [4:32:29<33:52, 3.93s/it]
{'loss': 0.1888, 'grad_norm': 0.36398571729660034, 'learning_rate': 1.9863276957021807e-06, 'epoch': 0.89}
89%|████████▊ | 3990/4506 [4:32:33<33:42, 3.92s/it]
{'loss': 0.1942, 'grad_norm': 0.4208531081676483, 'learning_rate': 1.9787685929639116e-06, 'epoch': 0.89}
89%|████████▊ | 3991/4506 [4:32:38<34:33, 4.03s/it]
{'loss': 0.1977, 'grad_norm': 0.39710763096809387, 'learning_rate': 1.9712233082691906e-06, 'epoch': 0.89}
89%|████████▊ | 3992/4506 [4:32:41<34:25, 4.02s/it]
{'loss': 0.1927, 'grad_norm': 0.3668835759162903, 'learning_rate': 1.9636918461469172e-06, 'epoch': 0.89}
89%|████████▊ | 3993/4506 [4:32:45<34:18, 4.01s/it]
{'loss': 0.2021, 'grad_norm': 0.40776121616363525, 'learning_rate': 1.956174211117712e-06, 'epoch': 0.89}
89%|████████▊ | 3994/4506 [4:32:50<35:21, 4.14s/it]
{'loss': 0.1904, 'grad_norm': 0.38980257511138916, 'learning_rate': 1.948670407693884e-06, 'epoch': 0.89}
89%|████████▊ | 3995/4506 [4:32:54<35:01, 4.11s/it]
{'loss': 0.2009, 'grad_norm': 0.44471216201782227, 'learning_rate': 1.9411804403794533e-06, 'epoch': 0.89}
89%|████████▊ | 3996/4506 [4:32:58<35:01, 4.12s/it]
{'loss': 0.2022, 'grad_norm': 0.4675160050392151, 'learning_rate': 1.933704313670115e-06, 'epoch': 0.89}
89%|████████▊ | 3997/4506 [4:33:03<35:37, 4.20s/it]
{'loss': 0.2047, 'grad_norm': 0.4036681652069092, 'learning_rate': 1.926242032053277e-06, 'epoch': 0.89}
89%|████████▊ | 3998/4506 [4:33:07<35:08, 4.15s/it]
{'loss': 0.1996, 'grad_norm': 0.4064146876335144, 'learning_rate': 1.9187936000080174e-06, 'epoch': 0.89}
89%|████████▊ | 3999/4506 [4:33:11<35:12, 4.17s/it]
{'loss': 0.2058, 'grad_norm': 0.3455832898616791, 'learning_rate': 1.9113590220051244e-06, 'epoch': 0.89}
89%|████████▉ | 4000/4506 [4:33:15<35:50, 4.25s/it]
{'loss': 0.2001, 'grad_norm': 0.4193384051322937, 'learning_rate': 1.9039383025070413e-06, 'epoch': 0.89}
89%|████████▉ | 4001/4506 [4:33:19<35:30, 4.22s/it]
{'loss': 0.1873, 'grad_norm': 0.4113488495349884, 'learning_rate': 1.8965314459679278e-06, 'epoch': 0.89}
89%|████████▉ | 4002/4506 [4:33:23<34:56, 4.16s/it]
{'loss': 0.2008, 'grad_norm': 0.3636332154273987, 'learning_rate': 1.8891384568335918e-06, 'epoch': 0.89}
89%|████████▉ | 4003/4506 [4:33:27<34:15, 4.09s/it]
{'loss': 0.1952, 'grad_norm': 0.4145393669605255, 'learning_rate': 1.881759339541539e-06, 'epoch': 0.89}
89%|████████▉ | 4004/4506 [4:33:31<34:06, 4.08s/it]
{'loss': 0.2009, 'grad_norm': 0.36727848649024963, 'learning_rate': 1.8743940985209373e-06, 'epoch': 0.89}
89%|████████▉ | 4005/4506 [4:33:35<34:11, 4.09s/it]
{'loss': 0.1949, 'grad_norm': 0.41886192560195923, 'learning_rate': 1.8670427381926203e-06, 'epoch': 0.89}
89%|████████▉ | 4006/4506 [4:33:40<34:10, 4.10s/it]
{'loss': 0.197, 'grad_norm': 0.37517398595809937, 'learning_rate': 1.8597052629691082e-06, 'epoch': 0.89}
89%|████████▉ | 4007/4506 [4:33:43<33:22, 4.01s/it]
{'loss': 0.2042, 'grad_norm': 0.3925628364086151, 'learning_rate': 1.8523816772545721e-06, 'epoch': 0.89}
89%|████████▉ | 4008/4506 [4:33:48<34:37, 4.17s/it]
{'loss': 0.207, 'grad_norm': 0.3984144330024719, 'learning_rate': 1.8450719854448573e-06, 'epoch': 0.89}
89%|████████▉ | 4009/4506 [4:33:52<34:04, 4.11s/it]
{'loss': 0.2001, 'grad_norm': 0.4001248776912689, 'learning_rate': 1.8377761919274538e-06, 'epoch': 0.89}
89%|████████▉ | 4010/4506 [4:33:56<33:41, 4.08s/it]
{'loss': 0.1976, 'grad_norm': 0.39829620718955994, 'learning_rate': 1.8304943010815285e-06, 'epoch': 0.89}
89%|████████▉ | 4011/4506 [4:34:00<33:17, 4.03s/it]
{'loss': 0.1883, 'grad_norm': 0.36221280694007874, 'learning_rate': 1.823226317277882e-06, 'epoch': 0.89}
89%|████████▉ | 4012/4506 [4:34:04<33:01, 4.01s/it]
{'loss': 0.1999, 'grad_norm': 0.4165520966053009, 'learning_rate': 1.8159722448789885e-06, 'epoch': 0.89}
89%|████████▉ | 4013/4506 [4:34:08<32:43, 3.98s/it]
{'loss': 0.193, 'grad_norm': 0.38675758242607117, 'learning_rate': 1.8087320882389597e-06, 'epoch': 0.89}
89%|████████▉ | 4014/4506 [4:34:12<33:31, 4.09s/it]
{'loss': 0.1931, 'grad_norm': 0.39783820509910583, 'learning_rate': 1.8015058517035638e-06, 'epoch': 0.89}
89%|████████▉ | 4015/4506 [4:34:16<33:27, 4.09s/it]
{'loss': 0.1974, 'grad_norm': 0.4661695063114166, 'learning_rate': 1.794293539610198e-06, 'epoch': 0.89}
89%|████████▉ | 4016/4506 [4:34:20<32:57, 4.04s/it]
{'loss': 0.2127, 'grad_norm': 0.4513182044029236, 'learning_rate': 1.7870951562879213e-06, 'epoch': 0.89}
89%|████████▉ | 4017/4506 [4:34:25<34:07, 4.19s/it]
{'loss': 0.1966, 'grad_norm': 0.3519471287727356, 'learning_rate': 1.7799107060574138e-06, 'epoch': 0.89}
89%|████████▉ | 4018/4506 [4:34:29<33:51, 4.16s/it]
{'loss': 0.2003, 'grad_norm': 0.40667933225631714, 'learning_rate': 1.7727401932310066e-06, 'epoch': 0.89}
89%|████████▉ | 4019/4506 [4:34:33<33:36, 4.14s/it]
{'loss': 0.1991, 'grad_norm': 0.3625085949897766, 'learning_rate': 1.7655836221126515e-06, 'epoch': 0.89}
89%|████████▉ | 4020/4506 [4:34:37<33:37, 4.15s/it]
{'loss': 0.1928, 'grad_norm': 0.41115379333496094, 'learning_rate': 1.7584409969979537e-06, 'epoch': 0.89}
89%|████████▉ | 4021/4506 [4:34:41<32:54, 4.07s/it]
{'loss': 0.1982, 'grad_norm': 0.47489824891090393, 'learning_rate': 1.7513123221741285e-06, 'epoch': 0.89}
89%|████████▉ | 4022/4506 [4:34:45<32:09, 3.99s/it]
{'loss': 0.1977, 'grad_norm': 0.4196415841579437, 'learning_rate': 1.7441976019200167e-06, 'epoch': 0.89}
89%|████████▉ | 4023/4506 [4:34:48<31:45, 3.95s/it]
{'loss': 0.2072, 'grad_norm': 0.4317258894443512, 'learning_rate': 1.7370968405061e-06, 'epoch': 0.89}
89%|████████▉ | 4024/4506 [4:34:53<32:17, 4.02s/it]
{'loss': 0.2038, 'grad_norm': 0.37319740653038025, 'learning_rate': 1.7300100421944603e-06, 'epoch': 0.89}
89%|████████▉ | 4025/4506 [4:34:57<31:53, 3.98s/it]
{'loss': 0.1942, 'grad_norm': 0.3832333981990814, 'learning_rate': 1.7229372112388176e-06, 'epoch': 0.89}
89%|████████▉ | 4026/4506 [4:35:01<32:04, 4.01s/it]
{'loss': 0.196, 'grad_norm': 0.38617756962776184, 'learning_rate': 1.715878351884498e-06, 'epoch': 0.89}
89%|████████▉ | 4027/4506 [4:35:05<33:19, 4.17s/it]
{'loss': 0.2011, 'grad_norm': 0.4258445203304291, 'learning_rate': 1.7088334683684482e-06, 'epoch': 0.89}
89%|████████▉ | 4028/4506 [4:35:09<32:37, 4.10s/it]
{'loss': 0.1953, 'grad_norm': 0.3641691505908966, 'learning_rate': 1.7018025649192139e-06, 'epoch': 0.89}
89%|████████▉ | 4029/4506 [4:35:13<32:26, 4.08s/it]
{'loss': 0.206, 'grad_norm': 0.37872615456581116, 'learning_rate': 1.6947856457569633e-06, 'epoch': 0.89}
89%|████████▉ | 4030/4506 [4:35:17<32:44, 4.13s/it]
{'loss': 0.2025, 'grad_norm': 0.39544543623924255, 'learning_rate': 1.687782715093461e-06, 'epoch': 0.89}
89%|████████▉ | 4031/4506 [4:35:22<33:17, 4.21s/it]
{'loss': 0.1935, 'grad_norm': 0.4239048659801483, 'learning_rate': 1.6807937771320781e-06, 'epoch': 0.89}
89%|████████▉ | 4032/4506 [4:35:26<32:18, 4.09s/it]
{'loss': 0.1917, 'grad_norm': 0.43452972173690796, 'learning_rate': 1.673818836067792e-06, 'epoch': 0.89}
90%|████████▉ | 4033/4506 [4:35:30<32:11, 4.08s/it]
{'loss': 0.1882, 'grad_norm': 0.37461283802986145, 'learning_rate': 1.666857896087176e-06, 'epoch': 0.9}
90%|████████▉ | 4034/4506 [4:35:34<31:49, 4.05s/it]
{'loss': 0.1947, 'grad_norm': 0.4936369061470032, 'learning_rate': 1.6599109613683912e-06, 'epoch': 0.9}
90%|████████▉ | 4035/4506 [4:35:38<32:16, 4.11s/it]
{'loss': 0.1935, 'grad_norm': 0.3554624319076538, 'learning_rate': 1.652978036081207e-06, 'epoch': 0.9}
90%|████████▉ | 4036/4506 [4:35:42<31:56, 4.08s/it]
{'loss': 0.1859, 'grad_norm': 0.4024277925491333, 'learning_rate': 1.64605912438697e-06, 'epoch': 0.9}
90%|████████▉ | 4037/4506 [4:35:46<32:15, 4.13s/it]
{'loss': 0.1855, 'grad_norm': 0.36509472131729126, 'learning_rate': 1.6391542304386221e-06, 'epoch': 0.9}
90%|████████▉ | 4038/4506 [4:35:50<32:21, 4.15s/it]
{'loss': 0.1982, 'grad_norm': 0.4068324565887451, 'learning_rate': 1.6322633583806895e-06, 'epoch': 0.9}
90%|████████▉ | 4039/4506 [4:35:54<31:48, 4.09s/it]
{'loss': 0.2065, 'grad_norm': 0.40240487456321716, 'learning_rate': 1.625386512349289e-06, 'epoch': 0.9}
90%|████████▉ | 4040/4506 [4:35:58<31:07, 4.01s/it]
{'loss': 0.1954, 'grad_norm': 0.43582287430763245, 'learning_rate': 1.6185236964721101e-06, 'epoch': 0.9}
90%|████████▉ | 4041/4506 [4:36:02<31:15, 4.03s/it]
{'loss': 0.2076, 'grad_norm': 0.43987327814102173, 'learning_rate': 1.6116749148684218e-06, 'epoch': 0.9}
90%|████████▉ | 4042/4506 [4:36:07<31:59, 4.14s/it]
{'loss': 0.1974, 'grad_norm': 0.4469214677810669, 'learning_rate': 1.6048401716490746e-06, 'epoch': 0.9}
90%|████████▉ | 4043/4506 [4:36:10<30:42, 3.98s/it]
{'loss': 0.1995, 'grad_norm': 0.43206754326820374, 'learning_rate': 1.5980194709164842e-06, 'epoch': 0.9}
90%|████████▉ | 4044/4506 [4:36:14<30:33, 3.97s/it]
{'loss': 0.1914, 'grad_norm': 0.4189107120037079, 'learning_rate': 1.5912128167646478e-06, 'epoch': 0.9}
90%|████████▉ | 4045/4506 [4:36:18<30:43, 4.00s/it]
{'loss': 0.2004, 'grad_norm': 0.3469609022140503, 'learning_rate': 1.5844202132791225e-06, 'epoch': 0.9}
90%|████████▉ | 4046/4506 [4:36:22<30:47, 4.02s/it]
{'loss': 0.212, 'grad_norm': 0.47079992294311523, 'learning_rate': 1.5776416645370412e-06, 'epoch': 0.9}
90%|████████▉ | 4047/4506 [4:36:27<31:16, 4.09s/it]
{'loss': 0.2017, 'grad_norm': 0.44920432567596436, 'learning_rate': 1.5708771746070883e-06, 'epoch': 0.9}
90%|████████▉ | 4048/4506 [4:36:31<31:40, 4.15s/it]
{'loss': 0.1916, 'grad_norm': 0.3817996680736542, 'learning_rate': 1.5641267475495214e-06, 'epoch': 0.9}
90%|████████▉ | 4049/4506 [4:36:35<31:01, 4.07s/it]
{'loss': 0.1996, 'grad_norm': 0.38726067543029785, 'learning_rate': 1.5573903874161495e-06, 'epoch': 0.9}
90%|████████▉ | 4050/4506 [4:36:39<32:34, 4.29s/it]
{'loss': 0.1899, 'grad_norm': 0.39698055386543274, 'learning_rate': 1.5506680982503407e-06, 'epoch': 0.9}
90%|████████▉ | 4051/4506 [4:36:43<31:39, 4.18s/it]
{'loss': 0.2005, 'grad_norm': 0.4068664610385895, 'learning_rate': 1.543959884087018e-06, 'epoch': 0.9}
90%|████████▉ | 4052/4506 [4:36:47<30:58, 4.09s/it]
{'loss': 0.2047, 'grad_norm': 0.3582463562488556, 'learning_rate': 1.5372657489526598e-06, 'epoch': 0.9}
90%|████████▉ | 4053/4506 [4:36:51<30:20, 4.02s/it]
{'loss': 0.1984, 'grad_norm': 0.43113794922828674, 'learning_rate': 1.5305856968652888e-06, 'epoch': 0.9}
90%|████████▉ | 4054/4506 [4:36:55<30:20, 4.03s/it]
{'loss': 0.2043, 'grad_norm': 0.3924328684806824, 'learning_rate': 1.5239197318344666e-06, 'epoch': 0.9}
90%|████████▉ | 4055/4506 [4:36:59<30:36, 4.07s/it]
{'loss': 0.2022, 'grad_norm': 0.4384406805038452, 'learning_rate': 1.517267857861318e-06, 'epoch': 0.9}
90%|█████████ | 4056/4506 [4:37:04<30:44, 4.10s/it]
{'loss': 0.2039, 'grad_norm': 0.42782366275787354, 'learning_rate': 1.5106300789384937e-06, 'epoch': 0.9}
90%|█████████ | 4057/4506 [4:37:08<30:53, 4.13s/it]
{'loss': 0.2038, 'grad_norm': 0.454057514667511, 'learning_rate': 1.5040063990501924e-06, 'epoch': 0.9}
90%|█████████ | 4058/4506 [4:37:12<30:48, 4.13s/it]
{'loss': 0.2084, 'grad_norm': 0.43090566992759705, 'learning_rate': 1.4973968221721507e-06, 'epoch': 0.9}
90%|█████████ | 4059/4506 [4:37:16<30:39, 4.12s/it]
{'loss': 0.1948, 'grad_norm': 0.4046306014060974, 'learning_rate': 1.4908013522716397e-06, 'epoch': 0.9}
90%|█████████ | 4060/4506 [4:37:20<30:12, 4.06s/it]
{'loss': 0.2029, 'grad_norm': 0.39724060893058777, 'learning_rate': 1.4842199933074563e-06, 'epoch': 0.9}
90%|█████████ | 4061/4506 [4:37:24<30:11, 4.07s/it]
{'loss': 0.2042, 'grad_norm': 0.368166983127594, 'learning_rate': 1.4776527492299352e-06, 'epoch': 0.9}
90%|█████████ | 4062/4506 [4:37:28<29:06, 3.93s/it]
{'loss': 0.1928, 'grad_norm': 0.401378870010376, 'learning_rate': 1.4710996239809317e-06, 'epoch': 0.9}
90%|█████████ | 4063/4506 [4:37:32<29:18, 3.97s/it]
{'loss': 0.2018, 'grad_norm': 0.419524222612381, 'learning_rate': 1.4645606214938351e-06, 'epoch': 0.9}
90%|█████████ | 4064/4506 [4:37:36<29:41, 4.03s/it]
{'loss': 0.1963, 'grad_norm': 0.3961334824562073, 'learning_rate': 1.4580357456935506e-06, 'epoch': 0.9}
90%|█████████ | 4065/4506 [4:37:40<29:40, 4.04s/it]
{'loss': 0.1922, 'grad_norm': 0.44206860661506653, 'learning_rate': 1.4515250004965147e-06, 'epoch': 0.9}
90%|█████████ | 4066/4506 [4:37:44<29:47, 4.06s/it]
{'loss': 0.2077, 'grad_norm': 0.40834710001945496, 'learning_rate': 1.4450283898106653e-06, 'epoch': 0.9}
90%|█████████ | 4067/4506 [4:37:48<29:34, 4.04s/it]
{'loss': 0.1934, 'grad_norm': 0.42786112427711487, 'learning_rate': 1.438545917535472e-06, 'epoch': 0.9}
90%|█████████ | 4068/4506 [4:37:52<30:03, 4.12s/it]
{'loss': 0.187, 'grad_norm': 0.37236812710762024, 'learning_rate': 1.4320775875619087e-06, 'epoch': 0.9}
90%|█████████ | 4069/4506 [4:37:56<29:25, 4.04s/it]
{'loss': 0.1961, 'grad_norm': 0.36963534355163574, 'learning_rate': 1.4256234037724636e-06, 'epoch': 0.9}
90%|█████████ | 4070/4506 [4:38:00<29:47, 4.10s/it]
{'loss': 0.2015, 'grad_norm': 0.37934452295303345, 'learning_rate': 1.4191833700411384e-06, 'epoch': 0.9}
90%|█████████ | 4071/4506 [4:38:04<29:33, 4.08s/it]
{'loss': 0.2002, 'grad_norm': 0.40003541111946106, 'learning_rate': 1.4127574902334323e-06, 'epoch': 0.9}
90%|█████████ | 4072/4506 [4:38:08<29:21, 4.06s/it]
{'loss': 0.203, 'grad_norm': 0.3620172142982483, 'learning_rate': 1.4063457682063573e-06, 'epoch': 0.9}
90%|█████████ | 4073/4506 [4:38:12<28:51, 4.00s/it]
{'loss': 0.1922, 'grad_norm': 0.4535475969314575, 'learning_rate': 1.3999482078084214e-06, 'epoch': 0.9}
90%|█████████ | 4074/4506 [4:38:16<28:30, 3.96s/it]
{'loss': 0.1954, 'grad_norm': 0.4192049205303192, 'learning_rate': 1.3935648128796386e-06, 'epoch': 0.9}
90%|█████████ | 4075/4506 [4:38:20<28:17, 3.94s/it]
{'loss': 0.2093, 'grad_norm': 0.4237874746322632, 'learning_rate': 1.3871955872515108e-06, 'epoch': 0.9}
90%|█████████ | 4076/4506 [4:38:24<28:30, 3.98s/it]
{'loss': 0.2021, 'grad_norm': 0.3838396966457367, 'learning_rate': 1.3808405347470465e-06, 'epoch': 0.9}
90%|█████████ | 4077/4506 [4:38:28<28:15, 3.95s/it]
{'loss': 0.1926, 'grad_norm': 0.3820505440235138, 'learning_rate': 1.3744996591807391e-06, 'epoch': 0.9}
91%|█████████ | 4078/4506 [4:38:32<28:32, 4.00s/it]
{'loss': 0.2012, 'grad_norm': 0.37698060274124146, 'learning_rate': 1.3681729643585773e-06, 'epoch': 0.91}
91%|█████████ | 4079/4506 [4:38:36<28:37, 4.02s/it]
{'loss': 0.2029, 'grad_norm': 0.39395853877067566, 'learning_rate': 1.3618604540780316e-06, 'epoch': 0.91}
91%|█████████ | 4080/4506 [4:38:40<28:25, 4.00s/it]
{'loss': 0.2029, 'grad_norm': 0.4056699573993683, 'learning_rate': 1.3555621321280714e-06, 'epoch': 0.91}
91%|█████████ | 4081/4506 [4:38:44<28:25, 4.01s/it]
{'loss': 0.1946, 'grad_norm': 0.46214672923088074, 'learning_rate': 1.3492780022891283e-06, 'epoch': 0.91}
91%|█████████ | 4082/4506 [4:38:49<29:27, 4.17s/it]
{'loss': 0.2056, 'grad_norm': 0.406800240278244, 'learning_rate': 1.3430080683331376e-06, 'epoch': 0.91}
91%|█████████ | 4083/4506 [4:38:53<28:46, 4.08s/it]
{'loss': 0.1852, 'grad_norm': 0.39003533124923706, 'learning_rate': 1.3367523340235e-06, 'epoch': 0.91}
91%|█████████ | 4084/4506 [4:38:57<28:44, 4.09s/it]
{'loss': 0.1901, 'grad_norm': 0.3687865138053894, 'learning_rate': 1.3305108031151036e-06, 'epoch': 0.91}
91%|█████████ | 4085/4506 [4:39:01<28:51, 4.11s/it]
{'loss': 0.1901, 'grad_norm': 0.3835069239139557, 'learning_rate': 1.3242834793542964e-06, 'epoch': 0.91}
91%|█████████ | 4086/4506 [4:39:05<28:59, 4.14s/it]
{'loss': 0.2025, 'grad_norm': 0.3574983775615692, 'learning_rate': 1.3180703664789184e-06, 'epoch': 0.91}
91%|█████████ | 4087/4506 [4:39:09<29:08, 4.17s/it]
{'loss': 0.1853, 'grad_norm': 0.4150078296661377, 'learning_rate': 1.3118714682182564e-06, 'epoch': 0.91}
91%|█████████ | 4088/4506 [4:39:14<29:21, 4.21s/it]
{'loss': 0.1965, 'grad_norm': 0.39437296986579895, 'learning_rate': 1.3056867882930867e-06, 'epoch': 0.91}
91%|█████████ | 4089/4506 [4:39:18<30:10, 4.34s/it]
{'loss': 0.1943, 'grad_norm': 0.3160073161125183, 'learning_rate': 1.2995163304156426e-06, 'epoch': 0.91}
91%|█████████ | 4090/4506 [4:39:23<30:09, 4.35s/it]
{'loss': 0.2062, 'grad_norm': 0.41678205132484436, 'learning_rate': 1.2933600982896144e-06, 'epoch': 0.91}
91%|█████████ | 4091/4506 [4:39:27<29:12, 4.22s/it]
{'loss': 0.2005, 'grad_norm': 0.4221538007259369, 'learning_rate': 1.2872180956101653e-06, 'epoch': 0.91}
91%|█████████ | 4092/4506 [4:39:31<30:24, 4.41s/it]
{'loss': 0.2046, 'grad_norm': 0.3747026324272156, 'learning_rate': 1.281090326063908e-06, 'epoch': 0.91}
91%|█████████ | 4093/4506 [4:39:36<30:20, 4.41s/it]
{'loss': 0.1974, 'grad_norm': 0.36945438385009766, 'learning_rate': 1.274976793328922e-06, 'epoch': 0.91}
91%|█████████ | 4094/4506 [4:39:39<28:43, 4.18s/it]
{'loss': 0.2057, 'grad_norm': 0.4104343354701996, 'learning_rate': 1.2688775010747306e-06, 'epoch': 0.91}
91%|█████████ | 4095/4506 [4:39:44<29:19, 4.28s/it]
{'loss': 0.2128, 'grad_norm': 0.3830345571041107, 'learning_rate': 1.2627924529623136e-06, 'epoch': 0.91}
91%|█████████ | 4096/4506 [4:39:48<28:27, 4.16s/it]
{'loss': 0.2056, 'grad_norm': 0.40235379338264465, 'learning_rate': 1.256721652644105e-06, 'epoch': 0.91}
91%|█████████ | 4097/4506 [4:39:52<28:11, 4.14s/it]
{'loss': 0.1922, 'grad_norm': 0.36436164379119873, 'learning_rate': 1.2506651037639872e-06, 'epoch': 0.91}
91%|█████████ | 4098/4506 [4:39:56<28:10, 4.14s/it]
{'loss': 0.2039, 'grad_norm': 0.40470829606056213, 'learning_rate': 1.24462280995728e-06, 'epoch': 0.91}
91%|█████████ | 4099/4506 [4:40:00<27:44, 4.09s/it]
{'loss': 0.1983, 'grad_norm': 0.4061943590641022, 'learning_rate': 1.2385947748507542e-06, 'epoch': 0.91}
91%|█████████ | 4100/4506 [4:40:04<27:56, 4.13s/it]
{'loss': 0.207, 'grad_norm': 0.38184380531311035, 'learning_rate': 1.232581002062616e-06, 'epoch': 0.91}
91%|█████████ | 4101/4506 [4:40:08<27:51, 4.13s/it]
{'loss': 0.2088, 'grad_norm': 0.4426085650920868, 'learning_rate': 1.2265814952025196e-06, 'epoch': 0.91}
91%|█████████ | 4102/4506 [4:40:13<28:03, 4.17s/it]
{'loss': 0.2073, 'grad_norm': 0.39643460512161255, 'learning_rate': 1.2205962578715479e-06, 'epoch': 0.91}
91%|█████████ | 4103/4506 [4:40:17<27:26, 4.09s/it]
{'loss': 0.1851, 'grad_norm': 0.3895293176174164, 'learning_rate': 1.2146252936622304e-06, 'epoch': 0.91}
91%|█████████ | 4104/4506 [4:40:21<27:12, 4.06s/it]
{'loss': 0.1978, 'grad_norm': 0.38084468245506287, 'learning_rate': 1.208668606158514e-06, 'epoch': 0.91}
91%|█████████ | 4105/4506 [4:40:25<28:07, 4.21s/it]
{'loss': 0.2059, 'grad_norm': 0.38378816843032837, 'learning_rate': 1.2027261989357803e-06, 'epoch': 0.91}
91%|█████████ | 4106/4506 [4:40:29<27:50, 4.18s/it]
{'loss': 0.1917, 'grad_norm': 0.4555360674858093, 'learning_rate': 1.1967980755608483e-06, 'epoch': 0.91}
91%|█████████ | 4107/4506 [4:40:33<27:01, 4.06s/it]
{'loss': 0.1897, 'grad_norm': 0.428918719291687, 'learning_rate': 1.1908842395919606e-06, 'epoch': 0.91}
91%|█████████ | 4108/4506 [4:40:37<26:21, 3.97s/it]
{'loss': 0.1965, 'grad_norm': 0.3679507374763489, 'learning_rate': 1.1849846945787779e-06, 'epoch': 0.91}
91%|█████████ | 4109/4506 [4:40:41<26:20, 3.98s/it]
{'loss': 0.1943, 'grad_norm': 0.3936727046966553, 'learning_rate': 1.1790994440623871e-06, 'epoch': 0.91}
91%|█████████ | 4110/4506 [4:40:45<25:57, 3.93s/it]
{'loss': 0.1887, 'grad_norm': 0.3712727129459381, 'learning_rate': 1.1732284915752956e-06, 'epoch': 0.91}
91%|█████████ | 4111/4506 [4:40:49<26:01, 3.95s/it]
{'loss': 0.1963, 'grad_norm': 0.39257267117500305, 'learning_rate': 1.1673718406414265e-06, 'epoch': 0.91}
91%|█████████▏| 4112/4506 [4:40:52<25:37, 3.90s/it]
{'loss': 0.1943, 'grad_norm': 0.4071812033653259, 'learning_rate': 1.1615294947761203e-06, 'epoch': 0.91}
91%|█████████▏| 4113/4506 [4:40:56<25:44, 3.93s/it]
{'loss': 0.1893, 'grad_norm': 0.37816494703292847, 'learning_rate': 1.1557014574861276e-06, 'epoch': 0.91}
91%|█████████▏| 4114/4506 [4:41:00<25:58, 3.98s/it]
{'loss': 0.192, 'grad_norm': 0.37300947308540344, 'learning_rate': 1.1498877322696194e-06, 'epoch': 0.91}
91%|█████████▏| 4115/4506 [4:41:04<26:02, 4.00s/it]
{'loss': 0.1957, 'grad_norm': 0.35283195972442627, 'learning_rate': 1.1440883226161658e-06, 'epoch': 0.91}
91%|█████████▏| 4116/4506 [4:41:09<26:05, 4.01s/it]
{'loss': 0.1978, 'grad_norm': 0.42608705163002014, 'learning_rate': 1.1383032320067543e-06, 'epoch': 0.91}
91%|█████████▏| 4117/4506 [4:41:12<25:41, 3.96s/it]
{'loss': 0.2084, 'grad_norm': 0.4037228226661682, 'learning_rate': 1.1325324639137686e-06, 'epoch': 0.91}
91%|█████████▏| 4118/4506 [4:41:16<25:38, 3.96s/it]
{'loss': 0.2147, 'grad_norm': 0.42454975843429565, 'learning_rate': 1.1267760218009987e-06, 'epoch': 0.91}
91%|█████████▏| 4119/4506 [4:41:21<26:05, 4.05s/it]
{'loss': 0.1853, 'grad_norm': 0.3844202756881714, 'learning_rate': 1.1210339091236365e-06, 'epoch': 0.91}
91%|█████████▏| 4120/4506 [4:41:25<26:01, 4.05s/it]
{'loss': 0.202, 'grad_norm': 0.40148594975471497, 'learning_rate': 1.115306129328275e-06, 'epoch': 0.91}
91%|█████████▏| 4121/4506 [4:41:29<25:55, 4.04s/it]
{'loss': 0.1884, 'grad_norm': 0.3886905014514923, 'learning_rate': 1.1095926858529005e-06, 'epoch': 0.91}
91%|█████████▏| 4122/4506 [4:41:33<26:14, 4.10s/it]
{'loss': 0.1998, 'grad_norm': 0.37694719433784485, 'learning_rate': 1.1038935821268942e-06, 'epoch': 0.91}
92%|█████████▏| 4123/4506 [4:41:37<25:26, 3.99s/it]
{'loss': 0.2022, 'grad_norm': 0.4087013602256775, 'learning_rate': 1.0982088215710368e-06, 'epoch': 0.92}
92%|█████████▏| 4124/4506 [4:41:41<25:09, 3.95s/it]
{'loss': 0.1977, 'grad_norm': 0.38533109426498413, 'learning_rate': 1.0925384075974848e-06, 'epoch': 0.92}
92%|█████████▏| 4125/4506 [4:41:44<25:09, 3.96s/it]
{'loss': 0.2157, 'grad_norm': 0.43449342250823975, 'learning_rate': 1.0868823436098018e-06, 'epoch': 0.92}
92%|█████████▏| 4126/4506 [4:41:49<25:43, 4.06s/it]
{'loss': 0.2061, 'grad_norm': 0.3712804615497589, 'learning_rate': 1.081240633002925e-06, 'epoch': 0.92}
92%|█████████▏| 4127/4506 [4:41:53<26:01, 4.12s/it]
{'loss': 0.1968, 'grad_norm': 0.35673314332962036, 'learning_rate': 1.0756132791631874e-06, 'epoch': 0.92}
92%|█████████▏| 4128/4506 [4:41:57<25:19, 4.02s/it]
{'loss': 0.1908, 'grad_norm': 0.4272763431072235, 'learning_rate': 1.0700002854682866e-06, 'epoch': 0.92}
92%|█████████▏| 4129/4506 [4:42:01<24:52, 3.96s/it]
{'loss': 0.1901, 'grad_norm': 0.3944593667984009, 'learning_rate': 1.0644016552873253e-06, 'epoch': 0.92}
92%|█████████▏| 4130/4506 [4:42:05<24:47, 3.96s/it]
{'loss': 0.2103, 'grad_norm': 0.427761048078537, 'learning_rate': 1.0588173919807598e-06, 'epoch': 0.92}
92%|█████████▏| 4131/4506 [4:42:09<25:06, 4.02s/it]
{'loss': 0.1979, 'grad_norm': 0.4039788842201233, 'learning_rate': 1.0532474989004448e-06, 'epoch': 0.92}
92%|█████████▏| 4132/4506 [4:42:13<25:09, 4.04s/it]
{'loss': 0.1919, 'grad_norm': 0.3798264265060425, 'learning_rate': 1.0476919793895923e-06, 'epoch': 0.92}
92%|█████████▏| 4133/4506 [4:42:17<25:24, 4.09s/it]
{'loss': 0.2025, 'grad_norm': 0.3804951310157776, 'learning_rate': 1.042150836782796e-06, 'epoch': 0.92}
92%|█████████▏| 4134/4506 [4:42:21<25:01, 4.04s/it]
{'loss': 0.1942, 'grad_norm': 0.3736687898635864, 'learning_rate': 1.0366240744060174e-06, 'epoch': 0.92}
92%|█████████▏| 4135/4506 [4:42:25<24:49, 4.01s/it]
{'loss': 0.1905, 'grad_norm': 0.38624686002731323, 'learning_rate': 1.0311116955765947e-06, 'epoch': 0.92}
92%|█████████▏| 4136/4506 [4:42:29<25:39, 4.16s/it]
{'loss': 0.2012, 'grad_norm': 0.39051464200019836, 'learning_rate': 1.0256137036032147e-06, 'epoch': 0.92}
92%|█████████▏| 4137/4506 [4:42:34<25:29, 4.14s/it]
{'loss': 0.1945, 'grad_norm': 0.38065317273139954, 'learning_rate': 1.0201301017859488e-06, 'epoch': 0.92}
92%|█████████▏| 4138/4506 [4:42:37<24:50, 4.05s/it]
{'loss': 0.2133, 'grad_norm': 0.3921726942062378, 'learning_rate': 1.0146608934162167e-06, 'epoch': 0.92}
92%|█████████▏| 4139/4506 [4:42:41<24:38, 4.03s/it]
{'loss': 0.2057, 'grad_norm': 0.4114178419113159, 'learning_rate': 1.0092060817768007e-06, 'epoch': 0.92}
92%|█████████▏| 4140/4506 [4:42:45<24:08, 3.96s/it]
{'loss': 0.1957, 'grad_norm': 0.4143953323364258, 'learning_rate': 1.003765670141854e-06, 'epoch': 0.92}
92%|█████████▏| 4141/4506 [4:42:49<23:54, 3.93s/it]
{'loss': 0.1879, 'grad_norm': 0.45966535806655884, 'learning_rate': 9.983396617768675e-07, 'epoch': 0.92}
92%|█████████▏| 4142/4506 [4:42:53<23:49, 3.93s/it]
{'loss': 0.2026, 'grad_norm': 0.4607619047164917, 'learning_rate': 9.929280599387025e-07, 'epoch': 0.92}
92%|█████████▏| 4143/4506 [4:42:57<23:46, 3.93s/it]
{'loss': 0.1941, 'grad_norm': 0.3717540204524994, 'learning_rate': 9.875308678755635e-07, 'epoch': 0.92}
92%|█████████▏| 4144/4506 [4:43:01<24:18, 4.03s/it]
{'loss': 0.1947, 'grad_norm': 0.3690280020236969, 'learning_rate': 9.821480888270119e-07, 'epoch': 0.92}
92%|█████████▏| 4145/4506 [4:43:05<24:00, 3.99s/it]
{'loss': 0.186, 'grad_norm': 0.38382166624069214, 'learning_rate': 9.767797260239548e-07, 'epoch': 0.92}
92%|█████████▏| 4146/4506 [4:43:09<23:50, 3.97s/it]
{'loss': 0.202, 'grad_norm': 0.3790653944015503, 'learning_rate': 9.71425782688648e-07, 'epoch': 0.92}
92%|█████████▏| 4147/4506 [4:43:13<24:45, 4.14s/it]
{'loss': 0.1905, 'grad_norm': 0.38057851791381836, 'learning_rate': 9.660862620346878e-07, 'epoch': 0.92}
92%|█████████▏| 4148/4506 [4:43:17<24:20, 4.08s/it]
{'loss': 0.1924, 'grad_norm': 0.4180445075035095, 'learning_rate': 9.607611672670215e-07, 'epoch': 0.92}
92%|█████████▏| 4149/4506 [4:43:22<24:27, 4.11s/it]
{'loss': 0.1956, 'grad_norm': 0.3732786774635315, 'learning_rate': 9.554505015819283e-07, 'epoch': 0.92}
92%|█████████▏| 4150/4506 [4:43:26<24:30, 4.13s/it]
{'loss': 0.2003, 'grad_norm': 0.3788241147994995, 'learning_rate': 9.501542681670361e-07, 'epoch': 0.92}
92%|█████████▏| 4151/4506 [4:43:30<24:18, 4.11s/it]
{'loss': 0.1877, 'grad_norm': 0.3620060384273529, 'learning_rate': 9.448724702013023e-07, 'epoch': 0.92}
92%|█████████▏| 4152/4506 [4:43:34<24:34, 4.17s/it]
{'loss': 0.1867, 'grad_norm': 0.3597061336040497, 'learning_rate': 9.396051108550213e-07, 'epoch': 0.92}
92%|█████████▏| 4153/4506 [4:43:38<24:18, 4.13s/it]
{'loss': 0.198, 'grad_norm': 0.38052889704704285, 'learning_rate': 9.343521932898225e-07, 'epoch': 0.92}
92%|█████████▏| 4154/4506 [4:43:42<24:13, 4.13s/it]
{'loss': 0.1957, 'grad_norm': 0.3803419768810272, 'learning_rate': 9.291137206586753e-07, 'epoch': 0.92}
92%|█████████▏| 4155/4506 [4:43:46<24:14, 4.14s/it]
{'loss': 0.1977, 'grad_norm': 0.38145899772644043, 'learning_rate': 9.238896961058618e-07, 'epoch': 0.92}
92%|█████████▏| 4156/4506 [4:43:50<23:55, 4.10s/it]
{'loss': 0.1908, 'grad_norm': 0.3793427646160126, 'learning_rate': 9.186801227670017e-07, 'epoch': 0.92}
92%|█████████▏| 4157/4506 [4:43:55<24:11, 4.16s/it]
{'loss': 0.1996, 'grad_norm': 0.3909122049808502, 'learning_rate': 9.134850037690379e-07, 'epoch': 0.92}
92%|█████████▏| 4158/4506 [4:43:59<23:51, 4.11s/it]
{'loss': 0.198, 'grad_norm': 0.41341519355773926, 'learning_rate': 9.083043422302429e-07, 'epoch': 0.92}
92%|█████████▏| 4159/4506 [4:44:03<23:51, 4.13s/it]
{'loss': 0.187, 'grad_norm': 0.3914778232574463, 'learning_rate': 9.031381412602069e-07, 'epoch': 0.92}
92%|█████████▏| 4160/4506 [4:44:07<23:14, 4.03s/it]
{'loss': 0.1933, 'grad_norm': 0.40808627009391785, 'learning_rate': 8.979864039598385e-07, 'epoch': 0.92}
92%|█████████▏| 4161/4506 [4:44:11<22:57, 3.99s/it]
{'loss': 0.1994, 'grad_norm': 0.4269714951515198, 'learning_rate': 8.928491334213723e-07, 'epoch': 0.92}
92%|█████████▏| 4162/4506 [4:44:15<23:06, 4.03s/it]
{'loss': 0.2016, 'grad_norm': 0.4290783107280731, 'learning_rate': 8.877263327283474e-07, 'epoch': 0.92}
92%|█████████▏| 4163/4506 [4:44:19<23:30, 4.11s/it]
{'loss': 0.2018, 'grad_norm': 0.42662036418914795, 'learning_rate': 8.826180049556293e-07, 'epoch': 0.92}
92%|█████████▏| 4164/4506 [4:44:23<23:06, 4.05s/it]
{'loss': 0.1961, 'grad_norm': 0.43058520555496216, 'learning_rate': 8.775241531693845e-07, 'epoch': 0.92}
92%|█████████▏| 4165/4506 [4:44:27<22:37, 3.98s/it]
{'loss': 0.2031, 'grad_norm': 0.41208839416503906, 'learning_rate': 8.724447804271091e-07, 'epoch': 0.92}
92%|█████████▏| 4166/4506 [4:44:31<22:32, 3.98s/it]
{'loss': 0.1981, 'grad_norm': 0.43879595398902893, 'learning_rate': 8.673798897775892e-07, 'epoch': 0.92}
92%|█████████▏| 4167/4506 [4:44:35<22:35, 4.00s/it]
{'loss': 0.2053, 'grad_norm': 0.4289300739765167, 'learning_rate': 8.62329484260932e-07, 'epoch': 0.92}
92%|█████████▏| 4168/4506 [4:44:39<22:59, 4.08s/it]
{'loss': 0.1951, 'grad_norm': 0.38382890820503235, 'learning_rate': 8.572935669085402e-07, 'epoch': 0.93}
93%|█████████▎| 4169/4506 [4:44:43<23:17, 4.15s/it]
{'loss': 0.1903, 'grad_norm': 0.4048576056957245, 'learning_rate': 8.522721407431267e-07, 'epoch': 0.93}
93%|█████████▎| 4170/4506 [4:44:47<22:51, 4.08s/it]
{'loss': 0.2043, 'grad_norm': 0.3842526972293854, 'learning_rate': 8.472652087786998e-07, 'epoch': 0.93}
93%|█████████▎| 4171/4506 [4:44:51<22:02, 3.95s/it]
{'loss': 0.2066, 'grad_norm': 0.42039400339126587, 'learning_rate': 8.422727740205777e-07, 'epoch': 0.93}
93%|█████████▎| 4172/4506 [4:44:55<22:35, 4.06s/it]
{'loss': 0.2013, 'grad_norm': 0.4144326448440552, 'learning_rate': 8.372948394653718e-07, 'epoch': 0.93}
93%|█████████▎| 4173/4506 [4:44:59<22:34, 4.07s/it]
{'loss': 0.1933, 'grad_norm': 0.40070709586143494, 'learning_rate': 8.323314081009837e-07, 'epoch': 0.93}
93%|█████████▎| 4174/4506 [4:45:03<22:15, 4.02s/it]
{'loss': 0.2028, 'grad_norm': 0.44925549626350403, 'learning_rate': 8.273824829066246e-07, 'epoch': 0.93}
93%|█████████▎| 4175/4506 [4:45:07<22:15, 4.04s/it]
{'loss': 0.1938, 'grad_norm': 0.3888208568096161, 'learning_rate': 8.224480668527823e-07, 'epoch': 0.93}
93%|█████████▎| 4176/4506 [4:45:11<21:56, 3.99s/it]
{'loss': 0.1976, 'grad_norm': 0.37860727310180664, 'learning_rate': 8.17528162901246e-07, 'epoch': 0.93}
93%|█████████▎| 4177/4506 [4:45:15<22:08, 4.04s/it]
{'loss': 0.1909, 'grad_norm': 0.4001084268093109, 'learning_rate': 8.126227740050923e-07, 'epoch': 0.93}
93%|█████████▎| 4178/4506 [4:45:19<22:03, 4.04s/it]
{'loss': 0.1923, 'grad_norm': 0.38298940658569336, 'learning_rate': 8.077319031086911e-07, 'epoch': 0.93}
93%|█████████▎| 4179/4506 [4:45:23<21:52, 4.01s/it]
{'loss': 0.187, 'grad_norm': 0.4122878909111023, 'learning_rate': 8.0285555314768e-07, 'epoch': 0.93}
93%|█████████▎| 4180/4506 [4:45:27<21:34, 3.97s/it]
{'loss': 0.2045, 'grad_norm': 0.3949165940284729, 'learning_rate': 7.979937270490012e-07, 'epoch': 0.93}
93%|█████████▎| 4181/4506 [4:45:31<21:49, 4.03s/it]
{'loss': 0.206, 'grad_norm': 0.4082556664943695, 'learning_rate': 7.931464277308647e-07, 'epoch': 0.93}
93%|█████████▎| 4182/4506 [4:45:36<21:51, 4.05s/it]
{'loss': 0.2034, 'grad_norm': 0.42656025290489197, 'learning_rate': 7.883136581027739e-07, 'epoch': 0.93}
93%|█████████▎| 4183/4506 [4:45:40<22:20, 4.15s/it]
{'loss': 0.1962, 'grad_norm': 0.3861384689807892, 'learning_rate': 7.834954210654943e-07, 'epoch': 0.93}
93%|█████████▎| 4184/4506 [4:45:44<21:45, 4.06s/it]
{'loss': 0.1873, 'grad_norm': 0.38446173071861267, 'learning_rate': 7.786917195110932e-07, 'epoch': 0.93}
93%|█████████▎| 4185/4506 [4:45:48<22:22, 4.18s/it]
{'loss': 0.1953, 'grad_norm': 0.387115478515625, 'learning_rate': 7.739025563228835e-07, 'epoch': 0.93}
93%|█████████▎| 4186/4506 [4:45:52<22:18, 4.18s/it]
{'loss': 0.2052, 'grad_norm': 0.41246020793914795, 'learning_rate': 7.691279343754771e-07, 'epoch': 0.93}
93%|█████████▎| 4187/4506 [4:45:57<22:25, 4.22s/it]
{'loss': 0.2038, 'grad_norm': 0.3745049834251404, 'learning_rate': 7.643678565347395e-07, 'epoch': 0.93}
93%|█████████▎| 4188/4506 [4:46:00<21:37, 4.08s/it]
{'loss': 0.2012, 'grad_norm': 0.42346104979515076, 'learning_rate': 7.596223256578217e-07, 'epoch': 0.93}
93%|█████████▎| 4189/4506 [4:46:04<21:15, 4.02s/it]
{'loss': 0.2039, 'grad_norm': 0.36944830417633057, 'learning_rate': 7.548913445931283e-07, 'epoch': 0.93}
93%|█████████▎| 4190/4506 [4:46:08<21:17, 4.04s/it]
{'loss': 0.1905, 'grad_norm': 0.4594587981700897, 'learning_rate': 7.501749161803434e-07, 'epoch': 0.93}
93%|█████████▎| 4191/4506 [4:46:12<21:06, 4.02s/it]
{'loss': 0.1974, 'grad_norm': 0.46996164321899414, 'learning_rate': 7.454730432504109e-07, 'epoch': 0.93}
93%|█████████▎| 4192/4506 [4:46:17<21:31, 4.11s/it]
{'loss': 0.1892, 'grad_norm': 0.38500598073005676, 'learning_rate': 7.407857286255343e-07, 'epoch': 0.93}
93%|█████████▎| 4193/4506 [4:46:21<21:18, 4.08s/it]
{'loss': 0.1969, 'grad_norm': 0.37335696816444397, 'learning_rate': 7.361129751191854e-07, 'epoch': 0.93}
93%|█████████▎| 4194/4506 [4:46:25<21:48, 4.19s/it]
{'loss': 0.201, 'grad_norm': 0.37996456027030945, 'learning_rate': 7.314547855360898e-07, 'epoch': 0.93}
93%|█████████▎| 4195/4506 [4:46:29<21:46, 4.20s/it]
{'loss': 0.1921, 'grad_norm': 0.4011996388435364, 'learning_rate': 7.26811162672239e-07, 'epoch': 0.93}
93%|█████████▎| 4196/4506 [4:46:34<22:26, 4.34s/it]
{'loss': 0.2004, 'grad_norm': 0.4059779942035675, 'learning_rate': 7.221821093148728e-07, 'epoch': 0.93}
93%|█████████▎| 4197/4506 [4:46:38<21:53, 4.25s/it]
{'loss': 0.1878, 'grad_norm': 0.35604992508888245, 'learning_rate': 7.175676282424964e-07, 'epoch': 0.93}
93%|█████████▎| 4198/4506 [4:46:42<22:00, 4.29s/it]
{'loss': 0.192, 'grad_norm': 0.3687392473220825, 'learning_rate': 7.129677222248526e-07, 'epoch': 0.93}
93%|█████████▎| 4199/4506 [4:46:47<21:45, 4.25s/it]
{'loss': 0.2007, 'grad_norm': 0.40446752309799194, 'learning_rate': 7.083823940229522e-07, 'epoch': 0.93}
93%|█████████▎| 4200/4506 [4:46:50<21:01, 4.12s/it]
{'loss': 0.1943, 'grad_norm': 0.37828052043914795, 'learning_rate': 7.038116463890438e-07, 'epoch': 0.93}
93%|█████████▎| 4201/4506 [4:46:55<21:02, 4.14s/it]
{'loss': 0.2093, 'grad_norm': 0.46177661418914795, 'learning_rate': 6.992554820666325e-07, 'epoch': 0.93}
93%|█████████▎| 4202/4506 [4:46:59<20:42, 4.09s/it]
{'loss': 0.1897, 'grad_norm': 0.4034101068973541, 'learning_rate': 6.947139037904616e-07, 'epoch': 0.93}
93%|█████████▎| 4203/4506 [4:47:03<20:33, 4.07s/it]
{'loss': 0.1841, 'grad_norm': 0.392764687538147, 'learning_rate': 6.901869142865336e-07, 'epoch': 0.93}
93%|█████████▎| 4204/4506 [4:47:06<20:07, 4.00s/it]
{'loss': 0.1939, 'grad_norm': 0.39646992087364197, 'learning_rate': 6.856745162720779e-07, 'epoch': 0.93}
93%|█████████▎| 4205/4506 [4:47:11<20:18, 4.05s/it]
{'loss': 0.2007, 'grad_norm': 0.4144479036331177, 'learning_rate': 6.811767124555779e-07, 'epoch': 0.93}
93%|█████████▎| 4206/4506 [4:47:15<20:18, 4.06s/it]
{'loss': 0.2066, 'grad_norm': 0.41895508766174316, 'learning_rate': 6.766935055367491e-07, 'epoch': 0.93}
93%|█████████▎| 4207/4506 [4:47:19<19:51, 3.99s/it]
{'loss': 0.2, 'grad_norm': 0.4344446659088135, 'learning_rate': 6.722248982065498e-07, 'epoch': 0.93}
93%|█████████▎| 4208/4506 [4:47:23<20:28, 4.12s/it]
{'loss': 0.2071, 'grad_norm': 0.39121654629707336, 'learning_rate': 6.67770893147171e-07, 'epoch': 0.93}
93%|█████████▎| 4209/4506 [4:47:27<20:24, 4.12s/it]
{'loss': 0.1993, 'grad_norm': 0.42408064007759094, 'learning_rate': 6.633314930320461e-07, 'epoch': 0.93}
93%|█████████▎| 4210/4506 [4:47:31<20:08, 4.08s/it]
{'loss': 0.2033, 'grad_norm': 0.4128269851207733, 'learning_rate': 6.589067005258382e-07, 'epoch': 0.93}
93%|█████████▎| 4211/4506 [4:47:36<20:52, 4.25s/it]
{'loss': 0.2009, 'grad_norm': 0.4098506271839142, 'learning_rate': 6.544965182844393e-07, 'epoch': 0.93}
93%|█████████▎| 4212/4506 [4:47:40<20:14, 4.13s/it]
{'loss': 0.1923, 'grad_norm': 0.4092162549495697, 'learning_rate': 6.501009489549792e-07, 'epoch': 0.93}
93%|█████████▎| 4213/4506 [4:47:44<20:48, 4.26s/it]
{'loss': 0.1959, 'grad_norm': 0.3503790497779846, 'learning_rate': 6.457199951758058e-07, 'epoch': 0.94}
94%|█████████▎| 4214/4506 [4:47:48<20:23, 4.19s/it]
{'loss': 0.2043, 'grad_norm': 0.4507802426815033, 'learning_rate': 6.413536595765074e-07, 'epoch': 0.94}
94%|█████████▎| 4215/4506 [4:47:52<19:51, 4.09s/it]
{'loss': 0.1817, 'grad_norm': 0.3685556948184967, 'learning_rate': 6.370019447778875e-07, 'epoch': 0.94}
94%|█████████▎| 4216/4506 [4:47:56<19:20, 4.00s/it]
{'loss': 0.2027, 'grad_norm': 0.3996613025665283, 'learning_rate': 6.326648533919815e-07, 'epoch': 0.94}
94%|█████████▎| 4217/4506 [4:48:00<18:59, 3.94s/it]
{'loss': 0.2101, 'grad_norm': 0.4054930806159973, 'learning_rate': 6.283423880220379e-07, 'epoch': 0.94}
94%|█████████▎| 4218/4506 [4:48:04<19:03, 3.97s/it]
{'loss': 0.199, 'grad_norm': 0.42075788974761963, 'learning_rate': 6.240345512625395e-07, 'epoch': 0.94}
94%|█████████▎| 4219/4506 [4:48:08<18:54, 3.95s/it]
{'loss': 0.1863, 'grad_norm': 0.3609766364097595, 'learning_rate': 6.197413456991735e-07, 'epoch': 0.94}
94%|█████████▎| 4220/4506 [4:48:12<18:58, 3.98s/it]
{'loss': 0.1953, 'grad_norm': 0.38473066687583923, 'learning_rate': 6.154627739088592e-07, 'epoch': 0.94}
94%|█████████▎| 4221/4506 [4:48:16<18:53, 3.98s/it]
{'loss': 0.19, 'grad_norm': 0.369148850440979, 'learning_rate': 6.1119883845972e-07, 'epoch': 0.94}
94%|█████████▎| 4222/4506 [4:48:20<19:09, 4.05s/it]
{'loss': 0.2023, 'grad_norm': 0.3489353060722351, 'learning_rate': 6.069495419111004e-07, 'epoch': 0.94}
94%|█████████▎| 4223/4506 [4:48:24<19:56, 4.23s/it]
{'loss': 0.1923, 'grad_norm': 0.42134180665016174, 'learning_rate': 6.027148868135629e-07, 'epoch': 0.94}
94%|█████████▎| 4223/4506 [4:48:24<19:56, 4.23s/it]
94%|█████████▎| 4224/4506 [4:48:29<19:54, 4.24s/it]
{'loss': 0.1978, 'grad_norm': 0.36731433868408203, 'learning_rate': 5.984948757088715e-07, 'epoch': 0.94}
94%|█████████▎| 4224/4506 [4:48:29<19:54, 4.24s/it]
94%|█████████▍| 4225/4506 [4:48:33<19:26, 4.15s/it]
{'loss': 0.1933, 'grad_norm': 0.3976779282093048, 'learning_rate': 5.942895111300056e-07, 'epoch': 0.94}
94%|█████████▍| 4225/4506 [4:48:33<19:26, 4.15s/it]
94%|█████████▍| 4226/4506 [4:48:37<19:20, 4.14s/it]
{'loss': 0.1917, 'grad_norm': 0.3993114233016968, 'learning_rate': 5.900987956011517e-07, 'epoch': 0.94}
94%|█████████▍| 4226/4506 [4:48:37<19:20, 4.14s/it]
94%|█████████▍| 4227/4506 [4:48:41<19:19, 4.16s/it]
{'loss': 0.2182, 'grad_norm': 0.39620253443717957, 'learning_rate': 5.859227316377059e-07, 'epoch': 0.94}
94%|█████████▍| 4227/4506 [4:48:41<19:19, 4.16s/it]
94%|█████████▍| 4228/4506 [4:48:45<18:59, 4.10s/it]
{'loss': 0.1908, 'grad_norm': 0.45289504528045654, 'learning_rate': 5.817613217462714e-07, 'epoch': 0.94}
94%|█████████▍| 4228/4506 [4:48:45<18:59, 4.10s/it]
94%|█████████▍| 4229/4506 [4:48:49<18:40, 4.05s/it]
{'loss': 0.211, 'grad_norm': 0.40402325987815857, 'learning_rate': 5.776145684246531e-07, 'epoch': 0.94}
94%|█████████▍| 4229/4506 [4:48:49<18:40, 4.05s/it]
94%|█████████▍| 4230/4506 [4:48:53<18:25, 4.00s/it]
{'loss': 0.2086, 'grad_norm': 0.42621085047721863, 'learning_rate': 5.734824741618544e-07, 'epoch': 0.94}
94%|█████████▍| 4230/4506 [4:48:53<18:25, 4.00s/it]
94%|█████████▍| 4231/4506 [4:48:57<18:19, 4.00s/it]
{'loss': 0.1864, 'grad_norm': 0.32898709177970886, 'learning_rate': 5.693650414380885e-07, 'epoch': 0.94}
94%|█████████▍| 4231/4506 [4:48:57<18:19, 4.00s/it]
94%|█████████▍| 4232/4506 [4:49:01<18:20, 4.02s/it]
{'loss': 0.189, 'grad_norm': 0.4338718056678772, 'learning_rate': 5.652622727247591e-07, 'epoch': 0.94}
94%|█████████▍| 4232/4506 [4:49:01<18:20, 4.02s/it]
94%|█████████▍| 4233/4506 [4:49:05<18:33, 4.08s/it]
{'loss': 0.2062, 'grad_norm': 0.4289539158344269, 'learning_rate': 5.611741704844742e-07, 'epoch': 0.94}
94%|█████████▍| 4233/4506 [4:49:05<18:33, 4.08s/it]
94%|█████████▍| 4234/4506 [4:49:09<18:59, 4.19s/it]
{'loss': 0.1988, 'grad_norm': 0.41512352228164673, 'learning_rate': 5.571007371710402e-07, 'epoch': 0.94}
94%|█████████▍| 4234/4506 [4:49:10<18:59, 4.19s/it]
94%|█████████▍| 4235/4506 [4:49:14<19:09, 4.24s/it]
{'loss': 0.2043, 'grad_norm': 0.43381065130233765, 'learning_rate': 5.530419752294541e-07, 'epoch': 0.94}
94%|█████████▍| 4235/4506 [4:49:14<19:09, 4.24s/it]
94%|█████████▍| 4236/4506 [4:49:18<18:44, 4.17s/it]
{'loss': 0.2014, 'grad_norm': 0.37849709391593933, 'learning_rate': 5.489978870959089e-07, 'epoch': 0.94}
94%|█████████▍| 4236/4506 [4:49:18<18:44, 4.17s/it]
94%|█████████▍| 4237/4506 [4:49:22<18:36, 4.15s/it]
{'loss': 0.2014, 'grad_norm': 0.49220046401023865, 'learning_rate': 5.44968475197788e-07, 'epoch': 0.94}
94%|█████████▍| 4237/4506 [4:49:22<18:36, 4.15s/it]
94%|█████████▍| 4238/4506 [4:49:26<18:11, 4.07s/it]
{'loss': 0.2037, 'grad_norm': 0.42949289083480835, 'learning_rate': 5.409537419536675e-07, 'epoch': 0.94}
94%|█████████▍| 4238/4506 [4:49:26<18:11, 4.07s/it]
94%|█████████▍| 4239/4506 [4:49:30<17:43, 3.98s/it]
{'loss': 0.202, 'grad_norm': 0.4314068853855133, 'learning_rate': 5.36953689773309e-07, 'epoch': 0.94}
94%|█████████▍| 4239/4506 [4:49:30<17:43, 3.98s/it]
94%|█████████▍| 4240/4506 [4:49:34<17:34, 3.97s/it]
{'loss': 0.1856, 'grad_norm': 0.39578142762184143, 'learning_rate': 5.329683210576697e-07, 'epoch': 0.94}
94%|█████████▍| 4240/4506 [4:49:34<17:34, 3.97s/it]
94%|█████████▍| 4241/4506 [4:49:37<17:20, 3.93s/it]
{'loss': 0.2005, 'grad_norm': 0.42804649472236633, 'learning_rate': 5.289976381988887e-07, 'epoch': 0.94}
94%|█████████▍| 4241/4506 [4:49:37<17:20, 3.93s/it]
94%|█████████▍| 4242/4506 [4:49:42<17:37, 4.01s/it]
{'loss': 0.1998, 'grad_norm': 0.41828298568725586, 'learning_rate': 5.25041643580293e-07, 'epoch': 0.94}
94%|█████████▍| 4242/4506 [4:49:42<17:37, 4.01s/it]
94%|█████████▍| 4243/4506 [4:49:46<17:54, 4.09s/it]
{'loss': 0.199, 'grad_norm': 0.386231929063797, 'learning_rate': 5.211003395763858e-07, 'epoch': 0.94}
94%|█████████▍| 4243/4506 [4:49:46<17:54, 4.09s/it]
94%|█████████▍| 4244/4506 [4:49:50<17:44, 4.06s/it]
{'loss': 0.1917, 'grad_norm': 0.46190139651298523, 'learning_rate': 5.171737285528638e-07, 'epoch': 0.94}
94%|█████████▍| 4244/4506 [4:49:50<17:44, 4.06s/it]
94%|█████████▍| 4245/4506 [4:49:54<17:21, 3.99s/it]
{'loss': 0.1997, 'grad_norm': 0.38199883699417114, 'learning_rate': 5.132618128665889e-07, 'epoch': 0.94}
94%|█████████▍| 4245/4506 [4:49:54<17:21, 3.99s/it]
94%|█████████▍| 4246/4506 [4:49:58<17:50, 4.12s/it]
{'loss': 0.1939, 'grad_norm': 0.3639383614063263, 'learning_rate': 5.093645948656218e-07, 'epoch': 0.94}
94%|█████████▍| 4246/4506 [4:49:58<17:50, 4.12s/it]
94%|█████████▍| 4247/4506 [4:50:02<17:45, 4.12s/it]
{'loss': 0.1974, 'grad_norm': 0.3923373520374298, 'learning_rate': 5.054820768891855e-07, 'epoch': 0.94}
94%|█████████▍| 4247/4506 [4:50:02<17:45, 4.12s/it]
94%|█████████▍| 4248/4506 [4:50:06<17:19, 4.03s/it]
{'loss': 0.1912, 'grad_norm': 0.3962663412094116, 'learning_rate': 5.016142612676883e-07, 'epoch': 0.94}
94%|█████████▍| 4248/4506 [4:50:06<17:19, 4.03s/it]
94%|█████████▍| 4249/4506 [4:50:10<17:03, 3.98s/it]
{'loss': 0.1968, 'grad_norm': 0.38079598546028137, 'learning_rate': 4.977611503227092e-07, 'epoch': 0.94}
94%|█████████▍| 4249/4506 [4:50:10<17:03, 3.98s/it]
94%|█████████▍| 4250/4506 [4:50:14<17:14, 4.04s/it]
{'loss': 0.1953, 'grad_norm': 0.36759206652641296, 'learning_rate': 4.939227463670038e-07, 'epoch': 0.94}
94%|█████████▍| 4250/4506 [4:50:14<17:14, 4.04s/it]
94%|█████████▍| 4251/4506 [4:50:18<17:31, 4.12s/it]
{'loss': 0.2103, 'grad_norm': 0.4163901209831238, 'learning_rate': 4.900990517044956e-07, 'epoch': 0.94}
94%|█████████▍| 4251/4506 [4:50:18<17:31, 4.12s/it]
94%|█████████▍| 4252/4506 [4:50:23<17:32, 4.15s/it]
{'loss': 0.1993, 'grad_norm': 0.4722650945186615, 'learning_rate': 4.862900686302879e-07, 'epoch': 0.94}
94%|█████████▍| 4252/4506 [4:50:23<17:32, 4.15s/it]
94%|█████████▍| 4253/4506 [4:50:27<17:21, 4.12s/it]
{'loss': 0.1946, 'grad_norm': 0.41936033964157104, 'learning_rate': 4.824957994306433e-07, 'epoch': 0.94}
94%|█████████▍| 4253/4506 [4:50:27<17:21, 4.12s/it]
94%|█████████▍| 4254/4506 [4:50:31<17:35, 4.19s/it]
{'loss': 0.1939, 'grad_norm': 0.4243394732475281, 'learning_rate': 4.787162463830014e-07, 'epoch': 0.94}
94%|█████████▍| 4254/4506 [4:50:31<17:35, 4.19s/it]
94%|█████████▍| 4255/4506 [4:50:35<17:31, 4.19s/it]
{'loss': 0.2047, 'grad_norm': 0.4212538003921509, 'learning_rate': 4.7495141175596667e-07, 'epoch': 0.94}
94%|█████████▍| 4255/4506 [4:50:35<17:31, 4.19s/it]
94%|█████████▍| 4256/4506 [4:50:39<17:15, 4.14s/it]
{'loss': 0.1927, 'grad_norm': 0.3844621479511261, 'learning_rate': 4.7120129780929835e-07, 'epoch': 0.94}
94%|█████████▍| 4256/4506 [4:50:39<17:15, 4.14s/it]
94%|█████████▍| 4257/4506 [4:50:43<17:11, 4.14s/it]
{'loss': 0.1962, 'grad_norm': 0.43328821659088135, 'learning_rate': 4.674659067939402e-07, 'epoch': 0.94}
94%|█████████▍| 4257/4506 [4:50:43<17:11, 4.14s/it]
94%|█████████▍| 4258/4506 [4:50:47<16:58, 4.11s/it]
{'loss': 0.2019, 'grad_norm': 0.4036131203174591, 'learning_rate': 4.6374524095197657e-07, 'epoch': 0.95}
94%|█████████▍| 4258/4506 [4:50:47<16:58, 4.11s/it]
95%|█████████▍| 4259/4506 [4:50:51<16:43, 4.06s/it]
{'loss': 0.2021, 'grad_norm': 0.3817920982837677, 'learning_rate': 4.6003930251667103e-07, 'epoch': 0.95}
95%|█████████▍| 4259/4506 [4:50:51<16:43, 4.06s/it]
95%|█████████▍| 4260/4506 [4:50:55<16:37, 4.05s/it]
{'loss': 0.1882, 'grad_norm': 0.3459521532058716, 'learning_rate': 4.563480937124387e-07, 'epoch': 0.95}
95%|█████████▍| 4260/4506 [4:50:55<16:37, 4.05s/it]
95%|█████████▍| 4261/4506 [4:51:00<16:54, 4.14s/it]
{'loss': 0.2029, 'grad_norm': 0.4121132493019104, 'learning_rate': 4.5267161675486e-07, 'epoch': 0.95}
95%|█████████▍| 4261/4506 [4:51:00<16:54, 4.14s/it]
95%|█████████▍| 4262/4506 [4:51:04<16:57, 4.17s/it]
{'loss': 0.1902, 'grad_norm': 0.38445964455604553, 'learning_rate': 4.490098738506615e-07, 'epoch': 0.95}
95%|█████████▍| 4262/4506 [4:51:04<16:57, 4.17s/it]
95%|█████████▍| 4263/4506 [4:51:08<16:31, 4.08s/it]
{'loss': 0.2049, 'grad_norm': 0.360634446144104, 'learning_rate': 4.453628671977378e-07, 'epoch': 0.95}
95%|█████████▍| 4263/4506 [4:51:08<16:31, 4.08s/it]
95%|█████████▍| 4264/4506 [4:51:11<15:51, 3.93s/it]
{'loss': 0.199, 'grad_norm': 0.4663832485675812, 'learning_rate': 4.417305989851267e-07, 'epoch': 0.95}
95%|█████████▍| 4264/4506 [4:51:11<15:51, 3.93s/it]
95%|█████████▍| 4265/4506 [4:51:16<16:11, 4.03s/it]
{'loss': 0.209, 'grad_norm': 0.4073139727115631, 'learning_rate': 4.3811307139303146e-07, 'epoch': 0.95}
95%|█████████▍| 4265/4506 [4:51:16<16:11, 4.03s/it]
95%|█████████▍| 4266/4506 [4:51:20<15:59, 4.00s/it]
{'loss': 0.1938, 'grad_norm': 0.36748164892196655, 'learning_rate': 4.3451028659280413e-07, 'epoch': 0.95}
95%|█████████▍| 4266/4506 [4:51:20<15:59, 4.00s/it]
95%|█████████▍| 4267/4506 [4:51:24<15:54, 3.99s/it]
{'loss': 0.2168, 'grad_norm': 0.46935752034187317, 'learning_rate': 4.309222467469398e-07, 'epoch': 0.95}
95%|█████████▍| 4267/4506 [4:51:24<15:54, 3.99s/it]
95%|█████████▍| 4268/4506 [4:51:28<16:00, 4.04s/it]
{'loss': 0.1937, 'grad_norm': 0.37853074073791504, 'learning_rate': 4.273489540090936e-07, 'epoch': 0.95}
95%|█████████▍| 4268/4506 [4:51:28<16:00, 4.04s/it]
95%|█████████▍| 4269/4506 [4:51:32<15:49, 4.01s/it]
{'loss': 0.187, 'grad_norm': 0.3854079842567444, 'learning_rate': 4.2379041052406364e-07, 'epoch': 0.95}
95%|█████████▍| 4269/4506 [4:51:32<15:49, 4.01s/it]
95%|█████████▍| 4270/4506 [4:51:36<15:49, 4.02s/it]
{'loss': 0.2024, 'grad_norm': 0.41339176893234253, 'learning_rate': 4.2024661842779424e-07, 'epoch': 0.95}
95%|█████████▍| 4270/4506 [4:51:36<15:49, 4.02s/it]
95%|█████████▍| 4271/4506 [4:51:40<16:12, 4.14s/it]
{'loss': 0.2034, 'grad_norm': 0.4418002963066101, 'learning_rate': 4.1671757984737827e-07, 'epoch': 0.95}
95%|█████████▍| 4271/4506 [4:51:40<16:12, 4.14s/it]
95%|█████████▍| 4272/4506 [4:51:44<15:41, 4.02s/it]
{'loss': 0.1922, 'grad_norm': 0.4343132972717285, 'learning_rate': 4.1320329690105466e-07, 'epoch': 0.95}
95%|█████████▍| 4272/4506 [4:51:44<15:41, 4.02s/it]
95%|█████████▍| 4273/4506 [4:51:48<15:34, 4.01s/it]
{'loss': 0.1814, 'grad_norm': 0.3706549406051636, 'learning_rate': 4.097037716981972e-07, 'epoch': 0.95}
95%|█████████▍| 4273/4506 [4:51:48<15:34, 4.01s/it]
95%|█████████▍| 4274/4506 [4:51:52<15:53, 4.11s/it]
{'loss': 0.2047, 'grad_norm': 0.45362594723701477, 'learning_rate': 4.06219006339334e-07, 'epoch': 0.95}
95%|█████████▍| 4274/4506 [4:51:52<15:53, 4.11s/it]
95%|█████████▍| 4275/4506 [4:51:56<15:25, 4.01s/it]
{'loss': 0.1935, 'grad_norm': 0.41657885909080505, 'learning_rate': 4.027490029161196e-07, 'epoch': 0.95}
95%|█████████▍| 4275/4506 [4:51:56<15:25, 4.01s/it]
95%|█████████▍| 4276/4506 [4:52:00<15:40, 4.09s/it]
{'loss': 0.1936, 'grad_norm': 0.3713635802268982, 'learning_rate': 3.9929376351136307e-07, 'epoch': 0.95}
95%|█████████▍| 4276/4506 [4:52:00<15:40, 4.09s/it]
95%|█████████▍| 4277/4506 [4:52:04<15:31, 4.07s/it]
{'loss': 0.1912, 'grad_norm': 0.4105583727359772, 'learning_rate': 3.9585329019899976e-07, 'epoch': 0.95}
95%|█████████▍| 4277/4506 [4:52:04<15:31, 4.07s/it]
95%|█████████▍| 4278/4506 [4:52:08<15:24, 4.05s/it]
{'loss': 0.2084, 'grad_norm': 0.4273221790790558, 'learning_rate': 3.924275850441084e-07, 'epoch': 0.95}
95%|█████████▍| 4278/4506 [4:52:08<15:24, 4.05s/it]
95%|█████████▍| 4279/4506 [4:52:12<15:16, 4.04s/it]
{'loss': 0.1941, 'grad_norm': 0.37864363193511963, 'learning_rate': 3.890166501028997e-07, 'epoch': 0.95}
95%|█████████▍| 4279/4506 [4:52:12<15:16, 4.04s/it]
95%|█████████▍| 4280/4506 [4:52:16<15:03, 4.00s/it]
{'loss': 0.2011, 'grad_norm': 0.4039371609687805, 'learning_rate': 3.856204874227248e-07, 'epoch': 0.95}
95%|█████████▍| 4280/4506 [4:52:16<15:03, 4.00s/it]
95%|█████████▌| 4281/4506 [4:52:20<14:51, 3.96s/it]
{'loss': 0.2072, 'grad_norm': 0.3918601870536804, 'learning_rate': 3.822390990420588e-07, 'epoch': 0.95}
95%|█████████▌| 4281/4506 [4:52:20<14:51, 3.96s/it]
95%|█████████▌| 4282/4506 [4:52:24<14:34, 3.91s/it]
{'loss': 0.1829, 'grad_norm': 0.4117942750453949, 'learning_rate': 3.788724869905169e-07, 'epoch': 0.95}
95%|█████████▌| 4282/4506 [4:52:24<14:34, 3.91s/it]
95%|█████████▌| 4283/4506 [4:52:28<14:32, 3.91s/it]
{'loss': 0.1884, 'grad_norm': 0.36132094264030457, 'learning_rate': 3.755206532888411e-07, 'epoch': 0.95}
95%|█████████▌| 4283/4506 [4:52:28<14:32, 3.91s/it]
95%|█████████▌| 4284/4506 [4:52:32<14:42, 3.98s/it]
{'loss': 0.2071, 'grad_norm': 0.4107201099395752, 'learning_rate': 3.721835999489026e-07, 'epoch': 0.95}
95%|█████████▌| 4284/4506 [4:52:32<14:42, 3.98s/it]
95%|█████████▌| 4285/4506 [4:52:36<14:58, 4.06s/it]
{'loss': 0.1981, 'grad_norm': 0.4339522123336792, 'learning_rate': 3.6886132897370194e-07, 'epoch': 0.95}
95%|█████████▌| 4285/4506 [4:52:36<14:58, 4.06s/it]
95%|█████████▌| 4286/4506 [4:52:40<14:27, 3.94s/it]
{'loss': 0.187, 'grad_norm': 0.35448336601257324, 'learning_rate': 3.6555384235737453e-07, 'epoch': 0.95}
95%|█████████▌| 4286/4506 [4:52:40<14:27, 3.94s/it]
95%|█████████▌| 4287/4506 [4:52:44<14:29, 3.97s/it]
{'loss': 0.2057, 'grad_norm': 0.4653412103652954, 'learning_rate': 3.622611420851657e-07, 'epoch': 0.95}
95%|█████████▌| 4287/4506 [4:52:44<14:29, 3.97s/it]
95%|█████████▌| 4288/4506 [4:52:48<14:33, 4.01s/it]
{'loss': 0.1967, 'grad_norm': 0.35818424820899963, 'learning_rate': 3.5898323013345837e-07, 'epoch': 0.95}
95%|█████████▌| 4288/4506 [4:52:48<14:33, 4.01s/it]
95%|█████████▌| 4289/4506 [4:52:52<14:29, 4.01s/it]
{'loss': 0.2057, 'grad_norm': 0.40239253640174866, 'learning_rate': 3.5572010846975365e-07, 'epoch': 0.95}
95%|█████████▌| 4289/4506 [4:52:52<14:29, 4.01s/it]
95%|█████████▌| 4290/4506 [4:52:56<14:40, 4.07s/it]
{'loss': 0.1975, 'grad_norm': 0.39473438262939453, 'learning_rate': 3.524717790526766e-07, 'epoch': 0.95}
95%|█████████▌| 4290/4506 [4:52:56<14:40, 4.07s/it]
95%|█████████▌| 4291/4506 [4:53:01<14:57, 4.17s/it]
{'loss': 0.19, 'grad_norm': 0.39289605617523193, 'learning_rate': 3.492382438319758e-07, 'epoch': 0.95}
95%|█████████▌| 4291/4506 [4:53:01<14:57, 4.17s/it]
95%|█████████▌| 4292/4506 [4:53:05<15:01, 4.21s/it]
{'loss': 0.2075, 'grad_norm': 0.42396143078804016, 'learning_rate': 3.460195047485126e-07, 'epoch': 0.95}
95%|█████████▌| 4292/4506 [4:53:05<15:01, 4.21s/it]
95%|█████████▌| 4293/4506 [4:53:09<14:48, 4.17s/it]
{'loss': 0.2015, 'grad_norm': 0.42042508721351624, 'learning_rate': 3.428155637342778e-07, 'epoch': 0.95}
95%|█████████▌| 4293/4506 [4:53:09<14:48, 4.17s/it]
95%|█████████▌| 4294/4506 [4:53:13<14:54, 4.22s/it]
{'loss': 0.1973, 'grad_norm': 0.40671148896217346, 'learning_rate': 3.3962642271236913e-07, 'epoch': 0.95}
95%|█████████▌| 4294/4506 [4:53:13<14:54, 4.22s/it]
95%|█████████▌| 4295/4506 [4:53:17<14:29, 4.12s/it]
{'loss': 0.1869, 'grad_norm': 0.36054423451423645, 'learning_rate': 3.3645208359701366e-07, 'epoch': 0.95}
95%|█████████▌| 4295/4506 [4:53:17<14:29, 4.12s/it]
95%|█████████▌| 4296/4506 [4:53:21<14:22, 4.11s/it]
{'loss': 0.1965, 'grad_norm': 0.3937097191810608, 'learning_rate': 3.332925482935345e-07, 'epoch': 0.95}
95%|█████████▌| 4296/4506 [4:53:21<14:22, 4.11s/it]
95%|█████████▌| 4297/4506 [4:53:25<14:18, 4.11s/it]
{'loss': 0.189, 'grad_norm': 0.36567914485931396, 'learning_rate': 3.3014781869838973e-07, 'epoch': 0.95}
95%|█████████▌| 4297/4506 [4:53:25<14:18, 4.11s/it]
95%|█████████▌| 4298/4506 [4:53:29<14:03, 4.06s/it]
{'loss': 0.1875, 'grad_norm': 0.39304813742637634, 'learning_rate': 3.2701789669914165e-07, 'epoch': 0.95}
95%|█████████▌| 4298/4506 [4:53:29<14:03, 4.06s/it]
95%|█████████▌| 4299/4506 [4:53:33<13:55, 4.04s/it]
{'loss': 0.1997, 'grad_norm': 0.3790150284767151, 'learning_rate': 3.239027841744624e-07, 'epoch': 0.95}
95%|█████████▌| 4299/4506 [4:53:33<13:55, 4.04s/it]
95%|█████████▌| 4300/4506 [4:53:37<13:50, 4.03s/it]
{'loss': 0.1983, 'grad_norm': 0.4903334975242615, 'learning_rate': 3.2080248299413416e-07, 'epoch': 0.95}
95%|█████████▌| 4300/4506 [4:53:37<13:50, 4.03s/it]
95%|█████████▌| 4301/4506 [4:53:41<13:35, 3.98s/it]
{'loss': 0.1993, 'grad_norm': 0.42339080572128296, 'learning_rate': 3.1771699501905726e-07, 'epoch': 0.95}
95%|█████████▌| 4301/4506 [4:53:41<13:35, 3.98s/it]
95%|█████████▌| 4302/4506 [4:53:45<13:19, 3.92s/it]
{'loss': 0.1935, 'grad_norm': 0.40354785323143005, 'learning_rate': 3.146463221012336e-07, 'epoch': 0.95}
95%|█████████▌| 4302/4506 [4:53:45<13:19, 3.92s/it]
95%|█████████▌| 4303/4506 [4:53:49<13:45, 4.07s/it]
{'loss': 0.1941, 'grad_norm': 0.4054623544216156, 'learning_rate': 3.1159046608377484e-07, 'epoch': 0.96}
95%|█████████▌| 4303/4506 [4:53:49<13:45, 4.07s/it]
96%|█████████▌| 4304/4506 [4:53:53<13:40, 4.06s/it]
{'loss': 0.1956, 'grad_norm': 0.4171094298362732, 'learning_rate': 3.085494288008972e-07, 'epoch': 0.96}
96%|█████████▌| 4304/4506 [4:53:53<13:40, 4.06s/it]
96%|█████████▌| 4305/4506 [4:53:57<13:26, 4.01s/it]
{'loss': 0.196, 'grad_norm': 0.43415719270706177, 'learning_rate': 3.0552321207792666e-07, 'epoch': 0.96}
96%|█████████▌| 4305/4506 [4:53:57<13:26, 4.01s/it]
96%|█████████▌| 4306/4506 [4:54:01<13:10, 3.95s/it]
{'loss': 0.1979, 'grad_norm': 0.4414050281047821, 'learning_rate': 3.02511817731288e-07, 'epoch': 0.96}
96%|█████████▌| 4306/4506 [4:54:01<13:10, 3.95s/it]
96%|█████████▌| 4307/4506 [4:54:06<13:55, 4.20s/it]
{'loss': 0.2088, 'grad_norm': 0.37499603629112244, 'learning_rate': 2.9951524756851034e-07, 'epoch': 0.96}
96%|█████████▌| 4307/4506 [4:54:06<13:55, 4.20s/it]
96%|█████████▌| 4308/4506 [4:54:10<13:37, 4.13s/it]
{'loss': 0.1884, 'grad_norm': 0.39020439982414246, 'learning_rate': 2.9653350338822995e-07, 'epoch': 0.96}
96%|█████████▌| 4308/4506 [4:54:10<13:37, 4.13s/it]
96%|█████████▌| 4309/4506 [4:54:14<13:10, 4.01s/it]
{'loss': 0.19, 'grad_norm': 0.37543705105781555, 'learning_rate': 2.93566586980179e-07, 'epoch': 0.96}
96%|█████████▌| 4309/4506 [4:54:14<13:10, 4.01s/it]
96%|█████████▌| 4310/4506 [4:54:18<13:21, 4.09s/it]
{'loss': 0.1866, 'grad_norm': 0.40840327739715576, 'learning_rate': 2.906145001251914e-07, 'epoch': 0.96}
96%|█████████▌| 4310/4506 [4:54:18<13:21, 4.09s/it]
96%|█████████▌| 4311/4506 [4:54:22<13:04, 4.02s/it]
{'loss': 0.1909, 'grad_norm': 0.4237089455127716, 'learning_rate': 2.8767724459519973e-07, 'epoch': 0.96}
96%|█████████▌| 4311/4506 [4:54:22<13:04, 4.02s/it]
96%|█████████▌| 4312/4506 [4:54:26<12:58, 4.02s/it]
{'loss': 0.2001, 'grad_norm': 0.4173308312892914, 'learning_rate': 2.847548221532326e-07, 'epoch': 0.96}
96%|█████████▌| 4312/4506 [4:54:26<12:58, 4.02s/it]
96%|█████████▌| 4313/4506 [4:54:30<12:52, 4.00s/it]
{'loss': 0.1968, 'grad_norm': 0.4256829023361206, 'learning_rate': 2.818472345534173e-07, 'epoch': 0.96}
96%|█████████▌| 4313/4506 [4:54:30<12:52, 4.00s/it]
96%|█████████▌| 4314/4506 [4:54:34<12:49, 4.01s/it]
{'loss': 0.1889, 'grad_norm': 0.3972247540950775, 'learning_rate': 2.789544835409774e-07, 'epoch': 0.96}
96%|█████████▌| 4314/4506 [4:54:34<12:49, 4.01s/it]
96%|█████████▌| 4315/4506 [4:54:38<13:20, 4.19s/it]
{'loss': 0.198, 'grad_norm': 0.39972248673439026, 'learning_rate': 2.760765708522295e-07, 'epoch': 0.96}
96%|█████████▌| 4315/4506 [4:54:38<13:20, 4.19s/it]
96%|█████████▌| 4316/4506 [4:54:43<13:13, 4.18s/it]
{'loss': 0.1977, 'grad_norm': 0.39016610383987427, 'learning_rate': 2.7321349821458344e-07, 'epoch': 0.96}
96%|█████████▌| 4316/4506 [4:54:43<13:13, 4.18s/it]
96%|█████████▌| 4317/4506 [4:54:47<13:03, 4.15s/it]
{'loss': 0.1904, 'grad_norm': 0.376995712518692, 'learning_rate': 2.7036526734654234e-07, 'epoch': 0.96}
96%|█████████▌| 4317/4506 [4:54:47<13:03, 4.15s/it]
96%|█████████▌| 4318/4506 [4:54:51<13:08, 4.19s/it]
{'loss': 0.2103, 'grad_norm': 0.35102391242980957, 'learning_rate': 2.675318799577053e-07, 'epoch': 0.96}
96%|█████████▌| 4318/4506 [4:54:51<13:08, 4.19s/it]
96%|█████████▌| 4319/4506 [4:54:55<12:54, 4.14s/it]
{'loss': 0.2076, 'grad_norm': 0.4798688292503357, 'learning_rate': 2.6471333774875375e-07, 'epoch': 0.96}
96%|█████████▌| 4319/4506 [4:54:55<12:54, 4.14s/it]
96%|█████████▌| 4320/4506 [4:54:59<12:43, 4.10s/it]
{'loss': 0.1946, 'grad_norm': 0.39627185463905334, 'learning_rate': 2.61909642411462e-07, 'epoch': 0.96}
96%|█████████▌| 4320/4506 [4:54:59<12:43, 4.10s/it]
96%|█████████▌| 4321/4506 [4:55:03<12:38, 4.10s/it]
{'loss': 0.2089, 'grad_norm': 0.43182623386383057, 'learning_rate': 2.5912079562869504e-07, 'epoch': 0.96}
96%|█████████▌| 4321/4506 [4:55:03<12:38, 4.10s/it]
96%|█████████▌| 4322/4506 [4:55:07<12:21, 4.03s/it]
{'loss': 0.1935, 'grad_norm': 0.4081030488014221, 'learning_rate': 2.5634679907440006e-07, 'epoch': 0.96}
96%|█████████▌| 4322/4506 [4:55:07<12:21, 4.03s/it]
96%|█████████▌| 4323/4506 [4:55:11<12:06, 3.97s/it]
{'loss': 0.2026, 'grad_norm': 0.40152114629745483, 'learning_rate': 2.535876544136201e-07, 'epoch': 0.96}
96%|█████████▌| 4323/4506 [4:55:11<12:06, 3.97s/it]
96%|█████████▌| 4324/4506 [4:55:15<12:28, 4.11s/it]
{'loss': 0.2077, 'grad_norm': 0.36872777342796326, 'learning_rate': 2.5084336330247484e-07, 'epoch': 0.96}
96%|█████████▌| 4324/4506 [4:55:15<12:28, 4.11s/it]
96%|█████████▌| 4325/4506 [4:55:19<12:10, 4.04s/it]
{'loss': 0.1904, 'grad_norm': 0.3836245536804199, 'learning_rate': 2.481139273881716e-07, 'epoch': 0.96}
96%|█████████▌| 4325/4506 [4:55:19<12:10, 4.04s/it]
96%|█████████▌| 4326/4506 [4:55:23<12:02, 4.01s/it]
{'loss': 0.1873, 'grad_norm': 0.37577375769615173, 'learning_rate': 2.4539934830900003e-07, 'epoch': 0.96}
96%|█████████▌| 4326/4506 [4:55:23<12:02, 4.01s/it]
96%|█████████▌| 4327/4506 [4:55:27<12:11, 4.09s/it]
{'loss': 0.1977, 'grad_norm': 0.35429370403289795, 'learning_rate': 2.4269962769433164e-07, 'epoch': 0.96}
96%|█████████▌| 4327/4506 [4:55:27<12:11, 4.09s/it]
96%|█████████▌| 4328/4506 [4:55:31<12:04, 4.07s/it]
{'loss': 0.2051, 'grad_norm': 0.38556843996047974, 'learning_rate': 2.40014767164623e-07, 'epoch': 0.96}
96%|█████████▌| 4328/4506 [4:55:31<12:04, 4.07s/it]
96%|█████████▌| 4329/4506 [4:55:35<12:02, 4.08s/it]
{'loss': 0.1868, 'grad_norm': 0.44040364027023315, 'learning_rate': 2.3734476833141005e-07, 'epoch': 0.96}
96%|█████████▌| 4329/4506 [4:55:35<12:02, 4.08s/it]
96%|█████████▌| 4330/4506 [4:55:39<11:55, 4.07s/it]
{'loss': 0.199, 'grad_norm': 0.37242305278778076, 'learning_rate': 2.3468963279730804e-07, 'epoch': 0.96}
96%|█████████▌| 4330/4506 [4:55:39<11:55, 4.07s/it]
96%|█████████▌| 4331/4506 [4:55:44<11:55, 4.09s/it]
{'loss': 0.1958, 'grad_norm': 0.37694424390792847, 'learning_rate': 2.3204936215600603e-07, 'epoch': 0.96}
96%|█████████▌| 4331/4506 [4:55:44<11:55, 4.09s/it]
96%|█████████▌| 4332/4506 [4:55:48<11:46, 4.06s/it]
{'loss': 0.1893, 'grad_norm': 0.4750857651233673, 'learning_rate': 2.2942395799227523e-07, 'epoch': 0.96}
96%|█████████▌| 4332/4506 [4:55:48<11:46, 4.06s/it]
96%|█████████▌| 4333/4506 [4:55:52<11:48, 4.10s/it]
{'loss': 0.2023, 'grad_norm': 0.4385031461715698, 'learning_rate': 2.2681342188196898e-07, 'epoch': 0.96}
96%|█████████▌| 4333/4506 [4:55:52<11:48, 4.10s/it]
96%|█████████▌| 4334/4506 [4:55:56<11:45, 4.10s/it]
{'loss': 0.1988, 'grad_norm': 0.37110093235969543, 'learning_rate': 2.242177553920033e-07, 'epoch': 0.96}
96%|█████████▌| 4334/4506 [4:55:56<11:45, 4.10s/it]
96%|█████████▌| 4335/4506 [4:56:00<11:47, 4.14s/it]
{'loss': 0.1954, 'grad_norm': 0.4272224009037018, 'learning_rate': 2.2163696008037914e-07, 'epoch': 0.96}
96%|█████████▌| 4335/4506 [4:56:00<11:47, 4.14s/it]
96%|█████████▌| 4336/4506 [4:56:04<11:41, 4.13s/it]
{'loss': 0.1953, 'grad_norm': 0.40391838550567627, 'learning_rate': 2.1907103749616852e-07, 'epoch': 0.96}
96%|█████████▌| 4336/4506 [4:56:04<11:41, 4.13s/it]
96%|█████████▌| 4337/4506 [4:56:08<11:31, 4.09s/it]
{'loss': 0.1879, 'grad_norm': 0.3889187276363373, 'learning_rate': 2.1651998917952e-07, 'epoch': 0.96}
96%|█████████▌| 4337/4506 [4:56:08<11:31, 4.09s/it]
96%|█████████▋| 4338/4506 [4:56:12<11:29, 4.10s/it]
{'loss': 0.2023, 'grad_norm': 0.4151346683502197, 'learning_rate': 2.13983816661642e-07, 'epoch': 0.96}
96%|█████████▋| 4338/4506 [4:56:12<11:29, 4.10s/it]
96%|█████████▋| 4339/4506 [4:56:17<11:54, 4.28s/it]
{'loss': 0.1949, 'grad_norm': 0.35456135869026184, 'learning_rate': 2.114625214648308e-07, 'epoch': 0.96}
96%|█████████▋| 4339/4506 [4:56:17<11:54, 4.28s/it]
96%|█████████▋| 4340/4506 [4:56:21<11:27, 4.14s/it]
{'loss': 0.1978, 'grad_norm': 0.41559383273124695, 'learning_rate': 2.089561051024369e-07, 'epoch': 0.96}
96%|█████████▋| 4340/4506 [4:56:21<11:27, 4.14s/it]
96%|█████████▋| 4341/4506 [4:56:25<11:19, 4.12s/it]
{'loss': 0.1914, 'grad_norm': 0.40752026438713074, 'learning_rate': 2.06464569078893e-07, 'epoch': 0.96}
96%|█████████▋| 4341/4506 [4:56:25<11:19, 4.12s/it]
96%|█████████▋| 4342/4506 [4:56:28<10:49, 3.96s/it]
{'loss': 0.1953, 'grad_norm': 0.42721790075302124, 'learning_rate': 2.039879148896945e-07, 'epoch': 0.96}
96%|█████████▋| 4342/4506 [4:56:28<10:49, 3.96s/it]
96%|█████████▋| 4343/4506 [4:56:33<10:55, 4.02s/it]
{'loss': 0.2042, 'grad_norm': 0.39897647500038147, 'learning_rate': 2.0152614402140224e-07, 'epoch': 0.96}
96%|█████████▋| 4343/4506 [4:56:33<10:55, 4.02s/it]
96%|█████████▋| 4344/4506 [4:56:36<10:41, 3.96s/it]
{'loss': 0.2002, 'grad_norm': 0.39554670453071594, 'learning_rate': 1.9907925795164817e-07, 'epoch': 0.96}
96%|█████████▋| 4344/4506 [4:56:36<10:41, 3.96s/it]
96%|█████████▋| 4345/4506 [4:56:40<10:38, 3.97s/it]
{'loss': 0.18, 'grad_norm': 0.3656444251537323, 'learning_rate': 1.9664725814912688e-07, 'epoch': 0.96}
96%|█████████▋| 4345/4506 [4:56:40<10:38, 3.97s/it]
96%|█████████▋| 4346/4506 [4:56:44<10:32, 3.95s/it]
{'loss': 0.193, 'grad_norm': 0.3687048554420471, 'learning_rate': 1.942301460735957e-07, 'epoch': 0.96}
96%|█████████▋| 4346/4506 [4:56:44<10:32, 3.95s/it]
96%|█████████▋| 4347/4506 [4:56:48<10:34, 3.99s/it]
{'loss': 0.1943, 'grad_norm': 0.3759593665599823, 'learning_rate': 1.9182792317588295e-07, 'epoch': 0.96}
96%|█████████▋| 4347/4506 [4:56:48<10:34, 3.99s/it]
96%|█████████▋| 4348/4506 [4:56:53<10:38, 4.04s/it]
{'loss': 0.2044, 'grad_norm': 0.36217984557151794, 'learning_rate': 1.8944059089787692e-07, 'epoch': 0.97}
96%|█████████▋| 4348/4506 [4:56:53<10:38, 4.04s/it]
97%|█████████▋| 4349/4506 [4:56:57<10:45, 4.11s/it]
{'loss': 0.1945, 'grad_norm': 0.3864218294620514, 'learning_rate': 1.8706815067252304e-07, 'epoch': 0.97}
97%|█████████▋| 4349/4506 [4:56:57<10:45, 4.11s/it]
97%|█████████▋| 4350/4506 [4:57:01<10:46, 4.14s/it]
{'loss': 0.1941, 'grad_norm': 0.3458574116230011, 'learning_rate': 1.84710603923835e-07, 'epoch': 0.97}
97%|█████████▋| 4350/4506 [4:57:01<10:46, 4.14s/it]
97%|█████████▋| 4351/4506 [4:57:05<10:37, 4.11s/it]
{'loss': 0.1932, 'grad_norm': 0.3750353157520294, 'learning_rate': 1.8236795206688084e-07, 'epoch': 0.97}
97%|█████████▋| 4351/4506 [4:57:05<10:37, 4.11s/it]
97%|█████████▋| 4352/4506 [4:57:09<10:32, 4.11s/it]
{'loss': 0.1921, 'grad_norm': 0.366504043340683, 'learning_rate': 1.800401965077969e-07, 'epoch': 0.97}
97%|█████████▋| 4352/4506 [4:57:09<10:32, 4.11s/it]
97%|█████████▋| 4353/4506 [4:57:14<10:39, 4.18s/it]
{'loss': 0.1938, 'grad_norm': 0.39904168248176575, 'learning_rate': 1.7772733864376555e-07, 'epoch': 0.97}
97%|█████████▋| 4353/4506 [4:57:14<10:39, 4.18s/it]
97%|█████████▋| 4354/4506 [4:57:17<10:18, 4.07s/it]
{'loss': 0.1868, 'grad_norm': 0.37574535608291626, 'learning_rate': 1.754293798630402e-07, 'epoch': 0.97}
97%|█████████▋| 4354/4506 [4:57:17<10:18, 4.07s/it]
97%|█████████▋| 4355/4506 [4:57:21<10:07, 4.02s/it]
{'loss': 0.1925, 'grad_norm': 0.4535418450832367, 'learning_rate': 1.7314632154492306e-07, 'epoch': 0.97}
97%|█████████▋| 4355/4506 [4:57:21<10:07, 4.02s/it]
97%|█████████▋| 4356/4506 [4:57:25<10:07, 4.05s/it]
{'loss': 0.2101, 'grad_norm': 0.46078962087631226, 'learning_rate': 1.7087816505977905e-07, 'epoch': 0.97}
97%|█████████▋| 4356/4506 [4:57:25<10:07, 4.05s/it]
97%|█████████▋| 4357/4506 [4:57:29<10:03, 4.05s/it]
{'loss': 0.1932, 'grad_norm': 0.38911888003349304, 'learning_rate': 1.6862491176901919e-07, 'epoch': 0.97}
97%|█████████▋| 4357/4506 [4:57:29<10:03, 4.05s/it]
97%|█████████▋| 4358/4506 [4:57:34<10:06, 4.10s/it]
{'loss': 0.2023, 'grad_norm': 0.490012526512146, 'learning_rate': 1.6638656302511712e-07, 'epoch': 0.97}
97%|█████████▋| 4358/4506 [4:57:34<10:06, 4.10s/it]
97%|█████████▋| 4359/4506 [4:57:38<10:06, 4.13s/it]
{'loss': 0.1878, 'grad_norm': 0.3872903883457184, 'learning_rate': 1.6416312017159818e-07, 'epoch': 0.97}
97%|█████████▋| 4359/4506 [4:57:38<10:06, 4.13s/it]
97%|█████████▋| 4360/4506 [4:57:42<10:06, 4.15s/it]
{'loss': 0.1972, 'grad_norm': 0.3511144816875458, 'learning_rate': 1.6195458454304203e-07, 'epoch': 0.97}
97%|█████████▋| 4360/4506 [4:57:42<10:06, 4.15s/it]
97%|█████████▋| 4361/4506 [4:57:46<09:59, 4.13s/it]
{'loss': 0.1994, 'grad_norm': 0.39049866795539856, 'learning_rate': 1.5976095746507435e-07, 'epoch': 0.97}
97%|█████████▋| 4361/4506 [4:57:46<09:59, 4.13s/it]
97%|█████████▋| 4362/4506 [4:57:50<09:37, 4.01s/it]
{'loss': 0.1961, 'grad_norm': 0.3834914267063141, 'learning_rate': 1.5758224025438084e-07, 'epoch': 0.97}
97%|█████████▋| 4362/4506 [4:57:50<09:37, 4.01s/it]
97%|█████████▋| 4363/4506 [4:57:54<09:40, 4.06s/it]
{'loss': 0.1959, 'grad_norm': 0.38400688767433167, 'learning_rate': 1.5541843421869317e-07, 'epoch': 0.97}
97%|█████████▋| 4363/4506 [4:57:54<09:40, 4.06s/it]
97%|█████████▋| 4364/4506 [4:57:58<09:30, 4.02s/it]
{'loss': 0.1944, 'grad_norm': 0.3980599343776703, 'learning_rate': 1.5326954065679188e-07, 'epoch': 0.97}
97%|█████████▋| 4364/4506 [4:57:58<09:30, 4.02s/it]
97%|█████████▋| 4365/4506 [4:58:02<09:33, 4.06s/it]
{'loss': 0.2035, 'grad_norm': 0.4051656126976013, 'learning_rate': 1.511355608585119e-07, 'epoch': 0.97}
97%|█████████▋| 4365/4506 [4:58:02<09:33, 4.06s/it]
97%|█████████▋| 4366/4506 [4:58:06<09:27, 4.06s/it]
{'loss': 0.2024, 'grad_norm': 0.3877652585506439, 'learning_rate': 1.490164961047258e-07, 'epoch': 0.97}
97%|█████████▋| 4366/4506 [4:58:06<09:27, 4.06s/it]
97%|█████████▋| 4367/4506 [4:58:10<09:27, 4.08s/it]
{'loss': 0.1924, 'grad_norm': 0.4011330008506775, 'learning_rate': 1.4691234766736895e-07, 'epoch': 0.97}
97%|█████████▋| 4367/4506 [4:58:10<09:27, 4.08s/it]
97%|█████████▋| 4368/4506 [4:58:14<09:12, 4.00s/it]
{'loss': 0.1886, 'grad_norm': 0.3629530370235443, 'learning_rate': 1.4482311680941163e-07, 'epoch': 0.97}
97%|█████████▋| 4368/4506 [4:58:14<09:12, 4.00s/it]
97%|█████████▋| 4369/4506 [4:58:18<08:57, 3.93s/it]
{'loss': 0.2015, 'grad_norm': 0.40164536237716675, 'learning_rate': 1.4274880478487296e-07, 'epoch': 0.97}
97%|█████████▋| 4369/4506 [4:58:18<08:57, 3.93s/it]
97%|█████████▋| 4370/4506 [4:58:22<08:56, 3.95s/it]
{'loss': 0.2003, 'grad_norm': 0.35751548409461975, 'learning_rate': 1.4068941283881809e-07, 'epoch': 0.97}
97%|█████████▋| 4370/4506 [4:58:22<08:56, 3.95s/it]
97%|█████████▋| 4371/4506 [4:58:26<09:01, 4.01s/it]
{'loss': 0.1991, 'grad_norm': 0.41939443349838257, 'learning_rate': 1.3864494220735825e-07, 'epoch': 0.97}
97%|█████████▋| 4371/4506 [4:58:26<09:01, 4.01s/it]
97%|█████████▋| 4372/4506 [4:58:30<09:04, 4.06s/it]
{'loss': 0.1873, 'grad_norm': 0.38831833004951477, 'learning_rate': 1.3661539411764513e-07, 'epoch': 0.97}
97%|█████████▋| 4372/4506 [4:58:30<09:04, 4.06s/it]
97%|█████████▋| 4373/4506 [4:58:34<09:07, 4.12s/it]
{'loss': 0.1881, 'grad_norm': 0.3968115448951721, 'learning_rate': 1.346007697878765e-07, 'epoch': 0.97}
97%|█████████▋| 4373/4506 [4:58:35<09:07, 4.12s/it]
97%|█████████▋| 4374/4506 [4:58:39<09:28, 4.31s/it]
{'loss': 0.1903, 'grad_norm': 0.3392868936061859, 'learning_rate': 1.3260107042729342e-07, 'epoch': 0.97}
97%|█████████▋| 4374/4506 [4:58:39<09:28, 4.31s/it]
97%|█████████▋| 4375/4506 [4:58:43<09:03, 4.15s/it]
{'loss': 0.2063, 'grad_norm': 0.4355619251728058, 'learning_rate': 1.3061629723617185e-07, 'epoch': 0.97}
97%|█████████▋| 4375/4506 [4:58:43<09:03, 4.15s/it]
97%|█████████▋| 4376/4506 [4:58:47<08:59, 4.15s/it]
{'loss': 0.1954, 'grad_norm': 0.4065239727497101, 'learning_rate': 1.2864645140583386e-07, 'epoch': 0.97}
97%|█████████▋| 4377/4506 [4:58:51<08:52, 4.13s/it]
{'loss': 0.2032, 'grad_norm': 0.4016261398792267, 'learning_rate': 1.2669153411864476e-07, 'epoch': 0.97}
97%|█████████▋| 4378/4506 [4:58:55<08:51, 4.16s/it]
{'loss': 0.197, 'grad_norm': 0.4030523896217346, 'learning_rate': 1.2475154654799927e-07, 'epoch': 0.97}
97%|█████████▋| 4379/4506 [4:58:59<08:37, 4.07s/it]
{'loss': 0.1961, 'grad_norm': 0.4606389105319977, 'learning_rate': 1.2282648985834088e-07, 'epoch': 0.97}
97%|█████████▋| 4380/4506 [4:59:03<08:30, 4.05s/it]
{'loss': 0.1962, 'grad_norm': 0.43388456106185913, 'learning_rate': 1.2091636520515093e-07, 'epoch': 0.97}
97%|█████████▋| 4381/4506 [4:59:08<08:38, 4.15s/it]
{'loss': 0.1989, 'grad_norm': 0.42626461386680603, 'learning_rate': 1.1902117373493727e-07, 'epoch': 0.97}
97%|█████████▋| 4382/4506 [4:59:12<08:40, 4.20s/it]
{'loss': 0.1986, 'grad_norm': 0.39457935094833374, 'learning_rate': 1.1714091658525383e-07, 'epoch': 0.97}
97%|█████████▋| 4383/4506 [4:59:16<08:36, 4.20s/it]
{'loss': 0.1905, 'grad_norm': 0.41041505336761475, 'learning_rate': 1.1527559488468953e-07, 'epoch': 0.97}
97%|█████████▋| 4384/4506 [4:59:20<08:23, 4.12s/it]
{'loss': 0.2095, 'grad_norm': 0.4282996356487274, 'learning_rate': 1.1342520975286541e-07, 'epoch': 0.97}
97%|█████████▋| 4385/4506 [4:59:24<08:13, 4.08s/it]
{'loss': 0.1957, 'grad_norm': 0.4384167492389679, 'learning_rate': 1.115897623004375e-07, 'epoch': 0.97}
97%|█████████▋| 4386/4506 [4:59:28<07:49, 3.91s/it]
{'loss': 0.196, 'grad_norm': 0.377660870552063, 'learning_rate': 1.0976925362910507e-07, 'epoch': 0.97}
97%|█████████▋| 4387/4506 [4:59:32<07:55, 3.99s/it]
{'loss': 0.1822, 'grad_norm': 0.3905128240585327, 'learning_rate': 1.0796368483158293e-07, 'epoch': 0.97}
97%|█████████▋| 4388/4506 [4:59:36<07:42, 3.92s/it]
{'loss': 0.1908, 'grad_norm': 0.381356418132782, 'learning_rate': 1.0617305699163472e-07, 'epoch': 0.97}
97%|█████████▋| 4389/4506 [4:59:40<07:45, 3.98s/it]
{'loss': 0.1989, 'grad_norm': 0.389679878950119, 'learning_rate': 1.0439737118404791e-07, 'epoch': 0.97}
97%|█████████▋| 4390/4506 [4:59:44<07:56, 4.11s/it]
{'loss': 0.1907, 'grad_norm': 0.3809577524662018, 'learning_rate': 1.0263662847464217e-07, 'epoch': 0.97}
97%|█████████▋| 4391/4506 [4:59:48<07:52, 4.11s/it]
{'loss': 0.197, 'grad_norm': 0.3929407596588135, 'learning_rate': 1.0089082992026932e-07, 'epoch': 0.97}
97%|█████████▋| 4392/4506 [4:59:53<07:55, 4.17s/it]
{'loss': 0.2011, 'grad_norm': 0.40849316120147705, 'learning_rate': 9.915997656881337e-08, 'epoch': 0.97}
97%|█████████▋| 4393/4506 [4:59:57<07:44, 4.11s/it]
{'loss': 0.1923, 'grad_norm': 0.35472968220710754, 'learning_rate': 9.744406945918217e-08, 'epoch': 0.98}
98%|█████████▊| 4394/4506 [5:00:01<07:43, 4.14s/it]
{'loss': 0.209, 'grad_norm': 0.3982284367084503, 'learning_rate': 9.574310962131294e-08, 'epoch': 0.98}
98%|█████████▊| 4395/4506 [5:00:05<07:35, 4.10s/it]
{'loss': 0.19, 'grad_norm': 0.41561421751976013, 'learning_rate': 9.405709807618068e-08, 'epoch': 0.98}
98%|█████████▊| 4396/4506 [5:00:09<07:43, 4.22s/it]
{'loss': 0.2036, 'grad_norm': 0.4491631090641022, 'learning_rate': 9.238603583577588e-08, 'epoch': 0.98}
98%|█████████▊| 4397/4506 [5:00:13<07:34, 4.17s/it]
{'loss': 0.1929, 'grad_norm': 0.39441463351249695, 'learning_rate': 9.072992390312118e-08, 'epoch': 0.98}
98%|█████████▊| 4398/4506 [5:00:17<07:29, 4.16s/it]
{'loss': 0.1914, 'grad_norm': 0.4138943552970886, 'learning_rate': 8.90887632722659e-08, 'epoch': 0.98}
98%|█████████▊| 4399/4506 [5:00:21<07:16, 4.08s/it]
{'loss': 0.1966, 'grad_norm': 0.4126277565956116, 'learning_rate': 8.746255492828592e-08, 'epoch': 0.98}
98%|█████████▊| 4400/4506 [5:00:25<07:06, 4.02s/it]
{'loss': 0.1839, 'grad_norm': 0.38444146513938904, 'learning_rate': 8.585129984727825e-08, 'epoch': 0.98}
98%|█████████▊| 4401/4506 [5:00:29<07:06, 4.06s/it]
{'loss': 0.2054, 'grad_norm': 0.4629722237586975, 'learning_rate': 8.425499899637202e-08, 'epoch': 0.98}
98%|█████████▊| 4402/4506 [5:00:33<07:01, 4.05s/it]
{'loss': 0.202, 'grad_norm': 0.3934548795223236, 'learning_rate': 8.267365333370913e-08, 'epoch': 0.98}
98%|█████████▊| 4403/4506 [5:00:38<07:12, 4.20s/it]
{'loss': 0.1845, 'grad_norm': 0.3522265553474426, 'learning_rate': 8.110726380846367e-08, 'epoch': 0.98}
98%|█████████▊| 4404/4506 [5:00:42<07:09, 4.21s/it]
{'loss': 0.1929, 'grad_norm': 0.37829992175102234, 'learning_rate': 7.955583136083356e-08, 'epoch': 0.98}
98%|█████████▊| 4405/4506 [5:00:46<07:05, 4.21s/it]
{'loss': 0.1915, 'grad_norm': 0.411201536655426, 'learning_rate': 7.801935692203499e-08, 'epoch': 0.98}
98%|█████████▊| 4406/4506 [5:00:50<06:45, 4.06s/it]
{'loss': 0.2086, 'grad_norm': 0.48577943444252014, 'learning_rate': 7.64978414143025e-08, 'epoch': 0.98}
98%|█████████▊| 4407/4506 [5:00:54<06:51, 4.16s/it]
{'loss': 0.1904, 'grad_norm': 0.3838304579257965, 'learning_rate': 7.499128575089998e-08, 'epoch': 0.98}
98%|█████████▊| 4408/4506 [5:00:59<06:46, 4.15s/it]
{'loss': 0.1946, 'grad_norm': 0.36804404854774475, 'learning_rate': 7.349969083610686e-08, 'epoch': 0.98}
98%|█████████▊| 4409/4506 [5:01:03<06:50, 4.24s/it]
{'loss': 0.1917, 'grad_norm': 0.38174277544021606, 'learning_rate': 7.202305756522088e-08, 'epoch': 0.98}
98%|█████████▊| 4410/4506 [5:01:07<06:42, 4.19s/it]
{'loss': 0.1892, 'grad_norm': 0.3816530406475067, 'learning_rate': 7.056138682456637e-08, 'epoch': 0.98}
98%|█████████▊| 4411/4506 [5:01:11<06:34, 4.15s/it]
{'loss': 0.1983, 'grad_norm': 0.39377912878990173, 'learning_rate': 6.911467949148043e-08, 'epoch': 0.98}
98%|█████████▊| 4412/4506 [5:01:15<06:21, 4.06s/it]
{'loss': 0.1928, 'grad_norm': 0.4007073938846588, 'learning_rate': 6.7682936434324e-08, 'epoch': 0.98}
98%|█████████▊| 4413/4506 [5:01:19<06:02, 3.90s/it]
{'loss': 0.1924, 'grad_norm': 0.4340749680995941, 'learning_rate': 6.626615851246798e-08, 'epoch': 0.98}
98%|█████████▊| 4414/4506 [5:01:23<06:14, 4.07s/it]
{'loss': 0.2065, 'grad_norm': 0.36522606015205383, 'learning_rate': 6.48643465763099e-08, 'epoch': 0.98}
98%|█████████▊| 4415/4506 [5:01:27<06:07, 4.04s/it]
{'loss': 0.204, 'grad_norm': 0.4291992783546448, 'learning_rate': 6.347750146725728e-08, 'epoch': 0.98}
98%|█████████▊| 4416/4506 [5:01:31<06:03, 4.04s/it]
{'loss': 0.2, 'grad_norm': 0.4101126194000244, 'learning_rate': 6.21056240177359e-08, 'epoch': 0.98}
98%|█████████▊| 4417/4506 [5:01:35<05:49, 3.93s/it]
{'loss': 0.2018, 'grad_norm': 0.38705042004585266, 'learning_rate': 6.074871505118984e-08, 'epoch': 0.98}
98%|█████████▊| 4418/4506 [5:01:39<05:52, 4.01s/it]
{'loss': 0.2034, 'grad_norm': 0.37562552094459534, 'learning_rate': 5.940677538207873e-08, 'epoch': 0.98}
98%|█████████▊| 4419/4506 [5:01:43<05:45, 3.97s/it]
{'loss': 0.2011, 'grad_norm': 0.38460415601730347, 'learning_rate': 5.8079805815872136e-08, 'epoch': 0.98}
98%|█████████▊| 4420/4506 [5:01:47<05:41, 3.98s/it]
{'loss': 0.201, 'grad_norm': 0.38045594096183777, 'learning_rate': 5.676780714906349e-08, 'epoch': 0.98}
98%|█████████▊| 4421/4506 [5:01:51<05:44, 4.05s/it]
{'loss': 0.1942, 'grad_norm': 0.3696591258049011, 'learning_rate': 5.5470780169147864e-08, 'epoch': 0.98}
98%|█████████▊| 4422/4506 [5:01:55<05:36, 4.01s/it]
{'loss': 0.1876, 'grad_norm': 0.3725655674934387, 'learning_rate': 5.4188725654641396e-08, 'epoch': 0.98}
98%|█████████▊| 4423/4506 [5:01:59<05:36, 4.06s/it]
{'loss': 0.1834, 'grad_norm': 0.3618480861186981, 'learning_rate': 5.292164437507574e-08, 'epoch': 0.98}
98%|█████████▊| 4424/4506 [5:02:03<05:27, 4.00s/it]
{'loss': 0.1826, 'grad_norm': 0.40344762802124023, 'learning_rate': 5.166953709098699e-08, 'epoch': 0.98}
98%|█████████▊| 4425/4506 [5:02:07<05:20, 3.96s/it]
{'loss': 0.196, 'grad_norm': 0.38459512591362, 'learning_rate': 5.043240455393228e-08, 'epoch': 0.98}
98%|█████████▊| 4426/4506 [5:02:11<05:11, 3.90s/it]
{'loss': 0.1894, 'grad_norm': 0.3862270414829254, 'learning_rate': 4.921024750647596e-08, 'epoch': 0.98}
98%|█████████▊| 4427/4506 [5:02:15<05:24, 4.11s/it]
{'loss': 0.1883, 'grad_norm': 0.38287532329559326, 'learning_rate': 4.800306668218957e-08, 'epoch': 0.98}
98%|█████████▊| 4428/4506 [5:02:19<05:23, 4.15s/it]
{'loss': 0.1975, 'grad_norm': 0.40969061851501465, 'learning_rate': 4.6810862805662935e-08, 'epoch': 0.98}
98%|█████████▊| 4429/4506 [5:02:23<05:15, 4.09s/it]
{'loss': 0.1881, 'grad_norm': 0.3618950843811035, 'learning_rate': 4.563363659249309e-08, 'epoch': 0.98}
98%|█████████▊| 4430/4506 [5:02:27<05:08, 4.07s/it]
{'loss': 0.1867, 'grad_norm': 0.3809279799461365, 'learning_rate': 4.4471388749287026e-08, 'epoch': 0.98}
98%|█████████▊| 4431/4506 [5:02:31<05:04, 4.07s/it]
{'loss': 0.1945, 'grad_norm': 0.36396142840385437, 'learning_rate': 4.3324119973661706e-08, 'epoch': 0.98}
98%|█████████▊| 4432/4506 [5:02:35<04:53, 3.97s/it]
{'loss': 0.1949, 'grad_norm': 0.40018296241760254, 'learning_rate': 4.219183095424128e-08, 'epoch': 0.98}
98%|█████████▊| 4433/4506 [5:02:39<04:53, 4.02s/it]
{'loss': 0.1894, 'grad_norm': 0.40137308835983276, 'learning_rate': 4.107452237065989e-08, 'epoch': 0.98}
98%|█████████▊| 4434/4506 [5:02:43<04:50, 4.04s/it]
{'loss': 0.1969, 'grad_norm': 0.39191415905952454, 'learning_rate': 3.997219489356163e-08, 'epoch': 0.98}
98%|█████████▊| 4435/4506 [5:02:47<04:45, 4.02s/it]
{'loss': 0.1959, 'grad_norm': 0.42400720715522766, 'learning_rate': 3.8884849184595004e-08, 'epoch': 0.98}
98%|█████████▊| 4436/4506 [5:02:51<04:39, 4.00s/it]
{'loss': 0.1917, 'grad_norm': 0.4293706715106964, 'learning_rate': 3.781248589642128e-08, 'epoch': 0.98}
98%|█████████▊| 4437/4506 [5:02:55<04:29, 3.90s/it]
{'loss': 0.1871, 'grad_norm': 0.405478298664093, 'learning_rate': 3.675510567270335e-08, 'epoch': 0.98}
98%|█████████▊| 4438/4506 [5:03:00<04:39, 4.11s/it]
{'loss': 0.1893, 'grad_norm': 0.41249749064445496, 'learning_rate': 3.5712709148111314e-08, 'epoch': 0.99}
99%|█████████▊| 4439/4506 [5:03:04<04:34, 4.09s/it]
{'loss': 0.1874, 'grad_norm': 0.36135855317115784, 'learning_rate': 3.468529694832801e-08, 'epoch': 0.99}
99%|█████████▊| 4440/4506 [5:03:08<04:33, 4.14s/it]
{'loss': 0.2077, 'grad_norm': 0.4113578200340271, 'learning_rate': 3.367286969003236e-08, 'epoch': 0.99}
99%|█████████▊| 4441/4506 [5:03:13<04:39, 4.29s/it]
{'loss': 0.1975, 'grad_norm': 0.3767612874507904, 'learning_rate': 3.2675427980916054e-08, 'epoch': 0.99}
99%|█████████▊| 4442/4506 [5:03:17<04:27, 4.19s/it]
{'loss': 0.1968, 'grad_norm': 0.37348419427871704, 'learning_rate': 3.169297241967795e-08, 'epoch': 0.99}
99%|█████████▊| 4443/4506 [5:03:21<04:24, 4.19s/it]
{'loss': 0.182, 'grad_norm': 0.354137122631073, 'learning_rate': 3.072550359601023e-08, 'epoch': 0.99}
99%|█████████▊| 4444/4506 [5:03:25<04:13, 4.09s/it]
{'loss': 0.1961, 'grad_norm': 0.3708319365978241, 'learning_rate': 2.9773022090623382e-08, 'epoch': 0.99}
99%|█████████▊| 4445/4506 [5:03:29<04:12, 4.14s/it]
{'loss': 0.1891, 'grad_norm': 0.34286513924598694, 'learning_rate': 2.8835528475221197e-08, 'epoch': 0.99}
99%|█████████▊| 4446/4506 [5:03:33<04:13, 4.22s/it]
{'loss': 0.1981, 'grad_norm': 0.450753390789032, 'learning_rate': 2.7913023312520215e-08, 'epoch': 0.99}
99%|█████████▊| 4447/4506 [5:03:37<04:03, 4.12s/it]
{'loss': 0.2021, 'grad_norm': 0.420258104801178, 'learning_rate': 2.7005507156230293e-08, 'epoch': 0.99}
99%|█████████▊| 4448/4506 [5:03:41<04:00, 4.14s/it]
{'loss': 0.1889, 'grad_norm': 0.45200926065444946, 'learning_rate': 2.6112980551076804e-08, 'epoch': 0.99}
99%|█████████▊| 4449/4506 [5:03:46<04:02, 4.25s/it]
{'loss': 0.1963, 'grad_norm': 0.393063485622406, 'learning_rate': 2.5235444032778444e-08, 'epoch': 0.99}
99%|█████████▉| 4450/4506 [5:03:50<03:55, 4.21s/it]
{'loss': 0.197, 'grad_norm': 0.5024051070213318, 'learning_rate': 2.437289812805832e-08, 'epoch': 0.99}
99%|█████████▉| 4451/4506 [5:03:54<03:53, 4.24s/it]
{'loss': 0.1956, 'grad_norm': 0.364459753036499, 'learning_rate': 2.3525343354643957e-08, 'epoch': 0.99}
99%|█████████▉| 4452/4506 [5:03:58<03:43, 4.14s/it]
{'loss': 0.1869, 'grad_norm': 0.38877996802330017, 'learning_rate': 2.269278022126453e-08, 'epoch': 0.99}
99%|█████████▉| 4453/4506 [5:04:02<03:35, 4.06s/it]
{'loss': 0.1937, 'grad_norm': 0.37821948528289795, 'learning_rate': 2.18752092276453e-08, 'epoch': 0.99}
99%|█████████▉| 4454/4506 [5:04:06<03:35, 4.14s/it]
{'loss': 0.1976, 'grad_norm': 0.3742591440677643, 'learning_rate': 2.1072630864524268e-08, 'epoch': 0.99}
99%|█████████▉| 4455/4506 [5:04:11<03:36, 4.24s/it]
{'loss': 0.2042, 'grad_norm': 0.3873163163661957, 'learning_rate': 2.0285045613627206e-08, 'epoch': 0.99}
99%|█████████▉| 4456/4506 [5:04:15<03:34, 4.28s/it]
{'loss': 0.1911, 'grad_norm': 0.3561249077320099, 'learning_rate': 1.9512453947689858e-08, 'epoch': 0.99}
99%|█████████▉| 4457/4506 [5:04:19<03:25, 4.19s/it]
{'loss': 0.1906, 'grad_norm': 0.4374772608280182, 'learning_rate': 1.875485633044405e-08, 'epoch': 0.99}
99%|█████████▉| 4458/4506 [5:04:23<03:18, 4.13s/it]
{'loss': 0.1968, 'grad_norm': 0.3969595432281494, 'learning_rate': 1.8012253216623254e-08, 'epoch': 0.99}
99%|█████████▉| 4459/4506 [5:04:27<03:16, 4.17s/it]
{'loss': 0.2088, 'grad_norm': 0.37909266352653503, 'learning_rate': 1.7284645051962588e-08, 'epoch': 0.99}
99%|█████████▉| 4460/4506 [5:04:31<03:10, 4.13s/it]
{'loss': 0.2047, 'grad_norm': 0.42636585235595703, 'learning_rate': 1.6572032273190484e-08, 'epoch': 0.99}
99%|█████████▉| 4461/4506 [5:04:35<03:02, 4.07s/it]
{'loss': 0.1922, 'grad_norm': 0.4033891558647156, 'learning_rate': 1.5874415308042567e-08, 'epoch': 0.99}
99%|█████████▉| 4462/4506 [5:04:40<03:01, 4.13s/it]
{'loss': 0.2054, 'grad_norm': 0.39610961079597473, 'learning_rate': 1.5191794575245e-08, 'epoch': 0.99}
99%|█████████▉| 4463/4506 [5:04:43<02:52, 4.00s/it]
{'loss': 0.1969, 'grad_norm': 0.3967750668525696, 'learning_rate': 1.4524170484533916e-08, 'epoch': 0.99}
99%|█████████▉| 4464/4506 [5:04:48<02:51, 4.08s/it]
{'loss': 0.1898, 'grad_norm': 0.38838955760002136, 'learning_rate': 1.3871543436633217e-08, 'epoch': 0.99}
99%|█████████▉| 4465/4506 [5:04:51<02:42, 3.97s/it]
{'loss': 0.2039, 'grad_norm': 0.45959770679473877, 'learning_rate': 1.3233913823271216e-08, 'epoch': 0.99}
99%|█████████▉| 4466/4506 [5:04:56<02:42, 4.05s/it]
{'loss': 0.1908, 'grad_norm': 0.3740370571613312, 'learning_rate': 1.2611282027169546e-08, 'epoch': 0.99}
99%|█████████▉| 4467/4506 [5:05:00<02:41, 4.15s/it]
{'loss': 0.2046, 'grad_norm': 0.39028802514076233, 'learning_rate': 1.2003648422057034e-08, 'epoch': 0.99}
99%|█████████▉| 4468/4506 [5:05:04<02:39, 4.20s/it]
{'loss': 0.1902, 'grad_norm': 0.383316308259964, 'learning_rate': 1.1411013372647494e-08, 'epoch': 0.99}
99%|█████████▉| 4469/4506 [5:05:09<02:37, 4.26s/it]
{'loss': 0.1953, 'grad_norm': 0.3991543650627136, 'learning_rate': 1.0833377234661934e-08, 'epoch': 0.99}
99%|█████████▉| 4470/4506 [5:05:13<02:30, 4.17s/it]
{'loss': 0.2013, 'grad_norm': 0.3941512703895569, 'learning_rate': 1.0270740354814678e-08, 'epoch': 0.99}
99%|█████████▉| 4471/4506 [5:05:17<02:23, 4.09s/it]
{'loss': 0.2004, 'grad_norm': 0.39600545167922974, 'learning_rate': 9.723103070816142e-09, 'epoch': 0.99}
99%|█████████▉| 4472/4506 [5:05:20<02:17, 4.04s/it]
{'loss': 0.1888, 'grad_norm': 0.3457690477371216, 'learning_rate': 9.190465711375607e-09, 'epoch': 0.99}
99%|█████████▉| 4473/4506 [5:05:25<02:14, 4.08s/it]
{'loss': 0.1881, 'grad_norm': 0.39754870533943176, 'learning_rate': 8.672828596198446e-09, 'epoch': 0.99}
99%|█████████▉| 4474/4506 [5:05:29<02:10, 4.06s/it]
{'loss': 0.1862, 'grad_norm': 0.38025134801864624, 'learning_rate': 8.1701920359889e-09, 'epoch': 0.99}
99%|█████████▉| 4475/4506 [5:05:33<02:04, 4.03s/it]
{'loss': 0.1958, 'grad_norm': 0.39292505383491516, 'learning_rate': 7.682556332438972e-09, 'epoch': 0.99}
99%|█████████▉| 4476/4506 [5:05:36<01:57, 3.92s/it]
{'loss': 0.1991, 'grad_norm': 0.45937028527259827, 'learning_rate': 7.2099217782450874e-09, 'epoch': 0.99}
99%|█████████▉| 4477/4506 [5:05:41<01:57, 4.04s/it]
{'loss': 0.2067, 'grad_norm': 0.4161739945411682, 'learning_rate': 6.752288657099759e-09, 'epoch': 0.99}
99%|█████████▉| 4478/4506 [5:05:45<01:55, 4.14s/it]
{'loss': 0.196, 'grad_norm': 0.37289005517959595, 'learning_rate': 6.30965724368604e-09, 'epoch': 0.99}
99%|█████████▉| 4479/4506 [5:05:49<01:51, 4.12s/it]
{'loss': 0.1918, 'grad_norm': 0.37138694524765015, 'learning_rate': 5.882027803683077e-09, 'epoch': 0.99}
99%|█████████▉| 4480/4506 [5:05:53<01:44, 4.01s/it]
{'loss': 0.2016, 'grad_norm': 0.43162256479263306, 'learning_rate': 5.469400593768881e-09, 'epoch': 0.99}
99%|█████████▉| 4481/4506 [5:05:57<01:42, 4.12s/it]
{'loss': 0.1863, 'grad_norm': 0.38378581404685974, 'learning_rate': 5.07177586161478e-09, 'epoch': 0.99}
99%|█████████▉| 4482/4506 [5:06:01<01:36, 4.01s/it]
{'loss': 0.1868, 'grad_norm': 0.40373045206069946, 'learning_rate': 4.6891538458881944e-09, 'epoch': 0.99}
99%|█████████▉| 4483/4506 [5:06:05<01:32, 4.00s/it]
{'loss': 0.1991, 'grad_norm': 0.4014822840690613, 'learning_rate': 4.321534776247082e-09, 'epoch': 1.0}
100%|█████████▉| 4484/4506 [5:06:09<01:30, 4.10s/it]
{'loss': 0.2079, 'grad_norm': 0.44239649176597595, 'learning_rate': 3.968918873351046e-09, 'epoch': 1.0}
100%|█████████▉| 4485/4506 [5:06:13<01:26, 4.12s/it]
{'loss': 0.1859, 'grad_norm': 0.3805749714374542, 'learning_rate': 3.6313063488502277e-09, 'epoch': 1.0}
100%|█████████▉| 4486/4506 [5:06:17<01:20, 4.03s/it]
{'loss': 0.1909, 'grad_norm': 0.3958582580089569, 'learning_rate': 3.3086974053880837e-09, 'epoch': 1.0}
100%|█████████▉| 4487/4506 [5:06:21<01:16, 4.02s/it]
{'loss': 0.1912, 'grad_norm': 0.40308713912963867, 'learning_rate': 3.0010922366069395e-09, 'epoch': 1.0}
100%|█████████▉| 4488/4506 [5:06:25<01:13, 4.09s/it]
{'loss': 0.1847, 'grad_norm': 0.38255465030670166, 'learning_rate': 2.708491027139659e-09, 'epoch': 1.0}
100%|█████████▉| 4489/4506 [5:06:30<01:10, 4.17s/it]
{'loss': 0.1962, 'grad_norm': 0.36823931336402893, 'learning_rate': 2.430893952612423e-09, 'epoch': 1.0}
100%|█████████▉| 4490/4506 [5:06:34<01:05, 4.12s/it]
{'loss': 0.1948, 'grad_norm': 0.4268641471862793, 'learning_rate': 2.1683011796502783e-09, 'epoch': 1.0}
100%|█████████▉| 4491/4506 [5:06:38<01:02, 4.14s/it]
{'loss': 0.1959, 'grad_norm': 0.37856996059417725, 'learning_rate': 1.920712865868812e-09, 'epoch': 1.0}
100%|█████████▉| 4492/4506 [5:06:42<00:56, 4.05s/it]
{'loss': 0.1917, 'grad_norm': 0.3500954210758209, 'learning_rate': 1.6881291598769277e-09, 'epoch': 1.0}
100%|█████████▉| 4493/4506 [5:06:46<00:53, 4.13s/it]
{'loss': 0.1994, 'grad_norm': 0.3661535680294037, 'learning_rate': 1.4705502012796192e-09, 'epoch': 1.0}
100%|█████████▉| 4494/4506 [5:06:51<00:50, 4.20s/it]
{'loss': 0.2001, 'grad_norm': 0.43849143385887146, 'learning_rate': 1.2679761206724206e-09, 'epoch': 1.0}
100%|█████████▉| 4495/4506 [5:06:55<00:46, 4.21s/it]
{'loss': 0.1954, 'grad_norm': 0.3684777319431305, 'learning_rate': 1.0804070396497335e-09, 'epoch': 1.0}
100%|█████████▉| 4496/4506 [5:06:59<00:42, 4.23s/it]
{'loss': 0.2094, 'grad_norm': 0.3938392996788025, 'learning_rate': 9.078430707937235e-10, 'epoch': 1.0}
100%|█████████▉| 4497/4506 [5:07:03<00:37, 4.19s/it]
{'loss': 0.1985, 'grad_norm': 0.3816867470741272, 'learning_rate': 7.50284317682648e-10, 'epoch': 1.0}
100%|█████████▉| 4498/4506 [5:07:07<00:32, 4.12s/it]
{'loss': 0.2021, 'grad_norm': 0.36776986718177795, 'learning_rate': 6.077308748880795e-10, 'epoch': 1.0}
100%|█████████▉| 4499/4506 [5:07:11<00:28, 4.11s/it]
{'loss': 0.2021, 'grad_norm': 0.4480020999908447, 'learning_rate': 4.801828279776821e-10, 'epoch': 1.0}
100%|█████████▉| 4500/4506 [5:07:15<00:25, 4.17s/it]
{'loss': 0.1993, 'grad_norm': 0.4031181335449219, 'learning_rate': 3.676402535068846e-10, 'epoch': 1.0}
100%|█████████▉| 4501/4506 [5:07:19<00:20, 4.12s/it]
{'loss': 0.198, 'grad_norm': 0.36655470728874207, 'learning_rate': 2.701032190272068e-10, 'epoch': 1.0}
100%|█████████▉| 4502/4506 [5:07:24<00:16, 4.10s/it]
{'loss': 0.1924, 'grad_norm': 0.43112707138061523, 'learning_rate': 1.8757178308348445e-10, 'epoch': 1.0}
100%|█████████▉| 4503/4506 [5:07:28<00:12, 4.17s/it]
{'loss': 0.1997, 'grad_norm': 0.40285834670066833, 'learning_rate': 1.200459952166444e-10, 'epoch': 1.0}
100%|█████████▉| 4504/4506 [5:07:32<00:08, 4.09s/it]
{'loss': 0.1926, 'grad_norm': 0.37464457750320435, 'learning_rate': 6.752589595260261e-11, 'epoch': 1.0}
100%|█████████▉| 4505/4506 [5:07:36<00:04, 4.10s/it]
{'loss': 0.2003, 'grad_norm': 0.40571078658103943, 'learning_rate': 3.001151681891745e-11, 'epoch': 1.0}
100%|██████████| 4506/4506 [5:07:43<00:00, 4.86s/it]
{'loss': 0.1985, 'grad_norm': 0.7938607931137085, 'learning_rate': 7.502880330911843e-12, 'epoch': 1.0}
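The per-step records above are printed as Python dict literals, so they can be parsed with `ast.literal_eval`. Their `learning_rate` values decay toward zero in the shape of a cosine schedule with linear warmup: fitting the logged numbers gives a peak rate of 5e-5 and 451 warmup steps (about 10% of the 4506 total). Those two parameters are inferred from this log, not read from the training config, so treat them as assumptions. The sketch below also assumes that the rate logged at global step `s` is the one used for that step, i.e. scheduler index `s - 1`.

```python
import ast
import math

# Three per-step records copied verbatim from the log above.
LOG_LINES = [
    "{'loss': 0.1924, 'grad_norm': 0.4011330008506775, 'learning_rate': 1.4691234766736895e-07, 'epoch': 0.97}",
    "{'loss': 0.197, 'grad_norm': 0.5024051070213318, 'learning_rate': 2.437289812805832e-08, 'epoch': 0.99}",
    "{'loss': 0.1985, 'grad_norm': 0.7938607931137085, 'learning_rate': 7.502880330911843e-12, 'epoch': 1.0}",
]
LOGGED_STEPS = [4367, 4450, 4506]  # global steps from the matching progress-bar lines

# ASSUMED schedule parameters: inferred by fitting the logged values, not taken from the config.
PEAK_LR = 5e-5
WARMUP_STEPS = 451
TOTAL_STEPS = 4506


def cosine_lr(scheduler_index: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if scheduler_index < WARMUP_STEPS:
        return PEAK_LR * scheduler_index / max(1, WARMUP_STEPS)
    progress = (scheduler_index - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


records = [ast.literal_eval(line) for line in LOG_LINES]
for step, rec in zip(LOGGED_STEPS, records):
    predicted = cosine_lr(step - 1)  # rate applied during global step `step` (assumption above)
    rel_err = abs(predicted - rec["learning_rate"]) / rec["learning_rate"]
    print(f"step {step}: logged {rec['learning_rate']:.4e}, predicted {predicted:.4e}, rel. error {rel_err:.1e}")

mean_loss = sum(r["loss"] for r in records) / len(records)
print(f"mean loss over parsed records: {mean_loss:.4f}")
```

The same parsing works on a whole log file by keeping only lines that start with `{'loss'`; the final `train_loss` of 0.2828 is an average over the entire epoch, which is why it sits well above the roughly 0.19 to 0.21 losses seen in these last steps.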
[INFO|trainer.py:4309] 2025-10-30 20:23:39,698 >> Saving model checkpoint to saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506
[INFO|configuration_utils.py:491] 2025-10-30 20:23:39,750 >> Configuration saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/config.json
[INFO|configuration_utils.py:757] 2025-10-30 20:23:39,753 >> Configuration saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/generation_config.json
[INFO|modeling_utils.py:4189] 2025-10-30 20:23:55,157 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2421] 2025-10-30 20:23:55,161 >> chat template saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-30 20:23:55,163 >> tokenizer config file saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-30 20:23:55,165 >> Special tokens file saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/special_tokens_map.json
[2025-10-30 20:23:55,745] [INFO] [logging.py:107:log_dist] [Rank 0] [Torch] Checkpoint global_step4506 is about to be saved!
[2025-10-30 20:23:57,267] [INFO] [logging.py:107:log_dist] [Rank 0] Saving model checkpoint: saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/global_step4506/mp_rank_00_model_states.pt
[2025-10-30 20:23:57,267] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/global_step4506/mp_rank_00_model_states.pt...
[2025-10-30 20:24:19,605] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/global_step4506/mp_rank_00_model_states.pt.
[2025-10-30 20:24:19,614] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/global_step4506/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-10-30 20:24:31,667] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/global_step4506/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-10-30 20:24:31,675] [INFO] [engine.py:3701:_save_zero_checkpoint] zero checkpoint saved saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/global_step4506/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-10-30 20:24:31,676] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step4506 is ready now!
[INFO|image_processing_base.py:253] 2025-10-30 20:24:32,903 >> Image processor saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/preprocessor_config.json
[INFO|video_processing_utils.py:600] 2025-10-30 20:24:32,905 >> Video processor saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/video_preprocessor_config.json
[INFO|feature_extraction_utils.py:434] 2025-10-30 20:24:32,907 >> Feature extractor saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/preprocessor_config.json
[INFO|tokenization_utils_base.py:2421] 2025-10-30 20:24:32,910 >> chat template saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-30 20:24:32,912 >> tokenizer config file saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-30 20:24:32,915 >> Special tokens file saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/special_tokens_map.json
[INFO|processing_utils.py:814] 2025-10-30 20:24:33,062 >> chat template saved in saves/qwen2_omni-3b/full/sft_v3/checkpoint-4506/chat_template.jinja
[INFO|trainer.py:2810] 2025-10-30 20:24:33,408 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 18521.6536, 'train_samples_per_second': 7.784, 'train_steps_per_second': 0.243, 'train_loss': 0.2828275888060338, 'epoch': 1.0}
100%|██████████| 4506/4506 [5:08:41<00:00, 4.86s/it]
100%|██████████| 4506/4506 [5:08:41<00:00, 4.11s/it]
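The two closing progress bars report different rates. The first (4.86 s/it) appears to be tqdm's smoothed recent rate, pulled up by the slower final step (about 7 s going by the elapsed column, 5:07:36 to 5:07:43); the second (4.11 s/it) matches the run's overall average, 18,521.65 s over 4,506 steps. During training, the remaining-time field is roughly `(total - n) * rate`; since tqdm uses its unrounded smoothed rate, recomputing it from the displayed rate can be off by a second. A small sketch of that arithmetic, where `format_interval` and `approx_remaining` are illustrative helpers written to mirror tqdm's `[H:]MM:SS` display, not tqdm's own code:

```python
# Figures taken from the log above.
TOTAL_STEPS = 4506
RUNTIME_S = 18521.6536  # 'train_runtime' in the final metrics


def format_interval(seconds: float) -> str:
    """Format a duration as [H:]MM:SS, in the style tqdm uses for elapsed/remaining time."""
    mins, secs = divmod(int(seconds), 60)
    hours, mins = divmod(mins, 60)
    if hours:
        return f"{hours:d}:{mins:02d}:{secs:02d}"
    return f"{mins:02d}:{secs:02d}"


def approx_remaining(n: int, total: int, sec_per_it: float) -> str:
    """Remaining time estimated from the displayed (rounded) seconds-per-iteration."""
    return format_interval((total - n) * sec_per_it)


overall_rate = RUNTIME_S / TOTAL_STEPS
print(f"elapsed {format_interval(RUNTIME_S)}, overall {overall_rate:.2f}s/it")  # elapsed 5:08:41, overall 4.11s/it
print("remaining at 4367/4506:", approx_remaining(4367, TOTAL_STEPS, 4.08))      # 09:27, as shown in the log
```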
[INFO|image_processing_base.py:253] 2025-10-30 20:24:33,425 >> Image processor saved in saves/qwen2_omni-3b/full/sft_v3/preprocessor_config.json
[INFO|video_processing_utils.py:600] 2025-10-30 20:24:33,428 >> Video processor saved in saves/qwen2_omni-3b/full/sft_v3/video_preprocessor_config.json
[INFO|feature_extraction_utils.py:434] 2025-10-30 20:24:33,430 >> Feature extractor saved in saves/qwen2_omni-3b/full/sft_v3/preprocessor_config.json
[INFO|tokenization_utils_base.py:2421] 2025-10-30 20:24:33,432 >> chat template saved in saves/qwen2_omni-3b/full/sft_v3/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-30 20:24:33,435 >> tokenizer config file saved in saves/qwen2_omni-3b/full/sft_v3/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-30 20:24:33,437 >> Special tokens file saved in saves/qwen2_omni-3b/full/sft_v3/special_tokens_map.json
[INFO|processing_utils.py:814] 2025-10-30 20:24:33,604 >> chat template saved in saves/qwen2_omni-3b/full/sft_v3/chat_template.jinja
[INFO|trainer.py:4309] 2025-10-30 20:24:38,779 >> Saving model checkpoint to saves/qwen2_omni-3b/full/sft_v3
[INFO|configuration_utils.py:491] 2025-10-30 20:24:38,789 >> Configuration saved in saves/qwen2_omni-3b/full/sft_v3/config.json
[INFO|configuration_utils.py:757] 2025-10-30 20:24:38,792 >> Configuration saved in saves/qwen2_omni-3b/full/sft_v3/generation_config.json
[INFO|modeling_utils.py:4189] 2025-10-30 20:24:54,953 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at saves/qwen2_omni-3b/full/sft_v3/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2421] 2025-10-30 20:24:54,956 >> chat template saved in saves/qwen2_omni-3b/full/sft_v3/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-10-30 20:24:54,958 >> tokenizer config file saved in saves/qwen2_omni-3b/full/sft_v3/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-10-30 20:24:54,960 >> Special tokens file saved in saves/qwen2_omni-3b/full/sft_v3/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 8747620790GF
train_loss = 0.2828
train_runtime = 5:08:41.65
train_samples_per_second = 7.784
train_steps_per_second = 0.243
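The derived figures in the metrics block can be cross-checked from `train_runtime` and the total step count (4506, from the progress bar). Below is a minimal Python sketch that assumes the values are copied by hand from this log; the variable names and the `format_runtime` helper are hypothetical and are not part of LLaMA-Factory or transformers. It reproduces the `5:08:41.65` runtime string, the `0.243` steps/s, and the roughly 4.11 s/it average shown on the final progress bar. The implied sample count is only approximate, because `train_samples_per_second` is itself rounded.

```python
# Cross-check of the "train metrics" block; values copied from the log above.
train_runtime_s = 18521.6536   # 'train_runtime' in seconds
total_steps = 4506             # from the progress bar "4506/4506"
reported_samples_per_s = 7.784 # 'train_samples_per_second' (rounded in the log)


def format_runtime(seconds: float) -> str:
    """Format seconds as H:MM:SS.ss, matching the style of 'train_runtime = 5:08:41.65'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)}:{int(minutes):02d}:{secs:05.2f}"


steps_per_s = total_steps / train_runtime_s    # should match 'train_steps_per_second'
secs_per_step = train_runtime_s / total_steps  # average s/it over the whole run
# Rough number of training samples implied by the rounded samples/s figure.
approx_samples = reported_samples_per_s * train_runtime_s

print(format_runtime(train_runtime_s))  # 5:08:41.65
print(round(steps_per_s, 3))            # 0.243
print(round(secs_per_step, 2))          # 4.11
print(round(approx_samples))
```

The 4.11 s/it figure equals the overall average (runtime divided by steps), which is why it differs from the 4.86 s/it shown on the other final progress-bar line.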
Figure saved at: saves/qwen2_omni-3b/full/sft_v3/training_loss.png
[WARNING|2025-10-30 20:24:56] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-10-30 20:24:56] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|modelcard.py:456] 2025-10-30 20:24:56,119 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}