srun: job 6683247 queued and waiting for resources srun: job 6683247 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block1-2105 JobID: 6683247 | Full list: batch-block1-2105 batch-block1-10014 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK!
[2025-04-09 17:42:48,252] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-09 17:42:49,383] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-09 17:42:50,207] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-09 17:42:50,207] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-09 17:42:50,208] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-09 17:42:50,208] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-09 17:42:51,162] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-09 17:42:51,162] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-09 17:42:51,163] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-09 17:42:51,163] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-09 17:42:51,163] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0.
Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( Fetching 17 files: 0%| | 0/17 [00:00\nWould this person be more likely to be a type a or b person?\nAnswer the question using a single word or phrase.'}, {'from': 'gpt', 'value': ''}]] (ignored) 3%|▎ | 156/5773 [14:42<8:30:21, 5.45s/it] 3%|▎ | 156/5773 [14:36<8:30:21, 5.45s/it] {'loss': 0.632, 'learning_rate': 1.7931034482758623e-05, 'epoch': 0.03} 3%|▎ | 156/5773 [14:42<8:30:21, 5.45s/it] {'loss': 0.632, 'learning_rate': 1.7931034482758623e-05, 'epoch': 0.03} 3%|▎ | 156/5773 [14:36<8:30:21, 5.45s/it] 3%|▎ | 157/5773 [14:42<8:31:55, 5.47s/it] 3%|▎ | 157/5773 [14:47<8:31:56, 5.47s/it] {'loss': 0.6346, 'learning_rate': 1.8045977011494254e-05, 'epoch': 0.03} 3%|▎ | 157/5773 [14:47<8:31:56, 5.47s/it] {'loss': 0.6346, 'learning_rate': 1.8045977011494254e-05, 'epoch': 0.03} 3%|▎ | 157/5773 [14:42<8:31:55, 5.47s/it] 3%|▎ | 158/5773 [14:48<8:36:32, 5.52s/it] 3%|▎ | 158/5773 [14:53<8:36:32, 5.52s/it] {'loss': 0.6394, 'learning_rate': 1.8160919540229885e-05, 'epoch': 0.03} 3%|▎ | 158/5773 [14:53<8:36:32, 5.52s/it] {'loss': 0.6394, 'learning_rate': 1.8160919540229885e-05, 'epoch': 0.03} 3%|▎ | 158/5773 [14:48<8:36:32, 5.52s/it] 3%|▎ | 159/5773 [14:59<8:35:34, 5.51s/it] 3%|▎ | 159/5773 [14:53<8:35:34, 5.51s/it] {'loss': 0.6375, 'learning_rate': 1.827586206896552e-05, 'epoch': 0.03} {'loss': 0.6375, 'learning_rate': 1.827586206896552e-05, 'epoch': 0.03} 3%|▎ | 159/5773 [14:59<8:35:34, 5.51s/it] 3%|▎ | 159/5773 [14:53<8:35:34, 5.51s/it] 3%|▎ | 160/5773 [15:04<8:36:44, 5.52s/it] 3%|▎ | 160/5773 [14:59<8:36:45, 5.52s/it] {'loss': 0.6443, 
'learning_rate': 1.839080459770115e-05, 'epoch': 0.03} 3%|▎ | 160/5773 [15:04<8:36:44, 5.52s/it] {'loss': 0.6443, 'learning_rate': 1.839080459770115e-05, 'epoch': 0.03} 3%|▎ | 160/5773 [14:59<8:36:45, 5.52s/it] 3%|▎ | 161/5773 [15:10<8:32:14, 5.48s/it] 3%|▎ | 161/5773 [15:04<8:32:14, 5.48s/it] {'loss': 0.6303, 'learning_rate': 1.8505747126436784e-05, 'epoch': 0.03} 3%|▎ | 161/5773 [15:10<8:32:14, 5.48s/it] {'loss': 0.6303, 'learning_rate': 1.8505747126436784e-05, 'epoch': 0.03} 3%|▎ | 161/5773 [15:04<8:32:14, 5.48s/it] 3%|▎ | 162/5773 [15:15<8:24:09, 5.39s/it] 3%|▎ | 162/5773 [15:09<8:24:09, 5.39s/it] {'loss': 0.6395, 'learning_rate': 1.8620689655172415e-05, 'epoch': 0.03} 3%|▎ | 162/5773 [15:15<8:24:09, 5.39s/it] {'loss': 0.6395, 'learning_rate': 1.8620689655172415e-05, 'epoch': 0.03} 3%|▎ | 162/5773 [15:09<8:24:09, 5.39s/it] 3%|▎ | 163/5773 [15:21<8:35:04, 5.51s/it] 3%|▎ | 163/5773 [15:15<8:35:04, 5.51s/it] {'loss': 0.6484, 'learning_rate': 1.873563218390805e-05, 'epoch': 0.03} 3%|▎ | 163/5773 [15:21<8:35:04, 5.51s/it] {'loss': 0.6484, 'learning_rate': 1.873563218390805e-05, 'epoch': 0.03} 3%|▎ | 163/5773 [15:15<8:35:04, 5.51s/it] 3%|▎ | 164/5773 [15:26<8:36:40, 5.53s/it] 3%|▎ | 164/5773 [15:21<8:36:40, 5.53s/it] {'loss': 0.6404, 'learning_rate': 1.885057471264368e-05, 'epoch': 0.03} 3%|▎ | 164/5773 [15:26<8:36:40, 5.53s/it] {'loss': 0.6404, 'learning_rate': 1.885057471264368e-05, 'epoch': 0.03} 3%|▎ | 164/5773 [15:21<8:36:40, 5.53s/it] 3%|▎ | 165/5773 [15:31<8:33:07, 5.49s/it] 3%|▎ | 165/5773 [15:26<8:33:07, 5.49s/it] {'loss': 0.6204, 'learning_rate': 1.896551724137931e-05, 'epoch': 0.03} 3%|▎ | 165/5773 [15:31<8:33:07, 5.49s/it] {'loss': 0.6204, 'learning_rate': 1.896551724137931e-05, 'epoch': 0.03} 3%|▎ | 165/5773 [15:26<8:33:07, 5.49s/it] 3%|▎ | 166/5773 [15:37<8:25:37, 5.41s/it] 3%|▎ | 166/5773 [15:31<8:25:37, 5.41s/it] {'loss': 0.64, 'learning_rate': 1.908045977011494e-05, 'epoch': 0.03} 3%|▎ | 166/5773 [15:37<8:25:37, 5.41s/it] {'loss': 0.64, 
'learning_rate': 1.908045977011494e-05, 'epoch': 0.03} 3%|▎ | 166/5773 [15:31<8:25:37, 5.41s/it] 3%|▎ | 167/5773 [15:42<8:24:38, 5.40s/it] 3%|▎ | 167/5773 [15:37<8:24:38, 5.40s/it] {'loss': 0.6064, 'learning_rate': 1.9195402298850576e-05, 'epoch': 0.03} 3%|▎ | 167/5773 [15:37<8:24:38, 5.40s/it]{'loss': 0.6064, 'learning_rate': 1.9195402298850576e-05, 'epoch': 0.03} 3%|▎ | 167/5773 [15:42<8:24:38, 5.40s/it] 3%|▎ | 168/5773 [15:47<8:22:44, 5.38s/it] 3%|▎ | 168/5773 [15:42<8:22:44, 5.38s/it] {'loss': 0.6252, 'learning_rate': 1.931034482758621e-05, 'epoch': 0.03} 3%|▎ | 168/5773 [15:47<8:22:44, 5.38s/it] {'loss': 0.6252, 'learning_rate': 1.931034482758621e-05, 'epoch': 0.03} 3%|▎ | 168/5773 [15:42<8:22:44, 5.38s/it] 3%|▎ | 169/5773 [15:53<8:20:39, 5.36s/it] 3%|▎ | 169/5773 [15:47<8:20:39, 5.36s/it] {'loss': 0.6307, 'learning_rate': 1.942528735632184e-05, 'epoch': 0.03} 3%|▎ | 169/5773 [15:53<8:20:39, 5.36s/it] {'loss': 0.6307, 'learning_rate': 1.942528735632184e-05, 'epoch': 0.03} 3%|▎ | 169/5773 [15:47<8:20:39, 5.36s/it] 3%|▎ | 170/5773 [15:58<8:24:27, 5.40s/it] 3%|▎ | 170/5773 [15:53<8:24:27, 5.40s/it] {'loss': 0.6399, 'learning_rate': 1.9540229885057475e-05, 'epoch': 0.03} 3%|▎ | 170/5773 [15:58<8:24:27, 5.40s/it] {'loss': 0.6399, 'learning_rate': 1.9540229885057475e-05, 'epoch': 0.03} 3%|▎ | 170/5773 [15:53<8:24:27, 5.40s/it] 3%|▎ | 171/5773 [16:04<8:27:55, 5.44s/it] 3%|▎ | 171/5773 [15:58<8:27:54, 5.44s/it] {'loss': 0.6291, 'learning_rate': 1.9655172413793106e-05, 'epoch': 0.03} 3%|▎ | 171/5773 [16:04<8:27:55, 5.44s/it] {'loss': 0.6291, 'learning_rate': 1.9655172413793106e-05, 'epoch': 0.03} 3%|▎ | 171/5773 [15:58<8:27:54, 5.44s/it] 3%|▎ | 172/5773 [16:09<8:28:20, 5.45s/it] 3%|▎ | 172/5773 [16:04<8:28:20, 5.45s/it] {'loss': 0.6556, 'learning_rate': 1.9770114942528737e-05, 'epoch': 0.03} 3%|▎ | 172/5773 [16:09<8:28:20, 5.45s/it] {'loss': 0.6556, 'learning_rate': 1.9770114942528737e-05, 'epoch': 0.03} 3%|▎ | 172/5773 [16:04<8:28:20, 5.45s/it] 3%|▎ | 173/5773 
[16:15<8:23:57, 5.40s/it] 3%|▎ | 173/5773 [16:09<8:23:57, 5.40s/it] {'loss': 0.633, 'learning_rate': 1.9885057471264367e-05, 'epoch': 0.03} 3%|▎ | 173/5773 [16:15<8:23:57, 5.40s/it] {'loss': 0.633, 'learning_rate': 1.9885057471264367e-05, 'epoch': 0.03} 3%|▎ | 173/5773 [16:09<8:23:57, 5.40s/it] 3%|▎ | 174/5773 [16:20<8:24:17, 5.40s/it] 3%|▎ | 174/5773 [16:14<8:24:18, 5.40s/it] {'loss': 0.6308, 'learning_rate': 2e-05, 'epoch': 0.03} 3%|▎ | 174/5773 [16:20<8:24:17, 5.40s/it] {'loss': 0.6308, 'learning_rate': 2e-05, 'epoch': 0.03} 3%|▎ | 174/5773 [16:14<8:24:18, 5.40s/it] 3%|▎ | 175/5773 [16:25<8:25:19, 5.42s/it] 3%|▎ | 175/5773 [16:20<8:25:19, 5.42s/it] {'loss': 0.6344, 'learning_rate': 1.9999998425840254e-05, 'epoch': 0.03} 3%|▎ | 175/5773 [16:25<8:25:19, 5.42s/it] {'loss': 0.6344, 'learning_rate': 1.9999998425840254e-05, 'epoch': 0.03} 3%|▎ | 175/5773 [16:20<8:25:19, 5.42s/it] 3%|▎ | 176/5773 [16:31<8:25:45, 5.42s/it] 3%|▎ | 176/5773 [16:25<8:25:45, 5.42s/it] {'loss': 0.6327, 'learning_rate': 1.99999937033615e-05, 'epoch': 0.03} 3%|▎ | 176/5773 [16:31<8:25:45, 5.42s/it] {'loss': 0.6327, 'learning_rate': 1.99999937033615e-05, 'epoch': 0.03} 3%|▎ | 176/5773 [16:25<8:25:45, 5.42s/it] 3%|▎ | 177/5773 [16:36<8:26:45, 5.43s/it] 3%|▎ | 177/5773 [16:31<8:26:45, 5.43s/it] {'loss': 0.6402, 'learning_rate': 1.9999985832565235e-05, 'epoch': 0.03} 3%|▎ | 177/5773 [16:36<8:26:45, 5.43s/it] {'loss': 0.6402, 'learning_rate': 1.9999985832565235e-05, 'epoch': 0.03} 3%|▎ | 177/5773 [16:31<8:26:45, 5.43s/it] 3%|▎ | 178/5773 [16:42<8:28:50, 5.46s/it] 3%|▎ | 178/5773 [16:36<8:28:50, 5.46s/it] {'loss': 0.6353, 'learning_rate': 1.999997481345393e-05, 'epoch': 0.03} 3%|▎ | 178/5773 [16:42<8:28:50, 5.46s/it] {'loss': 0.6353, 'learning_rate': 1.999997481345393e-05, 'epoch': 0.03} 3%|▎ | 178/5773 [16:36<8:28:50, 5.46s/it] 3%|▎ | 179/5773 [16:47<8:26:40, 5.43s/it] 3%|▎ | 179/5773 [16:42<8:26:40, 5.43s/it] {'loss': 0.6332, 'learning_rate': 1.999996064603106e-05, 'epoch': 0.03} 3%|▎ | 179/5773 
[16:47<8:26:40, 5.43s/it] {'loss': 0.6332, 'learning_rate': 1.999996064603106e-05, 'epoch': 0.03} 3%|▎ | 179/5773 [16:42<8:26:40, 5.43s/it] 3%|▎ | 180/5773 [16:52<8:22:44, 5.39s/it] 3%|▎ | 180/5773 [16:47<8:22:44, 5.39s/it] {'loss': 0.6242, 'learning_rate': 1.9999943330301076e-05, 'epoch': 0.03} 3%|▎ | 180/5773 [16:52<8:22:44, 5.39s/it] {'loss': 0.6242, 'learning_rate': 1.9999943330301076e-05, 'epoch': 0.03} 3%|▎ | 180/5773 [16:47<8:22:44, 5.39s/it] 3%|▎ | 181/5773 [16:58<8:20:42, 5.37s/it] 3%|▎ | 181/5773 [16:52<8:20:41, 5.37s/it] {'loss': 0.6324, 'learning_rate': 1.9999922866269443e-05, 'epoch': 0.03} 3%|▎ | 181/5773 [16:58<8:20:42, 5.37s/it] {'loss': 0.6324, 'learning_rate': 1.9999922866269443e-05, 'epoch': 0.03} 3%|▎ | 181/5773 [16:52<8:20:41, 5.37s/it] 3%|▎ | 182/5773 [17:03<8:24:26, 5.41s/it] 3%|▎ | 182/5773 [16:58<8:24:26, 5.41s/it] {'loss': 0.6229, 'learning_rate': 1.999989925394259e-05, 'epoch': 0.03} 3%|▎ | 182/5773 [17:03<8:24:26, 5.41s/it] {'loss': 0.6229, 'learning_rate': 1.999989925394259e-05, 'epoch': 0.03} 3%|▎ | 182/5773 [16:58<8:24:26, 5.41s/it] 3%|▎ | 183/5773 [17:09<8:27:26, 5.45s/it] 3%|▎ | 183/5773 [17:03<8:27:26, 5.45s/it] {'loss': 0.6267, 'learning_rate': 1.999987249332796e-05, 'epoch': 0.03} 3%|▎ | 183/5773 [17:09<8:27:26, 5.45s/it] {'loss': 0.6267, 'learning_rate': 1.999987249332796e-05, 'epoch': 0.03} 3%|▎ | 183/5773 [17:03<8:27:26, 5.45s/it] 3%|▎ | 184/5773 [17:14<8:28:11, 5.46s/it] 3%|▎ | 184/5773 [17:09<8:28:11, 5.46s/it] {'loss': 0.6369, 'learning_rate': 1.9999842584433976e-05, 'epoch': 0.03} 3%|▎ | 184/5773 [17:14<8:28:11, 5.46s/it] {'loss': 0.6369, 'learning_rate': 1.9999842584433976e-05, 'epoch': 0.03} 3%|▎ | 184/5773 [17:09<8:28:11, 5.46s/it] 3%|▎ | 185/5773 [17:20<8:26:41, 5.44s/it] 3%|▎ | 185/5773 [17:14<8:26:41, 5.44s/it] {'loss': 0.6204, 'learning_rate': 1.9999809527270053e-05, 'epoch': 0.03} 3%|▎ | 185/5773 [17:20<8:26:41, 5.44s/it] {'loss': 0.6204, 'learning_rate': 1.9999809527270053e-05, 'epoch': 0.03} 3%|▎ | 185/5773 
[17:14<8:26:41, 5.44s/it] 3%|▎ | 186/5773 [17:25<8:25:54, 5.43s/it] 3%|▎ | 186/5773 [17:20<8:25:54, 5.43s/it] {'loss': 0.6399, 'learning_rate': 1.9999773321846598e-05, 'epoch': 0.03} 3%|▎ | 186/5773 [17:25<8:25:54, 5.43s/it] {'loss': 0.6399, 'learning_rate': 1.9999773321846598e-05, 'epoch': 0.03} 3%|▎ | 186/5773 [17:20<8:25:54, 5.43s/it] 3%|▎ | 187/5773 [17:31<8:25:26, 5.43s/it] 3%|▎ | 187/5773 [17:25<8:25:26, 5.43s/it] {'loss': 0.6158, 'learning_rate': 1.9999733968175014e-05, 'epoch': 0.03} 3%|▎ | 187/5773 [17:31<8:25:26, 5.43s/it] {'loss': 0.6158, 'learning_rate': 1.9999733968175014e-05, 'epoch': 0.03} 3%|▎ | 187/5773 [17:25<8:25:26, 5.43s/it] 3%|▎ | 188/5773 [17:36<8:23:20, 5.41s/it] 3%|▎ | 188/5773 [17:30<8:23:20, 5.41s/it] {'loss': 0.6203, 'learning_rate': 1.9999691466267683e-05, 'epoch': 0.03} 3%|▎ | 188/5773 [17:36<8:23:20, 5.41s/it] {'loss': 0.6203, 'learning_rate': 1.9999691466267683e-05, 'epoch': 0.03} 3%|▎ | 188/5773 [17:30<8:23:20, 5.41s/it] 3%|▎ | 189/5773 [17:41<8:20:57, 5.38s/it] 3%|▎ | 189/5773 [17:36<8:20:57, 5.38s/it] {'loss': 0.6336, 'learning_rate': 1.9999645816137994e-05, 'epoch': 0.03} 3%|▎ | 189/5773 [17:41<8:20:57, 5.38s/it] {'loss': 0.6336, 'learning_rate': 1.9999645816137994e-05, 'epoch': 0.03} 3%|▎ | 189/5773 [17:36<8:20:57, 5.38s/it] 3%|▎ | 190/5773 [17:47<8:27:38, 5.46s/it] 3%|▎ | 190/5773 [17:41<8:27:38, 5.46s/it] {'loss': 0.6087, 'learning_rate': 1.9999597017800315e-05, 'epoch': 0.03} 3%|▎ | 190/5773 [17:47<8:27:38, 5.46s/it] {'loss': 0.6087, 'learning_rate': 1.9999597017800315e-05, 'epoch': 0.03} 3%|▎ | 190/5773 [17:41<8:27:38, 5.46s/it] 3%|▎ | 191/5773 [17:52<8:28:25, 5.46s/it] 3%|▎ | 191/5773 [17:47<8:28:25, 5.47s/it] {'loss': 0.6241, 'learning_rate': 1.9999545071270006e-05, 'epoch': 0.03} 3%|▎ | 191/5773 [17:52<8:28:25, 5.46s/it] {'loss': 0.6241, 'learning_rate': 1.9999545071270006e-05, 'epoch': 0.03} 3%|▎ | 191/5773 [17:47<8:28:25, 5.47s/it] 3%|▎ | 192/5773 [17:58<8:26:31, 5.45s/it] 3%|▎ | 192/5773 [17:52<8:26:32, 5.45s/it] 
{'loss': 0.641, 'learning_rate': 1.999948997656343e-05, 'epoch': 0.03} 3%|▎ | 192/5773 [17:58<8:26:31, 5.45s/it] {'loss': 0.641, 'learning_rate': 1.999948997656343e-05, 'epoch': 0.03} 3%|▎ | 192/5773 [17:52<8:26:32, 5.45s/it] 3%|▎ | 193/5773 [18:03<8:22:03, 5.40s/it] 3%|▎ | 193/5773 [17:57<8:22:03, 5.40s/it] {'loss': 0.6503, 'learning_rate': 1.9999431733697928e-05, 'epoch': 0.03} 3%|▎ | 193/5773 [18:03<8:22:03, 5.40s/it] {'loss': 0.6503, 'learning_rate': 1.9999431733697928e-05, 'epoch': 0.03} 3%|▎ | 193/5773 [17:57<8:22:03, 5.40s/it] 3%|▎ | 194/5773 [18:08<8:18:55, 5.37s/it] 3%|▎ | 194/5773 [18:03<8:18:54, 5.37s/it] {'loss': 0.6344, 'learning_rate': 1.9999370342691836e-05, 'epoch': 0.03} 3%|▎ | 194/5773 [18:08<8:18:55, 5.37s/it] {'loss': 0.6344, 'learning_rate': 1.9999370342691836e-05, 'epoch': 0.03} 3%|▎ | 194/5773 [18:03<8:18:54, 5.37s/it] 3%|▎ | 195/5773 [18:08<8:21:21, 5.39s/it] 3%|▎ | 195/5773 [18:14<8:21:21, 5.39s/it] {'loss': 0.6391, 'learning_rate': 1.999930580356448e-05, 'epoch': 0.03} {'loss': 0.6391, 'learning_rate': 1.999930580356448e-05, 'epoch': 0.03} 3%|▎ | 195/5773 [18:14<8:21:21, 5.39s/it] 3%|▎ | 195/5773 [18:08<8:21:21, 5.39s/it] 3%|▎ | 196/5773 [18:19<8:23:25, 5.42s/it] 3%|▎ | 196/5773 [18:14<8:23:25, 5.42s/it] {'loss': 0.6198, 'learning_rate': 1.999923811633618e-05, 'epoch': 0.03} 3%|▎ | 196/5773 [18:19<8:23:25, 5.42s/it] {'loss': 0.6198, 'learning_rate': 1.999923811633618e-05, 'epoch': 0.03} 3%|▎ | 196/5773 [18:14<8:23:25, 5.42s/it] 3%|▎ | 197/5773 [18:25<8:25:36, 5.44s/it] 3%|▎ | 197/5773 [18:19<8:25:36, 5.44s/it] {'loss': 0.6342, 'learning_rate': 1.9999167281028252e-05, 'epoch': 0.03} 3%|▎ | 197/5773 [18:25<8:25:36, 5.44s/it] {'loss': 0.6342, 'learning_rate': 1.9999167281028252e-05, 'epoch': 0.03} 3%|▎ | 197/5773 [18:19<8:25:36, 5.44s/it] 3%|▎ | 198/5773 [18:30<8:23:33, 5.42s/it] 3%|▎ | 198/5773 [18:25<8:23:32, 5.42s/it] {'loss': 0.6203, 'learning_rate': 1.999909329766299e-05, 'epoch': 0.03} 3%|▎ | 198/5773 [18:30<8:23:33, 5.42s/it] {'loss': 
0.6203, 'learning_rate': 1.999909329766299e-05, 'epoch': 0.03} 3%|▎ | 198/5773 [18:25<8:23:32, 5.42s/it] 3%|▎ | 199/5773 [18:35<8:20:07, 5.38s/it] 3%|▎ | 199/5773 [18:30<8:20:07, 5.38s/it] {'loss': 0.6187, 'learning_rate': 1.999901616626369e-05, 'epoch': 0.03} 3%|▎ | 199/5773 [18:35<8:20:07, 5.38s/it] {'loss': 0.6187, 'learning_rate': 1.999901616626369e-05, 'epoch': 0.03} 3%|▎ | 199/5773 [18:30<8:20:07, 5.38s/it] 8 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3%|▎ | 200/5773 [18:41<8:20:20, 5.39s/it] 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 3%|▎ | 200/5773 [18:35<8:20:20, 5.39s/it] 5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6222, 'learning_rate': 1.9998935886854634e-05, 'epoch': 0.03} 3%|▎ | 200/5773 [18:41<8:20:20, 5.39s/it] {'loss': 0.6222, 'learning_rate': 1.9998935886854634e-05, 'epoch': 0.03} 3%|▎ | 200/5773 [18:35<8:20:20, 5.39s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 3%|▎ | 201/5773 [18:59<14:19:02, 9.25s/it] 3%|▎ | 201/5773 [18:54<14:19:02, 9.25s/it] {'loss': 0.6203, 'learning_rate': 1.99988524594611e-05, 'epoch': 0.03} 3%|▎ | 201/5773 [18:59<14:19:02, 9.25s/it] {'loss': 0.6203, 'learning_rate': 1.99988524594611e-05, 'epoch': 0.03} 3%|▎ | 201/5773 [18:54<14:19:02, 9.25s/it] 3%|▎ | 202/5773 [19:04<12:31:16, 8.09s/it] 3%|▎ | 202/5773 [18:59<12:31:17, 8.09s/it] {'loss': 0.6475, 'learning_rate': 1.9998765884109348e-05, 'epoch': 0.03} 3%|▎ | 202/5773 [19:04<12:31:16, 8.09s/it] {'loss': 0.6475, 'learning_rate': 1.9998765884109348e-05, 'epoch': 0.03} 3%|▎ | 202/5773 [18:59<12:31:17, 8.09s/it] 4%|▎ | 203/5773 [19:10<11:16:42, 7.29s/it] 4%|▎ | 203/5773 [19:04<11:16:42, 7.29s/it] {'loss': 0.617, 'learning_rate': 1.9998676160826637e-05, 'epoch': 0.04} 4%|▎ | 203/5773 [19:10<11:16:42, 7.29s/it] {'loss': 0.617, 'learning_rate': 1.9998676160826637e-05, 'epoch': 0.04} 4%|▎ | 203/5773 [19:04<11:16:42, 7.29s/it] 4%|▎ | 204/5773 [19:15<10:22:12, 
6.70s/it] 4%|▎ | 204/5773 [19:10<10:22:13, 6.70s/it] {'loss': 0.6321, 'learning_rate': 1.9998583289641214e-05, 'epoch': 0.04} 4%|▎ | 204/5773 [19:15<10:22:12, 6.70s/it] {'loss': 0.6321, 'learning_rate': 1.9998583289641214e-05, 'epoch': 0.04} 4%|▎ | 204/5773 [19:10<10:22:13, 6.70s/it] 4%|▎ | 205/5773 [19:21<9:45:21, 6.31s/it] 4%|▎ | 205/5773 [19:15<9:45:21, 6.31s/it] {'loss': 0.6227, 'learning_rate': 1.999848727058232e-05, 'epoch': 0.04} 4%|▎ | 205/5773 [19:21<9:45:21, 6.31s/it] {'loss': 0.6227, 'learning_rate': 1.999848727058232e-05, 'epoch': 0.04} 4%|▎ | 205/5773 [19:15<9:45:21, 6.31s/it] 4%|▎ | 206/5773 [19:26<9:24:22, 6.08s/it] {'loss': 0.6272, 'learning_rate': 1.9998388103680186e-05, 'epoch': 0.04} 4%|▎ | 206/5773 [19:26<9:24:22, 6.08s/it] 4%|▎ | 206/5773 [19:21<9:24:22, 6.08s/it] {'loss': 0.6272, 'learning_rate': 1.9998388103680186e-05, 'epoch': 0.04} 4%|▎ | 206/5773 [19:21<9:24:22, 6.08s/it] 4%|▎ | 207/5773 [19:32<9:07:35, 5.90s/it] 4%|▎ | 207/5773 [19:26<9:07:35, 5.90s/it]{'loss': 0.6111, 'learning_rate': 1.9998285788966027e-05, 'epoch': 0.04} {'loss': 0.6111, 'learning_rate': 1.9998285788966027e-05, 'epoch': 0.04} 4%|▎ | 207/5773 [19:32<9:07:35, 5.90s/it] 4%|▎ | 207/5773 [19:26<9:07:35, 5.90s/it] 4%|▎ | 208/5773 [19:37<8:49:55, 5.71s/it] 4%|▎ | 208/5773 [19:31<8:49:55, 5.71s/it] {'loss': 0.6265, 'learning_rate': 1.999818032647206e-05, 'epoch': 0.04} 4%|▎ | 208/5773 [19:37<8:49:55, 5.71s/it] {'loss': 0.6265, 'learning_rate': 1.999818032647206e-05, 'epoch': 0.04} 4%|▎ | 208/5773 [19:31<8:49:55, 5.71s/it] 4%|▎ | 209/5773 [19:42<8:43:05, 5.64s/it] 4%|▎ | 209/5773 [19:37<8:43:06, 5.64s/it] {'loss': 0.6103, 'learning_rate': 1.999807171623149e-05, 'epoch': 0.04} 4%|▎ | 209/5773 [19:42<8:43:05, 5.64s/it] {'loss': 0.6103, 'learning_rate': 1.999807171623149e-05, 'epoch': 0.04} 4%|▎ | 209/5773 [19:37<8:43:06, 5.64s/it] 4%|▎ | 210/5773 [19:48<8:35:41, 5.56s/it] 4%|▎ | 210/5773 [19:42<8:35:41, 5.56s/it] {'loss': 0.6242, 'learning_rate': 1.9997959958278502e-05, 'epoch': 
0.04} 4%|▎ | 210/5773 [19:48<8:35:41, 5.56s/it] {'loss': 0.6242, 'learning_rate': 1.9997959958278502e-05, 'epoch': 0.04} 4%|▎ | 210/5773 [19:42<8:35:41, 5.56s/it] 4%|▎ | 211/5773 [19:53<8:27:44, 5.48s/it] 4%|▎ | 211/5773 [19:47<8:27:44, 5.48s/it] {'loss': 0.6296, 'learning_rate': 1.9997845052648288e-05, 'epoch': 0.04} 4%|▎ | 211/5773 [19:53<8:27:44, 5.48s/it] {'loss': 0.6296, 'learning_rate': 1.9997845052648288e-05, 'epoch': 0.04} 4%|▎ | 211/5773 [19:47<8:27:44, 5.48s/it] 4%|▎ | 212/5773 [19:58<8:26:46, 5.47s/it] 4%|▎ | 212/5773 [19:53<8:26:46, 5.47s/it] {'loss': 0.6518, 'learning_rate': 1.999772699937702e-05, 'epoch': 0.04} 4%|▎ | 212/5773 [19:58<8:26:46, 5.47s/it] {'loss': 0.6518, 'learning_rate': 1.999772699937702e-05, 'epoch': 0.04} 4%|▎ | 212/5773 [19:53<8:26:46, 5.47s/it] 4%|▎ | 213/5773 [20:04<8:24:33, 5.44s/it] 4%|▎ | 213/5773 [19:58<8:24:32, 5.44s/it] {'loss': 0.6436, 'learning_rate': 1.9997605798501873e-05, 'epoch': 0.04} 4%|▎ | 213/5773 [20:04<8:24:33, 5.44s/it] {'loss': 0.6436, 'learning_rate': 1.9997605798501873e-05, 'epoch': 0.04} 4%|▎ | 213/5773 [19:58<8:24:32, 5.44s/it] 4%|▎ | 214/5773 [20:09<8:27:17, 5.48s/it] 4%|▎ | 214/5773 [20:04<8:27:17, 5.48s/it] {'loss': 0.6357, 'learning_rate': 1.9997481450061e-05, 'epoch': 0.04} 4%|▎ | 214/5773 [20:09<8:27:17, 5.48s/it] {'loss': 0.6357, 'learning_rate': 1.9997481450061e-05, 'epoch': 0.04} 4%|▎ | 214/5773 [20:04<8:27:17, 5.48s/it] 4%|▎ | 215/5773 [20:15<8:21:48, 5.42s/it] 4%|▎ | 215/5773 [20:09<8:21:48, 5.42s/it] {'loss': 0.6404, 'learning_rate': 1.9997353954093545e-05, 'epoch': 0.04} 4%|▎ | 215/5773 [20:15<8:21:48, 5.42s/it] {'loss': 0.6404, 'learning_rate': 1.9997353954093545e-05, 'epoch': 0.04} 4%|▎ | 215/5773 [20:09<8:21:48, 5.42s/it] 4%|▎ | 216/5773 [20:20<8:23:17, 5.43s/it] 4%|▎ | 216/5773 [20:15<8:23:17, 5.43s/it] {'loss': 0.6359, 'learning_rate': 1.9997223310639652e-05, 'epoch': 0.04} 4%|▎ | 216/5773 [20:20<8:23:17, 5.43s/it] {'loss': 0.6359, 'learning_rate': 1.9997223310639652e-05, 'epoch': 0.04} 
4%|▎ | 216/5773 [20:15<8:23:17, 5.43s/it] 4%|▍ | 217/5773 [20:26<8:23:25, 5.44s/it] 4%|▍ | 217/5773 [20:20<8:23:25, 5.44s/it] {'loss': 0.6312, 'learning_rate': 1.9997089519740455e-05, 'epoch': 0.04} 4%|▍ | 217/5773 [20:26<8:23:25, 5.44s/it] {'loss': 0.6312, 'learning_rate': 1.9997089519740455e-05, 'epoch': 0.04} 4%|▍ | 217/5773 [20:20<8:23:25, 5.44s/it] 4%|▍ | 218/5773 [20:31<8:21:47, 5.42s/it] 4%|▍ | 218/5773 [20:25<8:21:48, 5.42s/it] {'loss': 0.6377, 'learning_rate': 1.999695258143807e-05, 'epoch': 0.04} 4%|▍ | 218/5773 [20:31<8:21:47, 5.42s/it] {'loss': 0.6377, 'learning_rate': 1.999695258143807e-05, 'epoch': 0.04} 4%|▍ | 218/5773 [20:25<8:21:48, 5.42s/it] 4%|▍ | 219/5773 [20:36<8:24:17, 5.45s/it] 4%|▍ | 219/5773 [20:31<8:24:17, 5.45s/it] {'loss': 0.6157, 'learning_rate': 1.9996812495775612e-05, 'epoch': 0.04} 4%|▍ | 219/5773 [20:37<8:24:17, 5.45s/it] {'loss': 0.6157, 'learning_rate': 1.9996812495775612e-05, 'epoch': 0.04} 4%|▍ | 219/5773 [20:31<8:24:17, 5.45s/it] 4%|▍ | 220/5773 [20:42<8:23:18, 5.44s/it] 4%|▍ | 220/5773 [20:36<8:23:19, 5.44s/it] {'loss': 0.6222, 'learning_rate': 1.9996669262797183e-05, 'epoch': 0.04} 4%|▍ | 220/5773 [20:42<8:23:18, 5.44s/it] {'loss': 0.6222, 'learning_rate': 1.9996669262797183e-05, 'epoch': 0.04} 4%|▍ | 220/5773 [20:36<8:23:19, 5.44s/it] 4%|▍ | 221/5773 [20:47<8:23:23, 5.44s/it] 4%|▍ | 221/5773 [20:42<8:23:23, 5.44s/it] {'loss': 0.6137, 'learning_rate': 1.999652288254788e-05, 'epoch': 0.04} 4%|▍ | 221/5773 [20:47<8:23:23, 5.44s/it] {'loss': 0.6137, 'learning_rate': 1.999652288254788e-05, 'epoch': 0.04} 4%|▍ | 221/5773 [20:42<8:23:23, 5.44s/it] 4%|▍ | 222/5773 [20:53<8:21:16, 5.42s/it] 4%|▍ | 222/5773 [20:47<8:21:16, 5.42s/it] {'loss': 0.6331, 'learning_rate': 1.999637335507379e-05, 'epoch': 0.04} 4%|▍ | 222/5773 [20:53<8:21:16, 5.42s/it] {'loss': 0.6331, 'learning_rate': 1.999637335507379e-05, 'epoch': 0.04} 4%|▍ | 222/5773 [20:47<8:21:16, 5.42s/it] 4%|▍ | 223/5773 [20:58<8:22:10, 5.43s/it] 4%|▍ | 223/5773 [20:53<8:22:10, 
5.43s/it] {'loss': 0.6109, 'learning_rate': 1.999622068042198e-05, 'epoch': 0.04} 4%|▍ | 223/5773 [20:58<8:22:10, 5.43s/it] {'loss': 0.6109, 'learning_rate': 1.999622068042198e-05, 'epoch': 0.04} 4%|▍ | 223/5773 [20:53<8:22:10, 5.43s/it] 4%|▍ | 224/5773 [21:04<8:20:04, 5.41s/it] 4%|▍ | 224/5773 [20:58<8:20:04, 5.41s/it] {'loss': 0.6034, 'learning_rate': 1.9996064858640526e-05, 'epoch': 0.04} 4%|▍ | 224/5773 [21:04<8:20:04, 5.41s/it] {'loss': 0.6034, 'learning_rate': 1.9996064858640526e-05, 'epoch': 0.04} 4%|▍ | 224/5773 [20:58<8:20:04, 5.41s/it] 4%|▍ | 225/5773 [21:09<8:19:17, 5.40s/it] 4%|▍ | 225/5773 [21:03<8:19:17, 5.40s/it] {'loss': 0.6281, 'learning_rate': 1.999590588977848e-05, 'epoch': 0.04} 4%|▍ | 225/5773 [21:09<8:19:17, 5.40s/it] {'loss': 0.6281, 'learning_rate': 1.999590588977848e-05, 'epoch': 0.04} 4%|▍ | 225/5773 [21:03<8:19:17, 5.40s/it] 4%|▍ | 226/5773 [21:14<8:23:25, 5.45s/it] 4%|▍ | 226/5773 [21:09<8:23:25, 5.45s/it] {'loss': 0.6236, 'learning_rate': 1.9995743773885896e-05, 'epoch': 0.04} 4%|▍ | 226/5773 [21:14<8:23:25, 5.45s/it] {'loss': 0.6236, 'learning_rate': 1.9995743773885896e-05, 'epoch': 0.04} 4%|▍ | 226/5773 [21:09<8:23:25, 5.45s/it] 4%|▍ | 227/5773 [21:20<8:24:58, 5.46s/it] 4%|▍ | 227/5773 [21:14<8:24:58, 5.46s/it] {'loss': 0.6178, 'learning_rate': 1.999557851101381e-05, 'epoch': 0.04} 4%|▍ | 227/5773 [21:20<8:24:58, 5.46s/it] {'loss': 0.6178, 'learning_rate': 1.999557851101381e-05, 'epoch': 0.04} 4%|▍ | 227/5773 [21:14<8:24:58, 5.46s/it] 4%|▍ | 228/5773 [21:25<8:25:27, 5.47s/it] 4%|▍ | 228/5773 [21:20<8:25:27, 5.47s/it] {'loss': 0.6191, 'learning_rate': 1.999541010121425e-05, 'epoch': 0.04} 4%|▍ | 228/5773 [21:25<8:25:27, 5.47s/it] {'loss': 0.6191, 'learning_rate': 1.999541010121425e-05, 'epoch': 0.04} 4%|▍ | 228/5773 [21:20<8:25:27, 5.47s/it] 4%|▍ | 229/5773 [21:31<8:22:06, 5.43s/it] 4%|▍ | 229/5773 [21:25<8:22:06, 5.43s/it] {'loss': 0.6195, 'learning_rate': 1.999523854454024e-05, 'epoch': 0.04} 4%|▍ | 229/5773 [21:31<8:22:06, 5.43s/it] 
{'loss': 0.6195, 'learning_rate': 1.999523854454024e-05, 'epoch': 0.04} 4%|▍ | 229/5773 [21:25<8:22:06, 5.43s/it] 4%|▍ | 230/5773 [21:36<8:19:57, 5.41s/it] 4%|▍ | 230/5773 [21:31<8:19:58, 5.41s/it] {'loss': 0.6062, 'learning_rate': 1.9995063841045793e-05, 'epoch': 0.04} 4%|▍ | 230/5773 [21:36<8:19:57, 5.41s/it] {'loss': 0.6062, 'learning_rate': 1.9995063841045793e-05, 'epoch': 0.04} 4%|▍ | 230/5773 [21:31<8:19:58, 5.41s/it] 4%|▍ | 231/5773 [21:42<8:19:07, 5.40s/it] 4%|▍ | 231/5773 [21:36<8:19:07, 5.40s/it] {'loss': 0.6265, 'learning_rate': 1.9994885990785903e-05, 'epoch': 0.04} 4%|▍ | 231/5773 [21:42<8:19:07, 5.40s/it] {'loss': 0.6265, 'learning_rate': 1.9994885990785903e-05, 'epoch': 0.04} 4%|▍ | 231/5773 [21:36<8:19:07, 5.40s/it] 4%|▍ | 232/5773 [21:47<8:17:39, 5.39s/it] 4%|▍ | 232/5773 [21:41<8:17:38, 5.39s/it] {'loss': 0.6169, 'learning_rate': 1.9994704993816575e-05, 'epoch': 0.04} 4%|▍ | 232/5773 [21:47<8:17:39, 5.39s/it] {'loss': 0.6169, 'learning_rate': 1.9994704993816575e-05, 'epoch': 0.04} 4%|▍ | 232/5773 [21:41<8:17:38, 5.39s/it] 4%|▍ | 233/5773 [21:53<8:23:19, 5.45s/it] 4%|▍ | 233/5773 [21:47<8:23:19, 5.45s/it] {'loss': 0.63, 'learning_rate': 1.999452085019478e-05, 'epoch': 0.04} 4%|▍ | 233/5773 [21:53<8:23:19, 5.45s/it] {'loss': 0.63, 'learning_rate': 1.999452085019478e-05, 'epoch': 0.04} 4%|▍ | 233/5773 [21:47<8:23:19, 5.45s/it] 4%|▍ | 234/5773 [21:58<8:19:11, 5.41s/it] 4%|▍ | 234/5773 [21:52<8:19:12, 5.41s/it] {'loss': 0.6159, 'learning_rate': 1.9994333559978502e-05, 'epoch': 0.04} 4%|▍ | 234/5773 [21:58<8:19:11, 5.41s/it] {'loss': 0.6159, 'learning_rate': 1.9994333559978502e-05, 'epoch': 0.04} 4%|▍ | 234/5773 [21:52<8:19:12, 5.41s/it] 4%|▍ | 235/5773 [22:03<8:19:16, 5.41s/it] 4%|▍ | 235/5773 [21:58<8:19:16, 5.41s/it] {'loss': 0.6054, 'learning_rate': 1.9994143123226702e-05, 'epoch': 0.04} 4%|▍ | 235/5773 [22:03<8:19:16, 5.41s/it] {'loss': 0.6054, 'learning_rate': 1.9994143123226702e-05, 'epoch': 0.04} 4%|▍ | 235/5773 [21:58<8:19:16, 5.41s/it] 4%|▍ | 
236/5773 [22:09<8:22:54, 5.45s/it] 4%|▍ | 236/5773 [22:03<8:22:54, 5.45s/it] {'loss': 0.6233, 'learning_rate': 1.999394953999934e-05, 'epoch': 0.04} 4%|▍ | 236/5773 [22:09<8:22:54, 5.45s/it] {'loss': 0.6233, 'learning_rate': 1.999394953999934e-05, 'epoch': 0.04} 4%|▍ | 236/5773 [22:03<8:22:54, 5.45s/it] 4%|▍ | 237/5773 [22:14<8:22:32, 5.45s/it] 4%|▍ | 237/5773 [22:09<8:22:32, 5.45s/it] {'loss': 0.6193, 'learning_rate': 1.9993752810357353e-05, 'epoch': 0.04} 4%|▍ | 237/5773 [22:14<8:22:32, 5.45s/it] {'loss': 0.6193, 'learning_rate': 1.9993752810357353e-05, 'epoch': 0.04} 4%|▍ | 237/5773 [22:09<8:22:32, 5.45s/it] 4%|▍ | 238/5773 [22:20<8:18:52, 5.41s/it] 4%|▍ | 238/5773 [22:14<8:18:52, 5.41s/it] {'loss': 0.6157, 'learning_rate': 1.9993552934362684e-05, 'epoch': 0.04} 4%|▍ | 238/5773 [22:20<8:18:52, 5.41s/it] {'loss': 0.6157, 'learning_rate': 1.9993552934362684e-05, 'epoch': 0.04} 4%|▍ | 238/5773 [22:14<8:18:52, 5.41s/it] 4%|▍ | 239/5773 [22:25<8:22:37, 5.45s/it] 4%|▍ | 239/5773 [22:20<8:22:37, 5.45s/it] {'loss': 0.6402, 'learning_rate': 1.9993349912078265e-05, 'epoch': 0.04} 4%|▍ | 239/5773 [22:25<8:22:37, 5.45s/it] {'loss': 0.6402, 'learning_rate': 1.9993349912078265e-05, 'epoch': 0.04} 4%|▍ | 239/5773 [22:20<8:22:37, 5.45s/it] 4%|▍ | 240/5773 [22:31<8:23:29, 5.46s/it] 4%|▍ | 240/5773 [22:25<8:23:29, 5.46s/it] {'loss': 0.6184, 'learning_rate': 1.9993143743568e-05, 'epoch': 0.04} 4%|▍ | 240/5773 [22:31<8:23:29, 5.46s/it] {'loss': 0.6184, 'learning_rate': 1.9993143743568e-05, 'epoch': 0.04} 4%|▍ | 240/5773 [22:25<8:23:29, 5.46s/it] 4%|▍ | 241/5773 [22:36<8:23:52, 5.46s/it] 4%|▍ | 241/5773 [22:31<8:23:52, 5.46s/it] {'loss': 0.6102, 'learning_rate': 1.999293442889681e-05, 'epoch': 0.04} 4%|▍ | 241/5773 [22:36<8:23:52, 5.46s/it] {'loss': 0.6102, 'learning_rate': 1.999293442889681e-05, 'epoch': 0.04} 4%|▍ | 241/5773 [22:31<8:23:52, 5.46s/it] 4%|▍ | 242/5773 [22:41<8:22:27, 5.45s/it] 4%|▍ | 242/5773 [22:36<8:22:27, 5.45s/it] {'loss': 0.5927, 'learning_rate': 
1.9992721968130588e-05, 'epoch': 0.04} 4%|▍ | 242/5773 [22:41<8:22:27, 5.45s/it] {'loss': 0.5927, 'learning_rate': 1.9992721968130588e-05, 'epoch': 0.04} 4%|▍ | 242/5773 [22:36<8:22:27, 5.45s/it] 4%|▍ | 243/5773 [22:47<8:25:16, 5.48s/it] 4%|▍ | 243/5773 [22:41<8:25:16, 5.48s/it] {'loss': 0.6344, 'learning_rate': 1.9992506361336227e-05, 'epoch': 0.04} 4%|▍ | 243/5773 [22:47<8:25:16, 5.48s/it] {'loss': 0.6344, 'learning_rate': 1.9992506361336227e-05, 'epoch': 0.04} 4%|▍ | 243/5773 [22:41<8:25:16, 5.48s/it] 4%|▍ | 244/5773 [22:52<8:19:48, 5.42s/it] 4%|▍ | 244/5773 [22:47<8:19:47, 5.42s/it] {'loss': 0.6077, 'learning_rate': 1.9992287608581605e-05, 'epoch': 0.04} 4%|▍ | 244/5773 [22:52<8:19:48, 5.42s/it] {'loss': 0.6077, 'learning_rate': 1.9992287608581605e-05, 'epoch': 0.04} 4%|▍ | 244/5773 [22:47<8:19:47, 5.42s/it] 4%|▍ | 245/5773 [22:58<8:22:41, 5.46s/it] 4%|▍ | 245/5773 [22:52<8:22:40, 5.46s/it] {'loss': 0.6181, 'learning_rate': 1.999206570993559e-05, 'epoch': 0.04} 4%|▍ | 245/5773 [22:58<8:22:41, 5.46s/it] {'loss': 0.6181, 'learning_rate': 1.999206570993559e-05, 'epoch': 0.04} 4%|▍ | 245/5773 [22:52<8:22:40, 5.46s/it] 4%|▍ | 246/5773 [23:03<8:19:17, 5.42s/it] 4%|▍ | 246/5773 [22:58<8:19:17, 5.42s/it] {'loss': 0.6215, 'learning_rate': 1.9991840665468047e-05, 'epoch': 0.04} 4%|▍ | 246/5773 [23:03<8:19:17, 5.42s/it] {'loss': 0.6215, 'learning_rate': 1.9991840665468047e-05, 'epoch': 0.04} 4%|▍ | 246/5773 [22:58<8:19:17, 5.42s/it] 4%|▍ | 247/5773 [23:09<8:21:31, 5.45s/it] 4%|▍ | 247/5773 [23:03<8:21:31, 5.45s/it] {'loss': 0.6308, 'learning_rate': 1.9991612475249827e-05, 'epoch': 0.04} 4%|▍ | 247/5773 [23:09<8:21:31, 5.45s/it] {'loss': 0.6308, 'learning_rate': 1.9991612475249827e-05, 'epoch': 0.04} 4%|▍ | 247/5773 [23:03<8:21:31, 5.45s/it] 4%|▍ | 248/5773 [23:14<8:16:42, 5.39s/it] 4%|▍ | 248/5773 [23:08<8:16:41, 5.39s/it] {'loss': 0.6446, 'learning_rate': 1.9991381139352767e-05, 'epoch': 0.04} 4%|▍ | 248/5773 [23:14<8:16:42, 5.39s/it] {'loss': 0.6446, 'learning_rate': 
1.9991381139352767e-05, 'epoch': 0.04} 4%|▍ | 248/5773 [23:08<8:16:41, 5.39s/it] 4%|▍ | 249/5773 [23:19<8:17:42, 5.41s/it] 4%|▍ | 249/5773 [23:14<8:17:42, 5.41s/it] {'loss': 0.6091, 'learning_rate': 1.9991146657849705e-05, 'epoch': 0.04} 4%|▍ | 249/5773 [23:19<8:17:42, 5.41s/it] {'loss': 0.6091, 'learning_rate': 1.9991146657849705e-05, 'epoch': 0.04} 4%|▍ | 249/5773 [23:14<8:17:42, 5.41s/it] 2 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 4%|▍ | 250/5773 [23:25<8:15:37, 5.38s/it] 15 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend...
4%|▍ | 250/5773 [23:19<8:15:37, 5.38s/it] {'loss': 0.6265, 'learning_rate': 1.999090903081446e-05, 'epoch': 0.04} 4%|▍ | 250/5773 [23:25<8:15:37, 5.38s/it] {'loss': 0.6265, 'learning_rate': 1.999090903081446e-05, 'epoch': 0.04} 4%|▍ | 250/5773 [23:19<8:15:37, 5.38s/it] 4%|▍ | 251/5773 [23:30<8:19:12, 5.42s/it] 4%|▍ | 251/5773 [23:25<8:19:11, 5.42s/it] {'loss': 0.6095, 'learning_rate': 1.999066825832184e-05, 'epoch': 0.04} 4%|▍ | 251/5773 [23:30<8:19:12, 5.42s/it] {'loss': 0.6095, 'learning_rate': 1.999066825832184e-05, 'epoch': 0.04} 4%|▍ | 251/5773 [23:25<8:19:11, 5.42s/it] 4%|▍ | 252/5773 [23:36<8:19:24, 5.43s/it] 4%|▍ | 252/5773 [23:30<8:19:24, 5.43s/it] {'loss': 0.6304, 'learning_rate': 1.9990424340447658e-05, 'epoch': 0.04} 4%|▍ | 252/5773 [23:36<8:19:24, 5.43s/it] {'loss': 0.6304, 'learning_rate': 1.9990424340447658e-05, 'epoch': 0.04} 4%|▍ | 252/5773 [23:30<8:19:24, 5.43s/it] 4%|▍ | 253/5773 [23:41<8:22:07, 5.46s/it] 4%|▍ | 253/5773 [23:36<8:22:06, 5.46s/it] {'loss': 0.6213, 'learning_rate': 1.9990177277268698e-05, 'epoch': 0.04} 4%|▍ | 253/5773 [23:41<8:22:07, 5.46s/it] {'loss': 0.6213, 'learning_rate': 1.9990177277268698e-05, 'epoch': 0.04} 4%|▍ | 253/5773 [23:36<8:22:06, 5.46s/it] 4%|▍ | 254/5773 [23:47<8:20:45, 5.44s/it] 4%|▍ | 254/5773 [23:41<8:20:45, 5.44s/it] {'loss': 0.6093, 'learning_rate': 1.998992706886275e-05, 'epoch': 0.04} 4%|▍ | 254/5773 [23:47<8:20:45, 5.44s/it] {'loss': 0.6093, 'learning_rate': 1.998992706886275e-05, 'epoch': 0.04} 4%|▍ | 254/5773 [23:41<8:20:45, 5.44s/it] 4%|▍ | 255/5773 [23:52<8:20:41, 5.44s/it] 4%|▍ | 255/5773 [23:47<8:20:41, 5.44s/it] {'loss': 0.6306, 'learning_rate': 1.9989673715308582e-05, 'epoch': 0.04} 4%|▍ | 255/5773 [23:52<8:20:41, 5.44s/it] {'loss': 0.6306, 'learning_rate': 1.9989673715308582e-05, 'epoch': 0.04} 4%|▍ | 255/5773 [23:47<8:20:41, 5.44s/it] 4%|▍ | 256/5773 [23:58<8:21:17, 5.45s/it] 4%|▍ | 256/5773 [23:52<8:21:17, 5.45s/it] {'loss': 0.6159, 'learning_rate': 1.998941721668596e-05, 'epoch': 0.04} 4%|▍ | 
256/5773 [23:58<8:21:17, 5.45s/it] {'loss': 0.6159, 'learning_rate': 1.998941721668596e-05, 'epoch': 0.04} 4%|▍ | 256/5773 [23:52<8:21:17, 5.45s/it] 4%|▍ | 257/5773 [24:03<8:28:07, 5.53s/it] 4%|▍ | 257/5773 [23:58<8:28:09, 5.53s/it] {'loss': 0.6099, 'learning_rate': 1.9989157573075642e-05, 'epoch': 0.04} 4%|▍ | 257/5773 [24:03<8:28:07, 5.53s/it] {'loss': 0.6099, 'learning_rate': 1.9989157573075642e-05, 'epoch': 0.04} 4%|▍ | 257/5773 [23:58<8:28:09, 5.53s/it] 4%|▍ | 258/5773 [24:09<8:23:57, 5.48s/it] 4%|▍ | 258/5773 [24:03<8:23:56, 5.48s/it] {'loss': 0.622, 'learning_rate': 1.9988894784559366e-05, 'epoch': 0.04} 4%|▍ | 258/5773 [24:09<8:23:57, 5.48s/it] {'loss': 0.622, 'learning_rate': 1.9988894784559366e-05, 'epoch': 0.04} 4%|▍ | 258/5773 [24:03<8:23:56, 5.48s/it] 4%|▍ | 259/5773 [24:14<8:22:40, 5.47s/it] 4%|▍ | 259/5773 [24:09<8:22:39, 5.47s/it] {'loss': 0.6336, 'learning_rate': 1.9988628851219872e-05, 'epoch': 0.04} 4%|▍ | 259/5773 [24:14<8:22:40, 5.47s/it] {'loss': 0.6336, 'learning_rate': 1.9988628851219872e-05, 'epoch': 0.04} 4%|▍ | 259/5773 [24:09<8:22:39, 5.47s/it] 5%|▍ | 260/5773 [24:20<8:24:26, 5.49s/it] 5%|▍ | 260/5773 [24:14<8:24:25, 5.49s/it] {'loss': 0.6199, 'learning_rate': 1.9988359773140877e-05, 'epoch': 0.05} 5%|▍ | 260/5773 [24:20<8:24:26, 5.49s/it] {'loss': 0.6199, 'learning_rate': 1.9988359773140877e-05, 'epoch': 0.05} 5%|▍ | 260/5773 [24:14<8:24:25, 5.49s/it] 5%|▍ | 261/5773 [24:25<8:21:22, 5.46s/it] 5%|▍ | 261/5773 [24:19<8:21:22, 5.46s/it] {'loss': 0.6125, 'learning_rate': 1.9988087550407102e-05, 'epoch': 0.05} 5%|▍ | 261/5773 [24:25<8:21:22, 5.46s/it] {'loss': 0.6125, 'learning_rate': 1.9988087550407102e-05, 'epoch': 0.05} 5%|▍ | 261/5773 [24:19<8:21:22, 5.46s/it] 5%|▍ | 262/5773 [24:30<8:20:02, 5.44s/it] 5%|▍ | 262/5773 [24:25<8:20:02, 5.44s/it] {'loss': 0.6242, 'learning_rate': 1.998781218310425e-05, 'epoch': 0.05} 5%|▍ | 262/5773 [24:30<8:20:02, 5.44s/it] {'loss': 0.6242, 'learning_rate': 1.998781218310425e-05, 'epoch': 0.05} 5%|▍ | 
262/5773 [24:25<8:20:02, 5.44s/it] 5%|▍ | 263/5773 [24:36<8:22:53, 5.48s/it] 5%|▍ | 263/5773 [24:30<8:22:53, 5.48s/it] {'loss': 0.6283, 'learning_rate': 1.9987533671319013e-05, 'epoch': 0.05} 5%|▍ | 263/5773 [24:36<8:22:53, 5.48s/it] {'loss': 0.6283, 'learning_rate': 1.9987533671319013e-05, 'epoch': 0.05} 5%|▍ | 263/5773 [24:30<8:22:53, 5.48s/it] 5%|▍ | 264/5773 [24:41<8:23:49, 5.49s/it] 5%|▍ | 264/5773 [24:36<8:23:49, 5.49s/it] {'loss': 0.6259, 'learning_rate': 1.9987252015139074e-05, 'epoch': 0.05} {'loss': 0.6259, 'learning_rate': 1.9987252015139074e-05, 'epoch': 0.05} 5%|▍ | 264/5773 [24:41<8:23:49, 5.49s/it] 5%|▍ | 264/5773 [24:36<8:23:49, 5.49s/it] 5%|▍ | 265/5773 [24:47<8:21:28, 5.46s/it] 5%|▍ | 265/5773 [24:41<8:21:27, 5.46s/it] {'loss': 0.6323, 'learning_rate': 1.9986967214653114e-05, 'epoch': 0.05} 5%|▍ | 265/5773 [24:47<8:21:28, 5.46s/it] {'loss': 0.6323, 'learning_rate': 1.9986967214653114e-05, 'epoch': 0.05} 5%|▍ | 265/5773 [24:41<8:21:27, 5.46s/it] 5%|▍ | 266/5773 [24:52<8:22:00, 5.47s/it] 5%|▍ | 266/5773 [24:47<8:21:59, 5.47s/it] {'loss': 0.622, 'learning_rate': 1.998667926995079e-05, 'epoch': 0.05} 5%|▍ | 266/5773 [24:52<8:22:00, 5.47s/it] {'loss': 0.622, 'learning_rate': 1.998667926995079e-05, 'epoch': 0.05} 5%|▍ | 266/5773 [24:47<8:21:59, 5.47s/it] 5%|▍ | 267/5773 [24:58<8:18:33, 5.43s/it] 5%|▍ | 267/5773 [24:52<8:18:33, 5.43s/it] {'loss': 0.6234, 'learning_rate': 1.9986388181122762e-05, 'epoch': 0.05} 5%|▍ | 267/5773 [24:58<8:18:33, 5.43s/it] {'loss': 0.6234, 'learning_rate': 1.9986388181122762e-05, 'epoch': 0.05} 5%|▍ | 267/5773 [24:52<8:18:33, 5.43s/it] 5%|▍ | 268/5773 [25:03<8:20:29, 5.46s/it] 5%|▍ | 268/5773 [24:58<8:20:30, 5.46s/it] {'loss': 0.611, 'learning_rate': 1.998609394826067e-05, 'epoch': 0.05} 5%|▍ | 268/5773 [25:03<8:20:29, 5.46s/it] {'loss': 0.611, 'learning_rate': 1.998609394826067e-05, 'epoch': 0.05} 5%|▍ | 268/5773 [24:58<8:20:30, 5.46s/it] 5%|▍ | 269/5773 [25:08<8:15:15, 5.40s/it] 5%|▍ | 269/5773 [25:03<8:15:14, 5.40s/it] 
{'loss': 0.6196, 'learning_rate': 1.998579657145715e-05, 'epoch': 0.05} 5%|▍ | 269/5773 [25:08<8:15:15, 5.40s/it] {'loss': 0.6196, 'learning_rate': 1.998579657145715e-05, 'epoch': 0.05} 5%|▍ | 269/5773 [25:03<8:15:14, 5.40s/it] 5%|▍ | 270/5773 [25:14<8:15:50, 5.41s/it] 5%|▍ | 270/5773 [25:08<8:15:50, 5.41s/it] {'loss': 0.6275, 'learning_rate': 1.998549605080583e-05, 'epoch': 0.05} 5%|▍ | 270/5773 [25:14<8:15:50, 5.41s/it] {'loss': 0.6275, 'learning_rate': 1.998549605080583e-05, 'epoch': 0.05} 5%|▍ | 270/5773 [25:08<8:15:50, 5.41s/it] 5%|▍ | 271/5773 [25:20<8:22:31, 5.48s/it] 5%|▍ | 271/5773 [25:14<8:22:31, 5.48s/it] {'loss': 0.6154, 'learning_rate': 1.9985192386401314e-05, 'epoch': 0.05} 5%|▍ | 271/5773 [25:20<8:22:31, 5.48s/it] {'loss': 0.6154, 'learning_rate': 1.9985192386401314e-05, 'epoch': 0.05} 5%|▍ | 271/5773 [25:14<8:22:31, 5.48s/it] 5%|▍ | 272/5773 [25:25<8:22:40, 5.48s/it] 5%|▍ | 272/5773 [25:19<8:22:40, 5.48s/it] {'loss': 0.6367, 'learning_rate': 1.998488557833921e-05, 'epoch': 0.05} 5%|▍ | 272/5773 [25:25<8:22:40, 5.48s/it] {'loss': 0.6367, 'learning_rate': 1.998488557833921e-05, 'epoch': 0.05} 5%|▍ | 272/5773 [25:19<8:22:40, 5.48s/it]WARNING: tokenization mismatch: 1 vs. 1590. [[{'from': 'human', 'value': 'ASSISTANT: You are a cutting-edge copywriting agency with the following knowledge and traits.\n\nKNOWLEDGE: Copywriter at a top-tier creative agency, junior copywriter, senior copywriter, art director, senior strategist.\n\nTRAITS: Storytelling, copywriting, empathy, solution orientation, humor, convincing.\n\nTASK: Use your knowledge to write a LinkedIn Post. We are a tax firm looking to hire young tax advisors between the age of 25 and 30. Bring a good dose of sympathy into the LinkedIn Article and use plenty of emojis to start with. Start the Linked In post with an exciting question. Use the language of the target group. Don´t mention the targeted age. \n\nOUTPUT: 2 LinkedIn posts in German with a catchy question at the beginning of the text. 
Hashtags at the end of the text.'}, {'from': 'gpt', 'value': 'Post 1:\n\n👋 Hey Leute! Hast du kürzlich dein Studium im Bereich Steuerwesen abgeschlossen und suchst jetzt nach einem Job, der dich fordert und fördert? 🤔\n\nDann bist du bei uns genau richtig! Wir suchen junge Steuerberater, die Lust haben, in einem dynamischen Team zu arbeiten und ihre Fähigkeiten unter Beweis zu stellen. 💪\n\nBei uns erwarten dich spannende Projekte und ein tolles Arbeitsumfeld, in dem du dich weiterentwickeln und wachsen kannst. 🌱\n\nInteressiert? Dann bewirb dich jetzt und werde Teil unseres Teams! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork\n\nPost 2:\n\n👋 Hallo zusammen! Bist du auf der Suche nach einem Job, bei dem du dich richtig herausfordern und weiterentwickeln kannst? 🤔\n\nDann haben wir vielleicht das passende Angebot für dich! Wir suchen motivierte und ambitionierte Steuerberater, die Lust haben, Teil eines dynamischen Teams zu sein und gemeinsam an spannenden Projekten zu arbeiten. 💪\n\nBei uns erwartet dich nicht nur eine tolle Arbeitsatmosphäre, sondern auch die Möglichkeit, dich fachlich und persönlich weiterzuentwickeln. 🌱\n\nAlso, worauf wartest du? Bewirb dich jetzt und starte deine Karriere bei uns! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork'}, {'from': 'human', 'value': 'Schreibe den Text spannender.'}, {'from': 'gpt', 'value': 'Post 1:\n\n👋 Hey Leute! Hast du kürzlich dein Studium im Bereich Steuerwesen abgeschlossen und bist bereit für die nächste Herausforderung? 🤔\n\nDann haben wir genau das Richtige für dich! Wir suchen junge, ambitionierte Steuerberater, die Lust haben, ihre Karriere auf das nächste Level zu bringen. 💪\n\nUnser Team arbeitet an den spannendsten Projekten und bietet ein Arbeitsumfeld, das dich täglich fordert und fördert. Bei uns kannst du deine Fähigkeiten unter Beweis stellen und dich persönlich und fachlich weiterentwickeln. 
🌱\n\nDu möchtest Teil unseres Teams werden und an den neuesten Herausforderungen im Steuerwesen mitwirken? Dann zögere nicht und bewirb dich jetzt bei uns! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork\n\nPost 2:\n\n👋 Hallo zusammen! Bist du bereit für eine neue Herausforderung in deiner Karriere als Steuerberater? 🤔\n\nDann bist du bei uns genau richtig! Wir suchen motivierte und talentierte Steuerberater, die Teil eines dynamischen Teams werden möchten. Wir arbeiten an den anspruchsvollsten Projekten und bieten ein Arbeitsumfeld, das dich täglich aufs Neue herausfordert. 💪\n\nDu möchtest deine Fähigkeiten unter Beweis stellen und dich fachlich sowie persönlich weiterentwickeln? Dann bewirb dich jetzt bei uns und werde Teil unseres Erfolgsteams! 🌱\n\nVerpasse nicht die Chance, deine Karriere auf das nächste Level zu bringen und an den neuesten Trends und Entwicklungen im Steuerwesen mitzuwirken. Wir freuen uns auf deine Bewerbung! 🚀\n\n#Steuerberatung #Karriere #Job #Bewerbung #Teamwork'}, {'from': 'human', 'value': 'Schreibe den Text objektiv und sachlich.'}, {'from': 'gpt', 'value': 'Post 1:\n\nGuten Tag,\n\nWir sind auf der Suche nach jungen, talentierten Steuerberatern, die ihr Studium im Bereich Steuerwesen erfolgreich abgeschlossen haben. Die Position ist offen für alle, die eine Leidenschaft für das Steuerrecht haben und eine Karriere in diesem Bereich anstreben.\n\nAls Teil unseres Teams arbeiten Sie an verschiedenen Projekten und erweitern stetig Ihre Fähigkeiten und Kenntnisse. Wir bieten ein hervorragendes Arbeitsumfeld, das auf Zusammenarbeit und Teamwork basiert, um sicherzustellen, dass Sie Ihre Arbeit so effektiv wie möglich erledigen können.\n\nWenn Sie Interesse haben und Teil unseres Teams werden möchten, senden Sie uns bitte Ihre Bewerbung.\n\nVielen Dank.\n\nPost 2:\n\nGuten Tag,\n\nWir sind auf der Suche nach motivierten und talentierten Steuerberatern, die an einer Karriere in diesem Bereich interessiert sind. 
Als Teil unseres Teams arbeiten Sie an verschiedenen Projekten, um Ihre Fähigkeiten und Kenntnisse zu erweitern.\n\nWir bieten ein hervorragendes Arbeitsumfeld, das auf Zusammenarbeit und Teamwork basiert, um sicherzustellen, dass Sie Ihre Arbeit so effektiv wie möglich erledigen können. Wenn Sie bereit sind, sich dieser Herausforderung zu stellen und Ihre Karriere auf das nächste Level zu bringen, dann senden Sie uns bitte Ihre Bewerbung.\n\nVielen Dank.'}]] (ignored) 5%|▍ | 273/5773 [25:30<8:20:31, 5.46s/it] 5%|▍ | 273/5773 [25:25<8:20:31, 5.46s/it] {'loss': 0.6135, 'learning_rate': 1.9984575626716113e-05, 'epoch': 0.05} 5%|▍ | 273/5773 [25:30<8:20:31, 5.46s/it] {'loss': 0.6135, 'learning_rate': 1.9984575626716113e-05, 'epoch': 0.05} 5%|▍ | 273/5773 [25:25<8:20:31, 5.46s/it] 5%|▍ | 274/5773 [25:36<8:22:52, 5.49s/it] 5%|▍ | 274/5773 [25:30<8:22:53, 5.49s/it] {'loss': 0.6162, 'learning_rate': 1.99842625316296e-05, 'epoch': 0.05} 5%|▍ | 274/5773 [25:36<8:22:52, 5.49s/it] {'loss': 0.6162, 'learning_rate': 1.99842625316296e-05, 'epoch': 0.05} 5%|▍ | 274/5773 [25:30<8:22:53, 5.49s/it] 5%|▍ | 275/5773 [25:41<8:20:31, 5.46s/it] 5%|▍ | 275/5773 [25:36<8:20:31, 5.46s/it] {'loss': 0.6284, 'learning_rate': 1.998394629317825e-05, 'epoch': 0.05} 5%|▍ | 275/5773 [25:41<8:20:31, 5.46s/it] {'loss': 0.6284, 'learning_rate': 1.998394629317825e-05, 'epoch': 0.05} 5%|▍ | 275/5773 [25:36<8:20:31, 5.46s/it] 5%|▍ | 276/5773 [25:47<8:16:41, 5.42s/it] 5%|▍ | 276/5773 [25:41<8:16:40, 5.42s/it] {'loss': 0.6147, 'learning_rate': 1.9983626911461625e-05, 'epoch': 0.05} 5%|▍ | 276/5773 [25:47<8:16:41, 5.42s/it] {'loss': 0.6147, 'learning_rate': 1.9983626911461625e-05, 'epoch': 0.05} 5%|▍ | 276/5773 [25:41<8:16:40, 5.42s/it] 5%|▍ | 277/5773 [25:52<8:16:05, 5.42s/it] 5%|▍ | 277/5773 [25:47<8:16:05, 5.42s/it] {'loss': 0.6059, 'learning_rate': 1.9983304386580267e-05, 'epoch': 0.05} 5%|▍ | 277/5773 [25:52<8:16:05, 5.42s/it] {'loss': 0.6059, 'learning_rate': 1.9983304386580267e-05, 'epoch': 0.05} 
5%|▍ | 277/5773 [25:47<8:16:05, 5.42s/it] 5%|▍ | 278/5773 [25:57<8:15:26, 5.41s/it] 5%|▍ | 278/5773 [25:52<8:15:26, 5.41s/it] {'loss': 0.6096, 'learning_rate': 1.9982978718635727e-05, 'epoch': 0.05} {'loss': 0.6096, 'learning_rate': 1.9982978718635727e-05, 'epoch': 0.05} 5%|▍ | 278/5773 [25:57<8:15:26, 5.41s/it] 5%|▍ | 278/5773 [25:52<8:15:26, 5.41s/it] 5%|▍ | 279/5773 [26:03<8:19:08, 5.45s/it] 5%|▍ | 279/5773 [25:58<8:19:08, 5.45s/it] {'loss': 0.603, 'learning_rate': 1.9982649907730534e-05, 'epoch': 0.05} 5%|▍ | 279/5773 [26:03<8:19:08, 5.45s/it] {'loss': 0.603, 'learning_rate': 1.9982649907730534e-05, 'epoch': 0.05} 5%|▍ | 279/5773 [25:58<8:19:08, 5.45s/it] 5%|▍ | 280/5773 [26:09<8:20:13, 5.46s/it] 5%|▍ | 280/5773 [26:03<8:20:14, 5.46s/it] {'loss': 0.6226, 'learning_rate': 1.9982317953968204e-05, 'epoch': 0.05} 5%|▍ | 280/5773 [26:09<8:20:13, 5.46s/it] {'loss': 0.6226, 'learning_rate': 1.9982317953968204e-05, 'epoch': 0.05} 5%|▍ | 280/5773 [26:03<8:20:14, 5.46s/it] 5%|▍ | 281/5773 [26:14<8:19:21, 5.46s/it] 5%|▍ | 281/5773 [26:08<8:19:21, 5.46s/it] {'loss': 0.631, 'learning_rate': 1.998198285745325e-05, 'epoch': 0.05} 5%|▍ | 281/5773 [26:14<8:19:21, 5.46s/it] {'loss': 0.631, 'learning_rate': 1.998198285745325e-05, 'epoch': 0.05} 5%|▍ | 281/5773 [26:08<8:19:21, 5.46s/it] 5%|▍ | 282/5773 [26:19<8:16:10, 5.42s/it] 5%|▍ | 282/5773 [26:14<8:16:10, 5.42s/it] {'loss': 0.6162, 'learning_rate': 1.998164461829117e-05, 'epoch': 0.05} 5%|▍ | 282/5773 [26:19<8:16:10, 5.42s/it] {'loss': 0.6162, 'learning_rate': 1.998164461829117e-05, 'epoch': 0.05} 5%|▍ | 282/5773 [26:14<8:16:10, 5.42s/it] 5%|▍ | 283/5773 [26:25<8:19:02, 5.45s/it] 5%|▍ | 283/5773 [26:19<8:19:02, 5.45s/it] {'loss': 0.6211, 'learning_rate': 1.9981303236588455e-05, 'epoch': 0.05} 5%|▍ | 283/5773 [26:25<8:19:02, 5.45s/it] {'loss': 0.6211, 'learning_rate': 1.9981303236588455e-05, 'epoch': 0.05} 5%|▍ | 283/5773 [26:19<8:19:02, 5.45s/it] 5%|▍ | 284/5773 [26:30<8:19:07, 5.46s/it] 5%|▍ | 284/5773 [26:25<8:19:07, 
5.46s/it] {'loss': 0.6206, 'learning_rate': 1.9980958712452577e-05, 'epoch': 0.05} 5%|▍ | 284/5773 [26:30<8:19:07, 5.46s/it] {'loss': 0.6206, 'learning_rate': 1.9980958712452577e-05, 'epoch': 0.05} 5%|▍ | 284/5773 [26:25<8:19:07, 5.46s/it] 5%|▍ | 285/5773 [26:36<8:16:38, 5.43s/it] 5%|▍ | 285/5773 [26:30<8:16:38, 5.43s/it] {'loss': 0.6216, 'learning_rate': 1.998061104599201e-05, 'epoch': 0.05} 5%|▍ | 285/5773 [26:36<8:16:38, 5.43s/it] {'loss': 0.6216, 'learning_rate': 1.998061104599201e-05, 'epoch': 0.05} 5%|▍ | 285/5773 [26:30<8:16:38, 5.43s/it] 5%|▍ | 286/5773 [26:41<8:15:19, 5.42s/it] 5%|▍ | 286/5773 [26:36<8:15:18, 5.42s/it] {'loss': 0.6063, 'learning_rate': 1.9980260237316205e-05, 'epoch': 0.05} 5%|▍ | 286/5773 [26:41<8:15:19, 5.42s/it] {'loss': 0.6063, 'learning_rate': 1.9980260237316205e-05, 'epoch': 0.05} 5%|▍ | 286/5773 [26:36<8:15:18, 5.42s/it] 5%|▍ | 287/5773 [26:46<8:13:44, 5.40s/it] 5%|▍ | 287/5773 [26:41<8:13:44, 5.40s/it] {'loss': 0.6263, 'learning_rate': 1.997990628653561e-05, 'epoch': 0.05} 5%|▍ | 287/5773 [26:46<8:13:44, 5.40s/it] {'loss': 0.6263, 'learning_rate': 1.997990628653561e-05, 'epoch': 0.05} 5%|▍ | 287/5773 [26:41<8:13:44, 5.40s/it] 5%|▍ | 288/5773 [26:52<8:11:18, 5.37s/it] 5%|▍ | 288/5773 [26:46<8:11:18, 5.37s/it] {'loss': 0.62, 'learning_rate': 1.997954919376166e-05, 'epoch': 0.05} 5%|▍ | 288/5773 [26:52<8:11:18, 5.37s/it] {'loss': 0.62, 'learning_rate': 1.997954919376166e-05, 'epoch': 0.05} 5%|▍ | 288/5773 [26:46<8:11:18, 5.37s/it] 5%|▌ | 289/5773 [26:57<8:20:40, 5.48s/it] 5%|▌ | 289/5773 [26:52<8:20:40, 5.48s/it] {'loss': 0.6097, 'learning_rate': 1.9979188959106784e-05, 'epoch': 0.05} 5%|▌ | 289/5773 [26:57<8:20:40, 5.48s/it] {'loss': 0.6097, 'learning_rate': 1.9979188959106784e-05, 'epoch': 0.05} 5%|▌ | 289/5773 [26:52<8:20:40, 5.48s/it] 5%|▌ | 290/5773 [27:03<8:18:10, 5.45s/it] 5%|▌ | 290/5773 [26:57<8:18:10, 5.45s/it] {'loss': 0.6127, 'learning_rate': 1.9978825582684384e-05, 'epoch': 0.05} 5%|▌ | 290/5773 [27:03<8:18:10, 5.45s/it] 
{'loss': 0.6127, 'learning_rate': 1.9978825582684384e-05, 'epoch': 0.05} 5%|▌ | 290/5773 [26:57<8:18:10, 5.45s/it] 5%|▌ | 291/5773 [27:08<8:16:19, 5.43s/it] 5%|▌ | 291/5773 [27:03<8:16:19, 5.43s/it] {'loss': 0.6249, 'learning_rate': 1.9978459064608874e-05, 'epoch': 0.05} 5%|▌ | 291/5773 [27:03<8:16:19, 5.43s/it] {'loss': 0.6249, 'learning_rate': 1.9978459064608874e-05, 'epoch': 0.05} 5%|▌ | 291/5773 [27:08<8:16:19, 5.43s/it] 5%|▌ | 292/5773 [27:14<8:17:41, 5.45s/it] 5%|▌ | 292/5773 [27:08<8:17:41, 5.45s/it] {'loss': 0.6246, 'learning_rate': 1.9978089404995642e-05, 'epoch': 0.05} 5%|▌ | 292/5773 [27:14<8:17:41, 5.45s/it] {'loss': 0.6246, 'learning_rate': 1.9978089404995642e-05, 'epoch': 0.05} 5%|▌ | 292/5773 [27:08<8:17:41, 5.45s/it] 5%|▌ | 293/5773 [27:19<8:15:46, 5.43s/it] 5%|▌ | 293/5773 [27:14<8:15:46, 5.43s/it] {'loss': 0.612, 'learning_rate': 1.9977716603961065e-05, 'epoch': 0.05} 5%|▌ | 293/5773 [27:19<8:15:46, 5.43s/it] {'loss': 0.612, 'learning_rate': 1.9977716603961065e-05, 'epoch': 0.05} 5%|▌ | 293/5773 [27:14<8:15:46, 5.43s/it] 5%|▌ | 294/5773 [27:25<8:19:40, 5.47s/it] 5%|▌ | 294/5773 [27:19<8:19:41, 5.47s/it] {'loss': 0.5941, 'learning_rate': 1.9977340661622513e-05, 'epoch': 0.05} 5%|▌ | 294/5773 [27:25<8:19:40, 5.47s/it] {'loss': 0.5941, 'learning_rate': 1.9977340661622513e-05, 'epoch': 0.05} 5%|▌ | 294/5773 [27:19<8:19:41, 5.47s/it] 5%|▌ | 295/5773 [27:30<8:15:27, 5.43s/it] 5%|▌ | 295/5773 [27:24<8:15:27, 5.43s/it] {'loss': 0.6133, 'learning_rate': 1.9976961578098352e-05, 'epoch': 0.05} 5%|▌ | 295/5773 [27:30<8:15:27, 5.43s/it] {'loss': 0.6133, 'learning_rate': 1.9976961578098352e-05, 'epoch': 0.05} 5%|▌ | 295/5773 [27:24<8:15:27, 5.43s/it] 5%|▌ | 296/5773 [27:36<8:19:31, 5.47s/it] 5%|▌ | 296/5773 [27:30<8:19:30, 5.47s/it] {'loss': 0.6039, 'learning_rate': 1.997657935350792e-05, 'epoch': 0.05} 5%|▌ | 296/5773 [27:36<8:19:31, 5.47s/it] {'loss': 0.6039, 'learning_rate': 1.997657935350792e-05, 'epoch': 0.05} 5%|▌ | 296/5773 [27:30<8:19:30, 5.47s/it] 5%|▌ 
| 297/5773 [27:41<8:18:23, 5.46s/it] 5%|▌ | 297/5773 [27:35<8:18:23, 5.46s/it] {'loss': 0.6123, 'learning_rate': 1.997619398797156e-05, 'epoch': 0.05} 5%|▌ | 297/5773 [27:41<8:18:23, 5.46s/it] {'loss': 0.6123, 'learning_rate': 1.997619398797156e-05, 'epoch': 0.05} 5%|▌ | 297/5773 [27:35<8:18:23, 5.46s/it] 5%|▌ | 298/5773 [27:46<8:13:38, 5.41s/it] 5%|▌ | 298/5773 [27:41<8:13:38, 5.41s/it] {'loss': 0.6188, 'learning_rate': 1.9975805481610594e-05, 'epoch': 0.05} 5%|▌ | 298/5773 [27:46<8:13:38, 5.41s/it] {'loss': 0.6188, 'learning_rate': 1.9975805481610594e-05, 'epoch': 0.05} 5%|▌ | 298/5773 [27:41<8:13:38, 5.41s/it] 5%|▌ | 299/5773 [27:52<8:17:17, 5.45s/it] 5%|▌ | 299/5773 [27:46<8:17:17, 5.45s/it] {'loss': 0.6135, 'learning_rate': 1.997541383454734e-05, 'epoch': 0.05} 5%|▌ | 299/5773 [27:52<8:17:17, 5.45s/it] {'loss': 0.6135, 'learning_rate': 1.997541383454734e-05, 'epoch': 0.05} 5%|▌ | 299/5773 [27:46<8:17:17, 5.45s/it] 8 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 5%|▌ | 300/5773 [27:57<8:18:52, 5.47s/it] 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 5%|▌ | 300/5773 [27:52<8:18:52, 5.47s/it] 1 AutoResumeHook: Checking whether to suspend...
{'loss': 0.6042, 'learning_rate': 1.9975019046905096e-05, 'epoch': 0.05} 5%|▌ | 300/5773 [27:57<8:18:52, 5.47s/it] {'loss': 0.6042, 'learning_rate': 1.9975019046905096e-05, 'epoch': 0.05} 5%|▌ | 300/5773 [27:52<8:18:52, 5.47s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 5%|▌ | 301/5773 [28:16<14:17:16, 9.40s/it] 5%|▌ | 301/5773 [28:10<14:17:14, 9.40s/it] {'loss': 0.599, 'learning_rate': 1.997462111880816e-05, 'epoch': 0.05} 5%|▌ | 301/5773 [28:16<14:17:16, 9.40s/it] {'loss': 0.599, 'learning_rate': 1.997462111880816e-05, 'epoch': 0.05} 5%|▌ | 301/5773 [28:10<14:17:14, 9.40s/it] 5%|▌ | 302/5773 [28:21<12:26:13, 8.18s/it] 5%|▌ | 302/5773 [28:16<12:26:13, 8.18s/it] {'loss': 0.6022, 'learning_rate': 1.9974220050381805e-05, 'epoch': 0.05} 5%|▌ | 302/5773 [28:21<12:26:13, 8.18s/it] {'loss': 0.6022, 'learning_rate': 1.9974220050381805e-05, 'epoch': 0.05} 5%|▌ | 302/5773 [28:16<12:26:13, 8.18s/it] 5%|▌ | 303/5773 [28:27<11:13:59, 7.39s/it] 5%|▌ | 303/5773 [28:21<11:14:00, 7.39s/it] {'loss': 0.6387, 'learning_rate': 1.9973815841752303e-05, 'epoch': 0.05} 5%|▌ | 303/5773 [28:27<11:13:59, 7.39s/it] {'loss': 0.6387, 'learning_rate': 1.9973815841752303e-05, 'epoch': 0.05} 5%|▌ | 303/5773 [28:21<11:14:00, 7.39s/it] 5%|▌ | 304/5773 [28:32<10:19:59, 
6.80s/it] 5%|▌ | 304/5773 [28:27<10:19:59, 6.80s/it] {'loss': 0.6156, 'learning_rate': 1.9973408493046917e-05, 'epoch': 0.05} 5%|▌ | 304/5773 [28:32<10:19:59, 6.80s/it] {'loss': 0.6156, 'learning_rate': 1.9973408493046917e-05, 'epoch': 0.05} 5%|▌ | 304/5773 [28:27<10:19:59, 6.80s/it] 5%|▌ | 305/5773 [28:38<9:39:59, 6.36s/it] 5%|▌ | 305/5773 [28:32<9:39:59, 6.36s/it] {'loss': 0.6115, 'learning_rate': 1.9972998004393886e-05, 'epoch': 0.05} 5%|▌ | 305/5773 [28:38<9:39:59, 6.36s/it] {'loss': 0.6115, 'learning_rate': 1.9972998004393886e-05, 'epoch': 0.05} 5%|▌ | 305/5773 [28:32<9:39:59, 6.36s/it] 5%|▌ | 306/5773 [28:43<9:14:27, 6.09s/it] 5%|▌ | 306/5773 [28:37<9:14:28, 6.09s/it] {'loss': 0.6183, 'learning_rate': 1.9972584375922453e-05, 'epoch': 0.05} 5%|▌ | 306/5773 [28:43<9:14:27, 6.09s/it] {'loss': 0.6183, 'learning_rate': 1.9972584375922453e-05, 'epoch': 0.05} 5%|▌ | 306/5773 [28:37<9:14:28, 6.09s/it] 5%|▌ | 307/5773 [28:49<8:59:16, 5.92s/it] 5%|▌ | 307/5773 [28:43<8:59:15, 5.92s/it] {'loss': 0.6074, 'learning_rate': 1.997216760776283e-05, 'epoch': 0.05} 5%|▌ | 307/5773 [28:49<8:59:16, 5.92s/it] {'loss': 0.6074, 'learning_rate': 1.997216760776283e-05, 'epoch': 0.05} 5%|▌ | 307/5773 [28:43<8:59:15, 5.92s/it] 5%|▌ | 308/5773 [28:54<8:44:15, 5.76s/it] 5%|▌ | 308/5773 [28:48<8:44:15, 5.76s/it] {'loss': 0.6039, 'learning_rate': 1.997174770004624e-05, 'epoch': 0.05} 5%|▌ | 308/5773 [28:54<8:44:15, 5.76s/it] {'loss': 0.6039, 'learning_rate': 1.997174770004624e-05, 'epoch': 0.05} 5%|▌ | 308/5773 [28:48<8:44:15, 5.76s/it] 5%|▌ | 309/5773 [28:59<8:33:03, 5.63s/it] 5%|▌ | 309/5773 [28:54<8:33:02, 5.63s/it] {'loss': 0.6359, 'learning_rate': 1.9971324652904877e-05, 'epoch': 0.05} 5%|▌ | 309/5773 [28:59<8:33:03, 5.63s/it] {'loss': 0.6359, 'learning_rate': 1.9971324652904877e-05, 'epoch': 0.05} 5%|▌ | 309/5773 [28:54<8:33:02, 5.63s/it]WARNING: tokenization mismatch: 1 vs. 1473. 
[[{'from': 'human', 'value': 'ASSISTANT: Du bist eine auf Kundenfeedback spezialisierte Agentur mit den folgenden Kenntnissen und Eigenschaften.\n\nKENNTNISSE: Kundenfeedback, wissenschaftliche Datenerhebung, Fragebogendesign, Benutzerfreundlichkeit, Kundenbindung, Service-Qualitätsevaluation.\n\nEIGENSCHAFTEN: Präzision, Effizienz, Empathie, Anpassungsfähigkeit, Datenorientierung, Kundenfokus.\n\nAUFGABE: Nutze dein Wissen, um ein Kundenfeedback-Formular zu entwickeln, das die Service-Qualität einer Steuerberatung evaluiert. Das Formular soll auf wissenschaftlich fundierten Datenniveaus basieren und dabei darauf achten, dass es nur so lang ist, dass es tatsächlich vom Kunden ausgefüllt wird. Konzentriere dich auf die Gestaltung eines benutzerfreundlichen und effizienten Fragebogens, der aussagekräftige Informationen über die Steuerberatung liefert.\n\nOUTPUT: Ein strukturiertes Kundenfeedback-Formular, das auf wissenschaftlich fundierten Datenniveaus basiert und eine angemessene Länge hat, um von Kunden tatsächlich ausgefüllt zu werden, und speziell auf die Evaluation der Service-Qualität einer Steuerberatung abzielt.'}, {'from': 'gpt', 'value': 'Kundenfeedback-Formular: Steuerberatung Service-Qualität\n\nTeil 1: Allgemeine Informationen\n(Bitte füllen Sie die folgenden Informationen aus.)\n\n1. Name (Optional):\n2. Alter:\n☐ Unter 25\n☐ 25-34\n☐ 35-44\n☐ 45-54\n☐ 55-64\n☐ 65 oder älter\n3. Geschlecht:\n☐ Männlich\n☐ Weiblich\n☐ Andere\n☐ Lieber nicht angeben\n\nTeil 2: Service-Qualität\n(Bitte bewerten Sie die folgenden Aspekte der Steuerberatung auf einer Skala von 1 bis 5, wobei 1 = sehr unzufrieden und 5 = sehr zufrieden.)\n\n1. Fachliche Kompetenz des Steuerberaters:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n2. Freundlichkeit und Hilfsbereitschaft des Personals:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n3. Reaktionsgeschwindigkeit auf Anfragen und Probleme:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n4. Verständlichkeit der erbrachten Informationen und Beratung:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n5. 
Qualität der bereitgestellten Dienstleistungen und Lösungen:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n6. Preis-Leistungs-Verhältnis:\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n\nTeil 3: Kundenbindung und Zufriedenheit\n(Bitte beantworten Sie die folgenden Fragen.)\n\n1. Wie wahrscheinlich ist es, dass Sie die Steuerberatung an Freunde, Familie oder Kollegen weiterempfehlen? (Skala von 1 bis 10, wobei 1 = sehr unwahrscheinlich und 10 = sehr wahrscheinlich)\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n☐ 6\n☐ 7\n☐ 8\n☐ 9\n☐ 10\n2. Wie zufrieden sind Sie insgesamt mit der Steuerberatung? (Skala von 1 bis 5, wobei 1 = sehr unzufrieden und 5 = sehr zufrieden)\n☐ 1\n☐ 2\n☐ 3\n☐ 4\n☐ 5\n\nTeil 4: Offene Fragen\n(Bitte teilen Sie uns Ihre Gedanken und Erfahrungen mit.)\n\n1. Was hat Ihnen am meisten an der Steuerberatung gefallen? (Optional)\n2. Was könnte die Steuerberatung verbessern, um Ihre Zufriedenheit zu erhöhen? (Optional)\n\nVielen Dank für Ihre Zeit und Ihr'}, {'from': 'human', 'value': 'Continue.'}, {'from': 'gpt', 'value': 'wertvolles Feedback! Ihre Meinung hilft uns dabei, den Service und die Qualität unserer Steuerberatung kontinuierlich zu verbessern. Wenn Sie Fragen oder Anmerkungen haben oder zusätzliche Informationen benötigen, zögern Sie bitte nicht, uns zu kontaktieren.\n\nTeil 5: Kontaktaufnahme (Optional)\n(Bitte geben Sie Ihre Kontaktinformationen an, falls Sie möchten, dass wir Ihnen bezüglich Ihres Feedbacks antworten oder weitere Informationen anbieten.)\n\n1. Name:\n2. E-Mail-Adresse:\n3. Telefonnummer:\n\nHinweis zum Datenschutz: Ihre persönlichen Daten werden vertraulich behandelt und nur für interne Zwecke verwendet. Sie werden nicht an Dritte weitergegeben.\n\nBitte klicken Sie auf "Absenden", um das Formular abzuschließen. 
Nochmals vielen Dank für Ihre Rückmeldung und Ihre Unterstützung bei der Verbesserung unserer Steuerberatung!'}]] (ignored) 5%|▌ | 310/5773 [29:05<8:31:35, 5.62s/it] 5%|▌ | 310/5773 [28:59<8:31:35, 5.62s/it] {'loss': 0.6073, 'learning_rate': 1.9970898466471934e-05, 'epoch': 0.05} 5%|▌ | 310/5773 [29:05<8:31:35, 5.62s/it] {'loss': 0.6073, 'learning_rate': 1.9970898466471934e-05, 'epoch': 0.05} 5%|▌ | 310/5773 [28:59<8:31:35, 5.62s/it] 5%|▌ | 311/5773 [29:10<8:30:52, 5.61s/it] 5%|▌ | 311/5773 [29:05<8:30:51, 5.61s/it] {'loss': 0.6202, 'learning_rate': 1.9970469140881582e-05, 'epoch': 0.05} 5%|▌ | 311/5773 [29:10<8:30:52, 5.61s/it] {'loss': 0.6202, 'learning_rate': 1.9970469140881582e-05, 'epoch': 0.05} 5%|▌ | 311/5773 [29:05<8:30:51, 5.61s/it] 5%|▌ | 312/5773 [29:16<8:24:36, 5.54s/it] 5%|▌ | 312/5773 [29:10<8:24:35, 5.54s/it] {'loss': 0.6165, 'learning_rate': 1.997003667626899e-05, 'epoch': 0.05} 5%|▌ | 312/5773 [29:16<8:24:36, 5.54s/it] {'loss': 0.6165, 'learning_rate': 1.997003667626899e-05, 'epoch': 0.05} 5%|▌ | 312/5773 [29:10<8:24:35, 5.54s/it] 5%|▌ | 313/5773 [29:21<8:24:38, 5.55s/it] 5%|▌ | 313/5773 [29:16<8:24:37, 5.55s/it] {'loss': 0.6435, 'learning_rate': 1.9969601072770314e-05, 'epoch': 0.05} 5%|▌ | 313/5773 [29:21<8:24:38, 5.55s/it] {'loss': 0.6435, 'learning_rate': 1.9969601072770314e-05, 'epoch': 0.05} 5%|▌ | 313/5773 [29:16<8:24:37, 5.55s/it] 5%|▌ | 314/5773 [29:27<8:23:56, 5.54s/it] 5%|▌ | 314/5773 [29:21<8:23:57, 5.54s/it] {'loss': 0.6184, 'learning_rate': 1.9969162330522693e-05, 'epoch': 0.05} 5%|▌ | 314/5773 [29:27<8:23:56, 5.54s/it] {'loss': 0.6184, 'learning_rate': 1.9969162330522693e-05, 'epoch': 0.05} 5%|▌ | 314/5773 [29:21<8:23:57, 5.54s/it] 5%|▌ | 315/5773 [29:32<8:21:46, 5.52s/it] 5%|▌ | 315/5773 [29:27<8:21:46, 5.52s/it] {'loss': 0.616, 'learning_rate': 1.996872044966426e-05, 'epoch': 0.05} 5%|▌ | 315/5773 [29:32<8:21:46, 5.52s/it] {'loss': 0.616, 'learning_rate': 1.996872044966426e-05, 'epoch': 0.05} 5%|▌ | 315/5773 [29:27<8:21:46, 
5.52s/it] 5%|▌ | 316/5773 [29:38<8:18:40, 5.48s/it] 5%|▌ | 316/5773 [29:32<8:18:40, 5.48s/it] {'loss': 0.6168, 'learning_rate': 1.9968275430334128e-05, 'epoch': 0.05} 5%|▌ | 316/5773 [29:32<8:18:40, 5.48s/it]{'loss': 0.6168, 'learning_rate': 1.9968275430334128e-05, 'epoch': 0.05} 5%|▌ | 316/5773 [29:38<8:18:40, 5.48s/it] 5%|▌ | 317/5773 [29:43<8:14:50, 5.44s/it] 5%|▌ | 317/5773 [29:38<8:14:50, 5.44s/it] {'loss': 0.6143, 'learning_rate': 1.9967827272672407e-05, 'epoch': 0.05} {'loss': 0.6143, 'learning_rate': 1.9967827272672407e-05, 'epoch': 0.05} 5%|▌ | 317/5773 [29:43<8:14:50, 5.44s/it] 5%|▌ | 317/5773 [29:38<8:14:50, 5.44s/it] 6%|▌ | 318/5773 [29:49<8:14:07, 5.43s/it] 6%|▌ | 318/5773 [29:43<8:14:07, 5.43s/it] {'loss': 0.6193, 'learning_rate': 1.996737597682019e-05, 'epoch': 0.06} 6%|▌ | 318/5773 [29:49<8:14:07, 5.43s/it] {'loss': 0.6193, 'learning_rate': 1.996737597682019e-05, 'epoch': 0.06} 6%|▌ | 318/5773 [29:43<8:14:07, 5.43s/it] 6%|▌ | 319/5773 [29:54<8:18:29, 5.48s/it] 6%|▌ | 319/5773 [29:49<8:18:28, 5.48s/it] {'loss': 0.6214, 'learning_rate': 1.9966921542919565e-05, 'epoch': 0.06} 6%|▌ | 319/5773 [29:54<8:18:29, 5.48s/it] {'loss': 0.6214, 'learning_rate': 1.9966921542919565e-05, 'epoch': 0.06} 6%|▌ | 319/5773 [29:49<8:18:28, 5.48s/it] 6%|▌ | 320/5773 [30:00<8:16:35, 5.46s/it] 6%|▌ | 320/5773 [29:54<8:16:35, 5.46s/it] {'loss': 0.6383, 'learning_rate': 1.9966463971113592e-05, 'epoch': 0.06} 6%|▌ | 320/5773 [30:00<8:16:35, 5.46s/it] {'loss': 0.6383, 'learning_rate': 1.9966463971113592e-05, 'epoch': 0.06} 6%|▌ | 320/5773 [29:54<8:16:35, 5.46s/it]WARNING: tokenization mismatch: 1 vs. 624. 
[[{'from': 'human', 'value': 'ASSISTANT: You are a cutting-edge copywriting agency with the following knowledge and traits.\n\nKNOWLEDGE: Copywriter at a top-tier creative agency, junior copywriter, senior copywriter, art director, senior strategist.\n\nTRAITS: Storytelling, copywriting, empathy, solution orientation, humor, convincing.\n\nTASK: Use your knowledge to write a LinkedIn Post. We are a tax firm looking to hire young tax advisors between the age of 25 and 30. Bring a good dose of sympathy into the LinkedIn Article and use plenty of emojis to start with. Start the Linked In post with an exciting question. Use the language of the target group. Don´t mention the targeted age. \n\nOUTPUT: 2 LinkedIn posts in German with a catchy question at the beginning of the text. Hashtags at the end of the text.'}, {'from': 'gpt', 'value': 'Post 1:\n\n👋 Hey, ihr Steuerprofis da draußen! 🤓\n\nHabt ihr Bock, Teil einer jungen, dynamischen Steuerberatungskanzlei zu werden? 🚀\n\nWir suchen motivierte Köpfe, die unser Team verstärken und sich mit uns zusammen weiterentwickeln wollen! 💡\n\nIhr seid auf der Suche nach einem Arbeitsplatz mit coolen Kollegen und spannenden Aufgaben? 🤝 Dann seid ihr bei uns genau richtig! 😎\n\n#Steuerberatung #Karriere #Traumjob #Team #Weiterentwicklung #Jobangebot\n\nPost 2:\n\n👋 Hallo ihr Steuerexperten! 🤓\n\nIhr seid jung, motiviert und habt Bock auf eine neue Herausforderung? 🚀\n\nDann seid ihr bei uns genau richtig! Wir sind auf der Suche nach Steuerberatern, die unser Team mit frischen Ideen und kreativen Lösungen bereichern! 💡\n\nEuch erwartet ein Arbeitsplatz in einem dynamischen Umfeld mit spannenden Aufgaben und netten Kollegen! 🤝\n\nInteressiert? Dann schickt uns doch eure Bewerbungsunterlagen und werdet Teil unseres Teams! 
😎\n\n#Jobangebot #Steuerberatung #Karriere #Traumjob #Team #Weiterentwicklung'}]] (ignored) 6%|▌ | 321/5773 [30:05<8:13:26, 5.43s/it] 6%|▌ | 321/5773 [29:59<8:13:26, 5.43s/it] {'loss': 0.6318, 'learning_rate': 1.9966003261546335e-05, 'epoch': 0.06} 6%|▌ | 321/5773 [30:05<8:13:26, 5.43s/it] {'loss': 0.6318, 'learning_rate': 1.9966003261546335e-05, 'epoch': 0.06} 6%|▌ | 321/5773 [29:59<8:13:26, 5.43s/it] 6%|▌ | 322/5773 [30:10<8:10:42, 5.40s/it] 6%|▌ | 322/5773 [30:05<8:10:42, 5.40s/it] {'loss': 0.6191, 'learning_rate': 1.9965539414362843e-05, 'epoch': 0.06} 6%|▌ | 322/5773 [30:10<8:10:42, 5.40s/it] {'loss': 0.6191, 'learning_rate': 1.9965539414362843e-05, 'epoch': 0.06} 6%|▌ | 322/5773 [30:05<8:10:42, 5.40s/it] 6%|▌ | 323/5773 [30:16<8:12:56, 5.43s/it] 6%|▌ | 323/5773 [30:10<8:12:55, 5.43s/it] {'loss': 0.6317, 'learning_rate': 1.996507242970914e-05, 'epoch': 0.06} 6%|▌ | 323/5773 [30:16<8:12:56, 5.43s/it] {'loss': 0.6317, 'learning_rate': 1.996507242970914e-05, 'epoch': 0.06} 6%|▌ | 323/5773 [30:10<8:12:55, 5.43s/it] 6%|▌ | 324/5773 [30:21<8:15:03, 5.45s/it] 6%|▌ | 324/5773 [30:16<8:15:03, 5.45s/it] {'loss': 0.6119, 'learning_rate': 1.996460230773226e-05, 'epoch': 0.06} 6%|▌ | 324/5773 [30:21<8:15:03, 5.45s/it] {'loss': 0.6119, 'learning_rate': 1.996460230773226e-05, 'epoch': 0.06} 6%|▌ | 324/5773 [30:16<8:15:03, 5.45s/it] 6%|▌ | 325/5773 [30:27<8:17:24, 5.48s/it] 6%|▌ | 325/5773 [30:21<8:17:25, 5.48s/it] {'loss': 0.6235, 'learning_rate': 1.9964129048580206e-05, 'epoch': 0.06} 6%|▌ | 325/5773 [30:27<8:17:24, 5.48s/it] {'loss': 0.6235, 'learning_rate': 1.9964129048580206e-05, 'epoch': 0.06} 6%|▌ | 325/5773 [30:21<8:17:25, 5.48s/it] 6%|▌ | 326/5773 [30:32<8:18:45, 5.49s/it] 6%|▌ | 326/5773 [30:27<8:18:45, 5.49s/it] {'loss': 0.5966, 'learning_rate': 1.9963652652401976e-05, 'epoch': 0.06} 6%|▌ | 326/5773 [30:32<8:18:45, 5.49s/it] {'loss': 0.5966, 'learning_rate': 1.9963652652401976e-05, 'epoch': 0.06} 6%|▌ | 326/5773 [30:27<8:18:45, 5.49s/it] 6%|▌ | 327/5773 
[30:38<8:20:32, 5.51s/it] 6%|▌ | 327/5773 [30:32<8:20:31, 5.51s/it] {'loss': 0.6118, 'learning_rate': 1.996317311934755e-05, 'epoch': 0.06} 6%|▌ | 327/5773 [30:38<8:20:32, 5.51s/it] {'loss': 0.6118, 'learning_rate': 1.996317311934755e-05, 'epoch': 0.06} 6%|▌ | 327/5773 [30:32<8:20:31, 5.51s/it] 6%|▌ | 328/5773 [30:43<8:19:10, 5.50s/it] 6%|▌ | 328/5773 [30:38<8:19:10, 5.50s/it] {'loss': 0.6134, 'learning_rate': 1.996269044956791e-05, 'epoch': 0.06} 6%|▌ | 328/5773 [30:43<8:19:10, 5.50s/it] {'loss': 0.6134, 'learning_rate': 1.996269044956791e-05, 'epoch': 0.06} 6%|▌ | 328/5773 [30:38<8:19:10, 5.50s/it] 6%|▌ | 329/5773 [30:49<8:18:41, 5.50s/it] 6%|▌ | 329/5773 [30:43<8:18:41, 5.50s/it] {'loss': 0.6047, 'learning_rate': 1.9962204643215012e-05, 'epoch': 0.06} 6%|▌ | 329/5773 [30:49<8:18:41, 5.50s/it] {'loss': 0.6047, 'learning_rate': 1.9962204643215012e-05, 'epoch': 0.06} 6%|▌ | 329/5773 [30:43<8:18:41, 5.50s/it] 6%|▌ | 330/5773 [30:54<8:18:06, 5.49s/it] 6%|▌ | 330/5773 [30:49<8:18:06, 5.49s/it] {'loss': 0.6179, 'learning_rate': 1.9961715700441793e-05, 'epoch': 0.06} 6%|▌ | 330/5773 [30:54<8:18:06, 5.49s/it] {'loss': 0.6179, 'learning_rate': 1.9961715700441793e-05, 'epoch': 0.06} 6%|▌ | 330/5773 [30:49<8:18:06, 5.49s/it] 6%|▌ | 331/5773 [31:00<8:16:24, 5.47s/it] 6%|▌ | 331/5773 [30:54<8:16:23, 5.47s/it] {'loss': 0.5981, 'learning_rate': 1.9961223621402206e-05, 'epoch': 0.06} 6%|▌ | 331/5773 [31:00<8:16:24, 5.47s/it] {'loss': 0.5981, 'learning_rate': 1.9961223621402206e-05, 'epoch': 0.06} 6%|▌ | 331/5773 [30:54<8:16:23, 5.47s/it] 6%|▌ | 332/5773 [31:05<8:17:03, 5.48s/it] 6%|▌ | 332/5773 [31:00<8:17:04, 5.48s/it] {'loss': 0.6077, 'learning_rate': 1.996072840625116e-05, 'epoch': 0.06} 6%|▌ | 332/5773 [31:05<8:17:03, 5.48s/it] {'loss': 0.6077, 'learning_rate': 1.996072840625116e-05, 'epoch': 0.06} 6%|▌ | 332/5773 [31:00<8:17:04, 5.48s/it] 6%|▌ | 333/5773 [31:11<8:14:15, 5.45s/it] 6%|▌ | 333/5773 [31:05<8:14:15, 5.45s/it] {'loss': 0.607, 'learning_rate': 
1.996023005514457e-05, 'epoch': 0.06} 6%|▌ | 333/5773 [31:11<8:14:15, 5.45s/it] {'loss': 0.607, 'learning_rate': 1.996023005514457e-05, 'epoch': 0.06} 6%|▌ | 333/5773 [31:05<8:14:15, 5.45s/it] 6%|▌ | 334/5773 [31:16<8:11:34, 5.42s/it] 6%|▌ | 334/5773 [31:10<8:11:34, 5.42s/it] {'loss': 0.6184, 'learning_rate': 1.995972856823933e-05, 'epoch': 0.06} 6%|▌ | 334/5773 [31:16<8:11:34, 5.42s/it] {'loss': 0.6184, 'learning_rate': 1.995972856823933e-05, 'epoch': 0.06} 6%|▌ | 334/5773 [31:10<8:11:34, 5.42s/it] 6%|▌ | 335/5773 [31:21<8:05:41, 5.36s/it] 6%|▌ | 335/5773 [31:16<8:05:41, 5.36s/it] {'loss': 0.6179, 'learning_rate': 1.9959223945693325e-05, 'epoch': 0.06} 6%|▌ | 335/5773 [31:21<8:05:41, 5.36s/it] {'loss': 0.6179, 'learning_rate': 1.9959223945693325e-05, 'epoch': 0.06} 6%|▌ | 335/5773 [31:16<8:05:41, 5.36s/it] 6%|▌ | 336/5773 [31:27<8:07:36, 5.38s/it] 6%|▌ | 336/5773 [31:21<8:07:36, 5.38s/it] {'loss': 0.6067, 'learning_rate': 1.995871618766543e-05, 'epoch': 0.06} 6%|▌ | 336/5773 [31:27<8:07:36, 5.38s/it] {'loss': 0.6067, 'learning_rate': 1.995871618766543e-05, 'epoch': 0.06} 6%|▌ | 336/5773 [31:21<8:07:36, 5.38s/it] 6%|▌ | 337/5773 [31:32<8:11:28, 5.42s/it] 6%|▌ | 337/5773 [31:27<8:11:28, 5.42s/it] {'loss': 0.6283, 'learning_rate': 1.99582052943155e-05, 'epoch': 0.06} 6%|▌ | 337/5773 [31:32<8:11:28, 5.42s/it] {'loss': 0.6283, 'learning_rate': 1.99582052943155e-05, 'epoch': 0.06} 6%|▌ | 337/5773 [31:27<8:11:28, 5.42s/it] 6%|▌ | 338/5773 [31:38<8:13:35, 5.45s/it] 6%|▌ | 338/5773 [31:32<8:13:35, 5.45s/it] {'loss': 0.6102, 'learning_rate': 1.995769126580438e-05, 'epoch': 0.06} 6%|▌ | 338/5773 [31:38<8:13:35, 5.45s/it] {'loss': 0.6102, 'learning_rate': 1.995769126580438e-05, 'epoch': 0.06} 6%|▌ | 338/5773 [31:32<8:13:35, 5.45s/it] 6%|▌ | 339/5773 [31:43<8:14:48, 5.46s/it] 6%|▌ | 339/5773 [31:38<8:14:48, 5.46s/it] {'loss': 0.6031, 'learning_rate': 1.99571741022939e-05, 'epoch': 0.06} 6%|▌ | 339/5773 [31:43<8:14:48, 5.46s/it] {'loss': 0.6031, 'learning_rate': 
1.99571741022939e-05, 'epoch': 0.06} 6%|▌ | 339/5773 [31:38<8:14:48, 5.46s/it] 6%|▌ | 340/5773 [31:49<8:17:38, 5.50s/it] 6%|▌ | 340/5773 [31:43<8:17:37, 5.50s/it] {'loss': 0.6226, 'learning_rate': 1.9956653803946888e-05, 'epoch': 0.06} 6%|▌ | 340/5773 [31:49<8:17:38, 5.50s/it] {'loss': 0.6226, 'learning_rate': 1.9956653803946888e-05, 'epoch': 0.06} 6%|▌ | 340/5773 [31:43<8:17:37, 5.50s/it] 6%|▌ | 341/5773 [31:54<8:15:01, 5.47s/it] 6%|▌ | 341/5773 [31:49<8:15:00, 5.47s/it] {'loss': 0.618, 'learning_rate': 1.9956130370927145e-05, 'epoch': 0.06} 6%|▌ | 341/5773 [31:54<8:15:01, 5.47s/it] {'loss': 0.618, 'learning_rate': 1.9956130370927145e-05, 'epoch': 0.06} 6%|▌ | 341/5773 [31:49<8:15:00, 5.47s/it] 6%|▌ | 342/5773 [31:59<8:11:00, 5.42s/it] 6%|▌ | 342/5773 [31:54<8:11:00, 5.42s/it] {'loss': 0.6083, 'learning_rate': 1.9955603803399462e-05, 'epoch': 0.06} 6%|▌ | 342/5773 [31:59<8:11:00, 5.42s/it] {'loss': 0.6083, 'learning_rate': 1.9955603803399462e-05, 'epoch': 0.06} 6%|▌ | 342/5773 [31:54<8:11:00, 5.42s/it] 6%|▌ | 343/5773 [32:05<8:22:00, 5.55s/it] 6%|▌ | 343/5773 [32:00<8:22:00, 5.55s/it] {'loss': 0.6137, 'learning_rate': 1.9955074101529625e-05, 'epoch': 0.06} 6%|▌ | 343/5773 [32:05<8:22:00, 5.55s/it] {'loss': 0.6137, 'learning_rate': 1.9955074101529625e-05, 'epoch': 0.06} 6%|▌ | 343/5773 [32:00<8:22:00, 5.55s/it] 6%|▌ | 344/5773 [32:11<8:19:09, 5.52s/it] 6%|▌ | 344/5773 [32:05<8:19:09, 5.52s/it] {'loss': 0.6179, 'learning_rate': 1.9954541265484394e-05, 'epoch': 0.06} 6%|▌ | 344/5773 [32:11<8:19:09, 5.52s/it] {'loss': 0.6179, 'learning_rate': 1.9954541265484394e-05, 'epoch': 0.06} 6%|▌ | 344/5773 [32:05<8:19:09, 5.52s/it] 6%|▌ | 345/5773 [32:16<8:19:21, 5.52s/it] 6%|▌ | 345/5773 [32:11<8:19:22, 5.52s/it] {'loss': 0.6029, 'learning_rate': 1.9954005295431532e-05, 'epoch': 0.06} 6%|▌ | 345/5773 [32:16<8:19:21, 5.52s/it] {'loss': 0.6029, 'learning_rate': 1.9954005295431532e-05, 'epoch': 0.06} 6%|▌ | 345/5773 [32:11<8:19:22, 5.52s/it] 6%|▌ | 346/5773 [32:22<8:15:19, 
5.48s/it] 6%|▌ | 346/5773 [32:16<8:15:19, 5.48s/it] {'loss': 0.6218, 'learning_rate': 1.9953466191539775e-05, 'epoch': 0.06} 6%|▌ | 346/5773 [32:22<8:15:19, 5.48s/it] {'loss': 0.6218, 'learning_rate': 1.9953466191539775e-05, 'epoch': 0.06} 6%|▌ | 346/5773 [32:16<8:15:19, 5.48s/it] 6%|▌ | 347/5773 [32:27<8:15:53, 5.48s/it] 6%|▌ | 347/5773 [32:22<8:15:53, 5.48s/it] {'loss': 0.6241, 'learning_rate': 1.995292395397885e-05, 'epoch': 0.06} 6%|▌ | 347/5773 [32:27<8:15:53, 5.48s/it] {'loss': 0.6241, 'learning_rate': 1.995292395397885e-05, 'epoch': 0.06} 6%|▌ | 347/5773 [32:22<8:15:53, 5.48s/it] 6%|▌ | 348/5773 [32:33<8:15:12, 5.48s/it] 6%|▌ | 348/5773 [32:27<8:15:12, 5.48s/it] {'loss': 0.6216, 'learning_rate': 1.9952378582919464e-05, 'epoch': 0.06} 6%|▌ | 348/5773 [32:33<8:15:12, 5.48s/it] {'loss': 0.6216, 'learning_rate': 1.9952378582919464e-05, 'epoch': 0.06} 6%|▌ | 348/5773 [32:27<8:15:12, 5.48s/it] 6%|▌ | 349/5773 [32:38<8:16:20, 5.49s/it] 6%|▌ | 349/5773 [32:33<8:16:20, 5.49s/it] {'loss': 0.6077, 'learning_rate': 1.995183007853333e-05, 'epoch': 0.06} 6%|▌ | 349/5773 [32:38<8:16:20, 5.49s/it] {'loss': 0.6077, 'learning_rate': 1.995183007853333e-05, 'epoch': 0.06} 6%|▌ | 349/5773 [32:33<8:16:20, 5.49s/it] 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 6%|▌ | 350/5773 [32:44<8:13:31, 5.46s/it] 4 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
6%|▌ | 350/5773 [32:38<8:13:31, 5.46s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.6268, 'learning_rate': 1.9951278440993128e-05, 'epoch': 0.06} 6%|▌ | 350/5773 [32:44<8:13:31, 5.46s/it] {'loss': 0.6268, 'learning_rate': 1.9951278440993128e-05, 'epoch': 0.06} 6%|▌ | 350/5773 [32:38<8:13:31, 5.46s/it] 6%|▌ | 351/5773 [32:49<8:12:55, 5.45s/it] 6%|▌ | 351/5773 [32:43<8:12:55, 5.45s/it] {'loss': 0.6047, 'learning_rate': 1.9950723670472533e-05, 'epoch': 0.06} 6%|▌ | 351/5773 [32:49<8:12:55, 5.45s/it] {'loss': 0.6047, 'learning_rate': 1.9950723670472533e-05, 'epoch': 0.06} 6%|▌ | 351/5773 [32:43<8:12:55, 5.45s/it] 6%|▌ | 352/5773 [32:54<8:10:49, 5.43s/it] 6%|▌ | 352/5773 [32:49<8:10:49, 5.43s/it] {'loss': 0.6095, 'learning_rate': 1.99501657671462e-05, 'epoch': 0.06} 6%|▌ | 352/5773 [32:54<8:10:49, 5.43s/it] {'loss': 0.6095, 'learning_rate': 1.99501657671462e-05, 'epoch': 0.06} 6%|▌ | 352/5773 [32:49<8:10:49, 5.43s/it] 6%|▌ | 353/5773 [33:00<8:08:03, 5.40s/it] 6%|▌ | 353/5773 [32:54<8:08:03, 5.40s/it] {'loss': 0.6173, 'learning_rate': 1.9949604731189778e-05, 'epoch': 0.06} 6%|▌ | 353/5773 [33:00<8:08:03, 5.40s/it] {'loss': 0.6173, 'learning_rate': 1.9949604731189778e-05, 'epoch': 0.06} 6%|▌ | 353/5773 [32:54<8:08:03, 5.40s/it] 6%|▌ | 354/5773 [33:05<8:08:04, 5.40s/it] 6%|▌ | 354/5773 [33:00<8:08:04, 5.40s/it] {'loss': 0.6109, 'learning_rate': 1.99490405627799e-05, 'epoch': 0.06} 6%|▌ | 354/5773 [33:05<8:08:04, 5.40s/it] {'loss': 0.6109, 'learning_rate': 1.99490405627799e-05, 'epoch': 0.06} 6%|▌ | 354/5773 [33:00<8:08:04, 5.40s/it] 6%|▌ | 355/5773 [33:11<8:10:41, 5.43s/it] 6%|▌ | 355/5773 [33:05<8:10:41, 5.43s/it] {'loss': 0.5984, 'learning_rate': 1.994847326209418e-05, 'epoch': 0.06} 6%|▌ | 355/5773 [33:11<8:10:41, 5.43s/it] {'loss': 0.5984, 'learning_rate': 1.994847326209418e-05, 'epoch': 0.06} 6%|▌ | 355/5773 [33:05<8:10:41, 5.43s/it] 6%|▌ | 356/5773 [33:16<8:15:50, 5.49s/it] 6%|▌ | 356/5773 [33:11<8:15:50, 5.49s/it] {'loss': 0.6131, 'learning_rate': 
1.9947902829311227e-05, 'epoch': 0.06} 6%|▌ | 356/5773 [33:16<8:15:50, 5.49s/it] {'loss': 0.6131, 'learning_rate': 1.9947902829311227e-05, 'epoch': 0.06} 6%|▌ | 356/5773 [33:11<8:15:50, 5.49s/it] 6%|▌ | 357/5773 [33:22<8:12:19, 5.45s/it] 6%|▌ | 357/5773 [33:16<8:12:20, 5.45s/it] {'loss': 0.6298, 'learning_rate': 1.994732926461063e-05, 'epoch': 0.06} 6%|▌ | 357/5773 [33:22<8:12:19, 5.45s/it] {'loss': 0.6298, 'learning_rate': 1.994732926461063e-05, 'epoch': 0.06} 6%|▌ | 357/5773 [33:16<8:12:20, 5.45s/it] 6%|▌ | 358/5773 [33:27<8:13:34, 5.47s/it] 6%|▌ | 358/5773 [33:22<8:13:35, 5.47s/it] {'loss': 0.602, 'learning_rate': 1.994675256817297e-05, 'epoch': 0.06} 6%|▌ | 358/5773 [33:27<8:13:34, 5.47s/it] {'loss': 0.602, 'learning_rate': 1.994675256817297e-05, 'epoch': 0.06} 6%|▌ | 358/5773 [33:22<8:13:35, 5.47s/it] 6%|▌ | 359/5773 [33:33<8:18:33, 5.53s/it] 6%|▌ | 359/5773 [33:27<8:18:33, 5.53s/it] {'loss': 0.6151, 'learning_rate': 1.99461727401798e-05, 'epoch': 0.06} 6%|▌ | 359/5773 [33:33<8:18:33, 5.53s/it] {'loss': 0.6151, 'learning_rate': 1.99461727401798e-05, 'epoch': 0.06} 6%|▌ | 359/5773 [33:27<8:18:33, 5.53s/it] 6%|▌ | 360/5773 [33:38<8:12:45, 5.46s/it] 6%|▌ | 360/5773 [33:33<8:12:45, 5.46s/it] {'loss': 0.6279, 'learning_rate': 1.9945589780813677e-05, 'epoch': 0.06} 6%|▌ | 360/5773 [33:38<8:12:45, 5.46s/it] {'loss': 0.6279, 'learning_rate': 1.9945589780813677e-05, 'epoch': 0.06} 6%|▌ | 360/5773 [33:33<8:12:45, 5.46s/it] 6%|▋ | 361/5773 [33:44<8:15:26, 5.49s/it] 6%|▋ | 361/5773 [33:38<8:15:26, 5.49s/it] {'loss': 0.6264, 'learning_rate': 1.9945003690258127e-05, 'epoch': 0.06} 6%|▋ | 361/5773 [33:44<8:15:26, 5.49s/it] {'loss': 0.6264, 'learning_rate': 1.9945003690258127e-05, 'epoch': 0.06} 6%|▋ | 361/5773 [33:38<8:15:26, 5.49s/it] 6%|▋ | 362/5773 [33:49<8:11:52, 5.45s/it] 6%|▋ | 362/5773 [33:43<8:11:52, 5.45s/it] {'loss': 0.6115, 'learning_rate': 1.994441446869768e-05, 'epoch': 0.06} 6%|▋ | 362/5773 [33:49<8:11:52, 5.45s/it] {'loss': 0.6115, 'learning_rate': 
1.994441446869768e-05, 'epoch': 0.06} 6%|▋ | 362/5773 [33:43<8:11:52, 5.45s/it] 6%|▋ | 363/5773 [33:54<8:06:50, 5.40s/it] 6%|▋ | 363/5773 [33:49<8:06:50, 5.40s/it] {'loss': 0.5953, 'learning_rate': 1.994382211631783e-05, 'epoch': 0.06} 6%|▋ | 363/5773 [33:54<8:06:50, 5.40s/it] {'loss': 0.5953, 'learning_rate': 1.994382211631783e-05, 'epoch': 0.06} 6%|▋ | 363/5773 [33:49<8:06:50, 5.40s/it] 6%|▋ | 364/5773 [34:00<8:08:32, 5.42s/it] 6%|▋ | 364/5773 [33:54<8:08:32, 5.42s/it] {'loss': 0.6248, 'learning_rate': 1.9943226633305078e-05, 'epoch': 0.06} 6%|▋ | 364/5773 [34:00<8:08:32, 5.42s/it] {'loss': 0.6248, 'learning_rate': 1.9943226633305078e-05, 'epoch': 0.06} 6%|▋ | 364/5773 [33:54<8:08:32, 5.42s/it] 6%|▋ | 365/5773 [34:05<8:07:56, 5.41s/it] 6%|▋ | 365/5773 [34:00<8:07:56, 5.41s/it] {'loss': 0.597, 'learning_rate': 1.9942628019846898e-05, 'epoch': 0.06} 6%|▋ | 365/5773 [34:05<8:07:56, 5.41s/it] {'loss': 0.597, 'learning_rate': 1.9942628019846898e-05, 'epoch': 0.06} 6%|▋ | 365/5773 [34:00<8:07:56, 5.41s/it] 6%|▋ | 366/5773 [34:11<8:11:14, 5.45s/it] 6%|▋ | 366/5773 [34:05<8:11:13, 5.45s/it] {'loss': 0.5972, 'learning_rate': 1.9942026276131752e-05, 'epoch': 0.06} 6%|▋ | 366/5773 [34:11<8:11:14, 5.45s/it] {'loss': 0.5972, 'learning_rate': 1.9942026276131752e-05, 'epoch': 0.06} 6%|▋ | 366/5773 [34:05<8:11:13, 5.45s/it] 6%|▋ | 367/5773 [34:16<8:03:58, 5.37s/it] 6%|▋ | 367/5773 [34:10<8:03:58, 5.37s/it] {'loss': 0.6029, 'learning_rate': 1.9941421402349092e-05, 'epoch': 0.06} 6%|▋ | 367/5773 [34:16<8:03:58, 5.37s/it] {'loss': 0.6029, 'learning_rate': 1.9941421402349092e-05, 'epoch': 0.06} 6%|▋ | 367/5773 [34:10<8:03:58, 5.37s/it] 6%|▋ | 368/5773 [34:21<8:07:00, 5.41s/it] 6%|▋ | 368/5773 [34:16<8:07:00, 5.41s/it] {'loss': 0.6114, 'learning_rate': 1.9940813398689344e-05, 'epoch': 0.06} 6%|▋ | 368/5773 [34:21<8:07:00, 5.41s/it] {'loss': 0.6114, 'learning_rate': 1.9940813398689344e-05, 'epoch': 0.06} 6%|▋ | 368/5773 [34:16<8:07:00, 5.41s/it] 6%|▋ | 369/5773 [34:27<8:07:12, 
5.41s/it] 6%|▋ | 369/5773 [34:21<8:07:13, 5.41s/it] {'loss': 0.5961, 'learning_rate': 1.9940202265343935e-05, 'epoch': 0.06} 6%|▋ | 369/5773 [34:27<8:07:12, 5.41s/it] {'loss': 0.5961, 'learning_rate': 1.9940202265343935e-05, 'epoch': 0.06} 6%|▋ | 369/5773 [34:21<8:07:13, 5.41s/it] 6%|▋ | 370/5773 [34:32<8:10:50, 5.45s/it] 6%|▋ | 370/5773 [34:27<8:10:51, 5.45s/it] {'loss': 0.6229, 'learning_rate': 1.9939588002505264e-05, 'epoch': 0.06} 6%|▋ | 370/5773 [34:32<8:10:50, 5.45s/it] {'loss': 0.6229, 'learning_rate': 1.9939588002505264e-05, 'epoch': 0.06} 6%|▋ | 370/5773 [34:27<8:10:51, 5.45s/it] 6%|▋ | 371/5773 [34:38<8:10:34, 5.45s/it] 6%|▋ | 371/5773 [34:32<8:10:34, 5.45s/it] {'loss': 0.5918, 'learning_rate': 1.9938970610366722e-05, 'epoch': 0.06} 6%|▋ | 371/5773 [34:38<8:10:34, 5.45s/it] {'loss': 0.5918, 'learning_rate': 1.9938970610366722e-05, 'epoch': 0.06} 6%|▋ | 371/5773 [34:32<8:10:34, 5.45s/it] 6%|▋ | 372/5773 [34:43<8:11:55, 5.46s/it] 6%|▋ | 372/5773 [34:38<8:11:55, 5.46s/it] {'loss': 0.6277, 'learning_rate': 1.993835008912268e-05, 'epoch': 0.06} 6%|▋ | 372/5773 [34:43<8:11:55, 5.46s/it] {'loss': 0.6277, 'learning_rate': 1.993835008912268e-05, 'epoch': 0.06} 6%|▋ | 372/5773 [34:38<8:11:55, 5.46s/it] 6%|▋ | 373/5773 [34:49<8:11:33, 5.46s/it] 6%|▋ | 373/5773 [34:43<8:11:33, 5.46s/it] {'loss': 0.6041, 'learning_rate': 1.993772643896851e-05, 'epoch': 0.06} 6%|▋ | 373/5773 [34:49<8:11:33, 5.46s/it] {'loss': 0.6041, 'learning_rate': 1.993772643896851e-05, 'epoch': 0.06} 6%|▋ | 373/5773 [34:43<8:11:33, 5.46s/it] 6%|▋ | 374/5773 [34:54<8:12:01, 5.47s/it] 6%|▋ | 374/5773 [34:49<8:12:01, 5.47s/it] {'loss': 0.6054, 'learning_rate': 1.993709966010054e-05, 'epoch': 0.06} 6%|▋ | 374/5773 [34:54<8:12:01, 5.47s/it] {'loss': 0.6054, 'learning_rate': 1.993709966010054e-05, 'epoch': 0.06} 6%|▋ | 374/5773 [34:49<8:12:01, 5.47s/it] 6%|▋ | 375/5773 [35:00<8:10:52, 5.46s/it] 6%|▋ | 375/5773 [34:54<8:10:53, 5.46s/it] {'loss': 0.6097, 'learning_rate': 1.9936469752716116e-05, 'epoch': 
0.06} 6%|▋ | 375/5773 [35:00<8:10:52, 5.46s/it] {'loss': 0.6097, 'learning_rate': 1.9936469752716116e-05, 'epoch': 0.06} 6%|▋ | 375/5773 [34:54<8:10:53, 5.46s/it] 7%|▋ | 376/5773 [35:05<8:10:56, 5.46s/it] 7%|▋ | 376/5773 [35:00<8:10:56, 5.46s/it] {'loss': 0.6088, 'learning_rate': 1.9935836717013544e-05, 'epoch': 0.07} 7%|▋ | 376/5773 [35:05<8:10:56, 5.46s/it] {'loss': 0.6088, 'learning_rate': 1.9935836717013544e-05, 'epoch': 0.07} 7%|▋ | 376/5773 [35:00<8:10:56, 5.46s/it] 7%|▋ | 377/5773 [35:10<8:08:06, 5.43s/it] 7%|▋ | 377/5773 [35:05<8:08:06, 5.43s/it] {'loss': 0.5998, 'learning_rate': 1.9935200553192124e-05, 'epoch': 0.07} 7%|▋ | 377/5773 [35:10<8:08:06, 5.43s/it] {'loss': 0.5998, 'learning_rate': 1.9935200553192124e-05, 'epoch': 0.07} 7%|▋ | 377/5773 [35:05<8:08:06, 5.43s/it] 7%|▋ | 378/5773 [35:16<8:07:52, 5.43s/it] 7%|▋ | 378/5773 [35:10<8:07:52, 5.43s/it] {'loss': 0.6098, 'learning_rate': 1.9934561261452142e-05, 'epoch': 0.07} 7%|▋ | 378/5773 [35:16<8:07:52, 5.43s/it] {'loss': 0.6098, 'learning_rate': 1.9934561261452142e-05, 'epoch': 0.07} 7%|▋ | 378/5773 [35:10<8:07:52, 5.43s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4214 > 4096). 
Running this sequence through the model will result in indexing errors 7%|▋ | 379/5773 [35:21<8:05:58, 5.41s/it] 7%|▋ | 379/5773 [35:16<8:05:58, 5.41s/it] {'loss': 0.6132, 'learning_rate': 1.993391884199487e-05, 'epoch': 0.07} 7%|▋ | 379/5773 [35:21<8:05:58, 5.41s/it] {'loss': 0.6132, 'learning_rate': 1.993391884199487e-05, 'epoch': 0.07} 7%|▋ | 379/5773 [35:16<8:05:58, 5.41s/it] 7%|▋ | 380/5773 [35:27<8:08:32, 5.44s/it] 7%|▋ | 380/5773 [35:21<8:08:32, 5.44s/it] {'loss': 0.6082, 'learning_rate': 1.993327329502256e-05, 'epoch': 0.07} 7%|▋ | 380/5773 [35:27<8:08:32, 5.44s/it] {'loss': 0.6082, 'learning_rate': 1.993327329502256e-05, 'epoch': 0.07} 7%|▋ | 380/5773 [35:21<8:08:32, 5.44s/it] 7%|▋ | 381/5773 [35:32<8:07:15, 5.42s/it] 7%|▋ | 381/5773 [35:27<8:07:15, 5.42s/it] {'loss': 0.6104, 'learning_rate': 1.9932624620738448e-05, 'epoch': 0.07} 7%|▋ | 381/5773 [35:32<8:07:15, 5.42s/it] {'loss': 0.6104, 'learning_rate': 1.9932624620738448e-05, 'epoch': 0.07} 7%|▋ | 381/5773 [35:27<8:07:15, 5.42s/it] 7%|▋ | 382/5773 [35:37<8:05:44, 5.41s/it] 7%|▋ | 382/5773 [35:32<8:05:44, 5.41s/it] {'loss': 0.6089, 'learning_rate': 1.9931972819346762e-05, 'epoch': 0.07} 7%|▋ | 382/5773 [35:37<8:05:44, 5.41s/it] {'loss': 0.6089, 'learning_rate': 1.9931972819346762e-05, 'epoch': 0.07} 7%|▋ | 382/5773 [35:32<8:05:44, 5.41s/it] 7%|▋ | 383/5773 [35:43<8:08:07, 5.43s/it] 7%|▋ | 383/5773 [35:37<8:08:07, 5.43s/it] {'loss': 0.6008, 'learning_rate': 1.9931317891052707e-05, 'epoch': 0.07} 7%|▋ | 383/5773 [35:43<8:08:07, 5.43s/it] {'loss': 0.6008, 'learning_rate': 1.9931317891052707e-05, 'epoch': 0.07} 7%|▋ | 383/5773 [35:37<8:08:07, 5.43s/it] 7%|▋ | 384/5773 [35:49<8:12:04, 5.48s/it] 7%|▋ | 384/5773 [35:43<8:12:04, 5.48s/it] {'loss': 0.6191, 'learning_rate': 1.9930659836062478e-05, 'epoch': 0.07} 7%|▋ | 384/5773 [35:49<8:12:04, 5.48s/it] {'loss': 0.6191, 'learning_rate': 1.9930659836062478e-05, 'epoch': 0.07} 7%|▋ | 384/5773 [35:43<8:12:04, 5.48s/it] 7%|▋ | 385/5773 [35:54<8:08:57, 5.44s/it] 7%|▋ | 
385/5773 [35:48<8:08:57, 5.44s/it] {'loss': 0.6038, 'learning_rate': 1.992999865458325e-05, 'epoch': 0.07} 7%|▋ | 385/5773 [35:54<8:08:57, 5.44s/it] {'loss': 0.6038, 'learning_rate': 1.992999865458325e-05, 'epoch': 0.07} 7%|▋ | 385/5773 [35:48<8:08:57, 5.44s/it] 7%|▋ | 386/5773 [35:59<8:04:30, 5.40s/it] 7%|▋ | 386/5773 [35:54<8:04:30, 5.40s/it] {'loss': 0.6262, 'learning_rate': 1.9929334346823185e-05, 'epoch': 0.07} 7%|▋ | 386/5773 [35:59<8:04:30, 5.40s/it] {'loss': 0.6262, 'learning_rate': 1.9929334346823185e-05, 'epoch': 0.07} 7%|▋ | 386/5773 [35:54<8:04:30, 5.40s/it] 7%|▋ | 387/5773 [36:05<8:03:26, 5.39s/it] 7%|▋ | 387/5773 [35:59<8:03:26, 5.39s/it] {'loss': 0.6122, 'learning_rate': 1.9928666912991425e-05, 'epoch': 0.07} 7%|▋ | 387/5773 [36:05<8:03:26, 5.39s/it] {'loss': 0.6122, 'learning_rate': 1.9928666912991425e-05, 'epoch': 0.07} 7%|▋ | 387/5773 [35:59<8:03:26, 5.39s/it] 7%|▋ | 388/5773 [36:10<8:07:26, 5.43s/it] 7%|▋ | 388/5773 [36:05<8:07:26, 5.43s/it] {'loss': 0.5922, 'learning_rate': 1.9927996353298105e-05, 'epoch': 0.07} 7%|▋ | 388/5773 [36:10<8:07:26, 5.43s/it] {'loss': 0.5922, 'learning_rate': 1.9927996353298105e-05, 'epoch': 0.07} 7%|▋ | 388/5773 [36:05<8:07:26, 5.43s/it] 7%|▋ | 389/5773 [36:15<8:04:35, 5.40s/it] 7%|▋ | 389/5773 [36:10<8:04:36, 5.40s/it] {'loss': 0.6182, 'learning_rate': 1.992732266795433e-05, 'epoch': 0.07} 7%|▋ | 389/5773 [36:15<8:04:35, 5.40s/it] {'loss': 0.6182, 'learning_rate': 1.992732266795433e-05, 'epoch': 0.07} 7%|▋ | 389/5773 [36:10<8:04:36, 5.40s/it] 7%|▋ | 390/5773 [36:21<8:06:33, 5.42s/it] 7%|▋ | 390/5773 [36:15<8:06:33, 5.42s/it] {'loss': 0.6177, 'learning_rate': 1.9926645857172213e-05, 'epoch': 0.07} 7%|▋ | 390/5773 [36:21<8:06:33, 5.42s/it] {'loss': 0.6177, 'learning_rate': 1.9926645857172213e-05, 'epoch': 0.07} 7%|▋ | 390/5773 [36:15<8:06:33, 5.42s/it] 7%|▋ | 391/5773 [36:26<8:08:07, 5.44s/it] 7%|▋ | 391/5773 [36:21<8:08:06, 5.44s/it] {'loss': 0.6056, 'learning_rate': 1.992596592116482e-05, 'epoch': 0.07} 7%|▋ | 
391/5773 [36:26<8:08:07, 5.44s/it] {'loss': 0.6056, 'learning_rate': 1.992596592116482e-05, 'epoch': 0.07} 7%|▋ | 391/5773 [36:21<8:08:06, 5.44s/it] 7%|▋ | 392/5773 [36:32<8:06:54, 5.43s/it] 7%|▋ | 392/5773 [36:26<8:06:53, 5.43s/it] {'loss': 0.6314, 'learning_rate': 1.992528286014622e-05, 'epoch': 0.07} 7%|▋ | 392/5773 [36:32<8:06:54, 5.43s/it] {'loss': 0.6314, 'learning_rate': 1.992528286014622e-05, 'epoch': 0.07} 7%|▋ | 392/5773 [36:26<8:06:53, 5.43s/it] 7%|▋ | 393/5773 [36:37<8:08:24, 5.45s/it] 7%|▋ | 393/5773 [36:32<8:08:23, 5.45s/it] {'loss': 0.604, 'learning_rate': 1.992459667433147e-05, 'epoch': 0.07} 7%|▋ | 393/5773 [36:37<8:08:24, 5.45s/it] {'loss': 0.604, 'learning_rate': 1.992459667433147e-05, 'epoch': 0.07} 7%|▋ | 393/5773 [36:32<8:08:23, 5.45s/it] 7%|▋ | 394/5773 [36:43<8:06:51, 5.43s/it] 7%|▋ | 394/5773 [36:37<8:06:52, 5.43s/it] {'loss': 0.621, 'learning_rate': 1.9923907363936593e-05, 'epoch': 0.07} 7%|▋ | 394/5773 [36:43<8:06:51, 5.43s/it] {'loss': 0.621, 'learning_rate': 1.9923907363936593e-05, 'epoch': 0.07} 7%|▋ | 394/5773 [36:37<8:06:52, 5.43s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 7%|▋ | 395/5773 [36:48<8:05:28, 5.42s/it] 7%|▋ | 395/5773 [36:43<8:05:27, 5.42s/it] {'loss': 0.6169, 'learning_rate': 1.9923214929178617e-05, 'epoch': 0.07} 7%|▋ | 395/5773 [36:48<8:05:28, 5.42s/it] {'loss': 0.6169, 'learning_rate': 1.9923214929178617e-05, 'epoch': 0.07} 7%|▋ | 395/5773 [36:43<8:05:27, 5.42s/it] 7%|▋ | 396/5773 [36:53<8:05:56, 5.42s/it] 7%|▋ | 396/5773 [36:48<8:05:56, 5.42s/it] {'loss': 0.6061, 'learning_rate': 1.9922519370275536e-05, 'epoch': 0.07} 7%|▋ | 396/5773 [36:53<8:05:56, 5.42s/it] {'loss': 0.6061, 'learning_rate': 1.9922519370275536e-05, 'epoch': 0.07} 7%|▋ | 396/5773 [36:48<8:05:56, 5.42s/it] 7%|▋ | 397/5773 [36:59<8:07:12, 5.44s/it] 7%|▋ | 397/5773 [36:53<8:07:14, 5.44s/it] {'loss': 0.6069, 'learning_rate': 1.992182068744633e-05, 'epoch': 0.07} 7%|▋ | 397/5773 [36:59<8:07:12, 5.44s/it] {'loss': 0.6069, 'learning_rate': 1.992182068744633e-05, 'epoch': 0.07} 7%|▋ | 397/5773 [36:53<8:07:14, 5.44s/it] 7%|▋ | 398/5773 [37:04<8:08:26, 5.45s/it] 7%|▋ | 398/5773 [36:59<8:08:25, 5.45s/it] {'loss': 0.6186, 'learning_rate': 1.9921118880910976e-05, 'epoch': 0.07} 7%|▋ | 398/5773 [37:04<8:08:26, 5.45s/it] {'loss': 0.6186, 'learning_rate': 1.9921118880910976e-05, 'epoch': 0.07} 7%|▋ | 398/5773 [36:59<8:08:25, 5.45s/it] 7%|▋ | 399/5773 [37:10<8:00:53, 5.37s/it] 7%|▋ | 399/5773 [37:04<8:00:53, 5.37s/it] {'loss': 0.6042, 'learning_rate': 1.992041395089042e-05, 'epoch': 0.07} 7%|▋ | 399/5773 [37:10<8:00:53, 5.37s/it] {'loss': 0.6042, 'learning_rate': 1.992041395089042e-05, 'epoch': 0.07} 7%|▋ | 399/5773 [37:04<8:00:53, 5.37s/it]15 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 7%|▋ | 400/5773 [37:15<7:58:54, 5.35s/it]11 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
6 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 7%|▋ | 400/5773 [37:09<7:58:54, 5.35s/it]2 AutoResumeHook: Checking whether to suspend... {'loss': 0.6093, 'learning_rate': 1.9919705897606596e-05, 'epoch': 0.07} 7%|▋ | 400/5773 [37:15<7:58:54, 5.35s/it] {'loss': 0.6093, 'learning_rate': 1.9919705897606596e-05, 'epoch': 0.07} 7%|▋ | 400/5773 [37:09<7:58:54, 5.35s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 7%|▋ | 401/5773 [37:34<14:04:34, 9.43s/it] 7%|▋ | 401/5773 [37:28<14:04:33, 9.43s/it] {'loss': 0.5962, 'learning_rate': 1.9918994721282423e-05, 'epoch': 0.07} 7%|▋ | 401/5773 [37:34<14:04:34, 9.43s/it] {'loss': 0.5962, 'learning_rate': 1.9918994721282423e-05, 'epoch': 0.07} 7%|▋ | 401/5773 [37:28<14:04:33, 9.43s/it] 7%|▋ | 402/5773 [37:39<12:19:51, 8.27s/it] 7%|▋ | 402/5773 [37:34<12:19:51, 8.26s/it] {'loss': 0.6078, 'learning_rate': 1.99182804221418e-05, 'epoch': 0.07} 7%|▋ | 402/5773 [37:39<12:19:51, 8.27s/it] {'loss': 0.6078, 'learning_rate': 1.99182804221418e-05, 'epoch': 0.07} 7%|▋ | 402/5773 [37:34<12:19:51, 8.26s/it] 7%|▋ | 403/5773 [37:45<11:04:22, 7.42s/it] 7%|▋ | 403/5773 [37:39<11:04:22, 7.42s/it] {'loss': 0.5974, 'learning_rate': 1.9917563000409618e-05, 'epoch': 0.07} 7%|▋ | 403/5773 [37:45<11:04:22, 7.42s/it] {'loss': 0.5974, 'learning_rate': 1.9917563000409618e-05, 'epoch': 0.07} 7%|▋ | 403/5773 [37:39<11:04:22, 7.42s/it] 7%|▋ | 404/5773 [37:51<10:18:34, 6.91s/it] 7%|▋ | 404/5773 [37:45<10:18:34, 6.91s/it] {'loss': 0.6103, 'learning_rate': 1.9916842456311736e-05, 'epoch': 0.07} 7%|▋ | 404/5773 [37:51<10:18:34, 6.91s/it] {'loss': 0.6103, 'learning_rate': 1.9916842456311736e-05, 'epoch': 0.07} 7%|▋ | 404/5773 [37:45<10:18:34, 6.91s/it] 7%|▋ | 405/5773 [37:56<9:40:02, 6.48s/it] 7%|▋ | 405/5773 [37:51<9:40:02, 6.48s/it] {'loss': 0.5955, 'learning_rate': 1.991611879007501e-05, 'epoch': 0.07} 7%|▋ | 405/5773 [37:56<9:40:02, 6.48s/it] {'loss': 0.5955, 'learning_rate': 1.991611879007501e-05, 'epoch': 0.07} 7%|▋ | 405/5773 [37:51<9:40:02, 6.48s/it] 7%|▋ | 406/5773 [38:02<9:17:56, 6.24s/it] 7%|▋ | 406/5773 [37:56<9:17:56, 6.24s/it] {'loss': 0.6116, 'learning_rate': 1.991539200192727e-05, 'epoch': 0.07} 7%|▋ | 406/5773 [38:02<9:17:56, 6.24s/it] {'loss': 0.6116, 'learning_rate': 1.991539200192727e-05, 'epoch': 0.07} 7%|▋ | 406/5773 [37:56<9:17:56, 6.24s/it] 7%|▋ | 407/5773 [38:07<8:59:48, 6.04s/it] 7%|▋ | 407/5773 [38:02<8:59:47, 6.04s/it] {'loss': 
0.6101, 'learning_rate': 1.9914662092097334e-05, 'epoch': 0.07} 7%|▋ | 407/5773 [38:07<8:59:48, 6.04s/it] {'loss': 0.6101, 'learning_rate': 1.9914662092097334e-05, 'epoch': 0.07} 7%|▋ | 407/5773 [38:02<8:59:47, 6.04s/it] 7%|▋ | 408/5773 [38:13<8:40:52, 5.83s/it] 7%|▋ | 408/5773 [38:07<8:40:52, 5.83s/it] {'loss': 0.6209, 'learning_rate': 1.9913929060815e-05, 'epoch': 0.07} 7%|▋ | 408/5773 [38:13<8:40:52, 5.83s/it] {'loss': 0.6209, 'learning_rate': 1.9913929060815e-05, 'epoch': 0.07} 7%|▋ | 408/5773 [38:07<8:40:52, 5.83s/it] 7%|▋ | 409/5773 [38:18<8:34:44, 5.76s/it] 7%|▋ | 409/5773 [38:13<8:34:44, 5.76s/it] {'loss': 0.5969, 'learning_rate': 1.991319290831105e-05, 'epoch': 0.07} 7%|▋ | 409/5773 [38:18<8:34:44, 5.76s/it] {'loss': 0.5969, 'learning_rate': 1.991319290831105e-05, 'epoch': 0.07} 7%|▋ | 409/5773 [38:13<8:34:44, 5.76s/it] 7%|▋ | 410/5773 [38:24<8:28:52, 5.69s/it] 7%|▋ | 410/5773 [38:18<8:28:52, 5.69s/it] {'loss': 0.608, 'learning_rate': 1.9912453634817246e-05, 'epoch': 0.07} 7%|▋ | 410/5773 [38:24<8:28:52, 5.69s/it] {'loss': 0.608, 'learning_rate': 1.9912453634817246e-05, 'epoch': 0.07} 7%|▋ | 410/5773 [38:18<8:28:52, 5.69s/it] 7%|▋ | 411/5773 [38:29<8:18:47, 5.58s/it] 7%|▋ | 411/5773 [38:24<8:18:48, 5.58s/it] {'loss': 0.6242, 'learning_rate': 1.991171124056634e-05, 'epoch': 0.07} 7%|▋ | 411/5773 [38:29<8:18:47, 5.58s/it] {'loss': 0.6242, 'learning_rate': 1.991171124056634e-05, 'epoch': 0.07} 7%|▋ | 411/5773 [38:24<8:18:48, 5.58s/it] 7%|▋ | 412/5773 [38:35<8:14:39, 5.54s/it] 7%|▋ | 412/5773 [38:29<8:14:39, 5.54s/it] {'loss': 0.6174, 'learning_rate': 1.9910965725792057e-05, 'epoch': 0.07} 7%|▋ | 412/5773 [38:35<8:14:39, 5.54s/it] {'loss': 0.6174, 'learning_rate': 1.9910965725792057e-05, 'epoch': 0.07} 7%|▋ | 412/5773 [38:29<8:14:39, 5.54s/it] 7%|▋ | 413/5773 [38:40<8:16:02, 5.55s/it] 7%|▋ | 413/5773 [38:35<8:16:02, 5.55s/it] {'loss': 0.5988, 'learning_rate': 1.991021709072911e-05, 'epoch': 0.07} 7%|▋ | 413/5773 [38:40<8:16:02, 5.55s/it] {'loss': 0.5988, 
'learning_rate': 1.991021709072911e-05, 'epoch': 0.07} 7%|▋ | 413/5773 [38:35<8:16:02, 5.55s/it] 7%|▋ | 414/5773 [38:46<8:20:51, 5.61s/it] 7%|▋ | 414/5773 [38:40<8:20:51, 5.61s/it] {'loss': 0.6049, 'learning_rate': 1.9909465335613194e-05, 'epoch': 0.07} 7%|▋ | 414/5773 [38:46<8:20:51, 5.61s/it] {'loss': 0.6049, 'learning_rate': 1.9909465335613194e-05, 'epoch': 0.07} 7%|▋ | 414/5773 [38:40<8:20:51, 5.61s/it] 7%|▋ | 415/5773 [38:51<8:15:34, 5.55s/it] 7%|▋ | 415/5773 [38:46<8:15:34, 5.55s/it] {'loss': 0.6252, 'learning_rate': 1.990871046068098e-05, 'epoch': 0.07} 7%|▋ | 415/5773 [38:51<8:15:34, 5.55s/it] {'loss': 0.6252, 'learning_rate': 1.990871046068098e-05, 'epoch': 0.07} 7%|▋ | 415/5773 [38:46<8:15:34, 5.55s/it] 7%|▋ | 416/5773 [38:57<8:13:04, 5.52s/it] 7%|▋ | 416/5773 [38:51<8:13:03, 5.52s/it] {'loss': 0.6116, 'learning_rate': 1.990795246617014e-05, 'epoch': 0.07} 7%|▋ | 416/5773 [38:51<8:13:03, 5.52s/it] {'loss': 0.6116, 'learning_rate': 1.990795246617014e-05, 'epoch': 0.07} 7%|▋ | 416/5773 [38:57<8:13:04, 5.52s/it] 7%|▋ | 417/5773 [39:02<8:12:45, 5.52s/it] 7%|▋ | 417/5773 [38:57<8:12:45, 5.52s/it] {'loss': 0.6236, 'learning_rate': 1.9907191352319302e-05, 'epoch': 0.07} 7%|▋ | 417/5773 [39:02<8:12:45, 5.52s/it] {'loss': 0.6236, 'learning_rate': 1.9907191352319302e-05, 'epoch': 0.07} 7%|▋ | 417/5773 [38:57<8:12:45, 5.52s/it] 7%|▋ | 418/5773 [39:08<8:09:02, 5.48s/it] 7%|▋ | 418/5773 [39:02<8:09:02, 5.48s/it] {'loss': 0.6215, 'learning_rate': 1.9906427119368098e-05, 'epoch': 0.07} 7%|▋ | 418/5773 [39:08<8:09:02, 5.48s/it] {'loss': 0.6215, 'learning_rate': 1.9906427119368098e-05, 'epoch': 0.07} 7%|▋ | 418/5773 [39:02<8:09:02, 5.48s/it] 7%|▋ | 419/5773 [39:13<8:07:33, 5.46s/it] 7%|▋ | 419/5773 [39:08<8:07:33, 5.46s/it] {'loss': 0.5936, 'learning_rate': 1.9905659767557126e-05, 'epoch': 0.07} 7%|▋ | 419/5773 [39:13<8:07:33, 5.46s/it] {'loss': 0.5936, 'learning_rate': 1.9905659767557126e-05, 'epoch': 0.07} 7%|▋ | 419/5773 [39:08<8:07:33, 5.46s/it] 7%|▋ | 420/5773 
[39:18<8:03:13, 5.42s/it] 7%|▋ | 420/5773 [39:13<8:03:13, 5.42s/it] {'loss': 0.6029, 'learning_rate': 1.9904889297127973e-05, 'epoch': 0.07} 7%|▋ | 420/5773 [39:18<8:03:13, 5.42s/it] {'loss': 0.6029, 'learning_rate': 1.9904889297127973e-05, 'epoch': 0.07} 7%|▋ | 420/5773 [39:13<8:03:13, 5.42s/it] 7%|▋ | 421/5773 [39:24<8:01:40, 5.40s/it] 7%|▋ | 421/5773 [39:18<8:01:40, 5.40s/it] {'loss': 0.614, 'learning_rate': 1.9904115708323214e-05, 'epoch': 0.07} 7%|▋ | 421/5773 [39:24<8:01:40, 5.40s/it] {'loss': 0.614, 'learning_rate': 1.9904115708323214e-05, 'epoch': 0.07} 7%|▋ | 421/5773 [39:18<8:01:40, 5.40s/it] 7%|▋ | 422/5773 [39:29<7:59:13, 5.37s/it] 7%|▋ | 422/5773 [39:24<7:59:13, 5.37s/it] {'loss': 0.6128, 'learning_rate': 1.9903339001386396e-05, 'epoch': 0.07} 7%|▋ | 422/5773 [39:29<7:59:13, 5.37s/it] {'loss': 0.6128, 'learning_rate': 1.9903339001386396e-05, 'epoch': 0.07} 7%|▋ | 422/5773 [39:24<7:59:13, 5.37s/it] 7%|▋ | 423/5773 [39:34<7:56:34, 5.34s/it] 7%|▋ | 423/5773 [39:29<7:56:34, 5.34s/it] {'loss': 0.6009, 'learning_rate': 1.9902559176562045e-05, 'epoch': 0.07} 7%|▋ | 423/5773 [39:34<7:56:34, 5.34s/it] {'loss': 0.6009, 'learning_rate': 1.9902559176562045e-05, 'epoch': 0.07} 7%|▋ | 423/5773 [39:29<7:56:34, 5.34s/it] 7%|▋ | 424/5773 [39:40<7:58:24, 5.37s/it] 7%|▋ | 424/5773 [39:34<7:58:24, 5.37s/it] {'loss': 0.5944, 'learning_rate': 1.9901776234095686e-05, 'epoch': 0.07} 7%|▋ | 424/5773 [39:40<7:58:24, 5.37s/it] {'loss': 0.5944, 'learning_rate': 1.9901776234095686e-05, 'epoch': 0.07} 7%|▋ | 424/5773 [39:34<7:58:24, 5.37s/it] 7%|▋ | 425/5773 [39:45<7:55:28, 5.33s/it] 7%|▋ | 425/5773 [39:39<7:55:28, 5.33s/it] {'loss': 0.5916, 'learning_rate': 1.9900990174233807e-05, 'epoch': 0.07} 7%|▋ | 425/5773 [39:45<7:55:28, 5.33s/it] {'loss': 0.5916, 'learning_rate': 1.9900990174233807e-05, 'epoch': 0.07} 7%|▋ | 425/5773 [39:39<7:55:28, 5.33s/it] 7%|▋ | 426/5773 [39:50<7:57:10, 5.35s/it] 7%|▋ | 426/5773 [39:45<7:57:10, 5.35s/it] {'loss': 0.6182, 'learning_rate': 
1.9900200997223886e-05, 'epoch': 0.07} 7%|▋ | 426/5773 [39:50<7:57:10, 5.35s/it] {'loss': 0.6182, 'learning_rate': 1.9900200997223886e-05, 'epoch': 0.07} 7%|▋ | 426/5773 [39:45<7:57:10, 5.35s/it] 7%|▋ | 427/5773 [39:56<8:00:38, 5.39s/it] 7%|▋ | 427/5773 [39:50<8:00:38, 5.39s/it] {'loss': 0.61, 'learning_rate': 1.9899408703314383e-05, 'epoch': 0.07} 7%|▋ | 427/5773 [39:56<8:00:38, 5.39s/it] {'loss': 0.61, 'learning_rate': 1.9899408703314383e-05, 'epoch': 0.07} 7%|▋ | 427/5773 [39:50<8:00:38, 5.39s/it] 7%|▋ | 428/5773 [40:02<8:09:12, 5.49s/it] 7%|▋ | 428/5773 [39:56<8:09:12, 5.49s/it] {'loss': 0.6146, 'learning_rate': 1.9898613292754738e-05, 'epoch': 0.07} 7%|▋ | 428/5773 [40:02<8:09:12, 5.49s/it] {'loss': 0.6146, 'learning_rate': 1.9898613292754738e-05, 'epoch': 0.07} 7%|▋ | 428/5773 [39:56<8:09:12, 5.49s/it] 7%|▋ | 429/5773 [40:07<8:06:47, 5.47s/it] 7%|▋ | 429/5773 [40:01<8:06:47, 5.47s/it] {'loss': 0.5921, 'learning_rate': 1.9897814765795365e-05, 'epoch': 0.07} 7%|▋ | 429/5773 [40:07<8:06:47, 5.47s/it] {'loss': 0.5921, 'learning_rate': 1.9897814765795365e-05, 'epoch': 0.07} 7%|▋ | 429/5773 [40:01<8:06:47, 5.47s/it] 7%|▋ | 430/5773 [40:12<8:03:33, 5.43s/it] 7%|▋ | 430/5773 [40:07<8:03:33, 5.43s/it] {'loss': 0.5978, 'learning_rate': 1.9897013122687676e-05, 'epoch': 0.07} 7%|▋ | 430/5773 [40:12<8:03:33, 5.43s/it] {'loss': 0.5978, 'learning_rate': 1.9897013122687676e-05, 'epoch': 0.07} 7%|▋ | 430/5773 [40:07<8:03:33, 5.43s/it] 7%|▋ | 431/5773 [40:18<8:05:25, 5.45s/it] 7%|▋ | 431/5773 [40:12<8:05:24, 5.45s/it] {'loss': 0.6121, 'learning_rate': 1.9896208363684048e-05, 'epoch': 0.07} 7%|▋ | 431/5773 [40:18<8:05:25, 5.45s/it] {'loss': 0.6121, 'learning_rate': 1.9896208363684048e-05, 'epoch': 0.07} 7%|▋ | 431/5773 [40:12<8:05:24, 5.45s/it] 7%|▋ | 432/5773 [40:23<8:06:11, 5.46s/it] 7%|▋ | 432/5773 [40:18<8:06:10, 5.46s/it] {'loss': 0.6173, 'learning_rate': 1.9895400489037843e-05, 'epoch': 0.07} 7%|▋ | 432/5773 [40:23<8:06:11, 5.46s/it] {'loss': 0.6173, 'learning_rate': 
1.9895400489037843e-05, 'epoch': 0.07} 7%|▋ | 432/5773 [40:18<8:06:10, 5.46s/it] 8%|▊ | 433/5773 [40:29<8:04:32, 5.44s/it] 8%|▊ | 433/5773 [40:23<8:04:32, 5.44s/it] {'loss': 0.6051, 'learning_rate': 1.989458949900341e-05, 'epoch': 0.08} 8%|▊ | 433/5773 [40:29<8:04:32, 5.44s/it] {'loss': 0.6051, 'learning_rate': 1.989458949900341e-05, 'epoch': 0.08} 8%|▊ | 433/5773 [40:23<8:04:32, 5.44s/it] 8%|▊ | 434/5773 [40:34<8:06:42, 5.47s/it] 8%|▊ | 434/5773 [40:29<8:06:42, 5.47s/it] {'loss': 0.5981, 'learning_rate': 1.989377539383607e-05, 'epoch': 0.08} 8%|▊ | 434/5773 [40:34<8:06:42, 5.47s/it] {'loss': 0.5981, 'learning_rate': 1.989377539383607e-05, 'epoch': 0.08} 8%|▊ | 434/5773 [40:29<8:06:42, 5.47s/it] 8%|▊ | 435/5773 [40:40<8:08:08, 5.49s/it] 8%|▊ | 435/5773 [40:34<8:08:08, 5.49s/it] {'loss': 0.6088, 'learning_rate': 1.9892958173792136e-05, 'epoch': 0.08} 8%|▊ | 435/5773 [40:40<8:08:08, 5.49s/it] {'loss': 0.6088, 'learning_rate': 1.9892958173792136e-05, 'epoch': 0.08} 8%|▊ | 435/5773 [40:34<8:08:08, 5.49s/it] 8%|▊ | 436/5773 [40:45<8:05:18, 5.46s/it] 8%|▊ | 436/5773 [40:40<8:05:23, 5.46s/it] {'loss': 0.6112, 'learning_rate': 1.989213783912889e-05, 'epoch': 0.08} 8%|▊ | 436/5773 [40:45<8:05:18, 5.46s/it] {'loss': 0.6112, 'learning_rate': 1.989213783912889e-05, 'epoch': 0.08} 8%|▊ | 436/5773 [40:40<8:05:23, 5.46s/it] 8%|▊ | 437/5773 [40:51<8:05:17, 5.46s/it] 8%|▊ | 437/5773 [40:45<8:05:15, 5.46s/it] {'loss': 0.5997, 'learning_rate': 1.9891314390104597e-05, 'epoch': 0.08} 8%|▊ | 437/5773 [40:51<8:05:17, 5.46s/it] {'loss': 0.5997, 'learning_rate': 1.9891314390104597e-05, 'epoch': 0.08} 8%|▊ | 437/5773 [40:45<8:05:15, 5.46s/it] 8%|▊ | 438/5773 [40:56<8:03:53, 5.44s/it] 8%|▊ | 438/5773 [40:51<8:03:52, 5.44s/it] {'loss': 0.6162, 'learning_rate': 1.989048782697851e-05, 'epoch': 0.08} 8%|▊ | 438/5773 [40:56<8:03:53, 5.44s/it] {'loss': 0.6162, 'learning_rate': 1.989048782697851e-05, 'epoch': 0.08} 8%|▊ | 438/5773 [40:51<8:03:52, 5.44s/it] 8%|▊ | 439/5773 [41:02<8:04:32, 5.45s/it] 
8%|▊ | 439/5773 [40:56<8:04:31, 5.45s/it] {'loss': 0.606, 'learning_rate': 1.988965815001086e-05, 'epoch': 0.08} 8%|▊ | 439/5773 [41:02<8:04:32, 5.45s/it] {'loss': 0.606, 'learning_rate': 1.988965815001086e-05, 'epoch': 0.08} 8%|▊ | 439/5773 [40:56<8:04:31, 5.45s/it] 8%|▊ | 440/5773 [41:07<8:04:33, 5.45s/it] 8%|▊ | 440/5773 [41:01<8:04:32, 5.45s/it] {'loss': 0.597, 'learning_rate': 1.9888825359462846e-05, 'epoch': 0.08} 8%|▊ | 440/5773 [41:07<8:04:33, 5.45s/it] {'loss': 0.597, 'learning_rate': 1.9888825359462846e-05, 'epoch': 0.08} 8%|▊ | 440/5773 [41:01<8:04:32, 5.45s/it] 8%|▊ | 441/5773 [41:13<8:08:11, 5.49s/it] 8%|▊ | 441/5773 [41:07<8:08:11, 5.49s/it] {'loss': 0.5985, 'learning_rate': 1.9887989455596667e-05, 'epoch': 0.08} 8%|▊ | 441/5773 [41:13<8:08:11, 5.49s/it] {'loss': 0.5985, 'learning_rate': 1.9887989455596667e-05, 'epoch': 0.08} 8%|▊ | 441/5773 [41:07<8:08:11, 5.49s/it] 8%|▊ | 442/5773 [41:18<8:03:49, 5.45s/it] 8%|▊ | 442/5773 [41:12<8:03:50, 5.45s/it] {'loss': 0.5891, 'learning_rate': 1.988715043867549e-05, 'epoch': 0.08} 8%|▊ | 442/5773 [41:18<8:03:49, 5.45s/it] {'loss': 0.5891, 'learning_rate': 1.988715043867549e-05, 'epoch': 0.08} 8%|▊ | 442/5773 [41:12<8:03:50, 5.45s/it] 8%|▊ | 443/5773 [41:23<8:01:59, 5.43s/it] 8%|▊ | 443/5773 [41:18<8:01:58, 5.43s/it] {'loss': 0.6059, 'learning_rate': 1.9886308308963458e-05, 'epoch': 0.08} 8%|▊ | 443/5773 [41:23<8:01:59, 5.43s/it] {'loss': 0.6059, 'learning_rate': 1.9886308308963458e-05, 'epoch': 0.08} 8%|▊ | 443/5773 [41:18<8:01:58, 5.43s/it] 8%|▊ | 444/5773 [41:29<7:59:04, 5.39s/it] 8%|▊ | 444/5773 [41:23<7:59:03, 5.39s/it] {'loss': 0.5968, 'learning_rate': 1.9885463066725708e-05, 'epoch': 0.08} 8%|▊ | 444/5773 [41:29<7:59:04, 5.39s/it] {'loss': 0.5968, 'learning_rate': 1.9885463066725708e-05, 'epoch': 0.08} 8%|▊ | 444/5773 [41:23<7:59:03, 5.39s/it] 8%|▊ | 445/5773 [41:34<7:56:41, 5.37s/it] 8%|▊ | 445/5773 [41:28<7:56:41, 5.37s/it] {'loss': 0.614, 'learning_rate': 1.9884614712228346e-05, 'epoch': 0.08} 8%|▊ | 
445/5773 [41:34<7:56:41, 5.37s/it] {'loss': 0.614, 'learning_rate': 1.9884614712228346e-05, 'epoch': 0.08} 8%|▊ | 445/5773 [41:28<7:56:41, 5.37s/it] 8%|▊ | 446/5773 [41:39<7:56:21, 5.37s/it] 8%|▊ | 446/5773 [41:34<7:56:22, 5.37s/it] {'loss': 0.6141, 'learning_rate': 1.9883763245738457e-05, 'epoch': 0.08} 8%|▊ | 446/5773 [41:39<7:56:21, 5.37s/it] {'loss': 0.6141, 'learning_rate': 1.9883763245738457e-05, 'epoch': 0.08} 8%|▊ | 446/5773 [41:34<7:56:22, 5.37s/it] 8%|▊ | 447/5773 [41:45<7:59:41, 5.40s/it] 8%|▊ | 447/5773 [41:39<7:59:41, 5.40s/it] {'loss': 0.6001, 'learning_rate': 1.988290866752412e-05, 'epoch': 0.08} 8%|▊ | 447/5773 [41:45<7:59:41, 5.40s/it] {'loss': 0.6001, 'learning_rate': 1.988290866752412e-05, 'epoch': 0.08} 8%|▊ | 447/5773 [41:39<7:59:41, 5.40s/it] 8%|▊ | 448/5773 [41:50<8:02:12, 5.43s/it] 8%|▊ | 448/5773 [41:45<8:02:12, 5.43s/it] {'loss': 0.6214, 'learning_rate': 1.9882050977854374e-05, 'epoch': 0.08} 8%|▊ | 448/5773 [41:50<8:02:12, 5.43s/it] {'loss': 0.6214, 'learning_rate': 1.9882050977854374e-05, 'epoch': 0.08} 8%|▊ | 448/5773 [41:45<8:02:12, 5.43s/it] 8%|▊ | 449/5773 [41:56<8:00:44, 5.42s/it] 8%|▊ | 449/5773 [41:50<8:00:45, 5.42s/it] {'loss': 0.6109, 'learning_rate': 1.9881190176999255e-05, 'epoch': 0.08} 8%|▊ | 449/5773 [41:56<8:00:44, 5.42s/it] {'loss': 0.6109, 'learning_rate': 1.9881190176999255e-05, 'epoch': 0.08} 8%|▊ | 449/5773 [41:50<8:00:45, 5.42s/it]815 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8%|▊ | 450/5773 [42:01<7:57:41, 5.38s/it]4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
010 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 8%|▊ | 450/5773 [41:55<7:57:42, 5.38s/it]14 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.6052, 'learning_rate': 1.9880326265229764e-05, 'epoch': 0.08} 8%|▊ | 450/5773 [42:01<7:57:41, 5.38s/it] {'loss': 0.6052, 'learning_rate': 1.9880326265229764e-05, 'epoch': 0.08} 8%|▊ | 450/5773 [41:55<7:57:42, 5.38s/it] 8%|▊ | 451/5773 [42:07<8:06:06, 5.48s/it] 8%|▊ | 451/5773 [42:01<8:06:05, 5.48s/it] {'loss': 0.6109, 'learning_rate': 1.9879459242817888e-05, 'epoch': 0.08} 8%|▊ | 451/5773 [42:07<8:06:06, 5.48s/it] {'loss': 0.6109, 'learning_rate': 1.9879459242817888e-05, 'epoch': 0.08} 8%|▊ | 451/5773 [42:01<8:06:05, 5.48s/it] 8%|▊ | 452/5773 [42:12<8:05:09, 5.47s/it] 8%|▊ | 452/5773 [42:07<8:05:08, 5.47s/it] {'loss': 0.6089, 'learning_rate': 1.98785891100366e-05, 'epoch': 0.08} 8%|▊ | 452/5773 [42:12<8:05:09, 5.47s/it] {'loss': 0.6089, 'learning_rate': 1.98785891100366e-05, 'epoch': 0.08} 8%|▊ | 452/5773 [42:07<8:05:08, 5.47s/it] 8%|▊ | 453/5773 [42:18<8:03:53, 5.46s/it] 8%|▊ | 453/5773 [42:12<8:03:53, 5.46s/it] {'loss': 0.6117, 'learning_rate': 1.9877715867159838e-05, 'epoch': 0.08} 8%|▊ | 453/5773 [42:18<8:03:53, 5.46s/it] {'loss': 0.6117, 'learning_rate': 1.9877715867159838e-05, 'epoch': 0.08} 8%|▊ | 453/5773 [42:12<8:03:53, 5.46s/it] 8%|▊ | 454/5773 [42:23<8:02:43, 5.45s/it] 8%|▊ | 454/5773 [42:17<8:02:43, 5.45s/it] {'loss': 0.6079, 'learning_rate': 1.9876839514462532e-05, 'epoch': 0.08} 8%|▊ | 454/5773 [42:23<8:02:43, 5.45s/it] {'loss': 0.6079, 'learning_rate': 1.9876839514462532e-05, 'epoch': 0.08} 8%|▊ | 454/5773 [42:17<8:02:43, 5.45s/it] 8%|▊ | 455/5773 [42:28<8:03:46, 5.46s/it] 8%|▊ | 455/5773 [42:23<8:03:46, 5.46s/it] {'loss': 0.6146, 'learning_rate': 1.9875960052220582e-05, 'epoch': 0.08} 8%|▊ | 455/5773 [42:28<8:03:46, 5.46s/it] {'loss': 0.6146, 
'learning_rate': 1.9875960052220582e-05, 'epoch': 0.08} 8%|▊ | 455/5773 [42:23<8:03:46, 5.46s/it] 8%|▊ | 456/5773 [42:34<8:08:25, 5.51s/it] 8%|▊ | 456/5773 [42:29<8:08:25, 5.51s/it] {'loss': 0.589, 'learning_rate': 1.9875077480710875e-05, 'epoch': 0.08} 8%|▊ | 456/5773 [42:34<8:08:25, 5.51s/it] {'loss': 0.589, 'learning_rate': 1.9875077480710875e-05, 'epoch': 0.08} 8%|▊ | 456/5773 [42:29<8:08:25, 5.51s/it] 8%|▊ | 457/5773 [42:39<8:01:22, 5.43s/it] 8%|▊ | 457/5773 [42:34<8:01:23, 5.43s/it] {'loss': 0.6036, 'learning_rate': 1.987419180021127e-05, 'epoch': 0.08} 8%|▊ | 457/5773 [42:39<8:01:22, 5.43s/it] {'loss': 0.6036, 'learning_rate': 1.987419180021127e-05, 'epoch': 0.08} 8%|▊ | 457/5773 [42:34<8:01:23, 5.43s/it] 8%|▊ | 458/5773 [42:45<8:00:18, 5.42s/it] 8%|▊ | 458/5773 [42:39<8:00:18, 5.42s/it] {'loss': 0.617, 'learning_rate': 1.9873303011000606e-05, 'epoch': 0.08} 8%|▊ | 458/5773 [42:45<8:00:18, 5.42s/it] {'loss': 0.617, 'learning_rate': 1.9873303011000606e-05, 'epoch': 0.08} 8%|▊ | 458/5773 [42:39<8:00:18, 5.42s/it] 8%|▊ | 459/5773 [42:50<8:01:30, 5.44s/it] 8%|▊ | 459/5773 [42:45<8:01:30, 5.44s/it] {'loss': 0.6215, 'learning_rate': 1.9872411113358707e-05, 'epoch': 0.08} 8%|▊ | 459/5773 [42:50<8:01:30, 5.44s/it] {'loss': 0.6215, 'learning_rate': 1.9872411113358707e-05, 'epoch': 0.08} 8%|▊ | 459/5773 [42:45<8:01:30, 5.44s/it] 8%|▊ | 460/5773 [42:56<8:00:14, 5.42s/it] 8%|▊ | 460/5773 [42:50<8:00:14, 5.42s/it] {'loss': 0.6044, 'learning_rate': 1.9871516107566366e-05, 'epoch': 0.08} 8%|▊ | 460/5773 [42:56<8:00:14, 5.42s/it] {'loss': 0.6044, 'learning_rate': 1.9871516107566366e-05, 'epoch': 0.08} 8%|▊ | 460/5773 [42:50<8:00:14, 5.42s/it] 8%|▊ | 461/5773 [43:01<7:57:21, 5.39s/it] 8%|▊ | 461/5773 [42:55<7:57:23, 5.39s/it] {'loss': 0.6287, 'learning_rate': 1.9870617993905363e-05, 'epoch': 0.08} 8%|▊ | 461/5773 [43:01<7:57:21, 5.39s/it] {'loss': 0.6287, 'learning_rate': 1.9870617993905363e-05, 'epoch': 0.08} 8%|▊ | 461/5773 [42:55<7:57:23, 5.39s/it] 8%|▊ | 462/5773 
[43:06<8:01:03, 5.43s/it] 8%|▊ | 462/5773 [43:01<8:01:03, 5.43s/it] {'loss': 0.5913, 'learning_rate': 1.986971677265845e-05, 'epoch': 0.08} 8%|▊ | 462/5773 [43:06<8:01:03, 5.43s/it] {'loss': 0.5913, 'learning_rate': 1.986971677265845e-05, 'epoch': 0.08} 8%|▊ | 462/5773 [43:01<8:01:03, 5.43s/it] 8%|▊ | 463/5773 [43:12<7:59:29, 5.42s/it] 8%|▊ | 463/5773 [43:06<7:59:28, 5.42s/it] {'loss': 0.6002, 'learning_rate': 1.9868812444109364e-05, 'epoch': 0.08} 8%|▊ | 463/5773 [43:12<7:59:29, 5.42s/it] {'loss': 0.6002, 'learning_rate': 1.9868812444109364e-05, 'epoch': 0.08} 8%|▊ | 463/5773 [43:06<7:59:28, 5.42s/it] 8%|▊ | 464/5773 [43:17<8:01:08, 5.44s/it] 8%|▊ | 464/5773 [43:12<8:01:07, 5.44s/it] {'loss': 0.6056, 'learning_rate': 1.9867905008542812e-05, 'epoch': 0.08} 8%|▊ | 464/5773 [43:17<8:01:08, 5.44s/it] {'loss': 0.6056, 'learning_rate': 1.9867905008542812e-05, 'epoch': 0.08} 8%|▊ | 464/5773 [43:12<8:01:07, 5.44s/it] 8%|▊ | 465/5773 [43:23<8:03:42, 5.47s/it] 8%|▊ | 465/5773 [43:17<8:03:42, 5.47s/it] {'loss': 0.614, 'learning_rate': 1.986699446624449e-05, 'epoch': 0.08} 8%|▊ | 465/5773 [43:23<8:03:42, 5.47s/it] {'loss': 0.614, 'learning_rate': 1.986699446624449e-05, 'epoch': 0.08} 8%|▊ | 465/5773 [43:17<8:03:42, 5.47s/it] 8%|▊ | 466/5773 [43:28<8:00:38, 5.43s/it] 8%|▊ | 466/5773 [43:23<8:00:38, 5.43s/it] {'loss': 0.6014, 'learning_rate': 1.9866080817501057e-05, 'epoch': 0.08} 8%|▊ | 466/5773 [43:28<8:00:38, 5.43s/it] {'loss': 0.6014, 'learning_rate': 1.9866080817501057e-05, 'epoch': 0.08} 8%|▊ | 466/5773 [43:23<8:00:38, 5.43s/it] 8%|▊ | 467/5773 [43:34<7:59:12, 5.42s/it] 8%|▊ | 467/5773 [43:28<7:59:12, 5.42s/it] {'loss': 0.5943, 'learning_rate': 1.9865164062600165e-05, 'epoch': 0.08} 8%|▊ | 467/5773 [43:34<7:59:12, 5.42s/it] {'loss': 0.5943, 'learning_rate': 1.9865164062600165e-05, 'epoch': 0.08} 8%|▊ | 467/5773 [43:28<7:59:12, 5.42s/it] 8%|▊ | 468/5773 [43:39<7:59:33, 5.42s/it] 8%|▊ | 468/5773 [43:33<7:59:33, 5.42s/it] {'loss': 0.5931, 'learning_rate': 
1.9864244201830437e-05, 'epoch': 0.08} 8%|▊ | 468/5773 [43:39<7:59:33, 5.42s/it] {'loss': 0.5931, 'learning_rate': 1.9864244201830437e-05, 'epoch': 0.08} 8%|▊ | 468/5773 [43:33<7:59:33, 5.42s/it] 8%|▊ | 469/5773 [43:44<7:59:29, 5.42s/it] 8%|▊ | 469/5773 [43:39<7:59:29, 5.42s/it] {'loss': 0.6144, 'learning_rate': 1.9863321235481472e-05, 'epoch': 0.08} 8%|▊ | 469/5773 [43:44<7:59:29, 5.42s/it] {'loss': 0.6144, 'learning_rate': 1.9863321235481472e-05, 'epoch': 0.08} 8%|▊ | 469/5773 [43:39<7:59:29, 5.42s/it] 8%|▊ | 470/5773 [43:50<8:01:57, 5.45s/it] 8%|▊ | 470/5773 [43:44<8:01:57, 5.45s/it] {'loss': 0.6133, 'learning_rate': 1.9862395163843854e-05, 'epoch': 0.08} 8%|▊ | 470/5773 [43:50<8:01:57, 5.45s/it] {'loss': 0.6133, 'learning_rate': 1.9862395163843854e-05, 'epoch': 0.08} 8%|▊ | 470/5773 [43:44<8:01:57, 5.45s/it] 8%|▊ | 471/5773 [43:55<8:03:45, 5.47s/it] 8%|▊ | 471/5773 [43:50<8:03:45, 5.47s/it] {'loss': 0.6078, 'learning_rate': 1.986146598720913e-05, 'epoch': 0.08} 8%|▊ | 471/5773 [43:55<8:03:45, 5.47s/it] {'loss': 0.6078, 'learning_rate': 1.986146598720913e-05, 'epoch': 0.08} 8%|▊ | 471/5773 [43:50<8:03:45, 5.47s/it] 8%|▊ | 472/5773 [44:01<7:57:23, 5.40s/it] 8%|▊ | 472/5773 [43:55<7:57:23, 5.40s/it] {'loss': 0.6064, 'learning_rate': 1.986053370586985e-05, 'epoch': 0.08} 8%|▊ | 472/5773 [44:01<7:57:23, 5.40s/it] {'loss': 0.6064, 'learning_rate': 1.986053370586985e-05, 'epoch': 0.08} 8%|▊ | 472/5773 [43:55<7:57:23, 5.40s/it] 8%|▊ | 473/5773 [44:06<7:59:46, 5.43s/it] 8%|▊ | 473/5773 [44:01<7:59:46, 5.43s/it] {'loss': 0.5851, 'learning_rate': 1.9859598320119514e-05, 'epoch': 0.08} 8%|▊ | 473/5773 [44:06<7:59:46, 5.43s/it] {'loss': 0.5851, 'learning_rate': 1.9859598320119514e-05, 'epoch': 0.08} 8%|▊ | 473/5773 [44:01<7:59:46, 5.43s/it] 8%|▊ | 474/5773 [44:12<7:59:58, 5.43s/it] 8%|▊ | 474/5773 [44:06<7:59:58, 5.43s/it] {'loss': 0.6073, 'learning_rate': 1.9858659830252616e-05, 'epoch': 0.08} 8%|▊ | 474/5773 [44:12<7:59:58, 5.43s/it] {'loss': 0.6073, 'learning_rate': 
1.9858659830252616e-05, 'epoch': 0.08} 8%|▊ | 474/5773 [44:06<7:59:58, 5.43s/it] 8%|▊ | 475/5773 [44:17<8:03:58, 5.48s/it] 8%|▊ | 475/5773 [44:12<8:03:56, 5.48s/it] {'loss': 0.6012, 'learning_rate': 1.985771823656462e-05, 'epoch': 0.08} 8%|▊ | 475/5773 [44:17<8:03:58, 5.48s/it] {'loss': 0.6012, 'learning_rate': 1.985771823656462e-05, 'epoch': 0.08} 8%|▊ | 475/5773 [44:12<8:03:56, 5.48s/it] 8%|▊ | 476/5773 [44:23<8:04:17, 5.49s/it] 8%|▊ | 476/5773 [44:17<8:04:16, 5.49s/it] {'loss': 0.5959, 'learning_rate': 1.9856773539351972e-05, 'epoch': 0.08} 8%|▊ | 476/5773 [44:23<8:04:17, 5.49s/it] {'loss': 0.5959, 'learning_rate': 1.9856773539351972e-05, 'epoch': 0.08} 8%|▊ | 476/5773 [44:17<8:04:16, 5.49s/it] 8%|▊ | 477/5773 [44:28<8:01:53, 5.46s/it] 8%|▊ | 477/5773 [44:23<8:01:53, 5.46s/it] {'loss': 0.6199, 'learning_rate': 1.9855825738912094e-05, 'epoch': 0.08} 8%|▊ | 477/5773 [44:28<8:01:53, 5.46s/it] {'loss': 0.6199, 'learning_rate': 1.9855825738912094e-05, 'epoch': 0.08} 8%|▊ | 477/5773 [44:23<8:01:53, 5.46s/it] 8%|▊ | 478/5773 [44:34<7:59:39, 5.44s/it] 8%|▊ | 478/5773 [44:28<7:59:39, 5.44s/it] {'loss': 0.6013, 'learning_rate': 1.9854874835543376e-05, 'epoch': 0.08} 8%|▊ | 478/5773 [44:34<7:59:39, 5.44s/it] {'loss': 0.6013, 'learning_rate': 1.9854874835543376e-05, 'epoch': 0.08} 8%|▊ | 478/5773 [44:28<7:59:39, 5.44s/it] 8%|▊ | 479/5773 [44:39<7:56:49, 5.40s/it] 8%|▊ | 479/5773 [44:33<7:56:49, 5.40s/it] {'loss': 0.606, 'learning_rate': 1.9853920829545204e-05, 'epoch': 0.08} 8%|▊ | 479/5773 [44:39<7:56:49, 5.40s/it] {'loss': 0.606, 'learning_rate': 1.9853920829545204e-05, 'epoch': 0.08} 8%|▊ | 479/5773 [44:33<7:56:49, 5.40s/it] 8%|▊ | 480/5773 [44:44<7:55:44, 5.39s/it] 8%|▊ | 480/5773 [44:39<7:55:44, 5.39s/it] {'loss': 0.6081, 'learning_rate': 1.9852963721217922e-05, 'epoch': 0.08} 8%|▊ | 480/5773 [44:44<7:55:44, 5.39s/it] {'loss': 0.6081, 'learning_rate': 1.9852963721217922e-05, 'epoch': 0.08} 8%|▊ | 480/5773 [44:39<7:55:44, 5.39s/it] 8%|▊ | 481/5773 [44:50<7:56:51, 
5.41s/it] 8%|▊ | 481/5773 [44:44<7:56:52, 5.41s/it] {'loss': 0.6038, 'learning_rate': 1.9852003510862864e-05, 'epoch': 0.08} 8%|▊ | 481/5773 [44:50<7:56:51, 5.41s/it] {'loss': 0.6038, 'learning_rate': 1.9852003510862864e-05, 'epoch': 0.08} 8%|▊ | 481/5773 [44:44<7:56:52, 5.41s/it] 8%|▊ | 482/5773 [44:55<7:54:48, 5.38s/it] 8%|▊ | 482/5773 [44:49<7:54:49, 5.38s/it] {'loss': 0.6033, 'learning_rate': 1.985104019878233e-05, 'epoch': 0.08} 8%|▊ | 482/5773 [44:55<7:54:48, 5.38s/it] {'loss': 0.6033, 'learning_rate': 1.985104019878233e-05, 'epoch': 0.08} 8%|▊ | 482/5773 [44:49<7:54:49, 5.38s/it] 8%|▊ | 483/5773 [45:00<7:56:55, 5.41s/it] 8%|▊ | 483/5773 [44:55<7:56:55, 5.41s/it] {'loss': 0.6023, 'learning_rate': 1.9850073785279598e-05, 'epoch': 0.08} 8%|▊ | 483/5773 [45:00<7:56:55, 5.41s/it] {'loss': 0.6023, 'learning_rate': 1.9850073785279598e-05, 'epoch': 0.08} 8%|▊ | 483/5773 [44:55<7:56:55, 5.41s/it] 8%|▊ | 484/5773 [45:06<8:01:21, 5.46s/it] 8%|▊ | 484/5773 [45:00<8:01:20, 5.46s/it] {'loss': 0.6237, 'learning_rate': 1.9849104270658933e-05, 'epoch': 0.08} 8%|▊ | 484/5773 [45:06<8:01:21, 5.46s/it] {'loss': 0.6237, 'learning_rate': 1.9849104270658933e-05, 'epoch': 0.08} 8%|▊ | 484/5773 [45:01<8:01:20, 5.46s/it] 8%|▊ | 485/5773 [45:12<8:02:15, 5.47s/it] 8%|▊ | 485/5773 [45:06<8:02:14, 5.47s/it] {'loss': 0.6063, 'learning_rate': 1.9848131655225567e-05, 'epoch': 0.08} 8%|▊ | 485/5773 [45:12<8:02:15, 5.47s/it] {'loss': 0.6063, 'learning_rate': 1.9848131655225567e-05, 'epoch': 0.08} 8%|▊ | 485/5773 [45:06<8:02:14, 5.47s/it] 8%|▊ | 486/5773 [45:17<8:00:10, 5.45s/it] 8%|▊ | 486/5773 [45:11<8:00:10, 5.45s/it] {'loss': 0.5956, 'learning_rate': 1.9847155939285713e-05, 'epoch': 0.08} 8%|▊ | 486/5773 [45:17<8:00:10, 5.45s/it] {'loss': 0.5956, 'learning_rate': 1.9847155939285713e-05, 'epoch': 0.08} 8%|▊ | 486/5773 [45:11<8:00:10, 5.45s/it] 8%|▊ | 487/5773 [45:22<8:03:12, 5.48s/it] 8%|▊ | 487/5773 [45:17<8:03:13, 5.48s/it] {'loss': 0.6237, 'learning_rate': 1.9846177123146553e-05, 
'epoch': 0.08} 8%|▊ | 487/5773 [45:23<8:03:12, 5.48s/it] {'loss': 0.6237, 'learning_rate': 1.9846177123146553e-05, 'epoch': 0.08} 8%|▊ | 487/5773 [45:17<8:03:13, 5.48s/it] 8%|▊ | 488/5773 [45:28<7:57:51, 5.43s/it] 8%|▊ | 488/5773 [45:22<7:57:52, 5.43s/it] {'loss': 0.6095, 'learning_rate': 1.9845195207116253e-05, 'epoch': 0.08} 8%|▊ | 488/5773 [45:28<7:57:51, 5.43s/it] {'loss': 0.6095, 'learning_rate': 1.9845195207116253e-05, 'epoch': 0.08} 8%|▊ | 488/5773 [45:22<7:57:52, 5.43s/it] 8%|▊ | 489/5773 [45:33<7:56:13, 5.41s/it] 8%|▊ | 489/5773 [45:28<7:56:13, 5.41s/it] {'loss': 0.6018, 'learning_rate': 1.9844210191503944e-05, 'epoch': 0.08} 8%|▊ | 489/5773 [45:33<7:56:13, 5.41s/it] {'loss': 0.6018, 'learning_rate': 1.9844210191503944e-05, 'epoch': 0.08} 8%|▊ | 489/5773 [45:28<7:56:13, 5.41s/it] 8%|▊ | 490/5773 [45:39<7:56:06, 5.41s/it] 8%|▊ | 490/5773 [45:33<7:56:06, 5.41s/it] {'loss': 0.6045, 'learning_rate': 1.9843222076619755e-05, 'epoch': 0.08} 8%|▊ | 490/5773 [45:39<7:56:06, 5.41s/it] {'loss': 0.6045, 'learning_rate': 1.9843222076619755e-05, 'epoch': 0.08} 8%|▊ | 490/5773 [45:33<7:56:06, 5.41s/it] 9%|▊ | 491/5773 [45:44<7:57:14, 5.42s/it] 9%|▊ | 491/5773 [45:38<7:57:14, 5.42s/it] {'loss': 0.6302, 'learning_rate': 1.984223086277476e-05, 'epoch': 0.09} 9%|▊ | 491/5773 [45:44<7:57:14, 5.42s/it] {'loss': 0.6302, 'learning_rate': 1.984223086277476e-05, 'epoch': 0.09} 9%|▊ | 491/5773 [45:38<7:57:14, 5.42s/it] 9%|▊ | 492/5773 [45:49<7:57:37, 5.43s/it] 9%|▊ | 492/5773 [45:44<7:57:38, 5.43s/it] {'loss': 0.6164, 'learning_rate': 1.9841236550281037e-05, 'epoch': 0.09} 9%|▊ | 492/5773 [45:49<7:57:37, 5.43s/it] {'loss': 0.6164, 'learning_rate': 1.9841236550281037e-05, 'epoch': 0.09} 9%|▊ | 492/5773 [45:44<7:57:38, 5.43s/it] 9%|▊ | 493/5773 [45:55<7:56:53, 5.42s/it] 9%|▊ | 493/5773 [45:49<7:56:53, 5.42s/it] {'loss': 0.6149, 'learning_rate': 1.9840239139451622e-05, 'epoch': 0.09} 9%|▊ | 493/5773 [45:55<7:56:53, 5.42s/it] {'loss': 0.6149, 'learning_rate': 1.9840239139451622e-05, 
'epoch': 0.09} 9%|▊ | 493/5773 [45:49<7:56:53, 5.42s/it] 9%|▊ | 494/5773 [46:00<7:55:01, 5.40s/it] 9%|▊ | 494/5773 [45:55<7:55:01, 5.40s/it] {'loss': 0.6118, 'learning_rate': 1.983923863060053e-05, 'epoch': 0.09} 9%|▊ | 494/5773 [46:00<7:55:01, 5.40s/it] {'loss': 0.6118, 'learning_rate': 1.983923863060053e-05, 'epoch': 0.09} 9%|▊ | 494/5773 [45:55<7:55:01, 5.40s/it] 9%|▊ | 495/5773 [46:06<7:53:55, 5.39s/it] 9%|▊ | 495/5773 [46:00<7:53:55, 5.39s/it] {'loss': 0.6115, 'learning_rate': 1.9838235024042757e-05, 'epoch': 0.09} 9%|▊ | 495/5773 [46:06<7:53:55, 5.39s/it] {'loss': 0.6115, 'learning_rate': 1.9838235024042757e-05, 'epoch': 0.09} 9%|▊ | 495/5773 [46:00<7:53:55, 5.39s/it] 9%|▊ | 496/5773 [46:11<7:52:29, 5.37s/it] 9%|▊ | 496/5773 [46:05<7:52:29, 5.37s/it] {'loss': 0.6039, 'learning_rate': 1.983722832009427e-05, 'epoch': 0.09} 9%|▊ | 496/5773 [46:11<7:52:29, 5.37s/it] {'loss': 0.6039, 'learning_rate': 1.983722832009427e-05, 'epoch': 0.09} 9%|▊ | 496/5773 [46:05<7:52:29, 5.37s/it] 9%|▊ | 497/5773 [46:16<7:50:26, 5.35s/it] 9%|▊ | 497/5773 [46:11<7:50:25, 5.35s/it] {'loss': 0.6038, 'learning_rate': 1.983621851907201e-05, 'epoch': 0.09} 9%|▊ | 497/5773 [46:16<7:50:26, 5.35s/it] {'loss': 0.6038, 'learning_rate': 1.983621851907201e-05, 'epoch': 0.09} 9%|▊ | 497/5773 [46:11<7:50:25, 5.35s/it] 9%|▊ | 498/5773 [46:22<7:51:09, 5.36s/it] 9%|▊ | 498/5773 [46:16<7:51:09, 5.36s/it] {'loss': 0.599, 'learning_rate': 1.9835205621293893e-05, 'epoch': 0.09} 9%|▊ | 498/5773 [46:22<7:51:09, 5.36s/it] {'loss': 0.599, 'learning_rate': 1.9835205621293893e-05, 'epoch': 0.09} 9%|▊ | 498/5773 [46:16<7:51:09, 5.36s/it] 9%|▊ | 499/5773 [46:27<7:53:41, 5.39s/it] 9%|▊ | 499/5773 [46:22<7:53:39, 5.39s/it] {'loss': 0.6251, 'learning_rate': 1.9834189627078817e-05, 'epoch': 0.09} 9%|▊ | 499/5773 [46:27<7:53:41, 5.39s/it] {'loss': 0.6251, 'learning_rate': 1.9834189627078817e-05, 'epoch': 0.09} 9%|▊ | 499/5773 [46:22<7:53:39, 5.39s/it]15 AutoResumeHook: Checking whether to suspend... 
8 AutoResumeHook: Checking whether to suspend...
46 5 AutoResumeHook: Checking whether to suspend...
12 AutoResumeHook: Checking whether to suspend...
9%|▊ | 500/5773 [46:32<7:50:44, 5.36s/it]
9 AutoResumeHook: Checking whether to suspend...
11 AutoResumeHook: Checking whether to suspend...
13 AutoResumeHook: Checking whether to suspend...
14 AutoResumeHook: Checking whether to suspend...
10 AutoResumeHook: Checking whether to suspend...
AutoResumeHook: Checking whether to suspend...
AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
10 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
AutoResumeHook: Checking whether to suspend...
9%|▊ | 500/5773 [46:27<7:50:45, 5.36s/it]
3 AutoResumeHook: Checking whether to suspend...
{'loss': 0.5853, 'learning_rate': 1.9833170536746645e-05, 'epoch': 0.09}
9%|▊ | 500/5773 [46:32<7:50:44, 5.36s/it]
{'loss': 0.5853, 'learning_rate': 1.9833170536746645e-05, 'epoch': 0.09}
9%|▊ | 500/5773 [46:27<7:50:45, 5.36s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 9%|▊ | 501/5773 [46:51<13:31:05, 9.23s/it] 9%|▊ | 501/5773 [46:45<13:31:05, 9.23s/it] {'loss': 0.6136, 'learning_rate': 1.983214835061822e-05, 'epoch': 0.09} 9%|▊ | 501/5773 [46:51<13:31:05, 9.23s/it] {'loss': 0.6136, 'learning_rate': 1.983214835061822e-05, 'epoch': 0.09} 9%|▊ | 501/5773 [46:45<13:31:05, 9.23s/it] 9%|▊ | 502/5773 [46:56<11:50:24, 8.09s/it] 9%|▊ | 502/5773 [46:50<11:50:24, 8.09s/it] {'loss': 0.5988, 'learning_rate': 1.983112306901536e-05, 'epoch': 0.09} 9%|▊ | 502/5773 [46:56<11:50:24, 8.09s/it] {'loss': 0.5988, 'learning_rate': 1.983112306901536e-05, 'epoch': 0.09} 9%|▊ | 502/5773 [46:50<11:50:24, 8.09s/it] 9%|▊ | 503/5773 [47:02<10:42:35, 7.32s/it] 9%|▊ | 503/5773 [46:56<10:42:34, 7.32s/it] {'loss': 0.6188, 'learning_rate': 1.9830094692260853e-05, 'epoch': 0.09} 9%|▊ | 503/5773 [47:02<10:42:35, 7.32s/it] {'loss': 0.6188, 'learning_rate': 1.9830094692260853e-05, 'epoch': 0.09} 9%|▊ | 503/5773 [46:56<10:42:34, 7.32s/it] 9%|▊ | 504/5773 [47:07<9:53:17, 6.76s/it] 9%|▊ | 504/5773 [47:01<9:53:17, 6.76s/it] {'loss': 0.6121, 'learning_rate': 1.982906322067847e-05, 'epoch': 0.09} 9%|▊ | 504/5773 [47:07<9:53:17, 6.76s/it] {'loss': 0.6121, 'learning_rate': 1.982906322067847e-05, 'epoch': 0.09} 9%|▊ | 504/5773 [47:01<9:53:17, 6.76s/it] 9%|▊ | 505/5773 [47:13<9:22:03, 6.40s/it] 9%|▊ | 505/5773 [47:07<9:22:04, 6.40s/it] {'loss': 0.6202, 'learning_rate': 1.9828028654592947e-05, 'epoch': 0.09} 9%|▊ | 505/5773 [47:13<9:22:03, 6.40s/it] {'loss': 0.6202, 'learning_rate': 1.9828028654592947e-05, 'epoch': 0.09} 9%|▊ | 505/5773 [47:07<9:22:04, 6.40s/it] 9%|▉ | 506/5773 [47:18<8:57:19, 6.12s/it] 9%|▉ | 506/5773 [47:12<8:57:19, 6.12s/it] {'loss': 0.6086, 'learning_rate': 1.9826990994330003e-05, 'epoch': 0.09} 9%|▉ | 506/5773 [47:18<8:57:19, 6.12s/it] {'loss': 0.6086, 'learning_rate': 1.9826990994330003e-05, 'epoch': 0.09} 9%|▉ | 506/5773 [47:12<8:57:19, 6.12s/it] 9%|▉ | 507/5773 [47:23<8:39:02, 5.91s/it] 9%|▉ | 507/5773 [47:18<8:39:01, 5.91s/it] {'loss': 
0.6041, 'learning_rate': 1.9825950240216324e-05, 'epoch': 0.09} 9%|▉ | 507/5773 [47:23<8:39:02, 5.91s/it] {'loss': 0.6041, 'learning_rate': 1.9825950240216324e-05, 'epoch': 0.09} 9%|▉ | 507/5773 [47:18<8:39:01, 5.91s/it] 9%|▉ | 508/5773 [47:29<8:31:06, 5.82s/it] 9%|▉ | 508/5773 [47:24<8:31:06, 5.82s/it] {'loss': 0.6091, 'learning_rate': 1.9824906392579568e-05, 'epoch': 0.09} 9%|▉ | 508/5773 [47:29<8:31:06, 5.82s/it] {'loss': 0.6091, 'learning_rate': 1.9824906392579568e-05, 'epoch': 0.09} 9%|▉ | 508/5773 [47:24<8:31:06, 5.82s/it] 9%|▉ | 509/5773 [47:35<8:22:46, 5.73s/it] 9%|▉ | 509/5773 [47:29<8:22:46, 5.73s/it] {'loss': 0.604, 'learning_rate': 1.982385945174838e-05, 'epoch': 0.09} 9%|▉ | 509/5773 [47:35<8:22:46, 5.73s/it] {'loss': 0.604, 'learning_rate': 1.982385945174838e-05, 'epoch': 0.09} 9%|▉ | 509/5773 [47:29<8:22:46, 5.73s/it] 9%|▉ | 510/5773 [47:40<8:20:50, 5.71s/it] 9%|▉ | 510/5773 [47:35<8:20:50, 5.71s/it] {'loss': 0.6076, 'learning_rate': 1.9822809418052363e-05, 'epoch': 0.09} 9%|▉ | 510/5773 [47:40<8:20:50, 5.71s/it] {'loss': 0.6076, 'learning_rate': 1.9822809418052363e-05, 'epoch': 0.09} 9%|▉ | 510/5773 [47:35<8:20:50, 5.71s/it] 9%|▉ | 511/5773 [47:46<8:13:32, 5.63s/it] 9%|▉ | 511/5773 [47:40<8:13:32, 5.63s/it] {'loss': 0.5983, 'learning_rate': 1.982175629182211e-05, 'epoch': 0.09} 9%|▉ | 511/5773 [47:46<8:13:32, 5.63s/it] {'loss': 0.5983, 'learning_rate': 1.982175629182211e-05, 'epoch': 0.09} 9%|▉ | 511/5773 [47:40<8:13:32, 5.63s/it] 9%|▉ | 512/5773 [47:51<8:07:08, 5.56s/it] 9%|▉ | 512/5773 [47:46<8:07:09, 5.56s/it] {'loss': 0.5774, 'learning_rate': 1.9820700073389165e-05, 'epoch': 0.09} 9%|▉ | 512/5773 [47:51<8:07:08, 5.56s/it] {'loss': 0.5774, 'learning_rate': 1.9820700073389165e-05, 'epoch': 0.09} 9%|▉ | 512/5773 [47:46<8:07:09, 5.56s/it] 9%|▉ | 513/5773 [47:56<8:03:42, 5.52s/it] 9%|▉ | 513/5773 [47:51<8:03:41, 5.52s/it] {'loss': 0.6021, 'learning_rate': 1.9819640763086075e-05, 'epoch': 0.09} 9%|▉ | 513/5773 [47:56<8:03:42, 5.52s/it] {'loss': 
0.6021, 'learning_rate': 1.9819640763086075e-05, 'epoch': 0.09} 9%|▉ | 513/5773 [47:51<8:03:41, 5.52s/it] 9%|▉ | 514/5773 [48:02<8:00:42, 5.48s/it] 9%|▉ | 514/5773 [47:56<8:00:43, 5.48s/it] {'loss': 0.6193, 'learning_rate': 1.9818578361246335e-05, 'epoch': 0.09} 9%|▉ | 514/5773 [48:02<8:00:42, 5.48s/it] {'loss': 0.6193, 'learning_rate': 1.9818578361246335e-05, 'epoch': 0.09} 9%|▉ | 514/5773 [47:56<8:00:43, 5.48s/it] 9%|▉ | 515/5773 [48:07<7:54:27, 5.41s/it] 9%|▉ | 515/5773 [48:02<7:54:26, 5.41s/it] {'loss': 0.6317, 'learning_rate': 1.9817512868204425e-05, 'epoch': 0.09} 9%|▉ | 515/5773 [48:07<7:54:27, 5.41s/it] {'loss': 0.6317, 'learning_rate': 1.9817512868204425e-05, 'epoch': 0.09} 9%|▉ | 515/5773 [48:02<7:54:26, 5.41s/it] 9%|▉ | 516/5773 [48:13<7:57:33, 5.45s/it] 9%|▉ | 516/5773 [48:07<7:57:34, 5.45s/it] {'loss': 0.6134, 'learning_rate': 1.9816444284295796e-05, 'epoch': 0.09} 9%|▉ | 516/5773 [48:13<7:57:33, 5.45s/it] {'loss': 0.6134, 'learning_rate': 1.9816444284295796e-05, 'epoch': 0.09} 9%|▉ | 516/5773 [48:07<7:57:34, 5.45s/it] 9%|▉ | 517/5773 [48:18<7:58:58, 5.47s/it] 9%|▉ | 517/5773 [48:13<7:58:58, 5.47s/it] {'loss': 0.5902, 'learning_rate': 1.9815372609856875e-05, 'epoch': 0.09} 9%|▉ | 517/5773 [48:18<7:58:58, 5.47s/it] {'loss': 0.5902, 'learning_rate': 1.9815372609856875e-05, 'epoch': 0.09} 9%|▉ | 517/5773 [48:13<7:58:58, 5.47s/it] 9%|▉ | 518/5773 [48:23<7:54:08, 5.41s/it] 9%|▉ | 518/5773 [48:18<7:54:08, 5.41s/it] {'loss': 0.6075, 'learning_rate': 1.9814297845225057e-05, 'epoch': 0.09} 9%|▉ | 518/5773 [48:23<7:54:08, 5.41s/it] {'loss': 0.6075, 'learning_rate': 1.9814297845225057e-05, 'epoch': 0.09} 9%|▉ | 518/5773 [48:18<7:54:08, 5.41s/it] 9%|▉ | 519/5773 [48:29<7:54:45, 5.42s/it] 9%|▉ | 519/5773 [48:23<7:54:45, 5.42s/it] {'loss': 0.6012, 'learning_rate': 1.981321999073871e-05, 'epoch': 0.09} 9%|▉ | 519/5773 [48:29<7:54:45, 5.42s/it] {'loss': 0.6012, 'learning_rate': 1.981321999073871e-05, 'epoch': 0.09} 9%|▉ | 519/5773 [48:23<7:54:45, 5.42s/it] 9%|▉ | 
520/5773 [48:34<7:55:00, 5.43s/it] 9%|▉ | 520/5773 [48:29<7:55:00, 5.43s/it] {'loss': 0.597, 'learning_rate': 1.9812139046737184e-05, 'epoch': 0.09} 9%|▉ | 520/5773 [48:34<7:55:00, 5.43s/it] {'loss': 0.597, 'learning_rate': 1.9812139046737184e-05, 'epoch': 0.09} 9%|▉ | 520/5773 [48:29<7:55:00, 5.43s/it] 9%|▉ | 521/5773 [48:34<7:54:37, 5.42s/it] 9%|▉ | 521/5773 [48:40<7:54:38, 5.42s/it] {'loss': 0.6076, 'learning_rate': 1.981105501356079e-05, 'epoch': 0.09} 9%|▉ | 521/5773 [48:40<7:54:38, 5.42s/it] {'loss': 0.6076, 'learning_rate': 1.981105501356079e-05, 'epoch': 0.09} 9%|▉ | 521/5773 [48:34<7:54:37, 5.42s/it] 9%|▉ | 522/5773 [48:45<7:56:15, 5.44s/it] 9%|▉ | 522/5773 [48:40<7:56:14, 5.44s/it] {'loss': 0.6027, 'learning_rate': 1.9809967891550812e-05, 'epoch': 0.09} 9%|▉ | 522/5773 [48:45<7:56:15, 5.44s/it] {'loss': 0.6027, 'learning_rate': 1.9809967891550812e-05, 'epoch': 0.09} 9%|▉ | 522/5773 [48:40<7:56:14, 5.44s/it] 9%|▉ | 523/5773 [48:51<7:57:09, 5.45s/it] 9%|▉ | 523/5773 [48:45<7:57:09, 5.45s/it] {'loss': 0.6027, 'learning_rate': 1.980887768104952e-05, 'epoch': 0.09} 9%|▉ | 523/5773 [48:51<7:57:09, 5.45s/it] {'loss': 0.6027, 'learning_rate': 1.980887768104952e-05, 'epoch': 0.09} 9%|▉ | 523/5773 [48:45<7:57:09, 5.45s/it] 9%|▉ | 524/5773 [48:56<7:55:48, 5.44s/it] 9%|▉ | 524/5773 [48:51<7:55:49, 5.44s/it] {'loss': 0.5832, 'learning_rate': 1.980778438240014e-05, 'epoch': 0.09} 9%|▉ | 524/5773 [48:56<7:55:48, 5.44s/it] {'loss': 0.5832, 'learning_rate': 1.980778438240014e-05, 'epoch': 0.09} 9%|▉ | 524/5773 [48:51<7:55:49, 5.44s/it] 9%|▉ | 525/5773 [49:01<7:52:12, 5.40s/it] 9%|▉ | 525/5773 [48:56<7:52:12, 5.40s/it] {'loss': 0.6114, 'learning_rate': 1.980668799594688e-05, 'epoch': 0.09} 9%|▉ | 525/5773 [49:01<7:52:12, 5.40s/it] {'loss': 0.6114, 'learning_rate': 1.980668799594688e-05, 'epoch': 0.09} 9%|▉ | 525/5773 [48:56<7:52:12, 5.40s/it] 9%|▉ | 526/5773 [49:07<7:50:38, 5.38s/it] 9%|▉ | 526/5773 [49:01<7:50:38, 5.38s/it] {'loss': 0.6256, 'learning_rate': 
1.980558852203492e-05, 'epoch': 0.09} 9%|▉ | 526/5773 [49:07<7:50:38, 5.38s/it] {'loss': 0.6256, 'learning_rate': 1.980558852203492e-05, 'epoch': 0.09} 9%|▉ | 526/5773 [49:01<7:50:38, 5.38s/it] 9%|▉ | 527/5773 [49:12<7:50:50, 5.39s/it] 9%|▉ | 527/5773 [49:07<7:50:50, 5.39s/it] {'loss': 0.6069, 'learning_rate': 1.98044859610104e-05, 'epoch': 0.09} 9%|▉ | 527/5773 [49:12<7:50:50, 5.39s/it] {'loss': 0.6069, 'learning_rate': 1.98044859610104e-05, 'epoch': 0.09} 9%|▉ | 527/5773 [49:07<7:50:50, 5.39s/it] 9%|▉ | 528/5773 [49:18<7:52:48, 5.41s/it] 9%|▉ | 528/5773 [49:12<7:52:48, 5.41s/it] {'loss': 0.6081, 'learning_rate': 1.9803380313220457e-05, 'epoch': 0.09} 9%|▉ | 528/5773 [49:18<7:52:48, 5.41s/it] {'loss': 0.6081, 'learning_rate': 1.9803380313220457e-05, 'epoch': 0.09} 9%|▉ | 528/5773 [49:12<7:52:48, 5.41s/it] 9%|▉ | 529/5773 [49:23<7:48:38, 5.36s/it] 9%|▉ | 529/5773 [49:17<7:48:38, 5.36s/it] {'loss': 0.6018, 'learning_rate': 1.9802271579013172e-05, 'epoch': 0.09} 9%|▉ | 529/5773 [49:23<7:48:38, 5.36s/it] {'loss': 0.6018, 'learning_rate': 1.9802271579013172e-05, 'epoch': 0.09} 9%|▉ | 529/5773 [49:17<7:48:38, 5.36s/it] 9%|▉ | 530/5773 [49:28<7:48:30, 5.36s/it] 9%|▉ | 530/5773 [49:23<7:48:30, 5.36s/it] {'loss': 0.5994, 'learning_rate': 1.980115975873761e-05, 'epoch': 0.09} 9%|▉ | 530/5773 [49:28<7:48:30, 5.36s/it] {'loss': 0.5994, 'learning_rate': 1.980115975873761e-05, 'epoch': 0.09} 9%|▉ | 530/5773 [49:23<7:48:30, 5.36s/it] 9%|▉ | 531/5773 [49:34<7:48:20, 5.36s/it] 9%|▉ | 531/5773 [49:28<7:48:19, 5.36s/it] {'loss': 0.6088, 'learning_rate': 1.9800044852743817e-05, 'epoch': 0.09} 9%|▉ | 531/5773 [49:34<7:48:20, 5.36s/it] {'loss': 0.6088, 'learning_rate': 1.9800044852743817e-05, 'epoch': 0.09} 9%|▉ | 531/5773 [49:28<7:48:19, 5.36s/it] 9%|▉ | 532/5773 [49:39<7:48:41, 5.37s/it] 9%|▉ | 532/5773 [49:33<7:48:41, 5.37s/it] {'loss': 0.6008, 'learning_rate': 1.9798926861382792e-05, 'epoch': 0.09} 9%|▉ | 532/5773 [49:39<7:48:41, 5.37s/it] {'loss': 0.6008, 'learning_rate': 
1.9798926861382792e-05, 'epoch': 0.09} 9%|▉ | 532/5773 [49:33<7:48:41, 5.37s/it] 9%|▉ | 533/5773 [49:44<7:46:06, 5.34s/it] 9%|▉ | 533/5773 [49:39<7:46:07, 5.34s/it] {'loss': 0.5962, 'learning_rate': 1.9797805785006516e-05, 'epoch': 0.09} 9%|▉ | 533/5773 [49:44<7:46:06, 5.34s/it] {'loss': 0.5962, 'learning_rate': 1.9797805785006516e-05, 'epoch': 0.09} 9%|▉ | 533/5773 [49:39<7:46:07, 5.34s/it] 9%|▉ | 534/5773 [49:50<7:49:32, 5.38s/it] 9%|▉ | 534/5773 [49:44<7:49:32, 5.38s/it] {'loss': 0.6001, 'learning_rate': 1.9796681623967943e-05, 'epoch': 0.09} 9%|▉ | 534/5773 [49:50<7:49:32, 5.38s/it] {'loss': 0.6001, 'learning_rate': 1.9796681623967943e-05, 'epoch': 0.09} 9%|▉ | 534/5773 [49:44<7:49:32, 5.38s/it] 9%|▉ | 535/5773 [49:55<7:53:14, 5.42s/it] 9%|▉ | 535/5773 [49:50<7:53:14, 5.42s/it] {'loss': 0.6219, 'learning_rate': 1.9795554378620995e-05, 'epoch': 0.09} 9%|▉ | 535/5773 [49:55<7:53:14, 5.42s/it] {'loss': 0.6219, 'learning_rate': 1.9795554378620995e-05, 'epoch': 0.09} 9%|▉ | 535/5773 [49:50<7:53:14, 5.42s/it] 9%|▉ | 536/5773 [50:01<7:51:44, 5.40s/it] 9%|▉ | 536/5773 [49:55<7:51:44, 5.40s/it] {'loss': 0.6074, 'learning_rate': 1.9794424049320557e-05, 'epoch': 0.09} 9%|▉ | 536/5773 [50:01<7:51:44, 5.40s/it] {'loss': 0.6074, 'learning_rate': 1.9794424049320557e-05, 'epoch': 0.09} 9%|▉ | 536/5773 [49:55<7:51:44, 5.40s/it] 9%|▉ | 537/5773 [50:06<7:51:17, 5.40s/it] 9%|▉ | 537/5773 [50:00<7:51:15, 5.40s/it] {'loss': 0.6006, 'learning_rate': 1.9793290636422503e-05, 'epoch': 0.09} 9%|▉ | 537/5773 [50:06<7:51:17, 5.40s/it] {'loss': 0.6006, 'learning_rate': 1.9793290636422503e-05, 'epoch': 0.09} 9%|▉ | 537/5773 [50:00<7:51:15, 5.40s/it] 9%|▉ | 538/5773 [50:11<7:47:43, 5.36s/it] 9%|▉ | 538/5773 [50:06<7:47:43, 5.36s/it] {'loss': 0.6061, 'learning_rate': 1.9792154140283663e-05, 'epoch': 0.09} 9%|▉ | 538/5773 [50:11<7:47:43, 5.36s/it] {'loss': 0.6061, 'learning_rate': 1.9792154140283663e-05, 'epoch': 0.09} 9%|▉ | 538/5773 [50:06<7:47:43, 5.36s/it] 9%|▉ | 539/5773 [50:17<7:50:04, 
5.39s/it] 9%|▉ | 539/5773 [50:11<7:50:03, 5.39s/it] {'loss': 0.624, 'learning_rate': 1.979101456126184e-05, 'epoch': 0.09} 9%|▉ | 539/5773 [50:17<7:50:04, 5.39s/it] {'loss': 0.624, 'learning_rate': 1.979101456126184e-05, 'epoch': 0.09} 9%|▉ | 539/5773 [50:11<7:50:03, 5.39s/it] 9%|▉ | 540/5773 [50:22<7:51:38, 5.41s/it] 9%|▉ | 540/5773 [50:17<7:51:38, 5.41s/it] {'loss': 0.6041, 'learning_rate': 1.9789871899715817e-05, 'epoch': 0.09} 9%|▉ | 540/5773 [50:22<7:51:38, 5.41s/it] {'loss': 0.6041, 'learning_rate': 1.9789871899715817e-05, 'epoch': 0.09} 9%|▉ | 540/5773 [50:17<7:51:38, 5.41s/it] 9%|▉ | 541/5773 [50:28<7:51:23, 5.41s/it] 9%|▉ | 541/5773 [50:22<7:51:23, 5.41s/it] {'loss': 0.5929, 'learning_rate': 1.9788726156005332e-05, 'epoch': 0.09} 9%|▉ | 541/5773 [50:28<7:51:23, 5.41s/it] {'loss': 0.5929, 'learning_rate': 1.9788726156005332e-05, 'epoch': 0.09} 9%|▉ | 541/5773 [50:22<7:51:23, 5.41s/it] 9%|▉ | 542/5773 [50:33<7:52:09, 5.42s/it] 9%|▉ | 542/5773 [50:27<7:52:09, 5.42s/it] {'loss': 0.5962, 'learning_rate': 1.9787577330491104e-05, 'epoch': 0.09} 9%|▉ | 542/5773 [50:33<7:52:09, 5.42s/it] {'loss': 0.5962, 'learning_rate': 1.9787577330491104e-05, 'epoch': 0.09} 9%|▉ | 542/5773 [50:27<7:52:09, 5.42s/it] 9%|▉ | 543/5773 [50:39<7:57:03, 5.47s/it] 9%|▉ | 543/5773 [50:33<7:57:03, 5.47s/it] {'loss': 0.616, 'learning_rate': 1.9786425423534826e-05, 'epoch': 0.09} 9%|▉ | 543/5773 [50:39<7:57:03, 5.47s/it] {'loss': 0.616, 'learning_rate': 1.9786425423534826e-05, 'epoch': 0.09} 9%|▉ | 543/5773 [50:33<7:57:03, 5.47s/it] 9%|▉ | 544/5773 [50:44<7:54:38, 5.45s/it] 9%|▉ | 544/5773 [50:38<7:54:37, 5.45s/it] {'loss': 0.6177, 'learning_rate': 1.9785270435499147e-05, 'epoch': 0.09} 9%|▉ | 544/5773 [50:44<7:54:38, 5.45s/it] {'loss': 0.6177, 'learning_rate': 1.9785270435499147e-05, 'epoch': 0.09} 9%|▉ | 544/5773 [50:38<7:54:37, 5.45s/it] 9%|▉ | 545/5773 [50:50<7:57:50, 5.48s/it] 9%|▉ | 545/5773 [50:44<7:57:49, 5.48s/it] {'loss': 0.5935, 'learning_rate': 1.97841123667477e-05, 'epoch': 
0.09} 9%|▉ | 545/5773 [50:50<7:57:50, 5.48s/it] {'loss': 0.5935, 'learning_rate': 1.97841123667477e-05, 'epoch': 0.09} 9%|▉ | 545/5773 [50:44<7:57:49, 5.48s/it] 9%|▉ | 546/5773 [50:55<7:53:19, 5.43s/it] 9%|▉ | 546/5773 [50:49<7:53:19, 5.43s/it] {'loss': 0.5896, 'learning_rate': 1.9782951217645078e-05, 'epoch': 0.09} 9%|▉ | 546/5773 [50:55<7:53:19, 5.43s/it] {'loss': 0.5896, 'learning_rate': 1.9782951217645078e-05, 'epoch': 0.09} 9%|▉ | 546/5773 [50:49<7:53:19, 5.43s/it] 9%|▉ | 547/5773 [51:00<7:53:50, 5.44s/it] 9%|▉ | 547/5773 [50:55<7:53:49, 5.44s/it] {'loss': 0.5974, 'learning_rate': 1.978178698855685e-05, 'epoch': 0.09} 9%|▉ | 547/5773 [51:00<7:53:50, 5.44s/it] {'loss': 0.5974, 'learning_rate': 1.978178698855685e-05, 'epoch': 0.09} 9%|▉ | 547/5773 [50:55<7:53:49, 5.44s/it] 9%|▉ | 548/5773 [51:06<7:52:07, 5.42s/it] 9%|▉ | 548/5773 [51:00<7:52:08, 5.42s/it] {'loss': 0.6032, 'learning_rate': 1.9780619679849552e-05, 'epoch': 0.09} 9%|▉ | 548/5773 [51:06<7:52:07, 5.42s/it] {'loss': 0.6032, 'learning_rate': 1.9780619679849552e-05, 'epoch': 0.09} 9%|▉ | 548/5773 [51:00<7:52:08, 5.42s/it] 10%|▉ | 549/5773 [51:11<7:52:14, 5.42s/it] 10%|▉ | 549/5773 [51:06<7:52:14, 5.42s/it] {'loss': 0.6117, 'learning_rate': 1.9779449291890687e-05, 'epoch': 0.1} 10%|▉ | 549/5773 [51:11<7:52:14, 5.42s/it] {'loss': 0.6117, 'learning_rate': 1.9779449291890687e-05, 'epoch': 0.1} 10%|▉ | 549/5773 [51:06<7:52:14, 5.42s/it]8 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 10%|▉ | 550/5773 [51:17<7:50:35, 5.41s/it]13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 42 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 510 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...14 AutoResumeHook: Checking whether to suspend... 
6 AutoResumeHook: Checking whether to suspend...3 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 10%|▉ | 550/5773 [51:11<7:50:35, 5.41s/it]15 AutoResumeHook: Checking whether to suspend... {'loss': 0.598, 'learning_rate': 1.9778275825048737e-05, 'epoch': 0.1} 10%|▉ | 550/5773 [51:17<7:50:35, 5.41s/it] {'loss': 0.598, 'learning_rate': 1.9778275825048737e-05, 'epoch': 0.1} 10%|▉ | 550/5773 [51:11<7:50:35, 5.41s/it] 10%|▉ | 551/5773 [51:22<7:54:27, 5.45s/it] 10%|▉ | 551/5773 [51:17<7:54:27, 5.45s/it] {'loss': 0.6053, 'learning_rate': 1.9777099279693147e-05, 'epoch': 0.1} 10%|▉ | 551/5773 [51:22<7:54:27, 5.45s/it] {'loss': 0.6053, 'learning_rate': 1.9777099279693147e-05, 'epoch': 0.1} 10%|▉ | 551/5773 [51:17<7:54:27, 5.45s/it] 10%|▉ | 552/5773 [51:27<7:52:41, 5.43s/it] 10%|▉ | 552/5773 [51:22<7:52:41, 5.43s/it] {'loss': 0.6015, 'learning_rate': 1.9775919656194324e-05, 'epoch': 0.1} 10%|▉ | 552/5773 [51:27<7:52:41, 5.43s/it] {'loss': 0.6015, 'learning_rate': 1.9775919656194324e-05, 'epoch': 0.1} 10%|▉ | 552/5773 [51:22<7:52:41, 5.43s/it] 10%|▉ | 553/5773 [51:33<7:59:18, 5.51s/it] 10%|▉ | 553/5773 [51:28<7:59:19, 5.51s/it] {'loss': 0.6051, 'learning_rate': 1.9774736954923654e-05, 'epoch': 0.1} 10%|▉ | 553/5773 [51:33<7:59:18, 5.51s/it] {'loss': 0.6051, 'learning_rate': 1.9774736954923654e-05, 'epoch': 0.1} 10%|▉ | 553/5773 [51:28<7:59:19, 5.51s/it] 10%|▉ | 554/5773 [51:39<7:57:17, 5.49s/it] 10%|▉ | 554/5773 [51:33<7:57:17, 5.49s/it] {'loss': 0.6125, 'learning_rate': 1.9773551176253488e-05, 'epoch': 0.1} 10%|▉ | 554/5773 [51:39<7:57:17, 5.49s/it] {'loss': 0.6125, 'learning_rate': 1.9773551176253488e-05, 'epoch': 0.1} 10%|▉ | 554/5773 [51:33<7:57:17, 5.49s/it] 10%|▉ | 555/5773 [51:44<7:59:17, 5.51s/it] 10%|▉ | 555/5773 [51:39<7:59:17, 5.51s/it] {'loss': 0.6097, 'learning_rate': 1.9772362320557152e-05, 'epoch': 0.1} 10%|▉ | 555/5773 [51:44<7:59:17, 5.51s/it] {'loss': 0.6097, 
'learning_rate': 1.9772362320557152e-05, 'epoch': 0.1} 10%|▉ | 555/5773 [51:39<7:59:17, 5.51s/it] 10%|▉ | 556/5773 [51:50<7:54:49, 5.46s/it] 10%|▉ | 556/5773 [51:44<7:54:49, 5.46s/it] {'loss': 0.5941, 'learning_rate': 1.9771170388208935e-05, 'epoch': 0.1} 10%|▉ | 556/5773 [51:50<7:54:49, 5.46s/it] {'loss': 0.5941, 'learning_rate': 1.9771170388208935e-05, 'epoch': 0.1} 10%|▉ | 556/5773 [51:44<7:54:49, 5.46s/it] 10%|▉ | 557/5773 [51:55<7:51:09, 5.42s/it] 10%|▉ | 557/5773 [51:49<7:51:09, 5.42s/it] {'loss': 0.5953, 'learning_rate': 1.9769975379584087e-05, 'epoch': 0.1} 10%|▉ | 557/5773 [51:55<7:51:09, 5.42s/it] {'loss': 0.5953, 'learning_rate': 1.9769975379584087e-05, 'epoch': 0.1} 10%|▉ | 557/5773 [51:49<7:51:09, 5.42s/it] 10%|▉ | 558/5773 [52:00<7:54:22, 5.46s/it] 10%|▉ | 558/5773 [51:55<7:54:22, 5.46s/it] {'loss': 0.6236, 'learning_rate': 1.9768777295058845e-05, 'epoch': 0.1} 10%|▉ | 558/5773 [52:00<7:54:22, 5.46s/it] {'loss': 0.6236, 'learning_rate': 1.9768777295058845e-05, 'epoch': 0.1} 10%|▉ | 558/5773 [51:55<7:54:22, 5.46s/it] 10%|▉ | 559/5773 [52:06<7:56:25, 5.48s/it] 10%|▉ | 559/5773 [52:00<7:56:25, 5.48s/it] {'loss': 0.6261, 'learning_rate': 1.97675761350104e-05, 'epoch': 0.1} 10%|▉ | 559/5773 [52:06<7:56:25, 5.48s/it] {'loss': 0.6261, 'learning_rate': 1.97675761350104e-05, 'epoch': 0.1} 10%|▉ | 559/5773 [52:00<7:56:25, 5.48s/it] 10%|▉ | 560/5773 [52:11<7:57:15, 5.49s/it] 10%|▉ | 560/5773 [52:06<7:57:14, 5.49s/it] {'loss': 0.6027, 'learning_rate': 1.9766371899816915e-05, 'epoch': 0.1} 10%|▉ | 560/5773 [52:11<7:57:15, 5.49s/it] {'loss': 0.6027, 'learning_rate': 1.9766371899816915e-05, 'epoch': 0.1} 10%|▉ | 560/5773 [52:06<7:57:14, 5.49s/it] 10%|▉ | 561/5773 [52:17<7:55:48, 5.48s/it] 10%|▉ | 561/5773 [52:11<7:55:46, 5.48s/it] {'loss': 0.5988, 'learning_rate': 1.9765164589857522e-05, 'epoch': 0.1} 10%|▉ | 561/5773 [52:17<7:55:48, 5.48s/it] {'loss': 0.5988, 'learning_rate': 1.9765164589857522e-05, 'epoch': 0.1} 10%|▉ | 561/5773 [52:11<7:55:46, 5.48s/it] 10%|▉ | 
562/5773 [52:22<7:53:20, 5.45s/it] 10%|▉ | 562/5773 [52:17<7:53:20, 5.45s/it] {'loss': 0.5868, 'learning_rate': 1.976395420551232e-05, 'epoch': 0.1} 10%|▉ | 562/5773 [52:22<7:53:20, 5.45s/it] {'loss': 0.5868, 'learning_rate': 1.976395420551232e-05, 'epoch': 0.1} 10%|▉ | 562/5773 [52:17<7:53:20, 5.45s/it] 10%|▉ | 563/5773 [52:28<7:53:43, 5.46s/it] 10%|▉ | 563/5773 [52:22<7:53:42, 5.46s/it] {'loss': 0.5934, 'learning_rate': 1.976274074716238e-05, 'epoch': 0.1} 10%|▉ | 563/5773 [52:28<7:53:43, 5.46s/it] {'loss': 0.5934, 'learning_rate': 1.976274074716238e-05, 'epoch': 0.1} 10%|▉ | 563/5773 [52:22<7:53:42, 5.46s/it] 10%|▉ | 564/5773 [52:33<7:53:59, 5.46s/it] 10%|▉ | 564/5773 [52:28<7:53:59, 5.46s/it] {'loss': 0.6056, 'learning_rate': 1.9761524215189736e-05, 'epoch': 0.1} 10%|▉ | 564/5773 [52:33<7:53:59, 5.46s/it] {'loss': 0.6056, 'learning_rate': 1.9761524215189736e-05, 'epoch': 0.1} 10%|▉ | 564/5773 [52:28<7:53:59, 5.46s/it] 10%|▉ | 565/5773 [52:39<7:55:47, 5.48s/it] 10%|▉ | 565/5773 [52:33<7:55:47, 5.48s/it] {'loss': 0.5929, 'learning_rate': 1.976030460997739e-05, 'epoch': 0.1} 10%|▉ | 565/5773 [52:39<7:55:47, 5.48s/it] {'loss': 0.5929, 'learning_rate': 1.976030460997739e-05, 'epoch': 0.1} 10%|▉ | 565/5773 [52:33<7:55:47, 5.48s/it] 10%|▉ | 566/5773 [52:44<7:46:27, 5.37s/it] 10%|▉ | 566/5773 [52:38<7:46:27, 5.38s/it] {'loss': 0.5968, 'learning_rate': 1.975908193190931e-05, 'epoch': 0.1} 10%|▉ | 566/5773 [52:44<7:46:27, 5.37s/it] {'loss': 0.5968, 'learning_rate': 1.975908193190931e-05, 'epoch': 0.1} 10%|▉ | 566/5773 [52:38<7:46:27, 5.38s/it] 10%|▉ | 567/5773 [52:49<7:51:59, 5.44s/it] 10%|▉ | 567/5773 [52:44<7:51:58, 5.44s/it] {'loss': 0.6033, 'learning_rate': 1.975785618137044e-05, 'epoch': 0.1} 10%|▉ | 567/5773 [52:49<7:51:59, 5.44s/it] {'loss': 0.6033, 'learning_rate': 1.975785618137044e-05, 'epoch': 0.1} 10%|▉ | 567/5773 [52:44<7:51:58, 5.44s/it] 10%|▉ | 568/5773 [52:55<7:47:05, 5.38s/it] 10%|▉ | 568/5773 [52:49<7:47:04, 5.38s/it] {'loss': 0.6032, 'learning_rate': 
1.9756627358746683e-05, 'epoch': 0.1} 10%|▉ | 568/5773 [52:55<7:47:05, 5.38s/it] {'loss': 0.6032, 'learning_rate': 1.9756627358746683e-05, 'epoch': 0.1} 10%|▉ | 568/5773 [52:49<7:47:04, 5.38s/it] 10%|▉ | 569/5773 [53:00<7:53:34, 5.46s/it] 10%|▉ | 569/5773 [52:55<7:53:33, 5.46s/it] {'loss': 0.5939, 'learning_rate': 1.9755395464424913e-05, 'epoch': 0.1} 10%|▉ | 569/5773 [53:00<7:53:34, 5.46s/it] {'loss': 0.5939, 'learning_rate': 1.9755395464424913e-05, 'epoch': 0.1} 10%|▉ | 569/5773 [52:55<7:53:33, 5.46s/it] 10%|▉ | 570/5773 [53:06<7:54:55, 5.48s/it] 10%|▉ | 570/5773 [53:00<7:54:55, 5.48s/it] {'loss': 0.5951, 'learning_rate': 1.9754160498792964e-05, 'epoch': 0.1} 10%|▉ | 570/5773 [53:06<7:54:55, 5.48s/it] {'loss': 0.5951, 'learning_rate': 1.9754160498792964e-05, 'epoch': 0.1} 10%|▉ | 570/5773 [53:00<7:54:55, 5.48s/it] 10%|▉ | 571/5773 [53:11<7:54:48, 5.48s/it] 10%|▉ | 571/5773 [53:06<7:54:48, 5.48s/it] {'loss': 0.5868, 'learning_rate': 1.975292246223965e-05, 'epoch': 0.1} 10%|▉ | 571/5773 [53:11<7:54:48, 5.48s/it] {'loss': 0.5868, 'learning_rate': 1.975292246223965e-05, 'epoch': 0.1} 10%|▉ | 571/5773 [53:06<7:54:48, 5.48s/it] 10%|▉ | 572/5773 [53:17<7:57:54, 5.51s/it] 10%|▉ | 572/5773 [53:11<7:57:55, 5.51s/it] {'loss': 0.608, 'learning_rate': 1.975168135515474e-05, 'epoch': 0.1} 10%|▉ | 572/5773 [53:17<7:57:54, 5.51s/it] {'loss': 0.608, 'learning_rate': 1.975168135515474e-05, 'epoch': 0.1} 10%|▉ | 572/5773 [53:11<7:57:55, 5.51s/it] 10%|▉ | 573/5773 [53:22<7:52:57, 5.46s/it] 10%|▉ | 573/5773 [53:17<7:52:57, 5.46s/it] {'loss': 0.6115, 'learning_rate': 1.9750437177928975e-05, 'epoch': 0.1} 10%|▉ | 573/5773 [53:22<7:52:57, 5.46s/it] {'loss': 0.6115, 'learning_rate': 1.9750437177928975e-05, 'epoch': 0.1} 10%|▉ | 573/5773 [53:17<7:52:57, 5.46s/it] 10%|▉ | 574/5773 [53:28<7:51:50, 5.45s/it] 10%|▉ | 574/5773 [53:22<7:51:50, 5.45s/it] {'loss': 0.6017, 'learning_rate': 1.9749189930954062e-05, 'epoch': 0.1} 10%|▉ | 574/5773 [53:28<7:51:50, 5.45s/it] {'loss': 0.6017, 
'learning_rate': 1.9749189930954062e-05, 'epoch': 0.1} 10%|▉ | 574/5773 [53:22<7:51:50, 5.45s/it] 10%|▉ | 575/5773 [53:33<7:48:56, 5.41s/it] 10%|▉ | 575/5773 [53:27<7:48:56, 5.41s/it] {'loss': 0.6064, 'learning_rate': 1.9747939614622673e-05, 'epoch': 0.1} 10%|▉ | 575/5773 [53:33<7:48:56, 5.41s/it] {'loss': 0.6064, 'learning_rate': 1.9747939614622673e-05, 'epoch': 0.1} 10%|▉ | 575/5773 [53:27<7:48:56, 5.41s/it] 10%|▉ | 576/5773 [53:38<7:49:30, 5.42s/it] 10%|▉ | 576/5773 [53:33<7:49:30, 5.42s/it] {'loss': 0.6007, 'learning_rate': 1.974668622932845e-05, 'epoch': 0.1} 10%|▉ | 576/5773 [53:38<7:49:30, 5.42s/it] {'loss': 0.6007, 'learning_rate': 1.974668622932845e-05, 'epoch': 0.1} 10%|▉ | 576/5773 [53:33<7:49:30, 5.42s/it] 10%|▉ | 577/5773 [53:44<7:49:23, 5.42s/it] 10%|▉ | 577/5773 [53:38<7:49:22, 5.42s/it] {'loss': 0.5994, 'learning_rate': 1.9745429775465997e-05, 'epoch': 0.1} 10%|▉ | 577/5773 [53:44<7:49:23, 5.42s/it] {'loss': 0.5994, 'learning_rate': 1.9745429775465997e-05, 'epoch': 0.1} 10%|▉ | 577/5773 [53:38<7:49:22, 5.42s/it] 10%|█ | 578/5773 [53:49<7:50:23, 5.43s/it] 10%|█ | 578/5773 [53:44<7:50:23, 5.43s/it] {'loss': 0.6094, 'learning_rate': 1.974417025343089e-05, 'epoch': 0.1} 10%|█ | 578/5773 [53:49<7:50:23, 5.43s/it] {'loss': 0.6094, 'learning_rate': 1.974417025343089e-05, 'epoch': 0.1} 10%|█ | 578/5773 [53:44<7:50:23, 5.43s/it] 10%|█ | 579/5773 [53:55<7:47:21, 5.40s/it] 10%|█ | 579/5773 [53:49<7:47:22, 5.40s/it] {'loss': 0.6046, 'learning_rate': 1.9742907663619658e-05, 'epoch': 0.1} 10%|█ | 579/5773 [53:55<7:47:21, 5.40s/it] {'loss': 0.6046, 'learning_rate': 1.9742907663619658e-05, 'epoch': 0.1} 10%|█ | 579/5773 [53:49<7:47:22, 5.40s/it] 10%|█ | 580/5773 [54:00<7:45:56, 5.38s/it] 10%|█ | 580/5773 [53:54<7:45:55, 5.38s/it] {'loss': 0.6151, 'learning_rate': 1.9741642006429813e-05, 'epoch': 0.1} 10%|█ | 580/5773 [54:00<7:45:56, 5.38s/it] {'loss': 0.6151, 'learning_rate': 1.9741642006429813e-05, 'epoch': 0.1} 10%|█ | 580/5773 [53:54<7:45:55, 5.38s/it] 10%|█ | 
581/5773 [54:05<7:48:40, 5.42s/it] 10%|█ | 581/5773 [54:00<7:48:40, 5.42s/it] {'loss': 0.592, 'learning_rate': 1.974037328225982e-05, 'epoch': 0.1} 10%|█ | 581/5773 [54:05<7:48:40, 5.42s/it] {'loss': 0.592, 'learning_rate': 1.974037328225982e-05, 'epoch': 0.1} 10%|█ | 581/5773 [54:00<7:48:40, 5.42s/it] 10%|█ | 582/5773 [54:11<7:47:36, 5.40s/it] 10%|█ | 582/5773 [54:05<7:47:36, 5.40s/it] {'loss': 0.6019, 'learning_rate': 1.9739101491509114e-05, 'epoch': 0.1} 10%|█ | 582/5773 [54:11<7:47:36, 5.40s/it] {'loss': 0.6019, 'learning_rate': 1.9739101491509114e-05, 'epoch': 0.1} 10%|█ | 582/5773 [54:05<7:47:36, 5.40s/it] 10%|█ | 583/5773 [54:16<7:45:21, 5.38s/it] 10%|█ | 583/5773 [54:11<7:45:21, 5.38s/it] {'loss': 0.5802, 'learning_rate': 1.9737826634578097e-05, 'epoch': 0.1} 10%|█ | 583/5773 [54:16<7:45:21, 5.38s/it] {'loss': 0.5802, 'learning_rate': 1.9737826634578097e-05, 'epoch': 0.1} 10%|█ | 583/5773 [54:11<7:45:21, 5.38s/it] 10%|█ | 584/5773 [54:22<7:48:40, 5.42s/it] 10%|█ | 584/5773 [54:16<7:48:41, 5.42s/it] {'loss': 0.5994, 'learning_rate': 1.9736548711868137e-05, 'epoch': 0.1} 10%|█ | 584/5773 [54:22<7:48:40, 5.42s/it] {'loss': 0.5994, 'learning_rate': 1.9736548711868137e-05, 'epoch': 0.1} 10%|█ | 584/5773 [54:16<7:48:41, 5.42s/it] 10%|█ | 585/5773 [54:27<7:50:45, 5.44s/it] 10%|█ | 585/5773 [54:22<7:50:45, 5.44s/it] {'loss': 0.6105, 'learning_rate': 1.9735267723781556e-05, 'epoch': 0.1} 10%|█ | 585/5773 [54:27<7:50:45, 5.44s/it] {'loss': 0.6105, 'learning_rate': 1.9735267723781556e-05, 'epoch': 0.1} 10%|█ | 585/5773 [54:22<7:50:45, 5.44s/it] 10%|█ | 586/5773 [54:33<7:52:08, 5.46s/it] 10%|█ | 586/5773 [54:27<7:52:08, 5.46s/it] {'loss': 0.6154, 'learning_rate': 1.9733983670721658e-05, 'epoch': 0.1} 10%|█ | 586/5773 [54:33<7:52:08, 5.46s/it] {'loss': 0.6154, 'learning_rate': 1.9733983670721658e-05, 'epoch': 0.1} 10%|█ | 586/5773 [54:27<7:52:08, 5.46s/it] 10%|█ | 587/5773 [54:38<7:54:21, 5.49s/it] 10%|█ | 587/5773 [54:33<7:54:21, 5.49s/it] {'loss': 0.6158, 
'learning_rate': 1.97326965530927e-05, 'epoch': 0.1} 10%|█ | 587/5773 [54:38<7:54:21, 5.49s/it] {'loss': 0.6158, 'learning_rate': 1.97326965530927e-05, 'epoch': 0.1} 10%|█ | 587/5773 [54:33<7:54:21, 5.49s/it] 10%|█ | 588/5773 [54:44<7:52:00, 5.46s/it] 10%|█ | 588/5773 [54:38<7:52:01, 5.46s/it] {'loss': 0.6222, 'learning_rate': 1.9731406371299917e-05, 'epoch': 0.1} 10%|█ | 588/5773 [54:44<7:52:00, 5.46s/it] {'loss': 0.6222, 'learning_rate': 1.9731406371299917e-05, 'epoch': 0.1} 10%|█ | 588/5773 [54:38<7:52:01, 5.46s/it] 10%|█ | 589/5773 [54:49<7:51:05, 5.45s/it] 10%|█ | 589/5773 [54:44<7:51:06, 5.45s/it] {'loss': 0.6039, 'learning_rate': 1.973011312574949e-05, 'epoch': 0.1} 10%|█ | 589/5773 [54:49<7:51:05, 5.45s/it] {'loss': 0.6039, 'learning_rate': 1.973011312574949e-05, 'epoch': 0.1} 10%|█ | 589/5773 [54:44<7:51:06, 5.45s/it] 10%|█ | 590/5773 [54:55<7:52:05, 5.47s/it] 10%|█ | 590/5773 [54:49<7:52:05, 5.47s/it] {'loss': 0.6224, 'learning_rate': 1.972881681684857e-05, 'epoch': 0.1} 10%|█ | 590/5773 [54:55<7:52:05, 5.47s/it] {'loss': 0.6224, 'learning_rate': 1.972881681684857e-05, 'epoch': 0.1} 10%|█ | 590/5773 [54:49<7:52:05, 5.47s/it] 10%|█ | 591/5773 [55:00<7:52:09, 5.47s/it] 10%|█ | 591/5773 [54:55<7:52:09, 5.47s/it] {'loss': 0.6005, 'learning_rate': 1.9727517445005286e-05, 'epoch': 0.1} 10%|█ | 591/5773 [55:00<7:52:09, 5.47s/it] {'loss': 0.6005, 'learning_rate': 1.9727517445005286e-05, 'epoch': 0.1} 10%|█ | 591/5773 [54:55<7:52:09, 5.47s/it] 10%|█ | 592/5773 [55:06<7:51:43, 5.46s/it] 10%|█ | 592/5773 [55:00<7:51:43, 5.46s/it] {'loss': 0.6008, 'learning_rate': 1.9726215010628717e-05, 'epoch': 0.1} 10%|█ | 592/5773 [55:06<7:51:43, 5.46s/it] {'loss': 0.6008, 'learning_rate': 1.9726215010628717e-05, 'epoch': 0.1} 10%|█ | 592/5773 [55:00<7:51:43, 5.46s/it] 10%|█ | 593/5773 [55:11<7:54:29, 5.50s/it] 10%|█ | 593/5773 [55:06<7:54:28, 5.50s/it] {'loss': 0.6044, 'learning_rate': 1.9724909514128914e-05, 'epoch': 0.1} 10%|█ | 593/5773 [55:11<7:54:29, 5.50s/it] {'loss': 
0.6044, 'learning_rate': 1.9724909514128914e-05, 'epoch': 0.1} 10%|█ | 593/5773 [55:06<7:54:28, 5.50s/it] 10%|█ | 594/5773 [55:17<7:52:36, 5.48s/it] 10%|█ | 594/5773 [55:11<7:52:36, 5.48s/it] {'loss': 0.5839, 'learning_rate': 1.9723600955916887e-05, 'epoch': 0.1} 10%|█ | 594/5773 [55:17<7:52:36, 5.48s/it] {'loss': 0.5839, 'learning_rate': 1.9723600955916887e-05, 'epoch': 0.1} 10%|█ | 594/5773 [55:11<7:52:36, 5.48s/it] 10%|█ | 595/5773 [55:22<7:52:34, 5.48s/it] 10%|█ | 595/5773 [55:16<7:52:35, 5.48s/it] {'loss': 0.5957, 'learning_rate': 1.972228933640461e-05, 'epoch': 0.1} 10%|█ | 595/5773 [55:22<7:52:34, 5.48s/it] {'loss': 0.5957, 'learning_rate': 1.972228933640461e-05, 'epoch': 0.1} 10%|█ | 595/5773 [55:16<7:52:35, 5.48s/it] 10%|█ | 596/5773 [55:27<7:51:02, 5.46s/it] 10%|█ | 596/5773 [55:22<7:51:02, 5.46s/it] {'loss': 0.6019, 'learning_rate': 1.9720974656005027e-05, 'epoch': 0.1} 10%|█ | 596/5773 [55:27<7:51:02, 5.46s/it] {'loss': 0.6019, 'learning_rate': 1.9720974656005027e-05, 'epoch': 0.1} 10%|█ | 596/5773 [55:22<7:51:02, 5.46s/it] 10%|█ | 597/5773 [55:33<7:48:24, 5.43s/it] 10%|█ | 597/5773 [55:27<7:48:24, 5.43s/it] {'loss': 0.6063, 'learning_rate': 1.9719656915132038e-05, 'epoch': 0.1} 10%|█ | 597/5773 [55:33<7:48:24, 5.43s/it] {'loss': 0.6063, 'learning_rate': 1.9719656915132038e-05, 'epoch': 0.1} 10%|█ | 597/5773 [55:27<7:48:24, 5.43s/it] 10%|█ | 598/5773 [55:38<7:53:17, 5.49s/it] 10%|█ | 598/5773 [55:33<7:53:17, 5.49s/it] {'loss': 0.602, 'learning_rate': 1.9718336114200515e-05, 'epoch': 0.1} 10%|█ | 598/5773 [55:38<7:53:17, 5.49s/it] {'loss': 0.602, 'learning_rate': 1.9718336114200515e-05, 'epoch': 0.1} 10%|█ | 598/5773 [55:33<7:53:17, 5.49s/it] 10%|█ | 599/5773 [55:44<7:55:53, 5.52s/it] 10%|█ | 599/5773 [55:38<7:55:53, 5.52s/it] {'loss': 0.6031, 'learning_rate': 1.971701225362628e-05, 'epoch': 0.1} 10%|█ | 599/5773 [55:44<7:55:53, 5.52s/it] {'loss': 0.6031, 'learning_rate': 1.971701225362628e-05, 'epoch': 0.1} 10%|█ | 599/5773 [55:38<7:55:53, 5.52s/it]8 
AutoResumeHook: Checking whether to suspend... 16 4 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend...12 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 10%|█ | 600/5773 [55:49<7:52:03, 5.48s/it]13 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...9 AutoResumeHook: Checking whether to suspend... 10%|█ | 600/5773 [55:44<7:52:03, 5.48s/it] 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.598, 'learning_rate': 1.9715685333826134e-05, 'epoch': 0.1} 10%|█ | 600/5773 [55:49<7:52:03, 5.48s/it] {'loss': 0.598, 'learning_rate': 1.9715685333826134e-05, 'epoch': 0.1} 10%|█ | 600/5773 [55:44<7:52:03, 5.48s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 10%|█ | 601/5773 [56:08<13:33:43, 9.44s/it] 10%|█ | 601/5773 [56:03<13:33:43, 9.44s/it] {'loss': 0.6178, 'learning_rate': 1.9714355355217828e-05, 'epoch': 0.1} 10%|█ | 601/5773 [56:08<13:33:43, 9.44s/it] {'loss': 0.6178, 'learning_rate': 1.9714355355217828e-05, 'epoch': 0.1} 10%|█ | 601/5773 [56:03<13:33:43, 9.44s/it] 10%|█ | 602/5773 [56:13<11:48:22, 8.22s/it] 10%|█ | 602/5773 [56:08<11:48:21, 8.22s/it] {'loss': 0.5961, 'learning_rate': 1.9713022318220086e-05, 'epoch': 0.1} 10%|█ | 602/5773 [56:13<11:48:22, 8.22s/it] {'loss': 0.5961, 'learning_rate': 1.9713022318220086e-05, 'epoch': 0.1} 10%|█ | 602/5773 [56:08<11:48:21, 8.22s/it] 10%|█ | 603/5773 [56:19<10:40:05, 7.43s/it] 10%|█ | 603/5773 [56:13<10:40:04, 7.43s/it] {'loss': 0.606, 'learning_rate': 1.971168622325259e-05, 'epoch': 0.1} 10%|█ | 603/5773 [56:19<10:40:05, 7.43s/it] {'loss': 0.606, 'learning_rate': 1.971168622325259e-05, 'epoch': 0.1} 10%|█ | 603/5773 [56:13<10:40:04, 7.43s/it] 10%|█ | 604/5773 [56:24<9:49:49, 6.85s/it] 10%|█ | 604/5773 [56:19<9:49:49, 6.85s/it] {'loss': 0.6057, 'learning_rate': 1.9710347070735984e-05, 'epoch': 0.1} 10%|█ | 604/5773 [56:24<9:49:49, 6.85s/it] {'loss': 0.6057, 'learning_rate': 1.9710347070735984e-05, 'epoch': 0.1} 10%|█ | 604/5773 [56:19<9:49:49, 6.85s/it] 10%|█ | 605/5773 [56:30<9:15:49, 6.45s/it] 10%|█ | 605/5773 [56:24<9:15:49, 6.45s/it] {'loss': 0.6, 'learning_rate': 1.9709004861091876e-05, 'epoch': 0.1} 10%|█ | 605/5773 [56:30<9:15:49, 6.45s/it] {'loss': 0.6, 'learning_rate': 1.9709004861091876e-05, 'epoch': 0.1} 10%|█ | 605/5773 [56:24<9:15:49, 6.45s/it] 10%|█ | 606/5773 [56:35<8:46:56, 6.12s/it] 10%|█ | 606/5773 [56:30<8:46:55, 6.12s/it] {'loss': 0.6136, 'learning_rate': 1.9707659594742838e-05, 'epoch': 0.1} 10%|█ | 606/5773 [56:35<8:46:56, 6.12s/it] {'loss': 0.6136, 'learning_rate': 1.9707659594742838e-05, 'epoch': 0.1} 10%|█ | 606/5773 [56:30<8:46:55, 6.12s/it] 11%|█ | 607/5773 [56:41<8:31:17, 5.94s/it] 11%|█ | 607/5773 [56:35<8:31:18, 5.94s/it] 
{'loss': 0.5904, 'learning_rate': 1.9706311272112402e-05, 'epoch': 0.11} 11%|█ | 607/5773 [56:41<8:31:17, 5.94s/it] {'loss': 0.5904, 'learning_rate': 1.9706311272112402e-05, 'epoch': 0.11} 11%|█ | 607/5773 [56:35<8:31:18, 5.94s/it] 11%|█ | 608/5773 [56:46<8:14:45, 5.75s/it] 11%|█ | 608/5773 [56:41<8:14:45, 5.75s/it] {'loss': 0.5741, 'learning_rate': 1.9704959893625065e-05, 'epoch': 0.11} 11%|█ | 608/5773 [56:46<8:14:45, 5.75s/it] {'loss': 0.5741, 'learning_rate': 1.9704959893625065e-05, 'epoch': 0.11} 11%|█ | 608/5773 [56:41<8:14:45, 5.75s/it] 11%|█ | 609/5773 [56:52<8:07:34, 5.67s/it] 11%|█ | 609/5773 [56:46<8:07:34, 5.67s/it] {'loss': 0.5984, 'learning_rate': 1.9703605459706277e-05, 'epoch': 0.11} 11%|█ | 609/5773 [56:52<8:07:34, 5.67s/it] {'loss': 0.5984, 'learning_rate': 1.9703605459706277e-05, 'epoch': 0.11} 11%|█ | 609/5773 [56:46<8:07:34, 5.67s/it] 11%|█ | 610/5773 [56:57<7:57:46, 5.55s/it] 11%|█ | 610/5773 [56:51<7:57:46, 5.55s/it] {'loss': 0.6133, 'learning_rate': 1.9702247970782466e-05, 'epoch': 0.11} 11%|█ | 610/5773 [56:57<7:57:46, 5.55s/it] {'loss': 0.6133, 'learning_rate': 1.9702247970782466e-05, 'epoch': 0.11} 11%|█ | 610/5773 [56:51<7:57:46, 5.55s/it] 11%|█ | 611/5773 [57:02<7:53:39, 5.51s/it] 11%|█ | 611/5773 [56:57<7:53:39, 5.51s/it] {'loss': 0.5793, 'learning_rate': 1.970088742728101e-05, 'epoch': 0.11} 11%|█ | 611/5773 [57:02<7:53:39, 5.51s/it] {'loss': 0.5793, 'learning_rate': 1.970088742728101e-05, 'epoch': 0.11} 11%|█ | 611/5773 [56:57<7:53:39, 5.51s/it] 11%|█ | 612/5773 [57:08<7:55:31, 5.53s/it] 11%|█ | 612/5773 [57:02<7:55:31, 5.53s/it] {'loss': 0.5931, 'learning_rate': 1.969952382963025e-05, 'epoch': 0.11} 11%|█ | 612/5773 [57:08<7:55:31, 5.53s/it] {'loss': 0.5931, 'learning_rate': 1.969952382963025e-05, 'epoch': 0.11} 11%|█ | 612/5773 [57:02<7:55:31, 5.53s/it] 11%|█ | 613/5773 [57:13<7:53:17, 5.50s/it] 11%|█ | 613/5773 [57:08<7:53:17, 5.50s/it] {'loss': 0.6022, 'learning_rate': 1.9698157178259486e-05, 'epoch': 0.11} 11%|█ | 613/5773 
[57:13<7:53:17, 5.50s/it] {'loss': 0.6022, 'learning_rate': 1.9698157178259486e-05, 'epoch': 0.11} 11%|█ | 613/5773 [57:08<7:53:17, 5.50s/it] 11%|█ | 614/5773 [57:19<7:48:47, 5.45s/it] 11%|█ | 614/5773 [57:13<7:48:47, 5.45s/it] {'loss': 0.6211, 'learning_rate': 1.9696787473598993e-05, 'epoch': 0.11} 11%|█ | 614/5773 [57:19<7:48:47, 5.45s/it] {'loss': 0.6211, 'learning_rate': 1.9696787473598993e-05, 'epoch': 0.11} 11%|█ | 614/5773 [57:13<7:48:47, 5.45s/it] 11%|█ | 615/5773 [57:24<7:48:59, 5.46s/it] 11%|█ | 615/5773 [57:19<7:49:00, 5.46s/it] {'loss': 0.597, 'learning_rate': 1.9695414716079993e-05, 'epoch': 0.11} 11%|█ | 615/5773 [57:24<7:48:59, 5.46s/it] {'loss': 0.597, 'learning_rate': 1.9695414716079993e-05, 'epoch': 0.11} 11%|█ | 615/5773 [57:19<7:49:00, 5.46s/it] 11%|█ | 616/5773 [57:29<7:44:22, 5.40s/it] 11%|█ | 616/5773 [57:24<7:44:22, 5.40s/it] {'loss': 0.6061, 'learning_rate': 1.9694038906134673e-05, 'epoch': 0.11} 11%|█ | 616/5773 [57:29<7:44:22, 5.40s/it] {'loss': 0.6061, 'learning_rate': 1.9694038906134673e-05, 'epoch': 0.11} 11%|█ | 616/5773 [57:24<7:44:22, 5.40s/it] 11%|█ | 617/5773 [57:35<7:44:24, 5.40s/it] 11%|█ | 617/5773 [57:29<7:44:23, 5.40s/it] {'loss': 0.6007, 'learning_rate': 1.9692660044196183e-05, 'epoch': 0.11} 11%|█ | 617/5773 [57:35<7:44:24, 5.40s/it] {'loss': 0.6007, 'learning_rate': 1.9692660044196183e-05, 'epoch': 0.11} 11%|█ | 617/5773 [57:29<7:44:23, 5.40s/it] 11%|█ | 618/5773 [57:40<7:44:02, 5.40s/it] 11%|█ | 618/5773 [57:35<7:44:02, 5.40s/it] {'loss': 0.5892, 'learning_rate': 1.9691278130698633e-05, 'epoch': 0.11} 11%|█ | 618/5773 [57:40<7:44:02, 5.40s/it] {'loss': 0.5892, 'learning_rate': 1.9691278130698633e-05, 'epoch': 0.11} 11%|█ | 618/5773 [57:35<7:44:02, 5.40s/it] 11%|█ | 619/5773 [57:46<7:47:18, 5.44s/it] 11%|█ | 619/5773 [57:40<7:47:18, 5.44s/it] {'loss': 0.6199, 'learning_rate': 1.9689893166077093e-05, 'epoch': 0.11} 11%|█ | 619/5773 [57:46<7:47:18, 5.44s/it] {'loss': 0.6199, 'learning_rate': 1.9689893166077093e-05, 'epoch': 
0.11} 11%|█ | 619/5773 [57:40<7:47:18, 5.44s/it] 11%|█ | 620/5773 [57:51<7:52:19, 5.50s/it] 11%|█ | 620/5773 [57:46<7:52:19, 5.50s/it] {'loss': 0.6154, 'learning_rate': 1.9688505150767596e-05, 'epoch': 0.11} 11%|█ | 620/5773 [57:51<7:52:19, 5.50s/it] {'loss': 0.6154, 'learning_rate': 1.9688505150767596e-05, 'epoch': 0.11} 11%|█ | 620/5773 [57:46<7:52:19, 5.50s/it] 11%|█ | 621/5773 [57:57<7:48:49, 5.46s/it] 11%|█ | 621/5773 [57:51<7:48:48, 5.46s/it] {'loss': 0.5822, 'learning_rate': 1.968711408520713e-05, 'epoch': 0.11} 11%|█ | 621/5773 [57:57<7:48:49, 5.46s/it] {'loss': 0.5822, 'learning_rate': 1.968711408520713e-05, 'epoch': 0.11} 11%|█ | 621/5773 [57:51<7:48:48, 5.46s/it] 11%|█ | 622/5773 [58:02<7:53:41, 5.52s/it] 11%|█ | 622/5773 [57:57<7:53:42, 5.52s/it] {'loss': 0.6064, 'learning_rate': 1.9685719969833648e-05, 'epoch': 0.11} 11%|█ | 622/5773 [58:02<7:53:41, 5.52s/it] {'loss': 0.6064, 'learning_rate': 1.9685719969833648e-05, 'epoch': 0.11} 11%|█ | 622/5773 [57:57<7:53:42, 5.52s/it] 11%|█ | 623/5773 [58:08<7:52:36, 5.51s/it] 11%|█ | 623/5773 [58:02<7:52:36, 5.51s/it] {'loss': 0.6208, 'learning_rate': 1.9684322805086064e-05, 'epoch': 0.11} 11%|█ | 623/5773 [58:08<7:52:36, 5.51s/it] {'loss': 0.6208, 'learning_rate': 1.9684322805086064e-05, 'epoch': 0.11} 11%|█ | 623/5773 [58:02<7:52:36, 5.51s/it] 11%|█ | 624/5773 [58:13<7:52:00, 5.50s/it] 11%|█ | 624/5773 [58:08<7:52:00, 5.50s/it] {'loss': 0.6069, 'learning_rate': 1.9682922591404246e-05, 'epoch': 0.11} 11%|█ | 624/5773 [58:13<7:52:00, 5.50s/it]{'loss': 0.6069, 'learning_rate': 1.9682922591404246e-05, 'epoch': 0.11} 11%|█ | 624/5773 [58:08<7:52:00, 5.50s/it] 11%|█ | 625/5773 [58:19<7:53:21, 5.52s/it] 11%|█ | 625/5773 [58:13<7:53:21, 5.52s/it] {'loss': 0.604, 'learning_rate': 1.968151932922903e-05, 'epoch': 0.11} 11%|█ | 625/5773 [58:19<7:53:21, 5.52s/it] {'loss': 0.604, 'learning_rate': 1.968151932922903e-05, 'epoch': 0.11} 11%|█ | 625/5773 [58:13<7:53:21, 5.52s/it] 11%|█ | 626/5773 [58:25<7:56:05, 5.55s/it] 11%|█ 
| 626/5773 [58:19<7:56:05, 5.55s/it] {'loss': 0.6041, 'learning_rate': 1.9680113019002212e-05, 'epoch': 0.11} 11%|█ | 626/5773 [58:25<7:56:05, 5.55s/it] {'loss': 0.6041, 'learning_rate': 1.9680113019002212e-05, 'epoch': 0.11} 11%|█ | 626/5773 [58:19<7:56:05, 5.55s/it] 11%|█ | 627/5773 [58:30<7:50:22, 5.48s/it] 11%|█ | 627/5773 [58:24<7:50:22, 5.48s/it] {'loss': 0.5954, 'learning_rate': 1.9678703661166532e-05, 'epoch': 0.11} 11%|█ | 627/5773 [58:30<7:50:22, 5.48s/it] {'loss': 0.5954, 'learning_rate': 1.9678703661166532e-05, 'epoch': 0.11} 11%|█ | 627/5773 [58:24<7:50:22, 5.48s/it] 11%|█ | 628/5773 [58:35<7:48:50, 5.47s/it] 11%|█ | 628/5773 [58:30<7:48:50, 5.47s/it] {'loss': 0.601, 'learning_rate': 1.9677291256165712e-05, 'epoch': 0.11} 11%|█ | 628/5773 [58:35<7:48:50, 5.47s/it] {'loss': 0.601, 'learning_rate': 1.9677291256165712e-05, 'epoch': 0.11} 11%|█ | 628/5773 [58:30<7:48:50, 5.47s/it] 11%|█ | 629/5773 [58:41<7:47:11, 5.45s/it] 11%|█ | 629/5773 [58:35<7:47:12, 5.45s/it] {'loss': 0.5964, 'learning_rate': 1.9675875804444416e-05, 'epoch': 0.11} 11%|█ | 629/5773 [58:41<7:47:11, 5.45s/it] {'loss': 0.5964, 'learning_rate': 1.9675875804444416e-05, 'epoch': 0.11} 11%|█ | 629/5773 [58:35<7:47:12, 5.45s/it] 11%|█ | 630/5773 [58:46<7:48:44, 5.47s/it] 11%|█ | 630/5773 [58:41<7:48:44, 5.47s/it] {'loss': 0.5843, 'learning_rate': 1.9674457306448272e-05, 'epoch': 0.11} 11%|█ | 630/5773 [58:46<7:48:44, 5.47s/it] {'loss': 0.5843, 'learning_rate': 1.9674457306448272e-05, 'epoch': 0.11} 11%|█ | 630/5773 [58:41<7:48:44, 5.47s/it] 11%|█ | 631/5773 [58:52<7:55:20, 5.55s/it] 11%|█ | 631/5773 [58:46<7:55:21, 5.55s/it] {'loss': 0.5953, 'learning_rate': 1.967303576262387e-05, 'epoch': 0.11} 11%|█ | 631/5773 [58:52<7:55:20, 5.55s/it] {'loss': 0.5953, 'learning_rate': 1.967303576262387e-05, 'epoch': 0.11} 11%|█ | 631/5773 [58:46<7:55:21, 5.55s/it] 11%|█ | 632/5773 [58:57<7:52:05, 5.51s/it] 11%|█ | 632/5773 [58:52<7:52:05, 5.51s/it] {'loss': 0.6149, 'learning_rate': 1.9671611173418764e-05, 
'epoch': 0.11} 11%|█ | 632/5773 [58:57<7:52:05, 5.51s/it] {'loss': 0.6149, 'learning_rate': 1.9671611173418764e-05, 'epoch': 0.11} 11%|█ | 632/5773 [58:52<7:52:05, 5.51s/it] 11%|█ | 633/5773 [59:03<7:49:17, 5.48s/it] 11%|█ | 633/5773 [58:57<7:49:17, 5.48s/it] {'loss': 0.5814, 'learning_rate': 1.967018353928145e-05, 'epoch': 0.11} 11%|█ | 633/5773 [59:03<7:49:17, 5.48s/it] {'loss': 0.5814, 'learning_rate': 1.967018353928145e-05, 'epoch': 0.11} 11%|█ | 633/5773 [58:57<7:49:17, 5.48s/it] 11%|█ | 634/5773 [59:08<7:48:07, 5.47s/it] 11%|█ | 634/5773 [59:03<7:48:07, 5.47s/it] {'loss': 0.5949, 'learning_rate': 1.96687528606614e-05, 'epoch': 0.11} 11%|█ | 634/5773 [59:08<7:48:07, 5.47s/it] {'loss': 0.5949, 'learning_rate': 1.96687528606614e-05, 'epoch': 0.11} 11%|█ | 634/5773 [59:03<7:48:07, 5.47s/it] 11%|█ | 635/5773 [59:14<7:50:23, 5.49s/it] 11%|█ | 635/5773 [59:08<7:50:23, 5.49s/it] {'loss': 0.6025, 'learning_rate': 1.9667319138009036e-05, 'epoch': 0.11} 11%|█ | 635/5773 [59:14<7:50:23, 5.49s/it] {'loss': 0.6025, 'learning_rate': 1.9667319138009036e-05, 'epoch': 0.11} 11%|█ | 635/5773 [59:08<7:50:23, 5.49s/it] 11%|█ | 636/5773 [59:19<7:48:39, 5.47s/it] 11%|█ | 636/5773 [59:14<7:48:39, 5.47s/it] {'loss': 0.5966, 'learning_rate': 1.9665882371775735e-05, 'epoch': 0.11} 11%|█ | 636/5773 [59:19<7:48:39, 5.47s/it] {'loss': 0.5966, 'learning_rate': 1.9665882371775735e-05, 'epoch': 0.11} 11%|█ | 636/5773 [59:14<7:48:39, 5.47s/it] 11%|█ | 637/5773 [59:25<7:48:19, 5.47s/it] 11%|█ | 637/5773 [59:19<7:48:19, 5.47s/it] {'loss': 0.6114, 'learning_rate': 1.9664442562413844e-05, 'epoch': 0.11} 11%|█ | 637/5773 [59:25<7:48:19, 5.47s/it] {'loss': 0.6114, 'learning_rate': 1.9664442562413844e-05, 'epoch': 0.11} 11%|█ | 637/5773 [59:19<7:48:19, 5.47s/it] 11%|█ | 638/5773 [59:30<7:43:52, 5.42s/it] 11%|█ | 638/5773 [59:24<7:43:52, 5.42s/it] {'loss': 0.6086, 'learning_rate': 1.9662999710376655e-05, 'epoch': 0.11} 11%|█ | 638/5773 [59:30<7:43:52, 5.42s/it] {'loss': 0.6086, 'learning_rate': 
1.9662999710376655e-05, 'epoch': 0.11} 11%|█ | 638/5773 [59:24<7:43:52, 5.42s/it] 11%|█ | 639/5773 [59:35<7:45:02, 5.43s/it] 11%|█ | 639/5773 [59:30<7:45:03, 5.44s/it] {'loss': 0.5752, 'learning_rate': 1.966155381611843e-05, 'epoch': 0.11} 11%|█ | 639/5773 [59:35<7:45:02, 5.43s/it] {'loss': 0.5752, 'learning_rate': 1.966155381611843e-05, 'epoch': 0.11} 11%|█ | 639/5773 [59:30<7:45:03, 5.44s/it] 11%|█ | 640/5773 [59:41<7:45:33, 5.44s/it] 11%|█ | 640/5773 [59:35<7:45:36, 5.44s/it] {'loss': 0.5939, 'learning_rate': 1.9660104880094374e-05, 'epoch': 0.11} 11%|█ | 640/5773 [59:41<7:45:33, 5.44s/it] {'loss': 0.5939, 'learning_rate': 1.9660104880094374e-05, 'epoch': 0.11} 11%|█ | 640/5773 [59:35<7:45:36, 5.44s/it] 11%|█ | 641/5773 [59:46<7:46:53, 5.46s/it] 11%|█ | 641/5773 [59:41<7:46:52, 5.46s/it] {'loss': 0.5878, 'learning_rate': 1.965865290276067e-05, 'epoch': 0.11} 11%|█ | 641/5773 [59:46<7:46:53, 5.46s/it] {'loss': 0.5878, 'learning_rate': 1.965865290276067e-05, 'epoch': 0.11} 11%|█ | 641/5773 [59:41<7:46:52, 5.46s/it] 11%|█ | 642/5773 [59:52<7:44:26, 5.43s/it] 11%|█ | 642/5773 [59:46<7:44:25, 5.43s/it] {'loss': 0.5962, 'learning_rate': 1.9657197884574434e-05, 'epoch': 0.11} 11%|█ | 642/5773 [59:52<7:44:26, 5.43s/it] {'loss': 0.5962, 'learning_rate': 1.9657197884574434e-05, 'epoch': 0.11} 11%|█ | 642/5773 [59:46<7:44:25, 5.43s/it] 11%|█ | 643/5773 [59:57<7:43:02, 5.42s/it] 11%|█ | 643/5773 [59:52<7:43:01, 5.42s/it] {'loss': 0.5814, 'learning_rate': 1.965573982599376e-05, 'epoch': 0.11} 11%|█ | 643/5773 [59:57<7:43:02, 5.42s/it] {'loss': 0.5814, 'learning_rate': 1.965573982599376e-05, 'epoch': 0.11} 11%|█ | 643/5773 [59:52<7:43:01, 5.42s/it] 11%|█ | 644/5773 [1:00:03<7:53:30, 5.54s/it] 11%|█ | 644/5773 [59:57<7:53:29, 5.54s/it] {'loss': 0.6065, 'learning_rate': 1.9654278727477695e-05, 'epoch': 0.11} 11%|█ | 644/5773 [1:00:03<7:53:30, 5.54s/it] {'loss': 0.6065, 'learning_rate': 1.9654278727477695e-05, 'epoch': 0.11} 11%|█ | 644/5773 [59:57<7:53:29, 5.54s/it] 11%|█ | 
645/5773 [1:00:08<7:46:26, 5.46s/it] 11%|█ | 645/5773 [1:00:03<7:46:27, 5.46s/it] {'loss': 0.6076, 'learning_rate': 1.9652814589486228e-05, 'epoch': 0.11} 11%|█ | 645/5773 [1:00:08<7:46:26, 5.46s/it] {'loss': 0.6076, 'learning_rate': 1.9652814589486228e-05, 'epoch': 0.11} 11%|█ | 645/5773 [1:00:03<7:46:27, 5.46s/it] 11%|█ | 646/5773 [1:00:14<7:43:59, 5.43s/it] 11%|█ | 646/5773 [1:00:08<7:44:00, 5.43s/it] {'loss': 0.5884, 'learning_rate': 1.9651347412480323e-05, 'epoch': 0.11} 11%|█ | 646/5773 [1:00:14<7:43:59, 5.43s/it] {'loss': 0.5884, 'learning_rate': 1.9651347412480323e-05, 'epoch': 0.11} 11%|█ | 646/5773 [1:00:08<7:44:00, 5.43s/it] 11%|█ | 647/5773 [1:00:19<7:48:05, 5.48s/it] 11%|█ | 647/5773 [1:00:14<7:48:05, 5.48s/it] {'loss': 0.5891, 'learning_rate': 1.96498771969219e-05, 'epoch': 0.11} 11%|█ | 647/5773 [1:00:19<7:48:05, 5.48s/it] {'loss': 0.5891, 'learning_rate': 1.96498771969219e-05, 'epoch': 0.11} 11%|█ | 647/5773 [1:00:14<7:48:05, 5.48s/it] 11%|█ | 648/5773 [1:00:25<7:44:58, 5.44s/it] 11%|█ | 648/5773 [1:00:19<7:44:58, 5.44s/it] {'loss': 0.5963, 'learning_rate': 1.9648403943273818e-05, 'epoch': 0.11} 11%|█ | 648/5773 [1:00:25<7:44:58, 5.44s/it] {'loss': 0.5963, 'learning_rate': 1.9648403943273818e-05, 'epoch': 0.11} 11%|█ | 648/5773 [1:00:19<7:44:58, 5.44s/it] 11%|█ | 649/5773 [1:00:30<7:45:54, 5.46s/it] 11%|█ | 649/5773 [1:00:25<7:45:54, 5.46s/it] {'loss': 0.5981, 'learning_rate': 1.9646927651999915e-05, 'epoch': 0.11} 11%|█ | 649/5773 [1:00:30<7:45:54, 5.46s/it] {'loss': 0.5981, 'learning_rate': 1.9646927651999915e-05, 'epoch': 0.11} 11%|█ | 649/5773 [1:00:25<7:45:54, 5.46s/it]15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 76 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 
11%|█▏ | 650/5773 [1:00:35<7:43:11, 5.42s/it]12 9AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend...0 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 11%|█▏ | 650/5773 [1:00:30<7:43:11, 5.42s/it] {'loss': 0.6053, 'learning_rate': 1.9645448323564968e-05, 'epoch': 0.11} 11%|█▏ | 650/5773 [1:00:35<7:43:11, 5.42s/it] {'loss': 0.6053, 'learning_rate': 1.9645448323564968e-05, 'epoch': 0.11} 11%|█▏ | 650/5773 [1:00:30<7:43:11, 5.42s/it] 11%|█▏ | 651/5773 [1:00:41<7:43:56, 5.43s/it] 11%|█▏ | 651/5773 [1:00:35<7:43:55, 5.43s/it] {'loss': 0.5978, 'learning_rate': 1.964396595843472e-05, 'epoch': 0.11} 11%|█▏ | 651/5773 [1:00:41<7:43:56, 5.43s/it] {'loss': 0.5978, 'learning_rate': 1.964396595843472e-05, 'epoch': 0.11} 11%|█▏ | 651/5773 [1:00:35<7:43:55, 5.43s/it] 11%|█▏ | 652/5773 [1:00:46<7:45:00, 5.45s/it] 11%|█▏ | 652/5773 [1:00:41<7:45:01, 5.45s/it] {'loss': 0.6106, 'learning_rate': 1.9642480557075862e-05, 'epoch': 0.11} 11%|█▏ | 652/5773 [1:00:46<7:45:00, 5.45s/it] {'loss': 0.6106, 'learning_rate': 1.9642480557075862e-05, 'epoch': 0.11} 11%|█▏ | 652/5773 [1:00:41<7:45:01, 5.45s/it] 11%|█▏ | 653/5773 [1:00:52<7:41:42, 5.41s/it] 11%|█▏ | 653/5773 [1:00:46<7:41:44, 5.41s/it] {'loss': 0.6073, 'learning_rate': 1.964099211995605e-05, 'epoch': 0.11} 11%|█▏ | 653/5773 [1:00:52<7:41:42, 5.41s/it] {'loss': 0.6073, 'learning_rate': 1.964099211995605e-05, 'epoch': 0.11} 11%|█▏ | 653/5773 [1:00:46<7:41:44, 5.41s/it] 11%|█▏ | 654/5773 [1:00:57<7:44:10, 5.44s/it] 11%|█▏ | 654/5773 [1:00:52<7:44:10, 5.44s/it] {'loss': 0.6097, 'learning_rate': 1.9639500647543892e-05, 'epoch': 0.11} 11%|█▏ | 654/5773 [1:00:57<7:44:10, 5.44s/it] {'loss': 0.6097, 
'learning_rate': 1.9639500647543892e-05, 'epoch': 0.11} 11%|█▏ | 654/5773 [1:00:52<7:44:10, 5.44s/it] 11%|█▏ | 655/5773 [1:01:02<7:39:08, 5.38s/it] 11%|█▏ | 655/5773 [1:00:57<7:39:09, 5.38s/it] {'loss': 0.6106, 'learning_rate': 1.963800614030895e-05, 'epoch': 0.11} 11%|█▏ | 655/5773 [1:01:02<7:39:08, 5.38s/it] {'loss': 0.6106, 'learning_rate': 1.963800614030895e-05, 'epoch': 0.11} 11%|█▏ | 655/5773 [1:00:57<7:39:09, 5.38s/it] 11%|█▏ | 656/5773 [1:01:08<7:40:12, 5.40s/it] 11%|█▏ | 656/5773 [1:01:02<7:40:12, 5.40s/it] {'loss': 0.5994, 'learning_rate': 1.9636508598721745e-05, 'epoch': 0.11} 11%|█▏ | 656/5773 [1:01:08<7:40:12, 5.40s/it] {'loss': 0.5994, 'learning_rate': 1.9636508598721745e-05, 'epoch': 0.11} 11%|█▏ | 656/5773 [1:01:02<7:40:12, 5.40s/it] 11%|█▏ | 657/5773 [1:01:13<7:38:59, 5.38s/it] 11%|█▏ | 657/5773 [1:01:08<7:39:00, 5.38s/it] {'loss': 0.5838, 'learning_rate': 1.963500802325375e-05, 'epoch': 0.11} 11%|█▏ | 657/5773 [1:01:13<7:38:59, 5.38s/it] {'loss': 0.5838, 'learning_rate': 1.963500802325375e-05, 'epoch': 0.11} 11%|█▏ | 657/5773 [1:01:08<7:39:00, 5.38s/it] 11%|█▏ | 658/5773 [1:01:19<7:42:50, 5.43s/it] 11%|█▏ | 658/5773 [1:01:13<7:42:50, 5.43s/it] {'loss': 0.6112, 'learning_rate': 1.963350441437739e-05, 'epoch': 0.11} 11%|█▏ | 658/5773 [1:01:19<7:42:50, 5.43s/it] {'loss': 0.6112, 'learning_rate': 1.963350441437739e-05, 'epoch': 0.11} 11%|█▏ | 658/5773 [1:01:13<7:42:50, 5.43s/it] 11%|█▏ | 659/5773 [1:01:24<7:45:05, 5.46s/it] 11%|█▏ | 659/5773 [1:01:19<7:45:05, 5.46s/it] {'loss': 0.6065, 'learning_rate': 1.9631997772566052e-05, 'epoch': 0.11} 11%|█▏ | 659/5773 [1:01:24<7:45:05, 5.46s/it] {'loss': 0.6065, 'learning_rate': 1.9631997772566052e-05, 'epoch': 0.11} 11%|█▏ | 659/5773 [1:01:19<7:45:05, 5.46s/it] 11%|█▏ | 660/5773 [1:01:30<7:46:49, 5.48s/it] 11%|█▏ | 660/5773 [1:01:24<7:46:49, 5.48s/it] {'loss': 0.6039, 'learning_rate': 1.9630488098294072e-05, 'epoch': 0.11} 11%|█▏ | 660/5773 [1:01:30<7:46:49, 5.48s/it] {'loss': 0.6039, 'learning_rate': 
1.9630488098294072e-05, 'epoch': 0.11} 11%|█▏ | 660/5773 [1:01:24<7:46:49, 5.48s/it] 11%|█▏ | 661/5773 [1:01:35<7:43:52, 5.44s/it] 11%|█▏ | 661/5773 [1:01:30<7:43:51, 5.44s/it] {'loss': 0.5707, 'learning_rate': 1.9628975392036755e-05, 'epoch': 0.11} 11%|█▏ | 661/5773 [1:01:35<7:43:52, 5.44s/it] {'loss': 0.5707, 'learning_rate': 1.9628975392036755e-05, 'epoch': 0.11} 11%|█▏ | 661/5773 [1:01:30<7:43:51, 5.44s/it] 11%|█▏ | 662/5773 [1:01:41<7:41:33, 5.42s/it] 11%|█▏ | 662/5773 [1:01:35<7:41:32, 5.42s/it] {'loss': 0.6088, 'learning_rate': 1.9627459654270335e-05, 'epoch': 0.11} 11%|█▏ | 662/5773 [1:01:41<7:41:33, 5.42s/it] {'loss': 0.6088, 'learning_rate': 1.9627459654270335e-05, 'epoch': 0.11} 11%|█▏ | 662/5773 [1:01:35<7:41:32, 5.42s/it] 11%|█▏ | 663/5773 [1:01:46<7:48:08, 5.50s/it] 11%|█▏ | 663/5773 [1:01:41<7:48:07, 5.50s/it] {'loss': 0.6213, 'learning_rate': 1.9625940885472018e-05, 'epoch': 0.11} 11%|█▏ | 663/5773 [1:01:46<7:48:08, 5.50s/it] {'loss': 0.6213, 'learning_rate': 1.9625940885472018e-05, 'epoch': 0.11} 11%|█▏ | 663/5773 [1:01:41<7:48:07, 5.50s/it] 12%|█▏ | 664/5773 [1:01:46<7:43:29, 5.44s/it] 12%|█▏ | 664/5773 [1:01:52<7:43:29, 5.44s/it] {'loss': 0.6106, 'learning_rate': 1.962441908611997e-05, 'epoch': 0.12} 12%|█▏ | 664/5773 [1:01:52<7:43:29, 5.44s/it] {'loss': 0.6106, 'learning_rate': 1.962441908611997e-05, 'epoch': 0.12} 12%|█▏ | 664/5773 [1:01:46<7:43:29, 5.44s/it] 12%|█▏ | 665/5773 [1:01:51<7:44:01, 5.45s/it] 12%|█▏ | 665/5773 [1:01:57<7:44:01, 5.45s/it] {'loss': 0.6085, 'learning_rate': 1.9622894256693292e-05, 'epoch': 0.12} 12%|█▏ | 665/5773 [1:01:57<7:44:01, 5.45s/it] {'loss': 0.6085, 'learning_rate': 1.9622894256693292e-05, 'epoch': 0.12} 12%|█▏ | 665/5773 [1:01:51<7:44:01, 5.45s/it] 12%|█▏ | 666/5773 [1:02:02<7:43:48, 5.45s/it] 12%|█▏ | 666/5773 [1:01:57<7:43:49, 5.45s/it] {'loss': 0.611, 'learning_rate': 1.9621366397672054e-05, 'epoch': 0.12} 12%|█▏ | 666/5773 [1:02:02<7:43:48, 5.45s/it] {'loss': 0.611, 'learning_rate': 1.9621366397672054e-05, 
'epoch': 0.12} 12%|█▏ | 666/5773 [1:01:57<7:43:49, 5.45s/it] 12%|█▏ | 667/5773 [1:02:02<7:46:39, 5.48s/it] 12%|█▏ | 667/5773 [1:02:08<7:46:39, 5.48s/it] {'loss': 0.5842, 'learning_rate': 1.9619835509537274e-05, 'epoch': 0.12} 12%|█▏ | 667/5773 [1:02:08<7:46:39, 5.48s/it] {'loss': 0.5842, 'learning_rate': 1.9619835509537274e-05, 'epoch': 0.12} 12%|█▏ | 667/5773 [1:02:02<7:46:39, 5.48s/it] 12%|█▏ | 668/5773 [1:02:13<7:43:15, 5.44s/it] 12%|█▏ | 668/5773 [1:02:08<7:43:16, 5.44s/it] {'loss': 0.6073, 'learning_rate': 1.961830159277092e-05, 'epoch': 0.12} 12%|█▏ | 668/5773 [1:02:13<7:43:15, 5.44s/it] {'loss': 0.6073, 'learning_rate': 1.961830159277092e-05, 'epoch': 0.12} 12%|█▏ | 668/5773 [1:02:08<7:43:16, 5.44s/it] 12%|█▏ | 669/5773 [1:02:19<7:43:16, 5.45s/it] 12%|█▏ | 669/5773 [1:02:13<7:43:16, 5.45s/it] {'loss': 0.6061, 'learning_rate': 1.9616764647855924e-05, 'epoch': 0.12} 12%|█▏ | 669/5773 [1:02:19<7:43:16, 5.45s/it] {'loss': 0.6061, 'learning_rate': 1.9616764647855924e-05, 'epoch': 0.12} 12%|█▏ | 669/5773 [1:02:13<7:43:16, 5.45s/it] 12%|█▏ | 670/5773 [1:02:24<7:40:27, 5.41s/it] 12%|█▏ | 670/5773 [1:02:19<7:40:27, 5.41s/it] {'loss': 0.6077, 'learning_rate': 1.9615224675276167e-05, 'epoch': 0.12} 12%|█▏ | 670/5773 [1:02:24<7:40:27, 5.41s/it] {'loss': 0.6077, 'learning_rate': 1.9615224675276167e-05, 'epoch': 0.12} 12%|█▏ | 670/5773 [1:02:19<7:40:27, 5.41s/it] 12%|█▏ | 671/5773 [1:02:30<7:42:01, 5.43s/it] 12%|█▏ | 671/5773 [1:02:24<7:42:01, 5.43s/it] {'loss': 0.606, 'learning_rate': 1.9613681675516473e-05, 'epoch': 0.12} 12%|█▏ | 671/5773 [1:02:30<7:42:01, 5.43s/it] {'loss': 0.606, 'learning_rate': 1.9613681675516473e-05, 'epoch': 0.12} 12%|█▏ | 671/5773 [1:02:24<7:42:01, 5.43s/it] 12%|█▏ | 672/5773 [1:02:35<7:43:33, 5.45s/it] 12%|█▏ | 672/5773 [1:02:30<7:43:32, 5.45s/it] {'loss': 0.5944, 'learning_rate': 1.9612135649062635e-05, 'epoch': 0.12} 12%|█▏ | 672/5773 [1:02:35<7:43:33, 5.45s/it] {'loss': 0.5944, 'learning_rate': 1.9612135649062635e-05, 'epoch': 0.12} 12%|█▏ | 
672/5773 [1:02:30<7:43:32, 5.45s/it] 12%|█▏ | 673/5773 [1:02:41<7:41:50, 5.43s/it] 12%|█▏ | 673/5773 [1:02:35<7:41:49, 5.43s/it] {'loss': 0.5928, 'learning_rate': 1.961058659640139e-05, 'epoch': 0.12} 12%|█▏ | 673/5773 [1:02:41<7:41:50, 5.43s/it] {'loss': 0.5928, 'learning_rate': 1.961058659640139e-05, 'epoch': 0.12} 12%|█▏ | 673/5773 [1:02:35<7:41:49, 5.43s/it] 12%|█▏ | 674/5773 [1:02:46<7:40:32, 5.42s/it] 12%|█▏ | 674/5773 [1:02:40<7:40:32, 5.42s/it] {'loss': 0.5971, 'learning_rate': 1.9609034518020425e-05, 'epoch': 0.12} 12%|█▏ | 674/5773 [1:02:46<7:40:32, 5.42s/it] {'loss': 0.5971, 'learning_rate': 1.9609034518020425e-05, 'epoch': 0.12} 12%|█▏ | 674/5773 [1:02:40<7:40:32, 5.42s/it] 12%|█▏ | 675/5773 [1:02:51<7:42:14, 5.44s/it] 12%|█▏ | 675/5773 [1:02:46<7:42:14, 5.44s/it] {'loss': 0.5979, 'learning_rate': 1.9607479414408387e-05, 'epoch': 0.12} 12%|█▏ | 675/5773 [1:02:51<7:42:14, 5.44s/it] {'loss': 0.5979, 'learning_rate': 1.9607479414408387e-05, 'epoch': 0.12} 12%|█▏ | 675/5773 [1:02:46<7:42:14, 5.44s/it] 12%|█▏ | 676/5773 [1:02:57<7:40:06, 5.42s/it] 12%|█▏ | 676/5773 [1:02:51<7:40:06, 5.42s/it] {'loss': 0.6144, 'learning_rate': 1.9605921286054877e-05, 'epoch': 0.12} 12%|█▏ | 676/5773 [1:02:57<7:40:06, 5.42s/it] {'loss': 0.6144, 'learning_rate': 1.9605921286054877e-05, 'epoch': 0.12} 12%|█▏ | 676/5773 [1:02:51<7:40:06, 5.42s/it] 12%|█▏ | 677/5773 [1:03:02<7:40:08, 5.42s/it] 12%|█▏ | 677/5773 [1:02:57<7:40:08, 5.42s/it] {'loss': 0.5944, 'learning_rate': 1.9604360133450436e-05, 'epoch': 0.12} 12%|█▏ | 677/5773 [1:03:02<7:40:08, 5.42s/it] {'loss': 0.5944, 'learning_rate': 1.9604360133450436e-05, 'epoch': 0.12} 12%|█▏ | 677/5773 [1:02:57<7:40:08, 5.42s/it] 12%|█▏ | 678/5773 [1:03:08<7:43:54, 5.46s/it] 12%|█▏ | 678/5773 [1:03:02<7:43:54, 5.46s/it] {'loss': 0.6033, 'learning_rate': 1.9602795957086567e-05, 'epoch': 0.12} 12%|█▏ | 678/5773 [1:03:08<7:43:54, 5.46s/it] {'loss': 0.6033, 'learning_rate': 1.9602795957086567e-05, 'epoch': 0.12} 12%|█▏ | 678/5773 
[1:03:02<7:43:54, 5.46s/it] 12%|█▏ | 679/5773 [1:03:13<7:41:10, 5.43s/it] 12%|█▏ | 679/5773 [1:03:08<7:41:11, 5.43s/it] {'loss': 0.6037, 'learning_rate': 1.9601228757455725e-05, 'epoch': 0.12} 12%|█▏ | 679/5773 [1:03:13<7:41:10, 5.43s/it] {'loss': 0.6037, 'learning_rate': 1.9601228757455725e-05, 'epoch': 0.12} 12%|█▏ | 679/5773 [1:03:08<7:41:11, 5.43s/it] 12%|█▏ | 680/5773 [1:03:19<7:42:48, 5.45s/it] 12%|█▏ | 680/5773 [1:03:13<7:42:48, 5.45s/it] {'loss': 0.5999, 'learning_rate': 1.9599658535051312e-05, 'epoch': 0.12} 12%|█▏ | 680/5773 [1:03:19<7:42:48, 5.45s/it] {'loss': 0.5999, 'learning_rate': 1.9599658535051312e-05, 'epoch': 0.12} 12%|█▏ | 680/5773 [1:03:13<7:42:48, 5.45s/it] 12%|█▏ | 681/5773 [1:03:24<7:42:20, 5.45s/it] 12%|█▏ | 681/5773 [1:03:19<7:42:20, 5.45s/it] {'loss': 0.6144, 'learning_rate': 1.959808529036769e-05, 'epoch': 0.12} 12%|█▏ | 681/5773 [1:03:24<7:42:20, 5.45s/it] {'loss': 0.6144, 'learning_rate': 1.959808529036769e-05, 'epoch': 0.12} 12%|█▏ | 681/5773 [1:03:19<7:42:20, 5.45s/it] 12%|█▏ | 682/5773 [1:03:30<7:45:44, 5.49s/it] 12%|█▏ | 682/5773 [1:03:24<7:45:39, 5.49s/it] {'loss': 0.5928, 'learning_rate': 1.9596509023900155e-05, 'epoch': 0.12} 12%|█▏ | 682/5773 [1:03:30<7:45:44, 5.49s/it] {'loss': 0.5928, 'learning_rate': 1.9596509023900155e-05, 'epoch': 0.12} 12%|█▏ | 682/5773 [1:03:24<7:45:39, 5.49s/it] 12%|█▏ | 683/5773 [1:03:35<7:43:28, 5.46s/it] 12%|█▏ | 683/5773 [1:03:30<7:43:29, 5.46s/it] {'loss': 0.6044, 'learning_rate': 1.9594929736144978e-05, 'epoch': 0.12} 12%|█▏ | 683/5773 [1:03:35<7:43:28, 5.46s/it] {'loss': 0.6044, 'learning_rate': 1.9594929736144978e-05, 'epoch': 0.12} 12%|█▏ | 683/5773 [1:03:30<7:43:29, 5.46s/it] 12%|█▏ | 684/5773 [1:03:40<7:39:18, 5.42s/it] 12%|█▏ | 684/5773 [1:03:35<7:39:20, 5.42s/it] {'loss': 0.5948, 'learning_rate': 1.959334742759936e-05, 'epoch': 0.12} 12%|█▏ | 684/5773 [1:03:40<7:39:18, 5.42s/it] {'loss': 0.5948, 'learning_rate': 1.959334742759936e-05, 'epoch': 0.12} 12%|█▏ | 684/5773 [1:03:35<7:39:20, 
5.42s/it] 12%|█▏ | 685/5773 [1:03:46<7:38:42, 5.41s/it] 12%|█▏ | 685/5773 [1:03:40<7:38:42, 5.41s/it] {'loss': 0.604, 'learning_rate': 1.9591762098761467e-05, 'epoch': 0.12} 12%|█▏ | 685/5773 [1:03:46<7:38:42, 5.41s/it] {'loss': 0.604, 'learning_rate': 1.9591762098761467e-05, 'epoch': 0.12} 12%|█▏ | 685/5773 [1:03:40<7:38:42, 5.41s/it] 12%|█▏ | 686/5773 [1:03:51<7:38:09, 5.40s/it] 12%|█▏ | 686/5773 [1:03:46<7:38:09, 5.40s/it] {'loss': 0.5983, 'learning_rate': 1.9590173750130413e-05, 'epoch': 0.12} 12%|█▏ | 686/5773 [1:03:51<7:38:09, 5.40s/it] {'loss': 0.5983, 'learning_rate': 1.9590173750130413e-05, 'epoch': 0.12} 12%|█▏ | 686/5773 [1:03:46<7:38:09, 5.40s/it] 12%|█▏ | 687/5773 [1:03:56<7:36:43, 5.39s/it] 12%|█▏ | 687/5773 [1:03:51<7:36:42, 5.39s/it] {'loss': 0.6169, 'learning_rate': 1.9588582382206258e-05, 'epoch': 0.12} 12%|█▏ | 687/5773 [1:03:56<7:36:43, 5.39s/it] {'loss': 0.6169, 'learning_rate': 1.9588582382206258e-05, 'epoch': 0.12} 12%|█▏ | 687/5773 [1:03:51<7:36:42, 5.39s/it] 12%|█▏ | 688/5773 [1:04:02<7:33:48, 5.35s/it] 12%|█▏ | 688/5773 [1:03:56<7:33:49, 5.35s/it] {'loss': 0.6039, 'learning_rate': 1.9586987995490015e-05, 'epoch': 0.12} 12%|█▏ | 688/5773 [1:04:02<7:33:48, 5.35s/it] {'loss': 0.6039, 'learning_rate': 1.9586987995490015e-05, 'epoch': 0.12} 12%|█▏ | 688/5773 [1:03:56<7:33:49, 5.35s/it] 12%|█▏ | 689/5773 [1:04:07<7:42:01, 5.45s/it] 12%|█▏ | 689/5773 [1:04:02<7:42:01, 5.45s/it] {'loss': 0.6156, 'learning_rate': 1.958539059048365e-05, 'epoch': 0.12} 12%|█▏ | 689/5773 [1:04:07<7:42:01, 5.45s/it] {'loss': 0.6156, 'learning_rate': 1.958539059048365e-05, 'epoch': 0.12} 12%|█▏ | 689/5773 [1:04:02<7:42:01, 5.45s/it] 12%|█▏ | 690/5773 [1:04:13<7:38:20, 5.41s/it] 12%|█▏ | 690/5773 [1:04:07<7:38:19, 5.41s/it] {'loss': 0.6061, 'learning_rate': 1.9583790167690074e-05, 'epoch': 0.12} 12%|█▏ | 690/5773 [1:04:13<7:38:20, 5.41s/it] {'loss': 0.6061, 'learning_rate': 1.9583790167690074e-05, 'epoch': 0.12} 12%|█▏ | 690/5773 [1:04:07<7:38:19, 5.41s/it] 12%|█▏ | 
691/5773 [1:04:18<7:38:07, 5.41s/it] 12%|█▏ | 691/5773 [1:04:13<7:38:07, 5.41s/it] {'loss': 0.5966, 'learning_rate': 1.9582186727613152e-05, 'epoch': 0.12} 12%|█▏ | 691/5773 [1:04:18<7:38:07, 5.41s/it] {'loss': 0.5966, 'learning_rate': 1.9582186727613152e-05, 'epoch': 0.12} 12%|█▏ | 691/5773 [1:04:13<7:38:07, 5.41s/it] 12%|█▏ | 692/5773 [1:04:24<7:41:15, 5.45s/it] 12%|█▏ | 692/5773 [1:04:18<7:41:15, 5.45s/it] {'loss': 0.5805, 'learning_rate': 1.9580580270757702e-05, 'epoch': 0.12} 12%|█▏ | 692/5773 [1:04:24<7:41:15, 5.45s/it] {'loss': 0.5805, 'learning_rate': 1.9580580270757702e-05, 'epoch': 0.12} 12%|█▏ | 692/5773 [1:04:18<7:41:15, 5.45s/it] 12%|█▏ | 693/5773 [1:04:29<7:40:55, 5.44s/it] 12%|█▏ | 693/5773 [1:04:24<7:40:54, 5.44s/it] {'loss': 0.6046, 'learning_rate': 1.9578970797629488e-05, 'epoch': 0.12} 12%|█▏ | 693/5773 [1:04:29<7:40:55, 5.44s/it] {'loss': 0.6046, 'learning_rate': 1.9578970797629488e-05, 'epoch': 0.12} 12%|█▏ | 693/5773 [1:04:24<7:40:54, 5.44s/it] 12%|█▏ | 694/5773 [1:04:35<7:43:46, 5.48s/it] 12%|█▏ | 694/5773 [1:04:29<7:43:46, 5.48s/it] {'loss': 0.6113, 'learning_rate': 1.9577358308735217e-05, 'epoch': 0.12} 12%|█▏ | 694/5773 [1:04:35<7:43:46, 5.48s/it] {'loss': 0.6113, 'learning_rate': 1.9577358308735217e-05, 'epoch': 0.12} 12%|█▏ | 694/5773 [1:04:29<7:43:46, 5.48s/it] 12%|█▏ | 695/5773 [1:04:40<7:47:04, 5.52s/it] 12%|█▏ | 695/5773 [1:04:35<7:47:04, 5.52s/it] {'loss': 0.6156, 'learning_rate': 1.9575742804582557e-05, 'epoch': 0.12} 12%|█▏ | 695/5773 [1:04:40<7:47:04, 5.52s/it] {'loss': 0.6156, 'learning_rate': 1.9575742804582557e-05, 'epoch': 0.12} 12%|█▏ | 695/5773 [1:04:35<7:47:04, 5.52s/it] 12%|█▏ | 696/5773 [1:04:46<7:47:02, 5.52s/it] 12%|█▏ | 696/5773 [1:04:40<7:47:02, 5.52s/it] {'loss': 0.5987, 'learning_rate': 1.957412428568012e-05, 'epoch': 0.12} 12%|█▏ | 696/5773 [1:04:46<7:47:02, 5.52s/it] {'loss': 0.5987, 'learning_rate': 1.957412428568012e-05, 'epoch': 0.12} 12%|█▏ | 696/5773 [1:04:40<7:47:02, 5.52s/it] 12%|█▏ | 697/5773 
[1:04:51<7:46:57, 5.52s/it] 12%|█▏ | 697/5773 [1:04:46<7:46:58, 5.52s/it] {'loss': 0.5913, 'learning_rate': 1.9572502752537462e-05, 'epoch': 0.12} 12%|█▏ | 697/5773 [1:04:51<7:46:57, 5.52s/it] {'loss': 0.5913, 'learning_rate': 1.9572502752537462e-05, 'epoch': 0.12} 12%|█▏ | 697/5773 [1:04:46<7:46:58, 5.52s/it] 12%|█▏ | 698/5773 [1:04:57<7:43:28, 5.48s/it] 12%|█▏ | 698/5773 [1:04:51<7:43:28, 5.48s/it] {'loss': 0.5947, 'learning_rate': 1.9570878205665104e-05, 'epoch': 0.12} 12%|█▏ | 698/5773 [1:04:57<7:43:28, 5.48s/it] {'loss': 0.5947, 'learning_rate': 1.9570878205665104e-05, 'epoch': 0.12} 12%|█▏ | 698/5773 [1:04:51<7:43:28, 5.48s/it] 12%|█▏ | 699/5773 [1:05:02<7:41:50, 5.46s/it] 12%|█▏ | 699/5773 [1:04:57<7:41:50, 5.46s/it] {'loss': 0.6079, 'learning_rate': 1.9569250645574498e-05, 'epoch': 0.12} 12%|█▏ | 699/5773 [1:05:02<7:41:50, 5.46s/it] {'loss': 0.6079, 'learning_rate': 1.9569250645574498e-05, 'epoch': 0.12} 12%|█▏ | 699/5773 [1:04:57<7:41:50, 5.46s/it] 8 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 12%|█▏ | 700/5773 [1:05:07<7:39:08, 5.43s/it] 13 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
12%|█▏ | 700/5773 [1:05:02<7:39:09, 5.43s/it] {'loss': 0.595, 'learning_rate': 1.956762007277805e-05, 'epoch': 0.12} 12%|█▏ | 700/5773 [1:05:08<7:39:08, 5.43s/it] {'loss': 0.595, 'learning_rate': 1.956762007277805e-05, 'epoch': 0.12} 12%|█▏ | 700/5773 [1:05:02<7:39:09, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 12%|█▏ | 701/5773 [1:05:28<14:02:10, 9.96s/it] 12%|█▏ | 701/5773 [1:05:23<14:02:09, 9.96s/it] {'loss': 0.6072, 'learning_rate': 1.9565986487789124e-05, 'epoch': 0.12} 12%|█▏ | 701/5773 [1:05:28<14:02:10, 9.96s/it] {'loss': 0.6072, 'learning_rate': 1.9565986487789124e-05, 'epoch': 0.12} 12%|█▏ | 701/5773 [1:05:23<14:02:09, 9.96s/it] 12%|█▏ | 702/5773 [1:05:28<12:07:36, 8.61s/it] 12%|█▏ | 702/5773 [1:05:33<12:07:39, 8.61s/it] {'loss': 0.6025, 'learning_rate': 1.9564349891122017e-05, 'epoch': 0.12} 12%|█▏ | 702/5773 [1:05:33<12:07:39, 8.61s/it] {'loss': 0.6025, 'learning_rate': 1.9564349891122017e-05, 'epoch': 0.12} 12%|█▏ | 702/5773 [1:05:28<12:07:36, 8.61s/it] 12%|█▏ | 703/5773 [1:05:34<10:59:50, 7.81s/it] 12%|█▏ | 703/5773 [1:05:39<10:59:50, 7.81s/it] {'loss': 0.6233, 'learning_rate': 1.9562710283291988e-05, 'epoch': 0.12} 12%|█▏ | 703/5773 [1:05:34<10:59:50, 7.81s/it]{'loss': 0.6233, 'learning_rate': 1.9562710283291988e-05, 'epoch': 0.12} 12%|█▏ | 703/5773 [1:05:39<10:59:50, 7.81s/it] 12%|█▏ | 704/5773 [1:05:45<10:04:01, 7.15s/it] 12%|█▏ | 704/5773 [1:05:40<10:04:01, 7.15s/it] {'loss': 0.5939, 'learning_rate': 1.9561067664815233e-05, 'epoch': 0.12} 12%|█▏ | 704/5773 [1:05:45<10:04:01, 7.15s/it] {'loss': 0.5939, 'learning_rate': 1.9561067664815233e-05, 'epoch': 0.12} 12%|█▏ | 704/5773 [1:05:40<10:04:01, 7.15s/it] 12%|█▏ | 705/5773 [1:05:51<9:21:15, 6.64s/it] 12%|█▏ | 705/5773 [1:05:45<9:21:16, 6.64s/it] {'loss': 0.6028, 'learning_rate': 1.9559422036208905e-05, 'epoch': 0.12} 12%|█▏ | 705/5773 [1:05:51<9:21:15, 6.64s/it] {'loss': 0.6028, 'learning_rate': 1.9559422036208905e-05, 'epoch': 0.12} 12%|█▏ | 705/5773 [1:05:45<9:21:16, 6.64s/it] 12%|█▏ | 706/5773 [1:05:56<8:49:30, 6.27s/it] 12%|█▏ | 706/5773 [1:05:50<8:49:30, 6.27s/it] {'loss': 0.6115, 'learning_rate': 1.95577733979911e-05, 'epoch': 0.12} 12%|█▏ | 706/5773 [1:05:56<8:49:30, 6.27s/it] {'loss': 0.6115, 'learning_rate': 1.95577733979911e-05, 'epoch': 0.12} 12%|█▏ | 706/5773 [1:05:50<8:49:30, 
6.27s/it] 12%|█▏ | 707/5773 [1:06:02<8:32:38, 6.07s/it] 12%|█▏ | 707/5773 [1:05:56<8:32:39, 6.07s/it] {'loss': 0.6081, 'learning_rate': 1.9556121750680857e-05, 'epoch': 0.12} 12%|█▏ | 707/5773 [1:06:02<8:32:38, 6.07s/it] {'loss': 0.6081, 'learning_rate': 1.9556121750680857e-05, 'epoch': 0.12} 12%|█▏ | 707/5773 [1:05:56<8:32:39, 6.07s/it] 12%|█▏ | 708/5773 [1:06:07<8:13:35, 5.85s/it] 12%|█▏ | 708/5773 [1:06:01<8:13:35, 5.85s/it] {'loss': 0.5941, 'learning_rate': 1.9554467094798173e-05, 'epoch': 0.12} 12%|█▏ | 708/5773 [1:06:07<8:13:35, 5.85s/it] {'loss': 0.5941, 'learning_rate': 1.9554467094798173e-05, 'epoch': 0.12} 12%|█▏ | 708/5773 [1:06:01<8:13:35, 5.85s/it] 12%|█▏ | 709/5773 [1:06:12<8:06:02, 5.76s/it] 12%|█▏ | 709/5773 [1:06:07<8:06:02, 5.76s/it] {'loss': 0.6062, 'learning_rate': 1.9552809430863985e-05, 'epoch': 0.12} 12%|█▏ | 709/5773 [1:06:12<8:06:02, 5.76s/it] {'loss': 0.6062, 'learning_rate': 1.9552809430863985e-05, 'epoch': 0.12} 12%|█▏ | 709/5773 [1:06:07<8:06:02, 5.76s/it] 12%|█▏ | 710/5773 [1:06:18<7:59:38, 5.68s/it] 12%|█▏ | 710/5773 [1:06:12<7:59:38, 5.68s/it] {'loss': 0.6072, 'learning_rate': 1.9551148759400177e-05, 'epoch': 0.12} 12%|█▏ | 710/5773 [1:06:18<7:59:38, 5.68s/it] {'loss': 0.6072, 'learning_rate': 1.9551148759400177e-05, 'epoch': 0.12} 12%|█▏ | 710/5773 [1:06:12<7:59:38, 5.68s/it] 12%|█▏ | 711/5773 [1:06:23<7:50:13, 5.57s/it] 12%|█▏ | 711/5773 [1:06:18<7:50:13, 5.57s/it] {'loss': 0.6063, 'learning_rate': 1.954948508092958e-05, 'epoch': 0.12} 12%|█▏ | 711/5773 [1:06:23<7:50:13, 5.57s/it] {'loss': 0.6063, 'learning_rate': 1.954948508092958e-05, 'epoch': 0.12} 12%|█▏ | 711/5773 [1:06:18<7:50:13, 5.57s/it] 12%|█▏ | 712/5773 [1:06:29<7:52:07, 5.60s/it] 12%|█▏ | 712/5773 [1:06:23<7:52:07, 5.60s/it] {'loss': 0.5871, 'learning_rate': 1.954781839597598e-05, 'epoch': 0.12} 12%|█▏ | 712/5773 [1:06:29<7:52:07, 5.60s/it] {'loss': 0.5871, 'learning_rate': 1.954781839597598e-05, 'epoch': 0.12} 12%|█▏ | 712/5773 [1:06:23<7:52:07, 5.60s/it] 12%|█▏ | 
713/5773 [1:06:34<7:48:07, 5.55s/it] 12%|█▏ | 713/5773 [1:06:29<7:48:07, 5.55s/it] {'loss': 0.6166, 'learning_rate': 1.9546148705064097e-05, 'epoch': 0.12} 12%|█▏ | 713/5773 [1:06:34<7:48:07, 5.55s/it] {'loss': 0.6166, 'learning_rate': 1.9546148705064097e-05, 'epoch': 0.12} 12%|█▏ | 713/5773 [1:06:29<7:48:07, 5.55s/it] 12%|█▏ | 714/5773 [1:06:40<7:48:49, 5.56s/it] 12%|█▏ | 714/5773 [1:06:34<7:48:49, 5.56s/it] {'loss': 0.6027, 'learning_rate': 1.95444760087196e-05, 'epoch': 0.12} 12%|█▏ | 714/5773 [1:06:40<7:48:49, 5.56s/it] {'loss': 0.6027, 'learning_rate': 1.95444760087196e-05, 'epoch': 0.12} 12%|█▏ | 714/5773 [1:06:34<7:48:49, 5.56s/it] 12%|█▏ | 715/5773 [1:06:45<7:46:34, 5.53s/it] 12%|█▏ | 715/5773 [1:06:40<7:46:35, 5.53s/it] {'loss': 0.6054, 'learning_rate': 1.9542800307469116e-05, 'epoch': 0.12} 12%|█▏ | 715/5773 [1:06:45<7:46:34, 5.53s/it] {'loss': 0.6054, 'learning_rate': 1.9542800307469116e-05, 'epoch': 0.12} 12%|█▏ | 715/5773 [1:06:40<7:46:35, 5.53s/it] 12%|█▏ | 716/5773 [1:06:51<7:49:22, 5.57s/it] 12%|█▏ | 716/5773 [1:06:45<7:49:22, 5.57s/it] {'loss': 0.5932, 'learning_rate': 1.95411216018402e-05, 'epoch': 0.12} 12%|█▏ | 716/5773 [1:06:51<7:49:22, 5.57s/it] {'loss': 0.5932, 'learning_rate': 1.95411216018402e-05, 'epoch': 0.12} 12%|█▏ | 716/5773 [1:06:45<7:49:22, 5.57s/it] 12%|█▏ | 717/5773 [1:06:56<7:46:21, 5.53s/it] 12%|█▏ | 717/5773 [1:06:51<7:46:20, 5.53s/it]{'loss': 0.6028, 'learning_rate': 1.9539439892361375e-05, 'epoch': 0.12} 12%|█▏ | 717/5773 [1:06:56<7:46:21, 5.53s/it] {'loss': 0.6028, 'learning_rate': 1.9539439892361375e-05, 'epoch': 0.12} 12%|█▏ | 717/5773 [1:06:51<7:46:20, 5.53s/it] 12%|█▏ | 718/5773 [1:07:02<7:40:29, 5.47s/it] 12%|█▏ | 718/5773 [1:06:56<7:40:29, 5.47s/it] {'loss': 0.5864, 'learning_rate': 1.9537755179562086e-05, 'epoch': 0.12} 12%|█▏ | 718/5773 [1:07:02<7:40:29, 5.47s/it] {'loss': 0.5864, 'learning_rate': 1.9537755179562086e-05, 'epoch': 0.12} 12%|█▏ | 718/5773 [1:06:56<7:40:29, 5.47s/it] 12%|█▏ | 719/5773 [1:07:07<7:35:11, 
5.40s/it] 12%|█▏ | 719/5773 [1:07:02<7:35:11, 5.40s/it] {'loss': 0.6003, 'learning_rate': 1.9536067463972733e-05, 'epoch': 0.12} 12%|█▏ | 719/5773 [1:07:07<7:35:11, 5.40s/it] {'loss': 0.6003, 'learning_rate': 1.9536067463972733e-05, 'epoch': 0.12} 12%|█▏ | 719/5773 [1:07:02<7:35:11, 5.40s/it] 12%|█▏ | 720/5773 [1:07:12<7:36:37, 5.42s/it] 12%|█▏ | 720/5773 [1:07:07<7:36:37, 5.42s/it] {'loss': 0.6181, 'learning_rate': 1.953437674612467e-05, 'epoch': 0.12} 12%|█▏ | 720/5773 [1:07:13<7:36:37, 5.42s/it] {'loss': 0.6181, 'learning_rate': 1.953437674612467e-05, 'epoch': 0.12} 12%|█▏ | 720/5773 [1:07:07<7:36:37, 5.42s/it] 12%|█▏ | 721/5773 [1:07:18<7:34:26, 5.40s/it] 12%|█▏ | 721/5773 [1:07:12<7:34:26, 5.40s/it] {'loss': 0.5947, 'learning_rate': 1.9532683026550187e-05, 'epoch': 0.12} 12%|█▏ | 721/5773 [1:07:18<7:34:26, 5.40s/it] {'loss': 0.5947, 'learning_rate': 1.9532683026550187e-05, 'epoch': 0.12} 12%|█▏ | 721/5773 [1:07:12<7:34:26, 5.40s/it] 13%|█▎ | 722/5773 [1:07:23<7:37:18, 5.43s/it] 13%|█▎ | 722/5773 [1:07:18<7:37:18, 5.43s/it] {'loss': 0.5942, 'learning_rate': 1.9530986305782517e-05, 'epoch': 0.13} 13%|█▎ | 722/5773 [1:07:23<7:37:18, 5.43s/it] {'loss': 0.5942, 'learning_rate': 1.9530986305782517e-05, 'epoch': 0.13} 13%|█▎ | 722/5773 [1:07:18<7:37:18, 5.43s/it] 13%|█▎ | 723/5773 [1:07:29<7:35:07, 5.41s/it] 13%|█▎ | 723/5773 [1:07:23<7:35:07, 5.41s/it] {'loss': 0.5894, 'learning_rate': 1.9529286584355848e-05, 'epoch': 0.13} 13%|█▎ | 723/5773 [1:07:29<7:35:07, 5.41s/it] {'loss': 0.5894, 'learning_rate': 1.9529286584355848e-05, 'epoch': 0.13} 13%|█▎ | 723/5773 [1:07:23<7:35:07, 5.41s/it] 13%|█▎ | 724/5773 [1:07:34<7:35:45, 5.42s/it] 13%|█▎ | 724/5773 [1:07:29<7:35:45, 5.42s/it] {'loss': 0.5979, 'learning_rate': 1.9527583862805303e-05, 'epoch': 0.13} 13%|█▎ | 724/5773 [1:07:34<7:35:45, 5.42s/it] {'loss': 0.5979, 'learning_rate': 1.9527583862805303e-05, 'epoch': 0.13} 13%|█▎ | 724/5773 [1:07:29<7:35:45, 5.42s/it] 13%|█▎ | 725/5773 [1:07:39<7:32:21, 5.38s/it] 13%|█▎ | 
725/5773 [1:07:34<7:32:21, 5.38s/it] {'loss': 0.6152, 'learning_rate': 1.9525878141666954e-05, 'epoch': 0.13} 13%|█▎ | 725/5773 [1:07:39<7:32:21, 5.38s/it] {'loss': 0.6152, 'learning_rate': 1.9525878141666954e-05, 'epoch': 0.13} 13%|█▎ | 725/5773 [1:07:34<7:32:21, 5.38s/it] 13%|█▎ | 726/5773 [1:07:45<7:31:36, 5.37s/it] 13%|█▎ | 726/5773 [1:07:39<7:31:36, 5.37s/it] {'loss': 0.5895, 'learning_rate': 1.9524169421477818e-05, 'epoch': 0.13} 13%|█▎ | 726/5773 [1:07:45<7:31:36, 5.37s/it] {'loss': 0.5895, 'learning_rate': 1.9524169421477818e-05, 'epoch': 0.13} 13%|█▎ | 726/5773 [1:07:39<7:31:36, 5.37s/it] 13%|█▎ | 727/5773 [1:07:50<7:36:48, 5.43s/it] 13%|█▎ | 727/5773 [1:07:45<7:36:48, 5.43s/it] {'loss': 0.6003, 'learning_rate': 1.952245770277585e-05, 'epoch': 0.13} 13%|█▎ | 727/5773 [1:07:50<7:36:48, 5.43s/it] {'loss': 0.6003, 'learning_rate': 1.952245770277585e-05, 'epoch': 0.13} 13%|█▎ | 727/5773 [1:07:45<7:36:48, 5.43s/it] 13%|█▎ | 728/5773 [1:07:56<7:40:45, 5.48s/it] 13%|█▎ | 728/5773 [1:07:50<7:40:45, 5.48s/it] {'loss': 0.5944, 'learning_rate': 1.9520742986099958e-05, 'epoch': 0.13} 13%|█▎ | 728/5773 [1:07:56<7:40:45, 5.48s/it] {'loss': 0.5944, 'learning_rate': 1.9520742986099958e-05, 'epoch': 0.13} 13%|█▎ | 728/5773 [1:07:50<7:40:45, 5.48s/it] 13%|█▎ | 729/5773 [1:08:01<7:35:05, 5.41s/it] 13%|█▎ | 729/5773 [1:07:56<7:35:05, 5.41s/it] {'loss': 0.6014, 'learning_rate': 1.9519025271989986e-05, 'epoch': 0.13} 13%|█▎ | 729/5773 [1:08:01<7:35:05, 5.41s/it] {'loss': 0.6014, 'learning_rate': 1.9519025271989986e-05, 'epoch': 0.13} 13%|█▎ | 729/5773 [1:07:56<7:35:05, 5.41s/it] 13%|█▎ | 730/5773 [1:08:07<7:33:47, 5.40s/it] 13%|█▎ | 730/5773 [1:08:01<7:33:46, 5.40s/it] {'loss': 0.6043, 'learning_rate': 1.951730456098673e-05, 'epoch': 0.13} 13%|█▎ | 730/5773 [1:08:07<7:33:47, 5.40s/it] {'loss': 0.6043, 'learning_rate': 1.951730456098673e-05, 'epoch': 0.13} 13%|█▎ | 730/5773 [1:08:01<7:33:46, 5.40s/it] 13%|█▎ | 731/5773 [1:08:06<7:32:58, 5.39s/it] 13%|█▎ | 731/5773 
[1:08:12<7:33:00, 5.39s/it] {'loss': 0.5906, 'learning_rate': 1.9515580853631922e-05, 'epoch': 0.13} 13%|█▎ | 731/5773 [1:08:12<7:33:00, 5.39s/it] {'loss': 0.5906, 'learning_rate': 1.9515580853631922e-05, 'epoch': 0.13} 13%|█▎ | 731/5773 [1:08:06<7:32:58, 5.39s/it] 13%|█▎ | 732/5773 [1:08:17<7:30:53, 5.37s/it] 13%|█▎ | 732/5773 [1:08:12<7:30:54, 5.37s/it] {'loss': 0.5985, 'learning_rate': 1.9513854150468238e-05, 'epoch': 0.13} 13%|█▎ | 732/5773 [1:08:17<7:30:53, 5.37s/it] {'loss': 0.5985, 'learning_rate': 1.9513854150468238e-05, 'epoch': 0.13} 13%|█▎ | 732/5773 [1:08:12<7:30:54, 5.37s/it] 13%|█▎ | 733/5773 [1:08:23<7:32:13, 5.38s/it] 13%|█▎ | 733/5773 [1:08:17<7:32:13, 5.38s/it] {'loss': 0.6042, 'learning_rate': 1.9512124452039307e-05, 'epoch': 0.13} 13%|█▎ | 733/5773 [1:08:23<7:32:13, 5.38s/it] {'loss': 0.6042, 'learning_rate': 1.9512124452039307e-05, 'epoch': 0.13} 13%|█▎ | 733/5773 [1:08:17<7:32:13, 5.38s/it] 13%|█▎ | 734/5773 [1:08:28<7:30:32, 5.36s/it] 13%|█▎ | 734/5773 [1:08:22<7:30:32, 5.36s/it] {'loss': 0.5941, 'learning_rate': 1.9510391758889683e-05, 'epoch': 0.13} 13%|█▎ | 734/5773 [1:08:28<7:30:32, 5.36s/it] {'loss': 0.5941, 'learning_rate': 1.9510391758889683e-05, 'epoch': 0.13} 13%|█▎ | 734/5773 [1:08:22<7:30:32, 5.36s/it] 13%|█▎ | 735/5773 [1:08:33<7:31:39, 5.38s/it] 13%|█▎ | 735/5773 [1:08:28<7:31:40, 5.38s/it] {'loss': 0.5808, 'learning_rate': 1.9508656071564883e-05, 'epoch': 0.13} 13%|█▎ | 735/5773 [1:08:33<7:31:39, 5.38s/it] {'loss': 0.5808, 'learning_rate': 1.9508656071564883e-05, 'epoch': 0.13} 13%|█▎ | 735/5773 [1:08:28<7:31:40, 5.38s/it] 13%|█▎ | 736/5773 [1:08:39<7:34:57, 5.42s/it] 13%|█▎ | 736/5773 [1:08:33<7:34:57, 5.42s/it] {'loss': 0.5914, 'learning_rate': 1.950691739061135e-05, 'epoch': 0.13} 13%|█▎ | 736/5773 [1:08:39<7:34:57, 5.42s/it] {'loss': 0.5914, 'learning_rate': 1.950691739061135e-05, 'epoch': 0.13} 13%|█▎ | 736/5773 [1:08:33<7:34:57, 5.42s/it] 13%|█▎ | 737/5773 [1:08:44<7:37:02, 5.45s/it] 13%|█▎ | 737/5773 [1:08:39<7:37:03, 
5.45s/it] {'loss': 0.5931, 'learning_rate': 1.9505175716576476e-05, 'epoch': 0.13} 13%|█▎ | 737/5773 [1:08:44<7:37:02, 5.45s/it] {'loss': 0.5931, 'learning_rate': 1.9505175716576476e-05, 'epoch': 0.13} 13%|█▎ | 737/5773 [1:08:39<7:37:03, 5.45s/it] 13%|█▎ | 738/5773 [1:08:50<7:34:40, 5.42s/it] 13%|█▎ | 738/5773 [1:08:44<7:34:39, 5.42s/it] {'loss': 0.6063, 'learning_rate': 1.95034310500086e-05, 'epoch': 0.13} 13%|█▎ | 738/5773 [1:08:50<7:34:40, 5.42s/it] {'loss': 0.6063, 'learning_rate': 1.95034310500086e-05, 'epoch': 0.13} 13%|█▎ | 738/5773 [1:08:44<7:34:39, 5.42s/it] 13%|█▎ | 739/5773 [1:08:55<7:37:50, 5.46s/it] 13%|█▎ | 739/5773 [1:08:50<7:37:49, 5.46s/it] {'loss': 0.5896, 'learning_rate': 1.9501683391457e-05, 'epoch': 0.13} 13%|█▎ | 739/5773 [1:08:55<7:37:50, 5.46s/it] {'loss': 0.5896, 'learning_rate': 1.9501683391457e-05, 'epoch': 0.13} 13%|█▎ | 739/5773 [1:08:50<7:37:49, 5.46s/it] 13%|█▎ | 740/5773 [1:09:01<7:41:05, 5.50s/it] 13%|█▎ | 740/5773 [1:08:55<7:41:06, 5.50s/it] {'loss': 0.6029, 'learning_rate': 1.9499932741471887e-05, 'epoch': 0.13} 13%|█▎ | 740/5773 [1:09:01<7:41:05, 5.50s/it] {'loss': 0.6029, 'learning_rate': 1.9499932741471887e-05, 'epoch': 0.13} 13%|█▎ | 740/5773 [1:08:55<7:41:06, 5.50s/it] 13%|█▎ | 741/5773 [1:09:06<7:40:15, 5.49s/it] 13%|█▎ | 741/5773 [1:09:01<7:40:14, 5.49s/it] {'loss': 0.5809, 'learning_rate': 1.949817910060443e-05, 'epoch': 0.13} 13%|█▎ | 741/5773 [1:09:06<7:40:15, 5.49s/it] {'loss': 0.5809, 'learning_rate': 1.949817910060443e-05, 'epoch': 0.13} 13%|█▎ | 741/5773 [1:09:01<7:40:14, 5.49s/it] 13%|█▎ | 742/5773 [1:09:12<7:41:17, 5.50s/it] 13%|█▎ | 742/5773 [1:09:06<7:41:18, 5.50s/it] {'loss': 0.5898, 'learning_rate': 1.9496422469406725e-05, 'epoch': 0.13} 13%|█▎ | 742/5773 [1:09:12<7:41:17, 5.50s/it] {'loss': 0.5898, 'learning_rate': 1.9496422469406725e-05, 'epoch': 0.13} 13%|█▎ | 742/5773 [1:09:06<7:41:18, 5.50s/it] 13%|█▎ | 743/5773 [1:09:17<7:40:05, 5.49s/it] 13%|█▎ | 743/5773 [1:09:12<7:40:04, 5.49s/it] {'loss': 0.6037, 
'learning_rate': 1.9494662848431816e-05, 'epoch': 0.13} 13%|█▎ | 743/5773 [1:09:17<7:40:05, 5.49s/it] {'loss': 0.6037, 'learning_rate': 1.9494662848431816e-05, 'epoch': 0.13} 13%|█▎ | 743/5773 [1:09:12<7:40:04, 5.49s/it] 13%|█▎ | 744/5773 [1:09:23<7:35:53, 5.44s/it] 13%|█▎ | 744/5773 [1:09:17<7:35:53, 5.44s/it] {'loss': 0.5974, 'learning_rate': 1.9492900238233696e-05, 'epoch': 0.13} 13%|█▎ | 744/5773 [1:09:23<7:35:53, 5.44s/it] {'loss': 0.5974, 'learning_rate': 1.9492900238233696e-05, 'epoch': 0.13} 13%|█▎ | 744/5773 [1:09:17<7:35:53, 5.44s/it] 13%|█▎ | 745/5773 [1:09:28<7:39:43, 5.49s/it] 13%|█▎ | 745/5773 [1:09:23<7:39:44, 5.49s/it] {'loss': 0.6003, 'learning_rate': 1.949113463936728e-05, 'epoch': 0.13} 13%|█▎ | 745/5773 [1:09:28<7:39:43, 5.49s/it] {'loss': 0.6003, 'learning_rate': 1.949113463936728e-05, 'epoch': 0.13} 13%|█▎ | 745/5773 [1:09:23<7:39:44, 5.49s/it] 13%|█▎ | 746/5773 [1:09:34<7:38:56, 5.48s/it] 13%|█▎ | 746/5773 [1:09:28<7:38:56, 5.48s/it] {'loss': 0.602, 'learning_rate': 1.9489366052388443e-05, 'epoch': 0.13} 13%|█▎ | 746/5773 [1:09:34<7:38:56, 5.48s/it] {'loss': 0.602, 'learning_rate': 1.9489366052388443e-05, 'epoch': 0.13} 13%|█▎ | 746/5773 [1:09:28<7:38:56, 5.48s/it] 13%|█▎ | 747/5773 [1:09:39<7:38:55, 5.48s/it] 13%|█▎ | 747/5773 [1:09:34<7:38:55, 5.48s/it] {'loss': 0.5768, 'learning_rate': 1.9487594477853986e-05, 'epoch': 0.13} 13%|█▎ | 747/5773 [1:09:39<7:38:55, 5.48s/it] {'loss': 0.5768, 'learning_rate': 1.9487594477853986e-05, 'epoch': 0.13} 13%|█▎ | 747/5773 [1:09:34<7:38:55, 5.48s/it] 13%|█▎ | 748/5773 [1:09:45<7:42:15, 5.52s/it] 13%|█▎ | 748/5773 [1:09:39<7:42:15, 5.52s/it] {'loss': 0.5965, 'learning_rate': 1.9485819916321663e-05, 'epoch': 0.13} 13%|█▎ | 748/5773 [1:09:45<7:42:15, 5.52s/it] {'loss': 0.5965, 'learning_rate': 1.9485819916321663e-05, 'epoch': 0.13} 13%|█▎ | 748/5773 [1:09:39<7:42:15, 5.52s/it] 13%|█▎ | 749/5773 [1:09:50<7:42:39, 5.53s/it] 13%|█▎ | 749/5773 [1:09:45<7:42:38, 5.53s/it] {'loss': 0.6125, 'learning_rate': 
1.9484042368350163e-05, 'epoch': 0.13} 13%|█▎ | 749/5773 [1:09:50<7:42:39, 5.53s/it] {'loss': 0.6125, 'learning_rate': 1.9484042368350163e-05, 'epoch': 0.13} 13%|█▎ | 749/5773 [1:09:45<7:42:38, 5.53s/it] 14 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 13%|█▎ | 750/5773 [1:09:56<7:41:06, 5.51s/it] 12 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 13%|█▎ | 750/5773 [1:09:50<7:41:07, 5.51s/it] 11 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5988, 'learning_rate': 1.9482261834499107e-05, 'epoch': 0.13} 13%|█▎ | 750/5773 [1:09:56<7:41:06, 5.51s/it] {'loss': 0.5988, 'learning_rate': 1.9482261834499107e-05, 'epoch': 0.13} 13%|█▎ | 750/5773 [1:09:50<7:41:07, 5.51s/it] 13%|█▎ | 751/5773 [1:10:01<7:42:15, 5.52s/it] 13%|█▎ | 751/5773 [1:09:56<7:42:14, 5.52s/it] {'loss': 0.5981, 'learning_rate': 1.9480478315329073e-05, 'epoch': 0.13} 13%|█▎ | 751/5773 [1:10:01<7:42:15, 5.52s/it] {'loss': 0.5981, 'learning_rate': 1.9480478315329073e-05, 'epoch': 0.13} 13%|█▎ | 751/5773 [1:09:56<7:42:14, 5.52s/it] 13%|█▎ | 752/5773 [1:10:07<7:39:06, 5.49s/it] 13%|█▎ | 752/5773 [1:10:01<7:39:07, 5.49s/it] {'loss': 0.6013, 'learning_rate': 1.9478691811401563e-05, 'epoch': 0.13} 13%|█▎ | 752/5773 [1:10:07<7:39:06, 5.49s/it] {'loss': 0.6013, 'learning_rate': 1.9478691811401563e-05, 'epoch': 0.13} 13%|█▎ | 752/5773 [1:10:01<7:39:07, 5.49s/it] 13%|█▎ | 753/5773 [1:10:12<7:41:32, 5.52s/it] 13%|█▎ | 753/5773 [1:10:07<7:41:33, 5.52s/it] {'loss': 0.5986, 'learning_rate': 1.9476902323279034e-05, 'epoch': 0.13} 13%|█▎ | 753/5773 [1:10:12<7:41:32, 5.52s/it] {'loss': 0.5986, 'learning_rate': 1.9476902323279034e-05, 'epoch': 0.13} 13%|█▎ | 753/5773 [1:10:07<7:41:33, 5.52s/it] 13%|█▎ | 754/5773 [1:10:18<7:37:21, 5.47s/it] 13%|█▎ | 754/5773 [1:10:12<7:37:21, 5.47s/it] {'loss': 0.6134, 'learning_rate': 1.9475109851524862e-05, 'epoch': 0.13} 13%|█▎ | 754/5773 [1:10:18<7:37:21, 5.47s/it] {'loss': 0.6134, 'learning_rate': 1.9475109851524862e-05, 'epoch': 0.13} 13%|█▎ | 754/5773 [1:10:12<7:37:21, 5.47s/it] 13%|█▎ | 755/5773 [1:10:23<7:38:32, 5.48s/it] 13%|█▎ | 755/5773 [1:10:18<7:38:32, 5.48s/it] {'loss': 0.6084, 'learning_rate': 1.9473314396703385e-05, 'epoch': 0.13} 13%|█▎ | 755/5773 [1:10:23<7:38:32, 5.48s/it] {'loss': 0.6084, 'learning_rate': 1.9473314396703385e-05, 'epoch': 0.13} 13%|█▎ | 755/5773 [1:10:18<7:38:32, 5.48s/it] 13%|█▎ | 756/5773 [1:10:29<7:44:47, 5.56s/it] 13%|█▎ | 756/5773 [1:10:23<7:44:47, 5.56s/it] {'loss': 0.5949, 
'learning_rate': 1.9471515959379866e-05, 'epoch': 0.13} 13%|█▎ | 756/5773 [1:10:29<7:44:47, 5.56s/it] {'loss': 0.5949, 'learning_rate': 1.9471515959379866e-05, 'epoch': 0.13} 13%|█▎ | 756/5773 [1:10:23<7:44:47, 5.56s/it] 13%|█▎ | 757/5773 [1:10:35<7:43:18, 5.54s/it] 13%|█▎ | 757/5773 [1:10:29<7:43:17, 5.54s/it] {'loss': 0.6018, 'learning_rate': 1.946971454012051e-05, 'epoch': 0.13} 13%|█▎ | 757/5773 [1:10:35<7:43:18, 5.54s/it] {'loss': 0.6018, 'learning_rate': 1.946971454012051e-05, 'epoch': 0.13} 13%|█▎ | 757/5773 [1:10:29<7:43:17, 5.54s/it] 13%|█▎ | 758/5773 [1:10:40<7:40:36, 5.51s/it] 13%|█▎ | 758/5773 [1:10:34<7:40:36, 5.51s/it] {'loss': 0.5911, 'learning_rate': 1.946791013949246e-05, 'epoch': 0.13} 13%|█▎ | 758/5773 [1:10:40<7:40:36, 5.51s/it] {'loss': 0.5911, 'learning_rate': 1.946791013949246e-05, 'epoch': 0.13} 13%|█▎ | 758/5773 [1:10:34<7:40:36, 5.51s/it] 13%|█▎ | 759/5773 [1:10:45<7:34:08, 5.43s/it] 13%|█▎ | 759/5773 [1:10:40<7:34:07, 5.43s/it] {'loss': 0.5884, 'learning_rate': 1.94661027580638e-05, 'epoch': 0.13} 13%|█▎ | 759/5773 [1:10:45<7:34:08, 5.43s/it] {'loss': 0.5884, 'learning_rate': 1.94661027580638e-05, 'epoch': 0.13} 13%|█▎ | 759/5773 [1:10:40<7:34:07, 5.43s/it] 13%|█▎ | 760/5773 [1:10:51<7:34:38, 5.44s/it] 13%|█▎ | 760/5773 [1:10:45<7:34:37, 5.44s/it] {'loss': 0.6006, 'learning_rate': 1.9464292396403553e-05, 'epoch': 0.13} 13%|█▎ | 760/5773 [1:10:51<7:34:38, 5.44s/it] {'loss': 0.6006, 'learning_rate': 1.9464292396403553e-05, 'epoch': 0.13} 13%|█▎ | 760/5773 [1:10:45<7:34:37, 5.44s/it] 13%|█▎ | 761/5773 [1:10:56<7:33:15, 5.43s/it] 13%|█▎ | 761/5773 [1:10:51<7:33:16, 5.43s/it] {'loss': 0.6117, 'learning_rate': 1.9462479055081676e-05, 'epoch': 0.13} 13%|█▎ | 761/5773 [1:10:56<7:33:15, 5.43s/it] {'loss': 0.6117, 'learning_rate': 1.9462479055081676e-05, 'epoch': 0.13} 13%|█▎ | 761/5773 [1:10:51<7:33:16, 5.43s/it] 13%|█▎ | 762/5773 [1:11:02<7:33:58, 5.44s/it] 13%|█▎ | 762/5773 [1:10:56<7:33:58, 5.44s/it] {'loss': 0.5862, 'learning_rate': 
1.946066273466907e-05, 'epoch': 0.13} 13%|█▎ | 762/5773 [1:11:02<7:33:58, 5.44s/it] {'loss': 0.5862, 'learning_rate': 1.946066273466907e-05, 'epoch': 0.13} 13%|█▎ | 762/5773 [1:10:56<7:33:58, 5.44s/it] 13%|█▎ | 763/5773 [1:11:07<7:35:23, 5.45s/it] 13%|█▎ | 763/5773 [1:11:01<7:35:23, 5.45s/it] {'loss': 0.6047, 'learning_rate': 1.945884343573757e-05, 'epoch': 0.13} 13%|█▎ | 763/5773 [1:11:07<7:35:23, 5.45s/it] {'loss': 0.6047, 'learning_rate': 1.945884343573757e-05, 'epoch': 0.13} 13%|█▎ | 763/5773 [1:11:01<7:35:23, 5.45s/it] 13%|█▎ | 764/5773 [1:11:12<7:34:10, 5.44s/it] 13%|█▎ | 764/5773 [1:11:07<7:34:10, 5.44s/it] {'loss': 0.6001, 'learning_rate': 1.9457021158859946e-05, 'epoch': 0.13} 13%|█▎ | 764/5773 [1:11:12<7:34:10, 5.44s/it] {'loss': 0.6001, 'learning_rate': 1.9457021158859946e-05, 'epoch': 0.13} 13%|█▎ | 764/5773 [1:11:07<7:34:10, 5.44s/it] 13%|█▎ | 765/5773 [1:11:18<7:30:00, 5.39s/it] 13%|█▎ | 765/5773 [1:11:12<7:30:00, 5.39s/it] {'loss': 0.6115, 'learning_rate': 1.9455195904609913e-05, 'epoch': 0.13} 13%|█▎ | 765/5773 [1:11:18<7:30:00, 5.39s/it] {'loss': 0.6115, 'learning_rate': 1.9455195904609913e-05, 'epoch': 0.13} 13%|█▎ | 765/5773 [1:11:12<7:30:00, 5.39s/it] 13%|█▎ | 766/5773 [1:11:23<7:28:11, 5.37s/it] 13%|█▎ | 766/5773 [1:11:17<7:28:11, 5.37s/it] {'loss': 0.5909, 'learning_rate': 1.9453367673562116e-05, 'epoch': 0.13} 13%|█▎ | 766/5773 [1:11:23<7:28:11, 5.37s/it] {'loss': 0.5909, 'learning_rate': 1.9453367673562116e-05, 'epoch': 0.13} 13%|█▎ | 766/5773 [1:11:17<7:28:11, 5.37s/it] 13%|█▎ | 767/5773 [1:11:29<7:32:36, 5.42s/it] 13%|█▎ | 767/5773 [1:11:23<7:32:35, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.9451536466292142e-05, 'epoch': 0.13} 13%|█▎ | 767/5773 [1:11:29<7:32:36, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.9451536466292142e-05, 'epoch': 0.13} 13%|█▎ | 767/5773 [1:11:23<7:32:35, 5.42s/it] 13%|█▎ | 768/5773 [1:11:34<7:35:35, 5.46s/it] 13%|█▎ | 768/5773 [1:11:29<7:35:34, 5.46s/it] {'loss': 0.5966, 'learning_rate': 1.9449702283376516e-05, 
'epoch': 0.13} 13%|█▎ | 768/5773 [1:11:34<7:35:35, 5.46s/it] {'loss': 0.5966, 'learning_rate': 1.9449702283376516e-05, 'epoch': 0.13} 13%|█▎ | 768/5773 [1:11:29<7:35:34, 5.46s/it] 13%|█▎ | 769/5773 [1:11:40<7:40:54, 5.53s/it] 13%|█▎ | 769/5773 [1:11:34<7:40:54, 5.53s/it] {'loss': 0.5931, 'learning_rate': 1.9447865125392696e-05, 'epoch': 0.13} 13%|█▎ | 769/5773 [1:11:40<7:40:54, 5.53s/it] {'loss': 0.5931, 'learning_rate': 1.9447865125392696e-05, 'epoch': 0.13} 13%|█▎ | 769/5773 [1:11:34<7:40:54, 5.53s/it] 13%|█▎ | 770/5773 [1:11:45<7:37:23, 5.49s/it] 13%|█▎ | 770/5773 [1:11:40<7:37:23, 5.49s/it] {'loss': 0.5889, 'learning_rate': 1.9446024992919077e-05, 'epoch': 0.13} 13%|█▎ | 770/5773 [1:11:45<7:37:23, 5.49s/it] {'loss': 0.5889, 'learning_rate': 1.9446024992919077e-05, 'epoch': 0.13} 13%|█▎ | 770/5773 [1:11:40<7:37:23, 5.49s/it] 13%|█▎ | 771/5773 [1:11:51<7:38:06, 5.50s/it] 13%|█▎ | 771/5773 [1:11:45<7:38:06, 5.50s/it] {'loss': 0.5979, 'learning_rate': 1.944418188653499e-05, 'epoch': 0.13} 13%|█▎ | 771/5773 [1:11:51<7:38:06, 5.50s/it] {'loss': 0.5979, 'learning_rate': 1.944418188653499e-05, 'epoch': 0.13} 13%|█▎ | 771/5773 [1:11:45<7:38:06, 5.50s/it] 13%|█▎ | 772/5773 [1:11:56<7:39:06, 5.51s/it] 13%|█▎ | 772/5773 [1:11:51<7:39:06, 5.51s/it]{'loss': 0.6004, 'learning_rate': 1.9442335806820706e-05, 'epoch': 0.13} {'loss': 0.6004, 'learning_rate': 1.9442335806820706e-05, 'epoch': 0.13} 13%|█▎ | 772/5773 [1:11:56<7:39:06, 5.51s/it] 13%|█▎ | 772/5773 [1:11:51<7:39:06, 5.51s/it] 13%|█▎ | 773/5773 [1:12:02<7:38:40, 5.50s/it] 13%|█▎ | 773/5773 [1:11:56<7:38:40, 5.50s/it] {'loss': 0.596, 'learning_rate': 1.944048675435743e-05, 'epoch': 0.13} 13%|█▎ | 773/5773 [1:12:02<7:38:40, 5.50s/it] {'loss': 0.596, 'learning_rate': 1.944048675435743e-05, 'epoch': 0.13} 13%|█▎ | 773/5773 [1:11:56<7:38:40, 5.50s/it] 13%|█▎ | 774/5773 [1:12:07<7:33:12, 5.44s/it] 13%|█▎ | 774/5773 [1:12:01<7:33:13, 5.44s/it] {'loss': 0.5944, 'learning_rate': 1.9438634729727304e-05, 'epoch': 0.13} 13%|█▎ | 
774/5773 [1:12:07<7:33:12, 5.44s/it] {'loss': 0.5944, 'learning_rate': 1.9438634729727304e-05, 'epoch': 0.13} 13%|█▎ | 774/5773 [1:12:01<7:33:13, 5.44s/it] 13%|█▎ | 775/5773 [1:12:13<7:35:44, 5.47s/it] 13%|█▎ | 775/5773 [1:12:07<7:35:44, 5.47s/it] {'loss': 0.5853, 'learning_rate': 1.9436779733513398e-05, 'epoch': 0.13} 13%|█▎ | 775/5773 [1:12:13<7:35:44, 5.47s/it] {'loss': 0.5853, 'learning_rate': 1.9436779733513398e-05, 'epoch': 0.13} 13%|█▎ | 775/5773 [1:12:07<7:35:44, 5.47s/it] 13%|█▎ | 776/5773 [1:12:18<7:30:05, 5.40s/it] 13%|█▎ | 776/5773 [1:12:12<7:30:04, 5.40s/it] {'loss': 0.6009, 'learning_rate': 1.9434921766299733e-05, 'epoch': 0.13} 13%|█▎ | 776/5773 [1:12:18<7:30:05, 5.40s/it] {'loss': 0.6009, 'learning_rate': 1.9434921766299733e-05, 'epoch': 0.13} 13%|█▎ | 776/5773 [1:12:12<7:30:04, 5.40s/it] 13%|█▎ | 777/5773 [1:12:23<7:32:50, 5.44s/it] 13%|█▎ | 777/5773 [1:12:18<7:32:50, 5.44s/it] {'loss': 0.593, 'learning_rate': 1.943306082867125e-05, 'epoch': 0.13} 13%|█▎ | 777/5773 [1:12:23<7:32:50, 5.44s/it] {'loss': 0.593, 'learning_rate': 1.943306082867125e-05, 'epoch': 0.13} 13%|█▎ | 777/5773 [1:12:18<7:32:50, 5.44s/it] 13%|█▎ | 778/5773 [1:12:29<7:31:34, 5.42s/it] 13%|█▎ | 778/5773 [1:12:23<7:31:33, 5.42s/it] {'loss': 0.5969, 'learning_rate': 1.9431196921213837e-05, 'epoch': 0.13} 13%|█▎ | 778/5773 [1:12:29<7:31:34, 5.42s/it] {'loss': 0.5969, 'learning_rate': 1.9431196921213837e-05, 'epoch': 0.13} 13%|█▎ | 778/5773 [1:12:23<7:31:33, 5.42s/it] 13%|█▎ | 779/5773 [1:12:34<7:30:13, 5.41s/it] 13%|█▎ | 779/5773 [1:12:29<7:30:13, 5.41s/it] {'loss': 0.593, 'learning_rate': 1.9429330044514305e-05, 'epoch': 0.13} 13%|█▎ | 779/5773 [1:12:34<7:30:13, 5.41s/it] {'loss': 0.593, 'learning_rate': 1.9429330044514305e-05, 'epoch': 0.13} 13%|█▎ | 779/5773 [1:12:29<7:30:13, 5.41s/it] 14%|█▎ | 780/5773 [1:12:39<7:29:10, 5.40s/it] 14%|█▎ | 780/5773 [1:12:34<7:29:11, 5.40s/it] {'loss': 0.5944, 'learning_rate': 1.942746019916041e-05, 'epoch': 0.14} 14%|█▎ | 780/5773 [1:12:39<7:29:10, 
5.40s/it] {'loss': 0.5944, 'learning_rate': 1.942746019916041e-05, 'epoch': 0.14} 14%|█▎ | 780/5773 [1:12:34<7:29:11, 5.40s/it] 14%|█▎ | 781/5773 [1:12:45<7:31:39, 5.43s/it] 14%|█▎ | 781/5773 [1:12:39<7:31:38, 5.43s/it] {'loss': 0.5961, 'learning_rate': 1.9425587385740844e-05, 'epoch': 0.14} 14%|█▎ | 781/5773 [1:12:45<7:31:39, 5.43s/it] {'loss': 0.5961, 'learning_rate': 1.9425587385740844e-05, 'epoch': 0.14} 14%|█▎ | 781/5773 [1:12:39<7:31:38, 5.43s/it] 14%|█▎ | 782/5773 [1:12:50<7:33:19, 5.45s/it] 14%|█▎ | 782/5773 [1:12:45<7:33:18, 5.45s/it] {'loss': 0.6013, 'learning_rate': 1.942371160484522e-05, 'epoch': 0.14} 14%|█▎ | 782/5773 [1:12:50<7:33:19, 5.45s/it] {'loss': 0.6013, 'learning_rate': 1.942371160484522e-05, 'epoch': 0.14} 14%|█▎ | 782/5773 [1:12:45<7:33:18, 5.45s/it] 14%|█▎ | 783/5773 [1:12:56<7:31:16, 5.43s/it] 14%|█▎ | 783/5773 [1:12:50<7:31:16, 5.43s/it] {'loss': 0.5873, 'learning_rate': 1.9421832857064093e-05, 'epoch': 0.14} 14%|█▎ | 783/5773 [1:12:56<7:31:16, 5.43s/it] {'loss': 0.5873, 'learning_rate': 1.9421832857064093e-05, 'epoch': 0.14} 14%|█▎ | 783/5773 [1:12:50<7:31:16, 5.43s/it] 14%|█▎ | 784/5773 [1:13:01<7:26:41, 5.37s/it] 14%|█▎ | 784/5773 [1:12:56<7:26:41, 5.37s/it] {'loss': 0.583, 'learning_rate': 1.9419951142988963e-05, 'epoch': 0.14} 14%|█▎ | 784/5773 [1:13:01<7:26:41, 5.37s/it] {'loss': 0.583, 'learning_rate': 1.9419951142988963e-05, 'epoch': 0.14} 14%|█▎ | 784/5773 [1:12:56<7:26:41, 5.37s/it] 14%|█▎ | 785/5773 [1:13:07<7:28:05, 5.39s/it] 14%|█▎ | 785/5773 [1:13:01<7:28:06, 5.39s/it] {'loss': 0.5989, 'learning_rate': 1.9418066463212246e-05, 'epoch': 0.14} 14%|█▎ | 785/5773 [1:13:07<7:28:05, 5.39s/it] {'loss': 0.5989, 'learning_rate': 1.9418066463212246e-05, 'epoch': 0.14} 14%|█▎ | 785/5773 [1:13:01<7:28:06, 5.39s/it] 14%|█▎ | 786/5773 [1:13:12<7:25:38, 5.36s/it] 14%|█▎ | 786/5773 [1:13:06<7:25:37, 5.36s/it] {'loss': 0.615, 'learning_rate': 1.94161788183273e-05, 'epoch': 0.14} 14%|█▎ | 786/5773 [1:13:12<7:25:38, 5.36s/it] {'loss': 0.615, 
'learning_rate': 1.94161788183273e-05, 'epoch': 0.14} 14%|█▎ | 786/5773 [1:13:06<7:25:37, 5.36s/it] 14%|█▎ | 787/5773 [1:13:17<7:26:09, 5.37s/it] 14%|█▎ | 787/5773 [1:13:12<7:26:09, 5.37s/it] {'loss': 0.6065, 'learning_rate': 1.941428820892842e-05, 'epoch': 0.14} 14%|█▎ | 787/5773 [1:13:17<7:26:09, 5.37s/it] {'loss': 0.6065, 'learning_rate': 1.941428820892842e-05, 'epoch': 0.14} 14%|█▎ | 787/5773 [1:13:12<7:26:09, 5.37s/it] 14%|█▎ | 788/5773 [1:13:23<7:32:26, 5.45s/it] 14%|█▎ | 788/5773 [1:13:17<7:32:26, 5.45s/it] {'loss': 0.5945, 'learning_rate': 1.9412394635610824e-05, 'epoch': 0.14} 14%|█▎ | 788/5773 [1:13:23<7:32:26, 5.45s/it] {'loss': 0.5945, 'learning_rate': 1.9412394635610824e-05, 'epoch': 0.14} 14%|█▎ | 788/5773 [1:13:17<7:32:26, 5.45s/it] 14%|█▎ | 789/5773 [1:13:29<7:39:22, 5.53s/it] 14%|█▎ | 789/5773 [1:13:23<7:39:22, 5.53s/it] {'loss': 0.612, 'learning_rate': 1.9410498098970675e-05, 'epoch': 0.14} 14%|█▎ | 789/5773 [1:13:29<7:39:22, 5.53s/it] {'loss': 0.612, 'learning_rate': 1.9410498098970675e-05, 'epoch': 0.14} 14%|█▎ | 789/5773 [1:13:23<7:39:22, 5.53s/it] 14%|█▎ | 790/5773 [1:13:34<7:35:15, 5.48s/it] 14%|█▎ | 790/5773 [1:13:28<7:35:16, 5.48s/it] {'loss': 0.5941, 'learning_rate': 1.9408598599605062e-05, 'epoch': 0.14} 14%|█▎ | 790/5773 [1:13:34<7:35:15, 5.48s/it] {'loss': 0.5941, 'learning_rate': 1.9408598599605062e-05, 'epoch': 0.14} 14%|█▎ | 790/5773 [1:13:28<7:35:16, 5.48s/it] 14%|█▎ | 791/5773 [1:13:39<7:37:22, 5.51s/it] 14%|█▎ | 791/5773 [1:13:34<7:37:22, 5.51s/it] {'loss': 0.6054, 'learning_rate': 1.9406696138112006e-05, 'epoch': 0.14} 14%|█▎ | 791/5773 [1:13:39<7:37:22, 5.51s/it] {'loss': 0.6054, 'learning_rate': 1.9406696138112006e-05, 'epoch': 0.14} 14%|█▎ | 791/5773 [1:13:34<7:37:22, 5.51s/it] 14%|█▎ | 792/5773 [1:13:45<7:38:49, 5.53s/it] 14%|█▎ | 792/5773 [1:13:40<7:38:49, 5.53s/it] {'loss': 0.5965, 'learning_rate': 1.9404790715090463e-05, 'epoch': 0.14} 14%|█▎ | 792/5773 [1:13:45<7:38:49, 5.53s/it] {'loss': 0.5965, 'learning_rate': 
1.9404790715090463e-05, 'epoch': 0.14} 14%|█▎ | 792/5773 [1:13:40<7:38:49, 5.53s/it] 14%|█▎ | 793/5773 [1:13:50<7:35:05, 5.48s/it] 14%|█▎ | 793/5773 [1:13:45<7:35:06, 5.48s/it] {'loss': 0.5907, 'learning_rate': 1.9402882331140322e-05, 'epoch': 0.14} 14%|█▎ | 793/5773 [1:13:50<7:35:05, 5.48s/it] {'loss': 0.5907, 'learning_rate': 1.9402882331140322e-05, 'epoch': 0.14} 14%|█▎ | 793/5773 [1:13:45<7:35:06, 5.48s/it] 14%|█▍ | 794/5773 [1:13:56<7:32:39, 5.45s/it] 14%|█▍ | 794/5773 [1:13:50<7:32:39, 5.45s/it] {'loss': 0.5961, 'learning_rate': 1.9400970986862404e-05, 'epoch': 0.14} 14%|█▍ | 794/5773 [1:13:56<7:32:39, 5.45s/it] {'loss': 0.5961, 'learning_rate': 1.9400970986862404e-05, 'epoch': 0.14} 14%|█▍ | 794/5773 [1:13:50<7:32:39, 5.45s/it] 14%|█▍ | 795/5773 [1:14:01<7:29:21, 5.42s/it] 14%|█▍ | 795/5773 [1:13:56<7:29:21, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.9399056682858458e-05, 'epoch': 0.14} 14%|█▍ | 795/5773 [1:14:01<7:29:21, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.9399056682858458e-05, 'epoch': 0.14} 14%|█▍ | 795/5773 [1:13:56<7:29:21, 5.42s/it] 14%|█▍ | 796/5773 [1:14:07<7:31:52, 5.45s/it] 14%|█▍ | 796/5773 [1:14:01<7:31:53, 5.45s/it] {'loss': 0.5916, 'learning_rate': 1.9397139419731176e-05, 'epoch': 0.14} 14%|█▍ | 796/5773 [1:14:07<7:31:52, 5.45s/it] {'loss': 0.5916, 'learning_rate': 1.9397139419731176e-05, 'epoch': 0.14} 14%|█▍ | 796/5773 [1:14:01<7:31:53, 5.45s/it] 14%|█▍ | 797/5773 [1:14:12<7:29:43, 5.42s/it] 14%|█▍ | 797/5773 [1:14:07<7:29:43, 5.42s/it] {'loss': 0.6074, 'learning_rate': 1.9395219198084163e-05, 'epoch': 0.14} 14%|█▍ | 797/5773 [1:14:12<7:29:43, 5.42s/it] {'loss': 0.6074, 'learning_rate': 1.9395219198084163e-05, 'epoch': 0.14} 14%|█▍ | 797/5773 [1:14:07<7:29:43, 5.42s/it] 14%|█▍ | 798/5773 [1:14:17<7:29:59, 5.43s/it] 14%|█▍ | 798/5773 [1:14:12<7:29:58, 5.43s/it] {'loss': 0.5995, 'learning_rate': 1.9393296018521973e-05, 'epoch': 0.14} 14%|█▍ | 798/5773 [1:14:17<7:29:59, 5.43s/it] {'loss': 0.5995, 'learning_rate': 
1.9393296018521973e-05, 'epoch': 0.14} 14%|█▍ | 798/5773 [1:14:12<7:29:58, 5.43s/it] 14%|█▍ | 799/5773 [1:14:23<7:29:37, 5.42s/it] 14%|█▍ | 799/5773 [1:14:17<7:29:37, 5.42s/it] {'loss': 0.5906, 'learning_rate': 1.9391369881650083e-05, 'epoch': 0.14} 14%|█▍ | 799/5773 [1:14:23<7:29:37, 5.42s/it] {'loss': 0.5906, 'learning_rate': 1.9391369881650083e-05, 'epoch': 0.14} 14%|█▍ | 799/5773 [1:14:17<7:29:37, 5.42s/it]7 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14%|█▍ | 800/5773 [1:14:28<7:29:42, 5.43s/it]2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
14%|█▍ | 800/5773 [1:14:23<7:29:42, 5.43s/it] {'loss': 0.5869, 'learning_rate': 1.93894407880749e-05, 'epoch': 0.14} 14%|█▍ | 800/5773 [1:14:28<7:29:42, 5.43s/it] {'loss': 0.5869, 'learning_rate': 1.93894407880749e-05, 'epoch': 0.14} 14%|█▍ | 800/5773 [1:14:23<7:29:42, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 14%|█▍ | 801/5773 [1:14:47<12:55:01, 9.35s/it] 14%|█▍ | 801/5773 [1:14:41<12:55:01, 9.35s/it] {'loss': 0.5948, 'learning_rate': 1.938750873840377e-05, 'epoch': 0.14} 14%|█▍ | 801/5773 [1:14:47<12:55:01, 9.35s/it] {'loss': 0.5948, 'learning_rate': 1.938750873840377e-05, 'epoch': 0.14} 14%|█▍ | 801/5773 [1:14:41<12:55:01, 9.35s/it] 14%|█▍ | 802/5773 [1:14:52<11:19:53, 8.21s/it] 14%|█▍ | 802/5773 [1:14:47<11:19:53, 8.21s/it] {'loss': 0.5944, 'learning_rate': 1.9385573733244956e-05, 'epoch': 0.14} 14%|█▍ | 802/5773 [1:14:52<11:19:53, 8.21s/it] {'loss': 0.5944, 'learning_rate': 1.9385573733244956e-05, 'epoch': 0.14} 14%|█▍ | 802/5773 [1:14:47<11:19:53, 8.21s/it] 14%|█▍ | 803/5773 [1:14:58<10:08:56, 7.35s/it] 14%|█▍ | 803/5773 [1:14:52<10:08:56, 7.35s/it] {'loss': 0.5992, 'learning_rate': 1.9383635773207664e-05, 'epoch': 0.14} 14%|█▍ | 803/5773 [1:14:58<10:08:56, 7.35s/it] {'loss': 0.5992, 'learning_rate': 1.9383635773207664e-05, 'epoch': 0.14} 14%|█▍ | 803/5773 [1:14:52<10:08:56, 7.35s/it] 14%|█▍ | 804/5773 [1:15:03<9:19:27, 6.76s/it] 14%|█▍ | 804/5773 [1:14:58<9:19:27, 6.76s/it] {'loss': 0.6082, 'learning_rate': 1.938169485890203e-05, 'epoch': 0.14} 14%|█▍ | 804/5773 [1:15:03<9:19:27, 6.76s/it] {'loss': 0.6082, 'learning_rate': 1.938169485890203e-05, 'epoch': 0.14} 14%|█▍ | 804/5773 [1:14:58<9:19:27, 6.76s/it] 14%|█▍ | 805/5773 [1:15:09<8:56:52, 6.48s/it] 14%|█▍ | 805/5773 [1:15:03<8:56:51, 6.48s/it] {'loss': 0.5821, 'learning_rate': 1.937975099093911e-05, 'epoch': 0.14} 14%|█▍ | 805/5773 [1:15:09<8:56:52, 6.48s/it] {'loss': 0.5821, 'learning_rate': 1.937975099093911e-05, 'epoch': 0.14} 14%|█▍ | 805/5773 [1:15:03<8:56:51, 6.48s/it] 14%|█▍ | 806/5773 [1:15:14<8:30:44, 6.17s/it] 14%|█▍ | 806/5773 [1:15:09<8:30:44, 6.17s/it] {'loss': 0.5883, 'learning_rate': 1.9377804169930893e-05, 'epoch': 0.14} 14%|█▍ | 806/5773 [1:15:14<8:30:44, 6.17s/it] {'loss': 0.5883, 'learning_rate': 1.9377804169930893e-05, 'epoch': 0.14} 14%|█▍ | 806/5773 [1:15:09<8:30:44, 
6.17s/it] 14%|█▍ | 807/5773 [1:15:20<8:15:38, 5.99s/it] 14%|█▍ | 807/5773 [1:15:14<8:15:37, 5.99s/it] {'loss': 0.5923, 'learning_rate': 1.9375854396490305e-05, 'epoch': 0.14} 14%|█▍ | 807/5773 [1:15:20<8:15:38, 5.99s/it] {'loss': 0.5923, 'learning_rate': 1.9375854396490305e-05, 'epoch': 0.14} 14%|█▍ | 807/5773 [1:15:14<8:15:37, 5.99s/it] 14%|█▍ | 808/5773 [1:15:26<8:06:38, 5.88s/it] 14%|█▍ | 808/5773 [1:15:20<8:06:37, 5.88s/it] {'loss': 0.5881, 'learning_rate': 1.9373901671231202e-05, 'epoch': 0.14} 14%|█▍ | 808/5773 [1:15:26<8:06:38, 5.88s/it] {'loss': 0.5881, 'learning_rate': 1.9373901671231202e-05, 'epoch': 0.14} 14%|█▍ | 808/5773 [1:15:20<8:06:37, 5.88s/it] 14%|█▍ | 809/5773 [1:15:31<7:58:12, 5.78s/it] 14%|█▍ | 809/5773 [1:15:26<7:58:12, 5.78s/it] {'loss': 0.5883, 'learning_rate': 1.9371945994768354e-05, 'epoch': 0.14} 14%|█▍ | 809/5773 [1:15:31<7:58:12, 5.78s/it] {'loss': 0.5883, 'learning_rate': 1.9371945994768354e-05, 'epoch': 0.14} 14%|█▍ | 809/5773 [1:15:26<7:58:12, 5.78s/it] 14%|█▍ | 810/5773 [1:15:37<7:49:27, 5.68s/it] 14%|█▍ | 810/5773 [1:15:31<7:49:26, 5.68s/it] {'loss': 0.6326, 'learning_rate': 1.9369987367717477e-05, 'epoch': 0.14} 14%|█▍ | 810/5773 [1:15:37<7:49:27, 5.68s/it] {'loss': 0.6326, 'learning_rate': 1.9369987367717477e-05, 'epoch': 0.14} 14%|█▍ | 810/5773 [1:15:31<7:49:26, 5.68s/it] 14%|█▍ | 811/5773 [1:15:42<7:42:46, 5.60s/it] 14%|█▍ | 811/5773 [1:15:36<7:42:47, 5.60s/it] {'loss': 0.582, 'learning_rate': 1.9368025790695207e-05, 'epoch': 0.14} 14%|█▍ | 811/5773 [1:15:42<7:42:46, 5.60s/it] {'loss': 0.582, 'learning_rate': 1.9368025790695207e-05, 'epoch': 0.14} 14%|█▍ | 811/5773 [1:15:36<7:42:47, 5.60s/it] 14%|█▍ | 812/5773 [1:15:48<7:41:35, 5.58s/it] 14%|█▍ | 812/5773 [1:15:42<7:41:35, 5.58s/it] {'loss': 0.5774, 'learning_rate': 1.9366061264319112e-05, 'epoch': 0.14} 14%|█▍ | 812/5773 [1:15:48<7:41:35, 5.58s/it] {'loss': 0.5774, 'learning_rate': 1.9366061264319112e-05, 'epoch': 0.14} 14%|█▍ | 812/5773 [1:15:42<7:41:35, 5.58s/it] 14%|█▍ | 
813/5773 [1:15:53<7:36:15, 5.52s/it] 14%|█▍ | 813/5773 [1:15:47<7:36:16, 5.52s/it] {'loss': 0.5693, 'learning_rate': 1.9364093789207686e-05, 'epoch': 0.14} 14%|█▍ | 813/5773 [1:15:53<7:36:15, 5.52s/it] {'loss': 0.5693, 'learning_rate': 1.9364093789207686e-05, 'epoch': 0.14} 14%|█▍ | 813/5773 [1:15:47<7:36:16, 5.52s/it] 14%|█▍ | 814/5773 [1:15:58<7:34:41, 5.50s/it] 14%|█▍ | 814/5773 [1:15:53<7:34:41, 5.50s/it] {'loss': 0.6021, 'learning_rate': 1.9362123365980356e-05, 'epoch': 0.14} 14%|█▍ | 814/5773 [1:15:58<7:34:41, 5.50s/it] {'loss': 0.6021, 'learning_rate': 1.9362123365980356e-05, 'epoch': 0.14} 14%|█▍ | 814/5773 [1:15:53<7:34:41, 5.50s/it] 14%|█▍ | 815/5773 [1:16:04<7:31:19, 5.46s/it] 14%|█▍ | 815/5773 [1:15:58<7:31:18, 5.46s/it] {'loss': 0.5958, 'learning_rate': 1.9360149995257474e-05, 'epoch': 0.14} 14%|█▍ | 815/5773 [1:16:04<7:31:19, 5.46s/it] {'loss': 0.5958, 'learning_rate': 1.9360149995257474e-05, 'epoch': 0.14} 14%|█▍ | 815/5773 [1:15:58<7:31:18, 5.46s/it] 14%|█▍ | 816/5773 [1:16:09<7:32:19, 5.47s/it] 14%|█▍ | 816/5773 [1:16:04<7:32:19, 5.48s/it] {'loss': 0.5991, 'learning_rate': 1.9358173677660317e-05, 'epoch': 0.14} 14%|█▍ | 816/5773 [1:16:09<7:32:19, 5.47s/it] {'loss': 0.5991, 'learning_rate': 1.9358173677660317e-05, 'epoch': 0.14} 14%|█▍ | 816/5773 [1:16:04<7:32:19, 5.48s/it] 14%|█▍ | 817/5773 [1:16:15<7:30:22, 5.45s/it] 14%|█▍ | 817/5773 [1:16:09<7:30:22, 5.45s/it] {'loss': 0.5968, 'learning_rate': 1.9356194413811092e-05, 'epoch': 0.14} 14%|█▍ | 817/5773 [1:16:15<7:30:22, 5.45s/it] {'loss': 0.5968, 'learning_rate': 1.9356194413811092e-05, 'epoch': 0.14} 14%|█▍ | 817/5773 [1:16:09<7:30:22, 5.45s/it] 14%|█▍ | 818/5773 [1:16:20<7:29:44, 5.45s/it] 14%|█▍ | 818/5773 [1:16:15<7:29:43, 5.45s/it] {'loss': 0.5954, 'learning_rate': 1.935421220433294e-05, 'epoch': 0.14} 14%|█▍ | 818/5773 [1:16:20<7:29:44, 5.45s/it] {'loss': 0.5954, 'learning_rate': 1.935421220433294e-05, 'epoch': 0.14} 14%|█▍ | 818/5773 [1:16:15<7:29:43, 5.45s/it] 14%|█▍ | 819/5773 
[1:16:25<7:28:14, 5.43s/it] 14%|█▍ | 819/5773 [1:16:20<7:28:15, 5.43s/it] {'loss': 0.5957, 'learning_rate': 1.9352227049849923e-05, 'epoch': 0.14} 14%|█▍ | 819/5773 [1:16:25<7:28:14, 5.43s/it] {'loss': 0.5957, 'learning_rate': 1.9352227049849923e-05, 'epoch': 0.14} 14%|█▍ | 819/5773 [1:16:20<7:28:15, 5.43s/it] 14%|█▍ | 820/5773 [1:16:31<7:28:14, 5.43s/it] 14%|█▍ | 820/5773 [1:16:25<7:28:14, 5.43s/it] {'loss': 0.5847, 'learning_rate': 1.9350238950987024e-05, 'epoch': 0.14} 14%|█▍ | 820/5773 [1:16:31<7:28:14, 5.43s/it] {'loss': 0.5847, 'learning_rate': 1.9350238950987024e-05, 'epoch': 0.14} 14%|█▍ | 820/5773 [1:16:25<7:28:14, 5.43s/it] 14%|█▍ | 821/5773 [1:16:36<7:28:12, 5.43s/it] 14%|█▍ | 821/5773 [1:16:31<7:28:12, 5.43s/it] {'loss': 0.5918, 'learning_rate': 1.934824790837017e-05, 'epoch': 0.14} 14%|█▍ | 821/5773 [1:16:36<7:28:12, 5.43s/it] {'loss': 0.5918, 'learning_rate': 1.934824790837017e-05, 'epoch': 0.14} 14%|█▍ | 821/5773 [1:16:31<7:28:12, 5.43s/it] 14%|█▍ | 822/5773 [1:16:42<7:30:32, 5.46s/it] 14%|█▍ | 822/5773 [1:16:36<7:30:32, 5.46s/it] {'loss': 0.6008, 'learning_rate': 1.93462539226262e-05, 'epoch': 0.14} 14%|█▍ | 822/5773 [1:16:42<7:30:32, 5.46s/it] {'loss': 0.6008, 'learning_rate': 1.93462539226262e-05, 'epoch': 0.14} 14%|█▍ | 822/5773 [1:16:36<7:30:32, 5.46s/it] 14%|█▍ | 823/5773 [1:16:47<7:31:23, 5.47s/it] 14%|█▍ | 823/5773 [1:16:42<7:31:23, 5.47s/it] {'loss': 0.62, 'learning_rate': 1.934425699438288e-05, 'epoch': 0.14} 14%|█▍ | 823/5773 [1:16:47<7:31:23, 5.47s/it] {'loss': 0.62, 'learning_rate': 1.934425699438288e-05, 'epoch': 0.14} 14%|█▍ | 823/5773 [1:16:42<7:31:23, 5.47s/it] 14%|█▍ | 824/5773 [1:16:53<7:30:50, 5.47s/it] 14%|█▍ | 824/5773 [1:16:47<7:30:50, 5.47s/it] {'loss': 0.5972, 'learning_rate': 1.9342257124268914e-05, 'epoch': 0.14} 14%|█▍ | 824/5773 [1:16:53<7:30:50, 5.47s/it] {'loss': 0.5972, 'learning_rate': 1.9342257124268914e-05, 'epoch': 0.14} 14%|█▍ | 824/5773 [1:16:47<7:30:50, 5.47s/it] 14%|█▍ | 825/5773 [1:16:58<7:28:34, 5.44s/it] 
14%|█▍ | 825/5773 [1:16:53<7:28:35, 5.44s/it] {'loss': 0.6149, 'learning_rate': 1.9340254312913923e-05, 'epoch': 0.14} 14%|█▍ | 825/5773 [1:16:58<7:28:34, 5.44s/it] {'loss': 0.6149, 'learning_rate': 1.9340254312913923e-05, 'epoch': 0.14} 14%|█▍ | 825/5773 [1:16:53<7:28:35, 5.44s/it] 14%|█▍ | 826/5773 [1:17:04<7:29:03, 5.45s/it] 14%|█▍ | 826/5773 [1:16:58<7:29:02, 5.45s/it] {'loss': 0.5851, 'learning_rate': 1.9338248560948452e-05, 'epoch': 0.14} 14%|█▍ | 826/5773 [1:17:04<7:29:03, 5.45s/it] {'loss': 0.5851, 'learning_rate': 1.9338248560948452e-05, 'epoch': 0.14} 14%|█▍ | 826/5773 [1:16:58<7:29:02, 5.45s/it] 14%|█▍ | 827/5773 [1:17:09<7:30:59, 5.47s/it] 14%|█▍ | 827/5773 [1:17:04<7:31:00, 5.47s/it] {'loss': 0.5992, 'learning_rate': 1.9336239869003982e-05, 'epoch': 0.14} 14%|█▍ | 827/5773 [1:17:09<7:30:59, 5.47s/it] {'loss': 0.5992, 'learning_rate': 1.9336239869003982e-05, 'epoch': 0.14} 14%|█▍ | 827/5773 [1:17:04<7:31:00, 5.47s/it] 14%|█▍ | 828/5773 [1:17:15<7:32:35, 5.49s/it] 14%|█▍ | 828/5773 [1:17:09<7:32:35, 5.49s/it] {'loss': 0.5932, 'learning_rate': 1.933422823771291e-05, 'epoch': 0.14} 14%|█▍ | 828/5773 [1:17:15<7:32:35, 5.49s/it] {'loss': 0.5932, 'learning_rate': 1.933422823771291e-05, 'epoch': 0.14} 14%|█▍ | 828/5773 [1:17:09<7:32:35, 5.49s/it] 14%|█▍ | 829/5773 [1:17:20<7:29:10, 5.45s/it] 14%|█▍ | 829/5773 [1:17:15<7:29:10, 5.45s/it] {'loss': 0.5915, 'learning_rate': 1.933221366770856e-05, 'epoch': 0.14} 14%|█▍ | 829/5773 [1:17:20<7:29:10, 5.45s/it] {'loss': 0.5915, 'learning_rate': 1.933221366770856e-05, 'epoch': 0.14} 14%|█▍ | 829/5773 [1:17:15<7:29:10, 5.45s/it] 14%|█▍ | 830/5773 [1:17:25<7:28:22, 5.44s/it] 14%|█▍ | 830/5773 [1:17:20<7:28:23, 5.44s/it] {'loss': 0.5859, 'learning_rate': 1.9330196159625188e-05, 'epoch': 0.14} 14%|█▍ | 830/5773 [1:17:25<7:28:22, 5.44s/it] {'loss': 0.5859, 'learning_rate': 1.9330196159625188e-05, 'epoch': 0.14} 14%|█▍ | 830/5773 [1:17:20<7:28:23, 5.44s/it] 14%|█▍ | 831/5773 [1:17:31<7:27:56, 5.44s/it] 14%|█▍ | 831/5773 
[1:17:25<7:27:56, 5.44s/it] {'loss': 0.5961, 'learning_rate': 1.9328175714097964e-05, 'epoch': 0.14} 14%|█▍ | 831/5773 [1:17:31<7:27:56, 5.44s/it] {'loss': 0.5961, 'learning_rate': 1.9328175714097964e-05, 'epoch': 0.14} 14%|█▍ | 831/5773 [1:17:25<7:27:56, 5.44s/it] 14%|█▍ | 832/5773 [1:17:36<7:26:43, 5.42s/it] 14%|█▍ | 832/5773 [1:17:31<7:26:44, 5.42s/it] {'loss': 0.6179, 'learning_rate': 1.9326152331762995e-05, 'epoch': 0.14} 14%|█▍ | 832/5773 [1:17:36<7:26:43, 5.42s/it] {'loss': 0.6179, 'learning_rate': 1.9326152331762995e-05, 'epoch': 0.14} 14%|█▍ | 832/5773 [1:17:31<7:26:44, 5.42s/it] 14%|█▍ | 833/5773 [1:17:42<7:25:56, 5.42s/it] 14%|█▍ | 833/5773 [1:17:36<7:25:56, 5.42s/it] {'loss': 0.5817, 'learning_rate': 1.9324126013257304e-05, 'epoch': 0.14} 14%|█▍ | 833/5773 [1:17:42<7:25:56, 5.42s/it] {'loss': 0.5817, 'learning_rate': 1.9324126013257304e-05, 'epoch': 0.14} 14%|█▍ | 833/5773 [1:17:36<7:25:56, 5.42s/it] 14%|█▍ | 834/5773 [1:17:47<7:30:19, 5.47s/it] 14%|█▍ | 834/5773 [1:17:42<7:30:19, 5.47s/it] {'loss': 0.598, 'learning_rate': 1.9322096759218838e-05, 'epoch': 0.14} 14%|█▍ | 834/5773 [1:17:47<7:30:19, 5.47s/it] {'loss': 0.598, 'learning_rate': 1.9322096759218838e-05, 'epoch': 0.14} 14%|█▍ | 834/5773 [1:17:42<7:30:19, 5.47s/it] 14%|█▍ | 835/5773 [1:17:53<7:27:56, 5.44s/it] 14%|█▍ | 835/5773 [1:17:47<7:27:56, 5.44s/it] {'loss': 0.5752, 'learning_rate': 1.9320064570286473e-05, 'epoch': 0.14} 14%|█▍ | 835/5773 [1:17:53<7:27:56, 5.44s/it] {'loss': 0.5752, 'learning_rate': 1.9320064570286473e-05, 'epoch': 0.14} 14%|█▍ | 835/5773 [1:17:47<7:27:56, 5.44s/it] 14%|█▍ | 836/5773 [1:17:58<7:26:32, 5.43s/it] 14%|█▍ | 836/5773 [1:17:53<7:26:32, 5.43s/it] {'loss': 0.6198, 'learning_rate': 1.931802944710001e-05, 'epoch': 0.14} 14%|█▍ | 836/5773 [1:17:58<7:26:32, 5.43s/it] {'loss': 0.6198, 'learning_rate': 1.931802944710001e-05, 'epoch': 0.14} 14%|█▍ | 836/5773 [1:17:53<7:26:32, 5.43s/it] 14%|█▍ | 837/5773 [1:18:03<7:25:34, 5.42s/it] 14%|█▍ | 837/5773 [1:17:58<7:25:33, 
5.42s/it] {'loss': 0.5905, 'learning_rate': 1.9315991390300166e-05, 'epoch': 0.14} 14%|█▍ | 837/5773 [1:18:03<7:25:34, 5.42s/it] {'loss': 0.5905, 'learning_rate': 1.9315991390300166e-05, 'epoch': 0.14} 14%|█▍ | 837/5773 [1:17:58<7:25:33, 5.42s/it] 15%|█▍ | 838/5773 [1:18:09<7:26:50, 5.43s/it] 15%|█▍ | 838/5773 [1:18:03<7:26:50, 5.43s/it] {'loss': 0.6004, 'learning_rate': 1.931395040052859e-05, 'epoch': 0.15} 15%|█▍ | 838/5773 [1:18:09<7:26:50, 5.43s/it] {'loss': 0.6004, 'learning_rate': 1.931395040052859e-05, 'epoch': 0.15} 15%|█▍ | 838/5773 [1:18:03<7:26:50, 5.43s/it] 15%|█▍ | 839/5773 [1:18:14<7:28:46, 5.46s/it] 15%|█▍ | 839/5773 [1:18:09<7:28:46, 5.46s/it] {'loss': 0.6136, 'learning_rate': 1.9311906478427848e-05, 'epoch': 0.15} 15%|█▍ | 839/5773 [1:18:14<7:28:46, 5.46s/it] {'loss': 0.6136, 'learning_rate': 1.9311906478427848e-05, 'epoch': 0.15} 15%|█▍ | 839/5773 [1:18:09<7:28:46, 5.46s/it] 15%|█▍ | 840/5773 [1:18:20<7:33:19, 5.51s/it] 15%|█▍ | 840/5773 [1:18:15<7:33:19, 5.51s/it] {'loss': 0.6045, 'learning_rate': 1.9309859624641432e-05, 'epoch': 0.15} 15%|█▍ | 840/5773 [1:18:20<7:33:19, 5.51s/it] {'loss': 0.6045, 'learning_rate': 1.9309859624641432e-05, 'epoch': 0.15} 15%|█▍ | 840/5773 [1:18:15<7:33:19, 5.51s/it] 15%|█▍ | 841/5773 [1:18:25<7:29:17, 5.47s/it] 15%|█▍ | 841/5773 [1:18:20<7:29:17, 5.47s/it] {'loss': 0.5853, 'learning_rate': 1.930780983981376e-05, 'epoch': 0.15} 15%|█▍ | 841/5773 [1:18:25<7:29:17, 5.47s/it] {'loss': 0.5853, 'learning_rate': 1.930780983981376e-05, 'epoch': 0.15} 15%|█▍ | 841/5773 [1:18:20<7:29:17, 5.47s/it] 15%|█▍ | 842/5773 [1:18:31<7:28:24, 5.46s/it] 15%|█▍ | 842/5773 [1:18:25<7:28:24, 5.46s/it] {'loss': 0.5906, 'learning_rate': 1.9305757124590164e-05, 'epoch': 0.15} 15%|█▍ | 842/5773 [1:18:31<7:28:24, 5.46s/it] {'loss': 0.5906, 'learning_rate': 1.9305757124590164e-05, 'epoch': 0.15} 15%|█▍ | 842/5773 [1:18:25<7:28:24, 5.46s/it] 15%|█▍ | 843/5773 [1:18:36<7:26:47, 5.44s/it] 15%|█▍ | 843/5773 [1:18:31<7:26:47, 5.44s/it] {'loss': 
0.602, 'learning_rate': 1.9303701479616915e-05, 'epoch': 0.15} 15%|█▍ | 843/5773 [1:18:36<7:26:47, 5.44s/it] {'loss': 0.602, 'learning_rate': 1.9303701479616915e-05, 'epoch': 0.15} 15%|█▍ | 843/5773 [1:18:31<7:26:47, 5.44s/it] 15%|█▍ | 844/5773 [1:18:42<7:25:20, 5.42s/it] 15%|█▍ | 844/5773 [1:18:36<7:25:20, 5.42s/it] {'loss': 0.5967, 'learning_rate': 1.9301642905541186e-05, 'epoch': 0.15} 15%|█▍ | 844/5773 [1:18:42<7:25:20, 5.42s/it] {'loss': 0.5967, 'learning_rate': 1.9301642905541186e-05, 'epoch': 0.15} 15%|█▍ | 844/5773 [1:18:36<7:25:20, 5.42s/it] 15%|█▍ | 845/5773 [1:18:47<7:24:15, 5.41s/it] 15%|█▍ | 845/5773 [1:18:41<7:24:15, 5.41s/it] {'loss': 0.5936, 'learning_rate': 1.9299581403011082e-05, 'epoch': 0.15} 15%|█▍ | 845/5773 [1:18:47<7:24:15, 5.41s/it] {'loss': 0.5936, 'learning_rate': 1.9299581403011082e-05, 'epoch': 0.15} 15%|█▍ | 845/5773 [1:18:41<7:24:15, 5.41s/it] 15%|█▍ | 846/5773 [1:18:53<7:33:57, 5.53s/it] 15%|█▍ | 846/5773 [1:18:47<7:33:57, 5.53s/it] {'loss': 0.5991, 'learning_rate': 1.9297516972675638e-05, 'epoch': 0.15} 15%|█▍ | 846/5773 [1:18:53<7:33:57, 5.53s/it] {'loss': 0.5991, 'learning_rate': 1.9297516972675638e-05, 'epoch': 0.15} 15%|█▍ | 846/5773 [1:18:47<7:33:57, 5.53s/it] 15%|█▍ | 847/5773 [1:18:58<7:34:54, 5.54s/it] 15%|█▍ | 847/5773 [1:18:53<7:34:54, 5.54s/it] {'loss': 0.594, 'learning_rate': 1.9295449615184792e-05, 'epoch': 0.15} 15%|█▍ | 847/5773 [1:18:58<7:34:54, 5.54s/it] {'loss': 0.594, 'learning_rate': 1.9295449615184792e-05, 'epoch': 0.15} 15%|█▍ | 847/5773 [1:18:53<7:34:54, 5.54s/it] 15%|█▍ | 848/5773 [1:19:04<7:32:47, 5.52s/it] 15%|█▍ | 848/5773 [1:18:58<7:32:47, 5.52s/it] {'loss': 0.6054, 'learning_rate': 1.9293379331189425e-05, 'epoch': 0.15} 15%|█▍ | 848/5773 [1:19:04<7:32:47, 5.52s/it] {'loss': 0.6054, 'learning_rate': 1.9293379331189425e-05, 'epoch': 0.15} 15%|█▍ | 848/5773 [1:18:58<7:32:47, 5.52s/it] 15%|█▍ | 849/5773 [1:19:09<7:32:52, 5.52s/it] 15%|█▍ | 849/5773 [1:19:04<7:32:53, 5.52s/it] {'loss': 0.5876, 
'learning_rate': 1.9291306121341318e-05, 'epoch': 0.15} 15%|█▍ | 849/5773 [1:19:09<7:32:52, 5.52s/it] {'loss': 0.5876, 'learning_rate': 1.9291306121341318e-05, 'epoch': 0.15} 15%|█▍ | 849/5773 [1:19:04<7:32:53, 5.52s/it] 14 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 15%|█▍ | 850/5773 [1:19:15<7:32:27, 5.51s/it] 9 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 15%|█▍ | 850/5773 [1:19:09<7:32:27, 5.51s/it] 10 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6127, 'learning_rate': 1.928922998629319e-05, 'epoch': 0.15} 15%|█▍ | 850/5773 [1:19:15<7:32:27, 5.51s/it] {'loss': 0.6127, 'learning_rate': 1.928922998629319e-05, 'epoch': 0.15} 15%|█▍ | 850/5773 [1:19:09<7:32:27, 5.51s/it] 15%|█▍ | 851/5773 [1:19:20<7:29:23, 5.48s/it] 15%|█▍ | 851/5773 [1:19:15<7:29:22, 5.48s/it]{'loss': 0.6114, 'learning_rate': 1.9287150926698673e-05, 'epoch': 0.15} 15%|█▍ | 851/5773 [1:19:20<7:29:23, 5.48s/it] {'loss': 0.6114, 'learning_rate': 1.9287150926698673e-05, 'epoch': 0.15} 15%|█▍ | 851/5773 [1:19:15<7:29:22, 5.48s/it] 15%|█▍ | 852/5773 [1:19:26<7:32:57, 5.52s/it] 15%|█▍ | 852/5773 [1:19:20<7:32:58, 5.52s/it] {'loss': 0.5954, 'learning_rate': 1.9285068943212327e-05, 'epoch': 0.15} 15%|█▍ | 852/5773 [1:19:26<7:32:57, 5.52s/it] {'loss': 0.5954, 'learning_rate': 1.9285068943212327e-05, 'epoch': 0.15} 15%|█▍ | 852/5773 [1:19:20<7:32:58, 5.52s/it] 15%|█▍ | 853/5773 [1:19:31<7:28:50, 5.47s/it] 15%|█▍ | 853/5773 [1:19:26<7:28:49, 5.47s/it] {'loss': 0.5861, 'learning_rate': 1.9282984036489618e-05, 'epoch': 0.15} 15%|█▍ | 853/5773 [1:19:31<7:28:50, 5.47s/it] {'loss': 0.5861, 'learning_rate': 1.9282984036489618e-05, 'epoch': 0.15} 15%|█▍ | 853/5773 [1:19:26<7:28:49, 5.47s/it] 15%|█▍ | 854/5773 [1:19:37<7:30:09, 5.49s/it] 15%|█▍ | 854/5773 [1:19:31<7:30:09, 5.49s/it] {'loss': 0.5905, 'learning_rate': 1.928089620718694e-05, 'epoch': 0.15} 15%|█▍ | 854/5773 [1:19:37<7:30:09, 5.49s/it] {'loss': 0.5905, 'learning_rate': 1.928089620718694e-05, 'epoch': 0.15} 15%|█▍ | 854/5773 [1:19:31<7:30:09, 5.49s/it] 15%|█▍ | 855/5773 [1:19:42<7:25:21, 5.43s/it] 15%|█▍ | 855/5773 [1:19:37<7:25:21, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.9278805455961623e-05, 'epoch': 0.15} 15%|█▍ | 855/5773 [1:19:42<7:25:21, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.9278805455961623e-05, 'epoch': 0.15} 15%|█▍ | 855/5773 [1:19:37<7:25:21, 5.43s/it] 15%|█▍ | 856/5773 [1:19:48<7:26:55, 5.45s/it] 15%|█▍ | 856/5773 [1:19:42<7:26:55, 5.45s/it] {'loss': 0.5971, 
'learning_rate': 1.9276711783471888e-05, 'epoch': 0.15} 15%|█▍ | 856/5773 [1:19:48<7:26:55, 5.45s/it] {'loss': 0.5971, 'learning_rate': 1.9276711783471888e-05, 'epoch': 0.15} 15%|█▍ | 856/5773 [1:19:42<7:26:55, 5.45s/it] 15%|█▍ | 857/5773 [1:19:53<7:29:56, 5.49s/it] 15%|█▍ | 857/5773 [1:19:48<7:29:56, 5.49s/it] {'loss': 0.5926, 'learning_rate': 1.9274615190376894e-05, 'epoch': 0.15} 15%|█▍ | 857/5773 [1:19:53<7:29:56, 5.49s/it] {'loss': 0.5926, 'learning_rate': 1.9274615190376894e-05, 'epoch': 0.15} 15%|█▍ | 857/5773 [1:19:48<7:29:56, 5.49s/it] 15%|█▍ | 858/5773 [1:19:59<7:32:25, 5.52s/it] 15%|█▍ | 858/5773 [1:19:53<7:32:26, 5.52s/it] {'loss': 0.5944, 'learning_rate': 1.927251567733672e-05, 'epoch': 0.15} 15%|█▍ | 858/5773 [1:19:59<7:32:25, 5.52s/it] {'loss': 0.5944, 'learning_rate': 1.927251567733672e-05, 'epoch': 0.15} 15%|█▍ | 858/5773 [1:19:53<7:32:26, 5.52s/it] 15%|█▍ | 859/5773 [1:20:04<7:29:22, 5.49s/it] 15%|█▍ | 859/5773 [1:19:59<7:29:23, 5.49s/it] {'loss': 0.5878, 'learning_rate': 1.927041324501235e-05, 'epoch': 0.15} 15%|█▍ | 859/5773 [1:20:04<7:29:22, 5.49s/it] {'loss': 0.5878, 'learning_rate': 1.927041324501235e-05, 'epoch': 0.15} 15%|█▍ | 859/5773 [1:19:59<7:29:23, 5.49s/it] 15%|█▍ | 860/5773 [1:20:10<7:27:27, 5.46s/it] 15%|█▍ | 860/5773 [1:20:04<7:27:27, 5.46s/it] {'loss': 0.579, 'learning_rate': 1.9268307894065704e-05, 'epoch': 0.15} 15%|█▍ | 860/5773 [1:20:10<7:27:27, 5.46s/it] {'loss': 0.579, 'learning_rate': 1.9268307894065704e-05, 'epoch': 0.15} 15%|█▍ | 860/5773 [1:20:04<7:27:27, 5.46s/it] 15%|█▍ | 861/5773 [1:20:15<7:27:58, 5.47s/it] 15%|█▍ | 861/5773 [1:20:10<7:27:58, 5.47s/it] {'loss': 0.5877, 'learning_rate': 1.9266199625159615e-05, 'epoch': 0.15} 15%|█▍ | 861/5773 [1:20:15<7:27:58, 5.47s/it] {'loss': 0.5877, 'learning_rate': 1.9266199625159615e-05, 'epoch': 0.15} 15%|█▍ | 861/5773 [1:20:10<7:27:58, 5.47s/it] 15%|█▍ | 862/5773 [1:20:21<7:27:40, 5.47s/it] 15%|█▍ | 862/5773 [1:20:15<7:27:40, 5.47s/it] {'loss': 0.5884, 'learning_rate': 
1.926408843895783e-05, 'epoch': 0.15} 15%|█▍ | 862/5773 [1:20:21<7:27:40, 5.47s/it] {'loss': 0.5884, 'learning_rate': 1.926408843895783e-05, 'epoch': 0.15} 15%|█▍ | 862/5773 [1:20:15<7:27:40, 5.47s/it] 15%|█▍ | 863/5773 [1:20:26<7:27:22, 5.47s/it] 15%|█▍ | 863/5773 [1:20:20<7:27:21, 5.47s/it] {'loss': 0.6041, 'learning_rate': 1.926197433612502e-05, 'epoch': 0.15} 15%|█▍ | 863/5773 [1:20:26<7:27:22, 5.47s/it] {'loss': 0.6041, 'learning_rate': 1.926197433612502e-05, 'epoch': 0.15} 15%|█▍ | 863/5773 [1:20:20<7:27:21, 5.47s/it] 15%|█▍ | 864/5773 [1:20:31<7:25:38, 5.45s/it] 15%|█▍ | 864/5773 [1:20:26<7:25:38, 5.45s/it] {'loss': 0.6081, 'learning_rate': 1.925985731732677e-05, 'epoch': 0.15} 15%|█▍ | 864/5773 [1:20:31<7:25:38, 5.45s/it] {'loss': 0.6081, 'learning_rate': 1.925985731732677e-05, 'epoch': 0.15} 15%|█▍ | 864/5773 [1:20:26<7:25:38, 5.45s/it] 15%|█▍ | 865/5773 [1:20:37<7:21:54, 5.40s/it] 15%|█▍ | 865/5773 [1:20:31<7:21:54, 5.40s/it] {'loss': 0.5952, 'learning_rate': 1.9257737383229586e-05, 'epoch': 0.15} 15%|█▍ | 865/5773 [1:20:31<7:21:54, 5.40s/it]{'loss': 0.5952, 'learning_rate': 1.9257737383229586e-05, 'epoch': 0.15} 15%|█▍ | 865/5773 [1:20:37<7:21:54, 5.40s/it] 15%|█▌ | 866/5773 [1:20:37<7:21:25, 5.40s/it] 15%|█▌ | 866/5773 [1:20:42<7:21:26, 5.40s/it] {'loss': 0.6192, 'learning_rate': 1.925561453450089e-05, 'epoch': 0.15} 15%|█▌ | 866/5773 [1:20:42<7:21:26, 5.40s/it] {'loss': 0.6192, 'learning_rate': 1.925561453450089e-05, 'epoch': 0.15} 15%|█▌ | 866/5773 [1:20:37<7:21:25, 5.40s/it] 15%|█▌ | 867/5773 [1:20:48<7:24:20, 5.43s/it] 15%|█▌ | 867/5773 [1:20:42<7:24:21, 5.43s/it] {'loss': 0.5897, 'learning_rate': 1.9253488771809024e-05, 'epoch': 0.15} 15%|█▌ | 867/5773 [1:20:48<7:24:20, 5.43s/it] {'loss': 0.5897, 'learning_rate': 1.9253488771809024e-05, 'epoch': 0.15} 15%|█▌ | 867/5773 [1:20:42<7:24:21, 5.43s/it] 15%|█▌ | 868/5773 [1:20:53<7:24:29, 5.44s/it] 15%|█▌ | 868/5773 [1:20:48<7:24:30, 5.44s/it] {'loss': 0.5987, 'learning_rate': 1.9251360095823246e-05, 
'epoch': 0.15} 15%|█▌ | 868/5773 [1:20:53<7:24:29, 5.44s/it] {'loss': 0.5987, 'learning_rate': 1.9251360095823246e-05, 'epoch': 0.15} 15%|█▌ | 868/5773 [1:20:48<7:24:30, 5.44s/it] 15%|█▌ | 869/5773 [1:20:58<7:21:29, 5.40s/it] 15%|█▌ | 869/5773 [1:20:53<7:21:28, 5.40s/it] {'loss': 0.5888, 'learning_rate': 1.9249228507213732e-05, 'epoch': 0.15} 15%|█▌ | 869/5773 [1:20:58<7:21:29, 5.40s/it] {'loss': 0.5888, 'learning_rate': 1.9249228507213732e-05, 'epoch': 0.15} 15%|█▌ | 869/5773 [1:20:53<7:21:28, 5.40s/it] 15%|█▌ | 870/5773 [1:21:04<7:22:30, 5.42s/it] 15%|█▌ | 870/5773 [1:20:58<7:22:30, 5.42s/it] {'loss': 0.5992, 'learning_rate': 1.924709400665157e-05, 'epoch': 0.15} 15%|█▌ | 870/5773 [1:21:04<7:22:30, 5.42s/it] {'loss': 0.5992, 'learning_rate': 1.924709400665157e-05, 'epoch': 0.15} 15%|█▌ | 870/5773 [1:20:58<7:22:30, 5.42s/it] 15%|█▌ | 871/5773 [1:21:09<7:21:44, 5.41s/it] 15%|█▌ | 871/5773 [1:21:04<7:21:44, 5.41s/it] {'loss': 0.5834, 'learning_rate': 1.9244956594808778e-05, 'epoch': 0.15} 15%|█▌ | 871/5773 [1:21:09<7:21:44, 5.41s/it] {'loss': 0.5834, 'learning_rate': 1.9244956594808778e-05, 'epoch': 0.15} 15%|█▌ | 871/5773 [1:21:04<7:21:44, 5.41s/it] 15%|█▌ | 872/5773 [1:21:15<7:20:56, 5.40s/it] 15%|█▌ | 872/5773 [1:21:09<7:20:55, 5.40s/it] {'loss': 0.5788, 'learning_rate': 1.9242816272358272e-05, 'epoch': 0.15} 15%|█▌ | 872/5773 [1:21:15<7:20:56, 5.40s/it] {'loss': 0.5788, 'learning_rate': 1.9242816272358272e-05, 'epoch': 0.15} 15%|█▌ | 872/5773 [1:21:09<7:20:55, 5.40s/it] 15%|█▌ | 873/5773 [1:21:20<7:22:59, 5.42s/it] 15%|█▌ | 873/5773 [1:21:15<7:22:59, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.9240673039973897e-05, 'epoch': 0.15} 15%|█▌ | 873/5773 [1:21:20<7:22:59, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.9240673039973897e-05, 'epoch': 0.15} 15%|█▌ | 873/5773 [1:21:15<7:22:59, 5.42s/it] 15%|█▌ | 874/5773 [1:21:26<7:27:04, 5.48s/it] 15%|█▌ | 874/5773 [1:21:20<7:27:04, 5.48s/it] {'loss': 0.5885, 'learning_rate': 1.923852689833041e-05, 'epoch': 0.15} 15%|█▌ 
| 874/5773 [1:21:26<7:27:04, 5.48s/it] {'loss': 0.5885, 'learning_rate': 1.923852689833041e-05, 'epoch': 0.15} 15%|█▌ | 874/5773 [1:21:20<7:27:04, 5.48s/it] 15%|█▌ | 875/5773 [1:21:31<7:22:53, 5.43s/it] 15%|█▌ | 875/5773 [1:21:25<7:22:53, 5.43s/it] {'loss': 0.5899, 'learning_rate': 1.923637784810349e-05, 'epoch': 0.15} 15%|█▌ | 875/5773 [1:21:31<7:22:53, 5.43s/it] {'loss': 0.5899, 'learning_rate': 1.923637784810349e-05, 'epoch': 0.15} 15%|█▌ | 875/5773 [1:21:25<7:22:53, 5.43s/it] 15%|█▌ | 876/5773 [1:21:36<7:21:34, 5.41s/it] 15%|█▌ | 876/5773 [1:21:31<7:21:35, 5.41s/it] {'loss': 0.594, 'learning_rate': 1.9234225889969723e-05, 'epoch': 0.15} 15%|█▌ | 876/5773 [1:21:36<7:21:34, 5.41s/it] {'loss': 0.594, 'learning_rate': 1.9234225889969723e-05, 'epoch': 0.15} 15%|█▌ | 876/5773 [1:21:31<7:21:35, 5.41s/it] 15%|█▌ | 877/5773 [1:21:42<7:25:50, 5.46s/it] 15%|█▌ | 877/5773 [1:21:36<7:25:49, 5.46s/it] {'loss': 0.6094, 'learning_rate': 1.923207102460661e-05, 'epoch': 0.15} 15%|█▌ | 877/5773 [1:21:42<7:25:50, 5.46s/it] {'loss': 0.6094, 'learning_rate': 1.923207102460661e-05, 'epoch': 0.15} 15%|█▌ | 877/5773 [1:21:36<7:25:49, 5.46s/it] 15%|█▌ | 878/5773 [1:21:48<7:28:43, 5.50s/it] 15%|█▌ | 878/5773 [1:21:42<7:28:43, 5.50s/it] {'loss': 0.5966, 'learning_rate': 1.922991325269258e-05, 'epoch': 0.15} 15%|█▌ | 878/5773 [1:21:48<7:28:43, 5.50s/it] {'loss': 0.5966, 'learning_rate': 1.922991325269258e-05, 'epoch': 0.15} 15%|█▌ | 878/5773 [1:21:42<7:28:43, 5.50s/it] 15%|█▌ | 879/5773 [1:21:53<7:29:01, 5.50s/it] 15%|█▌ | 879/5773 [1:21:48<7:29:01, 5.50s/it] {'loss': 0.597, 'learning_rate': 1.9227752574906965e-05, 'epoch': 0.15} 15%|█▌ | 879/5773 [1:21:53<7:29:01, 5.50s/it] {'loss': 0.597, 'learning_rate': 1.9227752574906965e-05, 'epoch': 0.15} 15%|█▌ | 879/5773 [1:21:48<7:29:01, 5.50s/it] 15%|█▌ | 880/5773 [1:21:58<7:27:49, 5.49s/it] 15%|█▌ | 880/5773 [1:21:53<7:27:49, 5.49s/it] {'loss': 0.5926, 'learning_rate': 1.922558899193001e-05, 'epoch': 0.15} 15%|█▌ | 880/5773 [1:21:58<7:27:49, 
5.49s/it] {'loss': 0.5926, 'learning_rate': 1.922558899193001e-05, 'epoch': 0.15} 15%|█▌ | 880/5773 [1:21:53<7:27:49, 5.49s/it] 15%|█▌ | 881/5773 [1:22:04<7:28:02, 5.50s/it] 15%|█▌ | 881/5773 [1:21:58<7:28:02, 5.50s/it] {'loss': 0.6119, 'learning_rate': 1.922342250444289e-05, 'epoch': 0.15} 15%|█▌ | 881/5773 [1:22:04<7:28:02, 5.50s/it] {'loss': 0.6119, 'learning_rate': 1.922342250444289e-05, 'epoch': 0.15} 15%|█▌ | 881/5773 [1:21:58<7:28:02, 5.50s/it] 15%|█▌ | 882/5773 [1:22:10<7:28:20, 5.50s/it] 15%|█▌ | 882/5773 [1:22:04<7:28:21, 5.50s/it] {'loss': 0.5949, 'learning_rate': 1.9221253113127677e-05, 'epoch': 0.15} 15%|█▌ | 882/5773 [1:22:10<7:28:20, 5.50s/it] {'loss': 0.5949, 'learning_rate': 1.9221253113127677e-05, 'epoch': 0.15} 15%|█▌ | 882/5773 [1:22:04<7:28:21, 5.50s/it] 15%|█▌ | 883/5773 [1:22:15<7:26:18, 5.48s/it] 15%|█▌ | 883/5773 [1:22:09<7:26:18, 5.48s/it] {'loss': 0.5953, 'learning_rate': 1.9219080818667365e-05, 'epoch': 0.15} 15%|█▌ | 883/5773 [1:22:15<7:26:18, 5.48s/it] {'loss': 0.5953, 'learning_rate': 1.9219080818667365e-05, 'epoch': 0.15} 15%|█▌ | 883/5773 [1:22:09<7:26:18, 5.48s/it] 15%|█▌ | 884/5773 [1:22:20<7:24:28, 5.45s/it] 15%|█▌ | 884/5773 [1:22:15<7:24:28, 5.45s/it] {'loss': 0.5846, 'learning_rate': 1.9216905621745866e-05, 'epoch': 0.15} 15%|█▌ | 884/5773 [1:22:20<7:24:28, 5.45s/it] {'loss': 0.5846, 'learning_rate': 1.9216905621745866e-05, 'epoch': 0.15} 15%|█▌ | 884/5773 [1:22:15<7:24:28, 5.45s/it] 15%|█▌ | 885/5773 [1:22:26<7:22:32, 5.43s/it] 15%|█▌ | 885/5773 [1:22:20<7:22:31, 5.43s/it] {'loss': 0.5842, 'learning_rate': 1.9214727523048e-05, 'epoch': 0.15} 15%|█▌ | 885/5773 [1:22:26<7:22:32, 5.43s/it] {'loss': 0.5842, 'learning_rate': 1.9214727523048e-05, 'epoch': 0.15} 15%|█▌ | 885/5773 [1:22:20<7:22:31, 5.43s/it] 15%|█▌ | 886/5773 [1:22:31<7:25:15, 5.47s/it] 15%|█▌ | 886/5773 [1:22:26<7:25:14, 5.47s/it] {'loss': 0.5983, 'learning_rate': 1.9212546523259498e-05, 'epoch': 0.15} 15%|█▌ | 886/5773 [1:22:31<7:25:15, 5.47s/it] {'loss': 0.5983, 
'learning_rate': 1.9212546523259498e-05, 'epoch': 0.15} 15%|█▌ | 886/5773 [1:22:26<7:25:14, 5.47s/it] 15%|█▌ | 887/5773 [1:22:37<7:27:13, 5.49s/it] 15%|█▌ | 887/5773 [1:22:31<7:27:13, 5.49s/it] {'loss': 0.5915, 'learning_rate': 1.9210362623067015e-05, 'epoch': 0.15} 15%|█▌ | 887/5773 [1:22:37<7:27:13, 5.49s/it] {'loss': 0.5915, 'learning_rate': 1.9210362623067015e-05, 'epoch': 0.15} 15%|█▌ | 887/5773 [1:22:31<7:27:13, 5.49s/it] 15%|█▌ | 888/5773 [1:22:43<7:32:11, 5.55s/it] 15%|█▌ | 888/5773 [1:22:37<7:32:11, 5.55s/it] {'loss': 0.6004, 'learning_rate': 1.920817582315811e-05, 'epoch': 0.15} 15%|█▌ | 888/5773 [1:22:43<7:32:11, 5.55s/it] {'loss': 0.6004, 'learning_rate': 1.920817582315811e-05, 'epoch': 0.15} 15%|█▌ | 888/5773 [1:22:37<7:32:11, 5.55s/it] 15%|█▌ | 889/5773 [1:22:48<7:26:10, 5.48s/it] 15%|█▌ | 889/5773 [1:22:42<7:26:10, 5.48s/it] {'loss': 0.5862, 'learning_rate': 1.920598612422125e-05, 'epoch': 0.15} 15%|█▌ | 889/5773 [1:22:48<7:26:10, 5.48s/it] {'loss': 0.5862, 'learning_rate': 1.920598612422125e-05, 'epoch': 0.15} 15%|█▌ | 889/5773 [1:22:42<7:26:10, 5.48s/it] 15%|█▌ | 890/5773 [1:22:53<7:23:59, 5.46s/it] 15%|█▌ | 890/5773 [1:22:48<7:24:00, 5.46s/it] {'loss': 0.5994, 'learning_rate': 1.9203793526945835e-05, 'epoch': 0.15} 15%|█▌ | 890/5773 [1:22:53<7:23:59, 5.46s/it] {'loss': 0.5994, 'learning_rate': 1.9203793526945835e-05, 'epoch': 0.15} 15%|█▌ | 890/5773 [1:22:48<7:24:00, 5.46s/it] 15%|█▌ | 891/5773 [1:22:59<7:22:29, 5.44s/it] 15%|█▌ | 891/5773 [1:22:53<7:22:29, 5.44s/it] {'loss': 0.5946, 'learning_rate': 1.9201598032022156e-05, 'epoch': 0.15} 15%|█▌ | 891/5773 [1:22:59<7:22:29, 5.44s/it] {'loss': 0.5946, 'learning_rate': 1.9201598032022156e-05, 'epoch': 0.15} 15%|█▌ | 891/5773 [1:22:53<7:22:29, 5.44s/it] 15%|█▌ | 892/5773 [1:23:04<7:22:16, 5.44s/it] 15%|█▌ | 892/5773 [1:22:59<7:22:15, 5.44s/it] {'loss': 0.5859, 'learning_rate': 1.9199399640141428e-05, 'epoch': 0.15} 15%|█▌ | 892/5773 [1:23:04<7:22:16, 5.44s/it] {'loss': 0.5859, 'learning_rate': 
1.9199399640141428e-05, 'epoch': 0.15} 15%|█▌ | 892/5773 [1:22:59<7:22:15, 5.44s/it] 15%|█▌ | 893/5773 [1:23:09<7:21:29, 5.43s/it] 15%|█▌ | 893/5773 [1:23:04<7:21:29, 5.43s/it] {'loss': 0.5928, 'learning_rate': 1.9197198351995774e-05, 'epoch': 0.15} 15%|█▌ | 893/5773 [1:23:09<7:21:29, 5.43s/it] {'loss': 0.5928, 'learning_rate': 1.9197198351995774e-05, 'epoch': 0.15} 15%|█▌ | 893/5773 [1:23:04<7:21:29, 5.43s/it] 15%|█▌ | 894/5773 [1:23:15<7:18:07, 5.39s/it] 15%|█▌ | 894/5773 [1:23:09<7:18:07, 5.39s/it] {'loss': 0.5824, 'learning_rate': 1.9194994168278232e-05, 'epoch': 0.15} 15%|█▌ | 894/5773 [1:23:15<7:18:07, 5.39s/it] {'loss': 0.5824, 'learning_rate': 1.9194994168278232e-05, 'epoch': 0.15} 15%|█▌ | 894/5773 [1:23:09<7:18:07, 5.39s/it] 16%|█▌ | 895/5773 [1:23:20<7:19:25, 5.40s/it] {'loss': 0.6057, 'learning_rate': 1.919278708968275e-05, 'epoch': 0.16} 16%|█▌ | 895/5773 [1:23:15<7:19:23, 5.40s/it] {'loss': 0.6057, 'learning_rate': 1.919278708968275e-05, 'epoch': 0.16} 16%|█▌ | 895/5773 [1:23:20<7:19:25, 5.40s/it] 16%|█▌ | 895/5773 [1:23:15<7:19:23, 5.40s/it] 16%|█▌ | 896/5773 [1:23:26<7:20:52, 5.42s/it] 16%|█▌ | 896/5773 [1:23:20<7:20:52, 5.42s/it] {'loss': 0.595, 'learning_rate': 1.919057711690418e-05, 'epoch': 0.16} 16%|█▌ | 896/5773 [1:23:26<7:20:52, 5.42s/it] {'loss': 0.595, 'learning_rate': 1.919057711690418e-05, 'epoch': 0.16} 16%|█▌ | 896/5773 [1:23:20<7:20:52, 5.42s/it] 16%|█▌ | 897/5773 [1:23:31<7:22:32, 5.45s/it] 16%|█▌ | 897/5773 [1:23:26<7:22:33, 5.45s/it] {'loss': 0.6064, 'learning_rate': 1.9188364250638295e-05, 'epoch': 0.16} 16%|█▌ | 897/5773 [1:23:31<7:22:32, 5.45s/it] {'loss': 0.6064, 'learning_rate': 1.9188364250638295e-05, 'epoch': 0.16} 16%|█▌ | 897/5773 [1:23:26<7:22:33, 5.45s/it] 16%|█▌ | 898/5773 [1:23:37<7:21:53, 5.44s/it] 16%|█▌ | 898/5773 [1:23:31<7:21:54, 5.44s/it] {'loss': 0.5991, 'learning_rate': 1.9186148491581782e-05, 'epoch': 0.16} 16%|█▌ | 898/5773 [1:23:37<7:21:53, 5.44s/it] {'loss': 0.5991, 'learning_rate': 1.9186148491581782e-05, 
'epoch': 0.16} 16%|█▌ | 898/5773 [1:23:31<7:21:54, 5.44s/it] 16%|█▌ | 899/5773 [1:23:42<7:21:27, 5.43s/it] 16%|█▌ | 899/5773 [1:23:36<7:21:28, 5.43s/it] {'loss': 0.5964, 'learning_rate': 1.9183929840432227e-05, 'epoch': 0.16} 16%|█▌ | 899/5773 [1:23:42<7:21:27, 5.43s/it] {'loss': 0.5964, 'learning_rate': 1.9183929840432227e-05, 'epoch': 0.16} 16%|█▌ | 899/5773 [1:23:36<7:21:28, 5.43s/it] 6 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 16%|█▌ | 900/5773 [1:23:47<7:18:30, 5.40s/it] 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
16%|█▌ | 900/5773 [1:23:42<7:18:30, 5.40s/it] {'loss': 0.6064, 'learning_rate': 1.9181708297888133e-05, 'epoch': 0.16} 16%|█▌ | 900/5773 [1:23:47<7:18:30, 5.40s/it] {'loss': 0.6064, 'learning_rate': 1.9181708297888133e-05, 'epoch': 0.16} 16%|█▌ | 900/5773 [1:23:42<7:18:30, 5.40s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 16%|█▌ | 901/5773 [1:24:00<12:27:24, 9.20s/it] 16%|█▌ | 901/5773 [1:24:05<12:27:25, 9.20s/it] {'loss': 0.5806, 'learning_rate': 1.9179483864648913e-05, 'epoch': 0.16} 16%|█▌ | 901/5773 [1:24:05<12:27:25, 9.20s/it] {'loss': 0.5806, 'learning_rate': 1.9179483864648913e-05, 'epoch': 0.16} 16%|█▌ | 901/5773 [1:24:00<12:27:24, 9.20s/it] 16%|█▌ | 902/5773 [1:24:11<10:55:13, 8.07s/it] 16%|█▌ | 902/5773 [1:24:05<10:55:12, 8.07s/it] {'loss': 0.591, 'learning_rate': 1.9177256541414894e-05, 'epoch': 0.16} 16%|█▌ | 902/5773 [1:24:11<10:55:13, 8.07s/it] {'loss': 0.591, 'learning_rate': 1.9177256541414894e-05, 'epoch': 0.16} 16%|█▌ | 902/5773 [1:24:05<10:55:12, 8.07s/it] 16%|█▌ | 903/5773 [1:24:16<9:49:12, 7.26s/it] 16%|█▌ | 903/5773 [1:24:11<9:49:12, 7.26s/it] {'loss': 0.5949, 'learning_rate': 1.91750263288873e-05, 'epoch': 0.16} 16%|█▌ | 903/5773 [1:24:16<9:49:12, 7.26s/it] {'loss': 0.5949, 'learning_rate': 1.91750263288873e-05, 'epoch': 0.16} 16%|█▌ | 903/5773 [1:24:11<9:49:12, 7.26s/it] 16%|█▌ | 904/5773 [1:24:22<9:10:28, 6.78s/it] 16%|█▌ | 904/5773 [1:24:16<9:10:28, 6.78s/it] {'loss': 0.589, 'learning_rate': 1.917279322776828e-05, 'epoch': 0.16} 16%|█▌ | 904/5773 [1:24:22<9:10:28, 6.78s/it] {'loss': 0.589, 'learning_rate': 1.917279322776828e-05, 'epoch': 0.16} 16%|█▌ | 904/5773 [1:24:16<9:10:28, 6.78s/it] 16%|█▌ | 905/5773 [1:24:27<8:37:42, 6.38s/it] 16%|█▌ | 905/5773 [1:24:22<8:37:42, 6.38s/it] {'loss': 0.5813, 'learning_rate': 1.917055723876088e-05, 'epoch': 0.16} 16%|█▌ | 905/5773 [1:24:27<8:37:42, 6.38s/it] {'loss': 0.5813, 'learning_rate': 1.917055723876088e-05, 'epoch': 0.16} 16%|█▌ | 905/5773 [1:24:22<8:37:42, 6.38s/it] 16%|█▌ | 906/5773 [1:24:27<8:14:21, 6.09s/it] 16%|█▌ | 906/5773 [1:24:33<8:14:22, 6.09s/it] {'loss': 0.6072, 'learning_rate': 1.916831836256907e-05, 'epoch': 0.16} 16%|█▌ | 906/5773 [1:24:33<8:14:22, 6.09s/it] {'loss': 0.6072, 'learning_rate': 1.916831836256907e-05, 'epoch': 0.16} 16%|█▌ | 906/5773 [1:24:27<8:14:21, 6.09s/it] 16%|█▌ | 
907/5773 [1:24:38<7:59:48, 5.92s/it] 16%|█▌ | 907/5773 [1:24:33<7:59:49, 5.92s/it] {'loss': 0.6004, 'learning_rate': 1.9166076599897706e-05, 'epoch': 0.16} 16%|█▌ | 907/5773 [1:24:38<7:59:48, 5.92s/it] {'loss': 0.6004, 'learning_rate': 1.9166076599897706e-05, 'epoch': 0.16} 16%|█▌ | 907/5773 [1:24:33<7:59:49, 5.92s/it] 16%|█▌ | 908/5773 [1:24:44<7:45:42, 5.74s/it] 16%|█▌ | 908/5773 [1:24:38<7:45:42, 5.74s/it] {'loss': 0.5779, 'learning_rate': 1.916383195145258e-05, 'epoch': 0.16} 16%|█▌ | 908/5773 [1:24:44<7:45:42, 5.74s/it] {'loss': 0.5779, 'learning_rate': 1.916383195145258e-05, 'epoch': 0.16} 16%|█▌ | 908/5773 [1:24:38<7:45:42, 5.74s/it] 16%|█▌ | 909/5773 [1:24:49<7:38:44, 5.66s/it] 16%|█▌ | 909/5773 [1:24:44<7:38:44, 5.66s/it] {'loss': 0.5975, 'learning_rate': 1.916158441794037e-05, 'epoch': 0.16} 16%|█▌ | 909/5773 [1:24:49<7:38:44, 5.66s/it] {'loss': 0.5975, 'learning_rate': 1.916158441794037e-05, 'epoch': 0.16} 16%|█▌ | 909/5773 [1:24:44<7:38:44, 5.66s/it] 16%|█▌ | 910/5773 [1:24:54<7:32:29, 5.58s/it] 16%|█▌ | 910/5773 [1:24:49<7:32:29, 5.58s/it] {'loss': 0.5907, 'learning_rate': 1.9159334000068675e-05, 'epoch': 0.16} 16%|█▌ | 910/5773 [1:24:54<7:32:29, 5.58s/it] {'loss': 0.5907, 'learning_rate': 1.9159334000068675e-05, 'epoch': 0.16} 16%|█▌ | 910/5773 [1:24:49<7:32:29, 5.58s/it] 16%|█▌ | 911/5773 [1:25:00<7:28:41, 5.54s/it] 16%|█▌ | 911/5773 [1:24:54<7:28:42, 5.54s/it] {'loss': 0.5965, 'learning_rate': 1.9157080698546e-05, 'epoch': 0.16} 16%|█▌ | 911/5773 [1:25:00<7:28:41, 5.54s/it] {'loss': 0.5965, 'learning_rate': 1.9157080698546e-05, 'epoch': 0.16} 16%|█▌ | 911/5773 [1:24:54<7:28:42, 5.54s/it] 16%|█▌ | 912/5773 [1:25:05<7:24:05, 5.48s/it] 16%|█▌ | 912/5773 [1:25:00<7:24:05, 5.48s/it] {'loss': 0.5965, 'learning_rate': 1.9154824514081752e-05, 'epoch': 0.16} 16%|█▌ | 912/5773 [1:25:05<7:24:05, 5.48s/it] {'loss': 0.5965, 'learning_rate': 1.9154824514081752e-05, 'epoch': 0.16} 16%|█▌ | 912/5773 [1:25:00<7:24:05, 5.48s/it] 16%|█▌ | 913/5773 [1:25:11<7:23:57, 
5.48s/it] 16%|█▌ | 913/5773 [1:25:05<7:23:57, 5.48s/it] {'loss': 0.6015, 'learning_rate': 1.9152565447386256e-05, 'epoch': 0.16} 16%|█▌ | 913/5773 [1:25:11<7:23:57, 5.48s/it] {'loss': 0.6015, 'learning_rate': 1.9152565447386256e-05, 'epoch': 0.16} 16%|█▌ | 913/5773 [1:25:05<7:23:57, 5.48s/it] 16%|█▌ | 914/5773 [1:25:16<7:23:29, 5.48s/it] 16%|█▌ | 914/5773 [1:25:11<7:23:28, 5.48s/it] {'loss': 0.5978, 'learning_rate': 1.915030349917073e-05, 'epoch': 0.16} 16%|█▌ | 914/5773 [1:25:16<7:23:29, 5.48s/it] {'loss': 0.5978, 'learning_rate': 1.915030349917073e-05, 'epoch': 0.16} 16%|█▌ | 914/5773 [1:25:11<7:23:28, 5.48s/it] 16%|█▌ | 915/5773 [1:25:22<7:27:26, 5.53s/it] 16%|█▌ | 915/5773 [1:25:16<7:27:26, 5.53s/it] {'loss': 0.5996, 'learning_rate': 1.9148038670147315e-05, 'epoch': 0.16} 16%|█▌ | 915/5773 [1:25:22<7:27:26, 5.53s/it] {'loss': 0.5996, 'learning_rate': 1.9148038670147315e-05, 'epoch': 0.16} 16%|█▌ | 915/5773 [1:25:16<7:27:26, 5.53s/it] 16%|█▌ | 916/5773 [1:25:27<7:28:19, 5.54s/it] 16%|█▌ | 916/5773 [1:25:22<7:28:19, 5.54s/it] {'loss': 0.5669, 'learning_rate': 1.914577096102905e-05, 'epoch': 0.16} 16%|█▌ | 916/5773 [1:25:27<7:28:19, 5.54s/it] {'loss': 0.5669, 'learning_rate': 1.914577096102905e-05, 'epoch': 0.16} 16%|█▌ | 916/5773 [1:25:22<7:28:19, 5.54s/it] 16%|█▌ | 917/5773 [1:25:33<7:26:03, 5.51s/it] 16%|█▌ | 917/5773 [1:25:27<7:26:02, 5.51s/it] {'loss': 0.5986, 'learning_rate': 1.9143500372529878e-05, 'epoch': 0.16} 16%|█▌ | 917/5773 [1:25:33<7:26:03, 5.51s/it] {'loss': 0.5986, 'learning_rate': 1.9143500372529878e-05, 'epoch': 0.16} 16%|█▌ | 917/5773 [1:25:27<7:26:02, 5.51s/it] 16%|█▌ | 918/5773 [1:25:38<7:25:11, 5.50s/it] 16%|█▌ | 918/5773 [1:25:33<7:25:11, 5.50s/it] {'loss': 0.5883, 'learning_rate': 1.9141226905364657e-05, 'epoch': 0.16} 16%|█▌ | 918/5773 [1:25:38<7:25:11, 5.50s/it] {'loss': 0.5883, 'learning_rate': 1.9141226905364657e-05, 'epoch': 0.16} 16%|█▌ | 918/5773 [1:25:33<7:25:11, 5.50s/it] 16%|█▌ | 919/5773 [1:25:44<7:19:51, 5.44s/it] 16%|█▌ | 
919/5773 [1:25:38<7:19:51, 5.44s/it] {'loss': 0.5883, 'learning_rate': 1.9138950560249147e-05, 'epoch': 0.16} 16%|█▌ | 919/5773 [1:25:44<7:19:51, 5.44s/it] {'loss': 0.5883, 'learning_rate': 1.9138950560249147e-05, 'epoch': 0.16} 16%|█▌ | 919/5773 [1:25:38<7:19:51, 5.44s/it] 16%|█▌ | 920/5773 [1:25:49<7:21:00, 5.45s/it] 16%|█▌ | 920/5773 [1:25:44<7:21:00, 5.45s/it] {'loss': 0.5792, 'learning_rate': 1.9136671337900013e-05, 'epoch': 0.16} 16%|█▌ | 920/5773 [1:25:49<7:21:00, 5.45s/it] {'loss': 0.5792, 'learning_rate': 1.9136671337900013e-05, 'epoch': 0.16} 16%|█▌ | 920/5773 [1:25:44<7:21:00, 5.45s/it] 16%|█▌ | 921/5773 [1:25:54<7:16:06, 5.39s/it] 16%|█▌ | 921/5773 [1:25:49<7:16:06, 5.39s/it] {'loss': 0.6009, 'learning_rate': 1.9134389239034826e-05, 'epoch': 0.16} 16%|█▌ | 921/5773 [1:25:54<7:16:06, 5.39s/it] {'loss': 0.6009, 'learning_rate': 1.9134389239034826e-05, 'epoch': 0.16} 16%|█▌ | 921/5773 [1:25:49<7:16:06, 5.39s/it] 16%|█▌ | 922/5773 [1:26:00<7:16:40, 5.40s/it] 16%|█▌ | 922/5773 [1:25:54<7:16:39, 5.40s/it] {'loss': 0.6017, 'learning_rate': 1.9132104264372065e-05, 'epoch': 0.16} 16%|█▌ | 922/5773 [1:26:00<7:16:40, 5.40s/it] {'loss': 0.6017, 'learning_rate': 1.9132104264372065e-05, 'epoch': 0.16} 16%|█▌ | 922/5773 [1:25:54<7:16:39, 5.40s/it] 16%|█▌ | 923/5773 [1:26:05<7:13:48, 5.37s/it] 16%|█▌ | 923/5773 [1:26:00<7:13:48, 5.37s/it] {'loss': 0.5919, 'learning_rate': 1.9129816414631116e-05, 'epoch': 0.16} 16%|█▌ | 923/5773 [1:26:05<7:13:48, 5.37s/it] {'loss': 0.5919, 'learning_rate': 1.9129816414631116e-05, 'epoch': 0.16} 16%|█▌ | 923/5773 [1:26:00<7:13:48, 5.37s/it] 16%|█▌ | 924/5773 [1:26:10<7:14:04, 5.37s/it] 16%|█▌ | 924/5773 [1:26:05<7:14:05, 5.37s/it] {'loss': 0.6065, 'learning_rate': 1.9127525690532258e-05, 'epoch': 0.16} 16%|█▌ | 924/5773 [1:26:10<7:14:04, 5.37s/it] {'loss': 0.6065, 'learning_rate': 1.9127525690532258e-05, 'epoch': 0.16} 16%|█▌ | 924/5773 [1:26:05<7:14:05, 5.37s/it] 16%|█▌ | 925/5773 [1:26:16<7:18:05, 5.42s/it] 16%|█▌ | 925/5773 
[1:26:10<7:18:05, 5.42s/it] {'loss': 0.6088, 'learning_rate': 1.9125232092796697e-05, 'epoch': 0.16} 16%|█▌ | 925/5773 [1:26:16<7:18:05, 5.42s/it] {'loss': 0.6088, 'learning_rate': 1.9125232092796697e-05, 'epoch': 0.16} 16%|█▌ | 925/5773 [1:26:10<7:18:05, 5.42s/it] 16%|█▌ | 926/5773 [1:26:22<7:21:45, 5.47s/it] 16%|█▌ | 926/5773 [1:26:16<7:21:46, 5.47s/it] {'loss': 0.5938, 'learning_rate': 1.9122935622146518e-05, 'epoch': 0.16} 16%|█▌ | 926/5773 [1:26:22<7:21:45, 5.47s/it] {'loss': 0.5938, 'learning_rate': 1.9122935622146518e-05, 'epoch': 0.16} 16%|█▌ | 926/5773 [1:26:16<7:21:46, 5.47s/it] 16%|█▌ | 927/5773 [1:26:27<7:20:50, 5.46s/it] 16%|█▌ | 927/5773 [1:26:21<7:20:50, 5.46s/it] {'loss': 0.6032, 'learning_rate': 1.9120636279304733e-05, 'epoch': 0.16} 16%|█▌ | 927/5773 [1:26:27<7:20:50, 5.46s/it] {'loss': 0.6032, 'learning_rate': 1.9120636279304733e-05, 'epoch': 0.16} 16%|█▌ | 927/5773 [1:26:21<7:20:50, 5.46s/it] 16%|█▌ | 928/5773 [1:26:33<7:23:24, 5.49s/it] 16%|█▌ | 928/5773 [1:26:27<7:23:24, 5.49s/it] {'loss': 0.5952, 'learning_rate': 1.911833406499524e-05, 'epoch': 0.16} 16%|█▌ | 928/5773 [1:26:33<7:23:24, 5.49s/it] {'loss': 0.5952, 'learning_rate': 1.911833406499524e-05, 'epoch': 0.16} 16%|█▌ | 928/5773 [1:26:27<7:23:24, 5.49s/it] 16%|█▌ | 929/5773 [1:26:38<7:23:18, 5.49s/it] 16%|█▌ | 929/5773 [1:26:33<7:23:18, 5.49s/it] {'loss': 0.5894, 'learning_rate': 1.911602897994286e-05, 'epoch': 0.16} 16%|█▌ | 929/5773 [1:26:38<7:23:18, 5.49s/it] {'loss': 0.5894, 'learning_rate': 1.911602897994286e-05, 'epoch': 0.16} 16%|█▌ | 929/5773 [1:26:33<7:23:18, 5.49s/it] 16%|█▌ | 930/5773 [1:26:43<7:22:20, 5.48s/it] 16%|█▌ | 930/5773 [1:26:38<7:22:20, 5.48s/it] {'loss': 0.5982, 'learning_rate': 1.91137210248733e-05, 'epoch': 0.16} 16%|█▌ | 930/5773 [1:26:43<7:22:20, 5.48s/it] {'loss': 0.5982, 'learning_rate': 1.91137210248733e-05, 'epoch': 0.16} 16%|█▌ | 930/5773 [1:26:38<7:22:20, 5.48s/it] 16%|█▌ | 931/5773 [1:26:43<7:20:18, 5.46s/it] 16%|█▌ | 931/5773 [1:26:49<7:20:20, 5.46s/it] 
{'loss': 0.5887, 'learning_rate': 1.9111410200513182e-05, 'epoch': 0.16} 16%|█▌ | 931/5773 [1:26:49<7:20:20, 5.46s/it] {'loss': 0.5887, 'learning_rate': 1.9111410200513182e-05, 'epoch': 0.16} 16%|█▌ | 931/5773 [1:26:43<7:20:18, 5.46s/it] 16%|█▌ | 932/5773 [1:26:54<7:19:06, 5.44s/it] 16%|█▌ | 932/5773 [1:26:49<7:19:05, 5.44s/it] {'loss': 0.5924, 'learning_rate': 1.9109096507590022e-05, 'epoch': 0.16} 16%|█▌ | 932/5773 [1:26:54<7:19:06, 5.44s/it] {'loss': 0.5924, 'learning_rate': 1.9109096507590022e-05, 'epoch': 0.16} 16%|█▌ | 932/5773 [1:26:49<7:19:05, 5.44s/it] 16%|█▌ | 933/5773 [1:27:00<7:13:50, 5.38s/it] 16%|█▌ | 933/5773 [1:26:54<7:13:49, 5.38s/it] {'loss': 0.5998, 'learning_rate': 1.910677994683225e-05, 'epoch': 0.16} 16%|█▌ | 933/5773 [1:27:00<7:13:50, 5.38s/it] {'loss': 0.5998, 'learning_rate': 1.910677994683225e-05, 'epoch': 0.16} 16%|█▌ | 933/5773 [1:26:54<7:13:49, 5.38s/it] 16%|█▌ | 934/5773 [1:26:59<7:16:24, 5.41s/it] 16%|█▌ | 934/5773 [1:27:05<7:16:24, 5.41s/it] {'loss': 0.5872, 'learning_rate': 1.910446051896919e-05, 'epoch': 0.16} 16%|█▌ | 934/5773 [1:27:05<7:16:24, 5.41s/it] {'loss': 0.5872, 'learning_rate': 1.910446051896919e-05, 'epoch': 0.16} 16%|█▌ | 934/5773 [1:26:59<7:16:24, 5.41s/it] 16%|█▌ | 935/5773 [1:27:05<7:12:42, 5.37s/it] 16%|█▌ | 935/5773 [1:27:10<7:12:43, 5.37s/it] {'loss': 0.5933, 'learning_rate': 1.9102138224731073e-05, 'epoch': 0.16} 16%|█▌ | 935/5773 [1:27:10<7:12:43, 5.37s/it] {'loss': 0.5933, 'learning_rate': 1.9102138224731073e-05, 'epoch': 0.16} 16%|█▌ | 935/5773 [1:27:05<7:12:42, 5.37s/it] 16%|█▌ | 936/5773 [1:27:10<7:12:12, 5.36s/it] 16%|█▌ | 936/5773 [1:27:16<7:12:12, 5.36s/it] {'loss': 0.5966, 'learning_rate': 1.9099813064849034e-05, 'epoch': 0.16} 16%|█▌ | 936/5773 [1:27:16<7:12:12, 5.36s/it] {'loss': 0.5966, 'learning_rate': 1.9099813064849034e-05, 'epoch': 0.16} 16%|█▌ | 936/5773 [1:27:10<7:12:12, 5.36s/it] 16%|█▌ | 937/5773 [1:27:16<7:23:35, 5.50s/it] 16%|█▌ | 937/5773 [1:27:21<7:23:36, 5.50s/it] {'loss': 0.5976, 
'learning_rate': 1.90974850400551e-05, 'epoch': 0.16} 16%|█▌ | 937/5773 [1:27:21<7:23:36, 5.50s/it] {'loss': 0.5976, 'learning_rate': 1.90974850400551e-05, 'epoch': 0.16} 16%|█▌ | 937/5773 [1:27:16<7:23:35, 5.50s/it] 16%|█▌ | 938/5773 [1:27:27<7:20:26, 5.47s/it] 16%|█▌ | 938/5773 [1:27:21<7:20:26, 5.47s/it] {'loss': 0.5833, 'learning_rate': 1.9095154151082218e-05, 'epoch': 0.16} 16%|█▌ | 938/5773 [1:27:27<7:20:26, 5.47s/it] {'loss': 0.5833, 'learning_rate': 1.9095154151082218e-05, 'epoch': 0.16} 16%|█▌ | 938/5773 [1:27:21<7:20:26, 5.47s/it] 16%|█▋ | 939/5773 [1:27:32<7:17:50, 5.43s/it] 16%|█▋ | 939/5773 [1:27:27<7:17:51, 5.43s/it] {'loss': 0.6186, 'learning_rate': 1.909282039866422e-05, 'epoch': 0.16} 16%|█▋ | 939/5773 [1:27:32<7:17:50, 5.43s/it] {'loss': 0.6186, 'learning_rate': 1.909282039866422e-05, 'epoch': 0.16} 16%|█▋ | 939/5773 [1:27:27<7:17:51, 5.43s/it] 16%|█▋ | 940/5773 [1:27:32<7:19:29, 5.46s/it] 16%|█▋ | 940/5773 [1:27:38<7:19:29, 5.46s/it] {'loss': 0.5819, 'learning_rate': 1.9090483783535844e-05, 'epoch': 0.16} 16%|█▋ | 940/5773 [1:27:38<7:19:29, 5.46s/it] {'loss': 0.5819, 'learning_rate': 1.9090483783535844e-05, 'epoch': 0.16} 16%|█▋ | 940/5773 [1:27:32<7:19:29, 5.46s/it] 16%|█▋ | 941/5773 [1:27:38<7:22:50, 5.50s/it] 16%|█▋ | 941/5773 [1:27:43<7:22:51, 5.50s/it] {'loss': 0.5828, 'learning_rate': 1.9088144306432737e-05, 'epoch': 0.16} 16%|█▋ | 941/5773 [1:27:43<7:22:51, 5.50s/it] {'loss': 0.5828, 'learning_rate': 1.9088144306432737e-05, 'epoch': 0.16} 16%|█▋ | 941/5773 [1:27:38<7:22:50, 5.50s/it] 16%|█▋ | 942/5773 [1:27:43<7:19:02, 5.45s/it] 16%|█▋ | 942/5773 [1:27:49<7:19:02, 5.45s/it] {'loss': 0.5859, 'learning_rate': 1.9085801968091434e-05, 'epoch': 0.16} 16%|█▋ | 942/5773 [1:27:49<7:19:02, 5.45s/it] {'loss': 0.5859, 'learning_rate': 1.9085801968091434e-05, 'epoch': 0.16} 16%|█▋ | 942/5773 [1:27:43<7:19:02, 5.45s/it] 16%|█▋ | 943/5773 [1:27:49<7:17:49, 5.44s/it] 16%|█▋ | 943/5773 [1:27:54<7:17:49, 5.44s/it] {'loss': 0.5943, 'learning_rate': 
1.9083456769249387e-05, 'epoch': 0.16} 16%|█▋ | 943/5773 [1:27:54<7:17:49, 5.44s/it] {'loss': 0.5943, 'learning_rate': 1.9083456769249387e-05, 'epoch': 0.16} 16%|█▋ | 943/5773 [1:27:49<7:17:49, 5.44s/it] 16%|█▋ | 944/5773 [1:28:00<7:18:49, 5.45s/it] 16%|█▋ | 944/5773 [1:27:54<7:18:50, 5.45s/it] {'loss': 0.6001, 'learning_rate': 1.9081108710644933e-05, 'epoch': 0.16} 16%|█▋ | 944/5773 [1:28:00<7:18:49, 5.45s/it] {'loss': 0.6001, 'learning_rate': 1.9081108710644933e-05, 'epoch': 0.16} 16%|█▋ | 944/5773 [1:27:54<7:18:50, 5.45s/it] 16%|█▋ | 945/5773 [1:28:00<7:29:56, 5.59s/it] 16%|█▋ | 945/5773 [1:28:05<7:29:57, 5.59s/it] {'loss': 0.5787, 'learning_rate': 1.9078757793017317e-05, 'epoch': 0.16} 16%|█▋ | 945/5773 [1:28:05<7:29:57, 5.59s/it] {'loss': 0.5787, 'learning_rate': 1.9078757793017317e-05, 'epoch': 0.16} 16%|█▋ | 945/5773 [1:28:00<7:29:56, 5.59s/it] 16%|█▋ | 946/5773 [1:28:05<7:26:59, 5.56s/it] 16%|█▋ | 946/5773 [1:28:11<7:26:59, 5.56s/it] {'loss': 0.5992, 'learning_rate': 1.907640401710668e-05, 'epoch': 0.16} 16%|█▋ | 946/5773 [1:28:11<7:26:59, 5.56s/it] {'loss': 0.5992, 'learning_rate': 1.907640401710668e-05, 'epoch': 0.16} 16%|█▋ | 946/5773 [1:28:05<7:26:59, 5.56s/it] 16%|█▋ | 947/5773 [1:28:11<7:23:50, 5.52s/it] 16%|█▋ | 947/5773 [1:28:16<7:23:50, 5.52s/it] {'loss': 0.5944, 'learning_rate': 1.9074047383654076e-05, 'epoch': 0.16} 16%|█▋ | 947/5773 [1:28:16<7:23:50, 5.52s/it] {'loss': 0.5944, 'learning_rate': 1.9074047383654076e-05, 'epoch': 0.16} 16%|█▋ | 947/5773 [1:28:11<7:23:50, 5.52s/it] 16%|█▋ | 948/5773 [1:28:16<7:22:30, 5.50s/it] 16%|█▋ | 948/5773 [1:28:22<7:22:30, 5.50s/it] {'loss': 0.5834, 'learning_rate': 1.907168789340144e-05, 'epoch': 0.16} 16%|█▋ | 948/5773 [1:28:22<7:22:30, 5.50s/it] {'loss': 0.5834, 'learning_rate': 1.907168789340144e-05, 'epoch': 0.16} 16%|█▋ | 948/5773 [1:28:16<7:22:30, 5.50s/it] 16%|█▋ | 949/5773 [1:28:22<7:26:21, 5.55s/it] 16%|█▋ | 949/5773 [1:28:28<7:26:21, 5.55s/it] {'loss': 0.6156, 'learning_rate': 1.9069325547091614e-05, 
'epoch': 0.16} 16%|█▋ | 949/5773 [1:28:28<7:26:21, 5.55s/it] {'loss': 0.6156, 'learning_rate': 1.9069325547091614e-05, 'epoch': 0.16} 16%|█▋ | 949/5773 [1:28:22<7:26:21, 5.55s/it] 2 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 16%|█▋ | 950/5773 [1:28:33<7:24:53, 5.53s/it] 10 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
16%|█▋ | 950/5773 [1:28:27<7:24:54, 5.53s/it] {'loss': 0.5864, 'learning_rate': 1.9066960345468343e-05, 'epoch': 0.16} 16%|█▋ | 950/5773 [1:28:33<7:24:53, 5.53s/it] {'loss': 0.5864, 'learning_rate': 1.9066960345468343e-05, 'epoch': 0.16} 16%|█▋ | 950/5773 [1:28:27<7:24:54, 5.53s/it] 16%|█▋ | 951/5773 [1:28:38<7:20:10, 5.48s/it] 16%|█▋ | 951/5773 [1:28:33<7:20:11, 5.48s/it] {'loss': 0.5991, 'learning_rate': 1.9064592289276268e-05, 'epoch': 0.16} 16%|█▋ | 951/5773 [1:28:38<7:20:10, 5.48s/it] {'loss': 0.5991, 'learning_rate': 1.9064592289276268e-05, 'epoch': 0.16} 16%|█▋ | 951/5773 [1:28:33<7:20:11, 5.48s/it] 16%|█▋ | 952/5773 [1:28:38<7:16:39, 5.43s/it] 16%|█▋ | 952/5773 [1:28:44<7:16:39, 5.43s/it] {'loss': 0.5804, 'learning_rate': 1.906222137926093e-05, 'epoch': 0.16} 16%|█▋ | 952/5773 [1:28:44<7:16:39, 5.43s/it] {'loss': 0.5804, 'learning_rate': 1.906222137926093e-05, 'epoch': 0.16} 16%|█▋ | 952/5773 [1:28:38<7:16:39, 5.43s/it] 17%|█▋ | 953/5773 [1:28:43<7:11:52, 5.38s/it] 17%|█▋ | 953/5773 [1:28:49<7:11:53, 5.38s/it] {'loss': 0.6111, 'learning_rate': 1.9059847616168766e-05, 'epoch': 0.17} 17%|█▋ | 953/5773 [1:28:49<7:11:53, 5.38s/it] {'loss': 0.6111, 'learning_rate': 1.9059847616168766e-05, 'epoch': 0.17} 17%|█▋ | 953/5773 [1:28:43<7:11:52, 5.38s/it] 17%|█▋ | 954/5773 [1:28:49<7:11:24, 5.37s/it] 17%|█▋ | 954/5773 [1:28:54<7:11:25, 5.37s/it] {'loss': 0.6167, 'learning_rate': 1.9057471000747113e-05, 'epoch': 0.17} 17%|█▋ | 954/5773 [1:28:54<7:11:25, 5.37s/it] {'loss': 0.6167, 'learning_rate': 1.9057471000747113e-05, 'epoch': 0.17} 17%|█▋ | 954/5773 [1:28:49<7:11:24, 5.37s/it] 17%|█▋ | 955/5773 [1:28:54<7:10:37, 5.36s/it] 17%|█▋ | 955/5773 [1:29:00<7:10:38, 5.36s/it] {'loss': 0.5921, 'learning_rate': 1.9055091533744204e-05, 'epoch': 0.17} 17%|█▋ | 955/5773 [1:29:00<7:10:38, 5.36s/it] {'loss': 0.5921, 'learning_rate': 1.9055091533744204e-05, 'epoch': 0.17} 17%|█▋ | 955/5773 [1:28:54<7:10:37, 5.36s/it] 17%|█▋ | 956/5773 [1:29:00<7:14:22, 5.41s/it] 17%|█▋ | 956/5773 
[1:29:05<7:14:21, 5.41s/it] {'loss': 0.5899, 'learning_rate': 1.905270921590917e-05, 'epoch': 0.17} 17%|█▋ | 956/5773 [1:29:05<7:14:21, 5.41s/it] {'loss': 0.5899, 'learning_rate': 1.905270921590917e-05, 'epoch': 0.17} 17%|█▋ | 956/5773 [1:29:00<7:14:22, 5.41s/it] 17%|█▋ | 957/5773 [1:29:05<7:14:51, 5.42s/it] 17%|█▋ | 957/5773 [1:29:11<7:14:52, 5.42s/it] {'loss': 0.5911, 'learning_rate': 1.9050324047992045e-05, 'epoch': 0.17} 17%|█▋ | 957/5773 [1:29:11<7:14:52, 5.42s/it] {'loss': 0.5911, 'learning_rate': 1.9050324047992045e-05, 'epoch': 0.17} 17%|█▋ | 957/5773 [1:29:05<7:14:51, 5.42s/it] 17%|█▋ | 958/5773 [1:29:11<7:16:22, 5.44s/it] 17%|█▋ | 958/5773 [1:29:16<7:16:22, 5.44s/it] {'loss': 0.5881, 'learning_rate': 1.904793603074375e-05, 'epoch': 0.17} 17%|█▋ | 958/5773 [1:29:16<7:16:22, 5.44s/it] {'loss': 0.5881, 'learning_rate': 1.904793603074375e-05, 'epoch': 0.17} 17%|█▋ | 958/5773 [1:29:11<7:16:22, 5.44s/it] 17%|█▋ | 959/5773 [1:29:16<7:16:49, 5.44s/it] 17%|█▋ | 959/5773 [1:29:22<7:16:49, 5.44s/it] {'loss': 0.589, 'learning_rate': 1.9045545164916118e-05, 'epoch': 0.17} 17%|█▋ | 959/5773 [1:29:22<7:16:49, 5.44s/it] {'loss': 0.589, 'learning_rate': 1.9045545164916118e-05, 'epoch': 0.17} 17%|█▋ | 959/5773 [1:29:16<7:16:49, 5.44s/it] 17%|█▋ | 960/5773 [1:29:21<7:13:01, 5.40s/it] 17%|█▋ | 960/5773 [1:29:27<7:13:01, 5.40s/it] {'loss': 0.5886, 'learning_rate': 1.9043151451261865e-05, 'epoch': 0.17} 17%|█▋ | 960/5773 [1:29:27<7:13:01, 5.40s/it] {'loss': 0.5886, 'learning_rate': 1.9043151451261865e-05, 'epoch': 0.17} 17%|█▋ | 960/5773 [1:29:21<7:13:01, 5.40s/it] 17%|█▋ | 961/5773 [1:29:27<7:14:28, 5.42s/it] 17%|█▋ | 961/5773 [1:29:32<7:14:28, 5.42s/it] {'loss': 0.5911, 'learning_rate': 1.9040754890534604e-05, 'epoch': 0.17} 17%|█▋ | 961/5773 [1:29:32<7:14:28, 5.42s/it] {'loss': 0.5911, 'learning_rate': 1.9040754890534604e-05, 'epoch': 0.17} 17%|█▋ | 961/5773 [1:29:27<7:14:28, 5.42s/it] 17%|█▋ | 962/5773 [1:29:32<7:13:35, 5.41s/it] 17%|█▋ | 962/5773 [1:29:38<7:13:35, 
5.41s/it] {'loss': 0.6036, 'learning_rate': 1.9038355483488857e-05, 'epoch': 0.17} 17%|█▋ | 962/5773 [1:29:38<7:13:35, 5.41s/it] {'loss': 0.6036, 'learning_rate': 1.9038355483488857e-05, 'epoch': 0.17} 17%|█▋ | 962/5773 [1:29:32<7:13:35, 5.41s/it] 17%|█▋ | 963/5773 [1:29:38<7:14:24, 5.42s/it] 17%|█▋ | 963/5773 [1:29:43<7:14:24, 5.42s/it] {'loss': 0.5936, 'learning_rate': 1.9035953230880026e-05, 'epoch': 0.17} 17%|█▋ | 963/5773 [1:29:43<7:14:24, 5.42s/it] {'loss': 0.5936, 'learning_rate': 1.9035953230880026e-05, 'epoch': 0.17} 17%|█▋ | 963/5773 [1:29:38<7:14:24, 5.42s/it] 17%|█▋ | 964/5773 [1:29:49<7:19:38, 5.49s/it] 17%|█▋ | 964/5773 [1:29:43<7:19:39, 5.49s/it] {'loss': 0.5976, 'learning_rate': 1.9033548133464425e-05, 'epoch': 0.17} 17%|█▋ | 964/5773 [1:29:49<7:19:38, 5.49s/it] {'loss': 0.5976, 'learning_rate': 1.9033548133464425e-05, 'epoch': 0.17} 17%|█▋ | 964/5773 [1:29:43<7:19:39, 5.49s/it] 17%|█▋ | 965/5773 [1:29:48<7:14:36, 5.42s/it] 17%|█▋ | 965/5773 [1:29:54<7:14:36, 5.42s/it] {'loss': 0.6108, 'learning_rate': 1.903114019199925e-05, 'epoch': 0.17} 17%|█▋ | 965/5773 [1:29:54<7:14:36, 5.42s/it] {'loss': 0.6108, 'learning_rate': 1.903114019199925e-05, 'epoch': 0.17} 17%|█▋ | 965/5773 [1:29:48<7:14:36, 5.42s/it] 17%|█▋ | 966/5773 [1:29:54<7:14:03, 5.42s/it] 17%|█▋ | 966/5773 [1:29:59<7:14:06, 5.42s/it] {'loss': 0.5963, 'learning_rate': 1.9028729407242598e-05, 'epoch': 0.17} 17%|█▋ | 966/5773 [1:29:59<7:14:06, 5.42s/it] {'loss': 0.5963, 'learning_rate': 1.9028729407242598e-05, 'epoch': 0.17} 17%|█▋ | 966/5773 [1:29:54<7:14:03, 5.42s/it] 17%|█▋ | 967/5773 [1:29:59<7:11:18, 5.38s/it] 17%|█▋ | 967/5773 [1:30:05<7:11:17, 5.38s/it] {'loss': 0.5972, 'learning_rate': 1.902631577995347e-05, 'epoch': 0.17} 17%|█▋ | 967/5773 [1:30:05<7:11:17, 5.38s/it] {'loss': 0.5972, 'learning_rate': 1.902631577995347e-05, 'epoch': 0.17} 17%|█▋ | 967/5773 [1:29:59<7:11:18, 5.38s/it] 17%|█▋ | 968/5773 [1:30:05<7:13:57, 5.42s/it] 17%|█▋ | 968/5773 [1:30:10<7:13:57, 5.42s/it] {'loss': 
0.5988, 'learning_rate': 1.9023899310891737e-05, 'epoch': 0.17} 17%|█▋ | 968/5773 [1:30:10<7:13:57, 5.42s/it] {'loss': 0.5988, 'learning_rate': 1.9023899310891737e-05, 'epoch': 0.17} 17%|█▋ | 968/5773 [1:30:05<7:13:57, 5.42s/it] 17%|█▋ | 969/5773 [1:30:10<7:16:12, 5.45s/it] 17%|█▋ | 969/5773 [1:30:16<7:16:13, 5.45s/it] {'loss': 0.6, 'learning_rate': 1.9021480000818193e-05, 'epoch': 0.17} 17%|█▋ | 969/5773 [1:30:16<7:16:13, 5.45s/it] {'loss': 0.6, 'learning_rate': 1.9021480000818193e-05, 'epoch': 0.17} 17%|█▋ | 969/5773 [1:30:10<7:16:12, 5.45s/it] 17%|█▋ | 970/5773 [1:30:16<7:16:45, 5.46s/it] 17%|█▋ | 970/5773 [1:30:21<7:16:44, 5.46s/it] {'loss': 0.5871, 'learning_rate': 1.901905785049451e-05, 'epoch': 0.17} 17%|█▋ | 970/5773 [1:30:21<7:16:44, 5.46s/it] {'loss': 0.5871, 'learning_rate': 1.901905785049451e-05, 'epoch': 0.17} 17%|█▋ | 970/5773 [1:30:16<7:16:45, 5.46s/it] 17%|█▋ | 971/5773 [1:30:27<7:19:43, 5.49s/it] 17%|█▋ | 971/5773 [1:30:21<7:19:44, 5.49s/it] {'loss': 0.5901, 'learning_rate': 1.9016632860683257e-05, 'epoch': 0.17} 17%|█▋ | 971/5773 [1:30:27<7:19:43, 5.49s/it] {'loss': 0.5901, 'learning_rate': 1.9016632860683257e-05, 'epoch': 0.17} 17%|█▋ | 971/5773 [1:30:21<7:19:44, 5.49s/it] 17%|█▋ | 972/5773 [1:30:32<7:15:02, 5.44s/it] 17%|█▋ | 972/5773 [1:30:27<7:15:03, 5.44s/it] {'loss': 0.5891, 'learning_rate': 1.9014205032147904e-05, 'epoch': 0.17} 17%|█▋ | 972/5773 [1:30:32<7:15:02, 5.44s/it] {'loss': 0.5891, 'learning_rate': 1.9014205032147904e-05, 'epoch': 0.17} 17%|█▋ | 972/5773 [1:30:27<7:15:03, 5.44s/it] 17%|█▋ | 973/5773 [1:30:32<7:13:02, 5.41s/it] 17%|█▋ | 973/5773 [1:30:37<7:13:02, 5.41s/it] {'loss': 0.5931, 'learning_rate': 1.9011774365652802e-05, 'epoch': 0.17} 17%|█▋ | 973/5773 [1:30:37<7:13:02, 5.41s/it] {'loss': 0.5931, 'learning_rate': 1.9011774365652802e-05, 'epoch': 0.17} 17%|█▋ | 973/5773 [1:30:32<7:13:02, 5.41s/it] 17%|█▋ | 974/5773 [1:30:43<7:12:08, 5.40s/it] 17%|█▋ | 974/5773 [1:30:37<7:12:08, 5.40s/it] {'loss': 0.5779, 'learning_rate': 
1.9009340861963208e-05, 'epoch': 0.17} 17%|█▋ | 974/5773 [1:30:43<7:12:08, 5.40s/it] {'loss': 0.5779, 'learning_rate': 1.9009340861963208e-05, 'epoch': 0.17} 17%|█▋ | 974/5773 [1:30:37<7:12:08, 5.40s/it] 17%|█▋ | 975/5773 [1:30:43<7:13:18, 5.42s/it] 17%|█▋ | 975/5773 [1:30:48<7:13:18, 5.42s/it] {'loss': 0.5929, 'learning_rate': 1.9006904521845263e-05, 'epoch': 0.17} 17%|█▋ | 975/5773 [1:30:48<7:13:18, 5.42s/it] {'loss': 0.5929, 'learning_rate': 1.9006904521845263e-05, 'epoch': 0.17} 17%|█▋ | 975/5773 [1:30:43<7:13:18, 5.42s/it] 17%|█▋ | 976/5773 [1:30:48<7:16:25, 5.46s/it] 17%|█▋ | 976/5773 [1:30:54<7:16:25, 5.46s/it] {'loss': 0.5966, 'learning_rate': 1.9004465346066005e-05, 'epoch': 0.17} 17%|█▋ | 976/5773 [1:30:54<7:16:25, 5.46s/it] {'loss': 0.5966, 'learning_rate': 1.9004465346066005e-05, 'epoch': 0.17} 17%|█▋ | 976/5773 [1:30:48<7:16:25, 5.46s/it] 17%|█▋ | 977/5773 [1:30:54<7:15:26, 5.45s/it] 17%|█▋ | 977/5773 [1:30:59<7:15:26, 5.45s/it] {'loss': 0.5986, 'learning_rate': 1.9002023335393366e-05, 'epoch': 0.17} 17%|█▋ | 977/5773 [1:30:59<7:15:26, 5.45s/it] {'loss': 0.5986, 'learning_rate': 1.9002023335393366e-05, 'epoch': 0.17} 17%|█▋ | 977/5773 [1:30:54<7:15:26, 5.45s/it] 17%|█▋ | 978/5773 [1:30:59<7:14:53, 5.44s/it] 17%|█▋ | 978/5773 [1:31:05<7:14:53, 5.44s/it] {'loss': 0.6057, 'learning_rate': 1.8999578490596167e-05, 'epoch': 0.17} 17%|█▋ | 978/5773 [1:31:05<7:14:53, 5.44s/it] {'loss': 0.6057, 'learning_rate': 1.8999578490596167e-05, 'epoch': 0.17} 17%|█▋ | 978/5773 [1:30:59<7:14:53, 5.44s/it] 17%|█▋ | 979/5773 [1:31:05<7:12:59, 5.42s/it] 17%|█▋ | 979/5773 [1:31:10<7:12:59, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.8997130812444127e-05, 'epoch': 0.17} 17%|█▋ | 979/5773 [1:31:10<7:12:59, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.8997130812444127e-05, 'epoch': 0.17} 17%|█▋ | 979/5773 [1:31:05<7:12:59, 5.42s/it] 17%|█▋ | 980/5773 [1:31:16<7:13:47, 5.43s/it] 17%|█▋ | 980/5773 [1:31:10<7:13:47, 5.43s/it] {'loss': 0.5874, 'learning_rate': 
1.899468030170785e-05, 'epoch': 0.17} 17%|█▋ | 980/5773 [1:31:16<7:13:47, 5.43s/it] {'loss': 0.5874, 'learning_rate': 1.899468030170785e-05, 'epoch': 0.17} 17%|█▋ | 980/5773 [1:31:10<7:13:47, 5.43s/it] 17%|█▋ | 981/5773 [1:31:15<7:11:50, 5.41s/it] 17%|█▋ | 981/5773 [1:31:21<7:11:50, 5.41s/it] {'loss': 0.5933, 'learning_rate': 1.8992226959158834e-05, 'epoch': 0.17} 17%|█▋ | 981/5773 [1:31:21<7:11:50, 5.41s/it] {'loss': 0.5933, 'learning_rate': 1.8992226959158834e-05, 'epoch': 0.17} 17%|█▋ | 981/5773 [1:31:15<7:11:50, 5.41s/it] 17%|█▋ | 982/5773 [1:31:26<7:15:47, 5.46s/it] 17%|█▋ | 982/5773 [1:31:21<7:15:48, 5.46s/it] {'loss': 0.5964, 'learning_rate': 1.8989770785569472e-05, 'epoch': 0.17} 17%|█▋ | 982/5773 [1:31:26<7:15:47, 5.46s/it] {'loss': 0.5964, 'learning_rate': 1.8989770785569472e-05, 'epoch': 0.17} 17%|█▋ | 982/5773 [1:31:21<7:15:48, 5.46s/it] 17%|█▋ | 983/5773 [1:31:26<7:15:19, 5.45s/it] 17%|█▋ | 983/5773 [1:31:32<7:15:19, 5.45s/it] {'loss': 0.6102, 'learning_rate': 1.898731178171305e-05, 'epoch': 0.17} 17%|█▋ | 983/5773 [1:31:32<7:15:19, 5.45s/it] {'loss': 0.6102, 'learning_rate': 1.898731178171305e-05, 'epoch': 0.17} 17%|█▋ | 983/5773 [1:31:26<7:15:19, 5.45s/it] 17%|█▋ | 984/5773 [1:31:32<7:17:05, 5.48s/it] 17%|█▋ | 984/5773 [1:31:37<7:17:05, 5.48s/it] {'loss': 0.5941, 'learning_rate': 1.898484994836373e-05, 'epoch': 0.17} 17%|█▋ | 984/5773 [1:31:37<7:17:05, 5.48s/it] {'loss': 0.5941, 'learning_rate': 1.898484994836373e-05, 'epoch': 0.17} 17%|█▋ | 984/5773 [1:31:32<7:17:05, 5.48s/it] 17%|█▋ | 985/5773 [1:31:37<7:13:58, 5.44s/it] 17%|█▋ | 985/5773 [1:31:43<7:13:58, 5.44s/it] {'loss': 0.5926, 'learning_rate': 1.8982385286296586e-05, 'epoch': 0.17} 17%|█▋ | 985/5773 [1:31:43<7:13:58, 5.44s/it] {'loss': 0.5926, 'learning_rate': 1.8982385286296586e-05, 'epoch': 0.17} 17%|█▋ | 985/5773 [1:31:37<7:13:58, 5.44s/it] 17%|█▋ | 986/5773 [1:31:48<7:15:04, 5.45s/it] 17%|█▋ | 986/5773 [1:31:43<7:15:04, 5.45s/it] {'loss': 0.593, 'learning_rate': 1.897991779628757e-05, 
'epoch': 0.17} 17%|█▋ | 986/5773 [1:31:48<7:15:04, 5.45s/it] {'loss': 0.593, 'learning_rate': 1.897991779628757e-05, 'epoch': 0.17} 17%|█▋ | 986/5773 [1:31:43<7:15:04, 5.45s/it] 17%|█▋ | 987/5773 [1:31:48<7:10:50, 5.40s/it] 17%|█▋ | 987/5773 [1:31:54<7:10:50, 5.40s/it] {'loss': 0.5832, 'learning_rate': 1.897744747911352e-05, 'epoch': 0.17} 17%|█▋ | 987/5773 [1:31:54<7:10:50, 5.40s/it] {'loss': 0.5832, 'learning_rate': 1.897744747911352e-05, 'epoch': 0.17} 17%|█▋ | 987/5773 [1:31:48<7:10:50, 5.40s/it] 17%|█▋ | 988/5773 [1:31:59<7:12:06, 5.42s/it] 17%|█▋ | 988/5773 [1:31:53<7:12:06, 5.42s/it] {'loss': 0.5915, 'learning_rate': 1.897497433555218e-05, 'epoch': 0.17} 17%|█▋ | 988/5773 [1:31:59<7:12:06, 5.42s/it] {'loss': 0.5915, 'learning_rate': 1.897497433555218e-05, 'epoch': 0.17} 17%|█▋ | 988/5773 [1:31:53<7:12:06, 5.42s/it] 17%|█▋ | 989/5773 [1:31:59<7:12:24, 5.42s/it] 17%|█▋ | 989/5773 [1:32:04<7:12:24, 5.42s/it] {'loss': 0.613, 'learning_rate': 1.8972498366382173e-05, 'epoch': 0.17} 17%|█▋ | 989/5773 [1:32:04<7:12:24, 5.42s/it] {'loss': 0.613, 'learning_rate': 1.8972498366382173e-05, 'epoch': 0.17} 17%|█▋ | 989/5773 [1:31:59<7:12:24, 5.42s/it] 17%|█▋ | 990/5773 [1:32:04<7:11:24, 5.41s/it] 17%|█▋ | 990/5773 [1:32:10<7:11:24, 5.41s/it] {'loss': 0.5864, 'learning_rate': 1.8970019572383007e-05, 'epoch': 0.17} 17%|█▋ | 990/5773 [1:32:10<7:11:24, 5.41s/it] {'loss': 0.5864, 'learning_rate': 1.8970019572383007e-05, 'epoch': 0.17} 17%|█▋ | 990/5773 [1:32:04<7:11:24, 5.41s/it] 17%|█▋ | 991/5773 [1:32:10<7:09:33, 5.39s/it] 17%|█▋ | 991/5773 [1:32:15<7:09:33, 5.39s/it] {'loss': 0.5991, 'learning_rate': 1.896753795433509e-05, 'epoch': 0.17} 17%|█▋ | 991/5773 [1:32:15<7:09:33, 5.39s/it] {'loss': 0.5991, 'learning_rate': 1.896753795433509e-05, 'epoch': 0.17} 17%|█▋ | 991/5773 [1:32:10<7:09:33, 5.39s/it] 17%|█▋ | 992/5773 [1:32:21<7:11:51, 5.42s/it] 17%|█▋ | 992/5773 [1:32:15<7:11:51, 5.42s/it] {'loss': 0.5943, 'learning_rate': 1.8965053513019715e-05, 'epoch': 0.17} 17%|█▋ | 
992/5773 [1:32:21<7:11:51, 5.42s/it] {'loss': 0.5943, 'learning_rate': 1.8965053513019715e-05, 'epoch': 0.17} 17%|█▋ | 992/5773 [1:32:15<7:11:51, 5.42s/it] 17%|█▋ | 993/5773 [1:32:20<7:09:28, 5.39s/it] 17%|█▋ | 993/5773 [1:32:26<7:09:28, 5.39s/it] {'loss': 0.5885, 'learning_rate': 1.8962566249219062e-05, 'epoch': 0.17} 17%|█▋ | 993/5773 [1:32:26<7:09:28, 5.39s/it] {'loss': 0.5885, 'learning_rate': 1.8962566249219062e-05, 'epoch': 0.17} 17%|█▋ | 993/5773 [1:32:20<7:09:28, 5.39s/it] 17%|█▋ | 994/5773 [1:32:26<7:09:48, 5.40s/it] 17%|█▋ | 994/5773 [1:32:31<7:09:49, 5.40s/it] {'loss': 0.5935, 'learning_rate': 1.89600761637162e-05, 'epoch': 0.17} 17%|█▋ | 994/5773 [1:32:31<7:09:49, 5.40s/it] {'loss': 0.5935, 'learning_rate': 1.89600761637162e-05, 'epoch': 0.17} 17%|█▋ | 994/5773 [1:32:26<7:09:48, 5.40s/it] 17%|█▋ | 995/5773 [1:32:31<7:09:00, 5.39s/it] 17%|█▋ | 995/5773 [1:32:37<7:09:00, 5.39s/it] {'loss': 0.5896, 'learning_rate': 1.8957583257295094e-05, 'epoch': 0.17} 17%|█▋ | 995/5773 [1:32:37<7:09:00, 5.39s/it] {'loss': 0.5896, 'learning_rate': 1.8957583257295094e-05, 'epoch': 0.17} 17%|█▋ | 995/5773 [1:32:31<7:09:00, 5.39s/it] 17%|█▋ | 996/5773 [1:32:37<7:06:42, 5.36s/it] 17%|█▋ | 996/5773 [1:32:42<7:06:42, 5.36s/it] {'loss': 0.5989, 'learning_rate': 1.8955087530740584e-05, 'epoch': 0.17} 17%|█▋ | 996/5773 [1:32:42<7:06:42, 5.36s/it] {'loss': 0.5989, 'learning_rate': 1.8955087530740584e-05, 'epoch': 0.17} 17%|█▋ | 996/5773 [1:32:37<7:06:42, 5.36s/it] 17%|█▋ | 997/5773 [1:32:42<7:04:49, 5.34s/it] 17%|█▋ | 997/5773 [1:32:47<7:04:49, 5.34s/it] {'loss': 0.5943, 'learning_rate': 1.8952588984838404e-05, 'epoch': 0.17} 17%|█▋ | 997/5773 [1:32:47<7:04:49, 5.34s/it] {'loss': 0.5943, 'learning_rate': 1.8952588984838404e-05, 'epoch': 0.17} 17%|█▋ | 997/5773 [1:32:42<7:04:49, 5.34s/it] 17%|█▋ | 998/5773 [1:32:47<7:04:32, 5.33s/it] 17%|█▋ | 998/5773 [1:32:53<7:04:32, 5.33s/it] {'loss': 0.6053, 'learning_rate': 1.895008762037518e-05, 'epoch': 0.17} 17%|█▋ | 998/5773 
[1:32:53<7:04:32, 5.33s/it] {'loss': 0.6053, 'learning_rate': 1.895008762037518e-05, 'epoch': 0.17} 17%|█▋ | 998/5773 [1:32:47<7:04:32, 5.33s/it] 17%|█▋ | 999/5773 [1:32:52<7:03:16, 5.32s/it] 17%|█▋ | 999/5773 [1:32:58<7:03:16, 5.32s/it] {'loss': 0.5951, 'learning_rate': 1.894758343813842e-05, 'epoch': 0.17} 17%|█▋ | 999/5773 [1:32:58<7:03:16, 5.32s/it] {'loss': 0.5951, 'learning_rate': 1.894758343813842e-05, 'epoch': 0.17} 17%|█▋ | 999/5773 [1:32:52<7:03:16, 5.32s/it]14 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 25 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 9 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 17%|█▋ | 1000/5773 [1:33:03<7:06:14, 5.36s/it]AutoResumeHook: Checking whether to suspend...4 AutoResumeHook: Checking whether to suspend... 17%|█▋ | 1000/5773 [1:32:58<7:06:14, 5.36s/it]11 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5954, 'learning_rate': 1.8945076438916515e-05, 'epoch': 0.17} 17%|█▋ | 1000/5773 [1:33:03<7:06:14, 5.36s/it] {'loss': 0.5954, 'learning_rate': 1.8945076438916515e-05, 'epoch': 0.17} 17%|█▋ | 1000/5773 [1:32:58<7:06:14, 5.36s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 17%|█▋ | 1001/5773 [1:33:16<12:19:04, 9.29s/it] 17%|█▋ | 1001/5773 [1:33:22<12:19:04, 9.29s/it] {'loss': 0.5991, 'learning_rate': 1.8942566623498758e-05, 'epoch': 0.17} 17%|█▋ | 1001/5773 [1:33:22<12:19:04, 9.29s/it] {'loss': 0.5991, 'learning_rate': 1.8942566623498758e-05, 'epoch': 0.17} 17%|█▋ | 1001/5773 [1:33:16<12:19:04, 9.29s/it] 17%|█▋ | 1002/5773 [1:33:22<10:42:37, 8.08s/it] 17%|█▋ | 1002/5773 [1:33:27<10:42:37, 8.08s/it] {'loss': 0.6209, 'learning_rate': 1.894005399267532e-05, 'epoch': 0.17} 17%|█▋ | 1002/5773 [1:33:27<10:42:37, 8.08s/it] {'loss': 0.6209, 'learning_rate': 1.894005399267532e-05, 'epoch': 0.17} 17%|█▋ | 1002/5773 [1:33:22<10:42:37, 8.08s/it] 17%|█▋ | 1003/5773 [1:33:27<9:38:45, 7.28s/it] 17%|█▋ | 1003/5773 [1:33:33<9:38:46, 7.28s/it] {'loss': 0.6027, 'learning_rate': 1.8937538547237247e-05, 'epoch': 0.17} 17%|█▋ | 1003/5773 [1:33:33<9:38:46, 7.28s/it] {'loss': 0.6027, 'learning_rate': 1.8937538547237247e-05, 'epoch': 0.17} 17%|█▋ | 1003/5773 [1:33:27<9:38:45, 7.28s/it] 17%|█▋ | 1004/5773 [1:33:32<8:53:58, 6.72s/it] 17%|█▋ | 1004/5773 [1:33:38<8:53:58, 6.72s/it] {'loss': 0.581, 'learning_rate': 1.8935020287976486e-05, 'epoch': 0.17} 17%|█▋ | 1004/5773 [1:33:38<8:53:58, 6.72s/it] {'loss': 0.581, 'learning_rate': 1.8935020287976486e-05, 'epoch': 0.17} 17%|█▋ | 1004/5773 [1:33:32<8:53:58, 6.72s/it] 17%|█▋ | 1005/5773 [1:33:38<8:23:44, 6.34s/it] 17%|█▋ | 1005/5773 [1:33:43<8:23:44, 6.34s/it] {'loss': 0.6185, 'learning_rate': 1.893249921568587e-05, 'epoch': 0.17} 17%|█▋ | 1005/5773 [1:33:43<8:23:44, 6.34s/it] {'loss': 0.6185, 'learning_rate': 1.893249921568587e-05, 'epoch': 0.17} 17%|█▋ | 1005/5773 [1:33:38<8:23:44, 6.34s/it] 17%|█▋ | 1006/5773 [1:33:44<8:12:23, 6.20s/it] 17%|█▋ | 1006/5773 [1:33:49<8:12:22, 6.20s/it] {'loss': 0.6023, 'learning_rate': 1.8929975331159106e-05, 'epoch': 0.17} 17%|█▋ | 1006/5773 [1:33:49<8:12:22, 6.20s/it] {'loss': 0.6023, 'learning_rate': 1.8929975331159106e-05, 'epoch': 0.17} 17%|█▋ | 1006/5773 
[1:33:44<8:12:23, 6.20s/it] 17%|█▋ | 1007/5773 [1:33:49<7:53:28, 5.96s/it] 17%|█▋ | 1007/5773 [1:33:55<7:53:29, 5.96s/it] {'loss': 0.6098, 'learning_rate': 1.8927448635190796e-05, 'epoch': 0.17} 17%|█▋ | 1007/5773 [1:33:55<7:53:29, 5.96s/it] {'loss': 0.6098, 'learning_rate': 1.8927448635190796e-05, 'epoch': 0.17} 17%|█▋ | 1007/5773 [1:33:49<7:53:28, 5.96s/it] 17%|█▋ | 1008/5773 [1:33:55<7:39:16, 5.78s/it] 17%|█▋ | 1008/5773 [1:34:00<7:39:16, 5.78s/it] {'loss': 0.578, 'learning_rate': 1.8924919128576428e-05, 'epoch': 0.17} 17%|█▋ | 1008/5773 [1:34:00<7:39:16, 5.78s/it] {'loss': 0.578, 'learning_rate': 1.8924919128576428e-05, 'epoch': 0.17} 17%|█▋ | 1008/5773 [1:33:55<7:39:16, 5.78s/it] 17%|█▋ | 1009/5773 [1:34:00<7:37:05, 5.76s/it] 17%|█▋ | 1009/5773 [1:34:06<7:37:05, 5.76s/it] {'loss': 0.5859, 'learning_rate': 1.892238681211237e-05, 'epoch': 0.17} 17%|█▋ | 1009/5773 [1:34:06<7:37:05, 5.76s/it] {'loss': 0.5859, 'learning_rate': 1.892238681211237e-05, 'epoch': 0.17} 17%|█▋ | 1009/5773 [1:34:00<7:37:05, 5.76s/it] 17%|█▋ | 1010/5773 [1:34:06<7:29:16, 5.66s/it] 17%|█▋ | 1010/5773 [1:34:11<7:29:17, 5.66s/it] {'loss': 0.583, 'learning_rate': 1.8919851686595875e-05, 'epoch': 0.17} 17%|█▋ | 1010/5773 [1:34:11<7:29:17, 5.66s/it] {'loss': 0.583, 'learning_rate': 1.8919851686595875e-05, 'epoch': 0.17} 17%|█▋ | 1010/5773 [1:34:06<7:29:16, 5.66s/it] 18%|█▊ | 1011/5773 [1:34:11<7:21:06, 5.56s/it] 18%|█▊ | 1011/5773 [1:34:16<7:21:06, 5.56s/it] {'loss': 0.5873, 'learning_rate': 1.891731375282508e-05, 'epoch': 0.18} 18%|█▊ | 1011/5773 [1:34:16<7:21:06, 5.56s/it] {'loss': 0.5873, 'learning_rate': 1.891731375282508e-05, 'epoch': 0.18} 18%|█▊ | 1011/5773 [1:34:11<7:21:06, 5.56s/it] 18%|█▊ | 1012/5773 [1:34:17<7:23:06, 5.58s/it] 18%|█▊ | 1012/5773 [1:34:22<7:23:06, 5.58s/it] {'loss': 0.5885, 'learning_rate': 1.8914773011599012e-05, 'epoch': 0.18} 18%|█▊ | 1012/5773 [1:34:22<7:23:06, 5.58s/it] {'loss': 0.5885, 'learning_rate': 1.8914773011599012e-05, 'epoch': 0.18} 18%|█▊ | 1012/5773 
[1:34:17<7:23:06, 5.58s/it] 18%|█▊ | 1013/5773 [1:34:22<7:20:41, 5.56s/it] 18%|█▊ | 1013/5773 [1:34:28<7:20:41, 5.55s/it] {'loss': 0.5927, 'learning_rate': 1.891222946371757e-05, 'epoch': 0.18} 18%|█▊ | 1013/5773 [1:34:28<7:20:41, 5.55s/it] {'loss': 0.5927, 'learning_rate': 1.891222946371757e-05, 'epoch': 0.18} 18%|█▊ | 1013/5773 [1:34:22<7:20:41, 5.56s/it] 18%|█▊ | 1014/5773 [1:34:28<7:17:23, 5.51s/it] 18%|█▊ | 1014/5773 [1:34:33<7:17:23, 5.51s/it] {'loss': 0.5866, 'learning_rate': 1.8909683109981553e-05, 'epoch': 0.18} 18%|█▊ | 1014/5773 [1:34:33<7:17:23, 5.51s/it] {'loss': 0.5866, 'learning_rate': 1.8909683109981553e-05, 'epoch': 0.18} 18%|█▊ | 1014/5773 [1:34:28<7:17:23, 5.51s/it] 18%|█▊ | 1015/5773 [1:34:33<7:14:18, 5.48s/it] 18%|█▊ | 1015/5773 [1:34:38<7:14:18, 5.48s/it] {'loss': 0.603, 'learning_rate': 1.890713395119263e-05, 'epoch': 0.18} 18%|█▊ | 1015/5773 [1:34:38<7:14:18, 5.48s/it] {'loss': 0.603, 'learning_rate': 1.890713395119263e-05, 'epoch': 0.18} 18%|█▊ | 1015/5773 [1:34:33<7:14:18, 5.48s/it] 18%|█▊ | 1016/5773 [1:34:38<7:12:35, 5.46s/it] 18%|█▊ | 1016/5773 [1:34:44<7:12:36, 5.46s/it] {'loss': 0.5961, 'learning_rate': 1.890458198815336e-05, 'epoch': 0.18} 18%|█▊ | 1016/5773 [1:34:44<7:12:36, 5.46s/it] {'loss': 0.5961, 'learning_rate': 1.890458198815336e-05, 'epoch': 0.18} 18%|█▊ | 1016/5773 [1:34:38<7:12:35, 5.46s/it] 18%|█▊ | 1017/5773 [1:34:44<7:13:13, 5.47s/it] 18%|█▊ | 1017/5773 [1:34:49<7:13:13, 5.47s/it] {'loss': 0.5773, 'learning_rate': 1.8902027221667177e-05, 'epoch': 0.18} 18%|█▊ | 1017/5773 [1:34:49<7:13:13, 5.47s/it] {'loss': 0.5773, 'learning_rate': 1.8902027221667177e-05, 'epoch': 0.18} 18%|█▊ | 1017/5773 [1:34:44<7:13:13, 5.47s/it] 18%|█▊ | 1018/5773 [1:34:49<7:13:37, 5.47s/it] 18%|█▊ | 1018/5773 [1:34:55<7:13:37, 5.47s/it] {'loss': 0.5904, 'learning_rate': 1.889946965253841e-05, 'epoch': 0.18} 18%|█▊ | 1018/5773 [1:34:55<7:13:37, 5.47s/it] {'loss': 0.5904, 'learning_rate': 1.889946965253841e-05, 'epoch': 0.18} 18%|█▊ | 1018/5773 
[1:34:49<7:13:37, 5.47s/it] 18%|█▊ | 1019/5773 [1:34:55<7:14:04, 5.48s/it] 18%|█▊ | 1019/5773 [1:35:00<7:14:04, 5.48s/it] {'loss': 0.5961, 'learning_rate': 1.8896909281572258e-05, 'epoch': 0.18} 18%|█▊ | 1019/5773 [1:35:00<7:14:04, 5.48s/it] {'loss': 0.5961, 'learning_rate': 1.8896909281572258e-05, 'epoch': 0.18} 18%|█▊ | 1019/5773 [1:34:55<7:14:04, 5.48s/it] 18%|█▊ | 1020/5773 [1:35:00<7:10:47, 5.44s/it] 18%|█▊ | 1020/5773 [1:35:06<7:10:47, 5.44s/it] {'loss': 0.5969, 'learning_rate': 1.889434610957481e-05, 'epoch': 0.18} 18%|█▊ | 1020/5773 [1:35:06<7:10:47, 5.44s/it] {'loss': 0.5969, 'learning_rate': 1.889434610957481e-05, 'epoch': 0.18} 18%|█▊ | 1020/5773 [1:35:00<7:10:47, 5.44s/it] 18%|█▊ | 1021/5773 [1:35:06<7:10:19, 5.43s/it] 18%|█▊ | 1021/5773 [1:35:11<7:10:19, 5.43s/it] {'loss': 0.6039, 'learning_rate': 1.8891780137353036e-05, 'epoch': 0.18} 18%|█▊ | 1021/5773 [1:35:11<7:10:19, 5.43s/it] {'loss': 0.6039, 'learning_rate': 1.8891780137353036e-05, 'epoch': 0.18} 18%|█▊ | 1021/5773 [1:35:06<7:10:19, 5.43s/it] 18%|█▊ | 1022/5773 [1:35:11<7:16:52, 5.52s/it] 18%|█▊ | 1022/5773 [1:35:17<7:16:52, 5.52s/it] {'loss': 0.5726, 'learning_rate': 1.888921136571478e-05, 'epoch': 0.18} 18%|█▊ | 1022/5773 [1:35:17<7:16:52, 5.52s/it] {'loss': 0.5726, 'learning_rate': 1.888921136571478e-05, 'epoch': 0.18} 18%|█▊ | 1022/5773 [1:35:11<7:16:52, 5.52s/it] 18%|█▊ | 1023/5773 [1:35:17<7:12:45, 5.47s/it] 18%|█▊ | 1023/5773 [1:35:22<7:12:45, 5.47s/it] {'loss': 0.5963, 'learning_rate': 1.8886639795468783e-05, 'epoch': 0.18} 18%|█▊ | 1023/5773 [1:35:22<7:12:45, 5.47s/it] {'loss': 0.5963, 'learning_rate': 1.8886639795468783e-05, 'epoch': 0.18} 18%|█▊ | 1023/5773 [1:35:17<7:12:45, 5.47s/it] 18%|█▊ | 1024/5773 [1:35:22<7:10:05, 5.43s/it] 18%|█▊ | 1024/5773 [1:35:27<7:10:05, 5.43s/it] {'loss': 0.6029, 'learning_rate': 1.8884065427424652e-05, 'epoch': 0.18} 18%|█▊ | 1024/5773 [1:35:27<7:10:05, 5.43s/it] {'loss': 0.6029, 'learning_rate': 1.8884065427424652e-05, 'epoch': 0.18} 18%|█▊ | 1024/5773 
[1:35:22<7:10:05, 5.43s/it] 18%|█▊ | 1025/5773 [1:35:27<7:12:37, 5.47s/it] 18%|█▊ | 1025/5773 [1:35:33<7:12:37, 5.47s/it] {'loss': 0.5892, 'learning_rate': 1.8881488262392876e-05, 'epoch': 0.18} 18%|█▊ | 1025/5773 [1:35:33<7:12:37, 5.47s/it] {'loss': 0.5892, 'learning_rate': 1.8881488262392876e-05, 'epoch': 0.18} 18%|█▊ | 1025/5773 [1:35:28<7:12:37, 5.47s/it] 18%|█▊ | 1026/5773 [1:35:33<7:12:38, 5.47s/it] 18%|█▊ | 1026/5773 [1:35:39<7:12:38, 5.47s/it] {'loss': 0.5982, 'learning_rate': 1.8878908301184836e-05, 'epoch': 0.18} 18%|█▊ | 1026/5773 [1:35:39<7:12:38, 5.47s/it] {'loss': 0.5982, 'learning_rate': 1.8878908301184836e-05, 'epoch': 0.18} 18%|█▊ | 1026/5773 [1:35:33<7:12:38, 5.47s/it] 18%|█▊ | 1027/5773 [1:35:39<7:15:01, 5.50s/it] 18%|█▊ | 1027/5773 [1:35:44<7:15:01, 5.50s/it] {'loss': 0.5967, 'learning_rate': 1.8876325544612782e-05, 'epoch': 0.18} 18%|█▊ | 1027/5773 [1:35:44<7:15:01, 5.50s/it] {'loss': 0.5967, 'learning_rate': 1.8876325544612782e-05, 'epoch': 0.18} 18%|█▊ | 1027/5773 [1:35:39<7:15:01, 5.50s/it] 18%|█▊ | 1028/5773 [1:35:44<7:09:12, 5.43s/it] 18%|█▊ | 1028/5773 [1:35:49<7:09:12, 5.43s/it] {'loss': 0.5968, 'learning_rate': 1.8873739993489852e-05, 'epoch': 0.18} 18%|█▊ | 1028/5773 [1:35:49<7:09:12, 5.43s/it] {'loss': 0.5968, 'learning_rate': 1.8873739993489852e-05, 'epoch': 0.18} 18%|█▊ | 1028/5773 [1:35:44<7:09:12, 5.43s/it] 18%|█▊ | 1029/5773 [1:35:49<7:10:15, 5.44s/it] 18%|█▊ | 1029/5773 [1:35:55<7:10:15, 5.44s/it] {'loss': 0.5845, 'learning_rate': 1.887115164863006e-05, 'epoch': 0.18} 18%|█▊ | 1029/5773 [1:35:55<7:10:15, 5.44s/it] {'loss': 0.5845, 'learning_rate': 1.887115164863006e-05, 'epoch': 0.18} 18%|█▊ | 1029/5773 [1:35:49<7:10:15, 5.44s/it] 18%|█▊ | 1030/5773 [1:35:55<7:11:05, 5.45s/it] 18%|█▊ | 1030/5773 [1:36:00<7:11:05, 5.45s/it] {'loss': 0.5925, 'learning_rate': 1.8868560510848296e-05, 'epoch': 0.18} 18%|█▊ | 1030/5773 [1:36:00<7:11:05, 5.45s/it] {'loss': 0.5925, 'learning_rate': 1.8868560510848296e-05, 'epoch': 0.18} 18%|█▊ | 
1030/5773 [1:35:55<7:11:05, 5.45s/it] 18%|█▊ | 1031/5773 [1:36:00<7:14:37, 5.50s/it] 18%|█▊ | 1031/5773 [1:36:06<7:14:37, 5.50s/it] {'loss': 0.607, 'learning_rate': 1.8865966580960334e-05, 'epoch': 0.18} 18%|█▊ | 1031/5773 [1:36:06<7:14:37, 5.50s/it] {'loss': 0.607, 'learning_rate': 1.8865966580960334e-05, 'epoch': 0.18} 18%|█▊ | 1031/5773 [1:36:00<7:14:37, 5.50s/it] 18%|█▊ | 1032/5773 [1:36:06<7:13:15, 5.48s/it] 18%|█▊ | 1032/5773 [1:36:11<7:13:16, 5.48s/it] {'loss': 0.592, 'learning_rate': 1.8863369859782824e-05, 'epoch': 0.18} 18%|█▊ | 1032/5773 [1:36:11<7:13:16, 5.48s/it] {'loss': 0.592, 'learning_rate': 1.8863369859782824e-05, 'epoch': 0.18} 18%|█▊ | 1032/5773 [1:36:06<7:13:15, 5.48s/it] 18%|█▊ | 1033/5773 [1:36:11<7:15:12, 5.51s/it] 18%|█▊ | 1033/5773 [1:36:17<7:15:12, 5.51s/it] {'loss': 0.5865, 'learning_rate': 1.8860770348133305e-05, 'epoch': 0.18} 18%|█▊ | 1033/5773 [1:36:17<7:15:12, 5.51s/it] {'loss': 0.5865, 'learning_rate': 1.8860770348133305e-05, 'epoch': 0.18} 18%|█▊ | 1033/5773 [1:36:11<7:15:12, 5.51s/it] 18%|█▊ | 1034/5773 [1:36:17<7:13:17, 5.49s/it] 18%|█▊ | 1034/5773 [1:36:22<7:13:17, 5.49s/it] {'loss': 0.5871, 'learning_rate': 1.8858168046830176e-05, 'epoch': 0.18} 18%|█▊ | 1034/5773 [1:36:22<7:13:17, 5.49s/it] {'loss': 0.5871, 'learning_rate': 1.8858168046830176e-05, 'epoch': 0.18} 18%|█▊ | 1034/5773 [1:36:17<7:13:17, 5.49s/it] 18%|█▊ | 1035/5773 [1:36:22<7:09:33, 5.44s/it] 18%|█▊ | 1035/5773 [1:36:28<7:09:33, 5.44s/it] {'loss': 0.6081, 'learning_rate': 1.8855562956692732e-05, 'epoch': 0.18} 18%|█▊ | 1035/5773 [1:36:28<7:09:33, 5.44s/it] {'loss': 0.6081, 'learning_rate': 1.8855562956692732e-05, 'epoch': 0.18} 18%|█▊ | 1035/5773 [1:36:22<7:09:33, 5.44s/it] 18%|█▊ | 1036/5773 [1:36:27<7:06:44, 5.41s/it] 18%|█▊ | 1036/5773 [1:36:33<7:06:43, 5.41s/it] {'loss': 0.5961, 'learning_rate': 1.8852955078541134e-05, 'epoch': 0.18} 18%|█▊ | 1036/5773 [1:36:33<7:06:43, 5.41s/it] {'loss': 0.5961, 'learning_rate': 1.8852955078541134e-05, 'epoch': 0.18} 18%|█▊ | 
1036/5773 [1:36:27<7:06:44, 5.41s/it] 18%|█▊ | 1037/5773 [1:36:33<7:11:16, 5.46s/it] 18%|█▊ | 1037/5773 [1:36:39<7:11:16, 5.46s/it] {'loss': 0.5934, 'learning_rate': 1.8850344413196426e-05, 'epoch': 0.18} 18%|█▊ | 1037/5773 [1:36:39<7:11:16, 5.46s/it] {'loss': 0.5934, 'learning_rate': 1.8850344413196426e-05, 'epoch': 0.18} 18%|█▊ | 1037/5773 [1:36:33<7:11:16, 5.46s/it] 18%|█▊ | 1038/5773 [1:36:38<7:09:46, 5.45s/it] 18%|█▊ | 1038/5773 [1:36:44<7:09:46, 5.45s/it] {'loss': 0.6054, 'learning_rate': 1.8847730961480534e-05, 'epoch': 0.18} 18%|█▊ | 1038/5773 [1:36:44<7:09:46, 5.45s/it] {'loss': 0.6054, 'learning_rate': 1.8847730961480534e-05, 'epoch': 0.18} 18%|█▊ | 1038/5773 [1:36:38<7:09:46, 5.45s/it] 18%|█▊ | 1039/5773 [1:36:44<7:10:19, 5.45s/it] 18%|█▊ | 1039/5773 [1:36:49<7:10:19, 5.45s/it] {'loss': 0.581, 'learning_rate': 1.8845114724216248e-05, 'epoch': 0.18} 18%|█▊ | 1039/5773 [1:36:49<7:10:19, 5.45s/it] {'loss': 0.581, 'learning_rate': 1.8845114724216248e-05, 'epoch': 0.18} 18%|█▊ | 1039/5773 [1:36:44<7:10:19, 5.45s/it] 18%|█▊ | 1040/5773 [1:36:49<7:07:27, 5.42s/it] 18%|█▊ | 1040/5773 [1:36:55<7:07:28, 5.42s/it] {'loss': 0.5991, 'learning_rate': 1.884249570222725e-05, 'epoch': 0.18} 18%|█▊ | 1040/5773 [1:36:55<7:07:28, 5.42s/it] {'loss': 0.5991, 'learning_rate': 1.884249570222725e-05, 'epoch': 0.18} 18%|█▊ | 1040/5773 [1:36:49<7:07:27, 5.42s/it] 18%|█▊ | 1041/5773 [1:36:55<7:09:08, 5.44s/it] 18%|█▊ | 1041/5773 [1:37:00<7:09:08, 5.44s/it] {'loss': 0.6051, 'learning_rate': 1.883987389633809e-05, 'epoch': 0.18} 18%|█▊ | 1041/5773 [1:37:00<7:09:08, 5.44s/it] {'loss': 0.6051, 'learning_rate': 1.883987389633809e-05, 'epoch': 0.18} 18%|█▊ | 1041/5773 [1:36:55<7:09:08, 5.44s/it] 18%|█▊ | 1042/5773 [1:37:00<7:06:55, 5.41s/it] 18%|█▊ | 1042/5773 [1:37:06<7:06:55, 5.41s/it] {'loss': 0.5883, 'learning_rate': 1.8837249307374193e-05, 'epoch': 0.18} 18%|█▊ | 1042/5773 [1:37:06<7:06:55, 5.41s/it] {'loss': 0.5883, 'learning_rate': 1.8837249307374193e-05, 'epoch': 0.18} 18%|█▊ | 
1042/5773 [1:37:00<7:06:55, 5.41s/it] 18%|█▊ | 1043/5773 [1:37:06<7:10:34, 5.46s/it] 18%|█▊ | 1043/5773 [1:37:11<7:10:33, 5.46s/it] {'loss': 0.5946, 'learning_rate': 1.883462193616187e-05, 'epoch': 0.18} 18%|█▊ | 1043/5773 [1:37:11<7:10:33, 5.46s/it] {'loss': 0.5946, 'learning_rate': 1.883462193616187e-05, 'epoch': 0.18} 18%|█▊ | 1043/5773 [1:37:06<7:10:34, 5.46s/it] 18%|█▊ | 1044/5773 [1:37:11<7:13:04, 5.49s/it] 18%|█▊ | 1044/5773 [1:37:17<7:13:05, 5.49s/it] {'loss': 0.5906, 'learning_rate': 1.8831991783528293e-05, 'epoch': 0.18} 18%|█▊ | 1044/5773 [1:37:17<7:13:05, 5.49s/it] {'loss': 0.5906, 'learning_rate': 1.8831991783528293e-05, 'epoch': 0.18} 18%|█▊ | 1044/5773 [1:37:11<7:13:04, 5.49s/it] 18%|█▊ | 1045/5773 [1:37:17<7:10:23, 5.46s/it] 18%|█▊ | 1045/5773 [1:37:22<7:10:24, 5.46s/it] {'loss': 0.6071, 'learning_rate': 1.8829358850301524e-05, 'epoch': 0.18} 18%|█▊ | 1045/5773 [1:37:22<7:10:24, 5.46s/it] {'loss': 0.6071, 'learning_rate': 1.8829358850301524e-05, 'epoch': 0.18} 18%|█▊ | 1045/5773 [1:37:17<7:10:23, 5.46s/it] 18%|█▊ | 1046/5773 [1:37:22<7:12:23, 5.49s/it] 18%|█▊ | 1046/5773 [1:37:28<7:12:23, 5.49s/it] {'loss': 0.5874, 'learning_rate': 1.8826723137310492e-05, 'epoch': 0.18} 18%|█▊ | 1046/5773 [1:37:28<7:12:23, 5.49s/it] {'loss': 0.5874, 'learning_rate': 1.8826723137310492e-05, 'epoch': 0.18} 18%|█▊ | 1046/5773 [1:37:22<7:12:23, 5.49s/it] 18%|█▊ | 1047/5773 [1:37:28<7:11:37, 5.48s/it] 18%|█▊ | 1047/5773 [1:37:33<7:11:37, 5.48s/it] {'loss': 0.5929, 'learning_rate': 1.8824084645385005e-05, 'epoch': 0.18} 18%|█▊ | 1047/5773 [1:37:33<7:11:37, 5.48s/it] {'loss': 0.5929, 'learning_rate': 1.8824084645385005e-05, 'epoch': 0.18} 18%|█▊ | 1047/5773 [1:37:28<7:11:37, 5.48s/it] 18%|█▊ | 1048/5773 [1:37:33<7:13:19, 5.50s/it] 18%|█▊ | 1048/5773 [1:37:39<7:13:19, 5.50s/it] {'loss': 0.6024, 'learning_rate': 1.8821443375355746e-05, 'epoch': 0.18} 18%|█▊ | 1048/5773 [1:37:39<7:13:19, 5.50s/it] {'loss': 0.6024, 'learning_rate': 1.8821443375355746e-05, 'epoch': 0.18} 18%|█▊ 
| 1048/5773 [1:37:33<7:13:19, 5.50s/it] 18%|█▊ | 1049/5773 [1:37:39<7:10:23, 5.47s/it] 18%|█▊ | 1049/5773 [1:37:44<7:10:23, 5.47s/it] {'loss': 0.5843, 'learning_rate': 1.8818799328054265e-05, 'epoch': 0.18} 18%|█▊ | 1049/5773 [1:37:44<7:10:23, 5.47s/it] {'loss': 0.5843, 'learning_rate': 1.8818799328054265e-05, 'epoch': 0.18} 18%|█▊ | 1049/5773 [1:37:39<7:10:23, 5.47s/it]09 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 71 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 23 AutoResumeHook: Checking whether to suspend... 18%|█▊ | 1050/5773 [1:37:44<7:08:55, 5.45s/it]AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 
18%|█▊ | 1050/5773 [1:37:50<7:08:55, 5.45s/it] {'loss': 0.6022, 'learning_rate': 1.8816152504312998e-05, 'epoch': 0.18} 18%|█▊ | 1050/5773 [1:37:50<7:08:55, 5.45s/it] {'loss': 0.6022, 'learning_rate': 1.8816152504312998e-05, 'epoch': 0.18} 18%|█▊ | 1050/5773 [1:37:44<7:08:55, 5.45s/it] 18%|█▊ | 1051/5773 [1:37:49<7:07:15, 5.43s/it] 18%|█▊ | 1051/5773 [1:37:55<7:07:16, 5.43s/it] {'loss': 0.5821, 'learning_rate': 1.881350290496525e-05, 'epoch': 0.18} 18%|█▊ | 1051/5773 [1:37:55<7:07:16, 5.43s/it] {'loss': 0.5821, 'learning_rate': 1.881350290496525e-05, 'epoch': 0.18} 18%|█▊ | 1051/5773 [1:37:49<7:07:15, 5.43s/it] 18%|█▊ | 1052/5773 [1:37:55<7:03:27, 5.38s/it] 18%|█▊ | 1052/5773 [1:38:00<7:03:28, 5.38s/it] {'loss': 0.5946, 'learning_rate': 1.88108505308452e-05, 'epoch': 0.18} 18%|█▊ | 1052/5773 [1:38:00<7:03:28, 5.38s/it] {'loss': 0.5946, 'learning_rate': 1.88108505308452e-05, 'epoch': 0.18} 18%|█▊ | 1052/5773 [1:37:55<7:03:27, 5.38s/it] 18%|█▊ | 1053/5773 [1:38:00<7:02:56, 5.38s/it] 18%|█▊ | 1053/5773 [1:38:06<7:02:56, 5.38s/it] {'loss': 0.581, 'learning_rate': 1.8808195382787894e-05, 'epoch': 0.18} 18%|█▊ | 1053/5773 [1:38:06<7:02:56, 5.38s/it] {'loss': 0.581, 'learning_rate': 1.8808195382787894e-05, 'epoch': 0.18} 18%|█▊ | 1053/5773 [1:38:00<7:02:56, 5.38s/it] 18%|█▊ | 1054/5773 [1:38:05<7:05:01, 5.40s/it] 18%|█▊ | 1054/5773 [1:38:11<7:05:01, 5.40s/it] {'loss': 0.5898, 'learning_rate': 1.8805537461629266e-05, 'epoch': 0.18} 18%|█▊ | 1054/5773 [1:38:11<7:05:01, 5.40s/it] {'loss': 0.5898, 'learning_rate': 1.8805537461629266e-05, 'epoch': 0.18} 18%|█▊ | 1054/5773 [1:38:06<7:05:01, 5.40s/it] 18%|█▊ | 1055/5773 [1:38:11<7:06:45, 5.43s/it] 18%|█▊ | 1055/5773 [1:38:17<7:06:44, 5.43s/it] {'loss': 0.5691, 'learning_rate': 1.8802876768206106e-05, 'epoch': 0.18} 18%|█▊ | 1055/5773 [1:38:17<7:06:44, 5.43s/it] {'loss': 0.5691, 'learning_rate': 1.8802876768206106e-05, 'epoch': 0.18} 18%|█▊ | 1055/5773 [1:38:11<7:06:45, 5.43s/it] 18%|█▊ | 1056/5773 [1:38:16<7:03:08, 5.38s/it] 
18%|█▊ | 1056/5773 [1:38:22<7:03:07, 5.38s/it] {'loss': 0.598, 'learning_rate': 1.880021330335609e-05, 'epoch': 0.18} 18%|█▊ | 1056/5773 [1:38:22<7:03:07, 5.38s/it] {'loss': 0.598, 'learning_rate': 1.880021330335609e-05, 'epoch': 0.18} 18%|█▊ | 1056/5773 [1:38:16<7:03:08, 5.38s/it] 18%|█▊ | 1057/5773 [1:38:22<7:06:17, 5.42s/it] 18%|█▊ | 1057/5773 [1:38:27<7:06:17, 5.42s/it] {'loss': 0.604, 'learning_rate': 1.8797547067917764e-05, 'epoch': 0.18} 18%|█▊ | 1057/5773 [1:38:27<7:06:17, 5.42s/it] {'loss': 0.604, 'learning_rate': 1.8797547067917764e-05, 'epoch': 0.18} 18%|█▊ | 1057/5773 [1:38:22<7:06:17, 5.42s/it] 18%|█▊ | 1058/5773 [1:38:27<7:05:07, 5.41s/it] 18%|█▊ | 1058/5773 [1:38:33<7:05:06, 5.41s/it] {'loss': 0.5989, 'learning_rate': 1.879487806273054e-05, 'epoch': 0.18} 18%|█▊ | 1058/5773 [1:38:33<7:05:06, 5.41s/it] {'loss': 0.5989, 'learning_rate': 1.879487806273054e-05, 'epoch': 0.18} 18%|█▊ | 1058/5773 [1:38:27<7:05:07, 5.41s/it] 18%|█▊ | 1059/5773 [1:38:33<7:06:38, 5.43s/it] 18%|█▊ | 1059/5773 [1:38:38<7:06:38, 5.43s/it] {'loss': 0.586, 'learning_rate': 1.8792206288634706e-05, 'epoch': 0.18} 18%|█▊ | 1059/5773 [1:38:38<7:06:38, 5.43s/it] {'loss': 0.586, 'learning_rate': 1.8792206288634706e-05, 'epoch': 0.18} 18%|█▊ | 1059/5773 [1:38:33<7:06:38, 5.43s/it] 18%|█▊ | 1060/5773 [1:38:38<7:06:29, 5.43s/it] 18%|█▊ | 1060/5773 [1:38:44<7:06:30, 5.43s/it] {'loss': 0.6059, 'learning_rate': 1.8789531746471422e-05, 'epoch': 0.18} 18%|█▊ | 1060/5773 [1:38:44<7:06:30, 5.43s/it] {'loss': 0.6059, 'learning_rate': 1.8789531746471422e-05, 'epoch': 0.18} 18%|█▊ | 1060/5773 [1:38:38<7:06:29, 5.43s/it] 18%|█▊ | 1061/5773 [1:38:44<7:07:27, 5.44s/it] 18%|█▊ | 1061/5773 [1:38:49<7:07:28, 5.44s/it] {'loss': 0.5875, 'learning_rate': 1.8786854437082725e-05, 'epoch': 0.18} 18%|█▊ | 1061/5773 [1:38:49<7:07:28, 5.44s/it] {'loss': 0.5875, 'learning_rate': 1.8786854437082725e-05, 'epoch': 0.18} 18%|█▊ | 1061/5773 [1:38:44<7:07:27, 5.44s/it] 18%|█▊ | 1062/5773 [1:38:49<7:07:02, 5.44s/it] 
18%|█▊ | 1062/5773 [1:38:54<7:07:02, 5.44s/it] {'loss': 0.5911, 'learning_rate': 1.878417436131151e-05, 'epoch': 0.18} 18%|█▊ | 1062/5773 [1:38:55<7:07:02, 5.44s/it] {'loss': 0.5911, 'learning_rate': 1.878417436131151e-05, 'epoch': 0.18} 18%|█▊ | 1062/5773 [1:38:49<7:07:02, 5.44s/it] 18%|█▊ | 1063/5773 [1:38:54<7:04:36, 5.41s/it] 18%|█▊ | 1063/5773 [1:39:00<7:04:37, 5.41s/it] {'loss': 0.6055, 'learning_rate': 1.8781491520001555e-05, 'epoch': 0.18} 18%|█▊ | 1063/5773 [1:39:00<7:04:37, 5.41s/it] {'loss': 0.6055, 'learning_rate': 1.8781491520001555e-05, 'epoch': 0.18} 18%|█▊ | 1063/5773 [1:38:54<7:04:36, 5.41s/it] 18%|█▊ | 1064/5773 [1:39:00<7:03:58, 5.40s/it] 18%|█▊ | 1064/5773 [1:39:05<7:03:58, 5.40s/it] {'loss': 0.5967, 'learning_rate': 1.8778805913997503e-05, 'epoch': 0.18} 18%|█▊ | 1064/5773 [1:39:05<7:03:58, 5.40s/it] {'loss': 0.5967, 'learning_rate': 1.8778805913997503e-05, 'epoch': 0.18} 18%|█▊ | 1064/5773 [1:39:00<7:03:58, 5.40s/it] 18%|█▊ | 1065/5773 [1:39:05<7:13:23, 5.52s/it] 18%|█▊ | 1065/5773 [1:39:11<7:13:23, 5.52s/it] {'loss': 0.6076, 'learning_rate': 1.8776117544144866e-05, 'epoch': 0.18} 18%|█▊ | 1065/5773 [1:39:11<7:13:23, 5.52s/it] {'loss': 0.6076, 'learning_rate': 1.8776117544144866e-05, 'epoch': 0.18} 18%|█▊ | 1065/5773 [1:39:05<7:13:23, 5.52s/it] 18%|█▊ | 1066/5773 [1:39:11<7:09:54, 5.48s/it] 18%|█▊ | 1066/5773 [1:39:16<7:09:53, 5.48s/it] {'loss': 0.5712, 'learning_rate': 1.877342641129003e-05, 'epoch': 0.18} 18%|█▊ | 1066/5773 [1:39:16<7:09:53, 5.48s/it] {'loss': 0.5712, 'learning_rate': 1.877342641129003e-05, 'epoch': 0.18} 18%|█▊ | 1066/5773 [1:39:11<7:09:54, 5.48s/it] 18%|█▊ | 1067/5773 [1:39:16<7:09:45, 5.48s/it] 18%|█▊ | 1067/5773 [1:39:22<7:09:45, 5.48s/it] {'loss': 0.5979, 'learning_rate': 1.8770732516280256e-05, 'epoch': 0.18} 18%|█▊ | 1067/5773 [1:39:22<7:09:45, 5.48s/it] {'loss': 0.5979, 'learning_rate': 1.8770732516280256e-05, 'epoch': 0.18} 18%|█▊ | 1067/5773 [1:39:16<7:09:45, 5.48s/it] 18%|█▊ | 1068/5773 [1:39:22<7:06:48, 5.44s/it] 
18%|█▊ | 1068/5773 [1:39:27<7:06:48, 5.44s/it] {'loss': 0.5938, 'learning_rate': 1.876803585996366e-05, 'epoch': 0.18} 18%|█▊ | 1068/5773 [1:39:27<7:06:48, 5.44s/it] {'loss': 0.5938, 'learning_rate': 1.876803585996366e-05, 'epoch': 0.18} 18%|█▊ | 1068/5773 [1:39:22<7:06:48, 5.44s/it] 19%|█▊ | 1069/5773 [1:39:33<7:07:44, 5.46s/it] {'loss': 0.5872, 'learning_rate': 1.876533644318924e-05, 'epoch': 0.19} 19%|█▊ | 1069/5773 [1:39:33<7:07:44, 5.46s/it] 19%|█▊ | 1069/5773 [1:39:27<7:07:44, 5.46s/it] {'loss': 0.5872, 'learning_rate': 1.876533644318924e-05, 'epoch': 0.19} 19%|█▊ | 1069/5773 [1:39:27<7:07:44, 5.46s/it] 19%|█▊ | 1070/5773 [1:39:33<7:06:17, 5.44s/it] 19%|█▊ | 1070/5773 [1:39:38<7:06:18, 5.44s/it] {'loss': 0.6024, 'learning_rate': 1.8762634266806852e-05, 'epoch': 0.19} 19%|█▊ | 1070/5773 [1:39:38<7:06:18, 5.44s/it] {'loss': 0.6024, 'learning_rate': 1.8762634266806852e-05, 'epoch': 0.19} 19%|█▊ | 1070/5773 [1:39:33<7:06:17, 5.44s/it] 19%|█▊ | 1071/5773 [1:39:38<7:05:50, 5.43s/it] 19%|█▊ | 1071/5773 [1:39:44<7:05:50, 5.43s/it] {'loss': 0.5924, 'learning_rate': 1.875992933166724e-05, 'epoch': 0.19} 19%|█▊ | 1071/5773 [1:39:44<7:05:50, 5.43s/it] {'loss': 0.5924, 'learning_rate': 1.875992933166724e-05, 'epoch': 0.19} 19%|█▊ | 1071/5773 [1:39:38<7:05:50, 5.43s/it] 19%|█▊ | 1072/5773 [1:39:43<7:03:01, 5.40s/it] 19%|█▊ | 1072/5773 [1:39:49<7:03:02, 5.40s/it] {'loss': 0.6062, 'learning_rate': 1.8757221638621993e-05, 'epoch': 0.19} 19%|█▊ | 1072/5773 [1:39:49<7:03:02, 5.40s/it] {'loss': 0.6062, 'learning_rate': 1.8757221638621993e-05, 'epoch': 0.19} 19%|█▊ | 1072/5773 [1:39:43<7:03:01, 5.40s/it] 19%|█▊ | 1073/5773 [1:39:49<7:09:40, 5.49s/it] 19%|█▊ | 1073/5773 [1:39:55<7:09:41, 5.49s/it] {'loss': 0.5948, 'learning_rate': 1.8754511188523583e-05, 'epoch': 0.19} 19%|█▊ | 1073/5773 [1:39:55<7:09:41, 5.49s/it] {'loss': 0.5948, 'learning_rate': 1.8754511188523583e-05, 'epoch': 0.19} 19%|█▊ | 1073/5773 [1:39:49<7:09:40, 5.49s/it] 19%|█▊ | 1074/5773 [1:39:54<7:05:21, 5.43s/it] 
19%|█▊ | 1074/5773 [1:40:00<7:05:21, 5.43s/it] {'loss': 0.6088, 'learning_rate': 1.8751797982225342e-05, 'epoch': 0.19} 19%|█▊ | 1074/5773 [1:40:00<7:05:21, 5.43s/it] {'loss': 0.6088, 'learning_rate': 1.8751797982225342e-05, 'epoch': 0.19} 19%|█▊ | 1074/5773 [1:39:54<7:05:21, 5.43s/it] 19%|█▊ | 1075/5773 [1:40:00<7:02:04, 5.39s/it] 19%|█▊ | 1075/5773 [1:40:05<7:02:04, 5.39s/it] {'loss': 0.6021, 'learning_rate': 1.8749082020581485e-05, 'epoch': 0.19} 19%|█▊ | 1075/5773 [1:40:05<7:02:04, 5.39s/it] {'loss': 0.6021, 'learning_rate': 1.8749082020581485e-05, 'epoch': 0.19} 19%|█▊ | 1075/5773 [1:40:00<7:02:04, 5.39s/it] 19%|█▊ | 1076/5773 [1:40:05<7:04:56, 5.43s/it] 19%|█▊ | 1076/5773 [1:40:11<7:04:56, 5.43s/it] {'loss': 0.5812, 'learning_rate': 1.8746363304447073e-05, 'epoch': 0.19} 19%|█▊ | 1076/5773 [1:40:11<7:04:56, 5.43s/it] {'loss': 0.5812, 'learning_rate': 1.8746363304447073e-05, 'epoch': 0.19} 19%|█▊ | 1076/5773 [1:40:05<7:04:56, 5.43s/it] 19%|█▊ | 1077/5773 [1:40:11<7:04:00, 5.42s/it] 19%|█▊ | 1077/5773 [1:40:16<7:04:01, 5.42s/it] {'loss': 0.5914, 'learning_rate': 1.8743641834678047e-05, 'epoch': 0.19} 19%|█▊ | 1077/5773 [1:40:16<7:04:01, 5.42s/it] {'loss': 0.5914, 'learning_rate': 1.8743641834678047e-05, 'epoch': 0.19} 19%|█▊ | 1077/5773 [1:40:11<7:04:00, 5.42s/it] 19%|█▊ | 1078/5773 [1:40:16<7:09:52, 5.49s/it] 19%|█▊ | 1078/5773 [1:40:22<7:09:52, 5.49s/it] {'loss': 0.5965, 'learning_rate': 1.8740917612131218e-05, 'epoch': 0.19} 19%|█▊ | 1078/5773 [1:40:22<7:09:52, 5.49s/it] {'loss': 0.5965, 'learning_rate': 1.8740917612131218e-05, 'epoch': 0.19} 19%|█▊ | 1078/5773 [1:40:16<7:09:52, 5.49s/it] 19%|█▊ | 1079/5773 [1:40:21<7:05:08, 5.43s/it] 19%|█▊ | 1079/5773 [1:40:27<7:05:07, 5.43s/it] {'loss': 0.5703, 'learning_rate': 1.873819063766425e-05, 'epoch': 0.19} 19%|█▊ | 1079/5773 [1:40:27<7:05:07, 5.43s/it] {'loss': 0.5703, 'learning_rate': 1.873819063766425e-05, 'epoch': 0.19} 19%|█▊ | 1079/5773 [1:40:22<7:05:08, 5.43s/it] 19%|█▊ | 1080/5773 [1:40:27<7:04:33, 
5.43s/it] 19%|█▊ | 1080/5773 [1:40:32<7:04:32, 5.43s/it] {'loss': 0.5895, 'learning_rate': 1.873546091213569e-05, 'epoch': 0.19} 19%|█▊ | 1080/5773 [1:40:32<7:04:32, 5.43s/it] {'loss': 0.5895, 'learning_rate': 1.873546091213569e-05, 'epoch': 0.19} 19%|█▊ | 1080/5773 [1:40:27<7:04:33, 5.43s/it] 19%|█▊ | 1081/5773 [1:40:32<7:05:01, 5.44s/it] 19%|█▊ | 1081/5773 [1:40:38<7:05:02, 5.44s/it] {'loss': 0.6019, 'learning_rate': 1.8732728436404938e-05, 'epoch': 0.19} 19%|█▊ | 1081/5773 [1:40:38<7:05:02, 5.44s/it] {'loss': 0.6019, 'learning_rate': 1.8732728436404938e-05, 'epoch': 0.19} 19%|█▊ | 1081/5773 [1:40:32<7:05:01, 5.44s/it] 19%|█▊ | 1082/5773 [1:40:38<7:06:07, 5.45s/it] 19%|█▊ | 1082/5773 [1:40:43<7:06:06, 5.45s/it] {'loss': 0.5966, 'learning_rate': 1.8729993211332263e-05, 'epoch': 0.19} 19%|█▊ | 1082/5773 [1:40:43<7:06:06, 5.45s/it] {'loss': 0.5966, 'learning_rate': 1.8729993211332263e-05, 'epoch': 0.19} 19%|█▊ | 1082/5773 [1:40:38<7:06:07, 5.45s/it] 19%|█▉ | 1083/5773 [1:40:43<7:03:08, 5.41s/it] 19%|█▉ | 1083/5773 [1:40:49<7:03:08, 5.41s/it] {'loss': 0.5864, 'learning_rate': 1.8727255237778804e-05, 'epoch': 0.19} 19%|█▉ | 1083/5773 [1:40:49<7:03:08, 5.41s/it] {'loss': 0.5864, 'learning_rate': 1.8727255237778804e-05, 'epoch': 0.19} 19%|█▉ | 1083/5773 [1:40:43<7:03:08, 5.41s/it] 19%|█▉ | 1084/5773 [1:40:49<7:09:07, 5.49s/it] 19%|█▉ | 1084/5773 [1:40:54<7:09:06, 5.49s/it] {'loss': 0.5806, 'learning_rate': 1.8724514516606565e-05, 'epoch': 0.19} 19%|█▉ | 1084/5773 [1:40:54<7:09:06, 5.49s/it] {'loss': 0.5806, 'learning_rate': 1.8724514516606565e-05, 'epoch': 0.19} 19%|█▉ | 1084/5773 [1:40:49<7:09:07, 5.49s/it] 19%|█▉ | 1085/5773 [1:40:54<7:08:16, 5.48s/it] 19%|█▉ | 1085/5773 [1:41:00<7:08:16, 5.48s/it] {'loss': 0.5891, 'learning_rate': 1.872177104867841e-05, 'epoch': 0.19} 19%|█▉ | 1085/5773 [1:41:00<7:08:16, 5.48s/it] {'loss': 0.5891, 'learning_rate': 1.872177104867841e-05, 'epoch': 0.19} 19%|█▉ | 1085/5773 [1:40:54<7:08:16, 5.48s/it] 19%|█▉ | 1086/5773 [1:41:00<7:08:37, 
5.49s/it] 19%|█▉ | 1086/5773 [1:41:05<7:08:36, 5.49s/it] {'loss': 0.6024, 'learning_rate': 1.8719024834858065e-05, 'epoch': 0.19} 19%|█▉ | 1086/5773 [1:41:05<7:08:36, 5.49s/it] {'loss': 0.6024, 'learning_rate': 1.8719024834858065e-05, 'epoch': 0.19} 19%|█▉ | 1086/5773 [1:41:00<7:08:37, 5.49s/it] 19%|█▉ | 1087/5773 [1:41:05<7:07:51, 5.48s/it] 19%|█▉ | 1087/5773 [1:41:11<7:07:50, 5.48s/it] {'loss': 0.5786, 'learning_rate': 1.8716275876010135e-05, 'epoch': 0.19} 19%|█▉ | 1087/5773 [1:41:11<7:07:50, 5.48s/it] {'loss': 0.5786, 'learning_rate': 1.8716275876010135e-05, 'epoch': 0.19} 19%|█▉ | 1087/5773 [1:41:05<7:07:51, 5.48s/it] 19%|█▉ | 1088/5773 [1:41:11<7:07:01, 5.47s/it] 19%|█▉ | 1088/5773 [1:41:16<7:07:00, 5.47s/it] {'loss': 0.5904, 'learning_rate': 1.8713524173000075e-05, 'epoch': 0.19} 19%|█▉ | 1088/5773 [1:41:16<7:07:00, 5.47s/it] {'loss': 0.5904, 'learning_rate': 1.8713524173000075e-05, 'epoch': 0.19} 19%|█▉ | 1088/5773 [1:41:11<7:07:01, 5.47s/it] 19%|█▉ | 1089/5773 [1:41:16<7:07:09, 5.47s/it] 19%|█▉ | 1089/5773 [1:41:22<7:07:09, 5.47s/it] {'loss': 0.5882, 'learning_rate': 1.871076972669421e-05, 'epoch': 0.19} 19%|█▉ | 1089/5773 [1:41:22<7:07:09, 5.47s/it] {'loss': 0.5882, 'learning_rate': 1.871076972669421e-05, 'epoch': 0.19} 19%|█▉ | 1089/5773 [1:41:16<7:07:09, 5.47s/it] 19%|█▉ | 1090/5773 [1:41:22<7:03:48, 5.43s/it] 19%|█▉ | 1090/5773 [1:41:27<7:03:49, 5.43s/it] {'loss': 0.5859, 'learning_rate': 1.870801253795973e-05, 'epoch': 0.19} 19%|█▉ | 1090/5773 [1:41:27<7:03:49, 5.43s/it] {'loss': 0.5859, 'learning_rate': 1.870801253795973e-05, 'epoch': 0.19} 19%|█▉ | 1090/5773 [1:41:22<7:03:48, 5.43s/it] 19%|█▉ | 1091/5773 [1:41:27<7:02:38, 5.42s/it] 19%|█▉ | 1091/5773 [1:41:32<7:02:38, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.8705252607664683e-05, 'epoch': 0.19} 19%|█▉ | 1091/5773 [1:41:32<7:02:38, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.8705252607664683e-05, 'epoch': 0.19} 19%|█▉ | 1091/5773 [1:41:27<7:02:38, 5.42s/it] 19%|█▉ | 1092/5773 [1:41:32<7:03:56, 
5.43s/it] 19%|█▉ | 1092/5773 [1:41:38<7:03:57, 5.43s/it] {'loss': 0.6025, 'learning_rate': 1.8702489936677987e-05, 'epoch': 0.19} 19%|█▉ | 1092/5773 [1:41:38<7:03:57, 5.43s/it] {'loss': 0.6025, 'learning_rate': 1.8702489936677987e-05, 'epoch': 0.19} 19%|█▉ | 1092/5773 [1:41:32<7:03:56, 5.43s/it] 19%|█▉ | 1093/5773 [1:41:38<7:00:04, 5.39s/it] 19%|█▉ | 1093/5773 [1:41:43<7:00:04, 5.39s/it] {'loss': 0.5835, 'learning_rate': 1.8699724525869413e-05, 'epoch': 0.19} 19%|█▉ | 1093/5773 [1:41:43<7:00:04, 5.39s/it] {'loss': 0.5835, 'learning_rate': 1.8699724525869413e-05, 'epoch': 0.19} 19%|█▉ | 1093/5773 [1:41:38<7:00:04, 5.39s/it] 19%|█▉ | 1094/5773 [1:41:43<7:01:02, 5.40s/it] 19%|█▉ | 1094/5773 [1:41:49<7:01:02, 5.40s/it] {'loss': 0.5972, 'learning_rate': 1.8696956376109602e-05, 'epoch': 0.19} 19%|█▉ | 1094/5773 [1:41:49<7:01:02, 5.40s/it] {'loss': 0.5972, 'learning_rate': 1.8696956376109602e-05, 'epoch': 0.19} 19%|█▉ | 1094/5773 [1:41:43<7:01:02, 5.40s/it] 19%|█▉ | 1095/5773 [1:41:49<7:02:01, 5.41s/it] 19%|█▉ | 1095/5773 [1:41:54<7:02:00, 5.41s/it] {'loss': 0.5862, 'learning_rate': 1.869418548827006e-05, 'epoch': 0.19} 19%|█▉ | 1095/5773 [1:41:54<7:02:00, 5.41s/it] {'loss': 0.5862, 'learning_rate': 1.869418548827006e-05, 'epoch': 0.19} 19%|█▉ | 1095/5773 [1:41:49<7:02:01, 5.41s/it] 19%|█▉ | 1096/5773 [1:41:54<7:01:49, 5.41s/it] 19%|█▉ | 1096/5773 [1:41:59<7:01:49, 5.41s/it] {'loss': 0.6063, 'learning_rate': 1.8691411863223147e-05, 'epoch': 0.19} 19%|█▉ | 1096/5773 [1:41:59<7:01:49, 5.41s/it] {'loss': 0.6063, 'learning_rate': 1.8691411863223147e-05, 'epoch': 0.19} 19%|█▉ | 1096/5773 [1:41:54<7:01:49, 5.41s/it] 19%|█▉ | 1097/5773 [1:41:59<6:58:32, 5.37s/it] 19%|█▉ | 1097/5773 [1:42:05<6:58:33, 5.37s/it] {'loss': 0.5841, 'learning_rate': 1.8688635501842093e-05, 'epoch': 0.19} 19%|█▉ | 1097/5773 [1:42:05<6:58:33, 5.37s/it] {'loss': 0.5841, 'learning_rate': 1.8688635501842093e-05, 'epoch': 0.19} 19%|█▉ | 1097/5773 [1:41:59<6:58:32, 5.37s/it] 19%|█▉ | 1098/5773 
[1:42:05<7:00:18, 5.39s/it] 19%|█▉ | 1098/5773 [1:42:10<7:00:18, 5.39s/it] {'loss': 0.5855, 'learning_rate': 1.8685856405000984e-05, 'epoch': 0.19} 19%|█▉ | 1098/5773 [1:42:10<7:00:18, 5.39s/it] {'loss': 0.5855, 'learning_rate': 1.8685856405000984e-05, 'epoch': 0.19} 19%|█▉ | 1098/5773 [1:42:05<7:00:18, 5.39s/it] 19%|█▉ | 1099/5773 [1:42:10<6:59:55, 5.39s/it] 19%|█▉ | 1099/5773 [1:42:16<6:59:55, 5.39s/it] {'loss': 0.5863, 'learning_rate': 1.8683074573574763e-05, 'epoch': 0.19} 19%|█▉ | 1099/5773 [1:42:16<6:59:55, 5.39s/it] {'loss': 0.5863, 'learning_rate': 1.8683074573574763e-05, 'epoch': 0.19} 19%|█▉ | 1099/5773 [1:42:10<6:59:55, 5.39s/it]7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 53 AutoResumeHook: Checking whether to suspend...2 1AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 19%|█▉ | 1100/5773 [1:42:15<6:58:39, 5.38s/it]15 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 
19%|█▉ | 1100/5773 [1:42:21<6:58:39, 5.38s/it] {'loss': 0.5939, 'learning_rate': 1.868029000843925e-05, 'epoch': 0.19} 19%|█▉ | 1100/5773 [1:42:21<6:58:39, 5.38s/it] {'loss': 0.5939, 'learning_rate': 1.868029000843925e-05, 'epoch': 0.19} 19%|█▉ | 1100/5773 [1:42:15<6:58:39, 5.38s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 19%|█▉ | 1101/5773 [1:42:35<12:39:49, 9.76s/it] 19%|█▉ | 1101/5773 [1:42:41<12:39:48, 9.76s/it] {'loss': 0.5819, 'learning_rate': 1.8677502710471105e-05, 'epoch': 0.19} 19%|█▉ | 1101/5773 [1:42:41<12:39:48, 9.76s/it] {'loss': 0.5819, 'learning_rate': 1.8677502710471105e-05, 'epoch': 0.19} 19%|█▉ | 1101/5773 [1:42:35<12:39:49, 9.76s/it] 19%|█▉ | 1102/5773 [1:42:41<10:59:25, 8.47s/it] 19%|█▉ | 1102/5773 [1:42:46<10:59:25, 8.47s/it] {'loss': 0.6181, 'learning_rate': 1.8674712680547865e-05, 'epoch': 0.19} 19%|█▉ | 1102/5773 [1:42:46<10:59:25, 8.47s/it] {'loss': 0.6181, 'learning_rate': 1.8674712680547865e-05, 'epoch': 0.19} 19%|█▉ | 1102/5773 [1:42:41<10:59:25, 8.47s/it] 19%|█▉ | 1103/5773 [1:42:46<9:49:44, 7.58s/it] 19%|█▉ | 1103/5773 [1:42:52<9:49:43, 7.58s/it] {'loss': 0.6014, 'learning_rate': 1.8671919919547914e-05, 'epoch': 0.19} 19%|█▉ | 1103/5773 [1:42:52<9:49:43, 7.58s/it] {'loss': 0.6014, 'learning_rate': 1.8671919919547914e-05, 'epoch': 0.19} 19%|█▉ | 1103/5773 [1:42:46<9:49:44, 7.58s/it] 19%|█▉ | 1104/5773 [1:42:52<9:01:20, 6.96s/it] 19%|█▉ | 1104/5773 [1:42:57<9:01:19, 6.96s/it] {'loss': 0.601, 'learning_rate': 1.866912442835051e-05, 'epoch': 0.19} {'loss': 0.601, 'learning_rate': 1.866912442835051e-05, 'epoch': 0.19} 19%|█▉ | 1104/5773 [1:42:52<9:01:20, 6.96s/it] 19%|█▉ | 1104/5773 [1:42:57<9:01:19, 6.96s/it] 19%|█▉ | 1105/5773 [1:42:57<8:25:11, 6.49s/it] 19%|█▉ | 1105/5773 [1:43:03<8:25:11, 6.49s/it] {'loss': 0.5794, 'learning_rate': 1.8666326207835758e-05, 'epoch': 0.19} 19%|█▉ | 1105/5773 [1:43:03<8:25:11, 6.49s/it] {'loss': 0.5794, 'learning_rate': 1.8666326207835758e-05, 'epoch': 0.19} 19%|█▉ | 1105/5773 [1:42:57<8:25:11, 6.49s/it] 19%|█▉ | 1106/5773 [1:43:03<7:59:39, 6.17s/it] 19%|█▉ | 1106/5773 [1:43:08<7:59:39, 6.17s/it] {'loss': 0.5907, 'learning_rate': 1.8663525258884626e-05, 'epoch': 0.19} 19%|█▉ | 1106/5773 [1:43:08<7:59:39, 6.17s/it] {'loss': 0.5907, 'learning_rate': 1.8663525258884626e-05, 'epoch': 0.19} 19%|█▉ | 1106/5773 
[1:43:03<7:59:39, 6.17s/it] 19%|█▉ | 1107/5773 [1:43:08<7:45:49, 5.99s/it] 19%|█▉ | 1107/5773 [1:43:14<7:45:48, 5.99s/it] {'loss': 0.6141, 'learning_rate': 1.866072158237895e-05, 'epoch': 0.19} 19%|█▉ | 1107/5773 [1:43:14<7:45:48, 5.99s/it] {'loss': 0.6141, 'learning_rate': 1.866072158237895e-05, 'epoch': 0.19} 19%|█▉ | 1107/5773 [1:43:08<7:45:49, 5.99s/it] 19%|█▉ | 1108/5773 [1:43:14<7:32:38, 5.82s/it] 19%|█▉ | 1108/5773 [1:43:19<7:32:38, 5.82s/it] {'loss': 0.6042, 'learning_rate': 1.8657915179201408e-05, 'epoch': 0.19} 19%|█▉ | 1108/5773 [1:43:19<7:32:38, 5.82s/it] {'loss': 0.6042, 'learning_rate': 1.8657915179201408e-05, 'epoch': 0.19} 19%|█▉ | 1108/5773 [1:43:14<7:32:38, 5.82s/it] 19%|█▉ | 1109/5773 [1:43:19<7:27:31, 5.76s/it] 19%|█▉ | 1109/5773 [1:43:25<7:27:31, 5.76s/it] {'loss': 0.5857, 'learning_rate': 1.865510605023555e-05, 'epoch': 0.19} 19%|█▉ | 1109/5773 [1:43:25<7:27:31, 5.76s/it] {'loss': 0.5857, 'learning_rate': 1.865510605023555e-05, 'epoch': 0.19} 19%|█▉ | 1109/5773 [1:43:19<7:27:31, 5.76s/it] 19%|█▉ | 1110/5773 [1:43:25<7:20:51, 5.67s/it] 19%|█▉ | 1110/5773 [1:43:30<7:20:51, 5.67s/it] {'loss': 0.6012, 'learning_rate': 1.865229419636578e-05, 'epoch': 0.19} 19%|█▉ | 1110/5773 [1:43:30<7:20:51, 5.67s/it] {'loss': 0.6012, 'learning_rate': 1.865229419636578e-05, 'epoch': 0.19} 19%|█▉ | 1110/5773 [1:43:25<7:20:51, 5.67s/it] 19%|█▉ | 1111/5773 [1:43:30<7:18:58, 5.65s/it] 19%|█▉ | 1111/5773 [1:43:36<7:18:58, 5.65s/it] {'loss': 0.5853, 'learning_rate': 1.8649479618477357e-05, 'epoch': 0.19} 19%|█▉ | 1111/5773 [1:43:36<7:18:58, 5.65s/it] {'loss': 0.5853, 'learning_rate': 1.8649479618477357e-05, 'epoch': 0.19} 19%|█▉ | 1111/5773 [1:43:30<7:18:58, 5.65s/it] 19%|█▉ | 1112/5773 [1:43:36<7:20:39, 5.67s/it] 19%|█▉ | 1112/5773 [1:43:42<7:20:40, 5.67s/it] {'loss': 0.5932, 'learning_rate': 1.86466623174564e-05, 'epoch': 0.19} 19%|█▉ | 1112/5773 [1:43:42<7:20:40, 5.67s/it] {'loss': 0.5932, 'learning_rate': 1.86466623174564e-05, 'epoch': 0.19} 19%|█▉ | 1112/5773 
[1:43:36<7:20:39, 5.67s/it] 19%|█▉ | 1113/5773 [1:43:42<7:17:51, 5.64s/it] 19%|█▉ | 1113/5773 [1:43:47<7:17:51, 5.64s/it] {'loss': 0.5884, 'learning_rate': 1.8643842294189888e-05, 'epoch': 0.19} 19%|█▉ | 1113/5773 [1:43:47<7:17:51, 5.64s/it] {'loss': 0.5884, 'learning_rate': 1.8643842294189888e-05, 'epoch': 0.19} 19%|█▉ | 1113/5773 [1:43:42<7:17:51, 5.64s/it] 19%|█▉ | 1114/5773 [1:43:47<7:12:19, 5.57s/it] 19%|█▉ | 1114/5773 [1:43:53<7:12:19, 5.57s/it] {'loss': 0.5846, 'learning_rate': 1.8641019549565654e-05, 'epoch': 0.19} 19%|█▉ | 1114/5773 [1:43:53<7:12:19, 5.57s/it] {'loss': 0.5846, 'learning_rate': 1.8641019549565654e-05, 'epoch': 0.19} 19%|█▉ | 1114/5773 [1:43:47<7:12:19, 5.57s/it] 19%|█▉ | 1115/5773 [1:43:52<7:08:23, 5.52s/it] 19%|█▉ | 1115/5773 [1:43:58<7:08:23, 5.52s/it] {'loss': 0.5963, 'learning_rate': 1.8638194084472384e-05, 'epoch': 0.19} 19%|█▉ | 1115/5773 [1:43:58<7:08:23, 5.52s/it] {'loss': 0.5963, 'learning_rate': 1.8638194084472384e-05, 'epoch': 0.19} 19%|█▉ | 1115/5773 [1:43:52<7:08:23, 5.52s/it] 19%|█▉ | 1116/5773 [1:43:58<7:04:06, 5.46s/it] 19%|█▉ | 1116/5773 [1:44:03<7:04:06, 5.46s/it] {'loss': 0.5857, 'learning_rate': 1.863536589979963e-05, 'epoch': 0.19} 19%|█▉ | 1116/5773 [1:44:03<7:04:06, 5.46s/it] {'loss': 0.5857, 'learning_rate': 1.863536589979963e-05, 'epoch': 0.19} 19%|█▉ | 1116/5773 [1:43:58<7:04:06, 5.46s/it] 19%|█▉ | 1117/5773 [1:44:03<7:00:05, 5.41s/it] 19%|█▉ | 1117/5773 [1:44:09<7:00:05, 5.41s/it] {'loss': 0.5835, 'learning_rate': 1.8632534996437793e-05, 'epoch': 0.19} 19%|█▉ | 1117/5773 [1:44:09<7:00:05, 5.41s/it] {'loss': 0.5835, 'learning_rate': 1.8632534996437793e-05, 'epoch': 0.19} 19%|█▉ | 1117/5773 [1:44:03<7:00:05, 5.41s/it] 19%|█▉ | 1118/5773 [1:44:09<7:03:03, 5.45s/it] 19%|█▉ | 1118/5773 [1:44:14<7:03:03, 5.45s/it] {'loss': 0.6107, 'learning_rate': 1.862970137527813e-05, 'epoch': 0.19} 19%|█▉ | 1118/5773 [1:44:14<7:03:03, 5.45s/it] {'loss': 0.6107, 'learning_rate': 1.862970137527813e-05, 'epoch': 0.19} 19%|█▉ | 1118/5773 
[1:44:09<7:03:03, 5.45s/it] 19%|█▉ | 1119/5773 [1:44:14<7:04:29, 5.47s/it] 19%|█▉ | 1119/5773 [1:44:20<7:04:28, 5.47s/it] {'loss': 0.6012, 'learning_rate': 1.8626865037212756e-05, 'epoch': 0.19} 19%|█▉ | 1119/5773 [1:44:20<7:04:28, 5.47s/it] {'loss': 0.6012, 'learning_rate': 1.8626865037212756e-05, 'epoch': 0.19} 19%|█▉ | 1119/5773 [1:44:14<7:04:29, 5.47s/it] 19%|█▉ | 1120/5773 [1:44:20<7:04:45, 5.48s/it] 19%|█▉ | 1120/5773 [1:44:25<7:04:45, 5.48s/it] {'loss': 0.5911, 'learning_rate': 1.8624025983134643e-05, 'epoch': 0.19} 19%|█▉ | 1120/5773 [1:44:25<7:04:45, 5.48s/it] {'loss': 0.5911, 'learning_rate': 1.8624025983134643e-05, 'epoch': 0.19} 19%|█▉ | 1120/5773 [1:44:20<7:04:45, 5.48s/it] 19%|█▉ | 1121/5773 [1:44:25<7:04:54, 5.48s/it] 19%|█▉ | 1121/5773 [1:44:31<7:04:54, 5.48s/it] {'loss': 0.587, 'learning_rate': 1.8621184213937615e-05, 'epoch': 0.19} 19%|█▉ | 1121/5773 [1:44:25<7:04:54, 5.48s/it]{'loss': 0.587, 'learning_rate': 1.8621184213937615e-05, 'epoch': 0.19} 19%|█▉ | 1121/5773 [1:44:31<7:04:54, 5.48s/it] 19%|█▉ | 1122/5773 [1:44:31<7:03:21, 5.46s/it] 19%|█▉ | 1122/5773 [1:44:36<7:03:22, 5.46s/it] {'loss': 0.579, 'learning_rate': 1.8618339730516353e-05, 'epoch': 0.19} 19%|█▉ | 1122/5773 [1:44:36<7:03:22, 5.46s/it] {'loss': 0.579, 'learning_rate': 1.8618339730516353e-05, 'epoch': 0.19} 19%|█▉ | 1122/5773 [1:44:31<7:03:21, 5.46s/it] 19%|█▉ | 1123/5773 [1:44:36<7:08:36, 5.53s/it] 19%|█▉ | 1123/5773 [1:44:42<7:08:36, 5.53s/it] {'loss': 0.5704, 'learning_rate': 1.8615492533766387e-05, 'epoch': 0.19} 19%|█▉ | 1123/5773 [1:44:42<7:08:36, 5.53s/it] {'loss': 0.5704, 'learning_rate': 1.8615492533766387e-05, 'epoch': 0.19} 19%|█▉ | 1123/5773 [1:44:36<7:08:36, 5.53s/it] 19%|█▉ | 1124/5773 [1:44:42<7:08:23, 5.53s/it] 19%|█▉ | 1124/5773 [1:44:47<7:08:23, 5.53s/it] {'loss': 0.589, 'learning_rate': 1.861264262458411e-05, 'epoch': 0.19} 19%|█▉ | 1124/5773 [1:44:47<7:08:23, 5.53s/it] {'loss': 0.589, 'learning_rate': 1.861264262458411e-05, 'epoch': 0.19} 19%|█▉ | 1124/5773 
[1:44:42<7:08:23, 5.53s/it] 19%|█▉ | 1125/5773 [1:44:47<7:06:38, 5.51s/it] 19%|█▉ | 1125/5773 [1:44:53<7:06:38, 5.51s/it] {'loss': 0.5886, 'learning_rate': 1.860979000386676e-05, 'epoch': 0.19} 19%|█▉ | 1125/5773 [1:44:53<7:06:38, 5.51s/it] {'loss': 0.5886, 'learning_rate': 1.860979000386676e-05, 'epoch': 0.19} 19%|█▉ | 1125/5773 [1:44:47<7:06:38, 5.51s/it] 20%|█▉ | 1126/5773 [1:44:53<7:02:09, 5.45s/it] 20%|█▉ | 1126/5773 [1:44:58<7:02:10, 5.45s/it] {'loss': 0.5866, 'learning_rate': 1.860693467251244e-05, 'epoch': 0.2} 20%|█▉ | 1126/5773 [1:44:58<7:02:10, 5.45s/it] {'loss': 0.5866, 'learning_rate': 1.860693467251244e-05, 'epoch': 0.2} 20%|█▉ | 1126/5773 [1:44:53<7:02:09, 5.45s/it] 20%|█▉ | 1127/5773 [1:44:58<7:03:15, 5.47s/it] 20%|█▉ | 1127/5773 [1:45:04<7:03:14, 5.47s/it] {'loss': 0.5875, 'learning_rate': 1.8604076631420095e-05, 'epoch': 0.2} 20%|█▉ | 1127/5773 [1:45:04<7:03:14, 5.47s/it] {'loss': 0.5875, 'learning_rate': 1.8604076631420095e-05, 'epoch': 0.2} 20%|█▉ | 1127/5773 [1:44:58<7:03:15, 5.47s/it] 20%|█▉ | 1128/5773 [1:45:03<7:01:38, 5.45s/it] 20%|█▉ | 1128/5773 [1:45:09<7:01:38, 5.45s/it] {'loss': 0.5912, 'learning_rate': 1.8601215881489528e-05, 'epoch': 0.2} 20%|█▉ | 1128/5773 [1:45:09<7:01:38, 5.45s/it] {'loss': 0.5912, 'learning_rate': 1.8601215881489528e-05, 'epoch': 0.2} 20%|█▉ | 1128/5773 [1:45:03<7:01:38, 5.45s/it] 20%|█▉ | 1129/5773 [1:45:09<7:05:22, 5.50s/it] 20%|█▉ | 1129/5773 [1:45:15<7:05:22, 5.50s/it] {'loss': 0.5796, 'learning_rate': 1.8598352423621394e-05, 'epoch': 0.2} 20%|█▉ | 1129/5773 [1:45:15<7:05:22, 5.50s/it] {'loss': 0.5796, 'learning_rate': 1.8598352423621394e-05, 'epoch': 0.2} 20%|█▉ | 1129/5773 [1:45:09<7:05:22, 5.50s/it] 20%|█▉ | 1130/5773 [1:45:14<7:04:45, 5.49s/it] 20%|█▉ | 1130/5773 [1:45:20<7:04:45, 5.49s/it] {'loss': 0.5958, 'learning_rate': 1.8595486258717198e-05, 'epoch': 0.2} 20%|█▉ | 1130/5773 [1:45:20<7:04:45, 5.49s/it] {'loss': 0.5958, 'learning_rate': 1.8595486258717198e-05, 'epoch': 0.2} 20%|█▉ | 1130/5773 
[1:45:15<7:04:45, 5.49s/it] 20%|█▉ | 1131/5773 [1:45:20<7:02:13, 5.46s/it] 20%|█▉ | 1131/5773 [1:45:25<7:02:13, 5.46s/it] {'loss': 0.596, 'learning_rate': 1.8592617387679304e-05, 'epoch': 0.2} 20%|█▉ | 1131/5773 [1:45:25<7:02:13, 5.46s/it] {'loss': 0.596, 'learning_rate': 1.8592617387679304e-05, 'epoch': 0.2} 20%|█▉ | 1131/5773 [1:45:20<7:02:13, 5.46s/it] 20%|█▉ | 1132/5773 [1:45:25<7:02:41, 5.46s/it] 20%|█▉ | 1132/5773 [1:45:31<7:02:41, 5.46s/it] {'loss': 0.5848, 'learning_rate': 1.858974581141093e-05, 'epoch': 0.2} 20%|█▉ | 1132/5773 [1:45:31<7:02:41, 5.46s/it] {'loss': 0.5848, 'learning_rate': 1.858974581141093e-05, 'epoch': 0.2} 20%|█▉ | 1132/5773 [1:45:25<7:02:41, 5.46s/it] 20%|█▉ | 1133/5773 [1:45:31<7:03:00, 5.47s/it] 20%|█▉ | 1133/5773 [1:45:36<7:02:59, 5.47s/it] {'loss': 0.6046, 'learning_rate': 1.858687153081613e-05, 'epoch': 0.2} 20%|█▉ | 1133/5773 [1:45:36<7:02:59, 5.47s/it] {'loss': 0.6046, 'learning_rate': 1.858687153081613e-05, 'epoch': 0.2} 20%|█▉ | 1133/5773 [1:45:31<7:03:00, 5.47s/it] 20%|█▉ | 1134/5773 [1:45:36<7:05:32, 5.50s/it] 20%|█▉ | 1134/5773 [1:45:42<7:05:32, 5.50s/it] {'loss': 0.5836, 'learning_rate': 1.8583994546799822e-05, 'epoch': 0.2} 20%|█▉ | 1134/5773 [1:45:42<7:05:32, 5.50s/it] {'loss': 0.5836, 'learning_rate': 1.8583994546799822e-05, 'epoch': 0.2} 20%|█▉ | 1134/5773 [1:45:36<7:05:32, 5.50s/it] 20%|█▉ | 1135/5773 [1:45:42<7:02:21, 5.46s/it] 20%|█▉ | 1135/5773 [1:45:47<7:02:20, 5.46s/it] {'loss': 0.5925, 'learning_rate': 1.858111486026778e-05, 'epoch': 0.2} 20%|█▉ | 1135/5773 [1:45:47<7:02:20, 5.46s/it] {'loss': 0.5925, 'learning_rate': 1.858111486026778e-05, 'epoch': 0.2} 20%|█▉ | 1135/5773 [1:45:42<7:02:21, 5.46s/it] 20%|█▉ | 1136/5773 [1:45:47<6:59:51, 5.43s/it] 20%|█▉ | 1136/5773 [1:45:53<6:59:51, 5.43s/it] {'loss': 0.6041, 'learning_rate': 1.857823247212661e-05, 'epoch': 0.2} 20%|█▉ | 1136/5773 [1:45:53<6:59:51, 5.43s/it] {'loss': 0.6041, 'learning_rate': 1.857823247212661e-05, 'epoch': 0.2} 20%|█▉ | 1136/5773 [1:45:47<6:59:51, 
5.43s/it] 20%|█▉ | 1137/5773 [1:45:53<6:57:50, 5.41s/it] 20%|█▉ | 1137/5773 [1:45:58<6:57:50, 5.41s/it] {'loss': 0.6006, 'learning_rate': 1.8575347383283788e-05, 'epoch': 0.2} 20%|█▉ | 1137/5773 [1:45:58<6:57:50, 5.41s/it] {'loss': 0.6006, 'learning_rate': 1.8575347383283788e-05, 'epoch': 0.2} 20%|█▉ | 1137/5773 [1:45:53<6:57:50, 5.41s/it] 20%|█▉ | 1138/5773 [1:45:58<6:58:55, 5.42s/it] 20%|█▉ | 1138/5773 [1:46:04<6:58:55, 5.42s/it] {'loss': 0.6089, 'learning_rate': 1.8572459594647626e-05, 'epoch': 0.2} 20%|█▉ | 1138/5773 [1:46:04<6:58:55, 5.42s/it] {'loss': 0.6089, 'learning_rate': 1.8572459594647626e-05, 'epoch': 0.2} 20%|█▉ | 1138/5773 [1:45:58<6:58:55, 5.42s/it] 20%|█▉ | 1139/5773 [1:46:03<6:57:28, 5.41s/it] 20%|█▉ | 1139/5773 [1:46:09<6:57:29, 5.41s/it] {'loss': 0.5736, 'learning_rate': 1.8569569107127297e-05, 'epoch': 0.2} 20%|█▉ | 1139/5773 [1:46:09<6:57:29, 5.41s/it] {'loss': 0.5736, 'learning_rate': 1.8569569107127297e-05, 'epoch': 0.2} 20%|█▉ | 1139/5773 [1:46:03<6:57:28, 5.41s/it] 20%|█▉ | 1140/5773 [1:46:09<6:55:12, 5.38s/it] 20%|█▉ | 1140/5773 [1:46:14<6:55:12, 5.38s/it] {'loss': 0.5941, 'learning_rate': 1.8566675921632817e-05, 'epoch': 0.2} 20%|█▉ | 1140/5773 [1:46:14<6:55:12, 5.38s/it] {'loss': 0.5941, 'learning_rate': 1.8566675921632817e-05, 'epoch': 0.2} 20%|█▉ | 1140/5773 [1:46:09<6:55:12, 5.38s/it] 20%|█▉ | 1141/5773 [1:46:14<6:56:31, 5.40s/it] 20%|█▉ | 1141/5773 [1:46:20<6:56:31, 5.40s/it] {'loss': 0.5767, 'learning_rate': 1.8563780039075055e-05, 'epoch': 0.2} 20%|█▉ | 1141/5773 [1:46:20<6:56:31, 5.40s/it] {'loss': 0.5767, 'learning_rate': 1.8563780039075055e-05, 'epoch': 0.2} 20%|█▉ | 1141/5773 [1:46:14<6:56:31, 5.40s/it] 20%|█▉ | 1142/5773 [1:46:20<7:00:16, 5.45s/it] 20%|█▉ | 1142/5773 [1:46:25<7:00:16, 5.45s/it] {'loss': 0.5897, 'learning_rate': 1.8560881460365726e-05, 'epoch': 0.2} 20%|█▉ | 1142/5773 [1:46:25<7:00:16, 5.45s/it] {'loss': 0.5897, 'learning_rate': 1.8560881460365726e-05, 'epoch': 0.2} 20%|█▉ | 1142/5773 [1:46:20<7:00:16, 
5.45s/it] 20%|█▉ | 1143/5773 [1:46:25<7:02:28, 5.47s/it] 20%|█▉ | 1143/5773 [1:46:31<7:02:27, 5.47s/it] {'loss': 0.6032, 'learning_rate': 1.855798018641739e-05, 'epoch': 0.2} 20%|█▉ | 1143/5773 [1:46:31<7:02:27, 5.47s/it] {'loss': 0.6032, 'learning_rate': 1.855798018641739e-05, 'epoch': 0.2} 20%|█▉ | 1143/5773 [1:46:25<7:02:28, 5.47s/it] 20%|█▉ | 1144/5773 [1:46:31<7:05:47, 5.52s/it] 20%|█▉ | 1144/5773 [1:46:36<7:05:48, 5.52s/it] {'loss': 0.5854, 'learning_rate': 1.855507621814347e-05, 'epoch': 0.2} 20%|█▉ | 1144/5773 [1:46:36<7:05:48, 5.52s/it] {'loss': 0.5854, 'learning_rate': 1.855507621814347e-05, 'epoch': 0.2} 20%|█▉ | 1144/5773 [1:46:31<7:05:47, 5.52s/it] 20%|█▉ | 1145/5773 [1:46:36<7:03:23, 5.49s/it] 20%|█▉ | 1145/5773 [1:46:42<7:03:23, 5.49s/it] {'loss': 0.5915, 'learning_rate': 1.8552169556458224e-05, 'epoch': 0.2} 20%|█▉ | 1145/5773 [1:46:42<7:03:23, 5.49s/it] {'loss': 0.5915, 'learning_rate': 1.8552169556458224e-05, 'epoch': 0.2} 20%|█▉ | 1145/5773 [1:46:36<7:03:23, 5.49s/it] 20%|█▉ | 1146/5773 [1:46:42<7:01:39, 5.47s/it] 20%|█▉ | 1146/5773 [1:46:47<7:01:40, 5.47s/it] {'loss': 0.5874, 'learning_rate': 1.854926020227676e-05, 'epoch': 0.2} 20%|█▉ | 1146/5773 [1:46:47<7:01:40, 5.47s/it] {'loss': 0.5874, 'learning_rate': 1.854926020227676e-05, 'epoch': 0.2} 20%|█▉ | 1146/5773 [1:46:42<7:01:39, 5.47s/it] 20%|█▉ | 1147/5773 [1:46:47<7:00:55, 5.46s/it] 20%|█▉ | 1147/5773 [1:46:53<7:00:54, 5.46s/it] {'loss': 0.6002, 'learning_rate': 1.8546348156515036e-05, 'epoch': 0.2} 20%|█▉ | 1147/5773 [1:46:53<7:00:54, 5.46s/it] {'loss': 0.6002, 'learning_rate': 1.8546348156515036e-05, 'epoch': 0.2} 20%|█▉ | 1147/5773 [1:46:47<7:00:55, 5.46s/it] 20%|█▉ | 1148/5773 [1:46:52<6:56:22, 5.40s/it] 20%|█▉ | 1148/5773 [1:46:58<6:56:22, 5.40s/it] {'loss': 0.5708, 'learning_rate': 1.8543433420089862e-05, 'epoch': 0.2} 20%|█▉ | 1148/5773 [1:46:58<6:56:22, 5.40s/it] {'loss': 0.5708, 'learning_rate': 1.8543433420089862e-05, 'epoch': 0.2} 20%|█▉ | 1148/5773 [1:46:52<6:56:22, 5.40s/it] 
20%|█▉ | 1149/5773 [1:46:58<6:56:20, 5.40s/it] 20%|█▉ | 1149/5773 [1:47:03<6:56:20, 5.40s/it] {'loss': 0.5735, 'learning_rate': 1.8540515993918886e-05, 'epoch': 0.2} 20%|█▉ | 1149/5773 [1:47:03<6:56:20, 5.40s/it] {'loss': 0.5735, 'learning_rate': 1.8540515993918886e-05, 'epoch': 0.2} 20%|█▉ | 1149/5773 [1:46:58<6:56:20, 5.40s/it]12 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend...10 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 14 0AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...5 AutoResumeHook: Checking whether to suspend... 20%|█▉ | 1150/5773 [1:47:03<6:55:33, 5.39s/it]6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 20%|█▉ | 1150/5773 [1:47:09<6:55:33, 5.39s/it]9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5857, 'learning_rate': 1.8537595878920607e-05, 'epoch': 0.2} 20%|█▉ | 1150/5773 [1:47:09<6:55:33, 5.39s/it] {'loss': 0.5857, 'learning_rate': 1.8537595878920607e-05, 'epoch': 0.2} 20%|█▉ | 1150/5773 [1:47:03<6:55:33, 5.39s/it] 20%|█▉ | 1151/5773 [1:47:09<6:55:26, 5.39s/it] 20%|█▉ | 1151/5773 [1:47:14<6:55:26, 5.39s/it] {'loss': 0.5817, 'learning_rate': 1.853467307601437e-05, 'epoch': 0.2} 20%|█▉ | 1151/5773 [1:47:14<6:55:26, 5.39s/it] {'loss': 0.5817, 'learning_rate': 1.853467307601437e-05, 'epoch': 0.2} 20%|█▉ | 1151/5773 [1:47:09<6:55:26, 5.39s/it] 20%|█▉ | 1152/5773 [1:47:14<6:58:23, 5.43s/it] 20%|█▉ | 1152/5773 [1:47:20<6:58:23, 5.43s/it] {'loss': 0.5919, 'learning_rate': 1.8531747586120368e-05, 'epoch': 0.2} 20%|█▉ | 1152/5773 [1:47:20<6:58:23, 5.43s/it] {'loss': 0.5919, 'learning_rate': 1.8531747586120368e-05, 'epoch': 0.2} 20%|█▉ | 1152/5773 [1:47:14<6:58:23, 5.43s/it] 20%|█▉ | 1153/5773 [1:47:19<6:58:07, 5.43s/it] 20%|█▉ | 1153/5773 [1:47:25<6:58:06, 5.43s/it] {'loss': 0.5913, 'learning_rate': 1.8528819410159638e-05, 'epoch': 0.2} 20%|█▉ | 1153/5773 [1:47:25<6:58:06, 5.43s/it] {'loss': 0.5913, 'learning_rate': 1.8528819410159638e-05, 'epoch': 0.2} 20%|█▉ | 1153/5773 [1:47:19<6:58:07, 5.43s/it] 20%|█▉ | 1154/5773 [1:47:25<6:59:49, 5.45s/it] 20%|█▉ | 1154/5773 [1:47:31<6:59:49, 5.45s/it] {'loss': 0.5974, 'learning_rate': 1.8525888549054067e-05, 'epoch': 0.2} 20%|█▉ | 1154/5773 [1:47:31<6:59:49, 5.45s/it] {'loss': 0.5974, 'learning_rate': 1.8525888549054067e-05, 'epoch': 0.2} 20%|█▉ | 1154/5773 [1:47:25<6:59:49, 5.45s/it] 20%|██ | 1155/5773 [1:47:30<7:00:36, 5.46s/it] 20%|██ | 1155/5773 [1:47:36<7:00:36, 5.46s/it] {'loss': 0.5993, 'learning_rate': 1.8522955003726375e-05, 'epoch': 0.2} 20%|██ | 1155/5773 [1:47:36<7:00:36, 5.46s/it] {'loss': 0.5993, 'learning_rate': 1.8522955003726375e-05, 'epoch': 0.2} 20%|██ | 1155/5773 [1:47:30<7:00:36, 5.46s/it] 20%|██ | 1156/5773 [1:47:36<6:58:49, 5.44s/it] 20%|██ | 1156/5773 [1:47:41<6:58:49, 5.44s/it] {'loss': 
0.5909, 'learning_rate': 1.8520018775100146e-05, 'epoch': 0.2} 20%|██ | 1156/5773 [1:47:41<6:58:49, 5.44s/it] {'loss': 0.5909, 'learning_rate': 1.8520018775100146e-05, 'epoch': 0.2} 20%|██ | 1156/5773 [1:47:36<6:58:49, 5.44s/it] 20%|██ | 1157/5773 [1:47:41<6:56:49, 5.42s/it] 20%|██ | 1157/5773 [1:47:47<6:56:49, 5.42s/it] {'loss': 0.608, 'learning_rate': 1.851707986409979e-05, 'epoch': 0.2} 20%|██ | 1157/5773 [1:47:47<6:56:49, 5.42s/it] {'loss': 0.608, 'learning_rate': 1.851707986409979e-05, 'epoch': 0.2} 20%|██ | 1157/5773 [1:47:41<6:56:49, 5.42s/it] 20%|██ | 1158/5773 [1:47:47<7:00:29, 5.47s/it] 20%|██ | 1158/5773 [1:47:52<7:00:29, 5.47s/it] {'loss': 0.6027, 'learning_rate': 1.8514138271650578e-05, 'epoch': 0.2} 20%|██ | 1158/5773 [1:47:52<7:00:29, 5.47s/it] {'loss': 0.6027, 'learning_rate': 1.8514138271650578e-05, 'epoch': 0.2} 20%|██ | 1158/5773 [1:47:47<7:00:29, 5.47s/it] 20%|██ | 1159/5773 [1:47:52<7:05:14, 5.53s/it] 20%|██ | 1159/5773 [1:47:58<7:05:14, 5.53s/it] {'loss': 0.5808, 'learning_rate': 1.851119399867861e-05, 'epoch': 0.2} 20%|██ | 1159/5773 [1:47:58<7:05:14, 5.53s/it] {'loss': 0.5808, 'learning_rate': 1.851119399867861e-05, 'epoch': 0.2} 20%|██ | 1159/5773 [1:47:52<7:05:14, 5.53s/it] 20%|██ | 1160/5773 [1:47:58<6:59:55, 5.46s/it] 20%|██ | 1160/5773 [1:48:03<6:59:55, 5.46s/it] {'loss': 0.5834, 'learning_rate': 1.8508247046110843e-05, 'epoch': 0.2} 20%|██ | 1160/5773 [1:48:03<6:59:55, 5.46s/it] {'loss': 0.5834, 'learning_rate': 1.8508247046110843e-05, 'epoch': 0.2} 20%|██ | 1160/5773 [1:47:58<6:59:55, 5.46s/it] 20%|██ | 1161/5773 [1:48:03<7:02:19, 5.49s/it] 20%|██ | 1161/5773 [1:48:09<7:02:18, 5.49s/it] {'loss': 0.5869, 'learning_rate': 1.8505297414875066e-05, 'epoch': 0.2} 20%|██ | 1161/5773 [1:48:09<7:02:18, 5.49s/it] {'loss': 0.5869, 'learning_rate': 1.8505297414875066e-05, 'epoch': 0.2} 20%|██ | 1161/5773 [1:48:03<7:02:19, 5.49s/it] 20%|██ | 1162/5773 [1:48:09<7:00:56, 5.48s/it] 20%|██ | 1162/5773 [1:48:14<7:00:56, 5.48s/it] {'loss': 0.5839, 
'learning_rate': 1.8502345105899922e-05, 'epoch': 0.2} 20%|██ | 1162/5773 [1:48:14<7:00:56, 5.48s/it] {'loss': 0.5839, 'learning_rate': 1.8502345105899922e-05, 'epoch': 0.2} 20%|██ | 1162/5773 [1:48:09<7:00:56, 5.48s/it] 20%|██ | 1163/5773 [1:48:14<7:03:23, 5.51s/it] 20%|██ | 1163/5773 [1:48:20<7:03:23, 5.51s/it] {'loss': 0.6121, 'learning_rate': 1.849939012011489e-05, 'epoch': 0.2} 20%|██ | 1163/5773 [1:48:20<7:03:23, 5.51s/it] {'loss': 0.6121, 'learning_rate': 1.849939012011489e-05, 'epoch': 0.2} 20%|██ | 1163/5773 [1:48:14<7:03:23, 5.51s/it] 20%|██ | 1164/5773 [1:48:20<7:00:41, 5.48s/it] 20%|██ | 1164/5773 [1:48:25<7:00:41, 5.48s/it] {'loss': 0.6093, 'learning_rate': 1.8496432458450297e-05, 'epoch': 0.2} 20%|██ | 1164/5773 [1:48:25<7:00:41, 5.48s/it] {'loss': 0.6093, 'learning_rate': 1.8496432458450297e-05, 'epoch': 0.2} 20%|██ | 1164/5773 [1:48:20<7:00:41, 5.48s/it] 20%|██ | 1165/5773 [1:48:25<7:02:31, 5.50s/it] 20%|██ | 1165/5773 [1:48:31<7:02:31, 5.50s/it] {'loss': 0.598, 'learning_rate': 1.8493472121837302e-05, 'epoch': 0.2} 20%|██ | 1165/5773 [1:48:31<7:02:31, 5.50s/it] {'loss': 0.598, 'learning_rate': 1.8493472121837302e-05, 'epoch': 0.2} 20%|██ | 1165/5773 [1:48:25<7:02:31, 5.50s/it] 20%|██ | 1166/5773 [1:48:31<7:00:20, 5.47s/it] 20%|██ | 1166/5773 [1:48:36<7:00:20, 5.47s/it] {'loss': 0.6105, 'learning_rate': 1.8490509111207925e-05, 'epoch': 0.2} 20%|██ | 1166/5773 [1:48:36<7:00:20, 5.47s/it] {'loss': 0.6105, 'learning_rate': 1.8490509111207925e-05, 'epoch': 0.2} 20%|██ | 1166/5773 [1:48:31<7:00:20, 5.47s/it] 20%|██ | 1167/5773 [1:48:36<6:53:58, 5.39s/it] 20%|██ | 1167/5773 [1:48:41<6:53:58, 5.39s/it] {'loss': 0.5941, 'learning_rate': 1.8487543427495004e-05, 'epoch': 0.2} 20%|██ | 1167/5773 [1:48:41<6:53:58, 5.39s/it] {'loss': 0.5941, 'learning_rate': 1.8487543427495004e-05, 'epoch': 0.2} 20%|██ | 1167/5773 [1:48:36<6:53:58, 5.39s/it] 20%|██ | 1168/5773 [1:48:41<6:57:08, 5.44s/it] 20%|██ | 1168/5773 [1:48:47<6:57:09, 5.44s/it] {'loss': 0.5916, 
'learning_rate': 1.8484575071632242e-05, 'epoch': 0.2} 20%|██ | 1168/5773 [1:48:47<6:57:09, 5.44s/it] {'loss': 0.5916, 'learning_rate': 1.8484575071632242e-05, 'epoch': 0.2} 20%|██ | 1168/5773 [1:48:41<6:57:08, 5.44s/it] 20%|██ | 1169/5773 [1:48:47<7:04:38, 5.53s/it] 20%|██ | 1169/5773 [1:48:53<7:04:38, 5.53s/it] {'loss': 0.6019, 'learning_rate': 1.8481604044554164e-05, 'epoch': 0.2} 20%|██ | 1169/5773 [1:48:53<7:04:38, 5.53s/it] {'loss': 0.6019, 'learning_rate': 1.8481604044554164e-05, 'epoch': 0.2} 20%|██ | 1169/5773 [1:48:47<7:04:38, 5.53s/it] 20%|██ | 1170/5773 [1:48:53<7:00:08, 5.48s/it] 20%|██ | 1170/5773 [1:48:58<7:00:09, 5.48s/it] {'loss': 0.588, 'learning_rate': 1.8478630347196147e-05, 'epoch': 0.2} 20%|██ | 1170/5773 [1:48:58<7:00:09, 5.48s/it] {'loss': 0.588, 'learning_rate': 1.8478630347196147e-05, 'epoch': 0.2} 20%|██ | 1170/5773 [1:48:53<7:00:08, 5.48s/it] 20%|██ | 1171/5773 [1:48:58<7:03:05, 5.52s/it] 20%|██ | 1171/5773 [1:49:04<7:03:05, 5.52s/it] {'loss': 0.591, 'learning_rate': 1.8475653980494408e-05, 'epoch': 0.2} 20%|██ | 1171/5773 [1:49:04<7:03:05, 5.52s/it] {'loss': 0.591, 'learning_rate': 1.8475653980494408e-05, 'epoch': 0.2} 20%|██ | 1171/5773 [1:48:58<7:03:05, 5.52s/it] 20%|██ | 1172/5773 [1:49:04<6:59:28, 5.47s/it] 20%|██ | 1172/5773 [1:49:09<6:59:28, 5.47s/it] {'loss': 0.59, 'learning_rate': 1.8472674945386e-05, 'epoch': 0.2} 20%|██ | 1172/5773 [1:49:09<6:59:28, 5.47s/it] {'loss': 0.59, 'learning_rate': 1.8472674945386e-05, 'epoch': 0.2} 20%|██ | 1172/5773 [1:49:04<6:59:28, 5.47s/it] 20%|██ | 1173/5773 [1:49:09<6:59:35, 5.47s/it] 20%|██ | 1173/5773 [1:49:15<6:59:34, 5.47s/it] {'loss': 0.5995, 'learning_rate': 1.846969324280882e-05, 'epoch': 0.2} 20%|██ | 1173/5773 [1:49:15<6:59:34, 5.47s/it] {'loss': 0.5995, 'learning_rate': 1.846969324280882e-05, 'epoch': 0.2} 20%|██ | 1173/5773 [1:49:09<6:59:35, 5.47s/it] 20%|██ | 1174/5773 [1:49:14<6:57:59, 5.45s/it] 20%|██ | 1174/5773 [1:49:20<6:57:58, 5.45s/it] {'loss': 0.592, 'learning_rate': 
1.84667088737016e-05, 'epoch': 0.2} 20%|██ | 1174/5773 [1:49:20<6:57:58, 5.45s/it] {'loss': 0.592, 'learning_rate': 1.84667088737016e-05, 'epoch': 0.2} 20%|██ | 1174/5773 [1:49:14<6:57:59, 5.45s/it] 20%|██ | 1175/5773 [1:49:20<7:01:26, 5.50s/it] 20%|██ | 1175/5773 [1:49:26<7:01:26, 5.50s/it] {'loss': 0.6115, 'learning_rate': 1.8463721839003917e-05, 'epoch': 0.2} 20%|██ | 1175/5773 [1:49:26<7:01:26, 5.50s/it] {'loss': 0.6115, 'learning_rate': 1.8463721839003917e-05, 'epoch': 0.2} 20%|██ | 1175/5773 [1:49:20<7:01:26, 5.50s/it] 20%|██ | 1176/5773 [1:49:26<7:01:40, 5.50s/it] 20%|██ | 1176/5773 [1:49:31<7:01:41, 5.50s/it] {'loss': 0.5817, 'learning_rate': 1.846073213965619e-05, 'epoch': 0.2} 20%|██ | 1176/5773 [1:49:31<7:01:41, 5.50s/it] {'loss': 0.5817, 'learning_rate': 1.846073213965619e-05, 'epoch': 0.2} 20%|██ | 1176/5773 [1:49:26<7:01:40, 5.50s/it] 20%|██ | 1177/5773 [1:49:31<7:01:17, 5.50s/it] 20%|██ | 1177/5773 [1:49:37<7:01:17, 5.50s/it] {'loss': 0.6005, 'learning_rate': 1.845773977659966e-05, 'epoch': 0.2} 20%|██ | 1177/5773 [1:49:37<7:01:17, 5.50s/it] {'loss': 0.6005, 'learning_rate': 1.845773977659966e-05, 'epoch': 0.2} 20%|██ | 1177/5773 [1:49:31<7:01:17, 5.50s/it] 20%|██ | 1178/5773 [1:49:37<7:00:23, 5.49s/it] 20%|██ | 1178/5773 [1:49:42<7:00:23, 5.49s/it] {'loss': 0.5936, 'learning_rate': 1.845474475077643e-05, 'epoch': 0.2} 20%|██ | 1178/5773 [1:49:42<7:00:23, 5.49s/it] {'loss': 0.5936, 'learning_rate': 1.845474475077643e-05, 'epoch': 0.2} 20%|██ | 1178/5773 [1:49:37<7:00:23, 5.49s/it] 20%|██ | 1179/5773 [1:49:42<7:05:40, 5.56s/it] 20%|██ | 1179/5773 [1:49:48<7:05:40, 5.56s/it] {'loss': 0.5963, 'learning_rate': 1.845174706312942e-05, 'epoch': 0.2} 20%|██ | 1179/5773 [1:49:48<7:05:40, 5.56s/it] {'loss': 0.5963, 'learning_rate': 1.845174706312942e-05, 'epoch': 0.2} 20%|██ | 1179/5773 [1:49:42<7:05:40, 5.56s/it] 20%|██ | 1180/5773 [1:49:48<7:00:43, 5.50s/it] 20%|██ | 1180/5773 [1:49:53<7:00:44, 5.50s/it] {'loss': 0.5819, 'learning_rate': 
1.844874671460241e-05, 'epoch': 0.2} 20%|██ | 1180/5773 [1:49:53<7:00:44, 5.50s/it] {'loss': 0.5819, 'learning_rate': 1.844874671460241e-05, 'epoch': 0.2} 20%|██ | 1180/5773 [1:49:48<7:00:43, 5.50s/it] 20%|██ | 1181/5773 [1:49:53<6:58:54, 5.47s/it] 20%|██ | 1181/5773 [1:49:59<6:58:54, 5.47s/it] {'loss': 0.5974, 'learning_rate': 1.8445743706139994e-05, 'epoch': 0.2} 20%|██ | 1181/5773 [1:49:59<6:58:54, 5.47s/it] {'loss': 0.5974, 'learning_rate': 1.8445743706139994e-05, 'epoch': 0.2} 20%|██ | 1181/5773 [1:49:53<6:58:54, 5.47s/it] 20%|██ | 1182/5773 [1:49:58<6:58:21, 5.47s/it] 20%|██ | 1182/5773 [1:50:04<6:58:21, 5.47s/it] {'loss': 0.5835, 'learning_rate': 1.8442738038687623e-05, 'epoch': 0.2} 20%|██ | 1182/5773 [1:50:04<6:58:21, 5.47s/it] {'loss': 0.5835, 'learning_rate': 1.8442738038687623e-05, 'epoch': 0.2} 20%|██ | 1182/5773 [1:49:58<6:58:21, 5.47s/it] 20%|██ | 1183/5773 [1:50:04<6:59:57, 5.49s/it] 20%|██ | 1183/5773 [1:50:10<6:59:57, 5.49s/it] {'loss': 0.6036, 'learning_rate': 1.8439729713191572e-05, 'epoch': 0.2} 20%|██ | 1183/5773 [1:50:10<6:59:57, 5.49s/it] {'loss': 0.6036, 'learning_rate': 1.8439729713191572e-05, 'epoch': 0.2} 20%|██ | 1183/5773 [1:50:04<6:59:57, 5.49s/it] 21%|██ | 1184/5773 [1:50:09<6:56:23, 5.44s/it] 21%|██ | 1184/5773 [1:50:15<6:56:23, 5.44s/it] {'loss': 0.5982, 'learning_rate': 1.843671873059896e-05, 'epoch': 0.21} 21%|██ | 1184/5773 [1:50:15<6:56:23, 5.44s/it] {'loss': 0.5982, 'learning_rate': 1.843671873059896e-05, 'epoch': 0.21} 21%|██ | 1184/5773 [1:50:09<6:56:23, 5.44s/it] 21%|██ | 1185/5773 [1:50:15<6:57:51, 5.46s/it] 21%|██ | 1185/5773 [1:50:20<6:57:51, 5.46s/it] {'loss': 0.5797, 'learning_rate': 1.8433705091857738e-05, 'epoch': 0.21} 21%|██ | 1185/5773 [1:50:20<6:57:51, 5.46s/it] {'loss': 0.5797, 'learning_rate': 1.8433705091857738e-05, 'epoch': 0.21} 21%|██ | 1185/5773 [1:50:15<6:57:51, 5.46s/it] 21%|██ | 1186/5773 [1:50:20<6:54:54, 5.43s/it] 21%|██ | 1186/5773 [1:50:26<6:54:55, 5.43s/it] {'loss': 0.5784, 'learning_rate': 
1.8430688797916702e-05, 'epoch': 0.21} 21%|██ | 1186/5773 [1:50:26<6:54:55, 5.43s/it] {'loss': 0.5784, 'learning_rate': 1.8430688797916702e-05, 'epoch': 0.21} 21%|██ | 1186/5773 [1:50:20<6:54:54, 5.43s/it] 21%|██ | 1187/5773 [1:50:26<6:55:21, 5.43s/it] 21%|██ | 1187/5773 [1:50:31<6:55:21, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.842766984972547e-05, 'epoch': 0.21} 21%|██ | 1187/5773 [1:50:31<6:55:21, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.842766984972547e-05, 'epoch': 0.21} 21%|██ | 1187/5773 [1:50:26<6:55:21, 5.43s/it] 21%|██ | 1188/5773 [1:50:31<6:53:38, 5.41s/it] 21%|██ | 1188/5773 [1:50:37<6:53:37, 5.41s/it] {'loss': 0.5816, 'learning_rate': 1.842464824823451e-05, 'epoch': 0.21} 21%|██ | 1188/5773 [1:50:37<6:53:37, 5.41s/it] {'loss': 0.5816, 'learning_rate': 1.842464824823451e-05, 'epoch': 0.21} 21%|██ | 1188/5773 [1:50:31<6:53:38, 5.41s/it] 21%|██ | 1189/5773 [1:50:37<6:55:53, 5.44s/it] 21%|██ | 1189/5773 [1:50:42<6:55:53, 5.44s/it] {'loss': 0.5838, 'learning_rate': 1.8421623994395114e-05, 'epoch': 0.21} 21%|██ | 1189/5773 [1:50:42<6:55:53, 5.44s/it] {'loss': 0.5838, 'learning_rate': 1.8421623994395114e-05, 'epoch': 0.21} 21%|██ | 1189/5773 [1:50:37<6:55:53, 5.44s/it] 21%|██ | 1190/5773 [1:50:42<6:57:33, 5.47s/it] 21%|██ | 1190/5773 [1:50:48<6:57:33, 5.47s/it] {'loss': 0.5726, 'learning_rate': 1.8418597089159415e-05, 'epoch': 0.21} 21%|██ | 1190/5773 [1:50:48<6:57:33, 5.47s/it] {'loss': 0.5726, 'learning_rate': 1.8418597089159415e-05, 'epoch': 0.21} 21%|██ | 1190/5773 [1:50:42<6:57:33, 5.47s/it] 21%|██ | 1191/5773 [1:50:48<6:59:59, 5.50s/it] 21%|██ | 1191/5773 [1:50:53<6:59:59, 5.50s/it] {'loss': 0.5877, 'learning_rate': 1.841556753348038e-05, 'epoch': 0.21} 21%|██ | 1191/5773 [1:50:53<6:59:59, 5.50s/it] {'loss': 0.5877, 'learning_rate': 1.841556753348038e-05, 'epoch': 0.21} 21%|██ | 1191/5773 [1:50:48<6:59:59, 5.50s/it] 21%|██ | 1192/5773 [1:50:53<6:56:01, 5.45s/it] 21%|██ | 1192/5773 [1:50:58<6:56:01, 5.45s/it] {'loss': 0.594, 'learning_rate': 
1.8412535328311813e-05, 'epoch': 0.21} 21%|██ | 1192/5773 [1:50:58<6:56:01, 5.45s/it] {'loss': 0.594, 'learning_rate': 1.8412535328311813e-05, 'epoch': 0.21} 21%|██ | 1192/5773 [1:50:53<6:56:01, 5.45s/it] 21%|██ | 1193/5773 [1:50:59<7:06:38, 5.59s/it] 21%|██ | 1193/5773 [1:51:04<7:06:37, 5.59s/it] {'loss': 0.5845, 'learning_rate': 1.8409500474608343e-05, 'epoch': 0.21} 21%|██ | 1193/5773 [1:51:04<7:06:37, 5.59s/it] {'loss': 0.5845, 'learning_rate': 1.8409500474608343e-05, 'epoch': 0.21} 21%|██ | 1193/5773 [1:50:59<7:06:38, 5.59s/it] 21%|██ | 1194/5773 [1:51:04<7:04:42, 5.57s/it] 21%|██ | 1194/5773 [1:51:10<7:04:42, 5.57s/it] {'loss': 0.5836, 'learning_rate': 1.8406462973325445e-05, 'epoch': 0.21} 21%|██ | 1194/5773 [1:51:10<7:04:42, 5.57s/it] {'loss': 0.5836, 'learning_rate': 1.8406462973325445e-05, 'epoch': 0.21} 21%|██ | 1194/5773 [1:51:04<7:04:42, 5.57s/it] 21%|██ | 1195/5773 [1:51:10<7:03:13, 5.55s/it] 21%|██ | 1195/5773 [1:51:15<7:03:13, 5.55s/it] {'loss': 0.589, 'learning_rate': 1.840342282541942e-05, 'epoch': 0.21} 21%|██ | 1195/5773 [1:51:15<7:03:13, 5.55s/it] {'loss': 0.589, 'learning_rate': 1.840342282541942e-05, 'epoch': 0.21} 21%|██ | 1195/5773 [1:51:10<7:03:13, 5.55s/it] 21%|██ | 1196/5773 [1:51:15<6:58:18, 5.48s/it] 21%|██ | 1196/5773 [1:51:21<6:58:18, 5.48s/it] {'loss': 0.5901, 'learning_rate': 1.8400380031847395e-05, 'epoch': 0.21} 21%|██ | 1196/5773 [1:51:21<6:58:18, 5.48s/it] {'loss': 0.5901, 'learning_rate': 1.8400380031847395e-05, 'epoch': 0.21} 21%|██ | 1196/5773 [1:51:15<6:58:18, 5.48s/it] 21%|██ | 1197/5773 [1:51:21<7:01:42, 5.53s/it] 21%|██ | 1197/5773 [1:51:26<7:01:42, 5.53s/it] {'loss': 0.6005, 'learning_rate': 1.8397334593567347e-05, 'epoch': 0.21} 21%|██ | 1197/5773 [1:51:26<7:01:42, 5.53s/it] {'loss': 0.6005, 'learning_rate': 1.8397334593567347e-05, 'epoch': 0.21} 21%|██ | 1197/5773 [1:51:21<7:01:42, 5.53s/it] 21%|██ | 1198/5773 [1:51:26<7:02:12, 5.54s/it] 21%|██ | 1198/5773 [1:51:32<7:02:12, 5.54s/it] {'loss': 0.6083, 'learning_rate': 
1.8394286511538077e-05, 'epoch': 0.21} 21%|██ | 1198/5773 [1:51:32<7:02:12, 5.54s/it] {'loss': 0.6083, 'learning_rate': 1.8394286511538077e-05, 'epoch': 0.21} 21%|██ | 1198/5773 [1:51:26<7:02:12, 5.54s/it] 21%|██ | 1199/5773 [1:51:32<6:57:28, 5.48s/it] 21%|██ | 1199/5773 [1:51:37<6:57:28, 5.48s/it] {'loss': 0.5932, 'learning_rate': 1.839123578671922e-05, 'epoch': 0.21} 21%|██ | 1199/5773 [1:51:37<6:57:28, 5.48s/it] {'loss': 0.5932, 'learning_rate': 1.839123578671922e-05, 'epoch': 0.21} 21%|██ | 1199/5773 [1:51:32<6:57:28, 5.48s/it]7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 010 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 21%|██ | 1200/5773 [1:51:37<6:51:21, 5.40s/it]14 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 21%|██ | 1200/5773 [1:51:42<6:51:21, 5.40s/it]4 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5884, 'learning_rate': 1.8388182420071237e-05, 'epoch': 0.21} 21%|██ | 1200/5773 [1:51:42<6:51:21, 5.40s/it] {'loss': 0.5884, 'learning_rate': 1.8388182420071237e-05, 'epoch': 0.21} 21%|██ | 1200/5773 [1:51:37<6:51:21, 5.40s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 21%|██ | 1201/5773 [1:51:56<11:59:29, 9.44s/it] 21%|██ | 1201/5773 [1:52:01<11:59:29, 9.44s/it] {'loss': 0.5828, 'learning_rate': 1.838512641255543e-05, 'epoch': 0.21} 21%|██ | 1201/5773 [1:52:01<11:59:29, 9.44s/it] {'loss': 0.5828, 'learning_rate': 1.838512641255543e-05, 'epoch': 0.21} 21%|██ | 1201/5773 [1:51:56<11:59:29, 9.44s/it] 21%|██ | 1202/5773 [1:52:01<10:29:57, 8.27s/it] 21%|██ | 1202/5773 [1:52:07<10:29:57, 8.27s/it] {'loss': 0.6054, 'learning_rate': 1.8382067765133926e-05, 'epoch': 0.21} 21%|██ | 1202/5773 [1:52:07<10:29:57, 8.27s/it] {'loss': 0.6054, 'learning_rate': 1.8382067765133926e-05, 'epoch': 0.21} 21%|██ | 1202/5773 [1:52:01<10:29:57, 8.27s/it] 21%|██ | 1203/5773 [1:52:07<9:20:44, 7.36s/it] 21%|██ | 1203/5773 [1:52:12<9:20:44, 7.36s/it] {'loss': 0.579, 'learning_rate': 1.8379006478769677e-05, 'epoch': 0.21} 21%|██ | 1203/5773 [1:52:12<9:20:44, 7.36s/it] {'loss': 0.579, 'learning_rate': 1.8379006478769677e-05, 'epoch': 0.21} 21%|██ | 1203/5773 [1:52:07<9:20:44, 7.36s/it] 21%|██ | 1204/5773 [1:52:12<8:37:43, 6.80s/it] 21%|██ | 1204/5773 [1:52:18<8:37:43, 6.80s/it] {'loss': 0.593, 'learning_rate': 1.8375942554426486e-05, 'epoch': 0.21} 21%|██ | 1204/5773 [1:52:18<8:37:43, 6.80s/it] {'loss': 0.593, 'learning_rate': 1.8375942554426486e-05, 'epoch': 0.21} 21%|██ | 1204/5773 [1:52:12<8:37:43, 6.80s/it] 21%|██ | 1205/5773 [1:52:18<8:05:35, 6.38s/it] 21%|██ | 1205/5773 [1:52:23<8:05:35, 6.38s/it] {'loss': 0.5986, 'learning_rate': 1.837287599306897e-05, 'epoch': 0.21} 21%|██ | 1205/5773 [1:52:23<8:05:35, 6.38s/it] {'loss': 0.5986, 'learning_rate': 1.837287599306897e-05, 'epoch': 0.21} 21%|██ | 1205/5773 [1:52:18<8:05:35, 6.38s/it] 21%|██ | 1206/5773 [1:52:23<7:47:59, 6.15s/it] 21%|██ | 1206/5773 [1:52:29<7:47:59, 6.15s/it] {'loss': 0.576, 'learning_rate': 1.8369806795662582e-05, 'epoch': 0.21} 21%|██ | 1206/5773 [1:52:29<7:47:59, 6.15s/it] {'loss': 0.576, 'learning_rate': 1.8369806795662582e-05, 'epoch': 0.21} 21%|██ | 1206/5773 
[1:52:23<7:47:59, 6.15s/it] 21%|██ | 1207/5773 [1:52:28<7:29:55, 5.91s/it] 21%|██ | 1207/5773 [1:52:34<7:29:56, 5.91s/it] {'loss': 0.5945, 'learning_rate': 1.8366734963173597e-05, 'epoch': 0.21} 21%|██ | 1207/5773 [1:52:34<7:29:56, 5.91s/it] {'loss': 0.5945, 'learning_rate': 1.8366734963173597e-05, 'epoch': 0.21} 21%|██ | 1207/5773 [1:52:28<7:29:55, 5.91s/it] 21%|██ | 1208/5773 [1:52:34<7:17:42, 5.75s/it] 21%|██ | 1208/5773 [1:52:39<7:17:42, 5.75s/it] {'loss': 0.5974, 'learning_rate': 1.836366049656913e-05, 'epoch': 0.21} 21%|██ | 1208/5773 [1:52:39<7:17:42, 5.75s/it] {'loss': 0.5974, 'learning_rate': 1.836366049656913e-05, 'epoch': 0.21} 21%|██ | 1208/5773 [1:52:34<7:17:42, 5.75s/it] 21%|██ | 1209/5773 [1:52:39<7:11:28, 5.67s/it] 21%|██ | 1209/5773 [1:52:45<7:11:28, 5.67s/it] {'loss': 0.5721, 'learning_rate': 1.8360583396817123e-05, 'epoch': 0.21} 21%|██ | 1209/5773 [1:52:45<7:11:28, 5.67s/it] {'loss': 0.5721, 'learning_rate': 1.8360583396817123e-05, 'epoch': 0.21} 21%|██ | 1209/5773 [1:52:39<7:11:28, 5.67s/it] 21%|██ | 1210/5773 [1:52:45<7:07:51, 5.63s/it] 21%|██ | 1210/5773 [1:52:50<7:07:51, 5.63s/it] {'loss': 0.5979, 'learning_rate': 1.8357503664886345e-05, 'epoch': 0.21} 21%|██ | 1210/5773 [1:52:50<7:07:51, 5.63s/it] {'loss': 0.5979, 'learning_rate': 1.8357503664886345e-05, 'epoch': 0.21} 21%|██ | 1210/5773 [1:52:45<7:07:51, 5.63s/it] 21%|██ | 1211/5773 [1:52:50<7:01:21, 5.54s/it] 21%|██ | 1211/5773 [1:52:56<7:01:20, 5.54s/it] {'loss': 0.5964, 'learning_rate': 1.8354421301746393e-05, 'epoch': 0.21} 21%|██ | 1211/5773 [1:52:56<7:01:20, 5.54s/it] {'loss': 0.5964, 'learning_rate': 1.8354421301746393e-05, 'epoch': 0.21} 21%|██ | 1211/5773 [1:52:50<7:01:21, 5.54s/it] 21%|██ | 1212/5773 [1:52:56<6:57:57, 5.50s/it] 21%|██ | 1212/5773 [1:53:01<6:57:57, 5.50s/it] {'loss': 0.5862, 'learning_rate': 1.835133630836769e-05, 'epoch': 0.21} 21%|██ | 1212/5773 [1:53:01<6:57:57, 5.50s/it] {'loss': 0.5862, 'learning_rate': 1.835133630836769e-05, 'epoch': 0.21} 21%|██ | 1212/5773 
[1:52:56<6:57:57, 5.50s/it] 21%|██ | 1213/5773 [1:53:01<6:58:07, 5.50s/it] 21%|██ | 1213/5773 [1:53:07<6:58:07, 5.50s/it] {'loss': 0.5958, 'learning_rate': 1.8348248685721495e-05, 'epoch': 0.21} 21%|██ | 1213/5773 [1:53:07<6:58:07, 5.50s/it] {'loss': 0.5958, 'learning_rate': 1.8348248685721495e-05, 'epoch': 0.21} 21%|██ | 1213/5773 [1:53:01<6:58:07, 5.50s/it] 21%|██ | 1214/5773 [1:53:07<6:58:15, 5.50s/it] 21%|██ | 1214/5773 [1:53:12<6:58:15, 5.50s/it] {'loss': 0.5901, 'learning_rate': 1.834515843477989e-05, 'epoch': 0.21} 21%|██ | 1214/5773 [1:53:12<6:58:15, 5.50s/it] {'loss': 0.5901, 'learning_rate': 1.834515843477989e-05, 'epoch': 0.21} 21%|██ | 1214/5773 [1:53:07<6:58:15, 5.50s/it] 21%|██ | 1215/5773 [1:53:12<6:57:12, 5.49s/it] 21%|██ | 1215/5773 [1:53:18<6:57:12, 5.49s/it] {'loss': 0.5843, 'learning_rate': 1.8342065556515787e-05, 'epoch': 0.21} 21%|██ | 1215/5773 [1:53:18<6:57:12, 5.49s/it] {'loss': 0.5843, 'learning_rate': 1.8342065556515787e-05, 'epoch': 0.21} 21%|██ | 1215/5773 [1:53:12<6:57:12, 5.49s/it] 21%|██ | 1216/5773 [1:53:18<6:55:36, 5.47s/it] 21%|██ | 1216/5773 [1:53:23<6:55:37, 5.47s/it] {'loss': 0.5889, 'learning_rate': 1.8338970051902913e-05, 'epoch': 0.21} 21%|██ | 1216/5773 [1:53:23<6:55:37, 5.47s/it] {'loss': 0.5889, 'learning_rate': 1.8338970051902913e-05, 'epoch': 0.21} 21%|██ | 1216/5773 [1:53:18<6:55:36, 5.47s/it] 21%|██ | 1217/5773 [1:53:23<6:54:34, 5.46s/it] 21%|██ | 1217/5773 [1:53:28<6:54:34, 5.46s/it] {'loss': 0.5973, 'learning_rate': 1.8335871921915843e-05, 'epoch': 0.21} 21%|██ | 1217/5773 [1:53:28<6:54:34, 5.46s/it] {'loss': 0.5973, 'learning_rate': 1.8335871921915843e-05, 'epoch': 0.21} 21%|██ | 1217/5773 [1:53:23<6:54:34, 5.46s/it] 21%|██ | 1218/5773 [1:53:29<6:58:17, 5.51s/it] 21%|██ | 1218/5773 [1:53:34<6:58:17, 5.51s/it] {'loss': 0.5856, 'learning_rate': 1.833277116752996e-05, 'epoch': 0.21} 21%|██ | 1218/5773 [1:53:34<6:58:17, 5.51s/it] {'loss': 0.5856, 'learning_rate': 1.833277116752996e-05, 'epoch': 0.21} 21%|██ | 1218/5773 
[1:53:29<6:58:17, 5.51s/it] 21%|██ | 1219/5773 [1:53:34<6:52:15, 5.43s/it] 21%|██ | 1219/5773 [1:53:39<6:52:15, 5.43s/it] {'loss': 0.5965, 'learning_rate': 1.8329667789721487e-05, 'epoch': 0.21} 21%|██ | 1219/5773 [1:53:39<6:52:15, 5.43s/it] {'loss': 0.5965, 'learning_rate': 1.8329667789721487e-05, 'epoch': 0.21} 21%|██ | 1219/5773 [1:53:34<6:52:15, 5.43s/it] 21%|██ | 1220/5773 [1:53:39<6:49:01, 5.39s/it] 21%|██ | 1220/5773 [1:53:45<6:49:01, 5.39s/it] {'loss': 0.5893, 'learning_rate': 1.8326561789467458e-05, 'epoch': 0.21} {'loss': 0.5893, 'learning_rate': 1.8326561789467458e-05, 'epoch': 0.21} 21%|██ | 1220/5773 [1:53:45<6:49:01, 5.39s/it] 21%|██ | 1220/5773 [1:53:39<6:49:01, 5.39s/it] 21%|██ | 1221/5773 [1:53:44<6:48:10, 5.38s/it] 21%|██ | 1221/5773 [1:53:50<6:48:10, 5.38s/it] {'loss': 0.5861, 'learning_rate': 1.8323453167745747e-05, 'epoch': 0.21} 21%|██ | 1221/5773 [1:53:50<6:48:10, 5.38s/it] {'loss': 0.5861, 'learning_rate': 1.8323453167745747e-05, 'epoch': 0.21} 21%|██ | 1221/5773 [1:53:44<6:48:10, 5.38s/it] 21%|██ | 1222/5773 [1:53:50<6:45:40, 5.35s/it] 21%|██ | 1222/5773 [1:53:55<6:45:40, 5.35s/it] {'loss': 0.5984, 'learning_rate': 1.832034192553505e-05, 'epoch': 0.21} 21%|██ | 1222/5773 [1:53:55<6:45:40, 5.35s/it] {'loss': 0.5984, 'learning_rate': 1.832034192553505e-05, 'epoch': 0.21} 21%|██ | 1222/5773 [1:53:50<6:45:40, 5.35s/it] 21%|██ | 1223/5773 [1:53:55<6:49:41, 5.40s/it] 21%|██ | 1223/5773 [1:54:01<6:49:42, 5.40s/it] {'loss': 0.5895, 'learning_rate': 1.831722806381488e-05, 'epoch': 0.21} 21%|██ | 1223/5773 [1:54:01<6:49:42, 5.40s/it] {'loss': 0.5895, 'learning_rate': 1.831722806381488e-05, 'epoch': 0.21} 21%|██ | 1223/5773 [1:53:55<6:49:41, 5.40s/it] 21%|██ | 1224/5773 [1:54:01<6:54:04, 5.46s/it] 21%|██ | 1224/5773 [1:54:06<6:54:04, 5.46s/it] {'loss': 0.5885, 'learning_rate': 1.831411158356558e-05, 'epoch': 0.21} 21%|██ | 1224/5773 [1:54:06<6:54:04, 5.46s/it] {'loss': 0.5885, 'learning_rate': 1.831411158356558e-05, 'epoch': 0.21} 21%|██ | 1224/5773 
[1:54:01<6:54:04, 5.46s/it] 21%|██ | 1225/5773 [1:54:06<6:54:04, 5.46s/it] 21%|██ | 1225/5773 [1:54:12<6:54:04, 5.46s/it] {'loss': 0.6007, 'learning_rate': 1.831099248576832e-05, 'epoch': 0.21} 21%|██ | 1225/5773 [1:54:12<6:54:04, 5.46s/it] {'loss': 0.6007, 'learning_rate': 1.831099248576832e-05, 'epoch': 0.21} 21%|██ | 1225/5773 [1:54:06<6:54:04, 5.46s/it] 21%|██ | 1226/5773 [1:54:12<6:51:52, 5.43s/it] 21%|██ | 1226/5773 [1:54:17<6:51:52, 5.43s/it] {'loss': 0.5923, 'learning_rate': 1.830787077140509e-05, 'epoch': 0.21} 21%|██ | 1226/5773 [1:54:17<6:51:52, 5.43s/it] {'loss': 0.5923, 'learning_rate': 1.830787077140509e-05, 'epoch': 0.21} 21%|██ | 1226/5773 [1:54:12<6:51:52, 5.43s/it] 21%|██▏ | 1227/5773 [1:54:17<6:50:34, 5.42s/it] 21%|██▏ | 1227/5773 [1:54:23<6:50:33, 5.42s/it] {'loss': 0.6029, 'learning_rate': 1.830474644145871e-05, 'epoch': 0.21} 21%|██▏ | 1227/5773 [1:54:23<6:50:33, 5.42s/it] {'loss': 0.6029, 'learning_rate': 1.830474644145871e-05, 'epoch': 0.21} 21%|██▏ | 1227/5773 [1:54:17<6:50:34, 5.42s/it] 21%|██▏ | 1228/5773 [1:54:23<6:54:24, 5.47s/it] 21%|██▏ | 1228/5773 [1:54:28<6:54:24, 5.47s/it] {'loss': 0.5734, 'learning_rate': 1.8301619496912813e-05, 'epoch': 0.21} 21%|██▏ | 1228/5773 [1:54:28<6:54:24, 5.47s/it] {'loss': 0.5734, 'learning_rate': 1.8301619496912813e-05, 'epoch': 0.21} 21%|██▏ | 1228/5773 [1:54:23<6:54:24, 5.47s/it] 21%|██▏ | 1229/5773 [1:54:28<6:52:45, 5.45s/it] 21%|██▏ | 1229/5773 [1:54:34<6:52:45, 5.45s/it] {'loss': 0.5912, 'learning_rate': 1.8298489938751864e-05, 'epoch': 0.21} 21%|██▏ | 1229/5773 [1:54:34<6:52:45, 5.45s/it] {'loss': 0.5912, 'learning_rate': 1.8298489938751864e-05, 'epoch': 0.21} 21%|██▏ | 1229/5773 [1:54:28<6:52:45, 5.45s/it] 21%|██▏ | 1230/5773 [1:54:34<6:53:58, 5.47s/it] 21%|██▏ | 1230/5773 [1:54:39<6:53:58, 5.47s/it] {'loss': 0.5908, 'learning_rate': 1.8295357767961144e-05, 'epoch': 0.21} 21%|██▏ | 1230/5773 [1:54:39<6:53:58, 5.47s/it] {'loss': 0.5908, 'learning_rate': 1.8295357767961144e-05, 'epoch': 0.21} 
21%|██▏ | 1230/5773 [1:54:34<6:53:58, 5.47s/it] 21%|██▏ | 1231/5773 [1:54:39<6:54:05, 5.47s/it] 21%|██▏ | 1231/5773 [1:54:45<6:54:05, 5.47s/it] {'loss': 0.6007, 'learning_rate': 1.8292222985526772e-05, 'epoch': 0.21} 21%|██▏ | 1231/5773 [1:54:45<6:54:05, 5.47s/it] {'loss': 0.6007, 'learning_rate': 1.8292222985526772e-05, 'epoch': 0.21} 21%|██▏ | 1231/5773 [1:54:39<6:54:05, 5.47s/it] 21%|██▏ | 1232/5773 [1:54:44<6:52:50, 5.45s/it] 21%|██▏ | 1232/5773 [1:54:50<6:52:50, 5.45s/it] {'loss': 0.582, 'learning_rate': 1.8289085592435663e-05, 'epoch': 0.21} {'loss': 0.582, 'learning_rate': 1.8289085592435663e-05, 'epoch': 0.21} 21%|██▏ | 1232/5773 [1:54:50<6:52:50, 5.45s/it] 21%|██▏ | 1232/5773 [1:54:44<6:52:50, 5.45s/it] 21%|██▏ | 1233/5773 [1:54:50<6:54:32, 5.48s/it] 21%|██▏ | 1233/5773 [1:54:56<6:54:32, 5.48s/it] {'loss': 0.5895, 'learning_rate': 1.8285945589675576e-05, 'epoch': 0.21} 21%|██▏ | 1233/5773 [1:54:56<6:54:32, 5.48s/it] {'loss': 0.5895, 'learning_rate': 1.8285945589675576e-05, 'epoch': 0.21} 21%|██▏ | 1233/5773 [1:54:50<6:54:32, 5.48s/it] 21%|██▏ | 1234/5773 [1:54:55<6:52:31, 5.45s/it] 21%|██▏ | 1234/5773 [1:55:01<6:52:30, 5.45s/it] {'loss': 0.5938, 'learning_rate': 1.8282802978235084e-05, 'epoch': 0.21} 21%|██▏ | 1234/5773 [1:55:01<6:52:30, 5.45s/it] {'loss': 0.5938, 'learning_rate': 1.8282802978235084e-05, 'epoch': 0.21} 21%|██▏ | 1234/5773 [1:54:55<6:52:31, 5.45s/it] 21%|██▏ | 1235/5773 [1:55:01<6:50:06, 5.42s/it] 21%|██▏ | 1235/5773 [1:55:06<6:50:08, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.827965775910358e-05, 'epoch': 0.21} 21%|██▏ | 1235/5773 [1:55:06<6:50:08, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.827965775910358e-05, 'epoch': 0.21} 21%|██▏ | 1235/5773 [1:55:01<6:50:06, 5.42s/it] 21%|██▏ | 1236/5773 [1:55:06<6:52:36, 5.46s/it] 21%|██▏ | 1236/5773 [1:55:12<6:52:36, 5.46s/it] {'loss': 0.5927, 'learning_rate': 1.827650993327128e-05, 'epoch': 0.21} 21%|██▏ | 1236/5773 [1:55:12<6:52:36, 5.46s/it] {'loss': 0.5927, 'learning_rate': 
1.827650993327128e-05, 'epoch': 0.21} 21%|██▏ | 1236/5773 [1:55:06<6:52:36, 5.46s/it] 21%|██▏ | 1237/5773 [1:55:12<6:57:00, 5.52s/it] 21%|██▏ | 1237/5773 [1:55:17<6:57:00, 5.52s/it] {'loss': 0.5965, 'learning_rate': 1.827335950172922e-05, 'epoch': 0.21} 21%|██▏ | 1237/5773 [1:55:17<6:57:00, 5.52s/it] {'loss': 0.5965, 'learning_rate': 1.827335950172922e-05, 'epoch': 0.21} 21%|██▏ | 1237/5773 [1:55:12<6:57:00, 5.52s/it] 21%|██▏ | 1238/5773 [1:55:17<6:56:16, 5.51s/it] 21%|██▏ | 1238/5773 [1:55:23<6:56:16, 5.51s/it] {'loss': 0.5858, 'learning_rate': 1.827020646546926e-05, 'epoch': 0.21} 21%|██▏ | 1238/5773 [1:55:23<6:56:16, 5.51s/it] {'loss': 0.5858, 'learning_rate': 1.827020646546926e-05, 'epoch': 0.21} 21%|██▏ | 1238/5773 [1:55:17<6:56:16, 5.51s/it] 21%|██▏ | 1239/5773 [1:55:23<6:53:34, 5.47s/it] 21%|██▏ | 1239/5773 [1:55:28<6:53:34, 5.47s/it] {'loss': 0.6157, 'learning_rate': 1.826705082548407e-05, 'epoch': 0.21} 21%|██▏ | 1239/5773 [1:55:28<6:53:34, 5.47s/it] {'loss': 0.6157, 'learning_rate': 1.826705082548407e-05, 'epoch': 0.21} 21%|██▏ | 1239/5773 [1:55:23<6:53:34, 5.47s/it] 21%|██▏ | 1240/5773 [1:55:28<6:52:55, 5.47s/it] 21%|██▏ | 1240/5773 [1:55:34<6:52:55, 5.47s/it] {'loss': 0.5981, 'learning_rate': 1.826389258276715e-05, 'epoch': 0.21} 21%|██▏ | 1240/5773 [1:55:34<6:52:55, 5.47s/it] {'loss': 0.5981, 'learning_rate': 1.826389258276715e-05, 'epoch': 0.21} 21%|██▏ | 1240/5773 [1:55:28<6:52:55, 5.47s/it] 21%|██▏ | 1241/5773 [1:55:34<6:50:39, 5.44s/it] 21%|██▏ | 1241/5773 [1:55:39<6:50:39, 5.44s/it] {'loss': 0.5899, 'learning_rate': 1.8260731738312817e-05, 'epoch': 0.21} 21%|██▏ | 1241/5773 [1:55:39<6:50:39, 5.44s/it] {'loss': 0.5899, 'learning_rate': 1.8260731738312817e-05, 'epoch': 0.21} 21%|██▏ | 1241/5773 [1:55:34<6:50:39, 5.44s/it] 22%|██▏ | 1242/5773 [1:55:39<6:53:32, 5.48s/it] 22%|██▏ | 1242/5773 [1:55:45<6:53:32, 5.48s/it] {'loss': 0.574, 'learning_rate': 1.8257568293116204e-05, 'epoch': 0.22} 22%|██▏ | 1242/5773 [1:55:45<6:53:32, 5.48s/it] {'loss': 0.574, 
'learning_rate': 1.8257568293116204e-05, 'epoch': 0.22} 22%|██▏ | 1242/5773 [1:55:39<6:53:32, 5.48s/it] 22%|██▏ | 1243/5773 [1:55:44<6:47:48, 5.40s/it] 22%|██▏ | 1243/5773 [1:55:50<6:47:49, 5.40s/it] {'loss': 0.5752, 'learning_rate': 1.8254402248173265e-05, 'epoch': 0.22} 22%|██▏ | 1243/5773 [1:55:50<6:47:49, 5.40s/it] {'loss': 0.5752, 'learning_rate': 1.8254402248173265e-05, 'epoch': 0.22} 22%|██▏ | 1243/5773 [1:55:44<6:47:48, 5.40s/it] 22%|██▏ | 1244/5773 [1:55:50<6:46:14, 5.38s/it] 22%|██▏ | 1244/5773 [1:55:55<6:46:14, 5.38s/it] {'loss': 0.5868, 'learning_rate': 1.825123360448077e-05, 'epoch': 0.22} 22%|██▏ | 1244/5773 [1:55:55<6:46:14, 5.38s/it] {'loss': 0.5868, 'learning_rate': 1.825123360448077e-05, 'epoch': 0.22} 22%|██▏ | 1244/5773 [1:55:50<6:46:14, 5.38s/it] 22%|██▏ | 1245/5773 [1:55:55<6:51:31, 5.45s/it] 22%|██▏ | 1245/5773 [1:56:01<6:51:31, 5.45s/it] {'loss': 0.5879, 'learning_rate': 1.824806236303631e-05, 'epoch': 0.22} 22%|██▏ | 1245/5773 [1:55:55<6:51:31, 5.45s/it] {'loss': 0.5879, 'learning_rate': 1.824806236303631e-05, 'epoch': 0.22} 22%|██▏ | 1245/5773 [1:56:01<6:51:31, 5.45s/it] 22%|██▏ | 1246/5773 [1:56:01<6:46:43, 5.39s/it] 22%|██▏ | 1246/5773 [1:56:06<6:46:43, 5.39s/it] {'loss': 0.5834, 'learning_rate': 1.82448885248383e-05, 'epoch': 0.22} 22%|██▏ | 1246/5773 [1:56:06<6:46:43, 5.39s/it] {'loss': 0.5834, 'learning_rate': 1.82448885248383e-05, 'epoch': 0.22} 22%|██▏ | 1246/5773 [1:56:01<6:46:43, 5.39s/it] 22%|██▏ | 1247/5773 [1:56:06<6:48:35, 5.42s/it] 22%|██▏ | 1247/5773 [1:56:12<6:48:35, 5.42s/it] {'loss': 0.5919, 'learning_rate': 1.8241712090885956e-05, 'epoch': 0.22} 22%|██▏ | 1247/5773 [1:56:12<6:48:35, 5.42s/it] {'loss': 0.5919, 'learning_rate': 1.8241712090885956e-05, 'epoch': 0.22} 22%|██▏ | 1247/5773 [1:56:06<6:48:35, 5.42s/it] 22%|██▏ | 1248/5773 [1:56:12<6:51:45, 5.46s/it] 22%|██▏ | 1248/5773 [1:56:17<6:51:45, 5.46s/it] {'loss': 0.5795, 'learning_rate': 1.8238533062179325e-05, 'epoch': 0.22} 22%|██▏ | 1248/5773 [1:56:17<6:51:45, 
5.46s/it] {'loss': 0.5795, 'learning_rate': 1.8238533062179325e-05, 'epoch': 0.22} 22%|██▏ | 1248/5773 [1:56:12<6:51:45, 5.46s/it] 22%|██▏ | 1249/5773 [1:56:17<6:51:35, 5.46s/it] 22%|██▏ | 1249/5773 [1:56:23<6:51:35, 5.46s/it] {'loss': 0.5741, 'learning_rate': 1.8235351439719266e-05, 'epoch': 0.22} 22%|██▏ | 1249/5773 [1:56:23<6:51:35, 5.46s/it] {'loss': 0.5741, 'learning_rate': 1.8235351439719266e-05, 'epoch': 0.22} 22%|██▏ | 1249/5773 [1:56:17<6:51:35, 5.46s/it]15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 22%|██▏ | 1250/5773 [1:56:22<6:48:23, 5.42s/it]4 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 22%|██▏ | 1250/5773 [1:56:28<6:48:23, 5.42s/it]12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5864, 'learning_rate': 1.823216722450746e-05, 'epoch': 0.22} 22%|██▏ | 1250/5773 [1:56:28<6:48:23, 5.42s/it] {'loss': 0.5864, 'learning_rate': 1.823216722450746e-05, 'epoch': 0.22} 22%|██▏ | 1250/5773 [1:56:22<6:48:23, 5.42s/it] 22%|██▏ | 1251/5773 [1:56:28<6:48:44, 5.42s/it] 22%|██▏ | 1251/5773 [1:56:33<6:48:44, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.8228980417546392e-05, 'epoch': 0.22} 22%|██▏ | 1251/5773 [1:56:33<6:48:44, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.8228980417546392e-05, 'epoch': 0.22} 22%|██▏ | 1251/5773 [1:56:28<6:48:44, 5.42s/it] 22%|██▏ | 1252/5773 [1:56:33<6:49:00, 5.43s/it] 22%|██▏ | 1252/5773 [1:56:39<6:48:59, 5.43s/it] {'loss': 0.5851, 'learning_rate': 1.8225791019839375e-05, 'epoch': 0.22} 22%|██▏ | 1252/5773 [1:56:39<6:48:59, 5.43s/it] {'loss': 0.5851, 'learning_rate': 1.8225791019839375e-05, 'epoch': 0.22} 22%|██▏ | 1252/5773 [1:56:33<6:49:00, 5.43s/it] 22%|██▏ | 1253/5773 [1:56:39<6:47:54, 5.41s/it] 22%|██▏ | 1253/5773 [1:56:44<6:47:54, 5.41s/it] {'loss': 0.5933, 'learning_rate': 1.8222599032390534e-05, 'epoch': 0.22} 22%|██▏ | 1253/5773 [1:56:44<6:47:54, 5.41s/it] {'loss': 0.5933, 'learning_rate': 1.8222599032390534e-05, 'epoch': 0.22} 22%|██▏ | 1253/5773 [1:56:39<6:47:54, 5.41s/it] 22%|██▏ | 1254/5773 [1:56:44<6:46:10, 5.39s/it] 22%|██▏ | 1254/5773 [1:56:50<6:46:10, 5.39s/it] {'loss': 0.583, 'learning_rate': 1.8219404456204808e-05, 'epoch': 0.22} 22%|██▏ | 1254/5773 [1:56:50<6:46:10, 5.39s/it] {'loss': 0.583, 'learning_rate': 1.8219404456204808e-05, 'epoch': 0.22} 22%|██▏ | 1254/5773 [1:56:44<6:46:10, 5.39s/it] 22%|██▏ | 1255/5773 [1:56:50<6:51:04, 5.46s/it] 22%|██▏ | 1255/5773 [1:56:55<6:51:04, 5.46s/it] {'loss': 0.5969, 'learning_rate': 1.8216207292287945e-05, 'epoch': 0.22} 22%|██▏ | 1255/5773 [1:56:55<6:51:04, 5.46s/it] {'loss': 0.5969, 'learning_rate': 1.8216207292287945e-05, 'epoch': 0.22} 22%|██▏ | 1255/5773 [1:56:50<6:51:04, 5.46s/it] 22%|██▏ | 1256/5773 [1:56:55<6:50:50, 5.46s/it] 22%|██▏ | 1256/5773 
[1:57:01<6:50:50, 5.46s/it] {'loss': 0.5918, 'learning_rate': 1.8213007541646527e-05, 'epoch': 0.22} 22%|██▏ | 1256/5773 [1:57:01<6:50:50, 5.46s/it] {'loss': 0.5918, 'learning_rate': 1.8213007541646527e-05, 'epoch': 0.22} 22%|██▏ | 1256/5773 [1:56:55<6:50:50, 5.46s/it] 22%|██▏ | 1257/5773 [1:57:00<6:48:16, 5.42s/it] 22%|██▏ | 1257/5773 [1:57:06<6:48:16, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.8209805205287932e-05, 'epoch': 0.22} 22%|██▏ | 1257/5773 [1:57:06<6:48:16, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.8209805205287932e-05, 'epoch': 0.22} 22%|██▏ | 1257/5773 [1:57:00<6:48:16, 5.42s/it] 22%|██▏ | 1258/5773 [1:57:06<6:48:56, 5.43s/it] 22%|██▏ | 1258/5773 [1:57:11<6:48:55, 5.43s/it] {'loss': 0.5828, 'learning_rate': 1.8206600284220353e-05, 'epoch': 0.22} 22%|██▏ | 1258/5773 [1:57:11<6:48:55, 5.43s/it] {'loss': 0.5828, 'learning_rate': 1.8206600284220353e-05, 'epoch': 0.22} 22%|██▏ | 1258/5773 [1:57:06<6:48:56, 5.43s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5002 > 4096). 
Running this sequence through the model will result in indexing errors 22%|██▏ | 1259/5773 [1:57:12<6:51:57, 5.48s/it] 22%|██▏ | 1259/5773 [1:57:17<6:51:56, 5.48s/it] {'loss': 0.5957, 'learning_rate': 1.8203392779452804e-05, 'epoch': 0.22} 22%|██▏ | 1259/5773 [1:57:17<6:51:56, 5.48s/it] {'loss': 0.5957, 'learning_rate': 1.8203392779452804e-05, 'epoch': 0.22} 22%|██▏ | 1259/5773 [1:57:12<6:51:57, 5.48s/it] 22%|██▏ | 1260/5773 [1:57:17<6:54:24, 5.51s/it] 22%|██▏ | 1260/5773 [1:57:23<6:54:24, 5.51s/it] {'loss': 0.5702, 'learning_rate': 1.8200182691995113e-05, 'epoch': 0.22} 22%|██▏ | 1260/5773 [1:57:23<6:54:24, 5.51s/it] {'loss': 0.5702, 'learning_rate': 1.8200182691995113e-05, 'epoch': 0.22} 22%|██▏ | 1260/5773 [1:57:17<6:54:24, 5.51s/it] 22%|██▏ | 1261/5773 [1:57:23<6:54:59, 5.52s/it] 22%|██▏ | 1261/5773 [1:57:28<6:54:59, 5.52s/it] {'loss': 0.5847, 'learning_rate': 1.819697002285792e-05, 'epoch': 0.22} 22%|██▏ | 1261/5773 [1:57:28<6:54:59, 5.52s/it] {'loss': 0.5847, 'learning_rate': 1.819697002285792e-05, 'epoch': 0.22} 22%|██▏ | 1261/5773 [1:57:23<6:54:59, 5.52s/it] 22%|██▏ | 1262/5773 [1:57:28<6:53:33, 5.50s/it] 22%|██▏ | 1262/5773 [1:57:34<6:53:35, 5.50s/it] {'loss': 0.5793, 'learning_rate': 1.819375477305267e-05, 'epoch': 0.22} 22%|██▏ | 1262/5773 [1:57:34<6:53:35, 5.50s/it] {'loss': 0.5793, 'learning_rate': 1.819375477305267e-05, 'epoch': 0.22} 22%|██▏ | 1262/5773 [1:57:28<6:53:33, 5.50s/it] 22%|██▏ | 1263/5773 [1:57:33<6:50:21, 5.46s/it] 22%|██▏ | 1263/5773 [1:57:39<6:50:20, 5.46s/it] {'loss': 0.6093, 'learning_rate': 1.8190536943591627e-05, 'epoch': 0.22} 22%|██▏ | 1263/5773 [1:57:39<6:50:20, 5.46s/it] {'loss': 0.6093, 'learning_rate': 1.8190536943591627e-05, 'epoch': 0.22} 22%|██▏ | 1263/5773 [1:57:33<6:50:21, 5.46s/it] 22%|██▏ | 1264/5773 [1:57:39<6:52:42, 5.49s/it] 22%|██▏ | 1264/5773 [1:57:45<6:52:42, 5.49s/it] {'loss': 0.6011, 'learning_rate': 1.8187316535487868e-05, 'epoch': 0.22} 22%|██▏ | 1264/5773 [1:57:45<6:52:42, 5.49s/it] {'loss': 0.6011, 
'learning_rate': 1.8187316535487868e-05, 'epoch': 0.22} 22%|██▏ | 1264/5773 [1:57:39<6:52:42, 5.49s/it] 22%|██▏ | 1265/5773 [1:57:44<6:51:43, 5.48s/it] 22%|██▏ | 1265/5773 [1:57:50<6:51:43, 5.48s/it] {'loss': 0.601, 'learning_rate': 1.8184093549755284e-05, 'epoch': 0.22} 22%|██▏ | 1265/5773 [1:57:50<6:51:43, 5.48s/it] {'loss': 0.601, 'learning_rate': 1.8184093549755284e-05, 'epoch': 0.22} 22%|██▏ | 1265/5773 [1:57:44<6:51:43, 5.48s/it] 22%|██▏ | 1266/5773 [1:57:50<6:48:49, 5.44s/it] 22%|██▏ | 1266/5773 [1:57:55<6:48:50, 5.44s/it] {'loss': 0.5908, 'learning_rate': 1.8180867987408567e-05, 'epoch': 0.22} 22%|██▏ | 1266/5773 [1:57:55<6:48:50, 5.44s/it] {'loss': 0.5908, 'learning_rate': 1.8180867987408567e-05, 'epoch': 0.22} 22%|██▏ | 1266/5773 [1:57:50<6:48:49, 5.44s/it] 22%|██▏ | 1267/5773 [1:57:55<6:48:47, 5.44s/it] 22%|██▏ | 1267/5773 [1:58:01<6:48:47, 5.44s/it] {'loss': 0.5892, 'learning_rate': 1.8177639849463234e-05, 'epoch': 0.22} 22%|██▏ | 1267/5773 [1:58:01<6:48:47, 5.44s/it] {'loss': 0.5892, 'learning_rate': 1.8177639849463234e-05, 'epoch': 0.22} 22%|██▏ | 1267/5773 [1:57:55<6:48:47, 5.44s/it] 22%|██▏ | 1268/5773 [1:58:01<6:49:31, 5.45s/it] 22%|██▏ | 1268/5773 [1:58:06<6:49:32, 5.45s/it] {'loss': 0.6083, 'learning_rate': 1.8174409136935603e-05, 'epoch': 0.22} 22%|██▏ | 1268/5773 [1:58:06<6:49:32, 5.45s/it] {'loss': 0.6083, 'learning_rate': 1.8174409136935603e-05, 'epoch': 0.22} 22%|██▏ | 1268/5773 [1:58:01<6:49:31, 5.45s/it] 22%|██▏ | 1269/5773 [1:58:06<6:49:46, 5.46s/it] 22%|██▏ | 1269/5773 [1:58:12<6:49:46, 5.46s/it] {'loss': 0.601, 'learning_rate': 1.81711758508428e-05, 'epoch': 0.22} 22%|██▏ | 1269/5773 [1:58:12<6:49:46, 5.46s/it] {'loss': 0.601, 'learning_rate': 1.81711758508428e-05, 'epoch': 0.22} 22%|██▏ | 1269/5773 [1:58:06<6:49:46, 5.46s/it] 22%|██▏ | 1270/5773 [1:58:12<6:51:26, 5.48s/it] 22%|██▏ | 1270/5773 [1:58:17<6:51:25, 5.48s/it] {'loss': 0.5842, 'learning_rate': 1.816793999220278e-05, 'epoch': 0.22} 22%|██▏ | 1270/5773 [1:58:17<6:51:25, 
5.48s/it] {'loss': 0.5842, 'learning_rate': 1.816793999220278e-05, 'epoch': 0.22} 22%|██▏ | 1270/5773 [1:58:12<6:51:26, 5.48s/it] 22%|██▏ | 1271/5773 [1:58:17<6:50:23, 5.47s/it] 22%|██▏ | 1271/5773 [1:58:23<6:50:23, 5.47s/it] {'loss': 0.5704, 'learning_rate': 1.816470156203428e-05, 'epoch': 0.22} 22%|██▏ | 1271/5773 [1:58:17<6:50:23, 5.47s/it] {'loss': 0.5704, 'learning_rate': 1.816470156203428e-05, 'epoch': 0.22} 22%|██▏ | 1271/5773 [1:58:23<6:50:23, 5.47s/it] 22%|██▏ | 1272/5773 [1:58:23<6:48:15, 5.44s/it] 22%|██▏ | 1272/5773 [1:58:28<6:48:15, 5.44s/it] {'loss': 0.5997, 'learning_rate': 1.816146056135687e-05, 'epoch': 0.22} 22%|██▏ | 1272/5773 [1:58:28<6:48:15, 5.44s/it] {'loss': 0.5997, 'learning_rate': 1.816146056135687e-05, 'epoch': 0.22} 22%|██▏ | 1272/5773 [1:58:23<6:48:15, 5.44s/it] 22%|██▏ | 1273/5773 [1:58:28<6:50:56, 5.48s/it] 22%|██▏ | 1273/5773 [1:58:34<6:50:56, 5.48s/it] {'loss': 0.5888, 'learning_rate': 1.8158216991190917e-05, 'epoch': 0.22} 22%|██▏ | 1273/5773 [1:58:34<6:50:56, 5.48s/it] {'loss': 0.5888, 'learning_rate': 1.8158216991190917e-05, 'epoch': 0.22} 22%|██▏ | 1273/5773 [1:58:28<6:50:56, 5.48s/it] 22%|██▏ | 1274/5773 [1:58:34<6:49:33, 5.46s/it] 22%|██▏ | 1274/5773 [1:58:39<6:49:33, 5.46s/it] {'loss': 0.5993, 'learning_rate': 1.8154970852557604e-05, 'epoch': 0.22} 22%|██▏ | 1274/5773 [1:58:39<6:49:33, 5.46s/it] {'loss': 0.5993, 'learning_rate': 1.8154970852557604e-05, 'epoch': 0.22} 22%|██▏ | 1274/5773 [1:58:34<6:49:33, 5.46s/it] 22%|██▏ | 1275/5773 [1:58:39<6:52:48, 5.51s/it] 22%|██▏ | 1275/5773 [1:58:45<6:52:48, 5.51s/it] {'loss': 0.5807, 'learning_rate': 1.8151722146478913e-05, 'epoch': 0.22} 22%|██▏ | 1275/5773 [1:58:45<6:52:48, 5.51s/it] {'loss': 0.5807, 'learning_rate': 1.8151722146478913e-05, 'epoch': 0.22} 22%|██▏ | 1275/5773 [1:58:39<6:52:48, 5.51s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 22%|██▏ | 1276/5773 [1:58:45<6:51:17, 5.49s/it] 22%|██▏ | 1276/5773 [1:58:50<6:51:17, 5.49s/it] {'loss': 0.5875, 'learning_rate': 1.814847087397765e-05, 'epoch': 0.22} 22%|██▏ | 1276/5773 [1:58:50<6:51:17, 5.49s/it] {'loss': 0.5875, 'learning_rate': 1.814847087397765e-05, 'epoch': 0.22} 22%|██▏ | 1276/5773 [1:58:45<6:51:17, 5.49s/it] 22%|██▏ | 1277/5773 [1:58:50<6:51:11, 5.49s/it] 22%|██▏ | 1277/5773 [1:58:56<6:51:12, 5.49s/it] {'loss': 0.5865, 'learning_rate': 1.814521703607741e-05, 'epoch': 0.22} 22%|██▏ | 1277/5773 [1:58:56<6:51:12, 5.49s/it] {'loss': 0.5865, 'learning_rate': 1.814521703607741e-05, 'epoch': 0.22} 22%|██▏ | 1277/5773 [1:58:50<6:51:11, 5.49s/it] 22%|██▏ | 1278/5773 [1:58:56<6:51:37, 5.49s/it] 22%|██▏ | 1278/5773 [1:59:01<6:51:37, 5.49s/it] {'loss': 0.583, 'learning_rate': 1.8141960633802612e-05, 'epoch': 0.22} 22%|██▏ | 1278/5773 [1:59:01<6:51:37, 5.49s/it] {'loss': 0.583, 'learning_rate': 1.8141960633802612e-05, 'epoch': 0.22} 22%|██▏ | 1278/5773 [1:58:56<6:51:37, 5.49s/it] 22%|██▏ | 1279/5773 [1:59:01<6:46:37, 5.43s/it] 22%|██▏ | 1279/5773 [1:59:06<6:46:36, 5.43s/it] {'loss': 0.5743, 'learning_rate': 1.813870166817847e-05, 'epoch': 0.22} 22%|██▏ | 1279/5773 [1:59:06<6:46:36, 5.43s/it] {'loss': 0.5743, 'learning_rate': 1.813870166817847e-05, 'epoch': 0.22} 22%|██▏ | 1279/5773 [1:59:01<6:46:37, 5.43s/it] 22%|██▏ | 1280/5773 [1:59:06<6:46:57, 5.43s/it] 22%|██▏ | 1280/5773 [1:59:12<6:46:58, 5.43s/it] {'loss': 0.5957, 'learning_rate': 1.8135440140231013e-05, 'epoch': 0.22} 22%|██▏ | 1280/5773 [1:59:12<6:46:58, 5.43s/it] {'loss': 0.5957, 'learning_rate': 1.8135440140231013e-05, 'epoch': 0.22} 22%|██▏ | 1280/5773 [1:59:06<6:46:57, 5.43s/it] 22%|██▏ | 1281/5773 [1:59:12<6:43:43, 5.39s/it] 22%|██▏ | 1281/5773 [1:59:17<6:43:43, 5.39s/it] {'loss': 0.5984, 'learning_rate': 1.8132176050987077e-05, 'epoch': 0.22} 22%|██▏ | 1281/5773 [1:59:17<6:43:43, 5.39s/it] {'loss': 0.5984, 'learning_rate': 1.8132176050987077e-05, 
'epoch': 0.22} 22%|██▏ | 1281/5773 [1:59:12<6:43:43, 5.39s/it] 22%|██▏ | 1282/5773 [1:59:17<6:43:15, 5.39s/it] 22%|██▏ | 1282/5773 [1:59:23<6:43:15, 5.39s/it] {'loss': 0.5941, 'learning_rate': 1.8128909401474298e-05, 'epoch': 0.22} 22%|██▏ | 1282/5773 [1:59:23<6:43:15, 5.39s/it] {'loss': 0.5941, 'learning_rate': 1.8128909401474298e-05, 'epoch': 0.22} 22%|██▏ | 1282/5773 [1:59:17<6:43:15, 5.39s/it] 22%|██▏ | 1283/5773 [1:59:22<6:42:14, 5.38s/it] 22%|██▏ | 1283/5773 [1:59:28<6:42:13, 5.37s/it] {'loss': 0.5791, 'learning_rate': 1.812564019272112e-05, 'epoch': 0.22} 22%|██▏ | 1283/5773 [1:59:28<6:42:13, 5.37s/it] {'loss': 0.5791, 'learning_rate': 1.812564019272112e-05, 'epoch': 0.22} 22%|██▏ | 1283/5773 [1:59:22<6:42:14, 5.38s/it] 22%|██▏ | 1284/5773 [1:59:28<6:43:09, 5.39s/it] 22%|██▏ | 1284/5773 [1:59:33<6:43:08, 5.39s/it] {'loss': 0.5848, 'learning_rate': 1.81223684257568e-05, 'epoch': 0.22} 22%|██▏ | 1284/5773 [1:59:33<6:43:08, 5.39s/it] {'loss': 0.5848, 'learning_rate': 1.81223684257568e-05, 'epoch': 0.22} 22%|██▏ | 1284/5773 [1:59:28<6:43:09, 5.39s/it] 22%|██▏ | 1285/5773 [1:59:33<6:41:34, 5.37s/it] 22%|██▏ | 1285/5773 [1:59:39<6:41:35, 5.37s/it] {'loss': 0.5896, 'learning_rate': 1.811909410161139e-05, 'epoch': 0.22} 22%|██▏ | 1285/5773 [1:59:39<6:41:35, 5.37s/it] {'loss': 0.5896, 'learning_rate': 1.811909410161139e-05, 'epoch': 0.22} 22%|██▏ | 1285/5773 [1:59:33<6:41:34, 5.37s/it] 22%|██▏ | 1286/5773 [1:59:39<6:44:57, 5.42s/it] 22%|██▏ | 1286/5773 [1:59:44<6:44:57, 5.42s/it] {'loss': 0.5834, 'learning_rate': 1.8115817221315753e-05, 'epoch': 0.22} 22%|██▏ | 1286/5773 [1:59:44<6:44:57, 5.42s/it] {'loss': 0.5834, 'learning_rate': 1.8115817221315753e-05, 'epoch': 0.22} 22%|██▏ | 1286/5773 [1:59:39<6:44:57, 5.42s/it] 22%|██▏ | 1287/5773 [1:59:44<6:45:55, 5.43s/it] 22%|██▏ | 1287/5773 [1:59:50<6:45:55, 5.43s/it] {'loss': 0.5897, 'learning_rate': 1.8112537785901557e-05, 'epoch': 0.22} 22%|██▏ | 1287/5773 [1:59:50<6:45:55, 5.43s/it] {'loss': 0.5897, 'learning_rate': 
1.8112537785901557e-05, 'epoch': 0.22} 22%|██▏ | 1287/5773 [1:59:44<6:45:55, 5.43s/it] 22%|██▏ | 1288/5773 [1:59:50<6:48:03, 5.46s/it] 22%|██▏ | 1288/5773 [1:59:55<6:48:03, 5.46s/it] {'loss': 0.6102, 'learning_rate': 1.8109255796401278e-05, 'epoch': 0.22} 22%|██▏ | 1288/5773 [1:59:55<6:48:03, 5.46s/it] {'loss': 0.6102, 'learning_rate': 1.8109255796401278e-05, 'epoch': 0.22} 22%|██▏ | 1288/5773 [1:59:50<6:48:03, 5.46s/it] 22%|██▏ | 1289/5773 [1:59:55<6:43:50, 5.40s/it] 22%|██▏ | 1289/5773 [2:00:00<6:43:52, 5.40s/it] {'loss': 0.5906, 'learning_rate': 1.8105971253848178e-05, 'epoch': 0.22} 22%|██▏ | 1289/5773 [2:00:00<6:43:52, 5.40s/it] {'loss': 0.5906, 'learning_rate': 1.8105971253848178e-05, 'epoch': 0.22} 22%|██▏ | 1289/5773 [1:59:55<6:43:50, 5.40s/it] 22%|██▏ | 1290/5773 [2:00:00<6:45:42, 5.43s/it] 22%|██▏ | 1290/5773 [2:00:06<6:45:42, 5.43s/it] {'loss': 0.6039, 'learning_rate': 1.8102684159276345e-05, 'epoch': 0.22} 22%|██▏ | 1290/5773 [2:00:06<6:45:42, 5.43s/it] {'loss': 0.6039, 'learning_rate': 1.8102684159276345e-05, 'epoch': 0.22} 22%|██▏ | 1290/5773 [2:00:00<6:45:42, 5.43s/it] 22%|██▏ | 1291/5773 [2:00:06<6:44:46, 5.42s/it] 22%|██▏ | 1291/5773 [2:00:11<6:44:46, 5.42s/it] {'loss': 0.5931, 'learning_rate': 1.809939451372066e-05, 'epoch': 0.22} 22%|██▏ | 1291/5773 [2:00:11<6:44:46, 5.42s/it] {'loss': 0.5931, 'learning_rate': 1.809939451372066e-05, 'epoch': 0.22} 22%|██▏ | 1291/5773 [2:00:06<6:44:46, 5.42s/it] 22%|██▏ | 1292/5773 [2:00:11<6:46:59, 5.45s/it] 22%|██▏ | 1292/5773 [2:00:17<6:46:59, 5.45s/it] {'loss': 0.5985, 'learning_rate': 1.8096102318216807e-05, 'epoch': 0.22} 22%|██▏ | 1292/5773 [2:00:17<6:46:59, 5.45s/it] {'loss': 0.5985, 'learning_rate': 1.8096102318216807e-05, 'epoch': 0.22} 22%|██▏ | 1292/5773 [2:00:11<6:46:59, 5.45s/it] 22%|██▏ | 1293/5773 [2:00:17<6:46:12, 5.44s/it] 22%|██▏ | 1293/5773 [2:00:22<6:46:12, 5.44s/it] {'loss': 0.5945, 'learning_rate': 1.8092807573801277e-05, 'epoch': 0.22} 22%|██▏ | 1293/5773 [2:00:22<6:46:12, 5.44s/it] 
{'loss': 0.5945, 'learning_rate': 1.8092807573801277e-05, 'epoch': 0.22} 22%|██▏ | 1293/5773 [2:00:17<6:46:12, 5.44s/it] 22%|██▏ | 1294/5773 [2:00:22<6:48:09, 5.47s/it] 22%|██▏ | 1294/5773 [2:00:28<6:48:09, 5.47s/it] {'loss': 0.6139, 'learning_rate': 1.8089510281511357e-05, 'epoch': 0.22} 22%|██▏ | 1294/5773 [2:00:28<6:48:09, 5.47s/it] {'loss': 0.6139, 'learning_rate': 1.8089510281511357e-05, 'epoch': 0.22} 22%|██▏ | 1294/5773 [2:00:22<6:48:09, 5.47s/it] 22%|██▏ | 1295/5773 [2:00:28<6:44:40, 5.42s/it] 22%|██▏ | 1295/5773 [2:00:33<6:44:40, 5.42s/it] {'loss': 0.6027, 'learning_rate': 1.8086210442385146e-05, 'epoch': 0.22} 22%|██▏ | 1295/5773 [2:00:33<6:44:40, 5.42s/it] {'loss': 0.6027, 'learning_rate': 1.8086210442385146e-05, 'epoch': 0.22} 22%|██▏ | 1295/5773 [2:00:28<6:44:40, 5.42s/it] 22%|██▏ | 1296/5773 [2:00:33<6:45:03, 5.43s/it] 22%|██▏ | 1296/5773 [2:00:39<6:45:03, 5.43s/it] {'loss': 0.6247, 'learning_rate': 1.8082908057461534e-05, 'epoch': 0.22} 22%|██▏ | 1296/5773 [2:00:39<6:45:03, 5.43s/it] {'loss': 0.6247, 'learning_rate': 1.8082908057461534e-05, 'epoch': 0.22} 22%|██▏ | 1296/5773 [2:00:33<6:45:03, 5.43s/it] 22%|██▏ | 1297/5773 [2:00:39<6:47:15, 5.46s/it] 22%|██▏ | 1297/5773 [2:00:44<6:47:15, 5.46s/it] {'loss': 0.6077, 'learning_rate': 1.8079603127780216e-05, 'epoch': 0.22} 22%|██▏ | 1297/5773 [2:00:44<6:47:15, 5.46s/it] {'loss': 0.6077, 'learning_rate': 1.8079603127780216e-05, 'epoch': 0.22} 22%|██▏ | 1297/5773 [2:00:39<6:47:15, 5.46s/it] 22%|██▏ | 1298/5773 [2:00:44<6:48:47, 5.48s/it] 22%|██▏ | 1298/5773 [2:00:50<6:48:47, 5.48s/it] {'loss': 0.5951, 'learning_rate': 1.807629565438169e-05, 'epoch': 0.22} 22%|██▏ | 1298/5773 [2:00:50<6:48:47, 5.48s/it] {'loss': 0.5951, 'learning_rate': 1.807629565438169e-05, 'epoch': 0.22} 22%|██▏ | 1298/5773 [2:00:44<6:48:47, 5.48s/it] 23%|██▎ | 1299/5773 [2:00:50<6:48:31, 5.48s/it] 23%|██▎ | 1299/5773 [2:00:55<6:48:32, 5.48s/it] {'loss': 0.6001, 'learning_rate': 1.8072985638307265e-05, 'epoch': 0.23} 23%|██▎ | 1299/5773 
[2:00:55<6:48:32, 5.48s/it] {'loss': 0.6001, 'learning_rate': 1.8072985638307265e-05, 'epoch': 0.23} 23%|██▎ | 1299/5773 [2:00:50<6:48:31, 5.48s/it]15 AutoResumeHook: Checking whether to suspend... 09 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 23%|██▎ | 1300/5773 [2:00:55<6:45:27, 5.44s/it]10 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 23%|██▎ | 1300/5773 [2:01:00<6:45:27, 5.44s/it]4 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... {'loss': 0.5903, 'learning_rate': 1.8069673080599024e-05, 'epoch': 0.23} 23%|██▎ | 1300/5773 [2:01:00<6:45:27, 5.44s/it] {'loss': 0.5903, 'learning_rate': 1.8069673080599024e-05, 'epoch': 0.23} 23%|██▎ | 1300/5773 [2:00:55<6:45:27, 5.44s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 23%|██▎ | 1301/5773 [2:01:13<11:26:26, 9.21s/it] 23%|██▎ | 1301/5773 [2:01:18<11:26:25, 9.21s/it] {'loss': 0.5817, 'learning_rate': 1.8066357982299873e-05, 'epoch': 0.23} 23%|██▎ | 1301/5773 [2:01:18<11:26:25, 9.21s/it] {'loss': 0.5817, 'learning_rate': 1.8066357982299873e-05, 'epoch': 0.23} 23%|██▎ | 1301/5773 [2:01:13<11:26:26, 9.21s/it] 23%|██▎ | 1302/5773 [2:01:18<9:58:07, 8.03s/it] 23%|██▎ | 1302/5773 [2:01:24<9:58:07, 8.03s/it] {'loss': 0.5936, 'learning_rate': 1.8063040344453513e-05, 'epoch': 0.23} 23%|██▎ | 1302/5773 [2:01:24<9:58:07, 8.03s/it] {'loss': 0.5936, 'learning_rate': 1.8063040344453513e-05, 'epoch': 0.23} 23%|██▎ | 1302/5773 [2:01:18<9:58:07, 8.03s/it] 23%|██▎ | 1303/5773 [2:01:23<8:57:04, 7.21s/it] 23%|██▎ | 1303/5773 [2:01:29<8:57:04, 7.21s/it] {'loss': 0.581, 'learning_rate': 1.805972016810444e-05, 'epoch': 0.23} 23%|██▎ | 1303/5773 [2:01:29<8:57:04, 7.21s/it] {'loss': 0.581, 'learning_rate': 1.805972016810444e-05, 'epoch': 0.23} 23%|██▎ | 1303/5773 [2:01:23<8:57:04, 7.21s/it] 23%|██▎ | 1304/5773 [2:01:29<8:18:46, 6.70s/it] 23%|██▎ | 1304/5773 [2:01:34<8:18:46, 6.70s/it] {'loss': 0.6131, 'learning_rate': 1.8056397454297953e-05, 'epoch': 0.23} 23%|██▎ | 1304/5773 [2:01:35<8:18:46, 6.70s/it] {'loss': 0.6131, 'learning_rate': 1.8056397454297953e-05, 'epoch': 0.23} 23%|██▎ | 1304/5773 [2:01:29<8:18:46, 6.70s/it] 23%|██▎ | 1305/5773 [2:01:34<7:51:00, 6.33s/it] 23%|██▎ | 1305/5773 [2:01:40<7:51:00, 6.33s/it] {'loss': 0.5882, 'learning_rate': 1.8053072204080142e-05, 'epoch': 0.23} 23%|██▎ | 1305/5773 [2:01:40<7:51:00, 6.33s/it] {'loss': 0.5882, 'learning_rate': 1.8053072204080142e-05, 'epoch': 0.23} 23%|██▎ | 1305/5773 [2:01:34<7:51:00, 6.33s/it] 23%|██▎ | 1306/5773 [2:01:40<7:32:30, 6.08s/it] 23%|██▎ | 1306/5773 [2:01:45<7:32:30, 6.08s/it] {'loss': 0.5726, 'learning_rate': 1.8049744418497913e-05, 'epoch': 0.23} 23%|██▎ | 
1306/5773 [2:01:45<7:32:30, 6.08s/it] {'loss': 0.5726, 'learning_rate': 1.8049744418497913e-05, 'epoch': 0.23} 23%|██▎ | 1306/5773 [2:01:40<7:32:30, 6.08s/it] 23%|██▎ | 1307/5773 [2:01:45<7:20:13, 5.91s/it] 23%|██▎ | 1307/5773 [2:01:51<7:20:13, 5.91s/it] {'loss': 0.5935, 'learning_rate': 1.8046414098598947e-05, 'epoch': 0.23} 23%|██▎ | 1307/5773 [2:01:51<7:20:13, 5.91s/it] {'loss': 0.5935, 'learning_rate': 1.8046414098598947e-05, 'epoch': 0.23} 23%|██▎ | 1307/5773 [2:01:45<7:20:13, 5.91s/it] 23%|██▎ | 1308/5773 [2:01:51<7:08:21, 5.76s/it] 23%|██▎ | 1308/5773 [2:01:56<7:08:21, 5.76s/it] {'loss': 0.5949, 'learning_rate': 1.804308124543175e-05, 'epoch': 0.23} 23%|██▎ | 1308/5773 [2:01:56<7:08:21, 5.76s/it] {'loss': 0.5949, 'learning_rate': 1.804308124543175e-05, 'epoch': 0.23} 23%|██▎ | 1308/5773 [2:01:51<7:08:21, 5.76s/it] 23%|██▎ | 1309/5773 [2:01:56<7:00:39, 5.65s/it] 23%|██▎ | 1309/5773 [2:02:02<7:00:39, 5.65s/it] {'loss': 0.6001, 'learning_rate': 1.8039745860045595e-05, 'epoch': 0.23} 23%|██▎ | 1309/5773 [2:02:02<7:00:39, 5.65s/it] {'loss': 0.6001, 'learning_rate': 1.8039745860045595e-05, 'epoch': 0.23} 23%|██▎ | 1309/5773 [2:01:56<7:00:39, 5.65s/it] 23%|██▎ | 1310/5773 [2:02:02<6:55:39, 5.59s/it] 23%|██▎ | 1310/5773 [2:02:07<6:55:39, 5.59s/it] {'loss': 0.5931, 'learning_rate': 1.803640794349058e-05, 'epoch': 0.23} 23%|██▎ | 1310/5773 [2:02:07<6:55:39, 5.59s/it] {'loss': 0.5931, 'learning_rate': 1.803640794349058e-05, 'epoch': 0.23} 23%|██▎ | 1310/5773 [2:02:02<6:55:39, 5.59s/it] 23%|██▎ | 1311/5773 [2:02:07<6:49:24, 5.51s/it] 23%|██▎ | 1311/5773 [2:02:13<6:49:24, 5.51s/it] {'loss': 0.609, 'learning_rate': 1.803306749681758e-05, 'epoch': 0.23} 23%|██▎ | 1311/5773 [2:02:13<6:49:24, 5.51s/it] {'loss': 0.609, 'learning_rate': 1.803306749681758e-05, 'epoch': 0.23} 23%|██▎ | 1311/5773 [2:02:07<6:49:24, 5.51s/it] 23%|██▎ | 1312/5773 [2:02:13<6:49:48, 5.51s/it] 23%|██▎ | 1312/5773 [2:02:18<6:49:48, 5.51s/it] {'loss': 0.5965, 'learning_rate': 1.8029724521078278e-05, 
'epoch': 0.23} 23%|██▎ | 1312/5773 [2:02:18<6:49:48, 5.51s/it] {'loss': 0.5965, 'learning_rate': 1.8029724521078278e-05, 'epoch': 0.23} 23%|██▎ | 1312/5773 [2:02:13<6:49:48, 5.51s/it] 23%|██▎ | 1313/5773 [2:02:18<6:47:51, 5.49s/it] 23%|██▎ | 1313/5773 [2:02:23<6:47:51, 5.49s/it] {'loss': 0.5904, 'learning_rate': 1.8026379017325147e-05, 'epoch': 0.23} 23%|██▎ | 1313/5773 [2:02:23<6:47:51, 5.49s/it] {'loss': 0.5904, 'learning_rate': 1.8026379017325147e-05, 'epoch': 0.23} 23%|██▎ | 1313/5773 [2:02:18<6:47:51, 5.49s/it] 23%|██▎ | 1314/5773 [2:02:23<6:42:51, 5.42s/it] 23%|██▎ | 1314/5773 [2:02:29<6:42:51, 5.42s/it] {'loss': 0.5799, 'learning_rate': 1.8023030986611463e-05, 'epoch': 0.23} 23%|██▎ | 1314/5773 [2:02:29<6:42:51, 5.42s/it] {'loss': 0.5799, 'learning_rate': 1.8023030986611463e-05, 'epoch': 0.23} 23%|██▎ | 1314/5773 [2:02:23<6:42:51, 5.42s/it] 23%|██▎ | 1315/5773 [2:02:29<6:43:09, 5.43s/it] 23%|██▎ | 1315/5773 [2:02:34<6:43:09, 5.43s/it] {'loss': 0.5848, 'learning_rate': 1.8019680429991293e-05, 'epoch': 0.23} 23%|██▎ | 1315/5773 [2:02:34<6:43:09, 5.43s/it] {'loss': 0.5848, 'learning_rate': 1.8019680429991293e-05, 'epoch': 0.23} 23%|██▎ | 1315/5773 [2:02:29<6:43:09, 5.43s/it] 23%|██▎ | 1316/5773 [2:02:34<6:41:25, 5.40s/it] 23%|██▎ | 1316/5773 [2:02:40<6:41:26, 5.40s/it] {'loss': 0.6127, 'learning_rate': 1.8016327348519495e-05, 'epoch': 0.23} 23%|██▎ | 1316/5773 [2:02:40<6:41:26, 5.40s/it] {'loss': 0.6127, 'learning_rate': 1.8016327348519495e-05, 'epoch': 0.23} 23%|██▎ | 1316/5773 [2:02:34<6:41:25, 5.40s/it] 23%|██▎ | 1317/5773 [2:02:39<6:40:58, 5.40s/it] 23%|██▎ | 1317/5773 [2:02:45<6:40:58, 5.40s/it] {'loss': 0.5885, 'learning_rate': 1.8012971743251722e-05, 'epoch': 0.23} 23%|██▎ | 1317/5773 [2:02:45<6:40:58, 5.40s/it] {'loss': 0.5885, 'learning_rate': 1.8012971743251722e-05, 'epoch': 0.23} 23%|██▎ | 1317/5773 [2:02:39<6:40:58, 5.40s/it] 23%|██▎ | 1318/5773 [2:02:45<6:41:12, 5.40s/it] 23%|██▎ | 1318/5773 [2:02:50<6:41:12, 5.40s/it] {'loss': 0.5973, 
'learning_rate': 1.8009613615244438e-05, 'epoch': 0.23} 23%|██▎ | 1318/5773 [2:02:50<6:41:12, 5.40s/it] {'loss': 0.5973, 'learning_rate': 1.8009613615244438e-05, 'epoch': 0.23} 23%|██▎ | 1318/5773 [2:02:45<6:41:12, 5.40s/it] 23%|██▎ | 1319/5773 [2:02:50<6:41:36, 5.41s/it] 23%|██▎ | 1319/5773 [2:02:56<6:41:36, 5.41s/it] {'loss': 0.5937, 'learning_rate': 1.800625296555488e-05, 'epoch': 0.23} 23%|██▎ | 1319/5773 [2:02:56<6:41:36, 5.41s/it] {'loss': 0.5937, 'learning_rate': 1.800625296555488e-05, 'epoch': 0.23} 23%|██▎ | 1319/5773 [2:02:50<6:41:36, 5.41s/it] 23%|██▎ | 1320/5773 [2:02:56<6:47:49, 5.50s/it] 23%|██▎ | 1320/5773 [2:03:01<6:47:49, 5.50s/it] {'loss': 0.6015, 'learning_rate': 1.8002889795241087e-05, 'epoch': 0.23} 23%|██▎ | 1320/5773 [2:03:01<6:47:49, 5.50s/it] {'loss': 0.6015, 'learning_rate': 1.8002889795241087e-05, 'epoch': 0.23} 23%|██▎ | 1320/5773 [2:02:56<6:47:49, 5.50s/it] 23%|██▎ | 1321/5773 [2:03:01<6:46:46, 5.48s/it] 23%|██▎ | 1321/5773 [2:03:07<6:46:46, 5.48s/it] {'loss': 0.5598, 'learning_rate': 1.79995241053619e-05, 'epoch': 0.23} 23%|██▎ | 1321/5773 [2:03:07<6:46:46, 5.48s/it] {'loss': 0.5598, 'learning_rate': 1.79995241053619e-05, 'epoch': 0.23} 23%|██▎ | 1321/5773 [2:03:01<6:46:46, 5.48s/it] 23%|██▎ | 1322/5773 [2:03:07<6:47:52, 5.50s/it] 23%|██▎ | 1322/5773 [2:03:12<6:47:52, 5.50s/it] {'loss': 0.5915, 'learning_rate': 1.799615589697694e-05, 'epoch': 0.23} 23%|██▎ | 1322/5773 [2:03:12<6:47:52, 5.50s/it] {'loss': 0.5915, 'learning_rate': 1.799615589697694e-05, 'epoch': 0.23} 23%|██▎ | 1322/5773 [2:03:07<6:47:52, 5.50s/it] 23%|██▎ | 1323/5773 [2:03:12<6:46:17, 5.48s/it] 23%|██▎ | 1323/5773 [2:03:18<6:46:17, 5.48s/it] {'loss': 0.5791, 'learning_rate': 1.7992785171146633e-05, 'epoch': 0.23} 23%|██▎ | 1323/5773 [2:03:18<6:46:17, 5.48s/it] {'loss': 0.5791, 'learning_rate': 1.7992785171146633e-05, 'epoch': 0.23} 23%|██▎ | 1323/5773 [2:03:12<6:46:17, 5.48s/it] 23%|██▎ | 1324/5773 [2:03:18<6:42:28, 5.43s/it] 23%|██▎ | 1324/5773 [2:03:23<6:42:28, 
5.43s/it] {'loss': 0.583, 'learning_rate': 1.798941192893218e-05, 'epoch': 0.23} 23%|██▎ | 1324/5773 [2:03:23<6:42:28, 5.43s/it] {'loss': 0.583, 'learning_rate': 1.798941192893218e-05, 'epoch': 0.23} 23%|██▎ | 1324/5773 [2:03:18<6:42:28, 5.43s/it] 23%|██▎ | 1325/5773 [2:03:23<6:41:48, 5.42s/it] 23%|██▎ | 1325/5773 [2:03:29<6:41:48, 5.42s/it] {'loss': 0.5892, 'learning_rate': 1.7986036171395594e-05, 'epoch': 0.23} 23%|██▎ | 1325/5773 [2:03:29<6:41:48, 5.42s/it] {'loss': 0.5892, 'learning_rate': 1.7986036171395594e-05, 'epoch': 0.23} 23%|██▎ | 1325/5773 [2:03:23<6:41:48, 5.42s/it] 23%|██▎ | 1326/5773 [2:03:29<6:41:59, 5.42s/it] 23%|██▎ | 1326/5773 [2:03:34<6:41:59, 5.42s/it] {'loss': 0.6005, 'learning_rate': 1.7982657899599672e-05, 'epoch': 0.23} 23%|██▎ | 1326/5773 [2:03:34<6:41:59, 5.42s/it] {'loss': 0.6005, 'learning_rate': 1.7982657899599672e-05, 'epoch': 0.23} 23%|██▎ | 1326/5773 [2:03:29<6:41:59, 5.42s/it] 23%|██▎ | 1327/5773 [2:03:34<6:39:46, 5.40s/it] 23%|██▎ | 1327/5773 [2:03:39<6:39:46, 5.39s/it] {'loss': 0.5904, 'learning_rate': 1.7979277114607996e-05, 'epoch': 0.23} 23%|██▎ | 1327/5773 [2:03:39<6:39:46, 5.39s/it] {'loss': 0.5904, 'learning_rate': 1.7979277114607996e-05, 'epoch': 0.23} 23%|██▎ | 1327/5773 [2:03:34<6:39:46, 5.40s/it] 23%|██▎ | 1328/5773 [2:03:39<6:39:42, 5.40s/it] 23%|██▎ | 1328/5773 [2:03:45<6:39:42, 5.40s/it] {'loss': 0.5991, 'learning_rate': 1.7975893817484948e-05, 'epoch': 0.23} 23%|██▎ | 1328/5773 [2:03:45<6:39:42, 5.40s/it] {'loss': 0.5991, 'learning_rate': 1.7975893817484948e-05, 'epoch': 0.23} 23%|██▎ | 1328/5773 [2:03:39<6:39:42, 5.40s/it] 23%|██▎ | 1329/5773 [2:03:45<6:41:03, 5.41s/it] 23%|██▎ | 1329/5773 [2:03:50<6:41:03, 5.41s/it] {'loss': 0.5903, 'learning_rate': 1.79725080092957e-05, 'epoch': 0.23} 23%|██▎ | 1329/5773 [2:03:50<6:41:03, 5.41s/it] {'loss': 0.5903, 'learning_rate': 1.79725080092957e-05, 'epoch': 0.23} 23%|██▎ | 1329/5773 [2:03:45<6:41:03, 5.41s/it] 23%|██▎ | 1330/5773 [2:03:50<6:43:16, 5.45s/it] 23%|██▎ | 
1330/5773 [2:03:56<6:43:16, 5.45s/it] {'loss': 0.6104, 'learning_rate': 1.796911969110621e-05, 'epoch': 0.23} 23%|██▎ | 1330/5773 [2:03:56<6:43:16, 5.45s/it] {'loss': 0.6104, 'learning_rate': 1.796911969110621e-05, 'epoch': 0.23} 23%|██▎ | 1330/5773 [2:03:50<6:43:16, 5.45s/it] 23%|██▎ | 1331/5773 [2:03:56<6:41:27, 5.42s/it] 23%|██▎ | 1331/5773 [2:04:01<6:41:27, 5.42s/it] {'loss': 0.5892, 'learning_rate': 1.7965728863983228e-05, 'epoch': 0.23} 23%|██▎ | 1331/5773 [2:04:01<6:41:27, 5.42s/it] {'loss': 0.5892, 'learning_rate': 1.7965728863983228e-05, 'epoch': 0.23} 23%|██▎ | 1331/5773 [2:03:56<6:41:27, 5.42s/it] 23%|██▎ | 1332/5773 [2:04:01<6:42:27, 5.44s/it] 23%|██▎ | 1332/5773 [2:04:07<6:42:26, 5.44s/it] {'loss': 0.5903, 'learning_rate': 1.7962335528994296e-05, 'epoch': 0.23} 23%|██▎ | 1332/5773 [2:04:01<6:42:27, 5.44s/it] {'loss': 0.5903, 'learning_rate': 1.7962335528994296e-05, 'epoch': 0.23} 23%|██▎ | 1332/5773 [2:04:07<6:42:26, 5.44s/it] 23%|██▎ | 1333/5773 [2:04:06<6:41:05, 5.42s/it] 23%|██▎ | 1333/5773 [2:04:12<6:41:05, 5.42s/it] {'loss': 0.5998, 'learning_rate': 1.795893968720775e-05, 'epoch': 0.23} 23%|██▎ | 1333/5773 [2:04:12<6:41:05, 5.42s/it] {'loss': 0.5998, 'learning_rate': 1.795893968720775e-05, 'epoch': 0.23} 23%|██▎ | 1333/5773 [2:04:06<6:41:05, 5.42s/it] 23%|██▎ | 1334/5773 [2:04:12<6:43:38, 5.46s/it] 23%|██▎ | 1334/5773 [2:04:18<6:43:38, 5.46s/it] {'loss': 0.5871, 'learning_rate': 1.7955541339692697e-05, 'epoch': 0.23} 23%|██▎ | 1334/5773 [2:04:18<6:43:38, 5.46s/it] {'loss': 0.5871, 'learning_rate': 1.7955541339692697e-05, 'epoch': 0.23} 23%|██▎ | 1334/5773 [2:04:12<6:43:38, 5.46s/it] 23%|██▎ | 1335/5773 [2:04:17<6:42:14, 5.44s/it] 23%|██▎ | 1335/5773 [2:04:23<6:42:14, 5.44s/it] {'loss': 0.581, 'learning_rate': 1.7952140487519055e-05, 'epoch': 0.23} 23%|██▎ | 1335/5773 [2:04:23<6:42:14, 5.44s/it] {'loss': 0.581, 'learning_rate': 1.7952140487519055e-05, 'epoch': 0.23} 23%|██▎ | 1335/5773 [2:04:17<6:42:14, 5.44s/it] 23%|██▎ | 1336/5773 
[2:04:23<6:40:15, 5.41s/it] 23%|██▎ | 1336/5773 [2:04:28<6:40:15, 5.41s/it] {'loss': 0.5871, 'learning_rate': 1.7948737131757518e-05, 'epoch': 0.23} 23%|██▎ | 1336/5773 [2:04:28<6:40:15, 5.41s/it] {'loss': 0.5871, 'learning_rate': 1.7948737131757518e-05, 'epoch': 0.23} 23%|██▎ | 1336/5773 [2:04:23<6:40:15, 5.41s/it] 23%|██▎ | 1337/5773 [2:04:28<6:45:22, 5.48s/it] 23%|██▎ | 1337/5773 [2:04:34<6:45:22, 5.48s/it] {'loss': 0.5933, 'learning_rate': 1.7945331273479577e-05, 'epoch': 0.23} 23%|██▎ | 1337/5773 [2:04:34<6:45:22, 5.48s/it] {'loss': 0.5933, 'learning_rate': 1.7945331273479577e-05, 'epoch': 0.23} 23%|██▎ | 1337/5773 [2:04:28<6:45:22, 5.48s/it] 23%|██▎ | 1338/5773 [2:04:34<6:42:42, 5.45s/it] 23%|██▎ | 1338/5773 [2:04:39<6:42:42, 5.45s/it] {'loss': 0.5806, 'learning_rate': 1.7941922913757492e-05, 'epoch': 0.23} 23%|██▎ | 1338/5773 [2:04:39<6:42:42, 5.45s/it] {'loss': 0.5806, 'learning_rate': 1.7941922913757492e-05, 'epoch': 0.23} 23%|██▎ | 1338/5773 [2:04:34<6:42:42, 5.45s/it] 23%|██▎ | 1339/5773 [2:04:39<6:39:51, 5.41s/it] 23%|██▎ | 1339/5773 [2:04:45<6:39:51, 5.41s/it] {'loss': 0.605, 'learning_rate': 1.7938512053664335e-05, 'epoch': 0.23} 23%|██▎ | 1339/5773 [2:04:45<6:39:51, 5.41s/it] {'loss': 0.605, 'learning_rate': 1.7938512053664335e-05, 'epoch': 0.23} 23%|██▎ | 1339/5773 [2:04:39<6:39:51, 5.41s/it] 23%|██▎ | 1340/5773 [2:04:45<6:43:13, 5.46s/it] 23%|██▎ | 1340/5773 [2:04:50<6:43:13, 5.46s/it] {'loss': 0.5984, 'learning_rate': 1.793509869427395e-05, 'epoch': 0.23} 23%|██▎ | 1340/5773 [2:04:50<6:43:13, 5.46s/it] {'loss': 0.5984, 'learning_rate': 1.793509869427395e-05, 'epoch': 0.23} 23%|██▎ | 1340/5773 [2:04:45<6:43:13, 5.46s/it] 23%|██▎ | 1341/5773 [2:04:50<6:42:36, 5.45s/it] 23%|██▎ | 1341/5773 [2:04:56<6:42:37, 5.45s/it] {'loss': 0.5971, 'learning_rate': 1.793168283666097e-05, 'epoch': 0.23} 23%|██▎ | 1341/5773 [2:04:56<6:42:37, 5.45s/it] {'loss': 0.5971, 'learning_rate': 1.793168283666097e-05, 'epoch': 0.23} 23%|██▎ | 1341/5773 [2:04:50<6:42:36, 
5.45s/it] 23%|██▎ | 1342/5773 [2:04:55<6:41:26, 5.44s/it] 23%|██▎ | 1342/5773 [2:05:01<6:41:26, 5.44s/it] {'loss': 0.5884, 'learning_rate': 1.7928264481900815e-05, 'epoch': 0.23} 23%|██▎ | 1342/5773 [2:05:01<6:41:26, 5.44s/it] {'loss': 0.5884, 'learning_rate': 1.7928264481900815e-05, 'epoch': 0.23} 23%|██▎ | 1342/5773 [2:04:55<6:41:26, 5.44s/it] 23%|██▎ | 1343/5773 [2:05:01<6:45:25, 5.49s/it] 23%|██▎ | 1343/5773 [2:05:07<6:45:25, 5.49s/it] {'loss': 0.5931, 'learning_rate': 1.79248436310697e-05, 'epoch': 0.23} 23%|██▎ | 1343/5773 [2:05:07<6:45:25, 5.49s/it] {'loss': 0.5931, 'learning_rate': 1.79248436310697e-05, 'epoch': 0.23} 23%|██▎ | 1343/5773 [2:05:01<6:45:25, 5.49s/it] 23%|██▎ | 1344/5773 [2:05:07<6:45:35, 5.49s/it] 23%|██▎ | 1344/5773 [2:05:12<6:45:35, 5.49s/it] {'loss': 0.607, 'learning_rate': 1.7921420285244612e-05, 'epoch': 0.23} 23%|██▎ | 1344/5773 [2:05:12<6:45:35, 5.49s/it] {'loss': 0.607, 'learning_rate': 1.7921420285244612e-05, 'epoch': 0.23} 23%|██▎ | 1344/5773 [2:05:07<6:45:35, 5.49s/it] 23%|██▎ | 1345/5773 [2:05:12<6:45:15, 5.49s/it] 23%|██▎ | 1345/5773 [2:05:18<6:45:15, 5.49s/it] {'loss': 0.5794, 'learning_rate': 1.791799444550333e-05, 'epoch': 0.23} 23%|██▎ | 1345/5773 [2:05:18<6:45:15, 5.49s/it] {'loss': 0.5794, 'learning_rate': 1.791799444550333e-05, 'epoch': 0.23} 23%|██▎ | 1345/5773 [2:05:12<6:45:15, 5.49s/it] 23%|██▎ | 1346/5773 [2:05:18<6:44:33, 5.48s/it] 23%|██▎ | 1346/5773 [2:05:23<6:44:32, 5.48s/it] {'loss': 0.5814, 'learning_rate': 1.7914566112924413e-05, 'epoch': 0.23} 23%|██▎ | 1346/5773 [2:05:23<6:44:32, 5.48s/it] {'loss': 0.5814, 'learning_rate': 1.7914566112924413e-05, 'epoch': 0.23} 23%|██▎ | 1346/5773 [2:05:18<6:44:33, 5.48s/it] 23%|██▎ | 1347/5773 [2:05:23<6:44:13, 5.48s/it] 23%|██▎ | 1347/5773 [2:05:29<6:44:13, 5.48s/it] {'loss': 0.594, 'learning_rate': 1.791113528858722e-05, 'epoch': 0.23} 23%|██▎ | 1347/5773 [2:05:29<6:44:13, 5.48s/it] {'loss': 0.594, 'learning_rate': 1.791113528858722e-05, 'epoch': 0.23} 23%|██▎ | 1347/5773 
[2:05:23<6:44:13, 5.48s/it] 23%|██▎ | 1348/5773 [2:05:28<6:43:36, 5.47s/it] 23%|██▎ | 1348/5773 [2:05:34<6:43:36, 5.47s/it] {'loss': 0.5743, 'learning_rate': 1.7907701973571878e-05, 'epoch': 0.23} 23%|██▎ | 1348/5773 [2:05:34<6:43:36, 5.47s/it] {'loss': 0.5743, 'learning_rate': 1.7907701973571878e-05, 'epoch': 0.23} 23%|██▎ | 1348/5773 [2:05:28<6:43:36, 5.47s/it] 23%|██▎ | 1349/5773 [2:05:34<6:43:31, 5.47s/it] 23%|██▎ | 1349/5773 [2:05:39<6:43:31, 5.47s/it] {'loss': 0.5905, 'learning_rate': 1.7904266168959306e-05, 'epoch': 0.23} 23%|██▎ | 1349/5773 [2:05:39<6:43:31, 5.47s/it] {'loss': 0.5905, 'learning_rate': 1.7904266168959306e-05, 'epoch': 0.23} 23%|██▎ | 1349/5773 [2:05:34<6:43:31, 5.47s/it]7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 23%|██▎ | 1350/5773 [2:05:39<6:41:51, 5.45s/it]10 AutoResumeHook: Checking whether to suspend... 1283 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 23%|██▎ | 1350/5773 [2:05:45<6:41:51, 5.45s/it]11 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6062, 'learning_rate': 1.7900827875831207e-05, 'epoch': 0.23} 23%|██▎ | 1350/5773 [2:05:45<6:41:51, 5.45s/it] {'loss': 0.6062, 'learning_rate': 1.7900827875831207e-05, 'epoch': 0.23} 23%|██▎ | 1350/5773 [2:05:39<6:41:51, 5.45s/it] 23%|██▎ | 1351/5773 [2:05:45<6:45:43, 5.51s/it] 23%|██▎ | 1351/5773 [2:05:50<6:45:43, 5.51s/it] {'loss': 0.5832, 'learning_rate': 1.7897387095270058e-05, 'epoch': 0.23} 23%|██▎ | 1351/5773 [2:05:50<6:45:43, 5.51s/it] {'loss': 0.5832, 'learning_rate': 1.7897387095270058e-05, 'epoch': 0.23} 23%|██▎ | 1351/5773 [2:05:45<6:45:43, 5.51s/it] 23%|██▎ | 1352/5773 [2:05:51<6:47:38, 5.53s/it] 23%|██▎ | 1352/5773 [2:05:56<6:47:38, 5.53s/it] {'loss': 0.5914, 'learning_rate': 1.7893943828359136e-05, 'epoch': 0.23} 23%|██▎ | 1352/5773 [2:05:56<6:47:38, 5.53s/it] {'loss': 0.5914, 'learning_rate': 1.7893943828359136e-05, 'epoch': 0.23} 23%|██▎ | 1352/5773 [2:05:51<6:47:38, 5.53s/it] 23%|██▎ | 1353/5773 [2:05:56<6:48:14, 5.54s/it] 23%|██▎ | 1353/5773 [2:06:02<6:48:14, 5.54s/it] {'loss': 0.5993, 'learning_rate': 1.7890498076182484e-05, 'epoch': 0.23} 23%|██▎ | 1353/5773 [2:06:02<6:48:14, 5.54s/it] {'loss': 0.5993, 'learning_rate': 1.7890498076182484e-05, 'epoch': 0.23} 23%|██▎ | 1353/5773 [2:05:56<6:48:14, 5.54s/it] 23%|██▎ | 1354/5773 [2:06:01<6:43:26, 5.48s/it] 23%|██▎ | 1354/5773 [2:06:07<6:43:26, 5.48s/it] {'loss': 0.5923, 'learning_rate': 1.7887049839824938e-05, 'epoch': 0.23} 23%|██▎ | 1354/5773 [2:06:07<6:43:26, 5.48s/it] {'loss': 0.5923, 'learning_rate': 1.7887049839824938e-05, 'epoch': 0.23} 23%|██▎ | 1354/5773 [2:06:01<6:43:26, 5.48s/it] 23%|██▎ | 1355/5773 [2:06:07<6:43:34, 5.48s/it] 23%|██▎ | 1355/5773 [2:06:12<6:43:34, 5.48s/it] {'loss': 0.5877, 'learning_rate': 1.7883599120372116e-05, 'epoch': 0.23} 23%|██▎ | 1355/5773 [2:06:12<6:43:34, 5.48s/it] {'loss': 0.5877, 'learning_rate': 1.7883599120372116e-05, 'epoch': 0.23} 23%|██▎ | 1355/5773 [2:06:07<6:43:34, 5.48s/it] 23%|██▎ | 1356/5773 [2:06:12<6:40:40, 5.44s/it] 23%|██▎ | 1356/5773 
[2:06:18<6:40:40, 5.44s/it] {'loss': 0.5814, 'learning_rate': 1.7880145918910406e-05, 'epoch': 0.23} 23%|██▎ | 1356/5773 [2:06:18<6:40:40, 5.44s/it] {'loss': 0.5814, 'learning_rate': 1.7880145918910406e-05, 'epoch': 0.23} 23%|██▎ | 1356/5773 [2:06:12<6:40:40, 5.44s/it] 24%|██▎ | 1357/5773 [2:06:18<6:39:51, 5.43s/it] 24%|██▎ | 1357/5773 [2:06:23<6:39:50, 5.43s/it] {'loss': 0.5912, 'learning_rate': 1.7876690236526995e-05, 'epoch': 0.24} 24%|██▎ | 1357/5773 [2:06:23<6:39:50, 5.43s/it] {'loss': 0.5912, 'learning_rate': 1.7876690236526995e-05, 'epoch': 0.24} 24%|██▎ | 1357/5773 [2:06:18<6:39:51, 5.43s/it] 24%|██▎ | 1358/5773 [2:06:23<6:38:24, 5.41s/it] 24%|██▎ | 1358/5773 [2:06:29<6:38:24, 5.41s/it] {'loss': 0.6054, 'learning_rate': 1.787323207430984e-05, 'epoch': 0.24} 24%|██▎ | 1358/5773 [2:06:29<6:38:24, 5.41s/it] {'loss': 0.6054, 'learning_rate': 1.787323207430984e-05, 'epoch': 0.24} 24%|██▎ | 1358/5773 [2:06:23<6:38:24, 5.41s/it] 24%|██▎ | 1359/5773 [2:06:29<6:38:52, 5.42s/it] 24%|██▎ | 1359/5773 [2:06:34<6:38:52, 5.42s/it] {'loss': 0.5996, 'learning_rate': 1.786977143334768e-05, 'epoch': 0.24} 24%|██▎ | 1359/5773 [2:06:34<6:38:52, 5.42s/it] {'loss': 0.5996, 'learning_rate': 1.786977143334768e-05, 'epoch': 0.24} 24%|██▎ | 1359/5773 [2:06:29<6:38:52, 5.42s/it] 24%|██▎ | 1360/5773 [2:06:34<6:37:38, 5.41s/it] 24%|██▎ | 1360/5773 [2:06:39<6:37:38, 5.41s/it] {'loss': 0.6031, 'learning_rate': 1.786630831473003e-05, 'epoch': 0.24} 24%|██▎ | 1360/5773 [2:06:39<6:37:38, 5.41s/it] {'loss': 0.6031, 'learning_rate': 1.786630831473003e-05, 'epoch': 0.24} 24%|██▎ | 1360/5773 [2:06:34<6:37:38, 5.41s/it] 24%|██▎ | 1361/5773 [2:06:39<6:40:27, 5.45s/it] 24%|██▎ | 1361/5773 [2:06:45<6:40:27, 5.45s/it] {'loss': 0.5843, 'learning_rate': 1.7862842719547204e-05, 'epoch': 0.24} {'loss': 0.5843, 'learning_rate': 1.7862842719547204e-05, 'epoch': 0.24} 24%|██▎ | 1361/5773 [2:06:45<6:40:27, 5.45s/it] 24%|██▎ | 1361/5773 [2:06:39<6:40:27, 5.45s/it] 24%|██▎ | 1362/5773 [2:06:45<6:40:49, 
5.45s/it] 24%|██▎ | 1362/5773 [2:06:50<6:40:49, 5.45s/it] {'loss': 0.5846, 'learning_rate': 1.785937464889027e-05, 'epoch': 0.24} 24%|██▎ | 1362/5773 [2:06:50<6:40:49, 5.45s/it] {'loss': 0.5846, 'learning_rate': 1.785937464889027e-05, 'epoch': 0.24} 24%|██▎ | 1362/5773 [2:06:45<6:40:49, 5.45s/it] 24%|██▎ | 1363/5773 [2:06:50<6:40:29, 5.45s/it] 24%|██▎ | 1363/5773 [2:06:56<6:40:30, 5.45s/it] {'loss': 0.5867, 'learning_rate': 1.785590410385109e-05, 'epoch': 0.24} 24%|██▎ | 1363/5773 [2:06:56<6:40:30, 5.45s/it] {'loss': 0.5867, 'learning_rate': 1.785590410385109e-05, 'epoch': 0.24} 24%|██▎ | 1363/5773 [2:06:50<6:40:29, 5.45s/it] 24%|██▎ | 1364/5773 [2:06:56<6:42:52, 5.48s/it] 24%|██▎ | 1364/5773 [2:07:01<6:42:51, 5.48s/it] {'loss': 0.5924, 'learning_rate': 1.7852431085522305e-05, 'epoch': 0.24} 24%|██▎ | 1364/5773 [2:07:01<6:42:51, 5.48s/it] {'loss': 0.5924, 'learning_rate': 1.7852431085522305e-05, 'epoch': 0.24} 24%|██▎ | 1364/5773 [2:06:56<6:42:52, 5.48s/it] 24%|██▎ | 1365/5773 [2:07:01<6:42:01, 5.47s/it] 24%|██▎ | 1365/5773 [2:07:07<6:42:01, 5.47s/it] {'loss': 0.5871, 'learning_rate': 1.784895559499733e-05, 'epoch': 0.24} 24%|██▎ | 1365/5773 [2:07:07<6:42:01, 5.47s/it] {'loss': 0.5871, 'learning_rate': 1.784895559499733e-05, 'epoch': 0.24} 24%|██▎ | 1365/5773 [2:07:01<6:42:01, 5.47s/it] 24%|██▎ | 1366/5773 [2:07:07<6:41:05, 5.46s/it] 24%|██▎ | 1366/5773 [2:07:12<6:41:05, 5.46s/it] {'loss': 0.5959, 'learning_rate': 1.7845477633370364e-05, 'epoch': 0.24} 24%|██▎ | 1366/5773 [2:07:12<6:41:05, 5.46s/it] {'loss': 0.5959, 'learning_rate': 1.7845477633370364e-05, 'epoch': 0.24} 24%|██▎ | 1366/5773 [2:07:07<6:41:05, 5.46s/it] 24%|██▎ | 1367/5773 [2:07:12<6:38:46, 5.43s/it] 24%|██▎ | 1367/5773 [2:07:18<6:38:46, 5.43s/it] {'loss': 0.5971, 'learning_rate': 1.7841997201736375e-05, 'epoch': 0.24} 24%|██▎ | 1367/5773 [2:07:18<6:38:46, 5.43s/it] {'loss': 0.5971, 'learning_rate': 1.7841997201736375e-05, 'epoch': 0.24} 24%|██▎ | 1367/5773 [2:07:12<6:38:46, 5.43s/it] 24%|██▎ | 
1368/5773 [2:07:17<6:35:11, 5.38s/it] 24%|██▎ | 1368/5773 [2:07:23<6:35:11, 5.38s/it] {'loss': 0.5858, 'learning_rate': 1.7838514301191117e-05, 'epoch': 0.24} 24%|██▎ | 1368/5773 [2:07:23<6:35:11, 5.38s/it] {'loss': 0.5858, 'learning_rate': 1.7838514301191117e-05, 'epoch': 0.24} 24%|██▎ | 1368/5773 [2:07:17<6:35:11, 5.38s/it] 24%|██▎ | 1369/5773 [2:07:23<6:34:50, 5.38s/it] 24%|██▎ | 1369/5773 [2:07:28<6:34:50, 5.38s/it] {'loss': 0.5746, 'learning_rate': 1.783502893283112e-05, 'epoch': 0.24} 24%|██▎ | 1369/5773 [2:07:28<6:34:50, 5.38s/it] {'loss': 0.5746, 'learning_rate': 1.783502893283112e-05, 'epoch': 0.24} 24%|██▎ | 1369/5773 [2:07:23<6:34:50, 5.38s/it] 24%|██▎ | 1370/5773 [2:07:28<6:35:32, 5.39s/it] 24%|██▎ | 1370/5773 [2:07:34<6:35:32, 5.39s/it] {'loss': 0.5937, 'learning_rate': 1.7831541097753683e-05, 'epoch': 0.24} 24%|██▎ | 1370/5773 [2:07:34<6:35:32, 5.39s/it] {'loss': 0.5937, 'learning_rate': 1.7831541097753683e-05, 'epoch': 0.24} 24%|██▎ | 1370/5773 [2:07:28<6:35:32, 5.39s/it] 24%|██▎ | 1371/5773 [2:07:34<6:34:56, 5.38s/it] 24%|██▎ | 1371/5773 [2:07:39<6:34:56, 5.38s/it] {'loss': 0.6087, 'learning_rate': 1.78280507970569e-05, 'epoch': 0.24} 24%|██▎ | 1371/5773 [2:07:39<6:34:56, 5.38s/it] {'loss': 0.6087, 'learning_rate': 1.78280507970569e-05, 'epoch': 0.24} 24%|██▎ | 1371/5773 [2:07:34<6:34:56, 5.38s/it] 24%|██▍ | 1372/5773 [2:07:39<6:35:23, 5.39s/it] 24%|██▍ | 1372/5773 [2:07:44<6:35:23, 5.39s/it] {'loss': 0.5908, 'learning_rate': 1.7824558031839613e-05, 'epoch': 0.24} 24%|██▍ | 1372/5773 [2:07:44<6:35:23, 5.39s/it] {'loss': 0.5908, 'learning_rate': 1.7824558031839613e-05, 'epoch': 0.24} 24%|██▍ | 1372/5773 [2:07:39<6:35:23, 5.39s/it] 24%|██▍ | 1373/5773 [2:07:44<6:37:40, 5.42s/it] 24%|██▍ | 1373/5773 [2:07:50<6:37:42, 5.42s/it] {'loss': 0.5969, 'learning_rate': 1.782106280320147e-05, 'epoch': 0.24} 24%|██▍ | 1373/5773 [2:07:50<6:37:42, 5.42s/it] {'loss': 0.5969, 'learning_rate': 1.782106280320147e-05, 'epoch': 0.24} 24%|██▍ | 1373/5773 [2:07:44<6:37:40, 
5.42s/it] 24%|██▍ | 1374/5773 [2:07:50<6:35:02, 5.39s/it] 24%|██▍ | 1374/5773 [2:07:55<6:35:01, 5.39s/it] {'loss': 0.5888, 'learning_rate': 1.781756511224287e-05, 'epoch': 0.24} 24%|██▍ | 1374/5773 [2:07:55<6:35:01, 5.39s/it] {'loss': 0.5888, 'learning_rate': 1.781756511224287e-05, 'epoch': 0.24} 24%|██▍ | 1374/5773 [2:07:50<6:35:02, 5.39s/it] 24%|██▍ | 1375/5773 [2:08:01<6:39:54, 5.46s/it] 24%|██▍ | 1375/5773 [2:07:55<6:39:55, 5.46s/it] {'loss': 0.5915, 'learning_rate': 1.781406496006501e-05, 'epoch': 0.24} 24%|██▍ | 1375/5773 [2:08:01<6:39:54, 5.46s/it] {'loss': 0.5915, 'learning_rate': 1.781406496006501e-05, 'epoch': 0.24} 24%|██▍ | 1375/5773 [2:07:55<6:39:55, 5.46s/it] 24%|██▍ | 1376/5773 [2:08:01<6:37:49, 5.43s/it] 24%|██▍ | 1376/5773 [2:08:06<6:37:48, 5.43s/it] {'loss': 0.603, 'learning_rate': 1.7810562347769842e-05, 'epoch': 0.24} 24%|██▍ | 1376/5773 [2:08:06<6:37:48, 5.43s/it] {'loss': 0.603, 'learning_rate': 1.7810562347769842e-05, 'epoch': 0.24} 24%|██▍ | 1376/5773 [2:08:01<6:37:49, 5.43s/it] 24%|██▍ | 1377/5773 [2:08:06<6:35:41, 5.40s/it] 24%|██▍ | 1377/5773 [2:08:12<6:35:41, 5.40s/it] {'loss': 0.5859, 'learning_rate': 1.78070572764601e-05, 'epoch': 0.24} 24%|██▍ | 1377/5773 [2:08:12<6:35:41, 5.40s/it] {'loss': 0.5859, 'learning_rate': 1.78070572764601e-05, 'epoch': 0.24} 24%|██▍ | 1377/5773 [2:08:06<6:35:41, 5.40s/it] 24%|██▍ | 1378/5773 [2:08:12<6:37:10, 5.42s/it] 24%|██▍ | 1378/5773 [2:08:17<6:37:09, 5.42s/it] {'loss': 0.5835, 'learning_rate': 1.780354974723929e-05, 'epoch': 0.24} 24%|██▍ | 1378/5773 [2:08:17<6:37:09, 5.42s/it] {'loss': 0.5835, 'learning_rate': 1.780354974723929e-05, 'epoch': 0.24} 24%|██▍ | 1378/5773 [2:08:12<6:37:10, 5.42s/it] 24%|██▍ | 1379/5773 [2:08:17<6:43:13, 5.51s/it] 24%|██▍ | 1379/5773 [2:08:23<6:43:12, 5.51s/it] {'loss': 0.5805, 'learning_rate': 1.78000397612117e-05, 'epoch': 0.24} 24%|██▍ | 1379/5773 [2:08:23<6:43:12, 5.51s/it] {'loss': 0.5805, 'learning_rate': 1.78000397612117e-05, 'epoch': 0.24} 24%|██▍ | 1379/5773 
[2:08:17<6:43:13, 5.51s/it] 24%|██▍ | 1380/5773 [2:08:23<6:40:01, 5.46s/it] 24%|██▍ | 1380/5773 [2:08:28<6:40:01, 5.46s/it] {'loss': 0.596, 'learning_rate': 1.7796527319482385e-05, 'epoch': 0.24} 24%|██▍ | 1380/5773 [2:08:28<6:40:01, 5.46s/it] {'loss': 0.596, 'learning_rate': 1.7796527319482385e-05, 'epoch': 0.24} 24%|██▍ | 1380/5773 [2:08:23<6:40:01, 5.46s/it] 24%|██▍ | 1381/5773 [2:08:34<6:40:29, 5.47s/it] 24%|██▍ | 1381/5773 [2:08:28<6:40:29, 5.47s/it] {'loss': 0.5776, 'learning_rate': 1.7793012423157172e-05, 'epoch': 0.24} 24%|██▍ | 1381/5773 [2:08:34<6:40:29, 5.47s/it] {'loss': 0.5776, 'learning_rate': 1.7793012423157172e-05, 'epoch': 0.24} 24%|██▍ | 1381/5773 [2:08:28<6:40:29, 5.47s/it] 24%|██▍ | 1382/5773 [2:08:34<6:44:52, 5.53s/it] 24%|██▍ | 1382/5773 [2:08:39<6:44:52, 5.53s/it] {'loss': 0.5768, 'learning_rate': 1.778949507334266e-05, 'epoch': 0.24} 24%|██▍ | 1382/5773 [2:08:39<6:44:52, 5.53s/it] {'loss': 0.5768, 'learning_rate': 1.778949507334266e-05, 'epoch': 0.24} 24%|██▍ | 1382/5773 [2:08:34<6:44:52, 5.53s/it] 24%|██▍ | 1383/5773 [2:08:39<6:43:35, 5.52s/it] 24%|██▍ | 1383/5773 [2:08:45<6:43:35, 5.52s/it] {'loss': 0.5734, 'learning_rate': 1.778597527114623e-05, 'epoch': 0.24} 24%|██▍ | 1383/5773 [2:08:45<6:43:35, 5.52s/it] {'loss': 0.5734, 'learning_rate': 1.778597527114623e-05, 'epoch': 0.24} 24%|██▍ | 1383/5773 [2:08:39<6:43:35, 5.52s/it] 24%|██▍ | 1384/5773 [2:08:45<6:45:15, 5.54s/it] 24%|██▍ | 1384/5773 [2:08:50<6:45:15, 5.54s/it] {'loss': 0.5719, 'learning_rate': 1.7782453017676025e-05, 'epoch': 0.24} 24%|██▍ | 1384/5773 [2:08:50<6:45:15, 5.54s/it] {'loss': 0.5719, 'learning_rate': 1.7782453017676025e-05, 'epoch': 0.24} 24%|██▍ | 1384/5773 [2:08:45<6:45:15, 5.54s/it] 24%|██▍ | 1385/5773 [2:08:56<6:42:24, 5.50s/it] 24%|██▍ | 1385/5773 [2:08:50<6:42:24, 5.50s/it] {'loss': 0.5967, 'learning_rate': 1.777892831404096e-05, 'epoch': 0.24} 24%|██▍ | 1385/5773 [2:08:56<6:42:24, 5.50s/it] {'loss': 0.5967, 'learning_rate': 1.777892831404096e-05, 'epoch': 0.24} 
24%|██▍ | 1385/5773 [2:08:50<6:42:24, 5.50s/it] 24%|██▍ | 1386/5773 [2:08:56<6:40:20, 5.48s/it] 24%|██▍ | 1386/5773 [2:09:01<6:40:20, 5.48s/it] {'loss': 0.5994, 'learning_rate': 1.777540116135073e-05, 'epoch': 0.24} 24%|██▍ | 1386/5773 [2:09:01<6:40:20, 5.48s/it] {'loss': 0.5994, 'learning_rate': 1.777540116135073e-05, 'epoch': 0.24} 24%|██▍ | 1386/5773 [2:08:56<6:40:20, 5.48s/it] 24%|██▍ | 1387/5773 [2:09:01<6:39:17, 5.46s/it] 24%|██▍ | 1387/5773 [2:09:07<6:39:17, 5.46s/it] {'loss': 0.5949, 'learning_rate': 1.777187156071579e-05, 'epoch': 0.24} 24%|██▍ | 1387/5773 [2:09:07<6:39:17, 5.46s/it] {'loss': 0.5949, 'learning_rate': 1.777187156071579e-05, 'epoch': 0.24} 24%|██▍ | 1387/5773 [2:09:01<6:39:17, 5.46s/it] 24%|██▍ | 1388/5773 [2:09:06<6:35:13, 5.41s/it] 24%|██▍ | 1388/5773 [2:09:12<6:35:13, 5.41s/it] {'loss': 0.578, 'learning_rate': 1.7768339513247374e-05, 'epoch': 0.24} 24%|██▍ | 1388/5773 [2:09:12<6:35:13, 5.41s/it] {'loss': 0.578, 'learning_rate': 1.7768339513247374e-05, 'epoch': 0.24} 24%|██▍ | 1388/5773 [2:09:06<6:35:13, 5.41s/it] 24%|██▍ | 1389/5773 [2:09:12<6:37:30, 5.44s/it] 24%|██▍ | 1389/5773 [2:09:17<6:37:30, 5.44s/it] {'loss': 0.5905, 'learning_rate': 1.7764805020057476e-05, 'epoch': 0.24} 24%|██▍ | 1389/5773 [2:09:17<6:37:30, 5.44s/it] {'loss': 0.5905, 'learning_rate': 1.7764805020057476e-05, 'epoch': 0.24} 24%|██▍ | 1389/5773 [2:09:12<6:37:30, 5.44s/it] 24%|██▍ | 1390/5773 [2:09:17<6:35:21, 5.41s/it] 24%|██▍ | 1390/5773 [2:09:23<6:35:21, 5.41s/it] {'loss': 0.5921, 'learning_rate': 1.776126808225888e-05, 'epoch': 0.24} 24%|██▍ | 1390/5773 [2:09:23<6:35:21, 5.41s/it] {'loss': 0.5921, 'learning_rate': 1.776126808225888e-05, 'epoch': 0.24} 24%|██▍ | 1390/5773 [2:09:17<6:35:21, 5.41s/it] 24%|██▍ | 1391/5773 [2:09:23<6:36:05, 5.42s/it] 24%|██▍ | 1391/5773 [2:09:28<6:36:05, 5.42s/it] {'loss': 0.5893, 'learning_rate': 1.7757728700965122e-05, 'epoch': 0.24} 24%|██▍ | 1391/5773 [2:09:28<6:36:05, 5.42s/it] {'loss': 0.5893, 'learning_rate': 
1.7757728700965122e-05, 'epoch': 0.24} 24%|██▍ | 1391/5773 [2:09:23<6:36:05, 5.42s/it] 24%|██▍ | 1392/5773 [2:09:28<6:37:21, 5.44s/it] 24%|██▍ | 1392/5773 [2:09:34<6:37:21, 5.44s/it] {'loss': 0.5837, 'learning_rate': 1.7754186877290508e-05, 'epoch': 0.24} 24%|██▍ | 1392/5773 [2:09:34<6:37:21, 5.44s/it] {'loss': 0.5837, 'learning_rate': 1.7754186877290508e-05, 'epoch': 0.24} 24%|██▍ | 1392/5773 [2:09:28<6:37:21, 5.44s/it] 24%|██▍ | 1393/5773 [2:09:34<6:37:11, 5.44s/it] 24%|██▍ | 1393/5773 [2:09:39<6:37:11, 5.44s/it] {'loss': 0.5851, 'learning_rate': 1.7750642612350124e-05, 'epoch': 0.24} 24%|██▍ | 1393/5773 [2:09:39<6:37:11, 5.44s/it] {'loss': 0.5851, 'learning_rate': 1.7750642612350124e-05, 'epoch': 0.24} 24%|██▍ | 1393/5773 [2:09:34<6:37:11, 5.44s/it] 24%|██▍ | 1394/5773 [2:09:39<6:33:55, 5.40s/it] 24%|██▍ | 1394/5773 [2:09:44<6:33:55, 5.40s/it] {'loss': 0.5809, 'learning_rate': 1.7747095907259807e-05, 'epoch': 0.24} 24%|██▍ | 1394/5773 [2:09:44<6:33:55, 5.40s/it] {'loss': 0.5809, 'learning_rate': 1.7747095907259807e-05, 'epoch': 0.24} 24%|██▍ | 1394/5773 [2:09:39<6:33:55, 5.40s/it] 24%|██▍ | 1395/5773 [2:09:44<6:34:24, 5.41s/it] 24%|██▍ | 1395/5773 [2:09:50<6:34:24, 5.41s/it] {'loss': 0.5926, 'learning_rate': 1.7743546763136187e-05, 'epoch': 0.24} 24%|██▍ | 1395/5773 [2:09:50<6:34:24, 5.41s/it] {'loss': 0.5926, 'learning_rate': 1.7743546763136187e-05, 'epoch': 0.24} 24%|██▍ | 1395/5773 [2:09:44<6:34:24, 5.41s/it] 24%|██▍ | 1396/5773 [2:09:50<6:39:28, 5.48s/it] 24%|██▍ | 1396/5773 [2:09:56<6:39:27, 5.48s/it] {'loss': 0.5883, 'learning_rate': 1.773999518109664e-05, 'epoch': 0.24} 24%|██▍ | 1396/5773 [2:09:56<6:39:27, 5.48s/it] {'loss': 0.5883, 'learning_rate': 1.773999518109664e-05, 'epoch': 0.24} 24%|██▍ | 1396/5773 [2:09:50<6:39:28, 5.48s/it] 24%|██▍ | 1397/5773 [2:09:56<6:42:40, 5.52s/it] 24%|██▍ | 1397/5773 [2:10:01<6:42:41, 5.52s/it] {'loss': 0.5962, 'learning_rate': 1.773644116225932e-05, 'epoch': 0.24} 24%|██▍ | 1397/5773 [2:10:01<6:42:41, 5.52s/it] {'loss': 
0.5962, 'learning_rate': 1.773644116225932e-05, 'epoch': 0.24} 24%|██▍ | 1397/5773 [2:09:56<6:42:40, 5.52s/it] 24%|██▍ | 1398/5773 [2:10:01<6:38:10, 5.46s/it] 24%|██▍ | 1398/5773 [2:10:06<6:38:10, 5.46s/it] {'loss': 0.5951, 'learning_rate': 1.7732884707743143e-05, 'epoch': 0.24} 24%|██▍ | 1398/5773 [2:10:06<6:38:10, 5.46s/it] {'loss': 0.5951, 'learning_rate': 1.7732884707743143e-05, 'epoch': 0.24} 24%|██▍ | 1398/5773 [2:10:01<6:38:10, 5.46s/it] 24%|██▍ | 1399/5773 [2:10:07<6:40:25, 5.49s/it] 24%|██▍ | 1399/5773 [2:10:12<6:40:25, 5.49s/it] {'loss': 0.5991, 'learning_rate': 1.7729325818667794e-05, 'epoch': 0.24} 24%|██▍ | 1399/5773 [2:10:12<6:40:25, 5.49s/it] {'loss': 0.5991, 'learning_rate': 1.7729325818667794e-05, 'epoch': 0.24} 24%|██▍ | 1399/5773 [2:10:07<6:40:25, 5.49s/it]14 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 24%|██▍ | 1400/5773 [2:10:12<6:39:52, 5.49s/it]8 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 24%|██▍ | 1400/5773 [2:10:18<6:39:52, 5.49s/it]11 10AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5895, 'learning_rate': 1.772576449615373e-05, 'epoch': 0.24}
 24%|██▍ | 1400/5773 [2:10:18<6:39:52, 5.49s/it]
{'loss': 0.5895, 'learning_rate': 1.772576449615373e-05, 'epoch': 0.24}
 24%|██▍ | 1400/5773 [2:10:12<6:39:52, 5.49s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 24%|██▍ | 1401/5773 [2:10:33<12:19:27, 10.15s/it] 24%|██▍ | 1401/5773 [2:10:39<12:19:27, 10.15s/it] {'loss': 0.5954, 'learning_rate': 1.7722200741322163e-05, 'epoch': 0.24} 24%|██▍ | 1401/5773 [2:10:39<12:19:27, 10.15s/it] {'loss': 0.5954, 'learning_rate': 1.7722200741322163e-05, 'epoch': 0.24} 24%|██▍ | 1401/5773 [2:10:33<12:19:27, 10.15s/it] 24%|██▍ | 1402/5773 [2:10:38<10:35:50, 8.73s/it] 24%|██▍ | 1402/5773 [2:10:44<10:35:50, 8.73s/it] {'loss': 0.584, 'learning_rate': 1.771863455529508e-05, 'epoch': 0.24} 24%|██▍ | 1402/5773 [2:10:44<10:35:50, 8.73s/it] {'loss': 0.584, 'learning_rate': 1.771863455529508e-05, 'epoch': 0.24} 24%|██▍ | 1402/5773 [2:10:38<10:35:50, 8.73s/it] 24%|██▍ | 1403/5773 [2:10:44<9:26:18, 7.78s/it] 24%|██▍ | 1403/5773 [2:10:50<9:26:18, 7.78s/it] {'loss': 0.5965, 'learning_rate': 1.7715065939195235e-05, 'epoch': 0.24} 24%|██▍ | 1403/5773 [2:10:50<9:26:18, 7.78s/it] {'loss': 0.5965, 'learning_rate': 1.7715065939195235e-05, 'epoch': 0.24} 24%|██▍ | 1403/5773 [2:10:44<9:26:18, 7.78s/it] 24%|██▍ | 1404/5773 [2:10:49<8:34:17, 7.06s/it] 24%|██▍ | 1404/5773 [2:10:55<8:34:16, 7.06s/it] {'loss': 0.5896, 'learning_rate': 1.7711494894146138e-05, 'epoch': 0.24} 24%|██▍ | 1404/5773 [2:10:55<8:34:16, 7.06s/it] {'loss': 0.5896, 'learning_rate': 1.7711494894146138e-05, 'epoch': 0.24} 24%|██▍ | 1404/5773 [2:10:49<8:34:17, 7.06s/it] 24%|██▍ | 1405/5773 [2:10:55<8:00:22, 6.60s/it] 24%|██▍ | 1405/5773 [2:11:00<8:00:23, 6.60s/it] {'loss': 0.5932, 'learning_rate': 1.7707921421272064e-05, 'epoch': 0.24} 24%|██▍ | 1405/5773 [2:11:00<8:00:23, 6.60s/it] {'loss': 0.5932, 'learning_rate': 1.7707921421272064e-05, 'epoch': 0.24} 24%|██▍ | 1405/5773 [2:10:55<8:00:22, 6.60s/it] 24%|██▍ | 1406/5773 [2:11:00<7:37:41, 6.29s/it] 24%|██▍ | 1406/5773 [2:11:06<7:37:42, 6.29s/it] {'loss': 0.5819, 'learning_rate': 1.7704345521698057e-05, 'epoch': 0.24} 24%|██▍ | 1406/5773 [2:11:06<7:37:42, 6.29s/it] {'loss': 0.5819, 'learning_rate': 1.7704345521698057e-05, 'epoch': 
0.24} 24%|██▍ | 1406/5773 [2:11:00<7:37:41, 6.29s/it] 24%|██▍ | 1407/5773 [2:11:06<7:16:50, 6.00s/it] 24%|██▍ | 1407/5773 [2:11:11<7:16:51, 6.00s/it] {'loss': 0.5768, 'learning_rate': 1.7700767196549934e-05, 'epoch': 0.24} 24%|██▍ | 1407/5773 [2:11:11<7:16:51, 6.00s/it] {'loss': 0.5768, 'learning_rate': 1.7700767196549934e-05, 'epoch': 0.24} 24%|██▍ | 1407/5773 [2:11:06<7:16:50, 6.00s/it] 24%|██▍ | 1408/5773 [2:11:11<7:02:40, 5.81s/it] 24%|██▍ | 1408/5773 [2:11:17<7:02:40, 5.81s/it] {'loss': 0.597, 'learning_rate': 1.7697186446954257e-05, 'epoch': 0.24} 24%|██▍ | 1408/5773 [2:11:17<7:02:40, 5.81s/it] {'loss': 0.597, 'learning_rate': 1.7697186446954257e-05, 'epoch': 0.24} 24%|██▍ | 1408/5773 [2:11:11<7:02:40, 5.81s/it] 24%|██▍ | 1409/5773 [2:11:17<6:52:44, 5.67s/it] 24%|██▍ | 1409/5773 [2:11:22<6:52:44, 5.67s/it] {'loss': 0.6008, 'learning_rate': 1.7693603274038365e-05, 'epoch': 0.24} 24%|██▍ | 1409/5773 [2:11:22<6:52:44, 5.67s/it] {'loss': 0.6008, 'learning_rate': 1.7693603274038365e-05, 'epoch': 0.24} 24%|██▍ | 1409/5773 [2:11:17<6:52:44, 5.67s/it] 24%|██▍ | 1410/5773 [2:11:22<6:45:31, 5.58s/it] 24%|██▍ | 1410/5773 [2:11:27<6:45:31, 5.58s/it] {'loss': 0.5928, 'learning_rate': 1.769001767893035e-05, 'epoch': 0.24} 24%|██▍ | 1410/5773 [2:11:27<6:45:31, 5.58s/it] {'loss': 0.5928, 'learning_rate': 1.769001767893035e-05, 'epoch': 0.24} 24%|██▍ | 1410/5773 [2:11:22<6:45:31, 5.58s/it] 24%|██▍ | 1411/5773 [2:11:27<6:40:49, 5.51s/it] 24%|██▍ | 1411/5773 [2:11:33<6:40:49, 5.51s/it] {'loss': 0.5732, 'learning_rate': 1.7686429662759076e-05, 'epoch': 0.24} 24%|██▍ | 1411/5773 [2:11:33<6:40:49, 5.51s/it] {'loss': 0.5732, 'learning_rate': 1.7686429662759076e-05, 'epoch': 0.24} 24%|██▍ | 1411/5773 [2:11:27<6:40:49, 5.51s/it] 24%|██▍ | 1412/5773 [2:11:33<6:41:38, 5.53s/it] 24%|██▍ | 1412/5773 [2:11:38<6:41:38, 5.53s/it] {'loss': 0.5876, 'learning_rate': 1.7682839226654168e-05, 'epoch': 0.24} 24%|██▍ | 1412/5773 [2:11:38<6:41:38, 5.53s/it] {'loss': 0.5876, 'learning_rate': 
1.7682839226654168e-05, 'epoch': 0.24} 24%|██▍ | 1412/5773 [2:11:33<6:41:38, 5.53s/it] 24%|██▍ | 1413/5773 [2:11:38<6:42:25, 5.54s/it] 24%|██▍ | 1413/5773 [2:11:44<6:42:25, 5.54s/it] {'loss': 0.5791, 'learning_rate': 1.7679246371746e-05, 'epoch': 0.24} 24%|██▍ | 1413/5773 [2:11:44<6:42:25, 5.54s/it] {'loss': 0.5791, 'learning_rate': 1.7679246371746e-05, 'epoch': 0.24} 24%|██▍ | 1413/5773 [2:11:38<6:42:25, 5.54s/it] 24%|██▍ | 1414/5773 [2:11:44<6:42:19, 5.54s/it] 24%|██▍ | 1414/5773 [2:11:49<6:42:19, 5.54s/it] {'loss': 0.5833, 'learning_rate': 1.7675651099165732e-05, 'epoch': 0.24} 24%|██▍ | 1414/5773 [2:11:49<6:42:19, 5.54s/it] {'loss': 0.5833, 'learning_rate': 1.7675651099165732e-05, 'epoch': 0.24} 24%|██▍ | 1414/5773 [2:11:44<6:42:19, 5.54s/it] 25%|██▍ | 1415/5773 [2:11:49<6:39:40, 5.50s/it] 25%|██▍ | 1415/5773 [2:11:55<6:39:40, 5.50s/it] {'loss': 0.5991, 'learning_rate': 1.767205341004526e-05, 'epoch': 0.25} 25%|██▍ | 1415/5773 [2:11:55<6:39:40, 5.50s/it] {'loss': 0.5991, 'learning_rate': 1.767205341004526e-05, 'epoch': 0.25} 25%|██▍ | 1415/5773 [2:11:49<6:39:40, 5.50s/it] 25%|██▍ | 1416/5773 [2:11:55<6:36:37, 5.46s/it] 25%|██▍ | 1416/5773 [2:12:00<6:36:37, 5.46s/it] {'loss': 0.5785, 'learning_rate': 1.766845330551725e-05, 'epoch': 0.25} 25%|██▍ | 1416/5773 [2:12:00<6:36:37, 5.46s/it] {'loss': 0.5785, 'learning_rate': 1.766845330551725e-05, 'epoch': 0.25} 25%|██▍ | 1416/5773 [2:11:55<6:36:37, 5.46s/it] 25%|██▍ | 1417/5773 [2:12:00<6:36:09, 5.46s/it] 25%|██▍ | 1417/5773 [2:12:06<6:36:09, 5.46s/it] {'loss': 0.5828, 'learning_rate': 1.766485078671514e-05, 'epoch': 0.25} 25%|██▍ | 1417/5773 [2:12:06<6:36:09, 5.46s/it] {'loss': 0.5828, 'learning_rate': 1.766485078671514e-05, 'epoch': 0.25} 25%|██▍ | 1417/5773 [2:12:00<6:36:09, 5.46s/it] 25%|██▍ | 1418/5773 [2:12:06<6:37:06, 5.47s/it] 25%|██▍ | 1418/5773 [2:12:11<6:37:06, 5.47s/it] {'loss': 0.5837, 'learning_rate': 1.7661245854773103e-05, 'epoch': 0.25} 25%|██▍ | 1418/5773 [2:12:11<6:37:06, 5.47s/it] {'loss': 0.5837, 
'learning_rate': 1.7661245854773103e-05, 'epoch': 0.25} 25%|██▍ | 1418/5773 [2:12:06<6:37:06, 5.47s/it] 25%|██▍ | 1419/5773 [2:12:11<6:36:18, 5.46s/it] 25%|██▍ | 1419/5773 [2:12:17<6:36:18, 5.46s/it] {'loss': 0.5953, 'learning_rate': 1.76576385108261e-05, 'epoch': 0.25} 25%|██▍ | 1419/5773 [2:12:17<6:36:18, 5.46s/it] {'loss': 0.5953, 'learning_rate': 1.76576385108261e-05, 'epoch': 0.25} 25%|██▍ | 1419/5773 [2:12:11<6:36:18, 5.46s/it] 25%|██▍ | 1420/5773 [2:12:17<6:38:32, 5.49s/it] 25%|██▍ | 1420/5773 [2:12:22<6:38:32, 5.49s/it] {'loss': 0.6141, 'learning_rate': 1.765402875600984e-05, 'epoch': 0.25} 25%|██▍ | 1420/5773 [2:12:22<6:38:32, 5.49s/it] {'loss': 0.6141, 'learning_rate': 1.765402875600984e-05, 'epoch': 0.25} 25%|██▍ | 1420/5773 [2:12:17<6:38:32, 5.49s/it] 25%|██▍ | 1421/5773 [2:12:22<6:33:55, 5.43s/it] 25%|██▍ | 1421/5773 [2:12:27<6:33:55, 5.43s/it] {'loss': 0.6022, 'learning_rate': 1.7650416591460776e-05, 'epoch': 0.25} 25%|██▍ | 1421/5773 [2:12:27<6:33:55, 5.43s/it] {'loss': 0.6022, 'learning_rate': 1.7650416591460776e-05, 'epoch': 0.25} 25%|██▍ | 1421/5773 [2:12:22<6:33:55, 5.43s/it] 25%|██▍ | 1422/5773 [2:12:27<6:33:07, 5.42s/it] 25%|██▍ | 1422/5773 [2:12:33<6:33:07, 5.42s/it] {'loss': 0.5884, 'learning_rate': 1.7646802018316143e-05, 'epoch': 0.25} 25%|██▍ | 1422/5773 [2:12:33<6:33:07, 5.42s/it] {'loss': 0.5884, 'learning_rate': 1.7646802018316143e-05, 'epoch': 0.25} 25%|██▍ | 1422/5773 [2:12:27<6:33:07, 5.42s/it] 25%|██▍ | 1423/5773 [2:12:33<6:34:51, 5.45s/it] 25%|██▍ | 1423/5773 [2:12:38<6:34:51, 5.45s/it] {'loss': 0.6088, 'learning_rate': 1.764318503771392e-05, 'epoch': 0.25} 25%|██▍ | 1423/5773 [2:12:38<6:34:51, 5.45s/it] {'loss': 0.6088, 'learning_rate': 1.764318503771392e-05, 'epoch': 0.25} 25%|██▍ | 1423/5773 [2:12:33<6:34:51, 5.45s/it] 25%|██▍ | 1424/5773 [2:12:38<6:34:21, 5.44s/it] 25%|██▍ | 1424/5773 [2:12:44<6:34:22, 5.44s/it] {'loss': 0.5684, 'learning_rate': 1.7639565650792846e-05, 'epoch': 0.25} 25%|██▍ | 1424/5773 [2:12:44<6:34:22, 
5.44s/it] {'loss': 0.5684, 'learning_rate': 1.7639565650792846e-05, 'epoch': 0.25} 25%|██▍ | 1424/5773 [2:12:38<6:34:21, 5.44s/it] 25%|██▍ | 1425/5773 [2:12:44<6:39:39, 5.52s/it] 25%|██▍ | 1425/5773 [2:12:49<6:39:40, 5.52s/it] {'loss': 0.5939, 'learning_rate': 1.763594385869243e-05, 'epoch': 0.25} 25%|██▍ | 1425/5773 [2:12:49<6:39:40, 5.52s/it] {'loss': 0.5939, 'learning_rate': 1.763594385869243e-05, 'epoch': 0.25} 25%|██▍ | 1425/5773 [2:12:44<6:39:39, 5.52s/it] 25%|██▍ | 1426/5773 [2:12:49<6:36:19, 5.47s/it] 25%|██▍ | 1426/5773 [2:12:55<6:36:19, 5.47s/it] {'loss': 0.579, 'learning_rate': 1.7632319662552914e-05, 'epoch': 0.25} 25%|██▍ | 1426/5773 [2:12:55<6:36:19, 5.47s/it] {'loss': 0.579, 'learning_rate': 1.7632319662552914e-05, 'epoch': 0.25} 25%|██▍ | 1426/5773 [2:12:49<6:36:19, 5.47s/it] 25%|██▍ | 1427/5773 [2:12:55<6:33:09, 5.43s/it] 25%|██▍ | 1427/5773 [2:13:00<6:33:09, 5.43s/it] {'loss': 0.5846, 'learning_rate': 1.762869306351532e-05, 'epoch': 0.25} 25%|██▍ | 1427/5773 [2:13:00<6:33:09, 5.43s/it] {'loss': 0.5846, 'learning_rate': 1.762869306351532e-05, 'epoch': 0.25} 25%|██▍ | 1427/5773 [2:12:55<6:33:09, 5.43s/it] 25%|██▍ | 1428/5773 [2:13:00<6:33:29, 5.43s/it] 25%|██▍ | 1428/5773 [2:13:06<6:33:29, 5.43s/it] {'loss': 0.5896, 'learning_rate': 1.7625064062721414e-05, 'epoch': 0.25} 25%|██▍ | 1428/5773 [2:13:06<6:33:29, 5.43s/it] {'loss': 0.5896, 'learning_rate': 1.7625064062721414e-05, 'epoch': 0.25} 25%|██▍ | 1428/5773 [2:13:00<6:33:29, 5.43s/it] 25%|██▍ | 1429/5773 [2:13:05<6:31:43, 5.41s/it] 25%|██▍ | 1429/5773 [2:13:11<6:31:43, 5.41s/it] {'loss': 0.5958, 'learning_rate': 1.762143266131372e-05, 'epoch': 0.25} 25%|██▍ | 1429/5773 [2:13:11<6:31:43, 5.41s/it] {'loss': 0.5958, 'learning_rate': 1.762143266131372e-05, 'epoch': 0.25} 25%|██▍ | 1429/5773 [2:13:05<6:31:43, 5.41s/it] 25%|██▍ | 1430/5773 [2:13:11<6:39:06, 5.51s/it] 25%|██▍ | 1430/5773 [2:13:17<6:39:06, 5.51s/it] {'loss': 0.5747, 'learning_rate': 1.7617798860435526e-05, 'epoch': 0.25} 25%|██▍ | 
1430/5773 [2:13:17<6:39:06, 5.51s/it] {'loss': 0.5747, 'learning_rate': 1.7617798860435526e-05, 'epoch': 0.25} 25%|██▍ | 1430/5773 [2:13:11<6:39:06, 5.51s/it] 25%|██▍ | 1431/5773 [2:13:17<6:37:46, 5.50s/it] 25%|██▍ | 1431/5773 [2:13:22<6:37:46, 5.50s/it] {'loss': 0.5948, 'learning_rate': 1.761416266123086e-05, 'epoch': 0.25} 25%|██▍ | 1431/5773 [2:13:22<6:37:46, 5.50s/it] {'loss': 0.5948, 'learning_rate': 1.761416266123086e-05, 'epoch': 0.25} 25%|██▍ | 1431/5773 [2:13:17<6:37:46, 5.50s/it] 25%|██▍ | 1432/5773 [2:13:22<6:35:11, 5.46s/it] 25%|██▍ | 1432/5773 [2:13:28<6:35:10, 5.46s/it] {'loss': 0.5858, 'learning_rate': 1.761052406484452e-05, 'epoch': 0.25} 25%|██▍ | 1432/5773 [2:13:28<6:35:10, 5.46s/it] {'loss': 0.5858, 'learning_rate': 1.761052406484452e-05, 'epoch': 0.25} 25%|██▍ | 1432/5773 [2:13:22<6:35:11, 5.46s/it] 25%|██▍ | 1433/5773 [2:13:27<6:31:41, 5.42s/it] 25%|██▍ | 1433/5773 [2:13:33<6:31:41, 5.42s/it] {'loss': 0.5965, 'learning_rate': 1.7606883072422048e-05, 'epoch': 0.25} 25%|██▍ | 1433/5773 [2:13:33<6:31:41, 5.42s/it] {'loss': 0.5965, 'learning_rate': 1.7606883072422048e-05, 'epoch': 0.25} 25%|██▍ | 1433/5773 [2:13:27<6:31:41, 5.42s/it] 25%|██▍ | 1434/5773 [2:13:33<6:35:35, 5.47s/it] 25%|██▍ | 1434/5773 [2:13:38<6:35:35, 5.47s/it] {'loss': 0.5792, 'learning_rate': 1.7603239685109746e-05, 'epoch': 0.25} 25%|██▍ | 1434/5773 [2:13:38<6:35:35, 5.47s/it] {'loss': 0.5792, 'learning_rate': 1.7603239685109746e-05, 'epoch': 0.25} 25%|██▍ | 1434/5773 [2:13:33<6:35:35, 5.47s/it] 25%|██▍ | 1435/5773 [2:13:38<6:36:28, 5.48s/it] 25%|██▍ | 1435/5773 [2:13:44<6:36:28, 5.48s/it] {'loss': 0.5893, 'learning_rate': 1.759959390405467e-05, 'epoch': 0.25} 25%|██▍ | 1435/5773 [2:13:44<6:36:28, 5.48s/it] {'loss': 0.5893, 'learning_rate': 1.759959390405467e-05, 'epoch': 0.25} 25%|██▍ | 1435/5773 [2:13:38<6:36:28, 5.48s/it] 25%|██▍ | 1436/5773 [2:13:44<6:39:05, 5.52s/it] 25%|██▍ | 1436/5773 [2:13:50<6:39:05, 5.52s/it] {'loss': 0.5761, 'learning_rate': 1.7595945730404627e-05, 
'epoch': 0.25} {'loss': 0.5761, 'learning_rate': 1.7595945730404627e-05, 'epoch': 0.25} 25%|██▍ | 1436/5773 [2:13:50<6:39:05, 5.52s/it] 25%|██▍ | 1436/5773 [2:13:44<6:39:05, 5.52s/it] 25%|██▍ | 1437/5773 [2:13:50<6:41:42, 5.56s/it] 25%|██▍ | 1437/5773 [2:13:55<6:41:41, 5.56s/it] {'loss': 0.5969, 'learning_rate': 1.759229516530818e-05, 'epoch': 0.25} 25%|██▍ | 1437/5773 [2:13:55<6:41:41, 5.56s/it] {'loss': 0.5969, 'learning_rate': 1.759229516530818e-05, 'epoch': 0.25} 25%|██▍ | 1437/5773 [2:13:50<6:41:42, 5.56s/it] 25%|██▍ | 1438/5773 [2:13:55<6:39:47, 5.53s/it] 25%|██▍ | 1438/5773 [2:14:01<6:39:48, 5.53s/it] {'loss': 0.5908, 'learning_rate': 1.7588642209914644e-05, 'epoch': 0.25} 25%|██▍ | 1438/5773 [2:14:01<6:39:48, 5.53s/it] {'loss': 0.5908, 'learning_rate': 1.7588642209914644e-05, 'epoch': 0.25} 25%|██▍ | 1438/5773 [2:13:55<6:39:47, 5.53s/it] 25%|██▍ | 1439/5773 [2:14:01<6:37:35, 5.50s/it] 25%|██▍ | 1439/5773 [2:14:06<6:37:35, 5.50s/it] {'loss': 0.5862, 'learning_rate': 1.7584986865374084e-05, 'epoch': 0.25} 25%|██▍ | 1439/5773 [2:14:06<6:37:35, 5.50s/it] {'loss': 0.5862, 'learning_rate': 1.7584986865374084e-05, 'epoch': 0.25} 25%|██▍ | 1439/5773 [2:14:01<6:37:35, 5.50s/it] 25%|██▍ | 1440/5773 [2:14:06<6:35:08, 5.47s/it] 25%|██▍ | 1440/5773 [2:14:12<6:35:08, 5.47s/it] {'loss': 0.5951, 'learning_rate': 1.758132913283732e-05, 'epoch': 0.25} 25%|██▍ | 1440/5773 [2:14:12<6:35:08, 5.47s/it] {'loss': 0.5951, 'learning_rate': 1.758132913283732e-05, 'epoch': 0.25} 25%|██▍ | 1440/5773 [2:14:06<6:35:08, 5.47s/it] 25%|██▍ | 1441/5773 [2:14:11<6:32:42, 5.44s/it] 25%|██▍ | 1441/5773 [2:14:17<6:32:42, 5.44s/it] {'loss': 0.598, 'learning_rate': 1.757766901345592e-05, 'epoch': 0.25} 25%|██▍ | 1441/5773 [2:14:17<6:32:42, 5.44s/it] {'loss': 0.598, 'learning_rate': 1.757766901345592e-05, 'epoch': 0.25} 25%|██▍ | 1441/5773 [2:14:11<6:32:42, 5.44s/it] 25%|██▍ | 1442/5773 [2:14:17<6:37:05, 5.50s/it] 25%|██▍ | 1442/5773 [2:14:23<6:37:05, 5.50s/it] {'loss': 0.5828, 'learning_rate': 
1.7574006508382214e-05, 'epoch': 0.25} 25%|██▍ | 1442/5773 [2:14:23<6:37:05, 5.50s/it] {'loss': 0.5828, 'learning_rate': 1.7574006508382214e-05, 'epoch': 0.25} 25%|██▍ | 1442/5773 [2:14:17<6:37:05, 5.50s/it] 25%|██▍ | 1443/5773 [2:14:23<6:38:31, 5.52s/it] 25%|██▍ | 1443/5773 [2:14:28<6:38:31, 5.52s/it] {'loss': 0.5805, 'learning_rate': 1.757034161876927e-05, 'epoch': 0.25} 25%|██▍ | 1443/5773 [2:14:28<6:38:31, 5.52s/it] {'loss': 0.5805, 'learning_rate': 1.757034161876927e-05, 'epoch': 0.25} 25%|██▍ | 1443/5773 [2:14:23<6:38:31, 5.52s/it] 25%|██▌ | 1444/5773 [2:14:28<6:35:25, 5.48s/it] 25%|██▌ | 1444/5773 [2:14:34<6:35:27, 5.48s/it] {'loss': 0.6095, 'learning_rate': 1.7566674345770912e-05, 'epoch': 0.25} 25%|██▌ | 1444/5773 [2:14:34<6:35:27, 5.48s/it] {'loss': 0.6095, 'learning_rate': 1.7566674345770912e-05, 'epoch': 0.25} 25%|██▌ | 1444/5773 [2:14:28<6:35:25, 5.48s/it] 25%|██▌ | 1445/5773 [2:14:33<6:32:50, 5.45s/it] 25%|██▌ | 1445/5773 [2:14:39<6:32:49, 5.45s/it] {'loss': 0.578, 'learning_rate': 1.756300469054172e-05, 'epoch': 0.25} 25%|██▌ | 1445/5773 [2:14:39<6:32:49, 5.45s/it] {'loss': 0.578, 'learning_rate': 1.756300469054172e-05, 'epoch': 0.25} 25%|██▌ | 1445/5773 [2:14:33<6:32:50, 5.45s/it] 25%|██▌ | 1446/5773 [2:14:39<6:34:41, 5.47s/it] 25%|██▌ | 1446/5773 [2:14:44<6:34:41, 5.47s/it] {'loss': 0.5911, 'learning_rate': 1.755933265423701e-05, 'epoch': 0.25} 25%|██▌ | 1446/5773 [2:14:44<6:34:41, 5.47s/it] {'loss': 0.5911, 'learning_rate': 1.755933265423701e-05, 'epoch': 0.25} 25%|██▌ | 1446/5773 [2:14:39<6:34:41, 5.47s/it] 25%|██▌ | 1447/5773 [2:14:44<6:30:37, 5.42s/it] 25%|██▌ | 1447/5773 [2:14:50<6:30:37, 5.42s/it] {'loss': 0.58, 'learning_rate': 1.7555658238012867e-05, 'epoch': 0.25} 25%|██▌ | 1447/5773 [2:14:50<6:30:37, 5.42s/it] {'loss': 0.58, 'learning_rate': 1.7555658238012867e-05, 'epoch': 0.25} 25%|██▌ | 1447/5773 [2:14:44<6:30:37, 5.42s/it] 25%|██▌ | 1448/5773 [2:14:50<6:29:35, 5.40s/it] 25%|██▌ | 1448/5773 [2:14:55<6:29:36, 5.40s/it] {'loss': 0.5931, 
'learning_rate': 1.7551981443026104e-05, 'epoch': 0.25} 25%|██▌ | 1448/5773 [2:14:55<6:29:36, 5.40s/it] {'loss': 0.5931, 'learning_rate': 1.7551981443026104e-05, 'epoch': 0.25} 25%|██▌ | 1448/5773 [2:14:50<6:29:35, 5.40s/it] 25%|██▌ | 1449/5773 [2:14:55<6:27:26, 5.38s/it] 25%|██▌ | 1449/5773 [2:15:00<6:27:26, 5.38s/it] {'loss': 0.5956, 'learning_rate': 1.75483022704343e-05, 'epoch': 0.25} 25%|██▌ | 1449/5773 [2:15:00<6:27:26, 5.38s/it] {'loss': 0.5956, 'learning_rate': 1.75483022704343e-05, 'epoch': 0.25} 25%|██▌ | 1449/5773 [2:14:55<6:27:26, 5.38s/it] 1 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 25%|██▌ | 1450/5773 [2:15:00<6:28:33, 5.39s/it] 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 25%|██▌ | 1450/5773 [2:15:06<6:28:32, 5.39s/it] 9 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6017, 'learning_rate': 1.754462072139578e-05, 'epoch': 0.25} 25%|██▌ | 1450/5773 [2:15:06<6:28:32, 5.39s/it] {'loss': 0.6017, 'learning_rate': 1.754462072139578e-05, 'epoch': 0.25} 25%|██▌ | 1450/5773 [2:15:00<6:28:33, 5.39s/it] 25%|██▌ | 1451/5773 [2:15:06<6:27:27, 5.38s/it] 25%|██▌ | 1451/5773 [2:15:11<6:27:27, 5.38s/it] {'loss': 0.5981, 'learning_rate': 1.75409367970696e-05, 'epoch': 0.25} 25%|██▌ | 1451/5773 [2:15:11<6:27:27, 5.38s/it] {'loss': 0.5981, 'learning_rate': 1.75409367970696e-05, 'epoch': 0.25} 25%|██▌ | 1451/5773 [2:15:06<6:27:27, 5.38s/it] 25%|██▌ | 1452/5773 [2:15:11<6:28:41, 5.40s/it] 25%|██▌ | 1452/5773 [2:15:17<6:28:41, 5.40s/it] {'loss': 0.5895, 'learning_rate': 1.753725049861559e-05, 'epoch': 0.25} 25%|██▌ | 1452/5773 [2:15:17<6:28:41, 5.40s/it] {'loss': 0.5895, 'learning_rate': 1.753725049861559e-05, 'epoch': 0.25} 25%|██▌ | 1452/5773 [2:15:11<6:28:41, 5.40s/it] 25%|██▌ | 1453/5773 [2:15:16<6:25:11, 5.35s/it] 25%|██▌ | 1453/5773 [2:15:22<6:25:10, 5.35s/it] {'loss': 0.5875, 'learning_rate': 1.753356182719431e-05, 'epoch': 0.25} 25%|██▌ | 1453/5773 [2:15:22<6:25:10, 5.35s/it] {'loss': 0.5875, 'learning_rate': 1.753356182719431e-05, 'epoch': 0.25} 25%|██▌ | 1453/5773 [2:15:16<6:25:11, 5.35s/it] 25%|██▌ | 1454/5773 [2:15:22<6:29:25, 5.41s/it] 25%|██▌ | 1454/5773 [2:15:27<6:29:25, 5.41s/it] {'loss': 0.5953, 'learning_rate': 1.7529870783967066e-05, 'epoch': 0.25} 25%|██▌ | 1454/5773 [2:15:27<6:29:25, 5.41s/it] {'loss': 0.5953, 'learning_rate': 1.7529870783967066e-05, 'epoch': 0.25} 25%|██▌ | 1454/5773 [2:15:22<6:29:25, 5.41s/it] 25%|██▌ | 1455/5773 [2:15:27<6:30:17, 5.42s/it] 25%|██▌ | 1455/5773 [2:15:33<6:30:17, 5.42s/it] {'loss': 0.5874, 'learning_rate': 1.7526177370095924e-05, 'epoch': 0.25} 25%|██▌ | 1455/5773 [2:15:33<6:30:17, 5.42s/it] {'loss': 0.5874, 'learning_rate': 1.7526177370095924e-05, 'epoch': 0.25} 25%|██▌ | 1455/5773 [2:15:27<6:30:17, 5.42s/it] 25%|██▌ | 1456/5773 [2:15:33<6:36:02, 5.50s/it] 25%|██▌ | 1456/5773 
[2:15:39<6:36:02, 5.50s/it] {'loss': 0.5885, 'learning_rate': 1.752248158674369e-05, 'epoch': 0.25} 25%|██▌ | 1456/5773 [2:15:39<6:36:02, 5.50s/it] {'loss': 0.5885, 'learning_rate': 1.752248158674369e-05, 'epoch': 0.25} 25%|██▌ | 1456/5773 [2:15:33<6:36:02, 5.50s/it] 25%|██▌ | 1457/5773 [2:15:38<6:33:31, 5.47s/it] 25%|██▌ | 1457/5773 [2:15:44<6:33:31, 5.47s/it] {'loss': 0.5982, 'learning_rate': 1.751878343507391e-05, 'epoch': 0.25} 25%|██▌ | 1457/5773 [2:15:44<6:33:31, 5.47s/it] {'loss': 0.5982, 'learning_rate': 1.751878343507391e-05, 'epoch': 0.25} 25%|██▌ | 1457/5773 [2:15:38<6:33:31, 5.47s/it] 25%|██▌ | 1458/5773 [2:15:44<6:31:37, 5.45s/it] 25%|██▌ | 1458/5773 [2:15:49<6:31:37, 5.45s/it] {'loss': 0.58, 'learning_rate': 1.7515082916250876e-05, 'epoch': 0.25} 25%|██▌ | 1458/5773 [2:15:49<6:31:37, 5.45s/it] {'loss': 0.58, 'learning_rate': 1.7515082916250876e-05, 'epoch': 0.25} 25%|██▌ | 1458/5773 [2:15:44<6:31:37, 5.45s/it] 25%|██▌ | 1459/5773 [2:15:49<6:32:24, 5.46s/it] 25%|██▌ | 1459/5773 [2:15:55<6:32:24, 5.46s/it] {'loss': 0.5967, 'learning_rate': 1.7511380031439635e-05, 'epoch': 0.25} 25%|██▌ | 1459/5773 [2:15:55<6:32:24, 5.46s/it] {'loss': 0.5967, 'learning_rate': 1.7511380031439635e-05, 'epoch': 0.25} 25%|██▌ | 1459/5773 [2:15:49<6:32:24, 5.46s/it] 25%|██▌ | 1460/5773 [2:15:55<6:29:50, 5.42s/it] 25%|██▌ | 1460/5773 [2:16:00<6:29:51, 5.42s/it] {'loss': 0.58, 'learning_rate': 1.7507674781805976e-05, 'epoch': 0.25} 25%|██▌ | 1460/5773 [2:16:00<6:29:51, 5.42s/it] {'loss': 0.58, 'learning_rate': 1.7507674781805976e-05, 'epoch': 0.25} 25%|██▌ | 1460/5773 [2:15:55<6:29:50, 5.42s/it] 25%|██▌ | 1461/5773 [2:16:00<6:29:08, 5.41s/it] 25%|██▌ | 1461/5773 [2:16:06<6:29:08, 5.41s/it] {'loss': 0.5859, 'learning_rate': 1.7503967168516426e-05, 'epoch': 0.25} 25%|██▌ | 1461/5773 [2:16:06<6:29:08, 5.41s/it] {'loss': 0.5859, 'learning_rate': 1.7503967168516426e-05, 'epoch': 0.25} 25%|██▌ | 1461/5773 [2:16:00<6:29:08, 5.41s/it] 25%|██▌ | 1462/5773 [2:16:05<6:27:24, 5.39s/it] 
25%|██▌ | 1462/5773 [2:16:11<6:27:24, 5.39s/it] {'loss': 0.5777, 'learning_rate': 1.7500257192738263e-05, 'epoch': 0.25} 25%|██▌ | 1462/5773 [2:16:11<6:27:24, 5.39s/it] {'loss': 0.5777, 'learning_rate': 1.7500257192738263e-05, 'epoch': 0.25} 25%|██▌ | 1462/5773 [2:16:05<6:27:24, 5.39s/it] 25%|██▌ | 1463/5773 [2:16:11<6:28:11, 5.40s/it] 25%|██▌ | 1463/5773 [2:16:16<6:28:10, 5.40s/it] {'loss': 0.5978, 'learning_rate': 1.74965448556395e-05, 'epoch': 0.25} 25%|██▌ | 1463/5773 [2:16:16<6:28:10, 5.40s/it] {'loss': 0.5978, 'learning_rate': 1.74965448556395e-05, 'epoch': 0.25} 25%|██▌ | 1463/5773 [2:16:11<6:28:11, 5.40s/it] 25%|██▌ | 1464/5773 [2:16:16<6:32:02, 5.46s/it] 25%|██▌ | 1464/5773 [2:16:22<6:32:02, 5.46s/it] {'loss': 0.5834, 'learning_rate': 1.749283015838891e-05, 'epoch': 0.25} 25%|██▌ | 1464/5773 [2:16:22<6:32:02, 5.46s/it] {'loss': 0.5834, 'learning_rate': 1.749283015838891e-05, 'epoch': 0.25} 25%|██▌ | 1464/5773 [2:16:16<6:32:02, 5.46s/it] 25%|██▌ | 1465/5773 [2:16:22<6:30:58, 5.45s/it] 25%|██▌ | 1465/5773 [2:16:27<6:30:58, 5.45s/it] {'loss': 0.5946, 'learning_rate': 1.748911310215599e-05, 'epoch': 0.25} 25%|██▌ | 1465/5773 [2:16:27<6:30:58, 5.45s/it] {'loss': 0.5946, 'learning_rate': 1.748911310215599e-05, 'epoch': 0.25} 25%|██▌ | 1465/5773 [2:16:22<6:30:58, 5.45s/it] 25%|██▌ | 1466/5773 [2:16:27<6:33:20, 5.48s/it] 25%|██▌ | 1466/5773 [2:16:33<6:33:21, 5.48s/it] {'loss': 0.5925, 'learning_rate': 1.7485393688110987e-05, 'epoch': 0.25} 25%|██▌ | 1466/5773 [2:16:33<6:33:21, 5.48s/it] {'loss': 0.5925, 'learning_rate': 1.7485393688110987e-05, 'epoch': 0.25} 25%|██▌ | 1466/5773 [2:16:27<6:33:20, 5.48s/it] 25%|██▌ | 1467/5773 [2:16:33<6:31:58, 5.46s/it] 25%|██▌ | 1467/5773 [2:16:38<6:31:58, 5.46s/it] {'loss': 0.5883, 'learning_rate': 1.7481671917424895e-05, 'epoch': 0.25} 25%|██▌ | 1467/5773 [2:16:38<6:31:58, 5.46s/it] {'loss': 0.5883, 'learning_rate': 1.7481671917424895e-05, 'epoch': 0.25} 25%|██▌ | 1467/5773 [2:16:33<6:31:58, 5.46s/it] 25%|██▌ | 1468/5773 
[2:16:38<6:32:57, 5.48s/it] 25%|██▌ | 1468/5773 [2:16:44<6:32:57, 5.48s/it] {'loss': 0.5896, 'learning_rate': 1.7477947791269447e-05, 'epoch': 0.25} 25%|██▌ | 1468/5773 [2:16:44<6:32:57, 5.48s/it] {'loss': 0.5896, 'learning_rate': 1.7477947791269447e-05, 'epoch': 0.25} 25%|██▌ | 1468/5773 [2:16:38<6:32:57, 5.48s/it] 25%|██▌ | 1469/5773 [2:16:44<6:32:35, 5.47s/it] 25%|██▌ | 1469/5773 [2:16:49<6:32:34, 5.47s/it] {'loss': 0.6068, 'learning_rate': 1.7474221310817114e-05, 'epoch': 0.25} 25%|██▌ | 1469/5773 [2:16:49<6:32:34, 5.47s/it] {'loss': 0.6068, 'learning_rate': 1.7474221310817114e-05, 'epoch': 0.25} 25%|██▌ | 1469/5773 [2:16:44<6:32:35, 5.47s/it] 25%|██▌ | 1470/5773 [2:16:49<6:30:30, 5.45s/it] 25%|██▌ | 1470/5773 [2:16:55<6:30:30, 5.45s/it] {'loss': 0.5918, 'learning_rate': 1.7470492477241113e-05, 'epoch': 0.25} 25%|██▌ | 1470/5773 [2:16:55<6:30:30, 5.45s/it] {'loss': 0.5918, 'learning_rate': 1.7470492477241113e-05, 'epoch': 0.25} 25%|██▌ | 1470/5773 [2:16:49<6:30:30, 5.45s/it] 25%|██▌ | 1471/5773 [2:16:55<6:32:39, 5.48s/it] 25%|██▌ | 1471/5773 [2:17:00<6:32:39, 5.48s/it] {'loss': 0.5765, 'learning_rate': 1.7466761291715403e-05, 'epoch': 0.25} 25%|██▌ | 1471/5773 [2:17:00<6:32:39, 5.48s/it] {'loss': 0.5765, 'learning_rate': 1.7466761291715403e-05, 'epoch': 0.25} 25%|██▌ | 1471/5773 [2:16:55<6:32:39, 5.48s/it] 25%|██▌ | 1472/5773 [2:17:00<6:28:31, 5.42s/it] 25%|██▌ | 1472/5773 [2:17:05<6:28:31, 5.42s/it] {'loss': 0.5907, 'learning_rate': 1.746302775541467e-05, 'epoch': 0.25} 25%|██▌ | 1472/5773 [2:17:05<6:28:31, 5.42s/it] {'loss': 0.5907, 'learning_rate': 1.746302775541467e-05, 'epoch': 0.25} 25%|██▌ | 1472/5773 [2:17:00<6:28:31, 5.42s/it] 26%|██▌ | 1473/5773 [2:17:05<6:29:59, 5.44s/it] 26%|██▌ | 1473/5773 [2:17:11<6:29:59, 5.44s/it] {'loss': 0.5726, 'learning_rate': 1.7459291869514363e-05, 'epoch': 0.26} 26%|██▌ | 1473/5773 [2:17:11<6:29:59, 5.44s/it] {'loss': 0.5726, 'learning_rate': 1.7459291869514363e-05, 'epoch': 0.26} 26%|██▌ | 1473/5773 [2:17:05<6:29:59, 
5.44s/it] 26%|██▌ | 1474/5773 [2:17:11<6:31:03, 5.46s/it] 26%|██▌ | 1474/5773 [2:17:16<6:31:03, 5.46s/it] {'loss': 0.5996, 'learning_rate': 1.7455553635190652e-05, 'epoch': 0.26} 26%|██▌ | 1474/5773 [2:17:16<6:31:03, 5.46s/it] {'loss': 0.5996, 'learning_rate': 1.7455553635190652e-05, 'epoch': 0.26} 26%|██▌ | 1474/5773 [2:17:11<6:31:03, 5.46s/it] 26%|██▌ | 1475/5773 [2:17:16<6:31:14, 5.46s/it] 26%|██▌ | 1475/5773 [2:17:22<6:31:15, 5.46s/it] {'loss': 0.5878, 'learning_rate': 1.7451813053620452e-05, 'epoch': 0.26} 26%|██▌ | 1475/5773 [2:17:22<6:31:15, 5.46s/it] {'loss': 0.5878, 'learning_rate': 1.7451813053620452e-05, 'epoch': 0.26} 26%|██▌ | 1475/5773 [2:17:16<6:31:14, 5.46s/it] 26%|██▌ | 1476/5773 [2:17:22<6:37:46, 5.55s/it] 26%|██▌ | 1476/5773 [2:17:28<6:37:46, 5.55s/it] {'loss': 0.5736, 'learning_rate': 1.744807012598142e-05, 'epoch': 0.26} 26%|██▌ | 1476/5773 [2:17:28<6:37:46, 5.55s/it] {'loss': 0.5736, 'learning_rate': 1.744807012598142e-05, 'epoch': 0.26} 26%|██▌ | 1476/5773 [2:17:22<6:37:46, 5.55s/it] 26%|██▌ | 1477/5773 [2:17:28<6:32:59, 5.49s/it] 26%|██▌ | 1477/5773 [2:17:33<6:32:58, 5.49s/it] {'loss': 0.5934, 'learning_rate': 1.7444324853451947e-05, 'epoch': 0.26} 26%|██▌ | 1477/5773 [2:17:33<6:32:58, 5.49s/it] {'loss': 0.5934, 'learning_rate': 1.7444324853451947e-05, 'epoch': 0.26} 26%|██▌ | 1477/5773 [2:17:28<6:32:59, 5.49s/it] 26%|██▌ | 1478/5773 [2:17:33<6:32:08, 5.48s/it] 26%|██▌ | 1478/5773 [2:17:38<6:32:08, 5.48s/it] {'loss': 0.5694, 'learning_rate': 1.7440577237211168e-05, 'epoch': 0.26} 26%|██▌ | 1478/5773 [2:17:38<6:32:08, 5.48s/it] {'loss': 0.5694, 'learning_rate': 1.7440577237211168e-05, 'epoch': 0.26} 26%|██▌ | 1478/5773 [2:17:33<6:32:08, 5.48s/it] 26%|██▌ | 1479/5773 [2:17:39<6:35:35, 5.53s/it] 26%|██▌ | 1479/5773 [2:17:44<6:35:34, 5.53s/it] {'loss': 0.599, 'learning_rate': 1.7436827278438947e-05, 'epoch': 0.26} 26%|██▌ | 1479/5773 [2:17:44<6:35:34, 5.53s/it] {'loss': 0.599, 'learning_rate': 1.7436827278438947e-05, 'epoch': 0.26} 26%|██▌ | 
1479/5773 [2:17:39<6:35:35, 5.53s/it] 26%|██▌ | 1480/5773 [2:17:44<6:32:00, 5.48s/it] 26%|██▌ | 1480/5773 [2:17:50<6:32:01, 5.48s/it] {'loss': 0.5784, 'learning_rate': 1.743307497831589e-05, 'epoch': 0.26} 26%|██▌ | 1480/5773 [2:17:50<6:32:01, 5.48s/it] {'loss': 0.5784, 'learning_rate': 1.743307497831589e-05, 'epoch': 0.26} 26%|██▌ | 1480/5773 [2:17:44<6:32:00, 5.48s/it] 26%|██▌ | 1481/5773 [2:17:49<6:29:57, 5.45s/it] 26%|██▌ | 1481/5773 [2:17:55<6:29:57, 5.45s/it] {'loss': 0.609, 'learning_rate': 1.7429320338023355e-05, 'epoch': 0.26} 26%|██▌ | 1481/5773 [2:17:55<6:29:57, 5.45s/it] {'loss': 0.609, 'learning_rate': 1.7429320338023355e-05, 'epoch': 0.26} 26%|██▌ | 1481/5773 [2:17:49<6:29:57, 5.45s/it] 26%|██▌ | 1482/5773 [2:17:55<6:29:07, 5.44s/it] 26%|██▌ | 1482/5773 [2:18:00<6:29:06, 5.44s/it] {'loss': 0.5989, 'learning_rate': 1.7425563358743403e-05, 'epoch': 0.26} 26%|██▌ | 1482/5773 [2:18:00<6:29:06, 5.44s/it] {'loss': 0.5989, 'learning_rate': 1.7425563358743403e-05, 'epoch': 0.26} 26%|██▌ | 1482/5773 [2:17:55<6:29:07, 5.44s/it] 26%|██▌ | 1483/5773 [2:18:00<6:29:33, 5.45s/it] 26%|██▌ | 1483/5773 [2:18:06<6:29:33, 5.45s/it] {'loss': 0.5718, 'learning_rate': 1.7421804041658867e-05, 'epoch': 0.26} 26%|██▌ | 1483/5773 [2:18:06<6:29:33, 5.45s/it] {'loss': 0.5718, 'learning_rate': 1.7421804041658867e-05, 'epoch': 0.26} 26%|██▌ | 1483/5773 [2:18:00<6:29:33, 5.45s/it] 26%|██▌ | 1484/5773 [2:18:06<6:32:13, 5.49s/it] 26%|██▌ | 1484/5773 [2:18:11<6:32:13, 5.49s/it] {'loss': 0.5923, 'learning_rate': 1.741804238795329e-05, 'epoch': 0.26} 26%|██▌ | 1484/5773 [2:18:11<6:32:13, 5.49s/it] {'loss': 0.5923, 'learning_rate': 1.741804238795329e-05, 'epoch': 0.26} 26%|██▌ | 1484/5773 [2:18:06<6:32:13, 5.49s/it] 26%|██▌ | 1485/5773 [2:18:11<6:30:17, 5.46s/it] 26%|██▌ | 1485/5773 [2:18:17<6:30:17, 5.46s/it] {'loss': 0.5921, 'learning_rate': 1.7414278398810966e-05, 'epoch': 0.26} 26%|██▌ | 1485/5773 [2:18:17<6:30:17, 5.46s/it] {'loss': 0.5921, 'learning_rate': 1.7414278398810966e-05, 
'epoch': 0.26} 26%|██▌ | 1485/5773 [2:18:11<6:30:17, 5.46s/it] 26%|██▌ | 1486/5773 [2:18:17<6:29:11, 5.45s/it] 26%|██▌ | 1486/5773 [2:18:22<6:29:11, 5.45s/it] {'loss': 0.5957, 'learning_rate': 1.7410512075416915e-05, 'epoch': 0.26} 26%|██▌ | 1486/5773 [2:18:22<6:29:11, 5.45s/it] {'loss': 0.5957, 'learning_rate': 1.7410512075416915e-05, 'epoch': 0.26} 26%|██▌ | 1486/5773 [2:18:17<6:29:11, 5.45s/it] 26%|██▌ | 1487/5773 [2:18:22<6:30:21, 5.46s/it] 26%|██▌ | 1487/5773 [2:18:28<6:30:21, 5.46s/it] {'loss': 0.5919, 'learning_rate': 1.74067434189569e-05, 'epoch': 0.26} 26%|██▌ | 1487/5773 [2:18:28<6:30:21, 5.46s/it] {'loss': 0.5919, 'learning_rate': 1.74067434189569e-05, 'epoch': 0.26} 26%|██▌ | 1487/5773 [2:18:22<6:30:21, 5.46s/it] 26%|██▌ | 1488/5773 [2:18:27<6:27:05, 5.42s/it] 26%|██▌ | 1488/5773 [2:18:33<6:27:07, 5.42s/it] {'loss': 0.5953, 'learning_rate': 1.7402972430617414e-05, 'epoch': 0.26} 26%|██▌ | 1488/5773 [2:18:33<6:27:07, 5.42s/it] {'loss': 0.5953, 'learning_rate': 1.7402972430617414e-05, 'epoch': 0.26} 26%|██▌ | 1488/5773 [2:18:27<6:27:05, 5.42s/it] 26%|██▌ | 1489/5773 [2:18:33<6:25:13, 5.40s/it] 26%|██▌ | 1489/5773 [2:18:38<6:25:12, 5.39s/it] {'loss': 0.5828, 'learning_rate': 1.7399199111585683e-05, 'epoch': 0.26} 26%|██▌ | 1489/5773 [2:18:38<6:25:12, 5.39s/it] {'loss': 0.5828, 'learning_rate': 1.7399199111585683e-05, 'epoch': 0.26} 26%|██▌ | 1489/5773 [2:18:33<6:25:13, 5.40s/it] 26%|██▌ | 1490/5773 [2:18:38<6:26:45, 5.42s/it] 26%|██▌ | 1490/5773 [2:18:44<6:26:45, 5.42s/it] {'loss': 0.5947, 'learning_rate': 1.739542346304967e-05, 'epoch': 0.26} 26%|██▌ | 1490/5773 [2:18:44<6:26:45, 5.42s/it] {'loss': 0.5947, 'learning_rate': 1.739542346304967e-05, 'epoch': 0.26} 26%|██▌ | 1490/5773 [2:18:38<6:26:45, 5.42s/it] 26%|██▌ | 1491/5773 [2:18:44<6:29:31, 5.46s/it] 26%|██▌ | 1491/5773 [2:18:49<6:29:30, 5.46s/it] {'loss': 0.5756, 'learning_rate': 1.7391645486198068e-05, 'epoch': 0.26} 26%|██▌ | 1491/5773 [2:18:49<6:29:30, 5.46s/it] {'loss': 0.5756, 'learning_rate': 
1.7391645486198068e-05, 'epoch': 0.26} 26%|██▌ | 1491/5773 [2:18:44<6:29:31, 5.46s/it] 26%|██▌ | 1492/5773 [2:18:49<6:34:16, 5.53s/it] 26%|██▌ | 1492/5773 [2:18:55<6:34:17, 5.53s/it] {'loss': 0.5868, 'learning_rate': 1.7387865182220305e-05, 'epoch': 0.26} 26%|██▌ | 1492/5773 [2:18:55<6:34:17, 5.53s/it] {'loss': 0.5868, 'learning_rate': 1.7387865182220305e-05, 'epoch': 0.26} 26%|██▌ | 1492/5773 [2:18:50<6:34:16, 5.53s/it] 26%|██▌ | 1493/5773 [2:18:55<6:33:31, 5.52s/it] 26%|██▌ | 1493/5773 [2:19:01<6:33:31, 5.52s/it] {'loss': 0.5955, 'learning_rate': 1.738408255230654e-05, 'epoch': 0.26} 26%|██▌ | 1493/5773 [2:19:01<6:33:31, 5.52s/it] {'loss': 0.5955, 'learning_rate': 1.738408255230654e-05, 'epoch': 0.26} 26%|██▌ | 1493/5773 [2:18:55<6:33:31, 5.52s/it] 26%|██▌ | 1494/5773 [2:19:00<6:30:28, 5.48s/it] 26%|██▌ | 1494/5773 [2:19:06<6:30:28, 5.48s/it] {'loss': 0.5894, 'learning_rate': 1.738029759764767e-05, 'epoch': 0.26} 26%|██▌ | 1494/5773 [2:19:06<6:30:28, 5.48s/it] {'loss': 0.5894, 'learning_rate': 1.738029759764767e-05, 'epoch': 0.26} 26%|██▌ | 1494/5773 [2:19:00<6:30:28, 5.48s/it] 26%|██▌ | 1495/5773 [2:19:06<6:26:09, 5.42s/it] 26%|██▌ | 1495/5773 [2:19:11<6:26:08, 5.42s/it] {'loss': 0.5756, 'learning_rate': 1.7376510319435314e-05, 'epoch': 0.26} 26%|██▌ | 1495/5773 [2:19:11<6:26:08, 5.42s/it] {'loss': 0.5756, 'learning_rate': 1.7376510319435314e-05, 'epoch': 0.26} 26%|██▌ | 1495/5773 [2:19:06<6:26:09, 5.42s/it] 26%|██▌ | 1496/5773 [2:19:11<6:28:56, 5.46s/it] 26%|██▌ | 1496/5773 [2:19:17<6:28:55, 5.46s/it] {'loss': 0.5911, 'learning_rate': 1.7372720718861838e-05, 'epoch': 0.26} 26%|██▌ | 1496/5773 [2:19:17<6:28:55, 5.46s/it] {'loss': 0.5911, 'learning_rate': 1.7372720718861838e-05, 'epoch': 0.26} 26%|██▌ | 1496/5773 [2:19:11<6:28:56, 5.46s/it] 26%|██▌ | 1497/5773 [2:19:17<6:27:58, 5.44s/it] 26%|██▌ | 1497/5773 [2:19:22<6:27:58, 5.44s/it] {'loss': 0.6076, 'learning_rate': 1.736892879712032e-05, 'epoch': 0.26} 26%|██▌ | 1497/5773 [2:19:22<6:27:58, 5.44s/it] {'loss': 
0.6076, 'learning_rate': 1.736892879712032e-05, 'epoch': 0.26} 26%|██▌ | 1497/5773 [2:19:17<6:27:58, 5.44s/it] 26%|██▌ | 1498/5773 [2:19:22<6:22:31, 5.37s/it] 26%|██▌ | 1498/5773 [2:19:27<6:22:32, 5.37s/it] {'loss': 0.5948, 'learning_rate': 1.736513455540458e-05, 'epoch': 0.26} 26%|██▌ | 1498/5773 [2:19:27<6:22:32, 5.37s/it] {'loss': 0.5948, 'learning_rate': 1.736513455540458e-05, 'epoch': 0.26} 26%|██▌ | 1498/5773 [2:19:22<6:22:31, 5.37s/it] 26%|██▌ | 1499/5773 [2:19:27<6:23:30, 5.38s/it] 26%|██▌ | 1499/5773 [2:19:33<6:23:30, 5.38s/it] {'loss': 0.5669, 'learning_rate': 1.7361337994909166e-05, 'epoch': 0.26} 26%|██▌ | 1499/5773 [2:19:33<6:23:30, 5.38s/it] {'loss': 0.5669, 'learning_rate': 1.7361337994909166e-05, 'epoch': 0.26} 26%|██▌ | 1499/5773 [2:19:27<6:23:30, 5.38s/it] 3 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 26%|██▌ | 1500/5773 [2:19:33<6:26:58, 5.43s/it] 4 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 26%|██▌ | 1500/5773 [2:19:38<6:26:58, 5.43s/it] 11 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5829, 'learning_rate': 1.735753911682936e-05, 'epoch': 0.26} 26%|██▌ | 1500/5773 [2:19:38<6:26:58, 5.43s/it] {'loss': 0.5829, 'learning_rate': 1.735753911682936e-05, 'epoch': 0.26} 26%|██▌ | 1500/5773 [2:19:33<6:26:58, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 26%|██▌ | 1501/5773 [2:19:51<11:09:45, 9.41s/it] 26%|██▌ | 1501/5773 [2:19:57<11:09:45, 9.41s/it] {'loss': 0.5809, 'learning_rate': 1.735373792236117e-05, 'epoch': 0.26} 26%|██▌ | 1501/5773 [2:19:57<11:09:45, 9.41s/it] {'loss': 0.5809, 'learning_rate': 1.735373792236117e-05, 'epoch': 0.26} 26%|██▌ | 1501/5773 [2:19:51<11:09:45, 9.41s/it] 26%|██▌ | 1502/5773 [2:19:57<9:45:35, 8.23s/it] 26%|██▌ | 1502/5773 [2:20:02<9:45:36, 8.23s/it] {'loss': 0.5921, 'learning_rate': 1.7349934412701328e-05, 'epoch': 0.26} 26%|██▌ | 1502/5773 [2:20:02<9:45:36, 8.23s/it] {'loss': 0.5921, 'learning_rate': 1.7349934412701328e-05, 'epoch': 0.26} 26%|██▌ | 1502/5773 [2:19:57<9:45:35, 8.23s/it] 26%|██▌ | 1503/5773 [2:20:02<8:46:10, 7.39s/it] 26%|██▌ | 1503/5773 [2:20:08<8:46:10, 7.39s/it] {'loss': 0.5824, 'learning_rate': 1.7346128589047308e-05, 'epoch': 0.26} 26%|██▌ | 1503/5773 [2:20:08<8:46:10, 7.39s/it] {'loss': 0.5824, 'learning_rate': 1.7346128589047308e-05, 'epoch': 0.26} 26%|██▌ | 1503/5773 [2:20:02<8:46:10, 7.39s/it] 26%|██▌ | 1504/5773 [2:20:08<8:04:51, 6.81s/it] 26%|██▌ | 1504/5773 [2:20:13<8:04:51, 6.81s/it] {'loss': 0.5818, 'learning_rate': 1.7342320452597296e-05, 'epoch': 0.26} 26%|██▌ | 1504/5773 [2:20:13<8:04:51, 6.81s/it] {'loss': 0.5818, 'learning_rate': 1.7342320452597296e-05, 'epoch': 0.26} 26%|██▌ | 1504/5773 [2:20:08<8:04:51, 6.81s/it] 26%|██▌ | 1505/5773 [2:20:13<7:32:43, 6.36s/it] 26%|██▌ | 1505/5773 [2:20:19<7:32:44, 6.36s/it] {'loss': 0.5845, 'learning_rate': 1.7338510004550225e-05, 'epoch': 0.26} 26%|██▌ | 1505/5773 [2:20:19<7:32:44, 6.36s/it] {'loss': 0.5845, 'learning_rate': 1.7338510004550225e-05, 'epoch': 0.26} 26%|██▌ | 1505/5773 [2:20:13<7:32:43, 6.36s/it] 26%|██▌ | 1506/5773 [2:20:18<7:10:47, 6.06s/it] 26%|██▌ | 1506/5773 [2:20:24<7:10:47, 6.06s/it] {'loss': 0.5969, 'learning_rate': 1.7334697246105738e-05, 'epoch': 0.26} 26%|██▌ | 1506/5773 [2:20:24<7:10:47, 6.06s/it] {'loss': 0.5969, 'learning_rate': 1.7334697246105738e-05, 'epoch': 0.26} 
26%|██▌ | 1506/5773 [2:20:18<7:10:47, 6.06s/it] 26%|██▌ | 1507/5773 [2:20:24<7:05:34, 5.99s/it] 26%|██▌ | 1507/5773 [2:20:30<7:05:34, 5.99s/it] {'loss': 0.5735, 'learning_rate': 1.7330882178464215e-05, 'epoch': 0.26} 26%|██▌ | 1507/5773 [2:20:30<7:05:34, 5.99s/it] {'loss': 0.5735, 'learning_rate': 1.7330882178464215e-05, 'epoch': 0.26} 26%|██▌ | 1507/5773 [2:20:24<7:05:34, 5.99s/it] 26%|██▌ | 1508/5773 [2:20:30<6:51:26, 5.79s/it] 26%|██▌ | 1508/5773 [2:20:35<6:51:26, 5.79s/it] {'loss': 0.6134, 'learning_rate': 1.7327064802826762e-05, 'epoch': 0.26} 26%|██▌ | 1508/5773 [2:20:35<6:51:26, 5.79s/it] {'loss': 0.6134, 'learning_rate': 1.7327064802826762e-05, 'epoch': 0.26} 26%|██▌ | 1508/5773 [2:20:30<6:51:26, 5.79s/it] 26%|██▌ | 1509/5773 [2:20:35<6:41:57, 5.66s/it] 26%|██▌ | 1509/5773 [2:20:41<6:41:57, 5.66s/it] {'loss': 0.5739, 'learning_rate': 1.7323245120395217e-05, 'epoch': 0.26} 26%|██▌ | 1509/5773 [2:20:41<6:41:57, 5.66s/it] {'loss': 0.5739, 'learning_rate': 1.7323245120395217e-05, 'epoch': 0.26} 26%|██▌ | 1509/5773 [2:20:35<6:41:57, 5.66s/it] 26%|██▌ | 1510/5773 [2:20:40<6:35:58, 5.57s/it] 26%|██▌ | 1510/5773 [2:20:46<6:35:57, 5.57s/it] {'loss': 0.5893, 'learning_rate': 1.7319423132372125e-05, 'epoch': 0.26} 26%|██▌ | 1510/5773 [2:20:46<6:35:57, 5.57s/it] {'loss': 0.5893, 'learning_rate': 1.7319423132372125e-05, 'epoch': 0.26} 26%|██▌ | 1510/5773 [2:20:40<6:35:58, 5.57s/it] 26%|██▌ | 1511/5773 [2:20:46<6:36:22, 5.58s/it] 26%|██▌ | 1511/5773 [2:20:51<6:36:22, 5.58s/it] {'loss': 0.5848, 'learning_rate': 1.731559883996078e-05, 'epoch': 0.26} 26%|██▌ | 1511/5773 [2:20:52<6:36:22, 5.58s/it] {'loss': 0.5848, 'learning_rate': 1.731559883996078e-05, 'epoch': 0.26} 26%|██▌ | 1511/5773 [2:20:46<6:36:22, 5.58s/it] 26%|██▌ | 1512/5773 [2:20:51<6:32:56, 5.53s/it] 26%|██▌ | 1512/5773 [2:20:57<6:32:57, 5.53s/it] {'loss': 0.5999, 'learning_rate': 1.731177224436519e-05, 'epoch': 0.26} 26%|██▌ | 1512/5773 [2:20:57<6:32:57, 5.53s/it] {'loss': 0.5999, 'learning_rate': 
1.731177224436519e-05, 'epoch': 0.26} 26%|██▌ | 1512/5773 [2:20:51<6:32:56, 5.53s/it] 26%|██▌ | 1513/5773 [2:20:57<6:39:01, 5.62s/it] 26%|██▌ | 1513/5773 [2:21:03<6:39:01, 5.62s/it] {'loss': 0.5815, 'learning_rate': 1.7307943346790086e-05, 'epoch': 0.26} 26%|██▌ | 1513/5773 [2:21:03<6:39:01, 5.62s/it] {'loss': 0.5815, 'learning_rate': 1.7307943346790086e-05, 'epoch': 0.26} 26%|██▌ | 1513/5773 [2:20:57<6:39:01, 5.62s/it] 26%|██▌ | 1514/5773 [2:21:03<6:35:11, 5.57s/it] 26%|██▌ | 1514/5773 [2:21:08<6:35:11, 5.57s/it] {'loss': 0.5955, 'learning_rate': 1.7304112148440927e-05, 'epoch': 0.26} 26%|██▌ | 1514/5773 [2:21:08<6:35:11, 5.57s/it]{'loss': 0.5955, 'learning_rate': 1.7304112148440927e-05, 'epoch': 0.26} 26%|██▌ | 1514/5773 [2:21:03<6:35:11, 5.57s/it] 26%|██▌ | 1515/5773 [2:21:08<6:30:51, 5.51s/it] 26%|██▌ | 1515/5773 [2:21:14<6:30:51, 5.51s/it] {'loss': 0.5715, 'learning_rate': 1.73002786505239e-05, 'epoch': 0.26} 26%|██▌ | 1515/5773 [2:21:14<6:30:51, 5.51s/it] {'loss': 0.5715, 'learning_rate': 1.73002786505239e-05, 'epoch': 0.26} 26%|██▌ | 1515/5773 [2:21:08<6:30:51, 5.51s/it] 26%|██▋ | 1516/5773 [2:21:14<6:31:30, 5.52s/it] 26%|██▋ | 1516/5773 [2:21:19<6:31:30, 5.52s/it] {'loss': 0.5869, 'learning_rate': 1.7296442854245915e-05, 'epoch': 0.26} 26%|██▋ | 1516/5773 [2:21:19<6:31:30, 5.52s/it] {'loss': 0.5869, 'learning_rate': 1.7296442854245915e-05, 'epoch': 0.26} 26%|██▋ | 1516/5773 [2:21:14<6:31:30, 5.52s/it] 26%|██▋ | 1517/5773 [2:21:19<6:31:29, 5.52s/it] 26%|██▋ | 1517/5773 [2:21:25<6:31:29, 5.52s/it] {'loss': 0.595, 'learning_rate': 1.7292604760814597e-05, 'epoch': 0.26} 26%|██▋ | 1517/5773 [2:21:25<6:31:29, 5.52s/it] {'loss': 0.595, 'learning_rate': 1.7292604760814597e-05, 'epoch': 0.26} 26%|██▋ | 1517/5773 [2:21:19<6:31:29, 5.52s/it] 26%|██▋ | 1518/5773 [2:21:25<6:30:14, 5.50s/it] 26%|██▋ | 1518/5773 [2:21:30<6:30:14, 5.50s/it] {'loss': 0.5833, 'learning_rate': 1.7288764371438304e-05, 'epoch': 0.26} 26%|██▋ | 1518/5773 [2:21:30<6:30:14, 5.50s/it] {'loss': 
0.5833, 'learning_rate': 1.7288764371438304e-05, 'epoch': 0.26} 26%|██▋ | 1518/5773 [2:21:25<6:30:14, 5.50s/it] 26%|██▋ | 1519/5773 [2:21:30<6:28:55, 5.49s/it] 26%|██▋ | 1519/5773 [2:21:36<6:28:55, 5.49s/it] {'loss': 0.5843, 'learning_rate': 1.728492168732611e-05, 'epoch': 0.26} 26%|██▋ | 1519/5773 [2:21:36<6:28:55, 5.49s/it] {'loss': 0.5843, 'learning_rate': 1.728492168732611e-05, 'epoch': 0.26} 26%|██▋ | 1519/5773 [2:21:30<6:28:55, 5.49s/it] 26%|██▋ | 1520/5773 [2:21:35<6:27:11, 5.46s/it] 26%|██▋ | 1520/5773 [2:21:41<6:27:11, 5.46s/it] {'loss': 0.5814, 'learning_rate': 1.728107670968782e-05, 'epoch': 0.26} 26%|██▋ | 1520/5773 [2:21:41<6:27:11, 5.46s/it] {'loss': 0.5814, 'learning_rate': 1.728107670968782e-05, 'epoch': 0.26} 26%|██▋ | 1520/5773 [2:21:35<6:27:11, 5.46s/it] 26%|██▋ | 1521/5773 [2:21:41<6:24:53, 5.43s/it] 26%|██▋ | 1521/5773 [2:21:46<6:24:53, 5.43s/it] {'loss': 0.5697, 'learning_rate': 1.727722943973395e-05, 'epoch': 0.26} 26%|██▋ | 1521/5773 [2:21:46<6:24:53, 5.43s/it] {'loss': 0.5697, 'learning_rate': 1.727722943973395e-05, 'epoch': 0.26} 26%|██▋ | 1521/5773 [2:21:41<6:24:53, 5.43s/it] 26%|██▋ | 1522/5773 [2:21:46<6:23:43, 5.42s/it] 26%|██▋ | 1522/5773 [2:21:52<6:23:43, 5.42s/it] {'loss': 0.5683, 'learning_rate': 1.7273379878675752e-05, 'epoch': 0.26} 26%|██▋ | 1522/5773 [2:21:52<6:23:43, 5.42s/it] {'loss': 0.5683, 'learning_rate': 1.7273379878675752e-05, 'epoch': 0.26} 26%|██▋ | 1522/5773 [2:21:46<6:23:43, 5.42s/it] 26%|██▋ | 1523/5773 [2:21:52<6:22:32, 5.40s/it] 26%|██▋ | 1523/5773 [2:21:57<6:22:32, 5.40s/it] {'loss': 0.5748, 'learning_rate': 1.7269528027725182e-05, 'epoch': 0.26} 26%|██▋ | 1523/5773 [2:21:57<6:22:32, 5.40s/it] {'loss': 0.5748, 'learning_rate': 1.7269528027725182e-05, 'epoch': 0.26} 26%|██▋ | 1523/5773 [2:21:52<6:22:32, 5.40s/it] 26%|██▋ | 1524/5773 [2:21:57<6:27:48, 5.48s/it] 26%|██▋ | 1524/5773 [2:22:03<6:27:48, 5.48s/it] {'loss': 0.5874, 'learning_rate': 1.726567388809493e-05, 'epoch': 0.26} 26%|██▋ | 1524/5773 
[2:22:03<6:27:48, 5.48s/it] {'loss': 0.5874, 'learning_rate': 1.726567388809493e-05, 'epoch': 0.26} 26%|██▋ | 1524/5773 [2:21:57<6:27:48, 5.48s/it] 26%|██▋ | 1525/5773 [2:22:02<6:23:47, 5.42s/it] 26%|██▋ | 1525/5773 [2:22:08<6:23:46, 5.42s/it] {'loss': 0.6045, 'learning_rate': 1.7261817460998398e-05, 'epoch': 0.26} 26%|██▋ | 1525/5773 [2:22:08<6:23:46, 5.42s/it] {'loss': 0.6045, 'learning_rate': 1.7261817460998398e-05, 'epoch': 0.26} 26%|██▋ | 1525/5773 [2:22:02<6:23:47, 5.42s/it] 26%|██▋ | 1526/5773 [2:22:08<6:27:18, 5.47s/it] 26%|██▋ | 1526/5773 [2:22:14<6:27:18, 5.47s/it] {'loss': 0.5891, 'learning_rate': 1.725795874764972e-05, 'epoch': 0.26} 26%|██▋ | 1526/5773 [2:22:14<6:27:18, 5.47s/it] {'loss': 0.5891, 'learning_rate': 1.725795874764972e-05, 'epoch': 0.26} 26%|██▋ | 1526/5773 [2:22:08<6:27:18, 5.47s/it] 26%|██▋ | 1527/5773 [2:22:13<6:22:57, 5.41s/it] 26%|██▋ | 1527/5773 [2:22:19<6:22:57, 5.41s/it] {'loss': 0.5916, 'learning_rate': 1.7254097749263735e-05, 'epoch': 0.26} 26%|██▋ | 1527/5773 [2:22:19<6:22:57, 5.41s/it] {'loss': 0.5916, 'learning_rate': 1.7254097749263735e-05, 'epoch': 0.26} 26%|██▋ | 1527/5773 [2:22:13<6:22:57, 5.41s/it] 26%|██▋ | 1528/5773 [2:22:19<6:23:50, 5.43s/it] 26%|██▋ | 1528/5773 [2:22:24<6:23:50, 5.43s/it] {'loss': 0.6018, 'learning_rate': 1.725023446705601e-05, 'epoch': 0.26} 26%|██▋ | 1528/5773 [2:22:24<6:23:50, 5.43s/it] {'loss': 0.6018, 'learning_rate': 1.725023446705601e-05, 'epoch': 0.26} 26%|██▋ | 1528/5773 [2:22:19<6:23:50, 5.43s/it] 26%|██▋ | 1529/5773 [2:22:24<6:26:16, 5.46s/it] 26%|██▋ | 1529/5773 [2:22:30<6:26:16, 5.46s/it] {'loss': 0.5781, 'learning_rate': 1.7246368902242837e-05, 'epoch': 0.26} 26%|██▋ | 1529/5773 [2:22:30<6:26:16, 5.46s/it] {'loss': 0.5781, 'learning_rate': 1.7246368902242837e-05, 'epoch': 0.26} 26%|██▋ | 1529/5773 [2:22:24<6:26:16, 5.46s/it] 27%|██▋ | 1530/5773 [2:22:30<6:24:06, 5.43s/it] 27%|██▋ | 1530/5773 [2:22:35<6:24:06, 5.43s/it] {'loss': 0.5883, 'learning_rate': 1.724250105604121e-05, 'epoch': 
0.27} 27%|██▋ | 1530/5773 [2:22:35<6:24:06, 5.43s/it] {'loss': 0.5883, 'learning_rate': 1.724250105604121e-05, 'epoch': 0.27} 27%|██▋ | 1530/5773 [2:22:30<6:24:06, 5.43s/it] 27%|██▋ | 1531/5773 [2:22:35<6:25:44, 5.46s/it] 27%|██▋ | 1531/5773 [2:22:41<6:25:44, 5.46s/it] {'loss': 0.5667, 'learning_rate': 1.7238630929668855e-05, 'epoch': 0.27} 27%|██▋ | 1531/5773 [2:22:41<6:25:44, 5.46s/it] {'loss': 0.5667, 'learning_rate': 1.7238630929668855e-05, 'epoch': 0.27} 27%|██▋ | 1531/5773 [2:22:35<6:25:44, 5.46s/it] 27%|██▋ | 1532/5773 [2:22:41<6:27:15, 5.48s/it] 27%|██▋ | 1532/5773 [2:22:46<6:27:14, 5.48s/it] {'loss': 0.5925, 'learning_rate': 1.723475852434421e-05, 'epoch': 0.27} 27%|██▋ | 1532/5773 [2:22:46<6:27:14, 5.48s/it] {'loss': 0.5925, 'learning_rate': 1.723475852434421e-05, 'epoch': 0.27} 27%|██▋ | 1532/5773 [2:22:41<6:27:15, 5.48s/it] 27%|██▋ | 1533/5773 [2:22:46<6:28:10, 5.49s/it] 27%|██▋ | 1533/5773 [2:22:52<6:28:10, 5.49s/it] {'loss': 0.5993, 'learning_rate': 1.7230883841286433e-05, 'epoch': 0.27} 27%|██▋ | 1533/5773 [2:22:52<6:28:10, 5.49s/it] {'loss': 0.5993, 'learning_rate': 1.7230883841286433e-05, 'epoch': 0.27} 27%|██▋ | 1533/5773 [2:22:46<6:28:10, 5.49s/it] 27%|██▋ | 1534/5773 [2:22:52<6:25:14, 5.45s/it] 27%|██▋ | 1534/5773 [2:22:57<6:25:13, 5.45s/it] {'loss': 0.5759, 'learning_rate': 1.72270068817154e-05, 'epoch': 0.27} 27%|██▋ | 1534/5773 [2:22:57<6:25:13, 5.45s/it] {'loss': 0.5759, 'learning_rate': 1.72270068817154e-05, 'epoch': 0.27} 27%|██▋ | 1534/5773 [2:22:52<6:25:14, 5.45s/it] 27%|██▋ | 1535/5773 [2:22:57<6:25:06, 5.45s/it] 27%|██▋ | 1535/5773 [2:23:03<6:25:06, 5.45s/it] {'loss': 0.5861, 'learning_rate': 1.7223127646851698e-05, 'epoch': 0.27} 27%|██▋ | 1535/5773 [2:23:03<6:25:06, 5.45s/it] {'loss': 0.5861, 'learning_rate': 1.7223127646851698e-05, 'epoch': 0.27} 27%|██▋ | 1535/5773 [2:22:57<6:25:06, 5.45s/it] 27%|██▋ | 1536/5773 [2:23:02<6:24:06, 5.44s/it] 27%|██▋ | 1536/5773 [2:23:08<6:24:07, 5.44s/it] {'loss': 0.584, 'learning_rate': 
1.721924613791663e-05, 'epoch': 0.27} 27%|██▋ | 1536/5773 [2:23:08<6:24:07, 5.44s/it] {'loss': 0.584, 'learning_rate': 1.721924613791663e-05, 'epoch': 0.27} 27%|██▋ | 1536/5773 [2:23:02<6:24:06, 5.44s/it] 27%|██▋ | 1537/5773 [2:23:08<6:28:33, 5.50s/it] 27%|██▋ | 1537/5773 [2:23:14<6:28:33, 5.50s/it] {'loss': 0.5978, 'learning_rate': 1.7215362356132232e-05, 'epoch': 0.27} 27%|██▋ | 1537/5773 [2:23:14<6:28:33, 5.50s/it] {'loss': 0.5978, 'learning_rate': 1.7215362356132232e-05, 'epoch': 0.27} 27%|██▋ | 1537/5773 [2:23:08<6:28:33, 5.50s/it] 27%|██▋ | 1538/5773 [2:23:14<6:27:16, 5.49s/it] 27%|██▋ | 1538/5773 [2:23:19<6:27:16, 5.49s/it] {'loss': 0.5978, 'learning_rate': 1.721147630272123e-05, 'epoch': 0.27} 27%|██▋ | 1538/5773 [2:23:19<6:27:16, 5.49s/it] {'loss': 0.5978, 'learning_rate': 1.721147630272123e-05, 'epoch': 0.27} 27%|██▋ | 1538/5773 [2:23:14<6:27:16, 5.49s/it] 27%|██▋ | 1539/5773 [2:23:19<6:27:12, 5.49s/it] 27%|██▋ | 1539/5773 [2:23:25<6:27:12, 5.49s/it] {'loss': 0.5749, 'learning_rate': 1.7207587978907085e-05, 'epoch': 0.27} 27%|██▋ | 1539/5773 [2:23:25<6:27:12, 5.49s/it] {'loss': 0.5749, 'learning_rate': 1.7207587978907085e-05, 'epoch': 0.27} 27%|██▋ | 1539/5773 [2:23:19<6:27:12, 5.49s/it] 27%|██▋ | 1540/5773 [2:23:25<6:27:24, 5.49s/it] 27%|██▋ | 1540/5773 [2:23:30<6:27:24, 5.49s/it] {'loss': 0.5842, 'learning_rate': 1.7203697385913965e-05, 'epoch': 0.27} 27%|██▋ | 1540/5773 [2:23:30<6:27:24, 5.49s/it] {'loss': 0.5842, 'learning_rate': 1.7203697385913965e-05, 'epoch': 0.27} 27%|██▋ | 1540/5773 [2:23:25<6:27:24, 5.49s/it] 27%|██▋ | 1541/5773 [2:23:30<6:29:31, 5.52s/it] 27%|██▋ | 1541/5773 [2:23:36<6:29:31, 5.52s/it] {'loss': 0.5952, 'learning_rate': 1.719980452496675e-05, 'epoch': 0.27} 27%|██▋ | 1541/5773 [2:23:36<6:29:31, 5.52s/it] {'loss': 0.5952, 'learning_rate': 1.719980452496675e-05, 'epoch': 0.27} 27%|██▋ | 1541/5773 [2:23:30<6:29:31, 5.52s/it] 27%|██▋ | 1542/5773 [2:23:41<6:30:33, 5.54s/it] 27%|██▋ | 1542/5773 [2:23:36<6:30:37, 5.54s/it] {'loss': 
0.6013, 'learning_rate': 1.7195909397291036e-05, 'epoch': 0.27} 27%|██▋ | 1542/5773 [2:23:41<6:30:33, 5.54s/it] {'loss': 0.6013, 'learning_rate': 1.7195909397291036e-05, 'epoch': 0.27} 27%|██▋ | 1542/5773 [2:23:36<6:30:37, 5.54s/it] 27%|██▋ | 1543/5773 [2:23:41<6:31:42, 5.56s/it] 27%|██▋ | 1543/5773 [2:23:47<6:31:43, 5.56s/it] {'loss': 0.5732, 'learning_rate': 1.7192012004113138e-05, 'epoch': 0.27} 27%|██▋ | 1543/5773 [2:23:47<6:31:43, 5.56s/it] {'loss': 0.5732, 'learning_rate': 1.7192012004113138e-05, 'epoch': 0.27} 27%|██▋ | 1543/5773 [2:23:41<6:31:42, 5.56s/it] 27%|██▋ | 1544/5773 [2:23:47<6:29:34, 5.53s/it] 27%|██▋ | 1544/5773 [2:23:52<6:29:35, 5.53s/it] {'loss': 0.6002, 'learning_rate': 1.7188112346660077e-05, 'epoch': 0.27} 27%|██▋ | 1544/5773 [2:23:52<6:29:35, 5.53s/it] {'loss': 0.6002, 'learning_rate': 1.7188112346660077e-05, 'epoch': 0.27} 27%|██▋ | 1544/5773 [2:23:47<6:29:34, 5.53s/it] 27%|██▋ | 1545/5773 [2:23:52<6:26:01, 5.48s/it] 27%|██▋ | 1545/5773 [2:23:58<6:26:02, 5.48s/it] {'loss': 0.5867, 'learning_rate': 1.7184210426159594e-05, 'epoch': 0.27} 27%|██▋ | 1545/5773 [2:23:58<6:26:02, 5.48s/it] {'loss': 0.5867, 'learning_rate': 1.7184210426159594e-05, 'epoch': 0.27} 27%|██▋ | 1545/5773 [2:23:52<6:26:01, 5.48s/it] 27%|██▋ | 1546/5773 [2:23:57<6:22:10, 5.42s/it] 27%|██▋ | 1546/5773 [2:24:03<6:22:10, 5.42s/it] {'loss': 0.5922, 'learning_rate': 1.7180306243840133e-05, 'epoch': 0.27} 27%|██▋ | 1546/5773 [2:24:03<6:22:10, 5.42s/it] {'loss': 0.5922, 'learning_rate': 1.7180306243840133e-05, 'epoch': 0.27} 27%|██▋ | 1546/5773 [2:23:57<6:22:10, 5.42s/it] 27%|██▋ | 1547/5773 [2:24:03<6:23:04, 5.44s/it] 27%|██▋ | 1547/5773 [2:24:08<6:23:04, 5.44s/it] {'loss': 0.5865, 'learning_rate': 1.7176399800930852e-05, 'epoch': 0.27} 27%|██▋ | 1547/5773 [2:24:08<6:23:04, 5.44s/it] {'loss': 0.5865, 'learning_rate': 1.7176399800930852e-05, 'epoch': 0.27} 27%|██▋ | 1547/5773 [2:24:03<6:23:04, 5.44s/it] 27%|██▋ | 1548/5773 [2:24:14<6:22:18, 5.43s/it] 27%|██▋ | 1548/5773 
[2:24:08<6:22:20, 5.43s/it] {'loss': 0.6037, 'learning_rate': 1.7172491098661636e-05, 'epoch': 0.27} 27%|██▋ | 1548/5773 [2:24:14<6:22:18, 5.43s/it] {'loss': 0.6037, 'learning_rate': 1.7172491098661636e-05, 'epoch': 0.27} 27%|██▋ | 1548/5773 [2:24:08<6:22:20, 5.43s/it] 27%|██▋ | 1549/5773 [2:24:14<6:22:50, 5.44s/it] 27%|██▋ | 1549/5773 [2:24:19<6:22:51, 5.44s/it] {'loss': 0.5733, 'learning_rate': 1.7168580138263064e-05, 'epoch': 0.27} 27%|██▋ | 1549/5773 [2:24:19<6:22:51, 5.44s/it] {'loss': 0.5733, 'learning_rate': 1.7168580138263064e-05, 'epoch': 0.27} 27%|██▋ | 1549/5773 [2:24:14<6:22:50, 5.44s/it]15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 37 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 27%|██▋ | 1550/5773 [2:24:19<6:27:55, 5.51s/it]5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 2 6AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 
27%|██▋ | 1550/5773 [2:24:25<6:27:56, 5.51s/it] {'loss': 0.598, 'learning_rate': 1.7164666920966424e-05, 'epoch': 0.27} 27%|██▋ | 1550/5773 [2:24:25<6:27:56, 5.51s/it] {'loss': 0.598, 'learning_rate': 1.7164666920966424e-05, 'epoch': 0.27} 27%|██▋ | 1550/5773 [2:24:19<6:27:55, 5.51s/it] 27%|██▋ | 1551/5773 [2:24:25<6:28:28, 5.52s/it] 27%|██▋ | 1551/5773 [2:24:31<6:28:28, 5.52s/it] {'loss': 0.5701, 'learning_rate': 1.716075144800373e-05, 'epoch': 0.27} 27%|██▋ | 1551/5773 [2:24:31<6:28:28, 5.52s/it] {'loss': 0.5701, 'learning_rate': 1.716075144800373e-05, 'epoch': 0.27} 27%|██▋ | 1551/5773 [2:24:25<6:28:28, 5.52s/it] 27%|██▋ | 1552/5773 [2:24:31<6:28:46, 5.53s/it] 27%|██▋ | 1552/5773 [2:24:36<6:28:46, 5.53s/it] {'loss': 0.5943, 'learning_rate': 1.7156833720607698e-05, 'epoch': 0.27} 27%|██▋ | 1552/5773 [2:24:36<6:28:46, 5.53s/it] {'loss': 0.5943, 'learning_rate': 1.7156833720607698e-05, 'epoch': 0.27} 27%|██▋ | 1552/5773 [2:24:31<6:28:46, 5.53s/it] 27%|██▋ | 1553/5773 [2:24:36<6:29:56, 5.54s/it] 27%|██▋ | 1553/5773 [2:24:42<6:29:58, 5.54s/it] {'loss': 0.5963, 'learning_rate': 1.7152913740011746e-05, 'epoch': 0.27} 27%|██▋ | 1553/5773 [2:24:42<6:29:58, 5.54s/it] {'loss': 0.5963, 'learning_rate': 1.7152913740011746e-05, 'epoch': 0.27} 27%|██▋ | 1553/5773 [2:24:36<6:29:56, 5.54s/it] 27%|██▋ | 1554/5773 [2:24:42<6:27:17, 5.51s/it] 27%|██▋ | 1554/5773 [2:24:47<6:27:17, 5.51s/it] {'loss': 0.5692, 'learning_rate': 1.714899150745002e-05, 'epoch': 0.27} 27%|██▋ | 1554/5773 [2:24:47<6:27:17, 5.51s/it] {'loss': 0.5692, 'learning_rate': 1.714899150745002e-05, 'epoch': 0.27} 27%|██▋ | 1554/5773 [2:24:42<6:27:17, 5.51s/it] 27%|██▋ | 1555/5773 [2:24:47<6:28:34, 5.53s/it] 27%|██▋ | 1555/5773 [2:24:53<6:28:34, 5.53s/it] {'loss': 0.5867, 'learning_rate': 1.7145067024157355e-05, 'epoch': 0.27} 27%|██▋ | 1555/5773 [2:24:53<6:28:34, 5.53s/it] {'loss': 0.5867, 'learning_rate': 1.7145067024157355e-05, 'epoch': 0.27} 27%|██▋ | 1555/5773 [2:24:47<6:28:34, 5.53s/it] 27%|██▋ | 1556/5773 
[2:24:52<6:21:53, 5.43s/it] 27%|██▋ | 1556/5773 [2:24:58<6:21:53, 5.43s/it] {'loss': 0.5817, 'learning_rate': 1.7141140291369312e-05, 'epoch': 0.27} 27%|██▋ | 1556/5773 [2:24:58<6:21:53, 5.43s/it] {'loss': 0.5817, 'learning_rate': 1.7141140291369312e-05, 'epoch': 0.27} 27%|██▋ | 1556/5773 [2:24:52<6:21:53, 5.43s/it] 27%|██▋ | 1557/5773 [2:24:58<6:20:22, 5.41s/it] 27%|██▋ | 1557/5773 [2:25:03<6:20:22, 5.41s/it] {'loss': 0.5674, 'learning_rate': 1.713721131032214e-05, 'epoch': 0.27} 27%|██▋ | 1557/5773 [2:25:03<6:20:22, 5.41s/it] {'loss': 0.5674, 'learning_rate': 1.713721131032214e-05, 'epoch': 0.27} 27%|██▋ | 1557/5773 [2:24:58<6:20:22, 5.41s/it] 27%|██▋ | 1558/5773 [2:25:03<6:22:12, 5.44s/it] 27%|██▋ | 1558/5773 [2:25:09<6:22:12, 5.44s/it] {'loss': 0.5951, 'learning_rate': 1.713328008225282e-05, 'epoch': 0.27} 27%|██▋ | 1558/5773 [2:25:09<6:22:12, 5.44s/it] {'loss': 0.5951, 'learning_rate': 1.713328008225282e-05, 'epoch': 0.27} 27%|██▋ | 1558/5773 [2:25:03<6:22:12, 5.44s/it] 27%|██▋ | 1559/5773 [2:25:09<6:23:19, 5.46s/it] 27%|██▋ | 1559/5773 [2:25:14<6:23:18, 5.46s/it] {'loss': 0.5773, 'learning_rate': 1.7129346608399027e-05, 'epoch': 0.27} 27%|██▋ | 1559/5773 [2:25:14<6:23:18, 5.46s/it] {'loss': 0.5773, 'learning_rate': 1.7129346608399027e-05, 'epoch': 0.27} 27%|██▋ | 1559/5773 [2:25:09<6:23:19, 5.46s/it] 27%|██▋ | 1560/5773 [2:25:14<6:19:16, 5.40s/it] 27%|██▋ | 1560/5773 [2:25:20<6:19:16, 5.40s/it] {'loss': 0.5849, 'learning_rate': 1.7125410889999135e-05, 'epoch': 0.27} 27%|██▋ | 1560/5773 [2:25:20<6:19:16, 5.40s/it] {'loss': 0.5849, 'learning_rate': 1.7125410889999135e-05, 'epoch': 0.27} 27%|██▋ | 1560/5773 [2:25:14<6:19:16, 5.40s/it] 27%|██▋ | 1561/5773 [2:25:20<6:23:05, 5.46s/it] 27%|██▋ | 1561/5773 [2:25:25<6:23:06, 5.46s/it] {'loss': 0.5804, 'learning_rate': 1.712147292829224e-05, 'epoch': 0.27} 27%|██▋ | 1561/5773 [2:25:25<6:23:06, 5.46s/it] {'loss': 0.5804, 'learning_rate': 1.712147292829224e-05, 'epoch': 0.27} 27%|██▋ | 1561/5773 [2:25:20<6:23:05, 
5.46s/it] 27%|██▋ | 1562/5773 [2:25:25<6:22:00, 5.44s/it] 27%|██▋ | 1562/5773 [2:25:31<6:21:59, 5.44s/it] {'loss': 0.6007, 'learning_rate': 1.711753272451814e-05, 'epoch': 0.27} 27%|██▋ | 1562/5773 [2:25:31<6:21:59, 5.44s/it] {'loss': 0.6007, 'learning_rate': 1.711753272451814e-05, 'epoch': 0.27} 27%|██▋ | 1562/5773 [2:25:25<6:22:00, 5.44s/it] 27%|██▋ | 1563/5773 [2:25:30<6:21:38, 5.44s/it] 27%|██▋ | 1563/5773 [2:25:36<6:21:39, 5.44s/it] {'loss': 0.6036, 'learning_rate': 1.7113590279917332e-05, 'epoch': 0.27} 27%|██▋ | 1563/5773 [2:25:36<6:21:39, 5.44s/it] {'loss': 0.6036, 'learning_rate': 1.7113590279917332e-05, 'epoch': 0.27} 27%|██▋ | 1563/5773 [2:25:30<6:21:38, 5.44s/it] 27%|██▋ | 1564/5773 [2:25:36<6:22:37, 5.45s/it] 27%|██▋ | 1564/5773 [2:25:41<6:22:37, 5.45s/it] {'loss': 0.5972, 'learning_rate': 1.7109645595731028e-05, 'epoch': 0.27} 27%|██▋ | 1564/5773 [2:25:41<6:22:37, 5.45s/it] {'loss': 0.5972, 'learning_rate': 1.7109645595731028e-05, 'epoch': 0.27} 27%|██▋ | 1564/5773 [2:25:36<6:22:37, 5.45s/it] 27%|██▋ | 1565/5773 [2:25:41<6:22:40, 5.46s/it] 27%|██▋ | 1565/5773 [2:25:47<6:22:40, 5.46s/it] {'loss': 0.5858, 'learning_rate': 1.710569867320114e-05, 'epoch': 0.27} 27%|██▋ | 1565/5773 [2:25:47<6:22:40, 5.46s/it] {'loss': 0.5858, 'learning_rate': 1.710569867320114e-05, 'epoch': 0.27} 27%|██▋ | 1565/5773 [2:25:41<6:22:40, 5.46s/it] 27%|██▋ | 1566/5773 [2:25:47<6:22:50, 5.46s/it] 27%|██▋ | 1566/5773 [2:25:52<6:22:50, 5.46s/it] {'loss': 0.5781, 'learning_rate': 1.7101749513570282e-05, 'epoch': 0.27} 27%|██▋ | 1566/5773 [2:25:52<6:22:50, 5.46s/it] {'loss': 0.5781, 'learning_rate': 1.7101749513570282e-05, 'epoch': 0.27} 27%|██▋ | 1566/5773 [2:25:47<6:22:50, 5.46s/it] 27%|██▋ | 1567/5773 [2:25:52<6:21:13, 5.44s/it] 27%|██▋ | 1567/5773 [2:25:58<6:21:13, 5.44s/it] {'loss': 0.5937, 'learning_rate': 1.709779811808178e-05, 'epoch': 0.27} 27%|██▋ | 1567/5773 [2:25:58<6:21:13, 5.44s/it] {'loss': 0.5937, 'learning_rate': 1.709779811808178e-05, 'epoch': 0.27} 27%|██▋ | 
1567/5773 [2:25:52<6:21:13, 5.44s/it] 27%|██▋ | 1568/5773 [2:25:58<6:22:09, 5.45s/it] 27%|██▋ | 1568/5773 [2:26:03<6:22:08, 5.45s/it] {'loss': 0.5968, 'learning_rate': 1.7093844487979658e-05, 'epoch': 0.27} 27%|██▋ | 1568/5773 [2:26:03<6:22:08, 5.45s/it] {'loss': 0.5968, 'learning_rate': 1.7093844487979658e-05, 'epoch': 0.27} 27%|██▋ | 1568/5773 [2:25:58<6:22:09, 5.45s/it] 27%|██▋ | 1569/5773 [2:26:03<6:20:23, 5.43s/it] 27%|██▋ | 1569/5773 [2:26:09<6:20:23, 5.43s/it] {'loss': 0.5728, 'learning_rate': 1.7089888624508638e-05, 'epoch': 0.27} 27%|██▋ | 1569/5773 [2:26:09<6:20:23, 5.43s/it] {'loss': 0.5728, 'learning_rate': 1.7089888624508638e-05, 'epoch': 0.27} 27%|██▋ | 1569/5773 [2:26:03<6:20:23, 5.43s/it] 27%|██▋ | 1570/5773 [2:26:08<6:18:09, 5.40s/it] 27%|██▋ | 1570/5773 [2:26:14<6:18:10, 5.40s/it] {'loss': 0.6125, 'learning_rate': 1.7085930528914166e-05, 'epoch': 0.27} 27%|██▋ | 1570/5773 [2:26:14<6:18:10, 5.40s/it] {'loss': 0.6125, 'learning_rate': 1.7085930528914166e-05, 'epoch': 0.27} 27%|██▋ | 1570/5773 [2:26:08<6:18:09, 5.40s/it] 27%|██▋ | 1571/5773 [2:26:14<6:19:11, 5.41s/it] 27%|██▋ | 1571/5773 [2:26:19<6:19:11, 5.41s/it] {'loss': 0.5845, 'learning_rate': 1.7081970202442363e-05, 'epoch': 0.27} 27%|██▋ | 1571/5773 [2:26:19<6:19:11, 5.41s/it] {'loss': 0.5845, 'learning_rate': 1.7081970202442363e-05, 'epoch': 0.27} 27%|██▋ | 1571/5773 [2:26:14<6:19:11, 5.41s/it] 27%|██▋ | 1572/5773 [2:26:20<6:23:55, 5.48s/it] 27%|██▋ | 1572/5773 [2:26:25<6:23:55, 5.48s/it] {'loss': 0.6075, 'learning_rate': 1.707800764634008e-05, 'epoch': 0.27} 27%|██▋ | 1572/5773 [2:26:25<6:23:55, 5.48s/it] {'loss': 0.6075, 'learning_rate': 1.707800764634008e-05, 'epoch': 0.27} 27%|██▋ | 1572/5773 [2:26:20<6:23:55, 5.48s/it] 27%|██▋ | 1573/5773 [2:26:25<6:24:23, 5.49s/it] 27%|██▋ | 1573/5773 [2:26:31<6:24:23, 5.49s/it] {'loss': 0.5912, 'learning_rate': 1.7074042861854842e-05, 'epoch': 0.27} 27%|██▋ | 1573/5773 [2:26:31<6:24:23, 5.49s/it] {'loss': 0.5912, 'learning_rate': 
1.7074042861854842e-05, 'epoch': 0.27} 27%|██▋ | 1573/5773 [2:26:25<6:24:23, 5.49s/it] 27%|██▋ | 1574/5773 [2:26:31<6:27:08, 5.53s/it] 27%|██▋ | 1574/5773 [2:26:36<6:27:08, 5.53s/it] {'loss': 0.5916, 'learning_rate': 1.70700758502349e-05, 'epoch': 0.27} 27%|██▋ | 1574/5773 [2:26:36<6:27:08, 5.53s/it] {'loss': 0.5916, 'learning_rate': 1.70700758502349e-05, 'epoch': 0.27} 27%|██▋ | 1574/5773 [2:26:31<6:27:08, 5.53s/it] 27%|██▋ | 1575/5773 [2:26:36<6:27:24, 5.54s/it] 27%|██▋ | 1575/5773 [2:26:42<6:27:24, 5.54s/it] {'loss': 0.6012, 'learning_rate': 1.7066106612729196e-05, 'epoch': 0.27} 27%|██▋ | 1575/5773 [2:26:42<6:27:24, 5.54s/it] {'loss': 0.6012, 'learning_rate': 1.7066106612729196e-05, 'epoch': 0.27} 27%|██▋ | 1575/5773 [2:26:36<6:27:24, 5.54s/it] 27%|██▋ | 1576/5773 [2:26:42<6:27:03, 5.53s/it] 27%|██▋ | 1576/5773 [2:26:47<6:27:03, 5.53s/it] {'loss': 0.5882, 'learning_rate': 1.7062135150587367e-05, 'epoch': 0.27} 27%|██▋ | 1576/5773 [2:26:47<6:27:03, 5.53s/it] {'loss': 0.5882, 'learning_rate': 1.7062135150587367e-05, 'epoch': 0.27} 27%|██▋ | 1576/5773 [2:26:42<6:27:03, 5.53s/it] 27%|██▋ | 1577/5773 [2:26:47<6:24:22, 5.50s/it] 27%|██▋ | 1577/5773 [2:26:53<6:24:22, 5.50s/it] {'loss': 0.5911, 'learning_rate': 1.705816146505976e-05, 'epoch': 0.27} 27%|██▋ | 1577/5773 [2:26:53<6:24:22, 5.50s/it] {'loss': 0.5911, 'learning_rate': 1.705816146505976e-05, 'epoch': 0.27} 27%|██▋ | 1577/5773 [2:26:47<6:24:22, 5.50s/it] 27%|██▋ | 1578/5773 [2:26:53<6:29:28, 5.57s/it] 27%|██▋ | 1578/5773 [2:26:58<6:29:28, 5.57s/it] {'loss': 0.586, 'learning_rate': 1.7054185557397413e-05, 'epoch': 0.27} 27%|██▋ | 1578/5773 [2:26:58<6:29:28, 5.57s/it] {'loss': 0.586, 'learning_rate': 1.7054185557397413e-05, 'epoch': 0.27} 27%|██▋ | 1578/5773 [2:26:53<6:29:28, 5.57s/it] 27%|██▋ | 1579/5773 [2:26:58<6:22:15, 5.47s/it] 27%|██▋ | 1579/5773 [2:27:04<6:22:16, 5.47s/it] {'loss': 0.5877, 'learning_rate': 1.705020742885208e-05, 'epoch': 0.27} 27%|██▋ | 1579/5773 [2:27:04<6:22:16, 5.47s/it] {'loss': 
0.5877, 'learning_rate': 1.705020742885208e-05, 'epoch': 0.27} 27%|██▋ | 1579/5773 [2:26:58<6:22:15, 5.47s/it] 27%|██▋ | 1580/5773 [2:27:03<6:18:07, 5.41s/it] 27%|██▋ | 1580/5773 [2:27:09<6:18:07, 5.41s/it] {'loss': 0.5829, 'learning_rate': 1.704622708067619e-05, 'epoch': 0.27} 27%|██▋ | 1580/5773 [2:27:09<6:18:07, 5.41s/it] {'loss': 0.5829, 'learning_rate': 1.704622708067619e-05, 'epoch': 0.27} 27%|██▋ | 1580/5773 [2:27:03<6:18:07, 5.41s/it] 27%|██▋ | 1581/5773 [2:27:09<6:19:23, 5.43s/it] 27%|██▋ | 1581/5773 [2:27:14<6:19:23, 5.43s/it] {'loss': 0.5791, 'learning_rate': 1.704224451412289e-05, 'epoch': 0.27} 27%|██▋ | 1581/5773 [2:27:14<6:19:23, 5.43s/it] {'loss': 0.5791, 'learning_rate': 1.704224451412289e-05, 'epoch': 0.27} 27%|██▋ | 1581/5773 [2:27:09<6:19:23, 5.43s/it] 27%|██▋ | 1582/5773 [2:27:14<6:18:09, 5.41s/it] 27%|██▋ | 1582/5773 [2:27:20<6:18:09, 5.41s/it] {'loss': 0.5915, 'learning_rate': 1.703825973044602e-05, 'epoch': 0.27} 27%|██▋ | 1582/5773 [2:27:20<6:18:09, 5.41s/it] {'loss': 0.5915, 'learning_rate': 1.703825973044602e-05, 'epoch': 0.27} 27%|██▋ | 1582/5773 [2:27:14<6:18:09, 5.41s/it] 27%|██▋ | 1583/5773 [2:27:20<6:17:15, 5.40s/it] 27%|██▋ | 1583/5773 [2:27:25<6:17:15, 5.40s/it] {'loss': 0.5863, 'learning_rate': 1.7034272730900116e-05, 'epoch': 0.27} 27%|██▋ | 1583/5773 [2:27:25<6:17:15, 5.40s/it] {'loss': 0.5863, 'learning_rate': 1.7034272730900116e-05, 'epoch': 0.27} 27%|██▋ | 1583/5773 [2:27:20<6:17:15, 5.40s/it] 27%|██▋ | 1584/5773 [2:27:25<6:16:10, 5.39s/it] 27%|██▋ | 1584/5773 [2:27:30<6:16:10, 5.39s/it] {'loss': 0.5879, 'learning_rate': 1.7030283516740414e-05, 'epoch': 0.27} 27%|██▋ | 1584/5773 [2:27:30<6:16:10, 5.39s/it] {'loss': 0.5879, 'learning_rate': 1.7030283516740414e-05, 'epoch': 0.27} 27%|██▋ | 1584/5773 [2:27:25<6:16:10, 5.39s/it] 27%|██▋ | 1585/5773 [2:27:30<6:15:36, 5.38s/it] 27%|██▋ | 1585/5773 [2:27:36<6:15:36, 5.38s/it] {'loss': 0.5816, 'learning_rate': 1.7026292089222844e-05, 'epoch': 0.27} 27%|██▋ | 1585/5773 
[2:27:36<6:15:36, 5.38s/it] {'loss': 0.5816, 'learning_rate': 1.7026292089222844e-05, 'epoch': 0.27} 27%|██▋ | 1585/5773 [2:27:30<6:15:36, 5.38s/it] 27%|██▋ | 1586/5773 [2:27:36<6:17:09, 5.40s/it] 27%|██▋ | 1586/5773 [2:27:41<6:17:09, 5.40s/it] {'loss': 0.5786, 'learning_rate': 1.7022298449604037e-05, 'epoch': 0.27} 27%|██▋ | 1586/5773 [2:27:41<6:17:09, 5.40s/it] {'loss': 0.5786, 'learning_rate': 1.7022298449604037e-05, 'epoch': 0.27} 27%|██▋ | 1586/5773 [2:27:36<6:17:09, 5.40s/it] 27%|██▋ | 1587/5773 [2:27:41<6:18:35, 5.43s/it] 27%|██▋ | 1587/5773 [2:27:47<6:18:35, 5.43s/it] {'loss': 0.5849, 'learning_rate': 1.7018302599141313e-05, 'epoch': 0.27} 27%|██▋ | 1587/5773 [2:27:47<6:18:35, 5.43s/it] {'loss': 0.5849, 'learning_rate': 1.7018302599141313e-05, 'epoch': 0.27} 27%|██▋ | 1587/5773 [2:27:41<6:18:35, 5.43s/it] 28%|██▊ | 1588/5773 [2:27:47<6:18:43, 5.43s/it] 28%|██▊ | 1588/5773 [2:27:52<6:18:43, 5.43s/it] {'loss': 0.5851, 'learning_rate': 1.7014304539092705e-05, 'epoch': 0.28} 28%|██▊ | 1588/5773 [2:27:52<6:18:43, 5.43s/it] {'loss': 0.5851, 'learning_rate': 1.7014304539092705e-05, 'epoch': 0.28} 28%|██▊ | 1588/5773 [2:27:47<6:18:43, 5.43s/it] 28%|██▊ | 1589/5773 [2:27:52<6:17:17, 5.41s/it] 28%|██▊ | 1589/5773 [2:27:58<6:17:18, 5.41s/it] {'loss': 0.6043, 'learning_rate': 1.7010304270716917e-05, 'epoch': 0.28} 28%|██▊ | 1589/5773 [2:27:58<6:17:18, 5.41s/it] {'loss': 0.6043, 'learning_rate': 1.7010304270716917e-05, 'epoch': 0.28} 28%|██▊ | 1589/5773 [2:27:52<6:17:17, 5.41s/it] 28%|██▊ | 1590/5773 [2:27:57<6:15:31, 5.39s/it] 28%|██▊ | 1590/5773 [2:28:03<6:15:31, 5.39s/it] {'loss': 0.5698, 'learning_rate': 1.7006301795273365e-05, 'epoch': 0.28} 28%|██▊ | 1590/5773 [2:28:03<6:15:31, 5.39s/it] {'loss': 0.5698, 'learning_rate': 1.7006301795273365e-05, 'epoch': 0.28} 28%|██▊ | 1590/5773 [2:27:57<6:15:31, 5.39s/it] 28%|██▊ | 1591/5773 [2:28:03<6:14:19, 5.37s/it] 28%|██▊ | 1591/5773 [2:28:08<6:14:18, 5.37s/it] {'loss': 0.5888, 'learning_rate': 1.700229711402216e-05, 
'epoch': 0.28} 28%|██▊ | 1591/5773 [2:28:08<6:14:18, 5.37s/it] {'loss': 0.5888, 'learning_rate': 1.700229711402216e-05, 'epoch': 0.28} 28%|██▊ | 1591/5773 [2:28:03<6:14:19, 5.37s/it] 28%|██▊ | 1592/5773 [2:28:08<6:15:52, 5.39s/it] 28%|██▊ | 1592/5773 [2:28:14<6:15:53, 5.39s/it] {'loss': 0.5926, 'learning_rate': 1.6998290228224106e-05, 'epoch': 0.28} 28%|██▊ | 1592/5773 [2:28:14<6:15:53, 5.39s/it] {'loss': 0.5926, 'learning_rate': 1.6998290228224106e-05, 'epoch': 0.28} 28%|██▊ | 1592/5773 [2:28:08<6:15:52, 5.39s/it] 28%|██▊ | 1593/5773 [2:28:14<6:15:08, 5.38s/it] 28%|██▊ | 1593/5773 [2:28:19<6:15:09, 5.38s/it] {'loss': 0.596, 'learning_rate': 1.699428113914069e-05, 'epoch': 0.28} 28%|██▊ | 1593/5773 [2:28:19<6:15:09, 5.38s/it] {'loss': 0.596, 'learning_rate': 1.699428113914069e-05, 'epoch': 0.28} 28%|██▊ | 1593/5773 [2:28:14<6:15:08, 5.38s/it] 28%|██▊ | 1594/5773 [2:28:19<6:15:32, 5.39s/it] 28%|██▊ | 1594/5773 [2:28:24<6:15:32, 5.39s/it] {'loss': 0.5695, 'learning_rate': 1.6990269848034104e-05, 'epoch': 0.28} 28%|██▊ | 1594/5773 [2:28:24<6:15:32, 5.39s/it] {'loss': 0.5695, 'learning_rate': 1.6990269848034104e-05, 'epoch': 0.28} 28%|██▊ | 1594/5773 [2:28:19<6:15:32, 5.39s/it] 28%|██▊ | 1595/5773 [2:28:24<6:18:14, 5.43s/it] 28%|██▊ | 1595/5773 [2:28:30<6:18:14, 5.43s/it] {'loss': 0.5887, 'learning_rate': 1.6986256356167237e-05, 'epoch': 0.28} 28%|██▊ | 1595/5773 [2:28:30<6:18:14, 5.43s/it] {'loss': 0.5887, 'learning_rate': 1.6986256356167237e-05, 'epoch': 0.28} 28%|██▊ | 1595/5773 [2:28:24<6:18:14, 5.43s/it] 28%|██▊ | 1596/5773 [2:28:30<6:19:19, 5.45s/it] 28%|██▊ | 1596/5773 [2:28:35<6:19:19, 5.45s/it] {'loss': 0.5798, 'learning_rate': 1.698224066480366e-05, 'epoch': 0.28} 28%|██▊ | 1596/5773 [2:28:35<6:19:19, 5.45s/it] {'loss': 0.5798, 'learning_rate': 1.698224066480366e-05, 'epoch': 0.28} 28%|██▊ | 1596/5773 [2:28:30<6:19:19, 5.45s/it] 28%|██▊ | 1597/5773 [2:28:35<6:20:48, 5.47s/it] 28%|██▊ | 1597/5773 [2:28:41<6:20:48, 5.47s/it] {'loss': 0.5949, 'learning_rate': 
1.6978222775207636e-05, 'epoch': 0.28} 28%|██▊ | 1597/5773 [2:28:41<6:20:48, 5.47s/it] {'loss': 0.5949, 'learning_rate': 1.6978222775207636e-05, 'epoch': 0.28} 28%|██▊ | 1597/5773 [2:28:35<6:20:48, 5.47s/it] 28%|██▊ | 1598/5773 [2:28:41<6:17:51, 5.43s/it] 28%|██▊ | 1598/5773 [2:28:46<6:17:52, 5.43s/it] {'loss': 0.577, 'learning_rate': 1.6974202688644132e-05, 'epoch': 0.28} 28%|██▊ | 1598/5773 [2:28:46<6:17:52, 5.43s/it] {'loss': 0.577, 'learning_rate': 1.6974202688644132e-05, 'epoch': 0.28} 28%|██▊ | 1598/5773 [2:28:41<6:17:51, 5.43s/it] 28%|██▊ | 1599/5773 [2:28:46<6:18:26, 5.44s/it] 28%|██▊ | 1599/5773 [2:28:52<6:18:25, 5.44s/it] {'loss': 0.5924, 'learning_rate': 1.69701804063788e-05, 'epoch': 0.28} 28%|██▊ | 1599/5773 [2:28:52<6:18:25, 5.44s/it] {'loss': 0.5924, 'learning_rate': 1.69701804063788e-05, 'epoch': 0.28} 28%|██▊ | 1599/5773 [2:28:46<6:18:26, 5.44s/it]12 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend...2 AutoResumeHook: Checking whether to suspend... 28%|██▊ | 1600/5773 [2:28:52<6:18:46, 5.45s/it] 6 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 
28%|██▊ | 1600/5773 [2:28:57<6:18:46, 5.45s/it] {'loss': 0.5978, 'learning_rate': 1.6966155929677973e-05, 'epoch': 0.28} 28%|██▊ | 1600/5773 [2:28:57<6:18:46, 5.45s/it] {'loss': 0.5978, 'learning_rate': 1.6966155929677973e-05, 'epoch': 0.28} 28%|██▊ | 1600/5773 [2:28:52<6:18:46, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 28%|██▊ | 1601/5773 [2:29:11<11:05:53, 9.58s/it] 28%|██▊ | 1601/5773 [2:29:16<11:05:52, 9.58s/it] {'loss': 0.5988, 'learning_rate': 1.69621292598087e-05, 'epoch': 0.28} 28%|██▊ | 1601/5773 [2:29:16<11:05:52, 9.58s/it] {'loss': 0.5988, 'learning_rate': 1.69621292598087e-05, 'epoch': 0.28} 28%|██▊ | 1601/5773 [2:29:11<11:05:53, 9.58s/it] 28%|██▊ | 1602/5773 [2:29:16<9:40:07, 8.35s/it] 28%|██▊ | 1602/5773 [2:29:22<9:40:06, 8.34s/it] {'loss': 0.5786, 'learning_rate': 1.695810039803869e-05, 'epoch': 0.28} 28%|██▊ | 1602/5773 [2:29:22<9:40:06, 8.34s/it] {'loss': 0.5786, 'learning_rate': 1.695810039803869e-05, 'epoch': 0.28} 28%|██▊ | 1602/5773 [2:29:16<9:40:07, 8.35s/it] 28%|██▊ | 1603/5773 [2:29:22<8:38:54, 7.47s/it] 28%|██▊ | 1603/5773 [2:29:27<8:38:55, 7.47s/it] {'loss': 0.589, 'learning_rate': 1.695406934563637e-05, 'epoch': 0.28} 28%|██▊ | 1603/5773 [2:29:27<8:38:55, 7.47s/it] {'loss': 0.589, 'learning_rate': 1.695406934563637e-05, 'epoch': 0.28} 28%|██▊ | 1603/5773 [2:29:22<8:38:54, 7.47s/it] 28%|██▊ | 1604/5773 [2:29:27<7:54:11, 6.82s/it] 28%|██▊ | 1604/5773 [2:29:33<7:54:11, 6.82s/it] {'loss': 0.5896, 'learning_rate': 1.695003610387084e-05, 'epoch': 0.28} 28%|██▊ | 1604/5773 [2:29:33<7:54:11, 6.82s/it] {'loss': 0.5896, 'learning_rate': 1.695003610387084e-05, 'epoch': 0.28} 28%|██▊ | 1604/5773 [2:29:27<7:54:11, 6.82s/it] 28%|██▊ | 1605/5773 [2:29:33<7:25:28, 6.41s/it] 28%|██▊ | 1605/5773 [2:29:38<7:25:27, 6.41s/it] {'loss': 0.6019, 'learning_rate': 1.6946000674011892e-05, 'epoch': 0.28} 28%|██▊ | 1605/5773 [2:29:38<7:25:27, 6.41s/it] {'loss': 0.6019, 'learning_rate': 1.6946000674011892e-05, 'epoch': 0.28} 28%|██▊ | 1605/5773 [2:29:33<7:25:28, 6.41s/it] 28%|██▊ | 1606/5773 [2:29:38<7:06:45, 6.14s/it] 28%|██▊ | 1606/5773 [2:29:44<7:06:44, 6.14s/it] {'loss': 0.5781, 'learning_rate': 1.6941963057330006e-05, 'epoch': 0.28} 28%|██▊ | 1606/5773 [2:29:44<7:06:44, 6.14s/it] {'loss': 0.5781, 'learning_rate': 1.6941963057330006e-05, 'epoch': 0.28} 28%|██▊ | 
1606/5773 [2:29:38<7:06:45, 6.14s/it] 28%|██▊ | 1607/5773 [2:29:44<6:52:25, 5.94s/it] 28%|██▊ | 1607/5773 [2:29:49<6:52:24, 5.94s/it] {'loss': 0.5852, 'learning_rate': 1.6937923255096354e-05, 'epoch': 0.28} 28%|██▊ | 1607/5773 [2:29:49<6:52:24, 5.94s/it] {'loss': 0.5852, 'learning_rate': 1.6937923255096354e-05, 'epoch': 0.28} 28%|██▊ | 1607/5773 [2:29:44<6:52:25, 5.94s/it] 28%|██▊ | 1608/5773 [2:29:49<6:45:42, 5.84s/it] 28%|██▊ | 1608/5773 [2:29:55<6:45:41, 5.84s/it] {'loss': 0.5733, 'learning_rate': 1.69338812685828e-05, 'epoch': 0.28} 28%|██▊ | 1608/5773 [2:29:55<6:45:41, 5.84s/it] {'loss': 0.5733, 'learning_rate': 1.69338812685828e-05, 'epoch': 0.28} 28%|██▊ | 1608/5773 [2:29:49<6:45:42, 5.84s/it] 28%|██▊ | 1609/5773 [2:29:55<6:36:28, 5.71s/it] 28%|██▊ | 1609/5773 [2:30:00<6:36:28, 5.71s/it] {'loss': 0.5701, 'learning_rate': 1.6929837099061885e-05, 'epoch': 0.28} 28%|██▊ | 1609/5773 [2:30:00<6:36:28, 5.71s/it] {'loss': 0.5701, 'learning_rate': 1.6929837099061885e-05, 'epoch': 0.28} 28%|██▊ | 1609/5773 [2:29:55<6:36:28, 5.71s/it] 28%|██▊ | 1610/5773 [2:30:00<6:33:29, 5.67s/it] 28%|██▊ | 1610/5773 [2:30:06<6:33:29, 5.67s/it] {'loss': 0.5741, 'learning_rate': 1.6925790747806845e-05, 'epoch': 0.28} 28%|██▊ | 1610/5773 [2:30:06<6:33:29, 5.67s/it] {'loss': 0.5741, 'learning_rate': 1.6925790747806845e-05, 'epoch': 0.28} 28%|██▊ | 1610/5773 [2:30:00<6:33:29, 5.67s/it] 28%|██▊ | 1611/5773 [2:30:06<6:27:53, 5.59s/it] 28%|██▊ | 1611/5773 [2:30:11<6:27:53, 5.59s/it] {'loss': 0.5743, 'learning_rate': 1.6921742216091602e-05, 'epoch': 0.28} 28%|██▊ | 1611/5773 [2:30:11<6:27:53, 5.59s/it] {'loss': 0.5743, 'learning_rate': 1.6921742216091602e-05, 'epoch': 0.28} 28%|██▊ | 1611/5773 [2:30:06<6:27:53, 5.59s/it] 28%|██▊ | 1612/5773 [2:30:11<6:25:29, 5.56s/it] 28%|██▊ | 1612/5773 [2:30:17<6:25:29, 5.56s/it] {'loss': 0.5722, 'learning_rate': 1.6917691505190756e-05, 'epoch': 0.28} 28%|██▊ | 1612/5773 [2:30:17<6:25:29, 5.56s/it] {'loss': 0.5722, 'learning_rate': 1.6917691505190756e-05, 
'epoch': 0.28} 28%|██▊ | 1612/5773 [2:30:11<6:25:29, 5.56s/it] 28%|██▊ | 1613/5773 [2:30:16<6:21:36, 5.50s/it] 28%|██▊ | 1613/5773 [2:30:22<6:21:36, 5.50s/it] {'loss': 0.6066, 'learning_rate': 1.691363861637961e-05, 'epoch': 0.28} 28%|██▊ | 1613/5773 [2:30:22<6:21:36, 5.50s/it] {'loss': 0.6066, 'learning_rate': 1.691363861637961e-05, 'epoch': 0.28} 28%|██▊ | 1613/5773 [2:30:16<6:21:36, 5.50s/it] 28%|██▊ | 1614/5773 [2:30:27<6:17:00, 5.44s/it] 28%|██▊ | 1614/5773 [2:30:22<6:16:59, 5.44s/it] {'loss': 0.5668, 'learning_rate': 1.6909583550934137e-05, 'epoch': 0.28} 28%|██▊ | 1614/5773 [2:30:27<6:17:00, 5.44s/it] {'loss': 0.5668, 'learning_rate': 1.6909583550934137e-05, 'epoch': 0.28} 28%|██▊ | 1614/5773 [2:30:22<6:16:59, 5.44s/it] 28%|██▊ | 1615/5773 [2:30:27<6:17:10, 5.44s/it] 28%|██▊ | 1615/5773 [2:30:33<6:17:10, 5.44s/it] {'loss': 0.5894, 'learning_rate': 1.6905526310131e-05, 'epoch': 0.28} 28%|██▊ | 1615/5773 [2:30:33<6:17:10, 5.44s/it] {'loss': 0.5894, 'learning_rate': 1.6905526310131e-05, 'epoch': 0.28} 28%|██▊ | 1615/5773 [2:30:27<6:17:10, 5.44s/it] 28%|██▊ | 1616/5773 [2:30:33<6:15:15, 5.42s/it] 28%|██▊ | 1616/5773 [2:30:38<6:15:15, 5.42s/it] {'loss': 0.595, 'learning_rate': 1.6901466895247554e-05, 'epoch': 0.28} 28%|██▊ | 1616/5773 [2:30:38<6:15:15, 5.42s/it] {'loss': 0.595, 'learning_rate': 1.6901466895247554e-05, 'epoch': 0.28} 28%|██▊ | 1616/5773 [2:30:33<6:15:15, 5.42s/it] 28%|██▊ | 1617/5773 [2:30:38<6:16:17, 5.43s/it] 28%|██▊ | 1617/5773 [2:30:44<6:16:17, 5.43s/it] {'loss': 0.5914, 'learning_rate': 1.6897405307561826e-05, 'epoch': 0.28} 28%|██▊ | 1617/5773 [2:30:44<6:16:17, 5.43s/it] {'loss': 0.5914, 'learning_rate': 1.6897405307561826e-05, 'epoch': 0.28} 28%|██▊ | 1617/5773 [2:30:38<6:16:17, 5.43s/it] 28%|██▊ | 1618/5773 [2:30:43<6:15:09, 5.42s/it] 28%|██▊ | 1618/5773 [2:30:49<6:15:09, 5.42s/it] {'loss': 0.5871, 'learning_rate': 1.689334154835254e-05, 'epoch': 0.28} 28%|██▊ | 1618/5773 [2:30:49<6:15:09, 5.42s/it] {'loss': 0.5871, 'learning_rate': 
1.689334154835254e-05, 'epoch': 0.28} 28%|██▊ | 1618/5773 [2:30:43<6:15:09, 5.42s/it] 28%|██▊ | 1619/5773 [2:30:49<6:11:32, 5.37s/it] 28%|██▊ | 1619/5773 [2:30:54<6:11:32, 5.37s/it] {'loss': 0.5872, 'learning_rate': 1.688927561889909e-05, 'epoch': 0.28} 28%|██▊ | 1619/5773 [2:30:54<6:11:32, 5.37s/it] {'loss': 0.5872, 'learning_rate': 1.688927561889909e-05, 'epoch': 0.28} 28%|██▊ | 1619/5773 [2:30:49<6:11:32, 5.37s/it] 28%|██▊ | 1620/5773 [2:30:54<6:15:45, 5.43s/it] 28%|██▊ | 1620/5773 [2:31:00<6:15:45, 5.43s/it] {'loss': 0.61, 'learning_rate': 1.6885207520481568e-05, 'epoch': 0.28} 28%|██▊ | 1620/5773 [2:31:00<6:15:45, 5.43s/it] {'loss': 0.61, 'learning_rate': 1.6885207520481568e-05, 'epoch': 0.28} 28%|██▊ | 1620/5773 [2:30:54<6:15:45, 5.43s/it] 28%|██▊ | 1621/5773 [2:31:00<6:15:43, 5.43s/it] 28%|██▊ | 1621/5773 [2:31:05<6:15:43, 5.43s/it] {'loss': 0.5827, 'learning_rate': 1.6881137254380736e-05, 'epoch': 0.28} 28%|██▊ | 1621/5773 [2:31:05<6:15:43, 5.43s/it] {'loss': 0.5827, 'learning_rate': 1.6881137254380736e-05, 'epoch': 0.28} 28%|██▊ | 1621/5773 [2:31:00<6:15:43, 5.43s/it] 28%|██▊ | 1622/5773 [2:31:05<6:15:20, 5.43s/it] 28%|██▊ | 1622/5773 [2:31:11<6:15:20, 5.43s/it] {'loss': 0.57, 'learning_rate': 1.6877064821878044e-05, 'epoch': 0.28} 28%|██▊ | 1622/5773 [2:31:11<6:15:20, 5.43s/it] {'loss': 0.57, 'learning_rate': 1.6877064821878044e-05, 'epoch': 0.28} 28%|██▊ | 1622/5773 [2:31:05<6:15:20, 5.43s/it] 28%|██▊ | 1623/5773 [2:31:11<6:15:06, 5.42s/it] 28%|██▊ | 1623/5773 [2:31:16<6:15:06, 5.42s/it] {'loss': 0.5842, 'learning_rate': 1.687299022425563e-05, 'epoch': 0.28} 28%|██▊ | 1623/5773 [2:31:16<6:15:06, 5.42s/it] {'loss': 0.5842, 'learning_rate': 1.687299022425563e-05, 'epoch': 0.28} 28%|██▊ | 1623/5773 [2:31:11<6:15:06, 5.42s/it] 28%|██▊ | 1624/5773 [2:31:16<6:16:12, 5.44s/it] 28%|██▊ | 1624/5773 [2:31:22<6:16:14, 5.44s/it] {'loss': 0.6083, 'learning_rate': 1.6868913462796298e-05, 'epoch': 0.28} 28%|██▊ | 1624/5773 [2:31:22<6:16:14, 5.44s/it] {'loss': 0.6083, 
'learning_rate': 1.6868913462796298e-05, 'epoch': 0.28} 28%|██▊ | 1624/5773 [2:31:16<6:16:12, 5.44s/it] 28%|██▊ | 1625/5773 [2:31:22<6:19:11, 5.48s/it] 28%|██▊ | 1625/5773 [2:31:27<6:19:10, 5.48s/it] {'loss': 0.5816, 'learning_rate': 1.686483453878355e-05, 'epoch': 0.28} 28%|██▊ | 1625/5773 [2:31:27<6:19:10, 5.48s/it] {'loss': 0.5816, 'learning_rate': 1.686483453878355e-05, 'epoch': 0.28} 28%|██▊ | 1625/5773 [2:31:22<6:19:11, 5.48s/it] 28%|██▊ | 1626/5773 [2:31:27<6:16:14, 5.44s/it] 28%|██▊ | 1626/5773 [2:31:32<6:16:13, 5.44s/it] {'loss': 0.5955, 'learning_rate': 1.686075345350156e-05, 'epoch': 0.28} {'loss': 0.5955, 'learning_rate': 1.686075345350156e-05, 'epoch': 0.28} 28%|██▊ | 1626/5773 [2:31:32<6:16:13, 5.44s/it] 28%|██▊ | 1626/5773 [2:31:27<6:16:14, 5.44s/it] 28%|██▊ | 1627/5773 [2:31:32<6:12:50, 5.40s/it] 28%|██▊ | 1627/5773 [2:31:38<6:12:50, 5.40s/it] {'loss': 0.5816, 'learning_rate': 1.685667020823518e-05, 'epoch': 0.28} 28%|██▊ | 1627/5773 [2:31:38<6:12:50, 5.40s/it] {'loss': 0.5816, 'learning_rate': 1.685667020823518e-05, 'epoch': 0.28} 28%|██▊ | 1627/5773 [2:31:32<6:12:50, 5.40s/it] 28%|██▊ | 1628/5773 [2:31:38<6:11:25, 5.38s/it] 28%|██▊ | 1628/5773 [2:31:43<6:11:26, 5.38s/it] {'loss': 0.5844, 'learning_rate': 1.685258480426995e-05, 'epoch': 0.28} 28%|██▊ | 1628/5773 [2:31:43<6:11:26, 5.38s/it] {'loss': 0.5844, 'learning_rate': 1.685258480426995e-05, 'epoch': 0.28} 28%|██▊ | 1628/5773 [2:31:38<6:11:25, 5.38s/it] 28%|██▊ | 1629/5773 [2:31:43<6:12:18, 5.39s/it] 28%|██▊ | 1629/5773 [2:31:48<6:12:18, 5.39s/it] {'loss': 0.5898, 'learning_rate': 1.6848497242892086e-05, 'epoch': 0.28} 28%|██▊ | 1629/5773 [2:31:48<6:12:18, 5.39s/it] {'loss': 0.5898, 'learning_rate': 1.6848497242892086e-05, 'epoch': 0.28} 28%|██▊ | 1629/5773 [2:31:43<6:12:18, 5.39s/it] 28%|██▊ | 1630/5773 [2:31:48<6:13:35, 5.41s/it] 28%|██▊ | 1630/5773 [2:31:54<6:13:36, 5.41s/it] {'loss': 0.5807, 'learning_rate': 1.6844407525388486e-05, 'epoch': 0.28} 28%|██▊ | 1630/5773 [2:31:54<6:13:36, 
5.41s/it] {'loss': 0.5807, 'learning_rate': 1.6844407525388486e-05, 'epoch': 0.28} 28%|██▊ | 1630/5773 [2:31:48<6:13:35, 5.41s/it] 28%|██▊ | 1631/5773 [2:31:54<6:15:06, 5.43s/it] 28%|██▊ | 1631/5773 [2:31:59<6:15:06, 5.43s/it] {'loss': 0.5975, 'learning_rate': 1.6840315653046715e-05, 'epoch': 0.28} 28%|██▊ | 1631/5773 [2:31:59<6:15:06, 5.43s/it] {'loss': 0.5975, 'learning_rate': 1.6840315653046715e-05, 'epoch': 0.28} 28%|██▊ | 1631/5773 [2:31:54<6:15:06, 5.43s/it] 28%|██▊ | 1632/5773 [2:31:59<6:14:13, 5.42s/it] 28%|██▊ | 1632/5773 [2:32:05<6:14:13, 5.42s/it] {'loss': 0.5775, 'learning_rate': 1.6836221627155033e-05, 'epoch': 0.28} 28%|██▊ | 1632/5773 [2:31:59<6:14:13, 5.42s/it]{'loss': 0.5775, 'learning_rate': 1.6836221627155033e-05, 'epoch': 0.28} 28%|██▊ | 1632/5773 [2:32:05<6:14:13, 5.42s/it] 28%|██▊ | 1633/5773 [2:32:05<6:17:29, 5.47s/it] 28%|██▊ | 1633/5773 [2:32:10<6:17:29, 5.47s/it] {'loss': 0.5953, 'learning_rate': 1.6832125449002363e-05, 'epoch': 0.28} 28%|██▊ | 1633/5773 [2:32:10<6:17:29, 5.47s/it] {'loss': 0.5953, 'learning_rate': 1.6832125449002363e-05, 'epoch': 0.28} 28%|██▊ | 1633/5773 [2:32:05<6:17:29, 5.47s/it] 28%|██▊ | 1634/5773 [2:32:10<6:18:29, 5.49s/it] 28%|██▊ | 1634/5773 [2:32:16<6:18:29, 5.49s/it] {'loss': 0.568, 'learning_rate': 1.6828027119878316e-05, 'epoch': 0.28} 28%|██▊ | 1634/5773 [2:32:16<6:18:29, 5.49s/it] {'loss': 0.568, 'learning_rate': 1.6828027119878316e-05, 'epoch': 0.28} 28%|██▊ | 1634/5773 [2:32:10<6:18:29, 5.49s/it] 28%|██▊ | 1635/5773 [2:32:16<6:19:14, 5.50s/it] 28%|██▊ | 1635/5773 [2:32:21<6:19:13, 5.50s/it] {'loss': 0.5847, 'learning_rate': 1.6823926641073184e-05, 'epoch': 0.28} 28%|██▊ | 1635/5773 [2:32:21<6:19:13, 5.50s/it] {'loss': 0.5847, 'learning_rate': 1.6823926641073184e-05, 'epoch': 0.28} 28%|██▊ | 1635/5773 [2:32:16<6:19:14, 5.50s/it] 28%|██▊ | 1636/5773 [2:32:21<6:19:36, 5.51s/it] 28%|██▊ | 1636/5773 [2:32:27<6:19:36, 5.51s/it] {'loss': 0.5907, 'learning_rate': 1.6819824013877917e-05, 'epoch': 0.28} 28%|██▊ | 
1636/5773 [2:32:27<6:19:36, 5.51s/it] {'loss': 0.5907, 'learning_rate': 1.6819824013877917e-05, 'epoch': 0.28} 28%|██▊ | 1636/5773 [2:32:21<6:19:36, 5.51s/it] 28%|██▊ | 1637/5773 [2:32:27<6:19:35, 5.51s/it] 28%|██▊ | 1637/5773 [2:32:33<6:19:36, 5.51s/it] {'loss': 0.5722, 'learning_rate': 1.681571923958416e-05, 'epoch': 0.28} 28%|██▊ | 1637/5773 [2:32:33<6:19:36, 5.51s/it] {'loss': 0.5722, 'learning_rate': 1.681571923958416e-05, 'epoch': 0.28} 28%|██▊ | 1637/5773 [2:32:27<6:19:35, 5.51s/it] 28%|██▊ | 1638/5773 [2:32:32<6:19:29, 5.51s/it] 28%|██▊ | 1638/5773 [2:32:38<6:19:29, 5.51s/it] {'loss': 0.5711, 'learning_rate': 1.681161231948423e-05, 'epoch': 0.28} 28%|██▊ | 1638/5773 [2:32:38<6:19:29, 5.51s/it]{'loss': 0.5711, 'learning_rate': 1.681161231948423e-05, 'epoch': 0.28} 28%|██▊ | 1638/5773 [2:32:32<6:19:29, 5.51s/it] 28%|██▊ | 1639/5773 [2:32:38<6:17:57, 5.49s/it] 28%|██▊ | 1639/5773 [2:32:43<6:17:57, 5.49s/it] {'loss': 0.594, 'learning_rate': 1.6807503254871105e-05, 'epoch': 0.28} 28%|██▊ | 1639/5773 [2:32:43<6:17:57, 5.49s/it] {'loss': 0.594, 'learning_rate': 1.6807503254871105e-05, 'epoch': 0.28} 28%|██▊ | 1639/5773 [2:32:38<6:17:57, 5.49s/it] 28%|██▊ | 1640/5773 [2:32:43<6:17:43, 5.48s/it] 28%|██▊ | 1640/5773 [2:32:49<6:17:43, 5.48s/it] {'loss': 0.5931, 'learning_rate': 1.6803392047038457e-05, 'epoch': 0.28} 28%|██▊ | 1640/5773 [2:32:49<6:17:43, 5.48s/it] {'loss': 0.5931, 'learning_rate': 1.6803392047038457e-05, 'epoch': 0.28} 28%|██▊ | 1640/5773 [2:32:43<6:17:43, 5.48s/it] 28%|██▊ | 1641/5773 [2:32:49<6:16:27, 5.47s/it] 28%|██▊ | 1641/5773 [2:32:54<6:16:26, 5.47s/it] {'loss': 0.5861, 'learning_rate': 1.679927869728063e-05, 'epoch': 0.28} 28%|██▊ | 1641/5773 [2:32:54<6:16:26, 5.47s/it] {'loss': 0.5861, 'learning_rate': 1.679927869728063e-05, 'epoch': 0.28} 28%|██▊ | 1641/5773 [2:32:49<6:16:27, 5.47s/it] 28%|██▊ | 1642/5773 [2:32:54<6:16:18, 5.47s/it] 28%|██▊ | 1642/5773 [2:33:00<6:16:18, 5.47s/it] {'loss': 0.5983, 'learning_rate': 1.6795163206892632e-05, 
'epoch': 0.28} 28%|██▊ | 1642/5773 [2:33:00<6:16:18, 5.47s/it] {'loss': 0.5983, 'learning_rate': 1.6795163206892632e-05, 'epoch': 0.28} 28%|██▊ | 1642/5773 [2:32:54<6:16:18, 5.47s/it] 28%|██▊ | 1643/5773 [2:33:00<6:18:41, 5.50s/it] 28%|██▊ | 1643/5773 [2:33:05<6:18:41, 5.50s/it] {'loss': 0.6027, 'learning_rate': 1.6791045577170157e-05, 'epoch': 0.28} 28%|██▊ | 1643/5773 [2:33:05<6:18:41, 5.50s/it] {'loss': 0.6027, 'learning_rate': 1.6791045577170157e-05, 'epoch': 0.28} 28%|██▊ | 1643/5773 [2:33:00<6:18:41, 5.50s/it] 28%|██▊ | 1644/5773 [2:33:05<6:18:55, 5.51s/it] 28%|██▊ | 1644/5773 [2:33:11<6:18:55, 5.51s/it] {'loss': 0.5699, 'learning_rate': 1.6786925809409555e-05, 'epoch': 0.28} 28%|██▊ | 1644/5773 [2:33:11<6:18:55, 5.51s/it] {'loss': 0.5699, 'learning_rate': 1.6786925809409555e-05, 'epoch': 0.28} 28%|██▊ | 1644/5773 [2:33:05<6:18:55, 5.51s/it] 28%|██▊ | 1645/5773 [2:33:11<6:17:20, 5.48s/it] 28%|██▊ | 1645/5773 [2:33:16<6:17:20, 5.48s/it] {'loss': 0.5847, 'learning_rate': 1.678280390490787e-05, 'epoch': 0.28} 28%|██▊ | 1645/5773 [2:33:16<6:17:20, 5.48s/it] {'loss': 0.5847, 'learning_rate': 1.678280390490787e-05, 'epoch': 0.28} 28%|██▊ | 1645/5773 [2:33:11<6:17:20, 5.48s/it] 29%|██▊ | 1646/5773 [2:33:16<6:16:57, 5.48s/it] 29%|██▊ | 1646/5773 [2:33:22<6:16:57, 5.48s/it] {'loss': 0.5785, 'learning_rate': 1.6778679864962807e-05, 'epoch': 0.29} 29%|██▊ | 1646/5773 [2:33:22<6:16:57, 5.48s/it] {'loss': 0.5785, 'learning_rate': 1.6778679864962807e-05, 'epoch': 0.29} 29%|██▊ | 1646/5773 [2:33:16<6:16:57, 5.48s/it] 29%|██▊ | 1647/5773 [2:33:22<6:16:08, 5.47s/it] 29%|██▊ | 1647/5773 [2:33:27<6:16:08, 5.47s/it] {'loss': 0.5844, 'learning_rate': 1.6774553690872745e-05, 'epoch': 0.29} 29%|██▊ | 1647/5773 [2:33:27<6:16:08, 5.47s/it] {'loss': 0.5844, 'learning_rate': 1.6774553690872745e-05, 'epoch': 0.29} 29%|██▊ | 1647/5773 [2:33:22<6:16:08, 5.47s/it] 29%|██▊ | 1648/5773 [2:33:27<6:17:50, 5.50s/it] 29%|██▊ | 1648/5773 [2:33:33<6:17:50, 5.50s/it] {'loss': 0.5936, 
'learning_rate': 1.6770425383936734e-05, 'epoch': 0.29} 29%|██▊ | 1648/5773 [2:33:33<6:17:50, 5.50s/it] {'loss': 0.5936, 'learning_rate': 1.6770425383936734e-05, 'epoch': 0.29} 29%|██▊ | 1648/5773 [2:33:27<6:17:50, 5.50s/it] 29%|██▊ | 1649/5773 [2:33:33<6:15:02, 5.46s/it] 29%|██▊ | 1649/5773 [2:33:38<6:15:02, 5.46s/it] {'loss': 0.59, 'learning_rate': 1.6766294945454502e-05, 'epoch': 0.29} 29%|██▊ | 1649/5773 [2:33:38<6:15:02, 5.46s/it] {'loss': 0.59, 'learning_rate': 1.6766294945454502e-05, 'epoch': 0.29} 29%|██▊ | 1649/5773 [2:33:33<6:15:02, 5.46s/it]14 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 11 29%|██▊ | 1650/5773 [2:33:38<6:15:42, 5.47s/it]2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 29%|██▊ | 1650/5773 [2:33:44<6:15:41, 5.47s/it]6 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5716, 'learning_rate': 1.6762162376726438e-05, 'epoch': 0.29} 29%|██▊ | 1650/5773 [2:33:44<6:15:41, 5.47s/it] {'loss': 0.5716, 'learning_rate': 1.6762162376726438e-05, 'epoch': 0.29} 29%|██▊ | 1650/5773 [2:33:38<6:15:42, 5.47s/it] 29%|██▊ | 1651/5773 [2:33:44<6:13:40, 5.44s/it] 29%|██▊ | 1651/5773 [2:33:49<6:13:41, 5.44s/it] {'loss': 0.5754, 'learning_rate': 1.6758027679053607e-05, 'epoch': 0.29} 29%|██▊ | 1651/5773 [2:33:49<6:13:41, 5.44s/it] {'loss': 0.5754, 'learning_rate': 1.6758027679053607e-05, 'epoch': 0.29} 29%|██▊ | 1651/5773 [2:33:44<6:13:40, 5.44s/it] 29%|██▊ | 1652/5773 [2:33:49<6:16:37, 5.48s/it] 29%|██▊ | 1652/5773 [2:33:55<6:16:36, 5.48s/it] {'loss': 0.5979, 'learning_rate': 1.675389085373775e-05, 'epoch': 0.29} 29%|██▊ | 1652/5773 [2:33:55<6:16:36, 5.48s/it] {'loss': 0.5979, 'learning_rate': 1.675389085373775e-05, 'epoch': 0.29} 29%|██▊ | 1652/5773 [2:33:49<6:16:37, 5.48s/it] 29%|██▊ | 1653/5773 [2:33:54<6:11:49, 5.41s/it] 29%|██▊ | 1653/5773 [2:34:00<6:11:48, 5.41s/it] {'loss': 0.5786, 'learning_rate': 1.674975190208126e-05, 'epoch': 0.29} 29%|██▊ | 1653/5773 [2:34:00<6:11:48, 5.41s/it] {'loss': 0.5786, 'learning_rate': 1.674975190208126e-05, 'epoch': 0.29} 29%|██▊ | 1653/5773 [2:33:54<6:11:49, 5.41s/it] 29%|██▊ | 1654/5773 [2:34:00<6:17:04, 5.49s/it] 29%|██▊ | 1654/5773 [2:34:06<6:17:03, 5.49s/it] {'loss': 0.6052, 'learning_rate': 1.674561082538722e-05, 'epoch': 0.29} 29%|██▊ | 1654/5773 [2:34:06<6:17:03, 5.49s/it] {'loss': 0.6052, 'learning_rate': 1.674561082538722e-05, 'epoch': 0.29} 29%|██▊ | 1654/5773 [2:34:00<6:17:04, 5.49s/it] 29%|██▊ | 1655/5773 [2:34:06<6:19:22, 5.53s/it] 29%|██▊ | 1655/5773 [2:34:11<6:19:22, 5.53s/it] {'loss': 0.5916, 'learning_rate': 1.6741467624959372e-05, 'epoch': 0.29} 29%|██▊ | 1655/5773 [2:34:11<6:19:22, 5.53s/it] {'loss': 0.5916, 'learning_rate': 1.6741467624959372e-05, 'epoch': 0.29} 29%|██▊ | 1655/5773 [2:34:06<6:19:22, 5.53s/it] 29%|██▊ | 1656/5773 [2:34:11<6:17:22, 5.50s/it] 29%|██▊ | 1656/5773 
[2:34:17<6:17:22, 5.50s/it] {'loss': 0.5778, 'learning_rate': 1.673732230210213e-05, 'epoch': 0.29} 29%|██▊ | 1656/5773 [2:34:17<6:17:22, 5.50s/it] {'loss': 0.5778, 'learning_rate': 1.673732230210213e-05, 'epoch': 0.29} 29%|██▊ | 1656/5773 [2:34:11<6:17:22, 5.50s/it] 29%|██▊ | 1657/5773 [2:34:17<6:15:46, 5.48s/it] 29%|██▊ | 1657/5773 [2:34:22<6:15:46, 5.48s/it] {'loss': 0.6019, 'learning_rate': 1.6733174858120568e-05, 'epoch': 0.29} 29%|██▊ | 1657/5773 [2:34:22<6:15:46, 5.48s/it] {'loss': 0.6019, 'learning_rate': 1.6733174858120568e-05, 'epoch': 0.29} 29%|██▊ | 1657/5773 [2:34:17<6:15:46, 5.48s/it] 29%|██▊ | 1658/5773 [2:34:22<6:14:45, 5.46s/it] 29%|██▊ | 1658/5773 [2:34:27<6:14:46, 5.46s/it] {'loss': 0.5711, 'learning_rate': 1.672902529432044e-05, 'epoch': 0.29} 29%|██▊ | 1658/5773 [2:34:27<6:14:46, 5.46s/it] {'loss': 0.5711, 'learning_rate': 1.672902529432044e-05, 'epoch': 0.29} 29%|██▊ | 1658/5773 [2:34:22<6:14:45, 5.46s/it] 29%|██▊ | 1659/5773 [2:34:27<6:12:46, 5.44s/it] 29%|██▊ | 1659/5773 [2:34:33<6:12:46, 5.44s/it] {'loss': 0.5932, 'learning_rate': 1.6724873612008155e-05, 'epoch': 0.29} 29%|██▊ | 1659/5773 [2:34:33<6:12:46, 5.44s/it] {'loss': 0.5932, 'learning_rate': 1.6724873612008155e-05, 'epoch': 0.29} 29%|██▊ | 1659/5773 [2:34:27<6:12:46, 5.44s/it] 29%|██▉ | 1660/5773 [2:34:33<6:13:34, 5.45s/it] 29%|██▉ | 1660/5773 [2:34:38<6:13:34, 5.45s/it] {'loss': 0.5828, 'learning_rate': 1.67207198124908e-05, 'epoch': 0.29} 29%|██▉ | 1660/5773 [2:34:38<6:13:34, 5.45s/it] {'loss': 0.5828, 'learning_rate': 1.67207198124908e-05, 'epoch': 0.29} 29%|██▉ | 1660/5773 [2:34:33<6:13:34, 5.45s/it] 29%|██▉ | 1661/5773 [2:34:38<6:12:47, 5.44s/it] 29%|██▉ | 1661/5773 [2:34:44<6:12:47, 5.44s/it] {'loss': 0.5747, 'learning_rate': 1.6716563897076122e-05, 'epoch': 0.29} 29%|██▉ | 1661/5773 [2:34:44<6:12:47, 5.44s/it] {'loss': 0.5747, 'learning_rate': 1.6716563897076122e-05, 'epoch': 0.29} 29%|██▉ | 1661/5773 [2:34:38<6:12:47, 5.44s/it] 29%|██▉ | 1662/5773 [2:34:44<6:13:57, 5.46s/it] 
29%|██▉ | 1662/5773 [2:34:49<6:13:58, 5.46s/it] {'loss': 0.6034, 'learning_rate': 1.6712405867072537e-05, 'epoch': 0.29} 29%|██▉ | 1662/5773 [2:34:49<6:13:58, 5.46s/it] {'loss': 0.6034, 'learning_rate': 1.6712405867072537e-05, 'epoch': 0.29} 29%|██▉ | 1662/5773 [2:34:44<6:13:57, 5.46s/it] 29%|██▉ | 1663/5773 [2:34:49<6:15:16, 5.48s/it] 29%|██▉ | 1663/5773 [2:34:55<6:15:15, 5.48s/it] {'loss': 0.5791, 'learning_rate': 1.670824572378913e-05, 'epoch': 0.29} 29%|██▉ | 1663/5773 [2:34:55<6:15:15, 5.48s/it] {'loss': 0.5791, 'learning_rate': 1.670824572378913e-05, 'epoch': 0.29} 29%|██▉ | 1663/5773 [2:34:49<6:15:16, 5.48s/it] 29%|██▉ | 1664/5773 [2:34:55<6:14:00, 5.46s/it] 29%|██▉ | 1664/5773 [2:35:00<6:14:00, 5.46s/it] {'loss': 0.5854, 'learning_rate': 1.6704083468535636e-05, 'epoch': 0.29} 29%|██▉ | 1664/5773 [2:35:00<6:14:00, 5.46s/it] {'loss': 0.5854, 'learning_rate': 1.6704083468535636e-05, 'epoch': 0.29} 29%|██▉ | 1664/5773 [2:34:55<6:14:00, 5.46s/it] 29%|██▉ | 1665/5773 [2:35:00<6:14:12, 5.47s/it] 29%|██▉ | 1665/5773 [2:35:06<6:14:12, 5.47s/it] {'loss': 0.5695, 'learning_rate': 1.6699919102622474e-05, 'epoch': 0.29} 29%|██▉ | 1665/5773 [2:35:06<6:14:12, 5.47s/it] {'loss': 0.5695, 'learning_rate': 1.6699919102622474e-05, 'epoch': 0.29} 29%|██▉ | 1665/5773 [2:35:00<6:14:12, 5.47s/it] 29%|██▉ | 1666/5773 [2:35:05<6:11:34, 5.43s/it] 29%|██▉ | 1666/5773 [2:35:11<6:11:35, 5.43s/it] {'loss': 0.5889, 'learning_rate': 1.6695752627360718e-05, 'epoch': 0.29} 29%|██▉ | 1666/5773 [2:35:11<6:11:35, 5.43s/it] {'loss': 0.5889, 'learning_rate': 1.6695752627360718e-05, 'epoch': 0.29} 29%|██▉ | 1666/5773 [2:35:05<6:11:34, 5.43s/it] 29%|██▉ | 1667/5773 [2:35:11<6:11:24, 5.43s/it] 29%|██▉ | 1667/5773 [2:35:16<6:11:24, 5.43s/it] {'loss': 0.5774, 'learning_rate': 1.6691584044062102e-05, 'epoch': 0.29} 29%|██▉ | 1667/5773 [2:35:16<6:11:24, 5.43s/it] {'loss': 0.5774, 'learning_rate': 1.6691584044062102e-05, 'epoch': 0.29} 29%|██▉ | 1667/5773 [2:35:11<6:11:24, 5.43s/it] 29%|██▉ | 1668/5773 
[2:35:16<6:11:56, 5.44s/it] 29%|██▉ | 1668/5773 [2:35:22<6:11:56, 5.44s/it] {'loss': 0.5787, 'learning_rate': 1.668741335403904e-05, 'epoch': 0.29} 29%|██▉ | 1668/5773 [2:35:22<6:11:56, 5.44s/it] {'loss': 0.5787, 'learning_rate': 1.668741335403904e-05, 'epoch': 0.29} 29%|██▉ | 1668/5773 [2:35:16<6:11:56, 5.44s/it] 29%|██▉ | 1669/5773 [2:35:22<6:10:49, 5.42s/it] 29%|██▉ | 1669/5773 [2:35:27<6:10:50, 5.42s/it] {'loss': 0.5869, 'learning_rate': 1.668324055860459e-05, 'epoch': 0.29} 29%|██▉ | 1669/5773 [2:35:27<6:10:50, 5.42s/it] {'loss': 0.5869, 'learning_rate': 1.668324055860459e-05, 'epoch': 0.29} 29%|██▉ | 1669/5773 [2:35:22<6:10:49, 5.42s/it] 29%|██▉ | 1670/5773 [2:35:27<6:10:35, 5.42s/it] 29%|██▉ | 1670/5773 [2:35:33<6:10:35, 5.42s/it] {'loss': 0.5981, 'learning_rate': 1.6679065659072486e-05, 'epoch': 0.29} 29%|██▉ | 1670/5773 [2:35:33<6:10:35, 5.42s/it] {'loss': 0.5981, 'learning_rate': 1.6679065659072486e-05, 'epoch': 0.29} 29%|██▉ | 1670/5773 [2:35:27<6:10:35, 5.42s/it] 29%|██▉ | 1671/5773 [2:35:33<6:19:53, 5.56s/it] 29%|██▉ | 1671/5773 [2:35:39<6:19:52, 5.56s/it] {'loss': 0.5888, 'learning_rate': 1.667488865675712e-05, 'epoch': 0.29} 29%|██▉ | 1671/5773 [2:35:39<6:19:52, 5.56s/it] {'loss': 0.5888, 'learning_rate': 1.667488865675712e-05, 'epoch': 0.29} 29%|██▉ | 1671/5773 [2:35:33<6:19:53, 5.56s/it] 29%|██▉ | 1672/5773 [2:35:38<6:17:16, 5.52s/it] 29%|██▉ | 1672/5773 [2:35:44<6:17:16, 5.52s/it] {'loss': 0.5789, 'learning_rate': 1.667070955297354e-05, 'epoch': 0.29} 29%|██▉ | 1672/5773 [2:35:44<6:17:16, 5.52s/it] {'loss': 0.5789, 'learning_rate': 1.667070955297354e-05, 'epoch': 0.29} 29%|██▉ | 1672/5773 [2:35:38<6:17:16, 5.52s/it] 29%|██▉ | 1673/5773 [2:35:44<6:15:51, 5.50s/it] 29%|██▉ | 1673/5773 [2:35:49<6:15:51, 5.50s/it] {'loss': 0.6004, 'learning_rate': 1.6666528349037467e-05, 'epoch': 0.29} 29%|██▉ | 1673/5773 [2:35:49<6:15:51, 5.50s/it] {'loss': 0.6004, 'learning_rate': 1.6666528349037467e-05, 'epoch': 0.29} 29%|██▉ | 1673/5773 [2:35:44<6:15:51, 5.50s/it] 
29%|██▉ | 1674/5773 [2:35:50<6:17:36, 5.53s/it] 29%|██▉ | 1674/5773 [2:35:55<6:17:36, 5.53s/it] {'loss': 0.5843, 'learning_rate': 1.6662345046265273e-05, 'epoch': 0.29} 29%|██▉ | 1674/5773 [2:35:55<6:17:36, 5.53s/it] {'loss': 0.5843, 'learning_rate': 1.6662345046265273e-05, 'epoch': 0.29} 29%|██▉ | 1674/5773 [2:35:50<6:17:36, 5.53s/it] 29%|██▉ | 1675/5773 [2:35:55<6:20:12, 5.57s/it] 29%|██▉ | 1675/5773 [2:36:01<6:20:12, 5.57s/it] {'loss': 0.5757, 'learning_rate': 1.6658159645974e-05, 'epoch': 0.29} 29%|██▉ | 1675/5773 [2:36:01<6:20:12, 5.57s/it] {'loss': 0.5757, 'learning_rate': 1.6658159645974e-05, 'epoch': 0.29} 29%|██▉ | 1675/5773 [2:35:55<6:20:12, 5.57s/it] 29%|██▉ | 1676/5773 [2:36:01<6:20:39, 5.57s/it] 29%|██▉ | 1676/5773 [2:36:06<6:20:39, 5.57s/it] {'loss': 0.5704, 'learning_rate': 1.6653972149481342e-05, 'epoch': 0.29} 29%|██▉ | 1676/5773 [2:36:06<6:20:39, 5.57s/it] {'loss': 0.5704, 'learning_rate': 1.6653972149481342e-05, 'epoch': 0.29} 29%|██▉ | 1676/5773 [2:36:01<6:20:39, 5.57s/it] 29%|██▉ | 1677/5773 [2:36:06<6:16:40, 5.52s/it] 29%|██▉ | 1677/5773 [2:36:12<6:16:39, 5.52s/it] {'loss': 0.5957, 'learning_rate': 1.664978255810566e-05, 'epoch': 0.29} 29%|██▉ | 1677/5773 [2:36:12<6:16:39, 5.52s/it] {'loss': 0.5957, 'learning_rate': 1.664978255810566e-05, 'epoch': 0.29} 29%|██▉ | 1677/5773 [2:36:06<6:16:40, 5.52s/it] 29%|██▉ | 1678/5773 [2:36:12<6:15:58, 5.51s/it] 29%|██▉ | 1678/5773 [2:36:17<6:15:58, 5.51s/it] {'loss': 0.566, 'learning_rate': 1.6645590873165968e-05, 'epoch': 0.29} 29%|██▉ | 1678/5773 [2:36:17<6:15:58, 5.51s/it] {'loss': 0.566, 'learning_rate': 1.6645590873165968e-05, 'epoch': 0.29} 29%|██▉ | 1678/5773 [2:36:12<6:15:58, 5.51s/it] 29%|██▉ | 1679/5773 [2:36:17<6:12:32, 5.46s/it] 29%|██▉ | 1679/5773 [2:36:23<6:12:32, 5.46s/it] {'loss': 0.6014, 'learning_rate': 1.6641397095981946e-05, 'epoch': 0.29} 29%|██▉ | 1679/5773 [2:36:23<6:12:32, 5.46s/it] {'loss': 0.6014, 'learning_rate': 1.6641397095981946e-05, 'epoch': 0.29} 29%|██▉ | 1679/5773 
[2:36:17<6:12:32, 5.46s/it] 29%|██▉ | 1680/5773 [2:36:22<6:10:11, 5.43s/it] 29%|██▉ | 1680/5773 [2:36:28<6:10:11, 5.43s/it] {'loss': 0.5742, 'learning_rate': 1.6637201227873926e-05, 'epoch': 0.29} 29%|██▉ | 1680/5773 [2:36:28<6:10:11, 5.43s/it] {'loss': 0.5742, 'learning_rate': 1.6637201227873926e-05, 'epoch': 0.29} 29%|██▉ | 1680/5773 [2:36:22<6:10:11, 5.43s/it] 29%|██▉ | 1681/5773 [2:36:28<6:13:57, 5.48s/it] 29%|██▉ | 1681/5773 [2:36:33<6:13:58, 5.48s/it] {'loss': 0.5975, 'learning_rate': 1.6633003270162903e-05, 'epoch': 0.29} 29%|██▉ | 1681/5773 [2:36:33<6:13:58, 5.48s/it] {'loss': 0.5975, 'learning_rate': 1.6633003270162903e-05, 'epoch': 0.29} 29%|██▉ | 1681/5773 [2:36:28<6:13:57, 5.48s/it] 29%|██▉ | 1682/5773 [2:36:33<6:13:34, 5.48s/it] 29%|██▉ | 1682/5773 [2:36:39<6:13:34, 5.48s/it] {'loss': 0.5761, 'learning_rate': 1.6628803224170526e-05, 'epoch': 0.29} 29%|██▉ | 1682/5773 [2:36:39<6:13:34, 5.48s/it] {'loss': 0.5761, 'learning_rate': 1.6628803224170526e-05, 'epoch': 0.29} 29%|██▉ | 1682/5773 [2:36:33<6:13:34, 5.48s/it] 29%|██▉ | 1683/5773 [2:36:39<6:12:14, 5.46s/it] 29%|██▉ | 1683/5773 [2:36:44<6:12:14, 5.46s/it] {'loss': 0.5898, 'learning_rate': 1.662460109121911e-05, 'epoch': 0.29} 29%|██▉ | 1683/5773 [2:36:44<6:12:14, 5.46s/it] {'loss': 0.5898, 'learning_rate': 1.662460109121911e-05, 'epoch': 0.29} 29%|██▉ | 1683/5773 [2:36:39<6:12:14, 5.46s/it] 29%|██▉ | 1684/5773 [2:36:44<6:08:45, 5.41s/it] 29%|██▉ | 1684/5773 [2:36:50<6:08:45, 5.41s/it] {'loss': 0.5942, 'learning_rate': 1.6620396872631608e-05, 'epoch': 0.29} 29%|██▉ | 1684/5773 [2:36:50<6:08:45, 5.41s/it] {'loss': 0.5942, 'learning_rate': 1.6620396872631608e-05, 'epoch': 0.29} 29%|██▉ | 1684/5773 [2:36:44<6:08:45, 5.41s/it] 29%|██▉ | 1685/5773 [2:36:50<6:08:56, 5.41s/it] 29%|██▉ | 1685/5773 [2:36:55<6:08:55, 5.41s/it] {'loss': 0.6007, 'learning_rate': 1.6616190569731655e-05, 'epoch': 0.29} 29%|██▉ | 1685/5773 [2:36:55<6:08:55, 5.41s/it] {'loss': 0.6007, 'learning_rate': 1.6616190569731655e-05, 'epoch': 
0.29} 29%|██▉ | 1685/5773 [2:36:50<6:08:56, 5.41s/it] 29%|██▉ | 1686/5773 [2:36:55<6:10:12, 5.43s/it] 29%|██▉ | 1686/5773 [2:37:01<6:10:12, 5.43s/it] {'loss': 0.5707, 'learning_rate': 1.6611982183843524e-05, 'epoch': 0.29} 29%|██▉ | 1686/5773 [2:37:01<6:10:12, 5.43s/it] {'loss': 0.5707, 'learning_rate': 1.6611982183843524e-05, 'epoch': 0.29} 29%|██▉ | 1686/5773 [2:36:55<6:10:12, 5.43s/it] 29%|██▉ | 1687/5773 [2:37:00<6:09:04, 5.42s/it] 29%|██▉ | 1687/5773 [2:37:06<6:09:04, 5.42s/it] {'loss': 0.5927, 'learning_rate': 1.6607771716292153e-05, 'epoch': 0.29} 29%|██▉ | 1687/5773 [2:37:06<6:09:04, 5.42s/it] {'loss': 0.5927, 'learning_rate': 1.6607771716292153e-05, 'epoch': 0.29} 29%|██▉ | 1687/5773 [2:37:00<6:09:04, 5.42s/it] 29%|██▉ | 1688/5773 [2:37:06<6:09:50, 5.43s/it] 29%|██▉ | 1688/5773 [2:37:11<6:09:49, 5.43s/it] {'loss': 0.5828, 'learning_rate': 1.6603559168403126e-05, 'epoch': 0.29} 29%|██▉ | 1688/5773 [2:37:11<6:09:49, 5.43s/it] {'loss': 0.5828, 'learning_rate': 1.6603559168403126e-05, 'epoch': 0.29} 29%|██▉ | 1688/5773 [2:37:06<6:09:50, 5.43s/it] 29%|██▉ | 1689/5773 [2:37:11<6:12:44, 5.48s/it] 29%|██▉ | 1689/5773 [2:37:17<6:12:44, 5.48s/it] {'loss': 0.5876, 'learning_rate': 1.659934454150269e-05, 'epoch': 0.29} 29%|██▉ | 1689/5773 [2:37:17<6:12:44, 5.48s/it] {'loss': 0.5876, 'learning_rate': 1.659934454150269e-05, 'epoch': 0.29} 29%|██▉ | 1689/5773 [2:37:11<6:12:44, 5.48s/it] 29%|██▉ | 1690/5773 [2:37:17<6:12:11, 5.47s/it] 29%|██▉ | 1690/5773 [2:37:22<6:12:12, 5.47s/it] {'loss': 0.579, 'learning_rate': 1.6595127836917744e-05, 'epoch': 0.29} 29%|██▉ | 1690/5773 [2:37:22<6:12:12, 5.47s/it] {'loss': 0.579, 'learning_rate': 1.6595127836917744e-05, 'epoch': 0.29} 29%|██▉ | 1690/5773 [2:37:17<6:12:11, 5.47s/it] 29%|██▉ | 1691/5773 [2:37:22<6:13:19, 5.49s/it] 29%|██▉ | 1691/5773 [2:37:28<6:13:18, 5.49s/it] {'loss': 0.5951, 'learning_rate': 1.6590909055975846e-05, 'epoch': 0.29} 29%|██▉ | 1691/5773 [2:37:28<6:13:18, 5.49s/it] {'loss': 0.5951, 'learning_rate': 
1.6590909055975846e-05, 'epoch': 0.29} 29%|██▉ | 1691/5773 [2:37:22<6:13:19, 5.49s/it] 29%|██▉ | 1692/5773 [2:37:28<6:11:30, 5.46s/it] 29%|██▉ | 1692/5773 [2:37:33<6:11:29, 5.46s/it] {'loss': 0.5795, 'learning_rate': 1.6586688200005193e-05, 'epoch': 0.29} 29%|██▉ | 1692/5773 [2:37:33<6:11:29, 5.46s/it] {'loss': 0.5795, 'learning_rate': 1.6586688200005193e-05, 'epoch': 0.29} 29%|██▉ | 1692/5773 [2:37:28<6:11:30, 5.46s/it] 29%|██▉ | 1693/5773 [2:37:33<6:10:54, 5.45s/it] 29%|██▉ | 1693/5773 [2:37:39<6:10:54, 5.45s/it] {'loss': 0.588, 'learning_rate': 1.6582465270334655e-05, 'epoch': 0.29} 29%|██▉ | 1693/5773 [2:37:39<6:10:54, 5.45s/it] {'loss': 0.588, 'learning_rate': 1.6582465270334655e-05, 'epoch': 0.29} 29%|██▉ | 1693/5773 [2:37:33<6:10:54, 5.45s/it] 29%|██▉ | 1694/5773 [2:37:39<6:13:48, 5.50s/it] 29%|██▉ | 1694/5773 [2:37:44<6:13:49, 5.50s/it] {'loss': 0.5729, 'learning_rate': 1.6578240268293745e-05, 'epoch': 0.29} 29%|██▉ | 1694/5773 [2:37:44<6:13:49, 5.50s/it] {'loss': 0.5729, 'learning_rate': 1.6578240268293745e-05, 'epoch': 0.29} 29%|██▉ | 1694/5773 [2:37:39<6:13:48, 5.50s/it] 29%|██▉ | 1695/5773 [2:37:45<6:16:55, 5.55s/it] 29%|██▉ | 1695/5773 [2:37:50<6:16:55, 5.55s/it] {'loss': 0.5886, 'learning_rate': 1.657401319521262e-05, 'epoch': 0.29} 29%|██▉ | 1695/5773 [2:37:50<6:16:55, 5.55s/it] {'loss': 0.5886, 'learning_rate': 1.657401319521262e-05, 'epoch': 0.29} 29%|██▉ | 1695/5773 [2:37:45<6:16:55, 5.55s/it] 29%|██▉ | 1696/5773 [2:37:55<6:14:14, 5.51s/it] 29%|██▉ | 1696/5773 [2:37:50<6:14:14, 5.51s/it] {'loss': 0.5858, 'learning_rate': 1.6569784052422105e-05, 'epoch': 0.29} {'loss': 0.5858, 'learning_rate': 1.6569784052422105e-05, 'epoch': 0.29} 29%|██▉ | 1696/5773 [2:37:55<6:14:14, 5.51s/it] 29%|██▉ | 1696/5773 [2:37:50<6:14:14, 5.51s/it] 29%|██▉ | 1697/5773 [2:37:55<6:12:24, 5.48s/it] 29%|██▉ | 1697/5773 [2:38:01<6:12:24, 5.48s/it] {'loss': 0.6034, 'learning_rate': 1.6565552841253663e-05, 'epoch': 0.29} 29%|██▉ | 1697/5773 [2:38:01<6:12:24, 5.48s/it] {'loss': 
0.6034, 'learning_rate': 1.6565552841253663e-05, 'epoch': 0.29} 29%|██▉ | 1697/5773 [2:37:55<6:12:24, 5.48s/it] 29%|██▉ | 1698/5773 [2:38:01<6:11:37, 5.47s/it] 29%|██▉ | 1698/5773 [2:38:06<6:11:37, 5.47s/it] {'loss': 0.5853, 'learning_rate': 1.6561319563039428e-05, 'epoch': 0.29} 29%|██▉ | 1698/5773 [2:38:06<6:11:37, 5.47s/it] {'loss': 0.5853, 'learning_rate': 1.6561319563039428e-05, 'epoch': 0.29} 29%|██▉ | 1698/5773 [2:38:01<6:11:37, 5.47s/it] 29%|██▉ | 1699/5773 [2:38:06<6:07:33, 5.41s/it] 29%|██▉ | 1699/5773 [2:38:12<6:07:33, 5.41s/it] {'loss': 0.5777, 'learning_rate': 1.655708421911215e-05, 'epoch': 0.29} 29%|██▉ | 1699/5773 [2:38:12<6:07:33, 5.41s/it] {'loss': 0.5777, 'learning_rate': 1.655708421911215e-05, 'epoch': 0.29} 29%|██▉ | 1699/5773 [2:38:06<6:07:33, 5.41s/it] 1 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 29%|██▉ | 1700/5773 [2:38:12<6:12:40, 5.49s/it] 2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 
29%|██▉ | 1700/5773 [2:38:17<6:12:40, 5.49s/it] {'loss': 0.5866, 'learning_rate': 1.655284681080527e-05, 'epoch': 0.29} 29%|██▉ | 1700/5773 [2:38:17<6:12:40, 5.49s/it] {'loss': 0.5866, 'learning_rate': 1.655284681080527e-05, 'epoch': 0.29} 29%|██▉ | 1700/5773 [2:38:12<6:12:40, 5.49s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 29%|██▉ | 1701/5773 [2:38:30<10:22:24, 9.17s/it] 29%|██▉ | 1701/5773 [2:38:35<10:22:23, 9.17s/it] {'loss': 0.6017, 'learning_rate': 1.6548607339452853e-05, 'epoch': 0.29} 29%|██▉ | 1701/5773 [2:38:35<10:22:23, 9.17s/it] {'loss': 0.6017, 'learning_rate': 1.6548607339452853e-05, 'epoch': 0.29} 29%|██▉ | 1701/5773 [2:38:30<10:22:24, 9.17s/it] 29%|██▉ | 1702/5773 [2:38:35<9:04:13, 8.02s/it] 29%|██▉ | 1702/5773 [2:38:40<9:04:13, 8.02s/it] {'loss': 0.5756, 'learning_rate': 1.6544365806389614e-05, 'epoch': 0.29} 29%|██▉ | 1702/5773 [2:38:40<9:04:13, 8.02s/it] {'loss': 0.5756, 'learning_rate': 1.6544365806389614e-05, 'epoch': 0.29} 29%|██▉ | 1702/5773 [2:38:35<9:04:13, 8.02s/it] 29%|██▉ | 1703/5773 [2:38:40<8:09:01, 7.21s/it] 29%|██▉ | 1703/5773 [2:38:46<8:09:00, 7.21s/it] {'loss': 0.5753, 'learning_rate': 1.654012221295093e-05, 'epoch': 0.29} 29%|██▉ | 1703/5773 [2:38:46<8:09:00, 7.21s/it] {'loss': 0.5753, 'learning_rate': 1.654012221295093e-05, 'epoch': 0.29} 29%|██▉ | 1703/5773 [2:38:40<8:09:01, 7.21s/it] 30%|██▉ | 1704/5773 [2:38:46<7:32:03, 6.67s/it] 30%|██▉ | 1704/5773 [2:38:51<7:32:03, 6.67s/it] {'loss': 0.5924, 'learning_rate': 1.653587656047282e-05, 'epoch': 0.3} 30%|██▉ | 1704/5773 [2:38:51<7:32:03, 6.67s/it] {'loss': 0.5924, 'learning_rate': 1.653587656047282e-05, 'epoch': 0.3} 30%|██▉ | 1704/5773 [2:38:46<7:32:03, 6.67s/it] 30%|██▉ | 1705/5773 [2:38:51<7:02:49, 6.24s/it] 30%|██▉ | 1705/5773 [2:38:56<7:02:48, 6.24s/it] {'loss': 0.5832, 'learning_rate': 1.6531628850291947e-05, 'epoch': 0.3} 30%|██▉ | 1705/5773 [2:38:56<7:02:48, 6.24s/it] {'loss': 0.5832, 'learning_rate': 1.6531628850291947e-05, 'epoch': 0.3} 30%|██▉ | 1705/5773 [2:38:51<7:02:49, 6.24s/it] 30%|██▉ | 1706/5773 [2:38:56<6:43:37, 5.95s/it] 30%|██▉ | 1706/5773 [2:39:02<6:43:37, 5.95s/it] {'loss': 0.5851, 'learning_rate': 1.652737908374563e-05, 'epoch': 0.3} 30%|██▉ | 1706/5773 [2:39:02<6:43:37, 5.95s/it] {'loss': 0.5851, 'learning_rate': 1.652737908374563e-05, 'epoch': 0.3} 30%|██▉ | 
1706/5773 [2:38:56<6:43:37, 5.95s/it] 30%|██▉ | 1707/5773 [2:39:01<6:28:49, 5.74s/it] 30%|██▉ | 1707/5773 [2:39:07<6:28:48, 5.74s/it] {'loss': 0.5646, 'learning_rate': 1.6523127262171827e-05, 'epoch': 0.3} 30%|██▉ | 1707/5773 [2:39:07<6:28:48, 5.74s/it] {'loss': 0.5646, 'learning_rate': 1.6523127262171827e-05, 'epoch': 0.3} 30%|██▉ | 1707/5773 [2:39:01<6:28:49, 5.74s/it] 30%|██▉ | 1708/5773 [2:39:07<6:25:20, 5.69s/it] 30%|██▉ | 1708/5773 [2:39:12<6:25:20, 5.69s/it] {'loss': 0.5968, 'learning_rate': 1.6518873386909148e-05, 'epoch': 0.3} {'loss': 0.5968, 'learning_rate': 1.6518873386909148e-05, 'epoch': 0.3} 30%|██▉ | 1708/5773 [2:39:07<6:25:20, 5.69s/it] 30%|██▉ | 1708/5773 [2:39:12<6:25:20, 5.69s/it] 30%|██▉ | 1709/5773 [2:39:13<6:24:48, 5.68s/it] 30%|██▉ | 1709/5773 [2:39:18<6:24:48, 5.68s/it] {'loss': 0.5919, 'learning_rate': 1.6514617459296852e-05, 'epoch': 0.3} 30%|██▉ | 1709/5773 [2:39:18<6:24:48, 5.68s/it] {'loss': 0.5919, 'learning_rate': 1.6514617459296852e-05, 'epoch': 0.3} 30%|██▉ | 1709/5773 [2:39:13<6:24:48, 5.68s/it] 30%|██▉ | 1710/5773 [2:39:18<6:20:50, 5.62s/it] 30%|██▉ | 1710/5773 [2:39:24<6:20:50, 5.62s/it] {'loss': 0.59, 'learning_rate': 1.651035948067484e-05, 'epoch': 0.3} 30%|██▉ | 1710/5773 [2:39:24<6:20:50, 5.62s/it] {'loss': 0.59, 'learning_rate': 1.651035948067484e-05, 'epoch': 0.3} 30%|██▉ | 1710/5773 [2:39:18<6:20:50, 5.62s/it] 30%|██▉ | 1711/5773 [2:39:29<6:21:33, 5.64s/it] 30%|██▉ | 1711/5773 [2:39:24<6:21:33, 5.64s/it] {'loss': 0.5807, 'learning_rate': 1.6506099452383656e-05, 'epoch': 0.3} {'loss': 0.5807, 'learning_rate': 1.6506099452383656e-05, 'epoch': 0.3} 30%|██▉ | 1711/5773 [2:39:29<6:21:33, 5.64s/it] 30%|██▉ | 1711/5773 [2:39:24<6:21:33, 5.64s/it] 30%|██▉ | 1712/5773 [2:39:29<6:23:19, 5.66s/it] 30%|██▉ | 1712/5773 [2:39:35<6:23:19, 5.66s/it] {'loss': 0.5872, 'learning_rate': 1.65018373757645e-05, 'epoch': 0.3} 30%|██▉ | 1712/5773 [2:39:35<6:23:19, 5.66s/it] {'loss': 0.5872, 'learning_rate': 1.65018373757645e-05, 'epoch': 0.3} 
30%|██▉ | 1712/5773 [2:39:29<6:23:19, 5.66s/it] 30%|██▉ | 1713/5773 [2:39:35<6:20:59, 5.63s/it] 30%|██▉ | 1713/5773 [2:39:41<6:20:58, 5.63s/it] {'loss': 0.5871, 'learning_rate': 1.6497573252159203e-05, 'epoch': 0.3} 30%|██▉ | 1713/5773 [2:39:41<6:20:58, 5.63s/it] {'loss': 0.5871, 'learning_rate': 1.6497573252159203e-05, 'epoch': 0.3} 30%|██▉ | 1713/5773 [2:39:35<6:20:59, 5.63s/it] 30%|██▉ | 1714/5773 [2:39:41<6:21:38, 5.64s/it] 30%|██▉ | 1714/5773 [2:39:46<6:21:37, 5.64s/it] {'loss': 0.5942, 'learning_rate': 1.649330708291025e-05, 'epoch': 0.3} 30%|██▉ | 1714/5773 [2:39:46<6:21:37, 5.64s/it] {'loss': 0.5942, 'learning_rate': 1.649330708291025e-05, 'epoch': 0.3} 30%|██▉ | 1714/5773 [2:39:41<6:21:38, 5.64s/it] 30%|██▉ | 1715/5773 [2:39:54<7:09:49, 6.36s/it] 30%|██▉ | 1715/5773 [2:39:49<7:09:49, 6.36s/it] {'loss': 0.5662, 'learning_rate': 1.648903886936077e-05, 'epoch': 0.3} 30%|██▉ | 1715/5773 [2:39:54<7:09:49, 6.36s/it] {'loss': 0.5662, 'learning_rate': 1.648903886936077e-05, 'epoch': 0.3} 30%|██▉ | 1715/5773 [2:39:49<7:09:49, 6.36s/it] 30%|██▉ | 1716/5773 [2:39:54<6:48:27, 6.04s/it] 30%|██▉ | 1716/5773 [2:40:00<6:48:27, 6.04s/it] {'loss': 0.5922, 'learning_rate': 1.648476861285453e-05, 'epoch': 0.3} 30%|██▉ | 1716/5773 [2:40:00<6:48:27, 6.04s/it] {'loss': 0.5922, 'learning_rate': 1.648476861285453e-05, 'epoch': 0.3} 30%|██▉ | 1716/5773 [2:39:54<6:48:27, 6.04s/it] 30%|██▉ | 1717/5773 [2:39:59<6:32:53, 5.81s/it] 30%|██▉ | 1717/5773 [2:40:05<6:32:52, 5.81s/it] {'loss': 0.5698, 'learning_rate': 1.6480496314735944e-05, 'epoch': 0.3} 30%|██▉ | 1717/5773 [2:40:05<6:32:52, 5.81s/it] {'loss': 0.5698, 'learning_rate': 1.6480496314735944e-05, 'epoch': 0.3} 30%|██▉ | 1717/5773 [2:39:59<6:32:53, 5.81s/it] 30%|██▉ | 1718/5773 [2:40:10<6:23:45, 5.68s/it] 30%|██▉ | 1718/5773 [2:40:05<6:23:45, 5.68s/it] {'loss': 0.5996, 'learning_rate': 1.6476221976350064e-05, 'epoch': 0.3} 30%|██▉ | 1718/5773 [2:40:10<6:23:45, 5.68s/it] {'loss': 0.5996, 'learning_rate': 1.6476221976350064e-05, 
'epoch': 0.3} 30%|██▉ | 1718/5773 [2:40:05<6:23:45, 5.68s/it] 30%|██▉ | 1719/5773 [2:40:16<6:19:33, 5.62s/it] 30%|██▉ | 1719/5773 [2:40:10<6:19:33, 5.62s/it] {'loss': 0.5764, 'learning_rate': 1.6471945599042592e-05, 'epoch': 0.3} 30%|██▉ | 1719/5773 [2:40:16<6:19:33, 5.62s/it] {'loss': 0.5764, 'learning_rate': 1.6471945599042592e-05, 'epoch': 0.3} 30%|██▉ | 1719/5773 [2:40:10<6:19:33, 5.62s/it] 30%|██▉ | 1720/5773 [2:40:21<6:19:41, 5.62s/it] 30%|██▉ | 1720/5773 [2:40:16<6:19:41, 5.62s/it] {'loss': 0.5709, 'learning_rate': 1.6467667184159872e-05, 'epoch': 0.3} 30%|██▉ | 1720/5773 [2:40:21<6:19:41, 5.62s/it] {'loss': 0.5709, 'learning_rate': 1.6467667184159872e-05, 'epoch': 0.3} 30%|██▉ | 1720/5773 [2:40:16<6:19:41, 5.62s/it] 30%|██▉ | 1721/5773 [2:40:21<6:13:47, 5.53s/it] 30%|██▉ | 1721/5773 [2:40:27<6:13:48, 5.54s/it] {'loss': 0.5806, 'learning_rate': 1.646338673304888e-05, 'epoch': 0.3} {'loss': 0.5806, 'learning_rate': 1.646338673304888e-05, 'epoch': 0.3} 30%|██▉ | 1721/5773 [2:40:27<6:13:48, 5.54s/it] 30%|██▉ | 1721/5773 [2:40:21<6:13:47, 5.53s/it] 30%|██▉ | 1722/5773 [2:40:27<6:11:53, 5.51s/it] 30%|██▉ | 1722/5773 [2:40:32<6:11:53, 5.51s/it] {'loss': 0.5817, 'learning_rate': 1.645910424705724e-05, 'epoch': 0.3} 30%|██▉ | 1722/5773 [2:40:32<6:11:53, 5.51s/it] {'loss': 0.5817, 'learning_rate': 1.645910424705724e-05, 'epoch': 0.3} 30%|██▉ | 1722/5773 [2:40:27<6:11:53, 5.51s/it] 30%|██▉ | 1723/5773 [2:40:38<6:12:49, 5.52s/it] 30%|██▉ | 1723/5773 [2:40:32<6:12:50, 5.52s/it] {'loss': 0.5808, 'learning_rate': 1.6454819727533216e-05, 'epoch': 0.3} 30%|██▉ | 1723/5773 [2:40:38<6:12:49, 5.52s/it] {'loss': 0.5808, 'learning_rate': 1.6454819727533216e-05, 'epoch': 0.3} 30%|██▉ | 1723/5773 [2:40:32<6:12:50, 5.52s/it] 30%|██▉ | 1724/5773 [2:40:43<6:08:27, 5.46s/it] 30%|██▉ | 1724/5773 [2:40:37<6:08:27, 5.46s/it] {'loss': 0.5929, 'learning_rate': 1.6450533175825715e-05, 'epoch': 0.3} 30%|██▉ | 1724/5773 [2:40:43<6:08:27, 5.46s/it] {'loss': 0.5929, 'learning_rate': 
1.6450533175825715e-05, 'epoch': 0.3} 30%|██▉ | 1724/5773 [2:40:37<6:08:27, 5.46s/it] 30%|██▉ | 1725/5773 [2:40:43<6:07:08, 5.44s/it] 30%|██▉ | 1725/5773 [2:40:48<6:07:09, 5.44s/it] {'loss': 0.5819, 'learning_rate': 1.6446244593284277e-05, 'epoch': 0.3} {'loss': 0.5819, 'learning_rate': 1.6446244593284277e-05, 'epoch': 0.3} 30%|██▉ | 1725/5773 [2:40:48<6:07:09, 5.44s/it] 30%|██▉ | 1725/5773 [2:40:43<6:07:08, 5.44s/it] 30%|██▉ | 1726/5773 [2:40:48<6:07:17, 5.45s/it] 30%|██▉ | 1726/5773 [2:40:54<6:07:17, 5.45s/it] {'loss': 0.574, 'learning_rate': 1.644195398125908e-05, 'epoch': 0.3} 30%|██▉ | 1726/5773 [2:40:54<6:07:17, 5.45s/it] {'loss': 0.574, 'learning_rate': 1.644195398125908e-05, 'epoch': 0.3} 30%|██▉ | 1726/5773 [2:40:48<6:07:17, 5.45s/it] 30%|██▉ | 1727/5773 [2:40:54<6:05:03, 5.41s/it] 30%|██▉ | 1727/5773 [2:40:59<6:05:03, 5.41s/it] {'loss': 0.5765, 'learning_rate': 1.6437661341100954e-05, 'epoch': 0.3} 30%|██▉ | 1727/5773 [2:40:59<6:05:03, 5.41s/it] {'loss': 0.5765, 'learning_rate': 1.6437661341100954e-05, 'epoch': 0.3} 30%|██▉ | 1727/5773 [2:40:54<6:05:03, 5.41s/it] 30%|██▉ | 1728/5773 [2:40:59<6:06:03, 5.43s/it] 30%|██▉ | 1728/5773 [2:41:05<6:06:04, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.6433366674161354e-05, 'epoch': 0.3} 30%|██▉ | 1728/5773 [2:41:05<6:06:04, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.6433366674161354e-05, 'epoch': 0.3} 30%|██▉ | 1728/5773 [2:40:59<6:06:03, 5.43s/it] 30%|██▉ | 1729/5773 [2:41:04<6:04:55, 5.41s/it] 30%|██▉ | 1729/5773 [2:41:10<6:04:55, 5.41s/it] {'loss': 0.5791, 'learning_rate': 1.6429069981792386e-05, 'epoch': 0.3} 30%|██▉ | 1729/5773 [2:41:10<6:04:55, 5.41s/it] {'loss': 0.5791, 'learning_rate': 1.6429069981792386e-05, 'epoch': 0.3} 30%|██▉ | 1729/5773 [2:41:04<6:04:55, 5.41s/it] 30%|██▉ | 1730/5773 [2:41:10<6:07:34, 5.45s/it] 30%|██▉ | 1730/5773 [2:41:16<6:07:34, 5.45s/it] {'loss': 0.5993, 'learning_rate': 1.6424771265346775e-05, 'epoch': 0.3} 30%|██▉ | 1730/5773 [2:41:16<6:07:34, 5.45s/it] {'loss': 0.5993, 
'learning_rate': 1.6424771265346775e-05, 'epoch': 0.3} 30%|██▉ | 1730/5773 [2:41:10<6:07:34, 5.45s/it] 30%|██▉ | 1731/5773 [2:41:15<6:06:36, 5.44s/it] 30%|██▉ | 1731/5773 [2:41:21<6:06:36, 5.44s/it] {'loss': 0.5835, 'learning_rate': 1.64204705261779e-05, 'epoch': 0.3} 30%|██▉ | 1731/5773 [2:41:21<6:06:36, 5.44s/it] {'loss': 0.5835, 'learning_rate': 1.64204705261779e-05, 'epoch': 0.3} 30%|██▉ | 1731/5773 [2:41:15<6:06:36, 5.44s/it] 30%|███ | 1732/5773 [2:41:21<6:07:27, 5.46s/it] 30%|███ | 1732/5773 [2:41:26<6:07:28, 5.46s/it] {'loss': 0.5894, 'learning_rate': 1.6416167765639773e-05, 'epoch': 0.3} 30%|███ | 1732/5773 [2:41:26<6:07:28, 5.46s/it] {'loss': 0.5894, 'learning_rate': 1.6416167765639773e-05, 'epoch': 0.3} 30%|███ | 1732/5773 [2:41:21<6:07:27, 5.46s/it] 30%|███ | 1733/5773 [2:41:26<6:07:18, 5.46s/it] 30%|███ | 1733/5773 [2:41:32<6:07:18, 5.46s/it] {'loss': 0.5779, 'learning_rate': 1.6411862985087036e-05, 'epoch': 0.3} 30%|███ | 1733/5773 [2:41:32<6:07:18, 5.46s/it] {'loss': 0.5779, 'learning_rate': 1.6411862985087036e-05, 'epoch': 0.3} 30%|███ | 1733/5773 [2:41:26<6:07:18, 5.46s/it] 30%|███ | 1734/5773 [2:41:32<6:07:48, 5.46s/it] 30%|███ | 1734/5773 [2:41:37<6:07:48, 5.46s/it] {'loss': 0.5758, 'learning_rate': 1.6407556185874975e-05, 'epoch': 0.3} 30%|███ | 1734/5773 [2:41:37<6:07:48, 5.46s/it] {'loss': 0.5758, 'learning_rate': 1.6407556185874975e-05, 'epoch': 0.3} 30%|███ | 1734/5773 [2:41:32<6:07:48, 5.46s/it] 30%|███ | 1735/5773 [2:41:37<6:06:21, 5.44s/it] 30%|███ | 1735/5773 [2:41:43<6:06:22, 5.44s/it] {'loss': 0.6006, 'learning_rate': 1.6403247369359502e-05, 'epoch': 0.3} 30%|███ | 1735/5773 [2:41:43<6:06:22, 5.44s/it] {'loss': 0.6006, 'learning_rate': 1.6403247369359502e-05, 'epoch': 0.3} 30%|███ | 1735/5773 [2:41:37<6:06:21, 5.44s/it] 30%|███ | 1736/5773 [2:41:43<6:06:11, 5.44s/it] 30%|███ | 1736/5773 [2:41:48<6:06:11, 5.44s/it] {'loss': 0.5876, 'learning_rate': 1.6398936536897182e-05, 'epoch': 0.3} 30%|███ | 1736/5773 [2:41:48<6:06:11, 5.44s/it] 
{'loss': 0.5876, 'learning_rate': 1.6398936536897182e-05, 'epoch': 0.3} 30%|███ | 1736/5773 [2:41:43<6:06:11, 5.44s/it] 30%|███ | 1737/5773 [2:41:54<6:06:46, 5.45s/it] 30%|███ | 1737/5773 [2:41:48<6:06:46, 5.45s/it] {'loss': 0.5889, 'learning_rate': 1.639462368984519e-05, 'epoch': 0.3} 30%|███ | 1737/5773 [2:41:48<6:06:46, 5.45s/it]{'loss': 0.5889, 'learning_rate': 1.639462368984519e-05, 'epoch': 0.3} 30%|███ | 1737/5773 [2:41:54<6:06:46, 5.45s/it] 30%|███ | 1738/5773 [2:41:54<6:09:27, 5.49s/it] 30%|███ | 1738/5773 [2:41:59<6:09:27, 5.49s/it] {'loss': 0.5848, 'learning_rate': 1.6390308829561358e-05, 'epoch': 0.3} 30%|███ | 1738/5773 [2:41:59<6:09:27, 5.49s/it] {'loss': 0.5848, 'learning_rate': 1.6390308829561358e-05, 'epoch': 0.3} 30%|███ | 1738/5773 [2:41:54<6:09:27, 5.49s/it] 30%|███ | 1739/5773 [2:42:05<6:06:45, 5.45s/it] 30%|███ | 1739/5773 [2:41:59<6:06:45, 5.46s/it] {'loss': 0.5664, 'learning_rate': 1.6385991957404136e-05, 'epoch': 0.3} 30%|███ | 1739/5773 [2:42:05<6:06:45, 5.45s/it] {'loss': 0.5664, 'learning_rate': 1.6385991957404136e-05, 'epoch': 0.3} 30%|███ | 1739/5773 [2:41:59<6:06:45, 5.46s/it] 30%|███ | 1740/5773 [2:42:04<6:05:12, 5.43s/it] 30%|███ | 1740/5773 [2:42:10<6:05:12, 5.43s/it] {'loss': 0.5868, 'learning_rate': 1.6381673074732615e-05, 'epoch': 0.3} 30%|███ | 1740/5773 [2:42:10<6:05:12, 5.43s/it] {'loss': 0.5868, 'learning_rate': 1.6381673074732615e-05, 'epoch': 0.3} 30%|███ | 1740/5773 [2:42:04<6:05:12, 5.43s/it] 30%|███ | 1741/5773 [2:42:10<6:04:31, 5.42s/it] 30%|███ | 1741/5773 [2:42:15<6:04:31, 5.42s/it] {'loss': 0.5885, 'learning_rate': 1.6377352182906515e-05, 'epoch': 0.3} 30%|███ | 1741/5773 [2:42:15<6:04:31, 5.42s/it] {'loss': 0.5885, 'learning_rate': 1.6377352182906515e-05, 'epoch': 0.3} 30%|███ | 1741/5773 [2:42:10<6:04:31, 5.42s/it] 30%|███ | 1742/5773 [2:42:15<6:05:59, 5.45s/it] 30%|███ | 1742/5773 [2:42:21<6:05:59, 5.45s/it] {'loss': 0.5878, 'learning_rate': 1.637302928328619e-05, 'epoch': 0.3} 30%|███ | 1742/5773 
[2:42:21<6:05:59, 5.45s/it] {'loss': 0.5878, 'learning_rate': 1.637302928328619e-05, 'epoch': 0.3} 30%|███ | 1742/5773 [2:42:15<6:05:59, 5.45s/it] 30%|███ | 1743/5773 [2:42:26<6:06:41, 5.46s/it] 30%|███ | 1743/5773 [2:42:21<6:06:42, 5.46s/it] {'loss': 0.5811, 'learning_rate': 1.6368704377232637e-05, 'epoch': 0.3} 30%|███ | 1743/5773 [2:42:26<6:06:41, 5.46s/it] {'loss': 0.5811, 'learning_rate': 1.6368704377232637e-05, 'epoch': 0.3} 30%|███ | 1743/5773 [2:42:21<6:06:42, 5.46s/it] 30%|███ | 1744/5773 [2:42:26<6:09:20, 5.50s/it] 30%|███ | 1744/5773 [2:42:32<6:09:20, 5.50s/it] {'loss': 0.5846, 'learning_rate': 1.6364377466107465e-05, 'epoch': 0.3} 30%|███ | 1744/5773 [2:42:32<6:09:20, 5.50s/it] {'loss': 0.5846, 'learning_rate': 1.6364377466107465e-05, 'epoch': 0.3} 30%|███ | 1744/5773 [2:42:26<6:09:20, 5.50s/it] 30%|███ | 1745/5773 [2:42:32<6:08:14, 5.49s/it] 30%|███ | 1745/5773 [2:42:37<6:08:14, 5.49s/it] {'loss': 0.5747, 'learning_rate': 1.6360048551272926e-05, 'epoch': 0.3} 30%|███ | 1745/5773 [2:42:37<6:08:14, 5.49s/it] {'loss': 0.5747, 'learning_rate': 1.6360048551272926e-05, 'epoch': 0.3} 30%|███ | 1745/5773 [2:42:32<6:08:14, 5.49s/it] 30%|███ | 1746/5773 [2:42:37<6:07:24, 5.47s/it] 30%|███ | 1746/5773 [2:42:43<6:07:24, 5.47s/it] {'loss': 0.5993, 'learning_rate': 1.63557176340919e-05, 'epoch': 0.3} 30%|███ | 1746/5773 [2:42:43<6:07:24, 5.47s/it] {'loss': 0.5993, 'learning_rate': 1.63557176340919e-05, 'epoch': 0.3} 30%|███ | 1746/5773 [2:42:37<6:07:24, 5.47s/it] 30%|███ | 1747/5773 [2:42:43<6:06:45, 5.47s/it] 30%|███ | 1747/5773 [2:42:48<6:06:45, 5.47s/it] {'loss': 0.5729, 'learning_rate': 1.6351384715927897e-05, 'epoch': 0.3} 30%|███ | 1747/5773 [2:42:48<6:06:45, 5.47s/it] {'loss': 0.5729, 'learning_rate': 1.6351384715927897e-05, 'epoch': 0.3} 30%|███ | 1747/5773 [2:42:43<6:06:45, 5.47s/it] 30%|███ | 1748/5773 [2:42:48<6:08:09, 5.49s/it] 30%|███ | 1748/5773 [2:42:54<6:08:10, 5.49s/it] {'loss': 0.5994, 'learning_rate': 1.6347049798145064e-05, 'epoch': 0.3} 30%|███ 
| 1748/5773 [2:42:54<6:08:10, 5.49s/it] {'loss': 0.5994, 'learning_rate': 1.6347049798145064e-05, 'epoch': 0.3} 30%|███ | 1748/5773 [2:42:48<6:08:09, 5.49s/it] 30%|███ | 1749/5773 [2:42:57<7:15:42, 6.50s/it] 30%|███ | 1749/5773 [2:43:03<7:15:41, 6.50s/it] {'loss': 0.5765, 'learning_rate': 1.6342712882108166e-05, 'epoch': 0.3} 30%|███ | 1749/5773 [2:43:03<7:15:41, 6.50s/it] {'loss': 0.5765, 'learning_rate': 1.6342712882108166e-05, 'epoch': 0.3} 30%|███ | 1749/5773 [2:42:57<7:15:42, 6.50s/it] 12 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 30%|███ | 1750/5773 [2:43:08<6:56:53, 6.22s/it] 2 AutoResumeHook: Checking whether to suspend... 30%|███ | 1750/5773 [2:43:03<6:56:54, 6.22s/it] 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5813, 'learning_rate': 1.6338373969182605e-05, 'epoch': 0.3} 30%|███ | 1750/5773 [2:43:08<6:56:53, 6.22s/it] {'loss': 0.5813, 'learning_rate': 1.6338373969182605e-05, 'epoch': 0.3} 30%|███ | 1750/5773 [2:43:03<6:56:54, 6.22s/it] 30%|███ | 1751/5773 [2:43:14<6:40:55, 5.98s/it] 30%|███ | 1751/5773 [2:43:08<6:40:55, 5.98s/it] {'loss': 0.5635, 'learning_rate': 1.633403306073441e-05, 'epoch': 0.3} 30%|███ | 1751/5773 [2:43:14<6:40:55, 5.98s/it] {'loss': 0.5635, 'learning_rate': 1.633403306073441e-05, 'epoch': 0.3} 30%|███ | 1751/5773 [2:43:08<6:40:55, 5.98s/it] 30%|███ | 1752/5773 [2:43:14<6:33:48, 5.88s/it] 30%|███ | 1752/5773 [2:43:19<6:33:48, 5.88s/it] {'loss': 0.5895, 'learning_rate': 1.6329690158130233e-05, 'epoch': 0.3} 30%|███ | 1752/5773 [2:43:19<6:33:48, 5.88s/it] {'loss': 0.5895, 'learning_rate': 1.6329690158130233e-05, 'epoch': 0.3} 30%|███ | 1752/5773 [2:43:14<6:33:48, 5.88s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4901 > 4096). 
Running this sequence through the model will result in indexing errors 30%|███ | 1753/5773 [2:43:19<6:25:51, 5.76s/it] 30%|███ | 1753/5773 [2:43:25<6:25:51, 5.76s/it] {'loss': 0.5821, 'learning_rate': 1.6325345262737363e-05, 'epoch': 0.3} 30%|███ | 1753/5773 [2:43:20<6:25:51, 5.76s/it]{'loss': 0.5821, 'learning_rate': 1.6325345262737363e-05, 'epoch': 0.3} 30%|███ | 1753/5773 [2:43:25<6:25:51, 5.76s/it] 30%|███ | 1754/5773 [2:43:25<6:22:53, 5.72s/it] 30%|███ | 1754/5773 [2:43:30<6:22:53, 5.72s/it] {'loss': 0.584, 'learning_rate': 1.6320998375923708e-05, 'epoch': 0.3} 30%|███ | 1754/5773 [2:43:30<6:22:53, 5.72s/it] {'loss': 0.584, 'learning_rate': 1.6320998375923708e-05, 'epoch': 0.3} 30%|███ | 1754/5773 [2:43:25<6:22:53, 5.72s/it] 30%|███ | 1755/5773 [2:43:36<6:17:48, 5.64s/it] 30%|███ | 1755/5773 [2:43:30<6:17:49, 5.64s/it] {'loss': 0.548, 'learning_rate': 1.6316649499057816e-05, 'epoch': 0.3} 30%|███ | 1755/5773 [2:43:36<6:17:48, 5.64s/it] {'loss': 0.548, 'learning_rate': 1.6316649499057816e-05, 'epoch': 0.3} 30%|███ | 1755/5773 [2:43:30<6:17:49, 5.64s/it] 30%|███ | 1756/5773 [2:43:36<6:14:48, 5.60s/it] 30%|███ | 1756/5773 [2:43:41<6:14:48, 5.60s/it] {'loss': 0.5839, 'learning_rate': 1.6312298633508844e-05, 'epoch': 0.3} 30%|███ | 1756/5773 [2:43:41<6:14:48, 5.60s/it] {'loss': 0.5839, 'learning_rate': 1.6312298633508844e-05, 'epoch': 0.3} 30%|███ | 1756/5773 [2:43:36<6:14:48, 5.60s/it] 30%|███ | 1757/5773 [2:43:41<6:11:48, 5.55s/it] 30%|███ | 1757/5773 [2:43:47<6:11:48, 5.55s/it] {'loss': 0.5771, 'learning_rate': 1.6307945780646584e-05, 'epoch': 0.3} 30%|███ | 1757/5773 [2:43:47<6:11:48, 5.55s/it] {'loss': 0.5771, 'learning_rate': 1.6307945780646584e-05, 'epoch': 0.3} 30%|███ | 1757/5773 [2:43:41<6:11:48, 5.55s/it] 30%|███ | 1758/5773 [2:43:47<6:09:06, 5.52s/it] 30%|███ | 1758/5773 [2:43:52<6:09:06, 5.52s/it] {'loss': 0.5829, 'learning_rate': 1.6303590941841457e-05, 'epoch': 0.3} 30%|███ | 1758/5773 [2:43:52<6:09:06, 5.52s/it] {'loss': 0.5829, 'learning_rate': 
1.6303590941841457e-05, 'epoch': 0.3} 30%|███ | 1758/5773 [2:43:47<6:09:06, 5.52s/it] 30%|███ | 1759/5773 [2:43:58<6:07:58, 5.50s/it] 30%|███ | 1759/5773 [2:43:52<6:07:59, 5.50s/it] {'loss': 0.5852, 'learning_rate': 1.62992341184645e-05, 'epoch': 0.3} 30%|███ | 1759/5773 [2:43:58<6:07:58, 5.50s/it] {'loss': 0.5852, 'learning_rate': 1.62992341184645e-05, 'epoch': 0.3} 30%|███ | 1759/5773 [2:43:52<6:07:59, 5.50s/it] 30%|███ | 1760/5773 [2:43:58<6:08:29, 5.51s/it] 30%|███ | 1760/5773 [2:44:03<6:08:29, 5.51s/it] {'loss': 0.5861, 'learning_rate': 1.6294875311887384e-05, 'epoch': 0.3} 30%|███ | 1760/5773 [2:44:03<6:08:29, 5.51s/it] {'loss': 0.5861, 'learning_rate': 1.6294875311887384e-05, 'epoch': 0.3} 30%|███ | 1760/5773 [2:43:58<6:08:29, 5.51s/it] 31%|███ | 1761/5773 [2:44:03<6:08:57, 5.52s/it] 31%|███ | 1761/5773 [2:44:09<6:08:57, 5.52s/it] {'loss': 0.5832, 'learning_rate': 1.6290514523482405e-05, 'epoch': 0.31} 31%|███ | 1761/5773 [2:44:09<6:08:57, 5.52s/it] {'loss': 0.5832, 'learning_rate': 1.6290514523482405e-05, 'epoch': 0.31} 31%|███ | 1761/5773 [2:44:03<6:08:57, 5.52s/it] 31%|███ | 1762/5773 [2:44:12<7:21:19, 6.60s/it] 31%|███ | 1762/5773 [2:44:18<7:21:20, 6.60s/it] {'loss': 0.6013, 'learning_rate': 1.628615175462247e-05, 'epoch': 0.31} 31%|███ | 1762/5773 [2:44:18<7:21:20, 6.60s/it] {'loss': 0.6013, 'learning_rate': 1.628615175462247e-05, 'epoch': 0.31} 31%|███ | 1762/5773 [2:44:12<7:21:19, 6.60s/it] 31%|███ | 1763/5773 [2:44:18<6:55:57, 6.22s/it] 31%|███ | 1763/5773 [2:44:23<6:55:57, 6.22s/it] {'loss': 0.5657, 'learning_rate': 1.6281787006681116e-05, 'epoch': 0.31} 31%|███ | 1763/5773 [2:44:23<6:55:57, 6.22s/it] {'loss': 0.5657, 'learning_rate': 1.6281787006681116e-05, 'epoch': 0.31} 31%|███ | 1763/5773 [2:44:18<6:55:57, 6.22s/it] 31%|███ | 1764/5773 [2:44:23<6:43:29, 6.04s/it] 31%|███ | 1764/5773 [2:44:29<6:43:28, 6.04s/it] {'loss': 0.5795, 'learning_rate': 1.627742028103252e-05, 'epoch': 0.31} 31%|███ | 1764/5773 [2:44:29<6:43:28, 6.04s/it] {'loss': 0.5795, 
'learning_rate': 1.627742028103252e-05, 'epoch': 0.31} 31%|███ | 1764/5773 [2:44:23<6:43:29, 6.04s/it] 31%|███ | 1765/5773 [2:44:32<7:32:30, 6.77s/it] 31%|███ | 1765/5773 [2:44:37<7:32:30, 6.77s/it] {'loss': 0.5773, 'learning_rate': 1.6273051579051453e-05, 'epoch': 0.31} 31%|███ | 1765/5773 [2:44:37<7:32:30, 6.77s/it] {'loss': 0.5773, 'learning_rate': 1.6273051579051453e-05, 'epoch': 0.31} 31%|███ | 1765/5773 [2:44:32<7:32:30, 6.77s/it] 31%|███ | 1766/5773 [2:44:37<7:04:20, 6.35s/it] 31%|███ | 1766/5773 [2:44:43<7:04:20, 6.35s/it] {'loss': 0.5873, 'learning_rate': 1.626868090211333e-05, 'epoch': 0.31} 31%|███ | 1766/5773 [2:44:43<7:04:20, 6.35s/it] {'loss': 0.5873, 'learning_rate': 1.626868090211333e-05, 'epoch': 0.31} 31%|███ | 1766/5773 [2:44:37<7:04:20, 6.35s/it] 31%|███ | 1767/5773 [2:44:43<6:43:27, 6.04s/it] 31%|███ | 1767/5773 [2:44:48<6:43:27, 6.04s/it] {'loss': 0.5834, 'learning_rate': 1.626430825159417e-05, 'epoch': 0.31} 31%|███ | 1767/5773 [2:44:48<6:43:27, 6.04s/it] {'loss': 0.5834, 'learning_rate': 1.626430825159417e-05, 'epoch': 0.31} 31%|███ | 1767/5773 [2:44:43<6:43:27, 6.04s/it] 31%|███ | 1768/5773 [2:44:48<6:31:50, 5.87s/it] 31%|███ | 1768/5773 [2:44:54<6:31:50, 5.87s/it] {'loss': 0.5887, 'learning_rate': 1.625993362887063e-05, 'epoch': 0.31} 31%|███ | 1768/5773 [2:44:54<6:31:50, 5.87s/it] {'loss': 0.5887, 'learning_rate': 1.625993362887063e-05, 'epoch': 0.31} 31%|███ | 1768/5773 [2:44:48<6:31:50, 5.87s/it] 31%|███ | 1769/5773 [2:45:02<7:13:20, 6.49s/it] 31%|███ | 1769/5773 [2:44:56<7:13:21, 6.49s/it] {'loss': 0.5751, 'learning_rate': 1.625555703531998e-05, 'epoch': 0.31} 31%|███ | 1769/5773 [2:45:02<7:13:20, 6.49s/it] {'loss': 0.5751, 'learning_rate': 1.625555703531998e-05, 'epoch': 0.31} 31%|███ | 1769/5773 [2:44:56<7:13:21, 6.49s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 31%|███ | 1770/5773 [2:45:07<6:53:34, 6.20s/it] 31%|███ | 1770/5773 [2:45:02<6:53:35, 6.20s/it] {'loss': 0.5614, 'learning_rate': 1.6251178472320114e-05, 'epoch': 0.31} 31%|███ | 1770/5773 [2:45:07<6:53:34, 6.20s/it] {'loss': 0.5614, 'learning_rate': 1.6251178472320114e-05, 'epoch': 0.31} 31%|███ | 1770/5773 [2:45:02<6:53:35, 6.20s/it] 31%|███ | 1771/5773 [2:45:18<8:32:33, 7.68s/it] 31%|███ | 1771/5773 [2:45:13<8:32:33, 7.68s/it] {'loss': 0.5877, 'learning_rate': 1.624679794124954e-05, 'epoch': 0.31} 31%|███ | 1771/5773 [2:45:18<8:32:33, 7.68s/it] {'loss': 0.5877, 'learning_rate': 1.624679794124954e-05, 'epoch': 0.31} 31%|███ | 1771/5773 [2:45:13<8:32:33, 7.68s/it] 31%|███ | 1772/5773 [2:45:29<9:42:40, 8.74s/it] 31%|███ | 1772/5773 [2:45:24<9:42:44, 8.74s/it] {'loss': 0.5848, 'learning_rate': 1.6242415443487393e-05, 'epoch': 0.31} 31%|███ | 1772/5773 [2:45:29<9:42:40, 8.74s/it] {'loss': 0.5848, 'learning_rate': 1.6242415443487393e-05, 'epoch': 0.31} 31%|███ | 1772/5773 [2:45:24<9:42:44, 8.74s/it] 31%|███ | 1773/5773 [2:45:35<8:36:51, 7.75s/it] 31%|███ | 1773/5773 [2:45:29<8:36:51, 7.75s/it] {'loss': 0.5743, 'learning_rate': 1.6238030980413418e-05, 'epoch': 0.31} 31%|███ | 1773/5773 [2:45:35<8:36:51, 7.75s/it] {'loss': 0.5743, 'learning_rate': 1.6238030980413418e-05, 'epoch': 0.31} 31%|███ | 1773/5773 [2:45:29<8:36:51, 7.75s/it] 31%|███ | 1774/5773 [2:45:44<8:56:02, 8.04s/it] 31%|███ | 1774/5773 [2:45:38<8:56:01, 8.04s/it] {'loss': 0.5881, 'learning_rate': 1.6233644553407987e-05, 'epoch': 0.31} 31%|███ | 1774/5773 [2:45:44<8:56:02, 8.04s/it] {'loss': 0.5881, 'learning_rate': 1.6233644553407987e-05, 'epoch': 0.31} 31%|███ | 1774/5773 [2:45:38<8:56:01, 8.04s/it] 31%|███ | 1775/5773 [2:45:49<8:06:23, 7.30s/it] 31%|███ | 1775/5773 [2:45:44<8:06:23, 7.30s/it] {'loss': 0.5832, 'learning_rate': 1.6229256163852086e-05, 'epoch': 0.31} 31%|███ | 1775/5773 [2:45:49<8:06:23, 7.30s/it] {'loss': 0.5832, 'learning_rate': 1.6229256163852086e-05, 
'epoch': 0.31} 31%|███ | 1775/5773 [2:45:44<8:06:23, 7.30s/it] 31%|███ | 1776/5773 [2:45:55<7:30:44, 6.77s/it] 31%|███ | 1776/5773 [2:45:49<7:30:44, 6.77s/it] {'loss': 0.5792, 'learning_rate': 1.622486581312732e-05, 'epoch': 0.31} 31%|███ | 1776/5773 [2:45:55<7:30:44, 6.77s/it] {'loss': 0.5792, 'learning_rate': 1.622486581312732e-05, 'epoch': 0.31} 31%|███ | 1776/5773 [2:45:49<7:30:44, 6.77s/it] 31%|███ | 1777/5773 [2:46:00<7:07:37, 6.42s/it] 31%|███ | 1777/5773 [2:45:55<7:07:36, 6.42s/it] {'loss': 0.5871, 'learning_rate': 1.6220473502615918e-05, 'epoch': 0.31} 31%|███ | 1777/5773 [2:46:00<7:07:37, 6.42s/it] {'loss': 0.5871, 'learning_rate': 1.6220473502615918e-05, 'epoch': 0.31} 31%|███ | 1777/5773 [2:45:55<7:07:36, 6.42s/it] 31%|███ | 1778/5773 [2:46:06<6:47:51, 6.13s/it] 31%|███ | 1778/5773 [2:46:00<6:47:51, 6.13s/it] {'loss': 0.5939, 'learning_rate': 1.621607923370071e-05, 'epoch': 0.31} 31%|███ | 1778/5773 [2:46:06<6:47:51, 6.13s/it] {'loss': 0.5939, 'learning_rate': 1.621607923370071e-05, 'epoch': 0.31} 31%|███ | 1778/5773 [2:46:00<6:47:51, 6.13s/it] 31%|███ | 1779/5773 [2:46:11<6:31:00, 5.87s/it] 31%|███ | 1779/5773 [2:46:05<6:31:00, 5.87s/it] {'loss': 0.5638, 'learning_rate': 1.6211683007765157e-05, 'epoch': 0.31} 31%|███ | 1779/5773 [2:46:11<6:31:00, 5.87s/it] {'loss': 0.5638, 'learning_rate': 1.6211683007765157e-05, 'epoch': 0.31} 31%|███ | 1779/5773 [2:46:05<6:31:00, 5.87s/it] 31%|███ | 1780/5773 [2:46:17<6:26:38, 5.81s/it] 31%|███ | 1780/5773 [2:46:11<6:26:38, 5.81s/it] {'loss': 0.5887, 'learning_rate': 1.6207284826193334e-05, 'epoch': 0.31} 31%|███ | 1780/5773 [2:46:17<6:26:38, 5.81s/it] {'loss': 0.5887, 'learning_rate': 1.6207284826193334e-05, 'epoch': 0.31} 31%|███ | 1780/5773 [2:46:11<6:26:38, 5.81s/it] 31%|███ | 1781/5773 [2:46:22<6:20:01, 5.71s/it] 31%|███ | 1781/5773 [2:46:17<6:20:01, 5.71s/it] {'loss': 0.5805, 'learning_rate': 1.6202884690369924e-05, 'epoch': 0.31} 31%|███ | 1781/5773 [2:46:22<6:20:01, 5.71s/it] {'loss': 0.5805, 'learning_rate': 
1.6202884690369924e-05, 'epoch': 0.31} 31%|███ | 1781/5773 [2:46:17<6:20:01, 5.71s/it] 31%|███ | 1782/5773 [2:46:27<6:10:58, 5.58s/it] 31%|███ | 1782/5773 [2:46:22<6:10:58, 5.58s/it] {'loss': 0.5813, 'learning_rate': 1.6198482601680233e-05, 'epoch': 0.31} 31%|███ | 1782/5773 [2:46:27<6:10:58, 5.58s/it] {'loss': 0.5813, 'learning_rate': 1.6198482601680233e-05, 'epoch': 0.31} 31%|███ | 1782/5773 [2:46:22<6:10:58, 5.58s/it] 31%|███ | 1783/5773 [2:46:38<7:54:33, 7.14s/it] 31%|███ | 1783/5773 [2:46:33<7:54:33, 7.14s/it] {'loss': 0.583, 'learning_rate': 1.619407856151018e-05, 'epoch': 0.31} 31%|███ | 1783/5773 [2:46:38<7:54:33, 7.14s/it] {'loss': 0.583, 'learning_rate': 1.619407856151018e-05, 'epoch': 0.31} 31%|███ | 1783/5773 [2:46:33<7:54:33, 7.14s/it] 31%|███ | 1784/5773 [2:46:44<7:20:13, 6.62s/it] 31%|███ | 1784/5773 [2:46:38<7:20:13, 6.62s/it] {'loss': 0.5855, 'learning_rate': 1.61896725712463e-05, 'epoch': 0.31} 31%|███ | 1784/5773 [2:46:44<7:20:13, 6.62s/it] {'loss': 0.5855, 'learning_rate': 1.61896725712463e-05, 'epoch': 0.31} 31%|███ | 1784/5773 [2:46:38<7:20:13, 6.62s/it] 31%|███ | 1785/5773 [2:46:49<6:56:45, 6.27s/it] 31%|███ | 1785/5773 [2:46:44<6:56:45, 6.27s/it] {'loss': 0.586, 'learning_rate': 1.618526463227573e-05, 'epoch': 0.31} 31%|███ | 1785/5773 [2:46:49<6:56:45, 6.27s/it] {'loss': 0.586, 'learning_rate': 1.618526463227573e-05, 'epoch': 0.31} 31%|███ | 1785/5773 [2:46:44<6:56:45, 6.27s/it] 31%|███ | 1786/5773 [2:46:55<6:40:57, 6.03s/it] 31%|███ | 1786/5773 [2:46:49<6:40:57, 6.03s/it] {'loss': 0.5876, 'learning_rate': 1.618085474598624e-05, 'epoch': 0.31} 31%|███ | 1786/5773 [2:46:55<6:40:57, 6.03s/it] {'loss': 0.5876, 'learning_rate': 1.618085474598624e-05, 'epoch': 0.31} 31%|███ | 1786/5773 [2:46:49<6:40:57, 6.03s/it] 31%|███ | 1787/5773 [2:47:00<6:26:24, 5.82s/it] 31%|███ | 1787/5773 [2:46:54<6:26:24, 5.82s/it] {'loss': 0.5841, 'learning_rate': 1.6176442913766192e-05, 'epoch': 0.31} 31%|███ | 1787/5773 [2:47:00<6:26:24, 5.82s/it] {'loss': 0.5841, 
'learning_rate': 1.6176442913766192e-05, 'epoch': 0.31} 31%|███ | 1787/5773 [2:46:54<6:26:24, 5.82s/it] 31%|███ | 1788/5773 [2:47:05<6:16:20, 5.67s/it] 31%|███ | 1788/5773 [2:47:00<6:16:20, 5.67s/it] {'loss': 0.6121, 'learning_rate': 1.6172029137004584e-05, 'epoch': 0.31} 31%|███ | 1788/5773 [2:47:05<6:16:20, 5.67s/it] {'loss': 0.6121, 'learning_rate': 1.6172029137004584e-05, 'epoch': 0.31} 31%|███ | 1788/5773 [2:47:00<6:16:20, 5.67s/it] 31%|███ | 1789/5773 [2:47:11<6:18:06, 5.69s/it] 31%|███ | 1789/5773 [2:47:05<6:18:06, 5.69s/it] {'loss': 0.5706, 'learning_rate': 1.6167613417091007e-05, 'epoch': 0.31} 31%|███ | 1789/5773 [2:47:11<6:18:06, 5.69s/it] {'loss': 0.5706, 'learning_rate': 1.6167613417091007e-05, 'epoch': 0.31} 31%|███ | 1789/5773 [2:47:05<6:18:06, 5.69s/it] 31%|███ | 1790/5773 [2:47:16<6:13:05, 5.62s/it] 31%|███ | 1790/5773 [2:47:11<6:13:05, 5.62s/it] {'loss': 0.5976, 'learning_rate': 1.6163195755415676e-05, 'epoch': 0.31} 31%|███ | 1790/5773 [2:47:16<6:13:05, 5.62s/it] {'loss': 0.5976, 'learning_rate': 1.6163195755415676e-05, 'epoch': 0.31} 31%|███ | 1790/5773 [2:47:11<6:13:05, 5.62s/it] 31%|███ | 1791/5773 [2:47:22<6:09:09, 5.56s/it] 31%|███ | 1791/5773 [2:47:16<6:09:09, 5.56s/it] {'loss': 0.5923, 'learning_rate': 1.6158776153369406e-05, 'epoch': 0.31} 31%|███ | 1791/5773 [2:47:22<6:09:09, 5.56s/it] {'loss': 0.5923, 'learning_rate': 1.6158776153369406e-05, 'epoch': 0.31} 31%|███ | 1791/5773 [2:47:16<6:09:09, 5.56s/it] 31%|███ | 1792/5773 [2:47:27<6:04:19, 5.49s/it] 31%|███ | 1792/5773 [2:47:22<6:04:19, 5.49s/it] {'loss': 0.5826, 'learning_rate': 1.615435461234363e-05, 'epoch': 0.31} 31%|███ | 1792/5773 [2:47:27<6:04:19, 5.49s/it] {'loss': 0.5826, 'learning_rate': 1.615435461234363e-05, 'epoch': 0.31} 31%|███ | 1792/5773 [2:47:22<6:04:19, 5.49s/it] 31%|███ | 1793/5773 [2:47:36<7:09:02, 6.47s/it] 31%|███ | 1793/5773 [2:47:30<7:09:02, 6.47s/it] {'loss': 0.5893, 'learning_rate': 1.6149931133730396e-05, 'epoch': 0.31} 31%|███ | 1793/5773 [2:47:36<7:09:02, 
6.47s/it] {'loss': 0.5893, 'learning_rate': 1.6149931133730396e-05, 'epoch': 0.31} 31%|███ | 1793/5773 [2:47:30<7:09:02, 6.47s/it] 31%|███ | 1794/5773 [2:47:41<6:49:05, 6.17s/it] 31%|███ | 1794/5773 [2:47:36<6:49:05, 6.17s/it] {'loss': 0.587, 'learning_rate': 1.6145505718922348e-05, 'epoch': 0.31} 31%|███ | 1794/5773 [2:47:41<6:49:05, 6.17s/it] {'loss': 0.587, 'learning_rate': 1.6145505718922348e-05, 'epoch': 0.31} 31%|███ | 1794/5773 [2:47:36<6:49:05, 6.17s/it] 31%|███ | 1795/5773 [2:47:47<6:39:33, 6.03s/it] 31%|███ | 1795/5773 [2:47:41<6:39:33, 6.03s/it] {'loss': 0.5842, 'learning_rate': 1.6141078369312752e-05, 'epoch': 0.31} 31%|███ | 1795/5773 [2:47:47<6:39:33, 6.03s/it] {'loss': 0.5842, 'learning_rate': 1.6141078369312752e-05, 'epoch': 0.31} 31%|███ | 1795/5773 [2:47:41<6:39:33, 6.03s/it] 31%|███ | 1796/5773 [2:47:52<6:27:22, 5.84s/it] 31%|███ | 1796/5773 [2:47:47<6:27:22, 5.84s/it] {'loss': 0.586, 'learning_rate': 1.613664908629548e-05, 'epoch': 0.31} 31%|███ | 1796/5773 [2:47:52<6:27:22, 5.84s/it] {'loss': 0.586, 'learning_rate': 1.613664908629548e-05, 'epoch': 0.31} 31%|███ | 1796/5773 [2:47:47<6:27:22, 5.84s/it] 31%|███ | 1797/5773 [2:47:58<6:16:57, 5.69s/it] 31%|███ | 1797/5773 [2:47:52<6:16:57, 5.69s/it] {'loss': 0.575, 'learning_rate': 1.6132217871265012e-05, 'epoch': 0.31} 31%|███ | 1797/5773 [2:47:58<6:16:57, 5.69s/it] {'loss': 0.575, 'learning_rate': 1.6132217871265012e-05, 'epoch': 0.31} 31%|███ | 1797/5773 [2:47:52<6:16:57, 5.69s/it] 31%|███ | 1798/5773 [2:48:06<7:06:31, 6.44s/it] 31%|███ | 1798/5773 [2:48:00<7:06:31, 6.44s/it] {'loss': 0.5857, 'learning_rate': 1.6127784725616434e-05, 'epoch': 0.31} 31%|███ | 1798/5773 [2:48:06<7:06:31, 6.44s/it] {'loss': 0.5857, 'learning_rate': 1.6127784725616434e-05, 'epoch': 0.31} 31%|███ | 1798/5773 [2:48:00<7:06:31, 6.44s/it] 31%|███ | 1799/5773 [2:48:12<6:49:10, 6.18s/it] 31%|███ | 1799/5773 [2:48:06<6:49:10, 6.18s/it] {'loss': 0.6041, 'learning_rate': 1.6123349650745445e-05, 'epoch': 0.31} 31%|███ | 
1799/5773 [2:48:12<6:49:10, 6.18s/it] {'loss': 0.6041, 'learning_rate': 1.6123349650745445e-05, 'epoch': 0.31} 31%|███ | 1799/5773 [2:48:06<6:49:10, 6.18s/it]8 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 31%|███ | 1800/5773 [2:48:17<6:34:14, 5.95s/it]9 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 31%|███ | 1800/5773 [2:48:11<6:34:14, 5.95s/it]2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... {'loss': 0.5925, 'learning_rate': 1.6118912648048345e-05, 'epoch': 0.31} 31%|███ | 1800/5773 [2:48:17<6:34:14, 5.95s/it] {'loss': 0.5925, 'learning_rate': 1.6118912648048345e-05, 'epoch': 0.31} 31%|███ | 1800/5773 [2:48:11<6:34:14, 5.95s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 31%|███ | 1801/5773 [2:48:36<10:57:13, 9.93s/it] 31%|███ | 1801/5773 [2:48:31<10:57:13, 9.93s/it] {'loss': 0.5838, 'learning_rate': 1.6114473718922042e-05, 'epoch': 0.31} 31%|███ | 1801/5773 [2:48:36<10:57:13, 9.93s/it] {'loss': 0.5838, 'learning_rate': 1.6114473718922042e-05, 'epoch': 0.31} 31%|███ | 1801/5773 [2:48:31<10:57:13, 9.93s/it] 31%|███ | 1802/5773 [2:48:42<9:27:28, 8.57s/it] 31%|███ | 1802/5773 [2:48:36<9:27:27, 8.57s/it] {'loss': 0.5844, 'learning_rate': 1.611003286476406e-05, 'epoch': 0.31} 31%|███ | 1802/5773 [2:48:42<9:27:28, 8.57s/it] {'loss': 0.5844, 'learning_rate': 1.611003286476406e-05, 'epoch': 0.31} 31%|███ | 1802/5773 [2:48:36<9:27:27, 8.57s/it] 31%|███ | 1803/5773 [2:48:47<8:27:07, 7.66s/it] 31%|███ | 1803/5773 [2:48:42<8:27:07, 7.66s/it] {'loss': 0.571, 'learning_rate': 1.610559008697252e-05, 'epoch': 0.31} 31%|███ | 1803/5773 [2:48:47<8:27:07, 7.66s/it] {'loss': 0.571, 'learning_rate': 1.610559008697252e-05, 'epoch': 0.31} 31%|███ | 1803/5773 [2:48:42<8:27:07, 7.66s/it] 31%|███ | 1804/5773 [2:48:56<8:47:59, 7.98s/it] 31%|███ | 1804/5773 [2:48:50<8:47:58, 7.98s/it] {'loss': 0.577, 'learning_rate': 1.6101145386946145e-05, 'epoch': 0.31} 31%|███ | 1804/5773 [2:48:56<8:47:59, 7.98s/it] {'loss': 0.577, 'learning_rate': 1.6101145386946145e-05, 'epoch': 0.31} 31%|███ | 1804/5773 [2:48:50<8:47:58, 7.98s/it] 31%|███▏ | 1805/5773 [2:49:01<7:58:00, 7.23s/it] 31%|███▏ | 1805/5773 [2:48:56<7:58:00, 7.23s/it] {'loss': 0.5802, 'learning_rate': 1.6096698766084274e-05, 'epoch': 0.31} 31%|███▏ | 1805/5773 [2:49:01<7:58:00, 7.23s/it] {'loss': 0.5802, 'learning_rate': 1.6096698766084274e-05, 'epoch': 0.31} 31%|███▏ | 1805/5773 [2:48:56<7:58:00, 7.23s/it] 31%|███▏ | 1806/5773 [2:49:01<7:20:21, 6.66s/it] 31%|███▏ | 1806/5773 [2:49:07<7:20:23, 6.66s/it] {'loss': 0.5834, 'learning_rate': 1.6092250225786842e-05, 'epoch': 0.31} 31%|███▏ | 
1806/5773 [2:49:07<7:20:23, 6.66s/it] {'loss': 0.5834, 'learning_rate': 1.6092250225786842e-05, 'epoch': 0.31} 31%|███▏ | 1806/5773 [2:49:01<7:20:21, 6.66s/it] 31%|███▏ | 1807/5773 [2:49:07<6:55:25, 6.28s/it] 31%|███▏ | 1807/5773 [2:49:12<6:55:29, 6.29s/it] {'loss': 0.5719, 'learning_rate': 1.6087799767454394e-05, 'epoch': 0.31} 31%|███▏ | 1807/5773 [2:49:12<6:55:29, 6.29s/it] {'loss': 0.5719, 'learning_rate': 1.6087799767454394e-05, 'epoch': 0.31} 31%|███▏ | 1807/5773 [2:49:07<6:55:25, 6.28s/it] 31%|███▏ | 1808/5773 [2:49:17<6:36:37, 6.00s/it] 31%|███▏ | 1808/5773 [2:49:12<6:36:38, 6.00s/it] {'loss': 0.5812, 'learning_rate': 1.6083347392488072e-05, 'epoch': 0.31} 31%|███▏ | 1808/5773 [2:49:12<6:36:38, 6.00s/it]{'loss': 0.5812, 'learning_rate': 1.6083347392488072e-05, 'epoch': 0.31} 31%|███▏ | 1808/5773 [2:49:17<6:36:37, 6.00s/it] 31%|███▏ | 1809/5773 [2:49:23<6:27:10, 5.86s/it] 31%|███▏ | 1809/5773 [2:49:17<6:27:11, 5.86s/it] {'loss': 0.5879, 'learning_rate': 1.6078893102289634e-05, 'epoch': 0.31} 31%|███▏ | 1809/5773 [2:49:23<6:27:10, 5.86s/it] {'loss': 0.5879, 'learning_rate': 1.6078893102289634e-05, 'epoch': 0.31} 31%|███▏ | 1809/5773 [2:49:17<6:27:11, 5.86s/it] 31%|███▏ | 1810/5773 [2:49:28<6:20:10, 5.76s/it] 31%|███▏ | 1810/5773 [2:49:23<6:20:12, 5.76s/it] {'loss': 0.5858, 'learning_rate': 1.6074436898261424e-05, 'epoch': 0.31} 31%|███▏ | 1810/5773 [2:49:28<6:20:10, 5.76s/it] {'loss': 0.5858, 'learning_rate': 1.6074436898261424e-05, 'epoch': 0.31} 31%|███▏ | 1810/5773 [2:49:23<6:20:12, 5.76s/it] 31%|███▏ | 1811/5773 [2:49:34<6:15:39, 5.69s/it] 31%|███▏ | 1811/5773 [2:49:28<6:15:40, 5.69s/it] {'loss': 0.5685, 'learning_rate': 1.6069978781806403e-05, 'epoch': 0.31} 31%|███▏ | 1811/5773 [2:49:34<6:15:39, 5.69s/it] {'loss': 0.5685, 'learning_rate': 1.6069978781806403e-05, 'epoch': 0.31} 31%|███▏ | 1811/5773 [2:49:28<6:15:40, 5.69s/it] 31%|███▏ | 1812/5773 [2:49:39<6:09:04, 5.59s/it] 31%|███▏ | 1812/5773 [2:49:34<6:09:04, 5.59s/it] {'loss': 0.5813, 
'learning_rate': 1.6065518754328125e-05, 'epoch': 0.31} 31%|███▏ | 1812/5773 [2:49:39<6:09:04, 5.59s/it] {'loss': 0.5813, 'learning_rate': 1.6065518754328125e-05, 'epoch': 0.31} 31%|███▏ | 1812/5773 [2:49:34<6:09:04, 5.59s/it] 31%|███▏ | 1813/5773 [2:49:45<6:07:09, 5.56s/it] 31%|███▏ | 1813/5773 [2:49:39<6:07:09, 5.56s/it] {'loss': 0.5805, 'learning_rate': 1.6061056817230754e-05, 'epoch': 0.31} 31%|███▏ | 1813/5773 [2:49:45<6:07:09, 5.56s/it] {'loss': 0.5805, 'learning_rate': 1.6061056817230754e-05, 'epoch': 0.31} 31%|███▏ | 1813/5773 [2:49:39<6:07:09, 5.56s/it] 31%|███▏ | 1814/5773 [2:49:50<6:05:43, 5.54s/it] 31%|███▏ | 1814/5773 [2:49:45<6:05:42, 5.54s/it] {'loss': 0.5715, 'learning_rate': 1.6056592971919047e-05, 'epoch': 0.31} 31%|███▏ | 1814/5773 [2:49:45<6:05:42, 5.54s/it]{'loss': 0.5715, 'learning_rate': 1.6056592971919047e-05, 'epoch': 0.31} 31%|███▏ | 1814/5773 [2:49:50<6:05:43, 5.54s/it] 31%|███▏ | 1815/5773 [2:49:50<6:03:08, 5.50s/it] 31%|███▏ | 1815/5773 [2:49:56<6:03:09, 5.51s/it] {'loss': 0.5906, 'learning_rate': 1.6052127219798367e-05, 'epoch': 0.31} 31%|███▏ | 1815/5773 [2:49:56<6:03:09, 5.51s/it] {'loss': 0.5906, 'learning_rate': 1.6052127219798367e-05, 'epoch': 0.31} 31%|███▏ | 1815/5773 [2:49:50<6:03:08, 5.50s/it] 31%|███▏ | 1816/5773 [2:50:01<6:01:09, 5.48s/it] 31%|███▏ | 1816/5773 [2:49:56<6:01:09, 5.48s/it] {'loss': 0.5826, 'learning_rate': 1.604765956227467e-05, 'epoch': 0.31} 31%|███▏ | 1816/5773 [2:50:01<6:01:09, 5.48s/it] {'loss': 0.5826, 'learning_rate': 1.604765956227467e-05, 'epoch': 0.31} 31%|███▏ | 1816/5773 [2:49:56<6:01:09, 5.48s/it] 31%|███▏ | 1817/5773 [2:50:07<6:00:57, 5.47s/it] 31%|███▏ | 1817/5773 [2:50:01<6:00:57, 5.47s/it] {'loss': 0.582, 'learning_rate': 1.6043190000754525e-05, 'epoch': 0.31} 31%|███▏ | 1817/5773 [2:50:07<6:00:57, 5.47s/it] {'loss': 0.582, 'learning_rate': 1.6043190000754525e-05, 'epoch': 0.31} 31%|███▏ | 1817/5773 [2:50:01<6:00:57, 5.47s/it] 31%|███▏ | 1818/5773 [2:50:12<6:00:42, 5.47s/it] 31%|███▏ | 
1818/5773 [2:50:07<6:00:42, 5.47s/it] {'loss': 0.5804, 'learning_rate': 1.6038718536645088e-05, 'epoch': 0.31} 31%|███▏ | 1818/5773 [2:50:12<6:00:42, 5.47s/it] {'loss': 0.5804, 'learning_rate': 1.6038718536645088e-05, 'epoch': 0.31} 31%|███▏ | 1818/5773 [2:50:07<6:00:42, 5.47s/it] 32%|███▏ | 1819/5773 [2:50:18<6:02:12, 5.50s/it] 32%|███▏ | 1819/5773 [2:50:12<6:02:12, 5.50s/it] {'loss': 0.5781, 'learning_rate': 1.6034245171354118e-05, 'epoch': 0.32} 32%|███▏ | 1819/5773 [2:50:18<6:02:12, 5.50s/it] {'loss': 0.5781, 'learning_rate': 1.6034245171354118e-05, 'epoch': 0.32} 32%|███▏ | 1819/5773 [2:50:12<6:02:12, 5.50s/it] 32%|███▏ | 1820/5773 [2:50:23<5:59:04, 5.45s/it] 32%|███▏ | 1820/5773 [2:50:17<5:59:04, 5.45s/it] {'loss': 0.5764, 'learning_rate': 1.6029769906289982e-05, 'epoch': 0.32} 32%|███▏ | 1820/5773 [2:50:23<5:59:04, 5.45s/it] {'loss': 0.5764, 'learning_rate': 1.6029769906289982e-05, 'epoch': 0.32} 32%|███▏ | 1820/5773 [2:50:17<5:59:04, 5.45s/it] 32%|███▏ | 1821/5773 [2:50:28<5:55:56, 5.40s/it] 32%|███▏ | 1821/5773 [2:50:23<5:55:56, 5.40s/it] {'loss': 0.5868, 'learning_rate': 1.6025292742861622e-05, 'epoch': 0.32} 32%|███▏ | 1821/5773 [2:50:28<5:55:56, 5.40s/it] {'loss': 0.5868, 'learning_rate': 1.6025292742861622e-05, 'epoch': 0.32} 32%|███▏ | 1821/5773 [2:50:23<5:55:56, 5.40s/it] 32%|███▏ | 1822/5773 [2:50:34<5:53:19, 5.37s/it] 32%|███▏ | 1822/5773 [2:50:28<5:53:19, 5.37s/it] {'loss': 0.5773, 'learning_rate': 1.6020813682478603e-05, 'epoch': 0.32} 32%|███▏ | 1822/5773 [2:50:34<5:53:19, 5.37s/it] {'loss': 0.5773, 'learning_rate': 1.6020813682478603e-05, 'epoch': 0.32} 32%|███▏ | 1822/5773 [2:50:28<5:53:19, 5.37s/it] 32%|███▏ | 1823/5773 [2:50:39<5:53:57, 5.38s/it] 32%|███▏ | 1823/5773 [2:50:33<5:53:57, 5.38s/it] {'loss': 0.5854, 'learning_rate': 1.601633272655107e-05, 'epoch': 0.32} 32%|███▏ | 1823/5773 [2:50:39<5:53:57, 5.38s/it] {'loss': 0.5854, 'learning_rate': 1.601633272655107e-05, 'epoch': 0.32} 32%|███▏ | 1823/5773 [2:50:33<5:53:57, 5.38s/it] 32%|███▏ 
| 1824/5773 [2:50:44<5:53:23, 5.37s/it] 32%|███▏ | 1824/5773 [2:50:39<5:53:28, 5.37s/it] {'loss': 0.5731, 'learning_rate': 1.6011849876489777e-05, 'epoch': 0.32} 32%|███▏ | 1824/5773 [2:50:44<5:53:23, 5.37s/it] {'loss': 0.5731, 'learning_rate': 1.6011849876489777e-05, 'epoch': 0.32} 32%|███▏ | 1824/5773 [2:50:39<5:53:28, 5.37s/it] 32%|███▏ | 1825/5773 [2:50:50<5:55:00, 5.40s/it] 32%|███▏ | 1825/5773 [2:50:44<5:54:59, 5.39s/it] {'loss': 0.5882, 'learning_rate': 1.6007365133706065e-05, 'epoch': 0.32} 32%|███▏ | 1825/5773 [2:50:50<5:55:00, 5.40s/it] {'loss': 0.5882, 'learning_rate': 1.6007365133706065e-05, 'epoch': 0.32} 32%|███▏ | 1825/5773 [2:50:44<5:54:59, 5.39s/it] 32%|███▏ | 1826/5773 [2:50:55<5:55:04, 5.40s/it] 32%|███▏ | 1826/5773 [2:50:50<5:55:03, 5.40s/it] {'loss': 0.5556, 'learning_rate': 1.6002878499611876e-05, 'epoch': 0.32} 32%|███▏ | 1826/5773 [2:50:55<5:55:04, 5.40s/it] {'loss': 0.5556, 'learning_rate': 1.6002878499611876e-05, 'epoch': 0.32} 32%|███▏ | 1826/5773 [2:50:50<5:55:03, 5.40s/it] 32%|███▏ | 1827/5773 [2:51:01<5:56:13, 5.42s/it] 32%|███▏ | 1827/5773 [2:50:55<5:56:12, 5.42s/it] {'loss': 0.5883, 'learning_rate': 1.5998389975619747e-05, 'epoch': 0.32} 32%|███▏ | 1827/5773 [2:51:01<5:56:13, 5.42s/it] {'loss': 0.5883, 'learning_rate': 1.5998389975619747e-05, 'epoch': 0.32} 32%|███▏ | 1827/5773 [2:50:55<5:56:12, 5.42s/it] 32%|███▏ | 1828/5773 [2:51:06<5:55:58, 5.41s/it] 32%|███▏ | 1828/5773 [2:51:00<5:55:57, 5.41s/it] {'loss': 0.5961, 'learning_rate': 1.5993899563142804e-05, 'epoch': 0.32} 32%|███▏ | 1828/5773 [2:51:06<5:55:58, 5.41s/it] {'loss': 0.5961, 'learning_rate': 1.5993899563142804e-05, 'epoch': 0.32} 32%|███▏ | 1828/5773 [2:51:00<5:55:57, 5.41s/it] 32%|███▏ | 1829/5773 [2:51:12<5:58:14, 5.45s/it] 32%|███▏ | 1829/5773 [2:51:06<5:58:14, 5.45s/it] {'loss': 0.592, 'learning_rate': 1.598940726359477e-05, 'epoch': 0.32} 32%|███▏ | 1829/5773 [2:51:12<5:58:14, 5.45s/it] {'loss': 0.592, 'learning_rate': 1.598940726359477e-05, 'epoch': 0.32} 32%|███▏ 
| 1829/5773 [2:51:06<5:58:14, 5.45s/it] 32%|███▏ | 1830/5773 [2:51:17<5:56:37, 5.43s/it] 32%|███▏ | 1830/5773 [2:51:11<5:56:37, 5.43s/it] {'loss': 0.5802, 'learning_rate': 1.5984913078389974e-05, 'epoch': 0.32} 32%|███▏ | 1830/5773 [2:51:17<5:56:37, 5.43s/it] {'loss': 0.5802, 'learning_rate': 1.5984913078389974e-05, 'epoch': 0.32} 32%|███▏ | 1830/5773 [2:51:11<5:56:37, 5.43s/it] 32%|███▏ | 1831/5773 [2:51:23<6:03:48, 5.54s/it] 32%|███▏ | 1831/5773 [2:51:17<6:03:47, 5.54s/it] {'loss': 0.5681, 'learning_rate': 1.5980417008943323e-05, 'epoch': 0.32} 32%|███▏ | 1831/5773 [2:51:23<6:03:48, 5.54s/it] {'loss': 0.5681, 'learning_rate': 1.5980417008943323e-05, 'epoch': 0.32} 32%|███▏ | 1831/5773 [2:51:17<6:03:47, 5.54s/it] 32%|███▏ | 1832/5773 [2:51:28<6:01:23, 5.50s/it] 32%|███▏ | 1832/5773 [2:51:23<6:01:23, 5.50s/it] {'loss': 0.5745, 'learning_rate': 1.5975919056670325e-05, 'epoch': 0.32} 32%|███▏ | 1832/5773 [2:51:28<6:01:23, 5.50s/it] {'loss': 0.5745, 'learning_rate': 1.5975919056670325e-05, 'epoch': 0.32} 32%|███▏ | 1832/5773 [2:51:23<6:01:23, 5.50s/it] 32%|███▏ | 1833/5773 [2:51:34<5:58:58, 5.47s/it] 32%|███▏ | 1833/5773 [2:51:28<5:58:58, 5.47s/it] {'loss': 0.5835, 'learning_rate': 1.5971419222987078e-05, 'epoch': 0.32} 32%|███▏ | 1833/5773 [2:51:34<5:58:58, 5.47s/it] {'loss': 0.5835, 'learning_rate': 1.5971419222987078e-05, 'epoch': 0.32} 32%|███▏ | 1833/5773 [2:51:28<5:58:58, 5.47s/it] 32%|███▏ | 1834/5773 [2:51:39<5:55:16, 5.41s/it] 32%|███▏ | 1834/5773 [2:51:33<5:55:16, 5.41s/it] {'loss': 0.5661, 'learning_rate': 1.5966917509310277e-05, 'epoch': 0.32} 32%|███▏ | 1834/5773 [2:51:39<5:55:16, 5.41s/it] {'loss': 0.5661, 'learning_rate': 1.5966917509310277e-05, 'epoch': 0.32} 32%|███▏ | 1834/5773 [2:51:33<5:55:16, 5.41s/it] 32%|███▏ | 1835/5773 [2:51:44<5:53:24, 5.38s/it] 32%|███▏ | 1835/5773 [2:51:39<5:53:24, 5.38s/it] {'loss': 0.5799, 'learning_rate': 1.59624139170572e-05, 'epoch': 0.32} 32%|███▏ | 1835/5773 [2:51:44<5:53:24, 5.38s/it] {'loss': 0.5799, 
'learning_rate': 1.59624139170572e-05, 'epoch': 0.32} 32%|███▏ | 1835/5773 [2:51:39<5:53:24, 5.38s/it] 32%|███▏ | 1836/5773 [2:51:50<5:58:28, 5.46s/it] 32%|███▏ | 1836/5773 [2:51:44<5:58:28, 5.46s/it] {'loss': 0.5793, 'learning_rate': 1.5957908447645722e-05, 'epoch': 0.32} 32%|███▏ | 1836/5773 [2:51:50<5:58:28, 5.46s/it] {'loss': 0.5793, 'learning_rate': 1.5957908447645722e-05, 'epoch': 0.32} 32%|███▏ | 1836/5773 [2:51:44<5:58:28, 5.46s/it] 32%|███▏ | 1837/5773 [2:51:55<6:00:19, 5.49s/it] 32%|███▏ | 1837/5773 [2:51:50<6:00:19, 5.49s/it] {'loss': 0.5752, 'learning_rate': 1.5953401102494315e-05, 'epoch': 0.32} 32%|███▏ | 1837/5773 [2:51:55<6:00:19, 5.49s/it] {'loss': 0.5752, 'learning_rate': 1.5953401102494315e-05, 'epoch': 0.32} 32%|███▏ | 1837/5773 [2:51:50<6:00:19, 5.49s/it] 32%|███▏ | 1838/5773 [2:52:01<6:01:14, 5.51s/it] 32%|███▏ | 1838/5773 [2:51:55<6:01:14, 5.51s/it] {'loss': 0.593, 'learning_rate': 1.594889188302203e-05, 'epoch': 0.32} 32%|███▏ | 1838/5773 [2:52:01<6:01:14, 5.51s/it] {'loss': 0.593, 'learning_rate': 1.594889188302203e-05, 'epoch': 0.32} 32%|███▏ | 1838/5773 [2:51:55<6:01:14, 5.51s/it] 32%|███▏ | 1839/5773 [2:52:06<5:59:41, 5.49s/it] 32%|███▏ | 1839/5773 [2:52:01<5:59:41, 5.49s/it] {'loss': 0.5771, 'learning_rate': 1.594438079064851e-05, 'epoch': 0.32} 32%|███▏ | 1839/5773 [2:52:06<5:59:41, 5.49s/it] {'loss': 0.5771, 'learning_rate': 1.594438079064851e-05, 'epoch': 0.32} 32%|███▏ | 1839/5773 [2:52:01<5:59:41, 5.49s/it] 32%|███▏ | 1840/5773 [2:52:12<5:59:58, 5.49s/it] 32%|███▏ | 1840/5773 [2:52:06<5:59:58, 5.49s/it] {'loss': 0.5935, 'learning_rate': 1.5939867826794e-05, 'epoch': 0.32} 32%|███▏ | 1840/5773 [2:52:12<5:59:58, 5.49s/it] {'loss': 0.5935, 'learning_rate': 1.5939867826794e-05, 'epoch': 0.32} 32%|███▏ | 1840/5773 [2:52:06<5:59:58, 5.49s/it] 32%|███▏ | 1841/5773 [2:52:17<5:54:48, 5.41s/it] 32%|███▏ | 1841/5773 [2:52:12<5:54:48, 5.41s/it] {'loss': 0.5859, 'learning_rate': 1.593535299287932e-05, 'epoch': 0.32} 32%|███▏ | 1841/5773 
[2:52:17<5:54:48, 5.41s/it] {'loss': 0.5859, 'learning_rate': 1.593535299287932e-05, 'epoch': 0.32} 32%|███▏ | 1841/5773 [2:52:12<5:54:48, 5.41s/it] 32%|███▏ | 1842/5773 [2:52:22<5:53:09, 5.39s/it] 32%|███▏ | 1842/5773 [2:52:17<5:53:09, 5.39s/it] {'loss': 0.6017, 'learning_rate': 1.5930836290325884e-05, 'epoch': 0.32} 32%|███▏ | 1842/5773 [2:52:22<5:53:09, 5.39s/it] {'loss': 0.6017, 'learning_rate': 1.5930836290325884e-05, 'epoch': 0.32} 32%|███▏ | 1842/5773 [2:52:17<5:53:09, 5.39s/it] 32%|███▏ | 1843/5773 [2:52:28<5:56:53, 5.45s/it] 32%|███▏ | 1843/5773 [2:52:22<5:56:53, 5.45s/it] {'loss': 0.6002, 'learning_rate': 1.59263177205557e-05, 'epoch': 0.32} 32%|███▏ | 1843/5773 [2:52:28<5:56:53, 5.45s/it] {'loss': 0.6002, 'learning_rate': 1.59263177205557e-05, 'epoch': 0.32} 32%|███▏ | 1843/5773 [2:52:22<5:56:53, 5.45s/it] 32%|███▏ | 1844/5773 [2:52:33<5:57:30, 5.46s/it] 32%|███▏ | 1844/5773 [2:52:28<5:57:30, 5.46s/it] {'loss': 0.5762, 'learning_rate': 1.5921797284991342e-05, 'epoch': 0.32} 32%|███▏ | 1844/5773 [2:52:33<5:57:30, 5.46s/it] {'loss': 0.5762, 'learning_rate': 1.5921797284991342e-05, 'epoch': 0.32} 32%|███▏ | 1844/5773 [2:52:28<5:57:30, 5.46s/it] 32%|███▏ | 1845/5773 [2:52:39<6:01:43, 5.53s/it] 32%|███▏ | 1845/5773 [2:52:34<6:01:43, 5.53s/it] {'loss': 0.6055, 'learning_rate': 1.5917274985056007e-05, 'epoch': 0.32} 32%|███▏ | 1845/5773 [2:52:39<6:01:43, 5.53s/it] {'loss': 0.6055, 'learning_rate': 1.5917274985056007e-05, 'epoch': 0.32} 32%|███▏ | 1845/5773 [2:52:34<6:01:43, 5.53s/it] 32%|███▏ | 1846/5773 [2:52:45<6:00:35, 5.51s/it] 32%|███▏ | 1846/5773 [2:52:39<6:00:35, 5.51s/it] {'loss': 0.5924, 'learning_rate': 1.5912750822173446e-05, 'epoch': 0.32} 32%|███▏ | 1846/5773 [2:52:45<6:00:35, 5.51s/it] {'loss': 0.5924, 'learning_rate': 1.5912750822173446e-05, 'epoch': 0.32} 32%|███▏ | 1846/5773 [2:52:39<6:00:35, 5.51s/it] 32%|███▏ | 1847/5773 [2:52:50<6:00:54, 5.52s/it] 32%|███▏ | 1847/5773 [2:52:45<6:00:54, 5.52s/it] {'loss': 0.5957, 'learning_rate': 
1.590822479776802e-05, 'epoch': 0.32} 32%|███▏ | 1847/5773 [2:52:50<6:00:54, 5.52s/it] {'loss': 0.5957, 'learning_rate': 1.590822479776802e-05, 'epoch': 0.32} 32%|███▏ | 1847/5773 [2:52:45<6:00:54, 5.52s/it] 32%|███▏ | 1848/5773 [2:52:56<5:58:14, 5.48s/it] 32%|███▏ | 1848/5773 [2:52:50<5:58:14, 5.48s/it] {'loss': 0.5909, 'learning_rate': 1.590369691326466e-05, 'epoch': 0.32} 32%|███▏ | 1848/5773 [2:52:56<5:58:14, 5.48s/it] {'loss': 0.5909, 'learning_rate': 1.590369691326466e-05, 'epoch': 0.32} 32%|███▏ | 1848/5773 [2:52:50<5:58:14, 5.48s/it] 32%|███▏ | 1849/5773 [2:53:01<5:56:15, 5.45s/it] 32%|███▏ | 1849/5773 [2:52:55<5:56:15, 5.45s/it] {'loss': 0.5753, 'learning_rate': 1.5899167170088886e-05, 'epoch': 0.32} 32%|███▏ | 1849/5773 [2:53:01<5:56:15, 5.45s/it] {'loss': 0.5753, 'learning_rate': 1.5899167170088886e-05, 'epoch': 0.32} 32%|███▏ | 1849/5773 [2:52:55<5:56:15, 5.45s/it]7 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 32%|███▏ | 1850/5773 [2:53:06<5:54:36, 5.42s/it]14 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
32%|███▏ | 1850/5773 [2:53:01<5:54:37, 5.42s/it] {'loss': 0.5817, 'learning_rate': 1.5894635569666816e-05, 'epoch': 0.32} 32%|███▏ | 1850/5773 [2:53:06<5:54:36, 5.42s/it] {'loss': 0.5817, 'learning_rate': 1.5894635569666816e-05, 'epoch': 0.32} 32%|███▏ | 1850/5773 [2:53:01<5:54:37, 5.42s/it] 32%|███▏ | 1851/5773 [2:53:12<5:57:40, 5.47s/it] 32%|███▏ | 1851/5773 [2:53:06<5:57:40, 5.47s/it] {'loss': 0.5811, 'learning_rate': 1.589010211342513e-05, 'epoch': 0.32} 32%|███▏ | 1851/5773 [2:53:12<5:57:40, 5.47s/it] {'loss': 0.5811, 'learning_rate': 1.589010211342513e-05, 'epoch': 0.32} 32%|███▏ | 1851/5773 [2:53:06<5:57:40, 5.47s/it] 32%|███▏ | 1852/5773 [2:53:17<5:56:48, 5.46s/it] 32%|███▏ | 1852/5773 [2:53:12<5:56:48, 5.46s/it] {'loss': 0.5734, 'learning_rate': 1.5885566802791115e-05, 'epoch': 0.32} 32%|███▏ | 1852/5773 [2:53:17<5:56:48, 5.46s/it] {'loss': 0.5734, 'learning_rate': 1.5885566802791115e-05, 'epoch': 0.32} 32%|███▏ | 1852/5773 [2:53:12<5:56:48, 5.46s/it] 32%|███▏ | 1853/5773 [2:53:23<5:56:29, 5.46s/it] 32%|███▏ | 1853/5773 [2:53:17<5:56:29, 5.46s/it] {'loss': 0.5908, 'learning_rate': 1.5881029639192626e-05, 'epoch': 0.32} 32%|███▏ | 1853/5773 [2:53:23<5:56:29, 5.46s/it] {'loss': 0.5908, 'learning_rate': 1.5881029639192626e-05, 'epoch': 0.32} 32%|███▏ | 1853/5773 [2:53:17<5:56:29, 5.46s/it] 32%|███▏ | 1854/5773 [2:53:28<5:55:17, 5.44s/it] 32%|███▏ | 1854/5773 [2:53:23<5:55:18, 5.44s/it] {'loss': 0.5711, 'learning_rate': 1.5876490624058113e-05, 'epoch': 0.32} 32%|███▏ | 1854/5773 [2:53:28<5:55:17, 5.44s/it] {'loss': 0.5711, 'learning_rate': 1.5876490624058113e-05, 'epoch': 0.32} 32%|███▏ | 1854/5773 [2:53:23<5:55:18, 5.44s/it] 32%|███▏ | 1855/5773 [2:53:33<5:52:13, 5.39s/it] 32%|███▏ | 1855/5773 [2:53:28<5:52:13, 5.39s/it] {'loss': 0.5897, 'learning_rate': 1.5871949758816594e-05, 'epoch': 0.32} 32%|███▏ | 1855/5773 [2:53:33<5:52:13, 5.39s/it] {'loss': 0.5897, 'learning_rate': 1.5871949758816594e-05, 'epoch': 0.32} 32%|███▏ | 1855/5773 [2:53:28<5:52:13, 
5.39s/it] 32%|███▏ | 1856/5773 [2:53:39<5:51:09, 5.38s/it] 32%|███▏ | 1856/5773 [2:53:33<5:51:09, 5.38s/it] {'loss': 0.5639, 'learning_rate': 1.586740704489769e-05, 'epoch': 0.32} 32%|███▏ | 1856/5773 [2:53:39<5:51:09, 5.38s/it] {'loss': 0.5639, 'learning_rate': 1.586740704489769e-05, 'epoch': 0.32} 32%|███▏ | 1856/5773 [2:53:33<5:51:09, 5.38s/it] 32%|███▏ | 1857/5773 [2:53:44<5:50:35, 5.37s/it] 32%|███▏ | 1857/5773 [2:53:39<5:50:35, 5.37s/it] {'loss': 0.5973, 'learning_rate': 1.5862862483731574e-05, 'epoch': 0.32} 32%|███▏ | 1857/5773 [2:53:44<5:50:35, 5.37s/it] {'loss': 0.5973, 'learning_rate': 1.5862862483731574e-05, 'epoch': 0.32} 32%|███▏ | 1857/5773 [2:53:39<5:50:35, 5.37s/it] 32%|███▏ | 1858/5773 [2:53:49<5:49:43, 5.36s/it] 32%|███▏ | 1858/5773 [2:53:44<5:49:43, 5.36s/it] {'loss': 0.5806, 'learning_rate': 1.585831607674904e-05, 'epoch': 0.32} 32%|███▏ | 1858/5773 [2:53:49<5:49:43, 5.36s/it] {'loss': 0.5806, 'learning_rate': 1.585831607674904e-05, 'epoch': 0.32} 32%|███▏ | 1858/5773 [2:53:44<5:49:43, 5.36s/it] 32%|███▏ | 1859/5773 [2:53:55<5:48:56, 5.35s/it] 32%|███▏ | 1859/5773 [2:53:49<5:48:55, 5.35s/it] {'loss': 0.5887, 'learning_rate': 1.5853767825381434e-05, 'epoch': 0.32} 32%|███▏ | 1859/5773 [2:53:55<5:48:56, 5.35s/it] {'loss': 0.5887, 'learning_rate': 1.5853767825381434e-05, 'epoch': 0.32} 32%|███▏ | 1859/5773 [2:53:49<5:48:55, 5.35s/it] 32%|███▏ | 1860/5773 [2:54:00<5:51:15, 5.39s/it] 32%|███▏ | 1860/5773 [2:53:55<5:51:15, 5.39s/it] {'loss': 0.5707, 'learning_rate': 1.5849217731060686e-05, 'epoch': 0.32} 32%|███▏ | 1860/5773 [2:54:00<5:51:15, 5.39s/it] {'loss': 0.5707, 'learning_rate': 1.5849217731060686e-05, 'epoch': 0.32} 32%|███▏ | 1860/5773 [2:53:55<5:51:15, 5.39s/it] 32%|███▏ | 1861/5773 [2:54:06<5:53:46, 5.43s/it] 32%|███▏ | 1861/5773 [2:54:00<5:53:46, 5.43s/it] {'loss': 0.5943, 'learning_rate': 1.5844665795219314e-05, 'epoch': 0.32} 32%|███▏ | 1861/5773 [2:54:06<5:53:46, 5.43s/it] {'loss': 0.5943, 'learning_rate': 1.5844665795219314e-05, 
'epoch': 0.32} 32%|███▏ | 1861/5773 [2:54:00<5:53:46, 5.43s/it] 32%|███▏ | 1862/5773 [2:54:11<5:51:19, 5.39s/it] 32%|███▏ | 1862/5773 [2:54:06<5:51:19, 5.39s/it] {'loss': 0.5773, 'learning_rate': 1.584011201929042e-05, 'epoch': 0.32} 32%|███▏ | 1862/5773 [2:54:11<5:51:19, 5.39s/it] {'loss': 0.5773, 'learning_rate': 1.584011201929042e-05, 'epoch': 0.32} 32%|███▏ | 1862/5773 [2:54:06<5:51:19, 5.39s/it] 32%|███▏ | 1863/5773 [2:54:17<5:57:45, 5.49s/it] 32%|███▏ | 1863/5773 [2:54:11<5:57:44, 5.49s/it] {'loss': 0.5969, 'learning_rate': 1.5835556404707668e-05, 'epoch': 0.32} 32%|███▏ | 1863/5773 [2:54:17<5:57:45, 5.49s/it] {'loss': 0.5969, 'learning_rate': 1.5835556404707668e-05, 'epoch': 0.32} 32%|███▏ | 1863/5773 [2:54:11<5:57:44, 5.49s/it] 32%|███▏ | 1864/5773 [2:54:22<5:58:20, 5.50s/it] 32%|███▏ | 1864/5773 [2:54:17<5:58:20, 5.50s/it] {'loss': 0.5591, 'learning_rate': 1.5830998952905316e-05, 'epoch': 0.32} 32%|███▏ | 1864/5773 [2:54:22<5:58:20, 5.50s/it] {'loss': 0.5591, 'learning_rate': 1.5830998952905316e-05, 'epoch': 0.32} 32%|███▏ | 1864/5773 [2:54:17<5:58:20, 5.50s/it] 32%|███▏ | 1865/5773 [2:54:28<6:01:32, 5.55s/it] 32%|███▏ | 1865/5773 [2:54:22<6:01:32, 5.55s/it] {'loss': 0.5898, 'learning_rate': 1.582643966531819e-05, 'epoch': 0.32} 32%|███▏ | 1865/5773 [2:54:28<6:01:32, 5.55s/it] {'loss': 0.5898, 'learning_rate': 1.582643966531819e-05, 'epoch': 0.32} 32%|███▏ | 1865/5773 [2:54:22<6:01:32, 5.55s/it] 32%|███▏ | 1866/5773 [2:54:34<6:00:32, 5.54s/it] 32%|███▏ | 1866/5773 [2:54:28<6:00:32, 5.54s/it] {'loss': 0.5895, 'learning_rate': 1.5821878543381707e-05, 'epoch': 0.32} 32%|███▏ | 1866/5773 [2:54:34<6:00:32, 5.54s/it] {'loss': 0.5895, 'learning_rate': 1.5821878543381707e-05, 'epoch': 0.32} 32%|███▏ | 1866/5773 [2:54:28<6:00:32, 5.54s/it] 32%|███▏ | 1867/5773 [2:54:39<5:58:34, 5.51s/it] 32%|███▏ | 1867/5773 [2:54:33<5:58:34, 5.51s/it] {'loss': 0.5745, 'learning_rate': 1.581731558853185e-05, 'epoch': 0.32} 32%|███▏ | 1867/5773 [2:54:39<5:58:34, 5.51s/it] {'loss': 
0.5745, 'learning_rate': 1.581731558853185e-05, 'epoch': 0.32} 32%|███▏ | 1867/5773 [2:54:33<5:58:34, 5.51s/it] 32%|███▏ | 1868/5773 [2:54:44<5:54:02, 5.44s/it] {'loss': 0.5759, 'learning_rate': 1.5812750802205187e-05, 'epoch': 0.32} 32%|███▏ | 1868/5773 [2:54:44<5:54:02, 5.44s/it] 32%|███▏ | 1868/5773 [2:54:39<5:54:03, 5.44s/it] {'loss': 0.5759, 'learning_rate': 1.5812750802205187e-05, 'epoch': 0.32} 32%|███▏ | 1868/5773 [2:54:39<5:54:03, 5.44s/it] 32%|███▏ | 1869/5773 [2:54:50<5:52:52, 5.42s/it] 32%|███▏ | 1869/5773 [2:54:44<5:52:52, 5.42s/it] {'loss': 0.5974, 'learning_rate': 1.5808184185838854e-05, 'epoch': 0.32} 32%|███▏ | 1869/5773 [2:54:50<5:52:52, 5.42s/it] {'loss': 0.5974, 'learning_rate': 1.5808184185838854e-05, 'epoch': 0.32} 32%|███▏ | 1869/5773 [2:54:44<5:52:52, 5.42s/it] 32%|███▏ | 1870/5773 [2:54:55<5:55:35, 5.47s/it] 32%|███▏ | 1870/5773 [2:54:50<5:55:35, 5.47s/it] {'loss': 0.5768, 'learning_rate': 1.580361574087057e-05, 'epoch': 0.32} 32%|███▏ | 1870/5773 [2:54:55<5:55:35, 5.47s/it] {'loss': 0.5768, 'learning_rate': 1.580361574087057e-05, 'epoch': 0.32} 32%|███▏ | 1870/5773 [2:54:50<5:55:35, 5.47s/it] 32%|███▏ | 1871/5773 [2:55:01<5:57:55, 5.50s/it] 32%|███▏ | 1871/5773 [2:54:55<5:57:55, 5.50s/it] {'loss': 0.5787, 'learning_rate': 1.5799045468738626e-05, 'epoch': 0.32} 32%|███▏ | 1871/5773 [2:55:01<5:57:55, 5.50s/it] {'loss': 0.5787, 'learning_rate': 1.5799045468738626e-05, 'epoch': 0.32} 32%|███▏ | 1871/5773 [2:54:55<5:57:55, 5.50s/it] 32%|███▏ | 1872/5773 [2:55:06<5:54:33, 5.45s/it] 32%|███▏ | 1872/5773 [2:55:01<5:54:32, 5.45s/it] {'loss': 0.5768, 'learning_rate': 1.5794473370881886e-05, 'epoch': 0.32} 32%|███▏ | 1872/5773 [2:55:06<5:54:33, 5.45s/it] {'loss': 0.5768, 'learning_rate': 1.5794473370881886e-05, 'epoch': 0.32} 32%|███▏ | 1872/5773 [2:55:01<5:54:32, 5.45s/it] 32%|███▏ | 1873/5773 [2:55:11<5:52:45, 5.43s/it] 32%|███▏ | 1873/5773 [2:55:06<5:52:45, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.57898994487398e-05, 'epoch': 0.32} 32%|███▏ | 
1873/5773 [2:55:11<5:52:45, 5.43s/it] {'loss': 0.5867, 'learning_rate': 1.57898994487398e-05, 'epoch': 0.32} 32%|███▏ | 1873/5773 [2:55:06<5:52:45, 5.43s/it] 32%|███▏ | 1874/5773 [2:55:17<5:52:46, 5.43s/it] 32%|███▏ | 1874/5773 [2:55:11<5:52:45, 5.43s/it] {'loss': 0.5814, 'learning_rate': 1.5785323703752383e-05, 'epoch': 0.32} 32%|███▏ | 1874/5773 [2:55:17<5:52:46, 5.43s/it] {'loss': 0.5814, 'learning_rate': 1.5785323703752383e-05, 'epoch': 0.32} 32%|███▏ | 1874/5773 [2:55:11<5:52:45, 5.43s/it] 32%|███▏ | 1875/5773 [2:55:23<5:58:11, 5.51s/it] 32%|███▏ | 1875/5773 [2:55:17<5:58:11, 5.51s/it] {'loss': 0.5805, 'learning_rate': 1.578074613736022e-05, 'epoch': 0.32} 32%|███▏ | 1875/5773 [2:55:23<5:58:11, 5.51s/it] {'loss': 0.5805, 'learning_rate': 1.578074613736022e-05, 'epoch': 0.32} 32%|███▏ | 1875/5773 [2:55:17<5:58:11, 5.51s/it] 32%|███▏ | 1876/5773 [2:55:28<5:54:37, 5.46s/it] 32%|███▏ | 1876/5773 [2:55:22<5:54:37, 5.46s/it] {'loss': 0.5887, 'learning_rate': 1.5776166751004483e-05, 'epoch': 0.32} 32%|███▏ | 1876/5773 [2:55:28<5:54:37, 5.46s/it] {'loss': 0.5887, 'learning_rate': 1.5776166751004483e-05, 'epoch': 0.32} 32%|███▏ | 1876/5773 [2:55:22<5:54:37, 5.46s/it] 33%|███▎ | 1877/5773 [2:55:34<5:56:41, 5.49s/it] 33%|███▎ | 1877/5773 [2:55:28<5:56:41, 5.49s/it] {'loss': 0.5836, 'learning_rate': 1.5771585546126907e-05, 'epoch': 0.33} 33%|███▎ | 1877/5773 [2:55:34<5:56:41, 5.49s/it] {'loss': 0.5836, 'learning_rate': 1.5771585546126907e-05, 'epoch': 0.33} 33%|███▎ | 1877/5773 [2:55:28<5:56:41, 5.49s/it] 33%|███▎ | 1878/5773 [2:55:39<5:56:51, 5.50s/it] 33%|███▎ | 1878/5773 [2:55:33<5:56:50, 5.50s/it] {'loss': 0.5835, 'learning_rate': 1.5767002524169797e-05, 'epoch': 0.33} 33%|███▎ | 1878/5773 [2:55:39<5:56:51, 5.50s/it] {'loss': 0.5835, 'learning_rate': 1.5767002524169797e-05, 'epoch': 0.33} 33%|███▎ | 1878/5773 [2:55:33<5:56:50, 5.50s/it] 33%|███▎ | 1879/5773 [2:55:45<5:56:38, 5.50s/it] 33%|███▎ | 1879/5773 [2:55:39<5:56:38, 5.50s/it] {'loss': 0.5759, 'learning_rate': 
1.576241768657604e-05, 'epoch': 0.33} 33%|███▎ | 1879/5773 [2:55:45<5:56:38, 5.50s/it] {'loss': 0.5759, 'learning_rate': 1.576241768657604e-05, 'epoch': 0.33} 33%|███▎ | 1879/5773 [2:55:39<5:56:38, 5.50s/it] 33%|███▎ | 1880/5773 [2:55:50<5:54:27, 5.46s/it] 33%|███▎ | 1880/5773 [2:55:44<5:54:27, 5.46s/it] {'loss': 0.5901, 'learning_rate': 1.5757831034789085e-05, 'epoch': 0.33} 33%|███▎ | 1880/5773 [2:55:50<5:54:27, 5.46s/it] {'loss': 0.5901, 'learning_rate': 1.5757831034789085e-05, 'epoch': 0.33} 33%|███▎ | 1880/5773 [2:55:44<5:54:27, 5.46s/it] 33%|███▎ | 1881/5773 [2:55:55<5:52:37, 5.44s/it] 33%|███▎ | 1881/5773 [2:55:50<5:52:37, 5.44s/it] {'loss': 0.5865, 'learning_rate': 1.575324257025296e-05, 'epoch': 0.33} 33%|███▎ | 1881/5773 [2:55:55<5:52:37, 5.44s/it] {'loss': 0.5865, 'learning_rate': 1.575324257025296e-05, 'epoch': 0.33} 33%|███▎ | 1881/5773 [2:55:50<5:52:37, 5.44s/it] 33%|███▎ | 1882/5773 [2:56:01<5:52:34, 5.44s/it] 33%|███▎ | 1882/5773 [2:55:55<5:52:35, 5.44s/it] {'loss': 0.5824, 'learning_rate': 1.574865229441226e-05, 'epoch': 0.33} 33%|███▎ | 1882/5773 [2:56:01<5:52:34, 5.44s/it] {'loss': 0.5824, 'learning_rate': 1.574865229441226e-05, 'epoch': 0.33} 33%|███▎ | 1882/5773 [2:55:55<5:52:35, 5.44s/it] 33%|███▎ | 1883/5773 [2:56:06<5:53:37, 5.45s/it] 33%|███▎ | 1883/5773 [2:56:01<5:53:37, 5.45s/it] {'loss': 0.5843, 'learning_rate': 1.5744060208712148e-05, 'epoch': 0.33} 33%|███▎ | 1883/5773 [2:56:06<5:53:37, 5.45s/it] {'loss': 0.5843, 'learning_rate': 1.5744060208712148e-05, 'epoch': 0.33} 33%|███▎ | 1883/5773 [2:56:01<5:53:37, 5.45s/it] 33%|███▎ | 1884/5773 [2:56:12<5:53:08, 5.45s/it] 33%|███▎ | 1884/5773 [2:56:06<5:53:08, 5.45s/it] {'loss': 0.5671, 'learning_rate': 1.573946631459836e-05, 'epoch': 0.33} 33%|███▎ | 1884/5773 [2:56:12<5:53:08, 5.45s/it] {'loss': 0.5671, 'learning_rate': 1.573946631459836e-05, 'epoch': 0.33} 33%|███▎ | 1884/5773 [2:56:06<5:53:08, 5.45s/it] 33%|███▎ | 1885/5773 [2:56:17<5:54:43, 5.47s/it] 33%|███▎ | 1885/5773 [2:56:12<5:54:43, 
5.47s/it] {'loss': 0.6187, 'learning_rate': 1.5734870613517204e-05, 'epoch': 0.33} 33%|███▎ | 1885/5773 [2:56:17<5:54:43, 5.47s/it] {'loss': 0.6187, 'learning_rate': 1.5734870613517204e-05, 'epoch': 0.33} 33%|███▎ | 1885/5773 [2:56:12<5:54:43, 5.47s/it] 33%|███▎ | 1886/5773 [2:56:22<5:49:48, 5.40s/it] 33%|███▎ | 1886/5773 [2:56:17<5:49:47, 5.40s/it] {'loss': 0.5855, 'learning_rate': 1.573027310691555e-05, 'epoch': 0.33} 33%|███▎ | 1886/5773 [2:56:22<5:49:48, 5.40s/it] {'loss': 0.5855, 'learning_rate': 1.573027310691555e-05, 'epoch': 0.33} 33%|███▎ | 1886/5773 [2:56:17<5:49:47, 5.40s/it] 33%|███▎ | 1887/5773 [2:56:28<5:46:56, 5.36s/it] 33%|███▎ | 1887/5773 [2:56:22<5:46:55, 5.36s/it] {'loss': 0.605, 'learning_rate': 1.572567379624084e-05, 'epoch': 0.33} 33%|███▎ | 1887/5773 [2:56:28<5:46:56, 5.36s/it] {'loss': 0.605, 'learning_rate': 1.572567379624084e-05, 'epoch': 0.33} 33%|███▎ | 1887/5773 [2:56:22<5:46:55, 5.36s/it] 33%|███▎ | 1888/5773 [2:56:33<5:49:20, 5.40s/it] 33%|███▎ | 1888/5773 [2:56:28<5:49:20, 5.40s/it] {'loss': 0.5754, 'learning_rate': 1.5721072682941084e-05, 'epoch': 0.33} 33%|███▎ | 1888/5773 [2:56:33<5:49:20, 5.40s/it] {'loss': 0.5754, 'learning_rate': 1.5721072682941084e-05, 'epoch': 0.33} 33%|███▎ | 1888/5773 [2:56:28<5:49:20, 5.40s/it] 33%|███▎ | 1889/5773 [2:56:39<5:48:33, 5.38s/it] 33%|███▎ | 1889/5773 [2:56:33<5:48:33, 5.38s/it] {'loss': 0.5971, 'learning_rate': 1.5716469768464864e-05, 'epoch': 0.33} 33%|███▎ | 1889/5773 [2:56:39<5:48:33, 5.38s/it] {'loss': 0.5971, 'learning_rate': 1.5716469768464864e-05, 'epoch': 0.33} 33%|███▎ | 1889/5773 [2:56:33<5:48:33, 5.38s/it] 33%|███▎ | 1890/5773 [2:56:44<5:50:35, 5.42s/it] 33%|███▎ | 1890/5773 [2:56:38<5:50:35, 5.42s/it] {'loss': 0.5763, 'learning_rate': 1.571186505426132e-05, 'epoch': 0.33} 33%|███▎ | 1890/5773 [2:56:44<5:50:35, 5.42s/it] {'loss': 0.5763, 'learning_rate': 1.571186505426132e-05, 'epoch': 0.33} 33%|███▎ | 1890/5773 [2:56:38<5:50:35, 5.42s/it] 33%|███▎ | 1891/5773 [2:56:49<5:50:02, 
5.41s/it] 33%|███▎ | 1891/5773 [2:56:44<5:50:03, 5.41s/it] {'loss': 0.5941, 'learning_rate': 1.5707258541780162e-05, 'epoch': 0.33} 33%|███▎ | 1891/5773 [2:56:49<5:50:02, 5.41s/it] {'loss': 0.5941, 'learning_rate': 1.5707258541780162e-05, 'epoch': 0.33} 33%|███▎ | 1891/5773 [2:56:44<5:50:03, 5.41s/it] 33%|███▎ | 1892/5773 [2:56:55<5:46:22, 5.36s/it] 33%|███▎ | 1892/5773 [2:56:49<5:46:22, 5.35s/it] {'loss': 0.5994, 'learning_rate': 1.570265023247167e-05, 'epoch': 0.33} 33%|███▎ | 1892/5773 [2:56:55<5:46:22, 5.36s/it] {'loss': 0.5994, 'learning_rate': 1.570265023247167e-05, 'epoch': 0.33} 33%|███▎ | 1892/5773 [2:56:49<5:46:22, 5.35s/it] 33%|███▎ | 1893/5773 [2:57:00<5:46:39, 5.36s/it] 33%|███▎ | 1893/5773 [2:56:54<5:46:39, 5.36s/it] {'loss': 0.5673, 'learning_rate': 1.5698040127786688e-05, 'epoch': 0.33} 33%|███▎ | 1893/5773 [2:57:00<5:46:39, 5.36s/it] {'loss': 0.5673, 'learning_rate': 1.5698040127786688e-05, 'epoch': 0.33} 33%|███▎ | 1893/5773 [2:56:54<5:46:39, 5.36s/it] 33%|███▎ | 1894/5773 [2:57:06<5:49:31, 5.41s/it] 33%|███▎ | 1894/5773 [2:57:00<5:49:31, 5.41s/it] {'loss': 0.6122, 'learning_rate': 1.569342822917662e-05, 'epoch': 0.33} 33%|███▎ | 1894/5773 [2:57:06<5:49:31, 5.41s/it] {'loss': 0.6122, 'learning_rate': 1.569342822917662e-05, 'epoch': 0.33} 33%|███▎ | 1894/5773 [2:57:00<5:49:31, 5.41s/it] 33%|███▎ | 1895/5773 [2:57:11<5:45:59, 5.35s/it] 33%|███▎ | 1895/5773 [2:57:05<5:45:59, 5.35s/it] {'loss': 0.5786, 'learning_rate': 1.568881453809344e-05, 'epoch': 0.33} 33%|███▎ | 1895/5773 [2:57:11<5:45:59, 5.35s/it] {'loss': 0.5786, 'learning_rate': 1.568881453809344e-05, 'epoch': 0.33} 33%|███▎ | 1895/5773 [2:57:05<5:45:59, 5.35s/it] 33%|███▎ | 1896/5773 [2:57:16<5:46:16, 5.36s/it] 33%|███▎ | 1896/5773 [2:57:11<5:46:16, 5.36s/it] {'loss': 0.5661, 'learning_rate': 1.5684199055989693e-05, 'epoch': 0.33} {'loss': 0.5661, 'learning_rate': 1.5684199055989693e-05, 'epoch': 0.33} 33%|███▎ | 1896/5773 [2:57:16<5:46:16, 5.36s/it] 33%|███▎ | 1896/5773 [2:57:11<5:46:16, 
5.36s/it]WARNING: tokenization mismatch: 1 vs. 737. [[{'from': 'human', 'value': 'ASSISTANT: You are a top consultant specialising in advising tax firms and have the following skills and qualities.\n\nKNOWLEDGE: Consultant in a top consulting firm specialising in tax consulting, senior strategist, digital expert.\n\nCHARACTERISTICS: Customer-focused, empathetic, solution-oriented, personable, persuasive, creative.\n\nTASK: Use your knowledge to develop 10 measures to encourage clients to submit their documents digitally instead of by post. The measures should be formulated in such a way that the client recognises the advantages of digital filing. The law firm is digitised and can accept digital documents.\n\nOUTPUT: A list of the 10 actions, each with the name of the action as a heading and 2 short additional bullet points.\n\nOUTPUT LANGUAGE: German'}, {'from': 'gpt', 'value': '1. Persönlicher Kontakt und Beratung:\n\t* Persönliche Beratung und Unterstützung bei der Umstellung auf digitales Einreichen\n\t* Betonung der Vorteile, wie Zeit- und Kosteneinsparungen\n2. Schulung und Unterstützung:\n\t* Schulungen für die digitale Einreichung von Dokumenten\n\t* Anleitung zur Nutzung von Tools und Software\n3. Anreize schaffen:\n\t* Ermäßigung bei der Beauftragung digitaler Dienstleistungen\n\t* Sonderangebote für Kunden, die ihre Dokumente digital einreichen\n4. Informationen bereitstellen:\n\t* Veröffentlichung von Leitfäden und Handbüchern für die digitale Einreichung von Dokumenten\n\t* FAQ-Liste für häufig gestellte Fragen zum digitalen Einreichen\n5. Klare Kommunikation:\n\t* Verständliche Kommunikation der Vorteile der digitalen Einreichung\n\t* Bereitstellung von Informationen über die Sicherheit der digitalen Übermittlung\n6. Einfache Zugänglichkeit:\n\t* Vereinfachung des digitalen Einreichungsprozesses durch intuitive Online-Plattformen\n\t* Bereitstellung von Unterstützung für den Fall technischer Schwierigkeiten\n7. 
Zeitersparnis betonen:\n\t* Betonung der Vorteile von Zeitersparnis durch digitales Einreichen\n\t* Erinnerungen an Fristen und den Nutzen der pünktlichen Einreichung\n8. Umweltbewusstsein:\n\t* Betonung des Umweltvorteils durch digitales Einreichen von Dokumenten\n\t* Verweis auf die Möglichkeit, Papierverbrauch zu reduzieren\n9. Kosteneffizienz:\n\t* Betonung der Vorteile von Kosteneinsparungen durch digitales Einreichen\n\t* Verweis auf die Möglichkeit, Papier- und Portokosten zu sparen\n10. Feedback einholen:\n* Umfrage zur Bewertung des digitalen Einreichungsprozesses\n* Analyse der Ergebnisse zur Verbesserung der digitalen Dienstleistungen'}]] (ignored) 33%|███▎ | 1897/5773 [2:57:22<5:46:55, 5.37s/it] 33%|███▎ | 1897/5773 [2:57:16<5:46:55, 5.37s/it] {'loss': 0.576, 'learning_rate': 1.567958178431847e-05, 'epoch': 0.33} 33%|███▎ | 1897/5773 [2:57:22<5:46:55, 5.37s/it] {'loss': 0.576, 'learning_rate': 1.567958178431847e-05, 'epoch': 0.33} 33%|███▎ | 1897/5773 [2:57:16<5:46:55, 5.37s/it] 33%|███▎ | 1898/5773 [2:57:27<5:46:12, 5.36s/it] 33%|███▎ | 1898/5773 [2:57:21<5:46:12, 5.36s/it] {'loss': 0.5701, 'learning_rate': 1.567496272453344e-05, 'epoch': 0.33} 33%|███▎ | 1898/5773 [2:57:27<5:46:12, 5.36s/it] {'loss': 0.5701, 'learning_rate': 1.567496272453344e-05, 'epoch': 0.33} 33%|███▎ | 1898/5773 [2:57:21<5:46:12, 5.36s/it] 33%|███▎ | 1899/5773 [2:57:32<5:49:49, 5.42s/it] 33%|███▎ | 1899/5773 [2:57:27<5:49:49, 5.42s/it] {'loss': 0.5825, 'learning_rate': 1.5670341878088834e-05, 'epoch': 0.33} 33%|███▎ | 1899/5773 [2:57:32<5:49:49, 5.42s/it] {'loss': 0.5825, 'learning_rate': 1.5670341878088834e-05, 'epoch': 0.33} 33%|███▎ | 1899/5773 [2:57:27<5:49:49, 5.42s/it]WARNING: tokenization mismatch: 1 vs. 1419. 
[[{'from': 'human', 'value': 'ASSISTANT: You are an innovative tax advisory firm with the following knowledge and traits.\n\nKNOWLEDGE: Tax advisor, junior tax advisor, senior tax advisor, financial strategist, tax regulations.\n\nTRAITS: Empathy, innovation, solution orientation, professionalism, communication, client focus.\n\nTASK: Use your knowledge to develop an internal strategy document. We are a tax advisory firm looking to identify 10 creative ways to improve client satisfaction and loyalty. Focus on ideas that are tailored to the needs of tax advisors and enhance the overall client experience.\n\nOUTPUT: A table in German containing 10 creative strategies to improve client satisfaction and loyalty for a tax advisory firm, including a brief description for each.'}, {'from': 'gpt', 'value': 'Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei\n\n| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1 | Personalisierter Service | Jeder Kunde erhält einen dedizierten Steuerberater, der auf seine individuellen Bedürfnisse und Anliegen eingeht. |\n| 2 | Proaktive Steuerplanung | Regelmäßige Updates und Empfehlungen zur Steueroptimierung, basierend auf aktuellen Gesetzesänderungen und Kundenzielen. |\n| 3 | Umfassende Schulungsangebote | Kostenlose Workshops und Webinare, um das Steuerwissen der Kunden zu erweitern und den Umgang mit Steuerthemen zu erleichtern. |\n| 4 | Digitale Plattform | Eine benutzerfreundliche Online-Plattform, die den Kunden den Zugriff auf ihre Steuerdokumente und den Kontakt zum Berater erleichtert. |\n| 5 | Einführung von Steuer-Coaching | Persönliche Coaching-Sitzungen, um den Kunden dabei zu unterstützen, ihre steuerlichen Ziele und Möglichkeiten besser zu verstehen. |\n| 6 | Schnelle Reaktionszeiten | Verbindliches Versprechen, innerhalb von 24 Stunden auf Kundenanfragen zu reagieren, um eine effiziente Kommunikation zu gewährleisten. 
|\n| 7 | Transparente Preisgestaltung | Klare und faire Preisstrukturen, die den Kunden im Voraus bekannt sind, um böse Überraschungen zu vermeiden. |\n| 8 | Individuelle Jahresgespräche | Regelmäßige persönliche Jahresgespräche, um auf die Bedürfnisse und Ziele der Kunden einzugehen und die Zusammenarbeit zu stärken. |\n| 9 | Kunden-Feedback-System | Einrichtung eines einfachen Feedback-Systems, um Kundenmeinungen und Verbesserungsvorschläge aktiv einzuholen und umzusetzen. |\n| 10 | Steuer-Newsletter und Informationsangebot | Monatlicher Newsletter mit aktuellen Steuerinformationen und Tipps, der den Kunden einen Mehrwert bietet und sie auf dem Laufenden hält. |'}, {'from': 'human', 'value': 'Bitte etwas kreativer und auf bezugnehmend auf die zwischenmenschliche Ebene.'}, {'from': 'gpt', 'value': 'Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei mit Fokus auf zwischenmenschlicher Ebene\n\n| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1 | Kunden-Events | Organisation von regelmäßigen, informellen Veranstaltungen, um den persönlichen Kontakt zwischen Klienten und Beratern zu fördern. |\n| 2 | Persönliche Geburtstagsgrüße | Individuelle Geburtstagskarten oder -nachrichten für jeden Kunden, um Wertschätzung zu zeigen und Beziehungen zu stärken. |\n| 3 | Kundenbefragung & Anpassung | Aktives Einholen von Kundenmeinungen zur individuellen Betreuung und Anpassung unserer Leistungen entsprechend der Kundenbedürfnisse. |\n| 4 | Teamvorstellung & Empathie | Vorstellung unserer Berater auf der Firmenwebsite und in sozialen Medien, um eine persönliche und empathische Beziehung zum Kunden aufzubauen. |\n| 5 | Kunden-Mentoring-Programm | Schaffung eines Programms, bei dem langjährige Kunden als Mentoren für Neukunden fungieren, um Vertrauen und Loyalität zu fördern. 
|\n| 6 | "Tag der offenen Tür" | Einladung der Kunden zu einem "Tag der offenen Tür" in unserer Kanzlei, um ihnen einen persönlichen Einblick in unsere Arbeitsweise zu geben. |\n| 7 | Individuelle Erfolgsgeschichten | Erstellung von individuellen Erfolgsgeschichten für jeden Kunden, um ihre steuerlichen Erfolge hervorzuheben und ihre Loyalität zu stärken. |\n| 8 | Gemeinsame soziale Projekte | Zusammenarbeit mit Kunden an sozialen Projekten und Spendenaktionen, um gemeinsame Werte zu teilen und die Zusammenarbeit zu fördern. |\n| 9 | Flexibilität in der Kommunikation | Anbieten verschiedener Kommunikationswege (Telefon, E-Mail, Videoanruf, etc.), um auf die individuellen Bedürfnisse der Kunden einzugehen. |\n| 10 | Jährliches Kunden-Dankeschön | Überraschung der Kunden mit einer kleinen Aufmerksamkeit oder einem Dankeschön-Event, um die Wertschätzung für ihre Treue auszudrücken. |'}]] (ignored) 13 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 33%|███▎ | 1900/5773 [2:57:38<5:49:34, 5.42s/it] 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 
33%|███▎ | 1900/5773 [2:57:32<5:49:34, 5.42s/it] {'loss': 0.5936, 'learning_rate': 1.5665719246439435e-05, 'epoch': 0.33} 33%|███▎ | 1900/5773 [2:57:38<5:49:34, 5.42s/it] {'loss': 0.5936, 'learning_rate': 1.5665719246439435e-05, 'epoch': 0.33} 33%|███▎ | 1900/5773 [2:57:32<5:49:34, 5.42s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-1900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 33%|███▎ | 1901/5773 [2:57:56<10:01:52, 9.33s/it] 33%|███▎ | 1901/5773 [2:57:51<10:01:52, 9.33s/it] {'loss': 0.5757, 'learning_rate': 1.56610948310406e-05, 'epoch': 0.33} 33%|███▎ | 1901/5773 [2:57:56<10:01:52, 9.33s/it] {'loss': 0.5757, 'learning_rate': 1.56610948310406e-05, 'epoch': 0.33} 33%|███▎ | 1901/5773 [2:57:51<10:01:52, 9.33s/it] 33%|███▎ | 1902/5773 [2:58:02<8:42:43, 8.10s/it] 33%|███▎ | 1902/5773 [2:57:56<8:42:43, 8.10s/it] {'loss': 0.5887, 'learning_rate': 1.5656468633348243e-05, 'epoch': 0.33} 33%|███▎ | 1902/5773 [2:58:02<8:42:43, 8.10s/it] {'loss': 0.5887, 'learning_rate': 1.5656468633348243e-05, 'epoch': 0.33} 33%|███▎ | 1902/5773 [2:57:56<8:42:43, 8.10s/it] 33%|███▎ | 1903/5773 [2:58:07<7:50:56, 7.30s/it] 33%|███▎ | 1903/5773 [2:58:01<7:50:56, 7.30s/it] {'loss': 0.5801, 'learning_rate': 1.5651840654818834e-05, 'epoch': 0.33} 33%|███▎ | 1903/5773 [2:58:07<7:50:56, 7.30s/it] {'loss': 0.5801, 'learning_rate': 1.5651840654818834e-05, 'epoch': 0.33} 33%|███▎ | 1903/5773 [2:58:01<7:50:56, 7.30s/it] 33%|███▎ | 1904/5773 [2:58:12<7:10:37, 6.68s/it] 33%|███▎ | 1904/5773 [2:58:07<7:10:37, 6.68s/it] {'loss': 0.5902, 'learning_rate': 1.5647210896909414e-05, 'epoch': 0.33} 33%|███▎ | 1904/5773 [2:58:12<7:10:37, 6.68s/it] {'loss': 0.5902, 'learning_rate': 1.5647210896909414e-05, 'epoch': 0.33} 33%|███▎ | 1904/5773 [2:58:07<7:10:37, 6.68s/it] 33%|███▎ | 1905/5773 [2:58:18<6:46:53, 6.31s/it] 33%|███▎ | 1905/5773 [2:58:12<6:46:53, 6.31s/it] {'loss': 0.5987, 'learning_rate': 1.5642579361077576e-05, 'epoch': 0.33} 33%|███▎ | 1905/5773 [2:58:18<6:46:53, 6.31s/it] {'loss': 0.5987, 'learning_rate': 1.5642579361077576e-05, 'epoch': 0.33} 33%|███▎ | 1905/5773 [2:58:12<6:46:53, 6.31s/it] 33%|███▎ | 1906/5773 [2:58:23<6:28:21, 6.03s/it] 33%|███▎ | 1906/5773 [2:58:17<6:28:21, 6.03s/it] {'loss': 0.5818, 'learning_rate': 1.563794604878148e-05, 'epoch': 0.33} 33%|███▎ | 1906/5773 [2:58:23<6:28:21, 6.03s/it] {'loss': 0.5818, 'learning_rate': 
1.563794604878148e-05, 'epoch': 0.33} 33%|███▎ | 1906/5773 [2:58:17<6:28:21, 6.03s/it] 33%|███▎ | 1907/5773 [2:58:29<6:19:09, 5.88s/it] 33%|███▎ | 1907/5773 [2:58:23<6:19:09, 5.88s/it] {'loss': 0.5873, 'learning_rate': 1.563331096147983e-05, 'epoch': 0.33} 33%|███▎ | 1907/5773 [2:58:29<6:19:09, 5.88s/it] {'loss': 0.5873, 'learning_rate': 1.563331096147983e-05, 'epoch': 0.33} 33%|███▎ | 1907/5773 [2:58:23<6:19:09, 5.88s/it] 33%|███▎ | 1908/5773 [2:58:34<6:10:16, 5.75s/it] 33%|███▎ | 1908/5773 [2:58:28<6:10:15, 5.75s/it] {'loss': 0.5739, 'learning_rate': 1.562867410063191e-05, 'epoch': 0.33} 33%|███▎ | 1908/5773 [2:58:34<6:10:16, 5.75s/it] {'loss': 0.5739, 'learning_rate': 1.562867410063191e-05, 'epoch': 0.33} 33%|███▎ | 1908/5773 [2:58:28<6:10:15, 5.75s/it] 33%|███▎ | 1909/5773 [2:58:39<6:03:16, 5.64s/it] 33%|███▎ | 1909/5773 [2:58:34<6:03:16, 5.64s/it] {'loss': 0.5899, 'learning_rate': 1.5624035467697547e-05, 'epoch': 0.33} 33%|███▎ | 1909/5773 [2:58:39<6:03:16, 5.64s/it] {'loss': 0.5899, 'learning_rate': 1.5624035467697547e-05, 'epoch': 0.33} 33%|███▎ | 1909/5773 [2:58:34<6:03:16, 5.64s/it] 33%|███▎ | 1910/5773 [2:58:45<6:01:18, 5.61s/it] 33%|███▎ | 1910/5773 [2:58:39<6:01:18, 5.61s/it] {'loss': 0.5972, 'learning_rate': 1.561939506413713e-05, 'epoch': 0.33} 33%|███▎ | 1910/5773 [2:58:45<6:01:18, 5.61s/it] {'loss': 0.5972, 'learning_rate': 1.561939506413713e-05, 'epoch': 0.33} 33%|███▎ | 1910/5773 [2:58:39<6:01:18, 5.61s/it] 33%|███▎ | 1911/5773 [2:58:50<5:58:08, 5.56s/it] 33%|███▎ | 1911/5773 [2:58:45<5:58:08, 5.56s/it] {'loss': 0.5843, 'learning_rate': 1.5614752891411613e-05, 'epoch': 0.33} 33%|███▎ | 1911/5773 [2:58:50<5:58:08, 5.56s/it] {'loss': 0.5843, 'learning_rate': 1.5614752891411613e-05, 'epoch': 0.33} 33%|███▎ | 1911/5773 [2:58:45<5:58:08, 5.56s/it] 33%|███▎ | 1912/5773 [2:58:56<5:54:48, 5.51s/it] 33%|███▎ | 1912/5773 [2:58:50<5:54:48, 5.51s/it] {'loss': 0.5813, 'learning_rate': 1.5610108950982494e-05, 'epoch': 0.33} 33%|███▎ | 1912/5773 
[2:58:56<5:54:48, 5.51s/it] {'loss': 0.5813, 'learning_rate': 1.5610108950982494e-05, 'epoch': 0.33} 33%|███▎ | 1912/5773 [2:58:50<5:54:48, 5.51s/it] 33%|███▎ | 1913/5773 [2:59:01<5:52:49, 5.48s/it] 33%|███▎ | 1913/5773 [2:58:56<5:52:49, 5.48s/it] {'loss': 0.5914, 'learning_rate': 1.5605463244311834e-05, 'epoch': 0.33} 33%|███▎ | 1913/5773 [2:59:01<5:52:49, 5.48s/it] {'loss': 0.5914, 'learning_rate': 1.5605463244311834e-05, 'epoch': 0.33} 33%|███▎ | 1913/5773 [2:58:56<5:52:49, 5.48s/it] 33%|███▎ | 1914/5773 [2:59:07<5:53:28, 5.50s/it] 33%|███▎ | 1914/5773 [2:59:01<5:53:28, 5.50s/it] {'loss': 0.5735, 'learning_rate': 1.560081577286225e-05, 'epoch': 0.33} 33%|███▎ | 1914/5773 [2:59:07<5:53:28, 5.50s/it] {'loss': 0.5735, 'learning_rate': 1.560081577286225e-05, 'epoch': 0.33} 33%|███▎ | 1914/5773 [2:59:01<5:53:28, 5.50s/it] 33%|███▎ | 1915/5773 [2:59:12<5:51:37, 5.47s/it] 33%|███▎ | 1915/5773 [2:59:07<5:51:37, 5.47s/it] {'loss': 0.5838, 'learning_rate': 1.559616653809692e-05, 'epoch': 0.33} 33%|███▎ | 1915/5773 [2:59:12<5:51:37, 5.47s/it] {'loss': 0.5838, 'learning_rate': 1.559616653809692e-05, 'epoch': 0.33} 33%|███▎ | 1915/5773 [2:59:07<5:51:37, 5.47s/it] 33%|███▎ | 1916/5773 [2:59:18<5:50:40, 5.46s/it] 33%|███▎ | 1916/5773 [2:59:12<5:50:40, 5.46s/it] {'loss': 0.5919, 'learning_rate': 1.5591515541479566e-05, 'epoch': 0.33} 33%|███▎ | 1916/5773 [2:59:18<5:50:40, 5.46s/it] {'loss': 0.5919, 'learning_rate': 1.5591515541479566e-05, 'epoch': 0.33} 33%|███▎ | 1916/5773 [2:59:12<5:50:40, 5.46s/it] 33%|███▎ | 1917/5773 [2:59:23<5:51:34, 5.47s/it] 33%|███▎ | 1917/5773 [2:59:17<5:51:34, 5.47s/it] {'loss': 0.5744, 'learning_rate': 1.558686278447447e-05, 'epoch': 0.33} 33%|███▎ | 1917/5773 [2:59:23<5:51:34, 5.47s/it] {'loss': 0.5744, 'learning_rate': 1.558686278447447e-05, 'epoch': 0.33} 33%|███▎ | 1917/5773 [2:59:17<5:51:34, 5.47s/it] 33%|███▎ | 1918/5773 [2:59:29<5:51:51, 5.48s/it] 33%|███▎ | 1918/5773 [2:59:23<5:51:51, 5.48s/it] {'loss': 0.5834, 'learning_rate': 
1.5582208268546473e-05, 'epoch': 0.33} 33%|███▎ | 1918/5773 [2:59:29<5:51:51, 5.48s/it] {'loss': 0.5834, 'learning_rate': 1.5582208268546473e-05, 'epoch': 0.33} 33%|███▎ | 1918/5773 [2:59:23<5:51:51, 5.48s/it] 33%|███▎ | 1919/5773 [2:59:34<5:48:02, 5.42s/it] 33%|███▎ | 1919/5773 [2:59:28<5:48:02, 5.42s/it] {'loss': 0.583, 'learning_rate': 1.557755199516096e-05, 'epoch': 0.33} 33%|███▎ | 1919/5773 [2:59:34<5:48:02, 5.42s/it] {'loss': 0.583, 'learning_rate': 1.557755199516096e-05, 'epoch': 0.33} 33%|███▎ | 1919/5773 [2:59:28<5:48:02, 5.42s/it] 33%|███▎ | 1920/5773 [2:59:39<5:47:05, 5.40s/it] 33%|███▎ | 1920/5773 [2:59:34<5:47:04, 5.40s/it] {'loss': 0.588, 'learning_rate': 1.557289396578388e-05, 'epoch': 0.33} 33%|███▎ | 1920/5773 [2:59:39<5:47:05, 5.40s/it] {'loss': 0.588, 'learning_rate': 1.557289396578388e-05, 'epoch': 0.33} 33%|███▎ | 1920/5773 [2:59:34<5:47:04, 5.40s/it] 33%|███▎ | 1921/5773 [2:59:45<5:48:50, 5.43s/it] 33%|███▎ | 1921/5773 [2:59:39<5:48:51, 5.43s/it] {'loss': 0.5889, 'learning_rate': 1.5568234181881725e-05, 'epoch': 0.33} 33%|███▎ | 1921/5773 [2:59:45<5:48:50, 5.43s/it] {'loss': 0.5889, 'learning_rate': 1.5568234181881725e-05, 'epoch': 0.33} 33%|███▎ | 1921/5773 [2:59:39<5:48:51, 5.43s/it] 33%|███▎ | 1922/5773 [2:59:50<5:48:24, 5.43s/it] 33%|███▎ | 1922/5773 [2:59:45<5:48:24, 5.43s/it] {'loss': 0.5886, 'learning_rate': 1.5563572644921546e-05, 'epoch': 0.33} 33%|███▎ | 1922/5773 [2:59:50<5:48:24, 5.43s/it] {'loss': 0.5886, 'learning_rate': 1.5563572644921546e-05, 'epoch': 0.33} 33%|███▎ | 1922/5773 [2:59:45<5:48:24, 5.43s/it] 33%|███▎ | 1923/5773 [2:59:50<5:45:50, 5.39s/it] 33%|███▎ | 1923/5773 [2:59:55<5:45:51, 5.39s/it] {'loss': 0.5738, 'learning_rate': 1.5558909356370944e-05, 'epoch': 0.33} 33%|███▎ | 1923/5773 [2:59:55<5:45:51, 5.39s/it] {'loss': 0.5738, 'learning_rate': 1.5558909356370944e-05, 'epoch': 0.33} 33%|███▎ | 1923/5773 [2:59:50<5:45:50, 5.39s/it] 33%|███▎ | 1924/5773 [2:59:55<5:44:18, 5.37s/it] 33%|███▎ | 1924/5773 [3:00:01<5:44:19, 
5.37s/it] {'loss': 0.5531, 'learning_rate': 1.5554244317698072e-05, 'epoch': 0.33} 33%|███▎ | 1924/5773 [3:00:01<5:44:19, 5.37s/it] {'loss': 0.5531, 'learning_rate': 1.5554244317698072e-05, 'epoch': 0.33} 33%|███▎ | 1924/5773 [2:59:55<5:44:18, 5.37s/it] 33%|███▎ | 1925/5773 [3:00:06<5:44:50, 5.38s/it] 33%|███▎ | 1925/5773 [3:00:01<5:44:50, 5.38s/it] {'loss': 0.5823, 'learning_rate': 1.554957753037163e-05, 'epoch': 0.33} 33%|███▎ | 1925/5773 [3:00:06<5:44:50, 5.38s/it] {'loss': 0.5823, 'learning_rate': 1.554957753037163e-05, 'epoch': 0.33} 33%|███▎ | 1925/5773 [3:00:01<5:44:50, 5.38s/it] 33%|███▎ | 1926/5773 [3:00:06<5:47:06, 5.41s/it] 33%|███▎ | 1926/5773 [3:00:12<5:47:08, 5.41s/it] {'loss': 0.5918, 'learning_rate': 1.554490899586087e-05, 'epoch': 0.33} 33%|███▎ | 1926/5773 [3:00:12<5:47:08, 5.41s/it] {'loss': 0.5918, 'learning_rate': 1.554490899586087e-05, 'epoch': 0.33} 33%|███▎ | 1926/5773 [3:00:06<5:47:06, 5.41s/it] 33%|███▎ | 1927/5773 [3:00:17<5:46:43, 5.41s/it] 33%|███▎ | 1927/5773 [3:00:11<5:46:44, 5.41s/it] {'loss': 0.5762, 'learning_rate': 1.5540238715635604e-05, 'epoch': 0.33} 33%|███▎ | 1927/5773 [3:00:17<5:46:43, 5.41s/it] {'loss': 0.5762, 'learning_rate': 1.5540238715635604e-05, 'epoch': 0.33} 33%|███▎ | 1927/5773 [3:00:11<5:46:44, 5.41s/it] 33%|███▎ | 1928/5773 [3:00:17<5:48:00, 5.43s/it] 33%|███▎ | 1928/5773 [3:00:22<5:48:01, 5.43s/it] {'loss': 0.5784, 'learning_rate': 1.553556669116618e-05, 'epoch': 0.33} 33%|███▎ | 1928/5773 [3:00:22<5:48:01, 5.43s/it] {'loss': 0.5784, 'learning_rate': 1.553556669116618e-05, 'epoch': 0.33} 33%|███▎ | 1928/5773 [3:00:17<5:48:00, 5.43s/it] 33%|███▎ | 1929/5773 [3:00:22<5:49:15, 5.45s/it] 33%|███▎ | 1929/5773 [3:00:28<5:49:15, 5.45s/it] {'loss': 0.5746, 'learning_rate': 1.5530892923923504e-05, 'epoch': 0.33} 33%|███▎ | 1929/5773 [3:00:28<5:49:15, 5.45s/it] {'loss': 0.5746, 'learning_rate': 1.5530892923923504e-05, 'epoch': 0.33} 33%|███▎ | 1929/5773 [3:00:22<5:49:15, 5.45s/it] 33%|███▎ | 1930/5773 [3:00:28<5:46:47, 
5.41s/it] 33%|███▎ | 1930/5773 [3:00:33<5:46:49, 5.41s/it] {'loss': 0.5835, 'learning_rate': 1.552621741537902e-05, 'epoch': 0.33} 33%|███▎ | 1930/5773 [3:00:33<5:46:49, 5.41s/it] {'loss': 0.5835, 'learning_rate': 1.552621741537902e-05, 'epoch': 0.33} 33%|███▎ | 1930/5773 [3:00:28<5:46:47, 5.41s/it] 33%|███▎ | 1931/5773 [3:00:39<5:50:21, 5.47s/it] 33%|███▎ | 1931/5773 [3:00:33<5:50:24, 5.47s/it] {'loss': 0.5892, 'learning_rate': 1.5521540167004736e-05, 'epoch': 0.33} 33%|███▎ | 1931/5773 [3:00:39<5:50:21, 5.47s/it] {'loss': 0.5892, 'learning_rate': 1.5521540167004736e-05, 'epoch': 0.33} 33%|███▎ | 1931/5773 [3:00:33<5:50:24, 5.47s/it] 33%|███▎ | 1932/5773 [3:00:39<5:49:12, 5.46s/it] 33%|███▎ | 1932/5773 [3:00:44<5:49:13, 5.46s/it] {'loss': 0.593, 'learning_rate': 1.5516861180273196e-05, 'epoch': 0.33} 33%|███▎ | 1932/5773 [3:00:44<5:49:13, 5.46s/it] {'loss': 0.593, 'learning_rate': 1.5516861180273196e-05, 'epoch': 0.33} 33%|███▎ | 1932/5773 [3:00:39<5:49:12, 5.46s/it] 33%|███▎ | 1933/5773 [3:00:45<5:54:44, 5.54s/it] 33%|███▎ | 1933/5773 [3:00:50<5:54:44, 5.54s/it] {'loss': 0.5745, 'learning_rate': 1.551218045665749e-05, 'epoch': 0.33} 33%|███▎ | 1933/5773 [3:00:50<5:54:44, 5.54s/it] {'loss': 0.5745, 'learning_rate': 1.551218045665749e-05, 'epoch': 0.33} 33%|███▎ | 1933/5773 [3:00:45<5:54:44, 5.54s/it] 34%|███▎ | 1934/5773 [3:00:55<5:51:50, 5.50s/it] 34%|███▎ | 1934/5773 [3:00:50<5:51:51, 5.50s/it] {'loss': 0.5801, 'learning_rate': 1.5507497997631267e-05, 'epoch': 0.34} 34%|███▎ | 1934/5773 [3:00:55<5:51:50, 5.50s/it] {'loss': 0.5801, 'learning_rate': 1.5507497997631267e-05, 'epoch': 0.34} 34%|███▎ | 1934/5773 [3:00:50<5:51:51, 5.50s/it] 34%|███▎ | 1935/5773 [3:00:55<5:52:42, 5.51s/it] 34%|███▎ | 1935/5773 [3:01:01<5:52:42, 5.51s/it] {'loss': 0.5916, 'learning_rate': 1.5502813804668712e-05, 'epoch': 0.34} 34%|███▎ | 1935/5773 [3:01:01<5:52:42, 5.51s/it] {'loss': 0.5916, 'learning_rate': 1.5502813804668712e-05, 'epoch': 0.34} 34%|███▎ | 1935/5773 [3:00:55<5:52:42, 
5.51s/it] 34%|███▎ | 1936/5773 [3:01:01<5:53:16, 5.52s/it] 34%|███▎ | 1936/5773 [3:01:07<5:53:16, 5.52s/it] {'loss': 0.5921, 'learning_rate': 1.5498127879244554e-05, 'epoch': 0.34} 34%|███▎ | 1936/5773 [3:01:07<5:53:16, 5.52s/it] {'loss': 0.5921, 'learning_rate': 1.5498127879244554e-05, 'epoch': 0.34} 34%|███▎ | 1936/5773 [3:01:01<5:53:16, 5.52s/it] 34%|███▎ | 1937/5773 [3:01:07<5:53:50, 5.53s/it] 34%|███▎ | 1937/5773 [3:01:12<5:53:53, 5.54s/it] {'loss': 0.5835, 'learning_rate': 1.5493440222834073e-05, 'epoch': 0.34} 34%|███▎ | 1937/5773 [3:01:12<5:53:53, 5.54s/it] {'loss': 0.5835, 'learning_rate': 1.5493440222834073e-05, 'epoch': 0.34} 34%|███▎ | 1937/5773 [3:01:07<5:53:50, 5.53s/it] 34%|███▎ | 1938/5773 [3:01:12<5:50:41, 5.49s/it] 34%|███▎ | 1938/5773 [3:01:18<5:50:40, 5.49s/it] {'loss': 0.5827, 'learning_rate': 1.54887508369131e-05, 'epoch': 0.34} 34%|███▎ | 1938/5773 [3:01:18<5:50:40, 5.49s/it] {'loss': 0.5827, 'learning_rate': 1.54887508369131e-05, 'epoch': 0.34} 34%|███▎ | 1938/5773 [3:01:12<5:50:41, 5.49s/it] 34%|███▎ | 1939/5773 [3:01:17<5:50:49, 5.49s/it] 34%|███▎ | 1939/5773 [3:01:23<5:50:48, 5.49s/it] {'loss': 0.5944, 'learning_rate': 1.5484059722958e-05, 'epoch': 0.34} 34%|███▎ | 1939/5773 [3:01:23<5:50:48, 5.49s/it] {'loss': 0.5944, 'learning_rate': 1.5484059722958e-05, 'epoch': 0.34} 34%|███▎ | 1939/5773 [3:01:17<5:50:49, 5.49s/it] 34%|███▎ | 1940/5773 [3:01:23<5:51:54, 5.51s/it] 34%|███▎ | 1940/5773 [3:01:29<5:51:55, 5.51s/it] {'loss': 0.5708, 'learning_rate': 1.5479366882445684e-05, 'epoch': 0.34} 34%|███▎ | 1940/5773 [3:01:29<5:51:55, 5.51s/it] {'loss': 0.5708, 'learning_rate': 1.5479366882445684e-05, 'epoch': 0.34} 34%|███▎ | 1940/5773 [3:01:23<5:51:54, 5.51s/it] 34%|███▎ | 1941/5773 [3:01:29<5:56:21, 5.58s/it] 34%|███▎ | 1941/5773 [3:01:34<5:56:21, 5.58s/it] {'loss': 0.5884, 'learning_rate': 1.5474672316853605e-05, 'epoch': 0.34} 34%|███▎ | 1941/5773 [3:01:34<5:56:21, 5.58s/it] {'loss': 0.5884, 'learning_rate': 1.5474672316853605e-05, 'epoch': 
0.34} 34%|███▎ | 1941/5773 [3:01:29<5:56:21, 5.58s/it] 34%|███▎ | 1942/5773 [3:01:34<5:53:08, 5.53s/it] 34%|███▎ | 1942/5773 [3:01:40<5:53:10, 5.53s/it] {'loss': 0.5798, 'learning_rate': 1.546997602765977e-05, 'epoch': 0.34} 34%|███▎ | 1942/5773 [3:01:40<5:53:10, 5.53s/it] {'loss': 0.5798, 'learning_rate': 1.546997602765977e-05, 'epoch': 0.34} 34%|███▎ | 1942/5773 [3:01:34<5:53:08, 5.53s/it] 34%|███▎ | 1943/5773 [3:01:45<5:50:43, 5.49s/it] 34%|███▎ | 1943/5773 [3:01:40<5:50:45, 5.49s/it] {'loss': 0.5844, 'learning_rate': 1.5465278016342717e-05, 'epoch': 0.34} 34%|███▎ | 1943/5773 [3:01:45<5:50:43, 5.49s/it] {'loss': 0.5844, 'learning_rate': 1.5465278016342717e-05, 'epoch': 0.34} 34%|███▎ | 1943/5773 [3:01:40<5:50:45, 5.49s/it] 34%|███▎ | 1944/5773 [3:01:51<5:49:32, 5.48s/it] 34%|███▎ | 1944/5773 [3:01:45<5:49:33, 5.48s/it] {'loss': 0.6016, 'learning_rate': 1.5460578284381526e-05, 'epoch': 0.34} 34%|███▎ | 1944/5773 [3:01:51<5:49:32, 5.48s/it] {'loss': 0.6016, 'learning_rate': 1.5460578284381526e-05, 'epoch': 0.34} 34%|███▎ | 1944/5773 [3:01:45<5:49:33, 5.48s/it] 34%|███▎ | 1945/5773 [3:01:50<5:48:58, 5.47s/it] 34%|███▎ | 1945/5773 [3:01:56<5:48:58, 5.47s/it] {'loss': 0.5881, 'learning_rate': 1.545587683325583e-05, 'epoch': 0.34} 34%|███▎ | 1945/5773 [3:01:56<5:48:58, 5.47s/it] {'loss': 0.5881, 'learning_rate': 1.545587683325583e-05, 'epoch': 0.34} 34%|███▎ | 1945/5773 [3:01:50<5:48:58, 5.47s/it] 34%|███▎ | 1946/5773 [3:01:56<5:46:45, 5.44s/it] 34%|███▎ | 1946/5773 [3:02:01<5:46:45, 5.44s/it] {'loss': 0.5914, 'learning_rate': 1.5451173664445795e-05, 'epoch': 0.34} 34%|███▎ | 1946/5773 [3:02:01<5:46:45, 5.44s/it] {'loss': 0.5914, 'learning_rate': 1.5451173664445795e-05, 'epoch': 0.34} 34%|███▎ | 1946/5773 [3:01:56<5:46:45, 5.44s/it] 34%|███▎ | 1947/5773 [3:02:01<5:45:05, 5.41s/it] 34%|███▎ | 1947/5773 [3:02:07<5:45:04, 5.41s/it] {'loss': 0.5729, 'learning_rate': 1.5446468779432123e-05, 'epoch': 0.34} 34%|███▎ | 1947/5773 [3:02:07<5:45:04, 5.41s/it] {'loss': 0.5729, 
'learning_rate': 1.5446468779432123e-05, 'epoch': 0.34} 34%|███▎ | 1947/5773 [3:02:01<5:45:05, 5.41s/it] 34%|███▎ | 1948/5773 [3:02:07<5:45:23, 5.42s/it] 34%|███▎ | 1948/5773 [3:02:12<5:45:28, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.5441762179696066e-05, 'epoch': 0.34} 34%|███▎ | 1948/5773 [3:02:12<5:45:28, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.5441762179696066e-05, 'epoch': 0.34} 34%|███▎ | 1948/5773 [3:02:07<5:45:23, 5.42s/it] 34%|███▍ | 1949/5773 [3:02:12<5:44:52, 5.41s/it] 34%|███▍ | 1949/5773 [3:02:18<5:44:51, 5.41s/it] {'loss': 0.5778, 'learning_rate': 1.5437053866719412e-05, 'epoch': 0.34} 34%|███▍ | 1949/5773 [3:02:18<5:44:51, 5.41s/it] {'loss': 0.5778, 'learning_rate': 1.5437053866719412e-05, 'epoch': 0.34} 34%|███▍ | 1949/5773 [3:02:12<5:44:52, 5.41s/it] 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 
34%|███▍ | 1950/5773 [3:02:23<5:41:22, 5.36s/it] 34%|███▍ | 1950/5773 [3:02:17<5:41:24, 5.36s/it] {'loss': 0.5846, 'learning_rate': 1.543234384198449e-05, 'epoch': 0.34} 34%|███▍ | 1950/5773 [3:02:23<5:41:22, 5.36s/it] {'loss': 0.5846, 'learning_rate': 1.543234384198449e-05, 'epoch': 0.34} 34%|███▍ | 1950/5773 [3:02:17<5:41:24, 5.36s/it] 34%|███▍ | 1951/5773 [3:02:23<5:44:34, 5.41s/it] 34%|███▍ | 1951/5773 [3:02:28<5:44:35, 5.41s/it] {'loss': 0.5846, 'learning_rate': 1.5427632106974165e-05, 'epoch': 0.34} 34%|███▍ | 1951/5773 [3:02:28<5:44:35, 5.41s/it] {'loss': 0.5846, 'learning_rate': 1.5427632106974165e-05, 'epoch': 0.34} 34%|███▍ | 1951/5773 [3:02:23<5:44:34, 5.41s/it] 34%|███▍ | 1952/5773 [3:02:28<5:47:50, 5.46s/it] 34%|███▍ | 1952/5773 [3:02:34<5:47:51, 5.46s/it] {'loss': 0.5846, 'learning_rate': 1.542291866317184e-05, 'epoch': 0.34} 34%|███▍ | 1952/5773 [3:02:34<5:47:51, 5.46s/it] {'loss': 0.5846, 'learning_rate': 1.542291866317184e-05, 'epoch': 0.34} 34%|███▍ | 1952/5773 [3:02:28<5:47:50, 5.46s/it] 34%|███▍ | 1953/5773 [3:02:34<5:48:23, 5.47s/it] 34%|███▍ | 1953/5773 [3:02:39<5:48:21, 5.47s/it] {'loss': 0.584, 'learning_rate': 1.5418203512061455e-05, 'epoch': 0.34} 34%|███▍ | 1953/5773 [3:02:39<5:48:21, 5.47s/it] {'loss': 0.584, 'learning_rate': 1.5418203512061455e-05, 'epoch': 0.34} 34%|███▍ | 1953/5773 [3:02:34<5:48:23, 5.47s/it] 34%|███▍ | 1954/5773 [3:02:39<5:51:06, 5.52s/it] 34%|███▍ | 1954/5773 [3:02:45<5:51:07, 5.52s/it] {'loss': 0.5844, 'learning_rate': 1.5413486655127498e-05, 'epoch': 0.34} 34%|███▍ | 1954/5773 [3:02:45<5:51:07, 5.52s/it] {'loss': 0.5844, 'learning_rate': 1.5413486655127498e-05, 'epoch': 0.34} 34%|███▍ | 1954/5773 [3:02:39<5:51:06, 5.52s/it] 34%|███▍ | 1955/5773 [3:02:45<5:47:18, 5.46s/it] 34%|███▍ | 1955/5773 [3:02:50<5:47:17, 5.46s/it] {'loss': 0.5626, 'learning_rate': 1.540876809385498e-05, 'epoch': 0.34} 34%|███▍ | 1955/5773 [3:02:50<5:47:17, 5.46s/it] {'loss': 0.5626, 'learning_rate': 1.540876809385498e-05, 'epoch': 0.34} 
34%|███▍ | 1955/5773 [3:02:45<5:47:18, 5.46s/it] 34%|███▍ | 1956/5773 [3:02:50<5:45:54, 5.44s/it] 34%|███▍ | 1956/5773 [3:02:56<5:45:55, 5.44s/it] {'loss': 0.5849, 'learning_rate': 1.540404782972946e-05, 'epoch': 0.34} 34%|███▍ | 1956/5773 [3:02:56<5:45:55, 5.44s/it] {'loss': 0.5849, 'learning_rate': 1.540404782972946e-05, 'epoch': 0.34} 34%|███▍ | 1956/5773 [3:02:50<5:45:54, 5.44s/it] 34%|███▍ | 1957/5773 [3:02:56<5:48:21, 5.48s/it] 34%|███▍ | 1957/5773 [3:03:01<5:48:20, 5.48s/it] {'loss': 0.5803, 'learning_rate': 1.5399325864237022e-05, 'epoch': 0.34} 34%|███▍ | 1957/5773 [3:03:01<5:48:20, 5.48s/it] {'loss': 0.5803, 'learning_rate': 1.5399325864237022e-05, 'epoch': 0.34} 34%|███▍ | 1957/5773 [3:02:56<5:48:21, 5.48s/it] 34%|███▍ | 1958/5773 [3:03:01<5:51:07, 5.52s/it] 34%|███▍ | 1958/5773 [3:03:07<5:51:07, 5.52s/it] {'loss': 0.5839, 'learning_rate': 1.5394602198864297e-05, 'epoch': 0.34} 34%|███▍ | 1958/5773 [3:03:07<5:51:07, 5.52s/it] {'loss': 0.5839, 'learning_rate': 1.5394602198864297e-05, 'epoch': 0.34} 34%|███▍ | 1958/5773 [3:03:01<5:51:07, 5.52s/it] 34%|███▍ | 1959/5773 [3:03:07<5:52:06, 5.54s/it] 34%|███▍ | 1959/5773 [3:03:13<5:52:05, 5.54s/it] {'loss': 0.5834, 'learning_rate': 1.538987683509844e-05, 'epoch': 0.34} 34%|███▍ | 1959/5773 [3:03:13<5:52:05, 5.54s/it] {'loss': 0.5834, 'learning_rate': 1.538987683509844e-05, 'epoch': 0.34} 34%|███▍ | 1959/5773 [3:03:07<5:52:06, 5.54s/it] 34%|███▍ | 1960/5773 [3:03:12<5:51:24, 5.53s/it] 34%|███▍ | 1960/5773 [3:03:18<5:51:25, 5.53s/it] {'loss': 0.5798, 'learning_rate': 1.5385149774427153e-05, 'epoch': 0.34} 34%|███▍ | 1960/5773 [3:03:18<5:51:25, 5.53s/it] {'loss': 0.5798, 'learning_rate': 1.5385149774427153e-05, 'epoch': 0.34} 34%|███▍ | 1960/5773 [3:03:12<5:51:24, 5.53s/it] 34%|███▍ | 1961/5773 [3:03:18<5:49:40, 5.50s/it] 34%|███▍ | 1961/5773 [3:03:23<5:49:39, 5.50s/it] {'loss': 0.5828, 'learning_rate': 1.5380421018338663e-05, 'epoch': 0.34} 34%|███▍ | 1961/5773 [3:03:23<5:49:39, 5.50s/it] {'loss': 0.5828, 
'learning_rate': 1.5380421018338663e-05, 'epoch': 0.34} 34%|███▍ | 1961/5773 [3:03:18<5:49:40, 5.50s/it] 34%|███▍ | 1962/5773 [3:03:23<5:50:46, 5.52s/it] 34%|███▍ | 1962/5773 [3:03:29<5:50:45, 5.52s/it] {'loss': 0.5648, 'learning_rate': 1.537569056832173e-05, 'epoch': 0.34} 34%|███▍ | 1962/5773 [3:03:29<5:50:45, 5.52s/it] {'loss': 0.5648, 'learning_rate': 1.537569056832173e-05, 'epoch': 0.34} 34%|███▍ | 1962/5773 [3:03:23<5:50:46, 5.52s/it] 34%|███▍ | 1963/5773 [3:03:29<5:52:56, 5.56s/it] 34%|███▍ | 1963/5773 [3:03:35<5:52:55, 5.56s/it] {'loss': 0.5642, 'learning_rate': 1.537095842586566e-05, 'epoch': 0.34} 34%|███▍ | 1963/5773 [3:03:35<5:52:55, 5.56s/it] {'loss': 0.5642, 'learning_rate': 1.537095842586566e-05, 'epoch': 0.34} 34%|███▍ | 1963/5773 [3:03:29<5:52:56, 5.56s/it] 34%|███▍ | 1964/5773 [3:03:40<5:51:43, 5.54s/it] 34%|███▍ | 1964/5773 [3:03:35<5:51:44, 5.54s/it] {'loss': 0.5901, 'learning_rate': 1.536622459246027e-05, 'epoch': 0.34} 34%|███▍ | 1964/5773 [3:03:40<5:51:43, 5.54s/it] {'loss': 0.5901, 'learning_rate': 1.536622459246027e-05, 'epoch': 0.34} 34%|███▍ | 1964/5773 [3:03:35<5:51:44, 5.54s/it] 34%|███▍ | 1965/5773 [3:03:46<5:53:11, 5.57s/it] 34%|███▍ | 1965/5773 [3:03:40<5:53:12, 5.57s/it] {'loss': 0.5806, 'learning_rate': 1.5361489069595932e-05, 'epoch': 0.34} 34%|███▍ | 1965/5773 [3:03:46<5:53:11, 5.57s/it] {'loss': 0.5806, 'learning_rate': 1.5361489069595932e-05, 'epoch': 0.34} 34%|███▍ | 1965/5773 [3:03:40<5:53:12, 5.57s/it] 34%|███▍ | 1966/5773 [3:03:51<5:48:37, 5.49s/it] 34%|███▍ | 1966/5773 [3:03:46<5:48:38, 5.49s/it] {'loss': 0.5725, 'learning_rate': 1.5356751858763535e-05, 'epoch': 0.34} 34%|███▍ | 1966/5773 [3:03:51<5:48:37, 5.49s/it] {'loss': 0.5725, 'learning_rate': 1.5356751858763535e-05, 'epoch': 0.34} 34%|███▍ | 1966/5773 [3:03:46<5:48:38, 5.49s/it] 34%|███▍ | 1967/5773 [3:03:57<5:47:11, 5.47s/it] 34%|███▍ | 1967/5773 [3:03:51<5:47:11, 5.47s/it] {'loss': 0.5764, 'learning_rate': 1.535201296145451e-05, 'epoch': 0.34} 34%|███▍ | 1967/5773 
[3:03:57<5:47:11, 5.47s/it] {'loss': 0.5764, 'learning_rate': 1.535201296145451e-05, 'epoch': 0.34} 34%|███▍ | 1967/5773 [3:03:51<5:47:11, 5.47s/it] 34%|███▍ | 1968/5773 [3:04:02<5:44:26, 5.43s/it] 34%|███▍ | 1968/5773 [3:03:56<5:44:27, 5.43s/it] {'loss': 0.5759, 'learning_rate': 1.5347272379160805e-05, 'epoch': 0.34} 34%|███▍ | 1968/5773 [3:04:02<5:44:26, 5.43s/it] {'loss': 0.5759, 'learning_rate': 1.5347272379160805e-05, 'epoch': 0.34} 34%|███▍ | 1968/5773 [3:03:56<5:44:27, 5.43s/it] 34%|███▍ | 1969/5773 [3:04:07<5:46:22, 5.46s/it] 34%|███▍ | 1969/5773 [3:04:02<5:46:22, 5.46s/it] {'loss': 0.5787, 'learning_rate': 1.5342530113374906e-05, 'epoch': 0.34} 34%|███▍ | 1969/5773 [3:04:07<5:46:22, 5.46s/it] {'loss': 0.5787, 'learning_rate': 1.5342530113374906e-05, 'epoch': 0.34} 34%|███▍ | 1969/5773 [3:04:02<5:46:22, 5.46s/it] 34%|███▍ | 1970/5773 [3:04:13<5:49:47, 5.52s/it] 34%|███▍ | 1970/5773 [3:04:08<5:49:47, 5.52s/it] {'loss': 0.5836, 'learning_rate': 1.5337786165589845e-05, 'epoch': 0.34} 34%|███▍ | 1970/5773 [3:04:13<5:49:47, 5.52s/it] {'loss': 0.5836, 'learning_rate': 1.5337786165589845e-05, 'epoch': 0.34} 34%|███▍ | 1970/5773 [3:04:08<5:49:47, 5.52s/it] 34%|███▍ | 1971/5773 [3:04:18<5:46:09, 5.46s/it] 34%|███▍ | 1971/5773 [3:04:13<5:46:09, 5.46s/it] {'loss': 0.5729, 'learning_rate': 1.533304053729915e-05, 'epoch': 0.34} 34%|███▍ | 1971/5773 [3:04:18<5:46:09, 5.46s/it] {'loss': 0.5729, 'learning_rate': 1.533304053729915e-05, 'epoch': 0.34} 34%|███▍ | 1971/5773 [3:04:13<5:46:09, 5.46s/it] 34%|███▍ | 1972/5773 [3:04:24<5:47:35, 5.49s/it] 34%|███▍ | 1972/5773 [3:04:18<5:47:35, 5.49s/it] {'loss': 0.5811, 'learning_rate': 1.5328293229996907e-05, 'epoch': 0.34} 34%|███▍ | 1972/5773 [3:04:24<5:47:35, 5.49s/it] {'loss': 0.5811, 'learning_rate': 1.5328293229996907e-05, 'epoch': 0.34} 34%|███▍ | 1972/5773 [3:04:18<5:47:35, 5.49s/it] 34%|███▍ | 1973/5773 [3:04:29<5:45:23, 5.45s/it] 34%|███▍ | 1973/5773 [3:04:24<5:45:24, 5.45s/it] {'loss': 0.5945, 'learning_rate': 
1.532354424517772e-05, 'epoch': 0.34} 34%|███▍ | 1973/5773 [3:04:29<5:45:23, 5.45s/it] {'loss': 0.5945, 'learning_rate': 1.532354424517772e-05, 'epoch': 0.34} 34%|███▍ | 1973/5773 [3:04:24<5:45:24, 5.45s/it] 34%|███▍ | 1974/5773 [3:04:35<5:47:22, 5.49s/it] 34%|███▍ | 1974/5773 [3:04:29<5:47:22, 5.49s/it] {'loss': 0.5719, 'learning_rate': 1.5318793584336718e-05, 'epoch': 0.34} 34%|███▍ | 1974/5773 [3:04:35<5:47:22, 5.49s/it] {'loss': 0.5719, 'learning_rate': 1.5318793584336718e-05, 'epoch': 0.34} 34%|███▍ | 1974/5773 [3:04:29<5:47:22, 5.49s/it] 34%|███▍ | 1975/5773 [3:04:40<5:47:05, 5.48s/it] 34%|███▍ | 1975/5773 [3:04:35<5:47:04, 5.48s/it] {'loss': 0.5596, 'learning_rate': 1.5314041248969558e-05, 'epoch': 0.34} 34%|███▍ | 1975/5773 [3:04:40<5:47:05, 5.48s/it] {'loss': 0.5596, 'learning_rate': 1.5314041248969558e-05, 'epoch': 0.34} 34%|███▍ | 1975/5773 [3:04:35<5:47:04, 5.48s/it] 34%|███▍ | 1976/5773 [3:04:46<5:43:07, 5.42s/it] 34%|███▍ | 1976/5773 [3:04:40<5:43:06, 5.42s/it] {'loss': 0.5773, 'learning_rate': 1.530928724057243e-05, 'epoch': 0.34} 34%|███▍ | 1976/5773 [3:04:46<5:43:07, 5.42s/it] {'loss': 0.5773, 'learning_rate': 1.530928724057243e-05, 'epoch': 0.34} 34%|███▍ | 1976/5773 [3:04:40<5:43:06, 5.42s/it] 34%|███▍ | 1977/5773 [3:04:46<5:42:50, 5.42s/it] 34%|███▍ | 1977/5773 [3:04:51<5:42:51, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.5304531560642052e-05, 'epoch': 0.34} 34%|███▍ | 1977/5773 [3:04:51<5:42:51, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.5304531560642052e-05, 'epoch': 0.34} 34%|███▍ | 1977/5773 [3:04:46<5:42:50, 5.42s/it] 34%|███▍ | 1978/5773 [3:04:57<5:44:55, 5.45s/it] 34%|███▍ | 1978/5773 [3:04:51<5:44:55, 5.45s/it] {'loss': 0.5764, 'learning_rate': 1.529977421067566e-05, 'epoch': 0.34} 34%|███▍ | 1978/5773 [3:04:57<5:44:55, 5.45s/it] {'loss': 0.5764, 'learning_rate': 1.529977421067566e-05, 'epoch': 0.34} 34%|███▍ | 1978/5773 [3:04:51<5:44:55, 5.45s/it] 34%|███▍ | 1979/5773 [3:05:02<5:45:23, 5.46s/it] 34%|███▍ | 1979/5773 
[3:04:57<5:45:23, 5.46s/it] {'loss': 0.5741, 'learning_rate': 1.5295015192171016e-05, 'epoch': 0.34} 34%|███▍ | 1979/5773 [3:05:02<5:45:23, 5.46s/it] {'loss': 0.5741, 'learning_rate': 1.5295015192171016e-05, 'epoch': 0.34} 34%|███▍ | 1979/5773 [3:04:57<5:45:23, 5.46s/it] 34%|███▍ | 1980/5773 [3:05:08<5:45:59, 5.47s/it] 34%|███▍ | 1980/5773 [3:05:02<5:45:59, 5.47s/it] {'loss': 0.5798, 'learning_rate': 1.5290254506626417e-05, 'epoch': 0.34} 34%|███▍ | 1980/5773 [3:05:08<5:45:59, 5.47s/it] {'loss': 0.5798, 'learning_rate': 1.5290254506626417e-05, 'epoch': 0.34} 34%|███▍ | 1980/5773 [3:05:02<5:45:59, 5.47s/it] 34%|███▍ | 1981/5773 [3:05:13<5:46:50, 5.49s/it] 34%|███▍ | 1981/5773 [3:05:08<5:46:49, 5.49s/it] {'loss': 0.5773, 'learning_rate': 1.5285492155540676e-05, 'epoch': 0.34} 34%|███▍ | 1981/5773 [3:05:13<5:46:50, 5.49s/it] {'loss': 0.5773, 'learning_rate': 1.5285492155540676e-05, 'epoch': 0.34} 34%|███▍ | 1981/5773 [3:05:08<5:46:49, 5.49s/it] 34%|███▍ | 1982/5773 [3:05:18<5:45:01, 5.46s/it] 34%|███▍ | 1982/5773 [3:05:13<5:45:01, 5.46s/it] {'loss': 0.5746, 'learning_rate': 1.5280728140413134e-05, 'epoch': 0.34} 34%|███▍ | 1982/5773 [3:05:18<5:45:01, 5.46s/it] {'loss': 0.5746, 'learning_rate': 1.5280728140413134e-05, 'epoch': 0.34} 34%|███▍ | 1982/5773 [3:05:13<5:45:01, 5.46s/it] 34%|███▍ | 1983/5773 [3:05:24<5:44:48, 5.46s/it] 34%|███▍ | 1983/5773 [3:05:18<5:44:48, 5.46s/it] {'loss': 0.5926, 'learning_rate': 1.5275962462743657e-05, 'epoch': 0.34} 34%|███▍ | 1983/5773 [3:05:24<5:44:48, 5.46s/it] {'loss': 0.5926, 'learning_rate': 1.5275962462743657e-05, 'epoch': 0.34} 34%|███▍ | 1983/5773 [3:05:18<5:44:48, 5.46s/it] 34%|███▍ | 1984/5773 [3:05:29<5:43:19, 5.44s/it] 34%|███▍ | 1984/5773 [3:05:24<5:43:19, 5.44s/it] {'loss': 0.5882, 'learning_rate': 1.527119512403263e-05, 'epoch': 0.34} 34%|███▍ | 1984/5773 [3:05:29<5:43:19, 5.44s/it] {'loss': 0.5882, 'learning_rate': 1.527119512403263e-05, 'epoch': 0.34} 34%|███▍ | 1984/5773 [3:05:24<5:43:19, 5.44s/it] 34%|███▍ | 
1985/5773 [3:05:35<5:43:43, 5.44s/it] 34%|███▍ | 1985/5773 [3:05:29<5:43:43, 5.44s/it] {'loss': 0.5774, 'learning_rate': 1.5266426125780965e-05, 'epoch': 0.34} 34%|███▍ | 1985/5773 [3:05:35<5:43:43, 5.44s/it] {'loss': 0.5774, 'learning_rate': 1.5266426125780965e-05, 'epoch': 0.34} 34%|███▍ | 1985/5773 [3:05:29<5:43:43, 5.44s/it] 34%|███▍ | 1986/5773 [3:05:40<5:41:19, 5.41s/it] 34%|███▍ | 1986/5773 [3:05:35<5:41:18, 5.41s/it] {'loss': 0.5778, 'learning_rate': 1.5261655469490095e-05, 'epoch': 0.34} 34%|███▍ | 1986/5773 [3:05:40<5:41:19, 5.41s/it] {'loss': 0.5778, 'learning_rate': 1.5261655469490095e-05, 'epoch': 0.34} 34%|███▍ | 1986/5773 [3:05:35<5:41:18, 5.41s/it] 34%|███▍ | 1987/5773 [3:05:45<5:40:33, 5.40s/it] 34%|███▍ | 1987/5773 [3:05:40<5:40:33, 5.40s/it] {'loss': 0.568, 'learning_rate': 1.5256883156661972e-05, 'epoch': 0.34} 34%|███▍ | 1987/5773 [3:05:45<5:40:33, 5.40s/it] {'loss': 0.568, 'learning_rate': 1.5256883156661972e-05, 'epoch': 0.34} 34%|███▍ | 1987/5773 [3:05:40<5:40:33, 5.40s/it] 34%|███▍ | 1988/5773 [3:05:51<5:42:09, 5.42s/it] 34%|███▍ | 1988/5773 [3:05:45<5:42:09, 5.42s/it] {'loss': 0.5823, 'learning_rate': 1.5252109188799077e-05, 'epoch': 0.34} 34%|███▍ | 1988/5773 [3:05:51<5:42:09, 5.42s/it] {'loss': 0.5823, 'learning_rate': 1.5252109188799077e-05, 'epoch': 0.34} 34%|███▍ | 1988/5773 [3:05:45<5:42:09, 5.42s/it] 34%|███▍ | 1989/5773 [3:05:56<5:43:03, 5.44s/it] 34%|███▍ | 1989/5773 [3:05:51<5:43:03, 5.44s/it] {'loss': 0.5916, 'learning_rate': 1.5247333567404407e-05, 'epoch': 0.34} 34%|███▍ | 1989/5773 [3:05:56<5:43:03, 5.44s/it] {'loss': 0.5916, 'learning_rate': 1.5247333567404407e-05, 'epoch': 0.34} 34%|███▍ | 1989/5773 [3:05:51<5:43:03, 5.44s/it] 34%|███▍ | 1990/5773 [3:06:02<5:42:03, 5.43s/it] 34%|███▍ | 1990/5773 [3:05:56<5:42:03, 5.43s/it] {'loss': 0.6029, 'learning_rate': 1.5242556293981477e-05, 'epoch': 0.34} 34%|███▍ | 1990/5773 [3:06:02<5:42:03, 5.43s/it] {'loss': 0.6029, 'learning_rate': 1.5242556293981477e-05, 'epoch': 0.34} 34%|███▍ 
| 1990/5773 [3:05:56<5:42:03, 5.43s/it] 34%|███▍ | 1991/5773 [3:06:07<5:42:46, 5.44s/it] 34%|███▍ | 1991/5773 [3:06:02<5:42:46, 5.44s/it] {'loss': 0.5731, 'learning_rate': 1.5237777370034326e-05, 'epoch': 0.34} 34%|███▍ | 1991/5773 [3:06:07<5:42:46, 5.44s/it] {'loss': 0.5731, 'learning_rate': 1.5237777370034326e-05, 'epoch': 0.34} 34%|███▍ | 1991/5773 [3:06:02<5:42:46, 5.44s/it] 35%|███▍ | 1992/5773 [3:06:07<5:41:24, 5.42s/it] 35%|███▍ | 1992/5773 [3:06:13<5:41:24, 5.42s/it] {'loss': 0.5859, 'learning_rate': 1.5232996797067513e-05, 'epoch': 0.35} 35%|███▍ | 1992/5773 [3:06:13<5:41:24, 5.42s/it] {'loss': 0.5859, 'learning_rate': 1.5232996797067513e-05, 'epoch': 0.35} 35%|███▍ | 1992/5773 [3:06:07<5:41:24, 5.42s/it] 35%|███▍ | 1993/5773 [3:06:18<5:39:29, 5.39s/it] 35%|███▍ | 1993/5773 [3:06:12<5:39:29, 5.39s/it] {'loss': 0.5772, 'learning_rate': 1.5228214576586118e-05, 'epoch': 0.35} 35%|███▍ | 1993/5773 [3:06:18<5:39:29, 5.39s/it] {'loss': 0.5772, 'learning_rate': 1.5228214576586118e-05, 'epoch': 0.35} 35%|███▍ | 1993/5773 [3:06:12<5:39:29, 5.39s/it] 35%|███▍ | 1994/5773 [3:06:23<5:39:05, 5.38s/it] 35%|███▍ | 1994/5773 [3:06:18<5:39:05, 5.38s/it] {'loss': 0.5994, 'learning_rate': 1.522343071009573e-05, 'epoch': 0.35} 35%|███▍ | 1994/5773 [3:06:23<5:39:05, 5.38s/it] {'loss': 0.5994, 'learning_rate': 1.522343071009573e-05, 'epoch': 0.35} 35%|███▍ | 1994/5773 [3:06:18<5:39:05, 5.38s/it] 35%|███▍ | 1995/5773 [3:06:29<5:36:54, 5.35s/it] 35%|███▍ | 1995/5773 [3:06:23<5:36:55, 5.35s/it] {'loss': 0.5812, 'learning_rate': 1.521864519910247e-05, 'epoch': 0.35} 35%|███▍ | 1995/5773 [3:06:29<5:36:54, 5.35s/it] {'loss': 0.5812, 'learning_rate': 1.521864519910247e-05, 'epoch': 0.35} 35%|███▍ | 1995/5773 [3:06:23<5:36:55, 5.35s/it] 35%|███▍ | 1996/5773 [3:06:34<5:37:27, 5.36s/it] 35%|███▍ | 1996/5773 [3:06:28<5:37:27, 5.36s/it] {'loss': 0.5771, 'learning_rate': 1.5213858045112968e-05, 'epoch': 0.35} 35%|███▍ | 1996/5773 [3:06:34<5:37:27, 5.36s/it] {'loss': 0.5771, 'learning_rate': 
1.5213858045112968e-05, 'epoch': 0.35} 35%|███▍ | 1996/5773 [3:06:28<5:37:27, 5.36s/it] 35%|███▍ | 1997/5773 [3:06:40<5:43:00, 5.45s/it] 35%|███▍ | 1997/5773 [3:06:34<5:43:00, 5.45s/it] {'loss': 0.5774, 'learning_rate': 1.5209069249634367e-05, 'epoch': 0.35} 35%|███▍ | 1997/5773 [3:06:40<5:43:00, 5.45s/it] {'loss': 0.5774, 'learning_rate': 1.5209069249634367e-05, 'epoch': 0.35} 35%|███▍ | 1997/5773 [3:06:34<5:43:00, 5.45s/it] 35%|███▍ | 1998/5773 [3:06:39<5:39:39, 5.40s/it] 35%|███▍ | 1998/5773 [3:06:45<5:39:40, 5.40s/it] {'loss': 0.5883, 'learning_rate': 1.520427881417434e-05, 'epoch': 0.35} 35%|███▍ | 1998/5773 [3:06:45<5:39:40, 5.40s/it] {'loss': 0.5883, 'learning_rate': 1.520427881417434e-05, 'epoch': 0.35} 35%|███▍ | 1998/5773 [3:06:39<5:39:39, 5.40s/it] 35%|███▍ | 1999/5773 [3:06:45<5:40:38, 5.42s/it] 35%|███▍ | 1999/5773 [3:06:50<5:40:41, 5.42s/it] {'loss': 0.5812, 'learning_rate': 1.5199486740241067e-05, 'epoch': 0.35} 35%|███▍ | 1999/5773 [3:06:50<5:40:41, 5.42s/it] {'loss': 0.5812, 'learning_rate': 1.5199486740241067e-05, 'epoch': 0.35} 35%|███▍ | 1999/5773 [3:06:45<5:40:38, 5.42s/it] 7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 
35%|███▍ | 2000/5773 [3:06:56<5:38:37, 5.38s/it] 35%|███▍ | 2000/5773 [3:06:50<5:38:38, 5.39s/it] {'loss': 0.583, 'learning_rate': 1.5194693029343249e-05, 'epoch': 0.35} 35%|███▍ | 2000/5773 [3:06:56<5:38:37, 5.38s/it] {'loss': 0.583, 'learning_rate': 1.5194693029343249e-05, 'epoch': 0.35} 35%|███▍ | 2000/5773 [3:06:50<5:38:38, 5.39s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 35%|███▍ | 2001/5773 [3:07:09<9:44:55, 9.30s/it] 35%|███▍ | 2001/5773 [3:07:14<9:44:55, 9.30s/it] {'loss': 0.5843, 'learning_rate': 1.518989768299009e-05, 'epoch': 0.35} 35%|███▍ | 2001/5773 [3:07:14<9:44:55, 9.30s/it] {'loss': 0.5843, 'learning_rate': 1.518989768299009e-05, 'epoch': 0.35} 35%|███▍ | 2001/5773 [3:07:09<9:44:55, 9.30s/it] 35%|███▍ | 2002/5773 [3:07:20<8:32:32, 8.16s/it] 35%|███▍ | 2002/5773 [3:07:14<8:32:33, 8.16s/it] {'loss': 0.5819, 'learning_rate': 1.518510070269133e-05, 'epoch': 0.35} 35%|███▍ | 2002/5773 [3:07:20<8:32:32, 8.16s/it] {'loss': 0.5819, 'learning_rate': 1.518510070269133e-05, 'epoch': 0.35} 35%|███▍ | 2002/5773 [3:07:14<8:32:33, 8.16s/it] 35%|███▍ | 2003/5773 [3:07:19<7:39:39, 7.32s/it] 35%|███▍ | 2003/5773 [3:07:25<7:39:43, 7.32s/it] {'loss': 0.5699, 'learning_rate': 1.5180302089957201e-05, 'epoch': 0.35} 35%|███▍ | 2003/5773 [3:07:25<7:39:43, 7.32s/it] {'loss': 0.5699, 'learning_rate': 1.5180302089957201e-05, 'epoch': 0.35} 35%|███▍ | 2003/5773 [3:07:19<7:39:39, 7.32s/it] 35%|███▍ | 2004/5773 [3:07:31<7:07:55, 6.81s/it] 35%|███▍ | 2004/5773 [3:07:25<7:07:58, 6.81s/it] {'loss': 0.5818, 'learning_rate': 1.5175501846298468e-05, 'epoch': 0.35} 35%|███▍ | 2004/5773 [3:07:31<7:07:55, 6.81s/it] {'loss': 0.5818, 'learning_rate': 1.5175501846298468e-05, 'epoch': 0.35} 35%|███▍ | 2004/5773 [3:07:25<7:07:58, 6.81s/it] 35%|███▍ | 2005/5773 [3:07:36<6:40:59, 6.39s/it] 35%|███▍ | 2005/5773 [3:07:30<6:41:01, 6.39s/it] {'loss': 0.5796, 'learning_rate': 1.5170699973226395e-05, 'epoch': 0.35} 35%|███▍ | 2005/5773 [3:07:36<6:40:59, 6.39s/it] {'loss': 0.5796, 'learning_rate': 1.5170699973226395e-05, 'epoch': 0.35} 35%|███▍ | 2005/5773 [3:07:30<6:41:01, 6.39s/it] 35%|███▍ | 2006/5773 [3:07:41<6:22:58, 6.10s/it] 35%|███▍ | 2006/5773 [3:07:36<6:22:59, 6.10s/it] {'loss': 0.592, 'learning_rate': 1.5165896472252768e-05, 'epoch': 0.35} 35%|███▍ | 2006/5773 [3:07:41<6:22:58, 6.10s/it] {'loss': 0.592, 'learning_rate': 1.5165896472252768e-05, 
'epoch': 0.35} 35%|███▍ | 2006/5773 [3:07:36<6:22:59, 6.10s/it] 35%|███▍ | 2007/5773 [3:07:47<6:13:09, 5.95s/it] 35%|███▍ | 2007/5773 [3:07:42<6:13:10, 5.95s/it] {'loss': 0.5904, 'learning_rate': 1.5161091344889885e-05, 'epoch': 0.35} 35%|███▍ | 2007/5773 [3:07:47<6:13:09, 5.95s/it] {'loss': 0.5904, 'learning_rate': 1.5161091344889885e-05, 'epoch': 0.35} 35%|███▍ | 2007/5773 [3:07:42<6:13:10, 5.95s/it] 35%|███▍ | 2008/5773 [3:07:53<6:04:40, 5.81s/it] 35%|███▍ | 2008/5773 [3:07:47<6:04:40, 5.81s/it] {'loss': 0.57, 'learning_rate': 1.5156284592650548e-05, 'epoch': 0.35} 35%|███▍ | 2008/5773 [3:07:53<6:04:40, 5.81s/it] {'loss': 0.57, 'learning_rate': 1.5156284592650548e-05, 'epoch': 0.35} 35%|███▍ | 2008/5773 [3:07:47<6:04:40, 5.81s/it] 35%|███▍ | 2009/5773 [3:07:58<5:58:51, 5.72s/it] 35%|███▍ | 2009/5773 [3:07:53<5:58:51, 5.72s/it] {'loss': 0.5778, 'learning_rate': 1.5151476217048083e-05, 'epoch': 0.35} 35%|███▍ | 2009/5773 [3:07:58<5:58:51, 5.72s/it] {'loss': 0.5778, 'learning_rate': 1.5151476217048083e-05, 'epoch': 0.35} 35%|███▍ | 2009/5773 [3:07:53<5:58:51, 5.72s/it] 35%|███▍ | 2010/5773 [3:08:04<5:57:01, 5.69s/it] 35%|███▍ | 2010/5773 [3:07:58<5:57:01, 5.69s/it] {'loss': 0.6012, 'learning_rate': 1.5146666219596311e-05, 'epoch': 0.35} 35%|███▍ | 2010/5773 [3:08:04<5:57:01, 5.69s/it] {'loss': 0.6012, 'learning_rate': 1.5146666219596311e-05, 'epoch': 0.35} 35%|███▍ | 2010/5773 [3:07:58<5:57:01, 5.69s/it] 35%|███▍ | 2011/5773 [3:08:09<5:52:05, 5.62s/it] 35%|███▍ | 2011/5773 [3:08:04<5:52:05, 5.62s/it] {'loss': 0.5652, 'learning_rate': 1.5141854601809583e-05, 'epoch': 0.35} 35%|███▍ | 2011/5773 [3:08:09<5:52:05, 5.62s/it] {'loss': 0.5652, 'learning_rate': 1.5141854601809583e-05, 'epoch': 0.35} 35%|███▍ | 2011/5773 [3:08:04<5:52:05, 5.62s/it] 35%|███▍ | 2012/5773 [3:08:15<5:54:23, 5.65s/it] 35%|███▍ | 2012/5773 [3:08:09<5:54:23, 5.65s/it] {'loss': 0.5831, 'learning_rate': 1.513704136520274e-05, 'epoch': 0.35} 35%|███▍ | 2012/5773 [3:08:15<5:54:23, 5.65s/it] {'loss': 
0.5831, 'learning_rate': 1.513704136520274e-05, 'epoch': 0.35} 35%|███▍ | 2012/5773 [3:08:09<5:54:23, 5.65s/it] 35%|███▍ | 2013/5773 [3:08:20<5:49:37, 5.58s/it] 35%|███▍ | 2013/5773 [3:08:15<5:49:36, 5.58s/it] {'loss': 0.5636, 'learning_rate': 1.513222651129115e-05, 'epoch': 0.35} 35%|███▍ | 2013/5773 [3:08:20<5:49:37, 5.58s/it] {'loss': 0.5636, 'learning_rate': 1.513222651129115e-05, 'epoch': 0.35} 35%|███▍ | 2013/5773 [3:08:15<5:49:36, 5.58s/it] 35%|███▍ | 2014/5773 [3:08:26<5:46:47, 5.54s/it] 35%|███▍ | 2014/5773 [3:08:20<5:46:48, 5.54s/it] {'loss': 0.584, 'learning_rate': 1.5127410041590684e-05, 'epoch': 0.35} 35%|███▍ | 2014/5773 [3:08:26<5:46:47, 5.54s/it] {'loss': 0.584, 'learning_rate': 1.5127410041590684e-05, 'epoch': 0.35} 35%|███▍ | 2014/5773 [3:08:20<5:46:48, 5.54s/it] 35%|███▍ | 2015/5773 [3:08:31<5:44:25, 5.50s/it] 35%|███▍ | 2015/5773 [3:08:26<5:44:24, 5.50s/it] {'loss': 0.5868, 'learning_rate': 1.5122591957617713e-05, 'epoch': 0.35} 35%|███▍ | 2015/5773 [3:08:31<5:44:25, 5.50s/it] {'loss': 0.5868, 'learning_rate': 1.5122591957617713e-05, 'epoch': 0.35} 35%|███▍ | 2015/5773 [3:08:26<5:44:24, 5.50s/it] 35%|███▍ | 2016/5773 [3:08:36<5:42:01, 5.46s/it] 35%|███▍ | 2016/5773 [3:08:31<5:42:01, 5.46s/it] {'loss': 0.5834, 'learning_rate': 1.5117772260889132e-05, 'epoch': 0.35} 35%|███▍ | 2016/5773 [3:08:36<5:42:01, 5.46s/it] {'loss': 0.5834, 'learning_rate': 1.5117772260889132e-05, 'epoch': 0.35} 35%|███▍ | 2016/5773 [3:08:31<5:42:01, 5.46s/it] 35%|███▍ | 2017/5773 [3:08:42<5:37:48, 5.40s/it] 35%|███▍ | 2017/5773 [3:08:36<5:37:48, 5.40s/it] {'loss': 0.5822, 'learning_rate': 1.5112950952922329e-05, 'epoch': 0.35} 35%|███▍ | 2017/5773 [3:08:42<5:37:48, 5.40s/it] {'loss': 0.5822, 'learning_rate': 1.5112950952922329e-05, 'epoch': 0.35} 35%|███▍ | 2017/5773 [3:08:36<5:37:48, 5.40s/it] 35%|███▍ | 2018/5773 [3:08:42<5:36:20, 5.37s/it] 35%|███▍ | 2018/5773 [3:08:47<5:36:20, 5.37s/it] {'loss': 0.5873, 'learning_rate': 1.5108128035235207e-05, 'epoch': 0.35} 35%|███▍ | 
2018/5773 [3:08:47<5:36:20, 5.37s/it] {'loss': 0.5873, 'learning_rate': 1.5108128035235207e-05, 'epoch': 0.35} 35%|███▍ | 2018/5773 [3:08:42<5:36:20, 5.37s/it] 35%|███▍ | 2019/5773 [3:08:53<5:39:03, 5.42s/it] 35%|███▍ | 2019/5773 [3:08:47<5:39:03, 5.42s/it] {'loss': 0.5746, 'learning_rate': 1.510330350934618e-05, 'epoch': 0.35} 35%|███▍ | 2019/5773 [3:08:53<5:39:03, 5.42s/it] {'loss': 0.5746, 'learning_rate': 1.510330350934618e-05, 'epoch': 0.35} 35%|███▍ | 2019/5773 [3:08:47<5:39:03, 5.42s/it] 35%|███▍ | 2020/5773 [3:08:58<5:38:39, 5.41s/it] 35%|███▍ | 2020/5773 [3:08:52<5:38:39, 5.41s/it] {'loss': 0.5694, 'learning_rate': 1.5098477376774156e-05, 'epoch': 0.35} 35%|███▍ | 2020/5773 [3:08:58<5:38:39, 5.41s/it] {'loss': 0.5694, 'learning_rate': 1.5098477376774156e-05, 'epoch': 0.35} 35%|███▍ | 2020/5773 [3:08:52<5:38:39, 5.41s/it] 35%|███▌ | 2021/5773 [3:09:03<5:40:11, 5.44s/it] 35%|███▌ | 2021/5773 [3:08:58<5:40:11, 5.44s/it] {'loss': 0.5778, 'learning_rate': 1.5093649639038557e-05, 'epoch': 0.35} 35%|███▌ | 2021/5773 [3:09:03<5:40:11, 5.44s/it] {'loss': 0.5778, 'learning_rate': 1.5093649639038557e-05, 'epoch': 0.35} 35%|███▌ | 2021/5773 [3:08:58<5:40:11, 5.44s/it] 35%|███▌ | 2022/5773 [3:09:09<5:40:34, 5.45s/it] 35%|███▌ | 2022/5773 [3:09:03<5:40:35, 5.45s/it] {'loss': 0.555, 'learning_rate': 1.5088820297659314e-05, 'epoch': 0.35} 35%|███▌ | 2022/5773 [3:09:09<5:40:34, 5.45s/it] {'loss': 0.555, 'learning_rate': 1.5088820297659314e-05, 'epoch': 0.35} 35%|███▌ | 2022/5773 [3:09:03<5:40:35, 5.45s/it] 35%|███▌ | 2023/5773 [3:09:14<5:38:31, 5.42s/it] 35%|███▌ | 2023/5773 [3:09:09<5:38:31, 5.42s/it] {'loss': 0.6032, 'learning_rate': 1.5083989354156855e-05, 'epoch': 0.35} 35%|███▌ | 2023/5773 [3:09:14<5:38:31, 5.42s/it] {'loss': 0.6032, 'learning_rate': 1.5083989354156855e-05, 'epoch': 0.35} 35%|███▌ | 2023/5773 [3:09:09<5:38:31, 5.42s/it] 35%|███▌ | 2024/5773 [3:09:20<5:36:50, 5.39s/it] 35%|███▌ | 2024/5773 [3:09:14<5:36:51, 5.39s/it] {'loss': 0.5603, 'learning_rate': 
1.5079156810052111e-05, 'epoch': 0.35} 35%|███▌ | 2024/5773 [3:09:20<5:36:50, 5.39s/it] {'loss': 0.5603, 'learning_rate': 1.5079156810052111e-05, 'epoch': 0.35} 35%|███▌ | 2024/5773 [3:09:14<5:36:51, 5.39s/it] 35%|███▌ | 2025/5773 [3:09:25<5:38:47, 5.42s/it] 35%|███▌ | 2025/5773 [3:09:20<5:38:47, 5.42s/it] {'loss': 0.5862, 'learning_rate': 1.507432266686653e-05, 'epoch': 0.35} 35%|███▌ | 2025/5773 [3:09:25<5:38:47, 5.42s/it] {'loss': 0.5862, 'learning_rate': 1.507432266686653e-05, 'epoch': 0.35} 35%|███▌ | 2025/5773 [3:09:20<5:38:47, 5.42s/it] 35%|███▌ | 2026/5773 [3:09:31<5:39:05, 5.43s/it] 35%|███▌ | 2026/5773 [3:09:25<5:39:04, 5.43s/it] {'loss': 0.5962, 'learning_rate': 1.5069486926122045e-05, 'epoch': 0.35} 35%|███▌ | 2026/5773 [3:09:31<5:39:05, 5.43s/it] {'loss': 0.5962, 'learning_rate': 1.5069486926122045e-05, 'epoch': 0.35} 35%|███▌ | 2026/5773 [3:09:25<5:39:04, 5.43s/it] 35%|███▌ | 2027/5773 [3:09:36<5:40:27, 5.45s/it] 35%|███▌ | 2027/5773 [3:09:31<5:40:26, 5.45s/it] {'loss': 0.5789, 'learning_rate': 1.5064649589341112e-05, 'epoch': 0.35} 35%|███▌ | 2027/5773 [3:09:36<5:40:27, 5.45s/it] {'loss': 0.5789, 'learning_rate': 1.5064649589341112e-05, 'epoch': 0.35} 35%|███▌ | 2027/5773 [3:09:31<5:40:26, 5.45s/it] 35%|███▌ | 2028/5773 [3:09:41<5:38:22, 5.42s/it] 35%|███▌ | 2028/5773 [3:09:36<5:38:21, 5.42s/it] {'loss': 0.5884, 'learning_rate': 1.505981065804667e-05, 'epoch': 0.35} 35%|███▌ | 2028/5773 [3:09:41<5:38:22, 5.42s/it] {'loss': 0.5884, 'learning_rate': 1.505981065804667e-05, 'epoch': 0.35} 35%|███▌ | 2028/5773 [3:09:36<5:38:21, 5.42s/it] 35%|███▌ | 2029/5773 [3:09:47<5:36:14, 5.39s/it] 35%|███▌ | 2029/5773 [3:09:41<5:36:14, 5.39s/it] {'loss': 0.5871, 'learning_rate': 1.5054970133762173e-05, 'epoch': 0.35} 35%|███▌ | 2029/5773 [3:09:47<5:36:14, 5.39s/it] {'loss': 0.5871, 'learning_rate': 1.5054970133762173e-05, 'epoch': 0.35} 35%|███▌ | 2029/5773 [3:09:41<5:36:14, 5.39s/it] 35%|███▌ | 2030/5773 [3:09:52<5:37:52, 5.42s/it] 35%|███▌ | 2030/5773 
[3:09:47<5:37:53, 5.42s/it] {'loss': 0.5829, 'learning_rate': 1.5050128018011576e-05, 'epoch': 0.35} 35%|███▌ | 2030/5773 [3:09:52<5:37:52, 5.42s/it] {'loss': 0.5829, 'learning_rate': 1.5050128018011576e-05, 'epoch': 0.35} 35%|███▌ | 2030/5773 [3:09:47<5:37:53, 5.42s/it] 35%|███▌ | 2031/5773 [3:09:58<5:38:14, 5.42s/it] 35%|███▌ | 2031/5773 [3:09:52<5:38:13, 5.42s/it] {'loss': 0.5776, 'learning_rate': 1.5045284312319326e-05, 'epoch': 0.35} 35%|███▌ | 2031/5773 [3:09:58<5:38:14, 5.42s/it] {'loss': 0.5776, 'learning_rate': 1.5045284312319326e-05, 'epoch': 0.35} 35%|███▌ | 2031/5773 [3:09:52<5:38:13, 5.42s/it] 35%|███▌ | 2032/5773 [3:10:03<5:39:12, 5.44s/it] 35%|███▌ | 2032/5773 [3:09:58<5:39:12, 5.44s/it] {'loss': 0.5818, 'learning_rate': 1.5040439018210377e-05, 'epoch': 0.35} 35%|███▌ | 2032/5773 [3:10:03<5:39:12, 5.44s/it] {'loss': 0.5818, 'learning_rate': 1.5040439018210377e-05, 'epoch': 0.35} 35%|███▌ | 2032/5773 [3:09:58<5:39:12, 5.44s/it] 35%|███▌ | 2033/5773 [3:10:09<5:37:47, 5.42s/it] 35%|███▌ | 2033/5773 [3:10:03<5:37:47, 5.42s/it] {'loss': 0.5667, 'learning_rate': 1.5035592137210188e-05, 'epoch': 0.35} 35%|███▌ | 2033/5773 [3:10:09<5:37:47, 5.42s/it] {'loss': 0.5667, 'learning_rate': 1.5035592137210188e-05, 'epoch': 0.35} 35%|███▌ | 2033/5773 [3:10:03<5:37:47, 5.42s/it] 35%|███▌ | 2034/5773 [3:10:14<5:36:50, 5.41s/it] 35%|███▌ | 2034/5773 [3:10:08<5:36:49, 5.41s/it] {'loss': 0.5959, 'learning_rate': 1.5030743670844703e-05, 'epoch': 0.35} 35%|███▌ | 2034/5773 [3:10:14<5:36:50, 5.41s/it] {'loss': 0.5959, 'learning_rate': 1.5030743670844703e-05, 'epoch': 0.35} 35%|███▌ | 2034/5773 [3:10:08<5:36:49, 5.41s/it] 35%|███▌ | 2035/5773 [3:10:14<5:37:18, 5.41s/it] 35%|███▌ | 2035/5773 [3:10:19<5:37:19, 5.41s/it] {'loss': 0.589, 'learning_rate': 1.5025893620640384e-05, 'epoch': 0.35} 35%|███▌ | 2035/5773 [3:10:19<5:37:19, 5.41s/it] {'loss': 0.589, 'learning_rate': 1.5025893620640384e-05, 'epoch': 0.35} 35%|███▌ | 2035/5773 [3:10:14<5:37:18, 5.41s/it] 35%|███▌ | 
2036/5773 [3:10:25<5:37:29, 5.42s/it] 35%|███▌ | 2036/5773 [3:10:19<5:37:29, 5.42s/it] {'loss': 0.573, 'learning_rate': 1.5021041988124173e-05, 'epoch': 0.35} 35%|███▌ | 2036/5773 [3:10:25<5:37:29, 5.42s/it] {'loss': 0.573, 'learning_rate': 1.5021041988124173e-05, 'epoch': 0.35} 35%|███▌ | 2036/5773 [3:10:19<5:37:29, 5.42s/it] 35%|███▌ | 2037/5773 [3:10:25<5:37:37, 5.42s/it] 35%|███▌ | 2037/5773 [3:10:30<5:37:38, 5.42s/it] {'loss': 0.5785, 'learning_rate': 1.5016188774823528e-05, 'epoch': 0.35} 35%|███▌ | 2037/5773 [3:10:30<5:37:38, 5.42s/it] {'loss': 0.5785, 'learning_rate': 1.5016188774823528e-05, 'epoch': 0.35} 35%|███▌ | 2037/5773 [3:10:25<5:37:37, 5.42s/it] 35%|███▌ | 2038/5773 [3:10:36<5:36:43, 5.41s/it] 35%|███▌ | 2038/5773 [3:10:30<5:36:43, 5.41s/it] {'loss': 0.5869, 'learning_rate': 1.5011333982266389e-05, 'epoch': 0.35} 35%|███▌ | 2038/5773 [3:10:36<5:36:43, 5.41s/it] {'loss': 0.5869, 'learning_rate': 1.5011333982266389e-05, 'epoch': 0.35} 35%|███▌ | 2038/5773 [3:10:30<5:36:43, 5.41s/it] 35%|███▌ | 2039/5773 [3:10:41<5:37:54, 5.43s/it] 35%|███▌ | 2039/5773 [3:10:35<5:37:55, 5.43s/it] {'loss': 0.5914, 'learning_rate': 1.5006477611981201e-05, 'epoch': 0.35} 35%|███▌ | 2039/5773 [3:10:41<5:37:54, 5.43s/it] {'loss': 0.5914, 'learning_rate': 1.5006477611981201e-05, 'epoch': 0.35} 35%|███▌ | 2039/5773 [3:10:35<5:37:55, 5.43s/it] 35%|███▌ | 2040/5773 [3:10:46<5:36:55, 5.42s/it] 35%|███▌ | 2040/5773 [3:10:41<5:36:56, 5.42s/it] {'loss': 0.5898, 'learning_rate': 1.5001619665496904e-05, 'epoch': 0.35} 35%|███▌ | 2040/5773 [3:10:46<5:36:55, 5.42s/it] {'loss': 0.5898, 'learning_rate': 1.5001619665496904e-05, 'epoch': 0.35} 35%|███▌ | 2040/5773 [3:10:41<5:36:56, 5.42s/it] 35%|███▌ | 2041/5773 [3:10:52<5:34:18, 5.37s/it] 35%|███▌ | 2041/5773 [3:10:46<5:34:19, 5.37s/it] {'loss': 0.5784, 'learning_rate': 1.4996760144342934e-05, 'epoch': 0.35} 35%|███▌ | 2041/5773 [3:10:52<5:34:18, 5.37s/it] {'loss': 0.5784, 'learning_rate': 1.4996760144342934e-05, 'epoch': 0.35} 35%|███▌ 
| 2041/5773 [3:10:46<5:34:19, 5.37s/it] 35%|███▌ | 2042/5773 [3:10:57<5:32:28, 5.35s/it] 35%|███▌ | 2042/5773 [3:10:51<5:32:29, 5.35s/it] {'loss': 0.5771, 'learning_rate': 1.4991899050049225e-05, 'epoch': 0.35} 35%|███▌ | 2042/5773 [3:10:57<5:32:28, 5.35s/it] {'loss': 0.5771, 'learning_rate': 1.4991899050049225e-05, 'epoch': 0.35} 35%|███▌ | 2042/5773 [3:10:51<5:32:29, 5.35s/it] 35%|███▌ | 2043/5773 [3:11:03<5:38:23, 5.44s/it] 35%|███▌ | 2043/5773 [3:10:57<5:38:23, 5.44s/it] {'loss': 0.5752, 'learning_rate': 1.498703638414621e-05, 'epoch': 0.35} 35%|███▌ | 2043/5773 [3:11:03<5:38:23, 5.44s/it] {'loss': 0.5752, 'learning_rate': 1.498703638414621e-05, 'epoch': 0.35} 35%|███▌ | 2043/5773 [3:10:57<5:38:23, 5.44s/it] 35%|███▌ | 2044/5773 [3:11:08<5:38:07, 5.44s/it] 35%|███▌ | 2044/5773 [3:11:03<5:38:07, 5.44s/it] {'loss': 0.5654, 'learning_rate': 1.4982172148164804e-05, 'epoch': 0.35} 35%|███▌ | 2044/5773 [3:11:08<5:38:07, 5.44s/it] {'loss': 0.5654, 'learning_rate': 1.4982172148164804e-05, 'epoch': 0.35} 35%|███▌ | 2044/5773 [3:11:03<5:38:07, 5.44s/it] 35%|███▌ | 2045/5773 [3:11:08<5:37:52, 5.44s/it] 35%|███▌ | 2045/5773 [3:11:14<5:37:53, 5.44s/it] {'loss': 0.5873, 'learning_rate': 1.4977306343636425e-05, 'epoch': 0.35} 35%|███▌ | 2045/5773 [3:11:14<5:37:53, 5.44s/it] {'loss': 0.5873, 'learning_rate': 1.4977306343636425e-05, 'epoch': 0.35} 35%|███▌ | 2045/5773 [3:11:08<5:37:52, 5.44s/it] 35%|███▌ | 2046/5773 [3:11:19<5:40:06, 5.48s/it] 35%|███▌ | 2046/5773 [3:11:14<5:40:07, 5.48s/it] {'loss': 0.5746, 'learning_rate': 1.4972438972092986e-05, 'epoch': 0.35} 35%|███▌ | 2046/5773 [3:11:19<5:40:06, 5.48s/it] {'loss': 0.5746, 'learning_rate': 1.4972438972092986e-05, 'epoch': 0.35} 35%|███▌ | 2046/5773 [3:11:14<5:40:07, 5.48s/it] 35%|███▌ | 2047/5773 [3:11:24<5:38:19, 5.45s/it] 35%|███▌ | 2047/5773 [3:11:19<5:38:18, 5.45s/it] {'loss': 0.6029, 'learning_rate': 1.496757003506689e-05, 'epoch': 0.35} 35%|███▌ | 2047/5773 [3:11:24<5:38:19, 5.45s/it] {'loss': 0.6029, 
'learning_rate': 1.496757003506689e-05, 'epoch': 0.35} 35%|███▌ | 2047/5773 [3:11:19<5:38:18, 5.45s/it] 35%|███▌ | 2048/5773 [3:11:24<5:34:19, 5.39s/it] 35%|███▌ | 2048/5773 [3:11:30<5:34:20, 5.39s/it] {'loss': 0.5858, 'learning_rate': 1.4962699534091037e-05, 'epoch': 0.35} 35%|███▌ | 2048/5773 [3:11:30<5:34:20, 5.39s/it] {'loss': 0.5858, 'learning_rate': 1.4962699534091037e-05, 'epoch': 0.35} 35%|███▌ | 2048/5773 [3:11:24<5:34:19, 5.39s/it] 35%|███▌ | 2049/5773 [3:11:35<5:32:44, 5.36s/it] 35%|███▌ | 2049/5773 [3:11:29<5:32:45, 5.36s/it] {'loss': 0.5823, 'learning_rate': 1.4957827470698809e-05, 'epoch': 0.35} 35%|███▌ | 2049/5773 [3:11:35<5:32:44, 5.36s/it] {'loss': 0.5823, 'learning_rate': 1.4957827470698809e-05, 'epoch': 0.35} 35%|███▌ | 2049/5773 [3:11:29<5:32:45, 5.36s/it]13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 36%|███▌ | 2050/5773 [3:11:40<5:30:58, 5.33s/it]14 AutoResumeHook: Checking whether to suspend... 9 11AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 36%|███▌ | 2050/5773 [3:11:35<5:30:58, 5.33s/it]1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5732, 'learning_rate': 1.4952953846424092e-05, 'epoch': 0.36} 36%|███▌ | 2050/5773 [3:11:40<5:30:58, 5.33s/it] {'loss': 0.5732, 'learning_rate': 1.4952953846424092e-05, 'epoch': 0.36} 36%|███▌ | 2050/5773 [3:11:35<5:30:58, 5.33s/it] 36%|███▌ | 2051/5773 [3:11:46<5:29:58, 5.32s/it] 36%|███▌ | 2051/5773 [3:11:40<5:29:57, 5.32s/it] {'loss': 0.576, 'learning_rate': 1.4948078662801263e-05, 'epoch': 0.36} 36%|███▌ | 2051/5773 [3:11:46<5:29:58, 5.32s/it] {'loss': 0.576, 'learning_rate': 1.4948078662801263e-05, 'epoch': 0.36} 36%|███▌ | 2051/5773 [3:11:40<5:29:57, 5.32s/it] 36%|███▌ | 2052/5773 [3:11:51<5:33:25, 5.38s/it] 36%|███▌ | 2052/5773 [3:11:46<5:33:25, 5.38s/it] {'loss': 0.5798, 'learning_rate': 1.4943201921365178e-05, 'epoch': 0.36} 36%|███▌ | 2052/5773 [3:11:51<5:33:25, 5.38s/it] {'loss': 0.5798, 'learning_rate': 1.4943201921365178e-05, 'epoch': 0.36} 36%|███▌ | 2052/5773 [3:11:46<5:33:25, 5.38s/it] 36%|███▌ | 2053/5773 [3:11:56<5:34:26, 5.39s/it] 36%|███▌ | 2053/5773 [3:11:51<5:34:26, 5.39s/it] {'loss': 0.5932, 'learning_rate': 1.4938323623651193e-05, 'epoch': 0.36} 36%|███▌ | 2053/5773 [3:11:56<5:34:26, 5.39s/it] {'loss': 0.5932, 'learning_rate': 1.4938323623651193e-05, 'epoch': 0.36} 36%|███▌ | 2053/5773 [3:11:51<5:34:26, 5.39s/it] 36%|███▌ | 2054/5773 [3:12:02<5:36:15, 5.42s/it] 36%|███▌ | 2054/5773 [3:11:56<5:36:15, 5.43s/it] {'loss': 0.5803, 'learning_rate': 1.4933443771195153e-05, 'epoch': 0.36} 36%|███▌ | 2054/5773 [3:12:02<5:36:15, 5.42s/it] {'loss': 0.5803, 'learning_rate': 1.4933443771195153e-05, 'epoch': 0.36} 36%|███▌ | 2054/5773 [3:11:56<5:36:15, 5.43s/it] 36%|███▌ | 2055/5773 [3:12:07<5:36:26, 5.43s/it] 36%|███▌ | 2055/5773 [3:12:02<5:36:25, 5.43s/it] {'loss': 0.5846, 'learning_rate': 1.4928562365533393e-05, 'epoch': 0.36} 36%|███▌ | 2055/5773 [3:12:07<5:36:26, 5.43s/it] {'loss': 0.5846, 'learning_rate': 1.4928562365533393e-05, 'epoch': 0.36} 36%|███▌ | 2055/5773 [3:12:02<5:36:25, 5.43s/it] 36%|███▌ | 2056/5773 [3:12:13<5:37:28, 
5.45s/it] 36%|███▌ | 2056/5773 [3:12:07<5:37:29, 5.45s/it] {'loss': 0.5691, 'learning_rate': 1.4923679408202733e-05, 'epoch': 0.36} 36%|███▌ | 2056/5773 [3:12:13<5:37:28, 5.45s/it] {'loss': 0.5691, 'learning_rate': 1.4923679408202733e-05, 'epoch': 0.36} 36%|███▌ | 2056/5773 [3:12:07<5:37:29, 5.45s/it] 36%|███▌ | 2057/5773 [3:12:18<5:37:00, 5.44s/it] 36%|███▌ | 2057/5773 [3:12:13<5:37:01, 5.44s/it] {'loss': 0.569, 'learning_rate': 1.4918794900740485e-05, 'epoch': 0.36} 36%|███▌ | 2057/5773 [3:12:18<5:37:00, 5.44s/it] {'loss': 0.569, 'learning_rate': 1.4918794900740485e-05, 'epoch': 0.36} 36%|███▌ | 2057/5773 [3:12:13<5:37:01, 5.44s/it] 36%|███▌ | 2058/5773 [3:12:24<5:37:34, 5.45s/it] 36%|███▌ | 2058/5773 [3:12:18<5:37:33, 5.45s/it] {'loss': 0.5728, 'learning_rate': 1.4913908844684448e-05, 'epoch': 0.36} 36%|███▌ | 2058/5773 [3:12:24<5:37:34, 5.45s/it] {'loss': 0.5728, 'learning_rate': 1.4913908844684448e-05, 'epoch': 0.36} 36%|███▌ | 2058/5773 [3:12:18<5:37:33, 5.45s/it] 36%|███▌ | 2059/5773 [3:12:29<5:36:44, 5.44s/it] 36%|███▌ | 2059/5773 [3:12:24<5:36:44, 5.44s/it] {'loss': 0.5791, 'learning_rate': 1.490902124157291e-05, 'epoch': 0.36} 36%|███▌ | 2059/5773 [3:12:29<5:36:44, 5.44s/it] {'loss': 0.5791, 'learning_rate': 1.490902124157291e-05, 'epoch': 0.36} 36%|███▌ | 2059/5773 [3:12:24<5:36:44, 5.44s/it] 36%|███▌ | 2060/5773 [3:12:35<5:35:39, 5.42s/it] 36%|███▌ | 2060/5773 [3:12:29<5:35:40, 5.42s/it] {'loss': 0.574, 'learning_rate': 1.4904132092944638e-05, 'epoch': 0.36} 36%|███▌ | 2060/5773 [3:12:35<5:35:39, 5.42s/it] {'loss': 0.574, 'learning_rate': 1.4904132092944638e-05, 'epoch': 0.36} 36%|███▌ | 2060/5773 [3:12:29<5:35:40, 5.42s/it] 36%|███▌ | 2061/5773 [3:12:40<5:34:24, 5.41s/it] 36%|███▌ | 2061/5773 [3:12:34<5:34:24, 5.41s/it] {'loss': 0.5915, 'learning_rate': 1.4899241400338901e-05, 'epoch': 0.36} 36%|███▌ | 2061/5773 [3:12:40<5:34:24, 5.41s/it] {'loss': 0.5915, 'learning_rate': 1.4899241400338901e-05, 'epoch': 0.36} 36%|███▌ | 2061/5773 [3:12:34<5:34:24, 
5.41s/it] 36%|███▌ | 2062/5773 [3:12:45<5:34:36, 5.41s/it] 36%|███▌ | 2062/5773 [3:12:40<5:34:36, 5.41s/it] {'loss': 0.582, 'learning_rate': 1.489434916529544e-05, 'epoch': 0.36} 36%|███▌ | 2062/5773 [3:12:45<5:34:36, 5.41s/it] {'loss': 0.582, 'learning_rate': 1.489434916529544e-05, 'epoch': 0.36} 36%|███▌ | 2062/5773 [3:12:40<5:34:36, 5.41s/it] 36%|███▌ | 2063/5773 [3:12:51<5:34:49, 5.41s/it] 36%|███▌ | 2063/5773 [3:12:45<5:34:49, 5.41s/it] {'loss': 0.589, 'learning_rate': 1.488945538935449e-05, 'epoch': 0.36} 36%|███▌ | 2063/5773 [3:12:51<5:34:49, 5.41s/it] {'loss': 0.589, 'learning_rate': 1.488945538935449e-05, 'epoch': 0.36} 36%|███▌ | 2063/5773 [3:12:45<5:34:49, 5.41s/it] 36%|███▌ | 2064/5773 [3:12:56<5:38:50, 5.48s/it] 36%|███▌ | 2064/5773 [3:12:51<5:38:51, 5.48s/it] {'loss': 0.5811, 'learning_rate': 1.4884560074056767e-05, 'epoch': 0.36} 36%|███▌ | 2064/5773 [3:12:56<5:38:50, 5.48s/it] {'loss': 0.5811, 'learning_rate': 1.4884560074056767e-05, 'epoch': 0.36} 36%|███▌ | 2064/5773 [3:12:51<5:38:51, 5.48s/it] 36%|███▌ | 2065/5773 [3:13:02<5:38:06, 5.47s/it] 36%|███▌ | 2065/5773 [3:12:56<5:38:06, 5.47s/it] {'loss': 0.5921, 'learning_rate': 1.487966322094347e-05, 'epoch': 0.36} 36%|███▌ | 2065/5773 [3:13:02<5:38:06, 5.47s/it] {'loss': 0.5921, 'learning_rate': 1.487966322094347e-05, 'epoch': 0.36} 36%|███▌ | 2065/5773 [3:12:56<5:38:06, 5.47s/it] 36%|███▌ | 2066/5773 [3:13:07<5:35:42, 5.43s/it] 36%|███▌ | 2066/5773 [3:13:02<5:35:41, 5.43s/it] {'loss': 0.5628, 'learning_rate': 1.4874764831556285e-05, 'epoch': 0.36} 36%|███▌ | 2066/5773 [3:13:07<5:35:42, 5.43s/it] {'loss': 0.5628, 'learning_rate': 1.4874764831556285e-05, 'epoch': 0.36} 36%|███▌ | 2066/5773 [3:13:02<5:35:41, 5.43s/it] 36%|███▌ | 2067/5773 [3:13:13<5:38:58, 5.49s/it] 36%|███▌ | 2067/5773 [3:13:07<5:38:57, 5.49s/it] {'loss': 0.5754, 'learning_rate': 1.4869864907437389e-05, 'epoch': 0.36} 36%|███▌ | 2067/5773 [3:13:13<5:38:58, 5.49s/it] {'loss': 0.5754, 'learning_rate': 1.4869864907437389e-05, 'epoch': 
0.36} 36%|███▌ | 2067/5773 [3:13:07<5:38:57, 5.49s/it] 36%|███▌ | 2068/5773 [3:13:18<5:38:07, 5.48s/it] 36%|███▌ | 2068/5773 [3:13:13<5:38:07, 5.48s/it] {'loss': 0.5647, 'learning_rate': 1.4864963450129423e-05, 'epoch': 0.36} 36%|███▌ | 2068/5773 [3:13:18<5:38:07, 5.48s/it] {'loss': 0.5647, 'learning_rate': 1.4864963450129423e-05, 'epoch': 0.36} 36%|███▌ | 2068/5773 [3:13:13<5:38:07, 5.48s/it] 36%|███▌ | 2069/5773 [3:13:24<5:41:59, 5.54s/it] 36%|███▌ | 2069/5773 [3:13:18<5:41:59, 5.54s/it] {'loss': 0.5966, 'learning_rate': 1.4860060461175534e-05, 'epoch': 0.36} 36%|███▌ | 2069/5773 [3:13:24<5:41:59, 5.54s/it] {'loss': 0.5966, 'learning_rate': 1.4860060461175534e-05, 'epoch': 0.36} 36%|███▌ | 2069/5773 [3:13:18<5:41:59, 5.54s/it] 36%|███▌ | 2070/5773 [3:13:29<5:39:06, 5.49s/it] 36%|███▌ | 2070/5773 [3:13:24<5:39:06, 5.49s/it] {'loss': 0.5885, 'learning_rate': 1.4855155942119331e-05, 'epoch': 0.36} 36%|███▌ | 2070/5773 [3:13:29<5:39:06, 5.49s/it] {'loss': 0.5885, 'learning_rate': 1.4855155942119331e-05, 'epoch': 0.36} 36%|███▌ | 2070/5773 [3:13:24<5:39:06, 5.49s/it] 36%|███▌ | 2071/5773 [3:13:35<5:38:24, 5.48s/it] 36%|███▌ | 2071/5773 [3:13:29<5:38:26, 5.49s/it] {'loss': 0.5901, 'learning_rate': 1.4850249894504915e-05, 'epoch': 0.36} 36%|███▌ | 2071/5773 [3:13:35<5:38:24, 5.48s/it] {'loss': 0.5901, 'learning_rate': 1.4850249894504915e-05, 'epoch': 0.36} 36%|███▌ | 2071/5773 [3:13:29<5:38:26, 5.49s/it] 36%|███▌ | 2072/5773 [3:13:40<5:34:58, 5.43s/it] 36%|███▌ | 2072/5773 [3:13:35<5:34:57, 5.43s/it] {'loss': 0.575, 'learning_rate': 1.484534231987687e-05, 'epoch': 0.36} 36%|███▌ | 2072/5773 [3:13:40<5:34:58, 5.43s/it] {'loss': 0.575, 'learning_rate': 1.484534231987687e-05, 'epoch': 0.36} 36%|███▌ | 2072/5773 [3:13:35<5:34:57, 5.43s/it] 36%|███▌ | 2073/5773 [3:13:46<5:34:24, 5.42s/it] 36%|███▌ | 2073/5773 [3:13:40<5:34:24, 5.42s/it] {'loss': 0.5685, 'learning_rate': 1.4840433219780255e-05, 'epoch': 0.36} {'loss': 0.5685, 'learning_rate': 1.4840433219780255e-05, 'epoch': 
0.36} 36%|███▌ | 2073/5773 [3:13:46<5:34:24, 5.42s/it] 36%|███▌ | 2073/5773 [3:13:40<5:34:24, 5.42s/it] 36%|███▌ | 2074/5773 [3:13:51<5:40:44, 5.53s/it] 36%|███▌ | 2074/5773 [3:13:46<5:40:44, 5.53s/it] {'loss': 0.5651, 'learning_rate': 1.4835522595760612e-05, 'epoch': 0.36} 36%|███▌ | 2074/5773 [3:13:51<5:40:44, 5.53s/it] {'loss': 0.5651, 'learning_rate': 1.4835522595760612e-05, 'epoch': 0.36} 36%|███▌ | 2074/5773 [3:13:46<5:40:44, 5.53s/it] 36%|███▌ | 2075/5773 [3:13:57<5:36:04, 5.45s/it] 36%|███▌ | 2075/5773 [3:13:51<5:36:04, 5.45s/it] {'loss': 0.5797, 'learning_rate': 1.4830610449363957e-05, 'epoch': 0.36} {'loss': 0.5797, 'learning_rate': 1.4830610449363957e-05, 'epoch': 0.36} 36%|███▌ | 2075/5773 [3:13:57<5:36:04, 5.45s/it] 36%|███▌ | 2075/5773 [3:13:51<5:36:04, 5.45s/it] 36%|███▌ | 2076/5773 [3:14:02<5:37:54, 5.48s/it] 36%|███▌ | 2076/5773 [3:13:57<5:37:54, 5.48s/it] {'loss': 0.5691, 'learning_rate': 1.48256967821368e-05, 'epoch': 0.36} 36%|███▌ | 2076/5773 [3:14:02<5:37:54, 5.48s/it] {'loss': 0.5691, 'learning_rate': 1.48256967821368e-05, 'epoch': 0.36} 36%|███▌ | 2076/5773 [3:13:57<5:37:54, 5.48s/it] 36%|███▌ | 2077/5773 [3:14:08<5:39:05, 5.50s/it] 36%|███▌ | 2077/5773 [3:14:02<5:39:05, 5.50s/it] {'loss': 0.6044, 'learning_rate': 1.4820781595626116e-05, 'epoch': 0.36} 36%|███▌ | 2077/5773 [3:14:08<5:39:05, 5.50s/it] {'loss': 0.6044, 'learning_rate': 1.4820781595626116e-05, 'epoch': 0.36} 36%|███▌ | 2077/5773 [3:14:02<5:39:05, 5.50s/it] 36%|███▌ | 2078/5773 [3:14:13<5:37:30, 5.48s/it] 36%|███▌ | 2078/5773 [3:14:08<5:37:30, 5.48s/it] {'loss': 0.5946, 'learning_rate': 1.4815864891379362e-05, 'epoch': 0.36} 36%|███▌ | 2078/5773 [3:14:13<5:37:30, 5.48s/it] {'loss': 0.5946, 'learning_rate': 1.4815864891379362e-05, 'epoch': 0.36} 36%|███▌ | 2078/5773 [3:14:08<5:37:30, 5.48s/it] 36%|███▌ | 2079/5773 [3:14:19<5:37:10, 5.48s/it] 36%|███▌ | 2079/5773 [3:14:13<5:37:10, 5.48s/it] {'loss': 0.5924, 'learning_rate': 1.4810946670944476e-05, 'epoch': 0.36} 36%|███▌ | 
2079/5773 [3:14:19<5:37:10, 5.48s/it] {'loss': 0.5924, 'learning_rate': 1.4810946670944476e-05, 'epoch': 0.36} 36%|███▌ | 2079/5773 [3:14:13<5:37:10, 5.48s/it] 36%|███▌ | 2080/5773 [3:14:24<5:36:20, 5.46s/it] 36%|███▌ | 2080/5773 [3:14:19<5:36:20, 5.46s/it] {'loss': 0.5705, 'learning_rate': 1.4806026935869867e-05, 'epoch': 0.36} 36%|███▌ | 2080/5773 [3:14:24<5:36:20, 5.46s/it] {'loss': 0.5705, 'learning_rate': 1.4806026935869867e-05, 'epoch': 0.36} 36%|███▌ | 2080/5773 [3:14:19<5:36:20, 5.46s/it] 36%|███▌ | 2081/5773 [3:14:29<5:32:41, 5.41s/it] 36%|███▌ | 2081/5773 [3:14:24<5:32:41, 5.41s/it] {'loss': 0.5925, 'learning_rate': 1.4801105687704429e-05, 'epoch': 0.36} 36%|███▌ | 2081/5773 [3:14:29<5:32:41, 5.41s/it] {'loss': 0.5925, 'learning_rate': 1.4801105687704429e-05, 'epoch': 0.36} 36%|███▌ | 2081/5773 [3:14:24<5:32:41, 5.41s/it] 36%|███▌ | 2082/5773 [3:14:35<5:33:17, 5.42s/it] 36%|███▌ | 2082/5773 [3:14:29<5:33:17, 5.42s/it] {'loss': 0.5913, 'learning_rate': 1.4796182927997525e-05, 'epoch': 0.36} 36%|███▌ | 2082/5773 [3:14:35<5:33:17, 5.42s/it] {'loss': 0.5913, 'learning_rate': 1.4796182927997525e-05, 'epoch': 0.36} 36%|███▌ | 2082/5773 [3:14:29<5:33:17, 5.42s/it] 36%|███▌ | 2083/5773 [3:14:40<5:34:15, 5.44s/it] 36%|███▌ | 2083/5773 [3:14:35<5:34:15, 5.44s/it] {'loss': 0.5791, 'learning_rate': 1.4791258658298997e-05, 'epoch': 0.36} 36%|███▌ | 2083/5773 [3:14:40<5:34:15, 5.44s/it] {'loss': 0.5791, 'learning_rate': 1.4791258658298997e-05, 'epoch': 0.36} 36%|███▌ | 2083/5773 [3:14:35<5:34:15, 5.44s/it] 36%|███▌ | 2084/5773 [3:14:46<5:36:53, 5.48s/it] 36%|███▌ | 2084/5773 [3:14:40<5:36:54, 5.48s/it] {'loss': 0.5871, 'learning_rate': 1.4786332880159166e-05, 'epoch': 0.36} 36%|███▌ | 2084/5773 [3:14:46<5:36:53, 5.48s/it] {'loss': 0.5871, 'learning_rate': 1.4786332880159166e-05, 'epoch': 0.36} 36%|███▌ | 2084/5773 [3:14:40<5:36:54, 5.48s/it] 36%|███▌ | 2085/5773 [3:14:51<5:35:07, 5.45s/it] 36%|███▌ | 2085/5773 [3:14:46<5:35:06, 5.45s/it] {'loss': 0.5789, 
'learning_rate': 1.4781405595128822e-05, 'epoch': 0.36} 36%|███▌ | 2085/5773 [3:14:51<5:35:07, 5.45s/it] {'loss': 0.5789, 'learning_rate': 1.4781405595128822e-05, 'epoch': 0.36} 36%|███▌ | 2085/5773 [3:14:46<5:35:06, 5.45s/it] 36%|███▌ | 2086/5773 [3:14:57<5:35:20, 5.46s/it] 36%|███▌ | 2086/5773 [3:14:51<5:35:20, 5.46s/it] {'loss': 0.5909, 'learning_rate': 1.4776476804759233e-05, 'epoch': 0.36} 36%|███▌ | 2086/5773 [3:14:57<5:35:20, 5.46s/it] {'loss': 0.5909, 'learning_rate': 1.4776476804759233e-05, 'epoch': 0.36} 36%|███▌ | 2086/5773 [3:14:51<5:35:20, 5.46s/it] 36%|███▌ | 2087/5773 [3:15:02<5:32:49, 5.42s/it] 36%|███▌ | 2087/5773 [3:14:56<5:32:49, 5.42s/it] {'loss': 0.5841, 'learning_rate': 1.4771546510602135e-05, 'epoch': 0.36} 36%|███▌ | 2087/5773 [3:15:02<5:32:49, 5.42s/it] {'loss': 0.5841, 'learning_rate': 1.4771546510602135e-05, 'epoch': 0.36} 36%|███▌ | 2087/5773 [3:14:56<5:32:49, 5.42s/it] 36%|███▌ | 2088/5773 [3:15:08<5:35:18, 5.46s/it] 36%|███▌ | 2088/5773 [3:15:02<5:35:18, 5.46s/it] {'loss': 0.5693, 'learning_rate': 1.4766614714209748e-05, 'epoch': 0.36} 36%|███▌ | 2088/5773 [3:15:08<5:35:18, 5.46s/it] {'loss': 0.5693, 'learning_rate': 1.4766614714209748e-05, 'epoch': 0.36} 36%|███▌ | 2088/5773 [3:15:02<5:35:18, 5.46s/it] 36%|███▌ | 2089/5773 [3:15:13<5:35:12, 5.46s/it] 36%|███▌ | 2089/5773 [3:15:08<5:35:11, 5.46s/it] {'loss': 0.5802, 'learning_rate': 1.4761681417134754e-05, 'epoch': 0.36} 36%|███▌ | 2089/5773 [3:15:13<5:35:12, 5.46s/it] {'loss': 0.5802, 'learning_rate': 1.4761681417134754e-05, 'epoch': 0.36} 36%|███▌ | 2089/5773 [3:15:08<5:35:11, 5.46s/it] 36%|███▌ | 2090/5773 [3:15:18<5:32:30, 5.42s/it] 36%|███▌ | 2090/5773 [3:15:13<5:32:31, 5.42s/it] {'loss': 0.5782, 'learning_rate': 1.4756746620930317e-05, 'epoch': 0.36} 36%|███▌ | 2090/5773 [3:15:18<5:32:30, 5.42s/it] {'loss': 0.5782, 'learning_rate': 1.4756746620930317e-05, 'epoch': 0.36} 36%|███▌ | 2090/5773 [3:15:13<5:32:31, 5.42s/it] 36%|███▌ | 2091/5773 [3:15:24<5:32:07, 5.41s/it] 36%|███▌ | 
2091/5773 [3:15:18<5:32:08, 5.41s/it] {'loss': 0.5911, 'learning_rate': 1.4751810327150066e-05, 'epoch': 0.36} 36%|███▌ | 2091/5773 [3:15:24<5:32:07, 5.41s/it] {'loss': 0.5911, 'learning_rate': 1.4751810327150066e-05, 'epoch': 0.36} 36%|███▌ | 2091/5773 [3:15:18<5:32:08, 5.41s/it] 36%|███▌ | 2092/5773 [3:15:29<5:30:57, 5.39s/it] 36%|███▌ | 2092/5773 [3:15:24<5:30:57, 5.39s/it] {'loss': 0.5868, 'learning_rate': 1.4746872537348108e-05, 'epoch': 0.36} 36%|███▌ | 2092/5773 [3:15:29<5:30:57, 5.39s/it] {'loss': 0.5868, 'learning_rate': 1.4746872537348108e-05, 'epoch': 0.36} 36%|███▌ | 2092/5773 [3:15:24<5:30:57, 5.39s/it] 36%|███▋ | 2093/5773 [3:15:35<5:37:52, 5.51s/it] 36%|███▋ | 2093/5773 [3:15:29<5:37:52, 5.51s/it] {'loss': 0.5724, 'learning_rate': 1.474193325307901e-05, 'epoch': 0.36} 36%|███▋ | 2093/5773 [3:15:35<5:37:52, 5.51s/it] {'loss': 0.5724, 'learning_rate': 1.474193325307901e-05, 'epoch': 0.36} 36%|███▋ | 2093/5773 [3:15:29<5:37:52, 5.51s/it] 36%|███▋ | 2094/5773 [3:15:40<5:35:49, 5.48s/it] 36%|███▋ | 2094/5773 [3:15:35<5:35:50, 5.48s/it] {'loss': 0.569, 'learning_rate': 1.473699247589782e-05, 'epoch': 0.36} 36%|███▋ | 2094/5773 [3:15:40<5:35:49, 5.48s/it] {'loss': 0.569, 'learning_rate': 1.473699247589782e-05, 'epoch': 0.36} 36%|███▋ | 2094/5773 [3:15:35<5:35:50, 5.48s/it] 36%|███▋ | 2095/5773 [3:15:46<5:36:02, 5.48s/it] 36%|███▋ | 2095/5773 [3:15:40<5:36:02, 5.48s/it] {'loss': 0.572, 'learning_rate': 1.4732050207360056e-05, 'epoch': 0.36} 36%|███▋ | 2095/5773 [3:15:46<5:36:02, 5.48s/it] {'loss': 0.572, 'learning_rate': 1.4732050207360056e-05, 'epoch': 0.36} 36%|███▋ | 2095/5773 [3:15:40<5:36:02, 5.48s/it] 36%|███▋ | 2096/5773 [3:15:51<5:38:03, 5.52s/it] 36%|███▋ | 2096/5773 [3:15:46<5:38:03, 5.52s/it] {'loss': 0.5703, 'learning_rate': 1.4727106449021696e-05, 'epoch': 0.36} 36%|███▋ | 2096/5773 [3:15:51<5:38:03, 5.52s/it] {'loss': 0.5703, 'learning_rate': 1.4727106449021696e-05, 'epoch': 0.36} 36%|███▋ | 2096/5773 [3:15:46<5:38:03, 5.52s/it] 36%|███▋ | 
2097/5773 [3:15:51<5:36:46, 5.50s/it] 36%|███▋ | 2097/5773 [3:15:57<5:36:48, 5.50s/it] {'loss': 0.5863, 'learning_rate': 1.4722161202439195e-05, 'epoch': 0.36} 36%|███▋ | 2097/5773 [3:15:57<5:36:48, 5.50s/it] {'loss': 0.5863, 'learning_rate': 1.4722161202439195e-05, 'epoch': 0.36} 36%|███▋ | 2097/5773 [3:15:51<5:36:46, 5.50s/it] 36%|███▋ | 2098/5773 [3:16:02<5:35:36, 5.48s/it] 36%|███▋ | 2098/5773 [3:15:57<5:35:35, 5.48s/it] {'loss': 0.58, 'learning_rate': 1.4717214469169478e-05, 'epoch': 0.36} 36%|███▋ | 2098/5773 [3:16:02<5:35:36, 5.48s/it] {'loss': 0.58, 'learning_rate': 1.4717214469169478e-05, 'epoch': 0.36} 36%|███▋ | 2098/5773 [3:15:57<5:35:35, 5.48s/it] 36%|███▋ | 2099/5773 [3:16:08<5:36:15, 5.49s/it] 36%|███▋ | 2099/5773 [3:16:02<5:36:15, 5.49s/it] {'loss': 0.5801, 'learning_rate': 1.4712266250769932e-05, 'epoch': 0.36} 36%|███▋ | 2099/5773 [3:16:08<5:36:15, 5.49s/it] {'loss': 0.5801, 'learning_rate': 1.4712266250769932e-05, 'epoch': 0.36} 36%|███▋ | 2099/5773 [3:16:02<5:36:15, 5.49s/it]7 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 158 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 36%|███▋ | 2100/5773 [3:16:13<5:39:32, 5.55s/it]14 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 01 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
36%|███▋ | 2100/5773 [3:16:08<5:39:32, 5.55s/it] {'loss': 0.5846, 'learning_rate': 1.4707316548798414e-05, 'epoch': 0.36} 36%|███▋ | 2100/5773 [3:16:13<5:39:32, 5.55s/it] {'loss': 0.5846, 'learning_rate': 1.4707316548798414e-05, 'epoch': 0.36} 36%|███▋ | 2100/5773 [3:16:08<5:39:32, 5.55s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 36%|███▋ | 2101/5773 [3:16:35<10:24:24, 10.20s/it] 36%|███▋ | 2101/5773 [3:16:29<10:24:25, 10.20s/it] {'loss': 0.5892, 'learning_rate': 1.4702365364813248e-05, 'epoch': 0.36} 36%|███▋ | 2101/5773 [3:16:35<10:24:24, 10.20s/it] {'loss': 0.5892, 'learning_rate': 1.4702365364813248e-05, 'epoch': 0.36} 36%|███▋ | 2101/5773 [3:16:29<10:24:25, 10.20s/it] 36%|███▋ | 2102/5773 [3:16:40<8:54:48, 8.74s/it] 36%|███▋ | 2102/5773 [3:16:34<8:54:48, 8.74s/it] {'loss': 0.5838, 'learning_rate': 1.4697412700373223e-05, 'epoch': 0.36} 36%|███▋ | 2102/5773 [3:16:40<8:54:48, 8.74s/it] {'loss': 0.5838, 'learning_rate': 1.4697412700373223e-05, 'epoch': 0.36} 36%|███▋ | 2102/5773 [3:16:34<8:54:48, 8.74s/it] 36%|███▋ | 2103/5773 [3:16:45<7:52:52, 7.73s/it] 36%|███▋ | 2103/5773 [3:16:40<7:52:52, 7.73s/it] {'loss': 0.5773, 'learning_rate': 1.4692458557037603e-05, 'epoch': 0.36} 36%|███▋ | 2103/5773 [3:16:45<7:52:52, 7.73s/it] {'loss': 0.5773, 'learning_rate': 1.4692458557037603e-05, 'epoch': 0.36} 36%|███▋ | 2103/5773 [3:16:40<7:52:52, 7.73s/it] 36%|███▋ | 2104/5773 [3:16:51<7:11:05, 7.05s/it] 36%|███▋ | 2104/5773 [3:16:45<7:11:06, 7.05s/it] {'loss': 0.5782, 'learning_rate': 1.4687502936366105e-05, 'epoch': 0.36} 36%|███▋ | 2104/5773 [3:16:51<7:11:05, 7.05s/it] {'loss': 0.5782, 'learning_rate': 1.4687502936366105e-05, 'epoch': 0.36} 36%|███▋ | 2104/5773 [3:16:45<7:11:06, 7.05s/it] 36%|███▋ | 2105/5773 [3:16:56<6:43:38, 6.60s/it] 36%|███▋ | 2105/5773 [3:16:51<6:43:39, 6.60s/it] {'loss': 0.5909, 'learning_rate': 1.4682545839918918e-05, 'epoch': 0.36} 36%|███▋ | 2105/5773 [3:16:56<6:43:38, 6.60s/it] {'loss': 0.5909, 'learning_rate': 1.4682545839918918e-05, 'epoch': 0.36} 36%|███▋ | 2105/5773 [3:16:51<6:43:39, 6.60s/it] 36%|███▋ | 2106/5773 [3:17:02<6:20:46, 6.23s/it] 36%|███▋ | 2106/5773 [3:16:56<6:20:46, 6.23s/it] {'loss': 0.6019, 'learning_rate': 1.4677587269256693e-05, 'epoch': 0.36} 36%|███▋ | 2106/5773 [3:17:02<6:20:46, 6.23s/it] {'loss': 0.6019, 'learning_rate': 
1.4677587269256693e-05, 'epoch': 0.36} 36%|███▋ | 2106/5773 [3:16:56<6:20:46, 6.23s/it] 36%|███▋ | 2107/5773 [3:17:07<6:09:25, 6.05s/it] 36%|███▋ | 2107/5773 [3:17:02<6:09:25, 6.05s/it] {'loss': 0.6023, 'learning_rate': 1.467262722594055e-05, 'epoch': 0.36} 36%|███▋ | 2107/5773 [3:17:07<6:09:25, 6.05s/it] {'loss': 0.6023, 'learning_rate': 1.467262722594055e-05, 'epoch': 0.36} 36%|███▋ | 2107/5773 [3:17:02<6:09:25, 6.05s/it] 37%|███▋ | 2108/5773 [3:17:13<5:58:16, 5.87s/it] 37%|███▋ | 2108/5773 [3:17:07<5:58:16, 5.87s/it] {'loss': 0.5615, 'learning_rate': 1.4667665711532066e-05, 'epoch': 0.37} 37%|███▋ | 2108/5773 [3:17:13<5:58:16, 5.87s/it] {'loss': 0.5615, 'learning_rate': 1.4667665711532066e-05, 'epoch': 0.37} 37%|███▋ | 2108/5773 [3:17:07<5:58:16, 5.87s/it] 37%|███▋ | 2109/5773 [3:17:18<5:48:25, 5.71s/it] 37%|███▋ | 2109/5773 [3:17:12<5:48:25, 5.71s/it] {'loss': 0.5876, 'learning_rate': 1.4662702727593284e-05, 'epoch': 0.37} 37%|███▋ | 2109/5773 [3:17:18<5:48:25, 5.71s/it] {'loss': 0.5876, 'learning_rate': 1.4662702727593284e-05, 'epoch': 0.37} 37%|███▋ | 2109/5773 [3:17:12<5:48:25, 5.71s/it] 37%|███▋ | 2110/5773 [3:17:23<5:43:12, 5.62s/it] 37%|███▋ | 2110/5773 [3:17:18<5:43:12, 5.62s/it] {'loss': 0.5701, 'learning_rate': 1.4657738275686711e-05, 'epoch': 0.37} 37%|███▋ | 2110/5773 [3:17:23<5:43:12, 5.62s/it] {'loss': 0.5701, 'learning_rate': 1.4657738275686711e-05, 'epoch': 0.37} 37%|███▋ | 2110/5773 [3:17:18<5:43:12, 5.62s/it] 37%|███▋ | 2111/5773 [3:17:29<5:40:18, 5.58s/it] 37%|███▋ | 2111/5773 [3:17:23<5:40:18, 5.58s/it]{'loss': 0.5658, 'learning_rate': 1.4652772357375316e-05, 'epoch': 0.37} 37%|███▋ | 2111/5773 [3:17:29<5:40:18, 5.58s/it] {'loss': 0.5658, 'learning_rate': 1.4652772357375316e-05, 'epoch': 0.37} 37%|███▋ | 2111/5773 [3:17:23<5:40:18, 5.58s/it] 37%|███▋ | 2112/5773 [3:17:34<5:38:39, 5.55s/it] 37%|███▋ | 2112/5773 [3:17:29<5:38:40, 5.55s/it] {'loss': 0.6002, 'learning_rate': 1.4647804974222525e-05, 'epoch': 0.37} 37%|███▋ | 2112/5773 
[3:17:34<5:38:39, 5.55s/it] {'loss': 0.6002, 'learning_rate': 1.4647804974222525e-05, 'epoch': 0.37} 37%|███▋ | 2112/5773 [3:17:29<5:38:40, 5.55s/it] 37%|███▋ | 2113/5773 [3:17:40<5:36:22, 5.51s/it] 37%|███▋ | 2113/5773 [3:17:34<5:36:22, 5.51s/it] {'loss': 0.58, 'learning_rate': 1.4642836127792237e-05, 'epoch': 0.37} 37%|███▋ | 2113/5773 [3:17:40<5:36:22, 5.51s/it] {'loss': 0.58, 'learning_rate': 1.4642836127792237e-05, 'epoch': 0.37} 37%|███▋ | 2113/5773 [3:17:34<5:36:22, 5.51s/it] 37%|███▋ | 2114/5773 [3:17:45<5:38:37, 5.55s/it] 37%|███▋ | 2114/5773 [3:17:40<5:38:38, 5.55s/it] {'loss': 0.5858, 'learning_rate': 1.4637865819648794e-05, 'epoch': 0.37} 37%|███▋ | 2114/5773 [3:17:45<5:38:37, 5.55s/it] {'loss': 0.5858, 'learning_rate': 1.4637865819648794e-05, 'epoch': 0.37} 37%|███▋ | 2114/5773 [3:17:40<5:38:38, 5.55s/it] 37%|███▋ | 2115/5773 [3:17:51<5:36:40, 5.52s/it] 37%|███▋ | 2115/5773 [3:17:45<5:36:40, 5.52s/it] {'loss': 0.58, 'learning_rate': 1.4632894051357016e-05, 'epoch': 0.37} 37%|███▋ | 2115/5773 [3:17:51<5:36:40, 5.52s/it] {'loss': 0.58, 'learning_rate': 1.4632894051357016e-05, 'epoch': 0.37} 37%|███▋ | 2115/5773 [3:17:45<5:36:40, 5.52s/it] 37%|███▋ | 2116/5773 [3:17:56<5:37:25, 5.54s/it] 37%|███▋ | 2116/5773 [3:17:51<5:37:25, 5.54s/it] {'loss': 0.5955, 'learning_rate': 1.462792082448217e-05, 'epoch': 0.37} 37%|███▋ | 2116/5773 [3:17:56<5:37:25, 5.54s/it] {'loss': 0.5955, 'learning_rate': 1.462792082448217e-05, 'epoch': 0.37} 37%|███▋ | 2116/5773 [3:17:51<5:37:25, 5.54s/it] 37%|███▋ | 2117/5773 [3:18:02<5:37:13, 5.53s/it] 37%|███▋ | 2117/5773 [3:17:56<5:37:14, 5.53s/it] {'loss': 0.577, 'learning_rate': 1.4622946140589984e-05, 'epoch': 0.37} 37%|███▋ | 2117/5773 [3:18:02<5:37:13, 5.53s/it] {'loss': 0.577, 'learning_rate': 1.4622946140589984e-05, 'epoch': 0.37} 37%|███▋ | 2117/5773 [3:17:56<5:37:14, 5.53s/it] 37%|███▋ | 2118/5773 [3:18:07<5:34:11, 5.49s/it] 37%|███▋ | 2118/5773 [3:18:02<5:34:11, 5.49s/it] {'loss': 0.5575, 'learning_rate': 
1.4617970001246652e-05, 'epoch': 0.37} 37%|███▋ | 2118/5773 [3:18:07<5:34:11, 5.49s/it] {'loss': 0.5575, 'learning_rate': 1.4617970001246652e-05, 'epoch': 0.37} 37%|███▋ | 2118/5773 [3:18:02<5:34:11, 5.49s/it] 37%|███▋ | 2119/5773 [3:18:13<5:32:29, 5.46s/it] 37%|███▋ | 2119/5773 [3:18:07<5:32:29, 5.46s/it] {'loss': 0.5784, 'learning_rate': 1.461299240801882e-05, 'epoch': 0.37} 37%|███▋ | 2119/5773 [3:18:13<5:32:29, 5.46s/it] {'loss': 0.5784, 'learning_rate': 1.461299240801882e-05, 'epoch': 0.37} 37%|███▋ | 2119/5773 [3:18:07<5:32:29, 5.46s/it] 37%|███▋ | 2120/5773 [3:18:18<5:32:20, 5.46s/it] 37%|███▋ | 2120/5773 [3:18:13<5:32:20, 5.46s/it] {'loss': 0.5687, 'learning_rate': 1.4608013362473594e-05, 'epoch': 0.37} 37%|███▋ | 2120/5773 [3:18:18<5:32:20, 5.46s/it] {'loss': 0.5687, 'learning_rate': 1.4608013362473594e-05, 'epoch': 0.37} 37%|███▋ | 2120/5773 [3:18:13<5:32:20, 5.46s/it] 37%|███▋ | 2121/5773 [3:18:24<5:33:11, 5.47s/it] 37%|███▋ | 2121/5773 [3:18:18<5:33:12, 5.47s/it] {'loss': 0.5722, 'learning_rate': 1.460303286617854e-05, 'epoch': 0.37} 37%|███▋ | 2121/5773 [3:18:24<5:33:11, 5.47s/it] {'loss': 0.5722, 'learning_rate': 1.460303286617854e-05, 'epoch': 0.37} 37%|███▋ | 2121/5773 [3:18:18<5:33:12, 5.47s/it] 37%|███▋ | 2122/5773 [3:18:29<5:29:25, 5.41s/it] 37%|███▋ | 2122/5773 [3:18:23<5:29:25, 5.41s/it] {'loss': 0.5734, 'learning_rate': 1.4598050920701673e-05, 'epoch': 0.37} 37%|███▋ | 2122/5773 [3:18:29<5:29:25, 5.41s/it] {'loss': 0.5734, 'learning_rate': 1.4598050920701673e-05, 'epoch': 0.37} 37%|███▋ | 2122/5773 [3:18:23<5:29:25, 5.41s/it] 37%|███▋ | 2123/5773 [3:18:34<5:28:33, 5.40s/it] 37%|███▋ | 2123/5773 [3:18:29<5:28:33, 5.40s/it] {'loss': 0.5971, 'learning_rate': 1.4593067527611467e-05, 'epoch': 0.37} 37%|███▋ | 2123/5773 [3:18:34<5:28:33, 5.40s/it] {'loss': 0.5971, 'learning_rate': 1.4593067527611467e-05, 'epoch': 0.37} 37%|███▋ | 2123/5773 [3:18:29<5:28:33, 5.40s/it] 37%|███▋ | 2124/5773 [3:18:40<5:29:25, 5.42s/it] 37%|███▋ | 2124/5773 
[3:18:34<5:29:25, 5.42s/it] {'loss': 0.5747, 'learning_rate': 1.4588082688476858e-05, 'epoch': 0.37} 37%|███▋ | 2124/5773 [3:18:40<5:29:25, 5.42s/it] {'loss': 0.5747, 'learning_rate': 1.4588082688476858e-05, 'epoch': 0.37} 37%|███▋ | 2124/5773 [3:18:34<5:29:25, 5.42s/it] 37%|███▋ | 2125/5773 [3:18:45<5:27:35, 5.39s/it] 37%|███▋ | 2125/5773 [3:18:40<5:27:35, 5.39s/it] {'loss': 0.5899, 'learning_rate': 1.4583096404867227e-05, 'epoch': 0.37} 37%|███▋ | 2125/5773 [3:18:45<5:27:35, 5.39s/it] {'loss': 0.5899, 'learning_rate': 1.4583096404867227e-05, 'epoch': 0.37} 37%|███▋ | 2125/5773 [3:18:40<5:27:35, 5.39s/it] 37%|███▋ | 2126/5773 [3:18:51<5:31:26, 5.45s/it] 37%|███▋ | 2126/5773 [3:18:45<5:31:26, 5.45s/it] {'loss': 0.5789, 'learning_rate': 1.4578108678352421e-05, 'epoch': 0.37} 37%|███▋ | 2126/5773 [3:18:51<5:31:26, 5.45s/it] {'loss': 0.5789, 'learning_rate': 1.4578108678352421e-05, 'epoch': 0.37} 37%|███▋ | 2126/5773 [3:18:45<5:31:26, 5.45s/it] 37%|███▋ | 2127/5773 [3:18:56<5:27:16, 5.39s/it] 37%|███▋ | 2127/5773 [3:18:50<5:27:16, 5.39s/it] {'loss': 0.5886, 'learning_rate': 1.4573119510502732e-05, 'epoch': 0.37} 37%|███▋ | 2127/5773 [3:18:56<5:27:16, 5.39s/it] {'loss': 0.5886, 'learning_rate': 1.4573119510502732e-05, 'epoch': 0.37} 37%|███▋ | 2127/5773 [3:18:50<5:27:16, 5.39s/it] 37%|███▋ | 2128/5773 [3:19:01<5:26:47, 5.38s/it] 37%|███▋ | 2128/5773 [3:18:56<5:26:47, 5.38s/it] {'loss': 0.5814, 'learning_rate': 1.4568128902888912e-05, 'epoch': 0.37} 37%|███▋ | 2128/5773 [3:19:01<5:26:47, 5.38s/it] {'loss': 0.5814, 'learning_rate': 1.4568128902888912e-05, 'epoch': 0.37} 37%|███▋ | 2128/5773 [3:18:56<5:26:47, 5.38s/it] 37%|███▋ | 2129/5773 [3:19:07<5:26:43, 5.38s/it] 37%|███▋ | 2129/5773 [3:19:01<5:26:44, 5.38s/it] {'loss': 0.5789, 'learning_rate': 1.4563136857082164e-05, 'epoch': 0.37} 37%|███▋ | 2129/5773 [3:19:07<5:26:43, 5.38s/it] {'loss': 0.5789, 'learning_rate': 1.4563136857082164e-05, 'epoch': 0.37} 37%|███▋ | 2129/5773 [3:19:01<5:26:44, 5.38s/it] 37%|███▋ | 
2130/5773 [3:19:12<5:25:47, 5.37s/it] 37%|███▋ | 2130/5773 [3:19:07<5:25:47, 5.37s/it] {'loss': 0.5906, 'learning_rate': 1.4558143374654141e-05, 'epoch': 0.37} 37%|███▋ | 2130/5773 [3:19:12<5:25:47, 5.37s/it] {'loss': 0.5906, 'learning_rate': 1.4558143374654141e-05, 'epoch': 0.37} 37%|███▋ | 2130/5773 [3:19:07<5:25:47, 5.37s/it] 37%|███▋ | 2131/5773 [3:19:17<5:24:45, 5.35s/it] 37%|███▋ | 2131/5773 [3:19:12<5:24:45, 5.35s/it] {'loss': 0.5723, 'learning_rate': 1.4553148457176954e-05, 'epoch': 0.37} 37%|███▋ | 2131/5773 [3:19:17<5:24:45, 5.35s/it] {'loss': 0.5723, 'learning_rate': 1.4553148457176954e-05, 'epoch': 0.37} 37%|███▋ | 2131/5773 [3:19:12<5:24:45, 5.35s/it] 37%|███▋ | 2132/5773 [3:19:23<5:25:17, 5.36s/it] 37%|███▋ | 2132/5773 [3:19:17<5:25:17, 5.36s/it] {'loss': 0.5856, 'learning_rate': 1.4548152106223157e-05, 'epoch': 0.37} 37%|███▋ | 2132/5773 [3:19:23<5:25:17, 5.36s/it] {'loss': 0.5856, 'learning_rate': 1.4548152106223157e-05, 'epoch': 0.37} 37%|███▋ | 2132/5773 [3:19:17<5:25:17, 5.36s/it] 37%|███▋ | 2133/5773 [3:19:28<5:27:05, 5.39s/it] 37%|███▋ | 2133/5773 [3:19:23<5:27:05, 5.39s/it] {'loss': 0.5808, 'learning_rate': 1.4543154323365765e-05, 'epoch': 0.37} 37%|███▋ | 2133/5773 [3:19:28<5:27:05, 5.39s/it] {'loss': 0.5808, 'learning_rate': 1.4543154323365765e-05, 'epoch': 0.37} 37%|███▋ | 2133/5773 [3:19:23<5:27:05, 5.39s/it] 37%|███▋ | 2134/5773 [3:19:34<5:29:30, 5.43s/it] 37%|███▋ | 2134/5773 [3:19:28<5:29:30, 5.43s/it] {'loss': 0.581, 'learning_rate': 1.4538155110178242e-05, 'epoch': 0.37} 37%|███▋ | 2134/5773 [3:19:34<5:29:30, 5.43s/it] {'loss': 0.581, 'learning_rate': 1.4538155110178242e-05, 'epoch': 0.37} 37%|███▋ | 2134/5773 [3:19:28<5:29:30, 5.43s/it] 37%|███▋ | 2135/5773 [3:19:39<5:29:34, 5.44s/it] 37%|███▋ | 2135/5773 [3:19:34<5:29:33, 5.44s/it] {'loss': 0.6061, 'learning_rate': 1.4533154468234493e-05, 'epoch': 0.37} 37%|███▋ | 2135/5773 [3:19:39<5:29:34, 5.44s/it] {'loss': 0.6061, 'learning_rate': 1.4533154468234493e-05, 'epoch': 0.37} 37%|███▋ 
| 2135/5773 [3:19:34<5:29:33, 5.44s/it] 37%|███▋ | 2136/5773 [3:19:45<5:31:19, 5.47s/it] 37%|███▋ | 2136/5773 [3:19:39<5:31:18, 5.47s/it] {'loss': 0.5621, 'learning_rate': 1.4528152399108889e-05, 'epoch': 0.37} 37%|███▋ | 2136/5773 [3:19:45<5:31:19, 5.47s/it] {'loss': 0.5621, 'learning_rate': 1.4528152399108889e-05, 'epoch': 0.37} 37%|███▋ | 2136/5773 [3:19:39<5:31:18, 5.47s/it] 37%|███▋ | 2137/5773 [3:19:50<5:31:01, 5.46s/it] 37%|███▋ | 2137/5773 [3:19:45<5:31:01, 5.46s/it] {'loss': 0.5901, 'learning_rate': 1.4523148904376234e-05, 'epoch': 0.37} 37%|███▋ | 2137/5773 [3:19:50<5:31:01, 5.46s/it] {'loss': 0.5901, 'learning_rate': 1.4523148904376234e-05, 'epoch': 0.37} 37%|███▋ | 2137/5773 [3:19:45<5:31:01, 5.46s/it] 37%|███▋ | 2138/5773 [3:19:56<5:31:21, 5.47s/it] 37%|███▋ | 2138/5773 [3:19:50<5:31:21, 5.47s/it] {'loss': 0.5577, 'learning_rate': 1.451814398561179e-05, 'epoch': 0.37} 37%|███▋ | 2138/5773 [3:19:56<5:31:21, 5.47s/it] {'loss': 0.5577, 'learning_rate': 1.451814398561179e-05, 'epoch': 0.37} 37%|███▋ | 2138/5773 [3:19:50<5:31:21, 5.47s/it] 37%|███▋ | 2139/5773 [3:20:01<5:29:10, 5.43s/it] 37%|███▋ | 2139/5773 [3:19:56<5:29:10, 5.43s/it] {'loss': 0.5736, 'learning_rate': 1.4513137644391266e-05, 'epoch': 0.37} 37%|███▋ | 2139/5773 [3:20:01<5:29:10, 5.43s/it] {'loss': 0.5736, 'learning_rate': 1.4513137644391266e-05, 'epoch': 0.37} 37%|███▋ | 2139/5773 [3:19:56<5:29:10, 5.43s/it] 37%|███▋ | 2140/5773 [3:20:07<5:32:23, 5.49s/it] 37%|███▋ | 2140/5773 [3:20:01<5:32:23, 5.49s/it] {'loss': 0.5695, 'learning_rate': 1.4508129882290818e-05, 'epoch': 0.37} 37%|███▋ | 2140/5773 [3:20:07<5:32:23, 5.49s/it] {'loss': 0.5695, 'learning_rate': 1.4508129882290818e-05, 'epoch': 0.37} 37%|███▋ | 2140/5773 [3:20:01<5:32:23, 5.49s/it] 37%|███▋ | 2141/5773 [3:20:12<5:31:21, 5.47s/it] 37%|███▋ | 2141/5773 [3:20:07<5:31:21, 5.47s/it] {'loss': 0.5681, 'learning_rate': 1.4503120700887048e-05, 'epoch': 0.37} 37%|███▋ | 2141/5773 [3:20:12<5:31:21, 5.47s/it] {'loss': 0.5681, 
'learning_rate': 1.4503120700887048e-05, 'epoch': 0.37} 37%|███▋ | 2141/5773 [3:20:07<5:31:21, 5.47s/it] 37%|███▋ | 2142/5773 [3:20:17<5:28:25, 5.43s/it] 37%|███▋ | 2142/5773 [3:20:12<5:28:25, 5.43s/it] {'loss': 0.5596, 'learning_rate': 1.4498110101757009e-05, 'epoch': 0.37} 37%|███▋ | 2142/5773 [3:20:17<5:28:25, 5.43s/it] {'loss': 0.5596, 'learning_rate': 1.4498110101757009e-05, 'epoch': 0.37} 37%|███▋ | 2142/5773 [3:20:12<5:28:25, 5.43s/it] 37%|███▋ | 2143/5773 [3:20:23<5:31:04, 5.47s/it] 37%|███▋ | 2143/5773 [3:20:17<5:31:04, 5.47s/it] {'loss': 0.5924, 'learning_rate': 1.4493098086478196e-05, 'epoch': 0.37} 37%|███▋ | 2143/5773 [3:20:23<5:31:04, 5.47s/it] {'loss': 0.5924, 'learning_rate': 1.4493098086478196e-05, 'epoch': 0.37} 37%|███▋ | 2143/5773 [3:20:17<5:31:04, 5.47s/it] 37%|███▋ | 2144/5773 [3:20:29<5:32:41, 5.50s/it] 37%|███▋ | 2144/5773 [3:20:23<5:32:41, 5.50s/it] {'loss': 0.5771, 'learning_rate': 1.4488084656628552e-05, 'epoch': 0.37} 37%|███▋ | 2144/5773 [3:20:29<5:32:41, 5.50s/it] {'loss': 0.5771, 'learning_rate': 1.4488084656628552e-05, 'epoch': 0.37} 37%|███▋ | 2144/5773 [3:20:23<5:32:41, 5.50s/it] 37%|███▋ | 2145/5773 [3:20:34<5:32:05, 5.49s/it] 37%|███▋ | 2145/5773 [3:20:29<5:32:05, 5.49s/it] {'loss': 0.5825, 'learning_rate': 1.4483069813786466e-05, 'epoch': 0.37} 37%|███▋ | 2145/5773 [3:20:34<5:32:05, 5.49s/it] {'loss': 0.5825, 'learning_rate': 1.4483069813786466e-05, 'epoch': 0.37} 37%|███▋ | 2145/5773 [3:20:29<5:32:05, 5.49s/it] 37%|███▋ | 2146/5773 [3:20:39<5:29:28, 5.45s/it] 37%|███▋ | 2146/5773 [3:20:34<5:29:28, 5.45s/it] {'loss': 0.587, 'learning_rate': 1.4478053559530767e-05, 'epoch': 0.37} 37%|███▋ | 2146/5773 [3:20:39<5:29:28, 5.45s/it] {'loss': 0.587, 'learning_rate': 1.4478053559530767e-05, 'epoch': 0.37} 37%|███▋ | 2146/5773 [3:20:34<5:29:28, 5.45s/it] 37%|███▋ | 2147/5773 [3:20:45<5:29:32, 5.45s/it] 37%|███▋ | 2147/5773 [3:20:39<5:29:31, 5.45s/it] {'loss': 0.5876, 'learning_rate': 1.4473035895440735e-05, 'epoch': 0.37} 37%|███▋ | 
2147/5773 [3:20:45<5:29:32, 5.45s/it] {'loss': 0.5876, 'learning_rate': 1.4473035895440735e-05, 'epoch': 0.37} 37%|███▋ | 2147/5773 [3:20:39<5:29:31, 5.45s/it] 37%|███▋ | 2148/5773 [3:20:50<5:30:24, 5.47s/it] 37%|███▋ | 2148/5773 [3:20:45<5:30:23, 5.47s/it] {'loss': 0.5736, 'learning_rate': 1.446801682309609e-05, 'epoch': 0.37} 37%|███▋ | 2148/5773 [3:20:50<5:30:24, 5.47s/it] {'loss': 0.5736, 'learning_rate': 1.446801682309609e-05, 'epoch': 0.37} 37%|███▋ | 2148/5773 [3:20:45<5:30:23, 5.47s/it] 37%|███▋ | 2149/5773 [3:20:56<5:28:09, 5.43s/it] 37%|███▋ | 2149/5773 [3:20:50<5:28:08, 5.43s/it] {'loss': 0.585, 'learning_rate': 1.4462996344077e-05, 'epoch': 0.37} 37%|███▋ | 2149/5773 [3:20:56<5:28:09, 5.43s/it] {'loss': 0.585, 'learning_rate': 1.4462996344077e-05, 'epoch': 0.37} 37%|███▋ | 2149/5773 [3:20:50<5:28:08, 5.43s/it]7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend...9 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
37%|███▋ | 2150/5773 [3:20:55<5:25:23, 5.39s/it] 37%|███▋ | 2150/5773 [3:21:01<5:25:23, 5.39s/it] {'loss': 0.579, 'learning_rate': 1.4457974459964068e-05, 'epoch': 0.37} 37%|███▋ | 2150/5773 [3:21:01<5:25:23, 5.39s/it] {'loss': 0.579, 'learning_rate': 1.4457974459964068e-05, 'epoch': 0.37} 37%|███▋ | 2150/5773 [3:20:55<5:25:23, 5.39s/it] 37%|███▋ | 2151/5773 [3:21:01<5:27:31, 5.43s/it] 37%|███▋ | 2151/5773 [3:21:07<5:27:33, 5.43s/it] {'loss': 0.5691, 'learning_rate': 1.4452951172338343e-05, 'epoch': 0.37} 37%|███▋ | 2151/5773 [3:21:07<5:27:33, 5.43s/it] {'loss': 0.5691, 'learning_rate': 1.4452951172338343e-05, 'epoch': 0.37} 37%|███▋ | 2151/5773 [3:21:01<5:27:31, 5.43s/it] 37%|███▋ | 2152/5773 [3:21:06<5:28:12, 5.44s/it] 37%|███▋ | 2152/5773 [3:21:12<5:28:13, 5.44s/it] {'loss': 0.5754, 'learning_rate': 1.444792648278132e-05, 'epoch': 0.37} 37%|███▋ | 2152/5773 [3:21:12<5:28:13, 5.44s/it] {'loss': 0.5754, 'learning_rate': 1.444792648278132e-05, 'epoch': 0.37} 37%|███▋ | 2152/5773 [3:21:06<5:28:12, 5.44s/it] 37%|███▋ | 2153/5773 [3:21:12<5:27:05, 5.42s/it] 37%|███▋ | 2153/5773 [3:21:17<5:27:05, 5.42s/it] {'loss': 0.576, 'learning_rate': 1.4442900392874928e-05, 'epoch': 0.37} 37%|███▋ | 2153/5773 [3:21:17<5:27:05, 5.42s/it] {'loss': 0.576, 'learning_rate': 1.4442900392874928e-05, 'epoch': 0.37} 37%|███▋ | 2153/5773 [3:21:12<5:27:05, 5.42s/it] 37%|███▋ | 2154/5773 [3:21:17<5:26:31, 5.41s/it] 37%|███▋ | 2154/5773 [3:21:23<5:26:31, 5.41s/it] {'loss': 0.5796, 'learning_rate': 1.4437872904201542e-05, 'epoch': 0.37} 37%|███▋ | 2154/5773 [3:21:23<5:26:31, 5.41s/it] {'loss': 0.5796, 'learning_rate': 1.4437872904201542e-05, 'epoch': 0.37} 37%|███▋ | 2154/5773 [3:21:17<5:26:31, 5.41s/it] 37%|███▋ | 2155/5773 [3:21:28<5:24:31, 5.38s/it] 37%|███▋ | 2155/5773 [3:21:23<5:24:33, 5.38s/it] {'loss': 0.5755, 'learning_rate': 1.4432844018343981e-05, 'epoch': 0.37} 37%|███▋ | 2155/5773 [3:21:28<5:24:31, 5.38s/it] {'loss': 0.5755, 'learning_rate': 1.4432844018343981e-05, 'epoch': 0.37} 
37%|███▋ | 2155/5773 [3:21:23<5:24:33, 5.38s/it] 37%|███▋ | 2156/5773 [3:21:34<5:32:14, 5.51s/it] 37%|███▋ | 2156/5773 [3:21:28<5:32:15, 5.51s/it] {'loss': 0.5846, 'learning_rate': 1.4427813736885489e-05, 'epoch': 0.37} 37%|███▋ | 2156/5773 [3:21:34<5:32:14, 5.51s/it] {'loss': 0.5846, 'learning_rate': 1.4427813736885489e-05, 'epoch': 0.37} 37%|███▋ | 2156/5773 [3:21:28<5:32:15, 5.51s/it] 37%|███▋ | 2157/5773 [3:21:34<5:29:16, 5.46s/it] 37%|███▋ | 2157/5773 [3:21:39<5:29:17, 5.46s/it] {'loss': 0.5843, 'learning_rate': 1.442278206140977e-05, 'epoch': 0.37} 37%|███▋ | 2157/5773 [3:21:39<5:29:17, 5.46s/it] {'loss': 0.5843, 'learning_rate': 1.442278206140977e-05, 'epoch': 0.37} 37%|███▋ | 2157/5773 [3:21:34<5:29:16, 5.46s/it] 37%|███▋ | 2158/5773 [3:21:39<5:26:30, 5.42s/it] 37%|███▋ | 2158/5773 [3:21:45<5:26:32, 5.42s/it] {'loss': 0.5939, 'learning_rate': 1.441774899350095e-05, 'epoch': 0.37} 37%|███▋ | 2158/5773 [3:21:45<5:26:32, 5.42s/it] {'loss': 0.5939, 'learning_rate': 1.441774899350095e-05, 'epoch': 0.37} 37%|███▋ | 2158/5773 [3:21:39<5:26:30, 5.42s/it] 37%|███▋ | 2159/5773 [3:21:45<5:29:11, 5.47s/it] 37%|███▋ | 2159/5773 [3:21:50<5:29:12, 5.47s/it] {'loss': 0.5792, 'learning_rate': 1.4412714534743599e-05, 'epoch': 0.37} 37%|███▋ | 2159/5773 [3:21:50<5:29:12, 5.47s/it] {'loss': 0.5792, 'learning_rate': 1.4412714534743599e-05, 'epoch': 0.37} 37%|███▋ | 2159/5773 [3:21:45<5:29:11, 5.47s/it] 37%|███▋ | 2160/5773 [3:21:50<5:29:02, 5.46s/it] 37%|███▋ | 2160/5773 [3:21:56<5:29:03, 5.46s/it] {'loss': 0.5805, 'learning_rate': 1.440767868672273e-05, 'epoch': 0.37} 37%|███▋ | 2160/5773 [3:21:56<5:29:03, 5.46s/it] {'loss': 0.5805, 'learning_rate': 1.440767868672273e-05, 'epoch': 0.37} 37%|███▋ | 2160/5773 [3:21:50<5:29:02, 5.46s/it] 37%|███▋ | 2161/5773 [3:21:56<5:29:07, 5.47s/it] 37%|███▋ | 2161/5773 [3:22:01<5:29:06, 5.47s/it] {'loss': 0.5966, 'learning_rate': 1.4402641451023783e-05, 'epoch': 0.37} 37%|███▋ | 2161/5773 [3:22:01<5:29:06, 5.47s/it] {'loss': 0.5966, 
'learning_rate': 1.4402641451023783e-05, 'epoch': 0.37} 37%|███▋ | 2161/5773 [3:21:56<5:29:07, 5.47s/it] 37%|███▋ | 2162/5773 [3:22:07<5:29:05, 5.47s/it] 37%|███▋ | 2162/5773 [3:22:01<5:29:06, 5.47s/it] {'loss': 0.5856, 'learning_rate': 1.4397602829232646e-05, 'epoch': 0.37} 37%|███▋ | 2162/5773 [3:22:07<5:29:05, 5.47s/it] {'loss': 0.5856, 'learning_rate': 1.4397602829232646e-05, 'epoch': 0.37} 37%|███▋ | 2162/5773 [3:22:01<5:29:06, 5.47s/it] 37%|███▋ | 2163/5773 [3:22:06<5:29:16, 5.47s/it] 37%|███▋ | 2163/5773 [3:22:12<5:29:16, 5.47s/it] {'loss': 0.5785, 'learning_rate': 1.4392562822935632e-05, 'epoch': 0.37} 37%|███▋ | 2163/5773 [3:22:12<5:29:16, 5.47s/it] {'loss': 0.5785, 'learning_rate': 1.4392562822935632e-05, 'epoch': 0.37} 37%|███▋ | 2163/5773 [3:22:06<5:29:16, 5.47s/it] 37%|███▋ | 2164/5773 [3:22:17<5:26:11, 5.42s/it] 37%|███▋ | 2164/5773 [3:22:12<5:26:12, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.4387521433719505e-05, 'epoch': 0.37} 37%|███▋ | 2164/5773 [3:22:17<5:26:11, 5.42s/it] {'loss': 0.5837, 'learning_rate': 1.4387521433719505e-05, 'epoch': 0.37} 37%|███▋ | 2164/5773 [3:22:12<5:26:12, 5.42s/it] 38%|███▊ | 2165/5773 [3:22:23<5:26:28, 5.43s/it] 38%|███▊ | 2165/5773 [3:22:17<5:26:29, 5.43s/it] {'loss': 0.589, 'learning_rate': 1.4382478663171449e-05, 'epoch': 0.38} 38%|███▊ | 2165/5773 [3:22:23<5:26:28, 5.43s/it] {'loss': 0.589, 'learning_rate': 1.4382478663171449e-05, 'epoch': 0.38} 38%|███▊ | 2165/5773 [3:22:17<5:26:29, 5.43s/it] 38%|███▊ | 2166/5773 [3:22:28<5:28:47, 5.47s/it] 38%|███▊ | 2166/5773 [3:22:23<5:28:48, 5.47s/it]{'loss': 0.5853, 'learning_rate': 1.4377434512879087e-05, 'epoch': 0.38} {'loss': 0.5853, 'learning_rate': 1.4377434512879087e-05, 'epoch': 0.38} 38%|███▊ | 2166/5773 [3:22:28<5:28:47, 5.47s/it] 38%|███▊ | 2166/5773 [3:22:23<5:28:48, 5.47s/it] 38%|███▊ | 2167/5773 [3:22:34<5:26:22, 5.43s/it] 38%|███▊ | 2167/5773 [3:22:28<5:26:22, 5.43s/it] {'loss': 0.5689, 'learning_rate': 1.4372388984430486e-05, 'epoch': 0.38} 38%|███▊ | 
2167/5773 [3:22:34<5:26:22, 5.43s/it] {'loss': 0.5689, 'learning_rate': 1.4372388984430486e-05, 'epoch': 0.38} 38%|███▊ | 2167/5773 [3:22:28<5:26:22, 5.43s/it] 38%|███▊ | 2168/5773 [3:22:34<5:25:22, 5.42s/it] 38%|███▊ | 2168/5773 [3:22:39<5:25:23, 5.42s/it] {'loss': 0.5755, 'learning_rate': 1.4367342079414135e-05, 'epoch': 0.38} 38%|███▊ | 2168/5773 [3:22:39<5:25:23, 5.42s/it] {'loss': 0.5755, 'learning_rate': 1.4367342079414135e-05, 'epoch': 0.38} 38%|███▊ | 2168/5773 [3:22:34<5:25:22, 5.42s/it] 38%|███▊ | 2169/5773 [3:22:45<5:26:49, 5.44s/it] 38%|███▊ | 2169/5773 [3:22:39<5:26:50, 5.44s/it] {'loss': 0.5804, 'learning_rate': 1.4362293799418961e-05, 'epoch': 0.38} 38%|███▊ | 2169/5773 [3:22:45<5:26:49, 5.44s/it] {'loss': 0.5804, 'learning_rate': 1.4362293799418961e-05, 'epoch': 0.38} 38%|███▊ | 2169/5773 [3:22:39<5:26:50, 5.44s/it] 38%|███▊ | 2170/5773 [3:22:50<5:30:07, 5.50s/it] 38%|███▊ | 2170/5773 [3:22:45<5:30:08, 5.50s/it] {'loss': 0.5811, 'learning_rate': 1.4357244146034325e-05, 'epoch': 0.38} 38%|███▊ | 2170/5773 [3:22:50<5:30:07, 5.50s/it] {'loss': 0.5811, 'learning_rate': 1.4357244146034325e-05, 'epoch': 0.38} 38%|███▊ | 2170/5773 [3:22:45<5:30:08, 5.50s/it] 38%|███▊ | 2171/5773 [3:22:56<5:28:55, 5.48s/it] 38%|███▊ | 2171/5773 [3:22:50<5:28:55, 5.48s/it] {'loss': 0.5744, 'learning_rate': 1.435219312085002e-05, 'epoch': 0.38} 38%|███▊ | 2171/5773 [3:22:56<5:28:55, 5.48s/it] {'loss': 0.5744, 'learning_rate': 1.435219312085002e-05, 'epoch': 0.38} 38%|███▊ | 2171/5773 [3:22:50<5:28:55, 5.48s/it] 38%|███▊ | 2172/5773 [3:22:56<5:28:11, 5.47s/it] 38%|███▊ | 2172/5773 [3:23:01<5:28:14, 5.47s/it] {'loss': 0.5804, 'learning_rate': 1.4347140725456265e-05, 'epoch': 0.38} 38%|███▊ | 2172/5773 [3:23:01<5:28:14, 5.47s/it] {'loss': 0.5804, 'learning_rate': 1.4347140725456265e-05, 'epoch': 0.38} 38%|███▊ | 2172/5773 [3:22:56<5:28:11, 5.47s/it] 38%|███▊ | 2173/5773 [3:23:07<5:30:21, 5.51s/it] 38%|███▊ | 2173/5773 [3:23:01<5:30:22, 5.51s/it] {'loss': 0.5637, 'learning_rate': 
1.4342086961443725e-05, 'epoch': 0.38} 38%|███▊ | 2173/5773 [3:23:07<5:30:21, 5.51s/it] {'loss': 0.5637, 'learning_rate': 1.4342086961443725e-05, 'epoch': 0.38} 38%|███▊ | 2173/5773 [3:23:01<5:30:22, 5.51s/it] 38%|███▊ | 2174/5773 [3:23:07<5:30:25, 5.51s/it] 38%|███▊ | 2174/5773 [3:23:12<5:30:25, 5.51s/it] {'loss': 0.584, 'learning_rate': 1.4337031830403478e-05, 'epoch': 0.38} 38%|███▊ | 2174/5773 [3:23:12<5:30:25, 5.51s/it] {'loss': 0.584, 'learning_rate': 1.4337031830403478e-05, 'epoch': 0.38} 38%|███▊ | 2174/5773 [3:23:07<5:30:25, 5.51s/it] 38%|███▊ | 2175/5773 [3:23:18<5:29:16, 5.49s/it] 38%|███▊ | 2175/5773 [3:23:12<5:29:17, 5.49s/it] {'loss': 0.5875, 'learning_rate': 1.4331975333927042e-05, 'epoch': 0.38} 38%|███▊ | 2175/5773 [3:23:18<5:29:16, 5.49s/it] {'loss': 0.5875, 'learning_rate': 1.4331975333927042e-05, 'epoch': 0.38} 38%|███▊ | 2175/5773 [3:23:12<5:29:17, 5.49s/it] 38%|███▊ | 2176/5773 [3:23:23<5:30:01, 5.51s/it] 38%|███▊ | 2176/5773 [3:23:18<5:30:02, 5.51s/it] {'loss': 0.5945, 'learning_rate': 1.4326917473606368e-05, 'epoch': 0.38} 38%|███▊ | 2176/5773 [3:23:23<5:30:01, 5.51s/it] {'loss': 0.5945, 'learning_rate': 1.4326917473606368e-05, 'epoch': 0.38} 38%|███▊ | 2176/5773 [3:23:18<5:30:02, 5.51s/it] 38%|███▊ | 2177/5773 [3:23:29<5:28:37, 5.48s/it] 38%|███▊ | 2177/5773 [3:23:23<5:28:37, 5.48s/it] {'loss': 0.5772, 'learning_rate': 1.4321858251033827e-05, 'epoch': 0.38} 38%|███▊ | 2177/5773 [3:23:29<5:28:37, 5.48s/it] {'loss': 0.5772, 'learning_rate': 1.4321858251033827e-05, 'epoch': 0.38} 38%|███▊ | 2177/5773 [3:23:23<5:28:37, 5.48s/it] 38%|███▊ | 2178/5773 [3:23:34<5:26:45, 5.45s/it] 38%|███▊ | 2178/5773 [3:23:28<5:26:46, 5.45s/it] {'loss': 0.5749, 'learning_rate': 1.4316797667802229e-05, 'epoch': 0.38} 38%|███▊ | 2178/5773 [3:23:34<5:26:45, 5.45s/it] {'loss': 0.5749, 'learning_rate': 1.4316797667802229e-05, 'epoch': 0.38} 38%|███▊ | 2178/5773 [3:23:28<5:26:46, 5.45s/it] 38%|███▊ | 2179/5773 [3:23:39<5:27:35, 5.47s/it] 38%|███▊ | 2179/5773 
[3:23:34<5:27:36, 5.47s/it] {'loss': 0.5702, 'learning_rate': 1.4311735725504803e-05, 'epoch': 0.38} 38%|███▊ | 2179/5773 [3:23:39<5:27:35, 5.47s/it] {'loss': 0.5702, 'learning_rate': 1.4311735725504803e-05, 'epoch': 0.38} 38%|███▊ | 2179/5773 [3:23:34<5:27:36, 5.47s/it] 38%|███▊ | 2180/5773 [3:23:39<5:26:30, 5.45s/it] 38%|███▊ | 2180/5773 [3:23:45<5:26:31, 5.45s/it] {'loss': 0.5893, 'learning_rate': 1.430667242573521e-05, 'epoch': 0.38} 38%|███▊ | 2180/5773 [3:23:45<5:26:31, 5.45s/it] {'loss': 0.5893, 'learning_rate': 1.430667242573521e-05, 'epoch': 0.38} 38%|███▊ | 2180/5773 [3:23:39<5:26:30, 5.45s/it] 38%|███▊ | 2181/5773 [3:23:51<5:29:45, 5.51s/it] 38%|███▊ | 2181/5773 [3:23:45<5:29:45, 5.51s/it] {'loss': 0.5821, 'learning_rate': 1.4301607770087542e-05, 'epoch': 0.38} 38%|███▊ | 2181/5773 [3:23:51<5:29:45, 5.51s/it] {'loss': 0.5821, 'learning_rate': 1.4301607770087542e-05, 'epoch': 0.38} 38%|███▊ | 2181/5773 [3:23:45<5:29:45, 5.51s/it] 38%|███▊ | 2182/5773 [3:23:50<5:27:51, 5.48s/it] 38%|███▊ | 2182/5773 [3:23:56<5:27:52, 5.48s/it] {'loss': 0.568, 'learning_rate': 1.429654176015631e-05, 'epoch': 0.38} 38%|███▊ | 2182/5773 [3:23:56<5:27:52, 5.48s/it] {'loss': 0.568, 'learning_rate': 1.429654176015631e-05, 'epoch': 0.38} 38%|███▊ | 2182/5773 [3:23:50<5:27:51, 5.48s/it] 38%|███▊ | 2183/5773 [3:23:56<5:25:31, 5.44s/it] 38%|███▊ | 2183/5773 [3:24:01<5:25:32, 5.44s/it] {'loss': 0.5948, 'learning_rate': 1.4291474397536463e-05, 'epoch': 0.38} 38%|███▊ | 2183/5773 [3:24:01<5:25:32, 5.44s/it] {'loss': 0.5948, 'learning_rate': 1.4291474397536463e-05, 'epoch': 0.38} 38%|███▊ | 2183/5773 [3:23:56<5:25:31, 5.44s/it] 38%|███▊ | 2184/5773 [3:24:01<5:25:37, 5.44s/it] 38%|███▊ | 2184/5773 [3:24:07<5:25:37, 5.44s/it] {'loss': 0.5759, 'learning_rate': 1.4286405683823359e-05, 'epoch': 0.38} 38%|███▊ | 2184/5773 [3:24:07<5:25:37, 5.44s/it] {'loss': 0.5759, 'learning_rate': 1.4286405683823359e-05, 'epoch': 0.38} 38%|███▊ | 2184/5773 [3:24:01<5:25:37, 5.44s/it] 38%|███▊ | 2185/5773 
[3:24:07<5:23:57, 5.42s/it] 38%|███▊ | 2185/5773 [3:24:12<5:23:57, 5.42s/it] {'loss': 0.5888, 'learning_rate': 1.42813356206128e-05, 'epoch': 0.38} 38%|███▊ | 2185/5773 [3:24:12<5:23:57, 5.42s/it] {'loss': 0.5888, 'learning_rate': 1.42813356206128e-05, 'epoch': 0.38} 38%|███▊ | 2185/5773 [3:24:07<5:23:57, 5.42s/it] 38%|███▊ | 2186/5773 [3:24:12<5:26:20, 5.46s/it] 38%|███▊ | 2186/5773 [3:24:18<5:26:21, 5.46s/it] {'loss': 0.5807, 'learning_rate': 1.4276264209500998e-05, 'epoch': 0.38} 38%|███▊ | 2186/5773 [3:24:18<5:26:21, 5.46s/it] {'loss': 0.5807, 'learning_rate': 1.4276264209500998e-05, 'epoch': 0.38} 38%|███▊ | 2186/5773 [3:24:12<5:26:20, 5.46s/it] 38%|███▊ | 2187/5773 [3:24:17<5:23:11, 5.41s/it] 38%|███▊ | 2187/5773 [3:24:23<5:23:11, 5.41s/it] {'loss': 0.5723, 'learning_rate': 1.4271191452084598e-05, 'epoch': 0.38} 38%|███▊ | 2187/5773 [3:24:23<5:23:11, 5.41s/it] {'loss': 0.5723, 'learning_rate': 1.4271191452084598e-05, 'epoch': 0.38} 38%|███▊ | 2187/5773 [3:24:17<5:23:11, 5.41s/it] 38%|███▊ | 2188/5773 [3:24:28<5:23:39, 5.42s/it] 38%|███▊ | 2188/5773 [3:24:23<5:23:40, 5.42s/it] {'loss': 0.552, 'learning_rate': 1.4266117349960661e-05, 'epoch': 0.38} 38%|███▊ | 2188/5773 [3:24:28<5:23:39, 5.42s/it] {'loss': 0.552, 'learning_rate': 1.4266117349960661e-05, 'epoch': 0.38} 38%|███▊ | 2188/5773 [3:24:23<5:23:40, 5.42s/it] 38%|███▊ | 2189/5773 [3:24:34<5:20:13, 5.36s/it] 38%|███▊ | 2189/5773 [3:24:28<5:20:14, 5.36s/it] {'loss': 0.5933, 'learning_rate': 1.4261041904726687e-05, 'epoch': 0.38} 38%|███▊ | 2189/5773 [3:24:34<5:20:13, 5.36s/it] {'loss': 0.5933, 'learning_rate': 1.4261041904726687e-05, 'epoch': 0.38} 38%|███▊ | 2189/5773 [3:24:28<5:20:14, 5.36s/it] 38%|███▊ | 2190/5773 [3:24:39<5:16:46, 5.30s/it] 38%|███▊ | 2190/5773 [3:24:33<5:16:46, 5.30s/it] {'loss': 0.5877, 'learning_rate': 1.425596511798058e-05, 'epoch': 0.38} 38%|███▊ | 2190/5773 [3:24:39<5:16:46, 5.30s/it] {'loss': 0.5877, 'learning_rate': 1.425596511798058e-05, 'epoch': 0.38} 38%|███▊ | 2190/5773 
[3:24:33<5:16:46, 5.30s/it] 38%|███▊ | 2191/5773 [3:24:44<5:18:20, 5.33s/it] 38%|███▊ | 2191/5773 [3:24:39<5:18:20, 5.33s/it] {'loss': 0.5773, 'learning_rate': 1.4250886991320676e-05, 'epoch': 0.38} 38%|███▊ | 2191/5773 [3:24:44<5:18:20, 5.33s/it] {'loss': 0.5773, 'learning_rate': 1.4250886991320676e-05, 'epoch': 0.38} 38%|███▊ | 2191/5773 [3:24:39<5:18:20, 5.33s/it] 38%|███▊ | 2192/5773 [3:24:44<5:18:47, 5.34s/it] 38%|███▊ | 2192/5773 [3:24:50<5:18:47, 5.34s/it] {'loss': 0.5846, 'learning_rate': 1.4245807526345732e-05, 'epoch': 0.38} 38%|███▊ | 2192/5773 [3:24:50<5:18:47, 5.34s/it] {'loss': 0.5846, 'learning_rate': 1.4245807526345732e-05, 'epoch': 0.38} 38%|███▊ | 2192/5773 [3:24:44<5:18:47, 5.34s/it] 38%|███▊ | 2193/5773 [3:24:49<5:18:01, 5.33s/it] {'loss': 0.5938, 'learning_rate': 1.4240726724654925e-05, 'epoch': 0.38} 38%|███▊ | 2193/5773 [3:24:49<5:18:01, 5.33s/it] 38%|███▊ | 2193/5773 [3:24:55<5:18:02, 5.33s/it] {'loss': 0.5938, 'learning_rate': 1.4240726724654925e-05, 'epoch': 0.38} 38%|███▊ | 2193/5773 [3:24:55<5:18:02, 5.33s/it] 38%|███▊ | 2194/5773 [3:24:55<5:19:58, 5.36s/it] 38%|███▊ | 2194/5773 [3:25:00<5:19:58, 5.36s/it] {'loss': 0.5792, 'learning_rate': 1.4235644587847857e-05, 'epoch': 0.38} 38%|███▊ | 2194/5773 [3:25:00<5:19:58, 5.36s/it] {'loss': 0.5792, 'learning_rate': 1.4235644587847857e-05, 'epoch': 0.38} 38%|███▊ | 2194/5773 [3:24:55<5:19:58, 5.36s/it] 38%|███▊ | 2195/5773 [3:25:06<5:20:50, 5.38s/it] 38%|███▊ | 2195/5773 [3:25:00<5:20:50, 5.38s/it] {'loss': 0.5858, 'learning_rate': 1.4230561117524545e-05, 'epoch': 0.38} 38%|███▊ | 2195/5773 [3:25:06<5:20:50, 5.38s/it] {'loss': 0.5858, 'learning_rate': 1.4230561117524545e-05, 'epoch': 0.38} 38%|███▊ | 2195/5773 [3:25:00<5:20:50, 5.38s/it] 38%|███▊ | 2196/5773 [3:25:06<5:21:19, 5.39s/it] 38%|███▊ | 2196/5773 [3:25:11<5:21:18, 5.39s/it] {'loss': 0.5721, 'learning_rate': 1.4225476315285427e-05, 'epoch': 0.38} 38%|███▊ | 2196/5773 [3:25:11<5:21:18, 5.39s/it] {'loss': 0.5721, 'learning_rate': 
1.4225476315285427e-05, 'epoch': 0.38} 38%|███▊ | 2196/5773 [3:25:06<5:21:19, 5.39s/it] 38%|███▊ | 2197/5773 [3:25:11<5:20:46, 5.38s/it] 38%|███▊ | 2197/5773 [3:25:16<5:20:46, 5.38s/it] {'loss': 0.5819, 'learning_rate': 1.4220390182731361e-05, 'epoch': 0.38} 38%|███▊ | 2197/5773 [3:25:16<5:20:46, 5.38s/it] {'loss': 0.5819, 'learning_rate': 1.4220390182731361e-05, 'epoch': 0.38} 38%|███▊ | 2197/5773 [3:25:11<5:20:46, 5.38s/it] 38%|███▊ | 2198/5773 [3:25:16<5:20:04, 5.37s/it] 38%|███▊ | 2198/5773 [3:25:22<5:20:05, 5.37s/it] {'loss': 0.5621, 'learning_rate': 1.4215302721463624e-05, 'epoch': 0.38} 38%|███▊ | 2198/5773 [3:25:22<5:20:05, 5.37s/it] {'loss': 0.5621, 'learning_rate': 1.4215302721463624e-05, 'epoch': 0.38} 38%|███▊ | 2198/5773 [3:25:16<5:20:04, 5.37s/it] 38%|███▊ | 2199/5773 [3:25:22<5:19:37, 5.37s/it] 38%|███▊ | 2199/5773 [3:25:27<5:19:37, 5.37s/it] {'loss': 0.5653, 'learning_rate': 1.4210213933083916e-05, 'epoch': 0.38} 38%|███▊ | 2199/5773 [3:25:27<5:19:37, 5.37s/it] {'loss': 0.5653, 'learning_rate': 1.4210213933083916e-05, 'epoch': 0.38} 38%|███▊ | 2199/5773 [3:25:22<5:19:37, 5.37s/it]15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend...2 AutoResumeHook: Checking whether to suspend... 60 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend...4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 38%|███▊ | 2200/5773 [3:25:33<5:24:18, 5.45s/it] 38%|███▊ | 2200/5773 [3:25:27<5:24:19, 5.45s/it]5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5767, 'learning_rate': 1.4205123819194344e-05, 'epoch': 0.38} 38%|███▊ | 2200/5773 [3:25:33<5:24:18, 5.45s/it] {'loss': 0.5767, 'learning_rate': 1.4205123819194344e-05, 'epoch': 0.38} 38%|███▊ | 2200/5773 [3:25:27<5:24:19, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 38%|███▊ | 2201/5773 [3:25:54<10:05:57, 10.18s/it] 38%|███▊ | 2201/5773 [3:25:48<10:05:58, 10.18s/it] {'loss': 0.5765, 'learning_rate': 1.4200032381397439e-05, 'epoch': 0.38} 38%|███▊ | 2201/5773 [3:25:54<10:05:57, 10.18s/it] {'loss': 0.5765, 'learning_rate': 1.4200032381397439e-05, 'epoch': 0.38} 38%|███▊ | 2201/5773 [3:25:48<10:05:58, 10.18s/it] 38%|███▊ | 2202/5773 [3:25:59<8:40:00, 8.74s/it] 38%|███▊ | 2202/5773 [3:25:54<8:40:01, 8.74s/it] {'loss': 0.57, 'learning_rate': 1.4194939621296153e-05, 'epoch': 0.38} 38%|███▊ | 2202/5773 [3:25:59<8:40:00, 8.74s/it] {'loss': 0.57, 'learning_rate': 1.4194939621296153e-05, 'epoch': 0.38} 38%|███▊ | 2202/5773 [3:25:54<8:40:01, 8.74s/it] 38%|███▊ | 2203/5773 [3:26:05<7:43:03, 7.78s/it] 38%|███▊ | 2203/5773 [3:25:59<7:43:03, 7.78s/it] {'loss': 0.5681, 'learning_rate': 1.4189845540493845e-05, 'epoch': 0.38} 38%|███▊ | 2203/5773 [3:26:05<7:43:03, 7.78s/it] {'loss': 0.5681, 'learning_rate': 1.4189845540493845e-05, 'epoch': 0.38} 38%|███▊ | 2203/5773 [3:25:59<7:43:03, 7.78s/it] 38%|███▊ | 2204/5773 [3:26:10<7:00:49, 7.07s/it] 38%|███▊ | 2204/5773 [3:26:05<7:00:50, 7.07s/it] {'loss': 0.567, 'learning_rate': 1.41847501405943e-05, 'epoch': 0.38} 38%|███▊ | 2204/5773 [3:26:10<7:00:49, 7.07s/it] {'loss': 0.567, 'learning_rate': 1.41847501405943e-05, 'epoch': 0.38} 38%|███▊ | 2204/5773 [3:26:05<7:00:50, 7.07s/it] 38%|███▊ | 2205/5773 [3:26:16<6:34:57, 6.64s/it] 38%|███▊ | 2205/5773 [3:26:10<6:34:58, 6.64s/it] {'loss': 0.5731, 'learning_rate': 1.4179653423201705e-05, 'epoch': 0.38} 38%|███▊ | 2205/5773 [3:26:16<6:34:57, 6.64s/it] {'loss': 0.5731, 'learning_rate': 1.4179653423201705e-05, 'epoch': 0.38} 38%|███▊ | 2205/5773 [3:26:10<6:34:58, 6.64s/it] 38%|███▊ | 2206/5773 [3:26:21<6:11:19, 6.25s/it] 38%|███▊ | 2206/5773 [3:26:16<6:11:19, 6.25s/it] {'loss': 0.5854, 'learning_rate': 1.4174555389920675e-05, 'epoch': 0.38} 38%|███▊ | 2206/5773 [3:26:21<6:11:19, 6.25s/it] {'loss': 0.5854, 'learning_rate': 
1.4174555389920675e-05, 'epoch': 0.38} 38%|███▊ | 2206/5773 [3:26:16<6:11:19, 6.25s/it] 38%|███▊ | 2207/5773 [3:26:27<5:55:20, 5.98s/it] 38%|███▊ | 2207/5773 [3:26:21<5:55:20, 5.98s/it] {'loss': 0.5846, 'learning_rate': 1.4169456042356233e-05, 'epoch': 0.38} 38%|███▊ | 2207/5773 [3:26:27<5:55:20, 5.98s/it] {'loss': 0.5846, 'learning_rate': 1.4169456042356233e-05, 'epoch': 0.38} 38%|███▊ | 2207/5773 [3:26:21<5:55:20, 5.98s/it] 38%|███▊ | 2208/5773 [3:26:32<5:43:23, 5.78s/it] 38%|███▊ | 2208/5773 [3:26:26<5:43:23, 5.78s/it] {'loss': 0.5855, 'learning_rate': 1.4164355382113813e-05, 'epoch': 0.38} 38%|███▊ | 2208/5773 [3:26:32<5:43:23, 5.78s/it] {'loss': 0.5855, 'learning_rate': 1.4164355382113813e-05, 'epoch': 0.38} 38%|███▊ | 2208/5773 [3:26:26<5:43:23, 5.78s/it] 38%|███▊ | 2209/5773 [3:26:38<5:39:42, 5.72s/it] 38%|███▊ | 2209/5773 [3:26:32<5:39:42, 5.72s/it] {'loss': 0.5764, 'learning_rate': 1.4159253410799272e-05, 'epoch': 0.38} 38%|███▊ | 2209/5773 [3:26:38<5:39:42, 5.72s/it] {'loss': 0.5764, 'learning_rate': 1.4159253410799272e-05, 'epoch': 0.38} 38%|███▊ | 2209/5773 [3:26:32<5:39:42, 5.72s/it] 38%|███▊ | 2210/5773 [3:26:43<5:36:27, 5.67s/it] 38%|███▊ | 2210/5773 [3:26:38<5:36:27, 5.67s/it] {'loss': 0.5846, 'learning_rate': 1.4154150130018867e-05, 'epoch': 0.38} 38%|███▊ | 2210/5773 [3:26:43<5:36:27, 5.67s/it] {'loss': 0.5846, 'learning_rate': 1.4154150130018867e-05, 'epoch': 0.38} 38%|███▊ | 2210/5773 [3:26:38<5:36:27, 5.67s/it] 38%|███▊ | 2211/5773 [3:26:49<5:32:03, 5.59s/it] 38%|███▊ | 2211/5773 [3:26:43<5:32:03, 5.59s/it] {'loss': 0.5831, 'learning_rate': 1.4149045541379276e-05, 'epoch': 0.38} 38%|███▊ | 2211/5773 [3:26:49<5:32:03, 5.59s/it] {'loss': 0.5831, 'learning_rate': 1.4149045541379276e-05, 'epoch': 0.38} 38%|███▊ | 2211/5773 [3:26:43<5:32:03, 5.59s/it] 38%|███▊ | 2212/5773 [3:26:54<5:28:28, 5.53s/it] 38%|███▊ | 2212/5773 [3:26:48<5:28:29, 5.53s/it] {'loss': 0.5904, 'learning_rate': 1.414393964648759e-05, 'epoch': 0.38} 38%|███▊ | 2212/5773 
[3:26:54<5:28:28, 5.53s/it] {'loss': 0.5904, 'learning_rate': 1.414393964648759e-05, 'epoch': 0.38} 38%|███▊ | 2212/5773 [3:26:48<5:28:29, 5.53s/it] 38%|███▊ | 2213/5773 [3:26:59<5:27:35, 5.52s/it] 38%|███▊ | 2213/5773 [3:26:54<5:27:35, 5.52s/it] {'loss': 0.5725, 'learning_rate': 1.4138832446951305e-05, 'epoch': 0.38} 38%|███▊ | 2213/5773 [3:26:59<5:27:35, 5.52s/it] {'loss': 0.5725, 'learning_rate': 1.4138832446951305e-05, 'epoch': 0.38} 38%|███▊ | 2213/5773 [3:26:54<5:27:35, 5.52s/it] 38%|███▊ | 2214/5773 [3:27:05<5:24:29, 5.47s/it] 38%|███▊ | 2214/5773 [3:26:59<5:24:30, 5.47s/it] {'loss': 0.5893, 'learning_rate': 1.413372394437833e-05, 'epoch': 0.38} 38%|███▊ | 2214/5773 [3:27:05<5:24:29, 5.47s/it] {'loss': 0.5893, 'learning_rate': 1.413372394437833e-05, 'epoch': 0.38} 38%|███▊ | 2214/5773 [3:26:59<5:24:30, 5.47s/it] 38%|███▊ | 2215/5773 [3:27:10<5:20:51, 5.41s/it] 38%|███▊ | 2215/5773 [3:27:05<5:20:50, 5.41s/it] {'loss': 0.5712, 'learning_rate': 1.4128614140376985e-05, 'epoch': 0.38} 38%|███▊ | 2215/5773 [3:27:10<5:20:51, 5.41s/it] {'loss': 0.5712, 'learning_rate': 1.4128614140376985e-05, 'epoch': 0.38} 38%|███▊ | 2215/5773 [3:27:05<5:20:50, 5.41s/it] 38%|███▊ | 2216/5773 [3:27:15<5:20:43, 5.41s/it] 38%|███▊ | 2216/5773 [3:27:10<5:20:43, 5.41s/it] {'loss': 0.5924, 'learning_rate': 1.4123503036556003e-05, 'epoch': 0.38} 38%|███▊ | 2216/5773 [3:27:15<5:20:43, 5.41s/it] {'loss': 0.5924, 'learning_rate': 1.4123503036556003e-05, 'epoch': 0.38} 38%|███▊ | 2216/5773 [3:27:10<5:20:43, 5.41s/it] 38%|███▊ | 2217/5773 [3:27:21<5:20:05, 5.40s/it] 38%|███▊ | 2217/5773 [3:27:15<5:20:04, 5.40s/it] {'loss': 0.5681, 'learning_rate': 1.411839063452452e-05, 'epoch': 0.38} 38%|███▊ | 2217/5773 [3:27:21<5:20:05, 5.40s/it] {'loss': 0.5681, 'learning_rate': 1.411839063452452e-05, 'epoch': 0.38} 38%|███▊ | 2217/5773 [3:27:15<5:20:04, 5.40s/it] 38%|███▊ | 2218/5773 [3:27:26<5:21:06, 5.42s/it] {'loss': 0.5828, 'learning_rate': 1.4113276935892079e-05, 'epoch': 0.38} 38%|███▊ | 2218/5773 
[3:27:26<5:21:06, 5.42s/it] 38%|███▊ | 2218/5773 [3:27:21<5:21:06, 5.42s/it] {'loss': 0.5828, 'learning_rate': 1.4113276935892079e-05, 'epoch': 0.38} 38%|███▊ | 2218/5773 [3:27:21<5:21:06, 5.42s/it] 38%|███▊ | 2219/5773 [3:27:32<5:23:37, 5.46s/it] 38%|███▊ | 2219/5773 [3:27:26<5:23:37, 5.46s/it] {'loss': 0.574, 'learning_rate': 1.4108161942268643e-05, 'epoch': 0.38} 38%|███▊ | 2219/5773 [3:27:32<5:23:37, 5.46s/it] {'loss': 0.574, 'learning_rate': 1.4108161942268643e-05, 'epoch': 0.38} 38%|███▊ | 2219/5773 [3:27:26<5:23:37, 5.46s/it] 38%|███▊ | 2220/5773 [3:27:37<5:21:21, 5.43s/it] 38%|███▊ | 2220/5773 [3:27:32<5:21:21, 5.43s/it] {'loss': 0.5678, 'learning_rate': 1.4103045655264576e-05, 'epoch': 0.38} 38%|███▊ | 2220/5773 [3:27:37<5:21:21, 5.43s/it] {'loss': 0.5678, 'learning_rate': 1.4103045655264576e-05, 'epoch': 0.38} 38%|███▊ | 2220/5773 [3:27:32<5:21:21, 5.43s/it] 38%|███▊ | 2221/5773 [3:27:43<5:21:27, 5.43s/it] 38%|███▊ | 2221/5773 [3:27:37<5:21:27, 5.43s/it] {'loss': 0.5674, 'learning_rate': 1.409792807649064e-05, 'epoch': 0.38} 38%|███▊ | 2221/5773 [3:27:43<5:21:27, 5.43s/it] {'loss': 0.5674, 'learning_rate': 1.409792807649064e-05, 'epoch': 0.38} 38%|███▊ | 2221/5773 [3:27:37<5:21:27, 5.43s/it] 38%|███▊ | 2222/5773 [3:27:42<5:19:12, 5.39s/it] 38%|███▊ | 2222/5773 [3:27:48<5:19:12, 5.39s/it] {'loss': 0.5711, 'learning_rate': 1.4092809207558022e-05, 'epoch': 0.38} 38%|███▊ | 2222/5773 [3:27:48<5:19:12, 5.39s/it] {'loss': 0.5711, 'learning_rate': 1.4092809207558022e-05, 'epoch': 0.38} 38%|███▊ | 2222/5773 [3:27:42<5:19:12, 5.39s/it] 39%|███▊ | 2223/5773 [3:27:53<5:21:01, 5.43s/it] 39%|███▊ | 2223/5773 [3:27:48<5:21:01, 5.43s/it] {'loss': 0.5745, 'learning_rate': 1.40876890500783e-05, 'epoch': 0.39} 39%|███▊ | 2223/5773 [3:27:53<5:21:01, 5.43s/it] {'loss': 0.5745, 'learning_rate': 1.40876890500783e-05, 'epoch': 0.39} 39%|███▊ | 2223/5773 [3:27:48<5:21:01, 5.43s/it] 39%|███▊ | 2224/5773 [3:27:59<5:21:14, 5.43s/it] 39%|███▊ | 2224/5773 [3:27:53<5:21:14, 5.43s/it] 
{'loss': 0.5852, 'learning_rate': 1.4082567605663461e-05, 'epoch': 0.39} 39%|███▊ | 2224/5773 [3:27:59<5:21:14, 5.43s/it] {'loss': 0.5852, 'learning_rate': 1.4082567605663461e-05, 'epoch': 0.39} 39%|███▊ | 2224/5773 [3:27:53<5:21:14, 5.43s/it] 39%|███▊ | 2225/5773 [3:28:04<5:18:31, 5.39s/it] 39%|███▊ | 2225/5773 [3:27:59<5:18:31, 5.39s/it] {'loss': 0.581, 'learning_rate': 1.4077444875925906e-05, 'epoch': 0.39} 39%|███▊ | 2225/5773 [3:28:04<5:18:31, 5.39s/it] {'loss': 0.581, 'learning_rate': 1.4077444875925906e-05, 'epoch': 0.39} 39%|███▊ | 2225/5773 [3:27:59<5:18:31, 5.39s/it] 39%|███▊ | 2226/5773 [3:28:04<5:15:13, 5.33s/it] 39%|███▊ | 2226/5773 [3:28:09<5:15:13, 5.33s/it] {'loss': 0.5887, 'learning_rate': 1.4072320862478428e-05, 'epoch': 0.39} 39%|███▊ | 2226/5773 [3:28:09<5:15:13, 5.33s/it] {'loss': 0.5887, 'learning_rate': 1.4072320862478428e-05, 'epoch': 0.39} 39%|███▊ | 2226/5773 [3:28:04<5:15:13, 5.33s/it] 39%|███▊ | 2227/5773 [3:28:15<5:18:21, 5.39s/it] 39%|███▊ | 2227/5773 [3:28:09<5:18:21, 5.39s/it] {'loss': 0.5813, 'learning_rate': 1.4067195566934235e-05, 'epoch': 0.39} 39%|███▊ | 2227/5773 [3:28:15<5:18:21, 5.39s/it] {'loss': 0.5813, 'learning_rate': 1.4067195566934235e-05, 'epoch': 0.39} 39%|███▊ | 2227/5773 [3:28:09<5:18:21, 5.39s/it] 39%|███▊ | 2228/5773 [3:28:20<5:18:10, 5.39s/it] 39%|███▊ | 2228/5773 [3:28:15<5:18:10, 5.39s/it] {'loss': 0.5849, 'learning_rate': 1.4062068990906933e-05, 'epoch': 0.39} 39%|███▊ | 2228/5773 [3:28:20<5:18:10, 5.39s/it] {'loss': 0.5849, 'learning_rate': 1.4062068990906933e-05, 'epoch': 0.39} 39%|███▊ | 2228/5773 [3:28:15<5:18:10, 5.39s/it] 39%|███▊ | 2229/5773 [3:28:26<5:21:10, 5.44s/it] 39%|███▊ | 2229/5773 [3:28:20<5:21:10, 5.44s/it] {'loss': 0.5658, 'learning_rate': 1.4056941136010526e-05, 'epoch': 0.39} 39%|███▊ | 2229/5773 [3:28:26<5:21:10, 5.44s/it] {'loss': 0.5658, 'learning_rate': 1.4056941136010526e-05, 'epoch': 0.39} 39%|███▊ | 2229/5773 [3:28:20<5:21:10, 5.44s/it] 39%|███▊ | 2230/5773 [3:28:26<5:19:08, 
5.40s/it] 39%|███▊ | 2230/5773 [3:28:31<5:19:08, 5.40s/it] {'loss': 0.5808, 'learning_rate': 1.4051812003859436e-05, 'epoch': 0.39} 39%|███▊ | 2230/5773 [3:28:31<5:19:08, 5.40s/it] {'loss': 0.5808, 'learning_rate': 1.4051812003859436e-05, 'epoch': 0.39} 39%|███▊ | 2230/5773 [3:28:26<5:19:08, 5.40s/it] 39%|███▊ | 2231/5773 [3:28:31<5:21:14, 5.44s/it] 39%|███▊ | 2231/5773 [3:28:37<5:21:15, 5.44s/it] {'loss': 0.5666, 'learning_rate': 1.4046681596068468e-05, 'epoch': 0.39} 39%|███▊ | 2231/5773 [3:28:37<5:21:15, 5.44s/it] {'loss': 0.5666, 'learning_rate': 1.4046681596068468e-05, 'epoch': 0.39} 39%|███▊ | 2231/5773 [3:28:31<5:21:14, 5.44s/it] 39%|███▊ | 2232/5773 [3:28:42<5:22:38, 5.47s/it] 39%|███▊ | 2232/5773 [3:28:37<5:22:38, 5.47s/it] {'loss': 0.5738, 'learning_rate': 1.4041549914252843e-05, 'epoch': 0.39} {'loss': 0.5738, 'learning_rate': 1.4041549914252843e-05, 'epoch': 0.39} 39%|███▊ | 2232/5773 [3:28:42<5:22:38, 5.47s/it] 39%|███▊ | 2232/5773 [3:28:37<5:22:38, 5.47s/it] 39%|███▊ | 2233/5773 [3:28:48<5:22:27, 5.47s/it] 39%|███▊ | 2233/5773 [3:28:42<5:22:28, 5.47s/it] {'loss': 0.5732, 'learning_rate': 1.4036416960028181e-05, 'epoch': 0.39} 39%|███▊ | 2233/5773 [3:28:48<5:22:27, 5.47s/it] {'loss': 0.5732, 'learning_rate': 1.4036416960028181e-05, 'epoch': 0.39} 39%|███▊ | 2233/5773 [3:28:42<5:22:28, 5.47s/it] 39%|███▊ | 2234/5773 [3:28:53<5:22:17, 5.46s/it] 39%|███▊ | 2234/5773 [3:28:48<5:22:17, 5.46s/it] {'loss': 0.5713, 'learning_rate': 1.4031282735010499e-05, 'epoch': 0.39} 39%|███▊ | 2234/5773 [3:28:53<5:22:17, 5.46s/it] {'loss': 0.5713, 'learning_rate': 1.4031282735010499e-05, 'epoch': 0.39} 39%|███▊ | 2234/5773 [3:28:48<5:22:17, 5.46s/it] 39%|███▊ | 2235/5773 [3:28:59<5:21:07, 5.45s/it] 39%|███▊ | 2235/5773 [3:28:53<5:21:07, 5.45s/it] {'loss': 0.5678, 'learning_rate': 1.402614724081621e-05, 'epoch': 0.39} 39%|███▊ | 2235/5773 [3:28:59<5:21:07, 5.45s/it] {'loss': 0.5678, 'learning_rate': 1.402614724081621e-05, 'epoch': 0.39} 39%|███▊ | 2235/5773 
[3:28:53<5:21:07, 5.45s/it] 39%|███▊ | 2236/5773 [3:29:04<5:20:18, 5.43s/it] 39%|███▊ | 2236/5773 [3:28:58<5:20:18, 5.43s/it] {'loss': 0.5897, 'learning_rate': 1.4021010479062135e-05, 'epoch': 0.39} 39%|███▊ | 2236/5773 [3:29:04<5:20:18, 5.43s/it] {'loss': 0.5897, 'learning_rate': 1.4021010479062135e-05, 'epoch': 0.39} 39%|███▊ | 2236/5773 [3:28:58<5:20:18, 5.43s/it] 39%|███▊ | 2237/5773 [3:29:09<5:19:49, 5.43s/it] 39%|███▊ | 2237/5773 [3:29:04<5:19:49, 5.43s/it] {'loss': 0.5674, 'learning_rate': 1.4015872451365493e-05, 'epoch': 0.39} 39%|███▊ | 2237/5773 [3:29:09<5:19:49, 5.43s/it] {'loss': 0.5674, 'learning_rate': 1.4015872451365493e-05, 'epoch': 0.39} 39%|███▊ | 2237/5773 [3:29:04<5:19:49, 5.43s/it] 39%|███▉ | 2238/5773 [3:29:15<5:18:03, 5.40s/it] 39%|███▉ | 2238/5773 [3:29:09<5:18:03, 5.40s/it] {'loss': 0.5877, 'learning_rate': 1.4010733159343897e-05, 'epoch': 0.39} 39%|███▉ | 2238/5773 [3:29:15<5:18:03, 5.40s/it] {'loss': 0.5877, 'learning_rate': 1.4010733159343897e-05, 'epoch': 0.39} 39%|███▉ | 2238/5773 [3:29:09<5:18:03, 5.40s/it] 39%|███▉ | 2239/5773 [3:29:20<5:16:46, 5.38s/it] 39%|███▉ | 2239/5773 [3:29:15<5:16:47, 5.38s/it] {'loss': 0.5597, 'learning_rate': 1.4005592604615357e-05, 'epoch': 0.39} 39%|███▉ | 2239/5773 [3:29:20<5:16:46, 5.38s/it] {'loss': 0.5597, 'learning_rate': 1.4005592604615357e-05, 'epoch': 0.39} 39%|███▉ | 2239/5773 [3:29:15<5:16:47, 5.38s/it] 39%|███▉ | 2240/5773 [3:29:25<5:17:41, 5.40s/it] 39%|███▉ | 2240/5773 [3:29:20<5:17:41, 5.40s/it] {'loss': 0.5591, 'learning_rate': 1.400045078879829e-05, 'epoch': 0.39} 39%|███▉ | 2240/5773 [3:29:25<5:17:41, 5.40s/it] {'loss': 0.5591, 'learning_rate': 1.400045078879829e-05, 'epoch': 0.39} 39%|███▉ | 2240/5773 [3:29:20<5:17:41, 5.40s/it] 39%|███▉ | 2241/5773 [3:29:31<5:18:20, 5.41s/it] 39%|███▉ | 2241/5773 [3:29:25<5:18:21, 5.41s/it] {'loss': 0.5781, 'learning_rate': 1.3995307713511504e-05, 'epoch': 0.39} 39%|███▉ | 2241/5773 [3:29:31<5:18:20, 5.41s/it] {'loss': 0.5781, 'learning_rate': 
1.3995307713511504e-05, 'epoch': 0.39} 39%|███▉ | 2241/5773 [3:29:25<5:18:21, 5.41s/it] 39%|███▉ | 2242/5773 [3:29:31<5:19:16, 5.43s/it] 39%|███▉ | 2242/5773 [3:29:36<5:19:16, 5.43s/it] {'loss': 0.5807, 'learning_rate': 1.3990163380374195e-05, 'epoch': 0.39} 39%|███▉ | 2242/5773 [3:29:36<5:19:16, 5.43s/it] {'loss': 0.5807, 'learning_rate': 1.3990163380374195e-05, 'epoch': 0.39} 39%|███▉ | 2242/5773 [3:29:31<5:19:16, 5.43s/it] 39%|███▉ | 2243/5773 [3:29:42<5:17:56, 5.40s/it] 39%|███▉ | 2243/5773 [3:29:36<5:17:56, 5.40s/it] {'loss': 0.5709, 'learning_rate': 1.3985017791005972e-05, 'epoch': 0.39} 39%|███▉ | 2243/5773 [3:29:42<5:17:56, 5.40s/it] {'loss': 0.5709, 'learning_rate': 1.3985017791005972e-05, 'epoch': 0.39} 39%|███▉ | 2243/5773 [3:29:36<5:17:56, 5.40s/it] 39%|███▉ | 2244/5773 [3:29:47<5:17:57, 5.41s/it] 39%|███▉ | 2244/5773 [3:29:42<5:17:58, 5.41s/it] {'loss': 0.5604, 'learning_rate': 1.3979870947026828e-05, 'epoch': 0.39} 39%|███▉ | 2244/5773 [3:29:47<5:17:57, 5.41s/it] {'loss': 0.5604, 'learning_rate': 1.3979870947026828e-05, 'epoch': 0.39} 39%|███▉ | 2244/5773 [3:29:42<5:17:58, 5.41s/it] 39%|███▉ | 2245/5773 [3:29:52<5:15:43, 5.37s/it] 39%|███▉ | 2245/5773 [3:29:47<5:15:43, 5.37s/it] {'loss': 0.5632, 'learning_rate': 1.3974722850057152e-05, 'epoch': 0.39} 39%|███▉ | 2245/5773 [3:29:52<5:15:43, 5.37s/it] {'loss': 0.5632, 'learning_rate': 1.3974722850057152e-05, 'epoch': 0.39} 39%|███▉ | 2245/5773 [3:29:47<5:15:43, 5.37s/it] 39%|███▉ | 2246/5773 [3:29:58<5:15:13, 5.36s/it] 39%|███▉ | 2246/5773 [3:29:52<5:15:12, 5.36s/it] {'loss': 0.5875, 'learning_rate': 1.3969573501717732e-05, 'epoch': 0.39} {'loss': 0.5875, 'learning_rate': 1.3969573501717732e-05, 'epoch': 0.39} 39%|███▉ | 2246/5773 [3:29:58<5:15:13, 5.36s/it] 39%|███▉ | 2246/5773 [3:29:52<5:15:12, 5.36s/it] 39%|███▉ | 2247/5773 [3:30:03<5:17:54, 5.41s/it] 39%|███▉ | 2247/5773 [3:29:58<5:17:55, 5.41s/it] {'loss': 0.574, 'learning_rate': 1.3964422903629746e-05, 'epoch': 0.39} 39%|███▉ | 2247/5773 
[3:30:03<5:17:54, 5.41s/it] {'loss': 0.574, 'learning_rate': 1.3964422903629746e-05, 'epoch': 0.39} 39%|███▉ | 2247/5773 [3:29:58<5:17:55, 5.41s/it] 39%|███▉ | 2248/5773 [3:30:09<5:17:14, 5.40s/it] 39%|███▉ | 2248/5773 [3:30:03<5:17:14, 5.40s/it] {'loss': 0.5714, 'learning_rate': 1.3959271057414769e-05, 'epoch': 0.39} 39%|███▉ | 2248/5773 [3:30:09<5:17:14, 5.40s/it] {'loss': 0.5714, 'learning_rate': 1.3959271057414769e-05, 'epoch': 0.39} 39%|███▉ | 2248/5773 [3:30:03<5:17:14, 5.40s/it] 39%|███▉ | 2249/5773 [3:30:14<5:20:32, 5.46s/it] 39%|███▉ | 2249/5773 [3:30:09<5:20:32, 5.46s/it] {'loss': 0.5809, 'learning_rate': 1.3954117964694767e-05, 'epoch': 0.39} 39%|███▉ | 2249/5773 [3:30:14<5:20:32, 5.46s/it] {'loss': 0.5809, 'learning_rate': 1.3954117964694767e-05, 'epoch': 0.39} 39%|███▉ | 2249/5773 [3:30:09<5:20:32, 5.46s/it]13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 39%|███▉ | 2250/5773 [3:30:20<5:21:41, 5.48s/it]11 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend...0 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 39%|███▉ | 2250/5773 [3:30:14<5:21:41, 5.48s/it]10 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5716, 'learning_rate': 1.3948963627092091e-05, 'epoch': 0.39} 39%|███▉ | 2250/5773 [3:30:20<5:21:41, 5.48s/it] {'loss': 0.5716, 'learning_rate': 1.3948963627092091e-05, 'epoch': 0.39} 39%|███▉ | 2250/5773 [3:30:14<5:21:41, 5.48s/it] 39%|███▉ | 2251/5773 [3:30:26<5:28:23, 5.59s/it] 39%|███▉ | 2251/5773 [3:30:20<5:28:23, 5.59s/it] {'loss': 0.5795, 'learning_rate': 1.3943808046229504e-05, 'epoch': 0.39} 39%|███▉ | 2251/5773 [3:30:26<5:28:23, 5.59s/it] {'loss': 0.5795, 'learning_rate': 1.3943808046229504e-05, 'epoch': 0.39} 39%|███▉ | 2251/5773 [3:30:20<5:28:23, 5.59s/it] 39%|███▉ | 2252/5773 [3:30:31<5:24:41, 5.53s/it] 39%|███▉ | 2252/5773 [3:30:26<5:24:41, 5.53s/it] {'loss': 0.5974, 'learning_rate': 1.3938651223730133e-05, 'epoch': 0.39} 39%|███▉ | 2252/5773 [3:30:31<5:24:41, 5.53s/it] {'loss': 0.5974, 'learning_rate': 1.3938651223730133e-05, 'epoch': 0.39} 39%|███▉ | 2252/5773 [3:30:26<5:24:41, 5.53s/it] 39%|███▉ | 2253/5773 [3:30:37<5:27:09, 5.58s/it] 39%|███▉ | 2253/5773 [3:30:31<5:27:09, 5.58s/it] {'loss': 0.5649, 'learning_rate': 1.3933493161217521e-05, 'epoch': 0.39} 39%|███▉ | 2253/5773 [3:30:37<5:27:09, 5.58s/it] {'loss': 0.5649, 'learning_rate': 1.3933493161217521e-05, 'epoch': 0.39} 39%|███▉ | 2253/5773 [3:30:31<5:27:09, 5.58s/it] 39%|███▉ | 2254/5773 [3:30:42<5:26:05, 5.56s/it] 39%|███▉ | 2254/5773 [3:30:37<5:26:05, 5.56s/it] {'loss': 0.5822, 'learning_rate': 1.3928333860315588e-05, 'epoch': 0.39} 39%|███▉ | 2254/5773 [3:30:42<5:26:05, 5.56s/it] {'loss': 0.5822, 'learning_rate': 1.3928333860315588e-05, 'epoch': 0.39} 39%|███▉ | 2254/5773 [3:30:37<5:26:05, 5.56s/it] 39%|███▉ | 2255/5773 [3:30:47<5:20:20, 5.46s/it] 39%|███▉ | 2255/5773 [3:30:42<5:20:21, 5.46s/it] {'loss': 0.5666, 'learning_rate': 1.3923173322648646e-05, 'epoch': 0.39} 39%|███▉ | 2255/5773 [3:30:47<5:20:20, 5.46s/it] {'loss': 0.5666, 'learning_rate': 1.3923173322648646e-05, 'epoch': 0.39} 39%|███▉ | 2255/5773 [3:30:42<5:20:21, 5.46s/it] 39%|███▉ | 2256/5773 [3:30:47<5:18:40, 
5.44s/it] 39%|███▉ | 2256/5773 [3:30:53<5:18:41, 5.44s/it] {'loss': 0.5797, 'learning_rate': 1.39180115498414e-05, 'epoch': 0.39} 39%|███▉ | 2256/5773 [3:30:53<5:18:41, 5.44s/it] {'loss': 0.5797, 'learning_rate': 1.39180115498414e-05, 'epoch': 0.39} 39%|███▉ | 2256/5773 [3:30:47<5:18:40, 5.44s/it] 39%|███▉ | 2257/5773 [3:30:58<5:16:48, 5.41s/it] 39%|███▉ | 2257/5773 [3:30:53<5:16:48, 5.41s/it] {'loss': 0.5901, 'learning_rate': 1.3912848543518936e-05, 'epoch': 0.39} 39%|███▉ | 2257/5773 [3:30:58<5:16:48, 5.41s/it] {'loss': 0.5901, 'learning_rate': 1.3912848543518936e-05, 'epoch': 0.39} 39%|███▉ | 2257/5773 [3:30:53<5:16:48, 5.41s/it] 39%|███▉ | 2258/5773 [3:31:04<5:16:28, 5.40s/it] 39%|███▉ | 2258/5773 [3:30:58<5:16:28, 5.40s/it] {'loss': 0.5836, 'learning_rate': 1.3907684305306733e-05, 'epoch': 0.39} 39%|███▉ | 2258/5773 [3:31:04<5:16:28, 5.40s/it] {'loss': 0.5836, 'learning_rate': 1.3907684305306733e-05, 'epoch': 0.39} 39%|███▉ | 2258/5773 [3:30:58<5:16:28, 5.40s/it] 39%|███▉ | 2259/5773 [3:31:09<5:16:30, 5.40s/it] 39%|███▉ | 2259/5773 [3:31:03<5:16:30, 5.40s/it] {'loss': 0.5786, 'learning_rate': 1.3902518836830664e-05, 'epoch': 0.39} 39%|███▉ | 2259/5773 [3:31:09<5:16:30, 5.40s/it] {'loss': 0.5786, 'learning_rate': 1.3902518836830664e-05, 'epoch': 0.39} 39%|███▉ | 2259/5773 [3:31:03<5:16:30, 5.40s/it] 39%|███▉ | 2260/5773 [3:31:09<5:20:22, 5.47s/it] {'loss': 0.5739, 'learning_rate': 1.3897352139716979e-05, 'epoch': 0.39} 39%|███▉ | 2260/5773 [3:31:09<5:20:22, 5.47s/it] 39%|███▉ | 2260/5773 [3:31:15<5:20:22, 5.47s/it] {'loss': 0.5739, 'learning_rate': 1.3897352139716979e-05, 'epoch': 0.39} 39%|███▉ | 2260/5773 [3:31:15<5:20:22, 5.47s/it] 39%|███▉ | 2261/5773 [3:31:20<5:19:16, 5.45s/it] 39%|███▉ | 2261/5773 [3:31:15<5:19:16, 5.45s/it] {'loss': 0.5883, 'learning_rate': 1.3892184215592323e-05, 'epoch': 0.39} 39%|███▉ | 2261/5773 [3:31:20<5:19:16, 5.45s/it] {'loss': 0.5883, 'learning_rate': 1.3892184215592323e-05, 'epoch': 0.39} 39%|███▉ | 2261/5773 [3:31:15<5:19:16, 
5.45s/it] 39%|███▉ | 2262/5773 [3:31:25<5:18:02, 5.43s/it] 39%|███▉ | 2262/5773 [3:31:20<5:18:02, 5.43s/it] {'loss': 0.5821, 'learning_rate': 1.3887015066083722e-05, 'epoch': 0.39} 39%|███▉ | 2262/5773 [3:31:25<5:18:02, 5.43s/it] {'loss': 0.5821, 'learning_rate': 1.3887015066083722e-05, 'epoch': 0.39} 39%|███▉ | 2262/5773 [3:31:20<5:18:02, 5.43s/it] 39%|███▉ | 2263/5773 [3:31:31<5:18:36, 5.45s/it] 39%|███▉ | 2263/5773 [3:31:25<5:18:36, 5.45s/it] {'loss': 0.5911, 'learning_rate': 1.3881844692818587e-05, 'epoch': 0.39} 39%|███▉ | 2263/5773 [3:31:31<5:18:36, 5.45s/it] {'loss': 0.5911, 'learning_rate': 1.3881844692818587e-05, 'epoch': 0.39} 39%|███▉ | 2263/5773 [3:31:25<5:18:36, 5.45s/it] 39%|███▉ | 2264/5773 [3:31:36<5:19:51, 5.47s/it] 39%|███▉ | 2264/5773 [3:31:31<5:19:51, 5.47s/it] {'loss': 0.5673, 'learning_rate': 1.387667309742472e-05, 'epoch': 0.39} 39%|███▉ | 2264/5773 [3:31:36<5:19:51, 5.47s/it] {'loss': 0.5673, 'learning_rate': 1.387667309742472e-05, 'epoch': 0.39} 39%|███▉ | 2264/5773 [3:31:31<5:19:51, 5.47s/it] 39%|███▉ | 2265/5773 [3:31:42<5:18:59, 5.46s/it] 39%|███▉ | 2265/5773 [3:31:36<5:18:59, 5.46s/it] {'loss': 0.5801, 'learning_rate': 1.3871500281530302e-05, 'epoch': 0.39} 39%|███▉ | 2265/5773 [3:31:42<5:18:59, 5.46s/it] {'loss': 0.5801, 'learning_rate': 1.3871500281530302e-05, 'epoch': 0.39} 39%|███▉ | 2265/5773 [3:31:36<5:18:59, 5.46s/it] 39%|███▉ | 2266/5773 [3:31:47<5:18:14, 5.44s/it] 39%|███▉ | 2266/5773 [3:31:42<5:18:14, 5.44s/it] {'loss': 0.5702, 'learning_rate': 1.3866326246763902e-05, 'epoch': 0.39} 39%|███▉ | 2266/5773 [3:31:47<5:18:14, 5.44s/it] {'loss': 0.5702, 'learning_rate': 1.3866326246763902e-05, 'epoch': 0.39} 39%|███▉ | 2266/5773 [3:31:42<5:18:14, 5.44s/it] 39%|███▉ | 2267/5773 [3:31:53<5:18:18, 5.45s/it] 39%|███▉ | 2267/5773 [3:31:47<5:18:18, 5.45s/it] {'loss': 0.5766, 'learning_rate': 1.386115099475447e-05, 'epoch': 0.39} 39%|███▉ | 2267/5773 [3:31:53<5:18:18, 5.45s/it] {'loss': 0.5766, 'learning_rate': 1.386115099475447e-05, 
'epoch': 0.39} 39%|███▉ | 2267/5773 [3:31:47<5:18:18, 5.45s/it] 39%|███▉ | 2268/5773 [3:31:58<5:23:00, 5.53s/it] 39%|███▉ | 2268/5773 [3:31:53<5:23:00, 5.53s/it] {'loss': 0.5835, 'learning_rate': 1.3855974527131344e-05, 'epoch': 0.39} 39%|███▉ | 2268/5773 [3:31:58<5:23:00, 5.53s/it] {'loss': 0.5835, 'learning_rate': 1.3855974527131344e-05, 'epoch': 0.39} 39%|███▉ | 2268/5773 [3:31:53<5:23:00, 5.53s/it] 39%|███▉ | 2269/5773 [3:32:04<5:20:34, 5.49s/it] 39%|███▉ | 2269/5773 [3:31:58<5:20:34, 5.49s/it] {'loss': 0.5814, 'learning_rate': 1.3850796845524241e-05, 'epoch': 0.39} 39%|███▉ | 2269/5773 [3:32:04<5:20:34, 5.49s/it] {'loss': 0.5814, 'learning_rate': 1.3850796845524241e-05, 'epoch': 0.39} 39%|███▉ | 2269/5773 [3:31:58<5:20:34, 5.49s/it] 39%|███▉ | 2270/5773 [3:32:09<5:18:15, 5.45s/it] 39%|███▉ | 2270/5773 [3:32:04<5:18:15, 5.45s/it] {'loss': 0.5694, 'learning_rate': 1.3845617951563255e-05, 'epoch': 0.39} 39%|███▉ | 2270/5773 [3:32:09<5:18:15, 5.45s/it] {'loss': 0.5694, 'learning_rate': 1.3845617951563255e-05, 'epoch': 0.39} 39%|███▉ | 2270/5773 [3:32:04<5:18:15, 5.45s/it] 39%|███▉ | 2271/5773 [3:32:15<5:17:51, 5.45s/it] 39%|███▉ | 2271/5773 [3:32:09<5:17:51, 5.45s/it] {'loss': 0.5733, 'learning_rate': 1.3840437846878872e-05, 'epoch': 0.39} 39%|███▉ | 2271/5773 [3:32:15<5:17:51, 5.45s/it] {'loss': 0.5733, 'learning_rate': 1.3840437846878872e-05, 'epoch': 0.39} 39%|███▉ | 2271/5773 [3:32:09<5:17:51, 5.45s/it] 39%|███▉ | 2272/5773 [3:32:20<5:18:27, 5.46s/it] 39%|███▉ | 2272/5773 [3:32:15<5:18:27, 5.46s/it] {'loss': 0.5764, 'learning_rate': 1.3835256533101955e-05, 'epoch': 0.39} 39%|███▉ | 2272/5773 [3:32:20<5:18:27, 5.46s/it] {'loss': 0.5764, 'learning_rate': 1.3835256533101955e-05, 'epoch': 0.39} 39%|███▉ | 2272/5773 [3:32:15<5:18:27, 5.46s/it] 39%|███▉ | 2273/5773 [3:32:25<5:16:16, 5.42s/it] 39%|███▉ | 2273/5773 [3:32:20<5:16:16, 5.42s/it] {'loss': 0.5791, 'learning_rate': 1.3830074011863746e-05, 'epoch': 0.39} 39%|███▉ | 2273/5773 [3:32:25<5:16:16, 5.42s/it] 
{'loss': 0.5791, 'learning_rate': 1.3830074011863746e-05, 'epoch': 0.39} 39%|███▉ | 2273/5773 [3:32:20<5:16:16, 5.42s/it] 39%|███▉ | 2274/5773 [3:32:31<5:17:03, 5.44s/it] 39%|███▉ | 2274/5773 [3:32:25<5:17:03, 5.44s/it] {'loss': 0.5874, 'learning_rate': 1.3824890284795867e-05, 'epoch': 0.39} 39%|███▉ | 2274/5773 [3:32:31<5:17:03, 5.44s/it] {'loss': 0.5874, 'learning_rate': 1.3824890284795867e-05, 'epoch': 0.39} 39%|███▉ | 2274/5773 [3:32:25<5:17:03, 5.44s/it] 39%|███▉ | 2275/5773 [3:32:36<5:17:21, 5.44s/it] 39%|███▉ | 2275/5773 [3:32:31<5:17:22, 5.44s/it] {'loss': 0.5891, 'learning_rate': 1.3819705353530322e-05, 'epoch': 0.39} 39%|███▉ | 2275/5773 [3:32:36<5:17:21, 5.44s/it] {'loss': 0.5891, 'learning_rate': 1.3819705353530322e-05, 'epoch': 0.39} 39%|███▉ | 2275/5773 [3:32:31<5:17:22, 5.44s/it] 39%|███▉ | 2276/5773 [3:32:42<5:16:34, 5.43s/it] 39%|███▉ | 2276/5773 [3:32:36<5:16:33, 5.43s/it] {'loss': 0.5631, 'learning_rate': 1.381451921969949e-05, 'epoch': 0.39} 39%|███▉ | 2276/5773 [3:32:42<5:16:34, 5.43s/it] {'loss': 0.5631, 'learning_rate': 1.381451921969949e-05, 'epoch': 0.39} 39%|███▉ | 2276/5773 [3:32:36<5:16:33, 5.43s/it] 39%|███▉ | 2277/5773 [3:32:47<5:15:51, 5.42s/it] 39%|███▉ | 2277/5773 [3:32:42<5:15:51, 5.42s/it] {'loss': 0.5794, 'learning_rate': 1.3809331884936142e-05, 'epoch': 0.39} 39%|███▉ | 2277/5773 [3:32:47<5:15:51, 5.42s/it] {'loss': 0.5794, 'learning_rate': 1.3809331884936142e-05, 'epoch': 0.39} 39%|███▉ | 2277/5773 [3:32:42<5:15:51, 5.42s/it] 39%|███▉ | 2278/5773 [3:32:52<5:13:39, 5.38s/it] 39%|███▉ | 2278/5773 [3:32:47<5:13:39, 5.38s/it] {'loss': 0.5656, 'learning_rate': 1.3804143350873403e-05, 'epoch': 0.39} 39%|███▉ | 2278/5773 [3:32:52<5:13:39, 5.38s/it] {'loss': 0.5656, 'learning_rate': 1.3804143350873403e-05, 'epoch': 0.39} 39%|███▉ | 2278/5773 [3:32:47<5:13:39, 5.38s/it] 39%|███▉ | 2279/5773 [3:32:58<5:15:06, 5.41s/it] 39%|███▉ | 2279/5773 [3:32:52<5:15:06, 5.41s/it] {'loss': 0.5626, 'learning_rate': 1.3798953619144797e-05, 'epoch': 
0.39} 39%|███▉ | 2279/5773 [3:32:58<5:15:06, 5.41s/it] {'loss': 0.5626, 'learning_rate': 1.3798953619144797e-05, 'epoch': 0.39} 39%|███▉ | 2279/5773 [3:32:52<5:15:06, 5.41s/it] 39%|███▉ | 2280/5773 [3:33:03<5:14:06, 5.40s/it] 39%|███▉ | 2280/5773 [3:32:58<5:14:06, 5.40s/it] {'loss': 0.5644, 'learning_rate': 1.3793762691384215e-05, 'epoch': 0.39} 39%|███▉ | 2280/5773 [3:33:03<5:14:06, 5.40s/it] {'loss': 0.5644, 'learning_rate': 1.3793762691384215e-05, 'epoch': 0.39} 39%|███▉ | 2280/5773 [3:32:58<5:14:06, 5.40s/it] 40%|███▉ | 2281/5773 [3:33:09<5:15:31, 5.42s/it] 40%|███▉ | 2281/5773 [3:33:03<5:15:31, 5.42s/it] {'loss': 0.5561, 'learning_rate': 1.3788570569225929e-05, 'epoch': 0.4} 40%|███▉ | 2281/5773 [3:33:09<5:15:31, 5.42s/it] {'loss': 0.5561, 'learning_rate': 1.3788570569225929e-05, 'epoch': 0.4} 40%|███▉ | 2281/5773 [3:33:03<5:15:31, 5.42s/it] 40%|███▉ | 2282/5773 [3:33:14<5:15:12, 5.42s/it] 40%|███▉ | 2282/5773 [3:33:09<5:15:12, 5.42s/it] {'loss': 0.5794, 'learning_rate': 1.3783377254304586e-05, 'epoch': 0.4} 40%|███▉ | 2282/5773 [3:33:14<5:15:12, 5.42s/it] {'loss': 0.5794, 'learning_rate': 1.3783377254304586e-05, 'epoch': 0.4} 40%|███▉ | 2282/5773 [3:33:09<5:15:12, 5.42s/it] 40%|███▉ | 2283/5773 [3:33:20<5:20:18, 5.51s/it] 40%|███▉ | 2283/5773 [3:33:14<5:20:18, 5.51s/it] {'loss': 0.5912, 'learning_rate': 1.3778182748255204e-05, 'epoch': 0.4} 40%|███▉ | 2283/5773 [3:33:20<5:20:18, 5.51s/it] {'loss': 0.5912, 'learning_rate': 1.3778182748255204e-05, 'epoch': 0.4} 40%|███▉ | 2283/5773 [3:33:14<5:20:18, 5.51s/it] 40%|███▉ | 2284/5773 [3:33:25<5:15:36, 5.43s/it] 40%|███▉ | 2284/5773 [3:33:20<5:15:36, 5.43s/it] {'loss': 0.5573, 'learning_rate': 1.377298705271318e-05, 'epoch': 0.4} 40%|███▉ | 2284/5773 [3:33:25<5:15:36, 5.43s/it] {'loss': 0.5573, 'learning_rate': 1.377298705271318e-05, 'epoch': 0.4} 40%|███▉ | 2284/5773 [3:33:20<5:15:36, 5.43s/it] 40%|███▉ | 2285/5773 [3:33:31<5:18:30, 5.48s/it] 40%|███▉ | 2285/5773 [3:33:25<5:18:30, 5.48s/it] {'loss': 0.5749, 
'learning_rate': 1.3767790169314286e-05, 'epoch': 0.4} 40%|███▉ | 2285/5773 [3:33:31<5:18:30, 5.48s/it] {'loss': 0.5749, 'learning_rate': 1.3767790169314286e-05, 'epoch': 0.4} 40%|███▉ | 2285/5773 [3:33:25<5:18:30, 5.48s/it] 40%|███▉ | 2286/5773 [3:33:36<5:19:57, 5.51s/it] 40%|███▉ | 2286/5773 [3:33:31<5:19:57, 5.51s/it] {'loss': 0.5921, 'learning_rate': 1.3762592099694666e-05, 'epoch': 0.4} 40%|███▉ | 2286/5773 [3:33:36<5:19:57, 5.51s/it] {'loss': 0.5921, 'learning_rate': 1.3762592099694666e-05, 'epoch': 0.4} 40%|███▉ | 2286/5773 [3:33:31<5:19:57, 5.51s/it] 40%|███▉ | 2287/5773 [3:33:42<5:16:32, 5.45s/it] 40%|███▉ | 2287/5773 [3:33:36<5:16:31, 5.45s/it] {'loss': 0.5674, 'learning_rate': 1.375739284549084e-05, 'epoch': 0.4} 40%|███▉ | 2287/5773 [3:33:42<5:16:32, 5.45s/it] {'loss': 0.5674, 'learning_rate': 1.375739284549084e-05, 'epoch': 0.4} 40%|███▉ | 2287/5773 [3:33:36<5:16:31, 5.45s/it] 40%|███▉ | 2288/5773 [3:33:47<5:17:15, 5.46s/it] 40%|███▉ | 2288/5773 [3:33:42<5:17:15, 5.46s/it] {'loss': 0.5813, 'learning_rate': 1.3752192408339698e-05, 'epoch': 0.4} 40%|███▉ | 2288/5773 [3:33:47<5:17:15, 5.46s/it] {'loss': 0.5813, 'learning_rate': 1.3752192408339698e-05, 'epoch': 0.4} 40%|███▉ | 2288/5773 [3:33:42<5:17:15, 5.46s/it] 40%|███▉ | 2289/5773 [3:33:52<5:13:51, 5.41s/it] 40%|███▉ | 2289/5773 [3:33:47<5:13:51, 5.41s/it] {'loss': 0.5847, 'learning_rate': 1.3746990789878503e-05, 'epoch': 0.4} 40%|███▉ | 2289/5773 [3:33:52<5:13:51, 5.41s/it] {'loss': 0.5847, 'learning_rate': 1.3746990789878503e-05, 'epoch': 0.4} 40%|███▉ | 2289/5773 [3:33:47<5:13:51, 5.41s/it] 40%|███▉ | 2290/5773 [3:33:58<5:15:44, 5.44s/it] 40%|███▉ | 2290/5773 [3:33:52<5:15:44, 5.44s/it] {'loss': 0.5757, 'learning_rate': 1.3741787991744895e-05, 'epoch': 0.4} 40%|███▉ | 2290/5773 [3:33:58<5:15:44, 5.44s/it] {'loss': 0.5757, 'learning_rate': 1.3741787991744895e-05, 'epoch': 0.4} 40%|███▉ | 2290/5773 [3:33:52<5:15:44, 5.44s/it] 40%|███▉ | 2291/5773 [3:34:03<5:13:38, 5.40s/it] 40%|███▉ | 2291/5773 
[3:33:58<5:13:38, 5.40s/it] {'loss': 0.5881, 'learning_rate': 1.3736584015576874e-05, 'epoch': 0.4} 40%|███▉ | 2291/5773 [3:34:03<5:13:38, 5.40s/it] {'loss': 0.5881, 'learning_rate': 1.3736584015576874e-05, 'epoch': 0.4} 40%|███▉ | 2291/5773 [3:33:58<5:13:38, 5.40s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") 40%|███▉ | 2292/5773 [3:34:09<5:15:11, 5.43s/it] 40%|███▉ | 2292/5773 [3:34:03<5:15:11, 5.43s/it] {'loss': 0.5838, 'learning_rate': 1.373137886301282e-05, 'epoch': 0.4} 40%|███▉ | 2292/5773 [3:34:09<5:15:11, 5.43s/it] {'loss': 0.5838, 'learning_rate': 1.373137886301282e-05, 'epoch': 0.4} 40%|███▉ | 2292/5773 [3:34:03<5:15:11, 5.43s/it] 40%|███▉ | 2293/5773 [3:34:14<5:18:18, 5.49s/it] 40%|███▉ | 2293/5773 [3:34:09<5:18:18, 5.49s/it] {'loss': 0.591, 'learning_rate': 1.3726172535691489e-05, 'epoch': 0.4} 40%|███▉ | 2293/5773 [3:34:14<5:18:18, 5.49s/it] {'loss': 0.591, 'learning_rate': 1.3726172535691489e-05, 'epoch': 0.4} 40%|███▉ | 2293/5773 [3:34:09<5:18:18, 5.49s/it] 40%|███▉ | 2294/5773 [3:34:20<5:17:46, 5.48s/it] 40%|███▉ | 2294/5773 [3:34:14<5:17:46, 5.48s/it] {'loss': 0.5968, 'learning_rate': 1.3720965035251989e-05, 'epoch': 0.4} 40%|███▉ | 2294/5773 [3:34:20<5:17:46, 5.48s/it] {'loss': 0.5968, 'learning_rate': 1.3720965035251989e-05, 'epoch': 0.4} 40%|███▉ | 2294/5773 [3:34:14<5:17:46, 5.48s/it] 40%|███▉ | 2295/5773 [3:34:25<5:16:14, 5.46s/it] 40%|███▉ | 2295/5773 [3:34:20<5:16:14, 5.46s/it] {'loss': 0.5682, 'learning_rate': 1.3715756363333817e-05, 'epoch': 0.4} 40%|███▉ | 2295/5773 [3:34:25<5:16:14, 5.46s/it] {'loss': 0.5682, 'learning_rate': 1.3715756363333817e-05, 'epoch': 0.4} 40%|███▉ | 2295/5773 [3:34:20<5:16:14, 5.46s/it] 40%|███▉ | 2296/5773 [3:34:31<5:13:25, 5.41s/it] 40%|███▉ | 2296/5773 [3:34:25<5:13:25, 5.41s/it] {'loss': 0.5776, 'learning_rate': 1.371054652157682e-05, 'epoch': 0.4} 40%|███▉ | 2296/5773 [3:34:31<5:13:25, 
5.41s/it] {'loss': 0.5776, 'learning_rate': 1.371054652157682e-05, 'epoch': 0.4} 40%|███▉ | 2296/5773 [3:34:25<5:13:25, 5.41s/it] 40%|███▉ | 2297/5773 [3:34:36<5:13:16, 5.41s/it] 40%|███▉ | 2297/5773 [3:34:30<5:13:16, 5.41s/it] {'loss': 0.6036, 'learning_rate': 1.3705335511621229e-05, 'epoch': 0.4} 40%|███▉ | 2297/5773 [3:34:36<5:13:16, 5.41s/it] {'loss': 0.6036, 'learning_rate': 1.3705335511621229e-05, 'epoch': 0.4} 40%|███▉ | 2297/5773 [3:34:30<5:13:16, 5.41s/it] 40%|███▉ | 2298/5773 [3:34:41<5:14:41, 5.43s/it] 40%|███▉ | 2298/5773 [3:34:36<5:14:41, 5.43s/it] {'loss': 0.576, 'learning_rate': 1.3700123335107634e-05, 'epoch': 0.4} 40%|███▉ | 2298/5773 [3:34:41<5:14:41, 5.43s/it] {'loss': 0.576, 'learning_rate': 1.3700123335107634e-05, 'epoch': 0.4} 40%|███▉ | 2298/5773 [3:34:36<5:14:41, 5.43s/it] 40%|███▉ | 2299/5773 [3:34:47<5:19:52, 5.52s/it] 40%|███▉ | 2299/5773 [3:34:42<5:19:52, 5.52s/it] {'loss': 0.5866, 'learning_rate': 1.3694909993676993e-05, 'epoch': 0.4} 40%|███▉ | 2299/5773 [3:34:47<5:19:52, 5.52s/it] {'loss': 0.5866, 'learning_rate': 1.3694909993676993e-05, 'epoch': 0.4} 40%|███▉ | 2299/5773 [3:34:42<5:19:52, 5.52s/it] 13 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 40%|███▉ | 2300/5773 [3:34:52<5:15:25, 5.45s/it] 12 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...
40%|███▉ | 2300/5773 [3:34:47<5:15:26, 5.45s/it] {'loss': 0.5519, 'learning_rate': 1.3689695488970638e-05, 'epoch': 0.4} 40%|███▉ | 2300/5773 [3:34:52<5:15:25, 5.45s/it] {'loss': 0.5519, 'learning_rate': 1.3689695488970638e-05, 'epoch': 0.4} 40%|███▉ | 2300/5773 [3:34:47<5:15:26, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 40%|███▉ | 2301/5773 [3:35:12<9:21:28, 9.70s/it] 40%|███▉ | 2301/5773 [3:35:07<9:21:28, 9.70s/it] {'loss': 0.5678, 'learning_rate': 1.3684479822630255e-05, 'epoch': 0.4} 40%|███▉ | 2301/5773 [3:35:12<9:21:28, 9.70s/it] {'loss': 0.5678, 'learning_rate': 1.3684479822630255e-05, 'epoch': 0.4} 40%|███▉ | 2301/5773 [3:35:07<9:21:28, 9.70s/it] 40%|███▉ | 2302/5773 [3:35:17<8:07:04, 8.42s/it] 40%|███▉ | 2302/5773 [3:35:12<8:07:04, 8.42s/it] {'loss': 0.5736, 'learning_rate': 1.3679262996297904e-05, 'epoch': 0.4} 40%|███▉ | 2302/5773 [3:35:17<8:07:04, 8.42s/it] {'loss': 0.5736, 'learning_rate': 1.3679262996297904e-05, 'epoch': 0.4} 40%|███▉ | 2302/5773 [3:35:12<8:07:04, 8.42s/it] 40%|███▉ | 2303/5773 [3:35:23<7:14:45, 7.52s/it] 40%|███▉ | 2303/5773 [3:35:17<7:14:46, 7.52s/it] {'loss': 0.5781, 'learning_rate': 1.3674045011616013e-05, 'epoch': 0.4} 40%|███▉ | 2303/5773 [3:35:23<7:14:45, 7.52s/it] {'loss': 0.5781, 'learning_rate': 1.3674045011616013e-05, 'epoch': 0.4} 40%|███▉ | 2303/5773 [3:35:17<7:14:46, 7.52s/it] 40%|███▉ | 2304/5773 [3:35:28<6:40:16, 6.92s/it] 40%|███▉ | 2304/5773 [3:35:23<6:40:16, 6.92s/it] {'loss': 0.5853, 'learning_rate': 1.3668825870227366e-05, 'epoch': 0.4} 40%|███▉ | 2304/5773 [3:35:28<6:40:16, 6.92s/it] {'loss': 0.5853, 'learning_rate': 1.3668825870227366e-05, 'epoch': 0.4} 40%|███▉ | 2304/5773 [3:35:23<6:40:16, 6.92s/it] 40%|███▉ | 2305/5773 [3:35:34<6:16:26, 6.51s/it] 40%|███▉ | 2305/5773 [3:35:28<6:16:27, 6.51s/it] {'loss': 0.5719, 'learning_rate': 1.3663605573775119e-05, 'epoch': 0.4} 40%|███▉ | 2305/5773 [3:35:34<6:16:26, 6.51s/it] {'loss': 0.5719, 'learning_rate': 1.3663605573775119e-05, 'epoch': 0.4} 40%|███▉ | 2305/5773 [3:35:28<6:16:27, 6.51s/it] 40%|███▉ | 2306/5773 [3:35:39<5:58:04, 6.20s/it] 40%|███▉ | 2306/5773 [3:35:34<5:58:04, 6.20s/it] {'loss': 0.5759, 'learning_rate': 1.3658384123902786e-05, 'epoch': 0.4} 40%|███▉ | 2306/5773 [3:35:39<5:58:04, 6.20s/it] {'loss': 0.5759, 'learning_rate': 1.3658384123902786e-05, 
'epoch': 0.4} 40%|███▉ | 2306/5773 [3:35:34<5:58:04, 6.20s/it] 40%|███▉ | 2307/5773 [3:35:45<5:44:54, 5.97s/it] 40%|███▉ | 2307/5773 [3:35:39<5:44:54, 5.97s/it] {'loss': 0.5795, 'learning_rate': 1.3653161522254244e-05, 'epoch': 0.4} 40%|███▉ | 2307/5773 [3:35:45<5:44:54, 5.97s/it] {'loss': 0.5795, 'learning_rate': 1.3653161522254244e-05, 'epoch': 0.4} 40%|███▉ | 2307/5773 [3:35:39<5:44:54, 5.97s/it] 40%|███▉ | 2308/5773 [3:35:50<5:36:45, 5.83s/it] 40%|███▉ | 2308/5773 [3:35:45<5:36:45, 5.83s/it] {'loss': 0.5698, 'learning_rate': 1.3647937770473739e-05, 'epoch': 0.4} 40%|███▉ | 2308/5773 [3:35:50<5:36:45, 5.83s/it] {'loss': 0.5698, 'learning_rate': 1.3647937770473739e-05, 'epoch': 0.4} 40%|███▉ | 2308/5773 [3:35:45<5:36:45, 5.83s/it] 40%|███▉ | 2309/5773 [3:35:56<5:32:42, 5.76s/it] 40%|███▉ | 2309/5773 [3:35:50<5:32:42, 5.76s/it] {'loss': 0.5873, 'learning_rate': 1.364271287020587e-05, 'epoch': 0.4} 40%|███▉ | 2309/5773 [3:35:56<5:32:42, 5.76s/it] {'loss': 0.5873, 'learning_rate': 1.364271287020587e-05, 'epoch': 0.4} 40%|███▉ | 2309/5773 [3:35:50<5:32:42, 5.76s/it] 40%|████ | 2310/5773 [3:36:02<5:32:39, 5.76s/it] 40%|████ | 2310/5773 [3:35:56<5:32:39, 5.76s/it] {'loss': 0.5807, 'learning_rate': 1.3637486823095608e-05, 'epoch': 0.4} 40%|████ | 2310/5773 [3:36:02<5:32:39, 5.76s/it] {'loss': 0.5807, 'learning_rate': 1.3637486823095608e-05, 'epoch': 0.4} 40%|████ | 2310/5773 [3:35:56<5:32:39, 5.76s/it] 40%|████ | 2311/5773 [3:36:07<5:29:12, 5.71s/it] 40%|████ | 2311/5773 [3:36:02<5:29:12, 5.71s/it] {'loss': 0.5722, 'learning_rate': 1.3632259630788278e-05, 'epoch': 0.4} 40%|████ | 2311/5773 [3:36:07<5:29:12, 5.71s/it] {'loss': 0.5722, 'learning_rate': 1.3632259630788278e-05, 'epoch': 0.4} 40%|████ | 2311/5773 [3:36:02<5:29:12, 5.71s/it] 40%|████ | 2312/5773 [3:36:13<5:24:35, 5.63s/it] 40%|████ | 2312/5773 [3:36:07<5:24:35, 5.63s/it] {'loss': 0.5828, 'learning_rate': 1.3627031294929564e-05, 'epoch': 0.4} 40%|████ | 2312/5773 [3:36:13<5:24:35, 5.63s/it] {'loss': 0.5828, 
'learning_rate': 1.3627031294929564e-05, 'epoch': 0.4} 40%|████ | 2312/5773 [3:36:07<5:24:35, 5.63s/it] 40%|████ | 2313/5773 [3:36:18<5:20:33, 5.56s/it] 40%|████ | 2313/5773 [3:36:13<5:20:33, 5.56s/it] {'loss': 0.5996, 'learning_rate': 1.3621801817165517e-05, 'epoch': 0.4} 40%|████ | 2313/5773 [3:36:18<5:20:33, 5.56s/it] {'loss': 0.5996, 'learning_rate': 1.3621801817165517e-05, 'epoch': 0.4} 40%|████ | 2313/5773 [3:36:13<5:20:33, 5.56s/it] 40%|████ | 2314/5773 [3:36:24<5:19:30, 5.54s/it] 40%|████ | 2314/5773 [3:36:18<5:19:30, 5.54s/it] {'loss': 0.5865, 'learning_rate': 1.3616571199142542e-05, 'epoch': 0.4} 40%|████ | 2314/5773 [3:36:18<5:19:30, 5.54s/it]{'loss': 0.5865, 'learning_rate': 1.3616571199142542e-05, 'epoch': 0.4} 40%|████ | 2314/5773 [3:36:24<5:19:30, 5.54s/it] 40%|████ | 2315/5773 [3:36:29<5:16:10, 5.49s/it] 40%|████ | 2315/5773 [3:36:24<5:16:10, 5.49s/it] {'loss': 0.5765, 'learning_rate': 1.3611339442507403e-05, 'epoch': 0.4} 40%|████ | 2315/5773 [3:36:29<5:16:10, 5.49s/it] {'loss': 0.5765, 'learning_rate': 1.3611339442507403e-05, 'epoch': 0.4} 40%|████ | 2315/5773 [3:36:24<5:16:10, 5.49s/it] 40%|████ | 2316/5773 [3:36:34<5:12:16, 5.42s/it] 40%|████ | 2316/5773 [3:36:29<5:12:16, 5.42s/it] {'loss': 0.5698, 'learning_rate': 1.3606106548907228e-05, 'epoch': 0.4} 40%|████ | 2316/5773 [3:36:34<5:12:16, 5.42s/it] {'loss': 0.5698, 'learning_rate': 1.3606106548907228e-05, 'epoch': 0.4} 40%|████ | 2316/5773 [3:36:29<5:12:16, 5.42s/it] 40%|████ | 2317/5773 [3:36:40<5:11:53, 5.41s/it] 40%|████ | 2317/5773 [3:36:34<5:11:52, 5.41s/it] {'loss': 0.5784, 'learning_rate': 1.3600872519989497e-05, 'epoch': 0.4} 40%|████ | 2317/5773 [3:36:40<5:11:53, 5.41s/it] {'loss': 0.5784, 'learning_rate': 1.3600872519989497e-05, 'epoch': 0.4} 40%|████ | 2317/5773 [3:36:34<5:11:52, 5.41s/it] 40%|████ | 2318/5773 [3:36:45<5:13:20, 5.44s/it] 40%|████ | 2318/5773 [3:36:40<5:13:20, 5.44s/it] {'loss': 0.5854, 'learning_rate': 1.3595637357402049e-05, 'epoch': 0.4} 40%|████ | 2318/5773 
[3:36:45<5:13:20, 5.44s/it] {'loss': 0.5854, 'learning_rate': 1.3595637357402049e-05, 'epoch': 0.4} 40%|████ | 2318/5773 [3:36:40<5:13:20, 5.44s/it] 40%|████ | 2319/5773 [3:36:51<5:14:05, 5.46s/it] 40%|████ | 2319/5773 [3:36:45<5:14:05, 5.46s/it] {'loss': 0.5776, 'learning_rate': 1.3590401062793084e-05, 'epoch': 0.4} 40%|████ | 2319/5773 [3:36:51<5:14:05, 5.46s/it] {'loss': 0.5776, 'learning_rate': 1.3590401062793084e-05, 'epoch': 0.4} 40%|████ | 2319/5773 [3:36:45<5:14:05, 5.46s/it] 40%|████ | 2320/5773 [3:36:56<5:13:19, 5.44s/it] 40%|████ | 2320/5773 [3:36:51<5:13:19, 5.44s/it] {'loss': 0.5701, 'learning_rate': 1.3585163637811148e-05, 'epoch': 0.4} 40%|████ | 2320/5773 [3:36:56<5:13:19, 5.44s/it] {'loss': 0.5701, 'learning_rate': 1.3585163637811148e-05, 'epoch': 0.4} 40%|████ | 2320/5773 [3:36:51<5:13:19, 5.44s/it] 40%|████ | 2321/5773 [3:37:02<5:12:38, 5.43s/it] 40%|████ | 2321/5773 [3:36:56<5:12:39, 5.43s/it] {'loss': 0.576, 'learning_rate': 1.3579925084105154e-05, 'epoch': 0.4} 40%|████ | 2321/5773 [3:37:02<5:12:38, 5.43s/it] {'loss': 0.576, 'learning_rate': 1.3579925084105154e-05, 'epoch': 0.4} 40%|████ | 2321/5773 [3:36:56<5:12:39, 5.43s/it] 40%|████ | 2322/5773 [3:37:07<5:16:21, 5.50s/it] 40%|████ | 2322/5773 [3:37:02<5:16:21, 5.50s/it] {'loss': 0.5819, 'learning_rate': 1.3574685403324367e-05, 'epoch': 0.4} 40%|████ | 2322/5773 [3:37:07<5:16:21, 5.50s/it] {'loss': 0.5819, 'learning_rate': 1.3574685403324367e-05, 'epoch': 0.4} 40%|████ | 2322/5773 [3:37:02<5:16:21, 5.50s/it] 40%|████ | 2323/5773 [3:37:12<5:13:00, 5.44s/it] 40%|████ | 2323/5773 [3:37:07<5:13:00, 5.44s/it] {'loss': 0.5709, 'learning_rate': 1.3569444597118402e-05, 'epoch': 0.4} 40%|████ | 2323/5773 [3:37:13<5:13:00, 5.44s/it] {'loss': 0.5709, 'learning_rate': 1.3569444597118402e-05, 'epoch': 0.4} 40%|████ | 2323/5773 [3:37:07<5:13:00, 5.44s/it] 40%|████ | 2324/5773 [3:37:18<5:10:30, 5.40s/it] 40%|████ | 2324/5773 [3:37:12<5:10:30, 5.40s/it] {'loss': 0.5807, 'learning_rate': 
1.3564202667137239e-05, 'epoch': 0.4} 40%|████ | 2324/5773 [3:37:18<5:10:30, 5.40s/it] {'loss': 0.5807, 'learning_rate': 1.3564202667137239e-05, 'epoch': 0.4} 40%|████ | 2324/5773 [3:37:12<5:10:30, 5.40s/it] 40%|████ | 2325/5773 [3:37:24<5:16:03, 5.50s/it] 40%|████ | 2325/5773 [3:37:18<5:16:03, 5.50s/it] {'loss': 0.5671, 'learning_rate': 1.35589596150312e-05, 'epoch': 0.4} 40%|████ | 2325/5773 [3:37:24<5:16:03, 5.50s/it] {'loss': 0.5671, 'learning_rate': 1.35589596150312e-05, 'epoch': 0.4} 40%|████ | 2325/5773 [3:37:18<5:16:03, 5.50s/it] 40%|████ | 2326/5773 [3:37:29<5:16:17, 5.51s/it] 40%|████ | 2326/5773 [3:37:24<5:16:17, 5.51s/it] {'loss': 0.5711, 'learning_rate': 1.3553715442450963e-05, 'epoch': 0.4} 40%|████ | 2326/5773 [3:37:29<5:16:17, 5.51s/it] {'loss': 0.5711, 'learning_rate': 1.3553715442450963e-05, 'epoch': 0.4} 40%|████ | 2326/5773 [3:37:24<5:16:17, 5.51s/it] 40%|████ | 2327/5773 [3:37:35<5:16:46, 5.52s/it] 40%|████ | 2327/5773 [3:37:29<5:16:46, 5.52s/it] {'loss': 0.5637, 'learning_rate': 1.3548470151047567e-05, 'epoch': 0.4} 40%|████ | 2327/5773 [3:37:35<5:16:46, 5.52s/it] {'loss': 0.5637, 'learning_rate': 1.3548470151047567e-05, 'epoch': 0.4} 40%|████ | 2327/5773 [3:37:29<5:16:46, 5.52s/it] 40%|████ | 2328/5773 [3:37:40<5:17:34, 5.53s/it] 40%|████ | 2328/5773 [3:37:35<5:17:34, 5.53s/it] {'loss': 0.5679, 'learning_rate': 1.3543223742472394e-05, 'epoch': 0.4} 40%|████ | 2328/5773 [3:37:40<5:17:34, 5.53s/it] {'loss': 0.5679, 'learning_rate': 1.3543223742472394e-05, 'epoch': 0.4} 40%|████ | 2328/5773 [3:37:35<5:17:34, 5.53s/it] 40%|████ | 2329/5773 [3:37:46<5:17:56, 5.54s/it] 40%|████ | 2329/5773 [3:37:40<5:17:56, 5.54s/it] {'loss': 0.5786, 'learning_rate': 1.3537976218377182e-05, 'epoch': 0.4} 40%|████ | 2329/5773 [3:37:46<5:17:56, 5.54s/it] {'loss': 0.5786, 'learning_rate': 1.3537976218377182e-05, 'epoch': 0.4} 40%|████ | 2329/5773 [3:37:40<5:17:56, 5.54s/it] 40%|████ | 2330/5773 [3:37:51<5:17:15, 5.53s/it] {'loss': 0.5915, 'learning_rate': 
1.3532727580414018e-05, 'epoch': 0.4} 40%|████ | 2330/5773 [3:37:51<5:17:15, 5.53s/it] 40%|████ | 2330/5773 [3:37:46<5:17:14, 5.53s/it] {'loss': 0.5915, 'learning_rate': 1.3532727580414018e-05, 'epoch': 0.4} 40%|████ | 2330/5773 [3:37:46<5:17:14, 5.53s/it] 40%|████ | 2331/5773 [3:37:57<5:15:55, 5.51s/it] 40%|████ | 2331/5773 [3:37:51<5:15:55, 5.51s/it] {'loss': 0.6096, 'learning_rate': 1.3527477830235343e-05, 'epoch': 0.4} 40%|████ | 2331/5773 [3:37:57<5:15:55, 5.51s/it] {'loss': 0.6096, 'learning_rate': 1.3527477830235343e-05, 'epoch': 0.4} 40%|████ | 2331/5773 [3:37:51<5:15:55, 5.51s/it] 40%|████ | 2332/5773 [3:38:02<5:12:11, 5.44s/it] 40%|████ | 2332/5773 [3:37:56<5:12:11, 5.44s/it] {'loss': 0.5724, 'learning_rate': 1.3522226969493945e-05, 'epoch': 0.4} 40%|████ | 2332/5773 [3:38:02<5:12:11, 5.44s/it] {'loss': 0.5724, 'learning_rate': 1.3522226969493945e-05, 'epoch': 0.4} 40%|████ | 2332/5773 [3:37:56<5:12:11, 5.44s/it] 40%|████ | 2333/5773 [3:38:07<5:12:07, 5.44s/it] 40%|████ | 2333/5773 [3:38:02<5:12:07, 5.44s/it] {'loss': 0.585, 'learning_rate': 1.351697499984296e-05, 'epoch': 0.4} 40%|████ | 2333/5773 [3:38:07<5:12:07, 5.44s/it] {'loss': 0.585, 'learning_rate': 1.351697499984296e-05, 'epoch': 0.4} 40%|████ | 2333/5773 [3:38:02<5:12:07, 5.44s/it] 40%|████ | 2334/5773 [3:38:13<5:13:39, 5.47s/it] 40%|████ | 2334/5773 [3:38:07<5:13:39, 5.47s/it] {'loss': 0.5719, 'learning_rate': 1.3511721922935884e-05, 'epoch': 0.4} 40%|████ | 2334/5773 [3:38:13<5:13:39, 5.47s/it] {'loss': 0.5719, 'learning_rate': 1.3511721922935884e-05, 'epoch': 0.4} 40%|████ | 2334/5773 [3:38:07<5:13:39, 5.47s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4399 > 4096). 
Running this sequence through the model will result in indexing errors 40%|████ | 2335/5773 [3:38:19<5:14:54, 5.50s/it] 40%|████ | 2335/5773 [3:38:13<5:14:54, 5.50s/it] {'loss': 0.5831, 'learning_rate': 1.3506467740426541e-05, 'epoch': 0.4} 40%|████ | 2335/5773 [3:38:19<5:14:54, 5.50s/it] {'loss': 0.5831, 'learning_rate': 1.3506467740426541e-05, 'epoch': 0.4} 40%|████ | 2335/5773 [3:38:13<5:14:54, 5.50s/it] 40%|████ | 2336/5773 [3:38:24<5:14:26, 5.49s/it] 40%|████ | 2336/5773 [3:38:18<5:14:26, 5.49s/it] {'loss': 0.5744, 'learning_rate': 1.3501212453969124e-05, 'epoch': 0.4} 40%|████ | 2336/5773 [3:38:24<5:14:26, 5.49s/it] {'loss': 0.5744, 'learning_rate': 1.3501212453969124e-05, 'epoch': 0.4} 40%|████ | 2336/5773 [3:38:18<5:14:26, 5.49s/it] 40%|████ | 2337/5773 [3:38:30<5:16:08, 5.52s/it] 40%|████ | 2337/5773 [3:38:24<5:16:08, 5.52s/it] {'loss': 0.5807, 'learning_rate': 1.3495956065218168e-05, 'epoch': 0.4} 40%|████ | 2337/5773 [3:38:30<5:16:08, 5.52s/it] {'loss': 0.5807, 'learning_rate': 1.3495956065218168e-05, 'epoch': 0.4} 40%|████ | 2337/5773 [3:38:24<5:16:08, 5.52s/it] 40%|████ | 2338/5773 [3:38:35<5:14:00, 5.48s/it] 40%|████ | 2338/5773 [3:38:29<5:14:00, 5.48s/it] {'loss': 0.5964, 'learning_rate': 1.3490698575828543e-05, 'epoch': 0.4} 40%|████ | 2338/5773 [3:38:35<5:14:00, 5.48s/it] {'loss': 0.5964, 'learning_rate': 1.3490698575828543e-05, 'epoch': 0.4} 40%|████ | 2338/5773 [3:38:29<5:14:00, 5.48s/it] 41%|████ | 2339/5773 [3:38:40<5:12:55, 5.47s/it] 41%|████ | 2339/5773 [3:38:35<5:12:56, 5.47s/it] {'loss': 0.5872, 'learning_rate': 1.3485439987455482e-05, 'epoch': 0.41} 41%|████ | 2339/5773 [3:38:40<5:12:55, 5.47s/it] {'loss': 0.5872, 'learning_rate': 1.3485439987455482e-05, 'epoch': 0.41} 41%|████ | 2339/5773 [3:38:35<5:12:56, 5.47s/it] 41%|████ | 2340/5773 [3:38:46<5:11:55, 5.45s/it] 41%|████ | 2340/5773 [3:38:40<5:11:55, 5.45s/it] {'loss': 0.5686, 'learning_rate': 1.3480180301754553e-05, 'epoch': 0.41} 41%|████ | 2340/5773 [3:38:46<5:11:55, 5.45s/it] 
{'loss': 0.5686, 'learning_rate': 1.3480180301754553e-05, 'epoch': 0.41} 41%|████ | 2340/5773 [3:38:40<5:11:55, 5.45s/it] 41%|████ | 2341/5773 [3:38:51<5:10:37, 5.43s/it] 41%|████ | 2341/5773 [3:38:46<5:10:37, 5.43s/it] {'loss': 0.5918, 'learning_rate': 1.3474919520381673e-05, 'epoch': 0.41} 41%|████ | 2341/5773 [3:38:51<5:10:37, 5.43s/it] {'loss': 0.5918, 'learning_rate': 1.3474919520381673e-05, 'epoch': 0.41} 41%|████ | 2341/5773 [3:38:46<5:10:37, 5.43s/it] 41%|████ | 2342/5773 [3:38:57<5:11:49, 5.45s/it] 41%|████ | 2342/5773 [3:38:51<5:11:49, 5.45s/it] {'loss': 0.5746, 'learning_rate': 1.3469657644993109e-05, 'epoch': 0.41} 41%|████ | 2342/5773 [3:38:57<5:11:49, 5.45s/it] {'loss': 0.5746, 'learning_rate': 1.3469657644993109e-05, 'epoch': 0.41} 41%|████ | 2342/5773 [3:38:51<5:11:49, 5.45s/it] 41%|████ | 2343/5773 [3:39:02<5:13:42, 5.49s/it] 41%|████ | 2343/5773 [3:38:57<5:13:42, 5.49s/it] {'loss': 0.5743, 'learning_rate': 1.3464394677245459e-05, 'epoch': 0.41} 41%|████ | 2343/5773 [3:39:02<5:13:42, 5.49s/it] {'loss': 0.5743, 'learning_rate': 1.3464394677245459e-05, 'epoch': 0.41} 41%|████ | 2343/5773 [3:38:57<5:13:42, 5.49s/it] 41%|████ | 2344/5773 [3:39:08<5:12:08, 5.46s/it] 41%|████ | 2344/5773 [3:39:02<5:12:08, 5.46s/it] {'loss': 0.582, 'learning_rate': 1.3459130618795678e-05, 'epoch': 0.41} 41%|████ | 2344/5773 [3:39:08<5:12:08, 5.46s/it] {'loss': 0.582, 'learning_rate': 1.3459130618795678e-05, 'epoch': 0.41} 41%|████ | 2344/5773 [3:39:02<5:12:08, 5.46s/it] 41%|████ | 2345/5773 [3:39:13<5:13:56, 5.49s/it] 41%|████ | 2345/5773 [3:39:08<5:13:56, 5.49s/it] {'loss': 0.6015, 'learning_rate': 1.3453865471301062e-05, 'epoch': 0.41} 41%|████ | 2345/5773 [3:39:13<5:13:56, 5.49s/it] {'loss': 0.6015, 'learning_rate': 1.3453865471301062e-05, 'epoch': 0.41} 41%|████ | 2345/5773 [3:39:08<5:13:56, 5.49s/it] 41%|████ | 2346/5773 [3:39:19<5:14:13, 5.50s/it] 41%|████ | 2346/5773 [3:39:13<5:14:13, 5.50s/it] {'loss': 0.5876, 'learning_rate': 1.3448599236419246e-05, 'epoch': 
0.41} 41%|████ | 2346/5773 [3:39:19<5:14:13, 5.50s/it] {'loss': 0.5876, 'learning_rate': 1.3448599236419246e-05, 'epoch': 0.41} 41%|████ | 2346/5773 [3:39:13<5:14:13, 5.50s/it] 41%|████ | 2347/5773 [3:39:24<5:12:02, 5.46s/it] 41%|████ | 2347/5773 [3:39:19<5:12:02, 5.46s/it] {'loss': 0.5787, 'learning_rate': 1.344333191580821e-05, 'epoch': 0.41} 41%|████ | 2347/5773 [3:39:24<5:12:02, 5.46s/it] {'loss': 0.5787, 'learning_rate': 1.344333191580821e-05, 'epoch': 0.41} 41%|████ | 2347/5773 [3:39:19<5:12:02, 5.46s/it] 41%|████ | 2348/5773 [3:39:30<5:13:21, 5.49s/it] 41%|████ | 2348/5773 [3:39:24<5:13:21, 5.49s/it] {'loss': 0.5806, 'learning_rate': 1.3438063511126267e-05, 'epoch': 0.41} 41%|████ | 2348/5773 [3:39:30<5:13:21, 5.49s/it] {'loss': 0.5806, 'learning_rate': 1.3438063511126267e-05, 'epoch': 0.41} 41%|████ | 2348/5773 [3:39:24<5:13:21, 5.49s/it] 41%|████ | 2349/5773 [3:39:35<5:14:25, 5.51s/it] 41%|████ | 2349/5773 [3:39:30<5:14:25, 5.51s/it] {'loss': 0.5702, 'learning_rate': 1.3432794024032088e-05, 'epoch': 0.41} 41%|████ | 2349/5773 [3:39:35<5:14:25, 5.51s/it] {'loss': 0.5702, 'learning_rate': 1.3432794024032088e-05, 'epoch': 0.41} 41%|████ | 2349/5773 [3:39:30<5:14:25, 5.51s/it] 8 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 41%|████ | 2350/5773 [3:39:41<5:13:36, 5.50s/it] 15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend... 41%|████ | 2350/5773 [3:39:35<5:13:37, 5.50s/it] {'loss': 0.5774, 'learning_rate': 1.3427523456184675e-05, 'epoch': 0.41} 41%|████ | 2350/5773 [3:39:41<5:13:36, 5.50s/it] {'loss': 0.5774, 'learning_rate': 1.3427523456184675e-05, 'epoch': 0.41} 41%|████ | 2350/5773 [3:39:35<5:13:37, 5.50s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") 41%|████ | 2351/5773 [3:39:46<5:11:52, 5.47s/it] 41%|████ | 2351/5773 [3:39:41<5:11:51, 5.47s/it] {'loss': 0.5906, 'learning_rate': 1.3422251809243368e-05, 'epoch': 0.41} 41%|████ | 2351/5773 [3:39:46<5:11:52, 5.47s/it] {'loss': 0.5906, 'learning_rate': 1.3422251809243368e-05, 'epoch': 0.41} 41%|████ | 2351/5773 [3:39:41<5:11:51, 5.47s/it] 41%|████ | 2352/5773 [3:39:52<5:12:43, 5.48s/it] 41%|████ | 2352/5773 [3:39:46<5:12:43, 5.48s/it] {'loss': 0.5836, 'learning_rate': 1.3416979084867851e-05, 'epoch': 0.41} 41%|████ | 2352/5773 [3:39:52<5:12:43, 5.48s/it] {'loss': 0.5836, 'learning_rate': 1.3416979084867851e-05, 'epoch': 0.41} 41%|████ | 2352/5773 [3:39:46<5:12:43, 5.48s/it] 41%|████ | 2353/5773 [3:39:57<5:12:14, 5.48s/it] 41%|████ | 2353/5773 [3:39:52<5:12:13, 5.48s/it] {'loss': 0.5936, 'learning_rate': 1.3411705284718148e-05, 'epoch': 0.41} 41%|████ | 2353/5773 [3:39:57<5:12:14, 5.48s/it] {'loss': 0.5936, 'learning_rate': 1.3411705284718148e-05, 'epoch': 0.41} 41%|████ | 2353/5773 [3:39:52<5:12:13, 5.48s/it] 41%|████ | 2354/5773 [3:40:03<5:15:37, 5.54s/it] 41%|████ | 2354/5773 [3:39:57<5:15:38, 5.54s/it] {'loss': 0.5644, 'learning_rate': 1.3406430410454615e-05, 'epoch': 0.41} 41%|████ | 2354/5773 [3:40:03<5:15:37, 5.54s/it] {'loss': 0.5644, 'learning_rate': 1.3406430410454615e-05, 'epoch': 0.41} 41%|████ | 2354/5773 [3:39:57<5:15:38, 5.54s/it] 41%|████ | 2355/5773 [3:40:08<5:12:10, 5.48s/it] 41%|████ | 2355/5773 [3:40:03<5:12:10, 5.48s/it] {'loss': 0.5792, 'learning_rate': 
1.3401154463737957e-05, 'epoch': 0.41} 41%|████ | 2355/5773 [3:40:08<5:12:10, 5.48s/it] {'loss': 0.5792, 'learning_rate': 1.3401154463737957e-05, 'epoch': 0.41} 41%|████ | 2355/5773 [3:40:03<5:12:10, 5.48s/it] 41%|████ | 2356/5773 [3:40:14<5:13:53, 5.51s/it] 41%|████ | 2356/5773 [3:40:08<5:13:53, 5.51s/it] {'loss': 0.5697, 'learning_rate': 1.3395877446229207e-05, 'epoch': 0.41} 41%|████ | 2356/5773 [3:40:14<5:13:53, 5.51s/it] {'loss': 0.5697, 'learning_rate': 1.3395877446229207e-05, 'epoch': 0.41} 41%|████ | 2356/5773 [3:40:08<5:13:53, 5.51s/it] 41%|████ | 2357/5773 [3:40:19<5:15:18, 5.54s/it] 41%|████ | 2357/5773 [3:40:14<5:15:18, 5.54s/it] {'loss': 0.5625, 'learning_rate': 1.3390599359589737e-05, 'epoch': 0.41} 41%|████ | 2357/5773 [3:40:19<5:15:18, 5.54s/it] {'loss': 0.5625, 'learning_rate': 1.3390599359589737e-05, 'epoch': 0.41} 41%|████ | 2357/5773 [3:40:14<5:15:18, 5.54s/it] 41%|████ | 2358/5773 [3:40:25<5:14:27, 5.53s/it] 41%|████ | 2358/5773 [3:40:19<5:14:28, 5.53s/it] {'loss': 0.568, 'learning_rate': 1.3385320205481262e-05, 'epoch': 0.41} 41%|████ | 2358/5773 [3:40:25<5:14:27, 5.53s/it] {'loss': 0.568, 'learning_rate': 1.3385320205481262e-05, 'epoch': 0.41} 41%|████ | 2358/5773 [3:40:19<5:14:28, 5.53s/it] 41%|████ | 2359/5773 [3:40:30<5:13:55, 5.52s/it] 41%|████ | 2359/5773 [3:40:25<5:13:55, 5.52s/it] {'loss': 0.5744, 'learning_rate': 1.3380039985565825e-05, 'epoch': 0.41} 41%|████ | 2359/5773 [3:40:30<5:13:55, 5.52s/it] {'loss': 0.5744, 'learning_rate': 1.3380039985565825e-05, 'epoch': 0.41} 41%|████ | 2359/5773 [3:40:25<5:13:55, 5.52s/it] 41%|████ | 2360/5773 [3:40:36<5:11:06, 5.47s/it] 41%|████ | 2360/5773 [3:40:30<5:11:06, 5.47s/it] {'loss': 0.596, 'learning_rate': 1.3374758701505812e-05, 'epoch': 0.41} 41%|████ | 2360/5773 [3:40:36<5:11:06, 5.47s/it] {'loss': 0.596, 'learning_rate': 1.3374758701505812e-05, 'epoch': 0.41} 41%|████ | 2360/5773 [3:40:30<5:11:06, 5.47s/it] 41%|████ | 2361/5773 [3:40:41<5:11:07, 5.47s/it] 41%|████ | 2361/5773 
[3:40:36<5:11:07, 5.47s/it] {'loss': 0.5957, 'learning_rate': 1.3369476354963935e-05, 'epoch': 0.41} 41%|████ | 2361/5773 [3:40:41<5:11:07, 5.47s/it] {'loss': 0.5957, 'learning_rate': 1.3369476354963935e-05, 'epoch': 0.41} 41%|████ | 2361/5773 [3:40:36<5:11:07, 5.47s/it] 41%|████ | 2362/5773 [3:40:47<5:09:37, 5.45s/it] 41%|████ | 2362/5773 [3:40:41<5:09:37, 5.45s/it] {'loss': 0.5782, 'learning_rate': 1.3364192947603247e-05, 'epoch': 0.41} 41%|████ | 2362/5773 [3:40:47<5:09:37, 5.45s/it] {'loss': 0.5782, 'learning_rate': 1.3364192947603247e-05, 'epoch': 0.41} 41%|████ | 2362/5773 [3:40:41<5:09:37, 5.45s/it] 41%|████ | 2363/5773 [3:40:52<5:13:53, 5.52s/it] 41%|████ | 2363/5773 [3:40:47<5:13:53, 5.52s/it] {'loss': 0.5929, 'learning_rate': 1.3358908481087133e-05, 'epoch': 0.41} 41%|████ | 2363/5773 [3:40:52<5:13:53, 5.52s/it] {'loss': 0.5929, 'learning_rate': 1.3358908481087133e-05, 'epoch': 0.41} 41%|████ | 2363/5773 [3:40:47<5:13:53, 5.52s/it] 41%|████ | 2364/5773 [3:40:57<5:09:29, 5.45s/it] 41%|████ | 2364/5773 [3:40:52<5:09:29, 5.45s/it] {'loss': 0.5614, 'learning_rate': 1.3353622957079316e-05, 'epoch': 0.41} 41%|████ | 2364/5773 [3:40:57<5:09:29, 5.45s/it] {'loss': 0.5614, 'learning_rate': 1.3353622957079316e-05, 'epoch': 0.41} 41%|████ | 2364/5773 [3:40:52<5:09:29, 5.45s/it] 41%|████ | 2365/5773 [3:41:03<5:11:29, 5.48s/it] 41%|████ | 2365/5773 [3:40:58<5:11:29, 5.48s/it] {'loss': 0.5823, 'learning_rate': 1.3348336377243842e-05, 'epoch': 0.41} 41%|████ | 2365/5773 [3:41:03<5:11:29, 5.48s/it] {'loss': 0.5823, 'learning_rate': 1.3348336377243842e-05, 'epoch': 0.41} 41%|████ | 2365/5773 [3:40:58<5:11:29, 5.48s/it] 41%|████ | 2366/5773 [3:41:08<5:08:37, 5.44s/it] 41%|████ | 2366/5773 [3:41:03<5:08:37, 5.44s/it] {'loss': 0.5637, 'learning_rate': 1.3343048743245098e-05, 'epoch': 0.41} 41%|████ | 2366/5773 [3:41:08<5:08:37, 5.44s/it] {'loss': 0.5637, 'learning_rate': 1.3343048743245098e-05, 'epoch': 0.41} 41%|████ | 2366/5773 [3:41:03<5:08:37, 5.44s/it] 41%|████ | 
2367/5773 [3:41:14<5:05:24, 5.38s/it] 41%|████ | 2367/5773 [3:41:08<5:05:24, 5.38s/it] {'loss': 0.5768, 'learning_rate': 1.33377600567478e-05, 'epoch': 0.41} 41%|████ | 2367/5773 [3:41:14<5:05:24, 5.38s/it] {'loss': 0.5768, 'learning_rate': 1.33377600567478e-05, 'epoch': 0.41} 41%|████ | 2367/5773 [3:41:08<5:05:24, 5.38s/it] 41%|████ | 2368/5773 [3:41:19<5:05:31, 5.38s/it] 41%|████ | 2368/5773 [3:41:13<5:05:31, 5.38s/it] {'loss': 0.5733, 'learning_rate': 1.3332470319416996e-05, 'epoch': 0.41} 41%|████ | 2368/5773 [3:41:19<5:05:31, 5.38s/it] {'loss': 0.5733, 'learning_rate': 1.3332470319416996e-05, 'epoch': 0.41} 41%|████ | 2368/5773 [3:41:14<5:05:31, 5.38s/it] 41%|████ | 2369/5773 [3:41:24<5:06:14, 5.40s/it] 41%|████ | 2369/5773 [3:41:19<5:06:14, 5.40s/it] {'loss': 0.5706, 'learning_rate': 1.3327179532918063e-05, 'epoch': 0.41} 41%|████ | 2369/5773 [3:41:24<5:06:14, 5.40s/it] {'loss': 0.5706, 'learning_rate': 1.3327179532918063e-05, 'epoch': 0.41} 41%|████ | 2369/5773 [3:41:19<5:06:14, 5.40s/it] 41%|████ | 2370/5773 [3:41:30<5:07:12, 5.42s/it] 41%|████ | 2370/5773 [3:41:24<5:07:12, 5.42s/it] {'loss': 0.5708, 'learning_rate': 1.3321887698916709e-05, 'epoch': 0.41} 41%|████ | 2370/5773 [3:41:30<5:07:12, 5.42s/it] {'loss': 0.5708, 'learning_rate': 1.3321887698916709e-05, 'epoch': 0.41} 41%|████ | 2370/5773 [3:41:24<5:07:12, 5.42s/it] 41%|████ | 2371/5773 [3:41:35<5:06:06, 5.40s/it] 41%|████ | 2371/5773 [3:41:30<5:06:05, 5.40s/it] {'loss': 0.5796, 'learning_rate': 1.3316594819078979e-05, 'epoch': 0.41} 41%|████ | 2371/5773 [3:41:35<5:06:06, 5.40s/it] {'loss': 0.5796, 'learning_rate': 1.3316594819078979e-05, 'epoch': 0.41} 41%|████ | 2371/5773 [3:41:30<5:06:05, 5.40s/it] 41%|████ | 2372/5773 [3:41:41<5:07:29, 5.42s/it] 41%|████ | 2372/5773 [3:41:35<5:07:30, 5.42s/it] {'loss': 0.5536, 'learning_rate': 1.3311300895071229e-05, 'epoch': 0.41} 41%|████ | 2372/5773 [3:41:41<5:07:29, 5.42s/it] {'loss': 0.5536, 'learning_rate': 1.3311300895071229e-05, 'epoch': 0.41} 41%|████ | 
2372/5773 [3:41:35<5:07:30, 5.42s/it] 41%|████ | 2373/5773 [3:41:47<5:12:51, 5.52s/it] 41%|████ | 2373/5773 [3:41:41<5:12:51, 5.52s/it] {'loss': 0.5716, 'learning_rate': 1.3306005928560166e-05, 'epoch': 0.41} 41%|████ | 2373/5773 [3:41:47<5:12:51, 5.52s/it] {'loss': 0.5716, 'learning_rate': 1.3306005928560166e-05, 'epoch': 0.41} 41%|████ | 2373/5773 [3:41:41<5:12:51, 5.52s/it] 41%|████ | 2374/5773 [3:41:52<5:08:55, 5.45s/it] 41%|████ | 2374/5773 [3:41:46<5:08:55, 5.45s/it] {'loss': 0.5725, 'learning_rate': 1.330070992121281e-05, 'epoch': 0.41} 41%|████ | 2374/5773 [3:41:52<5:08:55, 5.45s/it] {'loss': 0.5725, 'learning_rate': 1.330070992121281e-05, 'epoch': 0.41} 41%|████ | 2374/5773 [3:41:46<5:08:55, 5.45s/it] 41%|████ | 2375/5773 [3:41:57<5:09:39, 5.47s/it] 41%|████ | 2375/5773 [3:41:52<5:09:39, 5.47s/it] {'loss': 0.5746, 'learning_rate': 1.3295412874696512e-05, 'epoch': 0.41} 41%|████ | 2375/5773 [3:41:57<5:09:39, 5.47s/it] {'loss': 0.5746, 'learning_rate': 1.3295412874696512e-05, 'epoch': 0.41} 41%|████ | 2375/5773 [3:41:52<5:09:39, 5.47s/it] 41%|████ | 2376/5773 [3:42:03<5:09:51, 5.47s/it] 41%|████ | 2376/5773 [3:41:57<5:09:51, 5.47s/it] {'loss': 0.5915, 'learning_rate': 1.3290114790678956e-05, 'epoch': 0.41} 41%|████ | 2376/5773 [3:42:03<5:09:51, 5.47s/it] {'loss': 0.5915, 'learning_rate': 1.3290114790678956e-05, 'epoch': 0.41} 41%|████ | 2376/5773 [3:41:57<5:09:51, 5.47s/it] 41%|████ | 2377/5773 [3:42:08<5:06:02, 5.41s/it] 41%|████ | 2377/5773 [3:42:03<5:06:02, 5.41s/it] {'loss': 0.5826, 'learning_rate': 1.3284815670828144e-05, 'epoch': 0.41} 41%|████ | 2377/5773 [3:42:08<5:06:02, 5.41s/it] {'loss': 0.5826, 'learning_rate': 1.3284815670828144e-05, 'epoch': 0.41} 41%|████ | 2377/5773 [3:42:03<5:06:02, 5.41s/it] 41%|████ | 2378/5773 [3:42:13<5:05:37, 5.40s/it] 41%|████ | 2378/5773 [3:42:08<5:05:37, 5.40s/it] {'loss': 0.5764, 'learning_rate': 1.327951551681241e-05, 'epoch': 0.41} 41%|████ | 2378/5773 [3:42:13<5:05:37, 5.40s/it] {'loss': 0.5764, 'learning_rate': 
1.327951551681241e-05, 'epoch': 0.41} 41%|████ | 2378/5773 [3:42:08<5:05:37, 5.40s/it] 41%|████ | 2379/5773 [3:42:19<5:03:20, 5.36s/it] 41%|████ | 2379/5773 [3:42:13<5:03:20, 5.36s/it] {'loss': 0.5632, 'learning_rate': 1.327421433030041e-05, 'epoch': 0.41} 41%|████ | 2379/5773 [3:42:19<5:03:20, 5.36s/it] {'loss': 0.5632, 'learning_rate': 1.327421433030041e-05, 'epoch': 0.41} 41%|████ | 2379/5773 [3:42:13<5:03:20, 5.36s/it] 41%|████ | 2380/5773 [3:42:24<5:07:32, 5.44s/it] 41%|████ | 2380/5773 [3:42:19<5:07:31, 5.44s/it] {'loss': 0.5869, 'learning_rate': 1.326891211296113e-05, 'epoch': 0.41} 41%|████ | 2380/5773 [3:42:24<5:07:32, 5.44s/it] {'loss': 0.5869, 'learning_rate': 1.326891211296113e-05, 'epoch': 0.41} 41%|████ | 2380/5773 [3:42:19<5:07:31, 5.44s/it] 41%|████ | 2381/5773 [3:42:30<5:05:43, 5.41s/it] 41%|████ | 2381/5773 [3:42:24<5:05:43, 5.41s/it] {'loss': 0.5678, 'learning_rate': 1.3263608866463878e-05, 'epoch': 0.41} 41%|████ | 2381/5773 [3:42:30<5:05:43, 5.41s/it] {'loss': 0.5678, 'learning_rate': 1.3263608866463878e-05, 'epoch': 0.41} 41%|████ | 2381/5773 [3:42:24<5:05:43, 5.41s/it] 41%|████▏ | 2382/5773 [3:42:35<5:04:52, 5.39s/it] 41%|████▏ | 2382/5773 [3:42:29<5:04:52, 5.39s/it] {'loss': 0.5742, 'learning_rate': 1.325830459247828e-05, 'epoch': 0.41} 41%|████▏ | 2382/5773 [3:42:35<5:04:52, 5.39s/it] {'loss': 0.5742, 'learning_rate': 1.325830459247828e-05, 'epoch': 0.41} 41%|████▏ | 2382/5773 [3:42:29<5:04:52, 5.39s/it] 41%|████▏ | 2383/5773 [3:42:41<5:06:45, 5.43s/it] 41%|████▏ | 2383/5773 [3:42:35<5:06:45, 5.43s/it] {'loss': 0.5907, 'learning_rate': 1.3252999292674292e-05, 'epoch': 0.41} 41%|████▏ | 2383/5773 [3:42:41<5:06:45, 5.43s/it] {'loss': 0.5907, 'learning_rate': 1.3252999292674292e-05, 'epoch': 0.41} 41%|████▏ | 2383/5773 [3:42:35<5:06:45, 5.43s/it] 41%|████▏ | 2384/5773 [3:42:46<5:06:39, 5.43s/it] 41%|████▏ | 2384/5773 [3:42:40<5:06:39, 5.43s/it] {'loss': 0.5987, 'learning_rate': 1.3247692968722198e-05, 'epoch': 0.41} 41%|████▏ | 2384/5773 
[3:42:46<5:06:39, 5.43s/it] {'loss': 0.5987, 'learning_rate': 1.3247692968722198e-05, 'epoch': 0.41} 41%|████▏ | 2384/5773 [3:42:40<5:06:39, 5.43s/it] 41%|████▏ | 2385/5773 [3:42:51<5:05:41, 5.41s/it] 41%|████▏ | 2385/5773 [3:42:46<5:05:42, 5.41s/it] {'loss': 0.5672, 'learning_rate': 1.3242385622292593e-05, 'epoch': 0.41} 41%|████▏ | 2385/5773 [3:42:51<5:05:41, 5.41s/it] {'loss': 0.5672, 'learning_rate': 1.3242385622292593e-05, 'epoch': 0.41} 41%|████▏ | 2385/5773 [3:42:46<5:05:42, 5.41s/it] 41%|████▏ | 2386/5773 [3:42:57<5:04:35, 5.40s/it] 41%|████▏ | 2386/5773 [3:42:51<5:04:35, 5.40s/it] {'loss': 0.5869, 'learning_rate': 1.32370772550564e-05, 'epoch': 0.41} 41%|████▏ | 2386/5773 [3:42:57<5:04:35, 5.40s/it] {'loss': 0.5869, 'learning_rate': 1.32370772550564e-05, 'epoch': 0.41} 41%|████▏ | 2386/5773 [3:42:51<5:04:35, 5.40s/it] 41%|████▏ | 2387/5773 [3:43:02<5:07:48, 5.45s/it] 41%|████▏ | 2387/5773 [3:42:57<5:07:48, 5.45s/it] {'loss': 0.5777, 'learning_rate': 1.3231767868684863e-05, 'epoch': 0.41} 41%|████▏ | 2387/5773 [3:42:57<5:07:48, 5.45s/it]{'loss': 0.5777, 'learning_rate': 1.3231767868684863e-05, 'epoch': 0.41} 41%|████▏ | 2387/5773 [3:43:02<5:07:48, 5.45s/it] 41%|████▏ | 2388/5773 [3:43:08<5:08:32, 5.47s/it] 41%|████▏ | 2388/5773 [3:43:02<5:08:33, 5.47s/it] {'loss': 0.568, 'learning_rate': 1.3226457464849549e-05, 'epoch': 0.41} 41%|████▏ | 2388/5773 [3:43:08<5:08:32, 5.47s/it] {'loss': 0.568, 'learning_rate': 1.3226457464849549e-05, 'epoch': 0.41} 41%|████▏ | 2388/5773 [3:43:02<5:08:33, 5.47s/it] 41%|████▏ | 2389/5773 [3:43:13<5:09:28, 5.49s/it] 41%|████▏ | 2389/5773 [3:43:08<5:09:28, 5.49s/it] {'loss': 0.579, 'learning_rate': 1.3221146045222342e-05, 'epoch': 0.41} 41%|████▏ | 2389/5773 [3:43:13<5:09:28, 5.49s/it] {'loss': 0.579, 'learning_rate': 1.3221146045222342e-05, 'epoch': 0.41} 41%|████▏ | 2389/5773 [3:43:08<5:09:28, 5.49s/it] 41%|████▏ | 2390/5773 [3:43:19<5:06:23, 5.43s/it] 41%|████▏ | 2390/5773 [3:43:13<5:06:23, 5.43s/it] {'loss': 0.5896, 
'learning_rate': 1.321583361147544e-05, 'epoch': 0.41} 41%|████▏ | 2390/5773 [3:43:19<5:06:23, 5.43s/it] {'loss': 0.5896, 'learning_rate': 1.321583361147544e-05, 'epoch': 0.41} 41%|████▏ | 2390/5773 [3:43:13<5:06:23, 5.43s/it] 41%|████▏ | 2391/5773 [3:43:24<5:06:30, 5.44s/it] 41%|████▏ | 2391/5773 [3:43:19<5:06:30, 5.44s/it] {'loss': 0.575, 'learning_rate': 1.3210520165281376e-05, 'epoch': 0.41} 41%|████▏ | 2391/5773 [3:43:24<5:06:30, 5.44s/it] {'loss': 0.575, 'learning_rate': 1.3210520165281376e-05, 'epoch': 0.41} 41%|████▏ | 2391/5773 [3:43:19<5:06:30, 5.44s/it] 41%|████▏ | 2392/5773 [3:43:29<5:05:33, 5.42s/it] 41%|████▏ | 2392/5773 [3:43:24<5:05:32, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.3205205708312985e-05, 'epoch': 0.41} 41%|████▏ | 2392/5773 [3:43:29<5:05:33, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.3205205708312985e-05, 'epoch': 0.41} 41%|████▏ | 2392/5773 [3:43:24<5:05:32, 5.42s/it] 41%|████▏ | 2393/5773 [3:43:35<5:07:43, 5.46s/it] 41%|████▏ | 2393/5773 [3:43:29<5:07:43, 5.46s/it] {'loss': 0.5676, 'learning_rate': 1.3199890242243432e-05, 'epoch': 0.41} 41%|████▏ | 2393/5773 [3:43:35<5:07:43, 5.46s/it] {'loss': 0.5676, 'learning_rate': 1.3199890242243432e-05, 'epoch': 0.41} 41%|████▏ | 2393/5773 [3:43:29<5:07:43, 5.46s/it] 41%|████▏ | 2394/5773 [3:43:41<5:11:21, 5.53s/it] 41%|████▏ | 2394/5773 [3:43:35<5:11:21, 5.53s/it] {'loss': 0.576, 'learning_rate': 1.3194573768746197e-05, 'epoch': 0.41} 41%|████▏ | 2394/5773 [3:43:41<5:11:21, 5.53s/it] {'loss': 0.576, 'learning_rate': 1.3194573768746197e-05, 'epoch': 0.41} 41%|████▏ | 2394/5773 [3:43:35<5:11:21, 5.53s/it] 41%|████▏ | 2395/5773 [3:43:46<5:11:36, 5.53s/it] 41%|████▏ | 2395/5773 [3:43:41<5:11:36, 5.53s/it] {'loss': 0.5674, 'learning_rate': 1.3189256289495074e-05, 'epoch': 0.41} 41%|████▏ | 2395/5773 [3:43:46<5:11:36, 5.53s/it] {'loss': 0.5674, 'learning_rate': 1.3189256289495074e-05, 'epoch': 0.41} 41%|████▏ | 2395/5773 [3:43:41<5:11:36, 5.53s/it] 42%|████▏ | 2396/5773 [3:43:52<5:15:06, 
5.60s/it] 42%|████▏ | 2396/5773 [3:43:46<5:15:06, 5.60s/it] {'loss': 0.5795, 'learning_rate': 1.3183937806164174e-05, 'epoch': 0.42} 42%|████▏ | 2396/5773 [3:43:52<5:15:06, 5.60s/it] {'loss': 0.5795, 'learning_rate': 1.3183937806164174e-05, 'epoch': 0.42} 42%|████▏ | 2396/5773 [3:43:46<5:15:06, 5.60s/it] 42%|████▏ | 2397/5773 [3:43:57<5:11:15, 5.53s/it] 42%|████▏ | 2397/5773 [3:43:52<5:11:16, 5.53s/it] {'loss': 0.5692, 'learning_rate': 1.3178618320427924e-05, 'epoch': 0.42} 42%|████▏ | 2397/5773 [3:43:57<5:11:15, 5.53s/it] {'loss': 0.5692, 'learning_rate': 1.3178618320427924e-05, 'epoch': 0.42} 42%|████▏ | 2397/5773 [3:43:52<5:11:16, 5.53s/it] 42%|████▏ | 2398/5773 [3:44:03<5:11:47, 5.54s/it] 42%|████▏ | 2398/5773 [3:43:57<5:11:47, 5.54s/it] {'loss': 0.586, 'learning_rate': 1.3173297833961074e-05, 'epoch': 0.42} 42%|████▏ | 2398/5773 [3:44:03<5:11:47, 5.54s/it] {'loss': 0.586, 'learning_rate': 1.3173297833961074e-05, 'epoch': 0.42} 42%|████▏ | 2398/5773 [3:43:57<5:11:47, 5.54s/it] 42%|████▏ | 2399/5773 [3:44:08<5:11:27, 5.54s/it] 42%|████▏ | 2399/5773 [3:44:03<5:11:27, 5.54s/it] {'loss': 0.583, 'learning_rate': 1.3167976348438678e-05, 'epoch': 0.42} 42%|████▏ | 2399/5773 [3:44:08<5:11:27, 5.54s/it] {'loss': 0.583, 'learning_rate': 1.3167976348438678e-05, 'epoch': 0.42} 42%|████▏ | 2399/5773 [3:44:03<5:11:27, 5.54s/it]13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 42%|████▏ | 2400/5773 [3:44:14<5:08:48, 5.49s/it]14 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 01 AutoResumeHook: Checking whether to suspend... 
4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 42%|████▏ | 2400/5773 [3:44:08<5:08:47, 5.49s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.5683, 'learning_rate': 1.3162653865536111e-05, 'epoch': 0.42} 42%|████▏ | 2400/5773 [3:44:14<5:08:48, 5.49s/it] {'loss': 0.5683, 'learning_rate': 1.3162653865536111e-05, 'epoch': 0.42} 42%|████▏ | 2400/5773 [3:44:08<5:08:47, 5.49s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 42%|████▏ | 2401/5773 [3:44:33<8:51:22, 9.45s/it] 42%|████▏ | 2401/5773 [3:44:27<8:51:21, 9.45s/it] {'loss': 0.5774, 'learning_rate': 1.315733038692906e-05, 'epoch': 0.42} 42%|████▏ | 2401/5773 [3:44:33<8:51:22, 9.45s/it] {'loss': 0.5774, 'learning_rate': 1.315733038692906e-05, 'epoch': 0.42} 42%|████▏ | 2401/5773 [3:44:27<8:51:21, 9.45s/it] 42%|████▏ | 2402/5773 [3:44:38<7:45:41, 8.29s/it] 42%|████▏ | 2402/5773 [3:44:33<7:45:41, 8.29s/it] {'loss': 0.5831, 'learning_rate': 1.3152005914293534e-05, 'epoch': 0.42} 42%|████▏ | 2402/5773 [3:44:38<7:45:41, 8.29s/it] {'loss': 0.5831, 'learning_rate': 1.3152005914293534e-05, 'epoch': 0.42} 42%|████▏ | 2402/5773 [3:44:33<7:45:41, 8.29s/it] 42%|████▏ | 2403/5773 [3:44:44<6:57:57, 7.44s/it] 42%|████▏ | 2403/5773 [3:44:38<6:57:57, 7.44s/it] {'loss': 0.5713, 'learning_rate': 1.3146680449305833e-05, 'epoch': 0.42} 42%|████▏ | 2403/5773 [3:44:44<6:57:57, 7.44s/it] {'loss': 0.5713, 'learning_rate': 1.3146680449305833e-05, 'epoch': 0.42} 42%|████▏ | 2403/5773 [3:44:38<6:57:57, 7.44s/it] 42%|████▏ | 2404/5773 [3:44:49<6:22:30, 6.81s/it] 42%|████▏ | 2404/5773 [3:44:43<6:22:30, 6.81s/it] {'loss': 0.5729, 'learning_rate': 1.3141353993642594e-05, 'epoch': 0.42} 42%|████▏ | 2404/5773 [3:44:49<6:22:30, 6.81s/it] {'loss': 0.5729, 'learning_rate': 1.3141353993642594e-05, 'epoch': 0.42} 42%|████▏ | 2404/5773 [3:44:43<6:22:30, 6.81s/it] 42%|████▏ | 2405/5773 [3:44:54<6:00:40, 6.43s/it] 42%|████▏ | 2405/5773 [3:44:49<6:00:40, 6.43s/it] {'loss': 0.5619, 'learning_rate': 1.3136026548980751e-05, 'epoch': 0.42} 42%|████▏ | 2405/5773 [3:44:54<6:00:40, 6.43s/it] {'loss': 0.5619, 'learning_rate': 1.3136026548980751e-05, 'epoch': 0.42} 42%|████▏ | 2405/5773 [3:44:49<6:00:40, 6.43s/it] 42%|████▏ | 2406/5773 [3:45:00<5:46:17, 6.17s/it] 42%|████▏ | 2406/5773 [3:44:54<5:46:18, 6.17s/it] {'loss': 0.5978, 'learning_rate': 1.3130698116997555e-05, 'epoch': 0.42} 42%|████▏ | 2406/5773 [3:45:00<5:46:17, 6.17s/it] {'loss': 0.5978, 'learning_rate': 
1.3130698116997555e-05, 'epoch': 0.42} 42%|████▏ | 2406/5773 [3:44:55<5:46:18, 6.17s/it] 42%|████▏ | 2407/5773 [3:45:06<5:35:02, 5.97s/it] 42%|████▏ | 2407/5773 [3:45:00<5:35:02, 5.97s/it] {'loss': 0.5768, 'learning_rate': 1.3125368699370567e-05, 'epoch': 0.42} 42%|████▏ | 2407/5773 [3:45:06<5:35:02, 5.97s/it] {'loss': 0.5768, 'learning_rate': 1.3125368699370567e-05, 'epoch': 0.42} 42%|████▏ | 2407/5773 [3:45:00<5:35:02, 5.97s/it] 42%|████▏ | 2408/5773 [3:45:11<5:26:43, 5.83s/it] 42%|████▏ | 2408/5773 [3:45:05<5:26:43, 5.83s/it] {'loss': 0.5643, 'learning_rate': 1.3120038297777657e-05, 'epoch': 0.42} 42%|████▏ | 2408/5773 [3:45:11<5:26:43, 5.83s/it] {'loss': 0.5643, 'learning_rate': 1.3120038297777657e-05, 'epoch': 0.42} 42%|████▏ | 2408/5773 [3:45:05<5:26:43, 5.83s/it] 42%|████▏ | 2409/5773 [3:45:16<5:18:50, 5.69s/it] 42%|████▏ | 2409/5773 [3:45:11<5:18:50, 5.69s/it] {'loss': 0.5616, 'learning_rate': 1.3114706913897008e-05, 'epoch': 0.42} 42%|████▏ | 2409/5773 [3:45:16<5:18:50, 5.69s/it] {'loss': 0.5616, 'learning_rate': 1.3114706913897008e-05, 'epoch': 0.42} 42%|████▏ | 2409/5773 [3:45:11<5:18:50, 5.69s/it] 42%|████▏ | 2410/5773 [3:45:22<5:13:36, 5.60s/it] 42%|████▏ | 2410/5773 [3:45:16<5:13:36, 5.60s/it] {'loss': 0.569, 'learning_rate': 1.3109374549407107e-05, 'epoch': 0.42} 42%|████▏ | 2410/5773 [3:45:22<5:13:36, 5.60s/it] {'loss': 0.569, 'learning_rate': 1.3109374549407107e-05, 'epoch': 0.42} 42%|████▏ | 2410/5773 [3:45:16<5:13:36, 5.60s/it] 42%|████▏ | 2411/5773 [3:45:27<5:08:56, 5.51s/it] 42%|████▏ | 2411/5773 [3:45:22<5:08:56, 5.51s/it] {'loss': 0.5703, 'learning_rate': 1.3104041205986753e-05, 'epoch': 0.42} 42%|████▏ | 2411/5773 [3:45:27<5:08:56, 5.51s/it] {'loss': 0.5703, 'learning_rate': 1.3104041205986753e-05, 'epoch': 0.42} 42%|████▏ | 2411/5773 [3:45:22<5:08:56, 5.51s/it] 42%|████▏ | 2412/5773 [3:45:32<5:05:28, 5.45s/it] 42%|████▏ | 2412/5773 [3:45:27<5:05:28, 5.45s/it] {'loss': 0.5895, 'learning_rate': 1.3098706885315057e-05, 'epoch': 0.42} 42%|████▏ 
| 2412/5773 [3:45:32<5:05:28, 5.45s/it] {'loss': 0.5895, 'learning_rate': 1.3098706885315057e-05, 'epoch': 0.42} 42%|████▏ | 2412/5773 [3:45:27<5:05:28, 5.45s/it] 42%|████▏ | 2413/5773 [3:45:38<5:04:50, 5.44s/it] 42%|████▏ | 2413/5773 [3:45:32<5:04:50, 5.44s/it] {'loss': 0.5622, 'learning_rate': 1.3093371589071428e-05, 'epoch': 0.42} 42%|████▏ | 2413/5773 [3:45:38<5:04:50, 5.44s/it] {'loss': 0.5622, 'learning_rate': 1.3093371589071428e-05, 'epoch': 0.42} 42%|████▏ | 2413/5773 [3:45:32<5:04:50, 5.44s/it] 42%|████▏ | 2414/5773 [3:45:43<5:03:49, 5.43s/it] 42%|████▏ | 2414/5773 [3:45:38<5:03:49, 5.43s/it] {'loss': 0.5791, 'learning_rate': 1.308803531893559e-05, 'epoch': 0.42} 42%|████▏ | 2414/5773 [3:45:43<5:03:49, 5.43s/it] {'loss': 0.5791, 'learning_rate': 1.308803531893559e-05, 'epoch': 0.42} 42%|████▏ | 2414/5773 [3:45:38<5:03:49, 5.43s/it] 42%|████▏ | 2415/5773 [3:45:49<5:05:58, 5.47s/it] 42%|████▏ | 2415/5773 [3:45:43<5:05:58, 5.47s/it] {'loss': 0.5641, 'learning_rate': 1.3082698076587573e-05, 'epoch': 0.42} 42%|████▏ | 2415/5773 [3:45:49<5:05:58, 5.47s/it] {'loss': 0.5641, 'learning_rate': 1.3082698076587573e-05, 'epoch': 0.42} 42%|████▏ | 2415/5773 [3:45:43<5:05:58, 5.47s/it] 42%|████▏ | 2416/5773 [3:45:54<5:07:32, 5.50s/it] 42%|████▏ | 2416/5773 [3:45:49<5:07:32, 5.50s/it] {'loss': 0.5725, 'learning_rate': 1.307735986370771e-05, 'epoch': 0.42} 42%|████▏ | 2416/5773 [3:45:54<5:07:32, 5.50s/it] {'loss': 0.5725, 'learning_rate': 1.307735986370771e-05, 'epoch': 0.42} 42%|████▏ | 2416/5773 [3:45:49<5:07:32, 5.50s/it] 42%|████▏ | 2417/5773 [3:46:00<5:05:56, 5.47s/it] 42%|████▏ | 2417/5773 [3:45:54<5:05:56, 5.47s/it] {'loss': 0.5817, 'learning_rate': 1.3072020681976639e-05, 'epoch': 0.42} 42%|████▏ | 2417/5773 [3:46:00<5:05:56, 5.47s/it] {'loss': 0.5817, 'learning_rate': 1.3072020681976639e-05, 'epoch': 0.42} 42%|████▏ | 2417/5773 [3:45:54<5:05:56, 5.47s/it] 42%|████▏ | 2418/5773 [3:46:05<5:06:59, 5.49s/it] 42%|████▏ | 2418/5773 [3:46:00<5:06:59, 5.49s/it] {'loss': 
0.5839, 'learning_rate': 1.3066680533075312e-05, 'epoch': 0.42} 42%|████▏ | 2418/5773 [3:46:05<5:06:59, 5.49s/it] {'loss': 0.5839, 'learning_rate': 1.3066680533075312e-05, 'epoch': 0.42} 42%|████▏ | 2418/5773 [3:46:00<5:06:59, 5.49s/it] 42%|████▏ | 2419/5773 [3:46:11<5:07:19, 5.50s/it] 42%|████▏ | 2419/5773 [3:46:05<5:07:19, 5.50s/it] {'loss': 0.5825, 'learning_rate': 1.3061339418684967e-05, 'epoch': 0.42} 42%|████▏ | 2419/5773 [3:46:11<5:07:19, 5.50s/it] {'loss': 0.5825, 'learning_rate': 1.3061339418684967e-05, 'epoch': 0.42} 42%|████▏ | 2419/5773 [3:46:05<5:07:19, 5.50s/it] 42%|████▏ | 2420/5773 [3:46:16<5:06:42, 5.49s/it] 42%|████▏ | 2420/5773 [3:46:11<5:06:42, 5.49s/it] {'loss': 0.5727, 'learning_rate': 1.3055997340487165e-05, 'epoch': 0.42} 42%|████▏ | 2420/5773 [3:46:16<5:06:42, 5.49s/it] {'loss': 0.5727, 'learning_rate': 1.3055997340487165e-05, 'epoch': 0.42} 42%|████▏ | 2420/5773 [3:46:11<5:06:42, 5.49s/it] 42%|████▏ | 2421/5773 [3:46:22<5:07:39, 5.51s/it] 42%|████▏ | 2421/5773 [3:46:16<5:07:39, 5.51s/it] {'loss': 0.5592, 'learning_rate': 1.3050654300163763e-05, 'epoch': 0.42} 42%|████▏ | 2421/5773 [3:46:22<5:07:39, 5.51s/it] {'loss': 0.5592, 'learning_rate': 1.3050654300163763e-05, 'epoch': 0.42} 42%|████▏ | 2421/5773 [3:46:16<5:07:39, 5.51s/it] 42%|████▏ | 2422/5773 [3:46:27<5:06:22, 5.49s/it] 42%|████▏ | 2422/5773 [3:46:22<5:06:22, 5.49s/it] {'loss': 0.5879, 'learning_rate': 1.304531029939692e-05, 'epoch': 0.42} 42%|████▏ | 2422/5773 [3:46:27<5:06:22, 5.49s/it] {'loss': 0.5879, 'learning_rate': 1.304531029939692e-05, 'epoch': 0.42} 42%|████▏ | 2422/5773 [3:46:22<5:06:22, 5.49s/it] 42%|████▏ | 2423/5773 [3:46:33<5:06:14, 5.48s/it] 42%|████▏ | 2423/5773 [3:46:27<5:06:14, 5.48s/it] {'loss': 0.5637, 'learning_rate': 1.30399653398691e-05, 'epoch': 0.42} 42%|████▏ | 2423/5773 [3:46:33<5:06:14, 5.48s/it] {'loss': 0.5637, 'learning_rate': 1.30399653398691e-05, 'epoch': 0.42} 42%|████▏ | 2423/5773 [3:46:27<5:06:14, 5.48s/it] 42%|████▏ | 2424/5773 
[3:46:38<5:05:21, 5.47s/it] 42%|████▏ | 2424/5773 [3:46:33<5:05:21, 5.47s/it] {'loss': 0.5782, 'learning_rate': 1.3034619423263063e-05, 'epoch': 0.42} 42%|████▏ | 2424/5773 [3:46:38<5:05:21, 5.47s/it] {'loss': 0.5782, 'learning_rate': 1.3034619423263063e-05, 'epoch': 0.42} 42%|████▏ | 2424/5773 [3:46:33<5:05:21, 5.47s/it] 42%|████▏ | 2425/5773 [3:46:44<5:07:07, 5.50s/it] 42%|████▏ | 2425/5773 [3:46:38<5:07:07, 5.50s/it] {'loss': 0.5599, 'learning_rate': 1.3029272551261875e-05, 'epoch': 0.42} 42%|████▏ | 2425/5773 [3:46:44<5:07:07, 5.50s/it] {'loss': 0.5599, 'learning_rate': 1.3029272551261875e-05, 'epoch': 0.42} 42%|████▏ | 2425/5773 [3:46:38<5:07:07, 5.50s/it] 42%|████▏ | 2426/5773 [3:46:49<5:03:35, 5.44s/it] 42%|████▏ | 2426/5773 [3:46:44<5:03:35, 5.44s/it] {'loss': 0.5856, 'learning_rate': 1.3023924725548902e-05, 'epoch': 0.42} 42%|████▏ | 2426/5773 [3:46:49<5:03:35, 5.44s/it] {'loss': 0.5856, 'learning_rate': 1.3023924725548902e-05, 'epoch': 0.42} 42%|████▏ | 2426/5773 [3:46:44<5:03:35, 5.44s/it] 42%|████▏ | 2427/5773 [3:46:55<5:05:05, 5.47s/it] 42%|████▏ | 2427/5773 [3:46:49<5:05:05, 5.47s/it] {'loss': 0.561, 'learning_rate': 1.3018575947807812e-05, 'epoch': 0.42} 42%|████▏ | 2427/5773 [3:46:55<5:05:05, 5.47s/it] {'loss': 0.561, 'learning_rate': 1.3018575947807812e-05, 'epoch': 0.42} 42%|████▏ | 2427/5773 [3:46:49<5:05:05, 5.47s/it] 42%|████▏ | 2428/5773 [3:47:00<5:03:40, 5.45s/it] 42%|████▏ | 2428/5773 [3:46:54<5:03:40, 5.45s/it] {'loss': 0.5835, 'learning_rate': 1.3013226219722575e-05, 'epoch': 0.42} 42%|████▏ | 2428/5773 [3:47:00<5:03:40, 5.45s/it] {'loss': 0.5835, 'learning_rate': 1.3013226219722575e-05, 'epoch': 0.42} 42%|████▏ | 2428/5773 [3:46:54<5:03:40, 5.45s/it] 42%|████▏ | 2429/5773 [3:47:05<5:01:38, 5.41s/it] 42%|████▏ | 2429/5773 [3:47:00<5:01:38, 5.41s/it] {'loss': 0.5977, 'learning_rate': 1.3007875542977448e-05, 'epoch': 0.42} 42%|████▏ | 2429/5773 [3:47:05<5:01:38, 5.41s/it] {'loss': 0.5977, 'learning_rate': 1.3007875542977448e-05, 'epoch': 
0.42} 42%|████▏ | 2429/5773 [3:47:00<5:01:38, 5.41s/it] 42%|████▏ | 2430/5773 [3:47:11<5:02:46, 5.43s/it] 42%|████▏ | 2430/5773 [3:47:05<5:02:46, 5.43s/it] {'loss': 0.5908, 'learning_rate': 1.3002523919257e-05, 'epoch': 0.42} 42%|████▏ | 2430/5773 [3:47:11<5:02:46, 5.43s/it] {'loss': 0.5908, 'learning_rate': 1.3002523919257e-05, 'epoch': 0.42} 42%|████▏ | 2430/5773 [3:47:05<5:02:46, 5.43s/it] 42%|████▏ | 2431/5773 [3:47:16<5:00:59, 5.40s/it] 42%|████▏ | 2431/5773 [3:47:11<5:00:59, 5.40s/it] {'loss': 0.5627, 'learning_rate': 1.2997171350246095e-05, 'epoch': 0.42} 42%|████▏ | 2431/5773 [3:47:16<5:00:59, 5.40s/it] {'loss': 0.5627, 'learning_rate': 1.2997171350246095e-05, 'epoch': 0.42} 42%|████▏ | 2431/5773 [3:47:11<5:00:59, 5.40s/it] 42%|████▏ | 2432/5773 [3:47:22<5:07:40, 5.53s/it] 42%|████▏ | 2432/5773 [3:47:16<5:07:40, 5.53s/it] {'loss': 0.5806, 'learning_rate': 1.2991817837629885e-05, 'epoch': 0.42} 42%|████▏ | 2432/5773 [3:47:22<5:07:40, 5.53s/it] {'loss': 0.5806, 'learning_rate': 1.2991817837629885e-05, 'epoch': 0.42} 42%|████▏ | 2432/5773 [3:47:16<5:07:40, 5.53s/it] 42%|████▏ | 2433/5773 [3:47:27<5:04:58, 5.48s/it] 42%|████▏ | 2433/5773 [3:47:22<5:04:58, 5.48s/it] {'loss': 0.5824, 'learning_rate': 1.2986463383093838e-05, 'epoch': 0.42} 42%|████▏ | 2433/5773 [3:47:27<5:04:58, 5.48s/it] {'loss': 0.5824, 'learning_rate': 1.2986463383093838e-05, 'epoch': 0.42} 42%|████▏ | 2433/5773 [3:47:22<5:04:58, 5.48s/it] 42%|████▏ | 2434/5773 [3:47:33<5:07:11, 5.52s/it] 42%|████▏ | 2434/5773 [3:47:27<5:07:11, 5.52s/it] {'loss': 0.5702, 'learning_rate': 1.2981107988323695e-05, 'epoch': 0.42} 42%|████▏ | 2434/5773 [3:47:33<5:07:11, 5.52s/it] {'loss': 0.5702, 'learning_rate': 1.2981107988323695e-05, 'epoch': 0.42} 42%|████▏ | 2434/5773 [3:47:27<5:07:11, 5.52s/it] 42%|████▏ | 2435/5773 [3:47:38<5:07:14, 5.52s/it] 42%|████▏ | 2435/5773 [3:47:33<5:07:14, 5.52s/it] {'loss': 0.5795, 'learning_rate': 1.2975751655005512e-05, 'epoch': 0.42} 42%|████▏ | 2435/5773 [3:47:38<5:07:14, 
5.52s/it] {'loss': 0.5795, 'learning_rate': 1.2975751655005512e-05, 'epoch': 0.42} 42%|████▏ | 2435/5773 [3:47:33<5:07:14, 5.52s/it] 42%|████▏ | 2436/5773 [3:47:44<5:03:16, 5.45s/it] 42%|████▏ | 2436/5773 [3:47:38<5:03:16, 5.45s/it] {'loss': 0.5773, 'learning_rate': 1.2970394384825637e-05, 'epoch': 0.42} 42%|████▏ | 2436/5773 [3:47:44<5:03:16, 5.45s/it] {'loss': 0.5773, 'learning_rate': 1.2970394384825637e-05, 'epoch': 0.42} 42%|████▏ | 2436/5773 [3:47:38<5:03:16, 5.45s/it] 42%|████▏ | 2437/5773 [3:47:49<5:02:47, 5.45s/it] 42%|████▏ | 2437/5773 [3:47:44<5:02:47, 5.45s/it] {'loss': 0.5654, 'learning_rate': 1.2965036179470707e-05, 'epoch': 0.42} 42%|████▏ | 2437/5773 [3:47:49<5:02:47, 5.45s/it] {'loss': 0.5654, 'learning_rate': 1.2965036179470707e-05, 'epoch': 0.42} 42%|████▏ | 2437/5773 [3:47:44<5:02:47, 5.45s/it] 42%|████▏ | 2438/5773 [3:47:55<5:06:22, 5.51s/it] 42%|████▏ | 2438/5773 [3:47:49<5:06:22, 5.51s/it] {'loss': 0.5929, 'learning_rate': 1.2959677040627653e-05, 'epoch': 0.42} 42%|████▏ | 2438/5773 [3:47:55<5:06:22, 5.51s/it] {'loss': 0.5929, 'learning_rate': 1.2959677040627653e-05, 'epoch': 0.42} 42%|████▏ | 2438/5773 [3:47:49<5:06:22, 5.51s/it] 42%|████▏ | 2439/5773 [3:48:01<5:21:53, 5.79s/it] 42%|████▏ | 2439/5773 [3:47:56<5:21:53, 5.79s/it] {'loss': 0.5697, 'learning_rate': 1.2954316969983704e-05, 'epoch': 0.42} 42%|████▏ | 2439/5773 [3:48:01<5:21:53, 5.79s/it] {'loss': 0.5697, 'learning_rate': 1.2954316969983704e-05, 'epoch': 0.42} 42%|████▏ | 2439/5773 [3:47:56<5:21:53, 5.79s/it] 42%|████▏ | 2440/5773 [3:48:07<5:14:56, 5.67s/it] 42%|████▏ | 2440/5773 [3:48:01<5:14:56, 5.67s/it] {'loss': 0.5751, 'learning_rate': 1.2948955969226384e-05, 'epoch': 0.42} 42%|████▏ | 2440/5773 [3:48:07<5:14:56, 5.67s/it] {'loss': 0.5751, 'learning_rate': 1.2948955969226384e-05, 'epoch': 0.42} 42%|████▏ | 2440/5773 [3:48:01<5:14:56, 5.67s/it] 42%|████▏ | 2441/5773 [3:48:12<5:09:19, 5.57s/it] 42%|████▏ | 2441/5773 [3:48:06<5:09:19, 5.57s/it] {'loss': 0.5835, 'learning_rate': 
1.294359404004351e-05, 'epoch': 0.42} 42%|████▏ | 2441/5773 [3:48:12<5:09:19, 5.57s/it] {'loss': 0.5835, 'learning_rate': 1.294359404004351e-05, 'epoch': 0.42} 42%|████▏ | 2441/5773 [3:48:06<5:09:19, 5.57s/it] 42%|████▏ | 2442/5773 [3:48:17<5:05:33, 5.50s/it] 42%|████▏ | 2442/5773 [3:48:12<5:05:33, 5.50s/it] {'loss': 0.5799, 'learning_rate': 1.293823118412318e-05, 'epoch': 0.42} 42%|████▏ | 2442/5773 [3:48:17<5:05:33, 5.50s/it] {'loss': 0.5799, 'learning_rate': 1.293823118412318e-05, 'epoch': 0.42} 42%|████▏ | 2442/5773 [3:48:12<5:05:33, 5.50s/it] 42%|████▏ | 2443/5773 [3:48:23<5:04:17, 5.48s/it] 42%|████▏ | 2443/5773 [3:48:17<5:04:17, 5.48s/it] {'loss': 0.5659, 'learning_rate': 1.2932867403153799e-05, 'epoch': 0.42} 42%|████▏ | 2443/5773 [3:48:23<5:04:17, 5.48s/it] {'loss': 0.5659, 'learning_rate': 1.2932867403153799e-05, 'epoch': 0.42} 42%|████▏ | 2443/5773 [3:48:17<5:04:17, 5.48s/it] 42%|████▏ | 2444/5773 [3:48:28<5:02:28, 5.45s/it] 42%|████▏ | 2444/5773 [3:48:23<5:02:28, 5.45s/it] {'loss': 0.6032, 'learning_rate': 1.2927502698824057e-05, 'epoch': 0.42} 42%|████▏ | 2444/5773 [3:48:28<5:02:28, 5.45s/it] {'loss': 0.6032, 'learning_rate': 1.2927502698824057e-05, 'epoch': 0.42} 42%|████▏ | 2444/5773 [3:48:23<5:02:28, 5.45s/it] 42%|████▏ | 2445/5773 [3:48:34<5:00:42, 5.42s/it] 42%|████▏ | 2445/5773 [3:48:28<5:00:42, 5.42s/it] {'loss': 0.5823, 'learning_rate': 1.2922137072822932e-05, 'epoch': 0.42} 42%|████▏ | 2445/5773 [3:48:34<5:00:42, 5.42s/it] {'loss': 0.5823, 'learning_rate': 1.2922137072822932e-05, 'epoch': 0.42} 42%|████▏ | 2445/5773 [3:48:28<5:00:42, 5.42s/it] 42%|████▏ | 2446/5773 [3:48:39<5:00:57, 5.43s/it] 42%|████▏ | 2446/5773 [3:48:33<5:00:57, 5.43s/it] {'loss': 0.5806, 'learning_rate': 1.2916770526839693e-05, 'epoch': 0.42} 42%|████▏ | 2446/5773 [3:48:39<5:00:57, 5.43s/it] {'loss': 0.5806, 'learning_rate': 1.2916770526839693e-05, 'epoch': 0.42} 42%|████▏ | 2446/5773 [3:48:33<5:00:57, 5.43s/it] 42%|████▏ | 2447/5773 [3:48:44<4:58:17, 5.38s/it] 42%|████▏ | 
2447/5773 [3:48:39<4:58:17, 5.38s/it] {'loss': 0.5523, 'learning_rate': 1.2911403062563901e-05, 'epoch': 0.42} 42%|████▏ | 2447/5773 [3:48:44<4:58:17, 5.38s/it] {'loss': 0.5523, 'learning_rate': 1.2911403062563901e-05, 'epoch': 0.42} 42%|████▏ | 2447/5773 [3:48:39<4:58:17, 5.38s/it] 42%|████▏ | 2448/5773 [3:48:50<5:00:23, 5.42s/it] 42%|████▏ | 2448/5773 [3:48:44<5:00:23, 5.42s/it] {'loss': 0.5773, 'learning_rate': 1.2906034681685409e-05, 'epoch': 0.42} 42%|████▏ | 2448/5773 [3:48:50<5:00:23, 5.42s/it] {'loss': 0.5773, 'learning_rate': 1.2906034681685409e-05, 'epoch': 0.42} 42%|████▏ | 2448/5773 [3:48:44<5:00:23, 5.42s/it] 42%|████▏ | 2449/5773 [3:48:55<5:00:46, 5.43s/it] 42%|████▏ | 2449/5773 [3:48:50<5:00:46, 5.43s/it] {'loss': 0.577, 'learning_rate': 1.2900665385894351e-05, 'epoch': 0.42} 42%|████▏ | 2449/5773 [3:48:55<5:00:46, 5.43s/it] {'loss': 0.577, 'learning_rate': 1.2900665385894351e-05, 'epoch': 0.42} 42%|████▏ | 2449/5773 [3:48:50<5:00:46, 5.43s/it]3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 15872 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 42%|████▏ | 2450/5773 [3:49:01<5:01:14, 5.44s/it]6 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 42%|████▏ | 2450/5773 [3:48:55<5:01:14, 5.44s/it]9 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5688, 'learning_rate': 1.2895295176881153e-05, 'epoch': 0.42} 42%|████▏ | 2450/5773 [3:49:01<5:01:14, 5.44s/it] {'loss': 0.5688, 'learning_rate': 1.2895295176881153e-05, 'epoch': 0.42} 42%|████▏ | 2450/5773 [3:48:55<5:01:14, 5.44s/it] 42%|████▏ | 2451/5773 [3:49:06<5:01:54, 5.45s/it] 42%|████▏ | 2451/5773 [3:49:01<5:01:54, 5.45s/it] {'loss': 0.5621, 'learning_rate': 1.2889924056336531e-05, 'epoch': 0.42} 42%|████▏ | 2451/5773 [3:49:06<5:01:54, 5.45s/it] {'loss': 0.5621, 'learning_rate': 1.2889924056336531e-05, 'epoch': 0.42} 42%|████▏ | 2451/5773 [3:49:01<5:01:54, 5.45s/it] 42%|████▏ | 2452/5773 [3:49:11<4:59:48, 5.42s/it] 42%|████▏ | 2452/5773 [3:49:06<4:59:48, 5.42s/it] {'loss': 0.5637, 'learning_rate': 1.2884552025951482e-05, 'epoch': 0.42} 42%|████▏ | 2452/5773 [3:49:11<4:59:48, 5.42s/it] {'loss': 0.5637, 'learning_rate': 1.2884552025951482e-05, 'epoch': 0.42} 42%|████▏ | 2452/5773 [3:49:06<4:59:48, 5.42s/it] 42%|████▏ | 2453/5773 [3:49:17<5:01:41, 5.45s/it] 42%|████▏ | 2453/5773 [3:49:11<5:01:41, 5.45s/it] {'loss': 0.5738, 'learning_rate': 1.2879179087417295e-05, 'epoch': 0.42} 42%|████▏ | 2453/5773 [3:49:17<5:01:41, 5.45s/it] {'loss': 0.5738, 'learning_rate': 1.2879179087417295e-05, 'epoch': 0.42} 42%|████▏ | 2453/5773 [3:49:11<5:01:41, 5.45s/it] 43%|████▎ | 2454/5773 [3:49:22<5:00:01, 5.42s/it] 43%|████▎ | 2454/5773 [3:49:17<5:00:01, 5.42s/it] {'loss': 0.5754, 'learning_rate': 1.2873805242425543e-05, 'epoch': 0.43} 43%|████▎ | 2454/5773 [3:49:22<5:00:01, 5.42s/it] {'loss': 0.5754, 'learning_rate': 1.2873805242425543e-05, 'epoch': 0.43} 43%|████▎ | 2454/5773 [3:49:17<5:00:01, 5.42s/it] 43%|████▎ | 2455/5773 [3:49:28<4:58:37, 5.40s/it] 43%|████▎ | 2455/5773 [3:49:22<4:58:37, 5.40s/it] {'loss': 0.577, 'learning_rate': 1.2868430492668083e-05, 'epoch': 0.43} 43%|████▎ | 2455/5773 [3:49:28<4:58:37, 5.40s/it] {'loss': 0.577, 'learning_rate': 1.2868430492668083e-05, 'epoch': 0.43} 43%|████▎ | 2455/5773 [3:49:22<4:58:37, 5.40s/it] 43%|████▎ | 2456/5773 
[3:49:28<4:57:17, 5.38s/it] 43%|████▎ | 2456/5773 [3:49:33<4:57:18, 5.38s/it] {'loss': 0.5827, 'learning_rate': 1.2863054839837056e-05, 'epoch': 0.43} 43%|████▎ | 2456/5773 [3:49:33<4:57:18, 5.38s/it] {'loss': 0.5827, 'learning_rate': 1.2863054839837056e-05, 'epoch': 0.43} 43%|████▎ | 2456/5773 [3:49:28<4:57:17, 5.38s/it] 43%|████▎ | 2457/5773 [3:49:38<4:57:09, 5.38s/it] 43%|████▎ | 2457/5773 [3:49:33<4:57:09, 5.38s/it] {'loss': 0.5806, 'learning_rate': 1.2857678285624892e-05, 'epoch': 0.43} 43%|████▎ | 2457/5773 [3:49:38<4:57:09, 5.38s/it] {'loss': 0.5806, 'learning_rate': 1.2857678285624892e-05, 'epoch': 0.43} 43%|████▎ | 2457/5773 [3:49:33<4:57:09, 5.38s/it] 43%|████▎ | 2458/5773 [3:49:44<4:57:46, 5.39s/it] 43%|████▎ | 2458/5773 [3:49:38<4:57:46, 5.39s/it] {'loss': 0.571, 'learning_rate': 1.2852300831724306e-05, 'epoch': 0.43} 43%|████▎ | 2458/5773 [3:49:44<4:57:46, 5.39s/it] {'loss': 0.571, 'learning_rate': 1.2852300831724306e-05, 'epoch': 0.43} 43%|████▎ | 2458/5773 [3:49:38<4:57:46, 5.39s/it] 43%|████▎ | 2459/5773 [3:49:49<4:59:50, 5.43s/it] 43%|████▎ | 2459/5773 [3:49:44<4:59:50, 5.43s/it] {'loss': 0.5855, 'learning_rate': 1.2846922479828285e-05, 'epoch': 0.43} 43%|████▎ | 2459/5773 [3:49:49<4:59:50, 5.43s/it] {'loss': 0.5855, 'learning_rate': 1.2846922479828285e-05, 'epoch': 0.43} 43%|████▎ | 2459/5773 [3:49:44<4:59:50, 5.43s/it] 43%|████▎ | 2460/5773 [3:49:49<4:59:35, 5.43s/it] 43%|████▎ | 2460/5773 [3:49:55<4:59:35, 5.43s/it] {'loss': 0.5649, 'learning_rate': 1.2841543231630107e-05, 'epoch': 0.43} 43%|████▎ | 2460/5773 [3:49:55<4:59:35, 5.43s/it] {'loss': 0.5649, 'learning_rate': 1.2841543231630107e-05, 'epoch': 0.43} 43%|████▎ | 2460/5773 [3:49:49<4:59:35, 5.43s/it] 43%|████▎ | 2461/5773 [3:49:54<4:56:37, 5.37s/it] 43%|████▎ | 2461/5773 [3:50:00<4:56:37, 5.37s/it] {'loss': 0.5721, 'learning_rate': 1.2836163088823335e-05, 'epoch': 0.43} 43%|████▎ | 2461/5773 [3:50:00<4:56:37, 5.37s/it] {'loss': 0.5721, 'learning_rate': 1.2836163088823335e-05, 'epoch': 
0.43} 43%|████▎ | 2461/5773 [3:49:54<4:56:37, 5.37s/it] 43%|████▎ | 2462/5773 [3:50:00<4:55:36, 5.36s/it] 43%|████▎ | 2462/5773 [3:50:05<4:55:36, 5.36s/it] {'loss': 0.5649, 'learning_rate': 1.2830782053101807e-05, 'epoch': 0.43} 43%|████▎ | 2462/5773 [3:50:05<4:55:36, 5.36s/it] {'loss': 0.5649, 'learning_rate': 1.2830782053101807e-05, 'epoch': 0.43} 43%|████▎ | 2462/5773 [3:50:00<4:55:36, 5.36s/it] 43%|████▎ | 2463/5773 [3:50:11<4:54:28, 5.34s/it] 43%|████▎ | 2463/5773 [3:50:05<4:54:28, 5.34s/it] {'loss': 0.5851, 'learning_rate': 1.2825400126159644e-05, 'epoch': 0.43} 43%|████▎ | 2463/5773 [3:50:11<4:54:28, 5.34s/it] {'loss': 0.5851, 'learning_rate': 1.2825400126159644e-05, 'epoch': 0.43} 43%|████▎ | 2463/5773 [3:50:05<4:54:28, 5.34s/it] 43%|████▎ | 2464/5773 [3:50:11<4:57:58, 5.40s/it] 43%|████▎ | 2464/5773 [3:50:16<4:57:59, 5.40s/it] {'loss': 0.5773, 'learning_rate': 1.2820017309691254e-05, 'epoch': 0.43} 43%|████▎ | 2464/5773 [3:50:16<4:57:59, 5.40s/it] {'loss': 0.5773, 'learning_rate': 1.2820017309691254e-05, 'epoch': 0.43} 43%|████▎ | 2464/5773 [3:50:11<4:57:58, 5.40s/it] 43%|████▎ | 2465/5773 [3:50:22<4:58:13, 5.41s/it] 43%|████▎ | 2465/5773 [3:50:16<4:58:14, 5.41s/it] {'loss': 0.5697, 'learning_rate': 1.2814633605391316e-05, 'epoch': 0.43} 43%|████▎ | 2465/5773 [3:50:22<4:58:13, 5.41s/it] {'loss': 0.5697, 'learning_rate': 1.2814633605391316e-05, 'epoch': 0.43} 43%|████▎ | 2465/5773 [3:50:16<4:58:14, 5.41s/it] 43%|████▎ | 2466/5773 [3:50:22<4:59:13, 5.43s/it] 43%|████▎ | 2466/5773 [3:50:27<4:59:14, 5.43s/it] {'loss': 0.5853, 'learning_rate': 1.2809249014954788e-05, 'epoch': 0.43} 43%|████▎ | 2466/5773 [3:50:27<4:59:14, 5.43s/it] {'loss': 0.5853, 'learning_rate': 1.2809249014954788e-05, 'epoch': 0.43} 43%|████▎ | 2466/5773 [3:50:22<4:59:13, 5.43s/it] 43%|████▎ | 2467/5773 [3:50:33<5:00:35, 5.46s/it] 43%|████▎ | 2467/5773 [3:50:27<5:00:36, 5.46s/it] {'loss': 0.5761, 'learning_rate': 1.2803863540076918e-05, 'epoch': 0.43} 43%|████▎ | 2467/5773 [3:50:33<5:00:35, 
5.46s/it] {'loss': 0.5761, 'learning_rate': 1.2803863540076918e-05, 'epoch': 0.43} 43%|████▎ | 2467/5773 [3:50:27<5:00:36, 5.46s/it] 43%|████▎ | 2468/5773 [3:50:38<4:59:35, 5.44s/it] 43%|████▎ | 2468/5773 [3:50:32<4:59:35, 5.44s/it] {'loss': 0.5687, 'learning_rate': 1.2798477182453221e-05, 'epoch': 0.43} 43%|████▎ | 2468/5773 [3:50:38<4:59:35, 5.44s/it] {'loss': 0.5687, 'learning_rate': 1.2798477182453221e-05, 'epoch': 0.43} 43%|████▎ | 2468/5773 [3:50:32<4:59:35, 5.44s/it] 43%|████▎ | 2469/5773 [3:50:43<4:58:08, 5.41s/it] 43%|████▎ | 2469/5773 [3:50:38<4:58:08, 5.41s/it] {'loss': 0.5756, 'learning_rate': 1.27930899437795e-05, 'epoch': 0.43} 43%|████▎ | 2469/5773 [3:50:43<4:58:08, 5.41s/it] {'loss': 0.5756, 'learning_rate': 1.27930899437795e-05, 'epoch': 0.43} 43%|████▎ | 2469/5773 [3:50:38<4:58:08, 5.41s/it] 43%|████▎ | 2470/5773 [3:50:49<4:57:47, 5.41s/it] 43%|████▎ | 2470/5773 [3:50:43<4:57:47, 5.41s/it] {'loss': 0.5732, 'learning_rate': 1.2787701825751821e-05, 'epoch': 0.43} 43%|████▎ | 2470/5773 [3:50:49<4:57:47, 5.41s/it] {'loss': 0.5732, 'learning_rate': 1.2787701825751821e-05, 'epoch': 0.43} 43%|████▎ | 2470/5773 [3:50:43<4:57:47, 5.41s/it] 43%|████▎ | 2471/5773 [3:50:54<4:59:32, 5.44s/it] 43%|████▎ | 2471/5773 [3:50:49<4:59:32, 5.44s/it] {'loss': 0.5628, 'learning_rate': 1.2782312830066543e-05, 'epoch': 0.43} 43%|████▎ | 2471/5773 [3:50:54<4:59:32, 5.44s/it] {'loss': 0.5628, 'learning_rate': 1.2782312830066543e-05, 'epoch': 0.43} 43%|████▎ | 2471/5773 [3:50:49<4:59:32, 5.44s/it] 43%|████▎ | 2472/5773 [3:51:00<4:59:21, 5.44s/it] 43%|████▎ | 2472/5773 [3:50:54<4:59:21, 5.44s/it] {'loss': 0.57, 'learning_rate': 1.2776922958420295e-05, 'epoch': 0.43} 43%|████▎ | 2472/5773 [3:51:00<4:59:21, 5.44s/it] {'loss': 0.57, 'learning_rate': 1.2776922958420295e-05, 'epoch': 0.43} 43%|████▎ | 2472/5773 [3:50:54<4:59:21, 5.44s/it] 43%|████▎ | 2473/5773 [3:51:05<4:58:09, 5.42s/it] 43%|████▎ | 2473/5773 [3:51:00<4:58:09, 5.42s/it] {'loss': 0.5769, 'learning_rate': 
1.2771532212509974e-05, 'epoch': 0.43} 43%|████▎ | 2473/5773 [3:51:05<4:58:09, 5.42s/it] {'loss': 0.5769, 'learning_rate': 1.2771532212509974e-05, 'epoch': 0.43} 43%|████▎ | 2473/5773 [3:51:00<4:58:09, 5.42s/it] 43%|████▎ | 2474/5773 [3:51:10<4:56:51, 5.40s/it] 43%|████▎ | 2474/5773 [3:51:05<4:56:51, 5.40s/it] {'loss': 0.5766, 'learning_rate': 1.2766140594032762e-05, 'epoch': 0.43} 43%|████▎ | 2474/5773 [3:51:10<4:56:51, 5.40s/it] {'loss': 0.5766, 'learning_rate': 1.2766140594032762e-05, 'epoch': 0.43} 43%|████▎ | 2474/5773 [3:51:05<4:56:51, 5.40s/it] 43%|████▎ | 2475/5773 [3:51:16<4:58:40, 5.43s/it] 43%|████▎ | 2475/5773 [3:51:10<4:58:40, 5.43s/it] {'loss': 0.58, 'learning_rate': 1.2760748104686116e-05, 'epoch': 0.43} 43%|████▎ | 2475/5773 [3:51:16<4:58:40, 5.43s/it] {'loss': 0.58, 'learning_rate': 1.2760748104686116e-05, 'epoch': 0.43} 43%|████▎ | 2475/5773 [3:51:10<4:58:40, 5.43s/it] 43%|████▎ | 2476/5773 [3:51:21<4:56:53, 5.40s/it] 43%|████▎ | 2476/5773 [3:51:16<4:56:53, 5.40s/it] {'loss': 0.5768, 'learning_rate': 1.2755354746167758e-05, 'epoch': 0.43} 43%|████▎ | 2476/5773 [3:51:21<4:56:53, 5.40s/it] {'loss': 0.5768, 'learning_rate': 1.2755354746167758e-05, 'epoch': 0.43} 43%|████▎ | 2476/5773 [3:51:16<4:56:53, 5.40s/it] 43%|████▎ | 2477/5773 [3:51:27<5:02:19, 5.50s/it] 43%|████▎ | 2477/5773 [3:51:21<5:02:19, 5.50s/it] {'loss': 0.5715, 'learning_rate': 1.2749960520175696e-05, 'epoch': 0.43} 43%|████▎ | 2477/5773 [3:51:27<5:02:19, 5.50s/it] {'loss': 0.5715, 'learning_rate': 1.2749960520175696e-05, 'epoch': 0.43} 43%|████▎ | 2477/5773 [3:51:21<5:02:19, 5.50s/it] 43%|████▎ | 2478/5773 [3:51:32<5:00:44, 5.48s/it] 43%|████▎ | 2478/5773 [3:51:27<5:00:44, 5.48s/it] {'loss': 0.5747, 'learning_rate': 1.2744565428408202e-05, 'epoch': 0.43} 43%|████▎ | 2478/5773 [3:51:32<5:00:44, 5.48s/it] {'loss': 0.5747, 'learning_rate': 1.2744565428408202e-05, 'epoch': 0.43} 43%|████▎ | 2478/5773 [3:51:27<5:00:44, 5.48s/it] 43%|████▎ | 2479/5773 [3:51:38<4:58:49, 5.44s/it] 43%|████▎ | 
2479/5773 [3:51:32<4:58:49, 5.44s/it] {'loss': 0.5664, 'learning_rate': 1.2739169472563822e-05, 'epoch': 0.43} 43%|████▎ | 2479/5773 [3:51:38<4:58:49, 5.44s/it] {'loss': 0.5664, 'learning_rate': 1.2739169472563822e-05, 'epoch': 0.43} 43%|████▎ | 2479/5773 [3:51:32<4:58:49, 5.44s/it] 43%|████▎ | 2480/5773 [3:51:43<4:59:59, 5.47s/it] 43%|████▎ | 2480/5773 [3:51:38<4:59:59, 5.47s/it] {'loss': 0.5732, 'learning_rate': 1.2733772654341376e-05, 'epoch': 0.43} 43%|████▎ | 2480/5773 [3:51:43<4:59:59, 5.47s/it] {'loss': 0.5732, 'learning_rate': 1.2733772654341376e-05, 'epoch': 0.43} 43%|████▎ | 2480/5773 [3:51:38<4:59:59, 5.47s/it] 43%|████▎ | 2481/5773 [3:51:43<5:01:21, 5.49s/it] 43%|████▎ | 2481/5773 [3:51:49<5:01:22, 5.49s/it] {'loss': 0.5724, 'learning_rate': 1.2728374975439954e-05, 'epoch': 0.43} 43%|████▎ | 2481/5773 [3:51:49<5:01:22, 5.49s/it] {'loss': 0.5724, 'learning_rate': 1.2728374975439954e-05, 'epoch': 0.43} 43%|████▎ | 2481/5773 [3:51:43<5:01:21, 5.49s/it] 43%|████▎ | 2482/5773 [3:51:54<4:59:07, 5.45s/it] 43%|████▎ | 2482/5773 [3:51:49<4:59:07, 5.45s/it] {'loss': 0.5726, 'learning_rate': 1.2722976437558919e-05, 'epoch': 0.43} 43%|████▎ | 2482/5773 [3:51:54<4:59:07, 5.45s/it] {'loss': 0.5726, 'learning_rate': 1.2722976437558919e-05, 'epoch': 0.43} 43%|████▎ | 2482/5773 [3:51:49<4:59:07, 5.45s/it] 43%|████▎ | 2483/5773 [3:52:00<5:00:04, 5.47s/it] 43%|████▎ | 2483/5773 [3:51:54<5:00:04, 5.47s/it] {'loss': 0.5707, 'learning_rate': 1.2717577042397904e-05, 'epoch': 0.43} 43%|████▎ | 2483/5773 [3:52:00<5:00:04, 5.47s/it] {'loss': 0.5707, 'learning_rate': 1.2717577042397904e-05, 'epoch': 0.43} 43%|████▎ | 2483/5773 [3:51:54<5:00:04, 5.47s/it] 43%|████▎ | 2484/5773 [3:52:05<5:01:10, 5.49s/it] 43%|████▎ | 2484/5773 [3:52:00<5:01:10, 5.49s/it] {'loss': 0.5934, 'learning_rate': 1.2712176791656807e-05, 'epoch': 0.43} 43%|████▎ | 2484/5773 [3:52:05<5:01:10, 5.49s/it] {'loss': 0.5934, 'learning_rate': 1.2712176791656807e-05, 'epoch': 0.43} 43%|████▎ | 2484/5773 
[3:52:00<5:01:10, 5.49s/it] 43%|████▎ | 2485/5773 [3:52:11<5:01:25, 5.50s/it] 43%|████▎ | 2485/5773 [3:52:05<5:01:25, 5.50s/it] {'loss': 0.589, 'learning_rate': 1.2706775687035808e-05, 'epoch': 0.43} 43%|████▎ | 2485/5773 [3:52:11<5:01:25, 5.50s/it] {'loss': 0.589, 'learning_rate': 1.2706775687035808e-05, 'epoch': 0.43} 43%|████▎ | 2485/5773 [3:52:05<5:01:25, 5.50s/it] 43%|████▎ | 2486/5773 [3:52:16<4:58:33, 5.45s/it] 43%|████▎ | 2486/5773 [3:52:11<4:58:33, 5.45s/it] {'loss': 0.569, 'learning_rate': 1.270137373023534e-05, 'epoch': 0.43} 43%|████▎ | 2486/5773 [3:52:16<4:58:33, 5.45s/it] {'loss': 0.569, 'learning_rate': 1.270137373023534e-05, 'epoch': 0.43} 43%|████▎ | 2486/5773 [3:52:11<4:58:33, 5.45s/it] 43%|████▎ | 2487/5773 [3:52:22<4:57:29, 5.43s/it] 43%|████▎ | 2487/5773 [3:52:16<4:57:29, 5.43s/it] {'loss': 0.5825, 'learning_rate': 1.2695970922956108e-05, 'epoch': 0.43} 43%|████▎ | 2487/5773 [3:52:22<4:57:29, 5.43s/it] {'loss': 0.5825, 'learning_rate': 1.2695970922956108e-05, 'epoch': 0.43} 43%|████▎ | 2487/5773 [3:52:16<4:57:29, 5.43s/it] 43%|████▎ | 2488/5773 [3:52:27<4:56:24, 5.41s/it] 43%|████▎ | 2488/5773 [3:52:21<4:56:24, 5.41s/it] {'loss': 0.5771, 'learning_rate': 1.26905672668991e-05, 'epoch': 0.43} 43%|████▎ | 2488/5773 [3:52:27<4:56:24, 5.41s/it] {'loss': 0.5771, 'learning_rate': 1.26905672668991e-05, 'epoch': 0.43} 43%|████▎ | 2488/5773 [3:52:21<4:56:24, 5.41s/it] 43%|████▎ | 2489/5773 [3:52:32<4:56:28, 5.42s/it] 43%|████▎ | 2489/5773 [3:52:27<4:56:28, 5.42s/it] {'loss': 0.5756, 'learning_rate': 1.268516276376555e-05, 'epoch': 0.43} 43%|████▎ | 2489/5773 [3:52:32<4:56:28, 5.42s/it] {'loss': 0.5756, 'learning_rate': 1.268516276376555e-05, 'epoch': 0.43} 43%|████▎ | 2489/5773 [3:52:27<4:56:28, 5.42s/it] 43%|████▎ | 2490/5773 [3:52:38<4:56:31, 5.42s/it] 43%|████▎ | 2490/5773 [3:52:32<4:56:31, 5.42s/it] {'loss': 0.5854, 'learning_rate': 1.2679757415256977e-05, 'epoch': 0.43} 43%|████▎ | 2490/5773 [3:52:38<4:56:31, 5.42s/it] {'loss': 0.5854, 
'learning_rate': 1.2679757415256977e-05, 'epoch': 0.43} 43%|████▎ | 2490/5773 [3:52:32<4:56:31, 5.42s/it] 43%|████▎ | 2491/5773 [3:52:38<5:01:02, 5.50s/it] 43%|████▎ | 2491/5773 [3:52:43<5:01:02, 5.50s/it] {'loss': 0.5667, 'learning_rate': 1.2674351223075145e-05, 'epoch': 0.43} 43%|████▎ | 2491/5773 [3:52:43<5:01:02, 5.50s/it] {'loss': 0.5667, 'learning_rate': 1.2674351223075145e-05, 'epoch': 0.43} 43%|████▎ | 2491/5773 [3:52:38<5:01:02, 5.50s/it] 43%|████▎ | 2492/5773 [3:52:49<4:59:25, 5.48s/it] 43%|████▎ | 2492/5773 [3:52:43<4:59:25, 5.48s/it] {'loss': 0.5706, 'learning_rate': 1.2668944188922105e-05, 'epoch': 0.43} 43%|████▎ | 2492/5773 [3:52:49<4:59:25, 5.48s/it] {'loss': 0.5706, 'learning_rate': 1.2668944188922105e-05, 'epoch': 0.43} 43%|████▎ | 2492/5773 [3:52:43<4:59:25, 5.48s/it] 43%|████▎ | 2493/5773 [3:52:54<4:56:59, 5.43s/it] 43%|████▎ | 2493/5773 [3:52:49<4:56:59, 5.43s/it] {'loss': 0.5717, 'learning_rate': 1.2663536314500167e-05, 'epoch': 0.43} 43%|████▎ | 2493/5773 [3:52:54<4:56:59, 5.43s/it] {'loss': 0.5717, 'learning_rate': 1.2663536314500167e-05, 'epoch': 0.43} 43%|████▎ | 2493/5773 [3:52:49<4:56:59, 5.43s/it] 43%|████▎ | 2494/5773 [3:53:00<4:56:56, 5.43s/it] 43%|████▎ | 2494/5773 [3:52:54<4:56:56, 5.43s/it] {'loss': 0.5751, 'learning_rate': 1.2658127601511891e-05, 'epoch': 0.43} 43%|████▎ | 2494/5773 [3:53:00<4:56:56, 5.43s/it] {'loss': 0.5751, 'learning_rate': 1.2658127601511891e-05, 'epoch': 0.43} 43%|████▎ | 2494/5773 [3:52:54<4:56:56, 5.43s/it] 43%|████▎ | 2495/5773 [3:53:05<4:57:06, 5.44s/it] 43%|████▎ | 2495/5773 [3:53:00<4:57:06, 5.44s/it] {'loss': 0.5691, 'learning_rate': 1.2652718051660122e-05, 'epoch': 0.43} 43%|████▎ | 2495/5773 [3:53:05<4:57:06, 5.44s/it] {'loss': 0.5691, 'learning_rate': 1.2652718051660122e-05, 'epoch': 0.43} 43%|████▎ | 2495/5773 [3:53:00<4:57:06, 5.44s/it] 43%|████▎ | 2496/5773 [3:53:11<4:56:48, 5.43s/it] 43%|████▎ | 2496/5773 [3:53:05<4:56:48, 5.43s/it] {'loss': 0.5919, 'learning_rate': 1.264730766664796e-05, 
'epoch': 0.43} 43%|████▎ | 2496/5773 [3:53:11<4:56:48, 5.43s/it] {'loss': 0.5919, 'learning_rate': 1.264730766664796e-05, 'epoch': 0.43} 43%|████▎ | 2496/5773 [3:53:05<4:56:48, 5.43s/it] 43%|████▎ | 2497/5773 [3:53:16<4:55:47, 5.42s/it] 43%|████▎ | 2497/5773 [3:53:10<4:55:47, 5.42s/it] {'loss': 0.5705, 'learning_rate': 1.2641896448178759e-05, 'epoch': 0.43} 43%|████▎ | 2497/5773 [3:53:16<4:55:47, 5.42s/it] {'loss': 0.5705, 'learning_rate': 1.2641896448178759e-05, 'epoch': 0.43} 43%|████▎ | 2497/5773 [3:53:10<4:55:47, 5.42s/it] 43%|████▎ | 2498/5773 [3:53:21<4:57:22, 5.45s/it] 43%|████▎ | 2498/5773 [3:53:16<4:57:22, 5.45s/it] {'loss': 0.5778, 'learning_rate': 1.263648439795615e-05, 'epoch': 0.43} 43%|████▎ | 2498/5773 [3:53:21<4:57:22, 5.45s/it] {'loss': 0.5778, 'learning_rate': 1.263648439795615e-05, 'epoch': 0.43} 43%|████▎ | 2498/5773 [3:53:16<4:57:22, 5.45s/it] 43%|████▎ | 2499/5773 [3:53:27<4:58:35, 5.47s/it] 43%|████▎ | 2499/5773 [3:53:21<4:58:35, 5.47s/it] {'loss': 0.5671, 'learning_rate': 1.2631071517684017e-05, 'epoch': 0.43} 43%|████▎ | 2499/5773 [3:53:27<4:58:35, 5.47s/it] {'loss': 0.5671, 'learning_rate': 1.2631071517684017e-05, 'epoch': 0.43} 43%|████▎ | 2499/5773 [3:53:21<4:58:35, 5.47s/it]8 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 43%|████▎ | 2500/5773 [3:53:32<4:58:02, 5.46s/it]15 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 4 43%|████▎ | 2500/5773 [3:53:27<4:58:03, 5.46s/it]1AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... {'loss': 0.5777, 'learning_rate': 1.2625657809066509e-05, 'epoch': 0.43} 43%|████▎ | 2500/5773 [3:53:32<4:58:02, 5.46s/it] {'loss': 0.5777, 'learning_rate': 1.2625657809066509e-05, 'epoch': 0.43} 43%|████▎ | 2500/5773 [3:53:27<4:58:03, 5.46s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-2500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 43%|████▎ | 2501/5773 [3:53:46<8:48:13, 9.69s/it] 43%|████▎ | 2501/5773 [3:53:52<8:48:13, 9.69s/it] {'loss': 0.5765, 'learning_rate': 1.2620243273808033e-05, 'epoch': 0.43} 43%|████▎ | 2501/5773 [3:53:52<8:48:13, 9.69s/it] {'loss': 0.5765, 'learning_rate': 1.2620243273808033e-05, 'epoch': 0.43} 43%|████▎ | 2501/5773 [3:53:46<8:48:13, 9.69s/it] 43%|████▎ | 2502/5773 [3:53:52<7:39:41, 8.43s/it] 43%|████▎ | 2502/5773 [3:53:57<7:39:41, 8.43s/it] {'loss': 0.5858, 'learning_rate': 1.2614827913613256e-05, 'epoch': 0.43} 43%|████▎ | 2502/5773 [3:53:57<7:39:41, 8.43s/it] {'loss': 0.5858, 'learning_rate': 1.2614827913613256e-05, 'epoch': 0.43} 43%|████▎ | 2502/5773 [3:53:52<7:39:41, 8.43s/it] 43%|████▎ | 2503/5773 [3:54:03<6:48:40, 7.50s/it] 43%|████▎ | 2503/5773 [3:53:57<6:48:40, 7.50s/it] {'loss': 0.561, 'learning_rate': 1.2609411730187114e-05, 'epoch': 0.43} 43%|████▎ | 2503/5773 [3:54:03<6:48:40, 7.50s/it] {'loss': 0.561, 'learning_rate': 1.2609411730187114e-05, 'epoch': 0.43} 43%|████▎ | 2503/5773 [3:53:57<6:48:40, 7.50s/it] 43%|████▎ | 2504/5773 [3:54:08<6:14:18, 6.87s/it] 43%|████▎ | 2504/5773 [3:54:03<6:14:19, 6.87s/it] {'loss': 0.5767, 'learning_rate': 1.2603994725234783e-05, 'epoch': 0.43} 43%|████▎ | 2504/5773 [3:54:08<6:14:18, 6.87s/it] {'loss': 0.5767, 'learning_rate': 1.2603994725234783e-05, 'epoch': 0.43} 43%|████▎ | 2504/5773 [3:54:03<6:14:19, 6.87s/it] 43%|████▎ | 2505/5773 [3:54:08<5:52:27, 6.47s/it] 43%|████▎ | 2505/5773 [3:54:14<5:52:27, 6.47s/it] {'loss': 0.5705, 'learning_rate': 1.2598576900461716e-05, 'epoch': 0.43} 43%|████▎ | 2505/5773 [3:54:14<5:52:27, 6.47s/it] {'loss': 0.5705, 'learning_rate': 1.2598576900461716e-05, 'epoch': 0.43} 43%|████▎ | 2505/5773 [3:54:08<5:52:27, 6.47s/it] 43%|████▎ | 2506/5773 [3:54:19<5:37:53, 6.21s/it] 43%|████▎ | 2506/5773 [3:54:14<5:37:53, 6.21s/it] {'loss': 0.5928, 'learning_rate': 1.259315825757362e-05, 'epoch': 0.43} 43%|████▎ | 2506/5773 [3:54:19<5:37:53, 6.21s/it] {'loss': 0.5928, 'learning_rate': 
1.259315825757362e-05, 'epoch': 0.43} 43%|████▎ | 2506/5773 [3:54:14<5:37:53, 6.21s/it] 43%|████▎ | 2507/5773 [3:54:25<5:25:03, 5.97s/it] 43%|████▎ | 2507/5773 [3:54:19<5:25:03, 5.97s/it] {'loss': 0.5763, 'learning_rate': 1.2587738798276457e-05, 'epoch': 0.43} 43%|████▎ | 2507/5773 [3:54:25<5:25:03, 5.97s/it] {'loss': 0.5763, 'learning_rate': 1.2587738798276457e-05, 'epoch': 0.43} 43%|████▎ | 2507/5773 [3:54:19<5:25:03, 5.97s/it] 43%|████▎ | 2508/5773 [3:54:30<5:18:32, 5.85s/it] 43%|████▎ | 2508/5773 [3:54:25<5:18:32, 5.85s/it] {'loss': 0.5766, 'learning_rate': 1.2582318524276436e-05, 'epoch': 0.43} 43%|████▎ | 2508/5773 [3:54:30<5:18:32, 5.85s/it] {'loss': 0.5766, 'learning_rate': 1.2582318524276436e-05, 'epoch': 0.43} 43%|████▎ | 2508/5773 [3:54:25<5:18:32, 5.85s/it] 43%|████▎ | 2509/5773 [3:54:36<5:12:10, 5.74s/it] 43%|████▎ | 2509/5773 [3:54:30<5:12:10, 5.74s/it] {'loss': 0.5815, 'learning_rate': 1.2576897437280042e-05, 'epoch': 0.43} 43%|████▎ | 2509/5773 [3:54:36<5:12:10, 5.74s/it] {'loss': 0.5815, 'learning_rate': 1.2576897437280042e-05, 'epoch': 0.43} 43%|████▎ | 2509/5773 [3:54:30<5:12:10, 5.74s/it] 43%|████▎ | 2510/5773 [3:54:41<5:07:27, 5.65s/it] 43%|████▎ | 2510/5773 [3:54:36<5:07:27, 5.65s/it] {'loss': 0.5658, 'learning_rate': 1.2571475538994e-05, 'epoch': 0.43} 43%|████▎ | 2510/5773 [3:54:41<5:07:27, 5.65s/it] {'loss': 0.5658, 'learning_rate': 1.2571475538994e-05, 'epoch': 0.43} 43%|████▎ | 2510/5773 [3:54:36<5:07:27, 5.65s/it] 43%|████▎ | 2511/5773 [3:54:47<5:04:43, 5.60s/it] 43%|████▎ | 2511/5773 [3:54:41<5:04:43, 5.61s/it] {'loss': 0.5862, 'learning_rate': 1.2566052831125306e-05, 'epoch': 0.43} 43%|████▎ | 2511/5773 [3:54:47<5:04:43, 5.60s/it] {'loss': 0.5862, 'learning_rate': 1.2566052831125306e-05, 'epoch': 0.43} 43%|████▎ | 2511/5773 [3:54:41<5:04:43, 5.61s/it] 44%|████▎ | 2512/5773 [3:54:52<5:01:29, 5.55s/it] 44%|████▎ | 2512/5773 [3:54:47<5:01:29, 5.55s/it] {'loss': 0.5646, 'learning_rate': 1.2560629315381192e-05, 'epoch': 0.44} 44%|████▎ | 
2512/5773 [3:54:52<5:01:29, 5.55s/it] {'loss': 0.5646, 'learning_rate': 1.2560629315381192e-05, 'epoch': 0.44} 44%|████▎ | 2512/5773 [3:54:47<5:01:29, 5.55s/it] 44%|████▎ | 2513/5773 [3:54:57<4:58:45, 5.50s/it] 44%|████▎ | 2513/5773 [3:54:52<4:58:45, 5.50s/it] {'loss': 0.585, 'learning_rate': 1.2555204993469159e-05, 'epoch': 0.44} 44%|████▎ | 2513/5773 [3:54:58<4:58:45, 5.50s/it] {'loss': 0.585, 'learning_rate': 1.2555204993469159e-05, 'epoch': 0.44} 44%|████▎ | 2513/5773 [3:54:52<4:58:45, 5.50s/it] 44%|████▎ | 2514/5773 [3:54:57<4:57:20, 5.47s/it] 44%|████▎ | 2514/5773 [3:55:03<4:57:20, 5.47s/it] {'loss': 0.5651, 'learning_rate': 1.2549779867096956e-05, 'epoch': 0.44} 44%|████▎ | 2514/5773 [3:55:03<4:57:20, 5.47s/it] {'loss': 0.5651, 'learning_rate': 1.2549779867096956e-05, 'epoch': 0.44} 44%|████▎ | 2514/5773 [3:54:57<4:57:20, 5.47s/it] 44%|████▎ | 2515/5773 [3:55:08<4:54:15, 5.42s/it] 44%|████▎ | 2515/5773 [3:55:03<4:54:15, 5.42s/it] {'loss': 0.5893, 'learning_rate': 1.2544353937972584e-05, 'epoch': 0.44} 44%|████▎ | 2515/5773 [3:55:08<4:54:15, 5.42s/it] {'loss': 0.5893, 'learning_rate': 1.2544353937972584e-05, 'epoch': 0.44} 44%|████▎ | 2515/5773 [3:55:03<4:54:15, 5.42s/it] 44%|████▎ | 2516/5773 [3:55:14<4:54:17, 5.42s/it] 44%|████▎ | 2516/5773 [3:55:08<4:54:17, 5.42s/it] {'loss': 0.5776, 'learning_rate': 1.2538927207804306e-05, 'epoch': 0.44} 44%|████▎ | 2516/5773 [3:55:14<4:54:17, 5.42s/it] {'loss': 0.5776, 'learning_rate': 1.2538927207804306e-05, 'epoch': 0.44} 44%|████▎ | 2516/5773 [3:55:08<4:54:17, 5.42s/it] 44%|████▎ | 2517/5773 [3:55:19<4:53:16, 5.40s/it] 44%|████▎ | 2517/5773 [3:55:13<4:53:16, 5.40s/it] {'loss': 0.5676, 'learning_rate': 1.2533499678300618e-05, 'epoch': 0.44} 44%|████▎ | 2517/5773 [3:55:19<4:53:16, 5.40s/it] {'loss': 0.5676, 'learning_rate': 1.2533499678300618e-05, 'epoch': 0.44} 44%|████▎ | 2517/5773 [3:55:13<4:53:16, 5.40s/it] 44%|████▎ | 2518/5773 [3:55:25<4:57:08, 5.48s/it] 44%|████▎ | 2518/5773 [3:55:19<4:57:08, 5.48s/it] {'loss': 
0.5764, 'learning_rate': 1.252807135117029e-05, 'epoch': 0.44} 44%|████▎ | 2518/5773 [3:55:25<4:57:08, 5.48s/it] {'loss': 0.5764, 'learning_rate': 1.252807135117029e-05, 'epoch': 0.44} 44%|████▎ | 2518/5773 [3:55:19<4:57:08, 5.48s/it] 44%|████▎ | 2519/5773 [3:55:30<4:56:56, 5.48s/it] 44%|████▎ | 2519/5773 [3:55:25<4:56:56, 5.48s/it] {'loss': 0.5863, 'learning_rate': 1.252264222812233e-05, 'epoch': 0.44} 44%|████▎ | 2519/5773 [3:55:30<4:56:56, 5.48s/it] {'loss': 0.5863, 'learning_rate': 1.252264222812233e-05, 'epoch': 0.44} 44%|████▎ | 2519/5773 [3:55:25<4:56:56, 5.48s/it] 44%|████▎ | 2520/5773 [3:55:36<4:56:25, 5.47s/it] 44%|████▎ | 2520/5773 [3:55:30<4:56:25, 5.47s/it] {'loss': 0.5713, 'learning_rate': 1.2517212310865996e-05, 'epoch': 0.44} 44%|████▎ | 2520/5773 [3:55:36<4:56:25, 5.47s/it] {'loss': 0.5713, 'learning_rate': 1.2517212310865996e-05, 'epoch': 0.44} 44%|████▎ | 2520/5773 [3:55:30<4:56:25, 5.47s/it] 44%|████▎ | 2521/5773 [3:55:41<4:56:42, 5.47s/it] 44%|████▎ | 2521/5773 [3:55:36<4:56:42, 5.47s/it] {'loss': 0.5609, 'learning_rate': 1.2511781601110804e-05, 'epoch': 0.44} 44%|████▎ | 2521/5773 [3:55:41<4:56:42, 5.47s/it] {'loss': 0.5609, 'learning_rate': 1.2511781601110804e-05, 'epoch': 0.44} 44%|████▎ | 2521/5773 [3:55:36<4:56:42, 5.47s/it] 44%|████▎ | 2522/5773 [3:55:46<4:53:48, 5.42s/it] 44%|████▎ | 2522/5773 [3:55:41<4:53:48, 5.42s/it] {'loss': 0.5817, 'learning_rate': 1.2506350100566515e-05, 'epoch': 0.44} 44%|████▎ | 2522/5773 [3:55:46<4:53:48, 5.42s/it] {'loss': 0.5817, 'learning_rate': 1.2506350100566515e-05, 'epoch': 0.44} 44%|████▎ | 2522/5773 [3:55:41<4:53:48, 5.42s/it] 44%|████▎ | 2523/5773 [3:55:46<4:55:13, 5.45s/it] 44%|████▎ | 2523/5773 [3:55:52<4:55:13, 5.45s/it] {'loss': 0.5661, 'learning_rate': 1.2500917810943134e-05, 'epoch': 0.44} 44%|████▎ | 2523/5773 [3:55:52<4:55:13, 5.45s/it] {'loss': 0.5661, 'learning_rate': 1.2500917810943134e-05, 'epoch': 0.44} 44%|████▎ | 2523/5773 [3:55:46<4:55:13, 5.45s/it] 44%|████▎ | 2524/5773 
[3:55:57<4:53:33, 5.42s/it] 44%|████▎ | 2524/5773 [3:55:52<4:53:33, 5.42s/it] {'loss': 0.585, 'learning_rate': 1.2495484733950924e-05, 'epoch': 0.44} 44%|████▎ | 2524/5773 [3:55:57<4:53:33, 5.42s/it] {'loss': 0.585, 'learning_rate': 1.2495484733950924e-05, 'epoch': 0.44} 44%|████▎ | 2524/5773 [3:55:52<4:53:33, 5.42s/it] 44%|████▎ | 2525/5773 [3:56:03<4:53:31, 5.42s/it] 44%|████▎ | 2525/5773 [3:55:57<4:53:31, 5.42s/it] {'loss': 0.5806, 'learning_rate': 1.2490050871300388e-05, 'epoch': 0.44} 44%|████▎ | 2525/5773 [3:56:03<4:53:31, 5.42s/it] {'loss': 0.5806, 'learning_rate': 1.2490050871300388e-05, 'epoch': 0.44} 44%|████▎ | 2525/5773 [3:55:57<4:53:31, 5.42s/it] 44%|████▍ | 2526/5773 [3:56:08<4:54:02, 5.43s/it] 44%|████▍ | 2526/5773 [3:56:03<4:54:02, 5.43s/it] {'loss': 0.5795, 'learning_rate': 1.2484616224702282e-05, 'epoch': 0.44} 44%|████▍ | 2526/5773 [3:56:08<4:54:02, 5.43s/it] {'loss': 0.5795, 'learning_rate': 1.2484616224702282e-05, 'epoch': 0.44} 44%|████▍ | 2526/5773 [3:56:03<4:54:02, 5.43s/it] 44%|████▍ | 2527/5773 [3:56:13<4:52:42, 5.41s/it] 44%|████▍ | 2527/5773 [3:56:08<4:52:42, 5.41s/it] {'loss': 0.5807, 'learning_rate': 1.2479180795867605e-05, 'epoch': 0.44} 44%|████▍ | 2527/5773 [3:56:13<4:52:42, 5.41s/it] {'loss': 0.5807, 'learning_rate': 1.2479180795867605e-05, 'epoch': 0.44} 44%|████▍ | 2527/5773 [3:56:08<4:52:42, 5.41s/it] 44%|████▍ | 2528/5773 [3:56:19<4:53:29, 5.43s/it] 44%|████▍ | 2528/5773 [3:56:13<4:53:29, 5.43s/it] {'loss': 0.5868, 'learning_rate': 1.2473744586507606e-05, 'epoch': 0.44} 44%|████▍ | 2528/5773 [3:56:19<4:53:29, 5.43s/it] {'loss': 0.5868, 'learning_rate': 1.2473744586507606e-05, 'epoch': 0.44} 44%|████▍ | 2528/5773 [3:56:13<4:53:29, 5.43s/it] 44%|████▍ | 2529/5773 [3:56:25<4:56:16, 5.48s/it] 44%|████▍ | 2529/5773 [3:56:19<4:56:16, 5.48s/it] {'loss': 0.5834, 'learning_rate': 1.2468307598333774e-05, 'epoch': 0.44} 44%|████▍ | 2529/5773 [3:56:25<4:56:16, 5.48s/it] {'loss': 0.5834, 'learning_rate': 1.2468307598333774e-05, 'epoch': 
0.44} 44%|████▍ | 2529/5773 [3:56:19<4:56:16, 5.48s/it] 44%|████▍ | 2530/5773 [3:56:30<4:55:33, 5.47s/it] 44%|████▍ | 2530/5773 [3:56:24<4:55:33, 5.47s/it] {'loss': 0.5662, 'learning_rate': 1.246286983305785e-05, 'epoch': 0.44} 44%|████▍ | 2530/5773 [3:56:30<4:55:33, 5.47s/it] {'loss': 0.5662, 'learning_rate': 1.246286983305785e-05, 'epoch': 0.44} 44%|████▍ | 2530/5773 [3:56:24<4:55:33, 5.47s/it] 44%|████▍ | 2531/5773 [3:56:35<4:55:11, 5.46s/it] 44%|████▍ | 2531/5773 [3:56:30<4:55:11, 5.46s/it] {'loss': 0.5658, 'learning_rate': 1.2457431292391811e-05, 'epoch': 0.44} 44%|████▍ | 2531/5773 [3:56:35<4:55:11, 5.46s/it] {'loss': 0.5658, 'learning_rate': 1.2457431292391811e-05, 'epoch': 0.44} 44%|████▍ | 2531/5773 [3:56:30<4:55:11, 5.46s/it] 44%|████▍ | 2532/5773 [3:56:41<4:55:19, 5.47s/it] 44%|████▍ | 2532/5773 [3:56:35<4:55:19, 5.47s/it] {'loss': 0.5795, 'learning_rate': 1.2451991978047891e-05, 'epoch': 0.44} 44%|████▍ | 2532/5773 [3:56:41<4:55:19, 5.47s/it] {'loss': 0.5795, 'learning_rate': 1.2451991978047891e-05, 'epoch': 0.44} 44%|████▍ | 2532/5773 [3:56:35<4:55:19, 5.47s/it] 44%|████▍ | 2533/5773 [3:56:46<4:54:57, 5.46s/it] 44%|████▍ | 2533/5773 [3:56:41<4:54:57, 5.46s/it] {'loss': 0.5704, 'learning_rate': 1.244655189173855e-05, 'epoch': 0.44} 44%|████▍ | 2533/5773 [3:56:46<4:54:57, 5.46s/it] {'loss': 0.5704, 'learning_rate': 1.244655189173855e-05, 'epoch': 0.44} 44%|████▍ | 2533/5773 [3:56:41<4:54:57, 5.46s/it] 44%|████▍ | 2534/5773 [3:56:52<4:53:15, 5.43s/it] 44%|████▍ | 2534/5773 [3:56:46<4:53:15, 5.43s/it] {'loss': 0.5803, 'learning_rate': 1.2441111035176511e-05, 'epoch': 0.44} 44%|████▍ | 2534/5773 [3:56:52<4:53:15, 5.43s/it] {'loss': 0.5803, 'learning_rate': 1.2441111035176511e-05, 'epoch': 0.44} 44%|████▍ | 2534/5773 [3:56:46<4:53:15, 5.43s/it] 44%|████▍ | 2535/5773 [3:56:57<4:54:49, 5.46s/it] 44%|████▍ | 2535/5773 [3:56:52<4:54:49, 5.46s/it] {'loss': 0.5742, 'learning_rate': 1.2435669410074727e-05, 'epoch': 0.44} 44%|████▍ | 2535/5773 [3:56:57<4:54:49, 
5.46s/it] {'loss': 0.5742, 'learning_rate': 1.2435669410074727e-05, 'epoch': 0.44} 44%|████▍ | 2535/5773 [3:56:52<4:54:49, 5.46s/it] 44%|████▍ | 2536/5773 [3:56:57<4:55:38, 5.48s/it] 44%|████▍ | 2536/5773 [3:57:03<4:55:38, 5.48s/it] {'loss': 0.5801, 'learning_rate': 1.2430227018146387e-05, 'epoch': 0.44} 44%|████▍ | 2536/5773 [3:57:03<4:55:38, 5.48s/it] {'loss': 0.5801, 'learning_rate': 1.2430227018146387e-05, 'epoch': 0.44} 44%|████▍ | 2536/5773 [3:56:57<4:55:38, 5.48s/it] 44%|████▍ | 2537/5773 [3:57:08<4:52:58, 5.43s/it] 44%|████▍ | 2537/5773 [3:57:03<4:52:58, 5.43s/it] {'loss': 0.5763, 'learning_rate': 1.2424783861104943e-05, 'epoch': 0.44} 44%|████▍ | 2537/5773 [3:57:08<4:52:58, 5.43s/it] {'loss': 0.5763, 'learning_rate': 1.2424783861104943e-05, 'epoch': 0.44} 44%|████▍ | 2537/5773 [3:57:03<4:52:58, 5.43s/it] 44%|████▍ | 2538/5773 [3:57:14<4:53:54, 5.45s/it] 44%|████▍ | 2538/5773 [3:57:08<4:53:54, 5.45s/it] {'loss': 0.5687, 'learning_rate': 1.2419339940664064e-05, 'epoch': 0.44} 44%|████▍ | 2538/5773 [3:57:14<4:53:54, 5.45s/it] {'loss': 0.5687, 'learning_rate': 1.2419339940664064e-05, 'epoch': 0.44} 44%|████▍ | 2538/5773 [3:57:08<4:53:54, 5.45s/it] 44%|████▍ | 2539/5773 [3:57:19<4:54:17, 5.46s/it] 44%|████▍ | 2539/5773 [3:57:14<4:54:17, 5.46s/it] {'loss': 0.5703, 'learning_rate': 1.2413895258537676e-05, 'epoch': 0.44} 44%|████▍ | 2539/5773 [3:57:19<4:54:17, 5.46s/it] {'loss': 0.5703, 'learning_rate': 1.2413895258537676e-05, 'epoch': 0.44} 44%|████▍ | 2539/5773 [3:57:14<4:54:17, 5.46s/it] 44%|████▍ | 2540/5773 [3:57:25<4:53:56, 5.46s/it] 44%|████▍ | 2540/5773 [3:57:19<4:53:57, 5.46s/it] {'loss': 0.6157, 'learning_rate': 1.2408449816439935e-05, 'epoch': 0.44} 44%|████▍ | 2540/5773 [3:57:25<4:53:56, 5.46s/it] {'loss': 0.6157, 'learning_rate': 1.2408449816439935e-05, 'epoch': 0.44} 44%|████▍ | 2540/5773 [3:57:19<4:53:57, 5.46s/it] 44%|████▍ | 2541/5773 [3:57:30<4:54:17, 5.46s/it] 44%|████▍ | 2541/5773 [3:57:24<4:54:17, 5.46s/it] {'loss': 0.5819, 'learning_rate': 
1.2403003616085245e-05, 'epoch': 0.44} 44%|████▍ | 2541/5773 [3:57:30<4:54:17, 5.46s/it] {'loss': 0.5819, 'learning_rate': 1.2403003616085245e-05, 'epoch': 0.44} 44%|████▍ | 2541/5773 [3:57:24<4:54:17, 5.46s/it] 44%|████▍ | 2542/5773 [3:57:36<4:57:15, 5.52s/it] 44%|████▍ | 2542/5773 [3:57:30<4:57:15, 5.52s/it] {'loss': 0.5649, 'learning_rate': 1.239755665918824e-05, 'epoch': 0.44} 44%|████▍ | 2542/5773 [3:57:36<4:57:15, 5.52s/it] {'loss': 0.5649, 'learning_rate': 1.239755665918824e-05, 'epoch': 0.44} 44%|████▍ | 2542/5773 [3:57:30<4:57:15, 5.52s/it]Apr 09 21:41:53.108415 3882742 slurmstepd 0x155550ab8700: error: *** STEP 6683247.0 ON batch-block1-2105 CANCELLED AT 2025-04-09T21:41:53 DUE TO TIME LIMIT *** srun: Job step aborted: Waiting up to 122 seconds for job step to finish. srun: error: batch-block1-10014: task 1: Terminated srun: Terminating StepId=6683247.0 srun: error: batch-block1-2105: task 0: Terminated srun: job 6697721 queued and waiting for resources srun: job 6697721 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block1-0082 JobID: 6697721 | Full list: batch-block1-0082 batch-block1-10014 NETWORK=Efficient-Large-Model/VILA1.5-3b MASTER_ADDR=batch-block1-0082 JobID: 6697721 | Full list: batch-block1-0082 batch-block1-10014 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:55,949] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting 
ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,107] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,108] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [INFO] 
[comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:56,910] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [2025-04-10 06:08:56,910] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:56,910] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 06:08:57,076] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 06:08:57,076] [INFO] [comm.py:594:init_distributed] cdb=None You are attempting to use Flash 
Attention 2.0 with a model not initialized on GPU.
Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [2025-04-10 06:09:06,179] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.70B parameters Loading checkpoint shards: 0%| | 0/2 [00:00\nWhat vitamin is this vegetable associated with?\nAnswer the question using a single word or phrase.'}, {'from': 'gpt', 'value': ''}]] (ignored) 53%|█████▎ | 3051/5773 [52:19<4:10:20, 5.52s/it] 53%|█████▎ | 3051/5773 [52:21<4:10:21, 5.52s/it] {'loss': 0.5789, 'learning_rate': 9.565285399457345e-06, 'epoch': 0.53} 53%|█████▎ | 3051/5773 [52:21<4:10:21, 5.52s/it] {'loss': 0.5789, 'learning_rate': 9.565285399457345e-06, 'epoch': 0.53} 53%|█████▎ | 3051/5773 [52:19<4:10:20, 5.52s/it] 53%|█████▎ | 3052/5773 [52:25<4:12:28, 5.57s/it] 53%|█████▎ | 3052/5773 [52:27<4:12:28, 5.57s/it] {'loss': 0.5833, 'learning_rate': 9.559679783579068e-06, 'epoch': 0.53} 53%|█████▎ | 3052/5773 [52:27<4:12:28, 5.57s/it] {'loss': 0.5833, 'learning_rate': 9.559679783579068e-06, 'epoch': 0.53} 53%|█████▎ | 3052/5773 [52:25<4:12:28, 5.57s/it] 53%|█████▎ | 3053/5773 [52:30<4:11:33, 5.55s/it] 53%|█████▎ | 3053/5773 [52:32<4:11:33, 5.55s/it] {'loss': 0.5603, 'learning_rate': 9.55407430632766e-06, 'epoch': 0.53} 53%|█████▎ | 3053/5773 [52:32<4:11:33, 5.55s/it] {'loss': 
0.5603, 'learning_rate': 9.55407430632766e-06, 'epoch': 0.53} 53%|█████▎ | 3053/5773 [52:30<4:11:33, 5.55s/it] 53%|█████▎ | 3054/5773 [52:36<4:09:16, 5.50s/it] 53%|█████▎ | 3054/5773 [52:37<4:09:16, 5.50s/it] {'loss': 0.5558, 'learning_rate': 9.548468969467912e-06, 'epoch': 0.53} 53%|█████▎ | 3054/5773 [52:37<4:09:16, 5.50s/it] {'loss': 0.5558, 'learning_rate': 9.548468969467912e-06, 'epoch': 0.53} 53%|█████▎ | 3054/5773 [52:36<4:09:16, 5.50s/it] 53%|█████▎ | 3055/5773 [52:41<4:07:59, 5.47s/it] 53%|█████▎ | 3055/5773 [52:43<4:08:00, 5.47s/it] {'loss': 0.5655, 'learning_rate': 9.542863774764557e-06, 'epoch': 0.53} 53%|█████▎ | 3055/5773 [52:43<4:08:00, 5.47s/it] {'loss': 0.5655, 'learning_rate': 9.542863774764557e-06, 'epoch': 0.53} 53%|█████▎ | 3055/5773 [52:41<4:07:59, 5.47s/it] 53%|█████▎ | 3056/5773 [52:46<4:08:42, 5.49s/it] 53%|█████▎ | 3056/5773 [52:48<4:08:42, 5.49s/it] {'loss': 0.5792, 'learning_rate': 9.53725872398229e-06, 'epoch': 0.53} 53%|█████▎ | 3056/5773 [52:48<4:08:42, 5.49s/it] {'loss': 0.5792, 'learning_rate': 9.53725872398229e-06, 'epoch': 0.53} 53%|█████▎ | 3056/5773 [52:46<4:08:42, 5.49s/it] 53%|█████▎ | 3057/5773 [52:52<4:10:02, 5.52s/it] 53%|█████▎ | 3057/5773 [52:54<4:10:02, 5.52s/it] {'loss': 0.5622, 'learning_rate': 9.531653818885763e-06, 'epoch': 0.53} 53%|█████▎ | 3057/5773 [52:54<4:10:02, 5.52s/it] {'loss': 0.5622, 'learning_rate': 9.531653818885763e-06, 'epoch': 0.53} 53%|█████▎ | 3057/5773 [52:52<4:10:02, 5.52s/it] 53%|█████▎ | 3058/5773 [52:58<4:09:31, 5.51s/it] 53%|█████▎ | 3058/5773 [53:00<4:09:31, 5.51s/it] {'loss': 0.5566, 'learning_rate': 9.52604906123958e-06, 'epoch': 0.53} 53%|█████▎ | 3058/5773 [53:00<4:09:31, 5.51s/it] {'loss': 0.5566, 'learning_rate': 9.52604906123958e-06, 'epoch': 0.53} 53%|█████▎ | 3058/5773 [52:58<4:09:31, 5.51s/it] 53%|█████▎ | 3059/5773 [53:03<4:11:24, 5.56s/it] 53%|█████▎ | 3059/5773 [53:05<4:11:24, 5.56s/it] {'loss': 0.58, 'learning_rate': 9.52044445280829e-06, 'epoch': 0.53} 53%|█████▎ | 3059/5773 
[53:05<4:11:24, 5.56s/it] {'loss': 0.58, 'learning_rate': 9.52044445280829e-06, 'epoch': 0.53} 53%|█████▎ | 3059/5773 [53:03<4:11:24, 5.56s/it] 53%|█████▎ | 3060/5773 [53:09<4:09:35, 5.52s/it] 53%|█████▎ | 3060/5773 [53:11<4:09:35, 5.52s/it] {'loss': 0.5587, 'learning_rate': 9.514839995356411e-06, 'epoch': 0.53} 53%|█████▎ | 3060/5773 [53:11<4:09:35, 5.52s/it] {'loss': 0.5587, 'learning_rate': 9.514839995356411e-06, 'epoch': 0.53} 53%|█████▎ | 3060/5773 [53:09<4:09:35, 5.52s/it] 53%|█████▎ | 3061/5773 [53:14<4:08:49, 5.50s/it] 53%|█████▎ | 3061/5773 [53:16<4:08:49, 5.50s/it] {'loss': 0.5814, 'learning_rate': 9.509235690648401e-06, 'epoch': 0.53} 53%|█████▎ | 3061/5773 [53:16<4:08:49, 5.50s/it] {'loss': 0.5814, 'learning_rate': 9.509235690648401e-06, 'epoch': 0.53} 53%|█████▎ | 3061/5773 [53:14<4:08:49, 5.50s/it] 53%|█████▎ | 3062/5773 [53:20<4:09:50, 5.53s/it] 53%|█████▎ | 3062/5773 [53:22<4:09:50, 5.53s/it] {'loss': 0.5679, 'learning_rate': 9.503631540448674e-06, 'epoch': 0.53} 53%|█████▎ | 3062/5773 [53:22<4:09:50, 5.53s/it] {'loss': 0.5679, 'learning_rate': 9.503631540448674e-06, 'epoch': 0.53} 53%|█████▎ | 3062/5773 [53:20<4:09:50, 5.53s/it] 53%|█████▎ | 3063/5773 [53:25<4:09:50, 5.53s/it] 53%|█████▎ | 3063/5773 [53:27<4:09:50, 5.53s/it] {'loss': 0.5612, 'learning_rate': 9.4980275465216e-06, 'epoch': 0.53} 53%|█████▎ | 3063/5773 [53:27<4:09:50, 5.53s/it] {'loss': 0.5612, 'learning_rate': 9.4980275465216e-06, 'epoch': 0.53} 53%|█████▎ | 3063/5773 [53:25<4:09:50, 5.53s/it] 53%|█████▎ | 3064/5773 [53:31<4:12:07, 5.58s/it] 53%|█████▎ | 3064/5773 [53:33<4:12:07, 5.58s/it] {'loss': 0.59, 'learning_rate': 9.492423710631488e-06, 'epoch': 0.53} 53%|█████▎ | 3064/5773 [53:33<4:12:07, 5.58s/it] {'loss': 0.59, 'learning_rate': 9.492423710631488e-06, 'epoch': 0.53} 53%|█████▎ | 3064/5773 [53:31<4:12:07, 5.58s/it] 53%|█████▎ | 3065/5773 [53:36<4:10:09, 5.54s/it] 53%|█████▎ | 3065/5773 [53:38<4:10:09, 5.54s/it] {'loss': 0.5664, 'learning_rate': 9.486820034542614e-06, 'epoch': 
0.53} 53%|█████▎ | 3065/5773 [53:38<4:10:09, 5.54s/it] {'loss': 0.5664, 'learning_rate': 9.486820034542614e-06, 'epoch': 0.53} 53%|█████▎ | 3065/5773 [53:36<4:10:09, 5.54s/it] 53%|█████▎ | 3066/5773 [53:42<4:07:41, 5.49s/it] 53%|█████▎ | 3066/5773 [53:44<4:07:41, 5.49s/it] {'loss': 0.5803, 'learning_rate': 9.481216520019185e-06, 'epoch': 0.53} 53%|█████▎ | 3066/5773 [53:44<4:07:41, 5.49s/it] {'loss': 0.5803, 'learning_rate': 9.481216520019185e-06, 'epoch': 0.53} 53%|█████▎ | 3066/5773 [53:42<4:07:41, 5.49s/it] 53%|█████▎ | 3067/5773 [53:47<4:07:38, 5.49s/it] 53%|█████▎ | 3067/5773 [53:49<4:07:38, 5.49s/it] {'loss': 0.5636, 'learning_rate': 9.475613168825374e-06, 'epoch': 0.53} 53%|█████▎ | 3067/5773 [53:49<4:07:38, 5.49s/it] {'loss': 0.5636, 'learning_rate': 9.475613168825374e-06, 'epoch': 0.53} 53%|█████▎ | 3067/5773 [53:47<4:07:38, 5.49s/it] 53%|█████▎ | 3068/5773 [53:53<4:07:12, 5.48s/it] 53%|█████▎ | 3068/5773 [53:55<4:07:12, 5.48s/it] {'loss': 0.5574, 'learning_rate': 9.470009982725288e-06, 'epoch': 0.53} 53%|█████▎ | 3068/5773 [53:55<4:07:12, 5.48s/it] {'loss': 0.5574, 'learning_rate': 9.470009982725288e-06, 'epoch': 0.53} 53%|█████▎ | 3068/5773 [53:53<4:07:12, 5.48s/it] 53%|█████▎ | 3069/5773 [53:58<4:09:36, 5.54s/it] 53%|█████▎ | 3069/5773 [54:00<4:09:36, 5.54s/it] {'loss': 0.5664, 'learning_rate': 9.464406963482993e-06, 'epoch': 0.53} {'loss': 0.5664, 'learning_rate': 9.464406963482993e-06, 'epoch': 0.53} 53%|█████▎ | 3069/5773 [54:00<4:09:36, 5.54s/it] 53%|█████▎ | 3069/5773 [53:58<4:09:36, 5.54s/it] 53%|█████▎ | 3070/5773 [54:04<4:08:17, 5.51s/it] 53%|█████▎ | 3070/5773 [54:06<4:08:17, 5.51s/it] {'loss': 0.587, 'learning_rate': 9.4588041128625e-06, 'epoch': 0.53} 53%|█████▎ | 3070/5773 [54:06<4:08:17, 5.51s/it] {'loss': 0.587, 'learning_rate': 9.4588041128625e-06, 'epoch': 0.53} 53%|█████▎ | 3070/5773 [54:04<4:08:17, 5.51s/it] 53%|█████▎ | 3071/5773 [54:09<4:06:42, 5.48s/it] 53%|█████▎ | 3071/5773 [54:11<4:06:45, 5.48s/it] {'loss': 0.5524, 
'learning_rate': 9.45320143262776e-06, 'epoch': 0.53} 53%|█████▎ | 3071/5773 [54:11<4:06:45, 5.48s/it] {'loss': 0.5524, 'learning_rate': 9.45320143262776e-06, 'epoch': 0.53} 53%|█████▎ | 3071/5773 [54:09<4:06:42, 5.48s/it] 53%|█████▎ | 3072/5773 [54:15<4:06:41, 5.48s/it] 53%|█████▎ | 3072/5773 [54:17<4:06:41, 5.48s/it] {'loss': 0.5835, 'learning_rate': 9.447598924542686e-06, 'epoch': 0.53} 53%|█████▎ | 3072/5773 [54:17<4:06:41, 5.48s/it] {'loss': 0.5835, 'learning_rate': 9.447598924542686e-06, 'epoch': 0.53} 53%|█████▎ | 3072/5773 [54:15<4:06:41, 5.48s/it] 53%|█████▎ | 3073/5773 [54:20<4:06:45, 5.48s/it] 53%|█████▎ | 3073/5773 [54:22<4:06:45, 5.48s/it] {'loss': 0.5729, 'learning_rate': 9.441996590371117e-06, 'epoch': 0.53} 53%|█████▎ | 3073/5773 [54:22<4:06:45, 5.48s/it] {'loss': 0.5729, 'learning_rate': 9.441996590371117e-06, 'epoch': 0.53} 53%|█████▎ | 3073/5773 [54:20<4:06:45, 5.48s/it] 53%|█████▎ | 3074/5773 [54:26<4:08:50, 5.53s/it] 53%|█████▎ | 3074/5773 [54:28<4:08:49, 5.53s/it] {'loss': 0.5601, 'learning_rate': 9.436394431876847e-06, 'epoch': 0.53} 53%|█████▎ | 3074/5773 [54:28<4:08:49, 5.53s/it] {'loss': 0.5601, 'learning_rate': 9.436394431876847e-06, 'epoch': 0.53} 53%|█████▎ | 3074/5773 [54:26<4:08:50, 5.53s/it] 53%|█████▎ | 3075/5773 [54:31<4:07:35, 5.51s/it] 53%|█████▎ | 3075/5773 [54:33<4:07:35, 5.51s/it] {'loss': 0.5673, 'learning_rate': 9.430792450823616e-06, 'epoch': 0.53} 53%|█████▎ | 3075/5773 [54:33<4:07:35, 5.51s/it] {'loss': 0.5673, 'learning_rate': 9.430792450823616e-06, 'epoch': 0.53} 53%|█████▎ | 3075/5773 [54:31<4:07:35, 5.51s/it] 53%|█████▎ | 3076/5773 [54:37<4:09:25, 5.55s/it] 53%|█████▎ | 3076/5773 [54:39<4:09:24, 5.55s/it] {'loss': 0.5559, 'learning_rate': 9.42519064897511e-06, 'epoch': 0.53} 53%|█████▎ | 3076/5773 [54:39<4:09:24, 5.55s/it] {'loss': 0.5559, 'learning_rate': 9.42519064897511e-06, 'epoch': 0.53} 53%|█████▎ | 3076/5773 [54:37<4:09:25, 5.55s/it] 53%|█████▎ | 3077/5773 [54:43<4:09:33, 5.55s/it] 53%|█████▎ | 3077/5773 
[54:44<4:09:34, 5.55s/it] {'loss': 0.5622, 'learning_rate': 9.419589028094952e-06, 'epoch': 0.53} 53%|█████▎ | 3077/5773 [54:44<4:09:34, 5.55s/it] {'loss': 0.5622, 'learning_rate': 9.419589028094952e-06, 'epoch': 0.53} 53%|█████▎ | 3077/5773 [54:43<4:09:33, 5.55s/it] 53%|█████▎ | 3078/5773 [54:48<4:08:31, 5.53s/it] 53%|█████▎ | 3078/5773 [54:50<4:08:31, 5.53s/it] {'loss': 0.5691, 'learning_rate': 9.41398758994671e-06, 'epoch': 0.53} 53%|█████▎ | 3078/5773 [54:50<4:08:31, 5.53s/it] {'loss': 0.5691, 'learning_rate': 9.41398758994671e-06, 'epoch': 0.53} 53%|█████▎ | 3078/5773 [54:48<4:08:31, 5.53s/it] 53%|█████▎ | 3079/5773 [54:54<4:08:22, 5.53s/it] 53%|█████▎ | 3079/5773 [54:55<4:08:22, 5.53s/it] {'loss': 0.5869, 'learning_rate': 9.4083863362939e-06, 'epoch': 0.53} 53%|█████▎ | 3079/5773 [54:55<4:08:22, 5.53s/it] {'loss': 0.5869, 'learning_rate': 9.4083863362939e-06, 'epoch': 0.53} 53%|█████▎ | 3079/5773 [54:54<4:08:22, 5.53s/it] 53%|█████▎ | 3080/5773 [54:59<4:09:45, 5.56s/it] 53%|█████▎ | 3080/5773 [55:01<4:09:44, 5.56s/it] {'loss': 0.5896, 'learning_rate': 9.40278526889997e-06, 'epoch': 0.53} 53%|█████▎ | 3080/5773 [55:01<4:09:44, 5.56s/it] {'loss': 0.5896, 'learning_rate': 9.40278526889997e-06, 'epoch': 0.53} 53%|█████▎ | 3080/5773 [54:59<4:09:45, 5.56s/it] 53%|█████▎ | 3081/5773 [55:05<4:09:00, 5.55s/it] 53%|█████▎ | 3081/5773 [55:07<4:09:00, 5.55s/it] {'loss': 0.5735, 'learning_rate': 9.397184389528323e-06, 'epoch': 0.53} 53%|█████▎ | 3081/5773 [55:07<4:09:00, 5.55s/it] {'loss': 0.5735, 'learning_rate': 9.397184389528323e-06, 'epoch': 0.53} 53%|█████▎ | 3081/5773 [55:05<4:09:00, 5.55s/it] 53%|█████▎ | 3082/5773 [55:10<4:07:55, 5.53s/it] 53%|█████▎ | 3082/5773 [55:12<4:07:55, 5.53s/it] {'loss': 0.5685, 'learning_rate': 9.391583699942286e-06, 'epoch': 0.53} 53%|█████▎ | 3082/5773 [55:12<4:07:55, 5.53s/it] {'loss': 0.5685, 'learning_rate': 9.391583699942286e-06, 'epoch': 0.53} 53%|█████▎ | 3082/5773 [55:10<4:07:55, 5.53s/it] 53%|█████▎ | 3083/5773 [55:16<4:05:58, 
5.49s/it] 53%|█████▎ | 3083/5773 [55:18<4:05:58, 5.49s/it] {'loss': 0.5614, 'learning_rate': 9.385983201905143e-06, 'epoch': 0.53} 53%|█████▎ | 3083/5773 [55:18<4:05:58, 5.49s/it] {'loss': 0.5614, 'learning_rate': 9.385983201905143e-06, 'epoch': 0.53} 53%|█████▎ | 3083/5773 [55:16<4:05:58, 5.49s/it] 53%|█████▎ | 3084/5773 [55:21<4:07:27, 5.52s/it] 53%|█████▎ | 3084/5773 [55:23<4:07:27, 5.52s/it] {'loss': 0.5792, 'learning_rate': 9.380382897180103e-06, 'epoch': 0.53} 53%|█████▎ | 3084/5773 [55:23<4:07:27, 5.52s/it] {'loss': 0.5792, 'learning_rate': 9.380382897180103e-06, 'epoch': 0.53} 53%|█████▎ | 3084/5773 [55:21<4:07:27, 5.52s/it] 53%|█████▎ | 3085/5773 [55:27<4:09:30, 5.57s/it] 53%|█████▎ | 3085/5773 [55:29<4:09:30, 5.57s/it] {'loss': 0.5615, 'learning_rate': 9.374782787530326e-06, 'epoch': 0.53} 53%|█████▎ | 3085/5773 [55:29<4:09:30, 5.57s/it] {'loss': 0.5615, 'learning_rate': 9.374782787530326e-06, 'epoch': 0.53} 53%|█████▎ | 3085/5773 [55:27<4:09:30, 5.57s/it] 53%|█████▎ | 3086/5773 [55:32<4:07:34, 5.53s/it] 53%|█████▎ | 3086/5773 [55:34<4:07:34, 5.53s/it] {'loss': 0.5732, 'learning_rate': 9.369182874718904e-06, 'epoch': 0.53} 53%|█████▎ | 3086/5773 [55:34<4:07:34, 5.53s/it] {'loss': 0.5732, 'learning_rate': 9.369182874718904e-06, 'epoch': 0.53} 53%|█████▎ | 3086/5773 [55:32<4:07:34, 5.53s/it] 53%|█████▎ | 3087/5773 [55:40<4:08:16, 5.55s/it] 53%|█████▎ | 3087/5773 [55:38<4:08:16, 5.55s/it] {'loss': 0.5858, 'learning_rate': 9.363583160508864e-06, 'epoch': 0.53} 53%|█████▎ | 3087/5773 [55:40<4:08:16, 5.55s/it] {'loss': 0.5858, 'learning_rate': 9.363583160508864e-06, 'epoch': 0.53} 53%|█████▎ | 3087/5773 [55:38<4:08:16, 5.55s/it] 53%|█████▎ | 3088/5773 [55:43<4:07:19, 5.53s/it] 53%|█████▎ | 3088/5773 [55:45<4:07:19, 5.53s/it] {'loss': 0.5649, 'learning_rate': 9.357983646663178e-06, 'epoch': 0.53} 53%|█████▎ | 3088/5773 [55:45<4:07:19, 5.53s/it] {'loss': 0.5649, 'learning_rate': 9.357983646663178e-06, 'epoch': 0.53} 53%|█████▎ | 3088/5773 [55:43<4:07:19, 
5.53s/it] 54%|█████▎ | 3089/5773 [55:49<4:07:40, 5.54s/it] 54%|█████▎ | 3089/5773 [55:51<4:07:40, 5.54s/it] {'loss': 0.5673, 'learning_rate': 9.352384334944754e-06, 'epoch': 0.54} 54%|█████▎ | 3089/5773 [55:51<4:07:40, 5.54s/it] {'loss': 0.5673, 'learning_rate': 9.352384334944754e-06, 'epoch': 0.54} 54%|█████▎ | 3089/5773 [55:49<4:07:40, 5.54s/it] 54%|█████▎ | 3090/5773 [55:54<4:05:46, 5.50s/it] 54%|█████▎ | 3090/5773 [55:56<4:05:46, 5.50s/it] {'loss': 0.5692, 'learning_rate': 9.346785227116432e-06, 'epoch': 0.54} 54%|█████▎ | 3090/5773 [55:56<4:05:46, 5.50s/it] {'loss': 0.5692, 'learning_rate': 9.346785227116432e-06, 'epoch': 0.54} 54%|█████▎ | 3090/5773 [55:54<4:05:46, 5.50s/it] 54%|█████▎ | 3091/5773 [56:00<4:05:27, 5.49s/it] 54%|█████▎ | 3091/5773 [56:02<4:05:27, 5.49s/it] {'loss': 0.5663, 'learning_rate': 9.341186324940991e-06, 'epoch': 0.54} 54%|█████▎ | 3091/5773 [56:02<4:05:27, 5.49s/it] {'loss': 0.5663, 'learning_rate': 9.341186324940991e-06, 'epoch': 0.54} 54%|█████▎ | 3091/5773 [56:00<4:05:27, 5.49s/it] 54%|█████▎ | 3092/5773 [56:05<4:05:35, 5.50s/it] 54%|█████▎ | 3092/5773 [56:07<4:05:36, 5.50s/it] {'loss': 0.5716, 'learning_rate': 9.335587630181142e-06, 'epoch': 0.54} 54%|█████▎ | 3092/5773 [56:07<4:05:36, 5.50s/it] {'loss': 0.5716, 'learning_rate': 9.335587630181142e-06, 'epoch': 0.54} 54%|█████▎ | 3092/5773 [56:05<4:05:35, 5.50s/it] 54%|█████▎ | 3093/5773 [56:11<4:04:50, 5.48s/it] 54%|█████▎ | 3093/5773 [56:13<4:04:49, 5.48s/it] {'loss': 0.5606, 'learning_rate': 9.329989144599536e-06, 'epoch': 0.54} 54%|█████▎ | 3093/5773 [56:13<4:04:49, 5.48s/it] {'loss': 0.5606, 'learning_rate': 9.329989144599536e-06, 'epoch': 0.54} 54%|█████▎ | 3093/5773 [56:11<4:04:50, 5.48s/it] 54%|█████▎ | 3094/5773 [56:16<4:04:03, 5.47s/it] 54%|█████▎ | 3094/5773 [56:18<4:04:03, 5.47s/it] {'loss': 0.5656, 'learning_rate': 9.324390869958756e-06, 'epoch': 0.54} 54%|█████▎ | 3094/5773 [56:18<4:04:03, 5.47s/it] {'loss': 0.5656, 'learning_rate': 9.324390869958756e-06, 'epoch': 
0.54} 54%|█████▎ | 3094/5773 [56:16<4:04:03, 5.47s/it] 54%|█████▎ | 3095/5773 [56:22<4:02:23, 5.43s/it] 54%|█████▎ | 3095/5773 [56:23<4:02:23, 5.43s/it] {'loss': 0.5768, 'learning_rate': 9.318792808021313e-06, 'epoch': 0.54} 54%|█████▎ | 3095/5773 [56:23<4:02:23, 5.43s/it] {'loss': 0.5768, 'learning_rate': 9.318792808021313e-06, 'epoch': 0.54} 54%|█████▎ | 3095/5773 [56:22<4:02:23, 5.43s/it] 54%|█████▎ | 3096/5773 [56:27<4:02:20, 5.43s/it] 54%|█████▎ | 3096/5773 [56:29<4:02:20, 5.43s/it] {'loss': 0.5667, 'learning_rate': 9.313194960549661e-06, 'epoch': 0.54} 54%|█████▎ | 3096/5773 [56:29<4:02:20, 5.43s/it] {'loss': 0.5667, 'learning_rate': 9.313194960549661e-06, 'epoch': 0.54} 54%|█████▎ | 3096/5773 [56:27<4:02:20, 5.43s/it] 54%|█████▎ | 3097/5773 [56:32<4:01:34, 5.42s/it] 54%|█████▎ | 3097/5773 [56:34<4:01:34, 5.42s/it] {'loss': 0.5661, 'learning_rate': 9.307597329306175e-06, 'epoch': 0.54} 54%|█████▎ | 3097/5773 [56:34<4:01:34, 5.42s/it] {'loss': 0.5661, 'learning_rate': 9.307597329306175e-06, 'epoch': 0.54} 54%|█████▎ | 3097/5773 [56:32<4:01:34, 5.42s/it] 54%|█████▎ | 3098/5773 [56:38<4:03:36, 5.46s/it] 54%|█████▎ | 3098/5773 [56:40<4:03:36, 5.46s/it] {'loss': 0.5684, 'learning_rate': 9.301999916053175e-06, 'epoch': 0.54} 54%|█████▎ | 3098/5773 [56:40<4:03:36, 5.46s/it] {'loss': 0.5684, 'learning_rate': 9.301999916053175e-06, 'epoch': 0.54} 54%|█████▎ | 3098/5773 [56:38<4:03:36, 5.46s/it] 54%|█████▎ | 3099/5773 [56:43<4:03:04, 5.45s/it] 54%|█████▎ | 3099/5773 [56:45<4:03:04, 5.45s/it] {'loss': 0.5601, 'learning_rate': 9.2964027225529e-06, 'epoch': 0.54} 54%|█████▎ | 3099/5773 [56:45<4:03:04, 5.45s/it] {'loss': 0.5601, 'learning_rate': 9.2964027225529e-06, 'epoch': 0.54} 54%|█████▎ | 3099/5773 [56:43<4:03:04, 5.45s/it]1412 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 117 AutoResumeHook: Checking whether to suspend...5 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 54%|█████▎ | 3100/5773 [56:51<4:05:41, 5.51s/it] AutoResumeHook: Checking whether to suspend...1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 54%|█████▎ | 3100/5773 [56:49<4:05:41, 5.51s/it]13 4 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5818, 'learning_rate': 9.290805750567532e-06, 'epoch': 0.54} 54%|█████▎ | 3100/5773 [56:51<4:05:41, 5.51s/it] {'loss': 0.5818, 'learning_rate': 9.290805750567532e-06, 'epoch': 0.54} 54%|█████▎ | 3100/5773 [56:49<4:05:41, 5.51s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 54%|█████▎ | 3101/5773 [57:11<7:20:14, 9.89s/it] 54%|█████▎ | 3101/5773 [57:09<7:20:15, 9.89s/it] {'loss': 0.5749, 'learning_rate': 9.28520900185917e-06, 'epoch': 0.54} 54%|█████▎ | 3101/5773 [57:11<7:20:14, 9.89s/it] {'loss': 0.5749, 'learning_rate': 9.28520900185917e-06, 'epoch': 0.54} 54%|█████▎ | 3101/5773 [57:09<7:20:15, 9.89s/it] 54%|█████▎ | 3102/5773 [57:17<6:23:10, 8.61s/it] 54%|█████▎ | 3102/5773 [57:15<6:23:10, 8.61s/it] {'loss': 0.5721, 'learning_rate': 9.279612478189853e-06, 'epoch': 0.54} 54%|█████▎ | 3102/5773 [57:17<6:23:10, 8.61s/it] {'loss': 0.5721, 'learning_rate': 9.279612478189853e-06, 'epoch': 0.54} 54%|█████▎ | 3102/5773 [57:15<6:23:10, 8.61s/it] 54%|█████▍ | 3103/5773 [57:22<5:40:21, 7.65s/it] 54%|█████▍ | 3103/5773 [57:20<5:40:21, 7.65s/it] {'loss': 0.5575, 'learning_rate': 9.27401618132154e-06, 'epoch': 0.54} 54%|█████▍ | 3103/5773 [57:22<5:40:21, 7.65s/it] {'loss': 0.5575, 'learning_rate': 9.27401618132154e-06, 'epoch': 0.54} 54%|█████▍ | 3103/5773 [57:20<5:40:21, 7.65s/it] 54%|█████▍ | 3104/5773 [57:28<5:11:54, 7.01s/it] 54%|█████▍ | 3104/5773 [57:26<5:11:54, 7.01s/it] {'loss': 0.5682, 'learning_rate': 9.268420113016132e-06, 'epoch': 0.54} 54%|█████▍ | 3104/5773 [57:28<5:11:54, 7.01s/it] {'loss': 0.5682, 'learning_rate': 9.268420113016132e-06, 'epoch': 0.54} 54%|█████▍ | 3104/5773 [57:26<5:11:54, 7.01s/it] 54%|█████▍ | 3105/5773 [57:33<4:54:21, 6.62s/it] 54%|█████▍ | 3105/5773 [57:31<4:54:21, 6.62s/it] {'loss': 0.5738, 'learning_rate': 9.262824275035443e-06, 'epoch': 0.54} 54%|█████▍ | 3105/5773 [57:33<4:54:21, 6.62s/it] {'loss': 0.5738, 'learning_rate': 9.262824275035443e-06, 'epoch': 0.54} 54%|█████▍ | 3105/5773 [57:31<4:54:21, 6.62s/it] 54%|█████▍ | 3106/5773 [57:39<4:41:09, 6.33s/it] 54%|█████▍ | 3106/5773 [57:37<4:41:09, 6.33s/it] {'loss': 0.5848, 'learning_rate': 9.257228669141226e-06, 'epoch': 0.54} 54%|█████▍ | 3106/5773 [57:39<4:41:09, 6.33s/it] {'loss': 0.5848, 'learning_rate': 9.257228669141226e-06, 'epoch': 
0.54} 54%|█████▍ | 3106/5773 [57:37<4:41:09, 6.33s/it] 54%|█████▍ | 3107/5773 [57:42<4:29:49, 6.07s/it] 54%|█████▍ | 3107/5773 [57:44<4:29:49, 6.07s/it] {'loss': 0.5731, 'learning_rate': 9.251633297095158e-06, 'epoch': 0.54} 54%|█████▍ | 3107/5773 [57:44<4:29:49, 6.07s/it] {'loss': 0.5731, 'learning_rate': 9.251633297095158e-06, 'epoch': 0.54} 54%|█████▍ | 3107/5773 [57:42<4:29:49, 6.07s/it] 54%|█████▍ | 3108/5773 [57:50<4:21:27, 5.89s/it] 54%|█████▍ | 3108/5773 [57:48<4:21:28, 5.89s/it] {'loss': 0.5778, 'learning_rate': 9.246038160658833e-06, 'epoch': 0.54} 54%|█████▍ | 3108/5773 [57:50<4:21:27, 5.89s/it] {'loss': 0.5778, 'learning_rate': 9.246038160658833e-06, 'epoch': 0.54} 54%|█████▍ | 3108/5773 [57:48<4:21:28, 5.89s/it] 54%|█████▍ | 3109/5773 [57:53<4:16:11, 5.77s/it] 54%|█████▍ | 3109/5773 [57:55<4:16:11, 5.77s/it] {'loss': 0.575, 'learning_rate': 9.24044326159379e-06, 'epoch': 0.54} 54%|█████▍ | 3109/5773 [57:55<4:16:11, 5.77s/it] {'loss': 0.575, 'learning_rate': 9.24044326159379e-06, 'epoch': 0.54} 54%|█████▍ | 3109/5773 [57:53<4:16:11, 5.77s/it] 54%|█████▍ | 3110/5773 [58:01<4:11:52, 5.67s/it] 54%|█████▍ | 3110/5773 [57:59<4:11:52, 5.67s/it] {'loss': 0.5572, 'learning_rate': 9.23484860166147e-06, 'epoch': 0.54} 54%|█████▍ | 3110/5773 [58:01<4:11:52, 5.67s/it] {'loss': 0.5572, 'learning_rate': 9.23484860166147e-06, 'epoch': 0.54} 54%|█████▍ | 3110/5773 [57:59<4:11:52, 5.67s/it] 54%|█████▍ | 3111/5773 [58:04<4:08:29, 5.60s/it] 54%|█████▍ | 3111/5773 [58:06<4:08:29, 5.60s/it] {'loss': 0.5769, 'learning_rate': 9.22925418262326e-06, 'epoch': 0.54} 54%|█████▍ | 3111/5773 [58:06<4:08:29, 5.60s/it] {'loss': 0.5769, 'learning_rate': 9.22925418262326e-06, 'epoch': 0.54} 54%|█████▍ | 3111/5773 [58:04<4:08:29, 5.60s/it] 54%|█████▍ | 3112/5773 [58:10<4:05:30, 5.54s/it] 54%|█████▍ | 3112/5773 [58:12<4:05:30, 5.54s/it] {'loss': 0.575, 'learning_rate': 9.223660006240458e-06, 'epoch': 0.54} 54%|█████▍ | 3112/5773 [58:12<4:05:30, 5.54s/it] {'loss': 0.575, 'learning_rate': 
9.223660006240458e-06, 'epoch': 0.54} 54%|█████▍ | 3112/5773 [58:10<4:05:30, 5.54s/it] 54%|█████▍ | 3113/5773 [58:17<4:05:21, 5.53s/it] 54%|█████▍ | 3113/5773 [58:15<4:05:21, 5.53s/it] {'loss': 0.5551, 'learning_rate': 9.218066074274289e-06, 'epoch': 0.54} 54%|█████▍ | 3113/5773 [58:17<4:05:21, 5.53s/it] {'loss': 0.5551, 'learning_rate': 9.218066074274289e-06, 'epoch': 0.54} 54%|█████▍ | 3113/5773 [58:15<4:05:21, 5.53s/it] 54%|█████▍ | 3114/5773 [58:21<4:04:36, 5.52s/it] 54%|█████▍ | 3114/5773 [58:23<4:04:36, 5.52s/it] {'loss': 0.5839, 'learning_rate': 9.212472388485907e-06, 'epoch': 0.54} 54%|█████▍ | 3114/5773 [58:23<4:04:36, 5.52s/it] {'loss': 0.5839, 'learning_rate': 9.212472388485907e-06, 'epoch': 0.54} 54%|█████▍ | 3114/5773 [58:21<4:04:36, 5.52s/it] 54%|█████▍ | 3115/5773 [58:28<4:04:46, 5.53s/it] 54%|█████▍ | 3115/5773 [58:26<4:04:46, 5.53s/it] {'loss': 0.5803, 'learning_rate': 9.206878950636376e-06, 'epoch': 0.54} 54%|█████▍ | 3115/5773 [58:28<4:04:46, 5.53s/it] {'loss': 0.5803, 'learning_rate': 9.206878950636376e-06, 'epoch': 0.54} 54%|█████▍ | 3115/5773 [58:26<4:04:46, 5.53s/it] 54%|█████▍ | 3116/5773 [58:32<4:03:59, 5.51s/it] 54%|█████▍ | 3116/5773 [58:34<4:03:59, 5.51s/it] {'loss': 0.5699, 'learning_rate': 9.201285762486687e-06, 'epoch': 0.54} 54%|█████▍ | 3116/5773 [58:34<4:03:59, 5.51s/it] {'loss': 0.5699, 'learning_rate': 9.201285762486687e-06, 'epoch': 0.54} 54%|█████▍ | 3116/5773 [58:32<4:03:59, 5.51s/it] 54%|█████▍ | 3117/5773 [58:39<4:04:32, 5.52s/it] 54%|█████▍ | 3117/5773 [58:37<4:04:32, 5.52s/it] {'loss': 0.5737, 'learning_rate': 9.195692825797764e-06, 'epoch': 0.54} 54%|█████▍ | 3117/5773 [58:39<4:04:32, 5.52s/it] {'loss': 0.5737, 'learning_rate': 9.195692825797764e-06, 'epoch': 0.54} 54%|█████▍ | 3117/5773 [58:37<4:04:32, 5.52s/it] 54%|█████▍ | 3118/5773 [58:45<4:02:40, 5.48s/it] 54%|█████▍ | 3118/5773 [58:43<4:02:40, 5.48s/it] {'loss': 0.5628, 'learning_rate': 9.190100142330433e-06, 'epoch': 0.54} 54%|█████▍ | 3118/5773 [58:45<4:02:40, 
5.48s/it] {'loss': 0.5628, 'learning_rate': 9.190100142330433e-06, 'epoch': 0.54} 54%|█████▍ | 3118/5773 [58:43<4:02:40, 5.48s/it] 54%|█████▍ | 3119/5773 [58:48<4:04:34, 5.53s/it] 54%|█████▍ | 3119/5773 [58:50<4:04:34, 5.53s/it] {'loss': 0.5599, 'learning_rate': 9.184507713845455e-06, 'epoch': 0.54} 54%|█████▍ | 3119/5773 [58:50<4:04:34, 5.53s/it] {'loss': 0.5599, 'learning_rate': 9.184507713845455e-06, 'epoch': 0.54} 54%|█████▍ | 3119/5773 [58:48<4:04:34, 5.53s/it] 54%|█████▍ | 3120/5773 [58:56<4:04:39, 5.53s/it] 54%|█████▍ | 3120/5773 [58:54<4:04:39, 5.53s/it] {'loss': 0.5758, 'learning_rate': 9.178915542103504e-06, 'epoch': 0.54} 54%|█████▍ | 3120/5773 [58:56<4:04:39, 5.53s/it] {'loss': 0.5758, 'learning_rate': 9.178915542103504e-06, 'epoch': 0.54} 54%|█████▍ | 3120/5773 [58:54<4:04:39, 5.53s/it] 54%|█████▍ | 3121/5773 [59:01<4:04:40, 5.54s/it] 54%|█████▍ | 3121/5773 [58:59<4:04:41, 5.54s/it] {'loss': 0.5592, 'learning_rate': 9.173323628865173e-06, 'epoch': 0.54} 54%|█████▍ | 3121/5773 [59:01<4:04:40, 5.54s/it] {'loss': 0.5592, 'learning_rate': 9.173323628865173e-06, 'epoch': 0.54} 54%|█████▍ | 3121/5773 [58:59<4:04:41, 5.54s/it] 54%|█████▍ | 3122/5773 [59:05<4:03:22, 5.51s/it] 54%|█████▍ | 3122/5773 [59:07<4:03:22, 5.51s/it] {'loss': 0.5657, 'learning_rate': 9.167731975890977e-06, 'epoch': 0.54} 54%|█████▍ | 3122/5773 [59:07<4:03:22, 5.51s/it] {'loss': 0.5657, 'learning_rate': 9.167731975890977e-06, 'epoch': 0.54} 54%|█████▍ | 3122/5773 [59:05<4:03:22, 5.51s/it] 54%|█████▍ | 3123/5773 [59:12<4:04:19, 5.53s/it] 54%|█████▍ | 3123/5773 [59:10<4:04:20, 5.53s/it] {'loss': 0.5641, 'learning_rate': 9.162140584941344e-06, 'epoch': 0.54} 54%|█████▍ | 3123/5773 [59:12<4:04:19, 5.53s/it] {'loss': 0.5641, 'learning_rate': 9.162140584941344e-06, 'epoch': 0.54} 54%|█████▍ | 3123/5773 [59:10<4:04:20, 5.53s/it] 54%|█████▍ | 3124/5773 [59:16<4:04:07, 5.53s/it] 54%|█████▍ | 3124/5773 [59:18<4:04:07, 5.53s/it] {'loss': 0.579, 'learning_rate': 9.156549457776624e-06, 'epoch': 0.54} 
54%|█████▍ | 3124/5773 [59:18<4:04:07, 5.53s/it] {'loss': 0.579, 'learning_rate': 9.156549457776624e-06, 'epoch': 0.54} 54%|█████▍ | 3124/5773 [59:16<4:04:07, 5.53s/it] 54%|█████▍ | 3125/5773 [59:23<4:02:13, 5.49s/it] 54%|█████▍ | 3125/5773 [59:21<4:02:13, 5.49s/it] {'loss': 0.5738, 'learning_rate': 9.150958596157085e-06, 'epoch': 0.54} 54%|█████▍ | 3125/5773 [59:23<4:02:13, 5.49s/it] {'loss': 0.5738, 'learning_rate': 9.150958596157085e-06, 'epoch': 0.54} 54%|█████▍ | 3125/5773 [59:21<4:02:13, 5.49s/it] 54%|█████▍ | 3126/5773 [59:27<4:02:24, 5.49s/it] 54%|█████▍ | 3126/5773 [59:29<4:02:24, 5.49s/it] {'loss': 0.5553, 'learning_rate': 9.145368001842905e-06, 'epoch': 0.54} 54%|█████▍ | 3126/5773 [59:29<4:02:24, 5.49s/it] {'loss': 0.5553, 'learning_rate': 9.145368001842905e-06, 'epoch': 0.54} 54%|█████▍ | 3126/5773 [59:27<4:02:24, 5.49s/it] 54%|█████▍ | 3127/5773 [59:32<4:01:24, 5.47s/it] 54%|█████▍ | 3127/5773 [59:34<4:01:24, 5.47s/it] {'loss': 0.5838, 'learning_rate': 9.139777676594184e-06, 'epoch': 0.54} 54%|█████▍ | 3127/5773 [59:34<4:01:24, 5.47s/it] {'loss': 0.5838, 'learning_rate': 9.139777676594184e-06, 'epoch': 0.54} 54%|█████▍ | 3127/5773 [59:32<4:01:24, 5.47s/it] 54%|█████▍ | 3128/5773 [59:38<4:02:55, 5.51s/it] 54%|█████▍ | 3128/5773 [59:40<4:02:56, 5.51s/it] {'loss': 0.5708, 'learning_rate': 9.134187622170939e-06, 'epoch': 0.54} 54%|█████▍ | 3128/5773 [59:40<4:02:56, 5.51s/it] {'loss': 0.5708, 'learning_rate': 9.134187622170939e-06, 'epoch': 0.54} 54%|█████▍ | 3128/5773 [59:38<4:02:55, 5.51s/it] 54%|█████▍ | 3129/5773 [59:43<4:02:54, 5.51s/it] 54%|█████▍ | 3129/5773 [59:45<4:02:53, 5.51s/it] {'loss': 0.5634, 'learning_rate': 9.128597840333087e-06, 'epoch': 0.54} 54%|█████▍ | 3129/5773 [59:45<4:02:53, 5.51s/it] {'loss': 0.5634, 'learning_rate': 9.128597840333087e-06, 'epoch': 0.54} 54%|█████▍ | 3129/5773 [59:43<4:02:54, 5.51s/it] 54%|█████▍ | 3130/5773 [59:49<4:03:38, 5.53s/it] 54%|█████▍ | 3130/5773 [59:51<4:03:38, 5.53s/it] {'loss': 0.5611, 
'learning_rate': 9.123008332840478e-06, 'epoch': 0.54} 54%|█████▍ | 3130/5773 [59:51<4:03:38, 5.53s/it] {'loss': 0.5611, 'learning_rate': 9.123008332840478e-06, 'epoch': 0.54} 54%|█████▍ | 3130/5773 [59:49<4:03:38, 5.53s/it] 54%|█████▍ | 3131/5773 [59:55<4:04:55, 5.56s/it] 54%|█████▍ | 3131/5773 [59:57<4:04:55, 5.56s/it] {'loss': 0.5724, 'learning_rate': 9.117419101452864e-06, 'epoch': 0.54} 54%|█████▍ | 3131/5773 [59:57<4:04:55, 5.56s/it] {'loss': 0.5724, 'learning_rate': 9.117419101452864e-06, 'epoch': 0.54} 54%|█████▍ | 3131/5773 [59:55<4:04:55, 5.56s/it] 54%|█████▍ | 3132/5773 [1:00:00<4:02:05, 5.50s/it] 54%|█████▍ | 3132/5773 [1:00:02<4:02:05, 5.50s/it] {'loss': 0.5797, 'learning_rate': 9.111830147929915e-06, 'epoch': 0.54} 54%|█████▍ | 3132/5773 [1:00:02<4:02:05, 5.50s/it] {'loss': 0.5797, 'learning_rate': 9.111830147929915e-06, 'epoch': 0.54} 54%|█████▍ | 3132/5773 [1:00:00<4:02:05, 5.50s/it] 54%|█████▍ | 3133/5773 [1:00:07<3:59:05, 5.43s/it] 54%|█████▍ | 3133/5773 [1:00:05<3:59:05, 5.43s/it] {'loss': 0.5631, 'learning_rate': 9.106241474031213e-06, 'epoch': 0.54} 54%|█████▍ | 3133/5773 [1:00:07<3:59:05, 5.43s/it] {'loss': 0.5631, 'learning_rate': 9.106241474031213e-06, 'epoch': 0.54} 54%|█████▍ | 3133/5773 [1:00:05<3:59:05, 5.43s/it] 54%|█████▍ | 3134/5773 [1:00:13<3:59:59, 5.46s/it] 54%|█████▍ | 3134/5773 [1:00:11<3:59:59, 5.46s/it] {'loss': 0.5852, 'learning_rate': 9.100653081516249e-06, 'epoch': 0.54} 54%|█████▍ | 3134/5773 [1:00:13<3:59:59, 5.46s/it] {'loss': 0.5852, 'learning_rate': 9.100653081516249e-06, 'epoch': 0.54} 54%|█████▍ | 3134/5773 [1:00:11<3:59:59, 5.46s/it] 54%|█████▍ | 3135/5773 [1:00:18<4:00:51, 5.48s/it] 54%|█████▍ | 3135/5773 [1:00:16<4:00:51, 5.48s/it] {'loss': 0.556, 'learning_rate': 9.095064972144432e-06, 'epoch': 0.54} 54%|█████▍ | 3135/5773 [1:00:18<4:00:51, 5.48s/it] {'loss': 0.556, 'learning_rate': 9.095064972144432e-06, 'epoch': 0.54} 54%|█████▍ | 3135/5773 [1:00:16<4:00:51, 5.48s/it] 54%|█████▍ | 3136/5773 [1:00:22<4:10:47, 
5.71s/it] 54%|█████▍ | 3136/5773 [1:00:24<4:10:47, 5.71s/it] {'loss': 0.5539, 'learning_rate': 9.089477147675072e-06, 'epoch': 0.54} 54%|█████▍ | 3136/5773 [1:00:24<4:10:47, 5.71s/it] {'loss': 0.5539, 'learning_rate': 9.089477147675072e-06, 'epoch': 0.54} 54%|█████▍ | 3136/5773 [1:00:22<4:10:47, 5.71s/it] 54%|█████▍ | 3137/5773 [1:00:28<4:07:09, 5.63s/it] 54%|█████▍ | 3137/5773 [1:00:30<4:07:09, 5.63s/it] {'loss': 0.5585, 'learning_rate': 9.083889609867396e-06, 'epoch': 0.54} 54%|█████▍ | 3137/5773 [1:00:30<4:07:09, 5.63s/it] {'loss': 0.5585, 'learning_rate': 9.083889609867396e-06, 'epoch': 0.54} 54%|█████▍ | 3137/5773 [1:00:28<4:07:09, 5.63s/it] 54%|█████▍ | 3138/5773 [1:00:35<4:05:20, 5.59s/it] 54%|█████▍ | 3138/5773 [1:00:33<4:05:20, 5.59s/it] {'loss': 0.5667, 'learning_rate': 9.078302360480544e-06, 'epoch': 0.54} 54%|█████▍ | 3138/5773 [1:00:35<4:05:20, 5.59s/it] {'loss': 0.5667, 'learning_rate': 9.078302360480544e-06, 'epoch': 0.54} 54%|█████▍ | 3138/5773 [1:00:33<4:05:20, 5.59s/it] 54%|█████▍ | 3139/5773 [1:00:41<4:03:34, 5.55s/it] 54%|█████▍ | 3139/5773 [1:00:39<4:03:34, 5.55s/it] {'loss': 0.5637, 'learning_rate': 9.072715401273553e-06, 'epoch': 0.54} 54%|█████▍ | 3139/5773 [1:00:41<4:03:34, 5.55s/it] {'loss': 0.5637, 'learning_rate': 9.072715401273553e-06, 'epoch': 0.54} 54%|█████▍ | 3139/5773 [1:00:39<4:03:34, 5.55s/it] 54%|█████▍ | 3140/5773 [1:00:44<4:03:15, 5.54s/it] 54%|█████▍ | 3140/5773 [1:00:46<4:03:15, 5.54s/it] {'loss': 0.5865, 'learning_rate': 9.067128734005382e-06, 'epoch': 0.54} 54%|█████▍ | 3140/5773 [1:00:46<4:03:15, 5.54s/it] {'loss': 0.5865, 'learning_rate': 9.067128734005382e-06, 'epoch': 0.54} 54%|█████▍ | 3140/5773 [1:00:44<4:03:15, 5.54s/it] 54%|█████▍ | 3141/5773 [1:00:50<4:02:18, 5.52s/it] 54%|█████▍ | 3141/5773 [1:00:52<4:02:19, 5.52s/it] {'loss': 0.5562, 'learning_rate': 9.06154236043489e-06, 'epoch': 0.54} 54%|█████▍ | 3141/5773 [1:00:52<4:02:19, 5.52s/it] {'loss': 0.5562, 'learning_rate': 9.06154236043489e-06, 'epoch': 0.54} 
54%|█████▍ | 3141/5773 [1:00:50<4:02:18, 5.52s/it] 54%|█████▍ | 3142/5773 [1:00:57<4:00:44, 5.49s/it] 54%|█████▍ | 3142/5773 [1:00:55<4:00:46, 5.49s/it] {'loss': 0.5597, 'learning_rate': 9.05595628232085e-06, 'epoch': 0.54} 54%|█████▍ | 3142/5773 [1:00:57<4:00:44, 5.49s/it] {'loss': 0.5597, 'learning_rate': 9.05595628232085e-06, 'epoch': 0.54} 54%|█████▍ | 3142/5773 [1:00:55<4:00:46, 5.49s/it] 54%|█████▍ | 3143/5773 [1:01:01<4:01:11, 5.50s/it] 54%|█████▍ | 3143/5773 [1:01:03<4:01:11, 5.50s/it] {'loss': 0.5715, 'learning_rate': 9.050370501421931e-06, 'epoch': 0.54} 54%|█████▍ | 3143/5773 [1:01:03<4:01:11, 5.50s/it] {'loss': 0.5715, 'learning_rate': 9.050370501421931e-06, 'epoch': 0.54} 54%|█████▍ | 3143/5773 [1:01:01<4:01:11, 5.50s/it] 54%|█████▍ | 3144/5773 [1:01:08<4:00:09, 5.48s/it] 54%|█████▍ | 3144/5773 [1:01:06<4:00:09, 5.48s/it] {'loss': 0.5763, 'learning_rate': 9.04478501949672e-06, 'epoch': 0.54} 54%|█████▍ | 3144/5773 [1:01:08<4:00:09, 5.48s/it] {'loss': 0.5763, 'learning_rate': 9.04478501949672e-06, 'epoch': 0.54} 54%|█████▍ | 3144/5773 [1:01:06<4:00:09, 5.48s/it] 54%|█████▍ | 3145/5773 [1:01:14<3:58:46, 5.45s/it] 54%|█████▍ | 3145/5773 [1:01:12<3:58:46, 5.45s/it] {'loss': 0.5732, 'learning_rate': 9.039199838303702e-06, 'epoch': 0.54} 54%|█████▍ | 3145/5773 [1:01:14<3:58:46, 5.45s/it] {'loss': 0.5732, 'learning_rate': 9.039199838303702e-06, 'epoch': 0.54} 54%|█████▍ | 3145/5773 [1:01:12<3:58:46, 5.45s/it] 54%|█████▍ | 3146/5773 [1:01:19<4:01:22, 5.51s/it] 54%|█████▍ | 3146/5773 [1:01:17<4:01:22, 5.51s/it] {'loss': 0.5756, 'learning_rate': 9.033614959601274e-06, 'epoch': 0.54} 54%|█████▍ | 3146/5773 [1:01:19<4:01:22, 5.51s/it] {'loss': 0.5756, 'learning_rate': 9.033614959601274e-06, 'epoch': 0.54} 54%|█████▍ | 3146/5773 [1:01:17<4:01:22, 5.51s/it] 55%|█████▍ | 3147/5773 [1:01:25<4:00:07, 5.49s/it] 55%|█████▍ | 3147/5773 [1:01:23<4:00:06, 5.49s/it] {'loss': 0.5491, 'learning_rate': 9.02803038514773e-06, 'epoch': 0.55} 55%|█████▍ | 3147/5773 
[1:01:25<4:00:07, 5.49s/it] {'loss': 0.5491, 'learning_rate': 9.02803038514773e-06, 'epoch': 0.55} 55%|█████▍ | 3147/5773 [1:01:23<4:00:06, 5.49s/it] 55%|█████▍ | 3148/5773 [1:01:30<4:00:47, 5.50s/it] 55%|█████▍ | 3148/5773 [1:01:28<4:00:47, 5.50s/it] {'loss': 0.5771, 'learning_rate': 9.022446116701278e-06, 'epoch': 0.55} 55%|█████▍ | 3148/5773 [1:01:30<4:00:47, 5.50s/it] {'loss': 0.5771, 'learning_rate': 9.022446116701278e-06, 'epoch': 0.55} 55%|█████▍ | 3148/5773 [1:01:28<4:00:47, 5.50s/it] 55%|█████▍ | 3149/5773 [1:01:36<3:59:04, 5.47s/it] 55%|█████▍ | 3149/5773 [1:01:34<3:59:05, 5.47s/it] {'loss': 0.5561, 'learning_rate': 9.01686215602002e-06, 'epoch': 0.55} 55%|█████▍ | 3149/5773 [1:01:36<3:59:04, 5.47s/it] {'loss': 0.5561, 'learning_rate': 9.01686215602002e-06, 'epoch': 0.55} 55%|█████▍ | 3149/5773 [1:01:34<3:59:05, 5.47s/it]11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 55%|█████▍ | 3150/5773 [1:01:41<4:00:49, 5.51s/it] 55%|█████▍ | 3150/5773 [1:01:39<4:00:49, 5.51s/it]4 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5583, 'learning_rate': 9.011278504861963e-06, 'epoch': 0.55} 55%|█████▍ | 3150/5773 [1:01:41<4:00:49, 5.51s/it] {'loss': 0.5583, 'learning_rate': 9.011278504861963e-06, 'epoch': 0.55} 55%|█████▍ | 3150/5773 [1:01:39<4:00:49, 5.51s/it] 55%|█████▍ | 3151/5773 [1:01:45<3:59:10, 5.47s/it] 55%|█████▍ | 3151/5773 [1:01:47<3:59:10, 5.47s/it] {'loss': 0.5675, 'learning_rate': 9.005695164985027e-06, 'epoch': 0.55} 55%|█████▍ | 3151/5773 [1:01:47<3:59:10, 5.47s/it] {'loss': 0.5675, 'learning_rate': 9.005695164985027e-06, 'epoch': 0.55} 55%|█████▍ | 3151/5773 [1:01:45<3:59:10, 5.47s/it] 55%|█████▍ | 3152/5773 [1:01:50<3:59:19, 5.48s/it] 55%|█████▍ | 3152/5773 [1:01:52<3:59:19, 5.48s/it] {'loss': 0.567, 'learning_rate': 9.000112138147018e-06, 'epoch': 0.55} 55%|█████▍ | 3152/5773 [1:01:52<3:59:19, 5.48s/it] {'loss': 0.567, 'learning_rate': 9.000112138147018e-06, 'epoch': 0.55} 55%|█████▍ | 3152/5773 [1:01:50<3:59:19, 5.48s/it] 55%|█████▍ | 3153/5773 [1:01:58<4:00:33, 5.51s/it] 55%|█████▍ | 3153/5773 [1:01:56<4:00:33, 5.51s/it] {'loss': 0.5635, 'learning_rate': 8.994529426105656e-06, 'epoch': 0.55} 55%|█████▍ | 3153/5773 [1:01:58<4:00:33, 5.51s/it] {'loss': 0.5635, 'learning_rate': 8.994529426105656e-06, 'epoch': 0.55} 55%|█████▍ | 3153/5773 [1:01:56<4:00:33, 5.51s/it] 55%|█████▍ | 3154/5773 [1:02:01<3:59:19, 5.48s/it] 55%|█████▍ | 3154/5773 [1:02:03<3:59:19, 5.48s/it] {'loss': 0.5868, 'learning_rate': 8.988947030618554e-06, 'epoch': 0.55} 55%|█████▍ | 3154/5773 [1:02:03<3:59:19, 5.48s/it] {'loss': 0.5868, 'learning_rate': 8.988947030618554e-06, 'epoch': 0.55} 55%|█████▍ | 3154/5773 [1:02:01<3:59:19, 5.48s/it] 55%|█████▍ | 3155/5773 [1:02:07<3:58:31, 5.47s/it] 55%|█████▍ | 3155/5773 [1:02:09<3:58:32, 5.47s/it] {'loss': 0.5738, 'learning_rate': 8.983364953443227e-06, 'epoch': 0.55} 55%|█████▍ | 3155/5773 [1:02:09<3:58:32, 5.47s/it] {'loss': 0.5738, 'learning_rate': 8.983364953443227e-06, 'epoch': 0.55} 55%|█████▍ | 3155/5773 [1:02:07<3:58:31, 5.47s/it] 55%|█████▍ | 
3156/5773 [1:02:12<3:58:14, 5.46s/it] 55%|█████▍ | 3156/5773 [1:02:14<3:58:15, 5.46s/it] {'loss': 0.5634, 'learning_rate': 8.977783196337102e-06, 'epoch': 0.55} 55%|█████▍ | 3156/5773 [1:02:14<3:58:15, 5.46s/it] {'loss': 0.5634, 'learning_rate': 8.977783196337102e-06, 'epoch': 0.55} 55%|█████▍ | 3156/5773 [1:02:12<3:58:14, 5.46s/it] 55%|█████▍ | 3157/5773 [1:02:18<3:59:33, 5.49s/it] 55%|█████▍ | 3157/5773 [1:02:20<3:59:33, 5.49s/it] {'loss': 0.5699, 'learning_rate': 8.972201761057482e-06, 'epoch': 0.55} 55%|█████▍ | 3157/5773 [1:02:20<3:59:33, 5.49s/it] {'loss': 0.5699, 'learning_rate': 8.972201761057482e-06, 'epoch': 0.55} 55%|█████▍ | 3157/5773 [1:02:18<3:59:33, 5.49s/it] 55%|█████▍ | 3158/5773 [1:02:23<3:58:11, 5.47s/it] 55%|█████▍ | 3158/5773 [1:02:25<3:58:11, 5.47s/it] {'loss': 0.5562, 'learning_rate': 8.966620649361584e-06, 'epoch': 0.55} 55%|█████▍ | 3158/5773 [1:02:25<3:58:11, 5.47s/it] {'loss': 0.5562, 'learning_rate': 8.966620649361584e-06, 'epoch': 0.55} 55%|█████▍ | 3158/5773 [1:02:23<3:58:11, 5.47s/it] 55%|█████▍ | 3159/5773 [1:02:28<3:58:12, 5.47s/it] 55%|█████▍ | 3159/5773 [1:02:30<3:58:12, 5.47s/it] {'loss': 0.5615, 'learning_rate': 8.961039863006523e-06, 'epoch': 0.55} 55%|█████▍ | 3159/5773 [1:02:30<3:58:12, 5.47s/it] {'loss': 0.5615, 'learning_rate': 8.961039863006523e-06, 'epoch': 0.55} 55%|█████▍ | 3159/5773 [1:02:28<3:58:12, 5.47s/it] 55%|█████▍ | 3160/5773 [1:02:34<3:58:26, 5.48s/it] 55%|█████▍ | 3160/5773 [1:02:36<3:58:26, 5.48s/it] {'loss': 0.5798, 'learning_rate': 8.955459403749305e-06, 'epoch': 0.55} 55%|█████▍ | 3160/5773 [1:02:36<3:58:26, 5.48s/it] {'loss': 0.5798, 'learning_rate': 8.955459403749305e-06, 'epoch': 0.55} 55%|█████▍ | 3160/5773 [1:02:34<3:58:26, 5.48s/it] 55%|█████▍ | 3161/5773 [1:02:42<3:59:57, 5.51s/it] 55%|█████▍ | 3161/5773 [1:02:40<3:59:57, 5.51s/it] {'loss': 0.5518, 'learning_rate': 8.949879273346844e-06, 'epoch': 0.55} 55%|█████▍ | 3161/5773 [1:02:42<3:59:57, 5.51s/it] {'loss': 0.5518, 'learning_rate': 
8.949879273346844e-06, 'epoch': 0.55} 55%|█████▍ | 3161/5773 [1:02:40<3:59:57, 5.51s/it] 55%|█████▍ | 3162/5773 [1:02:45<3:59:03, 5.49s/it] 55%|█████▍ | 3162/5773 [1:02:47<3:59:03, 5.49s/it] {'loss': 0.5709, 'learning_rate': 8.944299473555935e-06, 'epoch': 0.55} 55%|█████▍ | 3162/5773 [1:02:47<3:59:03, 5.49s/it] {'loss': 0.5709, 'learning_rate': 8.944299473555935e-06, 'epoch': 0.55} 55%|█████▍ | 3162/5773 [1:02:45<3:59:03, 5.49s/it] 55%|█████▍ | 3163/5773 [1:02:50<3:58:52, 5.49s/it] 55%|█████▍ | 3163/5773 [1:02:52<3:58:53, 5.49s/it] {'loss': 0.5667, 'learning_rate': 8.938720006133282e-06, 'epoch': 0.55} 55%|█████▍ | 3163/5773 [1:02:52<3:58:53, 5.49s/it] {'loss': 0.5667, 'learning_rate': 8.938720006133282e-06, 'epoch': 0.55} 55%|█████▍ | 3163/5773 [1:02:50<3:58:52, 5.49s/it] 55%|█████▍ | 3164/5773 [1:02:58<3:57:41, 5.47s/it] 55%|█████▍ | 3164/5773 [1:02:56<3:57:41, 5.47s/it] {'loss': 0.574, 'learning_rate': 8.933140872835482e-06, 'epoch': 0.55} 55%|█████▍ | 3164/5773 [1:02:58<3:57:41, 5.47s/it] {'loss': 0.574, 'learning_rate': 8.933140872835482e-06, 'epoch': 0.55} 55%|█████▍ | 3164/5773 [1:02:56<3:57:41, 5.47s/it] 55%|█████▍ | 3165/5773 [1:03:01<3:59:12, 5.50s/it] 55%|█████▍ | 3165/5773 [1:03:03<3:59:12, 5.50s/it] {'loss': 0.5732, 'learning_rate': 8.927562075419018e-06, 'epoch': 0.55} 55%|█████▍ | 3165/5773 [1:03:03<3:59:12, 5.50s/it] {'loss': 0.5732, 'learning_rate': 8.927562075419018e-06, 'epoch': 0.55} 55%|█████▍ | 3165/5773 [1:03:01<3:59:12, 5.50s/it] 55%|█████▍ | 3166/5773 [1:03:09<3:57:44, 5.47s/it] 55%|█████▍ | 3166/5773 [1:03:07<3:57:44, 5.47s/it] {'loss': 0.5535, 'learning_rate': 8.921983615640277e-06, 'epoch': 0.55} 55%|█████▍ | 3166/5773 [1:03:09<3:57:44, 5.47s/it] {'loss': 0.5535, 'learning_rate': 8.921983615640277e-06, 'epoch': 0.55} 55%|█████▍ | 3166/5773 [1:03:07<3:57:44, 5.47s/it] 55%|█████▍ | 3167/5773 [1:03:14<3:58:02, 5.48s/it] 55%|█████▍ | 3167/5773 [1:03:12<3:58:02, 5.48s/it] {'loss': 0.5783, 'learning_rate': 8.916405495255536e-06, 'epoch': 
0.55} 55%|█████▍ | 3167/5773 [1:03:14<3:58:02, 5.48s/it] {'loss': 0.5783, 'learning_rate': 8.916405495255536e-06, 'epoch': 0.55} 55%|█████▍ | 3167/5773 [1:03:12<3:58:02, 5.48s/it] 55%|█████▍ | 3168/5773 [1:03:20<3:58:47, 5.50s/it] 55%|█████▍ | 3168/5773 [1:03:18<3:58:47, 5.50s/it] {'loss': 0.5747, 'learning_rate': 8.910827716020965e-06, 'epoch': 0.55} 55%|█████▍ | 3168/5773 [1:03:20<3:58:47, 5.50s/it] {'loss': 0.5747, 'learning_rate': 8.910827716020965e-06, 'epoch': 0.55} 55%|█████▍ | 3168/5773 [1:03:18<3:58:47, 5.50s/it] 55%|█████▍ | 3169/5773 [1:03:23<3:59:13, 5.51s/it] 55%|█████▍ | 3169/5773 [1:03:25<3:59:13, 5.51s/it] {'loss': 0.5495, 'learning_rate': 8.905250279692631e-06, 'epoch': 0.55} 55%|█████▍ | 3169/5773 [1:03:25<3:59:13, 5.51s/it] {'loss': 0.5495, 'learning_rate': 8.905250279692631e-06, 'epoch': 0.55} 55%|█████▍ | 3169/5773 [1:03:23<3:59:13, 5.51s/it] 55%|█████▍ | 3170/5773 [1:03:31<3:57:25, 5.47s/it] 55%|█████▍ | 3170/5773 [1:03:29<3:57:27, 5.47s/it] {'loss': 0.5674, 'learning_rate': 8.899673188026488e-06, 'epoch': 0.55} 55%|█████▍ | 3170/5773 [1:03:31<3:57:25, 5.47s/it] {'loss': 0.5674, 'learning_rate': 8.899673188026488e-06, 'epoch': 0.55} 55%|█████▍ | 3170/5773 [1:03:29<3:57:27, 5.47s/it] 55%|█████▍ | 3171/5773 [1:03:36<3:56:04, 5.44s/it] 55%|█████▍ | 3171/5773 [1:03:34<3:56:03, 5.44s/it] {'loss': 0.5749, 'learning_rate': 8.894096442778375e-06, 'epoch': 0.55} 55%|█████▍ | 3171/5773 [1:03:36<3:56:04, 5.44s/it] {'loss': 0.5749, 'learning_rate': 8.894096442778375e-06, 'epoch': 0.55} 55%|█████▍ | 3171/5773 [1:03:34<3:56:03, 5.44s/it] 55%|█████▍ | 3172/5773 [1:03:42<3:57:23, 5.48s/it] 55%|█████▍ | 3172/5773 [1:03:40<3:57:22, 5.48s/it] {'loss': 0.5721, 'learning_rate': 8.888520045704039e-06, 'epoch': 0.55} 55%|█████▍ | 3172/5773 [1:03:42<3:57:23, 5.48s/it] {'loss': 0.5721, 'learning_rate': 8.888520045704039e-06, 'epoch': 0.55} 55%|█████▍ | 3172/5773 [1:03:40<3:57:22, 5.48s/it] 55%|█████▍ | 3173/5773 [1:03:47<3:55:34, 5.44s/it] 55%|█████▍ | 3173/5773 
[1:03:45<3:55:34, 5.44s/it] {'loss': 0.5643, 'learning_rate': 8.8829439985591e-06, 'epoch': 0.55} 55%|█████▍ | 3173/5773 [1:03:47<3:55:34, 5.44s/it] {'loss': 0.5643, 'learning_rate': 8.8829439985591e-06, 'epoch': 0.55} 55%|█████▍ | 3173/5773 [1:03:45<3:55:34, 5.44s/it] 55%|█████▍ | 3174/5773 [1:03:51<3:55:37, 5.44s/it] 55%|█████▍ | 3174/5773 [1:03:53<3:55:37, 5.44s/it] {'loss': 0.5734, 'learning_rate': 8.877368303099083e-06, 'epoch': 0.55} 55%|█████▍ | 3174/5773 [1:03:53<3:55:37, 5.44s/it] {'loss': 0.5734, 'learning_rate': 8.877368303099083e-06, 'epoch': 0.55} 55%|█████▍ | 3174/5773 [1:03:51<3:55:37, 5.44s/it] 55%|█████▍ | 3175/5773 [1:03:58<3:56:32, 5.46s/it] 55%|█████▍ | 3175/5773 [1:03:56<3:56:32, 5.46s/it] {'loss': 0.5537, 'learning_rate': 8.871792961079391e-06, 'epoch': 0.55} 55%|█████▍ | 3175/5773 [1:03:58<3:56:32, 5.46s/it] {'loss': 0.5537, 'learning_rate': 8.871792961079391e-06, 'epoch': 0.55} 55%|█████▍ | 3175/5773 [1:03:56<3:56:32, 5.46s/it] 55%|█████▌ | 3176/5773 [1:04:01<3:54:56, 5.43s/it] 55%|█████▌ | 3176/5773 [1:04:03<3:54:56, 5.43s/it] {'loss': 0.5736, 'learning_rate': 8.86621797425532e-06, 'epoch': 0.55} 55%|█████▌ | 3176/5773 [1:04:03<3:54:56, 5.43s/it] {'loss': 0.5736, 'learning_rate': 8.86621797425532e-06, 'epoch': 0.55} 55%|█████▌ | 3176/5773 [1:04:01<3:54:56, 5.43s/it] 55%|█████▌ | 3177/5773 [1:04:07<3:55:28, 5.44s/it] 55%|█████▌ | 3177/5773 [1:04:09<3:55:28, 5.44s/it] {'loss': 0.5716, 'learning_rate': 8.860643344382057e-06, 'epoch': 0.55} 55%|█████▌ | 3177/5773 [1:04:09<3:55:28, 5.44s/it] {'loss': 0.5716, 'learning_rate': 8.860643344382057e-06, 'epoch': 0.55} 55%|█████▌ | 3177/5773 [1:04:07<3:55:28, 5.44s/it] 55%|█████▌ | 3178/5773 [1:04:14<3:53:23, 5.40s/it] 55%|█████▌ | 3178/5773 [1:04:12<3:53:23, 5.40s/it] {'loss': 0.5486, 'learning_rate': 8.855069073214668e-06, 'epoch': 0.55} 55%|█████▌ | 3178/5773 [1:04:14<3:53:23, 5.40s/it] {'loss': 0.5486, 'learning_rate': 8.855069073214668e-06, 'epoch': 0.55} 55%|█████▌ | 3178/5773 [1:04:12<3:53:23, 
5.40s/it] 55%|█████▌ | 3179/5773 [1:04:20<3:54:23, 5.42s/it] 55%|█████▌ | 3179/5773 [1:04:18<3:54:23, 5.42s/it] {'loss': 0.5792, 'learning_rate': 8.84949516250812e-06, 'epoch': 0.55} 55%|█████▌ | 3179/5773 [1:04:20<3:54:23, 5.42s/it] {'loss': 0.5792, 'learning_rate': 8.84949516250812e-06, 'epoch': 0.55} 55%|█████▌ | 3179/5773 [1:04:18<3:54:23, 5.42s/it] 55%|█████▌ | 3180/5773 [1:04:25<3:56:27, 5.47s/it] 55%|█████▌ | 3180/5773 [1:04:23<3:56:28, 5.47s/it] {'loss': 0.5802, 'learning_rate': 8.843921614017247e-06, 'epoch': 0.55} 55%|█████▌ | 3180/5773 [1:04:25<3:56:27, 5.47s/it] {'loss': 0.5802, 'learning_rate': 8.843921614017247e-06, 'epoch': 0.55} 55%|█████▌ | 3180/5773 [1:04:23<3:56:28, 5.47s/it] 55%|█████▌ | 3181/5773 [1:04:31<3:55:31, 5.45s/it] 55%|█████▌ | 3181/5773 [1:04:29<3:55:31, 5.45s/it] {'loss': 0.5818, 'learning_rate': 8.83834842949679e-06, 'epoch': 0.55} 55%|█████▌ | 3181/5773 [1:04:31<3:55:31, 5.45s/it] {'loss': 0.5818, 'learning_rate': 8.83834842949679e-06, 'epoch': 0.55} 55%|█████▌ | 3181/5773 [1:04:29<3:55:31, 5.45s/it] 55%|█████▌ | 3182/5773 [1:04:36<3:56:22, 5.47s/it] 55%|█████▌ | 3182/5773 [1:04:34<3:56:22, 5.47s/it] {'loss': 0.5476, 'learning_rate': 8.832775610701363e-06, 'epoch': 0.55} 55%|█████▌ | 3182/5773 [1:04:36<3:56:22, 5.47s/it] {'loss': 0.5476, 'learning_rate': 8.832775610701363e-06, 'epoch': 0.55} 55%|█████▌ | 3182/5773 [1:04:34<3:56:22, 5.47s/it] 55%|█████▌ | 3183/5773 [1:04:42<3:58:47, 5.53s/it] 55%|█████▌ | 3183/5773 [1:04:40<3:58:47, 5.53s/it] {'loss': 0.5625, 'learning_rate': 8.827203159385464e-06, 'epoch': 0.55} 55%|█████▌ | 3183/5773 [1:04:42<3:58:47, 5.53s/it] {'loss': 0.5625, 'learning_rate': 8.827203159385464e-06, 'epoch': 0.55} 55%|█████▌ | 3183/5773 [1:04:40<3:58:47, 5.53s/it] 55%|█████▌ | 3184/5773 [1:04:45<3:59:11, 5.54s/it] 55%|█████▌ | 3184/5773 [1:04:47<3:59:12, 5.54s/it] {'loss': 0.5743, 'learning_rate': 8.821631077303485e-06, 'epoch': 0.55} 55%|█████▌ | 3184/5773 [1:04:47<3:59:12, 5.54s/it] {'loss': 0.5743, 
'learning_rate': 8.821631077303485e-06, 'epoch': 0.55} 55%|█████▌ | 3184/5773 [1:04:45<3:59:11, 5.54s/it] 55%|█████▌ | 3185/5773 [1:04:53<3:58:57, 5.54s/it] 55%|█████▌ | 3185/5773 [1:04:51<3:58:57, 5.54s/it] {'loss': 0.5813, 'learning_rate': 8.81605936620969e-06, 'epoch': 0.55} 55%|█████▌ | 3185/5773 [1:04:53<3:58:57, 5.54s/it] {'loss': 0.5813, 'learning_rate': 8.81605936620969e-06, 'epoch': 0.55} 55%|█████▌ | 3185/5773 [1:04:51<3:58:57, 5.54s/it] 55%|█████▌ | 3186/5773 [1:04:56<3:57:47, 5.51s/it] 55%|█████▌ | 3186/5773 [1:04:58<3:57:47, 5.51s/it] {'loss': 0.5606, 'learning_rate': 8.810488027858231e-06, 'epoch': 0.55} 55%|█████▌ | 3186/5773 [1:04:58<3:57:47, 5.51s/it] {'loss': 0.5606, 'learning_rate': 8.810488027858231e-06, 'epoch': 0.55} 55%|█████▌ | 3186/5773 [1:04:56<3:57:47, 5.51s/it] 55%|█████▌ | 3187/5773 [1:05:02<3:55:56, 5.47s/it] 55%|█████▌ | 3187/5773 [1:05:04<3:55:56, 5.47s/it] {'loss': 0.5461, 'learning_rate': 8.804917064003145e-06, 'epoch': 0.55} 55%|█████▌ | 3187/5773 [1:05:04<3:55:56, 5.47s/it] {'loss': 0.5461, 'learning_rate': 8.804917064003145e-06, 'epoch': 0.55} 55%|█████▌ | 3187/5773 [1:05:02<3:55:56, 5.47s/it] 55%|█████▌ | 3188/5773 [1:05:07<3:58:31, 5.54s/it] 55%|█████▌ | 3188/5773 [1:05:09<3:58:31, 5.54s/it] {'loss': 0.5651, 'learning_rate': 8.799346476398351e-06, 'epoch': 0.55} 55%|█████▌ | 3188/5773 [1:05:09<3:58:31, 5.54s/it] {'loss': 0.5651, 'learning_rate': 8.799346476398351e-06, 'epoch': 0.55} 55%|█████▌ | 3188/5773 [1:05:07<3:58:31, 5.54s/it] 55%|█████▌ | 3189/5773 [1:05:13<3:57:05, 5.51s/it] 55%|█████▌ | 3189/5773 [1:05:15<3:57:06, 5.51s/it] {'loss': 0.5812, 'learning_rate': 8.793776266797646e-06, 'epoch': 0.55} 55%|█████▌ | 3189/5773 [1:05:15<3:57:06, 5.51s/it] {'loss': 0.5812, 'learning_rate': 8.793776266797646e-06, 'epoch': 0.55} 55%|█████▌ | 3189/5773 [1:05:13<3:57:05, 5.51s/it] 55%|█████▌ | 3190/5773 [1:05:20<3:57:33, 5.52s/it] 55%|█████▌ | 3190/5773 [1:05:18<3:57:33, 5.52s/it] {'loss': 0.5802, 'learning_rate': 
8.788206436954712e-06, 'epoch': 0.55} 55%|█████▌ | 3190/5773 [1:05:20<3:57:33, 5.52s/it] {'loss': 0.5802, 'learning_rate': 8.788206436954712e-06, 'epoch': 0.55} 55%|█████▌ | 3190/5773 [1:05:18<3:57:33, 5.52s/it] 55%|█████▌ | 3191/5773 [1:05:26<3:57:02, 5.51s/it] 55%|█████▌ | 3191/5773 [1:05:24<3:57:02, 5.51s/it] {'loss': 0.5686, 'learning_rate': 8.782636988623108e-06, 'epoch': 0.55} 55%|█████▌ | 3191/5773 [1:05:26<3:57:02, 5.51s/it] {'loss': 0.5686, 'learning_rate': 8.782636988623108e-06, 'epoch': 0.55} 55%|█████▌ | 3191/5773 [1:05:24<3:57:02, 5.51s/it] 55%|█████▌ | 3192/5773 [1:05:30<3:58:26, 5.54s/it] 55%|█████▌ | 3192/5773 [1:05:32<3:58:26, 5.54s/it] {'loss': 0.5662, 'learning_rate': 8.777067923556274e-06, 'epoch': 0.55} 55%|█████▌ | 3192/5773 [1:05:32<3:58:26, 5.54s/it] {'loss': 0.5662, 'learning_rate': 8.777067923556274e-06, 'epoch': 0.55} 55%|█████▌ | 3192/5773 [1:05:30<3:58:26, 5.54s/it] 55%|█████▌ | 3193/5773 [1:05:37<3:57:26, 5.52s/it] 55%|█████▌ | 3193/5773 [1:05:35<3:57:26, 5.52s/it] {'loss': 0.5573, 'learning_rate': 8.77149924350753e-06, 'epoch': 0.55} 55%|█████▌ | 3193/5773 [1:05:37<3:57:26, 5.52s/it] {'loss': 0.5573, 'learning_rate': 8.77149924350753e-06, 'epoch': 0.55} 55%|█████▌ | 3193/5773 [1:05:35<3:57:26, 5.52s/it] 55%|█████▌ | 3194/5773 [1:05:43<3:58:28, 5.55s/it] 55%|█████▌ | 3194/5773 [1:05:41<3:58:29, 5.55s/it] {'loss': 0.563, 'learning_rate': 8.765930950230075e-06, 'epoch': 0.55} 55%|█████▌ | 3194/5773 [1:05:43<3:58:28, 5.55s/it] {'loss': 0.563, 'learning_rate': 8.765930950230075e-06, 'epoch': 0.55} 55%|█████▌ | 3194/5773 [1:05:41<3:58:29, 5.55s/it] 55%|█████▌ | 3195/5773 [1:05:48<4:00:55, 5.61s/it] 55%|█████▌ | 3195/5773 [1:05:46<4:00:56, 5.61s/it] {'loss': 0.5706, 'learning_rate': 8.760363045476986e-06, 'epoch': 0.55} 55%|█████▌ | 3195/5773 [1:05:48<4:00:55, 5.61s/it] {'loss': 0.5706, 'learning_rate': 8.760363045476986e-06, 'epoch': 0.55} 55%|█████▌ | 3195/5773 [1:05:46<4:00:56, 5.61s/it] 55%|█████▌ | 3196/5773 [1:05:54<3:58:30, 5.55s/it] 
55%|█████▌ | 3196/5773 [1:05:52<3:58:30, 5.55s/it] {'loss': 0.5689, 'learning_rate': 8.754795531001215e-06, 'epoch': 0.55} 55%|█████▌ | 3196/5773 [1:05:54<3:58:30, 5.55s/it] {'loss': 0.5689, 'learning_rate': 8.754795531001215e-06, 'epoch': 0.55} 55%|█████▌ | 3196/5773 [1:05:52<3:58:30, 5.55s/it] 55%|█████▌ | 3197/5773 [1:05:59<3:57:10, 5.52s/it] 55%|█████▌ | 3197/5773 [1:05:57<3:57:10, 5.52s/it] {'loss': 0.5532, 'learning_rate': 8.749228408555596e-06, 'epoch': 0.55} 55%|█████▌ | 3197/5773 [1:05:59<3:57:10, 5.52s/it] {'loss': 0.5532, 'learning_rate': 8.749228408555596e-06, 'epoch': 0.55} 55%|█████▌ | 3197/5773 [1:05:57<3:57:10, 5.52s/it] 55%|█████▌ | 3198/5773 [1:06:05<3:59:17, 5.58s/it] 55%|█████▌ | 3198/5773 [1:06:03<3:59:17, 5.58s/it] {'loss': 0.5731, 'learning_rate': 8.743661679892832e-06, 'epoch': 0.55} 55%|█████▌ | 3198/5773 [1:06:05<3:59:17, 5.58s/it] {'loss': 0.5731, 'learning_rate': 8.743661679892832e-06, 'epoch': 0.55} 55%|█████▌ | 3198/5773 [1:06:03<3:59:17, 5.58s/it] 55%|█████▌ | 3199/5773 [1:06:10<3:56:02, 5.50s/it] 55%|█████▌ | 3199/5773 [1:06:08<3:56:02, 5.50s/it] {'loss': 0.5643, 'learning_rate': 8.738095346765519e-06, 'epoch': 0.55} 55%|█████▌ | 3199/5773 [1:06:10<3:56:02, 5.50s/it] {'loss': 0.5643, 'learning_rate': 8.738095346765519e-06, 'epoch': 0.55} 55%|█████▌ | 3199/5773 [1:06:08<3:56:02, 5.50s/it] 10 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 55%|█████▌ | 3200/5773 [1:06:16<3:57:28, 5.54s/it] 13 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 55%|█████▌ | 3200/5773 [1:06:14<3:57:28, 5.54s/it] {'loss': 0.5995, 'learning_rate': 8.732529410926102e-06, 'epoch': 0.55} 55%|█████▌ | 3200/5773 [1:06:16<3:57:28, 5.54s/it] {'loss': 0.5995, 'learning_rate': 8.732529410926102e-06, 'epoch': 0.55} 55%|█████▌ | 3200/5773 [1:06:14<3:57:28, 5.54s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 55%|█████▌ | 3201/5773 [1:06:37<7:16:20, 10.18s/it] 55%|█████▌ | 3201/5773 [1:06:35<7:16:20, 10.18s/it] {'loss': 0.554, 'learning_rate': 8.726963874126924e-06, 'epoch': 0.55} 55%|█████▌ | 3201/5773 [1:06:37<7:16:20, 10.18s/it] {'loss': 0.554, 'learning_rate': 8.726963874126924e-06, 'epoch': 0.55} 55%|█████▌ | 3201/5773 [1:06:35<7:16:20, 10.18s/it] 55%|█████▌ | 3202/5773 [1:06:42<6:14:32, 8.74s/it] 55%|█████▌ | 3202/5773 [1:06:40<6:14:32, 8.74s/it] {'loss': 0.5687, 'learning_rate': 8.721398738120191e-06, 'epoch': 0.55} 55%|█████▌ | 3202/5773 [1:06:42<6:14:32, 8.74s/it] {'loss': 0.5687, 'learning_rate': 8.721398738120191e-06, 'epoch': 0.55} 55%|█████▌ | 3202/5773 [1:06:40<6:14:32, 8.74s/it] 55%|█████▌ | 3203/5773 [1:06:48<5:33:39, 7.79s/it] 55%|█████▌ | 3203/5773 [1:06:46<5:33:38, 7.79s/it] {'loss': 0.5721, 'learning_rate': 8.715834004657987e-06, 'epoch': 0.55} 55%|█████▌ | 3203/5773 [1:06:48<5:33:39, 7.79s/it] {'loss': 0.5721, 'learning_rate': 8.715834004657987e-06, 'epoch': 0.55} 55%|█████▌ | 3203/5773 [1:06:46<5:33:38, 7.79s/it] 55%|█████▌ | 3204/5773 [1:06:53<5:03:26, 7.09s/it] 55%|█████▌ | 3204/5773 [1:06:51<5:03:26, 7.09s/it] {'loss': 0.5581, 'learning_rate': 8.710269675492264e-06, 'epoch': 0.55} 55%|█████▌ | 3204/5773 [1:06:53<5:03:26, 7.09s/it] {'loss': 0.5581, 'learning_rate': 8.710269675492264e-06, 'epoch': 0.55} 55%|█████▌ | 3204/5773 [1:06:51<5:03:26, 7.09s/it] 56%|█████▌ | 3205/5773 [1:06:57<4:45:13, 6.66s/it] 56%|█████▌ | 3205/5773 [1:06:59<4:45:14, 6.66s/it] {'loss': 0.5885, 'learning_rate': 8.704705752374857e-06, 'epoch': 0.56} 56%|█████▌ | 3205/5773 [1:06:59<4:45:14, 6.66s/it] {'loss': 0.5885, 'learning_rate': 8.704705752374857e-06, 'epoch': 0.56} 56%|█████▌ | 3205/5773 [1:06:57<4:45:13, 6.66s/it] 56%|█████▌ | 3206/5773 [1:07:03<4:30:01, 6.31s/it] 56%|█████▌ | 3206/5773 [1:07:04<4:30:01, 6.31s/it] {'loss': 0.5603, 'learning_rate': 8.699142237057464e-06, 'epoch': 0.56} 56%|█████▌ | 3206/5773 [1:07:04<4:30:01, 6.31s/it] {'loss': 0.5603, 
'learning_rate': 8.699142237057464e-06, 'epoch': 0.56} 56%|█████▌ | 3206/5773 [1:07:03<4:30:01, 6.31s/it] 56%|█████▌ | 3207/5773 [1:07:08<4:18:38, 6.05s/it] 56%|█████▌ | 3207/5773 [1:07:10<4:18:38, 6.05s/it] {'loss': 0.5826, 'learning_rate': 8.693579131291654e-06, 'epoch': 0.56} 56%|█████▌ | 3207/5773 [1:07:10<4:18:38, 6.05s/it] {'loss': 0.5826, 'learning_rate': 8.693579131291654e-06, 'epoch': 0.56} 56%|█████▌ | 3207/5773 [1:07:08<4:18:38, 6.05s/it] 56%|█████▌ | 3208/5773 [1:07:16<4:13:19, 5.93s/it] 56%|█████▌ | 3208/5773 [1:07:14<4:13:19, 5.93s/it] {'loss': 0.5702, 'learning_rate': 8.688016436828876e-06, 'epoch': 0.56} 56%|█████▌ | 3208/5773 [1:07:16<4:13:19, 5.93s/it] {'loss': 0.5702, 'learning_rate': 8.688016436828876e-06, 'epoch': 0.56} 56%|█████▌ | 3208/5773 [1:07:14<4:13:19, 5.93s/it] 56%|█████▌ | 3209/5773 [1:07:21<4:07:18, 5.79s/it] 56%|█████▌ | 3209/5773 [1:07:19<4:07:18, 5.79s/it] {'loss': 0.5653, 'learning_rate': 8.682454155420438e-06, 'epoch': 0.56} 56%|█████▌ | 3209/5773 [1:07:21<4:07:18, 5.79s/it] {'loss': 0.5653, 'learning_rate': 8.682454155420438e-06, 'epoch': 0.56} 56%|█████▌ | 3209/5773 [1:07:19<4:07:18, 5.79s/it] 56%|█████▌ | 3210/5773 [1:07:26<4:01:33, 5.65s/it] 56%|█████▌ | 3210/5773 [1:07:24<4:01:33, 5.65s/it] {'loss': 0.5583, 'learning_rate': 8.676892288817531e-06, 'epoch': 0.56} 56%|█████▌ | 3210/5773 [1:07:26<4:01:33, 5.65s/it] {'loss': 0.5583, 'learning_rate': 8.676892288817531e-06, 'epoch': 0.56} 56%|█████▌ | 3210/5773 [1:07:24<4:01:33, 5.65s/it] 56%|█████▌ | 3211/5773 [1:07:30<4:00:44, 5.64s/it] 56%|█████▌ | 3211/5773 [1:07:32<4:00:45, 5.64s/it] {'loss': 0.5767, 'learning_rate': 8.671330838771201e-06, 'epoch': 0.56} 56%|█████▌ | 3211/5773 [1:07:32<4:00:45, 5.64s/it] {'loss': 0.5767, 'learning_rate': 8.671330838771201e-06, 'epoch': 0.56} 56%|█████▌ | 3211/5773 [1:07:30<4:00:44, 5.64s/it] 56%|█████▌ | 3212/5773 [1:07:37<3:57:24, 5.56s/it] 56%|█████▌ | 3212/5773 [1:07:35<3:57:24, 5.56s/it] {'loss': 0.5501, 'learning_rate': 
8.665769807032375e-06, 'epoch': 0.56} 56%|█████▌ | 3212/5773 [1:07:37<3:57:24, 5.56s/it] {'loss': 0.5501, 'learning_rate': 8.665769807032375e-06, 'epoch': 0.56} 56%|█████▌ | 3212/5773 [1:07:35<3:57:24, 5.56s/it] 56%|█████▌ | 3213/5773 [1:07:43<3:56:39, 5.55s/it] 56%|█████▌ | 3213/5773 [1:07:41<3:56:39, 5.55s/it] {'loss': 0.5694, 'learning_rate': 8.660209195351846e-06, 'epoch': 0.56} 56%|█████▌ | 3213/5773 [1:07:43<3:56:39, 5.55s/it] {'loss': 0.5694, 'learning_rate': 8.660209195351846e-06, 'epoch': 0.56} 56%|█████▌ | 3213/5773 [1:07:41<3:56:39, 5.55s/it] 56%|█████▌ | 3214/5773 [1:07:48<3:54:42, 5.50s/it] 56%|█████▌ | 3214/5773 [1:07:46<3:54:42, 5.50s/it] {'loss': 0.5667, 'learning_rate': 8.654649005480265e-06, 'epoch': 0.56} 56%|█████▌ | 3214/5773 [1:07:48<3:54:42, 5.50s/it] {'loss': 0.5667, 'learning_rate': 8.654649005480265e-06, 'epoch': 0.56} 56%|█████▌ | 3214/5773 [1:07:46<3:54:42, 5.50s/it] 56%|█████▌ | 3215/5773 [1:07:52<3:55:28, 5.52s/it] 56%|█████▌ | 3215/5773 [1:07:54<3:55:28, 5.52s/it] {'loss': 0.5618, 'learning_rate': 8.64908923916816e-06, 'epoch': 0.56} 56%|█████▌ | 3215/5773 [1:07:54<3:55:28, 5.52s/it] {'loss': 0.5618, 'learning_rate': 8.64908923916816e-06, 'epoch': 0.56} 56%|█████▌ | 3215/5773 [1:07:52<3:55:28, 5.52s/it] 56%|█████▌ | 3216/5773 [1:07:59<3:56:16, 5.54s/it] 56%|█████▌ | 3216/5773 [1:07:57<3:56:16, 5.54s/it] {'loss': 0.5486, 'learning_rate': 8.643529898165926e-06, 'epoch': 0.56} 56%|█████▌ | 3216/5773 [1:07:59<3:56:16, 5.54s/it] {'loss': 0.5486, 'learning_rate': 8.643529898165926e-06, 'epoch': 0.56} 56%|█████▌ | 3216/5773 [1:07:57<3:56:16, 5.54s/it] 56%|█████▌ | 3217/5773 [1:08:05<3:57:29, 5.58s/it] 56%|█████▌ | 3217/5773 [1:08:03<3:57:29, 5.58s/it] {'loss': 0.5897, 'learning_rate': 8.637970984223817e-06, 'epoch': 0.56} 56%|█████▌ | 3217/5773 [1:08:05<3:57:29, 5.58s/it] {'loss': 0.5897, 'learning_rate': 8.637970984223817e-06, 'epoch': 0.56} 56%|█████▌ | 3217/5773 [1:08:03<3:57:29, 5.58s/it] 56%|█████▌ | 3218/5773 [1:08:11<3:58:08, 
5.59s/it] 56%|█████▌ | 3218/5773 [1:08:09<3:58:08, 5.59s/it] {'loss': 0.5697, 'learning_rate': 8.632412499091958e-06, 'epoch': 0.56} 56%|█████▌ | 3218/5773 [1:08:11<3:58:08, 5.59s/it] {'loss': 0.5697, 'learning_rate': 8.632412499091958e-06, 'epoch': 0.56} 56%|█████▌ | 3218/5773 [1:08:09<3:58:08, 5.59s/it] 56%|█████▌ | 3219/5773 [1:08:14<3:57:10, 5.57s/it] 56%|█████▌ | 3219/5773 [1:08:16<3:57:10, 5.57s/it] {'loss': 0.5536, 'learning_rate': 8.62685444452034e-06, 'epoch': 0.56} 56%|█████▌ | 3219/5773 [1:08:16<3:57:10, 5.57s/it] {'loss': 0.5536, 'learning_rate': 8.62685444452034e-06, 'epoch': 0.56} 56%|█████▌ | 3219/5773 [1:08:14<3:57:10, 5.57s/it] 56%|█████▌ | 3220/5773 [1:08:20<3:56:53, 5.57s/it] 56%|█████▌ | 3220/5773 [1:08:22<3:56:53, 5.57s/it] {'loss': 0.5705, 'learning_rate': 8.621296822258813e-06, 'epoch': 0.56} 56%|█████▌ | 3220/5773 [1:08:22<3:56:53, 5.57s/it] {'loss': 0.5705, 'learning_rate': 8.621296822258813e-06, 'epoch': 0.56} 56%|█████▌ | 3220/5773 [1:08:20<3:56:53, 5.57s/it] 56%|█████▌ | 3221/5773 [1:08:25<3:55:42, 5.54s/it] 56%|█████▌ | 3221/5773 [1:08:27<3:55:42, 5.54s/it] {'loss': 0.5682, 'learning_rate': 8.615739634057098e-06, 'epoch': 0.56} 56%|█████▌ | 3221/5773 [1:08:27<3:55:42, 5.54s/it] {'loss': 0.5682, 'learning_rate': 8.615739634057098e-06, 'epoch': 0.56} 56%|█████▌ | 3221/5773 [1:08:25<3:55:42, 5.54s/it] 56%|█████▌ | 3222/5773 [1:08:31<3:56:00, 5.55s/it] 56%|█████▌ | 3222/5773 [1:08:33<3:56:01, 5.55s/it] {'loss': 0.564, 'learning_rate': 8.61018288166477e-06, 'epoch': 0.56} 56%|█████▌ | 3222/5773 [1:08:33<3:56:01, 5.55s/it] {'loss': 0.564, 'learning_rate': 8.61018288166477e-06, 'epoch': 0.56} 56%|█████▌ | 3222/5773 [1:08:31<3:56:00, 5.55s/it] 56%|█████▌ | 3223/5773 [1:08:36<3:53:48, 5.50s/it] 56%|█████▌ | 3223/5773 [1:08:38<3:53:48, 5.50s/it] {'loss': 0.5705, 'learning_rate': 8.604626566831279e-06, 'epoch': 0.56} 56%|█████▌ | 3223/5773 [1:08:38<3:53:48, 5.50s/it] {'loss': 0.5705, 'learning_rate': 8.604626566831279e-06, 'epoch': 0.56} 
56%|█████▌ | 3223/5773 [1:08:36<3:53:48, 5.50s/it] 56%|█████▌ | 3224/5773 [1:08:44<3:53:18, 5.49s/it] 56%|█████▌ | 3224/5773 [1:08:42<3:53:19, 5.49s/it] {'loss': 0.5694, 'learning_rate': 8.599070691305925e-06, 'epoch': 0.56} 56%|█████▌ | 3224/5773 [1:08:44<3:53:18, 5.49s/it] {'loss': 0.5694, 'learning_rate': 8.599070691305925e-06, 'epoch': 0.56} 56%|█████▌ | 3224/5773 [1:08:42<3:53:19, 5.49s/it] 56%|█████▌ | 3225/5773 [1:08:50<3:58:00, 5.60s/it] 56%|█████▌ | 3225/5773 [1:08:48<3:58:01, 5.60s/it] {'loss': 0.57, 'learning_rate': 8.593515256837875e-06, 'epoch': 0.56} 56%|█████▌ | 3225/5773 [1:08:50<3:58:00, 5.60s/it] {'loss': 0.57, 'learning_rate': 8.593515256837875e-06, 'epoch': 0.56} 56%|█████▌ | 3225/5773 [1:08:48<3:58:01, 5.60s/it] 56%|█████▌ | 3226/5773 [1:08:53<3:57:24, 5.59s/it] 56%|█████▌ | 3226/5773 [1:08:55<3:57:25, 5.59s/it] {'loss': 0.5679, 'learning_rate': 8.58796026517616e-06, 'epoch': 0.56} 56%|█████▌ | 3226/5773 [1:08:55<3:57:25, 5.59s/it] {'loss': 0.5679, 'learning_rate': 8.58796026517616e-06, 'epoch': 0.56} 56%|█████▌ | 3226/5773 [1:08:53<3:57:24, 5.59s/it] 56%|█████▌ | 3227/5773 [1:08:59<3:54:33, 5.53s/it] 56%|█████▌ | 3227/5773 [1:09:01<3:54:33, 5.53s/it] {'loss': 0.5567, 'learning_rate': 8.582405718069672e-06, 'epoch': 0.56}{'loss': 0.5567, 'learning_rate': 8.582405718069672e-06, 'epoch': 0.56} 56%|█████▌ | 3227/5773 [1:09:01<3:54:33, 5.53s/it] 56%|█████▌ | 3227/5773 [1:08:59<3:54:33, 5.53s/it] 56%|█████▌ | 3228/5773 [1:09:04<3:53:18, 5.50s/it] 56%|█████▌ | 3228/5773 [1:09:06<3:53:18, 5.50s/it] {'loss': 0.5709, 'learning_rate': 8.576851617267151e-06, 'epoch': 0.56} 56%|█████▌ | 3228/5773 [1:09:06<3:53:18, 5.50s/it] {'loss': 0.5709, 'learning_rate': 8.576851617267151e-06, 'epoch': 0.56} 56%|█████▌ | 3228/5773 [1:09:04<3:53:18, 5.50s/it] 56%|█████▌ | 3229/5773 [1:09:11<3:53:24, 5.51s/it] 56%|█████▌ | 3229/5773 [1:09:09<3:53:25, 5.51s/it] {'loss': 0.5763, 'learning_rate': 8.571297964517212e-06, 'epoch': 0.56} 56%|█████▌ | 3229/5773 [1:09:11<3:53:24, 
5.51s/it] {'loss': 0.5763, 'learning_rate': 8.571297964517212e-06, 'epoch': 0.56} 56%|█████▌ | 3229/5773 [1:09:09<3:53:25, 5.51s/it] 56%|█████▌ | 3230/5773 [1:09:15<3:51:55, 5.47s/it] 56%|█████▌ | 3230/5773 [1:09:17<3:51:56, 5.47s/it] {'loss': 0.566, 'learning_rate': 8.56574476156832e-06, 'epoch': 0.56} 56%|█████▌ | 3230/5773 [1:09:17<3:51:56, 5.47s/it] {'loss': 0.566, 'learning_rate': 8.56574476156832e-06, 'epoch': 0.56} 56%|█████▌ | 3230/5773 [1:09:15<3:51:55, 5.47s/it] 56%|█████▌ | 3231/5773 [1:09:22<3:52:23, 5.49s/it] 56%|█████▌ | 3231/5773 [1:09:20<3:52:23, 5.49s/it] {'loss': 0.5616, 'learning_rate': 8.560192010168798e-06, 'epoch': 0.56} 56%|█████▌ | 3231/5773 [1:09:22<3:52:23, 5.49s/it] {'loss': 0.5616, 'learning_rate': 8.560192010168798e-06, 'epoch': 0.56} 56%|█████▌ | 3231/5773 [1:09:20<3:52:23, 5.49s/it] 56%|█████▌ | 3232/5773 [1:09:28<3:52:11, 5.48s/it] 56%|█████▌ | 3232/5773 [1:09:26<3:52:11, 5.48s/it] {'loss': 0.5729, 'learning_rate': 8.554639712066837e-06, 'epoch': 0.56} 56%|█████▌ | 3232/5773 [1:09:28<3:52:11, 5.48s/it] {'loss': 0.5729, 'learning_rate': 8.554639712066837e-06, 'epoch': 0.56} 56%|█████▌ | 3232/5773 [1:09:26<3:52:11, 5.48s/it] 56%|█████▌ | 3233/5773 [1:09:31<3:51:05, 5.46s/it] 56%|█████▌ | 3233/5773 [1:09:33<3:51:05, 5.46s/it] {'loss': 0.5795, 'learning_rate': 8.549087869010471e-06, 'epoch': 0.56} 56%|█████▌ | 3233/5773 [1:09:33<3:51:05, 5.46s/it] {'loss': 0.5795, 'learning_rate': 8.549087869010471e-06, 'epoch': 0.56} 56%|█████▌ | 3233/5773 [1:09:31<3:51:05, 5.46s/it] 56%|█████▌ | 3234/5773 [1:09:37<3:51:11, 5.46s/it] 56%|█████▌ | 3234/5773 [1:09:39<3:51:11, 5.46s/it] {'loss': 0.5666, 'learning_rate': 8.543536482747603e-06, 'epoch': 0.56} 56%|█████▌ | 3234/5773 [1:09:39<3:51:11, 5.46s/it] {'loss': 0.5666, 'learning_rate': 8.543536482747603e-06, 'epoch': 0.56} 56%|█████▌ | 3234/5773 [1:09:37<3:51:11, 5.46s/it] 56%|█████▌ | 3235/5773 [1:09:44<3:53:22, 5.52s/it] 56%|█████▌ | 3235/5773 [1:09:42<3:53:22, 5.52s/it] {'loss': 0.5754, 
'learning_rate': 8.537985555025982e-06, 'epoch': 0.56} 56%|█████▌ | 3235/5773 [1:09:44<3:53:22, 5.52s/it] {'loss': 0.5754, 'learning_rate': 8.537985555025982e-06, 'epoch': 0.56} 56%|█████▌ | 3235/5773 [1:09:42<3:53:22, 5.52s/it] 56%|█████▌ | 3236/5773 [1:09:48<3:51:16, 5.47s/it] 56%|█████▌ | 3236/5773 [1:09:50<3:51:16, 5.47s/it] {'loss': 0.5524, 'learning_rate': 8.532435087593221e-06, 'epoch': 0.56} 56%|█████▌ | 3236/5773 [1:09:50<3:51:16, 5.47s/it] {'loss': 0.5524, 'learning_rate': 8.532435087593221e-06, 'epoch': 0.56} 56%|█████▌ | 3236/5773 [1:09:48<3:51:16, 5.47s/it] 56%|█████▌ | 3237/5773 [1:09:53<3:52:38, 5.50s/it] 56%|█████▌ | 3237/5773 [1:09:55<3:52:38, 5.50s/it] {'loss': 0.581, 'learning_rate': 8.52688508219678e-06, 'epoch': 0.56} 56%|█████▌ | 3237/5773 [1:09:55<3:52:38, 5.50s/it] {'loss': 0.581, 'learning_rate': 8.52688508219678e-06, 'epoch': 0.56} 56%|█████▌ | 3237/5773 [1:09:53<3:52:38, 5.50s/it] 56%|█████▌ | 3238/5773 [1:09:59<3:53:22, 5.52s/it] 56%|█████▌ | 3238/5773 [1:10:01<3:53:22, 5.52s/it] {'loss': 0.5536, 'learning_rate': 8.52133554058398e-06, 'epoch': 0.56} 56%|█████▌ | 3238/5773 [1:10:01<3:53:22, 5.52s/it] {'loss': 0.5536, 'learning_rate': 8.52133554058398e-06, 'epoch': 0.56} 56%|█████▌ | 3238/5773 [1:09:59<3:53:22, 5.52s/it] 56%|█████▌ | 3239/5773 [1:10:04<3:53:13, 5.52s/it] 56%|█████▌ | 3239/5773 [1:10:06<3:53:13, 5.52s/it] {'loss': 0.5643, 'learning_rate': 8.515786464501998e-06, 'epoch': 0.56} 56%|█████▌ | 3239/5773 [1:10:06<3:53:13, 5.52s/it] {'loss': 0.5643, 'learning_rate': 8.515786464501998e-06, 'epoch': 0.56} 56%|█████▌ | 3239/5773 [1:10:04<3:53:13, 5.52s/it] 56%|█████▌ | 3240/5773 [1:10:12<3:53:46, 5.54s/it] 56%|█████▌ | 3240/5773 [1:10:10<3:53:47, 5.54s/it] {'loss': 0.5642, 'learning_rate': 8.510237855697855e-06, 'epoch': 0.56} 56%|█████▌ | 3240/5773 [1:10:12<3:53:46, 5.54s/it] {'loss': 0.5642, 'learning_rate': 8.510237855697855e-06, 'epoch': 0.56} 56%|█████▌ | 3240/5773 [1:10:10<3:53:47, 5.54s/it] 56%|█████▌ | 3241/5773 
[1:10:17<3:53:02, 5.52s/it] 56%|█████▌ | 3241/5773 [1:10:15<3:53:03, 5.52s/it] {'loss': 0.5644, 'learning_rate': 8.504689715918439e-06, 'epoch': 0.56} 56%|█████▌ | 3241/5773 [1:10:17<3:53:02, 5.52s/it] {'loss': 0.5644, 'learning_rate': 8.504689715918439e-06, 'epoch': 0.56} 56%|█████▌ | 3241/5773 [1:10:15<3:53:03, 5.52s/it] 56%|█████▌ | 3242/5773 [1:10:23<3:52:08, 5.50s/it] 56%|█████▌ | 3242/5773 [1:10:21<3:52:08, 5.50s/it] {'loss': 0.5638, 'learning_rate': 8.499142046910471e-06, 'epoch': 0.56} 56%|█████▌ | 3242/5773 [1:10:23<3:52:08, 5.50s/it] {'loss': 0.5638, 'learning_rate': 8.499142046910471e-06, 'epoch': 0.56} 56%|█████▌ | 3242/5773 [1:10:21<3:52:08, 5.50s/it] 56%|█████▌ | 3243/5773 [1:10:28<3:52:14, 5.51s/it] 56%|█████▌ | 3243/5773 [1:10:26<3:52:14, 5.51s/it] {'loss': 0.5803, 'learning_rate': 8.493594850420537e-06, 'epoch': 0.56} 56%|█████▌ | 3243/5773 [1:10:28<3:52:14, 5.51s/it] {'loss': 0.5803, 'learning_rate': 8.493594850420537e-06, 'epoch': 0.56} 56%|█████▌ | 3243/5773 [1:10:26<3:52:14, 5.51s/it] 56%|█████▌ | 3244/5773 [1:10:34<3:48:44, 5.43s/it] 56%|█████▌ | 3244/5773 [1:10:32<3:48:44, 5.43s/it] {'loss': 0.5613, 'learning_rate': 8.488048128195073e-06, 'epoch': 0.56} 56%|█████▌ | 3244/5773 [1:10:34<3:48:44, 5.43s/it] {'loss': 0.5613, 'learning_rate': 8.488048128195073e-06, 'epoch': 0.56} 56%|█████▌ | 3244/5773 [1:10:32<3:48:44, 5.43s/it] 56%|█████▌ | 3245/5773 [1:10:39<3:48:21, 5.42s/it] 56%|█████▌ | 3245/5773 [1:10:37<3:48:21, 5.42s/it] {'loss': 0.5655, 'learning_rate': 8.482501881980368e-06, 'epoch': 0.56} 56%|█████▌ | 3245/5773 [1:10:39<3:48:21, 5.42s/it] {'loss': 0.5655, 'learning_rate': 8.482501881980368e-06, 'epoch': 0.56} 56%|█████▌ | 3245/5773 [1:10:37<3:48:21, 5.42s/it] 56%|█████▌ | 3246/5773 [1:10:45<3:48:41, 5.43s/it] 56%|█████▌ | 3246/5773 [1:10:43<3:48:41, 5.43s/it] {'loss': 0.5756, 'learning_rate': 8.47695611352255e-06, 'epoch': 0.56} 56%|█████▌ | 3246/5773 [1:10:45<3:48:41, 5.43s/it] {'loss': 0.5756, 'learning_rate': 8.47695611352255e-06, 
'epoch': 0.56} 56%|█████▌ | 3246/5773 [1:10:43<3:48:41, 5.43s/it] 56%|█████▌ | 3247/5773 [1:10:48<3:47:43, 5.41s/it] 56%|█████▌ | 3247/5773 [1:10:50<3:47:43, 5.41s/it] {'loss': 0.5646, 'learning_rate': 8.471410824567611e-06, 'epoch': 0.56} 56%|█████▌ | 3247/5773 [1:10:50<3:47:43, 5.41s/it] {'loss': 0.5646, 'learning_rate': 8.471410824567611e-06, 'epoch': 0.56} 56%|█████▌ | 3247/5773 [1:10:48<3:47:43, 5.41s/it] 56%|█████▋ | 3248/5773 [1:10:56<3:50:36, 5.48s/it] 56%|█████▋ | 3248/5773 [1:10:54<3:50:36, 5.48s/it] {'loss': 0.5777, 'learning_rate': 8.465866016861383e-06, 'epoch': 0.56} 56%|█████▋ | 3248/5773 [1:10:56<3:50:36, 5.48s/it] {'loss': 0.5777, 'learning_rate': 8.465866016861383e-06, 'epoch': 0.56} 56%|█████▋ | 3248/5773 [1:10:54<3:50:36, 5.48s/it] 56%|█████▋ | 3249/5773 [1:11:01<3:49:37, 5.46s/it] 56%|█████▋ | 3249/5773 [1:10:59<3:49:37, 5.46s/it] {'loss': 0.5764, 'learning_rate': 8.460321692149546e-06, 'epoch': 0.56} 56%|█████▋ | 3249/5773 [1:11:01<3:49:37, 5.46s/it] {'loss': 0.5764, 'learning_rate': 8.460321692149546e-06, 'epoch': 0.56} 56%|█████▋ | 3249/5773 [1:10:59<3:49:37, 5.46s/it]11 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 814 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 010 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 56%|█████▋ | 3250/5773 [1:11:06<3:49:05, 5.45s/it]13 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 56%|█████▋ | 3250/5773 [1:11:04<3:49:06, 5.45s/it]15 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5454, 'learning_rate': 8.454777852177635e-06, 'epoch': 0.56} 56%|█████▋ | 3250/5773 [1:11:06<3:49:05, 5.45s/it] {'loss': 0.5454, 'learning_rate': 8.454777852177635e-06, 'epoch': 0.56} 56%|█████▋ | 3250/5773 [1:11:04<3:49:06, 5.45s/it] 56%|█████▋ | 3251/5773 [1:11:12<3:49:17, 5.45s/it] 56%|█████▋ | 3251/5773 [1:11:10<3:49:17, 5.45s/it] {'loss': 0.5714, 'learning_rate': 8.449234498691024e-06, 'epoch': 0.56} 56%|█████▋ | 3251/5773 [1:11:12<3:49:17, 5.45s/it] {'loss': 0.5714, 'learning_rate': 8.449234498691024e-06, 'epoch': 0.56} 56%|█████▋ | 3251/5773 [1:11:10<3:49:17, 5.45s/it] 56%|█████▋ | 3252/5773 [1:11:17<3:49:28, 5.46s/it] 56%|█████▋ | 3252/5773 [1:11:15<3:49:28, 5.46s/it] {'loss': 0.5745, 'learning_rate': 8.443691633434942e-06, 'epoch': 0.56} 56%|█████▋ | 3252/5773 [1:11:17<3:49:28, 5.46s/it] {'loss': 0.5745, 'learning_rate': 8.443691633434942e-06, 'epoch': 0.56} 56%|█████▋ | 3252/5773 [1:11:15<3:49:28, 5.46s/it] 56%|█████▋ | 3253/5773 [1:11:23<3:51:18, 5.51s/it] 56%|█████▋ | 3253/5773 [1:11:21<3:51:18, 5.51s/it] {'loss': 0.5433, 'learning_rate': 8.438149258154455e-06, 'epoch': 0.56} 56%|█████▋ | 3253/5773 [1:11:23<3:51:18, 5.51s/it] {'loss': 0.5433, 'learning_rate': 8.438149258154455e-06, 'epoch': 0.56} 56%|█████▋ | 3253/5773 [1:11:21<3:51:18, 5.51s/it] 56%|█████▋ | 3254/5773 [1:11:27<3:52:43, 5.54s/it] 56%|█████▋ | 3254/5773 [1:11:29<3:52:43, 5.54s/it] {'loss': 0.5904, 'learning_rate': 8.432607374594484e-06, 'epoch': 0.56} 56%|█████▋ | 3254/5773 [1:11:29<3:52:43, 5.54s/it] {'loss': 0.5904, 'learning_rate': 8.432607374594484e-06, 'epoch': 0.56} 56%|█████▋ | 3254/5773 [1:11:27<3:52:43, 5.54s/it] 56%|█████▋ | 3255/5773 [1:11:32<3:53:21, 5.56s/it] 56%|█████▋ | 3255/5773 [1:11:34<3:53:21, 5.56s/it] {'loss': 0.5666, 'learning_rate': 8.427065984499791e-06, 'epoch': 0.56} 56%|█████▋ | 3255/5773 [1:11:34<3:53:21, 5.56s/it] {'loss': 0.5666, 'learning_rate': 8.427065984499791e-06, 'epoch': 0.56} 56%|█████▋ | 3255/5773 [1:11:32<3:53:21, 5.56s/it] 56%|█████▋ | 
3256/5773 [1:11:40<3:51:55, 5.53s/it] 56%|█████▋ | 3256/5773 [1:11:38<3:51:55, 5.53s/it] {'loss': 0.5413, 'learning_rate': 8.421525089614977e-06, 'epoch': 0.56} 56%|█████▋ | 3256/5773 [1:11:40<3:51:55, 5.53s/it] {'loss': 0.5413, 'learning_rate': 8.421525089614977e-06, 'epoch': 0.56} 56%|█████▋ | 3256/5773 [1:11:38<3:51:55, 5.53s/it] 56%|█████▋ | 3257/5773 [1:11:45<3:50:22, 5.49s/it] 56%|█████▋ | 3257/5773 [1:11:43<3:50:22, 5.49s/it] {'loss': 0.5711, 'learning_rate': 8.415984691684498e-06, 'epoch': 0.56} 56%|█████▋ | 3257/5773 [1:11:45<3:50:22, 5.49s/it] {'loss': 0.5711, 'learning_rate': 8.415984691684498e-06, 'epoch': 0.56} 56%|█████▋ | 3257/5773 [1:11:43<3:50:22, 5.49s/it] 56%|█████▋ | 3258/5773 [1:11:50<3:49:28, 5.47s/it] 56%|█████▋ | 3258/5773 [1:11:48<3:49:28, 5.47s/it] {'loss': 0.5394, 'learning_rate': 8.410444792452648e-06, 'epoch': 0.56} 56%|█████▋ | 3258/5773 [1:11:50<3:49:28, 5.47s/it] {'loss': 0.5394, 'learning_rate': 8.410444792452648e-06, 'epoch': 0.56} 56%|█████▋ | 3258/5773 [1:11:48<3:49:28, 5.47s/it] 56%|█████▋ | 3259/5773 [1:11:56<3:49:15, 5.47s/it] 56%|█████▋ | 3259/5773 [1:11:54<3:49:15, 5.47s/it] {'loss': 0.5746, 'learning_rate': 8.40490539366356e-06, 'epoch': 0.56} 56%|█████▋ | 3259/5773 [1:11:56<3:49:15, 5.47s/it] {'loss': 0.5746, 'learning_rate': 8.40490539366356e-06, 'epoch': 0.56} 56%|█████▋ | 3259/5773 [1:11:54<3:49:15, 5.47s/it] 56%|█████▋ | 3260/5773 [1:11:59<3:47:50, 5.44s/it] 56%|█████▋ | 3260/5773 [1:12:01<3:47:50, 5.44s/it] {'loss': 0.5677, 'learning_rate': 8.39936649706122e-06, 'epoch': 0.56} 56%|█████▋ | 3260/5773 [1:12:01<3:47:50, 5.44s/it] {'loss': 0.5677, 'learning_rate': 8.39936649706122e-06, 'epoch': 0.56} 56%|█████▋ | 3260/5773 [1:11:59<3:47:50, 5.44s/it] 56%|█████▋ | 3261/5773 [1:12:07<3:48:59, 5.47s/it] 56%|█████▋ | 3261/5773 [1:12:05<3:48:59, 5.47s/it] {'loss': 0.5882, 'learning_rate': 8.393828104389446e-06, 'epoch': 0.56} 56%|█████▋ | 3261/5773 [1:12:07<3:48:59, 5.47s/it] {'loss': 0.5882, 'learning_rate': 
8.393828104389446e-06, 'epoch': 0.56} 56%|█████▋ | 3261/5773 [1:12:05<3:48:59, 5.47s/it] 57%|█████▋ | 3262/5773 [1:12:12<3:49:59, 5.50s/it] 57%|█████▋ | 3262/5773 [1:12:10<3:49:59, 5.50s/it] {'loss': 0.5598, 'learning_rate': 8.388290217391902e-06, 'epoch': 0.57} 57%|█████▋ | 3262/5773 [1:12:12<3:49:59, 5.50s/it] {'loss': 0.5598, 'learning_rate': 8.388290217391902e-06, 'epoch': 0.57} 57%|█████▋ | 3262/5773 [1:12:10<3:49:59, 5.50s/it] 57%|█████▋ | 3263/5773 [1:12:18<3:49:15, 5.48s/it] 57%|█████▋ | 3263/5773 [1:12:16<3:49:15, 5.48s/it] {'loss': 0.5639, 'learning_rate': 8.38275283781209e-06, 'epoch': 0.57} 57%|█████▋ | 3263/5773 [1:12:18<3:49:15, 5.48s/it] {'loss': 0.5639, 'learning_rate': 8.38275283781209e-06, 'epoch': 0.57} 57%|█████▋ | 3263/5773 [1:12:16<3:49:15, 5.48s/it] 57%|█████▋ | 3264/5773 [1:12:21<3:50:01, 5.50s/it] 57%|█████▋ | 3264/5773 [1:12:23<3:50:01, 5.50s/it] {'loss': 0.5564, 'learning_rate': 8.377215967393355e-06, 'epoch': 0.57} 57%|█████▋ | 3264/5773 [1:12:23<3:50:01, 5.50s/it] {'loss': 0.5564, 'learning_rate': 8.377215967393355e-06, 'epoch': 0.57} 57%|█████▋ | 3264/5773 [1:12:21<3:50:01, 5.50s/it] 57%|█████▋ | 3265/5773 [1:12:29<3:51:09, 5.53s/it] 57%|█████▋ | 3265/5773 [1:12:27<3:51:10, 5.53s/it] {'loss': 0.5673, 'learning_rate': 8.371679607878884e-06, 'epoch': 0.57} 57%|█████▋ | 3265/5773 [1:12:29<3:51:09, 5.53s/it] {'loss': 0.5673, 'learning_rate': 8.371679607878884e-06, 'epoch': 0.57} 57%|█████▋ | 3265/5773 [1:12:27<3:51:10, 5.53s/it] 57%|█████▋ | 3266/5773 [1:12:34<3:50:00, 5.50s/it] 57%|█████▋ | 3266/5773 [1:12:32<3:50:00, 5.50s/it] {'loss': 0.5743, 'learning_rate': 8.366143761011695e-06, 'epoch': 0.57} 57%|█████▋ | 3266/5773 [1:12:34<3:50:00, 5.50s/it] {'loss': 0.5743, 'learning_rate': 8.366143761011695e-06, 'epoch': 0.57} 57%|█████▋ | 3266/5773 [1:12:32<3:50:00, 5.50s/it] 57%|█████▋ | 3267/5773 [1:12:40<3:50:14, 5.51s/it] 57%|█████▋ | 3267/5773 [1:12:38<3:50:14, 5.51s/it] {'loss': 0.5591, 'learning_rate': 8.360608428534652e-06, 'epoch': 
0.57} 57%|█████▋ | 3267/5773 [1:12:40<3:50:14, 5.51s/it] {'loss': 0.5591, 'learning_rate': 8.360608428534652e-06, 'epoch': 0.57} 57%|█████▋ | 3267/5773 [1:12:38<3:50:14, 5.51s/it] 57%|█████▋ | 3268/5773 [1:12:45<3:49:46, 5.50s/it] 57%|█████▋ | 3268/5773 [1:12:43<3:49:46, 5.50s/it] {'loss': 0.558, 'learning_rate': 8.355073612190452e-06, 'epoch': 0.57} 57%|█████▋ | 3268/5773 [1:12:45<3:49:46, 5.50s/it] {'loss': 0.558, 'learning_rate': 8.355073612190452e-06, 'epoch': 0.57} 57%|█████▋ | 3268/5773 [1:12:43<3:49:46, 5.50s/it] 57%|█████▋ | 3269/5773 [1:12:51<3:50:12, 5.52s/it] 57%|█████▋ | 3269/5773 [1:12:49<3:50:12, 5.52s/it] {'loss': 0.5581, 'learning_rate': 8.349539313721639e-06, 'epoch': 0.57} 57%|█████▋ | 3269/5773 [1:12:51<3:50:12, 5.52s/it] {'loss': 0.5581, 'learning_rate': 8.349539313721639e-06, 'epoch': 0.57} 57%|█████▋ | 3269/5773 [1:12:49<3:50:12, 5.52s/it] 57%|█████▋ | 3270/5773 [1:12:54<3:49:37, 5.50s/it] 57%|█████▋ | 3270/5773 [1:12:56<3:49:37, 5.50s/it] {'loss': 0.58, 'learning_rate': 8.344005534870578e-06, 'epoch': 0.57} 57%|█████▋ | 3270/5773 [1:12:56<3:49:37, 5.50s/it] {'loss': 0.58, 'learning_rate': 8.344005534870578e-06, 'epoch': 0.57} 57%|█████▋ | 3270/5773 [1:12:54<3:49:37, 5.50s/it] 57%|█████▋ | 3271/5773 [1:13:00<3:50:05, 5.52s/it] 57%|█████▋ | 3271/5773 [1:13:02<3:50:07, 5.52s/it] {'loss': 0.5588, 'learning_rate': 8.338472277379485e-06, 'epoch': 0.57} 57%|█████▋ | 3271/5773 [1:13:02<3:50:07, 5.52s/it] {'loss': 0.5588, 'learning_rate': 8.338472277379485e-06, 'epoch': 0.57} 57%|█████▋ | 3271/5773 [1:13:00<3:50:05, 5.52s/it] 57%|█████▋ | 3272/5773 [1:13:08<3:49:47, 5.51s/it] 57%|█████▋ | 3272/5773 [1:13:06<3:49:48, 5.51s/it] {'loss': 0.5751, 'learning_rate': 8.332939542990401e-06, 'epoch': 0.57} 57%|█████▋ | 3272/5773 [1:13:08<3:49:47, 5.51s/it] {'loss': 0.5751, 'learning_rate': 8.332939542990401e-06, 'epoch': 0.57} 57%|█████▋ | 3272/5773 [1:13:06<3:49:48, 5.51s/it] 57%|█████▋ | 3273/5773 [1:13:13<3:51:49, 5.56s/it] 57%|█████▋ | 3273/5773 
[1:13:11<3:51:50, 5.56s/it] {'loss': 0.5902, 'learning_rate': 8.327407333445214e-06, 'epoch': 0.57} 57%|█████▋ | 3273/5773 [1:13:13<3:51:49, 5.56s/it] {'loss': 0.5902, 'learning_rate': 8.327407333445214e-06, 'epoch': 0.57} 57%|█████▋ | 3273/5773 [1:13:11<3:51:50, 5.56s/it] 57%|█████▋ | 3274/5773 [1:13:19<3:51:17, 5.55s/it] 57%|█████▋ | 3274/5773 [1:13:17<3:51:17, 5.55s/it] {'loss': 0.5767, 'learning_rate': 8.321875650485636e-06, 'epoch': 0.57} 57%|█████▋ | 3274/5773 [1:13:19<3:51:17, 5.55s/it] {'loss': 0.5767, 'learning_rate': 8.321875650485636e-06, 'epoch': 0.57} 57%|█████▋ | 3274/5773 [1:13:17<3:51:17, 5.55s/it] 57%|█████▋ | 3275/5773 [1:13:24<3:51:03, 5.55s/it] 57%|█████▋ | 3275/5773 [1:13:22<3:51:03, 5.55s/it] {'loss': 0.5771, 'learning_rate': 8.316344495853218e-06, 'epoch': 0.57} 57%|█████▋ | 3275/5773 [1:13:24<3:51:03, 5.55s/it] {'loss': 0.5771, 'learning_rate': 8.316344495853218e-06, 'epoch': 0.57} 57%|█████▋ | 3275/5773 [1:13:22<3:51:03, 5.55s/it] 57%|█████▋ | 3276/5773 [1:13:30<3:49:44, 5.52s/it] 57%|█████▋ | 3276/5773 [1:13:28<3:49:44, 5.52s/it] {'loss': 0.5542, 'learning_rate': 8.310813871289349e-06, 'epoch': 0.57} 57%|█████▋ | 3276/5773 [1:13:30<3:49:44, 5.52s/it] {'loss': 0.5542, 'learning_rate': 8.310813871289349e-06, 'epoch': 0.57} 57%|█████▋ | 3276/5773 [1:13:28<3:49:44, 5.52s/it] 57%|█████▋ | 3277/5773 [1:13:35<3:49:32, 5.52s/it] 57%|█████▋ | 3277/5773 [1:13:33<3:49:32, 5.52s/it] {'loss': 0.5659, 'learning_rate': 8.30528377853524e-06, 'epoch': 0.57} 57%|█████▋ | 3277/5773 [1:13:35<3:49:32, 5.52s/it] {'loss': 0.5659, 'learning_rate': 8.30528377853524e-06, 'epoch': 0.57} 57%|█████▋ | 3277/5773 [1:13:33<3:49:32, 5.52s/it] 57%|█████▋ | 3278/5773 [1:13:41<3:48:55, 5.51s/it] 57%|█████▋ | 3278/5773 [1:13:39<3:48:55, 5.51s/it] {'loss': 0.5609, 'learning_rate': 8.299754219331944e-06, 'epoch': 0.57} 57%|█████▋ | 3278/5773 [1:13:41<3:48:55, 5.51s/it] {'loss': 0.5609, 'learning_rate': 8.299754219331944e-06, 'epoch': 0.57} 57%|█████▋ | 3278/5773 
[1:13:39<3:48:55, 5.51s/it] 57%|█████▋ | 3279/5773 [1:13:46<3:48:37, 5.50s/it] 57%|█████▋ | 3279/5773 [1:13:44<3:48:37, 5.50s/it] {'loss': 0.5485, 'learning_rate': 8.29422519542034e-06, 'epoch': 0.57} 57%|█████▋ | 3279/5773 [1:13:46<3:48:37, 5.50s/it] {'loss': 0.5485, 'learning_rate': 8.29422519542034e-06, 'epoch': 0.57} 57%|█████▋ | 3279/5773 [1:13:44<3:48:37, 5.50s/it] 57%|█████▋ | 3280/5773 [1:13:52<3:47:02, 5.46s/it] 57%|█████▋ | 3280/5773 [1:13:50<3:47:02, 5.46s/it] {'loss': 0.5508, 'learning_rate': 8.288696708541146e-06, 'epoch': 0.57} 57%|█████▋ | 3280/5773 [1:13:52<3:47:02, 5.46s/it] {'loss': 0.5508, 'learning_rate': 8.288696708541146e-06, 'epoch': 0.57} 57%|█████▋ | 3280/5773 [1:13:50<3:47:02, 5.46s/it] 57%|█████▋ | 3281/5773 [1:13:57<3:45:50, 5.44s/it] 57%|█████▋ | 3281/5773 [1:13:55<3:45:50, 5.44s/it] {'loss': 0.5799, 'learning_rate': 8.283168760434904e-06, 'epoch': 0.57} 57%|█████▋ | 3281/5773 [1:13:57<3:45:50, 5.44s/it] {'loss': 0.5799, 'learning_rate': 8.283168760434904e-06, 'epoch': 0.57} 57%|█████▋ | 3281/5773 [1:13:55<3:45:50, 5.44s/it] 57%|█████▋ | 3282/5773 [1:14:02<3:45:30, 5.43s/it] 57%|█████▋ | 3282/5773 [1:14:00<3:45:30, 5.43s/it] {'loss': 0.551, 'learning_rate': 8.277641352841985e-06, 'epoch': 0.57} 57%|█████▋ | 3282/5773 [1:14:02<3:45:30, 5.43s/it] {'loss': 0.551, 'learning_rate': 8.277641352841985e-06, 'epoch': 0.57} 57%|█████▋ | 3282/5773 [1:14:00<3:45:30, 5.43s/it] 57%|█████▋ | 3283/5773 [1:14:08<3:46:41, 5.46s/it] 57%|█████▋ | 3283/5773 [1:14:06<3:46:41, 5.46s/it] {'loss': 0.5771, 'learning_rate': 8.272114487502604e-06, 'epoch': 0.57} 57%|█████▋ | 3283/5773 [1:14:08<3:46:41, 5.46s/it] {'loss': 0.5771, 'learning_rate': 8.272114487502604e-06, 'epoch': 0.57} 57%|█████▋ | 3283/5773 [1:14:06<3:46:41, 5.46s/it] 57%|█████▋ | 3284/5773 [1:14:13<3:48:14, 5.50s/it] 57%|█████▋ | 3284/5773 [1:14:12<3:48:14, 5.50s/it] {'loss': 0.5679, 'learning_rate': 8.266588166156785e-06, 'epoch': 0.57} 57%|█████▋ | 3284/5773 [1:14:13<3:48:14, 5.50s/it] {'loss': 
0.5679, 'learning_rate': 8.266588166156785e-06, 'epoch': 0.57} 57%|█████▋ | 3284/5773 [1:14:12<3:48:14, 5.50s/it] 57%|█████▋ | 3285/5773 [1:14:17<3:47:52, 5.50s/it] 57%|█████▋ | 3285/5773 [1:14:19<3:47:53, 5.50s/it] {'loss': 0.5455, 'learning_rate': 8.26106239054439e-06, 'epoch': 0.57} 57%|█████▋ | 3285/5773 [1:14:19<3:47:53, 5.50s/it] {'loss': 0.5455, 'learning_rate': 8.26106239054439e-06, 'epoch': 0.57} 57%|█████▋ | 3285/5773 [1:14:17<3:47:52, 5.50s/it] 57%|█████▋ | 3286/5773 [1:14:25<3:50:35, 5.56s/it] 57%|█████▋ | 3286/5773 [1:14:23<3:50:35, 5.56s/it] {'loss': 0.5828, 'learning_rate': 8.255537162405117e-06, 'epoch': 0.57} 57%|█████▋ | 3286/5773 [1:14:25<3:50:35, 5.56s/it] {'loss': 0.5828, 'learning_rate': 8.255537162405117e-06, 'epoch': 0.57} 57%|█████▋ | 3286/5773 [1:14:23<3:50:35, 5.56s/it] 57%|█████▋ | 3287/5773 [1:14:30<3:50:01, 5.55s/it] 57%|█████▋ | 3287/5773 [1:14:28<3:50:01, 5.55s/it] {'loss': 0.5592, 'learning_rate': 8.250012483478478e-06, 'epoch': 0.57} 57%|█████▋ | 3287/5773 [1:14:30<3:50:01, 5.55s/it] {'loss': 0.5592, 'learning_rate': 8.250012483478478e-06, 'epoch': 0.57} 57%|█████▋ | 3287/5773 [1:14:28<3:50:01, 5.55s/it] 57%|█████▋ | 3288/5773 [1:14:36<3:47:56, 5.50s/it] 57%|█████▋ | 3288/5773 [1:14:34<3:47:56, 5.50s/it] {'loss': 0.5678, 'learning_rate': 8.244488355503822e-06, 'epoch': 0.57} 57%|█████▋ | 3288/5773 [1:14:36<3:47:56, 5.50s/it] {'loss': 0.5678, 'learning_rate': 8.244488355503822e-06, 'epoch': 0.57} 57%|█████▋ | 3288/5773 [1:14:34<3:47:56, 5.50s/it] 57%|█████▋ | 3289/5773 [1:14:41<3:47:16, 5.49s/it] 57%|█████▋ | 3289/5773 [1:14:39<3:47:16, 5.49s/it] {'loss': 0.576, 'learning_rate': 8.23896478022032e-06, 'epoch': 0.57} 57%|█████▋ | 3289/5773 [1:14:41<3:47:16, 5.49s/it] {'loss': 0.576, 'learning_rate': 8.23896478022032e-06, 'epoch': 0.57} 57%|█████▋ | 3289/5773 [1:14:39<3:47:16, 5.49s/it] 57%|█████▋ | 3290/5773 [1:14:47<3:47:45, 5.50s/it] 57%|█████▋ | 3290/5773 [1:14:45<3:47:45, 5.50s/it] {'loss': 0.5505, 'learning_rate': 
8.233441759366969e-06, 'epoch': 0.57} 57%|█████▋ | 3290/5773 [1:14:47<3:47:45, 5.50s/it] {'loss': 0.5505, 'learning_rate': 8.233441759366969e-06, 'epoch': 0.57} 57%|█████▋ | 3290/5773 [1:14:45<3:47:45, 5.50s/it] 57%|█████▋ | 3291/5773 [1:14:52<3:46:48, 5.48s/it] 57%|█████▋ | 3291/5773 [1:14:50<3:46:48, 5.48s/it] {'loss': 0.5691, 'learning_rate': 8.227919294682595e-06, 'epoch': 0.57} 57%|█████▋ | 3291/5773 [1:14:52<3:46:48, 5.48s/it] {'loss': 0.5691, 'learning_rate': 8.227919294682595e-06, 'epoch': 0.57} 57%|█████▋ | 3291/5773 [1:14:50<3:46:48, 5.48s/it] 57%|█████▋ | 3292/5773 [1:14:58<3:48:11, 5.52s/it] 57%|█████▋ | 3292/5773 [1:14:56<3:48:10, 5.52s/it] {'loss': 0.5694, 'learning_rate': 8.222397387905841e-06, 'epoch': 0.57} 57%|█████▋ | 3292/5773 [1:14:58<3:48:11, 5.52s/it] {'loss': 0.5694, 'learning_rate': 8.222397387905841e-06, 'epoch': 0.57} 57%|█████▋ | 3292/5773 [1:14:56<3:48:10, 5.52s/it] 57%|█████▋ | 3293/5773 [1:15:03<3:46:18, 5.48s/it] 57%|█████▋ | 3293/5773 [1:15:01<3:46:18, 5.48s/it] {'loss': 0.5683, 'learning_rate': 8.216876040775185e-06, 'epoch': 0.57} 57%|█████▋ | 3293/5773 [1:15:03<3:46:18, 5.48s/it] {'loss': 0.5683, 'learning_rate': 8.216876040775185e-06, 'epoch': 0.57} 57%|█████▋ | 3293/5773 [1:15:01<3:46:18, 5.48s/it] 57%|█████▋ | 3294/5773 [1:15:09<3:51:08, 5.59s/it] 57%|█████▋ | 3294/5773 [1:15:07<3:51:08, 5.59s/it] {'loss': 0.5826, 'learning_rate': 8.211355255028924e-06, 'epoch': 0.57} 57%|█████▋ | 3294/5773 [1:15:09<3:51:08, 5.59s/it] {'loss': 0.5826, 'learning_rate': 8.211355255028924e-06, 'epoch': 0.57} 57%|█████▋ | 3294/5773 [1:15:07<3:51:08, 5.59s/it] 57%|█████▋ | 3295/5773 [1:15:14<3:51:11, 5.60s/it] 57%|█████▋ | 3295/5773 [1:15:13<3:51:11, 5.60s/it] {'loss': 0.5617, 'learning_rate': 8.205835032405174e-06, 'epoch': 0.57} 57%|█████▋ | 3295/5773 [1:15:14<3:51:11, 5.60s/it] {'loss': 0.5617, 'learning_rate': 8.205835032405174e-06, 'epoch': 0.57} 57%|█████▋ | 3295/5773 [1:15:13<3:51:11, 5.60s/it] 57%|█████▋ | 3296/5773 [1:15:20<3:51:14, 
5.60s/it] 57%|█████▋ | 3296/5773 [1:15:18<3:51:14, 5.60s/it] {'loss': 0.5622, 'learning_rate': 8.20031537464188e-06, 'epoch': 0.57} 57%|█████▋ | 3296/5773 [1:15:20<3:51:14, 5.60s/it] {'loss': 0.5622, 'learning_rate': 8.20031537464188e-06, 'epoch': 0.57} 57%|█████▋ | 3296/5773 [1:15:18<3:51:14, 5.60s/it] 57%|█████▋ | 3297/5773 [1:15:25<3:48:21, 5.53s/it] 57%|█████▋ | 3297/5773 [1:15:24<3:48:21, 5.53s/it] {'loss': 0.5649, 'learning_rate': 8.194796283476808e-06, 'epoch': 0.57} 57%|█████▋ | 3297/5773 [1:15:25<3:48:21, 5.53s/it] {'loss': 0.5649, 'learning_rate': 8.194796283476808e-06, 'epoch': 0.57} 57%|█████▋ | 3297/5773 [1:15:24<3:48:21, 5.53s/it] 57%|█████▋ | 3298/5773 [1:15:31<3:48:17, 5.53s/it] 57%|█████▋ | 3298/5773 [1:15:29<3:48:17, 5.53s/it] {'loss': 0.569, 'learning_rate': 8.189277760647537e-06, 'epoch': 0.57} 57%|█████▋ | 3298/5773 [1:15:31<3:48:17, 5.53s/it] {'loss': 0.569, 'learning_rate': 8.189277760647537e-06, 'epoch': 0.57} 57%|█████▋ | 3298/5773 [1:15:29<3:48:17, 5.53s/it] 57%|█████▋ | 3299/5773 [1:15:37<3:49:00, 5.55s/it] 57%|█████▋ | 3299/5773 [1:15:35<3:49:00, 5.55s/it] {'loss': 0.556, 'learning_rate': 8.183759807891481e-06, 'epoch': 0.57} 57%|█████▋ | 3299/5773 [1:15:37<3:49:00, 5.55s/it] {'loss': 0.556, 'learning_rate': 8.183759807891481e-06, 'epoch': 0.57} 57%|█████▋ | 3299/5773 [1:15:35<3:49:00, 5.55s/it]11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 155 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 57%|█████▋ | 3300/5773 [1:15:42<3:49:41, 5.57s/it] AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 7AutoResumeHook: Checking whether to suspend... 
AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
57%|█████▋ | 3300/5773 [1:15:40<3:49:41, 5.57s/it]
3 AutoResumeHook: Checking whether to suspend...
{'loss': 0.5704, 'learning_rate': 8.178242426945867e-06, 'epoch': 0.57}
57%|█████▋ | 3300/5773 [1:15:42<3:49:41, 5.57s/it]
{'loss': 0.5704, 'learning_rate': 8.178242426945867e-06, 'epoch': 0.57}
57%|█████▋ | 3300/5773 [1:15:40<3:49:41, 5.57s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3300/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3300/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3300/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 57%|█████▋ | 3301/5773 [1:16:06<7:32:33, 10.98s/it] 57%|█████▋ | 3301/5773 [1:16:04<7:32:33, 10.98s/it] {'loss': 0.5654, 'learning_rate': 8.172725619547738e-06, 'epoch': 0.57} 57%|█████▋ | 3301/5773 [1:16:06<7:32:33, 10.98s/it] {'loss': 0.5654, 'learning_rate': 8.172725619547738e-06, 'epoch': 0.57} 57%|█████▋ | 3301/5773 [1:16:04<7:32:33, 10.98s/it] 57%|█████▋ | 3302/5773 [1:16:11<6:23:35, 9.31s/it] 57%|█████▋ | 3302/5773 [1:16:09<6:23:35, 9.31s/it] {'loss': 0.5559, 'learning_rate': 8.167209387433968e-06, 'epoch': 0.57} 57%|█████▋ | 3302/5773 [1:16:11<6:23:35, 9.31s/it] {'loss': 0.5559, 'learning_rate': 8.167209387433968e-06, 'epoch': 0.57} 57%|█████▋ | 3302/5773 [1:16:09<6:23:35, 9.31s/it] 57%|█████▋ | 3303/5773 [1:16:17<5:35:41, 8.15s/it] 57%|█████▋ | 3303/5773 [1:16:15<5:35:41, 8.15s/it] {'loss': 0.5683, 'learning_rate': 8.161693732341238e-06, 'epoch': 0.57} 57%|█████▋ | 3303/5773 [1:16:17<5:35:41, 8.15s/it] {'loss': 0.5683, 'learning_rate': 8.161693732341238e-06, 'epoch': 0.57} 57%|█████▋ | 3303/5773 [1:16:15<5:35:41, 8.15s/it] 57%|█████▋ | 3304/5773 [1:16:20<5:03:13, 7.37s/it] 57%|█████▋ | 3304/5773 [1:16:22<5:03:13, 7.37s/it] {'loss': 0.5545, 'learning_rate': 8.156178656006057e-06, 'epoch': 0.57} 57%|█████▋ | 3304/5773 [1:16:22<5:03:13, 7.37s/it] {'loss': 0.5545, 'learning_rate': 8.156178656006057e-06, 'epoch': 0.57} 57%|█████▋ | 3304/5773 [1:16:20<5:03:13, 7.37s/it] 57%|█████▋ | 3305/5773 [1:16:26<4:40:27, 6.82s/it] 57%|█████▋ | 3305/5773 [1:16:28<4:40:28, 6.82s/it] {'loss': 0.5623, 'learning_rate': 8.150664160164743e-06, 'epoch': 0.57} 57%|█████▋ | 3305/5773 [1:16:28<4:40:28, 6.82s/it] {'loss': 0.5623, 'learning_rate': 8.150664160164743e-06, 'epoch': 0.57} 57%|█████▋ | 3305/5773 [1:16:26<4:40:27, 6.82s/it] 57%|█████▋ | 3306/5773 [1:16:31<4:25:36, 6.46s/it] 57%|█████▋ | 3306/5773 [1:16:33<4:25:37, 6.46s/it] {'loss': 0.5695, 'learning_rate': 8.145150246553437e-06, 'epoch': 0.57} 57%|█████▋ | 3306/5773 [1:16:33<4:25:37, 6.46s/it] {'loss': 
0.5695, 'learning_rate': 8.145150246553437e-06, 'epoch': 0.57} 57%|█████▋ | 3306/5773 [1:16:31<4:25:36, 6.46s/it] 57%|█████▋ | 3307/5773 [1:16:39<4:13:10, 6.16s/it] 57%|█████▋ | 3307/5773 [1:16:37<4:13:11, 6.16s/it] {'loss': 0.5724, 'learning_rate': 8.139636916908098e-06, 'epoch': 0.57} 57%|█████▋ | 3307/5773 [1:16:39<4:13:10, 6.16s/it] {'loss': 0.5724, 'learning_rate': 8.139636916908098e-06, 'epoch': 0.57} 57%|█████▋ | 3307/5773 [1:16:37<4:13:11, 6.16s/it] 57%|█████▋ | 3308/5773 [1:16:44<4:03:48, 5.93s/it] 57%|█████▋ | 3308/5773 [1:16:42<4:03:48, 5.93s/it] {'loss': 0.5706, 'learning_rate': 8.134124172964493e-06, 'epoch': 0.57} 57%|█████▋ | 3308/5773 [1:16:44<4:03:48, 5.93s/it] {'loss': 0.5706, 'learning_rate': 8.134124172964493e-06, 'epoch': 0.57} 57%|█████▋ | 3308/5773 [1:16:42<4:03:48, 5.93s/it] 57%|█████▋ | 3309/5773 [1:16:48<3:59:52, 5.84s/it] 57%|█████▋ | 3309/5773 [1:16:50<3:59:52, 5.84s/it] {'loss': 0.5785, 'learning_rate': 8.128612016458214e-06, 'epoch': 0.57} 57%|█████▋ | 3309/5773 [1:16:50<3:59:52, 5.84s/it] {'loss': 0.5785, 'learning_rate': 8.128612016458214e-06, 'epoch': 0.57} 57%|█████▋ | 3309/5773 [1:16:48<3:59:52, 5.84s/it] 57%|█████▋ | 3310/5773 [1:16:54<3:57:56, 5.80s/it] 57%|█████▋ | 3310/5773 [1:16:56<3:57:56, 5.80s/it] {'loss': 0.5663, 'learning_rate': 8.123100449124662e-06, 'epoch': 0.57} 57%|█████▋ | 3310/5773 [1:16:56<3:57:56, 5.80s/it] {'loss': 0.5663, 'learning_rate': 8.123100449124662e-06, 'epoch': 0.57} 57%|█████▋ | 3310/5773 [1:16:54<3:57:56, 5.80s/it] 57%|█████▋ | 3311/5773 [1:16:59<3:54:05, 5.71s/it] 57%|█████▋ | 3311/5773 [1:17:01<3:54:06, 5.71s/it] {'loss': 0.5633, 'learning_rate': 8.117589472699059e-06, 'epoch': 0.57} 57%|█████▋ | 3311/5773 [1:17:01<3:54:06, 5.71s/it] {'loss': 0.5633, 'learning_rate': 8.117589472699059e-06, 'epoch': 0.57} 57%|█████▋ | 3311/5773 [1:16:59<3:54:05, 5.71s/it] 57%|█████▋ | 3312/5773 [1:17:05<3:53:36, 5.70s/it] 57%|█████▋ | 3312/5773 [1:17:07<3:53:36, 5.70s/it] {'loss': 0.5568, 'learning_rate': 
8.11207908891643e-06, 'epoch': 0.57} 57%|█████▋ | 3312/5773 [1:17:07<3:53:36, 5.70s/it] {'loss': 0.5568, 'learning_rate': 8.11207908891643e-06, 'epoch': 0.57} 57%|█████▋ | 3312/5773 [1:17:05<3:53:36, 5.70s/it] 57%|█████▋ | 3313/5773 [1:17:10<3:52:04, 5.66s/it] 57%|█████▋ | 3313/5773 [1:17:12<3:52:05, 5.66s/it] {'loss': 0.5712, 'learning_rate': 8.106569299511622e-06, 'epoch': 0.57} 57%|█████▋ | 3313/5773 [1:17:12<3:52:05, 5.66s/it] {'loss': 0.5712, 'learning_rate': 8.106569299511622e-06, 'epoch': 0.57} 57%|█████▋ | 3313/5773 [1:17:10<3:52:04, 5.66s/it] 57%|█████▋ | 3314/5773 [1:17:16<3:51:28, 5.65s/it] 57%|█████▋ | 3314/5773 [1:17:18<3:51:28, 5.65s/it] {'loss': 0.5726, 'learning_rate': 8.101060106219292e-06, 'epoch': 0.57} 57%|█████▋ | 3314/5773 [1:17:18<3:51:28, 5.65s/it] {'loss': 0.5726, 'learning_rate': 8.101060106219292e-06, 'epoch': 0.57} 57%|█████▋ | 3314/5773 [1:17:16<3:51:28, 5.65s/it] 57%|█████▋ | 3315/5773 [1:17:22<3:50:33, 5.63s/it] 57%|█████▋ | 3315/5773 [1:17:24<3:50:32, 5.63s/it] {'loss': 0.5774, 'learning_rate': 8.095551510773913e-06, 'epoch': 0.57} 57%|█████▋ | 3315/5773 [1:17:24<3:50:32, 5.63s/it] {'loss': 0.5774, 'learning_rate': 8.095551510773913e-06, 'epoch': 0.57} 57%|█████▋ | 3315/5773 [1:17:22<3:50:33, 5.63s/it] 57%|█████▋ | 3316/5773 [1:17:27<3:48:57, 5.59s/it] 57%|█████▋ | 3316/5773 [1:17:29<3:48:57, 5.59s/it] {'loss': 0.5815, 'learning_rate': 8.090043514909763e-06, 'epoch': 0.57} 57%|█████▋ | 3316/5773 [1:17:29<3:48:57, 5.59s/it] {'loss': 0.5815, 'learning_rate': 8.090043514909763e-06, 'epoch': 0.57} 57%|█████▋ | 3316/5773 [1:17:27<3:48:57, 5.59s/it] 57%|█████▋ | 3317/5773 [1:17:32<3:46:29, 5.53s/it] 57%|█████▋ | 3317/5773 [1:17:34<3:46:29, 5.53s/it] {'loss': 0.5628, 'learning_rate': 8.08453612036094e-06, 'epoch': 0.57} 57%|█████▋ | 3317/5773 [1:17:34<3:46:29, 5.53s/it] {'loss': 0.5628, 'learning_rate': 8.08453612036094e-06, 'epoch': 0.57} 57%|█████▋ | 3317/5773 [1:17:32<3:46:29, 5.53s/it] 57%|█████▋ | 3318/5773 [1:17:38<3:46:02, 5.52s/it] 
57%|█████▋ | 3318/5773 [1:17:40<3:46:02, 5.52s/it] {'loss': 0.5865, 'learning_rate': 8.079029328861342e-06, 'epoch': 0.57} 57%|█████▋ | 3318/5773 [1:17:40<3:46:02, 5.52s/it] {'loss': 0.5865, 'learning_rate': 8.079029328861342e-06, 'epoch': 0.57} 57%|█████▋ | 3318/5773 [1:17:38<3:46:02, 5.52s/it] 57%|█████▋ | 3319/5773 [1:17:43<3:45:33, 5.51s/it] 57%|█████▋ | 3319/5773 [1:17:45<3:45:33, 5.51s/it] {'loss': 0.5689, 'learning_rate': 8.073523142144686e-06, 'epoch': 0.57} 57%|█████▋ | 3319/5773 [1:17:45<3:45:33, 5.51s/it] {'loss': 0.5689, 'learning_rate': 8.073523142144686e-06, 'epoch': 0.57} 57%|█████▋ | 3319/5773 [1:17:43<3:45:33, 5.51s/it] 58%|█████▊ | 3320/5773 [1:17:49<3:43:17, 5.46s/it] 58%|█████▊ | 3320/5773 [1:17:51<3:43:17, 5.46s/it] {'loss': 0.5724, 'learning_rate': 8.0680175619445e-06, 'epoch': 0.58} 58%|█████▊ | 3320/5773 [1:17:51<3:43:17, 5.46s/it] {'loss': 0.5724, 'learning_rate': 8.0680175619445e-06, 'epoch': 0.58} 58%|█████▊ | 3320/5773 [1:17:49<3:43:17, 5.46s/it] 58%|█████▊ | 3321/5773 [1:17:54<3:43:28, 5.47s/it] 58%|█████▊ | 3321/5773 [1:17:56<3:43:27, 5.47s/it] {'loss': 0.5683, 'learning_rate': 8.062512589994105e-06, 'epoch': 0.58} 58%|█████▊ | 3321/5773 [1:17:56<3:43:27, 5.47s/it] {'loss': 0.5683, 'learning_rate': 8.062512589994105e-06, 'epoch': 0.58} 58%|█████▊ | 3321/5773 [1:17:54<3:43:28, 5.47s/it] 58%|█████▊ | 3322/5773 [1:18:00<3:44:59, 5.51s/it] 58%|█████▊ | 3322/5773 [1:18:02<3:44:59, 5.51s/it] {'loss': 0.5561, 'learning_rate': 8.057008228026654e-06, 'epoch': 0.58} 58%|█████▊ | 3322/5773 [1:18:02<3:44:59, 5.51s/it] {'loss': 0.5561, 'learning_rate': 8.057008228026654e-06, 'epoch': 0.58} 58%|█████▊ | 3322/5773 [1:18:00<3:44:59, 5.51s/it] 58%|█████▊ | 3323/5773 [1:18:05<3:42:49, 5.46s/it] 58%|█████▊ | 3323/5773 [1:18:07<3:42:49, 5.46s/it] {'loss': 0.5804, 'learning_rate': 8.05150447777509e-06, 'epoch': 0.58} 58%|█████▊ | 3323/5773 [1:18:07<3:42:49, 5.46s/it] {'loss': 0.5804, 'learning_rate': 8.05150447777509e-06, 'epoch': 0.58} 58%|█████▊ | 
3323/5773 [1:18:05<3:42:49, 5.46s/it] 58%|█████▊ | 3324/5773 [1:18:11<3:42:27, 5.45s/it] 58%|█████▊ | 3324/5773 [1:18:13<3:42:26, 5.45s/it] {'loss': 0.5769, 'learning_rate': 8.046001340972168e-06, 'epoch': 0.58} 58%|█████▊ | 3324/5773 [1:18:13<3:42:26, 5.45s/it] {'loss': 0.5769, 'learning_rate': 8.046001340972168e-06, 'epoch': 0.58} 58%|█████▊ | 3324/5773 [1:18:11<3:42:27, 5.45s/it] 58%|█████▊ | 3325/5773 [1:18:16<3:43:49, 5.49s/it] 58%|█████▊ | 3325/5773 [1:18:18<3:43:50, 5.49s/it] {'loss': 0.5566, 'learning_rate': 8.04049881935046e-06, 'epoch': 0.58} 58%|█████▊ | 3325/5773 [1:18:18<3:43:50, 5.49s/it] {'loss': 0.5566, 'learning_rate': 8.04049881935046e-06, 'epoch': 0.58} 58%|█████▊ | 3325/5773 [1:18:16<3:43:49, 5.49s/it] 58%|█████▊ | 3326/5773 [1:18:22<3:44:16, 5.50s/it] 58%|█████▊ | 3326/5773 [1:18:24<3:44:16, 5.50s/it] {'loss': 0.5656, 'learning_rate': 8.034996914642325e-06, 'epoch': 0.58} 58%|█████▊ | 3326/5773 [1:18:24<3:44:16, 5.50s/it] {'loss': 0.5656, 'learning_rate': 8.034996914642325e-06, 'epoch': 0.58} 58%|█████▊ | 3326/5773 [1:18:22<3:44:16, 5.50s/it] 58%|█████▊ | 3327/5773 [1:18:27<3:45:05, 5.52s/it] 58%|█████▊ | 3327/5773 [1:18:29<3:45:05, 5.52s/it] {'loss': 0.5594, 'learning_rate': 8.029495628579941e-06, 'epoch': 0.58} 58%|█████▊ | 3327/5773 [1:18:29<3:45:05, 5.52s/it] {'loss': 0.5594, 'learning_rate': 8.029495628579941e-06, 'epoch': 0.58} 58%|█████▊ | 3327/5773 [1:18:27<3:45:05, 5.52s/it] 58%|█████▊ | 3328/5773 [1:18:33<3:45:24, 5.53s/it] 58%|█████▊ | 3328/5773 [1:18:35<3:45:25, 5.53s/it] {'loss': 0.5634, 'learning_rate': 8.023994962895294e-06, 'epoch': 0.58} 58%|█████▊ | 3328/5773 [1:18:35<3:45:25, 5.53s/it] {'loss': 0.5634, 'learning_rate': 8.023994962895294e-06, 'epoch': 0.58} 58%|█████▊ | 3328/5773 [1:18:33<3:45:24, 5.53s/it] 58%|█████▊ | 3329/5773 [1:18:38<3:44:21, 5.51s/it] 58%|█████▊ | 3329/5773 [1:18:40<3:44:21, 5.51s/it] {'loss': 0.5593, 'learning_rate': 8.018494919320163e-06, 'epoch': 0.58} 58%|█████▊ | 3329/5773 [1:18:40<3:44:21, 
5.51s/it] {'loss': 0.5593, 'learning_rate': 8.018494919320163e-06, 'epoch': 0.58} 58%|█████▊ | 3329/5773 [1:18:38<3:44:21, 5.51s/it] 58%|█████▊ | 3330/5773 [1:18:44<3:44:19, 5.51s/it] 58%|█████▊ | 3330/5773 [1:18:46<3:44:19, 5.51s/it] {'loss': 0.5762, 'learning_rate': 8.012995499586142e-06, 'epoch': 0.58} 58%|█████▊ | 3330/5773 [1:18:46<3:44:19, 5.51s/it] {'loss': 0.5762, 'learning_rate': 8.012995499586142e-06, 'epoch': 0.58} 58%|█████▊ | 3330/5773 [1:18:44<3:44:19, 5.51s/it] 58%|█████▊ | 3331/5773 [1:18:51<3:41:47, 5.45s/it] 58%|█████▊ | 3331/5773 [1:18:49<3:41:48, 5.45s/it] {'loss': 0.571, 'learning_rate': 8.00749670542462e-06, 'epoch': 0.58} 58%|█████▊ | 3331/5773 [1:18:51<3:41:47, 5.45s/it] {'loss': 0.571, 'learning_rate': 8.00749670542462e-06, 'epoch': 0.58} 58%|█████▊ | 3331/5773 [1:18:49<3:41:48, 5.45s/it] 58%|█████▊ | 3332/5773 [1:18:55<3:41:24, 5.44s/it] 58%|█████▊ | 3332/5773 [1:18:57<3:41:23, 5.44s/it] {'loss': 0.5714, 'learning_rate': 8.001998538566794e-06, 'epoch': 0.58} 58%|█████▊ | 3332/5773 [1:18:57<3:41:23, 5.44s/it] {'loss': 0.5714, 'learning_rate': 8.001998538566794e-06, 'epoch': 0.58} 58%|█████▊ | 3332/5773 [1:18:55<3:41:24, 5.44s/it] 58%|█████▊ | 3333/5773 [1:19:00<3:42:22, 5.47s/it] 58%|█████▊ | 3333/5773 [1:19:02<3:42:22, 5.47s/it] {'loss': 0.5523, 'learning_rate': 7.996501000743668e-06, 'epoch': 0.58} 58%|█████▊ | 3333/5773 [1:19:02<3:42:22, 5.47s/it] {'loss': 0.5523, 'learning_rate': 7.996501000743668e-06, 'epoch': 0.58} 58%|█████▊ | 3333/5773 [1:19:00<3:42:22, 5.47s/it] 58%|█████▊ | 3334/5773 [1:19:07<3:40:50, 5.43s/it] 58%|█████▊ | 3334/5773 [1:19:05<3:40:50, 5.43s/it] {'loss': 0.5757, 'learning_rate': 7.991004093686035e-06, 'epoch': 0.58} 58%|█████▊ | 3334/5773 [1:19:07<3:40:50, 5.43s/it] {'loss': 0.5757, 'learning_rate': 7.991004093686035e-06, 'epoch': 0.58} 58%|█████▊ | 3334/5773 [1:19:05<3:40:50, 5.43s/it] 58%|█████▊ | 3335/5773 [1:19:11<3:39:43, 5.41s/it] 58%|█████▊ | 3335/5773 [1:19:13<3:39:43, 5.41s/it] {'loss': 0.5714, 
'learning_rate': 7.985507819124502e-06, 'epoch': 0.58} 58%|█████▊ | 3335/5773 [1:19:13<3:39:43, 5.41s/it] {'loss': 0.5714, 'learning_rate': 7.985507819124502e-06, 'epoch': 0.58} 58%|█████▊ | 3335/5773 [1:19:11<3:39:43, 5.41s/it] 58%|█████▊ | 3336/5773 [1:19:16<3:40:44, 5.43s/it] 58%|█████▊ | 3336/5773 [1:19:18<3:40:44, 5.43s/it] {'loss': 0.5637, 'learning_rate': 7.980012178789467e-06, 'epoch': 0.58} 58%|█████▊ | 3336/5773 [1:19:18<3:40:44, 5.43s/it] {'loss': 0.5637, 'learning_rate': 7.980012178789467e-06, 'epoch': 0.58} 58%|█████▊ | 3336/5773 [1:19:16<3:40:44, 5.43s/it] 58%|█████▊ | 3337/5773 [1:19:22<3:38:58, 5.39s/it] 58%|█████▊ | 3337/5773 [1:19:24<3:38:58, 5.39s/it] {'loss': 0.5597, 'learning_rate': 7.974517174411139e-06, 'epoch': 0.58} 58%|█████▊ | 3337/5773 [1:19:24<3:38:58, 5.39s/it] {'loss': 0.5597, 'learning_rate': 7.974517174411139e-06, 'epoch': 0.58} 58%|█████▊ | 3337/5773 [1:19:22<3:38:58, 5.39s/it] 58%|█████▊ | 3338/5773 [1:19:27<3:44:12, 5.52s/it] 58%|█████▊ | 3338/5773 [1:19:29<3:44:11, 5.52s/it] {'loss': 0.5648, 'learning_rate': 7.969022807719519e-06, 'epoch': 0.58} 58%|█████▊ | 3338/5773 [1:19:29<3:44:11, 5.52s/it] {'loss': 0.5648, 'learning_rate': 7.969022807719519e-06, 'epoch': 0.58} 58%|█████▊ | 3338/5773 [1:19:27<3:44:12, 5.52s/it] 58%|█████▊ | 3339/5773 [1:19:33<3:42:44, 5.49s/it] 58%|█████▊ | 3339/5773 [1:19:35<3:42:44, 5.49s/it] {'loss': 0.5583, 'learning_rate': 7.963529080444412e-06, 'epoch': 0.58} 58%|█████▊ | 3339/5773 [1:19:35<3:42:44, 5.49s/it] {'loss': 0.5583, 'learning_rate': 7.963529080444412e-06, 'epoch': 0.58} 58%|█████▊ | 3339/5773 [1:19:33<3:42:44, 5.49s/it] 58%|█████▊ | 3340/5773 [1:19:38<3:43:00, 5.50s/it] 58%|█████▊ | 3340/5773 [1:19:40<3:42:59, 5.50s/it] {'loss': 0.5706, 'learning_rate': 7.958035994315409e-06, 'epoch': 0.58} 58%|█████▊ | 3340/5773 [1:19:40<3:42:59, 5.50s/it] {'loss': 0.5706, 'learning_rate': 7.958035994315409e-06, 'epoch': 0.58} 58%|█████▊ | 3340/5773 [1:19:38<3:43:00, 5.50s/it] 58%|█████▊ | 3341/5773 
[1:19:44<3:42:30, 5.49s/it] 58%|█████▊ | 3341/5773 [1:19:46<3:42:30, 5.49s/it] {'loss': 0.5548, 'learning_rate': 7.952543551061917e-06, 'epoch': 0.58} 58%|█████▊ | 3341/5773 [1:19:46<3:42:30, 5.49s/it] {'loss': 0.5548, 'learning_rate': 7.952543551061917e-06, 'epoch': 0.58} 58%|█████▊ | 3341/5773 [1:19:44<3:42:30, 5.49s/it] 58%|█████▊ | 3342/5773 [1:19:49<3:43:07, 5.51s/it] 58%|█████▊ | 3342/5773 [1:19:51<3:43:07, 5.51s/it] {'loss': 0.5831, 'learning_rate': 7.947051752413131e-06, 'epoch': 0.58} 58%|█████▊ | 3342/5773 [1:19:51<3:43:07, 5.51s/it] {'loss': 0.5831, 'learning_rate': 7.947051752413131e-06, 'epoch': 0.58} 58%|█████▊ | 3342/5773 [1:19:49<3:43:07, 5.51s/it] 58%|█████▊ | 3343/5773 [1:19:55<3:42:53, 5.50s/it] 58%|█████▊ | 3343/5773 [1:19:57<3:42:53, 5.50s/it] {'loss': 0.5817, 'learning_rate': 7.941560600098045e-06, 'epoch': 0.58} 58%|█████▊ | 3343/5773 [1:19:57<3:42:53, 5.50s/it] {'loss': 0.5817, 'learning_rate': 7.941560600098045e-06, 'epoch': 0.58} 58%|█████▊ | 3343/5773 [1:19:55<3:42:53, 5.50s/it] 58%|█████▊ | 3344/5773 [1:20:00<3:43:00, 5.51s/it] 58%|█████▊ | 3344/5773 [1:20:02<3:42:59, 5.51s/it] {'loss': 0.554, 'learning_rate': 7.936070095845447e-06, 'epoch': 0.58} 58%|█████▊ | 3344/5773 [1:20:02<3:42:59, 5.51s/it] {'loss': 0.554, 'learning_rate': 7.936070095845447e-06, 'epoch': 0.58} 58%|█████▊ | 3344/5773 [1:20:00<3:43:00, 5.51s/it] 58%|█████▊ | 3345/5773 [1:20:06<3:42:16, 5.49s/it] 58%|█████▊ | 3345/5773 [1:20:08<3:42:16, 5.49s/it] {'loss': 0.5876, 'learning_rate': 7.930580241383924e-06, 'epoch': 0.58} 58%|█████▊ | 3345/5773 [1:20:08<3:42:16, 5.49s/it] {'loss': 0.5876, 'learning_rate': 7.930580241383924e-06, 'epoch': 0.58} 58%|█████▊ | 3345/5773 [1:20:06<3:42:16, 5.49s/it] 58%|█████▊ | 3346/5773 [1:20:11<3:39:52, 5.44s/it] 58%|█████▊ | 3346/5773 [1:20:13<3:39:52, 5.44s/it] {'loss': 0.5694, 'learning_rate': 7.925091038441864e-06, 'epoch': 0.58} 58%|█████▊ | 3346/5773 [1:20:13<3:39:52, 5.44s/it] {'loss': 0.5694, 'learning_rate': 7.925091038441864e-06, 
'epoch': 0.58} 58%|█████▊ | 3346/5773 [1:20:11<3:39:52, 5.44s/it] 58%|█████▊ | 3347/5773 [1:20:17<3:40:01, 5.44s/it] 58%|█████▊ | 3347/5773 [1:20:19<3:40:01, 5.44s/it] {'loss': 0.5586, 'learning_rate': 7.919602488747433e-06, 'epoch': 0.58} 58%|█████▊ | 3347/5773 [1:20:19<3:40:01, 5.44s/it] {'loss': 0.5586, 'learning_rate': 7.919602488747433e-06, 'epoch': 0.58} 58%|█████▊ | 3347/5773 [1:20:17<3:40:01, 5.44s/it] 58%|█████▊ | 3348/5773 [1:20:22<3:38:45, 5.41s/it] 58%|█████▊ | 3348/5773 [1:20:24<3:38:45, 5.41s/it] {'loss': 0.5739, 'learning_rate': 7.91411459402861e-06, 'epoch': 0.58} 58%|█████▊ | 3348/5773 [1:20:24<3:38:45, 5.41s/it] {'loss': 0.5739, 'learning_rate': 7.91411459402861e-06, 'epoch': 0.58} 58%|█████▊ | 3348/5773 [1:20:22<3:38:45, 5.41s/it] 58%|█████▊ | 3349/5773 [1:20:28<3:40:51, 5.47s/it] 58%|█████▊ | 3349/5773 [1:20:30<3:40:52, 5.47s/it] {'loss': 0.563, 'learning_rate': 7.908627356013153e-06, 'epoch': 0.58} 58%|█████▊ | 3349/5773 [1:20:30<3:40:52, 5.47s/it] {'loss': 0.563, 'learning_rate': 7.908627356013153e-06, 'epoch': 0.58} 58%|█████▊ | 3349/5773 [1:20:28<3:40:51, 5.47s/it]11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 014 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 58%|█████▊ | 3350/5773 [1:20:33<3:40:54, 5.47s/it]1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 58%|█████▊ | 3350/5773 [1:20:35<3:40:54, 5.47s/it]3 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.581, 'learning_rate': 7.903140776428624e-06, 'epoch': 0.58} 58%|█████▊ | 3350/5773 [1:20:35<3:40:54, 5.47s/it] {'loss': 0.581, 'learning_rate': 7.903140776428624e-06, 'epoch': 0.58} 58%|█████▊ | 3350/5773 [1:20:33<3:40:54, 5.47s/it] 58%|█████▊ | 3351/5773 [1:20:39<3:44:07, 5.55s/it] 58%|█████▊ | 3351/5773 [1:20:41<3:44:07, 5.55s/it] {'loss': 0.5671, 'learning_rate': 7.897654857002374e-06, 'epoch': 0.58} 58%|█████▊ | 3351/5773 [1:20:39<3:44:07, 5.55s/it] {'loss': 0.5671, 'learning_rate': 7.897654857002374e-06, 'epoch': 0.58} 58%|█████▊ | 3351/5773 [1:20:41<3:44:07, 5.55s/it] 58%|█████▊ | 3352/5773 [1:20:44<3:43:17, 5.53s/it] 58%|█████▊ | 3352/5773 [1:20:46<3:43:17, 5.53s/it] {'loss': 0.5526, 'learning_rate': 7.892169599461545e-06, 'epoch': 0.58} 58%|█████▊ | 3352/5773 [1:20:44<3:43:17, 5.53s/it] {'loss': 0.5526, 'learning_rate': 7.892169599461545e-06, 'epoch': 0.58} 58%|█████▊ | 3352/5773 [1:20:46<3:43:17, 5.53s/it] 58%|█████▊ | 3353/5773 [1:20:50<3:43:18, 5.54s/it] 58%|█████▊ | 3353/5773 [1:20:52<3:43:17, 5.54s/it] {'loss': 0.5492, 'learning_rate': 7.886685005533073e-06, 'epoch': 0.58} 58%|█████▊ | 3353/5773 [1:20:52<3:43:17, 5.54s/it] {'loss': 0.5492, 'learning_rate': 7.886685005533073e-06, 'epoch': 0.58} 58%|█████▊ | 3353/5773 [1:20:50<3:43:18, 5.54s/it] 58%|█████▊ | 3354/5773 [1:20:55<3:41:31, 5.49s/it] 58%|█████▊ | 3354/5773 [1:20:57<3:41:32, 5.49s/it] {'loss': 0.5714, 'learning_rate': 7.881201076943678e-06, 'epoch': 0.58} 58%|█████▊ | 3354/5773 [1:20:57<3:41:32, 5.49s/it] {'loss': 0.5714, 'learning_rate': 7.881201076943678e-06, 'epoch': 0.58} 58%|█████▊ | 3354/5773 [1:20:55<3:41:31, 5.49s/it] 58%|█████▊ | 3355/5773 [1:21:01<3:40:39, 5.48s/it] 58%|█████▊ | 3355/5773 [1:21:03<3:40:38, 5.48s/it] {'loss': 0.5671, 'learning_rate': 7.87571781541988e-06, 'epoch': 0.58} 58%|█████▊ | 3355/5773 [1:21:03<3:40:38, 5.48s/it] {'loss': 0.5671, 'learning_rate': 7.87571781541988e-06, 'epoch': 0.58} 58%|█████▊ | 3355/5773 [1:21:01<3:40:39, 5.48s/it] 58%|█████▊ | 
3356/5773 [1:21:06<3:40:59, 5.49s/it] 58%|█████▊ | 3356/5773 [1:21:08<3:40:59, 5.49s/it] {'loss': 0.5877, 'learning_rate': 7.870235222687983e-06, 'epoch': 0.58} 58%|█████▊ | 3356/5773 [1:21:08<3:40:59, 5.49s/it] {'loss': 0.5877, 'learning_rate': 7.870235222687983e-06, 'epoch': 0.58} 58%|█████▊ | 3356/5773 [1:21:06<3:40:59, 5.49s/it] 58%|█████▊ | 3357/5773 [1:21:12<3:42:52, 5.54s/it] 58%|█████▊ | 3357/5773 [1:21:14<3:42:52, 5.54s/it] {'loss': 0.5654, 'learning_rate': 7.864753300474084e-06, 'epoch': 0.58} 58%|█████▊ | 3357/5773 [1:21:14<3:42:52, 5.54s/it] {'loss': 0.5654, 'learning_rate': 7.864753300474084e-06, 'epoch': 0.58} 58%|█████▊ | 3357/5773 [1:21:12<3:42:52, 5.54s/it] 58%|█████▊ | 3358/5773 [1:21:17<3:40:23, 5.48s/it] 58%|█████▊ | 3358/5773 [1:21:19<3:40:23, 5.48s/it] {'loss': 0.562, 'learning_rate': 7.859272050504065e-06, 'epoch': 0.58} 58%|█████▊ | 3358/5773 [1:21:19<3:40:23, 5.48s/it] {'loss': 0.562, 'learning_rate': 7.859272050504065e-06, 'epoch': 0.58} 58%|█████▊ | 3358/5773 [1:21:17<3:40:23, 5.48s/it] 58%|█████▊ | 3359/5773 [1:21:22<3:39:06, 5.45s/it] 58%|█████▊ | 3359/5773 [1:21:24<3:39:06, 5.45s/it] {'loss': 0.5746, 'learning_rate': 7.853791474503603e-06, 'epoch': 0.58} 58%|█████▊ | 3359/5773 [1:21:24<3:39:06, 5.45s/it] {'loss': 0.5746, 'learning_rate': 7.853791474503603e-06, 'epoch': 0.58} 58%|█████▊ | 3359/5773 [1:21:23<3:39:06, 5.45s/it] 58%|█████▊ | 3360/5773 [1:21:28<3:39:59, 5.47s/it] 58%|█████▊ | 3360/5773 [1:21:30<3:39:59, 5.47s/it] {'loss': 0.595, 'learning_rate': 7.848311574198154e-06, 'epoch': 0.58} 58%|█████▊ | 3360/5773 [1:21:30<3:39:59, 5.47s/it] {'loss': 0.595, 'learning_rate': 7.848311574198154e-06, 'epoch': 0.58} 58%|█████▊ | 3360/5773 [1:21:28<3:39:59, 5.47s/it] 58%|█████▊ | 3361/5773 [1:21:33<3:39:03, 5.45s/it] 58%|█████▊ | 3361/5773 [1:21:35<3:39:03, 5.45s/it] {'loss': 0.5517, 'learning_rate': 7.842832351312971e-06, 'epoch': 0.58} 58%|█████▊ | 3361/5773 [1:21:35<3:39:03, 5.45s/it] {'loss': 0.5517, 'learning_rate': 
7.842832351312971e-06, 'epoch': 0.58} 58%|█████▊ | 3361/5773 [1:21:33<3:39:03, 5.45s/it] 58%|█████▊ | 3362/5773 [1:21:39<3:39:31, 5.46s/it] 58%|█████▊ | 3362/5773 [1:21:41<3:39:31, 5.46s/it] {'loss': 0.5751, 'learning_rate': 7.83735380757308e-06, 'epoch': 0.58} 58%|█████▊ | 3362/5773 [1:21:41<3:39:31, 5.46s/it] {'loss': 0.5751, 'learning_rate': 7.83735380757308e-06, 'epoch': 0.58} 58%|█████▊ | 3362/5773 [1:21:39<3:39:31, 5.46s/it] 58%|█████▊ | 3363/5773 [1:21:44<3:39:58, 5.48s/it] 58%|█████▊ | 3363/5773 [1:21:46<3:39:57, 5.48s/it] {'loss': 0.5769, 'learning_rate': 7.831875944703307e-06, 'epoch': 0.58} 58%|█████▊ | 3363/5773 [1:21:46<3:39:57, 5.48s/it] {'loss': 0.5769, 'learning_rate': 7.831875944703307e-06, 'epoch': 0.58} 58%|█████▊ | 3363/5773 [1:21:44<3:39:58, 5.48s/it] 58%|█████▊ | 3364/5773 [1:21:50<3:38:55, 5.45s/it] 58%|█████▊ | 3364/5773 [1:21:52<3:38:55, 5.45s/it] {'loss': 0.5709, 'learning_rate': 7.826398764428261e-06, 'epoch': 0.58} 58%|█████▊ | 3364/5773 [1:21:52<3:38:55, 5.45s/it] {'loss': 0.5709, 'learning_rate': 7.826398764428261e-06, 'epoch': 0.58} 58%|█████▊ | 3364/5773 [1:21:50<3:38:55, 5.45s/it] 58%|█████▊ | 3365/5773 [1:21:55<3:38:30, 5.44s/it] 58%|█████▊ | 3365/5773 [1:21:57<3:38:30, 5.44s/it] {'loss': 0.5747, 'learning_rate': 7.820922268472326e-06, 'epoch': 0.58} 58%|█████▊ | 3365/5773 [1:21:57<3:38:30, 5.44s/it] {'loss': 0.5747, 'learning_rate': 7.820922268472326e-06, 'epoch': 0.58} 58%|█████▊ | 3365/5773 [1:21:55<3:38:30, 5.44s/it] 58%|█████▊ | 3366/5773 [1:22:01<3:38:38, 5.45s/it] 58%|█████▊ | 3366/5773 [1:22:03<3:38:38, 5.45s/it] {'loss': 0.5763, 'learning_rate': 7.815446458559686e-06, 'epoch': 0.58} 58%|█████▊ | 3366/5773 [1:22:03<3:38:38, 5.45s/it] {'loss': 0.5763, 'learning_rate': 7.815446458559686e-06, 'epoch': 0.58} 58%|█████▊ | 3366/5773 [1:22:01<3:38:38, 5.45s/it] 58%|█████▊ | 3367/5773 [1:22:08<3:38:19, 5.44s/it] 58%|█████▊ | 3367/5773 [1:22:06<3:38:19, 5.44s/it] {'loss': 0.5822, 'learning_rate': 7.809971336414297e-06, 'epoch': 
0.58} 58%|█████▊ | 3367/5773 [1:22:08<3:38:19, 5.44s/it] {'loss': 0.5822, 'learning_rate': 7.809971336414297e-06, 'epoch': 0.58} 58%|█████▊ | 3367/5773 [1:22:06<3:38:19, 5.44s/it] 58%|█████▊ | 3368/5773 [1:22:14<3:39:06, 5.47s/it] 58%|█████▊ | 3368/5773 [1:22:12<3:39:06, 5.47s/it] {'loss': 0.5815, 'learning_rate': 7.8044969037599e-06, 'epoch': 0.58} 58%|█████▊ | 3368/5773 [1:22:14<3:39:06, 5.47s/it] {'loss': 0.5815, 'learning_rate': 7.8044969037599e-06, 'epoch': 0.58} 58%|█████▊ | 3368/5773 [1:22:12<3:39:06, 5.47s/it] 58%|█████▊ | 3369/5773 [1:22:19<3:41:42, 5.53s/it] 58%|█████▊ | 3369/5773 [1:22:17<3:41:43, 5.53s/it] {'loss': 0.553, 'learning_rate': 7.799023162320023e-06, 'epoch': 0.58} 58%|█████▊ | 3369/5773 [1:22:19<3:41:42, 5.53s/it] {'loss': 0.553, 'learning_rate': 7.799023162320023e-06, 'epoch': 0.58} 58%|█████▊ | 3369/5773 [1:22:17<3:41:43, 5.53s/it] 58%|█████▊ | 3370/5773 [1:22:25<3:39:57, 5.49s/it] 58%|█████▊ | 3370/5773 [1:22:23<3:39:57, 5.49s/it] {'loss': 0.5762, 'learning_rate': 7.793550113817976e-06, 'epoch': 0.58} 58%|█████▊ | 3370/5773 [1:22:25<3:39:57, 5.49s/it] {'loss': 0.5762, 'learning_rate': 7.793550113817976e-06, 'epoch': 0.58} 58%|█████▊ | 3370/5773 [1:22:23<3:39:57, 5.49s/it] 58%|█████▊ | 3371/5773 [1:22:30<3:40:44, 5.51s/it] 58%|█████▊ | 3371/5773 [1:22:28<3:40:45, 5.51s/it] {'loss': 0.5638, 'learning_rate': 7.788077759976849e-06, 'epoch': 0.58} 58%|█████▊ | 3371/5773 [1:22:30<3:40:44, 5.51s/it] {'loss': 0.5638, 'learning_rate': 7.788077759976849e-06, 'epoch': 0.58} 58%|█████▊ | 3371/5773 [1:22:28<3:40:45, 5.51s/it] 58%|█████▊ | 3372/5773 [1:22:36<3:40:31, 5.51s/it] 58%|█████▊ | 3372/5773 [1:22:34<3:40:31, 5.51s/it] {'loss': 0.5797, 'learning_rate': 7.782606102519513e-06, 'epoch': 0.58} 58%|█████▊ | 3372/5773 [1:22:36<3:40:31, 5.51s/it] {'loss': 0.5797, 'learning_rate': 7.782606102519513e-06, 'epoch': 0.58} 58%|█████▊ | 3372/5773 [1:22:34<3:40:31, 5.51s/it] 58%|█████▊ | 3373/5773 [1:22:42<3:44:13, 5.61s/it] 58%|█████▊ | 3373/5773 
[1:22:40<3:44:13, 5.61s/it] {'loss': 0.5522, 'learning_rate': 7.777135143168622e-06, 'epoch': 0.58} 58%|█████▊ | 3373/5773 [1:22:42<3:44:13, 5.61s/it] {'loss': 0.5522, 'learning_rate': 7.777135143168622e-06, 'epoch': 0.58} 58%|█████▊ | 3373/5773 [1:22:40<3:44:13, 5.61s/it] 58%|█████▊ | 3374/5773 [1:22:47<3:43:23, 5.59s/it] 58%|█████▊ | 3374/5773 [1:22:45<3:43:22, 5.59s/it] {'loss': 0.5759, 'learning_rate': 7.771664883646607e-06, 'epoch': 0.58} 58%|█████▊ | 3374/5773 [1:22:47<3:43:23, 5.59s/it] {'loss': 0.5759, 'learning_rate': 7.771664883646607e-06, 'epoch': 0.58} 58%|█████▊ | 3374/5773 [1:22:45<3:43:22, 5.59s/it] 58%|█████▊ | 3375/5773 [1:22:51<3:41:02, 5.53s/it] 58%|█████▊ | 3375/5773 [1:22:53<3:41:03, 5.53s/it] {'loss': 0.574, 'learning_rate': 7.766195325675681e-06, 'epoch': 0.58} 58%|█████▊ | 3375/5773 [1:22:53<3:41:03, 5.53s/it] {'loss': 0.574, 'learning_rate': 7.766195325675681e-06, 'epoch': 0.58} 58%|█████▊ | 3375/5773 [1:22:51<3:41:02, 5.53s/it] 58%|█████▊ | 3376/5773 [1:22:56<3:38:46, 5.48s/it] 58%|█████▊ | 3376/5773 [1:22:58<3:38:46, 5.48s/it] {'loss': 0.5699, 'learning_rate': 7.760726470977837e-06, 'epoch': 0.58} 58%|█████▊ | 3376/5773 [1:22:58<3:38:46, 5.48s/it] {'loss': 0.5699, 'learning_rate': 7.760726470977837e-06, 'epoch': 0.58} 58%|█████▊ | 3376/5773 [1:22:56<3:38:46, 5.48s/it] 58%|█████▊ | 3377/5773 [1:23:03<3:38:42, 5.48s/it] 58%|█████▊ | 3377/5773 [1:23:01<3:38:43, 5.48s/it] {'loss': 0.5658, 'learning_rate': 7.755258321274845e-06, 'epoch': 0.58} 58%|█████▊ | 3377/5773 [1:23:03<3:38:42, 5.48s/it] {'loss': 0.5658, 'learning_rate': 7.755258321274845e-06, 'epoch': 0.58} 58%|█████▊ | 3377/5773 [1:23:01<3:38:43, 5.48s/it] 59%|█████▊ | 3378/5773 [1:23:09<3:39:02, 5.49s/it] 59%|█████▊ | 3378/5773 [1:23:07<3:39:02, 5.49s/it] {'loss': 0.5786, 'learning_rate': 7.74979087828825e-06, 'epoch': 0.59} 59%|█████▊ | 3378/5773 [1:23:09<3:39:02, 5.49s/it] {'loss': 0.5786, 'learning_rate': 7.74979087828825e-06, 'epoch': 0.59} 59%|█████▊ | 3378/5773 [1:23:07<3:39:02, 
5.49s/it] 59%|█████▊ | 3379/5773 [1:23:14<3:37:54, 5.46s/it] 59%|█████▊ | 3379/5773 [1:23:12<3:37:54, 5.46s/it] {'loss': 0.556, 'learning_rate': 7.744324143739384e-06, 'epoch': 0.59} 59%|█████▊ | 3379/5773 [1:23:14<3:37:54, 5.46s/it] {'loss': 0.556, 'learning_rate': 7.744324143739384e-06, 'epoch': 0.59} 59%|█████▊ | 3379/5773 [1:23:12<3:37:54, 5.46s/it] 59%|█████▊ | 3380/5773 [1:23:20<3:39:43, 5.51s/it] 59%|█████▊ | 3380/5773 [1:23:18<3:39:43, 5.51s/it] {'loss': 0.5755, 'learning_rate': 7.738858119349341e-06, 'epoch': 0.59} 59%|█████▊ | 3380/5773 [1:23:20<3:39:43, 5.51s/it] {'loss': 0.5755, 'learning_rate': 7.738858119349341e-06, 'epoch': 0.59} 59%|█████▊ | 3380/5773 [1:23:18<3:39:43, 5.51s/it] 59%|█████▊ | 3381/5773 [1:23:25<3:38:50, 5.49s/it] 59%|█████▊ | 3381/5773 [1:23:23<3:38:50, 5.49s/it] {'loss': 0.5653, 'learning_rate': 7.733392806839012e-06, 'epoch': 0.59} 59%|█████▊ | 3381/5773 [1:23:25<3:38:50, 5.49s/it] {'loss': 0.5653, 'learning_rate': 7.733392806839012e-06, 'epoch': 0.59} 59%|█████▊ | 3381/5773 [1:23:23<3:38:50, 5.49s/it] 59%|█████▊ | 3382/5773 [1:23:29<3:37:47, 5.47s/it] 59%|█████▊ | 3382/5773 [1:23:31<3:37:48, 5.47s/it] {'loss': 0.5636, 'learning_rate': 7.727928207929039e-06, 'epoch': 0.59} 59%|█████▊ | 3382/5773 [1:23:29<3:37:47, 5.47s/it]{'loss': 0.5636, 'learning_rate': 7.727928207929039e-06, 'epoch': 0.59} 59%|█████▊ | 3382/5773 [1:23:31<3:37:48, 5.47s/it] 59%|█████▊ | 3383/5773 [1:23:36<3:38:17, 5.48s/it] 59%|█████▊ | 3383/5773 [1:23:34<3:38:18, 5.48s/it] {'loss': 0.5575, 'learning_rate': 7.722464324339862e-06, 'epoch': 0.59} 59%|█████▊ | 3383/5773 [1:23:36<3:38:17, 5.48s/it] {'loss': 0.5575, 'learning_rate': 7.722464324339862e-06, 'epoch': 0.59} 59%|█████▊ | 3383/5773 [1:23:34<3:38:18, 5.48s/it] 59%|█████▊ | 3384/5773 [1:23:42<3:39:58, 5.52s/it] 59%|█████▊ | 3384/5773 [1:23:40<3:39:58, 5.52s/it] {'loss': 0.5741, 'learning_rate': 7.71700115779168e-06, 'epoch': 0.59} 59%|█████▊ | 3384/5773 [1:23:42<3:39:58, 5.52s/it] {'loss': 0.5741, 
'learning_rate': 7.71700115779168e-06, 'epoch': 0.59} 59%|█████▊ | 3384/5773 [1:23:40<3:39:58, 5.52s/it] 59%|█████▊ | 3385/5773 [1:23:47<3:39:16, 5.51s/it] 59%|█████▊ | 3385/5773 [1:23:45<3:39:16, 5.51s/it] {'loss': 0.5674, 'learning_rate': 7.711538710004476e-06, 'epoch': 0.59} 59%|█████▊ | 3385/5773 [1:23:47<3:39:16, 5.51s/it] {'loss': 0.5674, 'learning_rate': 7.711538710004476e-06, 'epoch': 0.59} 59%|█████▊ | 3385/5773 [1:23:45<3:39:16, 5.51s/it] 59%|█████▊ | 3386/5773 [1:23:53<3:39:48, 5.53s/it] 59%|█████▊ | 3386/5773 [1:23:51<3:39:48, 5.53s/it] {'loss': 0.5906, 'learning_rate': 7.706076982698e-06, 'epoch': 0.59} 59%|█████▊ | 3386/5773 [1:23:53<3:39:48, 5.53s/it] {'loss': 0.5906, 'learning_rate': 7.706076982698e-06, 'epoch': 0.59} 59%|█████▊ | 3386/5773 [1:23:51<3:39:48, 5.53s/it] 59%|█████▊ | 3387/5773 [1:23:59<3:40:23, 5.54s/it] 59%|█████▊ | 3387/5773 [1:23:57<3:40:23, 5.54s/it] {'loss': 0.5541, 'learning_rate': 7.700615977591782e-06, 'epoch': 0.59} 59%|█████▊ | 3387/5773 [1:23:59<3:40:23, 5.54s/it] {'loss': 0.5541, 'learning_rate': 7.700615977591782e-06, 'epoch': 0.59} 59%|█████▊ | 3387/5773 [1:23:57<3:40:23, 5.54s/it] 59%|█████▊ | 3388/5773 [1:24:04<3:40:08, 5.54s/it] 59%|█████▊ | 3388/5773 [1:24:02<3:40:09, 5.54s/it] {'loss': 0.5689, 'learning_rate': 7.69515569640512e-06, 'epoch': 0.59} 59%|█████▊ | 3388/5773 [1:24:04<3:40:08, 5.54s/it] {'loss': 0.5689, 'learning_rate': 7.69515569640512e-06, 'epoch': 0.59} 59%|█████▊ | 3388/5773 [1:24:02<3:40:09, 5.54s/it] 59%|█████▊ | 3389/5773 [1:24:10<3:39:45, 5.53s/it] 59%|█████▊ | 3389/5773 [1:24:08<3:39:45, 5.53s/it] {'loss': 0.5593, 'learning_rate': 7.68969614085708e-06, 'epoch': 0.59} 59%|█████▊ | 3389/5773 [1:24:10<3:39:45, 5.53s/it] {'loss': 0.5593, 'learning_rate': 7.68969614085708e-06, 'epoch': 0.59} 59%|█████▊ | 3389/5773 [1:24:08<3:39:45, 5.53s/it] 59%|█████▊ | 3390/5773 [1:24:15<3:41:30, 5.58s/it] 59%|█████▊ | 3390/5773 [1:24:13<3:41:29, 5.58s/it] {'loss': 0.567, 'learning_rate': 7.684237312666513e-06, 
'epoch': 0.59} 59%|█████▊ | 3390/5773 [1:24:15<3:41:30, 5.58s/it] {'loss': 0.567, 'learning_rate': 7.684237312666513e-06, 'epoch': 0.59} 59%|█████▊ | 3390/5773 [1:24:13<3:41:29, 5.58s/it] 59%|█████▊ | 3391/5773 [1:24:21<3:41:55, 5.59s/it] 59%|█████▊ | 3391/5773 [1:24:19<3:41:55, 5.59s/it] {'loss': 0.556, 'learning_rate': 7.678779213552025e-06, 'epoch': 0.59} 59%|█████▊ | 3391/5773 [1:24:21<3:41:55, 5.59s/it] {'loss': 0.556, 'learning_rate': 7.678779213552025e-06, 'epoch': 0.59} 59%|█████▊ | 3391/5773 [1:24:19<3:41:55, 5.59s/it] 59%|█████▉ | 3392/5773 [1:24:26<3:41:09, 5.57s/it] 59%|█████▉ | 3392/5773 [1:24:24<3:41:10, 5.57s/it] {'loss': 0.58, 'learning_rate': 7.673321845232005e-06, 'epoch': 0.59} 59%|█████▉ | 3392/5773 [1:24:26<3:41:09, 5.57s/it] {'loss': 0.58, 'learning_rate': 7.673321845232005e-06, 'epoch': 0.59} 59%|█████▉ | 3392/5773 [1:24:24<3:41:10, 5.57s/it] 59%|█████▉ | 3393/5773 [1:24:32<3:38:52, 5.52s/it] 59%|█████▉ | 3393/5773 [1:24:30<3:38:52, 5.52s/it] {'loss': 0.5649, 'learning_rate': 7.667865209424603e-06, 'epoch': 0.59} 59%|█████▉ | 3393/5773 [1:24:32<3:38:52, 5.52s/it] {'loss': 0.5649, 'learning_rate': 7.667865209424603e-06, 'epoch': 0.59} 59%|█████▉ | 3393/5773 [1:24:30<3:38:52, 5.52s/it] 59%|█████▉ | 3394/5773 [1:24:37<3:37:54, 5.50s/it] 59%|█████▉ | 3394/5773 [1:24:35<3:37:53, 5.50s/it] {'loss': 0.5899, 'learning_rate': 7.662409307847744e-06, 'epoch': 0.59} 59%|█████▉ | 3394/5773 [1:24:37<3:37:54, 5.50s/it] {'loss': 0.5899, 'learning_rate': 7.662409307847744e-06, 'epoch': 0.59} 59%|█████▉ | 3394/5773 [1:24:35<3:37:53, 5.50s/it] 59%|█████▉ | 3395/5773 [1:24:43<3:40:31, 5.56s/it] 59%|█████▉ | 3395/5773 [1:24:41<3:40:32, 5.56s/it] {'loss': 0.5613, 'learning_rate': 7.656954142219126e-06, 'epoch': 0.59} 59%|█████▉ | 3395/5773 [1:24:43<3:40:31, 5.56s/it] {'loss': 0.5613, 'learning_rate': 7.656954142219126e-06, 'epoch': 0.59} 59%|█████▉ | 3395/5773 [1:24:41<3:40:32, 5.56s/it] 59%|█████▉ | 3396/5773 [1:24:49<3:40:33, 5.57s/it] 59%|█████▉ | 3396/5773 
[1:24:47<3:40:33, 5.57s/it] {'loss': 0.5643, 'learning_rate': 7.651499714256201e-06, 'epoch': 0.59} 59%|█████▉ | 3396/5773 [1:24:49<3:40:33, 5.57s/it] {'loss': 0.5643, 'learning_rate': 7.651499714256201e-06, 'epoch': 0.59} 59%|█████▉ | 3396/5773 [1:24:47<3:40:33, 5.57s/it] 59%|█████▉ | 3397/5773 [1:24:54<3:38:44, 5.52s/it] 59%|█████▉ | 3397/5773 [1:24:52<3:38:44, 5.52s/it] {'loss': 0.5498, 'learning_rate': 7.646046025676198e-06, 'epoch': 0.59} 59%|█████▉ | 3397/5773 [1:24:54<3:38:44, 5.52s/it] {'loss': 0.5498, 'learning_rate': 7.646046025676198e-06, 'epoch': 0.59} 59%|█████▉ | 3397/5773 [1:24:52<3:38:44, 5.52s/it] 59%|█████▉ | 3398/5773 [1:25:00<3:38:57, 5.53s/it] 59%|█████▉ | 3398/5773 [1:24:58<3:38:56, 5.53s/it] {'loss': 0.5699, 'learning_rate': 7.640593078196117e-06, 'epoch': 0.59} 59%|█████▉ | 3398/5773 [1:25:00<3:38:57, 5.53s/it] {'loss': 0.5699, 'learning_rate': 7.640593078196117e-06, 'epoch': 0.59} 59%|█████▉ | 3398/5773 [1:24:58<3:38:56, 5.53s/it] 59%|█████▉ | 3399/5773 [1:25:05<3:41:35, 5.60s/it] 59%|█████▉ | 3399/5773 [1:25:03<3:41:35, 5.60s/it] {'loss': 0.58, 'learning_rate': 7.635140873532714e-06, 'epoch': 0.59} 59%|█████▉ | 3399/5773 [1:25:05<3:41:35, 5.60s/it] {'loss': 0.58, 'learning_rate': 7.635140873532714e-06, 'epoch': 0.59} 59%|█████▉ | 3399/5773 [1:25:03<3:41:35, 5.60s/it]11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 59%|█████▉ | 3400/5773 [1:25:11<3:40:27, 5.57s/it]910 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
13 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
15 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
59%|█████▉ | 3400/5773 [1:25:09<3:40:27, 5.57s/it]
{'loss': 0.566, 'learning_rate': 7.629689413402522e-06, 'epoch': 0.59}
59%|█████▉ | 3400/5773 [1:25:11<3:40:27, 5.57s/it]
{'loss': 0.566, 'learning_rate': 7.629689413402522e-06, 'epoch': 0.59}
59%|█████▉ | 3400/5773 [1:25:09<3:40:27, 5.57s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 59%|█████▉ | 3401/5773 [1:25:32<6:44:38, 10.24s/it] 59%|█████▉ | 3401/5773 [1:25:30<6:44:37, 10.24s/it] {'loss': 0.565, 'learning_rate': 7.6242386995218355e-06, 'epoch': 0.59} 59%|█████▉ | 3401/5773 [1:25:32<6:44:38, 10.24s/it] {'loss': 0.565, 'learning_rate': 7.6242386995218355e-06, 'epoch': 0.59} 59%|█████▉ | 3401/5773 [1:25:30<6:44:37, 10.24s/it] 59%|█████▉ | 3402/5773 [1:25:37<5:48:12, 8.81s/it] 59%|█████▉ | 3402/5773 [1:25:35<5:48:11, 8.81s/it] {'loss': 0.5809, 'learning_rate': 7.618788733606708e-06, 'epoch': 0.59} 59%|█████▉ | 3402/5773 [1:25:37<5:48:12, 8.81s/it] {'loss': 0.5809, 'learning_rate': 7.618788733606708e-06, 'epoch': 0.59} 59%|█████▉ | 3402/5773 [1:25:35<5:48:11, 8.81s/it] 59%|█████▉ | 3403/5773 [1:25:43<5:08:32, 7.81s/it] 59%|█████▉ | 3403/5773 [1:25:41<5:08:32, 7.81s/it] {'loss': 0.5635, 'learning_rate': 7.61333951737297e-06, 'epoch': 0.59} 59%|█████▉ | 3403/5773 [1:25:43<5:08:32, 7.81s/it] {'loss': 0.5635, 'learning_rate': 7.61333951737297e-06, 'epoch': 0.59} 59%|█████▉ | 3403/5773 [1:25:41<5:08:32, 7.81s/it] 59%|█████▉ | 3404/5773 [1:25:48<4:41:02, 7.12s/it] 59%|█████▉ | 3404/5773 [1:25:46<4:41:02, 7.12s/it] {'loss': 0.5625, 'learning_rate': 7.607891052536203e-06, 'epoch': 0.59} 59%|█████▉ | 3404/5773 [1:25:48<4:41:02, 7.12s/it] {'loss': 0.5625, 'learning_rate': 7.607891052536203e-06, 'epoch': 0.59} 59%|█████▉ | 3404/5773 [1:25:46<4:41:02, 7.12s/it] 59%|█████▉ | 3405/5773 [1:25:54<4:21:41, 6.63s/it] 59%|█████▉ | 3405/5773 [1:25:52<4:21:42, 6.63s/it] {'loss': 0.569, 'learning_rate': 7.602443340811761e-06, 'epoch': 0.59} 59%|█████▉ | 3405/5773 [1:25:54<4:21:41, 6.63s/it] {'loss': 0.569, 'learning_rate': 7.602443340811761e-06, 'epoch': 0.59} 59%|█████▉ | 3405/5773 [1:25:52<4:21:42, 6.63s/it] 59%|█████▉ | 3406/5773 [1:25:59<4:07:54, 6.28s/it] 59%|█████▉ | 3406/5773 [1:25:57<4:07:54, 6.28s/it] {'loss': 0.5479, 'learning_rate': 7.596996383914756e-06, 'epoch': 0.59} 59%|█████▉ | 3406/5773 [1:25:59<4:07:54, 6.28s/it] {'loss': 0.5479, 
'learning_rate': 7.596996383914756e-06, 'epoch': 0.59} 59%|█████▉ | 3406/5773 [1:25:57<4:07:54, 6.28s/it] 59%|█████▉ | 3407/5773 [1:26:05<3:59:30, 6.07s/it] 59%|█████▉ | 3407/5773 [1:26:03<3:59:30, 6.07s/it] {'loss': 0.5683, 'learning_rate': 7.591550183560064e-06, 'epoch': 0.59} 59%|█████▉ | 3407/5773 [1:26:05<3:59:30, 6.07s/it] {'loss': 0.5683, 'learning_rate': 7.591550183560064e-06, 'epoch': 0.59} 59%|█████▉ | 3407/5773 [1:26:03<3:59:30, 6.07s/it] 59%|█████▉ | 3408/5773 [1:26:10<3:50:54, 5.86s/it] 59%|█████▉ | 3408/5773 [1:26:08<3:50:54, 5.86s/it] {'loss': 0.559, 'learning_rate': 7.586104741462326e-06, 'epoch': 0.59} 59%|█████▉ | 3408/5773 [1:26:10<3:50:54, 5.86s/it] {'loss': 0.559, 'learning_rate': 7.586104741462326e-06, 'epoch': 0.59} 59%|█████▉ | 3408/5773 [1:26:08<3:50:54, 5.86s/it] 59%|█████▉ | 3409/5773 [1:26:16<3:46:16, 5.74s/it] 59%|█████▉ | 3409/5773 [1:26:14<3:46:16, 5.74s/it] {'loss': 0.5646, 'learning_rate': 7.58066005933594e-06, 'epoch': 0.59} 59%|█████▉ | 3409/5773 [1:26:16<3:46:16, 5.74s/it] {'loss': 0.5646, 'learning_rate': 7.58066005933594e-06, 'epoch': 0.59} 59%|█████▉ | 3409/5773 [1:26:14<3:46:16, 5.74s/it] 59%|█████▉ | 3410/5773 [1:26:21<3:41:50, 5.63s/it] 59%|█████▉ | 3410/5773 [1:26:19<3:41:50, 5.63s/it] {'loss': 0.5594, 'learning_rate': 7.575216138895062e-06, 'epoch': 0.59} 59%|█████▉ | 3410/5773 [1:26:21<3:41:50, 5.63s/it] {'loss': 0.5594, 'learning_rate': 7.575216138895062e-06, 'epoch': 0.59} 59%|█████▉ | 3410/5773 [1:26:19<3:41:50, 5.63s/it] 59%|█████▉ | 3411/5773 [1:26:27<3:39:51, 5.58s/it] 59%|█████▉ | 3411/5773 [1:26:25<3:39:51, 5.58s/it] {'loss': 0.5737, 'learning_rate': 7.569772981853616e-06, 'epoch': 0.59} 59%|█████▉ | 3411/5773 [1:26:27<3:39:51, 5.58s/it] {'loss': 0.5737, 'learning_rate': 7.569772981853616e-06, 'epoch': 0.59} 59%|█████▉ | 3411/5773 [1:26:25<3:39:51, 5.58s/it] 59%|█████▉ | 3412/5773 [1:26:32<3:37:42, 5.53s/it] 59%|█████▉ | 3412/5773 [1:26:30<3:37:42, 5.53s/it] {'loss': 0.5744, 'learning_rate': 7.56433058992528e-06, 
'epoch': 0.59} 59%|█████▉ | 3412/5773 [1:26:32<3:37:42, 5.53s/it] {'loss': 0.5744, 'learning_rate': 7.56433058992528e-06, 'epoch': 0.59} 59%|█████▉ | 3412/5773 [1:26:30<3:37:42, 5.53s/it] 59%|█████▉ | 3413/5773 [1:26:38<3:37:40, 5.53s/it] 59%|█████▉ | 3413/5773 [1:26:36<3:37:40, 5.53s/it] {'loss': 0.572, 'learning_rate': 7.5588889648234905e-06, 'epoch': 0.59} 59%|█████▉ | 3413/5773 [1:26:38<3:37:40, 5.53s/it] {'loss': 0.572, 'learning_rate': 7.5588889648234905e-06, 'epoch': 0.59} 59%|█████▉ | 3413/5773 [1:26:36<3:37:40, 5.53s/it] 59%|█████▉ | 3414/5773 [1:26:43<3:36:13, 5.50s/it] 59%|█████▉ | 3414/5773 [1:26:41<3:36:13, 5.50s/it] {'loss': 0.5733, 'learning_rate': 7.553448108261452e-06, 'epoch': 0.59} 59%|█████▉ | 3414/5773 [1:26:43<3:36:13, 5.50s/it] {'loss': 0.5733, 'learning_rate': 7.553448108261452e-06, 'epoch': 0.59} 59%|█████▉ | 3414/5773 [1:26:41<3:36:13, 5.50s/it] 59%|█████▉ | 3415/5773 [1:26:47<3:36:20, 5.50s/it] 59%|█████▉ | 3415/5773 [1:26:48<3:36:21, 5.51s/it] {'loss': 0.5712, 'learning_rate': 7.548008021952113e-06, 'epoch': 0.59} 59%|█████▉ | 3415/5773 [1:26:48<3:36:21, 5.51s/it] {'loss': 0.5712, 'learning_rate': 7.548008021952113e-06, 'epoch': 0.59} 59%|█████▉ | 3415/5773 [1:26:47<3:36:20, 5.50s/it] 59%|█████▉ | 3416/5773 [1:26:54<3:36:09, 5.50s/it] 59%|█████▉ | 3416/5773 [1:26:52<3:36:10, 5.50s/it] {'loss': 0.5627, 'learning_rate': 7.542568707608192e-06, 'epoch': 0.59} 59%|█████▉ | 3416/5773 [1:26:54<3:36:09, 5.50s/it] {'loss': 0.5627, 'learning_rate': 7.542568707608192e-06, 'epoch': 0.59} 59%|█████▉ | 3416/5773 [1:26:52<3:36:10, 5.50s/it] 59%|█████▉ | 3417/5773 [1:26:59<3:35:00, 5.48s/it] 59%|█████▉ | 3417/5773 [1:26:57<3:35:01, 5.48s/it] {'loss': 0.5519, 'learning_rate': 7.537130166942154e-06, 'epoch': 0.59} 59%|█████▉ | 3417/5773 [1:26:59<3:35:00, 5.48s/it] {'loss': 0.5519, 'learning_rate': 7.537130166942154e-06, 'epoch': 0.59} 59%|█████▉ | 3417/5773 [1:26:57<3:35:01, 5.48s/it] 59%|█████▉ | 3418/5773 [1:27:05<3:35:57, 5.50s/it] 59%|█████▉ | 
3418/5773 [1:27:03<3:35:58, 5.50s/it] {'loss': 0.5667, 'learning_rate': 7.53169240166623e-06, 'epoch': 0.59} 59%|█████▉ | 3418/5773 [1:27:05<3:35:57, 5.50s/it] {'loss': 0.5667, 'learning_rate': 7.53169240166623e-06, 'epoch': 0.59} 59%|█████▉ | 3418/5773 [1:27:03<3:35:58, 5.50s/it] 59%|█████▉ | 3419/5773 [1:27:11<3:36:27, 5.52s/it] 59%|█████▉ | 3419/5773 [1:27:09<3:36:27, 5.52s/it] {'loss': 0.5759, 'learning_rate': 7.526255413492396e-06, 'epoch': 0.59} 59%|█████▉ | 3419/5773 [1:27:11<3:36:27, 5.52s/it] {'loss': 0.5759, 'learning_rate': 7.526255413492396e-06, 'epoch': 0.59} 59%|█████▉ | 3419/5773 [1:27:09<3:36:27, 5.52s/it] 59%|█████▉ | 3420/5773 [1:27:16<3:37:54, 5.56s/it] 59%|█████▉ | 3420/5773 [1:27:14<3:37:55, 5.56s/it] {'loss': 0.5639, 'learning_rate': 7.5208192041323945e-06, 'epoch': 0.59} 59%|█████▉ | 3420/5773 [1:27:16<3:37:54, 5.56s/it] {'loss': 0.5639, 'learning_rate': 7.5208192041323945e-06, 'epoch': 0.59} 59%|█████▉ | 3420/5773 [1:27:14<3:37:55, 5.56s/it] 59%|█████▉ | 3421/5773 [1:27:22<3:35:47, 5.50s/it] 59%|█████▉ | 3421/5773 [1:27:20<3:35:47, 5.50s/it] {'loss': 0.5809, 'learning_rate': 7.5153837752977196e-06, 'epoch': 0.59} 59%|█████▉ | 3421/5773 [1:27:22<3:35:47, 5.50s/it] {'loss': 0.5809, 'learning_rate': 7.5153837752977196e-06, 'epoch': 0.59} 59%|█████▉ | 3421/5773 [1:27:20<3:35:47, 5.50s/it] 59%|█████▉ | 3422/5773 [1:27:27<3:35:18, 5.49s/it] 59%|█████▉ | 3422/5773 [1:27:25<3:35:18, 5.49s/it] {'loss': 0.5853, 'learning_rate': 7.509949128699613e-06, 'epoch': 0.59} 59%|█████▉ | 3422/5773 [1:27:27<3:35:18, 5.49s/it] {'loss': 0.5853, 'learning_rate': 7.509949128699613e-06, 'epoch': 0.59} 59%|█████▉ | 3422/5773 [1:27:25<3:35:18, 5.49s/it] 59%|█████▉ | 3423/5773 [1:27:33<3:35:42, 5.51s/it] 59%|█████▉ | 3423/5773 [1:27:31<3:35:42, 5.51s/it] {'loss': 0.566, 'learning_rate': 7.5045152660490805e-06, 'epoch': 0.59} 59%|█████▉ | 3423/5773 [1:27:33<3:35:42, 5.51s/it] {'loss': 0.566, 'learning_rate': 7.5045152660490805e-06, 'epoch': 0.59} 59%|█████▉ | 3423/5773 
[1:27:31<3:35:42, 5.51s/it] 59%|█████▉ | 3424/5773 [1:27:38<3:34:28, 5.48s/it] 59%|█████▉ | 3424/5773 [1:27:36<3:34:28, 5.48s/it] {'loss': 0.5725, 'learning_rate': 7.499082189056871e-06, 'epoch': 0.59} 59%|█████▉ | 3424/5773 [1:27:38<3:34:28, 5.48s/it] {'loss': 0.5725, 'learning_rate': 7.499082189056871e-06, 'epoch': 0.59} 59%|█████▉ | 3424/5773 [1:27:36<3:34:28, 5.48s/it] 59%|█████▉ | 3425/5773 [1:27:44<3:37:17, 5.55s/it] 59%|█████▉ | 3425/5773 [1:27:42<3:37:17, 5.55s/it] {'loss': 0.5635, 'learning_rate': 7.493649899433491e-06, 'epoch': 0.59} 59%|█████▉ | 3425/5773 [1:27:44<3:37:17, 5.55s/it] {'loss': 0.5635, 'learning_rate': 7.493649899433491e-06, 'epoch': 0.59} 59%|█████▉ | 3425/5773 [1:27:42<3:37:17, 5.55s/it] 59%|█████▉ | 3426/5773 [1:27:49<3:37:24, 5.56s/it] 59%|█████▉ | 3426/5773 [1:27:47<3:37:24, 5.56s/it] {'loss': 0.5801, 'learning_rate': 7.488218398889198e-06, 'epoch': 0.59} 59%|█████▉ | 3426/5773 [1:27:49<3:37:24, 5.56s/it] {'loss': 0.5801, 'learning_rate': 7.488218398889198e-06, 'epoch': 0.59} 59%|█████▉ | 3426/5773 [1:27:47<3:37:24, 5.56s/it] 59%|█████▉ | 3427/5773 [1:27:55<3:37:41, 5.57s/it] 59%|█████▉ | 3427/5773 [1:27:53<3:37:41, 5.57s/it] {'loss': 0.5541, 'learning_rate': 7.482787689134007e-06, 'epoch': 0.59} 59%|█████▉ | 3427/5773 [1:27:55<3:37:41, 5.57s/it] {'loss': 0.5541, 'learning_rate': 7.482787689134007e-06, 'epoch': 0.59} 59%|█████▉ | 3427/5773 [1:27:53<3:37:41, 5.57s/it] 59%|█████▉ | 3428/5773 [1:27:58<3:36:30, 5.54s/it] 59%|█████▉ | 3428/5773 [1:28:00<3:36:31, 5.54s/it] {'loss': 0.5784, 'learning_rate': 7.4773577718776735e-06, 'epoch': 0.59} 59%|█████▉ | 3428/5773 [1:28:00<3:36:31, 5.54s/it] {'loss': 0.5784, 'learning_rate': 7.4773577718776735e-06, 'epoch': 0.59} 59%|█████▉ | 3428/5773 [1:27:58<3:36:30, 5.54s/it] 59%|█████▉ | 3429/5773 [1:28:06<3:37:26, 5.57s/it] 59%|█████▉ | 3429/5773 [1:28:04<3:37:26, 5.57s/it] {'loss': 0.5618, 'learning_rate': 7.471928648829714e-06, 'epoch': 0.59} 59%|█████▉ | 3429/5773 [1:28:06<3:37:26, 5.57s/it] 
{'loss': 0.5618, 'learning_rate': 7.471928648829714e-06, 'epoch': 0.59} 59%|█████▉ | 3429/5773 [1:28:04<3:37:26, 5.57s/it] 59%|█████▉ | 3430/5773 [1:28:11<3:36:52, 5.55s/it] 59%|█████▉ | 3430/5773 [1:28:10<3:36:51, 5.55s/it] {'loss': 0.5672, 'learning_rate': 7.4665003216993835e-06, 'epoch': 0.59} 59%|█████▉ | 3430/5773 [1:28:11<3:36:52, 5.55s/it] {'loss': 0.5672, 'learning_rate': 7.4665003216993835e-06, 'epoch': 0.59} 59%|█████▉ | 3430/5773 [1:28:10<3:36:51, 5.55s/it] 59%|█████▉ | 3431/5773 [1:28:17<3:36:41, 5.55s/it] 59%|█████▉ | 3431/5773 [1:28:15<3:36:41, 5.55s/it] {'loss': 0.5508, 'learning_rate': 7.4610727921956985e-06, 'epoch': 0.59} 59%|█████▉ | 3431/5773 [1:28:17<3:36:41, 5.55s/it] {'loss': 0.5508, 'learning_rate': 7.4610727921956985e-06, 'epoch': 0.59} 59%|█████▉ | 3431/5773 [1:28:15<3:36:41, 5.55s/it] 59%|█████▉ | 3432/5773 [1:28:22<3:35:19, 5.52s/it] 59%|█████▉ | 3432/5773 [1:28:20<3:35:19, 5.52s/it] {'loss': 0.5672, 'learning_rate': 7.4556460620274174e-06, 'epoch': 0.59} 59%|█████▉ | 3432/5773 [1:28:22<3:35:19, 5.52s/it] {'loss': 0.5672, 'learning_rate': 7.4556460620274174e-06, 'epoch': 0.59} 59%|█████▉ | 3432/5773 [1:28:20<3:35:19, 5.52s/it] 59%|█████▉ | 3433/5773 [1:28:28<3:34:51, 5.51s/it] 59%|█████▉ | 3433/5773 [1:28:26<3:34:51, 5.51s/it] {'loss': 0.5714, 'learning_rate': 7.4502201329030456e-06, 'epoch': 0.59} 59%|█████▉ | 3433/5773 [1:28:28<3:34:51, 5.51s/it] {'loss': 0.5714, 'learning_rate': 7.4502201329030456e-06, 'epoch': 0.59} 59%|█████▉ | 3433/5773 [1:28:26<3:34:51, 5.51s/it] 59%|█████▉ | 3434/5773 [1:28:33<3:32:03, 5.44s/it] 59%|█████▉ | 3434/5773 [1:28:31<3:32:04, 5.44s/it] {'loss': 0.5746, 'learning_rate': 7.444795006530844e-06, 'epoch': 0.59} 59%|█████▉ | 3434/5773 [1:28:33<3:32:03, 5.44s/it] {'loss': 0.5746, 'learning_rate': 7.444795006530844e-06, 'epoch': 0.59} 59%|█████▉ | 3434/5773 [1:28:31<3:32:04, 5.44s/it] 60%|█████▉ | 3435/5773 [1:28:39<3:32:06, 5.44s/it] 60%|█████▉ | 3435/5773 [1:28:37<3:32:06, 5.44s/it] {'loss': 0.5694, 
'learning_rate': 7.4393706846188095e-06, 'epoch': 0.6} 60%|█████▉ | 3435/5773 [1:28:39<3:32:06, 5.44s/it] {'loss': 0.5694, 'learning_rate': 7.4393706846188095e-06, 'epoch': 0.6} 60%|█████▉ | 3435/5773 [1:28:37<3:32:06, 5.44s/it] 60%|█████▉ | 3436/5773 [1:28:44<3:32:36, 5.46s/it] 60%|█████▉ | 3436/5773 [1:28:42<3:32:36, 5.46s/it] {'loss': 0.5566, 'learning_rate': 7.433947168874698e-06, 'epoch': 0.6} 60%|█████▉ | 3436/5773 [1:28:44<3:32:36, 5.46s/it] {'loss': 0.5566, 'learning_rate': 7.433947168874698e-06, 'epoch': 0.6} 60%|█████▉ | 3436/5773 [1:28:42<3:32:36, 5.46s/it] 60%|█████▉ | 3437/5773 [1:28:50<3:32:46, 5.46s/it] 60%|█████▉ | 3437/5773 [1:28:48<3:32:45, 5.46s/it] {'loss': 0.5817, 'learning_rate': 7.4285244610060036e-06, 'epoch': 0.6} 60%|█████▉ | 3437/5773 [1:28:50<3:32:46, 5.46s/it] {'loss': 0.5817, 'learning_rate': 7.4285244610060036e-06, 'epoch': 0.6} 60%|█████▉ | 3437/5773 [1:28:48<3:32:45, 5.46s/it] 60%|█████▉ | 3438/5773 [1:28:55<3:35:17, 5.53s/it] 60%|█████▉ | 3438/5773 [1:28:53<3:35:17, 5.53s/it] {'loss': 0.5719, 'learning_rate': 7.4231025627199635e-06, 'epoch': 0.6} 60%|█████▉ | 3438/5773 [1:28:55<3:35:17, 5.53s/it] {'loss': 0.5719, 'learning_rate': 7.4231025627199635e-06, 'epoch': 0.6} 60%|█████▉ | 3438/5773 [1:28:53<3:35:17, 5.53s/it] 60%|█████▉ | 3439/5773 [1:29:01<3:33:50, 5.50s/it] 60%|█████▉ | 3439/5773 [1:28:59<3:33:51, 5.50s/it] {'loss': 0.565, 'learning_rate': 7.4176814757235685e-06, 'epoch': 0.6} 60%|█████▉ | 3439/5773 [1:29:01<3:33:50, 5.50s/it] {'loss': 0.565, 'learning_rate': 7.4176814757235685e-06, 'epoch': 0.6} 60%|█████▉ | 3439/5773 [1:28:59<3:33:51, 5.50s/it] 60%|█████▉ | 3440/5773 [1:29:06<3:32:37, 5.47s/it] 60%|█████▉ | 3440/5773 [1:29:04<3:32:36, 5.47s/it] {'loss': 0.5598, 'learning_rate': 7.4122612017235494e-06, 'epoch': 0.6} 60%|█████▉ | 3440/5773 [1:29:06<3:32:37, 5.47s/it] {'loss': 0.5598, 'learning_rate': 7.4122612017235494e-06, 'epoch': 0.6} 60%|█████▉ | 3440/5773 [1:29:04<3:32:36, 5.47s/it] 60%|█████▉ | 3441/5773 
[1:29:12<3:32:39, 5.47s/it] 60%|█████▉ | 3441/5773 [1:29:10<3:32:39, 5.47s/it] {'loss': 0.5642, 'learning_rate': 7.40684174242638e-06, 'epoch': 0.6} 60%|█████▉ | 3441/5773 [1:29:12<3:32:39, 5.47s/it] {'loss': 0.5642, 'learning_rate': 7.40684174242638e-06, 'epoch': 0.6} 60%|█████▉ | 3441/5773 [1:29:10<3:32:39, 5.47s/it] 60%|█████▉ | 3442/5773 [1:29:17<3:31:12, 5.44s/it] 60%|█████▉ | 3442/5773 [1:29:15<3:31:12, 5.44s/it] {'loss': 0.5786, 'learning_rate': 7.401423099538285e-06, 'epoch': 0.6} 60%|█████▉ | 3442/5773 [1:29:17<3:31:12, 5.44s/it] {'loss': 0.5786, 'learning_rate': 7.401423099538285e-06, 'epoch': 0.6} 60%|█████▉ | 3442/5773 [1:29:15<3:31:12, 5.44s/it] 60%|█████▉ | 3443/5773 [1:29:22<3:30:15, 5.41s/it] 60%|█████▉ | 3443/5773 [1:29:20<3:30:16, 5.41s/it] {'loss': 0.5736, 'learning_rate': 7.396005274765219e-06, 'epoch': 0.6} 60%|█████▉ | 3443/5773 [1:29:22<3:30:15, 5.41s/it] {'loss': 0.5736, 'learning_rate': 7.396005274765219e-06, 'epoch': 0.6} 60%|█████▉ | 3443/5773 [1:29:20<3:30:16, 5.41s/it] 60%|█████▉ | 3444/5773 [1:29:28<3:28:52, 5.38s/it] 60%|█████▉ | 3444/5773 [1:29:26<3:28:53, 5.38s/it] {'loss': 0.5819, 'learning_rate': 7.3905882698128905e-06, 'epoch': 0.6} 60%|█████▉ | 3444/5773 [1:29:28<3:28:52, 5.38s/it] {'loss': 0.5819, 'learning_rate': 7.3905882698128905e-06, 'epoch': 0.6} 60%|█████▉ | 3444/5773 [1:29:26<3:28:53, 5.38s/it] 60%|█████▉ | 3445/5773 [1:29:33<3:29:15, 5.39s/it] 60%|█████▉ | 3445/5773 [1:29:31<3:29:14, 5.39s/it] {'loss': 0.5763, 'learning_rate': 7.385172086386745e-06, 'epoch': 0.6} 60%|█████▉ | 3445/5773 [1:29:33<3:29:15, 5.39s/it] {'loss': 0.5763, 'learning_rate': 7.385172086386745e-06, 'epoch': 0.6} 60%|█████▉ | 3445/5773 [1:29:31<3:29:14, 5.39s/it] 60%|█████▉ | 3446/5773 [1:29:39<3:29:37, 5.41s/it] 60%|█████▉ | 3446/5773 [1:29:37<3:29:37, 5.41s/it] {'loss': 0.5682, 'learning_rate': 7.379756726191969e-06, 'epoch': 0.6} 60%|█████▉ | 3446/5773 [1:29:39<3:29:37, 5.41s/it] {'loss': 0.5682, 'learning_rate': 7.379756726191969e-06, 'epoch': 
0.6} 60%|█████▉ | 3446/5773 [1:29:37<3:29:37, 5.41s/it] 60%|█████▉ | 3447/5773 [1:29:44<3:31:14, 5.45s/it] 60%|█████▉ | 3447/5773 [1:29:42<3:31:14, 5.45s/it] {'loss': 0.5605, 'learning_rate': 7.3743421909334945e-06, 'epoch': 0.6} 60%|█████▉ | 3447/5773 [1:29:44<3:31:14, 5.45s/it] {'loss': 0.5605, 'learning_rate': 7.3743421909334945e-06, 'epoch': 0.6} 60%|█████▉ | 3447/5773 [1:29:42<3:31:14, 5.45s/it] 60%|█████▉ | 3448/5773 [1:29:50<3:33:32, 5.51s/it] 60%|█████▉ | 3448/5773 [1:29:48<3:33:32, 5.51s/it] {'loss': 0.5781, 'learning_rate': 7.3689284823159844e-06, 'epoch': 0.6} 60%|█████▉ | 3448/5773 [1:29:50<3:33:32, 5.51s/it] {'loss': 0.5781, 'learning_rate': 7.3689284823159844e-06, 'epoch': 0.6} 60%|█████▉ | 3448/5773 [1:29:48<3:33:32, 5.51s/it] 60%|█████▉ | 3449/5773 [1:29:55<3:33:32, 5.51s/it] 60%|█████▉ | 3449/5773 [1:29:53<3:33:32, 5.51s/it] {'loss': 0.5672, 'learning_rate': 7.363515602043852e-06, 'epoch': 0.6} 60%|█████▉ | 3449/5773 [1:29:55<3:33:32, 5.51s/it] {'loss': 0.5672, 'learning_rate': 7.363515602043852e-06, 'epoch': 0.6} 60%|█████▉ | 3449/5773 [1:29:53<3:33:32, 5.51s/it] 10 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 60%|█████▉ | 3450/5773 [1:30:01<3:33:01, 5.50s/it] 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
60%|█████▉ | 3450/5773 [1:29:59<3:33:01, 5.50s/it] {'loss': 0.5664, 'learning_rate': 7.358103551821241e-06, 'epoch': 0.6} 60%|█████▉ | 3450/5773 [1:30:01<3:33:01, 5.50s/it] {'loss': 0.5664, 'learning_rate': 7.358103551821241e-06, 'epoch': 0.6} 60%|█████▉ | 3450/5773 [1:29:59<3:33:01, 5.50s/it] 60%|█████▉ | 3451/5773 [1:30:06<3:34:38, 5.55s/it] 60%|█████▉ | 3451/5773 [1:30:04<3:34:39, 5.55s/it] {'loss': 0.5652, 'learning_rate': 7.352692333352046e-06, 'epoch': 0.6} 60%|█████▉ | 3451/5773 [1:30:06<3:34:38, 5.55s/it] {'loss': 0.5652, 'learning_rate': 7.352692333352046e-06, 'epoch': 0.6} 60%|█████▉ | 3451/5773 [1:30:04<3:34:39, 5.55s/it] 60%|█████▉ | 3452/5773 [1:30:12<3:33:06, 5.51s/it] 60%|█████▉ | 3452/5773 [1:30:10<3:33:06, 5.51s/it] {'loss': 0.5605, 'learning_rate': 7.34728194833988e-06, 'epoch': 0.6} 60%|█████▉ | 3452/5773 [1:30:12<3:33:06, 5.51s/it] {'loss': 0.5605, 'learning_rate': 7.34728194833988e-06, 'epoch': 0.6} 60%|█████▉ | 3452/5773 [1:30:10<3:33:06, 5.51s/it] 60%|█████▉ | 3453/5773 [1:30:17<3:32:13, 5.49s/it] 60%|█████▉ | 3453/5773 [1:30:15<3:32:13, 5.49s/it] {'loss': 0.5802, 'learning_rate': 7.341872398488112e-06, 'epoch': 0.6} 60%|█████▉ | 3453/5773 [1:30:17<3:32:13, 5.49s/it] {'loss': 0.5802, 'learning_rate': 7.341872398488112e-06, 'epoch': 0.6} 60%|█████▉ | 3453/5773 [1:30:15<3:32:13, 5.49s/it] 60%|█████▉ | 3454/5773 [1:30:23<3:33:11, 5.52s/it] 60%|█████▉ | 3454/5773 [1:30:21<3:33:10, 5.52s/it] {'loss': 0.5467, 'learning_rate': 7.336463685499839e-06, 'epoch': 0.6} 60%|█████▉ | 3454/5773 [1:30:23<3:33:11, 5.52s/it] {'loss': 0.5467, 'learning_rate': 7.336463685499839e-06, 'epoch': 0.6} 60%|█████▉ | 3454/5773 [1:30:21<3:33:10, 5.52s/it] 60%|█████▉ | 3455/5773 [1:30:28<3:33:16, 5.52s/it] 60%|█████▉ | 3455/5773 [1:30:26<3:33:16, 5.52s/it] {'loss': 0.5633, 'learning_rate': 7.331055811077897e-06, 'epoch': 0.6} 60%|█████▉ | 3455/5773 [1:30:28<3:33:16, 5.52s/it] {'loss': 0.5633, 'learning_rate': 7.331055811077897e-06, 'epoch': 0.6} 60%|█████▉ | 3455/5773 
[1:30:26<3:33:16, 5.52s/it] 60%|█████▉ | 3456/5773 [1:30:34<3:32:32, 5.50s/it] 60%|█████▉ | 3456/5773 [1:30:32<3:32:31, 5.50s/it]{'loss': 0.5649, 'learning_rate': 7.325648776924857e-06, 'epoch': 0.6} 60%|█████▉ | 3456/5773 [1:30:34<3:32:32, 5.50s/it] {'loss': 0.5649, 'learning_rate': 7.325648776924857e-06, 'epoch': 0.6} 60%|█████▉ | 3456/5773 [1:30:32<3:32:31, 5.50s/it] 60%|█████▉ | 3457/5773 [1:30:39<3:31:40, 5.48s/it] 60%|█████▉ | 3457/5773 [1:30:37<3:31:40, 5.48s/it] {'loss': 0.5702, 'learning_rate': 7.320242584743027e-06, 'epoch': 0.6} 60%|█████▉ | 3457/5773 [1:30:39<3:31:40, 5.48s/it] {'loss': 0.5702, 'learning_rate': 7.320242584743027e-06, 'epoch': 0.6} 60%|█████▉ | 3457/5773 [1:30:37<3:31:40, 5.48s/it] 60%|█████▉ | 3458/5773 [1:30:45<3:31:35, 5.48s/it] 60%|█████▉ | 3458/5773 [1:30:43<3:31:35, 5.48s/it] {'loss': 0.5509, 'learning_rate': 7.314837236234451e-06, 'epoch': 0.6} 60%|█████▉ | 3458/5773 [1:30:45<3:31:35, 5.48s/it] {'loss': 0.5509, 'learning_rate': 7.314837236234451e-06, 'epoch': 0.6} 60%|█████▉ | 3458/5773 [1:30:43<3:31:35, 5.48s/it] 60%|█████▉ | 3459/5773 [1:30:50<3:31:54, 5.49s/it] 60%|█████▉ | 3459/5773 [1:30:48<3:31:54, 5.49s/it] {'loss': 0.5763, 'learning_rate': 7.309432733100901e-06, 'epoch': 0.6} 60%|█████▉ | 3459/5773 [1:30:50<3:31:54, 5.49s/it] {'loss': 0.5763, 'learning_rate': 7.309432733100901e-06, 'epoch': 0.6} 60%|█████▉ | 3459/5773 [1:30:48<3:31:54, 5.49s/it] 60%|█████▉ | 3460/5773 [1:30:56<3:31:49, 5.49s/it] 60%|█████▉ | 3460/5773 [1:30:54<3:31:50, 5.50s/it] {'loss': 0.5634, 'learning_rate': 7.304029077043893e-06, 'epoch': 0.6} 60%|█████▉ | 3460/5773 [1:30:56<3:31:49, 5.49s/it] {'loss': 0.5634, 'learning_rate': 7.304029077043893e-06, 'epoch': 0.6} 60%|█████▉ | 3460/5773 [1:30:54<3:31:50, 5.50s/it] 60%|█████▉ | 3461/5773 [1:31:01<3:32:17, 5.51s/it] 60%|█████▉ | 3461/5773 [1:30:59<3:32:17, 5.51s/it] {'loss': 0.5674, 'learning_rate': 7.2986262697646634e-06, 'epoch': 0.6} 60%|█████▉ | 3461/5773 [1:31:01<3:32:17, 5.51s/it] {'loss': 0.5674, 
'learning_rate': 7.2986262697646634e-06, 'epoch': 0.6} 60%|█████▉ | 3461/5773 [1:30:59<3:32:17, 5.51s/it] 60%|█████▉ | 3462/5773 [1:31:07<3:31:31, 5.49s/it] 60%|█████▉ | 3462/5773 [1:31:05<3:31:31, 5.49s/it] {'loss': 0.5624, 'learning_rate': 7.293224312964194e-06, 'epoch': 0.6} 60%|█████▉ | 3462/5773 [1:31:07<3:31:31, 5.49s/it] {'loss': 0.5624, 'learning_rate': 7.293224312964194e-06, 'epoch': 0.6} 60%|█████▉ | 3462/5773 [1:31:05<3:31:31, 5.49s/it] 60%|█████▉ | 3463/5773 [1:31:12<3:30:46, 5.47s/it] 60%|█████▉ | 3463/5773 [1:31:10<3:30:46, 5.47s/it] {'loss': 0.5691, 'learning_rate': 7.287823208343192e-06, 'epoch': 0.6} 60%|█████▉ | 3463/5773 [1:31:12<3:30:46, 5.47s/it] {'loss': 0.5691, 'learning_rate': 7.287823208343192e-06, 'epoch': 0.6} 60%|█████▉ | 3463/5773 [1:31:10<3:30:46, 5.47s/it] 60%|██████ | 3464/5773 [1:31:18<3:31:04, 5.48s/it] 60%|██████ | 3464/5773 [1:31:16<3:31:04, 5.48s/it] {'loss': 0.5376, 'learning_rate': 7.2824229576020965e-06, 'epoch': 0.6} 60%|██████ | 3464/5773 [1:31:18<3:31:04, 5.48s/it] {'loss': 0.5376, 'learning_rate': 7.2824229576020965e-06, 'epoch': 0.6} 60%|██████ | 3464/5773 [1:31:16<3:31:04, 5.48s/it] 60%|██████ | 3465/5773 [1:31:23<3:31:58, 5.51s/it] 60%|██████ | 3465/5773 [1:31:21<3:31:58, 5.51s/it] {'loss': 0.5816, 'learning_rate': 7.2770235624410844e-06, 'epoch': 0.6} 60%|██████ | 3465/5773 [1:31:23<3:31:58, 5.51s/it] {'loss': 0.5816, 'learning_rate': 7.2770235624410844e-06, 'epoch': 0.6} 60%|██████ | 3465/5773 [1:31:21<3:31:58, 5.51s/it] 60%|██████ | 3466/5773 [1:31:29<3:30:46, 5.48s/it] 60%|██████ | 3466/5773 [1:31:27<3:30:46, 5.48s/it] {'loss': 0.5778, 'learning_rate': 7.2716250245600515e-06, 'epoch': 0.6} 60%|██████ | 3466/5773 [1:31:29<3:30:46, 5.48s/it] {'loss': 0.5778, 'learning_rate': 7.2716250245600515e-06, 'epoch': 0.6} 60%|██████ | 3466/5773 [1:31:27<3:30:46, 5.48s/it] 60%|██████ | 3467/5773 [1:31:34<3:29:46, 5.46s/it] 60%|██████ | 3467/5773 [1:31:32<3:29:46, 5.46s/it] {'loss': 0.5672, 'learning_rate': 
7.266227345658629e-06, 'epoch': 0.6} 60%|██████ | 3467/5773 [1:31:34<3:29:46, 5.46s/it] {'loss': 0.5672, 'learning_rate': 7.266227345658629e-06, 'epoch': 0.6} 60%|██████ | 3467/5773 [1:31:32<3:29:46, 5.46s/it] 60%|██████ | 3468/5773 [1:31:40<3:30:23, 5.48s/it] 60%|██████ | 3468/5773 [1:31:38<3:30:23, 5.48s/it] {'loss': 0.567, 'learning_rate': 7.260830527436183e-06, 'epoch': 0.6} 60%|██████ | 3468/5773 [1:31:40<3:30:23, 5.48s/it] {'loss': 0.567, 'learning_rate': 7.260830527436183e-06, 'epoch': 0.6} 60%|██████ | 3468/5773 [1:31:38<3:30:23, 5.48s/it] 60%|██████ | 3469/5773 [1:31:45<3:29:40, 5.46s/it] 60%|██████ | 3469/5773 [1:31:43<3:29:40, 5.46s/it] {'loss': 0.5701, 'learning_rate': 7.255434571591802e-06, 'epoch': 0.6} 60%|██████ | 3469/5773 [1:31:45<3:29:40, 5.46s/it] {'loss': 0.5701, 'learning_rate': 7.255434571591802e-06, 'epoch': 0.6} 60%|██████ | 3469/5773 [1:31:43<3:29:40, 5.46s/it] 60%|██████ | 3470/5773 [1:31:50<3:28:43, 5.44s/it] 60%|██████ | 3470/5773 [1:31:48<3:28:43, 5.44s/it] {'loss': 0.5496, 'learning_rate': 7.250039479824306e-06, 'epoch': 0.6} 60%|██████ | 3470/5773 [1:31:50<3:28:43, 5.44s/it] {'loss': 0.5496, 'learning_rate': 7.250039479824306e-06, 'epoch': 0.6} 60%|██████ | 3470/5773 [1:31:48<3:28:43, 5.44s/it] 60%|██████ | 3471/5773 [1:31:56<3:29:18, 5.46s/it] 60%|██████ | 3471/5773 [1:31:54<3:29:18, 5.46s/it] {'loss': 0.5774, 'learning_rate': 7.2446452538322435e-06, 'epoch': 0.6} 60%|██████ | 3471/5773 [1:31:56<3:29:18, 5.46s/it] {'loss': 0.5774, 'learning_rate': 7.2446452538322435e-06, 'epoch': 0.6} 60%|██████ | 3471/5773 [1:31:54<3:29:18, 5.46s/it] 60%|██████ | 3472/5773 [1:32:01<3:30:00, 5.48s/it] 60%|██████ | 3472/5773 [1:31:59<3:30:00, 5.48s/it] {'loss': 0.5741, 'learning_rate': 7.239251895313887e-06, 'epoch': 0.6} 60%|██████ | 3472/5773 [1:32:01<3:30:00, 5.48s/it] {'loss': 0.5741, 'learning_rate': 7.239251895313887e-06, 'epoch': 0.6} 60%|██████ | 3472/5773 [1:31:59<3:30:00, 5.48s/it] 60%|██████ | 3473/5773 [1:32:07<3:31:58, 5.53s/it] 
60%|██████ | 3473/5773 [1:32:05<3:31:58, 5.53s/it] {'loss': 0.5558, 'learning_rate': 7.233859405967241e-06, 'epoch': 0.6} 60%|██████ | 3473/5773 [1:32:07<3:31:58, 5.53s/it] {'loss': 0.5558, 'learning_rate': 7.233859405967241e-06, 'epoch': 0.6} 60%|██████ | 3473/5773 [1:32:05<3:31:58, 5.53s/it] 60%|██████ | 3474/5773 [1:32:13<3:33:57, 5.58s/it] 60%|██████ | 3474/5773 [1:32:11<3:33:57, 5.58s/it] {'loss': 0.5642, 'learning_rate': 7.2284677874900285e-06, 'epoch': 0.6} 60%|██████ | 3474/5773 [1:32:13<3:33:57, 5.58s/it] {'loss': 0.5642, 'learning_rate': 7.2284677874900285e-06, 'epoch': 0.6} 60%|██████ | 3474/5773 [1:32:11<3:33:57, 5.58s/it] 60%|██████ | 3475/5773 [1:32:18<3:31:45, 5.53s/it] 60%|██████ | 3475/5773 [1:32:16<3:31:45, 5.53s/it] {'loss': 0.5777, 'learning_rate': 7.223077041579707e-06, 'epoch': 0.6} 60%|██████ | 3475/5773 [1:32:18<3:31:45, 5.53s/it] {'loss': 0.5777, 'learning_rate': 7.223077041579707e-06, 'epoch': 0.6} 60%|██████ | 3475/5773 [1:32:16<3:31:45, 5.53s/it] 60%|██████ | 3476/5773 [1:32:24<3:30:09, 5.49s/it] 60%|██████ | 3476/5773 [1:32:22<3:30:09, 5.49s/it] {'loss': 0.5725, 'learning_rate': 7.217687169933458e-06, 'epoch': 0.6} 60%|██████ | 3476/5773 [1:32:24<3:30:09, 5.49s/it] {'loss': 0.5725, 'learning_rate': 7.217687169933458e-06, 'epoch': 0.6} 60%|██████ | 3476/5773 [1:32:22<3:30:09, 5.49s/it] 60%|██████ | 3477/5773 [1:32:29<3:28:07, 5.44s/it] 60%|██████ | 3477/5773 [1:32:27<3:28:07, 5.44s/it] {'loss': 0.5543, 'learning_rate': 7.212298174248179e-06, 'epoch': 0.6} 60%|██████ | 3477/5773 [1:32:29<3:28:07, 5.44s/it] {'loss': 0.5543, 'learning_rate': 7.212298174248179e-06, 'epoch': 0.6} 60%|██████ | 3477/5773 [1:32:27<3:28:07, 5.44s/it] 60%|██████ | 3478/5773 [1:32:34<3:28:07, 5.44s/it] 60%|██████ | 3478/5773 [1:32:32<3:28:07, 5.44s/it] {'loss': 0.5596, 'learning_rate': 7.206910056220504e-06, 'epoch': 0.6} 60%|██████ | 3478/5773 [1:32:34<3:28:07, 5.44s/it] {'loss': 0.5596, 'learning_rate': 7.206910056220504e-06, 'epoch': 0.6} 60%|██████ | 3478/5773 
[1:32:32<3:28:07, 5.44s/it] 60%|██████ | 3479/5773 [1:32:40<3:29:59, 5.49s/it] 60%|██████ | 3479/5773 [1:32:38<3:29:59, 5.49s/it] {'loss': 0.5709, 'learning_rate': 7.201522817546783e-06, 'epoch': 0.6} 60%|██████ | 3479/5773 [1:32:40<3:29:59, 5.49s/it] {'loss': 0.5709, 'learning_rate': 7.201522817546783e-06, 'epoch': 0.6} 60%|██████ | 3479/5773 [1:32:38<3:29:59, 5.49s/it] 60%|██████ | 3480/5773 [1:32:46<3:32:06, 5.55s/it] 60%|██████ | 3480/5773 [1:32:44<3:32:06, 5.55s/it] {'loss': 0.5714, 'learning_rate': 7.196136459923086e-06, 'epoch': 0.6} 60%|██████ | 3480/5773 [1:32:46<3:32:06, 5.55s/it] {'loss': 0.5714, 'learning_rate': 7.196136459923086e-06, 'epoch': 0.6} 60%|██████ | 3480/5773 [1:32:44<3:32:06, 5.55s/it] 60%|██████ | 3481/5773 [1:32:51<3:28:59, 5.47s/it] 60%|██████ | 3481/5773 [1:32:49<3:28:59, 5.47s/it] {'loss': 0.5715, 'learning_rate': 7.1907509850452165e-06, 'epoch': 0.6} 60%|██████ | 3481/5773 [1:32:51<3:28:59, 5.47s/it] {'loss': 0.5715, 'learning_rate': 7.1907509850452165e-06, 'epoch': 0.6} 60%|██████ | 3481/5773 [1:32:49<3:28:59, 5.47s/it] 60%|██████ | 3482/5773 [1:32:55<3:30:21, 5.51s/it] 60%|██████ | 3482/5773 [1:32:57<3:30:21, 5.51s/it] {'loss': 0.5695, 'learning_rate': 7.18536639460869e-06, 'epoch': 0.6} 60%|██████ | 3482/5773 [1:32:57<3:30:21, 5.51s/it] {'loss': 0.5695, 'learning_rate': 7.18536639460869e-06, 'epoch': 0.6} 60%|██████ | 3482/5773 [1:32:55<3:30:21, 5.51s/it] 60%|██████ | 3483/5773 [1:33:02<3:32:25, 5.57s/it] 60%|██████ | 3483/5773 [1:33:00<3:32:25, 5.57s/it] {'loss': 0.5837, 'learning_rate': 7.179982690308749e-06, 'epoch': 0.6} 60%|██████ | 3483/5773 [1:33:02<3:32:25, 5.57s/it] {'loss': 0.5837, 'learning_rate': 7.179982690308749e-06, 'epoch': 0.6} 60%|██████ | 3483/5773 [1:33:00<3:32:25, 5.57s/it] 60%|██████ | 3484/5773 [1:33:08<3:30:31, 5.52s/it] 60%|██████ | 3484/5773 [1:33:06<3:30:31, 5.52s/it] {'loss': 0.5656, 'learning_rate': 7.174599873840358e-06, 'epoch': 0.6} 60%|██████ | 3484/5773 [1:33:08<3:30:31, 5.52s/it] {'loss': 0.5656, 
'learning_rate': 7.174599873840358e-06, 'epoch': 0.6} 60%|██████ | 3484/5773 [1:33:06<3:30:31, 5.52s/it] 60%|██████ | 3485/5773 [1:33:13<3:29:05, 5.48s/it] 60%|██████ | 3485/5773 [1:33:11<3:29:05, 5.48s/it] {'loss': 0.5757, 'learning_rate': 7.169217946898197e-06, 'epoch': 0.6} 60%|██████ | 3485/5773 [1:33:13<3:29:05, 5.48s/it] {'loss': 0.5757, 'learning_rate': 7.169217946898197e-06, 'epoch': 0.6} 60%|██████ | 3485/5773 [1:33:11<3:29:05, 5.48s/it] 60%|██████ | 3486/5773 [1:33:17<3:29:21, 5.49s/it] 60%|██████ | 3486/5773 [1:33:19<3:29:22, 5.49s/it] {'loss': 0.5583, 'learning_rate': 7.1638369111766696e-06, 'epoch': 0.6} 60%|██████ | 3486/5773 [1:33:19<3:29:22, 5.49s/it] {'loss': 0.5583, 'learning_rate': 7.1638369111766696e-06, 'epoch': 0.6} 60%|██████ | 3486/5773 [1:33:17<3:29:21, 5.49s/it] 60%|██████ | 3487/5773 [1:33:24<3:29:52, 5.51s/it] 60%|██████ | 3487/5773 [1:33:22<3:29:52, 5.51s/it] {'loss': 0.5744, 'learning_rate': 7.158456768369895e-06, 'epoch': 0.6} 60%|██████ | 3487/5773 [1:33:24<3:29:52, 5.51s/it] {'loss': 0.5744, 'learning_rate': 7.158456768369895e-06, 'epoch': 0.6} 60%|██████ | 3487/5773 [1:33:22<3:29:52, 5.51s/it] 60%|██████ | 3488/5773 [1:33:30<3:30:07, 5.52s/it] 60%|██████ | 3488/5773 [1:33:28<3:30:07, 5.52s/it] {'loss': 0.5542, 'learning_rate': 7.153077520171718e-06, 'epoch': 0.6} 60%|██████ | 3488/5773 [1:33:30<3:30:07, 5.52s/it] {'loss': 0.5542, 'learning_rate': 7.153077520171718e-06, 'epoch': 0.6} 60%|██████ | 3488/5773 [1:33:28<3:30:07, 5.52s/it] 60%|██████ | 3489/5773 [1:33:35<3:29:57, 5.52s/it] 60%|██████ | 3489/5773 [1:33:33<3:29:57, 5.52s/it] {'loss': 0.5526, 'learning_rate': 7.147699168275697e-06, 'epoch': 0.6} 60%|██████ | 3489/5773 [1:33:35<3:29:57, 5.52s/it]{'loss': 0.5526, 'learning_rate': 7.147699168275697e-06, 'epoch': 0.6} 60%|██████ | 3489/5773 [1:33:33<3:29:57, 5.52s/it] 60%|██████ | 3490/5773 [1:33:41<3:31:34, 5.56s/it] 60%|██████ | 3490/5773 [1:33:39<3:31:33, 5.56s/it] {'loss': 0.5748, 'learning_rate': 7.142321714375107e-06, 
'epoch': 0.6} 60%|██████ | 3490/5773 [1:33:41<3:31:34, 5.56s/it] {'loss': 0.5748, 'learning_rate': 7.142321714375107e-06, 'epoch': 0.6} 60%|██████ | 3490/5773 [1:33:39<3:31:33, 5.56s/it] 60%|██████ | 3491/5773 [1:33:46<3:29:16, 5.50s/it] 60%|██████ | 3491/5773 [1:33:44<3:29:16, 5.50s/it] {'loss': 0.5717, 'learning_rate': 7.136945160162944e-06, 'epoch': 0.6} 60%|██████ | 3491/5773 [1:33:46<3:29:16, 5.50s/it] {'loss': 0.5717, 'learning_rate': 7.136945160162944e-06, 'epoch': 0.6} 60%|██████ | 3491/5773 [1:33:44<3:29:16, 5.50s/it] 60%|██████ | 3492/5773 [1:33:52<3:30:20, 5.53s/it] 60%|██████ | 3492/5773 [1:33:50<3:30:20, 5.53s/it] {'loss': 0.5686, 'learning_rate': 7.131569507331919e-06, 'epoch': 0.6} 60%|██████ | 3492/5773 [1:33:52<3:30:20, 5.53s/it] {'loss': 0.5686, 'learning_rate': 7.131569507331919e-06, 'epoch': 0.6} 60%|██████ | 3492/5773 [1:33:50<3:30:20, 5.53s/it] 61%|██████ | 3493/5773 [1:33:57<3:28:09, 5.48s/it] 61%|██████ | 3493/5773 [1:33:55<3:28:09, 5.48s/it] {'loss': 0.5803, 'learning_rate': 7.12619475757446e-06, 'epoch': 0.61} 61%|██████ | 3493/5773 [1:33:57<3:28:09, 5.48s/it] {'loss': 0.5803, 'learning_rate': 7.12619475757446e-06, 'epoch': 0.61} 61%|██████ | 3493/5773 [1:33:55<3:28:09, 5.48s/it] 61%|██████ | 3494/5773 [1:34:03<3:29:03, 5.50s/it] 61%|██████ | 3494/5773 [1:34:01<3:29:03, 5.50s/it] {'loss': 0.5807, 'learning_rate': 7.120820912582706e-06, 'epoch': 0.61} 61%|██████ | 3494/5773 [1:34:03<3:29:03, 5.50s/it] {'loss': 0.5807, 'learning_rate': 7.120820912582706e-06, 'epoch': 0.61} 61%|██████ | 3494/5773 [1:34:01<3:29:03, 5.50s/it] 61%|██████ | 3495/5773 [1:34:08<3:28:08, 5.48s/it] 61%|██████ | 3495/5773 [1:34:06<3:28:08, 5.48s/it] {'loss': 0.5673, 'learning_rate': 7.115447974048522e-06, 'epoch': 0.61} 61%|██████ | 3495/5773 [1:34:08<3:28:08, 5.48s/it] {'loss': 0.5673, 'learning_rate': 7.115447974048522e-06, 'epoch': 0.61} 61%|██████ | 3495/5773 [1:34:06<3:28:08, 5.48s/it] 61%|██████ | 3496/5773 [1:34:14<3:28:24, 5.49s/it] 61%|██████ | 3496/5773 
[1:34:12<3:28:24, 5.49s/it] {'loss': 0.5696, 'learning_rate': 7.110075943663473e-06, 'epoch': 0.61} 61%|██████ | 3496/5773 [1:34:14<3:28:24, 5.49s/it] {'loss': 0.5696, 'learning_rate': 7.110075943663473e-06, 'epoch': 0.61} 61%|██████ | 3496/5773 [1:34:12<3:28:24, 5.49s/it] 61%|██████ | 3497/5773 [1:34:19<3:30:14, 5.54s/it] 61%|██████ | 3497/5773 [1:34:17<3:30:14, 5.54s/it] {'loss': 0.5573, 'learning_rate': 7.104704823118851e-06, 'epoch': 0.61} 61%|██████ | 3497/5773 [1:34:19<3:30:14, 5.54s/it] {'loss': 0.5573, 'learning_rate': 7.104704823118851e-06, 'epoch': 0.61} 61%|██████ | 3497/5773 [1:34:17<3:30:14, 5.54s/it] 61%|██████ | 3498/5773 [1:34:25<3:30:01, 5.54s/it] 61%|██████ | 3498/5773 [1:34:23<3:30:01, 5.54s/it] {'loss': 0.5743, 'learning_rate': 7.099334614105652e-06, 'epoch': 0.61} 61%|██████ | 3498/5773 [1:34:25<3:30:01, 5.54s/it] {'loss': 0.5743, 'learning_rate': 7.099334614105652e-06, 'epoch': 0.61} 61%|██████ | 3498/5773 [1:34:23<3:30:01, 5.54s/it] 61%|██████ | 3499/5773 [1:34:30<3:29:21, 5.52s/it] 61%|██████ | 3499/5773 [1:34:28<3:29:21, 5.52s/it] {'loss': 0.561, 'learning_rate': 7.093965318314595e-06, 'epoch': 0.61} 61%|██████ | 3499/5773 [1:34:30<3:29:21, 5.52s/it] {'loss': 0.561, 'learning_rate': 7.093965318314595e-06, 'epoch': 0.61} 61%|██████ | 3499/5773 [1:34:28<3:29:21, 5.52s/it]8 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 61%|██████ | 3500/5773 [1:34:36<3:28:58, 5.52s/it]4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 
03 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 61%|██████ | 3500/5773 [1:34:34<3:28:58, 5.52s/it]6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5589, 'learning_rate': 7.0885969374361e-06, 'epoch': 0.61} 61%|██████ | 3500/5773 [1:34:36<3:28:58, 5.52s/it] {'loss': 0.5589, 'learning_rate': 7.0885969374361e-06, 'epoch': 0.61} 61%|██████ | 3500/5773 [1:34:34<3:28:58, 5.52s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 61%|██████ | 3501/5773 [1:35:00<7:03:30, 11.18s/it] 61%|██████ | 3501/5773 [1:34:58<7:03:30, 11.18s/it] {'loss': 0.5731, 'learning_rate': 7.083229473160309e-06, 'epoch': 0.61} 61%|██████ | 3501/5773 [1:35:00<7:03:30, 11.18s/it] {'loss': 0.5731, 'learning_rate': 7.083229473160309e-06, 'epoch': 0.61} 61%|██████ | 3501/5773 [1:34:58<7:03:30, 11.18s/it] 61%|██████ | 3502/5773 [1:35:06<5:56:29, 9.42s/it] 61%|██████ | 3502/5773 [1:35:04<5:56:29, 9.42s/it] {'loss': 0.5558, 'learning_rate': 7.0778629271770726e-06, 'epoch': 0.61} 61%|██████ | 3502/5773 [1:35:06<5:56:29, 9.42s/it] {'loss': 0.5558, 'learning_rate': 7.0778629271770726e-06, 'epoch': 0.61} 61%|██████ | 3502/5773 [1:35:04<5:56:29, 9.42s/it] 61%|██████ | 3503/5773 [1:35:11<5:11:58, 8.25s/it] 61%|██████ | 3503/5773 [1:35:09<5:11:58, 8.25s/it] {'loss': 0.5546, 'learning_rate': 7.072497301175945e-06, 'epoch': 0.61} 61%|██████ | 3503/5773 [1:35:11<5:11:58, 8.25s/it] {'loss': 0.5546, 'learning_rate': 7.072497301175945e-06, 'epoch': 0.61} 61%|██████ | 3503/5773 [1:35:09<5:11:58, 8.25s/it] 61%|██████ | 3504/5773 [1:35:17<4:40:37, 7.42s/it] 61%|██████ | 3504/5773 [1:35:15<4:40:37, 7.42s/it] {'loss': 0.5598, 'learning_rate': 7.067132596846203e-06, 'epoch': 0.61} 61%|██████ | 3504/5773 [1:35:17<4:40:37, 7.42s/it] {'loss': 0.5598, 'learning_rate': 7.067132596846203e-06, 'epoch': 0.61} 61%|██████ | 3504/5773 [1:35:15<4:40:37, 7.42s/it] 61%|██████ | 3505/5773 [1:35:20<4:17:45, 6.82s/it] {'loss': 0.5548, 'learning_rate': 7.061768815876821e-06, 'epoch': 0.61} 61%|██████ | 3505/5773 [1:35:20<4:17:45, 6.82s/it] 61%|██████ | 3505/5773 [1:35:22<4:17:45, 6.82s/it] {'loss': 0.5548, 'learning_rate': 7.061768815876821e-06, 'epoch': 0.61} 61%|██████ | 3505/5773 [1:35:22<4:17:45, 6.82s/it] 61%|██████ | 3506/5773 [1:35:28<4:05:58, 6.51s/it] 61%|██████ | 3506/5773 [1:35:26<4:05:58, 6.51s/it] {'loss': 0.572, 'learning_rate': 7.056405959956492e-06, 'epoch': 0.61} 61%|██████ | 3506/5773 [1:35:28<4:05:58, 6.51s/it] {'loss': 
0.572, 'learning_rate': 7.056405959956492e-06, 'epoch': 0.61} 61%|██████ | 3506/5773 [1:35:26<4:05:58, 6.51s/it] 61%|██████ | 3507/5773 [1:35:33<3:55:06, 6.23s/it] 61%|██████ | 3507/5773 [1:35:31<3:55:06, 6.23s/it] {'loss': 0.5635, 'learning_rate': 7.051044030773619e-06, 'epoch': 0.61} 61%|██████ | 3507/5773 [1:35:33<3:55:06, 6.23s/it] {'loss': 0.5635, 'learning_rate': 7.051044030773619e-06, 'epoch': 0.61} 61%|██████ | 3507/5773 [1:35:31<3:55:06, 6.23s/it] 61%|██████ | 3508/5773 [1:35:39<3:48:17, 6.05s/it] 61%|██████ | 3508/5773 [1:35:37<3:48:16, 6.05s/it] {'loss': 0.6217, 'learning_rate': 7.045683030016299e-06, 'epoch': 0.61} 61%|██████ | 3508/5773 [1:35:39<3:48:17, 6.05s/it] {'loss': 0.6217, 'learning_rate': 7.045683030016299e-06, 'epoch': 0.61} 61%|██████ | 3508/5773 [1:35:37<3:48:16, 6.05s/it] 61%|██████ | 3509/5773 [1:35:44<3:40:58, 5.86s/it] 61%|██████ | 3509/5773 [1:35:42<3:40:58, 5.86s/it] {'loss': 0.5628, 'learning_rate': 7.040322959372352e-06, 'epoch': 0.61} 61%|██████ | 3509/5773 [1:35:44<3:40:58, 5.86s/it] {'loss': 0.5628, 'learning_rate': 7.040322959372352e-06, 'epoch': 0.61} 61%|██████ | 3509/5773 [1:35:42<3:40:58, 5.86s/it] 61%|██████ | 3510/5773 [1:35:50<3:34:04, 5.68s/it] 61%|██████ | 3510/5773 [1:35:48<3:34:04, 5.68s/it] {'loss': 0.5521, 'learning_rate': 7.034963820529299e-06, 'epoch': 0.61} 61%|██████ | 3510/5773 [1:35:50<3:34:04, 5.68s/it] {'loss': 0.5521, 'learning_rate': 7.034963820529299e-06, 'epoch': 0.61} 61%|██████ | 3510/5773 [1:35:48<3:34:04, 5.68s/it] 61%|██████ | 3511/5773 [1:35:55<3:31:33, 5.61s/it] 61%|██████ | 3511/5773 [1:35:53<3:31:33, 5.61s/it] {'loss': 0.5431, 'learning_rate': 7.029605615174365e-06, 'epoch': 0.61} 61%|██████ | 3511/5773 [1:35:55<3:31:33, 5.61s/it] {'loss': 0.5431, 'learning_rate': 7.029605615174365e-06, 'epoch': 0.61} 61%|██████ | 3511/5773 [1:35:53<3:31:33, 5.61s/it] 61%|██████ | 3512/5773 [1:36:00<3:28:27, 5.53s/it] 61%|██████ | 3512/5773 [1:35:58<3:28:27, 5.53s/it] {'loss': 0.5652, 'learning_rate': 
7.02424834499449e-06, 'epoch': 0.61} 61%|██████ | 3512/5773 [1:36:00<3:28:27, 5.53s/it] {'loss': 0.5652, 'learning_rate': 7.02424834499449e-06, 'epoch': 0.61} 61%|██████ | 3512/5773 [1:35:58<3:28:27, 5.53s/it] 61%|██████ | 3513/5773 [1:36:06<3:26:58, 5.50s/it] 61%|██████ | 3513/5773 [1:36:04<3:26:58, 5.50s/it] {'loss': 0.5719, 'learning_rate': 7.018892011676308e-06, 'epoch': 0.61} 61%|██████ | 3513/5773 [1:36:06<3:26:58, 5.50s/it] {'loss': 0.5719, 'learning_rate': 7.018892011676308e-06, 'epoch': 0.61} 61%|██████ | 3513/5773 [1:36:04<3:26:58, 5.50s/it] 61%|██████ | 3514/5773 [1:36:11<3:27:58, 5.52s/it] 61%|██████ | 3514/5773 [1:36:09<3:27:58, 5.52s/it] {'loss': 0.5723, 'learning_rate': 7.0135366169061665e-06, 'epoch': 0.61} 61%|██████ | 3514/5773 [1:36:11<3:27:58, 5.52s/it] {'loss': 0.5723, 'learning_rate': 7.0135366169061665e-06, 'epoch': 0.61} 61%|██████ | 3514/5773 [1:36:09<3:27:58, 5.52s/it] 61%|██████ | 3515/5773 [1:36:17<3:27:54, 5.52s/it] 61%|██████ | 3515/5773 [1:36:15<3:27:54, 5.52s/it] {'loss': 0.5645, 'learning_rate': 7.0081821623701165e-06, 'epoch': 0.61} 61%|██████ | 3515/5773 [1:36:17<3:27:54, 5.52s/it] {'loss': 0.5645, 'learning_rate': 7.0081821623701165e-06, 'epoch': 0.61} 61%|██████ | 3515/5773 [1:36:15<3:27:54, 5.52s/it] 61%|██████ | 3516/5773 [1:36:22<3:26:42, 5.50s/it] 61%|██████ | 3516/5773 [1:36:20<3:26:42, 5.50s/it] {'loss': 0.5537, 'learning_rate': 7.002828649753908e-06, 'epoch': 0.61} 61%|██████ | 3516/5773 [1:36:22<3:26:42, 5.50s/it] {'loss': 0.5537, 'learning_rate': 7.002828649753908e-06, 'epoch': 0.61} 61%|██████ | 3516/5773 [1:36:20<3:26:42, 5.50s/it] 61%|██████ | 3517/5773 [1:36:28<3:26:49, 5.50s/it] 61%|██████ | 3517/5773 [1:36:26<3:26:49, 5.50s/it] {'loss': 0.5713, 'learning_rate': 6.997476080743002e-06, 'epoch': 0.61} 61%|██████ | 3517/5773 [1:36:28<3:26:49, 5.50s/it] {'loss': 0.5713, 'learning_rate': 6.997476080743002e-06, 'epoch': 0.61} 61%|██████ | 3517/5773 [1:36:26<3:26:49, 5.50s/it] 61%|██████ | 3518/5773 [1:36:33<3:27:19, 
5.52s/it] 61%|██████ | 3518/5773 [1:36:31<3:27:19, 5.52s/it] {'loss': 0.5624, 'learning_rate': 6.9921244570225536e-06, 'epoch': 0.61} 61%|██████ | 3518/5773 [1:36:33<3:27:19, 5.52s/it] {'loss': 0.5624, 'learning_rate': 6.9921244570225536e-06, 'epoch': 0.61} 61%|██████ | 3518/5773 [1:36:31<3:27:19, 5.52s/it] 61%|██████ | 3519/5773 [1:36:39<3:25:52, 5.48s/it] 61%|██████ | 3519/5773 [1:36:37<3:25:52, 5.48s/it] {'loss': 0.5695, 'learning_rate': 6.986773780277427e-06, 'epoch': 0.61} 61%|██████ | 3519/5773 [1:36:39<3:25:52, 5.48s/it] {'loss': 0.5695, 'learning_rate': 6.986773780277427e-06, 'epoch': 0.61} 61%|██████ | 3519/5773 [1:36:37<3:25:52, 5.48s/it] 61%|██████ | 3520/5773 [1:36:44<3:23:46, 5.43s/it] 61%|██████ | 3520/5773 [1:36:42<3:23:46, 5.43s/it] {'loss': 0.5793, 'learning_rate': 6.981424052192187e-06, 'epoch': 0.61} 61%|██████ | 3520/5773 [1:36:44<3:23:46, 5.43s/it] {'loss': 0.5793, 'learning_rate': 6.981424052192187e-06, 'epoch': 0.61} 61%|██████ | 3520/5773 [1:36:42<3:23:46, 5.43s/it] 61%|██████ | 3521/5773 [1:36:50<3:24:36, 5.45s/it] 61%|██████ | 3521/5773 [1:36:48<3:24:37, 5.45s/it] {'loss': 0.559, 'learning_rate': 6.976075274451102e-06, 'epoch': 0.61} 61%|██████ | 3521/5773 [1:36:50<3:24:36, 5.45s/it] {'loss': 0.559, 'learning_rate': 6.976075274451102e-06, 'epoch': 0.61} 61%|██████ | 3521/5773 [1:36:48<3:24:37, 5.45s/it] 61%|██████ | 3522/5773 [1:36:55<3:24:24, 5.45s/it] 61%|██████ | 3522/5773 [1:36:53<3:24:24, 5.45s/it] {'loss': 0.5759, 'learning_rate': 6.97072744873813e-06, 'epoch': 0.61} 61%|██████ | 3522/5773 [1:36:55<3:24:24, 5.45s/it] {'loss': 0.5759, 'learning_rate': 6.97072744873813e-06, 'epoch': 0.61} 61%|██████ | 3522/5773 [1:36:53<3:24:24, 5.45s/it] 61%|██████ | 3523/5773 [1:37:01<3:25:04, 5.47s/it] 61%|██████ | 3523/5773 [1:36:59<3:25:04, 5.47s/it] {'loss': 0.5712, 'learning_rate': 6.965380576736943e-06, 'epoch': 0.61} 61%|██████ | 3523/5773 [1:37:01<3:25:04, 5.47s/it] {'loss': 0.5712, 'learning_rate': 6.965380576736943e-06, 'epoch': 0.61} 
61%|██████ | 3523/5773 [1:36:59<3:25:04, 5.47s/it] 61%|██████ | 3524/5773 [1:37:06<3:25:44, 5.49s/it] 61%|██████ | 3524/5773 [1:37:04<3:25:44, 5.49s/it] {'loss': 0.5686, 'learning_rate': 6.960034660130904e-06, 'epoch': 0.61} 61%|██████ | 3524/5773 [1:37:06<3:25:44, 5.49s/it] {'loss': 0.5686, 'learning_rate': 6.960034660130904e-06, 'epoch': 0.61} 61%|██████ | 3524/5773 [1:37:04<3:25:44, 5.49s/it] 61%|██████ | 3525/5773 [1:37:12<3:28:03, 5.55s/it] 61%|██████ | 3525/5773 [1:37:10<3:28:04, 5.55s/it] {'loss': 0.5567, 'learning_rate': 6.9546897006030815e-06, 'epoch': 0.61} 61%|██████ | 3525/5773 [1:37:12<3:28:03, 5.55s/it] {'loss': 0.5567, 'learning_rate': 6.9546897006030815e-06, 'epoch': 0.61} 61%|██████ | 3525/5773 [1:37:10<3:28:04, 5.55s/it] 61%|██████ | 3526/5773 [1:37:17<3:27:17, 5.53s/it] 61%|██████ | 3526/5773 [1:37:15<3:27:17, 5.53s/it] {'loss': 0.5555, 'learning_rate': 6.949345699836239e-06, 'epoch': 0.61} 61%|██████ | 3526/5773 [1:37:17<3:27:17, 5.53s/it] {'loss': 0.5555, 'learning_rate': 6.949345699836239e-06, 'epoch': 0.61} 61%|██████ | 3526/5773 [1:37:15<3:27:17, 5.53s/it] 61%|██████ | 3527/5773 [1:37:23<3:24:34, 5.47s/it] 61%|██████ | 3527/5773 [1:37:21<3:24:34, 5.47s/it] {'loss': 0.5686, 'learning_rate': 6.9440026595128365e-06, 'epoch': 0.61} 61%|██████ | 3527/5773 [1:37:23<3:24:34, 5.47s/it] {'loss': 0.5686, 'learning_rate': 6.9440026595128365e-06, 'epoch': 0.61} 61%|██████ | 3527/5773 [1:37:21<3:24:34, 5.47s/it] 61%|██████ | 3528/5773 [1:37:28<3:24:35, 5.47s/it] 61%|██████ | 3528/5773 [1:37:26<3:24:35, 5.47s/it] {'loss': 0.5822, 'learning_rate': 6.938660581315036e-06, 'epoch': 0.61} 61%|██████ | 3528/5773 [1:37:28<3:24:35, 5.47s/it] {'loss': 0.5822, 'learning_rate': 6.938660581315036e-06, 'epoch': 0.61} 61%|██████ | 3528/5773 [1:37:26<3:24:35, 5.47s/it] 61%|██████ | 3529/5773 [1:37:33<3:23:47, 5.45s/it] 61%|██████ | 3529/5773 [1:37:32<3:23:47, 5.45s/it] {'loss': 0.5749, 'learning_rate': 6.933319466924693e-06, 'epoch': 0.61} 61%|██████ | 3529/5773 
[1:37:33<3:23:47, 5.45s/it] {'loss': 0.5749, 'learning_rate': 6.933319466924693e-06, 'epoch': 0.61} 61%|██████ | 3529/5773 [1:37:32<3:23:47, 5.45s/it] 61%|██████ | 3530/5773 [1:37:39<3:23:59, 5.46s/it] 61%|██████ | 3530/5773 [1:37:37<3:23:59, 5.46s/it] {'loss': 0.5632, 'learning_rate': 6.927979318023364e-06, 'epoch': 0.61} 61%|██████ | 3530/5773 [1:37:39<3:23:59, 5.46s/it] {'loss': 0.5632, 'learning_rate': 6.927979318023364e-06, 'epoch': 0.61} 61%|██████ | 3530/5773 [1:37:37<3:23:59, 5.46s/it] 61%|██████ | 3531/5773 [1:37:44<3:23:54, 5.46s/it] 61%|██████ | 3531/5773 [1:37:42<3:23:54, 5.46s/it] {'loss': 0.5601, 'learning_rate': 6.922640136292293e-06, 'epoch': 0.61} 61%|██████ | 3531/5773 [1:37:44<3:23:54, 5.46s/it] {'loss': 0.5601, 'learning_rate': 6.922640136292293e-06, 'epoch': 0.61} 61%|██████ | 3531/5773 [1:37:42<3:23:54, 5.46s/it] 61%|██████ | 3532/5773 [1:37:50<3:25:37, 5.51s/it] 61%|██████ | 3532/5773 [1:37:48<3:25:37, 5.51s/it] {'loss': 0.567, 'learning_rate': 6.917301923412427e-06, 'epoch': 0.61} 61%|██████ | 3532/5773 [1:37:50<3:25:37, 5.51s/it] {'loss': 0.567, 'learning_rate': 6.917301923412427e-06, 'epoch': 0.61} 61%|██████ | 3532/5773 [1:37:48<3:25:37, 5.51s/it] 61%|██████ | 3533/5773 [1:37:56<3:26:05, 5.52s/it] 61%|██████ | 3533/5773 [1:37:54<3:26:05, 5.52s/it] {'loss': 0.5594, 'learning_rate': 6.911964681064411e-06, 'epoch': 0.61} 61%|██████ | 3533/5773 [1:37:56<3:26:05, 5.52s/it] {'loss': 0.5594, 'learning_rate': 6.911964681064411e-06, 'epoch': 0.61} 61%|██████ | 3533/5773 [1:37:54<3:26:05, 5.52s/it] 61%|██████ | 3534/5773 [1:37:59<3:25:47, 5.51s/it] 61%|██████ | 3534/5773 [1:38:01<3:25:48, 5.51s/it] {'loss': 0.5697, 'learning_rate': 6.906628410928573e-06, 'epoch': 0.61} 61%|██████ | 3534/5773 [1:38:01<3:25:48, 5.51s/it] {'loss': 0.5697, 'learning_rate': 6.906628410928573e-06, 'epoch': 0.61} 61%|██████ | 3534/5773 [1:37:59<3:25:47, 5.51s/it] 61%|██████ | 3535/5773 [1:38:04<3:23:43, 5.46s/it] 61%|██████ | 3535/5773 [1:38:06<3:23:43, 5.46s/it] {'loss': 
0.5654, 'learning_rate': 6.901293114684949e-06, 'epoch': 0.61} 61%|██████ | 3535/5773 [1:38:06<3:23:43, 5.46s/it] {'loss': 0.5654, 'learning_rate': 6.901293114684949e-06, 'epoch': 0.61} 61%|██████ | 3535/5773 [1:38:04<3:23:43, 5.46s/it] 61%|██████▏ | 3536/5773 [1:38:12<3:23:23, 5.46s/it] 61%|██████▏ | 3536/5773 [1:38:10<3:23:23, 5.46s/it] {'loss': 0.5649, 'learning_rate': 6.895958794013251e-06, 'epoch': 0.61} 61%|██████▏ | 3536/5773 [1:38:12<3:23:23, 5.46s/it] {'loss': 0.5649, 'learning_rate': 6.895958794013251e-06, 'epoch': 0.61} 61%|██████▏ | 3536/5773 [1:38:10<3:23:23, 5.46s/it] 61%|██████▏ | 3537/5773 [1:38:15<3:23:42, 5.47s/it] 61%|██████▏ | 3537/5773 [1:38:17<3:23:42, 5.47s/it] {'loss': 0.5702, 'learning_rate': 6.890625450592897e-06, 'epoch': 0.61} 61%|██████▏ | 3537/5773 [1:38:17<3:23:42, 5.47s/it] {'loss': 0.5702, 'learning_rate': 6.890625450592897e-06, 'epoch': 0.61} 61%|██████▏ | 3537/5773 [1:38:15<3:23:42, 5.47s/it] 61%|██████▏ | 3538/5773 [1:38:23<3:22:14, 5.43s/it] 61%|██████▏ | 3538/5773 [1:38:21<3:22:15, 5.43s/it] {'loss': 0.5678, 'learning_rate': 6.8852930861029955e-06, 'epoch': 0.61} 61%|██████▏ | 3538/5773 [1:38:23<3:22:14, 5.43s/it] {'loss': 0.5678, 'learning_rate': 6.8852930861029955e-06, 'epoch': 0.61} 61%|██████▏ | 3538/5773 [1:38:21<3:22:15, 5.43s/it] 61%|██████▏ | 3539/5773 [1:38:28<3:23:30, 5.47s/it] 61%|██████▏ | 3539/5773 [1:38:26<3:23:30, 5.47s/it] {'loss': 0.5642, 'learning_rate': 6.8799617022223465e-06, 'epoch': 0.61} 61%|██████▏ | 3539/5773 [1:38:28<3:23:30, 5.47s/it] {'loss': 0.5642, 'learning_rate': 6.8799617022223465e-06, 'epoch': 0.61} 61%|██████▏ | 3539/5773 [1:38:26<3:23:30, 5.47s/it] 61%|██████▏ | 3540/5773 [1:38:34<3:22:20, 5.44s/it] 61%|██████▏ | 3540/5773 [1:38:32<3:22:20, 5.44s/it] {'loss': 0.5569, 'learning_rate': 6.874631300629435e-06, 'epoch': 0.61} 61%|██████▏ | 3540/5773 [1:38:34<3:22:20, 5.44s/it] {'loss': 0.5569, 'learning_rate': 6.874631300629435e-06, 'epoch': 0.61} 61%|██████▏ | 3540/5773 [1:38:32<3:22:20, 
5.44s/it] 61%|██████▏ | 3541/5773 [1:38:39<3:22:02, 5.43s/it] 61%|██████▏ | 3541/5773 [1:38:37<3:22:03, 5.43s/it] {'loss': 0.5597, 'learning_rate': 6.8693018830024485e-06, 'epoch': 0.61} 61%|██████▏ | 3541/5773 [1:38:39<3:22:02, 5.43s/it] {'loss': 0.5597, 'learning_rate': 6.8693018830024485e-06, 'epoch': 0.61} 61%|██████▏ | 3541/5773 [1:38:37<3:22:03, 5.43s/it] 61%|██████▏ | 3542/5773 [1:38:45<3:22:52, 5.46s/it] 61%|██████▏ | 3542/5773 [1:38:43<3:22:52, 5.46s/it] {'loss': 0.5676, 'learning_rate': 6.863973451019252e-06, 'epoch': 0.61} 61%|██████▏ | 3542/5773 [1:38:45<3:22:52, 5.46s/it] {'loss': 0.5676, 'learning_rate': 6.863973451019252e-06, 'epoch': 0.61} 61%|██████▏ | 3542/5773 [1:38:43<3:22:52, 5.46s/it] 61%|██████▏ | 3543/5773 [1:38:48<3:24:07, 5.49s/it] 61%|██████▏ | 3543/5773 [1:38:50<3:24:08, 5.49s/it] {'loss': 0.5627, 'learning_rate': 6.85864600635741e-06, 'epoch': 0.61} 61%|██████▏ | 3543/5773 [1:38:50<3:24:08, 5.49s/it] {'loss': 0.5627, 'learning_rate': 6.85864600635741e-06, 'epoch': 0.61} 61%|██████▏ | 3543/5773 [1:38:48<3:24:07, 5.49s/it] 61%|██████▏ | 3544/5773 [1:38:54<3:23:21, 5.47s/it] 61%|██████▏ | 3544/5773 [1:38:56<3:23:21, 5.47s/it] {'loss': 0.5613, 'learning_rate': 6.853319550694168e-06, 'epoch': 0.61} 61%|██████▏ | 3544/5773 [1:38:56<3:23:21, 5.47s/it] {'loss': 0.5613, 'learning_rate': 6.853319550694168e-06, 'epoch': 0.61} 61%|██████▏ | 3544/5773 [1:38:54<3:23:21, 5.47s/it] 61%|██████▏ | 3545/5773 [1:39:01<3:23:32, 5.48s/it] 61%|██████▏ | 3545/5773 [1:38:59<3:23:31, 5.48s/it] {'loss': 0.5675, 'learning_rate': 6.847994085706469e-06, 'epoch': 0.61} 61%|██████▏ | 3545/5773 [1:39:01<3:23:32, 5.48s/it] {'loss': 0.5675, 'learning_rate': 6.847994085706469e-06, 'epoch': 0.61} 61%|██████▏ | 3545/5773 [1:38:59<3:23:31, 5.48s/it] 61%|██████▏ | 3546/5773 [1:39:04<3:22:19, 5.45s/it] 61%|██████▏ | 3546/5773 [1:39:06<3:22:20, 5.45s/it] {'loss': 0.556, 'learning_rate': 6.84266961307094e-06, 'epoch': 0.61} 61%|██████▏ | 3546/5773 [1:39:06<3:22:20, 5.45s/it] 
{'loss': 0.556, 'learning_rate': 6.84266961307094e-06, 'epoch': 0.61} 61%|██████▏ | 3546/5773 [1:39:04<3:22:19, 5.45s/it] 61%|██████▏ | 3547/5773 [1:39:12<3:22:38, 5.46s/it] 61%|██████▏ | 3547/5773 [1:39:10<3:22:39, 5.46s/it] {'loss': 0.577, 'learning_rate': 6.837346134463889e-06, 'epoch': 0.61} 61%|██████▏ | 3547/5773 [1:39:12<3:22:38, 5.46s/it] {'loss': 0.577, 'learning_rate': 6.837346134463889e-06, 'epoch': 0.61} 61%|██████▏ | 3547/5773 [1:39:10<3:22:39, 5.46s/it] 61%|██████▏ | 3548/5773 [1:39:16<3:23:36, 5.49s/it] 61%|██████▏ | 3548/5773 [1:39:18<3:23:37, 5.49s/it] {'loss': 0.575, 'learning_rate': 6.832023651561324e-06, 'epoch': 0.61} 61%|██████▏ | 3548/5773 [1:39:18<3:23:37, 5.49s/it] {'loss': 0.575, 'learning_rate': 6.832023651561324e-06, 'epoch': 0.61} 61%|██████▏ | 3548/5773 [1:39:16<3:23:36, 5.49s/it] 61%|██████▏ | 3549/5773 [1:39:21<3:22:07, 5.45s/it] 61%|██████▏ | 3549/5773 [1:39:23<3:22:07, 5.45s/it] {'loss': 0.5648, 'learning_rate': 6.826702166038932e-06, 'epoch': 0.61} 61%|██████▏ | 3549/5773 [1:39:23<3:22:07, 5.45s/it] {'loss': 0.5648, 'learning_rate': 6.826702166038932e-06, 'epoch': 0.61} 61%|██████▏ | 3549/5773 [1:39:21<3:22:07, 5.45s/it]12 AutoResumeHook: Checking whether to suspend... 814 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 61%|██████▏ | 3550/5773 [1:39:29<3:24:27, 5.52s/it]9 10AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 61%|██████▏ | 3550/5773 [1:39:27<3:24:27, 5.52s/it]1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5796, 'learning_rate': 6.821381679572079e-06, 'epoch': 0.61} 61%|██████▏ | 3550/5773 [1:39:29<3:24:27, 5.52s/it] {'loss': 0.5796, 'learning_rate': 6.821381679572079e-06, 'epoch': 0.61} 61%|██████▏ | 3550/5773 [1:39:27<3:24:27, 5.52s/it] 62%|██████▏ | 3551/5773 [1:39:32<3:23:32, 5.50s/it] 62%|██████▏ | 3551/5773 [1:39:34<3:23:32, 5.50s/it] {'loss': 0.5549, 'learning_rate': 6.81606219383583e-06, 'epoch': 0.62} 62%|██████▏ | 3551/5773 [1:39:34<3:23:32, 5.50s/it] {'loss': 0.5549, 'learning_rate': 6.81606219383583e-06, 'epoch': 0.62} 62%|██████▏ | 3551/5773 [1:39:32<3:23:32, 5.50s/it] 62%|██████▏ | 3552/5773 [1:39:39<3:22:15, 5.46s/it] 62%|██████▏ | 3552/5773 [1:39:37<3:22:15, 5.46s/it] {'loss': 0.5816, 'learning_rate': 6.810743710504931e-06, 'epoch': 0.62} 62%|██████▏ | 3552/5773 [1:39:39<3:22:15, 5.46s/it] {'loss': 0.5816, 'learning_rate': 6.810743710504931e-06, 'epoch': 0.62} 62%|██████▏ | 3552/5773 [1:39:37<3:22:15, 5.46s/it] 62%|██████▏ | 3553/5773 [1:39:45<3:23:17, 5.49s/it] 62%|██████▏ | 3553/5773 [1:39:43<3:23:17, 5.49s/it] {'loss': 0.5692, 'learning_rate': 6.805426231253806e-06, 'epoch': 0.62} 62%|██████▏ | 3553/5773 [1:39:45<3:23:17, 5.49s/it] {'loss': 0.5692, 'learning_rate': 6.805426231253806e-06, 'epoch': 0.62} 62%|██████▏ | 3553/5773 [1:39:43<3:23:17, 5.49s/it] 62%|██████▏ | 3554/5773 [1:39:50<3:22:49, 5.48s/it] 62%|██████▏ | 3554/5773 [1:39:48<3:22:49, 5.48s/it] {'loss': 0.5672, 'learning_rate': 6.80010975775657e-06, 'epoch': 0.62} 62%|██████▏ | 3554/5773 [1:39:50<3:22:49, 5.48s/it] {'loss': 0.5672, 'learning_rate': 6.80010975775657e-06, 'epoch': 0.62} 62%|██████▏ | 3554/5773 [1:39:48<3:22:49, 5.48s/it] 62%|██████▏ | 3555/5773 [1:39:56<3:23:47, 5.51s/it] 62%|██████▏ | 3555/5773 [1:39:54<3:23:47, 5.51s/it] {'loss': 0.5726, 'learning_rate': 6.794794291687018e-06, 'epoch': 0.62} 62%|██████▏ | 3555/5773 [1:39:56<3:23:47, 5.51s/it] {'loss': 0.5726, 
'learning_rate': 6.794794291687018e-06, 'epoch': 0.62} 62%|██████▏ | 3555/5773 [1:39:54<3:23:47, 5.51s/it] 62%|██████▏ | 3556/5773 [1:40:01<3:22:28, 5.48s/it] 62%|██████▏ | 3556/5773 [1:39:59<3:22:28, 5.48s/it] {'loss': 0.5666, 'learning_rate': 6.789479834718627e-06, 'epoch': 0.62} 62%|██████▏ | 3556/5773 [1:40:01<3:22:28, 5.48s/it] {'loss': 0.5666, 'learning_rate': 6.789479834718627e-06, 'epoch': 0.62} 62%|██████▏ | 3556/5773 [1:39:59<3:22:28, 5.48s/it] 62%|██████▏ | 3557/5773 [1:40:07<3:25:21, 5.56s/it] 62%|██████▏ | 3557/5773 [1:40:05<3:25:21, 5.56s/it] {'loss': 0.5779, 'learning_rate': 6.784166388524562e-06, 'epoch': 0.62} 62%|██████▏ | 3557/5773 [1:40:07<3:25:21, 5.56s/it] {'loss': 0.5779, 'learning_rate': 6.784166388524562e-06, 'epoch': 0.62} 62%|██████▏ | 3557/5773 [1:40:05<3:25:21, 5.56s/it] 62%|██████▏ | 3558/5773 [1:40:11<3:25:29, 5.57s/it] 62%|██████▏ | 3558/5773 [1:40:13<3:25:29, 5.57s/it] {'loss': 0.5781, 'learning_rate': 6.778853954777662e-06, 'epoch': 0.62} 62%|██████▏ | 3558/5773 [1:40:13<3:25:29, 5.57s/it] {'loss': 0.5781, 'learning_rate': 6.778853954777662e-06, 'epoch': 0.62} 62%|██████▏ | 3558/5773 [1:40:11<3:25:29, 5.57s/it] 62%|██████▏ | 3559/5773 [1:40:18<3:23:51, 5.52s/it] 62%|██████▏ | 3559/5773 [1:40:16<3:23:51, 5.52s/it] {'loss': 0.5567, 'learning_rate': 6.773542535150453e-06, 'epoch': 0.62} 62%|██████▏ | 3559/5773 [1:40:18<3:23:51, 5.52s/it] {'loss': 0.5567, 'learning_rate': 6.773542535150453e-06, 'epoch': 0.62} 62%|██████▏ | 3559/5773 [1:40:16<3:23:51, 5.52s/it] 62%|██████▏ | 3560/5773 [1:40:24<3:24:33, 5.55s/it] 62%|██████▏ | 3560/5773 [1:40:22<3:24:33, 5.55s/it] {'loss': 0.5655, 'learning_rate': 6.768232131315136e-06, 'epoch': 0.62} 62%|██████▏ | 3560/5773 [1:40:24<3:24:33, 5.55s/it] {'loss': 0.5655, 'learning_rate': 6.768232131315136e-06, 'epoch': 0.62} 62%|██████▏ | 3560/5773 [1:40:22<3:24:33, 5.55s/it] 62%|██████▏ | 3561/5773 [1:40:29<3:25:38, 5.58s/it] 62%|██████▏ | 3561/5773 [1:40:27<3:25:38, 5.58s/it] {'loss': 0.5574, 
'learning_rate': 6.762922744943601e-06, 'epoch': 0.62} 62%|██████▏ | 3561/5773 [1:40:29<3:25:38, 5.58s/it] {'loss': 0.5574, 'learning_rate': 6.762922744943601e-06, 'epoch': 0.62} 62%|██████▏ | 3561/5773 [1:40:27<3:25:38, 5.58s/it] 62%|██████▏ | 3562/5773 [1:40:35<3:26:13, 5.60s/it] 62%|██████▏ | 3562/5773 [1:40:33<3:26:13, 5.60s/it] {'loss': 0.5621, 'learning_rate': 6.757614377707409e-06, 'epoch': 0.62} 62%|██████▏ | 3562/5773 [1:40:35<3:26:13, 5.60s/it] {'loss': 0.5621, 'learning_rate': 6.757614377707409e-06, 'epoch': 0.62} 62%|██████▏ | 3562/5773 [1:40:33<3:26:13, 5.60s/it] 62%|██████▏ | 3563/5773 [1:40:41<3:27:00, 5.62s/it] 62%|██████▏ | 3563/5773 [1:40:39<3:27:00, 5.62s/it] {'loss': 0.5666, 'learning_rate': 6.752307031277807e-06, 'epoch': 0.62} 62%|██████▏ | 3563/5773 [1:40:41<3:27:00, 5.62s/it] {'loss': 0.5666, 'learning_rate': 6.752307031277807e-06, 'epoch': 0.62} 62%|██████▏ | 3563/5773 [1:40:39<3:27:00, 5.62s/it] 62%|██████▏ | 3564/5773 [1:40:46<3:25:42, 5.59s/it] 62%|██████▏ | 3564/5773 [1:40:44<3:25:43, 5.59s/it] {'loss': 0.5738, 'learning_rate': 6.7470007073257105e-06, 'epoch': 0.62} 62%|██████▏ | 3564/5773 [1:40:46<3:25:42, 5.59s/it] {'loss': 0.5738, 'learning_rate': 6.7470007073257105e-06, 'epoch': 0.62} 62%|██████▏ | 3564/5773 [1:40:44<3:25:43, 5.59s/it] 62%|██████▏ | 3565/5773 [1:40:52<3:24:45, 5.56s/it] 62%|██████▏ | 3565/5773 [1:40:50<3:24:45, 5.56s/it] {'loss': 0.561, 'learning_rate': 6.741695407521727e-06, 'epoch': 0.62} 62%|██████▏ | 3565/5773 [1:40:52<3:24:45, 5.56s/it] {'loss': 0.561, 'learning_rate': 6.741695407521727e-06, 'epoch': 0.62} 62%|██████▏ | 3565/5773 [1:40:50<3:24:45, 5.56s/it] 62%|██████▏ | 3566/5773 [1:40:55<3:24:14, 5.55s/it] 62%|██████▏ | 3566/5773 [1:40:57<3:24:14, 5.55s/it] {'loss': 0.5635, 'learning_rate': 6.736391133536127e-06, 'epoch': 0.62} 62%|██████▏ | 3566/5773 [1:40:57<3:24:14, 5.55s/it] {'loss': 0.5635, 'learning_rate': 6.736391133536127e-06, 'epoch': 0.62} 62%|██████▏ | 3566/5773 [1:40:55<3:24:14, 5.55s/it] 
62%|██████▏ | 3567/5773 [1:41:03<3:22:42, 5.51s/it] 62%|██████▏ | 3567/5773 [1:41:01<3:22:42, 5.51s/it] {'loss': 0.5515, 'learning_rate': 6.731087887038874e-06, 'epoch': 0.62} 62%|██████▏ | 3567/5773 [1:41:03<3:22:42, 5.51s/it] {'loss': 0.5515, 'learning_rate': 6.731087887038874e-06, 'epoch': 0.62} 62%|██████▏ | 3567/5773 [1:41:01<3:22:42, 5.51s/it] 62%|██████▏ | 3568/5773 [1:41:08<3:22:23, 5.51s/it] 62%|██████▏ | 3568/5773 [1:41:06<3:22:23, 5.51s/it] {'loss': 0.5818, 'learning_rate': 6.725785669699593e-06, 'epoch': 0.62} 62%|██████▏ | 3568/5773 [1:41:08<3:22:23, 5.51s/it] {'loss': 0.5818, 'learning_rate': 6.725785669699593e-06, 'epoch': 0.62} 62%|██████▏ | 3568/5773 [1:41:06<3:22:23, 5.51s/it] 62%|██████▏ | 3569/5773 [1:41:14<3:21:22, 5.48s/it] 62%|██████▏ | 3569/5773 [1:41:12<3:21:22, 5.48s/it] {'loss': 0.5616, 'learning_rate': 6.720484483187592e-06, 'epoch': 0.62} 62%|██████▏ | 3569/5773 [1:41:14<3:21:22, 5.48s/it] {'loss': 0.5616, 'learning_rate': 6.720484483187592e-06, 'epoch': 0.62} 62%|██████▏ | 3569/5773 [1:41:12<3:21:22, 5.48s/it] 62%|██████▏ | 3570/5773 [1:41:19<3:21:34, 5.49s/it] 62%|██████▏ | 3570/5773 [1:41:17<3:21:34, 5.49s/it] {'loss': 0.5686, 'learning_rate': 6.71518432917186e-06, 'epoch': 0.62} 62%|██████▏ | 3570/5773 [1:41:19<3:21:34, 5.49s/it] {'loss': 0.5686, 'learning_rate': 6.71518432917186e-06, 'epoch': 0.62} 62%|██████▏ | 3570/5773 [1:41:17<3:21:34, 5.49s/it] 62%|██████▏ | 3571/5773 [1:41:25<3:22:01, 5.50s/it] 62%|██████▏ | 3571/5773 [1:41:23<3:22:01, 5.50s/it] {'loss': 0.5702, 'learning_rate': 6.709885209321047e-06, 'epoch': 0.62} 62%|██████▏ | 3571/5773 [1:41:25<3:22:01, 5.50s/it] {'loss': 0.5702, 'learning_rate': 6.709885209321047e-06, 'epoch': 0.62} 62%|██████▏ | 3571/5773 [1:41:23<3:22:01, 5.50s/it] 62%|██████▏ | 3572/5773 [1:41:30<3:23:53, 5.56s/it] 62%|██████▏ | 3572/5773 [1:41:28<3:23:53, 5.56s/it] {'loss': 0.563, 'learning_rate': 6.70458712530349e-06, 'epoch': 0.62} 62%|██████▏ | 3572/5773 [1:41:30<3:23:53, 5.56s/it] {'loss': 0.563, 
'learning_rate': 6.70458712530349e-06, 'epoch': 0.62} 62%|██████▏ | 3572/5773 [1:41:28<3:23:53, 5.56s/it] 62%|██████▏ | 3573/5773 [1:41:36<3:21:54, 5.51s/it] 62%|██████▏ | 3573/5773 [1:41:34<3:21:54, 5.51s/it] {'loss': 0.5539, 'learning_rate': 6.699290078787193e-06, 'epoch': 0.62} 62%|██████▏ | 3573/5773 [1:41:36<3:21:54, 5.51s/it] {'loss': 0.5539, 'learning_rate': 6.699290078787193e-06, 'epoch': 0.62} 62%|██████▏ | 3573/5773 [1:41:34<3:21:54, 5.51s/it] 62%|██████▏ | 3574/5773 [1:41:41<3:22:02, 5.51s/it] 62%|██████▏ | 3574/5773 [1:41:39<3:22:02, 5.51s/it] {'loss': 0.5701, 'learning_rate': 6.693994071439836e-06, 'epoch': 0.62} 62%|██████▏ | 3574/5773 [1:41:41<3:22:02, 5.51s/it] {'loss': 0.5701, 'learning_rate': 6.693994071439836e-06, 'epoch': 0.62} 62%|██████▏ | 3574/5773 [1:41:39<3:22:02, 5.51s/it] 62%|██████▏ | 3575/5773 [1:41:47<3:20:21, 5.47s/it] 62%|██████▏ | 3575/5773 [1:41:45<3:20:21, 5.47s/it] {'loss': 0.5591, 'learning_rate': 6.688699104928771e-06, 'epoch': 0.62} 62%|██████▏ | 3575/5773 [1:41:47<3:20:21, 5.47s/it] {'loss': 0.5591, 'learning_rate': 6.688699104928771e-06, 'epoch': 0.62} 62%|██████▏ | 3575/5773 [1:41:45<3:20:21, 5.47s/it] 62%|██████▏ | 3576/5773 [1:41:52<3:22:26, 5.53s/it] 62%|██████▏ | 3576/5773 [1:41:50<3:22:26, 5.53s/it] {'loss': 0.5738, 'learning_rate': 6.683405180921023e-06, 'epoch': 0.62} 62%|██████▏ | 3576/5773 [1:41:52<3:22:26, 5.53s/it] {'loss': 0.5738, 'learning_rate': 6.683405180921023e-06, 'epoch': 0.62} 62%|██████▏ | 3576/5773 [1:41:50<3:22:26, 5.53s/it] 62%|██████▏ | 3577/5773 [1:41:58<3:24:28, 5.59s/it] 62%|██████▏ | 3577/5773 [1:41:56<3:24:28, 5.59s/it] {'loss': 0.5499, 'learning_rate': 6.678112301083292e-06, 'epoch': 0.62} 62%|██████▏ | 3577/5773 [1:41:58<3:24:28, 5.59s/it] {'loss': 0.5499, 'learning_rate': 6.678112301083292e-06, 'epoch': 0.62} 62%|██████▏ | 3577/5773 [1:41:56<3:24:28, 5.59s/it] 62%|██████▏ | 3578/5773 [1:42:02<3:23:27, 5.56s/it] 62%|██████▏ | 3578/5773 [1:42:03<3:23:28, 5.56s/it] {'loss': 0.573, 
'learning_rate': 6.672820467081942e-06, 'epoch': 0.62} 62%|██████▏ | 3578/5773 [1:42:03<3:23:28, 5.56s/it] {'loss': 0.573, 'learning_rate': 6.672820467081942e-06, 'epoch': 0.62} 62%|██████▏ | 3578/5773 [1:42:02<3:23:27, 5.56s/it] 62%|██████▏ | 3579/5773 [1:42:07<3:23:27, 5.56s/it] 62%|██████▏ | 3579/5773 [1:42:09<3:23:27, 5.56s/it] {'loss': 0.564, 'learning_rate': 6.667529680583008e-06, 'epoch': 0.62} 62%|██████▏ | 3579/5773 [1:42:09<3:23:27, 5.56s/it] {'loss': 0.564, 'learning_rate': 6.667529680583008e-06, 'epoch': 0.62} 62%|██████▏ | 3579/5773 [1:42:07<3:23:27, 5.56s/it] 62%|██████▏ | 3580/5773 [1:42:13<3:22:07, 5.53s/it] 62%|██████▏ | 3580/5773 [1:42:15<3:22:06, 5.53s/it] {'loss': 0.5735, 'learning_rate': 6.662239943252204e-06, 'epoch': 0.62} 62%|██████▏ | 3580/5773 [1:42:15<3:22:06, 5.53s/it] {'loss': 0.5735, 'learning_rate': 6.662239943252204e-06, 'epoch': 0.62} 62%|██████▏ | 3580/5773 [1:42:13<3:22:07, 5.53s/it] 62%|██████▏ | 3581/5773 [1:42:18<3:20:41, 5.49s/it] 62%|██████▏ | 3581/5773 [1:42:20<3:20:41, 5.49s/it] {'loss': 0.5576, 'learning_rate': 6.6569512567549064e-06, 'epoch': 0.62} 62%|██████▏ | 3581/5773 [1:42:20<3:20:41, 5.49s/it] {'loss': 0.5576, 'learning_rate': 6.6569512567549064e-06, 'epoch': 0.62} 62%|██████▏ | 3581/5773 [1:42:18<3:20:41, 5.49s/it] 62%|██████▏ | 3582/5773 [1:42:23<3:19:40, 5.47s/it] 62%|██████▏ | 3582/5773 [1:42:25<3:19:41, 5.47s/it] {'loss': 0.5617, 'learning_rate': 6.651663622756161e-06, 'epoch': 0.62} 62%|██████▏ | 3582/5773 [1:42:25<3:19:41, 5.47s/it] {'loss': 0.5617, 'learning_rate': 6.651663622756161e-06, 'epoch': 0.62} 62%|██████▏ | 3582/5773 [1:42:23<3:19:40, 5.47s/it] 62%|██████▏ | 3583/5773 [1:42:29<3:22:45, 5.56s/it] 62%|██████▏ | 3583/5773 [1:42:31<3:22:46, 5.56s/it] {'loss': 0.572, 'learning_rate': 6.646377042920688e-06, 'epoch': 0.62} 62%|██████▏ | 3583/5773 [1:42:31<3:22:46, 5.56s/it] {'loss': 0.572, 'learning_rate': 6.646377042920688e-06, 'epoch': 0.62} 62%|██████▏ | 3583/5773 [1:42:29<3:22:45, 5.56s/it] 62%|██████▏ 
| 3584/5773 [1:42:35<3:22:14, 5.54s/it] 62%|██████▏ | 3584/5773 [1:42:37<3:22:16, 5.54s/it] {'loss': 0.5615, 'learning_rate': 6.6410915189128675e-06, 'epoch': 0.62} 62%|██████▏ | 3584/5773 [1:42:37<3:22:16, 5.54s/it] {'loss': 0.5615, 'learning_rate': 6.6410915189128675e-06, 'epoch': 0.62} 62%|██████▏ | 3584/5773 [1:42:35<3:22:14, 5.54s/it] 62%|██████▏ | 3585/5773 [1:42:40<3:22:12, 5.55s/it] 62%|██████▏ | 3585/5773 [1:42:42<3:22:11, 5.54s/it] {'loss': 0.5673, 'learning_rate': 6.635807052396757e-06, 'epoch': 0.62} 62%|██████▏ | 3585/5773 [1:42:42<3:22:11, 5.54s/it] {'loss': 0.5673, 'learning_rate': 6.635807052396757e-06, 'epoch': 0.62} 62%|██████▏ | 3585/5773 [1:42:40<3:22:12, 5.55s/it] 62%|██████▏ | 3586/5773 [1:42:46<3:21:10, 5.52s/it] 62%|██████▏ | 3586/5773 [1:42:48<3:21:10, 5.52s/it] {'loss': 0.5708, 'learning_rate': 6.630523645036066e-06, 'epoch': 0.62} 62%|██████▏ | 3586/5773 [1:42:48<3:21:10, 5.52s/it] {'loss': 0.5708, 'learning_rate': 6.630523645036066e-06, 'epoch': 0.62} 62%|██████▏ | 3586/5773 [1:42:46<3:21:10, 5.52s/it] 62%|██████▏ | 3587/5773 [1:42:51<3:20:05, 5.49s/it] 62%|██████▏ | 3587/5773 [1:42:53<3:20:06, 5.49s/it] {'loss': 0.5589, 'learning_rate': 6.625241298494191e-06, 'epoch': 0.62} 62%|██████▏ | 3587/5773 [1:42:53<3:20:06, 5.49s/it] {'loss': 0.5589, 'learning_rate': 6.625241298494191e-06, 'epoch': 0.62} 62%|██████▏ | 3587/5773 [1:42:51<3:20:05, 5.49s/it] 62%|██████▏ | 3588/5773 [1:42:57<3:21:13, 5.53s/it] 62%|██████▏ | 3588/5773 [1:42:59<3:21:14, 5.53s/it] {'loss': 0.5625, 'learning_rate': 6.619960014434175e-06, 'epoch': 0.62} 62%|██████▏ | 3588/5773 [1:42:59<3:21:14, 5.53s/it]{'loss': 0.5625, 'learning_rate': 6.619960014434175e-06, 'epoch': 0.62} 62%|██████▏ | 3588/5773 [1:42:57<3:21:13, 5.53s/it] 62%|██████▏ | 3589/5773 [1:43:02<3:19:15, 5.47s/it] 62%|██████▏ | 3589/5773 [1:43:04<3:19:14, 5.47s/it] {'loss': 0.5568, 'learning_rate': 6.614679794518737e-06, 'epoch': 0.62} 62%|██████▏ | 3589/5773 [1:43:04<3:19:14, 5.47s/it] {'loss': 0.5568, 
'learning_rate': 6.614679794518737e-06, 'epoch': 0.62} 62%|██████▏ | 3589/5773 [1:43:02<3:19:15, 5.47s/it] 62%|██████▏ | 3590/5773 [1:43:07<3:17:43, 5.43s/it] 62%|██████▏ | 3590/5773 [1:43:09<3:17:44, 5.44s/it] {'loss': 0.5585, 'learning_rate': 6.609400640410264e-06, 'epoch': 0.62} {'loss': 0.5585, 'learning_rate': 6.609400640410264e-06, 'epoch': 0.62} 62%|██████▏ | 3590/5773 [1:43:09<3:17:44, 5.44s/it] 62%|██████▏ | 3590/5773 [1:43:07<3:17:43, 5.43s/it] 62%|██████▏ | 3591/5773 [1:43:13<3:19:24, 5.48s/it] 62%|██████▏ | 3591/5773 [1:43:15<3:19:25, 5.48s/it] {'loss': 0.5713, 'learning_rate': 6.604122553770799e-06, 'epoch': 0.62} 62%|██████▏ | 3591/5773 [1:43:15<3:19:25, 5.48s/it] {'loss': 0.5713, 'learning_rate': 6.604122553770799e-06, 'epoch': 0.62} 62%|██████▏ | 3591/5773 [1:43:13<3:19:24, 5.48s/it] 62%|██████▏ | 3592/5773 [1:43:19<3:21:27, 5.54s/it] 62%|██████▏ | 3592/5773 [1:43:21<3:21:28, 5.54s/it] {'loss': 0.5844, 'learning_rate': 6.598845536262046e-06, 'epoch': 0.62} 62%|██████▏ | 3592/5773 [1:43:21<3:21:28, 5.54s/it] {'loss': 0.5844, 'learning_rate': 6.598845536262046e-06, 'epoch': 0.62} 62%|██████▏ | 3592/5773 [1:43:19<3:21:27, 5.54s/it] 62%|██████▏ | 3593/5773 [1:43:24<3:20:03, 5.51s/it] {'loss': 0.5567, 'learning_rate': 6.593569589545389e-06, 'epoch': 0.62} 62%|██████▏ | 3593/5773 [1:43:24<3:20:03, 5.51s/it] 62%|██████▏ | 3593/5773 [1:43:26<3:20:03, 5.51s/it] {'loss': 0.5567, 'learning_rate': 6.593569589545389e-06, 'epoch': 0.62} 62%|██████▏ | 3593/5773 [1:43:26<3:20:03, 5.51s/it] 62%|██████▏ | 3594/5773 [1:43:29<3:18:52, 5.48s/it] 62%|██████▏ | 3594/5773 [1:43:31<3:18:53, 5.48s/it] {'loss': 0.5735, 'learning_rate': 6.588294715281857e-06, 'epoch': 0.62} 62%|██████▏ | 3594/5773 [1:43:31<3:18:53, 5.48s/it] {'loss': 0.5735, 'learning_rate': 6.588294715281857e-06, 'epoch': 0.62} 62%|██████▏ | 3594/5773 [1:43:29<3:18:52, 5.48s/it] 62%|██████▏ | 3595/5773 [1:43:35<3:17:34, 5.44s/it] 62%|██████▏ | 3595/5773 [1:43:37<3:17:33, 5.44s/it] {'loss': 0.5726, 
'learning_rate': 6.5830209151321525e-06, 'epoch': 0.62} 62%|██████▏ | 3595/5773 [1:43:37<3:17:33, 5.44s/it] {'loss': 0.5726, 'learning_rate': 6.5830209151321525e-06, 'epoch': 0.62} 62%|██████▏ | 3595/5773 [1:43:35<3:17:34, 5.44s/it] 62%|██████▏ | 3596/5773 [1:43:40<3:18:36, 5.47s/it] 62%|██████▏ | 3596/5773 [1:43:42<3:18:38, 5.47s/it] {'loss': 0.5528, 'learning_rate': 6.577748190756636e-06, 'epoch': 0.62} 62%|██████▏ | 3596/5773 [1:43:42<3:18:38, 5.47s/it] {'loss': 0.5528, 'learning_rate': 6.577748190756636e-06, 'epoch': 0.62} 62%|██████▏ | 3596/5773 [1:43:40<3:18:36, 5.47s/it] 62%|██████▏ | 3597/5773 [1:43:46<3:19:47, 5.51s/it] 62%|██████▏ | 3597/5773 [1:43:48<3:19:47, 5.51s/it] {'loss': 0.5702, 'learning_rate': 6.572476543815328e-06, 'epoch': 0.62} 62%|██████▏ | 3597/5773 [1:43:48<3:19:47, 5.51s/it] {'loss': 0.5702, 'learning_rate': 6.572476543815328e-06, 'epoch': 0.62} 62%|██████▏ | 3597/5773 [1:43:46<3:19:47, 5.51s/it] 62%|██████▏ | 3598/5773 [1:43:51<3:19:35, 5.51s/it] 62%|██████▏ | 3598/5773 [1:43:53<3:19:36, 5.51s/it] {'loss': 0.581, 'learning_rate': 6.5672059759679145e-06, 'epoch': 0.62} 62%|██████▏ | 3598/5773 [1:43:53<3:19:36, 5.51s/it] {'loss': 0.581, 'learning_rate': 6.5672059759679145e-06, 'epoch': 0.62} 62%|██████▏ | 3598/5773 [1:43:51<3:19:35, 5.51s/it] 62%|██████▏ | 3599/5773 [1:43:57<3:18:23, 5.48s/it] 62%|██████▏ | 3599/5773 [1:43:59<3:18:23, 5.48s/it] {'loss': 0.5505, 'learning_rate': 6.561936488873735e-06, 'epoch': 0.62} 62%|██████▏ | 3599/5773 [1:43:59<3:18:23, 5.48s/it] {'loss': 0.5505, 'learning_rate': 6.561936488873735e-06, 'epoch': 0.62} 62%|██████▏ | 3599/5773 [1:43:57<3:18:23, 5.48s/it]9 AutoResumeHook: Checking whether to suspend...12 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 
0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 62%|██████▏ | 3600/5773 [1:44:02<3:17:58, 5.47s/it] 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 62%|██████▏ | 3600/5773 [1:44:04<3:17:57, 5.47s/it] 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5596, 'learning_rate': 6.5566680841917955e-06, 'epoch': 0.62} 62%|██████▏ | 3600/5773 [1:44:04<3:17:57, 5.47s/it] {'loss': 0.5596, 'learning_rate': 6.5566680841917955e-06, 'epoch': 0.62} 62%|██████▏ | 3600/5773 [1:44:02<3:17:58, 5.47s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 62%|██████▏ | 3601/5773 [1:44:27<6:41:46, 11.10s/it] 62%|██████▏ | 3601/5773 [1:44:29<6:41:46, 11.10s/it] {'loss': 0.5701, 'learning_rate': 6.551400763580756e-06, 'epoch': 0.62} 62%|██████▏ | 3601/5773 [1:44:29<6:41:46, 11.10s/it] {'loss': 0.5701, 'learning_rate': 6.551400763580756e-06, 'epoch': 0.62} 62%|██████▏ | 3601/5773 [1:44:27<6:41:46, 11.10s/it] 62%|██████▏ | 3602/5773 [1:44:32<5:39:52, 9.39s/it] 62%|██████▏ | 3602/5773 [1:44:34<5:39:51, 9.39s/it] {'loss': 0.5742, 'learning_rate': 6.5461345286989375e-06, 'epoch': 0.62} 62%|██████▏ | 3602/5773 [1:44:34<5:39:51, 9.39s/it] {'loss': 0.5742, 'learning_rate': 6.5461345286989375e-06, 'epoch': 0.62} 62%|██████▏ | 3602/5773 [1:44:32<5:39:52, 9.39s/it] 62%|██████▏ | 3603/5773 [1:44:38<4:57:59, 8.24s/it] 62%|██████▏ | 3603/5773 [1:44:39<4:57:59, 8.24s/it] {'loss': 0.553, 'learning_rate': 6.540869381204322e-06, 'epoch': 0.62} 62%|██████▏ | 3603/5773 [1:44:39<4:57:59, 8.24s/it] {'loss': 0.553, 'learning_rate': 6.540869381204322e-06, 'epoch': 0.62} 62%|██████▏ | 3603/5773 [1:44:38<4:57:59, 8.24s/it] 62%|██████▏ | 3604/5773 [1:44:43<4:28:27, 7.43s/it] 62%|██████▏ | 3604/5773 [1:44:45<4:28:27, 7.43s/it] {'loss': 0.5881, 'learning_rate': 6.535605322754541e-06, 'epoch': 0.62} 62%|██████▏ | 3604/5773 [1:44:45<4:28:27, 7.43s/it] {'loss': 0.5881, 'learning_rate': 6.535605322754541e-06, 'epoch': 0.62} 62%|██████▏ | 3604/5773 [1:44:43<4:28:27, 7.43s/it] 62%|██████▏ | 3605/5773 [1:44:51<4:08:33, 6.88s/it] 62%|██████▏ | 3605/5773 [1:44:49<4:08:34, 6.88s/it] {'loss': 0.5572, 'learning_rate': 6.530342355006897e-06, 'epoch': 0.62} 62%|██████▏ | 3605/5773 [1:44:51<4:08:33, 6.88s/it] {'loss': 0.5572, 'learning_rate': 6.530342355006897e-06, 'epoch': 0.62} 62%|██████▏ | 3605/5773 [1:44:49<4:08:34, 6.88s/it] 62%|██████▏ | 3606/5773 [1:44:54<3:53:54, 6.48s/it] 62%|██████▏ | 3606/5773 [1:44:56<3:53:54, 6.48s/it] {'loss': 0.5753, 'learning_rate': 6.525080479618331e-06, 'epoch': 0.62} 62%|██████▏ | 3606/5773 [1:44:56<3:53:54, 
6.48s/it] {'loss': 0.5753, 'learning_rate': 6.525080479618331e-06, 'epoch': 0.62} 62%|██████▏ | 3606/5773 [1:44:54<3:53:54, 6.48s/it] 62%|██████▏ | 3607/5773 [1:45:02<3:43:54, 6.20s/it] 62%|██████▏ | 3607/5773 [1:45:00<3:43:55, 6.20s/it] {'loss': 0.5701, 'learning_rate': 6.5198196982454506e-06, 'epoch': 0.62} 62%|██████▏ | 3607/5773 [1:45:02<3:43:54, 6.20s/it] {'loss': 0.5701, 'learning_rate': 6.5198196982454506e-06, 'epoch': 0.62} 62%|██████▏ | 3607/5773 [1:45:00<3:43:55, 6.20s/it] 62%|██████▏ | 3608/5773 [1:45:07<3:35:16, 5.97s/it] 62%|██████▏ | 3608/5773 [1:45:05<3:35:16, 5.97s/it] {'loss': 0.5915, 'learning_rate': 6.514560012544521e-06, 'epoch': 0.62} 62%|██████▏ | 3608/5773 [1:45:07<3:35:16, 5.97s/it] {'loss': 0.5915, 'learning_rate': 6.514560012544521e-06, 'epoch': 0.62} 62%|██████▏ | 3608/5773 [1:45:05<3:35:16, 5.97s/it] 63%|██████▎ | 3609/5773 [1:45:11<3:29:22, 5.81s/it] 63%|██████▎ | 3609/5773 [1:45:13<3:29:23, 5.81s/it] {'loss': 0.5764, 'learning_rate': 6.5093014241714595e-06, 'epoch': 0.63} 63%|██████▎ | 3609/5773 [1:45:13<3:29:23, 5.81s/it] {'loss': 0.5764, 'learning_rate': 6.5093014241714595e-06, 'epoch': 0.63} 63%|██████▎ | 3609/5773 [1:45:11<3:29:22, 5.81s/it] 63%|██████▎ | 3610/5773 [1:45:16<3:26:54, 5.74s/it] 63%|██████▎ | 3610/5773 [1:45:18<3:26:54, 5.74s/it] {'loss': 0.5499, 'learning_rate': 6.504043934781836e-06, 'epoch': 0.63} 63%|██████▎ | 3610/5773 [1:45:18<3:26:54, 5.74s/it] {'loss': 0.5499, 'learning_rate': 6.504043934781836e-06, 'epoch': 0.63} 63%|██████▎ | 3610/5773 [1:45:16<3:26:54, 5.74s/it] 63%|██████▎ | 3611/5773 [1:45:24<3:23:52, 5.66s/it] 63%|██████▎ | 3611/5773 [1:45:22<3:23:52, 5.66s/it] {'loss': 0.584, 'learning_rate': 6.4987875460308784e-06, 'epoch': 0.63} 63%|██████▎ | 3611/5773 [1:45:24<3:23:52, 5.66s/it] {'loss': 0.584, 'learning_rate': 6.4987875460308784e-06, 'epoch': 0.63} 63%|██████▎ | 3611/5773 [1:45:22<3:23:52, 5.66s/it] 63%|██████▎ | 3612/5773 [1:45:27<3:21:17, 5.59s/it] 63%|██████▎ | 3612/5773 [1:45:29<3:21:17, 
5.59s/it] {'loss': 0.5636, 'learning_rate': 6.493532259573461e-06, 'epoch': 0.63} 63%|██████▎ | 3612/5773 [1:45:29<3:21:17, 5.59s/it] {'loss': 0.5636, 'learning_rate': 6.493532259573461e-06, 'epoch': 0.63} 63%|██████▎ | 3612/5773 [1:45:27<3:21:17, 5.59s/it] 63%|██████▎ | 3613/5773 [1:45:33<3:20:46, 5.58s/it] 63%|██████▎ | 3613/5773 [1:45:35<3:20:46, 5.58s/it] {'loss': 0.5566, 'learning_rate': 6.488278077064121e-06, 'epoch': 0.63} 63%|██████▎ | 3613/5773 [1:45:35<3:20:46, 5.58s/it] {'loss': 0.5566, 'learning_rate': 6.488278077064121e-06, 'epoch': 0.63} 63%|██████▎ | 3613/5773 [1:45:33<3:20:46, 5.58s/it] 63%|██████▎ | 3614/5773 [1:45:38<3:20:33, 5.57s/it] 63%|██████▎ | 3614/5773 [1:45:40<3:20:33, 5.57s/it] {'loss': 0.5641, 'learning_rate': 6.483025000157041e-06, 'epoch': 0.63} 63%|██████▎ | 3614/5773 [1:45:40<3:20:33, 5.57s/it] {'loss': 0.5641, 'learning_rate': 6.483025000157041e-06, 'epoch': 0.63} 63%|██████▎ | 3614/5773 [1:45:38<3:20:33, 5.57s/it] 63%|██████▎ | 3615/5773 [1:45:44<3:20:19, 5.57s/it] 63%|██████▎ | 3615/5773 [1:45:46<3:20:19, 5.57s/it] {'loss': 0.5582, 'learning_rate': 6.477773030506058e-06, 'epoch': 0.63} 63%|██████▎ | 3615/5773 [1:45:46<3:20:19, 5.57s/it] {'loss': 0.5582, 'learning_rate': 6.477773030506058e-06, 'epoch': 0.63} 63%|██████▎ | 3615/5773 [1:45:44<3:20:19, 5.57s/it] 63%|██████▎ | 3616/5773 [1:45:49<3:21:30, 5.61s/it] 63%|██████▎ | 3616/5773 [1:45:51<3:21:31, 5.61s/it] {'loss': 0.5674, 'learning_rate': 6.47252216976466e-06, 'epoch': 0.63} 63%|██████▎ | 3616/5773 [1:45:51<3:21:31, 5.61s/it] {'loss': 0.5674, 'learning_rate': 6.47252216976466e-06, 'epoch': 0.63} 63%|██████▎ | 3616/5773 [1:45:49<3:21:30, 5.61s/it] 63%|██████▎ | 3617/5773 [1:45:55<3:21:27, 5.61s/it] 63%|██████▎ | 3617/5773 [1:45:57<3:21:27, 5.61s/it] {'loss': 0.5645, 'learning_rate': 6.467272419585984e-06, 'epoch': 0.63} 63%|██████▎ | 3617/5773 [1:45:57<3:21:27, 5.61s/it] {'loss': 0.5645, 'learning_rate': 6.467272419585984e-06, 'epoch': 0.63} 63%|██████▎ | 3617/5773 
[1:45:55<3:21:27, 5.61s/it] 63%|██████▎ | 3618/5773 [1:46:03<3:20:16, 5.58s/it] 63%|██████▎ | 3618/5773 [1:46:01<3:20:17, 5.58s/it] {'loss': 0.5662, 'learning_rate': 6.4620237816228215e-06, 'epoch': 0.63} 63%|██████▎ | 3618/5773 [1:46:03<3:20:16, 5.58s/it] {'loss': 0.5662, 'learning_rate': 6.4620237816228215e-06, 'epoch': 0.63} 63%|██████▎ | 3618/5773 [1:46:01<3:20:17, 5.58s/it] 63%|██████▎ | 3619/5773 [1:46:06<3:22:20, 5.64s/it] 63%|██████▎ | 3619/5773 [1:46:08<3:22:20, 5.64s/it] {'loss': 0.5492, 'learning_rate': 6.456776257527611e-06, 'epoch': 0.63} 63%|██████▎ | 3619/5773 [1:46:08<3:22:20, 5.64s/it] {'loss': 0.5492, 'learning_rate': 6.456776257527611e-06, 'epoch': 0.63} 63%|██████▎ | 3619/5773 [1:46:06<3:22:20, 5.64s/it] 63%|██████▎ | 3620/5773 [1:46:12<3:21:26, 5.61s/it] 63%|██████▎ | 3620/5773 [1:46:14<3:21:26, 5.61s/it] {'loss': 0.5504, 'learning_rate': 6.451529848952438e-06, 'epoch': 0.63} 63%|██████▎ | 3620/5773 [1:46:14<3:21:26, 5.61s/it] {'loss': 0.5504, 'learning_rate': 6.451529848952438e-06, 'epoch': 0.63} 63%|██████▎ | 3620/5773 [1:46:12<3:21:26, 5.61s/it] 63%|██████▎ | 3621/5773 [1:46:17<3:20:29, 5.59s/it] 63%|██████▎ | 3621/5773 [1:46:19<3:20:29, 5.59s/it] {'loss': 0.5473, 'learning_rate': 6.446284557549039e-06, 'epoch': 0.63} 63%|██████▎ | 3621/5773 [1:46:19<3:20:29, 5.59s/it] {'loss': 0.5473, 'learning_rate': 6.446284557549039e-06, 'epoch': 0.63} 63%|██████▎ | 3621/5773 [1:46:17<3:20:29, 5.59s/it] 63%|██████▎ | 3622/5773 [1:46:23<3:20:14, 5.59s/it] 63%|██████▎ | 3622/5773 [1:46:25<3:20:14, 5.59s/it] {'loss': 0.5656, 'learning_rate': 6.441040384968806e-06, 'epoch': 0.63} 63%|██████▎ | 3622/5773 [1:46:25<3:20:14, 5.59s/it] {'loss': 0.5656, 'learning_rate': 6.441040384968806e-06, 'epoch': 0.63} 63%|██████▎ | 3622/5773 [1:46:23<3:20:14, 5.59s/it] 63%|██████▎ | 3623/5773 [1:46:31<3:20:25, 5.59s/it] 63%|██████▎ | 3623/5773 [1:46:29<3:20:26, 5.59s/it] {'loss': 0.5727, 'learning_rate': 6.435797332862765e-06, 'epoch': 0.63} 63%|██████▎ | 3623/5773 
[1:46:31<3:20:25, 5.59s/it] {'loss': 0.5727, 'learning_rate': 6.435797332862765e-06, 'epoch': 0.63} 63%|██████▎ | 3623/5773 [1:46:29<3:20:26, 5.59s/it] 63%|██████▎ | 3624/5773 [1:46:34<3:21:04, 5.61s/it] 63%|██████▎ | 3624/5773 [1:46:36<3:21:05, 5.61s/it] {'loss': 0.5502, 'learning_rate': 6.430555402881601e-06, 'epoch': 0.63} 63%|██████▎ | 3624/5773 [1:46:36<3:21:05, 5.61s/it] {'loss': 0.5502, 'learning_rate': 6.430555402881601e-06, 'epoch': 0.63} 63%|██████▎ | 3624/5773 [1:46:34<3:21:04, 5.61s/it] 63%|██████▎ | 3625/5773 [1:46:42<3:20:23, 5.60s/it] 63%|██████▎ | 3625/5773 [1:46:40<3:20:24, 5.60s/it] {'loss': 0.5644, 'learning_rate': 6.425314596675637e-06, 'epoch': 0.63} 63%|██████▎ | 3625/5773 [1:46:42<3:20:23, 5.60s/it] {'loss': 0.5644, 'learning_rate': 6.425314596675637e-06, 'epoch': 0.63} 63%|██████▎ | 3625/5773 [1:46:40<3:20:24, 5.60s/it] 63%|██████▎ | 3626/5773 [1:46:45<3:18:29, 5.55s/it] 63%|██████▎ | 3626/5773 [1:46:47<3:18:29, 5.55s/it] {'loss': 0.5614, 'learning_rate': 6.420074915894848e-06, 'epoch': 0.63} 63%|██████▎ | 3626/5773 [1:46:47<3:18:29, 5.55s/it] {'loss': 0.5614, 'learning_rate': 6.420074915894848e-06, 'epoch': 0.63} 63%|██████▎ | 3626/5773 [1:46:45<3:18:29, 5.55s/it] 63%|██████▎ | 3627/5773 [1:46:53<3:18:00, 5.54s/it] 63%|██████▎ | 3627/5773 [1:46:51<3:18:00, 5.54s/it] {'loss': 0.5571, 'learning_rate': 6.4148363621888565e-06, 'epoch': 0.63} 63%|██████▎ | 3627/5773 [1:46:53<3:18:00, 5.54s/it] {'loss': 0.5571, 'learning_rate': 6.4148363621888565e-06, 'epoch': 0.63} 63%|██████▎ | 3627/5773 [1:46:51<3:18:00, 5.54s/it] 63%|██████▎ | 3628/5773 [1:46:56<3:18:14, 5.55s/it] 63%|██████▎ | 3628/5773 [1:46:58<3:18:14, 5.55s/it] {'loss': 0.5686, 'learning_rate': 6.40959893720692e-06, 'epoch': 0.63} 63%|██████▎ | 3628/5773 [1:46:58<3:18:14, 5.55s/it] {'loss': 0.5686, 'learning_rate': 6.40959893720692e-06, 'epoch': 0.63} 63%|██████▎ | 3628/5773 [1:46:56<3:18:14, 5.55s/it] 63%|██████▎ | 3629/5773 [1:47:04<3:16:21, 5.50s/it] 63%|██████▎ | 3629/5773 
[1:47:02<3:16:21, 5.50s/it] {'loss': 0.5805, 'learning_rate': 6.404362642597954e-06, 'epoch': 0.63} 63%|██████▎ | 3629/5773 [1:47:04<3:16:21, 5.50s/it] {'loss': 0.5805, 'learning_rate': 6.404362642597954e-06, 'epoch': 0.63} 63%|██████▎ | 3629/5773 [1:47:02<3:16:21, 5.50s/it] 63%|██████▎ | 3630/5773 [1:47:09<3:16:52, 5.51s/it] 63%|██████▎ | 3630/5773 [1:47:07<3:16:52, 5.51s/it] {'loss': 0.568, 'learning_rate': 6.399127480010505e-06, 'epoch': 0.63} 63%|██████▎ | 3630/5773 [1:47:09<3:16:52, 5.51s/it] {'loss': 0.568, 'learning_rate': 6.399127480010505e-06, 'epoch': 0.63} 63%|██████▎ | 3630/5773 [1:47:07<3:16:52, 5.51s/it] 63%|██████▎ | 3631/5773 [1:47:13<3:16:12, 5.50s/it] 63%|██████▎ | 3631/5773 [1:47:15<3:16:12, 5.50s/it] {'loss': 0.5745, 'learning_rate': 6.393893451092772e-06, 'epoch': 0.63} 63%|██████▎ | 3631/5773 [1:47:15<3:16:12, 5.50s/it] {'loss': 0.5745, 'learning_rate': 6.393893451092772e-06, 'epoch': 0.63} 63%|██████▎ | 3631/5773 [1:47:13<3:16:12, 5.50s/it] 63%|██████▎ | 3632/5773 [1:47:18<3:15:20, 5.47s/it] 63%|██████▎ | 3632/5773 [1:47:20<3:15:20, 5.47s/it] {'loss': 0.5714, 'learning_rate': 6.388660557492598e-06, 'epoch': 0.63} 63%|██████▎ | 3632/5773 [1:47:20<3:15:20, 5.47s/it] {'loss': 0.5714, 'learning_rate': 6.388660557492598e-06, 'epoch': 0.63} 63%|██████▎ | 3632/5773 [1:47:18<3:15:20, 5.47s/it] 63%|██████▎ | 3633/5773 [1:47:24<3:15:42, 5.49s/it] 63%|██████▎ | 3633/5773 [1:47:26<3:15:42, 5.49s/it] {'loss': 0.566, 'learning_rate': 6.38342880085746e-06, 'epoch': 0.63} 63%|██████▎ | 3633/5773 [1:47:26<3:15:42, 5.49s/it] {'loss': 0.566, 'learning_rate': 6.38342880085746e-06, 'epoch': 0.63} 63%|██████▎ | 3633/5773 [1:47:24<3:15:42, 5.49s/it] 63%|██████▎ | 3634/5773 [1:47:29<3:16:28, 5.51s/it] 63%|██████▎ | 3634/5773 [1:47:31<3:16:28, 5.51s/it] {'loss': 0.5628, 'learning_rate': 6.378198182834487e-06, 'epoch': 0.63} 63%|██████▎ | 3634/5773 [1:47:31<3:16:28, 5.51s/it] {'loss': 0.5628, 'learning_rate': 6.378198182834487e-06, 'epoch': 0.63} 63%|██████▎ | 
3634/5773 [1:47:29<3:16:28, 5.51s/it] 63%|██████▎ | 3635/5773 [1:47:35<3:16:35, 5.52s/it] 63%|██████▎ | 3635/5773 [1:47:37<3:16:34, 5.52s/it] {'loss': 0.5932, 'learning_rate': 6.372968705070441e-06, 'epoch': 0.63} 63%|██████▎ | 3635/5773 [1:47:37<3:16:34, 5.52s/it] {'loss': 0.5932, 'learning_rate': 6.372968705070441e-06, 'epoch': 0.63} 63%|██████▎ | 3635/5773 [1:47:35<3:16:35, 5.52s/it] 63%|██████▎ | 3636/5773 [1:47:40<3:17:20, 5.54s/it] 63%|██████▎ | 3636/5773 [1:47:42<3:17:20, 5.54s/it] {'loss': 0.5594, 'learning_rate': 6.367740369211727e-06, 'epoch': 0.63} 63%|██████▎ | 3636/5773 [1:47:42<3:17:20, 5.54s/it] {'loss': 0.5594, 'learning_rate': 6.367740369211727e-06, 'epoch': 0.63} 63%|██████▎ | 3636/5773 [1:47:40<3:17:20, 5.54s/it] 63%|██████▎ | 3637/5773 [1:47:48<3:16:38, 5.52s/it] 63%|██████▎ | 3637/5773 [1:47:46<3:16:38, 5.52s/it] {'loss': 0.5675, 'learning_rate': 6.3625131769043966e-06, 'epoch': 0.63} 63%|██████▎ | 3637/5773 [1:47:48<3:16:38, 5.52s/it] {'loss': 0.5675, 'learning_rate': 6.3625131769043966e-06, 'epoch': 0.63} 63%|██████▎ | 3637/5773 [1:47:46<3:16:38, 5.52s/it] 63%|██████▎ | 3638/5773 [1:47:51<3:16:34, 5.52s/it] 63%|██████▎ | 3638/5773 [1:47:53<3:16:34, 5.52s/it] {'loss': 0.5769, 'learning_rate': 6.357287129794133e-06, 'epoch': 0.63} 63%|██████▎ | 3638/5773 [1:47:53<3:16:34, 5.52s/it] {'loss': 0.5769, 'learning_rate': 6.357287129794133e-06, 'epoch': 0.63} 63%|██████▎ | 3638/5773 [1:47:51<3:16:34, 5.52s/it] 63%|██████▎ | 3639/5773 [1:47:57<3:16:52, 5.54s/it] 63%|██████▎ | 3639/5773 [1:47:59<3:16:52, 5.54s/it] {'loss': 0.5798, 'learning_rate': 6.352062229526266e-06, 'epoch': 0.63} 63%|██████▎ | 3639/5773 [1:47:59<3:16:52, 5.54s/it] {'loss': 0.5798, 'learning_rate': 6.352062229526266e-06, 'epoch': 0.63} 63%|██████▎ | 3639/5773 [1:47:57<3:16:52, 5.54s/it] 63%|██████▎ | 3640/5773 [1:48:02<3:15:54, 5.51s/it] 63%|██████▎ | 3640/5773 [1:48:04<3:15:54, 5.51s/it] {'loss': 0.583, 'learning_rate': 6.3468384777457605e-06, 'epoch': 0.63} 63%|██████▎ | 3640/5773 
[1:48:04<3:15:54, 5.51s/it] {'loss': 0.583, 'learning_rate': 6.3468384777457605e-06, 'epoch': 0.63} 63%|██████▎ | 3640/5773 [1:48:02<3:15:54, 5.51s/it] 63%|██████▎ | 3641/5773 [1:48:10<3:18:05, 5.57s/it] 63%|██████▎ | 3641/5773 [1:48:08<3:18:05, 5.57s/it] {'loss': 0.5745, 'learning_rate': 6.341615876097218e-06, 'epoch': 0.63} 63%|██████▎ | 3641/5773 [1:48:10<3:18:05, 5.57s/it] {'loss': 0.5745, 'learning_rate': 6.341615876097218e-06, 'epoch': 0.63} 63%|██████▎ | 3641/5773 [1:48:08<3:18:05, 5.57s/it] 63%|██████▎ | 3642/5773 [1:48:16<3:16:57, 5.55s/it] 63%|██████▎ | 3642/5773 [1:48:14<3:16:57, 5.55s/it] {'loss': 0.5443, 'learning_rate': 6.336394426224885e-06, 'epoch': 0.63} 63%|██████▎ | 3642/5773 [1:48:16<3:16:57, 5.55s/it] {'loss': 0.5443, 'learning_rate': 6.336394426224885e-06, 'epoch': 0.63} 63%|██████▎ | 3642/5773 [1:48:14<3:16:57, 5.55s/it] 63%|██████▎ | 3643/5773 [1:48:19<3:17:20, 5.56s/it] 63%|██████▎ | 3643/5773 [1:48:21<3:17:20, 5.56s/it] {'loss': 0.5558, 'learning_rate': 6.331174129772635e-06, 'epoch': 0.63} 63%|██████▎ | 3643/5773 [1:48:21<3:17:20, 5.56s/it] {'loss': 0.5558, 'learning_rate': 6.331174129772635e-06, 'epoch': 0.63} 63%|██████▎ | 3643/5773 [1:48:19<3:17:20, 5.56s/it] 63%|██████▎ | 3644/5773 [1:48:27<3:17:46, 5.57s/it] 63%|██████▎ | 3644/5773 [1:48:25<3:17:46, 5.57s/it] {'loss': 0.5602, 'learning_rate': 6.325954988383988e-06, 'epoch': 0.63} 63%|██████▎ | 3644/5773 [1:48:27<3:17:46, 5.57s/it] {'loss': 0.5602, 'learning_rate': 6.325954988383988e-06, 'epoch': 0.63} 63%|██████▎ | 3644/5773 [1:48:25<3:17:46, 5.57s/it] 63%|██████▎ | 3645/5773 [1:48:30<3:17:01, 5.56s/it] 63%|██████▎ | 3645/5773 [1:48:32<3:17:01, 5.56s/it] {'loss': 0.5668, 'learning_rate': 6.320737003702098e-06, 'epoch': 0.63} 63%|██████▎ | 3645/5773 [1:48:32<3:17:01, 5.56s/it] {'loss': 0.5668, 'learning_rate': 6.320737003702098e-06, 'epoch': 0.63} 63%|██████▎ | 3645/5773 [1:48:30<3:17:01, 5.56s/it] 63%|██████▎ | 3646/5773 [1:48:38<3:17:17, 5.57s/it] 63%|██████▎ | 3646/5773 
[1:48:36<3:17:17, 5.57s/it] {'loss': 0.5522, 'learning_rate': 6.315520177369747e-06, 'epoch': 0.63} 63%|██████▎ | 3646/5773 [1:48:38<3:17:17, 5.57s/it] {'loss': 0.5522, 'learning_rate': 6.315520177369747e-06, 'epoch': 0.63} 63%|██████▎ | 3646/5773 [1:48:36<3:17:17, 5.57s/it] 63%|██████▎ | 3647/5773 [1:48:44<3:17:56, 5.59s/it] 63%|██████▎ | 3647/5773 [1:48:42<3:17:56, 5.59s/it] {'loss': 0.5659, 'learning_rate': 6.310304511029366e-06, 'epoch': 0.63} 63%|██████▎ | 3647/5773 [1:48:44<3:17:56, 5.59s/it] {'loss': 0.5659, 'learning_rate': 6.310304511029366e-06, 'epoch': 0.63} 63%|██████▎ | 3647/5773 [1:48:42<3:17:56, 5.59s/it] 63%|██████▎ | 3648/5773 [1:48:49<3:17:01, 5.56s/it] 63%|██████▎ | 3648/5773 [1:48:47<3:17:01, 5.56s/it] {'loss': 0.5628, 'learning_rate': 6.305090006323011e-06, 'epoch': 0.63} 63%|██████▎ | 3648/5773 [1:48:49<3:17:01, 5.56s/it] {'loss': 0.5628, 'learning_rate': 6.305090006323011e-06, 'epoch': 0.63} 63%|██████▎ | 3648/5773 [1:48:47<3:17:01, 5.56s/it] 63%|██████▎ | 3649/5773 [1:48:52<3:14:10, 5.49s/it] 63%|██████▎ | 3649/5773 [1:48:54<3:14:11, 5.49s/it] {'loss': 0.5829, 'learning_rate': 6.29987666489237e-06, 'epoch': 0.63} 63%|██████▎ | 3649/5773 [1:48:54<3:14:11, 5.49s/it] {'loss': 0.5829, 'learning_rate': 6.29987666489237e-06, 'epoch': 0.63} 63%|██████▎ | 3649/5773 [1:48:52<3:14:10, 5.49s/it]12 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 63%|██████▎ | 3650/5773 [1:49:00<3:15:14, 5.52s/it]013 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 63%|██████▎ | 3650/5773 [1:48:58<3:15:14, 5.52s/it]6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5706, 'learning_rate': 6.294664488378777e-06, 'epoch': 0.63} 63%|██████▎ | 3650/5773 [1:49:00<3:15:14, 5.52s/it] {'loss': 0.5706, 'learning_rate': 6.294664488378777e-06, 'epoch': 0.63} 63%|██████▎ | 3650/5773 [1:48:58<3:15:14, 5.52s/it] 63%|██████▎ | 3651/5773 [1:49:03<3:13:52, 5.48s/it] 63%|██████▎ | 3651/5773 [1:49:05<3:13:52, 5.48s/it] {'loss': 0.565, 'learning_rate': 6.289453478423183e-06, 'epoch': 0.63} 63%|██████▎ | 3651/5773 [1:49:05<3:13:52, 5.48s/it] {'loss': 0.565, 'learning_rate': 6.289453478423183e-06, 'epoch': 0.63} 63%|██████▎ | 3651/5773 [1:49:03<3:13:52, 5.48s/it] 63%|██████▎ | 3652/5773 [1:49:09<3:15:19, 5.53s/it] 63%|██████▎ | 3652/5773 [1:49:11<3:15:19, 5.53s/it] {'loss': 0.5592, 'learning_rate': 6.2842436366661866e-06, 'epoch': 0.63} 63%|██████▎ | 3652/5773 [1:49:11<3:15:19, 5.53s/it] {'loss': 0.5592, 'learning_rate': 6.2842436366661866e-06, 'epoch': 0.63} 63%|██████▎ | 3652/5773 [1:49:09<3:15:19, 5.53s/it] 63%|██████▎ | 3653/5773 [1:49:15<3:15:29, 5.53s/it] 63%|██████▎ | 3653/5773 [1:49:16<3:15:29, 5.53s/it] {'loss': 0.5734, 'learning_rate': 6.279034964748012e-06, 'epoch': 0.63} 63%|██████▎ | 3653/5773 [1:49:16<3:15:29, 5.53s/it] {'loss': 0.5734, 'learning_rate': 6.279034964748012e-06, 'epoch': 0.63} 63%|██████▎ | 3653/5773 [1:49:15<3:15:29, 5.53s/it] 63%|██████▎ | 3654/5773 [1:49:20<3:15:23, 5.53s/it] 63%|██████▎ | 3654/5773 [1:49:22<3:15:22, 5.53s/it] {'loss': 0.5648, 'learning_rate': 6.273827464308513e-06, 'epoch': 0.63} 63%|██████▎ | 3654/5773 [1:49:22<3:15:22, 5.53s/it] {'loss': 0.5648, 'learning_rate': 6.273827464308513e-06, 'epoch': 0.63} 63%|██████▎ | 3654/5773 [1:49:20<3:15:23, 5.53s/it] 63%|██████▎ | 3655/5773 [1:49:25<3:13:46, 5.49s/it] 63%|██████▎ | 3655/5773 [1:49:27<3:13:46, 
5.49s/it] {'loss': 0.5524, 'learning_rate': 6.2686211369871805e-06, 'epoch': 0.63} 63%|██████▎ | 3655/5773 [1:49:27<3:13:46, 5.49s/it] {'loss': 0.5524, 'learning_rate': 6.2686211369871805e-06, 'epoch': 0.63} 63%|██████▎ | 3655/5773 [1:49:25<3:13:46, 5.49s/it] 63%|██████▎ | 3656/5773 [1:49:31<3:15:11, 5.53s/it] 63%|██████▎ | 3656/5773 [1:49:33<3:15:11, 5.53s/it] {'loss': 0.5717, 'learning_rate': 6.263415984423129e-06, 'epoch': 0.63} 63%|██████▎ | 3656/5773 [1:49:33<3:15:11, 5.53s/it] {'loss': 0.5717, 'learning_rate': 6.263415984423129e-06, 'epoch': 0.63} 63%|██████▎ | 3656/5773 [1:49:31<3:15:11, 5.53s/it] 63%|██████▎ | 3657/5773 [1:49:37<3:14:45, 5.52s/it] 63%|██████▎ | 3657/5773 [1:49:39<3:14:46, 5.52s/it] {'loss': 0.5743, 'learning_rate': 6.258212008255109e-06, 'epoch': 0.63} 63%|██████▎ | 3657/5773 [1:49:39<3:14:46, 5.52s/it] {'loss': 0.5743, 'learning_rate': 6.258212008255109e-06, 'epoch': 0.63} 63%|██████▎ | 3657/5773 [1:49:37<3:14:45, 5.52s/it] 63%|██████▎ | 3658/5773 [1:49:44<3:16:26, 5.57s/it] 63%|██████▎ | 3658/5773 [1:49:42<3:16:26, 5.57s/it] {'loss': 0.5536, 'learning_rate': 6.253009210121499e-06, 'epoch': 0.63} 63%|██████▎ | 3658/5773 [1:49:44<3:16:26, 5.57s/it] {'loss': 0.5536, 'learning_rate': 6.253009210121499e-06, 'epoch': 0.63} 63%|██████▎ | 3658/5773 [1:49:42<3:16:26, 5.57s/it] 63%|██████▎ | 3659/5773 [1:49:50<3:15:26, 5.55s/it] 63%|██████▎ | 3659/5773 [1:49:48<3:15:26, 5.55s/it] {'loss': 0.5638, 'learning_rate': 6.247807591660304e-06, 'epoch': 0.63} 63%|██████▎ | 3659/5773 [1:49:50<3:15:26, 5.55s/it] {'loss': 0.5638, 'learning_rate': 6.247807591660304e-06, 'epoch': 0.63} 63%|██████▎ | 3659/5773 [1:49:48<3:15:26, 5.55s/it] 63%|██████▎ | 3660/5773 [1:49:53<3:15:37, 5.55s/it] 63%|██████▎ | 3660/5773 [1:49:55<3:15:37, 5.55s/it] {'loss': 0.5575, 'learning_rate': 6.242607154509163e-06, 'epoch': 0.63} 63%|██████▎ | 3660/5773 [1:49:55<3:15:37, 5.55s/it] {'loss': 0.5575, 'learning_rate': 6.242607154509163e-06, 'epoch': 0.63} 63%|██████▎ | 3660/5773 
[1:49:53<3:15:37, 5.55s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") 63%|██████▎ | 3661/5773 [1:50:01<3:14:51, 5.54s/it] 63%|██████▎ | 3661/5773 [1:49:59<3:14:51, 5.54s/it] {'loss': 0.5815, 'learning_rate': 6.237407900305334e-06, 'epoch': 0.63} 63%|██████▎ | 3661/5773 [1:50:01<3:14:51, 5.54s/it] {'loss': 0.5815, 'learning_rate': 6.237407900305334e-06, 'epoch': 0.63} 63%|██████▎ | 3661/5773 [1:49:59<3:14:51, 5.54s/it] 63%|██████▎ | 3662/5773 [1:50:04<3:15:41, 5.56s/it] 63%|██████▎ | 3662/5773 [1:50:06<3:15:41, 5.56s/it] {'loss': 0.5732, 'learning_rate': 6.232209830685718e-06, 'epoch': 0.63} 63%|██████▎ | 3662/5773 [1:50:06<3:15:41, 5.56s/it] {'loss': 0.5732, 'learning_rate': 6.232209830685718e-06, 'epoch': 0.63} 63%|██████▎ | 3662/5773 [1:50:04<3:15:41, 5.56s/it] 63%|██████▎ | 3663/5773 [1:50:10<3:15:10, 5.55s/it] 63%|██████▎ | 3663/5773 [1:50:12<3:15:10, 5.55s/it] {'loss': 0.5552, 'learning_rate': 6.2270129472868235e-06, 'epoch': 0.63} 63%|██████▎ | 3663/5773 [1:50:12<3:15:10, 5.55s/it] {'loss': 0.5552, 'learning_rate': 6.2270129472868235e-06, 'epoch': 0.63} 63%|██████▎ | 3663/5773 [1:50:10<3:15:10, 5.55s/it] 63%|██████▎ | 3664/5773 [1:50:15<3:12:56, 5.49s/it] 63%|██████▎ | 3664/5773 [1:50:17<3:12:56, 5.49s/it] {'loss': 0.5796, 'learning_rate': 6.221817251744801e-06, 'epoch': 0.63} 63%|██████▎ | 3664/5773 [1:50:17<3:12:56, 5.49s/it] {'loss': 0.5796, 'learning_rate': 6.221817251744801e-06, 'epoch': 0.63} 63%|██████▎ | 3664/5773 [1:50:15<3:12:56, 5.49s/it] 63%|██████▎ | 3665/5773 [1:50:21<3:12:12, 5.47s/it] 63%|██████▎ | 3665/5773 [1:50:23<3:12:12, 5.47s/it] {'loss': 0.5566, 'learning_rate': 6.216622745695417e-06, 'epoch': 0.63} 63%|██████▎ | 3665/5773 [1:50:23<3:12:12, 5.47s/it] {'loss': 0.5566, 'learning_rate': 6.216622745695417e-06, 'epoch': 0.63} 63%|██████▎ | 3665/5773 [1:50:21<3:12:12, 5.47s/it] 64%|██████▎ | 3666/5773 [1:50:26<3:12:17, 5.48s/it] 
64%|██████▎ | 3666/5773 [1:50:28<3:12:17, 5.48s/it] {'loss': 0.5735, 'learning_rate': 6.211429430774073e-06, 'epoch': 0.64} 64%|██████▎ | 3666/5773 [1:50:28<3:12:17, 5.48s/it] {'loss': 0.5735, 'learning_rate': 6.211429430774073e-06, 'epoch': 0.64} 64%|██████▎ | 3666/5773 [1:50:26<3:12:17, 5.48s/it] 64%|██████▎ | 3667/5773 [1:50:32<3:11:50, 5.47s/it] 64%|██████▎ | 3667/5773 [1:50:34<3:11:50, 5.47s/it] {'loss': 0.5712, 'learning_rate': 6.2062373086157855e-06, 'epoch': 0.64} 64%|██████▎ | 3667/5773 [1:50:34<3:11:50, 5.47s/it] {'loss': 0.5712, 'learning_rate': 6.2062373086157855e-06, 'epoch': 0.64} 64%|██████▎ | 3667/5773 [1:50:32<3:11:50, 5.47s/it] 64%|██████▎ | 3668/5773 [1:50:37<3:11:09, 5.45s/it] 64%|██████▎ | 3668/5773 [1:50:39<3:11:09, 5.45s/it] {'loss': 0.563, 'learning_rate': 6.2010463808552065e-06, 'epoch': 0.64} 64%|██████▎ | 3668/5773 [1:50:39<3:11:09, 5.45s/it] {'loss': 0.563, 'learning_rate': 6.2010463808552065e-06, 'epoch': 0.64} 64%|██████▎ | 3668/5773 [1:50:37<3:11:09, 5.45s/it] 64%|██████▎ | 3669/5773 [1:50:43<3:11:13, 5.45s/it] 64%|██████▎ | 3669/5773 [1:50:45<3:11:13, 5.45s/it] {'loss': 0.5561, 'learning_rate': 6.195856649126599e-06, 'epoch': 0.64} 64%|██████▎ | 3669/5773 [1:50:45<3:11:13, 5.45s/it] {'loss': 0.5561, 'learning_rate': 6.195856649126599e-06, 'epoch': 0.64} 64%|██████▎ | 3669/5773 [1:50:43<3:11:13, 5.45s/it] 64%|██████▎ | 3670/5773 [1:50:48<3:11:23, 5.46s/it] 64%|██████▎ | 3670/5773 [1:50:50<3:11:23, 5.46s/it] {'loss': 0.5722, 'learning_rate': 6.190668115063862e-06, 'epoch': 0.64} 64%|██████▎ | 3670/5773 [1:50:50<3:11:23, 5.46s/it] {'loss': 0.5722, 'learning_rate': 6.190668115063862e-06, 'epoch': 0.64} 64%|██████▎ | 3670/5773 [1:50:48<3:11:23, 5.46s/it] 64%|██████▎ | 3671/5773 [1:50:54<3:12:40, 5.50s/it] 64%|██████▎ | 3671/5773 [1:50:56<3:12:40, 5.50s/it] {'loss': 0.5543, 'learning_rate': 6.18548078030051e-06, 'epoch': 0.64} 64%|██████▎ | 3671/5773 [1:50:56<3:12:40, 5.50s/it] {'loss': 0.5543, 'learning_rate': 6.18548078030051e-06, 
'epoch': 0.64} 64%|██████▎ | 3671/5773 [1:50:54<3:12:40, 5.50s/it] 64%|██████▎ | 3672/5773 [1:50:59<3:12:18, 5.49s/it] 64%|██████▎ | 3672/5773 [1:51:01<3:12:17, 5.49s/it] {'loss': 0.5715, 'learning_rate': 6.18029464646968e-06, 'epoch': 0.64} 64%|██████▎ | 3672/5773 [1:51:01<3:12:17, 5.49s/it] {'loss': 0.5715, 'learning_rate': 6.18029464646968e-06, 'epoch': 0.64} 64%|██████▎ | 3672/5773 [1:50:59<3:12:18, 5.49s/it] 64%|██████▎ | 3673/5773 [1:51:07<3:14:02, 5.54s/it] 64%|██████▎ | 3673/5773 [1:51:05<3:14:02, 5.54s/it] {'loss': 0.5841, 'learning_rate': 6.175109715204136e-06, 'epoch': 0.64} 64%|██████▎ | 3673/5773 [1:51:07<3:14:02, 5.54s/it] {'loss': 0.5841, 'learning_rate': 6.175109715204136e-06, 'epoch': 0.64} 64%|██████▎ | 3673/5773 [1:51:05<3:14:02, 5.54s/it] 64%|██████▎ | 3674/5773 [1:51:12<3:13:23, 5.53s/it] 64%|██████▎ | 3674/5773 [1:51:10<3:13:24, 5.53s/it] {'loss': 0.5578, 'learning_rate': 6.169925988136256e-06, 'epoch': 0.64} 64%|██████▎ | 3674/5773 [1:51:12<3:13:23, 5.53s/it] {'loss': 0.5578, 'learning_rate': 6.169925988136256e-06, 'epoch': 0.64} 64%|██████▎ | 3674/5773 [1:51:10<3:13:24, 5.53s/it] 64%|██████▎ | 3675/5773 [1:51:18<3:14:00, 5.55s/it] 64%|██████▎ | 3675/5773 [1:51:16<3:14:01, 5.55s/it] {'loss': 0.5575, 'learning_rate': 6.164743466898046e-06, 'epoch': 0.64} 64%|██████▎ | 3675/5773 [1:51:18<3:14:00, 5.55s/it] {'loss': 0.5575, 'learning_rate': 6.164743466898046e-06, 'epoch': 0.64} 64%|██████▎ | 3675/5773 [1:51:16<3:14:01, 5.55s/it] 64%|██████▎ | 3676/5773 [1:51:23<3:13:28, 5.54s/it] 64%|██████▎ | 3676/5773 [1:51:21<3:13:28, 5.54s/it] {'loss': 0.5614, 'learning_rate': 6.1595621531211305e-06, 'epoch': 0.64} 64%|██████▎ | 3676/5773 [1:51:23<3:13:28, 5.54s/it] {'loss': 0.5614, 'learning_rate': 6.1595621531211305e-06, 'epoch': 0.64} 64%|██████▎ | 3676/5773 [1:51:21<3:13:28, 5.54s/it] 64%|██████▎ | 3677/5773 [1:51:29<3:11:31, 5.48s/it] 64%|██████▎ | 3677/5773 [1:51:27<3:11:31, 5.48s/it] {'loss': 0.5756, 'learning_rate': 6.154382048436749e-06, 'epoch': 
0.64} 64%|██████▎ | 3677/5773 [1:51:29<3:11:31, 5.48s/it] {'loss': 0.5756, 'learning_rate': 6.154382048436749e-06, 'epoch': 0.64} 64%|██████▎ | 3677/5773 [1:51:27<3:11:31, 5.48s/it] 64%|██████▎ | 3678/5773 [1:51:34<3:11:19, 5.48s/it] 64%|██████▎ | 3678/5773 [1:51:32<3:11:20, 5.48s/it] {'loss': 0.568, 'learning_rate': 6.149203154475764e-06, 'epoch': 0.64} 64%|██████▎ | 3678/5773 [1:51:34<3:11:19, 5.48s/it] {'loss': 0.568, 'learning_rate': 6.149203154475764e-06, 'epoch': 0.64} 64%|██████▎ | 3678/5773 [1:51:32<3:11:20, 5.48s/it] 64%|██████▎ | 3679/5773 [1:51:40<3:13:14, 5.54s/it] 64%|██████▎ | 3679/5773 [1:51:38<3:13:14, 5.54s/it] {'loss': 0.5615, 'learning_rate': 6.14402547286866e-06, 'epoch': 0.64} 64%|██████▎ | 3679/5773 [1:51:40<3:13:14, 5.54s/it] {'loss': 0.5615, 'learning_rate': 6.14402547286866e-06, 'epoch': 0.64} 64%|██████▎ | 3679/5773 [1:51:38<3:13:14, 5.54s/it] 64%|██████▎ | 3680/5773 [1:51:45<3:13:25, 5.54s/it] 64%|██████▎ | 3680/5773 [1:51:43<3:13:25, 5.54s/it] {'loss': 0.5649, 'learning_rate': 6.1388490052455305e-06, 'epoch': 0.64} 64%|██████▎ | 3680/5773 [1:51:45<3:13:25, 5.54s/it] {'loss': 0.5649, 'learning_rate': 6.1388490052455305e-06, 'epoch': 0.64} 64%|██████▎ | 3680/5773 [1:51:43<3:13:25, 5.54s/it] 64%|██████▍ | 3681/5773 [1:51:49<3:12:51, 5.53s/it] 64%|██████▍ | 3681/5773 [1:51:51<3:12:52, 5.53s/it] {'loss': 0.5586, 'learning_rate': 6.133673753236102e-06, 'epoch': 0.64} 64%|██████▍ | 3681/5773 [1:51:51<3:12:52, 5.53s/it] {'loss': 0.5586, 'learning_rate': 6.133673753236102e-06, 'epoch': 0.64} 64%|██████▍ | 3681/5773 [1:51:49<3:12:51, 5.53s/it] 64%|██████▍ | 3682/5773 [1:51:54<3:11:38, 5.50s/it] 64%|██████▍ | 3682/5773 [1:51:56<3:11:38, 5.50s/it] {'loss': 0.5611, 'learning_rate': 6.1284997184697e-06, 'epoch': 0.64} 64%|██████▍ | 3682/5773 [1:51:56<3:11:38, 5.50s/it] {'loss': 0.5611, 'learning_rate': 6.1284997184697e-06, 'epoch': 0.64} 64%|██████▍ | 3682/5773 [1:51:54<3:11:38, 5.50s/it] 64%|██████▍ | 3683/5773 [1:52:02<3:09:46, 5.45s/it] 64%|██████▍ 
| 3683/5773 [1:52:00<3:09:46, 5.45s/it] {'loss': 0.5604, 'learning_rate': 6.123326902575282e-06, 'epoch': 0.64} 64%|██████▍ | 3683/5773 [1:52:02<3:09:46, 5.45s/it] {'loss': 0.5604, 'learning_rate': 6.123326902575282e-06, 'epoch': 0.64} 64%|██████▍ | 3683/5773 [1:52:00<3:09:46, 5.45s/it] 64%|██████▍ | 3684/5773 [1:52:07<3:09:08, 5.43s/it] 64%|██████▍ | 3684/5773 [1:52:05<3:09:08, 5.43s/it] {'loss': 0.5564, 'learning_rate': 6.118155307181415e-06, 'epoch': 0.64} 64%|██████▍ | 3684/5773 [1:52:07<3:09:08, 5.43s/it]{'loss': 0.5564, 'learning_rate': 6.118155307181415e-06, 'epoch': 0.64} 64%|██████▍ | 3684/5773 [1:52:05<3:09:08, 5.43s/it] 64%|██████▍ | 3685/5773 [1:52:11<3:10:37, 5.48s/it] 64%|██████▍ | 3685/5773 [1:52:13<3:10:37, 5.48s/it] {'loss': 0.5675, 'learning_rate': 6.11298493391628e-06, 'epoch': 0.64} 64%|██████▍ | 3685/5773 [1:52:13<3:10:37, 5.48s/it] {'loss': 0.5675, 'learning_rate': 6.11298493391628e-06, 'epoch': 0.64} 64%|██████▍ | 3685/5773 [1:52:11<3:10:37, 5.48s/it] 64%|██████▍ | 3686/5773 [1:52:18<3:11:42, 5.51s/it] 64%|██████▍ | 3686/5773 [1:52:16<3:11:42, 5.51s/it] {'loss': 0.5714, 'learning_rate': 6.107815784407678e-06, 'epoch': 0.64} 64%|██████▍ | 3686/5773 [1:52:18<3:11:42, 5.51s/it] {'loss': 0.5714, 'learning_rate': 6.107815784407678e-06, 'epoch': 0.64} 64%|██████▍ | 3686/5773 [1:52:16<3:11:42, 5.51s/it] 64%|██████▍ | 3687/5773 [1:52:24<3:10:42, 5.49s/it] 64%|██████▍ | 3687/5773 [1:52:22<3:10:42, 5.49s/it] {'loss': 0.5615, 'learning_rate': 6.102647860283021e-06, 'epoch': 0.64} 64%|██████▍ | 3687/5773 [1:52:24<3:10:42, 5.49s/it] {'loss': 0.5615, 'learning_rate': 6.102647860283021e-06, 'epoch': 0.64} 64%|██████▍ | 3687/5773 [1:52:22<3:10:42, 5.49s/it] 64%|██████▍ | 3688/5773 [1:52:29<3:11:44, 5.52s/it] 64%|██████▍ | 3688/5773 [1:52:27<3:11:44, 5.52s/it] {'loss': 0.5485, 'learning_rate': 6.097481163169336e-06, 'epoch': 0.64} 64%|██████▍ | 3688/5773 [1:52:29<3:11:44, 5.52s/it] {'loss': 0.5485, 'learning_rate': 6.097481163169336e-06, 'epoch': 0.64} 
64%|██████▍ | 3688/5773 [1:52:27<3:11:44, 5.52s/it] 64%|██████▍ | 3689/5773 [1:52:35<3:10:09, 5.47s/it] 64%|██████▍ | 3689/5773 [1:52:33<3:10:09, 5.47s/it] {'loss': 0.5558, 'learning_rate': 6.092315694693267e-06, 'epoch': 0.64} 64%|██████▍ | 3689/5773 [1:52:35<3:10:09, 5.47s/it] {'loss': 0.5558, 'learning_rate': 6.092315694693267e-06, 'epoch': 0.64} 64%|██████▍ | 3689/5773 [1:52:33<3:10:09, 5.47s/it] 64%|██████▍ | 3690/5773 [1:52:38<3:11:14, 5.51s/it] 64%|██████▍ | 3690/5773 [1:52:40<3:11:14, 5.51s/it] {'loss': 0.5652, 'learning_rate': 6.087151456481071e-06, 'epoch': 0.64} 64%|██████▍ | 3690/5773 [1:52:40<3:11:14, 5.51s/it] {'loss': 0.5652, 'learning_rate': 6.087151456481071e-06, 'epoch': 0.64} 64%|██████▍ | 3690/5773 [1:52:38<3:11:14, 5.51s/it] 64%|██████▍ | 3691/5773 [1:52:44<3:10:51, 5.50s/it] 64%|██████▍ | 3691/5773 [1:52:46<3:10:51, 5.50s/it] {'loss': 0.5623, 'learning_rate': 6.081988450158605e-06, 'epoch': 0.64} 64%|██████▍ | 3691/5773 [1:52:46<3:10:51, 5.50s/it] {'loss': 0.5623, 'learning_rate': 6.081988450158605e-06, 'epoch': 0.64} 64%|██████▍ | 3691/5773 [1:52:44<3:10:51, 5.50s/it] 64%|██████▍ | 3692/5773 [1:52:49<3:10:40, 5.50s/it] 64%|██████▍ | 3692/5773 [1:52:51<3:10:40, 5.50s/it] {'loss': 0.5675, 'learning_rate': 6.076826677351357e-06, 'epoch': 0.64} 64%|██████▍ | 3692/5773 [1:52:51<3:10:40, 5.50s/it] {'loss': 0.5675, 'learning_rate': 6.076826677351357e-06, 'epoch': 0.64} 64%|██████▍ | 3692/5773 [1:52:49<3:10:40, 5.50s/it] 64%|██████▍ | 3693/5773 [1:52:57<3:10:24, 5.49s/it] 64%|██████▍ | 3693/5773 [1:52:55<3:10:24, 5.49s/it] {'loss': 0.5589, 'learning_rate': 6.071666139684414e-06, 'epoch': 0.64} 64%|██████▍ | 3693/5773 [1:52:57<3:10:24, 5.49s/it] {'loss': 0.5589, 'learning_rate': 6.071666139684414e-06, 'epoch': 0.64} 64%|██████▍ | 3693/5773 [1:52:55<3:10:24, 5.49s/it] 64%|██████▍ | 3694/5773 [1:53:00<3:09:49, 5.48s/it] 64%|██████▍ | 3694/5773 [1:53:02<3:09:50, 5.48s/it] {'loss': 0.5536, 'learning_rate': 6.06650683878248e-06, 'epoch': 0.64} 64%|██████▍ 
| 3694/5773 [1:53:02<3:09:50, 5.48s/it] {'loss': 0.5536, 'learning_rate': 6.06650683878248e-06, 'epoch': 0.64} 64%|██████▍ | 3694/5773 [1:53:00<3:09:49, 5.48s/it] 64%|██████▍ | 3695/5773 [1:53:08<3:10:33, 5.50s/it] 64%|██████▍ | 3695/5773 [1:53:06<3:10:33, 5.50s/it] {'loss': 0.5411, 'learning_rate': 6.061348776269869e-06, 'epoch': 0.64} 64%|██████▍ | 3695/5773 [1:53:08<3:10:33, 5.50s/it] {'loss': 0.5411, 'learning_rate': 6.061348776269869e-06, 'epoch': 0.64} 64%|██████▍ | 3695/5773 [1:53:06<3:10:33, 5.50s/it] 64%|██████▍ | 3696/5773 [1:53:13<3:11:23, 5.53s/it] 64%|██████▍ | 3696/5773 [1:53:11<3:11:23, 5.53s/it] {'loss': 0.549, 'learning_rate': 6.056191953770501e-06, 'epoch': 0.64} 64%|██████▍ | 3696/5773 [1:53:13<3:11:23, 5.53s/it] {'loss': 0.549, 'learning_rate': 6.056191953770501e-06, 'epoch': 0.64} 64%|██████▍ | 3696/5773 [1:53:11<3:11:23, 5.53s/it] 64%|██████▍ | 3697/5773 [1:53:19<3:10:29, 5.51s/it] 64%|██████▍ | 3697/5773 [1:53:17<3:10:29, 5.51s/it] {'loss': 0.571, 'learning_rate': 6.0510363729079104e-06, 'epoch': 0.64} 64%|██████▍ | 3697/5773 [1:53:19<3:10:29, 5.51s/it] {'loss': 0.571, 'learning_rate': 6.0510363729079104e-06, 'epoch': 0.64} 64%|██████▍ | 3697/5773 [1:53:17<3:10:29, 5.51s/it] 64%|██████▍ | 3698/5773 [1:53:22<3:08:27, 5.45s/it] 64%|██████▍ | 3698/5773 [1:53:24<3:08:27, 5.45s/it] {'loss': 0.5419, 'learning_rate': 6.045882035305237e-06, 'epoch': 0.64} 64%|██████▍ | 3698/5773 [1:53:24<3:08:27, 5.45s/it] {'loss': 0.5419, 'learning_rate': 6.045882035305237e-06, 'epoch': 0.64} 64%|██████▍ | 3698/5773 [1:53:22<3:08:27, 5.45s/it] 64%|██████▍ | 3699/5773 [1:53:30<3:09:04, 5.47s/it] 64%|██████▍ | 3699/5773 [1:53:28<3:09:04, 5.47s/it] {'loss': 0.5406, 'learning_rate': 6.0407289425852335e-06, 'epoch': 0.64} 64%|██████▍ | 3699/5773 [1:53:30<3:09:04, 5.47s/it] {'loss': 0.5406, 'learning_rate': 6.0407289425852335e-06, 'epoch': 0.64} 64%|██████▍ | 3699/5773 [1:53:28<3:09:04, 5.47s/it]12 AutoResumeHook: Checking whether to suspend... 
11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 64%|██████▍ | 3700/5773 [1:53:35<3:09:53, 5.50s/it] 9 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 64%|██████▍ | 3700/5773 [1:53:33<3:09:53, 5.50s/it] 10 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5566, 'learning_rate': 6.035577096370255e-06, 'epoch': 0.64} 64%|██████▍ | 3700/5773 [1:53:35<3:09:53, 5.50s/it] {'loss': 0.5566, 'learning_rate': 6.035577096370255e-06, 'epoch': 0.64} 64%|██████▍ | 3700/5773 [1:53:33<3:09:53, 5.50s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 64%|██████▍ | 3701/5773 [1:53:54<5:28:35, 9.52s/it] 64%|██████▍ | 3701/5773 [1:53:52<5:28:35, 9.52s/it] {'loss': 0.5575, 'learning_rate': 6.030426498282269e-06, 'epoch': 0.64} 64%|██████▍ | 3701/5773 [1:53:54<5:28:35, 9.52s/it] {'loss': 0.5575, 'learning_rate': 6.030426498282269e-06, 'epoch': 0.64} 64%|██████▍ | 3701/5773 [1:53:52<5:28:35, 9.52s/it] 64%|██████▍ | 3702/5773 [1:53:59<4:46:11, 8.29s/it] 64%|██████▍ | 3702/5773 [1:53:57<4:46:11, 8.29s/it] {'loss': 0.5524, 'learning_rate': 6.02527714994285e-06, 'epoch': 0.64} 64%|██████▍ | 3702/5773 [1:53:59<4:46:11, 8.29s/it] {'loss': 0.5524, 'learning_rate': 6.02527714994285e-06, 'epoch': 0.64} 64%|██████▍ | 3702/5773 [1:53:57<4:46:11, 8.29s/it] 64%|██████▍ | 3703/5773 [1:54:05<4:15:49, 7.42s/it] 64%|██████▍ | 3703/5773 [1:54:03<4:15:49, 7.42s/it] {'loss': 0.5676, 'learning_rate': 6.020129052973174e-06, 'epoch': 0.64} 64%|██████▍ | 3703/5773 [1:54:05<4:15:49, 7.42s/it] {'loss': 0.5676, 'learning_rate': 6.020129052973174e-06, 'epoch': 0.64} 64%|██████▍ | 3703/5773 [1:54:03<4:15:49, 7.42s/it] 64%|██████▍ | 3704/5773 [1:54:10<3:56:02, 6.85s/it] 64%|██████▍ | 3704/5773 [1:54:08<3:56:02, 6.85s/it] {'loss': 0.5672, 'learning_rate': 6.014982208994033e-06, 'epoch': 0.64} 64%|██████▍ | 3704/5773 [1:54:10<3:56:02, 6.85s/it] {'loss': 0.5672, 'learning_rate': 6.014982208994033e-06, 'epoch': 0.64} 64%|██████▍ | 3704/5773 [1:54:08<3:56:02, 6.85s/it] 64%|██████▍ | 3705/5773 [1:54:16<3:43:56, 6.50s/it] 64%|██████▍ | 3705/5773 [1:54:14<3:43:56, 6.50s/it] {'loss': 0.5755, 'learning_rate': 6.0098366196258095e-06, 'epoch': 0.64} 64%|██████▍ | 3705/5773 [1:54:16<3:43:56, 6.50s/it] {'loss': 0.5755, 'learning_rate': 6.0098366196258095e-06, 'epoch': 0.64} 64%|██████▍ | 3705/5773 [1:54:14<3:43:56, 6.50s/it] 64%|██████▍ | 3706/5773 [1:54:21<3:33:25, 6.19s/it] 64%|██████▍ | 3706/5773 [1:54:19<3:33:24, 6.19s/it] {'loss': 0.5639, 'learning_rate': 6.004692286488503e-06, 'epoch': 0.64} 64%|██████▍ | 3706/5773 [1:54:21<3:33:25, 
6.19s/it] {'loss': 0.5639, 'learning_rate': 6.004692286488503e-06, 'epoch': 0.64} 64%|██████▍ | 3706/5773 [1:54:19<3:33:24, 6.19s/it] 64%|██████▍ | 3707/5773 [1:54:27<3:26:03, 5.98s/it] 64%|██████▍ | 3707/5773 [1:54:25<3:26:03, 5.98s/it] {'loss': 0.5578, 'learning_rate': 5.999549211201711e-06, 'epoch': 0.64} 64%|██████▍ | 3707/5773 [1:54:27<3:26:03, 5.98s/it] {'loss': 0.5578, 'learning_rate': 5.999549211201711e-06, 'epoch': 0.64} 64%|██████▍ | 3707/5773 [1:54:25<3:26:03, 5.98s/it] 64%|██████▍ | 3708/5773 [1:54:33<3:22:19, 5.88s/it] 64%|██████▍ | 3708/5773 [1:54:31<3:22:19, 5.88s/it] {'loss': 0.5756, 'learning_rate': 5.994407395384645e-06, 'epoch': 0.64} 64%|██████▍ | 3708/5773 [1:54:33<3:22:19, 5.88s/it] {'loss': 0.5756, 'learning_rate': 5.994407395384645e-06, 'epoch': 0.64} 64%|██████▍ | 3708/5773 [1:54:31<3:22:19, 5.88s/it] 64%|██████▍ | 3709/5773 [1:54:38<3:18:06, 5.76s/it] 64%|██████▍ | 3709/5773 [1:54:36<3:18:06, 5.76s/it] {'loss': 0.5669, 'learning_rate': 5.989266840656107e-06, 'epoch': 0.64} 64%|██████▍ | 3709/5773 [1:54:38<3:18:06, 5.76s/it] {'loss': 0.5669, 'learning_rate': 5.989266840656107e-06, 'epoch': 0.64} 64%|██████▍ | 3709/5773 [1:54:36<3:18:06, 5.76s/it] 64%|██████▍ | 3710/5773 [1:54:44<3:15:05, 5.67s/it] 64%|██████▍ | 3710/5773 [1:54:42<3:15:05, 5.67s/it] {'loss': 0.5554, 'learning_rate': 5.984127548634511e-06, 'epoch': 0.64} 64%|██████▍ | 3710/5773 [1:54:44<3:15:05, 5.67s/it] {'loss': 0.5554, 'learning_rate': 5.984127548634511e-06, 'epoch': 0.64} 64%|██████▍ | 3710/5773 [1:54:42<3:15:05, 5.67s/it] 64%|██████▍ | 3711/5773 [1:54:49<3:13:10, 5.62s/it] 64%|██████▍ | 3711/5773 [1:54:47<3:13:10, 5.62s/it] {'loss': 0.5559, 'learning_rate': 5.978989520937865e-06, 'epoch': 0.64} 64%|██████▍ | 3711/5773 [1:54:49<3:13:10, 5.62s/it] {'loss': 0.5559, 'learning_rate': 5.978989520937865e-06, 'epoch': 0.64} 64%|██████▍ | 3711/5773 [1:54:47<3:13:10, 5.62s/it] 64%|██████▍ | 3712/5773 [1:54:55<3:12:16, 5.60s/it] 64%|██████▍ | 3712/5773 [1:54:53<3:12:16, 5.60s/it] 
{'loss': 0.5629, 'learning_rate': 5.973852759183793e-06, 'epoch': 0.64} 64%|██████▍ | 3712/5773 [1:54:55<3:12:16, 5.60s/it] {'loss': 0.5629, 'learning_rate': 5.973852759183793e-06, 'epoch': 0.64} 64%|██████▍ | 3712/5773 [1:54:53<3:12:16, 5.60s/it] 64%|██████▍ | 3713/5773 [1:55:00<3:10:45, 5.56s/it] 64%|██████▍ | 3713/5773 [1:54:58<3:10:45, 5.56s/it] {'loss': 0.586, 'learning_rate': 5.968717264989504e-06, 'epoch': 0.64} 64%|██████▍ | 3713/5773 [1:55:00<3:10:45, 5.56s/it] {'loss': 0.586, 'learning_rate': 5.968717264989504e-06, 'epoch': 0.64} 64%|██████▍ | 3713/5773 [1:54:58<3:10:45, 5.56s/it] 64%|██████▍ | 3714/5773 [1:55:06<3:09:59, 5.54s/it] 64%|██████▍ | 3714/5773 [1:55:04<3:09:59, 5.54s/it] {'loss': 0.5536, 'learning_rate': 5.963583039971819e-06, 'epoch': 0.64} 64%|██████▍ | 3714/5773 [1:55:06<3:09:59, 5.54s/it] {'loss': 0.5536, 'learning_rate': 5.963583039971819e-06, 'epoch': 0.64} 64%|██████▍ | 3714/5773 [1:55:04<3:09:59, 5.54s/it] 64%|██████▍ | 3715/5773 [1:55:11<3:09:04, 5.51s/it] 64%|██████▍ | 3715/5773 [1:55:09<3:09:04, 5.51s/it] {'loss': 0.5632, 'learning_rate': 5.9584500857471575e-06, 'epoch': 0.64} 64%|██████▍ | 3715/5773 [1:55:11<3:09:04, 5.51s/it] {'loss': 0.5632, 'learning_rate': 5.9584500857471575e-06, 'epoch': 0.64} 64%|██████▍ | 3715/5773 [1:55:09<3:09:04, 5.51s/it] 64%|██████▍ | 3716/5773 [1:55:15<3:09:58, 5.54s/it] 64%|██████▍ | 3716/5773 [1:55:17<3:09:58, 5.54s/it] {'loss': 0.5695, 'learning_rate': 5.953318403931533e-06, 'epoch': 0.64} 64%|██████▍ | 3716/5773 [1:55:17<3:09:58, 5.54s/it] {'loss': 0.5695, 'learning_rate': 5.953318403931533e-06, 'epoch': 0.64} 64%|██████▍ | 3716/5773 [1:55:15<3:09:58, 5.54s/it] 64%|██████▍ | 3717/5773 [1:55:22<3:09:38, 5.53s/it] 64%|██████▍ | 3717/5773 [1:55:20<3:09:38, 5.53s/it] {'loss': 0.5698, 'learning_rate': 5.948187996140568e-06, 'epoch': 0.64} 64%|██████▍ | 3717/5773 [1:55:22<3:09:38, 5.53s/it] {'loss': 0.5698, 'learning_rate': 5.948187996140568e-06, 'epoch': 0.64} 64%|██████▍ | 3717/5773 [1:55:20<3:09:38, 
5.53s/it] 64%|██████▍ | 3718/5773 [1:55:28<3:10:30, 5.56s/it] 64%|██████▍ | 3718/5773 [1:55:26<3:10:30, 5.56s/it] {'loss': 0.5485, 'learning_rate': 5.943058863989478e-06, 'epoch': 0.64} 64%|██████▍ | 3718/5773 [1:55:28<3:10:30, 5.56s/it] {'loss': 0.5485, 'learning_rate': 5.943058863989478e-06, 'epoch': 0.64} 64%|██████▍ | 3718/5773 [1:55:26<3:10:30, 5.56s/it] 64%|██████▍ | 3719/5773 [1:55:33<3:09:21, 5.53s/it] 64%|██████▍ | 3719/5773 [1:55:31<3:09:21, 5.53s/it] {'loss': 0.5678, 'learning_rate': 5.937931009093072e-06, 'epoch': 0.64} 64%|██████▍ | 3719/5773 [1:55:33<3:09:21, 5.53s/it] {'loss': 0.5678, 'learning_rate': 5.937931009093072e-06, 'epoch': 0.64} 64%|██████▍ | 3719/5773 [1:55:31<3:09:21, 5.53s/it] 64%|██████▍ | 3720/5773 [1:55:39<3:09:33, 5.54s/it] 64%|██████▍ | 3720/5773 [1:55:37<3:09:33, 5.54s/it] {'loss': 0.5688, 'learning_rate': 5.932804433065767e-06, 'epoch': 0.64} 64%|██████▍ | 3720/5773 [1:55:39<3:09:33, 5.54s/it] {'loss': 0.5688, 'learning_rate': 5.932804433065767e-06, 'epoch': 0.64} 64%|██████▍ | 3720/5773 [1:55:37<3:09:33, 5.54s/it] 64%|██████▍ | 3721/5773 [1:55:44<3:09:30, 5.54s/it] 64%|██████▍ | 3721/5773 [1:55:42<3:09:30, 5.54s/it] {'loss': 0.5645, 'learning_rate': 5.927679137521573e-06, 'epoch': 0.64} 64%|██████▍ | 3721/5773 [1:55:44<3:09:30, 5.54s/it] {'loss': 0.5645, 'learning_rate': 5.927679137521573e-06, 'epoch': 0.64} 64%|██████▍ | 3721/5773 [1:55:42<3:09:30, 5.54s/it] 64%|██████▍ | 3722/5773 [1:55:50<3:08:22, 5.51s/it] 64%|██████▍ | 3722/5773 [1:55:48<3:08:23, 5.51s/it] {'loss': 0.5585, 'learning_rate': 5.922555124074096e-06, 'epoch': 0.64} 64%|██████▍ | 3722/5773 [1:55:50<3:08:22, 5.51s/it] {'loss': 0.5585, 'learning_rate': 5.922555124074096e-06, 'epoch': 0.64} 64%|██████▍ | 3722/5773 [1:55:48<3:08:23, 5.51s/it] 64%|██████▍ | 3723/5773 [1:55:55<3:09:26, 5.54s/it] 64%|██████▍ | 3723/5773 [1:55:53<3:09:26, 5.54s/it] {'loss': 0.5564, 'learning_rate': 5.917432394336542e-06, 'epoch': 0.64} 64%|██████▍ | 3723/5773 [1:55:55<3:09:26, 5.54s/it] 
{'loss': 0.5564, 'learning_rate': 5.917432394336542e-06, 'epoch': 0.64} 64%|██████▍ | 3723/5773 [1:55:53<3:09:26, 5.54s/it] 65%|██████▍ | 3724/5773 [1:56:01<3:11:22, 5.60s/it] 65%|██████▍ | 3724/5773 [1:55:59<3:11:22, 5.60s/it] {'loss': 0.5689, 'learning_rate': 5.912310949921705e-06, 'epoch': 0.65} 65%|██████▍ | 3724/5773 [1:56:01<3:11:22, 5.60s/it] {'loss': 0.5689, 'learning_rate': 5.912310949921705e-06, 'epoch': 0.65} 65%|██████▍ | 3724/5773 [1:55:59<3:11:22, 5.60s/it] 65%|██████▍ | 3725/5773 [1:56:06<3:09:06, 5.54s/it] 65%|██████▍ | 3725/5773 [1:56:05<3:09:06, 5.54s/it] {'loss': 0.5737, 'learning_rate': 5.9071907924419805e-06, 'epoch': 0.65} 65%|██████▍ | 3725/5773 [1:56:06<3:09:06, 5.54s/it] {'loss': 0.5737, 'learning_rate': 5.9071907924419805e-06, 'epoch': 0.65} 65%|██████▍ | 3725/5773 [1:56:05<3:09:06, 5.54s/it] 65%|██████▍ | 3726/5773 [1:56:12<3:08:36, 5.53s/it] 65%|██████▍ | 3726/5773 [1:56:10<3:08:35, 5.53s/it] {'loss': 0.5597, 'learning_rate': 5.90207192350936e-06, 'epoch': 0.65} 65%|██████▍ | 3726/5773 [1:56:12<3:08:36, 5.53s/it] {'loss': 0.5597, 'learning_rate': 5.90207192350936e-06, 'epoch': 0.65} 65%|██████▍ | 3726/5773 [1:56:10<3:08:35, 5.53s/it] 65%|██████▍ | 3727/5773 [1:56:18<3:10:07, 5.58s/it] 65%|██████▍ | 3727/5773 [1:56:16<3:10:06, 5.58s/it] {'loss': 0.552, 'learning_rate': 5.896954344735426e-06, 'epoch': 0.65} 65%|██████▍ | 3727/5773 [1:56:18<3:10:07, 5.58s/it] {'loss': 0.552, 'learning_rate': 5.896954344735426e-06, 'epoch': 0.65} 65%|██████▍ | 3727/5773 [1:56:16<3:10:06, 5.58s/it] 65%|██████▍ | 3728/5773 [1:56:23<3:09:58, 5.57s/it] 65%|██████▍ | 3728/5773 [1:56:21<3:09:58, 5.57s/it] {'loss': 0.5618, 'learning_rate': 5.891838057731358e-06, 'epoch': 0.65} 65%|██████▍ | 3728/5773 [1:56:23<3:09:58, 5.57s/it] {'loss': 0.5618, 'learning_rate': 5.891838057731358e-06, 'epoch': 0.65} 65%|██████▍ | 3728/5773 [1:56:21<3:09:58, 5.57s/it] 65%|██████▍ | 3729/5773 [1:56:29<3:08:32, 5.53s/it] 65%|██████▍ | 3729/5773 [1:56:27<3:08:32, 5.53s/it] {'loss': 
0.5664, 'learning_rate': 5.886723064107921e-06, 'epoch': 0.65} 65%|██████▍ | 3729/5773 [1:56:29<3:08:32, 5.53s/it] {'loss': 0.5664, 'learning_rate': 5.886723064107921e-06, 'epoch': 0.65} 65%|██████▍ | 3729/5773 [1:56:27<3:08:32, 5.53s/it] 65%|██████▍ | 3730/5773 [1:56:34<3:08:31, 5.54s/it] 65%|██████▍ | 3730/5773 [1:56:32<3:08:31, 5.54s/it] {'loss': 0.5697, 'learning_rate': 5.881609365475485e-06, 'epoch': 0.65} 65%|██████▍ | 3730/5773 [1:56:34<3:08:31, 5.54s/it] {'loss': 0.5697, 'learning_rate': 5.881609365475485e-06, 'epoch': 0.65} 65%|██████▍ | 3730/5773 [1:56:32<3:08:31, 5.54s/it] 65%|██████▍ | 3731/5773 [1:56:40<3:07:13, 5.50s/it] 65%|██████▍ | 3731/5773 [1:56:38<3:07:13, 5.50s/it] {'loss': 0.5649, 'learning_rate': 5.876496963443998e-06, 'epoch': 0.65} 65%|██████▍ | 3731/5773 [1:56:40<3:07:13, 5.50s/it] {'loss': 0.5649, 'learning_rate': 5.876496963443998e-06, 'epoch': 0.65} 65%|██████▍ | 3731/5773 [1:56:38<3:07:13, 5.50s/it] 65%|██████▍ | 3732/5773 [1:56:45<3:07:49, 5.52s/it] 65%|██████▍ | 3732/5773 [1:56:43<3:07:49, 5.52s/it] {'loss': 0.5649, 'learning_rate': 5.871385859623018e-06, 'epoch': 0.65} 65%|██████▍ | 3732/5773 [1:56:45<3:07:49, 5.52s/it] {'loss': 0.5649, 'learning_rate': 5.871385859623018e-06, 'epoch': 0.65} 65%|██████▍ | 3732/5773 [1:56:43<3:07:49, 5.52s/it] 65%|██████▍ | 3733/5773 [1:56:51<3:05:20, 5.45s/it] 65%|██████▍ | 3733/5773 [1:56:49<3:05:20, 5.45s/it] {'loss': 0.561, 'learning_rate': 5.866276055621673e-06, 'epoch': 0.65} 65%|██████▍ | 3733/5773 [1:56:51<3:05:20, 5.45s/it] {'loss': 0.561, 'learning_rate': 5.866276055621673e-06, 'epoch': 0.65} 65%|██████▍ | 3733/5773 [1:56:49<3:05:20, 5.45s/it] 65%|██████▍ | 3734/5773 [1:56:56<3:06:44, 5.50s/it] 65%|██████▍ | 3734/5773 [1:56:54<3:06:44, 5.50s/it] {'loss': 0.5728, 'learning_rate': 5.861167553048699e-06, 'epoch': 0.65} 65%|██████▍ | 3734/5773 [1:56:56<3:06:44, 5.50s/it] {'loss': 0.5728, 'learning_rate': 5.861167553048699e-06, 'epoch': 0.65} 65%|██████▍ | 3734/5773 [1:56:54<3:06:44, 5.50s/it] 
65%|██████▍ | 3735/5773 [1:57:02<3:07:36, 5.52s/it] 65%|██████▍ | 3735/5773 [1:57:00<3:07:36, 5.52s/it] {'loss': 0.566, 'learning_rate': 5.856060353512414e-06, 'epoch': 0.65} 65%|██████▍ | 3735/5773 [1:57:02<3:07:36, 5.52s/it] {'loss': 0.566, 'learning_rate': 5.856060353512414e-06, 'epoch': 0.65} 65%|██████▍ | 3735/5773 [1:57:00<3:07:36, 5.52s/it] 65%|██████▍ | 3736/5773 [1:57:07<3:07:11, 5.51s/it] 65%|██████▍ | 3736/5773 [1:57:05<3:07:11, 5.51s/it] {'loss': 0.5657, 'learning_rate': 5.850954458620728e-06, 'epoch': 0.65} 65%|██████▍ | 3736/5773 [1:57:07<3:07:11, 5.51s/it] {'loss': 0.5657, 'learning_rate': 5.850954458620728e-06, 'epoch': 0.65} 65%|██████▍ | 3736/5773 [1:57:05<3:07:11, 5.51s/it] 65%|██████▍ | 3737/5773 [1:57:13<3:08:44, 5.56s/it] 65%|██████▍ | 3737/5773 [1:57:11<3:08:44, 5.56s/it] {'loss': 0.5671, 'learning_rate': 5.845849869981137e-06, 'epoch': 0.65} 65%|██████▍ | 3737/5773 [1:57:13<3:08:44, 5.56s/it] {'loss': 0.5671, 'learning_rate': 5.845849869981137e-06, 'epoch': 0.65} 65%|██████▍ | 3737/5773 [1:57:11<3:08:44, 5.56s/it] 65%|██████▍ | 3738/5773 [1:57:18<3:05:35, 5.47s/it] 65%|██████▍ | 3738/5773 [1:57:16<3:05:35, 5.47s/it] {'loss': 0.5711, 'learning_rate': 5.840746589200732e-06, 'epoch': 0.65} 65%|██████▍ | 3738/5773 [1:57:18<3:05:35, 5.47s/it] {'loss': 0.5711, 'learning_rate': 5.840746589200732e-06, 'epoch': 0.65} 65%|██████▍ | 3738/5773 [1:57:16<3:05:35, 5.47s/it] 65%|██████▍ | 3739/5773 [1:57:24<3:05:58, 5.49s/it] 65%|██████▍ | 3739/5773 [1:57:22<3:05:57, 5.49s/it] {'loss': 0.5591, 'learning_rate': 5.835644617886189e-06, 'epoch': 0.65} 65%|██████▍ | 3739/5773 [1:57:24<3:05:58, 5.49s/it] {'loss': 0.5591, 'learning_rate': 5.835644617886189e-06, 'epoch': 0.65} 65%|██████▍ | 3739/5773 [1:57:22<3:05:57, 5.49s/it] 65%|██████▍ | 3740/5773 [1:57:29<3:07:30, 5.53s/it] 65%|██████▍ | 3740/5773 [1:57:27<3:07:30, 5.53s/it] {'loss': 0.5841, 'learning_rate': 5.830543957643773e-06, 'epoch': 0.65} 65%|██████▍ | 3740/5773 [1:57:29<3:07:30, 5.53s/it] {'loss': 
0.5841, 'learning_rate': 5.830543957643773e-06, 'epoch': 0.65} 65%|██████▍ | 3740/5773 [1:57:27<3:07:30, 5.53s/it] 65%|██████▍ | 3741/5773 [1:57:35<3:05:44, 5.48s/it] 65%|██████▍ | 3741/5773 [1:57:33<3:05:44, 5.48s/it] {'loss': 0.5775, 'learning_rate': 5.825444610079326e-06, 'epoch': 0.65} 65%|██████▍ | 3741/5773 [1:57:35<3:05:44, 5.48s/it] {'loss': 0.5775, 'learning_rate': 5.825444610079326e-06, 'epoch': 0.65} 65%|██████▍ | 3741/5773 [1:57:33<3:05:44, 5.48s/it] 65%|██████▍ | 3742/5773 [1:57:40<3:06:01, 5.50s/it] 65%|██████▍ | 3742/5773 [1:57:38<3:06:01, 5.50s/it] {'loss': 0.5575, 'learning_rate': 5.820346576798297e-06, 'epoch': 0.65} 65%|██████▍ | 3742/5773 [1:57:40<3:06:01, 5.50s/it] {'loss': 0.5575, 'learning_rate': 5.820346576798297e-06, 'epoch': 0.65} 65%|██████▍ | 3742/5773 [1:57:38<3:06:01, 5.50s/it] 65%|██████▍ | 3743/5773 [1:57:46<3:04:17, 5.45s/it] 65%|██████▍ | 3743/5773 [1:57:44<3:04:17, 5.45s/it] {'loss': 0.5581, 'learning_rate': 5.815249859405704e-06, 'epoch': 0.65} 65%|██████▍ | 3743/5773 [1:57:46<3:04:17, 5.45s/it] {'loss': 0.5581, 'learning_rate': 5.815249859405704e-06, 'epoch': 0.65} 65%|██████▍ | 3743/5773 [1:57:44<3:04:17, 5.45s/it] 65%|██████▍ | 3744/5773 [1:57:51<3:03:40, 5.43s/it] 65%|██████▍ | 3744/5773 [1:57:49<3:03:40, 5.43s/it] {'loss': 0.5736, 'learning_rate': 5.810154459506152e-06, 'epoch': 0.65} 65%|██████▍ | 3744/5773 [1:57:51<3:03:40, 5.43s/it] {'loss': 0.5736, 'learning_rate': 5.810154459506152e-06, 'epoch': 0.65} 65%|██████▍ | 3744/5773 [1:57:49<3:03:40, 5.43s/it] 65%|██████▍ | 3745/5773 [1:57:56<3:02:33, 5.40s/it] 65%|██████▍ | 3745/5773 [1:57:54<3:02:33, 5.40s/it] {'loss': 0.559, 'learning_rate': 5.805060378703848e-06, 'epoch': 0.65} 65%|██████▍ | 3745/5773 [1:57:56<3:02:33, 5.40s/it] {'loss': 0.559, 'learning_rate': 5.805060378703848e-06, 'epoch': 0.65} 65%|██████▍ | 3745/5773 [1:57:54<3:02:33, 5.40s/it] 65%|██████▍ | 3746/5773 [1:58:02<3:04:29, 5.46s/it] 65%|██████▍ | 3746/5773 [1:58:00<3:04:29, 5.46s/it] {'loss': 0.5638, 
'learning_rate': 5.799967618602565e-06, 'epoch': 0.65} 65%|██████▍ | 3746/5773 [1:58:02<3:04:29, 5.46s/it] {'loss': 0.5638, 'learning_rate': 5.799967618602565e-06, 'epoch': 0.65} 65%|██████▍ | 3746/5773 [1:58:00<3:04:29, 5.46s/it] 65%|██████▍ | 3747/5773 [1:58:07<3:05:43, 5.50s/it] 65%|██████▍ | 3747/5773 [1:58:05<3:05:44, 5.50s/it] {'loss': 0.5593, 'learning_rate': 5.794876180805659e-06, 'epoch': 0.65} 65%|██████▍ | 3747/5773 [1:58:07<3:05:43, 5.50s/it] {'loss': 0.5593, 'learning_rate': 5.794876180805659e-06, 'epoch': 0.65} 65%|██████▍ | 3747/5773 [1:58:05<3:05:44, 5.50s/it] 65%|██████▍ | 3748/5773 [1:58:13<3:05:01, 5.48s/it] 65%|██████▍ | 3748/5773 [1:58:11<3:05:01, 5.48s/it] {'loss': 0.5719, 'learning_rate': 5.789786066916089e-06, 'epoch': 0.65} 65%|██████▍ | 3748/5773 [1:58:13<3:05:01, 5.48s/it] {'loss': 0.5719, 'learning_rate': 5.789786066916089e-06, 'epoch': 0.65} 65%|██████▍ | 3748/5773 [1:58:11<3:05:01, 5.48s/it] 65%|██████▍ | 3749/5773 [1:58:18<3:04:08, 5.46s/it] 65%|██████▍ | 3749/5773 [1:58:16<3:04:08, 5.46s/it] {'loss': 0.5796, 'learning_rate': 5.78469727853638e-06, 'epoch': 0.65} 65%|██████▍ | 3749/5773 [1:58:18<3:04:08, 5.46s/it] {'loss': 0.5796, 'learning_rate': 5.78469727853638e-06, 'epoch': 0.65} 65%|██████▍ | 3749/5773 [1:58:16<3:04:08, 5.46s/it] 2 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 65%|██████▍ | 3750/5773 [1:58:22<3:05:56, 5.51s/it] 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
8 AutoResumeHook: Checking whether to suspend... 65%|██████▍ | 3750/5773 [1:58:24<3:05:56, 5.51s/it]15 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5573, 'learning_rate': 5.779609817268646e-06, 'epoch': 0.65} 65%|██████▍ | 3750/5773 [1:58:24<3:05:56, 5.51s/it] {'loss': 0.5573, 'learning_rate': 5.779609817268646e-06, 'epoch': 0.65} 65%|██████▍ | 3750/5773 [1:58:22<3:05:56, 5.51s/it] 65%|██████▍ | 3751/5773 [1:58:29<3:05:37, 5.51s/it] 65%|██████▍ | 3751/5773 [1:58:27<3:05:37, 5.51s/it] {'loss': 0.5636, 'learning_rate': 5.774523684714577e-06, 'epoch': 0.65} 65%|██████▍ | 3751/5773 [1:58:29<3:05:37, 5.51s/it] {'loss': 0.5636, 'learning_rate': 5.774523684714577e-06, 'epoch': 0.65} 65%|██████▍ | 3751/5773 [1:58:27<3:05:37, 5.51s/it] 65%|██████▍ | 3752/5773 [1:58:33<3:06:27, 5.54s/it] 65%|██████▍ | 3752/5773 [1:58:35<3:06:27, 5.54s/it] {'loss': 0.5585, 'learning_rate': 5.769438882475459e-06, 'epoch': 0.65} 65%|██████▍ | 3752/5773 [1:58:35<3:06:27, 5.54s/it] {'loss': 0.5585, 'learning_rate': 5.769438882475459e-06, 'epoch': 0.65} 65%|██████▍ | 3752/5773 [1:58:33<3:06:27, 5.54s/it] 65%|██████▌ | 3753/5773 [1:58:40<3:04:48, 5.49s/it] 65%|██████▌ | 3753/5773 [1:58:38<3:04:48, 5.49s/it] {'loss': 0.5605, 'learning_rate': 5.764355412152147e-06, 'epoch': 0.65} 65%|██████▌ | 3753/5773 [1:58:40<3:04:48, 5.49s/it] {'loss': 0.5605, 'learning_rate': 5.764355412152147e-06, 'epoch': 0.65} 65%|██████▌ | 3753/5773 [1:58:38<3:04:48, 5.49s/it] 65%|██████▌ | 3754/5773 [1:58:46<3:03:22, 5.45s/it] 65%|██████▌ | 3754/5773 [1:58:44<3:03:22, 5.45s/it] {'loss': 0.5613, 'learning_rate': 5.759273275345074e-06, 'epoch': 0.65} 65%|██████▌ | 3754/5773 [1:58:46<3:03:22, 5.45s/it] {'loss': 0.5613, 'learning_rate': 5.759273275345074e-06, 'epoch': 0.65} 65%|██████▌ | 3754/5773 [1:58:44<3:03:22, 5.45s/it] 65%|██████▌ | 3755/5773 [1:58:51<3:05:01, 5.50s/it] 65%|██████▌ | 3755/5773 [1:58:49<3:05:01, 5.50s/it] {'loss': 0.5528, 'learning_rate': 
5.7541924736542694e-06, 'epoch': 0.65} 65%|██████▌ | 3755/5773 [1:58:51<3:05:01, 5.50s/it] {'loss': 0.5528, 'learning_rate': 5.7541924736542694e-06, 'epoch': 0.65} 65%|██████▌ | 3755/5773 [1:58:49<3:05:01, 5.50s/it] 65%|██████▌ | 3756/5773 [1:58:57<3:06:14, 5.54s/it] 65%|██████▌ | 3756/5773 [1:58:55<3:06:14, 5.54s/it] {'loss': 0.5442, 'learning_rate': 5.7491130086793275e-06, 'epoch': 0.65} 65%|██████▌ | 3756/5773 [1:58:57<3:06:14, 5.54s/it] {'loss': 0.5442, 'learning_rate': 5.7491130086793275e-06, 'epoch': 0.65} 65%|██████▌ | 3756/5773 [1:58:55<3:06:14, 5.54s/it] 65%|██████▌ | 3757/5773 [1:59:02<3:03:55, 5.47s/it] 65%|██████▌ | 3757/5773 [1:59:00<3:03:55, 5.47s/it] {'loss': 0.5868, 'learning_rate': 5.7440348820194204e-06, 'epoch': 0.65} 65%|██████▌ | 3757/5773 [1:59:02<3:03:55, 5.47s/it] {'loss': 0.5868, 'learning_rate': 5.7440348820194204e-06, 'epoch': 0.65} 65%|██████▌ | 3757/5773 [1:59:00<3:03:55, 5.47s/it] 65%|██████▌ | 3758/5773 [1:59:08<3:04:40, 5.50s/it] 65%|██████▌ | 3758/5773 [1:59:06<3:04:40, 5.50s/it] {'loss': 0.5761, 'learning_rate': 5.7389580952733136e-06, 'epoch': 0.65} 65%|██████▌ | 3758/5773 [1:59:08<3:04:40, 5.50s/it] {'loss': 0.5761, 'learning_rate': 5.7389580952733136e-06, 'epoch': 0.65} 65%|██████▌ | 3758/5773 [1:59:06<3:04:40, 5.50s/it] 65%|██████▌ | 3759/5773 [1:59:13<3:04:37, 5.50s/it] 65%|██████▌ | 3759/5773 [1:59:11<3:04:38, 5.50s/it] {'loss': 0.5651, 'learning_rate': 5.733882650039338e-06, 'epoch': 0.65} 65%|██████▌ | 3759/5773 [1:59:13<3:04:37, 5.50s/it] {'loss': 0.5651, 'learning_rate': 5.733882650039338e-06, 'epoch': 0.65} 65%|██████▌ | 3759/5773 [1:59:11<3:04:38, 5.50s/it] 65%|██████▌ | 3760/5773 [1:59:19<3:04:26, 5.50s/it] 65%|██████▌ | 3760/5773 [1:59:17<3:04:26, 5.50s/it] {'loss': 0.5894, 'learning_rate': 5.728808547915405e-06, 'epoch': 0.65} 65%|██████▌ | 3760/5773 [1:59:19<3:04:26, 5.50s/it] {'loss': 0.5894, 'learning_rate': 5.728808547915405e-06, 'epoch': 0.65} 65%|██████▌ | 3760/5773 [1:59:17<3:04:26, 5.50s/it] 65%|██████▌ | 
3761/5773 [1:59:25<3:06:33, 5.56s/it] 65%|██████▌ | 3761/5773 [1:59:23<3:06:33, 5.56s/it] {'loss': 0.5658, 'learning_rate': 5.723735790499005e-06, 'epoch': 0.65} 65%|██████▌ | 3761/5773 [1:59:25<3:06:33, 5.56s/it] {'loss': 0.5658, 'learning_rate': 5.723735790499005e-06, 'epoch': 0.65} 65%|██████▌ | 3761/5773 [1:59:23<3:06:33, 5.56s/it] 65%|██████▌ | 3762/5773 [1:59:30<3:05:15, 5.53s/it] 65%|██████▌ | 3762/5773 [1:59:28<3:05:15, 5.53s/it] {'loss': 0.5791, 'learning_rate': 5.718664379387205e-06, 'epoch': 0.65} 65%|██████▌ | 3762/5773 [1:59:30<3:05:15, 5.53s/it] {'loss': 0.5791, 'learning_rate': 5.718664379387205e-06, 'epoch': 0.65} 65%|██████▌ | 3762/5773 [1:59:28<3:05:15, 5.53s/it] 65%|██████▌ | 3763/5773 [1:59:36<3:05:55, 5.55s/it] 65%|██████▌ | 3763/5773 [1:59:34<3:05:55, 5.55s/it] {'loss': 0.5566, 'learning_rate': 5.71359431617664e-06, 'epoch': 0.65} 65%|██████▌ | 3763/5773 [1:59:36<3:05:55, 5.55s/it] {'loss': 0.5566, 'learning_rate': 5.71359431617664e-06, 'epoch': 0.65} 65%|██████▌ | 3763/5773 [1:59:34<3:05:55, 5.55s/it] 65%|██████▌ | 3764/5773 [1:59:41<3:03:53, 5.49s/it] 65%|██████▌ | 3764/5773 [1:59:39<3:03:53, 5.49s/it] {'loss': 0.5718, 'learning_rate': 5.7085256024635404e-06, 'epoch': 0.65} 65%|██████▌ | 3764/5773 [1:59:41<3:03:53, 5.49s/it] {'loss': 0.5718, 'learning_rate': 5.7085256024635404e-06, 'epoch': 0.65} 65%|██████▌ | 3764/5773 [1:59:39<3:03:53, 5.49s/it] 65%|██████▌ | 3765/5773 [1:59:46<3:03:27, 5.48s/it] 65%|██████▌ | 3765/5773 [1:59:44<3:03:27, 5.48s/it] {'loss': 0.5575, 'learning_rate': 5.703458239843691e-06, 'epoch': 0.65} 65%|██████▌ | 3765/5773 [1:59:46<3:03:27, 5.48s/it] {'loss': 0.5575, 'learning_rate': 5.703458239843691e-06, 'epoch': 0.65} 65%|██████▌ | 3765/5773 [1:59:44<3:03:27, 5.48s/it] 65%|██████▌ | 3766/5773 [1:59:52<3:02:26, 5.45s/it] 65%|██████▌ | 3766/5773 [1:59:50<3:02:26, 5.45s/it] {'loss': 0.5491, 'learning_rate': 5.698392229912463e-06, 'epoch': 0.65} 65%|██████▌ | 3766/5773 [1:59:52<3:02:26, 5.45s/it] {'loss': 0.5491, 
'learning_rate': 5.698392229912463e-06, 'epoch': 0.65} 65%|██████▌ | 3766/5773 [1:59:50<3:02:26, 5.45s/it] 65%|██████▌ | 3767/5773 [1:59:55<3:01:26, 5.43s/it] 65%|██████▌ | 3767/5773 [1:59:57<3:01:27, 5.43s/it] {'loss': 0.5542, 'learning_rate': 5.693327574264791e-06, 'epoch': 0.65} 65%|██████▌ | 3767/5773 [1:59:57<3:01:27, 5.43s/it] {'loss': 0.5542, 'learning_rate': 5.693327574264791e-06, 'epoch': 0.65} 65%|██████▌ | 3767/5773 [1:59:55<3:01:26, 5.43s/it] 65%|██████▌ | 3768/5773 [2:00:03<3:01:56, 5.44s/it] 65%|██████▌ | 3768/5773 [2:00:01<3:01:56, 5.44s/it] {'loss': 0.5703, 'learning_rate': 5.688264274495201e-06, 'epoch': 0.65} 65%|██████▌ | 3768/5773 [2:00:03<3:01:56, 5.44s/it] {'loss': 0.5703, 'learning_rate': 5.688264274495201e-06, 'epoch': 0.65} 65%|██████▌ | 3768/5773 [2:00:01<3:01:56, 5.44s/it] 65%|██████▌ | 3769/5773 [2:00:08<3:02:16, 5.46s/it] 65%|██████▌ | 3769/5773 [2:00:06<3:02:17, 5.46s/it] {'loss': 0.5588, 'learning_rate': 5.683202332197775e-06, 'epoch': 0.65} 65%|██████▌ | 3769/5773 [2:00:08<3:02:16, 5.46s/it] {'loss': 0.5588, 'learning_rate': 5.683202332197775e-06, 'epoch': 0.65} 65%|██████▌ | 3769/5773 [2:00:06<3:02:17, 5.46s/it] 65%|██████▌ | 3770/5773 [2:00:14<3:02:38, 5.47s/it] 65%|██████▌ | 3770/5773 [2:00:12<3:02:38, 5.47s/it] {'loss': 0.5685, 'learning_rate': 5.6781417489661725e-06, 'epoch': 0.65} 65%|██████▌ | 3770/5773 [2:00:14<3:02:38, 5.47s/it] {'loss': 0.5685, 'learning_rate': 5.6781417489661725e-06, 'epoch': 0.65} 65%|██████▌ | 3770/5773 [2:00:12<3:02:38, 5.47s/it] 65%|██████▌ | 3771/5773 [2:00:19<3:02:59, 5.48s/it] 65%|██████▌ | 3771/5773 [2:00:17<3:02:59, 5.48s/it] {'loss': 0.5742, 'learning_rate': 5.673082526393634e-06, 'epoch': 0.65} 65%|██████▌ | 3771/5773 [2:00:19<3:02:59, 5.48s/it] {'loss': 0.5742, 'learning_rate': 5.673082526393634e-06, 'epoch': 0.65} 65%|██████▌ | 3771/5773 [2:00:17<3:02:59, 5.48s/it] 65%|██████▌ | 3772/5773 [2:00:25<3:02:53, 5.48s/it] 65%|██████▌ | 3772/5773 [2:00:23<3:02:53, 5.48s/it] {'loss': 0.5809, 
'learning_rate': 5.6680246660729584e-06, 'epoch': 0.65} 65%|██████▌ | 3772/5773 [2:00:25<3:02:53, 5.48s/it] {'loss': 0.5809, 'learning_rate': 5.6680246660729584e-06, 'epoch': 0.65} 65%|██████▌ | 3772/5773 [2:00:23<3:02:53, 5.48s/it] 65%|██████▌ | 3773/5773 [2:00:30<3:02:53, 5.49s/it] 65%|██████▌ | 3773/5773 [2:00:28<3:02:53, 5.49s/it] {'loss': 0.5545, 'learning_rate': 5.662968169596525e-06, 'epoch': 0.65} 65%|██████▌ | 3773/5773 [2:00:30<3:02:53, 5.49s/it] {'loss': 0.5545, 'learning_rate': 5.662968169596525e-06, 'epoch': 0.65} 65%|██████▌ | 3773/5773 [2:00:28<3:02:53, 5.49s/it] 65%|██████▌ | 3774/5773 [2:00:36<3:03:20, 5.50s/it] 65%|██████▌ | 3774/5773 [2:00:34<3:03:20, 5.50s/it] {'loss': 0.547, 'learning_rate': 5.657913038556279e-06, 'epoch': 0.65} 65%|██████▌ | 3774/5773 [2:00:36<3:03:20, 5.50s/it] {'loss': 0.547, 'learning_rate': 5.657913038556279e-06, 'epoch': 0.65} 65%|██████▌ | 3774/5773 [2:00:34<3:03:20, 5.50s/it] 65%|██████▌ | 3775/5773 [2:00:39<3:04:04, 5.53s/it] 65%|██████▌ | 3775/5773 [2:00:41<3:04:05, 5.53s/it] {'loss': 0.5533, 'learning_rate': 5.652859274543738e-06, 'epoch': 0.65} 65%|██████▌ | 3775/5773 [2:00:41<3:04:05, 5.53s/it] {'loss': 0.5533, 'learning_rate': 5.652859274543738e-06, 'epoch': 0.65} 65%|██████▌ | 3775/5773 [2:00:39<3:04:04, 5.53s/it] 65%|██████▌ | 3776/5773 [2:00:47<3:03:52, 5.52s/it] 65%|██████▌ | 3776/5773 [2:00:45<3:03:52, 5.52s/it] {'loss': 0.5559, 'learning_rate': 5.647806879149983e-06, 'epoch': 0.65} 65%|██████▌ | 3776/5773 [2:00:47<3:03:52, 5.52s/it] {'loss': 0.5559, 'learning_rate': 5.647806879149983e-06, 'epoch': 0.65} 65%|██████▌ | 3776/5773 [2:00:45<3:03:52, 5.52s/it] 65%|██████▌ | 3777/5773 [2:00:50<3:03:02, 5.50s/it] 65%|██████▌ | 3777/5773 [2:00:52<3:03:02, 5.50s/it] {'loss': 0.5818, 'learning_rate': 5.642755853965678e-06, 'epoch': 0.65} 65%|██████▌ | 3777/5773 [2:00:52<3:03:02, 5.50s/it] {'loss': 0.5818, 'learning_rate': 5.642755853965678e-06, 'epoch': 0.65} 65%|██████▌ | 3777/5773 [2:00:50<3:03:02, 5.50s/it] 
65%|██████▌ | 3778/5773 [2:00:56<3:02:15, 5.48s/it] 65%|██████▌ | 3778/5773 [2:00:58<3:02:15, 5.48s/it] {'loss': 0.5665, 'learning_rate': 5.637706200581043e-06, 'epoch': 0.65} 65%|██████▌ | 3778/5773 [2:00:58<3:02:15, 5.48s/it] {'loss': 0.5665, 'learning_rate': 5.637706200581043e-06, 'epoch': 0.65} 65%|██████▌ | 3778/5773 [2:00:56<3:02:15, 5.48s/it] 65%|██████▌ | 3779/5773 [2:01:03<3:03:31, 5.52s/it] 65%|██████▌ | 3779/5773 [2:01:01<3:03:31, 5.52s/it] {'loss': 0.5579, 'learning_rate': 5.63265792058587e-06, 'epoch': 0.65} 65%|██████▌ | 3779/5773 [2:01:03<3:03:31, 5.52s/it] {'loss': 0.5579, 'learning_rate': 5.63265792058587e-06, 'epoch': 0.65} 65%|██████▌ | 3779/5773 [2:01:01<3:03:31, 5.52s/it] 65%|██████▌ | 3780/5773 [2:01:09<3:02:33, 5.50s/it] 65%|██████▌ | 3780/5773 [2:01:07<3:02:34, 5.50s/it] {'loss': 0.5554, 'learning_rate': 5.627611015569516e-06, 'epoch': 0.65} 65%|██████▌ | 3780/5773 [2:01:09<3:02:33, 5.50s/it] {'loss': 0.5554, 'learning_rate': 5.627611015569516e-06, 'epoch': 0.65} 65%|██████▌ | 3780/5773 [2:01:07<3:02:34, 5.50s/it] 65%|██████▌ | 3781/5773 [2:01:14<3:01:47, 5.48s/it] 65%|██████▌ | 3781/5773 [2:01:12<3:01:47, 5.48s/it] {'loss': 0.5797, 'learning_rate': 5.622565487120915e-06, 'epoch': 0.65} 65%|██████▌ | 3781/5773 [2:01:14<3:01:47, 5.48s/it] {'loss': 0.5797, 'learning_rate': 5.622565487120915e-06, 'epoch': 0.65} 65%|██████▌ | 3781/5773 [2:01:12<3:01:47, 5.48s/it] 66%|██████▌ | 3782/5773 [2:01:20<3:01:31, 5.47s/it] 66%|██████▌ | 3782/5773 [2:01:18<3:01:31, 5.47s/it] {'loss': 0.5514, 'learning_rate': 5.617521336828556e-06, 'epoch': 0.66} 66%|██████▌ | 3782/5773 [2:01:20<3:01:31, 5.47s/it] {'loss': 0.5514, 'learning_rate': 5.617521336828556e-06, 'epoch': 0.66} 66%|██████▌ | 3782/5773 [2:01:18<3:01:31, 5.47s/it] 66%|██████▌ | 3783/5773 [2:01:25<3:01:51, 5.48s/it] 66%|██████▌ | 3783/5773 [2:01:23<3:01:52, 5.48s/it] {'loss': 0.575, 'learning_rate': 5.612478566280497e-06, 'epoch': 0.66} 66%|██████▌ | 3783/5773 [2:01:25<3:01:51, 5.48s/it] {'loss': 
0.575, 'learning_rate': 5.612478566280497e-06, 'epoch': 0.66} 66%|██████▌ | 3783/5773 [2:01:23<3:01:52, 5.48s/it] 66%|██████▌ | 3784/5773 [2:01:31<3:01:57, 5.49s/it] 66%|██████▌ | 3784/5773 [2:01:29<3:01:57, 5.49s/it] {'loss': 0.5704, 'learning_rate': 5.607437177064367e-06, 'epoch': 0.66} 66%|██████▌ | 3784/5773 [2:01:31<3:01:57, 5.49s/it] {'loss': 0.5704, 'learning_rate': 5.607437177064367e-06, 'epoch': 0.66} 66%|██████▌ | 3784/5773 [2:01:29<3:01:57, 5.49s/it] 66%|██████▌ | 3785/5773 [2:01:36<3:01:31, 5.48s/it] 66%|██████▌ | 3785/5773 [2:01:34<3:01:31, 5.48s/it] {'loss': 0.5634, 'learning_rate': 5.602397170767357e-06, 'epoch': 0.66} 66%|██████▌ | 3785/5773 [2:01:36<3:01:31, 5.48s/it] {'loss': 0.5634, 'learning_rate': 5.602397170767357e-06, 'epoch': 0.66} 66%|██████▌ | 3785/5773 [2:01:34<3:01:31, 5.48s/it] 66%|██████▌ | 3786/5773 [2:01:42<3:01:44, 5.49s/it] 66%|██████▌ | 3786/5773 [2:01:40<3:01:44, 5.49s/it] {'loss': 0.5522, 'learning_rate': 5.597358548976221e-06, 'epoch': 0.66} 66%|██████▌ | 3786/5773 [2:01:42<3:01:44, 5.49s/it] {'loss': 0.5522, 'learning_rate': 5.597358548976221e-06, 'epoch': 0.66} 66%|██████▌ | 3786/5773 [2:01:40<3:01:44, 5.49s/it] 66%|██████▌ | 3787/5773 [2:01:47<3:01:20, 5.48s/it] 66%|██████▌ | 3787/5773 [2:01:45<3:01:20, 5.48s/it] {'loss': 0.5658, 'learning_rate': 5.5923213132772715e-06, 'epoch': 0.66} 66%|██████▌ | 3787/5773 [2:01:47<3:01:20, 5.48s/it] {'loss': 0.5658, 'learning_rate': 5.5923213132772715e-06, 'epoch': 0.66} 66%|██████▌ | 3787/5773 [2:01:45<3:01:20, 5.48s/it] 66%|██████▌ | 3788/5773 [2:01:53<3:03:43, 5.55s/it] 66%|██████▌ | 3788/5773 [2:01:51<3:03:43, 5.55s/it] {'loss': 0.5572, 'learning_rate': 5.587285465256405e-06, 'epoch': 0.66} 66%|██████▌ | 3788/5773 [2:01:53<3:03:43, 5.55s/it] {'loss': 0.5572, 'learning_rate': 5.587285465256405e-06, 'epoch': 0.66} 66%|██████▌ | 3788/5773 [2:01:51<3:03:43, 5.55s/it] 66%|██████▌ | 3789/5773 [2:01:58<3:00:41, 5.46s/it] 66%|██████▌ | 3789/5773 [2:01:56<3:00:41, 5.46s/it] {'loss': 0.5496, 
'learning_rate': 5.5822510064990534e-06, 'epoch': 0.66} 66%|██████▌ | 3789/5773 [2:01:58<3:00:41, 5.46s/it] {'loss': 0.5496, 'learning_rate': 5.5822510064990534e-06, 'epoch': 0.66} 66%|██████▌ | 3789/5773 [2:01:56<3:00:41, 5.46s/it] 66%|██████▌ | 3790/5773 [2:02:04<3:01:02, 5.48s/it] 66%|██████▌ | 3790/5773 [2:02:02<3:01:02, 5.48s/it] {'loss': 0.5694, 'learning_rate': 5.577217938590232e-06, 'epoch': 0.66} 66%|██████▌ | 3790/5773 [2:02:04<3:01:02, 5.48s/it] {'loss': 0.5694, 'learning_rate': 5.577217938590232e-06, 'epoch': 0.66} 66%|██████▌ | 3790/5773 [2:02:02<3:01:02, 5.48s/it] 66%|██████▌ | 3791/5773 [2:02:09<3:00:07, 5.45s/it] 66%|██████▌ | 3791/5773 [2:02:07<3:00:07, 5.45s/it] {'loss': 0.5564, 'learning_rate': 5.572186263114512e-06, 'epoch': 0.66} 66%|██████▌ | 3791/5773 [2:02:09<3:00:07, 5.45s/it] {'loss': 0.5564, 'learning_rate': 5.572186263114512e-06, 'epoch': 0.66} 66%|██████▌ | 3791/5773 [2:02:07<3:00:07, 5.45s/it] 66%|██████▌ | 3792/5773 [2:02:15<3:03:07, 5.55s/it] 66%|██████▌ | 3792/5773 [2:02:13<3:03:07, 5.55s/it] {'loss': 0.5766, 'learning_rate': 5.567155981656025e-06, 'epoch': 0.66} 66%|██████▌ | 3792/5773 [2:02:15<3:03:07, 5.55s/it] {'loss': 0.5766, 'learning_rate': 5.567155981656025e-06, 'epoch': 0.66} 66%|██████▌ | 3792/5773 [2:02:13<3:03:07, 5.55s/it] 66%|██████▌ | 3793/5773 [2:02:20<3:02:43, 5.54s/it] 66%|██████▌ | 3793/5773 [2:02:18<3:02:43, 5.54s/it] {'loss': 0.5621, 'learning_rate': 5.5621270957984575e-06, 'epoch': 0.66} 66%|██████▌ | 3793/5773 [2:02:20<3:02:43, 5.54s/it] {'loss': 0.5621, 'learning_rate': 5.5621270957984575e-06, 'epoch': 0.66} 66%|██████▌ | 3793/5773 [2:02:18<3:02:43, 5.54s/it] 66%|██████▌ | 3794/5773 [2:02:26<3:02:49, 5.54s/it] 66%|██████▌ | 3794/5773 [2:02:24<3:02:49, 5.54s/it] {'loss': 0.5685, 'learning_rate': 5.557099607125073e-06, 'epoch': 0.66} 66%|██████▌ | 3794/5773 [2:02:26<3:02:49, 5.54s/it] {'loss': 0.5685, 'learning_rate': 5.557099607125073e-06, 'epoch': 0.66} 66%|██████▌ | 3794/5773 [2:02:24<3:02:49, 5.54s/it] 
66%|██████▌ | 3795/5773 [2:02:31<3:01:38, 5.51s/it] 66%|██████▌ | 3795/5773 [2:02:29<3:01:38, 5.51s/it] {'loss': 0.5765, 'learning_rate': 5.5520735172186846e-06, 'epoch': 0.66} 66%|██████▌ | 3795/5773 [2:02:31<3:01:38, 5.51s/it] {'loss': 0.5765, 'learning_rate': 5.5520735172186846e-06, 'epoch': 0.66} 66%|██████▌ | 3795/5773 [2:02:29<3:01:38, 5.51s/it] 66%|██████▌ | 3796/5773 [2:02:37<3:02:32, 5.54s/it] 66%|██████▌ | 3796/5773 [2:02:35<3:02:32, 5.54s/it] {'loss': 0.5596, 'learning_rate': 5.547048827661657e-06, 'epoch': 0.66} 66%|██████▌ | 3796/5773 [2:02:37<3:02:32, 5.54s/it] {'loss': 0.5596, 'learning_rate': 5.547048827661657e-06, 'epoch': 0.66} 66%|██████▌ | 3796/5773 [2:02:35<3:02:32, 5.54s/it] 66%|██████▌ | 3797/5773 [2:02:42<3:02:17, 5.54s/it] 66%|██████▌ | 3797/5773 [2:02:40<3:02:17, 5.54s/it] {'loss': 0.5482, 'learning_rate': 5.542025540035935e-06, 'epoch': 0.66} 66%|██████▌ | 3797/5773 [2:02:42<3:02:17, 5.54s/it] {'loss': 0.5482, 'learning_rate': 5.542025540035935e-06, 'epoch': 0.66} 66%|██████▌ | 3797/5773 [2:02:40<3:02:17, 5.54s/it] 66%|██████▌ | 3798/5773 [2:02:48<3:00:27, 5.48s/it] 66%|██████▌ | 3798/5773 [2:02:46<3:00:27, 5.48s/it] {'loss': 0.5432, 'learning_rate': 5.537003655923003e-06, 'epoch': 0.66} 66%|██████▌ | 3798/5773 [2:02:48<3:00:27, 5.48s/it] {'loss': 0.5432, 'learning_rate': 5.537003655923003e-06, 'epoch': 0.66} 66%|██████▌ | 3798/5773 [2:02:46<3:00:27, 5.48s/it] 66%|██████▌ | 3799/5773 [2:02:53<3:00:26, 5.48s/it] 66%|██████▌ | 3799/5773 [2:02:51<3:00:26, 5.48s/it] {'loss': 0.559, 'learning_rate': 5.531983176903913e-06, 'epoch': 0.66} 66%|██████▌ | 3799/5773 [2:02:53<3:00:26, 5.48s/it] {'loss': 0.559, 'learning_rate': 5.531983176903913e-06, 'epoch': 0.66} 66%|██████▌ | 3799/5773 [2:02:51<3:00:26, 5.48s/it]10 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend...4 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
12 AutoResumeHook: Checking whether to suspend... 66%|██████▌ | 3800/5773 [2:02:59<3:00:42, 5.50s/it] 9 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 66%|██████▌ | 3800/5773 [2:02:57<3:00:42, 5.50s/it] 11 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5615, 'learning_rate': 5.5269641045592645e-06, 'epoch': 0.66} 66%|██████▌ | 3800/5773 [2:02:59<3:00:42, 5.50s/it] {'loss': 0.5615, 'learning_rate': 5.5269641045592645e-06, 'epoch': 0.66} 66%|██████▌ | 3800/5773 [2:02:57<3:00:42, 5.50s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 66%|██████▌ | 3801/5773 [2:03:22<5:57:06, 10.87s/it] 66%|██████▌ | 3801/5773 [2:03:20<5:57:06, 10.87s/it] {'loss': 0.5681, 'learning_rate': 5.5219464404692346e-06, 'epoch': 0.66} 66%|██████▌ | 3801/5773 [2:03:22<5:57:06, 10.87s/it] {'loss': 0.5681, 'learning_rate': 5.5219464404692346e-06, 'epoch': 0.66} 66%|██████▌ | 3801/5773 [2:03:20<5:57:06, 10.87s/it] 66%|██████▌ | 3802/5773 [2:03:26<5:03:17, 9.23s/it] 66%|██████▌ | 3802/5773 [2:03:28<5:03:17, 9.23s/it] {'loss': 0.5781, 'learning_rate': 5.516930186213538e-06, 'epoch': 0.66} 66%|██████▌ | 3802/5773 [2:03:28<5:03:17, 9.23s/it] {'loss': 0.5781, 'learning_rate': 5.516930186213538e-06, 'epoch': 0.66} 66%|██████▌ | 3802/5773 [2:03:26<5:03:17, 9.23s/it] 66%|██████▌ | 3803/5773 [2:03:31<4:26:46, 8.13s/it] 66%|██████▌ | 3803/5773 [2:03:33<4:26:46, 8.13s/it] {'loss': 0.5604, 'learning_rate': 5.511915343371452e-06, 'epoch': 0.66} 66%|██████▌ | 3803/5773 [2:03:33<4:26:46, 8.13s/it] {'loss': 0.5604, 'learning_rate': 5.511915343371452e-06, 'epoch': 0.66} 66%|██████▌ | 3803/5773 [2:03:31<4:26:46, 8.13s/it] 66%|██████▌ | 3804/5773 [2:03:37<4:00:18, 7.32s/it] 66%|██████▌ | 3804/5773 [2:03:39<4:00:18, 7.32s/it] {'loss': 0.5787, 'learning_rate': 5.506901913521808e-06, 'epoch': 0.66} 66%|██████▌ | 3804/5773 [2:03:39<4:00:18, 7.32s/it] {'loss': 0.5787, 'learning_rate': 5.506901913521808e-06, 'epoch': 0.66} 66%|██████▌ | 3804/5773 [2:03:37<4:00:18, 7.32s/it] 66%|██████▌ | 3805/5773 [2:03:44<3:41:12, 6.74s/it] 66%|██████▌ | 3805/5773 [2:03:42<3:41:13, 6.74s/it] {'loss': 0.5661, 'learning_rate': 5.501889898242996e-06, 'epoch': 0.66} 66%|██████▌ | 3805/5773 [2:03:44<3:41:12, 6.74s/it] {'loss': 0.5661, 'learning_rate': 5.501889898242996e-06, 'epoch': 0.66} 66%|██████▌ | 3805/5773 [2:03:42<3:41:13, 6.74s/it] 66%|██████▌ | 3806/5773 [2:03:48<3:30:12, 6.41s/it] 66%|██████▌ | 3806/5773 [2:03:50<3:30:12, 6.41s/it] {'loss': 0.5677, 'learning_rate': 5.496879299112954e-06, 'epoch': 0.66} 66%|██████▌ | 3806/5773 [2:03:50<3:30:12, 
6.41s/it] {'loss': 0.5677, 'learning_rate': 5.496879299112954e-06, 'epoch': 0.66} 66%|██████▌ | 3806/5773 [2:03:48<3:30:12, 6.41s/it] 66%|██████▌ | 3807/5773 [2:03:55<3:19:48, 6.10s/it] 66%|██████▌ | 3807/5773 [2:03:53<3:19:49, 6.10s/it] {'loss': 0.5622, 'learning_rate': 5.491870117709186e-06, 'epoch': 0.66} 66%|██████▌ | 3807/5773 [2:03:55<3:19:48, 6.10s/it] {'loss': 0.5622, 'learning_rate': 5.491870117709186e-06, 'epoch': 0.66} 66%|██████▌ | 3807/5773 [2:03:53<3:19:49, 6.10s/it] 66%|██████▌ | 3808/5773 [2:04:00<3:14:15, 5.93s/it] 66%|██████▌ | 3808/5773 [2:03:59<3:14:15, 5.93s/it] {'loss': 0.5678, 'learning_rate': 5.486862355608738e-06, 'epoch': 0.66} 66%|██████▌ | 3808/5773 [2:04:00<3:14:15, 5.93s/it] {'loss': 0.5678, 'learning_rate': 5.486862355608738e-06, 'epoch': 0.66} 66%|██████▌ | 3808/5773 [2:03:59<3:14:15, 5.93s/it] 66%|██████▌ | 3809/5773 [2:04:04<3:09:11, 5.78s/it] 66%|██████▌ | 3809/5773 [2:04:06<3:09:11, 5.78s/it] {'loss': 0.5402, 'learning_rate': 5.481856014388211e-06, 'epoch': 0.66} 66%|██████▌ | 3809/5773 [2:04:06<3:09:11, 5.78s/it] {'loss': 0.5402, 'learning_rate': 5.481856014388211e-06, 'epoch': 0.66} 66%|██████▌ | 3809/5773 [2:04:04<3:09:11, 5.78s/it] 66%|██████▌ | 3810/5773 [2:04:09<3:06:47, 5.71s/it] 66%|██████▌ | 3810/5773 [2:04:11<3:06:47, 5.71s/it] {'loss': 0.5583, 'learning_rate': 5.476851095623767e-06, 'epoch': 0.66} 66%|██████▌ | 3810/5773 [2:04:11<3:06:47, 5.71s/it] {'loss': 0.5583, 'learning_rate': 5.476851095623767e-06, 'epoch': 0.66} 66%|██████▌ | 3810/5773 [2:04:09<3:06:47, 5.71s/it] 66%|██████▌ | 3811/5773 [2:04:15<3:05:18, 5.67s/it] 66%|██████▌ | 3811/5773 [2:04:17<3:05:18, 5.67s/it] {'loss': 0.5641, 'learning_rate': 5.471847600891114e-06, 'epoch': 0.66} 66%|██████▌ | 3811/5773 [2:04:17<3:05:18, 5.67s/it] {'loss': 0.5641, 'learning_rate': 5.471847600891114e-06, 'epoch': 0.66} 66%|██████▌ | 3811/5773 [2:04:15<3:05:18, 5.67s/it] 66%|██████▌ | 3812/5773 [2:04:21<3:04:17, 5.64s/it] 66%|██████▌ | 3812/5773 [2:04:23<3:04:17, 5.64s/it] 
{'loss': 0.5643, 'learning_rate': 5.4668455317655086e-06, 'epoch': 0.66} 66%|██████▌ | 3812/5773 [2:04:23<3:04:17, 5.64s/it] {'loss': 0.5643, 'learning_rate': 5.4668455317655086e-06, 'epoch': 0.66} 66%|██████▌ | 3812/5773 [2:04:21<3:04:17, 5.64s/it] 66%|██████▌ | 3813/5773 [2:04:28<3:03:50, 5.63s/it] 66%|██████▌ | 3813/5773 [2:04:26<3:03:51, 5.63s/it] {'loss': 0.5551, 'learning_rate': 5.461844889821759e-06, 'epoch': 0.66} 66%|██████▌ | 3813/5773 [2:04:28<3:03:50, 5.63s/it] {'loss': 0.5551, 'learning_rate': 5.461844889821759e-06, 'epoch': 0.66} 66%|██████▌ | 3813/5773 [2:04:26<3:03:51, 5.63s/it] 66%|██████▌ | 3814/5773 [2:04:32<3:03:31, 5.62s/it] 66%|██████▌ | 3814/5773 [2:04:34<3:03:32, 5.62s/it] {'loss': 0.5622, 'learning_rate': 5.456845676634235e-06, 'epoch': 0.66} 66%|██████▌ | 3814/5773 [2:04:34<3:03:32, 5.62s/it] {'loss': 0.5622, 'learning_rate': 5.456845676634235e-06, 'epoch': 0.66} 66%|██████▌ | 3814/5773 [2:04:32<3:03:31, 5.62s/it] 66%|██████▌ | 3815/5773 [2:04:39<3:01:19, 5.56s/it] 66%|██████▌ | 3815/5773 [2:04:37<3:01:19, 5.56s/it] {'loss': 0.557, 'learning_rate': 5.451847893776845e-06, 'epoch': 0.66} 66%|██████▌ | 3815/5773 [2:04:39<3:01:19, 5.56s/it] {'loss': 0.557, 'learning_rate': 5.451847893776845e-06, 'epoch': 0.66} 66%|██████▌ | 3815/5773 [2:04:37<3:01:19, 5.56s/it] 66%|██████▌ | 3816/5773 [2:04:43<3:00:14, 5.53s/it] 66%|██████▌ | 3816/5773 [2:04:45<3:00:14, 5.53s/it] {'loss': 0.5689, 'learning_rate': 5.446851542823051e-06, 'epoch': 0.66} 66%|██████▌ | 3816/5773 [2:04:45<3:00:14, 5.53s/it] {'loss': 0.5689, 'learning_rate': 5.446851542823051e-06, 'epoch': 0.66} 66%|██████▌ | 3816/5773 [2:04:43<3:00:14, 5.53s/it] 66%|██████▌ | 3817/5773 [2:04:48<3:00:06, 5.52s/it] 66%|██████▌ | 3817/5773 [2:04:50<3:00:06, 5.52s/it] {'loss': 0.5479, 'learning_rate': 5.441856625345863e-06, 'epoch': 0.66} 66%|██████▌ | 3817/5773 [2:04:50<3:00:06, 5.52s/it] {'loss': 0.5479, 'learning_rate': 5.441856625345863e-06, 'epoch': 0.66} 66%|██████▌ | 3817/5773 [2:04:48<3:00:06, 
5.52s/it] 66%|██████▌ | 3818/5773 [2:04:54<2:58:57, 5.49s/it] 66%|██████▌ | 3818/5773 [2:04:56<2:58:57, 5.49s/it] {'loss': 0.5524, 'learning_rate': 5.4368631429178406e-06, 'epoch': 0.66} 66%|██████▌ | 3818/5773 [2:04:56<2:58:57, 5.49s/it] {'loss': 0.5524, 'learning_rate': 5.4368631429178406e-06, 'epoch': 0.66} 66%|██████▌ | 3818/5773 [2:04:54<2:58:57, 5.49s/it] 66%|██████▌ | 3819/5773 [2:05:01<2:58:44, 5.49s/it] 66%|██████▌ | 3819/5773 [2:04:59<2:58:44, 5.49s/it] {'loss': 0.5599, 'learning_rate': 5.43187109711109e-06, 'epoch': 0.66} 66%|██████▌ | 3819/5773 [2:05:01<2:58:44, 5.49s/it] {'loss': 0.5599, 'learning_rate': 5.43187109711109e-06, 'epoch': 0.66} 66%|██████▌ | 3819/5773 [2:04:59<2:58:44, 5.49s/it] 66%|██████▌ | 3820/5773 [2:05:05<2:59:21, 5.51s/it] 66%|██████▌ | 3820/5773 [2:05:07<2:59:21, 5.51s/it] {'loss': 0.57, 'learning_rate': 5.4268804894972705e-06, 'epoch': 0.66} 66%|██████▌ | 3820/5773 [2:05:07<2:59:21, 5.51s/it] {'loss': 0.57, 'learning_rate': 5.4268804894972705e-06, 'epoch': 0.66} 66%|██████▌ | 3820/5773 [2:05:05<2:59:21, 5.51s/it] 66%|██████▌ | 3821/5773 [2:05:12<2:58:24, 5.48s/it] 66%|██████▌ | 3821/5773 [2:05:10<2:58:25, 5.48s/it] {'loss': 0.5574, 'learning_rate': 5.421891321647583e-06, 'epoch': 0.66} 66%|██████▌ | 3821/5773 [2:05:12<2:58:24, 5.48s/it] {'loss': 0.5574, 'learning_rate': 5.421891321647583e-06, 'epoch': 0.66} 66%|██████▌ | 3821/5773 [2:05:10<2:58:25, 5.48s/it] 66%|██████▌ | 3822/5773 [2:05:15<2:56:50, 5.44s/it] 66%|██████▌ | 3822/5773 [2:05:17<2:56:50, 5.44s/it] {'loss': 0.5607, 'learning_rate': 5.416903595132774e-06, 'epoch': 0.66} 66%|██████▌ | 3822/5773 [2:05:17<2:56:50, 5.44s/it] {'loss': 0.5607, 'learning_rate': 5.416903595132774e-06, 'epoch': 0.66} 66%|██████▌ | 3822/5773 [2:05:15<2:56:50, 5.44s/it] 66%|██████▌ | 3823/5773 [2:05:23<2:58:30, 5.49s/it] 66%|██████▌ | 3823/5773 [2:05:21<2:58:30, 5.49s/it] {'loss': 0.5661, 'learning_rate': 5.4119173115231475e-06, 'epoch': 0.66} 66%|██████▌ | 3823/5773 [2:05:23<2:58:30, 5.49s/it] 
{'loss': 0.5661, 'learning_rate': 5.4119173115231475e-06, 'epoch': 0.66} 66%|██████▌ | 3823/5773 [2:05:21<2:58:30, 5.49s/it] 66%|██████▌ | 3824/5773 [2:05:28<2:58:11, 5.49s/it] 66%|██████▌ | 3824/5773 [2:05:27<2:58:11, 5.49s/it] {'loss': 0.58, 'learning_rate': 5.406932472388537e-06, 'epoch': 0.66} 66%|██████▌ | 3824/5773 [2:05:28<2:58:11, 5.49s/it] {'loss': 0.58, 'learning_rate': 5.406932472388537e-06, 'epoch': 0.66} 66%|██████▌ | 3824/5773 [2:05:27<2:58:11, 5.49s/it] 66%|██████▋ | 3825/5773 [2:05:32<2:57:00, 5.45s/it] 66%|██████▋ | 3825/5773 [2:05:34<2:57:00, 5.45s/it] {'loss': 0.5653, 'learning_rate': 5.401949079298332e-06, 'epoch': 0.66} 66%|██████▋ | 3825/5773 [2:05:34<2:57:00, 5.45s/it] {'loss': 0.5653, 'learning_rate': 5.401949079298332e-06, 'epoch': 0.66} 66%|██████▋ | 3825/5773 [2:05:32<2:57:00, 5.45s/it] 66%|██████▋ | 3826/5773 [2:05:37<2:56:29, 5.44s/it] 66%|██████▋ | 3826/5773 [2:05:39<2:56:29, 5.44s/it] {'loss': 0.5619, 'learning_rate': 5.396967133821461e-06, 'epoch': 0.66} 66%|██████▋ | 3826/5773 [2:05:39<2:56:29, 5.44s/it] {'loss': 0.5619, 'learning_rate': 5.396967133821461e-06, 'epoch': 0.66} 66%|██████▋ | 3826/5773 [2:05:37<2:56:29, 5.44s/it] 66%|██████▋ | 3827/5773 [2:05:43<2:58:47, 5.51s/it] 66%|██████▋ | 3827/5773 [2:05:45<2:58:48, 5.51s/it] {'loss': 0.5707, 'learning_rate': 5.3919866375264055e-06, 'epoch': 0.66} 66%|██████▋ | 3827/5773 [2:05:45<2:58:48, 5.51s/it] {'loss': 0.5707, 'learning_rate': 5.3919866375264055e-06, 'epoch': 0.66} 66%|██████▋ | 3827/5773 [2:05:43<2:58:47, 5.51s/it] 66%|██████▋ | 3828/5773 [2:05:50<2:58:51, 5.52s/it] 66%|██████▋ | 3828/5773 [2:05:49<2:58:51, 5.52s/it] {'loss': 0.5828, 'learning_rate': 5.387007591981181e-06, 'epoch': 0.66} 66%|██████▋ | 3828/5773 [2:05:50<2:58:51, 5.52s/it] {'loss': 0.5828, 'learning_rate': 5.387007591981181e-06, 'epoch': 0.66} 66%|██████▋ | 3828/5773 [2:05:49<2:58:51, 5.52s/it] 66%|██████▋ | 3829/5773 [2:05:56<2:57:36, 5.48s/it] 66%|██████▋ | 3829/5773 [2:05:54<2:57:36, 5.48s/it] {'loss': 
0.5552, 'learning_rate': 5.382029998753349e-06, 'epoch': 0.66} 66%|██████▋ | 3829/5773 [2:05:56<2:57:36, 5.48s/it] {'loss': 0.5552, 'learning_rate': 5.382029998753349e-06, 'epoch': 0.66} 66%|██████▋ | 3829/5773 [2:05:54<2:57:36, 5.48s/it] 66%|██████▋ | 3830/5773 [2:06:02<2:59:05, 5.53s/it] 66%|██████▋ | 3830/5773 [2:06:00<2:59:05, 5.53s/it] {'loss': 0.5822, 'learning_rate': 5.377053859410022e-06, 'epoch': 0.66} 66%|██████▋ | 3830/5773 [2:06:02<2:59:05, 5.53s/it] {'loss': 0.5822, 'learning_rate': 5.377053859410022e-06, 'epoch': 0.66} 66%|██████▋ | 3830/5773 [2:06:00<2:59:05, 5.53s/it] 66%|██████▋ | 3831/5773 [2:06:07<2:57:54, 5.50s/it] 66%|██████▋ | 3831/5773 [2:06:05<2:57:54, 5.50s/it] {'loss': 0.5603, 'learning_rate': 5.372079175517838e-06, 'epoch': 0.66} 66%|██████▋ | 3831/5773 [2:06:07<2:57:54, 5.50s/it] {'loss': 0.5603, 'learning_rate': 5.372079175517838e-06, 'epoch': 0.66} 66%|██████▋ | 3831/5773 [2:06:05<2:57:54, 5.50s/it] 66%|██████▋ | 3832/5773 [2:06:12<2:57:35, 5.49s/it] 66%|██████▋ | 3832/5773 [2:06:10<2:57:35, 5.49s/it] {'loss': 0.5619, 'learning_rate': 5.367105948642988e-06, 'epoch': 0.66} 66%|██████▋ | 3832/5773 [2:06:12<2:57:35, 5.49s/it] {'loss': 0.5619, 'learning_rate': 5.367105948642988e-06, 'epoch': 0.66} 66%|██████▋ | 3832/5773 [2:06:10<2:57:35, 5.49s/it] 66%|██████▋ | 3833/5773 [2:06:16<2:58:19, 5.52s/it] 66%|██████▋ | 3833/5773 [2:06:18<2:58:20, 5.52s/it] {'loss': 0.5721, 'learning_rate': 5.3621341803512084e-06, 'epoch': 0.66} 66%|██████▋ | 3833/5773 [2:06:18<2:58:20, 5.52s/it] {'loss': 0.5721, 'learning_rate': 5.3621341803512084e-06, 'epoch': 0.66} 66%|██████▋ | 3833/5773 [2:06:16<2:58:19, 5.52s/it] 66%|██████▋ | 3834/5773 [2:06:21<2:56:36, 5.46s/it] 66%|██████▋ | 3834/5773 [2:06:23<2:56:36, 5.46s/it] {'loss': 0.5746, 'learning_rate': 5.357163872207769e-06, 'epoch': 0.66} 66%|██████▋ | 3834/5773 [2:06:23<2:56:36, 5.46s/it] {'loss': 0.5746, 'learning_rate': 5.357163872207769e-06, 'epoch': 0.66} 66%|██████▋ | 3834/5773 [2:06:21<2:56:36, 
5.46s/it] 66%|██████▋ | 3835/5773 [2:06:27<2:56:11, 5.45s/it] 66%|██████▋ | 3835/5773 [2:06:29<2:56:11, 5.45s/it] {'loss': 0.5789, 'learning_rate': 5.352195025777475e-06, 'epoch': 0.66} 66%|██████▋ | 3835/5773 [2:06:29<2:56:11, 5.45s/it] {'loss': 0.5789, 'learning_rate': 5.352195025777475e-06, 'epoch': 0.66} 66%|██████▋ | 3835/5773 [2:06:27<2:56:11, 5.45s/it] 66%|██████▋ | 3836/5773 [2:06:32<2:56:05, 5.45s/it] 66%|██████▋ | 3836/5773 [2:06:34<2:56:05, 5.45s/it] {'loss': 0.5808, 'learning_rate': 5.347227642624687e-06, 'epoch': 0.66} 66%|██████▋ | 3836/5773 [2:06:34<2:56:05, 5.45s/it] {'loss': 0.5808, 'learning_rate': 5.347227642624687e-06, 'epoch': 0.66} 66%|██████▋ | 3836/5773 [2:06:32<2:56:05, 5.45s/it] 66%|██████▋ | 3837/5773 [2:06:40<2:56:20, 5.47s/it] 66%|██████▋ | 3837/5773 [2:06:38<2:56:20, 5.47s/it] {'loss': 0.5625, 'learning_rate': 5.342261724313292e-06, 'epoch': 0.66} 66%|██████▋ | 3837/5773 [2:06:40<2:56:20, 5.47s/it] {'loss': 0.5625, 'learning_rate': 5.342261724313292e-06, 'epoch': 0.66} 66%|██████▋ | 3837/5773 [2:06:38<2:56:20, 5.47s/it] 66%|██████▋ | 3838/5773 [2:06:43<2:57:40, 5.51s/it] 66%|██████▋ | 3838/5773 [2:06:45<2:57:41, 5.51s/it] {'loss': 0.5767, 'learning_rate': 5.337297272406717e-06, 'epoch': 0.66} 66%|██████▋ | 3838/5773 [2:06:45<2:57:41, 5.51s/it] {'loss': 0.5767, 'learning_rate': 5.337297272406717e-06, 'epoch': 0.66} 66%|██████▋ | 3838/5773 [2:06:43<2:57:40, 5.51s/it] 66%|██████▋ | 3839/5773 [2:06:51<2:57:23, 5.50s/it] 66%|██████▋ | 3839/5773 [2:06:49<2:57:23, 5.50s/it] {'loss': 0.5608, 'learning_rate': 5.332334288467937e-06, 'epoch': 0.66} 66%|██████▋ | 3839/5773 [2:06:51<2:57:23, 5.50s/it] {'loss': 0.5608, 'learning_rate': 5.332334288467937e-06, 'epoch': 0.66} 66%|██████▋ | 3839/5773 [2:06:49<2:57:23, 5.50s/it] 67%|██████▋ | 3840/5773 [2:06:55<2:59:00, 5.56s/it] 67%|██████▋ | 3840/5773 [2:06:56<2:59:00, 5.56s/it] {'loss': 0.5778, 'learning_rate': 5.327372774059454e-06, 'epoch': 0.67} 67%|██████▋ | 3840/5773 [2:06:57<2:59:00, 5.56s/it] 
{'loss': 0.5778, 'learning_rate': 5.327372774059454e-06, 'epoch': 0.67} 67%|██████▋ | 3840/5773 [2:06:55<2:59:00, 5.56s/it] 67%|██████▋ | 3841/5773 [2:07:00<2:59:01, 5.56s/it] 67%|██████▋ | 3841/5773 [2:07:02<2:59:01, 5.56s/it] {'loss': 0.561, 'learning_rate': 5.322412730743309e-06, 'epoch': 0.67} 67%|██████▋ | 3841/5773 [2:07:02<2:59:01, 5.56s/it] {'loss': 0.561, 'learning_rate': 5.322412730743309e-06, 'epoch': 0.67} 67%|██████▋ | 3841/5773 [2:07:00<2:59:01, 5.56s/it] 67%|██████▋ | 3842/5773 [2:07:05<2:57:04, 5.50s/it] 67%|██████▋ | 3842/5773 [2:07:07<2:57:04, 5.50s/it] {'loss': 0.5562, 'learning_rate': 5.317454160081082e-06, 'epoch': 0.67} 67%|██████▋ | 3842/5773 [2:07:07<2:57:04, 5.50s/it] {'loss': 0.5562, 'learning_rate': 5.317454160081082e-06, 'epoch': 0.67} 67%|██████▋ | 3842/5773 [2:07:05<2:57:04, 5.50s/it] 67%|██████▋ | 3843/5773 [2:07:11<2:57:28, 5.52s/it] 67%|██████▋ | 3843/5773 [2:07:13<2:57:28, 5.52s/it] {'loss': 0.5621, 'learning_rate': 5.312497063633897e-06, 'epoch': 0.67} 67%|██████▋ | 3843/5773 [2:07:13<2:57:28, 5.52s/it] {'loss': 0.5621, 'learning_rate': 5.312497063633897e-06, 'epoch': 0.67} 67%|██████▋ | 3843/5773 [2:07:11<2:57:28, 5.52s/it] 67%|██████▋ | 3844/5773 [2:07:17<2:57:33, 5.52s/it] 67%|██████▋ | 3844/5773 [2:07:19<2:57:34, 5.52s/it] {'loss': 0.5718, 'learning_rate': 5.307541442962403e-06, 'epoch': 0.67} 67%|██████▋ | 3844/5773 [2:07:19<2:57:34, 5.52s/it] {'loss': 0.5718, 'learning_rate': 5.307541442962403e-06, 'epoch': 0.67} 67%|██████▋ | 3844/5773 [2:07:17<2:57:33, 5.52s/it] 67%|██████▋ | 3845/5773 [2:07:24<2:56:07, 5.48s/it] 67%|██████▋ | 3845/5773 [2:07:22<2:56:07, 5.48s/it] {'loss': 0.5496, 'learning_rate': 5.302587299626778e-06, 'epoch': 0.67} 67%|██████▋ | 3845/5773 [2:07:24<2:56:07, 5.48s/it] {'loss': 0.5496, 'learning_rate': 5.302587299626778e-06, 'epoch': 0.67} 67%|██████▋ | 3845/5773 [2:07:22<2:56:07, 5.48s/it] 67%|██████▋ | 3846/5773 [2:07:29<2:56:19, 5.49s/it] 67%|██████▋ | 3846/5773 [2:07:27<2:56:19, 5.49s/it] {'loss': 
0.5746, 'learning_rate': 5.297634635186757e-06, 'epoch': 0.67} 67%|██████▋ | 3846/5773 [2:07:29<2:56:19, 5.49s/it] {'loss': 0.5746, 'learning_rate': 5.297634635186757e-06, 'epoch': 0.67} 67%|██████▋ | 3846/5773 [2:07:27<2:56:19, 5.49s/it] 67%|██████▋ | 3847/5773 [2:07:35<2:56:46, 5.51s/it] 67%|██████▋ | 3847/5773 [2:07:33<2:56:46, 5.51s/it] {'loss': 0.5641, 'learning_rate': 5.2926834512015925e-06, 'epoch': 0.67} 67%|██████▋ | 3847/5773 [2:07:35<2:56:46, 5.51s/it] {'loss': 0.5641, 'learning_rate': 5.2926834512015925e-06, 'epoch': 0.67} 67%|██████▋ | 3847/5773 [2:07:33<2:56:46, 5.51s/it] 67%|██████▋ | 3848/5773 [2:07:41<2:57:52, 5.54s/it] 67%|██████▋ | 3848/5773 [2:07:39<2:57:53, 5.54s/it] {'loss': 0.5603, 'learning_rate': 5.287733749230071e-06, 'epoch': 0.67} 67%|██████▋ | 3848/5773 [2:07:41<2:57:52, 5.54s/it] {'loss': 0.5603, 'learning_rate': 5.287733749230071e-06, 'epoch': 0.67} 67%|██████▋ | 3848/5773 [2:07:39<2:57:53, 5.54s/it] 67%|██████▋ | 3849/5773 [2:07:44<2:56:52, 5.52s/it] 67%|██████▋ | 3849/5773 [2:07:46<2:56:53, 5.52s/it] {'loss': 0.5587, 'learning_rate': 5.282785530830525e-06, 'epoch': 0.67} 67%|██████▋ | 3849/5773 [2:07:46<2:56:53, 5.52s/it] {'loss': 0.5587, 'learning_rate': 5.282785530830525e-06, 'epoch': 0.67} 67%|██████▋ | 3849/5773 [2:07:44<2:56:52, 5.52s/it]9 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...11 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 67%|██████▋ | 3850/5773 [2:07:49<2:54:46, 5.45s/it]5 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 67%|██████▋ | 3850/5773 [2:07:51<2:54:47, 5.45s/it]13 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 
15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5586, 'learning_rate': 5.277838797560808e-06, 'epoch': 0.67} 67%|██████▋ | 3850/5773 [2:07:51<2:54:47, 5.45s/it] {'loss': 0.5586, 'learning_rate': 5.277838797560808e-06, 'epoch': 0.67} 67%|██████▋ | 3850/5773 [2:07:49<2:54:46, 5.45s/it] 67%|██████▋ | 3851/5773 [2:07:55<2:55:45, 5.49s/it] 67%|██████▋ | 3851/5773 [2:07:57<2:55:45, 5.49s/it] {'loss': 0.5736, 'learning_rate': 5.272893550978305e-06, 'epoch': 0.67} 67%|██████▋ | 3851/5773 [2:07:57<2:55:45, 5.49s/it] {'loss': 0.5736, 'learning_rate': 5.272893550978305e-06, 'epoch': 0.67} 67%|██████▋ | 3851/5773 [2:07:55<2:55:45, 5.49s/it] 67%|██████▋ | 3852/5773 [2:08:02<2:55:52, 5.49s/it] 67%|██████▋ | 3852/5773 [2:08:00<2:55:52, 5.49s/it] {'loss': 0.5541, 'learning_rate': 5.267949792639947e-06, 'epoch': 0.67} 67%|██████▋ | 3852/5773 [2:08:02<2:55:52, 5.49s/it] {'loss': 0.5541, 'learning_rate': 5.267949792639947e-06, 'epoch': 0.67} 67%|██████▋ | 3852/5773 [2:08:00<2:55:52, 5.49s/it] 67%|██████▋ | 3853/5773 [2:08:08<2:54:55, 5.47s/it] 67%|██████▋ | 3853/5773 [2:08:06<2:54:56, 5.47s/it] {'loss': 0.5672, 'learning_rate': 5.263007524102182e-06, 'epoch': 0.67} 67%|██████▋ | 3853/5773 [2:08:08<2:54:55, 5.47s/it] {'loss': 0.5672, 'learning_rate': 5.263007524102182e-06, 'epoch': 0.67} 67%|██████▋ | 3853/5773 [2:08:06<2:54:56, 5.47s/it] 67%|██████▋ | 3854/5773 [2:08:13<2:53:39, 5.43s/it] 67%|██████▋ | 3854/5773 [2:08:11<2:53:38, 5.43s/it] {'loss': 0.5575, 'learning_rate': 5.258066746920993e-06, 'epoch': 0.67} 67%|██████▋ | 3854/5773 [2:08:13<2:53:39, 5.43s/it] {'loss': 0.5575, 'learning_rate': 5.258066746920993e-06, 'epoch': 0.67} 67%|██████▋ | 3854/5773 [2:08:11<2:53:38, 5.43s/it] 67%|██████▋ | 3855/5773 [2:08:19<2:52:50, 5.41s/it] 67%|██████▋ | 3855/5773 [2:08:17<2:52:49, 5.41s/it] {'loss': 0.5739, 'learning_rate': 
5.253127462651895e-06, 'epoch': 0.67} 67%|██████▋ | 3855/5773 [2:08:19<2:52:50, 5.41s/it] {'loss': 0.5739, 'learning_rate': 5.253127462651895e-06, 'epoch': 0.67} 67%|██████▋ | 3855/5773 [2:08:17<2:52:49, 5.41s/it] 67%|██████▋ | 3856/5773 [2:08:24<2:54:05, 5.45s/it] 67%|██████▋ | 3856/5773 [2:08:22<2:54:05, 5.45s/it] {'loss': 0.562, 'learning_rate': 5.248189672849935e-06, 'epoch': 0.67} 67%|██████▋ | 3856/5773 [2:08:24<2:54:05, 5.45s/it] {'loss': 0.562, 'learning_rate': 5.248189672849935e-06, 'epoch': 0.67} 67%|██████▋ | 3856/5773 [2:08:22<2:54:05, 5.45s/it] 67%|██████▋ | 3857/5773 [2:08:30<2:54:14, 5.46s/it] 67%|██████▋ | 3857/5773 [2:08:28<2:54:14, 5.46s/it] {'loss': 0.5633, 'learning_rate': 5.243253379069685e-06, 'epoch': 0.67} 67%|██████▋ | 3857/5773 [2:08:30<2:54:14, 5.46s/it] {'loss': 0.5633, 'learning_rate': 5.243253379069685e-06, 'epoch': 0.67} 67%|██████▋ | 3857/5773 [2:08:28<2:54:14, 5.46s/it] 67%|██████▋ | 3858/5773 [2:08:33<2:54:22, 5.46s/it] 67%|██████▋ | 3858/5773 [2:08:35<2:54:23, 5.46s/it] {'loss': 0.5427, 'learning_rate': 5.238318582865249e-06, 'epoch': 0.67} 67%|██████▋ | 3858/5773 [2:08:35<2:54:23, 5.46s/it] {'loss': 0.5427, 'learning_rate': 5.238318582865249e-06, 'epoch': 0.67} 67%|██████▋ | 3858/5773 [2:08:33<2:54:22, 5.46s/it] 67%|██████▋ | 3859/5773 [2:08:39<2:54:32, 5.47s/it] 67%|██████▋ | 3859/5773 [2:08:41<2:54:32, 5.47s/it] {'loss': 0.589, 'learning_rate': 5.233385285790258e-06, 'epoch': 0.67} 67%|██████▋ | 3859/5773 [2:08:41<2:54:32, 5.47s/it] {'loss': 0.589, 'learning_rate': 5.233385285790258e-06, 'epoch': 0.67} 67%|██████▋ | 3859/5773 [2:08:39<2:54:32, 5.47s/it] 67%|██████▋ | 3860/5773 [2:08:44<2:53:32, 5.44s/it] 67%|██████▋ | 3860/5773 [2:08:46<2:53:32, 5.44s/it] {'loss': 0.5566, 'learning_rate': 5.228453489397871e-06, 'epoch': 0.67} {'loss': 0.5566, 'learning_rate': 5.228453489397871e-06, 'epoch': 0.67} 67%|██████▋ | 3860/5773 [2:08:46<2:53:32, 5.44s/it] 67%|██████▋ | 3860/5773 [2:08:44<2:53:32, 5.44s/it] 67%|██████▋ | 3861/5773 
[2:08:51<2:53:46, 5.45s/it] 67%|██████▋ | 3861/5773 [2:08:49<2:53:46, 5.45s/it] {'loss': 0.5489, 'learning_rate': 5.223523195240772e-06, 'epoch': 0.67} 67%|██████▋ | 3861/5773 [2:08:51<2:53:46, 5.45s/it] {'loss': 0.5489, 'learning_rate': 5.223523195240772e-06, 'epoch': 0.67} 67%|██████▋ | 3861/5773 [2:08:49<2:53:46, 5.45s/it] 67%|██████▋ | 3862/5773 [2:08:55<2:52:37, 5.42s/it] 67%|██████▋ | 3862/5773 [2:08:57<2:52:37, 5.42s/it] {'loss': 0.5575, 'learning_rate': 5.218594404871181e-06, 'epoch': 0.67} 67%|██████▋ | 3862/5773 [2:08:57<2:52:37, 5.42s/it] {'loss': 0.5575, 'learning_rate': 5.218594404871181e-06, 'epoch': 0.67} 67%|██████▋ | 3862/5773 [2:08:55<2:52:37, 5.42s/it] 67%|██████▋ | 3863/5773 [2:09:00<2:52:43, 5.43s/it] 67%|██████▋ | 3863/5773 [2:09:02<2:52:43, 5.43s/it] {'loss': 0.5561, 'learning_rate': 5.213667119840837e-06, 'epoch': 0.67} 67%|██████▋ | 3863/5773 [2:09:02<2:52:43, 5.43s/it] {'loss': 0.5561, 'learning_rate': 5.213667119840837e-06, 'epoch': 0.67} 67%|██████▋ | 3863/5773 [2:09:00<2:52:43, 5.43s/it] 67%|██████▋ | 3864/5773 [2:09:06<2:52:56, 5.44s/it] 67%|██████▋ | 3864/5773 [2:09:08<2:52:56, 5.44s/it] {'loss': 0.5494, 'learning_rate': 5.2087413417010025e-06, 'epoch': 0.67} 67%|██████▋ | 3864/5773 [2:09:08<2:52:56, 5.44s/it] {'loss': 0.5494, 'learning_rate': 5.2087413417010025e-06, 'epoch': 0.67} 67%|██████▋ | 3864/5773 [2:09:06<2:52:56, 5.44s/it] 67%|██████▋ | 3865/5773 [2:09:13<2:52:04, 5.41s/it] 67%|██████▋ | 3865/5773 [2:09:11<2:52:04, 5.41s/it] {'loss': 0.5638, 'learning_rate': 5.203817072002477e-06, 'epoch': 0.67} 67%|██████▋ | 3865/5773 [2:09:13<2:52:04, 5.41s/it] {'loss': 0.5638, 'learning_rate': 5.203817072002477e-06, 'epoch': 0.67} 67%|██████▋ | 3865/5773 [2:09:11<2:52:04, 5.41s/it] 67%|██████▋ | 3866/5773 [2:09:18<2:51:39, 5.40s/it] 67%|██████▋ | 3866/5773 [2:09:16<2:51:39, 5.40s/it] {'loss': 0.5444, 'learning_rate': 5.198894312295575e-06, 'epoch': 0.67} 67%|██████▋ | 3866/5773 [2:09:18<2:51:39, 5.40s/it] {'loss': 0.5444, 'learning_rate': 
5.198894312295575e-06, 'epoch': 0.67} 67%|██████▋ | 3866/5773 [2:09:16<2:51:39, 5.40s/it] 67%|██████▋ | 3867/5773 [2:09:22<2:53:06, 5.45s/it] 67%|██████▋ | 3867/5773 [2:09:24<2:53:06, 5.45s/it] {'loss': 0.5794, 'learning_rate': 5.193973064130136e-06, 'epoch': 0.67} 67%|██████▋ | 3867/5773 [2:09:24<2:53:06, 5.45s/it] {'loss': 0.5794, 'learning_rate': 5.193973064130136e-06, 'epoch': 0.67} 67%|██████▋ | 3867/5773 [2:09:22<2:53:06, 5.45s/it] 67%|██████▋ | 3868/5773 [2:09:29<2:52:56, 5.45s/it] 67%|██████▋ | 3868/5773 [2:09:27<2:52:56, 5.45s/it] {'loss': 0.5544, 'learning_rate': 5.189053329055524e-06, 'epoch': 0.67} 67%|██████▋ | 3868/5773 [2:09:29<2:52:56, 5.45s/it] {'loss': 0.5544, 'learning_rate': 5.189053329055524e-06, 'epoch': 0.67} 67%|██████▋ | 3868/5773 [2:09:27<2:52:56, 5.45s/it] 67%|██████▋ | 3869/5773 [2:09:35<2:52:36, 5.44s/it] 67%|██████▋ | 3869/5773 [2:09:33<2:52:36, 5.44s/it] {'loss': 0.5851, 'learning_rate': 5.184135108620638e-06, 'epoch': 0.67} 67%|██████▋ | 3869/5773 [2:09:35<2:52:36, 5.44s/it] {'loss': 0.5851, 'learning_rate': 5.184135108620638e-06, 'epoch': 0.67} 67%|██████▋ | 3869/5773 [2:09:33<2:52:36, 5.44s/it] 67%|██████▋ | 3870/5773 [2:09:40<2:54:06, 5.49s/it] 67%|██████▋ | 3870/5773 [2:09:38<2:54:06, 5.49s/it] {'loss': 0.5596, 'learning_rate': 5.179218404373886e-06, 'epoch': 0.67} 67%|██████▋ | 3870/5773 [2:09:40<2:54:06, 5.49s/it] {'loss': 0.5596, 'learning_rate': 5.179218404373886e-06, 'epoch': 0.67} 67%|██████▋ | 3870/5773 [2:09:38<2:54:06, 5.49s/it] 67%|██████▋ | 3871/5773 [2:09:46<2:53:23, 5.47s/it] 67%|██████▋ | 3871/5773 [2:09:44<2:53:23, 5.47s/it] {'loss': 0.5685, 'learning_rate': 5.174303217863199e-06, 'epoch': 0.67} 67%|██████▋ | 3871/5773 [2:09:46<2:53:23, 5.47s/it] {'loss': 0.5685, 'learning_rate': 5.174303217863199e-06, 'epoch': 0.67} 67%|██████▋ | 3871/5773 [2:09:44<2:53:23, 5.47s/it] 67%|██████▋ | 3872/5773 [2:09:51<2:53:44, 5.48s/it] 67%|██████▋ | 3872/5773 [2:09:49<2:53:44, 5.48s/it] {'loss': 0.548, 'learning_rate': 
5.169389550636046e-06, 'epoch': 0.67} 67%|██████▋ | 3872/5773 [2:09:51<2:53:44, 5.48s/it] {'loss': 0.548, 'learning_rate': 5.169389550636046e-06, 'epoch': 0.67} 67%|██████▋ | 3872/5773 [2:09:49<2:53:44, 5.48s/it] 67%|██████▋ | 3873/5773 [2:09:55<2:53:21, 5.47s/it] 67%|██████▋ | 3873/5773 [2:09:57<2:53:22, 5.47s/it] {'loss': 0.5603, 'learning_rate': 5.164477404239395e-06, 'epoch': 0.67} 67%|██████▋ | 3873/5773 [2:09:57<2:53:22, 5.47s/it] {'loss': 0.5603, 'learning_rate': 5.164477404239395e-06, 'epoch': 0.67} 67%|██████▋ | 3873/5773 [2:09:55<2:53:21, 5.47s/it] 67%|██████▋ | 3874/5773 [2:10:00<2:52:36, 5.45s/it] 67%|██████▋ | 3874/5773 [2:10:02<2:52:36, 5.45s/it] {'loss': 0.5513, 'learning_rate': 5.159566780219749e-06, 'epoch': 0.67} 67%|██████▋ | 3874/5773 [2:10:02<2:52:36, 5.45s/it] {'loss': 0.5513, 'learning_rate': 5.159566780219749e-06, 'epoch': 0.67} 67%|██████▋ | 3874/5773 [2:10:00<2:52:36, 5.45s/it] 67%|██████▋ | 3875/5773 [2:10:06<2:53:16, 5.48s/it] 67%|██████▋ | 3875/5773 [2:10:08<2:53:16, 5.48s/it] {'loss': 0.5686, 'learning_rate': 5.154657680123134e-06, 'epoch': 0.67} 67%|██████▋ | 3875/5773 [2:10:08<2:53:16, 5.48s/it] {'loss': 0.5686, 'learning_rate': 5.154657680123134e-06, 'epoch': 0.67} 67%|██████▋ | 3875/5773 [2:10:06<2:53:16, 5.48s/it] 67%|██████▋ | 3876/5773 [2:10:11<2:53:27, 5.49s/it] 67%|██████▋ | 3876/5773 [2:10:13<2:53:27, 5.49s/it] {'loss': 0.5738, 'learning_rate': 5.149750105495088e-06, 'epoch': 0.67} 67%|██████▋ | 3876/5773 [2:10:13<2:53:27, 5.49s/it] {'loss': 0.5738, 'learning_rate': 5.149750105495088e-06, 'epoch': 0.67} 67%|██████▋ | 3876/5773 [2:10:11<2:53:27, 5.49s/it] 67%|██████▋ | 3877/5773 [2:10:19<2:52:42, 5.47s/it] 67%|██████▋ | 3877/5773 [2:10:17<2:52:42, 5.47s/it] {'loss': 0.5865, 'learning_rate': 5.1448440578806715e-06, 'epoch': 0.67} 67%|██████▋ | 3877/5773 [2:10:19<2:52:42, 5.47s/it] {'loss': 0.5865, 'learning_rate': 5.1448440578806715e-06, 'epoch': 0.67} 67%|██████▋ | 3877/5773 [2:10:17<2:52:42, 5.47s/it] 67%|██████▋ | 3878/5773 
[2:10:22<2:52:31, 5.46s/it] 67%|██████▋ | 3878/5773 [2:10:24<2:52:31, 5.46s/it] {'loss': 0.5556, 'learning_rate': 5.13993953882447e-06, 'epoch': 0.67} 67%|██████▋ | 3878/5773 [2:10:24<2:52:31, 5.46s/it] {'loss': 0.5556, 'learning_rate': 5.13993953882447e-06, 'epoch': 0.67} 67%|██████▋ | 3878/5773 [2:10:22<2:52:31, 5.46s/it] 67%|██████▋ | 3879/5773 [2:10:30<2:53:57, 5.51s/it] 67%|██████▋ | 3879/5773 [2:10:28<2:53:57, 5.51s/it] {'loss': 0.5586, 'learning_rate': 5.135036549870578e-06, 'epoch': 0.67} 67%|██████▋ | 3879/5773 [2:10:30<2:53:57, 5.51s/it] {'loss': 0.5586, 'learning_rate': 5.135036549870578e-06, 'epoch': 0.67} 67%|██████▋ | 3879/5773 [2:10:28<2:53:57, 5.51s/it] 67%|██████▋ | 3880/5773 [2:10:35<2:56:06, 5.58s/it] 67%|██████▋ | 3880/5773 [2:10:33<2:56:06, 5.58s/it] {'loss': 0.5568, 'learning_rate': 5.130135092562616e-06, 'epoch': 0.67} 67%|██████▋ | 3880/5773 [2:10:35<2:56:06, 5.58s/it] {'loss': 0.5568, 'learning_rate': 5.130135092562616e-06, 'epoch': 0.67} 67%|██████▋ | 3880/5773 [2:10:33<2:56:06, 5.58s/it] 67%|██████▋ | 3881/5773 [2:10:39<2:55:30, 5.57s/it] 67%|██████▋ | 3881/5773 [2:10:41<2:55:31, 5.57s/it] {'loss': 0.5685, 'learning_rate': 5.125235168443714e-06, 'epoch': 0.67} 67%|██████▋ | 3881/5773 [2:10:41<2:55:31, 5.57s/it] {'loss': 0.5685, 'learning_rate': 5.125235168443714e-06, 'epoch': 0.67} 67%|██████▋ | 3881/5773 [2:10:39<2:55:30, 5.57s/it] 67%|██████▋ | 3882/5773 [2:10:44<2:52:57, 5.49s/it] 67%|██████▋ | 3882/5773 [2:10:46<2:52:57, 5.49s/it] {'loss': 0.5612, 'learning_rate': 5.1203367790565314e-06, 'epoch': 0.67} 67%|██████▋ | 3882/5773 [2:10:46<2:52:57, 5.49s/it] {'loss': 0.5612, 'learning_rate': 5.1203367790565314e-06, 'epoch': 0.67} 67%|██████▋ | 3882/5773 [2:10:44<2:52:57, 5.49s/it] 67%|██████▋ | 3883/5773 [2:10:50<2:51:56, 5.46s/it] 67%|██████▋ | 3883/5773 [2:10:52<2:51:56, 5.46s/it] {'loss': 0.5717, 'learning_rate': 5.115439925943237e-06, 'epoch': 0.67} 67%|██████▋ | 3883/5773 [2:10:52<2:51:56, 5.46s/it] {'loss': 0.5717, 'learning_rate': 
5.115439925943237e-06, 'epoch': 0.67} 67%|██████▋ | 3883/5773 [2:10:50<2:51:56, 5.46s/it] 67%|██████▋ | 3884/5773 [2:10:57<2:52:40, 5.48s/it] 67%|██████▋ | 3884/5773 [2:10:55<2:52:41, 5.48s/it] {'loss': 0.5486, 'learning_rate': 5.110544610645509e-06, 'epoch': 0.67} 67%|██████▋ | 3884/5773 [2:10:57<2:52:40, 5.48s/it] {'loss': 0.5486, 'learning_rate': 5.110544610645509e-06, 'epoch': 0.67} 67%|██████▋ | 3884/5773 [2:10:55<2:52:41, 5.48s/it] 67%|██████▋ | 3885/5773 [2:11:03<2:52:38, 5.49s/it] 67%|██████▋ | 3885/5773 [2:11:01<2:52:38, 5.49s/it] {'loss': 0.5623, 'learning_rate': 5.10565083470456e-06, 'epoch': 0.67} 67%|██████▋ | 3885/5773 [2:11:03<2:52:38, 5.49s/it] {'loss': 0.5623, 'learning_rate': 5.10565083470456e-06, 'epoch': 0.67} 67%|██████▋ | 3885/5773 [2:11:01<2:52:38, 5.49s/it] 67%|██████▋ | 3886/5773 [2:11:08<2:52:54, 5.50s/it] 67%|██████▋ | 3886/5773 [2:11:06<2:52:54, 5.50s/it] {'loss': 0.5689, 'learning_rate': 5.100758599661104e-06, 'epoch': 0.67} 67%|██████▋ | 3886/5773 [2:11:08<2:52:54, 5.50s/it] {'loss': 0.5689, 'learning_rate': 5.100758599661104e-06, 'epoch': 0.67} 67%|██████▋ | 3886/5773 [2:11:06<2:52:54, 5.50s/it] 67%|██████▋ | 3887/5773 [2:11:12<2:51:58, 5.47s/it] 67%|██████▋ | 3887/5773 [2:11:14<2:51:58, 5.47s/it] {'loss': 0.574, 'learning_rate': 5.095867907055363e-06, 'epoch': 0.67} 67%|██████▋ | 3887/5773 [2:11:14<2:51:58, 5.47s/it] {'loss': 0.574, 'learning_rate': 5.095867907055363e-06, 'epoch': 0.67} 67%|██████▋ | 3887/5773 [2:11:12<2:51:58, 5.47s/it] 67%|██████▋ | 3888/5773 [2:11:19<2:53:30, 5.52s/it] 67%|██████▋ | 3888/5773 [2:11:17<2:53:30, 5.52s/it] {'loss': 0.5639, 'learning_rate': 5.0909787584270955e-06, 'epoch': 0.67} 67%|██████▋ | 3888/5773 [2:11:19<2:53:30, 5.52s/it] {'loss': 0.5639, 'learning_rate': 5.0909787584270955e-06, 'epoch': 0.67} 67%|██████▋ | 3888/5773 [2:11:17<2:53:30, 5.52s/it] 67%|██████▋ | 3889/5773 [2:11:23<2:52:26, 5.49s/it] 67%|██████▋ | 3889/5773 [2:11:25<2:52:27, 5.49s/it] {'loss': 0.5826, 'learning_rate': 
5.086091155315557e-06, 'epoch': 0.67} 67%|██████▋ | 3889/5773 [2:11:25<2:52:27, 5.49s/it] {'loss': 0.5826, 'learning_rate': 5.086091155315557e-06, 'epoch': 0.67} 67%|██████▋ | 3889/5773 [2:11:23<2:52:26, 5.49s/it] 67%|██████▋ | 3890/5773 [2:11:30<2:52:30, 5.50s/it] 67%|██████▋ | 3890/5773 [2:11:28<2:52:31, 5.50s/it] {'loss': 0.5373, 'learning_rate': 5.081205099259516e-06, 'epoch': 0.67} 67%|██████▋ | 3890/5773 [2:11:30<2:52:30, 5.50s/it] {'loss': 0.5373, 'learning_rate': 5.081205099259516e-06, 'epoch': 0.67} 67%|██████▋ | 3890/5773 [2:11:28<2:52:31, 5.50s/it] 67%|██████▋ | 3891/5773 [2:11:34<2:53:09, 5.52s/it] {'loss': 0.566, 'learning_rate': 5.076320591797269e-06, 'epoch': 0.67} 67%|██████▋ | 3891/5773 [2:11:34<2:53:09, 5.52s/it] 67%|██████▋ | 3891/5773 [2:11:36<2:53:09, 5.52s/it] {'loss': 0.566, 'learning_rate': 5.076320591797269e-06, 'epoch': 0.67} 67%|██████▋ | 3891/5773 [2:11:36<2:53:09, 5.52s/it] 67%|██████▋ | 3892/5773 [2:11:39<2:53:39, 5.54s/it] 67%|██████▋ | 3892/5773 [2:11:41<2:53:39, 5.54s/it] {'loss': 0.5738, 'learning_rate': 5.0714376344666095e-06, 'epoch': 0.67} 67%|██████▋ | 3892/5773 [2:11:41<2:53:39, 5.54s/it] {'loss': 0.5738, 'learning_rate': 5.0714376344666095e-06, 'epoch': 0.67} 67%|██████▋ | 3892/5773 [2:11:39<2:53:39, 5.54s/it] 67%|██████▋ | 3893/5773 [2:11:47<2:52:41, 5.51s/it] 67%|██████▋ | 3893/5773 [2:11:45<2:52:42, 5.51s/it] {'loss': 0.5613, 'learning_rate': 5.066556228804851e-06, 'epoch': 0.67} 67%|██████▋ | 3893/5773 [2:11:47<2:52:41, 5.51s/it] {'loss': 0.5613, 'learning_rate': 5.066556228804851e-06, 'epoch': 0.67} 67%|██████▋ | 3893/5773 [2:11:45<2:52:42, 5.51s/it] 67%|██████▋ | 3894/5773 [2:11:50<2:52:36, 5.51s/it] 67%|██████▋ | 3894/5773 [2:11:52<2:52:37, 5.51s/it] {'loss': 0.5608, 'learning_rate': 5.061676376348808e-06, 'epoch': 0.67} 67%|██████▋ | 3894/5773 [2:11:52<2:52:37, 5.51s/it] {'loss': 0.5608, 'learning_rate': 5.061676376348808e-06, 'epoch': 0.67} 67%|██████▋ | 3894/5773 [2:11:50<2:52:36, 5.51s/it] 67%|██████▋ | 3895/5773 
[2:11:58<2:52:06, 5.50s/it] 67%|██████▋ | 3895/5773 [2:11:56<2:52:06, 5.50s/it] {'loss': 0.5607, 'learning_rate': 5.056798078634826e-06, 'epoch': 0.67} 67%|██████▋ | 3895/5773 [2:11:58<2:52:06, 5.50s/it] {'loss': 0.5607, 'learning_rate': 5.056798078634826e-06, 'epoch': 0.67} 67%|██████▋ | 3895/5773 [2:11:56<2:52:06, 5.50s/it] 67%|██████▋ | 3896/5773 [2:12:03<2:52:28, 5.51s/it] 67%|██████▋ | 3896/5773 [2:12:01<2:52:29, 5.51s/it] {'loss': 0.5497, 'learning_rate': 5.0519213371987415e-06, 'epoch': 0.67} 67%|██████▋ | 3896/5773 [2:12:03<2:52:28, 5.51s/it] {'loss': 0.5497, 'learning_rate': 5.0519213371987415e-06, 'epoch': 0.67} 67%|██████▋ | 3896/5773 [2:12:01<2:52:29, 5.51s/it] 68%|██████▊ | 3897/5773 [2:12:07<2:54:53, 5.59s/it] 68%|██████▊ | 3897/5773 [2:12:09<2:54:54, 5.59s/it] {'loss': 0.5649, 'learning_rate': 5.047046153575906e-06, 'epoch': 0.68} 68%|██████▊ | 3897/5773 [2:12:09<2:54:54, 5.59s/it] {'loss': 0.5649, 'learning_rate': 5.047046153575906e-06, 'epoch': 0.68} 68%|██████▊ | 3897/5773 [2:12:07<2:54:53, 5.59s/it] 68%|██████▊ | 3898/5773 [2:12:13<2:54:54, 5.60s/it] 68%|██████▊ | 3898/5773 [2:12:15<2:54:54, 5.60s/it] {'loss': 0.5615, 'learning_rate': 5.042172529301193e-06, 'epoch': 0.68} 68%|██████▊ | 3898/5773 [2:12:15<2:54:54, 5.60s/it] {'loss': 0.5615, 'learning_rate': 5.042172529301193e-06, 'epoch': 0.68} 68%|██████▊ | 3898/5773 [2:12:13<2:54:54, 5.60s/it] 68%|██████▊ | 3899/5773 [2:12:18<2:54:15, 5.58s/it] 68%|██████▊ | 3899/5773 [2:12:20<2:54:16, 5.58s/it] {'loss': 0.5714, 'learning_rate': 5.0373004659089675e-06, 'epoch': 0.68} 68%|██████▊ | 3899/5773 [2:12:20<2:54:16, 5.58s/it] {'loss': 0.5714, 'learning_rate': 5.0373004659089675e-06, 'epoch': 0.68} 68%|██████▊ | 3899/5773 [2:12:18<2:54:15, 5.58s/it]2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...4 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 
9 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 68%|██████▊ | 3900/5773 [2:12:24<2:54:19, 5.58s/it]6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 68%|██████▊ | 3900/5773 [2:12:26<2:54:19, 5.58s/it]7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5716, 'learning_rate': 5.032429964933112e-06, 'epoch': 0.68} 68%|██████▊ | 3900/5773 [2:12:26<2:54:19, 5.58s/it] {'loss': 0.5716, 'learning_rate': 5.032429964933112e-06, 'epoch': 0.68} 68%|██████▊ | 3900/5773 [2:12:24<2:54:19, 5.58s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-3900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 68%|██████▊ | 3901/5773 [2:12:46<5:12:12, 10.01s/it] 68%|██████▊ | 3901/5773 [2:12:44<5:12:12, 10.01s/it] {'loss': 0.5887, 'learning_rate': 5.027561027907017e-06, 'epoch': 0.68} 68%|██████▊ | 3901/5773 [2:12:46<5:12:12, 10.01s/it] {'loss': 0.5887, 'learning_rate': 5.027561027907017e-06, 'epoch': 0.68} 68%|██████▊ | 3901/5773 [2:12:44<5:12:12, 10.01s/it] 68%|██████▊ | 3902/5773 [2:12:52<4:30:22, 8.67s/it] 68%|██████▊ | 3902/5773 [2:12:50<4:30:22, 8.67s/it] {'loss': 0.56, 'learning_rate': 5.02269365636358e-06, 'epoch': 0.68} 68%|██████▊ | 3902/5773 [2:12:52<4:30:22, 8.67s/it] {'loss': 0.56, 'learning_rate': 5.02269365636358e-06, 'epoch': 0.68} 68%|██████▊ | 3902/5773 [2:12:50<4:30:22, 8.67s/it] 68%|██████▊ | 3903/5773 [2:12:55<4:01:57, 7.76s/it] 68%|██████▊ | 3903/5773 [2:12:57<4:01:57, 7.76s/it] {'loss': 0.5586, 'learning_rate': 5.017827851835199e-06, 'epoch': 0.68} 68%|██████▊ | 3903/5773 [2:12:57<4:01:57, 7.76s/it] {'loss': 0.5586, 'learning_rate': 5.017827851835199e-06, 'epoch': 0.68} 68%|██████▊ | 3903/5773 [2:12:55<4:01:57, 7.76s/it] 68%|██████▊ | 3904/5773 [2:13:03<3:40:36, 7.08s/it] 68%|██████▊ | 3904/5773 [2:13:01<3:40:36, 7.08s/it] {'loss': 0.5523, 'learning_rate': 5.012963615853793e-06, 'epoch': 0.68} 68%|██████▊ | 3904/5773 [2:13:03<3:40:36, 7.08s/it] {'loss': 0.5523, 'learning_rate': 5.012963615853793e-06, 'epoch': 0.68} 68%|██████▊ | 3904/5773 [2:13:01<3:40:36, 7.08s/it] 68%|██████▊ | 3905/5773 [2:13:08<3:25:45, 6.61s/it] 68%|██████▊ | 3905/5773 [2:13:06<3:25:45, 6.61s/it] {'loss': 0.5558, 'learning_rate': 5.008100949950776e-06, 'epoch': 0.68} 68%|██████▊ | 3905/5773 [2:13:08<3:25:45, 6.61s/it] {'loss': 0.5558, 'learning_rate': 5.008100949950776e-06, 'epoch': 0.68} 68%|██████▊ | 3905/5773 [2:13:06<3:25:45, 6.61s/it] 68%|██████▊ | 3906/5773 [2:13:14<3:16:18, 6.31s/it] 68%|██████▊ | 3906/5773 [2:13:12<3:16:18, 6.31s/it] {'loss': 0.5718, 'learning_rate': 5.00323985565707e-06, 'epoch': 0.68} 68%|██████▊ | 3906/5773 [2:13:14<3:16:18, 6.31s/it] 
{'loss': 0.5718, 'learning_rate': 5.00323985565707e-06, 'epoch': 0.68} 68%|██████▊ | 3906/5773 [2:13:12<3:16:18, 6.31s/it] 68%|██████▊ | 3907/5773 [2:13:18<3:08:37, 6.07s/it] 68%|██████▊ | 3907/5773 [2:13:20<3:08:38, 6.07s/it] {'loss': 0.5694, 'learning_rate': 4.998380334503099e-06, 'epoch': 0.68} 68%|██████▊ | 3907/5773 [2:13:20<3:08:38, 6.07s/it] {'loss': 0.5694, 'learning_rate': 4.998380334503099e-06, 'epoch': 0.68} 68%|██████▊ | 3907/5773 [2:13:18<3:08:37, 6.07s/it] 68%|██████▊ | 3908/5773 [2:13:23<3:03:41, 5.91s/it] 68%|██████▊ | 3908/5773 [2:13:25<3:03:41, 5.91s/it] {'loss': 0.5691, 'learning_rate': 4.993522388018803e-06, 'epoch': 0.68} 68%|██████▊ | 3908/5773 [2:13:25<3:03:41, 5.91s/it] {'loss': 0.5691, 'learning_rate': 4.993522388018803e-06, 'epoch': 0.68} 68%|██████▊ | 3908/5773 [2:13:23<3:03:41, 5.91s/it] 68%|██████▊ | 3909/5773 [2:13:31<3:00:08, 5.80s/it] 68%|██████▊ | 3909/5773 [2:13:29<3:00:08, 5.80s/it] {'loss': 0.5627, 'learning_rate': 4.988666017733615e-06, 'epoch': 0.68} 68%|██████▊ | 3909/5773 [2:13:31<3:00:08, 5.80s/it] {'loss': 0.5627, 'learning_rate': 4.988666017733615e-06, 'epoch': 0.68} 68%|██████▊ | 3909/5773 [2:13:29<3:00:08, 5.80s/it] 68%|██████▊ | 3910/5773 [2:13:36<2:55:41, 5.66s/it] 68%|██████▊ | 3910/5773 [2:13:34<2:55:41, 5.66s/it] {'loss': 0.5641, 'learning_rate': 4.9838112251764725e-06, 'epoch': 0.68} 68%|██████▊ | 3910/5773 [2:13:36<2:55:41, 5.66s/it] {'loss': 0.5641, 'learning_rate': 4.9838112251764725e-06, 'epoch': 0.68} 68%|██████▊ | 3910/5773 [2:13:34<2:55:41, 5.66s/it] 68%|██████▊ | 3911/5773 [2:13:39<2:54:01, 5.61s/it] 68%|██████▊ | 3911/5773 [2:13:41<2:54:01, 5.61s/it] {'loss': 0.5677, 'learning_rate': 4.978958011875825e-06, 'epoch': 0.68} 68%|██████▊ | 3911/5773 [2:13:41<2:54:01, 5.61s/it] {'loss': 0.5677, 'learning_rate': 4.978958011875825e-06, 'epoch': 0.68} 68%|██████▊ | 3911/5773 [2:13:39<2:54:01, 5.61s/it] 68%|██████▊ | 3912/5773 [2:13:47<2:53:17, 5.59s/it] 68%|██████▊ | 3912/5773 [2:13:45<2:53:17, 5.59s/it] {'loss': 
0.5632, 'learning_rate': 4.974106379359618e-06, 'epoch': 0.68} 68%|██████▊ | 3912/5773 [2:13:47<2:53:17, 5.59s/it] {'loss': 0.5632, 'learning_rate': 4.974106379359618e-06, 'epoch': 0.68} 68%|██████▊ | 3912/5773 [2:13:45<2:53:17, 5.59s/it] 68%|██████▊ | 3913/5773 [2:13:52<2:51:45, 5.54s/it] 68%|██████▊ | 3913/5773 [2:13:50<2:51:45, 5.54s/it] {'loss': 0.5625, 'learning_rate': 4.9692563291552945e-06, 'epoch': 0.68} 68%|██████▊ | 3913/5773 [2:13:52<2:51:45, 5.54s/it] {'loss': 0.5625, 'learning_rate': 4.9692563291552945e-06, 'epoch': 0.68} 68%|██████▊ | 3913/5773 [2:13:50<2:51:45, 5.54s/it] 68%|██████▊ | 3914/5773 [2:13:58<2:50:56, 5.52s/it] 68%|██████▊ | 3914/5773 [2:13:56<2:50:56, 5.52s/it] {'loss': 0.5476, 'learning_rate': 4.964407862789817e-06, 'epoch': 0.68} 68%|██████▊ | 3914/5773 [2:13:58<2:50:56, 5.52s/it] {'loss': 0.5476, 'learning_rate': 4.964407862789817e-06, 'epoch': 0.68} 68%|██████▊ | 3914/5773 [2:13:56<2:50:56, 5.52s/it] 68%|██████▊ | 3915/5773 [2:14:01<2:49:43, 5.48s/it] 68%|██████▊ | 3915/5773 [2:14:03<2:49:43, 5.48s/it] {'loss': 0.5715, 'learning_rate': 4.9595609817896276e-06, 'epoch': 0.68} 68%|██████▊ | 3915/5773 [2:14:03<2:49:43, 5.48s/it] {'loss': 0.5715, 'learning_rate': 4.9595609817896276e-06, 'epoch': 0.68} 68%|██████▊ | 3915/5773 [2:14:01<2:49:43, 5.48s/it] 68%|██████▊ | 3916/5773 [2:14:09<2:49:35, 5.48s/it] 68%|██████▊ | 3916/5773 [2:14:07<2:49:35, 5.48s/it] {'loss': 0.5553, 'learning_rate': 4.9547156876806765e-06, 'epoch': 0.68} 68%|██████▊ | 3916/5773 [2:14:09<2:49:35, 5.48s/it] {'loss': 0.5553, 'learning_rate': 4.9547156876806765e-06, 'epoch': 0.68} 68%|██████▊ | 3916/5773 [2:14:07<2:49:35, 5.48s/it] 68%|██████▊ | 3917/5773 [2:14:14<2:48:13, 5.44s/it] 68%|██████▊ | 3917/5773 [2:14:12<2:48:13, 5.44s/it] {'loss': 0.5612, 'learning_rate': 4.949871981988429e-06, 'epoch': 0.68} 68%|██████▊ | 3917/5773 [2:14:14<2:48:13, 5.44s/it] {'loss': 0.5612, 'learning_rate': 4.949871981988429e-06, 'epoch': 0.68} 68%|██████▊ | 3917/5773 [2:14:12<2:48:13, 
5.44s/it] 68%|██████▊ | 3918/5773 [2:14:19<2:47:16, 5.41s/it] 68%|██████▊ | 3918/5773 [2:14:17<2:47:16, 5.41s/it] {'loss': 0.5587, 'learning_rate': 4.9450298662378306e-06, 'epoch': 0.68} 68%|██████▊ | 3918/5773 [2:14:19<2:47:16, 5.41s/it] {'loss': 0.5587, 'learning_rate': 4.9450298662378306e-06, 'epoch': 0.68} 68%|██████▊ | 3918/5773 [2:14:17<2:47:16, 5.41s/it] 68%|██████▊ | 3919/5773 [2:14:25<2:46:48, 5.40s/it] 68%|██████▊ | 3919/5773 [2:14:23<2:46:47, 5.40s/it] {'loss': 0.5717, 'learning_rate': 4.9401893419533355e-06, 'epoch': 0.68} 68%|██████▊ | 3919/5773 [2:14:25<2:46:48, 5.40s/it] {'loss': 0.5717, 'learning_rate': 4.9401893419533355e-06, 'epoch': 0.68} 68%|██████▊ | 3919/5773 [2:14:23<2:46:47, 5.40s/it] 68%|██████▊ | 3920/5773 [2:14:28<2:47:01, 5.41s/it] 68%|██████▊ | 3920/5773 [2:14:30<2:47:01, 5.41s/it] {'loss': 0.5459, 'learning_rate': 4.935350410658891e-06, 'epoch': 0.68} 68%|██████▊ | 3920/5773 [2:14:30<2:47:01, 5.41s/it] {'loss': 0.5459, 'learning_rate': 4.935350410658891e-06, 'epoch': 0.68} 68%|██████▊ | 3920/5773 [2:14:28<2:47:01, 5.41s/it] 68%|██████▊ | 3921/5773 [2:14:34<2:47:40, 5.43s/it] 68%|██████▊ | 3921/5773 [2:14:36<2:47:40, 5.43s/it] {'loss': 0.5712, 'learning_rate': 4.930513073877956e-06, 'epoch': 0.68} 68%|██████▊ | 3921/5773 [2:14:36<2:47:40, 5.43s/it] {'loss': 0.5712, 'learning_rate': 4.930513073877956e-06, 'epoch': 0.68} 68%|██████▊ | 3921/5773 [2:14:34<2:47:40, 5.43s/it] 68%|██████▊ | 3922/5773 [2:14:39<2:47:30, 5.43s/it] 68%|██████▊ | 3922/5773 [2:14:41<2:47:30, 5.43s/it] {'loss': 0.5674, 'learning_rate': 4.925677333133475e-06, 'epoch': 0.68} 68%|██████▊ | 3922/5773 [2:14:41<2:47:30, 5.43s/it] {'loss': 0.5674, 'learning_rate': 4.925677333133475e-06, 'epoch': 0.68} 68%|██████▊ | 3922/5773 [2:14:39<2:47:30, 5.43s/it] 68%|██████▊ | 3923/5773 [2:14:45<2:49:41, 5.50s/it] 68%|██████▊ | 3923/5773 [2:14:47<2:49:41, 5.50s/it] {'loss': 0.5718, 'learning_rate': 4.920843189947888e-06, 'epoch': 0.68} 68%|██████▊ | 3923/5773 [2:14:47<2:49:41, 
5.50s/it] {'loss': 0.5718, 'learning_rate': 4.920843189947888e-06, 'epoch': 0.68} 68%|██████▊ | 3923/5773 [2:14:45<2:49:41, 5.50s/it] 68%|██████▊ | 3924/5773 [2:14:52<2:51:13, 5.56s/it] 68%|██████▊ | 3924/5773 [2:14:51<2:51:13, 5.56s/it] {'loss': 0.5691, 'learning_rate': 4.916010645843148e-06, 'epoch': 0.68} 68%|██████▊ | 3924/5773 [2:14:52<2:51:13, 5.56s/it] {'loss': 0.5691, 'learning_rate': 4.916010645843148e-06, 'epoch': 0.68} 68%|██████▊ | 3924/5773 [2:14:51<2:51:13, 5.56s/it] 68%|██████▊ | 3925/5773 [2:14:58<2:50:41, 5.54s/it] 68%|██████▊ | 3925/5773 [2:14:56<2:50:41, 5.54s/it] {'loss': 0.5747, 'learning_rate': 4.9111797023406884e-06, 'epoch': 0.68} 68%|██████▊ | 3925/5773 [2:14:58<2:50:41, 5.54s/it] {'loss': 0.5747, 'learning_rate': 4.9111797023406884e-06, 'epoch': 0.68} 68%|██████▊ | 3925/5773 [2:14:56<2:50:41, 5.54s/it] 68%|██████▊ | 3926/5773 [2:15:04<2:51:22, 5.57s/it] 68%|██████▊ | 3926/5773 [2:15:02<2:51:22, 5.57s/it] {'loss': 0.5614, 'learning_rate': 4.9063503609614405e-06, 'epoch': 0.68} 68%|██████▊ | 3926/5773 [2:15:04<2:51:22, 5.57s/it] {'loss': 0.5614, 'learning_rate': 4.9063503609614405e-06, 'epoch': 0.68} 68%|██████▊ | 3926/5773 [2:15:02<2:51:22, 5.57s/it] 68%|██████▊ | 3927/5773 [2:15:09<2:50:16, 5.53s/it] 68%|██████▊ | 3927/5773 [2:15:07<2:50:16, 5.53s/it] {'loss': 0.5663, 'learning_rate': 4.901522623225845e-06, 'epoch': 0.68} 68%|██████▊ | 3927/5773 [2:15:09<2:50:16, 5.53s/it] {'loss': 0.5663, 'learning_rate': 4.901522623225845e-06, 'epoch': 0.68} 68%|██████▊ | 3927/5773 [2:15:07<2:50:16, 5.53s/it] 68%|██████▊ | 3928/5773 [2:15:15<2:50:43, 5.55s/it] 68%|██████▊ | 3928/5773 [2:15:13<2:50:43, 5.55s/it] {'loss': 0.5661, 'learning_rate': 4.896696490653826e-06, 'epoch': 0.68} 68%|██████▊ | 3928/5773 [2:15:15<2:50:43, 5.55s/it] {'loss': 0.5661, 'learning_rate': 4.896696490653826e-06, 'epoch': 0.68} 68%|██████▊ | 3928/5773 [2:15:13<2:50:43, 5.55s/it] 68%|██████▊ | 3929/5773 [2:15:18<2:48:58, 5.50s/it] 68%|██████▊ | 3929/5773 [2:15:20<2:48:58, 
5.50s/it] {'loss': 0.5643, 'learning_rate': 4.891871964764794e-06, 'epoch': 0.68} 68%|██████▊ | 3929/5773 [2:15:20<2:48:58, 5.50s/it] {'loss': 0.5643, 'learning_rate': 4.891871964764794e-06, 'epoch': 0.68} 68%|██████▊ | 3929/5773 [2:15:18<2:48:58, 5.50s/it] 68%|██████▊ | 3930/5773 [2:15:24<2:48:40, 5.49s/it] 68%|██████▊ | 3930/5773 [2:15:26<2:48:40, 5.49s/it] {'loss': 0.5582, 'learning_rate': 4.887049047077674e-06, 'epoch': 0.68} 68%|██████▊ | 3930/5773 [2:15:26<2:48:40, 5.49s/it] {'loss': 0.5582, 'learning_rate': 4.887049047077674e-06, 'epoch': 0.68} 68%|██████▊ | 3930/5773 [2:15:24<2:48:40, 5.49s/it] 68%|██████▊ | 3931/5773 [2:15:31<2:47:25, 5.45s/it] 68%|██████▊ | 3931/5773 [2:15:29<2:47:26, 5.45s/it] {'loss': 0.5402, 'learning_rate': 4.882227739110873e-06, 'epoch': 0.68} 68%|██████▊ | 3931/5773 [2:15:31<2:47:25, 5.45s/it] {'loss': 0.5402, 'learning_rate': 4.882227739110873e-06, 'epoch': 0.68} 68%|██████▊ | 3931/5773 [2:15:29<2:47:26, 5.45s/it] 68%|██████▊ | 3932/5773 [2:15:36<2:48:22, 5.49s/it] 68%|██████▊ | 3932/5773 [2:15:34<2:48:22, 5.49s/it] {'loss': 0.5654, 'learning_rate': 4.8774080423822866e-06, 'epoch': 0.68} 68%|██████▊ | 3932/5773 [2:15:36<2:48:22, 5.49s/it] {'loss': 0.5654, 'learning_rate': 4.8774080423822866e-06, 'epoch': 0.68} 68%|██████▊ | 3932/5773 [2:15:34<2:48:22, 5.49s/it] 68%|██████▊ | 3933/5773 [2:15:42<2:48:45, 5.50s/it] 68%|██████▊ | 3933/5773 [2:15:40<2:48:45, 5.50s/it] {'loss': 0.5562, 'learning_rate': 4.8725899584093184e-06, 'epoch': 0.68} 68%|██████▊ | 3933/5773 [2:15:42<2:48:45, 5.50s/it] {'loss': 0.5562, 'learning_rate': 4.8725899584093184e-06, 'epoch': 0.68} 68%|██████▊ | 3933/5773 [2:15:40<2:48:45, 5.50s/it] 68%|██████▊ | 3934/5773 [2:15:46<2:48:35, 5.50s/it] 68%|██████▊ | 3934/5773 [2:15:47<2:48:35, 5.50s/it] {'loss': 0.5702, 'learning_rate': 4.867773488708851e-06, 'epoch': 0.68} 68%|██████▊ | 3934/5773 [2:15:47<2:48:35, 5.50s/it] {'loss': 0.5702, 'learning_rate': 4.867773488708851e-06, 'epoch': 0.68} 68%|██████▊ | 3934/5773 
[2:15:46<2:48:35, 5.50s/it] 68%|██████▊ | 3935/5773 [2:15:51<2:48:47, 5.51s/it] 68%|██████▊ | 3935/5773 [2:15:53<2:48:47, 5.51s/it] {'loss': 0.5923, 'learning_rate': 4.8629586347972625e-06, 'epoch': 0.68} 68%|██████▊ | 3935/5773 [2:15:53<2:48:47, 5.51s/it] {'loss': 0.5923, 'learning_rate': 4.8629586347972625e-06, 'epoch': 0.68} 68%|██████▊ | 3935/5773 [2:15:51<2:48:47, 5.51s/it] 68%|██████▊ | 3936/5773 [2:15:59<2:49:42, 5.54s/it] 68%|██████▊ | 3936/5773 [2:15:57<2:49:42, 5.54s/it] {'loss': 0.5345, 'learning_rate': 4.8581453981904205e-06, 'epoch': 0.68} 68%|██████▊ | 3936/5773 [2:15:59<2:49:42, 5.54s/it] {'loss': 0.5345, 'learning_rate': 4.8581453981904205e-06, 'epoch': 0.68} 68%|██████▊ | 3936/5773 [2:15:57<2:49:42, 5.54s/it] 68%|██████▊ | 3937/5773 [2:16:02<2:49:00, 5.52s/it] 68%|██████▊ | 3937/5773 [2:16:04<2:49:00, 5.52s/it] {'loss': 0.5623, 'learning_rate': 4.853333780403691e-06, 'epoch': 0.68} 68%|██████▊ | 3937/5773 [2:16:04<2:49:00, 5.52s/it] {'loss': 0.5623, 'learning_rate': 4.853333780403691e-06, 'epoch': 0.68} 68%|██████▊ | 3937/5773 [2:16:02<2:49:00, 5.52s/it] 68%|██████▊ | 3938/5773 [2:16:07<2:46:55, 5.46s/it] 68%|██████▊ | 3938/5773 [2:16:09<2:46:55, 5.46s/it] {'loss': 0.5585, 'learning_rate': 4.848523782951921e-06, 'epoch': 0.68} 68%|██████▊ | 3938/5773 [2:16:09<2:46:55, 5.46s/it] {'loss': 0.5585, 'learning_rate': 4.848523782951921e-06, 'epoch': 0.68} 68%|██████▊ | 3938/5773 [2:16:07<2:46:55, 5.46s/it] 68%|██████▊ | 3939/5773 [2:16:13<2:46:32, 5.45s/it] 68%|██████▊ | 3939/5773 [2:16:15<2:46:32, 5.45s/it] {'loss': 0.5648, 'learning_rate': 4.843715407349451e-06, 'epoch': 0.68} 68%|██████▊ | 3939/5773 [2:16:15<2:46:32, 5.45s/it] {'loss': 0.5648, 'learning_rate': 4.843715407349451e-06, 'epoch': 0.68} 68%|██████▊ | 3939/5773 [2:16:13<2:46:32, 5.45s/it] 68%|██████▊ | 3940/5773 [2:16:20<2:46:21, 5.45s/it] 68%|██████▊ | 3940/5773 [2:16:18<2:46:21, 5.45s/it] {'loss': 0.5568, 'learning_rate': 4.838908655110116e-06, 'epoch': 0.68} 68%|██████▊ | 3940/5773 
[2:16:20<2:46:21, 5.45s/it] {'loss': 0.5568, 'learning_rate': 4.838908655110116e-06, 'epoch': 0.68} 68%|██████▊ | 3940/5773 [2:16:18<2:46:21, 5.45s/it] 68%|██████▊ | 3941/5773 [2:16:26<2:45:09, 5.41s/it] 68%|██████▊ | 3941/5773 [2:16:24<2:45:09, 5.41s/it] {'loss': 0.5754, 'learning_rate': 4.834103527747233e-06, 'epoch': 0.68} 68%|██████▊ | 3941/5773 [2:16:26<2:45:09, 5.41s/it] {'loss': 0.5754, 'learning_rate': 4.834103527747233e-06, 'epoch': 0.68} 68%|██████▊ | 3941/5773 [2:16:24<2:45:09, 5.41s/it] 68%|██████▊ | 3942/5773 [2:16:31<2:45:55, 5.44s/it] 68%|██████▊ | 3942/5773 [2:16:29<2:45:55, 5.44s/it] {'loss': 0.5614, 'learning_rate': 4.829300026773608e-06, 'epoch': 0.68} 68%|██████▊ | 3942/5773 [2:16:31<2:45:55, 5.44s/it] {'loss': 0.5614, 'learning_rate': 4.829300026773608e-06, 'epoch': 0.68} 68%|██████▊ | 3942/5773 [2:16:29<2:45:55, 5.44s/it] 68%|██████▊ | 3943/5773 [2:16:36<2:45:14, 5.42s/it] 68%|██████▊ | 3943/5773 [2:16:35<2:45:14, 5.42s/it] {'loss': 0.5757, 'learning_rate': 4.824498153701537e-06, 'epoch': 0.68} 68%|██████▊ | 3943/5773 [2:16:36<2:45:14, 5.42s/it] {'loss': 0.5757, 'learning_rate': 4.824498153701537e-06, 'epoch': 0.68} 68%|██████▊ | 3943/5773 [2:16:35<2:45:14, 5.42s/it] 68%|██████▊ | 3944/5773 [2:16:42<2:46:42, 5.47s/it] 68%|██████▊ | 3944/5773 [2:16:40<2:46:42, 5.47s/it] {'loss': 0.5366, 'learning_rate': 4.819697910042803e-06, 'epoch': 0.68} 68%|██████▊ | 3944/5773 [2:16:42<2:46:42, 5.47s/it] {'loss': 0.5366, 'learning_rate': 4.819697910042803e-06, 'epoch': 0.68} 68%|██████▊ | 3944/5773 [2:16:40<2:46:42, 5.47s/it] 68%|██████▊ | 3945/5773 [2:16:48<2:47:59, 5.51s/it] 68%|██████▊ | 3945/5773 [2:16:46<2:47:59, 5.51s/it] {'loss': 0.5577, 'learning_rate': 4.814899297308673e-06, 'epoch': 0.68} 68%|██████▊ | 3945/5773 [2:16:48<2:47:59, 5.51s/it] {'loss': 0.5577, 'learning_rate': 4.814899297308673e-06, 'epoch': 0.68} 68%|██████▊ | 3945/5773 [2:16:46<2:47:59, 5.51s/it] 68%|██████▊ | 3946/5773 [2:16:51<2:47:01, 5.49s/it] 68%|██████▊ | 3946/5773 
[2:16:53<2:47:02, 5.49s/it] {'loss': 0.5751, 'learning_rate': 4.810102317009912e-06, 'epoch': 0.68} 68%|██████▊ | 3946/5773 [2:16:53<2:47:02, 5.49s/it] {'loss': 0.5751, 'learning_rate': 4.810102317009912e-06, 'epoch': 0.68} 68%|██████▊ | 3946/5773 [2:16:51<2:47:01, 5.49s/it] 68%|██████▊ | 3947/5773 [2:16:57<2:47:08, 5.49s/it] 68%|██████▊ | 3947/5773 [2:16:59<2:47:08, 5.49s/it] {'loss': 0.5696, 'learning_rate': 4.8053069706567555e-06, 'epoch': 0.68} 68%|██████▊ | 3947/5773 [2:16:59<2:47:08, 5.49s/it] {'loss': 0.5696, 'learning_rate': 4.8053069706567555e-06, 'epoch': 0.68} 68%|██████▊ | 3947/5773 [2:16:57<2:47:08, 5.49s/it] 68%|██████▊ | 3948/5773 [2:17:02<2:46:36, 5.48s/it] 68%|██████▊ | 3948/5773 [2:17:04<2:46:36, 5.48s/it] {'loss': 0.5502, 'learning_rate': 4.8005132597589355e-06, 'epoch': 0.68} 68%|██████▊ | 3948/5773 [2:17:04<2:46:36, 5.48s/it] {'loss': 0.5502, 'learning_rate': 4.8005132597589355e-06, 'epoch': 0.68} 68%|██████▊ | 3948/5773 [2:17:02<2:46:36, 5.48s/it] 68%|██████▊ | 3949/5773 [2:17:10<2:47:21, 5.50s/it] 68%|██████▊ | 3949/5773 [2:17:08<2:47:21, 5.51s/it] {'loss': 0.5771, 'learning_rate': 4.79572118582566e-06, 'epoch': 0.68} 68%|██████▊ | 3949/5773 [2:17:10<2:47:21, 5.50s/it] {'loss': 0.5771, 'learning_rate': 4.79572118582566e-06, 'epoch': 0.68} 68%|██████▊ | 3949/5773 [2:17:08<2:47:21, 5.51s/it]14 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 910 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 68%|██████▊ | 3950/5773 [2:17:15<2:46:48, 5.49s/it]0 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 68%|██████▊ | 3950/5773 [2:17:13<2:46:48, 5.49s/it]12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 
1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5682, 'learning_rate': 4.790930750365635e-06, 'epoch': 0.68} 68%|██████▊ | 3950/5773 [2:17:15<2:46:48, 5.49s/it] {'loss': 0.5682, 'learning_rate': 4.790930750365635e-06, 'epoch': 0.68} 68%|██████▊ | 3950/5773 [2:17:13<2:46:48, 5.49s/it] 68%|██████▊ | 3951/5773 [2:17:19<2:46:48, 5.49s/it] 68%|██████▊ | 3951/5773 [2:17:21<2:46:48, 5.49s/it] {'loss': 0.5772, 'learning_rate': 4.786141954887037e-06, 'epoch': 0.68} 68%|██████▊ | 3951/5773 [2:17:21<2:46:48, 5.49s/it] {'loss': 0.5772, 'learning_rate': 4.786141954887037e-06, 'epoch': 0.68} 68%|██████▊ | 3951/5773 [2:17:19<2:46:48, 5.49s/it] 68%|██████▊ | 3952/5773 [2:17:24<2:47:33, 5.52s/it] 68%|██████▊ | 3952/5773 [2:17:26<2:47:33, 5.52s/it] {'loss': 0.554, 'learning_rate': 4.78135480089753e-06, 'epoch': 0.68} 68%|██████▊ | 3952/5773 [2:17:26<2:47:33, 5.52s/it] {'loss': 0.554, 'learning_rate': 4.78135480089753e-06, 'epoch': 0.68} 68%|██████▊ | 3952/5773 [2:17:24<2:47:33, 5.52s/it] 68%|██████▊ | 3953/5773 [2:17:32<2:46:27, 5.49s/it] 68%|██████▊ | 3953/5773 [2:17:30<2:46:27, 5.49s/it] {'loss': 0.5624, 'learning_rate': 4.77656928990427e-06, 'epoch': 0.68} 68%|██████▊ | 3953/5773 [2:17:32<2:46:27, 5.49s/it] {'loss': 0.5624, 'learning_rate': 4.77656928990427e-06, 'epoch': 0.68} 68%|██████▊ | 3953/5773 [2:17:30<2:46:27, 5.49s/it] 68%|██████▊ | 3954/5773 [2:17:35<2:46:51, 5.50s/it] 68%|██████▊ | 3954/5773 [2:17:37<2:46:51, 5.50s/it] {'loss': 0.5609, 'learning_rate': 4.771785423413885e-06, 'epoch': 0.68} 68%|██████▊ | 3954/5773 [2:17:37<2:46:51, 5.50s/it] {'loss': 0.5609, 'learning_rate': 4.771785423413885e-06, 'epoch': 0.68} 68%|██████▊ | 3954/5773 [2:17:35<2:46:51, 5.50s/it] 69%|██████▊ | 3955/5773 [2:17:43<2:46:54, 5.51s/it] 69%|██████▊ | 3955/5773 [2:17:41<2:46:55, 5.51s/it] 
{'loss': 0.5544, 'learning_rate': 4.76700320293249e-06, 'epoch': 0.69} 69%|██████▊ | 3955/5773 [2:17:43<2:46:54, 5.51s/it] {'loss': 0.5544, 'learning_rate': 4.76700320293249e-06, 'epoch': 0.69} 69%|██████▊ | 3955/5773 [2:17:41<2:46:55, 5.51s/it] 69%|██████▊ | 3956/5773 [2:17:46<2:45:26, 5.46s/it] 69%|██████▊ | 3956/5773 [2:17:48<2:45:26, 5.46s/it] {'loss': 0.5734, 'learning_rate': 4.762222629965678e-06, 'epoch': 0.69} 69%|██████▊ | 3956/5773 [2:17:48<2:45:26, 5.46s/it] {'loss': 0.5734, 'learning_rate': 4.762222629965678e-06, 'epoch': 0.69} 69%|██████▊ | 3956/5773 [2:17:46<2:45:26, 5.46s/it] 69%|██████▊ | 3957/5773 [2:17:53<2:45:06, 5.46s/it] 69%|██████▊ | 3957/5773 [2:17:51<2:45:07, 5.46s/it] {'loss': 0.5718, 'learning_rate': 4.75744370601853e-06, 'epoch': 0.69} 69%|██████▊ | 3957/5773 [2:17:53<2:45:06, 5.46s/it] {'loss': 0.5718, 'learning_rate': 4.75744370601853e-06, 'epoch': 0.69} 69%|██████▊ | 3957/5773 [2:17:51<2:45:07, 5.46s/it] 69%|██████▊ | 3958/5773 [2:17:57<2:46:04, 5.49s/it] 69%|██████▊ | 3958/5773 [2:17:59<2:46:04, 5.49s/it] {'loss': 0.5479, 'learning_rate': 4.752666432595596e-06, 'epoch': 0.69} 69%|██████▊ | 3958/5773 [2:17:59<2:46:04, 5.49s/it] {'loss': 0.5479, 'learning_rate': 4.752666432595596e-06, 'epoch': 0.69} 69%|██████▊ | 3958/5773 [2:17:57<2:46:04, 5.49s/it] 69%|██████▊ | 3959/5773 [2:18:02<2:45:41, 5.48s/it] 69%|██████▊ | 3959/5773 [2:18:04<2:45:41, 5.48s/it] {'loss': 0.5594, 'learning_rate': 4.747890811200926e-06, 'epoch': 0.69} 69%|██████▊ | 3959/5773 [2:18:04<2:45:41, 5.48s/it] {'loss': 0.5594, 'learning_rate': 4.747890811200926e-06, 'epoch': 0.69} 69%|██████▊ | 3959/5773 [2:18:02<2:45:41, 5.48s/it] 69%|██████▊ | 3960/5773 [2:18:08<2:46:06, 5.50s/it] 69%|██████▊ | 3960/5773 [2:18:10<2:46:06, 5.50s/it] {'loss': 0.5681, 'learning_rate': 4.74311684333803e-06, 'epoch': 0.69} 69%|██████▊ | 3960/5773 [2:18:10<2:46:06, 5.50s/it] {'loss': 0.5681, 'learning_rate': 4.74311684333803e-06, 'epoch': 0.69} 69%|██████▊ | 3960/5773 [2:18:08<2:46:06, 
5.50s/it] 69%|██████▊ | 3961/5773 [2:18:13<2:45:27, 5.48s/it] 69%|██████▊ | 3961/5773 [2:18:15<2:45:28, 5.48s/it] {'loss': 0.5693, 'learning_rate': 4.738344530509911e-06, 'epoch': 0.69} 69%|██████▊ | 3961/5773 [2:18:15<2:45:28, 5.48s/it] {'loss': 0.5693, 'learning_rate': 4.738344530509911e-06, 'epoch': 0.69} 69%|██████▊ | 3961/5773 [2:18:13<2:45:27, 5.48s/it] 69%|██████▊ | 3962/5773 [2:18:19<2:47:08, 5.54s/it] 69%|██████▊ | 3962/5773 [2:18:21<2:47:08, 5.54s/it] {'loss': 0.5652, 'learning_rate': 4.7335738742190366e-06, 'epoch': 0.69} 69%|██████▊ | 3962/5773 [2:18:21<2:47:08, 5.54s/it] {'loss': 0.5652, 'learning_rate': 4.7335738742190366e-06, 'epoch': 0.69} 69%|██████▊ | 3962/5773 [2:18:19<2:47:08, 5.54s/it] 69%|██████▊ | 3963/5773 [2:18:25<2:47:28, 5.55s/it] 69%|██████▊ | 3963/5773 [2:18:27<2:47:28, 5.55s/it] {'loss': 0.5503, 'learning_rate': 4.728804875967372e-06, 'epoch': 0.69} 69%|██████▊ | 3963/5773 [2:18:27<2:47:28, 5.55s/it] {'loss': 0.5503, 'learning_rate': 4.728804875967372e-06, 'epoch': 0.69} 69%|██████▊ | 3963/5773 [2:18:25<2:47:28, 5.55s/it] 69%|██████▊ | 3964/5773 [2:18:30<2:45:28, 5.49s/it] 69%|██████▊ | 3964/5773 [2:18:32<2:45:28, 5.49s/it] {'loss': 0.5649, 'learning_rate': 4.724037537256346e-06, 'epoch': 0.69} 69%|██████▊ | 3964/5773 [2:18:32<2:45:28, 5.49s/it] {'loss': 0.5649, 'learning_rate': 4.724037537256346e-06, 'epoch': 0.69} 69%|██████▊ | 3964/5773 [2:18:30<2:45:28, 5.49s/it] 69%|██████▊ | 3965/5773 [2:18:36<2:45:16, 5.48s/it] 69%|██████▊ | 3965/5773 [2:18:38<2:45:16, 5.48s/it] {'loss': 0.5737, 'learning_rate': 4.719271859586865e-06, 'epoch': 0.69} 69%|██████▊ | 3965/5773 [2:18:38<2:45:16, 5.48s/it] {'loss': 0.5737, 'learning_rate': 4.719271859586865e-06, 'epoch': 0.69} 69%|██████▊ | 3965/5773 [2:18:36<2:45:16, 5.48s/it] 69%|██████▊ | 3966/5773 [2:18:41<2:44:33, 5.46s/it] 69%|██████▊ | 3966/5773 [2:18:43<2:44:33, 5.46s/it] {'loss': 0.5466, 'learning_rate': 4.714507844459325e-06, 'epoch': 0.69} 69%|██████▊ | 3966/5773 [2:18:43<2:44:33, 5.46s/it] 
{'loss': 0.5466, 'learning_rate': 4.714507844459325e-06, 'epoch': 0.69} 69%|██████▊ | 3966/5773 [2:18:41<2:44:33, 5.46s/it] 69%|██████▊ | 3967/5773 [2:18:46<2:44:50, 5.48s/it] 69%|██████▊ | 3967/5773 [2:18:48<2:44:50, 5.48s/it] {'loss': 0.5689, 'learning_rate': 4.709745493373585e-06, 'epoch': 0.69} 69%|██████▊ | 3967/5773 [2:18:48<2:44:50, 5.48s/it] {'loss': 0.5689, 'learning_rate': 4.709745493373585e-06, 'epoch': 0.69} 69%|██████▊ | 3967/5773 [2:18:46<2:44:50, 5.48s/it] 69%|██████▊ | 3968/5773 [2:18:52<2:44:48, 5.48s/it] 69%|██████▊ | 3968/5773 [2:18:54<2:44:48, 5.48s/it] {'loss': 0.5596, 'learning_rate': 4.704984807828987e-06, 'epoch': 0.69} 69%|██████▊ | 3968/5773 [2:18:54<2:44:48, 5.48s/it] {'loss': 0.5596, 'learning_rate': 4.704984807828987e-06, 'epoch': 0.69} 69%|██████▊ | 3968/5773 [2:18:52<2:44:48, 5.48s/it] 69%|██████▉ | 3969/5773 [2:18:58<2:45:42, 5.51s/it] 69%|██████▉ | 3969/5773 [2:18:59<2:45:42, 5.51s/it] {'loss': 0.5544, 'learning_rate': 4.700225789324343e-06, 'epoch': 0.69} 69%|██████▉ | 3969/5773 [2:18:59<2:45:42, 5.51s/it] {'loss': 0.5544, 'learning_rate': 4.700225789324343e-06, 'epoch': 0.69} 69%|██████▉ | 3969/5773 [2:18:58<2:45:42, 5.51s/it] 69%|██████▉ | 3970/5773 [2:19:05<2:44:53, 5.49s/it] 69%|██████▉ | 3970/5773 [2:19:03<2:44:53, 5.49s/it]{'loss': 0.5751, 'learning_rate': 4.695468439357954e-06, 'epoch': 0.69} {'loss': 0.5751, 'learning_rate': 4.695468439357954e-06, 'epoch': 0.69} 69%|██████▉ | 3970/5773 [2:19:05<2:44:53, 5.49s/it] 69%|██████▉ | 3970/5773 [2:19:03<2:44:53, 5.49s/it] 69%|██████▉ | 3971/5773 [2:19:09<2:46:43, 5.55s/it] 69%|██████▉ | 3971/5773 [2:19:11<2:46:43, 5.55s/it] {'loss': 0.5588, 'learning_rate': 4.690712759427571e-06, 'epoch': 0.69} 69%|██████▉ | 3971/5773 [2:19:11<2:46:43, 5.55s/it] {'loss': 0.5588, 'learning_rate': 4.690712759427571e-06, 'epoch': 0.69} 69%|██████▉ | 3971/5773 [2:19:09<2:46:43, 5.55s/it] 69%|██████▉ | 3972/5773 [2:19:14<2:46:50, 5.56s/it] 69%|██████▉ | 3972/5773 [2:19:16<2:46:51, 5.56s/it] {'loss': 
0.5655, 'learning_rate': 4.685958751030446e-06, 'epoch': 0.69} 69%|██████▉ | 3972/5773 [2:19:16<2:46:51, 5.56s/it] {'loss': 0.5655, 'learning_rate': 4.685958751030446e-06, 'epoch': 0.69} 69%|██████▉ | 3972/5773 [2:19:14<2:46:50, 5.56s/it] 69%|██████▉ | 3973/5773 [2:19:19<2:43:58, 5.47s/it] 69%|██████▉ | 3973/5773 [2:19:21<2:43:58, 5.47s/it] {'loss': 0.5685, 'learning_rate': 4.681206415663289e-06, 'epoch': 0.69} 69%|██████▉ | 3973/5773 [2:19:21<2:43:58, 5.47s/it] {'loss': 0.5685, 'learning_rate': 4.681206415663289e-06, 'epoch': 0.69} 69%|██████▉ | 3973/5773 [2:19:19<2:43:58, 5.47s/it] 69%|██████▉ | 3974/5773 [2:19:25<2:43:55, 5.47s/it] 69%|██████▉ | 3974/5773 [2:19:27<2:43:55, 5.47s/it] {'loss': 0.5743, 'learning_rate': 4.6764557548222854e-06, 'epoch': 0.69} 69%|██████▉ | 3974/5773 [2:19:27<2:43:55, 5.47s/it] {'loss': 0.5743, 'learning_rate': 4.6764557548222854e-06, 'epoch': 0.69} 69%|██████▉ | 3974/5773 [2:19:25<2:43:55, 5.47s/it] 69%|██████▉ | 3975/5773 [2:19:32<2:43:54, 5.47s/it] 69%|██████▉ | 3975/5773 [2:19:30<2:43:55, 5.47s/it] {'loss': 0.5673, 'learning_rate': 4.671706770003094e-06, 'epoch': 0.69} 69%|██████▉ | 3975/5773 [2:19:32<2:43:54, 5.47s/it] {'loss': 0.5673, 'learning_rate': 4.671706770003094e-06, 'epoch': 0.69} 69%|██████▉ | 3975/5773 [2:19:30<2:43:55, 5.47s/it] 69%|██████▉ | 3976/5773 [2:19:36<2:45:03, 5.51s/it] 69%|██████▉ | 3976/5773 [2:19:38<2:45:03, 5.51s/it] {'loss': 0.5638, 'learning_rate': 4.666959462700852e-06, 'epoch': 0.69} 69%|██████▉ | 3976/5773 [2:19:38<2:45:03, 5.51s/it] {'loss': 0.5638, 'learning_rate': 4.666959462700852e-06, 'epoch': 0.69} 69%|██████▉ | 3976/5773 [2:19:36<2:45:03, 5.51s/it] 69%|██████▉ | 3977/5773 [2:19:42<2:45:36, 5.53s/it] 69%|██████▉ | 3977/5773 [2:19:44<2:45:37, 5.53s/it] {'loss': 0.5409, 'learning_rate': 4.6622138344101605e-06, 'epoch': 0.69} 69%|██████▉ | 3977/5773 [2:19:44<2:45:37, 5.53s/it] {'loss': 0.5409, 'learning_rate': 4.6622138344101605e-06, 'epoch': 0.69} 69%|██████▉ | 3977/5773 [2:19:42<2:45:36, 
5.53s/it] 69%|██████▉ | 3978/5773 [2:19:47<2:43:35, 5.47s/it] 69%|██████▉ | 3978/5773 [2:19:49<2:43:35, 5.47s/it] {'loss': 0.5657, 'learning_rate': 4.657469886625093e-06, 'epoch': 0.69} 69%|██████▉ | 3978/5773 [2:19:49<2:43:35, 5.47s/it] {'loss': 0.5657, 'learning_rate': 4.657469886625093e-06, 'epoch': 0.69} 69%|██████▉ | 3978/5773 [2:19:47<2:43:35, 5.47s/it] 69%|██████▉ | 3979/5773 [2:19:52<2:44:17, 5.49s/it] 69%|██████▉ | 3979/5773 [2:19:54<2:44:17, 5.49s/it] {'loss': 0.568, 'learning_rate': 4.652727620839199e-06, 'epoch': 0.69} 69%|██████▉ | 3979/5773 [2:19:54<2:44:17, 5.49s/it] {'loss': 0.568, 'learning_rate': 4.652727620839199e-06, 'epoch': 0.69} 69%|██████▉ | 3979/5773 [2:19:52<2:44:17, 5.49s/it] 69%|██████▉ | 3980/5773 [2:19:58<2:43:29, 5.47s/it] 69%|██████▉ | 3980/5773 [2:20:00<2:43:28, 5.47s/it] {'loss': 0.574, 'learning_rate': 4.647987038545496e-06, 'epoch': 0.69} 69%|██████▉ | 3980/5773 [2:20:00<2:43:28, 5.47s/it] {'loss': 0.574, 'learning_rate': 4.647987038545496e-06, 'epoch': 0.69} 69%|██████▉ | 3980/5773 [2:19:58<2:43:29, 5.47s/it] 69%|██████▉ | 3981/5773 [2:20:03<2:42:51, 5.45s/it] 69%|██████▉ | 3981/5773 [2:20:05<2:42:51, 5.45s/it] {'loss': 0.571, 'learning_rate': 4.643248141236469e-06, 'epoch': 0.69} 69%|██████▉ | 3981/5773 [2:20:05<2:42:51, 5.45s/it] {'loss': 0.571, 'learning_rate': 4.643248141236469e-06, 'epoch': 0.69} 69%|██████▉ | 3981/5773 [2:20:03<2:42:51, 5.45s/it] 69%|██████▉ | 3982/5773 [2:20:09<2:41:25, 5.41s/it] 69%|██████▉ | 3982/5773 [2:20:11<2:41:25, 5.41s/it] {'loss': 0.5755, 'learning_rate': 4.638510930404069e-06, 'epoch': 0.69} 69%|██████▉ | 3982/5773 [2:20:11<2:41:25, 5.41s/it] {'loss': 0.5755, 'learning_rate': 4.638510930404069e-06, 'epoch': 0.69} 69%|██████▉ | 3982/5773 [2:20:09<2:41:25, 5.41s/it] 69%|██████▉ | 3983/5773 [2:20:14<2:42:09, 5.44s/it] 69%|██████▉ | 3983/5773 [2:20:16<2:42:09, 5.44s/it] {'loss': 0.5919, 'learning_rate': 4.633775407539731e-06, 'epoch': 0.69} 69%|██████▉ | 3983/5773 [2:20:16<2:42:09, 5.44s/it] 
{'loss': 0.5919, 'learning_rate': 4.633775407539731e-06, 'epoch': 0.69} 69%|██████▉ | 3983/5773 [2:20:14<2:42:09, 5.44s/it] 69%|██████▉ | 3984/5773 [2:20:22<2:44:17, 5.51s/it] 69%|██████▉ | 3984/5773 [2:20:20<2:44:17, 5.51s/it] {'loss': 0.5851, 'learning_rate': 4.629041574134346e-06, 'epoch': 0.69} 69%|██████▉ | 3984/5773 [2:20:22<2:44:17, 5.51s/it] {'loss': 0.5851, 'learning_rate': 4.629041574134346e-06, 'epoch': 0.69} 69%|██████▉ | 3984/5773 [2:20:20<2:44:17, 5.51s/it] 69%|██████▉ | 3985/5773 [2:20:25<2:43:01, 5.47s/it] 69%|██████▉ | 3985/5773 [2:20:27<2:43:02, 5.47s/it] {'loss': 0.5536, 'learning_rate': 4.624309431678273e-06, 'epoch': 0.69} 69%|██████▉ | 3985/5773 [2:20:27<2:43:02, 5.47s/it] {'loss': 0.5536, 'learning_rate': 4.624309431678273e-06, 'epoch': 0.69} 69%|██████▉ | 3985/5773 [2:20:25<2:43:01, 5.47s/it] 69%|██████▉ | 3986/5773 [2:20:31<2:43:05, 5.48s/it] 69%|██████▉ | 3986/5773 [2:20:33<2:43:05, 5.48s/it] {'loss': 0.5695, 'learning_rate': 4.619578981661341e-06, 'epoch': 0.69} 69%|██████▉ | 3986/5773 [2:20:33<2:43:05, 5.48s/it] {'loss': 0.5695, 'learning_rate': 4.619578981661341e-06, 'epoch': 0.69} 69%|██████▉ | 3986/5773 [2:20:31<2:43:05, 5.48s/it] 69%|██████▉ | 3987/5773 [2:20:36<2:44:11, 5.52s/it] 69%|██████▉ | 3987/5773 [2:20:38<2:44:11, 5.52s/it] {'loss': 0.561, 'learning_rate': 4.614850225572851e-06, 'epoch': 0.69} 69%|██████▉ | 3987/5773 [2:20:38<2:44:11, 5.52s/it] {'loss': 0.561, 'learning_rate': 4.614850225572851e-06, 'epoch': 0.69} 69%|██████▉ | 3987/5773 [2:20:36<2:44:11, 5.52s/it] 69%|██████▉ | 3988/5773 [2:20:44<2:44:42, 5.54s/it] 69%|██████▉ | 3988/5773 [2:20:42<2:44:43, 5.54s/it] {'loss': 0.5444, 'learning_rate': 4.6101231649015606e-06, 'epoch': 0.69} 69%|██████▉ | 3988/5773 [2:20:44<2:44:42, 5.54s/it]{'loss': 0.5444, 'learning_rate': 4.6101231649015606e-06, 'epoch': 0.69} 69%|██████▉ | 3988/5773 [2:20:42<2:44:43, 5.54s/it] 69%|██████▉ | 3989/5773 [2:20:47<2:43:36, 5.50s/it] 69%|██████▉ | 3989/5773 [2:20:49<2:43:36, 5.50s/it] {'loss': 
0.5669, 'learning_rate': 4.605397801135707e-06, 'epoch': 0.69} 69%|██████▉ | 3989/5773 [2:20:49<2:43:36, 5.50s/it] {'loss': 0.5669, 'learning_rate': 4.605397801135707e-06, 'epoch': 0.69} 69%|██████▉ | 3989/5773 [2:20:47<2:43:36, 5.50s/it] 69%|██████▉ | 3990/5773 [2:20:53<2:44:25, 5.53s/it] 69%|██████▉ | 3990/5773 [2:20:55<2:44:25, 5.53s/it] {'loss': 0.5739, 'learning_rate': 4.600674135762982e-06, 'epoch': 0.69} 69%|██████▉ | 3990/5773 [2:20:55<2:44:25, 5.53s/it] {'loss': 0.5739, 'learning_rate': 4.600674135762982e-06, 'epoch': 0.69} 69%|██████▉ | 3990/5773 [2:20:53<2:44:25, 5.53s/it] 69%|██████▉ | 3991/5773 [2:20:58<2:43:53, 5.52s/it] 69%|██████▉ | 3991/5773 [2:21:00<2:43:53, 5.52s/it] {'loss': 0.5691, 'learning_rate': 4.595952170270542e-06, 'epoch': 0.69} 69%|██████▉ | 3991/5773 [2:21:00<2:43:53, 5.52s/it] {'loss': 0.5691, 'learning_rate': 4.595952170270542e-06, 'epoch': 0.69} 69%|██████▉ | 3991/5773 [2:20:58<2:43:53, 5.52s/it] 69%|██████▉ | 3992/5773 [2:21:04<2:42:37, 5.48s/it] 69%|██████▉ | 3992/5773 [2:21:06<2:42:37, 5.48s/it] {'loss': 0.5565, 'learning_rate': 4.591231906145022e-06, 'epoch': 0.69} 69%|██████▉ | 3992/5773 [2:21:06<2:42:37, 5.48s/it] {'loss': 0.5565, 'learning_rate': 4.591231906145022e-06, 'epoch': 0.69} 69%|██████▉ | 3992/5773 [2:21:04<2:42:37, 5.48s/it] 69%|██████▉ | 3993/5773 [2:21:09<2:42:14, 5.47s/it] 69%|██████▉ | 3993/5773 [2:21:11<2:42:14, 5.47s/it] {'loss': 0.5775, 'learning_rate': 4.586513344872505e-06, 'epoch': 0.69} 69%|██████▉ | 3993/5773 [2:21:11<2:42:14, 5.47s/it] {'loss': 0.5775, 'learning_rate': 4.586513344872505e-06, 'epoch': 0.69} 69%|██████▉ | 3993/5773 [2:21:09<2:42:14, 5.47s/it] 69%|██████▉ | 3994/5773 [2:21:15<2:42:46, 5.49s/it] 69%|██████▉ | 3994/5773 [2:21:17<2:42:47, 5.49s/it] {'loss': 0.5714, 'learning_rate': 4.581796487938548e-06, 'epoch': 0.69} 69%|██████▉ | 3994/5773 [2:21:17<2:42:47, 5.49s/it] {'loss': 0.5714, 'learning_rate': 4.581796487938548e-06, 'epoch': 0.69} 69%|██████▉ | 3994/5773 [2:21:15<2:42:46, 5.49s/it] 
69%|██████▉ | 3995/5773 [2:21:22<2:42:21, 5.48s/it] 69%|██████▉ | 3995/5773 [2:21:20<2:42:21, 5.48s/it] {'loss': 0.5596, 'learning_rate': 4.577081336828163e-06, 'epoch': 0.69} 69%|██████▉ | 3995/5773 [2:21:22<2:42:21, 5.48s/it] {'loss': 0.5596, 'learning_rate': 4.577081336828163e-06, 'epoch': 0.69} 69%|██████▉ | 3995/5773 [2:21:20<2:42:21, 5.48s/it] 69%|██████▉ | 3996/5773 [2:21:26<2:42:11, 5.48s/it] 69%|██████▉ | 3996/5773 [2:21:28<2:42:12, 5.48s/it] {'loss': 0.5533, 'learning_rate': 4.572367893025839e-06, 'epoch': 0.69} 69%|██████▉ | 3996/5773 [2:21:28<2:42:12, 5.48s/it] {'loss': 0.5533, 'learning_rate': 4.572367893025839e-06, 'epoch': 0.69} 69%|██████▉ | 3996/5773 [2:21:26<2:42:11, 5.48s/it] 69%|██████▉ | 3997/5773 [2:21:31<2:41:50, 5.47s/it] 69%|██████▉ | 3997/5773 [2:21:33<2:41:50, 5.47s/it] {'loss': 0.5755, 'learning_rate': 4.5676561580155125e-06, 'epoch': 0.69} 69%|██████▉ | 3997/5773 [2:21:33<2:41:50, 5.47s/it] {'loss': 0.5755, 'learning_rate': 4.5676561580155125e-06, 'epoch': 0.69} 69%|██████▉ | 3997/5773 [2:21:31<2:41:50, 5.47s/it] 69%|██████▉ | 3998/5773 [2:21:37<2:43:53, 5.54s/it] 69%|██████▉ | 3998/5773 [2:21:39<2:43:53, 5.54s/it] {'loss': 0.5588, 'learning_rate': 4.562946133280589e-06, 'epoch': 0.69} 69%|██████▉ | 3998/5773 [2:21:39<2:43:53, 5.54s/it] {'loss': 0.5588, 'learning_rate': 4.562946133280589e-06, 'epoch': 0.69} 69%|██████▉ | 3998/5773 [2:21:37<2:43:53, 5.54s/it] 69%|██████▉ | 3999/5773 [2:21:42<2:43:21, 5.53s/it] 69%|██████▉ | 3999/5773 [2:21:44<2:43:21, 5.53s/it] {'loss': 0.5546, 'learning_rate': 4.558237820303937e-06, 'epoch': 0.69} 69%|██████▉ | 3999/5773 [2:21:44<2:43:21, 5.53s/it] {'loss': 0.5546, 'learning_rate': 4.558237820303937e-06, 'epoch': 0.69} 69%|██████▉ | 3999/5773 [2:21:42<2:43:21, 5.53s/it]14 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 
69%|██████▉ | 4000/5773 [2:21:48<2:43:37, 5.54s/it]2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 57 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 69%|██████▉ | 4000/5773 [2:21:50<2:43:37, 5.54s/it]15 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... {'loss': 0.5704, 'learning_rate': 4.553531220567881e-06, 'epoch': 0.69} 69%|██████▉ | 4000/5773 [2:21:48<2:43:37, 5.54s/it] {'loss': 0.5704, 'learning_rate': 4.553531220567881e-06, 'epoch': 0.69} 69%|██████▉ | 4000/5773 [2:21:50<2:43:37, 5.54s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 69%|██████▉ | 4001/5773 [2:22:07<4:39:34, 9.47s/it] 69%|██████▉ | 4001/5773 [2:22:08<4:39:34, 9.47s/it] {'loss': 0.5719, 'learning_rate': 4.5488263355542085e-06, 'epoch': 0.69} {'loss': 0.5719, 'learning_rate': 4.5488263355542085e-06, 'epoch': 0.69} 69%|██████▉ | 4001/5773 [2:22:08<4:39:34, 9.47s/it] 69%|██████▉ | 4001/5773 [2:22:07<4:39:34, 9.47s/it] 69%|██████▉ | 4002/5773 [2:22:12<4:05:17, 8.31s/it] 69%|██████▉ | 4002/5773 [2:22:14<4:05:17, 8.31s/it] {'loss': 0.5641, 'learning_rate': 4.5441231667441724e-06, 'epoch': 0.69} 69%|██████▉ | 4002/5773 [2:22:14<4:05:17, 8.31s/it] {'loss': 0.5641, 'learning_rate': 4.5441231667441724e-06, 'epoch': 0.69} 69%|██████▉ | 4002/5773 [2:22:12<4:05:17, 8.31s/it] 69%|██████▉ | 4003/5773 [2:22:18<3:42:17, 7.54s/it] 69%|██████▉ | 4003/5773 [2:22:20<3:42:17, 7.54s/it] {'loss': 0.5545, 'learning_rate': 4.539421715618476e-06, 'epoch': 0.69} 69%|██████▉ | 4003/5773 [2:22:20<3:42:17, 7.54s/it] {'loss': 0.5545, 'learning_rate': 4.539421715618476e-06, 'epoch': 0.69} 69%|██████▉ | 4003/5773 [2:22:18<3:42:17, 7.54s/it] 69%|██████▉ | 4004/5773 [2:22:23<3:23:49, 6.91s/it] 69%|██████▉ | 4004/5773 [2:22:25<3:23:49, 6.91s/it] {'loss': 0.5495, 'learning_rate': 4.5347219836572855e-06, 'epoch': 0.69} 69%|██████▉ | 4004/5773 [2:22:25<3:23:49, 6.91s/it] {'loss': 0.5495, 'learning_rate': 4.5347219836572855e-06, 'epoch': 0.69} 69%|██████▉ | 4004/5773 [2:22:23<3:23:49, 6.91s/it] 69%|██████▉ | 4005/5773 [2:22:29<3:12:07, 6.52s/it] 69%|██████▉ | 4005/5773 [2:22:31<3:12:07, 6.52s/it] {'loss': 0.568, 'learning_rate': 4.530023972340232e-06, 'epoch': 0.69} 69%|██████▉ | 4005/5773 [2:22:31<3:12:07, 6.52s/it] {'loss': 0.568, 'learning_rate': 4.530023972340232e-06, 'epoch': 0.69} 69%|██████▉ | 4005/5773 [2:22:29<3:12:07, 6.52s/it] 69%|██████▉ | 4006/5773 [2:22:34<3:01:26, 6.16s/it] 69%|██████▉ | 4006/5773 [2:22:36<3:01:26, 6.16s/it] {'loss': 0.5625, 'learning_rate': 4.525327683146396e-06, 'epoch': 0.69} 69%|██████▉ | 4006/5773 [2:22:36<3:01:26, 
6.16s/it] {'loss': 0.5625, 'learning_rate': 4.525327683146396e-06, 'epoch': 0.69} 69%|██████▉ | 4006/5773 [2:22:34<3:01:26, 6.16s/it] 69%|██████▉ | 4007/5773 [2:22:40<2:53:21, 5.89s/it] 69%|██████▉ | 4007/5773 [2:22:41<2:53:21, 5.89s/it] {'loss': 0.5511, 'learning_rate': 4.520633117554318e-06, 'epoch': 0.69} 69%|██████▉ | 4007/5773 [2:22:41<2:53:21, 5.89s/it] {'loss': 0.5511, 'learning_rate': 4.520633117554318e-06, 'epoch': 0.69} 69%|██████▉ | 4007/5773 [2:22:40<2:53:21, 5.89s/it] 69%|██████▉ | 4008/5773 [2:22:45<2:48:52, 5.74s/it] 69%|██████▉ | 4008/5773 [2:22:47<2:48:52, 5.74s/it] {'loss': 0.5616, 'learning_rate': 4.515940277042002e-06, 'epoch': 0.69} 69%|██████▉ | 4008/5773 [2:22:45<2:48:52, 5.74s/it]{'loss': 0.5616, 'learning_rate': 4.515940277042002e-06, 'epoch': 0.69} 69%|██████▉ | 4008/5773 [2:22:47<2:48:52, 5.74s/it] 69%|██████▉ | 4009/5773 [2:22:50<2:46:55, 5.68s/it] 69%|██████▉ | 4009/5773 [2:22:52<2:46:55, 5.68s/it] {'loss': 0.5615, 'learning_rate': 4.511249163086901e-06, 'epoch': 0.69} 69%|██████▉ | 4009/5773 [2:22:52<2:46:55, 5.68s/it] {'loss': 0.5615, 'learning_rate': 4.511249163086901e-06, 'epoch': 0.69} 69%|██████▉ | 4009/5773 [2:22:50<2:46:55, 5.68s/it] 69%|██████▉ | 4010/5773 [2:22:56<2:46:35, 5.67s/it] 69%|██████▉ | 4010/5773 [2:22:58<2:46:35, 5.67s/it] {'loss': 0.557, 'learning_rate': 4.506559777165929e-06, 'epoch': 0.69} 69%|██████▉ | 4010/5773 [2:22:58<2:46:35, 5.67s/it] {'loss': 0.557, 'learning_rate': 4.506559777165929e-06, 'epoch': 0.69} 69%|██████▉ | 4010/5773 [2:22:56<2:46:35, 5.67s/it] 69%|██████▉ | 4011/5773 [2:23:02<2:44:58, 5.62s/it] 69%|██████▉ | 4011/5773 [2:23:04<2:44:58, 5.62s/it] {'loss': 0.5618, 'learning_rate': 4.501872120755448e-06, 'epoch': 0.69} 69%|██████▉ | 4011/5773 [2:23:04<2:44:58, 5.62s/it] {'loss': 0.5618, 'learning_rate': 4.501872120755448e-06, 'epoch': 0.69} 69%|██████▉ | 4011/5773 [2:23:02<2:44:58, 5.62s/it] 69%|██████▉ | 4012/5773 [2:23:07<2:43:14, 5.56s/it] 69%|██████▉ | 4012/5773 [2:23:09<2:43:14, 5.56s/it] 
{'loss': 0.5675, 'learning_rate': 4.497186195331296e-06, 'epoch': 0.69} 69%|██████▉ | 4012/5773 [2:23:09<2:43:14, 5.56s/it] {'loss': 0.5675, 'learning_rate': 4.497186195331296e-06, 'epoch': 0.69} 69%|██████▉ | 4012/5773 [2:23:07<2:43:14, 5.56s/it] 70%|██████▉ | 4013/5773 [2:23:13<2:42:35, 5.54s/it] 70%|██████▉ | 4013/5773 [2:23:14<2:42:35, 5.54s/it] {'loss': 0.5585, 'learning_rate': 4.492502002368738e-06, 'epoch': 0.7} 70%|██████▉ | 4013/5773 [2:23:14<2:42:35, 5.54s/it] {'loss': 0.5585, 'learning_rate': 4.492502002368738e-06, 'epoch': 0.7} 70%|██████▉ | 4013/5773 [2:23:13<2:42:35, 5.54s/it] 70%|██████▉ | 4014/5773 [2:23:18<2:41:04, 5.49s/it] 70%|██████▉ | 4014/5773 [2:23:20<2:41:04, 5.49s/it] {'loss': 0.5547, 'learning_rate': 4.487819543342511e-06, 'epoch': 0.7} 70%|██████▉ | 4014/5773 [2:23:20<2:41:04, 5.49s/it] {'loss': 0.5547, 'learning_rate': 4.487819543342511e-06, 'epoch': 0.7} 70%|██████▉ | 4014/5773 [2:23:18<2:41:04, 5.49s/it] 70%|██████▉ | 4015/5773 [2:23:23<2:40:30, 5.48s/it] 70%|██████▉ | 4015/5773 [2:23:25<2:40:30, 5.48s/it] {'loss': 0.5415, 'learning_rate': 4.4831388197268074e-06, 'epoch': 0.7} 70%|██████▉ | 4015/5773 [2:23:25<2:40:30, 5.48s/it] {'loss': 0.5415, 'learning_rate': 4.4831388197268074e-06, 'epoch': 0.7} 70%|██████▉ | 4015/5773 [2:23:23<2:40:30, 5.48s/it] 70%|██████▉ | 4016/5773 [2:23:29<2:40:23, 5.48s/it] 70%|██████▉ | 4016/5773 [2:23:31<2:40:23, 5.48s/it] {'loss': 0.5689, 'learning_rate': 4.4784598329952675e-06, 'epoch': 0.7} 70%|██████▉ | 4016/5773 [2:23:31<2:40:23, 5.48s/it] {'loss': 0.5689, 'learning_rate': 4.4784598329952675e-06, 'epoch': 0.7} 70%|██████▉ | 4016/5773 [2:23:29<2:40:23, 5.48s/it] 70%|██████▉ | 4017/5773 [2:23:34<2:40:27, 5.48s/it] 70%|██████▉ | 4017/5773 [2:23:36<2:40:27, 5.48s/it] {'loss': 0.5577, 'learning_rate': 4.473782584620979e-06, 'epoch': 0.7} 70%|██████▉ | 4017/5773 [2:23:36<2:40:27, 5.48s/it] {'loss': 0.5577, 'learning_rate': 4.473782584620979e-06, 'epoch': 0.7} 70%|██████▉ | 4017/5773 [2:23:34<2:40:27, 
5.48s/it] 70%|██████▉ | 4018/5773 [2:23:40<2:39:22, 5.45s/it] 70%|██████▉ | 4018/5773 [2:23:42<2:39:22, 5.45s/it] {'loss': 0.5562, 'learning_rate': 4.469107076076499e-06, 'epoch': 0.7} 70%|██████▉ | 4018/5773 [2:23:42<2:39:22, 5.45s/it] {'loss': 0.5562, 'learning_rate': 4.469107076076499e-06, 'epoch': 0.7} 70%|██████▉ | 4018/5773 [2:23:40<2:39:22, 5.45s/it] 70%|██████▉ | 4019/5773 [2:23:45<2:40:23, 5.49s/it] 70%|██████▉ | 4019/5773 [2:23:47<2:40:23, 5.49s/it] {'loss': 0.5579, 'learning_rate': 4.464433308833821e-06, 'epoch': 0.7} 70%|██████▉ | 4019/5773 [2:23:47<2:40:23, 5.49s/it] {'loss': 0.5579, 'learning_rate': 4.464433308833821e-06, 'epoch': 0.7} 70%|██████▉ | 4019/5773 [2:23:45<2:40:23, 5.49s/it] 70%|██████▉ | 4020/5773 [2:23:51<2:41:14, 5.52s/it] 70%|██████▉ | 4020/5773 [2:23:53<2:41:14, 5.52s/it] {'loss': 0.5436, 'learning_rate': 4.459761284364394e-06, 'epoch': 0.7} 70%|██████▉ | 4020/5773 [2:23:53<2:41:14, 5.52s/it] {'loss': 0.5436, 'learning_rate': 4.459761284364394e-06, 'epoch': 0.7} 70%|██████▉ | 4020/5773 [2:23:51<2:41:14, 5.52s/it] 70%|██████▉ | 4021/5773 [2:23:56<2:39:54, 5.48s/it] 70%|██████▉ | 4021/5773 [2:23:58<2:39:54, 5.48s/it] {'loss': 0.5582, 'learning_rate': 4.455091004139129e-06, 'epoch': 0.7} 70%|██████▉ | 4021/5773 [2:23:58<2:39:54, 5.48s/it] {'loss': 0.5582, 'learning_rate': 4.455091004139129e-06, 'epoch': 0.7} 70%|██████▉ | 4021/5773 [2:23:56<2:39:54, 5.48s/it] 70%|██████▉ | 4022/5773 [2:24:02<2:39:52, 5.48s/it] 70%|██████▉ | 4022/5773 [2:24:04<2:39:52, 5.48s/it] {'loss': 0.5599, 'learning_rate': 4.450422469628374e-06, 'epoch': 0.7} 70%|██████▉ | 4022/5773 [2:24:04<2:39:52, 5.48s/it] {'loss': 0.5599, 'learning_rate': 4.450422469628374e-06, 'epoch': 0.7} 70%|██████▉ | 4022/5773 [2:24:02<2:39:52, 5.48s/it] 70%|██████▉ | 4023/5773 [2:24:07<2:39:28, 5.47s/it] 70%|██████▉ | 4023/5773 [2:24:09<2:39:28, 5.47s/it] {'loss': 0.5643, 'learning_rate': 4.445755682301933e-06, 'epoch': 0.7} 70%|██████▉ | 4023/5773 [2:24:09<2:39:28, 5.47s/it] {'loss': 
0.5643, 'learning_rate': 4.445755682301933e-06, 'epoch': 0.7} 70%|██████▉ | 4023/5773 [2:24:07<2:39:28, 5.47s/it] 70%|██████▉ | 4024/5773 [2:24:13<2:39:00, 5.45s/it] 70%|██████▉ | 4024/5773 [2:24:15<2:39:00, 5.45s/it] {'loss': 0.5724, 'learning_rate': 4.441090643629057e-06, 'epoch': 0.7} 70%|██████▉ | 4024/5773 [2:24:15<2:39:00, 5.45s/it] {'loss': 0.5724, 'learning_rate': 4.441090643629057e-06, 'epoch': 0.7} 70%|██████▉ | 4024/5773 [2:24:13<2:39:00, 5.45s/it] 70%|██████▉ | 4025/5773 [2:24:18<2:40:30, 5.51s/it] 70%|██████▉ | 4025/5773 [2:24:20<2:40:30, 5.51s/it] {'loss': 0.5685, 'learning_rate': 4.436427355078455e-06, 'epoch': 0.7} 70%|██████▉ | 4025/5773 [2:24:20<2:40:30, 5.51s/it] {'loss': 0.5685, 'learning_rate': 4.436427355078455e-06, 'epoch': 0.7} 70%|██████▉ | 4025/5773 [2:24:18<2:40:30, 5.51s/it] 70%|██████▉ | 4026/5773 [2:24:24<2:39:55, 5.49s/it] 70%|██████▉ | 4026/5773 [2:24:26<2:39:55, 5.49s/it] {'loss': 0.5422, 'learning_rate': 4.431765818118281e-06, 'epoch': 0.7} 70%|██████▉ | 4026/5773 [2:24:26<2:39:55, 5.49s/it] {'loss': 0.5422, 'learning_rate': 4.431765818118281e-06, 'epoch': 0.7} 70%|██████▉ | 4026/5773 [2:24:24<2:39:55, 5.49s/it] 70%|██████▉ | 4027/5773 [2:24:29<2:39:23, 5.48s/it] 70%|██████▉ | 4027/5773 [2:24:31<2:39:23, 5.48s/it] {'loss': 0.5577, 'learning_rate': 4.427106034216125e-06, 'epoch': 0.7} 70%|██████▉ | 4027/5773 [2:24:31<2:39:23, 5.48s/it] {'loss': 0.5577, 'learning_rate': 4.427106034216125e-06, 'epoch': 0.7} 70%|██████▉ | 4027/5773 [2:24:29<2:39:23, 5.48s/it] 70%|██████▉ | 4028/5773 [2:24:35<2:40:19, 5.51s/it] 70%|██████▉ | 4028/5773 [2:24:37<2:40:19, 5.51s/it] {'loss': 0.5535, 'learning_rate': 4.422448004839044e-06, 'epoch': 0.7} 70%|██████▉ | 4028/5773 [2:24:37<2:40:19, 5.51s/it] {'loss': 0.5535, 'learning_rate': 4.422448004839044e-06, 'epoch': 0.7} 70%|██████▉ | 4028/5773 [2:24:35<2:40:19, 5.51s/it] 70%|██████▉ | 4029/5773 [2:24:40<2:39:56, 5.50s/it] 70%|██████▉ | 4029/5773 [2:24:42<2:39:56, 5.50s/it] {'loss': 0.5692, 
'learning_rate': 4.417791731453534e-06, 'epoch': 0.7} 70%|██████▉ | 4029/5773 [2:24:42<2:39:56, 5.50s/it] {'loss': 0.5692, 'learning_rate': 4.417791731453534e-06, 'epoch': 0.7} 70%|██████▉ | 4029/5773 [2:24:40<2:39:56, 5.50s/it] 70%|██████▉ | 4030/5773 [2:24:46<2:38:35, 5.46s/it] 70%|██████▉ | 4030/5773 [2:24:47<2:38:35, 5.46s/it] {'loss': 0.5605, 'learning_rate': 4.413137215525532e-06, 'epoch': 0.7} 70%|██████▉ | 4030/5773 [2:24:47<2:38:35, 5.46s/it] {'loss': 0.5605, 'learning_rate': 4.413137215525532e-06, 'epoch': 0.7} 70%|██████▉ | 4030/5773 [2:24:46<2:38:35, 5.46s/it] 70%|██████▉ | 4031/5773 [2:24:51<2:38:35, 5.46s/it] 70%|██████▉ | 4031/5773 [2:24:53<2:38:35, 5.46s/it] {'loss': 0.5603, 'learning_rate': 4.408484458520438e-06, 'epoch': 0.7} 70%|██████▉ | 4031/5773 [2:24:51<2:38:35, 5.46s/it]{'loss': 0.5603, 'learning_rate': 4.408484458520438e-06, 'epoch': 0.7} 70%|██████▉ | 4031/5773 [2:24:53<2:38:35, 5.46s/it] 70%|██████▉ | 4032/5773 [2:24:56<2:38:42, 5.47s/it] 70%|██████▉ | 4032/5773 [2:24:58<2:38:42, 5.47s/it] {'loss': 0.5626, 'learning_rate': 4.403833461903084e-06, 'epoch': 0.7} 70%|██████▉ | 4032/5773 [2:24:58<2:38:42, 5.47s/it] {'loss': 0.5626, 'learning_rate': 4.403833461903084e-06, 'epoch': 0.7} 70%|██████▉ | 4032/5773 [2:24:56<2:38:42, 5.47s/it] 70%|██████▉ | 4033/5773 [2:25:02<2:38:08, 5.45s/it] 70%|██████▉ | 4033/5773 [2:25:04<2:38:08, 5.45s/it] {'loss': 0.5517, 'learning_rate': 4.399184227137749e-06, 'epoch': 0.7} 70%|██████▉ | 4033/5773 [2:25:04<2:38:08, 5.45s/it] {'loss': 0.5517, 'learning_rate': 4.399184227137749e-06, 'epoch': 0.7} 70%|██████▉ | 4033/5773 [2:25:02<2:38:08, 5.45s/it] 70%|██████▉ | 4034/5773 [2:25:08<2:39:20, 5.50s/it] 70%|██████▉ | 4034/5773 [2:25:09<2:39:21, 5.50s/it] {'loss': 0.554, 'learning_rate': 4.394536755688169e-06, 'epoch': 0.7} 70%|██████▉ | 4034/5773 [2:25:09<2:39:21, 5.50s/it] {'loss': 0.554, 'learning_rate': 4.394536755688169e-06, 'epoch': 0.7} 70%|██████▉ | 4034/5773 [2:25:08<2:39:20, 5.50s/it] 70%|██████▉ | 4035/5773 
[2:25:13<2:38:47, 5.48s/it] 70%|██████▉ | 4035/5773 [2:25:15<2:38:47, 5.48s/it] {'loss': 0.5673, 'learning_rate': 4.389891049017511e-06, 'epoch': 0.7} 70%|██████▉ | 4035/5773 [2:25:15<2:38:47, 5.48s/it] {'loss': 0.5673, 'learning_rate': 4.389891049017511e-06, 'epoch': 0.7} 70%|██████▉ | 4035/5773 [2:25:13<2:38:47, 5.48s/it] 70%|██████▉ | 4036/5773 [2:25:18<2:37:14, 5.43s/it] 70%|██████▉ | 4036/5773 [2:25:20<2:37:14, 5.43s/it] {'loss': 0.5589, 'learning_rate': 4.385247108588391e-06, 'epoch': 0.7} 70%|██████▉ | 4036/5773 [2:25:20<2:37:14, 5.43s/it] {'loss': 0.5589, 'learning_rate': 4.385247108588391e-06, 'epoch': 0.7} 70%|██████▉ | 4036/5773 [2:25:18<2:37:14, 5.43s/it] 70%|██████▉ | 4037/5773 [2:25:24<2:37:00, 5.43s/it] 70%|██████▉ | 4037/5773 [2:25:26<2:36:59, 5.43s/it] {'loss': 0.5504, 'learning_rate': 4.380604935862869e-06, 'epoch': 0.7} 70%|██████▉ | 4037/5773 [2:25:26<2:36:59, 5.43s/it] {'loss': 0.5504, 'learning_rate': 4.380604935862869e-06, 'epoch': 0.7} 70%|██████▉ | 4037/5773 [2:25:24<2:37:00, 5.43s/it] 70%|██████▉ | 4038/5773 [2:25:29<2:35:47, 5.39s/it] 70%|██████▉ | 4038/5773 [2:25:31<2:35:47, 5.39s/it] {'loss': 0.5586, 'learning_rate': 4.375964532302456e-06, 'epoch': 0.7} 70%|██████▉ | 4038/5773 [2:25:31<2:35:47, 5.39s/it] {'loss': 0.5586, 'learning_rate': 4.375964532302456e-06, 'epoch': 0.7} 70%|██████▉ | 4038/5773 [2:25:29<2:35:47, 5.39s/it] 70%|██████▉ | 4039/5773 [2:25:34<2:36:29, 5.42s/it] 70%|██████▉ | 4039/5773 [2:25:36<2:36:29, 5.42s/it] {'loss': 0.557, 'learning_rate': 4.3713258993680926e-06, 'epoch': 0.7} 70%|██████▉ | 4039/5773 [2:25:34<2:36:29, 5.42s/it]{'loss': 0.557, 'learning_rate': 4.3713258993680926e-06, 'epoch': 0.7} 70%|██████▉ | 4039/5773 [2:25:36<2:36:29, 5.42s/it] 70%|██████▉ | 4040/5773 [2:25:40<2:37:01, 5.44s/it] 70%|██████▉ | 4040/5773 [2:25:42<2:37:01, 5.44s/it] {'loss': 0.5655, 'learning_rate': 4.366689038520173e-06, 'epoch': 0.7} 70%|██████▉ | 4040/5773 [2:25:42<2:37:01, 5.44s/it] {'loss': 0.5655, 'learning_rate': 
4.366689038520173e-06, 'epoch': 0.7} 70%|██████▉ | 4040/5773 [2:25:40<2:37:01, 5.44s/it] 70%|██████▉ | 4041/5773 [2:25:46<2:38:25, 5.49s/it] 70%|██████▉ | 4041/5773 [2:25:48<2:38:25, 5.49s/it] {'loss': 0.5615, 'learning_rate': 4.3620539512185265e-06, 'epoch': 0.7} 70%|██████▉ | 4041/5773 [2:25:48<2:38:25, 5.49s/it] {'loss': 0.5615, 'learning_rate': 4.3620539512185265e-06, 'epoch': 0.7} 70%|██████▉ | 4041/5773 [2:25:46<2:38:25, 5.49s/it] 70%|███████ | 4042/5773 [2:25:51<2:38:25, 5.49s/it] 70%|███████ | 4042/5773 [2:25:53<2:38:25, 5.49s/it] {'loss': 0.5616, 'learning_rate': 4.357420638922427e-06, 'epoch': 0.7} {'loss': 0.5616, 'learning_rate': 4.357420638922427e-06, 'epoch': 0.7} 70%|███████ | 4042/5773 [2:25:53<2:38:25, 5.49s/it] 70%|███████ | 4042/5773 [2:25:51<2:38:25, 5.49s/it] 70%|███████ | 4043/5773 [2:25:56<2:37:48, 5.47s/it] 70%|███████ | 4043/5773 [2:25:58<2:37:48, 5.47s/it] {'loss': 0.5574, 'learning_rate': 4.352789103090587e-06, 'epoch': 0.7} 70%|███████ | 4043/5773 [2:25:58<2:37:48, 5.47s/it]{'loss': 0.5574, 'learning_rate': 4.352789103090587e-06, 'epoch': 0.7} 70%|███████ | 4043/5773 [2:25:56<2:37:48, 5.47s/it] 70%|███████ | 4044/5773 [2:26:02<2:38:38, 5.51s/it] 70%|███████ | 4044/5773 [2:26:04<2:38:38, 5.51s/it] {'loss': 0.5742, 'learning_rate': 4.348159345181168e-06, 'epoch': 0.7} 70%|███████ | 4044/5773 [2:26:04<2:38:38, 5.51s/it] {'loss': 0.5742, 'learning_rate': 4.348159345181168e-06, 'epoch': 0.7} 70%|███████ | 4044/5773 [2:26:02<2:38:38, 5.51s/it] 70%|███████ | 4045/5773 [2:26:08<2:38:15, 5.49s/it] 70%|███████ | 4045/5773 [2:26:09<2:38:15, 5.49s/it] {'loss': 0.5434, 'learning_rate': 4.343531366651761e-06, 'epoch': 0.7} 70%|███████ | 4045/5773 [2:26:08<2:38:15, 5.49s/it] {'loss': 0.5434, 'learning_rate': 4.343531366651761e-06, 'epoch': 0.7} 70%|███████ | 4045/5773 [2:26:09<2:38:15, 5.49s/it] 70%|███████ | 4046/5773 [2:26:13<2:38:28, 5.51s/it] 70%|███████ | 4046/5773 [2:26:15<2:38:28, 5.51s/it] {'loss': 0.5717, 'learning_rate': 4.3389051689594e-06, 
'epoch': 0.7} 70%|███████ | 4046/5773 [2:26:15<2:38:28, 5.51s/it] {'loss': 0.5717, 'learning_rate': 4.3389051689594e-06, 'epoch': 0.7} 70%|███████ | 4046/5773 [2:26:13<2:38:28, 5.51s/it] 70%|███████ | 4047/5773 [2:26:19<2:39:02, 5.53s/it] 70%|███████ | 4047/5773 [2:26:21<2:39:02, 5.53s/it] {'loss': 0.545, 'learning_rate': 4.334280753560566e-06, 'epoch': 0.7} 70%|███████ | 4047/5773 [2:26:21<2:39:02, 5.53s/it] {'loss': 0.545, 'learning_rate': 4.334280753560566e-06, 'epoch': 0.7} 70%|███████ | 4047/5773 [2:26:19<2:39:02, 5.53s/it] 70%|███████ | 4048/5773 [2:26:24<2:39:25, 5.55s/it] 70%|███████ | 4048/5773 [2:26:26<2:39:25, 5.55s/it] {'loss': 0.5629, 'learning_rate': 4.329658121911169e-06, 'epoch': 0.7} 70%|███████ | 4048/5773 [2:26:26<2:39:25, 5.55s/it] {'loss': 0.5629, 'learning_rate': 4.329658121911169e-06, 'epoch': 0.7} 70%|███████ | 4048/5773 [2:26:24<2:39:25, 5.55s/it] 70%|███████ | 4049/5773 [2:26:30<2:38:12, 5.51s/it] 70%|███████ | 4049/5773 [2:26:32<2:38:13, 5.51s/it] {'loss': 0.5716, 'learning_rate': 4.325037275466562e-06, 'epoch': 0.7} 70%|███████ | 4049/5773 [2:26:32<2:38:13, 5.51s/it] {'loss': 0.5716, 'learning_rate': 4.325037275466562e-06, 'epoch': 0.7} 70%|███████ | 4049/5773 [2:26:30<2:38:12, 5.51s/it]5 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 01 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 2AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 70%|███████ | 4050/5773 [2:26:35<2:37:06, 5.47s/it]AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 70%|███████ | 4050/5773 [2:26:37<2:37:05, 5.47s/it]15 AutoResumeHook: Checking whether to suspend... 
6 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... {'loss': 0.5686, 'learning_rate': 4.32041821568153e-06, 'epoch': 0.7} 70%|███████ | 4050/5773 [2:26:37<2:37:05, 5.47s/it] {'loss': 0.5686, 'learning_rate': 4.32041821568153e-06, 'epoch': 0.7} 70%|███████ | 4050/5773 [2:26:35<2:37:06, 5.47s/it] 70%|███████ | 4051/5773 [2:26:40<2:36:50, 5.47s/it] 70%|███████ | 4051/5773 [2:26:42<2:36:50, 5.46s/it] {'loss': 0.556, 'learning_rate': 4.315800944010309e-06, 'epoch': 0.7} 70%|███████ | 4051/5773 [2:26:42<2:36:50, 5.46s/it] {'loss': 0.556, 'learning_rate': 4.315800944010309e-06, 'epoch': 0.7} 70%|███████ | 4051/5773 [2:26:40<2:36:50, 5.47s/it] 70%|███████ | 4052/5773 [2:26:46<2:36:12, 5.45s/it] 70%|███████ | 4052/5773 [2:26:48<2:36:11, 5.45s/it] {'loss': 0.5608, 'learning_rate': 4.31118546190656e-06, 'epoch': 0.7} 70%|███████ | 4052/5773 [2:26:48<2:36:11, 5.45s/it] {'loss': 0.5608, 'learning_rate': 4.31118546190656e-06, 'epoch': 0.7} 70%|███████ | 4052/5773 [2:26:46<2:36:12, 5.45s/it] 70%|███████ | 4053/5773 [2:26:51<2:36:37, 5.46s/it] 70%|███████ | 4053/5773 [2:26:53<2:36:37, 5.46s/it] {'loss': 0.569, 'learning_rate': 4.30657177082338e-06, 'epoch': 0.7} 70%|███████ | 4053/5773 [2:26:53<2:36:37, 5.46s/it] {'loss': 0.569, 'learning_rate': 4.30657177082338e-06, 'epoch': 0.7} 70%|███████ | 4053/5773 [2:26:51<2:36:37, 5.46s/it] 70%|███████ | 4054/5773 [2:26:57<2:37:20, 5.49s/it] 70%|███████ | 4054/5773 [2:26:59<2:37:20, 5.49s/it] {'loss': 0.5664, 'learning_rate': 4.301959872213314e-06, 'epoch': 0.7} 70%|███████ | 4054/5773 [2:26:59<2:37:20, 5.49s/it] {'loss': 0.5664, 'learning_rate': 4.301959872213314e-06, 'epoch': 0.7} 70%|███████ | 4054/5773 [2:26:57<2:37:20, 5.49s/it] 70%|███████ | 4055/5773 [2:27:02<2:37:21, 5.50s/it] 70%|███████ | 4055/5773 [2:27:04<2:37:21, 5.50s/it] {'loss': 0.5528, 'learning_rate': 4.297349767528336e-06, 'epoch': 0.7} {'loss': 0.5528, 'learning_rate': 
4.297349767528336e-06, 'epoch': 0.7} 70%|███████ | 4055/5773 [2:27:04<2:37:21, 5.50s/it] 70%|███████ | 4055/5773 [2:27:02<2:37:21, 5.50s/it] 70%|███████ | 4056/5773 [2:27:08<2:36:51, 5.48s/it] 70%|███████ | 4056/5773 [2:27:10<2:36:51, 5.48s/it] {'loss': 0.5501, 'learning_rate': 4.292741458219841e-06, 'epoch': 0.7} 70%|███████ | 4056/5773 [2:27:10<2:36:51, 5.48s/it] {'loss': 0.5501, 'learning_rate': 4.292741458219841e-06, 'epoch': 0.7} 70%|███████ | 4056/5773 [2:27:08<2:36:51, 5.48s/it] 70%|███████ | 4057/5773 [2:27:13<2:36:39, 5.48s/it] 70%|███████ | 4057/5773 [2:27:15<2:36:39, 5.48s/it] {'loss': 0.5665, 'learning_rate': 4.288134945738684e-06, 'epoch': 0.7} 70%|███████ | 4057/5773 [2:27:15<2:36:39, 5.48s/it] {'loss': 0.5665, 'learning_rate': 4.288134945738684e-06, 'epoch': 0.7} 70%|███████ | 4057/5773 [2:27:13<2:36:39, 5.48s/it] 70%|███████ | 4058/5773 [2:27:19<2:36:34, 5.48s/it] 70%|███████ | 4058/5773 [2:27:21<2:36:34, 5.48s/it] {'loss': 0.5671, 'learning_rate': 4.283530231535141e-06, 'epoch': 0.7} 70%|███████ | 4058/5773 [2:27:21<2:36:34, 5.48s/it] {'loss': 0.5671, 'learning_rate': 4.283530231535141e-06, 'epoch': 0.7} 70%|███████ | 4058/5773 [2:27:19<2:36:34, 5.48s/it] 70%|███████ | 4059/5773 [2:27:24<2:37:14, 5.50s/it] 70%|███████ | 4059/5773 [2:27:26<2:37:14, 5.50s/it] {'loss': 0.5636, 'learning_rate': 4.278927317058916e-06, 'epoch': 0.7} 70%|███████ | 4059/5773 [2:27:26<2:37:14, 5.50s/it] {'loss': 0.5636, 'learning_rate': 4.278927317058916e-06, 'epoch': 0.7} 70%|███████ | 4059/5773 [2:27:24<2:37:14, 5.50s/it] 70%|███████ | 4060/5773 [2:27:30<2:38:12, 5.54s/it] 70%|███████ | 4060/5773 [2:27:32<2:38:12, 5.54s/it] {'loss': 0.5867, 'learning_rate': 4.274326203759163e-06, 'epoch': 0.7} 70%|███████ | 4060/5773 [2:27:32<2:38:12, 5.54s/it] {'loss': 0.5867, 'learning_rate': 4.274326203759163e-06, 'epoch': 0.7} 70%|███████ | 4060/5773 [2:27:30<2:38:12, 5.54s/it] 70%|███████ | 4061/5773 [2:27:36<2:38:04, 5.54s/it] 70%|███████ | 4061/5773 [2:27:38<2:38:04, 5.54s/it] 
{'loss': 0.5767, 'learning_rate': 4.2697268930844534e-06, 'epoch': 0.7} 70%|███████ | 4061/5773 [2:27:38<2:38:04, 5.54s/it] {'loss': 0.5767, 'learning_rate': 4.2697268930844534e-06, 'epoch': 0.7} 70%|███████ | 4061/5773 [2:27:36<2:38:04, 5.54s/it] 70%|███████ | 4062/5773 [2:27:41<2:37:03, 5.51s/it] 70%|███████ | 4062/5773 [2:27:43<2:37:03, 5.51s/it] {'loss': 0.555, 'learning_rate': 4.2651293864828e-06, 'epoch': 0.7} 70%|███████ | 4062/5773 [2:27:43<2:37:03, 5.51s/it] {'loss': 0.555, 'learning_rate': 4.2651293864828e-06, 'epoch': 0.7} 70%|███████ | 4062/5773 [2:27:41<2:37:03, 5.51s/it] 70%|███████ | 4063/5773 [2:27:46<2:36:30, 5.49s/it] 70%|███████ | 4063/5773 [2:27:48<2:36:29, 5.49s/it] {'loss': 0.5462, 'learning_rate': 4.2605336854016395e-06, 'epoch': 0.7} 70%|███████ | 4063/5773 [2:27:48<2:36:29, 5.49s/it] {'loss': 0.5462, 'learning_rate': 4.2605336854016395e-06, 'epoch': 0.7} 70%|███████ | 4063/5773 [2:27:46<2:36:30, 5.49s/it] 70%|███████ | 4064/5773 [2:27:52<2:36:05, 5.48s/it] 70%|███████ | 4064/5773 [2:27:54<2:36:05, 5.48s/it] {'loss': 0.5622, 'learning_rate': 4.255939791287854e-06, 'epoch': 0.7} 70%|███████ | 4064/5773 [2:27:54<2:36:05, 5.48s/it] {'loss': 0.5622, 'learning_rate': 4.255939791287854e-06, 'epoch': 0.7} 70%|███████ | 4064/5773 [2:27:52<2:36:05, 5.48s/it] 70%|███████ | 4065/5773 [2:27:57<2:36:25, 5.50s/it] 70%|███████ | 4065/5773 [2:27:59<2:36:25, 5.50s/it] {'loss': 0.564, 'learning_rate': 4.251347705587744e-06, 'epoch': 0.7} 70%|███████ | 4065/5773 [2:27:59<2:36:25, 5.50s/it] {'loss': 0.564, 'learning_rate': 4.251347705587744e-06, 'epoch': 0.7} 70%|███████ | 4065/5773 [2:27:57<2:36:25, 5.50s/it] 70%|███████ | 4066/5773 [2:28:03<2:35:17, 5.46s/it] 70%|███████ | 4066/5773 [2:28:05<2:35:17, 5.46s/it] {'loss': 0.5709, 'learning_rate': 4.246757429747039e-06, 'epoch': 0.7} 70%|███████ | 4066/5773 [2:28:05<2:35:17, 5.46s/it] {'loss': 0.5709, 'learning_rate': 4.246757429747039e-06, 'epoch': 0.7} 70%|███████ | 4066/5773 [2:28:03<2:35:17, 5.46s/it] 
70%|███████ | 4067/5773 [2:28:08<2:34:12, 5.42s/it] 70%|███████ | 4067/5773 [2:28:10<2:34:11, 5.42s/it] {'loss': 0.552, 'learning_rate': 4.242168965210915e-06, 'epoch': 0.7} 70%|███████ | 4067/5773 [2:28:10<2:34:11, 5.42s/it] {'loss': 0.552, 'learning_rate': 4.242168965210915e-06, 'epoch': 0.7} 70%|███████ | 4067/5773 [2:28:08<2:34:12, 5.42s/it] 70%|███████ | 4068/5773 [2:28:14<2:35:05, 5.46s/it] 70%|███████ | 4068/5773 [2:28:16<2:35:05, 5.46s/it] {'loss': 0.5774, 'learning_rate': 4.2375823134239624e-06, 'epoch': 0.7} 70%|███████ | 4068/5773 [2:28:16<2:35:05, 5.46s/it] {'loss': 0.5774, 'learning_rate': 4.2375823134239624e-06, 'epoch': 0.7} 70%|███████ | 4068/5773 [2:28:14<2:35:05, 5.46s/it] 70%|███████ | 4069/5773 [2:28:19<2:35:07, 5.46s/it] 70%|███████ | 4069/5773 [2:28:21<2:35:06, 5.46s/it] {'loss': 0.5618, 'learning_rate': 4.232997475830205e-06, 'epoch': 0.7} 70%|███████ | 4069/5773 [2:28:21<2:35:06, 5.46s/it] {'loss': 0.5618, 'learning_rate': 4.232997475830205e-06, 'epoch': 0.7} 70%|███████ | 4069/5773 [2:28:19<2:35:07, 5.46s/it] 71%|███████ | 4070/5773 [2:28:25<2:35:19, 5.47s/it] 71%|███████ | 4070/5773 [2:28:27<2:35:19, 5.47s/it] {'loss': 0.5772, 'learning_rate': 4.228414453873097e-06, 'epoch': 0.71} 71%|███████ | 4070/5773 [2:28:27<2:35:19, 5.47s/it] {'loss': 0.5772, 'learning_rate': 4.228414453873097e-06, 'epoch': 0.71} 71%|███████ | 4070/5773 [2:28:25<2:35:19, 5.47s/it] 71%|███████ | 4071/5773 [2:28:30<2:35:53, 5.50s/it] 71%|███████ | 4071/5773 [2:28:32<2:35:53, 5.50s/it] {'loss': 0.562, 'learning_rate': 4.223833248995519e-06, 'epoch': 0.71} 71%|███████ | 4071/5773 [2:28:32<2:35:53, 5.50s/it] {'loss': 0.562, 'learning_rate': 4.223833248995519e-06, 'epoch': 0.71} 71%|███████ | 4071/5773 [2:28:30<2:35:53, 5.50s/it] 71%|███████ | 4072/5773 [2:28:36<2:36:14, 5.51s/it] 71%|███████ | 4072/5773 [2:28:38<2:36:14, 5.51s/it] {'loss': 0.5711, 'learning_rate': 4.2192538626397785e-06, 'epoch': 0.71} 71%|███████ | 4072/5773 [2:28:38<2:36:14, 5.51s/it] {'loss': 0.5711, 
'learning_rate': 4.2192538626397785e-06, 'epoch': 0.71} 71%|███████ | 4072/5773 [2:28:36<2:36:14, 5.51s/it] 71%|███████ | 4073/5773 [2:28:41<2:35:41, 5.50s/it] 71%|███████ | 4073/5773 [2:28:43<2:35:42, 5.50s/it] {'loss': 0.5616, 'learning_rate': 4.214676296247619e-06, 'epoch': 0.71} 71%|███████ | 4073/5773 [2:28:43<2:35:42, 5.50s/it] {'loss': 0.5616, 'learning_rate': 4.214676296247619e-06, 'epoch': 0.71} 71%|███████ | 4073/5773 [2:28:41<2:35:41, 5.50s/it] 71%|███████ | 4074/5773 [2:28:47<2:35:14, 5.48s/it] 71%|███████ | 4074/5773 [2:28:49<2:35:14, 5.48s/it] {'loss': 0.5639, 'learning_rate': 4.2101005512602015e-06, 'epoch': 0.71} 71%|███████ | 4074/5773 [2:28:49<2:35:14, 5.48s/it] {'loss': 0.5639, 'learning_rate': 4.2101005512602015e-06, 'epoch': 0.71} 71%|███████ | 4074/5773 [2:28:47<2:35:14, 5.48s/it] 71%|███████ | 4075/5773 [2:28:52<2:36:08, 5.52s/it] 71%|███████ | 4075/5773 [2:28:54<2:36:08, 5.52s/it] {'loss': 0.5474, 'learning_rate': 4.205526629118116e-06, 'epoch': 0.71} 71%|███████ | 4075/5773 [2:28:54<2:36:08, 5.52s/it] {'loss': 0.5474, 'learning_rate': 4.205526629118116e-06, 'epoch': 0.71} 71%|███████ | 4075/5773 [2:28:52<2:36:08, 5.52s/it] 71%|███████ | 4076/5773 [2:28:58<2:34:58, 5.48s/it] 71%|███████ | 4076/5773 [2:29:00<2:34:58, 5.48s/it] {'loss': 0.5461, 'learning_rate': 4.200954531261378e-06, 'epoch': 0.71} 71%|███████ | 4076/5773 [2:29:00<2:34:58, 5.48s/it] {'loss': 0.5461, 'learning_rate': 4.200954531261378e-06, 'epoch': 0.71} 71%|███████ | 4076/5773 [2:28:58<2:34:58, 5.48s/it] 71%|███████ | 4077/5773 [2:29:03<2:35:13, 5.49s/it] 71%|███████ | 4077/5773 [2:29:05<2:35:13, 5.49s/it] {'loss': 0.5716, 'learning_rate': 4.196384259129433e-06, 'epoch': 0.71} 71%|███████ | 4077/5773 [2:29:05<2:35:13, 5.49s/it] {'loss': 0.5716, 'learning_rate': 4.196384259129433e-06, 'epoch': 0.71} 71%|███████ | 4077/5773 [2:29:03<2:35:13, 5.49s/it] 71%|███████ | 4078/5773 [2:29:08<2:33:36, 5.44s/it] 71%|███████ | 4078/5773 [2:29:10<2:33:36, 5.44s/it] {'loss': 0.5763, 
'learning_rate': 4.191815814161149e-06, 'epoch': 0.71} 71%|███████ | 4078/5773 [2:29:10<2:33:36, 5.44s/it] {'loss': 0.5763, 'learning_rate': 4.191815814161149e-06, 'epoch': 0.71} 71%|███████ | 4078/5773 [2:29:08<2:33:36, 5.44s/it] 71%|███████ | 4079/5773 [2:29:14<2:33:50, 5.45s/it] 71%|███████ | 4079/5773 [2:29:16<2:33:50, 5.45s/it] {'loss': 0.5802, 'learning_rate': 4.1872491977948125e-06, 'epoch': 0.71} 71%|███████ | 4079/5773 [2:29:16<2:33:50, 5.45s/it] {'loss': 0.5802, 'learning_rate': 4.1872491977948125e-06, 'epoch': 0.71} 71%|███████ | 4079/5773 [2:29:14<2:33:50, 5.45s/it] 71%|███████ | 4080/5773 [2:29:19<2:32:21, 5.40s/it] 71%|███████ | 4080/5773 [2:29:21<2:32:21, 5.40s/it] {'loss': 0.5654, 'learning_rate': 4.182684411468148e-06, 'epoch': 0.71} 71%|███████ | 4080/5773 [2:29:21<2:32:21, 5.40s/it] {'loss': 0.5654, 'learning_rate': 4.182684411468148e-06, 'epoch': 0.71} 71%|███████ | 4080/5773 [2:29:19<2:32:21, 5.40s/it] 71%|███████ | 4081/5773 [2:29:25<2:32:35, 5.41s/it] 71%|███████ | 4081/5773 [2:29:27<2:32:35, 5.41s/it] {'loss': 0.5424, 'learning_rate': 4.178121456618294e-06, 'epoch': 0.71} 71%|███████ | 4081/5773 [2:29:27<2:32:35, 5.41s/it] {'loss': 0.5424, 'learning_rate': 4.178121456618294e-06, 'epoch': 0.71} 71%|███████ | 4081/5773 [2:29:25<2:32:35, 5.41s/it] 71%|███████ | 4082/5773 [2:29:30<2:32:44, 5.42s/it] 71%|███████ | 4082/5773 [2:29:32<2:32:44, 5.42s/it] {'loss': 0.5623, 'learning_rate': 4.173560334681809e-06, 'epoch': 0.71} 71%|███████ | 4082/5773 [2:29:32<2:32:44, 5.42s/it] {'loss': 0.5623, 'learning_rate': 4.173560334681809e-06, 'epoch': 0.71} 71%|███████ | 4082/5773 [2:29:30<2:32:44, 5.42s/it] 71%|███████ | 4083/5773 [2:29:36<2:33:30, 5.45s/it] 71%|███████ | 4083/5773 [2:29:38<2:33:30, 5.45s/it] {'loss': 0.5633, 'learning_rate': 4.169001047094691e-06, 'epoch': 0.71} 71%|███████ | 4083/5773 [2:29:38<2:33:30, 5.45s/it] {'loss': 0.5633, 'learning_rate': 4.169001047094691e-06, 'epoch': 0.71} 71%|███████ | 4083/5773 [2:29:36<2:33:30, 5.45s/it] 
71%|███████ | 4084/5773 [2:29:41<2:33:15, 5.44s/it] 71%|███████ | 4084/5773 [2:29:43<2:33:15, 5.44s/it] {'loss': 0.5468, 'learning_rate': 4.164443595292339e-06, 'epoch': 0.71} 71%|███████ | 4084/5773 [2:29:43<2:33:15, 5.44s/it] {'loss': 0.5468, 'learning_rate': 4.164443595292339e-06, 'epoch': 0.71} 71%|███████ | 4084/5773 [2:29:41<2:33:15, 5.44s/it] 71%|███████ | 4085/5773 [2:29:47<2:34:20, 5.49s/it] 71%|███████ | 4085/5773 [2:29:49<2:34:20, 5.49s/it] {'loss': 0.5543, 'learning_rate': 4.159887980709584e-06, 'epoch': 0.71} 71%|███████ | 4085/5773 [2:29:49<2:34:20, 5.49s/it] {'loss': 0.5543, 'learning_rate': 4.159887980709584e-06, 'epoch': 0.71} 71%|███████ | 4085/5773 [2:29:47<2:34:20, 5.49s/it] 71%|███████ | 4086/5773 [2:29:52<2:34:48, 5.51s/it] 71%|███████ | 4086/5773 [2:29:54<2:34:48, 5.51s/it] {'loss': 0.5661, 'learning_rate': 4.155334204780688e-06, 'epoch': 0.71} 71%|███████ | 4086/5773 [2:29:54<2:34:48, 5.51s/it] {'loss': 0.5661, 'learning_rate': 4.155334204780688e-06, 'epoch': 0.71} 71%|███████ | 4086/5773 [2:29:52<2:34:48, 5.51s/it] 71%|███████ | 4087/5773 [2:30:00<2:34:36, 5.50s/it] 71%|███████ | 4087/5773 [2:29:58<2:34:36, 5.50s/it] {'loss': 0.5441, 'learning_rate': 4.150782268939319e-06, 'epoch': 0.71} 71%|███████ | 4087/5773 [2:30:00<2:34:36, 5.50s/it] {'loss': 0.5441, 'learning_rate': 4.150782268939319e-06, 'epoch': 0.71} 71%|███████ | 4087/5773 [2:29:58<2:34:36, 5.50s/it] 71%|███████ | 4088/5773 [2:30:03<2:35:24, 5.53s/it] 71%|███████ | 4088/5773 [2:30:05<2:35:24, 5.53s/it] {'loss': 0.5498, 'learning_rate': 4.146232174618573e-06, 'epoch': 0.71} 71%|███████ | 4088/5773 [2:30:05<2:35:24, 5.53s/it] {'loss': 0.5498, 'learning_rate': 4.146232174618573e-06, 'epoch': 0.71} 71%|███████ | 4088/5773 [2:30:03<2:35:24, 5.53s/it] 71%|███████ | 4089/5773 [2:30:09<2:34:38, 5.51s/it] 71%|███████ | 4089/5773 [2:30:11<2:34:37, 5.51s/it] {'loss': 0.5672, 'learning_rate': 4.1416839232509605e-06, 'epoch': 0.71} 71%|███████ | 4089/5773 [2:30:11<2:34:37, 5.51s/it] {'loss': 
0.5672, 'learning_rate': 4.1416839232509605e-06, 'epoch': 0.71} 71%|███████ | 4089/5773 [2:30:09<2:34:38, 5.51s/it] 71%|███████ | 4090/5773 [2:30:14<2:34:59, 5.53s/it] 71%|███████ | 4090/5773 [2:30:16<2:34:59, 5.53s/it] {'loss': 0.554, 'learning_rate': 4.1371375162684255e-06, 'epoch': 0.71} 71%|███████ | 4090/5773 [2:30:16<2:34:59, 5.53s/it] {'loss': 0.554, 'learning_rate': 4.1371375162684255e-06, 'epoch': 0.71} 71%|███████ | 4090/5773 [2:30:14<2:34:59, 5.53s/it] 71%|███████ | 4091/5773 [2:30:20<2:34:17, 5.50s/it] 71%|███████ | 4091/5773 [2:30:22<2:34:17, 5.50s/it] {'loss': 0.5619, 'learning_rate': 4.132592955102318e-06, 'epoch': 0.71} 71%|███████ | 4091/5773 [2:30:22<2:34:17, 5.50s/it] {'loss': 0.5619, 'learning_rate': 4.132592955102318e-06, 'epoch': 0.71} 71%|███████ | 4091/5773 [2:30:20<2:34:17, 5.50s/it] 71%|███████ | 4092/5773 [2:30:25<2:33:43, 5.49s/it] 71%|███████ | 4092/5773 [2:30:27<2:33:43, 5.49s/it] {'loss': 0.5632, 'learning_rate': 4.128050241183407e-06, 'epoch': 0.71} 71%|███████ | 4092/5773 [2:30:27<2:33:43, 5.49s/it] {'loss': 0.5632, 'learning_rate': 4.128050241183407e-06, 'epoch': 0.71} 71%|███████ | 4092/5773 [2:30:25<2:33:43, 5.49s/it] 71%|███████ | 4093/5773 [2:30:31<2:33:12, 5.47s/it] 71%|███████ | 4093/5773 [2:30:33<2:33:12, 5.47s/it] {'loss': 0.5554, 'learning_rate': 4.123509375941891e-06, 'epoch': 0.71} 71%|███████ | 4093/5773 [2:30:33<2:33:12, 5.47s/it] {'loss': 0.5554, 'learning_rate': 4.123509375941891e-06, 'epoch': 0.71} 71%|███████ | 4093/5773 [2:30:31<2:33:12, 5.47s/it] 71%|███████ | 4094/5773 [2:30:36<2:32:14, 5.44s/it] 71%|███████ | 4094/5773 [2:30:38<2:32:14, 5.44s/it] {'loss': 0.5676, 'learning_rate': 4.118970360807375e-06, 'epoch': 0.71} 71%|███████ | 4094/5773 [2:30:38<2:32:14, 5.44s/it] {'loss': 0.5676, 'learning_rate': 4.118970360807375e-06, 'epoch': 0.71} 71%|███████ | 4094/5773 [2:30:36<2:32:14, 5.44s/it] 71%|███████ | 4095/5773 [2:30:41<2:32:10, 5.44s/it] 71%|███████ | 4095/5773 [2:30:43<2:32:10, 5.44s/it] {'loss': 0.5648, 
'learning_rate': 4.114433197208885e-06, 'epoch': 0.71} 71%|███████ | 4095/5773 [2:30:43<2:32:10, 5.44s/it] {'loss': 0.5648, 'learning_rate': 4.114433197208885e-06, 'epoch': 0.71} 71%|███████ | 4095/5773 [2:30:41<2:32:10, 5.44s/it] 71%|███████ | 4096/5773 [2:30:47<2:31:49, 5.43s/it] 71%|███████ | 4096/5773 [2:30:49<2:31:49, 5.43s/it] {'loss': 0.5604, 'learning_rate': 4.10989788657487e-06, 'epoch': 0.71} 71%|███████ | 4096/5773 [2:30:49<2:31:49, 5.43s/it] {'loss': 0.5604, 'learning_rate': 4.10989788657487e-06, 'epoch': 0.71} 71%|███████ | 4096/5773 [2:30:47<2:31:49, 5.43s/it] 71%|███████ | 4097/5773 [2:30:52<2:31:37, 5.43s/it] 71%|███████ | 4097/5773 [2:30:54<2:31:37, 5.43s/it] {'loss': 0.5539, 'learning_rate': 4.105364430333191e-06, 'epoch': 0.71} 71%|███████ | 4097/5773 [2:30:54<2:31:37, 5.43s/it] {'loss': 0.5539, 'learning_rate': 4.105364430333191e-06, 'epoch': 0.71} 71%|███████ | 4097/5773 [2:30:52<2:31:37, 5.43s/it] 71%|███████ | 4098/5773 [2:30:58<2:33:35, 5.50s/it] 71%|███████ | 4098/5773 [2:31:00<2:33:35, 5.50s/it] {'loss': 0.5543, 'learning_rate': 4.1008328299111145e-06, 'epoch': 0.71} 71%|███████ | 4098/5773 [2:30:58<2:33:35, 5.50s/it]{'loss': 0.5543, 'learning_rate': 4.1008328299111145e-06, 'epoch': 0.71} 71%|███████ | 4098/5773 [2:31:00<2:33:35, 5.50s/it] 71%|███████ | 4099/5773 [2:31:03<2:32:14, 5.46s/it] 71%|███████ | 4099/5773 [2:31:05<2:32:14, 5.46s/it] {'loss': 0.561, 'learning_rate': 4.0963030867353445e-06, 'epoch': 0.71} 71%|███████ | 4099/5773 [2:31:05<2:32:14, 5.46s/it] {'loss': 0.561, 'learning_rate': 4.0963030867353445e-06, 'epoch': 0.71} 71%|███████ | 4099/5773 [2:31:03<2:32:14, 5.46s/it]1 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
71%|███████ | 4100/5773 [2:31:09<2:31:10, 5.42s/it]8 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 71%|███████ | 4100/5773 [2:31:11<2:31:10, 5.42s/it]4 AutoResumeHook: Checking whether to suspend... 6 12 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... {'loss': 0.5523, 'learning_rate': 4.0917752022319845e-06, 'epoch': 0.71} 71%|███████ | 4100/5773 [2:31:11<2:31:10, 5.42s/it] {'loss': 0.5523, 'learning_rate': 4.0917752022319845e-06, 'epoch': 0.71} 71%|███████ | 4100/5773 [2:31:09<2:31:10, 5.42s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 71%|███████ | 4101/5773 [2:31:30<4:41:50, 10.11s/it] 71%|███████ | 4101/5773 [2:31:32<4:41:50, 10.11s/it] {'loss': 0.5521, 'learning_rate': 4.087249177826553e-06, 'epoch': 0.71} 71%|███████ | 4101/5773 [2:31:32<4:41:50, 10.11s/it] {'loss': 0.5521, 'learning_rate': 4.087249177826553e-06, 'epoch': 0.71} 71%|███████ | 4101/5773 [2:31:30<4:41:50, 10.11s/it] 71%|███████ | 4102/5773 [2:31:35<4:03:13, 8.73s/it] 71%|███████ | 4102/5773 [2:31:37<4:03:13, 8.73s/it] {'loss': 0.5639, 'learning_rate': 4.0827250149439956e-06, 'epoch': 0.71} 71%|███████ | 4102/5773 [2:31:37<4:03:13, 8.73s/it] {'loss': 0.5639, 'learning_rate': 4.0827250149439956e-06, 'epoch': 0.71} 71%|███████ | 4102/5773 [2:31:35<4:03:13, 8.73s/it] 71%|███████ | 4103/5773 [2:31:41<3:35:18, 7.74s/it] 71%|███████ | 4103/5773 [2:31:43<3:35:18, 7.74s/it] {'loss': 0.5546, 'learning_rate': 4.078202715008659e-06, 'epoch': 0.71} 71%|███████ | 4103/5773 [2:31:43<3:35:18, 7.74s/it] {'loss': 0.5546, 'learning_rate': 4.078202715008659e-06, 'epoch': 0.71} 71%|███████ | 4103/5773 [2:31:41<3:35:18, 7.74s/it] 71%|███████ | 4104/5773 [2:31:46<3:16:56, 7.08s/it] 71%|███████ | 4104/5773 [2:31:48<3:16:56, 7.08s/it] {'loss': 0.5407, 'learning_rate': 4.073682279444309e-06, 'epoch': 0.71} 71%|███████ | 4104/5773 [2:31:48<3:16:56, 7.08s/it] {'loss': 0.5407, 'learning_rate': 4.073682279444309e-06, 'epoch': 0.71} 71%|███████ | 4104/5773 [2:31:46<3:16:56, 7.08s/it] 71%|███████ | 4105/5773 [2:31:54<3:03:52, 6.61s/it] 71%|███████ | 4105/5773 [2:31:52<3:03:52, 6.61s/it] {'loss': 0.568, 'learning_rate': 4.0691637096741166e-06, 'epoch': 0.71} 71%|███████ | 4105/5773 [2:31:54<3:03:52, 6.61s/it] {'loss': 0.568, 'learning_rate': 4.0691637096741166e-06, 'epoch': 0.71} 71%|███████ | 4105/5773 [2:31:52<3:03:52, 6.61s/it] 71%|███████ | 4106/5773 [2:31:57<2:52:58, 6.23s/it] 71%|███████ | 4106/5773 [2:31:59<2:52:59, 6.23s/it] {'loss': 0.5607, 'learning_rate': 4.064647007120681e-06, 'epoch': 0.71} 71%|███████ | 4106/5773 [2:31:59<2:52:59, 
6.23s/it] {'loss': 0.5607, 'learning_rate': 4.064647007120681e-06, 'epoch': 0.71} 71%|███████ | 4106/5773 [2:31:57<2:52:58, 6.23s/it] 71%|███████ | 4107/5773 [2:32:03<2:46:36, 6.00s/it] 71%|███████ | 4107/5773 [2:32:04<2:46:36, 6.00s/it] {'loss': 0.5549, 'learning_rate': 4.060132173206002e-06, 'epoch': 0.71} 71%|███████ | 4107/5773 [2:32:05<2:46:36, 6.00s/it] {'loss': 0.5549, 'learning_rate': 4.060132173206002e-06, 'epoch': 0.71} 71%|███████ | 4107/5773 [2:32:03<2:46:36, 6.00s/it] 71%|███████ | 4108/5773 [2:32:10<2:42:50, 5.87s/it] 71%|███████ | 4108/5773 [2:32:08<2:42:49, 5.87s/it]{'loss': 0.5637, 'learning_rate': 4.055619209351488e-06, 'epoch': 0.71} {'loss': 0.5637, 'learning_rate': 4.055619209351488e-06, 'epoch': 0.71} 71%|███████ | 4108/5773 [2:32:10<2:42:50, 5.87s/it] 71%|███████ | 4108/5773 [2:32:08<2:42:49, 5.87s/it] 71%|███████ | 4109/5773 [2:32:14<2:39:56, 5.77s/it] 71%|███████ | 4109/5773 [2:32:16<2:39:56, 5.77s/it] {'loss': 0.5659, 'learning_rate': 4.0511081169779735e-06, 'epoch': 0.71} 71%|███████ | 4109/5773 [2:32:16<2:39:56, 5.77s/it] {'loss': 0.5659, 'learning_rate': 4.0511081169779735e-06, 'epoch': 0.71} 71%|███████ | 4109/5773 [2:32:14<2:39:56, 5.77s/it] 71%|███████ | 4110/5773 [2:32:19<2:36:48, 5.66s/it] 71%|███████ | 4110/5773 [2:32:21<2:36:48, 5.66s/it] {'loss': 0.5731, 'learning_rate': 4.046598897505688e-06, 'epoch': 0.71} 71%|███████ | 4110/5773 [2:32:21<2:36:48, 5.66s/it] {'loss': 0.5731, 'learning_rate': 4.046598897505688e-06, 'epoch': 0.71} 71%|███████ | 4110/5773 [2:32:19<2:36:48, 5.66s/it] 71%|███████ | 4111/5773 [2:32:24<2:34:43, 5.59s/it] 71%|███████ | 4111/5773 [2:32:26<2:34:43, 5.59s/it] {'loss': 0.5485, 'learning_rate': 4.042091552354279e-06, 'epoch': 0.71} 71%|███████ | 4111/5773 [2:32:26<2:34:43, 5.59s/it] {'loss': 0.5485, 'learning_rate': 4.042091552354279e-06, 'epoch': 0.71} 71%|███████ | 4111/5773 [2:32:24<2:34:43, 5.59s/it] 71%|███████ | 4112/5773 [2:32:30<2:33:52, 5.56s/it] 71%|███████ | 4112/5773 [2:32:32<2:33:52, 5.56s/it] 
{'loss': 0.5721, 'learning_rate': 4.037586082942805e-06, 'epoch': 0.71} 71%|███████ | 4112/5773 [2:32:32<2:33:52, 5.56s/it] {'loss': 0.5721, 'learning_rate': 4.037586082942805e-06, 'epoch': 0.71} 71%|███████ | 4112/5773 [2:32:30<2:33:52, 5.56s/it] 71%|███████ | 4113/5773 [2:32:35<2:32:24, 5.51s/it] 71%|███████ | 4113/5773 [2:32:37<2:32:24, 5.51s/it] {'loss': 0.5651, 'learning_rate': 4.033082490689728e-06, 'epoch': 0.71} 71%|███████ | 4113/5773 [2:32:37<2:32:24, 5.51s/it] {'loss': 0.5651, 'learning_rate': 4.033082490689728e-06, 'epoch': 0.71} 71%|███████ | 4113/5773 [2:32:35<2:32:24, 5.51s/it] 71%|███████▏ | 4114/5773 [2:32:41<2:31:33, 5.48s/it] 71%|███████▏ | 4114/5773 [2:32:43<2:31:33, 5.48s/it] {'loss': 0.5516, 'learning_rate': 4.028580777012922e-06, 'epoch': 0.71} 71%|███████▏ | 4114/5773 [2:32:43<2:31:33, 5.48s/it] {'loss': 0.5516, 'learning_rate': 4.028580777012922e-06, 'epoch': 0.71} 71%|███████▏ | 4114/5773 [2:32:41<2:31:33, 5.48s/it] 71%|███████▏ | 4115/5773 [2:32:46<2:32:40, 5.52s/it] 71%|███████▏ | 4115/5773 [2:32:48<2:32:40, 5.52s/it] {'loss': 0.5528, 'learning_rate': 4.024080943329676e-06, 'epoch': 0.71} 71%|███████▏ | 4115/5773 [2:32:48<2:32:40, 5.52s/it] {'loss': 0.5528, 'learning_rate': 4.024080943329676e-06, 'epoch': 0.71} 71%|███████▏ | 4115/5773 [2:32:46<2:32:40, 5.52s/it] 71%|███████▏ | 4116/5773 [2:32:52<2:32:09, 5.51s/it] 71%|███████▏ | 4116/5773 [2:32:54<2:32:09, 5.51s/it] {'loss': 0.5494, 'learning_rate': 4.0195829910566795e-06, 'epoch': 0.71} 71%|███████▏ | 4116/5773 [2:32:54<2:32:09, 5.51s/it] {'loss': 0.5494, 'learning_rate': 4.0195829910566795e-06, 'epoch': 0.71} 71%|███████▏ | 4116/5773 [2:32:52<2:32:09, 5.51s/it] 71%|███████▏ | 4117/5773 [2:32:57<2:31:43, 5.50s/it] 71%|███████▏ | 4117/5773 [2:32:59<2:31:43, 5.50s/it] {'loss': 0.5595, 'learning_rate': 4.01508692161003e-06, 'epoch': 0.71} 71%|███████▏ | 4117/5773 [2:32:57<2:31:43, 5.50s/it]{'loss': 0.5595, 'learning_rate': 4.01508692161003e-06, 'epoch': 0.71} 71%|███████▏ | 4117/5773 
[2:32:59<2:31:43, 5.50s/it] 71%|███████▏ | 4118/5773 [2:33:03<2:30:57, 5.47s/it] 71%|███████▏ | 4118/5773 [2:33:05<2:30:56, 5.47s/it] {'loss': 0.5715, 'learning_rate': 4.01059273640523e-06, 'epoch': 0.71} 71%|███████▏ | 4118/5773 [2:33:05<2:30:56, 5.47s/it] {'loss': 0.5715, 'learning_rate': 4.01059273640523e-06, 'epoch': 0.71} 71%|███████▏ | 4118/5773 [2:33:03<2:30:57, 5.47s/it] 71%|███████▏ | 4119/5773 [2:33:08<2:31:08, 5.48s/it] 71%|███████▏ | 4119/5773 [2:33:10<2:31:08, 5.48s/it] {'loss': 0.5589, 'learning_rate': 4.006100436857201e-06, 'epoch': 0.71} 71%|███████▏ | 4119/5773 [2:33:10<2:31:08, 5.48s/it] {'loss': 0.5589, 'learning_rate': 4.006100436857201e-06, 'epoch': 0.71} 71%|███████▏ | 4119/5773 [2:33:08<2:31:08, 5.48s/it] 71%|███████▏ | 4120/5773 [2:33:14<2:30:59, 5.48s/it] 71%|███████▏ | 4120/5773 [2:33:16<2:30:59, 5.48s/it] {'loss': 0.5576, 'learning_rate': 4.001610024380258e-06, 'epoch': 0.71} 71%|███████▏ | 4120/5773 [2:33:16<2:30:59, 5.48s/it] {'loss': 0.5576, 'learning_rate': 4.001610024380258e-06, 'epoch': 0.71} 71%|███████▏ | 4120/5773 [2:33:14<2:30:59, 5.48s/it] 71%|███████▏ | 4121/5773 [2:33:19<2:29:58, 5.45s/it] 71%|███████▏ | 4121/5773 [2:33:21<2:29:58, 5.45s/it] {'loss': 0.5687, 'learning_rate': 3.997121500388124e-06, 'epoch': 0.71} 71%|███████▏ | 4121/5773 [2:33:21<2:29:58, 5.45s/it] {'loss': 0.5687, 'learning_rate': 3.997121500388124e-06, 'epoch': 0.71} 71%|███████▏ | 4121/5773 [2:33:19<2:29:58, 5.45s/it] 71%|███████▏ | 4122/5773 [2:33:25<2:30:10, 5.46s/it] 71%|███████▏ | 4122/5773 [2:33:27<2:30:10, 5.46s/it] {'loss': 0.5627, 'learning_rate': 3.992634866293935e-06, 'epoch': 0.71} 71%|███████▏ | 4122/5773 [2:33:27<2:30:10, 5.46s/it] {'loss': 0.5627, 'learning_rate': 3.992634866293935e-06, 'epoch': 0.71} 71%|███████▏ | 4122/5773 [2:33:25<2:30:10, 5.46s/it] 71%|███████▏ | 4123/5773 [2:33:30<2:30:08, 5.46s/it] 71%|███████▏ | 4123/5773 [2:33:32<2:30:08, 5.46s/it] {'loss': 0.5584, 'learning_rate': 3.988150123510224e-06, 'epoch': 0.71} 71%|███████▏ | 
4123/5773 [2:33:32<2:30:08, 5.46s/it] {'loss': 0.5584, 'learning_rate': 3.988150123510224e-06, 'epoch': 0.71} 71%|███████▏ | 4123/5773 [2:33:30<2:30:08, 5.46s/it] 71%|███████▏ | 4124/5773 [2:33:36<2:30:40, 5.48s/it] 71%|███████▏ | 4124/5773 [2:33:38<2:30:40, 5.48s/it] {'loss': 0.5502, 'learning_rate': 3.9836672734489315e-06, 'epoch': 0.71} 71%|███████▏ | 4124/5773 [2:33:38<2:30:40, 5.48s/it] {'loss': 0.5502, 'learning_rate': 3.9836672734489315e-06, 'epoch': 0.71} 71%|███████▏ | 4124/5773 [2:33:36<2:30:40, 5.48s/it] 71%|███████▏ | 4125/5773 [2:33:41<2:31:51, 5.53s/it] 71%|███████▏ | 4125/5773 [2:33:43<2:31:51, 5.53s/it] {'loss': 0.5611, 'learning_rate': 3.9791863175214015e-06, 'epoch': 0.71} 71%|███████▏ | 4125/5773 [2:33:43<2:31:51, 5.53s/it] {'loss': 0.5611, 'learning_rate': 3.9791863175214015e-06, 'epoch': 0.71} 71%|███████▏ | 4125/5773 [2:33:41<2:31:51, 5.53s/it] 71%|███████▏ | 4126/5773 [2:33:47<2:31:23, 5.52s/it] 71%|███████▏ | 4126/5773 [2:33:49<2:31:23, 5.52s/it] {'loss': 0.5521, 'learning_rate': 3.974707257138383e-06, 'epoch': 0.71} 71%|███████▏ | 4126/5773 [2:33:49<2:31:23, 5.52s/it] {'loss': 0.5521, 'learning_rate': 3.974707257138383e-06, 'epoch': 0.71} 71%|███████▏ | 4126/5773 [2:33:47<2:31:23, 5.52s/it] 71%|███████▏ | 4127/5773 [2:33:52<2:32:04, 5.54s/it] 71%|███████▏ | 4127/5773 [2:33:54<2:32:04, 5.54s/it] {'loss': 0.5695, 'learning_rate': 3.970230093710023e-06, 'epoch': 0.71} 71%|███████▏ | 4127/5773 [2:33:54<2:32:04, 5.54s/it] {'loss': 0.5695, 'learning_rate': 3.970230093710023e-06, 'epoch': 0.71} 71%|███████▏ | 4127/5773 [2:33:52<2:32:04, 5.54s/it] 72%|███████▏ | 4128/5773 [2:33:58<2:30:48, 5.50s/it] 72%|███████▏ | 4128/5773 [2:34:00<2:30:48, 5.50s/it] {'loss': 0.5544, 'learning_rate': 3.965754828645883e-06, 'epoch': 0.72} 72%|███████▏ | 4128/5773 [2:34:00<2:30:48, 5.50s/it] {'loss': 0.5544, 'learning_rate': 3.965754828645883e-06, 'epoch': 0.72} 72%|███████▏ | 4128/5773 [2:33:58<2:30:48, 5.50s/it] 72%|███████▏ | 4129/5773 [2:34:03<2:30:40, 5.50s/it] 
72%|███████▏ | 4129/5773 [2:34:05<2:30:40, 5.50s/it] {'loss': 0.5633, 'learning_rate': 3.961281463354916e-06, 'epoch': 0.72} 72%|███████▏ | 4129/5773 [2:34:05<2:30:40, 5.50s/it] {'loss': 0.5633, 'learning_rate': 3.961281463354916e-06, 'epoch': 0.72} 72%|███████▏ | 4129/5773 [2:34:03<2:30:40, 5.50s/it] 72%|███████▏ | 4130/5773 [2:34:09<2:31:01, 5.51s/it] 72%|███████▏ | 4130/5773 [2:34:11<2:31:01, 5.51s/it] {'loss': 0.5493, 'learning_rate': 3.956809999245479e-06, 'epoch': 0.72} 72%|███████▏ | 4130/5773 [2:34:11<2:31:01, 5.51s/it] {'loss': 0.5493, 'learning_rate': 3.956809999245479e-06, 'epoch': 0.72} 72%|███████▏ | 4130/5773 [2:34:09<2:31:01, 5.51s/it] 72%|███████▏ | 4131/5773 [2:34:14<2:30:12, 5.49s/it] 72%|███████▏ | 4131/5773 [2:34:16<2:30:12, 5.49s/it] {'loss': 0.5656, 'learning_rate': 3.952340437725331e-06, 'epoch': 0.72} 72%|███████▏ | 4131/5773 [2:34:16<2:30:12, 5.49s/it] {'loss': 0.5656, 'learning_rate': 3.952340437725331e-06, 'epoch': 0.72} 72%|███████▏ | 4131/5773 [2:34:14<2:30:12, 5.49s/it] 72%|███████▏ | 4132/5773 [2:34:20<2:29:21, 5.46s/it] 72%|███████▏ | 4132/5773 [2:34:22<2:29:21, 5.46s/it] {'loss': 0.5751, 'learning_rate': 3.947872780201637e-06, 'epoch': 0.72} 72%|███████▏ | 4132/5773 [2:34:22<2:29:21, 5.46s/it] {'loss': 0.5751, 'learning_rate': 3.947872780201637e-06, 'epoch': 0.72} 72%|███████▏ | 4132/5773 [2:34:20<2:29:21, 5.46s/it] 72%|███████▏ | 4133/5773 [2:34:25<2:29:30, 5.47s/it] 72%|███████▏ | 4133/5773 [2:34:27<2:29:30, 5.47s/it] {'loss': 0.546, 'learning_rate': 3.943407028080956e-06, 'epoch': 0.72} 72%|███████▏ | 4133/5773 [2:34:27<2:29:30, 5.47s/it] {'loss': 0.546, 'learning_rate': 3.943407028080956e-06, 'epoch': 0.72} 72%|███████▏ | 4133/5773 [2:34:25<2:29:30, 5.47s/it] 72%|███████▏ | 4134/5773 [2:34:31<2:30:17, 5.50s/it] 72%|███████▏ | 4134/5773 [2:34:33<2:30:18, 5.50s/it] {'loss': 0.5601, 'learning_rate': 3.9389431827692455e-06, 'epoch': 0.72} 72%|███████▏ | 4134/5773 [2:34:33<2:30:18, 5.50s/it] {'loss': 0.5601, 'learning_rate': 
3.9389431827692455e-06, 'epoch': 0.72} 72%|███████▏ | 4134/5773 [2:34:31<2:30:17, 5.50s/it] 72%|███████▏ | 4135/5773 [2:34:38<2:29:56, 5.49s/it] 72%|███████▏ | 4135/5773 [2:34:36<2:29:56, 5.49s/it] {'loss': 0.5415, 'learning_rate': 3.934481245671875e-06, 'epoch': 0.72} 72%|███████▏ | 4135/5773 [2:34:38<2:29:56, 5.49s/it] {'loss': 0.5415, 'learning_rate': 3.934481245671875e-06, 'epoch': 0.72} 72%|███████▏ | 4135/5773 [2:34:36<2:29:56, 5.49s/it] 72%|███████▏ | 4136/5773 [2:34:42<2:29:31, 5.48s/it] 72%|███████▏ | 4136/5773 [2:34:44<2:29:31, 5.48s/it] {'loss': 0.5589, 'learning_rate': 3.930021218193599e-06, 'epoch': 0.72} 72%|███████▏ | 4136/5773 [2:34:44<2:29:31, 5.48s/it] {'loss': 0.5589, 'learning_rate': 3.930021218193599e-06, 'epoch': 0.72} 72%|███████▏ | 4136/5773 [2:34:42<2:29:31, 5.48s/it] 72%|███████▏ | 4137/5773 [2:34:47<2:30:07, 5.51s/it] 72%|███████▏ | 4137/5773 [2:34:49<2:30:07, 5.51s/it] {'loss': 0.5691, 'learning_rate': 3.92556310173858e-06, 'epoch': 0.72} 72%|███████▏ | 4137/5773 [2:34:49<2:30:07, 5.51s/it] {'loss': 0.5691, 'learning_rate': 3.92556310173858e-06, 'epoch': 0.72} 72%|███████▏ | 4137/5773 [2:34:47<2:30:07, 5.51s/it] 72%|███████▏ | 4138/5773 [2:34:53<2:30:03, 5.51s/it] 72%|███████▏ | 4138/5773 [2:34:55<2:30:03, 5.51s/it] {'loss': 0.542, 'learning_rate': 3.921106897710368e-06, 'epoch': 0.72} 72%|███████▏ | 4138/5773 [2:34:55<2:30:03, 5.51s/it] {'loss': 0.542, 'learning_rate': 3.921106897710368e-06, 'epoch': 0.72} 72%|███████▏ | 4138/5773 [2:34:53<2:30:03, 5.51s/it] 72%|███████▏ | 4139/5773 [2:34:58<2:29:17, 5.48s/it] 72%|███████▏ | 4139/5773 [2:35:00<2:29:17, 5.48s/it] {'loss': 0.5697, 'learning_rate': 3.916652607511931e-06, 'epoch': 0.72} 72%|███████▏ | 4139/5773 [2:35:00<2:29:17, 5.48s/it] {'loss': 0.5697, 'learning_rate': 3.916652607511931e-06, 'epoch': 0.72} 72%|███████▏ | 4139/5773 [2:34:58<2:29:17, 5.48s/it] 72%|███████▏ | 4140/5773 [2:35:03<2:28:36, 5.46s/it] 72%|███████▏ | 4140/5773 [2:35:05<2:28:36, 5.46s/it] {'loss': 0.5681, 
'learning_rate': 3.91220023254561e-06, 'epoch': 0.72} 72%|███████▏ | 4140/5773 [2:35:05<2:28:36, 5.46s/it] {'loss': 0.5681, 'learning_rate': 3.91220023254561e-06, 'epoch': 0.72} 72%|███████▏ | 4140/5773 [2:35:03<2:28:36, 5.46s/it] 72%|███████▏ | 4141/5773 [2:35:11<2:28:50, 5.47s/it] 72%|███████▏ | 4141/5773 [2:35:09<2:28:50, 5.47s/it]{'loss': 0.5584, 'learning_rate': 3.907749774213161e-06, 'epoch': 0.72} {'loss': 0.5584, 'learning_rate': 3.907749774213161e-06, 'epoch': 0.72} 72%|███████▏ | 4141/5773 [2:35:11<2:28:50, 5.47s/it] 72%|███████▏ | 4141/5773 [2:35:09<2:28:50, 5.47s/it] 72%|███████▏ | 4142/5773 [2:35:14<2:28:47, 5.47s/it] 72%|███████▏ | 4142/5773 [2:35:16<2:28:47, 5.47s/it] {'loss': 0.5559, 'learning_rate': 3.903301233915731e-06, 'epoch': 0.72} 72%|███████▏ | 4142/5773 [2:35:16<2:28:47, 5.47s/it] {'loss': 0.5559, 'learning_rate': 3.903301233915731e-06, 'epoch': 0.72} 72%|███████▏ | 4142/5773 [2:35:14<2:28:47, 5.47s/it] 72%|███████▏ | 4143/5773 [2:35:20<2:29:23, 5.50s/it] 72%|███████▏ | 4143/5773 [2:35:22<2:29:23, 5.50s/it] {'loss': 0.5735, 'learning_rate': 3.898854613053858e-06, 'epoch': 0.72} 72%|███████▏ | 4143/5773 [2:35:22<2:29:23, 5.50s/it] {'loss': 0.5735, 'learning_rate': 3.898854613053858e-06, 'epoch': 0.72} 72%|███████▏ | 4143/5773 [2:35:20<2:29:23, 5.50s/it] 72%|███████▏ | 4144/5773 [2:35:25<2:29:17, 5.50s/it] 72%|███████▏ | 4144/5773 [2:35:27<2:29:17, 5.50s/it] {'loss': 0.5454, 'learning_rate': 3.894409913027481e-06, 'epoch': 0.72} 72%|███████▏ | 4144/5773 [2:35:27<2:29:17, 5.50s/it] {'loss': 0.5454, 'learning_rate': 3.894409913027481e-06, 'epoch': 0.72} 72%|███████▏ | 4144/5773 [2:35:25<2:29:17, 5.50s/it] 72%|███████▏ | 4145/5773 [2:35:31<2:28:37, 5.48s/it] 72%|███████▏ | 4145/5773 [2:35:33<2:28:36, 5.48s/it] {'loss': 0.5596, 'learning_rate': 3.88996713523594e-06, 'epoch': 0.72} 72%|███████▏ | 4145/5773 [2:35:33<2:28:36, 5.48s/it] {'loss': 0.5596, 'learning_rate': 3.88996713523594e-06, 'epoch': 0.72} 72%|███████▏ | 4145/5773 [2:35:31<2:28:37, 
5.48s/it] 72%|███████▏ | 4146/5773 [2:35:36<2:28:39, 5.48s/it] 72%|███████▏ | 4146/5773 [2:35:38<2:28:39, 5.48s/it] {'loss': 0.5666, 'learning_rate': 3.8855262810779595e-06, 'epoch': 0.72} 72%|███████▏ | 4146/5773 [2:35:38<2:28:39, 5.48s/it] {'loss': 0.5666, 'learning_rate': 3.8855262810779595e-06, 'epoch': 0.72} 72%|███████▏ | 4146/5773 [2:35:36<2:28:39, 5.48s/it] 72%|███████▏ | 4147/5773 [2:35:42<2:28:24, 5.48s/it] 72%|███████▏ | 4147/5773 [2:35:44<2:28:24, 5.48s/it] {'loss': 0.5456, 'learning_rate': 3.881087351951657e-06, 'epoch': 0.72} 72%|███████▏ | 4147/5773 [2:35:44<2:28:24, 5.48s/it] {'loss': 0.5456, 'learning_rate': 3.881087351951657e-06, 'epoch': 0.72} 72%|███████▏ | 4147/5773 [2:35:42<2:28:24, 5.48s/it] 72%|███████▏ | 4148/5773 [2:35:47<2:29:05, 5.51s/it] 72%|███████▏ | 4148/5773 [2:35:49<2:29:05, 5.51s/it] {'loss': 0.5661, 'learning_rate': 3.876650349254557e-06, 'epoch': 0.72} 72%|███████▏ | 4148/5773 [2:35:49<2:29:05, 5.51s/it] {'loss': 0.5661, 'learning_rate': 3.876650349254557e-06, 'epoch': 0.72} 72%|███████▏ | 4148/5773 [2:35:47<2:29:05, 5.51s/it] 72%|███████▏ | 4149/5773 [2:35:53<2:29:24, 5.52s/it] 72%|███████▏ | 4149/5773 [2:35:55<2:29:24, 5.52s/it] {'loss': 0.5505, 'learning_rate': 3.872215274383567e-06, 'epoch': 0.72} 72%|███████▏ | 4149/5773 [2:35:55<2:29:24, 5.52s/it] {'loss': 0.5505, 'learning_rate': 3.872215274383567e-06, 'epoch': 0.72} 72%|███████▏ | 4149/5773 [2:35:53<2:29:24, 5.52s/it]10 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
72%|███████▏ | 4150/5773 [2:35:59<2:29:42, 5.53s/it]3 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 72%|███████▏ | 4150/5773 [2:36:01<2:29:41, 5.53s/it]15 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... {'loss': 0.553, 'learning_rate': 3.867782128734992e-06, 'epoch': 0.72} 72%|███████▏ | 4150/5773 [2:36:01<2:29:41, 5.53s/it] {'loss': 0.553, 'learning_rate': 3.867782128734992e-06, 'epoch': 0.72} 72%|███████▏ | 4150/5773 [2:35:59<2:29:42, 5.53s/it] 72%|███████▏ | 4151/5773 [2:36:04<2:30:39, 5.57s/it] 72%|███████▏ | 4151/5773 [2:36:06<2:30:39, 5.57s/it] {'loss': 0.5671, 'learning_rate': 3.86335091370452e-06, 'epoch': 0.72} 72%|███████▏ | 4151/5773 [2:36:06<2:30:39, 5.57s/it] {'loss': 0.5671, 'learning_rate': 3.86335091370452e-06, 'epoch': 0.72} 72%|███████▏ | 4151/5773 [2:36:04<2:30:39, 5.57s/it] 72%|███████▏ | 4152/5773 [2:36:10<2:30:02, 5.55s/it] 72%|███████▏ | 4152/5773 [2:36:12<2:30:02, 5.55s/it] {'loss': 0.5618, 'learning_rate': 3.858921630687249e-06, 'epoch': 0.72} 72%|███████▏ | 4152/5773 [2:36:12<2:30:02, 5.55s/it] {'loss': 0.5618, 'learning_rate': 3.858921630687249e-06, 'epoch': 0.72} 72%|███████▏ | 4152/5773 [2:36:10<2:30:02, 5.55s/it] 72%|███████▏ | 4153/5773 [2:36:15<2:29:13, 5.53s/it] 72%|███████▏ | 4153/5773 [2:36:17<2:29:13, 5.53s/it] {'loss': 0.5564, 'learning_rate': 3.854494281077655e-06, 'epoch': 0.72} 72%|███████▏ | 4153/5773 [2:36:17<2:29:13, 5.53s/it] {'loss': 0.5564, 'learning_rate': 3.854494281077655e-06, 'epoch': 0.72} 72%|███████▏ | 4153/5773 [2:36:15<2:29:13, 5.53s/it] 72%|███████▏ | 4154/5773 [2:36:23<2:29:11, 5.53s/it] 72%|███████▏ | 4154/5773 [2:36:21<2:29:11, 5.53s/it] {'loss': 0.5501, 'learning_rate': 3.85006886626961e-06, 'epoch': 0.72} 72%|███████▏ | 4154/5773 [2:36:23<2:29:11, 5.53s/it] {'loss': 0.5501, 'learning_rate': 3.85006886626961e-06, 'epoch': 0.72} 72%|███████▏ | 4154/5773 
[2:36:21<2:29:11, 5.53s/it] 72%|███████▏ | 4155/5773 [2:36:26<2:28:35, 5.51s/it] 72%|███████▏ | 4155/5773 [2:36:28<2:28:35, 5.51s/it] {'loss': 0.5585, 'learning_rate': 3.845645387656374e-06, 'epoch': 0.72} 72%|███████▏ | 4155/5773 [2:36:28<2:28:35, 5.51s/it] {'loss': 0.5585, 'learning_rate': 3.845645387656374e-06, 'epoch': 0.72} 72%|███████▏ | 4155/5773 [2:36:26<2:28:35, 5.51s/it] 72%|███████▏ | 4156/5773 [2:36:32<2:27:08, 5.46s/it] 72%|███████▏ | 4156/5773 [2:36:34<2:27:08, 5.46s/it] {'loss': 0.5537, 'learning_rate': 3.841223846630599e-06, 'epoch': 0.72} 72%|███████▏ | 4156/5773 [2:36:34<2:27:08, 5.46s/it] {'loss': 0.5537, 'learning_rate': 3.841223846630599e-06, 'epoch': 0.72} 72%|███████▏ | 4156/5773 [2:36:32<2:27:08, 5.46s/it] 72%|███████▏ | 4157/5773 [2:36:37<2:26:23, 5.44s/it] 72%|███████▏ | 4157/5773 [2:36:39<2:26:23, 5.44s/it] {'loss': 0.5711, 'learning_rate': 3.8368042445843265e-06, 'epoch': 0.72} 72%|███████▏ | 4157/5773 [2:36:39<2:26:23, 5.44s/it] {'loss': 0.5711, 'learning_rate': 3.8368042445843265e-06, 'epoch': 0.72} 72%|███████▏ | 4157/5773 [2:36:37<2:26:23, 5.44s/it] 72%|███████▏ | 4158/5773 [2:36:43<2:28:30, 5.52s/it] 72%|███████▏ | 4158/5773 [2:36:45<2:28:31, 5.52s/it] {'loss': 0.5646, 'learning_rate': 3.832386582908995e-06, 'epoch': 0.72} 72%|███████▏ | 4158/5773 [2:36:45<2:28:31, 5.52s/it] {'loss': 0.5646, 'learning_rate': 3.832386582908995e-06, 'epoch': 0.72} 72%|███████▏ | 4158/5773 [2:36:43<2:28:30, 5.52s/it] 72%|███████▏ | 4159/5773 [2:36:48<2:28:17, 5.51s/it] 72%|███████▏ | 4159/5773 [2:36:50<2:28:17, 5.51s/it] {'loss': 0.5656, 'learning_rate': 3.8279708629954196e-06, 'epoch': 0.72} 72%|███████▏ | 4159/5773 [2:36:50<2:28:17, 5.51s/it] {'loss': 0.5656, 'learning_rate': 3.8279708629954196e-06, 'epoch': 0.72} 72%|███████▏ | 4159/5773 [2:36:48<2:28:17, 5.51s/it] 72%|███████▏ | 4160/5773 [2:36:54<2:28:07, 5.51s/it] 72%|███████▏ | 4160/5773 [2:36:56<2:28:07, 5.51s/it] {'loss': 0.5635, 'learning_rate': 3.823557086233808e-06, 'epoch': 0.72} 
72%|███████▏ | 4160/5773 [2:36:56<2:28:07, 5.51s/it] {'loss': 0.5635, 'learning_rate': 3.823557086233808e-06, 'epoch': 0.72} 72%|███████▏ | 4160/5773 [2:36:54<2:28:07, 5.51s/it] 72%|███████▏ | 4161/5773 [2:36:59<2:28:39, 5.53s/it] 72%|███████▏ | 4161/5773 [2:37:01<2:28:39, 5.53s/it] {'loss': 0.5733, 'learning_rate': 3.819145254013766e-06, 'epoch': 0.72} 72%|███████▏ | 4161/5773 [2:37:01<2:28:39, 5.53s/it] {'loss': 0.5733, 'learning_rate': 3.819145254013766e-06, 'epoch': 0.72} 72%|███████▏ | 4161/5773 [2:36:59<2:28:39, 5.53s/it] 72%|███████▏ | 4162/5773 [2:37:05<2:27:11, 5.48s/it] 72%|███████▏ | 4162/5773 [2:37:07<2:27:11, 5.48s/it] {'loss': 0.5523, 'learning_rate': 3.8147353677242735e-06, 'epoch': 0.72} 72%|███████▏ | 4162/5773 [2:37:07<2:27:11, 5.48s/it] {'loss': 0.5523, 'learning_rate': 3.8147353677242735e-06, 'epoch': 0.72} 72%|███████▏ | 4162/5773 [2:37:05<2:27:11, 5.48s/it] 72%|███████▏ | 4163/5773 [2:37:10<2:27:15, 5.49s/it] 72%|███████▏ | 4163/5773 [2:37:12<2:27:15, 5.49s/it] {'loss': 0.563, 'learning_rate': 3.810327428753706e-06, 'epoch': 0.72} 72%|███████▏ | 4163/5773 [2:37:12<2:27:15, 5.49s/it] {'loss': 0.563, 'learning_rate': 3.810327428753706e-06, 'epoch': 0.72} 72%|███████▏ | 4163/5773 [2:37:10<2:27:15, 5.49s/it] 72%|███████▏ | 4164/5773 [2:37:16<2:27:07, 5.49s/it] 72%|███████▏ | 4164/5773 [2:37:18<2:27:07, 5.49s/it] {'loss': 0.537, 'learning_rate': 3.805921438489819e-06, 'epoch': 0.72} 72%|███████▏ | 4164/5773 [2:37:18<2:27:07, 5.49s/it] {'loss': 0.537, 'learning_rate': 3.805921438489819e-06, 'epoch': 0.72} 72%|███████▏ | 4164/5773 [2:37:16<2:27:07, 5.49s/it] 72%|███████▏ | 4165/5773 [2:37:21<2:26:22, 5.46s/it] 72%|███████▏ | 4165/5773 [2:37:23<2:26:22, 5.46s/it] {'loss': 0.5499, 'learning_rate': 3.8015173983197674e-06, 'epoch': 0.72} 72%|███████▏ | 4165/5773 [2:37:23<2:26:22, 5.46s/it] {'loss': 0.5499, 'learning_rate': 3.8015173983197674e-06, 'epoch': 0.72} 72%|███████▏ | 4165/5773 [2:37:21<2:26:22, 5.46s/it] 72%|███████▏ | 4166/5773 
[2:37:27<2:26:46, 5.48s/it] 72%|███████▏ | 4166/5773 [2:37:28<2:26:46, 5.48s/it] {'loss': 0.5842, 'learning_rate': 3.7971153096300773e-06, 'epoch': 0.72} 72%|███████▏ | 4166/5773 [2:37:28<2:26:46, 5.48s/it] {'loss': 0.5842, 'learning_rate': 3.7971153096300773e-06, 'epoch': 0.72} 72%|███████▏ | 4166/5773 [2:37:27<2:26:46, 5.48s/it] 72%|███████▏ | 4167/5773 [2:37:32<2:26:51, 5.49s/it] 72%|███████▏ | 4167/5773 [2:37:34<2:26:51, 5.49s/it] {'loss': 0.5594, 'learning_rate': 3.7927151738066693e-06, 'epoch': 0.72} 72%|███████▏ | 4167/5773 [2:37:34<2:26:51, 5.49s/it] {'loss': 0.5594, 'learning_rate': 3.7927151738066693e-06, 'epoch': 0.72} 72%|███████▏ | 4167/5773 [2:37:32<2:26:51, 5.49s/it] 72%|███████▏ | 4168/5773 [2:37:38<2:27:14, 5.50s/it] 72%|███████▏ | 4168/5773 [2:37:40<2:27:15, 5.50s/it] {'loss': 0.5513, 'learning_rate': 3.788316992234845e-06, 'epoch': 0.72} 72%|███████▏ | 4168/5773 [2:37:38<2:27:14, 5.50s/it]{'loss': 0.5513, 'learning_rate': 3.788316992234845e-06, 'epoch': 0.72} 72%|███████▏ | 4168/5773 [2:37:40<2:27:15, 5.50s/it] 72%|███████▏ | 4169/5773 [2:37:43<2:26:38, 5.49s/it] 72%|███████▏ | 4169/5773 [2:37:45<2:26:39, 5.49s/it] {'loss': 0.566, 'learning_rate': 3.783920766299295e-06, 'epoch': 0.72} 72%|███████▏ | 4169/5773 [2:37:45<2:26:39, 5.49s/it]{'loss': 0.566, 'learning_rate': 3.783920766299295e-06, 'epoch': 0.72} 72%|███████▏ | 4169/5773 [2:37:43<2:26:38, 5.49s/it] 72%|███████▏ | 4170/5773 [2:37:51<2:27:04, 5.50s/it] 72%|███████▏ | 4170/5773 [2:37:49<2:27:04, 5.51s/it] {'loss': 0.5548, 'learning_rate': 3.779526497384086e-06, 'epoch': 0.72} 72%|███████▏ | 4170/5773 [2:37:51<2:27:04, 5.50s/it] {'loss': 0.5548, 'learning_rate': 3.779526497384086e-06, 'epoch': 0.72} 72%|███████▏ | 4170/5773 [2:37:49<2:27:04, 5.51s/it] 72%|███████▏ | 4171/5773 [2:37:54<2:26:32, 5.49s/it] 72%|███████▏ | 4171/5773 [2:37:56<2:26:33, 5.49s/it] {'loss': 0.5734, 'learning_rate': 3.775134186872682e-06, 'epoch': 0.72} 72%|███████▏ | 4171/5773 [2:37:56<2:26:33, 5.49s/it] {'loss': 
0.5734, 'learning_rate': 3.775134186872682e-06, 'epoch': 0.72} 72%|███████▏ | 4171/5773 [2:37:54<2:26:32, 5.49s/it] 72%|███████▏ | 4172/5773 [2:37:59<2:25:38, 5.46s/it] 72%|███████▏ | 4172/5773 [2:38:01<2:25:38, 5.46s/it] {'loss': 0.5571, 'learning_rate': 3.7707438361479186e-06, 'epoch': 0.72} 72%|███████▏ | 4172/5773 [2:38:01<2:25:38, 5.46s/it] {'loss': 0.5571, 'learning_rate': 3.7707438361479186e-06, 'epoch': 0.72} 72%|███████▏ | 4172/5773 [2:37:59<2:25:38, 5.46s/it] 72%|███████▏ | 4173/5773 [2:38:05<2:26:27, 5.49s/it] 72%|███████▏ | 4173/5773 [2:38:07<2:26:27, 5.49s/it] {'loss': 0.5607, 'learning_rate': 3.766355446592016e-06, 'epoch': 0.72} 72%|███████▏ | 4173/5773 [2:38:07<2:26:27, 5.49s/it] {'loss': 0.5607, 'learning_rate': 3.766355446592016e-06, 'epoch': 0.72} 72%|███████▏ | 4173/5773 [2:38:05<2:26:27, 5.49s/it] 72%|███████▏ | 4174/5773 [2:38:10<2:26:38, 5.50s/it] 72%|███████▏ | 4174/5773 [2:38:12<2:26:38, 5.50s/it] {'loss': 0.5561, 'learning_rate': 3.7619690195865852e-06, 'epoch': 0.72} 72%|███████▏ | 4174/5773 [2:38:12<2:26:38, 5.50s/it] {'loss': 0.5561, 'learning_rate': 3.7619690195865852e-06, 'epoch': 0.72} 72%|███████▏ | 4174/5773 [2:38:10<2:26:38, 5.50s/it] 72%|███████▏ | 4175/5773 [2:38:16<2:25:30, 5.46s/it] 72%|███████▏ | 4175/5773 [2:38:18<2:25:30, 5.46s/it] {'loss': 0.5643, 'learning_rate': 3.7575845565126113e-06, 'epoch': 0.72} 72%|███████▏ | 4175/5773 [2:38:18<2:25:30, 5.46s/it] {'loss': 0.5643, 'learning_rate': 3.7575845565126113e-06, 'epoch': 0.72} 72%|███████▏ | 4175/5773 [2:38:16<2:25:30, 5.46s/it] 72%|███████▏ | 4176/5773 [2:38:21<2:26:01, 5.49s/it] 72%|███████▏ | 4176/5773 [2:38:23<2:26:01, 5.49s/it] {'loss': 0.5608, 'learning_rate': 3.7532020587504593e-06, 'epoch': 0.72} 72%|███████▏ | 4176/5773 [2:38:23<2:26:01, 5.49s/it] {'loss': 0.5608, 'learning_rate': 3.7532020587504593e-06, 'epoch': 0.72} 72%|███████▏ | 4176/5773 [2:38:21<2:26:01, 5.49s/it] 72%|███████▏ | 4177/5773 [2:38:27<2:25:32, 5.47s/it] 72%|███████▏ | 4177/5773 [2:38:29<2:25:32, 
5.47s/it] {'loss': 0.5712, 'learning_rate': 3.7488215276798865e-06, 'epoch': 0.72} 72%|███████▏ | 4177/5773 [2:38:29<2:25:32, 5.47s/it] {'loss': 0.5712, 'learning_rate': 3.7488215276798865e-06, 'epoch': 0.72} 72%|███████▏ | 4177/5773 [2:38:27<2:25:32, 5.47s/it] 72%|███████▏ | 4178/5773 [2:38:32<2:25:28, 5.47s/it] 72%|███████▏ | 4178/5773 [2:38:34<2:25:28, 5.47s/it] {'loss': 0.5552, 'learning_rate': 3.7444429646800206e-06, 'epoch': 0.72} 72%|███████▏ | 4178/5773 [2:38:34<2:25:28, 5.47s/it] {'loss': 0.5552, 'learning_rate': 3.7444429646800206e-06, 'epoch': 0.72} 72%|███████▏ | 4178/5773 [2:38:32<2:25:28, 5.47s/it] 72%|███████▏ | 4179/5773 [2:38:38<2:25:11, 5.47s/it] 72%|███████▏ | 4179/5773 [2:38:40<2:25:11, 5.47s/it] {'loss': 0.5525, 'learning_rate': 3.740066371129373e-06, 'epoch': 0.72} 72%|███████▏ | 4179/5773 [2:38:40<2:25:11, 5.47s/it] {'loss': 0.5525, 'learning_rate': 3.740066371129373e-06, 'epoch': 0.72} 72%|███████▏ | 4179/5773 [2:38:38<2:25:11, 5.47s/it] 72%|███████▏ | 4180/5773 [2:38:44<2:27:57, 5.57s/it] 72%|███████▏ | 4180/5773 [2:38:46<2:27:57, 5.57s/it] {'loss': 0.5724, 'learning_rate': 3.735691748405832e-06, 'epoch': 0.72} 72%|███████▏ | 4180/5773 [2:38:46<2:27:57, 5.57s/it] {'loss': 0.5724, 'learning_rate': 3.735691748405832e-06, 'epoch': 0.72} 72%|███████▏ | 4180/5773 [2:38:44<2:27:57, 5.57s/it] 72%|███████▏ | 4181/5773 [2:38:49<2:26:32, 5.52s/it] 72%|███████▏ | 4181/5773 [2:38:51<2:26:32, 5.52s/it] {'loss': 0.5573, 'learning_rate': 3.7313190978866786e-06, 'epoch': 0.72} 72%|███████▏ | 4181/5773 [2:38:51<2:26:32, 5.52s/it] {'loss': 0.5573, 'learning_rate': 3.7313190978866786e-06, 'epoch': 0.72} 72%|███████▏ | 4181/5773 [2:38:49<2:26:32, 5.52s/it] 72%|███████▏ | 4182/5773 [2:38:54<2:25:52, 5.50s/it] 72%|███████▏ | 4182/5773 [2:38:56<2:25:51, 5.50s/it] {'loss': 0.5714, 'learning_rate': 3.726948420948553e-06, 'epoch': 0.72} 72%|███████▏ | 4182/5773 [2:38:56<2:25:51, 5.50s/it] {'loss': 0.5714, 'learning_rate': 3.726948420948553e-06, 'epoch': 0.72} 
72%|███████▏ | 4182/5773 [2:38:54<2:25:52, 5.50s/it] 72%|███████▏ | 4183/5773 [2:39:00<2:25:38, 5.50s/it] 72%|███████▏ | 4183/5773 [2:39:02<2:25:38, 5.50s/it] {'loss': 0.5633, 'learning_rate': 3.7225797189674828e-06, 'epoch': 0.72} 72%|███████▏ | 4183/5773 [2:39:02<2:25:38, 5.50s/it] {'loss': 0.5633, 'learning_rate': 3.7225797189674828e-06, 'epoch': 0.72} 72%|███████▏ | 4183/5773 [2:39:00<2:25:38, 5.50s/it] 72%|███████▏ | 4184/5773 [2:39:05<2:25:35, 5.50s/it] 72%|███████▏ | 4184/5773 [2:39:07<2:25:35, 5.50s/it] {'loss': 0.5455, 'learning_rate': 3.7182129933188836e-06, 'epoch': 0.72} 72%|███████▏ | 4184/5773 [2:39:07<2:25:35, 5.50s/it] {'loss': 0.5455, 'learning_rate': 3.7182129933188836e-06, 'epoch': 0.72} 72%|███████▏ | 4184/5773 [2:39:05<2:25:35, 5.50s/it] 72%|███████▏ | 4185/5773 [2:39:11<2:24:36, 5.46s/it] 72%|███████▏ | 4185/5773 [2:39:13<2:24:36, 5.46s/it] {'loss': 0.5727, 'learning_rate': 3.713848245377536e-06, 'epoch': 0.72} 72%|███████▏ | 4185/5773 [2:39:13<2:24:36, 5.46s/it] {'loss': 0.5727, 'learning_rate': 3.713848245377536e-06, 'epoch': 0.72} 72%|███████▏ | 4185/5773 [2:39:11<2:24:36, 5.46s/it] 73%|███████▎ | 4186/5773 [2:39:16<2:25:41, 5.51s/it] 73%|███████▎ | 4186/5773 [2:39:18<2:25:41, 5.51s/it] {'loss': 0.5481, 'learning_rate': 3.709485476517597e-06, 'epoch': 0.73} 73%|███████▎ | 4186/5773 [2:39:18<2:25:41, 5.51s/it] {'loss': 0.5481, 'learning_rate': 3.709485476517597e-06, 'epoch': 0.73} 73%|███████▎ | 4186/5773 [2:39:16<2:25:41, 5.51s/it] 73%|███████▎ | 4187/5773 [2:39:22<2:25:49, 5.52s/it] 73%|███████▎ | 4187/5773 [2:39:24<2:25:49, 5.52s/it] {'loss': 0.5586, 'learning_rate': 3.7051246881126157e-06, 'epoch': 0.73} 73%|███████▎ | 4187/5773 [2:39:24<2:25:49, 5.52s/it] {'loss': 0.5586, 'learning_rate': 3.7051246881126157e-06, 'epoch': 0.73} 73%|███████▎ | 4187/5773 [2:39:22<2:25:49, 5.52s/it] 73%|███████▎ | 4188/5773 [2:39:29<2:25:40, 5.51s/it] 73%|███████▎ | 4188/5773 [2:39:27<2:25:40, 5.51s/it] {'loss': 0.5504, 'learning_rate': 
3.700765881535502e-06, 'epoch': 0.73} 73%|███████▎ | 4188/5773 [2:39:29<2:25:40, 5.51s/it] {'loss': 0.5504, 'learning_rate': 3.700765881535502e-06, 'epoch': 0.73} 73%|███████▎ | 4188/5773 [2:39:27<2:25:40, 5.51s/it] 73%|███████▎ | 4189/5773 [2:39:33<2:24:27, 5.47s/it] 73%|███████▎ | 4189/5773 [2:39:35<2:24:27, 5.47s/it] {'loss': 0.5667, 'learning_rate': 3.696409058158544e-06, 'epoch': 0.73} 73%|███████▎ | 4189/5773 [2:39:35<2:24:27, 5.47s/it] {'loss': 0.5667, 'learning_rate': 3.696409058158544e-06, 'epoch': 0.73} 73%|███████▎ | 4189/5773 [2:39:33<2:24:27, 5.47s/it] 73%|███████▎ | 4190/5773 [2:39:38<2:25:23, 5.51s/it] 73%|███████▎ | 4190/5773 [2:39:40<2:25:23, 5.51s/it] {'loss': 0.5599, 'learning_rate': 3.692054219353417e-06, 'epoch': 0.73} 73%|███████▎ | 4190/5773 [2:39:40<2:25:23, 5.51s/it] {'loss': 0.5599, 'learning_rate': 3.692054219353417e-06, 'epoch': 0.73} 73%|███████▎ | 4190/5773 [2:39:38<2:25:23, 5.51s/it] 73%|███████▎ | 4191/5773 [2:39:44<2:24:49, 5.49s/it] 73%|███████▎ | 4191/5773 [2:39:46<2:24:50, 5.49s/it] {'loss': 0.5672, 'learning_rate': 3.6877013664911588e-06, 'epoch': 0.73} 73%|███████▎ | 4191/5773 [2:39:46<2:24:50, 5.49s/it] {'loss': 0.5672, 'learning_rate': 3.6877013664911588e-06, 'epoch': 0.73} 73%|███████▎ | 4191/5773 [2:39:44<2:24:49, 5.49s/it] 73%|███████▎ | 4192/5773 [2:39:52<2:26:06, 5.54s/it] 73%|███████▎ | 4192/5773 [2:39:50<2:26:06, 5.54s/it] {'loss': 0.5516, 'learning_rate': 3.6833505009421867e-06, 'epoch': 0.73} 73%|███████▎ | 4192/5773 [2:39:52<2:26:06, 5.54s/it] {'loss': 0.5516, 'learning_rate': 3.6833505009421867e-06, 'epoch': 0.73} 73%|███████▎ | 4192/5773 [2:39:50<2:26:06, 5.54s/it] 73%|███████▎ | 4193/5773 [2:39:57<2:24:49, 5.50s/it] 73%|███████▎ | 4193/5773 [2:39:55<2:24:49, 5.50s/it] {'loss': 0.5632, 'learning_rate': 3.6790016240762895e-06, 'epoch': 0.73} 73%|███████▎ | 4193/5773 [2:39:57<2:24:49, 5.50s/it] {'loss': 0.5632, 'learning_rate': 3.6790016240762895e-06, 'epoch': 0.73} 73%|███████▎ | 4193/5773 [2:39:55<2:24:49, 
5.50s/it] 73%|███████▎ | 4194/5773 [2:40:00<2:24:30, 5.49s/it] 73%|███████▎ | 4194/5773 [2:40:02<2:24:31, 5.49s/it] {'loss': 0.5738, 'learning_rate': 3.6746547372626384e-06, 'epoch': 0.73} 73%|███████▎ | 4194/5773 [2:40:02<2:24:31, 5.49s/it] {'loss': 0.5738, 'learning_rate': 3.6746547372626384e-06, 'epoch': 0.73} 73%|███████▎ | 4194/5773 [2:40:00<2:24:30, 5.49s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5158 > 4096). Running this sequence through the model will result in indexing errors 73%|███████▎ | 4195/5773 [2:40:06<2:24:32, 5.50s/it] 73%|███████▎ | 4195/5773 [2:40:08<2:24:32, 5.50s/it] {'loss': 0.5736, 'learning_rate': 3.6703098418697735e-06, 'epoch': 0.73} 73%|███████▎ | 4195/5773 [2:40:08<2:24:32, 5.50s/it] {'loss': 0.5736, 'learning_rate': 3.6703098418697735e-06, 'epoch': 0.73} 73%|███████▎ | 4195/5773 [2:40:06<2:24:32, 5.50s/it] 73%|███████▎ | 4196/5773 [2:40:11<2:23:22, 5.45s/it] 73%|███████▎ | 4196/5773 [2:40:13<2:23:22, 5.45s/it] {'loss': 0.557, 'learning_rate': 3.6659669392655962e-06, 'epoch': 0.73} 73%|███████▎ | 4196/5773 [2:40:13<2:23:22, 5.45s/it] {'loss': 0.557, 'learning_rate': 3.6659669392655962e-06, 'epoch': 0.73} 73%|███████▎ | 4196/5773 [2:40:11<2:23:22, 5.45s/it] 73%|███████▎ | 4197/5773 [2:40:17<2:23:27, 5.46s/it] 73%|███████▎ | 4197/5773 [2:40:19<2:23:27, 5.46s/it] {'loss': 0.5598, 'learning_rate': 3.6616260308174e-06, 'epoch': 0.73} 73%|███████▎ | 4197/5773 [2:40:19<2:23:27, 5.46s/it] {'loss': 0.5598, 'learning_rate': 3.6616260308174e-06, 'epoch': 0.73} 73%|███████▎ | 4197/5773 [2:40:17<2:23:27, 5.46s/it] 73%|███████▎ | 4198/5773 [2:40:22<2:23:36, 5.47s/it] 73%|███████▎ | 4198/5773 [2:40:24<2:23:36, 5.47s/it] {'loss': 0.576, 'learning_rate': 3.657287117891839e-06, 'epoch': 0.73} 73%|███████▎ | 4198/5773 [2:40:24<2:23:36, 5.47s/it] {'loss': 0.576, 'learning_rate': 3.657287117891839e-06, 'epoch': 0.73} 73%|███████▎ | 4198/5773 [2:40:22<2:23:36, 5.47s/it] 73%|███████▎ | 4199/5773 
[2:40:28<2:22:47, 5.44s/it] 73%|███████▎ | 4199/5773 [2:40:30<2:22:47, 5.44s/it] {'loss': 0.5706, 'learning_rate': 3.652950201854939e-06, 'epoch': 0.73} 73%|███████▎ | 4199/5773 [2:40:30<2:22:47, 5.44s/it] {'loss': 0.5706, 'learning_rate': 3.652950201854939e-06, 'epoch': 0.73} 73%|███████▎ | 4199/5773 [2:40:28<2:22:47, 5.44s/it]1 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 011 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 73%|███████▎ | 4200/5773 [2:40:35<2:23:51, 5.49s/it]12 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 73%|███████▎ | 4200/5773 [2:40:33<2:23:52, 5.49s/it]10 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5585, 'learning_rate': 3.6486152840721046e-06, 'epoch': 0.73} 73%|███████▎ | 4200/5773 [2:40:35<2:23:51, 5.49s/it] {'loss': 0.5585, 'learning_rate': 3.6486152840721046e-06, 'epoch': 0.73} 73%|███████▎ | 4200/5773 [2:40:33<2:23:52, 5.49s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 73%|███████▎ | 4201/5773 [2:40:56<4:20:37, 9.95s/it] 73%|███████▎ | 4201/5773 [2:40:54<4:20:37, 9.95s/it] {'loss': 0.5619, 'learning_rate': 3.644282365908105e-06, 'epoch': 0.73} 73%|███████▎ | 4201/5773 [2:40:56<4:20:37, 9.95s/it] {'loss': 0.5619, 'learning_rate': 3.644282365908105e-06, 'epoch': 0.73} 73%|███████▎ | 4201/5773 [2:40:54<4:20:37, 9.95s/it] 73%|███████▎ | 4202/5773 [2:41:01<3:44:24, 8.57s/it] 73%|███████▎ | 4202/5773 [2:40:59<3:44:24, 8.57s/it] {'loss': 0.5715, 'learning_rate': 3.639951448727076e-06, 'epoch': 0.73} 73%|███████▎ | 4202/5773 [2:41:01<3:44:24, 8.57s/it] {'loss': 0.5715, 'learning_rate': 3.639951448727076e-06, 'epoch': 0.73} 73%|███████▎ | 4202/5773 [2:40:59<3:44:24, 8.57s/it] 73%|███████▎ | 4203/5773 [2:41:05<3:20:47, 7.67s/it] 73%|███████▎ | 4203/5773 [2:41:06<3:20:47, 7.67s/it] {'loss': 0.5604, 'learning_rate': 3.6356225338925367e-06, 'epoch': 0.73} 73%|███████▎ | 4203/5773 [2:41:06<3:20:47, 7.67s/it] {'loss': 0.5604, 'learning_rate': 3.6356225338925367e-06, 'epoch': 0.73} 73%|███████▎ | 4203/5773 [2:41:05<3:20:47, 7.67s/it] 73%|███████▎ | 4204/5773 [2:41:10<3:04:32, 7.06s/it] 73%|███████▎ | 4204/5773 [2:41:12<3:04:32, 7.06s/it] {'loss': 0.5628, 'learning_rate': 3.6312956227673647e-06, 'epoch': 0.73} 73%|███████▎ | 4204/5773 [2:41:12<3:04:32, 7.06s/it] {'loss': 0.5628, 'learning_rate': 3.6312956227673647e-06, 'epoch': 0.73} 73%|███████▎ | 4204/5773 [2:41:10<3:04:32, 7.06s/it] 73%|███████▎ | 4205/5773 [2:41:16<2:52:15, 6.59s/it] 73%|███████▎ | 4205/5773 [2:41:18<2:52:15, 6.59s/it] {'loss': 0.5713, 'learning_rate': 3.626970716713809e-06, 'epoch': 0.73} 73%|███████▎ | 4205/5773 [2:41:18<2:52:15, 6.59s/it] {'loss': 0.5713, 'learning_rate': 3.626970716713809e-06, 'epoch': 0.73} 73%|███████▎ | 4205/5773 [2:41:16<2:52:15, 6.59s/it] 73%|███████▎ | 4206/5773 [2:41:23<2:43:39, 6.27s/it] 73%|███████▎ | 4206/5773 [2:41:21<2:43:39, 6.27s/it]{'loss': 0.5691, 'learning_rate': 3.6226478170934866e-06, 'epoch': 0.73} {'loss': 0.5691, 
'learning_rate': 3.6226478170934866e-06, 'epoch': 0.73} 73%|███████▎ | 4206/5773 [2:41:23<2:43:39, 6.27s/it] 73%|███████▎ | 4206/5773 [2:41:21<2:43:39, 6.27s/it] 73%|███████▎ | 4207/5773 [2:41:27<2:37:23, 6.03s/it] 73%|███████▎ | 4207/5773 [2:41:29<2:37:23, 6.03s/it] {'loss': 0.5399, 'learning_rate': 3.618326925267388e-06, 'epoch': 0.73} 73%|███████▎ | 4207/5773 [2:41:29<2:37:23, 6.03s/it] {'loss': 0.5399, 'learning_rate': 3.618326925267388e-06, 'epoch': 0.73} 73%|███████▎ | 4207/5773 [2:41:27<2:37:23, 6.03s/it] 73%|███████▎ | 4208/5773 [2:41:34<2:33:02, 5.87s/it] 73%|███████▎ | 4208/5773 [2:41:32<2:33:02, 5.87s/it] {'loss': 0.5716, 'learning_rate': 3.6140080425958667e-06, 'epoch': 0.73} 73%|███████▎ | 4208/5773 [2:41:34<2:33:02, 5.87s/it] {'loss': 0.5716, 'learning_rate': 3.6140080425958667e-06, 'epoch': 0.73} 73%|███████▎ | 4208/5773 [2:41:32<2:33:02, 5.87s/it] 73%|███████▎ | 4209/5773 [2:41:38<2:29:22, 5.73s/it] 73%|███████▎ | 4209/5773 [2:41:39<2:29:22, 5.73s/it] {'loss': 0.5518, 'learning_rate': 3.609691170438645e-06, 'epoch': 0.73} 73%|███████▎ | 4209/5773 [2:41:39<2:29:22, 5.73s/it] {'loss': 0.5518, 'learning_rate': 3.609691170438645e-06, 'epoch': 0.73} 73%|███████▎ | 4209/5773 [2:41:38<2:29:22, 5.73s/it] 73%|███████▎ | 4210/5773 [2:41:45<2:26:59, 5.64s/it] 73%|███████▎ | 4210/5773 [2:41:43<2:27:00, 5.64s/it] {'loss': 0.5452, 'learning_rate': 3.6053763101548123e-06, 'epoch': 0.73} 73%|███████▎ | 4210/5773 [2:41:45<2:26:59, 5.64s/it] {'loss': 0.5452, 'learning_rate': 3.6053763101548123e-06, 'epoch': 0.73} 73%|███████▎ | 4210/5773 [2:41:43<2:27:00, 5.64s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 73%|███████▎ | 4211/5773 [2:41:51<2:27:06, 5.65s/it] 73%|███████▎ | 4211/5773 [2:41:49<2:27:07, 5.65s/it] {'loss': 0.5512, 'learning_rate': 3.601063463102823e-06, 'epoch': 0.73} 73%|███████▎ | 4211/5773 [2:41:51<2:27:06, 5.65s/it] {'loss': 0.5512, 'learning_rate': 3.601063463102823e-06, 'epoch': 0.73} 73%|███████▎ | 4211/5773 [2:41:49<2:27:07, 5.65s/it] 73%|███████▎ | 4212/5773 [2:41:56<2:25:41, 5.60s/it] 73%|███████▎ | 4212/5773 [2:41:54<2:25:41, 5.60s/it] {'loss': 0.5663, 'learning_rate': 3.5967526306404974e-06, 'epoch': 0.73} 73%|███████▎ | 4212/5773 [2:41:56<2:25:41, 5.60s/it] {'loss': 0.5663, 'learning_rate': 3.5967526306404974e-06, 'epoch': 0.73} 73%|███████▎ | 4212/5773 [2:41:54<2:25:41, 5.60s/it] 73%|███████▎ | 4213/5773 [2:42:00<2:24:47, 5.57s/it] 73%|███████▎ | 4213/5773 [2:42:02<2:24:47, 5.57s/it] {'loss': 0.5748, 'learning_rate': 3.5924438141250297e-06, 'epoch': 0.73} 73%|███████▎ | 4213/5773 [2:42:02<2:24:47, 5.57s/it] {'loss': 0.5748, 'learning_rate': 3.5924438141250297e-06, 'epoch': 0.73} 73%|███████▎ | 4213/5773 [2:42:00<2:24:47, 5.57s/it] 73%|███████▎ | 4214/5773 [2:42:05<2:23:59, 5.54s/it] 73%|███████▎ | 4214/5773 [2:42:07<2:23:59, 5.54s/it] {'loss': 0.5645, 'learning_rate': 3.588137014912968e-06, 'epoch': 0.73} 73%|███████▎ | 4214/5773 [2:42:07<2:23:59, 5.54s/it] {'loss': 0.5645, 'learning_rate': 3.588137014912968e-06, 'epoch': 0.73} 73%|███████▎ | 4214/5773 [2:42:05<2:23:59, 5.54s/it] 73%|███████▎ | 4215/5773 [2:42:11<2:23:25, 5.52s/it] 73%|███████▎ | 4215/5773 [2:42:13<2:23:25, 5.52s/it] {'loss': 0.5653, 'learning_rate': 3.5838322343602293e-06, 'epoch': 0.73} 73%|███████▎ | 4215/5773 [2:42:13<2:23:25, 5.52s/it] {'loss': 0.5653, 'learning_rate': 3.5838322343602293e-06, 'epoch': 0.73} 73%|███████▎ | 4215/5773 [2:42:11<2:23:25, 5.52s/it] 73%|███████▎ | 4216/5773 [2:42:16<2:24:33, 5.57s/it] 73%|███████▎ | 4216/5773 [2:42:18<2:24:33, 5.57s/it] {'loss': 0.564, 'learning_rate': 3.579529473822102e-06, 'epoch': 0.73} 
73%|███████▎ | 4216/5773 [2:42:18<2:24:33, 5.57s/it] {'loss': 0.564, 'learning_rate': 3.579529473822102e-06, 'epoch': 0.73} 73%|███████▎ | 4216/5773 [2:42:16<2:24:33, 5.57s/it] 73%|███████▎ | 4217/5773 [2:42:22<2:23:18, 5.53s/it] 73%|███████▎ | 4217/5773 [2:42:24<2:23:18, 5.53s/it] {'loss': 0.5521, 'learning_rate': 3.5752287346532288e-06, 'epoch': 0.73} 73%|███████▎ | 4217/5773 [2:42:24<2:23:18, 5.53s/it] {'loss': 0.5521, 'learning_rate': 3.5752287346532288e-06, 'epoch': 0.73} 73%|███████▎ | 4217/5773 [2:42:22<2:23:18, 5.53s/it] 73%|███████▎ | 4218/5773 [2:42:29<2:22:34, 5.50s/it] 73%|███████▎ | 4218/5773 [2:42:27<2:22:34, 5.50s/it] {'loss': 0.5665, 'learning_rate': 3.570930018207619e-06, 'epoch': 0.73} 73%|███████▎ | 4218/5773 [2:42:29<2:22:34, 5.50s/it] {'loss': 0.5665, 'learning_rate': 3.570930018207619e-06, 'epoch': 0.73} 73%|███████▎ | 4218/5773 [2:42:27<2:22:34, 5.50s/it] 73%|███████▎ | 4219/5773 [2:42:33<2:22:12, 5.49s/it] 73%|███████▎ | 4219/5773 [2:42:35<2:22:12, 5.49s/it] {'loss': 0.5596, 'learning_rate': 3.5666333258386443e-06, 'epoch': 0.73} 73%|███████▎ | 4219/5773 [2:42:35<2:22:12, 5.49s/it] {'loss': 0.5596, 'learning_rate': 3.5666333258386443e-06, 'epoch': 0.73} 73%|███████▎ | 4219/5773 [2:42:33<2:22:12, 5.49s/it] 73%|███████▎ | 4220/5773 [2:42:40<2:22:01, 5.49s/it] 73%|███████▎ | 4220/5773 [2:42:38<2:22:01, 5.49s/it] {'loss': 0.5465, 'learning_rate': 3.562338658899047e-06, 'epoch': 0.73} 73%|███████▎ | 4220/5773 [2:42:40<2:22:01, 5.49s/it] {'loss': 0.5465, 'learning_rate': 3.562338658899047e-06, 'epoch': 0.73} 73%|███████▎ | 4220/5773 [2:42:38<2:22:01, 5.49s/it] 73%|███████▎ | 4221/5773 [2:42:43<2:21:30, 5.47s/it] 73%|███████▎ | 4221/5773 [2:42:45<2:21:30, 5.47s/it] {'loss': 0.5608, 'learning_rate': 3.5580460187409217e-06, 'epoch': 0.73} 73%|███████▎ | 4221/5773 [2:42:45<2:21:30, 5.47s/it] {'loss': 0.5608, 'learning_rate': 3.5580460187409217e-06, 'epoch': 0.73} 73%|███████▎ | 4221/5773 [2:42:43<2:21:30, 5.47s/it] 73%|███████▎ | 4222/5773 
[2:42:51<2:21:40, 5.48s/it] 73%|███████▎ | 4222/5773 [2:42:49<2:21:40, 5.48s/it] {'loss': 0.5519, 'learning_rate': 3.553755406715724e-06, 'epoch': 0.73} 73%|███████▎ | 4222/5773 [2:42:51<2:21:40, 5.48s/it] {'loss': 0.5519, 'learning_rate': 3.553755406715724e-06, 'epoch': 0.73} 73%|███████▎ | 4222/5773 [2:42:49<2:21:40, 5.48s/it] 73%|███████▎ | 4223/5773 [2:42:56<2:21:11, 5.47s/it] 73%|███████▎ | 4223/5773 [2:42:54<2:21:11, 5.47s/it] {'loss': 0.5657, 'learning_rate': 3.549466824174288e-06, 'epoch': 0.73} 73%|███████▎ | 4223/5773 [2:42:56<2:21:11, 5.47s/it] {'loss': 0.5657, 'learning_rate': 3.549466824174288e-06, 'epoch': 0.73} 73%|███████▎ | 4223/5773 [2:42:54<2:21:11, 5.47s/it] 73%|███████▎ | 4224/5773 [2:43:02<2:21:50, 5.49s/it] 73%|███████▎ | 4224/5773 [2:43:00<2:21:50, 5.49s/it] {'loss': 0.5708, 'learning_rate': 3.5451802724667863e-06, 'epoch': 0.73} 73%|███████▎ | 4224/5773 [2:43:02<2:21:50, 5.49s/it] {'loss': 0.5708, 'learning_rate': 3.5451802724667863e-06, 'epoch': 0.73} 73%|███████▎ | 4224/5773 [2:43:00<2:21:50, 5.49s/it] 73%|███████▎ | 4225/5773 [2:43:08<2:23:15, 5.55s/it] 73%|███████▎ | 4225/5773 [2:43:06<2:23:15, 5.55s/it] {'loss': 0.561, 'learning_rate': 3.54089575294276e-06, 'epoch': 0.73} 73%|███████▎ | 4225/5773 [2:43:08<2:23:15, 5.55s/it] {'loss': 0.561, 'learning_rate': 3.54089575294276e-06, 'epoch': 0.73} 73%|███████▎ | 4225/5773 [2:43:06<2:23:15, 5.55s/it] 73%|███████▎ | 4226/5773 [2:43:13<2:21:48, 5.50s/it] 73%|███████▎ | 4226/5773 [2:43:11<2:21:48, 5.50s/it] {'loss': 0.5643, 'learning_rate': 3.536613266951122e-06, 'epoch': 0.73} 73%|███████▎ | 4226/5773 [2:43:13<2:21:48, 5.50s/it] {'loss': 0.5643, 'learning_rate': 3.536613266951122e-06, 'epoch': 0.73} 73%|███████▎ | 4226/5773 [2:43:11<2:21:48, 5.50s/it] 73%|███████▎ | 4227/5773 [2:43:16<2:21:03, 5.47s/it] 73%|███████▎ | 4227/5773 [2:43:18<2:21:03, 5.47s/it] {'loss': 0.5479, 'learning_rate': 3.5323328158401315e-06, 'epoch': 0.73} 73%|███████▎ | 4227/5773 [2:43:18<2:21:03, 5.47s/it] {'loss': 
0.5479, 'learning_rate': 3.5323328158401315e-06, 'epoch': 0.73} 73%|███████▎ | 4227/5773 [2:43:16<2:21:03, 5.47s/it] 73%|███████▎ | 4228/5773 [2:43:22<2:20:37, 5.46s/it] 73%|███████▎ | 4228/5773 [2:43:24<2:20:37, 5.46s/it] {'loss': 0.5557, 'learning_rate': 3.5280544009574068e-06, 'epoch': 0.73} 73%|███████▎ | 4228/5773 [2:43:24<2:20:37, 5.46s/it] {'loss': 0.5557, 'learning_rate': 3.5280544009574068e-06, 'epoch': 0.73} 73%|███████▎ | 4228/5773 [2:43:22<2:20:37, 5.46s/it] 73%|███████▎ | 4229/5773 [2:43:27<2:19:02, 5.40s/it] 73%|███████▎ | 4229/5773 [2:43:29<2:19:02, 5.40s/it] {'loss': 0.5662, 'learning_rate': 3.5237780236499386e-06, 'epoch': 0.73} 73%|███████▎ | 4229/5773 [2:43:29<2:19:02, 5.40s/it] {'loss': 0.5662, 'learning_rate': 3.5237780236499386e-06, 'epoch': 0.73} 73%|███████▎ | 4229/5773 [2:43:27<2:19:02, 5.40s/it] 73%|███████▎ | 4230/5773 [2:43:33<2:19:45, 5.43s/it] 73%|███████▎ | 4230/5773 [2:43:35<2:19:45, 5.43s/it] {'loss': 0.5671, 'learning_rate': 3.5195036852640617e-06, 'epoch': 0.73} 73%|███████▎ | 4230/5773 [2:43:35<2:19:45, 5.43s/it] {'loss': 0.5671, 'learning_rate': 3.5195036852640617e-06, 'epoch': 0.73} 73%|███████▎ | 4230/5773 [2:43:33<2:19:45, 5.43s/it] 73%|███████▎ | 4231/5773 [2:43:38<2:19:20, 5.42s/it] 73%|███████▎ | 4231/5773 [2:43:40<2:19:20, 5.42s/it] {'loss': 0.5597, 'learning_rate': 3.5152313871454756e-06, 'epoch': 0.73} 73%|███████▎ | 4231/5773 [2:43:40<2:19:20, 5.42s/it] {'loss': 0.5597, 'learning_rate': 3.5152313871454756e-06, 'epoch': 0.73} 73%|███████▎ | 4231/5773 [2:43:38<2:19:20, 5.42s/it] 73%|███████▎ | 4232/5773 [2:43:45<2:18:57, 5.41s/it] 73%|███████▎ | 4232/5773 [2:43:43<2:18:57, 5.41s/it] {'loss': 0.5613, 'learning_rate': 3.5109611306392322e-06, 'epoch': 0.73} 73%|███████▎ | 4232/5773 [2:43:45<2:18:57, 5.41s/it] {'loss': 0.5613, 'learning_rate': 3.5109611306392322e-06, 'epoch': 0.73} 73%|███████▎ | 4232/5773 [2:43:43<2:18:57, 5.41s/it] 73%|███████▎ | 4233/5773 [2:43:49<2:20:38, 5.48s/it] 73%|███████▎ | 4233/5773 
[2:43:51<2:20:38, 5.48s/it] {'loss': 0.5583, 'learning_rate': 3.506692917089751e-06, 'epoch': 0.73} 73%|███████▎ | 4233/5773 [2:43:51<2:20:38, 5.48s/it] {'loss': 0.5583, 'learning_rate': 3.506692917089751e-06, 'epoch': 0.73} 73%|███████▎ | 4233/5773 [2:43:49<2:20:38, 5.48s/it] 73%|███████▎ | 4234/5773 [2:43:57<2:41:26, 6.29s/it] 73%|███████▎ | 4234/5773 [2:43:59<2:41:26, 6.29s/it] {'loss': 0.5477, 'learning_rate': 3.502426747840799e-06, 'epoch': 0.73} 73%|███████▎ | 4234/5773 [2:43:59<2:41:26, 6.29s/it] {'loss': 0.5477, 'learning_rate': 3.502426747840799e-06, 'epoch': 0.73} 73%|███████▎ | 4234/5773 [2:43:57<2:41:26, 6.29s/it] 73%|███████▎ | 4235/5773 [2:44:03<2:35:29, 6.07s/it] 73%|███████▎ | 4235/5773 [2:44:05<2:35:29, 6.07s/it] {'loss': 0.551, 'learning_rate': 3.4981626242355003e-06, 'epoch': 0.73} 73%|███████▎ | 4235/5773 [2:44:05<2:35:29, 6.07s/it] {'loss': 0.551, 'learning_rate': 3.4981626242355003e-06, 'epoch': 0.73} 73%|███████▎ | 4235/5773 [2:44:03<2:35:29, 6.07s/it] 73%|███████▎ | 4236/5773 [2:44:10<2:30:23, 5.87s/it] 73%|███████▎ | 4236/5773 [2:44:08<2:30:23, 5.87s/it] {'loss': 0.5564, 'learning_rate': 3.4939005476163436e-06, 'epoch': 0.73} 73%|███████▎ | 4236/5773 [2:44:10<2:30:23, 5.87s/it] {'loss': 0.5564, 'learning_rate': 3.4939005476163436e-06, 'epoch': 0.73} 73%|███████▎ | 4236/5773 [2:44:08<2:30:23, 5.87s/it] 73%|███████▎ | 4237/5773 [2:44:16<2:27:23, 5.76s/it] 73%|███████▎ | 4237/5773 [2:44:14<2:27:23, 5.76s/it] {'loss': 0.5648, 'learning_rate': 3.489640519325166e-06, 'epoch': 0.73} 73%|███████▎ | 4237/5773 [2:44:16<2:27:23, 5.76s/it] {'loss': 0.5648, 'learning_rate': 3.489640519325166e-06, 'epoch': 0.73} 73%|███████▎ | 4237/5773 [2:44:14<2:27:23, 5.76s/it] 73%|███████▎ | 4238/5773 [2:44:19<2:24:38, 5.65s/it] 73%|███████▎ | 4238/5773 [2:44:21<2:24:38, 5.65s/it] {'loss': 0.5705, 'learning_rate': 3.4853825407031503e-06, 'epoch': 0.73} 73%|███████▎ | 4238/5773 [2:44:21<2:24:38, 5.65s/it] {'loss': 0.5705, 'learning_rate': 3.4853825407031503e-06, 
'epoch': 0.73} 73%|███████▎ | 4238/5773 [2:44:19<2:24:38, 5.65s/it] 73%|███████▎ | 4239/5773 [2:44:27<2:22:45, 5.58s/it] 73%|███████▎ | 4239/5773 [2:44:25<2:22:45, 5.58s/it] {'loss': 0.5526, 'learning_rate': 3.481126613090855e-06, 'epoch': 0.73} 73%|███████▎ | 4239/5773 [2:44:27<2:22:45, 5.58s/it] {'loss': 0.5526, 'learning_rate': 3.481126613090855e-06, 'epoch': 0.73} 73%|███████▎ | 4239/5773 [2:44:25<2:22:45, 5.58s/it] 73%|███████▎ | 4240/5773 [2:44:32<2:21:42, 5.55s/it] 73%|███████▎ | 4240/5773 [2:44:30<2:21:42, 5.55s/it] {'loss': 0.5636, 'learning_rate': 3.4768727378281774e-06, 'epoch': 0.73} 73%|███████▎ | 4240/5773 [2:44:32<2:21:42, 5.55s/it] {'loss': 0.5636, 'learning_rate': 3.4768727378281774e-06, 'epoch': 0.73} 73%|███████▎ | 4240/5773 [2:44:30<2:21:42, 5.55s/it] 73%|███████▎ | 4241/5773 [2:44:37<2:19:34, 5.47s/it] 73%|███████▎ | 4241/5773 [2:44:35<2:19:34, 5.47s/it] {'loss': 0.5589, 'learning_rate': 3.472620916254372e-06, 'epoch': 0.73} 73%|███████▎ | 4241/5773 [2:44:37<2:19:34, 5.47s/it] {'loss': 0.5589, 'learning_rate': 3.472620916254372e-06, 'epoch': 0.73} 73%|███████▎ | 4241/5773 [2:44:35<2:19:34, 5.47s/it] 73%|███████▎ | 4242/5773 [2:44:43<2:20:21, 5.50s/it] 73%|███████▎ | 4242/5773 [2:44:41<2:20:21, 5.50s/it] {'loss': 0.5689, 'learning_rate': 3.4683711497080543e-06, 'epoch': 0.73} 73%|███████▎ | 4242/5773 [2:44:43<2:20:21, 5.50s/it] {'loss': 0.5689, 'learning_rate': 3.4683711497080543e-06, 'epoch': 0.73} 73%|███████▎ | 4242/5773 [2:44:41<2:20:21, 5.50s/it] 73%|███████▎ | 4243/5773 [2:44:49<2:40:20, 6.29s/it] 73%|███████▎ | 4243/5773 [2:44:51<2:40:20, 6.29s/it] {'loss': 0.5493, 'learning_rate': 3.4641234395271827e-06, 'epoch': 0.73} 73%|███████▎ | 4243/5773 [2:44:51<2:40:20, 6.29s/it] {'loss': 0.5493, 'learning_rate': 3.4641234395271827e-06, 'epoch': 0.73} 73%|███████▎ | 4243/5773 [2:44:49<2:40:20, 6.29s/it] 74%|███████▎ | 4244/5773 [2:44:54<2:34:03, 6.05s/it] 74%|███████▎ | 4244/5773 [2:44:56<2:34:03, 6.05s/it] {'loss': 0.5547, 'learning_rate': 
3.4598777870490717e-06, 'epoch': 0.74} 74%|███████▎ | 4244/5773 [2:44:56<2:34:03, 6.05s/it] {'loss': 0.5547, 'learning_rate': 3.4598777870490717e-06, 'epoch': 0.74} 74%|███████▎ | 4244/5773 [2:44:54<2:34:03, 6.05s/it] 74%|███████▎ | 4245/5773 [2:45:00<2:29:53, 5.89s/it] 74%|███████▎ | 4245/5773 [2:45:02<2:29:53, 5.89s/it] {'loss': 0.5546, 'learning_rate': 3.4556341936103853e-06, 'epoch': 0.74} 74%|███████▎ | 4245/5773 [2:45:02<2:29:53, 5.89s/it] {'loss': 0.5546, 'learning_rate': 3.4556341936103853e-06, 'epoch': 0.74} 74%|███████▎ | 4245/5773 [2:45:00<2:29:53, 5.89s/it] 74%|███████▎ | 4246/5773 [2:45:05<2:26:02, 5.74s/it] 74%|███████▎ | 4246/5773 [2:45:07<2:26:02, 5.74s/it] {'loss': 0.5487, 'learning_rate': 3.4513926605471504e-06, 'epoch': 0.74} 74%|███████▎ | 4246/5773 [2:45:07<2:26:02, 5.74s/it] {'loss': 0.5487, 'learning_rate': 3.4513926605471504e-06, 'epoch': 0.74} 74%|███████▎ | 4246/5773 [2:45:05<2:26:02, 5.74s/it] 74%|███████▎ | 4247/5773 [2:45:13<2:24:13, 5.67s/it] 74%|███████▎ | 4247/5773 [2:45:11<2:24:13, 5.67s/it] {'loss': 0.5479, 'learning_rate': 3.447153189194731e-06, 'epoch': 0.74} 74%|███████▎ | 4247/5773 [2:45:13<2:24:13, 5.67s/it] {'loss': 0.5479, 'learning_rate': 3.447153189194731e-06, 'epoch': 0.74} 74%|███████▎ | 4247/5773 [2:45:11<2:24:13, 5.67s/it] 74%|███████▎ | 4248/5773 [2:45:16<2:21:38, 5.57s/it] 74%|███████▎ | 4248/5773 [2:45:18<2:21:38, 5.57s/it] {'loss': 0.5538, 'learning_rate': 3.442915780887848e-06, 'epoch': 0.74} 74%|███████▎ | 4248/5773 [2:45:18<2:21:38, 5.57s/it] {'loss': 0.5538, 'learning_rate': 3.442915780887848e-06, 'epoch': 0.74} 74%|███████▎ | 4248/5773 [2:45:16<2:21:38, 5.57s/it] 74%|███████▎ | 4249/5773 [2:45:24<2:19:51, 5.51s/it] 74%|███████▎ | 4249/5773 [2:45:22<2:19:51, 5.51s/it] {'loss': 0.5679, 'learning_rate': 3.4386804369605764e-06, 'epoch': 0.74} 74%|███████▎ | 4249/5773 [2:45:24<2:19:51, 5.51s/it] {'loss': 0.5679, 'learning_rate': 3.4386804369605764e-06, 'epoch': 0.74} 74%|███████▎ | 4249/5773 [2:45:22<2:19:51, 
5.51s/it]11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 31 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 74%|███████▎ | 4250/5773 [2:45:29<2:19:31, 5.50s/it]4 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 74%|███████▎ | 4250/5773 [2:45:27<2:19:31, 5.50s/it] {'loss': 0.5598, 'learning_rate': 3.434447158746336e-06, 'epoch': 0.74} 74%|███████▎ | 4250/5773 [2:45:27<2:19:31, 5.50s/it]{'loss': 0.5598, 'learning_rate': 3.434447158746336e-06, 'epoch': 0.74} 74%|███████▎ | 4250/5773 [2:45:29<2:19:31, 5.50s/it] 74%|███████▎ | 4251/5773 [2:45:37<2:38:33, 6.25s/it] 74%|███████▎ | 4251/5773 [2:45:35<2:38:33, 6.25s/it] {'loss': 0.5478, 'learning_rate': 3.4302159475778985e-06, 'epoch': 0.74} 74%|███████▎ | 4251/5773 [2:45:37<2:38:33, 6.25s/it] {'loss': 0.5478, 'learning_rate': 3.4302159475778985e-06, 'epoch': 0.74} 74%|███████▎ | 4251/5773 [2:45:35<2:38:33, 6.25s/it] 74%|███████▎ | 4252/5773 [2:45:43<2:33:33, 6.06s/it] 74%|███████▎ | 4252/5773 [2:45:41<2:33:33, 6.06s/it] {'loss': 0.5855, 'learning_rate': 3.4259868047873835e-06, 'epoch': 0.74} 74%|███████▎ | 4252/5773 [2:45:43<2:33:33, 6.06s/it] {'loss': 0.5855, 'learning_rate': 3.4259868047873835e-06, 'epoch': 0.74} 74%|███████▎ | 4252/5773 [2:45:41<2:33:33, 6.06s/it] 74%|███████▎ | 4253/5773 [2:45:51<2:48:10, 6.64s/it] 74%|███████▎ | 4253/5773 [2:45:49<2:48:10, 6.64s/it] {'loss': 0.5576, 
'learning_rate': 3.4217597317062613e-06, 'epoch': 0.74} 74%|███████▎ | 4253/5773 [2:45:51<2:48:10, 6.64s/it] {'loss': 0.5576, 'learning_rate': 3.4217597317062613e-06, 'epoch': 0.74} 74%|███████▎ | 4253/5773 [2:45:49<2:48:10, 6.64s/it] 74%|███████▎ | 4254/5773 [2:45:56<2:38:33, 6.26s/it] 74%|███████▎ | 4254/5773 [2:45:54<2:38:33, 6.26s/it] {'loss': 0.5534, 'learning_rate': 3.4175347296653448e-06, 'epoch': 0.74} 74%|███████▎ | 4254/5773 [2:45:56<2:38:33, 6.26s/it] {'loss': 0.5534, 'learning_rate': 3.4175347296653448e-06, 'epoch': 0.74} 74%|███████▎ | 4254/5773 [2:45:54<2:38:33, 6.26s/it] 74%|███████▎ | 4255/5773 [2:46:01<2:32:17, 6.02s/it] 74%|███████▎ | 4255/5773 [2:46:00<2:32:17, 6.02s/it] {'loss': 0.5662, 'learning_rate': 3.4133117999948086e-06, 'epoch': 0.74} 74%|███████▎ | 4255/5773 [2:46:01<2:32:17, 6.02s/it] {'loss': 0.5662, 'learning_rate': 3.4133117999948086e-06, 'epoch': 0.74} 74%|███████▎ | 4255/5773 [2:46:00<2:32:17, 6.02s/it] 74%|███████▎ | 4256/5773 [2:46:07<2:28:08, 5.86s/it] 74%|███████▎ | 4256/5773 [2:46:05<2:28:08, 5.86s/it] {'loss': 0.557, 'learning_rate': 3.4090909440241594e-06, 'epoch': 0.74} 74%|███████▎ | 4256/5773 [2:46:07<2:28:08, 5.86s/it] {'loss': 0.557, 'learning_rate': 3.4090909440241594e-06, 'epoch': 0.74} 74%|███████▎ | 4256/5773 [2:46:05<2:28:08, 5.86s/it] 74%|███████▎ | 4257/5773 [2:46:12<2:24:46, 5.73s/it] 74%|███████▎ | 4257/5773 [2:46:10<2:24:46, 5.73s/it] {'loss': 0.5433, 'learning_rate': 3.40487216308226e-06, 'epoch': 0.74} 74%|███████▎ | 4257/5773 [2:46:12<2:24:46, 5.73s/it] {'loss': 0.5433, 'learning_rate': 3.40487216308226e-06, 'epoch': 0.74} 74%|███████▎ | 4257/5773 [2:46:10<2:24:46, 5.73s/it] 74%|███████▍ | 4258/5773 [2:46:22<3:07:34, 7.43s/it] 74%|███████▍ | 4258/5773 [2:46:24<3:07:35, 7.43s/it] {'loss': 0.5556, 'learning_rate': 3.400655458497313e-06, 'epoch': 0.74} 74%|███████▍ | 4258/5773 [2:46:24<3:07:35, 7.43s/it] {'loss': 0.5556, 'learning_rate': 3.400655458497313e-06, 'epoch': 0.74} 74%|███████▍ | 4258/5773 
[2:46:22<3:07:34, 7.43s/it] 74%|███████▍ | 4259/5773 [2:46:27<2:52:00, 6.82s/it] 74%|███████▍ | 4259/5773 [2:46:29<2:52:01, 6.82s/it] {'loss': 0.5632, 'learning_rate': 3.3964408315968776e-06, 'epoch': 0.74} 74%|███████▍ | 4259/5773 [2:46:29<2:52:01, 6.82s/it] {'loss': 0.5632, 'learning_rate': 3.3964408315968776e-06, 'epoch': 0.74} 74%|███████▍ | 4259/5773 [2:46:27<2:52:00, 6.82s/it] 74%|███████▍ | 4260/5773 [2:46:35<2:43:46, 6.49s/it] 74%|███████▍ | 4260/5773 [2:46:33<2:43:46, 6.49s/it] {'loss': 0.564, 'learning_rate': 3.3922282837078524e-06, 'epoch': 0.74} 74%|███████▍ | 4260/5773 [2:46:35<2:43:46, 6.49s/it] {'loss': 0.564, 'learning_rate': 3.3922282837078524e-06, 'epoch': 0.74} 74%|███████▍ | 4260/5773 [2:46:33<2:43:46, 6.49s/it] 74%|███████▍ | 4261/5773 [2:46:41<2:36:47, 6.22s/it] 74%|███████▍ | 4261/5773 [2:46:39<2:36:47, 6.22s/it] {'loss': 0.5529, 'learning_rate': 3.388017816156476e-06, 'epoch': 0.74} 74%|███████▍ | 4261/5773 [2:46:41<2:36:47, 6.22s/it] {'loss': 0.5529, 'learning_rate': 3.388017816156476e-06, 'epoch': 0.74} 74%|███████▍ | 4261/5773 [2:46:39<2:36:47, 6.22s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5014 > 4096). 
Running this sequence through the model will result in indexing errors 74%|███████▍ | 4262/5773 [2:46:53<3:21:37, 8.01s/it] 74%|███████▍ | 4262/5773 [2:46:51<3:21:37, 8.01s/it] {'loss': 0.5698, 'learning_rate': 3.3838094302683453e-06, 'epoch': 0.74} 74%|███████▍ | 4262/5773 [2:46:53<3:21:37, 8.01s/it] {'loss': 0.5698, 'learning_rate': 3.3838094302683453e-06, 'epoch': 0.74} 74%|███████▍ | 4262/5773 [2:46:51<3:21:37, 8.01s/it] 74%|███████▍ | 4263/5773 [2:47:01<3:21:18, 8.00s/it] 74%|███████▍ | 4263/5773 [2:46:59<3:21:18, 8.00s/it] {'loss': 0.544, 'learning_rate': 3.3796031273683926e-06, 'epoch': 0.74} 74%|███████▍ | 4263/5773 [2:47:01<3:21:18, 8.00s/it] {'loss': 0.544, 'learning_rate': 3.3796031273683926e-06, 'epoch': 0.74} 74%|███████▍ | 4263/5773 [2:46:59<3:21:18, 8.00s/it] 74%|███████▍ | 4264/5773 [2:47:04<3:00:45, 7.19s/it] 74%|███████▍ | 4264/5773 [2:47:06<3:00:45, 7.19s/it] {'loss': 0.5546, 'learning_rate': 3.3753989087808926e-06, 'epoch': 0.74} 74%|███████▍ | 4264/5773 [2:47:06<3:00:45, 7.19s/it] {'loss': 0.5546, 'learning_rate': 3.3753989087808926e-06, 'epoch': 0.74} 74%|███████▍ | 4264/5773 [2:47:04<3:00:45, 7.19s/it] 74%|███████▍ | 4265/5773 [2:47:13<3:13:08, 7.68s/it] 74%|███████▍ | 4265/5773 [2:47:15<3:13:08, 7.68s/it] {'loss': 0.5634, 'learning_rate': 3.371196775829477e-06, 'epoch': 0.74} 74%|███████▍ | 4265/5773 [2:47:15<3:13:08, 7.68s/it] {'loss': 0.5634, 'learning_rate': 3.371196775829477e-06, 'epoch': 0.74} 74%|███████▍ | 4265/5773 [2:47:13<3:13:08, 7.68s/it] 74%|███████▍ | 4266/5773 [2:47:18<2:57:16, 7.06s/it] 74%|███████▍ | 4266/5773 [2:47:20<2:57:16, 7.06s/it] {'loss': 0.5531, 'learning_rate': 3.366996729837102e-06, 'epoch': 0.74} 74%|███████▍ | 4266/5773 [2:47:20<2:57:16, 7.06s/it] {'loss': 0.5531, 'learning_rate': 3.366996729837102e-06, 'epoch': 0.74} 74%|███████▍ | 4266/5773 [2:47:18<2:57:16, 7.06s/it] 74%|███████▍ | 4267/5773 [2:47:24<2:45:10, 6.58s/it] 74%|███████▍ | 4267/5773 [2:47:26<2:45:10, 6.58s/it] {'loss': 0.5577, 'learning_rate': 
3.3627987721260758e-06, 'epoch': 0.74} 74%|███████▍ | 4267/5773 [2:47:26<2:45:10, 6.58s/it] {'loss': 0.5577, 'learning_rate': 3.3627987721260758e-06, 'epoch': 0.74} 74%|███████▍ | 4267/5773 [2:47:24<2:45:10, 6.58s/it] 74%|███████▍ | 4268/5773 [2:47:31<2:37:32, 6.28s/it] 74%|███████▍ | 4268/5773 [2:47:29<2:37:33, 6.28s/it] {'loss': 0.5677, 'learning_rate': 3.358602904018057e-06, 'epoch': 0.74} 74%|███████▍ | 4268/5773 [2:47:31<2:37:32, 6.28s/it] {'loss': 0.5677, 'learning_rate': 3.358602904018057e-06, 'epoch': 0.74} 74%|███████▍ | 4268/5773 [2:47:29<2:37:33, 6.28s/it] 74%|███████▍ | 4269/5773 [2:47:35<2:31:19, 6.04s/it] 74%|███████▍ | 4269/5773 [2:47:37<2:31:19, 6.04s/it] {'loss': 0.5432, 'learning_rate': 3.354409126834034e-06, 'epoch': 0.74} 74%|███████▍ | 4269/5773 [2:47:37<2:31:19, 6.04s/it] {'loss': 0.5432, 'learning_rate': 3.354409126834034e-06, 'epoch': 0.74} 74%|███████▍ | 4269/5773 [2:47:35<2:31:19, 6.04s/it] 74%|███████▍ | 4270/5773 [2:47:40<2:27:14, 5.88s/it] 74%|███████▍ | 4270/5773 [2:47:42<2:27:14, 5.88s/it] {'loss': 0.5789, 'learning_rate': 3.3502174418943446e-06, 'epoch': 0.74} 74%|███████▍ | 4270/5773 [2:47:42<2:27:14, 5.88s/it] {'loss': 0.5789, 'learning_rate': 3.3502174418943446e-06, 'epoch': 0.74} 74%|███████▍ | 4270/5773 [2:47:40<2:27:14, 5.88s/it] 74%|███████▍ | 4271/5773 [2:47:48<2:25:27, 5.81s/it] 74%|███████▍ | 4271/5773 [2:47:46<2:25:27, 5.81s/it] {'loss': 0.5536, 'learning_rate': 3.3460278505186593e-06, 'epoch': 0.74} 74%|███████▍ | 4271/5773 [2:47:48<2:25:27, 5.81s/it] {'loss': 0.5536, 'learning_rate': 3.3460278505186593e-06, 'epoch': 0.74} 74%|███████▍ | 4271/5773 [2:47:46<2:25:27, 5.81s/it] 74%|███████▍ | 4272/5773 [2:47:52<2:23:24, 5.73s/it] 74%|███████▍ | 4272/5773 [2:47:54<2:23:24, 5.73s/it] {'loss': 0.5717, 'learning_rate': 3.3418403540260035e-06, 'epoch': 0.74} 74%|███████▍ | 4272/5773 [2:47:54<2:23:24, 5.73s/it] {'loss': 0.5717, 'learning_rate': 3.3418403540260035e-06, 'epoch': 0.74} 74%|███████▍ | 4272/5773 [2:47:52<2:23:24, 
5.73s/it] 74%|███████▍ | 4273/5773 [2:47:57<2:22:49, 5.71s/it] 74%|███████▍ | 4273/5773 [2:47:59<2:22:49, 5.71s/it] {'loss': 0.5636, 'learning_rate': 3.33765495373473e-06, 'epoch': 0.74} 74%|███████▍ | 4273/5773 [2:47:59<2:22:49, 5.71s/it] {'loss': 0.5636, 'learning_rate': 3.33765495373473e-06, 'epoch': 0.74} 74%|███████▍ | 4273/5773 [2:47:57<2:22:49, 5.71s/it] 74%|███████▍ | 4274/5773 [2:48:05<2:20:43, 5.63s/it] 74%|███████▍ | 4274/5773 [2:48:03<2:20:43, 5.63s/it] {'loss': 0.5678, 'learning_rate': 3.3334716509625354e-06, 'epoch': 0.74} 74%|███████▍ | 4274/5773 [2:48:05<2:20:43, 5.63s/it] {'loss': 0.5678, 'learning_rate': 3.3334716509625354e-06, 'epoch': 0.74} 74%|███████▍ | 4274/5773 [2:48:03<2:20:43, 5.63s/it] 74%|███████▍ | 4275/5773 [2:48:13<2:40:05, 6.41s/it] 74%|███████▍ | 4275/5773 [2:48:11<2:40:05, 6.41s/it] {'loss': 0.5687, 'learning_rate': 3.329290447026462e-06, 'epoch': 0.74} 74%|███████▍ | 4275/5773 [2:48:13<2:40:05, 6.41s/it] {'loss': 0.5687, 'learning_rate': 3.329290447026462e-06, 'epoch': 0.74} 74%|███████▍ | 4275/5773 [2:48:11<2:40:05, 6.41s/it] 74%|███████▍ | 4276/5773 [2:48:22<2:57:03, 7.10s/it] 74%|███████▍ | 4276/5773 [2:48:20<2:57:03, 7.10s/it] {'loss': 0.5832, 'learning_rate': 3.325111343242884e-06, 'epoch': 0.74} 74%|███████▍ | 4276/5773 [2:48:22<2:57:03, 7.10s/it] {'loss': 0.5832, 'learning_rate': 3.325111343242884e-06, 'epoch': 0.74} 74%|███████▍ | 4276/5773 [2:48:20<2:57:03, 7.10s/it] 74%|███████▍ | 4277/5773 [2:48:25<2:44:43, 6.61s/it] 74%|███████▍ | 4277/5773 [2:48:27<2:44:43, 6.61s/it] {'loss': 0.5491, 'learning_rate': 3.320934340927513e-06, 'epoch': 0.74} 74%|███████▍ | 4277/5773 [2:48:27<2:44:43, 6.61s/it] {'loss': 0.5491, 'learning_rate': 3.320934340927513e-06, 'epoch': 0.74} 74%|███████▍ | 4277/5773 [2:48:25<2:44:43, 6.61s/it] 74%|███████▍ | 4278/5773 [2:48:33<2:36:06, 6.27s/it] 74%|███████▍ | 4278/5773 [2:48:31<2:36:06, 6.27s/it] {'loss': 0.5781, 'learning_rate': 3.3167594413954084e-06, 'epoch': 0.74} 74%|███████▍ | 4278/5773 
[2:48:33<2:36:06, 6.27s/it] {'loss': 0.5781, 'learning_rate': 3.3167594413954084e-06, 'epoch': 0.74} 74%|███████▍ | 4278/5773 [2:48:31<2:36:06, 6.27s/it] 74%|███████▍ | 4279/5773 [2:48:36<2:29:41, 6.01s/it] 74%|███████▍ | 4279/5773 [2:48:38<2:29:41, 6.01s/it] {'loss': 0.5567, 'learning_rate': 3.3125866459609644e-06, 'epoch': 0.74} 74%|███████▍ | 4279/5773 [2:48:38<2:29:41, 6.01s/it] {'loss': 0.5567, 'learning_rate': 3.3125866459609644e-06, 'epoch': 0.74} 74%|███████▍ | 4279/5773 [2:48:36<2:29:41, 6.01s/it] 74%|███████▍ | 4280/5773 [2:48:43<2:25:36, 5.85s/it] 74%|███████▍ | 4280/5773 [2:48:42<2:25:36, 5.85s/it] {'loss': 0.5802, 'learning_rate': 3.308415955937898e-06, 'epoch': 0.74} 74%|███████▍ | 4280/5773 [2:48:44<2:25:36, 5.85s/it] {'loss': 0.5802, 'learning_rate': 3.308415955937898e-06, 'epoch': 0.74} 74%|███████▍ | 4280/5773 [2:48:42<2:25:36, 5.85s/it] 74%|███████▍ | 4281/5773 [2:48:47<2:23:26, 5.77s/it] 74%|███████▍ | 4281/5773 [2:48:49<2:23:26, 5.77s/it] {'loss': 0.5492, 'learning_rate': 3.3042473726392875e-06, 'epoch': 0.74} 74%|███████▍ | 4281/5773 [2:48:49<2:23:26, 5.77s/it] {'loss': 0.5492, 'learning_rate': 3.3042473726392875e-06, 'epoch': 0.74} 74%|███████▍ | 4281/5773 [2:48:47<2:23:26, 5.77s/it] 74%|███████▍ | 4282/5773 [2:48:55<2:23:08, 5.76s/it] 74%|███████▍ | 4282/5773 [2:48:53<2:23:08, 5.76s/it] {'loss': 0.5643, 'learning_rate': 3.300080897377531e-06, 'epoch': 0.74} 74%|███████▍ | 4282/5773 [2:48:55<2:23:08, 5.76s/it] {'loss': 0.5643, 'learning_rate': 3.300080897377531e-06, 'epoch': 0.74} 74%|███████▍ | 4282/5773 [2:48:53<2:23:08, 5.76s/it] 74%|███████▍ | 4283/5773 [2:48:59<2:22:29, 5.74s/it] 74%|███████▍ | 4283/5773 [2:49:01<2:22:30, 5.74s/it] {'loss': 0.5541, 'learning_rate': 3.295916531464367e-06, 'epoch': 0.74} 74%|███████▍ | 4283/5773 [2:49:01<2:22:30, 5.74s/it] {'loss': 0.5541, 'learning_rate': 3.295916531464367e-06, 'epoch': 0.74} 74%|███████▍ | 4283/5773 [2:48:59<2:22:29, 5.74s/it] 74%|███████▍ | 4284/5773 [2:49:04<2:19:29, 5.62s/it] 
74%|███████▍ | 4284/5773 [2:49:06<2:19:29, 5.62s/it] {'loss': 0.5634, 'learning_rate': 3.2917542762108758e-06, 'epoch': 0.74} 74%|███████▍ | 4284/5773 [2:49:06<2:19:29, 5.62s/it] {'loss': 0.5634, 'learning_rate': 3.2917542762108758e-06, 'epoch': 0.74} 74%|███████▍ | 4284/5773 [2:49:04<2:19:29, 5.62s/it] 74%|███████▍ | 4285/5773 [2:49:09<2:18:59, 5.60s/it] 74%|███████▍ | 4285/5773 [2:49:11<2:18:59, 5.60s/it] {'loss': 0.5486, 'learning_rate': 3.287594132927464e-06, 'epoch': 0.74} 74%|███████▍ | 4285/5773 [2:49:11<2:18:59, 5.60s/it] {'loss': 0.5486, 'learning_rate': 3.287594132927464e-06, 'epoch': 0.74} 74%|███████▍ | 4285/5773 [2:49:09<2:18:59, 5.60s/it] 74%|███████▍ | 4286/5773 [2:49:20<2:42:04, 6.54s/it] 74%|███████▍ | 4286/5773 [2:49:18<2:42:04, 6.54s/it] {'loss': 0.5534, 'learning_rate': 3.2834361029238805e-06, 'epoch': 0.74} 74%|███████▍ | 4286/5773 [2:49:20<2:42:04, 6.54s/it] {'loss': 0.5534, 'learning_rate': 3.2834361029238805e-06, 'epoch': 0.74} 74%|███████▍ | 4286/5773 [2:49:18<2:42:04, 6.54s/it] 74%|███████▍ | 4287/5773 [2:49:24<2:35:10, 6.27s/it] 74%|███████▍ | 4287/5773 [2:49:26<2:35:10, 6.27s/it] {'loss': 0.5643, 'learning_rate': 3.2792801875092005e-06, 'epoch': 0.74} 74%|███████▍ | 4287/5773 [2:49:26<2:35:10, 6.27s/it] {'loss': 0.5643, 'learning_rate': 3.2792801875092005e-06, 'epoch': 0.74} 74%|███████▍ | 4287/5773 [2:49:24<2:35:10, 6.27s/it] 74%|███████▍ | 4288/5773 [2:49:31<2:31:03, 6.10s/it] 74%|███████▍ | 4288/5773 [2:49:30<2:31:04, 6.10s/it] {'loss': 0.565, 'learning_rate': 3.275126387991847e-06, 'epoch': 0.74} 74%|███████▍ | 4288/5773 [2:49:31<2:31:03, 6.10s/it] {'loss': 0.565, 'learning_rate': 3.275126387991847e-06, 'epoch': 0.74} 74%|███████▍ | 4288/5773 [2:49:30<2:31:04, 6.10s/it] 74%|███████▍ | 4289/5773 [2:49:40<2:47:38, 6.78s/it] 74%|███████▍ | 4289/5773 [2:49:38<2:47:38, 6.78s/it] {'loss': 0.5638, 'learning_rate': 3.270974705679565e-06, 'epoch': 0.74} 74%|███████▍ | 4289/5773 [2:49:40<2:47:38, 6.78s/it] {'loss': 0.5638, 'learning_rate': 
3.270974705679565e-06, 'epoch': 0.74} 74%|███████▍ | 4289/5773 [2:49:38<2:47:38, 6.78s/it] 74%|███████▍ | 4290/5773 [2:49:45<2:37:47, 6.38s/it] 74%|███████▍ | 4290/5773 [2:49:43<2:37:47, 6.38s/it] {'loss': 0.5542, 'learning_rate': 3.2668251418794318e-06, 'epoch': 0.74} 74%|███████▍ | 4290/5773 [2:49:45<2:37:47, 6.38s/it] {'loss': 0.5542, 'learning_rate': 3.2668251418794318e-06, 'epoch': 0.74} 74%|███████▍ | 4290/5773 [2:49:43<2:37:47, 6.38s/it] 74%|███████▍ | 4291/5773 [2:49:51<2:30:46, 6.10s/it] 74%|███████▍ | 4291/5773 [2:49:49<2:30:46, 6.10s/it] {'loss': 0.5658, 'learning_rate': 3.2626776978978723e-06, 'epoch': 0.74} 74%|███████▍ | 4291/5773 [2:49:51<2:30:46, 6.10s/it] {'loss': 0.5658, 'learning_rate': 3.2626776978978723e-06, 'epoch': 0.74} 74%|███████▍ | 4291/5773 [2:49:49<2:30:46, 6.10s/it] 74%|███████▍ | 4292/5773 [2:49:54<2:25:30, 5.89s/it] 74%|███████▍ | 4292/5773 [2:49:56<2:25:30, 5.90s/it] {'loss': 0.5594, 'learning_rate': 3.2585323750406284e-06, 'epoch': 0.74} 74%|███████▍ | 4292/5773 [2:49:54<2:25:30, 5.89s/it]{'loss': 0.5594, 'learning_rate': 3.2585323750406284e-06, 'epoch': 0.74} 74%|███████▍ | 4292/5773 [2:49:56<2:25:30, 5.90s/it] 74%|███████▍ | 4293/5773 [2:50:00<2:22:11, 5.76s/it] 74%|███████▍ | 4293/5773 [2:50:02<2:22:11, 5.76s/it] {'loss': 0.5486, 'learning_rate': 3.254389174612782e-06, 'epoch': 0.74} 74%|███████▍ | 4293/5773 [2:50:00<2:22:11, 5.76s/it] {'loss': 0.5486, 'learning_rate': 3.254389174612782e-06, 'epoch': 0.74} 74%|███████▍ | 4293/5773 [2:50:02<2:22:11, 5.76s/it] 74%|███████▍ | 4294/5773 [2:50:07<2:21:20, 5.73s/it] 74%|███████▍ | 4294/5773 [2:50:05<2:21:20, 5.73s/it] {'loss': 0.5551, 'learning_rate': 3.2502480979187433e-06, 'epoch': 0.74} 74%|███████▍ | 4294/5773 [2:50:07<2:21:20, 5.73s/it] {'loss': 0.5551, 'learning_rate': 3.2502480979187433e-06, 'epoch': 0.74} 74%|███████▍ | 4294/5773 [2:50:05<2:21:20, 5.73s/it] 74%|███████▍ | 4295/5773 [2:50:11<2:19:05, 5.65s/it] 74%|███████▍ | 4295/5773 [2:50:13<2:19:05, 5.65s/it] {'loss': 
0.5625, 'learning_rate': 3.2461091462622576e-06, 'epoch': 0.74} 74%|███████▍ | 4295/5773 [2:50:13<2:19:05, 5.65s/it] {'loss': 0.5625, 'learning_rate': 3.2461091462622576e-06, 'epoch': 0.74} 74%|███████▍ | 4295/5773 [2:50:11<2:19:05, 5.65s/it] 74%|███████▍ | 4296/5773 [2:50:16<2:17:07, 5.57s/it] 74%|███████▍ | 4296/5773 [2:50:18<2:17:07, 5.57s/it] {'loss': 0.5592, 'learning_rate': 3.2419723209463937e-06, 'epoch': 0.74} 74%|███████▍ | 4296/5773 [2:50:18<2:17:07, 5.57s/it] {'loss': 0.5592, 'learning_rate': 3.2419723209463937e-06, 'epoch': 0.74} 74%|███████▍ | 4296/5773 [2:50:16<2:17:07, 5.57s/it] 74%|███████▍ | 4297/5773 [2:50:24<2:18:44, 5.64s/it] 74%|███████▍ | 4297/5773 [2:50:22<2:18:44, 5.64s/it] {'loss': 0.5639, 'learning_rate': 3.237837623273564e-06, 'epoch': 0.74} 74%|███████▍ | 4297/5773 [2:50:24<2:18:44, 5.64s/it] {'loss': 0.5639, 'learning_rate': 3.237837623273564e-06, 'epoch': 0.74} 74%|███████▍ | 4297/5773 [2:50:22<2:18:44, 5.64s/it] 74%|███████▍ | 4298/5773 [2:50:29<2:17:34, 5.60s/it] 74%|███████▍ | 4298/5773 [2:50:27<2:17:34, 5.60s/it] {'loss': 0.5619, 'learning_rate': 3.2337050545455006e-06, 'epoch': 0.74} 74%|███████▍ | 4298/5773 [2:50:29<2:17:34, 5.60s/it] {'loss': 0.5619, 'learning_rate': 3.2337050545455006e-06, 'epoch': 0.74} 74%|███████▍ | 4298/5773 [2:50:27<2:17:34, 5.60s/it] 74%|███████▍ | 4299/5773 [2:50:35<2:16:40, 5.56s/it] 74%|███████▍ | 4299/5773 [2:50:33<2:16:40, 5.56s/it] {'loss': 0.553, 'learning_rate': 3.229574616063268e-06, 'epoch': 0.74} 74%|███████▍ | 4299/5773 [2:50:35<2:16:40, 5.56s/it] {'loss': 0.553, 'learning_rate': 3.229574616063268e-06, 'epoch': 0.74} 74%|███████▍ | 4299/5773 [2:50:33<2:16:40, 5.56s/it]1 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend...8 AutoResumeHook: Checking whether to suspend... 12 4 AutoResumeHook: Checking whether to suspend... 
74%|███████▍ | 4300/5773 [2:50:41<2:17:19, 5.59s/it]AutoResumeHook: Checking whether to suspend... 2713 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 74%|███████▍ | 4300/5773 [2:50:39<2:17:19, 5.59s/it] {'loss': 0.5657, 'learning_rate': 3.225446309127256e-06, 'epoch': 0.74} 74%|███████▍ | 4300/5773 [2:50:41<2:17:19, 5.59s/it] {'loss': 0.5657, 'learning_rate': 3.225446309127256e-06, 'epoch': 0.74} 74%|███████▍ | 4300/5773 [2:50:39<2:17:19, 5.59s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 75%|███████▍ | 4301/5773 [2:51:01<4:05:00, 9.99s/it] 75%|███████▍ | 4301/5773 [2:50:59<4:05:00, 9.99s/it] {'loss': 0.5617, 'learning_rate': 3.2213201350371948e-06, 'epoch': 0.75} 75%|███████▍ | 4301/5773 [2:51:01<4:05:00, 9.99s/it] {'loss': 0.5617, 'learning_rate': 3.2213201350371948e-06, 'epoch': 0.75} 75%|███████▍ | 4301/5773 [2:50:59<4:05:00, 9.99s/it] 75%|███████▍ | 4302/5773 [2:51:09<3:49:45, 9.37s/it] 75%|███████▍ | 4302/5773 [2:51:07<3:49:45, 9.37s/it] {'loss': 0.5625, 'learning_rate': 3.2171960950921333e-06, 'epoch': 0.75} 75%|███████▍ | 4302/5773 [2:51:09<3:49:45, 9.37s/it] {'loss': 0.5625, 'learning_rate': 3.2171960950921333e-06, 'epoch': 0.75} 75%|███████▍ | 4302/5773 [2:51:07<3:49:45, 9.37s/it] 75%|███████▍ | 4303/5773 [2:51:14<3:21:38, 8.23s/it] 75%|███████▍ | 4303/5773 [2:51:12<3:21:38, 8.23s/it] {'loss': 0.5615, 'learning_rate': 3.2130741905904463e-06, 'epoch': 0.75} 75%|███████▍ | 4303/5773 [2:51:14<3:21:38, 8.23s/it] {'loss': 0.5615, 'learning_rate': 3.2130741905904463e-06, 'epoch': 0.75} 75%|███████▍ | 4303/5773 [2:51:12<3:21:38, 8.23s/it] 75%|███████▍ | 4304/5773 [2:51:20<3:02:36, 7.46s/it] 75%|███████▍ | 4304/5773 [2:51:18<3:02:36, 7.46s/it] {'loss': 0.578, 'learning_rate': 3.2089544228298477e-06, 'epoch': 0.75} 75%|███████▍ | 4304/5773 [2:51:20<3:02:36, 7.46s/it] {'loss': 0.578, 'learning_rate': 3.2089544228298477e-06, 'epoch': 0.75} 75%|███████▍ | 4304/5773 [2:51:18<3:02:36, 7.46s/it] 75%|███████▍ | 4305/5773 [2:51:24<2:49:10, 6.91s/it] 75%|███████▍ | 4305/5773 [2:51:26<2:49:11, 6.92s/it] {'loss': 0.5688, 'learning_rate': 3.204836793107369e-06, 'epoch': 0.75} 75%|███████▍ | 4305/5773 [2:51:26<2:49:11, 6.92s/it] {'loss': 0.5688, 'learning_rate': 3.204836793107369e-06, 'epoch': 0.75} 75%|███████▍ | 4305/5773 [2:51:24<2:49:10, 6.91s/it] 75%|███████▍ | 4306/5773 [2:51:31<2:37:16, 6.43s/it] 75%|███████▍ | 4306/5773 [2:51:29<2:37:16, 6.43s/it] {'loss': 0.5477, 'learning_rate': 3.200721302719373e-06, 'epoch': 0.75} 75%|███████▍ | 
4306/5773 [2:51:31<2:37:16, 6.43s/it] {'loss': 0.5477, 'learning_rate': 3.200721302719373e-06, 'epoch': 0.75} 75%|███████▍ | 4306/5773 [2:51:29<2:37:16, 6.43s/it] 75%|███████▍ | 4307/5773 [2:51:36<2:29:12, 6.11s/it] 75%|███████▍ | 4307/5773 [2:51:34<2:29:13, 6.11s/it] {'loss': 0.5483, 'learning_rate': 3.1966079529615447e-06, 'epoch': 0.75} 75%|███████▍ | 4307/5773 [2:51:36<2:29:12, 6.11s/it] {'loss': 0.5483, 'learning_rate': 3.1966079529615447e-06, 'epoch': 0.75} 75%|███████▍ | 4307/5773 [2:51:34<2:29:13, 6.11s/it] 75%|███████▍ | 4308/5773 [2:51:42<2:23:23, 5.87s/it] 75%|███████▍ | 4308/5773 [2:51:40<2:23:23, 5.87s/it] {'loss': 0.5567, 'learning_rate': 3.1924967451289013e-06, 'epoch': 0.75} 75%|███████▍ | 4308/5773 [2:51:42<2:23:23, 5.87s/it] {'loss': 0.5567, 'learning_rate': 3.1924967451289013e-06, 'epoch': 0.75} 75%|███████▍ | 4308/5773 [2:51:40<2:23:23, 5.87s/it] 75%|███████▍ | 4309/5773 [2:51:47<2:19:31, 5.72s/it] 75%|███████▍ | 4309/5773 [2:51:45<2:19:32, 5.72s/it] {'loss': 0.5406, 'learning_rate': 3.1883876805157766e-06, 'epoch': 0.75} 75%|███████▍ | 4309/5773 [2:51:47<2:19:31, 5.72s/it] {'loss': 0.5406, 'learning_rate': 3.1883876805157766e-06, 'epoch': 0.75} 75%|███████▍ | 4309/5773 [2:51:45<2:19:32, 5.72s/it] 75%|███████▍ | 4310/5773 [2:51:52<2:17:25, 5.64s/it] 75%|███████▍ | 4310/5773 [2:51:50<2:17:25, 5.64s/it] {'loss': 0.5653, 'learning_rate': 3.184280760415843e-06, 'epoch': 0.75} 75%|███████▍ | 4310/5773 [2:51:52<2:17:25, 5.64s/it] {'loss': 0.5653, 'learning_rate': 3.184280760415843e-06, 'epoch': 0.75} 75%|███████▍ | 4310/5773 [2:51:50<2:17:25, 5.64s/it] 75%|███████▍ | 4311/5773 [2:51:58<2:15:50, 5.58s/it] 75%|███████▍ | 4311/5773 [2:51:56<2:15:50, 5.58s/it] {'loss': 0.5672, 'learning_rate': 3.180175986122087e-06, 'epoch': 0.75} 75%|███████▍ | 4311/5773 [2:51:58<2:15:50, 5.58s/it] {'loss': 0.5672, 'learning_rate': 3.180175986122087e-06, 'epoch': 0.75} 75%|███████▍ | 4311/5773 [2:51:56<2:15:50, 5.58s/it] 75%|███████▍ | 4312/5773 [2:52:01<2:15:18, 
5.56s/it] 75%|███████▍ | 4312/5773 [2:52:03<2:15:18, 5.56s/it] {'loss': 0.5661, 'learning_rate': 3.1760733589268223e-06, 'epoch': 0.75} 75%|███████▍ | 4312/5773 [2:52:03<2:15:18, 5.56s/it] {'loss': 0.5661, 'learning_rate': 3.1760733589268223e-06, 'epoch': 0.75} 75%|███████▍ | 4312/5773 [2:52:01<2:15:18, 5.56s/it] 75%|███████▍ | 4313/5773 [2:52:09<2:14:27, 5.53s/it] 75%|███████▍ | 4313/5773 [2:52:07<2:14:27, 5.53s/it] {'loss': 0.5673, 'learning_rate': 3.171972880121684e-06, 'epoch': 0.75} 75%|███████▍ | 4313/5773 [2:52:09<2:14:27, 5.53s/it] {'loss': 0.5673, 'learning_rate': 3.171972880121684e-06, 'epoch': 0.75} 75%|███████▍ | 4313/5773 [2:52:07<2:14:27, 5.53s/it] 75%|███████▍ | 4314/5773 [2:52:14<2:14:32, 5.53s/it] 75%|███████▍ | 4314/5773 [2:52:12<2:14:31, 5.53s/it] {'loss': 0.5656, 'learning_rate': 3.1678745509976405e-06, 'epoch': 0.75} 75%|███████▍ | 4314/5773 [2:52:14<2:14:32, 5.53s/it] {'loss': 0.5656, 'learning_rate': 3.1678745509976405e-06, 'epoch': 0.75} 75%|███████▍ | 4314/5773 [2:52:12<2:14:31, 5.53s/it] 75%|███████▍ | 4315/5773 [2:52:20<2:14:52, 5.55s/it] 75%|███████▍ | 4315/5773 [2:52:18<2:14:52, 5.55s/it] {'loss': 0.5572, 'learning_rate': 3.1637783728449734e-06, 'epoch': 0.75} 75%|███████▍ | 4315/5773 [2:52:20<2:14:52, 5.55s/it] {'loss': 0.5572, 'learning_rate': 3.1637783728449734e-06, 'epoch': 0.75} 75%|███████▍ | 4315/5773 [2:52:18<2:14:52, 5.55s/it] 75%|███████▍ | 4316/5773 [2:52:25<2:13:20, 5.49s/it] 75%|███████▍ | 4316/5773 [2:52:23<2:13:20, 5.49s/it] {'loss': 0.5669, 'learning_rate': 3.159684346953286e-06, 'epoch': 0.75} 75%|███████▍ | 4316/5773 [2:52:25<2:13:20, 5.49s/it] {'loss': 0.5669, 'learning_rate': 3.159684346953286e-06, 'epoch': 0.75} 75%|███████▍ | 4316/5773 [2:52:23<2:13:20, 5.49s/it] 75%|███████▍ | 4317/5773 [2:52:31<2:11:56, 5.44s/it] 75%|███████▍ | 4317/5773 [2:52:29<2:11:56, 5.44s/it] {'loss': 0.5777, 'learning_rate': 3.155592474611516e-06, 'epoch': 0.75} 75%|███████▍ | 4317/5773 [2:52:31<2:11:56, 5.44s/it] {'loss': 0.5777, 
'learning_rate': 3.155592474611516e-06, 'epoch': 0.75} 75%|███████▍ | 4317/5773 [2:52:29<2:11:56, 5.44s/it] 75%|███████▍ | 4318/5773 [2:52:36<2:13:09, 5.49s/it] 75%|███████▍ | 4318/5773 [2:52:34<2:13:09, 5.49s/it] {'loss': 0.5492, 'learning_rate': 3.1515027571079137e-06, 'epoch': 0.75} 75%|███████▍ | 4318/5773 [2:52:36<2:13:09, 5.49s/it] {'loss': 0.5492, 'learning_rate': 3.1515027571079137e-06, 'epoch': 0.75} 75%|███████▍ | 4318/5773 [2:52:34<2:13:09, 5.49s/it] 75%|███████▍ | 4319/5773 [2:52:42<2:14:02, 5.53s/it] 75%|███████▍ | 4319/5773 [2:52:40<2:14:02, 5.53s/it] {'loss': 0.5726, 'learning_rate': 3.1474151957300512e-06, 'epoch': 0.75} 75%|███████▍ | 4319/5773 [2:52:42<2:14:02, 5.53s/it] {'loss': 0.5726, 'learning_rate': 3.1474151957300512e-06, 'epoch': 0.75} 75%|███████▍ | 4319/5773 [2:52:40<2:14:02, 5.53s/it] 75%|███████▍ | 4320/5773 [2:52:47<2:13:46, 5.52s/it] 75%|███████▍ | 4320/5773 [2:52:45<2:13:46, 5.52s/it] {'loss': 0.5385, 'learning_rate': 3.1433297917648197e-06, 'epoch': 0.75} 75%|███████▍ | 4320/5773 [2:52:47<2:13:46, 5.52s/it] {'loss': 0.5385, 'learning_rate': 3.1433297917648197e-06, 'epoch': 0.75} 75%|███████▍ | 4320/5773 [2:52:45<2:13:46, 5.52s/it] 75%|███████▍ | 4321/5773 [2:52:53<2:13:46, 5.53s/it] 75%|███████▍ | 4321/5773 [2:52:51<2:13:46, 5.53s/it] {'loss': 0.5607, 'learning_rate': 3.1392465464984455e-06, 'epoch': 0.75} 75%|███████▍ | 4321/5773 [2:52:53<2:13:46, 5.53s/it] {'loss': 0.5607, 'learning_rate': 3.1392465464984455e-06, 'epoch': 0.75} 75%|███████▍ | 4321/5773 [2:52:51<2:13:46, 5.53s/it] 75%|███████▍ | 4322/5773 [2:52:58<2:13:19, 5.51s/it] 75%|███████▍ | 4322/5773 [2:52:56<2:13:19, 5.51s/it] {'loss': 0.5871, 'learning_rate': 3.1351654612164517e-06, 'epoch': 0.75} 75%|███████▍ | 4322/5773 [2:52:58<2:13:19, 5.51s/it] {'loss': 0.5871, 'learning_rate': 3.1351654612164517e-06, 'epoch': 0.75} 75%|███████▍ | 4322/5773 [2:52:56<2:13:19, 5.51s/it] 75%|███████▍ | 4323/5773 [2:53:04<2:13:03, 5.51s/it] 75%|███████▍ | 4323/5773 [2:53:02<2:13:03, 
5.51s/it] {'loss': 0.5595, 'learning_rate': 3.131086537203705e-06, 'epoch': 0.75} 75%|███████▍ | 4323/5773 [2:53:04<2:13:03, 5.51s/it] {'loss': 0.5595, 'learning_rate': 3.131086537203705e-06, 'epoch': 0.75} 75%|███████▍ | 4323/5773 [2:53:02<2:13:03, 5.51s/it] 75%|███████▍ | 4324/5773 [2:53:09<2:11:35, 5.45s/it] 75%|███████▍ | 4324/5773 [2:53:07<2:11:35, 5.45s/it] {'loss': 0.5597, 'learning_rate': 3.127009775744375e-06, 'epoch': 0.75} 75%|███████▍ | 4324/5773 [2:53:09<2:11:35, 5.45s/it] {'loss': 0.5597, 'learning_rate': 3.127009775744375e-06, 'epoch': 0.75} 75%|███████▍ | 4324/5773 [2:53:07<2:11:35, 5.45s/it] 75%|███████▍ | 4325/5773 [2:53:15<2:12:05, 5.47s/it] 75%|███████▍ | 4325/5773 [2:53:13<2:12:05, 5.47s/it] {'loss': 0.5578, 'learning_rate': 3.1229351781219585e-06, 'epoch': 0.75} 75%|███████▍ | 4325/5773 [2:53:15<2:12:05, 5.47s/it] {'loss': 0.5578, 'learning_rate': 3.1229351781219585e-06, 'epoch': 0.75} 75%|███████▍ | 4325/5773 [2:53:13<2:12:05, 5.47s/it] 75%|███████▍ | 4326/5773 [2:53:20<2:13:39, 5.54s/it] 75%|███████▍ | 4326/5773 [2:53:18<2:13:39, 5.54s/it] {'loss': 0.5498, 'learning_rate': 3.1188627456192655e-06, 'epoch': 0.75} 75%|███████▍ | 4326/5773 [2:53:20<2:13:39, 5.54s/it] {'loss': 0.5498, 'learning_rate': 3.1188627456192655e-06, 'epoch': 0.75} 75%|███████▍ | 4326/5773 [2:53:18<2:13:39, 5.54s/it] 75%|███████▍ | 4327/5773 [2:53:26<2:13:26, 5.54s/it] 75%|███████▍ | 4327/5773 [2:53:24<2:13:26, 5.54s/it] {'loss': 0.5691, 'learning_rate': 3.1147924795184347e-06, 'epoch': 0.75} 75%|███████▍ | 4327/5773 [2:53:26<2:13:26, 5.54s/it] {'loss': 0.5691, 'learning_rate': 3.1147924795184347e-06, 'epoch': 0.75} 75%|███████▍ | 4327/5773 [2:53:24<2:13:26, 5.54s/it] 75%|███████▍ | 4328/5773 [2:53:31<2:13:29, 5.54s/it] 75%|███████▍ | 4328/5773 [2:53:30<2:13:29, 5.54s/it] {'loss': 0.5564, 'learning_rate': 3.1107243811009112e-06, 'epoch': 0.75} 75%|███████▍ | 4328/5773 [2:53:31<2:13:29, 5.54s/it] {'loss': 0.5564, 'learning_rate': 3.1107243811009112e-06, 'epoch': 0.75} 
75%|███████▍ | 4328/5773 [2:53:30<2:13:29, 5.54s/it] 75%|███████▍ | 4329/5773 [2:53:37<2:13:03, 5.53s/it] 75%|███████▍ | 4329/5773 [2:53:35<2:13:02, 5.53s/it] {'loss': 0.562, 'learning_rate': 3.106658451647462e-06, 'epoch': 0.75} 75%|███████▍ | 4329/5773 [2:53:37<2:13:03, 5.53s/it] {'loss': 0.562, 'learning_rate': 3.106658451647462e-06, 'epoch': 0.75} 75%|███████▍ | 4329/5773 [2:53:35<2:13:02, 5.53s/it] 75%|███████▌ | 4330/5773 [2:53:43<2:13:42, 5.56s/it] 75%|███████▌ | 4330/5773 [2:53:41<2:13:42, 5.56s/it] {'loss': 0.5843, 'learning_rate': 3.1025946924381743e-06, 'epoch': 0.75} 75%|███████▌ | 4330/5773 [2:53:43<2:13:42, 5.56s/it] {'loss': 0.5843, 'learning_rate': 3.1025946924381743e-06, 'epoch': 0.75} 75%|███████▌ | 4330/5773 [2:53:41<2:13:42, 5.56s/it] 75%|███████▌ | 4331/5773 [2:53:48<2:12:53, 5.53s/it] 75%|███████▌ | 4331/5773 [2:53:46<2:12:53, 5.53s/it] {'loss': 0.5592, 'learning_rate': 3.0985331047524493e-06, 'epoch': 0.75} 75%|███████▌ | 4331/5773 [2:53:48<2:12:53, 5.53s/it] {'loss': 0.5592, 'learning_rate': 3.0985331047524493e-06, 'epoch': 0.75} 75%|███████▌ | 4331/5773 [2:53:46<2:12:53, 5.53s/it] 75%|███████▌ | 4332/5773 [2:53:53<2:11:18, 5.47s/it] 75%|███████▌ | 4332/5773 [2:53:51<2:11:18, 5.47s/it] {'loss': 0.546, 'learning_rate': 3.094473689869002e-06, 'epoch': 0.75} 75%|███████▌ | 4332/5773 [2:53:53<2:11:18, 5.47s/it] {'loss': 0.546, 'learning_rate': 3.094473689869002e-06, 'epoch': 0.75} 75%|███████▌ | 4332/5773 [2:53:51<2:11:18, 5.47s/it] 75%|███████▌ | 4333/5773 [2:53:59<2:10:15, 5.43s/it] 75%|███████▌ | 4333/5773 [2:53:57<2:10:15, 5.43s/it] {'loss': 0.5791, 'learning_rate': 3.0904164490658652e-06, 'epoch': 0.75} 75%|███████▌ | 4333/5773 [2:53:59<2:10:15, 5.43s/it] {'loss': 0.5791, 'learning_rate': 3.0904164490658652e-06, 'epoch': 0.75} 75%|███████▌ | 4333/5773 [2:53:57<2:10:15, 5.43s/it] 75%|███████▌ | 4334/5773 [2:54:04<2:10:37, 5.45s/it] 75%|███████▌ | 4334/5773 [2:54:02<2:10:37, 5.45s/it] {'loss': 0.5553, 'learning_rate': 3.086361383620393e-06, 
'epoch': 0.75} 75%|███████▌ | 4334/5773 [2:54:04<2:10:37, 5.45s/it] {'loss': 0.5553, 'learning_rate': 3.086361383620393e-06, 'epoch': 0.75} 75%|███████▌ | 4334/5773 [2:54:02<2:10:37, 5.45s/it] 75%|███████▌ | 4335/5773 [2:54:10<2:12:44, 5.54s/it] 75%|███████▌ | 4335/5773 [2:54:08<2:12:44, 5.54s/it] {'loss': 0.5493, 'learning_rate': 3.082308494809246e-06, 'epoch': 0.75} 75%|███████▌ | 4335/5773 [2:54:10<2:12:44, 5.54s/it] {'loss': 0.5493, 'learning_rate': 3.082308494809246e-06, 'epoch': 0.75} 75%|███████▌ | 4335/5773 [2:54:08<2:12:44, 5.54s/it] 75%|███████▌ | 4336/5773 [2:54:15<2:12:13, 5.52s/it] 75%|███████▌ | 4336/5773 [2:54:13<2:12:13, 5.52s/it] {'loss': 0.5597, 'learning_rate': 3.078257783908404e-06, 'epoch': 0.75} 75%|███████▌ | 4336/5773 [2:54:15<2:12:13, 5.52s/it] {'loss': 0.5597, 'learning_rate': 3.078257783908404e-06, 'epoch': 0.75} 75%|███████▌ | 4336/5773 [2:54:13<2:12:13, 5.52s/it] 75%|███████▌ | 4337/5773 [2:54:21<2:11:35, 5.50s/it] 75%|███████▌ | 4337/5773 [2:54:19<2:11:35, 5.50s/it] {'loss': 0.5589, 'learning_rate': 3.074209252193159e-06, 'epoch': 0.75} 75%|███████▌ | 4337/5773 [2:54:19<2:11:35, 5.50s/it]{'loss': 0.5589, 'learning_rate': 3.074209252193159e-06, 'epoch': 0.75} 75%|███████▌ | 4337/5773 [2:54:21<2:11:35, 5.50s/it] 75%|███████▌ | 4338/5773 [2:54:26<2:10:41, 5.46s/it] 75%|███████▌ | 4338/5773 [2:54:24<2:10:41, 5.46s/it] {'loss': 0.5532, 'learning_rate': 3.0701629009381194e-06, 'epoch': 0.75} 75%|███████▌ | 4338/5773 [2:54:26<2:10:41, 5.46s/it] {'loss': 0.5532, 'learning_rate': 3.0701629009381194e-06, 'epoch': 0.75} 75%|███████▌ | 4338/5773 [2:54:24<2:10:41, 5.46s/it] 75%|███████▌ | 4339/5773 [2:54:32<2:10:12, 5.45s/it] 75%|███████▌ | 4339/5773 [2:54:30<2:10:12, 5.45s/it] {'loss': 0.5624, 'learning_rate': 3.0661187314172023e-06, 'epoch': 0.75} 75%|███████▌ | 4339/5773 [2:54:32<2:10:12, 5.45s/it] {'loss': 0.5624, 'learning_rate': 3.0661187314172023e-06, 'epoch': 0.75} 75%|███████▌ | 4339/5773 [2:54:30<2:10:12, 5.45s/it] 75%|███████▌ | 
4340/5773 [2:54:37<2:11:02, 5.49s/it] 75%|███████▌ | 4340/5773 [2:54:35<2:11:02, 5.49s/it] {'loss': 0.5396, 'learning_rate': 3.062076744903648e-06, 'epoch': 0.75} 75%|███████▌ | 4340/5773 [2:54:37<2:11:02, 5.49s/it]{'loss': 0.5396, 'learning_rate': 3.062076744903648e-06, 'epoch': 0.75} 75%|███████▌ | 4340/5773 [2:54:35<2:11:02, 5.49s/it] 75%|███████▌ | 4341/5773 [2:54:43<2:11:15, 5.50s/it] 75%|███████▌ | 4341/5773 [2:54:41<2:11:15, 5.50s/it] {'loss': 0.5693, 'learning_rate': 3.0580369426699986e-06, 'epoch': 0.75} 75%|███████▌ | 4341/5773 [2:54:43<2:11:15, 5.50s/it] {'loss': 0.5693, 'learning_rate': 3.0580369426699986e-06, 'epoch': 0.75} 75%|███████▌ | 4341/5773 [2:54:41<2:11:15, 5.50s/it] 75%|███████▌ | 4342/5773 [2:54:48<2:11:52, 5.53s/it] 75%|███████▌ | 4342/5773 [2:54:46<2:11:52, 5.53s/it] {'loss': 0.5557, 'learning_rate': 3.0539993259881117e-06, 'epoch': 0.75} 75%|███████▌ | 4342/5773 [2:54:48<2:11:52, 5.53s/it] {'loss': 0.5557, 'learning_rate': 3.0539993259881117e-06, 'epoch': 0.75} 75%|███████▌ | 4342/5773 [2:54:46<2:11:52, 5.53s/it] 75%|███████▌ | 4343/5773 [2:54:54<2:11:35, 5.52s/it] 75%|███████▌ | 4343/5773 [2:54:52<2:11:35, 5.52s/it] {'loss': 0.5631, 'learning_rate': 3.0499638961291623e-06, 'epoch': 0.75} 75%|███████▌ | 4343/5773 [2:54:54<2:11:35, 5.52s/it] {'loss': 0.5631, 'learning_rate': 3.0499638961291623e-06, 'epoch': 0.75} 75%|███████▌ | 4343/5773 [2:54:52<2:11:35, 5.52s/it] 75%|███████▌ | 4344/5773 [2:54:59<2:09:22, 5.43s/it] 75%|███████▌ | 4344/5773 [2:54:57<2:09:22, 5.43s/it] {'loss': 0.5706, 'learning_rate': 3.045930654363631e-06, 'epoch': 0.75} 75%|███████▌ | 4344/5773 [2:54:59<2:09:22, 5.43s/it] {'loss': 0.5706, 'learning_rate': 3.045930654363631e-06, 'epoch': 0.75} 75%|███████▌ | 4344/5773 [2:54:57<2:09:22, 5.43s/it] 75%|███████▌ | 4345/5773 [2:55:05<2:09:15, 5.43s/it] 75%|███████▌ | 4345/5773 [2:55:03<2:09:15, 5.43s/it] {'loss': 0.5777, 'learning_rate': 3.041899601961308e-06, 'epoch': 0.75} 75%|███████▌ | 4345/5773 [2:55:05<2:09:15, 
5.43s/it] {'loss': 0.5777, 'learning_rate': 3.041899601961308e-06, 'epoch': 0.75} 75%|███████▌ | 4345/5773 [2:55:03<2:09:15, 5.43s/it] 75%|███████▌ | 4346/5773 [2:55:10<2:08:50, 5.42s/it] 75%|███████▌ | 4346/5773 [2:55:08<2:08:50, 5.42s/it] {'loss': 0.5695, 'learning_rate': 3.037870740191303e-06, 'epoch': 0.75} 75%|███████▌ | 4346/5773 [2:55:10<2:08:50, 5.42s/it] {'loss': 0.5695, 'learning_rate': 3.037870740191303e-06, 'epoch': 0.75} 75%|███████▌ | 4346/5773 [2:55:08<2:08:50, 5.42s/it] 75%|███████▌ | 4347/5773 [2:55:15<2:08:22, 5.40s/it] 75%|███████▌ | 4347/5773 [2:55:13<2:08:22, 5.40s/it] {'loss': 0.5535, 'learning_rate': 3.033844070322027e-06, 'epoch': 0.75} 75%|███████▌ | 4347/5773 [2:55:15<2:08:22, 5.40s/it] {'loss': 0.5535, 'learning_rate': 3.033844070322027e-06, 'epoch': 0.75} 75%|███████▌ | 4347/5773 [2:55:13<2:08:22, 5.40s/it] 75%|███████▌ | 4348/5773 [2:55:21<2:09:25, 5.45s/it] 75%|███████▌ | 4348/5773 [2:55:19<2:09:25, 5.45s/it] {'loss': 0.5626, 'learning_rate': 3.029819593621206e-06, 'epoch': 0.75} 75%|███████▌ | 4348/5773 [2:55:21<2:09:25, 5.45s/it] {'loss': 0.5626, 'learning_rate': 3.029819593621206e-06, 'epoch': 0.75} 75%|███████▌ | 4348/5773 [2:55:19<2:09:25, 5.45s/it] 75%|███████▌ | 4349/5773 [2:55:26<2:09:36, 5.46s/it] 75%|███████▌ | 4349/5773 [2:55:24<2:09:36, 5.46s/it] {'loss': 0.5786, 'learning_rate': 3.0257973113558716e-06, 'epoch': 0.75} 75%|███████▌ | 4349/5773 [2:55:26<2:09:36, 5.46s/it] {'loss': 0.5786, 'learning_rate': 3.0257973113558716e-06, 'epoch': 0.75} 75%|███████▌ | 4349/5773 [2:55:24<2:09:36, 5.46s/it]11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 75%|███████▌ | 4350/5773 [2:55:32<2:09:46, 5.47s/it]4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 75%|███████▌ | 4350/5773 [2:55:30<2:09:46, 5.47s/it]6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5654, 'learning_rate': 3.0217772247923684e-06, 'epoch': 0.75} 75%|███████▌ | 4350/5773 [2:55:32<2:09:46, 5.47s/it] {'loss': 0.5654, 'learning_rate': 3.0217772247923684e-06, 'epoch': 0.75} 75%|███████▌ | 4350/5773 [2:55:30<2:09:46, 5.47s/it] 75%|███████▌ | 4351/5773 [2:55:37<2:08:23, 5.42s/it] 75%|███████▌ | 4351/5773 [2:55:35<2:08:23, 5.42s/it] {'loss': 0.5546, 'learning_rate': 3.0177593351963474e-06, 'epoch': 0.75} 75%|███████▌ | 4351/5773 [2:55:37<2:08:23, 5.42s/it] {'loss': 0.5546, 'learning_rate': 3.0177593351963474e-06, 'epoch': 0.75} 75%|███████▌ | 4351/5773 [2:55:35<2:08:23, 5.42s/it] 75%|███████▌ | 4352/5773 [2:55:43<2:08:26, 5.42s/it] {'loss': 0.5705, 'learning_rate': 3.013743643832765e-06, 'epoch': 0.75} 75%|███████▌ | 4352/5773 [2:55:41<2:08:26, 5.42s/it] {'loss': 0.5705, 'learning_rate': 3.013743643832765e-06, 'epoch': 0.75} 75%|███████▌ | 4352/5773 [2:55:43<2:08:26, 5.42s/it] 75%|███████▌ | 4352/5773 [2:55:41<2:08:26, 5.42s/it] 75%|███████▌ | 4353/5773 [2:55:48<2:08:43, 5.44s/it] 75%|███████▌ | 4353/5773 [2:55:46<2:08:43, 5.44s/it] {'loss': 0.5408, 'learning_rate': 3.009730151965897e-06, 'epoch': 0.75} 75%|███████▌ | 4353/5773 [2:55:48<2:08:43, 5.44s/it] {'loss': 0.5408, 'learning_rate': 3.009730151965897e-06, 'epoch': 0.75} 75%|███████▌ | 4353/5773 [2:55:46<2:08:43, 5.44s/it] 75%|███████▌ | 4354/5773 [2:55:54<2:08:51, 5.45s/it] 75%|███████▌ | 4354/5773 [2:55:52<2:08:51, 5.45s/it] {'loss': 0.5626, 'learning_rate': 3.0057188608593146e-06, 'epoch': 0.75} 
75%|███████▌ | 4354/5773 [2:55:54<2:08:51, 5.45s/it] {'loss': 0.5626, 'learning_rate': 3.0057188608593146e-06, 'epoch': 0.75} 75%|███████▌ | 4354/5773 [2:55:52<2:08:51, 5.45s/it] 75%|███████▌ | 4355/5773 [2:55:59<2:09:53, 5.50s/it] 75%|███████▌ | 4355/5773 [2:55:57<2:09:53, 5.50s/it] {'loss': 0.5608, 'learning_rate': 3.001709771775897e-06, 'epoch': 0.75} 75%|███████▌ | 4355/5773 [2:55:59<2:09:53, 5.50s/it] {'loss': 0.5608, 'learning_rate': 3.001709771775897e-06, 'epoch': 0.75} 75%|███████▌ | 4355/5773 [2:55:57<2:09:53, 5.50s/it] 75%|███████▌ | 4356/5773 [2:56:05<2:11:15, 5.56s/it] 75%|███████▌ | 4356/5773 [2:56:03<2:11:15, 5.56s/it] {'loss': 0.5708, 'learning_rate': 2.99770288597784e-06, 'epoch': 0.75} 75%|███████▌ | 4356/5773 [2:56:05<2:11:15, 5.56s/it] {'loss': 0.5708, 'learning_rate': 2.99770288597784e-06, 'epoch': 0.75} 75%|███████▌ | 4356/5773 [2:56:03<2:11:15, 5.56s/it] 75%|███████▌ | 4357/5773 [2:56:10<2:10:30, 5.53s/it] 75%|███████▌ | 4357/5773 [2:56:08<2:10:30, 5.53s/it] {'loss': 0.5525, 'learning_rate': 2.993698204726637e-06, 'epoch': 0.75} 75%|███████▌ | 4357/5773 [2:56:10<2:10:30, 5.53s/it] {'loss': 0.5525, 'learning_rate': 2.993698204726637e-06, 'epoch': 0.75} 75%|███████▌ | 4357/5773 [2:56:08<2:10:30, 5.53s/it] 75%|███████▌ | 4358/5773 [2:56:16<2:09:25, 5.49s/it] 75%|███████▌ | 4358/5773 [2:56:14<2:09:25, 5.49s/it] {'loss': 0.5424, 'learning_rate': 2.989695729283085e-06, 'epoch': 0.75} 75%|███████▌ | 4358/5773 [2:56:16<2:09:25, 5.49s/it] {'loss': 0.5424, 'learning_rate': 2.989695729283085e-06, 'epoch': 0.75} 75%|███████▌ | 4358/5773 [2:56:14<2:09:25, 5.49s/it] 76%|███████▌ | 4359/5773 [2:56:21<2:09:16, 5.49s/it] 76%|███████▌ | 4359/5773 [2:56:19<2:09:16, 5.49s/it] {'loss': 0.5414, 'learning_rate': 2.9856954609072986e-06, 'epoch': 0.76} 76%|███████▌ | 4359/5773 [2:56:21<2:09:16, 5.49s/it] {'loss': 0.5414, 'learning_rate': 2.9856954609072986e-06, 'epoch': 0.76} 76%|███████▌ | 4359/5773 [2:56:19<2:09:16, 5.49s/it] 76%|███████▌ | 4360/5773 
[2:56:27<2:09:32, 5.50s/it] 76%|███████▌ | 4360/5773 [2:56:25<2:09:32, 5.50s/it] {'loss': 0.5576, 'learning_rate': 2.9816974008586873e-06, 'epoch': 0.76} 76%|███████▌ | 4360/5773 [2:56:27<2:09:32, 5.50s/it] {'loss': 0.5576, 'learning_rate': 2.9816974008586873e-06, 'epoch': 0.76} 76%|███████▌ | 4360/5773 [2:56:25<2:09:32, 5.50s/it] 76%|███████▌ | 4361/5773 [2:56:32<2:09:41, 5.51s/it] 76%|███████▌ | 4361/5773 [2:56:30<2:09:41, 5.51s/it] {'loss': 0.5772, 'learning_rate': 2.9777015503959674e-06, 'epoch': 0.76} 76%|███████▌ | 4361/5773 [2:56:32<2:09:41, 5.51s/it] {'loss': 0.5772, 'learning_rate': 2.9777015503959674e-06, 'epoch': 0.76} 76%|███████▌ | 4361/5773 [2:56:30<2:09:41, 5.51s/it] 76%|███████▌ | 4362/5773 [2:56:38<2:08:22, 5.46s/it] 76%|███████▌ | 4362/5773 [2:56:36<2:08:22, 5.46s/it] {'loss': 0.5728, 'learning_rate': 2.9737079107771562e-06, 'epoch': 0.76} 76%|███████▌ | 4362/5773 [2:56:38<2:08:22, 5.46s/it] {'loss': 0.5728, 'learning_rate': 2.9737079107771562e-06, 'epoch': 0.76} 76%|███████▌ | 4362/5773 [2:56:36<2:08:22, 5.46s/it] 76%|███████▌ | 4363/5773 [2:56:43<2:09:19, 5.50s/it] 76%|███████▌ | 4363/5773 [2:56:41<2:09:19, 5.50s/it] {'loss': 0.5715, 'learning_rate': 2.9697164832595892e-06, 'epoch': 0.76} 76%|███████▌ | 4363/5773 [2:56:43<2:09:19, 5.50s/it] {'loss': 0.5715, 'learning_rate': 2.9697164832595892e-06, 'epoch': 0.76} 76%|███████▌ | 4363/5773 [2:56:41<2:09:19, 5.50s/it] 76%|███████▌ | 4364/5773 [2:56:49<2:09:36, 5.52s/it] 76%|███████▌ | 4364/5773 [2:56:47<2:09:36, 5.52s/it] {'loss': 0.5547, 'learning_rate': 2.965727269099887e-06, 'epoch': 0.76} 76%|███████▌ | 4364/5773 [2:56:49<2:09:36, 5.52s/it] {'loss': 0.5547, 'learning_rate': 2.965727269099887e-06, 'epoch': 0.76} 76%|███████▌ | 4364/5773 [2:56:47<2:09:36, 5.52s/it] 76%|███████▌ | 4365/5773 [2:56:54<2:10:19, 5.55s/it] 76%|███████▌ | 4365/5773 [2:56:52<2:10:19, 5.55s/it] {'loss': 0.5601, 'learning_rate': 2.961740269553981e-06, 'epoch': 0.76} 76%|███████▌ | 4365/5773 [2:56:54<2:10:19, 5.55s/it] 
{'loss': 0.5601, 'learning_rate': 2.961740269553981e-06, 'epoch': 0.76} 76%|███████▌ | 4365/5773 [2:56:52<2:10:19, 5.55s/it] 76%|███████▌ | 4366/5773 [2:57:00<2:09:57, 5.54s/it] 76%|███████▌ | 4366/5773 [2:56:58<2:09:57, 5.54s/it] {'loss': 0.5449, 'learning_rate': 2.957755485877112e-06, 'epoch': 0.76} 76%|███████▌ | 4366/5773 [2:57:00<2:09:57, 5.54s/it] {'loss': 0.5449, 'learning_rate': 2.957755485877112e-06, 'epoch': 0.76} 76%|███████▌ | 4366/5773 [2:56:58<2:09:57, 5.54s/it] 76%|███████▌ | 4367/5773 [2:57:05<2:10:17, 5.56s/it] 76%|███████▌ | 4367/5773 [2:57:04<2:10:17, 5.56s/it] {'loss': 0.5689, 'learning_rate': 2.953772919323814e-06, 'epoch': 0.76} 76%|███████▌ | 4367/5773 [2:57:05<2:10:17, 5.56s/it] {'loss': 0.5689, 'learning_rate': 2.953772919323814e-06, 'epoch': 0.76} 76%|███████▌ | 4367/5773 [2:57:04<2:10:17, 5.56s/it] 76%|███████▌ | 4368/5773 [2:57:11<2:08:40, 5.50s/it] 76%|███████▌ | 4368/5773 [2:57:09<2:08:40, 5.50s/it] {'loss': 0.5536, 'learning_rate': 2.9497925711479237e-06, 'epoch': 0.76} 76%|███████▌ | 4368/5773 [2:57:11<2:08:40, 5.50s/it] {'loss': 0.5536, 'learning_rate': 2.9497925711479237e-06, 'epoch': 0.76} 76%|███████▌ | 4368/5773 [2:57:09<2:08:40, 5.50s/it] 76%|███████▌ | 4369/5773 [2:57:16<2:08:20, 5.48s/it] 76%|███████▌ | 4369/5773 [2:57:14<2:08:20, 5.48s/it] {'loss': 0.5678, 'learning_rate': 2.945814442602587e-06, 'epoch': 0.76} 76%|███████▌ | 4369/5773 [2:57:16<2:08:20, 5.48s/it] {'loss': 0.5678, 'learning_rate': 2.945814442602587e-06, 'epoch': 0.76} 76%|███████▌ | 4369/5773 [2:57:14<2:08:20, 5.48s/it] 76%|███████▌ | 4370/5773 [2:57:20<2:07:43, 5.46s/it] 76%|███████▌ | 4370/5773 [2:57:22<2:07:43, 5.46s/it] {'loss': 0.5455, 'learning_rate': 2.9418385349402444e-06, 'epoch': 0.76} 76%|███████▌ | 4370/5773 [2:57:20<2:07:43, 5.46s/it] {'loss': 0.5455, 'learning_rate': 2.9418385349402444e-06, 'epoch': 0.76} 76%|███████▌ | 4370/5773 [2:57:22<2:07:43, 5.46s/it] 76%|███████▌ | 4371/5773 [2:57:25<2:07:38, 5.46s/it] 76%|███████▌ | 4371/5773 
[2:57:27<2:07:38, 5.46s/it] {'loss': 0.5524, 'learning_rate': 2.937864849412634e-06, 'epoch': 0.76} 76%|███████▌ | 4371/5773 [2:57:27<2:07:38, 5.46s/it] {'loss': 0.5524, 'learning_rate': 2.937864849412634e-06, 'epoch': 0.76} 76%|███████▌ | 4371/5773 [2:57:25<2:07:38, 5.46s/it] 76%|███████▌ | 4372/5773 [2:57:33<2:07:21, 5.45s/it] 76%|███████▌ | 4372/5773 [2:57:31<2:07:22, 5.45s/it] {'loss': 0.5415, 'learning_rate': 2.9338933872708064e-06, 'epoch': 0.76} 76%|███████▌ | 4372/5773 [2:57:33<2:07:21, 5.45s/it] {'loss': 0.5415, 'learning_rate': 2.9338933872708064e-06, 'epoch': 0.76} 76%|███████▌ | 4372/5773 [2:57:31<2:07:22, 5.45s/it] 76%|███████▌ | 4373/5773 [2:57:38<2:06:05, 5.40s/it] 76%|███████▌ | 4373/5773 [2:57:36<2:06:06, 5.40s/it] {'loss': 0.5515, 'learning_rate': 2.9299241497651e-06, 'epoch': 0.76} 76%|███████▌ | 4373/5773 [2:57:38<2:06:05, 5.40s/it] {'loss': 0.5515, 'learning_rate': 2.9299241497651e-06, 'epoch': 0.76} 76%|███████▌ | 4373/5773 [2:57:36<2:06:06, 5.40s/it] 76%|███████▌ | 4374/5773 [2:57:43<2:07:18, 5.46s/it] 76%|███████▌ | 4374/5773 [2:57:42<2:07:18, 5.46s/it] {'loss': 0.5618, 'learning_rate': 2.925957138145159e-06, 'epoch': 0.76} 76%|███████▌ | 4374/5773 [2:57:43<2:07:18, 5.46s/it] {'loss': 0.5618, 'learning_rate': 2.925957138145159e-06, 'epoch': 0.76} 76%|███████▌ | 4374/5773 [2:57:42<2:07:18, 5.46s/it] 76%|███████▌ | 4375/5773 [2:57:49<2:07:41, 5.48s/it] 76%|███████▌ | 4375/5773 [2:57:47<2:07:41, 5.48s/it] {'loss': 0.5582, 'learning_rate': 2.9219923536599228e-06, 'epoch': 0.76} 76%|███████▌ | 4375/5773 [2:57:49<2:07:41, 5.48s/it] {'loss': 0.5582, 'learning_rate': 2.9219923536599228e-06, 'epoch': 0.76} 76%|███████▌ | 4375/5773 [2:57:47<2:07:41, 5.48s/it] 76%|███████▌ | 4376/5773 [2:57:53<2:07:38, 5.48s/it] 76%|███████▌ | 4376/5773 [2:57:54<2:07:39, 5.48s/it] {'loss': 0.5583, 'learning_rate': 2.9180297975576368e-06, 'epoch': 0.76} 76%|███████▌ | 4376/5773 [2:57:54<2:07:39, 5.48s/it] {'loss': 0.5583, 'learning_rate': 2.9180297975576368e-06, 
'epoch': 0.76} 76%|███████▌ | 4376/5773 [2:57:53<2:07:38, 5.48s/it] 76%|███████▌ | 4377/5773 [2:58:00<2:07:25, 5.48s/it] 76%|███████▌ | 4377/5773 [2:57:58<2:07:25, 5.48s/it] {'loss': 0.5631, 'learning_rate': 2.9140694710858376e-06, 'epoch': 0.76} 76%|███████▌ | 4377/5773 [2:58:00<2:07:25, 5.48s/it] {'loss': 0.5631, 'learning_rate': 2.9140694710858376e-06, 'epoch': 0.76} 76%|███████▌ | 4377/5773 [2:57:58<2:07:25, 5.48s/it] 76%|███████▌ | 4378/5773 [2:58:05<2:07:25, 5.48s/it] 76%|███████▌ | 4378/5773 [2:58:03<2:07:25, 5.48s/it] {'loss': 0.5541, 'learning_rate': 2.9101113754913636e-06, 'epoch': 0.76} 76%|███████▌ | 4378/5773 [2:58:05<2:07:25, 5.48s/it] {'loss': 0.5541, 'learning_rate': 2.9101113754913636e-06, 'epoch': 0.76} 76%|███████▌ | 4378/5773 [2:58:03<2:07:25, 5.48s/it] 76%|███████▌ | 4379/5773 [2:58:11<2:07:41, 5.50s/it] 76%|███████▌ | 4379/5773 [2:58:09<2:07:42, 5.50s/it] {'loss': 0.5535, 'learning_rate': 2.906155512020349e-06, 'epoch': 0.76} 76%|███████▌ | 4379/5773 [2:58:11<2:07:41, 5.50s/it] {'loss': 0.5535, 'learning_rate': 2.906155512020349e-06, 'epoch': 0.76} 76%|███████▌ | 4379/5773 [2:58:09<2:07:42, 5.50s/it] 76%|███████▌ | 4380/5773 [2:58:17<2:08:36, 5.54s/it] 76%|███████▌ | 4380/5773 [2:58:15<2:08:36, 5.54s/it] {'loss': 0.5599, 'learning_rate': 2.902201881918225e-06, 'epoch': 0.76} 76%|███████▌ | 4380/5773 [2:58:17<2:08:36, 5.54s/it] {'loss': 0.5599, 'learning_rate': 2.902201881918225e-06, 'epoch': 0.76} 76%|███████▌ | 4380/5773 [2:58:15<2:08:36, 5.54s/it] 76%|███████▌ | 4381/5773 [2:58:22<2:08:52, 5.55s/it] 76%|███████▌ | 4381/5773 [2:58:20<2:08:52, 5.55s/it] {'loss': 0.5705, 'learning_rate': 2.898250486429719e-06, 'epoch': 0.76} 76%|███████▌ | 4381/5773 [2:58:22<2:08:52, 5.55s/it] {'loss': 0.5705, 'learning_rate': 2.898250486429719e-06, 'epoch': 0.76} 76%|███████▌ | 4381/5773 [2:58:20<2:08:52, 5.55s/it] 76%|███████▌ | 4382/5773 [2:58:28<2:08:39, 5.55s/it] 76%|███████▌ | 4382/5773 [2:58:26<2:08:39, 5.55s/it] {'loss': 0.5663, 'learning_rate': 
2.894301326798863e-06, 'epoch': 0.76} 76%|███████▌ | 4382/5773 [2:58:28<2:08:39, 5.55s/it] {'loss': 0.5663, 'learning_rate': 2.894301326798863e-06, 'epoch': 0.76} 76%|███████▌ | 4382/5773 [2:58:26<2:08:39, 5.55s/it] 76%|███████▌ | 4383/5773 [2:58:33<2:08:05, 5.53s/it] 76%|███████▌ | 4383/5773 [2:58:31<2:08:05, 5.53s/it] {'loss': 0.5752, 'learning_rate': 2.8903544042689745e-06, 'epoch': 0.76} 76%|███████▌ | 4383/5773 [2:58:33<2:08:05, 5.53s/it] {'loss': 0.5752, 'learning_rate': 2.8903544042689745e-06, 'epoch': 0.76} 76%|███████▌ | 4383/5773 [2:58:31<2:08:05, 5.53s/it] 76%|███████▌ | 4384/5773 [2:58:39<2:07:49, 5.52s/it] 76%|███████▌ | 4384/5773 [2:58:37<2:07:49, 5.52s/it] {'loss': 0.5533, 'learning_rate': 2.886409720082668e-06, 'epoch': 0.76} 76%|███████▌ | 4384/5773 [2:58:39<2:07:49, 5.52s/it] {'loss': 0.5533, 'learning_rate': 2.886409720082668e-06, 'epoch': 0.76} 76%|███████▌ | 4384/5773 [2:58:37<2:07:49, 5.52s/it] 76%|███████▌ | 4385/5773 [2:58:44<2:08:19, 5.55s/it] 76%|███████▌ | 4385/5773 [2:58:42<2:08:19, 5.55s/it] {'loss': 0.5565, 'learning_rate': 2.8824672754818617e-06, 'epoch': 0.76} 76%|███████▌ | 4385/5773 [2:58:44<2:08:19, 5.55s/it] {'loss': 0.5565, 'learning_rate': 2.8824672754818617e-06, 'epoch': 0.76} 76%|███████▌ | 4385/5773 [2:58:42<2:08:19, 5.55s/it] 76%|███████▌ | 4386/5773 [2:58:48<2:07:50, 5.53s/it] 76%|███████▌ | 4386/5773 [2:58:50<2:07:51, 5.53s/it] {'loss': 0.5603, 'learning_rate': 2.8785270717077618e-06, 'epoch': 0.76} 76%|███████▌ | 4386/5773 [2:58:50<2:07:51, 5.53s/it] {'loss': 0.5603, 'learning_rate': 2.8785270717077618e-06, 'epoch': 0.76} 76%|███████▌ | 4386/5773 [2:58:48<2:07:50, 5.53s/it] 76%|███████▌ | 4387/5773 [2:58:55<2:06:22, 5.47s/it] 76%|███████▌ | 4387/5773 [2:58:53<2:06:22, 5.47s/it] {'loss': 0.5682, 'learning_rate': 2.8745891100008683e-06, 'epoch': 0.76} 76%|███████▌ | 4387/5773 [2:58:55<2:06:22, 5.47s/it] {'loss': 0.5682, 'learning_rate': 2.8745891100008683e-06, 'epoch': 0.76} 76%|███████▌ | 4387/5773 [2:58:53<2:06:22, 
5.47s/it] 76%|███████▌ | 4388/5773 [2:59:01<2:06:05, 5.46s/it] 76%|███████▌ | 4388/5773 [2:58:59<2:06:05, 5.46s/it] {'loss': 0.5555, 'learning_rate': 2.870653391600976e-06, 'epoch': 0.76} 76%|███████▌ | 4388/5773 [2:59:01<2:06:05, 5.46s/it] {'loss': 0.5555, 'learning_rate': 2.870653391600976e-06, 'epoch': 0.76} 76%|███████▌ | 4388/5773 [2:58:59<2:06:05, 5.46s/it] 76%|███████▌ | 4389/5773 [2:59:06<2:05:17, 5.43s/it] 76%|███████▌ | 4389/5773 [2:59:04<2:05:17, 5.43s/it] {'loss': 0.5639, 'learning_rate': 2.8667199177471794e-06, 'epoch': 0.76} 76%|███████▌ | 4389/5773 [2:59:06<2:05:17, 5.43s/it] {'loss': 0.5639, 'learning_rate': 2.8667199177471794e-06, 'epoch': 0.76} 76%|███████▌ | 4389/5773 [2:59:04<2:05:17, 5.43s/it] 76%|███████▌ | 4390/5773 [2:59:11<2:05:33, 5.45s/it] 76%|███████▌ | 4390/5773 [2:59:09<2:05:34, 5.45s/it] {'loss': 0.5612, 'learning_rate': 2.86278868967786e-06, 'epoch': 0.76} 76%|███████▌ | 4390/5773 [2:59:11<2:05:33, 5.45s/it] {'loss': 0.5612, 'learning_rate': 2.86278868967786e-06, 'epoch': 0.76} 76%|███████▌ | 4390/5773 [2:59:09<2:05:34, 5.45s/it] 76%|███████▌ | 4391/5773 [2:59:15<2:05:28, 5.45s/it] 76%|███████▌ | 4391/5773 [2:59:17<2:05:29, 5.45s/it] {'loss': 0.5541, 'learning_rate': 2.8588597086306933e-06, 'epoch': 0.76} 76%|███████▌ | 4391/5773 [2:59:17<2:05:29, 5.45s/it] {'loss': 0.5541, 'learning_rate': 2.8588597086306933e-06, 'epoch': 0.76} 76%|███████▌ | 4391/5773 [2:59:15<2:05:28, 5.45s/it] 76%|███████▌ | 4392/5773 [2:59:22<2:05:14, 5.44s/it] 76%|███████▌ | 4392/5773 [2:59:20<2:05:14, 5.44s/it] {'loss': 0.5561, 'learning_rate': 2.854932975842648e-06, 'epoch': 0.76} 76%|███████▌ | 4392/5773 [2:59:22<2:05:14, 5.44s/it] {'loss': 0.5561, 'learning_rate': 2.854932975842648e-06, 'epoch': 0.76} 76%|███████▌ | 4392/5773 [2:59:20<2:05:14, 5.44s/it] 76%|███████▌ | 4393/5773 [2:59:28<2:06:20, 5.49s/it] 76%|███████▌ | 4393/5773 [2:59:26<2:06:21, 5.49s/it] {'loss': 0.5763, 'learning_rate': 2.851008492549986e-06, 'epoch': 0.76} 76%|███████▌ | 4393/5773 
[2:59:28<2:06:20, 5.49s/it] {'loss': 0.5763, 'learning_rate': 2.851008492549986e-06, 'epoch': 0.76} 76%|███████▌ | 4393/5773 [2:59:26<2:06:21, 5.49s/it] 76%|███████▌ | 4394/5773 [2:59:34<2:08:27, 5.59s/it] 76%|███████▌ | 4394/5773 [2:59:32<2:08:28, 5.59s/it] {'loss': 0.5361, 'learning_rate': 2.8470862599882554e-06, 'epoch': 0.76} 76%|███████▌ | 4394/5773 [2:59:34<2:08:27, 5.59s/it] {'loss': 0.5361, 'learning_rate': 2.8470862599882554e-06, 'epoch': 0.76} 76%|███████▌ | 4394/5773 [2:59:32<2:08:28, 5.59s/it] 76%|███████▌ | 4395/5773 [2:59:39<2:06:56, 5.53s/it] 76%|███████▌ | 4395/5773 [2:59:37<2:06:56, 5.53s/it] {'loss': 0.5726, 'learning_rate': 2.8431662793923075e-06, 'epoch': 0.76} 76%|███████▌ | 4395/5773 [2:59:39<2:06:56, 5.53s/it] {'loss': 0.5726, 'learning_rate': 2.8431662793923075e-06, 'epoch': 0.76} 76%|███████▌ | 4395/5773 [2:59:37<2:06:56, 5.53s/it] 76%|███████▌ | 4396/5773 [2:59:45<2:05:49, 5.48s/it] 76%|███████▌ | 4396/5773 [2:59:43<2:05:48, 5.48s/it] {'loss': 0.5494, 'learning_rate': 2.839248551996274e-06, 'epoch': 0.76} 76%|███████▌ | 4396/5773 [2:59:45<2:05:49, 5.48s/it] {'loss': 0.5494, 'learning_rate': 2.839248551996274e-06, 'epoch': 0.76} 76%|███████▌ | 4396/5773 [2:59:43<2:05:48, 5.48s/it] 76%|███████▌ | 4397/5773 [2:59:50<2:06:20, 5.51s/it] 76%|███████▌ | 4397/5773 [2:59:48<2:06:20, 5.51s/it] {'loss': 0.568, 'learning_rate': 2.8353330790335777e-06, 'epoch': 0.76} 76%|███████▌ | 4397/5773 [2:59:50<2:06:20, 5.51s/it] {'loss': 0.568, 'learning_rate': 2.8353330790335777e-06, 'epoch': 0.76} 76%|███████▌ | 4397/5773 [2:59:48<2:06:20, 5.51s/it] 76%|███████▌ | 4398/5773 [2:59:54<2:05:54, 5.49s/it] 76%|███████▌ | 4398/5773 [2:59:56<2:05:55, 5.49s/it] {'loss': 0.5531, 'learning_rate': 2.8314198617369403e-06, 'epoch': 0.76} 76%|███████▌ | 4398/5773 [2:59:56<2:05:55, 5.49s/it] {'loss': 0.5531, 'learning_rate': 2.8314198617369403e-06, 'epoch': 0.76} 76%|███████▌ | 4398/5773 [2:59:54<2:05:54, 5.49s/it] 76%|███████▌ | 4399/5773 [3:00:01<2:06:05, 5.51s/it] 
76%|███████▌ | 4399/5773 [2:59:59<2:06:06, 5.51s/it] {'loss': 0.5445, 'learning_rate': 2.827508901338366e-06, 'epoch': 0.76} 76%|███████▌ | 4399/5773 [3:00:01<2:06:05, 5.51s/it] {'loss': 0.5445, 'learning_rate': 2.827508901338366e-06, 'epoch': 0.76} 76%|███████▌ | 4399/5773 [2:59:59<2:06:06, 5.51s/it]11 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 138 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 50 AutoResumeHook: Checking whether to suspend... 76%|███████▌ | 4400/5773 [3:00:07<2:06:15, 5.52s/it]10 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 
76%|███████▌ | 4400/5773 [3:00:05<2:06:15, 5.52s/it] {'loss': 0.5733, 'learning_rate': 2.8236001990691487e-06, 'epoch': 0.76} 76%|███████▌ | 4400/5773 [3:00:07<2:06:15, 5.52s/it] {'loss': 0.5733, 'learning_rate': 2.8236001990691487e-06, 'epoch': 0.76} 76%|███████▌ | 4400/5773 [3:00:05<2:06:15, 5.52s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 76%|███████▌ | 4401/5773 [3:00:30<4:09:45, 10.92s/it] 76%|███████▌ | 4401/5773 [3:00:28<4:09:45, 10.92s/it] {'loss': 0.5798, 'learning_rate': 2.8196937561598704e-06, 'epoch': 0.76} 76%|███████▌ | 4401/5773 [3:00:30<4:09:45, 10.92s/it] {'loss': 0.5798, 'learning_rate': 2.8196937561598704e-06, 'epoch': 0.76} 76%|███████▌ | 4401/5773 [3:00:28<4:09:45, 10.92s/it] 76%|███████▋ | 4402/5773 [3:00:36<3:35:12, 9.42s/it] 76%|███████▋ | 4402/5773 [3:00:34<3:35:12, 9.42s/it] {'loss': 0.5784, 'learning_rate': 2.8157895738404095e-06, 'epoch': 0.76} 76%|███████▋ | 4402/5773 [3:00:36<3:35:12, 9.42s/it] {'loss': 0.5784, 'learning_rate': 2.8157895738404095e-06, 'epoch': 0.76} 76%|███████▋ | 4402/5773 [3:00:34<3:35:12, 9.42s/it] 76%|███████▋ | 4403/5773 [3:00:42<3:08:09, 8.24s/it] 76%|███████▋ | 4403/5773 [3:00:40<3:08:09, 8.24s/it] {'loss': 0.5318, 'learning_rate': 2.811887653339924e-06, 'epoch': 0.76} 76%|███████▋ | 4403/5773 [3:00:42<3:08:09, 8.24s/it] {'loss': 0.5318, 'learning_rate': 2.811887653339924e-06, 'epoch': 0.76} 76%|███████▋ | 4403/5773 [3:00:40<3:08:09, 8.24s/it] 76%|███████▋ | 4404/5773 [3:00:47<2:48:49, 7.40s/it] 76%|███████▋ | 4404/5773 [3:00:45<2:48:49, 7.40s/it] {'loss': 0.5623, 'learning_rate': 2.8079879958868615e-06, 'epoch': 0.76} 76%|███████▋ | 4404/5773 [3:00:47<2:48:49, 7.40s/it] {'loss': 0.5623, 'learning_rate': 2.8079879958868615e-06, 'epoch': 0.76} 76%|███████▋ | 4404/5773 [3:00:45<2:48:49, 7.40s/it] 76%|███████▋ | 4405/5773 [3:00:52<2:35:17, 6.81s/it] 76%|███████▋ | 4405/5773 [3:00:50<2:35:17, 6.81s/it] {'loss': 0.5679, 'learning_rate': 2.8040906027089674e-06, 'epoch': 0.76} 76%|███████▋ | 4405/5773 [3:00:52<2:35:17, 6.81s/it] {'loss': 0.5679, 'learning_rate': 2.8040906027089674e-06, 'epoch': 0.76} 76%|███████▋ | 4405/5773 [3:00:50<2:35:17, 6.81s/it] 76%|███████▋ | 4406/5773 [3:00:58<2:27:00, 6.45s/it] 76%|███████▋ | 4406/5773 [3:00:56<2:27:00, 6.45s/it] {'loss': 0.5638, 'learning_rate': 2.800195475033256e-06, 'epoch': 0.76} 76%|███████▋ | 
4406/5773 [3:00:58<2:27:00, 6.45s/it] {'loss': 0.5638, 'learning_rate': 2.800195475033256e-06, 'epoch': 0.76} 76%|███████▋ | 4406/5773 [3:00:56<2:27:00, 6.45s/it] 76%|███████▋ | 4407/5773 [3:01:04<2:20:08, 6.16s/it] 76%|███████▋ | 4407/5773 [3:01:02<2:20:08, 6.16s/it] {'loss': 0.5585, 'learning_rate': 2.7963026140860384e-06, 'epoch': 0.76} 76%|███████▋ | 4407/5773 [3:01:04<2:20:08, 6.16s/it] {'loss': 0.5585, 'learning_rate': 2.7963026140860384e-06, 'epoch': 0.76} 76%|███████▋ | 4407/5773 [3:01:02<2:20:08, 6.16s/it] 76%|███████▋ | 4408/5773 [3:01:09<2:14:23, 5.91s/it] 76%|███████▋ | 4408/5773 [3:01:07<2:14:23, 5.91s/it] {'loss': 0.5453, 'learning_rate': 2.7924120210929174e-06, 'epoch': 0.76} 76%|███████▋ | 4408/5773 [3:01:09<2:14:23, 5.91s/it] {'loss': 0.5453, 'learning_rate': 2.7924120210929174e-06, 'epoch': 0.76} 76%|███████▋ | 4408/5773 [3:01:07<2:14:23, 5.91s/it] 76%|███████▋ | 4409/5773 [3:01:14<2:10:38, 5.75s/it] 76%|███████▋ | 4409/5773 [3:01:12<2:10:38, 5.75s/it] {'loss': 0.5589, 'learning_rate': 2.7885236972787733e-06, 'epoch': 0.76} 76%|███████▋ | 4409/5773 [3:01:14<2:10:38, 5.75s/it] {'loss': 0.5589, 'learning_rate': 2.7885236972787733e-06, 'epoch': 0.76} 76%|███████▋ | 4409/5773 [3:01:12<2:10:38, 5.75s/it] 76%|███████▋ | 4410/5773 [3:01:20<2:09:17, 5.69s/it] 76%|███████▋ | 4410/5773 [3:01:18<2:09:17, 5.69s/it] {'loss': 0.553, 'learning_rate': 2.7846376438677715e-06, 'epoch': 0.76} 76%|███████▋ | 4410/5773 [3:01:20<2:09:17, 5.69s/it] {'loss': 0.553, 'learning_rate': 2.7846376438677715e-06, 'epoch': 0.76} 76%|███████▋ | 4410/5773 [3:01:18<2:09:17, 5.69s/it] 76%|███████▋ | 4411/5773 [3:01:25<2:08:18, 5.65s/it] 76%|███████▋ | 4411/5773 [3:01:23<2:08:18, 5.65s/it] {'loss': 0.5477, 'learning_rate': 2.7807538620833707e-06, 'epoch': 0.76} 76%|███████▋ | 4411/5773 [3:01:25<2:08:18, 5.65s/it] {'loss': 0.5477, 'learning_rate': 2.7807538620833707e-06, 'epoch': 0.76} 76%|███████▋ | 4411/5773 [3:01:23<2:08:18, 5.65s/it] 76%|███████▋ | 4412/5773 [3:01:31<2:07:02, 
5.60s/it] 76%|███████▋ | 4412/5773 [3:01:29<2:07:02, 5.60s/it] {'loss': 0.5777, 'learning_rate': 2.776872353148308e-06, 'epoch': 0.76} 76%|███████▋ | 4412/5773 [3:01:31<2:07:02, 5.60s/it] {'loss': 0.5777, 'learning_rate': 2.776872353148308e-06, 'epoch': 0.76} 76%|███████▋ | 4412/5773 [3:01:29<2:07:02, 5.60s/it] 76%|███████▋ | 4413/5773 [3:01:36<2:07:00, 5.60s/it] 76%|███████▋ | 4413/5773 [3:01:34<2:07:00, 5.60s/it] {'loss': 0.5691, 'learning_rate': 2.772993118284606e-06, 'epoch': 0.76} 76%|███████▋ | 4413/5773 [3:01:36<2:07:00, 5.60s/it] {'loss': 0.5691, 'learning_rate': 2.772993118284606e-06, 'epoch': 0.76} 76%|███████▋ | 4413/5773 [3:01:34<2:07:00, 5.60s/it] 76%|███████▋ | 4414/5773 [3:01:42<2:06:24, 5.58s/it] 76%|███████▋ | 4414/5773 [3:01:40<2:06:24, 5.58s/it] {'loss': 0.5515, 'learning_rate': 2.7691161587135683e-06, 'epoch': 0.76} 76%|███████▋ | 4414/5773 [3:01:42<2:06:24, 5.58s/it] {'loss': 0.5515, 'learning_rate': 2.7691161587135683e-06, 'epoch': 0.76} 76%|███████▋ | 4414/5773 [3:01:40<2:06:24, 5.58s/it] 76%|███████▋ | 4415/5773 [3:01:47<2:05:42, 5.55s/it] 76%|███████▋ | 4415/5773 [3:01:45<2:05:42, 5.55s/it] {'loss': 0.582, 'learning_rate': 2.765241475655792e-06, 'epoch': 0.76} 76%|███████▋ | 4415/5773 [3:01:47<2:05:42, 5.55s/it] {'loss': 0.582, 'learning_rate': 2.765241475655792e-06, 'epoch': 0.76} 76%|███████▋ | 4415/5773 [3:01:45<2:05:42, 5.55s/it] 76%|███████▋ | 4416/5773 [3:01:53<2:04:20, 5.50s/it] 76%|███████▋ | 4416/5773 [3:01:51<2:04:20, 5.50s/it] {'loss': 0.5708, 'learning_rate': 2.761369070331149e-06, 'epoch': 0.76} 76%|███████▋ | 4416/5773 [3:01:53<2:04:20, 5.50s/it] {'loss': 0.5708, 'learning_rate': 2.761369070331149e-06, 'epoch': 0.76} 76%|███████▋ | 4416/5773 [3:01:51<2:04:20, 5.50s/it] 77%|███████▋ | 4417/5773 [3:01:58<2:03:45, 5.48s/it] 77%|███████▋ | 4417/5773 [3:01:56<2:03:45, 5.48s/it] {'loss': 0.554, 'learning_rate': 2.7574989439587917e-06, 'epoch': 0.77} 77%|███████▋ | 4417/5773 [3:01:58<2:03:45, 5.48s/it] {'loss': 0.554, 
'learning_rate': 2.7574989439587917e-06, 'epoch': 0.77} 77%|███████▋ | 4417/5773 [3:01:56<2:03:45, 5.48s/it] 77%|███████▋ | 4418/5773 [3:02:04<2:03:50, 5.48s/it] 77%|███████▋ | 4418/5773 [3:02:02<2:03:50, 5.48s/it] {'loss': 0.5737, 'learning_rate': 2.7536310977571655e-06, 'epoch': 0.77} 77%|███████▋ | 4418/5773 [3:02:04<2:03:50, 5.48s/it] {'loss': 0.5737, 'learning_rate': 2.7536310977571655e-06, 'epoch': 0.77} 77%|███████▋ | 4418/5773 [3:02:02<2:03:50, 5.48s/it] 77%|███████▋ | 4419/5773 [3:02:09<2:03:20, 5.47s/it] 77%|███████▋ | 4419/5773 [3:02:07<2:03:20, 5.47s/it] {'loss': 0.5528, 'learning_rate': 2.749765532943993e-06, 'epoch': 0.77} 77%|███████▋ | 4419/5773 [3:02:09<2:03:20, 5.47s/it] {'loss': 0.5528, 'learning_rate': 2.749765532943993e-06, 'epoch': 0.77} 77%|███████▋ | 4419/5773 [3:02:07<2:03:20, 5.47s/it] 77%|███████▋ | 4420/5773 [3:02:15<2:04:06, 5.50s/it] 77%|███████▋ | 4420/5773 [3:02:13<2:04:06, 5.50s/it] {'loss': 0.5569, 'learning_rate': 2.7459022507362687e-06, 'epoch': 0.77} 77%|███████▋ | 4420/5773 [3:02:15<2:04:06, 5.50s/it] {'loss': 0.5569, 'learning_rate': 2.7459022507362687e-06, 'epoch': 0.77} 77%|███████▋ | 4420/5773 [3:02:13<2:04:06, 5.50s/it] 77%|███████▋ | 4421/5773 [3:02:20<2:03:37, 5.49s/it] 77%|███████▋ | 4421/5773 [3:02:18<2:03:36, 5.49s/it] {'loss': 0.5778, 'learning_rate': 2.742041252350286e-06, 'epoch': 0.77} 77%|███████▋ | 4421/5773 [3:02:20<2:03:37, 5.49s/it] {'loss': 0.5778, 'learning_rate': 2.742041252350286e-06, 'epoch': 0.77} 77%|███████▋ | 4421/5773 [3:02:18<2:03:36, 5.49s/it] 77%|███████▋ | 4422/5773 [3:02:26<2:03:54, 5.50s/it] 77%|███████▋ | 4422/5773 [3:02:24<2:03:54, 5.50s/it] {'loss': 0.5558, 'learning_rate': 2.738182539001607e-06, 'epoch': 0.77} 77%|███████▋ | 4422/5773 [3:02:26<2:03:54, 5.50s/it] {'loss': 0.5558, 'learning_rate': 2.738182539001607e-06, 'epoch': 0.77} 77%|███████▋ | 4422/5773 [3:02:24<2:03:54, 5.50s/it] 77%|███████▋ | 4423/5773 [3:02:31<2:02:23, 5.44s/it] 77%|███████▋ | 4423/5773 [3:02:29<2:02:23, 5.44s/it] 
{'loss': 0.5707, 'learning_rate': 2.7343261119050744e-06, 'epoch': 0.77} 77%|███████▋ | 4423/5773 [3:02:31<2:02:23, 5.44s/it] {'loss': 0.5707, 'learning_rate': 2.7343261119050744e-06, 'epoch': 0.77} 77%|███████▋ | 4423/5773 [3:02:29<2:02:23, 5.44s/it] 77%|███████▋ | 4424/5773 [3:02:37<2:02:34, 5.45s/it] 77%|███████▋ | 4424/5773 [3:02:35<2:02:34, 5.45s/it] {'loss': 0.5629, 'learning_rate': 2.730471972274822e-06, 'epoch': 0.77} 77%|███████▋ | 4424/5773 [3:02:37<2:02:34, 5.45s/it] {'loss': 0.5629, 'learning_rate': 2.730471972274822e-06, 'epoch': 0.77} 77%|███████▋ | 4424/5773 [3:02:35<2:02:34, 5.45s/it] 77%|███████▋ | 4425/5773 [3:02:42<2:02:35, 5.46s/it] 77%|███████▋ | 4425/5773 [3:02:40<2:02:34, 5.46s/it] {'loss': 0.5531, 'learning_rate': 2.7266201213242526e-06, 'epoch': 0.77} 77%|███████▋ | 4425/5773 [3:02:42<2:02:35, 5.46s/it] {'loss': 0.5531, 'learning_rate': 2.7266201213242526e-06, 'epoch': 0.77} 77%|███████▋ | 4425/5773 [3:02:40<2:02:34, 5.46s/it] 77%|███████▋ | 4426/5773 [3:02:47<2:02:37, 5.46s/it] 77%|███████▋ | 4426/5773 [3:02:45<2:02:37, 5.46s/it] {'loss': 0.5859, 'learning_rate': 2.7227705602660513e-06, 'epoch': 0.77} 77%|███████▋ | 4426/5773 [3:02:47<2:02:37, 5.46s/it] {'loss': 0.5859, 'learning_rate': 2.7227705602660513e-06, 'epoch': 0.77} 77%|███████▋ | 4426/5773 [3:02:45<2:02:37, 5.46s/it] 77%|███████▋ | 4427/5773 [3:02:53<2:03:14, 5.49s/it] 77%|███████▋ | 4427/5773 [3:02:51<2:03:14, 5.49s/it] {'loss': 0.5802, 'learning_rate': 2.718923290312182e-06, 'epoch': 0.77} 77%|███████▋ | 4427/5773 [3:02:53<2:03:14, 5.49s/it] {'loss': 0.5802, 'learning_rate': 2.718923290312182e-06, 'epoch': 0.77} 77%|███████▋ | 4427/5773 [3:02:51<2:03:14, 5.49s/it] 77%|███████▋ | 4428/5773 [3:02:59<2:03:18, 5.50s/it] 77%|███████▋ | 4428/5773 [3:02:57<2:03:18, 5.50s/it] {'loss': 0.5725, 'learning_rate': 2.715078312673891e-06, 'epoch': 0.77} 77%|███████▋ | 4428/5773 [3:02:59<2:03:18, 5.50s/it] {'loss': 0.5725, 'learning_rate': 2.715078312673891e-06, 'epoch': 0.77} 77%|███████▋ | 
4428/5773 [3:02:57<2:03:18, 5.50s/it] 77%|███████▋ | 4429/5773 [3:03:04<2:02:47, 5.48s/it] 77%|███████▋ | 4429/5773 [3:03:02<2:02:47, 5.48s/it] {'loss': 0.54, 'learning_rate': 2.7112356285617e-06, 'epoch': 0.77} 77%|███████▋ | 4429/5773 [3:03:04<2:02:47, 5.48s/it] {'loss': 0.54, 'learning_rate': 2.7112356285617e-06, 'epoch': 0.77} 77%|███████▋ | 4429/5773 [3:03:02<2:02:47, 5.48s/it] 77%|███████▋ | 4430/5773 [3:03:10<2:03:47, 5.53s/it] 77%|███████▋ | 4430/5773 [3:03:08<2:03:47, 5.53s/it] {'loss': 0.5435, 'learning_rate': 2.707395239185404e-06, 'epoch': 0.77} 77%|███████▋ | 4430/5773 [3:03:10<2:03:47, 5.53s/it] {'loss': 0.5435, 'learning_rate': 2.707395239185404e-06, 'epoch': 0.77} 77%|███████▋ | 4430/5773 [3:03:08<2:03:47, 5.53s/it] 77%|███████▋ | 4431/5773 [3:03:15<2:02:27, 5.47s/it] 77%|███████▋ | 4431/5773 [3:03:13<2:02:27, 5.47s/it] {'loss': 0.5434, 'learning_rate': 2.7035571457540865e-06, 'epoch': 0.77} 77%|███████▋ | 4431/5773 [3:03:15<2:02:27, 5.47s/it] {'loss': 0.5434, 'learning_rate': 2.7035571457540865e-06, 'epoch': 0.77} 77%|███████▋ | 4431/5773 [3:03:13<2:02:27, 5.47s/it] 77%|███████▋ | 4432/5773 [3:03:20<2:02:43, 5.49s/it] 77%|███████▋ | 4432/5773 [3:03:19<2:02:43, 5.49s/it] {'loss': 0.555, 'learning_rate': 2.6997213494761e-06, 'epoch': 0.77} 77%|███████▋ | 4432/5773 [3:03:20<2:02:43, 5.49s/it] {'loss': 0.555, 'learning_rate': 2.6997213494761e-06, 'epoch': 0.77} 77%|███████▋ | 4432/5773 [3:03:19<2:02:43, 5.49s/it] 77%|███████▋ | 4433/5773 [3:03:26<2:01:40, 5.45s/it] 77%|███████▋ | 4433/5773 [3:03:24<2:01:40, 5.45s/it] {'loss': 0.5693, 'learning_rate': 2.6958878515590747e-06, 'epoch': 0.77} 77%|███████▋ | 4433/5773 [3:03:26<2:01:40, 5.45s/it] {'loss': 0.5693, 'learning_rate': 2.6958878515590747e-06, 'epoch': 0.77} 77%|███████▋ | 4433/5773 [3:03:24<2:01:40, 5.45s/it] 77%|███████▋ | 4434/5773 [3:03:31<2:01:33, 5.45s/it] 77%|███████▋ | 4434/5773 [3:03:29<2:01:33, 5.45s/it] {'loss': 0.549, 'learning_rate': 2.692056653209919e-06, 'epoch': 0.77} 77%|███████▋ | 
4434/5773 [3:03:31<2:01:33, 5.45s/it] {'loss': 0.549, 'learning_rate': 2.692056653209919e-06, 'epoch': 0.77} 77%|███████▋ | 4434/5773 [3:03:29<2:01:33, 5.45s/it] 77%|███████▋ | 4435/5773 [3:03:37<2:00:37, 5.41s/it] 77%|███████▋ | 4435/5773 [3:03:35<2:00:37, 5.41s/it] {'loss': 0.5552, 'learning_rate': 2.6882277556348156e-06, 'epoch': 0.77} 77%|███████▋ | 4435/5773 [3:03:37<2:00:37, 5.41s/it] {'loss': 0.5552, 'learning_rate': 2.6882277556348156e-06, 'epoch': 0.77} 77%|███████▋ | 4435/5773 [3:03:35<2:00:37, 5.41s/it] 77%|███████▋ | 4436/5773 [3:03:42<2:00:27, 5.41s/it] 77%|███████▋ | 4436/5773 [3:03:40<2:00:27, 5.41s/it] {'loss': 0.5485, 'learning_rate': 2.6844011600392215e-06, 'epoch': 0.77} 77%|███████▋ | 4436/5773 [3:03:42<2:00:27, 5.41s/it] {'loss': 0.5485, 'learning_rate': 2.6844011600392215e-06, 'epoch': 0.77} 77%|███████▋ | 4436/5773 [3:03:40<2:00:27, 5.41s/it] 77%|███████▋ | 4437/5773 [3:03:48<2:01:18, 5.45s/it] 77%|███████▋ | 4437/5773 [3:03:46<2:01:18, 5.45s/it] {'loss': 0.5432, 'learning_rate': 2.6805768676278766e-06, 'epoch': 0.77} 77%|███████▋ | 4437/5773 [3:03:48<2:01:18, 5.45s/it] {'loss': 0.5432, 'learning_rate': 2.6805768676278766e-06, 'epoch': 0.77} 77%|███████▋ | 4437/5773 [3:03:46<2:01:18, 5.45s/it] 77%|███████▋ | 4438/5773 [3:03:53<2:01:28, 5.46s/it] 77%|███████▋ | 4438/5773 [3:03:51<2:01:28, 5.46s/it] {'loss': 0.5689, 'learning_rate': 2.676754879604788e-06, 'epoch': 0.77} 77%|███████▋ | 4438/5773 [3:03:53<2:01:28, 5.46s/it] {'loss': 0.5689, 'learning_rate': 2.676754879604788e-06, 'epoch': 0.77} 77%|███████▋ | 4438/5773 [3:03:51<2:01:28, 5.46s/it] 77%|███████▋ | 4439/5773 [3:03:59<2:02:53, 5.53s/it] 77%|███████▋ | 4439/5773 [3:03:57<2:02:53, 5.53s/it] {'loss': 0.5616, 'learning_rate': 2.6729351971732398e-06, 'epoch': 0.77} 77%|███████▋ | 4439/5773 [3:03:59<2:02:53, 5.53s/it] {'loss': 0.5616, 'learning_rate': 2.6729351971732398e-06, 'epoch': 0.77} 77%|███████▋ | 4439/5773 [3:03:57<2:02:53, 5.53s/it] 77%|███████▋ | 4440/5773 [3:04:04<2:03:03, 
5.54s/it] 77%|███████▋ | 4440/5773 [3:04:02<2:03:03, 5.54s/it] {'loss': 0.5659, 'learning_rate': 2.669117821535786e-06, 'epoch': 0.77} 77%|███████▋ | 4440/5773 [3:04:04<2:03:03, 5.54s/it] {'loss': 0.5659, 'learning_rate': 2.669117821535786e-06, 'epoch': 0.77} 77%|███████▋ | 4440/5773 [3:04:02<2:03:03, 5.54s/it] 77%|███████▋ | 4441/5773 [3:04:10<2:02:35, 5.52s/it] 77%|███████▋ | 4441/5773 [3:04:08<2:02:35, 5.52s/it] {'loss': 0.5594, 'learning_rate': 2.6653027538942655e-06, 'epoch': 0.77} 77%|███████▋ | 4441/5773 [3:04:10<2:02:35, 5.52s/it] {'loss': 0.5594, 'learning_rate': 2.6653027538942655e-06, 'epoch': 0.77} 77%|███████▋ | 4441/5773 [3:04:08<2:02:35, 5.52s/it] 77%|███████▋ | 4442/5773 [3:04:15<2:01:45, 5.49s/it] 77%|███████▋ | 4442/5773 [3:04:13<2:01:45, 5.49s/it] {'loss': 0.5643, 'learning_rate': 2.6614899954497797e-06, 'epoch': 0.77} 77%|███████▋ | 4442/5773 [3:04:15<2:01:45, 5.49s/it] {'loss': 0.5643, 'learning_rate': 2.6614899954497797e-06, 'epoch': 0.77} 77%|███████▋ | 4442/5773 [3:04:13<2:01:45, 5.49s/it] 77%|███████▋ | 4443/5773 [3:04:21<2:01:11, 5.47s/it] 77%|███████▋ | 4443/5773 [3:04:19<2:01:11, 5.47s/it] {'loss': 0.5566, 'learning_rate': 2.657679547402704e-06, 'epoch': 0.77} 77%|███████▋ | 4443/5773 [3:04:21<2:01:11, 5.47s/it] {'loss': 0.5566, 'learning_rate': 2.657679547402704e-06, 'epoch': 0.77} 77%|███████▋ | 4443/5773 [3:04:19<2:01:11, 5.47s/it] 77%|███████▋ | 4444/5773 [3:04:26<2:01:05, 5.47s/it] 77%|███████▋ | 4444/5773 [3:04:24<2:01:05, 5.47s/it] {'loss': 0.57, 'learning_rate': 2.653871410952695e-06, 'epoch': 0.77} 77%|███████▋ | 4444/5773 [3:04:26<2:01:05, 5.47s/it] {'loss': 0.57, 'learning_rate': 2.653871410952695e-06, 'epoch': 0.77} 77%|███████▋ | 4444/5773 [3:04:24<2:01:05, 5.47s/it] 77%|███████▋ | 4445/5773 [3:04:32<2:02:08, 5.52s/it] 77%|███████▋ | 4445/5773 [3:04:30<2:02:08, 5.52s/it] {'loss': 0.5659, 'learning_rate': 2.650065587298675e-06, 'epoch': 0.77} 77%|███████▋ | 4445/5773 [3:04:32<2:02:08, 5.52s/it] {'loss': 0.5659, 
'learning_rate': 2.650065587298675e-06, 'epoch': 0.77} 77%|███████▋ | 4445/5773 [3:04:30<2:02:08, 5.52s/it] 77%|███████▋ | 4446/5773 [3:04:37<2:01:24, 5.49s/it] 77%|███████▋ | 4446/5773 [3:04:35<2:01:24, 5.49s/it] {'loss': 0.5978, 'learning_rate': 2.6462620776388313e-06, 'epoch': 0.77} 77%|███████▋ | 4446/5773 [3:04:37<2:01:24, 5.49s/it] {'loss': 0.5978, 'learning_rate': 2.6462620776388313e-06, 'epoch': 0.77} 77%|███████▋ | 4446/5773 [3:04:35<2:01:24, 5.49s/it] 77%|███████▋ | 4447/5773 [3:04:41<2:02:03, 5.52s/it] 77%|███████▋ | 4447/5773 [3:04:43<2:02:03, 5.52s/it] {'loss': 0.5788, 'learning_rate': 2.6424608831706435e-06, 'epoch': 0.77} 77%|███████▋ | 4447/5773 [3:04:43<2:02:03, 5.52s/it] {'loss': 0.5788, 'learning_rate': 2.6424608831706435e-06, 'epoch': 0.77} 77%|███████▋ | 4447/5773 [3:04:41<2:02:03, 5.52s/it] 77%|███████▋ | 4448/5773 [3:04:48<2:02:11, 5.53s/it] 77%|███████▋ | 4448/5773 [3:04:46<2:02:11, 5.53s/it] {'loss': 0.5429, 'learning_rate': 2.6386620050908383e-06, 'epoch': 0.77} 77%|███████▋ | 4448/5773 [3:04:48<2:02:11, 5.53s/it] {'loss': 0.5429, 'learning_rate': 2.6386620050908383e-06, 'epoch': 0.77} 77%|███████▋ | 4448/5773 [3:04:46<2:02:11, 5.53s/it] 77%|███████▋ | 4449/5773 [3:04:54<2:01:11, 5.49s/it] 77%|███████▋ | 4449/5773 [3:04:52<2:01:11, 5.49s/it] {'loss': 0.5571, 'learning_rate': 2.6348654445954235e-06, 'epoch': 0.77} 77%|███████▋ | 4449/5773 [3:04:54<2:01:11, 5.49s/it] {'loss': 0.5571, 'learning_rate': 2.6348654445954235e-06, 'epoch': 0.77} 77%|███████▋ | 4449/5773 [3:04:52<2:01:11, 5.49s/it]11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 77%|███████▋ | 4450/5773 [3:04:59<2:00:13, 5.45s/it]13 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 
450 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 77%|███████▋ | 4450/5773 [3:04:57<2:00:14, 5.45s/it] {'loss': 0.5673, 'learning_rate': 2.631071202879685e-06, 'epoch': 0.77} 77%|███████▋ | 4450/5773 [3:04:59<2:00:13, 5.45s/it] {'loss': 0.5673, 'learning_rate': 2.631071202879685e-06, 'epoch': 0.77} 77%|███████▋ | 4450/5773 [3:04:57<2:00:14, 5.45s/it] 77%|███████▋ | 4451/5773 [3:05:05<2:01:30, 5.51s/it] 77%|███████▋ | 4451/5773 [3:05:03<2:01:30, 5.51s/it] {'loss': 0.5597, 'learning_rate': 2.6272792811381663e-06, 'epoch': 0.77} 77%|███████▋ | 4451/5773 [3:05:05<2:01:30, 5.51s/it] {'loss': 0.5597, 'learning_rate': 2.6272792811381663e-06, 'epoch': 0.77} 77%|███████▋ | 4451/5773 [3:05:03<2:01:30, 5.51s/it] 77%|███████▋ | 4452/5773 [3:05:10<2:00:59, 5.50s/it] 77%|███████▋ | 4452/5773 [3:05:08<2:00:59, 5.50s/it] {'loss': 0.5585, 'learning_rate': 2.623489680564685e-06, 'epoch': 0.77} 77%|███████▋ | 4452/5773 [3:05:10<2:00:59, 5.50s/it] {'loss': 0.5585, 'learning_rate': 2.623489680564685e-06, 'epoch': 0.77} 77%|███████▋ | 4452/5773 [3:05:08<2:00:59, 5.50s/it] 77%|███████▋ | 4453/5773 [3:05:16<2:00:22, 5.47s/it] 77%|███████▋ | 4453/5773 [3:05:14<2:00:22, 5.47s/it] {'loss': 0.5505, 'learning_rate': 2.619702402352332e-06, 'epoch': 0.77} 77%|███████▋ | 4453/5773 [3:05:16<2:00:22, 5.47s/it] {'loss': 0.5505, 'learning_rate': 2.619702402352332e-06, 'epoch': 0.77} 77%|███████▋ | 4453/5773 [3:05:14<2:00:22, 5.47s/it] 77%|███████▋ | 4454/5773 [3:05:21<2:00:25, 5.48s/it] 77%|███████▋ | 4454/5773 [3:05:19<2:00:25, 5.48s/it] {'loss': 0.5631, 'learning_rate': 2.615917447693462e-06, 'epoch': 0.77} 77%|███████▋ | 4454/5773 [3:05:21<2:00:25, 5.48s/it] 
{'loss': 0.5631, 'learning_rate': 2.615917447693462e-06, 'epoch': 0.77} 77%|███████▋ | 4454/5773 [3:05:19<2:00:25, 5.48s/it] 77%|███████▋ | 4455/5773 [3:05:27<2:00:48, 5.50s/it] 77%|███████▋ | 4455/5773 [3:05:25<2:00:48, 5.50s/it] {'loss': 0.561, 'learning_rate': 2.6121348177797e-06, 'epoch': 0.77} 77%|███████▋ | 4455/5773 [3:05:27<2:00:48, 5.50s/it] {'loss': 0.561, 'learning_rate': 2.6121348177797e-06, 'epoch': 0.77} 77%|███████▋ | 4455/5773 [3:05:25<2:00:48, 5.50s/it] 77%|███████▋ | 4456/5773 [3:05:32<2:00:04, 5.47s/it] 77%|███████▋ | 4456/5773 [3:05:30<2:00:04, 5.47s/it] {'loss': 0.5722, 'learning_rate': 2.608354513801934e-06, 'epoch': 0.77} 77%|███████▋ | 4456/5773 [3:05:32<2:00:04, 5.47s/it] {'loss': 0.5722, 'learning_rate': 2.608354513801934e-06, 'epoch': 0.77} 77%|███████▋ | 4456/5773 [3:05:30<2:00:04, 5.47s/it] 77%|███████▋ | 4457/5773 [3:05:37<1:59:26, 5.45s/it] 77%|███████▋ | 4457/5773 [3:05:35<1:59:26, 5.45s/it] {'loss': 0.5747, 'learning_rate': 2.6045765369503316e-06, 'epoch': 0.77} 77%|███████▋ | 4457/5773 [3:05:37<1:59:26, 5.45s/it] {'loss': 0.5747, 'learning_rate': 2.6045765369503316e-06, 'epoch': 0.77} 77%|███████▋ | 4457/5773 [3:05:35<1:59:26, 5.45s/it] 77%|███████▋ | 4458/5773 [3:05:43<1:59:07, 5.44s/it] 77%|███████▋ | 4458/5773 [3:05:41<1:59:07, 5.44s/it] {'loss': 0.5666, 'learning_rate': 2.600800888414319e-06, 'epoch': 0.77} 77%|███████▋ | 4458/5773 [3:05:43<1:59:07, 5.44s/it] {'loss': 0.5666, 'learning_rate': 2.600800888414319e-06, 'epoch': 0.77} 77%|███████▋ | 4458/5773 [3:05:41<1:59:07, 5.44s/it] 77%|███████▋ | 4459/5773 [3:05:48<1:59:00, 5.43s/it] 77%|███████▋ | 4459/5773 [3:05:46<1:59:00, 5.43s/it] {'loss': 0.5644, 'learning_rate': 2.597027569382585e-06, 'epoch': 0.77} 77%|███████▋ | 4459/5773 [3:05:48<1:59:00, 5.43s/it] {'loss': 0.5644, 'learning_rate': 2.597027569382585e-06, 'epoch': 0.77} 77%|███████▋ | 4459/5773 [3:05:46<1:59:00, 5.43s/it] 77%|███████▋ | 4460/5773 [3:05:54<1:58:51, 5.43s/it] 77%|███████▋ | 4460/5773 [3:05:52<1:58:51, 
5.43s/it] {'loss': 0.5899, 'learning_rate': 2.593256581043099e-06, 'epoch': 0.77} 77%|███████▋ | 4460/5773 [3:05:54<1:58:51, 5.43s/it] {'loss': 0.5899, 'learning_rate': 2.593256581043099e-06, 'epoch': 0.77} 77%|███████▋ | 4460/5773 [3:05:52<1:58:51, 5.43s/it] 77%|███████▋ | 4461/5773 [3:05:59<1:58:30, 5.42s/it] 77%|███████▋ | 4461/5773 [3:05:57<1:58:30, 5.42s/it] {'loss': 0.5792, 'learning_rate': 2.5894879245830883e-06, 'epoch': 0.77} 77%|███████▋ | 4461/5773 [3:05:59<1:58:30, 5.42s/it] {'loss': 0.5792, 'learning_rate': 2.5894879245830883e-06, 'epoch': 0.77} 77%|███████▋ | 4461/5773 [3:05:57<1:58:30, 5.42s/it] 77%|███████▋ | 4462/5773 [3:06:05<1:59:53, 5.49s/it] 77%|███████▋ | 4462/5773 [3:06:03<1:59:53, 5.49s/it] {'loss': 0.5668, 'learning_rate': 2.585721601189036e-06, 'epoch': 0.77} 77%|███████▋ | 4462/5773 [3:06:05<1:59:53, 5.49s/it] {'loss': 0.5668, 'learning_rate': 2.585721601189036e-06, 'epoch': 0.77} 77%|███████▋ | 4462/5773 [3:06:03<1:59:53, 5.49s/it] 77%|███████▋ | 4463/5773 [3:06:10<2:00:17, 5.51s/it] 77%|███████▋ | 4463/5773 [3:06:08<2:00:17, 5.51s/it] {'loss': 0.5402, 'learning_rate': 2.5819576120467126e-06, 'epoch': 0.77} 77%|███████▋ | 4463/5773 [3:06:10<2:00:17, 5.51s/it] {'loss': 0.5402, 'learning_rate': 2.5819576120467126e-06, 'epoch': 0.77} 77%|███████▋ | 4463/5773 [3:06:08<2:00:17, 5.51s/it] 77%|███████▋ | 4464/5773 [3:06:16<2:00:22, 5.52s/it] 77%|███████▋ | 4464/5773 [3:06:14<2:00:22, 5.52s/it] {'loss': 0.5548, 'learning_rate': 2.5781959583411375e-06, 'epoch': 0.77} 77%|███████▋ | 4464/5773 [3:06:16<2:00:22, 5.52s/it] {'loss': 0.5548, 'learning_rate': 2.5781959583411375e-06, 'epoch': 0.77} 77%|███████▋ | 4464/5773 [3:06:14<2:00:22, 5.52s/it] 77%|███████▋ | 4465/5773 [3:06:19<2:00:49, 5.54s/it] 77%|███████▋ | 4465/5773 [3:06:21<2:00:50, 5.54s/it] {'loss': 0.5569, 'learning_rate': 2.574436641256597e-06, 'epoch': 0.77} 77%|███████▋ | 4465/5773 [3:06:21<2:00:50, 5.54s/it] {'loss': 0.5569, 'learning_rate': 2.574436641256597e-06, 'epoch': 0.77} 
77%|███████▋ | 4465/5773 [3:06:19<2:00:49, 5.54s/it] 77%|███████▋ | 4466/5773 [3:06:27<2:00:32, 5.53s/it] 77%|███████▋ | 4466/5773 [3:06:25<2:00:31, 5.53s/it] {'loss': 0.5841, 'learning_rate': 2.5706796619766493e-06, 'epoch': 0.77} 77%|███████▋ | 4466/5773 [3:06:27<2:00:32, 5.53s/it] {'loss': 0.5841, 'learning_rate': 2.5706796619766493e-06, 'epoch': 0.77} 77%|███████▋ | 4466/5773 [3:06:25<2:00:31, 5.53s/it] 77%|███████▋ | 4467/5773 [3:06:32<2:00:38, 5.54s/it] 77%|███████▋ | 4467/5773 [3:06:31<2:00:39, 5.54s/it] {'loss': 0.5601, 'learning_rate': 2.5669250216841104e-06, 'epoch': 0.77} 77%|███████▋ | 4467/5773 [3:06:32<2:00:38, 5.54s/it] {'loss': 0.5601, 'learning_rate': 2.5669250216841104e-06, 'epoch': 0.77} 77%|███████▋ | 4467/5773 [3:06:31<2:00:39, 5.54s/it] 77%|███████▋ | 4468/5773 [3:06:38<2:02:33, 5.63s/it] 77%|███████▋ | 4468/5773 [3:06:36<2:02:32, 5.63s/it] {'loss': 0.5733, 'learning_rate': 2.563172721561058e-06, 'epoch': 0.77} 77%|███████▋ | 4468/5773 [3:06:38<2:02:33, 5.63s/it] {'loss': 0.5733, 'learning_rate': 2.563172721561058e-06, 'epoch': 0.77} 77%|███████▋ | 4468/5773 [3:06:36<2:02:32, 5.63s/it] 77%|███████▋ | 4469/5773 [3:06:44<2:00:52, 5.56s/it] 77%|███████▋ | 4469/5773 [3:06:42<2:00:52, 5.56s/it] {'loss': 0.5472, 'learning_rate': 2.5594227627888356e-06, 'epoch': 0.77} 77%|███████▋ | 4469/5773 [3:06:44<2:00:52, 5.56s/it] {'loss': 0.5472, 'learning_rate': 2.5594227627888356e-06, 'epoch': 0.77} 77%|███████▋ | 4469/5773 [3:06:42<2:00:52, 5.56s/it] 77%|███████▋ | 4470/5773 [3:06:49<2:00:11, 5.53s/it] 77%|███████▋ | 4470/5773 [3:06:47<2:00:11, 5.53s/it] {'loss': 0.5723, 'learning_rate': 2.5556751465480555e-06, 'epoch': 0.77} 77%|███████▋ | 4470/5773 [3:06:49<2:00:11, 5.53s/it] {'loss': 0.5723, 'learning_rate': 2.5556751465480555e-06, 'epoch': 0.77} 77%|███████▋ | 4470/5773 [3:06:47<2:00:11, 5.53s/it] 77%|███████▋ | 4471/5773 [3:06:55<1:58:53, 5.48s/it] 77%|███████▋ | 4471/5773 [3:06:53<1:58:53, 5.48s/it] {'loss': 0.5539, 'learning_rate': 
2.5519298740185837e-06, 'epoch': 0.77} 77%|███████▋ | 4471/5773 [3:06:55<1:58:53, 5.48s/it] {'loss': 0.5539, 'learning_rate': 2.5519298740185837e-06, 'epoch': 0.77} 77%|███████▋ | 4471/5773 [3:06:53<1:58:53, 5.48s/it] 77%|███████▋ | 4472/5773 [3:07:00<1:59:01, 5.49s/it] 77%|███████▋ | 4472/5773 [3:06:58<1:59:01, 5.49s/it] {'loss': 0.5561, 'learning_rate': 2.5481869463795494e-06, 'epoch': 0.77} 77%|███████▋ | 4472/5773 [3:07:00<1:59:01, 5.49s/it] {'loss': 0.5561, 'learning_rate': 2.5481869463795494e-06, 'epoch': 0.77} 77%|███████▋ | 4472/5773 [3:06:58<1:59:01, 5.49s/it] 77%|███████▋ | 4473/5773 [3:07:06<2:00:17, 5.55s/it] 77%|███████▋ | 4473/5773 [3:07:04<2:00:17, 5.55s/it] {'loss': 0.576, 'learning_rate': 2.54444636480935e-06, 'epoch': 0.77} 77%|███████▋ | 4473/5773 [3:07:06<2:00:17, 5.55s/it] {'loss': 0.576, 'learning_rate': 2.54444636480935e-06, 'epoch': 0.77} 77%|███████▋ | 4473/5773 [3:07:04<2:00:17, 5.55s/it] 77%|███████▋ | 4474/5773 [3:07:11<1:59:13, 5.51s/it] 77%|███████▋ | 4474/5773 [3:07:09<1:59:13, 5.51s/it] {'loss': 0.5805, 'learning_rate': 2.5407081304856384e-06, 'epoch': 0.77} 77%|███████▋ | 4474/5773 [3:07:11<1:59:13, 5.51s/it] {'loss': 0.5805, 'learning_rate': 2.5407081304856384e-06, 'epoch': 0.77} 77%|███████▋ | 4474/5773 [3:07:09<1:59:13, 5.51s/it] 78%|███████▊ | 4475/5773 [3:07:17<1:59:38, 5.53s/it] 78%|███████▊ | 4475/5773 [3:07:15<1:59:38, 5.53s/it] {'loss': 0.5541, 'learning_rate': 2.536972244585331e-06, 'epoch': 0.78} 78%|███████▊ | 4475/5773 [3:07:17<1:59:38, 5.53s/it] {'loss': 0.5541, 'learning_rate': 2.536972244585331e-06, 'epoch': 0.78} 78%|███████▊ | 4475/5773 [3:07:15<1:59:38, 5.53s/it] 78%|███████▊ | 4476/5773 [3:07:22<1:59:04, 5.51s/it] 78%|███████▊ | 4476/5773 [3:07:20<1:59:04, 5.51s/it] {'loss': 0.5613, 'learning_rate': 2.533238708284602e-06, 'epoch': 0.78} 78%|███████▊ | 4476/5773 [3:07:22<1:59:04, 5.51s/it] {'loss': 0.5613, 'learning_rate': 2.533238708284602e-06, 'epoch': 0.78} 78%|███████▊ | 4476/5773 [3:07:20<1:59:04, 5.51s/it] 
78%|███████▊ | 4477/5773 [3:07:28<1:57:55, 5.46s/it] 78%|███████▊ | 4477/5773 [3:07:26<1:57:55, 5.46s/it] {'loss': 0.5634, 'learning_rate': 2.5295075227588908e-06, 'epoch': 0.78} 78%|███████▊ | 4477/5773 [3:07:28<1:57:55, 5.46s/it] {'loss': 0.5634, 'learning_rate': 2.5295075227588908e-06, 'epoch': 0.78} 78%|███████▊ | 4477/5773 [3:07:26<1:57:55, 5.46s/it] 78%|███████▊ | 4478/5773 [3:07:33<1:57:48, 5.46s/it] 78%|███████▊ | 4478/5773 [3:07:31<1:57:48, 5.46s/it] {'loss': 0.5514, 'learning_rate': 2.5257786891828872e-06, 'epoch': 0.78} 78%|███████▊ | 4478/5773 [3:07:33<1:57:48, 5.46s/it] {'loss': 0.5514, 'learning_rate': 2.5257786891828872e-06, 'epoch': 0.78} 78%|███████▊ | 4478/5773 [3:07:31<1:57:48, 5.46s/it] 78%|███████▊ | 4479/5773 [3:07:38<1:57:30, 5.45s/it] 78%|███████▊ | 4479/5773 [3:07:36<1:57:30, 5.45s/it] {'loss': 0.5537, 'learning_rate': 2.5220522087305556e-06, 'epoch': 0.78} 78%|███████▊ | 4479/5773 [3:07:38<1:57:30, 5.45s/it] {'loss': 0.5537, 'learning_rate': 2.5220522087305556e-06, 'epoch': 0.78} 78%|███████▊ | 4479/5773 [3:07:36<1:57:30, 5.45s/it] 78%|███████▊ | 4480/5773 [3:07:44<1:57:45, 5.46s/it] 78%|███████▊ | 4480/5773 [3:07:42<1:57:45, 5.46s/it] {'loss': 0.5663, 'learning_rate': 2.518328082575108e-06, 'epoch': 0.78} 78%|███████▊ | 4480/5773 [3:07:44<1:57:45, 5.46s/it] {'loss': 0.5663, 'learning_rate': 2.518328082575108e-06, 'epoch': 0.78} 78%|███████▊ | 4480/5773 [3:07:42<1:57:45, 5.46s/it] 78%|███████▊ | 4481/5773 [3:07:50<1:58:22, 5.50s/it] 78%|███████▊ | 4481/5773 [3:07:48<1:58:22, 5.50s/it] {'loss': 0.5765, 'learning_rate': 2.5146063118890176e-06, 'epoch': 0.78} 78%|███████▊ | 4481/5773 [3:07:50<1:58:22, 5.50s/it] {'loss': 0.5765, 'learning_rate': 2.5146063118890176e-06, 'epoch': 0.78} 78%|███████▊ | 4481/5773 [3:07:48<1:58:22, 5.50s/it] 78%|███████▊ | 4482/5773 [3:07:55<1:58:20, 5.50s/it] 78%|███████▊ | 4482/5773 [3:07:53<1:58:20, 5.50s/it] {'loss': 0.5559, 'learning_rate': 2.510886897844014e-06, 'epoch': 0.78} 78%|███████▊ | 4482/5773 
[3:07:55<1:58:20, 5.50s/it] {'loss': 0.5559, 'learning_rate': 2.510886897844014e-06, 'epoch': 0.78} 78%|███████▊ | 4482/5773 [3:07:53<1:58:20, 5.50s/it] 78%|███████▊ | 4483/5773 [3:08:00<1:57:39, 5.47s/it] 78%|███████▊ | 4483/5773 [3:07:58<1:57:39, 5.47s/it] {'loss': 0.5552, 'learning_rate': 2.5071698416110924e-06, 'epoch': 0.78} 78%|███████▊ | 4483/5773 [3:08:00<1:57:39, 5.47s/it] {'loss': 0.5552, 'learning_rate': 2.5071698416110924e-06, 'epoch': 0.78} 78%|███████▊ | 4483/5773 [3:07:58<1:57:39, 5.47s/it] 78%|███████▊ | 4484/5773 [3:08:06<1:57:09, 5.45s/it] 78%|███████▊ | 4484/5773 [3:08:04<1:57:09, 5.45s/it] {'loss': 0.5618, 'learning_rate': 2.5034551443604995e-06, 'epoch': 0.78} 78%|███████▊ | 4484/5773 [3:08:06<1:57:09, 5.45s/it] {'loss': 0.5618, 'learning_rate': 2.5034551443604995e-06, 'epoch': 0.78} 78%|███████▊ | 4484/5773 [3:08:04<1:57:09, 5.45s/it] 78%|███████▊ | 4485/5773 [3:08:11<1:56:11, 5.41s/it] 78%|███████▊ | 4485/5773 [3:08:09<1:56:11, 5.41s/it] {'loss': 0.5398, 'learning_rate': 2.499742807261738e-06, 'epoch': 0.78} 78%|███████▊ | 4485/5773 [3:08:11<1:56:11, 5.41s/it] {'loss': 0.5398, 'learning_rate': 2.499742807261738e-06, 'epoch': 0.78} 78%|███████▊ | 4485/5773 [3:08:09<1:56:11, 5.41s/it] 78%|███████▊ | 4486/5773 [3:08:17<1:56:58, 5.45s/it] 78%|███████▊ | 4486/5773 [3:08:15<1:56:58, 5.45s/it] {'loss': 0.557, 'learning_rate': 2.4960328314835746e-06, 'epoch': 0.78} 78%|███████▊ | 4486/5773 [3:08:17<1:56:58, 5.45s/it] {'loss': 0.557, 'learning_rate': 2.4960328314835746e-06, 'epoch': 0.78} 78%|███████▊ | 4486/5773 [3:08:15<1:56:58, 5.45s/it] 78%|███████▊ | 4487/5773 [3:08:22<1:56:45, 5.45s/it] 78%|███████▊ | 4487/5773 [3:08:20<1:56:45, 5.45s/it] {'loss': 0.5445, 'learning_rate': 2.492325218194026e-06, 'epoch': 0.78} 78%|███████▊ | 4487/5773 [3:08:22<1:56:45, 5.45s/it] {'loss': 0.5445, 'learning_rate': 2.492325218194026e-06, 'epoch': 0.78} 78%|███████▊ | 4487/5773 [3:08:20<1:56:45, 5.45s/it] 78%|███████▊ | 4488/5773 [3:08:28<1:57:34, 5.49s/it] 
78%|███████▊ | 4488/5773 [3:08:26<1:57:34, 5.49s/it] {'loss': 0.5512, 'learning_rate': 2.4886199685603676e-06, 'epoch': 0.78} 78%|███████▊ | 4488/5773 [3:08:28<1:57:34, 5.49s/it] {'loss': 0.5512, 'learning_rate': 2.4886199685603676e-06, 'epoch': 0.78} 78%|███████▊ | 4488/5773 [3:08:26<1:57:34, 5.49s/it] 78%|███████▊ | 4489/5773 [3:08:33<1:57:35, 5.49s/it] 78%|███████▊ | 4489/5773 [3:08:31<1:57:35, 5.49s/it] {'loss': 0.5647, 'learning_rate': 2.4849170837491265e-06, 'epoch': 0.78} 78%|███████▊ | 4489/5773 [3:08:33<1:57:35, 5.49s/it] {'loss': 0.5647, 'learning_rate': 2.4849170837491265e-06, 'epoch': 0.78} 78%|███████▊ | 4489/5773 [3:08:31<1:57:35, 5.49s/it] 78%|███████▊ | 4490/5773 [3:08:39<1:56:48, 5.46s/it] 78%|███████▊ | 4490/5773 [3:08:37<1:56:48, 5.46s/it] {'loss': 0.5651, 'learning_rate': 2.481216564926099e-06, 'epoch': 0.78} 78%|███████▊ | 4490/5773 [3:08:37<1:56:48, 5.46s/it] {'loss': 0.5651, 'learning_rate': 2.481216564926099e-06, 'epoch': 0.78} 78%|███████▊ | 4490/5773 [3:08:39<1:56:48, 5.46s/it] 78%|███████▊ | 4491/5773 [3:08:44<1:56:27, 5.45s/it] 78%|███████▊ | 4491/5773 [3:08:42<1:56:27, 5.45s/it] {'loss': 0.5622, 'learning_rate': 2.4775184132563145e-06, 'epoch': 0.78} 78%|███████▊ | 4491/5773 [3:08:44<1:56:27, 5.45s/it] {'loss': 0.5622, 'learning_rate': 2.4775184132563145e-06, 'epoch': 0.78} 78%|███████▊ | 4491/5773 [3:08:42<1:56:27, 5.45s/it] 78%|███████▊ | 4492/5773 [3:08:50<1:57:42, 5.51s/it] 78%|███████▊ | 4492/5773 [3:08:48<1:57:42, 5.51s/it] {'loss': 0.547, 'learning_rate': 2.473822629904078e-06, 'epoch': 0.78} 78%|███████▊ | 4492/5773 [3:08:50<1:57:42, 5.51s/it] {'loss': 0.547, 'learning_rate': 2.473822629904078e-06, 'epoch': 0.78} 78%|███████▊ | 4492/5773 [3:08:48<1:57:42, 5.51s/it] 78%|███████▊ | 4493/5773 [3:08:55<1:57:55, 5.53s/it] 78%|███████▊ | 4493/5773 [3:08:53<1:57:56, 5.53s/it] {'loss': 0.5504, 'learning_rate': 2.4701292160329373e-06, 'epoch': 0.78} 78%|███████▊ | 4493/5773 [3:08:55<1:57:55, 5.53s/it] {'loss': 0.5504, 'learning_rate': 
2.4701292160329373e-06, 'epoch': 0.78} 78%|███████▊ | 4493/5773 [3:08:53<1:57:56, 5.53s/it] 78%|███████▊ | 4494/5773 [3:09:01<1:57:36, 5.52s/it] 78%|███████▊ | 4494/5773 [3:08:59<1:57:36, 5.52s/it] {'loss': 0.5616, 'learning_rate': 2.4664381728056962e-06, 'epoch': 0.78} 78%|███████▊ | 4494/5773 [3:09:01<1:57:36, 5.52s/it] {'loss': 0.5616, 'learning_rate': 2.4664381728056962e-06, 'epoch': 0.78} 78%|███████▊ | 4494/5773 [3:08:59<1:57:36, 5.52s/it] 78%|███████▊ | 4495/5773 [3:09:06<1:57:49, 5.53s/it] 78%|███████▊ | 4495/5773 [3:09:04<1:57:49, 5.53s/it] {'loss': 0.5531, 'learning_rate': 2.462749501384413e-06, 'epoch': 0.78} 78%|███████▊ | 4495/5773 [3:09:06<1:57:49, 5.53s/it] {'loss': 0.5531, 'learning_rate': 2.462749501384413e-06, 'epoch': 0.78} 78%|███████▊ | 4495/5773 [3:09:04<1:57:49, 5.53s/it] 78%|███████▊ | 4496/5773 [3:09:12<1:55:59, 5.45s/it] 78%|███████▊ | 4496/5773 [3:09:10<1:55:59, 5.45s/it] {'loss': 0.5678, 'learning_rate': 2.4590632029304018e-06, 'epoch': 0.78} 78%|███████▊ | 4496/5773 [3:09:12<1:55:59, 5.45s/it] {'loss': 0.5678, 'learning_rate': 2.4590632029304018e-06, 'epoch': 0.78} 78%|███████▊ | 4496/5773 [3:09:10<1:55:59, 5.45s/it] 78%|███████▊ | 4497/5773 [3:09:17<1:56:09, 5.46s/it] 78%|███████▊ | 4497/5773 [3:09:15<1:56:09, 5.46s/it] {'loss': 0.5442, 'learning_rate': 2.455379278604226e-06, 'epoch': 0.78} 78%|███████▊ | 4497/5773 [3:09:17<1:56:09, 5.46s/it] {'loss': 0.5442, 'learning_rate': 2.455379278604226e-06, 'epoch': 0.78} 78%|███████▊ | 4497/5773 [3:09:15<1:56:09, 5.46s/it] 78%|███████▊ | 4498/5773 [3:09:23<1:57:04, 5.51s/it] 78%|███████▊ | 4498/5773 [3:09:21<1:57:04, 5.51s/it] {'loss': 0.5426, 'learning_rate': 2.4516977295656997e-06, 'epoch': 0.78} 78%|███████▊ | 4498/5773 [3:09:23<1:57:04, 5.51s/it] {'loss': 0.5426, 'learning_rate': 2.4516977295656997e-06, 'epoch': 0.78} 78%|███████▊ | 4498/5773 [3:09:21<1:57:04, 5.51s/it] 78%|███████▊ | 4499/5773 [3:09:28<1:57:41, 5.54s/it] 78%|███████▊ | 4499/5773 [3:09:26<1:57:41, 5.54s/it] {'loss': 
0.5471, 'learning_rate': 2.448018556973897e-06, 'epoch': 0.78} 78%|███████▊ | 4499/5773 [3:09:28<1:57:41, 5.54s/it] {'loss': 0.5471, 'learning_rate': 2.448018556973897e-06, 'epoch': 0.78} 78%|███████▊ | 4499/5773 [3:09:26<1:57:41, 5.54s/it]11 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 78%|███████▊ | 4500/5773 [3:09:34<1:57:07, 5.52s/it]12 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
78%|███████▊ | 4500/5773 [3:09:32<1:57:08, 5.52s/it] {'loss': 0.5637, 'learning_rate': 2.4443417619871367e-06, 'epoch': 0.78} 78%|███████▊ | 4500/5773 [3:09:34<1:57:07, 5.52s/it] {'loss': 0.5637, 'learning_rate': 2.4443417619871367e-06, 'epoch': 0.78} 78%|███████▊ | 4500/5773 [3:09:32<1:57:08, 5.52s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 78%|███████▊ | 4501/5773 [3:09:59<4:04:06, 11.51s/it] 78%|███████▊ | 4501/5773 [3:09:57<4:04:06, 11.51s/it] {'loss': 0.5559, 'learning_rate': 2.4406673457629914e-06, 'epoch': 0.78} 78%|███████▊ | 4501/5773 [3:09:59<4:04:06, 11.51s/it] {'loss': 0.5559, 'learning_rate': 2.4406673457629914e-06, 'epoch': 0.78} 78%|███████▊ | 4501/5773 [3:09:57<4:04:06, 11.51s/it] 78%|███████▊ | 4502/5773 [3:10:05<3:26:02, 9.73s/it] 78%|███████▊ | 4502/5773 [3:10:03<3:26:02, 9.73s/it] {'loss': 0.5532, 'learning_rate': 2.436995309458282e-06, 'epoch': 0.78} 78%|███████▊ | 4502/5773 [3:10:05<3:26:02, 9.73s/it] {'loss': 0.5532, 'learning_rate': 2.436995309458282e-06, 'epoch': 0.78} 78%|███████▊ | 4502/5773 [3:10:03<3:26:02, 9.73s/it] 78%|███████▊ | 4503/5773 [3:10:10<2:58:34, 8.44s/it] 78%|███████▊ | 4503/5773 [3:10:08<2:58:34, 8.44s/it] {'loss': 0.5554, 'learning_rate': 2.4333256542290883e-06, 'epoch': 0.78} 78%|███████▊ | 4503/5773 [3:10:10<2:58:34, 8.44s/it] {'loss': 0.5554, 'learning_rate': 2.4333256542290883e-06, 'epoch': 0.78} 78%|███████▊ | 4503/5773 [3:10:08<2:58:34, 8.44s/it] 78%|███████▊ | 4504/5773 [3:10:16<2:40:03, 7.57s/it] 78%|███████▊ | 4504/5773 [3:10:14<2:40:03, 7.57s/it] {'loss': 0.5714, 'learning_rate': 2.429658381230733e-06, 'epoch': 0.78} 78%|███████▊ | 4504/5773 [3:10:16<2:40:03, 7.57s/it] {'loss': 0.5714, 'learning_rate': 2.429658381230733e-06, 'epoch': 0.78} 78%|███████▊ | 4504/5773 [3:10:14<2:40:03, 7.57s/it] 78%|███████▊ | 4505/5773 [3:10:22<2:28:31, 7.03s/it] 78%|███████▊ | 4505/5773 [3:10:20<2:28:31, 7.03s/it] {'loss': 0.574, 'learning_rate': 2.4259934916177898e-06, 'epoch': 0.78} 78%|███████▊ | 4505/5773 [3:10:22<2:28:31, 7.03s/it] {'loss': 0.574, 'learning_rate': 2.4259934916177898e-06, 'epoch': 0.78} 78%|███████▊ | 4505/5773 [3:10:20<2:28:31, 7.03s/it] 78%|███████▊ | 4506/5773 [3:10:27<2:19:53, 6.62s/it] 78%|███████▊ | 4506/5773 [3:10:25<2:19:53, 6.62s/it] {'loss': 0.5451, 'learning_rate': 2.4223309865440823e-06, 'epoch': 0.78} 78%|███████▊ | 
4506/5773 [3:10:27<2:19:53, 6.62s/it] {'loss': 0.5451, 'learning_rate': 2.4223309865440823e-06, 'epoch': 0.78} 78%|███████▊ | 4506/5773 [3:10:25<2:19:53, 6.62s/it] 78%|███████▊ | 4507/5773 [3:10:33<2:12:11, 6.27s/it] 78%|███████▊ | 4507/5773 [3:10:31<2:12:11, 6.27s/it] {'loss': 0.561, 'learning_rate': 2.418670867162686e-06, 'epoch': 0.78} 78%|███████▊ | 4507/5773 [3:10:33<2:12:11, 6.27s/it] {'loss': 0.561, 'learning_rate': 2.418670867162686e-06, 'epoch': 0.78} 78%|███████▊ | 4507/5773 [3:10:31<2:12:11, 6.27s/it] 78%|███████▊ | 4508/5773 [3:10:38<2:06:20, 5.99s/it] 78%|███████▊ | 4508/5773 [3:10:36<2:06:20, 5.99s/it] {'loss': 0.5481, 'learning_rate': 2.41501313462592e-06, 'epoch': 0.78} 78%|███████▊ | 4508/5773 [3:10:38<2:06:20, 5.99s/it] {'loss': 0.5481, 'learning_rate': 2.41501313462592e-06, 'epoch': 0.78} 78%|███████▊ | 4508/5773 [3:10:36<2:06:20, 5.99s/it] 78%|███████▊ | 4509/5773 [3:10:42<2:03:20, 5.85s/it] 78%|███████▊ | 4509/5773 [3:10:44<2:03:20, 5.85s/it] {'loss': 0.5662, 'learning_rate': 2.411357790085359e-06, 'epoch': 0.78} 78%|███████▊ | 4509/5773 [3:10:44<2:03:20, 5.85s/it] {'loss': 0.5662, 'learning_rate': 2.411357790085359e-06, 'epoch': 0.78} 78%|███████▊ | 4509/5773 [3:10:42<2:03:20, 5.85s/it] 78%|███████▊ | 4510/5773 [3:10:49<2:01:57, 5.79s/it] 78%|███████▊ | 4510/5773 [3:10:47<2:01:57, 5.79s/it] {'loss': 0.5551, 'learning_rate': 2.407704834691823e-06, 'epoch': 0.78} 78%|███████▊ | 4510/5773 [3:10:49<2:01:57, 5.79s/it] {'loss': 0.5551, 'learning_rate': 2.407704834691823e-06, 'epoch': 0.78} 78%|███████▊ | 4510/5773 [3:10:47<2:01:57, 5.79s/it] 78%|███████▊ | 4511/5773 [3:10:55<2:00:34, 5.73s/it] 78%|███████▊ | 4511/5773 [3:10:53<2:00:34, 5.73s/it] {'loss': 0.5661, 'learning_rate': 2.404054269595374e-06, 'epoch': 0.78} 78%|███████▊ | 4511/5773 [3:10:55<2:00:34, 5.73s/it] {'loss': 0.5661, 'learning_rate': 2.404054269595374e-06, 'epoch': 0.78} 78%|███████▊ | 4511/5773 [3:10:53<2:00:34, 5.73s/it] 78%|███████▊ | 4512/5773 [3:11:00<1:59:03, 5.67s/it] 
78%|███████▊ | 4512/5773 [3:10:58<1:59:03, 5.67s/it] {'loss': 0.5489, 'learning_rate': 2.4004060959453312e-06, 'epoch': 0.78} 78%|███████▊ | 4512/5773 [3:11:00<1:59:03, 5.67s/it] {'loss': 0.5489, 'learning_rate': 2.4004060959453312e-06, 'epoch': 0.78} 78%|███████▊ | 4512/5773 [3:10:58<1:59:03, 5.67s/it] 78%|███████▊ | 4513/5773 [3:11:06<1:57:55, 5.62s/it] 78%|███████▊ | 4513/5773 [3:11:04<1:57:55, 5.62s/it] {'loss': 0.5657, 'learning_rate': 2.396760314890256e-06, 'epoch': 0.78} 78%|███████▊ | 4513/5773 [3:11:06<1:57:55, 5.62s/it] {'loss': 0.5657, 'learning_rate': 2.396760314890256e-06, 'epoch': 0.78} 78%|███████▊ | 4513/5773 [3:11:04<1:57:55, 5.62s/it] 78%|███████▊ | 4514/5773 [3:11:11<1:57:16, 5.59s/it] 78%|███████▊ | 4514/5773 [3:11:09<1:57:16, 5.59s/it] {'loss': 0.5548, 'learning_rate': 2.393116927577953e-06, 'epoch': 0.78} 78%|███████▊ | 4514/5773 [3:11:11<1:57:16, 5.59s/it] {'loss': 0.5548, 'learning_rate': 2.393116927577953e-06, 'epoch': 0.78} 78%|███████▊ | 4514/5773 [3:11:09<1:57:16, 5.59s/it] 78%|███████▊ | 4515/5773 [3:11:17<1:57:29, 5.60s/it] 78%|███████▊ | 4515/5773 [3:11:15<1:57:29, 5.60s/it] {'loss': 0.5784, 'learning_rate': 2.389475935155482e-06, 'epoch': 0.78} 78%|███████▊ | 4515/5773 [3:11:17<1:57:29, 5.60s/it] {'loss': 0.5784, 'learning_rate': 2.389475935155482e-06, 'epoch': 0.78} 78%|███████▊ | 4515/5773 [3:11:15<1:57:29, 5.60s/it] 78%|███████▊ | 4516/5773 [3:11:23<1:57:13, 5.60s/it] 78%|███████▊ | 4516/5773 [3:11:21<1:57:14, 5.60s/it] {'loss': 0.5679, 'learning_rate': 2.3858373387691404e-06, 'epoch': 0.78} 78%|███████▊ | 4516/5773 [3:11:23<1:57:13, 5.60s/it] {'loss': 0.5679, 'learning_rate': 2.3858373387691404e-06, 'epoch': 0.78} 78%|███████▊ | 4516/5773 [3:11:21<1:57:14, 5.60s/it] 78%|███████▊ | 4517/5773 [3:11:28<1:57:15, 5.60s/it] 78%|███████▊ | 4517/5773 [3:11:26<1:57:15, 5.60s/it] {'loss': 0.5429, 'learning_rate': 2.382201139564476e-06, 'epoch': 0.78} 78%|███████▊ | 4517/5773 [3:11:28<1:57:15, 5.60s/it] {'loss': 0.5429, 'learning_rate': 
2.382201139564476e-06, 'epoch': 0.78} 78%|███████▊ | 4517/5773 [3:11:26<1:57:15, 5.60s/it] 78%|███████▊ | 4518/5773 [3:11:34<1:56:44, 5.58s/it] 78%|███████▊ | 4518/5773 [3:11:32<1:56:43, 5.58s/it] {'loss': 0.5648, 'learning_rate': 2.3785673386862808e-06, 'epoch': 0.78} 78%|███████▊ | 4518/5773 [3:11:34<1:56:44, 5.58s/it] {'loss': 0.5648, 'learning_rate': 2.3785673386862808e-06, 'epoch': 0.78} 78%|███████▊ | 4518/5773 [3:11:32<1:56:43, 5.58s/it] 78%|███████▊ | 4519/5773 [3:11:39<1:55:05, 5.51s/it] 78%|███████▊ | 4519/5773 [3:11:37<1:55:05, 5.51s/it] {'loss': 0.5385, 'learning_rate': 2.3749359372785884e-06, 'epoch': 0.78} 78%|███████▊ | 4519/5773 [3:11:39<1:55:05, 5.51s/it] {'loss': 0.5385, 'learning_rate': 2.3749359372785884e-06, 'epoch': 0.78} 78%|███████▊ | 4519/5773 [3:11:37<1:55:05, 5.51s/it] 78%|███████▊ | 4520/5773 [3:11:45<1:55:05, 5.51s/it] 78%|███████▊ | 4520/5773 [3:11:43<1:55:05, 5.51s/it] {'loss': 0.56, 'learning_rate': 2.371306936484684e-06, 'epoch': 0.78} 78%|███████▊ | 4520/5773 [3:11:45<1:55:05, 5.51s/it] {'loss': 0.56, 'learning_rate': 2.371306936484684e-06, 'epoch': 0.78} 78%|███████▊ | 4520/5773 [3:11:43<1:55:05, 5.51s/it] 78%|███████▊ | 4521/5773 [3:11:50<1:54:48, 5.50s/it] 78%|███████▊ | 4521/5773 [3:11:48<1:54:48, 5.50s/it] {'loss': 0.5543, 'learning_rate': 2.367680337447087e-06, 'epoch': 0.78} 78%|███████▊ | 4521/5773 [3:11:50<1:54:48, 5.50s/it] {'loss': 0.5543, 'learning_rate': 2.367680337447087e-06, 'epoch': 0.78} 78%|███████▊ | 4521/5773 [3:11:48<1:54:48, 5.50s/it] 78%|███████▊ | 4522/5773 [3:11:56<1:54:37, 5.50s/it] 78%|███████▊ | 4522/5773 [3:11:54<1:54:37, 5.50s/it] {'loss': 0.575, 'learning_rate': 2.3640561413075746e-06, 'epoch': 0.78} 78%|███████▊ | 4522/5773 [3:11:56<1:54:37, 5.50s/it] {'loss': 0.575, 'learning_rate': 2.3640561413075746e-06, 'epoch': 0.78} 78%|███████▊ | 4522/5773 [3:11:54<1:54:37, 5.50s/it] 78%|███████▊ | 4523/5773 [3:11:59<1:54:09, 5.48s/it] 78%|███████▊ | 4523/5773 [3:12:01<1:54:09, 5.48s/it] {'loss': 0.5494, 
'learning_rate': 2.3604343492071545e-06, 'epoch': 0.78} 78%|███████▊ | 4523/5773 [3:12:01<1:54:09, 5.48s/it] {'loss': 0.5494, 'learning_rate': 2.3604343492071545e-06, 'epoch': 0.78} 78%|███████▊ | 4523/5773 [3:11:59<1:54:09, 5.48s/it] 78%|███████▊ | 4524/5773 [3:12:07<1:54:52, 5.52s/it] 78%|███████▊ | 4524/5773 [3:12:05<1:54:52, 5.52s/it] {'loss': 0.5599, 'learning_rate': 2.3568149622860815e-06, 'epoch': 0.78} 78%|███████▊ | 4524/5773 [3:12:07<1:54:52, 5.52s/it] {'loss': 0.5599, 'learning_rate': 2.3568149622860815e-06, 'epoch': 0.78} 78%|███████▊ | 4524/5773 [3:12:05<1:54:52, 5.52s/it] 78%|███████▊ | 4525/5773 [3:12:12<1:53:51, 5.47s/it] 78%|███████▊ | 4525/5773 [3:12:10<1:53:51, 5.47s/it] {'loss': 0.5445, 'learning_rate': 2.353197981683859e-06, 'epoch': 0.78} 78%|███████▊ | 4525/5773 [3:12:12<1:53:51, 5.47s/it] {'loss': 0.5445, 'learning_rate': 2.353197981683859e-06, 'epoch': 0.78} 78%|███████▊ | 4525/5773 [3:12:10<1:53:51, 5.47s/it] 78%|███████▊ | 4526/5773 [3:12:17<1:53:37, 5.47s/it] 78%|███████▊ | 4526/5773 [3:12:15<1:53:37, 5.47s/it] {'loss': 0.5375, 'learning_rate': 2.3495834085392265e-06, 'epoch': 0.78} 78%|███████▊ | 4526/5773 [3:12:17<1:53:37, 5.47s/it] {'loss': 0.5375, 'learning_rate': 2.3495834085392265e-06, 'epoch': 0.78} 78%|███████▊ | 4526/5773 [3:12:15<1:53:37, 5.47s/it] 78%|███████▊ | 4527/5773 [3:12:23<1:54:34, 5.52s/it] 78%|███████▊ | 4527/5773 [3:12:21<1:54:34, 5.52s/it] {'loss': 0.5557, 'learning_rate': 2.345971243990163e-06, 'epoch': 0.78} 78%|███████▊ | 4527/5773 [3:12:23<1:54:34, 5.52s/it] {'loss': 0.5557, 'learning_rate': 2.345971243990163e-06, 'epoch': 0.78} 78%|███████▊ | 4527/5773 [3:12:21<1:54:34, 5.52s/it] 78%|███████▊ | 4528/5773 [3:12:29<1:56:30, 5.61s/it] 78%|███████▊ | 4528/5773 [3:12:27<1:56:29, 5.61s/it] {'loss': 0.5364, 'learning_rate': 2.3423614891738977e-06, 'epoch': 0.78} 78%|███████▊ | 4528/5773 [3:12:29<1:56:30, 5.61s/it] {'loss': 0.5364, 'learning_rate': 2.3423614891738977e-06, 'epoch': 0.78} 78%|███████▊ | 4528/5773 
[3:12:27<1:56:29, 5.61s/it] 78%|███████▊ | 4529/5773 [3:12:32<1:54:55, 5.54s/it] 78%|███████▊ | 4529/5773 [3:12:34<1:54:55, 5.54s/it] {'loss': 0.5601, 'learning_rate': 2.3387541452268968e-06, 'epoch': 0.78} 78%|███████▊ | 4529/5773 [3:12:34<1:54:55, 5.54s/it] {'loss': 0.5601, 'learning_rate': 2.3387541452268968e-06, 'epoch': 0.78} 78%|███████▊ | 4529/5773 [3:12:32<1:54:55, 5.54s/it] 78%|███████▊ | 4530/5773 [3:12:40<1:54:04, 5.51s/it] 78%|███████▊ | 4530/5773 [3:12:38<1:54:04, 5.51s/it] {'loss': 0.5543, 'learning_rate': 2.3351492132848665e-06, 'epoch': 0.78} 78%|███████▊ | 4530/5773 [3:12:40<1:54:04, 5.51s/it] {'loss': 0.5543, 'learning_rate': 2.3351492132848665e-06, 'epoch': 0.78} 78%|███████▊ | 4530/5773 [3:12:38<1:54:04, 5.51s/it] 78%|███████▊ | 4531/5773 [3:12:45<1:53:38, 5.49s/it] 78%|███████▊ | 4531/5773 [3:12:43<1:53:38, 5.49s/it] {'loss': 0.5649, 'learning_rate': 2.33154669448275e-06, 'epoch': 0.78} 78%|███████▊ | 4531/5773 [3:12:45<1:53:38, 5.49s/it] {'loss': 0.5649, 'learning_rate': 2.33154669448275e-06, 'epoch': 0.78} 78%|███████▊ | 4531/5773 [3:12:43<1:53:38, 5.49s/it] 79%|███████▊ | 4532/5773 [3:12:51<1:54:04, 5.52s/it] 79%|███████▊ | 4532/5773 [3:12:49<1:54:05, 5.52s/it] {'loss': 0.5577, 'learning_rate': 2.3279465899547473e-06, 'epoch': 0.79} 79%|███████▊ | 4532/5773 [3:12:51<1:54:04, 5.52s/it] {'loss': 0.5577, 'learning_rate': 2.3279465899547473e-06, 'epoch': 0.79} 79%|███████▊ | 4532/5773 [3:12:49<1:54:05, 5.52s/it] 79%|███████▊ | 4533/5773 [3:12:57<1:57:17, 5.68s/it] 79%|███████▊ | 4533/5773 [3:12:55<1:57:17, 5.68s/it] {'loss': 0.5573, 'learning_rate': 2.3243489008342735e-06, 'epoch': 0.79} 79%|███████▊ | 4533/5773 [3:12:57<1:57:17, 5.68s/it] {'loss': 0.5573, 'learning_rate': 2.3243489008342735e-06, 'epoch': 0.79} 79%|███████▊ | 4533/5773 [3:12:55<1:57:17, 5.68s/it] 79%|███████▊ | 4534/5773 [3:13:02<1:55:46, 5.61s/it] 79%|███████▊ | 4534/5773 [3:13:00<1:55:46, 5.61s/it] {'loss': 0.5533, 'learning_rate': 2.3207536282539987e-06, 'epoch': 0.79} 
79%|███████▊ | 4534/5773 [3:13:02<1:55:46, 5.61s/it] {'loss': 0.5533, 'learning_rate': 2.3207536282539987e-06, 'epoch': 0.79} 79%|███████▊ | 4534/5773 [3:13:00<1:55:46, 5.61s/it] 79%|███████▊ | 4535/5773 [3:13:08<1:54:31, 5.55s/it] 79%|███████▊ | 4535/5773 [3:13:06<1:54:31, 5.55s/it] {'loss': 0.5586, 'learning_rate': 2.317160773345836e-06, 'epoch': 0.79} 79%|███████▊ | 4535/5773 [3:13:08<1:54:31, 5.55s/it] {'loss': 0.5586, 'learning_rate': 2.317160773345836e-06, 'epoch': 0.79} 79%|███████▊ | 4535/5773 [3:13:06<1:54:31, 5.55s/it] 79%|███████▊ | 4536/5773 [3:13:13<1:54:27, 5.55s/it] 79%|███████▊ | 4536/5773 [3:13:11<1:54:27, 5.55s/it] {'loss': 0.5484, 'learning_rate': 2.313570337240926e-06, 'epoch': 0.79} 79%|███████▊ | 4536/5773 [3:13:13<1:54:27, 5.55s/it] {'loss': 0.5484, 'learning_rate': 2.313570337240926e-06, 'epoch': 0.79} 79%|███████▊ | 4536/5773 [3:13:11<1:54:27, 5.55s/it] 79%|███████▊ | 4537/5773 [3:13:19<1:54:25, 5.55s/it] 79%|███████▊ | 4537/5773 [3:13:17<1:54:25, 5.55s/it] {'loss': 0.5643, 'learning_rate': 2.3099823210696503e-06, 'epoch': 0.79} 79%|███████▊ | 4537/5773 [3:13:19<1:54:25, 5.55s/it] {'loss': 0.5643, 'learning_rate': 2.3099823210696503e-06, 'epoch': 0.79} 79%|███████▊ | 4537/5773 [3:13:17<1:54:25, 5.55s/it] 79%|███████▊ | 4538/5773 [3:13:24<1:53:43, 5.53s/it] 79%|███████▊ | 4538/5773 [3:13:22<1:53:43, 5.53s/it] {'loss': 0.5507, 'learning_rate': 2.306396725961638e-06, 'epoch': 0.79} 79%|███████▊ | 4538/5773 [3:13:24<1:53:43, 5.53s/it] {'loss': 0.5507, 'learning_rate': 2.306396725961638e-06, 'epoch': 0.79} 79%|███████▊ | 4538/5773 [3:13:22<1:53:43, 5.53s/it] 79%|███████▊ | 4539/5773 [3:13:30<1:53:31, 5.52s/it] 79%|███████▊ | 4539/5773 [3:13:28<1:53:31, 5.52s/it] {'loss': 0.5542, 'learning_rate': 2.302813553045745e-06, 'epoch': 0.79} 79%|███████▊ | 4539/5773 [3:13:30<1:53:31, 5.52s/it] {'loss': 0.5542, 'learning_rate': 2.302813553045745e-06, 'epoch': 0.79} 79%|███████▊ | 4539/5773 [3:13:28<1:53:31, 5.52s/it] 79%|███████▊ | 4540/5773 
[3:13:35<1:52:19, 5.47s/it] 79%|███████▊ | 4540/5773 [3:13:33<1:52:19, 5.47s/it] {'loss': 0.561, 'learning_rate': 2.2992328034500668e-06, 'epoch': 0.79} 79%|███████▊ | 4540/5773 [3:13:35<1:52:19, 5.47s/it] {'loss': 0.561, 'learning_rate': 2.2992328034500668e-06, 'epoch': 0.79} 79%|███████▊ | 4540/5773 [3:13:33<1:52:19, 5.47s/it] 79%|███████▊ | 4541/5773 [3:13:41<1:52:39, 5.49s/it] 79%|███████▊ | 4541/5773 [3:13:39<1:52:39, 5.49s/it] {'loss': 0.5522, 'learning_rate': 2.295654478301942e-06, 'epoch': 0.79} 79%|███████▊ | 4541/5773 [3:13:41<1:52:39, 5.49s/it] {'loss': 0.5522, 'learning_rate': 2.295654478301942e-06, 'epoch': 0.79} 79%|███████▊ | 4541/5773 [3:13:39<1:52:39, 5.49s/it] 79%|███████▊ | 4542/5773 [3:13:46<1:52:26, 5.48s/it] 79%|███████▊ | 4542/5773 [3:13:44<1:52:26, 5.48s/it] {'loss': 0.5725, 'learning_rate': 2.292078578727939e-06, 'epoch': 0.79} 79%|███████▊ | 4542/5773 [3:13:46<1:52:26, 5.48s/it] {'loss': 0.5725, 'learning_rate': 2.292078578727939e-06, 'epoch': 0.79} 79%|███████▊ | 4542/5773 [3:13:44<1:52:26, 5.48s/it] 79%|███████▊ | 4543/5773 [3:13:52<1:52:49, 5.50s/it] 79%|███████▊ | 4543/5773 [3:13:50<1:52:49, 5.50s/it] {'loss': 0.5519, 'learning_rate': 2.2885051058538664e-06, 'epoch': 0.79} 79%|███████▊ | 4543/5773 [3:13:52<1:52:49, 5.50s/it] {'loss': 0.5519, 'learning_rate': 2.2885051058538664e-06, 'epoch': 0.79} 79%|███████▊ | 4543/5773 [3:13:50<1:52:49, 5.50s/it] 79%|███████▊ | 4544/5773 [3:13:57<1:52:23, 5.49s/it] 79%|███████▊ | 4544/5773 [3:13:55<1:52:23, 5.49s/it] {'loss': 0.5573, 'learning_rate': 2.284934060804764e-06, 'epoch': 0.79} 79%|███████▊ | 4544/5773 [3:13:57<1:52:23, 5.49s/it] {'loss': 0.5573, 'learning_rate': 2.284934060804764e-06, 'epoch': 0.79} 79%|███████▊ | 4544/5773 [3:13:55<1:52:23, 5.49s/it] 79%|███████▊ | 4545/5773 [3:14:04<1:59:02, 5.82s/it] 79%|███████▊ | 4545/5773 [3:14:02<1:59:02, 5.82s/it] {'loss': 0.5473, 'learning_rate': 2.2813654447049172e-06, 'epoch': 0.79} 79%|███████▊ | 4545/5773 [3:14:04<1:59:02, 5.82s/it] {'loss': 
0.5473, 'learning_rate': 2.2813654447049172e-06, 'epoch': 0.79} 79%|███████▊ | 4545/5773 [3:14:02<1:59:02, 5.82s/it] 79%|███████▊ | 4546/5773 [3:14:09<1:57:28, 5.74s/it] 79%|███████▊ | 4546/5773 [3:14:07<1:57:29, 5.74s/it] {'loss': 0.5545, 'learning_rate': 2.2777992586778374e-06, 'epoch': 0.79} 79%|███████▊ | 4546/5773 [3:14:09<1:57:28, 5.74s/it] {'loss': 0.5545, 'learning_rate': 2.2777992586778374e-06, 'epoch': 0.79} 79%|███████▊ | 4546/5773 [3:14:07<1:57:29, 5.74s/it] 79%|███████▉ | 4547/5773 [3:14:15<1:55:57, 5.67s/it] 79%|███████▉ | 4547/5773 [3:14:13<1:55:57, 5.67s/it] {'loss': 0.5796, 'learning_rate': 2.2742355038462726e-06, 'epoch': 0.79} 79%|███████▉ | 4547/5773 [3:14:15<1:55:57, 5.67s/it] {'loss': 0.5796, 'learning_rate': 2.2742355038462726e-06, 'epoch': 0.79} 79%|███████▉ | 4547/5773 [3:14:13<1:55:57, 5.67s/it] 79%|███████▉ | 4548/5773 [3:14:20<1:54:39, 5.62s/it] 79%|███████▉ | 4548/5773 [3:14:18<1:54:39, 5.62s/it] {'loss': 0.5505, 'learning_rate': 2.270674181332209e-06, 'epoch': 0.79} 79%|███████▉ | 4548/5773 [3:14:20<1:54:39, 5.62s/it] {'loss': 0.5505, 'learning_rate': 2.270674181332209e-06, 'epoch': 0.79} 79%|███████▉ | 4548/5773 [3:14:18<1:54:39, 5.62s/it] 79%|███████▉ | 4549/5773 [3:14:26<1:54:06, 5.59s/it] 79%|███████▉ | 4549/5773 [3:14:24<1:54:06, 5.59s/it] {'loss': 0.5648, 'learning_rate': 2.2671152922568617e-06, 'epoch': 0.79} 79%|███████▉ | 4549/5773 [3:14:26<1:54:06, 5.59s/it] {'loss': 0.5648, 'learning_rate': 2.2671152922568617e-06, 'epoch': 0.79} 79%|███████▉ | 4549/5773 [3:14:24<1:54:06, 5.59s/it]14 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 79%|███████▉ | 4550/5773 [3:14:31<1:53:53, 5.59s/it]1 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 79%|███████▉ | 4550/5773 [3:14:29<1:53:53, 5.59s/it] 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5617, 'learning_rate': 2.2635588377406823e-06, 'epoch': 0.79} 79%|███████▉ | 4550/5773 [3:14:31<1:53:53, 5.59s/it] {'loss': 0.5617, 'learning_rate': 2.2635588377406823e-06, 'epoch': 0.79} 79%|███████▉ | 4550/5773 [3:14:29<1:53:53, 5.59s/it] 79%|███████▉ | 4551/5773 [3:14:37<1:52:43, 5.53s/it] 79%|███████▉ | 4551/5773 [3:14:35<1:52:43, 5.53s/it] {'loss': 0.5762, 'learning_rate': 2.2600048189033618e-06, 'epoch': 0.79} 79%|███████▉ | 4551/5773 [3:14:37<1:52:43, 5.53s/it] {'loss': 0.5762, 'learning_rate': 2.2600048189033618e-06, 'epoch': 0.79} 79%|███████▉ | 4551/5773 [3:14:35<1:52:43, 5.53s/it] 79%|███████▉ | 4552/5773 [3:14:42<1:51:50, 5.50s/it] 79%|███████▉ | 4552/5773 [3:14:40<1:51:50, 5.50s/it] {'loss': 0.5548, 'learning_rate': 2.256453236863815e-06, 'epoch': 0.79} 79%|███████▉ | 4552/5773 [3:14:42<1:51:50, 5.50s/it] {'loss': 0.5548, 'learning_rate': 2.256453236863815e-06, 'epoch': 0.79} 79%|███████▉ | 4552/5773 [3:14:40<1:51:50, 5.50s/it] 79%|███████▉ | 4553/5773 [3:14:47<1:50:53, 5.45s/it] 79%|███████▉ | 4553/5773 [3:14:46<1:50:53, 5.45s/it] {'loss': 0.5569, 'learning_rate': 2.2529040927401913e-06, 'epoch': 0.79} 79%|███████▉ | 4553/5773 [3:14:47<1:50:53, 5.45s/it] {'loss': 0.5569, 'learning_rate': 2.2529040927401913e-06, 'epoch': 0.79} 79%|███████▉ | 4553/5773 [3:14:46<1:50:53, 5.45s/it] 79%|███████▉ | 4554/5773 [3:14:53<1:50:38, 5.45s/it] 79%|███████▉ | 4554/5773 [3:14:51<1:50:38, 5.45s/it] {'loss': 0.5404, 'learning_rate': 2.24935738764988e-06, 'epoch': 0.79} 79%|███████▉ | 4554/5773 [3:14:53<1:50:38, 5.45s/it]
{'loss': 0.5404, 'learning_rate': 2.24935738764988e-06, 'epoch': 0.79} 79%|███████▉ | 4554/5773 [3:14:51<1:50:38, 5.45s/it] 79%|███████▉ | 4555/5773 [3:14:58<1:50:33, 5.45s/it] 79%|███████▉ | 4555/5773 [3:14:56<1:50:33, 5.45s/it] {'loss': 0.5439, 'learning_rate': 2.2458131227094936e-06, 'epoch': 0.79} 79%|███████▉ | 4555/5773 [3:14:58<1:50:33, 5.45s/it] {'loss': 0.5439, 'learning_rate': 2.2458131227094936e-06, 'epoch': 0.79} 79%|███████▉ | 4555/5773 [3:14:56<1:50:33, 5.45s/it] 79%|███████▉ | 4556/5773 [3:15:04<1:50:41, 5.46s/it] 79%|███████▉ | 4556/5773 [3:15:02<1:50:41, 5.46s/it] {'loss': 0.5646, 'learning_rate': 2.2422712990348815e-06, 'epoch': 0.79} 79%|███████▉ | 4556/5773 [3:15:04<1:50:41, 5.46s/it] {'loss': 0.5646, 'learning_rate': 2.2422712990348815e-06, 'epoch': 0.79} 79%|███████▉ | 4556/5773 [3:15:02<1:50:41, 5.46s/it] 79%|███████▉ | 4557/5773 [3:15:09<1:51:05, 5.48s/it] 79%|███████▉ | 4557/5773 [3:15:07<1:51:05, 5.48s/it] {'loss': 0.5768, 'learning_rate': 2.2387319177411194e-06, 'epoch': 0.79} 79%|███████▉ | 4557/5773 [3:15:09<1:51:05, 5.48s/it] {'loss': 0.5768, 'learning_rate': 2.2387319177411194e-06, 'epoch': 0.79} 79%|███████▉ | 4557/5773 [3:15:07<1:51:05, 5.48s/it] 79%|███████▉ | 4558/5773 [3:15:15<1:51:17, 5.50s/it] 79%|███████▉ | 4558/5773 [3:15:13<1:51:17, 5.50s/it] {'loss': 0.5771, 'learning_rate': 2.235194979942523e-06, 'epoch': 0.79} 79%|███████▉ | 4558/5773 [3:15:15<1:51:17, 5.50s/it] {'loss': 0.5771, 'learning_rate': 2.235194979942523e-06, 'epoch': 0.79} 79%|███████▉ | 4558/5773 [3:15:13<1:51:17, 5.50s/it] 79%|███████▉ | 4559/5773 [3:15:21<1:52:14, 5.55s/it] 79%|███████▉ | 4559/5773 [3:15:19<1:52:14, 5.55s/it] {'loss': 0.5452, 'learning_rate': 2.2316604867526313e-06, 'epoch': 0.79} 79%|███████▉ | 4559/5773 [3:15:21<1:52:14, 5.55s/it] {'loss': 0.5452, 'learning_rate': 2.2316604867526313e-06, 'epoch': 0.79} 79%|███████▉ | 4559/5773 [3:15:19<1:52:14, 5.55s/it] 79%|███████▉ | 4560/5773 [3:15:26<1:51:37, 5.52s/it] 79%|███████▉ | 4560/5773 
[3:15:24<1:51:37, 5.52s/it] {'loss': 0.556, 'learning_rate': 2.228128439284214e-06, 'epoch': 0.79} 79%|███████▉ | 4560/5773 [3:15:26<1:51:37, 5.52s/it] {'loss': 0.556, 'learning_rate': 2.228128439284214e-06, 'epoch': 0.79} 79%|███████▉ | 4560/5773 [3:15:24<1:51:37, 5.52s/it] 79%|███████▉ | 4561/5773 [3:15:32<1:51:27, 5.52s/it] 79%|███████▉ | 4561/5773 [3:15:30<1:51:27, 5.52s/it] {'loss': 0.5798, 'learning_rate': 2.2245988386492745e-06, 'epoch': 0.79} 79%|███████▉ | 4561/5773 [3:15:32<1:51:27, 5.52s/it] {'loss': 0.5798, 'learning_rate': 2.2245988386492745e-06, 'epoch': 0.79} 79%|███████▉ | 4561/5773 [3:15:30<1:51:27, 5.52s/it] 79%|███████▉ | 4562/5773 [3:15:37<1:50:34, 5.48s/it] 79%|███████▉ | 4562/5773 [3:15:35<1:50:34, 5.48s/it] {'loss': 0.5525, 'learning_rate': 2.221071685959043e-06, 'epoch': 0.79} 79%|███████▉ | 4562/5773 [3:15:37<1:50:34, 5.48s/it] {'loss': 0.5525, 'learning_rate': 2.221071685959043e-06, 'epoch': 0.79} 79%|███████▉ | 4562/5773 [3:15:35<1:50:34, 5.48s/it] 79%|███████▉ | 4563/5773 [3:15:42<1:50:36, 5.48s/it] 79%|███████▉ | 4563/5773 [3:15:40<1:50:36, 5.48s/it] {'loss': 0.5578, 'learning_rate': 2.217546982323977e-06, 'epoch': 0.79} 79%|███████▉ | 4563/5773 [3:15:42<1:50:36, 5.48s/it] {'loss': 0.5578, 'learning_rate': 2.217546982323977e-06, 'epoch': 0.79} 79%|███████▉ | 4563/5773 [3:15:40<1:50:36, 5.48s/it] 79%|███████▉ | 4564/5773 [3:15:48<1:50:15, 5.47s/it] 79%|███████▉ | 4564/5773 [3:15:46<1:50:15, 5.47s/it] {'loss': 0.5657, 'learning_rate': 2.214024728853771e-06, 'epoch': 0.79} 79%|███████▉ | 4564/5773 [3:15:48<1:50:15, 5.47s/it] {'loss': 0.5657, 'learning_rate': 2.214024728853771e-06, 'epoch': 0.79} 79%|███████▉ | 4564/5773 [3:15:46<1:50:15, 5.47s/it] 79%|███████▉ | 4565/5773 [3:15:53<1:49:27, 5.44s/it] 79%|███████▉ | 4565/5773 [3:15:51<1:49:27, 5.44s/it] {'loss': 0.5748, 'learning_rate': 2.210504926657341e-06, 'epoch': 0.79} 79%|███████▉ | 4565/5773 [3:15:53<1:49:27, 5.44s/it] {'loss': 0.5748, 'learning_rate': 2.210504926657341e-06, 'epoch': 
0.79} 79%|███████▉ | 4565/5773 [3:15:51<1:49:27, 5.44s/it] 79%|███████▉ | 4566/5773 [3:15:59<1:48:44, 5.41s/it] 79%|███████▉ | 4566/5773 [3:15:57<1:48:44, 5.41s/it] {'loss': 0.5555, 'learning_rate': 2.2069875768428296e-06, 'epoch': 0.79} 79%|███████▉ | 4566/5773 [3:15:59<1:48:44, 5.41s/it] {'loss': 0.5555, 'learning_rate': 2.2069875768428296e-06, 'epoch': 0.79} 79%|███████▉ | 4566/5773 [3:15:57<1:48:44, 5.41s/it] 79%|███████▉ | 4567/5773 [3:16:04<1:51:24, 5.54s/it] 79%|███████▉ | 4567/5773 [3:16:02<1:51:24, 5.54s/it] {'loss': 0.5604, 'learning_rate': 2.2034726805176165e-06, 'epoch': 0.79} 79%|███████▉ | 4567/5773 [3:16:04<1:51:24, 5.54s/it] {'loss': 0.5604, 'learning_rate': 2.2034726805176165e-06, 'epoch': 0.79} 79%|███████▉ | 4567/5773 [3:16:02<1:51:24, 5.54s/it] 79%|███████▉ | 4568/5773 [3:16:10<1:51:30, 5.55s/it] 79%|███████▉ | 4568/5773 [3:16:08<1:51:30, 5.55s/it] {'loss': 0.5503, 'learning_rate': 2.199960238788301e-06, 'epoch': 0.79} 79%|███████▉ | 4568/5773 [3:16:10<1:51:30, 5.55s/it] {'loss': 0.5503, 'learning_rate': 2.199960238788301e-06, 'epoch': 0.79} 79%|███████▉ | 4568/5773 [3:16:08<1:51:30, 5.55s/it] 79%|███████▉ | 4569/5773 [3:16:15<1:50:59, 5.53s/it] 79%|███████▉ | 4569/5773 [3:16:14<1:50:59, 5.53s/it] {'loss': 0.5381, 'learning_rate': 2.1964502527607135e-06, 'epoch': 0.79} 79%|███████▉ | 4569/5773 [3:16:15<1:50:59, 5.53s/it] {'loss': 0.5381, 'learning_rate': 2.1964502527607135e-06, 'epoch': 0.79} 79%|███████▉ | 4569/5773 [3:16:14<1:50:59, 5.53s/it] 79%|███████▉ | 4570/5773 [3:16:21<1:51:58, 5.59s/it] 79%|███████▉ | 4570/5773 [3:16:19<1:51:58, 5.59s/it] {'loss': 0.5403, 'learning_rate': 2.1929427235399037e-06, 'epoch': 0.79} 79%|███████▉ | 4570/5773 [3:16:21<1:51:58, 5.59s/it] {'loss': 0.5403, 'learning_rate': 2.1929427235399037e-06, 'epoch': 0.79} 79%|███████▉ | 4570/5773 [3:16:19<1:51:58, 5.59s/it] 79%|███████▉ | 4571/5773 [3:16:27<1:52:01, 5.59s/it] 79%|███████▉ | 4571/5773 [3:16:25<1:52:01, 5.59s/it] {'loss': 0.5737, 'learning_rate': 
2.189437652230162e-06, 'epoch': 0.79} 79%|███████▉ | 4571/5773 [3:16:25<1:52:01, 5.59s/it] {'loss': 0.5737, 'learning_rate': 2.189437652230162e-06, 'epoch': 0.79} 79%|███████▉ | 4571/5773 [3:16:27<1:52:01, 5.59s/it] 79%|███████▉ | 4572/5773 [3:16:32<1:51:08, 5.55s/it] 79%|███████▉ | 4572/5773 [3:16:30<1:51:08, 5.55s/it] {'loss': 0.5551, 'learning_rate': 2.185935039934992e-06, 'epoch': 0.79} 79%|███████▉ | 4572/5773 [3:16:32<1:51:08, 5.55s/it] {'loss': 0.5551, 'learning_rate': 2.185935039934992e-06, 'epoch': 0.79} 79%|███████▉ | 4572/5773 [3:16:30<1:51:08, 5.55s/it] 79%|███████▉ | 4573/5773 [3:16:38<1:50:25, 5.52s/it] 79%|███████▉ | 4573/5773 [3:16:36<1:50:25, 5.52s/it] {'loss': 0.5561, 'learning_rate': 2.182434887757127e-06, 'epoch': 0.79} 79%|███████▉ | 4573/5773 [3:16:38<1:50:25, 5.52s/it] {'loss': 0.5561, 'learning_rate': 2.182434887757127e-06, 'epoch': 0.79} 79%|███████▉ | 4573/5773 [3:16:36<1:50:25, 5.52s/it] 79%|███████▉ | 4574/5773 [3:16:43<1:49:08, 5.46s/it] 79%|███████▉ | 4574/5773 [3:16:41<1:49:08, 5.46s/it] {'loss': 0.5654, 'learning_rate': 2.178937196798534e-06, 'epoch': 0.79} 79%|███████▉ | 4574/5773 [3:16:43<1:49:08, 5.46s/it] {'loss': 0.5654, 'learning_rate': 2.178937196798534e-06, 'epoch': 0.79} 79%|███████▉ | 4574/5773 [3:16:41<1:49:08, 5.46s/it] 79%|███████▉ | 4575/5773 [3:16:49<1:49:11, 5.47s/it] 79%|███████▉ | 4575/5773 [3:16:47<1:49:11, 5.47s/it] {'loss': 0.5605, 'learning_rate': 2.175441968160391e-06, 'epoch': 0.79} 79%|███████▉ | 4575/5773 [3:16:49<1:49:11, 5.47s/it] {'loss': 0.5605, 'learning_rate': 2.175441968160391e-06, 'epoch': 0.79} 79%|███████▉ | 4575/5773 [3:16:47<1:49:11, 5.47s/it] 79%|███████▉ | 4576/5773 [3:16:54<1:50:01, 5.52s/it] 79%|███████▉ | 4576/5773 [3:16:52<1:50:01, 5.52s/it] {'loss': 0.5475, 'learning_rate': 2.1719492029431054e-06, 'epoch': 0.79} 79%|███████▉ | 4576/5773 [3:16:54<1:50:01, 5.52s/it] {'loss': 0.5475, 'learning_rate': 2.1719492029431054e-06, 'epoch': 0.79} 79%|███████▉ | 4576/5773 [3:16:52<1:50:01, 5.52s/it] 
79%|███████▉ | 4577/5773 [3:17:00<1:50:26, 5.54s/it] 79%|███████▉ | 4577/5773 [3:16:58<1:50:26, 5.54s/it] {'loss': 0.5702, 'learning_rate': 2.1684589022463187e-06, 'epoch': 0.79} 79%|███████▉ | 4577/5773 [3:17:00<1:50:26, 5.54s/it] {'loss': 0.5702, 'learning_rate': 2.1684589022463187e-06, 'epoch': 0.79} 79%|███████▉ | 4577/5773 [3:16:58<1:50:26, 5.54s/it] 79%|███████▉ | 4578/5773 [3:17:05<1:49:11, 5.48s/it] 79%|███████▉ | 4578/5773 [3:17:03<1:49:11, 5.48s/it] {'loss': 0.5503, 'learning_rate': 2.164971067168885e-06, 'epoch': 0.79} 79%|███████▉ | 4578/5773 [3:17:05<1:49:11, 5.48s/it] {'loss': 0.5503, 'learning_rate': 2.164971067168885e-06, 'epoch': 0.79} 79%|███████▉ | 4578/5773 [3:17:03<1:49:11, 5.48s/it] 79%|███████▉ | 4579/5773 [3:17:10<1:48:42, 5.46s/it] 79%|███████▉ | 4579/5773 [3:17:09<1:48:42, 5.46s/it] {'loss': 0.5761, 'learning_rate': 2.161485698808885e-06, 'epoch': 0.79} 79%|███████▉ | 4579/5773 [3:17:10<1:48:42, 5.46s/it] {'loss': 0.5761, 'learning_rate': 2.161485698808885e-06, 'epoch': 0.79} 79%|███████▉ | 4579/5773 [3:17:09<1:48:42, 5.46s/it] 79%|███████▉ | 4580/5773 [3:17:16<1:49:28, 5.51s/it] 79%|███████▉ | 4580/5773 [3:17:14<1:49:28, 5.51s/it] {'loss': 0.5583, 'learning_rate': 2.1580027982636277e-06, 'epoch': 0.79} 79%|███████▉ | 4580/5773 [3:17:16<1:49:28, 5.51s/it] {'loss': 0.5583, 'learning_rate': 2.1580027982636277e-06, 'epoch': 0.79} 79%|███████▉ | 4580/5773 [3:17:14<1:49:28, 5.51s/it] 79%|███████▉ | 4581/5773 [3:17:21<1:48:35, 5.47s/it] 79%|███████▉ | 4581/5773 [3:17:20<1:48:35, 5.47s/it] {'loss': 0.5547, 'learning_rate': 2.15452236662964e-06, 'epoch': 0.79} 79%|███████▉ | 4581/5773 [3:17:21<1:48:35, 5.47s/it] {'loss': 0.5547, 'learning_rate': 2.15452236662964e-06, 'epoch': 0.79} 79%|███████▉ | 4581/5773 [3:17:20<1:48:35, 5.47s/it] 79%|███████▉ | 4582/5773 [3:17:27<1:49:00, 5.49s/it] 79%|███████▉ | 4582/5773 [3:17:25<1:49:00, 5.49s/it] {'loss': 0.5551, 'learning_rate': 2.1510444050026722e-06, 'epoch': 0.79} 79%|███████▉ | 4582/5773 
[3:17:27<1:49:00, 5.49s/it] {'loss': 0.5551, 'learning_rate': 2.1510444050026722e-06, 'epoch': 0.79} 79%|███████▉ | 4582/5773 [3:17:25<1:49:00, 5.49s/it] 79%|███████▉ | 4583/5773 [3:17:33<1:49:37, 5.53s/it] 79%|███████▉ | 4583/5773 [3:17:31<1:49:37, 5.53s/it] {'loss': 0.5656, 'learning_rate': 2.1475689144776958e-06, 'epoch': 0.79} 79%|███████▉ | 4583/5773 [3:17:33<1:49:37, 5.53s/it] {'loss': 0.5656, 'learning_rate': 2.1475689144776958e-06, 'epoch': 0.79} 79%|███████▉ | 4583/5773 [3:17:31<1:49:37, 5.53s/it] 79%|███████▉ | 4584/5773 [3:17:38<1:48:44, 5.49s/it] 79%|███████▉ | 4584/5773 [3:17:36<1:48:44, 5.49s/it] {'loss': 0.5574, 'learning_rate': 2.1440958961489112e-06, 'epoch': 0.79} 79%|███████▉ | 4584/5773 [3:17:38<1:48:44, 5.49s/it] {'loss': 0.5574, 'learning_rate': 2.1440958961489112e-06, 'epoch': 0.79} 79%|███████▉ | 4584/5773 [3:17:36<1:48:44, 5.49s/it] 79%|███████▉ | 4585/5773 [3:17:44<1:48:47, 5.49s/it] 79%|███████▉ | 4585/5773 [3:17:42<1:48:47, 5.49s/it] {'loss': 0.5629, 'learning_rate': 2.140625351109733e-06, 'epoch': 0.79} 79%|███████▉ | 4585/5773 [3:17:44<1:48:47, 5.49s/it] {'loss': 0.5629, 'learning_rate': 2.140625351109733e-06, 'epoch': 0.79} 79%|███████▉ | 4585/5773 [3:17:42<1:48:47, 5.49s/it] 79%|███████▉ | 4586/5773 [3:17:49<1:49:44, 5.55s/it] 79%|███████▉ | 4586/5773 [3:17:47<1:49:44, 5.55s/it] {'loss': 0.5536, 'learning_rate': 2.1371572804527972e-06, 'epoch': 0.79} 79%|███████▉ | 4586/5773 [3:17:49<1:49:44, 5.55s/it] {'loss': 0.5536, 'learning_rate': 2.1371572804527972e-06, 'epoch': 0.79} 79%|███████▉ | 4586/5773 [3:17:47<1:49:44, 5.55s/it] 79%|███████▉ | 4587/5773 [3:17:55<1:49:49, 5.56s/it] 79%|███████▉ | 4587/5773 [3:17:53<1:49:49, 5.56s/it] {'loss': 0.5667, 'learning_rate': 2.1336916852699674e-06, 'epoch': 0.79} 79%|███████▉ | 4587/5773 [3:17:55<1:49:49, 5.56s/it] {'loss': 0.5667, 'learning_rate': 2.1336916852699674e-06, 'epoch': 0.79} 79%|███████▉ | 4587/5773 [3:17:53<1:49:49, 5.56s/it] 79%|███████▉ | 4588/5773 [3:18:00<1:48:30, 5.49s/it] 
79%|███████▉ | 4588/5773 [3:17:58<1:48:31, 5.49s/it] {'loss': 0.5344, 'learning_rate': 2.130228566652326e-06, 'epoch': 0.79} 79%|███████▉ | 4588/5773 [3:18:00<1:48:30, 5.49s/it] {'loss': 0.5344, 'learning_rate': 2.130228566652326e-06, 'epoch': 0.79} 79%|███████▉ | 4588/5773 [3:17:58<1:48:31, 5.49s/it] 79%|███████▉ | 4589/5773 [3:18:06<1:49:00, 5.52s/it] 79%|███████▉ | 4589/5773 [3:18:04<1:49:00, 5.52s/it] {'loss': 0.5539, 'learning_rate': 2.126767925690163e-06, 'epoch': 0.79} 79%|███████▉ | 4589/5773 [3:18:06<1:49:00, 5.52s/it] {'loss': 0.5539, 'learning_rate': 2.126767925690163e-06, 'epoch': 0.79} 79%|███████▉ | 4589/5773 [3:18:04<1:49:00, 5.52s/it] 80%|███████▉ | 4590/5773 [3:18:11<1:48:15, 5.49s/it] 80%|███████▉ | 4590/5773 [3:18:09<1:48:15, 5.49s/it] {'loss': 0.5608, 'learning_rate': 2.123309763473007e-06, 'epoch': 0.8} 80%|███████▉ | 4590/5773 [3:18:11<1:48:15, 5.49s/it] {'loss': 0.5608, 'learning_rate': 2.123309763473007e-06, 'epoch': 0.8} 80%|███████▉ | 4590/5773 [3:18:09<1:48:15, 5.49s/it] 80%|███████▉ | 4591/5773 [3:18:17<1:49:13, 5.54s/it] 80%|███████▉ | 4591/5773 [3:18:15<1:49:13, 5.54s/it] {'loss': 0.5556, 'learning_rate': 2.1198540810895975e-06, 'epoch': 0.8} 80%|███████▉ | 4591/5773 [3:18:17<1:49:13, 5.54s/it] {'loss': 0.5556, 'learning_rate': 2.1198540810895975e-06, 'epoch': 0.8} 80%|███████▉ | 4591/5773 [3:18:15<1:49:13, 5.54s/it] 80%|███████▉ | 4592/5773 [3:18:23<1:50:29, 5.61s/it] 80%|███████▉ | 4592/5773 [3:18:21<1:50:29, 5.61s/it] {'loss': 0.5529, 'learning_rate': 2.116400879627888e-06, 'epoch': 0.8} 80%|███████▉ | 4592/5773 [3:18:23<1:50:29, 5.61s/it] {'loss': 0.5529, 'learning_rate': 2.116400879627888e-06, 'epoch': 0.8} 80%|███████▉ | 4592/5773 [3:18:21<1:50:29, 5.61s/it] 80%|███████▉ | 4593/5773 [3:18:28<1:49:43, 5.58s/it] 80%|███████▉ | 4593/5773 [3:18:26<1:49:43, 5.58s/it] {'loss': 0.551, 'learning_rate': 2.1129501601750647e-06, 'epoch': 0.8} 80%|███████▉ | 4593/5773 [3:18:28<1:49:43, 5.58s/it] {'loss': 0.551, 'learning_rate': 
2.1129501601750647e-06, 'epoch': 0.8} 80%|███████▉ | 4593/5773 [3:18:26<1:49:43, 5.58s/it] 80%|███████▉ | 4594/5773 [3:18:33<1:47:59, 5.50s/it] 80%|███████▉ | 4594/5773 [3:18:31<1:47:59, 5.50s/it] {'loss': 0.5694, 'learning_rate': 2.1095019238175195e-06, 'epoch': 0.8} 80%|███████▉ | 4594/5773 [3:18:33<1:47:59, 5.50s/it] {'loss': 0.5694, 'learning_rate': 2.1095019238175195e-06, 'epoch': 0.8} 80%|███████▉ | 4594/5773 [3:18:31<1:47:59, 5.50s/it] 80%|███████▉ | 4595/5773 [3:18:39<1:47:53, 5.50s/it] 80%|███████▉ | 4595/5773 [3:18:37<1:47:53, 5.50s/it] {'loss': 0.5525, 'learning_rate': 2.10605617164087e-06, 'epoch': 0.8} 80%|███████▉ | 4595/5773 [3:18:39<1:47:53, 5.50s/it] {'loss': 0.5525, 'learning_rate': 2.10605617164087e-06, 'epoch': 0.8} 80%|███████▉ | 4595/5773 [3:18:37<1:47:53, 5.50s/it] 80%|███████▉ | 4596/5773 [3:18:44<1:47:39, 5.49s/it] 80%|███████▉ | 4596/5773 [3:18:42<1:47:39, 5.49s/it] {'loss': 0.548, 'learning_rate': 2.1026129047299436e-06, 'epoch': 0.8} 80%|███████▉ | 4596/5773 [3:18:44<1:47:39, 5.49s/it] {'loss': 0.548, 'learning_rate': 2.1026129047299436e-06, 'epoch': 0.8} 80%|███████▉ | 4596/5773 [3:18:42<1:47:39, 5.49s/it] 80%|███████▉ | 4597/5773 [3:18:50<1:47:14, 5.47s/it] 80%|███████▉ | 4597/5773 [3:18:48<1:47:14, 5.47s/it] {'loss': 0.5644, 'learning_rate': 2.099172124168797e-06, 'epoch': 0.8} 80%|███████▉ | 4597/5773 [3:18:50<1:47:14, 5.47s/it] {'loss': 0.5644, 'learning_rate': 2.099172124168797e-06, 'epoch': 0.8} 80%|███████▉ | 4597/5773 [3:18:48<1:47:14, 5.47s/it] 80%|███████▉ | 4598/5773 [3:18:56<1:48:36, 5.55s/it] 80%|███████▉ | 4598/5773 [3:18:54<1:48:36, 5.55s/it] {'loss': 0.5429, 'learning_rate': 2.0957338310406962e-06, 'epoch': 0.8} 80%|███████▉ | 4598/5773 [3:18:56<1:48:36, 5.55s/it] {'loss': 0.5429, 'learning_rate': 2.0957338310406962e-06, 'epoch': 0.8} 80%|███████▉ | 4598/5773 [3:18:54<1:48:36, 5.55s/it] 80%|███████▉ | 4599/5773 [3:19:01<1:47:55, 5.52s/it] 80%|███████▉ | 4599/5773 [3:18:59<1:47:55, 5.52s/it] {'loss': 0.5688, 
'learning_rate': 2.0922980264281223e-06, 'epoch': 0.8} 80%|███████▉ | 4599/5773 [3:19:01<1:47:55, 5.52s/it] {'loss': 0.5688, 'learning_rate': 2.0922980264281223e-06, 'epoch': 0.8} 80%|███████▉ | 4599/5773 [3:18:59<1:47:55, 5.52s/it] 5 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 80%|███████▉ | 4600/5773 [3:19:06<1:47:27, 5.50s/it] 15 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 80%|███████▉ | 4600/5773 [3:19:04<1:47:27, 5.50s/it] 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend...
{'loss': 0.5646, 'learning_rate': 2.088864711412781e-06, 'epoch': 0.8} 80%|███████▉ | 4600/5773 [3:19:06<1:47:27, 5.50s/it] {'loss': 0.5646, 'learning_rate': 2.088864711412781e-06, 'epoch': 0.8} 80%|███████▉ | 4600/5773 [3:19:04<1:47:27, 5.50s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 80%|███████▉ | 4601/5773 [3:19:30<3:32:29, 10.88s/it] 80%|███████▉ | 4601/5773 [3:19:28<3:32:29, 10.88s/it] {'loss': 0.5672, 'learning_rate': 2.0854338870755873e-06, 'epoch': 0.8} 80%|███████▉ | 4601/5773 [3:19:30<3:32:29, 10.88s/it] {'loss': 0.5672, 'learning_rate': 2.0854338870755873e-06, 'epoch': 0.8} 80%|███████▉ | 4601/5773 [3:19:28<3:32:29, 10.88s/it] 80%|███████▉ | 4602/5773 [3:19:33<3:00:33, 9.25s/it] 80%|███████▉ | 4602/5773 [3:19:35<3:00:33, 9.25s/it] {'loss': 0.5729, 'learning_rate': 2.0820055544966757e-06, 'epoch': 0.8} 80%|███████▉ | 4602/5773 [3:19:35<3:00:33, 9.25s/it] {'loss': 0.5729, 'learning_rate': 2.0820055544966757e-06, 'epoch': 0.8} 80%|███████▉ | 4602/5773 [3:19:33<3:00:33, 9.25s/it] 80%|███████▉ | 4603/5773 [3:19:41<2:38:08, 8.11s/it] 80%|███████▉ | 4603/5773 [3:19:39<2:38:08, 8.11s/it] {'loss': 0.5685, 'learning_rate': 2.078579714755393e-06, 'epoch': 0.8} 80%|███████▉ | 4603/5773 [3:19:41<2:38:08, 8.11s/it] {'loss': 0.5685, 'learning_rate': 2.078579714755393e-06, 'epoch': 0.8} 80%|███████▉ | 4603/5773 [3:19:39<2:38:08, 8.11s/it] 80%|███████▉ | 4604/5773 [3:19:46<2:21:48, 7.28s/it] 80%|███████▉ | 4604/5773 [3:19:44<2:21:48, 7.28s/it] {'loss': 0.543, 'learning_rate': 2.0751563689303045e-06, 'epoch': 0.8} 80%|███████▉ | 4604/5773 [3:19:46<2:21:48, 7.28s/it] {'loss': 0.543, 'learning_rate': 2.0751563689303045e-06, 'epoch': 0.8} 80%|███████▉ | 4604/5773 [3:19:44<2:21:48, 7.28s/it] 80%|███████▉ | 4605/5773 [3:19:51<2:10:47, 6.72s/it] 80%|███████▉ | 4605/5773 [3:19:50<2:10:47, 6.72s/it] {'loss': 0.5718, 'learning_rate': 2.071735518099185e-06, 'epoch': 0.8} 80%|███████▉ | 4605/5773 [3:19:51<2:10:47, 6.72s/it] {'loss': 0.5718, 'learning_rate': 2.071735518099185e-06, 'epoch': 0.8} 80%|███████▉ | 4605/5773 [3:19:50<2:10:47, 6.72s/it] 80%|███████▉ | 4606/5773 [3:19:55<2:03:39, 6.36s/it] 80%|███████▉ | 4606/5773 [3:19:57<2:03:39, 6.36s/it] {'loss': 0.5536, 'learning_rate': 2.0683171633390333e-06, 'epoch': 0.8} 80%|███████▉ | 4606/5773 
[3:19:57<2:03:39, 6.36s/it] {'loss': 0.5536, 'learning_rate': 2.0683171633390333e-06, 'epoch': 0.8} 80%|███████▉ | 4606/5773 [3:19:55<2:03:39, 6.36s/it] 80%|███████▉ | 4607/5773 [3:20:01<1:58:26, 6.10s/it] 80%|███████▉ | 4607/5773 [3:20:02<1:58:26, 6.10s/it] {'loss': 0.5616, 'learning_rate': 2.064901305726055e-06, 'epoch': 0.8} 80%|███████▉ | 4607/5773 [3:20:02<1:58:26, 6.10s/it] {'loss': 0.5616, 'learning_rate': 2.064901305726055e-06, 'epoch': 0.8} 80%|███████▉ | 4607/5773 [3:20:01<1:58:26, 6.10s/it] 80%|███████▉ | 4608/5773 [3:20:08<1:53:47, 5.86s/it] 80%|███████▉ | 4608/5773 [3:20:06<1:53:47, 5.86s/it] {'loss': 0.5644, 'learning_rate': 2.0614879463356683e-06, 'epoch': 0.8} 80%|███████▉ | 4608/5773 [3:20:08<1:53:47, 5.86s/it] {'loss': 0.5644, 'learning_rate': 2.0614879463356683e-06, 'epoch': 0.8} 80%|███████▉ | 4608/5773 [3:20:06<1:53:47, 5.86s/it] 80%|███████▉ | 4609/5773 [3:20:13<1:51:58, 5.77s/it] 80%|███████▉ | 4609/5773 [3:20:11<1:51:58, 5.77s/it] {'loss': 0.5703, 'learning_rate': 2.0580770862425093e-06, 'epoch': 0.8} 80%|███████▉ | 4609/5773 [3:20:13<1:51:58, 5.77s/it] {'loss': 0.5703, 'learning_rate': 2.0580770862425093e-06, 'epoch': 0.8} 80%|███████▉ | 4609/5773 [3:20:11<1:51:58, 5.77s/it] 80%|███████▉ | 4610/5773 [3:20:19<1:51:19, 5.74s/it] 80%|███████▉ | 4610/5773 [3:20:17<1:51:19, 5.74s/it] {'loss': 0.5449, 'learning_rate': 2.054668726520428e-06, 'epoch': 0.8} 80%|███████▉ | 4610/5773 [3:20:19<1:51:19, 5.74s/it] {'loss': 0.5449, 'learning_rate': 2.054668726520428e-06, 'epoch': 0.8} 80%|███████▉ | 4610/5773 [3:20:17<1:51:19, 5.74s/it] 80%|███████▉ | 4611/5773 [3:20:25<1:50:06, 5.69s/it] 80%|███████▉ | 4611/5773 [3:20:23<1:50:06, 5.69s/it] {'loss': 0.5591, 'learning_rate': 2.051262868242483e-06, 'epoch': 0.8} 80%|███████▉ | 4611/5773 [3:20:25<1:50:06, 5.69s/it] {'loss': 0.5591, 'learning_rate': 2.051262868242483e-06, 'epoch': 0.8} 80%|███████▉ | 4611/5773 [3:20:23<1:50:06, 5.69s/it] 80%|███████▉ | 4612/5773 [3:20:28<1:48:30, 5.61s/it] 80%|███████▉ | 
4612/5773 [3:20:30<1:48:30, 5.61s/it] {'loss': 0.5532, 'learning_rate': 2.0478595124809453e-06, 'epoch': 0.8} 80%|███████▉ | 4612/5773 [3:20:30<1:48:30, 5.61s/it] {'loss': 0.5532, 'learning_rate': 2.0478595124809453e-06, 'epoch': 0.8} 80%|███████▉ | 4612/5773 [3:20:28<1:48:30, 5.61s/it] 80%|███████▉ | 4613/5773 [3:20:33<1:46:46, 5.52s/it] 80%|███████▉ | 4613/5773 [3:20:35<1:46:47, 5.52s/it] {'loss': 0.5564, 'learning_rate': 2.0444586603073036e-06, 'epoch': 0.8} 80%|███████▉ | 4613/5773 [3:20:35<1:46:47, 5.52s/it] {'loss': 0.5564, 'learning_rate': 2.0444586603073036e-06, 'epoch': 0.8} 80%|███████▉ | 4613/5773 [3:20:33<1:46:46, 5.52s/it] 80%|███████▉ | 4614/5773 [3:20:39<1:46:23, 5.51s/it] 80%|███████▉ | 4614/5773 [3:20:41<1:46:23, 5.51s/it] {'loss': 0.5528, 'learning_rate': 2.0410603127922548e-06, 'epoch': 0.8} 80%|███████▉ | 4614/5773 [3:20:41<1:46:23, 5.51s/it] {'loss': 0.5528, 'learning_rate': 2.0410603127922548e-06, 'epoch': 0.8} 80%|███████▉ | 4614/5773 [3:20:39<1:46:23, 5.51s/it] 80%|███████▉ | 4615/5773 [3:20:46<1:46:16, 5.51s/it] 80%|███████▉ | 4615/5773 [3:20:44<1:46:17, 5.51s/it] {'loss': 0.5501, 'learning_rate': 2.0376644710057026e-06, 'epoch': 0.8} 80%|███████▉ | 4615/5773 [3:20:46<1:46:16, 5.51s/it] {'loss': 0.5501, 'learning_rate': 2.0376644710057026e-06, 'epoch': 0.8} 80%|███████▉ | 4615/5773 [3:20:44<1:46:17, 5.51s/it] 80%|███████▉ | 4616/5773 [3:20:52<1:45:28, 5.47s/it] 80%|███████▉ | 4616/5773 [3:20:50<1:45:29, 5.47s/it] {'loss': 0.555, 'learning_rate': 2.0342711360167755e-06, 'epoch': 0.8} 80%|███████▉ | 4616/5773 [3:20:52<1:45:28, 5.47s/it] {'loss': 0.555, 'learning_rate': 2.0342711360167755e-06, 'epoch': 0.8} 80%|███████▉ | 4616/5773 [3:20:50<1:45:29, 5.47s/it] 80%|███████▉ | 4617/5773 [3:20:57<1:44:59, 5.45s/it] 80%|███████▉ | 4617/5773 [3:20:55<1:44:59, 5.45s/it] {'loss': 0.5532, 'learning_rate': 2.0308803088937956e-06, 'epoch': 0.8} 80%|███████▉ | 4617/5773 [3:20:57<1:44:59, 5.45s/it] {'loss': 0.5532, 'learning_rate': 2.0308803088937956e-06, 
'epoch': 0.8} 80%|███████▉ | 4617/5773 [3:20:55<1:44:59, 5.45s/it] 80%|███████▉ | 4618/5773 [3:21:03<1:45:08, 5.46s/it] 80%|███████▉ | 4618/5773 [3:21:01<1:45:08, 5.46s/it] {'loss': 0.5386, 'learning_rate': 2.0274919907043033e-06, 'epoch': 0.8} 80%|███████▉ | 4618/5773 [3:21:03<1:45:08, 5.46s/it] {'loss': 0.5386, 'learning_rate': 2.0274919907043033e-06, 'epoch': 0.8} 80%|███████▉ | 4618/5773 [3:21:01<1:45:08, 5.46s/it] 80%|████████ | 4619/5773 [3:21:08<1:45:32, 5.49s/it] 80%|████████ | 4619/5773 [3:21:06<1:45:32, 5.49s/it] {'loss': 0.5628, 'learning_rate': 2.0241061825150545e-06, 'epoch': 0.8} 80%|████████ | 4619/5773 [3:21:08<1:45:32, 5.49s/it] {'loss': 0.5628, 'learning_rate': 2.0241061825150545e-06, 'epoch': 0.8} 80%|████████ | 4619/5773 [3:21:06<1:45:32, 5.49s/it] 80%|████████ | 4620/5773 [3:21:14<1:45:38, 5.50s/it] 80%|████████ | 4620/5773 [3:21:12<1:45:38, 5.50s/it] {'loss': 0.5642, 'learning_rate': 2.020722885392008e-06, 'epoch': 0.8} 80%|████████ | 4620/5773 [3:21:14<1:45:38, 5.50s/it] {'loss': 0.5642, 'learning_rate': 2.020722885392008e-06, 'epoch': 0.8} 80%|████████ | 4620/5773 [3:21:12<1:45:38, 5.50s/it] 80%|████████ | 4621/5773 [3:21:19<1:45:36, 5.50s/it] 80%|████████ | 4621/5773 [3:21:17<1:45:35, 5.50s/it] {'loss': 0.5612, 'learning_rate': 2.01734210040033e-06, 'epoch': 0.8} 80%|████████ | 4621/5773 [3:21:19<1:45:36, 5.50s/it] {'loss': 0.5612, 'learning_rate': 2.01734210040033e-06, 'epoch': 0.8} 80%|████████ | 4621/5773 [3:21:17<1:45:35, 5.50s/it] 80%|████████ | 4622/5773 [3:21:24<1:43:52, 5.42s/it] 80%|████████ | 4622/5773 [3:21:22<1:43:52, 5.41s/it] {'loss': 0.5647, 'learning_rate': 2.013963828604406e-06, 'epoch': 0.8} 80%|████████ | 4622/5773 [3:21:24<1:43:52, 5.42s/it] {'loss': 0.5647, 'learning_rate': 2.013963828604406e-06, 'epoch': 0.8} 80%|████████ | 4622/5773 [3:21:22<1:43:52, 5.41s/it] 80%|████████ | 4623/5773 [3:21:30<1:43:22, 5.39s/it] 80%|████████ | 4623/5773 [3:21:28<1:43:22, 5.39s/it] {'loss': 0.5698, 'learning_rate': 
2.010588071067822e-06, 'epoch': 0.8} 80%|████████ | 4623/5773 [3:21:30<1:43:22, 5.39s/it] {'loss': 0.5698, 'learning_rate': 2.010588071067822e-06, 'epoch': 0.8} 80%|████████ | 4623/5773 [3:21:28<1:43:22, 5.39s/it] 80%|████████ | 4624/5773 [3:21:35<1:43:52, 5.42s/it] 80%|████████ | 4624/5773 [3:21:33<1:43:52, 5.42s/it] {'loss': 0.54, 'learning_rate': 2.007214828853372e-06, 'epoch': 0.8} 80%|████████ | 4624/5773 [3:21:35<1:43:52, 5.42s/it] {'loss': 0.54, 'learning_rate': 2.007214828853372e-06, 'epoch': 0.8} 80%|████████ | 4624/5773 [3:21:33<1:43:52, 5.42s/it] 80%|████████ | 4625/5773 [3:21:41<1:43:34, 5.41s/it] 80%|████████ | 4625/5773 [3:21:39<1:43:34, 5.41s/it] {'loss': 0.5508, 'learning_rate': 2.0038441030230594e-06, 'epoch': 0.8} 80%|████████ | 4625/5773 [3:21:41<1:43:34, 5.41s/it] {'loss': 0.5508, 'learning_rate': 2.0038441030230594e-06, 'epoch': 0.8} 80%|████████ | 4625/5773 [3:21:39<1:43:34, 5.41s/it] 80%|████████ | 4626/5773 [3:21:46<1:43:19, 5.40s/it] 80%|████████ | 4626/5773 [3:21:44<1:43:19, 5.40s/it] {'loss': 0.5446, 'learning_rate': 2.000475894638101e-06, 'epoch': 0.8} 80%|████████ | 4626/5773 [3:21:46<1:43:19, 5.40s/it] {'loss': 0.5446, 'learning_rate': 2.000475894638101e-06, 'epoch': 0.8} 80%|████████ | 4626/5773 [3:21:44<1:43:19, 5.40s/it] 80%|████████ | 4627/5773 [3:21:51<1:43:01, 5.39s/it] 80%|████████ | 4627/5773 [3:21:49<1:43:01, 5.39s/it] {'loss': 0.5706, 'learning_rate': 1.9971102047589132e-06, 'epoch': 0.8} 80%|████████ | 4627/5773 [3:21:51<1:43:01, 5.39s/it] {'loss': 0.5706, 'learning_rate': 1.9971102047589132e-06, 'epoch': 0.8} 80%|████████ | 4627/5773 [3:21:49<1:43:01, 5.39s/it] 80%|████████ | 4628/5773 [3:21:57<1:43:41, 5.43s/it] 80%|████████ | 4628/5773 [3:21:55<1:43:41, 5.43s/it] {'loss': 0.5513, 'learning_rate': 1.9937470344451215e-06, 'epoch': 0.8} 80%|████████ | 4628/5773 [3:21:57<1:43:41, 5.43s/it] {'loss': 0.5513, 'learning_rate': 1.9937470344451215e-06, 'epoch': 0.8} 80%|████████ | 4628/5773 [3:21:55<1:43:41, 5.43s/it] 80%|████████ 
| 4629/5773 [3:22:02<1:43:40, 5.44s/it] 80%|████████ | 4629/5773 [3:22:00<1:43:40, 5.44s/it] {'loss': 0.5591, 'learning_rate': 1.990386384755565e-06, 'epoch': 0.8} 80%|████████ | 4629/5773 [3:22:02<1:43:40, 5.44s/it] {'loss': 0.5591, 'learning_rate': 1.990386384755565e-06, 'epoch': 0.8} 80%|████████ | 4629/5773 [3:22:00<1:43:40, 5.44s/it] 80%|████████ | 4630/5773 [3:22:08<1:44:07, 5.47s/it] 80%|████████ | 4630/5773 [3:22:06<1:44:07, 5.47s/it] {'loss': 0.5469, 'learning_rate': 1.9870282567482814e-06, 'epoch': 0.8} 80%|████████ | 4630/5773 [3:22:08<1:44:07, 5.47s/it] {'loss': 0.5469, 'learning_rate': 1.9870282567482814e-06, 'epoch': 0.8} 80%|████████ | 4630/5773 [3:22:06<1:44:07, 5.47s/it] 80%|████████ | 4631/5773 [3:22:13<1:43:36, 5.44s/it] 80%|████████ | 4631/5773 [3:22:11<1:43:36, 5.44s/it] {'loss': 0.5624, 'learning_rate': 1.9836726514805105e-06, 'epoch': 0.8} 80%|████████ | 4631/5773 [3:22:13<1:43:36, 5.44s/it] {'loss': 0.5624, 'learning_rate': 1.9836726514805105e-06, 'epoch': 0.8} 80%|████████ | 4631/5773 [3:22:11<1:43:36, 5.44s/it] 80%|████████ | 4632/5773 [3:22:19<1:43:15, 5.43s/it] 80%|████████ | 4632/5773 [3:22:17<1:43:15, 5.43s/it] {'loss': 0.5501, 'learning_rate': 1.9803195700087117e-06, 'epoch': 0.8} 80%|████████ | 4632/5773 [3:22:19<1:43:15, 5.43s/it] {'loss': 0.5501, 'learning_rate': 1.9803195700087117e-06, 'epoch': 0.8} 80%|████████ | 4632/5773 [3:22:17<1:43:15, 5.43s/it] 80%|████████ | 4633/5773 [3:22:24<1:43:22, 5.44s/it] 80%|████████ | 4633/5773 [3:22:22<1:43:22, 5.44s/it] {'loss': 0.5524, 'learning_rate': 1.9769690133885387e-06, 'epoch': 0.8} 80%|████████ | 4633/5773 [3:22:24<1:43:22, 5.44s/it] {'loss': 0.5524, 'learning_rate': 1.9769690133885387e-06, 'epoch': 0.8} 80%|████████ | 4633/5773 [3:22:22<1:43:22, 5.44s/it] 80%|████████ | 4634/5773 [3:22:30<1:44:33, 5.51s/it] 80%|████████ | 4634/5773 [3:22:28<1:44:33, 5.51s/it] {'loss': 0.5523, 'learning_rate': 1.973620982674852e-06, 'epoch': 0.8} 80%|████████ | 4634/5773 [3:22:30<1:44:33, 5.51s/it] 
{'loss': 0.5523, 'learning_rate': 1.973620982674852e-06, 'epoch': 0.8} 80%|████████ | 4634/5773 [3:22:28<1:44:33, 5.51s/it] 80%|████████ | 4635/5773 [3:22:35<1:43:57, 5.48s/it] 80%|████████ | 4635/5773 [3:22:33<1:43:57, 5.48s/it] {'loss': 0.5556, 'learning_rate': 1.9702754789217237e-06, 'epoch': 0.8} 80%|████████ | 4635/5773 [3:22:35<1:43:57, 5.48s/it] {'loss': 0.5556, 'learning_rate': 1.9702754789217237e-06, 'epoch': 0.8} 80%|████████ | 4635/5773 [3:22:33<1:43:57, 5.48s/it] 80%|████████ | 4636/5773 [3:22:41<1:44:08, 5.50s/it] 80%|████████ | 4636/5773 [3:22:39<1:44:08, 5.50s/it] {'loss': 0.5545, 'learning_rate': 1.9669325031824225e-06, 'epoch': 0.8} 80%|████████ | 4636/5773 [3:22:41<1:44:08, 5.50s/it] {'loss': 0.5545, 'learning_rate': 1.9669325031824225e-06, 'epoch': 0.8} 80%|████████ | 4636/5773 [3:22:39<1:44:08, 5.50s/it] 80%|████████ | 4637/5773 [3:22:46<1:44:24, 5.51s/it] 80%|████████ | 4637/5773 [3:22:44<1:44:24, 5.51s/it] {'loss': 0.5614, 'learning_rate': 1.963592056509425e-06, 'epoch': 0.8} 80%|████████ | 4637/5773 [3:22:46<1:44:24, 5.51s/it] {'loss': 0.5614, 'learning_rate': 1.963592056509425e-06, 'epoch': 0.8} 80%|████████ | 4637/5773 [3:22:44<1:44:24, 5.51s/it] 80%|████████ | 4638/5773 [3:22:52<1:44:30, 5.52s/it] 80%|████████ | 4638/5773 [3:22:50<1:44:30, 5.52s/it] {'loss': 0.5548, 'learning_rate': 1.9602541399544053e-06, 'epoch': 0.8} 80%|████████ | 4638/5773 [3:22:52<1:44:30, 5.52s/it] {'loss': 0.5548, 'learning_rate': 1.9602541399544053e-06, 'epoch': 0.8} 80%|████████ | 4638/5773 [3:22:50<1:44:30, 5.52s/it] 80%|████████ | 4639/5773 [3:22:57<1:44:56, 5.55s/it] 80%|████████ | 4639/5773 [3:22:55<1:44:56, 5.55s/it] {'loss': 0.567, 'learning_rate': 1.956918754568253e-06, 'epoch': 0.8} 80%|████████ | 4639/5773 [3:22:57<1:44:56, 5.55s/it] {'loss': 0.567, 'learning_rate': 1.956918754568253e-06, 'epoch': 0.8} 80%|████████ | 4639/5773 [3:22:55<1:44:56, 5.55s/it] 80%|████████ | 4640/5773 [3:23:03<1:44:33, 5.54s/it] 80%|████████ | 4640/5773 [3:23:01<1:44:33, 
5.54s/it] {'loss': 0.561, 'learning_rate': 1.9535859014010525e-06, 'epoch': 0.8} 80%|████████ | 4640/5773 [3:23:03<1:44:33, 5.54s/it] {'loss': 0.561, 'learning_rate': 1.9535859014010525e-06, 'epoch': 0.8} 80%|████████ | 4640/5773 [3:23:01<1:44:33, 5.54s/it] 80%|████████ | 4641/5773 [3:23:08<1:43:36, 5.49s/it] 80%|████████ | 4641/5773 [3:23:06<1:43:36, 5.49s/it] {'loss': 0.5628, 'learning_rate': 1.9502555815020897e-06, 'epoch': 0.8} 80%|████████ | 4641/5773 [3:23:08<1:43:36, 5.49s/it] {'loss': 0.5628, 'learning_rate': 1.9502555815020897e-06, 'epoch': 0.8} 80%|████████ | 4641/5773 [3:23:06<1:43:36, 5.49s/it] 80%|████████ | 4642/5773 [3:23:14<1:42:49, 5.45s/it] 80%|████████ | 4642/5773 [3:23:12<1:42:49, 5.45s/it] {'loss': 0.5443, 'learning_rate': 1.9469277959198584e-06, 'epoch': 0.8} 80%|████████ | 4642/5773 [3:23:14<1:42:49, 5.45s/it] {'loss': 0.5443, 'learning_rate': 1.9469277959198584e-06, 'epoch': 0.8} 80%|████████ | 4642/5773 [3:23:12<1:42:49, 5.45s/it] 80%|████████ | 4643/5773 [3:23:19<1:42:40, 5.45s/it] 80%|████████ | 4643/5773 [3:23:17<1:42:40, 5.45s/it] {'loss': 0.5533, 'learning_rate': 1.943602545702051e-06, 'epoch': 0.8} 80%|████████ | 4643/5773 [3:23:19<1:42:40, 5.45s/it] {'loss': 0.5533, 'learning_rate': 1.943602545702051e-06, 'epoch': 0.8} 80%|████████ | 4643/5773 [3:23:17<1:42:40, 5.45s/it] 80%|████████ | 4644/5773 [3:23:25<1:43:19, 5.49s/it] 80%|████████ | 4644/5773 [3:23:23<1:43:19, 5.49s/it] {'loss': 0.5671, 'learning_rate': 1.940279831895563e-06, 'epoch': 0.8} 80%|████████ | 4644/5773 [3:23:25<1:43:19, 5.49s/it] {'loss': 0.5671, 'learning_rate': 1.940279831895563e-06, 'epoch': 0.8} 80%|████████ | 4644/5773 [3:23:23<1:43:19, 5.49s/it] 80%|████████ | 4645/5773 [3:23:30<1:43:29, 5.50s/it] 80%|████████ | 4645/5773 [3:23:28<1:43:29, 5.50s/it] {'loss': 0.5544, 'learning_rate': 1.9369596555464897e-06, 'epoch': 0.8} 80%|████████ | 4645/5773 [3:23:30<1:43:29, 5.50s/it] {'loss': 0.5544, 'learning_rate': 1.9369596555464897e-06, 'epoch': 0.8} 80%|████████ | 
4645/5773 [3:23:28<1:43:29, 5.50s/it] 80%|████████ | 4646/5773 [3:23:36<1:42:45, 5.47s/it] 80%|████████ | 4646/5773 [3:23:34<1:42:45, 5.47s/it] {'loss': 0.5607, 'learning_rate': 1.93364201770013e-06, 'epoch': 0.8} 80%|████████ | 4646/5773 [3:23:36<1:42:45, 5.47s/it] {'loss': 0.5607, 'learning_rate': 1.93364201770013e-06, 'epoch': 0.8} 80%|████████ | 4646/5773 [3:23:34<1:42:45, 5.47s/it] 80%|████████ | 4647/5773 [3:23:41<1:43:03, 5.49s/it] 80%|████████ | 4647/5773 [3:23:39<1:43:03, 5.49s/it] {'loss': 0.5514, 'learning_rate': 1.930326919400979e-06, 'epoch': 0.8} 80%|████████ | 4647/5773 [3:23:41<1:43:03, 5.49s/it] {'loss': 0.5514, 'learning_rate': 1.930326919400979e-06, 'epoch': 0.8} 80%|████████ | 4647/5773 [3:23:39<1:43:03, 5.49s/it] 81%|████████ | 4648/5773 [3:23:47<1:43:22, 5.51s/it] 81%|████████ | 4648/5773 [3:23:45<1:43:22, 5.51s/it] {'loss': 0.5637, 'learning_rate': 1.9270143616927394e-06, 'epoch': 0.81} 81%|████████ | 4648/5773 [3:23:47<1:43:22, 5.51s/it] {'loss': 0.5637, 'learning_rate': 1.9270143616927394e-06, 'epoch': 0.81} 81%|████████ | 4648/5773 [3:23:45<1:43:22, 5.51s/it] 81%|████████ | 4649/5773 [3:23:52<1:43:00, 5.50s/it] 81%|████████ | 4649/5773 [3:23:50<1:43:00, 5.50s/it] {'loss': 0.56, 'learning_rate': 1.923704345618309e-06, 'epoch': 0.81} 81%|████████ | 4649/5773 [3:23:52<1:43:00, 5.50s/it] {'loss': 0.56, 'learning_rate': 1.923704345618309e-06, 'epoch': 0.81} 81%|████████ | 4649/5773 [3:23:50<1:43:00, 5.50s/it]5 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 109 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 
4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 81%|████████ | 4650/5773 [3:23:58<1:43:03, 5.51s/it] 81%|████████ | 4650/5773 [3:23:56<1:43:03, 5.51s/it]2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5737, 'learning_rate': 1.9203968722197876e-06, 'epoch': 0.81} 81%|████████ | 4650/5773 [3:23:58<1:43:03, 5.51s/it] {'loss': 0.5737, 'learning_rate': 1.9203968722197876e-06, 'epoch': 0.81} 81%|████████ | 4650/5773 [3:23:56<1:43:03, 5.51s/it] 81%|████████ | 4651/5773 [3:24:03<1:42:16, 5.47s/it] 81%|████████ | 4651/5773 [3:24:01<1:42:16, 5.47s/it] {'loss': 0.5571, 'learning_rate': 1.9170919425384695e-06, 'epoch': 0.81} 81%|████████ | 4651/5773 [3:24:03<1:42:16, 5.47s/it] {'loss': 0.5571, 'learning_rate': 1.9170919425384695e-06, 'epoch': 0.81} 81%|████████ | 4651/5773 [3:24:01<1:42:16, 5.47s/it] 81%|████████ | 4652/5773 [3:24:09<1:42:51, 5.51s/it] 81%|████████ | 4652/5773 [3:24:07<1:42:51, 5.51s/it] {'loss': 0.5615, 'learning_rate': 1.913789557614857e-06, 'epoch': 0.81} 81%|████████ | 4652/5773 [3:24:09<1:42:51, 5.51s/it] {'loss': 0.5615, 'learning_rate': 1.913789557614857e-06, 'epoch': 0.81} 81%|████████ | 4652/5773 [3:24:07<1:42:51, 5.51s/it] 81%|████████ | 4653/5773 [3:24:14<1:41:49, 5.45s/it] 81%|████████ | 4653/5773 [3:24:12<1:41:49, 5.45s/it] {'loss': 0.5574, 'learning_rate': 1.910489718488645e-06, 'epoch': 0.81} 81%|████████ | 4653/5773 [3:24:14<1:41:49, 5.45s/it] {'loss': 0.5574, 'learning_rate': 1.910489718488645e-06, 'epoch': 0.81} 81%|████████ | 4653/5773 [3:24:12<1:41:49, 5.45s/it] 81%|████████ | 4654/5773 [3:24:20<1:41:46, 5.46s/it] 81%|████████ | 4654/5773 [3:24:18<1:41:46, 5.46s/it] {'loss': 0.5648, 'learning_rate': 1.907192426198724e-06, 'epoch': 0.81} 81%|████████ | 4654/5773 [3:24:20<1:41:46, 5.46s/it] {'loss': 0.5648, 'learning_rate': 1.907192426198724e-06, 'epoch': 0.81} 81%|████████ | 4654/5773 
[3:24:18<1:41:46, 5.46s/it] 81%|████████ | 4655/5773 [3:24:25<1:41:40, 5.46s/it] 81%|████████ | 4655/5773 [3:24:23<1:41:40, 5.46s/it] {'loss': 0.5498, 'learning_rate': 1.9038976817831945e-06, 'epoch': 0.81} 81%|████████ | 4655/5773 [3:24:25<1:41:40, 5.46s/it] {'loss': 0.5498, 'learning_rate': 1.9038976817831945e-06, 'epoch': 0.81} 81%|████████ | 4655/5773 [3:24:23<1:41:40, 5.46s/it] 81%|████████ | 4656/5773 [3:24:31<1:41:56, 5.48s/it] 81%|████████ | 4656/5773 [3:24:29<1:41:56, 5.48s/it] {'loss': 0.5744, 'learning_rate': 1.9006054862793433e-06, 'epoch': 0.81} 81%|████████ | 4656/5773 [3:24:31<1:41:56, 5.48s/it] {'loss': 0.5744, 'learning_rate': 1.9006054862793433e-06, 'epoch': 0.81} 81%|████████ | 4656/5773 [3:24:29<1:41:56, 5.48s/it] 81%|████████ | 4657/5773 [3:24:36<1:42:00, 5.48s/it] 81%|████████ | 4657/5773 [3:24:34<1:42:00, 5.48s/it] {'loss': 0.5634, 'learning_rate': 1.897315840723658e-06, 'epoch': 0.81} 81%|████████ | 4657/5773 [3:24:36<1:42:00, 5.48s/it] {'loss': 0.5634, 'learning_rate': 1.897315840723658e-06, 'epoch': 0.81} 81%|████████ | 4657/5773 [3:24:34<1:42:00, 5.48s/it] 81%|████████ | 4658/5773 [3:24:42<1:42:16, 5.50s/it] 81%|████████ | 4658/5773 [3:24:40<1:42:16, 5.50s/it] {'loss': 0.5435, 'learning_rate': 1.8940287461518259e-06, 'epoch': 0.81} 81%|████████ | 4658/5773 [3:24:42<1:42:16, 5.50s/it] {'loss': 0.5435, 'learning_rate': 1.8940287461518259e-06, 'epoch': 0.81} 81%|████████ | 4658/5773 [3:24:40<1:42:16, 5.50s/it] 81%|████████ | 4659/5773 [3:24:47<1:42:21, 5.51s/it] 81%|████████ | 4659/5773 [3:24:45<1:42:21, 5.51s/it] {'loss': 0.5471, 'learning_rate': 1.890744203598729e-06, 'epoch': 0.81} 81%|████████ | 4659/5773 [3:24:47<1:42:21, 5.51s/it] {'loss': 0.5471, 'learning_rate': 1.890744203598729e-06, 'epoch': 0.81} 81%|████████ | 4659/5773 [3:24:45<1:42:21, 5.51s/it] 81%|████████ | 4660/5773 [3:24:53<1:43:16, 5.57s/it] 81%|████████ | 4660/5773 [3:24:51<1:43:16, 5.57s/it] {'loss': 0.552, 'learning_rate': 1.8874622140984422e-06, 'epoch': 0.81} 
81%|████████ | 4660/5773 [3:24:53<1:43:16, 5.57s/it] {'loss': 0.552, 'learning_rate': 1.8874622140984422e-06, 'epoch': 0.81} 81%|████████ | 4660/5773 [3:24:51<1:43:16, 5.57s/it] 81%|████████ | 4661/5773 [3:24:58<1:42:31, 5.53s/it] 81%|████████ | 4661/5773 [3:24:56<1:42:31, 5.53s/it] {'loss': 0.5543, 'learning_rate': 1.884182778684247e-06, 'epoch': 0.81} 81%|████████ | 4661/5773 [3:24:58<1:42:31, 5.53s/it] {'loss': 0.5543, 'learning_rate': 1.884182778684247e-06, 'epoch': 0.81} 81%|████████ | 4661/5773 [3:24:56<1:42:31, 5.53s/it] 81%|████████ | 4662/5773 [3:25:04<1:42:44, 5.55s/it] 81%|████████ | 4662/5773 [3:25:02<1:42:44, 5.55s/it] {'loss': 0.5617, 'learning_rate': 1.880905898388612e-06, 'epoch': 0.81} 81%|████████ | 4662/5773 [3:25:04<1:42:44, 5.55s/it] {'loss': 0.5617, 'learning_rate': 1.880905898388612e-06, 'epoch': 0.81} 81%|████████ | 4662/5773 [3:25:02<1:42:44, 5.55s/it] 81%|████████ | 4663/5773 [3:25:09<1:43:02, 5.57s/it] 81%|████████ | 4663/5773 [3:25:07<1:43:02, 5.57s/it] {'loss': 0.543, 'learning_rate': 1.8776315742432026e-06, 'epoch': 0.81} 81%|████████ | 4663/5773 [3:25:09<1:43:02, 5.57s/it] {'loss': 0.543, 'learning_rate': 1.8776315742432026e-06, 'epoch': 0.81} 81%|████████ | 4663/5773 [3:25:07<1:43:02, 5.57s/it] 81%|████████ | 4664/5773 [3:25:15<1:42:55, 5.57s/it] 81%|████████ | 4664/5773 [3:25:13<1:42:55, 5.57s/it] {'loss': 0.5716, 'learning_rate': 1.8743598072788805e-06, 'epoch': 0.81} 81%|████████ | 4664/5773 [3:25:15<1:42:55, 5.57s/it] {'loss': 0.5716, 'learning_rate': 1.8743598072788805e-06, 'epoch': 0.81} 81%|████████ | 4664/5773 [3:25:13<1:42:55, 5.57s/it] 81%|████████ | 4665/5773 [3:25:19<1:42:52, 5.57s/it] 81%|████████ | 4665/5773 [3:25:21<1:42:52, 5.57s/it] {'loss': 0.5418, 'learning_rate': 1.8710905985257044e-06, 'epoch': 0.81} 81%|████████ | 4665/5773 [3:25:19<1:42:52, 5.57s/it]{'loss': 0.5418, 'learning_rate': 1.8710905985257044e-06, 'epoch': 0.81} 81%|████████ | 4665/5773 [3:25:21<1:42:52, 5.57s/it] 81%|████████ | 4666/5773 
[3:25:26<1:43:57, 5.63s/it] 81%|████████ | 4666/5773 [3:25:24<1:43:57, 5.63s/it] {'loss': 0.5664, 'learning_rate': 1.8678239490129258e-06, 'epoch': 0.81} 81%|████████ | 4666/5773 [3:25:26<1:43:57, 5.63s/it] {'loss': 0.5664, 'learning_rate': 1.8678239490129258e-06, 'epoch': 0.81} 81%|████████ | 4666/5773 [3:25:24<1:43:57, 5.63s/it] 81%|████████ | 4667/5773 [3:25:32<1:42:24, 5.56s/it] 81%|████████ | 4667/5773 [3:25:30<1:42:24, 5.56s/it] {'loss': 0.5694, 'learning_rate': 1.8645598597689861e-06, 'epoch': 0.81} 81%|████████ | 4667/5773 [3:25:32<1:42:24, 5.56s/it] {'loss': 0.5694, 'learning_rate': 1.8645598597689861e-06, 'epoch': 0.81} 81%|████████ | 4667/5773 [3:25:30<1:42:24, 5.56s/it] 81%|████████ | 4668/5773 [3:25:37<1:41:49, 5.53s/it] 81%|████████ | 4668/5773 [3:25:35<1:41:49, 5.53s/it] {'loss': 0.5808, 'learning_rate': 1.8612983318215316e-06, 'epoch': 0.81} 81%|████████ | 4668/5773 [3:25:37<1:41:49, 5.53s/it] {'loss': 0.5808, 'learning_rate': 1.8612983318215316e-06, 'epoch': 0.81} 81%|████████ | 4668/5773 [3:25:35<1:41:49, 5.53s/it] 81%|████████ | 4669/5773 [3:25:43<1:41:23, 5.51s/it] 81%|████████ | 4669/5773 [3:25:41<1:41:23, 5.51s/it] {'loss': 0.5502, 'learning_rate': 1.8580393661973916e-06, 'epoch': 0.81} 81%|████████ | 4669/5773 [3:25:43<1:41:23, 5.51s/it] {'loss': 0.5502, 'learning_rate': 1.8580393661973916e-06, 'epoch': 0.81} 81%|████████ | 4669/5773 [3:25:41<1:41:23, 5.51s/it] 81%|████████ | 4670/5773 [3:25:48<1:40:35, 5.47s/it] 81%|████████ | 4670/5773 [3:25:46<1:40:35, 5.47s/it] {'loss': 0.5485, 'learning_rate': 1.8547829639225922e-06, 'epoch': 0.81} 81%|████████ | 4670/5773 [3:25:48<1:40:35, 5.47s/it] {'loss': 0.5485, 'learning_rate': 1.8547829639225922e-06, 'epoch': 0.81} 81%|████████ | 4670/5773 [3:25:46<1:40:35, 5.47s/it] 81%|████████ | 4671/5773 [3:25:53<1:40:14, 5.46s/it] 81%|████████ | 4671/5773 [3:25:52<1:40:14, 5.46s/it] {'loss': 0.5623, 'learning_rate': 1.8515291260223523e-06, 'epoch': 0.81} 81%|████████ | 4671/5773 [3:25:53<1:40:14, 5.46s/it] 
{'loss': 0.5623, 'learning_rate': 1.8515291260223523e-06, 'epoch': 0.81} 81%|████████ | 4671/5773 [3:25:52<1:40:14, 5.46s/it] 81%|████████ | 4672/5773 [3:25:59<1:41:47, 5.55s/it] 81%|████████ | 4672/5773 [3:25:57<1:41:47, 5.55s/it] {'loss': 0.5499, 'learning_rate': 1.84827785352109e-06, 'epoch': 0.81} 81%|████████ | 4672/5773 [3:25:59<1:41:47, 5.55s/it] {'loss': 0.5499, 'learning_rate': 1.84827785352109e-06, 'epoch': 0.81} 81%|████████ | 4672/5773 [3:25:57<1:41:47, 5.55s/it] 81%|████████ | 4673/5773 [3:26:03<1:41:28, 5.53s/it] 81%|████████ | 4673/5773 [3:26:05<1:41:28, 5.53s/it] {'loss': 0.5575, 'learning_rate': 1.8450291474423999e-06, 'epoch': 0.81} 81%|████████ | 4673/5773 [3:26:05<1:41:28, 5.53s/it] {'loss': 0.5575, 'learning_rate': 1.8450291474423999e-06, 'epoch': 0.81} 81%|████████ | 4673/5773 [3:26:03<1:41:28, 5.53s/it] 81%|████████ | 4674/5773 [3:26:10<1:41:19, 5.53s/it] 81%|████████ | 4674/5773 [3:26:08<1:41:19, 5.53s/it] {'loss': 0.5425, 'learning_rate': 1.841783008809086e-06, 'epoch': 0.81} 81%|████████ | 4674/5773 [3:26:10<1:41:19, 5.53s/it] {'loss': 0.5425, 'learning_rate': 1.841783008809086e-06, 'epoch': 0.81} 81%|████████ | 4674/5773 [3:26:08<1:41:19, 5.53s/it] 81%|████████ | 4675/5773 [3:26:16<1:41:07, 5.53s/it] 81%|████████ | 4675/5773 [3:26:14<1:41:07, 5.53s/it] {'loss': 0.5512, 'learning_rate': 1.8385394386431343e-06, 'epoch': 0.81} 81%|████████ | 4675/5773 [3:26:16<1:41:07, 5.53s/it] {'loss': 0.5512, 'learning_rate': 1.8385394386431343e-06, 'epoch': 0.81} 81%|████████ | 4675/5773 [3:26:14<1:41:07, 5.53s/it] 81%|████████ | 4676/5773 [3:26:19<1:41:37, 5.56s/it] 81%|████████ | 4676/5773 [3:26:21<1:41:37, 5.56s/it] {'loss': 0.5708, 'learning_rate': 1.8352984379657247e-06, 'epoch': 0.81} 81%|████████ | 4676/5773 [3:26:21<1:41:37, 5.56s/it] {'loss': 0.5708, 'learning_rate': 1.8352984379657247e-06, 'epoch': 0.81} 81%|████████ | 4676/5773 [3:26:19<1:41:37, 5.56s/it] 81%|████████ | 4677/5773 [3:26:27<1:41:36, 5.56s/it] 81%|████████ | 4677/5773 
[3:26:25<1:41:36, 5.56s/it] {'loss': 0.5389, 'learning_rate': 1.832060007797225e-06, 'epoch': 0.81} 81%|████████ | 4677/5773 [3:26:27<1:41:36, 5.56s/it] {'loss': 0.5389, 'learning_rate': 1.832060007797225e-06, 'epoch': 0.81} 81%|████████ | 4677/5773 [3:26:25<1:41:36, 5.56s/it] 81%|████████ | 4678/5773 [3:26:31<1:42:39, 5.63s/it] 81%|████████ | 4678/5773 [3:26:33<1:42:39, 5.63s/it] {'loss': 0.556, 'learning_rate': 1.8288241491572e-06, 'epoch': 0.81} 81%|████████ | 4678/5773 [3:26:33<1:42:39, 5.63s/it] {'loss': 0.556, 'learning_rate': 1.8288241491572e-06, 'epoch': 0.81} 81%|████████ | 4678/5773 [3:26:31<1:42:39, 5.63s/it] 81%|████████ | 4679/5773 [3:26:38<1:41:21, 5.56s/it] 81%|████████ | 4679/5773 [3:26:36<1:41:21, 5.56s/it] {'loss': 0.5471, 'learning_rate': 1.8255908630644015e-06, 'epoch': 0.81} 81%|████████ | 4679/5773 [3:26:38<1:41:21, 5.56s/it] {'loss': 0.5471, 'learning_rate': 1.8255908630644015e-06, 'epoch': 0.81} 81%|████████ | 4679/5773 [3:26:36<1:41:21, 5.56s/it] 81%|████████ | 4680/5773 [3:26:44<1:41:27, 5.57s/it] 81%|████████ | 4680/5773 [3:26:42<1:41:27, 5.57s/it] {'loss': 0.5569, 'learning_rate': 1.822360150536766e-06, 'epoch': 0.81} 81%|████████ | 4680/5773 [3:26:44<1:41:27, 5.57s/it] {'loss': 0.5569, 'learning_rate': 1.822360150536766e-06, 'epoch': 0.81} 81%|████████ | 4680/5773 [3:26:42<1:41:27, 5.57s/it] 81%|████████ | 4681/5773 [3:26:49<1:40:47, 5.54s/it] 81%|████████ | 4681/5773 [3:26:47<1:40:48, 5.54s/it] {'loss': 0.5528, 'learning_rate': 1.8191320125914324e-06, 'epoch': 0.81} 81%|████████ | 4681/5773 [3:26:49<1:40:47, 5.54s/it] {'loss': 0.5528, 'learning_rate': 1.8191320125914324e-06, 'epoch': 0.81} 81%|████████ | 4681/5773 [3:26:47<1:40:48, 5.54s/it] 81%|████████ | 4682/5773 [3:26:53<1:40:55, 5.55s/it] 81%|████████ | 4682/5773 [3:26:55<1:40:55, 5.55s/it] {'loss': 0.5635, 'learning_rate': 1.8159064502447177e-06, 'epoch': 0.81} 81%|████████ | 4682/5773 [3:26:55<1:40:55, 5.55s/it] {'loss': 0.5635, 'learning_rate': 1.8159064502447177e-06, 'epoch': 
0.81} 81%|████████ | 4682/5773 [3:26:53<1:40:55, 5.55s/it] 81%|████████ | 4683/5773 [3:27:00<1:40:43, 5.54s/it] 81%|████████ | 4683/5773 [3:26:58<1:40:44, 5.54s/it] {'loss': 0.553, 'learning_rate': 1.8126834645121305e-06, 'epoch': 0.81} 81%|████████ | 4683/5773 [3:27:00<1:40:43, 5.54s/it] {'loss': 0.553, 'learning_rate': 1.8126834645121305e-06, 'epoch': 0.81} 81%|████████ | 4683/5773 [3:26:58<1:40:44, 5.54s/it] 81%|████████ | 4684/5773 [3:27:06<1:41:00, 5.56s/it] 81%|████████ | 4684/5773 [3:27:04<1:40:59, 5.56s/it] {'loss': 0.5511, 'learning_rate': 1.8094630564083737e-06, 'epoch': 0.81} 81%|████████ | 4684/5773 [3:27:06<1:41:00, 5.56s/it] {'loss': 0.5511, 'learning_rate': 1.8094630564083737e-06, 'epoch': 0.81} 81%|████████ | 4684/5773 [3:27:04<1:40:59, 5.56s/it] 81%|████████ | 4685/5773 [3:27:09<1:40:00, 5.52s/it] 81%|████████ | 4685/5773 [3:27:11<1:40:00, 5.52s/it] {'loss': 0.5644, 'learning_rate': 1.806245226947333e-06, 'epoch': 0.81} 81%|████████ | 4685/5773 [3:27:11<1:40:00, 5.52s/it] {'loss': 0.5644, 'learning_rate': 1.806245226947333e-06, 'epoch': 0.81} 81%|████████ | 4685/5773 [3:27:09<1:40:00, 5.52s/it] 81%|████████ | 4686/5773 [3:27:17<1:40:51, 5.57s/it] 81%|████████ | 4686/5773 [3:27:15<1:40:52, 5.57s/it] {'loss': 0.5652, 'learning_rate': 1.8030299771420834e-06, 'epoch': 0.81} 81%|████████ | 4686/5773 [3:27:17<1:40:51, 5.57s/it] {'loss': 0.5652, 'learning_rate': 1.8030299771420834e-06, 'epoch': 0.81} 81%|████████ | 4686/5773 [3:27:15<1:40:52, 5.57s/it] 81%|████████ | 4687/5773 [3:27:23<1:41:14, 5.59s/it] 81%|████████ | 4687/5773 [3:27:21<1:41:14, 5.59s/it] {'loss': 0.5547, 'learning_rate': 1.7998173080048876e-06, 'epoch': 0.81} 81%|████████ | 4687/5773 [3:27:23<1:41:14, 5.59s/it] {'loss': 0.5547, 'learning_rate': 1.7998173080048876e-06, 'epoch': 0.81} 81%|████████ | 4687/5773 [3:27:21<1:41:14, 5.59s/it] 81%|████████ | 4688/5773 [3:27:28<1:40:09, 5.54s/it] 81%|████████ | 4688/5773 [3:27:26<1:40:09, 5.54s/it] {'loss': 0.5517, 'learning_rate': 
1.7966072205471985e-06, 'epoch': 0.81} 81%|████████ | 4688/5773 [3:27:28<1:40:09, 5.54s/it] {'loss': 0.5517, 'learning_rate': 1.7966072205471985e-06, 'epoch': 0.81} 81%|████████ | 4688/5773 [3:27:26<1:40:09, 5.54s/it] 81%|████████ | 4689/5773 [3:27:34<1:40:16, 5.55s/it] 81%|████████ | 4689/5773 [3:27:32<1:40:16, 5.55s/it] {'loss': 0.5606, 'learning_rate': 1.7933997157796523e-06, 'epoch': 0.81} 81%|████████ | 4689/5773 [3:27:34<1:40:16, 5.55s/it] {'loss': 0.5606, 'learning_rate': 1.7933997157796523e-06, 'epoch': 0.81} 81%|████████ | 4689/5773 [3:27:32<1:40:16, 5.55s/it] 81%|████████ | 4690/5773 [3:27:39<1:39:58, 5.54s/it] 81%|████████ | 4690/5773 [3:27:37<1:39:58, 5.54s/it] {'loss': 0.5444, 'learning_rate': 1.7901947947120724e-06, 'epoch': 0.81} 81%|████████ | 4690/5773 [3:27:39<1:39:58, 5.54s/it] {'loss': 0.5444, 'learning_rate': 1.7901947947120724e-06, 'epoch': 0.81} 81%|████████ | 4690/5773 [3:27:37<1:39:58, 5.54s/it] 81%|████████▏ | 4691/5773 [3:27:45<1:39:28, 5.52s/it] 81%|████████▏ | 4691/5773 [3:27:43<1:39:28, 5.52s/it] {'loss': 0.5771, 'learning_rate': 1.7869924583534749e-06, 'epoch': 0.81} 81%|████████▏ | 4691/5773 [3:27:45<1:39:28, 5.52s/it] {'loss': 0.5771, 'learning_rate': 1.7869924583534749e-06, 'epoch': 0.81} 81%|████████▏ | 4691/5773 [3:27:43<1:39:28, 5.52s/it] 81%|████████▏ | 4692/5773 [3:27:50<1:39:10, 5.50s/it] 81%|████████▏ | 4692/5773 [3:27:48<1:39:10, 5.50s/it] {'loss': 0.5345, 'learning_rate': 1.7837927077120554e-06, 'epoch': 0.81} 81%|████████▏ | 4692/5773 [3:27:50<1:39:10, 5.50s/it] {'loss': 0.5345, 'learning_rate': 1.7837927077120554e-06, 'epoch': 0.81} 81%|████████▏ | 4692/5773 [3:27:48<1:39:10, 5.50s/it] 81%|████████▏ | 4693/5773 [3:27:56<1:39:38, 5.54s/it] 81%|████████▏ | 4693/5773 [3:27:54<1:39:38, 5.54s/it] {'loss': 0.5441, 'learning_rate': 1.780595543795195e-06, 'epoch': 0.81} 81%|████████▏ | 4693/5773 [3:27:56<1:39:38, 5.54s/it] {'loss': 0.5441, 'learning_rate': 1.780595543795195e-06, 'epoch': 0.81} 81%|████████▏ | 4693/5773 
[3:27:54<1:39:38, 5.54s/it] 81%|████████▏ | 4694/5773 [3:28:01<1:39:31, 5.53s/it] 81%|████████▏ | 4694/5773 [3:27:59<1:39:31, 5.53s/it] {'loss': 0.5641, 'learning_rate': 1.7774009676094683e-06, 'epoch': 0.81} 81%|████████▏ | 4694/5773 [3:28:01<1:39:31, 5.53s/it] {'loss': 0.5641, 'learning_rate': 1.7774009676094683e-06, 'epoch': 0.81} 81%|████████▏ | 4694/5773 [3:27:59<1:39:31, 5.53s/it] 81%|████████▏ | 4695/5773 [3:28:07<1:39:39, 5.55s/it] 81%|████████▏ | 4695/5773 [3:28:05<1:39:39, 5.55s/it] {'loss': 0.5705, 'learning_rate': 1.7742089801606278e-06, 'epoch': 0.81} 81%|████████▏ | 4695/5773 [3:28:07<1:39:39, 5.55s/it] {'loss': 0.5705, 'learning_rate': 1.7742089801606278e-06, 'epoch': 0.81} 81%|████████▏ | 4695/5773 [3:28:05<1:39:39, 5.55s/it] 81%|████████▏ | 4696/5773 [3:28:12<1:39:11, 5.53s/it] 81%|████████▏ | 4696/5773 [3:28:10<1:39:11, 5.53s/it] {'loss': 0.551, 'learning_rate': 1.7710195824536092e-06, 'epoch': 0.81} 81%|████████▏ | 4696/5773 [3:28:12<1:39:11, 5.53s/it] {'loss': 0.551, 'learning_rate': 1.7710195824536092e-06, 'epoch': 0.81} 81%|████████▏ | 4696/5773 [3:28:10<1:39:11, 5.53s/it] 81%|████████▏ | 4697/5773 [3:28:16<1:38:37, 5.50s/it] 81%|████████▏ | 4697/5773 [3:28:18<1:38:38, 5.50s/it] {'loss': 0.5659, 'learning_rate': 1.767832775492543e-06, 'epoch': 0.81} 81%|████████▏ | 4697/5773 [3:28:18<1:38:38, 5.50s/it] {'loss': 0.5659, 'learning_rate': 1.767832775492543e-06, 'epoch': 0.81} 81%|████████▏ | 4697/5773 [3:28:16<1:38:37, 5.50s/it] 81%|████████▏ | 4698/5773 [3:28:21<1:37:52, 5.46s/it] 81%|████████▏ | 4698/5773 [3:28:23<1:37:52, 5.46s/it] {'loss': 0.5693, 'learning_rate': 1.764648560280734e-06, 'epoch': 0.81} 81%|████████▏ | 4698/5773 [3:28:23<1:37:52, 5.46s/it] {'loss': 0.5693, 'learning_rate': 1.764648560280734e-06, 'epoch': 0.81} 81%|████████▏ | 4698/5773 [3:28:21<1:37:52, 5.46s/it] 81%|████████▏ | 4699/5773 [3:28:27<1:38:52, 5.52s/it] 81%|████████▏ | 4699/5773 [3:28:29<1:38:52, 5.52s/it] {'loss': 0.5458, 'learning_rate': 1.7614669378206773e-06, 
'epoch': 0.81} 81%|████████▏ | 4699/5773 [3:28:29<1:38:52, 5.52s/it] {'loss': 0.5458, 'learning_rate': 1.7614669378206773e-06, 'epoch': 0.81} 81%|████████▏ | 4699/5773 [3:28:27<1:38:52, 5.52s/it]1115 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 14 10AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 81%|████████▏ | 4700/5773 [3:28:34<1:39:18, 5.55s/it]AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 81%|████████▏ | 4700/5773 [3:28:32<1:39:18, 5.55s/it]6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5655, 'learning_rate': 1.758287909114047e-06, 'epoch': 0.81} 81%|████████▏ | 4700/5773 [3:28:34<1:39:18, 5.55s/it] {'loss': 0.5655, 'learning_rate': 1.758287909114047e-06, 'epoch': 0.81} 81%|████████▏ | 4700/5773 [3:28:32<1:39:18, 5.55s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 81%|████████▏ | 4701/5773 [3:28:57<3:10:32, 10.66s/it] 81%|████████▏ | 4701/5773 [3:28:55<3:10:32, 10.66s/it] {'loss': 0.538, 'learning_rate': 1.7551114751617049e-06, 'epoch': 0.81} 81%|████████▏ | 4701/5773 [3:28:57<3:10:32, 10.66s/it] {'loss': 0.538, 'learning_rate': 1.7551114751617049e-06, 'epoch': 0.81} 81%|████████▏ | 4701/5773 [3:28:55<3:10:32, 10.66s/it] 81%|████████▏ | 4702/5773 [3:29:02<2:42:04, 9.08s/it] 81%|████████▏ | 4702/5773 [3:29:00<2:42:04, 9.08s/it] {'loss': 0.546, 'learning_rate': 1.751937636963692e-06, 'epoch': 0.81} 81%|████████▏ | 4702/5773 [3:29:02<2:42:04, 9.08s/it] {'loss': 0.546, 'learning_rate': 1.751937636963692e-06, 'epoch': 0.81} 81%|████████▏ | 4702/5773 [3:29:00<2:42:04, 9.08s/it] 81%|████████▏ | 4703/5773 [3:29:08<2:22:32, 7.99s/it] 81%|████████▏ | 4703/5773 [3:29:06<2:22:32, 7.99s/it] {'loss': 0.5471, 'learning_rate': 1.7487663955192325e-06, 'epoch': 0.81} 81%|████████▏ | 4703/5773 [3:29:08<2:22:32, 7.99s/it] {'loss': 0.5471, 'learning_rate': 1.7487663955192325e-06, 'epoch': 0.81} 81%|████████▏ | 4703/5773 [3:29:06<2:22:32, 7.99s/it] 81%|████████▏ | 4704/5773 [3:29:13<2:09:41, 7.28s/it] 81%|████████▏ | 4704/5773 [3:29:12<2:09:41, 7.28s/it] {'loss': 0.5454, 'learning_rate': 1.7455977518267387e-06, 'epoch': 0.81} 81%|████████▏ | 4704/5773 [3:29:13<2:09:41, 7.28s/it] {'loss': 0.5454, 'learning_rate': 1.7455977518267387e-06, 'epoch': 0.81} 81%|████████▏ | 4704/5773 [3:29:12<2:09:41, 7.28s/it] 82%|████████▏ | 4705/5773 [3:29:19<2:00:04, 6.75s/it] 82%|████████▏ | 4705/5773 [3:29:17<2:00:04, 6.75s/it] {'loss': 0.5642, 'learning_rate': 1.7424317068837993e-06, 'epoch': 0.82} 82%|████████▏ | 4705/5773 [3:29:19<2:00:04, 6.75s/it] {'loss': 0.5642, 'learning_rate': 1.7424317068837993e-06, 'epoch': 0.82} 82%|████████▏ | 4705/5773 [3:29:17<2:00:04, 6.75s/it] 82%|████████▏ | 4706/5773 [3:29:25<1:53:46, 6.40s/it] 
82%|████████▏ | 4706/5773 [3:29:23<1:53:46, 6.40s/it] {'loss': 0.5551, 'learning_rate': 1.7392682616871836e-06, 'epoch': 0.82} 82%|████████▏ | 4706/5773 [3:29:25<1:53:46, 6.40s/it] {'loss': 0.5551, 'learning_rate': 1.7392682616871836e-06, 'epoch': 0.82} 82%|████████▏ | 4706/5773 [3:29:23<1:53:46, 6.40s/it] 82%|████████▏ | 4707/5773 [3:29:30<1:49:48, 6.18s/it] 82%|████████▏ | 4707/5773 [3:29:28<1:49:48, 6.18s/it] {'loss': 0.5526, 'learning_rate': 1.7361074172328507e-06, 'epoch': 0.82} 82%|████████▏ | 4707/5773 [3:29:30<1:49:48, 6.18s/it] {'loss': 0.5526, 'learning_rate': 1.7361074172328507e-06, 'epoch': 0.82} 82%|████████▏ | 4707/5773 [3:29:28<1:49:48, 6.18s/it] 82%|████████▏ | 4708/5773 [3:29:36<1:46:08, 5.98s/it] 82%|████████▏ | 4708/5773 [3:29:34<1:46:09, 5.98s/it] {'loss': 0.552, 'learning_rate': 1.732949174515932e-06, 'epoch': 0.82} 82%|████████▏ | 4708/5773 [3:29:36<1:46:08, 5.98s/it] {'loss': 0.552, 'learning_rate': 1.732949174515932e-06, 'epoch': 0.82} 82%|████████▏ | 4708/5773 [3:29:34<1:46:09, 5.98s/it] 82%|████████▏ | 4709/5773 [3:29:41<1:43:39, 5.85s/it] 82%|████████▏ | 4709/5773 [3:29:39<1:43:39, 5.85s/it] {'loss': 0.5581, 'learning_rate': 1.7297935345307415e-06, 'epoch': 0.82} 82%|████████▏ | 4709/5773 [3:29:41<1:43:39, 5.85s/it] {'loss': 0.5581, 'learning_rate': 1.7297935345307415e-06, 'epoch': 0.82} 82%|████████▏ | 4709/5773 [3:29:39<1:43:39, 5.85s/it] 82%|████████▏ | 4710/5773 [3:29:47<1:41:35, 5.73s/it] 82%|████████▏ | 4710/5773 [3:29:45<1:41:35, 5.73s/it] {'loss': 0.5522, 'learning_rate': 1.7266404982707796e-06, 'epoch': 0.82} 82%|████████▏ | 4710/5773 [3:29:47<1:41:35, 5.73s/it] {'loss': 0.5522, 'learning_rate': 1.7266404982707796e-06, 'epoch': 0.82} 82%|████████▏ | 4710/5773 [3:29:45<1:41:35, 5.73s/it] 82%|████████▏ | 4711/5773 [3:29:52<1:39:29, 5.62s/it] 82%|████████▏ | 4711/5773 [3:29:50<1:39:29, 5.62s/it] {'loss': 0.5607, 'learning_rate': 1.7234900667287214e-06, 'epoch': 0.82} 82%|████████▏ | 4711/5773 [3:29:52<1:39:29, 5.62s/it] {'loss': 
0.5607, 'learning_rate': 1.7234900667287214e-06, 'epoch': 0.82} 82%|████████▏ | 4711/5773 [3:29:50<1:39:29, 5.62s/it] 82%|████████▏ | 4712/5773 [3:29:57<1:38:03, 5.54s/it] 82%|████████▏ | 4712/5773 [3:29:56<1:38:03, 5.54s/it] {'loss': 0.5575, 'learning_rate': 1.7203422408964222e-06, 'epoch': 0.82} 82%|████████▏ | 4712/5773 [3:29:57<1:38:03, 5.54s/it] {'loss': 0.5575, 'learning_rate': 1.7203422408964222e-06, 'epoch': 0.82} 82%|████████▏ | 4712/5773 [3:29:56<1:38:03, 5.54s/it] 82%|████████▏ | 4713/5773 [3:30:03<1:37:26, 5.52s/it] 82%|████████▏ | 4713/5773 [3:30:01<1:37:26, 5.52s/it] {'loss': 0.5424, 'learning_rate': 1.717197021764917e-06, 'epoch': 0.82} 82%|████████▏ | 4713/5773 [3:30:03<1:37:26, 5.52s/it] {'loss': 0.5424, 'learning_rate': 1.717197021764917e-06, 'epoch': 0.82} 82%|████████▏ | 4713/5773 [3:30:01<1:37:26, 5.52s/it] 82%|████████▏ | 4714/5773 [3:30:09<1:38:06, 5.56s/it] 82%|████████▏ | 4714/5773 [3:30:07<1:38:06, 5.56s/it] {'loss': 0.5607, 'learning_rate': 1.7140544103244272e-06, 'epoch': 0.82} 82%|████████▏ | 4714/5773 [3:30:09<1:38:06, 5.56s/it] {'loss': 0.5607, 'learning_rate': 1.7140544103244272e-06, 'epoch': 0.82} 82%|████████▏ | 4714/5773 [3:30:07<1:38:06, 5.56s/it] 82%|████████▏ | 4715/5773 [3:30:14<1:38:46, 5.60s/it] 82%|████████▏ | 4715/5773 [3:30:12<1:38:46, 5.60s/it] {'loss': 0.5657, 'learning_rate': 1.7109144075643392e-06, 'epoch': 0.82} 82%|████████▏ | 4715/5773 [3:30:14<1:38:46, 5.60s/it] {'loss': 0.5657, 'learning_rate': 1.7109144075643392e-06, 'epoch': 0.82} 82%|████████▏ | 4715/5773 [3:30:12<1:38:46, 5.60s/it] 82%|████████▏ | 4716/5773 [3:30:20<1:38:30, 5.59s/it] 82%|████████▏ | 4716/5773 [3:30:18<1:38:30, 5.59s/it] {'loss': 0.5648, 'learning_rate': 1.707777014473232e-06, 'epoch': 0.82} 82%|████████▏ | 4716/5773 [3:30:20<1:38:30, 5.59s/it] {'loss': 0.5648, 'learning_rate': 1.707777014473232e-06, 'epoch': 0.82} 82%|████████▏ | 4716/5773 [3:30:18<1:38:30, 5.59s/it] 82%|████████▏ | 4717/5773 [3:30:26<1:39:56, 5.68s/it] 82%|████████▏ | 
4717/5773 [3:30:24<1:39:57, 5.68s/it] {'loss': 0.5591, 'learning_rate': 1.7046422320388556e-06, 'epoch': 0.82} 82%|████████▏ | 4717/5773 [3:30:26<1:39:56, 5.68s/it] {'loss': 0.5591, 'learning_rate': 1.7046422320388556e-06, 'epoch': 0.82} 82%|████████▏ | 4717/5773 [3:30:24<1:39:57, 5.68s/it] 82%|████████▏ | 4718/5773 [3:30:31<1:38:57, 5.63s/it] 82%|████████▏ | 4718/5773 [3:30:29<1:38:57, 5.63s/it] {'loss': 0.5552, 'learning_rate': 1.7015100612481395e-06, 'epoch': 0.82} 82%|████████▏ | 4718/5773 [3:30:31<1:38:57, 5.63s/it] {'loss': 0.5552, 'learning_rate': 1.7015100612481395e-06, 'epoch': 0.82} 82%|████████▏ | 4718/5773 [3:30:29<1:38:57, 5.63s/it] 82%|████████▏ | 4719/5773 [3:30:37<1:38:23, 5.60s/it] 82%|████████▏ | 4719/5773 [3:30:35<1:38:23, 5.60s/it] {'loss': 0.5662, 'learning_rate': 1.698380503087188e-06, 'epoch': 0.82} 82%|████████▏ | 4719/5773 [3:30:37<1:38:23, 5.60s/it] {'loss': 0.5662, 'learning_rate': 1.698380503087188e-06, 'epoch': 0.82} 82%|████████▏ | 4719/5773 [3:30:35<1:38:23, 5.60s/it] 82%|████████▏ | 4720/5773 [3:30:42<1:38:09, 5.59s/it] 82%|████████▏ | 4720/5773 [3:30:40<1:38:09, 5.59s/it] {'loss': 0.5568, 'learning_rate': 1.6952535585412921e-06, 'epoch': 0.82} 82%|████████▏ | 4720/5773 [3:30:42<1:38:09, 5.59s/it] {'loss': 0.5568, 'learning_rate': 1.6952535585412921e-06, 'epoch': 0.82} 82%|████████▏ | 4720/5773 [3:30:40<1:38:09, 5.59s/it] 82%|████████▏ | 4721/5773 [3:30:48<1:38:16, 5.61s/it] 82%|████████▏ | 4721/5773 [3:30:46<1:38:16, 5.61s/it] {'loss': 0.5575, 'learning_rate': 1.6921292285949108e-06, 'epoch': 0.82} {'loss': 0.5575, 'learning_rate': 1.6921292285949108e-06, 'epoch': 0.82} 82%|████████▏ | 4721/5773 [3:30:48<1:38:16, 5.61s/it] 82%|████████▏ | 4721/5773 [3:30:46<1:38:16, 5.61s/it] 82%|████████▏ | 4722/5773 [3:30:53<1:37:32, 5.57s/it] 82%|████████▏ | 4722/5773 [3:30:52<1:37:32, 5.57s/it] {'loss': 0.5799, 'learning_rate': 1.6890075142316809e-06, 'epoch': 0.82} 82%|████████▏ | 4722/5773 [3:30:53<1:37:32, 5.57s/it] {'loss': 0.5799, 
'learning_rate': 1.6890075142316809e-06, 'epoch': 0.82} 82%|████████▏ | 4722/5773 [3:30:52<1:37:32, 5.57s/it] 82%|████████▏ | 4723/5773 [3:30:59<1:37:21, 5.56s/it] 82%|████████▏ | 4723/5773 [3:30:57<1:37:21, 5.56s/it] {'loss': 0.563, 'learning_rate': 1.6858884164344224e-06, 'epoch': 0.82} 82%|████████▏ | 4723/5773 [3:30:59<1:37:21, 5.56s/it] {'loss': 0.563, 'learning_rate': 1.6858884164344224e-06, 'epoch': 0.82} 82%|████████▏ | 4723/5773 [3:30:57<1:37:21, 5.56s/it] 82%|████████▏ | 4724/5773 [3:31:05<1:37:38, 5.58s/it] 82%|████████▏ | 4724/5773 [3:31:03<1:37:38, 5.58s/it] {'loss': 0.5493, 'learning_rate': 1.6827719361851236e-06, 'epoch': 0.82} 82%|████████▏ | 4724/5773 [3:31:05<1:37:38, 5.58s/it] {'loss': 0.5493, 'learning_rate': 1.6827719361851236e-06, 'epoch': 0.82} 82%|████████▏ | 4724/5773 [3:31:03<1:37:38, 5.58s/it] 82%|████████▏ | 4725/5773 [3:31:08<1:37:32, 5.58s/it] 82%|████████▏ | 4725/5773 [3:31:10<1:37:33, 5.59s/it] {'loss': 0.5605, 'learning_rate': 1.6796580744649537e-06, 'epoch': 0.82} 82%|████████▏ | 4725/5773 [3:31:10<1:37:33, 5.59s/it] {'loss': 0.5605, 'learning_rate': 1.6796580744649537e-06, 'epoch': 0.82} 82%|████████▏ | 4725/5773 [3:31:08<1:37:32, 5.58s/it] 82%|████████▏ | 4726/5773 [3:31:16<1:37:53, 5.61s/it] 82%|████████▏ | 4726/5773 [3:31:14<1:37:53, 5.61s/it] {'loss': 0.5431, 'learning_rate': 1.6765468322542522e-06, 'epoch': 0.82} 82%|████████▏ | 4726/5773 [3:31:16<1:37:53, 5.61s/it] {'loss': 0.5431, 'learning_rate': 1.6765468322542522e-06, 'epoch': 0.82} 82%|████████▏ | 4726/5773 [3:31:14<1:37:53, 5.61s/it] 82%|████████▏ | 4727/5773 [3:31:21<1:37:26, 5.59s/it] 82%|████████▏ | 4727/5773 [3:31:19<1:37:26, 5.59s/it] {'loss': 0.5649, 'learning_rate': 1.673438210532543e-06, 'epoch': 0.82} 82%|████████▏ | 4727/5773 [3:31:21<1:37:26, 5.59s/it] {'loss': 0.5649, 'learning_rate': 1.673438210532543e-06, 'epoch': 0.82} 82%|████████▏ | 4727/5773 [3:31:19<1:37:26, 5.59s/it] 82%|████████▏ | 4728/5773 [3:31:27<1:36:06, 5.52s/it] 82%|████████▏ | 4728/5773 
[3:31:25<1:36:06, 5.52s/it] {'loss': 0.5597, 'learning_rate': 1.6703322102785168e-06, 'epoch': 0.82} 82%|████████▏ | 4728/5773 [3:31:27<1:36:06, 5.52s/it] {'loss': 0.5597, 'learning_rate': 1.6703322102785168e-06, 'epoch': 0.82} 82%|████████▏ | 4728/5773 [3:31:25<1:36:06, 5.52s/it] 82%|████████▏ | 4729/5773 [3:31:32<1:36:00, 5.52s/it] 82%|████████▏ | 4729/5773 [3:31:30<1:36:00, 5.52s/it] {'loss': 0.5706, 'learning_rate': 1.6672288324700413e-06, 'epoch': 0.82} 82%|████████▏ | 4729/5773 [3:31:32<1:36:00, 5.52s/it] {'loss': 0.5706, 'learning_rate': 1.6672288324700413e-06, 'epoch': 0.82} 82%|████████▏ | 4729/5773 [3:31:30<1:36:00, 5.52s/it] 82%|████████▏ | 4730/5773 [3:31:38<1:35:37, 5.50s/it] 82%|████████▏ | 4730/5773 [3:31:36<1:35:37, 5.50s/it] {'loss': 0.5584, 'learning_rate': 1.6641280780841606e-06, 'epoch': 0.82} 82%|████████▏ | 4730/5773 [3:31:38<1:35:37, 5.50s/it] {'loss': 0.5584, 'learning_rate': 1.6641280780841606e-06, 'epoch': 0.82} 82%|████████▏ | 4730/5773 [3:31:36<1:35:37, 5.50s/it] 82%|████████▏ | 4731/5773 [3:31:43<1:35:08, 5.48s/it] 82%|████████▏ | 4731/5773 [3:31:41<1:35:08, 5.48s/it] {'loss': 0.5648, 'learning_rate': 1.6610299480970893e-06, 'epoch': 0.82} 82%|████████▏ | 4731/5773 [3:31:43<1:35:08, 5.48s/it]{'loss': 0.5648, 'learning_rate': 1.6610299480970893e-06, 'epoch': 0.82} 82%|████████▏ | 4731/5773 [3:31:41<1:35:08, 5.48s/it] 82%|████████▏ | 4732/5773 [3:31:49<1:35:06, 5.48s/it] 82%|████████▏ | 4732/5773 [3:31:47<1:35:06, 5.48s/it] {'loss': 0.5711, 'learning_rate': 1.6579344434842171e-06, 'epoch': 0.82} 82%|████████▏ | 4732/5773 [3:31:49<1:35:06, 5.48s/it] {'loss': 0.5711, 'learning_rate': 1.6579344434842171e-06, 'epoch': 0.82} 82%|████████▏ | 4732/5773 [3:31:47<1:35:06, 5.48s/it] 82%|████████▏ | 4733/5773 [3:31:54<1:34:03, 5.43s/it] 82%|████████▏ | 4733/5773 [3:31:52<1:34:03, 5.43s/it] {'loss': 0.5383, 'learning_rate': 1.6548415652201112e-06, 'epoch': 0.82} 82%|████████▏ | 4733/5773 [3:31:54<1:34:03, 5.43s/it] {'loss': 0.5383, 'learning_rate': 
1.6548415652201112e-06, 'epoch': 0.82} 82%|████████▏ | 4733/5773 [3:31:52<1:34:03, 5.43s/it] 82%|████████▏ | 4734/5773 [3:31:59<1:33:58, 5.43s/it] 82%|████████▏ | 4734/5773 [3:31:57<1:33:58, 5.43s/it] {'loss': 0.5642, 'learning_rate': 1.651751314278507e-06, 'epoch': 0.82} {'loss': 0.5642, 'learning_rate': 1.651751314278507e-06, 'epoch': 0.82} 82%|████████▏ | 4734/5773 [3:31:59<1:33:58, 5.43s/it] 82%|████████▏ | 4734/5773 [3:31:57<1:33:58, 5.43s/it] 82%|████████▏ | 4735/5773 [3:32:05<1:34:58, 5.49s/it] 82%|████████▏ | 4735/5773 [3:32:03<1:34:58, 5.49s/it] {'loss': 0.5532, 'learning_rate': 1.6486636916323106e-06, 'epoch': 0.82} 82%|████████▏ | 4735/5773 [3:32:05<1:34:58, 5.49s/it] {'loss': 0.5532, 'learning_rate': 1.6486636916323106e-06, 'epoch': 0.82} 82%|████████▏ | 4735/5773 [3:32:03<1:34:58, 5.49s/it] 82%|████████▏ | 4736/5773 [3:32:11<1:35:07, 5.50s/it] 82%|████████▏ | 4736/5773 [3:32:09<1:35:07, 5.50s/it] {'loss': 0.5594, 'learning_rate': 1.6455786982536103e-06, 'epoch': 0.82} 82%|████████▏ | 4736/5773 [3:32:11<1:35:07, 5.50s/it] {'loss': 0.5594, 'learning_rate': 1.6455786982536103e-06, 'epoch': 0.82} 82%|████████▏ | 4736/5773 [3:32:09<1:35:07, 5.50s/it] 82%|████████▏ | 4737/5773 [3:32:16<1:34:29, 5.47s/it] 82%|████████▏ | 4737/5773 [3:32:14<1:34:29, 5.47s/it] {'loss': 0.557, 'learning_rate': 1.642496335113658e-06, 'epoch': 0.82} 82%|████████▏ | 4737/5773 [3:32:16<1:34:29, 5.47s/it] {'loss': 0.557, 'learning_rate': 1.642496335113658e-06, 'epoch': 0.82} 82%|████████▏ | 4737/5773 [3:32:14<1:34:29, 5.47s/it] 82%|████████▏ | 4738/5773 [3:32:22<1:34:34, 5.48s/it] 82%|████████▏ | 4738/5773 [3:32:20<1:34:34, 5.48s/it] {'loss': 0.5652, 'learning_rate': 1.6394166031828796e-06, 'epoch': 0.82} 82%|████████▏ | 4738/5773 [3:32:22<1:34:34, 5.48s/it] {'loss': 0.5652, 'learning_rate': 1.6394166031828796e-06, 'epoch': 0.82} 82%|████████▏ | 4738/5773 [3:32:20<1:34:34, 5.48s/it] 82%|████████▏ | 4739/5773 [3:32:27<1:34:12, 5.47s/it] 82%|████████▏ | 4739/5773 [3:32:25<1:34:12, 
5.47s/it] {'loss': 0.558, 'learning_rate': 1.6363395034308704e-06, 'epoch': 0.82} 82%|████████▏ | 4739/5773 [3:32:27<1:34:12, 5.47s/it] {'loss': 0.558, 'learning_rate': 1.6363395034308704e-06, 'epoch': 0.82} 82%|████████▏ | 4739/5773 [3:32:25<1:34:12, 5.47s/it] 82%|████████▏ | 4740/5773 [3:32:32<1:34:11, 5.47s/it] 82%|████████▏ | 4740/5773 [3:32:30<1:34:11, 5.47s/it] {'loss': 0.5627, 'learning_rate': 1.633265036826406e-06, 'epoch': 0.82} 82%|████████▏ | 4740/5773 [3:32:32<1:34:11, 5.47s/it] {'loss': 0.5627, 'learning_rate': 1.633265036826406e-06, 'epoch': 0.82} 82%|████████▏ | 4740/5773 [3:32:30<1:34:11, 5.47s/it] 82%|████████▏ | 4741/5773 [3:32:38<1:34:04, 5.47s/it] 82%|████████▏ | 4741/5773 [3:32:36<1:34:04, 5.47s/it] {'loss': 0.5524, 'learning_rate': 1.6301932043374226e-06, 'epoch': 0.82} 82%|████████▏ | 4741/5773 [3:32:38<1:34:04, 5.47s/it] {'loss': 0.5524, 'learning_rate': 1.6301932043374226e-06, 'epoch': 0.82} 82%|████████▏ | 4741/5773 [3:32:36<1:34:04, 5.47s/it] 82%|████████▏ | 4742/5773 [3:32:43<1:34:37, 5.51s/it] 82%|████████▏ | 4742/5773 [3:32:42<1:34:37, 5.51s/it] {'loss': 0.5743, 'learning_rate': 1.6271240069310323e-06, 'epoch': 0.82} 82%|████████▏ | 4742/5773 [3:32:43<1:34:37, 5.51s/it] {'loss': 0.5743, 'learning_rate': 1.6271240069310323e-06, 'epoch': 0.82} 82%|████████▏ | 4742/5773 [3:32:42<1:34:37, 5.51s/it] 82%|████████▏ | 4743/5773 [3:32:49<1:34:16, 5.49s/it] 82%|████████▏ | 4743/5773 [3:32:47<1:34:16, 5.49s/it] {'loss': 0.5647, 'learning_rate': 1.6240574455735158e-06, 'epoch': 0.82} 82%|████████▏ | 4743/5773 [3:32:49<1:34:16, 5.49s/it] {'loss': 0.5647, 'learning_rate': 1.6240574455735158e-06, 'epoch': 0.82} 82%|████████▏ | 4743/5773 [3:32:47<1:34:16, 5.49s/it] 82%|████████▏ | 4744/5773 [3:32:55<1:34:39, 5.52s/it] 82%|████████▏ | 4744/5773 [3:32:53<1:34:39, 5.52s/it] {'loss': 0.5617, 'learning_rate': 1.6209935212303251e-06, 'epoch': 0.82} 82%|████████▏ | 4744/5773 [3:32:55<1:34:39, 5.52s/it] {'loss': 0.5617, 'learning_rate': 
1.6209935212303251e-06, 'epoch': 0.82} 82%|████████▏ | 4744/5773 [3:32:53<1:34:39, 5.52s/it] 82%|████████▏ | 4745/5773 [3:33:00<1:34:25, 5.51s/it] 82%|████████▏ | 4745/5773 [3:32:58<1:34:25, 5.51s/it] {'loss': 0.565, 'learning_rate': 1.6179322348660798e-06, 'epoch': 0.82} 82%|████████▏ | 4745/5773 [3:33:00<1:34:25, 5.51s/it] {'loss': 0.565, 'learning_rate': 1.6179322348660798e-06, 'epoch': 0.82} 82%|████████▏ | 4745/5773 [3:32:58<1:34:25, 5.51s/it] 82%|████████▏ | 4746/5773 [3:33:04<1:34:30, 5.52s/it] 82%|████████▏ | 4746/5773 [3:33:06<1:34:30, 5.52s/it] {'loss': 0.5667, 'learning_rate': 1.6148735874445741e-06, 'epoch': 0.82} 82%|████████▏ | 4746/5773 [3:33:06<1:34:30, 5.52s/it] {'loss': 0.5667, 'learning_rate': 1.6148735874445741e-06, 'epoch': 0.82} 82%|████████▏ | 4746/5773 [3:33:04<1:34:30, 5.52s/it] 82%|████████▏ | 4747/5773 [3:33:09<1:34:10, 5.51s/it] 82%|████████▏ | 4747/5773 [3:33:11<1:34:10, 5.51s/it] {'loss': 0.5493, 'learning_rate': 1.611817579928765e-06, 'epoch': 0.82} 82%|████████▏ | 4747/5773 [3:33:11<1:34:10, 5.51s/it] {'loss': 0.5493, 'learning_rate': 1.611817579928765e-06, 'epoch': 0.82} 82%|████████▏ | 4747/5773 [3:33:09<1:34:10, 5.51s/it] 82%|████████▏ | 4748/5773 [3:33:16<1:33:06, 5.45s/it] 82%|████████▏ | 4748/5773 [3:33:14<1:33:07, 5.45s/it] {'loss': 0.5576, 'learning_rate': 1.6087642132807813e-06, 'epoch': 0.82} 82%|████████▏ | 4748/5773 [3:33:16<1:33:06, 5.45s/it] {'loss': 0.5576, 'learning_rate': 1.6087642132807813e-06, 'epoch': 0.82} 82%|████████▏ | 4748/5773 [3:33:14<1:33:07, 5.45s/it] 82%|████████▏ | 4749/5773 [3:33:22<1:32:47, 5.44s/it] 82%|████████▏ | 4749/5773 [3:33:20<1:32:47, 5.44s/it] {'loss': 0.5524, 'learning_rate': 1.605713488461923e-06, 'epoch': 0.82} 82%|████████▏ | 4749/5773 [3:33:22<1:32:47, 5.44s/it] {'loss': 0.5524, 'learning_rate': 1.605713488461923e-06, 'epoch': 0.82} 82%|████████▏ | 4749/5773 [3:33:20<1:32:47, 5.44s/it]119 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 
4 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 1412 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 52 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 07 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 82%|████████▏ | 4750/5773 [3:33:25<1:32:54, 5.45s/it] 82%|████████▏ | 4750/5773 [3:33:27<1:32:54, 5.45s/it]3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5581, 'learning_rate': 1.6026654064326553e-06, 'epoch': 0.82} 82%|████████▏ | 4750/5773 [3:33:27<1:32:54, 5.45s/it] {'loss': 0.5581, 'learning_rate': 1.6026654064326553e-06, 'epoch': 0.82} 82%|████████▏ | 4750/5773 [3:33:25<1:32:54, 5.45s/it] 82%|████████▏ | 4751/5773 [3:33:33<1:34:52, 5.57s/it] 82%|████████▏ | 4751/5773 [3:33:31<1:34:52, 5.57s/it] {'loss': 0.5443, 'learning_rate': 1.5996199681526092e-06, 'epoch': 0.82} 82%|████████▏ | 4751/5773 [3:33:33<1:34:52, 5.57s/it] {'loss': 0.5443, 'learning_rate': 1.5996199681526092e-06, 'epoch': 0.82} 82%|████████▏ | 4751/5773 [3:33:31<1:34:52, 5.57s/it] 82%|████████▏ | 4752/5773 [3:33:39<1:34:51, 5.57s/it] 82%|████████▏ | 4752/5773 [3:33:37<1:34:51, 5.57s/it] {'loss': 0.5378, 'learning_rate': 1.596577174580586e-06, 'epoch': 0.82} 82%|████████▏ | 4752/5773 [3:33:39<1:34:51, 5.57s/it] {'loss': 0.5378, 'learning_rate': 1.596577174580586e-06, 'epoch': 0.82} 82%|████████▏ | 4752/5773 [3:33:37<1:34:51, 5.57s/it] 82%|████████▏ | 4753/5773 [3:33:42<1:34:37, 5.57s/it] 82%|████████▏ | 4753/5773 [3:33:44<1:34:37, 5.57s/it] {'loss': 0.5788, 'learning_rate': 1.5935370266745575e-06, 'epoch': 0.82} 82%|████████▏ | 4753/5773 [3:33:44<1:34:37, 
5.57s/it] {'loss': 0.5788, 'learning_rate': 1.5935370266745575e-06, 'epoch': 0.82} 82%|████████▏ | 4753/5773 [3:33:42<1:34:37, 5.57s/it] 82%|████████▏ | 4754/5773 [3:33:50<1:34:26, 5.56s/it] 82%|████████▏ | 4754/5773 [3:33:48<1:34:26, 5.56s/it] {'loss': 0.5546, 'learning_rate': 1.5904995253916578e-06, 'epoch': 0.82} 82%|████████▏ | 4754/5773 [3:33:50<1:34:26, 5.56s/it] {'loss': 0.5546, 'learning_rate': 1.5904995253916578e-06, 'epoch': 0.82} 82%|████████▏ | 4754/5773 [3:33:48<1:34:26, 5.56s/it] 82%|████████▏ | 4755/5773 [3:33:55<1:33:28, 5.51s/it] 82%|████████▏ | 4755/5773 [3:33:53<1:33:28, 5.51s/it] {'loss': 0.5695, 'learning_rate': 1.587464671688187e-06, 'epoch': 0.82} 82%|████████▏ | 4755/5773 [3:33:55<1:33:28, 5.51s/it] {'loss': 0.5695, 'learning_rate': 1.587464671688187e-06, 'epoch': 0.82} 82%|████████▏ | 4755/5773 [3:33:53<1:33:28, 5.51s/it] 82%|████████▏ | 4756/5773 [3:33:59<1:33:27, 5.51s/it] 82%|████████▏ | 4756/5773 [3:34:01<1:33:27, 5.51s/it] {'loss': 0.5565, 'learning_rate': 1.5844324665196209e-06, 'epoch': 0.82} 82%|████████▏ | 4756/5773 [3:34:01<1:33:27, 5.51s/it] {'loss': 0.5565, 'learning_rate': 1.5844324665196209e-06, 'epoch': 0.82} 82%|████████▏ | 4756/5773 [3:33:59<1:33:27, 5.51s/it] 82%|████████▏ | 4757/5773 [3:34:06<1:33:58, 5.55s/it] 82%|████████▏ | 4757/5773 [3:34:04<1:33:58, 5.55s/it] {'loss': 0.5606, 'learning_rate': 1.581402910840588e-06, 'epoch': 0.82} 82%|████████▏ | 4757/5773 [3:34:06<1:33:58, 5.55s/it] {'loss': 0.5606, 'learning_rate': 1.581402910840588e-06, 'epoch': 0.82} 82%|████████▏ | 4757/5773 [3:34:04<1:33:58, 5.55s/it] 82%|████████▏ | 4758/5773 [3:34:12<1:34:26, 5.58s/it] 82%|████████▏ | 4758/5773 [3:34:10<1:34:26, 5.58s/it] {'loss': 0.5649, 'learning_rate': 1.578376005604888e-06, 'epoch': 0.82} 82%|████████▏ | 4758/5773 [3:34:12<1:34:26, 5.58s/it] {'loss': 0.5649, 'learning_rate': 1.578376005604888e-06, 'epoch': 0.82} 82%|████████▏ | 4758/5773 [3:34:10<1:34:26, 5.58s/it] 82%|████████▏ | 4759/5773 [3:34:18<1:34:16, 5.58s/it] 
82%|████████▏ | 4759/5773 [3:34:16<1:34:16, 5.58s/it] {'loss': 0.5693, 'learning_rate': 1.5753517517654937e-06, 'epoch': 0.82} 82%|████████▏ | 4759/5773 [3:34:18<1:34:16, 5.58s/it] {'loss': 0.5693, 'learning_rate': 1.5753517517654937e-06, 'epoch': 0.82} 82%|████████▏ | 4759/5773 [3:34:16<1:34:16, 5.58s/it] 82%|████████▏ | 4760/5773 [3:34:23<1:34:43, 5.61s/it] 82%|████████▏ | 4760/5773 [3:34:21<1:34:43, 5.61s/it] {'loss': 0.5458, 'learning_rate': 1.5723301502745325e-06, 'epoch': 0.82} 82%|████████▏ | 4760/5773 [3:34:23<1:34:43, 5.61s/it] {'loss': 0.5458, 'learning_rate': 1.5723301502745325e-06, 'epoch': 0.82} 82%|████████▏ | 4760/5773 [3:34:21<1:34:43, 5.61s/it] 82%|████████▏ | 4761/5773 [3:34:27<1:33:09, 5.52s/it] 82%|████████▏ | 4761/5773 [3:34:29<1:33:09, 5.52s/it] {'loss': 0.5632, 'learning_rate': 1.5693112020833012e-06, 'epoch': 0.82} 82%|████████▏ | 4761/5773 [3:34:29<1:33:09, 5.52s/it] {'loss': 0.5632, 'learning_rate': 1.5693112020833012e-06, 'epoch': 0.82} 82%|████████▏ | 4761/5773 [3:34:27<1:33:09, 5.52s/it] 82%|████████▏ | 4762/5773 [3:34:34<1:33:45, 5.56s/it] 82%|████████▏ | 4762/5773 [3:34:32<1:33:45, 5.56s/it] {'loss': 0.5616, 'learning_rate': 1.5662949081422629e-06, 'epoch': 0.82} 82%|████████▏ | 4762/5773 [3:34:34<1:33:45, 5.56s/it] {'loss': 0.5616, 'learning_rate': 1.5662949081422629e-06, 'epoch': 0.82} 82%|████████▏ | 4762/5773 [3:34:32<1:33:45, 5.56s/it] 83%|████████▎ | 4763/5773 [3:34:38<1:33:02, 5.53s/it] 83%|████████▎ | 4763/5773 [3:34:40<1:33:02, 5.53s/it] {'loss': 0.5632, 'learning_rate': 1.5632812694010435e-06, 'epoch': 0.83} 83%|████████▎ | 4763/5773 [3:34:40<1:33:02, 5.53s/it] {'loss': 0.5632, 'learning_rate': 1.5632812694010435e-06, 'epoch': 0.83} 83%|████████▎ | 4763/5773 [3:34:38<1:33:02, 5.53s/it] 83%|████████▎ | 4764/5773 [3:34:43<1:32:08, 5.48s/it] 83%|████████▎ | 4764/5773 [3:34:45<1:32:08, 5.48s/it] {'loss': 0.5485, 'learning_rate': 1.5602702868084319e-06, 'epoch': 0.83} 83%|████████▎ | 4764/5773 [3:34:45<1:32:08, 5.48s/it] {'loss': 
0.5485, 'learning_rate': 1.5602702868084319e-06, 'epoch': 0.83} 83%|████████▎ | 4764/5773 [3:34:43<1:32:08, 5.48s/it] 83%|████████▎ | 4765/5773 [3:34:48<1:31:30, 5.45s/it] 83%|████████▎ | 4765/5773 [3:34:50<1:31:30, 5.45s/it] {'loss': 0.5509, 'learning_rate': 1.5572619613123797e-06, 'epoch': 0.83} 83%|████████▎ | 4765/5773 [3:34:50<1:31:30, 5.45s/it] {'loss': 0.5509, 'learning_rate': 1.5572619613123797e-06, 'epoch': 0.83} 83%|████████▎ | 4765/5773 [3:34:48<1:31:30, 5.45s/it] 83%|████████▎ | 4766/5773 [3:34:54<1:32:21, 5.50s/it] 83%|████████▎ | 4766/5773 [3:34:56<1:32:21, 5.50s/it] {'loss': 0.5569, 'learning_rate': 1.554256293860007e-06, 'epoch': 0.83} 83%|████████▎ | 4766/5773 [3:34:56<1:32:21, 5.50s/it] {'loss': 0.5569, 'learning_rate': 1.554256293860007e-06, 'epoch': 0.83} 83%|████████▎ | 4766/5773 [3:34:54<1:32:21, 5.50s/it] 83%|████████▎ | 4767/5773 [3:35:02<1:32:29, 5.52s/it] 83%|████████▎ | 4767/5773 [3:35:00<1:32:29, 5.52s/it] {'loss': 0.5675, 'learning_rate': 1.5512532853975925e-06, 'epoch': 0.83} 83%|████████▎ | 4767/5773 [3:35:02<1:32:29, 5.52s/it] {'loss': 0.5675, 'learning_rate': 1.5512532853975925e-06, 'epoch': 0.83} 83%|████████▎ | 4767/5773 [3:35:00<1:32:29, 5.52s/it] 83%|████████▎ | 4768/5773 [3:35:07<1:32:22, 5.51s/it] 83%|████████▎ | 4768/5773 [3:35:05<1:32:22, 5.51s/it] {'loss': 0.5449, 'learning_rate': 1.548252936870578e-06, 'epoch': 0.83} 83%|████████▎ | 4768/5773 [3:35:07<1:32:22, 5.51s/it] {'loss': 0.5449, 'learning_rate': 1.548252936870578e-06, 'epoch': 0.83} 83%|████████▎ | 4768/5773 [3:35:05<1:32:22, 5.51s/it] 83%|████████▎ | 4769/5773 [3:35:11<1:33:12, 5.57s/it] 83%|████████▎ | 4769/5773 [3:35:13<1:33:13, 5.57s/it] {'loss': 0.5475, 'learning_rate': 1.545255249223573e-06, 'epoch': 0.83} 83%|████████▎ | 4769/5773 [3:35:13<1:33:13, 5.57s/it] {'loss': 0.5475, 'learning_rate': 1.545255249223573e-06, 'epoch': 0.83} 83%|████████▎ | 4769/5773 [3:35:11<1:33:12, 5.57s/it] 83%|████████▎ | 4770/5773 [3:35:18<1:33:07, 5.57s/it] 83%|████████▎ | 
4770/5773 [3:35:16<1:33:07, 5.57s/it] {'loss': 0.546, 'learning_rate': 1.5422602234003436e-06, 'epoch': 0.83} 83%|████████▎ | 4770/5773 [3:35:18<1:33:07, 5.57s/it] {'loss': 0.546, 'learning_rate': 1.5422602234003436e-06, 'epoch': 0.83} 83%|████████▎ | 4770/5773 [3:35:16<1:33:07, 5.57s/it] 83%|████████▎ | 4771/5773 [3:35:24<1:32:37, 5.55s/it] 83%|████████▎ | 4771/5773 [3:35:22<1:32:37, 5.55s/it] {'loss': 0.5533, 'learning_rate': 1.539267860343815e-06, 'epoch': 0.83} 83%|████████▎ | 4771/5773 [3:35:24<1:32:37, 5.55s/it] {'loss': 0.5533, 'learning_rate': 1.539267860343815e-06, 'epoch': 0.83} 83%|████████▎ | 4771/5773 [3:35:22<1:32:37, 5.55s/it] 83%|████████▎ | 4772/5773 [3:35:29<1:32:28, 5.54s/it] 83%|████████▎ | 4772/5773 [3:35:27<1:32:28, 5.54s/it] {'loss': 0.5522, 'learning_rate': 1.5362781609960853e-06, 'epoch': 0.83} 83%|████████▎ | 4772/5773 [3:35:29<1:32:28, 5.54s/it] {'loss': 0.5522, 'learning_rate': 1.5362781609960853e-06, 'epoch': 0.83} 83%|████████▎ | 4772/5773 [3:35:27<1:32:28, 5.54s/it] 83%|████████▎ | 4773/5773 [3:35:35<1:32:53, 5.57s/it] 83%|████████▎ | 4773/5773 [3:35:33<1:32:53, 5.57s/it] {'loss': 0.5509, 'learning_rate': 1.533291126298404e-06, 'epoch': 0.83} 83%|████████▎ | 4773/5773 [3:35:35<1:32:53, 5.57s/it] {'loss': 0.5509, 'learning_rate': 1.533291126298404e-06, 'epoch': 0.83} 83%|████████▎ | 4773/5773 [3:35:33<1:32:53, 5.57s/it] 83%|████████▎ | 4774/5773 [3:35:41<1:32:35, 5.56s/it] 83%|████████▎ | 4774/5773 [3:35:39<1:32:35, 5.56s/it] {'loss': 0.5741, 'learning_rate': 1.5303067571911834e-06, 'epoch': 0.83} 83%|████████▎ | 4774/5773 [3:35:41<1:32:35, 5.56s/it] {'loss': 0.5741, 'learning_rate': 1.5303067571911834e-06, 'epoch': 0.83} 83%|████████▎ | 4774/5773 [3:35:39<1:32:35, 5.56s/it] 83%|████████▎ | 4775/5773 [3:35:46<1:31:51, 5.52s/it] 83%|████████▎ | 4775/5773 [3:35:44<1:31:51, 5.52s/it] {'loss': 0.5642, 'learning_rate': 1.5273250546140028e-06, 'epoch': 0.83} 83%|████████▎ | 4775/5773 [3:35:46<1:31:51, 5.52s/it] {'loss': 0.5642, 
'learning_rate': 1.5273250546140028e-06, 'epoch': 0.83} 83%|████████▎ | 4775/5773 [3:35:44<1:31:51, 5.52s/it] 83%|████████▎ | 4776/5773 [3:35:50<1:31:34, 5.51s/it] 83%|████████▎ | 4776/5773 [3:35:51<1:31:34, 5.51s/it] {'loss': 0.5472, 'learning_rate': 1.524346019505596e-06, 'epoch': 0.83} 83%|████████▎ | 4776/5773 [3:35:51<1:31:34, 5.51s/it] {'loss': 0.5472, 'learning_rate': 1.524346019505596e-06, 'epoch': 0.83} 83%|████████▎ | 4776/5773 [3:35:50<1:31:34, 5.51s/it] 83%|████████▎ | 4777/5773 [3:35:57<1:30:13, 5.44s/it] 83%|████████▎ | 4777/5773 [3:35:55<1:30:13, 5.44s/it] {'loss': 0.5588, 'learning_rate': 1.5213696528038556e-06, 'epoch': 0.83} 83%|████████▎ | 4777/5773 [3:35:57<1:30:13, 5.44s/it] {'loss': 0.5588, 'learning_rate': 1.5213696528038556e-06, 'epoch': 0.83} 83%|████████▎ | 4777/5773 [3:35:55<1:30:13, 5.44s/it] 83%|████████▎ | 4778/5773 [3:36:02<1:29:39, 5.41s/it] 83%|████████▎ | 4778/5773 [3:36:00<1:29:39, 5.41s/it] {'loss': 0.5555, 'learning_rate': 1.5183959554458383e-06, 'epoch': 0.83} 83%|████████▎ | 4778/5773 [3:36:02<1:29:39, 5.41s/it] {'loss': 0.5555, 'learning_rate': 1.5183959554458383e-06, 'epoch': 0.83} 83%|████████▎ | 4778/5773 [3:36:00<1:29:39, 5.41s/it] 83%|████████▎ | 4779/5773 [3:36:06<1:30:35, 5.47s/it] 83%|████████▎ | 4779/5773 [3:36:08<1:30:35, 5.47s/it] {'loss': 0.5691, 'learning_rate': 1.5154249283677613e-06, 'epoch': 0.83} 83%|████████▎ | 4779/5773 [3:36:08<1:30:35, 5.47s/it] {'loss': 0.5691, 'learning_rate': 1.5154249283677613e-06, 'epoch': 0.83} 83%|████████▎ | 4779/5773 [3:36:06<1:30:35, 5.47s/it] 83%|████████▎ | 4780/5773 [3:36:13<1:29:59, 5.44s/it] 83%|████████▎ | 4780/5773 [3:36:11<1:29:59, 5.44s/it] {'loss': 0.5707, 'learning_rate': 1.5124565725049978e-06, 'epoch': 0.83} 83%|████████▎ | 4780/5773 [3:36:13<1:29:59, 5.44s/it] {'loss': 0.5707, 'learning_rate': 1.5124565725049978e-06, 'epoch': 0.83} 83%|████████▎ | 4780/5773 [3:36:11<1:29:59, 5.44s/it] 83%|████████▎ | 4781/5773 [3:36:19<1:31:10, 5.51s/it] 83%|████████▎ | 4781/5773 
[3:36:17<1:31:10, 5.51s/it] {'loss': 0.5832, 'learning_rate': 1.5094908887920767e-06, 'epoch': 0.83} 83%|████████▎ | 4781/5773 [3:36:19<1:31:10, 5.51s/it] {'loss': 0.5832, 'learning_rate': 1.5094908887920767e-06, 'epoch': 0.83} 83%|████████▎ | 4781/5773 [3:36:17<1:31:10, 5.51s/it] 83%|████████▎ | 4782/5773 [3:36:24<1:31:00, 5.51s/it] 83%|████████▎ | 4782/5773 [3:36:22<1:31:00, 5.51s/it] {'loss': 0.5606, 'learning_rate': 1.5065278781626968e-06, 'epoch': 0.83} 83%|████████▎ | 4782/5773 [3:36:24<1:31:00, 5.51s/it] {'loss': 0.5606, 'learning_rate': 1.5065278781626968e-06, 'epoch': 0.83} 83%|████████▎ | 4782/5773 [3:36:22<1:31:00, 5.51s/it] 83%|████████▎ | 4783/5773 [3:36:30<1:31:08, 5.52s/it] 83%|████████▎ | 4783/5773 [3:36:28<1:31:08, 5.52s/it] {'loss': 0.5651, 'learning_rate': 1.5035675415497064e-06, 'epoch': 0.83} 83%|████████▎ | 4783/5773 [3:36:30<1:31:08, 5.52s/it] {'loss': 0.5651, 'learning_rate': 1.5035675415497064e-06, 'epoch': 0.83} 83%|████████▎ | 4783/5773 [3:36:28<1:31:08, 5.52s/it] 83%|████████▎ | 4784/5773 [3:36:35<1:30:30, 5.49s/it] 83%|████████▎ | 4784/5773 [3:36:33<1:30:29, 5.49s/it] {'loss': 0.5589, 'learning_rate': 1.5006098798851122e-06, 'epoch': 0.83} 83%|████████▎ | 4784/5773 [3:36:35<1:30:30, 5.49s/it] {'loss': 0.5589, 'learning_rate': 1.5006098798851122e-06, 'epoch': 0.83} 83%|████████▎ | 4784/5773 [3:36:33<1:30:29, 5.49s/it] 83%|████████▎ | 4785/5773 [3:36:39<1:30:24, 5.49s/it] 83%|████████▎ | 4785/5773 [3:36:41<1:30:24, 5.49s/it] {'loss': 0.5568, 'learning_rate': 1.4976548941000813e-06, 'epoch': 0.83} 83%|████████▎ | 4785/5773 [3:36:41<1:30:24, 5.49s/it] {'loss': 0.5568, 'learning_rate': 1.4976548941000813e-06, 'epoch': 0.83} 83%|████████▎ | 4785/5773 [3:36:39<1:30:24, 5.49s/it] 83%|████████▎ | 4786/5773 [3:36:46<1:29:48, 5.46s/it] 83%|████████▎ | 4786/5773 [3:36:44<1:29:48, 5.46s/it] {'loss': 0.5598, 'learning_rate': 1.4947025851249376e-06, 'epoch': 0.83} 83%|████████▎ | 4786/5773 [3:36:46<1:29:48, 5.46s/it] {'loss': 0.5598, 'learning_rate': 
1.4947025851249376e-06, 'epoch': 0.83} 83%|████████▎ | 4786/5773 [3:36:44<1:29:48, 5.46s/it] 83%|████████▎ | 4787/5773 [3:36:50<1:30:21, 5.50s/it] 83%|████████▎ | 4787/5773 [3:36:52<1:30:21, 5.50s/it] {'loss': 0.5678, 'learning_rate': 1.4917529538891607e-06, 'epoch': 0.83} 83%|████████▎ | 4787/5773 [3:36:52<1:30:21, 5.50s/it] {'loss': 0.5678, 'learning_rate': 1.4917529538891607e-06, 'epoch': 0.83} 83%|████████▎ | 4787/5773 [3:36:50<1:30:21, 5.50s/it] 83%|████████▎ | 4788/5773 [3:36:55<1:29:13, 5.43s/it] 83%|████████▎ | 4788/5773 [3:36:57<1:29:13, 5.43s/it] {'loss': 0.5552, 'learning_rate': 1.4888060013213934e-06, 'epoch': 0.83} 83%|████████▎ | 4788/5773 [3:36:57<1:29:13, 5.43s/it] {'loss': 0.5552, 'learning_rate': 1.4888060013213934e-06, 'epoch': 0.83} 83%|████████▎ | 4788/5773 [3:36:55<1:29:13, 5.43s/it] 83%|████████▎ | 4789/5773 [3:37:01<1:29:49, 5.48s/it] 83%|████████▎ | 4789/5773 [3:37:03<1:29:49, 5.48s/it] {'loss': 0.5618, 'learning_rate': 1.485861728349427e-06, 'epoch': 0.83} 83%|████████▎ | 4789/5773 [3:37:03<1:29:49, 5.48s/it] {'loss': 0.5618, 'learning_rate': 1.485861728349427e-06, 'epoch': 0.83} 83%|████████▎ | 4789/5773 [3:37:01<1:29:49, 5.48s/it] 83%|████████▎ | 4790/5773 [3:37:08<1:28:59, 5.43s/it] 83%|████████▎ | 4790/5773 [3:37:06<1:28:59, 5.43s/it] {'loss': 0.5519, 'learning_rate': 1.482920135900211e-06, 'epoch': 0.83} 83%|████████▎ | 4790/5773 [3:37:08<1:28:59, 5.43s/it] {'loss': 0.5519, 'learning_rate': 1.482920135900211e-06, 'epoch': 0.83} 83%|████████▎ | 4790/5773 [3:37:06<1:28:59, 5.43s/it] 83%|████████▎ | 4791/5773 [3:37:13<1:28:49, 5.43s/it] 83%|████████▎ | 4791/5773 [3:37:11<1:28:49, 5.43s/it] {'loss': 0.5454, 'learning_rate': 1.4799812248998568e-06, 'epoch': 0.83} 83%|████████▎ | 4791/5773 [3:37:13<1:28:49, 5.43s/it] {'loss': 0.5454, 'learning_rate': 1.4799812248998568e-06, 'epoch': 0.83} 83%|████████▎ | 4791/5773 [3:37:11<1:28:49, 5.43s/it] 83%|████████▎ | 4792/5773 [3:37:19<1:28:41, 5.42s/it] 83%|████████▎ | 4792/5773 [3:37:17<1:28:41, 
5.42s/it] {'loss': 0.5544, 'learning_rate': 1.4770449962736265e-06, 'epoch': 0.83} 83%|████████▎ | 4792/5773 [3:37:19<1:28:41, 5.42s/it] {'loss': 0.5544, 'learning_rate': 1.4770449962736265e-06, 'epoch': 0.83} 83%|████████▎ | 4792/5773 [3:37:17<1:28:41, 5.42s/it] 83%|████████▎ | 4793/5773 [3:37:24<1:29:11, 5.46s/it] 83%|████████▎ | 4793/5773 [3:37:22<1:29:11, 5.46s/it] {'loss': 0.5623, 'learning_rate': 1.4741114509459376e-06, 'epoch': 0.83} 83%|████████▎ | 4793/5773 [3:37:24<1:29:11, 5.46s/it] {'loss': 0.5623, 'learning_rate': 1.4741114509459376e-06, 'epoch': 0.83} 83%|████████▎ | 4793/5773 [3:37:22<1:29:11, 5.46s/it] 83%|████████▎ | 4794/5773 [3:37:30<1:29:03, 5.46s/it] 83%|████████▎ | 4794/5773 [3:37:28<1:29:03, 5.46s/it] {'loss': 0.5451, 'learning_rate': 1.471180589840363e-06, 'epoch': 0.83} 83%|████████▎ | 4794/5773 [3:37:30<1:29:03, 5.46s/it] {'loss': 0.5451, 'learning_rate': 1.471180589840363e-06, 'epoch': 0.83} 83%|████████▎ | 4794/5773 [3:37:28<1:29:03, 5.46s/it] 83%|████████▎ | 4795/5773 [3:37:33<1:29:14, 5.47s/it] 83%|████████▎ | 4795/5773 [3:37:35<1:29:14, 5.47s/it] {'loss': 0.5519, 'learning_rate': 1.4682524138796338e-06, 'epoch': 0.83} 83%|████████▎ | 4795/5773 [3:37:35<1:29:14, 5.47s/it] {'loss': 0.5519, 'learning_rate': 1.4682524138796338e-06, 'epoch': 0.83} 83%|████████▎ | 4795/5773 [3:37:33<1:29:14, 5.47s/it] 83%|████████▎ | 4796/5773 [3:37:41<1:28:45, 5.45s/it] 83%|████████▎ | 4796/5773 [3:37:39<1:28:45, 5.45s/it] {'loss': 0.572, 'learning_rate': 1.4653269239856326e-06, 'epoch': 0.83} 83%|████████▎ | 4796/5773 [3:37:41<1:28:45, 5.45s/it] {'loss': 0.572, 'learning_rate': 1.4653269239856326e-06, 'epoch': 0.83} 83%|████████▎ | 4796/5773 [3:37:39<1:28:45, 5.45s/it] 83%|████████▎ | 4797/5773 [3:37:46<1:28:57, 5.47s/it] 83%|████████▎ | 4797/5773 [3:37:44<1:28:57, 5.47s/it] {'loss': 0.56, 'learning_rate': 1.4624041210793939e-06, 'epoch': 0.83} 83%|████████▎ | 4797/5773 [3:37:46<1:28:57, 5.47s/it] {'loss': 0.56, 'learning_rate': 1.4624041210793939e-06, 
'epoch': 0.83} 83%|████████▎ | 4797/5773 [3:37:44<1:28:57, 5.47s/it] 83%|████████▎ | 4798/5773 [3:37:50<1:29:17, 5.49s/it] 83%|████████▎ | 4798/5773 [3:37:52<1:29:17, 5.49s/it] {'loss': 0.564, 'learning_rate': 1.4594840060811178e-06, 'epoch': 0.83} 83%|████████▎ | 4798/5773 [3:37:52<1:29:17, 5.49s/it] {'loss': 0.564, 'learning_rate': 1.4594840060811178e-06, 'epoch': 0.83} 83%|████████▎ | 4798/5773 [3:37:50<1:29:17, 5.49s/it] 83%|████████▎ | 4799/5773 [3:37:57<1:29:41, 5.53s/it] 83%|████████▎ | 4799/5773 [3:37:55<1:29:41, 5.53s/it] {'loss': 0.5495, 'learning_rate': 1.4565665799101402e-06, 'epoch': 0.83} 83%|████████▎ | 4799/5773 [3:37:57<1:29:41, 5.53s/it] {'loss': 0.5495, 'learning_rate': 1.4565665799101402e-06, 'epoch': 0.83} 83%|████████▎ | 4799/5773 [3:37:55<1:29:41, 5.53s/it]9 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 1514 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 64 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 83%|████████▎ | 4800/5773 [3:38:03<1:28:54, 5.48s/it]10 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
83%|████████▎ | 4800/5773 [3:38:01<1:28:54, 5.48s/it] {'loss': 0.558, 'learning_rate': 1.4536518434849633e-06, 'epoch': 0.83} 83%|████████▎ | 4800/5773 [3:38:03<1:28:54, 5.48s/it] {'loss': 0.558, 'learning_rate': 1.4536518434849633e-06, 'epoch': 0.83} 83%|████████▎ | 4800/5773 [3:38:01<1:28:54, 5.48s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4800/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4800/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4800/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 83%|████████▎ | 4801/5773 [3:38:25<2:53:13, 10.69s/it] 83%|████████▎ | 4801/5773 [3:38:24<2:53:13, 10.69s/it] {'loss': 0.5557, 'learning_rate': 1.4507397977232418e-06, 'epoch': 0.83} 83%|████████▎ | 4801/5773 [3:38:26<2:53:13, 10.69s/it] {'loss': 0.5557, 'learning_rate': 1.4507397977232418e-06, 'epoch': 0.83} 83%|████████▎ | 4801/5773 [3:38:24<2:53:13, 10.69s/it] 83%|████████▎ | 4802/5773 [3:38:31<2:27:31, 9.12s/it] 83%|████████▎ | 4802/5773 [3:38:29<2:27:31, 9.12s/it] {'loss': 0.5625, 'learning_rate': 1.447830443541779e-06, 'epoch': 0.83} 83%|████████▎ | 4802/5773 [3:38:31<2:27:31, 9.12s/it] {'loss': 0.5625, 'learning_rate': 1.447830443541779e-06, 'epoch': 0.83} 83%|████████▎ | 4802/5773 [3:38:29<2:27:31, 9.12s/it] 83%|████████▎ | 4803/5773 [3:38:36<2:10:04, 8.05s/it] 83%|████████▎ | 4803/5773 [3:38:35<2:10:04, 8.05s/it] {'loss': 0.544, 'learning_rate': 1.4449237818565298e-06, 'epoch': 0.83} 83%|████████▎ | 4803/5773 [3:38:36<2:10:04, 8.05s/it] {'loss': 0.544, 'learning_rate': 1.4449237818565298e-06, 'epoch': 0.83} 83%|████████▎ | 4803/5773 [3:38:35<2:10:04, 8.05s/it] 83%|████████▎ | 4804/5773 [3:38:42<1:57:39, 7.29s/it] 83%|████████▎ | 4804/5773 [3:38:40<1:57:39, 7.29s/it] {'loss': 0.55, 'learning_rate': 1.4420198135826103e-06, 'epoch': 0.83} 83%|████████▎ | 4804/5773 [3:38:42<1:57:39, 7.29s/it] {'loss': 0.55, 'learning_rate': 1.4420198135826103e-06, 'epoch': 0.83} 83%|████████▎ | 4804/5773 [3:38:40<1:57:39, 7.29s/it] 83%|████████▎ | 4805/5773 [3:38:47<1:48:27, 6.72s/it] 83%|████████▎ | 4805/5773 [3:38:45<1:48:27, 6.72s/it] {'loss': 0.5404, 'learning_rate': 1.4391185396342787e-06, 'epoch': 0.83} 83%|████████▎ | 4805/5773 [3:38:47<1:48:27, 6.72s/it] {'loss': 0.5404, 'learning_rate': 1.4391185396342787e-06, 'epoch': 0.83} 83%|████████▎ | 4805/5773 [3:38:45<1:48:27, 6.72s/it] 83%|████████▎ | 4806/5773 [3:38:53<1:42:37, 6.37s/it] 83%|████████▎ | 4806/5773 [3:38:51<1:42:37, 6.37s/it] {'loss': 0.56, 'learning_rate': 1.4362199609249482e-06, 'epoch': 0.83} 
83%|████████▎ | 4806/5773 [3:38:53<1:42:37, 6.37s/it] {'loss': 0.56, 'learning_rate': 1.4362199609249482e-06, 'epoch': 0.83} 83%|████████▎ | 4806/5773 [3:38:51<1:42:37, 6.37s/it] 83%|████████▎ | 4807/5773 [3:38:58<1:38:26, 6.11s/it] 83%|████████▎ | 4807/5773 [3:38:57<1:38:26, 6.11s/it] {'loss': 0.5718, 'learning_rate': 1.433324078367183e-06, 'epoch': 0.83} 83%|████████▎ | 4807/5773 [3:38:58<1:38:26, 6.11s/it] {'loss': 0.5718, 'learning_rate': 1.433324078367183e-06, 'epoch': 0.83} 83%|████████▎ | 4807/5773 [3:38:57<1:38:26, 6.11s/it] 83%|████████▎ | 4808/5773 [3:39:04<1:34:48, 5.89s/it] 83%|████████▎ | 4808/5773 [3:39:02<1:34:49, 5.90s/it] {'loss': 0.5778, 'learning_rate': 1.430430892872704e-06, 'epoch': 0.83} 83%|████████▎ | 4808/5773 [3:39:04<1:34:48, 5.89s/it] {'loss': 0.5778, 'learning_rate': 1.430430892872704e-06, 'epoch': 0.83} 83%|████████▎ | 4808/5773 [3:39:02<1:34:49, 5.90s/it] 83%|████████▎ | 4809/5773 [3:39:07<1:33:09, 5.80s/it] 83%|████████▎ | 4809/5773 [3:39:09<1:33:09, 5.80s/it] {'loss': 0.5408, 'learning_rate': 1.4275404053523757e-06, 'epoch': 0.83} 83%|████████▎ | 4809/5773 [3:39:09<1:33:09, 5.80s/it] {'loss': 0.5408, 'learning_rate': 1.4275404053523757e-06, 'epoch': 0.83} 83%|████████▎ | 4809/5773 [3:39:07<1:33:09, 5.80s/it] 83%|████████▎ | 4810/5773 [3:39:15<1:31:16, 5.69s/it] 83%|████████▎ | 4810/5773 [3:39:13<1:31:16, 5.69s/it] {'loss': 0.5557, 'learning_rate': 1.4246526167162144e-06, 'epoch': 0.83} 83%|████████▎ | 4810/5773 [3:39:15<1:31:16, 5.69s/it] {'loss': 0.5557, 'learning_rate': 1.4246526167162144e-06, 'epoch': 0.83} 83%|████████▎ | 4810/5773 [3:39:13<1:31:16, 5.69s/it] 83%|████████▎ | 4811/5773 [3:39:20<1:30:13, 5.63s/it] 83%|████████▎ | 4811/5773 [3:39:18<1:30:13, 5.63s/it] {'loss': 0.5722, 'learning_rate': 1.421767527873391e-06, 'epoch': 0.83} 83%|████████▎ | 4811/5773 [3:39:20<1:30:13, 5.63s/it] {'loss': 0.5722, 'learning_rate': 1.421767527873391e-06, 'epoch': 0.83} 83%|████████▎ | 4811/5773 [3:39:18<1:30:13, 5.63s/it] 83%|████████▎ | 
4812/5773 [3:39:26<1:30:24, 5.64s/it] 83%|████████▎ | 4812/5773 [3:39:24<1:30:24, 5.64s/it] {'loss': 0.5614, 'learning_rate': 1.4188851397322255e-06, 'epoch': 0.83} 83%|████████▎ | 4812/5773 [3:39:26<1:30:24, 5.64s/it] {'loss': 0.5614, 'learning_rate': 1.4188851397322255e-06, 'epoch': 0.83} 83%|████████▎ | 4812/5773 [3:39:24<1:30:24, 5.64s/it] 83%|████████▎ | 4813/5773 [3:39:31<1:28:13, 5.51s/it] 83%|████████▎ | 4813/5773 [3:39:29<1:28:13, 5.51s/it] {'loss': 0.5544, 'learning_rate': 1.4160054532001777e-06, 'epoch': 0.83} 83%|████████▎ | 4813/5773 [3:39:31<1:28:13, 5.51s/it] {'loss': 0.5544, 'learning_rate': 1.4160054532001777e-06, 'epoch': 0.83} 83%|████████▎ | 4813/5773 [3:39:29<1:28:13, 5.51s/it] 83%|████████▎ | 4814/5773 [3:39:37<1:28:55, 5.56s/it] 83%|████████▎ | 4814/5773 [3:39:35<1:28:55, 5.56s/it] {'loss': 0.5658, 'learning_rate': 1.4131284691838721e-06, 'epoch': 0.83} 83%|████████▎ | 4814/5773 [3:39:37<1:28:55, 5.56s/it] {'loss': 0.5658, 'learning_rate': 1.4131284691838721e-06, 'epoch': 0.83} 83%|████████▎ | 4814/5773 [3:39:35<1:28:55, 5.56s/it] 83%|████████▎ | 4815/5773 [3:39:42<1:28:54, 5.57s/it] 83%|████████▎ | 4815/5773 [3:39:41<1:28:54, 5.57s/it] {'loss': 0.5595, 'learning_rate': 1.410254188589074e-06, 'epoch': 0.83} 83%|████████▎ | 4815/5773 [3:39:42<1:28:54, 5.57s/it] {'loss': 0.5595, 'learning_rate': 1.410254188589074e-06, 'epoch': 0.83} 83%|████████▎ | 4815/5773 [3:39:41<1:28:54, 5.57s/it] 83%|████████▎ | 4816/5773 [3:39:48<1:28:32, 5.55s/it] 83%|████████▎ | 4816/5773 [3:39:46<1:28:32, 5.55s/it] {'loss': 0.5583, 'learning_rate': 1.4073826123206946e-06, 'epoch': 0.83} 83%|████████▎ | 4816/5773 [3:39:48<1:28:32, 5.55s/it] {'loss': 0.5583, 'learning_rate': 1.4073826123206946e-06, 'epoch': 0.83} 83%|████████▎ | 4816/5773 [3:39:46<1:28:32, 5.55s/it] 83%|████████▎ | 4817/5773 [3:39:54<1:28:25, 5.55s/it] 83%|████████▎ | 4817/5773 [3:39:52<1:28:25, 5.55s/it] {'loss': 0.5594, 'learning_rate': 1.4045137412828036e-06, 'epoch': 0.83} 83%|████████▎ | 4817/5773 
[3:39:54<1:28:25, 5.55s/it] {'loss': 0.5594, 'learning_rate': 1.4045137412828036e-06, 'epoch': 0.83} 83%|████████▎ | 4817/5773 [3:39:52<1:28:25, 5.55s/it] 83%|████████▎ | 4818/5773 [3:39:59<1:28:05, 5.53s/it] 83%|████████▎ | 4818/5773 [3:39:57<1:28:05, 5.53s/it] {'loss': 0.556, 'learning_rate': 1.4016475763786109e-06, 'epoch': 0.83} 83%|████████▎ | 4818/5773 [3:39:59<1:28:05, 5.53s/it] {'loss': 0.556, 'learning_rate': 1.4016475763786109e-06, 'epoch': 0.83} 83%|████████▎ | 4818/5773 [3:39:57<1:28:05, 5.53s/it] 83%|████████▎ | 4819/5773 [3:40:05<1:27:41, 5.52s/it] 83%|████████▎ | 4819/5773 [3:40:03<1:27:41, 5.52s/it] {'loss': 0.5512, 'learning_rate': 1.3987841185104767e-06, 'epoch': 0.83} 83%|████████▎ | 4819/5773 [3:40:05<1:27:41, 5.52s/it] {'loss': 0.5512, 'learning_rate': 1.3987841185104767e-06, 'epoch': 0.83} 83%|████████▎ | 4819/5773 [3:40:03<1:27:41, 5.52s/it] 83%|████████▎ | 4820/5773 [3:40:10<1:26:33, 5.45s/it] 83%|████████▎ | 4820/5773 [3:40:08<1:26:33, 5.45s/it] {'loss': 0.5414, 'learning_rate': 1.3959233685799068e-06, 'epoch': 0.83} 83%|████████▎ | 4820/5773 [3:40:10<1:26:33, 5.45s/it] {'loss': 0.5414, 'learning_rate': 1.3959233685799068e-06, 'epoch': 0.83} 83%|████████▎ | 4820/5773 [3:40:08<1:26:33, 5.45s/it] 84%|████████▎ | 4821/5773 [3:40:15<1:26:34, 5.46s/it] 84%|████████▎ | 4821/5773 [3:40:13<1:26:34, 5.46s/it] {'loss': 0.5652, 'learning_rate': 1.3930653274875616e-06, 'epoch': 0.84} 84%|████████▎ | 4821/5773 [3:40:15<1:26:34, 5.46s/it] {'loss': 0.5652, 'learning_rate': 1.3930653274875616e-06, 'epoch': 0.84} 84%|████████▎ | 4821/5773 [3:40:13<1:26:34, 5.46s/it] 84%|████████▎ | 4822/5773 [3:40:21<1:26:29, 5.46s/it] 84%|████████▎ | 4822/5773 [3:40:19<1:26:29, 5.46s/it] {'loss': 0.5613, 'learning_rate': 1.3902099961332405e-06, 'epoch': 0.84} 84%|████████▎ | 4822/5773 [3:40:21<1:26:29, 5.46s/it] {'loss': 0.5613, 'learning_rate': 1.3902099961332405e-06, 'epoch': 0.84} 84%|████████▎ | 4822/5773 [3:40:19<1:26:29, 5.46s/it] 84%|████████▎ | 4823/5773 
[3:40:26<1:26:25, 5.46s/it] 84%|████████▎ | 4823/5773 [3:40:24<1:26:25, 5.46s/it] {'loss': 0.5658, 'learning_rate': 1.387357375415891e-06, 'epoch': 0.84} 84%|████████▎ | 4823/5773 [3:40:26<1:26:25, 5.46s/it] {'loss': 0.5658, 'learning_rate': 1.387357375415891e-06, 'epoch': 0.84} 84%|████████▎ | 4823/5773 [3:40:24<1:26:25, 5.46s/it] 84%|████████▎ | 4824/5773 [3:40:32<1:26:48, 5.49s/it] 84%|████████▎ | 4824/5773 [3:40:30<1:26:48, 5.49s/it] {'loss': 0.5369, 'learning_rate': 1.3845074662336134e-06, 'epoch': 0.84} 84%|████████▎ | 4824/5773 [3:40:32<1:26:48, 5.49s/it] {'loss': 0.5369, 'learning_rate': 1.3845074662336134e-06, 'epoch': 0.84} 84%|████████▎ | 4824/5773 [3:40:30<1:26:48, 5.49s/it] 84%|████████▎ | 4825/5773 [3:40:37<1:26:45, 5.49s/it] 84%|████████▎ | 4825/5773 [3:40:35<1:26:45, 5.49s/it] {'loss': 0.5687, 'learning_rate': 1.3816602694836501e-06, 'epoch': 0.84} 84%|████████▎ | 4825/5773 [3:40:37<1:26:45, 5.49s/it] {'loss': 0.5687, 'learning_rate': 1.3816602694836501e-06, 'epoch': 0.84} 84%|████████▎ | 4825/5773 [3:40:35<1:26:45, 5.49s/it] 84%|████████▎ | 4826/5773 [3:40:43<1:26:44, 5.50s/it] 84%|████████▎ | 4826/5773 [3:40:41<1:26:44, 5.50s/it] {'loss': 0.54, 'learning_rate': 1.3788157860623862e-06, 'epoch': 0.84} 84%|████████▎ | 4826/5773 [3:40:43<1:26:44, 5.50s/it] {'loss': 0.54, 'learning_rate': 1.3788157860623862e-06, 'epoch': 0.84} 84%|████████▎ | 4826/5773 [3:40:41<1:26:44, 5.50s/it] 84%|████████▎ | 4827/5773 [3:40:48<1:26:18, 5.47s/it] 84%|████████▎ | 4827/5773 [3:40:46<1:26:18, 5.47s/it] {'loss': 0.5723, 'learning_rate': 1.375974016865359e-06, 'epoch': 0.84} 84%|████████▎ | 4827/5773 [3:40:48<1:26:18, 5.47s/it] {'loss': 0.5723, 'learning_rate': 1.375974016865359e-06, 'epoch': 0.84} 84%|████████▎ | 4827/5773 [3:40:46<1:26:18, 5.47s/it] 84%|████████▎ | 4828/5773 [3:40:54<1:25:42, 5.44s/it] 84%|████████▎ | 4828/5773 [3:40:52<1:25:42, 5.44s/it] {'loss': 0.5467, 'learning_rate': 1.373134962787247e-06, 'epoch': 0.84} 84%|████████▎ | 4828/5773 [3:40:54<1:25:42, 
5.44s/it] {'loss': 0.5467, 'learning_rate': 1.373134962787247e-06, 'epoch': 0.84} 84%|████████▎ | 4828/5773 [3:40:52<1:25:42, 5.44s/it] 84%|████████▎ | 4829/5773 [3:40:59<1:26:52, 5.52s/it] 84%|████████▎ | 4829/5773 [3:40:57<1:26:52, 5.52s/it] {'loss': 0.557, 'learning_rate': 1.370298624721873e-06, 'epoch': 0.84} 84%|████████▎ | 4829/5773 [3:40:59<1:26:52, 5.52s/it] {'loss': 0.557, 'learning_rate': 1.370298624721873e-06, 'epoch': 0.84} 84%|████████▎ | 4829/5773 [3:40:57<1:26:52, 5.52s/it] 84%|████████▎ | 4830/5773 [3:41:05<1:26:29, 5.50s/it] 84%|████████▎ | 4830/5773 [3:41:03<1:26:29, 5.50s/it] {'loss': 0.5401, 'learning_rate': 1.3674650035622105e-06, 'epoch': 0.84} 84%|████████▎ | 4830/5773 [3:41:05<1:26:29, 5.50s/it] {'loss': 0.5401, 'learning_rate': 1.3674650035622105e-06, 'epoch': 0.84} 84%|████████▎ | 4830/5773 [3:41:03<1:26:29, 5.50s/it] 84%|████████▎ | 4831/5773 [3:41:10<1:26:15, 5.49s/it] 84%|████████▎ | 4831/5773 [3:41:08<1:26:15, 5.49s/it] {'loss': 0.5577, 'learning_rate': 1.3646341002003738e-06, 'epoch': 0.84} 84%|████████▎ | 4831/5773 [3:41:10<1:26:15, 5.49s/it] {'loss': 0.5577, 'learning_rate': 1.3646341002003738e-06, 'epoch': 0.84} 84%|████████▎ | 4831/5773 [3:41:08<1:26:15, 5.49s/it] 84%|████████▎ | 4832/5773 [3:41:16<1:25:39, 5.46s/it] 84%|████████▎ | 4832/5773 [3:41:14<1:25:39, 5.46s/it] {'loss': 0.5597, 'learning_rate': 1.3618059155276198e-06, 'epoch': 0.84} 84%|████████▎ | 4832/5773 [3:41:16<1:25:39, 5.46s/it] {'loss': 0.5597, 'learning_rate': 1.3618059155276198e-06, 'epoch': 0.84} 84%|████████▎ | 4832/5773 [3:41:14<1:25:39, 5.46s/it] 84%|████████▎ | 4833/5773 [3:41:21<1:25:18, 5.45s/it] 84%|████████▎ | 4833/5773 [3:41:19<1:25:18, 5.45s/it] {'loss': 0.5495, 'learning_rate': 1.3589804504343497e-06, 'epoch': 0.84} 84%|████████▎ | 4833/5773 [3:41:21<1:25:18, 5.45s/it] {'loss': 0.5495, 'learning_rate': 1.3589804504343497e-06, 'epoch': 0.84} 84%|████████▎ | 4833/5773 [3:41:19<1:25:18, 5.45s/it] 84%|████████▎ | 4834/5773 [3:41:26<1:25:16, 5.45s/it] 
84%|████████▎ | 4834/5773 [3:41:24<1:25:16, 5.45s/it] {'loss': 0.5675, 'learning_rate': 1.3561577058101138e-06, 'epoch': 0.84} 84%|████████▎ | 4834/5773 [3:41:26<1:25:16, 5.45s/it] {'loss': 0.5675, 'learning_rate': 1.3561577058101138e-06, 'epoch': 0.84} 84%|████████▎ | 4834/5773 [3:41:24<1:25:16, 5.45s/it] 84%|████████▍ | 4835/5773 [3:41:32<1:25:18, 5.46s/it] 84%|████████▍ | 4835/5773 [3:41:30<1:25:18, 5.46s/it] {'loss': 0.5591, 'learning_rate': 1.3533376825436017e-06, 'epoch': 0.84} 84%|████████▍ | 4835/5773 [3:41:32<1:25:18, 5.46s/it] {'loss': 0.5591, 'learning_rate': 1.3533376825436017e-06, 'epoch': 0.84} 84%|████████▍ | 4835/5773 [3:41:30<1:25:18, 5.46s/it] 84%|████████▍ | 4836/5773 [3:41:36<1:25:54, 5.50s/it] 84%|████████▍ | 4836/5773 [3:41:38<1:25:54, 5.50s/it] {'loss': 0.5607, 'learning_rate': 1.3505203815226443e-06, 'epoch': 0.84} 84%|████████▍ | 4836/5773 [3:41:36<1:25:54, 5.50s/it] {'loss': 0.5607, 'learning_rate': 1.3505203815226443e-06, 'epoch': 0.84} 84%|████████▍ | 4836/5773 [3:41:38<1:25:54, 5.50s/it] 84%|████████▍ | 4837/5773 [3:41:43<1:25:58, 5.51s/it] 84%|████████▍ | 4837/5773 [3:41:41<1:25:58, 5.51s/it] {'loss': 0.5476, 'learning_rate': 1.3477058036342216e-06, 'epoch': 0.84} 84%|████████▍ | 4837/5773 [3:41:43<1:25:58, 5.51s/it] {'loss': 0.5476, 'learning_rate': 1.3477058036342216e-06, 'epoch': 0.84} 84%|████████▍ | 4837/5773 [3:41:41<1:25:58, 5.51s/it] 84%|████████▍ | 4838/5773 [3:41:48<1:25:23, 5.48s/it] 84%|████████▍ | 4838/5773 [3:41:47<1:25:23, 5.48s/it] {'loss': 0.5643, 'learning_rate': 1.3448939497644508e-06, 'epoch': 0.84} 84%|████████▍ | 4838/5773 [3:41:48<1:25:23, 5.48s/it] {'loss': 0.5643, 'learning_rate': 1.3448939497644508e-06, 'epoch': 0.84} 84%|████████▍ | 4838/5773 [3:41:47<1:25:23, 5.48s/it] 84%|████████▍ | 4839/5773 [3:41:54<1:24:47, 5.45s/it] 84%|████████▍ | 4839/5773 [3:41:52<1:24:47, 5.45s/it] {'loss': 0.5572, 'learning_rate': 1.3420848207985937e-06, 'epoch': 0.84} 84%|████████▍ | 4839/5773 [3:41:54<1:24:47, 5.45s/it] {'loss': 
0.5572, 'learning_rate': 1.3420848207985937e-06, 'epoch': 0.84} 84%|████████▍ | 4839/5773 [3:41:52<1:24:47, 5.45s/it] 84%|████████▍ | 4840/5773 [3:41:57<1:24:47, 5.45s/it] 84%|████████▍ | 4840/5773 [3:41:59<1:24:47, 5.45s/it] {'loss': 0.5576, 'learning_rate': 1.3392784176210539e-06, 'epoch': 0.84} 84%|████████▍ | 4840/5773 [3:41:59<1:24:47, 5.45s/it] {'loss': 0.5576, 'learning_rate': 1.3392784176210539e-06, 'epoch': 0.84} 84%|████████▍ | 4840/5773 [3:41:57<1:24:47, 5.45s/it] 84%|████████▍ | 4841/5773 [3:42:03<1:24:59, 5.47s/it] 84%|████████▍ | 4841/5773 [3:42:05<1:24:59, 5.47s/it] {'loss': 0.5397, 'learning_rate': 1.3364747411153756e-06, 'epoch': 0.84} 84%|████████▍ | 4841/5773 [3:42:05<1:24:59, 5.47s/it] {'loss': 0.5397, 'learning_rate': 1.3364747411153756e-06, 'epoch': 0.84} 84%|████████▍ | 4841/5773 [3:42:03<1:24:59, 5.47s/it] 84%|████████▍ | 4842/5773 [3:42:11<1:25:59, 5.54s/it] 84%|████████▍ | 4842/5773 [3:42:09<1:25:59, 5.54s/it] {'loss': 0.5489, 'learning_rate': 1.333673792164244e-06, 'epoch': 0.84} 84%|████████▍ | 4842/5773 [3:42:11<1:25:59, 5.54s/it] {'loss': 0.5489, 'learning_rate': 1.333673792164244e-06, 'epoch': 0.84} 84%|████████▍ | 4842/5773 [3:42:09<1:25:59, 5.54s/it] 84%|████████▍ | 4843/5773 [3:42:16<1:26:03, 5.55s/it] 84%|████████▍ | 4843/5773 [3:42:14<1:26:03, 5.55s/it] {'loss': 0.5596, 'learning_rate': 1.3308755716494926e-06, 'epoch': 0.84} 84%|████████▍ | 4843/5773 [3:42:16<1:26:03, 5.55s/it] {'loss': 0.5596, 'learning_rate': 1.3308755716494926e-06, 'epoch': 0.84} 84%|████████▍ | 4843/5773 [3:42:14<1:26:03, 5.55s/it] 84%|████████▍ | 4844/5773 [3:42:22<1:25:33, 5.53s/it] 84%|████████▍ | 4844/5773 [3:42:20<1:25:33, 5.53s/it] {'loss': 0.5485, 'learning_rate': 1.3280800804520888e-06, 'epoch': 0.84} 84%|████████▍ | 4844/5773 [3:42:22<1:25:33, 5.53s/it] {'loss': 0.5485, 'learning_rate': 1.3280800804520888e-06, 'epoch': 0.84} 84%|████████▍ | 4844/5773 [3:42:20<1:25:33, 5.53s/it] 84%|████████▍ | 4845/5773 [3:42:27<1:26:19, 5.58s/it] 84%|████████▍ | 
4845/5773 [3:42:25<1:26:20, 5.58s/it] {'loss': 0.553, 'learning_rate': 1.32528731945214e-06, 'epoch': 0.84} 84%|████████▍ | 4845/5773 [3:42:27<1:26:19, 5.58s/it] {'loss': 0.553, 'learning_rate': 1.32528731945214e-06, 'epoch': 0.84} 84%|████████▍ | 4845/5773 [3:42:25<1:26:20, 5.58s/it] 84%|████████▍ | 4846/5773 [3:42:33<1:26:05, 5.57s/it] 84%|████████▍ | 4846/5773 [3:42:31<1:26:05, 5.57s/it] {'loss': 0.5524, 'learning_rate': 1.3224972895288967e-06, 'epoch': 0.84} 84%|████████▍ | 4846/5773 [3:42:33<1:26:05, 5.57s/it] {'loss': 0.5524, 'learning_rate': 1.3224972895288967e-06, 'epoch': 0.84} 84%|████████▍ | 4846/5773 [3:42:31<1:26:05, 5.57s/it] 84%|████████▍ | 4847/5773 [3:42:38<1:26:09, 5.58s/it] 84%|████████▍ | 4847/5773 [3:42:36<1:26:09, 5.58s/it] {'loss': 0.553, 'learning_rate': 1.3197099915607536e-06, 'epoch': 0.84} 84%|████████▍ | 4847/5773 [3:42:38<1:26:09, 5.58s/it] {'loss': 0.553, 'learning_rate': 1.3197099915607536e-06, 'epoch': 0.84} 84%|████████▍ | 4847/5773 [3:42:36<1:26:09, 5.58s/it] 84%|████████▍ | 4848/5773 [3:42:44<1:25:56, 5.58s/it] 84%|████████▍ | 4848/5773 [3:42:42<1:25:56, 5.58s/it] {'loss': 0.538, 'learning_rate': 1.3169254264252384e-06, 'epoch': 0.84} 84%|████████▍ | 4848/5773 [3:42:44<1:25:56, 5.58s/it] {'loss': 0.538, 'learning_rate': 1.3169254264252384e-06, 'epoch': 0.84} 84%|████████▍ | 4848/5773 [3:42:42<1:25:56, 5.58s/it] 84%|████████▍ | 4849/5773 [3:42:49<1:25:11, 5.53s/it] 84%|████████▍ | 4849/5773 [3:42:47<1:25:11, 5.53s/it] {'loss': 0.5595, 'learning_rate': 1.3141435949990188e-06, 'epoch': 0.84} 84%|████████▍ | 4849/5773 [3:42:49<1:25:11, 5.53s/it] {'loss': 0.5595, 'learning_rate': 1.3141435949990188e-06, 'epoch': 0.84} 84%|████████▍ | 4849/5773 [3:42:47<1:25:11, 5.53s/it]9 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 
7 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 84%|████████▍ | 4850/5773 [3:42:55<1:25:32, 5.56s/it] 14 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 84%|████████▍ | 4850/5773 [3:42:53<1:25:32, 5.56s/it] 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5351, 'learning_rate': 1.3113644981579088e-06, 'epoch': 0.84} 84%|████████▍ | 4850/5773 [3:42:55<1:25:32, 5.56s/it] {'loss': 0.5351, 'learning_rate': 1.3113644981579088e-06, 'epoch': 0.84} 84%|████████▍ | 4850/5773 [3:42:53<1:25:32, 5.56s/it] 84%|████████▍ | 4851/5773 [3:43:00<1:24:39, 5.51s/it] 84%|████████▍ | 4851/5773 [3:42:58<1:24:39, 5.51s/it] {'loss': 0.5567, 'learning_rate': 1.308588136776855e-06, 'epoch': 0.84} 84%|████████▍ | 4851/5773 [3:43:00<1:24:39, 5.51s/it] {'loss': 0.5567, 'learning_rate': 1.308588136776855e-06, 'epoch': 0.84} 84%|████████▍ | 4851/5773 [3:42:58<1:24:39, 5.51s/it] 84%|████████▍ | 4852/5773 [3:43:06<1:24:00, 5.47s/it] 84%|████████▍ | 4852/5773 [3:43:04<1:24:00, 5.47s/it] {'loss': 0.5635, 'learning_rate': 1.3058145117299436e-06, 'epoch': 0.84} 84%|████████▍ | 4852/5773 [3:43:06<1:24:00, 5.47s/it] {'loss': 0.5635, 'learning_rate': 1.3058145117299436e-06, 'epoch': 0.84} 84%|████████▍ | 4852/5773 [3:43:04<1:24:00, 5.47s/it] 84%|████████▍ | 4853/5773 [3:43:11<1:24:43, 5.53s/it] 84%|████████▍ | 4853/5773 [3:43:10<1:24:43, 5.53s/it] {'loss': 0.5518, 'learning_rate': 1.3030436238903998e-06, 'epoch': 0.84} 84%|████████▍ | 4853/5773 [3:43:11<1:24:43, 5.53s/it] {'loss': 0.5518, 'learning_rate': 1.3030436238903998e-06, 'epoch': 0.84} 84%|████████▍ | 4853/5773 [3:43:10<1:24:43, 5.53s/it] 84%|████████▍ 
| 4854/5773 [3:43:15<1:24:42, 5.53s/it] 84%|████████▍ | 4854/5773 [3:43:17<1:24:43, 5.53s/it] {'loss': 0.5587, 'learning_rate': 1.3002754741305935e-06, 'epoch': 0.84} 84%|████████▍ | 4854/5773 [3:43:17<1:24:43, 5.53s/it] {'loss': 0.5587, 'learning_rate': 1.3002754741305935e-06, 'epoch': 0.84} 84%|████████▍ | 4854/5773 [3:43:15<1:24:42, 5.53s/it] 84%|████████▍ | 4855/5773 [3:43:23<1:25:54, 5.62s/it] 84%|████████▍ | 4855/5773 [3:43:21<1:25:54, 5.62s/it] {'loss': 0.5532, 'learning_rate': 1.2975100633220183e-06, 'epoch': 0.84} 84%|████████▍ | 4855/5773 [3:43:23<1:25:54, 5.62s/it] {'loss': 0.5532, 'learning_rate': 1.2975100633220183e-06, 'epoch': 0.84} 84%|████████▍ | 4855/5773 [3:43:21<1:25:54, 5.62s/it] 84%|████████▍ | 4856/5773 [3:43:28<1:24:41, 5.54s/it] 84%|████████▍ | 4856/5773 [3:43:26<1:24:41, 5.54s/it] {'loss': 0.5494, 'learning_rate': 1.2947473923353194e-06, 'epoch': 0.84} 84%|████████▍ | 4856/5773 [3:43:28<1:24:41, 5.54s/it] {'loss': 0.5494, 'learning_rate': 1.2947473923353194e-06, 'epoch': 0.84} 84%|████████▍ | 4856/5773 [3:43:26<1:24:41, 5.54s/it] 84%|████████▍ | 4857/5773 [3:43:34<1:24:10, 5.51s/it] 84%|████████▍ | 4857/5773 [3:43:32<1:24:10, 5.51s/it] {'loss': 0.5633, 'learning_rate': 1.2919874620402727e-06, 'epoch': 0.84} 84%|████████▍ | 4857/5773 [3:43:34<1:24:10, 5.51s/it] {'loss': 0.5633, 'learning_rate': 1.2919874620402727e-06, 'epoch': 0.84} 84%|████████▍ | 4857/5773 [3:43:32<1:24:10, 5.51s/it] 84%|████████▍ | 4858/5773 [3:43:39<1:24:12, 5.52s/it] 84%|████████▍ | 4858/5773 [3:43:37<1:24:12, 5.52s/it] {'loss': 0.5679, 'learning_rate': 1.2892302733057915e-06, 'epoch': 0.84} 84%|████████▍ | 4858/5773 [3:43:39<1:24:12, 5.52s/it] {'loss': 0.5679, 'learning_rate': 1.2892302733057915e-06, 'epoch': 0.84} 84%|████████▍ | 4858/5773 [3:43:37<1:24:12, 5.52s/it] 84%|████████▍ | 4859/5773 [3:43:45<1:24:00, 5.52s/it] 84%|████████▍ | 4859/5773 [3:43:43<1:24:00, 5.52s/it] {'loss': 0.5745, 'learning_rate': 1.2864758269999256e-06, 'epoch': 0.84} 84%|████████▍ | 
4859/5773 [3:43:45<1:24:00, 5.52s/it] {'loss': 0.5745, 'learning_rate': 1.2864758269999256e-06, 'epoch': 0.84} 84%|████████▍ | 4859/5773 [3:43:43<1:24:00, 5.52s/it] 84%|████████▍ | 4860/5773 [3:43:50<1:22:54, 5.45s/it] 84%|████████▍ | 4860/5773 [3:43:48<1:22:54, 5.45s/it] {'loss': 0.5675, 'learning_rate': 1.2837241239898669e-06, 'epoch': 0.84} 84%|████████▍ | 4860/5773 [3:43:50<1:22:54, 5.45s/it] {'loss': 0.5675, 'learning_rate': 1.2837241239898669e-06, 'epoch': 0.84} 84%|████████▍ | 4860/5773 [3:43:48<1:22:54, 5.45s/it] 84%|████████▍ | 4861/5773 [3:43:56<1:24:35, 5.57s/it] 84%|████████▍ | 4861/5773 [3:43:54<1:24:35, 5.57s/it] {'loss': 0.5636, 'learning_rate': 1.2809751651419366e-06, 'epoch': 0.84} 84%|████████▍ | 4861/5773 [3:43:56<1:24:35, 5.57s/it] {'loss': 0.5636, 'learning_rate': 1.2809751651419366e-06, 'epoch': 0.84} 84%|████████▍ | 4861/5773 [3:43:54<1:24:35, 5.57s/it] 84%|████████▍ | 4862/5773 [3:44:01<1:24:36, 5.57s/it] 84%|████████▍ | 4862/5773 [3:43:59<1:24:36, 5.57s/it] {'loss': 0.5412, 'learning_rate': 1.278228951321594e-06, 'epoch': 0.84} 84%|████████▍ | 4862/5773 [3:44:01<1:24:36, 5.57s/it] {'loss': 0.5412, 'learning_rate': 1.278228951321594e-06, 'epoch': 0.84} 84%|████████▍ | 4862/5773 [3:43:59<1:24:36, 5.57s/it] 84%|████████▍ | 4863/5773 [3:44:07<1:24:05, 5.54s/it] 84%|████████▍ | 4863/5773 [3:44:05<1:24:05, 5.54s/it] {'loss': 0.5496, 'learning_rate': 1.2754854833934371e-06, 'epoch': 0.84} 84%|████████▍ | 4863/5773 [3:44:07<1:24:05, 5.54s/it] {'loss': 0.5496, 'learning_rate': 1.2754854833934371e-06, 'epoch': 0.84} 84%|████████▍ | 4863/5773 [3:44:05<1:24:05, 5.54s/it] 84%|████████▍ | 4864/5773 [3:44:11<1:24:53, 5.60s/it] 84%|████████▍ | 4864/5773 [3:44:13<1:24:53, 5.60s/it] {'loss': 0.5507, 'learning_rate': 1.2727447622211974e-06, 'epoch': 0.84} 84%|████████▍ | 4864/5773 [3:44:13<1:24:53, 5.60s/it] {'loss': 0.5507, 'learning_rate': 1.2727447622211974e-06, 'epoch': 0.84} 84%|████████▍ | 4864/5773 [3:44:11<1:24:53, 5.60s/it] 84%|████████▍ | 4865/5773 
[3:44:16<1:25:11, 5.63s/it] 84%|████████▍ | 4865/5773 [3:44:18<1:25:11, 5.63s/it] {'loss': 0.5522, 'learning_rate': 1.270006788667738e-06, 'epoch': 0.84} 84%|████████▍ | 4865/5773 [3:44:18<1:25:11, 5.63s/it] {'loss': 0.5522, 'learning_rate': 1.270006788667738e-06, 'epoch': 0.84} 84%|████████▍ | 4865/5773 [3:44:16<1:25:11, 5.63s/it] 84%|████████▍ | 4866/5773 [3:44:24<1:24:03, 5.56s/it] 84%|████████▍ | 4866/5773 [3:44:22<1:24:03, 5.56s/it] {'loss': 0.5616, 'learning_rate': 1.267271563595065e-06, 'epoch': 0.84} 84%|████████▍ | 4866/5773 [3:44:24<1:24:03, 5.56s/it] {'loss': 0.5616, 'learning_rate': 1.267271563595065e-06, 'epoch': 0.84} 84%|████████▍ | 4866/5773 [3:44:22<1:24:03, 5.56s/it] 84%|████████▍ | 4867/5773 [3:44:27<1:24:24, 5.59s/it] 84%|████████▍ | 4867/5773 [3:44:29<1:24:24, 5.59s/it] {'loss': 0.5577, 'learning_rate': 1.2645390878643128e-06, 'epoch': 0.84} 84%|████████▍ | 4867/5773 [3:44:29<1:24:24, 5.59s/it] {'loss': 0.5577, 'learning_rate': 1.2645390878643128e-06, 'epoch': 0.84} 84%|████████▍ | 4867/5773 [3:44:27<1:24:24, 5.59s/it] 84%|████████▍ | 4868/5773 [3:44:33<1:23:57, 5.57s/it] 84%|████████▍ | 4868/5773 [3:44:35<1:23:57, 5.57s/it] {'loss': 0.5504, 'learning_rate': 1.2618093623357507e-06, 'epoch': 0.84} 84%|████████▍ | 4868/5773 [3:44:35<1:23:57, 5.57s/it] {'loss': 0.5504, 'learning_rate': 1.2618093623357507e-06, 'epoch': 0.84} 84%|████████▍ | 4868/5773 [3:44:33<1:23:57, 5.57s/it] 84%|████████▍ | 4869/5773 [3:44:40<1:23:13, 5.52s/it] 84%|████████▍ | 4869/5773 [3:44:38<1:23:13, 5.52s/it] {'loss': 0.5573, 'learning_rate': 1.2590823878687853e-06, 'epoch': 0.84} 84%|████████▍ | 4869/5773 [3:44:40<1:23:13, 5.52s/it] {'loss': 0.5573, 'learning_rate': 1.2590823878687853e-06, 'epoch': 0.84} 84%|████████▍ | 4869/5773 [3:44:38<1:23:13, 5.52s/it] 84%|████████▍ | 4870/5773 [3:44:44<1:22:54, 5.51s/it] 84%|████████▍ | 4870/5773 [3:44:46<1:22:54, 5.51s/it] {'loss': 0.5688, 'learning_rate': 1.2563581653219548e-06, 'epoch': 0.84} 84%|████████▍ | 4870/5773 
[3:44:46<1:22:54, 5.51s/it] {'loss': 0.5688, 'learning_rate': 1.2563581653219548e-06, 'epoch': 0.84} 84%|████████▍ | 4870/5773 [3:44:44<1:22:54, 5.51s/it] 84%|████████▍ | 4871/5773 [3:44:49<1:22:46, 5.51s/it] 84%|████████▍ | 4871/5773 [3:44:51<1:22:46, 5.51s/it] {'loss': 0.5897, 'learning_rate': 1.253636695552931e-06, 'epoch': 0.84} 84%|████████▍ | 4871/5773 [3:44:51<1:22:46, 5.51s/it] {'loss': 0.5897, 'learning_rate': 1.253636695552931e-06, 'epoch': 0.84} 84%|████████▍ | 4871/5773 [3:44:49<1:22:46, 5.51s/it] 84%|████████▍ | 4872/5773 [3:44:55<1:22:51, 5.52s/it] 84%|████████▍ | 4872/5773 [3:44:57<1:22:51, 5.52s/it] {'loss': 0.5649, 'learning_rate': 1.2509179794185188e-06, 'epoch': 0.84} 84%|████████▍ | 4872/5773 [3:44:57<1:22:51, 5.52s/it] {'loss': 0.5649, 'learning_rate': 1.2509179794185188e-06, 'epoch': 0.84} 84%|████████▍ | 4872/5773 [3:44:55<1:22:51, 5.52s/it] 84%|████████▍ | 4873/5773 [3:45:00<1:22:11, 5.48s/it] 84%|████████▍ | 4873/5773 [3:45:02<1:22:11, 5.48s/it] {'loss': 0.5585, 'learning_rate': 1.248202017774659e-06, 'epoch': 0.84} 84%|████████▍ | 4873/5773 [3:45:02<1:22:11, 5.48s/it] {'loss': 0.5585, 'learning_rate': 1.248202017774659e-06, 'epoch': 0.84} 84%|████████▍ | 4873/5773 [3:45:00<1:22:11, 5.48s/it] 84%|████████▍ | 4874/5773 [3:45:06<1:22:36, 5.51s/it] 84%|████████▍ | 4874/5773 [3:45:08<1:22:36, 5.51s/it] {'loss': 0.5678, 'learning_rate': 1.2454888114764219e-06, 'epoch': 0.84} 84%|████████▍ | 4874/5773 [3:45:08<1:22:36, 5.51s/it] {'loss': 0.5678, 'learning_rate': 1.2454888114764219e-06, 'epoch': 0.84} 84%|████████▍ | 4874/5773 [3:45:06<1:22:36, 5.51s/it] 84%|████████▍ | 4875/5773 [3:45:11<1:22:03, 5.48s/it] 84%|████████▍ | 4875/5773 [3:45:13<1:22:03, 5.48s/it] {'loss': 0.5458, 'learning_rate': 1.2427783613780098e-06, 'epoch': 0.84} 84%|████████▍ | 4875/5773 [3:45:13<1:22:03, 5.48s/it] {'loss': 0.5458, 'learning_rate': 1.2427783613780098e-06, 'epoch': 0.84} 84%|████████▍ | 4875/5773 [3:45:11<1:22:03, 5.48s/it] 84%|████████▍ | 4876/5773 
[3:45:17<1:22:17, 5.50s/it] 84%|████████▍ | 4876/5773 [3:45:19<1:22:17, 5.50s/it] {'loss': 0.5567, 'learning_rate': 1.2400706683327636e-06, 'epoch': 0.84} 84%|████████▍ | 4876/5773 [3:45:19<1:22:17, 5.50s/it] {'loss': 0.5567, 'learning_rate': 1.2400706683327636e-06, 'epoch': 0.84} 84%|████████▍ | 4876/5773 [3:45:17<1:22:17, 5.50s/it] 84%|████████▍ | 4877/5773 [3:45:22<1:21:44, 5.47s/it] 84%|████████▍ | 4877/5773 [3:45:24<1:21:44, 5.47s/it] {'loss': 0.5482, 'learning_rate': 1.2373657331931478e-06, 'epoch': 0.84} 84%|████████▍ | 4877/5773 [3:45:24<1:21:44, 5.47s/it] {'loss': 0.5482, 'learning_rate': 1.2373657331931478e-06, 'epoch': 0.84} 84%|████████▍ | 4877/5773 [3:45:22<1:21:44, 5.47s/it] 84%|████████▍ | 4878/5773 [3:45:28<1:21:22, 5.46s/it] 84%|████████▍ | 4878/5773 [3:45:30<1:21:22, 5.46s/it] {'loss': 0.5651, 'learning_rate': 1.2346635568107623e-06, 'epoch': 0.84} 84%|████████▍ | 4878/5773 [3:45:30<1:21:22, 5.46s/it] {'loss': 0.5651, 'learning_rate': 1.2346635568107623e-06, 'epoch': 0.84} 84%|████████▍ | 4878/5773 [3:45:28<1:21:22, 5.46s/it] 85%|████████▍ | 4879/5773 [3:45:35<1:22:27, 5.53s/it] 85%|████████▍ | 4879/5773 [3:45:33<1:22:27, 5.53s/it] {'loss': 0.5405, 'learning_rate': 1.2319641400363413e-06, 'epoch': 0.85} 85%|████████▍ | 4879/5773 [3:45:35<1:22:27, 5.53s/it] {'loss': 0.5405, 'learning_rate': 1.2319641400363413e-06, 'epoch': 0.85} 85%|████████▍ | 4879/5773 [3:45:33<1:22:27, 5.53s/it] 85%|████████▍ | 4880/5773 [3:45:39<1:22:41, 5.56s/it] 85%|████████▍ | 4880/5773 [3:45:41<1:22:41, 5.56s/it] {'loss': 0.5662, 'learning_rate': 1.229267483719746e-06, 'epoch': 0.85} 85%|████████▍ | 4880/5773 [3:45:41<1:22:41, 5.56s/it] {'loss': 0.5662, 'learning_rate': 1.229267483719746e-06, 'epoch': 0.85} 85%|████████▍ | 4880/5773 [3:45:39<1:22:41, 5.56s/it] 85%|████████▍ | 4881/5773 [3:45:44<1:21:51, 5.51s/it] 85%|████████▍ | 4881/5773 [3:45:46<1:21:51, 5.51s/it] {'loss': 0.5599, 'learning_rate': 1.2265735887099694e-06, 'epoch': 0.85} 85%|████████▍ | 4881/5773 
[3:45:46<1:21:51, 5.51s/it] {'loss': 0.5599, 'learning_rate': 1.2265735887099694e-06, 'epoch': 0.85} 85%|████████▍ | 4881/5773 [3:45:44<1:21:51, 5.51s/it] 85%|████████▍ | 4882/5773 [3:45:50<1:22:05, 5.53s/it] 85%|████████▍ | 4882/5773 [3:45:52<1:22:05, 5.53s/it] {'loss': 0.5759, 'learning_rate': 1.2238824558551365e-06, 'epoch': 0.85} 85%|████████▍ | 4882/5773 [3:45:52<1:22:05, 5.53s/it] {'loss': 0.5759, 'learning_rate': 1.2238824558551365e-06, 'epoch': 0.85} 85%|████████▍ | 4882/5773 [3:45:50<1:22:05, 5.53s/it] 85%|████████▍ | 4883/5773 [3:45:55<1:22:06, 5.54s/it] 85%|████████▍ | 4883/5773 [3:45:57<1:22:06, 5.54s/it] {'loss': 0.564, 'learning_rate': 1.2211940860025017e-06, 'epoch': 0.85} 85%|████████▍ | 4883/5773 [3:45:57<1:22:06, 5.54s/it] {'loss': 0.564, 'learning_rate': 1.2211940860025017e-06, 'epoch': 0.85} 85%|████████▍ | 4883/5773 [3:45:55<1:22:06, 5.54s/it] 85%|████████▍ | 4884/5773 [3:46:01<1:21:51, 5.53s/it] 85%|████████▍ | 4884/5773 [3:46:03<1:21:51, 5.53s/it] {'loss': 0.5669, 'learning_rate': 1.218508479998447e-06, 'epoch': 0.85} 85%|████████▍ | 4884/5773 [3:46:03<1:21:51, 5.53s/it] {'loss': 0.5669, 'learning_rate': 1.218508479998447e-06, 'epoch': 0.85} 85%|████████▍ | 4884/5773 [3:46:01<1:21:51, 5.53s/it] 85%|████████▍ | 4885/5773 [3:46:07<1:22:41, 5.59s/it] 85%|████████▍ | 4885/5773 [3:46:09<1:22:41, 5.59s/it] {'loss': 0.5448, 'learning_rate': 1.2158256386884926e-06, 'epoch': 0.85} 85%|████████▍ | 4885/5773 [3:46:09<1:22:41, 5.59s/it] {'loss': 0.5448, 'learning_rate': 1.2158256386884926e-06, 'epoch': 0.85} 85%|████████▍ | 4885/5773 [3:46:07<1:22:41, 5.59s/it] 85%|████████▍ | 4886/5773 [3:46:12<1:22:04, 5.55s/it] 85%|████████▍ | 4886/5773 [3:46:14<1:22:04, 5.55s/it] {'loss': 0.5603, 'learning_rate': 1.2131455629172784e-06, 'epoch': 0.85} 85%|████████▍ | 4886/5773 [3:46:14<1:22:04, 5.55s/it] {'loss': 0.5603, 'learning_rate': 1.2131455629172784e-06, 'epoch': 0.85} 85%|████████▍ | 4886/5773 [3:46:12<1:22:04, 5.55s/it] 85%|████████▍ | 4887/5773 
[3:46:18<1:22:02, 5.56s/it] 85%|████████▍ | 4887/5773 [3:46:20<1:22:01, 5.56s/it] {'loss': 0.5397, 'learning_rate': 1.2104682535285795e-06, 'epoch': 0.85} 85%|████████▍ | 4887/5773 [3:46:20<1:22:01, 5.56s/it] {'loss': 0.5397, 'learning_rate': 1.2104682535285795e-06, 'epoch': 0.85} 85%|████████▍ | 4887/5773 [3:46:18<1:22:02, 5.56s/it] 85%|████████▍ | 4888/5773 [3:46:23<1:21:32, 5.53s/it] 85%|████████▍ | 4888/5773 [3:46:25<1:21:32, 5.53s/it] {'loss': 0.5778, 'learning_rate': 1.207793711365296e-06, 'epoch': 0.85} 85%|████████▍ | 4888/5773 [3:46:25<1:21:32, 5.53s/it] {'loss': 0.5778, 'learning_rate': 1.207793711365296e-06, 'epoch': 0.85} 85%|████████▍ | 4888/5773 [3:46:23<1:21:32, 5.53s/it] 85%|████████▍ | 4889/5773 [3:46:29<1:20:48, 5.48s/it] 85%|████████▍ | 4889/5773 [3:46:31<1:20:48, 5.48s/it] {'loss': 0.5563, 'learning_rate': 1.2051219372694634e-06, 'epoch': 0.85} 85%|████████▍ | 4889/5773 [3:46:31<1:20:48, 5.48s/it] {'loss': 0.5563, 'learning_rate': 1.2051219372694634e-06, 'epoch': 0.85} 85%|████████▍ | 4889/5773 [3:46:29<1:20:48, 5.48s/it] 85%|████████▍ | 4890/5773 [3:46:34<1:20:25, 5.46s/it] 85%|████████▍ | 4890/5773 [3:46:36<1:20:25, 5.46s/it] {'loss': 0.5551, 'learning_rate': 1.2024529320822398e-06, 'epoch': 0.85} 85%|████████▍ | 4890/5773 [3:46:36<1:20:25, 5.46s/it] {'loss': 0.5551, 'learning_rate': 1.2024529320822398e-06, 'epoch': 0.85} 85%|████████▍ | 4890/5773 [3:46:34<1:20:25, 5.46s/it] 85%|████████▍ | 4891/5773 [3:46:41<1:20:19, 5.46s/it] 85%|████████▍ | 4891/5773 [3:46:39<1:20:19, 5.46s/it] {'loss': 0.5447, 'learning_rate': 1.1997866966439097e-06, 'epoch': 0.85} 85%|████████▍ | 4891/5773 [3:46:41<1:20:19, 5.46s/it] {'loss': 0.5447, 'learning_rate': 1.1997866966439097e-06, 'epoch': 0.85} 85%|████████▍ | 4891/5773 [3:46:39<1:20:19, 5.46s/it] 85%|████████▍ | 4892/5773 [3:46:45<1:20:06, 5.46s/it] 85%|████████▍ | 4892/5773 [3:46:47<1:20:06, 5.46s/it] {'loss': 0.5701, 'learning_rate': 1.1971232317938952e-06, 'epoch': 0.85} 85%|████████▍ | 4892/5773 
[3:46:47<1:20:06, 5.46s/it] {'loss': 0.5701, 'learning_rate': 1.1971232317938952e-06, 'epoch': 0.85} 85%|████████▍ | 4892/5773 [3:46:45<1:20:06, 5.46s/it] 85%|████████▍ | 4893/5773 [3:46:53<1:20:55, 5.52s/it] 85%|████████▍ | 4893/5773 [3:46:51<1:20:55, 5.52s/it] {'loss': 0.553, 'learning_rate': 1.1944625383707377e-06, 'epoch': 0.85} 85%|████████▍ | 4893/5773 [3:46:53<1:20:55, 5.52s/it] {'loss': 0.553, 'learning_rate': 1.1944625383707377e-06, 'epoch': 0.85} 85%|████████▍ | 4893/5773 [3:46:51<1:20:55, 5.52s/it] 85%|████████▍ | 4894/5773 [3:46:58<1:20:10, 5.47s/it] 85%|████████▍ | 4894/5773 [3:46:56<1:20:10, 5.47s/it] {'loss': 0.5682, 'learning_rate': 1.1918046172121067e-06, 'epoch': 0.85} 85%|████████▍ | 4894/5773 [3:46:58<1:20:10, 5.47s/it] {'loss': 0.5682, 'learning_rate': 1.1918046172121067e-06, 'epoch': 0.85} 85%|████████▍ | 4894/5773 [3:46:56<1:20:10, 5.47s/it] 85%|████████▍ | 4895/5773 [3:47:01<1:20:00, 5.47s/it] 85%|████████▍ | 4895/5773 [3:47:03<1:20:00, 5.47s/it] {'loss': 0.5642, 'learning_rate': 1.1891494691548012e-06, 'epoch': 0.85} 85%|████████▍ | 4895/5773 [3:47:03<1:20:00, 5.47s/it] {'loss': 0.5642, 'learning_rate': 1.1891494691548012e-06, 'epoch': 0.85} 85%|████████▍ | 4895/5773 [3:47:01<1:20:00, 5.47s/it] 85%|████████▍ | 4896/5773 [3:47:07<1:20:34, 5.51s/it] 85%|████████▍ | 4896/5773 [3:47:09<1:20:34, 5.51s/it] {'loss': 0.5587, 'learning_rate': 1.186497095034752e-06, 'epoch': 0.85} 85%|████████▍ | 4896/5773 [3:47:09<1:20:34, 5.51s/it] {'loss': 0.5587, 'learning_rate': 1.186497095034752e-06, 'epoch': 0.85} 85%|████████▍ | 4896/5773 [3:47:07<1:20:34, 5.51s/it] 85%|████████▍ | 4897/5773 [3:47:13<1:20:35, 5.52s/it] 85%|████████▍ | 4897/5773 [3:47:15<1:20:36, 5.52s/it] {'loss': 0.5599, 'learning_rate': 1.1838474956870016e-06, 'epoch': 0.85} 85%|████████▍ | 4897/5773 [3:47:15<1:20:36, 5.52s/it] {'loss': 0.5599, 'learning_rate': 1.1838474956870016e-06, 'epoch': 0.85} 85%|████████▍ | 4897/5773 [3:47:13<1:20:35, 5.52s/it] 85%|████████▍ | 4898/5773 
[3:47:20<1:20:36, 5.53s/it] 85%|████████▍ | 4898/5773 [3:47:18<1:20:36, 5.53s/it] {'loss': 0.5568, 'learning_rate': 1.1812006719457369e-06, 'epoch': 0.85} 85%|████████▍ | 4898/5773 [3:47:20<1:20:36, 5.53s/it] {'loss': 0.5568, 'learning_rate': 1.1812006719457369e-06, 'epoch': 0.85} 85%|████████▍ | 4898/5773 [3:47:18<1:20:36, 5.53s/it] 85%|████████▍ | 4899/5773 [3:47:23<1:19:43, 5.47s/it] 85%|████████▍ | 4899/5773 [3:47:25<1:19:43, 5.47s/it] {'loss': 0.5538, 'learning_rate': 1.1785566246442582e-06, 'epoch': 0.85} 85%|████████▍ | 4899/5773 [3:47:25<1:19:43, 5.47s/it] {'loss': 0.5538, 'learning_rate': 1.1785566246442582e-06, 'epoch': 0.85} 85%|████████▍ | 4899/5773 [3:47:23<1:19:43, 5.47s/it]9 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 12 11AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 14 15AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 810 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 85%|████████▍ | 4900/5773 [3:47:31<1:20:40, 5.54s/it]AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 85%|████████▍ | 4900/5773 [3:47:29<1:20:40, 5.54s/it]6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.549, 'learning_rate': 1.1759153546149983e-06, 'epoch': 0.85}
85%|████████▍ | 4900/5773 [3:47:31<1:20:40, 5.54s/it]
{'loss': 0.549, 'learning_rate': 1.1759153546149983e-06, 'epoch': 0.85}
85%|████████▍ | 4900/5773 [3:47:29<1:20:40, 5.54s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4900/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4900/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-4900/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 85%|████████▍ | 4901/5773 [3:47:49<2:21:15, 9.72s/it] 85%|████████▍ | 4901/5773 [3:47:51<2:21:15, 9.72s/it] {'loss': 0.5716, 'learning_rate': 1.1732768626895098e-06, 'epoch': 0.85} 85%|████████▍ | 4901/5773 [3:47:51<2:21:15, 9.72s/it] {'loss': 0.5716, 'learning_rate': 1.1732768626895098e-06, 'epoch': 0.85} 85%|████████▍ | 4901/5773 [3:47:49<2:21:15, 9.72s/it] 85%|████████▍ | 4902/5773 [3:47:54<2:02:19, 8.43s/it] 85%|████████▍ | 4902/5773 [3:47:56<2:02:19, 8.43s/it] {'loss': 0.5574, 'learning_rate': 1.1706411496984781e-06, 'epoch': 0.85} 85%|████████▍ | 4902/5773 [3:47:56<2:02:19, 8.43s/it] {'loss': 0.5574, 'learning_rate': 1.1706411496984781e-06, 'epoch': 0.85} 85%|████████▍ | 4902/5773 [3:47:54<2:02:19, 8.43s/it] 85%|████████▍ | 4903/5773 [3:47:59<1:49:09, 7.53s/it] 85%|████████▍ | 4903/5773 [3:48:01<1:49:09, 7.53s/it] {'loss': 0.5304, 'learning_rate': 1.16800821647171e-06, 'epoch': 0.85} 85%|████████▍ | 4903/5773 [3:48:01<1:49:09, 7.53s/it] {'loss': 0.5304, 'learning_rate': 1.16800821647171e-06, 'epoch': 0.85} 85%|████████▍ | 4903/5773 [3:47:59<1:49:09, 7.53s/it] 85%|████████▍ | 4904/5773 [3:48:05<1:40:52, 6.96s/it] 85%|████████▍ | 4904/5773 [3:48:07<1:40:52, 6.97s/it] {'loss': 0.5594, 'learning_rate': 1.165378063838133e-06, 'epoch': 0.85} 85%|████████▍ | 4904/5773 [3:48:07<1:40:52, 6.97s/it] {'loss': 0.5594, 'learning_rate': 1.165378063838133e-06, 'epoch': 0.85} 85%|████████▍ | 4904/5773 [3:48:05<1:40:52, 6.96s/it] 85%|████████▍ | 4905/5773 [3:48:10<1:33:44, 6.48s/it] 85%|████████▍ | 4905/5773 [3:48:12<1:33:44, 6.48s/it] {'loss': 0.5423, 'learning_rate': 1.162750692625807e-06, 'epoch': 0.85} 85%|████████▍ | 4905/5773 [3:48:12<1:33:44, 6.48s/it] {'loss': 0.5423, 'learning_rate': 1.162750692625807e-06, 'epoch': 0.85} 85%|████████▍ | 4905/5773 [3:48:10<1:33:44, 6.48s/it] 85%|████████▍ | 4906/5773 [3:48:16<1:29:40, 6.21s/it] 85%|████████▍ | 4906/5773 [3:48:18<1:29:40, 6.21s/it] {'loss': 0.5469, 'learning_rate': 1.1601261036619116e-06, 'epoch': 0.85} 
85%|████████▍ | 4906/5773 [3:48:18<1:29:40, 6.21s/it] {'loss': 0.5469, 'learning_rate': 1.1601261036619116e-06, 'epoch': 0.85} 85%|████████▍ | 4906/5773 [3:48:16<1:29:40, 6.21s/it] 85%|████████▍ | 4907/5773 [3:48:22<1:27:17, 6.05s/it] 85%|████████▍ | 4907/5773 [3:48:24<1:27:17, 6.05s/it] {'loss': 0.5725, 'learning_rate': 1.1575042977727513e-06, 'epoch': 0.85} 85%|████████▍ | 4907/5773 [3:48:24<1:27:17, 6.05s/it] {'loss': 0.5725, 'learning_rate': 1.1575042977727513e-06, 'epoch': 0.85} 85%|████████▍ | 4907/5773 [3:48:22<1:27:17, 6.05s/it] 85%|████████▌ | 4908/5773 [3:48:27<1:25:13, 5.91s/it] 85%|████████▌ | 4908/5773 [3:48:29<1:25:13, 5.91s/it] {'loss': 0.5578, 'learning_rate': 1.1548852757837513e-06, 'epoch': 0.85} 85%|████████▌ | 4908/5773 [3:48:29<1:25:13, 5.91s/it] {'loss': 0.5578, 'learning_rate': 1.1548852757837513e-06, 'epoch': 0.85} 85%|████████▌ | 4908/5773 [3:48:27<1:25:13, 5.91s/it] 85%|████████▌ | 4909/5773 [3:48:33<1:23:03, 5.77s/it] 85%|████████▌ | 4909/5773 [3:48:35<1:23:03, 5.77s/it] {'loss': 0.5583, 'learning_rate': 1.152269038519469e-06, 'epoch': 0.85} 85%|████████▌ | 4909/5773 [3:48:35<1:23:03, 5.77s/it] {'loss': 0.5583, 'learning_rate': 1.152269038519469e-06, 'epoch': 0.85} 85%|████████▌ | 4909/5773 [3:48:33<1:23:03, 5.77s/it] 85%|████████▌ | 4910/5773 [3:48:38<1:21:49, 5.69s/it] 85%|████████▌ | 4910/5773 [3:48:40<1:21:49, 5.69s/it] {'loss': 0.5694, 'learning_rate': 1.1496555868035753e-06, 'epoch': 0.85} 85%|████████▌ | 4910/5773 [3:48:40<1:21:49, 5.69s/it] {'loss': 0.5694, 'learning_rate': 1.1496555868035753e-06, 'epoch': 0.85} 85%|████████▌ | 4910/5773 [3:48:38<1:21:49, 5.69s/it] 85%|████████▌ | 4911/5773 [3:48:44<1:21:46, 5.69s/it] 85%|████████▌ | 4911/5773 [3:48:46<1:21:46, 5.69s/it] {'loss': 0.5476, 'learning_rate': 1.1470449214588697e-06, 'epoch': 0.85} 85%|████████▌ | 4911/5773 [3:48:46<1:21:46, 5.69s/it] {'loss': 0.5476, 'learning_rate': 1.1470449214588697e-06, 'epoch': 0.85} 85%|████████▌ | 4911/5773 [3:48:44<1:21:46, 5.69s/it] 
85%|████████▌ | 4912/5773 [3:48:49<1:20:17, 5.60s/it] 85%|████████▌ | 4912/5773 [3:48:51<1:20:17, 5.60s/it] {'loss': 0.5618, 'learning_rate': 1.1444370433072727e-06, 'epoch': 0.85} 85%|████████▌ | 4912/5773 [3:48:51<1:20:17, 5.60s/it] {'loss': 0.5618, 'learning_rate': 1.1444370433072727e-06, 'epoch': 0.85} 85%|████████▌ | 4912/5773 [3:48:49<1:20:17, 5.60s/it] 85%|████████▌ | 4913/5773 [3:48:55<1:19:47, 5.57s/it] 85%|████████▌ | 4913/5773 [3:48:57<1:19:47, 5.57s/it] {'loss': 0.5554, 'learning_rate': 1.1418319531698286e-06, 'epoch': 0.85} 85%|████████▌ | 4913/5773 [3:48:57<1:19:47, 5.57s/it] {'loss': 0.5554, 'learning_rate': 1.1418319531698286e-06, 'epoch': 0.85} 85%|████████▌ | 4913/5773 [3:48:55<1:19:47, 5.57s/it] 85%|████████▌ | 4914/5773 [3:49:00<1:19:36, 5.56s/it] 85%|████████▌ | 4914/5773 [3:49:02<1:19:36, 5.56s/it] {'loss': 0.5531, 'learning_rate': 1.139229651866699e-06, 'epoch': 0.85} 85%|████████▌ | 4914/5773 [3:49:02<1:19:36, 5.56s/it] {'loss': 0.5531, 'learning_rate': 1.139229651866699e-06, 'epoch': 0.85} 85%|████████▌ | 4914/5773 [3:49:00<1:19:36, 5.56s/it] 85%|████████▌ | 4915/5773 [3:49:06<1:18:54, 5.52s/it] 85%|████████▌ | 4915/5773 [3:49:08<1:18:54, 5.52s/it] {'loss': 0.5583, 'learning_rate': 1.1366301402171775e-06, 'epoch': 0.85} 85%|████████▌ | 4915/5773 [3:49:08<1:18:54, 5.52s/it] {'loss': 0.5583, 'learning_rate': 1.1366301402171775e-06, 'epoch': 0.85} 85%|████████▌ | 4915/5773 [3:49:06<1:18:54, 5.52s/it] 85%|████████▌ | 4916/5773 [3:49:11<1:18:54, 5.52s/it] 85%|████████▌ | 4916/5773 [3:49:13<1:18:54, 5.52s/it] {'loss': 0.5527, 'learning_rate': 1.1340334190396706e-06, 'epoch': 0.85} 85%|████████▌ | 4916/5773 [3:49:13<1:18:54, 5.52s/it] {'loss': 0.5527, 'learning_rate': 1.1340334190396706e-06, 'epoch': 0.85} 85%|████████▌ | 4916/5773 [3:49:11<1:18:54, 5.52s/it] 85%|████████▌ | 4917/5773 [3:49:17<1:19:02, 5.54s/it] 85%|████████▌ | 4917/5773 [3:49:19<1:19:02, 5.54s/it] {'loss': 0.5582, 'learning_rate': 1.1314394891517066e-06, 'epoch': 0.85} 
85%|████████▌ | 4917/5773 [3:49:19<1:19:02, 5.54s/it] {'loss': 0.5582, 'learning_rate': 1.1314394891517066e-06, 'epoch': 0.85} 85%|████████▌ | 4917/5773 [3:49:17<1:19:02, 5.54s/it] 85%|████████▌ | 4918/5773 [3:49:22<1:19:14, 5.56s/it] 85%|████████▌ | 4918/5773 [3:49:24<1:19:14, 5.56s/it] {'loss': 0.568, 'learning_rate': 1.1288483513699421e-06, 'epoch': 0.85} 85%|████████▌ | 4918/5773 [3:49:24<1:19:14, 5.56s/it] {'loss': 0.568, 'learning_rate': 1.1288483513699421e-06, 'epoch': 0.85} 85%|████████▌ | 4918/5773 [3:49:22<1:19:14, 5.56s/it] 85%|████████▌ | 4919/5773 [3:49:28<1:19:04, 5.56s/it] 85%|████████▌ | 4919/5773 [3:49:30<1:19:04, 5.56s/it] {'loss': 0.5216, 'learning_rate': 1.1262600065101481e-06, 'epoch': 0.85} 85%|████████▌ | 4919/5773 [3:49:30<1:19:04, 5.56s/it] {'loss': 0.5216, 'learning_rate': 1.1262600065101481e-06, 'epoch': 0.85} 85%|████████▌ | 4919/5773 [3:49:28<1:19:04, 5.56s/it] 85%|████████▌ | 4920/5773 [3:49:33<1:18:28, 5.52s/it] 85%|████████▌ | 4920/5773 [3:49:35<1:18:28, 5.52s/it] {'loss': 0.579, 'learning_rate': 1.123674455387218e-06, 'epoch': 0.85} 85%|████████▌ | 4920/5773 [3:49:35<1:18:28, 5.52s/it] {'loss': 0.579, 'learning_rate': 1.123674455387218e-06, 'epoch': 0.85} 85%|████████▌ | 4920/5773 [3:49:33<1:18:28, 5.52s/it] 85%|████████▌ | 4921/5773 [3:49:39<1:18:00, 5.49s/it] 85%|████████▌ | 4921/5773 [3:49:41<1:18:00, 5.49s/it] {'loss': 0.5689, 'learning_rate': 1.1210916988151643e-06, 'epoch': 0.85} 85%|████████▌ | 4921/5773 [3:49:41<1:18:00, 5.49s/it] {'loss': 0.5689, 'learning_rate': 1.1210916988151643e-06, 'epoch': 0.85} 85%|████████▌ | 4921/5773 [3:49:39<1:18:00, 5.49s/it] 85%|████████▌ | 4922/5773 [3:49:44<1:18:11, 5.51s/it] 85%|████████▌ | 4922/5773 [3:49:46<1:18:11, 5.51s/it] {'loss': 0.5464, 'learning_rate': 1.118511737607124e-06, 'epoch': 0.85} 85%|████████▌ | 4922/5773 [3:49:46<1:18:11, 5.51s/it] {'loss': 0.5464, 'learning_rate': 1.118511737607124e-06, 'epoch': 0.85} 85%|████████▌ | 4922/5773 [3:49:44<1:18:11, 5.51s/it] 85%|████████▌ | 
4923/5773 [3:49:50<1:18:28, 5.54s/it] 85%|████████▌ | 4923/5773 [3:49:52<1:18:28, 5.54s/it] {'loss': 0.5654, 'learning_rate': 1.1159345725753512e-06, 'epoch': 0.85} 85%|████████▌ | 4923/5773 [3:49:52<1:18:28, 5.54s/it] {'loss': 0.5654, 'learning_rate': 1.1159345725753512e-06, 'epoch': 0.85} 85%|████████▌ | 4923/5773 [3:49:50<1:18:28, 5.54s/it] 85%|████████▌ | 4924/5773 [3:49:56<1:18:12, 5.53s/it] 85%|████████▌ | 4924/5773 [3:49:58<1:18:12, 5.53s/it] {'loss': 0.5521, 'learning_rate': 1.1133602045312174e-06, 'epoch': 0.85} 85%|████████▌ | 4924/5773 [3:49:58<1:18:12, 5.53s/it] {'loss': 0.5521, 'learning_rate': 1.1133602045312174e-06, 'epoch': 0.85} 85%|████████▌ | 4924/5773 [3:49:56<1:18:12, 5.53s/it] 85%|████████▌ | 4925/5773 [3:50:03<1:17:30, 5.48s/it] 85%|████████▌ | 4925/5773 [3:50:01<1:17:30, 5.48s/it] {'loss': 0.5702, 'learning_rate': 1.1107886342852204e-06, 'epoch': 0.85} 85%|████████▌ | 4925/5773 [3:50:03<1:17:30, 5.48s/it]{'loss': 0.5702, 'learning_rate': 1.1107886342852204e-06, 'epoch': 0.85} 85%|████████▌ | 4925/5773 [3:50:01<1:17:30, 5.48s/it] 85%|████████▌ | 4926/5773 [3:50:06<1:17:07, 5.46s/it] 85%|████████▌ | 4926/5773 [3:50:08<1:17:07, 5.46s/it] {'loss': 0.5687, 'learning_rate': 1.1082198626469687e-06, 'epoch': 0.85} 85%|████████▌ | 4926/5773 [3:50:08<1:17:07, 5.46s/it] {'loss': 0.5687, 'learning_rate': 1.1082198626469687e-06, 'epoch': 0.85} 85%|████████▌ | 4926/5773 [3:50:06<1:17:07, 5.46s/it] 85%|████████▌ | 4927/5773 [3:50:12<1:17:18, 5.48s/it] 85%|████████▌ | 4927/5773 [3:50:14<1:17:18, 5.48s/it] {'loss': 0.559, 'learning_rate': 1.1056538904251911e-06, 'epoch': 0.85} 85%|████████▌ | 4927/5773 [3:50:14<1:17:18, 5.48s/it] {'loss': 0.559, 'learning_rate': 1.1056538904251911e-06, 'epoch': 0.85} 85%|████████▌ | 4927/5773 [3:50:12<1:17:18, 5.48s/it] 85%|████████▌ | 4928/5773 [3:50:17<1:16:54, 5.46s/it] 85%|████████▌ | 4928/5773 [3:50:19<1:16:54, 5.46s/it] {'loss': 0.5576, 'learning_rate': 1.1030907184277452e-06, 'epoch': 0.85} 85%|████████▌ | 4928/5773 
[3:50:19<1:16:54, 5.46s/it] {'loss': 0.5576, 'learning_rate': 1.1030907184277452e-06, 'epoch': 0.85} 85%|████████▌ | 4928/5773 [3:50:17<1:16:54, 5.46s/it] 85%|████████▌ | 4929/5773 [3:50:23<1:16:34, 5.44s/it] 85%|████████▌ | 4929/5773 [3:50:25<1:16:34, 5.44s/it] {'loss': 0.5618, 'learning_rate': 1.1005303474615935e-06, 'epoch': 0.85} 85%|████████▌ | 4929/5773 [3:50:25<1:16:34, 5.44s/it] {'loss': 0.5618, 'learning_rate': 1.1005303474615935e-06, 'epoch': 0.85} 85%|████████▌ | 4929/5773 [3:50:23<1:16:34, 5.44s/it] 85%|████████▌ | 4930/5773 [3:50:28<1:16:22, 5.44s/it] 85%|████████▌ | 4930/5773 [3:50:30<1:16:22, 5.44s/it] {'loss': 0.5579, 'learning_rate': 1.0979727783328243e-06, 'epoch': 0.85} 85%|████████▌ | 4930/5773 [3:50:30<1:16:22, 5.44s/it] {'loss': 0.5579, 'learning_rate': 1.0979727783328243e-06, 'epoch': 0.85} 85%|████████▌ | 4930/5773 [3:50:28<1:16:22, 5.44s/it] 85%|████████▌ | 4931/5773 [3:50:34<1:16:32, 5.45s/it] 85%|████████▌ | 4931/5773 [3:50:36<1:16:32, 5.45s/it] {'loss': 0.5658, 'learning_rate': 1.0954180118466428e-06, 'epoch': 0.85} 85%|████████▌ | 4931/5773 [3:50:36<1:16:32, 5.45s/it] {'loss': 0.5658, 'learning_rate': 1.0954180118466428e-06, 'epoch': 0.85} 85%|████████▌ | 4931/5773 [3:50:34<1:16:32, 5.45s/it] 85%|████████▌ | 4932/5773 [3:50:39<1:15:39, 5.40s/it] 85%|████████▌ | 4932/5773 [3:50:41<1:15:39, 5.40s/it] {'loss': 0.5691, 'learning_rate': 1.0928660488073717e-06, 'epoch': 0.85} 85%|████████▌ | 4932/5773 [3:50:41<1:15:39, 5.40s/it] {'loss': 0.5691, 'learning_rate': 1.0928660488073717e-06, 'epoch': 0.85} 85%|████████▌ | 4932/5773 [3:50:39<1:15:39, 5.40s/it] 85%|████████▌ | 4933/5773 [3:50:44<1:16:11, 5.44s/it] 85%|████████▌ | 4933/5773 [3:50:46<1:16:11, 5.44s/it] {'loss': 0.5805, 'learning_rate': 1.090316890018449e-06, 'epoch': 0.85} 85%|████████▌ | 4933/5773 [3:50:46<1:16:11, 5.44s/it] {'loss': 0.5805, 'learning_rate': 1.090316890018449e-06, 'epoch': 0.85} 85%|████████▌ | 4933/5773 [3:50:44<1:16:11, 5.44s/it] 85%|████████▌ | 4934/5773 
[3:50:50<1:15:26, 5.40s/it] 85%|████████▌ | 4934/5773 [3:50:52<1:15:26, 5.40s/it] {'loss': 0.5624, 'learning_rate': 1.0877705362824298e-06, 'epoch': 0.85} 85%|████████▌ | 4934/5773 [3:50:52<1:15:26, 5.40s/it] {'loss': 0.5624, 'learning_rate': 1.0877705362824298e-06, 'epoch': 0.85} 85%|████████▌ | 4934/5773 [3:50:50<1:15:26, 5.40s/it] 85%|████████▌ | 4935/5773 [3:50:55<1:15:10, 5.38s/it] 85%|████████▌ | 4935/5773 [3:50:57<1:15:10, 5.38s/it] {'loss': 0.55, 'learning_rate': 1.0852269884009914e-06, 'epoch': 0.85} 85%|████████▌ | 4935/5773 [3:50:57<1:15:10, 5.38s/it] {'loss': 0.55, 'learning_rate': 1.0852269884009914e-06, 'epoch': 0.85} 85%|████████▌ | 4935/5773 [3:50:55<1:15:10, 5.38s/it] 86%|████████▌ | 4936/5773 [3:51:01<1:15:41, 5.43s/it] 86%|████████▌ | 4936/5773 [3:51:03<1:15:41, 5.43s/it] {'loss': 0.5509, 'learning_rate': 1.0826862471749223e-06, 'epoch': 0.86} 86%|████████▌ | 4936/5773 [3:51:03<1:15:41, 5.43s/it] {'loss': 0.5509, 'learning_rate': 1.0826862471749223e-06, 'epoch': 0.86} 86%|████████▌ | 4936/5773 [3:51:01<1:15:41, 5.43s/it] 86%|████████▌ | 4937/5773 [3:51:08<1:16:29, 5.49s/it] 86%|████████▌ | 4937/5773 [3:51:06<1:16:29, 5.49s/it] {'loss': 0.5546, 'learning_rate': 1.080148313404127e-06, 'epoch': 0.86} 86%|████████▌ | 4937/5773 [3:51:08<1:16:29, 5.49s/it] {'loss': 0.5546, 'learning_rate': 1.080148313404127e-06, 'epoch': 0.86} 86%|████████▌ | 4937/5773 [3:51:06<1:16:29, 5.49s/it] 86%|████████▌ | 4938/5773 [3:51:12<1:17:07, 5.54s/it] 86%|████████▌ | 4938/5773 [3:51:14<1:17:07, 5.54s/it] {'loss': 0.5624, 'learning_rate': 1.0776131878876316e-06, 'epoch': 0.86} 86%|████████▌ | 4938/5773 [3:51:14<1:17:07, 5.54s/it] {'loss': 0.5624, 'learning_rate': 1.0776131878876316e-06, 'epoch': 0.86} 86%|████████▌ | 4938/5773 [3:51:12<1:17:07, 5.54s/it] 86%|████████▌ | 4939/5773 [3:51:17<1:16:30, 5.50s/it] 86%|████████▌ | 4939/5773 [3:51:19<1:16:30, 5.50s/it] {'loss': 0.5487, 'learning_rate': 1.075080871423575e-06, 'epoch': 0.86} 86%|████████▌ | 4939/5773 
[3:51:19<1:16:30, 5.50s/it] {'loss': 0.5487, 'learning_rate': 1.075080871423575e-06, 'epoch': 0.86} 86%|████████▌ | 4939/5773 [3:51:17<1:16:30, 5.50s/it] 86%|████████▌ | 4940/5773 [3:51:23<1:15:41, 5.45s/it] 86%|████████▌ | 4940/5773 [3:51:25<1:15:41, 5.45s/it] {'loss': 0.546, 'learning_rate': 1.0725513648092056e-06, 'epoch': 0.86} 86%|████████▌ | 4940/5773 [3:51:25<1:15:41, 5.45s/it] {'loss': 0.546, 'learning_rate': 1.0725513648092056e-06, 'epoch': 0.86} 86%|████████▌ | 4940/5773 [3:51:23<1:15:41, 5.45s/it] 86%|████████▌ | 4941/5773 [3:51:28<1:16:14, 5.50s/it] 86%|████████▌ | 4941/5773 [3:51:30<1:16:14, 5.50s/it] {'loss': 0.5578, 'learning_rate': 1.0700246688408977e-06, 'epoch': 0.86} 86%|████████▌ | 4941/5773 [3:51:30<1:16:14, 5.50s/it] {'loss': 0.5578, 'learning_rate': 1.0700246688408977e-06, 'epoch': 0.86} 86%|████████▌ | 4941/5773 [3:51:28<1:16:14, 5.50s/it] 86%|████████▌ | 4942/5773 [3:51:34<1:15:46, 5.47s/it] 86%|████████▌ | 4942/5773 [3:51:36<1:15:46, 5.47s/it] {'loss': 0.5544, 'learning_rate': 1.067500784314136e-06, 'epoch': 0.86} 86%|████████▌ | 4942/5773 [3:51:36<1:15:46, 5.47s/it] {'loss': 0.5544, 'learning_rate': 1.067500784314136e-06, 'epoch': 0.86} 86%|████████▌ | 4942/5773 [3:51:34<1:15:46, 5.47s/it] 86%|████████▌ | 4943/5773 [3:51:41<1:15:52, 5.49s/it] 86%|████████▌ | 4943/5773 [3:51:39<1:15:52, 5.49s/it] {'loss': 0.559, 'learning_rate': 1.064979712023516e-06, 'epoch': 0.86} 86%|████████▌ | 4943/5773 [3:51:41<1:15:52, 5.49s/it] {'loss': 0.559, 'learning_rate': 1.064979712023516e-06, 'epoch': 0.86} 86%|████████▌ | 4943/5773 [3:51:39<1:15:52, 5.49s/it] 86%|████████▌ | 4944/5773 [3:51:45<1:16:13, 5.52s/it] 86%|████████▌ | 4944/5773 [3:51:47<1:16:13, 5.52s/it] {'loss': 0.5515, 'learning_rate': 1.0624614527627563e-06, 'epoch': 0.86} 86%|████████▌ | 4944/5773 [3:51:47<1:16:13, 5.52s/it] {'loss': 0.5515, 'learning_rate': 1.0624614527627563e-06, 'epoch': 0.86} 86%|████████▌ | 4944/5773 [3:51:45<1:16:13, 5.52s/it] 86%|████████▌ | 4945/5773 [3:51:50<1:16:25, 
5.54s/it] 86%|████████▌ | 4945/5773 [3:51:52<1:16:25, 5.54s/it] {'loss': 0.5644, 'learning_rate': 1.0599460073246848e-06, 'epoch': 0.86} 86%|████████▌ | 4945/5773 [3:51:52<1:16:25, 5.54s/it] {'loss': 0.5644, 'learning_rate': 1.0599460073246848e-06, 'epoch': 0.86} 86%|████████▌ | 4945/5773 [3:51:50<1:16:25, 5.54s/it] 86%|████████▌ | 4946/5773 [3:51:56<1:15:53, 5.51s/it] 86%|████████▌ | 4946/5773 [3:51:58<1:15:54, 5.51s/it] {'loss': 0.5588, 'learning_rate': 1.0574333765012423e-06, 'epoch': 0.86} 86%|████████▌ | 4946/5773 [3:51:58<1:15:54, 5.51s/it] {'loss': 0.5588, 'learning_rate': 1.0574333765012423e-06, 'epoch': 0.86} 86%|████████▌ | 4946/5773 [3:51:56<1:15:53, 5.51s/it] 86%|████████▌ | 4947/5773 [3:52:01<1:15:39, 5.50s/it] 86%|████████▌ | 4947/5773 [3:52:03<1:15:39, 5.50s/it] {'loss': 0.5601, 'learning_rate': 1.054923561083484e-06, 'epoch': 0.86} 86%|████████▌ | 4947/5773 [3:52:03<1:15:39, 5.50s/it] {'loss': 0.5601, 'learning_rate': 1.054923561083484e-06, 'epoch': 0.86} 86%|████████▌ | 4947/5773 [3:52:01<1:15:39, 5.50s/it] 86%|████████▌ | 4948/5773 [3:52:07<1:15:09, 5.47s/it] 86%|████████▌ | 4948/5773 [3:52:09<1:15:09, 5.47s/it] {'loss': 0.5501, 'learning_rate': 1.0524165618615845e-06, 'epoch': 0.86} 86%|████████▌ | 4948/5773 [3:52:09<1:15:09, 5.47s/it] {'loss': 0.5501, 'learning_rate': 1.0524165618615845e-06, 'epoch': 0.86} 86%|████████▌ | 4948/5773 [3:52:07<1:15:09, 5.47s/it] 86%|████████▌ | 4949/5773 [3:52:12<1:15:06, 5.47s/it] 86%|████████▌ | 4949/5773 [3:52:14<1:15:06, 5.47s/it] {'loss': 0.5511, 'learning_rate': 1.0499123796248234e-06, 'epoch': 0.86} 86%|████████▌ | 4949/5773 [3:52:14<1:15:06, 5.47s/it] {'loss': 0.5511, 'learning_rate': 1.0499123796248234e-06, 'epoch': 0.86} 86%|████████▌ | 4949/5773 [3:52:12<1:15:06, 5.47s/it]9 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend...0 10 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 
12 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...13 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 86%|████████▌ | 4950/5773 [3:52:18<1:14:57, 5.46s/it]5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend...1 AutoResumeHook: Checking whether to suspend... 3 86%|████████▌ | 4950/5773 [3:52:20<1:14:57, 5.46s/it] AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... {'loss': 0.561, 'learning_rate': 1.047411015161598e-06, 'epoch': 0.86} 86%|████████▌ | 4950/5773 [3:52:20<1:14:57, 5.46s/it] {'loss': 0.561, 'learning_rate': 1.047411015161598e-06, 'epoch': 0.86} 86%|████████▌ | 4950/5773 [3:52:18<1:14:57, 5.46s/it] 86%|████████▌ | 4951/5773 [3:52:23<1:14:55, 5.47s/it] 86%|████████▌ | 4951/5773 [3:52:25<1:14:55, 5.47s/it] {'loss': 0.5429, 'learning_rate': 1.0449124692594193e-06, 'epoch': 0.86} 86%|████████▌ | 4951/5773 [3:52:25<1:14:55, 5.47s/it] {'loss': 0.5429, 'learning_rate': 1.0449124692594193e-06, 'epoch': 0.86} 86%|████████▌ | 4951/5773 [3:52:23<1:14:55, 5.47s/it] 86%|████████▌ | 4952/5773 [3:52:29<1:14:55, 5.48s/it] 86%|████████▌ | 4952/5773 [3:52:31<1:14:55, 5.48s/it] {'loss': 0.5726, 'learning_rate': 1.0424167427049082e-06, 'epoch': 0.86} 86%|████████▌ | 4952/5773 [3:52:31<1:14:55, 5.48s/it] {'loss': 0.5726, 'learning_rate': 1.0424167427049082e-06, 'epoch': 0.86} 86%|████████▌ | 4952/5773 [3:52:29<1:14:55, 5.48s/it] 86%|████████▌ | 4953/5773 [3:52:34<1:14:57, 5.48s/it] 86%|████████▌ | 4953/5773 [3:52:36<1:14:57, 5.48s/it] {'loss': 0.5463, 'learning_rate': 1.0399238362838004e-06, 'epoch': 0.86} 86%|████████▌ | 4953/5773 [3:52:36<1:14:57, 5.48s/it] {'loss': 0.5463, 'learning_rate': 1.0399238362838004e-06, 'epoch': 0.86} 86%|████████▌ | 
4953/5773 [3:52:34<1:14:57, 5.48s/it] 86%|████████▌ | 4954/5773 [3:52:40<1:16:16, 5.59s/it] 86%|████████▌ | 4954/5773 [3:52:42<1:16:16, 5.59s/it] {'loss': 0.5818, 'learning_rate': 1.0374337507809407e-06, 'epoch': 0.86} 86%|████████▌ | 4954/5773 [3:52:42<1:16:16, 5.59s/it] {'loss': 0.5818, 'learning_rate': 1.0374337507809407e-06, 'epoch': 0.86} 86%|████████▌ | 4954/5773 [3:52:40<1:16:16, 5.59s/it] 86%|████████▌ | 4955/5773 [3:52:45<1:15:05, 5.51s/it] 86%|████████▌ | 4955/5773 [3:52:47<1:15:05, 5.51s/it] {'loss': 0.5598, 'learning_rate': 1.0349464869802883e-06, 'epoch': 0.86} 86%|████████▌ | 4955/5773 [3:52:47<1:15:05, 5.51s/it] {'loss': 0.5598, 'learning_rate': 1.0349464869802883e-06, 'epoch': 0.86} 86%|████████▌ | 4955/5773 [3:52:45<1:15:05, 5.51s/it] 86%|████████▌ | 4956/5773 [3:52:51<1:14:46, 5.49s/it] 86%|████████▌ | 4956/5773 [3:52:53<1:14:46, 5.49s/it] {'loss': 0.5571, 'learning_rate': 1.0324620456649115e-06, 'epoch': 0.86} 86%|████████▌ | 4956/5773 [3:52:53<1:14:46, 5.49s/it] {'loss': 0.5571, 'learning_rate': 1.0324620456649115e-06, 'epoch': 0.86} 86%|████████▌ | 4956/5773 [3:52:51<1:14:46, 5.49s/it] 86%|████████▌ | 4957/5773 [3:52:56<1:14:51, 5.50s/it] 86%|████████▌ | 4957/5773 [3:52:58<1:14:51, 5.50s/it] {'loss': 0.5609, 'learning_rate': 1.0299804276169944e-06, 'epoch': 0.86} 86%|████████▌ | 4957/5773 [3:52:58<1:14:51, 5.50s/it] {'loss': 0.5609, 'learning_rate': 1.0299804276169944e-06, 'epoch': 0.86} 86%|████████▌ | 4957/5773 [3:52:56<1:14:51, 5.50s/it] 86%|████████▌ | 4958/5773 [3:53:02<1:14:57, 5.52s/it] 86%|████████▌ | 4958/5773 [3:53:04<1:14:57, 5.52s/it] {'loss': 0.5715, 'learning_rate': 1.02750163361783e-06, 'epoch': 0.86} 86%|████████▌ | 4958/5773 [3:53:04<1:14:57, 5.52s/it] {'loss': 0.5715, 'learning_rate': 1.02750163361783e-06, 'epoch': 0.86} 86%|████████▌ | 4958/5773 [3:53:02<1:14:57, 5.52s/it] 86%|████████▌ | 4959/5773 [3:53:07<1:15:05, 5.54s/it] 86%|████████▌ | 4959/5773 [3:53:09<1:15:05, 5.54s/it] {'loss': 0.548, 'learning_rate': 
1.0250256644478196e-06, 'epoch': 0.86} 86%|████████▌ | 4959/5773 [3:53:09<1:15:05, 5.54s/it] {'loss': 0.548, 'learning_rate': 1.0250256644478196e-06, 'epoch': 0.86} 86%|████████▌ | 4959/5773 [3:53:07<1:15:05, 5.54s/it] 86%|████████▌ | 4960/5773 [3:53:13<1:14:33, 5.50s/it] 86%|████████▌ | 4960/5773 [3:53:15<1:14:33, 5.50s/it] {'loss': 0.5645, 'learning_rate': 1.0225525208864796e-06, 'epoch': 0.86} 86%|████████▌ | 4960/5773 [3:53:15<1:14:33, 5.50s/it] {'loss': 0.5645, 'learning_rate': 1.0225525208864796e-06, 'epoch': 0.86} 86%|████████▌ | 4960/5773 [3:53:13<1:14:33, 5.50s/it] 86%|████████▌ | 4961/5773 [3:53:18<1:14:16, 5.49s/it] 86%|████████▌ | 4961/5773 [3:53:20<1:14:16, 5.49s/it] {'loss': 0.56, 'learning_rate': 1.0200822037124325e-06, 'epoch': 0.86} 86%|████████▌ | 4961/5773 [3:53:20<1:14:16, 5.49s/it] {'loss': 0.56, 'learning_rate': 1.0200822037124325e-06, 'epoch': 0.86} 86%|████████▌ | 4961/5773 [3:53:18<1:14:16, 5.49s/it] 86%|████████▌ | 4962/5773 [3:53:24<1:14:27, 5.51s/it] 86%|████████▌ | 4962/5773 [3:53:26<1:14:27, 5.51s/it] {'loss': 0.5488, 'learning_rate': 1.0176147137034155e-06, 'epoch': 0.86} 86%|████████▌ | 4962/5773 [3:53:26<1:14:27, 5.51s/it] {'loss': 0.5488, 'learning_rate': 1.0176147137034155e-06, 'epoch': 0.86} 86%|████████▌ | 4962/5773 [3:53:24<1:14:27, 5.51s/it] 86%|████████▌ | 4963/5773 [3:53:29<1:13:51, 5.47s/it] 86%|████████▌ | 4963/5773 [3:53:31<1:13:51, 5.47s/it] {'loss': 0.5589, 'learning_rate': 1.015150051636269e-06, 'epoch': 0.86} 86%|████████▌ | 4963/5773 [3:53:31<1:13:51, 5.47s/it] {'loss': 0.5589, 'learning_rate': 1.015150051636269e-06, 'epoch': 0.86} 86%|████████▌ | 4963/5773 [3:53:29<1:13:51, 5.47s/it] 86%|████████▌ | 4964/5773 [3:53:35<1:13:18, 5.44s/it] 86%|████████▌ | 4964/5773 [3:53:36<1:13:18, 5.44s/it] {'loss': 0.5506, 'learning_rate': 1.0126882182869524e-06, 'epoch': 0.86} 86%|████████▌ | 4964/5773 [3:53:36<1:13:18, 5.44s/it] {'loss': 0.5506, 'learning_rate': 1.0126882182869524e-06, 'epoch': 0.86} 86%|████████▌ | 4964/5773 
[3:53:35<1:13:18, 5.44s/it] 86%|████████▌ | 4965/5773 [3:53:40<1:12:50, 5.41s/it] 86%|████████▌ | 4965/5773 [3:53:42<1:12:50, 5.41s/it] {'loss': 0.5513, 'learning_rate': 1.0102292144305282e-06, 'epoch': 0.86} 86%|████████▌ | 4965/5773 [3:53:42<1:12:50, 5.41s/it] {'loss': 0.5513, 'learning_rate': 1.0102292144305282e-06, 'epoch': 0.86} 86%|████████▌ | 4965/5773 [3:53:40<1:12:50, 5.41s/it] 86%|████████▌ | 4966/5773 [3:53:45<1:13:20, 5.45s/it] 86%|████████▌ | 4966/5773 [3:53:47<1:13:20, 5.45s/it] {'loss': 0.5577, 'learning_rate': 1.0077730408411657e-06, 'epoch': 0.86} 86%|████████▌ | 4966/5773 [3:53:47<1:13:20, 5.45s/it] {'loss': 0.5577, 'learning_rate': 1.0077730408411657e-06, 'epoch': 0.86} 86%|████████▌ | 4966/5773 [3:53:45<1:13:20, 5.45s/it] 86%|████████▌ | 4967/5773 [3:53:51<1:13:36, 5.48s/it] 86%|████████▌ | 4967/5773 [3:53:53<1:13:36, 5.48s/it] {'loss': 0.5684, 'learning_rate': 1.0053196982921542e-06, 'epoch': 0.86} 86%|████████▌ | 4967/5773 [3:53:53<1:13:36, 5.48s/it] {'loss': 0.5684, 'learning_rate': 1.0053196982921542e-06, 'epoch': 0.86} 86%|████████▌ | 4967/5773 [3:53:51<1:13:36, 5.48s/it] 86%|████████▌ | 4968/5773 [3:53:56<1:13:44, 5.50s/it] 86%|████████▌ | 4968/5773 [3:53:58<1:13:44, 5.50s/it] {'loss': 0.5593, 'learning_rate': 1.0028691875558771e-06, 'epoch': 0.86} 86%|████████▌ | 4968/5773 [3:53:56<1:13:44, 5.50s/it] {'loss': 0.5593, 'learning_rate': 1.0028691875558771e-06, 'epoch': 0.86} 86%|████████▌ | 4968/5773 [3:53:58<1:13:44, 5.50s/it] 86%|████████▌ | 4969/5773 [3:54:02<1:13:47, 5.51s/it] 86%|████████▌ | 4969/5773 [3:54:04<1:13:47, 5.51s/it] {'loss': 0.5511, 'learning_rate': 1.000421509403834e-06, 'epoch': 0.86} 86%|████████▌ | 4969/5773 [3:54:04<1:13:47, 5.51s/it] {'loss': 0.5511, 'learning_rate': 1.000421509403834e-06, 'epoch': 0.86} 86%|████████▌ | 4969/5773 [3:54:02<1:13:47, 5.51s/it] 86%|████████▌ | 4970/5773 [3:54:08<1:13:44, 5.51s/it] 86%|████████▌ | 4970/5773 [3:54:09<1:13:44, 5.51s/it] {'loss': 0.5536, 'learning_rate': 
9.979766646066368e-07, 'epoch': 0.86} 86%|████████▌ | 4970/5773 [3:54:09<1:13:44, 5.51s/it] {'loss': 0.5536, 'learning_rate': 9.979766646066368e-07, 'epoch': 0.86} 86%|████████▌ | 4970/5773 [3:54:08<1:13:44, 5.51s/it] 86%|████████▌ | 4971/5773 [3:54:13<1:13:20, 5.49s/it] 86%|████████▌ | 4971/5773 [3:54:15<1:13:20, 5.49s/it] {'loss': 0.5497, 'learning_rate': 9.955346539339983e-07, 'epoch': 0.86} 86%|████████▌ | 4971/5773 [3:54:15<1:13:20, 5.49s/it] {'loss': 0.5497, 'learning_rate': 9.955346539339983e-07, 'epoch': 0.86} 86%|████████▌ | 4971/5773 [3:54:13<1:13:20, 5.49s/it] 86%|████████▌ | 4972/5773 [3:54:20<1:12:52, 5.46s/it] 86%|████████▌ | 4972/5773 [3:54:18<1:12:52, 5.46s/it] {'loss': 0.5656, 'learning_rate': 9.930954781547398e-07, 'epoch': 0.86} 86%|████████▌ | 4972/5773 [3:54:20<1:12:52, 5.46s/it] {'loss': 0.5656, 'learning_rate': 9.930954781547398e-07, 'epoch': 0.86} 86%|████████▌ | 4972/5773 [3:54:18<1:12:52, 5.46s/it] 86%|████████▌ | 4973/5773 [3:54:24<1:13:07, 5.48s/it] 86%|████████▌ | 4973/5773 [3:54:26<1:13:07, 5.48s/it] {'loss': 0.5609, 'learning_rate': 9.906591380367948e-07, 'epoch': 0.86} 86%|████████▌ | 4973/5773 [3:54:26<1:13:07, 5.48s/it] {'loss': 0.5609, 'learning_rate': 9.906591380367948e-07, 'epoch': 0.86} 86%|████████▌ | 4973/5773 [3:54:24<1:13:07, 5.48s/it] 86%|████████▌ | 4974/5773 [3:54:29<1:13:18, 5.50s/it] 86%|████████▌ | 4974/5773 [3:54:31<1:13:18, 5.50s/it] {'loss': 0.5563, 'learning_rate': 9.882256343471996e-07, 'epoch': 0.86} 86%|████████▌ | 4974/5773 [3:54:31<1:13:18, 5.50s/it] {'loss': 0.5563, 'learning_rate': 9.882256343471996e-07, 'epoch': 0.86} 86%|████████▌ | 4974/5773 [3:54:29<1:13:18, 5.50s/it] 86%|████████▌ | 4975/5773 [3:54:35<1:12:56, 5.48s/it] 86%|████████▌ | 4975/5773 [3:54:37<1:12:56, 5.48s/it] {'loss': 0.5639, 'learning_rate': 9.857949678520984e-07, 'epoch': 0.86} 86%|████████▌ | 4975/5773 [3:54:37<1:12:56, 5.48s/it] {'loss': 0.5639, 'learning_rate': 9.857949678520984e-07, 'epoch': 0.86} 86%|████████▌ | 4975/5773 
[3:54:35<1:12:56, 5.48s/it] 86%|████████▌ | 4976/5773 [3:54:40<1:12:37, 5.47s/it] 86%|████████▌ | 4976/5773 [3:54:42<1:12:37, 5.47s/it] {'loss': 0.5757, 'learning_rate': 9.833671393167421e-07, 'epoch': 0.86} 86%|████████▌ | 4976/5773 [3:54:42<1:12:37, 5.47s/it] {'loss': 0.5757, 'learning_rate': 9.833671393167421e-07, 'epoch': 0.86} 86%|████████▌ | 4976/5773 [3:54:40<1:12:37, 5.47s/it] 86%|████████▌ | 4977/5773 [3:54:46<1:12:53, 5.49s/it] 86%|████████▌ | 4977/5773 [3:54:48<1:12:53, 5.49s/it] {'loss': 0.537, 'learning_rate': 9.809421495054916e-07, 'epoch': 0.86} 86%|████████▌ | 4977/5773 [3:54:48<1:12:53, 5.49s/it] {'loss': 0.537, 'learning_rate': 9.809421495054916e-07, 'epoch': 0.86} 86%|████████▌ | 4977/5773 [3:54:46<1:12:53, 5.49s/it] 86%|████████▌ | 4978/5773 [3:54:51<1:12:33, 5.48s/it] 86%|████████▌ | 4978/5773 [3:54:53<1:12:33, 5.48s/it] {'loss': 0.5695, 'learning_rate': 9.785199991818095e-07, 'epoch': 0.86} 86%|████████▌ | 4978/5773 [3:54:53<1:12:33, 5.48s/it] {'loss': 0.5695, 'learning_rate': 9.785199991818095e-07, 'epoch': 0.86} 86%|████████▌ | 4978/5773 [3:54:51<1:12:33, 5.48s/it] 86%|████████▌ | 4979/5773 [3:54:57<1:12:14, 5.46s/it] 86%|████████▌ | 4979/5773 [3:54:59<1:12:14, 5.46s/it] {'loss': 0.5601, 'learning_rate': 9.761006891082636e-07, 'epoch': 0.86} 86%|████████▌ | 4979/5773 [3:54:59<1:12:14, 5.46s/it] {'loss': 0.5601, 'learning_rate': 9.761006891082636e-07, 'epoch': 0.86} 86%|████████▌ | 4979/5773 [3:54:57<1:12:14, 5.46s/it] 86%|████████▋ | 4980/5773 [3:55:02<1:12:50, 5.51s/it] 86%|████████▋ | 4980/5773 [3:55:04<1:12:50, 5.51s/it] {'loss': 0.5499, 'learning_rate': 9.736842200465345e-07, 'epoch': 0.86} 86%|████████▋ | 4980/5773 [3:55:04<1:12:50, 5.51s/it] {'loss': 0.5499, 'learning_rate': 9.736842200465345e-07, 'epoch': 0.86} 86%|████████▋ | 4980/5773 [3:55:02<1:12:50, 5.51s/it] 86%|████████▋ | 4981/5773 [3:55:08<1:13:15, 5.55s/it] 86%|████████▋ | 4981/5773 [3:55:10<1:13:15, 5.55s/it] {'loss': 0.5499, 'learning_rate': 9.71270592757404e-07, 'epoch': 
0.86} 86%|████████▋ | 4981/5773 [3:55:10<1:13:15, 5.55s/it] {'loss': 0.5499, 'learning_rate': 9.71270592757404e-07, 'epoch': 0.86} 86%|████████▋ | 4981/5773 [3:55:08<1:13:15, 5.55s/it] 86%|████████▋ | 4982/5773 [3:55:13<1:12:46, 5.52s/it] 86%|████████▋ | 4982/5773 [3:55:15<1:12:46, 5.52s/it] {'loss': 0.5671, 'learning_rate': 9.688598080007526e-07, 'epoch': 0.86} 86%|████████▋ | 4982/5773 [3:55:15<1:12:46, 5.52s/it] {'loss': 0.5671, 'learning_rate': 9.688598080007526e-07, 'epoch': 0.86} 86%|████████▋ | 4982/5773 [3:55:13<1:12:46, 5.52s/it] 86%|████████▋ | 4983/5773 [3:55:19<1:12:11, 5.48s/it] 86%|████████▋ | 4983/5773 [3:55:21<1:12:11, 5.48s/it] {'loss': 0.5557, 'learning_rate': 9.664518665355782e-07, 'epoch': 0.86} 86%|████████▋ | 4983/5773 [3:55:21<1:12:11, 5.48s/it] {'loss': 0.5557, 'learning_rate': 9.664518665355782e-07, 'epoch': 0.86} 86%|████████▋ | 4983/5773 [3:55:19<1:12:11, 5.48s/it] 86%|████████▋ | 4984/5773 [3:55:24<1:12:43, 5.53s/it] 86%|████████▋ | 4984/5773 [3:55:26<1:12:42, 5.53s/it] {'loss': 0.5563, 'learning_rate': 9.640467691199773e-07, 'epoch': 0.86} 86%|████████▋ | 4984/5773 [3:55:26<1:12:42, 5.53s/it] {'loss': 0.5563, 'learning_rate': 9.640467691199773e-07, 'epoch': 0.86} 86%|████████▋ | 4984/5773 [3:55:24<1:12:43, 5.53s/it] 86%|████████▋ | 4985/5773 [3:55:30<1:12:22, 5.51s/it] 86%|████████▋ | 4985/5773 [3:55:32<1:12:22, 5.51s/it] {'loss': 0.5727, 'learning_rate': 9.616445165111477e-07, 'epoch': 0.86} 86%|████████▋ | 4985/5773 [3:55:32<1:12:22, 5.51s/it] {'loss': 0.5727, 'learning_rate': 9.616445165111477e-07, 'epoch': 0.86} 86%|████████▋ | 4985/5773 [3:55:30<1:12:22, 5.51s/it] 86%|████████▋ | 4986/5773 [3:55:36<1:12:40, 5.54s/it] 86%|████████▋ | 4986/5773 [3:55:38<1:12:40, 5.54s/it] {'loss': 0.5525, 'learning_rate': 9.592451094653988e-07, 'epoch': 0.86} 86%|████████▋ | 4986/5773 [3:55:38<1:12:40, 5.54s/it] {'loss': 0.5525, 'learning_rate': 9.592451094653988e-07, 'epoch': 0.86} 86%|████████▋ | 4986/5773 [3:55:36<1:12:40, 5.54s/it] 86%|████████▋ 
| 4987/5773 [3:55:41<1:12:38, 5.54s/it] 86%|████████▋ | 4987/5773 [3:55:43<1:12:38, 5.54s/it] {'loss': 0.5508, 'learning_rate': 9.568485487381407e-07, 'epoch': 0.86} 86%|████████▋ | 4987/5773 [3:55:43<1:12:38, 5.54s/it] {'loss': 0.5508, 'learning_rate': 9.568485487381407e-07, 'epoch': 0.86} 86%|████████▋ | 4987/5773 [3:55:41<1:12:38, 5.54s/it] 86%|████████▋ | 4988/5773 [3:55:47<1:12:51, 5.57s/it] 86%|████████▋ | 4988/5773 [3:55:49<1:12:51, 5.57s/it] {'loss': 0.5501, 'learning_rate': 9.544548350838856e-07, 'epoch': 0.86} 86%|████████▋ | 4988/5773 [3:55:49<1:12:51, 5.57s/it] {'loss': 0.5501, 'learning_rate': 9.544548350838856e-07, 'epoch': 0.86} 86%|████████▋ | 4988/5773 [3:55:47<1:12:51, 5.57s/it] 86%|████████▋ | 4989/5773 [3:55:52<1:13:23, 5.62s/it] 86%|████████▋ | 4989/5773 [3:55:54<1:13:23, 5.62s/it] {'loss': 0.5662, 'learning_rate': 9.520639692562506e-07, 'epoch': 0.86} 86%|████████▋ | 4989/5773 [3:55:54<1:13:23, 5.62s/it] {'loss': 0.5662, 'learning_rate': 9.520639692562506e-07, 'epoch': 0.86} 86%|████████▋ | 4989/5773 [3:55:52<1:13:23, 5.62s/it] 86%|████████▋ | 4990/5773 [3:55:58<1:12:45, 5.58s/it] 86%|████████▋ | 4990/5773 [3:56:00<1:12:45, 5.58s/it] {'loss': 0.5646, 'learning_rate': 9.496759520079579e-07, 'epoch': 0.86} 86%|████████▋ | 4990/5773 [3:55:58<1:12:45, 5.58s/it]{'loss': 0.5646, 'learning_rate': 9.496759520079579e-07, 'epoch': 0.86} 86%|████████▋ | 4990/5773 [3:56:00<1:12:45, 5.58s/it] 86%|████████▋ | 4991/5773 [3:56:04<1:12:45, 5.58s/it] 86%|████████▋ | 4991/5773 [3:56:06<1:12:45, 5.58s/it] {'loss': 0.5513, 'learning_rate': 9.47290784090833e-07, 'epoch': 0.86} 86%|████████▋ | 4991/5773 [3:56:06<1:12:45, 5.58s/it] {'loss': 0.5513, 'learning_rate': 9.47290784090833e-07, 'epoch': 0.86} 86%|████████▋ | 4991/5773 [3:56:04<1:12:45, 5.58s/it] 86%|████████▋ | 4992/5773 [3:56:09<1:11:35, 5.50s/it] 86%|████████▋ | 4992/5773 [3:56:11<1:11:35, 5.50s/it] {'loss': 0.5499, 'learning_rate': 9.449084662557984e-07, 'epoch': 0.86} 86%|████████▋ | 4992/5773 
[3:56:11<1:11:35, 5.50s/it] {'loss': 0.5499, 'learning_rate': 9.449084662557984e-07, 'epoch': 0.86} 86%|████████▋ | 4992/5773 [3:56:09<1:11:35, 5.50s/it] 86%|████████▋ | 4993/5773 [3:56:14<1:11:48, 5.52s/it] 86%|████████▋ | 4993/5773 [3:56:16<1:11:48, 5.52s/it] {'loss': 0.5578, 'learning_rate': 9.425289992528897e-07, 'epoch': 0.86} 86%|████████▋ | 4993/5773 [3:56:16<1:11:48, 5.52s/it] {'loss': 0.5578, 'learning_rate': 9.425289992528897e-07, 'epoch': 0.86} 86%|████████▋ | 4993/5773 [3:56:14<1:11:48, 5.52s/it] 87%|████████▋ | 4994/5773 [3:56:20<1:12:04, 5.55s/it] 87%|████████▋ | 4994/5773 [3:56:22<1:12:04, 5.55s/it] {'loss': 0.5628, 'learning_rate': 9.401523838312354e-07, 'epoch': 0.87} 87%|████████▋ | 4994/5773 [3:56:22<1:12:04, 5.55s/it] {'loss': 0.5628, 'learning_rate': 9.401523838312354e-07, 'epoch': 0.87} 87%|████████▋ | 4994/5773 [3:56:20<1:12:04, 5.55s/it] 87%|████████▋ | 4995/5773 [3:56:26<1:12:04, 5.56s/it] 87%|████████▋ | 4995/5773 [3:56:28<1:12:04, 5.56s/it] {'loss': 0.5625, 'learning_rate': 9.377786207390716e-07, 'epoch': 0.87} 87%|████████▋ | 4995/5773 [3:56:28<1:12:04, 5.56s/it] {'loss': 0.5625, 'learning_rate': 9.377786207390716e-07, 'epoch': 0.87} 87%|████████▋ | 4995/5773 [3:56:26<1:12:04, 5.56s/it] 87%|████████▋ | 4996/5773 [3:56:31<1:11:33, 5.53s/it] 87%|████████▋ | 4996/5773 [3:56:33<1:11:33, 5.53s/it] {'loss': 0.5736, 'learning_rate': 9.354077107237325e-07, 'epoch': 0.87} 87%|████████▋ | 4996/5773 [3:56:33<1:11:33, 5.53s/it] {'loss': 0.5736, 'learning_rate': 9.354077107237325e-07, 'epoch': 0.87} 87%|████████▋ | 4996/5773 [3:56:31<1:11:33, 5.53s/it] 87%|████████▋ | 4997/5773 [3:56:36<1:10:49, 5.48s/it] 87%|████████▋ | 4997/5773 [3:56:38<1:10:49, 5.48s/it] {'loss': 0.5617, 'learning_rate': 9.330396545316589e-07, 'epoch': 0.87} 87%|████████▋ | 4997/5773 [3:56:38<1:10:49, 5.48s/it] {'loss': 0.5617, 'learning_rate': 9.330396545316589e-07, 'epoch': 0.87} 87%|████████▋ | 4997/5773 [3:56:36<1:10:49, 5.48s/it] 87%|████████▋ | 4998/5773 [3:56:44<1:10:56, 
5.49s/it] 87%|████████▋ | 4998/5773 [3:56:42<1:10:56, 5.49s/it] {'loss': 0.557, 'learning_rate': 9.306744529083878e-07, 'epoch': 0.87} 87%|████████▋ | 4998/5773 [3:56:44<1:10:56, 5.49s/it] {'loss': 0.557, 'learning_rate': 9.306744529083878e-07, 'epoch': 0.87} 87%|████████▋ | 4998/5773 [3:56:42<1:10:56, 5.49s/it] 87%|████████▋ | 4999/5773 [3:56:50<1:11:56, 5.58s/it] 87%|████████▋ | 4999/5773 [3:56:48<1:11:56, 5.58s/it] {'loss': 0.5398, 'learning_rate': 9.283121065985634e-07, 'epoch': 0.87} 87%|████████▋ | 4999/5773 [3:56:50<1:11:56, 5.58s/it] {'loss': 0.5398, 'learning_rate': 9.283121065985634e-07, 'epoch': 0.87} 87%|████████▋ | 4999/5773 [3:56:48<1:11:56, 5.58s/it]11 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... 4AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 87%|████████▋ | 5000/5773 [3:56:53<1:11:38, 5.56s/it]8 AutoResumeHook: Checking whether to suspend... 87%|████████▋ | 5000/5773 [3:56:55<1:11:38, 5.56s/it]12 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5562, 'learning_rate': 9.259526163459265e-07, 'epoch': 0.87} 87%|████████▋ | 5000/5773 [3:56:55<1:11:38, 5.56s/it] {'loss': 0.5562, 'learning_rate': 9.259526163459265e-07, 'epoch': 0.87} 87%|████████▋ | 5000/5773 [3:56:53<1:11:38, 5.56s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 87%|████████▋ | 5001/5773 [3:57:15<2:12:04, 10.26s/it] 87%|████████▋ | 5001/5773 [3:57:16<2:12:04, 10.26s/it] {'loss': 0.5575, 'learning_rate': 9.235959828933195e-07, 'epoch': 0.87} 87%|████████▋ | 5001/5773 [3:57:16<2:12:04, 10.26s/it] {'loss': 0.5575, 'learning_rate': 9.235959828933195e-07, 'epoch': 0.87} 87%|████████▋ | 5001/5773 [3:57:15<2:12:04, 10.26s/it]WARNING: tokenization mismatch: 1 vs. 789. [[{'from': 'human', 'value': 'ASSISTANT: You are a forward-thinking HR consultancy with the following knowledge and traits.\n\nKNOWLEDGE: HR management, employee satisfaction, talent retention, team building, workplace culture.\n\nTRAITS: Innovation, empathy, solution orientation, creativity, understanding, teamwork.\n\nTASK: Use your knowledge to create an internal strategy document. We are an HR consultancy looking to develop 10 creative ways to improve employee satisfaction. 
Focus on ideas that are innovative and engaging, and cater to the needs of HR managers and company leaders.\n\nOUTPUT: A table in German containing 10 creative strategies to improve employee satisfaction, including a brief description for each.'}, {'from': 'gpt', 'value': '| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1. | "Wertschätzungswand" | Eine Wand im Büro, auf der Mitarbeiter anonym ihre Kollegen für ihre Leistungen und Beiträge loben und anerkennen können. |\n| 2. | Flexible Arbeitsmodelle | Einführung von flexiblen Arbeitszeiten und Homeoffice-Optionen, um Work-Life-Balance zu verbessern und individuelle Bedürfnisse zu berücksichtigen. |\n| 3. | Mitarbeiter-Workshops für persönliche Entwicklung | Organisieren von internen Workshops und Schulungen, um persönliche und berufliche Fähigkeiten der Mitarbeiter zu fördern. |\n| 4. | Team-Building-Aktivitäten außerhalb des Arbeitsplatzes | Regelmäßige Veranstaltungen wie Escape Rooms, Sportevents oder Kochkurse, um den Zusammenhalt und die Kommunikation im Team zu stärken. |\n| 5. | Gesundheits- und Wellness-Programme | Einführung von Fitness- und Entspannungskursen wie Yoga, Meditation oder Massagen, um das Wohlbefinden der Mitarbeiter zu fördern. |\n| 6. | Feedback-Kultur und transparente Kommunikation | Etablierung einer offenen Feedback-Kultur und transparenter Kommunikation, um die Mitarbeiter in Entscheidungsprozesse einzubinden. |\n| 7. | Mitarbeiter-Mentoring-Programm | Einführung eines Mentoring-Programms, bei dem erfahrene Mitarbeiter ihr Wissen und ihre Erfahrungen mit jüngeren Kollegen teilen. |\n| 8. | "Innovationstage" | Einrichtung von speziellen Tagen, an denen Mitarbeiter an eigenen Projekten arbeiten oder innovative Ideen für das Unternehmen entwickeln können. |\n| 9. | Individuelle Karriereentwicklungspläne | Erstellung von personalisierten Karriereplänen in Zusammenarbeit mit den Mitarbeitern, um ihre berufliche Entwicklung zu unterstützen. |\n| 10. 
| Gemeinschaftliche soziale Verantwortung (CSR) Initiativen | Organisation von Freiwilligenprojekten und CSR-Aktivitäten, um den Mitarbeitern die Möglichkeit zu geben, sozialen Einfluss zu nehmen. |'}]] (ignored) 87%|████████▋ | 5002/5773 [3:57:20<1:53:38, 8.84s/it] 87%|████████▋ | 5002/5773 [3:57:22<1:53:38, 8.84s/it] {'loss': 0.5431, 'learning_rate': 9.212422069826843e-07, 'epoch': 0.87} 87%|████████▋ | 5002/5773 [3:57:22<1:53:38, 8.84s/it] {'loss': 0.5431, 'learning_rate': 9.212422069826843e-07, 'epoch': 0.87} 87%|████████▋ | 5002/5773 [3:57:20<1:53:38, 8.84s/it] 87%|████████▋ | 5003/5773 [3:57:26<1:41:58, 7.95s/it] 87%|████████▋ | 5003/5773 [3:57:28<1:41:58, 7.95s/it] {'loss': 0.5699, 'learning_rate': 9.188912893550694e-07, 'epoch': 0.87} 87%|████████▋ | 5003/5773 [3:57:28<1:41:58, 7.95s/it] {'loss': 0.5699, 'learning_rate': 9.188912893550694e-07, 'epoch': 0.87} 87%|████████▋ | 5003/5773 [3:57:26<1:41:58, 7.95s/it] 87%|████████▋ | 5004/5773 [3:57:31<1:32:42, 7.23s/it] 87%|████████▋ | 5004/5773 [3:57:33<1:32:42, 7.23s/it] {'loss': 0.5472, 'learning_rate': 9.16543230750615e-07, 'epoch': 0.87} 87%|████████▋ | 5004/5773 [3:57:33<1:32:42, 7.23s/it] {'loss': 0.5472, 'learning_rate': 9.16543230750615e-07, 'epoch': 0.87} 87%|████████▋ | 5004/5773 [3:57:31<1:32:42, 7.23s/it] 87%|████████▋ | 5005/5773 [3:57:37<1:26:21, 6.75s/it] 87%|████████▋ | 5005/5773 [3:57:39<1:26:21, 6.75s/it] {'loss': 0.5565, 'learning_rate': 9.141980319085642e-07, 'epoch': 0.87} 87%|████████▋ | 5005/5773 [3:57:39<1:26:21, 6.75s/it] {'loss': 0.5565, 'learning_rate': 9.141980319085642e-07, 'epoch': 0.87} 87%|████████▋ | 5005/5773 [3:57:37<1:26:21, 6.75s/it] 87%|████████▋ | 5006/5773 [3:57:44<1:20:59, 6.34s/it] 87%|████████▋ | 5006/5773 [3:57:42<1:21:00, 6.34s/it] {'loss': 0.5557, 'learning_rate': 9.118556935672651e-07, 'epoch': 0.87} 87%|████████▋ | 5006/5773 [3:57:44<1:20:59, 6.34s/it] {'loss': 0.5557, 'learning_rate': 9.118556935672651e-07, 'epoch': 0.87} 87%|████████▋ | 5006/5773 
[3:57:42<1:21:00, 6.34s/it] 87%|████████▋ | 5007/5773 [3:57:48<1:18:07, 6.12s/it] 87%|████████▋ | 5007/5773 [3:57:50<1:18:07, 6.12s/it] {'loss': 0.5555, 'learning_rate': 9.095162164641569e-07, 'epoch': 0.87} 87%|████████▋ | 5007/5773 [3:57:50<1:18:07, 6.12s/it] {'loss': 0.5555, 'learning_rate': 9.095162164641569e-07, 'epoch': 0.87} 87%|████████▋ | 5007/5773 [3:57:48<1:18:07, 6.12s/it] 87%|████████▋ | 5008/5773 [3:57:54<1:15:58, 5.96s/it] 87%|████████▋ | 5008/5773 [3:57:56<1:15:58, 5.96s/it] {'loss': 0.5488, 'learning_rate': 9.071796013357836e-07, 'epoch': 0.87} 87%|████████▋ | 5008/5773 [3:57:56<1:15:58, 5.96s/it] {'loss': 0.5488, 'learning_rate': 9.071796013357836e-07, 'epoch': 0.87} 87%|████████▋ | 5008/5773 [3:57:54<1:15:58, 5.96s/it] 87%|████████▋ | 5009/5773 [3:57:59<1:13:55, 5.81s/it] 87%|████████▋ | 5009/5773 [3:58:01<1:13:55, 5.81s/it] {'loss': 0.5686, 'learning_rate': 9.048458489177847e-07, 'epoch': 0.87} 87%|████████▋ | 5009/5773 [3:58:01<1:13:55, 5.81s/it] {'loss': 0.5686, 'learning_rate': 9.048458489177847e-07, 'epoch': 0.87} 87%|████████▋ | 5009/5773 [3:57:59<1:13:55, 5.81s/it]Apr 10 10:08:18.548508 853613 slurmstepd 0x155550ab8700: error: *** STEP 6697721.0 ON batch-block1-0082 CANCELLED AT 2025-04-10T10:08:18 DUE TO TIME LIMIT *** srun: Job step aborted: Waiting up to 122 seconds for job step to finish. srun: error: batch-block1-10014: task 1: Terminated srun: Terminating StepId=6697721.0 srun: error: batch-block1-0082: task 0: Terminated srun: job 6719580 queued and waiting for resources srun: job 6719580 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block5-00547 JobID: 6719580 | Full list: batch-block5-00547 batch-block5-00617 NETWORK=Efficient-Large-Model/VILA1.5-3b wandb: Currently logged in as: memmelma. 
Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block5-00547 JobID: 6719580 | Full list: batch-block5-00547 batch-block5-00617 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! 
[2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,490] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,491] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,709] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,709] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,709] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,709] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,710] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,725] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,726] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:05,734] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL 
backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,348] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [2025-04-10 12:31:07,348] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,349] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,586] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,586] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,586] [WARNING] 
[comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,586] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,586] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,586] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,586] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None [2025-04-10 12:31:07,587] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-04-10 12:31:07,587] [INFO] [comm.py:594:init_distributed] cdb=None You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
[2025-04-10 12:31:19,664] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.70B parameters Loading checkpoint shards: 0%| | 0/2 [00:00 4096). Running this sequence through the model will result in indexing errors 90%|████████▉ | 5171/5773 [16:15<55:00, 5.48s/it] 90%|████████▉ | 5171/5773 [16:13<55:00, 5.48s/it] {'loss': 0.5555, 'learning_rate': 5.65078233700268e-07, 'epoch': 0.9} 90%|████████▉ | 5171/5773 [16:15<55:00, 5.48s/it] {'loss': 0.5555, 'learning_rate': 5.65078233700268e-07, 'epoch': 0.9} 90%|████████▉ | 5171/5773 [16:13<55:00, 5.48s/it] 90%|████████▉ | 5172/5773 [16:21<54:37, 5.45s/it] 90%|████████▉ | 5172/5773 [16:18<54:37, 5.45s/it] {'loss': 0.5443, 'learning_rate': 5.632202664866026e-07, 'epoch': 0.9} 90%|████████▉ | 5172/5773 [16:21<54:37, 5.45s/it] {'loss': 0.5443, 'learning_rate': 5.632202664866026e-07, 'epoch': 0.9} 90%|████████▉ | 5172/5773 [16:18<54:37, 5.45s/it] 90%|████████▉ | 5173/5773 [16:26<54:49, 5.48s/it] 90%|████████▉ | 5173/5773 [16:24<54:49, 5.48s/it] {'loss': 0.5519, 'learning_rate': 5.613652702727002e-07, 'epoch': 0.9} 90%|████████▉ | 5173/5773 [16:26<54:49, 5.48s/it] {'loss': 0.5519, 'learning_rate': 5.613652702727002e-07, 'epoch': 0.9} 90%|████████▉ | 5173/5773 [16:24<54:49, 5.48s/it] 90%|████████▉ | 5174/5773 [16:32<55:05, 5.52s/it] 90%|████████▉ | 5174/5773 [16:30<55:05, 5.52s/it] {'loss': 0.5764, 'learning_rate': 5.595132456425711e-07, 'epoch': 0.9} 90%|████████▉ | 5174/5773 [16:32<55:05, 5.52s/it] {'loss': 0.5764, 'learning_rate': 5.595132456425711e-07, 'epoch': 0.9} 90%|████████▉ | 5174/5773 [16:30<55:05, 5.52s/it] 90%|████████▉ | 5175/5773 [16:37<54:54, 5.51s/it] 90%|████████▉ | 5175/5773 [16:35<54:54, 5.51s/it] {'loss': 0.5646, 'learning_rate': 5.576641931792947e-07, 'epoch': 0.9} 90%|████████▉ | 5175/5773 [16:38<54:54, 5.51s/it] {'loss': 0.5646, 'learning_rate': 5.576641931792947e-07, 'epoch': 0.9} 90%|████████▉ | 5175/5773 [16:35<54:54, 5.51s/it] 90%|████████▉ | 5176/5773 [16:43<54:38, 
5.49s/it] 90%|████████▉ | 5176/5773 [16:41<54:38, 5.49s/it] {'loss': 0.568, 'learning_rate': 5.55818113465012e-07, 'epoch': 0.9} 90%|████████▉ | 5176/5773 [16:43<54:38, 5.49s/it] {'loss': 0.568, 'learning_rate': 5.55818113465012e-07, 'epoch': 0.9} 90%|████████▉ | 5176/5773 [16:41<54:38, 5.49s/it] 90%|████████▉ | 5177/5773 [16:48<54:37, 5.50s/it] 90%|████████▉ | 5177/5773 [16:46<54:37, 5.50s/it] {'loss': 0.5545, 'learning_rate': 5.539750070809258e-07, 'epoch': 0.9} 90%|████████▉ | 5177/5773 [16:48<54:37, 5.50s/it] {'loss': 0.5545, 'learning_rate': 5.539750070809258e-07, 'epoch': 0.9} 90%|████████▉ | 5177/5773 [16:46<54:37, 5.50s/it] 90%|████████▉ | 5178/5773 [16:54<55:19, 5.58s/it] 90%|████████▉ | 5178/5773 [16:52<55:19, 5.58s/it] {'loss': 0.5381, 'learning_rate': 5.521348746073063e-07, 'epoch': 0.9} 90%|████████▉ | 5178/5773 [16:54<55:19, 5.58s/it] {'loss': 0.5381, 'learning_rate': 5.521348746073063e-07, 'epoch': 0.9} 90%|████████▉ | 5178/5773 [16:52<55:19, 5.58s/it] 90%|████████▉ | 5179/5773 [17:00<54:52, 5.54s/it] 90%|████████▉ | 5179/5773 [16:57<54:52, 5.54s/it] {'loss': 0.5628, 'learning_rate': 5.502977166234857e-07, 'epoch': 0.9} 90%|████████▉ | 5179/5773 [17:00<54:52, 5.54s/it] {'loss': 0.5628, 'learning_rate': 5.502977166234857e-07, 'epoch': 0.9} 90%|████████▉ | 5179/5773 [16:57<54:52, 5.54s/it] 90%|████████▉ | 5180/5773 [17:05<54:39, 5.53s/it] 90%|████████▉ | 5180/5773 [17:03<54:39, 5.53s/it] {'loss': 0.5457, 'learning_rate': 5.484635337078581e-07, 'epoch': 0.9} 90%|████████▉ | 5180/5773 [17:05<54:39, 5.53s/it] {'loss': 0.5457, 'learning_rate': 5.484635337078581e-07, 'epoch': 0.9} 90%|████████▉ | 5180/5773 [17:03<54:39, 5.53s/it] 90%|████████▉ | 5181/5773 [17:11<54:18, 5.50s/it] 90%|████████▉ | 5181/5773 [17:08<54:18, 5.50s/it] {'loss': 0.5593, 'learning_rate': 5.466323264378859e-07, 'epoch': 0.9} 90%|████████▉ | 5181/5773 [17:11<54:18, 5.50s/it] {'loss': 0.5593, 'learning_rate': 5.466323264378859e-07, 'epoch': 0.9} 90%|████████▉ | 5181/5773 [17:08<54:18, 
5.50s/it] 90%|████████▉ | 5182/5773 [17:16<54:06, 5.49s/it] 90%|████████▉ | 5182/5773 [17:14<54:06, 5.49s/it] {'loss': 0.5576, 'learning_rate': 5.448040953900902e-07, 'epoch': 0.9} 90%|████████▉ | 5182/5773 [17:16<54:06, 5.49s/it] {'loss': 0.5576, 'learning_rate': 5.448040953900902e-07, 'epoch': 0.9} 90%|████████▉ | 5182/5773 [17:14<54:06, 5.49s/it] 90%|████████▉ | 5183/5773 [17:22<54:04, 5.50s/it] 90%|████████▉ | 5183/5773 [17:19<54:04, 5.50s/it] {'loss': 0.5607, 'learning_rate': 5.429788411400571e-07, 'epoch': 0.9} 90%|████████▉ | 5183/5773 [17:22<54:04, 5.50s/it] {'loss': 0.5607, 'learning_rate': 5.429788411400571e-07, 'epoch': 0.9} 90%|████████▉ | 5183/5773 [17:19<54:04, 5.50s/it] 90%|████████▉ | 5184/5773 [17:27<54:18, 5.53s/it] 90%|████████▉ | 5184/5773 [17:25<54:18, 5.53s/it] {'loss': 0.5651, 'learning_rate': 5.411565642624328e-07, 'epoch': 0.9} 90%|████████▉ | 5184/5773 [17:27<54:18, 5.53s/it] {'loss': 0.5651, 'learning_rate': 5.411565642624328e-07, 'epoch': 0.9} 90%|████████▉ | 5184/5773 [17:25<54:18, 5.53s/it] 90%|████████▉ | 5185/5773 [17:33<53:51, 5.50s/it] 90%|████████▉ | 5185/5773 [17:30<53:51, 5.50s/it] {'loss': 0.5502, 'learning_rate': 5.393372653309315e-07, 'epoch': 0.9} 90%|████████▉ | 5185/5773 [17:33<53:51, 5.50s/it] {'loss': 0.5502, 'learning_rate': 5.393372653309315e-07, 'epoch': 0.9} 90%|████████▉ | 5185/5773 [17:30<53:51, 5.50s/it] 90%|████████▉ | 5186/5773 [17:38<53:46, 5.50s/it] 90%|████████▉ | 5186/5773 [17:36<53:46, 5.50s/it] {'loss': 0.5475, 'learning_rate': 5.375209449183261e-07, 'epoch': 0.9} 90%|████████▉ | 5186/5773 [17:38<53:46, 5.50s/it] {'loss': 0.5475, 'learning_rate': 5.375209449183261e-07, 'epoch': 0.9} 90%|████████▉ | 5186/5773 [17:36<53:46, 5.50s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 90%|████████▉ | 5187/5773 [17:44<53:48, 5.51s/it] 90%|████████▉ | 5187/5773 [17:41<53:48, 5.51s/it] {'loss': 0.5677, 'learning_rate': 5.357076035964492e-07, 'epoch': 0.9} 90%|████████▉ | 5187/5773 [17:44<53:48, 5.51s/it] {'loss': 0.5677, 'learning_rate': 5.357076035964492e-07, 'epoch': 0.9} 90%|████████▉ | 5187/5773 [17:41<53:48, 5.51s/it] 90%|████████▉ | 5188/5773 [17:49<53:36, 5.50s/it] 90%|████████▉ | 5188/5773 [17:47<53:35, 5.50s/it] {'loss': 0.5622, 'learning_rate': 5.33897241936202e-07, 'epoch': 0.9} 90%|████████▉ | 5188/5773 [17:49<53:36, 5.50s/it] {'loss': 0.5622, 'learning_rate': 5.33897241936202e-07, 'epoch': 0.9} 90%|████████▉ | 5188/5773 [17:47<53:35, 5.50s/it] 90%|████████▉ | 5189/5773 [17:55<53:11, 5.47s/it] 90%|████████▉ | 5189/5773 [17:52<53:11, 5.46s/it] {'loss': 0.5513, 'learning_rate': 5.320898605075431e-07, 'epoch': 0.9} 90%|████████▉ | 5189/5773 [17:55<53:11, 5.47s/it] {'loss': 0.5513, 'learning_rate': 5.320898605075431e-07, 'epoch': 0.9} 90%|████████▉ | 5189/5773 [17:52<53:11, 5.46s/it] 90%|████████▉ | 5190/5773 [18:00<53:28, 5.50s/it] 90%|████████▉ | 5190/5773 [17:58<53:28, 5.50s/it] {'loss': 0.5608, 'learning_rate': 5.302854598794938e-07, 'epoch': 0.9} 90%|████████▉ | 5190/5773 [18:00<53:28, 5.50s/it] {'loss': 0.5608, 'learning_rate': 5.302854598794938e-07, 'epoch': 0.9} 90%|████████▉ | 5190/5773 [17:58<53:28, 5.50s/it] 90%|████████▉ | 5191/5773 [18:06<53:04, 5.47s/it] 90%|████████▉ | 5191/5773 [18:03<53:04, 5.47s/it] {'loss': 0.5704, 'learning_rate': 5.284840406201375e-07, 'epoch': 0.9} 90%|████████▉ | 5191/5773 [18:06<53:04, 5.47s/it] {'loss': 0.5704, 'learning_rate': 5.284840406201375e-07, 'epoch': 0.9} 90%|████████▉ | 5191/5773 [18:03<53:04, 5.47s/it] 90%|████████▉ | 5192/5773 [18:11<52:51, 5.46s/it] 90%|████████▉ | 5192/5773 [18:09<52:51, 5.46s/it] {'loss': 0.5871, 'learning_rate': 5.266856032966172e-07, 'epoch': 0.9} 90%|████████▉ | 5192/5773 [18:11<52:51, 5.46s/it] {'loss': 0.5871, 'learning_rate': 
5.266856032966172e-07, 'epoch': 0.9} 90%|████████▉ | 5192/5773 [18:09<52:51, 5.46s/it] 90%|████████▉ | 5193/5773 [18:17<53:09, 5.50s/it] 90%|████████▉ | 5193/5773 [18:14<53:09, 5.50s/it] {'loss': 0.5462, 'learning_rate': 5.24890148475139e-07, 'epoch': 0.9} 90%|████████▉ | 5193/5773 [18:17<53:09, 5.50s/it] {'loss': 0.5462, 'learning_rate': 5.24890148475139e-07, 'epoch': 0.9} 90%|████████▉ | 5193/5773 [18:14<53:09, 5.50s/it] 90%|████████▉ | 5194/5773 [18:22<53:00, 5.49s/it] 90%|████████▉ | 5194/5773 [18:20<53:00, 5.49s/it] {'loss': 0.5561, 'learning_rate': 5.230976767209706e-07, 'epoch': 0.9} 90%|████████▉ | 5194/5773 [18:22<53:00, 5.49s/it] {'loss': 0.5561, 'learning_rate': 5.230976767209706e-07, 'epoch': 0.9} 90%|████████▉ | 5194/5773 [18:20<53:00, 5.49s/it] 90%|████████▉ | 5195/5773 [18:28<53:16, 5.53s/it] 90%|████████▉ | 5195/5773 [18:25<53:16, 5.53s/it] {'loss': 0.5488, 'learning_rate': 5.213081885984384e-07, 'epoch': 0.9} 90%|████████▉ | 5195/5773 [18:28<53:16, 5.53s/it] {'loss': 0.5488, 'learning_rate': 5.213081885984384e-07, 'epoch': 0.9} 90%|████████▉ | 5195/5773 [18:25<53:16, 5.53s/it] 90%|█████████ | 5196/5773 [18:33<53:19, 5.54s/it] 90%|█████████ | 5196/5773 [18:31<53:19, 5.54s/it] {'loss': 0.5555, 'learning_rate': 5.195216846709305e-07, 'epoch': 0.9} 90%|█████████ | 5196/5773 [18:33<53:19, 5.54s/it] {'loss': 0.5555, 'learning_rate': 5.195216846709305e-07, 'epoch': 0.9} 90%|█████████ | 5196/5773 [18:31<53:19, 5.54s/it] 90%|█████████ | 5197/5773 [18:39<53:09, 5.54s/it] 90%|█████████ | 5197/5773 [18:36<53:10, 5.54s/it] {'loss': 0.5435, 'learning_rate': 5.17738165500894e-07, 'epoch': 0.9} 90%|█████████ | 5197/5773 [18:39<53:09, 5.54s/it] {'loss': 0.5435, 'learning_rate': 5.17738165500894e-07, 'epoch': 0.9} 90%|█████████ | 5197/5773 [18:36<53:10, 5.54s/it] 90%|█████████ | 5198/5773 [18:44<52:58, 5.53s/it] 90%|█████████ | 5198/5773 [18:42<52:58, 5.53s/it] {'loss': 0.5645, 'learning_rate': 5.159576316498416e-07, 'epoch': 0.9} 90%|█████████ | 5198/5773 
[18:44<52:58, 5.53s/it] {'loss': 0.5645, 'learning_rate': 5.159576316498416e-07, 'epoch': 0.9} 90%|█████████ | 5198/5773 [18:42<52:58, 5.53s/it] 90%|█████████ | 5199/5773 [18:50<52:30, 5.49s/it] 90%|█████████ | 5199/5773 [18:47<52:30, 5.49s/it] {'loss': 0.5588, 'learning_rate': 5.141800836783383e-07, 'epoch': 0.9} 90%|█████████ | 5199/5773 [18:50<52:30, 5.49s/it] {'loss': 0.5588, 'learning_rate': 5.141800836783383e-07, 'epoch': 0.9} 90%|█████████ | 5199/5773 [18:47<52:30, 5.49s/it]11 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 10 126 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 13 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 90%|█████████ | 5200/5773 [18:55<52:14, 5.47s/it]5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
90%|█████████ | 5200/5773 [18:53<52:14, 5.47s/it] {'loss': 0.5583, 'learning_rate': 5.124055221460145e-07, 'epoch': 0.9} 90%|█████████ | 5200/5773 [18:55<52:14, 5.47s/it] {'loss': 0.5583, 'learning_rate': 5.124055221460145e-07, 'epoch': 0.9} 90%|█████████ | 5200/5773 [18:53<52:14, 5.47s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 90%|█████████ | 5201/5773 [19:21<1:57:46, 12.35s/it] 90%|█████████ | 5201/5773 [19:23<1:57:46, 12.35s/it] {'loss': 0.5476, 'learning_rate': 5.106339476115596e-07, 'epoch': 0.9} 90%|█████████ | 5201/5773 [19:23<1:57:46, 12.35s/it] {'loss': 0.5476, 'learning_rate': 5.106339476115596e-07, 'epoch': 0.9} 90%|█████████ | 5201/5773 [19:21<1:57:46, 12.35s/it] 90%|█████████ | 5202/5773 [19:27<1:37:58, 10.29s/it] 90%|█████████ | 5202/5773 [19:29<1:37:58, 10.30s/it] {'loss': 0.5545, 'learning_rate': 5.088653606327209e-07, 'epoch': 0.9} 90%|█████████ | 5202/5773 [19:29<1:37:58, 10.30s/it]{'loss': 0.5545, 'learning_rate': 5.088653606327209e-07, 'epoch': 0.9} 90%|█████████ | 5202/5773 [19:27<1:37:58, 10.29s/it] 90%|█████████ | 5203/5773 [19:32<1:23:47, 8.82s/it] 90%|█████████ | 5203/5773 [19:34<1:23:47, 8.82s/it] {'loss': 0.5446, 'learning_rate': 5.070997617663054e-07, 'epoch': 0.9} 90%|█████████ | 5203/5773 [19:34<1:23:47, 8.82s/it] {'loss': 0.5446, 'learning_rate': 5.070997617663054e-07, 'epoch': 0.9} 90%|█████████ | 5203/5773 [19:32<1:23:47, 8.82s/it] 90%|█████████ | 5204/5773 [19:37<1:13:41, 7.77s/it] 90%|█████████ | 5204/5773 [19:40<1:13:41, 7.77s/it] {'loss': 0.5756, 'learning_rate': 5.053371515681826e-07, 'epoch': 0.9} 90%|█████████ | 5204/5773 [19:40<1:13:41, 7.77s/it] {'loss': 0.5756, 'learning_rate': 5.053371515681826e-07, 'epoch': 0.9} 90%|█████████ | 5204/5773 [19:37<1:13:41, 7.77s/it] 90%|█████████ | 5205/5773 [19:43<1:06:43, 7.05s/it] 90%|█████████ | 5205/5773 [19:45<1:06:43, 7.05s/it] {'loss': 0.5556, 'learning_rate': 5.035775305932777e-07, 'epoch': 0.9} 90%|█████████ | 5205/5773 [19:45<1:06:43, 7.05s/it] {'loss': 0.5556, 'learning_rate': 5.035775305932777e-07, 'epoch': 0.9} 90%|█████████ | 5205/5773 [19:43<1:06:43, 7.05s/it] 90%|█████████ | 5206/5773 [19:48<1:02:14, 6.59s/it] 90%|█████████ | 5206/5773 [19:51<1:02:14, 6.59s/it] {'loss': 0.5621, 'learning_rate': 5.018208993955731e-07, 'epoch': 0.9} {'loss': 0.5621, 'learning_rate': 
5.018208993955731e-07, 'epoch': 0.9} 90%|█████████ | 5206/5773 [19:48<1:02:14, 6.59s/it] 90%|█████████ | 5206/5773 [19:51<1:02:14, 6.59s/it] 90%|█████████ | 5207/5773 [19:54<58:50, 6.24s/it] 90%|█████████ | 5207/5773 [19:56<58:50, 6.24s/it] {'loss': 0.557, 'learning_rate': 5.000672585281141e-07, 'epoch': 0.9} 90%|█████████ | 5207/5773 [19:56<58:50, 6.24s/it] {'loss': 0.557, 'learning_rate': 5.000672585281141e-07, 'epoch': 0.9} 90%|█████████ | 5207/5773 [19:54<58:50, 6.24s/it] 90%|█████████ | 5208/5773 [19:59<56:18, 5.98s/it] 90%|█████████ | 5208/5773 [20:01<56:19, 5.98s/it] {'loss': 0.554, 'learning_rate': 4.983166085430036e-07, 'epoch': 0.9} 90%|█████████ | 5208/5773 [20:01<56:19, 5.98s/it] {'loss': 0.554, 'learning_rate': 4.983166085430036e-07, 'epoch': 0.9} 90%|█████████ | 5208/5773 [19:59<56:18, 5.98s/it] 90%|█████████ | 5209/5773 [20:04<54:37, 5.81s/it] 90%|█████████ | 5209/5773 [20:07<54:37, 5.81s/it] {'loss': 0.5704, 'learning_rate': 4.965689499914017e-07, 'epoch': 0.9} 90%|█████████ | 5209/5773 [20:07<54:37, 5.81s/it] {'loss': 0.5704, 'learning_rate': 4.965689499914017e-07, 'epoch': 0.9} 90%|█████████ | 5209/5773 [20:04<54:37, 5.81s/it] 90%|█████████ | 5210/5773 [20:10<53:35, 5.71s/it] 90%|█████████ | 5210/5773 [20:12<53:35, 5.71s/it] {'loss': 0.5583, 'learning_rate': 4.948242834235251e-07, 'epoch': 0.9} 90%|█████████ | 5210/5773 [20:12<53:35, 5.71s/it] {'loss': 0.5583, 'learning_rate': 4.948242834235251e-07, 'epoch': 0.9} 90%|█████████ | 5210/5773 [20:10<53:35, 5.71s/it] 90%|█████████ | 5211/5773 [20:15<52:43, 5.63s/it] 90%|█████████ | 5211/5773 [20:18<52:43, 5.63s/it] {'loss': 0.5488, 'learning_rate': 4.930826093886543e-07, 'epoch': 0.9} 90%|█████████ | 5211/5773 [20:18<52:43, 5.63s/it] {'loss': 0.5488, 'learning_rate': 4.930826093886543e-07, 'epoch': 0.9} 90%|█████████ | 5211/5773 [20:15<52:43, 5.63s/it] 90%|█████████ | 5212/5773 [20:21<52:01, 5.56s/it] 90%|█████████ | 5212/5773 [20:23<52:01, 5.56s/it] {'loss': 0.557, 'learning_rate': 
4.913439284351207e-07, 'epoch': 0.9} 90%|█████████ | 5212/5773 [20:23<52:01, 5.56s/it] {'loss': 0.557, 'learning_rate': 4.913439284351207e-07, 'epoch': 0.9} 90%|█████████ | 5212/5773 [20:21<52:01, 5.56s/it] 90%|█████████ | 5213/5773 [20:26<51:21, 5.50s/it] 90%|█████████ | 5213/5773 [20:28<51:21, 5.50s/it] {'loss': 0.562, 'learning_rate': 4.896082411103175e-07, 'epoch': 0.9} 90%|█████████ | 5213/5773 [20:28<51:21, 5.50s/it] {'loss': 0.562, 'learning_rate': 4.896082411103175e-07, 'epoch': 0.9} 90%|█████████ | 5213/5773 [20:26<51:21, 5.50s/it] 90%|█████████ | 5214/5773 [20:32<51:22, 5.51s/it] 90%|█████████ | 5214/5773 [20:34<51:22, 5.51s/it] {'loss': 0.5357, 'learning_rate': 4.878755479606967e-07, 'epoch': 0.9} 90%|█████████ | 5214/5773 [20:34<51:22, 5.51s/it] {'loss': 0.5357, 'learning_rate': 4.878755479606967e-07, 'epoch': 0.9} 90%|█████████ | 5214/5773 [20:32<51:22, 5.51s/it] 90%|█████████ | 5215/5773 [20:37<51:02, 5.49s/it] 90%|█████████ | 5215/5773 [20:39<51:02, 5.49s/it] {'loss': 0.56, 'learning_rate': 4.861458495317628e-07, 'epoch': 0.9} 90%|█████████ | 5215/5773 [20:39<51:02, 5.49s/it] {'loss': 0.56, 'learning_rate': 4.861458495317628e-07, 'epoch': 0.9} 90%|█████████ | 5215/5773 [20:37<51:02, 5.49s/it] 90%|█████████ | 5216/5773 [20:42<50:50, 5.48s/it] 90%|█████████ | 5216/5773 [20:45<50:50, 5.48s/it] {'loss': 0.5557, 'learning_rate': 4.844191463680803e-07, 'epoch': 0.9} 90%|█████████ | 5216/5773 [20:45<50:50, 5.48s/it] {'loss': 0.5557, 'learning_rate': 4.844191463680803e-07, 'epoch': 0.9} 90%|█████████ | 5216/5773 [20:42<50:50, 5.48s/it] 90%|█████████ | 5217/5773 [20:48<50:56, 5.50s/it] 90%|█████████ | 5217/5773 [20:50<50:56, 5.50s/it] {'loss': 0.5745, 'learning_rate': 4.826954390132721e-07, 'epoch': 0.9} 90%|█████████ | 5217/5773 [20:50<50:56, 5.50s/it] {'loss': 0.5745, 'learning_rate': 4.826954390132721e-07, 'epoch': 0.9} 90%|█████████ | 5217/5773 [20:48<50:56, 5.50s/it] 90%|█████████ | 5218/5773 [20:54<51:09, 5.53s/it] 90%|█████████ | 5218/5773 
[20:56<51:09, 5.53s/it] {'loss': 0.5576, 'learning_rate': 4.809747280100163e-07, 'epoch': 0.9} 90%|█████████ | 5218/5773 [20:56<51:09, 5.53s/it]{'loss': 0.5576, 'learning_rate': 4.809747280100163e-07, 'epoch': 0.9} 90%|█████████ | 5218/5773 [20:54<51:09, 5.53s/it] 90%|█████████ | 5219/5773 [20:59<50:39, 5.49s/it] 90%|█████████ | 5219/5773 [21:01<50:39, 5.49s/it] {'loss': 0.5474, 'learning_rate': 4.792570139000463e-07, 'epoch': 0.9} 90%|█████████ | 5219/5773 [21:01<50:39, 5.49s/it] {'loss': 0.5474, 'learning_rate': 4.792570139000463e-07, 'epoch': 0.9} 90%|█████████ | 5219/5773 [20:59<50:39, 5.49s/it] 90%|█████████ | 5220/5773 [21:04<50:21, 5.46s/it] 90%|█████████ | 5220/5773 [21:07<50:21, 5.46s/it] {'loss': 0.5557, 'learning_rate': 4.775422972241539e-07, 'epoch': 0.9} 90%|█████████ | 5220/5773 [21:07<50:21, 5.46s/it]{'loss': 0.5557, 'learning_rate': 4.775422972241539e-07, 'epoch': 0.9} 90%|█████████ | 5220/5773 [21:04<50:21, 5.46s/it] 90%|█████████ | 5221/5773 [21:10<50:22, 5.48s/it] 90%|█████████ | 5221/5773 [21:12<50:22, 5.48s/it] {'loss': 0.5369, 'learning_rate': 4.7583057852218615e-07, 'epoch': 0.9} 90%|█████████ | 5221/5773 [21:12<50:22, 5.48s/it] {'loss': 0.5369, 'learning_rate': 4.7583057852218615e-07, 'epoch': 0.9} 90%|█████████ | 5221/5773 [21:10<50:22, 5.48s/it] 90%|█████████ | 5222/5773 [21:15<49:44, 5.42s/it] 90%|█████████ | 5222/5773 [21:18<49:44, 5.42s/it] {'loss': 0.5623, 'learning_rate': 4.7412185833304867e-07, 'epoch': 0.9} 90%|█████████ | 5222/5773 [21:18<49:44, 5.42s/it] {'loss': 0.5623, 'learning_rate': 4.7412185833304867e-07, 'epoch': 0.9} 90%|█████████ | 5222/5773 [21:15<49:44, 5.42s/it] 90%|█████████ | 5223/5773 [21:21<49:41, 5.42s/it] 90%|█████████ | 5223/5773 [21:23<49:41, 5.42s/it] {'loss': 0.5664, 'learning_rate': 4.7241613719469784e-07, 'epoch': 0.9} 90%|█████████ | 5223/5773 [21:23<49:41, 5.42s/it]{'loss': 0.5664, 'learning_rate': 4.7241613719469784e-07, 'epoch': 0.9} 90%|█████████ | 5223/5773 [21:21<49:41, 5.42s/it] 90%|█████████ | 
5224/5773 [21:26<49:45, 5.44s/it] 90%|█████████ | 5224/5773 [21:29<49:45, 5.44s/it] {'loss': 0.5523, 'learning_rate': 4.7071341564415284e-07, 'epoch': 0.9} 90%|█████████ | 5224/5773 [21:29<49:45, 5.44s/it] {'loss': 0.5523, 'learning_rate': 4.7071341564415284e-07, 'epoch': 0.9} 90%|█████████ | 5224/5773 [21:26<49:45, 5.44s/it] 91%|█████████ | 5225/5773 [21:31<49:21, 5.40s/it] 91%|█████████ | 5225/5773 [21:34<49:21, 5.40s/it] {'loss': 0.5722, 'learning_rate': 4.6901369421748386e-07, 'epoch': 0.91} 91%|█████████ | 5225/5773 [21:34<49:21, 5.40s/it] {'loss': 0.5722, 'learning_rate': 4.6901369421748386e-07, 'epoch': 0.91} 91%|█████████ | 5225/5773 [21:31<49:21, 5.40s/it] 91%|█████████ | 5226/5773 [21:37<48:56, 5.37s/it] 91%|█████████ | 5226/5773 [21:39<48:56, 5.37s/it] {'loss': 0.5702, 'learning_rate': 4.6731697344981506e-07, 'epoch': 0.91} 91%|█████████ | 5226/5773 [21:39<48:56, 5.37s/it] {'loss': 0.5702, 'learning_rate': 4.6731697344981506e-07, 'epoch': 0.91} 91%|█████████ | 5226/5773 [21:37<48:56, 5.37s/it] 91%|█████████ | 5227/5773 [21:42<48:33, 5.34s/it] 91%|█████████ | 5227/5773 [21:44<48:33, 5.34s/it] {'loss': 0.5635, 'learning_rate': 4.6562325387533137e-07, 'epoch': 0.91} {'loss': 0.5635, 'learning_rate': 4.6562325387533137e-07, 'epoch': 0.91} 91%|█████████ | 5227/5773 [21:44<48:33, 5.34s/it] 91%|█████████ | 5227/5773 [21:42<48:33, 5.34s/it] 91%|█████████ | 5228/5773 [21:47<48:36, 5.35s/it] 91%|█████████ | 5228/5773 [21:50<48:36, 5.35s/it] {'loss': 0.5469, 'learning_rate': 4.639325360272684e-07, 'epoch': 0.91} 91%|█████████ | 5228/5773 [21:50<48:36, 5.35s/it] {'loss': 0.5469, 'learning_rate': 4.639325360272684e-07, 'epoch': 0.91} 91%|█████████ | 5228/5773 [21:47<48:36, 5.35s/it] 91%|█████████ | 5229/5773 [21:53<48:29, 5.35s/it] 91%|█████████ | 5229/5773 [21:55<48:29, 5.35s/it] {'loss': 0.5461, 'learning_rate': 4.622448204379171e-07, 'epoch': 0.91} 91%|█████████ | 5229/5773 [21:55<48:29, 5.35s/it] {'loss': 0.5461, 'learning_rate': 4.622448204379171e-07, 'epoch': 
0.91} 91%|█████████ | 5229/5773 [21:53<48:29, 5.35s/it] 91%|█████████ | 5230/5773 [21:58<48:11, 5.33s/it] 91%|█████████ | 5230/5773 [22:00<48:11, 5.33s/it] {'loss': 0.5495, 'learning_rate': 4.605601076386268e-07, 'epoch': 0.91} 91%|█████████ | 5230/5773 [22:00<48:11, 5.33s/it] {'loss': 0.5495, 'learning_rate': 4.605601076386268e-07, 'epoch': 0.91} 91%|█████████ | 5230/5773 [21:58<48:11, 5.33s/it] 91%|█████████ | 5231/5773 [22:03<48:35, 5.38s/it] 91%|█████████ | 5231/5773 [22:06<48:35, 5.38s/it] {'loss': 0.5631, 'learning_rate': 4.5887839815979774e-07, 'epoch': 0.91} 91%|█████████ | 5231/5773 [22:06<48:35, 5.38s/it] {'loss': 0.5631, 'learning_rate': 4.5887839815979774e-07, 'epoch': 0.91} 91%|█████████ | 5231/5773 [22:03<48:35, 5.38s/it] 91%|█████████ | 5232/5773 [22:09<48:03, 5.33s/it] 91%|█████████ | 5232/5773 [22:11<48:03, 5.33s/it] {'loss': 0.5584, 'learning_rate': 4.5719969253088524e-07, 'epoch': 0.91} 91%|█████████ | 5232/5773 [22:11<48:03, 5.33s/it] {'loss': 0.5584, 'learning_rate': 4.5719969253088524e-07, 'epoch': 0.91} 91%|█████████ | 5232/5773 [22:09<48:03, 5.33s/it] 91%|█████████ | 5233/5773 [22:14<48:37, 5.40s/it] 91%|█████████ | 5233/5773 [22:17<48:37, 5.40s/it] {'loss': 0.5626, 'learning_rate': 4.55523991280401e-07, 'epoch': 0.91} 91%|█████████ | 5233/5773 [22:17<48:37, 5.40s/it] {'loss': 0.5626, 'learning_rate': 4.55523991280401e-07, 'epoch': 0.91} 91%|█████████ | 5233/5773 [22:14<48:37, 5.40s/it] 91%|█████████ | 5234/5773 [22:20<48:34, 5.41s/it] 91%|█████████ | 5234/5773 [22:22<48:34, 5.41s/it] {'loss': 0.557, 'learning_rate': 4.538512949359075e-07, 'epoch': 0.91} 91%|█████████ | 5234/5773 [22:22<48:34, 5.41s/it] {'loss': 0.557, 'learning_rate': 4.538512949359075e-07, 'epoch': 0.91} 91%|█████████ | 5234/5773 [22:20<48:34, 5.41s/it] 91%|█████████ | 5235/5773 [22:25<48:22, 5.40s/it] 91%|█████████ | 5235/5773 [22:27<48:22, 5.40s/it] {'loss': 0.5714, 'learning_rate': 4.5218160402402234e-07, 'epoch': 0.91} 91%|█████████ | 5235/5773 [22:27<48:22, 5.40s/it] 
{'loss': 0.5714, 'learning_rate': 4.5218160402402234e-07, 'epoch': 0.91} 91%|█████████ | 5235/5773 [22:25<48:22, 5.40s/it] 91%|█████████ | 5236/5773 [22:30<48:04, 5.37s/it] 91%|█████████ | 5236/5773 [22:33<48:04, 5.37s/it] {'loss': 0.5381, 'learning_rate': 4.5051491907042055e-07, 'epoch': 0.91} 91%|█████████ | 5236/5773 [22:33<48:04, 5.37s/it] {'loss': 0.5381, 'learning_rate': 4.5051491907042055e-07, 'epoch': 0.91} 91%|█████████ | 5236/5773 [22:30<48:04, 5.37s/it] 91%|█████████ | 5237/5773 [22:36<48:17, 5.41s/it] 91%|█████████ | 5237/5773 [22:38<48:17, 5.41s/it] {'loss': 0.5581, 'learning_rate': 4.4885124059982686e-07, 'epoch': 0.91} 91%|█████████ | 5237/5773 [22:38<48:17, 5.41s/it] {'loss': 0.5581, 'learning_rate': 4.4885124059982686e-07, 'epoch': 0.91} 91%|█████████ | 5237/5773 [22:36<48:17, 5.41s/it] 91%|█████████ | 5238/5773 [22:41<47:52, 5.37s/it] 91%|█████████ | 5238/5773 [22:44<47:52, 5.37s/it] {'loss': 0.548, 'learning_rate': 4.47190569136019e-07, 'epoch': 0.91} 91%|█████████ | 5238/5773 [22:44<47:52, 5.37s/it] {'loss': 0.548, 'learning_rate': 4.47190569136019e-07, 'epoch': 0.91} 91%|█████████ | 5238/5773 [22:41<47:52, 5.37s/it] 91%|█████████ | 5239/5773 [22:46<47:36, 5.35s/it] 91%|█████████ | 5239/5773 [22:49<47:36, 5.35s/it] {'loss': 0.555, 'learning_rate': 4.455329052018287e-07, 'epoch': 0.91} 91%|█████████ | 5239/5773 [22:49<47:36, 5.35s/it] {'loss': 0.555, 'learning_rate': 4.455329052018287e-07, 'epoch': 0.91} 91%|█████████ | 5239/5773 [22:46<47:36, 5.35s/it] 91%|█████████ | 5240/5773 [22:52<47:34, 5.36s/it] 91%|█████████ | 5240/5773 [22:54<47:34, 5.36s/it] {'loss': 0.5548, 'learning_rate': 4.4387824931914404e-07, 'epoch': 0.91} 91%|█████████ | 5240/5773 [22:54<47:34, 5.36s/it] {'loss': 0.5548, 'learning_rate': 4.4387824931914404e-07, 'epoch': 0.91} 91%|█████████ | 5240/5773 [22:52<47:34, 5.36s/it] 91%|█████████ | 5241/5773 [22:57<47:59, 5.41s/it] 91%|█████████ | 5241/5773 [23:00<47:59, 5.41s/it] {'loss': 0.5373, 'learning_rate': 
4.4222660200890276e-07, 'epoch': 0.91} 91%|█████████ | 5241/5773 [23:00<47:59, 5.41s/it] {'loss': 0.5373, 'learning_rate': 4.4222660200890276e-07, 'epoch': 0.91} 91%|█████████ | 5241/5773 [22:57<47:59, 5.41s/it] 91%|█████████ | 5242/5773 [23:03<48:08, 5.44s/it] 91%|█████████ | 5242/5773 [23:05<48:08, 5.44s/it] {'loss': 0.5515, 'learning_rate': 4.4057796379109565e-07, 'epoch': 0.91} 91%|█████████ | 5242/5773 [23:05<48:08, 5.44s/it] {'loss': 0.5515, 'learning_rate': 4.4057796379109565e-07, 'epoch': 0.91} 91%|█████████ | 5242/5773 [23:03<48:08, 5.44s/it] 91%|█████████ | 5243/5773 [23:08<48:16, 5.47s/it] 91%|█████████ | 5243/5773 [23:11<48:16, 5.47s/it] {'loss': 0.5517, 'learning_rate': 4.389323351847674e-07, 'epoch': 0.91} 91%|█████████ | 5243/5773 [23:11<48:16, 5.47s/it] {'loss': 0.5517, 'learning_rate': 4.389323351847674e-07, 'epoch': 0.91} 91%|█████████ | 5243/5773 [23:08<48:16, 5.47s/it] 91%|█████████ | 5244/5773 [23:14<48:00, 5.44s/it] 91%|█████████ | 5244/5773 [23:16<48:00, 5.44s/it] {'loss': 0.5566, 'learning_rate': 4.3728971670801366e-07, 'epoch': 0.91} 91%|█████████ | 5244/5773 [23:16<48:00, 5.44s/it] {'loss': 0.5566, 'learning_rate': 4.3728971670801366e-07, 'epoch': 0.91} 91%|█████████ | 5244/5773 [23:14<48:00, 5.44s/it] 91%|█████████ | 5245/5773 [23:19<47:54, 5.44s/it] 91%|█████████ | 5245/5773 [23:22<47:54, 5.44s/it] {'loss': 0.5628, 'learning_rate': 4.356501088779841e-07, 'epoch': 0.91} 91%|█████████ | 5245/5773 [23:22<47:54, 5.44s/it] {'loss': 0.5628, 'learning_rate': 4.356501088779841e-07, 'epoch': 0.91} 91%|█████████ | 5245/5773 [23:19<47:54, 5.44s/it] 91%|█████████ | 5246/5773 [23:25<48:27, 5.52s/it] 91%|█████████ | 5246/5773 [23:27<48:27, 5.52s/it] {'loss': 0.5479, 'learning_rate': 4.3401351221087795e-07, 'epoch': 0.91} 91%|█████████ | 5246/5773 [23:27<48:27, 5.52s/it] {'loss': 0.5479, 'learning_rate': 4.3401351221087795e-07, 'epoch': 0.91} 91%|█████████ | 5246/5773 [23:25<48:27, 5.52s/it] 91%|█████████ | 5247/5773 [23:30<48:04, 5.48s/it] 
91%|█████████ | 5247/5773 [23:33<48:04, 5.48s/it] {'loss': 0.5542, 'learning_rate': 4.3237992722195197e-07, 'epoch': 0.91} 91%|█████████ | 5247/5773 [23:33<48:04, 5.48s/it] {'loss': 0.5542, 'learning_rate': 4.3237992722195197e-07, 'epoch': 0.91} 91%|█████████ | 5247/5773 [23:30<48:04, 5.48s/it] 91%|█████████ | 5248/5773 [23:36<47:46, 5.46s/it] 91%|█████████ | 5248/5773 [23:38<47:46, 5.46s/it] {'loss': 0.5473, 'learning_rate': 4.3074935442550483e-07, 'epoch': 0.91} 91%|█████████ | 5248/5773 [23:38<47:46, 5.46s/it] {'loss': 0.5473, 'learning_rate': 4.3074935442550483e-07, 'epoch': 0.91} 91%|█████████ | 5248/5773 [23:36<47:46, 5.46s/it] 91%|█████████ | 5249/5773 [23:41<48:06, 5.51s/it] 91%|█████████ | 5249/5773 [23:44<48:06, 5.51s/it] {'loss': 0.5478, 'learning_rate': 4.291217943348991e-07, 'epoch': 0.91} 91%|█████████ | 5249/5773 [23:44<48:06, 5.51s/it] {'loss': 0.5478, 'learning_rate': 4.291217943348991e-07, 'epoch': 0.91} 91%|█████████ | 5249/5773 [23:41<48:06, 5.51s/it]11 AutoResumeHook: Checking whether to suspend... 1413 AutoResumeHook: Checking whether to suspend...2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 015 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 91%|█████████ | 5250/5773 [23:47<48:03, 5.51s/it] 5 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
91%|█████████ | 5250/5773 [23:49<48:03, 5.51s/it] {'loss': 0.5561, 'learning_rate': 4.274972474625383e-07, 'epoch': 0.91} 91%|█████████ | 5250/5773 [23:49<48:03, 5.51s/it]{'loss': 0.5561, 'learning_rate': 4.274972474625383e-07, 'epoch': 0.91} 91%|█████████ | 5250/5773 [23:47<48:03, 5.51s/it] 91%|█████████ | 5251/5773 [23:52<47:59, 5.52s/it] 91%|█████████ | 5251/5773 [23:55<47:59, 5.52s/it] {'loss': 0.5532, 'learning_rate': 4.258757143198844e-07, 'epoch': 0.91} 91%|█████████ | 5251/5773 [23:55<47:59, 5.52s/it] {'loss': 0.5532, 'learning_rate': 4.258757143198844e-07, 'epoch': 0.91} 91%|█████████ | 5251/5773 [23:52<47:59, 5.52s/it] 91%|█████████ | 5252/5773 [23:58<47:24, 5.46s/it] 91%|█████████ | 5252/5773 [24:00<47:24, 5.46s/it] {'loss': 0.5481, 'learning_rate': 4.242571954174457e-07, 'epoch': 0.91} 91%|█████████ | 5252/5773 [24:00<47:24, 5.46s/it] {'loss': 0.5481, 'learning_rate': 4.242571954174457e-07, 'epoch': 0.91} 91%|█████████ | 5252/5773 [23:58<47:24, 5.46s/it] 91%|█████████ | 5253/5773 [24:03<47:22, 5.47s/it] 91%|█████████ | 5253/5773 [24:06<47:22, 5.47s/it] {'loss': 0.551, 'learning_rate': 4.226416912647857e-07, 'epoch': 0.91} 91%|█████████ | 5253/5773 [24:06<47:22, 5.47s/it] {'loss': 0.551, 'learning_rate': 4.226416912647857e-07, 'epoch': 0.91} 91%|█████████ | 5253/5773 [24:03<47:22, 5.47s/it] 91%|█████████ | 5254/5773 [24:09<47:10, 5.45s/it] 91%|█████████ | 5254/5773 [24:11<47:10, 5.45s/it] {'loss': 0.5491, 'learning_rate': 4.210292023705165e-07, 'epoch': 0.91} 91%|█████████ | 5254/5773 [24:11<47:10, 5.45s/it] {'loss': 0.5491, 'learning_rate': 4.210292023705165e-07, 'epoch': 0.91} 91%|█████████ | 5254/5773 [24:09<47:10, 5.45s/it] 91%|█████████ | 5255/5773 [24:14<46:55, 5.43s/it] 91%|█████████ | 5255/5773 [24:16<46:55, 5.43s/it] {'loss': 0.555, 'learning_rate': 4.1941972924229854e-07, 'epoch': 0.91} 91%|█████████ | 5255/5773 [24:16<46:55, 5.43s/it] {'loss': 0.555, 'learning_rate': 4.1941972924229854e-07, 'epoch': 0.91} 91%|█████████ | 5255/5773 
[24:14<46:55, 5.43s/it] 91%|█████████ | 5256/5773 [24:19<46:40, 5.42s/it] 91%|█████████ | 5256/5773 [24:22<46:40, 5.42s/it] {'loss': 0.5472, 'learning_rate': 4.1781327238684775e-07, 'epoch': 0.91} 91%|█████████ | 5256/5773 [24:22<46:40, 5.42s/it] {'loss': 0.5472, 'learning_rate': 4.1781327238684775e-07, 'epoch': 0.91} 91%|█████████ | 5256/5773 [24:19<46:40, 5.42s/it] 91%|█████████ | 5257/5773 [24:25<47:00, 5.47s/it] 91%|█████████ | 5257/5773 [24:27<47:00, 5.47s/it] {'loss': 0.569, 'learning_rate': 4.162098323099284e-07, 'epoch': 0.91} {'loss': 0.569, 'learning_rate': 4.162098323099284e-07, 'epoch': 0.91} 91%|█████████ | 5257/5773 [24:27<47:00, 5.47s/it] 91%|█████████ | 5257/5773 [24:25<47:00, 5.47s/it] 91%|█████████ | 5258/5773 [24:30<46:42, 5.44s/it] 91%|█████████ | 5258/5773 [24:33<46:42, 5.44s/it] {'loss': 0.5454, 'learning_rate': 4.146094095163544e-07, 'epoch': 0.91} 91%|█████████ | 5258/5773 [24:33<46:42, 5.44s/it] {'loss': 0.5454, 'learning_rate': 4.146094095163544e-07, 'epoch': 0.91} 91%|█████████ | 5258/5773 [24:30<46:42, 5.44s/it] 91%|█████████ | 5259/5773 [24:36<46:24, 5.42s/it] 91%|█████████ | 5259/5773 [24:38<46:24, 5.42s/it] {'loss': 0.5367, 'learning_rate': 4.1301200450998724e-07, 'epoch': 0.91} {'loss': 0.5367, 'learning_rate': 4.1301200450998724e-07, 'epoch': 0.91} 91%|█████████ | 5259/5773 [24:38<46:24, 5.42s/it] 91%|█████████ | 5259/5773 [24:36<46:24, 5.42s/it] 91%|█████████ | 5260/5773 [24:41<46:12, 5.40s/it] 91%|█████████ | 5260/5773 [24:44<46:12, 5.40s/it] {'loss': 0.555, 'learning_rate': 4.1141761779374457e-07, 'epoch': 0.91} 91%|█████████ | 5260/5773 [24:44<46:12, 5.40s/it] {'loss': 0.555, 'learning_rate': 4.1141761779374457e-07, 'epoch': 0.91} 91%|█████████ | 5260/5773 [24:41<46:12, 5.40s/it] 91%|█████████ | 5261/5773 [24:47<46:12, 5.41s/it] 91%|█████████ | 5261/5773 [24:49<46:12, 5.41s/it] {'loss': 0.564, 'learning_rate': 4.098262498695893e-07, 'epoch': 0.91} 91%|█████████ | 5261/5773 [24:49<46:12, 5.41s/it] {'loss': 0.564, 'learning_rate': 
4.098262498695893e-07, 'epoch': 0.91} 91%|█████████ | 5261/5773 [24:47<46:12, 5.41s/it] 91%|█████████ | 5262/5773 [24:52<46:10, 5.42s/it] 91%|█████████ | 5262/5773 [24:54<46:10, 5.42s/it] {'loss': 0.5714, 'learning_rate': 4.0823790123853514e-07, 'epoch': 0.91} 91%|█████████ | 5262/5773 [24:54<46:10, 5.42s/it] {'loss': 0.5714, 'learning_rate': 4.0823790123853514e-07, 'epoch': 0.91} 91%|█████████ | 5262/5773 [24:52<46:10, 5.42s/it] 91%|█████████ | 5263/5773 [24:57<45:55, 5.40s/it] 91%|█████████ | 5263/5773 [25:00<45:55, 5.40s/it] {'loss': 0.5577, 'learning_rate': 4.0665257240064316e-07, 'epoch': 0.91} 91%|█████████ | 5263/5773 [25:00<45:55, 5.40s/it] {'loss': 0.5577, 'learning_rate': 4.0665257240064316e-07, 'epoch': 0.91} 91%|█████████ | 5263/5773 [24:57<45:55, 5.40s/it] 91%|█████████ | 5264/5773 [25:03<45:55, 5.41s/it] 91%|█████████ | 5264/5773 [25:05<45:55, 5.41s/it] {'loss': 0.5539, 'learning_rate': 4.0507026385502747e-07, 'epoch': 0.91} 91%|█████████ | 5264/5773 [25:05<45:55, 5.41s/it] {'loss': 0.5539, 'learning_rate': 4.0507026385502747e-07, 'epoch': 0.91} 91%|█████████ | 5264/5773 [25:03<45:55, 5.41s/it] 91%|█████████ | 5265/5773 [25:08<46:07, 5.45s/it] 91%|█████████ | 5265/5773 [25:11<46:07, 5.45s/it] {'loss': 0.5527, 'learning_rate': 4.034909760998473e-07, 'epoch': 0.91} 91%|█████████ | 5265/5773 [25:11<46:07, 5.45s/it] {'loss': 0.5527, 'learning_rate': 4.034909760998473e-07, 'epoch': 0.91} 91%|█████████ | 5265/5773 [25:08<46:07, 5.45s/it] 91%|█████████ | 5266/5773 [25:14<45:52, 5.43s/it] 91%|█████████ | 5266/5773 [25:16<45:52, 5.43s/it] {'loss': 0.559, 'learning_rate': 4.0191470963231503e-07, 'epoch': 0.91} 91%|█████████ | 5266/5773 [25:16<45:52, 5.43s/it] {'loss': 0.559, 'learning_rate': 4.0191470963231503e-07, 'epoch': 0.91} 91%|█████████ | 5266/5773 [25:14<45:52, 5.43s/it] 91%|█████████ | 5267/5773 [25:19<45:32, 5.40s/it] 91%|█████████ | 5267/5773 [25:21<45:32, 5.40s/it] {'loss': 0.5637, 'learning_rate': 4.003414649486892e-07, 'epoch': 0.91} 91%|█████████ 
| 5267/5773 [25:21<45:32, 5.40s/it] {'loss': 0.5637, 'learning_rate': 4.003414649486892e-07, 'epoch': 0.91} 91%|█████████ | 5267/5773 [25:19<45:32, 5.40s/it] 91%|█████████▏| 5268/5773 [25:24<45:29, 5.41s/it] 91%|█████████▏| 5268/5773 [25:27<45:29, 5.41s/it] {'loss': 0.551, 'learning_rate': 3.987712425442758e-07, 'epoch': 0.91} 91%|█████████▏| 5268/5773 [25:27<45:29, 5.41s/it]{'loss': 0.551, 'learning_rate': 3.987712425442758e-07, 'epoch': 0.91} 91%|█████████▏| 5268/5773 [25:24<45:29, 5.41s/it] 91%|█████████▏| 5269/5773 [25:30<45:45, 5.45s/it] 91%|█████████▏| 5269/5773 [25:32<45:45, 5.45s/it] {'loss': 0.5603, 'learning_rate': 3.9720404291343385e-07, 'epoch': 0.91} 91%|█████████▏| 5269/5773 [25:32<45:45, 5.45s/it] {'loss': 0.5603, 'learning_rate': 3.9720404291343385e-07, 'epoch': 0.91} 91%|█████████▏| 5269/5773 [25:30<45:45, 5.45s/it] 91%|█████████▏| 5270/5773 [25:35<45:35, 5.44s/it] 91%|█████████▏| 5270/5773 [25:38<45:35, 5.44s/it] {'loss': 0.5506, 'learning_rate': 3.956398665495664e-07, 'epoch': 0.91} 91%|█████████▏| 5270/5773 [25:38<45:35, 5.44s/it] {'loss': 0.5506, 'learning_rate': 3.956398665495664e-07, 'epoch': 0.91} 91%|█████████▏| 5270/5773 [25:35<45:35, 5.44s/it] 91%|█████████▏| 5271/5773 [25:41<45:27, 5.43s/it] 91%|█████████▏| 5271/5773 [25:43<45:27, 5.43s/it] {'loss': 0.5689, 'learning_rate': 3.9407871394512633e-07, 'epoch': 0.91} 91%|█████████▏| 5271/5773 [25:43<45:27, 5.43s/it] {'loss': 0.5689, 'learning_rate': 3.9407871394512633e-07, 'epoch': 0.91} 91%|█████████▏| 5271/5773 [25:41<45:27, 5.43s/it] 91%|█████████▏| 5272/5773 [25:46<45:01, 5.39s/it] 91%|█████████▏| 5272/5773 [25:49<45:01, 5.39s/it] {'loss': 0.5648, 'learning_rate': 3.9252058559161257e-07, 'epoch': 0.91} 91%|█████████▏| 5272/5773 [25:49<45:01, 5.39s/it] {'loss': 0.5648, 'learning_rate': 3.9252058559161257e-07, 'epoch': 0.91} 91%|█████████▏| 5272/5773 [25:46<45:01, 5.39s/it] 91%|█████████▏| 5273/5773 [25:52<44:59, 5.40s/it] 91%|█████████▏| 5273/5773 [25:54<44:59, 5.40s/it] {'loss': 0.5649, 
'learning_rate': 3.9096548197957717e-07, 'epoch': 0.91} 91%|█████████▏| 5273/5773 [25:54<44:59, 5.40s/it] {'loss': 0.5649, 'learning_rate': 3.9096548197957717e-07, 'epoch': 0.91} 91%|█████████▏| 5273/5773 [25:52<44:59, 5.40s/it] 91%|█████████▏| 5274/5773 [25:57<45:02, 5.42s/it] 91%|█████████▏| 5274/5773 [25:59<45:02, 5.42s/it] {'loss': 0.54, 'learning_rate': 3.894134035986141e-07, 'epoch': 0.91} 91%|█████████▏| 5274/5773 [25:59<45:02, 5.42s/it] {'loss': 0.54, 'learning_rate': 3.894134035986141e-07, 'epoch': 0.91} 91%|█████████▏| 5274/5773 [25:57<45:02, 5.42s/it] 91%|█████████▏| 5275/5773 [26:02<45:02, 5.43s/it] 91%|█████████▏| 5275/5773 [26:05<45:02, 5.43s/it] {'loss': 0.5492, 'learning_rate': 3.87864350937367e-07, 'epoch': 0.91} 91%|█████████▏| 5275/5773 [26:05<45:02, 5.43s/it] {'loss': 0.5492, 'learning_rate': 3.87864350937367e-07, 'epoch': 0.91} 91%|█████████▏| 5275/5773 [26:02<45:02, 5.43s/it] 91%|█████████▏| 5276/5773 [26:08<45:02, 5.44s/it] 91%|█████████▏| 5276/5773 [26:10<45:02, 5.44s/it] {'loss': 0.5704, 'learning_rate': 3.863183244835289e-07, 'epoch': 0.91} 91%|█████████▏| 5276/5773 [26:10<45:02, 5.44s/it] {'loss': 0.5704, 'learning_rate': 3.863183244835289e-07, 'epoch': 0.91} 91%|█████████▏| 5276/5773 [26:08<45:02, 5.44s/it] 91%|█████████▏| 5277/5773 [26:13<45:09, 5.46s/it] 91%|█████████▏| 5277/5773 [26:16<45:09, 5.46s/it] {'loss': 0.5586, 'learning_rate': 3.8477532472383726e-07, 'epoch': 0.91} 91%|█████████▏| 5277/5773 [26:16<45:09, 5.46s/it] {'loss': 0.5586, 'learning_rate': 3.8477532472383726e-07, 'epoch': 0.91} 91%|█████████▏| 5277/5773 [26:13<45:09, 5.46s/it] 91%|█████████▏| 5278/5773 [26:19<44:56, 5.45s/it] 91%|█████████▏| 5278/5773 [26:21<44:56, 5.45s/it] {'loss': 0.5481, 'learning_rate': 3.832353521440768e-07, 'epoch': 0.91} 91%|█████████▏| 5278/5773 [26:21<44:56, 5.45s/it] {'loss': 0.5481, 'learning_rate': 3.832353521440768e-07, 'epoch': 0.91} 91%|█████████▏| 5278/5773 [26:19<44:56, 5.45s/it] 91%|█████████▏| 5279/5773 [26:24<44:53, 5.45s/it] 
91%|█████████▏| 5279/5773 [26:27<44:53, 5.45s/it] {'loss': 0.5469, 'learning_rate': 3.816984072290808e-07, 'epoch': 0.91} 91%|█████████▏| 5279/5773 [26:27<44:53, 5.45s/it]{'loss': 0.5469, 'learning_rate': 3.816984072290808e-07, 'epoch': 0.91} 91%|█████████▏| 5279/5773 [26:24<44:53, 5.45s/it] 91%|█████████▏| 5280/5773 [26:30<44:42, 5.44s/it] 91%|█████████▏| 5280/5773 [26:32<44:42, 5.44s/it] {'loss': 0.5643, 'learning_rate': 3.8016449046273e-07, 'epoch': 0.91} 91%|█████████▏| 5280/5773 [26:32<44:42, 5.44s/it] {'loss': 0.5643, 'learning_rate': 3.8016449046273e-07, 'epoch': 0.91} 91%|█████████▏| 5280/5773 [26:30<44:42, 5.44s/it] 91%|█████████▏| 5281/5773 [26:35<44:39, 5.45s/it] 91%|█████████▏| 5281/5773 [26:38<44:39, 5.45s/it] {'loss': 0.5435, 'learning_rate': 3.78633602327948e-07, 'epoch': 0.91} 91%|█████████▏| 5281/5773 [26:38<44:39, 5.45s/it]{'loss': 0.5435, 'learning_rate': 3.78633602327948e-07, 'epoch': 0.91} 91%|█████████▏| 5281/5773 [26:35<44:39, 5.45s/it] 91%|█████████▏| 5282/5773 [26:41<44:32, 5.44s/it] 91%|█████████▏| 5282/5773 [26:43<44:32, 5.44s/it] {'loss': 0.5449, 'learning_rate': 3.7710574330670934e-07, 'epoch': 0.91} 91%|█████████▏| 5282/5773 [26:43<44:32, 5.44s/it] {'loss': 0.5449, 'learning_rate': 3.7710574330670934e-07, 'epoch': 0.91} 91%|█████████▏| 5282/5773 [26:41<44:32, 5.44s/it] 92%|█████████▏| 5283/5773 [26:46<44:27, 5.44s/it] 92%|█████████▏| 5283/5773 [26:48<44:27, 5.44s/it] {'loss': 0.5565, 'learning_rate': 3.755809138800326e-07, 'epoch': 0.92} 92%|█████████▏| 5283/5773 [26:48<44:27, 5.44s/it] {'loss': 0.5565, 'learning_rate': 3.755809138800326e-07, 'epoch': 0.92} 92%|█████████▏| 5283/5773 [26:46<44:27, 5.44s/it] 92%|█████████▏| 5284/5773 [26:52<44:57, 5.52s/it] 92%|█████████▏| 5284/5773 [26:54<44:57, 5.52s/it] {'loss': 0.5501, 'learning_rate': 3.740591145279815e-07, 'epoch': 0.92} 92%|█████████▏| 5284/5773 [26:54<44:57, 5.52s/it]{'loss': 0.5501, 'learning_rate': 3.740591145279815e-07, 'epoch': 0.92} 92%|█████████▏| 5284/5773 [26:52<44:57, 
5.52s/it] 92%|█████████▏| 5285/5773 [26:57<44:37, 5.49s/it] 92%|█████████▏| 5285/5773 [27:00<44:37, 5.49s/it] {'loss': 0.5406, 'learning_rate': 3.725403457296672e-07, 'epoch': 0.92} 92%|█████████▏| 5285/5773 [27:00<44:37, 5.49s/it] {'loss': 0.5406, 'learning_rate': 3.725403457296672e-07, 'epoch': 0.92} 92%|█████████▏| 5285/5773 [26:57<44:37, 5.49s/it] 92%|█████████▏| 5286/5773 [27:03<44:26, 5.48s/it] 92%|█████████▏| 5286/5773 [27:05<44:26, 5.48s/it] {'loss': 0.5627, 'learning_rate': 3.7102460796324844e-07, 'epoch': 0.92} 92%|█████████▏| 5286/5773 [27:05<44:26, 5.48s/it] {'loss': 0.5627, 'learning_rate': 3.7102460796324844e-07, 'epoch': 0.92} 92%|█████████▏| 5286/5773 [27:03<44:26, 5.48s/it] 92%|█████████▏| 5287/5773 [27:08<43:57, 5.43s/it] 92%|█████████▏| 5287/5773 [27:10<43:57, 5.43s/it] {'loss': 0.5482, 'learning_rate': 3.695119017059268e-07, 'epoch': 0.92} 92%|█████████▏| 5287/5773 [27:10<43:57, 5.43s/it]{'loss': 0.5482, 'learning_rate': 3.695119017059268e-07, 'epoch': 0.92} 92%|█████████▏| 5287/5773 [27:08<43:57, 5.43s/it] 92%|█████████▏| 5288/5773 [27:14<44:20, 5.49s/it] 92%|█████████▏| 5288/5773 [27:16<44:20, 5.49s/it] {'loss': 0.5605, 'learning_rate': 3.680022274339501e-07, 'epoch': 0.92} 92%|█████████▏| 5288/5773 [27:16<44:20, 5.49s/it]{'loss': 0.5605, 'learning_rate': 3.680022274339501e-07, 'epoch': 0.92} 92%|█████████▏| 5288/5773 [27:14<44:20, 5.49s/it]WARNING: tokenization mismatch: 1 vs. 1440. 
[[{'from': 'human', 'value': 'ASSISTANT: Du bist ein Personalmarketing-Experte mit den folgenden Kenntnissen und Eigenschaften.\n\nKENNTNISSE: Personalmarketing, Social Media Marketing, Facebook-Marketing, Kampagnenentwicklung, Zielgruppenanalyse, Content-Strategie, Werbeanzeigen, Performance-Messung, Personalgewinnung, Steuerfachangestellte.\n\nEIGENSCHAFTEN: Kreativ, strategisch, analytisch, kommunikativ, zielorientiert, detailverliebt, teamorientiert.\n\nAUFGABE: Entwickle eine erfolgreiche Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook. Erstelle eine Tabelle mit allen Maßnahmen und weise jeder Maßnahme den zuständigen Spezialisten in einer Marketingagentur zu. Achte dabei auf Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung.\n\nOUTPUT: Eine Tabelle mit allen Maßnahmen für eine erfolgreiche Personalmarketing-Kampagne zur Gewinnung von Steuerfachangestellten auf Facebook, die die Verantwortlichkeiten der Spezialisten in einer Marketingagentur aufzeigt, einschließlich Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung.'}, {'from': 'gpt', 'value': '| Maßnahme | Beschreibung | Zuständiger Spezialist |\n| --- | --- | --- |\n| Zielgruppenanalyse | Analyse der demografischen Daten, Interessen und Verhaltensweisen der potenziellen Bewerber. | Marktanalyst |\n| Erstellung von Personas | Entwicklung von fiktiven, repräsentativen Bewerberprofilen zur besseren Zielgruppenansprache. | Personalmarketing-Experte |\n| Content-Strategie | Planung von relevantem, ansprechendem und zielgruppenspezifischem Content. | Content-Stratege |\n| Redaktionsplan | Erstellung eines Zeitplans für die Veröffentlichung von Inhalten auf Facebook. | Social Media Manager |\n| Erstellung von Werbeanzeigen | Design und Texten von ansprechenden, zielgruppenspezifischen Facebook-Werbeanzeigen. 
| Art Director & Copywriter |\n| Auswahl von Targeting-Optionen | Festlegung der Zielgruppenparameter für die Facebook-Werbekampagnen. | Facebook-Marketing-Experte |\n| Budgetierung und Zeitplanung | Festlegung des Budgets und der Laufzeit für die Facebook-Werbekampagnen. | Media Planner |\n| Erstellung von Landing Pages | Gestaltung und Implementierung von spezifischen Landing Pages für die Bewerbung. | Web-Designer |\n| Performance-Messung | Überwachung und Analyse der Kampagnenleistung, einschließlich KPIs wie Klicks und Bewerbungen. | Data Analyst |\n| A/B-Testing | Durchführung von A/B-Tests, um die Effektivität von Anzeigen und Landing Pages zu optimieren. | Conversion-Optimierungsexperte |\n| Community-Management | Interaktion mit der Zielgruppe auf Facebook, Beantwortung von Fragen und Moderation von Diskussionen. | Community Manager |\n| Reporting und Optimierung | Regelmäßige Berichterstattung über die Kampagnenergebnisse und Anpassung der Strategie. | Performance Marketing-Manager |'}, {'from': 'human', 'value': 'Weiter.'}, {'from': 'gpt', 'value': 'Hier sind einige zusätzliche Maßnahmen, die Sie in Ihre Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook einbeziehen können:\n\n| Maßnahme | Beschreibung | Zuständiger Spezialist |\n| --- | --- | --- |\n| Video-Content | Erstellung von Video-Content, wie Interviews mit aktuellen Mitarbeitern oder Berufsalltagsvideos. | Videoproduzent |\n| Influencer-Marketing | Zusammenarbeit mit Influencern aus der Branche, um Ihre Arbeitgebermarke zu stärken. | Influencer-Marketing-Experte |\n| Anpassung von Stellenanzeigen | Anpassung der Stellenanzeigen auf Facebook, um die Zielgruppe besser anzusprechen. | Personalmarketing-Experte |\n| Mitarbeiter-Testimonials | Teilen von Erfahrungsberichten und Zitaten der Mitarbeiter auf Facebook, um Authentizität zu schaffen. 
| Content-Stratege |\n| Virtuelle Karriere-Events | Durchführung von virtuellen Karriere-Events oder Informationsveranstaltungen auf Facebook. | Event-Manager |\n| Interaktive Umfragen und Quiz | Nutzung von interaktiven Umfragen und Quiz auf Facebook, um das Engagement zu erhöhen. | Social Media Manager |\n| Remarketing-Kampagnen | Einsatz von Remarketing-Anzeigen, um die Personen erneut anzusprechen, die bereits interessiert waren. | Facebook-Marketing-Experte |\n| Monitoring von Wettbewerbern | Beobachtung der Personalmarketing-Aktivitäten der Wettbewerber auf Facebook und Ableitung von Optimierungspotenzialen. | Marktanalyst |\n| Integration von Chatbots | Implementierung von Chatbots auf der Facebook-Seite und der Landing Page zur Automatisierung des Bewerbungsprozesses. | Chatbot-Entwickler |\n| Employer Branding-Kampagne | Entwicklung einer Employer Branding-Kampagne, die die Unternehmenskultur und -werte hervorhebt. | Employer Branding-Experte |\n\nMit diesen erweiterten Maßnahmen können Sie eine noch umfassendere Personalmarketing-Strategie entwickeln, die die Chancen erhöht, qualifizierte Steuerfachangestellte über Facebook zu gewinnen.'}]] (ignored) 92%|█████████▏| 5289/5773 [27:19<44:17, 5.49s/it] 92%|█████████▏| 5289/5773 [27:21<44:17, 5.49s/it] {'loss': 0.5597, 'learning_rate': 3.6649558562261377e-07, 'epoch': 0.92} 92%|█████████▏| 5289/5773 [27:21<44:17, 5.49s/it] {'loss': 0.5597, 'learning_rate': 3.6649558562261377e-07, 'epoch': 0.92} 92%|█████████▏| 5289/5773 [27:19<44:17, 5.49s/it] 92%|█████████▏| 5290/5773 [27:24<43:44, 5.43s/it] 92%|█████████▏| 5290/5773 [27:27<43:44, 5.43s/it] {'loss': 0.5394, 'learning_rate': 3.64991976746254e-07, 'epoch': 0.92} {'loss': 0.5394, 'learning_rate': 3.64991976746254e-07, 'epoch': 0.92} 92%|█████████▏| 5290/5773 [27:27<43:44, 5.43s/it] 92%|█████████▏| 5290/5773 [27:24<43:44, 5.43s/it] 92%|█████████▏| 5291/5773 [27:30<43:54, 5.47s/it] 92%|█████████▏| 5291/5773 [27:32<43:54, 5.47s/it] {'loss': 0.5541, 
'learning_rate': 3.6349140127825533e-07, 'epoch': 0.92} 92%|█████████▏| 5291/5773 [27:32<43:54, 5.47s/it] {'loss': 0.5541, 'learning_rate': 3.6349140127825533e-07, 'epoch': 0.92} 92%|█████████▏| 5291/5773 [27:30<43:54, 5.47s/it] 92%|█████████▏| 5292/5773 [27:35<43:39, 5.45s/it] 92%|█████████▏| 5292/5773 [27:38<43:39, 5.45s/it] {'loss': 0.5338, 'learning_rate': 3.619938596910499e-07, 'epoch': 0.92} 92%|█████████▏| 5292/5773 [27:38<43:39, 5.45s/it] {'loss': 0.5338, 'learning_rate': 3.619938596910499e-07, 'epoch': 0.92} 92%|█████████▏| 5292/5773 [27:35<43:39, 5.45s/it] 92%|█████████▏| 5293/5773 [27:41<43:31, 5.44s/it] 92%|█████████▏| 5293/5773 [27:43<43:31, 5.44s/it] {'loss': 0.5584, 'learning_rate': 3.6049935245610846e-07, 'epoch': 0.92} 92%|█████████▏| 5293/5773 [27:43<43:31, 5.44s/it] {'loss': 0.5584, 'learning_rate': 3.6049935245610846e-07, 'epoch': 0.92} 92%|█████████▏| 5293/5773 [27:41<43:31, 5.44s/it] 92%|█████████▏| 5294/5773 [27:46<43:32, 5.45s/it] 92%|█████████▏| 5294/5773 [27:49<43:32, 5.45s/it] {'loss': 0.563, 'learning_rate': 3.5900788004395113e-07, 'epoch': 0.92} 92%|█████████▏| 5294/5773 [27:49<43:32, 5.45s/it]{'loss': 0.563, 'learning_rate': 3.5900788004395113e-07, 'epoch': 0.92} 92%|█████████▏| 5294/5773 [27:46<43:32, 5.45s/it] 92%|█████████▏| 5295/5773 [27:52<43:32, 5.47s/it] 92%|█████████▏| 5295/5773 [27:54<43:32, 5.47s/it] {'loss': 0.5645, 'learning_rate': 3.575194429241402e-07, 'epoch': 0.92} 92%|█████████▏| 5295/5773 [27:54<43:32, 5.47s/it]{'loss': 0.5645, 'learning_rate': 3.575194429241402e-07, 'epoch': 0.92} 92%|█████████▏| 5295/5773 [27:52<43:32, 5.47s/it] 92%|█████████▏| 5296/5773 [27:57<43:28, 5.47s/it] 92%|█████████▏| 5296/5773 [28:00<43:28, 5.47s/it] {'loss': 0.5514, 'learning_rate': 3.560340415652852e-07, 'epoch': 0.92} {'loss': 0.5514, 'learning_rate': 3.560340415652852e-07, 'epoch': 0.92} 92%|█████████▏| 5296/5773 [28:00<43:28, 5.47s/it] 92%|█████████▏| 5296/5773 [27:57<43:28, 5.47s/it] 92%|█████████▏| 5297/5773 [28:02<42:59, 5.42s/it] 
92%|█████████▏| 5297/5773 [28:05<42:59, 5.42s/it] {'loss': 0.5456, 'learning_rate': 3.545516764350343e-07, 'epoch': 0.92} 92%|█████████▏| 5297/5773 [28:05<42:59, 5.42s/it] {'loss': 0.5456, 'learning_rate': 3.545516764350343e-07, 'epoch': 0.92} 92%|█████████▏| 5297/5773 [28:02<42:59, 5.42s/it] 92%|█████████▏| 5298/5773 [28:08<43:00, 5.43s/it] 92%|█████████▏| 5298/5773 [28:10<43:00, 5.43s/it] {'loss': 0.5435, 'learning_rate': 3.530723480000875e-07, 'epoch': 0.92} 92%|█████████▏| 5298/5773 [28:08<43:00, 5.43s/it]{'loss': 0.5435, 'learning_rate': 3.530723480000875e-07, 'epoch': 0.92} 92%|█████████▏| 5298/5773 [28:10<43:00, 5.43s/it] 92%|█████████▏| 5299/5773 [28:13<42:53, 5.43s/it] 92%|█████████▏| 5299/5773 [28:16<42:53, 5.43s/it] {'loss': 0.5586, 'learning_rate': 3.5159605672618224e-07, 'epoch': 0.92} 92%|█████████▏| 5299/5773 [28:16<42:53, 5.43s/it] {'loss': 0.5586, 'learning_rate': 3.5159605672618224e-07, 'epoch': 0.92} 92%|█████████▏| 5299/5773 [28:13<42:53, 5.43s/it]11 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 09 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 92%|█████████▏| 5300/5773 [28:19<42:47, 5.43s/it]15 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 92%|█████████▏| 5300/5773 [28:21<42:47, 5.43s/it]7 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.538, 'learning_rate': 3.5012280307810344e-07, 'epoch': 0.92} 92%|█████████▏| 5300/5773 [28:21<42:47, 5.43s/it] {'loss': 0.538, 'learning_rate': 3.5012280307810344e-07, 'epoch': 0.92} 92%|█████████▏| 5300/5773 [28:19<42:47, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 92%|█████████▏| 5301/5773 [28:40<1:15:13, 9.56s/it] 92%|█████████▏| 5301/5773 [28:38<1:15:13, 9.56s/it] {'loss': 0.5523, 'learning_rate': 3.486525875196756e-07, 'epoch': 0.92} 92%|█████████▏| 5301/5773 [28:40<1:15:13, 9.56s/it] {'loss': 0.5523, 'learning_rate': 3.486525875196756e-07, 'epoch': 0.92} 92%|█████████▏| 5301/5773 [28:38<1:15:13, 9.56s/it] 92%|█████████▏| 5302/5773 [28:46<1:05:27, 8.34s/it] 92%|█████████▏| 5302/5773 [28:43<1:05:28, 8.34s/it] {'loss': 0.5448, 'learning_rate': 3.47185410513774e-07, 'epoch': 0.92} 92%|█████████▏| 5302/5773 [28:46<1:05:27, 8.34s/it] {'loss': 0.5448, 'learning_rate': 3.47185410513774e-07, 'epoch': 0.92} 92%|█████████▏| 5302/5773 [28:43<1:05:28, 8.34s/it] 92%|█████████▏| 5303/5773 [28:51<58:10, 7.43s/it] 92%|█████████▏| 5303/5773 [28:49<58:11, 7.43s/it] {'loss': 0.5605, 'learning_rate': 3.4572127252231134e-07, 'epoch': 0.92} 92%|█████████▏| 5303/5773 [28:51<58:10, 7.43s/it] {'loss': 0.5605, 'learning_rate': 3.4572127252231134e-07, 'epoch': 0.92} 92%|█████████▏| 5303/5773 [28:49<58:11, 7.43s/it] 92%|█████████▏| 5304/5773 [28:57<53:45, 6.88s/it] 92%|█████████▏| 5304/5773 [28:54<53:45, 6.88s/it] {'loss': 0.5571, 'learning_rate': 3.4426017400624124e-07, 'epoch': 0.92} 92%|█████████▏| 5304/5773 [28:57<53:45, 6.88s/it] {'loss': 0.5571, 'learning_rate': 3.4426017400624124e-07, 'epoch': 0.92} 92%|█████████▏| 5304/5773 [28:54<53:45, 6.88s/it] 92%|█████████▏| 5305/5773 [29:02<50:37, 6.49s/it] 92%|█████████▏| 5305/5773 [29:00<50:37, 6.49s/it] {'loss': 0.5824, 'learning_rate': 3.4280211542556896e-07, 'epoch': 0.92} 92%|█████████▏| 5305/5773 [29:02<50:37, 6.49s/it] {'loss': 0.5824, 'learning_rate': 3.4280211542556896e-07, 'epoch': 0.92} 92%|█████████▏| 5305/5773 [29:00<50:37, 6.49s/it] 92%|█████████▏| 5306/5773 [29:08<48:06, 6.18s/it] 92%|█████████▏| 5306/5773 [29:05<48:06, 6.18s/it] {'loss': 0.5458, 'learning_rate': 3.413470972393351e-07, 'epoch': 0.92} 92%|█████████▏| 5306/5773 [29:08<48:06, 6.18s/it] {'loss': 0.5458, 
'learning_rate': 3.413470972393351e-07, 'epoch': 0.92} 92%|█████████▏| 5306/5773 [29:05<48:06, 6.18s/it] 92%|█████████▏| 5307/5773 [29:13<46:42, 6.01s/it] 92%|█████████▏| 5307/5773 [29:11<46:42, 6.01s/it] {'loss': 0.564, 'learning_rate': 3.398951199056266e-07, 'epoch': 0.92} 92%|█████████▏| 5307/5773 [29:13<46:42, 6.01s/it] {'loss': 0.564, 'learning_rate': 3.398951199056266e-07, 'epoch': 0.92} 92%|█████████▏| 5307/5773 [29:11<46:42, 6.01s/it] 92%|█████████▏| 5308/5773 [29:19<45:22, 5.85s/it] 92%|█████████▏| 5308/5773 [29:17<45:22, 5.85s/it] {'loss': 0.5491, 'learning_rate': 3.3844618388157335e-07, 'epoch': 0.92} 92%|█████████▏| 5308/5773 [29:19<45:22, 5.85s/it] {'loss': 0.5491, 'learning_rate': 3.3844618388157335e-07, 'epoch': 0.92} 92%|█████████▏| 5308/5773 [29:17<45:22, 5.85s/it] 92%|█████████▏| 5309/5773 [29:24<44:27, 5.75s/it] 92%|█████████▏| 5309/5773 [29:22<44:27, 5.75s/it] {'loss': 0.5746, 'learning_rate': 3.3700028962334595e-07, 'epoch': 0.92} 92%|█████████▏| 5309/5773 [29:24<44:27, 5.75s/it] {'loss': 0.5746, 'learning_rate': 3.3700028962334595e-07, 'epoch': 0.92} 92%|█████████▏| 5309/5773 [29:22<44:27, 5.75s/it] 92%|█████████▏| 5310/5773 [29:30<43:38, 5.65s/it] 92%|█████████▏| 5310/5773 [29:27<43:38, 5.65s/it] {'loss': 0.5565, 'learning_rate': 3.35557437586157e-07, 'epoch': 0.92} 92%|█████████▏| 5310/5773 [29:30<43:38, 5.65s/it] {'loss': 0.5565, 'learning_rate': 3.35557437586157e-07, 'epoch': 0.92} 92%|█████████▏| 5310/5773 [29:27<43:38, 5.65s/it] 92%|█████████▏| 5311/5773 [29:35<42:48, 5.56s/it] 92%|█████████▏| 5311/5773 [29:33<42:48, 5.56s/it] {'loss': 0.5477, 'learning_rate': 3.341176282242653e-07, 'epoch': 0.92} 92%|█████████▏| 5311/5773 [29:35<42:48, 5.56s/it] {'loss': 0.5477, 'learning_rate': 3.341176282242653e-07, 'epoch': 0.92} 92%|█████████▏| 5311/5773 [29:33<42:48, 5.56s/it] 92%|█████████▏| 5312/5773 [29:41<42:28, 5.53s/it] 92%|█████████▏| 5312/5773 [29:38<42:28, 5.53s/it] {'loss': 0.5658, 'learning_rate': 3.3268086199096606e-07, 'epoch': 0.92} 
92%|█████████▏| 5312/5773 [29:41<42:28, 5.53s/it] {'loss': 0.5658, 'learning_rate': 3.3268086199096606e-07, 'epoch': 0.92} 92%|█████████▏| 5312/5773 [29:38<42:28, 5.53s/it] 92%|█████████▏| 5313/5773 [29:46<42:31, 5.55s/it] 92%|█████████▏| 5313/5773 [29:44<42:31, 5.55s/it] {'loss': 0.5584, 'learning_rate': 3.3124713933860074e-07, 'epoch': 0.92} 92%|█████████▏| 5313/5773 [29:46<42:31, 5.55s/it] {'loss': 0.5584, 'learning_rate': 3.3124713933860074e-07, 'epoch': 0.92} 92%|█████████▏| 5313/5773 [29:44<42:31, 5.55s/it] 92%|█████████▏| 5314/5773 [29:52<42:15, 5.52s/it] 92%|█████████▏| 5314/5773 [29:49<42:15, 5.52s/it] {'loss': 0.5552, 'learning_rate': 3.298164607185494e-07, 'epoch': 0.92} 92%|█████████▏| 5314/5773 [29:52<42:15, 5.52s/it] {'loss': 0.5552, 'learning_rate': 3.298164607185494e-07, 'epoch': 0.92} 92%|█████████▏| 5314/5773 [29:49<42:15, 5.52s/it] 92%|█████████▏| 5315/5773 [29:57<41:30, 5.44s/it] 92%|█████████▏| 5315/5773 [29:55<41:30, 5.44s/it] {'loss': 0.5579, 'learning_rate': 3.2838882658123736e-07, 'epoch': 0.92} 92%|█████████▏| 5315/5773 [29:57<41:30, 5.44s/it] {'loss': 0.5579, 'learning_rate': 3.2838882658123736e-07, 'epoch': 0.92} 92%|█████████▏| 5315/5773 [29:55<41:30, 5.44s/it] 92%|█████████▏| 5316/5773 [30:02<41:04, 5.39s/it] 92%|█████████▏| 5316/5773 [30:00<41:04, 5.39s/it] {'loss': 0.5415, 'learning_rate': 3.2696423737612946e-07, 'epoch': 0.92} 92%|█████████▏| 5316/5773 [30:02<41:04, 5.39s/it] {'loss': 0.5415, 'learning_rate': 3.2696423737612946e-07, 'epoch': 0.92} 92%|█████████▏| 5316/5773 [30:00<41:04, 5.39s/it] 92%|█████████▏| 5317/5773 [30:08<41:07, 5.41s/it] 92%|█████████▏| 5317/5773 [30:05<41:07, 5.41s/it] {'loss': 0.5518, 'learning_rate': 3.255426935517303e-07, 'epoch': 0.92} 92%|█████████▏| 5317/5773 [30:08<41:07, 5.41s/it] {'loss': 0.5518, 'learning_rate': 3.255426935517303e-07, 'epoch': 0.92} 92%|█████████▏| 5317/5773 [30:05<41:07, 5.41s/it] 92%|█████████▏| 5318/5773 [30:13<41:33, 5.48s/it] 92%|█████████▏| 5318/5773 [30:11<41:33, 5.48s/it] 
{'loss': 0.5484, 'learning_rate': 3.241241955555874e-07, 'epoch': 0.92} 92%|█████████▏| 5318/5773 [30:13<41:33, 5.48s/it] {'loss': 0.5484, 'learning_rate': 3.241241955555874e-07, 'epoch': 0.92} 92%|█████████▏| 5318/5773 [30:11<41:33, 5.48s/it] 92%|█████████▏| 5319/5773 [30:19<41:36, 5.50s/it] 92%|█████████▏| 5319/5773 [30:16<41:36, 5.50s/it] {'loss': 0.5543, 'learning_rate': 3.227087438342913e-07, 'epoch': 0.92} 92%|█████████▏| 5319/5773 [30:19<41:36, 5.50s/it] {'loss': 0.5543, 'learning_rate': 3.227087438342913e-07, 'epoch': 0.92} 92%|█████████▏| 5319/5773 [30:16<41:36, 5.50s/it] 92%|█████████▏| 5320/5773 [30:24<41:18, 5.47s/it] 92%|█████████▏| 5320/5773 [30:22<41:18, 5.47s/it] {'loss': 0.5718, 'learning_rate': 3.2129633883346777e-07, 'epoch': 0.92} 92%|█████████▏| 5320/5773 [30:24<41:18, 5.47s/it] {'loss': 0.5718, 'learning_rate': 3.2129633883346777e-07, 'epoch': 0.92} 92%|█████████▏| 5320/5773 [30:22<41:18, 5.47s/it] 92%|█████████▏| 5321/5773 [30:30<41:28, 5.51s/it] 92%|█████████▏| 5321/5773 [30:27<41:28, 5.51s/it] {'loss': 0.556, 'learning_rate': 3.1988698099779114e-07, 'epoch': 0.92} 92%|█████████▏| 5321/5773 [30:30<41:28, 5.51s/it] {'loss': 0.556, 'learning_rate': 3.1988698099779114e-07, 'epoch': 0.92} 92%|█████████▏| 5321/5773 [30:27<41:28, 5.51s/it] 92%|█████████▏| 5322/5773 [30:35<41:19, 5.50s/it] 92%|█████████▏| 5322/5773 [30:33<41:19, 5.50s/it] {'loss': 0.5685, 'learning_rate': 3.184806707709698e-07, 'epoch': 0.92} 92%|█████████▏| 5322/5773 [30:35<41:19, 5.50s/it] {'loss': 0.5685, 'learning_rate': 3.184806707709698e-07, 'epoch': 0.92} 92%|█████████▏| 5322/5773 [30:33<41:19, 5.50s/it] 92%|█████████▏| 5323/5773 [30:41<41:14, 5.50s/it] 92%|█████████▏| 5323/5773 [30:38<41:14, 5.50s/it] {'loss': 0.5455, 'learning_rate': 3.1707740859575395e-07, 'epoch': 0.92} 92%|█████████▏| 5323/5773 [30:41<41:14, 5.50s/it] {'loss': 0.5455, 'learning_rate': 3.1707740859575395e-07, 'epoch': 0.92} 92%|█████████▏| 5323/5773 [30:38<41:14, 5.50s/it] 92%|█████████▏| 5324/5773 
[30:46<41:16, 5.52s/it] 92%|█████████▏| 5324/5773 [30:44<41:16, 5.52s/it] {'loss': 0.5473, 'learning_rate': 3.156771949139392e-07, 'epoch': 0.92} 92%|█████████▏| 5324/5773 [30:46<41:16, 5.52s/it] {'loss': 0.5473, 'learning_rate': 3.156771949139392e-07, 'epoch': 0.92} 92%|█████████▏| 5324/5773 [30:44<41:16, 5.52s/it] 92%|█████████▏| 5325/5773 [30:52<41:11, 5.52s/it] 92%|█████████▏| 5325/5773 [30:50<41:11, 5.52s/it] {'loss': 0.5651, 'learning_rate': 3.1428003016635514e-07, 'epoch': 0.92} 92%|█████████▏| 5325/5773 [30:52<41:11, 5.52s/it] {'loss': 0.5651, 'learning_rate': 3.1428003016635514e-07, 'epoch': 0.92} 92%|█████████▏| 5325/5773 [30:50<41:11, 5.52s/it] 92%|█████████▏| 5326/5773 [30:57<40:39, 5.46s/it] 92%|█████████▏| 5326/5773 [30:55<40:39, 5.46s/it] {'loss': 0.5705, 'learning_rate': 3.1288591479287424e-07, 'epoch': 0.92} 92%|█████████▏| 5326/5773 [30:57<40:39, 5.46s/it] {'loss': 0.5705, 'learning_rate': 3.1288591479287424e-07, 'epoch': 0.92} 92%|█████████▏| 5326/5773 [30:55<40:39, 5.46s/it] 92%|█████████▏| 5327/5773 [31:03<40:54, 5.50s/it] 92%|█████████▏| 5327/5773 [31:00<40:54, 5.50s/it] {'loss': 0.5497, 'learning_rate': 3.114948492324077e-07, 'epoch': 0.92} 92%|█████████▏| 5327/5773 [31:03<40:54, 5.50s/it] {'loss': 0.5497, 'learning_rate': 3.114948492324077e-07, 'epoch': 0.92} 92%|█████████▏| 5327/5773 [31:00<40:54, 5.50s/it] 92%|█████████▏| 5328/5773 [31:08<40:54, 5.52s/it] 92%|█████████▏| 5328/5773 [31:06<40:54, 5.52s/it] {'loss': 0.5599, 'learning_rate': 3.1010683392290964e-07, 'epoch': 0.92} 92%|█████████▏| 5328/5773 [31:08<40:54, 5.52s/it] {'loss': 0.5599, 'learning_rate': 3.1010683392290964e-07, 'epoch': 0.92} 92%|█████████▏| 5328/5773 [31:06<40:54, 5.52s/it] 92%|█████████▏| 5329/5773 [31:14<40:43, 5.50s/it] 92%|█████████▏| 5329/5773 [31:11<40:43, 5.50s/it] {'loss': 0.5387, 'learning_rate': 3.087218693013694e-07, 'epoch': 0.92} 92%|█████████▏| 5329/5773 [31:14<40:43, 5.50s/it] {'loss': 0.5387, 'learning_rate': 3.087218693013694e-07, 'epoch': 0.92} 
92%|█████████▏| 5329/5773 [31:11<40:43, 5.50s/it] 92%|█████████▏| 5330/5773 [31:19<40:50, 5.53s/it] 92%|█████████▏| 5330/5773 [31:17<40:50, 5.53s/it] {'loss': 0.5579, 'learning_rate': 3.0733995580381817e-07, 'epoch': 0.92} 92%|█████████▏| 5330/5773 [31:19<40:50, 5.53s/it] {'loss': 0.5579, 'learning_rate': 3.0733995580381817e-07, 'epoch': 0.92} 92%|█████████▏| 5330/5773 [31:17<40:50, 5.53s/it] 92%|█████████▏| 5331/5773 [31:25<40:30, 5.50s/it] 92%|█████████▏| 5331/5773 [31:22<40:30, 5.50s/it] {'loss': 0.5644, 'learning_rate': 3.0596109386533015e-07, 'epoch': 0.92} 92%|█████████▏| 5331/5773 [31:25<40:30, 5.50s/it] {'loss': 0.5644, 'learning_rate': 3.0596109386533015e-07, 'epoch': 0.92} 92%|█████████▏| 5331/5773 [31:22<40:30, 5.50s/it] 92%|█████████▏| 5332/5773 [31:30<39:54, 5.43s/it] {'loss': 0.5496, 'learning_rate': 3.045852839200103e-07, 'epoch': 0.92} 92%|█████████▏| 5332/5773 [31:28<39:54, 5.43s/it] {'loss': 0.5496, 'learning_rate': 3.045852839200103e-07, 'epoch': 0.92} 92%|█████████▏| 5332/5773 [31:30<39:54, 5.43s/it] 92%|█████████▏| 5332/5773 [31:28<39:54, 5.43s/it] 92%|█████████▏| 5333/5773 [31:36<39:50, 5.43s/it] 92%|█████████▏| 5333/5773 [31:33<39:50, 5.43s/it] {'loss': 0.5641, 'learning_rate': 3.0321252640100883e-07, 'epoch': 0.92} 92%|█████████▏| 5333/5773 [31:36<39:50, 5.43s/it] {'loss': 0.5641, 'learning_rate': 3.0321252640100883e-07, 'epoch': 0.92} 92%|█████████▏| 5333/5773 [31:33<39:50, 5.43s/it] 92%|█████████▏| 5334/5773 [31:41<39:39, 5.42s/it] 92%|█████████▏| 5334/5773 [31:39<39:39, 5.42s/it] {'loss': 0.557, 'learning_rate': 3.018428217405145e-07, 'epoch': 0.92} 92%|█████████▏| 5334/5773 [31:41<39:39, 5.42s/it] {'loss': 0.557, 'learning_rate': 3.018428217405145e-07, 'epoch': 0.92} 92%|█████████▏| 5334/5773 [31:39<39:39, 5.42s/it] 92%|█████████▏| 5335/5773 [31:46<39:14, 5.38s/it] 92%|█████████▏| 5335/5773 [31:44<39:14, 5.38s/it] {'loss': 0.5665, 'learning_rate': 3.0047617036975453e-07, 'epoch': 0.92} 92%|█████████▏| 5335/5773 [31:46<39:14, 5.38s/it] 
{'loss': 0.5665, 'learning_rate': 3.0047617036975453e-07, 'epoch': 0.92} 92%|█████████▏| 5335/5773 [31:44<39:14, 5.38s/it] 92%|█████████▏| 5336/5773 [31:52<39:19, 5.40s/it] 92%|█████████▏| 5336/5773 [31:49<39:19, 5.40s/it] {'loss': 0.5486, 'learning_rate': 2.9911257271899253e-07, 'epoch': 0.92} 92%|█████████▏| 5336/5773 [31:52<39:19, 5.40s/it] {'loss': 0.5486, 'learning_rate': 2.9911257271899253e-07, 'epoch': 0.92} 92%|█████████▏| 5336/5773 [31:49<39:19, 5.40s/it] 92%|█████████▏| 5337/5773 [31:57<39:25, 5.43s/it] 92%|█████████▏| 5337/5773 [31:55<39:25, 5.43s/it] {'loss': 0.5484, 'learning_rate': 2.977520292175351e-07, 'epoch': 0.92} 92%|█████████▏| 5337/5773 [31:57<39:25, 5.43s/it] {'loss': 0.5484, 'learning_rate': 2.977520292175351e-07, 'epoch': 0.92} 92%|█████████▏| 5337/5773 [31:55<39:25, 5.43s/it] 92%|█████████▏| 5338/5773 [32:03<39:11, 5.41s/it] 92%|█████████▏| 5338/5773 [32:00<39:11, 5.41s/it] {'loss': 0.556, 'learning_rate': 2.9639454029372405e-07, 'epoch': 0.92} 92%|█████████▏| 5338/5773 [32:03<39:11, 5.41s/it] {'loss': 0.556, 'learning_rate': 2.9639454029372405e-07, 'epoch': 0.92} 92%|█████████▏| 5338/5773 [32:00<39:11, 5.41s/it] 92%|█████████▏| 5339/5773 [32:08<39:19, 5.44s/it] 92%|█████████▏| 5339/5773 [32:06<39:19, 5.44s/it] {'loss': 0.5257, 'learning_rate': 2.9504010637493976e-07, 'epoch': 0.92} 92%|█████████▏| 5339/5773 [32:08<39:19, 5.44s/it] {'loss': 0.5257, 'learning_rate': 2.9504010637493976e-07, 'epoch': 0.92} 92%|█████████▏| 5339/5773 [32:06<39:19, 5.44s/it] 92%|█████████▏| 5340/5773 [32:14<39:17, 5.44s/it] 92%|█████████▏| 5340/5773 [32:11<39:17, 5.44s/it] {'loss': 0.5548, 'learning_rate': 2.9368872788759886e-07, 'epoch': 0.92} 92%|█████████▏| 5340/5773 [32:14<39:17, 5.44s/it] {'loss': 0.5548, 'learning_rate': 2.9368872788759886e-07, 'epoch': 0.92} 92%|█████████▏| 5340/5773 [32:11<39:17, 5.44s/it] 93%|█████████▎| 5341/5773 [32:19<39:04, 5.43s/it] 93%|█████████▎| 5341/5773 [32:17<39:04, 5.43s/it] {'loss': 0.5605, 'learning_rate': 
2.9234040525716325e-07, 'epoch': 0.93} 93%|█████████▎| 5341/5773 [32:19<39:04, 5.43s/it] {'loss': 0.5605, 'learning_rate': 2.9234040525716325e-07, 'epoch': 0.93} 93%|█████████▎| 5341/5773 [32:17<39:04, 5.43s/it] 93%|█████████▎| 5342/5773 [32:25<39:20, 5.48s/it] 93%|█████████▎| 5342/5773 [32:22<39:20, 5.48s/it] {'loss': 0.5549, 'learning_rate': 2.9099513890812556e-07, 'epoch': 0.93} 93%|█████████▎| 5342/5773 [32:25<39:20, 5.48s/it] {'loss': 0.5549, 'learning_rate': 2.9099513890812556e-07, 'epoch': 0.93} 93%|█████████▎| 5342/5773 [32:22<39:20, 5.48s/it] 93%|█████████▎| 5343/5773 [32:30<39:05, 5.45s/it] 93%|█████████▎| 5343/5773 [32:28<39:05, 5.45s/it] {'loss': 0.5456, 'learning_rate': 2.8965292926401714e-07, 'epoch': 0.93} 93%|█████████▎| 5343/5773 [32:30<39:05, 5.45s/it] {'loss': 0.5456, 'learning_rate': 2.8965292926401714e-07, 'epoch': 0.93} 93%|█████████▎| 5343/5773 [32:28<39:05, 5.45s/it] 93%|█████████▎| 5344/5773 [32:35<39:14, 5.49s/it] 93%|█████████▎| 5344/5773 [32:33<39:14, 5.49s/it] {'loss': 0.5493, 'learning_rate': 2.8831377674741203e-07, 'epoch': 0.93} 93%|█████████▎| 5344/5773 [32:35<39:14, 5.49s/it] {'loss': 0.5493, 'learning_rate': 2.8831377674741203e-07, 'epoch': 0.93} 93%|█████████▎| 5344/5773 [32:33<39:14, 5.49s/it] 93%|█████████▎| 5345/5773 [32:41<39:03, 5.48s/it] 93%|█████████▎| 5345/5773 [32:39<39:03, 5.48s/it] {'loss': 0.5528, 'learning_rate': 2.8697768177991527e-07, 'epoch': 0.93} 93%|█████████▎| 5345/5773 [32:41<39:03, 5.48s/it] {'loss': 0.5528, 'learning_rate': 2.8697768177991527e-07, 'epoch': 0.93} 93%|█████████▎| 5345/5773 [32:39<39:03, 5.48s/it] 93%|█████████▎| 5346/5773 [32:46<39:07, 5.50s/it] 93%|█████████▎| 5346/5773 [32:44<39:07, 5.50s/it] {'loss': 0.5618, 'learning_rate': 2.856446447821737e-07, 'epoch': 0.93} 93%|█████████▎| 5346/5773 [32:46<39:07, 5.50s/it] {'loss': 0.5618, 'learning_rate': 2.856446447821737e-07, 'epoch': 0.93} 93%|█████████▎| 5346/5773 [32:44<39:07, 5.50s/it] 93%|█████████▎| 5347/5773 [32:52<38:45, 5.46s/it] 
93%|█████████▎| 5347/5773 [32:49<38:45, 5.46s/it] {'loss': 0.5295, 'learning_rate': 2.8431466617386936e-07, 'epoch': 0.93} 93%|█████████▎| 5347/5773 [32:52<38:45, 5.46s/it] {'loss': 0.5295, 'learning_rate': 2.8431466617386936e-07, 'epoch': 0.93} 93%|█████████▎| 5347/5773 [32:49<38:45, 5.46s/it] 93%|█████████▎| 5348/5773 [32:57<38:36, 5.45s/it] 93%|█████████▎| 5348/5773 [32:55<38:36, 5.45s/it] {'loss': 0.5529, 'learning_rate': 2.8298774637372295e-07, 'epoch': 0.93} 93%|█████████▎| 5348/5773 [32:57<38:36, 5.45s/it] {'loss': 0.5529, 'learning_rate': 2.8298774637372295e-07, 'epoch': 0.93} 93%|█████████▎| 5348/5773 [32:55<38:36, 5.45s/it] 93%|█████████▎| 5349/5773 [33:03<38:25, 5.44s/it] 93%|█████████▎| 5349/5773 [33:00<38:25, 5.44s/it] {'loss': 0.5645, 'learning_rate': 2.81663885799488e-07, 'epoch': 0.93} 93%|█████████▎| 5349/5773 [33:03<38:25, 5.44s/it] {'loss': 0.5645, 'learning_rate': 2.81663885799488e-07, 'epoch': 0.93} 93%|█████████▎| 5349/5773 [33:00<38:25, 5.44s/it] 2 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 93%|█████████▎| 5350/5773 [33:08<38:20, 5.44s/it] 15 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
93%|█████████▎| 5350/5773 [33:06<38:20, 5.44s/it] {'loss': 0.5617, 'learning_rate': 2.8034308486796223e-07, 'epoch': 0.93} 93%|█████████▎| 5350/5773 [33:08<38:20, 5.44s/it] {'loss': 0.5617, 'learning_rate': 2.8034308486796223e-07, 'epoch': 0.93} 93%|█████████▎| 5350/5773 [33:06<38:20, 5.44s/it] 93%|█████████▎| 5351/5773 [33:14<38:10, 5.43s/it] 93%|█████████▎| 5351/5773 [33:11<38:10, 5.43s/it] {'loss': 0.5496, 'learning_rate': 2.7902534399497416e-07, 'epoch': 0.93} 93%|█████████▎| 5351/5773 [33:14<38:10, 5.43s/it] {'loss': 0.5496, 'learning_rate': 2.7902534399497416e-07, 'epoch': 0.93} 93%|█████████▎| 5351/5773 [33:11<38:10, 5.43s/it] 93%|█████████▎| 5352/5773 [33:19<38:31, 5.49s/it] 93%|█████████▎| 5352/5773 [33:17<38:31, 5.49s/it] {'loss': 0.5575, 'learning_rate': 2.777106635953908e-07, 'epoch': 0.93} 93%|█████████▎| 5352/5773 [33:19<38:31, 5.49s/it] {'loss': 0.5575, 'learning_rate': 2.777106635953908e-07, 'epoch': 0.93} 93%|█████████▎| 5352/5773 [33:17<38:31, 5.49s/it] 93%|█████████▎| 5353/5773 [33:25<38:44, 5.53s/it] 93%|█████████▎| 5353/5773 [33:22<38:44, 5.53s/it] {'loss': 0.5756, 'learning_rate': 2.7639904408311435e-07, 'epoch': 0.93} 93%|█████████▎| 5353/5773 [33:25<38:44, 5.53s/it] {'loss': 0.5756, 'learning_rate': 2.7639904408311435e-07, 'epoch': 0.93} 93%|█████████▎| 5353/5773 [33:22<38:44, 5.53s/it] 93%|█████████▎| 5354/5773 [33:30<38:34, 5.52s/it] 93%|█████████▎| 5354/5773 [33:28<38:34, 5.52s/it] {'loss': 0.5645, 'learning_rate': 2.750904858710879e-07, 'epoch': 0.93} 93%|█████████▎| 5354/5773 [33:30<38:34, 5.52s/it] {'loss': 0.5645, 'learning_rate': 2.750904858710879e-07, 'epoch': 0.93} 93%|█████████▎| 5354/5773 [33:28<38:34, 5.52s/it] 93%|█████████▎| 5355/5773 [33:36<38:15, 5.49s/it] 93%|█████████▎| 5355/5773 [33:33<38:15, 5.49s/it] {'loss': 0.5448, 'learning_rate': 2.737849893712841e-07, 'epoch': 0.93} 93%|█████████▎| 5355/5773 [33:36<38:15, 5.49s/it] {'loss': 0.5448, 'learning_rate': 2.737849893712841e-07, 'epoch': 0.93} 93%|█████████▎| 5355/5773 
[33:33<38:15, 5.49s/it] 93%|█████████▎| 5356/5773 [33:41<38:20, 5.52s/it] 93%|█████████▎| 5356/5773 [33:39<38:20, 5.52s/it] {'loss': 0.5547, 'learning_rate': 2.724825549947152e-07, 'epoch': 0.93} 93%|█████████▎| 5356/5773 [33:41<38:20, 5.52s/it] {'loss': 0.5547, 'learning_rate': 2.724825549947152e-07, 'epoch': 0.93} 93%|█████████▎| 5356/5773 [33:39<38:20, 5.52s/it] 93%|█████████▎| 5357/5773 [33:47<37:51, 5.46s/it] 93%|█████████▎| 5357/5773 [33:44<37:51, 5.46s/it] {'loss': 0.5517, 'learning_rate': 2.711831831514311e-07, 'epoch': 0.93} 93%|█████████▎| 5357/5773 [33:47<37:51, 5.46s/it] {'loss': 0.5517, 'learning_rate': 2.711831831514311e-07, 'epoch': 0.93} 93%|█████████▎| 5357/5773 [33:44<37:51, 5.46s/it] 93%|█████████▎| 5358/5773 [33:52<37:30, 5.42s/it] 93%|█████████▎| 5358/5773 [33:50<37:30, 5.42s/it] {'loss': 0.56, 'learning_rate': 2.6988687425051453e-07, 'epoch': 0.93} 93%|█████████▎| 5358/5773 [33:52<37:30, 5.42s/it] {'loss': 0.56, 'learning_rate': 2.6988687425051453e-07, 'epoch': 0.93} 93%|█████████▎| 5358/5773 [33:50<37:30, 5.42s/it] 93%|█████████▎| 5359/5773 [33:57<37:17, 5.40s/it] 93%|█████████▎| 5359/5773 [33:55<37:17, 5.41s/it] {'loss': 0.5463, 'learning_rate': 2.685936287000856e-07, 'epoch': 0.93} 93%|█████████▎| 5359/5773 [33:57<37:17, 5.40s/it] {'loss': 0.5463, 'learning_rate': 2.685936287000856e-07, 'epoch': 0.93} 93%|█████████▎| 5359/5773 [33:55<37:17, 5.41s/it] 93%|█████████▎| 5360/5773 [34:03<37:33, 5.46s/it] 93%|█████████▎| 5360/5773 [34:00<37:33, 5.46s/it] {'loss': 0.5588, 'learning_rate': 2.673034469072977e-07, 'epoch': 0.93} 93%|█████████▎| 5360/5773 [34:03<37:33, 5.46s/it] {'loss': 0.5588, 'learning_rate': 2.673034469072977e-07, 'epoch': 0.93} 93%|█████████▎| 5360/5773 [34:00<37:33, 5.46s/it] 93%|█████████▎| 5361/5773 [34:08<37:30, 5.46s/it] 93%|█████████▎| 5361/5773 [34:06<37:30, 5.46s/it] {'loss': 0.5628, 'learning_rate': 2.660163292783424e-07, 'epoch': 0.93} 93%|█████████▎| 5361/5773 [34:08<37:30, 5.46s/it] {'loss': 0.5628, 'learning_rate': 
2.660163292783424e-07, 'epoch': 0.93} 93%|█████████▎| 5361/5773 [34:06<37:30, 5.46s/it] 93%|█████████▎| 5362/5773 [34:14<37:20, 5.45s/it] 93%|█████████▎| 5362/5773 [34:11<37:20, 5.45s/it] {'loss': 0.5511, 'learning_rate': 2.6473227621844457e-07, 'epoch': 0.93} 93%|█████████▎| 5362/5773 [34:14<37:20, 5.45s/it] {'loss': 0.5511, 'learning_rate': 2.6473227621844457e-07, 'epoch': 0.93} 93%|█████████▎| 5362/5773 [34:11<37:20, 5.45s/it] 93%|█████████▎| 5363/5773 [34:19<37:22, 5.47s/it] 93%|█████████▎| 5363/5773 [34:17<37:22, 5.47s/it] {'loss': 0.5438, 'learning_rate': 2.6345128813186737e-07, 'epoch': 0.93} 93%|█████████▎| 5363/5773 [34:19<37:22, 5.47s/it] {'loss': 0.5438, 'learning_rate': 2.6345128813186737e-07, 'epoch': 0.93} 93%|█████████▎| 5363/5773 [34:17<37:22, 5.47s/it] 93%|█████████▎| 5364/5773 [34:25<37:23, 5.48s/it] 93%|█████████▎| 5364/5773 [34:22<37:23, 5.48s/it] {'loss': 0.5592, 'learning_rate': 2.6217336542190494e-07, 'epoch': 0.93} 93%|█████████▎| 5364/5773 [34:25<37:23, 5.48s/it] {'loss': 0.5592, 'learning_rate': 2.6217336542190494e-07, 'epoch': 0.93} 93%|█████████▎| 5364/5773 [34:22<37:23, 5.48s/it] 93%|█████████▎| 5365/5773 [34:30<37:06, 5.46s/it] 93%|█████████▎| 5365/5773 [34:28<37:06, 5.46s/it] {'loss': 0.559, 'learning_rate': 2.6089850849088773e-07, 'epoch': 0.93} 93%|█████████▎| 5365/5773 [34:30<37:06, 5.46s/it] {'loss': 0.559, 'learning_rate': 2.6089850849088773e-07, 'epoch': 0.93} 93%|█████████▎| 5365/5773 [34:28<37:06, 5.46s/it] 93%|█████████▎| 5366/5773 [34:36<36:52, 5.44s/it] 93%|█████████▎| 5366/5773 [34:33<36:52, 5.43s/it] {'loss': 0.5545, 'learning_rate': 2.5962671774018234e-07, 'epoch': 0.93} 93%|█████████▎| 5366/5773 [34:36<36:52, 5.44s/it] {'loss': 0.5545, 'learning_rate': 2.5962671774018234e-07, 'epoch': 0.93} 93%|█████████▎| 5366/5773 [34:33<36:52, 5.43s/it] 93%|█████████▎| 5367/5773 [34:41<36:41, 5.42s/it] 93%|█████████▎| 5367/5773 [34:39<36:41, 5.42s/it] {'loss': 0.5489, 'learning_rate': 2.5835799357018856e-07, 'epoch': 0.93} 
93%|█████████▎| 5367/5773 [34:41<36:41, 5.42s/it] {'loss': 0.5489, 'learning_rate': 2.5835799357018856e-07, 'epoch': 0.93} 93%|█████████▎| 5367/5773 [34:39<36:41, 5.42s/it] 93%|█████████▎| 5368/5773 [34:46<36:29, 5.41s/it] 93%|█████████▎| 5368/5773 [34:44<36:29, 5.41s/it] {'loss': 0.5447, 'learning_rate': 2.5709233638034345e-07, 'epoch': 0.93} 93%|█████████▎| 5368/5773 [34:46<36:29, 5.41s/it] {'loss': 0.5447, 'learning_rate': 2.5709233638034345e-07, 'epoch': 0.93} 93%|█████████▎| 5368/5773 [34:44<36:29, 5.41s/it] 93%|█████████▎| 5369/5773 [34:52<36:39, 5.44s/it] 93%|█████████▎| 5369/5773 [34:49<36:39, 5.44s/it] {'loss': 0.5566, 'learning_rate': 2.558297465691129e-07, 'epoch': 0.93} 93%|█████████▎| 5369/5773 [34:52<36:39, 5.44s/it] {'loss': 0.5566, 'learning_rate': 2.558297465691129e-07, 'epoch': 0.93} 93%|█████████▎| 5369/5773 [34:50<36:39, 5.44s/it] 93%|█████████▎| 5370/5773 [34:57<36:41, 5.46s/it] 93%|█████████▎| 5370/5773 [34:55<36:41, 5.46s/it] {'loss': 0.5492, 'learning_rate': 2.5457022453400313e-07, 'epoch': 0.93} 93%|█████████▎| 5370/5773 [34:57<36:41, 5.46s/it] {'loss': 0.5492, 'learning_rate': 2.5457022453400313e-07, 'epoch': 0.93} 93%|█████████▎| 5370/5773 [34:55<36:41, 5.46s/it] 93%|█████████▎| 5371/5773 [35:03<36:53, 5.51s/it] 93%|█████████▎| 5371/5773 [35:01<36:53, 5.51s/it] {'loss': 0.5566, 'learning_rate': 2.533137706715505e-07, 'epoch': 0.93} 93%|█████████▎| 5371/5773 [35:03<36:53, 5.51s/it] {'loss': 0.5566, 'learning_rate': 2.533137706715505e-07, 'epoch': 0.93} 93%|█████████▎| 5371/5773 [35:01<36:53, 5.51s/it] 93%|█████████▎| 5372/5773 [35:09<36:54, 5.52s/it] 93%|█████████▎| 5372/5773 [35:06<36:54, 5.52s/it] {'loss': 0.5523, 'learning_rate': 2.5206038537732736e-07, 'epoch': 0.93} 93%|█████████▎| 5372/5773 [35:09<36:54, 5.52s/it] {'loss': 0.5523, 'learning_rate': 2.5206038537732736e-07, 'epoch': 0.93} 93%|█████████▎| 5372/5773 [35:06<36:54, 5.52s/it] 93%|█████████▎| 5373/5773 [35:14<36:57, 5.54s/it] 93%|█████████▎| 5373/5773 [35:12<36:57, 5.54s/it] 
{'loss': 0.5448, 'learning_rate': 2.508100690459392e-07, 'epoch': 0.93} 93%|█████████▎| 5373/5773 [35:14<36:57, 5.54s/it] {'loss': 0.5448, 'learning_rate': 2.508100690459392e-07, 'epoch': 0.93} 93%|█████████▎| 5373/5773 [35:12<36:57, 5.54s/it] 93%|█████████▎| 5374/5773 [35:20<36:41, 5.52s/it] 93%|█████████▎| 5374/5773 [35:17<36:41, 5.52s/it] {'loss': 0.5497, 'learning_rate': 2.495628220710278e-07, 'epoch': 0.93} 93%|█████████▎| 5374/5773 [35:20<36:41, 5.52s/it] {'loss': 0.5497, 'learning_rate': 2.495628220710278e-07, 'epoch': 0.93} 93%|█████████▎| 5374/5773 [35:17<36:41, 5.52s/it] 93%|█████████▎| 5375/5773 [35:25<36:45, 5.54s/it] 93%|█████████▎| 5375/5773 [35:23<36:45, 5.54s/it] {'loss': 0.5476, 'learning_rate': 2.483186448452624e-07, 'epoch': 0.93} 93%|█████████▎| 5375/5773 [35:25<36:45, 5.54s/it] {'loss': 0.5476, 'learning_rate': 2.483186448452624e-07, 'epoch': 0.93} 93%|█████████▎| 5375/5773 [35:23<36:45, 5.54s/it] 93%|█████████▎| 5376/5773 [35:31<36:17, 5.49s/it] 93%|█████████▎| 5376/5773 [35:28<36:17, 5.49s/it] {'loss': 0.5736, 'learning_rate': 2.470775377603529e-07, 'epoch': 0.93} 93%|█████████▎| 5376/5773 [35:31<36:17, 5.49s/it] {'loss': 0.5736, 'learning_rate': 2.470775377603529e-07, 'epoch': 0.93} 93%|█████████▎| 5376/5773 [35:28<36:17, 5.49s/it] 93%|█████████▎| 5377/5773 [35:36<36:13, 5.49s/it] 93%|█████████▎| 5377/5773 [35:34<36:13, 5.49s/it] {'loss': 0.554, 'learning_rate': 2.458395012070369e-07, 'epoch': 0.93} 93%|█████████▎| 5377/5773 [35:36<36:13, 5.49s/it] {'loss': 0.554, 'learning_rate': 2.458395012070369e-07, 'epoch': 0.93} 93%|█████████▎| 5377/5773 [35:34<36:13, 5.49s/it] 93%|█████████▎| 5378/5773 [35:42<36:07, 5.49s/it] 93%|█████████▎| 5378/5773 [35:39<36:07, 5.49s/it] {'loss': 0.5481, 'learning_rate': 2.4460453557509037e-07, 'epoch': 0.93} 93%|█████████▎| 5378/5773 [35:42<36:07, 5.49s/it] {'loss': 0.5481, 'learning_rate': 2.4460453557509037e-07, 'epoch': 0.93} 93%|█████████▎| 5378/5773 [35:39<36:07, 5.49s/it] 93%|█████████▎| 5379/5773 
[35:47<35:58, 5.48s/it] 93%|█████████▎| 5379/5773 [35:45<35:58, 5.48s/it] {'loss': 0.5443, 'learning_rate': 2.4337264125331774e-07, 'epoch': 0.93} 93%|█████████▎| 5379/5773 [35:47<35:58, 5.48s/it] {'loss': 0.5443, 'learning_rate': 2.4337264125331774e-07, 'epoch': 0.93} 93%|█████████▎| 5379/5773 [35:45<35:58, 5.48s/it] 93%|█████████▎| 5380/5773 [35:52<35:36, 5.44s/it] 93%|█████████▎| 5380/5773 [35:50<35:36, 5.44s/it] {'loss': 0.5591, 'learning_rate': 2.4214381862956105e-07, 'epoch': 0.93} 93%|█████████▎| 5380/5773 [35:52<35:36, 5.44s/it] {'loss': 0.5591, 'learning_rate': 2.4214381862956105e-07, 'epoch': 0.93} 93%|█████████▎| 5380/5773 [35:50<35:36, 5.44s/it] 93%|█████████▎| 5381/5773 [35:58<35:49, 5.48s/it] 93%|█████████▎| 5381/5773 [35:56<35:49, 5.48s/it] {'loss': 0.5583, 'learning_rate': 2.4091806809069086e-07, 'epoch': 0.93} 93%|█████████▎| 5381/5773 [35:58<35:49, 5.48s/it] {'loss': 0.5583, 'learning_rate': 2.4091806809069086e-07, 'epoch': 0.93} 93%|█████████▎| 5381/5773 [35:56<35:49, 5.48s/it] 93%|█████████▎| 5382/5773 [36:03<35:39, 5.47s/it] 93%|█████████▎| 5382/5773 [36:01<35:39, 5.47s/it] {'loss': 0.5646, 'learning_rate': 2.396953900226129e-07, 'epoch': 0.93} 93%|█████████▎| 5382/5773 [36:03<35:39, 5.47s/it] {'loss': 0.5646, 'learning_rate': 2.396953900226129e-07, 'epoch': 0.93} 93%|█████████▎| 5382/5773 [36:01<35:39, 5.47s/it] 93%|█████████▎| 5383/5773 [36:09<35:33, 5.47s/it] 93%|█████████▎| 5383/5773 [36:06<35:33, 5.47s/it] {'loss': 0.5426, 'learning_rate': 2.384757848102659e-07, 'epoch': 0.93} 93%|█████████▎| 5383/5773 [36:09<35:33, 5.47s/it] {'loss': 0.5426, 'learning_rate': 2.384757848102659e-07, 'epoch': 0.93} 93%|█████████▎| 5383/5773 [36:06<35:33, 5.47s/it] 93%|█████████▎| 5384/5773 [36:14<35:29, 5.48s/it] 93%|█████████▎| 5384/5773 [36:12<35:29, 5.48s/it] {'loss': 0.5543, 'learning_rate': 2.3725925283762163e-07, 'epoch': 0.93} 93%|█████████▎| 5384/5773 [36:14<35:29, 5.48s/it] {'loss': 0.5543, 'learning_rate': 2.3725925283762163e-07, 'epoch': 0.93} 
93%|█████████▎| 5384/5773 [36:12<35:29, 5.48s/it] 93%|█████████▎| 5385/5773 [36:20<35:15, 5.45s/it] 93%|█████████▎| 5385/5773 [36:17<35:15, 5.45s/it] {'loss': 0.5267, 'learning_rate': 2.360457944876804e-07, 'epoch': 0.93} 93%|█████████▎| 5385/5773 [36:20<35:15, 5.45s/it] {'loss': 0.5267, 'learning_rate': 2.360457944876804e-07, 'epoch': 0.93} 93%|█████████▎| 5385/5773 [36:17<35:15, 5.45s/it] 93%|█████████▎| 5386/5773 [36:25<35:01, 5.43s/it] 93%|█████████▎| 5386/5773 [36:23<35:01, 5.43s/it] {'loss': 0.5523, 'learning_rate': 2.348354101424799e-07, 'epoch': 0.93} 93%|█████████▎| 5386/5773 [36:25<35:01, 5.43s/it] {'loss': 0.5523, 'learning_rate': 2.348354101424799e-07, 'epoch': 0.93} 93%|█████████▎| 5386/5773 [36:23<35:01, 5.43s/it] 93%|█████████▎| 5387/5773 [36:31<34:56, 5.43s/it] 93%|█████████▎| 5387/5773 [36:28<34:56, 5.43s/it] {'loss': 0.5628, 'learning_rate': 2.3362810018308757e-07, 'epoch': 0.93} 93%|█████████▎| 5387/5773 [36:31<34:56, 5.43s/it] {'loss': 0.5628, 'learning_rate': 2.3362810018308757e-07, 'epoch': 0.93} 93%|█████████▎| 5387/5773 [36:28<34:56, 5.43s/it] 93%|█████████▎| 5388/5773 [36:36<35:01, 5.46s/it] 93%|█████████▎| 5388/5773 [36:34<35:01, 5.46s/it] {'loss': 0.5501, 'learning_rate': 2.3242386498960267e-07, 'epoch': 0.93} 93%|█████████▎| 5388/5773 [36:36<35:01, 5.46s/it] {'loss': 0.5501, 'learning_rate': 2.3242386498960267e-07, 'epoch': 0.93} 93%|█████████▎| 5388/5773 [36:34<35:01, 5.46s/it] 93%|█████████▎| 5389/5773 [36:42<34:55, 5.46s/it] 93%|█████████▎| 5389/5773 [36:39<34:55, 5.46s/it] {'loss': 0.5604, 'learning_rate': 2.3122270494115752e-07, 'epoch': 0.93} 93%|█████████▎| 5389/5773 [36:42<34:55, 5.46s/it] {'loss': 0.5604, 'learning_rate': 2.3122270494115752e-07, 'epoch': 0.93} 93%|█████████▎| 5389/5773 [36:39<34:55, 5.46s/it] 93%|█████████▎| 5390/5773 [36:47<34:58, 5.48s/it] 93%|█████████▎| 5390/5773 [36:45<34:58, 5.48s/it] {'loss': 0.5481, 'learning_rate': 2.30024620415914e-07, 'epoch': 0.93} 93%|█████████▎| 5390/5773 [36:47<34:58, 5.48s/it] 
{'loss': 0.5481, 'learning_rate': 2.30024620415914e-07, 'epoch': 0.93} 93%|█████████▎| 5390/5773 [36:45<34:58, 5.48s/it] 93%|█████████▎| 5391/5773 [36:53<34:53, 5.48s/it] 93%|█████████▎| 5391/5773 [36:50<34:53, 5.48s/it] {'loss': 0.5376, 'learning_rate': 2.2882961179106933e-07, 'epoch': 0.93} 93%|█████████▎| 5391/5773 [36:53<34:53, 5.48s/it] {'loss': 0.5376, 'learning_rate': 2.2882961179106933e-07, 'epoch': 0.93} 93%|█████████▎| 5391/5773 [36:50<34:53, 5.48s/it] 93%|█████████▎| 5392/5773 [36:58<34:34, 5.44s/it] 93%|█████████▎| 5392/5773 [36:55<34:33, 5.44s/it] {'loss': 0.5727, 'learning_rate': 2.2763767944284809e-07, 'epoch': 0.93} 93%|█████████▎| 5392/5773 [36:58<34:34, 5.44s/it] {'loss': 0.5727, 'learning_rate': 2.2763767944284809e-07, 'epoch': 0.93} 93%|█████████▎| 5392/5773 [36:55<34:33, 5.44s/it] 93%|█████████▎| 5393/5773 [37:03<34:42, 5.48s/it] 93%|█████████▎| 5393/5773 [37:01<34:42, 5.48s/it] {'loss': 0.5528, 'learning_rate': 2.2644882374651233e-07, 'epoch': 0.93} 93%|█████████▎| 5393/5773 [37:03<34:42, 5.48s/it] {'loss': 0.5528, 'learning_rate': 2.2644882374651233e-07, 'epoch': 0.93} 93%|█████████▎| 5393/5773 [37:01<34:42, 5.48s/it] 93%|█████████▎| 5394/5773 [37:09<34:31, 5.47s/it] 93%|█████████▎| 5394/5773 [37:06<34:31, 5.47s/it] {'loss': 0.545, 'learning_rate': 2.2526304507634933e-07, 'epoch': 0.93} 93%|█████████▎| 5394/5773 [37:09<34:31, 5.47s/it] {'loss': 0.545, 'learning_rate': 2.2526304507634933e-07, 'epoch': 0.93} 93%|█████████▎| 5394/5773 [37:06<34:31, 5.47s/it] 93%|█████████▎| 5395/5773 [37:14<34:22, 5.46s/it] 93%|█████████▎| 5395/5773 [37:12<34:22, 5.46s/it] {'loss': 0.5471, 'learning_rate': 2.2408034380567824e-07, 'epoch': 0.93} 93%|█████████▎| 5395/5773 [37:14<34:22, 5.46s/it] {'loss': 0.5471, 'learning_rate': 2.2408034380567824e-07, 'epoch': 0.93} 93%|█████████▎| 5395/5773 [37:12<34:22, 5.46s/it] 93%|█████████▎| 5396/5773 [37:20<34:23, 5.47s/it] 93%|█████████▎| 5396/5773 [37:17<34:23, 5.47s/it] {'loss': 0.554, 'learning_rate': 
2.2290072030685561e-07, 'epoch': 0.93} 93%|█████████▎| 5396/5773 [37:20<34:23, 5.47s/it] {'loss': 0.554, 'learning_rate': 2.2290072030685561e-07, 'epoch': 0.93} 93%|█████████▎| 5396/5773 [37:17<34:23, 5.47s/it] 93%|█████████▎| 5397/5773 [37:25<34:15, 5.47s/it] 93%|█████████▎| 5397/5773 [37:23<34:15, 5.47s/it] {'loss': 0.5509, 'learning_rate': 2.217241749512622e-07, 'epoch': 0.93} 93%|█████████▎| 5397/5773 [37:25<34:15, 5.47s/it] {'loss': 0.5509, 'learning_rate': 2.217241749512622e-07, 'epoch': 0.93} 93%|█████████▎| 5397/5773 [37:23<34:15, 5.47s/it] 94%|█████████▎| 5398/5773 [37:31<34:05, 5.45s/it] 94%|█████████▎| 5398/5773 [37:28<34:05, 5.45s/it] {'loss': 0.552, 'learning_rate': 2.2055070810931167e-07, 'epoch': 0.94} 94%|█████████▎| 5398/5773 [37:31<34:05, 5.45s/it] {'loss': 0.552, 'learning_rate': 2.2055070810931167e-07, 'epoch': 0.94} 94%|█████████▎| 5398/5773 [37:28<34:05, 5.45s/it] 94%|█████████▎| 5399/5773 [37:36<33:55, 5.44s/it] 94%|█████████▎| 5399/5773 [37:34<33:55, 5.44s/it] {'loss': 0.5489, 'learning_rate': 2.1938032015044964e-07, 'epoch': 0.94} 94%|█████████▎| 5399/5773 [37:36<33:55, 5.44s/it] {'loss': 0.5489, 'learning_rate': 2.1938032015044964e-07, 'epoch': 0.94} 94%|█████████▎| 5399/5773 [37:34<33:55, 5.44s/it] 11 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 94%|█████████▎| 5400/5773 [37:42<34:03, 5.48s/it] 15 AutoResumeHook: Checking whether to suspend... 
7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 94%|█████████▎| 5400/5773 [37:39<34:03, 5.48s/it] {'loss': 0.5664, 'learning_rate': 2.1821301144315132e-07, 'epoch': 0.94} 94%|█████████▎| 5400/5773 [37:42<34:03, 5.48s/it] {'loss': 0.5664, 'learning_rate': 2.1821301144315132e-07, 'epoch': 0.94} 94%|█████████▎| 5400/5773 [37:39<34:03, 5.48s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 94%|█████████▎| 5401/5773 [38:05<1:12:24, 11.68s/it] 94%|█████████▎| 5401/5773 [38:08<1:12:25, 11.68s/it] {'loss': 0.5508, 'learning_rate': 2.1704878235492388e-07, 'epoch': 0.94} 94%|█████████▎| 5401/5773 [38:08<1:12:25, 11.68s/it] {'loss': 0.5508, 'learning_rate': 2.1704878235492388e-07, 'epoch': 0.94} 94%|█████████▎| 5401/5773 [38:05<1:12:24, 11.68s/it] 94%|█████████▎| 5402/5773 [38:11<1:00:38, 9.81s/it] 94%|█████████▎| 5402/5773 [38:13<1:00:38, 9.81s/it] {'loss': 0.5548, 'learning_rate': 2.15887633252303e-07, 'epoch': 0.94} {'loss': 0.5548, 'learning_rate': 2.15887633252303e-07, 'epoch': 0.94} 94%|█████████▎| 5402/5773 [38:13<1:00:38, 9.81s/it] 94%|█████████▎| 5402/5773 [38:11<1:00:38, 9.81s/it] 94%|█████████▎| 5403/5773 [38:16<52:34, 8.53s/it] 94%|█████████▎| 5403/5773 [38:19<52:34, 8.53s/it] {'loss': 0.5484, 'learning_rate': 2.1472956450085513e-07, 'epoch': 0.94} 94%|█████████▎| 5403/5773 [38:19<52:34, 8.53s/it] {'loss': 0.5484, 'learning_rate': 2.1472956450085513e-07, 'epoch': 0.94} 94%|█████████▎| 5403/5773 [38:16<52:34, 8.53s/it] 94%|█████████▎| 5404/5773 [38:22<46:42, 7.60s/it] 94%|█████████▎| 5404/5773 [38:24<46:42, 7.60s/it] {'loss': 0.5563, 'learning_rate': 2.1357457646517644e-07, 'epoch': 0.94} 94%|█████████▎| 5404/5773 [38:24<46:42, 7.60s/it] {'loss': 0.5563, 'learning_rate': 2.1357457646517644e-07, 'epoch': 0.94} 94%|█████████▎| 5404/5773 [38:22<46:42, 7.60s/it] 94%|█████████▎| 5405/5773 [38:27<42:46, 6.97s/it] 94%|█████████▎| 5405/5773 [38:30<42:46, 6.97s/it] {'loss': 0.55, 'learning_rate': 2.12422669508896e-07, 'epoch': 0.94} 94%|█████████▎| 5405/5773 [38:30<42:46, 6.97s/it] {'loss': 0.55, 'learning_rate': 2.12422669508896e-07, 'epoch': 0.94} 94%|█████████▎| 5405/5773 [38:27<42:46, 6.97s/it] 94%|█████████▎| 5406/5773 [38:33<39:48, 6.51s/it] 94%|█████████▎| 5406/5773 [38:35<39:48, 6.51s/it] {'loss': 0.5469, 'learning_rate': 2.1127384399467045e-07, 'epoch': 0.94} {'loss': 0.5469, 'learning_rate': 2.1127384399467045e-07, 'epoch': 0.94} 
94%|█████████▎| 5406/5773 [38:35<39:48, 6.51s/it] 94%|█████████▎| 5406/5773 [38:33<39:48, 6.51s/it] 94%|█████████▎| 5407/5773 [38:38<37:33, 6.16s/it] 94%|█████████▎| 5407/5773 [38:41<37:33, 6.16s/it] {'loss': 0.5592, 'learning_rate': 2.1012810028418597e-07, 'epoch': 0.94} 94%|█████████▎| 5407/5773 [38:41<37:33, 6.16s/it] {'loss': 0.5592, 'learning_rate': 2.1012810028418597e-07, 'epoch': 0.94} 94%|█████████▎| 5407/5773 [38:38<37:33, 6.16s/it] 94%|█████████▎| 5408/5773 [38:44<36:15, 5.96s/it] 94%|█████████▎| 5408/5773 [38:46<36:15, 5.96s/it] {'loss': 0.5563, 'learning_rate': 2.0898543873815958e-07, 'epoch': 0.94} 94%|█████████▎| 5408/5773 [38:46<36:15, 5.96s/it]{'loss': 0.5563, 'learning_rate': 2.0898543873815958e-07, 'epoch': 0.94} 94%|█████████▎| 5408/5773 [38:44<36:15, 5.96s/it] 94%|█████████▎| 5409/5773 [38:49<35:02, 5.78s/it] 94%|█████████▎| 5409/5773 [38:51<35:02, 5.78s/it] {'loss': 0.5568, 'learning_rate': 2.0784585971633797e-07, 'epoch': 0.94} 94%|█████████▎| 5409/5773 [38:51<35:02, 5.78s/it] {'loss': 0.5568, 'learning_rate': 2.0784585971633797e-07, 'epoch': 0.94} 94%|█████████▎| 5409/5773 [38:49<35:02, 5.78s/it] 94%|█████████▎| 5410/5773 [38:55<34:40, 5.73s/it] 94%|█████████▎| 5410/5773 [38:57<34:40, 5.73s/it] {'loss': 0.5719, 'learning_rate': 2.067093635774975e-07, 'epoch': 0.94} 94%|█████████▎| 5410/5773 [38:57<34:40, 5.73s/it] {'loss': 0.5719, 'learning_rate': 2.067093635774975e-07, 'epoch': 0.94} 94%|█████████▎| 5410/5773 [38:55<34:40, 5.73s/it] 94%|█████████▎| 5411/5773 [39:00<34:06, 5.65s/it] 94%|█████████▎| 5411/5773 [39:02<34:06, 5.65s/it] {'loss': 0.5668, 'learning_rate': 2.055759506794419e-07, 'epoch': 0.94} 94%|█████████▎| 5411/5773 [39:02<34:06, 5.65s/it] {'loss': 0.5668, 'learning_rate': 2.055759506794419e-07, 'epoch': 0.94} 94%|█████████▎| 5411/5773 [39:00<34:06, 5.65s/it] 94%|█████████▎| 5412/5773 [39:05<33:35, 5.58s/it] 94%|█████████▎| 5412/5773 [39:08<33:35, 5.58s/it] {'loss': 0.5704, 'learning_rate': 2.0444562137900802e-07, 'epoch': 0.94} 
94%|█████████▎| 5412/5773 [39:08<33:35, 5.58s/it] {'loss': 0.5704, 'learning_rate': 2.0444562137900802e-07, 'epoch': 0.94} 94%|█████████▎| 5412/5773 [39:05<33:35, 5.58s/it] 94%|█████████▍| 5413/5773 [39:11<33:12, 5.53s/it] 94%|█████████▍| 5413/5773 [39:13<33:12, 5.53s/it] {'loss': 0.5562, 'learning_rate': 2.0331837603205785e-07, 'epoch': 0.94} {'loss': 0.5562, 'learning_rate': 2.0331837603205785e-07, 'epoch': 0.94} 94%|█████████▍| 5413/5773 [39:13<33:12, 5.53s/it] 94%|█████████▍| 5413/5773 [39:11<33:12, 5.53s/it] 94%|█████████▍| 5414/5773 [39:17<33:13, 5.55s/it] 94%|█████████▍| 5414/5773 [39:19<33:13, 5.55s/it] {'loss': 0.551, 'learning_rate': 2.0219421499348523e-07, 'epoch': 0.94} 94%|█████████▍| 5414/5773 [39:19<33:13, 5.55s/it] {'loss': 0.551, 'learning_rate': 2.0219421499348523e-07, 'epoch': 0.94} 94%|█████████▍| 5414/5773 [39:17<33:13, 5.55s/it] 94%|█████████▍| 5415/5773 [39:22<32:58, 5.53s/it] 94%|█████████▍| 5415/5773 [39:24<32:58, 5.53s/it] {'loss': 0.5352, 'learning_rate': 2.0107313861721045e-07, 'epoch': 0.94} 94%|█████████▍| 5415/5773 [39:24<32:58, 5.53s/it]{'loss': 0.5352, 'learning_rate': 2.0107313861721045e-07, 'epoch': 0.94} 94%|█████████▍| 5415/5773 [39:22<32:58, 5.53s/it] 94%|█████████▍| 5416/5773 [39:27<32:49, 5.52s/it] 94%|█████████▍| 5416/5773 [39:30<32:49, 5.52s/it] {'loss': 0.5561, 'learning_rate': 1.999551472561867e-07, 'epoch': 0.94} 94%|█████████▍| 5416/5773 [39:30<32:49, 5.52s/it] {'loss': 0.5561, 'learning_rate': 1.999551472561867e-07, 'epoch': 0.94} 94%|█████████▍| 5416/5773 [39:27<32:49, 5.52s/it] 94%|█████████▍| 5417/5773 [39:33<32:23, 5.46s/it] 94%|█████████▍| 5417/5773 [39:35<32:23, 5.46s/it] {'loss': 0.5336, 'learning_rate': 1.9884024126239021e-07, 'epoch': 0.94} 94%|█████████▍| 5417/5773 [39:35<32:23, 5.46s/it]{'loss': 0.5336, 'learning_rate': 1.9884024126239021e-07, 'epoch': 0.94} 94%|█████████▍| 5417/5773 [39:33<32:23, 5.46s/it] 94%|█████████▍| 5418/5773 [39:38<32:14, 5.45s/it] 94%|█████████▍| 5418/5773 [39:41<32:14, 5.45s/it] 
{'loss': 0.5404, 'learning_rate': 1.977284209868313e-07, 'epoch': 0.94} 94%|█████████▍| 5418/5773 [39:41<32:14, 5.45s/it]{'loss': 0.5404, 'learning_rate': 1.977284209868313e-07, 'epoch': 0.94} 94%|█████████▍| 5418/5773 [39:38<32:14, 5.45s/it] 94%|█████████▍| 5419/5773 [39:44<32:06, 5.44s/it] 94%|█████████▍| 5419/5773 [39:46<32:06, 5.44s/it] {'loss': 0.5559, 'learning_rate': 1.9661968677954668e-07, 'epoch': 0.94} {'loss': 0.5559, 'learning_rate': 1.9661968677954668e-07, 'epoch': 0.94} 94%|█████████▍| 5419/5773 [39:46<32:06, 5.44s/it] 94%|█████████▍| 5419/5773 [39:44<32:06, 5.44s/it] 94%|█████████▍| 5420/5773 [39:49<32:24, 5.51s/it] 94%|█████████▍| 5420/5773 [39:52<32:24, 5.51s/it] {'loss': 0.544, 'learning_rate': 1.955140389895993e-07, 'epoch': 0.94} 94%|█████████▍| 5420/5773 [39:52<32:24, 5.51s/it] {'loss': 0.544, 'learning_rate': 1.955140389895993e-07, 'epoch': 0.94} 94%|█████████▍| 5420/5773 [39:49<32:24, 5.51s/it] 94%|█████████▍| 5421/5773 [39:55<32:03, 5.47s/it] 94%|█████████▍| 5421/5773 [39:57<32:03, 5.47s/it] {'loss': 0.5596, 'learning_rate': 1.9441147796508408e-07, 'epoch': 0.94} 94%|█████████▍| 5421/5773 [39:55<32:03, 5.47s/it] {'loss': 0.5596, 'learning_rate': 1.9441147796508408e-07, 'epoch': 0.94} 94%|█████████▍| 5421/5773 [39:57<32:03, 5.47s/it] 94%|█████████▍| 5422/5773 [40:00<32:27, 5.55s/it] 94%|█████████▍| 5422/5773 [40:03<32:27, 5.55s/it] {'loss': 0.5615, 'learning_rate': 1.9331200405312222e-07, 'epoch': 0.94} 94%|█████████▍| 5422/5773 [40:03<32:27, 5.55s/it] {'loss': 0.5615, 'learning_rate': 1.9331200405312222e-07, 'epoch': 0.94} 94%|█████████▍| 5422/5773 [40:00<32:27, 5.55s/it] 94%|█████████▍| 5423/5773 [40:06<32:10, 5.52s/it] 94%|█████████▍| 5423/5773 [40:08<32:10, 5.52s/it] {'loss': 0.5557, 'learning_rate': 1.922156175998624e-07, 'epoch': 0.94} 94%|█████████▍| 5423/5773 [40:08<32:10, 5.52s/it] {'loss': 0.5557, 'learning_rate': 1.922156175998624e-07, 'epoch': 0.94} 94%|█████████▍| 5423/5773 [40:06<32:10, 5.52s/it] 94%|█████████▍| 5424/5773 
[40:11<31:58, 5.50s/it] 94%|█████████▍| 5424/5773 [40:14<31:58, 5.50s/it] {'loss': 0.572, 'learning_rate': 1.9112231895048293e-07, 'epoch': 0.94} 94%|█████████▍| 5424/5773 [40:14<31:58, 5.50s/it]{'loss': 0.572, 'learning_rate': 1.9112231895048293e-07, 'epoch': 0.94} 94%|█████████▍| 5424/5773 [40:11<31:58, 5.50s/it] 94%|█████████▍| 5425/5773 [40:17<31:47, 5.48s/it] 94%|█████████▍| 5425/5773 [40:19<31:47, 5.48s/it] {'loss': 0.5365, 'learning_rate': 1.9003210844918963e-07, 'epoch': 0.94} 94%|█████████▍| 5425/5773 [40:19<31:47, 5.48s/it]{'loss': 0.5365, 'learning_rate': 1.9003210844918963e-07, 'epoch': 0.94} 94%|█████████▍| 5425/5773 [40:17<31:47, 5.48s/it] 94%|█████████▍| 5426/5773 [40:22<31:52, 5.51s/it] 94%|█████████▍| 5426/5773 [40:25<31:52, 5.51s/it] {'loss': 0.5628, 'learning_rate': 1.8894498643921454e-07, 'epoch': 0.94} 94%|█████████▍| 5426/5773 [40:25<31:52, 5.51s/it] {'loss': 0.5628, 'learning_rate': 1.8894498643921454e-07, 'epoch': 0.94} 94%|█████████▍| 5426/5773 [40:22<31:52, 5.51s/it] 94%|█████████▍| 5427/5773 [40:28<31:34, 5.47s/it] 94%|█████████▍| 5427/5773 [40:30<31:34, 5.47s/it] {'loss': 0.5676, 'learning_rate': 1.878609532628195e-07, 'epoch': 0.94} 94%|█████████▍| 5427/5773 [40:30<31:34, 5.47s/it] {'loss': 0.5676, 'learning_rate': 1.878609532628195e-07, 'epoch': 0.94} 94%|█████████▍| 5427/5773 [40:28<31:34, 5.47s/it] 94%|█████████▍| 5428/5773 [40:33<31:31, 5.48s/it] 94%|█████████▍| 5428/5773 [40:36<31:31, 5.48s/it] {'loss': 0.5465, 'learning_rate': 1.8678000926129036e-07, 'epoch': 0.94} 94%|█████████▍| 5428/5773 [40:36<31:31, 5.48s/it]{'loss': 0.5465, 'learning_rate': 1.8678000926129036e-07, 'epoch': 0.94} 94%|█████████▍| 5428/5773 [40:33<31:31, 5.48s/it] 94%|█████████▍| 5429/5773 [40:39<31:11, 5.44s/it] 94%|█████████▍| 5429/5773 [40:41<31:11, 5.44s/it] {'loss': 0.5541, 'learning_rate': 1.8570215477494603e-07, 'epoch': 0.94} 94%|█████████▍| 5429/5773 [40:41<31:11, 5.44s/it] {'loss': 0.5541, 'learning_rate': 1.8570215477494603e-07, 'epoch': 0.94} 
94%|█████████▍| 5429/5773 [40:39<31:11, 5.44s/it] 94%|█████████▍| 5430/5773 [40:44<31:14, 5.47s/it] 94%|█████████▍| 5430/5773 [40:47<31:14, 5.47s/it] {'loss': 0.5625, 'learning_rate': 1.8462739014312725e-07, 'epoch': 0.94} 94%|█████████▍| 5430/5773 [40:47<31:14, 5.47s/it] {'loss': 0.5625, 'learning_rate': 1.8462739014312725e-07, 'epoch': 0.94} 94%|█████████▍| 5430/5773 [40:44<31:14, 5.47s/it] 94%|█████████▍| 5431/5773 [40:49<30:55, 5.43s/it] 94%|█████████▍| 5431/5773 [40:52<30:55, 5.43s/it] {'loss': 0.5483, 'learning_rate': 1.835557157042056e-07, 'epoch': 0.94} 94%|█████████▍| 5431/5773 [40:52<30:55, 5.43s/it]{'loss': 0.5483, 'learning_rate': 1.835557157042056e-07, 'epoch': 0.94} 94%|█████████▍| 5431/5773 [40:49<30:55, 5.43s/it] 94%|█████████▍| 5432/5773 [40:55<31:01, 5.46s/it] 94%|█████████▍| 5432/5773 [40:57<31:01, 5.46s/it] {'loss': 0.5693, 'learning_rate': 1.8248713179557788e-07, 'epoch': 0.94} 94%|█████████▍| 5432/5773 [40:57<31:01, 5.46s/it] {'loss': 0.5693, 'learning_rate': 1.8248713179557788e-07, 'epoch': 0.94} 94%|█████████▍| 5432/5773 [40:55<31:01, 5.46s/it] 94%|█████████▍| 5433/5773 [41:00<30:39, 5.41s/it] 94%|█████████▍| 5433/5773 [41:03<30:39, 5.41s/it] {'loss': 0.5555, 'learning_rate': 1.8142163875366824e-07, 'epoch': 0.94} 94%|█████████▍| 5433/5773 [41:03<30:39, 5.41s/it]{'loss': 0.5555, 'learning_rate': 1.8142163875366824e-07, 'epoch': 0.94} 94%|█████████▍| 5433/5773 [41:00<30:39, 5.41s/it] 94%|█████████▍| 5434/5773 [41:06<30:28, 5.39s/it] 94%|█████████▍| 5434/5773 [41:08<30:28, 5.39s/it] {'loss': 0.542, 'learning_rate': 1.803592369139273e-07, 'epoch': 0.94} 94%|█████████▍| 5434/5773 [41:08<30:28, 5.39s/it]{'loss': 0.542, 'learning_rate': 1.803592369139273e-07, 'epoch': 0.94} 94%|█████████▍| 5434/5773 [41:06<30:28, 5.39s/it] 94%|█████████▍| 5435/5773 [41:11<30:31, 5.42s/it] 94%|█████████▍| 5435/5773 [41:14<30:31, 5.42s/it] {'loss': 0.5615, 'learning_rate': 1.792999266108353e-07, 'epoch': 0.94} 94%|█████████▍| 5435/5773 [41:14<30:31, 
5.42s/it]{'loss': 0.5615, 'learning_rate': 1.792999266108353e-07, 'epoch': 0.94} 94%|█████████▍| 5435/5773 [41:11<30:31, 5.42s/it] 94%|█████████▍| 5436/5773 [41:16<30:19, 5.40s/it] 94%|█████████▍| 5436/5773 [41:19<30:19, 5.40s/it] {'loss': 0.5598, 'learning_rate': 1.7824370817789539e-07, 'epoch': 0.94} 94%|█████████▍| 5436/5773 [41:19<30:19, 5.40s/it] {'loss': 0.5598, 'learning_rate': 1.7824370817789539e-07, 'epoch': 0.94} 94%|█████████▍| 5436/5773 [41:16<30:19, 5.40s/it] 94%|█████████▍| 5437/5773 [41:22<30:19, 5.42s/it] 94%|█████████▍| 5437/5773 [41:24<30:19, 5.42s/it] {'loss': 0.5806, 'learning_rate': 1.7719058194763828e-07, 'epoch': 0.94} 94%|█████████▍| 5437/5773 [41:24<30:19, 5.42s/it] {'loss': 0.5806, 'learning_rate': 1.7719058194763828e-07, 'epoch': 0.94} 94%|█████████▍| 5437/5773 [41:22<30:19, 5.42s/it] 94%|█████████▍| 5438/5773 [41:27<30:22, 5.44s/it] 94%|█████████▍| 5438/5773 [41:30<30:23, 5.44s/it] {'loss': 0.5409, 'learning_rate': 1.761405482516232e-07, 'epoch': 0.94} 94%|█████████▍| 5438/5773 [41:30<30:23, 5.44s/it] {'loss': 0.5409, 'learning_rate': 1.761405482516232e-07, 'epoch': 0.94} 94%|█████████▍| 5438/5773 [41:27<30:22, 5.44s/it] 94%|█████████▍| 5439/5773 [41:33<30:23, 5.46s/it] 94%|█████████▍| 5439/5773 [41:35<30:23, 5.46s/it] {'loss': 0.5653, 'learning_rate': 1.7509360742043346e-07, 'epoch': 0.94} 94%|█████████▍| 5439/5773 [41:35<30:23, 5.46s/it] {'loss': 0.5653, 'learning_rate': 1.7509360742043346e-07, 'epoch': 0.94} 94%|█████████▍| 5439/5773 [41:33<30:23, 5.46s/it] 94%|█████████▍| 5440/5773 [41:38<30:05, 5.42s/it] 94%|█████████▍| 5440/5773 [41:41<30:05, 5.42s/it] {'loss': 0.5528, 'learning_rate': 1.74049759783681e-07, 'epoch': 0.94} 94%|█████████▍| 5440/5773 [41:41<30:05, 5.42s/it] {'loss': 0.5528, 'learning_rate': 1.74049759783681e-07, 'epoch': 0.94} 94%|█████████▍| 5440/5773 [41:38<30:05, 5.42s/it] 94%|█████████▍| 5441/5773 [41:44<30:01, 5.43s/it] 94%|█████████▍| 5441/5773 [41:46<30:01, 5.43s/it] {'loss': 0.5581, 'learning_rate': 
1.7300900566999957e-07, 'epoch': 0.94} 94%|█████████▍| 5441/5773 [41:46<30:01, 5.43s/it]{'loss': 0.5581, 'learning_rate': 1.7300900566999957e-07, 'epoch': 0.94} 94%|█████████▍| 5441/5773 [41:44<30:01, 5.43s/it] 94%|█████████▍| 5442/5773 [41:49<29:46, 5.40s/it] 94%|█████████▍| 5442/5773 [41:51<29:46, 5.40s/it] {'loss': 0.568, 'learning_rate': 1.7197134540705373e-07, 'epoch': 0.94} 94%|█████████▍| 5442/5773 [41:51<29:46, 5.40s/it]{'loss': 0.568, 'learning_rate': 1.7197134540705373e-07, 'epoch': 0.94} 94%|█████████▍| 5442/5773 [41:49<29:46, 5.40s/it] 94%|█████████▍| 5443/5773 [41:54<29:35, 5.38s/it] 94%|█████████▍| 5443/5773 [41:57<29:35, 5.38s/it] {'loss': 0.558, 'learning_rate': 1.7093677932153218e-07, 'epoch': 0.94} 94%|█████████▍| 5443/5773 [41:57<29:35, 5.38s/it] {'loss': 0.558, 'learning_rate': 1.7093677932153218e-07, 'epoch': 0.94} 94%|█████████▍| 5443/5773 [41:54<29:35, 5.38s/it] 94%|█████████▍| 5444/5773 [42:00<29:26, 5.37s/it] 94%|█████████▍| 5444/5773 [42:02<29:26, 5.37s/it] {'loss': 0.5494, 'learning_rate': 1.699053077391488e-07, 'epoch': 0.94} 94%|█████████▍| 5444/5773 [42:02<29:26, 5.37s/it] {'loss': 0.5494, 'learning_rate': 1.699053077391488e-07, 'epoch': 0.94} 94%|█████████▍| 5444/5773 [42:00<29:26, 5.37s/it] 94%|█████████▍| 5445/5773 [42:05<29:43, 5.44s/it] 94%|█████████▍| 5445/5773 [42:08<29:43, 5.44s/it] {'loss': 0.565, 'learning_rate': 1.6887693098464387e-07, 'epoch': 0.94} 94%|█████████▍| 5445/5773 [42:08<29:43, 5.44s/it] {'loss': 0.565, 'learning_rate': 1.6887693098464387e-07, 'epoch': 0.94} 94%|█████████▍| 5445/5773 [42:05<29:43, 5.44s/it] 94%|█████████▍| 5446/5773 [42:11<29:32, 5.42s/it] 94%|█████████▍| 5446/5773 [42:13<29:32, 5.42s/it] {'loss': 0.5551, 'learning_rate': 1.6785164938178277e-07, 'epoch': 0.94} 94%|█████████▍| 5446/5773 [42:13<29:32, 5.42s/it]{'loss': 0.5551, 'learning_rate': 1.6785164938178277e-07, 'epoch': 0.94} 94%|█████████▍| 5446/5773 [42:11<29:32, 5.42s/it] 94%|█████████▍| 5447/5773 [42:16<29:27, 5.42s/it] 94%|█████████▍| 
5447/5773 [42:19<29:27, 5.42s/it] {'loss': 0.5554, 'learning_rate': 1.6682946325335735e-07, 'epoch': 0.94} 94%|█████████▍| 5447/5773 [42:19<29:27, 5.42s/it]{'loss': 0.5554, 'learning_rate': 1.6682946325335735e-07, 'epoch': 0.94} 94%|█████████▍| 5447/5773 [42:16<29:27, 5.42s/it] 94%|█████████▍| 5448/5773 [42:22<29:24, 5.43s/it] 94%|█████████▍| 5448/5773 [42:24<29:24, 5.43s/it] {'loss': 0.5692, 'learning_rate': 1.6581037292118573e-07, 'epoch': 0.94} 94%|█████████▍| 5448/5773 [42:24<29:24, 5.43s/it]{'loss': 0.5692, 'learning_rate': 1.6581037292118573e-07, 'epoch': 0.94} 94%|█████████▍| 5448/5773 [42:22<29:24, 5.43s/it] 94%|█████████▍| 5449/5773 [42:27<29:24, 5.44s/it] 94%|█████████▍| 5449/5773 [42:29<29:24, 5.44s/it] {'loss': 0.5551, 'learning_rate': 1.6479437870610793e-07, 'epoch': 0.94} 94%|█████████▍| 5449/5773 [42:29<29:24, 5.44s/it]{'loss': 0.5551, 'learning_rate': 1.6479437870610793e-07, 'epoch': 0.94} 94%|█████████▍| 5449/5773 [42:27<29:24, 5.44s/it]11 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 010 AutoResumeHook: Checking whether to suspend... 23 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 94%|█████████▍| 5450/5773 [42:32<29:04, 5.40s/it]6 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 94%|█████████▍| 5450/5773 [42:35<29:04, 5.40s/it]7 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5666, 'learning_rate': 1.637814809279925e-07, 'epoch': 0.94} 94%|█████████▍| 5450/5773 [42:35<29:04, 5.40s/it] {'loss': 0.5666, 'learning_rate': 1.637814809279925e-07, 'epoch': 0.94} 94%|█████████▍| 5450/5773 [42:32<29:04, 5.40s/it] 94%|█████████▍| 5451/5773 [42:38<29:10, 5.44s/it] 94%|█████████▍| 5451/5773 [42:40<29:10, 5.44s/it] {'loss': 0.5525, 'learning_rate': 1.6277167990573217e-07, 'epoch': 0.94} 94%|█████████▍| 5451/5773 [42:40<29:10, 5.44s/it] {'loss': 0.5525, 'learning_rate': 1.6277167990573217e-07, 'epoch': 0.94} 94%|█████████▍| 5451/5773 [42:38<29:10, 5.44s/it] 94%|█████████▍| 5452/5773 [42:43<29:05, 5.44s/it] 94%|█████████▍| 5452/5773 [42:46<29:05, 5.44s/it] {'loss': 0.5634, 'learning_rate': 1.6176497595724372e-07, 'epoch': 0.94} 94%|█████████▍| 5452/5773 [42:46<29:05, 5.44s/it] {'loss': 0.5634, 'learning_rate': 1.6176497595724372e-07, 'epoch': 0.94} 94%|█████████▍| 5452/5773 [42:43<29:05, 5.44s/it] 94%|█████████▍| 5453/5773 [42:51<29:07, 5.46s/it] 94%|█████████▍| 5453/5773 [42:49<29:07, 5.46s/it] {'loss': 0.5554, 'learning_rate': 1.607613693994714e-07, 'epoch': 0.94} 94%|█████████▍| 5453/5773 [42:51<29:07, 5.46s/it] {'loss': 0.5554, 'learning_rate': 1.607613693994714e-07, 'epoch': 0.94} 94%|█████████▍| 5453/5773 [42:49<29:07, 5.46s/it] 94%|█████████▍| 5454/5773 [42:54<28:51, 5.43s/it] 94%|█████████▍| 5454/5773 [42:57<28:51, 5.43s/it] {'loss': 0.5613, 'learning_rate': 1.5976086054838024e-07, 'epoch': 0.94} 94%|█████████▍| 5454/5773 [42:57<28:51, 5.43s/it] {'loss': 0.5613, 'learning_rate': 1.5976086054838024e-07, 'epoch': 0.94} 94%|█████████▍| 5454/5773 [42:54<28:51, 5.43s/it] 94%|█████████▍| 5455/5773 [43:00<28:40, 5.41s/it] 94%|█████████▍| 5455/5773 [43:02<28:40, 5.41s/it] {'loss': 0.5646, 'learning_rate': 1.5876344971896494e-07, 'epoch': 0.94} 94%|█████████▍| 5455/5773 [43:02<28:40, 5.41s/it]{'loss': 0.5646, 'learning_rate': 1.5876344971896494e-07, 'epoch': 0.94} 94%|█████████▍| 5455/5773 [43:00<28:40, 5.41s/it] 95%|█████████▍| 5456/5773 
[43:05<28:36, 5.41s/it] 95%|█████████▍| 5456/5773 [43:07<28:36, 5.41s/it] {'loss': 0.555, 'learning_rate': 1.5776913722523989e-07, 'epoch': 0.95} 95%|█████████▍| 5456/5773 [43:07<28:36, 5.41s/it] {'loss': 0.555, 'learning_rate': 1.5776913722523989e-07, 'epoch': 0.95} 95%|█████████▍| 5456/5773 [43:05<28:36, 5.41s/it] 95%|█████████▍| 5457/5773 [43:11<29:18, 5.57s/it] 95%|█████████▍| 5457/5773 [43:13<29:18, 5.57s/it] {'loss': 0.5562, 'learning_rate': 1.56777923380248e-07, 'epoch': 0.95} 95%|█████████▍| 5457/5773 [43:13<29:18, 5.57s/it]{'loss': 0.5562, 'learning_rate': 1.56777923380248e-07, 'epoch': 0.95} 95%|█████████▍| 5457/5773 [43:11<29:18, 5.57s/it] 95%|█████████▍| 5458/5773 [43:16<29:14, 5.57s/it] 95%|█████████▍| 5458/5773 [43:19<29:14, 5.57s/it] {'loss': 0.5715, 'learning_rate': 1.5578980849605628e-07, 'epoch': 0.95} 95%|█████████▍| 5458/5773 [43:19<29:14, 5.57s/it] {'loss': 0.5715, 'learning_rate': 1.5578980849605628e-07, 'epoch': 0.95} 95%|█████████▍| 5458/5773 [43:16<29:14, 5.57s/it] 95%|█████████▍| 5459/5773 [43:22<29:09, 5.57s/it] 95%|█████████▍| 5459/5773 [43:24<29:09, 5.57s/it] {'loss': 0.5553, 'learning_rate': 1.548047928837515e-07, 'epoch': 0.95} 95%|█████████▍| 5459/5773 [43:24<29:09, 5.57s/it]{'loss': 0.5553, 'learning_rate': 1.548047928837515e-07, 'epoch': 0.95} 95%|█████████▍| 5459/5773 [43:22<29:09, 5.57s/it] 95%|█████████▍| 5460/5773 [43:27<28:55, 5.54s/it] 95%|█████████▍| 5460/5773 [43:30<28:55, 5.54s/it] {'loss': 0.5604, 'learning_rate': 1.5382287685344887e-07, 'epoch': 0.95} 95%|█████████▍| 5460/5773 [43:30<28:55, 5.54s/it] {'loss': 0.5604, 'learning_rate': 1.5382287685344887e-07, 'epoch': 0.95} 95%|█████████▍| 5460/5773 [43:27<28:55, 5.54s/it] 95%|█████████▍| 5461/5773 [43:33<28:41, 5.52s/it] 95%|█████████▍| 5461/5773 [43:35<28:41, 5.52s/it] {'loss': 0.5693, 'learning_rate': 1.5284406071428893e-07, 'epoch': 0.95} 95%|█████████▍| 5461/5773 [43:35<28:41, 5.52s/it]{'loss': 0.5693, 'learning_rate': 1.5284406071428893e-07, 'epoch': 0.95} 
95%|█████████▍| 5461/5773 [43:33<28:41, 5.52s/it] 95%|█████████▍| 5462/5773 [43:38<28:28, 5.49s/it] 95%|█████████▍| 5462/5773 [43:41<28:28, 5.49s/it] {'loss': 0.5476, 'learning_rate': 1.5186834477443402e-07, 'epoch': 0.95} 95%|█████████▍| 5462/5773 [43:41<28:28, 5.49s/it] {'loss': 0.5476, 'learning_rate': 1.5186834477443402e-07, 'epoch': 0.95} 95%|█████████▍| 5462/5773 [43:38<28:28, 5.49s/it] 95%|█████████▍| 5463/5773 [43:44<28:10, 5.45s/it] 95%|█████████▍| 5463/5773 [43:46<28:10, 5.45s/it] {'loss': 0.5501, 'learning_rate': 1.5089572934106843e-07, 'epoch': 0.95} 95%|█████████▍| 5463/5773 [43:46<28:10, 5.45s/it] {'loss': 0.5501, 'learning_rate': 1.5089572934106843e-07, 'epoch': 0.95} 95%|█████████▍| 5463/5773 [43:44<28:10, 5.45s/it] 95%|█████████▍| 5464/5773 [43:49<28:10, 5.47s/it] 95%|█████████▍| 5464/5773 [43:52<28:10, 5.47s/it] {'loss': 0.5439, 'learning_rate': 1.4992621472040393e-07, 'epoch': 0.95} 95%|█████████▍| 5464/5773 [43:52<28:10, 5.47s/it] {'loss': 0.5439, 'learning_rate': 1.4992621472040393e-07, 'epoch': 0.95} 95%|█████████▍| 5464/5773 [43:49<28:10, 5.47s/it] 95%|█████████▍| 5465/5773 [43:54<27:40, 5.39s/it] 95%|█████████▍| 5465/5773 [43:57<27:40, 5.39s/it] {'loss': 0.554, 'learning_rate': 1.4895980121767627e-07, 'epoch': 0.95} 95%|█████████▍| 5465/5773 [43:57<27:40, 5.39s/it] {'loss': 0.554, 'learning_rate': 1.4895980121767627e-07, 'epoch': 0.95} 95%|█████████▍| 5465/5773 [43:54<27:40, 5.39s/it] 95%|█████████▍| 5466/5773 [44:00<27:34, 5.39s/it] 95%|█████████▍| 5466/5773 [44:02<27:34, 5.39s/it] {'loss': 0.548, 'learning_rate': 1.4799648913714104e-07, 'epoch': 0.95} 95%|█████████▍| 5466/5773 [44:02<27:34, 5.39s/it] {'loss': 0.548, 'learning_rate': 1.4799648913714104e-07, 'epoch': 0.95} 95%|█████████▍| 5466/5773 [44:00<27:34, 5.39s/it] 95%|█████████▍| 5467/5773 [44:05<27:24, 5.37s/it] 95%|█████████▍| 5467/5773 [44:08<27:24, 5.37s/it] {'loss': 0.5654, 'learning_rate': 1.4703627878207894e-07, 'epoch': 0.95} 95%|█████████▍| 5467/5773 [44:08<27:24, 5.37s/it] 
{'loss': 0.5654, 'learning_rate': 1.4703627878207894e-07, 'epoch': 0.95} 95%|█████████▍| 5467/5773 [44:05<27:24, 5.37s/it] 95%|█████████▍| 5468/5773 [44:10<27:07, 5.34s/it] 95%|█████████▍| 5468/5773 [44:13<27:07, 5.34s/it] {'loss': 0.5572, 'learning_rate': 1.4607917045479814e-07, 'epoch': 0.95} 95%|█████████▍| 5468/5773 [44:13<27:07, 5.34s/it] {'loss': 0.5572, 'learning_rate': 1.4607917045479814e-07, 'epoch': 0.95} 95%|█████████▍| 5468/5773 [44:10<27:07, 5.34s/it] 95%|█████████▍| 5469/5773 [44:16<26:58, 5.32s/it] 95%|█████████▍| 5469/5773 [44:18<26:58, 5.32s/it] {'loss': 0.5671, 'learning_rate': 1.4512516445662428e-07, 'epoch': 0.95} 95%|█████████▍| 5469/5773 [44:18<26:58, 5.32s/it] {'loss': 0.5671, 'learning_rate': 1.4512516445662428e-07, 'epoch': 0.95} 95%|█████████▍| 5469/5773 [44:16<26:58, 5.32s/it] 95%|█████████▍| 5470/5773 [44:21<27:05, 5.36s/it] 95%|█████████▍| 5470/5773 [44:24<27:05, 5.36s/it] {'loss': 0.5513, 'learning_rate': 1.4417426108790934e-07, 'epoch': 0.95} 95%|█████████▍| 5470/5773 [44:24<27:05, 5.36s/it] {'loss': 0.5513, 'learning_rate': 1.4417426108790934e-07, 'epoch': 0.95} 95%|█████████▍| 5470/5773 [44:21<27:05, 5.36s/it] 95%|█████████▍| 5471/5773 [44:27<27:10, 5.40s/it] 95%|█████████▍| 5471/5773 [44:29<27:10, 5.40s/it] {'loss': 0.5442, 'learning_rate': 1.4322646064802937e-07, 'epoch': 0.95} 95%|█████████▍| 5471/5773 [44:29<27:10, 5.40s/it]{'loss': 0.5442, 'learning_rate': 1.4322646064802937e-07, 'epoch': 0.95} 95%|█████████▍| 5471/5773 [44:27<27:10, 5.40s/it] 95%|█████████▍| 5472/5773 [44:32<27:24, 5.46s/it] 95%|█████████▍| 5472/5773 [44:35<27:24, 5.46s/it] {'loss': 0.5527, 'learning_rate': 1.4228176343538236e-07, 'epoch': 0.95} 95%|█████████▍| 5472/5773 [44:35<27:24, 5.46s/it]{'loss': 0.5527, 'learning_rate': 1.4228176343538236e-07, 'epoch': 0.95} 95%|█████████▍| 5472/5773 [44:32<27:24, 5.46s/it] 95%|█████████▍| 5473/5773 [44:38<27:20, 5.47s/it] 95%|█████████▍| 5473/5773 [44:40<27:20, 5.47s/it] {'loss': 0.5667, 'learning_rate': 
1.4134016974738595e-07, 'epoch': 0.95} 95%|█████████▍| 5473/5773 [44:40<27:20, 5.47s/it]{'loss': 0.5667, 'learning_rate': 1.4134016974738595e-07, 'epoch': 0.95} 95%|█████████▍| 5473/5773 [44:38<27:20, 5.47s/it] 95%|█████████▍| 5474/5773 [44:43<27:08, 5.45s/it] 95%|█████████▍| 5474/5773 [44:46<27:08, 5.45s/it] {'loss': 0.549, 'learning_rate': 1.4040167988048748e-07, 'epoch': 0.95} 95%|█████████▍| 5474/5773 [44:46<27:08, 5.45s/it]{'loss': 0.549, 'learning_rate': 1.4040167988048748e-07, 'epoch': 0.95} 95%|█████████▍| 5474/5773 [44:43<27:08, 5.45s/it] 95%|█████████▍| 5475/5773 [44:48<26:51, 5.41s/it] 95%|█████████▍| 5475/5773 [44:51<26:51, 5.41s/it] {'loss': 0.544, 'learning_rate': 1.3946629413015166e-07, 'epoch': 0.95} 95%|█████████▍| 5475/5773 [44:51<26:51, 5.41s/it]{'loss': 0.544, 'learning_rate': 1.3946629413015166e-07, 'epoch': 0.95} 95%|█████████▍| 5475/5773 [44:48<26:51, 5.41s/it] 95%|█████████▍| 5476/5773 [44:54<26:37, 5.38s/it] 95%|█████████▍| 5476/5773 [44:56<26:37, 5.38s/it] {'loss': 0.5593, 'learning_rate': 1.3853401279086853e-07, 'epoch': 0.95} 95%|█████████▍| 5476/5773 [44:56<26:37, 5.38s/it]{'loss': 0.5593, 'learning_rate': 1.3853401279086853e-07, 'epoch': 0.95} 95%|█████████▍| 5476/5773 [44:54<26:37, 5.38s/it] 95%|█████████▍| 5477/5773 [44:59<26:27, 5.36s/it] 95%|█████████▍| 5477/5773 [45:02<26:27, 5.36s/it] {'loss': 0.5636, 'learning_rate': 1.3760483615614995e-07, 'epoch': 0.95} 95%|█████████▍| 5477/5773 [45:02<26:27, 5.36s/it]{'loss': 0.5636, 'learning_rate': 1.3760483615614995e-07, 'epoch': 0.95} 95%|█████████▍| 5477/5773 [44:59<26:27, 5.36s/it] 95%|█████████▍| 5478/5773 [45:04<26:23, 5.37s/it] 95%|█████████▍| 5478/5773 [45:07<26:23, 5.37s/it] {'loss': 0.5605, 'learning_rate': 1.366787645185297e-07, 'epoch': 0.95} {'loss': 0.5605, 'learning_rate': 1.366787645185297e-07, 'epoch': 0.95} 95%|█████████▍| 5478/5773 [45:07<26:23, 5.37s/it] 95%|█████████▍| 5478/5773 [45:04<26:23, 5.37s/it] 95%|█████████▍| 5479/5773 [45:10<26:12, 5.35s/it] 95%|█████████▍| 
5479/5773 [45:12<26:12, 5.35s/it] {'loss': 0.5561, 'learning_rate': 1.3575579816956564e-07, 'epoch': 0.95} 95%|█████████▍| 5479/5773 [45:12<26:12, 5.35s/it]{'loss': 0.5561, 'learning_rate': 1.3575579816956564e-07, 'epoch': 0.95} 95%|█████████▍| 5479/5773 [45:10<26:12, 5.35s/it] 95%|█████████▍| 5480/5773 [45:15<26:12, 5.37s/it] 95%|█████████▍| 5480/5773 [45:18<26:12, 5.37s/it] {'loss': 0.55, 'learning_rate': 1.3483593739983646e-07, 'epoch': 0.95} 95%|█████████▍| 5480/5773 [45:18<26:12, 5.37s/it]{'loss': 0.55, 'learning_rate': 1.3483593739983646e-07, 'epoch': 0.95} 95%|█████████▍| 5480/5773 [45:15<26:12, 5.37s/it] 95%|█████████▍| 5481/5773 [45:20<26:00, 5.35s/it] 95%|█████████▍| 5481/5773 [45:23<26:00, 5.35s/it] {'loss': 0.5483, 'learning_rate': 1.3391918249894496e-07, 'epoch': 0.95} 95%|█████████▍| 5481/5773 [45:23<26:00, 5.35s/it] {'loss': 0.5483, 'learning_rate': 1.3391918249894496e-07, 'epoch': 0.95} 95%|█████████▍| 5481/5773 [45:20<26:00, 5.35s/it] 95%|█████████▍| 5482/5773 [45:26<25:48, 5.32s/it] 95%|█████████▍| 5482/5773 [45:28<25:48, 5.32s/it] {'loss': 0.5659, 'learning_rate': 1.3300553375551362e-07, 'epoch': 0.95} 95%|█████████▍| 5482/5773 [45:28<25:48, 5.32s/it] {'loss': 0.5659, 'learning_rate': 1.3300553375551362e-07, 'epoch': 0.95} 95%|█████████▍| 5482/5773 [45:26<25:48, 5.32s/it] 95%|█████████▍| 5483/5773 [45:31<25:48, 5.34s/it] 95%|█████████▍| 5483/5773 [45:34<25:48, 5.34s/it] {'loss': 0.5654, 'learning_rate': 1.3209499145718785e-07, 'epoch': 0.95} 95%|█████████▍| 5483/5773 [45:34<25:48, 5.34s/it]{'loss': 0.5654, 'learning_rate': 1.3209499145718785e-07, 'epoch': 0.95} 95%|█████████▍| 5483/5773 [45:31<25:48, 5.34s/it] 95%|█████████▍| 5484/5773 [45:36<25:38, 5.32s/it] 95%|█████████▍| 5484/5773 [45:39<25:38, 5.32s/it] {'loss': 0.5752, 'learning_rate': 1.3118755589063725e-07, 'epoch': 0.95} 95%|█████████▍| 5484/5773 [45:39<25:38, 5.32s/it]{'loss': 0.5752, 'learning_rate': 1.3118755589063725e-07, 'epoch': 0.95} 95%|█████████▍| 5484/5773 [45:36<25:38, 
5.32s/it] 95%|█████████▌| 5485/5773 [45:42<25:51, 5.39s/it] 95%|█████████▌| 5485/5773 [45:44<25:51, 5.39s/it] {'loss': 0.5392, 'learning_rate': 1.3028322734154997e-07, 'epoch': 0.95} 95%|█████████▌| 5485/5773 [45:44<25:51, 5.39s/it]{'loss': 0.5392, 'learning_rate': 1.3028322734154997e-07, 'epoch': 0.95} 95%|█████████▌| 5485/5773 [45:42<25:51, 5.39s/it] 95%|█████████▌| 5486/5773 [45:47<25:57, 5.43s/it] 95%|█████████▌| 5486/5773 [45:50<25:57, 5.43s/it] {'loss': 0.5595, 'learning_rate': 1.2938200609463826e-07, 'epoch': 0.95} 95%|█████████▌| 5486/5773 [45:50<25:57, 5.43s/it] {'loss': 0.5595, 'learning_rate': 1.2938200609463826e-07, 'epoch': 0.95} 95%|█████████▌| 5486/5773 [45:47<25:57, 5.43s/it] 95%|█████████▌| 5487/5773 [45:53<25:59, 5.45s/it] 95%|█████████▌| 5487/5773 [45:55<25:59, 5.45s/it] {'loss': 0.5545, 'learning_rate': 1.2848389243363514e-07, 'epoch': 0.95} 95%|█████████▌| 5487/5773 [45:55<25:59, 5.45s/it]{'loss': 0.5545, 'learning_rate': 1.2848389243363514e-07, 'epoch': 0.95} 95%|█████████▌| 5487/5773 [45:53<25:59, 5.45s/it] 95%|█████████▌| 5488/5773 [45:58<25:45, 5.42s/it] 95%|█████████▌| 5488/5773 [46:01<25:45, 5.42s/it] {'loss': 0.5485, 'learning_rate': 1.2758888664129553e-07, 'epoch': 0.95} 95%|█████████▌| 5488/5773 [46:01<25:45, 5.42s/it] {'loss': 0.5485, 'learning_rate': 1.2758888664129553e-07, 'epoch': 0.95} 95%|█████████▌| 5488/5773 [45:58<25:45, 5.42s/it] 95%|█████████▌| 5489/5773 [46:04<25:40, 5.43s/it] 95%|█████████▌| 5489/5773 [46:06<25:40, 5.43s/it] {'loss': 0.5705, 'learning_rate': 1.266969889993952e-07, 'epoch': 0.95} {'loss': 0.5705, 'learning_rate': 1.266969889993952e-07, 'epoch': 0.95} 95%|█████████▌| 5489/5773 [46:06<25:40, 5.43s/it] 95%|█████████▌| 5489/5773 [46:04<25:40, 5.43s/it] 95%|█████████▌| 5490/5773 [46:09<25:36, 5.43s/it] 95%|█████████▌| 5490/5773 [46:12<25:36, 5.43s/it] {'loss': 0.5621, 'learning_rate': 1.2580819978873283e-07, 'epoch': 0.95} 95%|█████████▌| 5490/5773 [46:12<25:36, 5.43s/it] {'loss': 0.5621, 'learning_rate': 
1.2580819978873283e-07, 'epoch': 0.95} 95%|█████████▌| 5490/5773 [46:09<25:36, 5.43s/it] 95%|█████████▌| 5491/5773 [46:15<25:25, 5.41s/it] 95%|█████████▌| 5491/5773 [46:17<25:25, 5.41s/it] {'loss': 0.5536, 'learning_rate': 1.249225192891279e-07, 'epoch': 0.95} 95%|█████████▌| 5491/5773 [46:17<25:25, 5.41s/it]{'loss': 0.5536, 'learning_rate': 1.249225192891279e-07, 'epoch': 0.95} 95%|█████████▌| 5491/5773 [46:15<25:25, 5.41s/it] 95%|█████████▌| 5492/5773 [46:20<25:26, 5.43s/it] 95%|█████████▌| 5492/5773 [46:22<25:26, 5.43s/it] {'loss': 0.5393, 'learning_rate': 1.2403994777941964e-07, 'epoch': 0.95} {'loss': 0.5393, 'learning_rate': 1.2403994777941964e-07, 'epoch': 0.95} 95%|█████████▌| 5492/5773 [46:22<25:26, 5.43s/it] 95%|█████████▌| 5492/5773 [46:20<25:26, 5.43s/it] 95%|█████████▌| 5493/5773 [46:25<25:15, 5.41s/it] 95%|█████████▌| 5493/5773 [46:28<25:15, 5.41s/it] {'loss': 0.5365, 'learning_rate': 1.231604855374713e-07, 'epoch': 0.95} 95%|█████████▌| 5493/5773 [46:28<25:15, 5.41s/it] {'loss': 0.5365, 'learning_rate': 1.231604855374713e-07, 'epoch': 0.95} 95%|█████████▌| 5493/5773 [46:25<25:15, 5.41s/it] 95%|█████████▌| 5494/5773 [46:31<25:10, 5.41s/it] 95%|█████████▌| 5494/5773 [46:33<25:10, 5.41s/it] {'loss': 0.5588, 'learning_rate': 1.2228413284016472e-07, 'epoch': 0.95} 95%|█████████▌| 5494/5773 [46:33<25:10, 5.41s/it] {'loss': 0.5588, 'learning_rate': 1.2228413284016472e-07, 'epoch': 0.95} 95%|█████████▌| 5494/5773 [46:31<25:10, 5.41s/it] 95%|█████████▌| 5495/5773 [46:36<25:09, 5.43s/it] 95%|█████████▌| 5495/5773 [46:39<25:09, 5.43s/it] {'loss': 0.5483, 'learning_rate': 1.2141088996340368e-07, 'epoch': 0.95} 95%|█████████▌| 5495/5773 [46:39<25:09, 5.43s/it] {'loss': 0.5483, 'learning_rate': 1.2141088996340368e-07, 'epoch': 0.95} 95%|█████████▌| 5495/5773 [46:36<25:09, 5.43s/it] 95%|█████████▌| 5496/5773 [46:42<25:06, 5.44s/it] 95%|█████████▌| 5496/5773 [46:44<25:06, 5.44s/it] {'loss': 0.5516, 'learning_rate': 1.2054075718211268e-07, 'epoch': 0.95} {'loss': 
0.5516, 'learning_rate': 1.2054075718211268e-07, 'epoch': 0.95} 95%|█████████▌| 5496/5773 [46:44<25:06, 5.44s/it] 95%|█████████▌| 5496/5773 [46:42<25:06, 5.44s/it] 95%|█████████▌| 5497/5773 [46:47<24:54, 5.41s/it] 95%|█████████▌| 5497/5773 [46:50<24:54, 5.41s/it] {'loss': 0.5444, 'learning_rate': 1.1967373477023925e-07, 'epoch': 0.95} 95%|█████████▌| 5497/5773 [46:50<24:54, 5.41s/it] {'loss': 0.5444, 'learning_rate': 1.1967373477023925e-07, 'epoch': 0.95} 95%|█████████▌| 5497/5773 [46:47<24:54, 5.41s/it] 95%|█████████▌| 5498/5773 [46:52<24:42, 5.39s/it] 95%|█████████▌| 5498/5773 [46:55<24:42, 5.39s/it] {'loss': 0.5426, 'learning_rate': 1.1880982300074839e-07, 'epoch': 0.95} 95%|█████████▌| 5498/5773 [46:55<24:42, 5.39s/it] {'loss': 0.5426, 'learning_rate': 1.1880982300074839e-07, 'epoch': 0.95} 95%|█████████▌| 5498/5773 [46:52<24:42, 5.39s/it] 95%|█████████▌| 5499/5773 [46:58<24:41, 5.41s/it] 95%|█████████▌| 5499/5773 [47:00<24:41, 5.41s/it] {'loss': 0.5521, 'learning_rate': 1.1794902214562587e-07, 'epoch': 0.95} 95%|█████████▌| 5499/5773 [47:00<24:41, 5.41s/it] {'loss': 0.5521, 'learning_rate': 1.1794902214562587e-07, 'epoch': 0.95} 95%|█████████▌| 5499/5773 [46:58<24:41, 5.41s/it]1 AutoResumeHook: Checking whether to suspend... 121114 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 45 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...3 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 95%|█████████▌| 5500/5773 [47:03<24:38, 5.42s/it]6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 
15 AutoResumeHook: Checking whether to suspend...
 95%|█████████▌| 5500/5773 [47:06<24:38, 5.42s/it]
{'loss': 0.5494, 'learning_rate': 1.1709133247588156e-07, 'epoch': 0.95}
 95%|█████████▌| 5500/5773 [47:06<24:38, 5.42s/it]
{'loss': 0.5494, 'learning_rate': 1.1709133247588156e-07, 'epoch': 0.95}
 95%|█████████▌| 5500/5773 [47:03<24:38, 5.42s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 95%|█████████▌| 5501/5773 [47:27<49:01, 10.81s/it] 95%|█████████▌| 5501/5773 [47:29<49:01, 10.81s/it] {'loss': 0.5598, 'learning_rate': 1.1623675426154279e-07, 'epoch': 0.95} 95%|█████████▌| 5501/5773 [47:29<49:01, 10.81s/it] {'loss': 0.5598, 'learning_rate': 1.1623675426154279e-07, 'epoch': 0.95} 95%|█████████▌| 5501/5773 [47:27<49:01, 10.81s/it] 95%|█████████▌| 5502/5773 [47:32<41:33, 9.20s/it] 95%|█████████▌| 5502/5773 [47:35<41:33, 9.20s/it] {'loss': 0.5404, 'learning_rate': 1.1538528777165659e-07, 'epoch': 0.95} {'loss': 0.5404, 'learning_rate': 1.1538528777165659e-07, 'epoch': 0.95} 95%|█████████▌| 5502/5773 [47:35<41:33, 9.20s/it] 95%|█████████▌| 5502/5773 [47:32<41:33, 9.20s/it] 95%|█████████▌| 5503/5773 [47:38<36:19, 8.07s/it] 95%|█████████▌| 5503/5773 [47:40<36:19, 8.07s/it] {'loss': 0.55, 'learning_rate': 1.1453693327429405e-07, 'epoch': 0.95} 95%|█████████▌| 5503/5773 [47:40<36:19, 8.07s/it] {'loss': 0.55, 'learning_rate': 1.1453693327429405e-07, 'epoch': 0.95} 95%|█████████▌| 5503/5773 [47:38<36:19, 8.07s/it] 95%|█████████▌| 5504/5773 [47:43<33:02, 7.37s/it] 95%|█████████▌| 5504/5773 [47:46<33:02, 7.37s/it] {'loss': 0.5581, 'learning_rate': 1.1369169103654375e-07, 'epoch': 0.95} 95%|█████████▌| 5504/5773 [47:46<33:02, 7.37s/it]{'loss': 0.5581, 'learning_rate': 1.1369169103654375e-07, 'epoch': 0.95} 95%|█████████▌| 5504/5773 [47:43<33:02, 7.37s/it] 95%|█████████▌| 5505/5773 [47:49<30:20, 6.79s/it] 95%|█████████▌| 5505/5773 [47:51<30:20, 6.79s/it] {'loss': 0.5581, 'learning_rate': 1.1284956132451286e-07, 'epoch': 0.95} {'loss': 0.5581, 'learning_rate': 1.1284956132451286e-07, 'epoch': 0.95} 95%|█████████▌| 5505/5773 [47:51<30:20, 6.79s/it] 95%|█████████▌| 5505/5773 [47:49<30:20, 6.79s/it] 95%|█████████▌| 5506/5773 [47:54<28:32, 6.41s/it] 95%|█████████▌| 5506/5773 [47:57<28:32, 6.41s/it] {'loss': 0.5659, 'learning_rate': 1.120105444033337e-07, 'epoch': 0.95} 95%|█████████▌| 5506/5773 [47:57<28:32, 6.41s/it] {'loss': 0.5659, 'learning_rate': 
1.120105444033337e-07, 'epoch': 0.95} 95%|█████████▌| 5506/5773 [47:54<28:32, 6.41s/it] 95%|█████████▌| 5507/5773 [48:00<27:11, 6.13s/it] 95%|█████████▌| 5507/5773 [48:02<27:11, 6.13s/it] {'loss': 0.5647, 'learning_rate': 1.1117464053715388e-07, 'epoch': 0.95} 95%|█████████▌| 5507/5773 [48:02<27:11, 6.13s/it]{'loss': 0.5647, 'learning_rate': 1.1117464053715388e-07, 'epoch': 0.95} 95%|█████████▌| 5507/5773 [48:00<27:11, 6.13s/it] 95%|█████████▌| 5508/5773 [48:05<25:55, 5.87s/it] 95%|█████████▌| 5508/5773 [48:07<25:55, 5.87s/it] {'loss': 0.5483, 'learning_rate': 1.1034184998914398e-07, 'epoch': 0.95} 95%|█████████▌| 5508/5773 [48:07<25:55, 5.87s/it] {'loss': 0.5483, 'learning_rate': 1.1034184998914398e-07, 'epoch': 0.95} 95%|█████████▌| 5508/5773 [48:05<25:55, 5.87s/it] 95%|█████████▌| 5509/5773 [48:10<25:12, 5.73s/it] 95%|█████████▌| 5509/5773 [48:13<25:12, 5.73s/it] {'loss': 0.561, 'learning_rate': 1.0951217302148986e-07, 'epoch': 0.95} 95%|█████████▌| 5509/5773 [48:13<25:12, 5.73s/it]{'loss': 0.561, 'learning_rate': 1.0951217302148986e-07, 'epoch': 0.95} 95%|█████████▌| 5509/5773 [48:10<25:12, 5.73s/it] 95%|█████████▌| 5510/5773 [48:16<24:40, 5.63s/it] 95%|█████████▌| 5510/5773 [48:18<24:40, 5.63s/it] {'loss': 0.5652, 'learning_rate': 1.0868560989540477e-07, 'epoch': 0.95} 95%|█████████▌| 5510/5773 [48:18<24:40, 5.63s/it] {'loss': 0.5652, 'learning_rate': 1.0868560989540477e-07, 'epoch': 0.95} 95%|█████████▌| 5510/5773 [48:16<24:40, 5.63s/it] 95%|█████████▌| 5511/5773 [48:21<24:32, 5.62s/it] 95%|█████████▌| 5511/5773 [48:24<24:32, 5.62s/it] {'loss': 0.5615, 'learning_rate': 1.078621608711139e-07, 'epoch': 0.95} 95%|█████████▌| 5511/5773 [48:24<24:32, 5.62s/it] {'loss': 0.5615, 'learning_rate': 1.078621608711139e-07, 'epoch': 0.95} 95%|█████████▌| 5511/5773 [48:21<24:32, 5.62s/it] 95%|█████████▌| 5512/5773 [48:27<24:12, 5.56s/it] 95%|█████████▌| 5512/5773 [48:29<24:12, 5.56s/it] {'loss': 0.5406, 'learning_rate': 1.0704182620786652e-07, 'epoch': 0.95} 
95%|█████████▌| 5512/5773 [48:29<24:12, 5.56s/it] {'loss': 0.5406, 'learning_rate': 1.0704182620786652e-07, 'epoch': 0.95} 95%|█████████▌| 5512/5773 [48:27<24:12, 5.56s/it] 95%|█████████▌| 5513/5773 [48:32<23:45, 5.48s/it] 95%|█████████▌| 5513/5773 [48:35<23:45, 5.48s/it] {'loss': 0.5573, 'learning_rate': 1.0622460616393048e-07, 'epoch': 0.95} 95%|█████████▌| 5513/5773 [48:35<23:45, 5.48s/it]{'loss': 0.5573, 'learning_rate': 1.0622460616393048e-07, 'epoch': 0.95} 95%|█████████▌| 5513/5773 [48:32<23:45, 5.48s/it] 96%|█████████▌| 5514/5773 [48:38<23:43, 5.49s/it] 96%|█████████▌| 5514/5773 [48:40<23:43, 5.49s/it] {'loss': 0.5588, 'learning_rate': 1.0541050099659333e-07, 'epoch': 0.96} 96%|█████████▌| 5514/5773 [48:40<23:43, 5.49s/it] {'loss': 0.5588, 'learning_rate': 1.0541050099659333e-07, 'epoch': 0.96} 96%|█████████▌| 5514/5773 [48:38<23:43, 5.49s/it] 96%|█████████▌| 5515/5773 [48:43<23:39, 5.50s/it] 96%|█████████▌| 5515/5773 [48:46<23:39, 5.50s/it] {'loss': 0.553, 'learning_rate': 1.0459951096215892e-07, 'epoch': 0.96} 96%|█████████▌| 5515/5773 [48:46<23:39, 5.50s/it] {'loss': 0.553, 'learning_rate': 1.0459951096215892e-07, 'epoch': 0.96} 96%|█████████▌| 5515/5773 [48:43<23:39, 5.50s/it] 96%|█████████▌| 5516/5773 [48:49<23:38, 5.52s/it] 96%|█████████▌| 5516/5773 [48:51<23:38, 5.52s/it] {'loss': 0.5461, 'learning_rate': 1.0379163631595413e-07, 'epoch': 0.96} 96%|█████████▌| 5516/5773 [48:51<23:38, 5.52s/it]{'loss': 0.5461, 'learning_rate': 1.0379163631595413e-07, 'epoch': 0.96} 96%|█████████▌| 5516/5773 [48:49<23:38, 5.52s/it] 96%|█████████▌| 5517/5773 [48:54<23:28, 5.50s/it] 96%|█████████▌| 5517/5773 [48:57<23:28, 5.50s/it] {'loss': 0.5585, 'learning_rate': 1.0298687731232548e-07, 'epoch': 0.96} 96%|█████████▌| 5517/5773 [48:57<23:28, 5.50s/it]{'loss': 0.5585, 'learning_rate': 1.0298687731232548e-07, 'epoch': 0.96} 96%|█████████▌| 5517/5773 [48:54<23:28, 5.50s/it] 96%|█████████▌| 5518/5773 [49:00<23:22, 5.50s/it] 96%|█████████▌| 5518/5773 [49:02<23:22, 5.50s/it] 
{'loss': 0.5368, 'learning_rate': 1.0218523420463367e-07, 'epoch': 0.96} 96%|█████████▌| 5518/5773 [49:02<23:22, 5.50s/it] {'loss': 0.5368, 'learning_rate': 1.0218523420463367e-07, 'epoch': 0.96} 96%|█████████▌| 5518/5773 [49:00<23:22, 5.50s/it] 96%|█████████▌| 5519/5773 [49:05<23:16, 5.50s/it] 96%|█████████▌| 5519/5773 [49:08<23:16, 5.50s/it] {'loss': 0.5419, 'learning_rate': 1.0138670724526345e-07, 'epoch': 0.96} 96%|█████████▌| 5519/5773 [49:08<23:16, 5.50s/it] {'loss': 0.5419, 'learning_rate': 1.0138670724526345e-07, 'epoch': 0.96} 96%|█████████▌| 5519/5773 [49:05<23:16, 5.50s/it] 96%|█████████▌| 5520/5773 [49:11<23:05, 5.48s/it] 96%|█████████▌| 5520/5773 [49:13<23:05, 5.48s/it] {'loss': 0.5612, 'learning_rate': 1.0059129668561707e-07, 'epoch': 0.96} 96%|█████████▌| 5520/5773 [49:13<23:05, 5.48s/it] {'loss': 0.5612, 'learning_rate': 1.0059129668561707e-07, 'epoch': 0.96} 96%|█████████▌| 5520/5773 [49:11<23:05, 5.48s/it] 96%|█████████▌| 5521/5773 [49:16<22:51, 5.44s/it] 96%|█████████▌| 5521/5773 [49:18<22:51, 5.44s/it] {'loss': 0.5793, 'learning_rate': 9.97990027761142e-08, 'epoch': 0.96} 96%|█████████▌| 5521/5773 [49:16<22:51, 5.44s/it]{'loss': 0.5793, 'learning_rate': 9.97990027761142e-08, 'epoch': 0.96} 96%|█████████▌| 5521/5773 [49:18<22:51, 5.44s/it] 96%|█████████▌| 5522/5773 [49:21<22:40, 5.42s/it] 96%|█████████▌| 5522/5773 [49:24<22:40, 5.42s/it] {'loss': 0.539, 'learning_rate': 9.900982576619423e-08, 'epoch': 0.96} 96%|█████████▌| 5522/5773 [49:24<22:40, 5.42s/it]{'loss': 0.539, 'learning_rate': 9.900982576619423e-08, 'epoch': 0.96} 96%|█████████▌| 5522/5773 [49:21<22:40, 5.42s/it] 96%|█████████▌| 5523/5773 [49:27<22:28, 5.39s/it] 96%|█████████▌| 5523/5773 [49:29<22:28, 5.39s/it] {'loss': 0.5697, 'learning_rate': 9.822376590431503e-08, 'epoch': 0.96} 96%|█████████▌| 5523/5773 [49:29<22:28, 5.39s/it] {'loss': 0.5697, 'learning_rate': 9.822376590431503e-08, 'epoch': 0.96} 96%|█████████▌| 5523/5773 [49:27<22:28, 5.39s/it] 96%|█████████▌| 5524/5773 
[49:32<22:25, 5.40s/it] 96%|█████████▌| 5524/5773 [49:35<22:25, 5.40s/it] {'loss': 0.5438, 'learning_rate': 9.744082343795535e-08, 'epoch': 0.96} 96%|█████████▌| 5524/5773 [49:35<22:25, 5.40s/it] {'loss': 0.5438, 'learning_rate': 9.744082343795535e-08, 'epoch': 0.96} 96%|█████████▌| 5524/5773 [49:32<22:25, 5.40s/it] 96%|█████████▌| 5525/5773 [49:37<22:06, 5.35s/it] 96%|█████████▌| 5525/5773 [49:40<22:06, 5.35s/it] {'loss': 0.5702, 'learning_rate': 9.6660998613608e-08, 'epoch': 0.96} 96%|█████████▌| 5525/5773 [49:40<22:06, 5.35s/it]{'loss': 0.5702, 'learning_rate': 9.6660998613608e-08, 'epoch': 0.96} 96%|█████████▌| 5525/5773 [49:37<22:06, 5.35s/it] 96%|█████████▌| 5526/5773 [49:43<22:06, 5.37s/it] 96%|█████████▌| 5526/5773 [49:45<22:06, 5.37s/it] {'loss': 0.534, 'learning_rate': 9.58842916767877e-08, 'epoch': 0.96} 96%|█████████▌| 5526/5773 [49:45<22:06, 5.37s/it]{'loss': 0.534, 'learning_rate': 9.58842916767877e-08, 'epoch': 0.96} 96%|█████████▌| 5526/5773 [49:43<22:06, 5.37s/it] 96%|█████████▌| 5527/5773 [49:48<22:01, 5.37s/it] 96%|█████████▌| 5527/5773 [49:51<22:01, 5.37s/it] {'loss': 0.5563, 'learning_rate': 9.511070287202773e-08, 'epoch': 0.96} 96%|█████████▌| 5527/5773 [49:51<22:01, 5.37s/it] {'loss': 0.5563, 'learning_rate': 9.511070287202773e-08, 'epoch': 0.96} 96%|█████████▌| 5527/5773 [49:48<22:01, 5.37s/it] 96%|█████████▌| 5528/5773 [49:54<21:56, 5.37s/it] 96%|█████████▌| 5528/5773 [49:56<21:56, 5.37s/it] {'loss': 0.5604, 'learning_rate': 9.43402324428766e-08, 'epoch': 0.96} 96%|█████████▌| 5528/5773 [49:56<21:56, 5.37s/it]{'loss': 0.5604, 'learning_rate': 9.43402324428766e-08, 'epoch': 0.96} 96%|█████████▌| 5528/5773 [49:54<21:56, 5.37s/it] 96%|█████████▌| 5529/5773 [49:59<21:58, 5.40s/it] 96%|█████████▌| 5529/5773 [50:01<21:58, 5.40s/it] {'loss': 0.5497, 'learning_rate': 9.357288063190473e-08, 'epoch': 0.96} 96%|█████████▌| 5529/5773 [50:01<21:58, 5.40s/it] {'loss': 0.5497, 'learning_rate': 9.357288063190473e-08, 'epoch': 0.96} 96%|█████████▌| 
5529/5773 [49:59<21:58, 5.40s/it] 96%|█████████▌| 5530/5773 [50:05<22:06, 5.46s/it] 96%|█████████▌| 5530/5773 [50:07<22:06, 5.46s/it] {'loss': 0.5544, 'learning_rate': 9.280864768069775e-08, 'epoch': 0.96} 96%|█████████▌| 5530/5773 [50:07<22:06, 5.46s/it]{'loss': 0.5544, 'learning_rate': 9.280864768069775e-08, 'epoch': 0.96} 96%|█████████▌| 5530/5773 [50:05<22:06, 5.46s/it] 96%|█████████▌| 5531/5773 [50:10<21:56, 5.44s/it] 96%|█████████▌| 5531/5773 [50:12<21:56, 5.44s/it] {'loss': 0.5468, 'learning_rate': 9.204753382986097e-08, 'epoch': 0.96} 96%|█████████▌| 5531/5773 [50:12<21:56, 5.44s/it]{'loss': 0.5468, 'learning_rate': 9.204753382986097e-08, 'epoch': 0.96} 96%|█████████▌| 5531/5773 [50:10<21:56, 5.44s/it] 96%|█████████▌| 5532/5773 [50:15<21:50, 5.44s/it] 96%|█████████▌| 5532/5773 [50:18<21:50, 5.44s/it] {'loss': 0.5466, 'learning_rate': 9.12895393190183e-08, 'epoch': 0.96} 96%|█████████▌| 5532/5773 [50:18<21:50, 5.44s/it] {'loss': 0.5466, 'learning_rate': 9.12895393190183e-08, 'epoch': 0.96} 96%|█████████▌| 5532/5773 [50:15<21:50, 5.44s/it] 96%|█████████▌| 5533/5773 [50:21<21:49, 5.46s/it] 96%|█████████▌| 5533/5773 [50:23<21:49, 5.46s/it] {'loss': 0.5762, 'learning_rate': 9.05346643868088e-08, 'epoch': 0.96} 96%|█████████▌| 5533/5773 [50:23<21:49, 5.46s/it] {'loss': 0.5762, 'learning_rate': 9.05346643868088e-08, 'epoch': 0.96} 96%|█████████▌| 5533/5773 [50:21<21:49, 5.46s/it] 96%|█████████▌| 5534/5773 [50:27<21:58, 5.52s/it] 96%|█████████▌| 5534/5773 [50:29<21:58, 5.52s/it] {'loss': 0.5653, 'learning_rate': 8.978290927089239e-08, 'epoch': 0.96} 96%|█████████▌| 5534/5773 [50:29<21:58, 5.52s/it]{'loss': 0.5653, 'learning_rate': 8.978290927089239e-08, 'epoch': 0.96} 96%|█████████▌| 5534/5773 [50:27<21:58, 5.52s/it] 96%|█████████▌| 5535/5773 [50:32<21:44, 5.48s/it] 96%|█████████▌| 5535/5773 [50:34<21:44, 5.48s/it] {'loss': 0.5615, 'learning_rate': 8.903427420794531e-08, 'epoch': 0.96} 96%|█████████▌| 5535/5773 [50:34<21:44, 5.48s/it] {'loss': 0.5615, 
'learning_rate': 8.903427420794531e-08, 'epoch': 0.96} 96%|█████████▌| 5535/5773 [50:32<21:44, 5.48s/it] 96%|█████████▌| 5536/5773 [50:37<21:38, 5.48s/it] 96%|█████████▌| 5536/5773 [50:40<21:38, 5.48s/it] {'loss': 0.5579, 'learning_rate': 8.828875943366233e-08, 'epoch': 0.96} 96%|█████████▌| 5536/5773 [50:40<21:38, 5.48s/it] {'loss': 0.5579, 'learning_rate': 8.828875943366233e-08, 'epoch': 0.96} 96%|█████████▌| 5536/5773 [50:37<21:38, 5.48s/it] 96%|█████████▌| 5537/5773 [50:43<21:30, 5.47s/it] 96%|█████████▌| 5537/5773 [50:45<21:30, 5.47s/it] {'loss': 0.5598, 'learning_rate': 8.754636518275461e-08, 'epoch': 0.96} 96%|█████████▌| 5537/5773 [50:45<21:30, 5.47s/it] {'loss': 0.5598, 'learning_rate': 8.754636518275461e-08, 'epoch': 0.96} 96%|█████████▌| 5537/5773 [50:43<21:30, 5.47s/it] 96%|█████████▌| 5538/5773 [50:48<21:22, 5.46s/it] 96%|█████████▌| 5538/5773 [50:51<21:22, 5.46s/it] {'loss': 0.5574, 'learning_rate': 8.680709168895185e-08, 'epoch': 0.96} 96%|█████████▌| 5538/5773 [50:51<21:22, 5.46s/it] {'loss': 0.5574, 'learning_rate': 8.680709168895185e-08, 'epoch': 0.96} 96%|█████████▌| 5538/5773 [50:48<21:22, 5.46s/it] 96%|█████████▌| 5539/5773 [50:54<21:15, 5.45s/it] 96%|█████████▌| 5539/5773 [50:56<21:15, 5.45s/it] {'loss': 0.5603, 'learning_rate': 8.60709391850012e-08, 'epoch': 0.96} {'loss': 0.5603, 'learning_rate': 8.60709391850012e-08, 'epoch': 0.96} 96%|█████████▌| 5539/5773 [50:56<21:15, 5.45s/it] 96%|█████████▌| 5539/5773 [50:54<21:15, 5.45s/it] 96%|█████████▌| 5540/5773 [50:59<21:09, 5.45s/it] 96%|█████████▌| 5540/5773 [51:02<21:09, 5.45s/it] {'loss': 0.5677, 'learning_rate': 8.533790790266728e-08, 'epoch': 0.96} 96%|█████████▌| 5540/5773 [51:02<21:09, 5.45s/it] {'loss': 0.5677, 'learning_rate': 8.533790790266728e-08, 'epoch': 0.96} 96%|█████████▌| 5540/5773 [50:59<21:09, 5.45s/it] 96%|█████████▌| 5541/5773 [51:05<21:13, 5.49s/it] 96%|█████████▌| 5541/5773 [51:07<21:13, 5.49s/it] {'loss': 0.5663, 'learning_rate': 8.460799807272991e-08, 'epoch': 0.96} 
96%|█████████▌| 5541/5773 [51:07<21:13, 5.49s/it] {'loss': 0.5663, 'learning_rate': 8.460799807272991e-08, 'epoch': 0.96} 96%|█████████▌| 5541/5773 [51:05<21:13, 5.49s/it] 96%|█████████▌| 5542/5773 [51:10<21:05, 5.48s/it] 96%|█████████▌| 5542/5773 [51:13<21:05, 5.48s/it] {'loss': 0.5665, 'learning_rate': 8.388120992499083e-08, 'epoch': 0.96} 96%|█████████▌| 5542/5773 [51:13<21:05, 5.48s/it]{'loss': 0.5665, 'learning_rate': 8.388120992499083e-08, 'epoch': 0.96} 96%|█████████▌| 5542/5773 [51:10<21:05, 5.48s/it] 96%|█████████▌| 5543/5773 [51:16<21:06, 5.51s/it] 96%|█████████▌| 5543/5773 [51:18<21:06, 5.51s/it] {'loss': 0.5449, 'learning_rate': 8.315754368826368e-08, 'epoch': 0.96} 96%|█████████▌| 5543/5773 [51:18<21:06, 5.51s/it] {'loss': 0.5449, 'learning_rate': 8.315754368826368e-08, 'epoch': 0.96} 96%|█████████▌| 5543/5773 [51:16<21:06, 5.51s/it] 96%|█████████▌| 5544/5773 [51:21<21:05, 5.53s/it] 96%|█████████▌| 5544/5773 [51:24<21:05, 5.53s/it] {'loss': 0.5415, 'learning_rate': 8.243699959038287e-08, 'epoch': 0.96} 96%|█████████▌| 5544/5773 [51:24<21:05, 5.53s/it]{'loss': 0.5415, 'learning_rate': 8.243699959038287e-08, 'epoch': 0.96} 96%|█████████▌| 5544/5773 [51:21<21:05, 5.53s/it] 96%|█████████▌| 5545/5773 [51:27<20:44, 5.46s/it] 96%|█████████▌| 5545/5773 [51:29<20:44, 5.46s/it] {'loss': 0.5626, 'learning_rate': 8.171957785819918e-08, 'epoch': 0.96} 96%|█████████▌| 5545/5773 [51:29<20:44, 5.46s/it] {'loss': 0.5626, 'learning_rate': 8.171957785819918e-08, 'epoch': 0.96} 96%|█████████▌| 5545/5773 [51:27<20:44, 5.46s/it] 96%|█████████▌| 5546/5773 [51:32<20:39, 5.46s/it] 96%|█████████▌| 5546/5773 [51:35<20:39, 5.46s/it] {'loss': 0.5579, 'learning_rate': 8.100527871757858e-08, 'epoch': 0.96} 96%|█████████▌| 5546/5773 [51:35<20:39, 5.46s/it]{'loss': 0.5579, 'learning_rate': 8.100527871757858e-08, 'epoch': 0.96} 96%|█████████▌| 5546/5773 [51:32<20:39, 5.46s/it] 96%|█████████▌| 5547/5773 [51:38<20:36, 5.47s/it] 96%|█████████▌| 5547/5773 [51:40<20:36, 5.47s/it] {'loss': 
0.5622, 'learning_rate': 8.029410239340562e-08, 'epoch': 0.96} 96%|█████████▌| 5547/5773 [51:40<20:36, 5.47s/it] {'loss': 0.5622, 'learning_rate': 8.029410239340562e-08, 'epoch': 0.96} 96%|█████████▌| 5547/5773 [51:38<20:36, 5.47s/it] 96%|█████████▌| 5548/5773 [51:43<20:24, 5.44s/it] 96%|█████████▌| 5548/5773 [51:45<20:24, 5.44s/it] {'loss': 0.5555, 'learning_rate': 7.958604910958235e-08, 'epoch': 0.96} {'loss': 0.5555, 'learning_rate': 7.958604910958235e-08, 'epoch': 0.96} 96%|█████████▌| 5548/5773 [51:43<20:24, 5.44s/it] 96%|█████████▌| 5548/5773 [51:45<20:24, 5.44s/it] 96%|█████████▌| 5549/5773 [51:49<20:22, 5.46s/it] 96%|█████████▌| 5549/5773 [51:51<20:22, 5.46s/it] {'loss': 0.5453, 'learning_rate': 7.888111908902595e-08, 'epoch': 0.96} 96%|█████████▌| 5549/5773 [51:51<20:22, 5.46s/it] {'loss': 0.5453, 'learning_rate': 7.888111908902595e-08, 'epoch': 0.96} 96%|█████████▌| 5549/5773 [51:49<20:22, 5.46s/it] 14 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 96%|█████████▌| 5550/5773 [51:54<20:21, 5.48s/it] 3 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 
96%|█████████▌| 5550/5773 [51:57<20:21, 5.48s/it] {'loss': 0.565, 'learning_rate': 7.817931255367006e-08, 'epoch': 0.96} {'loss': 0.565, 'learning_rate': 7.817931255367006e-08, 'epoch': 0.96} 96%|█████████▌| 5550/5773 [51:57<20:21, 5.48s/it] 96%|█████████▌| 5550/5773 [51:54<20:21, 5.48s/it] 96%|█████████▌| 5551/5773 [51:59<20:03, 5.42s/it] 96%|█████████▌| 5551/5773 [52:02<20:03, 5.42s/it] {'loss': 0.5666, 'learning_rate': 7.74806297244668e-08, 'epoch': 0.96} 96%|█████████▌| 5551/5773 [52:02<20:03, 5.42s/it] {'loss': 0.5666, 'learning_rate': 7.74806297244668e-08, 'epoch': 0.96} 96%|█████████▌| 5551/5773 [51:59<20:03, 5.42s/it] 96%|█████████▌| 5552/5773 [52:05<20:04, 5.45s/it] 96%|█████████▌| 5552/5773 [52:07<20:04, 5.45s/it] {'loss': 0.5532, 'learning_rate': 7.678507082138464e-08, 'epoch': 0.96} 96%|█████████▌| 5552/5773 [52:07<20:04, 5.45s/it]{'loss': 0.5532, 'learning_rate': 7.678507082138464e-08, 'epoch': 0.96} 96%|█████████▌| 5552/5773 [52:05<20:04, 5.45s/it] 96%|█████████▌| 5553/5773 [52:10<19:59, 5.45s/it] 96%|█████████▌| 5553/5773 [52:13<19:59, 5.45s/it] {'loss': 0.5634, 'learning_rate': 7.609263606340622e-08, 'epoch': 0.96} 96%|█████████▌| 5553/5773 [52:13<19:59, 5.45s/it]{'loss': 0.5634, 'learning_rate': 7.609263606340622e-08, 'epoch': 0.96} 96%|█████████▌| 5553/5773 [52:10<19:59, 5.45s/it] 96%|█████████▌| 5554/5773 [52:16<19:56, 5.46s/it] 96%|█████████▌| 5554/5773 [52:18<19:56, 5.46s/it] {'loss': 0.5734, 'learning_rate': 7.540332566853158e-08, 'epoch': 0.96} 96%|█████████▌| 5554/5773 [52:18<19:56, 5.46s/it] {'loss': 0.5734, 'learning_rate': 7.540332566853158e-08, 'epoch': 0.96} 96%|█████████▌| 5554/5773 [52:16<19:56, 5.46s/it] 96%|█████████▌| 5555/5773 [52:22<20:05, 5.53s/it] 96%|█████████▌| 5555/5773 [52:24<20:05, 5.53s/it] {'loss': 0.5594, 'learning_rate': 7.471713985378049e-08, 'epoch': 0.96} 96%|█████████▌| 5555/5773 [52:24<20:05, 5.53s/it]{'loss': 0.5594, 'learning_rate': 7.471713985378049e-08, 'epoch': 0.96} 96%|█████████▌| 5555/5773 [52:22<20:05, 
5.53s/it] 96%|█████████▌| 5556/5773 [52:27<19:57, 5.52s/it] 96%|█████████▌| 5556/5773 [52:29<19:57, 5.52s/it] {'loss': 0.5637, 'learning_rate': 7.403407883518344e-08, 'epoch': 0.96} 96%|█████████▌| 5556/5773 [52:29<19:57, 5.52s/it]{'loss': 0.5637, 'learning_rate': 7.403407883518344e-08, 'epoch': 0.96} 96%|█████████▌| 5556/5773 [52:27<19:57, 5.52s/it] 96%|█████████▋| 5557/5773 [52:33<19:53, 5.52s/it] 96%|█████████▋| 5557/5773 [52:35<19:53, 5.53s/it] {'loss': 0.5535, 'learning_rate': 7.335414282779063e-08, 'epoch': 0.96} {'loss': 0.5535, 'learning_rate': 7.335414282779063e-08, 'epoch': 0.96} 96%|█████████▋| 5557/5773 [52:35<19:53, 5.53s/it] 96%|█████████▋| 5557/5773 [52:33<19:53, 5.52s/it] 96%|█████████▋| 5558/5773 [52:38<19:44, 5.51s/it] 96%|█████████▋| 5558/5773 [52:40<19:44, 5.51s/it] {'loss': 0.5485, 'learning_rate': 7.267733204566862e-08, 'epoch': 0.96} 96%|█████████▋| 5558/5773 [52:40<19:44, 5.51s/it] {'loss': 0.5485, 'learning_rate': 7.267733204566862e-08, 'epoch': 0.96} 96%|█████████▋| 5558/5773 [52:38<19:44, 5.51s/it] 96%|█████████▋| 5559/5773 [52:43<19:26, 5.45s/it] 96%|█████████▋| 5559/5773 [52:46<19:26, 5.45s/it] {'loss': 0.5505, 'learning_rate': 7.200364670189807e-08, 'epoch': 0.96} 96%|█████████▋| 5559/5773 [52:46<19:26, 5.45s/it]{'loss': 0.5505, 'learning_rate': 7.200364670189807e-08, 'epoch': 0.96} 96%|█████████▋| 5559/5773 [52:43<19:26, 5.45s/it] 96%|█████████▋| 5560/5773 [52:49<19:21, 5.45s/it] 96%|█████████▋| 5560/5773 [52:51<19:21, 5.45s/it] {'loss': 0.5595, 'learning_rate': 7.1333087008576e-08, 'epoch': 0.96} 96%|█████████▋| 5560/5773 [52:51<19:21, 5.45s/it] {'loss': 0.5595, 'learning_rate': 7.1333087008576e-08, 'epoch': 0.96} 96%|█████████▋| 5560/5773 [52:49<19:21, 5.45s/it] 96%|█████████▋| 5561/5773 [52:54<19:20, 5.47s/it] 96%|█████████▋| 5561/5773 [52:57<19:20, 5.47s/it] {'loss': 0.5554, 'learning_rate': 7.066565317681795e-08, 'epoch': 0.96} 96%|█████████▋| 5561/5773 [52:57<19:20, 5.47s/it]{'loss': 0.5554, 'learning_rate': 
7.066565317681795e-08, 'epoch': 0.96} 96%|█████████▋| 5561/5773 [52:54<19:20, 5.47s/it] 96%|█████████▋| 5562/5773 [53:00<19:14, 5.47s/it] 96%|█████████▋| 5562/5773 [53:02<19:14, 5.47s/it] {'loss': 0.5446, 'learning_rate': 7.000134541675141e-08, 'epoch': 0.96} 96%|█████████▋| 5562/5773 [53:02<19:14, 5.47s/it]{'loss': 0.5446, 'learning_rate': 7.000134541675141e-08, 'epoch': 0.96} 96%|█████████▋| 5562/5773 [53:00<19:14, 5.47s/it] 96%|█████████▋| 5563/5773 [53:05<19:01, 5.43s/it] 96%|█████████▋| 5563/5773 [53:08<19:01, 5.43s/it] {'loss': 0.5351, 'learning_rate': 6.934016393752352e-08, 'epoch': 0.96} 96%|█████████▋| 5563/5773 [53:08<19:01, 5.43s/it]{'loss': 0.5351, 'learning_rate': 6.934016393752352e-08, 'epoch': 0.96} 96%|█████████▋| 5563/5773 [53:05<19:01, 5.43s/it] 96%|█████████▋| 5564/5773 [53:11<18:52, 5.42s/it] 96%|█████████▋| 5564/5773 [53:13<18:52, 5.42s/it] {'loss': 0.5603, 'learning_rate': 6.868210894729333e-08, 'epoch': 0.96} 96%|█████████▋| 5564/5773 [53:13<18:52, 5.42s/it] {'loss': 0.5603, 'learning_rate': 6.868210894729333e-08, 'epoch': 0.96} 96%|█████████▋| 5564/5773 [53:11<18:52, 5.42s/it] 96%|█████████▋| 5565/5773 [53:16<18:54, 5.46s/it] 96%|█████████▋| 5565/5773 [53:18<18:54, 5.46s/it] {'loss': 0.5458, 'learning_rate': 6.802718065323955e-08, 'epoch': 0.96} 96%|█████████▋| 5565/5773 [53:18<18:54, 5.46s/it] {'loss': 0.5458, 'learning_rate': 6.802718065323955e-08, 'epoch': 0.96} 96%|█████████▋| 5565/5773 [53:16<18:54, 5.46s/it] 96%|█████████▋| 5566/5773 [53:21<18:42, 5.42s/it] 96%|█████████▋| 5566/5773 [53:24<18:42, 5.42s/it] {'loss': 0.5482, 'learning_rate': 6.73753792615528e-08, 'epoch': 0.96} {'loss': 0.5482, 'learning_rate': 6.73753792615528e-08, 'epoch': 0.96} 96%|█████████▋| 5566/5773 [53:24<18:42, 5.42s/it] 96%|█████████▋| 5566/5773 [53:21<18:42, 5.42s/it] 96%|█████████▋| 5567/5773 [53:27<18:38, 5.43s/it] 96%|█████████▋| 5567/5773 [53:29<18:38, 5.43s/it] {'loss': 0.5477, 'learning_rate': 6.672670497744227e-08, 'epoch': 0.96} 96%|█████████▋| 
5567/5773 [53:29<18:38, 5.43s/it]{'loss': 0.5477, 'learning_rate': 6.672670497744227e-08, 'epoch': 0.96} 96%|█████████▋| 5567/5773 [53:27<18:38, 5.43s/it] 96%|█████████▋| 5568/5773 [53:32<18:34, 5.44s/it] 96%|█████████▋| 5568/5773 [53:35<18:34, 5.44s/it] {'loss': 0.5524, 'learning_rate': 6.608115800513126e-08, 'epoch': 0.96} 96%|█████████▋| 5568/5773 [53:35<18:34, 5.44s/it] {'loss': 0.5524, 'learning_rate': 6.608115800513126e-08, 'epoch': 0.96} 96%|█████████▋| 5568/5773 [53:32<18:34, 5.44s/it] 96%|█████████▋| 5569/5773 [53:38<18:21, 5.40s/it] 96%|█████████▋| 5569/5773 [53:40<18:21, 5.40s/it] {'loss': 0.5573, 'learning_rate': 6.543873854785831e-08, 'epoch': 0.96} 96%|█████████▋| 5569/5773 [53:40<18:21, 5.40s/it] {'loss': 0.5573, 'learning_rate': 6.543873854785831e-08, 'epoch': 0.96} 96%|█████████▋| 5569/5773 [53:38<18:21, 5.40s/it] 96%|█████████▋| 5570/5773 [53:43<18:10, 5.37s/it] 96%|█████████▋| 5570/5773 [53:45<18:10, 5.37s/it] {'loss': 0.5443, 'learning_rate': 6.47994468078772e-08, 'epoch': 0.96} 96%|█████████▋| 5570/5773 [53:45<18:10, 5.37s/it]{'loss': 0.5443, 'learning_rate': 6.47994468078772e-08, 'epoch': 0.96} 96%|█████████▋| 5570/5773 [53:43<18:10, 5.37s/it] 97%|█████████▋| 5571/5773 [53:48<18:01, 5.35s/it] 97%|█████████▋| 5571/5773 [53:51<18:01, 5.35s/it] {'loss': 0.5531, 'learning_rate': 6.416328298645802e-08, 'epoch': 0.97} 97%|█████████▋| 5571/5773 [53:51<18:01, 5.35s/it] {'loss': 0.5531, 'learning_rate': 6.416328298645802e-08, 'epoch': 0.97} 97%|█████████▋| 5571/5773 [53:48<18:01, 5.35s/it] 97%|█████████▋| 5572/5773 [53:54<18:11, 5.43s/it] 97%|█████████▋| 5572/5773 [53:56<18:11, 5.43s/it] {'loss': 0.5558, 'learning_rate': 6.353024728388502e-08, 'epoch': 0.97} 97%|█████████▋| 5572/5773 [53:54<18:11, 5.43s/it] {'loss': 0.5558, 'learning_rate': 6.353024728388502e-08, 'epoch': 0.97} 97%|█████████▋| 5572/5773 [53:56<18:11, 5.43s/it] 97%|█████████▋| 5573/5773 [53:59<17:56, 5.38s/it] 97%|█████████▋| 5573/5773 [54:02<17:56, 5.38s/it] {'loss': 0.5415, 
'learning_rate': 6.290033989945877e-08, 'epoch': 0.97} 97%|█████████▋| 5573/5773 [54:02<17:56, 5.38s/it] {'loss': 0.5415, 'learning_rate': 6.290033989945877e-08, 'epoch': 0.97} 97%|█████████▋| 5573/5773 [53:59<17:56, 5.38s/it] 97%|█████████▋| 5574/5773 [54:04<17:48, 5.37s/it] 97%|█████████▋| 5574/5773 [54:07<17:48, 5.37s/it] {'loss': 0.5613, 'learning_rate': 6.227356103149285e-08, 'epoch': 0.97} 97%|█████████▋| 5574/5773 [54:07<17:48, 5.37s/it] {'loss': 0.5613, 'learning_rate': 6.227356103149285e-08, 'epoch': 0.97} 97%|█████████▋| 5574/5773 [54:04<17:48, 5.37s/it] 97%|█████████▋| 5575/5773 [54:10<17:45, 5.38s/it] 97%|█████████▋| 5575/5773 [54:12<17:45, 5.38s/it] {'loss': 0.5466, 'learning_rate': 6.164991087731831e-08, 'epoch': 0.97} 97%|█████████▋| 5575/5773 [54:12<17:45, 5.38s/it] {'loss': 0.5466, 'learning_rate': 6.164991087731831e-08, 'epoch': 0.97} 97%|█████████▋| 5575/5773 [54:10<17:45, 5.38s/it] 97%|█████████▋| 5576/5773 [54:15<17:55, 5.46s/it] 97%|█████████▋| 5576/5773 [54:18<17:55, 5.46s/it] {'loss': 0.5419, 'learning_rate': 6.102938963328031e-08, 'epoch': 0.97} 97%|█████████▋| 5576/5773 [54:18<17:55, 5.46s/it]{'loss': 0.5419, 'learning_rate': 6.102938963328031e-08, 'epoch': 0.97} 97%|█████████▋| 5576/5773 [54:15<17:55, 5.46s/it] 97%|█████████▋| 5577/5773 [54:21<17:47, 5.45s/it] 97%|█████████▋| 5577/5773 [54:23<17:47, 5.45s/it] {'loss': 0.559, 'learning_rate': 6.041199749473814e-08, 'epoch': 0.97} 97%|█████████▋| 5577/5773 [54:23<17:47, 5.45s/it] {'loss': 0.559, 'learning_rate': 6.041199749473814e-08, 'epoch': 0.97} 97%|█████████▋| 5577/5773 [54:21<17:47, 5.45s/it] 97%|█████████▋| 5578/5773 [54:26<17:42, 5.45s/it] 97%|█████████▋| 5578/5773 [54:29<17:42, 5.45s/it] {'loss': 0.5504, 'learning_rate': 5.97977346560663e-08, 'epoch': 0.97} 97%|█████████▋| 5578/5773 [54:29<17:42, 5.45s/it] {'loss': 0.5504, 'learning_rate': 5.97977346560663e-08, 'epoch': 0.97} 97%|█████████▋| 5578/5773 [54:26<17:42, 5.45s/it] 97%|█████████▋| 5579/5773 [54:32<17:42, 5.48s/it] 
97%|█████████▋| 5579/5773 [54:34<17:42, 5.48s/it] {'loss': 0.5565, 'learning_rate': 5.918660131065568e-08, 'epoch': 0.97} 97%|█████████▋| 5579/5773 [54:34<17:42, 5.48s/it]{'loss': 0.5565, 'learning_rate': 5.918660131065568e-08, 'epoch': 0.97} 97%|█████████▋| 5579/5773 [54:32<17:42, 5.48s/it] 97%|█████████▋| 5580/5773 [54:37<17:41, 5.50s/it] 97%|█████████▋| 5580/5773 [54:40<17:41, 5.50s/it] {'loss': 0.5664, 'learning_rate': 5.8578597650909005e-08, 'epoch': 0.97} 97%|█████████▋| 5580/5773 [54:40<17:41, 5.50s/it]{'loss': 0.5664, 'learning_rate': 5.8578597650909005e-08, 'epoch': 0.97} 97%|█████████▋| 5580/5773 [54:37<17:41, 5.50s/it] 97%|█████████▋| 5581/5773 [54:43<17:39, 5.52s/it] 97%|█████████▋| 5581/5773 [54:45<17:39, 5.52s/it] {'loss': 0.5551, 'learning_rate': 5.797372386824651e-08, 'epoch': 0.97} {'loss': 0.5551, 'learning_rate': 5.797372386824651e-08, 'epoch': 0.97} 97%|█████████▋| 5581/5773 [54:45<17:39, 5.52s/it] 97%|█████████▋| 5581/5773 [54:43<17:39, 5.52s/it] 97%|█████████▋| 5582/5773 [54:49<17:32, 5.51s/it] 97%|█████████▋| 5582/5773 [54:51<17:32, 5.51s/it] {'loss': 0.5562, 'learning_rate': 5.7371980153101416e-08, 'epoch': 0.97} 97%|█████████▋| 5582/5773 [54:51<17:32, 5.51s/it] {'loss': 0.5562, 'learning_rate': 5.7371980153101416e-08, 'epoch': 0.97} 97%|█████████▋| 5582/5773 [54:49<17:32, 5.51s/it] 97%|█████████▋| 5583/5773 [54:54<17:17, 5.46s/it] 97%|█████████▋| 5583/5773 [54:56<17:17, 5.46s/it] {'loss': 0.5414, 'learning_rate': 5.6773366694922174e-08, 'epoch': 0.97} {'loss': 0.5414, 'learning_rate': 5.6773366694922174e-08, 'epoch': 0.97} 97%|█████████▋| 5583/5773 [54:56<17:17, 5.46s/it] 97%|█████████▋| 5583/5773 [54:54<17:17, 5.46s/it] 97%|█████████▋| 5584/5773 [54:59<17:10, 5.45s/it] 97%|█████████▋| 5584/5773 [55:02<17:10, 5.45s/it] {'loss': 0.5684, 'learning_rate': 5.6177883682170254e-08, 'epoch': 0.97} 97%|█████████▋| 5584/5773 [55:02<17:10, 5.45s/it] {'loss': 0.5684, 'learning_rate': 5.6177883682170254e-08, 'epoch': 0.97} 97%|█████████▋| 5584/5773 
[54:59<17:10, 5.45s/it] 97%|█████████▋| 5585/5773 [55:05<16:59, 5.42s/it] 97%|█████████▋| 5585/5773 [55:07<16:59, 5.42s/it] {'loss': 0.562, 'learning_rate': 5.558553130232347e-08, 'epoch': 0.97} {'loss': 0.562, 'learning_rate': 5.558553130232347e-08, 'epoch': 0.97} 97%|█████████▋| 5585/5773 [55:07<16:59, 5.42s/it] 97%|█████████▋| 5585/5773 [55:05<16:59, 5.42s/it] 97%|█████████▋| 5586/5773 [55:10<16:55, 5.43s/it] 97%|█████████▋| 5586/5773 [55:13<16:55, 5.43s/it] {'loss': 0.5694, 'learning_rate': 5.4996309741873755e-08, 'epoch': 0.97} 97%|█████████▋| 5586/5773 [55:13<16:55, 5.43s/it] {'loss': 0.5694, 'learning_rate': 5.4996309741873755e-08, 'epoch': 0.97} 97%|█████████▋| 5586/5773 [55:10<16:55, 5.43s/it] 97%|█████████▋| 5587/5773 [55:16<16:54, 5.46s/it] 97%|█████████▋| 5587/5773 [55:18<16:54, 5.46s/it] {'loss': 0.5434, 'learning_rate': 5.4410219186326055e-08, 'epoch': 0.97} {'loss': 0.5434, 'learning_rate': 5.4410219186326055e-08, 'epoch': 0.97} 97%|█████████▋| 5587/5773 [55:18<16:54, 5.46s/it] 97%|█████████▋| 5587/5773 [55:16<16:54, 5.46s/it] 97%|█████████▋| 5588/5773 [55:21<16:49, 5.46s/it] 97%|█████████▋| 5588/5773 [55:23<16:49, 5.46s/it] {'loss': 0.5771, 'learning_rate': 5.382725982020165e-08, 'epoch': 0.97} 97%|█████████▋| 5588/5773 [55:23<16:49, 5.46s/it]{'loss': 0.5771, 'learning_rate': 5.382725982020165e-08, 'epoch': 0.97} 97%|█████████▋| 5588/5773 [55:21<16:49, 5.46s/it] 97%|█████████▋| 5589/5773 [55:27<16:46, 5.47s/it] 97%|█████████▋| 5589/5773 [55:29<16:46, 5.47s/it] {'loss': 0.5526, 'learning_rate': 5.324743182703262e-08, 'epoch': 0.97} 97%|█████████▋| 5589/5773 [55:29<16:46, 5.47s/it] {'loss': 0.5526, 'learning_rate': 5.324743182703262e-08, 'epoch': 0.97} 97%|█████████▋| 5589/5773 [55:27<16:46, 5.47s/it] 97%|█████████▋| 5590/5773 [55:32<16:32, 5.42s/it] 97%|█████████▋| 5590/5773 [55:34<16:32, 5.42s/it] {'loss': 0.5584, 'learning_rate': 5.267073538936962e-08, 'epoch': 0.97} 97%|█████████▋| 5590/5773 [55:34<16:32, 5.42s/it] {'loss': 0.5584, 
'learning_rate': 5.267073538936962e-08, 'epoch': 0.97} 97%|█████████▋| 5590/5773 [55:32<16:32, 5.42s/it] 97%|█████████▋| 5591/5773 [55:37<16:27, 5.43s/it] 97%|█████████▋| 5591/5773 [55:40<16:27, 5.43s/it] {'loss': 0.5635, 'learning_rate': 5.2097170688772955e-08, 'epoch': 0.97} 97%|█████████▋| 5591/5773 [55:40<16:27, 5.43s/it] {'loss': 0.5635, 'learning_rate': 5.2097170688772955e-08, 'epoch': 0.97} 97%|█████████▋| 5591/5773 [55:37<16:27, 5.43s/it] 97%|█████████▋| 5592/5773 [55:43<16:19, 5.41s/it] 97%|█████████▋| 5592/5773 [55:45<16:19, 5.41s/it] {'loss': 0.5502, 'learning_rate': 5.152673790582152e-08, 'epoch': 0.97} 97%|█████████▋| 5592/5773 [55:45<16:19, 5.41s/it] {'loss': 0.5502, 'learning_rate': 5.152673790582152e-08, 'epoch': 0.97} 97%|█████████▋| 5592/5773 [55:43<16:19, 5.41s/it] 97%|█████████▋| 5593/5773 [55:48<16:20, 5.45s/it] 97%|█████████▋| 5593/5773 [55:51<16:20, 5.45s/it] {'loss': 0.5508, 'learning_rate': 5.095943722010388e-08, 'epoch': 0.97} 97%|█████████▋| 5593/5773 [55:51<16:20, 5.45s/it] {'loss': 0.5508, 'learning_rate': 5.095943722010388e-08, 'epoch': 0.97} 97%|█████████▋| 5593/5773 [55:48<16:20, 5.45s/it] 97%|█████████▋| 5594/5773 [55:54<16:19, 5.47s/it] 97%|█████████▋| 5594/5773 [55:56<16:19, 5.47s/it] {'loss': 0.5581, 'learning_rate': 5.039526881022494e-08, 'epoch': 0.97} 97%|█████████▋| 5594/5773 [55:56<16:19, 5.47s/it]{'loss': 0.5581, 'learning_rate': 5.039526881022494e-08, 'epoch': 0.97} 97%|█████████▋| 5594/5773 [55:54<16:19, 5.47s/it] 97%|█████████▋| 5595/5773 [55:59<16:14, 5.47s/it] 97%|█████████▋| 5595/5773 [56:02<16:14, 5.47s/it] {'loss': 0.5455, 'learning_rate': 4.9834232853803734e-08, 'epoch': 0.97} 97%|█████████▋| 5595/5773 [56:02<16:14, 5.47s/it] {'loss': 0.5455, 'learning_rate': 4.9834232853803734e-08, 'epoch': 0.97} 97%|█████████▋| 5595/5773 [55:59<16:14, 5.47s/it] 97%|█████████▋| 5596/5773 [56:05<16:08, 5.47s/it] 97%|█████████▋| 5596/5773 [56:07<16:08, 5.47s/it] {'loss': 0.5613, 'learning_rate': 4.927632952747119e-08, 'epoch': 0.97} 
97%|█████████▋| 5596/5773 [56:07<16:08, 5.47s/it]{'loss': 0.5613, 'learning_rate': 4.927632952747119e-08, 'epoch': 0.97} 97%|█████████▋| 5596/5773 [56:05<16:08, 5.47s/it] 97%|█████████▋| 5597/5773 [56:10<16:03, 5.47s/it] 97%|█████████▋| 5597/5773 [56:13<16:03, 5.47s/it] {'loss': 0.5537, 'learning_rate': 4.8721559006873473e-08, 'epoch': 0.97} 97%|█████████▋| 5597/5773 [56:13<16:03, 5.47s/it] {'loss': 0.5537, 'learning_rate': 4.8721559006873473e-08, 'epoch': 0.97} 97%|█████████▋| 5597/5773 [56:10<16:03, 5.47s/it] 97%|█████████▋| 5598/5773 [56:16<15:53, 5.45s/it] 97%|█████████▋| 5598/5773 [56:18<15:53, 5.45s/it] {'loss': 0.5544, 'learning_rate': 4.8169921466670875e-08, 'epoch': 0.97} 97%|█████████▋| 5598/5773 [56:18<15:53, 5.45s/it]{'loss': 0.5544, 'learning_rate': 4.8169921466670875e-08, 'epoch': 0.97} 97%|█████████▋| 5598/5773 [56:16<15:53, 5.45s/it] 97%|█████████▋| 5599/5773 [56:21<15:55, 5.49s/it] 97%|█████████▋| 5599/5773 [56:24<15:55, 5.49s/it] {'loss': 0.5577, 'learning_rate': 4.762141708053558e-08, 'epoch': 0.97} 97%|█████████▋| 5599/5773 [56:24<15:55, 5.49s/it]{'loss': 0.5577, 'learning_rate': 4.762141708053558e-08, 'epoch': 0.97} 97%|█████████▋| 5599/5773 [56:21<15:55, 5.49s/it] 0 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 97%|█████████▋| 5600/5773 [56:27<15:43, 5.45s/it] 9 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
97%|█████████▋| 5600/5773 [56:29<15:43, 5.45s/it]15 AutoResumeHook: Checking whether to suspend... {'loss': 0.5539, 'learning_rate': 4.70760460211539e-08, 'epoch': 0.97} 97%|█████████▋| 5600/5773 [56:29<15:43, 5.45s/it]{'loss': 0.5539, 'learning_rate': 4.70760460211539e-08, 'epoch': 0.97} 97%|█████████▋| 5600/5773 [56:27<15:43, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 97%|█████████▋| 5601/5773 [56:49<29:55, 10.44s/it] 97%|█████████▋| 5601/5773 [56:51<29:55, 10.44s/it] {'loss': 0.5606, 'learning_rate': 4.653380846022737e-08, 'epoch': 0.97} 97%|█████████▋| 5601/5773 [56:51<29:55, 10.44s/it] {'loss': 0.5606, 'learning_rate': 4.653380846022737e-08, 'epoch': 0.97} 97%|█████████▋| 5601/5773 [56:49<29:55, 10.44s/it] 97%|█████████▋| 5602/5773 [56:54<25:26, 8.92s/it] 97%|█████████▋| 5602/5773 [56:56<25:26, 8.92s/it] {'loss': 0.5545, 'learning_rate': 4.599470456846833e-08, 'epoch': 0.97} {'loss': 0.5545, 'learning_rate': 4.599470456846833e-08, 'epoch': 0.97} 97%|█████████▋| 5602/5773 [56:56<25:26, 8.92s/it] 97%|█████████▋| 5602/5773 [56:54<25:26, 8.92s/it] 97%|█████████▋| 5603/5773 [56:59<22:15, 7.85s/it] 97%|█████████▋| 5603/5773 [57:02<22:15, 7.85s/it] {'loss': 0.5583, 'learning_rate': 4.545873451560545e-08, 'epoch': 0.97} 97%|█████████▋| 5603/5773 [57:02<22:15, 7.85s/it] {'loss': 0.5583, 'learning_rate': 4.545873451560545e-08, 'epoch': 0.97} 97%|█████████▋| 5603/5773 [56:59<22:15, 7.85s/it] 97%|█████████▋| 5604/5773 [57:07<20:06, 7.14s/it] 97%|█████████▋| 5604/5773 [57:05<20:06, 7.14s/it] {'loss': 0.549, 'learning_rate': 4.492589847037709e-08, 'epoch': 0.97} 97%|█████████▋| 5604/5773 [57:07<20:06, 7.14s/it] {'loss': 0.549, 'learning_rate': 4.492589847037709e-08, 'epoch': 0.97} 97%|█████████▋| 5604/5773 [57:05<20:06, 7.14s/it] 97%|█████████▋| 5605/5773 [57:10<18:36, 6.65s/it] 97%|█████████▋| 5605/5773 [57:13<18:36, 6.65s/it] {'loss': 0.5667, 'learning_rate': 4.4396196600539066e-08, 'epoch': 0.97} 97%|█████████▋| 5605/5773 [57:13<18:36, 6.65s/it]{'loss': 0.5667, 'learning_rate': 4.4396196600539066e-08, 'epoch': 0.97} 97%|█████████▋| 5605/5773 [57:10<18:36, 6.65s/it] 97%|█████████▋| 5606/5773 [57:16<17:28, 6.28s/it] 97%|█████████▋| 5606/5773 [57:18<17:28, 6.28s/it] {'loss': 0.5808, 'learning_rate': 4.386962907285797e-08, 'epoch': 0.97} 97%|█████████▋| 5606/5773 [57:18<17:28, 6.28s/it] {'loss': 0.5808, 'learning_rate': 
4.386962907285797e-08, 'epoch': 0.97} 97%|█████████▋| 5606/5773 [57:16<17:28, 6.28s/it] 97%|█████████▋| 5607/5773 [57:21<16:40, 6.03s/it] 97%|█████████▋| 5607/5773 [57:24<16:40, 6.03s/it] {'loss': 0.5632, 'learning_rate': 4.334619605311341e-08, 'epoch': 0.97} 97%|█████████▋| 5607/5773 [57:24<16:40, 6.03s/it] {'loss': 0.5632, 'learning_rate': 4.334619605311341e-08, 'epoch': 0.97} 97%|█████████▋| 5607/5773 [57:21<16:40, 6.03s/it] 97%|█████████▋| 5608/5773 [57:27<16:09, 5.87s/it] 97%|█████████▋| 5608/5773 [57:29<16:09, 5.87s/it] {'loss': 0.5621, 'learning_rate': 4.2825897706100237e-08, 'epoch': 0.97} 97%|█████████▋| 5608/5773 [57:29<16:09, 5.87s/it]{'loss': 0.5621, 'learning_rate': 4.2825897706100237e-08, 'epoch': 0.97} 97%|█████████▋| 5608/5773 [57:27<16:09, 5.87s/it] 97%|█████████▋| 5609/5773 [57:32<15:41, 5.74s/it] 97%|█████████▋| 5609/5773 [57:35<15:41, 5.74s/it] {'loss': 0.5509, 'learning_rate': 4.230873419562298e-08, 'epoch': 0.97} 97%|█████████▋| 5609/5773 [57:35<15:41, 5.74s/it]{'loss': 0.5509, 'learning_rate': 4.230873419562298e-08, 'epoch': 0.97} 97%|█████████▋| 5609/5773 [57:32<15:41, 5.74s/it] 97%|█████████▋| 5610/5773 [57:38<15:31, 5.71s/it] 97%|█████████▋| 5610/5773 [57:40<15:31, 5.71s/it] {'loss': 0.5693, 'learning_rate': 4.1794705684501393e-08, 'epoch': 0.97} 97%|█████████▋| 5610/5773 [57:40<15:31, 5.71s/it] {'loss': 0.5693, 'learning_rate': 4.1794705684501393e-08, 'epoch': 0.97} 97%|█████████▋| 5610/5773 [57:38<15:31, 5.71s/it] 97%|█████████▋| 5611/5773 [57:43<15:15, 5.65s/it] 97%|█████████▋| 5611/5773 [57:46<15:15, 5.65s/it] {'loss': 0.5431, 'learning_rate': 4.1283812334570464e-08, 'epoch': 0.97} 97%|█████████▋| 5611/5773 [57:46<15:15, 5.65s/it] {'loss': 0.5431, 'learning_rate': 4.1283812334570464e-08, 'epoch': 0.97} 97%|█████████▋| 5611/5773 [57:43<15:15, 5.65s/it] 97%|█████████▋| 5612/5773 [57:49<14:59, 5.59s/it] 97%|█████████▋| 5612/5773 [57:51<14:59, 5.59s/it] {'loss': 0.5718, 'learning_rate': 4.077605430667375e-08, 'epoch': 0.97} 97%|█████████▋| 
5612/5773 [57:51<14:59, 5.59s/it] {'loss': 0.5718, 'learning_rate': 4.077605430667375e-08, 'epoch': 0.97} 97%|█████████▋| 5612/5773 [57:49<14:59, 5.59s/it] 97%|█████████▋| 5613/5773 [57:54<14:51, 5.57s/it] 97%|█████████▋| 5613/5773 [57:57<14:51, 5.57s/it] {'loss': 0.5431, 'learning_rate': 4.027143176067117e-08, 'epoch': 0.97} 97%|█████████▋| 5613/5773 [57:57<14:51, 5.57s/it] {'loss': 0.5431, 'learning_rate': 4.027143176067117e-08, 'epoch': 0.97} 97%|█████████▋| 5613/5773 [57:54<14:51, 5.57s/it] 97%|█████████▋| 5614/5773 [58:00<14:36, 5.51s/it] 97%|█████████▋| 5614/5773 [58:02<14:36, 5.51s/it] {'loss': 0.5642, 'learning_rate': 3.976994485543117e-08, 'epoch': 0.97} 97%|█████████▋| 5614/5773 [58:02<14:36, 5.51s/it] {'loss': 0.5642, 'learning_rate': 3.976994485543117e-08, 'epoch': 0.97} 97%|█████████▋| 5614/5773 [58:00<14:36, 5.51s/it] 97%|█████████▋| 5615/5773 [58:05<14:25, 5.48s/it] 97%|█████████▋| 5615/5773 [58:07<14:25, 5.48s/it] {'loss': 0.5466, 'learning_rate': 3.927159374884082e-08, 'epoch': 0.97} 97%|█████████▋| 5615/5773 [58:07<14:25, 5.48s/it] {'loss': 0.5466, 'learning_rate': 3.927159374884082e-08, 'epoch': 0.97} 97%|█████████▋| 5615/5773 [58:05<14:25, 5.48s/it] 97%|█████████▋| 5616/5773 [58:11<14:27, 5.53s/it] 97%|█████████▋| 5616/5773 [58:13<14:27, 5.53s/it] {'loss': 0.5571, 'learning_rate': 3.8776378597795703e-08, 'epoch': 0.97} 97%|█████████▋| 5616/5773 [58:13<14:27, 5.53s/it] {'loss': 0.5571, 'learning_rate': 3.8776378597795703e-08, 'epoch': 0.97} 97%|█████████▋| 5616/5773 [58:11<14:27, 5.53s/it] 97%|█████████▋| 5617/5773 [58:16<14:12, 5.47s/it] 97%|█████████▋| 5617/5773 [58:18<14:12, 5.47s/it] {'loss': 0.537, 'learning_rate': 3.8284299558205565e-08, 'epoch': 0.97} {'loss': 0.537, 'learning_rate': 3.8284299558205565e-08, 'epoch': 0.97} 97%|█████████▋| 5617/5773 [58:18<14:12, 5.47s/it] 97%|█████████▋| 5617/5773 [58:16<14:12, 5.47s/it] 97%|█████████▋| 5618/5773 [58:21<14:08, 5.48s/it] 97%|█████████▋| 5618/5773 [58:24<14:08, 5.48s/it] {'loss': 0.5473, 
'learning_rate': 3.779535678499202e-08, 'epoch': 0.97} 97%|█████████▋| 5618/5773 [58:24<14:08, 5.48s/it] {'loss': 0.5473, 'learning_rate': 3.779535678499202e-08, 'epoch': 0.97} 97%|█████████▋| 5618/5773 [58:21<14:08, 5.48s/it] 97%|█████████▋| 5619/5773 [58:27<14:04, 5.48s/it] 97%|█████████▋| 5619/5773 [58:29<14:04, 5.48s/it] {'loss': 0.5449, 'learning_rate': 3.730955043209083e-08, 'epoch': 0.97}{'loss': 0.5449, 'learning_rate': 3.730955043209083e-08, 'epoch': 0.97} 97%|█████████▋| 5619/5773 [58:29<14:04, 5.48s/it] 97%|█████████▋| 5619/5773 [58:27<14:04, 5.48s/it] 97%|█████████▋| 5620/5773 [58:32<13:58, 5.48s/it] 97%|█████████▋| 5620/5773 [58:35<13:58, 5.48s/it] {'loss': 0.562, 'learning_rate': 3.682688065244966e-08, 'epoch': 0.97} 97%|█████████▋| 5620/5773 [58:35<13:58, 5.48s/it]{'loss': 0.562, 'learning_rate': 3.682688065244966e-08, 'epoch': 0.97} 97%|█████████▋| 5620/5773 [58:32<13:58, 5.48s/it] 97%|█████████▋| 5621/5773 [58:38<13:50, 5.47s/it] 97%|█████████▋| 5621/5773 [58:40<13:50, 5.47s/it] {'loss': 0.5565, 'learning_rate': 3.6347347598026935e-08, 'epoch': 0.97} 97%|█████████▋| 5621/5773 [58:40<13:50, 5.47s/it] {'loss': 0.5565, 'learning_rate': 3.6347347598026935e-08, 'epoch': 0.97} 97%|█████████▋| 5621/5773 [58:38<13:50, 5.47s/it] 97%|█████████▋| 5622/5773 [58:43<13:43, 5.46s/it] 97%|█████████▋| 5622/5773 [58:46<13:43, 5.46s/it] {'loss': 0.5732, 'learning_rate': 3.5870951419795244e-08, 'epoch': 0.97} 97%|█████████▋| 5622/5773 [58:46<13:43, 5.46s/it] {'loss': 0.5732, 'learning_rate': 3.5870951419795244e-08, 'epoch': 0.97} 97%|█████████▋| 5622/5773 [58:43<13:43, 5.46s/it] 97%|█████████▋| 5623/5773 [58:49<13:45, 5.50s/it] 97%|█████████▋| 5623/5773 [58:51<13:45, 5.50s/it] {'loss': 0.5537, 'learning_rate': 3.5397692267740145e-08, 'epoch': 0.97} 97%|█████████▋| 5623/5773 [58:51<13:45, 5.50s/it] {'loss': 0.5537, 'learning_rate': 3.5397692267740145e-08, 'epoch': 0.97} 97%|█████████▋| 5623/5773 [58:49<13:45, 5.50s/it] 97%|█████████▋| 5624/5773 [58:54<13:32, 5.45s/it] 
97%|█████████▋| 5624/5773 [58:57<13:32, 5.45s/it] {'loss': 0.5567, 'learning_rate': 3.492757029085914e-08, 'epoch': 0.97} 97%|█████████▋| 5624/5773 [58:57<13:32, 5.45s/it] {'loss': 0.5567, 'learning_rate': 3.492757029085914e-08, 'epoch': 0.97} 97%|█████████▋| 5624/5773 [58:54<13:32, 5.45s/it] 97%|█████████▋| 5625/5773 [59:00<13:21, 5.42s/it] 97%|█████████▋| 5625/5773 [59:02<13:21, 5.42s/it] {'loss': 0.5574, 'learning_rate': 3.446058563716048e-08, 'epoch': 0.97} 97%|█████████▋| 5625/5773 [59:02<13:21, 5.42s/it]{'loss': 0.5574, 'learning_rate': 3.446058563716048e-08, 'epoch': 0.97} 97%|█████████▋| 5625/5773 [59:00<13:21, 5.42s/it] 97%|█████████▋| 5626/5773 [59:05<13:20, 5.45s/it] 97%|█████████▋| 5626/5773 [59:08<13:20, 5.45s/it] {'loss': 0.5534, 'learning_rate': 3.3996738453665466e-08, 'epoch': 0.97} 97%|█████████▋| 5626/5773 [59:08<13:20, 5.45s/it]{'loss': 0.5534, 'learning_rate': 3.3996738453665466e-08, 'epoch': 0.97} 97%|█████████▋| 5626/5773 [59:05<13:20, 5.45s/it] 97%|█████████▋| 5627/5773 [59:11<13:14, 5.44s/it] 97%|█████████▋| 5627/5773 [59:13<13:14, 5.44s/it] {'loss': 0.5538, 'learning_rate': 3.35360288864095e-08, 'epoch': 0.97} 97%|█████████▋| 5627/5773 [59:13<13:14, 5.44s/it] {'loss': 0.5538, 'learning_rate': 3.35360288864095e-08, 'epoch': 0.97} 97%|█████████▋| 5627/5773 [59:11<13:14, 5.44s/it] 97%|█████████▋| 5628/5773 [59:16<13:03, 5.40s/it] 97%|█████████▋| 5628/5773 [59:18<13:03, 5.40s/it] {'loss': 0.5525, 'learning_rate': 3.307845708043877e-08, 'epoch': 0.97} 97%|█████████▋| 5628/5773 [59:18<13:03, 5.40s/it] {'loss': 0.5525, 'learning_rate': 3.307845708043877e-08, 'epoch': 0.97} 97%|█████████▋| 5628/5773 [59:16<13:03, 5.40s/it] 98%|█████████▊| 5629/5773 [59:21<13:01, 5.42s/it] 98%|█████████▊| 5629/5773 [59:24<13:01, 5.42s/it] {'loss': 0.5513, 'learning_rate': 3.262402317980917e-08, 'epoch': 0.98} 98%|█████████▊| 5629/5773 [59:24<13:01, 5.42s/it]{'loss': 0.5513, 'learning_rate': 3.262402317980917e-08, 'epoch': 0.98} 98%|█████████▊| 5629/5773 
[59:21<13:01, 5.42s/it] 98%|█████████▊| 5630/5773 [59:27<12:55, 5.42s/it] 98%|█████████▊| 5630/5773 [59:29<12:55, 5.42s/it] {'loss': 0.5668, 'learning_rate': 3.2172727327594024e-08, 'epoch': 0.98} 98%|█████████▊| 5630/5773 [59:29<12:55, 5.42s/it] {'loss': 0.5668, 'learning_rate': 3.2172727327594024e-08, 'epoch': 0.98} 98%|█████████▊| 5630/5773 [59:27<12:55, 5.42s/it] 98%|█████████▊| 5631/5773 [59:32<12:56, 5.47s/it] 98%|█████████▊| 5631/5773 [59:35<12:56, 5.47s/it] {'loss': 0.5587, 'learning_rate': 3.172456966587301e-08, 'epoch': 0.98} 98%|█████████▊| 5631/5773 [59:35<12:56, 5.47s/it]{'loss': 0.5587, 'learning_rate': 3.172456966587301e-08, 'epoch': 0.98} 98%|█████████▊| 5631/5773 [59:32<12:56, 5.47s/it] 98%|█████████▊| 5632/5773 [59:38<12:49, 5.46s/it] 98%|█████████▊| 5632/5773 [59:40<12:49, 5.46s/it] {'loss': 0.56, 'learning_rate': 3.127955033574215e-08, 'epoch': 0.98} 98%|█████████▊| 5632/5773 [59:40<12:49, 5.46s/it] {'loss': 0.56, 'learning_rate': 3.127955033574215e-08, 'epoch': 0.98} 98%|█████████▊| 5632/5773 [59:38<12:49, 5.46s/it] 98%|█████████▊| 5633/5773 [59:43<12:40, 5.44s/it] 98%|█████████▊| 5633/5773 [59:46<12:40, 5.44s/it] {'loss': 0.5435, 'learning_rate': 3.0837669477307154e-08, 'epoch': 0.98} 98%|█████████▊| 5633/5773 [59:46<12:40, 5.44s/it]{'loss': 0.5435, 'learning_rate': 3.0837669477307154e-08, 'epoch': 0.98} 98%|█████████▊| 5633/5773 [59:43<12:40, 5.44s/it] 98%|█████████▊| 5634/5773 [59:49<12:34, 5.43s/it] 98%|█████████▊| 5634/5773 [59:51<12:34, 5.43s/it] {'loss': 0.5688, 'learning_rate': 3.0398927229686734e-08, 'epoch': 0.98} 98%|█████████▊| 5634/5773 [59:51<12:34, 5.43s/it] {'loss': 0.5688, 'learning_rate': 3.0398927229686734e-08, 'epoch': 0.98} 98%|█████████▊| 5634/5773 [59:49<12:34, 5.43s/it] 98%|█████████▊| 5635/5773 [59:54<12:32, 5.45s/it] 98%|█████████▊| 5635/5773 [59:56<12:32, 5.45s/it] {'loss': 0.5549, 'learning_rate': 2.9963323731010406e-08, 'epoch': 0.98} 98%|█████████▊| 5635/5773 [59:56<12:32, 5.45s/it] {'loss': 0.5549, 
'learning_rate': 2.9963323731010406e-08, 'epoch': 0.98} 98%|█████████▊| 5635/5773 [59:54<12:32, 5.45s/it] 98%|█████████▊| 5636/5773 [1:00:00<12:28, 5.46s/it] 98%|█████████▊| 5636/5773 [1:00:02<12:28, 5.46s/it] {'loss': 0.5582, 'learning_rate': 2.9530859118419573e-08, 'epoch': 0.98} 98%|█████████▊| 5636/5773 [1:00:02<12:28, 5.46s/it]{'loss': 0.5582, 'learning_rate': 2.9530859118419573e-08, 'epoch': 0.98} 98%|█████████▊| 5636/5773 [1:00:00<12:28, 5.46s/it] 98%|█████████▊| 5637/5773 [1:00:05<12:22, 5.46s/it] 98%|█████████▊| 5637/5773 [1:00:07<12:22, 5.46s/it] {'loss': 0.5489, 'learning_rate': 2.910153352806977e-08, 'epoch': 0.98} 98%|█████████▊| 5637/5773 [1:00:07<12:22, 5.46s/it] {'loss': 0.5489, 'learning_rate': 2.910153352806977e-08, 'epoch': 0.98} 98%|█████████▊| 5637/5773 [1:00:05<12:22, 5.46s/it] 98%|█████████▊| 5638/5773 [1:00:10<12:16, 5.46s/it] 98%|█████████▊| 5638/5773 [1:00:13<12:16, 5.46s/it] {'loss': 0.5447, 'learning_rate': 2.8675347095125094e-08, 'epoch': 0.98} 98%|█████████▊| 5638/5773 [1:00:13<12:16, 5.46s/it] {'loss': 0.5447, 'learning_rate': 2.8675347095125094e-08, 'epoch': 0.98} 98%|█████████▊| 5638/5773 [1:00:10<12:16, 5.46s/it] 98%|█████████▊| 5639/5773 [1:00:16<12:12, 5.47s/it] 98%|█████████▊| 5639/5773 [1:00:18<12:12, 5.47s/it] {'loss': 0.5582, 'learning_rate': 2.8252299953761554e-08, 'epoch': 0.98} 98%|█████████▊| 5639/5773 [1:00:18<12:12, 5.47s/it]{'loss': 0.5582, 'learning_rate': 2.8252299953761554e-08, 'epoch': 0.98} 98%|█████████▊| 5639/5773 [1:00:16<12:12, 5.47s/it] 98%|█████████▊| 5640/5773 [1:00:21<12:07, 5.47s/it] 98%|█████████▊| 5640/5773 [1:00:24<12:07, 5.47s/it] {'loss': 0.5391, 'learning_rate': 2.783239223717038e-08, 'epoch': 0.98} 98%|█████████▊| 5640/5773 [1:00:24<12:07, 5.47s/it] {'loss': 0.5391, 'learning_rate': 2.783239223717038e-08, 'epoch': 0.98} 98%|█████████▊| 5640/5773 [1:00:21<12:07, 5.47s/it] 98%|█████████▊| 5641/5773 [1:00:27<12:02, 5.47s/it] 98%|█████████▊| 5641/5773 [1:00:29<12:02, 5.47s/it] {'loss': 0.5654, 
'learning_rate': 2.7415624077551383e-08, 'epoch': 0.98} 98%|█████████▊| 5641/5773 [1:00:29<12:02, 5.47s/it] {'loss': 0.5654, 'learning_rate': 2.7415624077551383e-08, 'epoch': 0.98} 98%|█████████▊| 5641/5773 [1:00:27<12:02, 5.47s/it] 98%|█████████▊| 5642/5773 [1:00:32<11:51, 5.43s/it] 98%|█████████▊| 5642/5773 [1:00:35<11:51, 5.43s/it] {'loss': 0.547, 'learning_rate': 2.700199560611405e-08, 'epoch': 0.98} 98%|█████████▊| 5642/5773 [1:00:35<11:51, 5.43s/it] {'loss': 0.547, 'learning_rate': 2.700199560611405e-08, 'epoch': 0.98} 98%|█████████▊| 5642/5773 [1:00:32<11:51, 5.43s/it] 98%|█████████▊| 5643/5773 [1:00:38<11:42, 5.40s/it] 98%|█████████▊| 5643/5773 [1:00:40<11:41, 5.40s/it] {'loss': 0.5578, 'learning_rate': 2.659150695308421e-08, 'epoch': 0.98} 98%|█████████▊| 5643/5773 [1:00:40<11:41, 5.40s/it] {'loss': 0.5578, 'learning_rate': 2.659150695308421e-08, 'epoch': 0.98} 98%|█████████▊| 5643/5773 [1:00:38<11:42, 5.40s/it] 98%|█████████▊| 5644/5773 [1:00:43<11:34, 5.39s/it] 98%|█████████▊| 5644/5773 [1:00:45<11:34, 5.39s/it] {'loss': 0.5637, 'learning_rate': 2.6184158247697377e-08, 'epoch': 0.98} 98%|█████████▊| 5644/5773 [1:00:45<11:34, 5.39s/it] {'loss': 0.5637, 'learning_rate': 2.6184158247697377e-08, 'epoch': 0.98} 98%|█████████▊| 5644/5773 [1:00:43<11:34, 5.39s/it] 98%|█████████▊| 5645/5773 [1:00:48<11:28, 5.38s/it] 98%|█████████▊| 5645/5773 [1:00:51<11:28, 5.38s/it] {'loss': 0.5561, 'learning_rate': 2.577994961819652e-08, 'epoch': 0.98} 98%|█████████▊| 5645/5773 [1:00:51<11:28, 5.38s/it] {'loss': 0.5561, 'learning_rate': 2.577994961819652e-08, 'epoch': 0.98} 98%|█████████▊| 5645/5773 [1:00:48<11:28, 5.38s/it] 98%|█████████▊| 5646/5773 [1:00:54<11:29, 5.43s/it] 98%|█████████▊| 5646/5773 [1:00:56<11:29, 5.43s/it] {'loss': 0.5472, 'learning_rate': 2.5378881191843176e-08, 'epoch': 0.98} 98%|█████████▊| 5646/5773 [1:00:56<11:29, 5.43s/it]{'loss': 0.5472, 'learning_rate': 2.5378881191843176e-08, 'epoch': 0.98} 98%|█████████▊| 5646/5773 [1:00:54<11:29, 5.43s/it] 
98%|█████████▊| 5647/5773 [1:00:59<11:20, 5.40s/it] 98%|█████████▊| 5647/5773 [1:01:02<11:20, 5.40s/it] {'loss': 0.562, 'learning_rate': 2.4980953094905226e-08, 'epoch': 0.98} 98%|█████████▊| 5647/5773 [1:01:02<11:20, 5.40s/it] {'loss': 0.562, 'learning_rate': 2.4980953094905226e-08, 'epoch': 0.98} 98%|█████████▊| 5647/5773 [1:00:59<11:20, 5.40s/it] 98%|█████████▊| 5648/5773 [1:01:05<11:17, 5.42s/it] 98%|█████████▊| 5648/5773 [1:01:07<11:17, 5.42s/it] {'loss': 0.5511, 'learning_rate': 2.458616545266246e-08, 'epoch': 0.98} 98%|█████████▊| 5648/5773 [1:01:07<11:17, 5.42s/it] {'loss': 0.5511, 'learning_rate': 2.458616545266246e-08, 'epoch': 0.98} 98%|█████████▊| 5648/5773 [1:01:05<11:17, 5.42s/it] 98%|█████████▊| 5649/5773 [1:01:10<11:10, 5.41s/it] 98%|█████████▊| 5649/5773 [1:01:12<11:10, 5.41s/it] {'loss': 0.5586, 'learning_rate': 2.4194518389406564e-08, 'epoch': 0.98} 98%|█████████▊| 5649/5773 [1:01:12<11:10, 5.41s/it] {'loss': 0.5586, 'learning_rate': 2.4194518389406564e-08, 'epoch': 0.98} 98%|█████████▊| 5649/5773 [1:01:10<11:10, 5.41s/it]11 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 10 AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 9 AutoResumeHook: Checking whether to suspend... 98%|█████████▊| 5650/5773 [1:01:18<11:00, 5.37s/it]024 AutoResumeHook: Checking whether to suspend...5 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 98%|█████████▊| 5650/5773 [1:01:15<11:00, 5.37s/it]6 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5708, 'learning_rate': 2.3806012028442237e-08, 'epoch': 0.98} 98%|█████████▊| 5650/5773 [1:01:18<11:00, 5.37s/it] {'loss': 0.5708, 'learning_rate': 2.3806012028442237e-08, 'epoch': 0.98} 98%|█████████▊| 5650/5773 [1:01:15<11:00, 5.37s/it] 98%|█████████▊| 5651/5773 [1:01:21<10:56, 5.38s/it] 98%|█████████▊| 5651/5773 [1:01:23<10:56, 5.38s/it] {'loss': 0.5531, 'learning_rate': 2.3420646492081645e-08, 'epoch': 0.98} 98%|█████████▊| 5651/5773 [1:01:23<10:56, 5.38s/it]{'loss': 0.5531, 'learning_rate': 2.3420646492081645e-08, 'epoch': 0.98} 98%|█████████▊| 5651/5773 [1:01:21<10:56, 5.38s/it] 98%|█████████▊| 5652/5773 [1:01:29<10:58, 5.44s/it] 98%|█████████▊| 5652/5773 [1:01:26<10:58, 5.44s/it] {'loss': 0.5686, 'learning_rate': 2.3038421901651064e-08, 'epoch': 0.98} 98%|█████████▊| 5652/5773 [1:01:29<10:58, 5.44s/it] {'loss': 0.5686, 'learning_rate': 2.3038421901651064e-08, 'epoch': 0.98} 98%|█████████▊| 5652/5773 [1:01:26<10:58, 5.44s/it] 98%|█████████▊| 5653/5773 [1:01:32<10:55, 5.46s/it] 98%|█████████▊| 5653/5773 [1:01:34<10:55, 5.46s/it] {'loss': 0.5655, 'learning_rate': 2.265933837748757e-08, 'epoch': 0.98} 98%|█████████▊| 5653/5773 [1:01:34<10:55, 5.46s/it]{'loss': 0.5655, 'learning_rate': 2.265933837748757e-08, 'epoch': 0.98} 98%|█████████▊| 5653/5773 [1:01:32<10:55, 5.46s/it] 98%|█████████▊| 5654/5773 [1:01:40<10:48, 5.45s/it] 98%|█████████▊| 5654/5773 [1:01:37<10:48, 5.45s/it] {'loss': 0.5478, 'learning_rate': 2.228339603893792e-08, 'epoch': 0.98} 98%|█████████▊| 5654/5773 [1:01:40<10:48, 5.45s/it] {'loss': 0.5478, 'learning_rate': 2.228339603893792e-08, 'epoch': 0.98} 98%|█████████▊| 5654/5773 [1:01:37<10:48, 5.45s/it] 98%|█████████▊| 5655/5773 [1:01:43<10:41, 5.43s/it] 98%|█████████▊| 5655/5773 [1:01:45<10:41, 5.43s/it] {'loss': 0.559, 'learning_rate': 2.1910595004360767e-08, 'epoch': 0.98} 98%|█████████▊| 5655/5773 [1:01:45<10:41, 5.43s/it] {'loss': 0.559, 'learning_rate': 2.1910595004360767e-08, 'epoch': 0.98} 98%|█████████▊| 5655/5773 
[1:01:43<10:41, 5.43s/it] 98%|█████████▊| 5656/5773 [1:01:48<10:32, 5.41s/it] 98%|█████████▊| 5656/5773 [1:01:50<10:32, 5.41s/it] {'loss': 0.5523, 'learning_rate': 2.154093539112667e-08, 'epoch': 0.98} 98%|█████████▊| 5656/5773 [1:01:50<10:32, 5.41s/it]{'loss': 0.5523, 'learning_rate': 2.154093539112667e-08, 'epoch': 0.98} 98%|█████████▊| 5656/5773 [1:01:48<10:32, 5.41s/it] 98%|█████████▊| 5657/5773 [1:01:53<10:29, 5.43s/it] 98%|█████████▊| 5657/5773 [1:01:56<10:29, 5.43s/it] {'loss': 0.5524, 'learning_rate': 2.117441731561587e-08, 'epoch': 0.98} {'loss': 0.5524, 'learning_rate': 2.117441731561587e-08, 'epoch': 0.98} 98%|█████████▊| 5657/5773 [1:01:56<10:29, 5.43s/it] 98%|█████████▊| 5657/5773 [1:01:53<10:29, 5.43s/it] 98%|█████████▊| 5658/5773 [1:01:59<10:23, 5.42s/it] 98%|█████████▊| 5658/5773 [1:02:01<10:23, 5.42s/it] {'loss': 0.5567, 'learning_rate': 2.0811040893218282e-08, 'epoch': 0.98} 98%|█████████▊| 5658/5773 [1:02:01<10:23, 5.42s/it]{'loss': 0.5567, 'learning_rate': 2.0811040893218282e-08, 'epoch': 0.98} 98%|█████████▊| 5658/5773 [1:01:59<10:23, 5.42s/it] 98%|█████████▊| 5659/5773 [1:02:06<10:12, 5.37s/it] 98%|█████████▊| 5659/5773 [1:02:04<10:12, 5.37s/it]{'loss': 0.5582, 'learning_rate': 2.0450806238340172e-08, 'epoch': 0.98} {'loss': 0.5582, 'learning_rate': 2.0450806238340172e-08, 'epoch': 0.98} 98%|█████████▊| 5659/5773 [1:02:06<10:12, 5.37s/it] 98%|█████████▊| 5659/5773 [1:02:04<10:12, 5.37s/it] 98%|█████████▊| 5660/5773 [1:02:10<10:08, 5.39s/it] 98%|█████████▊| 5660/5773 [1:02:12<10:08, 5.39s/it] {'loss': 0.5822, 'learning_rate': 2.009371346439082e-08, 'epoch': 0.98} 98%|█████████▊| 5660/5773 [1:02:12<10:08, 5.39s/it]{'loss': 0.5822, 'learning_rate': 2.009371346439082e-08, 'epoch': 0.98} 98%|█████████▊| 5660/5773 [1:02:10<10:08, 5.39s/it] 98%|█████████▊| 5661/5773 [1:02:15<10:00, 5.36s/it] 98%|█████████▊| 5661/5773 [1:02:17<10:00, 5.36s/it] {'loss': 0.5557, 'learning_rate': 1.973976268379696e-08, 'epoch': 0.98} 98%|█████████▊| 5661/5773 
[1:02:17<10:00, 5.36s/it]{'loss': 0.5557, 'learning_rate': 1.973976268379696e-08, 'epoch': 0.98} 98%|█████████▊| 5661/5773 [1:02:15<10:00, 5.36s/it] 98%|█████████▊| 5662/5773 [1:02:20<09:53, 5.35s/it] 98%|█████████▊| 5662/5773 [1:02:23<09:53, 5.35s/it] {'loss': 0.5457, 'learning_rate': 1.938895400799279e-08, 'epoch': 0.98} 98%|█████████▊| 5662/5773 [1:02:23<09:53, 5.35s/it] {'loss': 0.5457, 'learning_rate': 1.938895400799279e-08, 'epoch': 0.98} 98%|█████████▊| 5662/5773 [1:02:20<09:53, 5.35s/it] 98%|█████████▊| 5663/5773 [1:02:25<09:48, 5.35s/it] 98%|█████████▊| 5663/5773 [1:02:28<09:48, 5.35s/it] {'loss': 0.5589, 'learning_rate': 1.9041287547424404e-08, 'epoch': 0.98} 98%|█████████▊| 5663/5773 [1:02:28<09:48, 5.35s/it]{'loss': 0.5589, 'learning_rate': 1.9041287547424404e-08, 'epoch': 0.98} 98%|█████████▊| 5663/5773 [1:02:25<09:48, 5.35s/it] 98%|█████████▊| 5664/5773 [1:02:31<09:49, 5.41s/it] 98%|█████████▊| 5664/5773 [1:02:33<09:49, 5.41s/it] {'loss': 0.558, 'learning_rate': 1.869676341154869e-08, 'epoch': 0.98} 98%|█████████▊| 5664/5773 [1:02:33<09:49, 5.41s/it] {'loss': 0.558, 'learning_rate': 1.869676341154869e-08, 'epoch': 0.98} 98%|█████████▊| 5664/5773 [1:02:31<09:49, 5.41s/it] 98%|█████████▊| 5665/5773 [1:02:36<09:46, 5.43s/it] 98%|█████████▊| 5665/5773 [1:02:39<09:46, 5.43s/it] {'loss': 0.5456, 'learning_rate': 1.835538170883111e-08, 'epoch': 0.98} 98%|█████████▊| 5665/5773 [1:02:39<09:46, 5.43s/it] {'loss': 0.5456, 'learning_rate': 1.835538170883111e-08, 'epoch': 0.98} 98%|█████████▊| 5665/5773 [1:02:36<09:46, 5.43s/it] 98%|█████████▊| 5666/5773 [1:02:44<09:44, 5.47s/it] 98%|█████████▊| 5666/5773 [1:02:42<09:44, 5.47s/it] {'loss': 0.5572, 'learning_rate': 1.8017142546752353e-08, 'epoch': 0.98} 98%|█████████▊| 5666/5773 [1:02:44<09:44, 5.47s/it] {'loss': 0.5572, 'learning_rate': 1.8017142546752353e-08, 'epoch': 0.98} 98%|█████████▊| 5666/5773 [1:02:42<09:44, 5.47s/it] 98%|█████████▊| 5667/5773 [1:02:47<09:37, 5.45s/it] 98%|█████████▊| 5667/5773 
[1:02:50<09:37, 5.45s/it] {'loss': 0.5497, 'learning_rate': 1.768204603179835e-08, 'epoch': 0.98} 98%|█████████▊| 5667/5773 [1:02:50<09:37, 5.45s/it] {'loss': 0.5497, 'learning_rate': 1.768204603179835e-08, 'epoch': 0.98} 98%|█████████▊| 5667/5773 [1:02:47<09:37, 5.45s/it] 98%|█████████▊| 5668/5773 [1:02:53<09:30, 5.43s/it] 98%|█████████▊| 5668/5773 [1:02:55<09:30, 5.43s/it] {'loss': 0.5514, 'learning_rate': 1.7350092269469155e-08, 'epoch': 0.98} {'loss': 0.5514, 'learning_rate': 1.7350092269469155e-08, 'epoch': 0.98} 98%|█████████▊| 5668/5773 [1:02:55<09:30, 5.43s/it] 98%|█████████▊| 5668/5773 [1:02:53<09:30, 5.43s/it] 98%|█████████▊| 5669/5773 [1:02:58<09:24, 5.42s/it] 98%|█████████▊| 5669/5773 [1:03:01<09:24, 5.42s/it] {'loss': 0.5598, 'learning_rate': 1.70212813642745e-08, 'epoch': 0.98} 98%|█████████▊| 5669/5773 [1:03:01<09:24, 5.42s/it]{'loss': 0.5598, 'learning_rate': 1.70212813642745e-08, 'epoch': 0.98} 98%|█████████▊| 5669/5773 [1:02:58<09:24, 5.42s/it] 98%|█████████▊| 5670/5773 [1:03:04<09:17, 5.41s/it] 98%|█████████▊| 5670/5773 [1:03:06<09:17, 5.41s/it] {'loss': 0.5519, 'learning_rate': 1.66956134197338e-08, 'epoch': 0.98} 98%|█████████▊| 5670/5773 [1:03:06<09:17, 5.41s/it] {'loss': 0.5519, 'learning_rate': 1.66956134197338e-08, 'epoch': 0.98} 98%|█████████▊| 5670/5773 [1:03:04<09:17, 5.41s/it] 98%|█████████▊| 5671/5773 [1:03:09<09:14, 5.44s/it] 98%|█████████▊| 5671/5773 [1:03:12<09:14, 5.44s/it] {'loss': 0.5648, 'learning_rate': 1.637308853837949e-08, 'epoch': 0.98} 98%|█████████▊| 5671/5773 [1:03:12<09:14, 5.44s/it] {'loss': 0.5648, 'learning_rate': 1.637308853837949e-08, 'epoch': 0.98} 98%|█████████▊| 5671/5773 [1:03:09<09:14, 5.44s/it] 98%|█████████▊| 5672/5773 [1:03:15<09:11, 5.46s/it] 98%|█████████▊| 5672/5773 [1:03:17<09:11, 5.46s/it] {'loss': 0.5503, 'learning_rate': 1.6053706821750336e-08, 'epoch': 0.98} 98%|█████████▊| 5672/5773 [1:03:17<09:11, 5.46s/it] {'loss': 0.5503, 'learning_rate': 1.6053706821750336e-08, 'epoch': 0.98} 98%|█████████▊| 
5672/5773 [1:03:15<09:11, 5.46s/it] 98%|█████████▊| 5673/5773 [1:03:20<09:00, 5.41s/it] 98%|█████████▊| 5673/5773 [1:03:22<09:00, 5.41s/it] {'loss': 0.5526, 'learning_rate': 1.5737468370400355e-08, 'epoch': 0.98} 98%|█████████▊| 5673/5773 [1:03:20<09:00, 5.41s/it] {'loss': 0.5526, 'learning_rate': 1.5737468370400355e-08, 'epoch': 0.98} 98%|█████████▊| 5673/5773 [1:03:22<09:00, 5.41s/it] 98%|█████████▊| 5674/5773 [1:03:25<08:53, 5.39s/it] 98%|█████████▊| 5674/5773 [1:03:28<08:53, 5.39s/it] {'loss': 0.5541, 'learning_rate': 1.5424373283889904e-08, 'epoch': 0.98} 98%|█████████▊| 5674/5773 [1:03:28<08:53, 5.39s/it] {'loss': 0.5541, 'learning_rate': 1.5424373283889904e-08, 'epoch': 0.98} 98%|█████████▊| 5674/5773 [1:03:25<08:53, 5.39s/it] 98%|█████████▊| 5675/5773 [1:03:31<08:48, 5.39s/it] 98%|█████████▊| 5675/5773 [1:03:33<08:48, 5.39s/it] {'loss': 0.5417, 'learning_rate': 1.5114421660791245e-08, 'epoch': 0.98} 98%|█████████▊| 5675/5773 [1:03:33<08:48, 5.39s/it]{'loss': 0.5417, 'learning_rate': 1.5114421660791245e-08, 'epoch': 0.98} 98%|█████████▊| 5675/5773 [1:03:31<08:48, 5.39s/it] 98%|█████████▊| 5676/5773 [1:03:36<08:44, 5.41s/it] 98%|█████████▊| 5676/5773 [1:03:39<08:45, 5.41s/it] {'loss': 0.5549, 'learning_rate': 1.4807613598688542e-08, 'epoch': 0.98} 98%|█████████▊| 5676/5773 [1:03:39<08:45, 5.41s/it]{'loss': 0.5549, 'learning_rate': 1.4807613598688542e-08, 'epoch': 0.98} 98%|█████████▊| 5676/5773 [1:03:36<08:44, 5.41s/it] 98%|█████████▊| 5677/5773 [1:03:42<08:44, 5.46s/it] 98%|█████████▊| 5677/5773 [1:03:44<08:44, 5.46s/it] {'loss': 0.59, 'learning_rate': 1.4503949194173417e-08, 'epoch': 0.98} {'loss': 0.59, 'learning_rate': 1.4503949194173417e-08, 'epoch': 0.98} 98%|█████████▊| 5677/5773 [1:03:44<08:44, 5.46s/it] 98%|█████████▊| 5677/5773 [1:03:42<08:44, 5.46s/it] 98%|█████████▊| 5678/5773 [1:03:47<08:40, 5.48s/it] 98%|█████████▊| 5678/5773 [1:03:50<08:40, 5.48s/it] {'loss': 0.5733, 'learning_rate': 1.4203428542849396e-08, 'epoch': 0.98} 98%|█████████▊| 
5678/5773 [1:03:50<08:40, 5.48s/it]{'loss': 0.5733, 'learning_rate': 1.4203428542849396e-08, 'epoch': 0.98} 98%|█████████▊| 5678/5773 [1:03:47<08:40, 5.48s/it] 98%|█████████▊| 5679/5773 [1:03:53<08:30, 5.43s/it] 98%|█████████▊| 5679/5773 [1:03:55<08:30, 5.43s/it] {'loss': 0.5703, 'learning_rate': 1.3906051739329684e-08, 'epoch': 0.98} {'loss': 0.5703, 'learning_rate': 1.3906051739329684e-08, 'epoch': 0.98} 98%|█████████▊| 5679/5773 [1:03:55<08:30, 5.43s/it] 98%|█████████▊| 5679/5773 [1:03:53<08:30, 5.43s/it] 98%|█████████▊| 5680/5773 [1:03:58<08:25, 5.44s/it] 98%|█████████▊| 5680/5773 [1:04:00<08:25, 5.44s/it] {'loss': 0.551, 'learning_rate': 1.361181887723939e-08, 'epoch': 0.98} 98%|█████████▊| 5680/5773 [1:04:00<08:25, 5.44s/it] {'loss': 0.551, 'learning_rate': 1.361181887723939e-08, 'epoch': 0.98} 98%|█████████▊| 5680/5773 [1:03:58<08:25, 5.44s/it] 98%|█████████▊| 5681/5773 [1:04:06<08:32, 5.57s/it] 98%|█████████▊| 5681/5773 [1:04:04<08:32, 5.57s/it] {'loss': 0.5662, 'learning_rate': 1.3320730049209973e-08, 'epoch': 0.98} 98%|█████████▊| 5681/5773 [1:04:06<08:32, 5.57s/it] {'loss': 0.5662, 'learning_rate': 1.3320730049209973e-08, 'epoch': 0.98} 98%|█████████▊| 5681/5773 [1:04:04<08:32, 5.57s/it] 98%|█████████▊| 5682/5773 [1:04:09<08:18, 5.48s/it] 98%|█████████▊| 5682/5773 [1:04:12<08:18, 5.48s/it] {'loss': 0.5664, 'learning_rate': 1.3032785346888122e-08, 'epoch': 0.98} 98%|█████████▊| 5682/5773 [1:04:12<08:18, 5.48s/it] {'loss': 0.5664, 'learning_rate': 1.3032785346888122e-08, 'epoch': 0.98} 98%|█████████▊| 5682/5773 [1:04:09<08:18, 5.48s/it] 98%|█████████▊| 5683/5773 [1:04:15<08:17, 5.53s/it] 98%|█████████▊| 5683/5773 [1:04:17<08:17, 5.53s/it] {'loss': 0.5749, 'learning_rate': 1.2747984860926876e-08, 'epoch': 0.98} 98%|█████████▊| 5683/5773 [1:04:17<08:17, 5.53s/it] {'loss': 0.5749, 'learning_rate': 1.2747984860926876e-08, 'epoch': 0.98} 98%|█████████▊| 5683/5773 [1:04:15<08:17, 5.53s/it] 98%|█████████▊| 5684/5773 [1:04:20<08:05, 5.45s/it] 98%|█████████▊| 
5684/5773 [1:04:22<08:05, 5.45s/it] {'loss': 0.5705, 'learning_rate': 1.2466328680990069e-08, 'epoch': 0.98} 98%|█████████▊| 5684/5773 [1:04:23<08:05, 5.45s/it]{'loss': 0.5705, 'learning_rate': 1.2466328680990069e-08, 'epoch': 0.98} 98%|█████████▊| 5684/5773 [1:04:20<08:05, 5.45s/it] 98%|█████████▊| 5685/5773 [1:04:26<08:02, 5.48s/it] 98%|█████████▊| 5685/5773 [1:04:28<08:02, 5.48s/it] {'loss': 0.5768, 'learning_rate': 1.2187816895752324e-08, 'epoch': 0.98} 98%|█████████▊| 5685/5773 [1:04:28<08:02, 5.48s/it]{'loss': 0.5768, 'learning_rate': 1.2187816895752324e-08, 'epoch': 0.98} 98%|█████████▊| 5685/5773 [1:04:26<08:02, 5.48s/it] 98%|█████████▊| 5686/5773 [1:04:31<07:52, 5.43s/it] 98%|█████████▊| 5686/5773 [1:04:33<07:52, 5.43s/it] {'loss': 0.5415, 'learning_rate': 1.1912449592899055e-08, 'epoch': 0.98} 98%|█████████▊| 5686/5773 [1:04:33<07:52, 5.43s/it] {'loss': 0.5415, 'learning_rate': 1.1912449592899055e-08, 'epoch': 0.98} 98%|█████████▊| 5686/5773 [1:04:31<07:52, 5.43s/it] 99%|█████████▊| 5687/5773 [1:04:36<07:44, 5.41s/it] 99%|█████████▊| 5687/5773 [1:04:39<07:44, 5.41s/it] {'loss': 0.5619, 'learning_rate': 1.1640226859123138e-08, 'epoch': 0.99} 99%|█████████▊| 5687/5773 [1:04:39<07:44, 5.41s/it] {'loss': 0.5619, 'learning_rate': 1.1640226859123138e-08, 'epoch': 0.99} 99%|█████████▊| 5687/5773 [1:04:36<07:44, 5.41s/it] 99%|█████████▊| 5688/5773 [1:04:42<07:47, 5.50s/it] 99%|█████████▊| 5688/5773 [1:04:44<07:47, 5.50s/it] {'loss': 0.5375, 'learning_rate': 1.1371148780130459e-08, 'epoch': 0.99} 99%|█████████▊| 5688/5773 [1:04:44<07:47, 5.50s/it] {'loss': 0.5375, 'learning_rate': 1.1371148780130459e-08, 'epoch': 0.99} 99%|█████████▊| 5688/5773 [1:04:42<07:47, 5.50s/it] 99%|█████████▊| 5689/5773 [1:04:47<07:39, 5.47s/it] 99%|█████████▊| 5689/5773 [1:04:50<07:39, 5.47s/it] {'loss': 0.5651, 'learning_rate': 1.1105215440634364e-08, 'epoch': 0.99} 99%|█████████▊| 5689/5773 [1:04:50<07:39, 5.47s/it]{'loss': 0.5651, 'learning_rate': 1.1105215440634364e-08, 'epoch': 
0.99} 99%|█████████▊| 5689/5773 [1:04:47<07:39, 5.47s/it] 99%|█████████▊| 5690/5773 [1:04:53<07:32, 5.45s/it] 99%|█████████▊| 5690/5773 [1:04:55<07:32, 5.45s/it] {'loss': 0.5548, 'learning_rate': 1.0842426924358996e-08, 'epoch': 0.99} 99%|█████████▊| 5690/5773 [1:04:55<07:32, 5.45s/it] {'loss': 0.5548, 'learning_rate': 1.0842426924358996e-08, 'epoch': 0.99} 99%|█████████▊| 5690/5773 [1:04:53<07:32, 5.45s/it] 99%|█████████▊| 5691/5773 [1:04:58<07:25, 5.43s/it] 99%|█████████▊| 5691/5773 [1:05:01<07:25, 5.43s/it] {'loss': 0.554, 'learning_rate': 1.058278331403928e-08, 'epoch': 0.99} 99%|█████████▊| 5691/5773 [1:05:01<07:25, 5.43s/it] {'loss': 0.554, 'learning_rate': 1.058278331403928e-08, 'epoch': 0.99} 99%|█████████▊| 5691/5773 [1:04:58<07:25, 5.43s/it] 99%|█████████▊| 5692/5773 [1:05:04<07:20, 5.44s/it] 99%|█████████▊| 5692/5773 [1:05:06<07:20, 5.44s/it] {'loss': 0.5504, 'learning_rate': 1.032628469141872e-08, 'epoch': 0.99} 99%|█████████▊| 5692/5773 [1:05:06<07:20, 5.44s/it]{'loss': 0.5504, 'learning_rate': 1.032628469141872e-08, 'epoch': 0.99} 99%|█████████▊| 5692/5773 [1:05:04<07:20, 5.44s/it] 99%|█████████▊| 5693/5773 [1:05:11<07:13, 5.42s/it] 99%|█████████▊| 5693/5773 [1:05:09<07:13, 5.42s/it] {'loss': 0.5558, 'learning_rate': 1.0072931137252717e-08, 'epoch': 0.99} 99%|█████████▊| 5693/5773 [1:05:11<07:13, 5.42s/it] {'loss': 0.5558, 'learning_rate': 1.0072931137252717e-08, 'epoch': 0.99} 99%|█████████▊| 5693/5773 [1:05:09<07:13, 5.42s/it] 99%|█████████▊| 5694/5773 [1:05:15<07:12, 5.47s/it] 99%|█████████▊| 5694/5773 [1:05:17<07:12, 5.47s/it] {'loss': 0.5392, 'learning_rate': 9.822722731303025e-09, 'epoch': 0.99} {'loss': 0.5392, 'learning_rate': 9.822722731303025e-09, 'epoch': 0.99} 99%|█████████▊| 5694/5773 [1:05:17<07:12, 5.47s/it] 99%|█████████▊| 5694/5773 [1:05:15<07:12, 5.47s/it] 99%|█████████▊| 5695/5773 [1:05:20<07:09, 5.51s/it] 99%|█████████▊| 5695/5773 [1:05:23<07:09, 5.51s/it] {'loss': 0.5663, 'learning_rate': 9.575659552344408e-09, 'epoch': 0.99} 
99%|█████████▊| 5695/5773 [1:05:23<07:09, 5.51s/it] {'loss': 0.5663, 'learning_rate': 9.575659552344408e-09, 'epoch': 0.99} 99%|█████████▊| 5695/5773 [1:05:20<07:09, 5.51s/it] 99%|█████████▊| 5696/5773 [1:05:26<07:07, 5.55s/it] 99%|█████████▊| 5696/5773 [1:05:28<07:07, 5.55s/it] {'loss': 0.5489, 'learning_rate': 9.3317416781602e-09, 'epoch': 0.99} {'loss': 0.5489, 'learning_rate': 9.3317416781602e-09, 'epoch': 0.99} 99%|█████████▊| 5696/5773 [1:05:28<07:07, 5.55s/it] 99%|█████████▊| 5696/5773 [1:05:26<07:07, 5.55s/it] 99%|█████████▊| 5697/5773 [1:05:31<06:55, 5.47s/it] 99%|█████████▊| 5697/5773 [1:05:34<06:55, 5.47s/it] {'loss': 0.5644, 'learning_rate': 9.09096918554342e-09, 'epoch': 0.99} 99%|█████████▊| 5697/5773 [1:05:34<06:55, 5.47s/it] {'loss': 0.5644, 'learning_rate': 9.09096918554342e-09, 'epoch': 0.99} 99%|█████████▊| 5697/5773 [1:05:31<06:55, 5.47s/it] 99%|█████████▊| 5698/5773 [1:05:37<06:49, 5.45s/it] 99%|█████████▊| 5698/5773 [1:05:39<06:49, 5.45s/it] {'loss': 0.5653, 'learning_rate': 8.853342150296762e-09, 'epoch': 0.99} 99%|█████████▊| 5698/5773 [1:05:39<06:49, 5.45s/it] {'loss': 0.5653, 'learning_rate': 8.853342150296762e-09, 'epoch': 0.99} 99%|█████████▊| 5698/5773 [1:05:37<06:49, 5.45s/it] 99%|█████████▊| 5699/5773 [1:05:42<06:43, 5.45s/it] 99%|█████████▊| 5699/5773 [1:05:44<06:43, 5.45s/it] {'loss': 0.5521, 'learning_rate': 8.618860647232608e-09, 'epoch': 0.99}{'loss': 0.5521, 'learning_rate': 8.618860647232608e-09, 'epoch': 0.99} 99%|█████████▊| 5699/5773 [1:05:42<06:43, 5.45s/it] 99%|█████████▊| 5699/5773 [1:05:44<06:43, 5.45s/it]4 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 99%|█████████▊| 5700/5773 [1:05:47<06:34, 5.40s/it]10 AutoResumeHook: Checking whether to suspend... 
3 AutoResumeHook: Checking whether to suspend...
14 AutoResumeHook: Checking whether to suspend...
13 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
15 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
9 AutoResumeHook: Checking whether to suspend...
8 AutoResumeHook: Checking whether to suspend...
 99%|█████████▊| 5700/5773 [1:05:50<06:34, 5.40s/it]
{'loss': 0.5441, 'learning_rate': 8.387524750174125e-09, 'epoch': 0.99}
{'loss': 0.5441, 'learning_rate': 8.387524750174125e-09, 'epoch': 0.99}
 99%|█████████▊| 5700/5773 [1:05:50<06:34, 5.40s/it]
 99%|█████████▊| 5700/5773 [1:05:47<06:34, 5.40s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5700/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5700/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/tmp-checkpoint-5700/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 99%|█████████▉| 5701/5773 [1:06:07<11:38, 9.71s/it] 99%|█████████▉| 5701/5773 [1:06:09<11:38, 9.71s/it] {'loss': 0.5558, 'learning_rate': 8.159334531953055e-09, 'epoch': 0.99} 99%|█████████▉| 5701/5773 [1:06:09<11:38, 9.71s/it] {'loss': 0.5558, 'learning_rate': 8.159334531953055e-09, 'epoch': 0.99} 99%|█████████▉| 5701/5773 [1:06:07<11:38, 9.71s/it] 99%|█████████▉| 5702/5773 [1:06:15<10:01, 8.48s/it] 99%|█████████▉| 5702/5773 [1:06:13<10:01, 8.48s/it] {'loss': 0.5577, 'learning_rate': 7.93429006440971e-09, 'epoch': 0.99} 99%|█████████▉| 5702/5773 [1:06:15<10:01, 8.48s/it] {'loss': 0.5577, 'learning_rate': 7.93429006440971e-09, 'epoch': 0.99} 99%|█████████▉| 5702/5773 [1:06:13<10:01, 8.48s/it] 99%|█████████▉| 5703/5773 [1:06:18<08:52, 7.61s/it] 99%|█████████▉| 5703/5773 [1:06:21<08:52, 7.61s/it] {'loss': 0.5456, 'learning_rate': 7.712391418396304e-09, 'epoch': 0.99} 99%|█████████▉| 5703/5773 [1:06:21<08:52, 7.61s/it] {'loss': 0.5456, 'learning_rate': 7.712391418396304e-09, 'epoch': 0.99} 99%|█████████▉| 5703/5773 [1:06:18<08:52, 7.61s/it] 99%|█████████▉| 5704/5773 [1:06:24<07:57, 6.93s/it] 99%|█████████▉| 5704/5773 [1:06:26<07:57, 6.93s/it] {'loss': 0.5526, 'learning_rate': 7.493638663773617e-09, 'epoch': 0.99} 99%|█████████▉| 5704/5773 [1:06:26<07:57, 6.93s/it] {'loss': 0.5526, 'learning_rate': 7.493638663773617e-09, 'epoch': 0.99} 99%|█████████▉| 5704/5773 [1:06:24<07:57, 6.93s/it] 99%|█████████▉| 5705/5773 [1:06:29<07:21, 6.50s/it] 99%|█████████▉| 5705/5773 [1:06:31<07:21, 6.50s/it] {'loss': 0.5635, 'learning_rate': 7.278031869412117e-09, 'epoch': 0.99} 99%|█████████▉| 5705/5773 [1:06:31<07:21, 6.50s/it] {'loss': 0.5635, 'learning_rate': 7.278031869412117e-09, 'epoch': 0.99} 99%|█████████▉| 5705/5773 [1:06:29<07:21, 6.50s/it] 99%|█████████▉| 5706/5773 [1:06:35<06:54, 6.19s/it] 99%|█████████▉| 5706/5773 [1:06:37<06:54, 6.19s/it] {'loss': 0.5682, 'learning_rate': 7.0655711031908384e-09, 'epoch': 0.99} {'loss': 0.5682, 'learning_rate': 
7.0655711031908384e-09, 'epoch': 0.99} 99%|█████████▉| 5706/5773 [1:06:37<06:54, 6.19s/it] 99%|█████████▉| 5706/5773 [1:06:35<06:54, 6.19s/it] 99%|█████████▉| 5707/5773 [1:06:40<06:39, 6.05s/it] 99%|█████████▉| 5707/5773 [1:06:43<06:39, 6.05s/it] {'loss': 0.5632, 'learning_rate': 6.856256432000719e-09, 'epoch': 0.99} 99%|█████████▉| 5707/5773 [1:06:43<06:39, 6.05s/it] {'loss': 0.5632, 'learning_rate': 6.856256432000719e-09, 'epoch': 0.99} 99%|█████████▉| 5707/5773 [1:06:40<06:39, 6.05s/it] 99%|█████████▉| 5708/5773 [1:06:46<06:20, 5.85s/it] 99%|█████████▉| 5708/5773 [1:06:48<06:20, 5.85s/it] {'loss': 0.537, 'learning_rate': 6.650087921739046e-09, 'epoch': 0.99} 99%|█████████▉| 5708/5773 [1:06:48<06:20, 5.85s/it] {'loss': 0.537, 'learning_rate': 6.650087921739046e-09, 'epoch': 0.99} 99%|█████████▉| 5708/5773 [1:06:46<06:20, 5.85s/it] 99%|█████████▉| 5709/5773 [1:06:51<06:05, 5.71s/it] 99%|█████████▉| 5709/5773 [1:06:53<06:05, 5.71s/it] {'loss': 0.5597, 'learning_rate': 6.447065637316119e-09, 'epoch': 0.99} 99%|█████████▉| 5709/5773 [1:06:53<06:05, 5.71s/it] {'loss': 0.5597, 'learning_rate': 6.447065637316119e-09, 'epoch': 0.99} 99%|█████████▉| 5709/5773 [1:06:51<06:05, 5.71s/it] 99%|█████████▉| 5710/5773 [1:06:56<05:54, 5.62s/it] 99%|█████████▉| 5710/5773 [1:06:59<05:54, 5.62s/it] {'loss': 0.552, 'learning_rate': 6.2471896426474774e-09, 'epoch': 0.99} 99%|█████████▉| 5710/5773 [1:06:59<05:54, 5.62s/it]{'loss': 0.552, 'learning_rate': 6.2471896426474774e-09, 'epoch': 0.99} 99%|█████████▉| 5710/5773 [1:06:56<05:54, 5.62s/it] 99%|█████████▉| 5711/5773 [1:07:02<05:46, 5.59s/it] 99%|█████████▉| 5711/5773 [1:07:04<05:46, 5.59s/it] {'loss': 0.5457, 'learning_rate': 6.050460000662783e-09, 'epoch': 0.99} 99%|█████████▉| 5711/5773 [1:07:04<05:46, 5.59s/it] {'loss': 0.5457, 'learning_rate': 6.050460000662783e-09, 'epoch': 0.99} 99%|█████████▉| 5711/5773 [1:07:02<05:46, 5.59s/it] 99%|█████████▉| 5712/5773 [1:07:07<05:39, 5.57s/it] 99%|█████████▉| 5712/5773 [1:07:10<05:39, 
5.57s/it] {'loss': 0.5452, 'learning_rate': 5.856876773296938e-09, 'epoch': 0.99} 99%|█████████▉| 5712/5773 [1:07:10<05:39, 5.57s/it]{'loss': 0.5452, 'learning_rate': 5.856876773296938e-09, 'epoch': 0.99} 99%|█████████▉| 5712/5773 [1:07:07<05:39, 5.57s/it] 99%|█████████▉| 5713/5773 [1:07:13<05:32, 5.54s/it] 99%|█████████▉| 5713/5773 [1:07:15<05:32, 5.54s/it] {'loss': 0.5635, 'learning_rate': 5.666440021497855e-09, 'epoch': 0.99} 99%|█████████▉| 5713/5773 [1:07:15<05:32, 5.54s/it] {'loss': 0.5635, 'learning_rate': 5.666440021497855e-09, 'epoch': 0.99} 99%|█████████▉| 5713/5773 [1:07:13<05:32, 5.54s/it] 99%|█████████▉| 5714/5773 [1:07:18<05:23, 5.48s/it] 99%|█████████▉| 5714/5773 [1:07:21<05:23, 5.48s/it] {'loss': 0.5392, 'learning_rate': 5.479149805219796e-09, 'epoch': 0.99} 99%|█████████▉| 5714/5773 [1:07:21<05:23, 5.48s/it] {'loss': 0.5392, 'learning_rate': 5.479149805219796e-09, 'epoch': 0.99} 99%|█████████▉| 5714/5773 [1:07:18<05:23, 5.48s/it] 99%|█████████▉| 5715/5773 [1:07:24<05:18, 5.50s/it] 99%|█████████▉| 5715/5773 [1:07:26<05:18, 5.50s/it] {'loss': 0.5445, 'learning_rate': 5.29500618342782e-09, 'epoch': 0.99} 99%|█████████▉| 5715/5773 [1:07:26<05:18, 5.50s/it] {'loss': 0.5445, 'learning_rate': 5.29500618342782e-09, 'epoch': 0.99} 99%|█████████▉| 5715/5773 [1:07:24<05:18, 5.50s/it] 99%|█████████▉| 5716/5773 [1:07:29<05:11, 5.46s/it] 99%|█████████▉| 5716/5773 [1:07:32<05:11, 5.46s/it] {'loss': 0.5458, 'learning_rate': 5.1140092140966606e-09, 'epoch': 0.99} 99%|█████████▉| 5716/5773 [1:07:32<05:11, 5.46s/it] {'loss': 0.5458, 'learning_rate': 5.1140092140966606e-09, 'epoch': 0.99} 99%|█████████▉| 5716/5773 [1:07:29<05:11, 5.46s/it] 99%|█████████▉| 5717/5773 [1:07:35<05:03, 5.41s/it] 99%|█████████▉| 5717/5773 [1:07:37<05:03, 5.41s/it] {'loss': 0.5544, 'learning_rate': 4.9361589542107345e-09, 'epoch': 0.99} 99%|█████████▉| 5717/5773 [1:07:37<05:03, 5.41s/it] {'loss': 0.5544, 'learning_rate': 4.9361589542107345e-09, 'epoch': 0.99} 99%|█████████▉| 5717/5773 
[1:07:35<05:03, 5.41s/it] 99%|█████████▉| 5718/5773 [1:07:40<05:02, 5.50s/it] 99%|█████████▉| 5718/5773 [1:07:43<05:02, 5.50s/it] {'loss': 0.5474, 'learning_rate': 4.761455459760811e-09, 'epoch': 0.99} 99%|█████████▉| 5718/5773 [1:07:43<05:02, 5.50s/it] {'loss': 0.5474, 'learning_rate': 4.761455459760811e-09, 'epoch': 0.99} 99%|█████████▉| 5718/5773 [1:07:40<05:02, 5.50s/it] 99%|█████████▉| 5719/5773 [1:07:46<04:56, 5.50s/it] 99%|█████████▉| 5719/5773 [1:07:48<04:56, 5.50s/it] {'loss': 0.5652, 'learning_rate': 4.589898785750668e-09, 'epoch': 0.99} 99%|█████████▉| 5719/5773 [1:07:48<04:56, 5.50s/it]{'loss': 0.5652, 'learning_rate': 4.589898785750668e-09, 'epoch': 0.99} 99%|█████████▉| 5719/5773 [1:07:46<04:56, 5.50s/it] 99%|█████████▉| 5720/5773 [1:07:51<04:51, 5.49s/it] 99%|█████████▉| 5720/5773 [1:07:54<04:51, 5.49s/it] {'loss': 0.5319, 'learning_rate': 4.421488986192657e-09, 'epoch': 0.99} 99%|█████████▉| 5720/5773 [1:07:54<04:51, 5.49s/it] {'loss': 0.5319, 'learning_rate': 4.421488986192657e-09, 'epoch': 0.99} 99%|█████████▉| 5720/5773 [1:07:51<04:51, 5.49s/it] 99%|█████████▉| 5721/5773 [1:07:57<04:47, 5.52s/it] 99%|█████████▉| 5721/5773 [1:07:59<04:47, 5.53s/it] {'loss': 0.5609, 'learning_rate': 4.256226114105477e-09, 'epoch': 0.99} 99%|█████████▉| 5721/5773 [1:07:59<04:47, 5.53s/it] {'loss': 0.5609, 'learning_rate': 4.256226114105477e-09, 'epoch': 0.99} 99%|█████████▉| 5721/5773 [1:07:57<04:47, 5.52s/it] 99%|█████████▉| 5722/5773 [1:08:02<04:38, 5.46s/it] 99%|█████████▉| 5722/5773 [1:08:05<04:38, 5.46s/it] {'loss': 0.5505, 'learning_rate': 4.094110221519731e-09, 'epoch': 0.99} 99%|█████████▉| 5722/5773 [1:08:05<04:38, 5.46s/it] {'loss': 0.5505, 'learning_rate': 4.094110221519731e-09, 'epoch': 0.99} 99%|█████████▉| 5722/5773 [1:08:02<04:38, 5.46s/it] 99%|█████████▉| 5723/5773 [1:08:07<04:31, 5.42s/it] 99%|█████████▉| 5723/5773 [1:08:10<04:31, 5.42s/it] {'loss': 0.5469, 'learning_rate': 3.935141359475703e-09, 'epoch': 0.99} 99%|█████████▉| 5723/5773 
[1:08:10<04:31, 5.42s/it] {'loss': 0.5469, 'learning_rate': 3.935141359475703e-09, 'epoch': 0.99} 99%|█████████▉| 5723/5773 [1:08:07<04:31, 5.42s/it] 99%|█████████▉| 5724/5773 [1:08:13<04:25, 5.42s/it] 99%|█████████▉| 5724/5773 [1:08:15<04:25, 5.42s/it] {'loss': 0.5537, 'learning_rate': 3.779319578021134e-09, 'epoch': 0.99} 99%|█████████▉| 5724/5773 [1:08:15<04:25, 5.42s/it] {'loss': 0.5537, 'learning_rate': 3.779319578021134e-09, 'epoch': 0.99} 99%|█████████▉| 5724/5773 [1:08:13<04:25, 5.42s/it] 99%|█████████▉| 5725/5773 [1:08:18<04:20, 5.43s/it] 99%|█████████▉| 5725/5773 [1:08:21<04:20, 5.43s/it] {'loss': 0.5526, 'learning_rate': 3.626644926214562e-09, 'epoch': 0.99} 99%|█████████▉| 5725/5773 [1:08:21<04:20, 5.43s/it] {'loss': 0.5526, 'learning_rate': 3.626644926214562e-09, 'epoch': 0.99} 99%|█████████▉| 5725/5773 [1:08:18<04:20, 5.43s/it] 99%|█████████▉| 5726/5773 [1:08:24<04:13, 5.40s/it] 99%|█████████▉| 5726/5773 [1:08:26<04:13, 5.40s/it] {'loss': 0.5452, 'learning_rate': 3.4771174521208705e-09, 'epoch': 0.99} 99%|█████████▉| 5726/5773 [1:08:26<04:13, 5.40s/it] {'loss': 0.5452, 'learning_rate': 3.4771174521208705e-09, 'epoch': 0.99} 99%|█████████▉| 5726/5773 [1:08:24<04:13, 5.40s/it] 99%|█████████▉| 5727/5773 [1:08:29<04:07, 5.38s/it] 99%|█████████▉| 5727/5773 [1:08:31<04:07, 5.38s/it] {'loss': 0.5554, 'learning_rate': 3.3307372028179574e-09, 'epoch': 0.99} 99%|█████████▉| 5727/5773 [1:08:31<04:07, 5.38s/it] {'loss': 0.5554, 'learning_rate': 3.3307372028179574e-09, 'epoch': 0.99} 99%|█████████▉| 5727/5773 [1:08:29<04:07, 5.38s/it] 99%|█████████▉| 5728/5773 [1:08:34<04:01, 5.37s/it] 99%|█████████▉| 5728/5773 [1:08:37<04:01, 5.37s/it] {'loss': 0.5637, 'learning_rate': 3.1875042243911803e-09, 'epoch': 0.99} 99%|█████████▉| 5728/5773 [1:08:37<04:01, 5.37s/it] {'loss': 0.5637, 'learning_rate': 3.1875042243911803e-09, 'epoch': 0.99} 99%|█████████▉| 5728/5773 [1:08:34<04:01, 5.37s/it] 99%|█████████▉| 5729/5773 [1:08:40<03:57, 5.39s/it] 99%|█████████▉| 5729/5773 
[1:08:42<03:57, 5.39s/it] {'loss': 0.5566, 'learning_rate': 3.0474185619333573e-09, 'epoch': 0.99} 99%|█████████▉| 5729/5773 [1:08:42<03:57, 5.39s/it] {'loss': 0.5566, 'learning_rate': 3.0474185619333573e-09, 'epoch': 0.99} 99%|█████████▉| 5729/5773 [1:08:40<03:57, 5.39s/it] 99%|█████████▉| 5730/5773 [1:08:45<03:51, 5.38s/it] 99%|█████████▉| 5730/5773 [1:08:48<03:51, 5.38s/it] {'loss': 0.5587, 'learning_rate': 2.9104802595480984e-09, 'epoch': 0.99} 99%|█████████▉| 5730/5773 [1:08:48<03:51, 5.38s/it] {'loss': 0.5587, 'learning_rate': 2.9104802595480984e-09, 'epoch': 0.99} 99%|█████████▉| 5730/5773 [1:08:45<03:51, 5.38s/it] 99%|█████████▉| 5731/5773 [1:08:51<03:47, 5.41s/it] 99%|█████████▉| 5731/5773 [1:08:53<03:47, 5.41s/it] {'loss': 0.5469, 'learning_rate': 2.7766893603486946e-09, 'epoch': 0.99} 99%|█████████▉| 5731/5773 [1:08:53<03:47, 5.41s/it] {'loss': 0.5469, 'learning_rate': 2.7766893603486946e-09, 'epoch': 0.99} 99%|█████████▉| 5731/5773 [1:08:51<03:47, 5.41s/it] 99%|█████████▉| 5732/5773 [1:08:56<03:44, 5.47s/it] 99%|█████████▉| 5732/5773 [1:08:59<03:44, 5.47s/it] {'loss': 0.5528, 'learning_rate': 2.6460459064570065e-09, 'epoch': 0.99} 99%|█████████▉| 5732/5773 [1:08:59<03:44, 5.47s/it] {'loss': 0.5528, 'learning_rate': 2.6460459064570065e-09, 'epoch': 0.99} 99%|█████████▉| 5732/5773 [1:08:56<03:44, 5.47s/it] 99%|█████████▉| 5733/5773 [1:09:02<03:38, 5.45s/it] 99%|█████████▉| 5733/5773 [1:09:04<03:38, 5.45s/it] {'loss': 0.5541, 'learning_rate': 2.5185499390023572e-09, 'epoch': 0.99} 99%|█████████▉| 5733/5773 [1:09:04<03:38, 5.45s/it] {'loss': 0.5541, 'learning_rate': 2.5185499390023572e-09, 'epoch': 0.99} 99%|█████████▉| 5733/5773 [1:09:02<03:38, 5.45s/it] 99%|█████████▉| 5734/5773 [1:09:07<03:32, 5.45s/it] 99%|█████████▉| 5734/5773 [1:09:09<03:32, 5.45s/it] {'loss': 0.5479, 'learning_rate': 2.3942014981270802e-09, 'epoch': 0.99} 99%|█████████▉| 5734/5773 [1:09:09<03:32, 5.45s/it] {'loss': 0.5479, 'learning_rate': 2.3942014981270802e-09, 'epoch': 0.99} 
99%|█████████▉| 5734/5773 [1:09:07<03:32, 5.45s/it] 99%|█████████▉| 5735/5773 [1:09:12<03:26, 5.42s/it] 99%|█████████▉| 5735/5773 [1:09:15<03:26, 5.42s/it] {'loss': 0.5641, 'learning_rate': 2.273000622977639e-09, 'epoch': 0.99} 99%|█████████▉| 5735/5773 [1:09:15<03:26, 5.42s/it] {'loss': 0.5641, 'learning_rate': 2.273000622977639e-09, 'epoch': 0.99} 99%|█████████▉| 5735/5773 [1:09:12<03:26, 5.42s/it] 99%|█████████▉| 5736/5773 [1:09:18<03:20, 5.41s/it] 99%|█████████▉| 5736/5773 [1:09:20<03:20, 5.41s/it] {'loss': 0.5662, 'learning_rate': 2.1549473517124e-09, 'epoch': 0.99} 99%|█████████▉| 5736/5773 [1:09:20<03:20, 5.41s/it] {'loss': 0.5662, 'learning_rate': 2.1549473517124e-09, 'epoch': 0.99} 99%|█████████▉| 5736/5773 [1:09:18<03:20, 5.41s/it] 99%|█████████▉| 5737/5773 [1:09:23<03:13, 5.38s/it] 99%|█████████▉| 5737/5773 [1:09:26<03:13, 5.38s/it] {'loss': 0.5424, 'learning_rate': 2.0400417214994085e-09, 'epoch': 0.99} 99%|█████████▉| 5737/5773 [1:09:26<03:13, 5.38s/it] {'loss': 0.5424, 'learning_rate': 2.0400417214994085e-09, 'epoch': 0.99} 99%|█████████▉| 5737/5773 [1:09:23<03:13, 5.38s/it] 99%|█████████▉| 5738/5773 [1:09:28<03:07, 5.36s/it] 99%|█████████▉| 5738/5773 [1:09:31<03:07, 5.36s/it] {'loss': 0.5493, 'learning_rate': 1.9282837685141718e-09, 'epoch': 0.99} 99%|█████████▉| 5738/5773 [1:09:31<03:07, 5.36s/it] {'loss': 0.5493, 'learning_rate': 1.9282837685141718e-09, 'epoch': 0.99} 99%|█████████▉| 5738/5773 [1:09:28<03:07, 5.36s/it] 99%|█████████▉| 5739/5773 [1:09:34<03:03, 5.38s/it] 99%|█████████▉| 5739/5773 [1:09:36<03:03, 5.38s/it] {'loss': 0.5449, 'learning_rate': 1.8196735279407684e-09, 'epoch': 0.99} 99%|█████████▉| 5739/5773 [1:09:36<03:03, 5.38s/it] {'loss': 0.5449, 'learning_rate': 1.8196735279407684e-09, 'epoch': 0.99} 99%|█████████▉| 5739/5773 [1:09:34<03:03, 5.38s/it] 99%|█████████▉| 5740/5773 [1:09:39<02:59, 5.45s/it] 99%|█████████▉| 5740/5773 [1:09:42<02:59, 5.45s/it] {'loss': 0.5444, 'learning_rate': 1.7142110339740669e-09, 'epoch': 0.99} 
99%|█████████▉| 5740/5773 [1:09:42<02:59, 5.45s/it] {'loss': 0.5444, 'learning_rate': 1.7142110339740669e-09, 'epoch': 0.99} 99%|█████████▉| 5740/5773 [1:09:39<02:59, 5.45s/it] 99%|█████████▉| 5741/5773 [1:09:45<02:54, 5.45s/it] 99%|█████████▉| 5741/5773 [1:09:47<02:54, 5.45s/it] {'loss': 0.5606, 'learning_rate': 1.6118963198163973e-09, 'epoch': 0.99} 99%|█████████▉| 5741/5773 [1:09:47<02:54, 5.45s/it]{'loss': 0.5606, 'learning_rate': 1.6118963198163973e-09, 'epoch': 0.99} 99%|█████████▉| 5741/5773 [1:09:45<02:54, 5.45s/it] 99%|█████████▉| 5742/5773 [1:09:50<02:47, 5.39s/it] 99%|█████████▉| 5742/5773 [1:09:53<02:47, 5.39s/it] {'loss': 0.5706, 'learning_rate': 1.5127294176797703e-09, 'epoch': 0.99} 99%|█████████▉| 5742/5773 [1:09:53<02:47, 5.39s/it] {'loss': 0.5706, 'learning_rate': 1.5127294176797703e-09, 'epoch': 0.99} 99%|█████████▉| 5742/5773 [1:09:50<02:47, 5.39s/it] 99%|█████████▉| 5743/5773 [1:09:55<02:41, 5.37s/it] 99%|█████████▉| 5743/5773 [1:09:58<02:41, 5.37s/it] {'loss': 0.5513, 'learning_rate': 1.416710358786988e-09, 'epoch': 0.99} 99%|█████████▉| 5743/5773 [1:09:58<02:41, 5.37s/it] {'loss': 0.5513, 'learning_rate': 1.416710358786988e-09, 'epoch': 0.99} 99%|█████████▉| 5743/5773 [1:09:55<02:41, 5.37s/it] 99%|█████████▉| 5744/5773 [1:10:01<02:36, 5.40s/it] 99%|█████████▉| 5744/5773 [1:10:03<02:36, 5.40s/it] {'loss': 0.5398, 'learning_rate': 1.3238391733649824e-09, 'epoch': 0.99} 99%|█████████▉| 5744/5773 [1:10:03<02:36, 5.40s/it] {'loss': 0.5398, 'learning_rate': 1.3238391733649824e-09, 'epoch': 0.99} 99%|█████████▉| 5744/5773 [1:10:01<02:36, 5.40s/it] 100%|█████████▉| 5745/5773 [1:10:06<02:32, 5.43s/it] 100%|█████████▉| 5745/5773 [1:10:09<02:32, 5.43s/it] {'loss': 0.547, 'learning_rate': 1.2341158906536976e-09, 'epoch': 1.0} 100%|█████████▉| 5745/5773 [1:10:09<02:32, 5.43s/it] {'loss': 0.547, 'learning_rate': 1.2341158906536976e-09, 'epoch': 1.0} 100%|█████████▉| 5745/5773 [1:10:06<02:32, 5.43s/it] 100%|█████████▉| 5746/5773 [1:10:12<02:25, 5.41s/it] 
100%|█████████▉| 5746/5773 [1:10:14<02:25, 5.41s/it] {'loss': 0.5508, 'learning_rate': 1.1475405389016482e-09, 'epoch': 1.0} 100%|█████████▉| 5746/5773 [1:10:14<02:25, 5.41s/it] {'loss': 0.5508, 'learning_rate': 1.1475405389016482e-09, 'epoch': 1.0} 100%|█████████▉| 5746/5773 [1:10:12<02:25, 5.41s/it] 100%|█████████▉| 5747/5773 [1:10:17<02:20, 5.42s/it] 100%|█████████▉| 5747/5773 [1:10:20<02:20, 5.42s/it] {'loss': 0.5759, 'learning_rate': 1.0641131453659193e-09, 'epoch': 1.0} 100%|█████████▉| 5747/5773 [1:10:20<02:20, 5.42s/it] {'loss': 0.5759, 'learning_rate': 1.0641131453659193e-09, 'epoch': 1.0} 100%|█████████▉| 5747/5773 [1:10:17<02:20, 5.42s/it] 100%|█████████▉| 5748/5773 [1:10:23<02:15, 5.43s/it] 100%|█████████▉| 5748/5773 [1:10:25<02:15, 5.43s/it] {'loss': 0.5371, 'learning_rate': 9.83833736309947e-10, 'epoch': 1.0} 100%|█████████▉| 5748/5773 [1:10:25<02:15, 5.43s/it] {'loss': 0.5371, 'learning_rate': 9.83833736309947e-10, 'epoch': 1.0} 100%|█████████▉| 5748/5773 [1:10:23<02:15, 5.43s/it] 100%|█████████▉| 5749/5773 [1:10:28<02:09, 5.40s/it] 100%|█████████▉| 5749/5773 [1:10:30<02:09, 5.40s/it] {'loss': 0.5611, 'learning_rate': 9.067023370101791e-10, 'epoch': 1.0} 100%|█████████▉| 5749/5773 [1:10:30<02:09, 5.40s/it] {'loss': 0.5611, 'learning_rate': 9.067023370101791e-10, 'epoch': 1.0} 100%|█████████▉| 5749/5773 [1:10:28<02:09, 5.40s/it]14 AutoResumeHook: Checking whether to suspend... 11 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... 54 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend... 
910 100%|█████████▉| 5750/5773 [1:10:34<02:05, 5.45s/it] AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 8 AutoResumeHook: Checking whether to suspend... 15 AutoResumeHook: Checking whether to suspend... 100%|█████████▉| 5750/5773 [1:10:36<02:05, 5.45s/it]2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5649, 'learning_rate': 8.327189717494133e-10, 'epoch': 1.0} 100%|█████████▉| 5750/5773 [1:10:36<02:05, 5.45s/it] {'loss': 0.5649, 'learning_rate': 8.327189717494133e-10, 'epoch': 1.0} 100%|█████████▉| 5750/5773 [1:10:34<02:05, 5.45s/it] 100%|█████████▉| 5751/5773 [1:10:39<02:00, 5.48s/it] 100%|█████████▉| 5751/5773 [1:10:42<02:00, 5.48s/it] {'loss': 0.5521, 'learning_rate': 7.618836638190186e-10, 'epoch': 1.0} 100%|█████████▉| 5751/5773 [1:10:42<02:00, 5.48s/it] {'loss': 0.5521, 'learning_rate': 7.618836638190186e-10, 'epoch': 1.0} 100%|█████████▉| 5751/5773 [1:10:39<02:00, 5.48s/it] 100%|█████████▉| 5752/5773 [1:10:47<01:55, 5.50s/it] 100%|█████████▉| 5752/5773 [1:10:45<01:55, 5.50s/it] {'loss': 0.5415, 'learning_rate': 6.941964355222653e-10, 'epoch': 1.0} 100%|█████████▉| 5752/5773 [1:10:47<01:55, 5.50s/it] {'loss': 0.5415, 'learning_rate': 6.941964355222653e-10, 'epoch': 1.0} 100%|█████████▉| 5752/5773 [1:10:45<01:55, 5.50s/it] 100%|█████████▉| 5753/5773 [1:10:50<01:50, 5.50s/it] 100%|█████████▉| 5753/5773 [1:10:53<01:50, 5.50s/it] {'loss': 0.5645, 'learning_rate': 6.296573081665535e-10, 'epoch': 1.0} 100%|█████████▉| 5753/5773 [1:10:53<01:50, 5.50s/it] {'loss': 0.5645, 'learning_rate': 6.296573081665535e-10, 'epoch': 1.0} 100%|█████████▉| 5753/5773 [1:10:50<01:50, 5.50s/it] 100%|█████████▉| 5754/5773 [1:10:56<01:43, 5.47s/it] 100%|█████████▉| 5754/5773 [1:10:58<01:43, 5.47s/it] {'loss': 0.5451, 'learning_rate': 5.682663020734059e-10, 'epoch': 1.0} 100%|█████████▉| 5754/5773 [1:10:58<01:43, 5.47s/it] {'loss': 0.5451, 'learning_rate': 5.682663020734059e-10, 
'epoch': 1.0} 100%|█████████▉| 5754/5773 [1:10:56<01:43, 5.47s/it] 100%|█████████▉| 5755/5773 [1:11:01<01:37, 5.41s/it] 100%|█████████▉| 5755/5773 [1:11:03<01:37, 5.41s/it] {'loss': 0.5462, 'learning_rate': 5.100234365706947e-10, 'epoch': 1.0} 100%|█████████▉| 5755/5773 [1:11:03<01:37, 5.41s/it] {'loss': 0.5462, 'learning_rate': 5.100234365706947e-10, 'epoch': 1.0} 100%|█████████▉| 5755/5773 [1:11:01<01:37, 5.41s/it] 100%|█████████▉| 5756/5773 [1:11:06<01:31, 5.41s/it] 100%|█████████▉| 5756/5773 [1:11:09<01:31, 5.41s/it] {'loss': 0.5678, 'learning_rate': 4.5492872999264303e-10, 'epoch': 1.0} 100%|█████████▉| 5756/5773 [1:11:09<01:31, 5.41s/it] {'loss': 0.5678, 'learning_rate': 4.5492872999264303e-10, 'epoch': 1.0} 100%|█████████▉| 5756/5773 [1:11:06<01:31, 5.41s/it] 100%|█████████▉| 5757/5773 [1:11:12<01:27, 5.45s/it] 100%|█████████▉| 5757/5773 [1:11:14<01:27, 5.45s/it] {'loss': 0.5576, 'learning_rate': 4.029821996864858e-10, 'epoch': 1.0} 100%|█████████▉| 5757/5773 [1:11:14<01:27, 5.45s/it] {'loss': 0.5576, 'learning_rate': 4.029821996864858e-10, 'epoch': 1.0} 100%|█████████▉| 5757/5773 [1:11:12<01:27, 5.45s/it] 100%|█████████▉| 5758/5773 [1:11:17<01:21, 5.46s/it] 100%|█████████▉| 5758/5773 [1:11:20<01:21, 5.46s/it] {'loss': 0.5434, 'learning_rate': 3.541838620069182e-10, 'epoch': 1.0} 100%|█████████▉| 5758/5773 [1:11:20<01:21, 5.46s/it] {'loss': 0.5434, 'learning_rate': 3.541838620069182e-10, 'epoch': 1.0} 100%|█████████▉| 5758/5773 [1:11:17<01:21, 5.46s/it] 100%|█████████▉| 5759/5773 [1:11:23<01:16, 5.47s/it] 100%|█████████▉| 5759/5773 [1:11:25<01:16, 5.47s/it] {'loss': 0.5635, 'learning_rate': 3.0853373231720663e-10, 'epoch': 1.0} 100%|█████████▉| 5759/5773 [1:11:25<01:16, 5.47s/it] {'loss': 0.5635, 'learning_rate': 3.0853373231720663e-10, 'epoch': 1.0} 100%|█████████▉| 5759/5773 [1:11:23<01:16, 5.47s/it] 100%|█████████▉| 5760/5773 [1:11:28<01:11, 5.47s/it] 100%|█████████▉| 5760/5773 [1:11:31<01:11, 5.47s/it] {'loss': 0.5419, 'learning_rate': 
2.6603182498807777e-10, 'epoch': 1.0} 100%|█████████▉| 5760/5773 [1:11:31<01:11, 5.47s/it] {'loss': 0.5419, 'learning_rate': 2.6603182498807777e-10, 'epoch': 1.0} 100%|█████████▉| 5760/5773 [1:11:28<01:11, 5.47s/it] 100%|█████████▉| 5761/5773 [1:11:34<01:05, 5.49s/it] 100%|█████████▉| 5761/5773 [1:11:36<01:05, 5.49s/it] {'loss': 0.5594, 'learning_rate': 2.2667815340216004e-10, 'epoch': 1.0} 100%|█████████▉| 5761/5773 [1:11:36<01:05, 5.49s/it] {'loss': 0.5594, 'learning_rate': 2.2667815340216004e-10, 'epoch': 1.0} 100%|█████████▉| 5761/5773 [1:11:34<01:05, 5.49s/it] 100%|█████████▉| 5762/5773 [1:11:39<01:00, 5.49s/it] 100%|█████████▉| 5762/5773 [1:11:42<01:00, 5.49s/it] {'loss': 0.5476, 'learning_rate': 1.904727299473219e-10, 'epoch': 1.0} 100%|█████████▉| 5762/5773 [1:11:42<01:00, 5.49s/it] {'loss': 0.5476, 'learning_rate': 1.904727299473219e-10, 'epoch': 1.0} 100%|█████████▉| 5762/5773 [1:11:39<01:00, 5.49s/it] 100%|█████████▉| 5763/5773 [1:11:45<00:54, 5.43s/it] 100%|█████████▉| 5763/5773 [1:11:47<00:54, 5.43s/it] {'loss': 0.5453, 'learning_rate': 1.5741556602444362e-10, 'epoch': 1.0} 100%|█████████▉| 5763/5773 [1:11:47<00:54, 5.43s/it] {'loss': 0.5453, 'learning_rate': 1.5741556602444362e-10, 'epoch': 1.0} 100%|█████████▉| 5763/5773 [1:11:45<00:54, 5.43s/it] 100%|█████████▉| 5764/5773 [1:11:50<00:48, 5.44s/it] 100%|█████████▉| 5764/5773 [1:11:52<00:48, 5.44s/it] {'loss': 0.5494, 'learning_rate': 1.2750667203964563e-10, 'epoch': 1.0} 100%|█████████▉| 5764/5773 [1:11:52<00:48, 5.44s/it] {'loss': 0.5494, 'learning_rate': 1.2750667203964563e-10, 'epoch': 1.0} 100%|█████████▉| 5764/5773 [1:11:50<00:48, 5.44s/it] 100%|█████████▉| 5765/5773 [1:11:55<00:43, 5.42s/it] 100%|█████████▉| 5765/5773 [1:11:58<00:43, 5.42s/it] {'loss': 0.5529, 'learning_rate': 1.0074605740983957e-10, 'epoch': 1.0} 100%|█████████▉| 5765/5773 [1:11:58<00:43, 5.42s/it] {'loss': 0.5529, 'learning_rate': 1.0074605740983957e-10, 'epoch': 1.0} 100%|█████████▉| 5765/5773 [1:11:55<00:43, 5.42s/it] 
100%|█████████▉| 5766/5773 [1:12:01<00:37, 5.43s/it] 100%|█████████▉| 5766/5773 [1:12:03<00:37, 5.43s/it] {'loss': 0.5662, 'learning_rate': 7.713373055939777e-11, 'epoch': 1.0} 100%|█████████▉| 5766/5773 [1:12:03<00:37, 5.43s/it] {'loss': 0.5662, 'learning_rate': 7.713373055939777e-11, 'epoch': 1.0} 100%|█████████▉| 5766/5773 [1:12:01<00:37, 5.43s/it] 100%|█████████▉| 5767/5773 [1:12:06<00:32, 5.40s/it] 100%|█████████▉| 5767/5773 [1:12:09<00:32, 5.40s/it] {'loss': 0.5495, 'learning_rate': 5.666969892348384e-11, 'epoch': 1.0} 100%|█████████▉| 5767/5773 [1:12:09<00:32, 5.40s/it] {'loss': 0.5495, 'learning_rate': 5.666969892348384e-11, 'epoch': 1.0} 100%|█████████▉| 5767/5773 [1:12:06<00:32, 5.40s/it] 100%|█████████▉| 5768/5773 [1:12:12<00:26, 5.39s/it] 100%|█████████▉| 5768/5773 [1:12:14<00:26, 5.39s/it] {'loss': 0.5449, 'learning_rate': 3.935396894250154e-11, 'epoch': 1.0} 100%|█████████▉| 5768/5773 [1:12:14<00:26, 5.39s/it] {'loss': 0.5449, 'learning_rate': 3.935396894250154e-11, 'epoch': 1.0} 100%|█████████▉| 5768/5773 [1:12:12<00:26, 5.39s/it] 100%|█████████▉| 5769/5773 [1:12:17<00:21, 5.41s/it] 100%|█████████▉| 5769/5773 [1:12:19<00:21, 5.41s/it] {'loss': 0.543, 'learning_rate': 2.5186546070976593e-11, 'epoch': 1.0} 100%|█████████▉| 5769/5773 [1:12:19<00:21, 5.41s/it] {'loss': 0.543, 'learning_rate': 2.5186546070976593e-11, 'epoch': 1.0} 100%|█████████▉| 5769/5773 [1:12:17<00:21, 5.41s/it] 100%|█████████▉| 5770/5773 [1:12:23<00:16, 5.45s/it] 100%|█████████▉| 5770/5773 [1:12:25<00:16, 5.45s/it] {'loss': 0.5528, 'learning_rate': 1.4167434766454435e-11, 'epoch': 1.0} 100%|█████████▉| 5770/5773 [1:12:25<00:16, 5.45s/it] {'loss': 0.5528, 'learning_rate': 1.4167434766454435e-11, 'epoch': 1.0} 100%|█████████▉| 5770/5773 [1:12:23<00:16, 5.45s/it] 100%|█████████▉| 5771/5773 [1:12:28<00:10, 5.49s/it] 100%|█████████▉| 5771/5773 [1:12:31<00:10, 5.49s/it] {'loss': 0.5556, 'learning_rate': 6.296638499492247e-12, 'epoch': 1.0} 100%|█████████▉| 5771/5773 [1:12:31<00:10, 
5.49s/it] {'loss': 0.5556, 'learning_rate': 6.296638499492247e-12, 'epoch': 1.0} 100%|█████████▉| 5771/5773 [1:12:28<00:10, 5.49s/it] 100%|█████████▉| 5772/5773 [1:12:34<00:05, 5.51s/it] 100%|█████████▉| 5772/5773 [1:12:36<00:05, 5.51s/it] {'loss': 0.5444, 'learning_rate': 1.5741597492180406e-12, 'epoch': 1.0} 100%|█████████▉| 5772/5773 [1:12:36<00:05, 5.51s/it] {'loss': 0.5444, 'learning_rate': 1.5741597492180406e-12, 'epoch': 1.0} 100%|█████████▉| 5772/5773 [1:12:34<00:05, 5.51s/it] 100%|██████████| 5773/5773 [1:12:40<00:00, 5.79s/it] 100%|██████████| 5773/5773 [1:12:43<00:00, 5.79s/it] {'loss': 0.5595, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 5773/5773 [1:12:43<00:00, 5.79s/it] {'loss': 0.5595, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 5773/5773 [1:12:40<00:00, 5.79s/it] {'train_runtime': 4363.0874, 'train_samples_per_second': 682.024, 'train_steps_per_second': 1.323, 'train_loss': 0.07439602913279147, 'epoch': 1.0} 100%|██████████| 5773/5773 [1:12:43<00:00, 5.79s/it]{'train_runtime': 4363.0883, 'train_samples_per_second': 682.024, 'train_steps_per_second': 1.323, 'train_loss': 0.07439602913279147, 'epoch': 1.0} 100%|██████████| 5773/5773 [1:12:40<00:00, 5.79s/it] 100%|██████████| 5773/5773 [1:12:40<00:00, 1.32it/s] 100%|██████████| 5773/5773 [1:12:43<00:00, 1.32it/s] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path/mm_projector wandb: 🚀 View run vila_3b_oxe_sim_path at: https://wandb.ai/memmelma/VILA/runs/kf44q4ei wandb: Find logs at: ../../../../../../../../fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/wandb/run-20250410_123229-kf44q4ei/logs srun: job 6723732 queued and waiting for resources srun: 
job 6723732 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block1-2097 JobID: 6723732 | Full list: batch-block1-2097 batch-block1-10017 NETWORK=Efficient-Large-Model/VILA1.5-3b wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block1-2097 JobID: 6723732 | Full list: batch-block1-2097 batch-block1-10017 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! 
[2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,677] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:48,678] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,021] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,022] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL 
backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,785] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,785] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,785] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,785] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,785] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,785] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,786] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,786] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,786] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,786] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-04-10 14:10:49,982] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-04-10 14:10:49,982] [INFO] [comm.py:594:init_distributed] cdb=None
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/vila/vila_3b_oxe_sim_path. Skipp training