srun: Job 8261139 step creation temporarily disabled, retrying (Requested nodes are busy)
srun: job 8262210 queued and waiting for resources
srun: job 8262210 has been allocated resources
wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block5-00572
JobID: 8262210 | Full list: batch-block5-00572
NETWORK=Efficient-Large-Model/VILA1.5-3b
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Did not find AutoResume SDK!
Did not find AutoResume SDK!
Did not find AutoResume SDK!
Did not find AutoResume SDK!
Did not find AutoResume SDK!
Did not find AutoResume SDK!
Did not find AutoResume SDK!
Did not find AutoResume SDK!
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:56,415] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:07:58,182] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:07:58,182] [INFO] [comm.py:594:init_distributed] cdb=None
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( Fetching 17 files: 0%| | 0/17 [00:00 4096). Running this sequence through the model will result in indexing errors 29%|██▊ | 1894/6640 [2:59:01<7:05:40, 5.38s/it] {'loss': 0.6049, 'learning_rate': 1.677545983796741e-05, 'epoch': 0.29} 29%|██▊ | 1894/6640 [2:59:01<7:05:40, 5.38s/it] 29%|██▊ | 1895/6640 [2:59:06<7:07:51, 5.41s/it] {'loss': 0.6253, 'learning_rate': 1.677187117498536e-05, 'epoch': 0.29} 29%|██▊ | 1895/6640 [2:59:06<7:07:51, 5.41s/it] 29%|██▊ | 1896/6640 [2:59:12<7:11:59, 5.46s/it] {'loss': 0.6188, 'learning_rate': 1.6768280900479634e-05, 'epoch': 0.29} 29%|██▊ | 1896/6640 [2:59:12<7:11:59, 5.46s/it] 29%|██▊ | 1897/6640 [2:59:17<7:09:55, 5.44s/it] {'loss': 0.628, 'learning_rate': 1.6764689015304624e-05, 'epoch': 0.29} 29%|██▊ | 1897/6640 [2:59:17<7:09:55, 5.44s/it] 29%|██▊ | 1898/6640 [2:59:23<7:11:25, 5.46s/it] {'loss': 0.6243, 'learning_rate': 1.67610955203151e-05, 'epoch': 0.29} 29%|██▊ | 1898/6640 [2:59:23<7:11:25, 5.46s/it] 29%|██▊ | 1899/6640 [2:59:28<7:13:43, 5.49s/it] {'loss': 0.6022, 'learning_rate': 1.6757500416366225e-05, 'epoch': 0.29} 29%|██▊ | 1899/6640 [2:59:28<7:13:43, 5.49s/it]4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 07 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 29%|██▊ | 1900/6640 [2:59:34<7:11:33, 5.46s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5987, 'learning_rate': 1.6753903704313527e-05, 'epoch': 0.29} 29%|██▊ | 1900/6640 [2:59:34<7:11:33, 5.46s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-1900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-1900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-1900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 29%|██▊ | 1901/6640 [3:00:00<15:25:38, 11.72s/it] {'loss': 0.6187, 'learning_rate': 1.6750305385012936e-05, 'epoch': 0.29} 29%|██▊ | 1901/6640 [3:00:00<15:25:38, 11.72s/it] 29%|██▊ | 1902/6640 [3:00:05<12:55:37, 9.82s/it] {'loss': 0.6228, 'learning_rate': 1.6746705459320746e-05, 'epoch': 0.29} 29%|██▊ | 1902/6640 [3:00:05<12:55:37, 9.82s/it] 29%|██▊ | 1903/6640 [3:00:11<11:08:53, 8.47s/it] {'loss': 0.6002, 'learning_rate': 1.674310392809365e-05, 'epoch': 0.29} 29%|██▊ | 1903/6640 [3:00:11<11:08:53, 8.47s/it] 29%|██▊ | 1904/6640 [3:00:16<9:52:43, 7.51s/it] {'loss': 0.619, 'learning_rate': 1.673950079218871e-05, 'epoch': 0.29} 29%|██▊ | 1904/6640 [3:00:16<9:52:43, 7.51s/it] 29%|██▊ | 1905/6640 [3:00:21<9:01:55, 6.87s/it] {'loss': 0.6039, 'learning_rate': 1.6735896052463384e-05, 'epoch': 0.29} 29%|██▊ | 1905/6640 [3:00:21<9:01:55, 6.87s/it] 29%|██▊ | 1906/6640 [3:00:27<8:36:01, 6.54s/it] {'loss': 0.6282, 'learning_rate': 1.6732289709775496e-05, 'epoch': 0.29} 29%|██▊ | 1906/6640 [3:00:27<8:36:01, 6.54s/it] 29%|██▊ | 1907/6640 [3:00:32<8:08:47, 6.20s/it] {'loss': 0.6193, 'learning_rate': 
1.672868176498326e-05, 'epoch': 0.29} 29%|██▊ | 1907/6640 [3:00:32<8:08:47, 6.20s/it] 29%|██▊ | 1908/6640 [3:00:38<7:52:09, 5.99s/it] {'loss': 0.6203, 'learning_rate': 1.6725072218945274e-05, 'epoch': 0.29} 29%|██▊ | 1908/6640 [3:00:38<7:52:09, 5.99s/it] 29%|██▉ | 1909/6640 [3:00:43<7:38:09, 5.81s/it] {'loss': 0.5922, 'learning_rate': 1.672146107252051e-05, 'epoch': 0.29} 29%|██▉ | 1909/6640 [3:00:43<7:38:09, 5.81s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") 29%|██▉ | 1910/6640 [3:00:49<7:30:30, 5.71s/it] {'loss': 0.6224, 'learning_rate': 1.6717848326568327e-05, 'epoch': 0.29} 29%|██▉ | 1910/6640 [3:00:49<7:30:30, 5.71s/it] 29%|██▉ | 1911/6640 [3:00:54<7:23:04, 5.62s/it] {'loss': 0.5987, 'learning_rate': 1.6714233981948457e-05, 'epoch': 0.29} 29%|██▉ | 1911/6640 [3:00:54<7:23:04, 5.62s/it] 29%|██▉ | 1912/6640 [3:01:00<7:17:43, 5.55s/it] {'loss': 0.6174, 'learning_rate': 1.6710618039521017e-05, 'epoch': 0.29} 29%|██▉ | 1912/6640 [3:01:00<7:17:43, 5.55s/it] 29%|██▉ | 1913/6640 [3:01:05<7:13:08, 5.50s/it] {'loss': 0.6144, 'learning_rate': 1.6707000500146505e-05, 'epoch': 0.29} 29%|██▉ | 1913/6640 [3:01:05<7:13:08, 5.50s/it] 29%|██▉ | 1914/6640 [3:01:10<7:08:36, 5.44s/it] {'loss': 0.5976, 'learning_rate': 1.6703381364685805e-05, 'epoch': 0.29} 29%|██▉ | 1914/6640 [3:01:10<7:08:36, 5.44s/it] 29%|██▉ | 1915/6640 [3:01:16<7:10:20, 5.46s/it] {'loss': 0.6083, 'learning_rate': 1.6699760634000166e-05, 'epoch': 0.29} 29%|██▉ | 1915/6640 [3:01:16<7:10:20, 5.46s/it] 29%|██▉ | 1916/6640 [3:01:21<7:11:21, 5.48s/it] {'loss': 0.588, 'learning_rate': 1.6696138308951227e-05, 'epoch': 0.29} 29%|██▉ | 1916/6640 [3:01:21<7:11:21, 5.48s/it] 29%|██▉ | 1917/6640 [3:01:27<7:10:46, 5.47s/it] {'loss': 0.6085, 'learning_rate': 1.669251439040101e-05, 'epoch': 0.29} 29%|██▉ | 1917/6640 [3:01:27<7:10:46, 5.47s/it] 29%|██▉ | 1918/6640 [3:01:32<7:11:52, 5.49s/it] {'loss': 0.5967, 
'learning_rate': 1.66888888792119e-05, 'epoch': 0.29} 29%|██▉ | 1918/6640 [3:01:32<7:11:52, 5.49s/it] 29%|██▉ | 1919/6640 [3:01:38<7:08:37, 5.45s/it] {'loss': 0.6085, 'learning_rate': 1.668526177624668e-05, 'epoch': 0.29} 29%|██▉ | 1919/6640 [3:01:38<7:08:37, 5.45s/it] 29%|██▉ | 1920/6640 [3:01:43<7:08:12, 5.44s/it] {'loss': 0.6086, 'learning_rate': 1.66816330823685e-05, 'epoch': 0.29} 29%|██▉ | 1920/6640 [3:01:43<7:08:12, 5.44s/it] 29%|██▉ | 1921/6640 [3:01:49<7:09:20, 5.46s/it] {'loss': 0.6067, 'learning_rate': 1.6678002798440887e-05, 'epoch': 0.29} 29%|██▉ | 1921/6640 [3:01:49<7:09:20, 5.46s/it] 29%|██▉ | 1922/6640 [3:01:54<7:07:43, 5.44s/it] {'loss': 0.5991, 'learning_rate': 1.667437092532776e-05, 'epoch': 0.29} 29%|██▉ | 1922/6640 [3:01:54<7:07:43, 5.44s/it] 29%|██▉ | 1923/6640 [3:02:00<7:10:01, 5.47s/it] {'loss': 0.6008, 'learning_rate': 1.6670737463893403e-05, 'epoch': 0.29} 29%|██▉ | 1923/6640 [3:02:00<7:10:01, 5.47s/it] 29%|██▉ | 1924/6640 [3:02:05<7:06:00, 5.42s/it] {'loss': 0.634, 'learning_rate': 1.6667102415002482e-05, 'epoch': 0.29} 29%|██▉ | 1924/6640 [3:02:05<7:06:00, 5.42s/it] 29%|██▉ | 1925/6640 [3:02:10<7:07:37, 5.44s/it] {'loss': 0.6132, 'learning_rate': 1.6663465779520042e-05, 'epoch': 0.29} 29%|██▉ | 1925/6640 [3:02:10<7:07:37, 5.44s/it] 29%|██▉ | 1926/6640 [3:02:16<7:09:19, 5.46s/it] {'loss': 0.5924, 'learning_rate': 1.6659827558311504e-05, 'epoch': 0.29} 29%|██▉ | 1926/6640 [3:02:16<7:09:19, 5.46s/it] 29%|██▉ | 1927/6640 [3:02:21<7:06:47, 5.43s/it] {'loss': 0.6127, 'learning_rate': 1.665618775224267e-05, 'epoch': 0.29} 29%|██▉ | 1927/6640 [3:02:21<7:06:47, 5.43s/it] 29%|██▉ | 1928/6640 [3:02:27<7:04:29, 5.41s/it] {'loss': 0.6094, 'learning_rate': 1.665254636217971e-05, 'epoch': 0.29} 29%|██▉ | 1928/6640 [3:02:27<7:04:29, 5.41s/it] 29%|██▉ | 1929/6640 [3:02:32<7:06:29, 5.43s/it] {'loss': 0.6028, 'learning_rate': 1.6648903388989182e-05, 'epoch': 0.29} 29%|██▉ | 1929/6640 [3:02:32<7:06:29, 5.43s/it] 29%|██▉ | 1930/6640 [3:02:37<7:05:00, 
5.41s/it] {'loss': 0.6208, 'learning_rate': 1.6645258833538015e-05, 'epoch': 0.29} 29%|██▉ | 1930/6640 [3:02:37<7:05:00, 5.41s/it] 29%|██▉ | 1931/6640 [3:02:43<7:06:04, 5.43s/it] {'loss': 0.6338, 'learning_rate': 1.6641612696693513e-05, 'epoch': 0.29} 29%|██▉ | 1931/6640 [3:02:43<7:06:04, 5.43s/it] 29%|██▉ | 1932/6640 [3:02:48<7:04:15, 5.41s/it] {'loss': 0.6313, 'learning_rate': 1.6637964979323363e-05, 'epoch': 0.29} 29%|██▉ | 1932/6640 [3:02:48<7:04:15, 5.41s/it] 29%|██▉ | 1933/6640 [3:02:54<7:06:07, 5.43s/it] {'loss': 0.5886, 'learning_rate': 1.6634315682295622e-05, 'epoch': 0.29} 29%|██▉ | 1933/6640 [3:02:54<7:06:07, 5.43s/it] 29%|██▉ | 1934/6640 [3:02:59<7:04:06, 5.41s/it] {'loss': 0.6327, 'learning_rate': 1.6630664806478726e-05, 'epoch': 0.29} 29%|██▉ | 1934/6640 [3:02:59<7:04:06, 5.41s/it] 29%|██▉ | 1935/6640 [3:03:05<7:07:08, 5.45s/it] {'loss': 0.6372, 'learning_rate': 1.6627012352741482e-05, 'epoch': 0.29} 29%|██▉ | 1935/6640 [3:03:05<7:07:08, 5.45s/it] 29%|██▉ | 1936/6640 [3:03:10<7:06:17, 5.44s/it] {'loss': 0.5946, 'learning_rate': 1.662335832195308e-05, 'epoch': 0.29} 29%|██▉ | 1936/6640 [3:03:10<7:06:17, 5.44s/it] 29%|██▉ | 1937/6640 [3:03:15<7:02:34, 5.39s/it] {'loss': 0.6301, 'learning_rate': 1.6619702714983077e-05, 'epoch': 0.29} 29%|██▉ | 1937/6640 [3:03:15<7:02:34, 5.39s/it] 29%|██▉ | 1938/6640 [3:03:21<7:00:10, 5.36s/it] {'loss': 0.6105, 'learning_rate': 1.661604553270141e-05, 'epoch': 0.29} 29%|██▉ | 1938/6640 [3:03:21<7:00:10, 5.36s/it] 29%|██▉ | 1939/6640 [3:03:26<7:04:07, 5.41s/it] {'loss': 0.6185, 'learning_rate': 1.6612386775978398e-05, 'epoch': 0.29} 29%|██▉ | 1939/6640 [3:03:26<7:04:07, 5.41s/it] 29%|██▉ | 1940/6640 [3:03:31<7:01:16, 5.38s/it] {'loss': 0.6011, 'learning_rate': 1.6608726445684715e-05, 'epoch': 0.29} 29%|██▉ | 1940/6640 [3:03:31<7:01:16, 5.38s/it] 29%|██▉ | 1941/6640 [3:03:37<7:03:56, 5.41s/it] {'loss': 0.604, 'learning_rate': 1.660506454269143e-05, 'epoch': 0.29} 29%|██▉ | 1941/6640 [3:03:37<7:03:56, 5.41s/it] 29%|██▉ | 
1942/6640 [3:03:42<7:04:35, 5.42s/it] {'loss': 0.6282, 'learning_rate': 1.6601401067869978e-05, 'epoch': 0.29} 29%|██▉ | 1942/6640 [3:03:42<7:04:35, 5.42s/it] 29%|██▉ | 1943/6640 [3:03:48<7:03:53, 5.41s/it] {'loss': 0.6084, 'learning_rate': 1.659773602209216e-05, 'epoch': 0.29} 29%|██▉ | 1943/6640 [3:03:48<7:03:53, 5.41s/it] 29%|██▉ | 1944/6640 [3:03:53<6:58:51, 5.35s/it] {'loss': 0.6405, 'learning_rate': 1.6594069406230167e-05, 'epoch': 0.29} 29%|██▉ | 1944/6640 [3:03:53<6:58:51, 5.35s/it] 29%|██▉ | 1945/6640 [3:03:58<6:57:22, 5.33s/it] {'loss': 0.5982, 'learning_rate': 1.659040122115655e-05, 'epoch': 0.29} 29%|██▉ | 1945/6640 [3:03:58<6:57:22, 5.33s/it] 29%|██▉ | 1946/6640 [3:04:04<6:58:46, 5.35s/it] {'loss': 0.5931, 'learning_rate': 1.658673146774424e-05, 'epoch': 0.29} 29%|██▉ | 1946/6640 [3:04:04<6:58:46, 5.35s/it] 29%|██▉ | 1947/6640 [3:04:09<7:01:37, 5.39s/it] {'loss': 0.6303, 'learning_rate': 1.6583060146866542e-05, 'epoch': 0.29} 29%|██▉ | 1947/6640 [3:04:09<7:01:37, 5.39s/it] 29%|██▉ | 1948/6640 [3:04:15<7:05:43, 5.44s/it] {'loss': 0.5989, 'learning_rate': 1.657938725939713e-05, 'epoch': 0.29} 29%|██▉ | 1948/6640 [3:04:15<7:05:43, 5.44s/it] 29%|██▉ | 1949/6640 [3:04:20<7:04:21, 5.43s/it] {'loss': 0.5977, 'learning_rate': 1.657571280621005e-05, 'epoch': 0.29} 29%|██▉ | 1949/6640 [3:04:20<7:04:21, 5.43s/it]4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 29%|██▉ | 1950/6640 [3:04:26<7:03:33, 5.42s/it]6 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6186, 'learning_rate': 1.6572036788179728e-05, 'epoch': 0.29} 29%|██▉ | 1950/6640 [3:04:26<7:03:33, 5.42s/it] 29%|██▉ | 1951/6640 [3:04:31<6:58:40, 5.36s/it] {'loss': 0.6218, 'learning_rate': 1.6568359206180952e-05, 'epoch': 0.29} 29%|██▉ | 1951/6640 [3:04:31<6:58:40, 5.36s/it] 29%|██▉ | 1952/6640 [3:04:36<7:03:04, 5.41s/it] {'loss': 0.6198, 'learning_rate': 1.6564680061088897e-05, 'epoch': 0.29} 29%|██▉ | 1952/6640 [3:04:36<7:03:04, 5.41s/it] 29%|██▉ | 1953/6640 [3:04:42<7:08:23, 5.48s/it] {'loss': 0.6305, 'learning_rate': 1.6560999353779092e-05, 'epoch': 0.29} 29%|██▉ | 1953/6640 [3:04:42<7:08:23, 5.48s/it] 29%|██▉ | 1954/6640 [3:04:47<7:07:24, 5.47s/it] {'loss': 0.6027, 'learning_rate': 1.655731708512745e-05, 'epoch': 0.29} 29%|██▉ | 1954/6640 [3:04:47<7:07:24, 5.47s/it] 29%|██▉ | 1955/6640 [3:04:53<7:06:02, 5.46s/it] {'loss': 0.6113, 'learning_rate': 1.6553633256010254e-05, 'epoch': 0.29} 29%|██▉ | 1955/6640 [3:04:53<7:06:02, 5.46s/it] 29%|██▉ | 1956/6640 [3:04:58<7:06:39, 5.47s/it] {'loss': 0.6166, 'learning_rate': 1.6549947867304154e-05, 'epoch': 0.29} 29%|██▉ | 1956/6640 [3:04:58<7:06:39, 5.47s/it] 29%|██▉ | 1957/6640 [3:05:04<7:05:53, 5.46s/it] {'loss': 0.6461, 'learning_rate': 1.654626091988617e-05, 'epoch': 0.29} 29%|██▉ | 1957/6640 [3:05:04<7:05:53, 5.46s/it] 29%|██▉ | 1958/6640 [3:05:09<7:06:31, 5.47s/it] {'loss': 0.6069, 'learning_rate': 1.6542572414633707e-05, 'epoch': 0.29} 29%|██▉ | 1958/6640 [3:05:09<7:06:31, 5.47s/it] 30%|██▉ | 1959/6640 [3:05:15<7:04:31, 5.44s/it] {'loss': 0.609, 'learning_rate': 1.653888235242452e-05, 'epoch': 0.3} 30%|██▉ | 1959/6640 [3:05:15<7:04:31, 5.44s/it] 30%|██▉ | 1960/6640 [3:05:20<7:07:16, 5.48s/it] {'loss': 0.6259, 'learning_rate': 1.653519073413675e-05, 'epoch': 0.3} 30%|██▉ | 1960/6640 [3:05:20<7:07:16, 5.48s/it] 30%|██▉ | 1961/6640 [3:05:26<7:06:19, 5.47s/it] {'loss': 0.6037, 'learning_rate': 1.6531497560648903e-05, 'epoch': 0.3} 30%|██▉ | 1961/6640 [3:05:26<7:06:19, 5.47s/it] 30%|██▉ | 1962/6640 
[3:05:31<7:06:52, 5.48s/it] {'loss': 0.5982, 'learning_rate': 1.6527802832839853e-05, 'epoch': 0.3} 30%|██▉ | 1962/6640 [3:05:31<7:06:52, 5.48s/it] 30%|██▉ | 1963/6640 [3:05:36<7:02:57, 5.43s/it] {'loss': 0.6018, 'learning_rate': 1.652410655158885e-05, 'epoch': 0.3} 30%|██▉ | 1963/6640 [3:05:36<7:02:57, 5.43s/it] 30%|██▉ | 1964/6640 [3:05:42<7:03:19, 5.43s/it] {'loss': 0.6195, 'learning_rate': 1.6520408717775507e-05, 'epoch': 0.3} 30%|██▉ | 1964/6640 [3:05:42<7:03:19, 5.43s/it] 30%|██▉ | 1965/6640 [3:05:47<7:05:56, 5.47s/it] {'loss': 0.6199, 'learning_rate': 1.6516709332279806e-05, 'epoch': 0.3} 30%|██▉ | 1965/6640 [3:05:47<7:05:56, 5.47s/it] 30%|██▉ | 1966/6640 [3:05:53<7:05:21, 5.46s/it] {'loss': 0.62, 'learning_rate': 1.6513008395982107e-05, 'epoch': 0.3} 30%|██▉ | 1966/6640 [3:05:53<7:05:21, 5.46s/it] 30%|██▉ | 1967/6640 [3:05:59<7:13:50, 5.57s/it] {'loss': 0.6154, 'learning_rate': 1.650930590976313e-05, 'epoch': 0.3} 30%|██▉ | 1967/6640 [3:05:59<7:13:50, 5.57s/it] 30%|██▉ | 1968/6640 [3:06:04<7:10:02, 5.52s/it] {'loss': 0.6055, 'learning_rate': 1.650560187450397e-05, 'epoch': 0.3} 30%|██▉ | 1968/6640 [3:06:04<7:10:02, 5.52s/it] 30%|██▉ | 1969/6640 [3:06:10<7:08:01, 5.50s/it] {'loss': 0.6211, 'learning_rate': 1.650189629108609e-05, 'epoch': 0.3} 30%|██▉ | 1969/6640 [3:06:10<7:08:01, 5.50s/it] 30%|██▉ | 1970/6640 [3:06:15<7:10:34, 5.53s/it] {'loss': 0.5893, 'learning_rate': 1.649818916039131e-05, 'epoch': 0.3} 30%|██▉ | 1970/6640 [3:06:15<7:10:34, 5.53s/it] 30%|██▉ | 1971/6640 [3:06:21<7:08:21, 5.50s/it] {'loss': 0.6017, 'learning_rate': 1.6494480483301836e-05, 'epoch': 0.3} 30%|██▉ | 1971/6640 [3:06:21<7:08:21, 5.50s/it] 30%|██▉ | 1972/6640 [3:06:26<7:11:25, 5.55s/it] {'loss': 0.6019, 'learning_rate': 1.6490770260700234e-05, 'epoch': 0.3} 30%|██▉ | 1972/6640 [3:06:26<7:11:25, 5.55s/it] 30%|██▉ | 1973/6640 [3:06:32<7:10:39, 5.54s/it] {'loss': 0.6245, 'learning_rate': 1.6487058493469437e-05, 'epoch': 0.3} 30%|██▉ | 1973/6640 [3:06:32<7:10:39, 5.54s/it] 30%|██▉ | 
1974/6640 [3:06:37<7:02:29, 5.43s/it] {'loss': 0.5909, 'learning_rate': 1.6483345182492742e-05, 'epoch': 0.3} 30%|██▉ | 1974/6640 [3:06:37<7:02:29, 5.43s/it] 30%|██▉ | 1975/6640 [3:06:42<7:04:59, 5.47s/it] {'loss': 0.6205, 'learning_rate': 1.6479630328653814e-05, 'epoch': 0.3} 30%|██▉ | 1975/6640 [3:06:42<7:04:59, 5.47s/it] 30%|██▉ | 1976/6640 [3:06:48<7:03:40, 5.45s/it] {'loss': 0.6158, 'learning_rate': 1.64759139328367e-05, 'epoch': 0.3} 30%|██▉ | 1976/6640 [3:06:48<7:03:40, 5.45s/it] 30%|██▉ | 1977/6640 [3:06:53<7:02:12, 5.43s/it] {'loss': 0.6017, 'learning_rate': 1.6472195995925796e-05, 'epoch': 0.3} 30%|██▉ | 1977/6640 [3:06:53<7:02:12, 5.43s/it] 30%|██▉ | 1978/6640 [3:06:59<7:06:16, 5.49s/it] {'loss': 0.6111, 'learning_rate': 1.6468476518805872e-05, 'epoch': 0.3} 30%|██▉ | 1978/6640 [3:06:59<7:06:16, 5.49s/it] 30%|██▉ | 1979/6640 [3:07:05<7:11:05, 5.55s/it] {'loss': 0.6057, 'learning_rate': 1.6464755502362063e-05, 'epoch': 0.3} 30%|██▉ | 1979/6640 [3:07:05<7:11:05, 5.55s/it] 30%|██▉ | 1980/6640 [3:07:10<7:05:15, 5.48s/it] {'loss': 0.6227, 'learning_rate': 1.646103294747987e-05, 'epoch': 0.3} 30%|██▉ | 1980/6640 [3:07:10<7:05:15, 5.48s/it] 30%|██▉ | 1981/6640 [3:07:15<7:01:23, 5.43s/it] {'loss': 0.5945, 'learning_rate': 1.6457308855045165e-05, 'epoch': 0.3} 30%|██▉ | 1981/6640 [3:07:15<7:01:23, 5.43s/it] 30%|██▉ | 1982/6640 [3:07:21<7:05:44, 5.48s/it] {'loss': 0.6196, 'learning_rate': 1.645358322594418e-05, 'epoch': 0.3} 30%|██▉ | 1982/6640 [3:07:21<7:05:44, 5.48s/it] 30%|██▉ | 1983/6640 [3:07:26<7:04:56, 5.47s/it] {'loss': 0.6302, 'learning_rate': 1.6449856061063513e-05, 'epoch': 0.3} 30%|██▉ | 1983/6640 [3:07:26<7:04:56, 5.47s/it] 30%|██▉ | 1984/6640 [3:07:32<7:10:45, 5.55s/it] {'loss': 0.6135, 'learning_rate': 1.644612736129013e-05, 'epoch': 0.3} 30%|██▉ | 1984/6640 [3:07:32<7:10:45, 5.55s/it] 30%|██▉ | 1985/6640 [3:07:37<7:07:36, 5.51s/it] {'loss': 0.602, 'learning_rate': 1.6442397127511366e-05, 'epoch': 0.3} 30%|██▉ | 1985/6640 [3:07:37<7:07:36, 5.51s/it] 
30%|██▉ | 1986/6640 [3:07:43<7:05:16, 5.48s/it] {'loss': 0.5936, 'learning_rate': 1.643866536061491e-05, 'epoch': 0.3} 30%|██▉ | 1986/6640 [3:07:43<7:05:16, 5.48s/it] 30%|██▉ | 1987/6640 [3:07:48<7:00:31, 5.42s/it] {'loss': 0.5992, 'learning_rate': 1.6434932061488827e-05, 'epoch': 0.3} 30%|██▉ | 1987/6640 [3:07:48<7:00:31, 5.42s/it] 30%|██▉ | 1988/6640 [3:07:54<7:01:17, 5.43s/it] {'loss': 0.6441, 'learning_rate': 1.6431197231021543e-05, 'epoch': 0.3} 30%|██▉ | 1988/6640 [3:07:54<7:01:17, 5.43s/it] 30%|██▉ | 1989/6640 [3:07:59<7:01:24, 5.44s/it] {'loss': 0.6267, 'learning_rate': 1.6427460870101837e-05, 'epoch': 0.3} 30%|██▉ | 1989/6640 [3:07:59<7:01:24, 5.44s/it] 30%|██▉ | 1990/6640 [3:08:04<7:00:17, 5.42s/it] {'loss': 0.6126, 'learning_rate': 1.6423722979618883e-05, 'epoch': 0.3} 30%|██▉ | 1990/6640 [3:08:04<7:00:17, 5.42s/it] 30%|██▉ | 1991/6640 [3:08:10<7:03:17, 5.46s/it] {'loss': 0.6125, 'learning_rate': 1.6419983560462178e-05, 'epoch': 0.3} 30%|██▉ | 1991/6640 [3:08:10<7:03:17, 5.46s/it] 30%|███ | 1992/6640 [3:08:15<7:03:34, 5.47s/it] {'loss': 0.5942, 'learning_rate': 1.6416242613521612e-05, 'epoch': 0.3} 30%|███ | 1992/6640 [3:08:15<7:03:34, 5.47s/it] 30%|███ | 1993/6640 [3:08:21<7:01:05, 5.44s/it] {'loss': 0.6027, 'learning_rate': 1.641250013968743e-05, 'epoch': 0.3} 30%|███ | 1993/6640 [3:08:21<7:01:05, 5.44s/it] 30%|███ | 1994/6640 [3:08:26<7:01:44, 5.45s/it] {'loss': 0.6085, 'learning_rate': 1.6408756139850243e-05, 'epoch': 0.3} 30%|███ | 1994/6640 [3:08:26<7:01:44, 5.45s/it] 30%|███ | 1995/6640 [3:08:32<6:58:40, 5.41s/it] {'loss': 0.6066, 'learning_rate': 1.6405010614901017e-05, 'epoch': 0.3} 30%|███ | 1995/6640 [3:08:32<6:58:40, 5.41s/it] 30%|███ | 1996/6640 [3:08:37<6:56:53, 5.39s/it] {'loss': 0.6247, 'learning_rate': 1.640126356573109e-05, 'epoch': 0.3} 30%|███ | 1996/6640 [3:08:37<6:56:53, 5.39s/it] 30%|███ | 1997/6640 [3:08:43<7:03:31, 5.47s/it] {'loss': 0.6111, 'learning_rate': 1.639751499323216e-05, 'epoch': 0.3} 30%|███ | 1997/6640 
[3:08:43<7:03:31, 5.47s/it] 30%|███ | 1998/6640 [3:08:48<7:02:34, 5.46s/it] {'loss': 0.602, 'learning_rate': 1.6393764898296283e-05, 'epoch': 0.3} 30%|███ | 1998/6640 [3:08:48<7:02:34, 5.46s/it] 30%|███ | 1999/6640 [3:08:53<7:00:27, 5.44s/it] {'loss': 0.6016, 'learning_rate': 1.6390013281815884e-05, 'epoch': 0.3} 30%|███ | 1999/6640 [3:08:53<7:00:27, 5.44s/it]1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 07 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...5 AutoResumeHook: Checking whether to suspend... 30%|███ | 2000/6640 [3:08:59<7:03:13, 5.47s/it]2 AutoResumeHook: Checking whether to suspend... {'loss': 0.616, 'learning_rate': 1.6386260144683744e-05, 'epoch': 0.3} 30%|███ | 2000/6640 [3:08:59<7:03:13, 5.47s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 30%|███ | 2001/6640 [3:09:27<15:35:42, 12.10s/it] {'loss': 0.5944, 'learning_rate': 1.6382505487793015e-05, 'epoch': 0.3} 30%|███ | 2001/6640 [3:09:27<15:35:42, 12.10s/it] 30%|███ | 2002/6640 [3:09:32<13:01:26, 10.11s/it] {'loss': 0.6242, 'learning_rate': 1.6378749312037197e-05, 'epoch': 0.3} 30%|███ | 2002/6640 [3:09:32<13:01:26, 10.11s/it] 30%|███ | 2003/6640 [3:09:37<11:13:54, 8.72s/it] {'loss': 0.6154, 'learning_rate': 1.6374991618310165e-05, 'epoch': 0.3} 30%|███ | 2003/6640 [3:09:37<11:13:54, 8.72s/it] 30%|███ | 2004/6640 [3:09:43<9:54:37, 7.70s/it] {'loss': 0.607, 'learning_rate': 1.6371232407506146e-05, 'epoch': 0.3} 30%|███ | 2004/6640 [3:09:43<9:54:37, 7.70s/it] 30%|███ | 2005/6640 [3:09:48<8:58:37, 6.97s/it] {'loss': 0.6095, 'learning_rate': 1.6367471680519734e-05, 'epoch': 0.3} 30%|███ | 2005/6640 [3:09:48<8:58:37, 6.97s/it] 30%|███ | 2006/6640 [3:09:53<8:20:08, 6.48s/it] {'loss': 0.6223, 'learning_rate': 1.6363709438245877e-05, 'epoch': 0.3} 30%|███ | 2006/6640 [3:09:53<8:20:08, 6.48s/it] 30%|███ | 2007/6640 [3:09:59<7:57:07, 6.18s/it] {'loss': 0.6186, 'learning_rate': 1.635994568157989e-05, 'epoch': 0.3} 30%|███ | 2007/6640 [3:09:59<7:57:07, 6.18s/it] 30%|███ | 2008/6640 [3:10:04<7:42:16, 5.99s/it] {'loss': 0.6107, 'learning_rate': 1.6356180411417448e-05, 'epoch': 0.3} 30%|███ | 2008/6640 [3:10:04<7:42:16, 5.99s/it] 30%|███ | 2009/6640 [3:10:10<7:33:07, 5.87s/it] {'loss': 0.6336, 'learning_rate': 1.6352413628654584e-05, 'epoch': 0.3} 30%|███ | 2009/6640 [3:10:10<7:33:07, 5.87s/it] 30%|███ | 2010/6640 [3:10:16<7:25:53, 5.78s/it] {'loss': 0.6157, 'learning_rate': 1.6348645334187686e-05, 'epoch': 0.3} 30%|███ | 2010/6640 [3:10:16<7:25:53, 5.78s/it] 30%|███ | 2011/6640 [3:10:21<7:20:32, 5.71s/it] {'loss': 0.5841, 'learning_rate': 1.6344875528913517e-05, 'epoch': 0.3} 30%|███ | 2011/6640 [3:10:21<7:20:32, 5.71s/it] 30%|███ | 2012/6640 [3:10:26<7:09:59, 5.57s/it] {'loss': 0.6036, 'learning_rate': 1.6341104213729177e-05, 'epoch': 0.3} 30%|███ 
| 2012/6640 [3:10:26<7:09:59, 5.57s/it] 30%|███ | 2013/6640 [3:10:32<7:05:00, 5.51s/it] {'loss': 0.6105, 'learning_rate': 1.6337331389532148e-05, 'epoch': 0.3} 30%|███ | 2013/6640 [3:10:32<7:05:00, 5.51s/it] 30%|███ | 2014/6640 [3:10:37<7:07:02, 5.54s/it] {'loss': 0.6503, 'learning_rate': 1.633355705722025e-05, 'epoch': 0.3} 30%|███ | 2014/6640 [3:10:37<7:07:02, 5.54s/it] 30%|███ | 2015/6640 [3:10:43<7:01:26, 5.47s/it] {'loss': 0.5906, 'learning_rate': 1.632978121769169e-05, 'epoch': 0.3} 30%|███ | 2015/6640 [3:10:43<7:01:26, 5.47s/it] 30%|███ | 2016/6640 [3:10:48<6:58:37, 5.43s/it] {'loss': 0.5961, 'learning_rate': 1.6326003871845003e-05, 'epoch': 0.3} 30%|███ | 2016/6640 [3:10:48<6:58:37, 5.43s/it] 30%|███ | 2017/6640 [3:10:54<7:01:36, 5.47s/it] {'loss': 0.5994, 'learning_rate': 1.63222250205791e-05, 'epoch': 0.3} 30%|███ | 2017/6640 [3:10:54<7:01:36, 5.47s/it] 30%|███ | 2018/6640 [3:10:59<7:00:23, 5.46s/it] {'loss': 0.6054, 'learning_rate': 1.6318444664793243e-05, 'epoch': 0.3} 30%|███ | 2018/6640 [3:10:59<7:00:23, 5.46s/it] 30%|███ | 2019/6640 [3:11:04<6:58:31, 5.43s/it] {'loss': 0.5903, 'learning_rate': 1.631466280538706e-05, 'epoch': 0.3} 30%|███ | 2019/6640 [3:11:04<6:58:31, 5.43s/it] 30%|███ | 2020/6640 [3:11:10<6:57:59, 5.43s/it] {'loss': 0.6044, 'learning_rate': 1.631087944326053e-05, 'epoch': 0.3} 30%|███ | 2020/6640 [3:11:10<6:57:59, 5.43s/it] 30%|███ | 2021/6640 [3:11:15<6:56:10, 5.41s/it] {'loss': 0.6029, 'learning_rate': 1.630709457931399e-05, 'epoch': 0.3} 30%|███ | 2021/6640 [3:11:15<6:56:10, 5.41s/it] 30%|███ | 2022/6640 [3:11:21<6:57:41, 5.43s/it] {'loss': 0.6242, 'learning_rate': 1.630330821444814e-05, 'epoch': 0.3} 30%|███ | 2022/6640 [3:11:21<6:57:41, 5.43s/it] 30%|███ | 2023/6640 [3:11:26<6:56:26, 5.41s/it] {'loss': 0.5964, 'learning_rate': 1.629952034956403e-05, 'epoch': 0.3} 30%|███ | 2023/6640 [3:11:26<6:56:26, 5.41s/it] 30%|███ | 2024/6640 [3:11:31<6:54:34, 5.39s/it] {'loss': 0.6009, 'learning_rate': 1.6295730985563074e-05, 'epoch': 0.3} 
30%|███ | 2024/6640 [3:11:31<6:54:34, 5.39s/it] 30%|███ | 2025/6640 [3:11:37<6:57:56, 5.43s/it] {'loss': 0.6085, 'learning_rate': 1.6291940123347033e-05, 'epoch': 0.3} 30%|███ | 2025/6640 [3:11:37<6:57:56, 5.43s/it] 31%|███ | 2026/6640 [3:11:42<6:55:51, 5.41s/it] {'loss': 0.5905, 'learning_rate': 1.6288147763818038e-05, 'epoch': 0.31} 31%|███ | 2026/6640 [3:11:42<6:55:51, 5.41s/it] 31%|███ | 2027/6640 [3:11:48<7:06:18, 5.54s/it] {'loss': 0.6154, 'learning_rate': 1.6284353907878557e-05, 'epoch': 0.31} 31%|███ | 2027/6640 [3:11:48<7:06:18, 5.54s/it] 31%|███ | 2028/6640 [3:11:53<7:01:46, 5.49s/it] {'loss': 0.6095, 'learning_rate': 1.6280558556431437e-05, 'epoch': 0.31} 31%|███ | 2028/6640 [3:11:53<7:01:46, 5.49s/it] 31%|███ | 2029/6640 [3:11:59<6:59:08, 5.45s/it] {'loss': 0.6006, 'learning_rate': 1.627676171037987e-05, 'epoch': 0.31} 31%|███ | 2029/6640 [3:11:59<6:59:08, 5.45s/it] 31%|███ | 2030/6640 [3:12:04<6:56:09, 5.42s/it] {'loss': 0.6147, 'learning_rate': 1.6272963370627398e-05, 'epoch': 0.31} 31%|███ | 2030/6640 [3:12:04<6:56:09, 5.42s/it] 31%|███ | 2031/6640 [3:12:09<6:54:18, 5.39s/it] {'loss': 0.6099, 'learning_rate': 1.626916353807793e-05, 'epoch': 0.31} 31%|███ | 2031/6640 [3:12:09<6:54:18, 5.39s/it] 31%|███ | 2032/6640 [3:12:15<6:54:04, 5.39s/it] {'loss': 0.5742, 'learning_rate': 1.6265362213635714e-05, 'epoch': 0.31} 31%|███ | 2032/6640 [3:12:15<6:54:04, 5.39s/it] 31%|███ | 2033/6640 [3:12:21<7:02:13, 5.50s/it] {'loss': 0.6322, 'learning_rate': 1.626155939820537e-05, 'epoch': 0.31} 31%|███ | 2033/6640 [3:12:21<7:02:13, 5.50s/it] 31%|███ | 2034/6640 [3:12:26<7:04:00, 5.52s/it] {'loss': 0.5993, 'learning_rate': 1.6257755092691865e-05, 'epoch': 0.31} 31%|███ | 2034/6640 [3:12:26<7:04:00, 5.52s/it] 31%|███ | 2035/6640 [3:12:32<7:01:51, 5.50s/it] {'loss': 0.6318, 'learning_rate': 1.6253949298000527e-05, 'epoch': 0.31} 31%|███ | 2035/6640 [3:12:32<7:01:51, 5.50s/it] 31%|███ | 2036/6640 [3:12:37<7:03:32, 5.52s/it] {'loss': 0.6024, 'learning_rate': 
1.6250142015037024e-05, 'epoch': 0.31} 31%|███ | 2036/6640 [3:12:37<7:03:32, 5.52s/it] 31%|███ | 2037/6640 [3:12:43<7:03:29, 5.52s/it] {'loss': 0.6297, 'learning_rate': 1.624633324470739e-05, 'epoch': 0.31} 31%|███ | 2037/6640 [3:12:43<7:03:29, 5.52s/it] 31%|███ | 2038/6640 [3:12:48<7:03:59, 5.53s/it] {'loss': 0.587, 'learning_rate': 1.6242522987918016e-05, 'epoch': 0.31} 31%|███ | 2038/6640 [3:12:48<7:03:59, 5.53s/it] 31%|███ | 2039/6640 [3:12:54<7:08:44, 5.59s/it] {'loss': 0.6136, 'learning_rate': 1.6238711245575632e-05, 'epoch': 0.31} 31%|███ | 2039/6640 [3:12:54<7:08:44, 5.59s/it] 31%|███ | 2040/6640 [3:12:59<7:04:43, 5.54s/it] {'loss': 0.5954, 'learning_rate': 1.6234898018587336e-05, 'epoch': 0.31} 31%|███ | 2040/6640 [3:12:59<7:04:43, 5.54s/it] 31%|███ | 2041/6640 [3:13:05<6:59:14, 5.47s/it] {'loss': 0.5917, 'learning_rate': 1.6231083307860574e-05, 'epoch': 0.31} 31%|███ | 2041/6640 [3:13:05<6:59:14, 5.47s/it] 31%|███ | 2042/6640 [3:13:10<7:01:59, 5.51s/it] {'loss': 0.611, 'learning_rate': 1.6227267114303145e-05, 'epoch': 0.31} 31%|███ | 2042/6640 [3:13:10<7:01:59, 5.51s/it] 31%|███ | 2043/6640 [3:13:16<6:59:36, 5.48s/it] {'loss': 0.5984, 'learning_rate': 1.6223449438823194e-05, 'epoch': 0.31} 31%|███ | 2043/6640 [3:13:16<6:59:36, 5.48s/it] 31%|███ | 2044/6640 [3:13:21<6:56:50, 5.44s/it] {'loss': 0.6042, 'learning_rate': 1.6219630282329232e-05, 'epoch': 0.31} 31%|███ | 2044/6640 [3:13:21<6:56:50, 5.44s/it] 31%|███ | 2045/6640 [3:13:26<6:55:27, 5.42s/it] {'loss': 0.5916, 'learning_rate': 1.6215809645730115e-05, 'epoch': 0.31} 31%|███ | 2045/6640 [3:13:26<6:55:27, 5.42s/it] 31%|███ | 2046/6640 [3:13:32<6:54:11, 5.41s/it] {'loss': 0.6165, 'learning_rate': 1.6211987529935055e-05, 'epoch': 0.31} 31%|███ | 2046/6640 [3:13:32<6:54:11, 5.41s/it] 31%|███ | 2047/6640 [3:13:37<6:51:35, 5.38s/it] {'loss': 0.6126, 'learning_rate': 1.6208163935853605e-05, 'epoch': 0.31} 31%|███ | 2047/6640 [3:13:37<6:51:35, 5.38s/it] 31%|███ | 2048/6640 [3:13:42<6:49:44, 5.35s/it] {'loss': 
0.581, 'learning_rate': 1.6204338864395683e-05, 'epoch': 0.31} 31%|███ | 2048/6640 [3:13:42<6:49:44, 5.35s/it] 31%|███ | 2049/6640 [3:13:48<6:55:21, 5.43s/it] {'loss': 0.6165, 'learning_rate': 1.620051231647155e-05, 'epoch': 0.31} 31%|███ | 2049/6640 [3:13:48<6:55:21, 5.43s/it]4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 31%|███ | 2050/6640 [3:13:54<6:57:48, 5.46s/it] {'loss': 0.5998, 'learning_rate': 1.6196684292991827e-05, 'epoch': 0.31} 31%|███ | 2050/6640 [3:13:54<6:57:48, 5.46s/it] 31%|███ | 2051/6640 [3:13:59<6:59:05, 5.48s/it] {'loss': 0.6221, 'learning_rate': 1.6192854794867477e-05, 'epoch': 0.31} 31%|███ | 2051/6640 [3:13:59<6:59:05, 5.48s/it] 31%|███ | 2052/6640 [3:14:05<6:57:34, 5.46s/it] {'loss': 0.5948, 'learning_rate': 1.618902382300982e-05, 'epoch': 0.31} 31%|███ | 2052/6640 [3:14:05<6:57:34, 5.46s/it] 31%|███ | 2053/6640 [3:14:10<6:57:32, 5.46s/it] {'loss': 0.6117, 'learning_rate': 1.6185191378330523e-05, 'epoch': 0.31} 31%|███ | 2053/6640 [3:14:10<6:57:32, 5.46s/it] 31%|███ | 2054/6640 [3:14:16<6:59:37, 5.49s/it] {'loss': 0.6269, 'learning_rate': 1.6181357461741603e-05, 'epoch': 0.31} 31%|███ | 2054/6640 [3:14:16<6:59:37, 5.49s/it] 31%|███ | 2055/6640 [3:14:21<7:00:45, 5.51s/it] {'loss': 0.6033, 'learning_rate': 1.6177522074155436e-05, 'epoch': 0.31} 31%|███ | 2055/6640 [3:14:21<7:00:45, 5.51s/it] 31%|███ | 2056/6640 [3:14:26<6:56:43, 5.45s/it] {'loss': 0.6272, 'learning_rate': 1.6173685216484734e-05, 'epoch': 0.31} 31%|███ | 2056/6640 [3:14:26<6:56:43, 5.45s/it] 31%|███ | 2057/6640 [3:14:32<7:07:02, 5.59s/it] {'loss': 0.5941, 'learning_rate': 1.6169846889642572e-05, 'epoch': 0.31} 31%|███ | 2057/6640 
[3:14:32<7:07:02, 5.59s/it] 31%|███ | 2058/6640 [3:14:38<7:03:26, 5.54s/it] {'loss': 0.6206, 'learning_rate': 1.6166007094542365e-05, 'epoch': 0.31} 31%|███ | 2058/6640 [3:14:38<7:03:26, 5.54s/it] 31%|███ | 2059/6640 [3:14:43<7:01:43, 5.52s/it] {'loss': 0.6358, 'learning_rate': 1.616216583209788e-05, 'epoch': 0.31} 31%|███ | 2059/6640 [3:14:43<7:01:43, 5.52s/it] 31%|███ | 2060/6640 [3:14:49<7:00:04, 5.50s/it] {'loss': 0.6212, 'learning_rate': 1.615832310322324e-05, 'epoch': 0.31} 31%|███ | 2060/6640 [3:14:49<7:00:04, 5.50s/it] 31%|███ | 2061/6640 [3:14:54<6:58:18, 5.48s/it] {'loss': 0.6184, 'learning_rate': 1.6154478908832907e-05, 'epoch': 0.31} 31%|███ | 2061/6640 [3:14:54<6:58:18, 5.48s/it] 31%|███ | 2062/6640 [3:14:59<6:53:35, 5.42s/it] {'loss': 0.6071, 'learning_rate': 1.6150633249841696e-05, 'epoch': 0.31} 31%|███ | 2062/6640 [3:14:59<6:53:35, 5.42s/it]WARNING: tokenization mismatch: 1 vs. 1419. [[{'from': 'human', 'value': 'ASSISTANT: You are an innovative tax advisory firm with the following knowledge and traits.\n\nKNOWLEDGE: Tax advisor, junior tax advisor, senior tax advisor, financial strategist, tax regulations.\n\nTRAITS: Empathy, innovation, solution orientation, professionalism, communication, client focus.\n\nTASK: Use your knowledge to develop an internal strategy document. We are a tax advisory firm looking to identify 10 creative ways to improve client satisfaction and loyalty. Focus on ideas that are tailored to the needs of tax advisors and enhance the overall client experience.\n\nOUTPUT: A table in German containing 10 creative strategies to improve client satisfaction and loyalty for a tax advisory firm, including a brief description for each.'}, {'from': 'gpt', 'value': 'Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei\n\n| Nr. 
| Strategie | Beschreibung |\n| --- | --- | --- |\n| 1 | Personalisierter Service | Jeder Kunde erhält einen dedizierten Steuerberater, der auf seine individuellen Bedürfnisse und Anliegen eingeht. |\n| 2 | Proaktive Steuerplanung | Regelmäßige Updates und Empfehlungen zur Steueroptimierung, basierend auf aktuellen Gesetzesänderungen und Kundenzielen. |\n| 3 | Umfassende Schulungsangebote | Kostenlose Workshops und Webinare, um das Steuerwissen der Kunden zu erweitern und den Umgang mit Steuerthemen zu erleichtern. |\n| 4 | Digitale Plattform | Eine benutzerfreundliche Online-Plattform, die den Kunden den Zugriff auf ihre Steuerdokumente und den Kontakt zum Berater erleichtert. |\n| 5 | Einführung von Steuer-Coaching | Persönliche Coaching-Sitzungen, um den Kunden dabei zu unterstützen, ihre steuerlichen Ziele und Möglichkeiten besser zu verstehen. |\n| 6 | Schnelle Reaktionszeiten | Verbindliches Versprechen, innerhalb von 24 Stunden auf Kundenanfragen zu reagieren, um eine effiziente Kommunikation zu gewährleisten. |\n| 7 | Transparente Preisgestaltung | Klare und faire Preisstrukturen, die den Kunden im Voraus bekannt sind, um böse Überraschungen zu vermeiden. |\n| 8 | Individuelle Jahresgespräche | Regelmäßige persönliche Jahresgespräche, um auf die Bedürfnisse und Ziele der Kunden einzugehen und die Zusammenarbeit zu stärken. |\n| 9 | Kunden-Feedback-System | Einrichtung eines einfachen Feedback-Systems, um Kundenmeinungen und Verbesserungsvorschläge aktiv einzuholen und umzusetzen. |\n| 10 | Steuer-Newsletter und Informationsangebot | Monatlicher Newsletter mit aktuellen Steuerinformationen und Tipps, der den Kunden einen Mehrwert bietet und sie auf dem Laufenden hält. 
|'}, {'from': 'human', 'value': 'Bitte etwas kreativer und auf bezugnehmend auf die zwischenmenschliche Ebene.'}, {'from': 'gpt', 'value': 'Tabelle: 10 kreative Strategien zur Steigerung der Kundenzufriedenheit und -loyalität in einer Steuerberatungskanzlei mit Fokus auf zwischenmenschlicher Ebene\n\n| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1 | Kunden-Events | Organisation von regelmäßigen, informellen Veranstaltungen, um den persönlichen Kontakt zwischen Klienten und Beratern zu fördern. |\n| 2 | Persönliche Geburtstagsgrüße | Individuelle Geburtstagskarten oder -nachrichten für jeden Kunden, um Wertschätzung zu zeigen und Beziehungen zu stärken. |\n| 3 | Kundenbefragung & Anpassung | Aktives Einholen von Kundenmeinungen zur individuellen Betreuung und Anpassung unserer Leistungen entsprechend der Kundenbedürfnisse. |\n| 4 | Teamvorstellung & Empathie | Vorstellung unserer Berater auf der Firmenwebsite und in sozialen Medien, um eine persönliche und empathische Beziehung zum Kunden aufzubauen. |\n| 5 | Kunden-Mentoring-Programm | Schaffung eines Programms, bei dem langjährige Kunden als Mentoren für Neukunden fungieren, um Vertrauen und Loyalität zu fördern. |\n| 6 | "Tag der offenen Tür" | Einladung der Kunden zu einem "Tag der offenen Tür" in unserer Kanzlei, um ihnen einen persönlichen Einblick in unsere Arbeitsweise zu geben. |\n| 7 | Individuelle Erfolgsgeschichten | Erstellung von individuellen Erfolgsgeschichten für jeden Kunden, um ihre steuerlichen Erfolge hervorzuheben und ihre Loyalität zu stärken. |\n| 8 | Gemeinsame soziale Projekte | Zusammenarbeit mit Kunden an sozialen Projekten und Spendenaktionen, um gemeinsame Werte zu teilen und die Zusammenarbeit zu fördern. |\n| 9 | Flexibilität in der Kommunikation | Anbieten verschiedener Kommunikationswege (Telefon, E-Mail, Videoanruf, etc.), um auf die individuellen Bedürfnisse der Kunden einzugehen. 
|\n| 10 | Jährliches Kunden-Dankeschön | Überraschung der Kunden mit einer kleinen Aufmerksamkeit oder einem Dankeschön-Event, um die Wertschätzung für ihre Treue auszudrücken. |'}]] (ignored) 31%|███ | 2063/6640 [3:15:05<6:58:10, 5.48s/it] {'loss': 0.6187, 'learning_rate': 1.6146786127164773e-05, 'epoch': 0.31} 31%|███ | 2063/6640 [3:15:05<6:58:10, 5.48s/it] 31%|███ | 2064/6640 [3:15:11<7:04:10, 5.56s/it] {'loss': 0.6273, 'learning_rate': 1.614293754171765e-05, 'epoch': 0.31} 31%|███ | 2064/6640 [3:15:11<7:04:10, 5.56s/it] 31%|███ | 2065/6640 [3:15:16<7:00:04, 5.51s/it] {'loss': 0.6126, 'learning_rate': 1.6139087494416184e-05, 'epoch': 0.31} 31%|███ | 2065/6640 [3:15:16<7:00:04, 5.51s/it] 31%|███ | 2066/6640 [3:15:22<6:58:33, 5.49s/it] {'loss': 0.607, 'learning_rate': 1.6135235986176584e-05, 'epoch': 0.31} 31%|███ | 2066/6640 [3:15:22<6:58:33, 5.49s/it] 31%|███ | 2067/6640 [3:15:27<6:55:14, 5.45s/it] {'loss': 0.5957, 'learning_rate': 1.613138301791541e-05, 'epoch': 0.31} 31%|███ | 2067/6640 [3:15:27<6:55:14, 5.45s/it] 31%|███ | 2068/6640 [3:15:32<6:57:07, 5.47s/it] {'loss': 0.6366, 'learning_rate': 1.6127528590549563e-05, 'epoch': 0.31} 31%|███ | 2068/6640 [3:15:33<6:57:07, 5.47s/it] 31%|███ | 2069/6640 [3:15:38<6:54:56, 5.45s/it] {'loss': 0.6059, 'learning_rate': 1.612367270499629e-05, 'epoch': 0.31} 31%|███ | 2069/6640 [3:15:38<6:54:56, 5.45s/it] 31%|███ | 2070/6640 [3:15:43<6:56:53, 5.47s/it] {'loss': 0.6067, 'learning_rate': 1.6119815362173188e-05, 'epoch': 0.31} 31%|███ | 2070/6640 [3:15:43<6:56:53, 5.47s/it] 31%|███ | 2071/6640 [3:15:49<6:55:25, 5.46s/it] {'loss': 0.6209, 'learning_rate': 1.6115956562998208e-05, 'epoch': 0.31} 31%|███ | 2071/6640 [3:15:49<6:55:25, 5.46s/it] 31%|███ | 2072/6640 [3:15:54<6:52:10, 5.41s/it] {'loss': 0.6076, 'learning_rate': 1.611209630838963e-05, 'epoch': 0.31} 31%|███ | 2072/6640 [3:15:54<6:52:10, 5.41s/it] 31%|███ | 2073/6640 [3:16:00<6:52:38, 5.42s/it] {'loss': 0.6093, 'learning_rate': 1.6108234599266102e-05, 'epoch': 0.31} 
31%|███ | 2073/6640 [3:16:00<6:52:38, 5.42s/it] 31%|███ | 2074/6640 [3:16:05<6:50:12, 5.39s/it] {'loss': 0.6059, 'learning_rate': 1.6104371436546604e-05, 'epoch': 0.31} 31%|███ | 2074/6640 [3:16:05<6:50:12, 5.39s/it] 31%|███▏ | 2075/6640 [3:16:10<6:51:00, 5.40s/it] {'loss': 0.6077, 'learning_rate': 1.6100506821150455e-05, 'epoch': 0.31} 31%|███▏ | 2075/6640 [3:16:10<6:51:00, 5.40s/it] 31%|███▏ | 2076/6640 [3:16:16<6:54:09, 5.44s/it] {'loss': 0.6275, 'learning_rate': 1.609664075399735e-05, 'epoch': 0.31} 31%|███▏ | 2076/6640 [3:16:16<6:54:09, 5.44s/it] 31%|███▏ | 2077/6640 [3:16:21<6:54:32, 5.45s/it] {'loss': 0.6216, 'learning_rate': 1.6092773236007288e-05, 'epoch': 0.31} 31%|███▏ | 2077/6640 [3:16:21<6:54:32, 5.45s/it] 31%|███▏ | 2078/6640 [3:16:27<6:52:55, 5.43s/it] {'loss': 0.5866, 'learning_rate': 1.6088904268100648e-05, 'epoch': 0.31} 31%|███▏ | 2078/6640 [3:16:27<6:52:55, 5.43s/it] 31%|███▏ | 2079/6640 [3:16:32<6:51:27, 5.41s/it] {'loss': 0.5974, 'learning_rate': 1.6085033851198136e-05, 'epoch': 0.31} 31%|███▏ | 2079/6640 [3:16:32<6:51:27, 5.41s/it] 31%|███▏ | 2080/6640 [3:16:38<6:52:09, 5.42s/it] {'loss': 0.5978, 'learning_rate': 1.6081161986220807e-05, 'epoch': 0.31} 31%|███▏ | 2080/6640 [3:16:38<6:52:09, 5.42s/it] 31%|███▏ | 2081/6640 [3:16:43<6:51:45, 5.42s/it] {'loss': 0.6028, 'learning_rate': 1.6077288674090063e-05, 'epoch': 0.31} 31%|███▏ | 2081/6640 [3:16:43<6:51:45, 5.42s/it] 31%|███▏ | 2082/6640 [3:16:48<6:52:43, 5.43s/it] {'loss': 0.5854, 'learning_rate': 1.6073413915727648e-05, 'epoch': 0.31} 31%|███▏ | 2082/6640 [3:16:48<6:52:43, 5.43s/it] 31%|███▏ | 2083/6640 [3:16:54<6:52:35, 5.43s/it] {'loss': 0.6192, 'learning_rate': 1.6069537712055652e-05, 'epoch': 0.31} 31%|███▏ | 2083/6640 [3:16:54<6:52:35, 5.43s/it] 31%|███▏ | 2084/6640 [3:16:59<6:53:28, 5.45s/it] {'loss': 0.6325, 'learning_rate': 1.6065660063996502e-05, 'epoch': 0.31} 31%|███▏ | 2084/6640 [3:16:59<6:53:28, 5.45s/it] 31%|███▏ | 2085/6640 [3:17:05<6:53:16, 5.44s/it] {'loss': 0.6344, 
'learning_rate': 1.6061780972472978e-05, 'epoch': 0.31} 31%|███▏ | 2085/6640 [3:17:05<6:53:16, 5.44s/it] 31%|███▏ | 2086/6640 [3:17:10<6:53:33, 5.45s/it] {'loss': 0.5902, 'learning_rate': 1.60579004384082e-05, 'epoch': 0.31} 31%|███▏ | 2086/6640 [3:17:10<6:53:33, 5.45s/it] 31%|███▏ | 2087/6640 [3:17:15<6:49:22, 5.39s/it] {'loss': 0.6093, 'learning_rate': 1.605401846272563e-05, 'epoch': 0.31} 31%|███▏ | 2087/6640 [3:17:15<6:49:22, 5.39s/it] 31%|███▏ | 2088/6640 [3:17:21<6:51:12, 5.42s/it] {'loss': 0.6414, 'learning_rate': 1.6050135046349073e-05, 'epoch': 0.31} 31%|███▏ | 2088/6640 [3:17:21<6:51:12, 5.42s/it] 31%|███▏ | 2089/6640 [3:17:26<6:46:45, 5.36s/it] {'loss': 0.5868, 'learning_rate': 1.6046250190202684e-05, 'epoch': 0.31} 31%|███▏ | 2089/6640 [3:17:26<6:46:45, 5.36s/it] 31%|███▏ | 2090/6640 [3:17:32<6:46:36, 5.36s/it] {'loss': 0.5997, 'learning_rate': 1.6042363895210948e-05, 'epoch': 0.31} 31%|███▏ | 2090/6640 [3:17:32<6:46:36, 5.36s/it] 31%|███▏ | 2091/6640 [3:17:37<6:45:37, 5.35s/it] {'loss': 0.6139, 'learning_rate': 1.60384761622987e-05, 'epoch': 0.31} 31%|███▏ | 2091/6640 [3:17:37<6:45:37, 5.35s/it] 32%|███▏ | 2092/6640 [3:17:42<6:45:22, 5.35s/it] {'loss': 0.6074, 'learning_rate': 1.603458699239112e-05, 'epoch': 0.32} 32%|███▏ | 2092/6640 [3:17:42<6:45:22, 5.35s/it] 32%|███▏ | 2093/6640 [3:17:48<6:45:27, 5.35s/it] {'loss': 0.6055, 'learning_rate': 1.6030696386413715e-05, 'epoch': 0.32} 32%|███▏ | 2093/6640 [3:17:48<6:45:27, 5.35s/it] 32%|███▏ | 2094/6640 [3:17:53<6:45:34, 5.35s/it] {'loss': 0.5859, 'learning_rate': 1.602680434529236e-05, 'epoch': 0.32} 32%|███▏ | 2094/6640 [3:17:53<6:45:34, 5.35s/it] 32%|███▏ | 2095/6640 [3:17:58<6:42:54, 5.32s/it] {'loss': 0.5945, 'learning_rate': 1.6022910869953245e-05, 'epoch': 0.32} 32%|███▏ | 2095/6640 [3:17:58<6:42:54, 5.32s/it] 32%|███▏ | 2096/6640 [3:18:03<6:42:24, 5.31s/it] {'loss': 0.5883, 'learning_rate': 1.601901596132292e-05, 'epoch': 0.32} 32%|███▏ | 2096/6640 [3:18:03<6:42:24, 5.31s/it] 32%|███▏ | 2097/6640 
[3:18:09<6:52:09, 5.44s/it] {'loss': 0.612, 'learning_rate': 1.6015119620328266e-05, 'epoch': 0.32} 32%|███▏ | 2097/6640 [3:18:09<6:52:09, 5.44s/it] 32%|███▏ | 2098/6640 [3:18:15<6:52:28, 5.45s/it] {'loss': 0.603, 'learning_rate': 1.601122184789651e-05, 'epoch': 0.32} 32%|███▏ | 2098/6640 [3:18:15<6:52:28, 5.45s/it] 32%|███▏ | 2099/6640 [3:18:20<6:58:28, 5.53s/it] {'loss': 0.5994, 'learning_rate': 1.6007322644955208e-05, 'epoch': 0.32} 32%|███▏ | 2099/6640 [3:18:20<6:58:28, 5.53s/it]41 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend...0 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 32%|███▏ | 2100/6640 [3:18:26<6:52:15, 5.45s/it]2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5821, 'learning_rate': 1.6003422012432275e-05, 'epoch': 0.32} 32%|███▏ | 2100/6640 [3:18:26<6:52:15, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 32%|███▏ | 2101/6640 [3:18:53<15:15:29, 12.10s/it] {'loss': 0.602, 'learning_rate': 1.5999519951255957e-05, 'epoch': 0.32} 32%|███▏ | 2101/6640 [3:18:53<15:15:29, 12.10s/it] 32%|███▏ | 2102/6640 [3:18:59<12:47:39, 10.15s/it] {'loss': 0.625, 'learning_rate': 1.5995616462354835e-05, 'epoch': 0.32} 32%|███▏ | 2102/6640 [3:18:59<12:47:39, 10.15s/it] 32%|███▏ | 2103/6640 [3:19:04<11:03:01, 8.77s/it] {'loss': 0.6143, 'learning_rate': 1.5991711546657837e-05, 'epoch': 0.32} 32%|███▏ | 2103/6640 [3:19:04<11:03:01, 8.77s/it] 32%|███▏ | 2104/6640 [3:19:10<9:48:24, 7.78s/it] {'loss': 0.6211, 'learning_rate': 1.5987805205094225e-05, 'epoch': 0.32} 32%|███▏ | 2104/6640 [3:19:10<9:48:24, 7.78s/it] 32%|███▏ | 2105/6640 [3:19:16<8:59:31, 7.14s/it] {'loss': 0.6135, 'learning_rate': 1.5983897438593612e-05, 'epoch': 0.32} 32%|███▏ | 2105/6640 [3:19:16<8:59:31, 7.14s/it] 32%|███▏ | 2106/6640 [3:19:21<8:23:53, 6.67s/it] {'loss': 0.5837, 'learning_rate': 1.597998824808593e-05, 'epoch': 0.32} 32%|███▏ | 2106/6640 [3:19:21<8:23:53, 6.67s/it] 32%|███▏ | 2107/6640 [3:19:26<7:53:42, 6.27s/it] {'loss': 0.5988, 'learning_rate': 1.5976077634501476e-05, 'epoch': 0.32} 32%|███▏ | 2107/6640 [3:19:26<7:53:42, 6.27s/it] 32%|███▏ | 2108/6640 [3:19:32<7:33:25, 6.00s/it] {'loss': 0.5876, 'learning_rate': 1.5972165598770853e-05, 'epoch': 0.32} 32%|███▏ | 2108/6640 [3:19:32<7:33:25, 6.00s/it] 32%|███▏ | 2109/6640 [3:19:37<7:22:13, 5.86s/it] {'loss': 0.5914, 'learning_rate': 1.5968252141825038e-05, 'epoch': 0.32} 32%|███▏ | 2109/6640 [3:19:37<7:22:13, 5.86s/it] 32%|███▏ | 2110/6640 [3:19:43<7:14:17, 5.75s/it] {'loss': 0.6453, 'learning_rate': 1.596433726459532e-05, 'epoch': 0.32} 32%|███▏ | 2110/6640 [3:19:43<7:14:17, 5.75s/it] 32%|███▏ | 2111/6640 [3:19:48<7:05:09, 5.63s/it] {'loss': 0.6015, 'learning_rate': 1.5960420968013334e-05, 'epoch': 0.32} 32%|███▏ | 2111/6640 [3:19:48<7:05:09, 5.63s/it] 32%|███▏ | 2112/6640 [3:19:54<7:00:30, 5.57s/it] {'loss': 0.5673, 'learning_rate': 
1.5956503253011052e-05, 'epoch': 0.32} 32%|███▏ | 2112/6640 [3:19:54<7:00:30, 5.57s/it] 32%|███▏ | 2113/6640 [3:19:59<6:58:18, 5.54s/it] {'loss': 0.5853, 'learning_rate': 1.595258412052079e-05, 'epoch': 0.32} 32%|███▏ | 2113/6640 [3:19:59<6:58:18, 5.54s/it] 32%|███▏ | 2114/6640 [3:20:05<6:55:44, 5.51s/it] {'loss': 0.6045, 'learning_rate': 1.5948663571475197e-05, 'epoch': 0.32} 32%|███▏ | 2114/6640 [3:20:05<6:55:44, 5.51s/it] 32%|███▏ | 2115/6640 [3:20:10<6:50:54, 5.45s/it] {'loss': 0.5835, 'learning_rate': 1.5944741606807257e-05, 'epoch': 0.32} 32%|███▏ | 2115/6640 [3:20:10<6:50:54, 5.45s/it] 32%|███▏ | 2116/6640 [3:20:15<6:50:04, 5.44s/it] {'loss': 0.5884, 'learning_rate': 1.5940818227450292e-05, 'epoch': 0.32} 32%|███▏ | 2116/6640 [3:20:15<6:50:04, 5.44s/it] 32%|███▏ | 2117/6640 [3:20:21<6:53:26, 5.48s/it] {'loss': 0.6059, 'learning_rate': 1.5936893434337957e-05, 'epoch': 0.32} 32%|███▏ | 2117/6640 [3:20:21<6:53:26, 5.48s/it] 32%|███▏ | 2118/6640 [3:20:26<6:48:38, 5.42s/it] {'loss': 0.6197, 'learning_rate': 1.5932967228404255e-05, 'epoch': 0.32} 32%|███▏ | 2118/6640 [3:20:26<6:48:38, 5.42s/it] 32%|███▏ | 2119/6640 [3:20:32<6:47:51, 5.41s/it] {'loss': 0.6038, 'learning_rate': 1.592903961058351e-05, 'epoch': 0.32} 32%|███▏ | 2119/6640 [3:20:32<6:47:51, 5.41s/it] 32%|███▏ | 2120/6640 [3:20:37<6:55:05, 5.51s/it] {'loss': 0.6103, 'learning_rate': 1.5925110581810396e-05, 'epoch': 0.32} 32%|███▏ | 2120/6640 [3:20:37<6:55:05, 5.51s/it] 32%|███▏ | 2121/6640 [3:20:43<6:50:19, 5.45s/it] {'loss': 0.5944, 'learning_rate': 1.5921180143019915e-05, 'epoch': 0.32} 32%|███▏ | 2121/6640 [3:20:43<6:50:19, 5.45s/it] 32%|███▏ | 2122/6640 [3:20:48<6:47:33, 5.41s/it] {'loss': 0.6159, 'learning_rate': 1.59172482951474e-05, 'epoch': 0.32} 32%|███▏ | 2122/6640 [3:20:48<6:47:33, 5.41s/it] 32%|███▏ | 2123/6640 [3:20:53<6:46:54, 5.40s/it] {'loss': 0.5962, 'learning_rate': 1.5913315039128534e-05, 'epoch': 0.32} 32%|███▏ | 2123/6640 [3:20:53<6:46:54, 5.40s/it] 32%|███▏ | 2124/6640 
[3:20:59<6:50:01, 5.45s/it] {'loss': 0.6085, 'learning_rate': 1.5909380375899323e-05, 'epoch': 0.32} 32%|███▏ | 2124/6640 [3:20:59<6:50:01, 5.45s/it] 32%|███▏ | 2125/6640 [3:21:04<6:47:12, 5.41s/it] {'loss': 0.5894, 'learning_rate': 1.590544430639611e-05, 'epoch': 0.32} 32%|███▏ | 2125/6640 [3:21:04<6:47:12, 5.41s/it] 32%|███▏ | 2126/6640 [3:21:10<6:51:48, 5.47s/it] {'loss': 0.6019, 'learning_rate': 1.5901506831555575e-05, 'epoch': 0.32} 32%|███▏ | 2126/6640 [3:21:10<6:51:48, 5.47s/it] 32%|███▏ | 2127/6640 [3:21:15<6:51:57, 5.48s/it] {'loss': 0.5881, 'learning_rate': 1.5897567952314733e-05, 'epoch': 0.32} 32%|███▏ | 2127/6640 [3:21:15<6:51:57, 5.48s/it] 32%|███▏ | 2128/6640 [3:21:21<6:46:31, 5.41s/it] {'loss': 0.6077, 'learning_rate': 1.5893627669610926e-05, 'epoch': 0.32} 32%|███▏ | 2128/6640 [3:21:21<6:46:31, 5.41s/it] 32%|███▏ | 2129/6640 [3:21:26<6:46:05, 5.40s/it] {'loss': 0.5969, 'learning_rate': 1.588968598438184e-05, 'epoch': 0.32} 32%|███▏ | 2129/6640 [3:21:26<6:46:05, 5.40s/it] 32%|███▏ | 2130/6640 [3:21:31<6:46:03, 5.40s/it] {'loss': 0.6053, 'learning_rate': 1.5885742897565494e-05, 'epoch': 0.32} 32%|███▏ | 2130/6640 [3:21:31<6:46:03, 5.40s/it] 32%|███▏ | 2131/6640 [3:21:37<6:44:18, 5.38s/it] {'loss': 0.5791, 'learning_rate': 1.588179841010023e-05, 'epoch': 0.32} 32%|███▏ | 2131/6640 [3:21:37<6:44:18, 5.38s/it] 32%|███▏ | 2132/6640 [3:21:42<6:42:44, 5.36s/it] {'loss': 0.5971, 'learning_rate': 1.5877852522924733e-05, 'epoch': 0.32} 32%|███▏ | 2132/6640 [3:21:42<6:42:44, 5.36s/it] 32%|███▏ | 2133/6640 [3:21:48<6:49:48, 5.46s/it] {'loss': 0.6003, 'learning_rate': 1.5873905236978017e-05, 'epoch': 0.32} 32%|███▏ | 2133/6640 [3:21:48<6:49:48, 5.46s/it] 32%|███▏ | 2134/6640 [3:21:53<6:52:05, 5.49s/it] {'loss': 0.6188, 'learning_rate': 1.5869956553199432e-05, 'epoch': 0.32} 32%|███▏ | 2134/6640 [3:21:53<6:52:05, 5.49s/it] 32%|███▏ | 2135/6640 [3:21:59<6:49:19, 5.45s/it] {'loss': 0.5995, 'learning_rate': 1.586600647252866e-05, 'epoch': 0.32} 32%|███▏ | 2135/6640 
[3:21:59<6:49:19, 5.45s/it] 32%|███▏ | 2136/6640 [3:22:04<6:49:35, 5.46s/it] {'loss': 0.6277, 'learning_rate': 1.5862054995905712e-05, 'epoch': 0.32} 32%|███▏ | 2136/6640 [3:22:04<6:49:35, 5.46s/it] 32%|███▏ | 2137/6640 [3:22:13<8:07:12, 6.49s/it] {'loss': 0.6125, 'learning_rate': 1.5858102124270933e-05, 'epoch': 0.32} 32%|███▏ | 2137/6640 [3:22:13<8:07:12, 6.49s/it] 32%|███▏ | 2138/6640 [3:22:18<7:42:31, 6.16s/it] {'loss': 0.6124, 'learning_rate': 1.5854147858565002e-05, 'epoch': 0.32} 32%|███▏ | 2138/6640 [3:22:18<7:42:31, 6.16s/it] 32%|███▏ | 2139/6640 [3:22:24<7:29:12, 5.99s/it] {'loss': 0.5978, 'learning_rate': 1.5850192199728927e-05, 'epoch': 0.32} 32%|███▏ | 2139/6640 [3:22:24<7:29:12, 5.99s/it] 32%|███▏ | 2140/6640 [3:22:29<7:15:42, 5.81s/it] {'loss': 0.6041, 'learning_rate': 1.5846235148704047e-05, 'epoch': 0.32} 32%|███▏ | 2140/6640 [3:22:29<7:15:42, 5.81s/it] 32%|███▏ | 2141/6640 [3:22:35<7:06:55, 5.69s/it] {'loss': 0.6291, 'learning_rate': 1.584227670643204e-05, 'epoch': 0.32} 32%|███▏ | 2141/6640 [3:22:35<7:06:55, 5.69s/it] 32%|███▏ | 2142/6640 [3:22:40<6:56:42, 5.56s/it] {'loss': 0.5984, 'learning_rate': 1.58383168738549e-05, 'epoch': 0.32} 32%|███▏ | 2142/6640 [3:22:40<6:56:42, 5.56s/it] 32%|███▏ | 2143/6640 [3:22:46<6:58:32, 5.58s/it] {'loss': 0.6125, 'learning_rate': 1.583435565191497e-05, 'epoch': 0.32} 32%|███▏ | 2143/6640 [3:22:46<6:58:32, 5.58s/it] 32%|███▏ | 2144/6640 [3:22:51<6:58:02, 5.58s/it] {'loss': 0.6017, 'learning_rate': 1.583039304155491e-05, 'epoch': 0.32} 32%|███▏ | 2144/6640 [3:22:51<6:58:02, 5.58s/it] 32%|███▏ | 2145/6640 [3:22:57<6:54:19, 5.53s/it] {'loss': 0.6337, 'learning_rate': 1.5826429043717716e-05, 'epoch': 0.32} 32%|███▏ | 2145/6640 [3:22:57<6:54:19, 5.53s/it] 32%|███▏ | 2146/6640 [3:23:02<6:51:56, 5.50s/it] {'loss': 0.6128, 'learning_rate': 1.582246365934671e-05, 'epoch': 0.32} 32%|███▏ | 2146/6640 [3:23:02<6:51:56, 5.50s/it] 32%|███▏ | 2147/6640 [3:23:07<6:49:15, 5.47s/it] {'loss': 0.6106, 'learning_rate': 
1.5818496889385554e-05, 'epoch': 0.32} 32%|███▏ | 2147/6640 [3:23:07<6:49:15, 5.47s/it] 32%|███▏ | 2148/6640 [3:23:13<6:45:38, 5.42s/it] {'loss': 0.5899, 'learning_rate': 1.5814528734778228e-05, 'epoch': 0.32} 32%|███▏ | 2148/6640 [3:23:13<6:45:38, 5.42s/it] 32%|███▏ | 2149/6640 [3:23:18<6:44:45, 5.41s/it] {'loss': 0.6019, 'learning_rate': 1.5810559196469043e-05, 'epoch': 0.32} 32%|███▏ | 2149/6640 [3:23:18<6:44:45, 5.41s/it]4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 6 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 32%|███▏ | 2150/6640 [3:23:24<6:45:37, 5.42s/it]2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5906, 'learning_rate': 1.580658827540265e-05, 'epoch': 0.32} 32%|███▏ | 2150/6640 [3:23:24<6:45:37, 5.42s/it] 32%|███▏ | 2151/6640 [3:23:29<6:52:35, 5.51s/it] {'loss': 0.6016, 'learning_rate': 1.5802615972524017e-05, 'epoch': 0.32} 32%|███▏ | 2151/6640 [3:23:29<6:52:35, 5.51s/it] 32%|███▏ | 2152/6640 [3:23:35<6:51:58, 5.51s/it] {'loss': 0.5725, 'learning_rate': 1.579864228877845e-05, 'epoch': 0.32} 32%|███▏ | 2152/6640 [3:23:35<6:51:58, 5.51s/it] 32%|███▏ | 2153/6640 [3:23:40<6:51:33, 5.50s/it] {'loss': 0.6141, 'learning_rate': 1.5794667225111572e-05, 'epoch': 0.32} 32%|███▏ | 2153/6640 [3:23:40<6:51:33, 5.50s/it] 32%|███▏ | 2154/6640 [3:23:46<6:46:49, 5.44s/it] {'loss': 0.5924, 'learning_rate': 1.5790690782469345e-05, 'epoch': 0.32} 32%|███▏ | 2154/6640 [3:23:46<6:46:49, 5.44s/it] 32%|███▏ | 2155/6640 [3:23:51<6:42:50, 5.39s/it] {'loss': 0.601, 'learning_rate': 1.578671296179806e-05, 'epoch': 0.32} 32%|███▏ | 2155/6640 [3:23:51<6:42:50, 5.39s/it] 32%|███▏ | 2156/6640 [3:23:57<6:56:03, 5.57s/it] {'loss': 0.6165, 'learning_rate': 1.5782733764044326e-05, 'epoch': 0.32} 32%|███▏ | 2156/6640 
[3:23:57<6:56:03, 5.57s/it] 32%|███▏ | 2157/6640 [3:24:02<6:50:39, 5.50s/it] {'loss': 0.6118, 'learning_rate': 1.5778753190155085e-05, 'epoch': 0.32} 32%|███▏ | 2157/6640 [3:24:02<6:50:39, 5.50s/it] 32%|███▎ | 2158/6640 [3:24:08<6:48:07, 5.46s/it] {'loss': 0.5835, 'learning_rate': 1.5774771241077612e-05, 'epoch': 0.33} 32%|███▎ | 2158/6640 [3:24:08<6:48:07, 5.46s/it] 33%|███▎ | 2159/6640 [3:24:13<6:49:05, 5.48s/it] {'loss': 0.6116, 'learning_rate': 1.57707879177595e-05, 'epoch': 0.33} 33%|███▎ | 2159/6640 [3:24:13<6:49:05, 5.48s/it] 33%|███▎ | 2160/6640 [3:24:19<6:50:02, 5.49s/it] {'loss': 0.6108, 'learning_rate': 1.5766803221148676e-05, 'epoch': 0.33} 33%|███▎ | 2160/6640 [3:24:19<6:50:02, 5.49s/it] 33%|███▎ | 2161/6640 [3:24:24<6:46:10, 5.44s/it] {'loss': 0.6253, 'learning_rate': 1.5762817152193383e-05, 'epoch': 0.33} 33%|███▎ | 2161/6640 [3:24:24<6:46:10, 5.44s/it] 33%|███▎ | 2162/6640 [3:24:29<6:46:26, 5.45s/it] {'loss': 0.6109, 'learning_rate': 1.5758829711842208e-05, 'epoch': 0.33} 33%|███▎ | 2162/6640 [3:24:29<6:46:26, 5.45s/it] 33%|███▎ | 2163/6640 [3:24:36<7:15:23, 5.84s/it] {'loss': 0.5922, 'learning_rate': 1.5754840901044054e-05, 'epoch': 0.33} 33%|███▎ | 2163/6640 [3:24:36<7:15:23, 5.84s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5014 > 4096). 
Running this sequence through the model will result in indexing errors 33%|███▎ | 2164/6640 [3:24:41<7:03:52, 5.68s/it] {'loss': 0.6097, 'learning_rate': 1.5750850720748146e-05, 'epoch': 0.33} 33%|███▎ | 2164/6640 [3:24:41<7:03:52, 5.68s/it] 33%|███▎ | 2165/6640 [3:24:47<6:58:43, 5.61s/it] {'loss': 0.6042, 'learning_rate': 1.574685917190404e-05, 'epoch': 0.33} 33%|███▎ | 2165/6640 [3:24:47<6:58:43, 5.61s/it] 33%|███▎ | 2166/6640 [3:24:52<6:48:15, 5.48s/it] {'loss': 0.5933, 'learning_rate': 1.574286625546162e-05, 'epoch': 0.33} 33%|███▎ | 2166/6640 [3:24:52<6:48:15, 5.48s/it] 33%|███▎ | 2167/6640 [3:24:57<6:47:13, 5.46s/it] {'loss': 0.5859, 'learning_rate': 1.5738871972371096e-05, 'epoch': 0.33} 33%|███▎ | 2167/6640 [3:24:57<6:47:13, 5.46s/it] 33%|███▎ | 2168/6640 [3:25:03<6:51:51, 5.53s/it] {'loss': 0.6083, 'learning_rate': 1.5734876323582996e-05, 'epoch': 0.33} 33%|███▎ | 2168/6640 [3:25:03<6:51:51, 5.53s/it] 33%|███▎ | 2169/6640 [3:25:08<6:47:07, 5.46s/it] {'loss': 0.6217, 'learning_rate': 1.5730879310048175e-05, 'epoch': 0.33} 33%|███▎ | 2169/6640 [3:25:08<6:47:07, 5.46s/it] 33%|███▎ | 2170/6640 [3:25:14<6:51:53, 5.53s/it] {'loss': 0.6056, 'learning_rate': 1.572688093271782e-05, 'epoch': 0.33} 33%|███▎ | 2170/6640 [3:25:14<6:51:53, 5.53s/it] 33%|███▎ | 2171/6640 [3:25:20<6:55:12, 5.57s/it] {'loss': 0.6242, 'learning_rate': 1.5722881192543433e-05, 'epoch': 0.33} 33%|███▎ | 2171/6640 [3:25:20<6:55:12, 5.57s/it] 33%|███▎ | 2172/6640 [3:25:27<7:40:37, 6.19s/it] {'loss': 0.6107, 'learning_rate': 1.5718880090476852e-05, 'epoch': 0.33} 33%|███▎ | 2172/6640 [3:25:27<7:40:37, 6.19s/it] 33%|███▎ | 2173/6640 [3:25:33<7:24:33, 5.97s/it] {'loss': 0.598, 'learning_rate': 1.5714877627470225e-05, 'epoch': 0.33} 33%|███▎ | 2173/6640 [3:25:33<7:24:33, 5.97s/it] 33%|███▎ | 2174/6640 [3:25:38<7:08:15, 5.75s/it] {'loss': 0.6023, 'learning_rate': 1.5710873804476035e-05, 'epoch': 0.33} 33%|███▎ | 2174/6640 [3:25:38<7:08:15, 5.75s/it] 33%|███▎ | 2175/6640 [3:25:46<7:59:11, 6.44s/it] 
{'loss': 0.6054, 'learning_rate': 1.5706868622447084e-05, 'epoch': 0.33} 33%|███▎ | 2175/6640 [3:25:46<7:59:11, 6.44s/it] 33%|███▎ | 2176/6640 [3:25:52<7:34:22, 6.11s/it] {'loss': 0.5998, 'learning_rate': 1.570286208233649e-05, 'epoch': 0.33} 33%|███▎ | 2176/6640 [3:25:52<7:34:22, 6.11s/it] 33%|███▎ | 2177/6640 [3:25:57<7:20:49, 5.93s/it] {'loss': 0.615, 'learning_rate': 1.5698854185097713e-05, 'epoch': 0.33} 33%|███▎ | 2177/6640 [3:25:57<7:20:49, 5.93s/it] 33%|███▎ | 2178/6640 [3:26:03<7:14:23, 5.84s/it] {'loss': 0.6087, 'learning_rate': 1.569484493168452e-05, 'epoch': 0.33} 33%|███▎ | 2178/6640 [3:26:03<7:14:23, 5.84s/it] 33%|███▎ | 2179/6640 [3:26:09<7:32:07, 6.08s/it] {'loss': 0.5904, 'learning_rate': 1.569083432305101e-05, 'epoch': 0.33} 33%|███▎ | 2179/6640 [3:26:09<7:32:07, 6.08s/it] 33%|███▎ | 2180/6640 [3:26:15<7:24:50, 5.98s/it] {'loss': 0.6115, 'learning_rate': 1.568682236015159e-05, 'epoch': 0.33} 33%|███▎ | 2180/6640 [3:26:15<7:24:50, 5.98s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 33%|███▎ | 2181/6640 [3:26:23<8:15:44, 6.67s/it] {'loss': 0.6258, 'learning_rate': 1.5682809043941013e-05, 'epoch': 0.33} 33%|███▎ | 2181/6640 [3:26:23<8:15:44, 6.67s/it] 33%|███▎ | 2182/6640 [3:26:29<7:49:10, 6.31s/it] {'loss': 0.5918, 'learning_rate': 1.567879437537433e-05, 'epoch': 0.33} 33%|███▎ | 2182/6640 [3:26:29<7:49:10, 6.31s/it] 33%|███▎ | 2183/6640 [3:26:34<7:29:30, 6.05s/it] {'loss': 0.6127, 'learning_rate': 1.5674778355406928e-05, 'epoch': 0.33} 33%|███▎ | 2183/6640 [3:26:34<7:29:30, 6.05s/it] 33%|███▎ | 2184/6640 [3:26:40<7:11:31, 5.81s/it] {'loss': 0.5987, 'learning_rate': 1.5670760984994516e-05, 'epoch': 0.33} 33%|███▎ | 2184/6640 [3:26:40<7:11:31, 5.81s/it] 33%|███▎ | 2185/6640 [3:26:45<7:02:27, 5.69s/it] {'loss': 0.6088, 'learning_rate': 1.566674226509311e-05, 'epoch': 0.33} 33%|███▎ | 2185/6640 [3:26:45<7:02:27, 5.69s/it] 33%|███▎ | 2186/6640 [3:26:50<6:57:25, 5.62s/it] {'loss': 0.6253, 'learning_rate': 1.566272219665907e-05, 'epoch': 0.33} 33%|███▎ | 2186/6640 [3:26:50<6:57:25, 5.62s/it] 33%|███▎ | 2187/6640 [3:26:56<6:48:22, 5.50s/it] {'loss': 0.6129, 'learning_rate': 1.5658700780649057e-05, 'epoch': 0.33} 33%|███▎ | 2187/6640 [3:26:56<6:48:22, 5.50s/it] 33%|███▎ | 2188/6640 [3:27:01<6:46:23, 5.48s/it] {'loss': 0.6079, 'learning_rate': 1.565467801802006e-05, 'epoch': 0.33} 33%|███▎ | 2188/6640 [3:27:01<6:46:23, 5.48s/it] 33%|███▎ | 2189/6640 [3:27:06<6:45:43, 5.47s/it] {'loss': 0.6014, 'learning_rate': 1.565065390972939e-05, 'epoch': 0.33} 33%|███▎ | 2189/6640 [3:27:06<6:45:43, 5.47s/it] 33%|███▎ | 2190/6640 [3:27:12<6:43:56, 5.45s/it] {'loss': 0.6023, 'learning_rate': 1.564662845673468e-05, 'epoch': 0.33} 33%|███▎ | 2190/6640 [3:27:12<6:43:56, 5.45s/it] 33%|███▎ | 2191/6640 [3:27:17<6:44:50, 5.46s/it] {'loss': 0.5942, 'learning_rate': 1.5642601659993877e-05, 'epoch': 0.33} 33%|███▎ | 2191/6640 [3:27:17<6:44:50, 5.46s/it] 33%|███▎ | 2192/6640 [3:27:23<6:46:49, 5.49s/it] {'loss': 0.5979, 'learning_rate': 
1.563857352046525e-05, 'epoch': 0.33} 33%|███▎ | 2192/6640 [3:27:23<6:46:49, 5.49s/it] 33%|███▎ | 2193/6640 [3:27:28<6:47:37, 5.50s/it] {'loss': 0.6204, 'learning_rate': 1.563454403910739e-05, 'epoch': 0.33} 33%|███▎ | 2193/6640 [3:27:28<6:47:37, 5.50s/it] 33%|███▎ | 2194/6640 [3:27:34<6:49:41, 5.53s/it] {'loss': 0.6112, 'learning_rate': 1.5630513216879203e-05, 'epoch': 0.33} 33%|███▎ | 2194/6640 [3:27:34<6:49:41, 5.53s/it] 33%|███▎ | 2195/6640 [3:27:39<6:47:51, 5.51s/it] {'loss': 0.602, 'learning_rate': 1.5626481054739916e-05, 'epoch': 0.33} 33%|███▎ | 2195/6640 [3:27:39<6:47:51, 5.51s/it] 33%|███▎ | 2196/6640 [3:27:45<6:46:55, 5.49s/it] {'loss': 0.6372, 'learning_rate': 1.562244755364908e-05, 'epoch': 0.33} 33%|███▎ | 2196/6640 [3:27:45<6:46:55, 5.49s/it] 33%|███▎ | 2197/6640 [3:27:50<6:46:44, 5.49s/it] {'loss': 0.6126, 'learning_rate': 1.5618412714566555e-05, 'epoch': 0.33} 33%|███▎ | 2197/6640 [3:27:50<6:46:44, 5.49s/it] 33%|███▎ | 2198/6640 [3:27:56<6:41:10, 5.42s/it] {'loss': 0.5974, 'learning_rate': 1.5614376538452524e-05, 'epoch': 0.33} 33%|███▎ | 2198/6640 [3:27:56<6:41:10, 5.42s/it] 33%|███▎ | 2199/6640 [3:28:01<6:42:56, 5.44s/it] {'loss': 0.6035, 'learning_rate': 1.5610339026267497e-05, 'epoch': 0.33} 33%|███▎ | 2199/6640 [3:28:01<6:42:56, 5.44s/it]4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
33%|███▎ | 2200/6640 [3:28:07<6:43:02, 5.45s/it] {'loss': 0.5977, 'learning_rate': 1.560630017897229e-05, 'epoch': 0.33} 33%|███▎ | 2200/6640 [3:28:07<6:43:02, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 33%|███▎ | 2201/6640 [3:28:35<15:12:25, 12.33s/it] {'loss': 0.5995, 'learning_rate': 1.5602259997528028e-05, 'epoch': 0.33} 33%|███▎ | 2201/6640 [3:28:35<15:12:25, 12.33s/it] 33%|███▎ | 2202/6640 [3:28:41<12:40:48, 10.29s/it] {'loss': 0.591, 'learning_rate': 1.5598218482896182e-05, 'epoch': 0.33} 33%|███▎ | 2202/6640 [3:28:41<12:40:48, 10.29s/it] 33%|███▎ | 2203/6640 [3:28:46<10:50:16, 8.79s/it] {'loss': 0.6127, 'learning_rate': 1.559417563603852e-05, 'epoch': 0.33} 33%|███▎ | 2203/6640 [3:28:46<10:50:16, 8.79s/it] 33%|███▎ | 2204/6640 [3:28:51<9:32:11, 7.74s/it] {'loss': 0.6322, 'learning_rate': 1.5590131457917128e-05, 'epoch': 0.33} 33%|███▎ | 2204/6640 [3:28:51<9:32:11, 7.74s/it] 33%|███▎ | 2205/6640 [3:28:56<8:37:25, 7.00s/it] {'loss': 0.5997, 'learning_rate': 1.558608594949441e-05, 'epoch': 0.33} 33%|███▎ | 2205/6640 [3:28:56<8:37:25, 7.00s/it] 33%|███▎ | 2206/6640 [3:29:02<8:06:14, 6.58s/it] {'loss': 0.6064, 'learning_rate': 1.5582039111733096e-05, 'epoch': 0.33} 33%|███▎ | 2206/6640 [3:29:02<8:06:14, 6.58s/it] 33%|███▎ | 
2207/6640 [3:29:07<7:41:19, 6.24s/it] {'loss': 0.6117, 'learning_rate': 1.5577990945596217e-05, 'epoch': 0.33} 33%|███▎ | 2207/6640 [3:29:07<7:41:19, 6.24s/it] 33%|███▎ | 2208/6640 [3:29:13<7:26:07, 6.04s/it] {'loss': 0.5755, 'learning_rate': 1.557394145204713e-05, 'epoch': 0.33} 33%|███▎ | 2208/6640 [3:29:13<7:26:07, 6.04s/it] 33%|███▎ | 2209/6640 [3:29:18<7:08:56, 5.81s/it] {'loss': 0.5979, 'learning_rate': 1.5569890632049515e-05, 'epoch': 0.33} 33%|███▎ | 2209/6640 [3:29:18<7:08:56, 5.81s/it] 33%|███▎ | 2210/6640 [3:29:24<6:59:16, 5.68s/it] {'loss': 0.6242, 'learning_rate': 1.5565838486567343e-05, 'epoch': 0.33} 33%|███▎ | 2210/6640 [3:29:24<6:59:16, 5.68s/it] 33%|███▎ | 2211/6640 [3:29:29<6:56:40, 5.64s/it] {'loss': 0.6059, 'learning_rate': 1.5561785016564928e-05, 'epoch': 0.33} 33%|███▎ | 2211/6640 [3:29:29<6:56:40, 5.64s/it] 33%|███▎ | 2212/6640 [3:29:35<6:55:55, 5.64s/it] {'loss': 0.6122, 'learning_rate': 1.555773022300688e-05, 'epoch': 0.33} 33%|███▎ | 2212/6640 [3:29:35<6:55:55, 5.64s/it] 33%|███▎ | 2213/6640 [3:29:41<7:00:32, 5.70s/it] {'loss': 0.6036, 'learning_rate': 1.5553674106858135e-05, 'epoch': 0.33} 33%|███▎ | 2213/6640 [3:29:41<7:00:32, 5.70s/it] 33%|███▎ | 2214/6640 [3:29:46<6:56:05, 5.64s/it] {'loss': 0.622, 'learning_rate': 1.5549616669083937e-05, 'epoch': 0.33} 33%|███▎ | 2214/6640 [3:29:46<6:56:05, 5.64s/it] 33%|███▎ | 2215/6640 [3:29:52<6:53:35, 5.61s/it] {'loss': 0.6069, 'learning_rate': 1.554555791064985e-05, 'epoch': 0.33} 33%|███▎ | 2215/6640 [3:29:52<6:53:35, 5.61s/it] 33%|███▎ | 2216/6640 [3:29:57<6:53:51, 5.61s/it] {'loss': 0.584, 'learning_rate': 1.554149783252175e-05, 'epoch': 0.33} 33%|███▎ | 2216/6640 [3:29:57<6:53:51, 5.61s/it] 33%|███▎ | 2217/6640 [3:30:03<6:46:04, 5.51s/it] {'loss': 0.6161, 'learning_rate': 1.5537436435665823e-05, 'epoch': 0.33} 33%|███▎ | 2217/6640 [3:30:03<6:46:04, 5.51s/it] 33%|███▎ | 2218/6640 [3:30:08<6:45:53, 5.51s/it] {'loss': 0.5964, 'learning_rate': 1.5533373721048576e-05, 'epoch': 0.33} 33%|███▎ | 
2218/6640 [3:30:08<6:45:53, 5.51s/it] 33%|███▎ | 2219/6640 [3:30:14<6:44:14, 5.49s/it] {'loss': 0.6083, 'learning_rate': 1.5529309689636826e-05, 'epoch': 0.33} 33%|███▎ | 2219/6640 [3:30:14<6:44:14, 5.49s/it] 33%|███▎ | 2220/6640 [3:30:19<6:42:37, 5.47s/it] {'loss': 0.61, 'learning_rate': 1.55252443423977e-05, 'epoch': 0.33} 33%|███▎ | 2220/6640 [3:30:19<6:42:37, 5.47s/it] 33%|███▎ | 2221/6640 [3:30:24<6:38:34, 5.41s/it] {'loss': 0.5636, 'learning_rate': 1.5521177680298645e-05, 'epoch': 0.33} 33%|███▎ | 2221/6640 [3:30:24<6:38:34, 5.41s/it] 33%|███▎ | 2222/6640 [3:30:30<6:44:17, 5.49s/it] {'loss': 0.6277, 'learning_rate': 1.5517109704307417e-05, 'epoch': 0.33} 33%|███▎ | 2222/6640 [3:30:30<6:44:17, 5.49s/it] 33%|███▎ | 2223/6640 [3:30:35<6:37:39, 5.40s/it] {'loss': 0.5872, 'learning_rate': 1.551304041539208e-05, 'epoch': 0.33} 33%|███▎ | 2223/6640 [3:30:35<6:37:39, 5.40s/it] 33%|███▎ | 2224/6640 [3:30:40<6:36:09, 5.38s/it] {'loss': 0.6059, 'learning_rate': 1.5508969814521026e-05, 'epoch': 0.33} 33%|███▎ | 2224/6640 [3:30:40<6:36:09, 5.38s/it] 34%|███▎ | 2225/6640 [3:30:46<6:39:51, 5.43s/it] {'loss': 0.6121, 'learning_rate': 1.550489790266294e-05, 'epoch': 0.34} 34%|███▎ | 2225/6640 [3:30:46<6:39:51, 5.43s/it] 34%|███▎ | 2226/6640 [3:30:51<6:39:01, 5.42s/it] {'loss': 0.6123, 'learning_rate': 1.5500824680786832e-05, 'epoch': 0.34} 34%|███▎ | 2226/6640 [3:30:51<6:39:01, 5.42s/it] 34%|███▎ | 2227/6640 [3:30:57<6:38:42, 5.42s/it] {'loss': 0.6269, 'learning_rate': 1.549675014986202e-05, 'epoch': 0.34} 34%|███▎ | 2227/6640 [3:30:57<6:38:42, 5.42s/it] 34%|███▎ | 2228/6640 [3:31:02<6:35:41, 5.38s/it] {'loss': 0.6045, 'learning_rate': 1.5492674310858127e-05, 'epoch': 0.34} 34%|███▎ | 2228/6640 [3:31:02<6:35:41, 5.38s/it] 34%|███▎ | 2229/6640 [3:31:07<6:34:08, 5.36s/it] {'loss': 0.5862, 'learning_rate': 1.5488597164745104e-05, 'epoch': 0.34} 34%|███▎ | 2229/6640 [3:31:07<6:34:08, 5.36s/it] 34%|███▎ | 2230/6640 [3:31:13<6:38:15, 5.42s/it] {'loss': 0.6116, 'learning_rate': 
1.5484518712493188e-05, 'epoch': 0.34} 34%|███▎ | 2230/6640 [3:31:13<6:38:15, 5.42s/it] 34%|███▎ | 2231/6640 [3:31:19<6:39:58, 5.44s/it] {'loss': 0.6048, 'learning_rate': 1.5480438955072954e-05, 'epoch': 0.34} 34%|███▎ | 2231/6640 [3:31:19<6:39:58, 5.44s/it] 34%|███▎ | 2232/6640 [3:31:24<6:43:03, 5.49s/it] {'loss': 0.6115, 'learning_rate': 1.5476357893455268e-05, 'epoch': 0.34} 34%|███▎ | 2232/6640 [3:31:24<6:43:03, 5.49s/it] 34%|███▎ | 2233/6640 [3:31:29<6:39:53, 5.44s/it] {'loss': 0.5899, 'learning_rate': 1.5472275528611317e-05, 'epoch': 0.34} 34%|███▎ | 2233/6640 [3:31:29<6:39:53, 5.44s/it] 34%|███▎ | 2234/6640 [3:31:35<6:42:37, 5.48s/it] {'loss': 0.6206, 'learning_rate': 1.546819186151259e-05, 'epoch': 0.34} 34%|███▎ | 2234/6640 [3:31:35<6:42:37, 5.48s/it] 34%|███▎ | 2235/6640 [3:31:40<6:41:53, 5.47s/it] {'loss': 0.6173, 'learning_rate': 1.5464106893130896e-05, 'epoch': 0.34} 34%|███▎ | 2235/6640 [3:31:40<6:41:53, 5.47s/it] 34%|███▎ | 2236/6640 [3:31:46<6:37:42, 5.42s/it] {'loss': 0.6028, 'learning_rate': 1.5460020624438346e-05, 'epoch': 0.34} 34%|███▎ | 2236/6640 [3:31:46<6:37:42, 5.42s/it] 34%|███▎ | 2237/6640 [3:31:51<6:33:53, 5.37s/it] {'loss': 0.6018, 'learning_rate': 1.545593305640736e-05, 'epoch': 0.34} 34%|███▎ | 2237/6640 [3:31:51<6:33:53, 5.37s/it] 34%|███▎ | 2238/6640 [3:31:56<6:31:26, 5.34s/it] {'loss': 0.5971, 'learning_rate': 1.5451844190010666e-05, 'epoch': 0.34} 34%|███▎ | 2238/6640 [3:31:56<6:31:26, 5.34s/it] 34%|███▎ | 2239/6640 [3:32:02<6:33:45, 5.37s/it] {'loss': 0.5934, 'learning_rate': 1.5447754026221313e-05, 'epoch': 0.34} 34%|███▎ | 2239/6640 [3:32:02<6:33:45, 5.37s/it] 34%|███▎ | 2240/6640 [3:32:07<6:30:04, 5.32s/it] {'loss': 0.5827, 'learning_rate': 1.5443662566012645e-05, 'epoch': 0.34} 34%|███▎ | 2240/6640 [3:32:07<6:30:04, 5.32s/it] 34%|███▍ | 2241/6640 [3:32:12<6:31:04, 5.33s/it] {'loss': 0.6069, 'learning_rate': 1.5439569810358324e-05, 'epoch': 0.34} 34%|███▍ | 2241/6640 [3:32:12<6:31:04, 5.33s/it] 34%|███▍ | 2242/6640 
[3:32:18<6:37:32, 5.42s/it] {'loss': 0.6085, 'learning_rate': 1.543547576023231e-05, 'epoch': 0.34} 34%|███▍ | 2242/6640 [3:32:18<6:37:32, 5.42s/it] 34%|███▍ | 2243/6640 [3:32:23<6:35:13, 5.39s/it] {'loss': 0.6059, 'learning_rate': 1.543138041660888e-05, 'epoch': 0.34} 34%|███▍ | 2243/6640 [3:32:23<6:35:13, 5.39s/it] 34%|███▍ | 2244/6640 [3:32:29<6:36:28, 5.41s/it] {'loss': 0.6061, 'learning_rate': 1.542728378046262e-05, 'epoch': 0.34} 34%|███▍ | 2244/6640 [3:32:29<6:36:28, 5.41s/it] 34%|███▍ | 2245/6640 [3:32:34<6:37:55, 5.43s/it] {'loss': 0.6137, 'learning_rate': 1.542318585276841e-05, 'epoch': 0.34} 34%|███▍ | 2245/6640 [3:32:34<6:37:55, 5.43s/it] 34%|███▍ | 2246/6640 [3:32:40<6:36:40, 5.42s/it] {'loss': 0.6221, 'learning_rate': 1.5419086634501455e-05, 'epoch': 0.34} 34%|███▍ | 2246/6640 [3:32:40<6:36:40, 5.42s/it] 34%|███▍ | 2247/6640 [3:32:45<6:34:25, 5.39s/it] {'loss': 0.6338, 'learning_rate': 1.541498612663726e-05, 'epoch': 0.34} 34%|███▍ | 2247/6640 [3:32:45<6:34:25, 5.39s/it] 34%|███▍ | 2248/6640 [3:32:50<6:33:06, 5.37s/it] {'loss': 0.5932, 'learning_rate': 1.5410884330151628e-05, 'epoch': 0.34} 34%|███▍ | 2248/6640 [3:32:50<6:33:06, 5.37s/it] 34%|███▍ | 2249/6640 [3:32:56<6:35:40, 5.41s/it] {'loss': 0.5944, 'learning_rate': 1.5406781246020683e-05, 'epoch': 0.34} 34%|███▍ | 2249/6640 [3:32:56<6:35:40, 5.41s/it]2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 34%|███▍ | 2250/6640 [3:33:01<6:37:29, 5.43s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.618, 'learning_rate': 1.5402676875220847e-05, 'epoch': 0.34} 34%|███▍ | 2250/6640 [3:33:01<6:37:29, 5.43s/it] 34%|███▍ | 2251/6640 [3:33:07<6:36:15, 5.42s/it] {'loss': 0.5799, 'learning_rate': 1.539857121872885e-05, 'epoch': 0.34} 34%|███▍ | 2251/6640 [3:33:07<6:36:15, 5.42s/it] 34%|███▍ | 2252/6640 [3:33:12<6:38:06, 5.44s/it] {'loss': 0.6079, 'learning_rate': 1.5394464277521727e-05, 'epoch': 0.34} 34%|███▍ | 2252/6640 [3:33:12<6:38:06, 5.44s/it] 34%|███▍ | 2253/6640 [3:33:18<6:37:38, 5.44s/it] {'loss': 0.6229, 'learning_rate': 1.539035605257682e-05, 'epoch': 0.34} 34%|███▍ | 2253/6640 [3:33:18<6:37:38, 5.44s/it] 34%|███▍ | 2254/6640 [3:33:23<6:39:56, 5.47s/it] {'loss': 0.6107, 'learning_rate': 1.538624654487178e-05, 'epoch': 0.34} 34%|███▍ | 2254/6640 [3:33:23<6:39:56, 5.47s/it] 34%|███▍ | 2255/6640 [3:33:29<6:40:04, 5.47s/it] {'loss': 0.6105, 'learning_rate': 1.5382135755384554e-05, 'epoch': 0.34} 34%|███▍ | 2255/6640 [3:33:29<6:40:04, 5.47s/it] 34%|███▍ | 2256/6640 [3:33:34<6:36:06, 5.42s/it] {'loss': 0.6162, 'learning_rate': 1.5378023685093408e-05, 'epoch': 0.34} 34%|███▍ | 2256/6640 [3:33:34<6:36:06, 5.42s/it] 34%|███▍ | 2257/6640 [3:33:39<6:34:30, 5.40s/it] {'loss': 0.5928, 'learning_rate': 1.537391033497689e-05, 'epoch': 0.34} 34%|███▍ | 2257/6640 [3:33:39<6:34:30, 5.40s/it] 34%|███▍ | 2258/6640 [3:33:45<6:33:41, 5.39s/it] {'loss': 0.59, 'learning_rate': 1.536979570601388e-05, 'epoch': 0.34} 34%|███▍ | 2258/6640 [3:33:45<6:33:41, 5.39s/it] 34%|███▍ | 2259/6640 [3:33:50<6:33:53, 5.39s/it] {'loss': 0.5892, 'learning_rate': 1.5365679799183548e-05, 'epoch': 0.34} 34%|███▍ | 2259/6640 [3:33:50<6:33:53, 5.39s/it] 34%|███▍ | 2260/6640 [3:33:55<6:33:51, 5.40s/it] {'loss': 0.5902, 'learning_rate': 1.5361562615465366e-05, 'epoch': 0.34} 34%|███▍ | 2260/6640 [3:33:55<6:33:51, 5.40s/it] 34%|███▍ | 2261/6640 [3:34:01<6:34:34, 5.41s/it] {'loss': 0.5983, 'learning_rate': 1.535744415583911e-05, 'epoch': 0.34} 34%|███▍ | 2261/6640 [3:34:01<6:34:34, 5.41s/it] 
34%|███▍ | 2262/6640 [3:34:06<6:35:05, 5.41s/it] {'loss': 0.5999, 'learning_rate': 1.535332442128487e-05, 'epoch': 0.34} 34%|███▍ | 2262/6640 [3:34:06<6:35:05, 5.41s/it] 34%|███▍ | 2263/6640 [3:34:12<6:34:18, 5.41s/it] {'loss': 0.6104, 'learning_rate': 1.5349203412783028e-05, 'epoch': 0.34} 34%|███▍ | 2263/6640 [3:34:12<6:34:18, 5.41s/it] 34%|███▍ | 2264/6640 [3:34:17<6:39:28, 5.48s/it] {'loss': 0.5755, 'learning_rate': 1.5345081131314276e-05, 'epoch': 0.34} 34%|███▍ | 2264/6640 [3:34:17<6:39:28, 5.48s/it] 34%|███▍ | 2265/6640 [3:34:23<6:36:35, 5.44s/it] {'loss': 0.6071, 'learning_rate': 1.5340957577859605e-05, 'epoch': 0.34} 34%|███▍ | 2265/6640 [3:34:23<6:36:35, 5.44s/it] 34%|███▍ | 2266/6640 [3:34:28<6:37:38, 5.45s/it] {'loss': 0.5906, 'learning_rate': 1.533683275340031e-05, 'epoch': 0.34} 34%|███▍ | 2266/6640 [3:34:28<6:37:38, 5.45s/it] 34%|███▍ | 2267/6640 [3:34:34<6:36:48, 5.44s/it] {'loss': 0.6227, 'learning_rate': 1.5332706658917985e-05, 'epoch': 0.34} 34%|███▍ | 2267/6640 [3:34:34<6:36:48, 5.44s/it] 34%|███▍ | 2268/6640 [3:34:39<6:36:38, 5.44s/it] {'loss': 0.5964, 'learning_rate': 1.5328579295394534e-05, 'epoch': 0.34} 34%|███▍ | 2268/6640 [3:34:39<6:36:38, 5.44s/it] 34%|███▍ | 2269/6640 [3:34:44<6:36:50, 5.45s/it] {'loss': 0.6075, 'learning_rate': 1.5324450663812164e-05, 'epoch': 0.34} 34%|███▍ | 2269/6640 [3:34:44<6:36:50, 5.45s/it] 34%|███▍ | 2270/6640 [3:34:50<6:34:37, 5.42s/it] {'loss': 0.6077, 'learning_rate': 1.5320320765153367e-05, 'epoch': 0.34} 34%|███▍ | 2270/6640 [3:34:50<6:34:37, 5.42s/it] 34%|███▍ | 2271/6640 [3:34:55<6:36:52, 5.45s/it] {'loss': 0.5982, 'learning_rate': 1.5316189600400955e-05, 'epoch': 0.34} 34%|███▍ | 2271/6640 [3:34:55<6:36:52, 5.45s/it] 34%|███▍ | 2272/6640 [3:35:01<6:36:12, 5.44s/it] {'loss': 0.5837, 'learning_rate': 1.5312057170538033e-05, 'epoch': 0.34} 34%|███▍ | 2272/6640 [3:35:01<6:36:12, 5.44s/it] 34%|███▍ | 2273/6640 [3:35:06<6:37:45, 5.46s/it] {'loss': 0.6258, 'learning_rate': 1.530792347654801e-05, 'epoch': 0.34} 
34%|███▍ | 2273/6640 [3:35:06<6:37:45, 5.46s/it] 34%|███▍ | 2274/6640 [3:35:12<6:35:59, 5.44s/it] {'loss': 0.6007, 'learning_rate': 1.5303788519414594e-05, 'epoch': 0.34} 34%|███▍ | 2274/6640 [3:35:12<6:35:59, 5.44s/it] 34%|███▍ | 2275/6640 [3:35:17<6:33:40, 5.41s/it] {'loss': 0.5964, 'learning_rate': 1.5299652300121792e-05, 'epoch': 0.34} 34%|███▍ | 2275/6640 [3:35:17<6:33:40, 5.41s/it] 34%|███▍ | 2276/6640 [3:35:22<6:32:28, 5.40s/it] {'loss': 0.6249, 'learning_rate': 1.5295514819653913e-05, 'epoch': 0.34} 34%|███▍ | 2276/6640 [3:35:22<6:32:28, 5.40s/it] 34%|███▍ | 2277/6640 [3:35:28<6:32:06, 5.39s/it] {'loss': 0.6133, 'learning_rate': 1.529137607899557e-05, 'epoch': 0.34} 34%|███▍ | 2277/6640 [3:35:28<6:32:06, 5.39s/it] 34%|███▍ | 2278/6640 [3:35:33<6:27:45, 5.33s/it] {'loss': 0.615, 'learning_rate': 1.5287236079131668e-05, 'epoch': 0.34} 34%|███▍ | 2278/6640 [3:35:33<6:27:45, 5.33s/it] 34%|███▍ | 2279/6640 [3:35:38<6:28:44, 5.35s/it] {'loss': 0.6008, 'learning_rate': 1.5283094821047416e-05, 'epoch': 0.34} 34%|███▍ | 2279/6640 [3:35:38<6:28:44, 5.35s/it] 34%|███▍ | 2280/6640 [3:35:44<6:27:14, 5.33s/it] {'loss': 0.566, 'learning_rate': 1.5278952305728325e-05, 'epoch': 0.34} 34%|███▍ | 2280/6640 [3:35:44<6:27:14, 5.33s/it] 34%|███▍ | 2281/6640 [3:35:49<6:29:18, 5.36s/it] {'loss': 0.6172, 'learning_rate': 1.5274808534160203e-05, 'epoch': 0.34} 34%|███▍ | 2281/6640 [3:35:49<6:29:18, 5.36s/it] 34%|███▍ | 2282/6640 [3:35:54<6:29:40, 5.36s/it] {'loss': 0.6138, 'learning_rate': 1.5270663507329152e-05, 'epoch': 0.34} 34%|███▍ | 2282/6640 [3:35:54<6:29:40, 5.36s/it] 34%|███▍ | 2283/6640 [3:36:00<6:32:07, 5.40s/it] {'loss': 0.582, 'learning_rate': 1.5266517226221585e-05, 'epoch': 0.34} 34%|███▍ | 2283/6640 [3:36:00<6:32:07, 5.40s/it] 34%|███▍ | 2284/6640 [3:36:05<6:32:33, 5.41s/it] {'loss': 0.6091, 'learning_rate': 1.52623696918242e-05, 'epoch': 0.34} 34%|███▍ | 2284/6640 [3:36:05<6:32:33, 5.41s/it] 34%|███▍ | 2285/6640 [3:36:11<6:36:03, 5.46s/it] {'loss': 0.5966, 
'learning_rate': 1.5258220905123997e-05, 'epoch': 0.34} 34%|███▍ | 2285/6640 [3:36:11<6:36:03, 5.46s/it] 34%|███▍ | 2286/6640 [3:36:16<6:36:29, 5.46s/it] {'loss': 0.5957, 'learning_rate': 1.5254070867108277e-05, 'epoch': 0.34} 34%|███▍ | 2286/6640 [3:36:16<6:36:29, 5.46s/it] 34%|███▍ | 2287/6640 [3:36:22<6:32:39, 5.41s/it] {'loss': 0.626, 'learning_rate': 1.5249919578764641e-05, 'epoch': 0.34} 34%|███▍ | 2287/6640 [3:36:22<6:32:39, 5.41s/it] 34%|███▍ | 2288/6640 [3:36:27<6:33:48, 5.43s/it] {'loss': 0.595, 'learning_rate': 1.5245767041080983e-05, 'epoch': 0.34} 34%|███▍ | 2288/6640 [3:36:27<6:33:48, 5.43s/it] 34%|███▍ | 2289/6640 [3:36:32<6:29:58, 5.38s/it] {'loss': 0.5921, 'learning_rate': 1.5241613255045495e-05, 'epoch': 0.34} 34%|███▍ | 2289/6640 [3:36:32<6:29:58, 5.38s/it] 34%|███▍ | 2290/6640 [3:36:38<6:33:56, 5.43s/it] {'loss': 0.6167, 'learning_rate': 1.5237458221646668e-05, 'epoch': 0.34} 34%|███▍ | 2290/6640 [3:36:38<6:33:56, 5.43s/it]WARNING: tokenization mismatch: 1 vs. 1440. [[{'from': 'human', 'value': 'ASSISTANT: Du bist ein Personalmarketing-Experte mit den folgenden Kenntnissen und Eigenschaften.\n\nKENNTNISSE: Personalmarketing, Social Media Marketing, Facebook-Marketing, Kampagnenentwicklung, Zielgruppenanalyse, Content-Strategie, Werbeanzeigen, Performance-Messung, Personalgewinnung, Steuerfachangestellte.\n\nEIGENSCHAFTEN: Kreativ, strategisch, analytisch, kommunikativ, zielorientiert, detailverliebt, teamorientiert.\n\nAUFGABE: Entwickle eine erfolgreiche Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook. Erstelle eine Tabelle mit allen Maßnahmen und weise jeder Maßnahme den zuständigen Spezialisten in einer Marketingagentur zu. 
Achte dabei auf Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung.\n\nOUTPUT: Eine Tabelle mit allen Maßnahmen für eine erfolgreiche Personalmarketing-Kampagne zur Gewinnung von Steuerfachangestellten auf Facebook, die die Verantwortlichkeiten der Spezialisten in einer Marketingagentur aufzeigt, einschließlich Zielgruppenanalyse, Content-Strategie, Werbeanzeigen und Performance-Messung.'}, {'from': 'gpt', 'value': '| Maßnahme | Beschreibung | Zuständiger Spezialist |\n| --- | --- | --- |\n| Zielgruppenanalyse | Analyse der demografischen Daten, Interessen und Verhaltensweisen der potenziellen Bewerber. | Marktanalyst |\n| Erstellung von Personas | Entwicklung von fiktiven, repräsentativen Bewerberprofilen zur besseren Zielgruppenansprache. | Personalmarketing-Experte |\n| Content-Strategie | Planung von relevantem, ansprechendem und zielgruppenspezifischem Content. | Content-Stratege |\n| Redaktionsplan | Erstellung eines Zeitplans für die Veröffentlichung von Inhalten auf Facebook. | Social Media Manager |\n| Erstellung von Werbeanzeigen | Design und Texten von ansprechenden, zielgruppenspezifischen Facebook-Werbeanzeigen. | Art Director & Copywriter |\n| Auswahl von Targeting-Optionen | Festlegung der Zielgruppenparameter für die Facebook-Werbekampagnen. | Facebook-Marketing-Experte |\n| Budgetierung und Zeitplanung | Festlegung des Budgets und der Laufzeit für die Facebook-Werbekampagnen. | Media Planner |\n| Erstellung von Landing Pages | Gestaltung und Implementierung von spezifischen Landing Pages für die Bewerbung. | Web-Designer |\n| Performance-Messung | Überwachung und Analyse der Kampagnenleistung, einschließlich KPIs wie Klicks und Bewerbungen. | Data Analyst |\n| A/B-Testing | Durchführung von A/B-Tests, um die Effektivität von Anzeigen und Landing Pages zu optimieren. 
| Conversion-Optimierungsexperte |\n| Community-Management | Interaktion mit der Zielgruppe auf Facebook, Beantwortung von Fragen und Moderation von Diskussionen. | Community Manager |\n| Reporting und Optimierung | Regelmäßige Berichterstattung über die Kampagnenergebnisse und Anpassung der Strategie. | Performance Marketing-Manager |'}, {'from': 'human', 'value': 'Weiter.'}, {'from': 'gpt', 'value': 'Hier sind einige zusätzliche Maßnahmen, die Sie in Ihre Personalmarketing-Strategie für die Gewinnung von Steuerfachangestellten über Facebook einbeziehen können:\n\n| Maßnahme | Beschreibung | Zuständiger Spezialist |\n| --- | --- | --- |\n| Video-Content | Erstellung von Video-Content, wie Interviews mit aktuellen Mitarbeitern oder Berufsalltagsvideos. | Videoproduzent |\n| Influencer-Marketing | Zusammenarbeit mit Influencern aus der Branche, um Ihre Arbeitgebermarke zu stärken. | Influencer-Marketing-Experte |\n| Anpassung von Stellenanzeigen | Anpassung der Stellenanzeigen auf Facebook, um die Zielgruppe besser anzusprechen. | Personalmarketing-Experte |\n| Mitarbeiter-Testimonials | Teilen von Erfahrungsberichten und Zitaten der Mitarbeiter auf Facebook, um Authentizität zu schaffen. | Content-Stratege |\n| Virtuelle Karriere-Events | Durchführung von virtuellen Karriere-Events oder Informationsveranstaltungen auf Facebook. | Event-Manager |\n| Interaktive Umfragen und Quiz | Nutzung von interaktiven Umfragen und Quiz auf Facebook, um das Engagement zu erhöhen. | Social Media Manager |\n| Remarketing-Kampagnen | Einsatz von Remarketing-Anzeigen, um die Personen erneut anzusprechen, die bereits interessiert waren. | Facebook-Marketing-Experte |\n| Monitoring von Wettbewerbern | Beobachtung der Personalmarketing-Aktivitäten der Wettbewerber auf Facebook und Ableitung von Optimierungspotenzialen. | Marktanalyst |\n| Integration von Chatbots | Implementierung von Chatbots auf der Facebook-Seite und der Landing Page zur Automatisierung des Bewerbungsprozesses. 
| Chatbot-Entwickler |\n| Employer Branding-Kampagne | Entwicklung einer Employer Branding-Kampagne, die die Unternehmenskultur und -werte hervorhebt. | Employer Branding-Experte |\n\nMit diesen erweiterten Maßnahmen können Sie eine noch umfassendere Personalmarketing-Strategie entwickeln, die die Chancen erhöht, qualifizierte Steuerfachangestellte über Facebook zu gewinnen.'}]] (ignored) 35%|███▍ | 2291/6640 [3:36:43<6:32:55, 5.42s/it] {'loss': 0.5849, 'learning_rate': 1.5233301941873286e-05, 'epoch': 0.35} 35%|███▍ | 2291/6640 [3:36:43<6:32:55, 5.42s/it] 35%|███▍ | 2292/6640 [3:36:49<6:29:38, 5.38s/it] {'loss': 0.6047, 'learning_rate': 1.5229144416714435e-05, 'epoch': 0.35} 35%|███▍ | 2292/6640 [3:36:49<6:29:38, 5.38s/it] 35%|███▍ | 2293/6640 [3:36:54<6:27:05, 5.34s/it] {'loss': 0.6005, 'learning_rate': 1.5224985647159489e-05, 'epoch': 0.35} 35%|███▍ | 2293/6640 [3:36:54<6:27:05, 5.34s/it] 35%|███▍ | 2294/6640 [3:36:59<6:28:04, 5.36s/it] {'loss': 0.605, 'learning_rate': 1.5220825634198133e-05, 'epoch': 0.35} 35%|███▍ | 2294/6640 [3:36:59<6:28:04, 5.36s/it] 35%|███▍ | 2295/6640 [3:37:04<6:25:23, 5.32s/it] {'loss': 0.5821, 'learning_rate': 1.5216664378820327e-05, 'epoch': 0.35} 35%|███▍ | 2295/6640 [3:37:04<6:25:23, 5.32s/it] 35%|███▍ | 2296/6640 [3:37:10<6:25:09, 5.32s/it] {'loss': 0.5961, 'learning_rate': 1.5212501882016346e-05, 'epoch': 0.35} 35%|███▍ | 2296/6640 [3:37:10<6:25:09, 5.32s/it] 35%|███▍ | 2297/6640 [3:37:15<6:25:02, 5.32s/it] {'loss': 0.6108, 'learning_rate': 1.5208338144776754e-05, 'epoch': 0.35} 35%|███▍ | 2297/6640 [3:37:15<6:25:02, 5.32s/it] 35%|███▍ | 2298/6640 [3:37:20<6:25:03, 5.32s/it] {'loss': 0.5838, 'learning_rate': 1.5204173168092405e-05, 'epoch': 0.35} 35%|███▍ | 2298/6640 [3:37:20<6:25:03, 5.32s/it] 35%|███▍ | 2299/6640 [3:37:26<6:27:11, 5.35s/it] {'loss': 0.6063, 'learning_rate': 1.520000695295445e-05, 'epoch': 0.35} 35%|███▍ | 2299/6640 [3:37:26<6:27:11, 5.35s/it]4 AutoResumeHook: Checking whether to suspend... 
0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 35%|███▍ | 2300/6640 [3:37:31<6:33:08, 5.44s/it]5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... {'loss': 0.6303, 'learning_rate': 1.5195839500354337e-05, 'epoch': 0.35} 35%|███▍ | 2300/6640 [3:37:31<6:33:08, 5.44s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 35%|███▍ | 2301/6640 [3:38:01<15:24:33, 12.78s/it] {'loss': 0.6063, 'learning_rate': 1.5191670811283813e-05, 'epoch': 0.35} 35%|███▍ | 2301/6640 [3:38:01<15:24:33, 12.78s/it] 35%|███▍ | 2302/6640 [3:38:07<12:44:46, 10.58s/it] {'loss': 0.6126, 'learning_rate': 1.5187500886734908e-05, 'epoch': 0.35} 35%|███▍ | 2302/6640 [3:38:07<12:44:46, 10.58s/it] 35%|███▍ | 2303/6640 [3:38:12<10:53:25, 9.04s/it] {'loss': 0.623, 'learning_rate': 1.5183329727699957e-05, 'epoch': 0.35} 35%|███▍ | 2303/6640 [3:38:12<10:53:25, 9.04s/it]WARNING: tokenization mismatch: 1 vs. 64. 
[[{'from': 'human', 'value': '\nWhat vitamin is this vegetable associated with?\nAnswer the question using a single word or phrase.'}, {'from': 'gpt', 'value': ''}]] (ignored) 35%|███▍ | 2304/6640 [3:38:18<9:38:02, 8.00s/it] {'loss': 0.6186, 'learning_rate': 1.5179157335171579e-05, 'epoch': 0.35} 35%|███▍ | 2304/6640 [3:38:18<9:38:02, 8.00s/it] 35%|███▍ | 2305/6640 [3:38:23<8:41:26, 7.22s/it] {'loss': 0.6072, 'learning_rate': 1.5174983710142694e-05, 'epoch': 0.35} 35%|███▍ | 2305/6640 [3:38:23<8:41:26, 7.22s/it] 35%|███▍ | 2306/6640 [3:38:29<8:05:46, 6.73s/it] {'loss': 0.5593, 'learning_rate': 1.5170808853606515e-05, 'epoch': 0.35} 35%|███▍ | 2306/6640 [3:38:29<8:05:46, 6.73s/it] 35%|███▍ | 2307/6640 [3:38:34<7:37:49, 6.34s/it] {'loss': 0.6168, 'learning_rate': 1.5166632766556546e-05, 'epoch': 0.35} 35%|███▍ | 2307/6640 [3:38:34<7:37:49, 6.34s/it] 35%|███▍ | 2308/6640 [3:38:40<7:17:52, 6.06s/it] {'loss': 0.5883, 'learning_rate': 1.516245544998658e-05, 'epoch': 0.35} 35%|███▍ | 2308/6640 [3:38:40<7:17:52, 6.06s/it] 35%|███▍ | 2309/6640 [3:38:45<7:06:09, 5.90s/it] {'loss': 0.6071, 'learning_rate': 1.5158276904890708e-05, 'epoch': 0.35} 35%|███▍ | 2309/6640 [3:38:45<7:06:09, 5.90s/it] 35%|███▍ | 2310/6640 [3:38:51<6:56:46, 5.78s/it] {'loss': 0.5996, 'learning_rate': 1.515409713226331e-05, 'epoch': 0.35} 35%|███▍ | 2310/6640 [3:38:51<6:56:46, 5.78s/it] 35%|███▍ | 2311/6640 [3:38:56<6:49:25, 5.67s/it] {'loss': 0.6308, 'learning_rate': 1.5149916133099063e-05, 'epoch': 0.35} 35%|███▍ | 2311/6640 [3:38:56<6:49:25, 5.67s/it] 35%|███▍ | 2312/6640 [3:39:02<6:43:16, 5.59s/it] {'loss': 0.5638, 'learning_rate': 1.514573390839293e-05, 'epoch': 0.35} 35%|███▍ | 2312/6640 [3:39:02<6:43:16, 5.59s/it] 35%|███▍ | 2313/6640 [3:39:07<6:37:03, 5.51s/it] {'loss': 0.5986, 'learning_rate': 1.5141550459140168e-05, 'epoch': 0.35} 35%|███▍ | 2313/6640 [3:39:07<6:37:03, 5.51s/it] 35%|███▍ | 2314/6640 [3:39:13<6:40:39, 5.56s/it] {'loss': 0.6159, 'learning_rate': 1.5137365786336329e-05, 'epoch': 
0.35} 35%|███▍ | 2314/6640 [3:39:13<6:40:39, 5.56s/it] 35%|███▍ | 2315/6640 [3:39:18<6:35:29, 5.49s/it] {'loss': 0.5988, 'learning_rate': 1.513317989097725e-05, 'epoch': 0.35} 35%|███▍ | 2315/6640 [3:39:18<6:35:29, 5.49s/it] 35%|███▍ | 2316/6640 [3:39:23<6:36:06, 5.50s/it] {'loss': 0.6017, 'learning_rate': 1.5128992774059063e-05, 'epoch': 0.35} 35%|███▍ | 2316/6640 [3:39:23<6:36:06, 5.50s/it] 35%|███▍ | 2317/6640 [3:39:29<6:37:40, 5.52s/it] {'loss': 0.5872, 'learning_rate': 1.5124804436578191e-05, 'epoch': 0.35} 35%|███▍ | 2317/6640 [3:39:29<6:37:40, 5.52s/it] 35%|███▍ | 2318/6640 [3:39:34<6:35:11, 5.49s/it] {'loss': 0.612, 'learning_rate': 1.512061487953134e-05, 'epoch': 0.35} 35%|███▍ | 2318/6640 [3:39:34<6:35:11, 5.49s/it] 35%|███▍ | 2319/6640 [3:39:40<6:31:37, 5.44s/it] {'loss': 0.6407, 'learning_rate': 1.5116424103915519e-05, 'epoch': 0.35} 35%|███▍ | 2319/6640 [3:39:40<6:31:37, 5.44s/it] 35%|███▍ | 2320/6640 [3:39:45<6:33:02, 5.46s/it] {'loss': 0.6228, 'learning_rate': 1.5112232110728016e-05, 'epoch': 0.35} 35%|███▍ | 2320/6640 [3:39:45<6:33:02, 5.46s/it] 35%|███▍ | 2321/6640 [3:39:51<6:32:57, 5.46s/it] {'loss': 0.5919, 'learning_rate': 1.5108038900966416e-05, 'epoch': 0.35} 35%|███▍ | 2321/6640 [3:39:51<6:32:57, 5.46s/it] 35%|███▍ | 2322/6640 [3:39:56<6:32:25, 5.45s/it] {'loss': 0.6181, 'learning_rate': 1.5103844475628585e-05, 'epoch': 0.35} 35%|███▍ | 2322/6640 [3:39:56<6:32:25, 5.45s/it] 35%|███▍ | 2323/6640 [3:40:01<6:29:18, 5.41s/it] {'loss': 0.6121, 'learning_rate': 1.509964883571269e-05, 'epoch': 0.35} 35%|███▍ | 2323/6640 [3:40:01<6:29:18, 5.41s/it] 35%|███▌ | 2324/6640 [3:40:07<6:28:38, 5.40s/it] {'loss': 0.6077, 'learning_rate': 1.5095451982217177e-05, 'epoch': 0.35} 35%|███▌ | 2324/6640 [3:40:07<6:28:38, 5.40s/it] 35%|███▌ | 2325/6640 [3:40:12<6:27:52, 5.39s/it] {'loss': 0.6115, 'learning_rate': 1.5091253916140789e-05, 'epoch': 0.35} 35%|███▌ | 2325/6640 [3:40:12<6:27:52, 5.39s/it] 35%|███▌ | 2326/6640 [3:40:18<6:30:24, 5.43s/it] {'loss': 0.5946, 
'learning_rate': 1.508705463848255e-05, 'epoch': 0.35} 35%|███▌ | 2326/6640 [3:40:18<6:30:24, 5.43s/it] 35%|███▌ | 2327/6640 [3:40:23<6:31:47, 5.45s/it] {'loss': 0.5864, 'learning_rate': 1.5082854150241773e-05, 'epoch': 0.35} 35%|███▌ | 2327/6640 [3:40:23<6:31:47, 5.45s/it] 35%|███▌ | 2328/6640 [3:40:29<6:32:13, 5.46s/it] {'loss': 0.6109, 'learning_rate': 1.5078652452418063e-05, 'epoch': 0.35} 35%|███▌ | 2328/6640 [3:40:29<6:32:13, 5.46s/it] 35%|███▌ | 2329/6640 [3:40:34<6:33:49, 5.48s/it] {'loss': 0.5952, 'learning_rate': 1.5074449546011312e-05, 'epoch': 0.35} 35%|███▌ | 2329/6640 [3:40:34<6:33:49, 5.48s/it] 35%|███▌ | 2330/6640 [3:40:40<6:32:11, 5.46s/it] {'loss': 0.5981, 'learning_rate': 1.5070245432021699e-05, 'epoch': 0.35} 35%|███▌ | 2330/6640 [3:40:40<6:32:11, 5.46s/it] 35%|███▌ | 2331/6640 [3:40:45<6:30:16, 5.43s/it] {'loss': 0.5877, 'learning_rate': 1.5066040111449692e-05, 'epoch': 0.35} 35%|███▌ | 2331/6640 [3:40:45<6:30:16, 5.43s/it] 35%|███▌ | 2332/6640 [3:40:51<6:33:21, 5.48s/it] {'loss': 0.62, 'learning_rate': 1.5061833585296044e-05, 'epoch': 0.35} 35%|███▌ | 2332/6640 [3:40:51<6:33:21, 5.48s/it] 35%|███▌ | 2333/6640 [3:40:56<6:30:36, 5.44s/it] {'loss': 0.6038, 'learning_rate': 1.505762585456179e-05, 'epoch': 0.35} 35%|███▌ | 2333/6640 [3:40:56<6:30:36, 5.44s/it] 35%|███▌ | 2334/6640 [3:41:01<6:30:53, 5.45s/it] {'loss': 0.5824, 'learning_rate': 1.5053416920248267e-05, 'epoch': 0.35} 35%|███▌ | 2334/6640 [3:41:01<6:30:53, 5.45s/it] 35%|███▌ | 2335/6640 [3:41:07<6:28:17, 5.41s/it] {'loss': 0.5907, 'learning_rate': 1.5049206783357082e-05, 'epoch': 0.35} 35%|███▌ | 2335/6640 [3:41:07<6:28:17, 5.41s/it] 35%|███▌ | 2336/6640 [3:41:12<6:26:37, 5.39s/it] {'loss': 0.6105, 'learning_rate': 1.504499544489013e-05, 'epoch': 0.35} 35%|███▌ | 2336/6640 [3:41:12<6:26:37, 5.39s/it] 35%|███▌ | 2337/6640 [3:41:18<6:29:04, 5.43s/it] {'loss': 0.5983, 'learning_rate': 1.504078290584961e-05, 'epoch': 0.35} 35%|███▌ | 2337/6640 [3:41:18<6:29:04, 5.43s/it] 35%|███▌ | 
2338/6640 [3:41:23<6:30:42, 5.45s/it] {'loss': 0.6312, 'learning_rate': 1.5036569167237978e-05, 'epoch': 0.35} 35%|███▌ | 2338/6640 [3:41:23<6:30:42, 5.45s/it] 35%|███▌ | 2339/6640 [3:41:29<6:39:28, 5.57s/it] {'loss': 0.6139, 'learning_rate': 1.5032354230058004e-05, 'epoch': 0.35} 35%|███▌ | 2339/6640 [3:41:29<6:39:28, 5.57s/it] 35%|███▌ | 2340/6640 [3:41:34<6:37:17, 5.54s/it] {'loss': 0.5962, 'learning_rate': 1.502813809531272e-05, 'epoch': 0.35} 35%|███▌ | 2340/6640 [3:41:34<6:37:17, 5.54s/it] 35%|███▌ | 2341/6640 [3:41:40<6:33:51, 5.50s/it] {'loss': 0.6019, 'learning_rate': 1.502392076400546e-05, 'epoch': 0.35} 35%|███▌ | 2341/6640 [3:41:40<6:33:51, 5.50s/it] 35%|███▌ | 2342/6640 [3:41:45<6:35:00, 5.51s/it] {'loss': 0.6072, 'learning_rate': 1.501970223713983e-05, 'epoch': 0.35} 35%|███▌ | 2342/6640 [3:41:45<6:35:00, 5.51s/it] 35%|███▌ | 2343/6640 [3:41:51<6:34:04, 5.50s/it] {'loss': 0.5939, 'learning_rate': 1.501548251571973e-05, 'epoch': 0.35} 35%|███▌ | 2343/6640 [3:41:51<6:34:04, 5.50s/it] 35%|███▌ | 2344/6640 [3:41:56<6:32:18, 5.48s/it] {'loss': 0.6161, 'learning_rate': 1.501126160074934e-05, 'epoch': 0.35} 35%|███▌ | 2344/6640 [3:41:56<6:32:18, 5.48s/it] 35%|███▌ | 2345/6640 [3:42:02<6:29:19, 5.44s/it] {'loss': 0.6053, 'learning_rate': 1.5007039493233123e-05, 'epoch': 0.35} 35%|███▌ | 2345/6640 [3:42:02<6:29:19, 5.44s/it] 35%|███▌ | 2346/6640 [3:42:07<6:29:30, 5.44s/it] {'loss': 0.5936, 'learning_rate': 1.5002816194175829e-05, 'epoch': 0.35} 35%|███▌ | 2346/6640 [3:42:07<6:29:30, 5.44s/it] 35%|███▌ | 2347/6640 [3:42:12<6:27:56, 5.42s/it] {'loss': 0.5903, 'learning_rate': 1.4998591704582488e-05, 'epoch': 0.35} 35%|███▌ | 2347/6640 [3:42:12<6:27:56, 5.42s/it] 35%|███▌ | 2348/6640 [3:42:18<6:24:57, 5.38s/it] {'loss': 0.6168, 'learning_rate': 1.4994366025458421e-05, 'epoch': 0.35} 35%|███▌ | 2348/6640 [3:42:18<6:24:57, 5.38s/it] 35%|███▌ | 2349/6640 [3:42:23<6:28:37, 5.43s/it] {'loss': 0.6058, 'learning_rate': 1.4990139157809217e-05, 'epoch': 0.35} 35%|███▌ | 
2349/6640 [3:42:23<6:28:37, 5.43s/it]4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 35%|███▌ | 2350/6640 [3:42:29<6:32:57, 5.50s/it]2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... {'loss': 0.6199, 'learning_rate': 1.4985911102640762e-05, 'epoch': 0.35} 35%|███▌ | 2350/6640 [3:42:29<6:32:57, 5.50s/it] 35%|███▌ | 2351/6640 [3:42:34<6:31:57, 5.48s/it] {'loss': 0.5959, 'learning_rate': 1.4981681860959222e-05, 'epoch': 0.35} 35%|███▌ | 2351/6640 [3:42:34<6:31:57, 5.48s/it] 35%|███▌ | 2352/6640 [3:42:40<6:32:06, 5.49s/it] {'loss': 0.5998, 'learning_rate': 1.4977451433771037e-05, 'epoch': 0.35} 35%|███▌ | 2352/6640 [3:42:40<6:32:06, 5.49s/it] 35%|███▌ | 2353/6640 [3:42:45<6:33:17, 5.50s/it] {'loss': 0.6051, 'learning_rate': 1.4973219822082942e-05, 'epoch': 0.35} 35%|███▌ | 2353/6640 [3:42:45<6:33:17, 5.50s/it] 35%|███▌ | 2354/6640 [3:42:51<6:34:52, 5.53s/it] {'loss': 0.6067, 'learning_rate': 1.4968987026901942e-05, 'epoch': 0.35} 35%|███▌ | 2354/6640 [3:42:51<6:34:52, 5.53s/it] 35%|███▌ | 2355/6640 [3:42:56<6:29:14, 5.45s/it] {'loss': 0.5889, 'learning_rate': 1.4964753049235333e-05, 'epoch': 0.35} 35%|███▌ | 2355/6640 [3:42:56<6:29:14, 5.45s/it] 35%|███▌ | 2356/6640 [3:43:02<6:26:01, 5.41s/it] {'loss': 0.5908, 'learning_rate': 1.4960517890090683e-05, 'epoch': 0.35} 35%|███▌ | 2356/6640 [3:43:02<6:26:01, 5.41s/it] 35%|███▌ | 2357/6640 [3:43:07<6:24:55, 5.39s/it] {'loss': 0.5936, 'learning_rate': 1.4956281550475851e-05, 'epoch': 0.35} 35%|███▌ | 2357/6640 [3:43:07<6:24:55, 5.39s/it] 36%|███▌ | 2358/6640 [3:43:12<6:27:33, 5.43s/it] {'loss': 0.5854, 'learning_rate': 1.4952044031398966e-05, 'epoch': 0.36} 36%|███▌ | 2358/6640 [3:43:12<6:27:33, 5.43s/it] 36%|███▌ | 2359/6640 
[3:43:18<6:25:49, 5.41s/it] {'loss': 0.5684, 'learning_rate': 1.4947805333868453e-05, 'epoch': 0.36} 36%|███▌ | 2359/6640 [3:43:18<6:25:49, 5.41s/it] 36%|███▌ | 2360/6640 [3:43:23<6:27:35, 5.43s/it] {'loss': 0.5919, 'learning_rate': 1.4943565458892999e-05, 'epoch': 0.36} 36%|███▌ | 2360/6640 [3:43:23<6:27:35, 5.43s/it] 36%|███▌ | 2361/6640 [3:43:29<6:28:00, 5.44s/it] {'loss': 0.6103, 'learning_rate': 1.4939324407481588e-05, 'epoch': 0.36} 36%|███▌ | 2361/6640 [3:43:29<6:28:00, 5.44s/it] 36%|███▌ | 2362/6640 [3:43:34<6:30:13, 5.47s/it] {'loss': 0.6024, 'learning_rate': 1.493508218064347e-05, 'epoch': 0.36} 36%|███▌ | 2362/6640 [3:43:34<6:30:13, 5.47s/it] 36%|███▌ | 2363/6640 [3:43:40<6:29:00, 5.46s/it] {'loss': 0.6094, 'learning_rate': 1.4930838779388186e-05, 'epoch': 0.36} 36%|███▌ | 2363/6640 [3:43:40<6:29:00, 5.46s/it] 36%|███▌ | 2364/6640 [3:43:45<6:31:02, 5.49s/it] {'loss': 0.5998, 'learning_rate': 1.4926594204725552e-05, 'epoch': 0.36} 36%|███▌ | 2364/6640 [3:43:45<6:31:02, 5.49s/it] 36%|███▌ | 2365/6640 [3:43:51<6:29:33, 5.47s/it] {'loss': 0.6134, 'learning_rate': 1.4922348457665656e-05, 'epoch': 0.36} 36%|███▌ | 2365/6640 [3:43:51<6:29:33, 5.47s/it] 36%|███▌ | 2366/6640 [3:43:56<6:29:37, 5.47s/it] {'loss': 0.6465, 'learning_rate': 1.491810153921888e-05, 'epoch': 0.36} 36%|███▌ | 2366/6640 [3:43:56<6:29:37, 5.47s/it] 36%|███▌ | 2367/6640 [3:44:02<6:31:09, 5.49s/it] {'loss': 0.5819, 'learning_rate': 1.4913853450395874e-05, 'epoch': 0.36} 36%|███▌ | 2367/6640 [3:44:02<6:31:09, 5.49s/it] 36%|███▌ | 2368/6640 [3:44:07<6:29:25, 5.47s/it] {'loss': 0.6064, 'learning_rate': 1.4909604192207569e-05, 'epoch': 0.36} 36%|███▌ | 2368/6640 [3:44:07<6:29:25, 5.47s/it] 36%|███▌ | 2369/6640 [3:44:13<6:30:12, 5.48s/it] {'loss': 0.6143, 'learning_rate': 1.4905353765665171e-05, 'epoch': 0.36} 36%|███▌ | 2369/6640 [3:44:13<6:30:12, 5.48s/it] 36%|███▌ | 2370/6640 [3:44:18<6:27:54, 5.45s/it] {'loss': 0.6137, 'learning_rate': 1.4901102171780175e-05, 'epoch': 0.36} 36%|███▌ | 
2370/6640 [3:44:18<6:27:54, 5.45s/it] 36%|███▌ | 2371/6640 [3:44:23<6:26:03, 5.43s/it] {'loss': 0.5866, 'learning_rate': 1.4896849411564337e-05, 'epoch': 0.36} 36%|███▌ | 2371/6640 [3:44:23<6:26:03, 5.43s/it] 36%|███▌ | 2372/6640 [3:44:29<6:24:29, 5.41s/it] {'loss': 0.616, 'learning_rate': 1.4892595486029709e-05, 'epoch': 0.36} 36%|███▌ | 2372/6640 [3:44:29<6:24:29, 5.41s/it] 36%|███▌ | 2373/6640 [3:44:34<6:27:27, 5.45s/it] {'loss': 0.6209, 'learning_rate': 1.4888340396188606e-05, 'epoch': 0.36} 36%|███▌ | 2373/6640 [3:44:34<6:27:27, 5.45s/it] 36%|███▌ | 2374/6640 [3:44:40<6:25:07, 5.42s/it] {'loss': 0.6137, 'learning_rate': 1.4884084143053622e-05, 'epoch': 0.36} 36%|███▌ | 2374/6640 [3:44:40<6:25:07, 5.42s/it] 36%|███▌ | 2375/6640 [3:44:45<6:27:23, 5.45s/it] {'loss': 0.5886, 'learning_rate': 1.487982672763764e-05, 'epoch': 0.36} 36%|███▌ | 2375/6640 [3:44:45<6:27:23, 5.45s/it] 36%|███▌ | 2376/6640 [3:44:50<6:23:06, 5.39s/it] {'loss': 0.5934, 'learning_rate': 1.4875568150953805e-05, 'epoch': 0.36} 36%|███▌ | 2376/6640 [3:44:50<6:23:06, 5.39s/it] 36%|███▌ | 2377/6640 [3:44:56<6:19:13, 5.34s/it] {'loss': 0.5827, 'learning_rate': 1.4871308414015547e-05, 'epoch': 0.36} 36%|███▌ | 2377/6640 [3:44:56<6:19:13, 5.34s/it] 36%|███▌ | 2378/6640 [3:45:01<6:24:06, 5.41s/it] {'loss': 0.5811, 'learning_rate': 1.486704751783656e-05, 'epoch': 0.36} 36%|███▌ | 2378/6640 [3:45:01<6:24:06, 5.41s/it] 36%|███▌ | 2379/6640 [3:45:07<6:23:23, 5.40s/it] {'loss': 0.6038, 'learning_rate': 1.4862785463430836e-05, 'epoch': 0.36} 36%|███▌ | 2379/6640 [3:45:07<6:23:23, 5.40s/it] 36%|███▌ | 2380/6640 [3:45:12<6:28:27, 5.47s/it] {'loss': 0.5854, 'learning_rate': 1.4858522251812621e-05, 'epoch': 0.36} 36%|███▌ | 2380/6640 [3:45:12<6:28:27, 5.47s/it] 36%|███▌ | 2381/6640 [3:45:17<6:25:20, 5.43s/it] {'loss': 0.593, 'learning_rate': 1.4854257883996449e-05, 'epoch': 0.36} 36%|███▌ | 2381/6640 [3:45:17<6:25:20, 5.43s/it] 36%|███▌ | 2382/6640 [3:45:23<6:25:35, 5.43s/it] {'loss': 0.6253, 'learning_rate': 
1.4849992360997126e-05, 'epoch': 0.36} 36%|███▌ | 2382/6640 [3:45:23<6:25:35, 5.43s/it] 36%|███▌ | 2383/6640 [3:45:28<6:22:22, 5.39s/it] {'loss': 0.6113, 'learning_rate': 1.4845725683829723e-05, 'epoch': 0.36} 36%|███▌ | 2383/6640 [3:45:28<6:22:22, 5.39s/it] 36%|███▌ | 2384/6640 [3:45:34<6:26:18, 5.45s/it] {'loss': 0.6303, 'learning_rate': 1.4841457853509606e-05, 'epoch': 0.36} 36%|███▌ | 2384/6640 [3:45:34<6:26:18, 5.45s/it] 36%|███▌ | 2385/6640 [3:45:39<6:31:30, 5.52s/it] {'loss': 0.6075, 'learning_rate': 1.4837188871052399e-05, 'epoch': 0.36} 36%|███▌ | 2385/6640 [3:45:40<6:31:30, 5.52s/it] 36%|███▌ | 2386/6640 [3:45:45<6:28:21, 5.48s/it] {'loss': 0.6098, 'learning_rate': 1.4832918737474007e-05, 'epoch': 0.36} 36%|███▌ | 2386/6640 [3:45:45<6:28:21, 5.48s/it] 36%|███▌ | 2387/6640 [3:45:50<6:29:23, 5.49s/it] {'loss': 0.6302, 'learning_rate': 1.4828647453790606e-05, 'epoch': 0.36} 36%|███▌ | 2387/6640 [3:45:50<6:29:23, 5.49s/it] 36%|███▌ | 2388/6640 [3:45:56<6:26:14, 5.45s/it] {'loss': 0.5859, 'learning_rate': 1.4824375021018645e-05, 'epoch': 0.36} 36%|███▌ | 2388/6640 [3:45:56<6:26:14, 5.45s/it] 36%|███▌ | 2389/6640 [3:46:01<6:26:14, 5.45s/it] {'loss': 0.5872, 'learning_rate': 1.4820101440174852e-05, 'epoch': 0.36} 36%|███▌ | 2389/6640 [3:46:01<6:26:14, 5.45s/it] 36%|███▌ | 2390/6640 [3:46:07<6:27:29, 5.47s/it] {'loss': 0.6201, 'learning_rate': 1.481582671227622e-05, 'epoch': 0.36} 36%|███▌ | 2390/6640 [3:46:07<6:27:29, 5.47s/it] 36%|███▌ | 2391/6640 [3:46:12<6:25:27, 5.44s/it] {'loss': 0.6343, 'learning_rate': 1.4811550838340028e-05, 'epoch': 0.36} 36%|███▌ | 2391/6640 [3:46:12<6:25:27, 5.44s/it] 36%|███▌ | 2392/6640 [3:46:18<6:26:28, 5.46s/it] {'loss': 0.6051, 'learning_rate': 1.4807273819383809e-05, 'epoch': 0.36} 36%|███▌ | 2392/6640 [3:46:18<6:26:28, 5.46s/it] 36%|███▌ | 2393/6640 [3:46:23<6:21:54, 5.40s/it] {'loss': 0.582, 'learning_rate': 1.4802995656425387e-05, 'epoch': 0.36} 36%|███▌ | 2393/6640 [3:46:23<6:21:54, 5.40s/it] 36%|███▌ | 2394/6640 
[3:46:28<6:22:35, 5.41s/it] {'loss': 0.5724, 'learning_rate': 1.4798716350482845e-05, 'epoch': 0.36} 36%|███▌ | 2394/6640 [3:46:28<6:22:35, 5.41s/it] 36%|███▌ | 2395/6640 [3:46:34<6:21:14, 5.39s/it] {'loss': 0.6228, 'learning_rate': 1.4794435902574543e-05, 'epoch': 0.36} 36%|███▌ | 2395/6640 [3:46:34<6:21:14, 5.39s/it] 36%|███▌ | 2396/6640 [3:46:39<6:21:32, 5.39s/it] {'loss': 0.6094, 'learning_rate': 1.4790154313719117e-05, 'epoch': 0.36} 36%|███▌ | 2396/6640 [3:46:39<6:21:32, 5.39s/it] 36%|███▌ | 2397/6640 [3:46:45<6:23:59, 5.43s/it] {'loss': 0.6075, 'learning_rate': 1.4785871584935469e-05, 'epoch': 0.36} 36%|███▌ | 2397/6640 [3:46:45<6:23:59, 5.43s/it] 36%|███▌ | 2398/6640 [3:46:50<6:24:13, 5.43s/it] {'loss': 0.5938, 'learning_rate': 1.4781587717242772e-05, 'epoch': 0.36} 36%|███▌ | 2398/6640 [3:46:50<6:24:13, 5.43s/it] 36%|███▌ | 2399/6640 [3:46:55<6:23:56, 5.43s/it] {'loss': 0.6072, 'learning_rate': 1.4777302711660469e-05, 'epoch': 0.36} 36%|███▌ | 2399/6640 [3:46:55<6:23:56, 5.43s/it]4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 36%|███▌ | 2400/6640 [3:47:01<6:26:48, 5.47s/it]2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6062, 'learning_rate': 1.4773016569208283e-05, 'epoch': 0.36} 36%|███▌ | 2400/6640 [3:47:01<6:26:48, 5.47s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 36%|███▌ | 2401/6640 [3:47:30<14:44:32, 12.52s/it] {'loss': 0.6172, 'learning_rate': 1.4768729290906194e-05, 'epoch': 0.36} 36%|███▌ | 2401/6640 [3:47:30<14:44:32, 12.52s/it] 36%|███▌ | 2402/6640 [3:47:35<12:12:11, 10.37s/it] {'loss': 0.5992, 'learning_rate': 1.4764440877774465e-05, 'epoch': 0.36} 36%|███▌ | 2402/6640 [3:47:35<12:12:11, 10.37s/it] 36%|███▌ | 2403/6640 [3:47:41<10:24:29, 8.84s/it] {'loss': 0.5913, 'learning_rate': 1.476015133083362e-05, 'epoch': 0.36} 36%|███▌ | 2403/6640 [3:47:41<10:24:29, 8.84s/it] 36%|███▌ | 2404/6640 [3:47:46<9:11:08, 7.81s/it] {'loss': 0.5936, 'learning_rate': 1.4755860651104455e-05, 'epoch': 0.36} 36%|███▌ | 2404/6640 [3:47:46<9:11:08, 7.81s/it] 36%|███▌ | 2405/6640 [3:47:51<8:21:20, 7.10s/it] {'loss': 0.5864, 'learning_rate': 1.4751568839608036e-05, 'epoch': 0.36} 36%|███▌ | 2405/6640 [3:47:51<8:21:20, 7.10s/it] 36%|███▌ | 2406/6640 [3:47:57<7:46:31, 6.61s/it] {'loss': 0.6087, 'learning_rate': 1.4747275897365707e-05, 'epoch': 0.36} 36%|███▌ | 2406/6640 [3:47:57<7:46:31, 6.61s/it] 36%|███▋ | 2407/6640 [3:48:02<7:25:05, 6.31s/it] {'loss': 
0.6031, 'learning_rate': 1.4742981825399067e-05, 'epoch': 0.36} 36%|███▋ | 2407/6640 [3:48:02<7:25:05, 6.31s/it] 36%|███▋ | 2408/6640 [3:48:08<7:06:55, 6.05s/it] {'loss': 0.6077, 'learning_rate': 1.4738686624729987e-05, 'epoch': 0.36} 36%|███▋ | 2408/6640 [3:48:08<7:06:55, 6.05s/it] 36%|███▋ | 2409/6640 [3:48:13<6:53:41, 5.87s/it] {'loss': 0.5875, 'learning_rate': 1.4734390296380618e-05, 'epoch': 0.36} 36%|███▋ | 2409/6640 [3:48:13<6:53:41, 5.87s/it] 36%|███▋ | 2410/6640 [3:48:19<6:43:19, 5.72s/it] {'loss': 0.5958, 'learning_rate': 1.4730092841373362e-05, 'epoch': 0.36} 36%|███▋ | 2410/6640 [3:48:19<6:43:19, 5.72s/it] 36%|███▋ | 2411/6640 [3:48:24<6:38:36, 5.66s/it] {'loss': 0.6085, 'learning_rate': 1.4725794260730903e-05, 'epoch': 0.36} 36%|███▋ | 2411/6640 [3:48:24<6:38:36, 5.66s/it] 36%|███▋ | 2412/6640 [3:48:30<6:36:07, 5.62s/it] {'loss': 0.6054, 'learning_rate': 1.4721494555476189e-05, 'epoch': 0.36} 36%|███▋ | 2412/6640 [3:48:30<6:36:07, 5.62s/it] 36%|███▋ | 2413/6640 [3:48:35<6:34:36, 5.60s/it] {'loss': 0.6199, 'learning_rate': 1.4717193726632428e-05, 'epoch': 0.36} 36%|███▋ | 2413/6640 [3:48:35<6:34:36, 5.60s/it] 36%|███▋ | 2414/6640 [3:48:41<6:30:56, 5.55s/it] {'loss': 0.5935, 'learning_rate': 1.4712891775223108e-05, 'epoch': 0.36} 36%|███▋ | 2414/6640 [3:48:41<6:30:56, 5.55s/it] 36%|███▋ | 2415/6640 [3:48:46<6:32:48, 5.58s/it] {'loss': 0.6036, 'learning_rate': 1.4708588702271978e-05, 'epoch': 0.36} 36%|███▋ | 2415/6640 [3:48:46<6:32:48, 5.58s/it] 36%|███▋ | 2416/6640 [3:48:52<6:38:14, 5.66s/it] {'loss': 0.6055, 'learning_rate': 1.470428450880305e-05, 'epoch': 0.36} 36%|███▋ | 2416/6640 [3:48:52<6:38:14, 5.66s/it] 36%|███▋ | 2417/6640 [3:48:58<6:35:54, 5.62s/it] {'loss': 0.6096, 'learning_rate': 1.469997919584061e-05, 'epoch': 0.36} 36%|███▋ | 2417/6640 [3:48:58<6:35:54, 5.62s/it] 36%|███▋ | 2418/6640 [3:49:04<6:38:26, 5.66s/it] {'loss': 0.6266, 'learning_rate': 1.4695672764409205e-05, 'epoch': 0.36} 36%|███▋ | 2418/6640 [3:49:04<6:38:26, 5.66s/it] 
36%|███▋ | 2419/6640 [3:49:09<6:32:59, 5.59s/it] {'loss': 0.5895, 'learning_rate': 1.4691365215533653e-05, 'epoch': 0.36} 36%|███▋ | 2419/6640 [3:49:09<6:32:59, 5.59s/it] 36%|███▋ | 2420/6640 [3:49:15<6:32:14, 5.58s/it] {'loss': 0.5984, 'learning_rate': 1.468705655023903e-05, 'epoch': 0.36} 36%|███▋ | 2420/6640 [3:49:15<6:32:14, 5.58s/it] 36%|███▋ | 2421/6640 [3:49:20<6:29:30, 5.54s/it] {'loss': 0.6225, 'learning_rate': 1.4682746769550686e-05, 'epoch': 0.36} 36%|███▋ | 2421/6640 [3:49:20<6:29:30, 5.54s/it] 36%|███▋ | 2422/6640 [3:49:26<6:29:40, 5.54s/it] {'loss': 0.611, 'learning_rate': 1.4678435874494234e-05, 'epoch': 0.36} 36%|███▋ | 2422/6640 [3:49:26<6:29:40, 5.54s/it] 36%|███▋ | 2423/6640 [3:49:31<6:31:04, 5.56s/it] {'loss': 0.6, 'learning_rate': 1.4674123866095551e-05, 'epoch': 0.36} 36%|███▋ | 2423/6640 [3:49:31<6:31:04, 5.56s/it] 37%|███▋ | 2424/6640 [3:49:37<6:33:13, 5.60s/it] {'loss': 0.6123, 'learning_rate': 1.4669810745380778e-05, 'epoch': 0.37} 37%|███▋ | 2424/6640 [3:49:37<6:33:13, 5.60s/it] 37%|███▋ | 2425/6640 [3:49:42<6:33:45, 5.61s/it] {'loss': 0.6143, 'learning_rate': 1.466549651337632e-05, 'epoch': 0.37} 37%|███▋ | 2425/6640 [3:49:42<6:33:45, 5.61s/it] 37%|███▋ | 2426/6640 [3:49:48<6:30:32, 5.56s/it] {'loss': 0.5845, 'learning_rate': 1.4661181171108855e-05, 'epoch': 0.37} 37%|███▋ | 2426/6640 [3:49:48<6:30:32, 5.56s/it] 37%|███▋ | 2427/6640 [3:49:54<6:31:17, 5.57s/it] {'loss': 0.5914, 'learning_rate': 1.4656864719605314e-05, 'epoch': 0.37} 37%|███▋ | 2427/6640 [3:49:54<6:31:17, 5.57s/it] 37%|███▋ | 2428/6640 [3:49:59<6:33:55, 5.61s/it] {'loss': 0.5929, 'learning_rate': 1.4652547159892896e-05, 'epoch': 0.37} 37%|███▋ | 2428/6640 [3:49:59<6:33:55, 5.61s/it] 37%|███▋ | 2429/6640 [3:50:05<6:30:15, 5.56s/it] {'loss': 0.6152, 'learning_rate': 1.4648228492999065e-05, 'epoch': 0.37} 37%|███▋ | 2429/6640 [3:50:05<6:30:15, 5.56s/it] 37%|███▋ | 2430/6640 [3:50:10<6:30:26, 5.56s/it] {'loss': 0.6072, 'learning_rate': 1.4643908719951551e-05, 'epoch': 0.37} 
37%|███▋ | 2430/6640 [3:50:10<6:30:26, 5.56s/it] 37%|███▋ | 2431/6640 [3:50:16<6:28:41, 5.54s/it] {'loss': 0.6035, 'learning_rate': 1.4639587841778342e-05, 'epoch': 0.37} 37%|███▋ | 2431/6640 [3:50:16<6:28:41, 5.54s/it] 37%|███▋ | 2432/6640 [3:50:21<6:27:29, 5.53s/it] {'loss': 0.612, 'learning_rate': 1.4635265859507687e-05, 'epoch': 0.37} 37%|███▋ | 2432/6640 [3:50:21<6:27:29, 5.53s/it] 37%|███▋ | 2433/6640 [3:50:27<6:29:59, 5.56s/it] {'loss': 0.5823, 'learning_rate': 1.4630942774168113e-05, 'epoch': 0.37} 37%|███▋ | 2433/6640 [3:50:27<6:29:59, 5.56s/it] 37%|███▋ | 2434/6640 [3:50:32<6:26:09, 5.51s/it] {'loss': 0.6074, 'learning_rate': 1.4626618586788385e-05, 'epoch': 0.37} 37%|███▋ | 2434/6640 [3:50:32<6:26:09, 5.51s/it] 37%|███▋ | 2435/6640 [3:50:38<6:26:18, 5.51s/it] {'loss': 0.6225, 'learning_rate': 1.4622293298397554e-05, 'epoch': 0.37} 37%|███▋ | 2435/6640 [3:50:38<6:26:18, 5.51s/it] 37%|███▋ | 2436/6640 [3:50:43<6:24:39, 5.49s/it] {'loss': 0.5948, 'learning_rate': 1.4617966910024914e-05, 'epoch': 0.37} 37%|███▋ | 2436/6640 [3:50:43<6:24:39, 5.49s/it] 37%|███▋ | 2437/6640 [3:50:49<6:22:34, 5.46s/it] {'loss': 0.5941, 'learning_rate': 1.4613639422700034e-05, 'epoch': 0.37} 37%|███▋ | 2437/6640 [3:50:49<6:22:34, 5.46s/it] 37%|███▋ | 2438/6640 [3:50:54<6:19:04, 5.41s/it] {'loss': 0.5987, 'learning_rate': 1.4609310837452744e-05, 'epoch': 0.37} 37%|███▋ | 2438/6640 [3:50:54<6:19:04, 5.41s/it] 37%|███▋ | 2439/6640 [3:50:59<6:17:38, 5.39s/it] {'loss': 0.5916, 'learning_rate': 1.4604981155313121e-05, 'epoch': 0.37} 37%|███▋ | 2439/6640 [3:50:59<6:17:38, 5.39s/it] 37%|███▋ | 2440/6640 [3:51:05<6:16:30, 5.38s/it] {'loss': 0.6116, 'learning_rate': 1.4600650377311523e-05, 'epoch': 0.37} 37%|███▋ | 2440/6640 [3:51:05<6:16:30, 5.38s/it] 37%|███▋ | 2441/6640 [3:51:10<6:20:02, 5.43s/it] {'loss': 0.632, 'learning_rate': 1.4596318504478553e-05, 'epoch': 0.37} 37%|███▋ | 2441/6640 [3:51:10<6:20:02, 5.43s/it] 37%|███▋ | 2442/6640 [3:51:16<6:20:21, 5.44s/it] {'loss': 0.6439, 
'learning_rate': 1.4591985537845084e-05, 'epoch': 0.37} 37%|███▋ | 2442/6640 [3:51:16<6:20:21, 5.44s/it] 37%|███▋ | 2443/6640 [3:51:21<6:21:22, 5.45s/it] {'loss': 0.5871, 'learning_rate': 1.4587651478442242e-05, 'epoch': 0.37} 37%|███▋ | 2443/6640 [3:51:21<6:21:22, 5.45s/it] 37%|███▋ | 2444/6640 [3:51:26<6:20:08, 5.44s/it] {'loss': 0.5898, 'learning_rate': 1.458331632730142e-05, 'epoch': 0.37} 37%|███▋ | 2444/6640 [3:51:26<6:20:08, 5.44s/it] 37%|███▋ | 2445/6640 [3:51:32<6:24:39, 5.50s/it] {'loss': 0.6239, 'learning_rate': 1.4578980085454268e-05, 'epoch': 0.37} 37%|███▋ | 2445/6640 [3:51:32<6:24:39, 5.50s/it] 37%|███▋ | 2446/6640 [3:51:37<6:19:49, 5.43s/it] {'loss': 0.6014, 'learning_rate': 1.457464275393269e-05, 'epoch': 0.37} 37%|███▋ | 2446/6640 [3:51:37<6:19:49, 5.43s/it] 37%|███▋ | 2447/6640 [3:51:43<6:23:15, 5.48s/it] {'loss': 0.5917, 'learning_rate': 1.4570304333768865e-05, 'epoch': 0.37} 37%|███▋ | 2447/6640 [3:51:43<6:23:15, 5.48s/it] 37%|███▋ | 2448/6640 [3:51:48<6:19:12, 5.43s/it] {'loss': 0.6097, 'learning_rate': 1.456596482599521e-05, 'epoch': 0.37} 37%|███▋ | 2448/6640 [3:51:48<6:19:12, 5.43s/it] 37%|███▋ | 2449/6640 [3:51:54<6:18:28, 5.42s/it] {'loss': 0.5806, 'learning_rate': 1.4561624231644418e-05, 'epoch': 0.37} 37%|███▋ | 2449/6640 [3:51:54<6:18:28, 5.42s/it]47 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
37%|███▋ | 2450/6640 [3:51:59<6:25:07, 5.51s/it] {'loss': 0.6117, 'learning_rate': 1.4557282551749428e-05, 'epoch': 0.37} 37%|███▋ | 2450/6640 [3:51:59<6:25:07, 5.51s/it] 37%|███▋ | 2451/6640 [3:52:05<6:25:47, 5.53s/it] {'loss': 0.5974, 'learning_rate': 1.455293978734345e-05, 'epoch': 0.37} 37%|███▋ | 2451/6640 [3:52:05<6:25:47, 5.53s/it] 37%|███▋ | 2452/6640 [3:52:10<6:23:41, 5.50s/it] {'loss': 0.5812, 'learning_rate': 1.4548595939459942e-05, 'epoch': 0.37} 37%|███▋ | 2452/6640 [3:52:10<6:23:41, 5.50s/it] 37%|███▋ | 2453/6640 [3:52:16<6:23:10, 5.49s/it] {'loss': 0.5908, 'learning_rate': 1.4544251009132624e-05, 'epoch': 0.37} 37%|███▋ | 2453/6640 [3:52:16<6:23:10, 5.49s/it] 37%|███▋ | 2454/6640 [3:52:21<6:21:28, 5.47s/it] {'loss': 0.5997, 'learning_rate': 1.4539904997395468e-05, 'epoch': 0.37} 37%|███▋ | 2454/6640 [3:52:21<6:21:28, 5.47s/it] 37%|███▋ | 2455/6640 [3:52:27<6:22:46, 5.49s/it] {'loss': 0.5908, 'learning_rate': 1.4535557905282716e-05, 'epoch': 0.37} 37%|███▋ | 2455/6640 [3:52:27<6:22:46, 5.49s/it] 37%|███▋ | 2456/6640 [3:52:33<6:26:12, 5.54s/it] {'loss': 0.6153, 'learning_rate': 1.4531209733828858e-05, 'epoch': 0.37} 37%|███▋ | 2456/6640 [3:52:33<6:26:12, 5.54s/it] 37%|███▋ | 2457/6640 [3:52:38<6:23:01, 5.49s/it] {'loss': 0.6084, 'learning_rate': 1.4526860484068638e-05, 'epoch': 0.37} 37%|███▋ | 2457/6640 [3:52:38<6:23:01, 5.49s/it] 37%|███▋ | 2458/6640 [3:52:43<6:23:05, 5.50s/it] {'loss': 0.6021, 'learning_rate': 1.4522510157037063e-05, 'epoch': 0.37} 37%|███▋ | 2458/6640 [3:52:43<6:23:05, 5.50s/it] 37%|███▋ | 2459/6640 [3:52:49<6:21:50, 5.48s/it] {'loss': 0.6083, 'learning_rate': 1.4518158753769396e-05, 'epoch': 0.37} 37%|███▋ | 2459/6640 [3:52:49<6:21:50, 5.48s/it] 37%|███▋ | 2460/6640 [3:52:54<6:20:14, 5.46s/it] {'loss': 0.5945, 'learning_rate': 1.451380627530115e-05, 'epoch': 0.37} 37%|███▋ | 2460/6640 [3:52:54<6:20:14, 5.46s/it] 37%|███▋ | 2461/6640 [3:53:00<6:18:45, 5.44s/it] {'loss': 0.6097, 'learning_rate': 1.45094527226681e-05, 'epoch': 0.37} 
37%|███▋ | 2461/6640 [3:53:00<6:18:45, 5.44s/it] 37%|███▋ | 2462/6640 [3:53:05<6:16:26, 5.41s/it] {'loss': 0.6038, 'learning_rate': 1.4505098096906279e-05, 'epoch': 0.37} 37%|███▋ | 2462/6640 [3:53:05<6:16:26, 5.41s/it] 37%|███▋ | 2463/6640 [3:53:11<6:19:32, 5.45s/it] {'loss': 0.6198, 'learning_rate': 1.4500742399051963e-05, 'epoch': 0.37} 37%|███▋ | 2463/6640 [3:53:11<6:19:32, 5.45s/it] 37%|███▋ | 2464/6640 [3:53:16<6:19:56, 5.46s/it] {'loss': 0.6067, 'learning_rate': 1.4496385630141699e-05, 'epoch': 0.37} 37%|███▋ | 2464/6640 [3:53:16<6:19:56, 5.46s/it] 37%|███▋ | 2465/6640 [3:53:22<6:21:10, 5.48s/it] {'loss': 0.6119, 'learning_rate': 1.4492027791212275e-05, 'epoch': 0.37} 37%|███▋ | 2465/6640 [3:53:22<6:21:10, 5.48s/it] 37%|███▋ | 2466/6640 [3:53:27<6:16:20, 5.41s/it] {'loss': 0.5848, 'learning_rate': 1.4487668883300743e-05, 'epoch': 0.37} 37%|███▋ | 2466/6640 [3:53:27<6:16:20, 5.41s/it] 37%|███▋ | 2467/6640 [3:53:32<6:17:30, 5.43s/it] {'loss': 0.5891, 'learning_rate': 1.4483308907444407e-05, 'epoch': 0.37} 37%|███▋ | 2467/6640 [3:53:32<6:17:30, 5.43s/it] 37%|███▋ | 2468/6640 [3:53:38<6:18:17, 5.44s/it] {'loss': 0.5808, 'learning_rate': 1.4478947864680821e-05, 'epoch': 0.37} 37%|███▋ | 2468/6640 [3:53:38<6:18:17, 5.44s/it] 37%|███▋ | 2469/6640 [3:53:43<6:18:35, 5.45s/it] {'loss': 0.5899, 'learning_rate': 1.4474585756047801e-05, 'epoch': 0.37} 37%|███▋ | 2469/6640 [3:53:43<6:18:35, 5.45s/it] 37%|███▋ | 2470/6640 [3:53:49<6:18:22, 5.44s/it] {'loss': 0.6212, 'learning_rate': 1.4470222582583404e-05, 'epoch': 0.37} 37%|███▋ | 2470/6640 [3:53:49<6:18:22, 5.44s/it] 37%|███▋ | 2471/6640 [3:53:54<6:16:30, 5.42s/it] {'loss': 0.6044, 'learning_rate': 1.4465858345325956e-05, 'epoch': 0.37} 37%|███▋ | 2471/6640 [3:53:54<6:16:30, 5.42s/it] 37%|███▋ | 2472/6640 [3:53:59<6:17:01, 5.43s/it] {'loss': 0.5946, 'learning_rate': 1.4461493045314026e-05, 'epoch': 0.37} 37%|███▋ | 2472/6640 [3:53:59<6:17:01, 5.43s/it] 37%|███▋ | 2473/6640 [3:54:05<6:17:48, 5.44s/it] {'loss': 0.592, 
'learning_rate': 1.4457126683586436e-05, 'epoch': 0.37} 37%|███▋ | 2473/6640 [3:54:05<6:17:48, 5.44s/it] 37%|███▋ | 2474/6640 [3:54:10<6:14:01, 5.39s/it] {'loss': 0.5889, 'learning_rate': 1.4452759261182268e-05, 'epoch': 0.37} 37%|███▋ | 2474/6640 [3:54:10<6:14:01, 5.39s/it] 37%|███▋ | 2475/6640 [3:54:16<6:18:47, 5.46s/it] {'loss': 0.6416, 'learning_rate': 1.4448390779140844e-05, 'epoch': 0.37} 37%|███▋ | 2475/6640 [3:54:16<6:18:47, 5.46s/it] 37%|███▋ | 2476/6640 [3:54:21<6:18:18, 5.45s/it] {'loss': 0.5976, 'learning_rate': 1.4444021238501754e-05, 'epoch': 0.37} 37%|███▋ | 2476/6640 [3:54:21<6:18:18, 5.45s/it] 37%|███▋ | 2477/6640 [3:54:27<6:19:16, 5.47s/it] {'loss': 0.6254, 'learning_rate': 1.4439650640304822e-05, 'epoch': 0.37} 37%|███▋ | 2477/6640 [3:54:27<6:19:16, 5.47s/it] 37%|███▋ | 2478/6640 [3:54:32<6:18:25, 5.46s/it] {'loss': 0.6009, 'learning_rate': 1.443527898559014e-05, 'epoch': 0.37} 37%|███▋ | 2478/6640 [3:54:32<6:18:25, 5.46s/it] 37%|███▋ | 2479/6640 [3:54:37<6:16:01, 5.42s/it] {'loss': 0.5988, 'learning_rate': 1.4430906275398044e-05, 'epoch': 0.37} 37%|███▋ | 2479/6640 [3:54:37<6:16:01, 5.42s/it] 37%|███▋ | 2480/6640 [3:54:43<6:18:47, 5.46s/it] {'loss': 0.644, 'learning_rate': 1.442653251076912e-05, 'epoch': 0.37} 37%|███▋ | 2480/6640 [3:54:43<6:18:47, 5.46s/it] 37%|███▋ | 2481/6640 [3:54:48<6:18:02, 5.45s/it] {'loss': 0.591, 'learning_rate': 1.442215769274421e-05, 'epoch': 0.37} 37%|███▋ | 2481/6640 [3:54:48<6:18:02, 5.45s/it] 37%|███▋ | 2482/6640 [3:54:54<6:16:01, 5.43s/it] {'loss': 0.6362, 'learning_rate': 1.4417781822364396e-05, 'epoch': 0.37} 37%|███▋ | 2482/6640 [3:54:54<6:16:01, 5.43s/it] 37%|███▋ | 2483/6640 [3:54:59<6:17:43, 5.45s/it] {'loss': 0.5893, 'learning_rate': 1.4413404900671025e-05, 'epoch': 0.37} 37%|███▋ | 2483/6640 [3:54:59<6:17:43, 5.45s/it] 37%|███▋ | 2484/6640 [3:55:05<6:17:25, 5.45s/it] {'loss': 0.6063, 'learning_rate': 1.4409026928705683e-05, 'epoch': 0.37} 37%|███▋ | 2484/6640 [3:55:05<6:17:25, 5.45s/it] 37%|███▋ | 
2485/6640 [3:55:10<6:19:49, 5.48s/it] {'loss': 0.5952, 'learning_rate': 1.4404647907510212e-05, 'epoch': 0.37} 37%|███▋ | 2485/6640 [3:55:10<6:19:49, 5.48s/it] 37%|███▋ | 2486/6640 [3:55:16<6:18:33, 5.47s/it] {'loss': 0.6276, 'learning_rate': 1.44002678381267e-05, 'epoch': 0.37} 37%|███▋ | 2486/6640 [3:55:16<6:18:33, 5.47s/it] 37%|███▋ | 2487/6640 [3:55:21<6:20:01, 5.49s/it] {'loss': 0.6241, 'learning_rate': 1.4395886721597485e-05, 'epoch': 0.37} 37%|███▋ | 2487/6640 [3:55:21<6:20:01, 5.49s/it] 37%|███▋ | 2488/6640 [3:55:27<6:18:32, 5.47s/it] {'loss': 0.6197, 'learning_rate': 1.4391504558965157e-05, 'epoch': 0.37} 37%|███▋ | 2488/6640 [3:55:27<6:18:32, 5.47s/it] 37%|███▋ | 2489/6640 [3:55:32<6:20:41, 5.50s/it] {'loss': 0.5978, 'learning_rate': 1.4387121351272555e-05, 'epoch': 0.37} 37%|███▋ | 2489/6640 [3:55:32<6:20:41, 5.50s/it] 38%|███▊ | 2490/6640 [3:55:38<6:19:59, 5.49s/it] {'loss': 0.5918, 'learning_rate': 1.4382737099562765e-05, 'epoch': 0.38} 38%|███▊ | 2490/6640 [3:55:38<6:19:59, 5.49s/it] 38%|███▊ | 2491/6640 [3:55:43<6:16:35, 5.45s/it] {'loss': 0.6032, 'learning_rate': 1.4378351804879113e-05, 'epoch': 0.38} 38%|███▊ | 2491/6640 [3:55:43<6:16:35, 5.45s/it] 38%|███▊ | 2492/6640 [3:55:48<6:14:18, 5.41s/it] {'loss': 0.5924, 'learning_rate': 1.4373965468265195e-05, 'epoch': 0.38} 38%|███▊ | 2492/6640 [3:55:48<6:14:18, 5.41s/it] 38%|███▊ | 2493/6640 [3:55:54<6:11:44, 5.38s/it] {'loss': 0.6196, 'learning_rate': 1.4369578090764829e-05, 'epoch': 0.38} 38%|███▊ | 2493/6640 [3:55:54<6:11:44, 5.38s/it] 38%|███▊ | 2494/6640 [3:55:59<6:10:56, 5.37s/it] {'loss': 0.5982, 'learning_rate': 1.43651896734221e-05, 'epoch': 0.38} 38%|███▊ | 2494/6640 [3:55:59<6:10:56, 5.37s/it] 38%|███▊ | 2495/6640 [3:56:05<6:13:28, 5.41s/it] {'loss': 0.6039, 'learning_rate': 1.436080021728133e-05, 'epoch': 0.38} 38%|███▊ | 2495/6640 [3:56:05<6:13:28, 5.41s/it] 38%|███▊ | 2496/6640 [3:56:10<6:10:20, 5.36s/it] {'loss': 0.6197, 'learning_rate': 1.4356409723387092e-05, 'epoch': 0.38} 38%|███▊ | 
2496/6640 [3:56:10<6:10:20, 5.36s/it] 38%|███▊ | 2497/6640 [3:56:15<6:11:03, 5.37s/it] {'loss': 0.597, 'learning_rate': 1.435201819278421e-05, 'epoch': 0.38} 38%|███▊ | 2497/6640 [3:56:15<6:11:03, 5.37s/it] 38%|███▊ | 2498/6640 [3:56:21<6:14:32, 5.43s/it] {'loss': 0.5987, 'learning_rate': 1.434762562651775e-05, 'epoch': 0.38} 38%|███▊ | 2498/6640 [3:56:21<6:14:32, 5.43s/it] 38%|███▊ | 2499/6640 [3:56:26<6:13:30, 5.41s/it] {'loss': 0.594, 'learning_rate': 1.434323202563302e-05, 'epoch': 0.38} 38%|███▊ | 2499/6640 [3:56:26<6:13:30, 5.41s/it]4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 38%|███▊ | 2500/6640 [3:56:32<6:15:05, 5.44s/it] {'loss': 0.6043, 'learning_rate': 1.4338837391175582e-05, 'epoch': 0.38} 38%|███▊ | 2500/6640 [3:56:32<6:15:05, 5.44s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-2500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 38%|███▊ | 2501/6640 [3:57:01<14:19:47, 12.46s/it] {'loss': 0.5918, 'learning_rate': 1.4334441724191244e-05, 'epoch': 0.38} 38%|███▊ | 2501/6640 [3:57:01<14:19:47, 12.46s/it] 38%|███▊ | 2502/6640 [3:57:06<11:52:52, 10.34s/it] {'loss': 0.6073, 'learning_rate': 1.4330045025726049e-05, 'epoch': 0.38} 38%|███▊ | 2502/6640 [3:57:06<11:52:52, 10.34s/it] 38%|███▊ | 2503/6640 [3:57:11<10:10:36, 8.86s/it] {'loss': 0.5999, 'learning_rate': 1.4325647296826302e-05, 'epoch': 0.38} 38%|███▊ | 2503/6640 [3:57:11<10:10:36, 8.86s/it] 38%|███▊ | 2504/6640 [3:57:17<8:57:14, 7.79s/it] {'loss': 0.5761, 'learning_rate': 1.4321248538538537e-05, 'epoch': 0.38} 38%|███▊ | 2504/6640 [3:57:17<8:57:14, 7.79s/it] 38%|███▊ | 2505/6640 [3:57:22<8:11:18, 7.13s/it] {'loss': 0.6114, 'learning_rate': 1.431684875190955e-05, 'epoch': 0.38} 38%|███▊ | 2505/6640 [3:57:22<8:11:18, 7.13s/it] 38%|███▊ | 2506/6640 [3:57:28<7:37:20, 6.64s/it] {'loss': 0.6164, 'learning_rate': 1.431244793798636e-05, 'epoch': 0.38} 38%|███▊ | 2506/6640 [3:57:28<7:37:20, 6.64s/it] 38%|███▊ | 2507/6640 [3:57:33<7:12:53, 6.28s/it] {'loss': 0.607, 'learning_rate': 1.4308046097816252e-05, 'epoch': 0.38} 38%|███▊ | 2507/6640 [3:57:33<7:12:53, 6.28s/it] 38%|███▊ | 2508/6640 [3:57:39<6:55:14, 6.03s/it] {'loss': 0.5933, 'learning_rate': 1.4303643232446745e-05, 'epoch': 0.38} 38%|███▊ | 2508/6640 [3:57:39<6:55:14, 6.03s/it] 38%|███▊ | 2509/6640 [3:57:44<6:45:13, 5.89s/it] {'loss': 0.6268, 'learning_rate': 1.4299239342925597e-05, 'epoch': 0.38} 38%|███▊ | 2509/6640 [3:57:44<6:45:13, 5.89s/it] 38%|███▊ | 2510/6640 [3:57:50<6:34:29, 5.73s/it] {'loss': 0.5887, 'learning_rate': 1.4294834430300822e-05, 'epoch': 0.38} 38%|███▊ | 2510/6640 [3:57:50<6:34:29, 5.73s/it] 38%|███▊ | 2511/6640 [3:57:55<6:32:56, 5.71s/it] {'loss': 0.5947, 'learning_rate': 1.4290428495620664e-05, 'epoch': 0.38} 38%|███▊ | 2511/6640 [3:57:55<6:32:56, 5.71s/it] 38%|███▊ | 2512/6640 [3:58:01<6:28:00, 5.64s/it] {'loss': 0.6094, 'learning_rate': 
1.4286021539933625e-05, 'epoch': 0.38} 38%|███▊ | 2512/6640 [3:58:01<6:28:00, 5.64s/it] 38%|███▊ | 2513/6640 [3:58:06<6:24:24, 5.59s/it] {'loss': 0.5788, 'learning_rate': 1.4281613564288436e-05, 'epoch': 0.38} 38%|███▊ | 2513/6640 [3:58:06<6:24:24, 5.59s/it] 38%|███▊ | 2514/6640 [3:58:12<6:19:59, 5.53s/it] {'loss': 0.6156, 'learning_rate': 1.4277204569734085e-05, 'epoch': 0.38} 38%|███▊ | 2514/6640 [3:58:12<6:19:59, 5.53s/it]May 27 22:07:14.762756 152183 slurmstepd 0x155550ab8700: error: *** STEP 8262210.0 ON batch-block5-00572 CANCELLED AT 2025-05-27T22:07:14 DUE TO TIME LIMIT *** srun: Job step aborted: Waiting up to 122 seconds for job step to finish. 38%|███▊ | 2515/6640 [3:58:17<6:22:10, 5.56s/it] {'loss': 0.5948, 'learning_rate': 1.4272794557319785e-05, 'epoch': 0.38} 38%|███▊ | 2515/6640 [3:58:17<6:22:10, 5.56s/it]srun: error: batch-block5-00572: task 0: Terminated srun: Terminating StepId=8262210.0 srun: job 8269448 queued and waiting for resources srun: job 8269448 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block5-00214 JobID: 8269448 | Full list: batch-block5-00214 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! 
[2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:35,092] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [INFO] 
[comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-27 22:09:36,766] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-27 22:09:36,766] [INFO] [comm.py:594:init_distributed] cdb=None You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. 
Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [2025-05-27 22:09:46,750] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.70B parameters Loading checkpoint shards: 0%| | 0/2 [00:00 4096). Running this sequence through the model will result in indexing errors 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 45%|████▌ | 3000/6640 [47:11<5:30:36, 5.45s/it]1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5882, 'learning_rate': 1.2034560130526341e-05, 'epoch': 0.45} 45%|████▌ | 3000/6640 [47:11<5:30:36, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 45%|████▌ | 3001/6640 [47:39<12:12:11, 12.07s/it] {'loss': 0.5753, 'learning_rate': 1.2029783672206326e-05, 'epoch': 0.45} 45%|████▌ | 3001/6640 [47:39<12:12:11, 12.07s/it] 45%|████▌ | 3002/6640 [47:44<10:11:23, 10.08s/it] {'loss': 0.6083, 'learning_rate': 1.202500673085225e-05, 'epoch': 0.45} 45%|████▌ | 3002/6640 [47:44<10:11:23, 10.08s/it] 45%|████▌ | 3003/6640 [47:50<8:46:35, 8.69s/it] {'loss': 0.5869, 'learning_rate': 1.2020229307600897e-05, 'epoch': 0.45} 45%|████▌ | 3003/6640 [47:50<8:46:35, 8.69s/it] 45%|████▌ | 3004/6640 [47:55<7:51:13, 7.78s/it] {'loss': 0.5683, 'learning_rate': 1.2015451403589164e-05, 'epoch': 0.45} 45%|████▌ | 3004/6640 [47:55<7:51:13, 7.78s/it] 45%|████▌ | 3005/6640 [48:01<7:16:46, 7.21s/it] {'loss': 0.5879, 'learning_rate': 1.201067301995407e-05, 'epoch': 0.45} 45%|████▌ | 3005/6640 [48:01<7:16:46, 7.21s/it] 45%|████▌ | 3006/6640 [48:07<6:48:27, 6.74s/it] {'loss': 0.6112, 'learning_rate': 1.200589415783273e-05, 'epoch': 0.45} 45%|████▌ | 3006/6640 [48:07<6:48:27, 6.74s/it] 45%|████▌ | 3007/6640 [48:12<6:23:30, 6.33s/it] {'loss': 0.5829, 'learning_rate': 1.2001114818362394e-05, 'epoch': 0.45} 45%|████▌ | 3007/6640 [48:12<6:23:30, 6.33s/it] 45%|████▌ | 3008/6640 [48:18<6:08:18, 6.08s/it] {'loss': 0.5729, 'learning_rate': 1.1996335002680413e-05, 'epoch': 0.45} 45%|████▌ | 3008/6640 [48:18<6:08:18, 6.08s/it] 45%|████▌ | 3009/6640 [48:23<5:54:45, 5.86s/it] {'loss': 0.5976, 'learning_rate': 1.1991554711924256e-05, 'epoch': 0.45} 45%|████▌ | 3009/6640 [48:23<5:54:45, 5.86s/it] 45%|████▌ | 3010/6640 [48:28<5:44:29, 5.69s/it] {'loss': 0.597, 'learning_rate': 1.1986773947231505e-05, 'epoch': 0.45} 45%|████▌ | 3010/6640 [48:28<5:44:29, 5.69s/it] 45%|████▌ | 3011/6640 [48:34<5:43:03, 5.67s/it] {'loss': 0.6066, 'learning_rate': 1.1981992709739853e-05, 'epoch': 0.45} 45%|████▌ | 3011/6640 [48:34<5:43:03, 5.67s/it] 45%|████▌ | 3012/6640 [48:39<5:38:37, 5.60s/it] {'loss': 0.5569, 'learning_rate': 1.1977211000587109e-05, 'epoch': 0.45} 
45%|████▌ | 3012/6640 [48:39<5:38:37, 5.60s/it] 45%|████▌ | 3013/6640 [48:45<5:38:06, 5.59s/it] {'loss': 0.6032, 'learning_rate': 1.1972428820911185e-05, 'epoch': 0.45} 45%|████▌ | 3013/6640 [48:45<5:38:06, 5.59s/it] 45%|████▌ | 3014/6640 [48:50<5:34:42, 5.54s/it] {'loss': 0.5955, 'learning_rate': 1.1967646171850118e-05, 'epoch': 0.45} 45%|████▌ | 3014/6640 [48:50<5:34:42, 5.54s/it] 45%|████▌ | 3015/6640 [48:56<5:37:17, 5.58s/it] {'loss': 0.624, 'learning_rate': 1.1962863054542045e-05, 'epoch': 0.45} 45%|████▌ | 3015/6640 [48:56<5:37:17, 5.58s/it] 45%|████▌ | 3016/6640 [49:01<5:34:01, 5.53s/it] {'loss': 0.6053, 'learning_rate': 1.1958079470125223e-05, 'epoch': 0.45} 45%|████▌ | 3016/6640 [49:01<5:34:01, 5.53s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") 45%|████▌ | 3017/6640 [49:07<5:42:16, 5.67s/it] {'loss': 0.6008, 'learning_rate': 1.1953295419738013e-05, 'epoch': 0.45} 45%|████▌ | 3017/6640 [49:07<5:42:16, 5.67s/it] 45%|████▌ | 3018/6640 [49:13<5:36:56, 5.58s/it] {'loss': 0.6146, 'learning_rate': 1.1948510904518895e-05, 'epoch': 0.45} 45%|████▌ | 3018/6640 [49:13<5:36:56, 5.58s/it] 45%|████▌ | 3019/6640 [49:18<5:32:57, 5.52s/it] {'loss': 0.5828, 'learning_rate': 1.1943725925606453e-05, 'epoch': 0.45} 45%|████▌ | 3019/6640 [49:18<5:32:57, 5.52s/it] 45%|████▌ | 3020/6640 [49:24<5:31:15, 5.49s/it] {'loss': 0.6034, 'learning_rate': 1.1938940484139387e-05, 'epoch': 0.45} 45%|████▌ | 3020/6640 [49:24<5:31:15, 5.49s/it] 45%|████▌ | 3021/6640 [49:29<5:31:19, 5.49s/it] {'loss': 0.6196, 'learning_rate': 1.1934154581256498e-05, 'epoch': 0.45} 45%|████▌ | 3021/6640 [49:29<5:31:19, 5.49s/it] 46%|████▌ | 3022/6640 [49:34<5:29:46, 5.47s/it] {'loss': 0.5905, 'learning_rate': 1.1929368218096708e-05, 'epoch': 0.46} 46%|████▌ | 3022/6640 [49:34<5:29:46, 5.47s/it] 46%|████▌ | 3023/6640 [49:40<5:27:42, 5.44s/it] {'loss': 0.5915, 'learning_rate': 1.1924581395799039e-05, 
'epoch': 0.46} 46%|████▌ | 3023/6640 [49:40<5:27:42, 5.44s/it] 46%|████▌ | 3024/6640 [49:45<5:26:05, 5.41s/it] {'loss': 0.5956, 'learning_rate': 1.1919794115502628e-05, 'epoch': 0.46} 46%|████▌ | 3024/6640 [49:45<5:26:05, 5.41s/it] 46%|████▌ | 3025/6640 [49:51<5:30:18, 5.48s/it] {'loss': 0.6214, 'learning_rate': 1.1915006378346719e-05, 'epoch': 0.46} 46%|████▌ | 3025/6640 [49:51<5:30:18, 5.48s/it] 46%|████▌ | 3026/6640 [49:56<5:26:47, 5.43s/it] {'loss': 0.5877, 'learning_rate': 1.1910218185470663e-05, 'epoch': 0.46} 46%|████▌ | 3026/6640 [49:56<5:26:47, 5.43s/it] 46%|████▌ | 3027/6640 [50:02<5:26:50, 5.43s/it] {'loss': 0.6225, 'learning_rate': 1.1905429538013926e-05, 'epoch': 0.46} 46%|████▌ | 3027/6640 [50:02<5:26:50, 5.43s/it] 46%|████▌ | 3028/6640 [50:07<5:22:05, 5.35s/it] {'loss': 0.5641, 'learning_rate': 1.1900640437116074e-05, 'epoch': 0.46} 46%|████▌ | 3028/6640 [50:07<5:22:05, 5.35s/it] 46%|████▌ | 3029/6640 [50:12<5:25:27, 5.41s/it] {'loss': 0.5963, 'learning_rate': 1.1895850883916786e-05, 'epoch': 0.46} 46%|████▌ | 3029/6640 [50:12<5:25:27, 5.41s/it] 46%|████▌ | 3030/6640 [50:18<5:25:15, 5.41s/it] {'loss': 0.6067, 'learning_rate': 1.1891060879555847e-05, 'epoch': 0.46} 46%|████▌ | 3030/6640 [50:18<5:25:15, 5.41s/it] 46%|████▌ | 3031/6640 [50:23<5:27:16, 5.44s/it] {'loss': 0.602, 'learning_rate': 1.188627042517315e-05, 'epoch': 0.46} 46%|████▌ | 3031/6640 [50:23<5:27:16, 5.44s/it] 46%|████▌ | 3032/6640 [50:29<5:25:01, 5.41s/it] {'loss': 0.5959, 'learning_rate': 1.1881479521908694e-05, 'epoch': 0.46} 46%|████▌ | 3032/6640 [50:29<5:25:01, 5.41s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5158 > 4096). 
Running this sequence through the model will result in indexing errors 46%|████▌ | 3033/6640 [50:34<5:26:39, 5.43s/it] {'loss': 0.5996, 'learning_rate': 1.1876688170902583e-05, 'epoch': 0.46} 46%|████▌ | 3033/6640 [50:34<5:26:39, 5.43s/it] 46%|████▌ | 3034/6640 [50:39<5:26:13, 5.43s/it] {'loss': 0.6017, 'learning_rate': 1.1871896373295033e-05, 'epoch': 0.46} 46%|████▌ | 3034/6640 [50:39<5:26:13, 5.43s/it] 46%|████▌ | 3035/6640 [50:45<5:26:15, 5.43s/it] {'loss': 0.6199, 'learning_rate': 1.1867104130226363e-05, 'epoch': 0.46} 46%|████▌ | 3035/6640 [50:45<5:26:15, 5.43s/it] 46%|████▌ | 3036/6640 [50:50<5:24:37, 5.40s/it] {'loss': 0.5962, 'learning_rate': 1.1862311442837e-05, 'epoch': 0.46} 46%|████▌ | 3036/6640 [50:50<5:24:37, 5.40s/it] 46%|████▌ | 3037/6640 [50:56<5:26:59, 5.45s/it] {'loss': 0.5736, 'learning_rate': 1.185751831226747e-05, 'epoch': 0.46} 46%|████▌ | 3037/6640 [50:56<5:26:59, 5.45s/it] 46%|████▌ | 3038/6640 [51:01<5:26:05, 5.43s/it] {'loss': 0.5989, 'learning_rate': 1.1852724739658417e-05, 'epoch': 0.46} 46%|████▌ | 3038/6640 [51:01<5:26:05, 5.43s/it] 46%|████▌ | 3039/6640 [51:07<5:25:51, 5.43s/it] {'loss': 0.6089, 'learning_rate': 1.1847930726150574e-05, 'epoch': 0.46} 46%|████▌ | 3039/6640 [51:07<5:25:51, 5.43s/it] 46%|████▌ | 3040/6640 [51:12<5:25:29, 5.42s/it] {'loss': 0.5913, 'learning_rate': 1.1843136272884795e-05, 'epoch': 0.46} 46%|████▌ | 3040/6640 [51:12<5:25:29, 5.42s/it] 46%|████▌ | 3041/6640 [51:17<5:25:59, 5.43s/it] {'loss': 0.5948, 'learning_rate': 1.1838341381002027e-05, 'epoch': 0.46} 46%|████▌ | 3041/6640 [51:17<5:25:59, 5.43s/it] 46%|████▌ | 3042/6640 [51:23<5:24:57, 5.42s/it] {'loss': 0.5877, 'learning_rate': 1.1833546051643325e-05, 'epoch': 0.46} 46%|████▌ | 3042/6640 [51:23<5:24:57, 5.42s/it] 46%|████▌ | 3043/6640 [51:28<5:24:35, 5.41s/it] {'loss': 0.5871, 'learning_rate': 1.182875028594985e-05, 'epoch': 0.46} 46%|████▌ | 3043/6640 [51:28<5:24:35, 5.41s/it] 46%|████▌ | 3044/6640 [51:34<5:24:16, 5.41s/it] {'loss': 0.5868, 
'learning_rate': 1.1823954085062867e-05, 'epoch': 0.46} 46%|████▌ | 3044/6640 [51:34<5:24:16, 5.41s/it] 46%|████▌ | 3045/6640 [51:39<5:28:04, 5.48s/it] {'loss': 0.6208, 'learning_rate': 1.1819157450123745e-05, 'epoch': 0.46} 46%|████▌ | 3045/6640 [51:39<5:28:04, 5.48s/it] 46%|████▌ | 3046/6640 [51:45<5:24:50, 5.42s/it] {'loss': 0.5869, 'learning_rate': 1.1814360382273949e-05, 'epoch': 0.46} 46%|████▌ | 3046/6640 [51:45<5:24:50, 5.42s/it] 46%|████▌ | 3047/6640 [51:50<5:25:14, 5.43s/it] {'loss': 0.6177, 'learning_rate': 1.1809562882655054e-05, 'epoch': 0.46} 46%|████▌ | 3047/6640 [51:50<5:25:14, 5.43s/it] 46%|████▌ | 3048/6640 [51:55<5:25:46, 5.44s/it] {'loss': 0.5924, 'learning_rate': 1.180476495240874e-05, 'epoch': 0.46} 46%|████▌ | 3048/6640 [51:55<5:25:46, 5.44s/it] 46%|████▌ | 3049/6640 [52:01<5:25:27, 5.44s/it] {'loss': 0.5861, 'learning_rate': 1.1799966592676784e-05, 'epoch': 0.46} 46%|████▌ | 3049/6640 [52:01<5:25:27, 5.44s/it]6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 07 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 
46%|████▌ | 3050/6640 [52:06<5:25:32, 5.44s/it] {'loss': 0.6194, 'learning_rate': 1.1795167804601062e-05, 'epoch': 0.46} 46%|████▌ | 3050/6640 [52:06<5:25:32, 5.44s/it] 46%|████▌ | 3051/6640 [52:12<5:29:08, 5.50s/it] {'loss': 0.5877, 'learning_rate': 1.1790368589323562e-05, 'epoch': 0.46} 46%|████▌ | 3051/6640 [52:12<5:29:08, 5.50s/it] 46%|████▌ | 3052/6640 [52:18<5:29:08, 5.50s/it] {'loss': 0.589, 'learning_rate': 1.1785568947986368e-05, 'epoch': 0.46} 46%|████▌ | 3052/6640 [52:18<5:29:08, 5.50s/it] 46%|████▌ | 3053/6640 [52:23<5:27:54, 5.48s/it] {'loss': 0.5833, 'learning_rate': 1.1780768881731664e-05, 'epoch': 0.46} 46%|████▌ | 3053/6640 [52:23<5:27:54, 5.48s/it] 46%|████▌ | 3054/6640 [52:29<5:32:38, 5.57s/it] {'loss': 0.6026, 'learning_rate': 1.177596839170174e-05, 'epoch': 0.46} 46%|████▌ | 3054/6640 [52:29<5:32:38, 5.57s/it] 46%|████▌ | 3055/6640 [52:34<5:31:03, 5.54s/it] {'loss': 0.5815, 'learning_rate': 1.1771167479038978e-05, 'epoch': 0.46} 46%|████▌ | 3055/6640 [52:34<5:31:03, 5.54s/it] 46%|████▌ | 3056/6640 [52:40<5:30:05, 5.53s/it] {'loss': 0.5797, 'learning_rate': 1.1766366144885877e-05, 'epoch': 0.46} 46%|████▌ | 3056/6640 [52:40<5:30:05, 5.53s/it] 46%|████▌ | 3057/6640 [52:45<5:27:20, 5.48s/it] {'loss': 0.594, 'learning_rate': 1.1761564390385015e-05, 'epoch': 0.46} 46%|████▌ | 3057/6640 [52:45<5:27:20, 5.48s/it] 46%|████▌ | 3058/6640 [52:50<5:25:22, 5.45s/it] {'loss': 0.6197, 'learning_rate': 1.1756762216679085e-05, 'epoch': 0.46} 46%|████▌ | 3058/6640 [52:50<5:25:22, 5.45s/it] 46%|████▌ | 3059/6640 [52:56<5:26:05, 5.46s/it] {'loss': 0.5872, 'learning_rate': 1.1751959624910874e-05, 'epoch': 0.46} 46%|████▌ | 3059/6640 [52:56<5:26:05, 5.46s/it] 46%|████▌ | 3060/6640 [53:01<5:25:49, 5.46s/it] {'loss': 0.5813, 'learning_rate': 1.1747156616223272e-05, 'epoch': 0.46} 46%|████▌ | 3060/6640 [53:01<5:25:49, 5.46s/it] 46%|████▌ | 3061/6640 [53:07<5:22:47, 5.41s/it] {'loss': 0.6211, 'learning_rate': 1.1742353191759267e-05, 'epoch': 0.46} 46%|████▌ | 3061/6640 
[53:07<5:22:47, 5.41s/it] 46%|████▌ | 3062/6640 [53:12<5:21:18, 5.39s/it] {'loss': 0.6012, 'learning_rate': 1.173754935266194e-05, 'epoch': 0.46} 46%|████▌ | 3062/6640 [53:12<5:21:18, 5.39s/it] 46%|████▌ | 3063/6640 [53:17<5:19:17, 5.36s/it] {'loss': 0.6056, 'learning_rate': 1.1732745100074485e-05, 'epoch': 0.46} 46%|████▌ | 3063/6640 [53:17<5:19:17, 5.36s/it] 46%|████▌ | 3064/6640 [53:23<5:20:57, 5.39s/it] {'loss': 0.6134, 'learning_rate': 1.1727940435140177e-05, 'epoch': 0.46} 46%|████▌ | 3064/6640 [53:23<5:20:57, 5.39s/it] 46%|████▌ | 3065/6640 [53:28<5:19:43, 5.37s/it] {'loss': 0.6115, 'learning_rate': 1.1723135359002403e-05, 'epoch': 0.46} 46%|████▌ | 3065/6640 [53:28<5:19:43, 5.37s/it] 46%|████▌ | 3066/6640 [53:33<5:17:58, 5.34s/it] {'loss': 0.5971, 'learning_rate': 1.1718329872804635e-05, 'epoch': 0.46} 46%|████▌ | 3066/6640 [53:33<5:17:58, 5.34s/it] 46%|████▌ | 3067/6640 [53:39<5:20:53, 5.39s/it] {'loss': 0.5973, 'learning_rate': 1.1713523977690458e-05, 'epoch': 0.46} 46%|████▌ | 3067/6640 [53:39<5:20:53, 5.39s/it] 46%|████▌ | 3068/6640 [53:44<5:19:14, 5.36s/it] {'loss': 0.5872, 'learning_rate': 1.1708717674803538e-05, 'epoch': 0.46} 46%|████▌ | 3068/6640 [53:44<5:19:14, 5.36s/it] 46%|████▌ | 3069/6640 [53:50<5:25:58, 5.48s/it] {'loss': 0.6142, 'learning_rate': 1.1703910965287653e-05, 'epoch': 0.46} 46%|████▌ | 3069/6640 [53:50<5:25:58, 5.48s/it] 46%|████▌ | 3070/6640 [53:56<5:30:36, 5.56s/it] {'loss': 0.5991, 'learning_rate': 1.1699103850286668e-05, 'epoch': 0.46} 46%|████▌ | 3070/6640 [53:56<5:30:36, 5.56s/it] 46%|████▋ | 3071/6640 [54:01<5:27:50, 5.51s/it] {'loss': 0.6066, 'learning_rate': 1.1694296330944548e-05, 'epoch': 0.46} 46%|████▋ | 3071/6640 [54:01<5:27:50, 5.51s/it] 46%|████▋ | 3072/6640 [54:07<5:26:46, 5.50s/it] {'loss': 0.6013, 'learning_rate': 1.1689488408405354e-05, 'epoch': 0.46} 46%|████▋ | 3072/6640 [54:07<5:26:46, 5.50s/it] 46%|████▋ | 3073/6640 [54:12<5:32:31, 5.59s/it] {'loss': 0.6254, 'learning_rate': 1.168468008381324e-05, 'epoch': 
0.46} 46%|████▋ | 3073/6640 [54:12<5:32:31, 5.59s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4399 > 4096). Running this sequence through the model will result in indexing errors 46%|████▋ | 3074/6640 [54:18<5:28:27, 5.53s/it] {'loss': 0.5843, 'learning_rate': 1.1679871358312462e-05, 'epoch': 0.46} 46%|████▋ | 3074/6640 [54:18<5:28:27, 5.53s/it] 46%|████▋ | 3075/6640 [54:23<5:28:07, 5.52s/it] {'loss': 0.5765, 'learning_rate': 1.1675062233047365e-05, 'epoch': 0.46} 46%|████▋ | 3075/6640 [54:23<5:28:07, 5.52s/it] 46%|████▋ | 3076/6640 [54:29<5:25:23, 5.48s/it] {'loss': 0.5936, 'learning_rate': 1.1670252709162393e-05, 'epoch': 0.46} 46%|████▋ | 3076/6640 [54:29<5:25:23, 5.48s/it] 46%|████▋ | 3077/6640 [54:34<5:22:43, 5.43s/it] {'loss': 0.5743, 'learning_rate': 1.1665442787802083e-05, 'epoch': 0.46} 46%|████▋ | 3077/6640 [54:34<5:22:43, 5.43s/it] 46%|████▋ | 3078/6640 [54:39<5:23:41, 5.45s/it] {'loss': 0.5817, 'learning_rate': 1.1660632470111069e-05, 'epoch': 0.46} 46%|████▋ | 3078/6640 [54:39<5:23:41, 5.45s/it] 46%|████▋ | 3079/6640 [54:45<5:21:22, 5.41s/it] {'loss': 0.5982, 'learning_rate': 1.1655821757234075e-05, 'epoch': 0.46} 46%|████▋ | 3079/6640 [54:45<5:21:22, 5.41s/it] 46%|████▋ | 3080/6640 [54:50<5:21:57, 5.43s/it] {'loss': 0.5707, 'learning_rate': 1.1651010650315923e-05, 'epoch': 0.46} 46%|████▋ | 3080/6640 [54:50<5:21:57, 5.43s/it] 46%|████▋ | 3081/6640 [54:56<5:22:19, 5.43s/it] {'loss': 0.5894, 'learning_rate': 1.164619915050153e-05, 'epoch': 0.46} 46%|████▋ | 3081/6640 [54:56<5:22:19, 5.43s/it] 46%|████▋ | 3082/6640 [55:01<5:22:40, 5.44s/it] {'loss': 0.5914, 'learning_rate': 1.1641387258935896e-05, 'epoch': 0.46} 46%|████▋ | 3082/6640 [55:01<5:22:40, 5.44s/it] 46%|████▋ | 3083/6640 [55:06<5:19:43, 5.39s/it] {'loss': 0.5899, 'learning_rate': 1.1636574976764133e-05, 'epoch': 0.46} 46%|████▋ | 3083/6640 [55:06<5:19:43, 5.39s/it] 46%|████▋ | 3084/6640 [55:12<5:20:45, 5.41s/it] {'loss': 0.58, 
'learning_rate': 1.1631762305131424e-05, 'epoch': 0.46} 46%|████▋ | 3084/6640 [55:12<5:20:45, 5.41s/it] 46%|████▋ | 3085/6640 [55:18<5:26:28, 5.51s/it] {'loss': 0.5804, 'learning_rate': 1.1626949245183061e-05, 'epoch': 0.46} 46%|████▋ | 3085/6640 [55:18<5:26:28, 5.51s/it] 46%|████▋ | 3086/6640 [55:23<5:26:27, 5.51s/it] {'loss': 0.5942, 'learning_rate': 1.1622135798064427e-05, 'epoch': 0.46} 46%|████▋ | 3086/6640 [55:23<5:26:27, 5.51s/it] 46%|████▋ | 3087/6640 [55:28<5:23:49, 5.47s/it] {'loss': 0.5803, 'learning_rate': 1.1617321964920986e-05, 'epoch': 0.46} 46%|████▋ | 3087/6640 [55:28<5:23:49, 5.47s/it] 47%|████▋ | 3088/6640 [55:34<5:23:20, 5.46s/it] {'loss': 0.5943, 'learning_rate': 1.1612507746898307e-05, 'epoch': 0.47} 47%|████▋ | 3088/6640 [55:34<5:23:20, 5.46s/it] 47%|████▋ | 3089/6640 [55:39<5:22:23, 5.45s/it] {'loss': 0.6181, 'learning_rate': 1.160769314514204e-05, 'epoch': 0.47} 47%|████▋ | 3089/6640 [55:39<5:22:23, 5.45s/it] 47%|████▋ | 3090/6640 [55:45<5:23:46, 5.47s/it] {'loss': 0.6401, 'learning_rate': 1.1602878160797936e-05, 'epoch': 0.47} 47%|████▋ | 3090/6640 [55:45<5:23:46, 5.47s/it] 47%|████▋ | 3091/6640 [55:50<5:24:02, 5.48s/it] {'loss': 0.6062, 'learning_rate': 1.1598062795011827e-05, 'epoch': 0.47} 47%|████▋ | 3091/6640 [55:50<5:24:02, 5.48s/it] 47%|████▋ | 3092/6640 [55:56<5:23:13, 5.47s/it] {'loss': 0.602, 'learning_rate': 1.1593247048929644e-05, 'epoch': 0.47} 47%|████▋ | 3092/6640 [55:56<5:23:13, 5.47s/it] 47%|████▋ | 3093/6640 [56:01<5:21:45, 5.44s/it] {'loss': 0.6005, 'learning_rate': 1.1588430923697404e-05, 'epoch': 0.47} 47%|████▋ | 3093/6640 [56:01<5:21:45, 5.44s/it] 47%|████▋ | 3094/6640 [56:07<5:22:08, 5.45s/it] {'loss': 0.5999, 'learning_rate': 1.1583614420461218e-05, 'epoch': 0.47} 47%|████▋ | 3094/6640 [56:07<5:22:08, 5.45s/it] 47%|████▋ | 3095/6640 [56:12<5:20:02, 5.42s/it] {'loss': 0.5883, 'learning_rate': 1.1578797540367284e-05, 'epoch': 0.47} 47%|████▋ | 3095/6640 [56:12<5:20:02, 5.42s/it] 47%|████▋ | 3096/6640 [56:18<5:26:44, 
5.53s/it] {'loss': 0.5984, 'learning_rate': 1.1573980284561886e-05, 'epoch': 0.47} 47%|████▋ | 3096/6640 [56:18<5:26:44, 5.53s/it] 47%|████▋ | 3097/6640 [56:23<5:24:04, 5.49s/it] {'loss': 0.5817, 'learning_rate': 1.1569162654191408e-05, 'epoch': 0.47} 47%|████▋ | 3097/6640 [56:23<5:24:04, 5.49s/it] 47%|████▋ | 3098/6640 [56:28<5:21:01, 5.44s/it] {'loss': 0.5837, 'learning_rate': 1.156434465040231e-05, 'epoch': 0.47} 47%|████▋ | 3098/6640 [56:28<5:21:01, 5.44s/it] 47%|████▋ | 3099/6640 [56:34<5:24:07, 5.49s/it] {'loss': 0.6113, 'learning_rate': 1.1559526274341155e-05, 'epoch': 0.47} 47%|████▋ | 3099/6640 [56:34<5:24:07, 5.49s/it]6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 47%|████▋ | 3100/6640 [56:40<5:32:07, 5.63s/it]7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... {'loss': 0.6094, 'learning_rate': 1.155470752715458e-05, 'epoch': 0.47} 47%|████▋ | 3100/6640 [56:40<5:32:07, 5.63s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 47%|████▋ | 3101/6640 [57:07<11:52:45, 12.08s/it] {'loss': 0.597, 'learning_rate': 1.154988840998932e-05, 'epoch': 0.47} 47%|████▋ | 3101/6640 [57:07<11:52:45, 12.08s/it] 47%|████▋ | 3102/6640 [57:13<9:58:40, 10.15s/it] {'loss': 0.5919, 'learning_rate': 1.1545068923992199e-05, 'epoch': 0.47} 47%|████▋ | 3102/6640 [57:13<9:58:40, 10.15s/it] 47%|████▋ | 3103/6640 [57:18<8:31:26, 8.68s/it] {'loss': 0.6035, 'learning_rate': 1.1540249070310124e-05, 'epoch': 0.47} 47%|████▋ | 3103/6640 [57:18<8:31:26, 8.68s/it] 47%|████▋ | 3104/6640 [57:23<7:33:32, 7.70s/it] {'loss': 0.6067, 'learning_rate': 1.1535428850090092e-05, 'epoch': 0.47} 47%|████▋ | 3104/6640 [57:23<7:33:32, 7.70s/it] 47%|████▋ | 3105/6640 [57:29<6:53:46, 7.02s/it] {'loss': 0.5832, 'learning_rate': 1.153060826447918e-05, 'epoch': 0.47} 47%|████▋ | 3105/6640 [57:29<6:53:46, 7.02s/it] 47%|████▋ | 3106/6640 [57:35<6:29:06, 6.61s/it] {'loss': 0.6155, 'learning_rate': 1.1525787314624564e-05, 'epoch': 0.47} 47%|████▋ | 3106/6640 [57:35<6:29:06, 6.61s/it] 47%|████▋ | 3107/6640 [57:40<6:15:19, 6.37s/it] {'loss': 0.6072, 'learning_rate': 1.1520966001673496e-05, 'epoch': 0.47} 47%|████▋ | 3107/6640 [57:40<6:15:19, 6.37s/it] 47%|████▋ | 3108/6640 [57:46<5:56:41, 6.06s/it] {'loss': 0.578, 'learning_rate': 1.1516144326773324e-05, 'epoch': 0.47} 47%|████▋ | 3108/6640 [57:46<5:56:41, 6.06s/it] 47%|████▋ | 3109/6640 [57:51<5:45:49, 5.88s/it] {'loss': 0.6106, 'learning_rate': 1.1511322291071474e-05, 'epoch': 0.47} 47%|████▋ | 3109/6640 [57:51<5:45:49, 5.88s/it] 47%|████▋ | 3110/6640 [57:57<5:39:59, 5.78s/it] {'loss': 0.5736, 'learning_rate': 1.1506499895715462e-05, 'epoch': 0.47} 47%|████▋ | 3110/6640 [57:57<5:39:59, 5.78s/it] 47%|████▋ | 3111/6640 [58:02<5:33:15, 5.67s/it] {'loss': 0.6009, 'learning_rate': 1.150167714185289e-05, 'epoch': 0.47} 47%|████▋ | 3111/6640 [58:02<5:33:15, 5.67s/it] 47%|████▋ | 3112/6640 [58:08<5:28:59, 5.60s/it] {'loss': 0.5928, 'learning_rate': 1.1496854030631443e-05, 'epoch': 0.47} 
47%|████▋ | 3112/6640 [58:08<5:28:59, 5.60s/it] 47%|████▋ | 3113/6640 [58:13<5:28:10, 5.58s/it] {'loss': 0.5763, 'learning_rate': 1.1492030563198895e-05, 'epoch': 0.47} 47%|████▋ | 3113/6640 [58:13<5:28:10, 5.58s/it] 47%|████▋ | 3114/6640 [58:19<5:24:36, 5.52s/it] {'loss': 0.5979, 'learning_rate': 1.1487206740703094e-05, 'epoch': 0.47} 47%|████▋ | 3114/6640 [58:19<5:24:36, 5.52s/it] 47%|████▋ | 3115/6640 [58:24<5:23:28, 5.51s/it] {'loss': 0.5803, 'learning_rate': 1.148238256429199e-05, 'epoch': 0.47} 47%|████▋ | 3115/6640 [58:24<5:23:28, 5.51s/it] 47%|████▋ | 3116/6640 [58:29<5:20:22, 5.45s/it] {'loss': 0.5753, 'learning_rate': 1.14775580351136e-05, 'epoch': 0.47} 47%|████▋ | 3116/6640 [58:29<5:20:22, 5.45s/it] 47%|████▋ | 3117/6640 [58:35<5:20:08, 5.45s/it] {'loss': 0.5813, 'learning_rate': 1.1472733154316037e-05, 'epoch': 0.47} 47%|████▋ | 3117/6640 [58:35<5:20:08, 5.45s/it] 47%|████▋ | 3118/6640 [58:40<5:17:45, 5.41s/it] {'loss': 0.6035, 'learning_rate': 1.1467907923047488e-05, 'epoch': 0.47} 47%|████▋ | 3118/6640 [58:40<5:17:45, 5.41s/it] 47%|████▋ | 3119/6640 [58:45<5:17:29, 5.41s/it] {'loss': 0.6014, 'learning_rate': 1.1463082342456238e-05, 'epoch': 0.47} 47%|████▋ | 3119/6640 [58:45<5:17:29, 5.41s/it] 47%|████▋ | 3120/6640 [58:51<5:19:11, 5.44s/it] {'loss': 0.6067, 'learning_rate': 1.1458256413690634e-05, 'epoch': 0.47} 47%|████▋ | 3120/6640 [58:51<5:19:11, 5.44s/it] 47%|████▋ | 3121/6640 [58:56<5:18:46, 5.44s/it] {'loss': 0.6033, 'learning_rate': 1.1453430137899129e-05, 'epoch': 0.47} 47%|████▋ | 3121/6640 [58:56<5:18:46, 5.44s/it] 47%|████▋ | 3122/6640 [59:02<5:19:35, 5.45s/it] {'loss': 0.5989, 'learning_rate': 1.1448603516230241e-05, 'epoch': 0.47} 47%|████▋ | 3122/6640 [59:02<5:19:35, 5.45s/it] 47%|████▋ | 3123/6640 [59:07<5:19:35, 5.45s/it] {'loss': 0.5938, 'learning_rate': 1.1443776549832574e-05, 'epoch': 0.47} 47%|████▋ | 3123/6640 [59:07<5:19:35, 5.45s/it] 47%|████▋ | 3124/6640 [59:13<5:20:34, 5.47s/it] {'loss': 0.5916, 'learning_rate': 
1.1438949239854822e-05, 'epoch': 0.47} 47%|████▋ | 3124/6640 [59:13<5:20:34, 5.47s/it] 47%|████▋ | 3125/6640 [59:18<5:17:09, 5.41s/it] {'loss': 0.6011, 'learning_rate': 1.1434121587445752e-05, 'epoch': 0.47} 47%|████▋ | 3125/6640 [59:18<5:17:09, 5.41s/it] 47%|████▋ | 3126/6640 [59:24<5:19:07, 5.45s/it] {'loss': 0.615, 'learning_rate': 1.1429293593754216e-05, 'epoch': 0.47} 47%|████▋ | 3126/6640 [59:24<5:19:07, 5.45s/it] 47%|████▋ | 3127/6640 [59:29<5:19:56, 5.46s/it] {'loss': 0.6154, 'learning_rate': 1.1424465259929148e-05, 'epoch': 0.47} 47%|████▋ | 3127/6640 [59:29<5:19:56, 5.46s/it] 47%|████▋ | 3128/6640 [59:35<5:19:38, 5.46s/it] {'loss': 0.5734, 'learning_rate': 1.1419636587119563e-05, 'epoch': 0.47} 47%|████▋ | 3128/6640 [59:35<5:19:38, 5.46s/it] 47%|████▋ | 3129/6640 [59:40<5:18:01, 5.43s/it] {'loss': 0.5977, 'learning_rate': 1.1414807576474554e-05, 'epoch': 0.47} 47%|████▋ | 3129/6640 [59:40<5:18:01, 5.43s/it] 47%|████▋ | 3130/6640 [59:45<5:18:44, 5.45s/it] {'loss': 0.5979, 'learning_rate': 1.1409978229143297e-05, 'epoch': 0.47} 47%|████▋ | 3130/6640 [59:45<5:18:44, 5.45s/it] 47%|████▋ | 3131/6640 [59:51<5:18:42, 5.45s/it] {'loss': 0.5894, 'learning_rate': 1.1405148546275046e-05, 'epoch': 0.47} 47%|████▋ | 3131/6640 [59:51<5:18:42, 5.45s/it] 47%|████▋ | 3132/6640 [59:57<5:23:54, 5.54s/it] {'loss': 0.6052, 'learning_rate': 1.1400318529019134e-05, 'epoch': 0.47} 47%|████▋ | 3132/6640 [59:57<5:23:54, 5.54s/it] 47%|████▋ | 3133/6640 [1:00:02<5:19:29, 5.47s/it] {'loss': 0.6108, 'learning_rate': 1.1395488178524982e-05, 'epoch': 0.47} 47%|████▋ | 3133/6640 [1:00:02<5:19:29, 5.47s/it] 47%|████▋ | 3134/6640 [1:00:07<5:17:05, 5.43s/it] {'loss': 0.5948, 'learning_rate': 1.1390657495942077e-05, 'epoch': 0.47} 47%|████▋ | 3134/6640 [1:00:07<5:17:05, 5.43s/it] 47%|████▋ | 3135/6640 [1:00:13<5:13:22, 5.36s/it] {'loss': 0.6029, 'learning_rate': 1.1385826482419993e-05, 'epoch': 0.47} 47%|████▋ | 3135/6640 [1:00:13<5:13:22, 5.36s/it] 47%|████▋ | 3136/6640 [1:00:18<5:12:22, 
5.35s/it] {'loss': 0.6024, 'learning_rate': 1.1380995139108383e-05, 'epoch': 0.47} 47%|████▋ | 3136/6640 [1:00:18<5:12:22, 5.35s/it] 47%|████▋ | 3137/6640 [1:00:23<5:12:55, 5.36s/it] {'loss': 0.608, 'learning_rate': 1.137616346715698e-05, 'epoch': 0.47} 47%|████▋ | 3137/6640 [1:00:23<5:12:55, 5.36s/it] 47%|████▋ | 3138/6640 [1:00:28<5:10:50, 5.33s/it] {'loss': 0.6093, 'learning_rate': 1.137133146771559e-05, 'epoch': 0.47} 47%|████▋ | 3138/6640 [1:00:28<5:10:50, 5.33s/it] 47%|████▋ | 3139/6640 [1:00:34<5:10:44, 5.33s/it] {'loss': 0.5897, 'learning_rate': 1.1366499141934098e-05, 'epoch': 0.47} 47%|████▋ | 3139/6640 [1:00:34<5:10:44, 5.33s/it] 47%|████▋ | 3140/6640 [1:00:39<5:13:56, 5.38s/it] {'loss': 0.5961, 'learning_rate': 1.1361666490962468e-05, 'epoch': 0.47} 47%|████▋ | 3140/6640 [1:00:39<5:13:56, 5.38s/it] 47%|████▋ | 3141/6640 [1:00:45<5:14:17, 5.39s/it] {'loss': 0.6076, 'learning_rate': 1.1356833515950743e-05, 'epoch': 0.47} 47%|████▋ | 3141/6640 [1:00:45<5:14:17, 5.39s/it] 47%|████▋ | 3142/6640 [1:00:50<5:20:33, 5.50s/it] {'loss': 0.5949, 'learning_rate': 1.1352000218049038e-05, 'epoch': 0.47} 47%|████▋ | 3142/6640 [1:00:50<5:20:33, 5.50s/it] 47%|████▋ | 3143/6640 [1:00:56<5:19:09, 5.48s/it] {'loss': 0.5975, 'learning_rate': 1.1347166598407551e-05, 'epoch': 0.47} 47%|████▋ | 3143/6640 [1:00:56<5:19:09, 5.48s/it] 47%|████▋ | 3144/6640 [1:01:01<5:18:58, 5.47s/it] {'loss': 0.6287, 'learning_rate': 1.1342332658176556e-05, 'epoch': 0.47} 47%|████▋ | 3144/6640 [1:01:01<5:18:58, 5.47s/it] 47%|████▋ | 3145/6640 [1:01:07<5:21:38, 5.52s/it] {'loss': 0.6054, 'learning_rate': 1.1337498398506397e-05, 'epoch': 0.47} 47%|████▋ | 3145/6640 [1:01:07<5:21:38, 5.52s/it] 47%|████▋ | 3146/6640 [1:01:13<5:23:40, 5.56s/it] {'loss': 0.5749, 'learning_rate': 1.13326638205475e-05, 'epoch': 0.47} 47%|████▋ | 3146/6640 [1:01:13<5:23:40, 5.56s/it] 47%|████▋ | 3147/6640 [1:01:18<5:21:28, 5.52s/it] {'loss': 0.6075, 'learning_rate': 1.1327828925450363e-05, 'epoch': 0.47} 47%|████▋ | 
3147/6640 [1:01:18<5:21:28, 5.52s/it] 47%|████▋ | 3148/6640 [1:01:23<5:18:26, 5.47s/it] {'loss': 0.5776, 'learning_rate': 1.1322993714365567e-05, 'epoch': 0.47} 47%|████▋ | 3148/6640 [1:01:23<5:18:26, 5.47s/it] 47%|████▋ | 3149/6640 [1:01:29<5:18:56, 5.48s/it] {'loss': 0.6027, 'learning_rate': 1.1318158188443758e-05, 'epoch': 0.47} 47%|████▋ | 3149/6640 [1:01:29<5:18:56, 5.48s/it]6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 47%|████▋ | 3150/6640 [1:01:34<5:14:03, 5.40s/it] {'loss': 0.5865, 'learning_rate': 1.1313322348835658e-05, 'epoch': 0.47} 47%|████▋ | 3150/6640 [1:01:34<5:14:03, 5.40s/it] 47%|████▋ | 3151/6640 [1:01:40<5:17:32, 5.46s/it] {'loss': 0.5991, 'learning_rate': 1.130848619669207e-05, 'epoch': 0.47} 47%|████▋ | 3151/6640 [1:01:40<5:17:32, 5.46s/it] 47%|████▋ | 3152/6640 [1:01:45<5:17:24, 5.46s/it] {'loss': 0.5903, 'learning_rate': 1.130364973316387e-05, 'epoch': 0.47} 47%|████▋ | 3152/6640 [1:01:45<5:17:24, 5.46s/it] 47%|████▋ | 3153/6640 [1:01:51<5:22:03, 5.54s/it] {'loss': 0.607, 'learning_rate': 1.129881295940201e-05, 'epoch': 0.47} 47%|████▋ | 3153/6640 [1:01:51<5:22:03, 5.54s/it] 48%|████▊ | 3154/6640 [1:01:56<5:16:52, 5.45s/it] {'loss': 0.5896, 'learning_rate': 1.1293975876557506e-05, 'epoch': 0.47} 48%|████▊ | 3154/6640 [1:01:56<5:16:52, 5.45s/it] 48%|████▊ | 3155/6640 [1:02:02<5:18:28, 5.48s/it] {'loss': 0.5898, 'learning_rate': 1.1289138485781456e-05, 'epoch': 0.48} 48%|████▊ | 3155/6640 [1:02:02<5:18:28, 5.48s/it] 48%|████▊ | 3156/6640 [1:02:07<5:18:24, 5.48s/it] {'loss': 0.5891, 'learning_rate': 1.1284300788225032e-05, 'epoch': 0.48} 48%|████▊ | 3156/6640 [1:02:07<5:18:24, 5.48s/it] 48%|████▊ | 
3157/6640 [1:02:13<5:17:02, 5.46s/it] {'loss': 0.6011, 'learning_rate': 1.1279462785039472e-05, 'epoch': 0.48} 48%|████▊ | 3157/6640 [1:02:13<5:17:02, 5.46s/it] 48%|████▊ | 3158/6640 [1:02:18<5:15:31, 5.44s/it] {'loss': 0.584, 'learning_rate': 1.1274624477376091e-05, 'epoch': 0.48} 48%|████▊ | 3158/6640 [1:02:18<5:15:31, 5.44s/it] 48%|████▊ | 3159/6640 [1:02:24<5:19:15, 5.50s/it] {'loss': 0.5773, 'learning_rate': 1.1269785866386279e-05, 'epoch': 0.48} 48%|████▊ | 3159/6640 [1:02:24<5:19:15, 5.50s/it] 48%|████▊ | 3160/6640 [1:02:29<5:16:11, 5.45s/it] {'loss': 0.5876, 'learning_rate': 1.1264946953221496e-05, 'epoch': 0.48} 48%|████▊ | 3160/6640 [1:02:29<5:16:11, 5.45s/it] 48%|████▊ | 3161/6640 [1:02:35<5:19:43, 5.51s/it] {'loss': 0.5874, 'learning_rate': 1.126010773903327e-05, 'epoch': 0.48} 48%|████▊ | 3161/6640 [1:02:35<5:19:43, 5.51s/it] 48%|████▊ | 3162/6640 [1:02:40<5:18:24, 5.49s/it] {'loss': 0.5907, 'learning_rate': 1.125526822497321e-05, 'epoch': 0.48} 48%|████▊ | 3162/6640 [1:02:40<5:18:24, 5.49s/it] 48%|████▊ | 3163/6640 [1:02:46<5:18:26, 5.50s/it] {'loss': 0.5905, 'learning_rate': 1.1250428412192985e-05, 'epoch': 0.48} 48%|████▊ | 3163/6640 [1:02:46<5:18:26, 5.50s/it] 48%|████▊ | 3164/6640 [1:02:51<5:17:51, 5.49s/it] {'loss': 0.5977, 'learning_rate': 1.1245588301844343e-05, 'epoch': 0.48} 48%|████▊ | 3164/6640 [1:02:51<5:17:51, 5.49s/it] 48%|████▊ | 3165/6640 [1:02:56<5:12:00, 5.39s/it] {'loss': 0.5943, 'learning_rate': 1.12407478950791e-05, 'epoch': 0.48} 48%|████▊ | 3165/6640 [1:02:56<5:12:00, 5.39s/it] 48%|████▊ | 3166/6640 [1:03:02<5:13:39, 5.42s/it] {'loss': 0.5912, 'learning_rate': 1.1235907193049145e-05, 'epoch': 0.48} 48%|████▊ | 3166/6640 [1:03:02<5:13:39, 5.42s/it] 48%|████▊ | 3167/6640 [1:03:07<5:14:37, 5.44s/it] {'loss': 0.595, 'learning_rate': 1.123106619690643e-05, 'epoch': 0.48} 48%|████▊ | 3167/6640 [1:03:07<5:14:37, 5.44s/it] 48%|████▊ | 3168/6640 [1:03:12<5:11:05, 5.38s/it] {'loss': 0.5991, 'learning_rate': 1.1226224907802986e-05, 
'epoch': 0.48} 48%|████▊ | 3168/6640 [1:03:12<5:11:05, 5.38s/it] 48%|████▊ | 3169/6640 [1:03:18<5:10:26, 5.37s/it] {'loss': 0.6217, 'learning_rate': 1.1221383326890911e-05, 'epoch': 0.48} 48%|████▊ | 3169/6640 [1:03:18<5:10:26, 5.37s/it] 48%|████▊ | 3170/6640 [1:03:23<5:10:31, 5.37s/it] {'loss': 0.5997, 'learning_rate': 1.1216541455322367e-05, 'epoch': 0.48} 48%|████▊ | 3170/6640 [1:03:23<5:10:31, 5.37s/it] 48%|████▊ | 3171/6640 [1:03:29<5:12:57, 5.41s/it] {'loss': 0.6044, 'learning_rate': 1.1211699294249597e-05, 'epoch': 0.48} 48%|████▊ | 3171/6640 [1:03:29<5:12:57, 5.41s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4338 > 4096). Running this sequence through the model will result in indexing errors 48%|████▊ | 3172/6640 [1:03:34<5:14:55, 5.45s/it] {'loss': 0.6127, 'learning_rate': 1.1206856844824896e-05, 'epoch': 0.48} 48%|████▊ | 3172/6640 [1:03:34<5:14:55, 5.45s/it] 48%|████▊ | 3173/6640 [1:03:40<5:13:29, 5.43s/it] {'loss': 0.5809, 'learning_rate': 1.1202014108200645e-05, 'epoch': 0.48} 48%|████▊ | 3173/6640 [1:03:40<5:13:29, 5.43s/it] 48%|████▊ | 3174/6640 [1:03:45<5:13:27, 5.43s/it] {'loss': 0.5865, 'learning_rate': 1.119717108552928e-05, 'epoch': 0.48} 48%|████▊ | 3174/6640 [1:03:45<5:13:27, 5.43s/it] 48%|████▊ | 3175/6640 [1:03:50<5:13:59, 5.44s/it] {'loss': 0.6037, 'learning_rate': 1.1192327777963313e-05, 'epoch': 0.48} 48%|████▊ | 3175/6640 [1:03:50<5:13:59, 5.44s/it] 48%|████▊ | 3176/6640 [1:03:56<5:15:42, 5.47s/it] {'loss': 0.5801, 'learning_rate': 1.118748418665532e-05, 'epoch': 0.48} 48%|████▊ | 3176/6640 [1:03:56<5:15:42, 5.47s/it] 48%|████▊ | 3177/6640 [1:04:01<5:13:06, 5.43s/it] {'loss': 0.601, 'learning_rate': 1.1182640312757949e-05, 'epoch': 0.48} 48%|████▊ | 3177/6640 [1:04:01<5:13:06, 5.43s/it] 48%|████▊ | 3178/6640 [1:04:07<5:13:22, 5.43s/it] {'loss': 0.5997, 'learning_rate': 1.1177796157423908e-05, 'epoch': 0.48} 48%|████▊ | 3178/6640 [1:04:07<5:13:22, 5.43s/it] 48%|████▊ | 3179/6640 
[1:04:12<5:13:34, 5.44s/it] {'loss': 0.6152, 'learning_rate': 1.1172951721805977e-05, 'epoch': 0.48} 48%|████▊ | 3179/6640 [1:04:12<5:13:34, 5.44s/it] 48%|████▊ | 3180/6640 [1:04:18<5:13:34, 5.44s/it] {'loss': 0.5991, 'learning_rate': 1.1168107007057006e-05, 'epoch': 0.48} 48%|████▊ | 3180/6640 [1:04:18<5:13:34, 5.44s/it] 48%|████▊ | 3181/6640 [1:04:23<5:16:11, 5.48s/it] {'loss': 0.5998, 'learning_rate': 1.1163262014329902e-05, 'epoch': 0.48} 48%|████▊ | 3181/6640 [1:04:23<5:16:11, 5.48s/it] 48%|████▊ | 3182/6640 [1:04:29<5:17:23, 5.51s/it] {'loss': 0.5873, 'learning_rate': 1.1158416744777644e-05, 'epoch': 0.48} 48%|████▊ | 3182/6640 [1:04:29<5:17:23, 5.51s/it] 48%|████▊ | 3183/6640 [1:04:34<5:16:04, 5.49s/it] {'loss': 0.5936, 'learning_rate': 1.1153571199553276e-05, 'epoch': 0.48} 48%|████▊ | 3183/6640 [1:04:34<5:16:04, 5.49s/it] 48%|████▊ | 3184/6640 [1:04:40<5:14:32, 5.46s/it] {'loss': 0.604, 'learning_rate': 1.1148725379809911e-05, 'epoch': 0.48} 48%|████▊ | 3184/6640 [1:04:40<5:14:32, 5.46s/it] 48%|████▊ | 3185/6640 [1:04:45<5:15:33, 5.48s/it] {'loss': 0.6044, 'learning_rate': 1.1143879286700723e-05, 'epoch': 0.48} 48%|████▊ | 3185/6640 [1:04:45<5:15:33, 5.48s/it] 48%|████▊ | 3186/6640 [1:04:50<5:12:22, 5.43s/it] {'loss': 0.6032, 'learning_rate': 1.1139032921378947e-05, 'epoch': 0.48} 48%|████▊ | 3186/6640 [1:04:50<5:12:22, 5.43s/it] 48%|████▊ | 3187/6640 [1:04:56<5:11:19, 5.41s/it] {'loss': 0.591, 'learning_rate': 1.1134186284997897e-05, 'epoch': 0.48} 48%|████▊ | 3187/6640 [1:04:56<5:11:19, 5.41s/it] 48%|████▊ | 3188/6640 [1:05:01<5:14:17, 5.46s/it] {'loss': 0.5658, 'learning_rate': 1.1129339378710933e-05, 'epoch': 0.48} 48%|████▊ | 3188/6640 [1:05:01<5:14:17, 5.46s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 48%|████▊ | 3189/6640 [1:05:07<5:15:33, 5.49s/it] {'loss': 0.6016, 'learning_rate': 1.1124492203671498e-05, 'epoch': 0.48} 48%|████▊ | 3189/6640 [1:05:07<5:15:33, 5.49s/it] 48%|████▊ | 3190/6640 [1:05:12<5:10:57, 5.41s/it] {'loss': 0.607, 'learning_rate': 1.1119644761033079e-05, 'epoch': 0.48} 48%|████▊ | 3190/6640 [1:05:12<5:10:57, 5.41s/it] 48%|████▊ | 3191/6640 [1:05:18<5:15:36, 5.49s/it] {'loss': 0.5833, 'learning_rate': 1.1114797051949248e-05, 'epoch': 0.48} 48%|████▊ | 3191/6640 [1:05:18<5:15:36, 5.49s/it] 48%|████▊ | 3192/6640 [1:05:23<5:12:54, 5.45s/it] {'loss': 0.5864, 'learning_rate': 1.1109949077573623e-05, 'epoch': 0.48} 48%|████▊ | 3192/6640 [1:05:23<5:12:54, 5.45s/it] 48%|████▊ | 3193/6640 [1:05:29<5:12:37, 5.44s/it] {'loss': 0.5879, 'learning_rate': 1.1105100839059892e-05, 'epoch': 0.48} 48%|████▊ | 3193/6640 [1:05:29<5:12:37, 5.44s/it] 48%|████▊ | 3194/6640 [1:05:34<5:13:23, 5.46s/it] {'loss': 0.61, 'learning_rate': 1.110025233756181e-05, 'epoch': 0.48} 48%|████▊ | 3194/6640 [1:05:34<5:13:23, 5.46s/it] 48%|████▊ | 3195/6640 [1:05:40<5:12:04, 5.44s/it] {'loss': 0.5958, 'learning_rate': 1.1095403574233185e-05, 'epoch': 0.48} 48%|████▊ | 3195/6640 [1:05:40<5:12:04, 5.44s/it] 48%|████▊ | 3196/6640 [1:05:45<5:15:18, 5.49s/it] {'loss': 0.6009, 'learning_rate': 1.1090554550227899e-05, 'epoch': 0.48} 48%|████▊ | 3196/6640 [1:05:45<5:15:18, 5.49s/it] 48%|████▊ | 3197/6640 [1:05:51<5:14:07, 5.47s/it] {'loss': 0.6046, 'learning_rate': 1.1085705266699884e-05, 'epoch': 0.48} 48%|████▊ | 3197/6640 [1:05:51<5:14:07, 5.47s/it] 48%|████▊ | 3198/6640 [1:05:56<5:12:51, 5.45s/it] {'loss': 0.5923, 'learning_rate': 1.1080855724803141e-05, 'epoch': 0.48} 48%|████▊ | 3198/6640 [1:05:56<5:12:51, 5.45s/it] 48%|████▊ | 3199/6640 [1:06:02<5:13:38, 5.47s/it] {'loss': 0.5962, 'learning_rate': 1.1076005925691731e-05, 'epoch': 0.48} 48%|████▊ | 3199/6640 [1:06:02<5:13:38, 5.47s/it]6 AutoResumeHook: Checking whether to suspend... 
7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 48%|████▊ | 3200/6640 [1:06:07<5:12:33, 5.45s/it] {'loss': 0.5994, 'learning_rate': 1.1071155870519777e-05, 'epoch': 0.48} 48%|████▊ | 3200/6640 [1:06:07<5:12:33, 5.45s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 48%|████▊ | 3201/6640 [1:06:33<11:08:22, 11.66s/it] {'loss': 0.5985, 'learning_rate': 1.106630556044146e-05, 'epoch': 0.48} 48%|████▊ | 3201/6640 [1:06:33<11:08:22, 11.66s/it] 48%|████▊ | 3202/6640 [1:06:39<9:22:03, 9.81s/it] {'loss': 0.5809, 'learning_rate': 1.1061454996611026e-05, 'epoch': 0.48} 48%|████▊ | 3202/6640 [1:06:39<9:22:03, 9.81s/it] 48%|████▊ | 3203/6640 [1:06:44<8:04:58, 8.47s/it] {'loss': 0.591, 'learning_rate': 1.1056604180182777e-05, 'epoch': 0.48} 48%|████▊ | 3203/6640 [1:06:44<8:04:58, 8.47s/it] 48%|████▊ | 3204/6640 [1:06:49<7:11:04, 7.53s/it] {'loss': 0.6068, 'learning_rate': 1.105175311231108e-05, 'epoch': 0.48} 48%|████▊ | 3204/6640 [1:06:49<7:11:04, 7.53s/it] 48%|████▊ | 3205/6640 [1:06:55<6:34:56, 6.90s/it] {'loss': 0.601, 'learning_rate': 1.1046901794150358e-05, 'epoch': 0.48} 48%|████▊ | 3205/6640 [1:06:55<6:34:56, 6.90s/it] 48%|████▊ | 3206/6640 [1:07:00<6:10:27, 6.47s/it] {'loss': 0.5832, 'learning_rate': 1.104205022685509e-05, 'epoch': 0.48} 48%|████▊ | 3206/6640 [1:07:00<6:10:27, 6.47s/it] 48%|████▊ | 3207/6640 [1:07:05<5:49:57, 6.12s/it] {'loss': 0.5648, 'learning_rate': 1.1037198411579826e-05, 'epoch': 0.48} 48%|████▊ | 3207/6640 [1:07:05<5:49:57, 6.12s/it] 48%|████▊ | 3208/6640 [1:07:11<5:38:07, 5.91s/it] {'loss': 0.6011, 'learning_rate': 1.1032346349479162e-05, 'epoch': 0.48} 48%|████▊ | 3208/6640 [1:07:11<5:38:07, 5.91s/it] 48%|████▊ | 3209/6640 [1:07:16<5:27:05, 5.72s/it] {'loss': 0.5949, 'learning_rate': 1.1027494041707761e-05, 'epoch': 0.48} 48%|████▊ | 3209/6640 [1:07:16<5:27:05, 5.72s/it] 48%|████▊ | 3210/6640 [1:07:22<5:21:23, 5.62s/it] {'loss': 0.5982, 'learning_rate': 1.1022641489420342e-05, 'epoch': 0.48} 48%|████▊ | 3210/6640 [1:07:22<5:21:23, 5.62s/it] 48%|████▊ | 3211/6640 [1:07:27<5:17:23, 5.55s/it] {'loss': 0.6157, 'learning_rate': 1.1017788693771685e-05, 'epoch': 0.48} 48%|████▊ | 3211/6640 [1:07:27<5:17:23, 5.55s/it] 48%|████▊ | 3212/6640 [1:07:32<5:12:29, 5.47s/it] {'loss': 0.5991, 
'learning_rate': 1.1012935655916624e-05, 'epoch': 0.48} 48%|████▊ | 3212/6640 [1:07:32<5:12:29, 5.47s/it] 48%|████▊ | 3213/6640 [1:07:38<5:12:55, 5.48s/it] {'loss': 0.5894, 'learning_rate': 1.1008082377010045e-05, 'epoch': 0.48} 48%|████▊ | 3213/6640 [1:07:38<5:12:55, 5.48s/it] 48%|████▊ | 3214/6640 [1:07:43<5:14:14, 5.50s/it] {'loss': 0.6128, 'learning_rate': 1.100322885820691e-05, 'epoch': 0.48} 48%|████▊ | 3214/6640 [1:07:43<5:14:14, 5.50s/it] 48%|████▊ | 3215/6640 [1:07:49<5:12:03, 5.47s/it] {'loss': 0.604, 'learning_rate': 1.0998375100662215e-05, 'epoch': 0.48} 48%|████▊ | 3215/6640 [1:07:49<5:12:03, 5.47s/it] 48%|████▊ | 3216/6640 [1:07:54<5:10:48, 5.45s/it] {'loss': 0.5839, 'learning_rate': 1.0993521105531033e-05, 'epoch': 0.48} 48%|████▊ | 3216/6640 [1:07:54<5:10:48, 5.45s/it] 48%|████▊ | 3217/6640 [1:07:59<5:10:31, 5.44s/it] {'loss': 0.5902, 'learning_rate': 1.0988666873968477e-05, 'epoch': 0.48} 48%|████▊ | 3217/6640 [1:07:59<5:10:31, 5.44s/it] 48%|████▊ | 3218/6640 [1:08:05<5:08:55, 5.42s/it] {'loss': 0.5777, 'learning_rate': 1.0983812407129728e-05, 'epoch': 0.48} 48%|████▊ | 3218/6640 [1:08:05<5:08:55, 5.42s/it] 48%|████▊ | 3219/6640 [1:08:10<5:11:23, 5.46s/it] {'loss': 0.6033, 'learning_rate': 1.0978957706170016e-05, 'epoch': 0.48} 48%|████▊ | 3219/6640 [1:08:10<5:11:23, 5.46s/it] 48%|████▊ | 3220/6640 [1:08:16<5:09:22, 5.43s/it] {'loss': 0.6038, 'learning_rate': 1.0974102772244638e-05, 'epoch': 0.48} 48%|████▊ | 3220/6640 [1:08:16<5:09:22, 5.43s/it] 49%|████▊ | 3221/6640 [1:08:21<5:09:38, 5.43s/it] {'loss': 0.6078, 'learning_rate': 1.0969247606508928e-05, 'epoch': 0.49} 49%|████▊ | 3221/6640 [1:08:21<5:09:38, 5.43s/it] 49%|████▊ | 3222/6640 [1:08:27<5:14:32, 5.52s/it] {'loss': 0.587, 'learning_rate': 1.0964392210118292e-05, 'epoch': 0.49} 49%|████▊ | 3222/6640 [1:08:27<5:14:32, 5.52s/it] 49%|████▊ | 3223/6640 [1:08:32<5:13:17, 5.50s/it] {'loss': 0.5945, 'learning_rate': 1.0959536584228181e-05, 'epoch': 0.49} 49%|████▊ | 3223/6640 [1:08:32<5:13:17, 
5.50s/it] 49%|████▊ | 3224/6640 [1:08:38<5:12:51, 5.50s/it] {'loss': 0.5782, 'learning_rate': 1.0954680729994103e-05, 'epoch': 0.49} 49%|████▊ | 3224/6640 [1:08:38<5:12:51, 5.50s/it] 49%|████▊ | 3225/6640 [1:08:43<5:12:30, 5.49s/it] {'loss': 0.6167, 'learning_rate': 1.094982464857162e-05, 'epoch': 0.49} 49%|████▊ | 3225/6640 [1:08:43<5:12:30, 5.49s/it] 49%|████▊ | 3226/6640 [1:08:49<5:10:44, 5.46s/it] {'loss': 0.597, 'learning_rate': 1.0944968341116353e-05, 'epoch': 0.49} 49%|████▊ | 3226/6640 [1:08:49<5:10:44, 5.46s/it] 49%|████▊ | 3227/6640 [1:08:54<5:09:46, 5.45s/it] {'loss': 0.5948, 'learning_rate': 1.0940111808783972e-05, 'epoch': 0.49} 49%|████▊ | 3227/6640 [1:08:54<5:09:46, 5.45s/it] 49%|████▊ | 3228/6640 [1:09:00<5:12:47, 5.50s/it] {'loss': 0.6006, 'learning_rate': 1.09352550527302e-05, 'epoch': 0.49} 49%|████▊ | 3228/6640 [1:09:00<5:12:47, 5.50s/it] 49%|████▊ | 3229/6640 [1:09:05<5:14:19, 5.53s/it] {'loss': 0.5971, 'learning_rate': 1.0930398074110815e-05, 'epoch': 0.49} 49%|████▊ | 3229/6640 [1:09:05<5:14:19, 5.53s/it] 49%|████▊ | 3230/6640 [1:09:11<5:09:08, 5.44s/it] {'loss': 0.5989, 'learning_rate': 1.0925540874081649e-05, 'epoch': 0.49} 49%|████▊ | 3230/6640 [1:09:11<5:09:08, 5.44s/it] 49%|████▊ | 3231/6640 [1:09:16<5:07:33, 5.41s/it] {'loss': 0.581, 'learning_rate': 1.0920683453798581e-05, 'epoch': 0.49} 49%|████▊ | 3231/6640 [1:09:16<5:07:33, 5.41s/it] 49%|████▊ | 3232/6640 [1:09:21<5:05:51, 5.38s/it] {'loss': 0.5877, 'learning_rate': 1.0915825814417555e-05, 'epoch': 0.49} 49%|████▊ | 3232/6640 [1:09:21<5:05:51, 5.38s/it] 49%|████▊ | 3233/6640 [1:09:27<5:08:28, 5.43s/it] {'loss': 0.5742, 'learning_rate': 1.0910967957094549e-05, 'epoch': 0.49} 49%|████▊ | 3233/6640 [1:09:27<5:08:28, 5.43s/it] 49%|████▊ | 3234/6640 [1:09:32<5:07:46, 5.42s/it] {'loss': 0.5985, 'learning_rate': 1.0906109882985606e-05, 'epoch': 0.49} 49%|████▊ | 3234/6640 [1:09:32<5:07:46, 5.42s/it] 49%|████▊ | 3235/6640 [1:09:38<5:07:28, 5.42s/it] {'loss': 0.581, 'learning_rate': 
1.0901251593246822e-05, 'epoch': 0.49} 49%|████▊ | 3235/6640 [1:09:38<5:07:28, 5.42s/it] 49%|████▊ | 3236/6640 [1:09:43<5:07:49, 5.43s/it] {'loss': 0.5965, 'learning_rate': 1.0896393089034336e-05, 'epoch': 0.49} 49%|████▊ | 3236/6640 [1:09:43<5:07:49, 5.43s/it] 49%|████▉ | 3237/6640 [1:09:49<5:09:32, 5.46s/it] {'loss': 0.5742, 'learning_rate': 1.0891534371504341e-05, 'epoch': 0.49} 49%|████▉ | 3237/6640 [1:09:49<5:09:32, 5.46s/it] 49%|████▉ | 3238/6640 [1:09:54<5:09:17, 5.45s/it] {'loss': 0.5865, 'learning_rate': 1.0886675441813083e-05, 'epoch': 0.49} 49%|████▉ | 3238/6640 [1:09:54<5:09:17, 5.45s/it] 49%|████▉ | 3239/6640 [1:10:00<5:10:50, 5.48s/it] {'loss': 0.6153, 'learning_rate': 1.0881816301116857e-05, 'epoch': 0.49} 49%|████▉ | 3239/6640 [1:10:00<5:10:50, 5.48s/it] 49%|████▉ | 3240/6640 [1:10:05<5:10:39, 5.48s/it] {'loss': 0.6009, 'learning_rate': 1.0876956950572006e-05, 'epoch': 0.49} 49%|████▉ | 3240/6640 [1:10:05<5:10:39, 5.48s/it] 49%|████▉ | 3241/6640 [1:10:10<5:09:48, 5.47s/it] {'loss': 0.6203, 'learning_rate': 1.0872097391334928e-05, 'epoch': 0.49} 49%|████▉ | 3241/6640 [1:10:10<5:09:48, 5.47s/it] 49%|████▉ | 3242/6640 [1:10:16<5:09:38, 5.47s/it] {'loss': 0.6121, 'learning_rate': 1.0867237624562064e-05, 'epoch': 0.49} 49%|████▉ | 3242/6640 [1:10:16<5:09:38, 5.47s/it] 49%|████▉ | 3243/6640 [1:10:21<5:07:47, 5.44s/it] {'loss': 0.5685, 'learning_rate': 1.0862377651409912e-05, 'epoch': 0.49} 49%|████▉ | 3243/6640 [1:10:21<5:07:47, 5.44s/it] 49%|████▉ | 3244/6640 [1:10:27<5:10:38, 5.49s/it] {'loss': 0.5939, 'learning_rate': 1.0857517473035013e-05, 'epoch': 0.49} 49%|████▉ | 3244/6640 [1:10:27<5:10:38, 5.49s/it] 49%|████▉ | 3245/6640 [1:10:32<5:10:08, 5.48s/it] {'loss': 0.5771, 'learning_rate': 1.0852657090593961e-05, 'epoch': 0.49} 49%|████▉ | 3245/6640 [1:10:32<5:10:08, 5.48s/it] 49%|████▉ | 3246/6640 [1:10:38<5:15:05, 5.57s/it] {'loss': 0.5946, 'learning_rate': 1.0847796505243393e-05, 'epoch': 0.49} 49%|████▉ | 3246/6640 [1:10:38<5:15:05, 5.57s/it] 
49%|████▉ | 3247/6640 [1:10:44<5:12:40, 5.53s/it] {'loss': 0.6174, 'learning_rate': 1.0842935718140005e-05, 'epoch': 0.49} 49%|████▉ | 3247/6640 [1:10:44<5:12:40, 5.53s/it] 49%|████▉ | 3248/6640 [1:10:49<5:10:20, 5.49s/it] {'loss': 0.6195, 'learning_rate': 1.0838074730440525e-05, 'epoch': 0.49} 49%|████▉ | 3248/6640 [1:10:49<5:10:20, 5.49s/it] 49%|████▉ | 3249/6640 [1:10:55<5:10:31, 5.49s/it] {'loss': 0.5913, 'learning_rate': 1.0833213543301742e-05, 'epoch': 0.49} 49%|████▉ | 3249/6640 [1:10:55<5:10:31, 5.49s/it]7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 63 5AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend...2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 49%|████▉ | 3250/6640 [1:11:01<5:18:56, 5.65s/it] {'loss': 0.6072, 'learning_rate': 1.0828352157880489e-05, 'epoch': 0.49} 49%|████▉ | 3250/6640 [1:11:01<5:18:56, 5.65s/it] 49%|████▉ | 3251/6640 [1:11:06<5:15:51, 5.59s/it] {'loss': 0.6138, 'learning_rate': 1.0823490575333641e-05, 'epoch': 0.49} 49%|████▉ | 3251/6640 [1:11:06<5:15:51, 5.59s/it] 49%|████▉ | 3252/6640 [1:11:11<5:10:39, 5.50s/it] {'loss': 0.5866, 'learning_rate': 1.0818628796818134e-05, 'epoch': 0.49} 49%|████▉ | 3252/6640 [1:11:11<5:10:39, 5.50s/it] 49%|████▉ | 3253/6640 [1:11:17<5:10:09, 5.49s/it] {'loss': 0.6004, 'learning_rate': 1.0813766823490927e-05, 'epoch': 0.49} 49%|████▉ | 3253/6640 [1:11:17<5:10:09, 5.49s/it] 49%|████▉ | 3254/6640 [1:11:22<5:05:31, 5.41s/it] {'loss': 0.5698, 'learning_rate': 1.0808904656509054e-05, 'epoch': 0.49} 49%|████▉ | 3254/6640 [1:11:22<5:05:31, 5.41s/it] 49%|████▉ | 3255/6640 [1:11:27<5:06:06, 5.43s/it] {'loss': 0.5698, 'learning_rate': 1.0804042297029567e-05, 'epoch': 0.49} 49%|████▉ | 3255/6640 [1:11:27<5:06:06, 5.43s/it] 49%|████▉ | 3256/6640 [1:11:33<5:04:05, 5.39s/it] 
{'loss': 0.5726, 'learning_rate': 1.0799179746209586e-05, 'epoch': 0.49} 49%|████▉ | 3256/6640 [1:11:33<5:04:05, 5.39s/it] 49%|████▉ | 3257/6640 [1:11:38<5:03:10, 5.38s/it] {'loss': 0.5842, 'learning_rate': 1.0794317005206263e-05, 'epoch': 0.49} 49%|████▉ | 3257/6640 [1:11:38<5:03:10, 5.38s/it] 49%|████▉ | 3258/6640 [1:11:43<5:02:25, 5.37s/it] {'loss': 0.6151, 'learning_rate': 1.0789454075176805e-05, 'epoch': 0.49} 49%|████▉ | 3258/6640 [1:11:43<5:02:25, 5.37s/it] 49%|████▉ | 3259/6640 [1:11:49<5:04:45, 5.41s/it] {'loss': 0.5941, 'learning_rate': 1.0784590957278452e-05, 'epoch': 0.49} 49%|████▉ | 3259/6640 [1:11:49<5:04:45, 5.41s/it] 49%|████▉ | 3260/6640 [1:11:55<5:15:14, 5.60s/it] {'loss': 0.5917, 'learning_rate': 1.0779727652668496e-05, 'epoch': 0.49} 49%|████▉ | 3260/6640 [1:11:55<5:15:14, 5.60s/it] 49%|████▉ | 3261/6640 [1:12:00<5:11:53, 5.54s/it] {'loss': 0.6037, 'learning_rate': 1.0774864162504285e-05, 'epoch': 0.49} 49%|████▉ | 3261/6640 [1:12:00<5:11:53, 5.54s/it] 49%|████▉ | 3262/6640 [1:12:06<5:12:02, 5.54s/it] {'loss': 0.6086, 'learning_rate': 1.0770000487943182e-05, 'epoch': 0.49} 49%|████▉ | 3262/6640 [1:12:06<5:12:02, 5.54s/it] 49%|████▉ | 3263/6640 [1:12:11<5:09:40, 5.50s/it] {'loss': 0.5999, 'learning_rate': 1.0765136630142626e-05, 'epoch': 0.49} 49%|████▉ | 3263/6640 [1:12:11<5:09:40, 5.50s/it] 49%|████▉ | 3264/6640 [1:12:17<5:10:01, 5.51s/it] {'loss': 0.5739, 'learning_rate': 1.0760272590260072e-05, 'epoch': 0.49} 49%|████▉ | 3264/6640 [1:12:17<5:10:01, 5.51s/it] 49%|████▉ | 3265/6640 [1:12:22<5:06:53, 5.46s/it] {'loss': 0.5703, 'learning_rate': 1.075540836945304e-05, 'epoch': 0.49} 49%|████▉ | 3265/6640 [1:12:22<5:06:53, 5.46s/it] 49%|████▉ | 3266/6640 [1:12:28<5:04:57, 5.42s/it] {'loss': 0.6076, 'learning_rate': 1.0750543968879081e-05, 'epoch': 0.49} 49%|████▉ | 3266/6640 [1:12:28<5:04:57, 5.42s/it] 49%|████▉ | 3267/6640 [1:12:33<5:03:01, 5.39s/it] {'loss': 0.5697, 'learning_rate': 1.0745679389695792e-05, 'epoch': 0.49} 49%|████▉ | 3267/6640 
[1:12:33<5:03:01, 5.39s/it] 49%|████▉ | 3268/6640 [1:12:38<5:01:38, 5.37s/it] {'loss': 0.5977, 'learning_rate': 1.0740814633060812e-05, 'epoch': 0.49} 49%|████▉ | 3268/6640 [1:12:38<5:01:38, 5.37s/it] 49%|████▉ | 3269/6640 [1:12:44<5:03:49, 5.41s/it] {'loss': 0.6013, 'learning_rate': 1.0735949700131822e-05, 'epoch': 0.49} 49%|████▉ | 3269/6640 [1:12:44<5:03:49, 5.41s/it] 49%|████▉ | 3270/6640 [1:12:49<5:01:48, 5.37s/it] {'loss': 0.6004, 'learning_rate': 1.0731084592066548e-05, 'epoch': 0.49} 49%|████▉ | 3270/6640 [1:12:49<5:01:48, 5.37s/it] 49%|████▉ | 3271/6640 [1:12:55<5:07:34, 5.48s/it] {'loss': 0.5941, 'learning_rate': 1.0726219310022752e-05, 'epoch': 0.49} 49%|████▉ | 3271/6640 [1:12:55<5:07:34, 5.48s/it] 49%|████▉ | 3272/6640 [1:13:00<5:05:52, 5.45s/it] {'loss': 0.6044, 'learning_rate': 1.0721353855158243e-05, 'epoch': 0.49} 49%|████▉ | 3272/6640 [1:13:00<5:05:52, 5.45s/it] 49%|████▉ | 3273/6640 [1:13:05<5:03:25, 5.41s/it] {'loss': 0.5811, 'learning_rate': 1.0716488228630867e-05, 'epoch': 0.49} 49%|████▉ | 3273/6640 [1:13:05<5:03:25, 5.41s/it] 49%|████▉ | 3274/6640 [1:13:11<5:05:16, 5.44s/it] {'loss': 0.5982, 'learning_rate': 1.0711622431598516e-05, 'epoch': 0.49} 49%|████▉ | 3274/6640 [1:13:11<5:05:16, 5.44s/it] 49%|████▉ | 3275/6640 [1:13:17<5:08:15, 5.50s/it] {'loss': 0.5972, 'learning_rate': 1.0706756465219114e-05, 'epoch': 0.49} 49%|████▉ | 3275/6640 [1:13:17<5:08:15, 5.50s/it] 49%|████▉ | 3276/6640 [1:13:22<5:06:20, 5.46s/it] {'loss': 0.592, 'learning_rate': 1.0701890330650633e-05, 'epoch': 0.49} 49%|████▉ | 3276/6640 [1:13:22<5:06:20, 5.46s/it] 49%|████▉ | 3277/6640 [1:13:27<5:06:07, 5.46s/it] {'loss': 0.5943, 'learning_rate': 1.0697024029051085e-05, 'epoch': 0.49} 49%|████▉ | 3277/6640 [1:13:27<5:06:07, 5.46s/it] 49%|████▉ | 3278/6640 [1:13:33<5:07:19, 5.48s/it] {'loss': 0.6106, 'learning_rate': 1.0692157561578515e-05, 'epoch': 0.49} 49%|████▉ | 3278/6640 [1:13:33<5:07:19, 5.48s/it] 49%|████▉ | 3279/6640 [1:13:38<5:05:07, 5.45s/it] {'loss': 0.5677, 
'learning_rate': 1.0687290929391017e-05, 'epoch': 0.49} 49%|████▉ | 3279/6640 [1:13:38<5:05:07, 5.45s/it] 49%|████▉ | 3280/6640 [1:13:44<5:04:48, 5.44s/it] {'loss': 0.6257, 'learning_rate': 1.0682424133646712e-05, 'epoch': 0.49} 49%|████▉ | 3280/6640 [1:13:44<5:04:48, 5.44s/it] 49%|████▉ | 3281/6640 [1:13:49<5:05:05, 5.45s/it] {'loss': 0.6112, 'learning_rate': 1.0677557175503771e-05, 'epoch': 0.49} 49%|████▉ | 3281/6640 [1:13:49<5:05:05, 5.45s/it] 49%|████▉ | 3282/6640 [1:13:55<5:08:52, 5.52s/it] {'loss': 0.6046, 'learning_rate': 1.0672690056120398e-05, 'epoch': 0.49} 49%|████▉ | 3282/6640 [1:13:55<5:08:52, 5.52s/it] 49%|████▉ | 3283/6640 [1:14:00<5:08:32, 5.51s/it] {'loss': 0.5644, 'learning_rate': 1.0667822776654836e-05, 'epoch': 0.49} 49%|████▉ | 3283/6640 [1:14:00<5:08:32, 5.51s/it] 49%|████▉ | 3284/6640 [1:14:06<5:07:21, 5.50s/it] {'loss': 0.6264, 'learning_rate': 1.066295533826537e-05, 'epoch': 0.49} 49%|████▉ | 3284/6640 [1:14:06<5:07:21, 5.50s/it] 49%|████▉ | 3285/6640 [1:14:11<5:03:00, 5.42s/it] {'loss': 0.5774, 'learning_rate': 1.0658087742110322e-05, 'epoch': 0.49} 49%|████▉ | 3285/6640 [1:14:11<5:03:00, 5.42s/it] 49%|████▉ | 3286/6640 [1:14:16<5:00:20, 5.37s/it] {'loss': 0.5854, 'learning_rate': 1.0653219989348045e-05, 'epoch': 0.49} 49%|████▉ | 3286/6640 [1:14:16<5:00:20, 5.37s/it] 50%|████▉ | 3287/6640 [1:14:22<5:00:14, 5.37s/it] {'loss': 0.5846, 'learning_rate': 1.064835208113693e-05, 'epoch': 0.5} 50%|████▉ | 3287/6640 [1:14:22<5:00:14, 5.37s/it] 50%|████▉ | 3288/6640 [1:14:27<5:01:35, 5.40s/it] {'loss': 0.6072, 'learning_rate': 1.0643484018635418e-05, 'epoch': 0.5} 50%|████▉ | 3288/6640 [1:14:27<5:01:35, 5.40s/it] 50%|████▉ | 3289/6640 [1:14:32<5:00:29, 5.38s/it] {'loss': 0.599, 'learning_rate': 1.0638615803001967e-05, 'epoch': 0.5} 50%|████▉ | 3289/6640 [1:14:32<5:00:29, 5.38s/it] 50%|████▉ | 3290/6640 [1:14:38<5:00:38, 5.38s/it] {'loss': 0.5864, 'learning_rate': 1.063374743539509e-05, 'epoch': 0.5} 50%|████▉ | 3290/6640 [1:14:38<5:00:38, 
5.38s/it] 50%|████▉ | 3291/6640 [1:14:43<5:03:10, 5.43s/it] {'loss': 0.6027, 'learning_rate': 1.0628878916973322e-05, 'epoch': 0.5} 50%|████▉ | 3291/6640 [1:14:43<5:03:10, 5.43s/it] 50%|████▉ | 3292/6640 [1:14:49<5:02:10, 5.42s/it] {'loss': 0.5939, 'learning_rate': 1.0624010248895242e-05, 'epoch': 0.5} 50%|████▉ | 3292/6640 [1:14:49<5:02:10, 5.42s/it] 50%|████▉ | 3293/6640 [1:14:54<5:03:47, 5.45s/it] {'loss': 0.6243, 'learning_rate': 1.0619141432319464e-05, 'epoch': 0.5} 50%|████▉ | 3293/6640 [1:14:54<5:03:47, 5.45s/it] 50%|████▉ | 3294/6640 [1:15:00<5:07:57, 5.52s/it] {'loss': 0.6038, 'learning_rate': 1.0614272468404637e-05, 'epoch': 0.5} 50%|████▉ | 3294/6640 [1:15:00<5:07:57, 5.52s/it] 50%|████▉ | 3295/6640 [1:15:05<5:04:09, 5.46s/it] {'loss': 0.5914, 'learning_rate': 1.060940335830944e-05, 'epoch': 0.5} 50%|████▉ | 3295/6640 [1:15:05<5:04:09, 5.46s/it] 50%|████▉ | 3296/6640 [1:15:11<5:02:16, 5.42s/it] {'loss': 0.5885, 'learning_rate': 1.060453410319259e-05, 'epoch': 0.5} 50%|████▉ | 3296/6640 [1:15:11<5:02:16, 5.42s/it] 50%|████▉ | 3297/6640 [1:15:16<5:01:36, 5.41s/it] {'loss': 0.6089, 'learning_rate': 1.0599664704212844e-05, 'epoch': 0.5} 50%|████▉ | 3297/6640 [1:15:16<5:01:36, 5.41s/it] 50%|████▉ | 3298/6640 [1:15:22<5:05:49, 5.49s/it] {'loss': 0.593, 'learning_rate': 1.0594795162528982e-05, 'epoch': 0.5} 50%|████▉ | 3298/6640 [1:15:22<5:05:49, 5.49s/it] 50%|████▉ | 3299/6640 [1:15:27<5:06:04, 5.50s/it] {'loss': 0.5879, 'learning_rate': 1.0589925479299833e-05, 'epoch': 0.5} 50%|████▉ | 3299/6640 [1:15:27<5:06:04, 5.50s/it]7 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend...1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...5 AutoResumeHook: Checking whether to suspend... 50%|████▉ | 3300/6640 [1:15:33<5:03:28, 5.45s/it]3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.578, 'learning_rate': 1.058505565568424e-05, 'epoch': 0.5} 50%|████▉ | 3300/6640 [1:15:33<5:03:28, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 50%|████▉ | 3301/6640 [1:16:00<11:07:32, 12.00s/it] {'loss': 0.5976, 'learning_rate': 1.0580185692841095e-05, 'epoch': 0.5} 50%|████▉ | 3301/6640 [1:16:00<11:07:32, 12.00s/it] 50%|████▉ | 3302/6640 [1:16:05<9:18:20, 10.04s/it] {'loss': 0.5916, 'learning_rate': 1.0575315591929322e-05, 'epoch': 0.5} 50%|████▉ | 3302/6640 [1:16:05<9:18:20, 10.04s/it] 50%|████▉ | 3303/6640 [1:16:11<7:58:42, 8.61s/it] {'loss': 0.6079, 'learning_rate': 1.0570445354107874e-05, 'epoch': 0.5} 50%|████▉ | 3303/6640 [1:16:11<7:58:42, 8.61s/it] 50%|████▉ | 3304/6640 [1:16:16<6:59:31, 7.55s/it] {'loss': 0.615, 'learning_rate': 1.0565574980535733e-05, 'epoch': 0.5} 50%|████▉ | 3304/6640 [1:16:16<6:59:31, 7.55s/it] 50%|████▉ | 3305/6640 [1:16:21<6:24:05, 6.91s/it] {'loss': 0.6049, 'learning_rate': 1.0560704472371919e-05, 'epoch': 0.5} 50%|████▉ | 3305/6640 [1:16:21<6:24:05, 6.91s/it] 50%|████▉ | 3306/6640 [1:16:27<6:13:58, 6.73s/it] {'loss': 0.5992, 'learning_rate': 1.0555833830775483e-05, 'epoch': 0.5} 50%|████▉ | 3306/6640 [1:16:27<6:13:58, 6.73s/it] 50%|████▉ | 3307/6640 [1:16:33<5:52:32, 6.35s/it] {'loss': 
0.6065, 'learning_rate': 1.0550963056905506e-05, 'epoch': 0.5} 50%|████▉ | 3307/6640 [1:16:33<5:52:32, 6.35s/it] 50%|████▉ | 3308/6640 [1:16:38<5:37:39, 6.08s/it] {'loss': 0.5809, 'learning_rate': 1.05460921519211e-05, 'epoch': 0.5} 50%|████▉ | 3308/6640 [1:16:38<5:37:39, 6.08s/it] 50%|████▉ | 3309/6640 [1:16:44<5:25:47, 5.87s/it] {'loss': 0.5839, 'learning_rate': 1.0541221116981412e-05, 'epoch': 0.5} 50%|████▉ | 3309/6640 [1:16:44<5:25:47, 5.87s/it] 50%|████▉ | 3310/6640 [1:16:49<5:15:54, 5.69s/it] {'loss': 0.6066, 'learning_rate': 1.0536349953245622e-05, 'epoch': 0.5} 50%|████▉ | 3310/6640 [1:16:49<5:15:54, 5.69s/it] 50%|████▉ | 3311/6640 [1:16:54<5:13:13, 5.65s/it] {'loss': 0.5787, 'learning_rate': 1.0531478661872926e-05, 'epoch': 0.5} 50%|████▉ | 3311/6640 [1:16:54<5:13:13, 5.65s/it] 50%|████▉ | 3312/6640 [1:17:00<5:08:57, 5.57s/it] {'loss': 0.6147, 'learning_rate': 1.0526607244022571e-05, 'epoch': 0.5} 50%|████▉ | 3312/6640 [1:17:00<5:08:57, 5.57s/it] 50%|████▉ | 3313/6640 [1:17:05<5:08:01, 5.55s/it] {'loss': 0.5899, 'learning_rate': 1.0521735700853818e-05, 'epoch': 0.5} 50%|████▉ | 3313/6640 [1:17:05<5:08:01, 5.55s/it] 50%|████▉ | 3314/6640 [1:17:11<5:05:38, 5.51s/it] {'loss': 0.5779, 'learning_rate': 1.0516864033525961e-05, 'epoch': 0.5} 50%|████▉ | 3314/6640 [1:17:11<5:05:38, 5.51s/it] 50%|████▉ | 3315/6640 [1:17:16<5:02:10, 5.45s/it] {'loss': 0.5908, 'learning_rate': 1.0511992243198335e-05, 'epoch': 0.5} 50%|████▉ | 3315/6640 [1:17:16<5:02:10, 5.45s/it] 50%|████▉ | 3316/6640 [1:17:22<5:02:12, 5.46s/it] {'loss': 0.6135, 'learning_rate': 1.0507120331030286e-05, 'epoch': 0.5} 50%|████▉ | 3316/6640 [1:17:22<5:02:12, 5.46s/it] 50%|████▉ | 3317/6640 [1:17:27<5:06:29, 5.53s/it] {'loss': 0.5975, 'learning_rate': 1.0502248298181202e-05, 'epoch': 0.5} 50%|████▉ | 3317/6640 [1:17:27<5:06:29, 5.53s/it] 50%|████▉ | 3318/6640 [1:17:33<5:03:51, 5.49s/it] {'loss': 0.5919, 'learning_rate': 1.0497376145810496e-05, 'epoch': 0.5} 50%|████▉ | 3318/6640 [1:17:33<5:03:51, 
5.49s/it] 50%|████▉ | 3319/6640 [1:17:38<5:05:18, 5.52s/it] {'loss': 0.5597, 'learning_rate': 1.0492503875077613e-05, 'epoch': 0.5} 50%|████▉ | 3319/6640 [1:17:38<5:05:18, 5.52s/it] 50%|█████ | 3320/6640 [1:17:44<5:05:22, 5.52s/it] {'loss': 0.5887, 'learning_rate': 1.0487631487142018e-05, 'epoch': 0.5} 50%|█████ | 3320/6640 [1:17:44<5:05:22, 5.52s/it] 50%|█████ | 3321/6640 [1:17:49<5:04:30, 5.50s/it] {'loss': 0.5832, 'learning_rate': 1.0482758983163215e-05, 'epoch': 0.5} 50%|█████ | 3321/6640 [1:17:49<5:04:30, 5.50s/it] 50%|█████ | 3322/6640 [1:17:55<5:01:12, 5.45s/it] {'loss': 0.6046, 'learning_rate': 1.0477886364300722e-05, 'epoch': 0.5} 50%|█████ | 3322/6640 [1:17:55<5:01:12, 5.45s/it] 50%|█████ | 3323/6640 [1:18:00<5:00:51, 5.44s/it] {'loss': 0.5953, 'learning_rate': 1.0473013631714092e-05, 'epoch': 0.5} 50%|█████ | 3323/6640 [1:18:00<5:00:51, 5.44s/it] 50%|█████ | 3324/6640 [1:18:05<5:00:06, 5.43s/it] {'loss': 0.5729, 'learning_rate': 1.0468140786562911e-05, 'epoch': 0.5} 50%|█████ | 3324/6640 [1:18:05<5:00:06, 5.43s/it] 50%|█████ | 3325/6640 [1:18:11<5:00:42, 5.44s/it] {'loss': 0.6015, 'learning_rate': 1.0463267830006779e-05, 'epoch': 0.5} 50%|█████ | 3325/6640 [1:18:11<5:00:42, 5.44s/it] 50%|█████ | 3326/6640 [1:18:16<5:03:37, 5.50s/it] {'loss': 0.5701, 'learning_rate': 1.0458394763205329e-05, 'epoch': 0.5} 50%|█████ | 3326/6640 [1:18:16<5:03:37, 5.50s/it] 50%|█████ | 3327/6640 [1:18:22<5:03:31, 5.50s/it] {'loss': 0.5894, 'learning_rate': 1.0453521587318226e-05, 'epoch': 0.5} 50%|█████ | 3327/6640 [1:18:22<5:03:31, 5.50s/it] 50%|█████ | 3328/6640 [1:18:27<5:01:52, 5.47s/it] {'loss': 0.5927, 'learning_rate': 1.044864830350515e-05, 'epoch': 0.5} 50%|█████ | 3328/6640 [1:18:27<5:01:52, 5.47s/it] 50%|█████ | 3329/6640 [1:18:33<5:00:09, 5.44s/it] {'loss': 0.6043, 'learning_rate': 1.0443774912925814e-05, 'epoch': 0.5} 50%|█████ | 3329/6640 [1:18:33<5:00:09, 5.44s/it] 50%|█████ | 3330/6640 [1:18:38<4:58:40, 5.41s/it] {'loss': 0.5814, 'learning_rate': 
1.0438901416739955e-05, 'epoch': 0.5} 50%|█████ | 3330/6640 [1:18:38<4:58:40, 5.41s/it] 50%|█████ | 3331/6640 [1:18:44<5:00:33, 5.45s/it] {'loss': 0.5938, 'learning_rate': 1.0434027816107333e-05, 'epoch': 0.5} 50%|█████ | 3331/6640 [1:18:44<5:00:33, 5.45s/it] 50%|█████ | 3332/6640 [1:18:49<5:00:20, 5.45s/it] {'loss': 0.5868, 'learning_rate': 1.0429154112187731e-05, 'epoch': 0.5} 50%|█████ | 3332/6640 [1:18:49<5:00:20, 5.45s/it] 50%|█████ | 3333/6640 [1:18:55<5:01:16, 5.47s/it] {'loss': 0.5969, 'learning_rate': 1.0424280306140966e-05, 'epoch': 0.5} 50%|█████ | 3333/6640 [1:18:55<5:01:16, 5.47s/it] 50%|█████ | 3334/6640 [1:19:00<5:00:54, 5.46s/it] {'loss': 0.5914, 'learning_rate': 1.0419406399126868e-05, 'epoch': 0.5} 50%|█████ | 3334/6640 [1:19:00<5:00:54, 5.46s/it] 50%|█████ | 3335/6640 [1:19:05<4:59:55, 5.45s/it] {'loss': 0.5936, 'learning_rate': 1.0414532392305301e-05, 'epoch': 0.5} 50%|█████ | 3335/6640 [1:19:05<4:59:55, 5.45s/it] 50%|█████ | 3336/6640 [1:19:11<4:58:27, 5.42s/it] {'loss': 0.5852, 'learning_rate': 1.0409658286836144e-05, 'epoch': 0.5} 50%|█████ | 3336/6640 [1:19:11<4:58:27, 5.42s/it] 50%|█████ | 3337/6640 [1:19:16<4:58:00, 5.41s/it] {'loss': 0.5801, 'learning_rate': 1.0404784083879305e-05, 'epoch': 0.5} 50%|█████ | 3337/6640 [1:19:16<4:58:00, 5.41s/it] 50%|█████ | 3338/6640 [1:19:22<5:02:27, 5.50s/it] {'loss': 0.5898, 'learning_rate': 1.0399909784594711e-05, 'epoch': 0.5} 50%|█████ | 3338/6640 [1:19:22<5:02:27, 5.50s/it] 50%|█████ | 3339/6640 [1:19:27<5:01:14, 5.48s/it] {'loss': 0.5827, 'learning_rate': 1.0395035390142319e-05, 'epoch': 0.5} 50%|█████ | 3339/6640 [1:19:27<5:01:14, 5.48s/it] 50%|█████ | 3340/6640 [1:19:33<4:57:31, 5.41s/it] {'loss': 0.5605, 'learning_rate': 1.03901609016821e-05, 'epoch': 0.5} 50%|█████ | 3340/6640 [1:19:33<4:57:31, 5.41s/it] 50%|█████ | 3341/6640 [1:19:38<4:59:05, 5.44s/it] {'loss': 0.5929, 'learning_rate': 1.0385286320374053e-05, 'epoch': 0.5} 50%|█████ | 3341/6640 [1:19:38<4:59:05, 5.44s/it] 50%|█████ | 3342/6640 
[1:19:44<5:01:31, 5.49s/it] {'loss': 0.6063, 'learning_rate': 1.0380411647378196e-05, 'epoch': 0.5} 50%|█████ | 3342/6640 [1:19:44<5:01:31, 5.49s/it] 50%|█████ | 3343/6640 [1:19:49<5:03:12, 5.52s/it] {'loss': 0.5878, 'learning_rate': 1.0375536883854575e-05, 'epoch': 0.5} 50%|█████ | 3343/6640 [1:19:49<5:03:12, 5.52s/it] 50%|█████ | 3344/6640 [1:19:55<5:02:25, 5.51s/it] {'loss': 0.5835, 'learning_rate': 1.037066203096325e-05, 'epoch': 0.5} 50%|█████ | 3344/6640 [1:19:55<5:02:25, 5.51s/it] 50%|█████ | 3345/6640 [1:20:00<5:00:21, 5.47s/it] {'loss': 0.5991, 'learning_rate': 1.0365787089864303e-05, 'epoch': 0.5} 50%|█████ | 3345/6640 [1:20:00<5:00:21, 5.47s/it] 50%|█████ | 3346/6640 [1:20:06<5:00:42, 5.48s/it] {'loss': 0.606, 'learning_rate': 1.0360912061717845e-05, 'epoch': 0.5} 50%|█████ | 3346/6640 [1:20:06<5:00:42, 5.48s/it] 50%|█████ | 3347/6640 [1:20:11<4:58:38, 5.44s/it] {'loss': 0.5995, 'learning_rate': 1.0356036947683994e-05, 'epoch': 0.5} 50%|█████ | 3347/6640 [1:20:11<4:58:38, 5.44s/it] 50%|█████ | 3348/6640 [1:20:16<4:56:48, 5.41s/it] {'loss': 0.604, 'learning_rate': 1.0351161748922905e-05, 'epoch': 0.5} 50%|█████ | 3348/6640 [1:20:16<4:56:48, 5.41s/it] 50%|█████ | 3349/6640 [1:20:22<4:56:17, 5.40s/it] {'loss': 0.5776, 'learning_rate': 1.0346286466594738e-05, 'epoch': 0.5} 50%|█████ | 3349/6640 [1:20:22<4:56:17, 5.40s/it]06 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 50%|█████ | 3350/6640 [1:20:27<5:01:23, 5.50s/it]3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6022, 'learning_rate': 1.034141110185968e-05, 'epoch': 0.5} 50%|█████ | 3350/6640 [1:20:27<5:01:23, 5.50s/it] 50%|█████ | 3351/6640 [1:20:33<5:02:47, 5.52s/it] {'loss': 0.5877, 'learning_rate': 1.0336535655877942e-05, 'epoch': 0.5} 50%|█████ | 3351/6640 [1:20:33<5:02:47, 5.52s/it] 50%|█████ | 3352/6640 [1:20:38<5:01:24, 5.50s/it] {'loss': 0.5768, 'learning_rate': 1.0331660129809743e-05, 'epoch': 0.5} 50%|█████ | 3352/6640 [1:20:38<5:01:24, 5.50s/it] 50%|█████ | 3353/6640 [1:20:44<5:00:23, 5.48s/it] {'loss': 0.5867, 'learning_rate': 1.0326784524815331e-05, 'epoch': 0.5} 50%|█████ | 3353/6640 [1:20:44<5:00:23, 5.48s/it] 51%|█████ | 3354/6640 [1:20:49<4:57:47, 5.44s/it] {'loss': 0.5834, 'learning_rate': 1.0321908842054969e-05, 'epoch': 0.51} 51%|█████ | 3354/6640 [1:20:49<4:57:47, 5.44s/it] 51%|█████ | 3355/6640 [1:20:55<4:56:23, 5.41s/it] {'loss': 0.6339, 'learning_rate': 1.031703308268894e-05, 'epoch': 0.51} 51%|█████ | 3355/6640 [1:20:55<4:56:23, 5.41s/it] 51%|█████ | 3356/6640 [1:21:00<4:56:59, 5.43s/it] {'loss': 0.5866, 'learning_rate': 1.031215724787754e-05, 'epoch': 0.51} 51%|█████ | 3356/6640 [1:21:00<4:56:59, 5.43s/it] 51%|█████ | 3357/6640 [1:21:06<4:59:00, 5.46s/it] {'loss': 0.5903, 'learning_rate': 1.0307281338781092e-05, 'epoch': 0.51} 51%|█████ | 3357/6640 [1:21:06<4:59:00, 5.46s/it] 51%|█████ | 3358/6640 [1:21:11<4:58:26, 5.46s/it] {'loss': 0.5916, 'learning_rate': 1.0302405356559925e-05, 'epoch': 0.51} 51%|█████ | 3358/6640 [1:21:11<4:58:26, 5.46s/it] 51%|█████ | 3359/6640 [1:21:16<4:56:18, 5.42s/it] {'loss': 0.5913, 'learning_rate': 1.0297529302374396e-05, 'epoch': 0.51} 51%|█████ | 3359/6640 [1:21:16<4:56:18, 5.42s/it] 51%|█████ | 3360/6640 [1:21:22<4:54:23, 5.39s/it] {'loss': 0.6082, 'learning_rate': 1.0292653177384878e-05, 'epoch': 0.51} 51%|█████ | 3360/6640 [1:21:22<4:54:23, 5.39s/it] 51%|█████ | 3361/6640 [1:21:27<4:52:03, 5.34s/it] {'loss': 0.5777, 'learning_rate': 1.0287776982751752e-05, 'epoch': 0.51} 51%|█████ | 3361/6640 
[1:21:27<4:52:03, 5.34s/it] 51%|█████ | 3362/6640 [1:21:33<4:56:39, 5.43s/it] {'loss': 0.6009, 'learning_rate': 1.0282900719635425e-05, 'epoch': 0.51} 51%|█████ | 3362/6640 [1:21:33<4:56:39, 5.43s/it] 51%|█████ | 3363/6640 [1:21:38<4:55:59, 5.42s/it] {'loss': 0.5789, 'learning_rate': 1.0278024389196314e-05, 'epoch': 0.51} 51%|█████ | 3363/6640 [1:21:38<4:55:59, 5.42s/it] 51%|█████ | 3364/6640 [1:21:43<4:54:30, 5.39s/it] {'loss': 0.5967, 'learning_rate': 1.0273147992594861e-05, 'epoch': 0.51} 51%|█████ | 3364/6640 [1:21:43<4:54:30, 5.39s/it] 51%|█████ | 3365/6640 [1:21:49<4:57:01, 5.44s/it] {'loss': 0.6146, 'learning_rate': 1.0268271530991509e-05, 'epoch': 0.51} 51%|█████ | 3365/6640 [1:21:49<4:57:01, 5.44s/it] 51%|█████ | 3366/6640 [1:21:54<4:58:14, 5.47s/it] {'loss': 0.593, 'learning_rate': 1.0263395005546735e-05, 'epoch': 0.51} 51%|█████ | 3366/6640 [1:21:54<4:58:14, 5.47s/it] 51%|█████ | 3367/6640 [1:22:00<4:57:28, 5.45s/it] {'loss': 0.6115, 'learning_rate': 1.025851841742101e-05, 'epoch': 0.51} 51%|█████ | 3367/6640 [1:22:00<4:57:28, 5.45s/it] 51%|█████ | 3368/6640 [1:22:05<4:56:03, 5.43s/it] {'loss': 0.5845, 'learning_rate': 1.0253641767774838e-05, 'epoch': 0.51} 51%|█████ | 3368/6640 [1:22:05<4:56:03, 5.43s/it] 51%|█████ | 3369/6640 [1:22:11<5:02:51, 5.56s/it] {'loss': 0.5786, 'learning_rate': 1.0248765057768733e-05, 'epoch': 0.51} 51%|█████ | 3369/6640 [1:22:11<5:02:51, 5.56s/it] 51%|█████ | 3370/6640 [1:22:16<4:59:45, 5.50s/it] {'loss': 0.6083, 'learning_rate': 1.0243888288563213e-05, 'epoch': 0.51} 51%|█████ | 3370/6640 [1:22:16<4:59:45, 5.50s/it] 51%|█████ | 3371/6640 [1:22:22<4:58:33, 5.48s/it] {'loss': 0.5977, 'learning_rate': 1.0239011461318827e-05, 'epoch': 0.51} 51%|█████ | 3371/6640 [1:22:22<4:58:33, 5.48s/it] 51%|█████ | 3372/6640 [1:22:28<5:07:08, 5.64s/it] {'loss': 0.5721, 'learning_rate': 1.0234134577196122e-05, 'epoch': 0.51} 51%|█████ | 3372/6640 [1:22:28<5:07:08, 5.64s/it] 51%|█████ | 3373/6640 [1:22:34<5:08:28, 5.67s/it] {'loss': 0.6007, 
'learning_rate': 1.0229257637355672e-05, 'epoch': 0.51} 51%|█████ | 3373/6640 [1:22:34<5:08:28, 5.67s/it] 51%|█████ | 3374/6640 [1:22:39<5:04:38, 5.60s/it] {'loss': 0.5587, 'learning_rate': 1.0224380642958052e-05, 'epoch': 0.51} 51%|█████ | 3374/6640 [1:22:39<5:04:38, 5.60s/it] 51%|█████ | 3375/6640 [1:22:44<5:02:16, 5.55s/it] {'loss': 0.5967, 'learning_rate': 1.0219503595163857e-05, 'epoch': 0.51} 51%|█████ | 3375/6640 [1:22:44<5:02:16, 5.55s/it] 51%|█████ | 3376/6640 [1:22:50<5:03:46, 5.58s/it] {'loss': 0.6177, 'learning_rate': 1.0214626495133697e-05, 'epoch': 0.51} 51%|█████ | 3376/6640 [1:22:50<5:03:46, 5.58s/it] 51%|█████ | 3377/6640 [1:22:55<5:00:07, 5.52s/it] {'loss': 0.5557, 'learning_rate': 1.0209749344028189e-05, 'epoch': 0.51} 51%|█████ | 3377/6640 [1:22:55<5:00:07, 5.52s/it] 51%|█████ | 3378/6640 [1:23:01<4:58:59, 5.50s/it] {'loss': 0.5714, 'learning_rate': 1.0204872143007965e-05, 'epoch': 0.51} 51%|█████ | 3378/6640 [1:23:01<4:58:59, 5.50s/it] 51%|█████ | 3379/6640 [1:23:06<4:57:50, 5.48s/it] {'loss': 0.6034, 'learning_rate': 1.0199994893233664e-05, 'epoch': 0.51} 51%|█████ | 3379/6640 [1:23:06<4:57:50, 5.48s/it] 51%|█████ | 3380/6640 [1:23:12<4:56:37, 5.46s/it] {'loss': 0.5858, 'learning_rate': 1.0195117595865947e-05, 'epoch': 0.51} 51%|█████ | 3380/6640 [1:23:12<4:56:37, 5.46s/it] 51%|█████ | 3381/6640 [1:23:17<4:55:20, 5.44s/it] {'loss': 0.6049, 'learning_rate': 1.0190240252065472e-05, 'epoch': 0.51} 51%|█████ | 3381/6640 [1:23:17<4:55:20, 5.44s/it] 51%|█████ | 3382/6640 [1:23:22<4:53:00, 5.40s/it] {'loss': 0.601, 'learning_rate': 1.0185362862992925e-05, 'epoch': 0.51} 51%|█████ | 3382/6640 [1:23:22<4:53:00, 5.40s/it] 51%|█████ | 3383/6640 [1:23:28<4:52:21, 5.39s/it] {'loss': 0.6086, 'learning_rate': 1.0180485429808986e-05, 'epoch': 0.51} 51%|█████ | 3383/6640 [1:23:28<4:52:21, 5.39s/it] 51%|█████ | 3384/6640 [1:23:33<4:53:17, 5.40s/it] {'loss': 0.5798, 'learning_rate': 1.0175607953674358e-05, 'epoch': 0.51} 51%|█████ | 3384/6640 [1:23:33<4:53:17, 
5.40s/it] 51%|█████ | 3385/6640 [1:23:39<4:53:16, 5.41s/it] {'loss': 0.5811, 'learning_rate': 1.017073043574975e-05, 'epoch': 0.51} 51%|█████ | 3385/6640 [1:23:39<4:53:16, 5.41s/it] 51%|█████ | 3386/6640 [1:23:44<4:54:50, 5.44s/it] {'loss': 0.6042, 'learning_rate': 1.0165852877195874e-05, 'epoch': 0.51} 51%|█████ | 3386/6640 [1:23:44<4:54:50, 5.44s/it] 51%|█████ | 3387/6640 [1:23:49<4:51:34, 5.38s/it] {'loss': 0.5778, 'learning_rate': 1.0160975279173467e-05, 'epoch': 0.51} 51%|█████ | 3387/6640 [1:23:49<4:51:34, 5.38s/it] 51%|█████ | 3388/6640 [1:23:55<4:54:38, 5.44s/it] {'loss': 0.6119, 'learning_rate': 1.015609764284326e-05, 'epoch': 0.51} 51%|█████ | 3388/6640 [1:23:55<4:54:38, 5.44s/it] 51%|█████ | 3389/6640 [1:24:00<4:51:30, 5.38s/it] {'loss': 0.5656, 'learning_rate': 1.0151219969366001e-05, 'epoch': 0.51} 51%|█████ | 3389/6640 [1:24:00<4:51:30, 5.38s/it] 51%|█████ | 3390/6640 [1:24:06<4:53:26, 5.42s/it] {'loss': 0.5987, 'learning_rate': 1.0146342259902446e-05, 'epoch': 0.51} 51%|█████ | 3390/6640 [1:24:06<4:53:26, 5.42s/it] 51%|█████ | 3391/6640 [1:24:11<4:53:58, 5.43s/it] {'loss': 0.59, 'learning_rate': 1.0141464515613357e-05, 'epoch': 0.51} 51%|█████ | 3391/6640 [1:24:11<4:53:58, 5.43s/it] 51%|█████ | 3392/6640 [1:24:17<4:53:51, 5.43s/it] {'loss': 0.5831, 'learning_rate': 1.013658673765951e-05, 'epoch': 0.51} 51%|█████ | 3392/6640 [1:24:17<4:53:51, 5.43s/it] 51%|█████ | 3393/6640 [1:24:22<4:54:31, 5.44s/it] {'loss': 0.5889, 'learning_rate': 1.0131708927201687e-05, 'epoch': 0.51} 51%|█████ | 3393/6640 [1:24:22<4:54:31, 5.44s/it] 51%|█████ | 3394/6640 [1:24:27<4:53:36, 5.43s/it] {'loss': 0.6004, 'learning_rate': 1.012683108540067e-05, 'epoch': 0.51} 51%|█████ | 3394/6640 [1:24:27<4:53:36, 5.43s/it] 51%|█████ | 3395/6640 [1:24:33<4:53:54, 5.43s/it] {'loss': 0.6173, 'learning_rate': 1.0121953213417256e-05, 'epoch': 0.51} 51%|█████ | 3395/6640 [1:24:33<4:53:54, 5.43s/it] 51%|█████ | 3396/6640 [1:24:39<4:55:48, 5.47s/it] {'loss': 0.5877, 'learning_rate': 
1.0117075312412253e-05, 'epoch': 0.51} 51%|█████ | 3396/6640 [1:24:39<4:55:48, 5.47s/it] 51%|█████ | 3397/6640 [1:24:44<4:54:23, 5.45s/it] {'loss': 0.5713, 'learning_rate': 1.011219738354646e-05, 'epoch': 0.51} 51%|█████ | 3397/6640 [1:24:44<4:54:23, 5.45s/it] 51%|█████ | 3398/6640 [1:24:49<4:53:52, 5.44s/it] {'loss': 0.5994, 'learning_rate': 1.0107319427980704e-05, 'epoch': 0.51} 51%|█████ | 3398/6640 [1:24:49<4:53:52, 5.44s/it] 51%|█████ | 3399/6640 [1:24:55<4:54:38, 5.45s/it] {'loss': 0.6038, 'learning_rate': 1.0102441446875801e-05, 'epoch': 0.51} 51%|█████ | 3399/6640 [1:24:55<4:54:38, 5.45s/it]0 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 51%|█████ | 3400/6640 [1:25:00<4:54:57, 5.46s/it]3 1AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... {'loss': 0.6066, 'learning_rate': 1.0097563441392582e-05, 'epoch': 0.51} 51%|█████ | 3400/6640 [1:25:00<4:54:57, 5.46s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 51%|█████ | 3401/6640 [1:25:28<11:01:23, 12.25s/it] {'loss': 0.6111, 'learning_rate': 1.009268541269188e-05, 'epoch': 0.51} 51%|█████ | 3401/6640 [1:25:28<11:01:23, 12.25s/it] 51%|█████ | 3402/6640 [1:25:34<9:09:43, 10.19s/it] {'loss': 0.6142, 'learning_rate': 1.008780736193454e-05, 'epoch': 0.51} 51%|█████ | 3402/6640 [1:25:34<9:09:43, 10.19s/it] 51%|█████▏ | 3403/6640 [1:25:39<7:52:47, 8.76s/it] {'loss': 0.5867, 'learning_rate': 1.0082929290281405e-05, 'epoch': 0.51} 51%|█████▏ | 3403/6640 [1:25:39<7:52:47, 8.76s/it] 51%|█████▏ | 3404/6640 [1:25:45<6:58:34, 7.76s/it] {'loss': 0.6188, 'learning_rate': 1.0078051198893318e-05, 'epoch': 0.51} 51%|█████▏ | 3404/6640 [1:25:45<6:58:34, 7.76s/it] 51%|█████▏ | 3405/6640 [1:25:50<6:21:52, 7.08s/it] {'loss': 0.6317, 'learning_rate': 1.0073173088931143e-05, 'epoch': 0.51} 51%|█████▏ | 3405/6640 [1:25:50<6:21:52, 7.08s/it] 51%|█████▏ | 3406/6640 [1:25:56<5:55:51, 6.60s/it] {'loss': 0.6039, 'learning_rate': 1.0068294961555734e-05, 'epoch': 0.51} 51%|█████▏ | 3406/6640 [1:25:56<5:55:51, 6.60s/it] 51%|█████▏ | 3407/6640 [1:26:01<5:36:23, 6.24s/it] {'loss': 0.5869, 'learning_rate': 1.0063416817927958e-05, 'epoch': 0.51} 51%|█████▏ | 3407/6640 [1:26:01<5:36:23, 6.24s/it] 51%|█████▏ | 3408/6640 [1:26:06<5:21:36, 5.97s/it] {'loss': 0.5769, 'learning_rate': 1.005853865920868e-05, 'epoch': 0.51} 51%|█████▏ | 3408/6640 [1:26:06<5:21:36, 5.97s/it] 51%|█████▏ | 3409/6640 [1:26:12<5:12:46, 5.81s/it] {'loss': 0.621, 'learning_rate': 1.0053660486558766e-05, 'epoch': 0.51} 51%|█████▏ | 3409/6640 [1:26:12<5:12:46, 5.81s/it] 51%|█████▏ | 3410/6640 [1:26:17<5:05:54, 5.68s/it] {'loss': 0.5914, 'learning_rate': 1.0048782301139102e-05, 'epoch': 0.51} 51%|█████▏ | 3410/6640 [1:26:17<5:05:54, 5.68s/it] 51%|█████▏ | 3411/6640 [1:26:23<5:02:06, 5.61s/it] {'loss': 0.61, 'learning_rate': 1.0043904104110557e-05, 'epoch': 0.51} 51%|█████▏ | 3411/6640 [1:26:23<5:02:06, 5.61s/it] 51%|█████▏ | 3412/6640 [1:26:28<4:56:37, 5.51s/it] {'loss': 
0.6057, 'learning_rate': 1.0039025896634016e-05, 'epoch': 0.51} 51%|█████▏ | 3412/6640 [1:26:28<4:56:37, 5.51s/it] 51%|█████▏ | 3413/6640 [1:26:33<4:54:08, 5.47s/it] {'loss': 0.59, 'learning_rate': 1.0034147679870356e-05, 'epoch': 0.51} 51%|█████▏ | 3413/6640 [1:26:33<4:54:08, 5.47s/it] 51%|█████▏ | 3414/6640 [1:26:39<4:55:19, 5.49s/it] {'loss': 0.5857, 'learning_rate': 1.0029269454980465e-05, 'epoch': 0.51} 51%|█████▏ | 3414/6640 [1:26:39<4:55:19, 5.49s/it] 51%|█████▏ | 3415/6640 [1:26:44<4:56:21, 5.51s/it] {'loss': 0.5678, 'learning_rate': 1.0024391223125226e-05, 'epoch': 0.51} 51%|█████▏ | 3415/6640 [1:26:44<4:56:21, 5.51s/it] 51%|█████▏ | 3416/6640 [1:26:50<4:52:01, 5.43s/it] {'loss': 0.5995, 'learning_rate': 1.0019512985465533e-05, 'epoch': 0.51} 51%|█████▏ | 3416/6640 [1:26:50<4:52:01, 5.43s/it] 51%|█████▏ | 3417/6640 [1:26:55<4:53:05, 5.46s/it] {'loss': 0.5959, 'learning_rate': 1.0014634743162273e-05, 'epoch': 0.51} 51%|█████▏ | 3417/6640 [1:26:55<4:53:05, 5.46s/it] 51%|█████▏ | 3418/6640 [1:27:00<4:50:34, 5.41s/it] {'loss': 0.6003, 'learning_rate': 1.0009756497376337e-05, 'epoch': 0.51} 51%|█████▏ | 3418/6640 [1:27:00<4:50:34, 5.41s/it] 51%|█████▏ | 3419/6640 [1:27:06<4:50:11, 5.41s/it] {'loss': 0.5853, 'learning_rate': 1.0004878249268614e-05, 'epoch': 0.51} 51%|█████▏ | 3419/6640 [1:27:06<4:50:11, 5.41s/it] 52%|█████▏ | 3420/6640 [1:27:11<4:48:46, 5.38s/it] {'loss': 0.595, 'learning_rate': 1e-05, 'epoch': 0.52} 52%|█████▏ | 3420/6640 [1:27:11<4:48:46, 5.38s/it] 52%|█████▏ | 3421/6640 [1:27:16<4:47:11, 5.35s/it] {'loss': 0.5985, 'learning_rate': 9.995121750731389e-06, 'epoch': 0.52} 52%|█████▏ | 3421/6640 [1:27:16<4:47:11, 5.35s/it] 52%|█████▏ | 3422/6640 [1:27:22<4:47:01, 5.35s/it] {'loss': 0.601, 'learning_rate': 9.99024350262367e-06, 'epoch': 0.52} 52%|█████▏ | 3422/6640 [1:27:22<4:47:01, 5.35s/it] 52%|█████▏ | 3423/6640 [1:27:27<4:46:38, 5.35s/it] {'loss': 0.6036, 'learning_rate': 9.985365256837729e-06, 'epoch': 0.52} 52%|█████▏ | 3423/6640 
[1:27:27<4:46:38, 5.35s/it] 52%|█████▏ | 3424/6640 [1:27:33<4:47:39, 5.37s/it] {'loss': 0.6175, 'learning_rate': 9.980487014534469e-06, 'epoch': 0.52} 52%|█████▏ | 3424/6640 [1:27:33<4:47:39, 5.37s/it] 52%|█████▏ | 3425/6640 [1:27:38<4:47:45, 5.37s/it] {'loss': 0.621, 'learning_rate': 9.975608776874775e-06, 'epoch': 0.52} 52%|█████▏ | 3425/6640 [1:27:38<4:47:45, 5.37s/it] 52%|█████▏ | 3426/6640 [1:27:44<4:51:43, 5.45s/it] {'loss': 0.6054, 'learning_rate': 9.97073054501954e-06, 'epoch': 0.52} 52%|█████▏ | 3426/6640 [1:27:44<4:51:43, 5.45s/it] 52%|█████▏ | 3427/6640 [1:27:49<4:51:33, 5.44s/it] {'loss': 0.5976, 'learning_rate': 9.965852320129646e-06, 'epoch': 0.52} 52%|█████▏ | 3427/6640 [1:27:49<4:51:33, 5.44s/it] 52%|█████▏ | 3428/6640 [1:27:54<4:51:12, 5.44s/it] {'loss': 0.5784, 'learning_rate': 9.960974103365987e-06, 'epoch': 0.52} 52%|█████▏ | 3428/6640 [1:27:54<4:51:12, 5.44s/it] 52%|█████▏ | 3429/6640 [1:28:00<4:52:19, 5.46s/it] {'loss': 0.6111, 'learning_rate': 9.956095895889444e-06, 'epoch': 0.52} 52%|█████▏ | 3429/6640 [1:28:00<4:52:19, 5.46s/it] 52%|█████▏ | 3430/6640 [1:28:06<4:58:05, 5.57s/it] {'loss': 0.6223, 'learning_rate': 9.951217698860902e-06, 'epoch': 0.52} 52%|█████▏ | 3430/6640 [1:28:06<4:58:05, 5.57s/it] 52%|█████▏ | 3431/6640 [1:28:11<4:55:55, 5.53s/it] {'loss': 0.5754, 'learning_rate': 9.946339513441235e-06, 'epoch': 0.52} 52%|█████▏ | 3431/6640 [1:28:11<4:55:55, 5.53s/it] 52%|█████▏ | 3432/6640 [1:28:16<4:51:33, 5.45s/it] {'loss': 0.566, 'learning_rate': 9.941461340791326e-06, 'epoch': 0.52} 52%|█████▏ | 3432/6640 [1:28:16<4:51:33, 5.45s/it] 52%|█████▏ | 3433/6640 [1:28:22<4:49:33, 5.42s/it] {'loss': 0.587, 'learning_rate': 9.936583182072045e-06, 'epoch': 0.52} 52%|█████▏ | 3433/6640 [1:28:22<4:49:33, 5.42s/it] 52%|█████▏ | 3434/6640 [1:28:28<5:00:49, 5.63s/it] {'loss': 0.5909, 'learning_rate': 9.93170503844427e-06, 'epoch': 0.52} 52%|█████▏ | 3434/6640 [1:28:28<5:00:49, 5.63s/it] 52%|█████▏ | 3435/6640 [1:28:33<4:56:12, 5.55s/it] {'loss': 
0.584, 'learning_rate': 9.926826911068862e-06, 'epoch': 0.52} 52%|█████▏ | 3435/6640 [1:28:33<4:56:12, 5.55s/it] 52%|█████▏ | 3436/6640 [1:28:39<4:56:17, 5.55s/it] {'loss': 0.5949, 'learning_rate': 9.921948801106682e-06, 'epoch': 0.52} 52%|█████▏ | 3436/6640 [1:28:39<4:56:17, 5.55s/it] 52%|█████▏ | 3437/6640 [1:28:44<4:52:17, 5.48s/it] {'loss': 0.5564, 'learning_rate': 9.9170707097186e-06, 'epoch': 0.52} 52%|█████▏ | 3437/6640 [1:28:44<4:52:17, 5.48s/it] 52%|█████▏ | 3438/6640 [1:28:49<4:49:04, 5.42s/it] {'loss': 0.6057, 'learning_rate': 9.912192638065462e-06, 'epoch': 0.52} 52%|█████▏ | 3438/6640 [1:28:49<4:49:04, 5.42s/it] 52%|█████▏ | 3439/6640 [1:28:55<4:49:24, 5.42s/it] {'loss': 0.5723, 'learning_rate': 9.907314587308121e-06, 'epoch': 0.52} 52%|█████▏ | 3439/6640 [1:28:55<4:49:24, 5.42s/it] 52%|█████▏ | 3440/6640 [1:29:00<4:50:18, 5.44s/it] {'loss': 0.6017, 'learning_rate': 9.90243655860742e-06, 'epoch': 0.52} 52%|█████▏ | 3440/6640 [1:29:00<4:50:18, 5.44s/it] 52%|█████▏ | 3441/6640 [1:29:06<4:52:13, 5.48s/it] {'loss': 0.5892, 'learning_rate': 9.897558553124202e-06, 'epoch': 0.52} 52%|█████▏ | 3441/6640 [1:29:06<4:52:13, 5.48s/it] 52%|█████▏ | 3442/6640 [1:29:11<4:51:16, 5.46s/it] {'loss': 0.6017, 'learning_rate': 9.8926805720193e-06, 'epoch': 0.52} 52%|█████▏ | 3442/6640 [1:29:11<4:51:16, 5.46s/it] 52%|█████▏ | 3443/6640 [1:29:17<4:51:54, 5.48s/it] {'loss': 0.5937, 'learning_rate': 9.887802616453543e-06, 'epoch': 0.52} 52%|█████▏ | 3443/6640 [1:29:17<4:51:54, 5.48s/it] 52%|█████▏ | 3444/6640 [1:29:22<4:52:07, 5.48s/it] {'loss': 0.6025, 'learning_rate': 9.882924687587754e-06, 'epoch': 0.52} 52%|█████▏ | 3444/6640 [1:29:22<4:52:07, 5.48s/it] 52%|█████▏ | 3445/6640 [1:29:28<4:50:41, 5.46s/it] {'loss': 0.5871, 'learning_rate': 9.878046786582745e-06, 'epoch': 0.52} 52%|█████▏ | 3445/6640 [1:29:28<4:50:41, 5.46s/it] 52%|█████▏ | 3446/6640 [1:29:33<4:49:59, 5.45s/it] {'loss': 0.5908, 'learning_rate': 9.873168914599332e-06, 'epoch': 0.52} 52%|█████▏ | 3446/6640 
[1:29:33<4:49:59, 5.45s/it] 52%|█████▏ | 3447/6640 [1:29:38<4:47:38, 5.41s/it] {'loss': 0.5985, 'learning_rate': 9.868291072798317e-06, 'epoch': 0.52} 52%|█████▏ | 3447/6640 [1:29:38<4:47:38, 5.41s/it] 52%|█████▏ | 3448/6640 [1:29:44<4:48:17, 5.42s/it] {'loss': 0.6214, 'learning_rate': 9.863413262340491e-06, 'epoch': 0.52} 52%|█████▏ | 3448/6640 [1:29:44<4:48:17, 5.42s/it] 52%|█████▏ | 3449/6640 [1:29:49<4:47:25, 5.40s/it] {'loss': 0.5858, 'learning_rate': 9.858535484386646e-06, 'epoch': 0.52} 52%|█████▏ | 3449/6640 [1:29:49<4:47:25, 5.40s/it]07 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 52%|█████▏ | 3450/6640 [1:29:55<4:45:17, 5.37s/it]6 AutoResumeHook: Checking whether to suspend... {'loss': 0.5815, 'learning_rate': 9.853657740097558e-06, 'epoch': 0.52} 52%|█████▏ | 3450/6640 [1:29:55<4:45:17, 5.37s/it] 52%|█████▏ | 3451/6640 [1:30:00<4:46:24, 5.39s/it] {'loss': 0.613, 'learning_rate': 9.848780030634002e-06, 'epoch': 0.52} 52%|█████▏ | 3451/6640 [1:30:00<4:46:24, 5.39s/it] 52%|█████▏ | 3452/6640 [1:30:05<4:44:19, 5.35s/it] {'loss': 0.5884, 'learning_rate': 9.843902357156744e-06, 'epoch': 0.52} 52%|█████▏ | 3452/6640 [1:30:05<4:44:19, 5.35s/it] 52%|█████▏ | 3453/6640 [1:30:11<4:43:37, 5.34s/it] {'loss': 0.5743, 'learning_rate': 9.839024720826538e-06, 'epoch': 0.52} 52%|█████▏ | 3453/6640 [1:30:11<4:43:37, 5.34s/it] 52%|█████▏ | 3454/6640 [1:30:16<4:50:06, 5.46s/it] {'loss': 0.5945, 'learning_rate': 9.834147122804126e-06, 'epoch': 0.52} 52%|█████▏ | 3454/6640 [1:30:16<4:50:06, 5.46s/it] 52%|█████▏ | 3455/6640 [1:30:22<4:50:33, 5.47s/it] {'loss': 0.5667, 'learning_rate': 9.829269564250254e-06, 'epoch': 0.52} 52%|█████▏ | 3455/6640 [1:30:22<4:50:33, 5.47s/it] 52%|█████▏ | 
3456/6640 [1:30:27<4:48:58, 5.45s/it] {'loss': 0.5815, 'learning_rate': 9.824392046325645e-06, 'epoch': 0.52} 52%|█████▏ | 3456/6640 [1:30:27<4:48:58, 5.45s/it] 52%|█████▏ | 3457/6640 [1:30:33<4:46:39, 5.40s/it] {'loss': 0.6038, 'learning_rate': 9.819514570191016e-06, 'epoch': 0.52} 52%|█████▏ | 3457/6640 [1:30:33<4:46:39, 5.40s/it] 52%|█████▏ | 3458/6640 [1:30:38<4:48:06, 5.43s/it] {'loss': 0.5881, 'learning_rate': 9.81463713700708e-06, 'epoch': 0.52} 52%|█████▏ | 3458/6640 [1:30:38<4:48:06, 5.43s/it] 52%|█████▏ | 3459/6640 [1:30:43<4:43:08, 5.34s/it] {'loss': 0.5927, 'learning_rate': 9.809759747934527e-06, 'epoch': 0.52} 52%|█████▏ | 3459/6640 [1:30:43<4:43:08, 5.34s/it] 52%|█████▏ | 3460/6640 [1:30:49<4:45:07, 5.38s/it] {'loss': 0.5849, 'learning_rate': 9.804882404134057e-06, 'epoch': 0.52} 52%|█████▏ | 3460/6640 [1:30:49<4:45:07, 5.38s/it] 52%|█████▏ | 3461/6640 [1:30:54<4:48:29, 5.44s/it] {'loss': 0.6066, 'learning_rate': 9.80000510676634e-06, 'epoch': 0.52} 52%|█████▏ | 3461/6640 [1:30:54<4:48:29, 5.44s/it] 52%|█████▏ | 3462/6640 [1:31:00<4:56:38, 5.60s/it] {'loss': 0.6011, 'learning_rate': 9.79512785699204e-06, 'epoch': 0.52} 52%|█████▏ | 3462/6640 [1:31:00<4:56:38, 5.60s/it] 52%|█████▏ | 3463/6640 [1:31:06<4:56:47, 5.61s/it] {'loss': 0.6092, 'learning_rate': 9.790250655971813e-06, 'epoch': 0.52} 52%|█████▏ | 3463/6640 [1:31:06<4:56:47, 5.61s/it] 52%|█████▏ | 3464/6640 [1:31:11<4:52:33, 5.53s/it] {'loss': 0.6017, 'learning_rate': 9.785373504866305e-06, 'epoch': 0.52} 52%|█████▏ | 3464/6640 [1:31:11<4:52:33, 5.53s/it] 52%|█████▏ | 3465/6640 [1:31:17<4:51:36, 5.51s/it] {'loss': 0.6016, 'learning_rate': 9.780496404836146e-06, 'epoch': 0.52} 52%|█████▏ | 3465/6640 [1:31:17<4:51:36, 5.51s/it] 52%|█████▏ | 3466/6640 [1:31:22<4:49:28, 5.47s/it] {'loss': 0.5903, 'learning_rate': 9.775619357041952e-06, 'epoch': 0.52} 52%|█████▏ | 3466/6640 [1:31:22<4:49:28, 5.47s/it] 52%|█████▏ | 3467/6640 [1:31:28<4:52:35, 5.53s/it] {'loss': 0.588, 'learning_rate': 
9.770742362644335e-06, 'epoch': 0.52} 52%|█████▏ | 3467/6640 [1:31:28<4:52:35, 5.53s/it] 52%|█████▏ | 3468/6640 [1:31:33<4:49:22, 5.47s/it] {'loss': 0.5772, 'learning_rate': 9.765865422803878e-06, 'epoch': 0.52} 52%|█████▏ | 3468/6640 [1:31:33<4:49:22, 5.47s/it] 52%|█████▏ | 3469/6640 [1:31:38<4:46:46, 5.43s/it] {'loss': 0.588, 'learning_rate': 9.760988538681176e-06, 'epoch': 0.52} 52%|█████▏ | 3469/6640 [1:31:38<4:46:46, 5.43s/it] 52%|█████▏ | 3470/6640 [1:31:44<4:45:58, 5.41s/it] {'loss': 0.6054, 'learning_rate': 9.75611171143679e-06, 'epoch': 0.52} 52%|█████▏ | 3470/6640 [1:31:44<4:45:58, 5.41s/it] 52%|█████▏ | 3471/6640 [1:31:49<4:47:29, 5.44s/it] {'loss': 0.5614, 'learning_rate': 9.751234942231274e-06, 'epoch': 0.52} 52%|█████▏ | 3471/6640 [1:31:49<4:47:29, 5.44s/it] 52%|█████▏ | 3472/6640 [1:31:55<4:45:24, 5.41s/it] {'loss': 0.5858, 'learning_rate': 9.746358232225162e-06, 'epoch': 0.52} 52%|█████▏ | 3472/6640 [1:31:55<4:45:24, 5.41s/it] 52%|█████▏ | 3473/6640 [1:32:00<4:48:09, 5.46s/it] {'loss': 0.5718, 'learning_rate': 9.741481582578993e-06, 'epoch': 0.52} 52%|█████▏ | 3473/6640 [1:32:00<4:48:09, 5.46s/it] 52%|█████▏ | 3474/6640 [1:32:05<4:44:00, 5.38s/it] {'loss': 0.5951, 'learning_rate': 9.736604994453269e-06, 'epoch': 0.52} 52%|█████▏ | 3474/6640 [1:32:05<4:44:00, 5.38s/it] 52%|█████▏ | 3475/6640 [1:32:11<4:47:41, 5.45s/it] {'loss': 0.5769, 'learning_rate': 9.731728469008493e-06, 'epoch': 0.52} 52%|█████▏ | 3475/6640 [1:32:11<4:47:41, 5.45s/it] 52%|█████▏ | 3476/6640 [1:32:16<4:48:11, 5.47s/it] {'loss': 0.592, 'learning_rate': 9.726852007405144e-06, 'epoch': 0.52} 52%|█████▏ | 3476/6640 [1:32:16<4:48:11, 5.47s/it] 52%|█████▏ | 3477/6640 [1:32:22<4:46:08, 5.43s/it] {'loss': 0.5913, 'learning_rate': 9.721975610803686e-06, 'epoch': 0.52} 52%|█████▏ | 3477/6640 [1:32:22<4:46:08, 5.43s/it] 52%|█████▏ | 3478/6640 [1:32:27<4:45:36, 5.42s/it] {'loss': 0.5976, 'learning_rate': 9.717099280364578e-06, 'epoch': 0.52} 52%|█████▏ | 3478/6640 [1:32:27<4:45:36, 5.42s/it] 
52%|█████▏ | 3479/6640 [1:32:33<4:45:32, 5.42s/it] {'loss': 0.5703, 'learning_rate': 9.712223017248252e-06, 'epoch': 0.52} 52%|█████▏ | 3479/6640 [1:32:33<4:45:32, 5.42s/it] 52%|█████▏ | 3480/6640 [1:32:38<4:44:14, 5.40s/it] {'loss': 0.595, 'learning_rate': 9.707346822615127e-06, 'epoch': 0.52} 52%|█████▏ | 3480/6640 [1:32:38<4:44:14, 5.40s/it] 52%|█████▏ | 3481/6640 [1:32:44<4:48:28, 5.48s/it] {'loss': 0.5952, 'learning_rate': 9.702470697625604e-06, 'epoch': 0.52} 52%|█████▏ | 3481/6640 [1:32:44<4:48:28, 5.48s/it] 52%|█████▏ | 3482/6640 [1:32:49<4:50:47, 5.52s/it] {'loss': 0.5826, 'learning_rate': 9.697594643440078e-06, 'epoch': 0.52} 52%|█████▏ | 3482/6640 [1:32:49<4:50:47, 5.52s/it] 52%|█████▏ | 3483/6640 [1:32:55<4:47:00, 5.45s/it] {'loss': 0.5927, 'learning_rate': 9.692718661218912e-06, 'epoch': 0.52} 52%|█████▏ | 3483/6640 [1:32:55<4:47:00, 5.45s/it] 52%|█████▏ | 3484/6640 [1:33:00<4:44:36, 5.41s/it] {'loss': 0.5934, 'learning_rate': 9.687842752122462e-06, 'epoch': 0.52} 52%|█████▏ | 3484/6640 [1:33:00<4:44:36, 5.41s/it] 52%|█████▏ | 3485/6640 [1:33:05<4:44:20, 5.41s/it] {'loss': 0.575, 'learning_rate': 9.682966917311065e-06, 'epoch': 0.52} 52%|█████▏ | 3485/6640 [1:33:05<4:44:20, 5.41s/it] 52%|█████▎ | 3486/6640 [1:33:11<4:44:30, 5.41s/it] {'loss': 0.5872, 'learning_rate': 9.678091157945031e-06, 'epoch': 0.53} 52%|█████▎ | 3486/6640 [1:33:11<4:44:30, 5.41s/it] 53%|█████▎ | 3487/6640 [1:33:16<4:42:52, 5.38s/it] {'loss': 0.5885, 'learning_rate': 9.67321547518467e-06, 'epoch': 0.53} 53%|█████▎ | 3487/6640 [1:33:16<4:42:52, 5.38s/it] 53%|█████▎ | 3488/6640 [1:33:21<4:41:22, 5.36s/it] {'loss': 0.5904, 'learning_rate': 9.66833987019026e-06, 'epoch': 0.53} 53%|█████▎ | 3488/6640 [1:33:21<4:41:22, 5.36s/it] 53%|█████▎ | 3489/6640 [1:33:27<4:42:01, 5.37s/it] {'loss': 0.5927, 'learning_rate': 9.663464344122064e-06, 'epoch': 0.53} 53%|█████▎ | 3489/6640 [1:33:27<4:42:01, 5.37s/it] 53%|█████▎ | 3490/6640 [1:33:32<4:44:14, 5.41s/it] {'loss': 0.5541, 'learning_rate': 
9.658588898140322e-06, 'epoch': 0.53} 53%|█████▎ | 3490/6640 [1:33:32<4:44:14, 5.41s/it] 53%|█████▎ | 3491/6640 [1:33:38<4:46:01, 5.45s/it] {'loss': 0.6209, 'learning_rate': 9.653713533405267e-06, 'epoch': 0.53} 53%|█████▎ | 3491/6640 [1:33:38<4:46:01, 5.45s/it] 53%|█████▎ | 3492/6640 [1:33:43<4:47:05, 5.47s/it] {'loss': 0.5944, 'learning_rate': 9.648838251077098e-06, 'epoch': 0.53} 53%|█████▎ | 3492/6640 [1:33:43<4:47:05, 5.47s/it] 53%|█████▎ | 3493/6640 [1:33:49<4:46:28, 5.46s/it] {'loss': 0.5928, 'learning_rate': 9.643963052316009e-06, 'epoch': 0.53} 53%|█████▎ | 3493/6640 [1:33:49<4:46:28, 5.46s/it] 53%|█████▎ | 3494/6640 [1:33:54<4:46:36, 5.47s/it] {'loss': 0.5817, 'learning_rate': 9.639087938282162e-06, 'epoch': 0.53} 53%|█████▎ | 3494/6640 [1:33:54<4:46:36, 5.47s/it] 53%|█████▎ | 3495/6640 [1:34:00<4:45:59, 5.46s/it] {'loss': 0.5909, 'learning_rate': 9.634212910135697e-06, 'epoch': 0.53} 53%|█████▎ | 3495/6640 [1:34:00<4:45:59, 5.46s/it] 53%|█████▎ | 3496/6640 [1:34:05<4:44:35, 5.43s/it] {'loss': 0.57, 'learning_rate': 9.629337969036753e-06, 'epoch': 0.53} 53%|█████▎ | 3496/6640 [1:34:05<4:44:35, 5.43s/it] 53%|█████▎ | 3497/6640 [1:34:10<4:45:06, 5.44s/it] {'loss': 0.5832, 'learning_rate': 9.624463116145428e-06, 'epoch': 0.53} 53%|█████▎ | 3497/6640 [1:34:10<4:45:06, 5.44s/it] 53%|█████▎ | 3498/6640 [1:34:16<4:46:16, 5.47s/it] {'loss': 0.6073, 'learning_rate': 9.619588352621805e-06, 'epoch': 0.53} 53%|█████▎ | 3498/6640 [1:34:16<4:46:16, 5.47s/it] 53%|█████▎ | 3499/6640 [1:34:21<4:43:12, 5.41s/it] {'loss': 0.5922, 'learning_rate': 9.61471367962595e-06, 'epoch': 0.53} 53%|█████▎ | 3499/6640 [1:34:21<4:43:12, 5.41s/it]7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 
53%|█████▎ | 3500/6640 [1:34:27<4:41:59, 5.39s/it]6 AutoResumeHook: Checking whether to suspend... {'loss': 0.6004, 'learning_rate': 9.609839098317902e-06, 'epoch': 0.53} 53%|█████▎ | 3500/6640 [1:34:27<4:41:59, 5.39s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 53%|█████▎ | 3501/6640 [1:34:54<10:21:15, 11.87s/it] {'loss': 0.614, 'learning_rate': 9.604964609857685e-06, 'epoch': 0.53} 53%|█████▎ | 3501/6640 [1:34:54<10:21:15, 11.87s/it] 53%|█████▎ | 3502/6640 [1:34:59<8:40:25, 9.95s/it] {'loss': 0.6039, 'learning_rate': 9.600090215405292e-06, 'epoch': 0.53} 53%|█████▎ | 3502/6640 [1:34:59<8:40:25, 9.95s/it] 53%|█████▎ | 3503/6640 [1:35:05<7:30:55, 8.62s/it] {'loss': 0.5784, 'learning_rate': 9.5952159161207e-06, 'epoch': 0.53} 53%|█████▎ | 3503/6640 [1:35:05<7:30:55, 8.62s/it] 53%|█████▎ | 3504/6640 [1:35:10<6:42:51, 7.71s/it] {'loss': 0.5954, 'learning_rate': 9.590341713163858e-06, 'epoch': 0.53} 53%|█████▎ | 3504/6640 [1:35:10<6:42:51, 7.71s/it] 53%|█████▎ | 3505/6640 [1:35:16<6:11:14, 7.11s/it] {'loss': 0.6037, 'learning_rate': 9.585467607694702e-06, 'epoch': 0.53} 53%|█████▎ | 3505/6640 [1:35:16<6:11:14, 7.11s/it] 53%|█████▎ | 3506/6640 [1:35:21<5:44:42, 6.60s/it] {'loss': 0.5876, 'learning_rate': 9.580593600873135e-06, 'epoch': 0.53} 
53%|█████▎ | 3506/6640 [1:35:21<5:44:42, 6.60s/it] 53%|█████▎ | 3507/6640 [1:35:27<5:26:19, 6.25s/it] {'loss': 0.5724, 'learning_rate': 9.575719693859037e-06, 'epoch': 0.53} 53%|█████▎ | 3507/6640 [1:35:27<5:26:19, 6.25s/it] 53%|█████▎ | 3508/6640 [1:35:32<5:13:18, 6.00s/it] {'loss': 0.5845, 'learning_rate': 9.57084588781227e-06, 'epoch': 0.53} 53%|█████▎ | 3508/6640 [1:35:32<5:13:18, 6.00s/it] 53%|█████▎ | 3509/6640 [1:35:38<5:05:36, 5.86s/it] {'loss': 0.5765, 'learning_rate': 9.565972183892669e-06, 'epoch': 0.53} 53%|█████▎ | 3509/6640 [1:35:38<5:05:36, 5.86s/it] 53%|█████▎ | 3510/6640 [1:35:43<4:56:55, 5.69s/it] {'loss': 0.5965, 'learning_rate': 9.561098583260047e-06, 'epoch': 0.53} 53%|█████▎ | 3510/6640 [1:35:43<4:56:55, 5.69s/it] 53%|█████▎ | 3511/6640 [1:35:48<4:54:13, 5.64s/it] {'loss': 0.5957, 'learning_rate': 9.55622508707419e-06, 'epoch': 0.53} 53%|█████▎ | 3511/6640 [1:35:48<4:54:13, 5.64s/it] 53%|█████▎ | 3512/6640 [1:35:54<4:49:54, 5.56s/it] {'loss': 0.5995, 'learning_rate': 9.551351696494854e-06, 'epoch': 0.53} 53%|█████▎ | 3512/6640 [1:35:54<4:49:54, 5.56s/it] 53%|█████▎ | 3513/6640 [1:35:59<4:46:12, 5.49s/it] {'loss': 0.615, 'learning_rate': 9.546478412681776e-06, 'epoch': 0.53} 53%|█████▎ | 3513/6640 [1:35:59<4:46:12, 5.49s/it] 53%|█████▎ | 3514/6640 [1:36:04<4:43:36, 5.44s/it] {'loss': 0.5954, 'learning_rate': 9.541605236794673e-06, 'epoch': 0.53} 53%|█████▎ | 3514/6640 [1:36:04<4:43:36, 5.44s/it] 53%|█████▎ | 3515/6640 [1:36:10<4:46:31, 5.50s/it] {'loss': 0.5873, 'learning_rate': 9.536732169993225e-06, 'epoch': 0.53} 53%|█████▎ | 3515/6640 [1:36:10<4:46:31, 5.50s/it] 53%|█████▎ | 3516/6640 [1:36:16<4:44:52, 5.47s/it] {'loss': 0.5907, 'learning_rate': 9.531859213437094e-06, 'epoch': 0.53} 53%|█████▎ | 3516/6640 [1:36:16<4:44:52, 5.47s/it] 53%|█████▎ | 3517/6640 [1:36:21<4:48:03, 5.53s/it] {'loss': 0.5783, 'learning_rate': 9.52698636828591e-06, 'epoch': 0.53} 53%|█████▎ | 3517/6640 [1:36:21<4:48:03, 5.53s/it] 53%|█████▎ | 3518/6640 
[1:36:27<4:44:41, 5.47s/it] {'loss': 0.6019, 'learning_rate': 9.522113635699281e-06, 'epoch': 0.53} 53%|█████▎ | 3518/6640 [1:36:27<4:44:41, 5.47s/it] 53%|█████▎ | 3519/6640 [1:36:32<4:43:34, 5.45s/it] {'loss': 0.6215, 'learning_rate': 9.517241016836789e-06, 'epoch': 0.53} 53%|█████▎ | 3519/6640 [1:36:32<4:43:34, 5.45s/it] 53%|█████▎ | 3520/6640 [1:36:37<4:41:05, 5.41s/it] {'loss': 0.5863, 'learning_rate': 9.512368512857983e-06, 'epoch': 0.53} 53%|█████▎ | 3520/6640 [1:36:37<4:41:05, 5.41s/it] 53%|█████▎ | 3521/6640 [1:36:43<4:41:33, 5.42s/it] {'loss': 0.5726, 'learning_rate': 9.507496124922392e-06, 'epoch': 0.53} 53%|█████▎ | 3521/6640 [1:36:43<4:41:33, 5.42s/it] 53%|█████▎ | 3522/6640 [1:36:48<4:43:48, 5.46s/it] {'loss': 0.6161, 'learning_rate': 9.502623854189504e-06, 'epoch': 0.53} 53%|█████▎ | 3522/6640 [1:36:48<4:43:48, 5.46s/it] 53%|█████▎ | 3523/6640 [1:36:54<4:44:48, 5.48s/it] {'loss': 0.5931, 'learning_rate': 9.497751701818801e-06, 'epoch': 0.53} 53%|█████▎ | 3523/6640 [1:36:54<4:44:48, 5.48s/it] 53%|█████▎ | 3524/6640 [1:36:59<4:42:42, 5.44s/it] {'loss': 0.5841, 'learning_rate': 9.49287966896972e-06, 'epoch': 0.53} 53%|█████▎ | 3524/6640 [1:36:59<4:42:42, 5.44s/it] 53%|█████▎ | 3525/6640 [1:37:05<4:42:01, 5.43s/it] {'loss': 0.6048, 'learning_rate': 9.488007756801672e-06, 'epoch': 0.53} 53%|█████▎ | 3525/6640 [1:37:05<4:42:01, 5.43s/it] 53%|█████▎ | 3526/6640 [1:37:10<4:42:59, 5.45s/it] {'loss': 0.6003, 'learning_rate': 9.48313596647404e-06, 'epoch': 0.53} 53%|█████▎ | 3526/6640 [1:37:10<4:42:59, 5.45s/it] 53%|█████▎ | 3527/6640 [1:37:15<4:42:41, 5.45s/it] {'loss': 0.5635, 'learning_rate': 9.478264299146186e-06, 'epoch': 0.53} 53%|█████▎ | 3527/6640 [1:37:15<4:42:41, 5.45s/it] 53%|█████▎ | 3528/6640 [1:37:21<4:40:10, 5.40s/it] {'loss': 0.5697, 'learning_rate': 9.473392755977432e-06, 'epoch': 0.53} 53%|█████▎ | 3528/6640 [1:37:21<4:40:10, 5.40s/it] 53%|█████▎ | 3529/6640 [1:37:26<4:40:35, 5.41s/it] {'loss': 0.6012, 'learning_rate': 9.468521338127075e-06, 
'epoch': 0.53} 53%|█████▎ | 3529/6640 [1:37:26<4:40:35, 5.41s/it] 53%|█████▎ | 3530/6640 [1:37:32<4:39:22, 5.39s/it] {'loss': 0.6123, 'learning_rate': 9.463650046754383e-06, 'epoch': 0.53} 53%|█████▎ | 3530/6640 [1:37:32<4:39:22, 5.39s/it] 53%|█████▎ | 3531/6640 [1:37:37<4:39:55, 5.40s/it] {'loss': 0.5974, 'learning_rate': 9.458778883018586e-06, 'epoch': 0.53} 53%|█████▎ | 3531/6640 [1:37:37<4:39:55, 5.40s/it] 53%|█████▎ | 3532/6640 [1:37:42<4:40:05, 5.41s/it] {'loss': 0.5892, 'learning_rate': 9.453907848078901e-06, 'epoch': 0.53} 53%|█████▎ | 3532/6640 [1:37:42<4:40:05, 5.41s/it] 53%|█████▎ | 3533/6640 [1:37:48<4:40:08, 5.41s/it] {'loss': 0.5765, 'learning_rate': 9.449036943094497e-06, 'epoch': 0.53} 53%|█████▎ | 3533/6640 [1:37:48<4:40:08, 5.41s/it] 53%|█████▎ | 3534/6640 [1:37:53<4:41:25, 5.44s/it] {'loss': 0.5734, 'learning_rate': 9.444166169224522e-06, 'epoch': 0.53} 53%|█████▎ | 3534/6640 [1:37:53<4:41:25, 5.44s/it] 53%|█████▎ | 3535/6640 [1:37:59<4:40:25, 5.42s/it] {'loss': 0.5902, 'learning_rate': 9.439295527628083e-06, 'epoch': 0.53} 53%|█████▎ | 3535/6640 [1:37:59<4:40:25, 5.42s/it] 53%|█████▎ | 3536/6640 [1:38:04<4:42:35, 5.46s/it] {'loss': 0.6076, 'learning_rate': 9.434425019464269e-06, 'epoch': 0.53} 53%|█████▎ | 3536/6640 [1:38:04<4:42:35, 5.46s/it] 53%|█████▎ | 3537/6640 [1:38:10<4:40:09, 5.42s/it] {'loss': 0.5592, 'learning_rate': 9.429554645892129e-06, 'epoch': 0.53} 53%|█████▎ | 3537/6640 [1:38:10<4:40:09, 5.42s/it] 53%|█████▎ | 3538/6640 [1:38:15<4:43:14, 5.48s/it] {'loss': 0.5842, 'learning_rate': 9.424684408070681e-06, 'epoch': 0.53} 53%|█████▎ | 3538/6640 [1:38:15<4:43:14, 5.48s/it] 53%|█████▎ | 3539/6640 [1:38:21<4:43:07, 5.48s/it] {'loss': 0.5896, 'learning_rate': 9.419814307158908e-06, 'epoch': 0.53} 53%|█████▎ | 3539/6640 [1:38:21<4:43:07, 5.48s/it] 53%|█████▎ | 3540/6640 [1:38:26<4:40:04, 5.42s/it] {'loss': 0.5913, 'learning_rate': 9.414944344315765e-06, 'epoch': 0.53} 53%|█████▎ | 3540/6640 [1:38:26<4:40:04, 5.42s/it] 53%|█████▎ | 
3541/6640 [1:38:31<4:42:00, 5.46s/it] {'loss': 0.5949, 'learning_rate': 9.410074520700172e-06, 'epoch': 0.53} 53%|█████▎ | 3541/6640 [1:38:31<4:42:00, 5.46s/it] 53%|█████▎ | 3542/6640 [1:38:37<4:39:19, 5.41s/it] {'loss': 0.5998, 'learning_rate': 9.40520483747102e-06, 'epoch': 0.53} 53%|█████▎ | 3542/6640 [1:38:37<4:39:19, 5.41s/it] 53%|█████▎ | 3543/6640 [1:38:42<4:39:35, 5.42s/it] {'loss': 0.599, 'learning_rate': 9.400335295787162e-06, 'epoch': 0.53} 53%|█████▎ | 3543/6640 [1:38:42<4:39:35, 5.42s/it] 53%|█████▎ | 3544/6640 [1:38:48<4:44:31, 5.51s/it] {'loss': 0.577, 'learning_rate': 9.395465896807411e-06, 'epoch': 0.53} 53%|█████▎ | 3544/6640 [1:38:48<4:44:31, 5.51s/it] 53%|█████▎ | 3545/6640 [1:38:53<4:42:07, 5.47s/it] {'loss': 0.6077, 'learning_rate': 9.390596641690563e-06, 'epoch': 0.53} 53%|█████▎ | 3545/6640 [1:38:53<4:42:07, 5.47s/it] 53%|█████▎ | 3546/6640 [1:38:59<4:46:37, 5.56s/it] {'loss': 0.61, 'learning_rate': 9.385727531595367e-06, 'epoch': 0.53} 53%|█████▎ | 3546/6640 [1:38:59<4:46:37, 5.56s/it] 53%|█████▎ | 3547/6640 [1:39:07<5:18:22, 6.18s/it] {'loss': 0.5632, 'learning_rate': 9.38085856768054e-06, 'epoch': 0.53} 53%|█████▎ | 3547/6640 [1:39:07<5:18:22, 6.18s/it] 53%|█████▎ | 3548/6640 [1:39:12<5:06:29, 5.95s/it] {'loss': 0.5951, 'learning_rate': 9.37598975110476e-06, 'epoch': 0.53} 53%|█████▎ | 3548/6640 [1:39:12<5:06:29, 5.95s/it] 53%|█████▎ | 3549/6640 [1:39:17<4:57:12, 5.77s/it] {'loss': 0.5878, 'learning_rate': 9.371121083026683e-06, 'epoch': 0.53} 53%|█████▎ | 3549/6640 [1:39:17<4:57:12, 5.77s/it]0 AutoResumeHook: Checking whether to suspend... 53%|█████▎ | 3550/6640 [1:39:23<4:52:20, 5.68s/it]47 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5974, 'learning_rate': 9.366252564604914e-06, 'epoch': 0.53} 53%|█████▎ | 3550/6640 [1:39:23<4:52:20, 5.68s/it] 53%|█████▎ | 3551/6640 [1:39:28<4:48:36, 5.61s/it] {'loss': 0.5806, 'learning_rate': 9.361384196998037e-06, 'epoch': 0.53} 53%|█████▎ | 3551/6640 [1:39:28<4:48:36, 5.61s/it] 53%|█████▎ | 3552/6640 [1:39:34<4:43:45, 5.51s/it] {'loss': 0.608, 'learning_rate': 9.356515981364589e-06, 'epoch': 0.53} 53%|█████▎ | 3552/6640 [1:39:34<4:43:45, 5.51s/it] 54%|█████▎ | 3553/6640 [1:39:39<4:42:24, 5.49s/it] {'loss': 0.5876, 'learning_rate': 9.35164791886307e-06, 'epoch': 0.54} 54%|█████▎ | 3553/6640 [1:39:39<4:42:24, 5.49s/it] 54%|█████▎ | 3554/6640 [1:39:44<4:38:32, 5.42s/it] {'loss': 0.6011, 'learning_rate': 9.34678001065196e-06, 'epoch': 0.54} 54%|█████▎ | 3554/6640 [1:39:44<4:38:32, 5.42s/it] 54%|█████▎ | 3555/6640 [1:39:50<4:41:08, 5.47s/it] {'loss': 0.587, 'learning_rate': 9.34191225788968e-06, 'epoch': 0.54} 54%|█████▎ | 3555/6640 [1:39:50<4:41:08, 5.47s/it] 54%|█████▎ | 3556/6640 [1:39:55<4:41:18, 5.47s/it] {'loss': 0.6201, 'learning_rate': 9.337044661734632e-06, 'epoch': 0.54} 54%|█████▎ | 3556/6640 [1:39:55<4:41:18, 5.47s/it] 54%|█████▎ | 3557/6640 [1:40:01<4:38:08, 5.41s/it] {'loss': 0.606, 'learning_rate': 9.332177223345165e-06, 'epoch': 0.54} 54%|█████▎ | 3557/6640 [1:40:01<4:38:08, 5.41s/it] 54%|█████▎ | 3558/6640 [1:40:06<4:41:24, 5.48s/it] {'loss': 0.5812, 'learning_rate': 9.327309943879604e-06, 'epoch': 0.54} 54%|█████▎ | 3558/6640 [1:40:06<4:41:24, 5.48s/it] 54%|█████▎ | 3559/6640 [1:40:12<4:47:52, 5.61s/it] {'loss': 0.5864, 'learning_rate': 9.322442824496232e-06, 'epoch': 0.54} 54%|█████▎ | 3559/6640 [1:40:12<4:47:52, 5.61s/it] 54%|█████▎ | 3560/6640 [1:40:18<4:46:35, 5.58s/it] {'loss': 0.5818, 'learning_rate': 9.317575866353293e-06, 'epoch': 0.54} 54%|█████▎ | 3560/6640 [1:40:18<4:46:35, 5.58s/it] 54%|█████▎ | 3561/6640 [1:40:25<5:12:01, 6.08s/it] {'loss': 0.5661, 'learning_rate': 9.312709070608988e-06, 'epoch': 0.54} 54%|█████▎ | 
3561/6640 [1:40:25<5:12:01, 6.08s/it] 54%|█████▎ | 3562/6640 [1:40:30<5:00:22, 5.86s/it] {'loss': 0.6019, 'learning_rate': 9.307842438421484e-06, 'epoch': 0.54} 54%|█████▎ | 3562/6640 [1:40:30<5:00:22, 5.86s/it] 54%|█████▎ | 3563/6640 [1:40:36<4:52:21, 5.70s/it] {'loss': 0.5664, 'learning_rate': 9.302975970948916e-06, 'epoch': 0.54} 54%|█████▎ | 3563/6640 [1:40:36<4:52:21, 5.70s/it] 54%|█████▎ | 3564/6640 [1:40:43<5:20:18, 6.25s/it] {'loss': 0.5952, 'learning_rate': 9.298109669349368e-06, 'epoch': 0.54} 54%|█████▎ | 3564/6640 [1:40:43<5:20:18, 6.25s/it] 54%|█████▎ | 3565/6640 [1:40:51<5:43:11, 6.70s/it] {'loss': 0.631, 'learning_rate': 9.293243534780887e-06, 'epoch': 0.54} 54%|█████▎ | 3565/6640 [1:40:51<5:43:11, 6.70s/it] 54%|█████▎ | 3566/6640 [1:40:56<5:23:53, 6.32s/it] {'loss': 0.5564, 'learning_rate': 9.28837756840149e-06, 'epoch': 0.54} 54%|█████▎ | 3566/6640 [1:40:56<5:23:53, 6.32s/it] 54%|█████▎ | 3567/6640 [1:41:02<5:11:35, 6.08s/it] {'loss': 0.6041, 'learning_rate': 9.283511771369132e-06, 'epoch': 0.54} 54%|█████▎ | 3567/6640 [1:41:02<5:11:35, 6.08s/it] 54%|█████▎ | 3568/6640 [1:41:09<5:29:38, 6.44s/it] {'loss': 0.5765, 'learning_rate': 9.278646144841759e-06, 'epoch': 0.54} 54%|█████▎ | 3568/6640 [1:41:09<5:29:38, 6.44s/it] 54%|█████▍ | 3569/6640 [1:41:17<5:48:57, 6.82s/it] {'loss': 0.5797, 'learning_rate': 9.273780689977251e-06, 'epoch': 0.54} 54%|█████▍ | 3569/6640 [1:41:17<5:48:57, 6.82s/it] 54%|█████▍ | 3570/6640 [1:41:22<5:26:31, 6.38s/it] {'loss': 0.5716, 'learning_rate': 9.268915407933457e-06, 'epoch': 0.54} 54%|█████▍ | 3570/6640 [1:41:22<5:26:31, 6.38s/it] 54%|█████▍ | 3571/6640 [1:41:28<5:15:19, 6.16s/it] {'loss': 0.623, 'learning_rate': 9.26405029986818e-06, 'epoch': 0.54} 54%|█████▍ | 3571/6640 [1:41:28<5:15:19, 6.16s/it] 54%|█████▍ | 3572/6640 [1:41:36<5:39:47, 6.65s/it] {'loss': 0.5902, 'learning_rate': 9.259185366939193e-06, 'epoch': 0.54} 54%|█████▍ | 3572/6640 [1:41:36<5:39:47, 6.65s/it] 54%|█████▍ | 3573/6640 [1:41:41<5:21:03, 6.28s/it] 
{'loss': 0.582, 'learning_rate': 9.254320610304213e-06, 'epoch': 0.54} 54%|█████▍ | 3573/6640 [1:41:41<5:21:03, 6.28s/it] 54%|█████▍ | 3574/6640 [1:41:47<5:11:29, 6.10s/it] {'loss': 0.5922, 'learning_rate': 9.249456031120922e-06, 'epoch': 0.54} 54%|█████▍ | 3574/6640 [1:41:47<5:11:29, 6.10s/it] 54%|█████▍ | 3575/6640 [1:41:52<5:01:14, 5.90s/it] {'loss': 0.5973, 'learning_rate': 9.244591630546964e-06, 'epoch': 0.54} 54%|█████▍ | 3575/6640 [1:41:52<5:01:14, 5.90s/it] 54%|█████▍ | 3576/6640 [1:42:00<5:26:02, 6.38s/it] {'loss': 0.6049, 'learning_rate': 9.23972740973993e-06, 'epoch': 0.54} 54%|█████▍ | 3576/6640 [1:42:00<5:26:02, 6.38s/it] 54%|█████▍ | 3577/6640 [1:42:05<5:11:49, 6.11s/it] {'loss': 0.58, 'learning_rate': 9.23486336985738e-06, 'epoch': 0.54} 54%|█████▍ | 3577/6640 [1:42:05<5:11:49, 6.11s/it] 54%|█████▍ | 3578/6640 [1:42:11<5:00:12, 5.88s/it] {'loss': 0.565, 'learning_rate': 9.229999512056821e-06, 'epoch': 0.54} 54%|█████▍ | 3578/6640 [1:42:11<5:00:12, 5.88s/it] 54%|█████▍ | 3579/6640 [1:42:16<4:53:11, 5.75s/it] {'loss': 0.6074, 'learning_rate': 9.225135837495722e-06, 'epoch': 0.54} 54%|█████▍ | 3579/6640 [1:42:16<4:53:11, 5.75s/it] 54%|█████▍ | 3580/6640 [1:42:22<4:50:02, 5.69s/it] {'loss': 0.5765, 'learning_rate': 9.220272347331502e-06, 'epoch': 0.54} 54%|█████▍ | 3580/6640 [1:42:22<4:50:02, 5.69s/it] 54%|█████▍ | 3581/6640 [1:42:27<4:47:58, 5.65s/it] {'loss': 0.6221, 'learning_rate': 9.215409042721553e-06, 'epoch': 0.54} 54%|█████▍ | 3581/6640 [1:42:27<4:47:58, 5.65s/it] 54%|█████▍ | 3582/6640 [1:42:32<4:43:15, 5.56s/it] {'loss': 0.5975, 'learning_rate': 9.210545924823199e-06, 'epoch': 0.54} 54%|█████▍ | 3582/6640 [1:42:32<4:43:15, 5.56s/it] 54%|█████▍ | 3583/6640 [1:42:38<4:44:23, 5.58s/it] {'loss': 0.5965, 'learning_rate': 9.20568299479374e-06, 'epoch': 0.54} 54%|█████▍ | 3583/6640 [1:42:38<4:44:23, 5.58s/it] 54%|█████▍ | 3584/6640 [1:42:44<4:43:16, 5.56s/it] {'loss': 0.5855, 'learning_rate': 9.200820253790419e-06, 'epoch': 0.54} 54%|█████▍ | 
3584/6640 [1:42:44<4:43:16, 5.56s/it] 54%|█████▍ | 3585/6640 [1:42:49<4:42:01, 5.54s/it] {'loss': 0.5622, 'learning_rate': 9.195957702970434e-06, 'epoch': 0.54} 54%|█████▍ | 3585/6640 [1:42:49<4:42:01, 5.54s/it] 54%|█████▍ | 3586/6640 [1:42:55<4:47:35, 5.65s/it] {'loss': 0.5821, 'learning_rate': 9.19109534349095e-06, 'epoch': 0.54} 54%|█████▍ | 3586/6640 [1:42:55<4:47:35, 5.65s/it] 54%|█████▍ | 3587/6640 [1:43:00<4:44:32, 5.59s/it] {'loss': 0.5643, 'learning_rate': 9.186233176509075e-06, 'epoch': 0.54} 54%|█████▍ | 3587/6640 [1:43:00<4:44:32, 5.59s/it] 54%|█████▍ | 3588/6640 [1:43:06<4:40:36, 5.52s/it] {'loss': 0.587, 'learning_rate': 9.181371203181873e-06, 'epoch': 0.54} 54%|█████▍ | 3588/6640 [1:43:06<4:40:36, 5.52s/it] 54%|█████▍ | 3589/6640 [1:43:11<4:40:12, 5.51s/it] {'loss': 0.6266, 'learning_rate': 9.176509424666359e-06, 'epoch': 0.54} 54%|█████▍ | 3589/6640 [1:43:11<4:40:12, 5.51s/it] 54%|█████▍ | 3590/6640 [1:43:17<4:39:31, 5.50s/it] {'loss': 0.5763, 'learning_rate': 9.171647842119515e-06, 'epoch': 0.54} 54%|█████▍ | 3590/6640 [1:43:17<4:39:31, 5.50s/it] 54%|█████▍ | 3591/6640 [1:43:22<4:38:37, 5.48s/it] {'loss': 0.6209, 'learning_rate': 9.16678645669826e-06, 'epoch': 0.54} 54%|█████▍ | 3591/6640 [1:43:22<4:38:37, 5.48s/it] 54%|█████▍ | 3592/6640 [1:43:28<4:37:54, 5.47s/it] {'loss': 0.5917, 'learning_rate': 9.16192526955948e-06, 'epoch': 0.54} 54%|█████▍ | 3592/6640 [1:43:28<4:37:54, 5.47s/it] 54%|█████▍ | 3593/6640 [1:43:33<4:35:39, 5.43s/it] {'loss': 0.5794, 'learning_rate': 9.15706428186e-06, 'epoch': 0.54} 54%|█████▍ | 3593/6640 [1:43:33<4:35:39, 5.43s/it] 54%|█████▍ | 3594/6640 [1:43:38<4:36:53, 5.45s/it] {'loss': 0.6075, 'learning_rate': 9.152203494756607e-06, 'epoch': 0.54} 54%|█████▍ | 3594/6640 [1:43:38<4:36:53, 5.45s/it] 54%|█████▍ | 3595/6640 [1:43:44<4:34:41, 5.41s/it] {'loss': 0.6093, 'learning_rate': 9.14734290940604e-06, 'epoch': 0.54} 54%|█████▍ | 3595/6640 [1:43:44<4:34:41, 5.41s/it] 54%|█████▍ | 3596/6640 [1:43:49<4:34:44, 5.42s/it] 
{'loss': 0.5805, 'learning_rate': 9.14248252696499e-06, 'epoch': 0.54} 54%|█████▍ | 3596/6640 [1:43:49<4:34:44, 5.42s/it] 54%|█████▍ | 3597/6640 [1:43:55<4:33:13, 5.39s/it] {'loss': 0.5914, 'learning_rate': 9.137622348590091e-06, 'epoch': 0.54} 54%|█████▍ | 3597/6640 [1:43:55<4:33:13, 5.39s/it] 54%|█████▍ | 3598/6640 [1:44:00<4:36:34, 5.46s/it] {'loss': 0.6075, 'learning_rate': 9.132762375437938e-06, 'epoch': 0.54} 54%|█████▍ | 3598/6640 [1:44:00<4:36:34, 5.46s/it] 54%|█████▍ | 3599/6640 [1:44:06<4:36:03, 5.45s/it] {'loss': 0.5596, 'learning_rate': 9.127902608665075e-06, 'epoch': 0.54} 54%|█████▍ | 3599/6640 [1:44:06<4:36:03, 5.45s/it]02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 35 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 54%|█████▍ | 3600/6640 [1:44:11<4:36:29, 5.46s/it]7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 64 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... {'loss': 0.5778, 'learning_rate': 9.123043049427996e-06, 'epoch': 0.54} 54%|█████▍ | 3600/6640 [1:44:11<4:36:29, 5.46s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 54%|█████▍ | 3601/6640 [1:44:37<9:44:53, 11.55s/it] {'loss': 0.5787, 'learning_rate': 9.118183698883146e-06, 'epoch': 0.54} 54%|█████▍ | 3601/6640 [1:44:37<9:44:53, 11.55s/it] 54%|█████▍ | 3602/6640 [1:44:42<8:12:37, 9.73s/it] {'loss': 0.5985, 'learning_rate': 9.113324558186922e-06, 'epoch': 0.54} 54%|█████▍ | 3602/6640 [1:44:42<8:12:37, 9.73s/it] 54%|█████▍ | 3603/6640 [1:44:48<7:05:05, 8.40s/it] {'loss': 0.5557, 'learning_rate': 9.10846562849566e-06, 'epoch': 0.54} 54%|█████▍ | 3603/6640 [1:44:48<7:05:05, 8.40s/it] 54%|█████▍ | 3604/6640 [1:44:53<6:19:13, 7.49s/it] {'loss': 0.5879, 'learning_rate': 9.103606910965666e-06, 'epoch': 0.54} 54%|█████▍ | 3604/6640 [1:44:53<6:19:13, 7.49s/it] 54%|█████▍ | 3605/6640 [1:44:58<5:48:32, 6.89s/it] {'loss': 0.5745, 'learning_rate': 9.098748406753181e-06, 'epoch': 0.54} 54%|█████▍ | 3605/6640 [1:44:58<5:48:32, 6.89s/it] 54%|█████▍ | 3606/6640 [1:45:04<5:27:14, 6.47s/it] {'loss': 0.5953, 'learning_rate': 9.093890117014395e-06, 'epoch': 0.54} 54%|█████▍ | 3606/6640 [1:45:04<5:27:14, 6.47s/it] 54%|█████▍ | 3607/6640 [1:45:09<5:10:49, 6.15s/it] {'loss': 0.5799, 'learning_rate': 9.089032042905454e-06, 'epoch': 0.54} 54%|█████▍ | 3607/6640 [1:45:09<5:10:49, 6.15s/it] 54%|█████▍ | 3608/6640 [1:45:15<5:00:17, 5.94s/it] {'loss': 0.6129, 'learning_rate': 9.084174185582448e-06, 'epoch': 0.54} 54%|█████▍ | 3608/6640 [1:45:15<5:00:17, 5.94s/it] 54%|█████▍ | 3609/6640 [1:45:20<4:49:56, 5.74s/it] {'loss': 0.6106, 'learning_rate': 9.07931654620142e-06, 'epoch': 0.54} 54%|█████▍ | 3609/6640 [1:45:20<4:49:56, 5.74s/it] 54%|█████▍ | 3610/6640 [1:45:26<4:45:16, 5.65s/it] {'loss': 0.6009, 'learning_rate': 9.074459125918356e-06, 'epoch': 0.54} 54%|█████▍ | 3610/6640 [1:45:26<4:45:16, 5.65s/it] 54%|█████▍ | 3611/6640 [1:45:31<4:42:17, 5.59s/it] {'loss': 0.5759, 'learning_rate': 9.069601925889189e-06, 'epoch': 0.54} 54%|█████▍ | 3611/6640 [1:45:31<4:42:17, 5.59s/it] 54%|█████▍ | 3612/6640 [1:45:36<4:39:30, 5.54s/it] {'loss': 0.5849, 
'learning_rate': 9.064744947269801e-06, 'epoch': 0.54} 54%|█████▍ | 3612/6640 [1:45:36<4:39:30, 5.54s/it] 54%|█████▍ | 3613/6640 [1:45:42<4:43:31, 5.62s/it] {'loss': 0.6073, 'learning_rate': 9.05988819121603e-06, 'epoch': 0.54} 54%|█████▍ | 3613/6640 [1:45:42<4:43:31, 5.62s/it] 54%|█████▍ | 3614/6640 [1:45:47<4:38:08, 5.52s/it] {'loss': 0.5925, 'learning_rate': 9.05503165888365e-06, 'epoch': 0.54} 54%|█████▍ | 3614/6640 [1:45:47<4:38:08, 5.52s/it] 54%|█████▍ | 3615/6640 [1:45:53<4:37:04, 5.50s/it] {'loss': 0.5877, 'learning_rate': 9.050175351428381e-06, 'epoch': 0.54} 54%|█████▍ | 3615/6640 [1:45:53<4:37:04, 5.50s/it] 54%|█████▍ | 3616/6640 [1:45:58<4:35:02, 5.46s/it] {'loss': 0.5887, 'learning_rate': 9.0453192700059e-06, 'epoch': 0.54} 54%|█████▍ | 3616/6640 [1:45:58<4:35:02, 5.46s/it] 54%|█████▍ | 3617/6640 [1:46:04<4:34:06, 5.44s/it] {'loss': 0.5854, 'learning_rate': 9.04046341577182e-06, 'epoch': 0.54} 54%|█████▍ | 3617/6640 [1:46:04<4:34:06, 5.44s/it] 54%|█████▍ | 3618/6640 [1:46:09<4:35:45, 5.48s/it] {'loss': 0.5917, 'learning_rate': 9.035607789881711e-06, 'epoch': 0.54} 54%|█████▍ | 3618/6640 [1:46:09<4:35:45, 5.48s/it] 55%|█████▍ | 3619/6640 [1:46:15<4:36:39, 5.49s/it] {'loss': 0.6149, 'learning_rate': 9.030752393491074e-06, 'epoch': 0.55} 55%|█████▍ | 3619/6640 [1:46:15<4:36:39, 5.49s/it] 55%|█████▍ | 3620/6640 [1:46:20<4:37:18, 5.51s/it] {'loss': 0.6064, 'learning_rate': 9.025897227755367e-06, 'epoch': 0.55} 55%|█████▍ | 3620/6640 [1:46:20<4:37:18, 5.51s/it] 55%|█████▍ | 3621/6640 [1:46:26<4:34:02, 5.45s/it] {'loss': 0.6236, 'learning_rate': 9.021042293829982e-06, 'epoch': 0.55} 55%|█████▍ | 3621/6640 [1:46:26<4:34:02, 5.45s/it] 55%|█████▍ | 3622/6640 [1:46:31<4:34:19, 5.45s/it] {'loss': 0.5658, 'learning_rate': 9.016187592870276e-06, 'epoch': 0.55} 55%|█████▍ | 3622/6640 [1:46:31<4:34:19, 5.45s/it] 55%|█████▍ | 3623/6640 [1:46:37<4:34:33, 5.46s/it] {'loss': 0.5788, 'learning_rate': 9.011333126031526e-06, 'epoch': 0.55} 55%|█████▍ | 3623/6640 
[1:46:37<4:34:33, 5.46s/it] 55%|█████▍ | 3624/6640 [1:46:42<4:34:00, 5.45s/it] {'loss': 0.5874, 'learning_rate': 9.006478894468973e-06, 'epoch': 0.55} 55%|█████▍ | 3624/6640 [1:46:42<4:34:00, 5.45s/it] 55%|█████▍ | 3625/6640 [1:46:47<4:33:41, 5.45s/it] {'loss': 0.5846, 'learning_rate': 9.001624899337785e-06, 'epoch': 0.55} 55%|█████▍ | 3625/6640 [1:46:47<4:33:41, 5.45s/it] 55%|█████▍ | 3626/6640 [1:46:53<4:33:01, 5.43s/it] {'loss': 0.5925, 'learning_rate': 8.996771141793094e-06, 'epoch': 0.55} 55%|█████▍ | 3626/6640 [1:46:53<4:33:01, 5.43s/it] 55%|█████▍ | 3627/6640 [1:46:58<4:33:00, 5.44s/it] {'loss': 0.613, 'learning_rate': 8.991917622989957e-06, 'epoch': 0.55} 55%|█████▍ | 3627/6640 [1:46:58<4:33:00, 5.44s/it] 55%|█████▍ | 3628/6640 [1:47:04<4:34:04, 5.46s/it] {'loss': 0.5929, 'learning_rate': 8.987064344083381e-06, 'epoch': 0.55} 55%|█████▍ | 3628/6640 [1:47:04<4:34:04, 5.46s/it] 55%|█████▍ | 3629/6640 [1:47:09<4:34:42, 5.47s/it] {'loss': 0.5926, 'learning_rate': 8.98221130622832e-06, 'epoch': 0.55} 55%|█████▍ | 3629/6640 [1:47:09<4:34:42, 5.47s/it] 55%|█████▍ | 3630/6640 [1:47:15<4:31:57, 5.42s/it] {'loss': 0.5679, 'learning_rate': 8.977358510579658e-06, 'epoch': 0.55} 55%|█████▍ | 3630/6640 [1:47:15<4:31:57, 5.42s/it] 55%|█████▍ | 3631/6640 [1:47:20<4:28:56, 5.36s/it] {'loss': 0.5744, 'learning_rate': 8.972505958292242e-06, 'epoch': 0.55} 55%|█████▍ | 3631/6640 [1:47:20<4:28:56, 5.36s/it] 55%|█████▍ | 3632/6640 [1:47:25<4:29:13, 5.37s/it] {'loss': 0.5721, 'learning_rate': 8.967653650520841e-06, 'epoch': 0.55} 55%|█████▍ | 3632/6640 [1:47:25<4:29:13, 5.37s/it] 55%|█████▍ | 3633/6640 [1:47:31<4:30:02, 5.39s/it] {'loss': 0.5709, 'learning_rate': 8.96280158842018e-06, 'epoch': 0.55} 55%|█████▍ | 3633/6640 [1:47:31<4:30:02, 5.39s/it] 55%|█████▍ | 3634/6640 [1:47:36<4:30:12, 5.39s/it] {'loss': 0.5995, 'learning_rate': 8.95794977314491e-06, 'epoch': 0.55} 55%|█████▍ | 3634/6640 [1:47:36<4:30:12, 5.39s/it] 55%|█████▍ | 3635/6640 [1:47:42<4:31:40, 5.42s/it] {'loss': 
0.6041, 'learning_rate': 8.953098205849647e-06, 'epoch': 0.55} 55%|█████▍ | 3635/6640 [1:47:42<4:31:40, 5.42s/it] 55%|█████▍ | 3636/6640 [1:47:47<4:28:42, 5.37s/it] {'loss': 0.573, 'learning_rate': 8.948246887688924e-06, 'epoch': 0.55} 55%|█████▍ | 3636/6640 [1:47:47<4:28:42, 5.37s/it] 55%|█████▍ | 3637/6640 [1:47:52<4:27:41, 5.35s/it] {'loss': 0.5945, 'learning_rate': 8.943395819817226e-06, 'epoch': 0.55} 55%|█████▍ | 3637/6640 [1:47:52<4:27:41, 5.35s/it] 55%|█████▍ | 3638/6640 [1:47:57<4:26:42, 5.33s/it] {'loss': 0.5856, 'learning_rate': 8.93854500338898e-06, 'epoch': 0.55} 55%|█████▍ | 3638/6640 [1:47:57<4:26:42, 5.33s/it] 55%|█████▍ | 3639/6640 [1:48:03<4:25:49, 5.31s/it] {'loss': 0.5751, 'learning_rate': 8.933694439558542e-06, 'epoch': 0.55} 55%|█████▍ | 3639/6640 [1:48:03<4:25:49, 5.31s/it] 55%|█████▍ | 3640/6640 [1:48:08<4:28:55, 5.38s/it] {'loss': 0.5921, 'learning_rate': 8.928844129480228e-06, 'epoch': 0.55} 55%|█████▍ | 3640/6640 [1:48:08<4:28:55, 5.38s/it] 55%|█████▍ | 3641/6640 [1:48:14<4:30:48, 5.42s/it] {'loss': 0.6005, 'learning_rate': 8.923994074308272e-06, 'epoch': 0.55} 55%|█████▍ | 3641/6640 [1:48:14<4:30:48, 5.42s/it] 55%|█████▍ | 3642/6640 [1:48:19<4:28:38, 5.38s/it] {'loss': 0.5962, 'learning_rate': 8.919144275196864e-06, 'epoch': 0.55} 55%|█████▍ | 3642/6640 [1:48:19<4:28:38, 5.38s/it] 55%|█████▍ | 3643/6640 [1:48:24<4:30:17, 5.41s/it] {'loss': 0.581, 'learning_rate': 8.914294733300118e-06, 'epoch': 0.55} 55%|█████▍ | 3643/6640 [1:48:24<4:30:17, 5.41s/it] 55%|█████▍ | 3644/6640 [1:48:30<4:36:15, 5.53s/it] {'loss': 0.6032, 'learning_rate': 8.909445449772103e-06, 'epoch': 0.55} 55%|█████▍ | 3644/6640 [1:48:30<4:36:15, 5.53s/it] 55%|█████▍ | 3645/6640 [1:48:36<4:35:11, 5.51s/it] {'loss': 0.5919, 'learning_rate': 8.904596425766817e-06, 'epoch': 0.55} 55%|█████▍ | 3645/6640 [1:48:36<4:35:11, 5.51s/it] 55%|█████▍ | 3646/6640 [1:48:41<4:32:42, 5.46s/it] {'loss': 0.5823, 'learning_rate': 8.899747662438194e-06, 'epoch': 0.55} 55%|█████▍ | 3646/6640 
[1:48:41<4:32:42, 5.46s/it] 55%|█████▍ | 3647/6640 [1:48:47<4:31:55, 5.45s/it] {'loss': 0.6164, 'learning_rate': 8.89489916094011e-06, 'epoch': 0.55} 55%|█████▍ | 3647/6640 [1:48:47<4:31:55, 5.45s/it] 55%|█████▍ | 3648/6640 [1:48:52<4:32:32, 5.47s/it] {'loss': 0.5814, 'learning_rate': 8.89005092242638e-06, 'epoch': 0.55} 55%|█████▍ | 3648/6640 [1:48:52<4:32:32, 5.47s/it] 55%|█████▍ | 3649/6640 [1:48:57<4:29:45, 5.41s/it] {'loss': 0.6014, 'learning_rate': 8.885202948050754e-06, 'epoch': 0.55} 55%|█████▍ | 3649/6640 [1:48:57<4:29:45, 5.41s/it]0 AutoResumeHook: Checking whether to suspend... 55%|█████▍ | 3650/6640 [1:49:03<4:29:51, 5.42s/it]72 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... {'loss': 0.5661, 'learning_rate': 8.880355238966923e-06, 'epoch': 0.55} 55%|█████▍ | 3650/6640 [1:49:03<4:29:51, 5.42s/it] 55%|█████▍ | 3651/6640 [1:49:08<4:34:02, 5.50s/it] {'loss': 0.612, 'learning_rate': 8.875507796328509e-06, 'epoch': 0.55} 55%|█████▍ | 3651/6640 [1:49:08<4:34:02, 5.50s/it] 55%|█████▌ | 3652/6640 [1:49:14<4:33:42, 5.50s/it] {'loss': 0.623, 'learning_rate': 8.870660621289067e-06, 'epoch': 0.55} 55%|█████▌ | 3652/6640 [1:49:14<4:33:42, 5.50s/it] 55%|█████▌ | 3653/6640 [1:49:19<4:34:19, 5.51s/it] {'loss': 0.5951, 'learning_rate': 8.865813715002106e-06, 'epoch': 0.55} 55%|█████▌ | 3653/6640 [1:49:19<4:34:19, 5.51s/it] 55%|█████▌ | 3654/6640 [1:49:25<4:32:58, 5.49s/it] {'loss': 0.5767, 'learning_rate': 8.860967078621056e-06, 'epoch': 0.55} 55%|█████▌ | 3654/6640 [1:49:25<4:32:58, 5.49s/it] 55%|█████▌ | 3655/6640 [1:49:30<4:30:46, 5.44s/it] {'loss': 0.5979, 'learning_rate': 8.856120713299284e-06, 'epoch': 0.55} 55%|█████▌ | 3655/6640 [1:49:30<4:30:46, 5.44s/it] 55%|█████▌ | 
3656/6640 [1:49:36<4:30:56, 5.45s/it] {'loss': 0.5932, 'learning_rate': 8.851274620190092e-06, 'epoch': 0.55} 55%|█████▌ | 3656/6640 [1:49:36<4:30:56, 5.45s/it] 55%|█████▌ | 3657/6640 [1:49:42<4:37:15, 5.58s/it] {'loss': 0.59, 'learning_rate': 8.846428800446725e-06, 'epoch': 0.55} 55%|█████▌ | 3657/6640 [1:49:42<4:37:15, 5.58s/it] 55%|█████▌ | 3658/6640 [1:49:47<4:34:06, 5.52s/it] {'loss': 0.5966, 'learning_rate': 8.841583255222359e-06, 'epoch': 0.55} 55%|█████▌ | 3658/6640 [1:49:47<4:34:06, 5.52s/it] 55%|█████▌ | 3659/6640 [1:49:52<4:31:28, 5.46s/it] {'loss': 0.5981, 'learning_rate': 8.836737985670102e-06, 'epoch': 0.55} 55%|█████▌ | 3659/6640 [1:49:52<4:31:28, 5.46s/it] 55%|█████▌ | 3660/6640 [1:49:58<4:29:32, 5.43s/it] {'loss': 0.5623, 'learning_rate': 8.831892992943e-06, 'epoch': 0.55} 55%|█████▌ | 3660/6640 [1:49:58<4:29:32, 5.43s/it] 55%|█████▌ | 3661/6640 [1:50:03<4:30:48, 5.45s/it] {'loss': 0.5971, 'learning_rate': 8.827048278194023e-06, 'epoch': 0.55} 55%|█████▌ | 3661/6640 [1:50:03<4:30:48, 5.45s/it] 55%|█████▌ | 3662/6640 [1:50:09<4:32:07, 5.48s/it] {'loss': 0.6233, 'learning_rate': 8.822203842576095e-06, 'epoch': 0.55} 55%|█████▌ | 3662/6640 [1:50:09<4:32:07, 5.48s/it] 55%|█████▌ | 3663/6640 [1:50:14<4:31:55, 5.48s/it] {'loss': 0.6101, 'learning_rate': 8.817359687242055e-06, 'epoch': 0.55} 55%|█████▌ | 3663/6640 [1:50:14<4:31:55, 5.48s/it] 55%|█████▌ | 3664/6640 [1:50:20<4:30:31, 5.45s/it] {'loss': 0.6003, 'learning_rate': 8.812515813344682e-06, 'epoch': 0.55} 55%|█████▌ | 3664/6640 [1:50:20<4:30:31, 5.45s/it] 55%|█████▌ | 3665/6640 [1:50:25<4:30:18, 5.45s/it] {'loss': 0.556, 'learning_rate': 8.807672222036692e-06, 'epoch': 0.55} 55%|█████▌ | 3665/6640 [1:50:25<4:30:18, 5.45s/it] 55%|█████▌ | 3666/6640 [1:50:30<4:28:19, 5.41s/it] {'loss': 0.5846, 'learning_rate': 8.802828914470722e-06, 'epoch': 0.55} 55%|█████▌ | 3666/6640 [1:50:30<4:28:19, 5.41s/it] 55%|█████▌ | 3667/6640 [1:50:36<4:31:54, 5.49s/it] {'loss': 0.5845, 'learning_rate': 
8.797985891799359e-06, 'epoch': 0.55} 55%|█████▌ | 3667/6640 [1:50:36<4:31:54, 5.49s/it] 55%|█████▌ | 3668/6640 [1:50:41<4:30:50, 5.47s/it] {'loss': 0.586, 'learning_rate': 8.793143155175108e-06, 'epoch': 0.55} 55%|█████▌ | 3668/6640 [1:50:41<4:30:50, 5.47s/it] 55%|█████▌ | 3669/6640 [1:50:47<4:28:53, 5.43s/it] {'loss': 0.5921, 'learning_rate': 8.78830070575041e-06, 'epoch': 0.55} 55%|█████▌ | 3669/6640 [1:50:47<4:28:53, 5.43s/it] 55%|█████▌ | 3670/6640 [1:50:52<4:28:10, 5.42s/it] {'loss': 0.5758, 'learning_rate': 8.783458544677633e-06, 'epoch': 0.55} 55%|█████▌ | 3670/6640 [1:50:52<4:28:10, 5.42s/it] 55%|█████▌ | 3671/6640 [1:50:58<4:27:44, 5.41s/it] {'loss': 0.5574, 'learning_rate': 8.778616673109094e-06, 'epoch': 0.55} 55%|█████▌ | 3671/6640 [1:50:58<4:27:44, 5.41s/it] 55%|█████▌ | 3672/6640 [1:51:03<4:29:57, 5.46s/it] {'loss': 0.6095, 'learning_rate': 8.773775092197018e-06, 'epoch': 0.55} 55%|█████▌ | 3672/6640 [1:51:03<4:29:57, 5.46s/it] 55%|█████▌ | 3673/6640 [1:51:08<4:28:58, 5.44s/it] {'loss': 0.5779, 'learning_rate': 8.768933803093573e-06, 'epoch': 0.55} 55%|█████▌ | 3673/6640 [1:51:09<4:28:58, 5.44s/it] 55%|█████▌ | 3674/6640 [1:51:14<4:29:27, 5.45s/it] {'loss': 0.5859, 'learning_rate': 8.76409280695086e-06, 'epoch': 0.55} 55%|█████▌ | 3674/6640 [1:51:14<4:29:27, 5.45s/it] 55%|█████▌ | 3675/6640 [1:51:20<4:32:37, 5.52s/it] {'loss': 0.5695, 'learning_rate': 8.7592521049209e-06, 'epoch': 0.55} 55%|█████▌ | 3675/6640 [1:51:20<4:32:37, 5.52s/it] 55%|█████▌ | 3676/6640 [1:51:25<4:35:07, 5.57s/it] {'loss': 0.6047, 'learning_rate': 8.754411698155659e-06, 'epoch': 0.55} 55%|█████▌ | 3676/6640 [1:51:25<4:35:07, 5.57s/it] 55%|█████▌ | 3677/6640 [1:51:31<4:30:16, 5.47s/it] {'loss': 0.5666, 'learning_rate': 8.749571587807017e-06, 'epoch': 0.55} 55%|█████▌ | 3677/6640 [1:51:31<4:30:16, 5.47s/it] 55%|█████▌ | 3678/6640 [1:51:36<4:29:10, 5.45s/it] {'loss': 0.5872, 'learning_rate': 8.744731775026794e-06, 'epoch': 0.55} 55%|█████▌ | 3678/6640 [1:51:36<4:29:10, 5.45s/it] 
55%|█████▌ | 3679/6640 [1:51:41<4:29:41, 5.46s/it] {'loss': 0.5733, 'learning_rate': 8.73989226096673e-06, 'epoch': 0.55} 55%|█████▌ | 3679/6640 [1:51:41<4:29:41, 5.46s/it] 55%|█████▌ | 3680/6640 [1:51:47<4:29:21, 5.46s/it] {'loss': 0.5931, 'learning_rate': 8.735053046778506e-06, 'epoch': 0.55} 55%|█████▌ | 3680/6640 [1:51:47<4:29:21, 5.46s/it] 55%|█████▌ | 3681/6640 [1:51:52<4:28:37, 5.45s/it] {'loss': 0.5775, 'learning_rate': 8.730214133613725e-06, 'epoch': 0.55} 55%|█████▌ | 3681/6640 [1:51:52<4:28:37, 5.45s/it] 55%|█████▌ | 3682/6640 [1:51:58<4:28:40, 5.45s/it] {'loss': 0.5819, 'learning_rate': 8.725375522623912e-06, 'epoch': 0.55} 55%|█████▌ | 3682/6640 [1:51:58<4:28:40, 5.45s/it] 55%|█████▌ | 3683/6640 [1:52:03<4:26:38, 5.41s/it] {'loss': 0.5967, 'learning_rate': 8.720537214960533e-06, 'epoch': 0.55} 55%|█████▌ | 3683/6640 [1:52:03<4:26:38, 5.41s/it] 55%|█████▌ | 3684/6640 [1:52:09<4:26:33, 5.41s/it] {'loss': 0.5592, 'learning_rate': 8.71569921177497e-06, 'epoch': 0.55} 55%|█████▌ | 3684/6640 [1:52:09<4:26:33, 5.41s/it] 55%|█████▌ | 3685/6640 [1:52:14<4:25:44, 5.40s/it] {'loss': 0.5678, 'learning_rate': 8.710861514218545e-06, 'epoch': 0.55} 55%|█████▌ | 3685/6640 [1:52:14<4:25:44, 5.40s/it] 56%|█████▌ | 3686/6640 [1:52:19<4:25:08, 5.39s/it] {'loss': 0.5772, 'learning_rate': 8.706024123442497e-06, 'epoch': 0.56} 56%|█████▌ | 3686/6640 [1:52:19<4:25:08, 5.39s/it] 56%|█████▌ | 3687/6640 [1:52:25<4:26:36, 5.42s/it] {'loss': 0.5761, 'learning_rate': 8.701187040597994e-06, 'epoch': 0.56} 56%|█████▌ | 3687/6640 [1:52:25<4:26:36, 5.42s/it] 56%|█████▌ | 3688/6640 [1:52:30<4:26:40, 5.42s/it] {'loss': 0.5988, 'learning_rate': 8.69635026683613e-06, 'epoch': 0.56} 56%|█████▌ | 3688/6640 [1:52:30<4:26:40, 5.42s/it] 56%|█████▌ | 3689/6640 [1:52:36<4:28:23, 5.46s/it] {'loss': 0.611, 'learning_rate': 8.691513803307933e-06, 'epoch': 0.56} 56%|█████▌ | 3689/6640 [1:52:36<4:28:23, 5.46s/it] 56%|█████▌ | 3690/6640 [1:52:41<4:28:30, 5.46s/it] {'loss': 0.6063, 'learning_rate': 
8.686677651164345e-06, 'epoch': 0.56} 56%|█████▌ | 3690/6640 [1:52:41<4:28:30, 5.46s/it] 56%|█████▌ | 3691/6640 [1:52:47<4:28:08, 5.46s/it] {'loss': 0.5914, 'learning_rate': 8.681841811556249e-06, 'epoch': 0.56} 56%|█████▌ | 3691/6640 [1:52:47<4:28:08, 5.46s/it] 56%|█████▌ | 3692/6640 [1:52:52<4:29:20, 5.48s/it] {'loss': 0.5899, 'learning_rate': 8.677006285634438e-06, 'epoch': 0.56} 56%|█████▌ | 3692/6640 [1:52:52<4:29:20, 5.48s/it] 56%|█████▌ | 3693/6640 [1:52:58<4:27:51, 5.45s/it] {'loss': 0.5783, 'learning_rate': 8.672171074549637e-06, 'epoch': 0.56} 56%|█████▌ | 3693/6640 [1:52:58<4:27:51, 5.45s/it] 56%|█████▌ | 3694/6640 [1:53:03<4:26:49, 5.43s/it] {'loss': 0.5976, 'learning_rate': 8.667336179452503e-06, 'epoch': 0.56} 56%|█████▌ | 3694/6640 [1:53:03<4:26:49, 5.43s/it] 56%|█████▌ | 3695/6640 [1:53:09<4:29:45, 5.50s/it] {'loss': 0.5829, 'learning_rate': 8.662501601493607e-06, 'epoch': 0.56} 56%|█████▌ | 3695/6640 [1:53:09<4:29:45, 5.50s/it] 56%|█████▌ | 3696/6640 [1:53:14<4:30:08, 5.51s/it] {'loss': 0.588, 'learning_rate': 8.657667341823449e-06, 'epoch': 0.56} 56%|█████▌ | 3696/6640 [1:53:14<4:30:08, 5.51s/it] 56%|█████▌ | 3697/6640 [1:53:20<4:28:44, 5.48s/it] {'loss': 0.5984, 'learning_rate': 8.65283340159245e-06, 'epoch': 0.56} 56%|█████▌ | 3697/6640 [1:53:20<4:28:44, 5.48s/it] 56%|█████▌ | 3698/6640 [1:53:25<4:26:54, 5.44s/it] {'loss': 0.5827, 'learning_rate': 8.647999781950964e-06, 'epoch': 0.56} 56%|█████▌ | 3698/6640 [1:53:25<4:26:54, 5.44s/it] 56%|█████▌ | 3699/6640 [1:53:30<4:23:57, 5.38s/it] {'loss': 0.5679, 'learning_rate': 8.64316648404926e-06, 'epoch': 0.56} 56%|█████▌ | 3699/6640 [1:53:30<4:23:57, 5.38s/it]02 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend... 56%|█████▌ | 3700/6640 [1:53:36<4:24:49, 5.40s/it]6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... {'loss': 0.5902, 'learning_rate': 8.638333509037537e-06, 'epoch': 0.56} 56%|█████▌ | 3700/6640 [1:53:36<4:24:49, 5.40s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 56%|█████▌ | 3701/6640 [1:54:03<9:43:24, 11.91s/it] {'loss': 0.6068, 'learning_rate': 8.633500858065907e-06, 'epoch': 0.56} 56%|█████▌ | 3701/6640 [1:54:03<9:43:24, 11.91s/it] 56%|█████▌ | 3702/6640 [1:54:08<8:09:38, 10.00s/it] {'loss': 0.6023, 'learning_rate': 8.628668532284411e-06, 'epoch': 0.56} 56%|█████▌ | 3702/6640 [1:54:08<8:09:38, 10.00s/it] 56%|█████▌ | 3703/6640 [1:54:14<7:03:24, 8.65s/it] {'loss': 0.587, 'learning_rate': 8.623836532843022e-06, 'epoch': 0.56} 56%|█████▌ | 3703/6640 [1:54:14<7:03:24, 8.65s/it] 56%|█████▌ | 3704/6640 [1:54:19<6:14:29, 7.65s/it] {'loss': 0.5915, 'learning_rate': 8.619004860891618e-06, 'epoch': 0.56} 56%|█████▌ | 3704/6640 [1:54:19<6:14:29, 7.65s/it] 56%|█████▌ | 3705/6640 [1:54:25<5:43:28, 7.02s/it] {'loss': 0.5701, 'learning_rate': 8.61417351758001e-06, 'epoch': 0.56} 56%|█████▌ | 3705/6640 [1:54:25<5:43:28, 7.02s/it] 56%|█████▌ | 3706/6640 [1:54:30<5:18:42, 6.52s/it] {'loss': 0.5512, 'learning_rate': 8.609342504057927e-06, 'epoch': 0.56} 
56%|█████▌ | 3706/6640 [1:54:30<5:18:42, 6.52s/it] 56%|█████▌ | 3707/6640 [1:54:36<5:05:46, 6.26s/it] {'loss': 0.5874, 'learning_rate': 8.604511821475021e-06, 'epoch': 0.56} 56%|█████▌ | 3707/6640 [1:54:36<5:05:46, 6.26s/it] 56%|█████▌ | 3708/6640 [1:54:41<4:57:30, 6.09s/it] {'loss': 0.5848, 'learning_rate': 8.599681470980867e-06, 'epoch': 0.56} 56%|█████▌ | 3708/6640 [1:54:41<4:57:30, 6.09s/it] 56%|█████▌ | 3709/6640 [1:54:47<4:45:42, 5.85s/it] {'loss': 0.5715, 'learning_rate': 8.594851453724959e-06, 'epoch': 0.56} 56%|█████▌ | 3709/6640 [1:54:47<4:45:42, 5.85s/it] 56%|█████▌ | 3710/6640 [1:54:52<4:45:42, 5.85s/it] {'loss': 0.6093, 'learning_rate': 8.590021770856708e-06, 'epoch': 0.56} 56%|█████▌ | 3710/6640 [1:54:52<4:45:42, 5.85s/it] 56%|█████▌ | 3711/6640 [1:54:58<4:39:51, 5.73s/it] {'loss': 0.579, 'learning_rate': 8.585192423525447e-06, 'epoch': 0.56} 56%|█████▌ | 3711/6640 [1:54:58<4:39:51, 5.73s/it] 56%|█████▌ | 3712/6640 [1:55:04<4:39:29, 5.73s/it] {'loss': 0.5897, 'learning_rate': 8.580363412880439e-06, 'epoch': 0.56} 56%|█████▌ | 3712/6640 [1:55:04<4:39:29, 5.73s/it] 56%|█████▌ | 3713/6640 [1:55:09<4:35:10, 5.64s/it] {'loss': 0.5763, 'learning_rate': 8.575534740070854e-06, 'epoch': 0.56} 56%|█████▌ | 3713/6640 [1:55:09<4:35:10, 5.64s/it] 56%|█████▌ | 3714/6640 [1:55:14<4:30:55, 5.56s/it] {'loss': 0.5646, 'learning_rate': 8.570706406245787e-06, 'epoch': 0.56} 56%|█████▌ | 3714/6640 [1:55:14<4:30:55, 5.56s/it] 56%|█████▌ | 3715/6640 [1:55:20<4:30:41, 5.55s/it] {'loss': 0.5828, 'learning_rate': 8.565878412554251e-06, 'epoch': 0.56} 56%|█████▌ | 3715/6640 [1:55:20<4:30:41, 5.55s/it] 56%|█████▌ | 3716/6640 [1:55:25<4:29:09, 5.52s/it] {'loss': 0.5969, 'learning_rate': 8.56105076014518e-06, 'epoch': 0.56} 56%|█████▌ | 3716/6640 [1:55:25<4:29:09, 5.52s/it] 56%|█████▌ | 3717/6640 [1:55:31<4:26:49, 5.48s/it] {'loss': 0.5874, 'learning_rate': 8.55622345016743e-06, 'epoch': 0.56} 56%|█████▌ | 3717/6640 [1:55:31<4:26:49, 5.48s/it] 56%|█████▌ | 3718/6640 
[1:55:36<4:26:53, 5.48s/it] {'loss': 0.579, 'learning_rate': 8.551396483769766e-06, 'epoch': 0.56} 56%|█████▌ | 3718/6640 [1:55:36<4:26:53, 5.48s/it] 56%|█████▌ | 3719/6640 [1:55:42<4:26:29, 5.47s/it] {'loss': 0.6036, 'learning_rate': 8.546569862100876e-06, 'epoch': 0.56} 56%|█████▌ | 3719/6640 [1:55:42<4:26:29, 5.47s/it] 56%|█████▌ | 3720/6640 [1:55:47<4:25:46, 5.46s/it] {'loss': 0.6053, 'learning_rate': 8.541743586309366e-06, 'epoch': 0.56} 56%|█████▌ | 3720/6640 [1:55:47<4:25:46, 5.46s/it] 56%|█████▌ | 3721/6640 [1:55:53<4:25:00, 5.45s/it] {'loss': 0.5795, 'learning_rate': 8.536917657543767e-06, 'epoch': 0.56} 56%|█████▌ | 3721/6640 [1:55:53<4:25:00, 5.45s/it] 56%|█████▌ | 3722/6640 [1:55:58<4:23:53, 5.43s/it] {'loss': 0.6022, 'learning_rate': 8.532092076952515e-06, 'epoch': 0.56} 56%|█████▌ | 3722/6640 [1:55:58<4:23:53, 5.43s/it] 56%|█████▌ | 3723/6640 [1:56:03<4:22:16, 5.39s/it] {'loss': 0.5719, 'learning_rate': 8.527266845683968e-06, 'epoch': 0.56} 56%|█████▌ | 3723/6640 [1:56:03<4:22:16, 5.39s/it] 56%|█████▌ | 3724/6640 [1:56:09<4:24:41, 5.45s/it] {'loss': 0.5771, 'learning_rate': 8.522441964886403e-06, 'epoch': 0.56} 56%|█████▌ | 3724/6640 [1:56:09<4:24:41, 5.45s/it] 56%|█████▌ | 3725/6640 [1:56:14<4:23:56, 5.43s/it] {'loss': 0.574, 'learning_rate': 8.517617435708011e-06, 'epoch': 0.56} 56%|█████▌ | 3725/6640 [1:56:14<4:23:56, 5.43s/it] 56%|█████▌ | 3726/6640 [1:56:20<4:21:53, 5.39s/it] {'loss': 0.5923, 'learning_rate': 8.512793259296907e-06, 'epoch': 0.56} 56%|█████▌ | 3726/6640 [1:56:20<4:21:53, 5.39s/it] 56%|█████▌ | 3727/6640 [1:56:25<4:23:40, 5.43s/it] {'loss': 0.5969, 'learning_rate': 8.50796943680111e-06, 'epoch': 0.56} 56%|█████▌ | 3727/6640 [1:56:25<4:23:40, 5.43s/it] 56%|█████▌ | 3728/6640 [1:56:31<4:25:22, 5.47s/it] {'loss': 0.5998, 'learning_rate': 8.503145969368562e-06, 'epoch': 0.56} 56%|█████▌ | 3728/6640 [1:56:31<4:25:22, 5.47s/it] 56%|█████▌ | 3729/6640 [1:56:36<4:24:23, 5.45s/it] {'loss': 0.5966, 'learning_rate': 8.49832285814711e-06, 
'epoch': 0.56} 56%|█████▌ | 3729/6640 [1:56:36<4:24:23, 5.45s/it] 56%|█████▌ | 3730/6640 [1:56:41<4:24:33, 5.45s/it] {'loss': 0.6141, 'learning_rate': 8.493500104284539e-06, 'epoch': 0.56} 56%|█████▌ | 3730/6640 [1:56:41<4:24:33, 5.45s/it] 56%|█████▌ | 3731/6640 [1:56:47<4:25:18, 5.47s/it] {'loss': 0.5826, 'learning_rate': 8.488677708928529e-06, 'epoch': 0.56} 56%|█████▌ | 3731/6640 [1:56:47<4:25:18, 5.47s/it] 56%|█████▌ | 3732/6640 [1:56:52<4:22:39, 5.42s/it] {'loss': 0.5981, 'learning_rate': 8.483855673226681e-06, 'epoch': 0.56} 56%|█████▌ | 3732/6640 [1:56:52<4:22:39, 5.42s/it] 56%|█████▌ | 3733/6640 [1:56:58<4:22:10, 5.41s/it] {'loss': 0.5901, 'learning_rate': 8.479033998326504e-06, 'epoch': 0.56} 56%|█████▌ | 3733/6640 [1:56:58<4:22:10, 5.41s/it] 56%|█████▌ | 3734/6640 [1:57:03<4:23:47, 5.45s/it] {'loss': 0.6052, 'learning_rate': 8.474212685375441e-06, 'epoch': 0.56} 56%|█████▌ | 3734/6640 [1:57:03<4:23:47, 5.45s/it] 56%|█████▋ | 3735/6640 [1:57:09<4:21:29, 5.40s/it] {'loss': 0.5644, 'learning_rate': 8.469391735520824e-06, 'epoch': 0.56} 56%|█████▋ | 3735/6640 [1:57:09<4:21:29, 5.40s/it] 56%|█████▋ | 3736/6640 [1:57:14<4:26:06, 5.50s/it] {'loss': 0.5868, 'learning_rate': 8.464571149909915e-06, 'epoch': 0.56} 56%|█████▋ | 3736/6640 [1:57:14<4:26:06, 5.50s/it] 56%|█████▋ | 3737/6640 [1:57:20<4:25:08, 5.48s/it] {'loss': 0.5994, 'learning_rate': 8.459750929689881e-06, 'epoch': 0.56} 56%|█████▋ | 3737/6640 [1:57:20<4:25:08, 5.48s/it] 56%|█████▋ | 3738/6640 [1:57:25<4:26:02, 5.50s/it] {'loss': 0.5788, 'learning_rate': 8.454931076007803e-06, 'epoch': 0.56} 56%|█████▋ | 3738/6640 [1:57:25<4:26:02, 5.50s/it] 56%|█████▋ | 3739/6640 [1:57:31<4:24:35, 5.47s/it] {'loss': 0.5934, 'learning_rate': 8.450111590010682e-06, 'epoch': 0.56} 56%|█████▋ | 3739/6640 [1:57:31<4:24:35, 5.47s/it] 56%|█████▋ | 3740/6640 [1:57:36<4:28:36, 5.56s/it] {'loss': 0.605, 'learning_rate': 8.445292472845423e-06, 'epoch': 0.56} 56%|█████▋ | 3740/6640 [1:57:36<4:28:36, 5.56s/it] 56%|█████▋ | 
3741/6640 [1:57:42<4:24:40, 5.48s/it] {'loss': 0.5967, 'learning_rate': 8.440473725658851e-06, 'epoch': 0.56} 56%|█████▋ | 3741/6640 [1:57:42<4:24:40, 5.48s/it] 56%|█████▋ | 3742/6640 [1:57:47<4:21:09, 5.41s/it] {'loss': 0.5845, 'learning_rate': 8.43565534959769e-06, 'epoch': 0.56} 56%|█████▋ | 3742/6640 [1:57:47<4:21:09, 5.41s/it] 56%|█████▋ | 3743/6640 [1:57:52<4:19:30, 5.37s/it] {'loss': 0.575, 'learning_rate': 8.430837345808596e-06, 'epoch': 0.56} 56%|█████▋ | 3743/6640 [1:57:52<4:19:30, 5.37s/it] 56%|█████▋ | 3744/6640 [1:57:58<4:19:36, 5.38s/it] {'loss': 0.5899, 'learning_rate': 8.426019715438116e-06, 'epoch': 0.56} 56%|█████▋ | 3744/6640 [1:57:58<4:19:36, 5.38s/it] 56%|█████▋ | 3745/6640 [1:58:03<4:20:00, 5.39s/it] {'loss': 0.6052, 'learning_rate': 8.42120245963272e-06, 'epoch': 0.56} 56%|█████▋ | 3745/6640 [1:58:03<4:20:00, 5.39s/it] 56%|█████▋ | 3746/6640 [1:58:09<4:22:03, 5.43s/it] {'loss': 0.5902, 'learning_rate': 8.416385579538787e-06, 'epoch': 0.56} 56%|█████▋ | 3746/6640 [1:58:09<4:22:03, 5.43s/it] 56%|█████▋ | 3747/6640 [1:58:14<4:20:11, 5.40s/it] {'loss': 0.5846, 'learning_rate': 8.411569076302598e-06, 'epoch': 0.56} 56%|█████▋ | 3747/6640 [1:58:14<4:20:11, 5.40s/it] 56%|█████▋ | 3748/6640 [1:58:19<4:19:39, 5.39s/it] {'loss': 0.5699, 'learning_rate': 8.40675295107036e-06, 'epoch': 0.56} 56%|█████▋ | 3748/6640 [1:58:19<4:19:39, 5.39s/it] 56%|█████▋ | 3749/6640 [1:58:25<4:21:11, 5.42s/it] {'loss': 0.6032, 'learning_rate': 8.401937204988175e-06, 'epoch': 0.56} 56%|█████▋ | 3749/6640 [1:58:25<4:21:11, 5.42s/it]0 AutoResumeHook: Checking whether to suspend... 56%|█████▋ | 3750/6640 [1:58:30<4:21:19, 5.43s/it]1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5974, 'learning_rate': 8.397121839202069e-06, 'epoch': 0.56} 56%|█████▋ | 3750/6640 [1:58:30<4:21:19, 5.43s/it] 56%|█████▋ | 3751/6640 [1:58:35<4:19:53, 5.40s/it] {'loss': 0.5856, 'learning_rate': 8.392306854857962e-06, 'epoch': 0.56} 56%|█████▋ | 3751/6640 [1:58:35<4:19:53, 5.40s/it] 57%|█████▋ | 3752/6640 [1:58:41<4:20:14, 5.41s/it] {'loss': 0.5823, 'learning_rate': 8.387492253101695e-06, 'epoch': 0.57} 57%|█████▋ | 3752/6640 [1:58:41<4:20:14, 5.41s/it] 57%|█████▋ | 3753/6640 [1:58:47<4:23:23, 5.47s/it] {'loss': 0.6071, 'learning_rate': 8.382678035079017e-06, 'epoch': 0.57} 57%|█████▋ | 3753/6640 [1:58:47<4:23:23, 5.47s/it] 57%|█████▋ | 3754/6640 [1:58:52<4:23:36, 5.48s/it] {'loss': 0.5692, 'learning_rate': 8.377864201935578e-06, 'epoch': 0.57} 57%|█████▋ | 3754/6640 [1:58:52<4:23:36, 5.48s/it] 57%|█████▋ | 3755/6640 [1:58:58<4:29:01, 5.60s/it] {'loss': 0.5851, 'learning_rate': 8.373050754816942e-06, 'epoch': 0.57} 57%|█████▋ | 3755/6640 [1:58:58<4:29:01, 5.60s/it] 57%|█████▋ | 3756/6640 [1:59:03<4:25:59, 5.53s/it] {'loss': 0.5825, 'learning_rate': 8.36823769486858e-06, 'epoch': 0.57} 57%|█████▋ | 3756/6640 [1:59:03<4:25:59, 5.53s/it] 57%|█████▋ | 3757/6640 [1:59:09<4:21:45, 5.45s/it] {'loss': 0.5814, 'learning_rate': 8.36342502323587e-06, 'epoch': 0.57} 57%|█████▋ | 3757/6640 [1:59:09<4:21:45, 5.45s/it] 57%|█████▋ | 3758/6640 [1:59:14<4:22:15, 5.46s/it] {'loss': 0.5839, 'learning_rate': 8.358612741064105e-06, 'epoch': 0.57} 57%|█████▋ | 3758/6640 [1:59:14<4:22:15, 5.46s/it] 57%|█████▋ | 3759/6640 [1:59:20<4:24:14, 5.50s/it] {'loss': 0.5841, 'learning_rate': 8.353800849498475e-06, 'epoch': 0.57} 57%|█████▋ | 3759/6640 [1:59:20<4:24:14, 5.50s/it] 57%|█████▋ | 3760/6640 [1:59:25<4:27:19, 5.57s/it] {'loss': 0.5736, 'learning_rate': 8.348989349684077e-06, 'epoch': 0.57} 57%|█████▋ | 3760/6640 [1:59:25<4:27:19, 5.57s/it] 57%|█████▋ | 3761/6640 [1:59:31<4:21:39, 5.45s/it] {'loss': 0.5874, 'learning_rate': 8.344178242765926e-06, 'epoch': 0.57} 57%|█████▋ | 
3761/6640 [1:59:31<4:21:39, 5.45s/it] 57%|█████▋ | 3762/6640 [1:59:36<4:18:51, 5.40s/it] {'loss': 0.6057, 'learning_rate': 8.339367529888934e-06, 'epoch': 0.57} 57%|█████▋ | 3762/6640 [1:59:36<4:18:51, 5.40s/it] 57%|█████▋ | 3763/6640 [1:59:41<4:18:43, 5.40s/it] {'loss': 0.6124, 'learning_rate': 8.33455721219792e-06, 'epoch': 0.57} 57%|█████▋ | 3763/6640 [1:59:41<4:18:43, 5.40s/it] 57%|█████▋ | 3764/6640 [1:59:47<4:20:05, 5.43s/it] {'loss': 0.5756, 'learning_rate': 8.32974729083761e-06, 'epoch': 0.57} 57%|█████▋ | 3764/6640 [1:59:47<4:20:05, 5.43s/it] 57%|█████▋ | 3765/6640 [1:59:52<4:21:29, 5.46s/it] {'loss': 0.578, 'learning_rate': 8.324937766952638e-06, 'epoch': 0.57} 57%|█████▋ | 3765/6640 [1:59:52<4:21:29, 5.46s/it] 57%|█████▋ | 3766/6640 [1:59:58<4:21:40, 5.46s/it] {'loss': 0.5772, 'learning_rate': 8.32012864168754e-06, 'epoch': 0.57} 57%|█████▋ | 3766/6640 [1:59:58<4:21:40, 5.46s/it] 57%|█████▋ | 3767/6640 [2:00:03<4:20:15, 5.44s/it] {'loss': 0.6018, 'learning_rate': 8.315319916186762e-06, 'epoch': 0.57} 57%|█████▋ | 3767/6640 [2:00:03<4:20:15, 5.44s/it] 57%|█████▋ | 3768/6640 [2:00:08<4:19:39, 5.42s/it] {'loss': 0.5931, 'learning_rate': 8.31051159159465e-06, 'epoch': 0.57} 57%|█████▋ | 3768/6640 [2:00:08<4:19:39, 5.42s/it] 57%|█████▋ | 3769/6640 [2:00:14<4:18:44, 5.41s/it] {'loss': 0.6006, 'learning_rate': 8.305703669055454e-06, 'epoch': 0.57} 57%|█████▋ | 3769/6640 [2:00:14<4:18:44, 5.41s/it] 57%|█████▋ | 3770/6640 [2:00:19<4:18:39, 5.41s/it] {'loss': 0.5771, 'learning_rate': 8.300896149713334e-06, 'epoch': 0.57} 57%|█████▋ | 3770/6640 [2:00:19<4:18:39, 5.41s/it] 57%|█████▋ | 3771/6640 [2:00:25<4:21:35, 5.47s/it] {'loss': 0.5981, 'learning_rate': 8.296089034712352e-06, 'epoch': 0.57} 57%|█████▋ | 3771/6640 [2:00:25<4:21:35, 5.47s/it] 57%|█████▋ | 3772/6640 [2:00:30<4:20:17, 5.45s/it] {'loss': 0.6046, 'learning_rate': 8.291282325196464e-06, 'epoch': 0.57} 57%|█████▋ | 3772/6640 [2:00:30<4:20:17, 5.45s/it] 57%|█████▋ | 3773/6640 [2:00:36<4:19:35, 5.43s/it] 
{'loss': 0.5664, 'learning_rate': 8.286476022309549e-06, 'epoch': 0.57} 57%|█████▋ | 3773/6640 [2:00:36<4:19:35, 5.43s/it] 57%|█████▋ | 3774/6640 [2:00:41<4:18:39, 5.42s/it] {'loss': 0.6064, 'learning_rate': 8.281670127195365e-06, 'epoch': 0.57} 57%|█████▋ | 3774/6640 [2:00:41<4:18:39, 5.42s/it] 57%|█████▋ | 3775/6640 [2:00:46<4:16:51, 5.38s/it] {'loss': 0.6155, 'learning_rate': 8.276864640997602e-06, 'epoch': 0.57} 57%|█████▋ | 3775/6640 [2:00:46<4:16:51, 5.38s/it] 57%|█████▋ | 3776/6640 [2:00:52<4:19:57, 5.45s/it] {'loss': 0.5624, 'learning_rate': 8.272059564859826e-06, 'epoch': 0.57} 57%|█████▋ | 3776/6640 [2:00:52<4:19:57, 5.45s/it] 57%|█████▋ | 3777/6640 [2:00:57<4:16:34, 5.38s/it] {'loss': 0.6179, 'learning_rate': 8.26725489992552e-06, 'epoch': 0.57} 57%|█████▋ | 3777/6640 [2:00:57<4:16:34, 5.38s/it] 57%|█████▋ | 3778/6640 [2:01:03<4:17:32, 5.40s/it] {'loss': 0.611, 'learning_rate': 8.262450647338058e-06, 'epoch': 0.57} 57%|█████▋ | 3778/6640 [2:01:03<4:17:32, 5.40s/it] 57%|█████▋ | 3779/6640 [2:01:08<4:18:31, 5.42s/it] {'loss': 0.5786, 'learning_rate': 8.257646808240735e-06, 'epoch': 0.57} 57%|█████▋ | 3779/6640 [2:01:08<4:18:31, 5.42s/it] 57%|█████▋ | 3780/6640 [2:01:14<4:18:56, 5.43s/it] {'loss': 0.5958, 'learning_rate': 8.252843383776731e-06, 'epoch': 0.57} 57%|█████▋ | 3780/6640 [2:01:14<4:18:56, 5.43s/it] 57%|█████▋ | 3781/6640 [2:01:19<4:19:15, 5.44s/it] {'loss': 0.5744, 'learning_rate': 8.248040375089129e-06, 'epoch': 0.57} 57%|█████▋ | 3781/6640 [2:01:19<4:19:15, 5.44s/it] 57%|█████▋ | 3782/6640 [2:01:24<4:16:46, 5.39s/it] {'loss': 0.5827, 'learning_rate': 8.243237783320922e-06, 'epoch': 0.57} 57%|█████▋ | 3782/6640 [2:01:24<4:16:46, 5.39s/it] 57%|█████▋ | 3783/6640 [2:01:30<4:19:56, 5.46s/it] {'loss': 0.5948, 'learning_rate': 8.238435609614987e-06, 'epoch': 0.57} 57%|█████▋ | 3783/6640 [2:01:30<4:19:56, 5.46s/it] 57%|█████▋ | 3784/6640 [2:01:35<4:17:53, 5.42s/it] {'loss': 0.5855, 'learning_rate': 8.233633855114127e-06, 'epoch': 0.57} 57%|█████▋ | 
3784/6640 [2:01:35<4:17:53, 5.42s/it] 57%|█████▋ | 3785/6640 [2:01:41<4:17:13, 5.41s/it] {'loss': 0.5691, 'learning_rate': 8.228832520961023e-06, 'epoch': 0.57} 57%|█████▋ | 3785/6640 [2:01:41<4:17:13, 5.41s/it] 57%|█████▋ | 3786/6640 [2:01:46<4:16:38, 5.40s/it] {'loss': 0.5847, 'learning_rate': 8.224031608298266e-06, 'epoch': 0.57} 57%|█████▋ | 3786/6640 [2:01:46<4:16:38, 5.40s/it] 57%|█████▋ | 3787/6640 [2:01:52<4:19:06, 5.45s/it] {'loss': 0.5611, 'learning_rate': 8.219231118268336e-06, 'epoch': 0.57} 57%|█████▋ | 3787/6640 [2:01:52<4:19:06, 5.45s/it] 57%|█████▋ | 3788/6640 [2:01:57<4:21:48, 5.51s/it] {'loss': 0.5884, 'learning_rate': 8.214431052013636e-06, 'epoch': 0.57} 57%|█████▋ | 3788/6640 [2:01:57<4:21:48, 5.51s/it] 57%|█████▋ | 3789/6640 [2:02:03<4:21:17, 5.50s/it] {'loss': 0.6019, 'learning_rate': 8.209631410676442e-06, 'epoch': 0.57} 57%|█████▋ | 3789/6640 [2:02:03<4:21:17, 5.50s/it] 57%|█████▋ | 3790/6640 [2:02:08<4:17:48, 5.43s/it] {'loss': 0.6051, 'learning_rate': 8.204832195398941e-06, 'epoch': 0.57} 57%|█████▋ | 3790/6640 [2:02:08<4:17:48, 5.43s/it] 57%|█████▋ | 3791/6640 [2:02:14<4:23:51, 5.56s/it] {'loss': 0.5769, 'learning_rate': 8.200033407323223e-06, 'epoch': 0.57} 57%|█████▋ | 3791/6640 [2:02:14<4:23:51, 5.56s/it] 57%|█████▋ | 3792/6640 [2:02:19<4:20:47, 5.49s/it] {'loss': 0.606, 'learning_rate': 8.195235047591261e-06, 'epoch': 0.57} 57%|█████▋ | 3792/6640 [2:02:19<4:20:47, 5.49s/it] 57%|█████▋ | 3793/6640 [2:02:24<4:19:07, 5.46s/it] {'loss': 0.5808, 'learning_rate': 8.190437117344948e-06, 'epoch': 0.57} 57%|█████▋ | 3793/6640 [2:02:25<4:19:07, 5.46s/it] 57%|█████▋ | 3794/6640 [2:02:30<4:20:14, 5.49s/it] {'loss': 0.6016, 'learning_rate': 8.185639617726056e-06, 'epoch': 0.57} 57%|█████▋ | 3794/6640 [2:02:30<4:20:14, 5.49s/it] 57%|█████▋ | 3795/6640 [2:02:36<4:22:20, 5.53s/it] {'loss': 0.5875, 'learning_rate': 8.18084254987626e-06, 'epoch': 0.57} 57%|█████▋ | 3795/6640 [2:02:36<4:22:20, 5.53s/it] 57%|█████▋ | 3796/6640 [2:02:41<4:21:30, 
5.52s/it] {'loss': 0.582, 'learning_rate': 8.176045914937133e-06, 'epoch': 0.57} 57%|█████▋ | 3796/6640 [2:02:41<4:21:30, 5.52s/it] 57%|█████▋ | 3797/6640 [2:02:47<4:22:36, 5.54s/it] {'loss': 0.5801, 'learning_rate': 8.171249714050153e-06, 'epoch': 0.57} 57%|█████▋ | 3797/6640 [2:02:47<4:22:36, 5.54s/it] 57%|█████▋ | 3798/6640 [2:02:52<4:21:17, 5.52s/it] {'loss': 0.5757, 'learning_rate': 8.166453948356679e-06, 'epoch': 0.57} 57%|█████▋ | 3798/6640 [2:02:52<4:21:17, 5.52s/it] 57%|█████▋ | 3799/6640 [2:02:58<4:19:18, 5.48s/it] {'loss': 0.5699, 'learning_rate': 8.161658618997978e-06, 'epoch': 0.57} 57%|█████▋ | 3799/6640 [2:02:58<4:19:18, 5.48s/it]0 2 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 57%|█████▋ | 3800/6640 [2:03:03<4:21:44, 5.53s/it]5 AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... {'loss': 0.6042, 'learning_rate': 8.15686372711521e-06, 'epoch': 0.57} 57%|█████▋ | 3800/6640 [2:03:03<4:21:44, 5.53s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 57%|█████▋ | 3801/6640 [2:03:32<9:47:28, 12.42s/it] {'loss': 0.5649, 'learning_rate': 8.152069273849426e-06, 'epoch': 0.57} 57%|█████▋ | 3801/6640 [2:03:32<9:47:28, 12.42s/it] 57%|█████▋ | 3802/6640 [2:03:37<8:07:52, 10.31s/it] {'loss': 0.5797, 'learning_rate': 8.147275260341588e-06, 'epoch': 0.57} 57%|█████▋ | 3802/6640 [2:03:37<8:07:52, 10.31s/it] 57%|█████▋ | 3803/6640 [2:03:43<6:59:33, 8.87s/it] {'loss': 0.5915, 'learning_rate': 8.142481687732532e-06, 'epoch': 0.57} 57%|█████▋ | 3803/6640 [2:03:43<6:59:33, 8.87s/it] 57%|█████▋ | 3804/6640 [2:03:48<6:09:49, 7.82s/it] {'loss': 0.5992, 'learning_rate': 8.137688557163005e-06, 'epoch': 0.57} 57%|█████▋ | 3804/6640 [2:03:48<6:09:49, 7.82s/it] 57%|█████▋ | 3805/6640 [2:03:53<5:35:36, 7.10s/it] {'loss': 0.57, 'learning_rate': 8.132895869773638e-06, 'epoch': 0.57} 57%|█████▋ | 3805/6640 [2:03:53<5:35:36, 7.10s/it] 57%|█████▋ | 3806/6640 [2:03:59<5:09:58, 6.56s/it] {'loss': 0.6002, 'learning_rate': 8.128103626704969e-06, 'epoch': 0.57} 57%|█████▋ | 3806/6640 [2:03:59<5:09:58, 6.56s/it] 57%|█████▋ | 3807/6640 [2:04:04<4:52:58, 6.20s/it] {'loss': 0.5773, 'learning_rate': 8.12331182909742e-06, 'epoch': 0.57} 57%|█████▋ | 3807/6640 [2:04:04<4:52:58, 6.20s/it] 57%|█████▋ | 3808/6640 [2:04:09<4:40:50, 5.95s/it] {'loss': 0.5652, 'learning_rate': 8.118520478091311e-06, 'epoch': 0.57} 57%|█████▋ | 3808/6640 [2:04:09<4:40:50, 5.95s/it] 57%|█████▋ | 3809/6640 [2:04:15<4:33:57, 5.81s/it] {'loss': 0.5904, 'learning_rate': 8.113729574826856e-06, 'epoch': 0.57} 57%|█████▋ | 3809/6640 [2:04:15<4:33:57, 5.81s/it] 57%|█████▋ | 3810/6640 [2:04:20<4:27:06, 5.66s/it] {'loss': 0.5836, 'learning_rate': 8.108939120444154e-06, 'epoch': 0.57} 57%|█████▋ | 3810/6640 [2:04:20<4:27:06, 5.66s/it] 57%|█████▋ | 3811/6640 [2:04:26<4:24:23, 5.61s/it] {'loss': 0.5839, 'learning_rate': 8.104149116083216e-06, 'epoch': 0.57} 57%|█████▋ | 3811/6640 [2:04:26<4:24:23, 5.61s/it] 57%|█████▋ | 3812/6640 [2:04:31<4:22:37, 5.57s/it] {'loss': 0.5989, 
'learning_rate': 8.099359562883931e-06, 'epoch': 0.57} 57%|█████▋ | 3812/6640 [2:04:31<4:22:37, 5.57s/it] 57%|█████▋ | 3813/6640 [2:04:37<4:21:06, 5.54s/it] {'loss': 0.6055, 'learning_rate': 8.094570461986078e-06, 'epoch': 0.57} 57%|█████▋ | 3813/6640 [2:04:37<4:21:06, 5.54s/it] 57%|█████▋ | 3814/6640 [2:04:42<4:16:45, 5.45s/it] {'loss': 0.5998, 'learning_rate': 8.089781814529338e-06, 'epoch': 0.57} 57%|█████▋ | 3814/6640 [2:04:42<4:16:45, 5.45s/it] 57%|█████▋ | 3815/6640 [2:04:47<4:15:17, 5.42s/it] {'loss': 0.6076, 'learning_rate': 8.084993621653283e-06, 'epoch': 0.57} 57%|█████▋ | 3815/6640 [2:04:47<4:15:17, 5.42s/it] 57%|█████▋ | 3816/6640 [2:04:53<4:14:25, 5.41s/it] {'loss': 0.5887, 'learning_rate': 8.080205884497375e-06, 'epoch': 0.57} 57%|█████▋ | 3816/6640 [2:04:53<4:14:25, 5.41s/it] 57%|█████▋ | 3817/6640 [2:04:58<4:13:56, 5.40s/it] {'loss': 0.5797, 'learning_rate': 8.075418604200964e-06, 'epoch': 0.57} 57%|█████▋ | 3817/6640 [2:04:58<4:13:56, 5.40s/it] 57%|█████▊ | 3818/6640 [2:05:03<4:11:42, 5.35s/it] {'loss': 0.5852, 'learning_rate': 8.070631781903298e-06, 'epoch': 0.57} 57%|█████▊ | 3818/6640 [2:05:03<4:11:42, 5.35s/it] 58%|█████▊ | 3819/6640 [2:05:09<4:11:22, 5.35s/it] {'loss': 0.6084, 'learning_rate': 8.065845418743504e-06, 'epoch': 0.58} 58%|█████▊ | 3819/6640 [2:05:09<4:11:22, 5.35s/it] 58%|█████▊ | 3820/6640 [2:05:14<4:14:19, 5.41s/it] {'loss': 0.5941, 'learning_rate': 8.061059515860616e-06, 'epoch': 0.58} 58%|█████▊ | 3820/6640 [2:05:14<4:14:19, 5.41s/it] 58%|█████▊ | 3821/6640 [2:05:20<4:13:30, 5.40s/it] {'loss': 0.5946, 'learning_rate': 8.05627407439355e-06, 'epoch': 0.58} 58%|█████▊ | 3821/6640 [2:05:20<4:13:30, 5.40s/it] 58%|█████▊ | 3822/6640 [2:05:25<4:12:57, 5.39s/it] {'loss': 0.627, 'learning_rate': 8.051489095481107e-06, 'epoch': 0.58} 58%|█████▊ | 3822/6640 [2:05:25<4:12:57, 5.39s/it] 58%|█████▊ | 3823/6640 [2:05:30<4:13:31, 5.40s/it] {'loss': 0.5815, 'learning_rate': 8.046704580261989e-06, 'epoch': 0.58} 58%|█████▊ | 3823/6640 
[2:05:30<4:13:31, 5.40s/it] 58%|█████▊ | 3824/6640 [2:05:36<4:12:46, 5.39s/it] {'loss': 0.5815, 'learning_rate': 8.04192052987478e-06, 'epoch': 0.58} 58%|█████▊ | 3824/6640 [2:05:36<4:12:46, 5.39s/it] 58%|█████▊ | 3825/6640 [2:05:41<4:13:21, 5.40s/it] {'loss': 0.569, 'learning_rate': 8.037136945457959e-06, 'epoch': 0.58} 58%|█████▊ | 3825/6640 [2:05:41<4:13:21, 5.40s/it] 58%|█████▊ | 3826/6640 [2:05:47<4:13:04, 5.40s/it] {'loss': 0.5807, 'learning_rate': 8.032353828149889e-06, 'epoch': 0.58} 58%|█████▊ | 3826/6640 [2:05:47<4:13:04, 5.40s/it] 58%|█████▊ | 3827/6640 [2:05:52<4:12:48, 5.39s/it] {'loss': 0.6071, 'learning_rate': 8.027571179088816e-06, 'epoch': 0.58} 58%|█████▊ | 3827/6640 [2:05:52<4:12:48, 5.39s/it] 58%|█████▊ | 3828/6640 [2:05:58<4:20:06, 5.55s/it] {'loss': 0.5857, 'learning_rate': 8.022788999412893e-06, 'epoch': 0.58} 58%|█████▊ | 3828/6640 [2:05:58<4:20:06, 5.55s/it] 58%|█████▊ | 3829/6640 [2:06:03<4:16:01, 5.46s/it] {'loss': 0.5897, 'learning_rate': 8.018007290260148e-06, 'epoch': 0.58} 58%|█████▊ | 3829/6640 [2:06:03<4:16:01, 5.46s/it] 58%|█████▊ | 3830/6640 [2:06:09<4:16:52, 5.48s/it] {'loss': 0.5722, 'learning_rate': 8.013226052768498e-06, 'epoch': 0.58} 58%|█████▊ | 3830/6640 [2:06:09<4:16:52, 5.48s/it] 58%|█████▊ | 3831/6640 [2:06:14<4:17:06, 5.49s/it] {'loss': 0.604, 'learning_rate': 8.008445288075745e-06, 'epoch': 0.58} 58%|█████▊ | 3831/6640 [2:06:14<4:17:06, 5.49s/it] 58%|█████▊ | 3832/6640 [2:06:20<4:19:07, 5.54s/it] {'loss': 0.5855, 'learning_rate': 8.003664997319588e-06, 'epoch': 0.58} 58%|█████▊ | 3832/6640 [2:06:20<4:19:07, 5.54s/it] 58%|█████▊ | 3833/6640 [2:06:25<4:15:32, 5.46s/it] {'loss': 0.5816, 'learning_rate': 7.99888518163761e-06, 'epoch': 0.58} 58%|█████▊ | 3833/6640 [2:06:25<4:15:32, 5.46s/it] 58%|█████▊ | 3834/6640 [2:06:31<4:15:23, 5.46s/it] {'loss': 0.5981, 'learning_rate': 7.994105842167274e-06, 'epoch': 0.58} 58%|█████▊ | 3834/6640 [2:06:31<4:15:23, 5.46s/it] 58%|█████▊ | 3835/6640 [2:06:36<4:12:12, 5.39s/it] {'loss': 
0.6075, 'learning_rate': 7.989326980045937e-06, 'epoch': 0.58} 58%|█████▊ | 3835/6640 [2:06:36<4:12:12, 5.39s/it] 58%|█████▊ | 3836/6640 [2:06:41<4:13:25, 5.42s/it] {'loss': 0.5737, 'learning_rate': 7.984548596410836e-06, 'epoch': 0.58} 58%|█████▊ | 3836/6640 [2:06:41<4:13:25, 5.42s/it] 58%|█████▊ | 3837/6640 [2:06:47<4:13:52, 5.43s/it] {'loss': 0.5923, 'learning_rate': 7.979770692399104e-06, 'epoch': 0.58} 58%|█████▊ | 3837/6640 [2:06:47<4:13:52, 5.43s/it] 58%|█████▊ | 3838/6640 [2:06:52<4:13:52, 5.44s/it] {'loss': 0.5687, 'learning_rate': 7.974993269147753e-06, 'epoch': 0.58} 58%|█████▊ | 3838/6640 [2:06:52<4:13:52, 5.44s/it] 58%|█████▊ | 3839/6640 [2:06:57<4:11:32, 5.39s/it] {'loss': 0.5911, 'learning_rate': 7.970216327793676e-06, 'epoch': 0.58} 58%|█████▊ | 3839/6640 [2:06:57<4:11:32, 5.39s/it] 58%|█████▊ | 3840/6640 [2:07:03<4:10:40, 5.37s/it] {'loss': 0.5825, 'learning_rate': 7.965439869473664e-06, 'epoch': 0.58} 58%|█████▊ | 3840/6640 [2:07:03<4:10:40, 5.37s/it] 58%|█████▊ | 3841/6640 [2:07:08<4:13:25, 5.43s/it] {'loss': 0.5653, 'learning_rate': 7.960663895324378e-06, 'epoch': 0.58} 58%|█████▊ | 3841/6640 [2:07:08<4:13:25, 5.43s/it] 58%|█████▊ | 3842/6640 [2:07:14<4:11:32, 5.39s/it] {'loss': 0.5654, 'learning_rate': 7.955888406482382e-06, 'epoch': 0.58} 58%|█████▊ | 3842/6640 [2:07:14<4:11:32, 5.39s/it] 58%|█████▊ | 3843/6640 [2:07:19<4:12:54, 5.43s/it] {'loss': 0.5863, 'learning_rate': 7.951113404084108e-06, 'epoch': 0.58} 58%|█████▊ | 3843/6640 [2:07:19<4:12:54, 5.43s/it] 58%|█████▊ | 3844/6640 [2:07:25<4:12:34, 5.42s/it] {'loss': 0.6158, 'learning_rate': 7.94633888926588e-06, 'epoch': 0.58} 58%|█████▊ | 3844/6640 [2:07:25<4:12:34, 5.42s/it] 58%|█████▊ | 3845/6640 [2:07:30<4:16:09, 5.50s/it] {'loss': 0.606, 'learning_rate': 7.941564863163899e-06, 'epoch': 0.58} 58%|█████▊ | 3845/6640 [2:07:30<4:16:09, 5.50s/it] 58%|█████▊ | 3846/6640 [2:07:36<4:13:58, 5.45s/it] {'loss': 0.5905, 'learning_rate': 7.936791326914262e-06, 'epoch': 0.58} 58%|█████▊ | 3846/6640 
[2:07:36<4:13:58, 5.45s/it] 58%|█████▊ | 3847/6640 [2:07:41<4:15:20, 5.49s/it] {'loss': 0.5713, 'learning_rate': 7.932018281652943e-06, 'epoch': 0.58} 58%|█████▊ | 3847/6640 [2:07:41<4:15:20, 5.49s/it] 58%|█████▊ | 3848/6640 [2:07:47<4:16:56, 5.52s/it] {'loss': 0.6122, 'learning_rate': 7.927245728515793e-06, 'epoch': 0.58} 58%|█████▊ | 3848/6640 [2:07:47<4:16:56, 5.52s/it] 58%|█████▊ | 3849/6640 [2:07:53<4:24:04, 5.68s/it] {'loss': 0.5834, 'learning_rate': 7.922473668638558e-06, 'epoch': 0.58} 58%|█████▊ | 3849/6640 [2:07:53<4:24:04, 5.68s/it]5 AutoResumeHook: Checking whether to suspend... 13 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 58%|█████▊ | 3850/6640 [2:07:58<4:20:16, 5.60s/it] {'loss': 0.5666, 'learning_rate': 7.91770210315685e-06, 'epoch': 0.58} 58%|█████▊ | 3850/6640 [2:07:58<4:20:16, 5.60s/it] 58%|█████▊ | 3851/6640 [2:08:04<4:27:33, 5.76s/it] {'loss': 0.6168, 'learning_rate': 7.912931033206187e-06, 'epoch': 0.58} 58%|█████▊ | 3851/6640 [2:08:04<4:27:33, 5.76s/it] 58%|█████▊ | 3852/6640 [2:08:10<4:25:52, 5.72s/it] {'loss': 0.5887, 'learning_rate': 7.908160459921949e-06, 'epoch': 0.58} 58%|█████▊ | 3852/6640 [2:08:10<4:25:52, 5.72s/it] 58%|█████▊ | 3853/6640 [2:08:15<4:21:09, 5.62s/it] {'loss': 0.5752, 'learning_rate': 7.903390384439404e-06, 'epoch': 0.58} 58%|█████▊ | 3853/6640 [2:08:15<4:21:09, 5.62s/it] 58%|█████▊ | 3854/6640 [2:08:21<4:20:55, 5.62s/it] {'loss': 0.5951, 'learning_rate': 7.898620807893698e-06, 'epoch': 0.58} 58%|█████▊ | 3854/6640 [2:08:21<4:20:55, 5.62s/it] 58%|█████▊ | 3855/6640 [2:08:27<4:19:49, 5.60s/it] {'loss': 0.5811, 'learning_rate': 7.893851731419872e-06, 'epoch': 0.58} 58%|█████▊ | 3855/6640 [2:08:27<4:19:49, 5.60s/it] 58%|█████▊ | 
3856/6640 [2:08:32<4:20:28, 5.61s/it] {'loss': 0.5877, 'learning_rate': 7.889083156152832e-06, 'epoch': 0.58} 58%|█████▊ | 3856/6640 [2:08:32<4:20:28, 5.61s/it] 58%|█████▊ | 3857/6640 [2:08:38<4:17:19, 5.55s/it] {'loss': 0.6009, 'learning_rate': 7.884315083227373e-06, 'epoch': 0.58} 58%|█████▊ | 3857/6640 [2:08:38<4:17:19, 5.55s/it] 58%|█████▊ | 3858/6640 [2:08:43<4:14:04, 5.48s/it] {'loss': 0.6155, 'learning_rate': 7.879547513778168e-06, 'epoch': 0.58} 58%|█████▊ | 3858/6640 [2:08:43<4:14:04, 5.48s/it] 58%|█████▊ | 3859/6640 [2:08:48<4:14:20, 5.49s/it] {'loss': 0.5826, 'learning_rate': 7.874780448939764e-06, 'epoch': 0.58} 58%|█████▊ | 3859/6640 [2:08:48<4:14:20, 5.49s/it] 58%|█████▊ | 3860/6640 [2:08:54<4:11:29, 5.43s/it] {'loss': 0.6039, 'learning_rate': 7.870013889846608e-06, 'epoch': 0.58} 58%|█████▊ | 3860/6640 [2:08:54<4:11:29, 5.43s/it] 58%|█████▊ | 3861/6640 [2:08:59<4:12:04, 5.44s/it] {'loss': 0.6095, 'learning_rate': 7.865247837633005e-06, 'epoch': 0.58} 58%|█████▊ | 3861/6640 [2:08:59<4:12:04, 5.44s/it] 58%|█████▊ | 3862/6640 [2:09:05<4:11:31, 5.43s/it] {'loss': 0.6016, 'learning_rate': 7.860482293433148e-06, 'epoch': 0.58} 58%|█████▊ | 3862/6640 [2:09:05<4:11:31, 5.43s/it] 58%|█████▊ | 3863/6640 [2:09:10<4:10:27, 5.41s/it] {'loss': 0.5902, 'learning_rate': 7.855717258381105e-06, 'epoch': 0.58} 58%|█████▊ | 3863/6640 [2:09:10<4:10:27, 5.41s/it] 58%|█████▊ | 3864/6640 [2:09:15<4:09:47, 5.40s/it] {'loss': 0.5704, 'learning_rate': 7.850952733610836e-06, 'epoch': 0.58} 58%|█████▊ | 3864/6640 [2:09:15<4:09:47, 5.40s/it] 58%|█████▊ | 3865/6640 [2:09:21<4:11:56, 5.45s/it] {'loss': 0.5717, 'learning_rate': 7.846188720256162e-06, 'epoch': 0.58} 58%|█████▊ | 3865/6640 [2:09:21<4:11:56, 5.45s/it] 58%|█████▊ | 3866/6640 [2:09:26<4:14:09, 5.50s/it] {'loss': 0.5922, 'learning_rate': 7.841425219450796e-06, 'epoch': 0.58} 58%|█████▊ | 3866/6640 [2:09:26<4:14:09, 5.50s/it] 58%|█████▊ | 3867/6640 [2:09:32<4:11:32, 5.44s/it] {'loss': 0.5784, 'learning_rate': 
7.836662232328319e-06, 'epoch': 0.58} 58%|█████▊ | 3867/6640 [2:09:32<4:11:32, 5.44s/it] 58%|█████▊ | 3868/6640 [2:09:37<4:11:00, 5.43s/it] {'loss': 0.5818, 'learning_rate': 7.831899760022192e-06, 'epoch': 0.58} 58%|█████▊ | 3868/6640 [2:09:37<4:11:00, 5.43s/it] 58%|█████▊ | 3869/6640 [2:09:43<4:10:35, 5.43s/it] {'loss': 0.5948, 'learning_rate': 7.827137803665765e-06, 'epoch': 0.58} 58%|█████▊ | 3869/6640 [2:09:43<4:10:35, 5.43s/it] 58%|█████▊ | 3870/6640 [2:09:48<4:10:47, 5.43s/it] {'loss': 0.5606, 'learning_rate': 7.822376364392248e-06, 'epoch': 0.58} 58%|█████▊ | 3870/6640 [2:09:48<4:10:47, 5.43s/it] 58%|█████▊ | 3871/6640 [2:09:54<4:11:41, 5.45s/it] {'loss': 0.592, 'learning_rate': 7.817615443334741e-06, 'epoch': 0.58} 58%|█████▊ | 3871/6640 [2:09:54<4:11:41, 5.45s/it] 58%|█████▊ | 3872/6640 [2:09:59<4:14:14, 5.51s/it] {'loss': 0.5982, 'learning_rate': 7.812855041626206e-06, 'epoch': 0.58} 58%|█████▊ | 3872/6640 [2:09:59<4:14:14, 5.51s/it] 58%|█████▊ | 3873/6640 [2:10:05<4:14:22, 5.52s/it] {'loss': 0.5883, 'learning_rate': 7.808095160399502e-06, 'epoch': 0.58} 58%|█████▊ | 3873/6640 [2:10:05<4:14:22, 5.52s/it] 58%|█████▊ | 3874/6640 [2:10:10<4:16:53, 5.57s/it] {'loss': 0.5915, 'learning_rate': 7.803335800787348e-06, 'epoch': 0.58} 58%|█████▊ | 3874/6640 [2:10:10<4:16:53, 5.57s/it] 58%|█████▊ | 3875/6640 [2:10:16<4:13:56, 5.51s/it] {'loss': 0.5991, 'learning_rate': 7.798576963922347e-06, 'epoch': 0.58} 58%|█████▊ | 3875/6640 [2:10:16<4:13:56, 5.51s/it] 58%|█████▊ | 3876/6640 [2:10:21<4:10:14, 5.43s/it] {'loss': 0.5829, 'learning_rate': 7.793818650936969e-06, 'epoch': 0.58} 58%|█████▊ | 3876/6640 [2:10:21<4:10:14, 5.43s/it] 58%|█████▊ | 3877/6640 [2:10:27<4:11:15, 5.46s/it] {'loss': 0.5777, 'learning_rate': 7.789060862963564e-06, 'epoch': 0.58} 58%|█████▊ | 3877/6640 [2:10:27<4:11:15, 5.46s/it] 58%|█████▊ | 3878/6640 [2:10:32<4:12:31, 5.49s/it] {'loss': 0.5917, 'learning_rate': 7.784303601134369e-06, 'epoch': 0.58} 58%|█████▊ | 3878/6640 [2:10:32<4:12:31, 
5.49s/it] 58%|█████▊ | 3879/6640 [2:10:38<4:13:12, 5.50s/it] {'loss': 0.5922, 'learning_rate': 7.779546866581475e-06, 'epoch': 0.58} 58%|█████▊ | 3879/6640 [2:10:38<4:13:12, 5.50s/it] 58%|█████▊ | 3880/6640 [2:10:43<4:14:03, 5.52s/it] {'loss': 0.5651, 'learning_rate': 7.774790660436857e-06, 'epoch': 0.58} 58%|█████▊ | 3880/6640 [2:10:43<4:14:03, 5.52s/it] 58%|█████▊ | 3881/6640 [2:10:49<4:15:32, 5.56s/it] {'loss': 0.569, 'learning_rate': 7.770034983832368e-06, 'epoch': 0.58} 58%|█████▊ | 3881/6640 [2:10:49<4:15:32, 5.56s/it] 58%|█████▊ | 3882/6640 [2:10:54<4:14:36, 5.54s/it] {'loss': 0.5786, 'learning_rate': 7.76527983789973e-06, 'epoch': 0.58} 58%|█████▊ | 3882/6640 [2:10:54<4:14:36, 5.54s/it] 58%|█████▊ | 3883/6640 [2:11:00<4:13:20, 5.51s/it] {'loss': 0.6056, 'learning_rate': 7.760525223770544e-06, 'epoch': 0.58} 58%|█████▊ | 3883/6640 [2:11:00<4:13:20, 5.51s/it] 58%|█████▊ | 3884/6640 [2:11:05<4:11:18, 5.47s/it] {'loss': 0.6077, 'learning_rate': 7.755771142576276e-06, 'epoch': 0.58} 58%|█████▊ | 3884/6640 [2:11:05<4:11:18, 5.47s/it] 59%|█████▊ | 3885/6640 [2:11:10<4:08:55, 5.42s/it] {'loss': 0.6015, 'learning_rate': 7.75101759544827e-06, 'epoch': 0.59} 59%|█████▊ | 3885/6640 [2:11:10<4:08:55, 5.42s/it] 59%|█████▊ | 3886/6640 [2:11:16<4:11:07, 5.47s/it] {'loss': 0.5706, 'learning_rate': 7.746264583517744e-06, 'epoch': 0.59} 59%|█████▊ | 3886/6640 [2:11:16<4:11:07, 5.47s/it] 59%|█████▊ | 3887/6640 [2:11:22<4:10:53, 5.47s/it] {'loss': 0.6185, 'learning_rate': 7.741512107915787e-06, 'epoch': 0.59} 59%|█████▊ | 3887/6640 [2:11:22<4:10:53, 5.47s/it] 59%|█████▊ | 3888/6640 [2:11:27<4:09:01, 5.43s/it] {'loss': 0.5775, 'learning_rate': 7.736760169773362e-06, 'epoch': 0.59} 59%|█████▊ | 3888/6640 [2:11:27<4:09:01, 5.43s/it] 59%|█████▊ | 3889/6640 [2:11:32<4:08:24, 5.42s/it] {'loss': 0.5756, 'learning_rate': 7.732008770221301e-06, 'epoch': 0.59} 59%|█████▊ | 3889/6640 [2:11:32<4:08:24, 5.42s/it] 59%|█████▊ | 3890/6640 [2:11:38<4:08:47, 5.43s/it] {'loss': 0.568, 
'learning_rate': 7.727257910390309e-06, 'epoch': 0.59} 59%|█████▊ | 3890/6640 [2:11:38<4:08:47, 5.43s/it] 59%|█████▊ | 3891/6640 [2:11:43<4:11:16, 5.48s/it] {'loss': 0.6163, 'learning_rate': 7.722507591410965e-06, 'epoch': 0.59} 59%|█████▊ | 3891/6640 [2:11:43<4:11:16, 5.48s/it] 59%|█████▊ | 3892/6640 [2:11:49<4:09:51, 5.46s/it] {'loss': 0.5637, 'learning_rate': 7.717757814413719e-06, 'epoch': 0.59} 59%|█████▊ | 3892/6640 [2:11:49<4:09:51, 5.46s/it] 59%|█████▊ | 3893/6640 [2:11:54<4:10:11, 5.46s/it] {'loss': 0.5557, 'learning_rate': 7.713008580528886e-06, 'epoch': 0.59} 59%|█████▊ | 3893/6640 [2:11:54<4:10:11, 5.46s/it] 59%|█████▊ | 3894/6640 [2:12:00<4:07:51, 5.42s/it] {'loss': 0.5973, 'learning_rate': 7.708259890886662e-06, 'epoch': 0.59} 59%|█████▊ | 3894/6640 [2:12:00<4:07:51, 5.42s/it] 59%|█████▊ | 3895/6640 [2:12:05<4:07:16, 5.41s/it] {'loss': 0.5902, 'learning_rate': 7.703511746617098e-06, 'epoch': 0.59} 59%|█████▊ | 3895/6640 [2:12:05<4:07:16, 5.41s/it] 59%|█████▊ | 3896/6640 [2:12:10<4:07:20, 5.41s/it] {'loss': 0.5878, 'learning_rate': 7.698764148850138e-06, 'epoch': 0.59} 59%|█████▊ | 3896/6640 [2:12:10<4:07:20, 5.41s/it] 59%|█████▊ | 3897/6640 [2:12:16<4:07:26, 5.41s/it] {'loss': 0.5792, 'learning_rate': 7.694017098715572e-06, 'epoch': 0.59} 59%|█████▊ | 3897/6640 [2:12:16<4:07:26, 5.41s/it] 59%|█████▊ | 3898/6640 [2:12:21<4:10:13, 5.48s/it] {'loss': 0.5903, 'learning_rate': 7.689270597343076e-06, 'epoch': 0.59} 59%|█████▊ | 3898/6640 [2:12:21<4:10:13, 5.48s/it] 59%|█████▊ | 3899/6640 [2:12:27<4:08:33, 5.44s/it] {'loss': 0.5725, 'learning_rate': 7.684524645862188e-06, 'epoch': 0.59} 59%|█████▊ | 3899/6640 [2:12:27<4:08:33, 5.44s/it]5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
59%|█████▊ | 3900/6640 [2:12:32<4:07:15, 5.41s/it]
7 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
{'loss': 0.5806, 'learning_rate': 7.679779245402321e-06, 'epoch': 0.59}
59%|█████▊ | 3900/6640 [2:12:32<4:07:15, 5.41s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3900/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3900/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-3900/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 59%|█████▉ | 3901/6640 [2:12:59<8:58:05, 11.79s/it] {'loss': 0.5606, 'learning_rate': 7.675034397092749e-06, 'epoch': 0.59} 59%|█████▉ | 3901/6640 [2:12:59<8:58:05, 11.79s/it] 59%|█████▉ | 3902/6640 [2:13:04<7:30:35, 9.87s/it] {'loss': 0.5986, 'learning_rate': 7.670290102062621e-06, 'epoch': 0.59} 59%|█████▉ | 3902/6640 [2:13:04<7:30:35, 9.87s/it] 59%|█████▉ | 3903/6640 [2:13:10<6:30:58, 8.57s/it] {'loss': 0.5959, 'learning_rate': 7.66554636144095e-06, 'epoch': 0.59} 59%|█████▉ | 3903/6640 [2:13:10<6:30:58, 8.57s/it] 59%|█████▉ | 3904/6640 [2:13:15<5:46:44, 7.60s/it] {'loss': 0.5934, 'learning_rate': 7.660803176356614e-06, 'epoch': 0.59} 59%|█████▉ | 3904/6640 [2:13:15<5:46:44, 7.60s/it] 59%|█████▉ | 3905/6640 [2:13:21<5:18:22, 6.98s/it] {'loss': 0.5678, 'learning_rate': 7.656060547938375e-06, 'epoch': 0.59} 59%|█████▉ | 3905/6640 [2:13:21<5:18:22, 6.98s/it] 59%|█████▉ | 3906/6640 [2:13:26<4:56:00, 6.50s/it] {'loss': 0.5835, 'learning_rate': 7.651318477314845e-06, 'epoch': 0.59} 59%|█████▉ | 3906/6640 [2:13:26<4:56:00, 6.50s/it] 59%|█████▉ | 3907/6640 [2:13:31<4:39:55, 6.15s/it] {'loss': 0.5761, 'learning_rate': 7.646576965614509e-06, 'epoch': 0.59} 59%|█████▉ | 3907/6640 [2:13:31<4:39:55, 6.15s/it] 59%|█████▉ | 3908/6640 [2:13:37<4:30:07, 5.93s/it] {'loss': 0.6076, 'learning_rate': 7.641836013965719e-06, 'epoch': 0.59} 59%|█████▉ | 3908/6640 [2:13:37<4:30:07, 5.93s/it] 59%|█████▉ | 3909/6640 [2:13:42<4:23:35, 5.79s/it] {'loss': 0.5824, 'learning_rate': 7.6370956234967e-06, 'epoch': 0.59} 59%|█████▉ | 3909/6640 [2:13:42<4:23:35, 5.79s/it] 59%|█████▉ | 3910/6640 [2:13:47<4:17:39, 5.66s/it] {'loss': 0.5459, 'learning_rate': 7.632355795335533e-06, 'epoch': 0.59} 59%|█████▉ | 3910/6640 [2:13:48<4:17:39, 5.66s/it] 59%|█████▉ | 3911/6640 [2:13:53<4:17:00, 5.65s/it] {'loss': 0.5745, 'learning_rate': 7.627616530610172e-06, 'epoch': 0.59} 59%|█████▉ | 3911/6640 [2:13:53<4:17:00, 5.65s/it] 59%|█████▉ | 3912/6640 [2:13:58<4:12:52, 5.56s/it] {'loss': 0.6025, 
'learning_rate': 7.622877830448434e-06, 'epoch': 0.59} 59%|█████▉ | 3912/6640 [2:13:58<4:12:52, 5.56s/it] 59%|█████▉ | 3913/6640 [2:14:04<4:10:04, 5.50s/it] {'loss': 0.6088, 'learning_rate': 7.6181396959779974e-06, 'epoch': 0.59} 59%|█████▉ | 3913/6640 [2:14:04<4:10:04, 5.50s/it] 59%|█████▉ | 3914/6640 [2:14:09<4:09:21, 5.49s/it] {'loss': 0.578, 'learning_rate': 7.613402128326422e-06, 'epoch': 0.59} 59%|█████▉ | 3914/6640 [2:14:09<4:09:21, 5.49s/it] 59%|█████▉ | 3915/6640 [2:14:15<4:08:45, 5.48s/it] {'loss': 0.5935, 'learning_rate': 7.608665128621111e-06, 'epoch': 0.59} 59%|█████▉ | 3915/6640 [2:14:15<4:08:45, 5.48s/it] 59%|█████▉ | 3916/6640 [2:14:20<4:07:40, 5.46s/it] {'loss': 0.5895, 'learning_rate': 7.603928697989353e-06, 'epoch': 0.59} 59%|█████▉ | 3916/6640 [2:14:20<4:07:40, 5.46s/it] 59%|█████▉ | 3917/6640 [2:14:26<4:07:55, 5.46s/it] {'loss': 0.6079, 'learning_rate': 7.5991928375582805e-06, 'epoch': 0.59} 59%|█████▉ | 3917/6640 [2:14:26<4:07:55, 5.46s/it] 59%|█████▉ | 3918/6640 [2:14:31<4:09:39, 5.50s/it] {'loss': 0.6167, 'learning_rate': 7.594457548454911e-06, 'epoch': 0.59} 59%|█████▉ | 3918/6640 [2:14:31<4:09:39, 5.50s/it] 59%|█████▉ | 3919/6640 [2:14:37<4:07:13, 5.45s/it] {'loss': 0.5819, 'learning_rate': 7.589722831806113e-06, 'epoch': 0.59} 59%|█████▉ | 3919/6640 [2:14:37<4:07:13, 5.45s/it] 59%|█████▉ | 3920/6640 [2:14:42<4:09:42, 5.51s/it] {'loss': 0.5698, 'learning_rate': 7.584988688738622e-06, 'epoch': 0.59} 59%|█████▉ | 3920/6640 [2:14:42<4:09:42, 5.51s/it] 59%|█████▉ | 3921/6640 [2:14:48<4:08:43, 5.49s/it] {'loss': 0.5955, 'learning_rate': 7.5802551203790344e-06, 'epoch': 0.59} 59%|█████▉ | 3921/6640 [2:14:48<4:08:43, 5.49s/it] 59%|█████▉ | 3922/6640 [2:14:53<4:08:06, 5.48s/it] {'loss': 0.5865, 'learning_rate': 7.5755221278538135e-06, 'epoch': 0.59} 59%|█████▉ | 3922/6640 [2:14:53<4:08:06, 5.48s/it] 59%|█████▉ | 3923/6640 [2:14:59<4:07:22, 5.46s/it] {'loss': 0.6169, 'learning_rate': 7.570789712289287e-06, 'epoch': 0.59} 59%|█████▉ | 3923/6640 
[2:14:59<4:07:22, 5.46s/it] 59%|█████▉ | 3924/6640 [2:15:04<4:07:32, 5.47s/it] {'loss': 0.5808, 'learning_rate': 7.566057874811643e-06, 'epoch': 0.59} 59%|█████▉ | 3924/6640 [2:15:04<4:07:32, 5.47s/it]WARNING: tokenization mismatch: 1 vs. 789. [[{'from': 'human', 'value': 'ASSISTANT: You are a forward-thinking HR consultancy with the following knowledge and traits.\n\nKNOWLEDGE: HR management, employee satisfaction, talent retention, team building, workplace culture.\n\nTRAITS: Innovation, empathy, solution orientation, creativity, understanding, teamwork.\n\nTASK: Use your knowledge to create an internal strategy document. We are an HR consultancy looking to develop 10 creative ways to improve employee satisfaction. Focus on ideas that are innovative and engaging, and cater to the needs of HR managers and company leaders.\n\nOUTPUT: A table in German containing 10 creative strategies to improve employee satisfaction, including a brief description for each.'}, {'from': 'gpt', 'value': '| Nr. | Strategie | Beschreibung |\n| --- | --- | --- |\n| 1. | "Wertschätzungswand" | Eine Wand im Büro, auf der Mitarbeiter anonym ihre Kollegen für ihre Leistungen und Beiträge loben und anerkennen können. |\n| 2. | Flexible Arbeitsmodelle | Einführung von flexiblen Arbeitszeiten und Homeoffice-Optionen, um Work-Life-Balance zu verbessern und individuelle Bedürfnisse zu berücksichtigen. |\n| 3. | Mitarbeiter-Workshops für persönliche Entwicklung | Organisieren von internen Workshops und Schulungen, um persönliche und berufliche Fähigkeiten der Mitarbeiter zu fördern. |\n| 4. | Team-Building-Aktivitäten außerhalb des Arbeitsplatzes | Regelmäßige Veranstaltungen wie Escape Rooms, Sportevents oder Kochkurse, um den Zusammenhalt und die Kommunikation im Team zu stärken. |\n| 5. | Gesundheits- und Wellness-Programme | Einführung von Fitness- und Entspannungskursen wie Yoga, Meditation oder Massagen, um das Wohlbefinden der Mitarbeiter zu fördern. |\n| 6. 
| Feedback-Kultur und transparente Kommunikation | Etablierung einer offenen Feedback-Kultur und transparenter Kommunikation, um die Mitarbeiter in Entscheidungsprozesse einzubinden. |\n| 7. | Mitarbeiter-Mentoring-Programm | Einführung eines Mentoring-Programms, bei dem erfahrene Mitarbeiter ihr Wissen und ihre Erfahrungen mit jüngeren Kollegen teilen. |\n| 8. | "Innovationstage" | Einrichtung von speziellen Tagen, an denen Mitarbeiter an eigenen Projekten arbeiten oder innovative Ideen für das Unternehmen entwickeln können. |\n| 9. | Individuelle Karriereentwicklungspläne | Erstellung von personalisierten Karriereplänen in Zusammenarbeit mit den Mitarbeitern, um ihre berufliche Entwicklung zu unterstützen. |\n| 10. | Gemeinschaftliche soziale Verantwortung (CSR) Initiativen | Organisation von Freiwilligenprojekten und CSR-Aktivitäten, um den Mitarbeitern die Möglichkeit zu geben, sozialen Einfluss zu nehmen. |'}]] (ignored) 59%|█████▉ | 3925/6640 [2:15:09<4:05:47, 5.43s/it] {'loss': 0.6143, 'learning_rate': 7.561326616546932e-06, 'epoch': 0.59} 59%|█████▉ | 3925/6640 [2:15:09<4:05:47, 5.43s/it] 59%|█████▉ | 3926/6640 [2:15:15<4:07:06, 5.46s/it] {'loss': 0.5821, 'learning_rate': 7.556595938621058e-06, 'epoch': 0.59} 59%|█████▉ | 3926/6640 [2:15:15<4:07:06, 5.46s/it] 59%|█████▉ | 3927/6640 [2:15:20<4:05:20, 5.43s/it] {'loss': 0.596, 'learning_rate': 7.551865842159809e-06, 'epoch': 0.59} 59%|█████▉ | 3927/6640 [2:15:20<4:05:20, 5.43s/it] 59%|█████▉ | 3928/6640 [2:15:26<4:07:39, 5.48s/it] {'loss': 0.6163, 'learning_rate': 7.547136328288814e-06, 'epoch': 0.59} 59%|█████▉ | 3928/6640 [2:15:26<4:07:39, 5.48s/it] 59%|█████▉ | 3929/6640 [2:15:31<4:05:41, 5.44s/it] {'loss': 0.5933, 'learning_rate': 7.542407398133571e-06, 'epoch': 0.59} 59%|█████▉ | 3929/6640 [2:15:31<4:05:41, 5.44s/it] 59%|█████▉ | 3930/6640 [2:15:36<4:02:58, 5.38s/it] {'loss': 0.5761, 'learning_rate': 7.5376790528194354e-06, 'epoch': 0.59} 59%|█████▉ | 3930/6640 [2:15:36<4:02:58, 5.38s/it] 59%|█████▉ | 
3931/6640 [2:15:42<4:03:03, 5.38s/it] {'loss': 0.5878, 'learning_rate': 7.532951293471631e-06, 'epoch': 0.59} 59%|█████▉ | 3931/6640 [2:15:42<4:03:03, 5.38s/it] 59%|█████▉ | 3932/6640 [2:15:47<4:02:38, 5.38s/it] {'loss': 0.5864, 'learning_rate': 7.528224121215235e-06, 'epoch': 0.59} 59%|█████▉ | 3932/6640 [2:15:47<4:02:38, 5.38s/it] 59%|█████▉ | 3933/6640 [2:15:53<4:03:41, 5.40s/it] {'loss': 0.5796, 'learning_rate': 7.523497537175192e-06, 'epoch': 0.59} 59%|█████▉ | 3933/6640 [2:15:53<4:03:41, 5.40s/it] 59%|█████▉ | 3934/6640 [2:15:58<4:02:31, 5.38s/it] {'loss': 0.5938, 'learning_rate': 7.518771542476299e-06, 'epoch': 0.59} 59%|█████▉ | 3934/6640 [2:15:58<4:02:31, 5.38s/it] 59%|█████▉ | 3935/6640 [2:16:03<4:03:14, 5.40s/it] {'loss': 0.5697, 'learning_rate': 7.514046138243211e-06, 'epoch': 0.59} 59%|█████▉ | 3935/6640 [2:16:03<4:03:14, 5.40s/it] 59%|█████▉ | 3936/6640 [2:16:09<4:08:13, 5.51s/it] {'loss': 0.5917, 'learning_rate': 7.509321325600457e-06, 'epoch': 0.59} 59%|█████▉ | 3936/6640 [2:16:09<4:08:13, 5.51s/it] 59%|█████▉ | 3937/6640 [2:16:14<4:05:20, 5.45s/it] {'loss': 0.5783, 'learning_rate': 7.504597105672411e-06, 'epoch': 0.59} 59%|█████▉ | 3937/6640 [2:16:14<4:05:20, 5.45s/it] 59%|█████▉ | 3938/6640 [2:16:20<4:06:33, 5.48s/it] {'loss': 0.571, 'learning_rate': 7.499873479583312e-06, 'epoch': 0.59} 59%|█████▉ | 3938/6640 [2:16:20<4:06:33, 5.48s/it] 59%|█████▉ | 3939/6640 [2:16:26<4:07:41, 5.50s/it] {'loss': 0.584, 'learning_rate': 7.495150448457251e-06, 'epoch': 0.59} 59%|█████▉ | 3939/6640 [2:16:26<4:07:41, 5.50s/it] 59%|█████▉ | 3940/6640 [2:16:31<4:05:49, 5.46s/it] {'loss': 0.5827, 'learning_rate': 7.490428013418187e-06, 'epoch': 0.59} 59%|█████▉ | 3940/6640 [2:16:31<4:05:49, 5.46s/it] 59%|█████▉ | 3941/6640 [2:16:36<4:05:55, 5.47s/it] {'loss': 0.6191, 'learning_rate': 7.485706175589934e-06, 'epoch': 0.59} 59%|█████▉ | 3941/6640 [2:16:36<4:05:55, 5.47s/it] 59%|█████▉ | 3942/6640 [2:16:42<4:04:44, 5.44s/it] {'loss': 0.5844, 'learning_rate': 
7.480984936096163e-06, 'epoch': 0.59} 59%|█████▉ | 3942/6640 [2:16:42<4:04:44, 5.44s/it] 59%|█████▉ | 3943/6640 [2:16:47<4:03:53, 5.43s/it] {'loss': 0.5787, 'learning_rate': 7.4762642960604e-06, 'epoch': 0.59} 59%|█████▉ | 3943/6640 [2:16:47<4:03:53, 5.43s/it] 59%|█████▉ | 3944/6640 [2:16:53<4:06:54, 5.49s/it] {'loss': 0.5695, 'learning_rate': 7.471544256606027e-06, 'epoch': 0.59} 59%|█████▉ | 3944/6640 [2:16:53<4:06:54, 5.49s/it] 59%|█████▉ | 3945/6640 [2:16:58<4:06:12, 5.48s/it] {'loss': 0.5835, 'learning_rate': 7.466824818856296e-06, 'epoch': 0.59} 59%|█████▉ | 3945/6640 [2:16:58<4:06:12, 5.48s/it] 59%|█████▉ | 3946/6640 [2:17:04<4:03:30, 5.42s/it] {'loss': 0.5865, 'learning_rate': 7.462105983934302e-06, 'epoch': 0.59} 59%|█████▉ | 3946/6640 [2:17:04<4:03:30, 5.42s/it] 59%|█████▉ | 3947/6640 [2:17:09<4:06:00, 5.48s/it] {'loss': 0.5903, 'learning_rate': 7.457387752962999e-06, 'epoch': 0.59} 59%|█████▉ | 3947/6640 [2:17:09<4:06:00, 5.48s/it] 59%|█████▉ | 3948/6640 [2:17:14<4:03:03, 5.42s/it] {'loss': 0.585, 'learning_rate': 7.4526701270652025e-06, 'epoch': 0.59} 59%|█████▉ | 3948/6640 [2:17:14<4:03:03, 5.42s/it] 59%|█████▉ | 3949/6640 [2:17:20<4:04:28, 5.45s/it] {'loss': 0.6078, 'learning_rate': 7.447953107363574e-06, 'epoch': 0.59} 59%|█████▉ | 3949/6640 [2:17:20<4:04:28, 5.45s/it]01 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 59%|█████▉ | 3950/6640 [2:17:25<4:04:11, 5.45s/it]2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6011, 'learning_rate': 7.443236694980649e-06, 'epoch': 0.59} 59%|█████▉ | 3950/6640 [2:17:25<4:04:11, 5.45s/it] 60%|█████▉ | 3951/6640 [2:17:31<4:01:29, 5.39s/it] {'loss': 0.5744, 'learning_rate': 7.438520891038801e-06, 'epoch': 0.6} 60%|█████▉ | 3951/6640 [2:17:31<4:01:29, 5.39s/it] 60%|█████▉ | 3952/6640 [2:17:36<4:04:25, 5.46s/it] {'loss': 0.586, 'learning_rate': 7.433805696660267e-06, 'epoch': 0.6} 60%|█████▉ | 3952/6640 [2:17:36<4:04:25, 5.46s/it] 60%|█████▉ | 3953/6640 [2:17:42<4:05:00, 5.47s/it] {'loss': 0.6247, 'learning_rate': 7.429091112967128e-06, 'epoch': 0.6} 60%|█████▉ | 3953/6640 [2:17:42<4:05:00, 5.47s/it] 60%|█████▉ | 3954/6640 [2:17:47<4:03:36, 5.44s/it] {'loss': 0.5857, 'learning_rate': 7.424377141081343e-06, 'epoch': 0.6} 60%|█████▉ | 3954/6640 [2:17:47<4:03:36, 5.44s/it] 60%|█████▉ | 3955/6640 [2:17:53<4:06:20, 5.50s/it] {'loss': 0.6001, 'learning_rate': 7.4196637821247e-06, 'epoch': 0.6} 60%|█████▉ | 3955/6640 [2:17:53<4:06:20, 5.50s/it] 60%|█████▉ | 3956/6640 [2:17:58<4:02:52, 5.43s/it] {'loss': 0.5864, 'learning_rate': 7.414951037218854e-06, 'epoch': 0.6} 60%|█████▉ | 3956/6640 [2:17:58<4:02:52, 5.43s/it] 60%|█████▉ | 3957/6640 [2:18:03<4:02:16, 5.42s/it] {'loss': 0.5768, 'learning_rate': 7.4102389074853164e-06, 'epoch': 0.6} 60%|█████▉ | 3957/6640 [2:18:03<4:02:16, 5.42s/it] 60%|█████▉ | 3958/6640 [2:18:09<4:01:59, 5.41s/it] {'loss': 0.5758, 'learning_rate': 7.40552739404544e-06, 'epoch': 0.6} 60%|█████▉ | 3958/6640 [2:18:09<4:01:59, 5.41s/it] 60%|█████▉ | 3959/6640 [2:18:14<4:00:01, 5.37s/it] {'loss': 0.5751, 'learning_rate': 7.4008164980204455e-06, 'epoch': 0.6} 60%|█████▉ | 3959/6640 [2:18:14<4:00:01, 5.37s/it] 60%|█████▉ | 3960/6640 [2:18:20<4:05:33, 5.50s/it] {'loss': 0.6063, 'learning_rate': 7.396106220531398e-06, 'epoch': 0.6} 60%|█████▉ | 3960/6640 [2:18:20<4:05:33, 5.50s/it] 60%|█████▉ | 3961/6640 [2:18:25<4:03:26, 5.45s/it] {'loss': 0.5745, 'learning_rate': 7.391396562699218e-06, 'epoch': 0.6} 60%|█████▉ | 3961/6640 
[2:18:25<4:03:26, 5.45s/it] 60%|█████▉ | 3962/6640 [2:18:31<4:04:14, 5.47s/it] {'loss': 0.5654, 'learning_rate': 7.38668752564467e-06, 'epoch': 0.6} 60%|█████▉ | 3962/6640 [2:18:31<4:04:14, 5.47s/it] 60%|█████▉ | 3963/6640 [2:18:37<4:07:37, 5.55s/it] {'loss': 0.611, 'learning_rate': 7.381979110488391e-06, 'epoch': 0.6} 60%|█████▉ | 3963/6640 [2:18:37<4:07:37, 5.55s/it] 60%|█████▉ | 3964/6640 [2:18:42<4:09:21, 5.59s/it] {'loss': 0.6093, 'learning_rate': 7.377271318350847e-06, 'epoch': 0.6} 60%|█████▉ | 3964/6640 [2:18:42<4:09:21, 5.59s/it] 60%|█████▉ | 3965/6640 [2:18:48<4:09:30, 5.60s/it] {'loss': 0.5758, 'learning_rate': 7.372564150352373e-06, 'epoch': 0.6} 60%|█████▉ | 3965/6640 [2:18:48<4:09:30, 5.60s/it] 60%|█████▉ | 3966/6640 [2:18:53<4:07:08, 5.55s/it] {'loss': 0.578, 'learning_rate': 7.367857607613147e-06, 'epoch': 0.6} 60%|█████▉ | 3966/6640 [2:18:53<4:07:08, 5.55s/it] 60%|█████▉ | 3967/6640 [2:18:59<4:04:15, 5.48s/it] {'loss': 0.5823, 'learning_rate': 7.3631516912531945e-06, 'epoch': 0.6} 60%|█████▉ | 3967/6640 [2:18:59<4:04:15, 5.48s/it] 60%|█████▉ | 3968/6640 [2:19:04<4:03:12, 5.46s/it] {'loss': 0.5959, 'learning_rate': 7.358446402392405e-06, 'epoch': 0.6} 60%|█████▉ | 3968/6640 [2:19:04<4:03:12, 5.46s/it] 60%|█████▉ | 3969/6640 [2:19:09<4:03:22, 5.47s/it] {'loss': 0.5857, 'learning_rate': 7.353741742150509e-06, 'epoch': 0.6} 60%|█████▉ | 3969/6640 [2:19:09<4:03:22, 5.47s/it] 60%|█████▉ | 3970/6640 [2:19:15<4:04:14, 5.49s/it] {'loss': 0.5912, 'learning_rate': 7.349037711647089e-06, 'epoch': 0.6} 60%|█████▉ | 3970/6640 [2:19:15<4:04:14, 5.49s/it] 60%|█████▉ | 3971/6640 [2:19:20<4:03:18, 5.47s/it] {'loss': 0.5926, 'learning_rate': 7.3443343120015705e-06, 'epoch': 0.6} 60%|█████▉ | 3971/6640 [2:19:20<4:03:18, 5.47s/it] 60%|█████▉ | 3972/6640 [2:19:26<4:03:02, 5.47s/it] {'loss': 0.5851, 'learning_rate': 7.33963154433325e-06, 'epoch': 0.6} 60%|█████▉ | 3972/6640 [2:19:26<4:03:02, 5.47s/it] 60%|█████▉ | 3973/6640 [2:19:31<4:03:18, 5.47s/it] {'loss': 0.5966, 
'learning_rate': 7.3349294097612515e-06, 'epoch': 0.6} 60%|█████▉ | 3973/6640 [2:19:31<4:03:18, 5.47s/it] 60%|█████▉ | 3974/6640 [2:19:37<4:05:52, 5.53s/it] {'loss': 0.6094, 'learning_rate': 7.330227909404559e-06, 'epoch': 0.6} 60%|█████▉ | 3974/6640 [2:19:37<4:05:52, 5.53s/it] 60%|█████▉ | 3975/6640 [2:19:42<4:03:01, 5.47s/it] {'loss': 0.5808, 'learning_rate': 7.325527044382004e-06, 'epoch': 0.6} 60%|█████▉ | 3975/6640 [2:19:42<4:03:01, 5.47s/it] 60%|█████▉ | 3976/6640 [2:19:48<4:06:44, 5.56s/it] {'loss': 0.6114, 'learning_rate': 7.320826815812262e-06, 'epoch': 0.6} 60%|█████▉ | 3976/6640 [2:19:48<4:06:44, 5.56s/it] 60%|█████▉ | 3977/6640 [2:19:54<4:08:21, 5.60s/it] {'loss': 0.583, 'learning_rate': 7.316127224813868e-06, 'epoch': 0.6} 60%|█████▉ | 3977/6640 [2:19:54<4:08:21, 5.60s/it] 60%|█████▉ | 3978/6640 [2:19:59<4:05:06, 5.52s/it] {'loss': 0.6022, 'learning_rate': 7.311428272505197e-06, 'epoch': 0.6} 60%|█████▉ | 3978/6640 [2:19:59<4:05:06, 5.52s/it] 60%|█████▉ | 3979/6640 [2:20:05<4:02:11, 5.46s/it] {'loss': 0.5995, 'learning_rate': 7.306729960004471e-06, 'epoch': 0.6} 60%|█████▉ | 3979/6640 [2:20:05<4:02:11, 5.46s/it] 60%|█████▉ | 3980/6640 [2:20:10<4:01:41, 5.45s/it] {'loss': 0.5715, 'learning_rate': 7.3020322884297565e-06, 'epoch': 0.6} 60%|█████▉ | 3980/6640 [2:20:10<4:01:41, 5.45s/it] 60%|█████▉ | 3981/6640 [2:20:16<4:03:27, 5.49s/it] {'loss': 0.5739, 'learning_rate': 7.297335258898988e-06, 'epoch': 0.6} 60%|█████▉ | 3981/6640 [2:20:16<4:03:27, 5.49s/it] 60%|█████▉ | 3982/6640 [2:20:21<4:01:49, 5.46s/it] {'loss': 0.6082, 'learning_rate': 7.292638872529921e-06, 'epoch': 0.6} 60%|█████▉ | 3982/6640 [2:20:21<4:01:49, 5.46s/it] 60%|█████▉ | 3983/6640 [2:20:26<3:59:55, 5.42s/it] {'loss': 0.5919, 'learning_rate': 7.287943130440176e-06, 'epoch': 0.6} 60%|█████▉ | 3983/6640 [2:20:26<3:59:55, 5.42s/it] 60%|██████ | 3984/6640 [2:20:31<3:57:44, 5.37s/it] {'loss': 0.5878, 'learning_rate': 7.28324803374721e-06, 'epoch': 0.6} 60%|██████ | 3984/6640 [2:20:31<3:57:44, 
5.37s/it] 60%|██████ | 3985/6640 [2:20:37<3:56:52, 5.35s/it] {'loss': 0.5722, 'learning_rate': 7.278553583568326e-06, 'epoch': 0.6} 60%|██████ | 3985/6640 [2:20:37<3:56:52, 5.35s/it] 60%|██████ | 3986/6640 [2:20:42<3:57:58, 5.38s/it] {'loss': 0.5938, 'learning_rate': 7.273859781020687e-06, 'epoch': 0.6} 60%|██████ | 3986/6640 [2:20:42<3:57:58, 5.38s/it] 60%|██████ | 3987/6640 [2:20:48<3:59:55, 5.43s/it] {'loss': 0.5871, 'learning_rate': 7.269166627221288e-06, 'epoch': 0.6} 60%|██████ | 3987/6640 [2:20:48<3:59:55, 5.43s/it] 60%|██████ | 3988/6640 [2:20:53<3:57:57, 5.38s/it] {'loss': 0.5838, 'learning_rate': 7.26447412328697e-06, 'epoch': 0.6} 60%|██████ | 3988/6640 [2:20:53<3:57:57, 5.38s/it] 60%|██████ | 3989/6640 [2:20:59<4:05:47, 5.56s/it] {'loss': 0.5858, 'learning_rate': 7.259782270334425e-06, 'epoch': 0.6} 60%|██████ | 3989/6640 [2:20:59<4:05:47, 5.56s/it] 60%|██████ | 3990/6640 [2:21:05<4:06:07, 5.57s/it] {'loss': 0.5917, 'learning_rate': 7.2550910694801905e-06, 'epoch': 0.6} 60%|██████ | 3990/6640 [2:21:05<4:06:07, 5.57s/it] 60%|██████ | 3991/6640 [2:21:10<4:04:23, 5.54s/it] {'loss': 0.5877, 'learning_rate': 7.250400521840646e-06, 'epoch': 0.6} 60%|██████ | 3991/6640 [2:21:10<4:04:23, 5.54s/it] 60%|██████ | 3992/6640 [2:21:16<4:02:41, 5.50s/it] {'loss': 0.5869, 'learning_rate': 7.2457106285320146e-06, 'epoch': 0.6} 60%|██████ | 3992/6640 [2:21:16<4:02:41, 5.50s/it] 60%|██████ | 3993/6640 [2:21:21<4:00:37, 5.45s/it] {'loss': 0.5724, 'learning_rate': 7.241021390670366e-06, 'epoch': 0.6} 60%|██████ | 3993/6640 [2:21:21<4:00:37, 5.45s/it] 60%|██████ | 3994/6640 [2:21:26<4:00:56, 5.46s/it] {'loss': 0.5945, 'learning_rate': 7.236332809371609e-06, 'epoch': 0.6} 60%|██████ | 3994/6640 [2:21:26<4:00:56, 5.46s/it] 60%|██████ | 3995/6640 [2:21:32<4:01:30, 5.48s/it] {'loss': 0.6, 'learning_rate': 7.2316448857515076e-06, 'epoch': 0.6} 60%|██████ | 3995/6640 [2:21:32<4:01:30, 5.48s/it] 60%|██████ | 3996/6640 [2:21:37<4:00:02, 5.45s/it] {'loss': 0.5984, 'learning_rate': 
7.2269576209256575e-06, 'epoch': 0.6} 60%|██████ | 3996/6640 [2:21:37<4:00:02, 5.45s/it] 60%|██████ | 3997/6640 [2:21:43<4:00:05, 5.45s/it] {'loss': 0.5716, 'learning_rate': 7.2222710160095e-06, 'epoch': 0.6} 60%|██████ | 3997/6640 [2:21:43<4:00:05, 5.45s/it] 60%|██████ | 3998/6640 [2:21:48<3:58:12, 5.41s/it] {'loss': 0.5614, 'learning_rate': 7.217585072118325e-06, 'epoch': 0.6} 60%|██████ | 3998/6640 [2:21:48<3:58:12, 5.41s/it] 60%|██████ | 3999/6640 [2:21:54<4:03:11, 5.53s/it] {'loss': 0.591, 'learning_rate': 7.2128997903672585e-06, 'epoch': 0.6} 60%|██████ | 3999/6640 [2:21:54<4:03:11, 5.53s/it]47 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 60%|██████ | 4000/6640 [2:21:59<4:03:06, 5.53s/it] {'loss': 0.6239, 'learning_rate': 7.208215171871277e-06, 'epoch': 0.6} 60%|██████ | 4000/6640 [2:21:59<4:03:06, 5.53s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 60%|██████ | 4001/6640 [2:22:28<9:02:18, 12.33s/it] {'loss': 0.6012, 'learning_rate': 7.20353121774519e-06, 'epoch': 0.6} 60%|██████ | 4001/6640 [2:22:28<9:02:18, 12.33s/it] 60%|██████ | 4002/6640 [2:22:33<7:32:33, 10.29s/it] {'loss': 0.5996, 'learning_rate': 7.198847929103653e-06, 'epoch': 0.6} 60%|██████ | 4002/6640 [2:22:33<7:32:33, 10.29s/it] 60%|██████ | 4003/6640 [2:22:39<6:29:00, 8.85s/it] {'loss': 0.5845, 'learning_rate': 7.194165307061159e-06, 'epoch': 0.6} 60%|██████ | 4003/6640 [2:22:39<6:29:00, 8.85s/it] 60%|██████ | 4004/6640 [2:22:44<5:42:35, 7.80s/it] {'loss': 0.5711, 'learning_rate': 7.189483352732056e-06, 'epoch': 0.6} 60%|██████ | 4004/6640 [2:22:44<5:42:35, 7.80s/it] 60%|██████ | 4005/6640 [2:22:49<5:11:15, 7.09s/it] {'loss': 0.5919, 'learning_rate': 7.184802067230518e-06, 'epoch': 0.6} 60%|██████ | 4005/6640 [2:22:49<5:11:15, 7.09s/it] 60%|██████ | 4006/6640 [2:22:55<4:47:44, 6.55s/it] {'loss': 0.6023, 'learning_rate': 7.1801214516705656e-06, 'epoch': 0.6} 60%|██████ | 4006/6640 [2:22:55<4:47:44, 6.55s/it] 60%|██████ | 4007/6640 [2:23:00<4:34:31, 6.26s/it] {'loss': 0.6089, 'learning_rate': 7.175441507166056e-06, 'epoch': 0.6} 60%|██████ | 4007/6640 [2:23:00<4:34:31, 6.26s/it] 60%|██████ | 4008/6640 [2:23:06<4:21:59, 5.97s/it] {'loss': 0.5828, 'learning_rate': 7.1707622348307e-06, 'epoch': 0.6} 60%|██████ | 4008/6640 [2:23:06<4:21:59, 5.97s/it] 60%|██████ | 4009/6640 [2:23:11<4:14:41, 5.81s/it] {'loss': 0.5636, 'learning_rate': 7.166083635778033e-06, 'epoch': 0.6} 60%|██████ | 4009/6640 [2:23:11<4:14:41, 5.81s/it] 60%|██████ | 4010/6640 [2:23:16<4:08:45, 5.67s/it] {'loss': 0.5611, 'learning_rate': 7.161405711121436e-06, 'epoch': 0.6} 60%|██████ | 4010/6640 [2:23:16<4:08:45, 5.67s/it] 60%|██████ | 4011/6640 [2:23:22<4:04:43, 5.59s/it] {'loss': 0.5806, 'learning_rate': 7.15672846197413e-06, 'epoch': 0.6} 60%|██████ | 4011/6640 [2:23:22<4:04:43, 5.59s/it] 60%|██████ | 4012/6640 [2:23:27<4:00:21, 5.49s/it] {'loss': 0.5717, 
'learning_rate': 7.15205188944917e-06, 'epoch': 0.6} 60%|██████ | 4012/6640 [2:23:27<4:00:21, 5.49s/it] 60%|██████ | 4013/6640 [2:23:32<3:57:57, 5.43s/it] {'loss': 0.6025, 'learning_rate': 7.147375994659465e-06, 'epoch': 0.6} 60%|██████ | 4013/6640 [2:23:32<3:57:57, 5.43s/it] 60%|██████ | 4014/6640 [2:23:38<3:58:43, 5.45s/it] {'loss': 0.579, 'learning_rate': 7.142700778717744e-06, 'epoch': 0.6} 60%|██████ | 4014/6640 [2:23:38<3:58:43, 5.45s/it] 60%|██████ | 4015/6640 [2:23:43<3:59:12, 5.47s/it] {'loss': 0.5685, 'learning_rate': 7.1380262427365885e-06, 'epoch': 0.6} 60%|██████ | 4015/6640 [2:23:43<3:59:12, 5.47s/it] 60%|██████ | 4016/6640 [2:23:49<3:58:39, 5.46s/it] {'loss': 0.5925, 'learning_rate': 7.133352387828407e-06, 'epoch': 0.6} 60%|██████ | 4016/6640 [2:23:49<3:58:39, 5.46s/it] 60%|██████ | 4017/6640 [2:23:54<3:58:32, 5.46s/it] {'loss': 0.6066, 'learning_rate': 7.128679215105459e-06, 'epoch': 0.6} 60%|██████ | 4017/6640 [2:23:54<3:58:32, 5.46s/it] 61%|██████ | 4018/6640 [2:23:59<3:56:58, 5.42s/it] {'loss': 0.5509, 'learning_rate': 7.124006725679828e-06, 'epoch': 0.61} 61%|██████ | 4018/6640 [2:23:59<3:56:58, 5.42s/it] 61%|██████ | 4019/6640 [2:24:05<3:55:49, 5.40s/it] {'loss': 0.5859, 'learning_rate': 7.1193349206634455e-06, 'epoch': 0.61} 61%|██████ | 4019/6640 [2:24:05<3:55:49, 5.40s/it] 61%|██████ | 4020/6640 [2:24:10<3:54:15, 5.36s/it] {'loss': 0.578, 'learning_rate': 7.114663801168073e-06, 'epoch': 0.61} 61%|██████ | 4020/6640 [2:24:10<3:54:15, 5.36s/it] 61%|██████ | 4021/6640 [2:24:16<3:54:39, 5.38s/it] {'loss': 0.5934, 'learning_rate': 7.109993368305308e-06, 'epoch': 0.61} 61%|██████ | 4021/6640 [2:24:16<3:54:39, 5.38s/it] 61%|██████ | 4022/6640 [2:24:21<3:56:38, 5.42s/it] {'loss': 0.6089, 'learning_rate': 7.105323623186595e-06, 'epoch': 0.61} 61%|██████ | 4022/6640 [2:24:21<3:56:38, 5.42s/it] 61%|██████ | 4023/6640 [2:24:27<3:59:05, 5.48s/it] {'loss': 0.6065, 'learning_rate': 7.100654566923205e-06, 'epoch': 0.61} 61%|██████ | 4023/6640 
[2:24:27<3:59:05, 5.48s/it] 61%|██████ | 4024/6640 [2:24:32<3:57:32, 5.45s/it] {'loss': 0.6008, 'learning_rate': 7.095986200626251e-06, 'epoch': 0.61} 61%|██████ | 4024/6640 [2:24:32<3:57:32, 5.45s/it] 61%|██████ | 4025/6640 [2:24:37<3:55:25, 5.40s/it] {'loss': 0.5858, 'learning_rate': 7.091318525406671e-06, 'epoch': 0.61} 61%|██████ | 4025/6640 [2:24:37<3:55:25, 5.40s/it] 61%|██████ | 4026/6640 [2:24:43<3:53:11, 5.35s/it] {'loss': 0.5889, 'learning_rate': 7.086651542375256e-06, 'epoch': 0.61} 61%|██████ | 4026/6640 [2:24:43<3:53:11, 5.35s/it] 61%|██████ | 4027/6640 [2:24:48<3:54:12, 5.38s/it] {'loss': 0.5951, 'learning_rate': 7.081985252642621e-06, 'epoch': 0.61} 61%|██████ | 4027/6640 [2:24:48<3:54:12, 5.38s/it] 61%|██████ | 4028/6640 [2:24:53<3:55:44, 5.42s/it] {'loss': 0.5943, 'learning_rate': 7.0773196573192124e-06, 'epoch': 0.61} 61%|██████ | 4028/6640 [2:24:54<3:55:44, 5.42s/it] 61%|██████ | 4029/6640 [2:24:59<3:54:02, 5.38s/it] {'loss': 0.5799, 'learning_rate': 7.072654757515321e-06, 'epoch': 0.61} 61%|██████ | 4029/6640 [2:24:59<3:54:02, 5.38s/it] 61%|██████ | 4030/6640 [2:25:04<3:54:26, 5.39s/it] {'loss': 0.5942, 'learning_rate': 7.067990554341064e-06, 'epoch': 0.61} 61%|██████ | 4030/6640 [2:25:04<3:54:26, 5.39s/it] 61%|██████ | 4031/6640 [2:25:10<3:53:52, 5.38s/it] {'loss': 0.5778, 'learning_rate': 7.063327048906401e-06, 'epoch': 0.61} 61%|██████ | 4031/6640 [2:25:10<3:53:52, 5.38s/it] 61%|██████ | 4032/6640 [2:25:15<3:57:35, 5.47s/it] {'loss': 0.598, 'learning_rate': 7.058664242321121e-06, 'epoch': 0.61} 61%|██████ | 4032/6640 [2:25:15<3:57:35, 5.47s/it] 61%|██████ | 4033/6640 [2:25:21<3:57:51, 5.47s/it] {'loss': 0.5897, 'learning_rate': 7.054002135694844e-06, 'epoch': 0.61} 61%|██████ | 4033/6640 [2:25:21<3:57:51, 5.47s/it] 61%|██████ | 4034/6640 [2:25:26<3:59:00, 5.50s/it] {'loss': 0.6034, 'learning_rate': 7.049340730137024e-06, 'epoch': 0.61} 61%|██████ | 4034/6640 [2:25:26<3:59:00, 5.50s/it] 61%|██████ | 4035/6640 [2:25:32<3:57:39, 5.47s/it] 
{'loss': 0.5842, 'learning_rate': 7.04468002675696e-06, 'epoch': 0.61} 61%|██████ | 4035/6640 [2:25:32<3:57:39, 5.47s/it] 61%|██████ | 4036/6640 [2:25:37<3:57:57, 5.48s/it] {'loss': 0.6139, 'learning_rate': 7.040020026663767e-06, 'epoch': 0.61} 61%|██████ | 4036/6640 [2:25:37<3:57:57, 5.48s/it] 61%|██████ | 4037/6640 [2:25:43<3:57:28, 5.47s/it] {'loss': 0.5752, 'learning_rate': 7.035360730966403e-06, 'epoch': 0.61} 61%|██████ | 4037/6640 [2:25:43<3:57:28, 5.47s/it] 61%|██████ | 4038/6640 [2:25:48<3:57:57, 5.49s/it] {'loss': 0.6003, 'learning_rate': 7.030702140773651e-06, 'epoch': 0.61} 61%|██████ | 4038/6640 [2:25:48<3:57:57, 5.49s/it] 61%|██████ | 4039/6640 [2:25:54<3:55:51, 5.44s/it] {'loss': 0.6016, 'learning_rate': 7.026044257194133e-06, 'epoch': 0.61} 61%|██████ | 4039/6640 [2:25:54<3:55:51, 5.44s/it] 61%|██████ | 4040/6640 [2:25:59<3:52:52, 5.37s/it] {'loss': 0.5766, 'learning_rate': 7.021387081336302e-06, 'epoch': 0.61} 61%|██████ | 4040/6640 [2:25:59<3:52:52, 5.37s/it] 61%|██████ | 4041/6640 [2:26:05<3:58:14, 5.50s/it] {'loss': 0.6038, 'learning_rate': 7.01673061430844e-06, 'epoch': 0.61} 61%|██████ | 4041/6640 [2:26:05<3:58:14, 5.50s/it] 61%|██████ | 4042/6640 [2:26:10<3:56:43, 5.47s/it] {'loss': 0.612, 'learning_rate': 7.012074857218662e-06, 'epoch': 0.61} 61%|██████ | 4042/6640 [2:26:10<3:56:43, 5.47s/it] 61%|██████ | 4043/6640 [2:26:15<3:56:04, 5.45s/it] {'loss': 0.5833, 'learning_rate': 7.007419811174907e-06, 'epoch': 0.61} 61%|██████ | 4043/6640 [2:26:15<3:56:04, 5.45s/it] 61%|██████ | 4044/6640 [2:26:21<3:57:23, 5.49s/it] {'loss': 0.5787, 'learning_rate': 7.0027654772849615e-06, 'epoch': 0.61} 61%|██████ | 4044/6640 [2:26:21<3:57:23, 5.49s/it] 61%|██████ | 4045/6640 [2:26:27<4:00:19, 5.56s/it] {'loss': 0.5807, 'learning_rate': 6.998111856656427e-06, 'epoch': 0.61} 61%|██████ | 4045/6640 [2:26:27<4:00:19, 5.56s/it] 61%|██████ | 4046/6640 [2:26:32<3:58:38, 5.52s/it] {'loss': 0.586, 'learning_rate': 6.993458950396738e-06, 'epoch': 0.61} 61%|██████ | 
4046/6640 [2:26:32<3:58:38, 5.52s/it] 61%|██████ | 4047/6640 [2:26:37<3:56:47, 5.48s/it] {'loss': 0.5818, 'learning_rate': 6.988806759613165e-06, 'epoch': 0.61} 61%|██████ | 4047/6640 [2:26:37<3:56:47, 5.48s/it] 61%|██████ | 4048/6640 [2:26:43<3:55:19, 5.45s/it] {'loss': 0.5833, 'learning_rate': 6.984155285412799e-06, 'epoch': 0.61} 61%|██████ | 4048/6640 [2:26:43<3:55:19, 5.45s/it] 61%|██████ | 4049/6640 [2:26:48<3:57:03, 5.49s/it] {'loss': 0.5852, 'learning_rate': 6.979504528902576e-06, 'epoch': 0.61} 61%|██████ | 4049/6640 [2:26:48<3:57:03, 5.49s/it]6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 40 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 61%|██████ | 4050/6640 [2:26:54<3:55:55, 5.47s/it]3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... {'loss': 0.5836, 'learning_rate': 6.974854491189243e-06, 'epoch': 0.61} 61%|██████ | 4050/6640 [2:26:54<3:55:55, 5.47s/it] 61%|██████ | 4051/6640 [2:26:59<3:56:28, 5.48s/it] {'loss': 0.5807, 'learning_rate': 6.970205173379389e-06, 'epoch': 0.61} 61%|██████ | 4051/6640 [2:26:59<3:56:28, 5.48s/it] 61%|██████ | 4052/6640 [2:27:05<3:54:32, 5.44s/it] {'loss': 0.5713, 'learning_rate': 6.9655565765794195e-06, 'epoch': 0.61} 61%|██████ | 4052/6640 [2:27:05<3:54:32, 5.44s/it] 61%|██████ | 4053/6640 [2:27:10<3:53:42, 5.42s/it] {'loss': 0.6003, 'learning_rate': 6.9609087018955855e-06, 'epoch': 0.61} 61%|██████ | 4053/6640 [2:27:10<3:53:42, 5.42s/it] 61%|██████ | 4054/6640 [2:27:16<3:54:29, 5.44s/it] {'loss': 0.5888, 'learning_rate': 6.956261550433951e-06, 'epoch': 0.61} 61%|██████ | 4054/6640 [2:27:16<3:54:29, 5.44s/it] 61%|██████ | 4055/6640 [2:27:21<3:53:06, 5.41s/it] {'loss': 0.5778, 'learning_rate': 6.951615123300415e-06, 'epoch': 0.61} 61%|██████ | 4055/6640 [2:27:21<3:53:06, 5.41s/it] 
61%|██████ | 4056/6640 [2:27:26<3:55:30, 5.47s/it] {'loss': 0.5918, 'learning_rate': 6.946969421600702e-06, 'epoch': 0.61} 61%|██████ | 4056/6640 [2:27:26<3:55:30, 5.47s/it] 61%|██████ | 4057/6640 [2:27:32<3:53:14, 5.42s/it] {'loss': 0.5776, 'learning_rate': 6.94232444644036e-06, 'epoch': 0.61} 61%|██████ | 4057/6640 [2:27:32<3:53:14, 5.42s/it] 61%|██████ | 4058/6640 [2:27:37<3:55:35, 5.47s/it] {'loss': 0.5651, 'learning_rate': 6.937680198924779e-06, 'epoch': 0.61} 61%|██████ | 4058/6640 [2:27:37<3:55:35, 5.47s/it] 61%|██████ | 4059/6640 [2:27:43<3:54:26, 5.45s/it] {'loss': 0.5992, 'learning_rate': 6.933036680159158e-06, 'epoch': 0.61} 61%|██████ | 4059/6640 [2:27:43<3:54:26, 5.45s/it] 61%|██████ | 4060/6640 [2:27:48<3:52:16, 5.40s/it] {'loss': 0.5523, 'learning_rate': 6.928393891248529e-06, 'epoch': 0.61} 61%|██████ | 4060/6640 [2:27:48<3:52:16, 5.40s/it] 61%|██████ | 4061/6640 [2:27:53<3:52:35, 5.41s/it] {'loss': 0.5898, 'learning_rate': 6.923751833297751e-06, 'epoch': 0.61} 61%|██████ | 4061/6640 [2:27:53<3:52:35, 5.41s/it] 61%|██████ | 4062/6640 [2:27:59<3:52:30, 5.41s/it] {'loss': 0.5535, 'learning_rate': 6.919110507411515e-06, 'epoch': 0.61} 61%|██████ | 4062/6640 [2:27:59<3:52:30, 5.41s/it] 61%|██████ | 4063/6640 [2:28:04<3:52:57, 5.42s/it] {'loss': 0.5737, 'learning_rate': 6.9144699146943285e-06, 'epoch': 0.61} 61%|██████ | 4063/6640 [2:28:04<3:52:57, 5.42s/it] 61%|██████ | 4064/6640 [2:28:10<3:53:08, 5.43s/it] {'loss': 0.5681, 'learning_rate': 6.909830056250527e-06, 'epoch': 0.61} 61%|██████ | 4064/6640 [2:28:10<3:53:08, 5.43s/it] 61%|██████ | 4065/6640 [2:28:15<3:52:11, 5.41s/it] {'loss': 0.5942, 'learning_rate': 6.905190933184275e-06, 'epoch': 0.61} 61%|██████ | 4065/6640 [2:28:15<3:52:11, 5.41s/it] 61%|██████ | 4066/6640 [2:28:21<3:52:56, 5.43s/it] {'loss': 0.5859, 'learning_rate': 6.900552546599553e-06, 'epoch': 0.61} 61%|██████ | 4066/6640 [2:28:21<3:52:56, 5.43s/it] 61%|██████▏ | 4067/6640 [2:28:26<3:56:53, 5.52s/it] {'loss': 0.5912, 'learning_rate': 
6.895914897600182e-06, 'epoch': 0.61} 61%|██████▏ | 4067/6640 [2:28:26<3:56:53, 5.52s/it] 61%|██████▏ | 4068/6640 [2:28:32<3:56:47, 5.52s/it] {'loss': 0.5827, 'learning_rate': 6.891277987289794e-06, 'epoch': 0.61} 61%|██████▏ | 4068/6640 [2:28:32<3:56:47, 5.52s/it] 61%|██████▏ | 4069/6640 [2:28:37<3:54:47, 5.48s/it] {'loss': 0.5815, 'learning_rate': 6.886641816771848e-06, 'epoch': 0.61} 61%|██████▏ | 4069/6640 [2:28:37<3:54:47, 5.48s/it] 61%|██████▏ | 4070/6640 [2:28:43<3:53:29, 5.45s/it] {'loss': 0.588, 'learning_rate': 6.882006387149625e-06, 'epoch': 0.61} 61%|██████▏ | 4070/6640 [2:28:43<3:53:29, 5.45s/it] 61%|██████▏ | 4071/6640 [2:28:48<3:52:23, 5.43s/it] {'loss': 0.5853, 'learning_rate': 6.877371699526241e-06, 'epoch': 0.61} 61%|██████▏ | 4071/6640 [2:28:48<3:52:23, 5.43s/it] 61%|██████▏ | 4072/6640 [2:28:54<3:52:59, 5.44s/it] {'loss': 0.5614, 'learning_rate': 6.872737755004619e-06, 'epoch': 0.61} 61%|██████▏ | 4072/6640 [2:28:54<3:52:59, 5.44s/it] 61%|██████▏ | 4073/6640 [2:28:59<3:51:41, 5.42s/it] {'loss': 0.5908, 'learning_rate': 6.868104554687523e-06, 'epoch': 0.61} 61%|██████▏ | 4073/6640 [2:28:59<3:51:41, 5.42s/it] 61%|██████▏ | 4074/6640 [2:29:04<3:52:17, 5.43s/it] {'loss': 0.5638, 'learning_rate': 6.863472099677523e-06, 'epoch': 0.61} 61%|██████▏ | 4074/6640 [2:29:04<3:52:17, 5.43s/it] 61%|██████▏ | 4075/6640 [2:29:10<3:52:30, 5.44s/it] {'loss': 0.5743, 'learning_rate': 6.858840391077017e-06, 'epoch': 0.61} 61%|██████▏ | 4075/6640 [2:29:10<3:52:30, 5.44s/it] 61%|██████▏ | 4076/6640 [2:29:15<3:52:15, 5.44s/it] {'loss': 0.573, 'learning_rate': 6.854209429988236e-06, 'epoch': 0.61} 61%|██████▏ | 4076/6640 [2:29:15<3:52:15, 5.44s/it] 61%|██████▏ | 4077/6640 [2:29:21<3:51:38, 5.42s/it] {'loss': 0.5812, 'learning_rate': 6.849579217513218e-06, 'epoch': 0.61} 61%|██████▏ | 4077/6640 [2:29:21<3:51:38, 5.42s/it] 61%|██████▏ | 4078/6640 [2:29:26<3:50:56, 5.41s/it] {'loss': 0.5881, 'learning_rate': 6.844949754753833e-06, 'epoch': 0.61} 61%|██████▏ | 4078/6640 
[2:29:26<3:50:56, 5.41s/it] 61%|██████▏ | 4079/6640 [2:29:31<3:50:52, 5.41s/it] {'loss': 0.6031, 'learning_rate': 6.840321042811764e-06, 'epoch': 0.61} 61%|██████▏ | 4079/6640 [2:29:31<3:50:52, 5.41s/it] 61%|██████▏ | 4080/6640 [2:29:37<3:52:42, 5.45s/it] {'loss': 0.5822, 'learning_rate': 6.8356930827885256e-06, 'epoch': 0.61} 61%|██████▏ | 4080/6640 [2:29:37<3:52:42, 5.45s/it] 61%|██████▏ | 4081/6640 [2:29:42<3:51:38, 5.43s/it] {'loss': 0.5781, 'learning_rate': 6.831065875785446e-06, 'epoch': 0.61} 61%|██████▏ | 4081/6640 [2:29:42<3:51:38, 5.43s/it] 61%|██████▏ | 4082/6640 [2:29:48<3:51:22, 5.43s/it] {'loss': 0.6058, 'learning_rate': 6.826439422903677e-06, 'epoch': 0.61} 61%|██████▏ | 4082/6640 [2:29:48<3:51:22, 5.43s/it] 61%|██████▏ | 4083/6640 [2:29:53<3:51:03, 5.42s/it] {'loss': 0.5861, 'learning_rate': 6.821813725244189e-06, 'epoch': 0.61} 61%|██████▏ | 4083/6640 [2:29:53<3:51:03, 5.42s/it] 62%|██████▏ | 4084/6640 [2:29:59<3:54:32, 5.51s/it] {'loss': 0.6028, 'learning_rate': 6.817188783907771e-06, 'epoch': 0.62} 62%|██████▏ | 4084/6640 [2:29:59<3:54:32, 5.51s/it] 62%|██████▏ | 4085/6640 [2:30:04<3:52:00, 5.45s/it] {'loss': 0.5658, 'learning_rate': 6.812564599995042e-06, 'epoch': 0.62} 62%|██████▏ | 4085/6640 [2:30:04<3:52:00, 5.45s/it] 62%|██████▏ | 4086/6640 [2:30:10<3:52:47, 5.47s/it] {'loss': 0.6006, 'learning_rate': 6.807941174606432e-06, 'epoch': 0.62} 62%|██████▏ | 4086/6640 [2:30:10<3:52:47, 5.47s/it] 62%|██████▏ | 4087/6640 [2:30:15<3:52:41, 5.47s/it] {'loss': 0.5792, 'learning_rate': 6.8033185088421874e-06, 'epoch': 0.62} 62%|██████▏ | 4087/6640 [2:30:15<3:52:41, 5.47s/it] 62%|██████▏ | 4088/6640 [2:30:20<3:50:33, 5.42s/it] {'loss': 0.5905, 'learning_rate': 6.798696603802382e-06, 'epoch': 0.62} 62%|██████▏ | 4088/6640 [2:30:20<3:50:33, 5.42s/it] 62%|██████▏ | 4089/6640 [2:30:26<3:49:27, 5.40s/it] {'loss': 0.5641, 'learning_rate': 6.7940754605869034e-06, 'epoch': 0.62} 62%|██████▏ | 4089/6640 [2:30:26<3:49:27, 5.40s/it] 62%|██████▏ | 4090/6640 
[2:30:31<3:51:39, 5.45s/it] {'loss': 0.5811, 'learning_rate': 6.789455080295464e-06, 'epoch': 0.62} 62%|██████▏ | 4090/6640 [2:30:31<3:51:39, 5.45s/it] 62%|██████▏ | 4091/6640 [2:30:37<3:50:55, 5.44s/it] {'loss': 0.5927, 'learning_rate': 6.7848354640275884e-06, 'epoch': 0.62} 62%|██████▏ | 4091/6640 [2:30:37<3:50:55, 5.44s/it] 62%|██████▏ | 4092/6640 [2:30:42<3:49:22, 5.40s/it] {'loss': 0.5831, 'learning_rate': 6.78021661288262e-06, 'epoch': 0.62} 62%|██████▏ | 4092/6640 [2:30:42<3:49:22, 5.40s/it] 62%|██████▏ | 4093/6640 [2:30:48<3:51:42, 5.46s/it] {'loss': 0.6039, 'learning_rate': 6.775598527959717e-06, 'epoch': 0.62} 62%|██████▏ | 4093/6640 [2:30:48<3:51:42, 5.46s/it] 62%|██████▏ | 4094/6640 [2:30:53<3:49:53, 5.42s/it] {'loss': 0.5773, 'learning_rate': 6.77098121035787e-06, 'epoch': 0.62} 62%|██████▏ | 4094/6640 [2:30:53<3:49:53, 5.42s/it] 62%|██████▏ | 4095/6640 [2:30:59<3:51:30, 5.46s/it] {'loss': 0.6095, 'learning_rate': 6.766364661175872e-06, 'epoch': 0.62} 62%|██████▏ | 4095/6640 [2:30:59<3:51:30, 5.46s/it] 62%|██████▏ | 4096/6640 [2:31:04<3:50:26, 5.43s/it] {'loss': 0.5818, 'learning_rate': 6.7617488815123336e-06, 'epoch': 0.62} 62%|██████▏ | 4096/6640 [2:31:04<3:50:26, 5.43s/it] 62%|██████▏ | 4097/6640 [2:31:09<3:48:49, 5.40s/it] {'loss': 0.5843, 'learning_rate': 6.757133872465691e-06, 'epoch': 0.62} 62%|██████▏ | 4097/6640 [2:31:09<3:48:49, 5.40s/it] 62%|██████▏ | 4098/6640 [2:31:15<3:49:29, 5.42s/it] {'loss': 0.5834, 'learning_rate': 6.7525196351341915e-06, 'epoch': 0.62} 62%|██████▏ | 4098/6640 [2:31:15<3:49:29, 5.42s/it] 62%|██████▏ | 4099/6640 [2:31:20<3:48:26, 5.39s/it] {'loss': 0.594, 'learning_rate': 6.7479061706159034e-06, 'epoch': 0.62} 62%|██████▏ | 4099/6640 [2:31:20<3:48:26, 5.39s/it]6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... 
AutoResumeHook: Checking whether to suspend... 7 62%|██████▏ | 4100/6640 [2:31:26<3:48:54, 5.41s/it]AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5739, 'learning_rate': 6.743293480008703e-06, 'epoch': 0.62} 62%|██████▏ | 4100/6640 [2:31:26<3:48:54, 5.41s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 62%|██████▏ | 4101/6640 [2:31:53<8:28:09, 12.01s/it] {'loss': 0.5836, 'learning_rate': 6.7386815644102875e-06, 'epoch': 0.62} 62%|██████▏ | 4101/6640 [2:31:53<8:28:09, 12.01s/it] 62%|██████▏ | 4102/6640 [2:31:58<7:05:03, 10.05s/it] {'loss': 0.5874, 'learning_rate': 6.734070424918167e-06, 'epoch': 0.62} 62%|██████▏ | 4102/6640 [2:31:58<7:05:03, 10.05s/it] 62%|██████▏ | 4103/6640 [2:32:04<6:04:29, 8.62s/it] {'loss': 0.5838, 'learning_rate': 6.729460062629673e-06, 'epoch': 0.62} 62%|██████▏ | 4103/6640 [2:32:04<6:04:29, 8.62s/it] 62%|██████▏ | 4104/6640 [2:32:09<5:23:46, 7.66s/it] {'loss': 0.6037, 'learning_rate': 6.72485047864195e-06, 'epoch': 0.62} 62%|██████▏ | 4104/6640 [2:32:09<5:23:46, 7.66s/it] 62%|██████▏ | 4105/6640 [2:32:14<4:54:35, 6.97s/it] {'loss': 0.6153, 'learning_rate': 6.720241674051948e-06, 'epoch': 0.62} 62%|██████▏ | 4105/6640 [2:32:14<4:54:35, 6.97s/it] 62%|██████▏ | 4106/6640 [2:32:20<4:34:19, 6.50s/it] {'loss': 0.5763, 'learning_rate': 6.715633649956444e-06, 'epoch': 0.62} 62%|██████▏ | 4106/6640 [2:32:20<4:34:19, 6.50s/it] 62%|██████▏ | 4107/6640 [2:32:25<4:22:25, 6.22s/it] {'loss': 0.5785, 'learning_rate': 6.711026407452023e-06, 'epoch': 0.62} 62%|██████▏ | 4107/6640 [2:32:25<4:22:25, 6.22s/it] 62%|██████▏ | 4108/6640 [2:32:31<4:11:03, 5.95s/it] {'loss': 0.6177, 'learning_rate': 6.706419947635087e-06, 'epoch': 0.62} 62%|██████▏ | 4108/6640 [2:32:31<4:11:03, 5.95s/it] 62%|██████▏ | 4109/6640 [2:32:36<4:05:22, 5.82s/it] {'loss': 0.5804, 'learning_rate': 6.701814271601847e-06, 'epoch': 0.62} 62%|██████▏ | 4109/6640 [2:32:36<4:05:22, 5.82s/it] 62%|██████▏ | 4110/6640 [2:32:42<4:00:30, 5.70s/it] {'loss': 0.579, 'learning_rate': 6.697209380448333e-06, 'epoch': 0.62} 62%|██████▏ | 4110/6640 [2:32:42<4:00:30, 5.70s/it] 62%|██████▏ | 4111/6640 [2:32:47<3:56:08, 5.60s/it] {'loss': 0.5684, 'learning_rate': 6.692605275270379e-06, 'epoch': 0.62} 62%|██████▏ | 4111/6640 [2:32:47<3:56:08, 5.60s/it] 62%|██████▏ | 4112/6640 [2:32:53<3:55:04, 
5.58s/it] {'loss': 0.6101, 'learning_rate': 6.688001957163647e-06, 'epoch': 0.62} 62%|██████▏ | 4112/6640 [2:32:53<3:55:04, 5.58s/it] 62%|██████▏ | 4113/6640 [2:32:58<3:50:24, 5.47s/it] {'loss': 0.584, 'learning_rate': 6.683399427223598e-06, 'epoch': 0.62} 62%|██████▏ | 4113/6640 [2:32:58<3:50:24, 5.47s/it] 62%|██████▏ | 4114/6640 [2:33:03<3:48:48, 5.43s/it] {'loss': 0.576, 'learning_rate': 6.6787976865455136e-06, 'epoch': 0.62} 62%|██████▏ | 4114/6640 [2:33:03<3:48:48, 5.43s/it] 62%|██████▏ | 4115/6640 [2:33:09<3:47:56, 5.42s/it] {'loss': 0.5746, 'learning_rate': 6.674196736224481e-06, 'epoch': 0.62} 62%|██████▏ | 4115/6640 [2:33:09<3:47:56, 5.42s/it] 62%|██████▏ | 4116/6640 [2:33:14<3:47:13, 5.40s/it] {'loss': 0.5943, 'learning_rate': 6.6695965773554085e-06, 'epoch': 0.62} 62%|██████▏ | 4116/6640 [2:33:14<3:47:13, 5.40s/it] 62%|██████▏ | 4117/6640 [2:33:19<3:45:29, 5.36s/it] {'loss': 0.5886, 'learning_rate': 6.664997211033006e-06, 'epoch': 0.62} 62%|██████▏ | 4117/6640 [2:33:19<3:45:29, 5.36s/it] 62%|██████▏ | 4118/6640 [2:33:25<3:48:56, 5.45s/it] {'loss': 0.578, 'learning_rate': 6.660398638351802e-06, 'epoch': 0.62} 62%|██████▏ | 4118/6640 [2:33:25<3:48:56, 5.45s/it] 62%|██████▏ | 4119/6640 [2:33:30<3:47:39, 5.42s/it] {'loss': 0.5962, 'learning_rate': 6.655800860406132e-06, 'epoch': 0.62} 62%|██████▏ | 4119/6640 [2:33:30<3:47:39, 5.42s/it] 62%|██████▏ | 4120/6640 [2:33:36<3:47:53, 5.43s/it] {'loss': 0.5767, 'learning_rate': 6.651203878290139e-06, 'epoch': 0.62} 62%|██████▏ | 4120/6640 [2:33:36<3:47:53, 5.43s/it] 62%|██████▏ | 4121/6640 [2:33:41<3:49:04, 5.46s/it] {'loss': 0.5985, 'learning_rate': 6.646607693097791e-06, 'epoch': 0.62} 62%|██████▏ | 4121/6640 [2:33:41<3:49:04, 5.46s/it] 62%|██████▏ | 4122/6640 [2:33:46<3:47:30, 5.42s/it] {'loss': 0.5892, 'learning_rate': 6.642012305922849e-06, 'epoch': 0.62} 62%|██████▏ | 4122/6640 [2:33:46<3:47:30, 5.42s/it] 62%|██████▏ | 4123/6640 [2:33:52<3:46:48, 5.41s/it] {'loss': 0.5715, 'learning_rate': 
6.637417717858898e-06, 'epoch': 0.62} 62%|██████▏ | 4123/6640 [2:33:52<3:46:48, 5.41s/it] 62%|██████▏ | 4124/6640 [2:33:57<3:46:29, 5.40s/it] {'loss': 0.5782, 'learning_rate': 6.6328239299993194e-06, 'epoch': 0.62} 62%|██████▏ | 4124/6640 [2:33:57<3:46:29, 5.40s/it] 62%|██████▏ | 4125/6640 [2:34:02<3:42:48, 5.32s/it] {'loss': 0.5773, 'learning_rate': 6.628230943437319e-06, 'epoch': 0.62} 62%|██████▏ | 4125/6640 [2:34:02<3:42:48, 5.32s/it] 62%|██████▏ | 4126/6640 [2:34:08<3:45:01, 5.37s/it] {'loss': 0.5921, 'learning_rate': 6.623638759265902e-06, 'epoch': 0.62} 62%|██████▏ | 4126/6640 [2:34:08<3:45:01, 5.37s/it] 62%|██████▏ | 4127/6640 [2:34:13<3:46:51, 5.42s/it] {'loss': 0.5797, 'learning_rate': 6.619047378577883e-06, 'epoch': 0.62} 62%|██████▏ | 4127/6640 [2:34:13<3:46:51, 5.42s/it] 62%|██████▏ | 4128/6640 [2:34:19<3:49:09, 5.47s/it] {'loss': 0.5688, 'learning_rate': 6.61445680246589e-06, 'epoch': 0.62} 62%|██████▏ | 4128/6640 [2:34:19<3:49:09, 5.47s/it] 62%|██████▏ | 4129/6640 [2:34:25<3:52:32, 5.56s/it] {'loss': 0.61, 'learning_rate': 6.6098670320223505e-06, 'epoch': 0.62} 62%|██████▏ | 4129/6640 [2:34:25<3:52:32, 5.56s/it] 62%|██████▏ | 4130/6640 [2:34:30<3:52:11, 5.55s/it] {'loss': 0.5748, 'learning_rate': 6.605278068339516e-06, 'epoch': 0.62} 62%|██████▏ | 4130/6640 [2:34:30<3:52:11, 5.55s/it] 62%|██████▏ | 4131/6640 [2:34:36<3:49:37, 5.49s/it] {'loss': 0.5932, 'learning_rate': 6.600689912509429e-06, 'epoch': 0.62} 62%|██████▏ | 4131/6640 [2:34:36<3:49:37, 5.49s/it] 62%|██████▏ | 4132/6640 [2:34:41<3:48:52, 5.48s/it] {'loss': 0.5735, 'learning_rate': 6.596102565623954e-06, 'epoch': 0.62} 62%|██████▏ | 4132/6640 [2:34:41<3:48:52, 5.48s/it] 62%|██████▏ | 4133/6640 [2:34:47<3:49:36, 5.50s/it] {'loss': 0.6329, 'learning_rate': 6.59151602877475e-06, 'epoch': 0.62} 62%|██████▏ | 4133/6640 [2:34:47<3:49:36, 5.50s/it] 62%|██████▏ | 4134/6640 [2:34:52<3:48:49, 5.48s/it] {'loss': 0.5875, 'learning_rate': 6.586930303053297e-06, 'epoch': 0.62} 62%|██████▏ | 4134/6640 
[2:34:52<3:48:49, 5.48s/it] 62%|██████▏ | 4135/6640 [2:34:58<3:50:20, 5.52s/it] {'loss': 0.5707, 'learning_rate': 6.58234538955087e-06, 'epoch': 0.62} 62%|██████▏ | 4135/6640 [2:34:58<3:50:20, 5.52s/it] 62%|██████▏ | 4136/6640 [2:35:03<3:47:22, 5.45s/it] {'loss': 0.585, 'learning_rate': 6.577761289358559e-06, 'epoch': 0.62} 62%|██████▏ | 4136/6640 [2:35:03<3:47:22, 5.45s/it] 62%|██████▏ | 4137/6640 [2:35:08<3:45:22, 5.40s/it] {'loss': 0.586, 'learning_rate': 6.573178003567249e-06, 'epoch': 0.62} 62%|██████▏ | 4137/6640 [2:35:08<3:45:22, 5.40s/it] 62%|██████▏ | 4138/6640 [2:35:14<3:47:14, 5.45s/it] {'loss': 0.5846, 'learning_rate': 6.568595533267645e-06, 'epoch': 0.62} 62%|██████▏ | 4138/6640 [2:35:14<3:47:14, 5.45s/it] 62%|██████▏ | 4139/6640 [2:35:19<3:46:08, 5.43s/it] {'loss': 0.5985, 'learning_rate': 6.564013879550253e-06, 'epoch': 0.62} 62%|██████▏ | 4139/6640 [2:35:19<3:46:08, 5.43s/it] 62%|██████▏ | 4140/6640 [2:35:24<3:44:55, 5.40s/it] {'loss': 0.5981, 'learning_rate': 6.559433043505383e-06, 'epoch': 0.62} 62%|██████▏ | 4140/6640 [2:35:24<3:44:55, 5.40s/it] 62%|██████▏ | 4141/6640 [2:35:30<3:45:31, 5.41s/it] {'loss': 0.5845, 'learning_rate': 6.554853026223149e-06, 'epoch': 0.62} 62%|██████▏ | 4141/6640 [2:35:30<3:45:31, 5.41s/it] 62%|██████▏ | 4142/6640 [2:35:35<3:44:30, 5.39s/it] {'loss': 0.5621, 'learning_rate': 6.55027382879347e-06, 'epoch': 0.62} 62%|██████▏ | 4142/6640 [2:35:35<3:44:30, 5.39s/it] 62%|██████▏ | 4143/6640 [2:35:41<3:46:59, 5.45s/it] {'loss': 0.5749, 'learning_rate': 6.54569545230608e-06, 'epoch': 0.62} 62%|██████▏ | 4143/6640 [2:35:41<3:46:59, 5.45s/it] 62%|██████▏ | 4144/6640 [2:35:46<3:46:29, 5.44s/it] {'loss': 0.5928, 'learning_rate': 6.541117897850504e-06, 'epoch': 0.62} 62%|██████▏ | 4144/6640 [2:35:46<3:46:29, 5.44s/it] 62%|██████▏ | 4145/6640 [2:35:52<3:44:51, 5.41s/it] {'loss': 0.5765, 'learning_rate': 6.536541166516079e-06, 'epoch': 0.62} 62%|██████▏ | 4145/6640 [2:35:52<3:44:51, 5.41s/it] 62%|██████▏ | 4146/6640 
[2:35:57<3:45:34, 5.43s/it] {'loss': 0.5869, 'learning_rate': 6.531965259391942e-06, 'epoch': 0.62} 62%|██████▏ | 4146/6640 [2:35:57<3:45:34, 5.43s/it] 62%|██████▏ | 4147/6640 [2:36:03<3:47:10, 5.47s/it] {'loss': 0.6222, 'learning_rate': 6.5273901775670375e-06, 'epoch': 0.62} 62%|██████▏ | 4147/6640 [2:36:03<3:47:10, 5.47s/it] 62%|██████▏ | 4148/6640 [2:36:08<3:43:21, 5.38s/it] {'loss': 0.5923, 'learning_rate': 6.522815922130112e-06, 'epoch': 0.62} 62%|██████▏ | 4148/6640 [2:36:08<3:43:21, 5.38s/it] 62%|██████▏ | 4149/6640 [2:36:13<3:43:10, 5.38s/it] {'loss': 0.6037, 'learning_rate': 6.51824249416972e-06, 'epoch': 0.62} 62%|██████▏ | 4149/6640 [2:36:13<3:43:10, 5.38s/it]176 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 62%|██████▎ | 4150/6640 [2:36:19<3:45:26, 5.43s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5815, 'learning_rate': 6.513669894774209e-06, 'epoch': 0.62} 62%|██████▎ | 4150/6640 [2:36:19<3:45:26, 5.43s/it] 63%|██████▎ | 4151/6640 [2:36:24<3:48:01, 5.50s/it] {'loss': 0.5926, 'learning_rate': 6.509098125031732e-06, 'epoch': 0.63} 63%|██████▎ | 4151/6640 [2:36:24<3:48:01, 5.50s/it] 63%|██████▎ | 4152/6640 [2:36:30<3:47:32, 5.49s/it] {'loss': 0.5755, 'learning_rate': 6.504527186030258e-06, 'epoch': 0.63} 63%|██████▎ | 4152/6640 [2:36:30<3:47:32, 5.49s/it] 63%|██████▎ | 4153/6640 [2:36:35<3:45:44, 5.45s/it] {'loss': 0.587, 'learning_rate': 6.499957078857542e-06, 'epoch': 0.63} 63%|██████▎ | 4153/6640 [2:36:35<3:45:44, 5.45s/it] 63%|██████▎ | 4154/6640 [2:36:41<3:44:57, 5.43s/it] {'loss': 0.588, 'learning_rate': 6.4953878046011434e-06, 'epoch': 0.63} 63%|██████▎ | 4154/6640 [2:36:41<3:44:57, 5.43s/it] 63%|██████▎ | 4155/6640 [2:36:46<3:43:55, 5.41s/it] {'loss': 0.564, 'learning_rate': 6.490819364348434e-06, 'epoch': 0.63} 63%|██████▎ | 4155/6640 [2:36:46<3:43:55, 5.41s/it] 63%|██████▎ | 4156/6640 [2:36:51<3:41:36, 5.35s/it] {'loss': 0.5788, 'learning_rate': 6.486251759186573e-06, 'epoch': 0.63} 63%|██████▎ | 4156/6640 [2:36:51<3:41:36, 5.35s/it] 63%|██████▎ | 4157/6640 [2:36:57<3:43:43, 5.41s/it] {'loss': 0.584, 'learning_rate': 6.481684990202535e-06, 'epoch': 0.63} 63%|██████▎ | 4157/6640 [2:36:57<3:43:43, 5.41s/it] 63%|██████▎ | 4158/6640 [2:37:02<3:42:45, 5.38s/it] {'loss': 0.5793, 'learning_rate': 6.477119058483084e-06, 'epoch': 0.63} 63%|██████▎ | 4158/6640 [2:37:02<3:42:45, 5.38s/it] 63%|██████▎ | 4159/6640 [2:37:07<3:42:44, 5.39s/it] {'loss': 0.5906, 'learning_rate': 6.472553965114791e-06, 'epoch': 0.63} 63%|██████▎ | 4159/6640 [2:37:07<3:42:44, 5.39s/it] 63%|██████▎ | 4160/6640 [2:37:13<3:44:55, 5.44s/it] {'loss': 0.5909, 'learning_rate': 6.467989711184021e-06, 'epoch': 0.63} 63%|██████▎ | 4160/6640 [2:37:13<3:44:55, 5.44s/it] 63%|██████▎ | 4161/6640 [2:37:19<3:45:26, 5.46s/it] {'loss': 0.6174, 'learning_rate': 6.463426297776951e-06, 'epoch': 
0.63} 63%|██████▎ | 4161/6640 [2:37:19<3:45:26, 5.46s/it] 63%|██████▎ | 4162/6640 [2:37:24<3:43:34, 5.41s/it] {'loss': 0.597, 'learning_rate': 6.458863725979549e-06, 'epoch': 0.63} 63%|██████▎ | 4162/6640 [2:37:24<3:43:34, 5.41s/it] 63%|██████▎ | 4163/6640 [2:37:29<3:41:27, 5.36s/it] {'loss': 0.5848, 'learning_rate': 6.454301996877582e-06, 'epoch': 0.63} 63%|██████▎ | 4163/6640 [2:37:29<3:41:27, 5.36s/it] 63%|██████▎ | 4164/6640 [2:37:35<3:42:19, 5.39s/it] {'loss': 0.5808, 'learning_rate': 6.449741111556622e-06, 'epoch': 0.63} 63%|██████▎ | 4164/6640 [2:37:35<3:42:19, 5.39s/it] 63%|██████▎ | 4165/6640 [2:37:40<3:42:40, 5.40s/it] {'loss': 0.5658, 'learning_rate': 6.445181071102034e-06, 'epoch': 0.63} 63%|██████▎ | 4165/6640 [2:37:40<3:42:40, 5.40s/it] 63%|██████▎ | 4166/6640 [2:37:46<3:45:26, 5.47s/it] {'loss': 0.5846, 'learning_rate': 6.4406218765989895e-06, 'epoch': 0.63} 63%|██████▎ | 4166/6640 [2:37:46<3:45:26, 5.47s/it] 63%|██████▎ | 4167/6640 [2:37:51<3:43:33, 5.42s/it] {'loss': 0.5692, 'learning_rate': 6.4360635291324525e-06, 'epoch': 0.63} 63%|██████▎ | 4167/6640 [2:37:51<3:43:33, 5.42s/it] 63%|██████▎ | 4168/6640 [2:37:56<3:41:25, 5.37s/it] {'loss': 0.5769, 'learning_rate': 6.431506029787189e-06, 'epoch': 0.63} 63%|██████▎ | 4168/6640 [2:37:56<3:41:25, 5.37s/it] 63%|██████▎ | 4169/6640 [2:38:02<3:42:05, 5.39s/it] {'loss': 0.589, 'learning_rate': 6.426949379647755e-06, 'epoch': 0.63} 63%|██████▎ | 4169/6640 [2:38:02<3:42:05, 5.39s/it] 63%|██████▎ | 4170/6640 [2:38:07<3:42:30, 5.41s/it] {'loss': 0.6127, 'learning_rate': 6.422393579798519e-06, 'epoch': 0.63} 63%|██████▎ | 4170/6640 [2:38:07<3:42:30, 5.41s/it] 63%|██████▎ | 4171/6640 [2:38:13<3:43:48, 5.44s/it] {'loss': 0.5656, 'learning_rate': 6.417838631323636e-06, 'epoch': 0.63} 63%|██████▎ | 4171/6640 [2:38:13<3:43:48, 5.44s/it] 63%|██████▎ | 4172/6640 [2:38:18<3:47:55, 5.54s/it] {'loss': 0.5876, 'learning_rate': 6.413284535307062e-06, 'epoch': 0.63} 63%|██████▎ | 4172/6640 [2:38:18<3:47:55, 5.54s/it] 
63%|██████▎ | 4173/6640 [2:38:24<3:48:36, 5.56s/it] {'loss': 0.5957, 'learning_rate': 6.408731292832549e-06, 'epoch': 0.63} 63%|██████▎ | 4173/6640 [2:38:24<3:48:36, 5.56s/it] 63%|██████▎ | 4174/6640 [2:38:29<3:48:02, 5.55s/it] {'loss': 0.5924, 'learning_rate': 6.404178904983644e-06, 'epoch': 0.63} 63%|██████▎ | 4174/6640 [2:38:29<3:48:02, 5.55s/it] 63%|██████▎ | 4175/6640 [2:38:35<3:47:22, 5.53s/it] {'loss': 0.6065, 'learning_rate': 6.399627372843699e-06, 'epoch': 0.63} 63%|██████▎ | 4175/6640 [2:38:35<3:47:22, 5.53s/it] 63%|██████▎ | 4176/6640 [2:38:40<3:43:29, 5.44s/it] {'loss': 0.5837, 'learning_rate': 6.395076697495854e-06, 'epoch': 0.63} 63%|██████▎ | 4176/6640 [2:38:40<3:43:29, 5.44s/it] 63%|██████▎ | 4177/6640 [2:38:46<3:43:06, 5.43s/it] {'loss': 0.5758, 'learning_rate': 6.390526880023049e-06, 'epoch': 0.63} 63%|██████▎ | 4177/6640 [2:38:46<3:43:06, 5.43s/it] 63%|██████▎ | 4178/6640 [2:38:51<3:42:19, 5.42s/it] {'loss': 0.598, 'learning_rate': 6.385977921508011e-06, 'epoch': 0.63} 63%|██████▎ | 4178/6640 [2:38:51<3:42:19, 5.42s/it] 63%|██████▎ | 4179/6640 [2:38:56<3:41:13, 5.39s/it] {'loss': 0.578, 'learning_rate': 6.381429823033281e-06, 'epoch': 0.63} 63%|██████▎ | 4179/6640 [2:38:56<3:41:13, 5.39s/it] 63%|██████▎ | 4180/6640 [2:39:02<3:41:22, 5.40s/it] {'loss': 0.59, 'learning_rate': 6.376882585681174e-06, 'epoch': 0.63} 63%|██████▎ | 4180/6640 [2:39:02<3:41:22, 5.40s/it] 63%|██████▎ | 4181/6640 [2:39:07<3:38:40, 5.34s/it] {'loss': 0.5638, 'learning_rate': 6.372336210533821e-06, 'epoch': 0.63} 63%|██████▎ | 4181/6640 [2:39:07<3:38:40, 5.34s/it] 63%|██████▎ | 4182/6640 [2:39:13<3:43:59, 5.47s/it] {'loss': 0.5903, 'learning_rate': 6.367790698673132e-06, 'epoch': 0.63} 63%|██████▎ | 4182/6640 [2:39:13<3:43:59, 5.47s/it] 63%|██████▎ | 4183/6640 [2:39:18<3:44:20, 5.48s/it] {'loss': 0.6138, 'learning_rate': 6.363246051180812e-06, 'epoch': 0.63} 63%|██████▎ | 4183/6640 [2:39:18<3:44:20, 5.48s/it] 63%|██████▎ | 4184/6640 [2:39:23<3:40:49, 5.39s/it] {'loss': 
0.5847, 'learning_rate': 6.358702269138375e-06, 'epoch': 0.63} 63%|██████▎ | 4184/6640 [2:39:23<3:40:49, 5.39s/it] 63%|██████▎ | 4185/6640 [2:39:29<3:40:39, 5.39s/it] {'loss': 0.577, 'learning_rate': 6.354159353627114e-06, 'epoch': 0.63} 63%|██████▎ | 4185/6640 [2:39:29<3:40:39, 5.39s/it] 63%|██████▎ | 4186/6640 [2:39:34<3:40:21, 5.39s/it] {'loss': 0.5933, 'learning_rate': 6.349617305728121e-06, 'epoch': 0.63} 63%|██████▎ | 4186/6640 [2:39:34<3:40:21, 5.39s/it] 63%|██████▎ | 4187/6640 [2:39:40<3:40:50, 5.40s/it] {'loss': 0.5775, 'learning_rate': 6.3450761265222784e-06, 'epoch': 0.63} 63%|██████▎ | 4187/6640 [2:39:40<3:40:50, 5.40s/it] 63%|██████▎ | 4188/6640 [2:39:45<3:41:12, 5.41s/it] {'loss': 0.5835, 'learning_rate': 6.340535817090272e-06, 'epoch': 0.63} 63%|██████▎ | 4188/6640 [2:39:45<3:41:12, 5.41s/it] 63%|██████▎ | 4189/6640 [2:39:50<3:39:21, 5.37s/it] {'loss': 0.5784, 'learning_rate': 6.335996378512568e-06, 'epoch': 0.63} 63%|██████▎ | 4189/6640 [2:39:50<3:39:21, 5.37s/it] 63%|██████▎ | 4190/6640 [2:39:56<3:39:37, 5.38s/it] {'loss': 0.5712, 'learning_rate': 6.331457811869437e-06, 'epoch': 0.63} 63%|██████▎ | 4190/6640 [2:39:56<3:39:37, 5.38s/it] 63%|██████▎ | 4191/6640 [2:40:01<3:40:14, 5.40s/it] {'loss': 0.5806, 'learning_rate': 6.326920118240932e-06, 'epoch': 0.63} 63%|██████▎ | 4191/6640 [2:40:01<3:40:14, 5.40s/it] 63%|██████▎ | 4192/6640 [2:40:07<3:44:24, 5.50s/it] {'loss': 0.6161, 'learning_rate': 6.3223832987068964e-06, 'epoch': 0.63} 63%|██████▎ | 4192/6640 [2:40:07<3:44:24, 5.50s/it] 63%|██████▎ | 4193/6640 [2:40:12<3:42:37, 5.46s/it] {'loss': 0.5725, 'learning_rate': 6.317847354346984e-06, 'epoch': 0.63} 63%|██████▎ | 4193/6640 [2:40:12<3:42:37, 5.46s/it] 63%|██████▎ | 4194/6640 [2:40:18<3:41:00, 5.42s/it] {'loss': 0.5637, 'learning_rate': 6.31331228624062e-06, 'epoch': 0.63} 63%|██████▎ | 4194/6640 [2:40:18<3:41:00, 5.42s/it] 63%|██████▎ | 4195/6640 [2:40:23<3:37:22, 5.33s/it] {'loss': 0.5632, 'learning_rate': 6.3087780954670306e-06, 'epoch': 0.63} 
63%|██████▎ | 4195/6640 [2:40:23<3:37:22, 5.33s/it] 63%|██████▎ | 4196/6640 [2:40:28<3:37:39, 5.34s/it] {'loss': 0.5875, 'learning_rate': 6.304244783105231e-06, 'epoch': 0.63} 63%|██████▎ | 4196/6640 [2:40:28<3:37:39, 5.34s/it] 63%|██████▎ | 4197/6640 [2:40:33<3:37:50, 5.35s/it] {'loss': 0.5629, 'learning_rate': 6.2997123502340286e-06, 'epoch': 0.63} 63%|██████▎ | 4197/6640 [2:40:33<3:37:50, 5.35s/it] 63%|██████▎ | 4198/6640 [2:40:39<3:39:29, 5.39s/it] {'loss': 0.5789, 'learning_rate': 6.2951807979320225e-06, 'epoch': 0.63} 63%|██████▎ | 4198/6640 [2:40:39<3:39:29, 5.39s/it] 63%|██████▎ | 4199/6640 [2:40:44<3:40:07, 5.41s/it] {'loss': 0.5772, 'learning_rate': 6.290650127277599e-06, 'epoch': 0.63} 63%|██████▎ | 4199/6640 [2:40:44<3:40:07, 5.41s/it]1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...3 AutoResumeHook: Checking whether to suspend... 63%|██████▎ | 4200/6640 [2:40:50<3:40:49, 5.43s/it]2 AutoResumeHook: Checking whether to suspend... {'loss': 0.5708, 'learning_rate': 6.286120339348935e-06, 'epoch': 0.63} 63%|██████▎ | 4200/6640 [2:40:50<3:40:49, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 63%|██████▎ | 4201/6640 [2:41:17<7:59:49, 11.80s/it] {'loss': 0.6137, 'learning_rate': 6.281591435223997e-06, 'epoch': 0.63} 63%|██████▎ | 4201/6640 [2:41:17<7:59:49, 11.80s/it] 63%|██████▎ | 4202/6640 [2:41:22<6:42:29, 9.91s/it] {'loss': 0.5875, 'learning_rate': 6.277063415980549e-06, 'epoch': 0.63} 63%|██████▎ | 4202/6640 [2:41:22<6:42:29, 9.91s/it] 63%|██████▎ | 4203/6640 [2:41:27<5:46:56, 8.54s/it] {'loss': 0.5668, 'learning_rate': 6.2725362826961355e-06, 'epoch': 0.63} 63%|██████▎ | 4203/6640 [2:41:27<5:46:56, 8.54s/it] 63%|██████▎ | 4204/6640 [2:41:33<5:12:04, 7.69s/it] {'loss': 0.5641, 'learning_rate': 6.2680100364480876e-06, 'epoch': 0.63} 63%|██████▎ | 4204/6640 [2:41:33<5:12:04, 7.69s/it] 63%|██████▎ | 4205/6640 [2:41:38<4:43:56, 7.00s/it] {'loss': 0.5854, 'learning_rate': 6.263484678313536e-06, 'epoch': 0.63} 63%|██████▎ | 4205/6640 [2:41:38<4:43:56, 7.00s/it] 63%|██████▎ | 4206/6640 [2:41:44<4:26:18, 6.56s/it] {'loss': 0.5736, 'learning_rate': 6.258960209369392e-06, 'epoch': 0.63} 63%|██████▎ | 4206/6640 [2:41:44<4:26:18, 6.56s/it] 63%|██████▎ | 4207/6640 [2:41:49<4:11:06, 6.19s/it] {'loss': 0.5846, 'learning_rate': 6.254436630692361e-06, 'epoch': 0.63} 63%|██████▎ | 4207/6640 [2:41:49<4:11:06, 6.19s/it] 63%|██████▎ | 4208/6640 [2:41:55<4:09:36, 6.16s/it] {'loss': 0.559, 'learning_rate': 6.2499139433589314e-06, 'epoch': 0.63} 63%|██████▎ | 4208/6640 [2:41:55<4:09:36, 6.16s/it] 63%|██████▎ | 4209/6640 [2:42:01<4:02:06, 5.98s/it] {'loss': 0.6119, 'learning_rate': 6.245392148445382e-06, 'epoch': 0.63} 63%|██████▎ | 4209/6640 [2:42:01<4:02:06, 5.98s/it] 63%|██████▎ | 4210/6640 [2:42:06<3:53:21, 5.76s/it] {'loss': 0.5839, 'learning_rate': 6.240871247027774e-06, 'epoch': 0.63} 63%|██████▎ | 4210/6640 [2:42:06<3:53:21, 5.76s/it] 63%|██████▎ | 4211/6640 [2:42:12<3:49:20, 5.67s/it] {'loss': 0.5837, 'learning_rate': 6.236351240181967e-06, 
'epoch': 0.63} 63%|██████▎ | 4211/6640 [2:42:12<3:49:20, 5.67s/it] 63%|██████▎ | 4212/6640 [2:42:17<3:50:16, 5.69s/it] {'loss': 0.5782, 'learning_rate': 6.231832128983599e-06, 'epoch': 0.63} 63%|██████▎ | 4212/6640 [2:42:17<3:50:16, 5.69s/it] 63%|██████▎ | 4213/6640 [2:42:23<3:52:16, 5.74s/it] {'loss': 0.6046, 'learning_rate': 6.227313914508098e-06, 'epoch': 0.63} 63%|██████▎ | 4213/6640 [2:42:23<3:52:16, 5.74s/it] 63%|██████▎ | 4214/6640 [2:42:29<3:50:07, 5.69s/it] {'loss': 0.5599, 'learning_rate': 6.222796597830674e-06, 'epoch': 0.63} 63%|██████▎ | 4214/6640 [2:42:29<3:50:07, 5.69s/it] 63%|██████▎ | 4215/6640 [2:42:34<3:46:01, 5.59s/it] {'loss': 0.5872, 'learning_rate': 6.2182801800263325e-06, 'epoch': 0.63} 63%|██████▎ | 4215/6640 [2:42:34<3:46:01, 5.59s/it] 63%|██████▎ | 4216/6640 [2:42:39<3:40:54, 5.47s/it] {'loss': 0.5945, 'learning_rate': 6.2137646621698585e-06, 'epoch': 0.63} 63%|██████▎ | 4216/6640 [2:42:39<3:40:54, 5.47s/it] 64%|██████▎ | 4217/6640 [2:42:45<3:39:08, 5.43s/it] {'loss': 0.5831, 'learning_rate': 6.209250045335824e-06, 'epoch': 0.64} 64%|██████▎ | 4217/6640 [2:42:45<3:39:08, 5.43s/it] 64%|██████▎ | 4218/6640 [2:42:50<3:39:17, 5.43s/it] {'loss': 0.5691, 'learning_rate': 6.204736330598585e-06, 'epoch': 0.64} 64%|██████▎ | 4218/6640 [2:42:50<3:39:17, 5.43s/it] 64%|██████▎ | 4219/6640 [2:42:55<3:37:12, 5.38s/it] {'loss': 0.5497, 'learning_rate': 6.200223519032283e-06, 'epoch': 0.64} 64%|██████▎ | 4219/6640 [2:42:55<3:37:12, 5.38s/it] 64%|██████▎ | 4220/6640 [2:43:01<3:38:23, 5.41s/it] {'loss': 0.5935, 'learning_rate': 6.195711611710851e-06, 'epoch': 0.64} 64%|██████▎ | 4220/6640 [2:43:01<3:38:23, 5.41s/it] 64%|██████▎ | 4221/6640 [2:43:06<3:39:13, 5.44s/it] {'loss': 0.5687, 'learning_rate': 6.191200609707997e-06, 'epoch': 0.64} 64%|██████▎ | 4221/6640 [2:43:06<3:39:13, 5.44s/it] 64%|██████▎ | 4222/6640 [2:43:12<3:39:26, 5.45s/it] {'loss': 0.5802, 'learning_rate': 6.186690514097223e-06, 'epoch': 0.64} 64%|██████▎ | 4222/6640 [2:43:12<3:39:26, 
5.45s/it] 64%|██████▎ | 4223/6640 [2:43:17<3:37:52, 5.41s/it] {'loss': 0.5824, 'learning_rate': 6.182181325951806e-06, 'epoch': 0.64} 64%|██████▎ | 4223/6640 [2:43:17<3:37:52, 5.41s/it] 64%|██████▎ | 4224/6640 [2:43:23<3:37:54, 5.41s/it] {'loss': 0.5989, 'learning_rate': 6.177673046344816e-06, 'epoch': 0.64} 64%|██████▎ | 4224/6640 [2:43:23<3:37:54, 5.41s/it] 64%|██████▎ | 4225/6640 [2:43:28<3:36:39, 5.38s/it] {'loss': 0.5808, 'learning_rate': 6.173165676349103e-06, 'epoch': 0.64} 64%|██████▎ | 4225/6640 [2:43:28<3:36:39, 5.38s/it] 64%|██████▎ | 4226/6640 [2:43:33<3:37:25, 5.40s/it] {'loss': 0.5721, 'learning_rate': 6.168659217037298e-06, 'epoch': 0.64} 64%|██████▎ | 4226/6640 [2:43:33<3:37:25, 5.40s/it] 64%|██████▎ | 4227/6640 [2:43:39<3:36:22, 5.38s/it] {'loss': 0.5844, 'learning_rate': 6.164153669481818e-06, 'epoch': 0.64} 64%|██████▎ | 4227/6640 [2:43:39<3:36:22, 5.38s/it] 64%|██████▎ | 4228/6640 [2:43:44<3:35:01, 5.35s/it] {'loss': 0.5841, 'learning_rate': 6.159649034754858e-06, 'epoch': 0.64} 64%|██████▎ | 4228/6640 [2:43:44<3:35:01, 5.35s/it] 64%|██████▎ | 4229/6640 [2:43:50<3:40:24, 5.48s/it] {'loss': 0.5699, 'learning_rate': 6.155145313928407e-06, 'epoch': 0.64} 64%|██████▎ | 4229/6640 [2:43:50<3:40:24, 5.48s/it] 64%|██████▎ | 4230/6640 [2:43:55<3:40:22, 5.49s/it] {'loss': 0.6069, 'learning_rate': 6.150642508074225e-06, 'epoch': 0.64} 64%|██████▎ | 4230/6640 [2:43:55<3:40:22, 5.49s/it] 64%|██████▎ | 4231/6640 [2:44:01<3:41:43, 5.52s/it] {'loss': 0.5618, 'learning_rate': 6.1461406182638635e-06, 'epoch': 0.64} 64%|██████▎ | 4231/6640 [2:44:01<3:41:43, 5.52s/it] 64%|██████▎ | 4232/6640 [2:44:06<3:39:47, 5.48s/it] {'loss': 0.5981, 'learning_rate': 6.141639645568646e-06, 'epoch': 0.64} 64%|██████▎ | 4232/6640 [2:44:06<3:39:47, 5.48s/it] 64%|██████▍ | 4233/6640 [2:44:12<3:37:43, 5.43s/it] {'loss': 0.5851, 'learning_rate': 6.137139591059689e-06, 'epoch': 0.64} 64%|██████▍ | 4233/6640 [2:44:12<3:37:43, 5.43s/it] 64%|██████▍ | 4234/6640 [2:44:17<3:35:42, 5.38s/it] 
{'loss': 0.5744, 'learning_rate': 6.1326404558078825e-06, 'epoch': 0.64} 64%|██████▍ | 4234/6640 [2:44:17<3:35:42, 5.38s/it] 64%|██████▍ | 4235/6640 [2:44:22<3:36:23, 5.40s/it] {'loss': 0.563, 'learning_rate': 6.128142240883899e-06, 'epoch': 0.64} 64%|██████▍ | 4235/6640 [2:44:22<3:36:23, 5.40s/it] 64%|██████▍ | 4236/6640 [2:44:28<3:34:39, 5.36s/it] {'loss': 0.5971, 'learning_rate': 6.123644947358191e-06, 'epoch': 0.64} 64%|██████▍ | 4236/6640 [2:44:28<3:34:39, 5.36s/it] 64%|██████▍ | 4237/6640 [2:44:33<3:34:34, 5.36s/it] {'loss': 0.5789, 'learning_rate': 6.119148576300997e-06, 'epoch': 0.64} 64%|██████▍ | 4237/6640 [2:44:33<3:34:34, 5.36s/it] 64%|██████▍ | 4238/6640 [2:44:39<3:42:40, 5.56s/it] {'loss': 0.5692, 'learning_rate': 6.114653128782333e-06, 'epoch': 0.64} 64%|██████▍ | 4238/6640 [2:44:39<3:42:40, 5.56s/it] 64%|██████▍ | 4239/6640 [2:44:45<3:42:44, 5.57s/it] {'loss': 0.5959, 'learning_rate': 6.110158605871994e-06, 'epoch': 0.64} 64%|██████▍ | 4239/6640 [2:44:45<3:42:44, 5.57s/it] 64%|██████▍ | 4240/6640 [2:44:50<3:38:40, 5.47s/it] {'loss': 0.5769, 'learning_rate': 6.105665008639557e-06, 'epoch': 0.64} 64%|██████▍ | 4240/6640 [2:44:50<3:38:40, 5.47s/it] 64%|██████▍ | 4241/6640 [2:44:55<3:35:36, 5.39s/it] {'loss': 0.579, 'learning_rate': 6.101172338154372e-06, 'epoch': 0.64} 64%|██████▍ | 4241/6640 [2:44:55<3:35:36, 5.39s/it] 64%|██████▍ | 4242/6640 [2:45:00<3:35:50, 5.40s/it] {'loss': 0.5696, 'learning_rate': 6.096680595485582e-06, 'epoch': 0.64} 64%|██████▍ | 4242/6640 [2:45:00<3:35:50, 5.40s/it] 64%|██████▍ | 4243/6640 [2:45:06<3:34:17, 5.36s/it] {'loss': 0.5666, 'learning_rate': 6.092189781702098e-06, 'epoch': 0.64} 64%|██████▍ | 4243/6640 [2:45:06<3:34:17, 5.36s/it] 64%|██████▍ | 4244/6640 [2:45:11<3:34:06, 5.36s/it] {'loss': 0.5627, 'learning_rate': 6.087699897872612e-06, 'epoch': 0.64} 64%|██████▍ | 4244/6640 [2:45:11<3:34:06, 5.36s/it] 64%|██████▍ | 4245/6640 [2:45:16<3:34:04, 5.36s/it] {'loss': 0.5958, 'learning_rate': 6.083210945065595e-06, 
'epoch': 0.64} 64%|██████▍ | 4245/6640 [2:45:16<3:34:04, 5.36s/it] 64%|██████▍ | 4246/6640 [2:45:22<3:36:18, 5.42s/it] {'loss': 0.5716, 'learning_rate': 6.0787229243493e-06, 'epoch': 0.64} 64%|██████▍ | 4246/6640 [2:45:22<3:36:18, 5.42s/it] 64%|██████▍ | 4247/6640 [2:45:27<3:34:38, 5.38s/it] {'loss': 0.5727, 'learning_rate': 6.074235836791753e-06, 'epoch': 0.64} 64%|██████▍ | 4247/6640 [2:45:27<3:34:38, 5.38s/it] 64%|██████▍ | 4248/6640 [2:45:33<3:34:05, 5.37s/it] {'loss': 0.5608, 'learning_rate': 6.069749683460765e-06, 'epoch': 0.64} 64%|██████▍ | 4248/6640 [2:45:33<3:34:05, 5.37s/it] 64%|██████▍ | 4249/6640 [2:45:38<3:33:25, 5.36s/it] {'loss': 0.5954, 'learning_rate': 6.065264465423917e-06, 'epoch': 0.64} 64%|██████▍ | 4249/6640 [2:45:38<3:33:25, 5.36s/it]7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 64%|██████▍ | 4250/6640 [2:45:43<3:32:35, 5.34s/it]1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.588, 'learning_rate': 6.0607801837485665e-06, 'epoch': 0.64} 64%|██████▍ | 4250/6640 [2:45:43<3:32:35, 5.34s/it] 64%|██████▍ | 4251/6640 [2:45:49<3:34:16, 5.38s/it] {'loss': 0.578, 'learning_rate': 6.05629683950186e-06, 'epoch': 0.64} 64%|██████▍ | 4251/6640 [2:45:49<3:34:16, 5.38s/it] 64%|██████▍ | 4252/6640 [2:45:54<3:34:46, 5.40s/it] {'loss': 0.5844, 'learning_rate': 6.051814433750711e-06, 'epoch': 0.64} 64%|██████▍ | 4252/6640 [2:45:54<3:34:46, 5.40s/it] 64%|██████▍ | 4253/6640 [2:45:59<3:32:21, 5.34s/it] {'loss': 0.6046, 'learning_rate': 6.047332967561809e-06, 'epoch': 0.64} 64%|██████▍ | 4253/6640 [2:45:59<3:32:21, 5.34s/it] 64%|██████▍ | 4254/6640 [2:46:05<3:37:15, 5.46s/it] {'loss': 0.6, 'learning_rate': 6.042852442001624e-06, 'epoch': 0.64} 64%|██████▍ | 4254/6640 [2:46:05<3:37:15, 5.46s/it] 64%|██████▍ | 4255/6640 [2:46:11<3:37:49, 5.48s/it] {'loss': 0.5834, 'learning_rate': 6.038372858136401e-06, 'epoch': 0.64} 64%|██████▍ | 4255/6640 [2:46:11<3:37:49, 5.48s/it] 64%|██████▍ | 4256/6640 [2:46:16<3:36:20, 5.44s/it] {'loss': 0.6077, 'learning_rate': 6.033894217032159e-06, 'epoch': 0.64} 64%|██████▍ | 4256/6640 [2:46:16<3:36:20, 5.44s/it] 64%|██████▍ | 4257/6640 [2:46:21<3:33:58, 5.39s/it] {'loss': 0.5875, 'learning_rate': 6.029416519754699e-06, 'epoch': 0.64} 64%|██████▍ | 4257/6640 [2:46:21<3:33:58, 5.39s/it] 64%|██████▍ | 4258/6640 [2:46:27<3:34:14, 5.40s/it] {'loss': 0.5834, 'learning_rate': 6.024939767369591e-06, 'epoch': 0.64} 64%|██████▍ | 4258/6640 [2:46:27<3:34:14, 5.40s/it] 64%|██████▍ | 4259/6640 [2:46:32<3:35:59, 5.44s/it] {'loss': 0.5892, 'learning_rate': 6.020463960942177e-06, 'epoch': 0.64} 64%|██████▍ | 4259/6640 [2:46:32<3:35:59, 5.44s/it] 64%|██████▍ | 4260/6640 [2:46:38<3:35:31, 5.43s/it] {'loss': 0.5903, 'learning_rate': 6.015989101537586e-06, 'epoch': 0.64} 64%|██████▍ | 4260/6640 [2:46:38<3:35:31, 5.43s/it] 64%|██████▍ | 4261/6640 [2:46:43<3:35:54, 5.45s/it] {'loss': 0.5929, 'learning_rate': 6.011515190220712e-06, 'epoch': 
0.64} 64%|██████▍ | 4261/6640 [2:46:43<3:35:54, 5.45s/it] 64%|██████▍ | 4262/6640 [2:46:48<3:35:33, 5.44s/it] {'loss': 0.5747, 'learning_rate': 6.007042228056223e-06, 'epoch': 0.64} 64%|██████▍ | 4262/6640 [2:46:48<3:35:33, 5.44s/it] 64%|██████▍ | 4263/6640 [2:46:54<3:37:17, 5.48s/it] {'loss': 0.5852, 'learning_rate': 6.00257021610857e-06, 'epoch': 0.64} 64%|██████▍ | 4263/6640 [2:46:54<3:37:17, 5.48s/it] 64%|██████▍ | 4264/6640 [2:47:00<3:40:29, 5.57s/it] {'loss': 0.5989, 'learning_rate': 5.998099155441962e-06, 'epoch': 0.64} 64%|██████▍ | 4264/6640 [2:47:00<3:40:29, 5.57s/it] 64%|██████▍ | 4265/6640 [2:47:05<3:40:14, 5.56s/it] {'loss': 0.5921, 'learning_rate': 5.993629047120403e-06, 'epoch': 0.64} 64%|██████▍ | 4265/6640 [2:47:05<3:40:14, 5.56s/it] 64%|██████▍ | 4266/6640 [2:47:11<3:40:56, 5.58s/it] {'loss': 0.5956, 'learning_rate': 5.9891598922076535e-06, 'epoch': 0.64} 64%|██████▍ | 4266/6640 [2:47:11<3:40:56, 5.58s/it] 64%|██████▍ | 4267/6640 [2:47:16<3:37:53, 5.51s/it] {'loss': 0.5747, 'learning_rate': 5.984691691767252e-06, 'epoch': 0.64} 64%|██████▍ | 4267/6640 [2:47:16<3:37:53, 5.51s/it] 64%|██████▍ | 4268/6640 [2:47:22<3:35:17, 5.45s/it] {'loss': 0.5939, 'learning_rate': 5.980224446862507e-06, 'epoch': 0.64} 64%|██████▍ | 4268/6640 [2:47:22<3:35:17, 5.45s/it] 64%|██████▍ | 4269/6640 [2:47:27<3:35:53, 5.46s/it] {'loss': 0.5721, 'learning_rate': 5.9757581585565105e-06, 'epoch': 0.64} 64%|██████▍ | 4269/6640 [2:47:27<3:35:53, 5.46s/it] 64%|██████▍ | 4270/6640 [2:47:33<3:35:24, 5.45s/it] {'loss': 0.5822, 'learning_rate': 5.971292827912117e-06, 'epoch': 0.64} 64%|██████▍ | 4270/6640 [2:47:33<3:35:24, 5.45s/it] 64%|██████▍ | 4271/6640 [2:47:38<3:36:49, 5.49s/it] {'loss': 0.5919, 'learning_rate': 5.966828455991951e-06, 'epoch': 0.64} 64%|██████▍ | 4271/6640 [2:47:38<3:36:49, 5.49s/it] 64%|██████▍ | 4272/6640 [2:47:44<3:40:09, 5.58s/it] {'loss': 0.6007, 'learning_rate': 5.962365043858418e-06, 'epoch': 0.64} 64%|██████▍ | 4272/6640 [2:47:44<3:40:09, 5.58s/it] 
64%|██████▍ | 4273/6640 [2:47:50<3:41:51, 5.62s/it] {'loss': 0.5684, 'learning_rate': 5.9579025925736855e-06, 'epoch': 0.64} 64%|██████▍ | 4273/6640 [2:47:50<3:41:51, 5.62s/it] 64%|██████▍ | 4274/6640 [2:47:55<3:42:07, 5.63s/it] {'loss': 0.5922, 'learning_rate': 5.953441103199704e-06, 'epoch': 0.64} 64%|██████▍ | 4274/6640 [2:47:55<3:42:07, 5.63s/it] 64%|██████▍ | 4275/6640 [2:48:01<3:39:03, 5.56s/it] {'loss': 0.5735, 'learning_rate': 5.9489805767981845e-06, 'epoch': 0.64} 64%|██████▍ | 4275/6640 [2:48:01<3:39:03, 5.56s/it] 64%|██████▍ | 4276/6640 [2:48:06<3:37:47, 5.53s/it] {'loss': 0.5916, 'learning_rate': 5.9445210144306136e-06, 'epoch': 0.64} 64%|██████▍ | 4276/6640 [2:48:06<3:37:47, 5.53s/it] 64%|██████▍ | 4277/6640 [2:48:12<3:35:28, 5.47s/it] {'loss': 0.5934, 'learning_rate': 5.9400624171582415e-06, 'epoch': 0.64} 64%|██████▍ | 4277/6640 [2:48:12<3:35:28, 5.47s/it] 64%|██████▍ | 4278/6640 [2:48:17<3:34:02, 5.44s/it] {'loss': 0.5723, 'learning_rate': 5.935604786042104e-06, 'epoch': 0.64} 64%|██████▍ | 4278/6640 [2:48:17<3:34:02, 5.44s/it] 64%|██████▍ | 4279/6640 [2:48:22<3:31:48, 5.38s/it] {'loss': 0.5763, 'learning_rate': 5.9311481221429916e-06, 'epoch': 0.64} 64%|██████▍ | 4279/6640 [2:48:22<3:31:48, 5.38s/it] 64%|██████▍ | 4280/6640 [2:48:28<3:32:52, 5.41s/it] {'loss': 0.5714, 'learning_rate': 5.926692426521474e-06, 'epoch': 0.64} 64%|██████▍ | 4280/6640 [2:48:28<3:32:52, 5.41s/it] 64%|██████▍ | 4281/6640 [2:48:33<3:32:11, 5.40s/it] {'loss': 0.6125, 'learning_rate': 5.922237700237887e-06, 'epoch': 0.64} 64%|██████▍ | 4281/6640 [2:48:33<3:32:11, 5.40s/it] 64%|██████▍ | 4282/6640 [2:48:39<3:36:48, 5.52s/it] {'loss': 0.5938, 'learning_rate': 5.917783944352332e-06, 'epoch': 0.64} 64%|██████▍ | 4282/6640 [2:48:39<3:36:48, 5.52s/it] 65%|██████▍ | 4283/6640 [2:48:44<3:36:36, 5.51s/it] {'loss': 0.5845, 'learning_rate': 5.913331159924689e-06, 'epoch': 0.65} 65%|██████▍ | 4283/6640 [2:48:44<3:36:36, 5.51s/it] 65%|██████▍ | 4284/6640 [2:48:50<3:35:21, 5.48s/it] 
{'loss': 0.6022, 'learning_rate': 5.9088793480146e-06, 'epoch': 0.65} 65%|██████▍ | 4284/6640 [2:48:50<3:35:21, 5.48s/it] 65%|██████▍ | 4285/6640 [2:48:55<3:34:16, 5.46s/it] {'loss': 0.5914, 'learning_rate': 5.904428509681473e-06, 'epoch': 0.65} 65%|██████▍ | 4285/6640 [2:48:55<3:34:16, 5.46s/it] 65%|██████▍ | 4286/6640 [2:49:01<3:36:01, 5.51s/it] {'loss': 0.5962, 'learning_rate': 5.899978645984488e-06, 'epoch': 0.65} 65%|██████▍ | 4286/6640 [2:49:01<3:36:01, 5.51s/it] 65%|██████▍ | 4287/6640 [2:49:06<3:34:01, 5.46s/it] {'loss': 0.5811, 'learning_rate': 5.8955297579826005e-06, 'epoch': 0.65} 65%|██████▍ | 4287/6640 [2:49:06<3:34:01, 5.46s/it] 65%|██████▍ | 4288/6640 [2:49:12<3:36:12, 5.52s/it] {'loss': 0.5885, 'learning_rate': 5.891081846734519e-06, 'epoch': 0.65} 65%|██████▍ | 4288/6640 [2:49:12<3:36:12, 5.52s/it] 65%|██████▍ | 4289/6640 [2:49:17<3:34:19, 5.47s/it] {'loss': 0.5699, 'learning_rate': 5.886634913298732e-06, 'epoch': 0.65} 65%|██████▍ | 4289/6640 [2:49:17<3:34:19, 5.47s/it] 65%|██████▍ | 4290/6640 [2:49:23<3:35:48, 5.51s/it] {'loss': 0.6146, 'learning_rate': 5.882188958733488e-06, 'epoch': 0.65} 65%|██████▍ | 4290/6640 [2:49:23<3:35:48, 5.51s/it] 65%|██████▍ | 4291/6640 [2:49:28<3:34:23, 5.48s/it] {'loss': 0.5747, 'learning_rate': 5.8777439840968e-06, 'epoch': 0.65} 65%|██████▍ | 4291/6640 [2:49:28<3:34:23, 5.48s/it] 65%|██████▍ | 4292/6640 [2:49:34<3:34:15, 5.48s/it] {'loss': 0.5778, 'learning_rate': 5.873299990446463e-06, 'epoch': 0.65} 65%|██████▍ | 4292/6640 [2:49:34<3:34:15, 5.48s/it] 65%|██████▍ | 4293/6640 [2:49:39<3:33:08, 5.45s/it] {'loss': 0.5897, 'learning_rate': 5.868856978840023e-06, 'epoch': 0.65} 65%|██████▍ | 4293/6640 [2:49:39<3:33:08, 5.45s/it] 65%|██████▍ | 4294/6640 [2:49:45<3:35:50, 5.52s/it] {'loss': 0.593, 'learning_rate': 5.864414950334796e-06, 'epoch': 0.65} 65%|██████▍ | 4294/6640 [2:49:45<3:35:50, 5.52s/it] 65%|██████▍ | 4295/6640 [2:49:50<3:36:31, 5.54s/it] {'loss': 0.5856, 'learning_rate': 5.859973905987866e-06, 'epoch': 
0.65} 65%|██████▍ | 4295/6640 [2:49:50<3:36:31, 5.54s/it] 65%|██████▍ | 4296/6640 [2:49:55<3:33:39, 5.47s/it] {'loss': 0.5801, 'learning_rate': 5.8555338468560855e-06, 'epoch': 0.65} 65%|██████▍ | 4296/6640 [2:49:55<3:33:39, 5.47s/it] 65%|██████▍ | 4297/6640 [2:50:01<3:35:07, 5.51s/it] {'loss': 0.5996, 'learning_rate': 5.851094773996067e-06, 'epoch': 0.65} 65%|██████▍ | 4297/6640 [2:50:01<3:35:07, 5.51s/it] 65%|██████▍ | 4298/6640 [2:50:07<3:34:35, 5.50s/it] {'loss': 0.5973, 'learning_rate': 5.846656688464192e-06, 'epoch': 0.65} 65%|██████▍ | 4298/6640 [2:50:07<3:34:35, 5.50s/it] 65%|██████▍ | 4299/6640 [2:50:12<3:33:39, 5.48s/it] {'loss': 0.6001, 'learning_rate': 5.842219591316606e-06, 'epoch': 0.65} 65%|██████▍ | 4299/6640 [2:50:12<3:33:39, 5.48s/it]6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 65%|██████▍ | 4300/6640 [2:50:17<3:33:31, 5.48s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.5753, 'learning_rate': 5.837783483609214e-06, 'epoch': 0.65} 65%|██████▍ | 4300/6640 [2:50:17<3:33:31, 5.48s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 65%|██████▍ | 4301/6640 [2:50:43<7:29:20, 11.53s/it] {'loss': 0.6081, 'learning_rate': 5.833348366397697e-06, 'epoch': 0.65} 65%|██████▍ | 4301/6640 [2:50:43<7:29:20, 11.53s/it] 65%|██████▍ | 4302/6640 [2:50:48<6:16:10, 9.65s/it] {'loss': 0.5811, 'learning_rate': 5.828914240737496e-06, 'epoch': 0.65} 65%|██████▍ | 4302/6640 [2:50:48<6:16:10, 9.65s/it] 65%|██████▍ | 4303/6640 [2:50:54<5:27:34, 8.41s/it] {'loss': 0.5922, 'learning_rate': 5.8244811076838055e-06, 'epoch': 0.65} 65%|██████▍ | 4303/6640 [2:50:54<5:27:34, 8.41s/it] 65%|██████▍ | 4304/6640 [2:50:59<4:52:09, 7.50s/it] {'loss': 0.6082, 'learning_rate': 5.820048968291596e-06, 'epoch': 0.65} 65%|██████▍ | 4304/6640 [2:50:59<4:52:09, 7.50s/it] 65%|██████▍ | 4305/6640 [2:51:05<4:28:03, 6.89s/it] {'loss': 0.5706, 'learning_rate': 5.815617823615599e-06, 'epoch': 0.65} 65%|██████▍ | 4305/6640 [2:51:05<4:28:03, 6.89s/it] 65%|██████▍ | 4306/6640 [2:51:10<4:09:26, 6.41s/it] {'loss': 0.5962, 'learning_rate': 5.811187674710311e-06, 'epoch': 0.65} 65%|██████▍ | 4306/6640 [2:51:10<4:09:26, 6.41s/it] 65%|██████▍ | 4307/6640 [2:51:15<3:57:48, 6.12s/it] {'loss': 0.5705, 'learning_rate': 5.806758522629981e-06, 'epoch': 0.65} 65%|██████▍ | 4307/6640 [2:51:15<3:57:48, 6.12s/it] 65%|██████▍ | 4308/6640 [2:51:21<3:50:13, 5.92s/it] {'loss': 0.5741, 'learning_rate': 5.802330368428633e-06, 'epoch': 0.65} 65%|██████▍ | 4308/6640 [2:51:21<3:50:13, 5.92s/it] 65%|██████▍ | 4309/6640 [2:51:26<3:43:08, 5.74s/it] {'loss': 0.5671, 'learning_rate': 5.797903213160047e-06, 'epoch': 0.65} 65%|██████▍ | 4309/6640 [2:51:26<3:43:08, 5.74s/it] 65%|██████▍ | 4310/6640 [2:51:32<3:38:18, 5.62s/it] {'loss': 0.597, 'learning_rate': 5.79347705787777e-06, 'epoch': 0.65} 65%|██████▍ | 4310/6640 [2:51:32<3:38:18, 5.62s/it] 65%|██████▍ | 4311/6640 [2:51:37<3:39:37, 5.66s/it] {'loss': 0.5996, 'learning_rate': 5.789051903635109e-06, 
'epoch': 0.65} 65%|██████▍ | 4311/6640 [2:51:37<3:39:37, 5.66s/it] 65%|██████▍ | 4312/6640 [2:51:43<3:36:30, 5.58s/it] {'loss': 0.6081, 'learning_rate': 5.7846277514851255e-06, 'epoch': 0.65} 65%|██████▍ | 4312/6640 [2:51:43<3:36:30, 5.58s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") 65%|██████▍ | 4313/6640 [2:51:48<3:36:43, 5.59s/it] {'loss': 0.5986, 'learning_rate': 5.7802046024806545e-06, 'epoch': 0.65} 65%|██████▍ | 4313/6640 [2:51:48<3:36:43, 5.59s/it] 65%|██████▍ | 4314/6640 [2:51:54<3:34:14, 5.53s/it] {'loss': 0.5939, 'learning_rate': 5.775782457674285e-06, 'epoch': 0.65} 65%|██████▍ | 4314/6640 [2:51:54<3:34:14, 5.53s/it] 65%|██████▍ | 4315/6640 [2:51:59<3:34:17, 5.53s/it] {'loss': 0.5609, 'learning_rate': 5.771361318118374e-06, 'epoch': 0.65} 65%|██████▍ | 4315/6640 [2:51:59<3:34:17, 5.53s/it] 65%|██████▌ | 4316/6640 [2:52:05<3:31:08, 5.45s/it] {'loss': 0.5659, 'learning_rate': 5.766941184865024e-06, 'epoch': 0.65} 65%|██████▌ | 4316/6640 [2:52:05<3:31:08, 5.45s/it] 65%|██████▌ | 4317/6640 [2:52:10<3:34:43, 5.55s/it] {'loss': 0.5916, 'learning_rate': 5.7625220589661136e-06, 'epoch': 0.65} 65%|██████▌ | 4317/6640 [2:52:10<3:34:43, 5.55s/it] 65%|██████▌ | 4318/6640 [2:52:16<3:36:48, 5.60s/it] {'loss': 0.5607, 'learning_rate': 5.758103941473276e-06, 'epoch': 0.65} 65%|██████▌ | 4318/6640 [2:52:16<3:36:48, 5.60s/it] 65%|██████▌ | 4319/6640 [2:52:22<3:37:13, 5.62s/it] {'loss': 0.5875, 'learning_rate': 5.7536868334379055e-06, 'epoch': 0.65} 65%|██████▌ | 4319/6640 [2:52:22<3:37:13, 5.62s/it] 65%|██████▌ | 4320/6640 [2:52:28<3:44:42, 5.81s/it] {'loss': 0.6037, 'learning_rate': 5.749270735911159e-06, 'epoch': 0.65} 65%|██████▌ | 4320/6640 [2:52:28<3:44:42, 5.81s/it] 65%|██████▌ | 4321/6640 [2:52:33<3:40:34, 5.71s/it] {'loss': 0.5961, 'learning_rate': 5.744855649943941e-06, 'epoch': 0.65} 65%|██████▌ | 4321/6640 [2:52:33<3:40:34, 5.71s/it] 65%|██████▌ 
| 4322/6640 [2:52:39<3:36:37, 5.61s/it] {'loss': 0.564, 'learning_rate': 5.740441576586926e-06, 'epoch': 0.65} 65%|██████▌ | 4322/6640 [2:52:39<3:36:37, 5.61s/it] 65%|██████▌ | 4323/6640 [2:52:44<3:33:49, 5.54s/it] {'loss': 0.584, 'learning_rate': 5.736028516890548e-06, 'epoch': 0.65} 65%|██████▌ | 4323/6640 [2:52:44<3:33:49, 5.54s/it] 65%|██████▌ | 4324/6640 [2:52:50<3:32:05, 5.49s/it] {'loss': 0.5777, 'learning_rate': 5.731616471904998e-06, 'epoch': 0.65} 65%|██████▌ | 4324/6640 [2:52:50<3:32:05, 5.49s/it] 65%|██████▌ | 4325/6640 [2:52:55<3:29:54, 5.44s/it] {'loss': 0.5686, 'learning_rate': 5.727205442680218e-06, 'epoch': 0.65} 65%|██████▌ | 4325/6640 [2:52:55<3:29:54, 5.44s/it] 65%|██████▌ | 4326/6640 [2:53:00<3:30:14, 5.45s/it] {'loss': 0.5796, 'learning_rate': 5.722795430265919e-06, 'epoch': 0.65} 65%|██████▌ | 4326/6640 [2:53:00<3:30:14, 5.45s/it] 65%|██████▌ | 4327/6640 [2:53:06<3:32:57, 5.52s/it] {'loss': 0.5845, 'learning_rate': 5.718386435711563e-06, 'epoch': 0.65} 65%|██████▌ | 4327/6640 [2:53:06<3:32:57, 5.52s/it] 65%|██████▌ | 4328/6640 [2:53:12<3:32:35, 5.52s/it] {'loss': 0.5773, 'learning_rate': 5.713978460066376e-06, 'epoch': 0.65} 65%|██████▌ | 4328/6640 [2:53:12<3:32:35, 5.52s/it] 65%|██████▌ | 4329/6640 [2:53:17<3:35:53, 5.61s/it] {'loss': 0.5691, 'learning_rate': 5.709571504379339e-06, 'epoch': 0.65} 65%|██████▌ | 4329/6640 [2:53:17<3:35:53, 5.61s/it] 65%|██████▌ | 4330/6640 [2:53:23<3:32:23, 5.52s/it] {'loss': 0.5799, 'learning_rate': 5.7051655696991825e-06, 'epoch': 0.65} 65%|██████▌ | 4330/6640 [2:53:23<3:32:23, 5.52s/it] 65%|██████▌ | 4331/6640 [2:53:28<3:32:43, 5.53s/it] {'loss': 0.6038, 'learning_rate': 5.7007606570744055e-06, 'epoch': 0.65} 65%|██████▌ | 4331/6640 [2:53:28<3:32:43, 5.53s/it] 65%|██████▌ | 4332/6640 [2:53:34<3:30:38, 5.48s/it] {'loss': 0.5575, 'learning_rate': 5.696356767553257e-06, 'epoch': 0.65} 65%|██████▌ | 4332/6640 [2:53:34<3:30:38, 5.48s/it] 65%|██████▌ | 4333/6640 [2:53:39<3:31:24, 5.50s/it] {'loss': 0.5756, 
'learning_rate': 5.6919539021837474e-06, 'epoch': 0.65} 65%|██████▌ | 4333/6640 [2:53:39<3:31:24, 5.50s/it] 65%|██████▌ | 4334/6640 [2:53:45<3:33:42, 5.56s/it] {'loss': 0.589, 'learning_rate': 5.687552062013643e-06, 'epoch': 0.65} 65%|██████▌ | 4334/6640 [2:53:45<3:33:42, 5.56s/it] 65%|██████▌ | 4335/6640 [2:53:50<3:33:05, 5.55s/it] {'loss': 0.5705, 'learning_rate': 5.683151248090455e-06, 'epoch': 0.65} 65%|██████▌ | 4335/6640 [2:53:50<3:33:05, 5.55s/it] 65%|██████▌ | 4336/6640 [2:53:56<3:31:01, 5.50s/it] {'loss': 0.5706, 'learning_rate': 5.6787514614614624e-06, 'epoch': 0.65} 65%|██████▌ | 4336/6640 [2:53:56<3:31:01, 5.50s/it] 65%|██████▌ | 4337/6640 [2:54:01<3:32:34, 5.54s/it] {'loss': 0.5953, 'learning_rate': 5.6743527031737e-06, 'epoch': 0.65} 65%|██████▌ | 4337/6640 [2:54:01<3:32:34, 5.54s/it] 65%|██████▌ | 4338/6640 [2:54:07<3:30:24, 5.48s/it] {'loss': 0.5949, 'learning_rate': 5.6699549742739545e-06, 'epoch': 0.65} 65%|██████▌ | 4338/6640 [2:54:07<3:30:24, 5.48s/it] 65%|██████▌ | 4339/6640 [2:54:12<3:31:43, 5.52s/it] {'loss': 0.5932, 'learning_rate': 5.665558275808761e-06, 'epoch': 0.65} 65%|██████▌ | 4339/6640 [2:54:12<3:31:43, 5.52s/it] 65%|██████▌ | 4340/6640 [2:54:18<3:29:26, 5.46s/it] {'loss': 0.6039, 'learning_rate': 5.66116260882442e-06, 'epoch': 0.65} 65%|██████▌ | 4340/6640 [2:54:18<3:29:26, 5.46s/it] 65%|██████▌ | 4341/6640 [2:54:23<3:29:54, 5.48s/it] {'loss': 0.5828, 'learning_rate': 5.656767974366981e-06, 'epoch': 0.65} 65%|██████▌ | 4341/6640 [2:54:23<3:29:54, 5.48s/it] 65%|██████▌ | 4342/6640 [2:54:28<3:27:52, 5.43s/it] {'loss': 0.5936, 'learning_rate': 5.652374373482253e-06, 'epoch': 0.65} 65%|██████▌ | 4342/6640 [2:54:28<3:27:52, 5.43s/it] 65%|██████▌ | 4343/6640 [2:54:34<3:28:16, 5.44s/it] {'loss': 0.585, 'learning_rate': 5.647981807215792e-06, 'epoch': 0.65} 65%|██████▌ | 4343/6640 [2:54:34<3:28:16, 5.44s/it] 65%|██████▌ | 4344/6640 [2:54:39<3:27:11, 5.41s/it] {'loss': 0.6254, 'learning_rate': 5.643590276612909e-06, 'epoch': 0.65} 
65%|██████▌ | 4344/6640 [2:54:39<3:27:11, 5.41s/it] 65%|██████▌ | 4345/6640 [2:54:45<3:26:44, 5.41s/it] {'loss': 0.5713, 'learning_rate': 5.639199782718674e-06, 'epoch': 0.65} 65%|██████▌ | 4345/6640 [2:54:45<3:26:44, 5.41s/it] 65%|██████▌ | 4346/6640 [2:54:50<3:27:48, 5.44s/it] {'loss': 0.5975, 'learning_rate': 5.6348103265779045e-06, 'epoch': 0.65} 65%|██████▌ | 4346/6640 [2:54:50<3:27:48, 5.44s/it] 65%|██████▌ | 4347/6640 [2:54:56<3:28:12, 5.45s/it] {'loss': 0.5799, 'learning_rate': 5.630421909235177e-06, 'epoch': 0.65} 65%|██████▌ | 4347/6640 [2:54:56<3:28:12, 5.45s/it] 65%|██████▌ | 4348/6640 [2:55:01<3:28:49, 5.47s/it] {'loss': 0.5878, 'learning_rate': 5.626034531734812e-06, 'epoch': 0.65} 65%|██████▌ | 4348/6640 [2:55:01<3:28:49, 5.47s/it] 65%|██████▌ | 4349/6640 [2:55:07<3:27:48, 5.44s/it] {'loss': 0.5834, 'learning_rate': 5.621648195120888e-06, 'epoch': 0.65} 65%|██████▌ | 4349/6640 [2:55:07<3:27:48, 5.44s/it]5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 66%|██████▌ | 4350/6640 [2:55:12<3:26:20, 5.41s/it]6 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5728, 'learning_rate': 5.617262900437239e-06, 'epoch': 0.66} 66%|██████▌ | 4350/6640 [2:55:12<3:26:20, 5.41s/it] 66%|██████▌ | 4351/6640 [2:55:17<3:28:34, 5.47s/it] {'loss': 0.5794, 'learning_rate': 5.612878648727444e-06, 'epoch': 0.66} 66%|██████▌ | 4351/6640 [2:55:17<3:28:34, 5.47s/it] 66%|██████▌ | 4352/6640 [2:55:23<3:28:04, 5.46s/it] {'loss': 0.6143, 'learning_rate': 5.608495441034846e-06, 'epoch': 0.66} 66%|██████▌ | 4352/6640 [2:55:23<3:28:04, 5.46s/it] 66%|██████▌ | 4353/6640 [2:55:28<3:27:20, 5.44s/it] {'loss': 0.5931, 'learning_rate': 5.604113278402519e-06, 'epoch': 0.66} 66%|██████▌ | 4353/6640 [2:55:28<3:27:20, 5.44s/it] 66%|██████▌ | 4354/6640 [2:55:34<3:27:38, 5.45s/it] {'loss': 0.6065, 'learning_rate': 5.599732161873304e-06, 'epoch': 0.66} 66%|██████▌ | 4354/6640 [2:55:34<3:27:38, 5.45s/it] 66%|██████▌ | 4355/6640 [2:55:39<3:26:24, 5.42s/it] {'loss': 0.6029, 'learning_rate': 5.595352092489791e-06, 'epoch': 0.66} 66%|██████▌ | 4355/6640 [2:55:39<3:26:24, 5.42s/it] 66%|██████▌ | 4356/6640 [2:55:44<3:24:10, 5.36s/it] {'loss': 0.5832, 'learning_rate': 5.5909730712943225e-06, 'epoch': 0.66} 66%|██████▌ | 4356/6640 [2:55:44<3:24:10, 5.36s/it] 66%|██████▌ | 4357/6640 [2:55:50<3:28:22, 5.48s/it] {'loss': 0.5653, 'learning_rate': 5.58659509932898e-06, 'epoch': 0.66} 66%|██████▌ | 4357/6640 [2:55:50<3:28:22, 5.48s/it] 66%|██████▌ | 4358/6640 [2:55:56<3:29:10, 5.50s/it] {'loss': 0.5753, 'learning_rate': 5.582218177635607e-06, 'epoch': 0.66} 66%|██████▌ | 4358/6640 [2:55:56<3:29:10, 5.50s/it] 66%|██████▌ | 4359/6640 [2:56:01<3:27:22, 5.45s/it] {'loss': 0.5577, 'learning_rate': 5.577842307255794e-06, 'epoch': 0.66} 66%|██████▌ | 4359/6640 [2:56:01<3:27:22, 5.45s/it] 66%|██████▌ | 4360/6640 [2:56:06<3:26:38, 5.44s/it] {'loss': 0.5944, 'learning_rate': 5.573467489230879e-06, 'epoch': 0.66} 66%|██████▌ | 4360/6640 [2:56:06<3:26:38, 5.44s/it] 66%|██████▌ | 4361/6640 [2:56:12<3:25:48, 5.42s/it] {'loss': 0.5966, 'learning_rate': 5.5690937246019595e-06, 
'epoch': 0.66} 66%|██████▌ | 4361/6640 [2:56:12<3:25:48, 5.42s/it] 66%|██████▌ | 4362/6640 [2:56:17<3:24:47, 5.39s/it] {'loss': 0.5854, 'learning_rate': 5.564721014409862e-06, 'epoch': 0.66} 66%|██████▌ | 4362/6640 [2:56:17<3:24:47, 5.39s/it] 66%|██████▌ | 4363/6640 [2:56:23<3:26:56, 5.45s/it] {'loss': 0.6064, 'learning_rate': 5.560349359695181e-06, 'epoch': 0.66} 66%|██████▌ | 4363/6640 [2:56:23<3:26:56, 5.45s/it] 66%|██████▌ | 4364/6640 [2:56:28<3:26:40, 5.45s/it] {'loss': 0.5866, 'learning_rate': 5.55597876149825e-06, 'epoch': 0.66} 66%|██████▌ | 4364/6640 [2:56:28<3:26:40, 5.45s/it] 66%|██████▌ | 4365/6640 [2:56:34<3:29:32, 5.53s/it] {'loss': 0.5675, 'learning_rate': 5.55160922085916e-06, 'epoch': 0.66} 66%|██████▌ | 4365/6640 [2:56:34<3:29:32, 5.53s/it] 66%|██████▌ | 4366/6640 [2:56:39<3:28:05, 5.49s/it] {'loss': 0.6205, 'learning_rate': 5.5472407388177385e-06, 'epoch': 0.66} 66%|██████▌ | 4366/6640 [2:56:39<3:28:05, 5.49s/it] 66%|██████▌ | 4367/6640 [2:56:45<3:28:06, 5.49s/it] {'loss': 0.5857, 'learning_rate': 5.5428733164135665e-06, 'epoch': 0.66} 66%|██████▌ | 4367/6640 [2:56:45<3:28:06, 5.49s/it] 66%|██████▌ | 4368/6640 [2:56:50<3:26:48, 5.46s/it] {'loss': 0.5839, 'learning_rate': 5.538506954685977e-06, 'epoch': 0.66} 66%|██████▌ | 4368/6640 [2:56:50<3:26:48, 5.46s/it] 66%|██████▌ | 4369/6640 [2:56:56<3:25:46, 5.44s/it] {'loss': 0.5874, 'learning_rate': 5.534141654674046e-06, 'epoch': 0.66} 66%|██████▌ | 4369/6640 [2:56:56<3:25:46, 5.44s/it] 66%|██████▌ | 4370/6640 [2:57:01<3:25:04, 5.42s/it] {'loss': 0.5573, 'learning_rate': 5.529777417416599e-06, 'epoch': 0.66} 66%|██████▌ | 4370/6640 [2:57:01<3:25:04, 5.42s/it] 66%|██████▌ | 4371/6640 [2:57:06<3:25:01, 5.42s/it] {'loss': 0.5639, 'learning_rate': 5.525414243952205e-06, 'epoch': 0.66} 66%|██████▌ | 4371/6640 [2:57:06<3:25:01, 5.42s/it] 66%|██████▌ | 4372/6640 [2:57:12<3:22:50, 5.37s/it] {'loss': 0.5705, 'learning_rate': 5.521052135319182e-06, 'epoch': 0.66} 66%|██████▌ | 4372/6640 [2:57:12<3:22:50, 
5.37s/it] 66%|██████▌ | 4373/6640 [2:57:17<3:21:02, 5.32s/it] {'loss': 0.5856, 'learning_rate': 5.516691092555595e-06, 'epoch': 0.66} 66%|██████▌ | 4373/6640 [2:57:17<3:21:02, 5.32s/it] 66%|██████▌ | 4374/6640 [2:57:22<3:22:42, 5.37s/it] {'loss': 0.5816, 'learning_rate': 5.512331116699257e-06, 'epoch': 0.66} 66%|██████▌ | 4374/6640 [2:57:22<3:22:42, 5.37s/it] 66%|██████▌ | 4375/6640 [2:57:28<3:26:24, 5.47s/it] {'loss': 0.5833, 'learning_rate': 5.507972208787728e-06, 'epoch': 0.66} 66%|██████▌ | 4375/6640 [2:57:28<3:26:24, 5.47s/it] 66%|██████▌ | 4376/6640 [2:57:34<3:27:22, 5.50s/it] {'loss': 0.5872, 'learning_rate': 5.503614369858301e-06, 'epoch': 0.66} 66%|██████▌ | 4376/6640 [2:57:34<3:27:22, 5.50s/it] 66%|██████▌ | 4377/6640 [2:57:39<3:27:57, 5.51s/it] {'loss': 0.5908, 'learning_rate': 5.499257600948038e-06, 'epoch': 0.66} 66%|██████▌ | 4377/6640 [2:57:39<3:27:57, 5.51s/it] 66%|██████▌ | 4378/6640 [2:57:44<3:25:18, 5.45s/it] {'loss': 0.597, 'learning_rate': 5.494901903093724e-06, 'epoch': 0.66} 66%|██████▌ | 4378/6640 [2:57:44<3:25:18, 5.45s/it] 66%|██████▌ | 4379/6640 [2:57:50<3:27:54, 5.52s/it] {'loss': 0.5706, 'learning_rate': 5.490547277331904e-06, 'epoch': 0.66} 66%|██████▌ | 4379/6640 [2:57:50<3:27:54, 5.52s/it] 66%|██████▌ | 4380/6640 [2:57:56<3:27:42, 5.51s/it] {'loss': 0.5729, 'learning_rate': 5.486193724698854e-06, 'epoch': 0.66} 66%|██████▌ | 4380/6640 [2:57:56<3:27:42, 5.51s/it] 66%|██████▌ | 4381/6640 [2:58:01<3:26:09, 5.48s/it] {'loss': 0.57, 'learning_rate': 5.481841246230608e-06, 'epoch': 0.66} 66%|██████▌ | 4381/6640 [2:58:01<3:26:09, 5.48s/it] 66%|██████▌ | 4382/6640 [2:58:06<3:26:07, 5.48s/it] {'loss': 0.604, 'learning_rate': 5.477489842962937e-06, 'epoch': 0.66} 66%|██████▌ | 4382/6640 [2:58:06<3:26:07, 5.48s/it] 66%|██████▌ | 4383/6640 [2:58:12<3:26:24, 5.49s/it] {'loss': 0.5747, 'learning_rate': 5.473139515931362e-06, 'epoch': 0.66} 66%|██████▌ | 4383/6640 [2:58:12<3:26:24, 5.49s/it] 66%|██████▌ | 4384/6640 [2:58:17<3:24:21, 5.44s/it] 
{'loss': 0.5825, 'learning_rate': 5.4687902661711465e-06, 'epoch': 0.66} 66%|██████▌ | 4384/6640 [2:58:17<3:24:21, 5.44s/it] 66%|██████▌ | 4385/6640 [2:58:23<3:23:00, 5.40s/it] {'loss': 0.6003, 'learning_rate': 5.464442094717281e-06, 'epoch': 0.66} 66%|██████▌ | 4385/6640 [2:58:23<3:23:00, 5.40s/it] 66%|██████▌ | 4386/6640 [2:58:28<3:25:54, 5.48s/it] {'loss': 0.5909, 'learning_rate': 5.460095002604533e-06, 'epoch': 0.66} 66%|██████▌ | 4386/6640 [2:58:28<3:25:54, 5.48s/it] 66%|██████▌ | 4387/6640 [2:58:34<3:24:26, 5.44s/it] {'loss': 0.5831, 'learning_rate': 5.45574899086738e-06, 'epoch': 0.66} 66%|██████▌ | 4387/6640 [2:58:34<3:24:26, 5.44s/it] 66%|██████▌ | 4388/6640 [2:58:39<3:24:53, 5.46s/it] {'loss': 0.5925, 'learning_rate': 5.451404060540064e-06, 'epoch': 0.66} 66%|██████▌ | 4388/6640 [2:58:39<3:24:53, 5.46s/it] 66%|██████▌ | 4389/6640 [2:58:45<3:26:46, 5.51s/it] {'loss': 0.5883, 'learning_rate': 5.447060212656554e-06, 'epoch': 0.66} 66%|██████▌ | 4389/6640 [2:58:45<3:26:46, 5.51s/it] 66%|██████▌ | 4390/6640 [2:58:50<3:24:12, 5.45s/it] {'loss': 0.6026, 'learning_rate': 5.442717448250574e-06, 'epoch': 0.66} 66%|██████▌ | 4390/6640 [2:58:50<3:24:12, 5.45s/it] 66%|██████▌ | 4391/6640 [2:58:55<3:23:58, 5.44s/it] {'loss': 0.5687, 'learning_rate': 5.438375768355586e-06, 'epoch': 0.66} 66%|██████▌ | 4391/6640 [2:58:55<3:23:58, 5.44s/it] 66%|██████▌ | 4392/6640 [2:59:01<3:23:41, 5.44s/it] {'loss': 0.6052, 'learning_rate': 5.4340351740047915e-06, 'epoch': 0.66} 66%|██████▌ | 4392/6640 [2:59:01<3:23:41, 5.44s/it] 66%|██████▌ | 4393/6640 [2:59:06<3:25:17, 5.48s/it] {'loss': 0.5776, 'learning_rate': 5.429695666231141e-06, 'epoch': 0.66} 66%|██████▌ | 4393/6640 [2:59:06<3:25:17, 5.48s/it] 66%|██████▌ | 4394/6640 [2:59:12<3:25:07, 5.48s/it] {'loss': 0.6118, 'learning_rate': 5.425357246067307e-06, 'epoch': 0.66} 66%|██████▌ | 4394/6640 [2:59:12<3:25:07, 5.48s/it] 66%|██████▌ | 4395/6640 [2:59:18<3:26:12, 5.51s/it] {'loss': 0.5836, 'learning_rate': 5.421019914545735e-06, 
'epoch': 0.66} 66%|██████▌ | 4395/6640 [2:59:18<3:26:12, 5.51s/it] 66%|██████▌ | 4396/6640 [2:59:23<3:24:15, 5.46s/it] {'loss': 0.578, 'learning_rate': 5.416683672698581e-06, 'epoch': 0.66} 66%|██████▌ | 4396/6640 [2:59:23<3:24:15, 5.46s/it] 66%|██████▌ | 4397/6640 [2:59:28<3:25:05, 5.49s/it] {'loss': 0.586, 'learning_rate': 5.412348521557762e-06, 'epoch': 0.66} 66%|██████▌ | 4397/6640 [2:59:28<3:25:05, 5.49s/it] 66%|██████▌ | 4398/6640 [2:59:34<3:23:41, 5.45s/it] {'loss': 0.5674, 'learning_rate': 5.4080144621549205e-06, 'epoch': 0.66} 66%|██████▌ | 4398/6640 [2:59:34<3:23:41, 5.45s/it] 66%|██████▋ | 4399/6640 [2:59:39<3:22:04, 5.41s/it] {'loss': 0.6024, 'learning_rate': 5.40368149552145e-06, 'epoch': 0.66} 66%|██████▋ | 4399/6640 [2:59:39<3:22:04, 5.41s/it]4 AutoResumeHook: Checking whether to suspend... 75 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 66%|██████▋ | 4400/6640 [2:59:45<3:22:39, 5.43s/it]3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5631, 'learning_rate': 5.399349622688479e-06, 'epoch': 0.66} 66%|██████▋ | 4400/6640 [2:59:45<3:22:39, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 66%|██████▋ | 4401/6640 [3:00:11<7:22:57, 11.87s/it] {'loss': 0.5812, 'learning_rate': 5.395018844686879e-06, 'epoch': 0.66} 66%|██████▋ | 4401/6640 [3:00:11<7:22:57, 11.87s/it] 66%|██████▋ | 4402/6640 [3:00:17<6:09:37, 9.91s/it] {'loss': 0.5979, 'learning_rate': 5.390689162547262e-06, 'epoch': 0.66} 66%|██████▋ | 4402/6640 [3:00:17<6:09:37, 9.91s/it] 66%|██████▋ | 4403/6640 [3:00:22<5:20:30, 8.60s/it] {'loss': 0.5867, 'learning_rate': 5.386360577299965e-06, 'epoch': 0.66} 66%|██████▋ | 4403/6640 [3:00:22<5:20:30, 8.60s/it] 66%|██████▋ | 4404/6640 [3:00:28<4:44:42, 7.64s/it] {'loss': 0.5817, 'learning_rate': 5.3820330899750895e-06, 'epoch': 0.66} 66%|██████▋ | 4404/6640 [3:00:28<4:44:42, 7.64s/it] 66%|██████▋ | 4405/6640 [3:00:33<4:20:47, 7.00s/it] {'loss': 0.5882, 'learning_rate': 5.3777067016024495e-06, 'epoch': 0.66} 66%|██████▋ | 4405/6640 [3:00:33<4:20:47, 7.00s/it] 66%|██████▋ | 4406/6640 [3:00:39<4:02:55, 6.52s/it] {'loss': 0.5613, 'learning_rate': 5.3733814132116205e-06, 'epoch': 0.66} 66%|██████▋ | 4406/6640 [3:00:39<4:02:55, 6.52s/it] 66%|██████▋ | 4407/6640 [3:00:44<3:49:00, 6.15s/it] {'loss': 0.593, 'learning_rate': 5.369057225831893e-06, 'epoch': 0.66} 66%|██████▋ | 4407/6640 [3:00:44<3:49:00, 6.15s/it] 66%|██████▋ | 4408/6640 [3:00:49<3:40:42, 5.93s/it] {'loss': 0.5819, 'learning_rate': 5.364734140492314e-06, 'epoch': 0.66} 66%|██████▋ | 4408/6640 [3:00:49<3:40:42, 5.93s/it] 66%|██████▋ | 4409/6640 [3:00:55<3:35:03, 5.78s/it] {'loss': 0.5972, 'learning_rate': 5.360412158221661e-06, 'epoch': 0.66} 66%|██████▋ | 4409/6640 [3:00:55<3:35:03, 5.78s/it] 66%|██████▋ | 4410/6640 [3:01:00<3:31:22, 5.69s/it] {'loss': 0.5685, 'learning_rate': 5.35609128004845e-06, 'epoch': 0.66} 66%|██████▋ | 4410/6640 [3:01:00<3:31:22, 5.69s/it] 66%|██████▋ | 4411/6640 [3:01:06<3:31:02, 5.68s/it] {'loss': 0.5661, 'learning_rate': 5.351771507000938e-06, 
'epoch': 0.66} 66%|██████▋ | 4411/6640 [3:01:06<3:31:02, 5.68s/it] 66%|██████▋ | 4412/6640 [3:01:11<3:27:25, 5.59s/it] {'loss': 0.5833, 'learning_rate': 5.347452840107103e-06, 'epoch': 0.66} 66%|██████▋ | 4412/6640 [3:01:11<3:27:25, 5.59s/it] 66%|██████▋ | 4413/6640 [3:01:17<3:26:14, 5.56s/it] {'loss': 0.5701, 'learning_rate': 5.343135280394689e-06, 'epoch': 0.66} 66%|██████▋ | 4413/6640 [3:01:17<3:26:14, 5.56s/it] 66%|██████▋ | 4414/6640 [3:01:22<3:23:54, 5.50s/it] {'loss': 0.5644, 'learning_rate': 5.338818828891148e-06, 'epoch': 0.66} 66%|██████▋ | 4414/6640 [3:01:22<3:23:54, 5.50s/it] 66%|██████▋ | 4415/6640 [3:01:28<3:22:39, 5.46s/it] {'loss': 0.5785, 'learning_rate': 5.33450348662368e-06, 'epoch': 0.66} 66%|██████▋ | 4415/6640 [3:01:28<3:22:39, 5.46s/it] 67%|██████▋ | 4416/6640 [3:01:33<3:23:13, 5.48s/it] {'loss': 0.5633, 'learning_rate': 5.330189254619227e-06, 'epoch': 0.67} 67%|██████▋ | 4416/6640 [3:01:33<3:23:13, 5.48s/it] 67%|██████▋ | 4417/6640 [3:01:39<3:24:02, 5.51s/it] {'loss': 0.5814, 'learning_rate': 5.325876133904448e-06, 'epoch': 0.67} 67%|██████▋ | 4417/6640 [3:01:39<3:24:02, 5.51s/it] 67%|██████▋ | 4418/6640 [3:01:44<3:25:28, 5.55s/it] {'loss': 0.5598, 'learning_rate': 5.321564125505768e-06, 'epoch': 0.67} 67%|██████▋ | 4418/6640 [3:01:44<3:25:28, 5.55s/it] 67%|██████▋ | 4419/6640 [3:01:50<3:23:29, 5.50s/it] {'loss': 0.5724, 'learning_rate': 5.317253230449315e-06, 'epoch': 0.67} 67%|██████▋ | 4419/6640 [3:01:50<3:23:29, 5.50s/it] 67%|██████▋ | 4420/6640 [3:01:55<3:23:10, 5.49s/it] {'loss': 0.5851, 'learning_rate': 5.312943449760975e-06, 'epoch': 0.67} 67%|██████▋ | 4420/6640 [3:01:55<3:23:10, 5.49s/it] 67%|██████▋ | 4421/6640 [3:02:01<3:21:45, 5.46s/it] {'loss': 0.5639, 'learning_rate': 5.308634784466349e-06, 'epoch': 0.67} 67%|██████▋ | 4421/6640 [3:02:01<3:21:45, 5.46s/it] 67%|██████▋ | 4422/6640 [3:02:06<3:21:50, 5.46s/it] {'loss': 0.593, 'learning_rate': 5.304327235590798e-06, 'epoch': 0.67} 67%|██████▋ | 4422/6640 [3:02:06<3:21:50, 
5.46s/it] 67%|██████▋ | 4423/6640 [3:02:11<3:19:21, 5.40s/it] {'loss': 0.5764, 'learning_rate': 5.300020804159393e-06, 'epoch': 0.67} 67%|██████▋ | 4423/6640 [3:02:11<3:19:21, 5.40s/it] 67%|██████▋ | 4424/6640 [3:02:17<3:21:20, 5.45s/it] {'loss': 0.5879, 'learning_rate': 5.295715491196951e-06, 'epoch': 0.67} 67%|██████▋ | 4424/6640 [3:02:17<3:21:20, 5.45s/it] 67%|██████▋ | 4425/6640 [3:02:22<3:20:38, 5.44s/it] {'loss': 0.5746, 'learning_rate': 5.291411297728027e-06, 'epoch': 0.67} 67%|██████▋ | 4425/6640 [3:02:22<3:20:38, 5.44s/it] 67%|██████▋ | 4426/6640 [3:02:28<3:20:44, 5.44s/it] {'loss': 0.5824, 'learning_rate': 5.287108224776891e-06, 'epoch': 0.67} 67%|██████▋ | 4426/6640 [3:02:28<3:20:44, 5.44s/it] 67%|██████▋ | 4427/6640 [3:02:33<3:20:02, 5.42s/it] {'loss': 0.5409, 'learning_rate': 5.282806273367576e-06, 'epoch': 0.67} 67%|██████▋ | 4427/6640 [3:02:33<3:20:02, 5.42s/it] 67%|██████▋ | 4428/6640 [3:02:38<3:19:37, 5.41s/it] {'loss': 0.5973, 'learning_rate': 5.2785054445238156e-06, 'epoch': 0.67} 67%|██████▋ | 4428/6640 [3:02:38<3:19:37, 5.41s/it] 67%|██████▋ | 4429/6640 [3:02:44<3:17:27, 5.36s/it] {'loss': 0.6001, 'learning_rate': 5.274205739269102e-06, 'epoch': 0.67} 67%|██████▋ | 4429/6640 [3:02:44<3:17:27, 5.36s/it] 67%|██████▋ | 4430/6640 [3:02:50<3:22:59, 5.51s/it] {'loss': 0.587, 'learning_rate': 5.269907158626639e-06, 'epoch': 0.67} 67%|██████▋ | 4430/6640 [3:02:50<3:22:59, 5.51s/it] 67%|██████▋ | 4431/6640 [3:02:55<3:21:16, 5.47s/it] {'loss': 0.5699, 'learning_rate': 5.265609703619388e-06, 'epoch': 0.67} 67%|██████▋ | 4431/6640 [3:02:55<3:21:16, 5.47s/it] 67%|██████▋ | 4432/6640 [3:03:00<3:21:37, 5.48s/it] {'loss': 0.5673, 'learning_rate': 5.2613133752700145e-06, 'epoch': 0.67} 67%|██████▋ | 4432/6640 [3:03:00<3:21:37, 5.48s/it] 67%|██████▋ | 4433/6640 [3:03:06<3:20:34, 5.45s/it] {'loss': 0.5825, 'learning_rate': 5.257018174600937e-06, 'epoch': 0.67} 67%|██████▋ | 4433/6640 [3:03:06<3:20:34, 5.45s/it] 67%|██████▋ | 4434/6640 [3:03:11<3:20:50, 5.46s/it] 
{'loss': 0.5586, 'learning_rate': 5.2527241026342965e-06, 'epoch': 0.67} 67%|██████▋ | 4434/6640 [3:03:11<3:20:50, 5.46s/it] 67%|██████▋ | 4435/6640 [3:03:17<3:22:12, 5.50s/it] {'loss': 0.5674, 'learning_rate': 5.248431160391963e-06, 'epoch': 0.67} 67%|██████▋ | 4435/6640 [3:03:17<3:22:12, 5.50s/it] 67%|██████▋ | 4436/6640 [3:03:22<3:20:07, 5.45s/it] {'loss': 0.5923, 'learning_rate': 5.24413934889555e-06, 'epoch': 0.67} 67%|██████▋ | 4436/6640 [3:03:22<3:20:07, 5.45s/it] 67%|██████▋ | 4437/6640 [3:03:28<3:24:30, 5.57s/it] {'loss': 0.5889, 'learning_rate': 5.239848669166385e-06, 'epoch': 0.67} 67%|██████▋ | 4437/6640 [3:03:28<3:24:30, 5.57s/it] 67%|██████▋ | 4438/6640 [3:03:33<3:22:03, 5.51s/it] {'loss': 0.5795, 'learning_rate': 5.235559122225542e-06, 'epoch': 0.67} 67%|██████▋ | 4438/6640 [3:03:33<3:22:03, 5.51s/it] 67%|██████▋ | 4439/6640 [3:03:39<3:20:41, 5.47s/it] {'loss': 0.5972, 'learning_rate': 5.231270709093808e-06, 'epoch': 0.67} 67%|██████▋ | 4439/6640 [3:03:39<3:20:41, 5.47s/it] 67%|██████▋ | 4440/6640 [3:03:44<3:19:20, 5.44s/it] {'loss': 0.5963, 'learning_rate': 5.226983430791722e-06, 'epoch': 0.67} 67%|██████▋ | 4440/6640 [3:03:44<3:19:20, 5.44s/it] 67%|██████▋ | 4441/6640 [3:03:49<3:17:55, 5.40s/it] {'loss': 0.5877, 'learning_rate': 5.2226972883395335e-06, 'epoch': 0.67} 67%|██████▋ | 4441/6640 [3:03:49<3:17:55, 5.40s/it] 67%|██████▋ | 4442/6640 [3:03:55<3:18:27, 5.42s/it] {'loss': 0.5673, 'learning_rate': 5.2184122827572315e-06, 'epoch': 0.67} 67%|██████▋ | 4442/6640 [3:03:55<3:18:27, 5.42s/it] 67%|██████▋ | 4443/6640 [3:04:00<3:20:02, 5.46s/it] {'loss': 0.5745, 'learning_rate': 5.214128415064536e-06, 'epoch': 0.67} 67%|██████▋ | 4443/6640 [3:04:01<3:20:02, 5.46s/it] 67%|██████▋ | 4444/6640 [3:04:06<3:19:02, 5.44s/it] {'loss': 0.5648, 'learning_rate': 5.209845686280881e-06, 'epoch': 0.67} 67%|██████▋ | 4444/6640 [3:04:06<3:19:02, 5.44s/it] 67%|██████▋ | 4445/6640 [3:04:11<3:20:26, 5.48s/it] {'loss': 0.5875, 'learning_rate': 5.205564097425458e-06, 
'epoch': 0.67} 67%|██████▋ | 4445/6640 [3:04:11<3:20:26, 5.48s/it] 67%|██████▋ | 4446/6640 [3:04:17<3:19:59, 5.47s/it] {'loss': 0.5893, 'learning_rate': 5.201283649517157e-06, 'epoch': 0.67} 67%|██████▋ | 4446/6640 [3:04:17<3:19:59, 5.47s/it] 67%|██████▋ | 4447/6640 [3:04:22<3:21:05, 5.50s/it] {'loss': 0.5895, 'learning_rate': 5.197004343574618e-06, 'epoch': 0.67} 67%|██████▋ | 4447/6640 [3:04:22<3:21:05, 5.50s/it] 67%|██████▋ | 4448/6640 [3:04:28<3:19:26, 5.46s/it] {'loss': 0.5693, 'learning_rate': 5.192726180616191e-06, 'epoch': 0.67} 67%|██████▋ | 4448/6640 [3:04:28<3:19:26, 5.46s/it] 67%|██████▋ | 4449/6640 [3:04:33<3:18:30, 5.44s/it] {'loss': 0.6033, 'learning_rate': 5.188449161659978e-06, 'epoch': 0.67} 67%|██████▋ | 4449/6640 [3:04:33<3:18:30, 5.44s/it]7 AutoResumeHook: Checking whether to suspend...2 AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... 64 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 67%|██████▋ | 4450/6640 [3:04:39<3:17:09, 5.40s/it]3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.577, 'learning_rate': 5.184173287723782e-06, 'epoch': 0.67} 67%|██████▋ | 4450/6640 [3:04:39<3:17:09, 5.40s/it] 67%|██████▋ | 4451/6640 [3:04:44<3:17:23, 5.41s/it] {'loss': 0.6074, 'learning_rate': 5.179898559825151e-06, 'epoch': 0.67} 67%|██████▋ | 4451/6640 [3:04:44<3:17:23, 5.41s/it] 67%|██████▋ | 4452/6640 [3:04:50<3:24:36, 5.61s/it] {'loss': 0.5937, 'learning_rate': 5.17562497898136e-06, 'epoch': 0.67} 67%|██████▋ | 4452/6640 [3:04:50<3:24:36, 5.61s/it] 67%|██████▋ | 4453/6640 [3:04:55<3:22:35, 5.56s/it] {'loss': 0.5729, 'learning_rate': 5.1713525462093965e-06, 'epoch': 0.67} 67%|██████▋ | 4453/6640 [3:04:55<3:22:35, 5.56s/it] 67%|██████▋ | 4454/6640 [3:05:01<3:20:03, 5.49s/it] {'loss': 0.5795, 'learning_rate': 5.167081262525997e-06, 'epoch': 0.67} 67%|██████▋ | 4454/6640 [3:05:01<3:20:03, 5.49s/it] 67%|██████▋ | 4455/6640 [3:05:06<3:20:20, 5.50s/it] {'loss': 0.5599, 'learning_rate': 5.1628111289476025e-06, 'epoch': 0.67} 67%|██████▋ | 4455/6640 [3:05:06<3:20:20, 5.50s/it] 67%|██████▋ | 4456/6640 [3:05:12<3:19:37, 5.48s/it] {'loss': 0.5851, 'learning_rate': 5.1585421464904e-06, 'epoch': 0.67} 67%|██████▋ | 4456/6640 [3:05:12<3:19:37, 5.48s/it] 67%|██████▋ | 4457/6640 [3:05:17<3:19:21, 5.48s/it] {'loss': 0.5899, 'learning_rate': 5.154274316170277e-06, 'epoch': 0.67} 67%|██████▋ | 4457/6640 [3:05:17<3:19:21, 5.48s/it] 67%|██████▋ | 4458/6640 [3:05:23<3:17:33, 5.43s/it] {'loss': 0.5889, 'learning_rate': 5.150007639002881e-06, 'epoch': 0.67} 67%|██████▋ | 4458/6640 [3:05:23<3:17:33, 5.43s/it] 67%|██████▋ | 4459/6640 [3:05:28<3:17:42, 5.44s/it] {'loss': 0.5997, 'learning_rate': 5.145742116003554e-06, 'epoch': 0.67} 67%|██████▋ | 4459/6640 [3:05:28<3:17:42, 5.44s/it] 67%|██████▋ | 4460/6640 [3:05:33<3:16:56, 5.42s/it] {'loss': 0.5943, 'learning_rate': 5.14147774818738e-06, 'epoch': 0.67} 67%|██████▋ | 4460/6640 [3:05:33<3:16:56, 5.42s/it] 67%|██████▋ | 4461/6640 [3:05:39<3:16:00, 5.40s/it] {'loss': 0.5665, 'learning_rate': 5.13721453656917e-06, 'epoch': 
0.67} 67%|██████▋ | 4461/6640 [3:05:39<3:16:00, 5.40s/it] 67%|██████▋ | 4462/6640 [3:05:44<3:15:54, 5.40s/it] {'loss': 0.5947, 'learning_rate': 5.132952482163439e-06, 'epoch': 0.67} 67%|██████▋ | 4462/6640 [3:05:44<3:15:54, 5.40s/it] 67%|██████▋ | 4463/6640 [3:05:50<3:16:29, 5.42s/it] {'loss': 0.575, 'learning_rate': 5.12869158598446e-06, 'epoch': 0.67} 67%|██████▋ | 4463/6640 [3:05:50<3:16:29, 5.42s/it] 67%|██████▋ | 4464/6640 [3:05:55<3:19:56, 5.51s/it] {'loss': 0.5825, 'learning_rate': 5.124431849046197e-06, 'epoch': 0.67} 67%|██████▋ | 4464/6640 [3:05:55<3:19:56, 5.51s/it] 67%|██████▋ | 4465/6640 [3:06:01<3:21:45, 5.57s/it] {'loss': 0.6032, 'learning_rate': 5.120173272362361e-06, 'epoch': 0.67} 67%|██████▋ | 4465/6640 [3:06:01<3:21:45, 5.57s/it] 67%|██████▋ | 4466/6640 [3:06:06<3:19:03, 5.49s/it] {'loss': 0.5827, 'learning_rate': 5.115915856946376e-06, 'epoch': 0.67} 67%|██████▋ | 4466/6640 [3:06:06<3:19:03, 5.49s/it] 67%|██████▋ | 4467/6640 [3:06:12<3:19:00, 5.49s/it] {'loss': 0.5988, 'learning_rate': 5.1116596038113955e-06, 'epoch': 0.67} 67%|██████▋ | 4467/6640 [3:06:12<3:19:00, 5.49s/it] 67%|██████▋ | 4468/6640 [3:06:17<3:17:25, 5.45s/it] {'loss': 0.5771, 'learning_rate': 5.107404513970295e-06, 'epoch': 0.67} 67%|██████▋ | 4468/6640 [3:06:17<3:17:25, 5.45s/it] 67%|██████▋ | 4469/6640 [3:06:23<3:19:23, 5.51s/it] {'loss': 0.5706, 'learning_rate': 5.103150588435663e-06, 'epoch': 0.67} 67%|██████▋ | 4469/6640 [3:06:23<3:19:23, 5.51s/it] 67%|██████▋ | 4470/6640 [3:06:28<3:19:27, 5.51s/it] {'loss': 0.6111, 'learning_rate': 5.098897828219831e-06, 'epoch': 0.67} 67%|██████▋ | 4470/6640 [3:06:28<3:19:27, 5.51s/it] 67%|██████▋ | 4471/6640 [3:06:34<3:17:06, 5.45s/it] {'loss': 0.5713, 'learning_rate': 5.094646234334828e-06, 'epoch': 0.67} 67%|██████▋ | 4471/6640 [3:06:34<3:17:06, 5.45s/it] 67%|██████▋ | 4472/6640 [3:06:39<3:18:50, 5.50s/it] {'loss': 0.5881, 'learning_rate': 5.090395807792437e-06, 'epoch': 0.67} 67%|██████▋ | 4472/6640 [3:06:39<3:18:50, 5.50s/it] 
67%|██████▋ | 4473/6640 [3:06:45<3:20:19, 5.55s/it] {'loss': 0.5567, 'learning_rate': 5.08614654960413e-06, 'epoch': 0.67} 67%|██████▋ | 4473/6640 [3:06:45<3:20:19, 5.55s/it] 67%|██████▋ | 4474/6640 [3:06:51<3:20:16, 5.55s/it] {'loss': 0.5834, 'learning_rate': 5.0818984607811206e-06, 'epoch': 0.67} 67%|██████▋ | 4474/6640 [3:06:51<3:20:16, 5.55s/it] 67%|██████▋ | 4475/6640 [3:06:56<3:19:21, 5.53s/it] {'loss': 0.5978, 'learning_rate': 5.0776515423343445e-06, 'epoch': 0.67} 67%|██████▋ | 4475/6640 [3:06:56<3:19:21, 5.53s/it] 67%|██████▋ | 4476/6640 [3:07:01<3:18:12, 5.50s/it] {'loss': 0.5586, 'learning_rate': 5.073405795274451e-06, 'epoch': 0.67} 67%|██████▋ | 4476/6640 [3:07:01<3:18:12, 5.50s/it] 67%|██████▋ | 4477/6640 [3:07:07<3:16:03, 5.44s/it] {'loss': 0.6017, 'learning_rate': 5.069161220611818e-06, 'epoch': 0.67} 67%|██████▋ | 4477/6640 [3:07:07<3:16:03, 5.44s/it] 67%|██████▋ | 4478/6640 [3:07:12<3:18:41, 5.51s/it] {'loss': 0.5678, 'learning_rate': 5.064917819356532e-06, 'epoch': 0.67} 67%|██████▋ | 4478/6640 [3:07:12<3:18:41, 5.51s/it] 67%|██████▋ | 4479/6640 [3:07:18<3:17:55, 5.50s/it] {'loss': 0.5946, 'learning_rate': 5.060675592518419e-06, 'epoch': 0.67} 67%|██████▋ | 4479/6640 [3:07:18<3:17:55, 5.50s/it] 67%|██████▋ | 4480/6640 [3:07:23<3:17:49, 5.50s/it] {'loss': 0.5829, 'learning_rate': 5.0564345411070025e-06, 'epoch': 0.67} 67%|██████▋ | 4480/6640 [3:07:23<3:17:49, 5.50s/it] 67%|██████▋ | 4481/6640 [3:07:29<3:18:43, 5.52s/it] {'loss': 0.5674, 'learning_rate': 5.052194666131553e-06, 'epoch': 0.67} 67%|██████▋ | 4481/6640 [3:07:29<3:18:43, 5.52s/it] 68%|██████▊ | 4482/6640 [3:07:34<3:16:24, 5.46s/it] {'loss': 0.5994, 'learning_rate': 5.047955968601037e-06, 'epoch': 0.68} 68%|██████▊ | 4482/6640 [3:07:34<3:16:24, 5.46s/it] 68%|██████▊ | 4483/6640 [3:07:40<3:17:57, 5.51s/it] {'loss': 0.5868, 'learning_rate': 5.043718449524153e-06, 'epoch': 0.68} 68%|██████▊ | 4483/6640 [3:07:40<3:17:57, 5.51s/it] 68%|██████▊ | 4484/6640 [3:07:45<3:15:31, 5.44s/it] {'loss': 
0.5672, 'learning_rate': 5.0394821099093195e-06, 'epoch': 0.68} 68%|██████▊ | 4484/6640 [3:07:45<3:15:31, 5.44s/it] 68%|██████▊ | 4485/6640 [3:07:50<3:13:52, 5.40s/it] {'loss': 0.5647, 'learning_rate': 5.03524695076467e-06, 'epoch': 0.68} 68%|██████▊ | 4485/6640 [3:07:50<3:13:52, 5.40s/it] 68%|██████▊ | 4486/6640 [3:07:56<3:15:42, 5.45s/it] {'loss': 0.5896, 'learning_rate': 5.031012973098062e-06, 'epoch': 0.68} 68%|██████▊ | 4486/6640 [3:07:56<3:15:42, 5.45s/it] 68%|██████▊ | 4487/6640 [3:08:01<3:14:23, 5.42s/it] {'loss': 0.5511, 'learning_rate': 5.026780177917061e-06, 'epoch': 0.68} 68%|██████▊ | 4487/6640 [3:08:01<3:14:23, 5.42s/it] 68%|██████▊ | 4488/6640 [3:08:07<3:14:02, 5.41s/it] {'loss': 0.5827, 'learning_rate': 5.022548566228968e-06, 'epoch': 0.68} 68%|██████▊ | 4488/6640 [3:08:07<3:14:02, 5.41s/it] 68%|██████▊ | 4489/6640 [3:08:12<3:15:25, 5.45s/it] {'loss': 0.5711, 'learning_rate': 5.018318139040781e-06, 'epoch': 0.68} 68%|██████▊ | 4489/6640 [3:08:12<3:15:25, 5.45s/it] 68%|██████▊ | 4490/6640 [3:08:18<3:13:39, 5.40s/it] {'loss': 0.5736, 'learning_rate': 5.014088897359242e-06, 'epoch': 0.68} 68%|██████▊ | 4490/6640 [3:08:18<3:13:39, 5.40s/it] 68%|██████▊ | 4491/6640 [3:08:23<3:15:00, 5.44s/it] {'loss': 0.597, 'learning_rate': 5.009860842190787e-06, 'epoch': 0.68} 68%|██████▊ | 4491/6640 [3:08:23<3:15:00, 5.44s/it] 68%|██████▊ | 4492/6640 [3:08:29<3:14:30, 5.43s/it] {'loss': 0.5719, 'learning_rate': 5.005633974541584e-06, 'epoch': 0.68} 68%|██████▊ | 4492/6640 [3:08:29<3:14:30, 5.43s/it] 68%|██████▊ | 4493/6640 [3:08:34<3:15:00, 5.45s/it] {'loss': 0.5708, 'learning_rate': 5.001408295417512e-06, 'epoch': 0.68} 68%|██████▊ | 4493/6640 [3:08:34<3:15:00, 5.45s/it] 68%|██████▊ | 4494/6640 [3:08:40<3:17:10, 5.51s/it] {'loss': 0.5779, 'learning_rate': 4.997183805824171e-06, 'epoch': 0.68} 68%|██████▊ | 4494/6640 [3:08:40<3:17:10, 5.51s/it] 68%|██████▊ | 4495/6640 [3:08:45<3:15:10, 5.46s/it] {'loss': 0.582, 'learning_rate': 4.992960506766881e-06, 'epoch': 0.68} 
68%|██████▊ | 4495/6640 [3:08:45<3:15:10, 5.46s/it] 68%|██████▊ | 4496/6640 [3:08:50<3:13:45, 5.42s/it] {'loss': 0.5879, 'learning_rate': 4.9887383992506636e-06, 'epoch': 0.68} 68%|██████▊ | 4496/6640 [3:08:50<3:13:45, 5.42s/it] 68%|██████▊ | 4497/6640 [3:08:56<3:14:35, 5.45s/it] {'loss': 0.5905, 'learning_rate': 4.984517484280275e-06, 'epoch': 0.68} 68%|██████▊ | 4497/6640 [3:08:56<3:14:35, 5.45s/it] 68%|██████▊ | 4498/6640 [3:09:01<3:12:27, 5.39s/it] {'loss': 0.5859, 'learning_rate': 4.980297762860171e-06, 'epoch': 0.68} 68%|██████▊ | 4498/6640 [3:09:01<3:12:27, 5.39s/it] 68%|██████▊ | 4499/6640 [3:09:07<3:12:39, 5.40s/it] {'loss': 0.6042, 'learning_rate': 4.976079235994545e-06, 'epoch': 0.68} 68%|██████▊ | 4499/6640 [3:09:07<3:12:39, 5.40s/it]7 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 68%|██████▊ | 4500/6640 [3:09:12<3:14:43, 5.46s/it]1 AutoResumeHook: Checking whether to suspend... {'loss': 0.5977, 'learning_rate': 4.971861904687283e-06, 'epoch': 0.68} 68%|██████▊ | 4500/6640 [3:09:12<3:14:43, 5.46s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 68%|██████▊ | 4501/6640 [3:09:52<9:25:09, 15.85s/it] {'loss': 0.5759, 'learning_rate': 4.967645769942e-06, 'epoch': 0.68} 68%|██████▊ | 4501/6640 [3:09:52<9:25:09, 15.85s/it] 68%|██████▊ | 4502/6640 [3:09:58<7:33:39, 12.73s/it] {'loss': 0.615, 'learning_rate': 4.963430832762023e-06, 'epoch': 0.68} 68%|██████▊ | 4502/6640 [3:09:58<7:33:39, 12.73s/it] 68%|██████▊ | 4503/6640 [3:10:03<6:15:50, 10.55s/it] {'loss': 0.5859, 'learning_rate': 4.959217094150393e-06, 'epoch': 0.68} 68%|██████▊ | 4503/6640 [3:10:03<6:15:50, 10.55s/it] 68%|██████▊ | 4504/6640 [3:10:08<5:18:50, 8.96s/it] {'loss': 0.569, 'learning_rate': 4.955004555109872e-06, 'epoch': 0.68} 68%|██████▊ | 4504/6640 [3:10:08<5:18:50, 8.96s/it] 68%|██████▊ | 4505/6640 [3:10:14<4:40:45, 7.89s/it] {'loss': 0.5588, 'learning_rate': 4.950793216642923e-06, 'epoch': 0.68} 68%|██████▊ | 4505/6640 [3:10:14<4:40:45, 7.89s/it] 68%|██████▊ | 4506/6640 [3:10:19<4:14:23, 7.15s/it] {'loss': 0.5626, 'learning_rate': 4.946583079751735e-06, 'epoch': 0.68} 68%|██████▊ | 4506/6640 [3:10:19<4:14:23, 7.15s/it] 68%|██████▊ | 4507/6640 [3:10:25<3:57:28, 6.68s/it] {'loss': 0.583, 'learning_rate': 4.942374145438209e-06, 'epoch': 0.68} 68%|██████▊ | 4507/6640 [3:10:25<3:57:28, 6.68s/it] 68%|██████▊ | 4508/6640 [3:10:30<3:42:39, 6.27s/it] {'loss': 0.5635, 'learning_rate': 4.938166414703958e-06, 'epoch': 0.68} 68%|██████▊ | 4508/6640 [3:10:30<3:42:39, 6.27s/it] 68%|██████▊ | 4509/6640 [3:10:36<3:34:14, 6.03s/it] {'loss': 0.5831, 'learning_rate': 4.933959888550311e-06, 'epoch': 0.68} 68%|██████▊ | 4509/6640 [3:10:36<3:34:14, 6.03s/it] 68%|██████▊ | 4510/6640 [3:10:41<3:27:00, 5.83s/it] {'loss': 0.5847, 'learning_rate': 4.929754567978303e-06, 'epoch': 0.68} 68%|██████▊ | 4510/6640 [3:10:41<3:27:00, 5.83s/it] 68%|██████▊ | 4511/6640 [3:10:47<3:25:45, 5.80s/it] {'loss': 0.5547, 'learning_rate': 4.92555045398869e-06, 
'epoch': 0.68} 68%|██████▊ | 4511/6640 [3:10:47<3:25:45, 5.80s/it] 68%|██████▊ | 4512/6640 [3:10:52<3:21:34, 5.68s/it] {'loss': 0.5536, 'learning_rate': 4.921347547581939e-06, 'epoch': 0.68} 68%|██████▊ | 4512/6640 [3:10:52<3:21:34, 5.68s/it] 68%|██████▊ | 4513/6640 [3:10:57<3:17:44, 5.58s/it] {'loss': 0.5757, 'learning_rate': 4.917145849758234e-06, 'epoch': 0.68} 68%|██████▊ | 4513/6640 [3:10:57<3:17:44, 5.58s/it] 68%|██████▊ | 4514/6640 [3:11:03<3:13:56, 5.47s/it] {'loss': 0.5569, 'learning_rate': 4.912945361517456e-06, 'epoch': 0.68} 68%|██████▊ | 4514/6640 [3:11:03<3:13:56, 5.47s/it] 68%|██████▊ | 4515/6640 [3:11:08<3:13:51, 5.47s/it] {'loss': 0.5882, 'learning_rate': 4.908746083859214e-06, 'epoch': 0.68} 68%|██████▊ | 4515/6640 [3:11:08<3:13:51, 5.47s/it] 68%|██████▊ | 4516/6640 [3:11:14<3:16:23, 5.55s/it] {'loss': 0.5723, 'learning_rate': 4.904548017782823e-06, 'epoch': 0.68} 68%|██████▊ | 4516/6640 [3:11:14<3:16:23, 5.55s/it] 68%|██████▊ | 4517/6640 [3:11:19<3:14:45, 5.50s/it] {'loss': 0.5795, 'learning_rate': 4.90035116428731e-06, 'epoch': 0.68} 68%|██████▊ | 4517/6640 [3:11:19<3:14:45, 5.50s/it] 68%|██████▊ | 4518/6640 [3:11:25<3:13:30, 5.47s/it] {'loss': 0.5804, 'learning_rate': 4.896155524371419e-06, 'epoch': 0.68} 68%|██████▊ | 4518/6640 [3:11:25<3:13:30, 5.47s/it] 68%|██████▊ | 4519/6640 [3:11:30<3:12:09, 5.44s/it] {'loss': 0.5878, 'learning_rate': 4.891961099033589e-06, 'epoch': 0.68} 68%|██████▊ | 4519/6640 [3:11:30<3:12:09, 5.44s/it] 68%|██████▊ | 4520/6640 [3:11:35<3:11:24, 5.42s/it] {'loss': 0.5822, 'learning_rate': 4.887767889271987e-06, 'epoch': 0.68} 68%|██████▊ | 4520/6640 [3:11:35<3:11:24, 5.42s/it] 68%|██████▊ | 4521/6640 [3:11:41<3:12:03, 5.44s/it] {'loss': 0.5851, 'learning_rate': 4.883575896084484e-06, 'epoch': 0.68} 68%|██████▊ | 4521/6640 [3:11:43<3:12:03, 5.44s/it] 68%|██████▊ | 4522/6640 [3:11:48<3:29:01, 5.92s/it] {'loss': 0.5971, 'learning_rate': 4.879385120468665e-06, 'epoch': 0.68} 68%|██████▊ | 4522/6640 [3:11:48<3:29:01, 
5.92s/it] 68%|██████▊ | 4523/6640 [3:11:53<3:24:51, 5.81s/it] {'loss': 0.5934, 'learning_rate': 4.875195563421815e-06, 'epoch': 0.68} 68%|██████▊ | 4523/6640 [3:11:53<3:24:51, 5.81s/it] 68%|██████▊ | 4524/6640 [3:11:59<3:19:51, 5.67s/it] {'loss': 0.5664, 'learning_rate': 4.87100722594094e-06, 'epoch': 0.68} 68%|██████▊ | 4524/6640 [3:11:59<3:19:51, 5.67s/it] 68%|██████▊ | 4525/6640 [3:12:04<3:17:59, 5.62s/it] {'loss': 0.5954, 'learning_rate': 4.866820109022752e-06, 'epoch': 0.68} 68%|██████▊ | 4525/6640 [3:12:04<3:17:59, 5.62s/it] 68%|██████▊ | 4526/6640 [3:12:10<3:18:21, 5.63s/it] {'loss': 0.5847, 'learning_rate': 4.862634213663672e-06, 'epoch': 0.68} 68%|██████▊ | 4526/6640 [3:12:10<3:18:21, 5.63s/it] 68%|██████▊ | 4527/6640 [3:12:15<3:14:57, 5.54s/it] {'loss': 0.593, 'learning_rate': 4.8584495408598355e-06, 'epoch': 0.68} 68%|██████▊ | 4527/6640 [3:12:15<3:14:57, 5.54s/it] 68%|██████▊ | 4528/6640 [3:12:21<3:15:56, 5.57s/it] {'loss': 0.5899, 'learning_rate': 4.854266091607074e-06, 'epoch': 0.68} 68%|██████▊ | 4528/6640 [3:12:21<3:15:56, 5.57s/it] 68%|██████▊ | 4529/6640 [3:12:27<3:17:34, 5.62s/it] {'loss': 0.5792, 'learning_rate': 4.85008386690094e-06, 'epoch': 0.68} 68%|██████▊ | 4529/6640 [3:12:27<3:17:34, 5.62s/it] 68%|██████▊ | 4530/6640 [3:12:32<3:14:39, 5.54s/it] {'loss': 0.589, 'learning_rate': 4.845902867736692e-06, 'epoch': 0.68} 68%|██████▊ | 4530/6640 [3:12:32<3:14:39, 5.54s/it] 68%|██████▊ | 4531/6640 [3:12:37<3:12:24, 5.47s/it] {'loss': 0.5906, 'learning_rate': 4.841723095109297e-06, 'epoch': 0.68} 68%|██████▊ | 4531/6640 [3:12:37<3:12:24, 5.47s/it] 68%|██████▊ | 4532/6640 [3:12:43<3:14:18, 5.53s/it] {'loss': 0.5851, 'learning_rate': 4.8375445500134245e-06, 'epoch': 0.68} 68%|██████▊ | 4532/6640 [3:12:43<3:14:18, 5.53s/it] 68%|██████▊ | 4533/6640 [3:12:48<3:13:00, 5.50s/it] {'loss': 0.5732, 'learning_rate': 4.833367233443457e-06, 'epoch': 0.68} 68%|██████▊ | 4533/6640 [3:12:48<3:13:00, 5.50s/it] 68%|██████▊ | 4534/6640 [3:12:54<3:11:35, 5.46s/it] 
{'loss': 0.5845, 'learning_rate': 4.829191146393485e-06, 'epoch': 0.68} 68%|██████▊ | 4534/6640 [3:12:54<3:11:35, 5.46s/it] 68%|██████▊ | 4535/6640 [3:12:59<3:10:34, 5.43s/it] {'loss': 0.5756, 'learning_rate': 4.8250162898573046e-06, 'epoch': 0.68} 68%|██████▊ | 4535/6640 [3:12:59<3:10:34, 5.43s/it] 68%|██████▊ | 4536/6640 [3:13:05<3:10:06, 5.42s/it] {'loss': 0.5735, 'learning_rate': 4.820842664828425e-06, 'epoch': 0.68} 68%|██████▊ | 4536/6640 [3:13:05<3:10:06, 5.42s/it] 68%|██████▊ | 4537/6640 [3:13:10<3:09:31, 5.41s/it] {'loss': 0.578, 'learning_rate': 4.8166702723000484e-06, 'epoch': 0.68} 68%|██████▊ | 4537/6640 [3:13:10<3:09:31, 5.41s/it] 68%|██████▊ | 4538/6640 [3:13:15<3:08:31, 5.38s/it] {'loss': 0.5826, 'learning_rate': 4.812499113265095e-06, 'epoch': 0.68} 68%|██████▊ | 4538/6640 [3:13:15<3:08:31, 5.38s/it] 68%|██████▊ | 4539/6640 [3:13:21<3:11:53, 5.48s/it] {'loss': 0.6048, 'learning_rate': 4.80832918871619e-06, 'epoch': 0.68} 68%|██████▊ | 4539/6640 [3:13:21<3:11:53, 5.48s/it] 68%|██████▊ | 4540/6640 [3:13:26<3:11:22, 5.47s/it] {'loss': 0.5905, 'learning_rate': 4.804160499645667e-06, 'epoch': 0.68} 68%|██████▊ | 4540/6640 [3:13:26<3:11:22, 5.47s/it] 68%|██████▊ | 4541/6640 [3:13:32<3:08:57, 5.40s/it] {'loss': 0.5629, 'learning_rate': 4.799993047045556e-06, 'epoch': 0.68} 68%|██████▊ | 4541/6640 [3:13:32<3:08:57, 5.40s/it] 68%|██████▊ | 4542/6640 [3:13:37<3:10:19, 5.44s/it] {'loss': 0.5704, 'learning_rate': 4.7958268319075995e-06, 'epoch': 0.68} 68%|██████▊ | 4542/6640 [3:13:37<3:10:19, 5.44s/it] 68%|██████▊ | 4543/6640 [3:13:43<3:12:18, 5.50s/it] {'loss': 0.5846, 'learning_rate': 4.791661855223249e-06, 'epoch': 0.68} 68%|██████▊ | 4543/6640 [3:13:43<3:12:18, 5.50s/it] 68%|██████▊ | 4544/6640 [3:13:48<3:12:32, 5.51s/it] {'loss': 0.5958, 'learning_rate': 4.787498117983654e-06, 'epoch': 0.68} 68%|██████▊ | 4544/6640 [3:13:48<3:12:32, 5.51s/it] 68%|██████▊ | 4545/6640 [3:13:54<3:13:23, 5.54s/it] {'loss': 0.5794, 'learning_rate': 4.783335621179675e-06, 
'epoch': 0.68} 68%|██████▊ | 4545/6640 [3:13:54<3:13:23, 5.54s/it] 68%|██████▊ | 4546/6640 [3:13:59<3:12:34, 5.52s/it] {'loss': 0.6064, 'learning_rate': 4.779174365801872e-06, 'epoch': 0.68} 68%|██████▊ | 4546/6640 [3:13:59<3:12:34, 5.52s/it] 68%|██████▊ | 4547/6640 [3:14:05<3:11:16, 5.48s/it] {'loss': 0.5934, 'learning_rate': 4.775014352840512e-06, 'epoch': 0.68} 68%|██████▊ | 4547/6640 [3:14:05<3:11:16, 5.48s/it] 68%|██████▊ | 4548/6640 [3:14:10<3:13:01, 5.54s/it] {'loss': 0.5877, 'learning_rate': 4.7708555832855685e-06, 'epoch': 0.68} 68%|██████▊ | 4548/6640 [3:14:10<3:13:01, 5.54s/it] 69%|██████▊ | 4549/6640 [3:14:16<3:10:33, 5.47s/it] {'loss': 0.5952, 'learning_rate': 4.766698058126715e-06, 'epoch': 0.69} 69%|██████▊ | 4549/6640 [3:14:16<3:10:33, 5.47s/it]6 AutoResumeHook: Checking whether to suspend... 07 AutoResumeHook: Checking whether to suspend... 21 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 69%|██████▊ | 4550/6640 [3:14:21<3:11:01, 5.48s/it]3 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6001, 'learning_rate': 4.762541778353337e-06, 'epoch': 0.69} 69%|██████▊ | 4550/6640 [3:14:21<3:11:01, 5.48s/it] 69%|██████▊ | 4551/6640 [3:14:27<3:10:38, 5.48s/it] {'loss': 0.6005, 'learning_rate': 4.758386744954507e-06, 'epoch': 0.69} 69%|██████▊ | 4551/6640 [3:14:27<3:10:38, 5.48s/it] 69%|██████▊ | 4552/6640 [3:14:32<3:11:31, 5.50s/it] {'loss': 0.5665, 'learning_rate': 4.754232958919019e-06, 'epoch': 0.69} 69%|██████▊ | 4552/6640 [3:14:32<3:11:31, 5.50s/it] 69%|██████▊ | 4553/6640 [3:14:38<3:12:39, 5.54s/it] {'loss': 0.5886, 'learning_rate': 4.7500804212353605e-06, 'epoch': 0.69} 69%|██████▊ | 4553/6640 [3:14:38<3:12:39, 5.54s/it] 69%|██████▊ | 4554/6640 [3:14:43<3:11:55, 5.52s/it] {'loss': 0.5738, 'learning_rate': 4.7459291328917275e-06, 'epoch': 0.69} 69%|██████▊ | 4554/6640 [3:14:43<3:11:55, 5.52s/it] 69%|██████▊ | 4555/6640 [3:14:49<3:12:22, 5.54s/it] {'loss': 0.5766, 'learning_rate': 4.741779094876009e-06, 'epoch': 0.69} 69%|██████▊ | 4555/6640 [3:14:49<3:12:22, 5.54s/it] 69%|██████▊ | 4556/6640 [3:14:54<3:11:24, 5.51s/it] {'loss': 0.5977, 'learning_rate': 4.7376303081758055e-06, 'epoch': 0.69} 69%|██████▊ | 4556/6640 [3:14:54<3:11:24, 5.51s/it] 69%|██████▊ | 4557/6640 [3:15:00<3:10:52, 5.50s/it] {'loss': 0.5758, 'learning_rate': 4.733482773778417e-06, 'epoch': 0.69} 69%|██████▊ | 4557/6640 [3:15:00<3:10:52, 5.50s/it] 69%|██████▊ | 4558/6640 [3:15:05<3:09:30, 5.46s/it] {'loss': 0.5657, 'learning_rate': 4.729336492670846e-06, 'epoch': 0.69} 69%|██████▊ | 4558/6640 [3:15:05<3:09:30, 5.46s/it] 69%|██████▊ | 4559/6640 [3:15:11<3:09:12, 5.46s/it] {'loss': 0.5961, 'learning_rate': 4.7251914658398e-06, 'epoch': 0.69} 69%|██████▊ | 4559/6640 [3:15:11<3:09:12, 5.46s/it] 69%|██████▊ | 4560/6640 [3:15:16<3:06:23, 5.38s/it] {'loss': 0.5858, 'learning_rate': 4.721047694271676e-06, 'epoch': 0.69} 69%|██████▊ | 4560/6640 [3:15:16<3:06:23, 5.38s/it] 69%|██████▊ | 4561/6640 [3:15:21<3:07:21, 5.41s/it] {'loss': 0.5648, 'learning_rate': 4.7169051789525856e-06, 
'epoch': 0.69} 69%|██████▊ | 4561/6640 [3:15:21<3:07:21, 5.41s/it] 69%|██████▊ | 4562/6640 [3:15:27<3:07:25, 5.41s/it] {'loss': 0.5814, 'learning_rate': 4.712763920868334e-06, 'epoch': 0.69} 69%|██████▊ | 4562/6640 [3:15:27<3:07:25, 5.41s/it] 69%|██████▊ | 4563/6640 [3:15:32<3:09:17, 5.47s/it] {'loss': 0.592, 'learning_rate': 4.708623921004435e-06, 'epoch': 0.69} 69%|██████▊ | 4563/6640 [3:15:32<3:09:17, 5.47s/it] 69%|██████▊ | 4564/6640 [3:15:38<3:07:22, 5.42s/it] {'loss': 0.5748, 'learning_rate': 4.704485180346089e-06, 'epoch': 0.69} 69%|██████▊ | 4564/6640 [3:15:38<3:07:22, 5.42s/it] 69%|██████▉ | 4565/6640 [3:15:43<3:07:42, 5.43s/it] {'loss': 0.579, 'learning_rate': 4.700347699878211e-06, 'epoch': 0.69} 69%|██████▉ | 4565/6640 [3:15:43<3:07:42, 5.43s/it] 69%|██████▉ | 4566/6640 [3:15:49<3:09:44, 5.49s/it] {'loss': 0.594, 'learning_rate': 4.6962114805854075e-06, 'epoch': 0.69} 69%|██████▉ | 4566/6640 [3:15:49<3:09:44, 5.49s/it] 69%|██████▉ | 4567/6640 [3:15:54<3:08:56, 5.47s/it] {'loss': 0.5847, 'learning_rate': 4.6920765234519895e-06, 'epoch': 0.69} 69%|██████▉ | 4567/6640 [3:15:54<3:08:56, 5.47s/it] 69%|██████▉ | 4568/6640 [3:16:00<3:07:08, 5.42s/it] {'loss': 0.5798, 'learning_rate': 4.687942829461969e-06, 'epoch': 0.69} 69%|██████▉ | 4568/6640 [3:16:00<3:07:08, 5.42s/it] 69%|██████▉ | 4569/6640 [3:16:05<3:07:21, 5.43s/it] {'loss': 0.5608, 'learning_rate': 4.683810399599047e-06, 'epoch': 0.69} 69%|██████▉ | 4569/6640 [3:16:05<3:07:21, 5.43s/it] 69%|██████▉ | 4570/6640 [3:16:10<3:06:26, 5.40s/it] {'loss': 0.5837, 'learning_rate': 4.679679234846636e-06, 'epoch': 0.69} 69%|██████▉ | 4570/6640 [3:16:10<3:06:26, 5.40s/it] 69%|██████▉ | 4571/6640 [3:16:16<3:05:11, 5.37s/it] {'loss': 0.5968, 'learning_rate': 4.67554933618784e-06, 'epoch': 0.69} 69%|██████▉ | 4571/6640 [3:16:16<3:05:11, 5.37s/it] 69%|██████▉ | 4572/6640 [3:16:21<3:05:03, 5.37s/it] {'loss': 0.5832, 'learning_rate': 4.67142070460547e-06, 'epoch': 0.69} 69%|██████▉ | 4572/6640 [3:16:21<3:05:03, 5.37s/it] 
69%|██████▉ | 4573/6640 [3:16:26<3:04:41, 5.36s/it] {'loss': 0.5822, 'learning_rate': 4.667293341082019e-06, 'epoch': 0.69} 69%|██████▉ | 4573/6640 [3:16:26<3:04:41, 5.36s/it] 69%|██████▉ | 4574/6640 [3:16:32<3:06:07, 5.41s/it] {'loss': 0.5687, 'learning_rate': 4.663167246599696e-06, 'epoch': 0.69} 69%|██████▉ | 4574/6640 [3:16:32<3:06:07, 5.41s/it] 69%|██████▉ | 4575/6640 [3:16:37<3:05:43, 5.40s/it] {'loss': 0.5716, 'learning_rate': 4.659042422140399e-06, 'epoch': 0.69} 69%|██████▉ | 4575/6640 [3:16:37<3:05:43, 5.40s/it] 69%|██████▉ | 4576/6640 [3:16:43<3:05:55, 5.40s/it] {'loss': 0.5941, 'learning_rate': 4.654918868685726e-06, 'epoch': 0.69} 69%|██████▉ | 4576/6640 [3:16:43<3:05:55, 5.40s/it] 69%|██████▉ | 4577/6640 [3:16:48<3:06:26, 5.42s/it] {'loss': 0.5964, 'learning_rate': 4.650796587216976e-06, 'epoch': 0.69} 69%|██████▉ | 4577/6640 [3:16:48<3:06:26, 5.42s/it] 69%|██████▉ | 4578/6640 [3:16:53<3:05:41, 5.40s/it] {'loss': 0.597, 'learning_rate': 4.646675578715133e-06, 'epoch': 0.69} 69%|██████▉ | 4578/6640 [3:16:53<3:05:41, 5.40s/it] 69%|██████▉ | 4579/6640 [3:16:59<3:05:51, 5.41s/it] {'loss': 0.5805, 'learning_rate': 4.642555844160892e-06, 'epoch': 0.69} 69%|██████▉ | 4579/6640 [3:16:59<3:05:51, 5.41s/it] 69%|██████▉ | 4580/6640 [3:17:04<3:05:05, 5.39s/it] {'loss': 0.565, 'learning_rate': 4.6384373845346375e-06, 'epoch': 0.69} 69%|██████▉ | 4580/6640 [3:17:04<3:05:05, 5.39s/it] 69%|██████▉ | 4581/6640 [3:17:10<3:07:56, 5.48s/it] {'loss': 0.5847, 'learning_rate': 4.634320200816457e-06, 'epoch': 0.69} 69%|██████▉ | 4581/6640 [3:17:10<3:07:56, 5.48s/it] 69%|██████▉ | 4582/6640 [3:17:15<3:06:49, 5.45s/it] {'loss': 0.5804, 'learning_rate': 4.630204293986122e-06, 'epoch': 0.69} 69%|██████▉ | 4582/6640 [3:17:15<3:06:49, 5.45s/it] 69%|██████▉ | 4583/6640 [3:17:21<3:06:52, 5.45s/it] {'loss': 0.564, 'learning_rate': 4.626089665023113e-06, 'epoch': 0.69} 69%|██████▉ | 4583/6640 [3:17:21<3:06:52, 5.45s/it] 69%|██████▉ | 4584/6640 [3:17:26<3:08:28, 5.50s/it] {'loss': 
0.587, 'learning_rate': 4.621976314906597e-06, 'epoch': 0.69} 69%|██████▉ | 4584/6640 [3:17:26<3:08:28, 5.50s/it] 69%|██████▉ | 4585/6640 [3:17:32<3:07:24, 5.47s/it] {'loss': 0.5785, 'learning_rate': 4.617864244615448e-06, 'epoch': 0.69} 69%|██████▉ | 4585/6640 [3:17:32<3:07:24, 5.47s/it] 69%|██████▉ | 4586/6640 [3:17:37<3:07:30, 5.48s/it] {'loss': 0.5765, 'learning_rate': 4.6137534551282256e-06, 'epoch': 0.69} 69%|██████▉ | 4586/6640 [3:17:37<3:07:30, 5.48s/it] 69%|██████▉ | 4587/6640 [3:17:43<3:07:43, 5.49s/it] {'loss': 0.5889, 'learning_rate': 4.609643947423184e-06, 'epoch': 0.69} 69%|██████▉ | 4587/6640 [3:17:43<3:07:43, 5.49s/it] 69%|██████▉ | 4588/6640 [3:17:48<3:07:01, 5.47s/it] {'loss': 0.5777, 'learning_rate': 4.605535722478277e-06, 'epoch': 0.69} 69%|██████▉ | 4588/6640 [3:17:48<3:07:01, 5.47s/it] 69%|██████▉ | 4589/6640 [3:17:54<3:05:11, 5.42s/it] {'loss': 0.5917, 'learning_rate': 4.601428781271154e-06, 'epoch': 0.69} 69%|██████▉ | 4589/6640 [3:17:54<3:05:11, 5.42s/it] 69%|██████▉ | 4590/6640 [3:17:59<3:04:50, 5.41s/it] {'loss': 0.5729, 'learning_rate': 4.597323124779155e-06, 'epoch': 0.69} 69%|██████▉ | 4590/6640 [3:17:59<3:04:50, 5.41s/it] 69%|██████▉ | 4591/6640 [3:18:04<3:06:03, 5.45s/it] {'loss': 0.5581, 'learning_rate': 4.593218753979322e-06, 'epoch': 0.69} 69%|██████▉ | 4591/6640 [3:18:04<3:06:03, 5.45s/it] 69%|██████▉ | 4592/6640 [3:18:10<3:05:49, 5.44s/it] {'loss': 0.609, 'learning_rate': 4.589115669848372e-06, 'epoch': 0.69} 69%|██████▉ | 4592/6640 [3:18:10<3:05:49, 5.44s/it] 69%|██████▉ | 4593/6640 [3:18:15<3:05:39, 5.44s/it] {'loss': 0.5628, 'learning_rate': 4.5850138733627435e-06, 'epoch': 0.69} 69%|██████▉ | 4593/6640 [3:18:15<3:05:39, 5.44s/it] 69%|██████▉ | 4594/6640 [3:18:21<3:04:29, 5.41s/it] {'loss': 0.5808, 'learning_rate': 4.580913365498545e-06, 'epoch': 0.69} 69%|██████▉ | 4594/6640 [3:18:21<3:04:29, 5.41s/it] 69%|██████▉ | 4595/6640 [3:18:26<3:04:52, 5.42s/it] {'loss': 0.5938, 'learning_rate': 4.576814147231594e-06, 'epoch': 0.69} 
69%|██████▉ | 4595/6640 [3:18:26<3:04:52, 5.42s/it] 69%|██████▉ | 4596/6640 [3:18:31<3:04:13, 5.41s/it] {'loss': 0.6034, 'learning_rate': 4.572716219537386e-06, 'epoch': 0.69} 69%|██████▉ | 4596/6640 [3:18:31<3:04:13, 5.41s/it] 69%|██████▉ | 4597/6640 [3:18:37<3:02:35, 5.36s/it] {'loss': 0.5679, 'learning_rate': 4.568619583391123e-06, 'epoch': 0.69} 69%|██████▉ | 4597/6640 [3:18:37<3:02:35, 5.36s/it] 69%|██████▉ | 4598/6640 [3:18:42<3:04:53, 5.43s/it] {'loss': 0.5759, 'learning_rate': 4.564524239767693e-06, 'epoch': 0.69} 69%|██████▉ | 4598/6640 [3:18:42<3:04:53, 5.43s/it] 69%|██████▉ | 4599/6640 [3:18:48<3:05:44, 5.46s/it] {'loss': 0.5593, 'learning_rate': 4.560430189641679e-06, 'epoch': 0.69} 69%|██████▉ | 4599/6640 [3:18:48<3:05:44, 5.46s/it]1 AutoResumeHook: Checking whether to suspend... 70 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 24 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 69%|██████▉ | 4600/6640 [3:18:53<3:07:24, 5.51s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.586, 'learning_rate': 4.556337433987359e-06, 'epoch': 0.69} 69%|██████▉ | 4600/6640 [3:18:53<3:07:24, 5.51s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 69%|██████▉ | 4601/6640 [3:19:20<6:42:35, 11.85s/it] {'loss': 0.575, 'learning_rate': 4.552245973778686e-06, 'epoch': 0.69} 69%|██████▉ | 4601/6640 [3:19:20<6:42:35, 11.85s/it] 69%|██████▉ | 4602/6640 [3:19:25<5:34:24, 9.84s/it] {'loss': 0.5974, 'learning_rate': 4.548155809989336e-06, 'epoch': 0.69} 69%|██████▉ | 4602/6640 [3:19:25<5:34:24, 9.84s/it] 69%|██████▉ | 4603/6640 [3:19:31<4:48:54, 8.51s/it] {'loss': 0.6085, 'learning_rate': 4.544066943592645e-06, 'epoch': 0.69} 69%|██████▉ | 4603/6640 [3:19:31<4:48:54, 8.51s/it] 69%|██████▉ | 4604/6640 [3:19:36<4:15:40, 7.53s/it] {'loss': 0.557, 'learning_rate': 4.539979375561659e-06, 'epoch': 0.69} 69%|██████▉ | 4604/6640 [3:19:36<4:15:40, 7.53s/it] 69%|██████▉ | 4605/6640 [3:19:41<3:52:53, 6.87s/it] {'loss': 0.5861, 'learning_rate': 4.535893106869107e-06, 'epoch': 0.69} 69%|██████▉ | 4605/6640 [3:19:41<3:52:53, 6.87s/it] 69%|██████▉ | 4606/6640 [3:19:47<3:36:53, 6.40s/it] {'loss': 0.5794, 'learning_rate': 4.53180813848741e-06, 'epoch': 0.69} 69%|██████▉ | 4606/6640 [3:19:47<3:36:53, 6.40s/it] 69%|██████▉ | 4607/6640 [3:19:52<3:26:39, 6.10s/it] {'loss': 0.5672, 'learning_rate': 4.527724471388685e-06, 'epoch': 0.69} 69%|██████▉ | 4607/6640 [3:19:52<3:26:39, 6.10s/it] 69%|██████▉ | 4608/6640 [3:19:57<3:18:32, 5.86s/it] {'loss': 0.5738, 'learning_rate': 4.523642106544732e-06, 'epoch': 0.69} 69%|██████▉ | 4608/6640 [3:19:57<3:18:32, 5.86s/it] 69%|██████▉ | 4609/6640 [3:20:03<3:13:07, 5.71s/it] {'loss': 0.6014, 'learning_rate': 4.519561044927051e-06, 'epoch': 0.69} 69%|██████▉ | 4609/6640 [3:20:03<3:13:07, 5.71s/it] 69%|██████▉ | 4610/6640 [3:20:08<3:10:47, 5.64s/it] {'loss': 0.5713, 'learning_rate': 4.515481287506811e-06, 'epoch': 0.69} 69%|██████▉ | 4610/6640 [3:20:08<3:10:47, 5.64s/it] 69%|██████▉ | 4611/6640 [3:20:13<3:08:28, 5.57s/it] {'loss': 0.5873, 'learning_rate': 4.511402835254902e-06, 
'epoch': 0.69} 69%|██████▉ | 4611/6640 [3:20:14<3:08:28, 5.57s/it] 69%|██████▉ | 4612/6640 [3:20:19<3:07:09, 5.54s/it] {'loss': 0.5749, 'learning_rate': 4.507325689141874e-06, 'epoch': 0.69} 69%|██████▉ | 4612/6640 [3:20:19<3:07:09, 5.54s/it] 69%|██████▉ | 4613/6640 [3:20:25<3:08:01, 5.57s/it] {'loss': 0.5631, 'learning_rate': 4.5032498501379865e-06, 'epoch': 0.69} 69%|██████▉ | 4613/6640 [3:20:25<3:08:01, 5.57s/it] 69%|██████▉ | 4614/6640 [3:20:30<3:09:37, 5.62s/it] {'loss': 0.6044, 'learning_rate': 4.499175319213172e-06, 'epoch': 0.69} 69%|██████▉ | 4614/6640 [3:20:30<3:09:37, 5.62s/it] 70%|██████▉ | 4615/6640 [3:20:36<3:05:55, 5.51s/it] {'loss': 0.5749, 'learning_rate': 4.495102097337062e-06, 'epoch': 0.7} 70%|██████▉ | 4615/6640 [3:20:36<3:05:55, 5.51s/it] 70%|██████▉ | 4616/6640 [3:20:41<3:06:21, 5.52s/it] {'loss': 0.5787, 'learning_rate': 4.491030185478976e-06, 'epoch': 0.7} 70%|██████▉ | 4616/6640 [3:20:41<3:06:21, 5.52s/it] 70%|██████▉ | 4617/6640 [3:20:47<3:09:42, 5.63s/it] {'loss': 0.5754, 'learning_rate': 4.486959584607919e-06, 'epoch': 0.7} 70%|██████▉ | 4617/6640 [3:20:47<3:09:42, 5.63s/it] 70%|██████▉ | 4618/6640 [3:20:52<3:06:08, 5.52s/it] {'loss': 0.603, 'learning_rate': 4.482890295692589e-06, 'epoch': 0.7} 70%|██████▉ | 4618/6640 [3:20:52<3:06:08, 5.52s/it] 70%|██████▉ | 4619/6640 [3:20:58<3:08:48, 5.61s/it] {'loss': 0.5818, 'learning_rate': 4.4788223197013555e-06, 'epoch': 0.7} 70%|██████▉ | 4619/6640 [3:20:58<3:08:48, 5.61s/it] 70%|██████▉ | 4620/6640 [3:21:03<3:06:34, 5.54s/it] {'loss': 0.5852, 'learning_rate': 4.474755657602303e-06, 'epoch': 0.7} 70%|██████▉ | 4620/6640 [3:21:03<3:06:34, 5.54s/it] 70%|██████▉ | 4621/6640 [3:21:09<3:07:23, 5.57s/it] {'loss': 0.5797, 'learning_rate': 4.470690310363177e-06, 'epoch': 0.7} 70%|██████▉ | 4621/6640 [3:21:09<3:07:23, 5.57s/it] 70%|██████▉ | 4622/6640 [3:21:15<3:05:51, 5.53s/it] {'loss': 0.5594, 'learning_rate': 4.466626278951429e-06, 'epoch': 0.7} 70%|██████▉ | 4622/6640 [3:21:15<3:05:51, 5.53s/it] 
70%|██████▉ | 4623/6640 [3:21:20<3:04:13, 5.48s/it] {'loss': 0.6131, 'learning_rate': 4.4625635643341806e-06, 'epoch': 0.7} 70%|██████▉ | 4623/6640 [3:21:20<3:04:13, 5.48s/it] 70%|██████▉ | 4624/6640 [3:21:25<3:04:30, 5.49s/it] {'loss': 0.5982, 'learning_rate': 4.458502167478254e-06, 'epoch': 0.7} 70%|██████▉ | 4624/6640 [3:21:25<3:04:30, 5.49s/it] 70%|██████▉ | 4625/6640 [3:21:31<3:04:26, 5.49s/it] {'loss': 0.5783, 'learning_rate': 4.454442089350151e-06, 'epoch': 0.7} 70%|██████▉ | 4625/6640 [3:21:31<3:04:26, 5.49s/it] 70%|██████▉ | 4626/6640 [3:21:36<3:03:57, 5.48s/it] {'loss': 0.6005, 'learning_rate': 4.4503833309160645e-06, 'epoch': 0.7} 70%|██████▉ | 4626/6640 [3:21:36<3:03:57, 5.48s/it] 70%|██████▉ | 4627/6640 [3:21:42<3:03:53, 5.48s/it] {'loss': 0.5718, 'learning_rate': 4.44632589314187e-06, 'epoch': 0.7} 70%|██████▉ | 4627/6640 [3:21:42<3:03:53, 5.48s/it] 70%|██████▉ | 4628/6640 [3:21:49<3:20:59, 5.99s/it] {'loss': 0.5836, 'learning_rate': 4.442269776993121e-06, 'epoch': 0.7} 70%|██████▉ | 4628/6640 [3:21:49<3:20:59, 5.99s/it] 70%|██████▉ | 4629/6640 [3:21:55<3:16:31, 5.86s/it] {'loss': 0.5891, 'learning_rate': 4.438214983435076e-06, 'epoch': 0.7} 70%|██████▉ | 4629/6640 [3:21:55<3:16:31, 5.86s/it] 70%|██████▉ | 4630/6640 [3:22:00<3:13:35, 5.78s/it] {'loss': 0.5717, 'learning_rate': 4.434161513432659e-06, 'epoch': 0.7} 70%|██████▉ | 4630/6640 [3:22:00<3:13:35, 5.78s/it] 70%|██████▉ | 4631/6640 [3:22:06<3:09:45, 5.67s/it] {'loss': 0.5679, 'learning_rate': 4.430109367950488e-06, 'epoch': 0.7} 70%|██████▉ | 4631/6640 [3:22:06<3:09:45, 5.67s/it] 70%|██████▉ | 4632/6640 [3:22:11<3:07:44, 5.61s/it] {'loss': 0.573, 'learning_rate': 4.4260585479528675e-06, 'epoch': 0.7} 70%|██████▉ | 4632/6640 [3:22:11<3:07:44, 5.61s/it] 70%|██████▉ | 4633/6640 [3:22:17<3:08:20, 5.63s/it] {'loss': 0.5764, 'learning_rate': 4.422009054403782e-06, 'epoch': 0.7} 70%|██████▉ | 4633/6640 [3:22:17<3:08:20, 5.63s/it] 70%|██████▉ | 4634/6640 [3:22:22<3:05:01, 5.53s/it] {'loss': 0.5812, 
'learning_rate': 4.417960888266908e-06, 'epoch': 0.7} 70%|██████▉ | 4634/6640 [3:22:22<3:05:01, 5.53s/it] 70%|██████▉ | 4635/6640 [3:22:27<3:03:20, 5.49s/it] {'loss': 0.5931, 'learning_rate': 4.413914050505591e-06, 'epoch': 0.7} 70%|██████▉ | 4635/6640 [3:22:27<3:03:20, 5.49s/it] 70%|██████▉ | 4636/6640 [3:22:33<3:02:09, 5.45s/it] {'loss': 0.5895, 'learning_rate': 4.409868542082878e-06, 'epoch': 0.7} 70%|██████▉ | 4636/6640 [3:22:33<3:02:09, 5.45s/it] 70%|██████▉ | 4637/6640 [3:22:38<3:00:27, 5.41s/it] {'loss': 0.5818, 'learning_rate': 4.4058243639614804e-06, 'epoch': 0.7} 70%|██████▉ | 4637/6640 [3:22:38<3:00:27, 5.41s/it] 70%|██████▉ | 4638/6640 [3:22:44<3:01:33, 5.44s/it] {'loss': 0.5767, 'learning_rate': 4.401781517103819e-06, 'epoch': 0.7} 70%|██████▉ | 4638/6640 [3:22:44<3:01:33, 5.44s/it] 70%|██████▉ | 4639/6640 [3:22:49<3:01:52, 5.45s/it] {'loss': 0.6039, 'learning_rate': 4.397740002471973e-06, 'epoch': 0.7} 70%|██████▉ | 4639/6640 [3:22:49<3:01:52, 5.45s/it] 70%|██████▉ | 4640/6640 [3:22:55<3:03:20, 5.50s/it] {'loss': 0.5633, 'learning_rate': 4.393699821027716e-06, 'epoch': 0.7} 70%|██████▉ | 4640/6640 [3:22:55<3:03:20, 5.50s/it] 70%|██████▉ | 4641/6640 [3:23:00<3:03:22, 5.50s/it] {'loss': 0.5752, 'learning_rate': 4.389660973732503e-06, 'epoch': 0.7} 70%|██████▉ | 4641/6640 [3:23:00<3:03:22, 5.50s/it] 70%|██████▉ | 4642/6640 [3:23:06<3:02:28, 5.48s/it] {'loss': 0.5907, 'learning_rate': 4.385623461547473e-06, 'epoch': 0.7} 70%|██████▉ | 4642/6640 [3:23:06<3:02:28, 5.48s/it] 70%|██████▉ | 4643/6640 [3:23:11<3:00:07, 5.41s/it] {'loss': 0.5604, 'learning_rate': 4.38158728543345e-06, 'epoch': 0.7} 70%|██████▉ | 4643/6640 [3:23:11<3:00:07, 5.41s/it] 70%|██████▉ | 4644/6640 [3:23:16<2:59:27, 5.39s/it] {'loss': 0.6062, 'learning_rate': 4.377552446350925e-06, 'epoch': 0.7} 70%|██████▉ | 4644/6640 [3:23:16<2:59:27, 5.39s/it] 70%|██████▉ | 4645/6640 [3:23:21<2:57:16, 5.33s/it] {'loss': 0.5931, 'learning_rate': 4.37351894526009e-06, 'epoch': 0.7} 70%|██████▉ | 
4645/6640 [3:23:21<2:57:16, 5.33s/it] 70%|██████▉ | 4646/6640 [3:23:27<2:58:55, 5.38s/it] {'loss': 0.577, 'learning_rate': 4.3694867831208e-06, 'epoch': 0.7} 70%|██████▉ | 4646/6640 [3:23:27<2:58:55, 5.38s/it] 70%|██████▉ | 4647/6640 [3:23:32<2:57:56, 5.36s/it] {'loss': 0.608, 'learning_rate': 4.365455960892615e-06, 'epoch': 0.7} 70%|██████▉ | 4647/6640 [3:23:32<2:57:56, 5.36s/it] 70%|███████ | 4648/6640 [3:23:38<3:00:12, 5.43s/it] {'loss': 0.577, 'learning_rate': 4.3614264795347525e-06, 'epoch': 0.7} 70%|███████ | 4648/6640 [3:23:38<3:00:12, 5.43s/it] 70%|███████ | 4649/6640 [3:23:43<3:00:42, 5.45s/it] {'loss': 0.5794, 'learning_rate': 4.357398340006125e-06, 'epoch': 0.7} 70%|███████ | 4649/6640 [3:23:43<3:00:42, 5.45s/it]7 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 04 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 70%|███████ | 4650/6640 [3:23:49<3:00:18, 5.44s/it]3 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5753, 'learning_rate': 4.35337154326532e-06, 'epoch': 0.7} 70%|███████ | 4650/6640 [3:23:49<3:00:18, 5.44s/it] 70%|███████ | 4651/6640 [3:23:54<2:59:31, 5.42s/it] {'loss': 0.5815, 'learning_rate': 4.349346090270608e-06, 'epoch': 0.7} 70%|███████ | 4651/6640 [3:23:54<2:59:31, 5.42s/it] 70%|███████ | 4652/6640 [3:24:00<3:02:36, 5.51s/it] {'loss': 0.5661, 'learning_rate': 4.345321981979942e-06, 'epoch': 0.7} 70%|███████ | 4652/6640 [3:24:00<3:02:36, 5.51s/it] 70%|███████ | 4653/6640 [3:24:05<3:01:42, 5.49s/it] {'loss': 0.568, 'learning_rate': 4.341299219350946e-06, 'epoch': 0.7} 70%|███████ | 4653/6640 [3:24:05<3:01:42, 5.49s/it] 70%|███████ | 4654/6640 [3:24:11<3:01:26, 5.48s/it] {'loss': 0.5852, 'learning_rate': 4.337277803340936e-06, 'epoch': 0.7} 70%|███████ | 4654/6640 [3:24:11<3:01:26, 5.48s/it] 70%|███████ | 4655/6640 [3:24:16<3:01:48, 5.50s/it] {'loss': 0.5517, 'learning_rate': 4.333257734906889e-06, 'epoch': 0.7} 70%|███████ | 4655/6640 [3:24:16<3:01:48, 5.50s/it] 70%|███████ | 4656/6640 [3:24:22<3:00:50, 5.47s/it] {'loss': 0.5603, 'learning_rate': 4.329239015005489e-06, 'epoch': 0.7} 70%|███████ | 4656/6640 [3:24:22<3:00:50, 5.47s/it] 70%|███████ | 4657/6640 [3:24:27<3:00:37, 5.47s/it] {'loss': 0.5877, 'learning_rate': 4.325221644593075e-06, 'epoch': 0.7} 70%|███████ | 4657/6640 [3:24:27<3:00:37, 5.47s/it] 70%|███████ | 4658/6640 [3:24:33<3:02:10, 5.51s/it] {'loss': 0.5909, 'learning_rate': 4.321205624625673e-06, 'epoch': 0.7} 70%|███████ | 4658/6640 [3:24:33<3:02:10, 5.51s/it] 70%|███████ | 4659/6640 [3:24:38<2:59:31, 5.44s/it] {'loss': 0.5764, 'learning_rate': 4.31719095605899e-06, 'epoch': 0.7} 70%|███████ | 4659/6640 [3:24:38<2:59:31, 5.44s/it] 70%|███████ | 4660/6640 [3:24:46<3:20:07, 6.06s/it] {'loss': 0.576, 'learning_rate': 4.313177639848408e-06, 'epoch': 0.7} 70%|███████ | 4660/6640 [3:24:46<3:20:07, 6.06s/it] 70%|███████ | 4661/6640 [3:24:51<3:13:26, 5.86s/it] {'loss': 0.6102, 'learning_rate': 4.309165676948995e-06, 'epoch': 0.7} 
70%|███████ | 4661/6640 [3:24:51<3:13:26, 5.86s/it] 70%|███████ | 4662/6640 [3:24:56<3:08:44, 5.73s/it] {'loss': 0.595, 'learning_rate': 4.305155068315481e-06, 'epoch': 0.7} 70%|███████ | 4662/6640 [3:24:56<3:08:44, 5.73s/it] 70%|███████ | 4663/6640 [3:25:02<3:05:21, 5.63s/it] {'loss': 0.5967, 'learning_rate': 4.3011458149022914e-06, 'epoch': 0.7} 70%|███████ | 4663/6640 [3:25:02<3:05:21, 5.63s/it] 70%|███████ | 4664/6640 [3:25:07<3:03:07, 5.56s/it] {'loss': 0.5784, 'learning_rate': 4.29713791766351e-06, 'epoch': 0.7} 70%|███████ | 4664/6640 [3:25:07<3:03:07, 5.56s/it] 70%|███████ | 4665/6640 [3:25:13<3:02:05, 5.53s/it] {'loss': 0.5769, 'learning_rate': 4.293131377552923e-06, 'epoch': 0.7} 70%|███████ | 4665/6640 [3:25:13<3:02:05, 5.53s/it] 70%|███████ | 4666/6640 [3:25:18<3:00:14, 5.48s/it] {'loss': 0.5855, 'learning_rate': 4.289126195523968e-06, 'epoch': 0.7} 70%|███████ | 4666/6640 [3:25:18<3:00:14, 5.48s/it] 70%|███████ | 4667/6640 [3:25:23<2:59:43, 5.47s/it] {'loss': 0.5768, 'learning_rate': 4.285122372529776e-06, 'epoch': 0.7} 70%|███████ | 4667/6640 [3:25:23<2:59:43, 5.47s/it] 70%|███████ | 4668/6640 [3:25:29<3:00:21, 5.49s/it] {'loss': 0.5826, 'learning_rate': 4.281119909523149e-06, 'epoch': 0.7} 70%|███████ | 4668/6640 [3:25:29<3:00:21, 5.49s/it] 70%|███████ | 4669/6640 [3:25:34<3:00:21, 5.49s/it] {'loss': 0.5759, 'learning_rate': 4.277118807456565e-06, 'epoch': 0.7} 70%|███████ | 4669/6640 [3:25:34<3:00:21, 5.49s/it] 70%|███████ | 4670/6640 [3:25:41<3:15:41, 5.96s/it] {'loss': 0.5845, 'learning_rate': 4.273119067282184e-06, 'epoch': 0.7} 70%|███████ | 4670/6640 [3:25:41<3:15:41, 5.96s/it] 70%|███████ | 4671/6640 [3:25:49<3:31:59, 6.46s/it] {'loss': 0.5746, 'learning_rate': 4.269120689951828e-06, 'epoch': 0.7} 70%|███████ | 4671/6640 [3:25:49<3:31:59, 6.46s/it] 70%|███████ | 4672/6640 [3:25:56<3:38:49, 6.67s/it] {'loss': 0.5707, 'learning_rate': 4.265123676417008e-06, 'epoch': 0.7} 70%|███████ | 4672/6640 [3:25:56<3:38:49, 6.67s/it] 70%|███████ | 4673/6640 
[3:26:02<3:26:25, 6.30s/it] {'loss': 0.5789, 'learning_rate': 4.261128027628906e-06, 'epoch': 0.7} 70%|███████ | 4673/6640 [3:26:02<3:26:25, 6.30s/it] 70%|███████ | 4674/6640 [3:26:07<3:16:21, 5.99s/it] {'loss': 0.5743, 'learning_rate': 4.25713374453838e-06, 'epoch': 0.7} 70%|███████ | 4674/6640 [3:26:07<3:16:21, 5.99s/it] 70%|███████ | 4675/6640 [3:26:13<3:11:47, 5.86s/it] {'loss': 0.5685, 'learning_rate': 4.253140828095964e-06, 'epoch': 0.7} 70%|███████ | 4675/6640 [3:26:13<3:11:47, 5.86s/it] 70%|███████ | 4676/6640 [3:26:18<3:09:24, 5.79s/it] {'loss': 0.5807, 'learning_rate': 4.249149279251859e-06, 'epoch': 0.7} 70%|███████ | 4676/6640 [3:26:18<3:09:24, 5.79s/it] 70%|███████ | 4677/6640 [3:26:23<3:04:42, 5.65s/it] {'loss': 0.5938, 'learning_rate': 4.24515909895595e-06, 'epoch': 0.7} 70%|███████ | 4677/6640 [3:26:23<3:04:42, 5.65s/it] 70%|███████ | 4678/6640 [3:26:31<3:18:26, 6.07s/it] {'loss': 0.5963, 'learning_rate': 4.241170288157792e-06, 'epoch': 0.7} 70%|███████ | 4678/6640 [3:26:31<3:18:26, 6.07s/it] 70%|███████ | 4679/6640 [3:26:37<3:25:50, 6.30s/it] {'loss': 0.5793, 'learning_rate': 4.23718284780662e-06, 'epoch': 0.7} 70%|███████ | 4679/6640 [3:26:37<3:25:50, 6.30s/it] 70%|███████ | 4680/6640 [3:26:43<3:16:27, 6.01s/it] {'loss': 0.5931, 'learning_rate': 4.2331967788513295e-06, 'epoch': 0.7} 70%|███████ | 4680/6640 [3:26:43<3:16:27, 6.01s/it] 70%|███████ | 4681/6640 [3:26:48<3:09:42, 5.81s/it] {'loss': 0.5478, 'learning_rate': 4.229212082240503e-06, 'epoch': 0.7} 70%|███████ | 4681/6640 [3:26:48<3:09:42, 5.81s/it] 71%|███████ | 4682/6640 [3:26:53<3:06:03, 5.70s/it] {'loss': 0.5761, 'learning_rate': 4.225228758922389e-06, 'epoch': 0.71} 71%|███████ | 4682/6640 [3:26:53<3:06:03, 5.70s/it] 71%|███████ | 4683/6640 [3:26:59<3:02:42, 5.60s/it] {'loss': 0.5675, 'learning_rate': 4.221246809844915e-06, 'epoch': 0.71} 71%|███████ | 4683/6640 [3:26:59<3:02:42, 5.60s/it] 71%|███████ | 4684/6640 [3:27:06<3:14:10, 5.96s/it] {'loss': 0.5853, 'learning_rate': 
4.217266235955678e-06, 'epoch': 0.71} 71%|███████ | 4684/6640 [3:27:06<3:14:10, 5.96s/it] 71%|███████ | 4685/6640 [3:27:11<3:06:13, 5.72s/it] {'loss': 0.5972, 'learning_rate': 4.213287038201943e-06, 'epoch': 0.71} 71%|███████ | 4685/6640 [3:27:11<3:06:13, 5.72s/it] 71%|███████ | 4686/6640 [3:27:16<3:02:53, 5.62s/it] {'loss': 0.559, 'learning_rate': 4.209309217530656e-06, 'epoch': 0.71} 71%|███████ | 4686/6640 [3:27:16<3:02:53, 5.62s/it] 71%|███████ | 4687/6640 [3:27:22<3:00:09, 5.53s/it] {'loss': 0.5811, 'learning_rate': 4.20533277488843e-06, 'epoch': 0.71} 71%|███████ | 4687/6640 [3:27:22<3:00:09, 5.53s/it] 71%|███████ | 4688/6640 [3:27:27<2:58:56, 5.50s/it] {'loss': 0.6002, 'learning_rate': 4.2013577112215564e-06, 'epoch': 0.71} 71%|███████ | 4688/6640 [3:27:27<2:58:56, 5.50s/it] 71%|███████ | 4689/6640 [3:27:32<2:59:04, 5.51s/it] {'loss': 0.5605, 'learning_rate': 4.1973840274759855e-06, 'epoch': 0.71} 71%|███████ | 4689/6640 [3:27:32<2:59:04, 5.51s/it] 71%|███████ | 4690/6640 [3:27:38<2:59:05, 5.51s/it] {'loss': 0.5871, 'learning_rate': 4.193411724597352e-06, 'epoch': 0.71} 71%|███████ | 4690/6640 [3:27:38<2:59:05, 5.51s/it] 71%|███████ | 4691/6640 [3:27:43<2:57:49, 5.47s/it] {'loss': 0.5875, 'learning_rate': 4.1894408035309575e-06, 'epoch': 0.71} 71%|███████ | 4691/6640 [3:27:43<2:57:49, 5.47s/it] 71%|███████ | 4692/6640 [3:27:49<2:58:16, 5.49s/it] {'loss': 0.5647, 'learning_rate': 4.185471265221775e-06, 'epoch': 0.71} 71%|███████ | 4692/6640 [3:27:49<2:58:16, 5.49s/it] 71%|███████ | 4693/6640 [3:27:54<2:57:31, 5.47s/it] {'loss': 0.5894, 'learning_rate': 4.18150311061445e-06, 'epoch': 0.71} 71%|███████ | 4693/6640 [3:27:54<2:57:31, 5.47s/it] 71%|███████ | 4694/6640 [3:28:00<2:57:31, 5.47s/it] {'loss': 0.5711, 'learning_rate': 4.177536340653291e-06, 'epoch': 0.71} 71%|███████ | 4694/6640 [3:28:00<2:57:31, 5.47s/it] 71%|███████ | 4695/6640 [3:28:05<2:55:39, 5.42s/it] {'loss': 0.5956, 'learning_rate': 4.173570956282286e-06, 'epoch': 0.71} 71%|███████ | 4695/6640 
[3:28:05<2:55:39, 5.42s/it] 71%|███████ | 4696/6640 [3:28:11<2:56:43, 5.45s/it] {'loss': 0.5912, 'learning_rate': 4.16960695844509e-06, 'epoch': 0.71} 71%|███████ | 4696/6640 [3:28:11<2:56:43, 5.45s/it] 71%|███████ | 4697/6640 [3:28:16<2:55:51, 5.43s/it] {'loss': 0.585, 'learning_rate': 4.165644348085033e-06, 'epoch': 0.71} 71%|███████ | 4697/6640 [3:28:16<2:55:51, 5.43s/it] 71%|███████ | 4698/6640 [3:28:22<2:56:39, 5.46s/it] {'loss': 0.5646, 'learning_rate': 4.161683126145102e-06, 'epoch': 0.71} 71%|███████ | 4698/6640 [3:28:22<2:56:39, 5.46s/it] 71%|███████ | 4699/6640 [3:28:27<2:55:36, 5.43s/it] {'loss': 0.607, 'learning_rate': 4.157723293567963e-06, 'epoch': 0.71} 71%|███████ | 4699/6640 [3:28:27<2:55:36, 5.43s/it]01 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 71%|███████ | 4700/6640 [3:28:32<2:54:43, 5.40s/it]4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... {'loss': 0.5775, 'learning_rate': 4.153764851295954e-06, 'epoch': 0.71} 71%|███████ | 4700/6640 [3:28:32<2:54:43, 5.40s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 71%|███████ | 4701/6640 [3:29:05<7:16:36, 13.51s/it] {'loss': 0.5963, 'learning_rate': 4.149807800271075e-06, 'epoch': 0.71} 71%|███████ | 4701/6640 [3:29:05<7:16:36, 13.51s/it] 71%|███████ | 4702/6640 [3:29:10<5:56:55, 11.05s/it] {'loss': 0.5838, 'learning_rate': 4.1458521414350026e-06, 'epoch': 0.71} 71%|███████ | 4702/6640 [3:29:10<5:56:55, 11.05s/it] 71%|███████ | 4703/6640 [3:29:15<5:01:12, 9.33s/it] {'loss': 0.5769, 'learning_rate': 4.141897875729069e-06, 'epoch': 0.71} 71%|███████ | 4703/6640 [3:29:15<5:01:12, 9.33s/it] 71%|███████ | 4704/6640 [3:29:21<4:22:50, 8.15s/it] {'loss': 0.5852, 'learning_rate': 4.13794500409429e-06, 'epoch': 0.71} 71%|███████ | 4704/6640 [3:29:21<4:22:50, 8.15s/it] 71%|███████ | 4705/6640 [3:29:26<3:56:15, 7.33s/it] {'loss': 0.5835, 'learning_rate': 4.1339935274713404e-06, 'epoch': 0.71} 71%|███████ | 4705/6640 [3:29:26<3:56:15, 7.33s/it] 71%|███████ | 4706/6640 [3:29:32<3:39:26, 6.81s/it] {'loss': 0.5785, 'learning_rate': 4.13004344680057e-06, 'epoch': 0.71} 71%|███████ | 4706/6640 [3:29:32<3:39:26, 6.81s/it] 71%|███████ | 4707/6640 [3:29:37<3:24:33, 6.35s/it] {'loss': 0.5819, 'learning_rate': 4.126094763021985e-06, 'epoch': 0.71} 71%|███████ | 4707/6640 [3:29:37<3:24:33, 6.35s/it] 71%|███████ | 4708/6640 [3:29:42<3:14:26, 6.04s/it] {'loss': 0.576, 'learning_rate': 4.12214747707527e-06, 'epoch': 0.71} 71%|███████ | 4708/6640 [3:29:42<3:14:26, 6.04s/it] 71%|███████ | 4709/6640 [3:29:48<3:07:15, 5.82s/it] {'loss': 0.603, 'learning_rate': 4.118201589899772e-06, 'epoch': 0.71} 71%|███████ | 4709/6640 [3:29:48<3:07:15, 5.82s/it] 71%|███████ | 4710/6640 [3:29:53<3:02:40, 5.68s/it] {'loss': 0.5887, 'learning_rate': 4.114257102434508e-06, 'epoch': 0.71} 71%|███████ | 4710/6640 [3:29:53<3:02:40, 5.68s/it] 71%|███████ | 4711/6640 [3:29:58<2:59:56, 5.60s/it] {'loss': 0.5726, 'learning_rate': 4.110314015618162e-06, 
'epoch': 0.71} 71%|███████ | 4711/6640 [3:29:58<2:59:56, 5.60s/it] 71%|███████ | 4712/6640 [3:30:04<2:56:55, 5.51s/it] {'loss': 0.6041, 'learning_rate': 4.1063723303890765e-06, 'epoch': 0.71} 71%|███████ | 4712/6640 [3:30:04<2:56:55, 5.51s/it] 71%|███████ | 4713/6640 [3:30:09<2:57:30, 5.53s/it] {'loss': 0.571, 'learning_rate': 4.102432047685271e-06, 'epoch': 0.71} 71%|███████ | 4713/6640 [3:30:09<2:57:30, 5.53s/it] 71%|███████ | 4714/6640 [3:30:15<2:56:34, 5.50s/it] {'loss': 0.6143, 'learning_rate': 4.098493168444426e-06, 'epoch': 0.71} 71%|███████ | 4714/6640 [3:30:15<2:56:34, 5.50s/it] 71%|███████ | 4715/6640 [3:30:20<2:56:50, 5.51s/it] {'loss': 0.5938, 'learning_rate': 4.094555693603891e-06, 'epoch': 0.71} 71%|███████ | 4715/6640 [3:30:20<2:56:50, 5.51s/it] 71%|███████ | 4716/6640 [3:30:26<2:55:13, 5.46s/it] {'loss': 0.6045, 'learning_rate': 4.090619624100681e-06, 'epoch': 0.71} 71%|███████ | 4716/6640 [3:30:26<2:55:13, 5.46s/it] 71%|███████ | 4717/6640 [3:30:31<2:56:20, 5.50s/it] {'loss': 0.5793, 'learning_rate': 4.086684960871469e-06, 'epoch': 0.71} 71%|███████ | 4717/6640 [3:30:31<2:56:20, 5.50s/it] 71%|███████ | 4718/6640 [3:30:37<2:55:42, 5.49s/it] {'loss': 0.5715, 'learning_rate': 4.082751704852601e-06, 'epoch': 0.71} 71%|███████ | 4718/6640 [3:30:37<2:55:42, 5.49s/it] 71%|███████ | 4719/6640 [3:30:42<2:58:13, 5.57s/it] {'loss': 0.5927, 'learning_rate': 4.078819856980088e-06, 'epoch': 0.71} 71%|███████ | 4719/6640 [3:30:42<2:58:13, 5.57s/it] 71%|███████ | 4720/6640 [3:30:48<3:00:42, 5.65s/it] {'loss': 0.5867, 'learning_rate': 4.074889418189608e-06, 'epoch': 0.71} 71%|███████ | 4720/6640 [3:30:48<3:00:42, 5.65s/it] 71%|███████ | 4721/6640 [3:30:53<2:57:07, 5.54s/it] {'loss': 0.5718, 'learning_rate': 4.070960389416493e-06, 'epoch': 0.71} 71%|███████ | 4721/6640 [3:30:53<2:57:07, 5.54s/it] 71%|███████ | 4722/6640 [3:30:59<2:54:26, 5.46s/it] {'loss': 0.5792, 'learning_rate': 4.067032771595749e-06, 'epoch': 0.71} 71%|███████ | 4722/6640 [3:30:59<2:54:26, 
5.46s/it] 71%|███████ | 4723/6640 [3:31:04<2:54:07, 5.45s/it] {'loss': 0.6209, 'learning_rate': 4.063106565662044e-06, 'epoch': 0.71} 71%|███████ | 4723/6640 [3:31:04<2:54:07, 5.45s/it] 71%|███████ | 4724/6640 [3:31:10<2:55:46, 5.50s/it] {'loss': 0.5911, 'learning_rate': 4.05918177254971e-06, 'epoch': 0.71} 71%|███████ | 4724/6640 [3:31:10<2:55:46, 5.50s/it] 71%|███████ | 4725/6640 [3:31:15<2:56:08, 5.52s/it] {'loss': 0.5734, 'learning_rate': 4.055258393192746e-06, 'epoch': 0.71} 71%|███████ | 4725/6640 [3:31:15<2:56:08, 5.52s/it] 71%|███████ | 4726/6640 [3:31:21<2:59:08, 5.62s/it] {'loss': 0.585, 'learning_rate': 4.051336428524804e-06, 'epoch': 0.71} 71%|███████ | 4726/6640 [3:31:21<2:59:08, 5.62s/it] 71%|███████ | 4727/6640 [3:31:27<2:58:06, 5.59s/it] {'loss': 0.5846, 'learning_rate': 4.04741587947921e-06, 'epoch': 0.71} 71%|███████ | 4727/6640 [3:31:27<2:58:06, 5.59s/it] 71%|███████ | 4728/6640 [3:31:32<2:56:53, 5.55s/it] {'loss': 0.5703, 'learning_rate': 4.043496746988948e-06, 'epoch': 0.71} 71%|███████ | 4728/6640 [3:31:32<2:56:53, 5.55s/it] 71%|███████ | 4729/6640 [3:31:38<2:55:18, 5.50s/it] {'loss': 0.5832, 'learning_rate': 4.039579031986672e-06, 'epoch': 0.71} 71%|███████ | 4729/6640 [3:31:38<2:55:18, 5.50s/it] 71%|███████ | 4730/6640 [3:31:43<2:53:11, 5.44s/it] {'loss': 0.577, 'learning_rate': 4.0356627354046854e-06, 'epoch': 0.71} 71%|███████ | 4730/6640 [3:31:43<2:53:11, 5.44s/it] 71%|███████▏ | 4731/6640 [3:31:48<2:50:44, 5.37s/it] {'loss': 0.5876, 'learning_rate': 4.0317478581749644e-06, 'epoch': 0.71} 71%|███████▏ | 4731/6640 [3:31:48<2:50:44, 5.37s/it] 71%|███████▏ | 4732/6640 [3:31:53<2:50:26, 5.36s/it] {'loss': 0.5867, 'learning_rate': 4.027834401229146e-06, 'epoch': 0.71} 71%|███████▏ | 4732/6640 [3:31:53<2:50:26, 5.36s/it] 71%|███████▏ | 4733/6640 [3:31:59<2:49:44, 5.34s/it] {'loss': 0.5772, 'learning_rate': 4.0239223654985274e-06, 'epoch': 0.71} 71%|███████▏ | 4733/6640 [3:31:59<2:49:44, 5.34s/it] 71%|███████▏ | 4734/6640 [3:32:04<2:51:30, 
5.40s/it] {'loss': 0.5958, 'learning_rate': 4.02001175191407e-06, 'epoch': 0.71} 71%|███████▏ | 4734/6640 [3:32:04<2:51:30, 5.40s/it] 71%|███████▏ | 4735/6640 [3:32:10<2:52:22, 5.43s/it] {'loss': 0.5824, 'learning_rate': 4.016102561406392e-06, 'epoch': 0.71} 71%|███████▏ | 4735/6640 [3:32:10<2:52:22, 5.43s/it] 71%|███████▏ | 4736/6640 [3:32:15<2:52:06, 5.42s/it] {'loss': 0.583, 'learning_rate': 4.0121947949057745e-06, 'epoch': 0.71} 71%|███████▏ | 4736/6640 [3:32:15<2:52:06, 5.42s/it] 71%|███████▏ | 4737/6640 [3:32:21<2:53:07, 5.46s/it] {'loss': 0.5901, 'learning_rate': 4.0082884533421655e-06, 'epoch': 0.71} 71%|███████▏ | 4737/6640 [3:32:21<2:53:07, 5.46s/it] 71%|███████▏ | 4738/6640 [3:32:26<2:50:44, 5.39s/it] {'loss': 0.5652, 'learning_rate': 4.00438353764517e-06, 'epoch': 0.71} 71%|███████▏ | 4738/6640 [3:32:26<2:50:44, 5.39s/it] 71%|███████▏ | 4739/6640 [3:32:31<2:50:45, 5.39s/it] {'loss': 0.5962, 'learning_rate': 4.000480048744048e-06, 'epoch': 0.71} 71%|███████▏ | 4739/6640 [3:32:31<2:50:45, 5.39s/it] 71%|███████▏ | 4740/6640 [3:32:37<2:52:44, 5.46s/it] {'loss': 0.5873, 'learning_rate': 3.996577987567727e-06, 'epoch': 0.71} 71%|███████▏ | 4740/6640 [3:32:37<2:52:44, 5.46s/it] 71%|███████▏ | 4741/6640 [3:32:42<2:53:15, 5.47s/it] {'loss': 0.5952, 'learning_rate': 3.992677355044794e-06, 'epoch': 0.71} 71%|███████▏ | 4741/6640 [3:32:42<2:53:15, 5.47s/it] 71%|███████▏ | 4742/6640 [3:32:48<2:53:29, 5.48s/it] {'loss': 0.6017, 'learning_rate': 3.988778152103494e-06, 'epoch': 0.71} 71%|███████▏ | 4742/6640 [3:32:48<2:53:29, 5.48s/it] 71%|███████▏ | 4743/6640 [3:32:53<2:52:01, 5.44s/it] {'loss': 0.5622, 'learning_rate': 3.9848803796717385e-06, 'epoch': 0.71} 71%|███████▏ | 4743/6640 [3:32:53<2:52:01, 5.44s/it] 71%|███████▏ | 4744/6640 [3:32:59<2:52:36, 5.46s/it] {'loss': 0.6011, 'learning_rate': 3.980984038677083e-06, 'epoch': 0.71} 71%|███████▏ | 4744/6640 [3:32:59<2:52:36, 5.46s/it] 71%|███████▏ | 4745/6640 [3:33:04<2:52:22, 5.46s/it] {'loss': 0.5851, 
'learning_rate': 3.977089130046756e-06, 'epoch': 0.71} 71%|███████▏ | 4745/6640 [3:33:04<2:52:22, 5.46s/it] 71%|███████▏ | 4746/6640 [3:33:10<2:51:15, 5.43s/it] {'loss': 0.5661, 'learning_rate': 3.973195654707642e-06, 'epoch': 0.71} 71%|███████▏ | 4746/6640 [3:33:10<2:51:15, 5.43s/it] 71%|███████▏ | 4747/6640 [3:33:15<2:51:26, 5.43s/it] {'loss': 0.5779, 'learning_rate': 3.969303613586288e-06, 'epoch': 0.71} 71%|███████▏ | 4747/6640 [3:33:15<2:51:26, 5.43s/it] 72%|███████▏ | 4748/6640 [3:33:20<2:50:21, 5.40s/it] {'loss': 0.6127, 'learning_rate': 3.965413007608888e-06, 'epoch': 0.72} 72%|███████▏ | 4748/6640 [3:33:20<2:50:21, 5.40s/it] 72%|███████▏ | 4749/6640 [3:33:26<2:51:07, 5.43s/it] {'loss': 0.5704, 'learning_rate': 3.961523837701304e-06, 'epoch': 0.72} 72%|███████▏ | 4749/6640 [3:33:26<2:51:07, 5.43s/it]6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 14 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
72%|███████▏ | 4750/6640 [3:33:31<2:50:08, 5.40s/it] {'loss': 0.575, 'learning_rate': 3.957636104789056e-06, 'epoch': 0.72} 72%|███████▏ | 4750/6640 [3:33:31<2:50:08, 5.40s/it] 72%|███████▏ | 4751/6640 [3:33:37<2:50:10, 5.41s/it] {'loss': 0.6046, 'learning_rate': 3.9537498097973186e-06, 'epoch': 0.72} 72%|███████▏ | 4751/6640 [3:33:37<2:50:10, 5.41s/it] 72%|███████▏ | 4752/6640 [3:33:42<2:49:37, 5.39s/it] {'loss': 0.5667, 'learning_rate': 3.949864953650929e-06, 'epoch': 0.72} 72%|███████▏ | 4752/6640 [3:33:42<2:49:37, 5.39s/it] 72%|███████▏ | 4753/6640 [3:33:47<2:49:46, 5.40s/it] {'loss': 0.5897, 'learning_rate': 3.945981537274373e-06, 'epoch': 0.72} 72%|███████▏ | 4753/6640 [3:33:47<2:49:46, 5.40s/it] 72%|███████▏ | 4754/6640 [3:33:53<2:49:52, 5.40s/it] {'loss': 0.5729, 'learning_rate': 3.942099561591802e-06, 'epoch': 0.72} 72%|███████▏ | 4754/6640 [3:33:53<2:49:52, 5.40s/it] 72%|███████▏ | 4755/6640 [3:33:58<2:50:40, 5.43s/it] {'loss': 0.5967, 'learning_rate': 3.938219027527023e-06, 'epoch': 0.72} 72%|███████▏ | 4755/6640 [3:33:58<2:50:40, 5.43s/it] 72%|███████▏ | 4756/6640 [3:34:04<2:50:06, 5.42s/it] {'loss': 0.6007, 'learning_rate': 3.9343399360035e-06, 'epoch': 0.72} 72%|███████▏ | 4756/6640 [3:34:04<2:50:06, 5.42s/it] 72%|███████▏ | 4757/6640 [3:34:09<2:49:14, 5.39s/it] {'loss': 0.5652, 'learning_rate': 3.930462287944354e-06, 'epoch': 0.72} 72%|███████▏ | 4757/6640 [3:34:09<2:49:14, 5.39s/it] 72%|███████▏ | 4758/6640 [3:34:14<2:49:47, 5.41s/it] {'loss': 0.5816, 'learning_rate': 3.9265860842723505e-06, 'epoch': 0.72} 72%|███████▏ | 4758/6640 [3:34:14<2:49:47, 5.41s/it] 72%|███████▏ | 4759/6640 [3:34:20<2:47:46, 5.35s/it] {'loss': 0.5432, 'learning_rate': 3.922711325909938e-06, 'epoch': 0.72} 72%|███████▏ | 4759/6640 [3:34:20<2:47:46, 5.35s/it] 72%|███████▏ | 4760/6640 [3:34:25<2:48:06, 5.37s/it] {'loss': 0.5705, 'learning_rate': 3.9188380137791934e-06, 'epoch': 0.72} 72%|███████▏ | 4760/6640 [3:34:25<2:48:06, 5.37s/it] 72%|███████▏ | 4761/6640 
[3:34:31<2:50:29, 5.44s/it] {'loss': 0.581, 'learning_rate': 3.9149661488018685e-06, 'epoch': 0.72} 72%|███████▏ | 4761/6640 [3:34:31<2:50:29, 5.44s/it] 72%|███████▏ | 4762/6640 [3:34:36<2:49:40, 5.42s/it] {'loss': 0.5535, 'learning_rate': 3.911095731899356e-06, 'epoch': 0.72} 72%|███████▏ | 4762/6640 [3:34:36<2:49:40, 5.42s/it] 72%|███████▏ | 4763/6640 [3:34:42<2:49:44, 5.43s/it] {'loss': 0.5864, 'learning_rate': 3.907226763992714e-06, 'epoch': 0.72} 72%|███████▏ | 4763/6640 [3:34:42<2:49:44, 5.43s/it] 72%|███████▏ | 4764/6640 [3:34:47<2:50:02, 5.44s/it] {'loss': 0.5753, 'learning_rate': 3.903359246002655e-06, 'epoch': 0.72} 72%|███████▏ | 4764/6640 [3:34:47<2:50:02, 5.44s/it] 72%|███████▏ | 4765/6640 [3:34:52<2:49:55, 5.44s/it] {'loss': 0.5942, 'learning_rate': 3.899493178849544e-06, 'epoch': 0.72} 72%|███████▏ | 4765/6640 [3:34:52<2:49:55, 5.44s/it] 72%|███████▏ | 4766/6640 [3:34:58<2:49:40, 5.43s/it] {'loss': 0.5841, 'learning_rate': 3.895628563453403e-06, 'epoch': 0.72} 72%|███████▏ | 4766/6640 [3:34:58<2:49:40, 5.43s/it] 72%|███████▏ | 4767/6640 [3:35:04<2:54:48, 5.60s/it] {'loss': 0.6014, 'learning_rate': 3.891765400733898e-06, 'epoch': 0.72} 72%|███████▏ | 4767/6640 [3:35:04<2:54:48, 5.60s/it] 72%|███████▏ | 4768/6640 [3:35:09<2:52:59, 5.54s/it] {'loss': 0.5688, 'learning_rate': 3.887903691610371e-06, 'epoch': 0.72} 72%|███████▏ | 4768/6640 [3:35:09<2:52:59, 5.54s/it] 72%|███████▏ | 4769/6640 [3:35:15<2:50:51, 5.48s/it] {'loss': 0.5901, 'learning_rate': 3.884043437001795e-06, 'epoch': 0.72} 72%|███████▏ | 4769/6640 [3:35:15<2:50:51, 5.48s/it] 72%|███████▏ | 4770/6640 [3:35:20<2:49:52, 5.45s/it] {'loss': 0.5862, 'learning_rate': 3.880184637826816e-06, 'epoch': 0.72} 72%|███████▏ | 4770/6640 [3:35:20<2:49:52, 5.45s/it] 72%|███████▏ | 4771/6640 [3:35:26<2:51:47, 5.51s/it] {'loss': 0.583, 'learning_rate': 3.876327295003716e-06, 'epoch': 0.72} 72%|███████▏ | 4771/6640 [3:35:26<2:51:47, 5.51s/it] 72%|███████▏ | 4772/6640 [3:35:31<2:49:59, 5.46s/it] {'loss': 
0.5636, 'learning_rate': 3.872471409450441e-06, 'epoch': 0.72} 72%|███████▏ | 4772/6640 [3:35:31<2:49:59, 5.46s/it] 72%|███████▏ | 4773/6640 [3:35:36<2:48:53, 5.43s/it] {'loss': 0.554, 'learning_rate': 3.868616982084591e-06, 'epoch': 0.72} 72%|███████▏ | 4773/6640 [3:35:36<2:48:53, 5.43s/it] 72%|███████▏ | 4774/6640 [3:35:42<2:50:12, 5.47s/it] {'loss': 0.5884, 'learning_rate': 3.864764013823415e-06, 'epoch': 0.72} 72%|███████▏ | 4774/6640 [3:35:42<2:50:12, 5.47s/it] 72%|███████▏ | 4775/6640 [3:35:47<2:49:02, 5.44s/it] {'loss': 0.5705, 'learning_rate': 3.860912505583819e-06, 'epoch': 0.72} 72%|███████▏ | 4775/6640 [3:35:47<2:49:02, 5.44s/it] 72%|███████▏ | 4776/6640 [3:35:53<2:50:19, 5.48s/it] {'loss': 0.6028, 'learning_rate': 3.85706245828235e-06, 'epoch': 0.72} 72%|███████▏ | 4776/6640 [3:35:53<2:50:19, 5.48s/it] 72%|███████▏ | 4777/6640 [3:35:58<2:48:10, 5.42s/it] {'loss': 0.5915, 'learning_rate': 3.853213872835229e-06, 'epoch': 0.72} 72%|███████▏ | 4777/6640 [3:35:58<2:48:10, 5.42s/it] 72%|███████▏ | 4778/6640 [3:36:03<2:47:28, 5.40s/it] {'loss': 0.5784, 'learning_rate': 3.849366750158305e-06, 'epoch': 0.72} 72%|███████▏ | 4778/6640 [3:36:03<2:47:28, 5.40s/it] 72%|███████▏ | 4779/6640 [3:36:09<2:45:57, 5.35s/it] {'loss': 0.5758, 'learning_rate': 3.845521091167098e-06, 'epoch': 0.72} 72%|███████▏ | 4779/6640 [3:36:09<2:45:57, 5.35s/it] 72%|███████▏ | 4780/6640 [3:36:14<2:47:44, 5.41s/it] {'loss': 0.5714, 'learning_rate': 3.841676896776764e-06, 'epoch': 0.72} 72%|███████▏ | 4780/6640 [3:36:14<2:47:44, 5.41s/it] 72%|███████▏ | 4781/6640 [3:36:20<2:46:59, 5.39s/it] {'loss': 0.5872, 'learning_rate': 3.837834167902121e-06, 'epoch': 0.72} 72%|███████▏ | 4781/6640 [3:36:20<2:46:59, 5.39s/it] 72%|███████▏ | 4782/6640 [3:36:25<2:48:12, 5.43s/it] {'loss': 0.5845, 'learning_rate': 3.8339929054576375e-06, 'epoch': 0.72} 72%|███████▏ | 4782/6640 [3:36:25<2:48:12, 5.43s/it] 72%|███████▏ | 4783/6640 [3:36:31<2:49:42, 5.48s/it] {'loss': 0.5726, 'learning_rate': 
3.83015311035743e-06, 'epoch': 0.72} 72%|███████▏ | 4783/6640 [3:36:31<2:49:42, 5.48s/it] 72%|███████▏ | 4784/6640 [3:36:36<2:49:38, 5.48s/it] {'loss': 0.5684, 'learning_rate': 3.826314783515269e-06, 'epoch': 0.72} 72%|███████▏ | 4784/6640 [3:36:36<2:49:38, 5.48s/it] 72%|███████▏ | 4785/6640 [3:36:42<2:48:05, 5.44s/it] {'loss': 0.5701, 'learning_rate': 3.822477925844564e-06, 'epoch': 0.72} 72%|███████▏ | 4785/6640 [3:36:42<2:48:05, 5.44s/it] 72%|███████▏ | 4786/6640 [3:36:47<2:48:03, 5.44s/it] {'loss': 0.5758, 'learning_rate': 3.818642538258398e-06, 'epoch': 0.72} 72%|███████▏ | 4786/6640 [3:36:47<2:48:03, 5.44s/it] 72%|███████▏ | 4787/6640 [3:36:52<2:48:19, 5.45s/it] {'loss': 0.5769, 'learning_rate': 3.814808621669479e-06, 'epoch': 0.72} 72%|███████▏ | 4787/6640 [3:36:52<2:48:19, 5.45s/it] 72%|███████▏ | 4788/6640 [3:36:58<2:46:34, 5.40s/it] {'loss': 0.5743, 'learning_rate': 3.8109761769901845e-06, 'epoch': 0.72} 72%|███████▏ | 4788/6640 [3:36:58<2:46:34, 5.40s/it] 72%|███████▏ | 4789/6640 [3:37:03<2:45:54, 5.38s/it] {'loss': 0.5623, 'learning_rate': 3.8071452051325263e-06, 'epoch': 0.72} 72%|███████▏ | 4789/6640 [3:37:03<2:45:54, 5.38s/it] 72%|███████▏ | 4790/6640 [3:37:08<2:46:12, 5.39s/it] {'loss': 0.6015, 'learning_rate': 3.803315707008176e-06, 'epoch': 0.72} 72%|███████▏ | 4790/6640 [3:37:08<2:46:12, 5.39s/it] 72%|███████▏ | 4791/6640 [3:37:14<2:46:21, 5.40s/it] {'loss': 0.5761, 'learning_rate': 3.7994876835284513e-06, 'epoch': 0.72} 72%|███████▏ | 4791/6640 [3:37:14<2:46:21, 5.40s/it] 72%|███████▏ | 4792/6640 [3:37:19<2:46:29, 5.41s/it] {'loss': 0.5725, 'learning_rate': 3.7956611356043196e-06, 'epoch': 0.72} 72%|███████▏ | 4792/6640 [3:37:19<2:46:29, 5.41s/it] 72%|███████▏ | 4793/6640 [3:37:25<2:46:11, 5.40s/it] {'loss': 0.584, 'learning_rate': 3.7918360641464e-06, 'epoch': 0.72} 72%|███████▏ | 4793/6640 [3:37:25<2:46:11, 5.40s/it] 72%|███████▏ | 4794/6640 [3:37:30<2:45:36, 5.38s/it] {'loss': 0.5685, 'learning_rate': 3.788012470064947e-06, 'epoch': 0.72} 
72%|███████▏ | 4794/6640 [3:37:30<2:45:36, 5.38s/it] 72%|███████▏ | 4795/6640 [3:37:36<2:48:00, 5.46s/it] {'loss': 0.5789, 'learning_rate': 3.7841903542698855e-06, 'epoch': 0.72} 72%|███████▏ | 4795/6640 [3:37:36<2:48:00, 5.46s/it] 72%|███████▏ | 4796/6640 [3:37:41<2:49:28, 5.51s/it] {'loss': 0.584, 'learning_rate': 3.7803697176707688e-06, 'epoch': 0.72} 72%|███████▏ | 4796/6640 [3:37:41<2:49:28, 5.51s/it] 72%|███████▏ | 4797/6640 [3:37:47<2:48:30, 5.49s/it] {'loss': 0.6085, 'learning_rate': 3.7765505611768072e-06, 'epoch': 0.72} 72%|███████▏ | 4797/6640 [3:37:47<2:48:30, 5.49s/it] 72%|███████▏ | 4798/6640 [3:37:52<2:46:47, 5.43s/it] {'loss': 0.5828, 'learning_rate': 3.7727328856968613e-06, 'epoch': 0.72} 72%|███████▏ | 4798/6640 [3:37:52<2:46:47, 5.43s/it] 72%|███████▏ | 4799/6640 [3:37:58<2:48:17, 5.48s/it] {'loss': 0.5716, 'learning_rate': 3.768916692139426e-06, 'epoch': 0.72} 72%|███████▏ | 4799/6640 [3:37:58<2:48:17, 5.48s/it]6 AutoResumeHook: Checking whether to suspend... 41 AutoResumeHook: Checking whether to suspend... 7AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend...3 AutoResumeHook: Checking whether to suspend... 72%|███████▏ | 4800/6640 [3:38:03<2:47:34, 5.46s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5864, 'learning_rate': 3.7651019814126656e-06, 'epoch': 0.72} 72%|███████▏ | 4800/6640 [3:38:03<2:47:34, 5.46s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 72%|███████▏ | 4801/6640 [3:39:00<10:43:24, 20.99s/it] {'loss': 0.6064, 'learning_rate': 3.7612887544243694e-06, 'epoch': 0.72} 72%|███████▏ | 4801/6640 [3:39:00<10:43:24, 20.99s/it] 72%|███████▏ | 4802/6640 [3:39:06<8:19:43, 16.31s/it] {'loss': 0.5921, 'learning_rate': 3.7574770120819903e-06, 'epoch': 0.72} 72%|███████▏ | 4802/6640 [3:39:06<8:19:43, 16.31s/it] 72%|███████▏ | 4803/6640 [3:39:11<6:37:19, 12.98s/it] {'loss': 0.5768, 'learning_rate': 3.75366675529261e-06, 'epoch': 0.72} 72%|███████▏ | 4803/6640 [3:39:11<6:37:19, 12.98s/it] 72%|███████▏ | 4804/6640 [3:39:17<5:30:06, 10.79s/it] {'loss': 0.5824, 'learning_rate': 3.7498579849629803e-06, 'epoch': 0.72} 72%|███████▏ | 4804/6640 [3:39:17<5:30:06, 10.79s/it] 72%|███████▏ | 4805/6640 [3:39:22<4:44:11, 9.29s/it] {'loss': 0.5733, 'learning_rate': 3.7460507019994775e-06, 'epoch': 0.72} 72%|███████▏ | 4805/6640 [3:39:22<4:44:11, 9.29s/it] 72%|███████▏ | 4806/6640 [3:39:28<4:08:19, 8.12s/it] {'loss': 0.5802, 'learning_rate': 3.7422449073081356e-06, 'epoch': 0.72} 72%|███████▏ | 4806/6640 [3:39:28<4:08:19, 8.12s/it] 72%|███████▏ 
| 4807/6640 [3:39:33<3:46:20, 7.41s/it] {'loss': 0.5836, 'learning_rate': 3.738440601794635e-06, 'epoch': 0.72} 72%|███████▏ | 4807/6640 [3:39:33<3:46:20, 7.41s/it] 72%|███████▏ | 4808/6640 [3:39:39<3:26:35, 6.77s/it] {'loss': 0.5792, 'learning_rate': 3.734637786364288e-06, 'epoch': 0.72} 72%|███████▏ | 4808/6640 [3:39:39<3:26:35, 6.77s/it] 72%|███████▏ | 4809/6640 [3:39:44<3:12:01, 6.29s/it] {'loss': 0.5858, 'learning_rate': 3.730836461922076e-06, 'epoch': 0.72} 72%|███████▏ | 4809/6640 [3:39:44<3:12:01, 6.29s/it] 72%|███████▏ | 4810/6640 [3:39:49<3:03:32, 6.02s/it] {'loss': 0.6, 'learning_rate': 3.7270366293726033e-06, 'epoch': 0.72} 72%|███████▏ | 4810/6640 [3:39:49<3:03:32, 6.02s/it] 72%|███████▏ | 4811/6640 [3:39:55<2:57:32, 5.82s/it] {'loss': 0.5803, 'learning_rate': 3.7232382896201336e-06, 'epoch': 0.72} 72%|███████▏ | 4811/6640 [3:39:55<2:57:32, 5.82s/it] 72%|███████▏ | 4812/6640 [3:40:00<2:55:21, 5.76s/it] {'loss': 0.5767, 'learning_rate': 3.71944144356856e-06, 'epoch': 0.72} 72%|███████▏ | 4812/6640 [3:40:00<2:55:21, 5.76s/it] 72%|███████▏ | 4813/6640 [3:40:06<2:52:32, 5.67s/it] {'loss': 0.5875, 'learning_rate': 3.715646092121444e-06, 'epoch': 0.72} 72%|███████▏ | 4813/6640 [3:40:06<2:52:32, 5.67s/it] 72%|███████▎ | 4814/6640 [3:40:11<2:50:26, 5.60s/it] {'loss': 0.5653, 'learning_rate': 3.7118522361819676e-06, 'epoch': 0.72} 72%|███████▎ | 4814/6640 [3:40:11<2:50:26, 5.60s/it] 73%|███████▎ | 4815/6640 [3:40:16<2:47:10, 5.50s/it] {'loss': 0.5492, 'learning_rate': 3.7080598766529686e-06, 'epoch': 0.73} 73%|███████▎ | 4815/6640 [3:40:16<2:47:10, 5.50s/it] 73%|███████▎ | 4816/6640 [3:40:22<2:45:36, 5.45s/it] {'loss': 0.5746, 'learning_rate': 3.704269014436932e-06, 'epoch': 0.73} 73%|███████▎ | 4816/6640 [3:40:22<2:45:36, 5.45s/it] 73%|███████▎ | 4817/6640 [3:40:27<2:44:34, 5.42s/it] {'loss': 0.5744, 'learning_rate': 3.7004796504359696e-06, 'epoch': 0.73} 73%|███████▎ | 4817/6640 [3:40:27<2:44:34, 5.42s/it] 73%|███████▎ | 4818/6640 [3:40:33<2:46:53, 5.50s/it] 
{'loss': 0.6211, 'learning_rate': 3.6966917855518626e-06, 'epoch': 0.73} 73%|███████▎ | 4818/6640 [3:40:33<2:46:53, 5.50s/it] 73%|███████▎ | 4819/6640 [3:40:38<2:48:31, 5.55s/it] {'loss': 0.5885, 'learning_rate': 3.6929054206860116e-06, 'epoch': 0.73} 73%|███████▎ | 4819/6640 [3:40:39<2:48:31, 5.55s/it] 73%|███████▎ | 4820/6640 [3:40:44<2:49:51, 5.60s/it] {'loss': 0.5883, 'learning_rate': 3.689120556739475e-06, 'epoch': 0.73} 73%|███████▎ | 4820/6640 [3:40:44<2:49:51, 5.60s/it] 73%|███████▎ | 4821/6640 [3:40:50<2:47:55, 5.54s/it] {'loss': 0.6061, 'learning_rate': 3.685337194612941e-06, 'epoch': 0.73} 73%|███████▎ | 4821/6640 [3:40:50<2:47:55, 5.54s/it] 73%|███████▎ | 4822/6640 [3:40:55<2:47:13, 5.52s/it] {'loss': 0.589, 'learning_rate': 3.6815553352067602e-06, 'epoch': 0.73} 73%|███████▎ | 4822/6640 [3:40:55<2:47:13, 5.52s/it] 73%|███████▎ | 4823/6640 [3:41:00<2:45:29, 5.46s/it] {'loss': 0.5835, 'learning_rate': 3.677774979420904e-06, 'epoch': 0.73} 73%|███████▎ | 4823/6640 [3:41:00<2:45:29, 5.46s/it] 73%|███████▎ | 4824/6640 [3:41:06<2:44:49, 5.45s/it] {'loss': 0.5958, 'learning_rate': 3.6739961281549997e-06, 'epoch': 0.73} 73%|███████▎ | 4824/6640 [3:41:06<2:44:49, 5.45s/it] 73%|███████▎ | 4825/6640 [3:41:11<2:44:36, 5.44s/it] {'loss': 0.5919, 'learning_rate': 3.6702187823083147e-06, 'epoch': 0.73} 73%|███████▎ | 4825/6640 [3:41:11<2:44:36, 5.44s/it] 73%|███████▎ | 4826/6640 [3:41:17<2:43:49, 5.42s/it] {'loss': 0.5639, 'learning_rate': 3.6664429427797466e-06, 'epoch': 0.73} 73%|███████▎ | 4826/6640 [3:41:17<2:43:49, 5.42s/it] 73%|███████▎ | 4827/6640 [3:41:22<2:43:01, 5.40s/it] {'loss': 0.5774, 'learning_rate': 3.6626686104678565e-06, 'epoch': 0.73} 73%|███████▎ | 4827/6640 [3:41:22<2:43:01, 5.40s/it] 73%|███████▎ | 4828/6640 [3:41:27<2:43:08, 5.40s/it] {'loss': 0.5809, 'learning_rate': 3.6588957862708252e-06, 'epoch': 0.73} 73%|███████▎ | 4828/6640 [3:41:27<2:43:08, 5.40s/it] 73%|███████▎ | 4829/6640 [3:41:33<2:46:12, 5.51s/it] {'loss': 0.5645, 'learning_rate': 
3.655124471086491e-06, 'epoch': 0.73} 73%|███████▎ | 4829/6640 [3:41:33<2:46:12, 5.51s/it] 73%|███████▎ | 4830/6640 [3:41:39<2:48:19, 5.58s/it] {'loss': 0.5676, 'learning_rate': 3.651354665812313e-06, 'epoch': 0.73} 73%|███████▎ | 4830/6640 [3:41:39<2:48:19, 5.58s/it] 73%|███████▎ | 4831/6640 [3:41:44<2:47:16, 5.55s/it] {'loss': 0.5714, 'learning_rate': 3.6475863713454207e-06, 'epoch': 0.73} 73%|███████▎ | 4831/6640 [3:41:44<2:47:16, 5.55s/it] 73%|███████▎ | 4832/6640 [3:41:50<2:45:56, 5.51s/it] {'loss': 0.5753, 'learning_rate': 3.6438195885825533e-06, 'epoch': 0.73} 73%|███████▎ | 4832/6640 [3:41:50<2:45:56, 5.51s/it] 73%|███████▎ | 4833/6640 [3:41:55<2:43:54, 5.44s/it] {'loss': 0.5889, 'learning_rate': 3.6400543184201107e-06, 'epoch': 0.73} 73%|███████▎ | 4833/6640 [3:41:55<2:43:54, 5.44s/it] 73%|███████▎ | 4834/6640 [3:42:01<2:46:47, 5.54s/it] {'loss': 0.5802, 'learning_rate': 3.6362905617541276e-06, 'epoch': 0.73} 73%|███████▎ | 4834/6640 [3:42:01<2:46:47, 5.54s/it] 73%|███████▎ | 4835/6640 [3:42:06<2:45:33, 5.50s/it] {'loss': 0.5598, 'learning_rate': 3.6325283194802675e-06, 'epoch': 0.73} 73%|███████▎ | 4835/6640 [3:42:06<2:45:33, 5.50s/it] 73%|███████▎ | 4836/6640 [3:42:12<2:44:34, 5.47s/it] {'loss': 0.5704, 'learning_rate': 3.628767592493857e-06, 'epoch': 0.73} 73%|███████▎ | 4836/6640 [3:42:12<2:44:34, 5.47s/it] 73%|███████▎ | 4837/6640 [3:42:17<2:45:00, 5.49s/it] {'loss': 0.5908, 'learning_rate': 3.6250083816898374e-06, 'epoch': 0.73} 73%|███████▎ | 4837/6640 [3:42:17<2:45:00, 5.49s/it] 73%|███████▎ | 4838/6640 [3:42:23<2:44:24, 5.47s/it] {'loss': 0.5852, 'learning_rate': 3.6212506879628074e-06, 'epoch': 0.73} 73%|███████▎ | 4838/6640 [3:42:23<2:44:24, 5.47s/it] 73%|███████▎ | 4839/6640 [3:42:28<2:42:51, 5.43s/it] {'loss': 0.5949, 'learning_rate': 3.617494512206986e-06, 'epoch': 0.73} 73%|███████▎ | 4839/6640 [3:42:28<2:42:51, 5.43s/it] 73%|███████▎ | 4840/6640 [3:42:33<2:43:37, 5.45s/it] {'loss': 0.5656, 'learning_rate': 3.613739855316257e-06, 'epoch': 
0.73} 73%|███████▎ | 4840/6640 [3:42:33<2:43:37, 5.45s/it] 73%|███████▎ | 4841/6640 [3:42:39<2:43:07, 5.44s/it] {'loss': 0.5798, 'learning_rate': 3.6099867181841188e-06, 'epoch': 0.73} 73%|███████▎ | 4841/6640 [3:42:39<2:43:07, 5.44s/it] 73%|███████▎ | 4842/6640 [3:42:45<2:46:11, 5.55s/it] {'loss': 0.575, 'learning_rate': 3.606235101703719e-06, 'epoch': 0.73} 73%|███████▎ | 4842/6640 [3:42:45<2:46:11, 5.55s/it] 73%|███████▎ | 4843/6640 [3:42:50<2:46:22, 5.56s/it] {'loss': 0.5834, 'learning_rate': 3.6024850067678454e-06, 'epoch': 0.73} 73%|███████▎ | 4843/6640 [3:42:50<2:46:22, 5.56s/it] 73%|███████▎ | 4844/6640 [3:42:55<2:43:35, 5.47s/it] {'loss': 0.5683, 'learning_rate': 3.598736434268909e-06, 'epoch': 0.73} 73%|███████▎ | 4844/6640 [3:42:55<2:43:35, 5.47s/it] 73%|███████▎ | 4845/6640 [3:43:01<2:48:18, 5.63s/it] {'loss': 0.5889, 'learning_rate': 3.594989385098985e-06, 'epoch': 0.73} 73%|███████▎ | 4845/6640 [3:43:01<2:48:18, 5.63s/it] 73%|███████▎ | 4846/6640 [3:43:07<2:45:41, 5.54s/it] {'loss': 0.5673, 'learning_rate': 3.591243860149759e-06, 'epoch': 0.73} 73%|███████▎ | 4846/6640 [3:43:07<2:45:41, 5.54s/it] 73%|███████▎ | 4847/6640 [3:43:12<2:44:34, 5.51s/it] {'loss': 0.5967, 'learning_rate': 3.58749986031257e-06, 'epoch': 0.73} 73%|███████▎ | 4847/6640 [3:43:12<2:44:34, 5.51s/it] 73%|███████▎ | 4848/6640 [3:43:18<2:46:30, 5.58s/it] {'loss': 0.5667, 'learning_rate': 3.583757386478389e-06, 'epoch': 0.73} 73%|███████▎ | 4848/6640 [3:43:18<2:46:30, 5.58s/it] 73%|███████▎ | 4849/6640 [3:43:23<2:44:08, 5.50s/it] {'loss': 0.58, 'learning_rate': 3.580016439537823e-06, 'epoch': 0.73} 73%|███████▎ | 4849/6640 [3:43:23<2:44:08, 5.50s/it]2 7AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
06 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 73%|███████▎ | 4850/6640 [3:43:29<2:43:20, 5.48s/it] {'loss': 0.5744, 'learning_rate': 3.5762770203811225e-06, 'epoch': 0.73} 73%|███████▎ | 4850/6640 [3:43:29<2:43:20, 5.48s/it] 73%|███████▎ | 4851/6640 [3:43:34<2:43:36, 5.49s/it] {'loss': 0.5758, 'learning_rate': 3.5725391298981616e-06, 'epoch': 0.73} 73%|███████▎ | 4851/6640 [3:43:34<2:43:36, 5.49s/it] 73%|███████▎ | 4852/6640 [3:43:40<2:43:17, 5.48s/it] {'loss': 0.5839, 'learning_rate': 3.568802768978463e-06, 'epoch': 0.73} 73%|███████▎ | 4852/6640 [3:43:40<2:43:17, 5.48s/it] 73%|███████▎ | 4853/6640 [3:43:45<2:44:50, 5.53s/it] {'loss': 0.5824, 'learning_rate': 3.565067938511173e-06, 'epoch': 0.73} 73%|███████▎ | 4853/6640 [3:43:45<2:44:50, 5.53s/it] 73%|███████▎ | 4854/6640 [3:43:51<2:43:58, 5.51s/it] {'loss': 0.5972, 'learning_rate': 3.5613346393850934e-06, 'epoch': 0.73} 73%|███████▎ | 4854/6640 [3:43:51<2:43:58, 5.51s/it] 73%|███████▎ | 4855/6640 [3:43:56<2:45:06, 5.55s/it] {'loss': 0.5875, 'learning_rate': 3.557602872488638e-06, 'epoch': 0.73} 73%|███████▎ | 4855/6640 [3:43:56<2:45:06, 5.55s/it] 73%|███████▎ | 4856/6640 [3:44:02<2:43:47, 5.51s/it] {'loss': 0.5356, 'learning_rate': 3.553872638709871e-06, 'epoch': 0.73} 73%|███████▎ | 4856/6640 [3:44:02<2:43:47, 5.51s/it] 73%|███████▎ | 4857/6640 [3:44:07<2:42:29, 5.47s/it] {'loss': 0.5813, 'learning_rate': 3.55014393893649e-06, 'epoch': 0.73} 73%|███████▎ | 4857/6640 [3:44:07<2:42:29, 5.47s/it] 73%|███████▎ | 4858/6640 [3:44:12<2:40:05, 5.39s/it] {'loss': 0.5889, 'learning_rate': 3.546416774055823e-06, 'epoch': 0.73} 73%|███████▎ | 4858/6640 [3:44:12<2:40:05, 5.39s/it] 73%|███████▎ | 4859/6640 [3:44:18<2:40:50, 5.42s/it] {'loss': 0.5836, 'learning_rate': 3.5426911449548395e-06, 'epoch': 0.73} 73%|███████▎ | 4859/6640 [3:44:18<2:40:50, 5.42s/it] 73%|███████▎ | 4860/6640 [3:44:23<2:41:19, 5.44s/it] {'loss': 0.5883, 'learning_rate': 3.5389670525201335e-06, 'epoch': 
0.73} 73%|███████▎ | 4860/6640 [3:44:23<2:41:19, 5.44s/it] 73%|███████▎ | 4861/6640 [3:44:29<2:41:22, 5.44s/it] {'loss': 0.5698, 'learning_rate': 3.535244497637944e-06, 'epoch': 0.73} 73%|███████▎ | 4861/6640 [3:44:29<2:41:22, 5.44s/it] 73%|███████▎ | 4862/6640 [3:44:34<2:42:04, 5.47s/it] {'loss': 0.5896, 'learning_rate': 3.53152348119413e-06, 'epoch': 0.73} 73%|███████▎ | 4862/6640 [3:44:34<2:42:04, 5.47s/it] 73%|███████▎ | 4863/6640 [3:44:40<2:40:30, 5.42s/it] {'loss': 0.5788, 'learning_rate': 3.5278040040742078e-06, 'epoch': 0.73} 73%|███████▎ | 4863/6640 [3:44:40<2:40:30, 5.42s/it] 73%|███████▎ | 4864/6640 [3:44:45<2:40:36, 5.43s/it] {'loss': 0.5857, 'learning_rate': 3.5240860671633027e-06, 'epoch': 0.73} 73%|███████▎ | 4864/6640 [3:44:45<2:40:36, 5.43s/it] 73%|███████▎ | 4865/6640 [3:44:51<2:43:10, 5.52s/it] {'loss': 0.5733, 'learning_rate': 3.5203696713461866e-06, 'epoch': 0.73} 73%|███████▎ | 4865/6640 [3:44:51<2:43:10, 5.52s/it] 73%|███████▎ | 4866/6640 [3:44:56<2:43:57, 5.55s/it] {'loss': 0.5804, 'learning_rate': 3.5166548175072622e-06, 'epoch': 0.73} 73%|███████▎ | 4866/6640 [3:44:56<2:43:57, 5.55s/it] 73%|███████▎ | 4867/6640 [3:45:02<2:43:14, 5.52s/it] {'loss': 0.5547, 'learning_rate': 3.512941506530566e-06, 'epoch': 0.73} 73%|███████▎ | 4867/6640 [3:45:02<2:43:14, 5.52s/it] 73%|███████▎ | 4868/6640 [3:45:08<2:44:33, 5.57s/it] {'loss': 0.5894, 'learning_rate': 3.5092297392997686e-06, 'epoch': 0.73} 73%|███████▎ | 4868/6640 [3:45:08<2:44:33, 5.57s/it] 73%|███████▎ | 4869/6640 [3:45:13<2:42:44, 5.51s/it] {'loss': 0.572, 'learning_rate': 3.505519516698165e-06, 'epoch': 0.73} 73%|███████▎ | 4869/6640 [3:45:13<2:42:44, 5.51s/it] 73%|███████▎ | 4870/6640 [3:45:18<2:41:05, 5.46s/it] {'loss': 0.5719, 'learning_rate': 3.5018108396086945e-06, 'epoch': 0.73} 73%|███████▎ | 4870/6640 [3:45:18<2:41:05, 5.46s/it] 73%|███████▎ | 4871/6640 [3:45:24<2:40:54, 5.46s/it] {'loss': 0.5825, 'learning_rate': 3.4981037089139137e-06, 'epoch': 0.73} 73%|███████▎ | 4871/6640 
[3:45:24<2:40:54, 5.46s/it] 73%|███████▎ | 4872/6640 [3:45:29<2:40:37, 5.45s/it] {'loss': 0.5825, 'learning_rate': 3.4943981254960325e-06, 'epoch': 0.73} 73%|███████▎ | 4872/6640 [3:45:29<2:40:37, 5.45s/it] 73%|███████▎ | 4873/6640 [3:45:35<2:42:02, 5.50s/it] {'loss': 0.584, 'learning_rate': 3.4906940902368726e-06, 'epoch': 0.73} 73%|███████▎ | 4873/6640 [3:45:35<2:42:02, 5.50s/it] 73%|███████▎ | 4874/6640 [3:45:40<2:40:04, 5.44s/it] {'loss': 0.5856, 'learning_rate': 3.486991604017895e-06, 'epoch': 0.73} 73%|███████▎ | 4874/6640 [3:45:40<2:40:04, 5.44s/it] 73%|███████▎ | 4875/6640 [3:45:46<2:39:21, 5.42s/it] {'loss': 0.5884, 'learning_rate': 3.483290667720196e-06, 'epoch': 0.73} 73%|███████▎ | 4875/6640 [3:45:46<2:39:21, 5.42s/it] 73%|███████▎ | 4876/6640 [3:45:51<2:39:58, 5.44s/it] {'loss': 0.5758, 'learning_rate': 3.479591282224496e-06, 'epoch': 0.73} 73%|███████▎ | 4876/6640 [3:45:51<2:39:58, 5.44s/it] 73%|███████▎ | 4877/6640 [3:45:56<2:38:30, 5.39s/it] {'loss': 0.5704, 'learning_rate': 3.475893448411154e-06, 'epoch': 0.73} 73%|███████▎ | 4877/6640 [3:45:56<2:38:30, 5.39s/it] 73%|███████▎ | 4878/6640 [3:46:02<2:38:30, 5.40s/it] {'loss': 0.5863, 'learning_rate': 3.4721971671601485e-06, 'epoch': 0.73} 73%|███████▎ | 4878/6640 [3:46:02<2:38:30, 5.40s/it] 73%|███████▎ | 4879/6640 [3:46:07<2:37:19, 5.36s/it] {'loss': 0.5808, 'learning_rate': 3.4685024393511015e-06, 'epoch': 0.73} 73%|███████▎ | 4879/6640 [3:46:07<2:37:19, 5.36s/it] 73%|███████▎ | 4880/6640 [3:46:12<2:38:04, 5.39s/it] {'loss': 0.5746, 'learning_rate': 3.4648092658632506e-06, 'epoch': 0.73} 73%|███████▎ | 4880/6640 [3:46:12<2:38:04, 5.39s/it] 74%|███████▎ | 4881/6640 [3:46:18<2:38:15, 5.40s/it] {'loss': 0.5855, 'learning_rate': 3.461117647575484e-06, 'epoch': 0.74} 74%|███████▎ | 4881/6640 [3:46:18<2:38:15, 5.40s/it] 74%|███████▎ | 4882/6640 [3:46:23<2:39:02, 5.43s/it] {'loss': 0.589, 'learning_rate': 3.4574275853662974e-06, 'epoch': 0.74} 74%|███████▎ | 4882/6640 [3:46:23<2:39:02, 5.43s/it] 
74%|███████▎ | 4883/6640 [3:46:29<2:41:05, 5.50s/it] {'loss': 0.5762, 'learning_rate': 3.4537390801138306e-06, 'epoch': 0.74} 74%|███████▎ | 4883/6640 [3:46:29<2:41:05, 5.50s/it] 74%|███████▎ | 4884/6640 [3:46:34<2:40:25, 5.48s/it] {'loss': 0.6003, 'learning_rate': 3.4500521326958492e-06, 'epoch': 0.74} 74%|███████▎ | 4884/6640 [3:46:34<2:40:25, 5.48s/it] 74%|███████▎ | 4885/6640 [3:46:40<2:40:07, 5.47s/it] {'loss': 0.5852, 'learning_rate': 3.4463667439897486e-06, 'epoch': 0.74} 74%|███████▎ | 4885/6640 [3:46:40<2:40:07, 5.47s/it] 74%|███████▎ | 4886/6640 [3:46:46<2:41:27, 5.52s/it] {'loss': 0.5716, 'learning_rate': 3.4426829148725538e-06, 'epoch': 0.74} 74%|███████▎ | 4886/6640 [3:46:46<2:41:27, 5.52s/it] 74%|███████▎ | 4887/6640 [3:46:51<2:42:22, 5.56s/it] {'loss': 0.577, 'learning_rate': 3.4390006462209113e-06, 'epoch': 0.74} 74%|███████▎ | 4887/6640 [3:46:51<2:42:22, 5.56s/it] 74%|███████▎ | 4888/6640 [3:46:57<2:40:35, 5.50s/it] {'loss': 0.5891, 'learning_rate': 3.435319938911107e-06, 'epoch': 0.74} 74%|███████▎ | 4888/6640 [3:46:57<2:40:35, 5.50s/it] 74%|███████▎ | 4889/6640 [3:47:02<2:40:27, 5.50s/it] {'loss': 0.5612, 'learning_rate': 3.4316407938190486e-06, 'epoch': 0.74} 74%|███████▎ | 4889/6640 [3:47:02<2:40:27, 5.50s/it] 74%|███████▎ | 4890/6640 [3:47:08<2:41:20, 5.53s/it] {'loss': 0.5872, 'learning_rate': 3.4279632118202744e-06, 'epoch': 0.74} 74%|███████▎ | 4890/6640 [3:47:08<2:41:20, 5.53s/it] 74%|███████▎ | 4891/6640 [3:47:13<2:40:35, 5.51s/it] {'loss': 0.5971, 'learning_rate': 3.4242871937899548e-06, 'epoch': 0.74} 74%|███████▎ | 4891/6640 [3:47:13<2:40:35, 5.51s/it] 74%|███████▎ | 4892/6640 [3:47:19<2:39:37, 5.48s/it] {'loss': 0.57, 'learning_rate': 3.4206127406028744e-06, 'epoch': 0.74} 74%|███████▎ | 4892/6640 [3:47:19<2:39:37, 5.48s/it] 74%|███████▎ | 4893/6640 [3:47:24<2:39:21, 5.47s/it] {'loss': 0.5912, 'learning_rate': 3.416939853133461e-06, 'epoch': 0.74} 74%|███████▎ | 4893/6640 [3:47:24<2:39:21, 5.47s/it] 74%|███████▎ | 4894/6640 
[3:47:30<2:40:26, 5.51s/it] {'loss': 0.5621, 'learning_rate': 3.41326853225576e-06, 'epoch': 0.74} 74%|███████▎ | 4894/6640 [3:47:30<2:40:26, 5.51s/it] 74%|███████▎ | 4895/6640 [3:47:35<2:40:56, 5.53s/it] {'loss': 0.5815, 'learning_rate': 3.4095987788434538e-06, 'epoch': 0.74} 74%|███████▎ | 4895/6640 [3:47:35<2:40:56, 5.53s/it] 74%|███████▎ | 4896/6640 [3:47:41<2:44:43, 5.67s/it] {'loss': 0.5617, 'learning_rate': 3.4059305937698363e-06, 'epoch': 0.74} 74%|███████▎ | 4896/6640 [3:47:41<2:44:43, 5.67s/it] 74%|███████▍ | 4897/6640 [3:47:47<2:42:39, 5.60s/it] {'loss': 0.5559, 'learning_rate': 3.4022639779078403e-06, 'epoch': 0.74} 74%|███████▍ | 4897/6640 [3:47:47<2:42:39, 5.60s/it] 74%|███████▍ | 4898/6640 [3:47:52<2:40:49, 5.54s/it] {'loss': 0.5741, 'learning_rate': 3.398598932130024e-06, 'epoch': 0.74} 74%|███████▍ | 4898/6640 [3:47:52<2:40:49, 5.54s/it] 74%|███████▍ | 4899/6640 [3:47:57<2:40:01, 5.51s/it] {'loss': 0.5551, 'learning_rate': 3.394935457308569e-06, 'epoch': 0.74} 74%|███████▍ | 4899/6640 [3:47:57<2:40:01, 5.51s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
74%|███████▍ | 4900/6640 [3:48:03<2:43:15, 5.63s/it] {'loss': 0.5869, 'learning_rate': 3.3912735543152864e-06, 'epoch': 0.74} 74%|███████▍ | 4900/6640 [3:48:03<2:43:15, 5.63s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-4900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 74%|███████▍ | 4901/6640 [3:48:30<5:47:03, 11.97s/it] {'loss': 0.5898, 'learning_rate': 3.3876132240216066e-06, 'epoch': 0.74} 74%|███████▍ | 4901/6640 [3:48:30<5:47:03, 11.97s/it] 74%|███████▍ | 4902/6640 [3:48:36<4:50:04, 10.01s/it] {'loss': 0.606, 'learning_rate': 3.3839544672985904e-06, 'epoch': 0.74} 74%|███████▍ | 4902/6640 [3:48:36<4:50:04, 10.01s/it] 74%|███████▍ | 4903/6640 [3:48:41<4:12:30, 8.72s/it] {'loss': 0.5668, 'learning_rate': 3.380297285016925e-06, 'epoch': 0.74} 74%|███████▍ | 4903/6640 [3:48:41<4:12:30, 8.72s/it] 74%|███████▍ | 4904/6640 [3:48:47<3:43:38, 7.73s/it] {'loss': 0.5913, 'learning_rate': 3.376641678046926e-06, 'epoch': 0.74} 74%|███████▍ | 4904/6640 [3:48:47<3:43:38, 7.73s/it] 74%|███████▍ | 4905/6640 [3:48:52<3:23:47, 7.05s/it] {'loss': 0.5411, 'learning_rate': 3.372987647258521e-06, 'epoch': 0.74} 74%|███████▍ | 4905/6640 [3:48:52<3:23:47, 7.05s/it] 74%|███████▍ | 4906/6640 [3:48:58<3:10:35, 6.59s/it] {'loss': 0.5759, 'learning_rate': 3.369335193521277e-06, 'epoch': 0.74} 74%|███████▍ | 
4906/6640 [3:48:58<3:10:35, 6.59s/it] 74%|███████▍ | 4907/6640 [3:49:03<3:02:16, 6.31s/it] {'loss': 0.5652, 'learning_rate': 3.3656843177043786e-06, 'epoch': 0.74} 74%|███████▍ | 4907/6640 [3:49:03<3:02:16, 6.31s/it] 74%|███████▍ | 4908/6640 [3:49:09<2:53:36, 6.01s/it] {'loss': 0.5902, 'learning_rate': 3.362035020676637e-06, 'epoch': 0.74} 74%|███████▍ | 4908/6640 [3:49:09<2:53:36, 6.01s/it] 74%|███████▍ | 4909/6640 [3:49:14<2:49:01, 5.86s/it] {'loss': 0.5847, 'learning_rate': 3.3583873033064897e-06, 'epoch': 0.74} 74%|███████▍ | 4909/6640 [3:49:14<2:49:01, 5.86s/it] 74%|███████▍ | 4910/6640 [3:49:20<2:45:12, 5.73s/it] {'loss': 0.5839, 'learning_rate': 3.354741166461989e-06, 'epoch': 0.74} 74%|███████▍ | 4910/6640 [3:49:20<2:45:12, 5.73s/it] 74%|███████▍ | 4911/6640 [3:49:25<2:42:07, 5.63s/it] {'loss': 0.5684, 'learning_rate': 3.3510966110108213e-06, 'epoch': 0.74} 74%|███████▍ | 4911/6640 [3:49:25<2:42:07, 5.63s/it] 74%|███████▍ | 4912/6640 [3:49:31<2:43:50, 5.69s/it] {'loss': 0.5781, 'learning_rate': 3.3474536378202925e-06, 'epoch': 0.74} 74%|███████▍ | 4912/6640 [3:49:31<2:43:50, 5.69s/it] 74%|███████▍ | 4913/6640 [3:49:36<2:41:31, 5.61s/it] {'loss': 0.5715, 'learning_rate': 3.3438122477573365e-06, 'epoch': 0.74} 74%|███████▍ | 4913/6640 [3:49:36<2:41:31, 5.61s/it] 74%|███████▍ | 4914/6640 [3:49:42<2:42:44, 5.66s/it] {'loss': 0.5945, 'learning_rate': 3.3401724416884994e-06, 'epoch': 0.74} 74%|███████▍ | 4914/6640 [3:49:42<2:42:44, 5.66s/it] 74%|███████▍ | 4915/6640 [3:49:48<2:42:08, 5.64s/it] {'loss': 0.5947, 'learning_rate': 3.3365342204799613e-06, 'epoch': 0.74} 74%|███████▍ | 4915/6640 [3:49:48<2:42:08, 5.64s/it] 74%|███████▍ | 4916/6640 [3:49:53<2:39:59, 5.57s/it] {'loss': 0.5699, 'learning_rate': 3.332897584997521e-06, 'epoch': 0.74} 74%|███████▍ | 4916/6640 [3:49:53<2:39:59, 5.57s/it] 74%|███████▍ | 4917/6640 [3:49:59<2:39:23, 5.55s/it] {'loss': 0.5674, 'learning_rate': 3.3292625361065988e-06, 'epoch': 0.74} 74%|███████▍ | 4917/6640 [3:49:59<2:39:23, 
5.55s/it] 74%|███████▍ | 4918/6640 [3:50:04<2:38:35, 5.53s/it] {'loss': 0.5922, 'learning_rate': 3.325629074672244e-06, 'epoch': 0.74} 74%|███████▍ | 4918/6640 [3:50:04<2:38:35, 5.53s/it] 74%|███████▍ | 4919/6640 [3:50:09<2:37:46, 5.50s/it] {'loss': 0.5739, 'learning_rate': 3.321997201559115e-06, 'epoch': 0.74} 74%|███████▍ | 4919/6640 [3:50:09<2:37:46, 5.50s/it] 74%|███████▍ | 4920/6640 [3:50:15<2:37:20, 5.49s/it] {'loss': 0.5552, 'learning_rate': 3.3183669176315046e-06, 'epoch': 0.74} 74%|███████▍ | 4920/6640 [3:50:15<2:37:20, 5.49s/it] 74%|███████▍ | 4921/6640 [3:50:20<2:36:45, 5.47s/it] {'loss': 0.5887, 'learning_rate': 3.314738223753323e-06, 'epoch': 0.74} 74%|███████▍ | 4921/6640 [3:50:20<2:36:45, 5.47s/it] 74%|███████▍ | 4922/6640 [3:50:26<2:36:15, 5.46s/it] {'loss': 0.6175, 'learning_rate': 3.311111120788104e-06, 'epoch': 0.74} 74%|███████▍ | 4922/6640 [3:50:26<2:36:15, 5.46s/it] 74%|███████▍ | 4923/6640 [3:50:31<2:38:02, 5.52s/it] {'loss': 0.5809, 'learning_rate': 3.307485609598995e-06, 'epoch': 0.74} 74%|███████▍ | 4923/6640 [3:50:31<2:38:02, 5.52s/it] 74%|███████▍ | 4924/6640 [3:50:37<2:36:59, 5.49s/it] {'loss': 0.592, 'learning_rate': 3.303861691048774e-06, 'epoch': 0.74} 74%|███████▍ | 4924/6640 [3:50:37<2:36:59, 5.49s/it] 74%|███████▍ | 4925/6640 [3:50:42<2:36:58, 5.49s/it] {'loss': 0.5796, 'learning_rate': 3.3002393659998357e-06, 'epoch': 0.74} 74%|███████▍ | 4925/6640 [3:50:42<2:36:58, 5.49s/it] 74%|███████▍ | 4926/6640 [3:50:48<2:35:59, 5.46s/it] {'loss': 0.563, 'learning_rate': 3.296618635314197e-06, 'epoch': 0.74} 74%|███████▍ | 4926/6640 [3:50:48<2:35:59, 5.46s/it] 74%|███████▍ | 4927/6640 [3:50:53<2:37:10, 5.51s/it] {'loss': 0.5786, 'learning_rate': 3.2929994998534976e-06, 'epoch': 0.74} 74%|███████▍ | 4927/6640 [3:50:53<2:37:10, 5.51s/it] 74%|███████▍ | 4928/6640 [3:50:59<2:35:18, 5.44s/it] {'loss': 0.5861, 'learning_rate': 3.289381960478988e-06, 'epoch': 0.74} 74%|███████▍ | 4928/6640 [3:50:59<2:35:18, 5.44s/it] 74%|███████▍ | 4929/6640 
[3:51:04<2:34:55, 5.43s/it] {'loss': 0.5896, 'learning_rate': 3.2857660180515484e-06, 'epoch': 0.74} 74%|███████▍ | 4929/6640 [3:51:04<2:34:55, 5.43s/it] 74%|███████▍ | 4930/6640 [3:51:10<2:36:24, 5.49s/it] {'loss': 0.5866, 'learning_rate': 3.2821516734316772e-06, 'epoch': 0.74} 74%|███████▍ | 4930/6640 [3:51:10<2:36:24, 5.49s/it] 74%|███████▍ | 4931/6640 [3:51:15<2:35:11, 5.45s/it] {'loss': 0.5746, 'learning_rate': 3.278538927479491e-06, 'epoch': 0.74} 74%|███████▍ | 4931/6640 [3:51:15<2:35:11, 5.45s/it] 74%|███████▍ | 4932/6640 [3:51:20<2:35:00, 5.45s/it] {'loss': 0.5614, 'learning_rate': 3.2749277810547286e-06, 'epoch': 0.74} 74%|███████▍ | 4932/6640 [3:51:20<2:35:00, 5.45s/it] 74%|███████▍ | 4933/6640 [3:51:26<2:33:03, 5.38s/it] {'loss': 0.5506, 'learning_rate': 3.2713182350167417e-06, 'epoch': 0.74} 74%|███████▍ | 4933/6640 [3:51:26<2:33:03, 5.38s/it] 74%|███████▍ | 4934/6640 [3:51:31<2:32:43, 5.37s/it] {'loss': 0.5925, 'learning_rate': 3.2677102902245073e-06, 'epoch': 0.74} 74%|███████▍ | 4934/6640 [3:51:31<2:32:43, 5.37s/it] 74%|███████▍ | 4935/6640 [3:51:37<2:36:01, 5.49s/it] {'loss': 0.5695, 'learning_rate': 3.264103947536619e-06, 'epoch': 0.74} 74%|███████▍ | 4935/6640 [3:51:37<2:36:01, 5.49s/it] 74%|███████▍ | 4936/6640 [3:51:42<2:34:56, 5.46s/it] {'loss': 0.5896, 'learning_rate': 3.260499207811293e-06, 'epoch': 0.74} 74%|███████▍ | 4936/6640 [3:51:42<2:34:56, 5.46s/it] 74%|███████▍ | 4937/6640 [3:51:48<2:34:20, 5.44s/it] {'loss': 0.5694, 'learning_rate': 3.256896071906355e-06, 'epoch': 0.74} 74%|███████▍ | 4937/6640 [3:51:48<2:34:20, 5.44s/it] 74%|███████▍ | 4938/6640 [3:51:53<2:33:16, 5.40s/it] {'loss': 0.5995, 'learning_rate': 3.2532945406792573e-06, 'epoch': 0.74} 74%|███████▍ | 4938/6640 [3:51:53<2:33:16, 5.40s/it] 74%|███████▍ | 4939/6640 [3:51:58<2:34:30, 5.45s/it] {'loss': 0.5723, 'learning_rate': 3.249694614987068e-06, 'epoch': 0.74} 74%|███████▍ | 4939/6640 [3:51:58<2:34:30, 5.45s/it] 74%|███████▍ | 4940/6640 [3:52:04<2:34:53, 5.47s/it] 
{'loss': 0.5883, 'learning_rate': 3.2460962956864727e-06, 'epoch': 0.74} 74%|███████▍ | 4940/6640 [3:52:04<2:34:53, 5.47s/it] 74%|███████▍ | 4941/6640 [3:52:09<2:34:44, 5.46s/it] {'loss': 0.5639, 'learning_rate': 3.242499583633779e-06, 'epoch': 0.74} 74%|███████▍ | 4941/6640 [3:52:09<2:34:44, 5.46s/it] 74%|███████▍ | 4942/6640 [3:52:15<2:33:45, 5.43s/it] {'loss': 0.5584, 'learning_rate': 3.2389044796849e-06, 'epoch': 0.74} 74%|███████▍ | 4942/6640 [3:52:15<2:33:45, 5.43s/it] 74%|███████▍ | 4943/6640 [3:52:20<2:34:42, 5.47s/it] {'loss': 0.5859, 'learning_rate': 3.235310984695378e-06, 'epoch': 0.74} 74%|███████▍ | 4943/6640 [3:52:20<2:34:42, 5.47s/it] 74%|███████▍ | 4944/6640 [3:52:26<2:33:40, 5.44s/it] {'loss': 0.6032, 'learning_rate': 3.2317190995203673e-06, 'epoch': 0.74} 74%|███████▍ | 4944/6640 [3:52:26<2:33:40, 5.44s/it] 74%|███████▍ | 4945/6640 [3:52:31<2:34:05, 5.45s/it] {'loss': 0.5894, 'learning_rate': 3.2281288250146447e-06, 'epoch': 0.74} 74%|███████▍ | 4945/6640 [3:52:31<2:34:05, 5.45s/it] 74%|███████▍ | 4946/6640 [3:52:37<2:35:04, 5.49s/it] {'loss': 0.5497, 'learning_rate': 3.2245401620325934e-06, 'epoch': 0.74} 74%|███████▍ | 4946/6640 [3:52:37<2:35:04, 5.49s/it] 75%|███████▍ | 4947/6640 [3:52:42<2:34:24, 5.47s/it] {'loss': 0.5531, 'learning_rate': 3.22095311142822e-06, 'epoch': 0.75} 75%|███████▍ | 4947/6640 [3:52:42<2:34:24, 5.47s/it] 75%|███████▍ | 4948/6640 [3:52:48<2:33:42, 5.45s/it] {'loss': 0.5635, 'learning_rate': 3.2173676740551484e-06, 'epoch': 0.75} 75%|███████▍ | 4948/6640 [3:52:48<2:33:42, 5.45s/it] 75%|███████▍ | 4949/6640 [3:52:53<2:34:20, 5.48s/it] {'loss': 0.5794, 'learning_rate': 3.213783850766614e-06, 'epoch': 0.75} 75%|███████▍ | 4949/6640 [3:52:53<2:34:20, 5.48s/it]614 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend...3 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
2 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 75%|███████▍ | 4950/6640 [3:52:59<2:33:52, 5.46s/it] {'loss': 0.5431, 'learning_rate': 3.210201642415477e-06, 'epoch': 0.75} 75%|███████▍ | 4950/6640 [3:52:59<2:33:52, 5.46s/it] 75%|███████▍ | 4951/6640 [3:53:04<2:33:11, 5.44s/it] {'loss': 0.5723, 'learning_rate': 3.2066210498541984e-06, 'epoch': 0.75} 75%|███████▍ | 4951/6640 [3:53:04<2:33:11, 5.44s/it] 75%|███████▍ | 4952/6640 [3:53:09<2:32:44, 5.43s/it] {'loss': 0.5805, 'learning_rate': 3.2030420739348666e-06, 'epoch': 0.75} 75%|███████▍ | 4952/6640 [3:53:09<2:32:44, 5.43s/it] 75%|███████▍ | 4953/6640 [3:53:15<2:32:38, 5.43s/it] {'loss': 0.5669, 'learning_rate': 3.199464715509183e-06, 'epoch': 0.75} 75%|███████▍ | 4953/6640 [3:53:15<2:32:38, 5.43s/it] 75%|███████▍ | 4954/6640 [3:53:20<2:33:40, 5.47s/it] {'loss': 0.5682, 'learning_rate': 3.1958889754284637e-06, 'epoch': 0.75} 75%|███████▍ | 4954/6640 [3:53:20<2:33:40, 5.47s/it] 75%|███████▍ | 4955/6640 [3:53:26<2:32:54, 5.44s/it] {'loss': 0.5855, 'learning_rate': 3.1923148545436357e-06, 'epoch': 0.75} 75%|███████▍ | 4955/6640 [3:53:26<2:32:54, 5.44s/it] 75%|███████▍ | 4956/6640 [3:53:31<2:33:39, 5.47s/it] {'loss': 0.5865, 'learning_rate': 3.188742353705244e-06, 'epoch': 0.75} 75%|███████▍ | 4956/6640 [3:53:31<2:33:39, 5.47s/it] 75%|███████▍ | 4957/6640 [3:53:37<2:35:27, 5.54s/it] {'loss': 0.6014, 'learning_rate': 3.1851714737634496e-06, 'epoch': 0.75} 75%|███████▍ | 4957/6640 [3:53:37<2:35:27, 5.54s/it] 75%|███████▍ | 4958/6640 [3:53:42<2:32:48, 5.45s/it] {'loss': 0.5692, 'learning_rate': 3.181602215568027e-06, 'epoch': 0.75} 75%|███████▍ | 4958/6640 [3:53:42<2:32:48, 5.45s/it] 75%|███████▍ | 4959/6640 [3:53:48<2:34:44, 5.52s/it] {'loss': 0.5736, 'learning_rate': 3.1780345799683654e-06, 'epoch': 0.75} 75%|███████▍ | 4959/6640 [3:53:48<2:34:44, 5.52s/it] 75%|███████▍ | 4960/6640 [3:53:53<2:33:42, 5.49s/it] {'loss': 
0.5738, 'learning_rate': 3.174468567813461e-06, 'epoch': 0.75} 75%|███████▍ | 4960/6640 [3:53:53<2:33:42, 5.49s/it] 75%|███████▍ | 4961/6640 [3:53:59<2:32:26, 5.45s/it] {'loss': 0.5788, 'learning_rate': 3.1709041799519312e-06, 'epoch': 0.75} 75%|███████▍ | 4961/6640 [3:53:59<2:32:26, 5.45s/it] 75%|███████▍ | 4962/6640 [3:54:04<2:31:14, 5.41s/it] {'loss': 0.5628, 'learning_rate': 3.167341417232006e-06, 'epoch': 0.75} 75%|███████▍ | 4962/6640 [3:54:04<2:31:14, 5.41s/it] 75%|███████▍ | 4963/6640 [3:54:10<2:32:14, 5.45s/it] {'loss': 0.5724, 'learning_rate': 3.1637802805015296e-06, 'epoch': 0.75} 75%|███████▍ | 4963/6640 [3:54:10<2:32:14, 5.45s/it] 75%|███████▍ | 4964/6640 [3:54:15<2:33:21, 5.49s/it] {'loss': 0.5566, 'learning_rate': 3.1602207706079503e-06, 'epoch': 0.75} 75%|███████▍ | 4964/6640 [3:54:15<2:33:21, 5.49s/it] 75%|███████▍ | 4965/6640 [3:54:21<2:32:50, 5.48s/it] {'loss': 0.5788, 'learning_rate': 3.1566628883983395e-06, 'epoch': 0.75} 75%|███████▍ | 4965/6640 [3:54:21<2:32:50, 5.48s/it] 75%|███████▍ | 4966/6640 [3:54:26<2:31:47, 5.44s/it] {'loss': 0.5477, 'learning_rate': 3.1531066347193796e-06, 'epoch': 0.75} 75%|███████▍ | 4966/6640 [3:54:26<2:31:47, 5.44s/it] 75%|███████▍ | 4967/6640 [3:54:31<2:31:34, 5.44s/it] {'loss': 0.5806, 'learning_rate': 3.1495520104173603e-06, 'epoch': 0.75} 75%|███████▍ | 4967/6640 [3:54:31<2:31:34, 5.44s/it] 75%|███████▍ | 4968/6640 [3:54:37<2:33:09, 5.50s/it] {'loss': 0.5695, 'learning_rate': 3.1459990163381926e-06, 'epoch': 0.75} 75%|███████▍ | 4968/6640 [3:54:37<2:33:09, 5.50s/it] 75%|███████▍ | 4969/6640 [3:54:43<2:34:34, 5.55s/it] {'loss': 0.5994, 'learning_rate': 3.1424476533273864e-06, 'epoch': 0.75} 75%|███████▍ | 4969/6640 [3:54:43<2:34:34, 5.55s/it] 75%|███████▍ | 4970/6640 [3:54:48<2:33:30, 5.52s/it] {'loss': 0.5729, 'learning_rate': 3.138897922230074e-06, 'epoch': 0.75} 75%|███████▍ | 4970/6640 [3:54:48<2:33:30, 5.52s/it] 75%|███████▍ | 4971/6640 [3:54:53<2:31:52, 5.46s/it] {'loss': 0.5765, 'learning_rate': 
3.135349823890995e-06, 'epoch': 0.75} 75%|███████▍ | 4971/6640 [3:54:53<2:31:52, 5.46s/it] 75%|███████▍ | 4972/6640 [3:54:59<2:31:28, 5.45s/it] {'loss': 0.5612, 'learning_rate': 3.131803359154505e-06, 'epoch': 0.75} 75%|███████▍ | 4972/6640 [3:54:59<2:31:28, 5.45s/it] 75%|███████▍ | 4973/6640 [3:55:04<2:31:26, 5.45s/it] {'loss': 0.5664, 'learning_rate': 3.1282585288645675e-06, 'epoch': 0.75} 75%|███████▍ | 4973/6640 [3:55:04<2:31:26, 5.45s/it] 75%|███████▍ | 4974/6640 [3:55:10<2:30:59, 5.44s/it] {'loss': 0.5616, 'learning_rate': 3.1247153338647486e-06, 'epoch': 0.75} 75%|███████▍ | 4974/6640 [3:55:10<2:30:59, 5.44s/it] 75%|███████▍ | 4975/6640 [3:55:15<2:31:58, 5.48s/it] {'loss': 0.5889, 'learning_rate': 3.121173774998245e-06, 'epoch': 0.75} 75%|███████▍ | 4975/6640 [3:55:15<2:31:58, 5.48s/it] 75%|███████▍ | 4976/6640 [3:55:21<2:31:34, 5.47s/it] {'loss': 0.6028, 'learning_rate': 3.1176338531078467e-06, 'epoch': 0.75} 75%|███████▍ | 4976/6640 [3:55:21<2:31:34, 5.47s/it] 75%|███████▍ | 4977/6640 [3:55:26<2:30:13, 5.42s/it] {'loss': 0.5816, 'learning_rate': 3.114095569035962e-06, 'epoch': 0.75} 75%|███████▍ | 4977/6640 [3:55:26<2:30:13, 5.42s/it] 75%|███████▍ | 4978/6640 [3:55:31<2:30:21, 5.43s/it] {'loss': 0.5869, 'learning_rate': 3.1105589236246047e-06, 'epoch': 0.75} 75%|███████▍ | 4978/6640 [3:55:31<2:30:21, 5.43s/it] 75%|███████▍ | 4979/6640 [3:55:37<2:29:06, 5.39s/it] {'loss': 0.5832, 'learning_rate': 3.107023917715404e-06, 'epoch': 0.75} 75%|███████▍ | 4979/6640 [3:55:37<2:29:06, 5.39s/it] 75%|███████▌ | 4980/6640 [3:55:42<2:28:43, 5.38s/it] {'loss': 0.5576, 'learning_rate': 3.103490552149595e-06, 'epoch': 0.75} 75%|███████▌ | 4980/6640 [3:55:42<2:28:43, 5.38s/it] 75%|███████▌ | 4981/6640 [3:55:47<2:28:29, 5.37s/it] {'loss': 0.5729, 'learning_rate': 3.099958827768025e-06, 'epoch': 0.75} 75%|███████▌ | 4981/6640 [3:55:47<2:28:29, 5.37s/it] 75%|███████▌ | 4982/6640 [3:55:53<2:28:11, 5.36s/it] {'loss': 0.561, 'learning_rate': 3.096428745411153e-06, 'epoch': 0.75} 
75%|███████▌ | 4982/6640 [3:55:53<2:28:11, 5.36s/it] 75%|███████▌ | 4983/6640 [3:55:58<2:29:45, 5.42s/it] {'loss': 0.6035, 'learning_rate': 3.092900305919033e-06, 'epoch': 0.75} 75%|███████▌ | 4983/6640 [3:55:58<2:29:45, 5.42s/it] 75%|███████▌ | 4984/6640 [3:56:04<2:28:14, 5.37s/it] {'loss': 0.5795, 'learning_rate': 3.089373510131354e-06, 'epoch': 0.75} 75%|███████▌ | 4984/6640 [3:56:04<2:28:14, 5.37s/it] 75%|███████▌ | 4985/6640 [3:56:09<2:29:06, 5.41s/it] {'loss': 0.5725, 'learning_rate': 3.0858483588873878e-06, 'epoch': 0.75} 75%|███████▌ | 4985/6640 [3:56:09<2:29:06, 5.41s/it] 75%|███████▌ | 4986/6640 [3:56:15<2:29:18, 5.42s/it] {'loss': 0.6055, 'learning_rate': 3.082324853026032e-06, 'epoch': 0.75} 75%|███████▌ | 4986/6640 [3:56:15<2:29:18, 5.42s/it] 75%|███████▌ | 4987/6640 [3:56:20<2:30:49, 5.47s/it] {'loss': 0.6017, 'learning_rate': 3.078802993385781e-06, 'epoch': 0.75} 75%|███████▌ | 4987/6640 [3:56:20<2:30:49, 5.47s/it] 75%|███████▌ | 4988/6640 [3:56:26<2:34:15, 5.60s/it] {'loss': 0.5819, 'learning_rate': 3.0752827808047446e-06, 'epoch': 0.75} 75%|███████▌ | 4988/6640 [3:56:26<2:34:15, 5.60s/it] 75%|███████▌ | 4989/6640 [3:56:31<2:32:26, 5.54s/it] {'loss': 0.5963, 'learning_rate': 3.0717642161206405e-06, 'epoch': 0.75} 75%|███████▌ | 4989/6640 [3:56:31<2:32:26, 5.54s/it] 75%|███████▌ | 4990/6640 [3:56:37<2:32:44, 5.55s/it] {'loss': 0.5797, 'learning_rate': 3.0682473001707925e-06, 'epoch': 0.75} 75%|███████▌ | 4990/6640 [3:56:37<2:32:44, 5.55s/it] 75%|███████▌ | 4991/6640 [3:56:43<2:32:05, 5.53s/it] {'loss': 0.6039, 'learning_rate': 3.0647320337921347e-06, 'epoch': 0.75} 75%|███████▌ | 4991/6640 [3:56:43<2:32:05, 5.53s/it] 75%|███████▌ | 4992/6640 [3:56:48<2:31:07, 5.50s/it] {'loss': 0.5693, 'learning_rate': 3.061218417821197e-06, 'epoch': 0.75} 75%|███████▌ | 4992/6640 [3:56:48<2:31:07, 5.50s/it] 75%|███████▌ | 4993/6640 [3:56:53<2:30:12, 5.47s/it] {'loss': 0.5708, 'learning_rate': 3.057706453094138e-06, 'epoch': 0.75} 75%|███████▌ | 4993/6640 
[3:56:53<2:30:12, 5.47s/it] 75%|███████▌ | 4994/6640 [3:56:59<2:29:33, 5.45s/it] {'loss': 0.5821, 'learning_rate': 3.0541961404467026e-06, 'epoch': 0.75} 75%|███████▌ | 4994/6640 [3:56:59<2:29:33, 5.45s/it] 75%|███████▌ | 4995/6640 [3:57:04<2:28:56, 5.43s/it] {'loss': 0.5811, 'learning_rate': 3.050687480714256e-06, 'epoch': 0.75} 75%|███████▌ | 4995/6640 [3:57:04<2:28:56, 5.43s/it] 75%|███████▌ | 4996/6640 [3:57:10<2:28:46, 5.43s/it] {'loss': 0.5793, 'learning_rate': 3.0471804747317614e-06, 'epoch': 0.75} 75%|███████▌ | 4996/6640 [3:57:10<2:28:46, 5.43s/it] 75%|███████▌ | 4997/6640 [3:57:15<2:28:21, 5.42s/it] {'loss': 0.5675, 'learning_rate': 3.0436751233337914e-06, 'epoch': 0.75} 75%|███████▌ | 4997/6640 [3:57:15<2:28:21, 5.42s/it] 75%|███████▌ | 4998/6640 [3:57:20<2:28:14, 5.42s/it] {'loss': 0.5716, 'learning_rate': 3.040171427354529e-06, 'epoch': 0.75} 75%|███████▌ | 4998/6640 [3:57:20<2:28:14, 5.42s/it] 75%|███████▌ | 4999/6640 [3:57:26<2:28:36, 5.43s/it] {'loss': 0.5674, 'learning_rate': 3.0366693876277587e-06, 'epoch': 0.75} 75%|███████▌ | 4999/6640 [3:57:26<2:28:36, 5.43s/it]2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
75%|███████▌ | 5000/6640 [3:57:31<2:27:15, 5.39s/it] {'loss': 0.6121, 'learning_rate': 3.0331690049868733e-06, 'epoch': 0.75} 75%|███████▌ | 5000/6640 [3:57:31<2:27:15, 5.39s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( May 28 02:08:36.585700 1497847 slurmstepd 0x155550ab8700: error: *** STEP 8269448.0 ON batch-block5-00214 CANCELLED AT 2025-05-28T02:08:36 DUE TO TIME LIMIT *** srun: Job step aborted: Waiting up to 122 seconds for job step to finish. srun: error: batch-block5-00214: task 0: Terminated srun: Terminating StepId=8269448.0 srun: job 8277396 queued and waiting for resources srun: job 8277396 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block1-2057 JobID: 8277396 | Full list: batch-block1-2057 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! 
Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:45,006] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [INFO] 
[comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 02:10:46,404] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [2025-05-28 02:10:54,978] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.70B parameters Loading checkpoint shards: 0%| | 0/2 [00:00\nWould this person be more likely to be a type a or b person?\nAnswer the question using a single word or phrase.'}, {'from': 'gpt', 'value': ''}]] (ignored) 80%|████████ | 5336/6640 [31:49<1:59:00, 5.48s/it] {'loss': 0.5739, 'learning_rate': 1.955953829794711e-06, 'epoch': 0.8} 80%|████████ | 5336/6640 [31:49<1:59:00, 5.48s/it] 80%|████████ | 5337/6640 [31:54<1:59:43, 5.51s/it] {'loss': 0.5645, 'learning_rate': 1.953056707711005e-06, 'epoch': 0.8} 80%|████████ | 5337/6640 [31:54<1:59:43, 5.51s/it] 80%|████████ | 5338/6640 [32:00<2:00:05, 5.53s/it] {'loss': 0.5766, 'learning_rate': 1.95016150058393e-06, 'epoch': 0.8} 80%|████████ | 5338/6640 [32:00<2:00:05, 5.53s/it] 80%|████████ | 5339/6640 [32:05<1:58:49, 5.48s/it] {'loss': 0.5816, 'learning_rate': 1.9472682091024696e-06, 'epoch': 0.8} 80%|████████ | 5339/6640 [32:05<1:58:49, 5.48s/it] 80%|████████ | 5340/6640 [32:11<1:59:03, 5.50s/it] {'loss': 0.5821, 'learning_rate': 1.944376833955147e-06, 'epoch': 0.8} 80%|████████ | 5340/6640 [32:11<1:59:03, 5.50s/it] 80%|████████ | 5341/6640 [32:16<1:58:30, 5.47s/it] {'loss': 0.564, 'learning_rate': 1.941487375830037e-06, 'epoch': 0.8} 80%|████████ | 5341/6640 [32:16<1:58:30, 5.47s/it] 80%|████████ | 5342/6640 [32:22<1:58:32, 5.48s/it] {'loss': 0.5748, 'learning_rate': 1.938599835414745e-06, 'epoch': 0.8} 80%|████████ | 5342/6640 [32:22<1:58:32, 5.48s/it] 80%|████████ | 5343/6640 [32:27<1:58:10, 5.47s/it] {'loss': 0.5801, 'learning_rate': 1.9357142133964336e-06, 'epoch': 0.8} 80%|████████ | 5343/6640 [32:27<1:58:10, 5.47s/it] 80%|████████ | 5344/6640 [32:33<2:00:02, 5.56s/it] {'loss': 0.5532, 'learning_rate': 1.932830510461802e-06, 'epoch': 
0.8} 80%|████████ | 5344/6640 [32:33<2:00:02, 5.56s/it] 80%|████████ | 5345/6640 [32:38<1:58:36, 5.50s/it] {'loss': 0.5822, 'learning_rate': 1.929948727297096e-06, 'epoch': 0.8} 80%|████████ | 5345/6640 [32:38<1:58:36, 5.50s/it] 81%|████████ | 5346/6640 [32:44<1:58:09, 5.48s/it] {'loss': 0.5734, 'learning_rate': 1.9270688645881e-06, 'epoch': 0.81} 81%|████████ | 5346/6640 [32:44<1:58:09, 5.48s/it] 81%|████████ | 5347/6640 [32:49<1:58:12, 5.48s/it] {'loss': 0.5776, 'learning_rate': 1.924190923020144e-06, 'epoch': 0.81} 81%|████████ | 5347/6640 [32:49<1:58:12, 5.48s/it] 81%|████████ | 5348/6640 [32:55<1:58:40, 5.51s/it] {'loss': 0.5994, 'learning_rate': 1.921314903278102e-06, 'epoch': 0.81} 81%|████████ | 5348/6640 [32:55<1:58:40, 5.51s/it] 81%|████████ | 5349/6640 [33:00<1:57:18, 5.45s/it] {'loss': 0.5816, 'learning_rate': 1.918440806046391e-06, 'epoch': 0.81} 81%|████████ | 5349/6640 [33:00<1:57:18, 5.45s/it] 7 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 81%|████████ | 5350/6640 [33:05<1:57:10, 5.45s/it] 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5652, 'learning_rate': 1.9155686320089684e-06, 'epoch': 0.81} 81%|████████ | 5350/6640 [33:05<1:57:10, 5.45s/it] 81%|████████ | 5351/6640 [33:11<1:56:37, 5.43s/it] {'loss': 0.5766, 'learning_rate': 1.912698381849333e-06, 'epoch': 0.81} 81%|████████ | 5351/6640 [33:11<1:56:37, 5.43s/it] 81%|████████ | 5352/6640 [33:16<1:55:54, 5.40s/it] {'loss': 0.5749, 'learning_rate': 1.9098300562505266e-06, 'epoch': 0.81} 81%|████████ | 5352/6640 [33:16<1:55:54, 5.40s/it] 81%|████████ | 5353/6640 [33:21<1:55:37, 5.39s/it] {'loss': 0.5731, 'learning_rate': 1.9069636558951354e-06, 'epoch': 0.81} 81%|████████ | 5353/6640 [33:21<1:55:37, 5.39s/it] 81%|████████ | 5354/6640 [33:27<1:55:22, 5.38s/it] {'loss': 0.58, 'learning_rate': 1.9040991814652864e-06, 'epoch': 0.81} 81%|████████ | 5354/6640 [33:27<1:55:22, 5.38s/it] 81%|████████ | 5355/6640 [33:32<1:55:50, 5.41s/it] {'loss': 0.5733, 'learning_rate': 1.901236633642649e-06, 'epoch': 0.81} 81%|████████ | 5355/6640 [33:32<1:55:50, 5.41s/it] 81%|████████ | 5356/6640 [33:38<1:55:33, 5.40s/it] {'loss': 0.5555, 'learning_rate': 1.8983760131084283e-06, 'epoch': 0.81} 81%|████████ | 5356/6640 [33:38<1:55:33, 5.40s/it] 81%|████████ | 5357/6640 [33:43<1:56:28, 5.45s/it] {'loss': 0.561, 'learning_rate': 1.8955173205433774e-06, 'epoch': 0.81} 81%|████████ | 5357/6640 [33:43<1:56:28, 5.45s/it] 81%|████████ | 5358/6640 [33:49<1:56:39, 5.46s/it] {'loss': 0.5766, 'learning_rate': 1.892660556627789e-06, 'epoch': 0.81} 81%|████████ | 5358/6640 [33:49<1:56:39, 5.46s/it] 81%|████████ | 5359/6640 [33:54<1:55:57, 5.43s/it] {'loss': 0.5981, 'learning_rate': 1.889805722041499e-06, 'epoch': 0.81} 81%|████████ | 5359/6640 [33:54<1:55:57, 5.43s/it] 81%|████████ | 5360/6640 [34:00<1:56:06, 5.44s/it] {'loss': 0.5808, 'learning_rate': 1.8869528174638752e-06, 'epoch': 0.81} 81%|████████ | 5360/6640 [34:00<1:56:06, 5.44s/it] 81%|████████ | 5361/6640 [34:05<1:56:05, 5.45s/it] {'loss': 0.5926, 'learning_rate': 1.8841018435738357e-06, 'epoch': 0.81} 
81%|████████ | 5361/6640 [34:05<1:56:05, 5.45s/it] 81%|████████ | 5362/6640 [34:10<1:55:38, 5.43s/it] {'loss': 0.5787, 'learning_rate': 1.8812528010498355e-06, 'epoch': 0.81} 81%|████████ | 5362/6640 [34:10<1:55:38, 5.43s/it] 81%|████████ | 5363/6640 [34:16<1:57:03, 5.50s/it] {'loss': 0.5797, 'learning_rate': 1.878405690569871e-06, 'epoch': 0.81} 81%|████████ | 5363/6640 [34:16<1:57:03, 5.50s/it] 81%|████████ | 5364/6640 [34:21<1:55:43, 5.44s/it] {'loss': 0.5552, 'learning_rate': 1.8755605128114796e-06, 'epoch': 0.81} 81%|████████ | 5364/6640 [34:21<1:55:43, 5.44s/it] 81%|████████ | 5365/6640 [34:27<1:54:43, 5.40s/it] {'loss': 0.5572, 'learning_rate': 1.8727172684517325e-06, 'epoch': 0.81} 81%|████████ | 5365/6640 [34:27<1:54:43, 5.40s/it] 81%|████████ | 5366/6640 [34:32<1:55:43, 5.45s/it] {'loss': 0.5797, 'learning_rate': 1.8698759581672487e-06, 'epoch': 0.81} 81%|████████ | 5366/6640 [34:32<1:55:43, 5.45s/it] 81%|████████ | 5367/6640 [34:38<1:54:56, 5.42s/it] {'loss': 0.5503, 'learning_rate': 1.8670365826341842e-06, 'epoch': 0.81} 81%|████████ | 5367/6640 [34:38<1:54:56, 5.42s/it] 81%|████████ | 5368/6640 [34:43<1:54:42, 5.41s/it] {'loss': 0.5967, 'learning_rate': 1.8641991425282347e-06, 'epoch': 0.81} 81%|████████ | 5368/6640 [34:43<1:54:42, 5.41s/it] 81%|████████ | 5369/6640 [34:49<1:57:38, 5.55s/it] {'loss': 0.5572, 'learning_rate': 1.8613636385246326e-06, 'epoch': 0.81} 81%|████████ | 5369/6640 [34:49<1:57:38, 5.55s/it] 81%|████████ | 5370/6640 [34:54<1:56:32, 5.51s/it] {'loss': 0.5971, 'learning_rate': 1.8585300712981514e-06, 'epoch': 0.81} 81%|████████ | 5370/6640 [34:54<1:56:32, 5.51s/it] 81%|████████ | 5371/6640 [35:00<1:56:05, 5.49s/it] {'loss': 0.5642, 'learning_rate': 1.855698441523106e-06, 'epoch': 0.81} 81%|████████ | 5371/6640 [35:00<1:56:05, 5.49s/it] 81%|████████ | 5372/6640 [35:05<1:55:40, 5.47s/it] {'loss': 0.5845, 'learning_rate': 1.8528687498733478e-06, 'epoch': 0.81} 81%|████████ | 5372/6640 [35:05<1:55:40, 5.47s/it] 81%|████████ | 5373/6640 
[35:11<1:56:22, 5.51s/it] {'loss': 0.5914, 'learning_rate': 1.85004099702227e-06, 'epoch': 0.81} 81%|████████ | 5373/6640 [35:11<1:56:22, 5.51s/it] 81%|████████ | 5374/6640 [35:16<1:56:40, 5.53s/it] {'loss': 0.5866, 'learning_rate': 1.8472151836427976e-06, 'epoch': 0.81} 81%|████████ | 5374/6640 [35:16<1:56:40, 5.53s/it] 81%|████████ | 5375/6640 [35:22<1:55:51, 5.50s/it] {'loss': 0.5962, 'learning_rate': 1.8443913104073984e-06, 'epoch': 0.81} 81%|████████ | 5375/6640 [35:22<1:55:51, 5.50s/it] 81%|████████ | 5376/6640 [35:27<1:54:58, 5.46s/it] {'loss': 0.5796, 'learning_rate': 1.8415693779880816e-06, 'epoch': 0.81} 81%|████████ | 5376/6640 [35:27<1:54:58, 5.46s/it] 81%|████████ | 5377/6640 [35:33<1:55:03, 5.47s/it] {'loss': 0.5608, 'learning_rate': 1.8387493870563933e-06, 'epoch': 0.81} 81%|████████ | 5377/6640 [35:33<1:55:03, 5.47s/it] 81%|████████ | 5378/6640 [35:38<1:55:07, 5.47s/it] {'loss': 0.6034, 'learning_rate': 1.8359313382834088e-06, 'epoch': 0.81} 81%|████████ | 5378/6640 [35:38<1:55:07, 5.47s/it] 81%|████████ | 5379/6640 [35:44<1:56:50, 5.56s/it] {'loss': 0.601, 'learning_rate': 1.8331152323397515e-06, 'epoch': 0.81} 81%|████████ | 5379/6640 [35:44<1:56:50, 5.56s/it] 81%|████████ | 5380/6640 [35:49<1:57:02, 5.57s/it] {'loss': 0.5755, 'learning_rate': 1.8303010698955803e-06, 'epoch': 0.81} 81%|████████ | 5380/6640 [35:49<1:57:02, 5.57s/it] 81%|████████ | 5381/6640 [35:55<1:56:46, 5.56s/it] {'loss': 0.5628, 'learning_rate': 1.827488851620589e-06, 'epoch': 0.81} 81%|████████ | 5381/6640 [35:55<1:56:46, 5.56s/it] 81%|████████ | 5382/6640 [36:00<1:54:38, 5.47s/it] {'loss': 0.5838, 'learning_rate': 1.8246785781840138e-06, 'epoch': 0.81} 81%|████████ | 5382/6640 [36:00<1:54:38, 5.47s/it] 81%|████████ | 5383/6640 [36:06<1:53:47, 5.43s/it] {'loss': 0.5774, 'learning_rate': 1.821870250254617e-06, 'epoch': 0.81} 81%|████████ | 5383/6640 [36:06<1:53:47, 5.43s/it] 81%|████████ | 5384/6640 [36:11<1:52:15, 5.36s/it] {'loss': 0.5453, 'learning_rate': 
1.8190638685007111e-06, 'epoch': 0.81} 81%|████████ | 5384/6640 [36:11<1:52:15, 5.36s/it] 81%|████████ | 5385/6640 [36:16<1:52:52, 5.40s/it] {'loss': 0.5881, 'learning_rate': 1.8162594335901363e-06, 'epoch': 0.81} 81%|████████ | 5385/6640 [36:16<1:52:52, 5.40s/it] 81%|████████ | 5386/6640 [36:22<1:52:30, 5.38s/it] {'loss': 0.566, 'learning_rate': 1.8134569461902785e-06, 'epoch': 0.81} 81%|████████ | 5386/6640 [36:22<1:52:30, 5.38s/it] 81%|████████ | 5387/6640 [36:27<1:52:09, 5.37s/it] {'loss': 0.6047, 'learning_rate': 1.8106564069680476e-06, 'epoch': 0.81} 81%|████████ | 5387/6640 [36:27<1:52:09, 5.37s/it] 81%|████████ | 5388/6640 [36:32<1:52:21, 5.38s/it] {'loss': 0.6005, 'learning_rate': 1.8078578165898997e-06, 'epoch': 0.81} 81%|████████ | 5388/6640 [36:32<1:52:21, 5.38s/it] 81%|████████ | 5389/6640 [36:38<1:53:27, 5.44s/it] {'loss': 0.5875, 'learning_rate': 1.8050611757218251e-06, 'epoch': 0.81} 81%|████████ | 5389/6640 [36:38<1:53:27, 5.44s/it] 81%|████████ | 5390/6640 [36:44<1:55:17, 5.53s/it] {'loss': 0.574, 'learning_rate': 1.802266485029347e-06, 'epoch': 0.81} 81%|████████ | 5390/6640 [36:44<1:55:17, 5.53s/it] 81%|████████ | 5391/6640 [36:49<1:54:49, 5.52s/it] {'loss': 0.587, 'learning_rate': 1.7994737451775324e-06, 'epoch': 0.81} 81%|████████ | 5391/6640 [36:49<1:54:49, 5.52s/it] 81%|████████ | 5392/6640 [36:54<1:53:22, 5.45s/it] {'loss': 0.5773, 'learning_rate': 1.7966829568309708e-06, 'epoch': 0.81} 81%|████████ | 5392/6640 [36:54<1:53:22, 5.45s/it] 81%|████████ | 5393/6640 [37:00<1:53:27, 5.46s/it] {'loss': 0.5571, 'learning_rate': 1.7938941206537997e-06, 'epoch': 0.81} 81%|████████ | 5393/6640 [37:00<1:53:27, 5.46s/it] 81%|████████ | 5394/6640 [37:05<1:53:00, 5.44s/it] {'loss': 0.5836, 'learning_rate': 1.791107237309685e-06, 'epoch': 0.81} 81%|████████ | 5394/6640 [37:05<1:53:00, 5.44s/it] 81%|████████▏ | 5395/6640 [37:11<1:52:43, 5.43s/it] {'loss': 0.5748, 'learning_rate': 1.7883223074618316e-06, 'epoch': 0.81} 81%|████████▏ | 5395/6640 
[37:11<1:52:43, 5.43s/it] 81%|████████▏ | 5396/6640 [37:16<1:52:55, 5.45s/it] {'loss': 0.5861, 'learning_rate': 1.7855393317729808e-06, 'epoch': 0.81} 81%|████████▏ | 5396/6640 [37:16<1:52:55, 5.45s/it] 81%|████████▏ | 5397/6640 [37:22<1:52:23, 5.43s/it] {'loss': 0.5824, 'learning_rate': 1.782758310905398e-06, 'epoch': 0.81} 81%|████████▏ | 5397/6640 [37:22<1:52:23, 5.43s/it] 81%|████████▏ | 5398/6640 [37:27<1:51:45, 5.40s/it] {'loss': 0.5667, 'learning_rate': 1.7799792455209019e-06, 'epoch': 0.81} 81%|████████▏ | 5398/6640 [37:27<1:51:45, 5.40s/it] 81%|████████▏ | 5399/6640 [37:32<1:51:21, 5.38s/it] {'loss': 0.5989, 'learning_rate': 1.7772021362808279e-06, 'epoch': 0.81} 81%|████████▏ | 5399/6640 [37:32<1:51:21, 5.38s/it] 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 81%|████████▏ | 5400/6640 [37:38<1:53:33, 5.50s/it] 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... {'loss': 0.5852, 'learning_rate': 1.774426983846058e-06, 'epoch': 0.81} 81%|████████▏ | 5400/6640 [37:38<1:53:33, 5.50s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 81%|████████▏ | 5401/6640 [38:05<4:06:40, 11.95s/it] {'loss': 0.5671, 'learning_rate': 1.771653788876999e-06, 'epoch': 0.81} 81%|████████▏ | 5401/6640 [38:05<4:06:40, 11.95s/it] 81%|████████▏ | 5402/6640 [38:11<3:27:58, 10.08s/it] {'loss': 0.5854, 'learning_rate': 1.7688825520336017e-06, 'epoch': 0.81} 81%|████████▏ | 5402/6640 [38:11<3:27:58, 10.08s/it] 81%|████████▏ | 5403/6640 [38:16<2:58:28, 8.66s/it] {'loss': 0.5977, 'learning_rate': 1.7661132739753429e-06, 'epoch': 0.81} 81%|████████▏ | 5403/6640 [38:16<2:58:28, 8.66s/it] 81%|████████▏ | 5404/6640 [38:21<2:37:50, 7.66s/it] {'loss': 0.5937, 'learning_rate': 1.7633459553612387e-06, 'epoch': 0.81} 81%|████████▏ | 5404/6640 [38:21<2:37:50, 7.66s/it] 81%|████████▏ | 5405/6640 [38:27<2:24:08, 7.00s/it] {'loss': 0.56, 'learning_rate': 1.760580596849838e-06, 'epoch': 0.81} 81%|████████▏ | 5405/6640 [38:27<2:24:08, 7.00s/it] 81%|████████▏ | 5406/6640 [38:32<2:13:25, 6.49s/it] {'loss': 0.5952, 'learning_rate': 1.7578171990992144e-06, 'epoch': 0.81} 81%|████████▏ | 5406/6640 [38:32<2:13:25, 6.49s/it] 81%|████████▏ | 5407/6640 [38:38<2:06:55, 6.18s/it] {'loss': 0.5817, 'learning_rate': 1.7550557627669928e-06, 'epoch': 0.81} 81%|████████▏ | 5407/6640 [38:38<2:06:55, 6.18s/it] 81%|████████▏ | 5408/6640 [38:43<2:02:22, 5.96s/it] {'loss': 0.5698, 'learning_rate': 1.7522962885103145e-06, 'epoch': 0.81} 81%|████████▏ | 5408/6640 [38:43<2:02:22, 5.96s/it] 81%|████████▏ | 5409/6640 [38:48<1:58:43, 5.79s/it] {'loss': 0.5868, 'learning_rate': 1.749538776985864e-06, 'epoch': 0.81} 81%|████████▏ | 5409/6640 [38:48<1:58:43, 5.79s/it] 81%|████████▏ | 5410/6640 [38:54<1:58:05, 5.76s/it] {'loss': 0.5946, 'learning_rate': 1.746783228849851e-06, 'epoch': 0.81} 81%|████████▏ | 5410/6640 [38:54<1:58:05, 5.76s/it] 81%|████████▏ | 5411/6640 [39:00<1:56:10, 5.67s/it] {'loss': 0.5846, 'learning_rate': 
1.744029644758023e-06, 'epoch': 0.81} 81%|████████▏ | 5411/6640 [39:00<1:56:10, 5.67s/it] 82%|████████▏ | 5412/6640 [39:05<1:54:48, 5.61s/it] {'loss': 0.5647, 'learning_rate': 1.7412780253656603e-06, 'epoch': 0.82} 82%|████████▏ | 5412/6640 [39:05<1:54:48, 5.61s/it] 82%|████████▏ | 5413/6640 [39:10<1:52:57, 5.52s/it] {'loss': 0.574, 'learning_rate': 1.7385283713275746e-06, 'epoch': 0.82} 82%|████████▏ | 5413/6640 [39:10<1:52:57, 5.52s/it] 82%|████████▏ | 5414/6640 [39:16<1:51:35, 5.46s/it] {'loss': 0.5875, 'learning_rate': 1.7357806832981127e-06, 'epoch': 0.82} 82%|████████▏ | 5414/6640 [39:16<1:51:35, 5.46s/it] 82%|████████▏ | 5415/6640 [39:21<1:51:21, 5.45s/it] {'loss': 0.5796, 'learning_rate': 1.7330349619311415e-06, 'epoch': 0.82} 82%|████████▏ | 5415/6640 [39:21<1:51:21, 5.45s/it] 82%|████████▏ | 5416/6640 [39:27<1:50:55, 5.44s/it] {'loss': 0.5759, 'learning_rate': 1.7302912078800805e-06, 'epoch': 0.82} 82%|████████▏ | 5416/6640 [39:27<1:50:55, 5.44s/it] 82%|████████▏ | 5417/6640 [39:32<1:50:27, 5.42s/it] {'loss': 0.5699, 'learning_rate': 1.7275494217978616e-06, 'epoch': 0.82} 82%|████████▏ | 5417/6640 [39:32<1:50:27, 5.42s/it] 82%|████████▏ | 5418/6640 [39:37<1:49:52, 5.39s/it] {'loss': 0.5679, 'learning_rate': 1.724809604336961e-06, 'epoch': 0.82} 82%|████████▏ | 5418/6640 [39:37<1:49:52, 5.39s/it] 82%|████████▏ | 5419/6640 [39:43<1:49:19, 5.37s/it] {'loss': 0.5643, 'learning_rate': 1.7220717561493773e-06, 'epoch': 0.82} 82%|████████▏ | 5419/6640 [39:43<1:49:19, 5.37s/it] 82%|████████▏ | 5420/6640 [39:48<1:51:53, 5.50s/it] {'loss': 0.5825, 'learning_rate': 1.7193358778866464e-06, 'epoch': 0.82} 82%|████████▏ | 5420/6640 [39:48<1:51:53, 5.50s/it] 82%|████████▏ | 5421/6640 [39:54<1:52:37, 5.54s/it] {'loss': 0.5765, 'learning_rate': 1.716601970199836e-06, 'epoch': 0.82} 82%|████████▏ | 5421/6640 [39:54<1:52:37, 5.54s/it] 82%|████████▏ | 5422/6640 [40:00<1:52:09, 5.53s/it] {'loss': 0.5741, 'learning_rate': 1.713870033739541e-06, 'epoch': 0.82} 82%|████████▏ | 
5422/6640 [40:00<1:52:09, 5.53s/it] 82%|████████▏ | 5423/6640 [40:05<1:51:02, 5.47s/it] {'loss': 0.5772, 'learning_rate': 1.7111400691558911e-06, 'epoch': 0.82} 82%|████████▏ | 5423/6640 [40:05<1:51:02, 5.47s/it] 82%|████████▏ | 5424/6640 [40:10<1:50:11, 5.44s/it] {'loss': 0.572, 'learning_rate': 1.708412077098539e-06, 'epoch': 0.82} 82%|████████▏ | 5424/6640 [40:10<1:50:11, 5.44s/it] 82%|████████▏ | 5425/6640 [40:16<1:50:11, 5.44s/it] {'loss': 0.5864, 'learning_rate': 1.7056860582166823e-06, 'epoch': 0.82} 82%|████████▏ | 5425/6640 [40:16<1:50:11, 5.44s/it] 82%|████████▏ | 5426/6640 [40:21<1:49:31, 5.41s/it] {'loss': 0.5732, 'learning_rate': 1.702962013159033e-06, 'epoch': 0.82} 82%|████████▏ | 5426/6640 [40:21<1:49:31, 5.41s/it] 82%|████████▏ | 5427/6640 [40:26<1:48:24, 5.36s/it] {'loss': 0.5732, 'learning_rate': 1.7002399425738459e-06, 'epoch': 0.82} 82%|████████▏ | 5427/6640 [40:26<1:48:24, 5.36s/it] 82%|████████▏ | 5428/6640 [40:32<1:48:09, 5.35s/it] {'loss': 0.5851, 'learning_rate': 1.6975198471088973e-06, 'epoch': 0.82} 82%|████████▏ | 5428/6640 [40:32<1:48:09, 5.35s/it] 82%|████████▏ | 5429/6640 [40:37<1:48:26, 5.37s/it] {'loss': 0.6027, 'learning_rate': 1.6948017274114959e-06, 'epoch': 0.82} 82%|████████▏ | 5429/6640 [40:37<1:48:26, 5.37s/it] 82%|████████▏ | 5430/6640 [40:42<1:49:00, 5.41s/it] {'loss': 0.5953, 'learning_rate': 1.6920855841284844e-06, 'epoch': 0.82} 82%|████████▏ | 5430/6640 [40:42<1:49:00, 5.41s/it] 82%|████████▏ | 5431/6640 [40:48<1:48:10, 5.37s/it] {'loss': 0.5645, 'learning_rate': 1.6893714179062315e-06, 'epoch': 0.82} 82%|████████▏ | 5431/6640 [40:48<1:48:10, 5.37s/it] 82%|████████▏ | 5432/6640 [40:53<1:48:10, 5.37s/it] {'loss': 0.5449, 'learning_rate': 1.6866592293906369e-06, 'epoch': 0.82} 82%|████████▏ | 5432/6640 [40:53<1:48:10, 5.37s/it] 82%|████████▏ | 5433/6640 [40:59<1:49:59, 5.47s/it] {'loss': 0.5764, 'learning_rate': 1.6839490192271225e-06, 'epoch': 0.82} 82%|████████▏ | 5433/6640 [40:59<1:49:59, 5.47s/it] 82%|████████▏ | 
5434/6640 [41:04<1:50:14, 5.48s/it] {'loss': 0.5634, 'learning_rate': 1.6812407880606563e-06, 'epoch': 0.82} 82%|████████▏ | 5434/6640 [41:04<1:50:14, 5.48s/it] 82%|████████▏ | 5435/6640 [41:10<1:49:38, 5.46s/it] {'loss': 0.5376, 'learning_rate': 1.6785345365357153e-06, 'epoch': 0.82} 82%|████████▏ | 5435/6640 [41:10<1:49:38, 5.46s/it] 82%|████████▏ | 5436/6640 [41:15<1:49:56, 5.48s/it] {'loss': 0.6004, 'learning_rate': 1.6758302652963176e-06, 'epoch': 0.82} 82%|████████▏ | 5436/6640 [41:15<1:49:56, 5.48s/it] 82%|████████▏ | 5437/6640 [41:21<1:48:57, 5.43s/it] {'loss': 0.6126, 'learning_rate': 1.6731279749860086e-06, 'epoch': 0.82} 82%|████████▏ | 5437/6640 [41:21<1:48:57, 5.43s/it] 82%|████████▏ | 5438/6640 [41:26<1:47:55, 5.39s/it] {'loss': 0.6017, 'learning_rate': 1.6704276662478602e-06, 'epoch': 0.82} 82%|████████▏ | 5438/6640 [41:26<1:47:55, 5.39s/it] 82%|████████▏ | 5439/6640 [41:31<1:47:38, 5.38s/it] {'loss': 0.5578, 'learning_rate': 1.6677293397244753e-06, 'epoch': 0.82} 82%|████████▏ | 5439/6640 [41:31<1:47:38, 5.38s/it] 82%|████████▏ | 5440/6640 [41:37<1:49:42, 5.49s/it] {'loss': 0.5726, 'learning_rate': 1.6650329960579792e-06, 'epoch': 0.82} 82%|████████▏ | 5440/6640 [41:37<1:49:42, 5.49s/it] 82%|████████▏ | 5441/6640 [41:42<1:47:57, 5.40s/it] {'loss': 0.5564, 'learning_rate': 1.6623386358900339e-06, 'epoch': 0.82} 82%|████████▏ | 5441/6640 [41:42<1:47:57, 5.40s/it] 82%|████████▏ | 5442/6640 [41:48<1:48:15, 5.42s/it] {'loss': 0.6103, 'learning_rate': 1.6596462598618179e-06, 'epoch': 0.82} 82%|████████▏ | 5442/6640 [41:48<1:48:15, 5.42s/it] 82%|████████▏ | 5443/6640 [41:53<1:48:05, 5.42s/it] {'loss': 0.5852, 'learning_rate': 1.656955868614053e-06, 'epoch': 0.82} 82%|████████▏ | 5443/6640 [41:53<1:48:05, 5.42s/it] 82%|████████▏ | 5444/6640 [41:58<1:47:25, 5.39s/it] {'loss': 0.5999, 'learning_rate': 1.6542674627869738e-06, 'epoch': 0.82} 82%|████████▏ | 5444/6640 [41:58<1:47:25, 5.39s/it] 82%|████████▏ | 5445/6640 [42:04<1:47:23, 5.39s/it] {'loss': 0.5674, 
'learning_rate': 1.6515810430203516e-06, 'epoch': 0.82} 82%|████████▏ | 5445/6640 [42:04<1:47:23, 5.39s/it] 82%|████████▏ | 5446/6640 [42:09<1:46:34, 5.36s/it] {'loss': 0.6029, 'learning_rate': 1.648896609953481e-06, 'epoch': 0.82} 82%|████████▏ | 5446/6640 [42:09<1:46:34, 5.36s/it] 82%|████████▏ | 5447/6640 [42:15<1:46:59, 5.38s/it] {'loss': 0.5737, 'learning_rate': 1.6462141642251862e-06, 'epoch': 0.82} 82%|████████▏ | 5447/6640 [42:15<1:46:59, 5.38s/it] 82%|████████▏ | 5448/6640 [42:20<1:46:32, 5.36s/it] {'loss': 0.5676, 'learning_rate': 1.643533706473819e-06, 'epoch': 0.82} 82%|████████▏ | 5448/6640 [42:20<1:46:32, 5.36s/it] 82%|████████▏ | 5449/6640 [42:25<1:45:49, 5.33s/it] {'loss': 0.5817, 'learning_rate': 1.640855237337252e-06, 'epoch': 0.82} 82%|████████▏ | 5449/6640 [42:25<1:45:49, 5.33s/it] 2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 82%|████████▏ | 5450/6640 [42:31<1:47:12, 5.41s/it] 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5675, 'learning_rate': 1.638178757452894e-06, 'epoch': 0.82} 82%|████████▏ | 5450/6640 [42:31<1:47:12, 5.41s/it] 82%|████████▏ | 5451/6640 [42:36<1:48:31, 5.48s/it] {'loss': 0.5735, 'learning_rate': 1.6355042674576671e-06, 'epoch': 0.82} 82%|████████▏ | 5451/6640 [42:36<1:48:31, 5.48s/it] 82%|████████▏ | 5452/6640 [42:42<1:48:32, 5.48s/it] {'loss': 0.5611, 'learning_rate': 1.632831767988039e-06, 'epoch': 0.82} 82%|████████▏ | 5452/6640 [42:42<1:48:32, 5.48s/it] 82%|████████▏ | 5453/6640 [42:47<1:48:42, 5.50s/it] {'loss': 0.5429, 'learning_rate': 1.6301612596799854e-06, 'epoch': 0.82} 82%|████████▏ | 5453/6640 [42:47<1:48:42, 5.50s/it] 82%|████████▏ | 5454/6640 [42:53<1:48:11, 5.47s/it] {'loss': 0.5738, 'learning_rate': 1.627492743169018e-06, 'epoch': 0.82} 82%|████████▏ | 5454/6640 [42:53<1:48:11, 5.47s/it] 82%|████████▏ | 5455/6640 [42:58<1:47:08, 5.42s/it] {'loss': 0.5566, 'learning_rate': 1.624826219090172e-06, 'epoch': 0.82} 82%|████████▏ | 5455/6640 [42:58<1:47:08, 5.42s/it] 82%|████████▏ | 5456/6640 [43:03<1:46:41, 5.41s/it] {'loss': 0.5782, 'learning_rate': 1.6221616880780078e-06, 'epoch': 0.82} 82%|████████▏ | 5456/6640 [43:03<1:46:41, 5.41s/it] 82%|████████▏ | 5457/6640 [43:09<1:46:46, 5.42s/it] {'loss': 0.5694, 'learning_rate': 1.6194991507666159e-06, 'epoch': 0.82} 82%|████████▏ | 5457/6640 [43:09<1:46:46, 5.42s/it] 82%|████████▏ | 5458/6640 [43:14<1:46:58, 5.43s/it] {'loss': 0.577, 'learning_rate': 1.6168386077896036e-06, 'epoch': 0.82} 82%|████████▏ | 5458/6640 [43:14<1:46:58, 5.43s/it] 82%|████████▏ | 5459/6640 [43:20<1:47:01, 5.44s/it] {'loss': 0.5799, 'learning_rate': 1.6141800597801139e-06, 'epoch': 0.82} 82%|████████▏ | 5459/6640 [43:20<1:47:01, 5.44s/it] 82%|████████▏ | 5460/6640 [43:25<1:47:16, 5.45s/it] {'loss': 0.5892, 'learning_rate': 1.6115235073708024e-06, 'epoch': 0.82} 82%|████████▏ | 5460/6640 [43:25<1:47:16, 5.45s/it] 82%|████████▏ | 5461/6640 [43:31<1:46:36, 5.43s/it] {'loss': 0.5618, 'learning_rate': 1.608868951193867e-06, 
'epoch': 0.82} 82%|████████▏ | 5461/6640 [43:31<1:46:36, 5.43s/it] 82%|████████▏ | 5462/6640 [43:36<1:47:04, 5.45s/it] {'loss': 0.5801, 'learning_rate': 1.6062163918810136e-06, 'epoch': 0.82} 82%|████████▏ | 5462/6640 [43:36<1:47:04, 5.45s/it] 82%|████████▏ | 5463/6640 [43:42<1:47:36, 5.49s/it] {'loss': 0.5836, 'learning_rate': 1.6035658300634816e-06, 'epoch': 0.82} 82%|████████▏ | 5463/6640 [43:42<1:47:36, 5.49s/it] 82%|████████▏ | 5464/6640 [43:47<1:46:44, 5.45s/it] {'loss': 0.5602, 'learning_rate': 1.6009172663720352e-06, 'epoch': 0.82} 82%|████████▏ | 5464/6640 [43:47<1:46:44, 5.45s/it] 82%|████████▏ | 5465/6640 [43:53<1:46:42, 5.45s/it] {'loss': 0.5722, 'learning_rate': 1.5982707014369603e-06, 'epoch': 0.82} 82%|████████▏ | 5465/6640 [43:53<1:46:42, 5.45s/it] 82%|████████▏ | 5466/6640 [43:58<1:47:40, 5.50s/it] {'loss': 0.5862, 'learning_rate': 1.595626135888071e-06, 'epoch': 0.82} 82%|████████▏ | 5466/6640 [43:58<1:47:40, 5.50s/it] 82%|████████▏ | 5467/6640 [44:03<1:46:21, 5.44s/it] {'loss': 0.5873, 'learning_rate': 1.5929835703546992e-06, 'epoch': 0.82} 82%|████████▏ | 5467/6640 [44:03<1:46:21, 5.44s/it] 82%|████████▏ | 5468/6640 [44:09<1:46:43, 5.46s/it] {'loss': 0.6002, 'learning_rate': 1.5903430054657077e-06, 'epoch': 0.82} 82%|████████▏ | 5468/6640 [44:09<1:46:43, 5.46s/it] 82%|████████▏ | 5469/6640 [44:15<1:47:04, 5.49s/it] {'loss': 0.5614, 'learning_rate': 1.5877044418494747e-06, 'epoch': 0.82} 82%|████████▏ | 5469/6640 [44:15<1:47:04, 5.49s/it] 82%|████████▏ | 5470/6640 [44:20<1:46:01, 5.44s/it] {'loss': 0.5613, 'learning_rate': 1.585067880133916e-06, 'epoch': 0.82} 82%|████████▏ | 5470/6640 [44:20<1:46:01, 5.44s/it] 82%|████████▏ | 5471/6640 [44:25<1:46:46, 5.48s/it] {'loss': 0.5658, 'learning_rate': 1.582433320946456e-06, 'epoch': 0.82} 82%|████████▏ | 5471/6640 [44:25<1:46:46, 5.48s/it] 82%|████████▏ | 5472/6640 [44:31<1:47:57, 5.55s/it] {'loss': 0.584, 'learning_rate': 1.57980076491405e-06, 'epoch': 0.82} 82%|████████▏ | 5472/6640 [44:31<1:47:57, 
5.55s/it] 82%|████████▏ | 5473/6640 [44:37<1:47:14, 5.51s/it] {'loss': 0.5749, 'learning_rate': 1.5771702126631784e-06, 'epoch': 0.82} 82%|████████▏ | 5473/6640 [44:37<1:47:14, 5.51s/it] 82%|████████▏ | 5474/6640 [44:42<1:46:43, 5.49s/it] {'loss': 0.5681, 'learning_rate': 1.5745416648198386e-06, 'epoch': 0.82} 82%|████████▏ | 5474/6640 [44:42<1:46:43, 5.49s/it] 82%|████████▏ | 5475/6640 [44:47<1:46:34, 5.49s/it] {'loss': 0.5746, 'learning_rate': 1.5719151220095596e-06, 'epoch': 0.82} 82%|████████▏ | 5475/6640 [44:47<1:46:34, 5.49s/it] 82%|████████▏ | 5476/6640 [44:53<1:46:44, 5.50s/it] {'loss': 0.5595, 'learning_rate': 1.5692905848573836e-06, 'epoch': 0.82} 82%|████████▏ | 5476/6640 [44:53<1:46:44, 5.50s/it] 82%|████████▏ | 5477/6640 [44:58<1:45:56, 5.47s/it] {'loss': 0.5921, 'learning_rate': 1.5666680539878797e-06, 'epoch': 0.82} 82%|████████▏ | 5477/6640 [44:58<1:45:56, 5.47s/it] 82%|████████▎ | 5478/6640 [45:04<1:45:48, 5.46s/it] {'loss': 0.5565, 'learning_rate': 1.5640475300251423e-06, 'epoch': 0.82} 82%|████████▎ | 5478/6640 [45:04<1:45:48, 5.46s/it] 83%|████████▎ | 5479/6640 [45:09<1:45:57, 5.48s/it] {'loss': 0.5836, 'learning_rate': 1.5614290135927857e-06, 'epoch': 0.83} 83%|████████▎ | 5479/6640 [45:09<1:45:57, 5.48s/it] 83%|████████▎ | 5480/6640 [45:15<1:45:56, 5.48s/it] {'loss': 0.587, 'learning_rate': 1.558812505313947e-06, 'epoch': 0.83} 83%|████████▎ | 5480/6640 [45:15<1:45:56, 5.48s/it] 83%|████████▎ | 5481/6640 [45:20<1:45:53, 5.48s/it] {'loss': 0.5857, 'learning_rate': 1.5561980058112825e-06, 'epoch': 0.83} 83%|████████▎ | 5481/6640 [45:20<1:45:53, 5.48s/it] 83%|████████▎ | 5482/6640 [45:26<1:45:43, 5.48s/it] {'loss': 0.5512, 'learning_rate': 1.5535855157069734e-06, 'epoch': 0.83} 83%|████████▎ | 5482/6640 [45:26<1:45:43, 5.48s/it] 83%|████████▎ | 5483/6640 [45:31<1:44:26, 5.42s/it] {'loss': 0.5693, 'learning_rate': 1.5509750356227249e-06, 'epoch': 0.83} 83%|████████▎ | 5483/6640 [45:31<1:44:26, 5.42s/it] 83%|████████▎ | 5484/6640 [45:37<1:44:33, 
5.43s/it] {'loss': 0.5846, 'learning_rate': 1.5483665661797598e-06, 'epoch': 0.83} 83%|████████▎ | 5484/6640 [45:37<1:44:33, 5.43s/it] 83%|████████▎ | 5485/6640 [45:42<1:43:47, 5.39s/it] {'loss': 0.5453, 'learning_rate': 1.5457601079988226e-06, 'epoch': 0.83} 83%|████████▎ | 5485/6640 [45:42<1:43:47, 5.39s/it] 83%|████████▎ | 5486/6640 [45:47<1:43:35, 5.39s/it] {'loss': 0.5804, 'learning_rate': 1.5431556617001808e-06, 'epoch': 0.83} 83%|████████▎ | 5486/6640 [45:47<1:43:35, 5.39s/it] 83%|████████▎ | 5487/6640 [45:52<1:42:57, 5.36s/it] {'loss': 0.6043, 'learning_rate': 1.540553227903624e-06, 'epoch': 0.83} 83%|████████▎ | 5487/6640 [45:52<1:42:57, 5.36s/it] 83%|████████▎ | 5488/6640 [45:58<1:43:25, 5.39s/it] {'loss': 0.5758, 'learning_rate': 1.53795280722846e-06, 'epoch': 0.83} 83%|████████▎ | 5488/6640 [45:58<1:43:25, 5.39s/it] 83%|████████▎ | 5489/6640 [46:03<1:43:41, 5.41s/it] {'loss': 0.5985, 'learning_rate': 1.5353544002935229e-06, 'epoch': 0.83} 83%|████████▎ | 5489/6640 [46:03<1:43:41, 5.41s/it] 83%|████████▎ | 5490/6640 [46:09<1:43:38, 5.41s/it] {'loss': 0.5611, 'learning_rate': 1.5327580077171589e-06, 'epoch': 0.83} 83%|████████▎ | 5490/6640 [46:09<1:43:38, 5.41s/it] 83%|████████▎ | 5491/6640 [46:14<1:43:34, 5.41s/it] {'loss': 0.554, 'learning_rate': 1.5301636301172418e-06, 'epoch': 0.83} 83%|████████▎ | 5491/6640 [46:14<1:43:34, 5.41s/it] 83%|████████▎ | 5492/6640 [46:20<1:43:41, 5.42s/it] {'loss': 0.5663, 'learning_rate': 1.5275712681111643e-06, 'epoch': 0.83} 83%|████████▎ | 5492/6640 [46:20<1:43:41, 5.42s/it] 83%|████████▎ | 5493/6640 [46:25<1:43:27, 5.41s/it] {'loss': 0.5848, 'learning_rate': 1.5249809223158406e-06, 'epoch': 0.83} 83%|████████▎ | 5493/6640 [46:25<1:43:27, 5.41s/it] 83%|████████▎ | 5494/6640 [46:31<1:44:10, 5.45s/it] {'loss': 0.5888, 'learning_rate': 1.5223925933477002e-06, 'epoch': 0.83} 83%|████████▎ | 5494/6640 [46:31<1:44:10, 5.45s/it] 83%|████████▎ | 5495/6640 [46:36<1:43:40, 5.43s/it] {'loss': 0.5835, 'learning_rate': 
1.5198062818226967e-06, 'epoch': 0.83} 83%|████████▎ | 5495/6640 [46:36<1:43:40, 5.43s/it] 83%|████████▎ | 5496/6640 [46:41<1:43:15, 5.42s/it] {'loss': 0.5856, 'learning_rate': 1.5172219883563033e-06, 'epoch': 0.83} 83%|████████▎ | 5496/6640 [46:41<1:43:15, 5.42s/it] 83%|████████▎ | 5497/6640 [46:47<1:44:57, 5.51s/it] {'loss': 0.5667, 'learning_rate': 1.514639713563514e-06, 'epoch': 0.83} 83%|████████▎ | 5497/6640 [46:47<1:44:57, 5.51s/it] 83%|████████▎ | 5498/6640 [46:52<1:43:56, 5.46s/it] {'loss': 0.5788, 'learning_rate': 1.512059458058841e-06, 'epoch': 0.83} 83%|████████▎ | 5498/6640 [46:52<1:43:56, 5.46s/it] 83%|████████▎ | 5499/6640 [46:58<1:44:20, 5.49s/it] {'loss': 0.5755, 'learning_rate': 1.5094812224563117e-06, 'epoch': 0.83} 83%|████████▎ | 5499/6640 [46:58<1:44:20, 5.49s/it]07 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...1 AutoResumeHook: Checking whether to suspend... 32 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 46 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 83%|████████▎ | 5500/6640 [47:04<1:44:38, 5.51s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5819, 'learning_rate': 1.5069050073694813e-06, 'epoch': 0.83} 83%|████████▎ | 5500/6640 [47:04<1:44:38, 5.51s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 83%|████████▎ | 5501/6640 [47:30<3:44:22, 11.82s/it] {'loss': 0.579, 'learning_rate': 1.5043308134114177e-06, 'epoch': 0.83} 83%|████████▎ | 5501/6640 [47:30<3:44:22, 11.82s/it] 83%|████████▎ | 5502/6640 [47:35<3:07:19, 9.88s/it] {'loss': 0.5778, 'learning_rate': 1.5017586411947138e-06, 'epoch': 0.83} 83%|████████▎ | 5502/6640 [47:35<3:07:19, 9.88s/it] 83%|████████▎ | 5503/6640 [47:41<2:41:57, 8.55s/it] {'loss': 0.5853, 'learning_rate': 1.4991884913314714e-06, 'epoch': 0.83} 83%|████████▎ | 5503/6640 [47:41<2:41:57, 8.55s/it] 83%|████████▎ | 5504/6640 [47:47<2:25:28, 7.68s/it] {'loss': 0.5855, 'learning_rate': 1.496620364433321e-06, 'epoch': 0.83} 83%|████████▎ | 5504/6640 [47:47<2:25:28, 7.68s/it] 83%|████████▎ | 5505/6640 [47:52<2:12:09, 6.99s/it] {'loss': 0.598, 'learning_rate': 1.4940542611114073e-06, 'epoch': 0.83} 83%|████████▎ | 5505/6640 [47:52<2:12:09, 6.99s/it] 83%|████████▎ | 5506/6640 [47:57<2:03:14, 6.52s/it] {'loss': 0.5726, 'learning_rate': 1.4914901819763938e-06, 'epoch': 0.83} 83%|████████▎ | 5506/6640 [47:57<2:03:14, 6.52s/it] 83%|████████▎ | 5507/6640 
[48:03<1:57:11, 6.21s/it] {'loss': 0.583, 'learning_rate': 1.4889281276384648e-06, 'epoch': 0.83} 83%|████████▎ | 5507/6640 [48:03<1:57:11, 6.21s/it] 83%|████████▎ | 5508/6640 [48:08<1:53:14, 6.00s/it] {'loss': 0.5878, 'learning_rate': 1.486368098707317e-06, 'epoch': 0.83} 83%|████████▎ | 5508/6640 [48:08<1:53:14, 6.00s/it] 83%|████████▎ | 5509/6640 [48:14<1:49:11, 5.79s/it] {'loss': 0.5642, 'learning_rate': 1.4838100957921697e-06, 'epoch': 0.83} 83%|████████▎ | 5509/6640 [48:14<1:49:11, 5.79s/it] 83%|████████▎ | 5510/6640 [48:19<1:46:57, 5.68s/it] {'loss': 0.5722, 'learning_rate': 1.4812541195017593e-06, 'epoch': 0.83} 83%|████████▎ | 5510/6640 [48:19<1:46:57, 5.68s/it] 83%|████████▎ | 5511/6640 [48:24<1:45:19, 5.60s/it] {'loss': 0.5722, 'learning_rate': 1.4787001704443426e-06, 'epoch': 0.83} 83%|████████▎ | 5511/6640 [48:24<1:45:19, 5.60s/it] 83%|████████▎ | 5512/6640 [48:30<1:43:24, 5.50s/it] {'loss': 0.5811, 'learning_rate': 1.4761482492276847e-06, 'epoch': 0.83} 83%|████████▎ | 5512/6640 [48:30<1:43:24, 5.50s/it] 83%|████████▎ | 5513/6640 [48:35<1:43:02, 5.49s/it] {'loss': 0.6024, 'learning_rate': 1.4735983564590784e-06, 'epoch': 0.83} 83%|████████▎ | 5513/6640 [48:35<1:43:02, 5.49s/it] 83%|████████▎ | 5514/6640 [48:41<1:42:40, 5.47s/it] {'loss': 0.5872, 'learning_rate': 1.4710504927453295e-06, 'epoch': 0.83} 83%|████████▎ | 5514/6640 [48:41<1:42:40, 5.47s/it] 83%|████████▎ | 5515/6640 [48:46<1:41:17, 5.40s/it] {'loss': 0.5678, 'learning_rate': 1.4685046586927598e-06, 'epoch': 0.83} 83%|████████▎ | 5515/6640 [48:46<1:41:17, 5.40s/it] 83%|████████▎ | 5516/6640 [48:51<1:40:26, 5.36s/it] {'loss': 0.5718, 'learning_rate': 1.4659608549072135e-06, 'epoch': 0.83} 83%|████████▎ | 5516/6640 [48:51<1:40:26, 5.36s/it] 83%|████████▎ | 5517/6640 [48:56<1:40:12, 5.35s/it] {'loss': 0.5835, 'learning_rate': 1.463419081994042e-06, 'epoch': 0.83} 83%|████████▎ | 5517/6640 [48:56<1:40:12, 5.35s/it] 83%|████████▎ | 5518/6640 [49:02<1:40:29, 5.37s/it] {'loss': 0.5681, 
'learning_rate': 1.4608793405581224e-06, 'epoch': 0.83} 83%|████████▎ | 5518/6640 [49:02<1:40:29, 5.37s/it] 83%|████████▎ | 5519/6640 [49:07<1:40:23, 5.37s/it] {'loss': 0.5981, 'learning_rate': 1.4583416312038434e-06, 'epoch': 0.83} 83%|████████▎ | 5519/6640 [49:07<1:40:23, 5.37s/it] 83%|████████▎ | 5520/6640 [49:13<1:39:52, 5.35s/it] {'loss': 0.5873, 'learning_rate': 1.4558059545351144e-06, 'epoch': 0.83} 83%|████████▎ | 5520/6640 [49:13<1:39:52, 5.35s/it] 83%|████████▎ | 5521/6640 [49:18<1:40:27, 5.39s/it] {'loss': 0.6011, 'learning_rate': 1.453272311155357e-06, 'epoch': 0.83} 83%|████████▎ | 5521/6640 [49:18<1:40:27, 5.39s/it] 83%|████████▎ | 5522/6640 [49:23<1:40:36, 5.40s/it] {'loss': 0.5728, 'learning_rate': 1.450740701667509e-06, 'epoch': 0.83} 83%|████████▎ | 5522/6640 [49:23<1:40:36, 5.40s/it] 83%|████████▎ | 5523/6640 [49:29<1:41:02, 5.43s/it] {'loss': 0.564, 'learning_rate': 1.4482111266740274e-06, 'epoch': 0.83} 83%|████████▎ | 5523/6640 [49:29<1:41:02, 5.43s/it] 83%|████████▎ | 5524/6640 [49:34<1:40:23, 5.40s/it] {'loss': 0.5977, 'learning_rate': 1.4456835867768814e-06, 'epoch': 0.83} 83%|████████▎ | 5524/6640 [49:34<1:40:23, 5.40s/it] 83%|████████▎ | 5525/6640 [49:40<1:40:02, 5.38s/it] {'loss': 0.5654, 'learning_rate': 1.4431580825775604e-06, 'epoch': 0.83} 83%|████████▎ | 5525/6640 [49:40<1:40:02, 5.38s/it] 83%|████████▎ | 5526/6640 [49:45<1:40:29, 5.41s/it] {'loss': 0.5778, 'learning_rate': 1.4406346146770633e-06, 'epoch': 0.83} 83%|████████▎ | 5526/6640 [49:45<1:40:29, 5.41s/it] 83%|████████▎ | 5527/6640 [49:51<1:40:20, 5.41s/it] {'loss': 0.564, 'learning_rate': 1.43811318367591e-06, 'epoch': 0.83} 83%|████████▎ | 5527/6640 [49:51<1:40:20, 5.41s/it] 83%|████████▎ | 5528/6640 [49:56<1:40:33, 5.43s/it] {'loss': 0.5763, 'learning_rate': 1.4355937901741324e-06, 'epoch': 0.83} 83%|████████▎ | 5528/6640 [49:56<1:40:33, 5.43s/it] 83%|████████▎ | 5529/6640 [50:02<1:41:06, 5.46s/it] {'loss': 0.593, 'learning_rate': 1.433076434771279e-06, 'epoch': 0.83} 
83%|████████▎ | 5529/6640 [50:02<1:41:06, 5.46s/it] 83%|████████▎ | 5530/6640 [50:07<1:41:02, 5.46s/it] {'loss': 0.5732, 'learning_rate': 1.4305611180664157e-06, 'epoch': 0.83} 83%|████████▎ | 5530/6640 [50:07<1:41:02, 5.46s/it] 83%|████████▎ | 5531/6640 [50:12<1:40:19, 5.43s/it] {'loss': 0.5694, 'learning_rate': 1.4280478406581156e-06, 'epoch': 0.83} 83%|████████▎ | 5531/6640 [50:12<1:40:19, 5.43s/it] 83%|████████▎ | 5532/6640 [50:18<1:39:52, 5.41s/it] {'loss': 0.5883, 'learning_rate': 1.4255366031444717e-06, 'epoch': 0.83} 83%|████████▎ | 5532/6640 [50:18<1:39:52, 5.41s/it] 83%|████████▎ | 5533/6640 [50:23<1:39:45, 5.41s/it] {'loss': 0.5737, 'learning_rate': 1.4230274061230943e-06, 'epoch': 0.83} 83%|████████▎ | 5533/6640 [50:23<1:39:45, 5.41s/it] 83%|████████▎ | 5534/6640 [50:29<1:40:30, 5.45s/it] {'loss': 0.5705, 'learning_rate': 1.4205202501911052e-06, 'epoch': 0.83} 83%|████████▎ | 5534/6640 [50:29<1:40:30, 5.45s/it] 83%|████████▎ | 5535/6640 [50:34<1:40:59, 5.48s/it] {'loss': 0.5814, 'learning_rate': 1.4180151359451367e-06, 'epoch': 0.83} 83%|████████▎ | 5535/6640 [50:34<1:40:59, 5.48s/it] 83%|████████▎ | 5536/6640 [50:40<1:40:32, 5.46s/it] {'loss': 0.5708, 'learning_rate': 1.4155120639813392e-06, 'epoch': 0.83} 83%|████████▎ | 5536/6640 [50:40<1:40:32, 5.46s/it] 83%|████████▎ | 5537/6640 [50:45<1:39:51, 5.43s/it] {'loss': 0.5599, 'learning_rate': 1.4130110348953795e-06, 'epoch': 0.83} 83%|████████▎ | 5537/6640 [50:45<1:39:51, 5.43s/it] 83%|████████▎ | 5538/6640 [50:50<1:39:11, 5.40s/it] {'loss': 0.563, 'learning_rate': 1.410512049282433e-06, 'epoch': 0.83} 83%|████████▎ | 5538/6640 [50:50<1:39:11, 5.40s/it] 83%|████████▎ | 5539/6640 [50:56<1:38:45, 5.38s/it] {'loss': 0.5804, 'learning_rate': 1.408015107737195e-06, 'epoch': 0.83} 83%|████████▎ | 5539/6640 [50:56<1:38:45, 5.38s/it] 83%|████████▎ | 5540/6640 [51:01<1:39:10, 5.41s/it] {'loss': 0.6031, 'learning_rate': 1.4055202108538657e-06, 'epoch': 0.83} 83%|████████▎ | 5540/6640 [51:01<1:39:10, 5.41s/it] 
83%|████████▎ | 5541/6640 [51:06<1:38:05, 5.36s/it] {'loss': 0.594, 'learning_rate': 1.4030273592261656e-06, 'epoch': 0.83} 83%|████████▎ | 5541/6640 [51:06<1:38:05, 5.36s/it] 83%|████████▎ | 5542/6640 [51:12<1:38:55, 5.41s/it] {'loss': 0.5907, 'learning_rate': 1.400536553447327e-06, 'epoch': 0.83} 83%|████████▎ | 5542/6640 [51:12<1:38:55, 5.41s/it] 83%|████████▎ | 5543/6640 [51:17<1:39:14, 5.43s/it] {'loss': 0.5741, 'learning_rate': 1.3980477941100956e-06, 'epoch': 0.83} 83%|████████▎ | 5543/6640 [51:17<1:39:14, 5.43s/it] 83%|████████▎ | 5544/6640 [51:23<1:38:58, 5.42s/it] {'loss': 0.5661, 'learning_rate': 1.3955610818067267e-06, 'epoch': 0.83} 83%|████████▎ | 5544/6640 [51:23<1:38:58, 5.42s/it] 84%|████████▎ | 5545/6640 [51:28<1:39:08, 5.43s/it] {'loss': 0.5984, 'learning_rate': 1.3930764171289935e-06, 'epoch': 0.84} 84%|████████▎ | 5545/6640 [51:28<1:39:08, 5.43s/it] 84%|████████▎ | 5546/6640 [51:34<1:39:44, 5.47s/it] {'loss': 0.5551, 'learning_rate': 1.3905938006681773e-06, 'epoch': 0.84} 84%|████████▎ | 5546/6640 [51:34<1:39:44, 5.47s/it] 84%|████████▎ | 5547/6640 [51:39<1:39:40, 5.47s/it] {'loss': 0.5779, 'learning_rate': 1.3881132330150771e-06, 'epoch': 0.84} 84%|████████▎ | 5547/6640 [51:39<1:39:40, 5.47s/it] 84%|████████▎ | 5548/6640 [51:45<1:38:38, 5.42s/it] {'loss': 0.5951, 'learning_rate': 1.3856347147600014e-06, 'epoch': 0.84} 84%|████████▎ | 5548/6640 [51:45<1:38:38, 5.42s/it] 84%|████████▎ | 5549/6640 [51:50<1:37:41, 5.37s/it] {'loss': 0.5507, 'learning_rate': 1.3831582464927685e-06, 'epoch': 0.84} 84%|████████▎ | 5549/6640 [51:50<1:37:41, 5.37s/it]1 AutoResumeHook: Checking whether to suspend... 60 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
84%|████████▎ | 5550/6640 [51:55<1:38:03, 5.40s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.5765, 'learning_rate': 1.3806838288027113e-06, 'epoch': 0.84} 84%|████████▎ | 5550/6640 [51:55<1:38:03, 5.40s/it] 84%|████████▎ | 5551/6640 [52:01<1:38:24, 5.42s/it] {'loss': 0.5642, 'learning_rate': 1.3782114622786769e-06, 'epoch': 0.84} 84%|████████▎ | 5551/6640 [52:01<1:38:24, 5.42s/it] 84%|████████▎ | 5552/6640 [52:06<1:37:52, 5.40s/it] {'loss': 0.5582, 'learning_rate': 1.3757411475090233e-06, 'epoch': 0.84} 84%|████████▎ | 5552/6640 [52:06<1:37:52, 5.40s/it] 84%|████████▎ | 5553/6640 [52:12<1:38:10, 5.42s/it] {'loss': 0.5866, 'learning_rate': 1.3732728850816146e-06, 'epoch': 0.84} 84%|████████▎ | 5553/6640 [52:12<1:38:10, 5.42s/it] 84%|████████▎ | 5554/6640 [52:17<1:37:49, 5.40s/it] {'loss': 0.591, 'learning_rate': 1.3708066755838344e-06, 'epoch': 0.84} 84%|████████▎ | 5554/6640 [52:17<1:37:49, 5.40s/it] 84%|████████▎ | 5555/6640 [52:22<1:37:57, 5.42s/it] {'loss': 0.5856, 'learning_rate': 1.3683425196025734e-06, 'epoch': 0.84} 84%|████████▎ | 5555/6640 [52:22<1:37:57, 5.42s/it] 84%|████████▎ | 5556/6640 [52:28<1:37:50, 5.42s/it] {'loss': 0.5926, 'learning_rate': 1.3658804177242347e-06, 'epoch': 0.84} 84%|████████▎ | 5556/6640 [52:28<1:37:50, 5.42s/it] 84%|████████▎ | 5557/6640 [52:33<1:37:58, 5.43s/it] {'loss': 0.5769, 'learning_rate': 1.3634203705347348e-06, 'epoch': 0.84} 84%|████████▎ | 5557/6640 [52:33<1:37:58, 5.43s/it] 84%|████████▎ | 5558/6640 [52:39<1:37:48, 5.42s/it] {'loss': 0.5826, 'learning_rate': 1.3609623786194958e-06, 'epoch': 0.84} 84%|████████▎ | 5558/6640 [52:39<1:37:48, 5.42s/it] 84%|████████▎ | 5559/6640 [52:44<1:37:24, 5.41s/it] {'loss': 0.5964, 'learning_rate': 1.3585064425634542e-06, 'epoch': 0.84} 84%|████████▎ | 5559/6640 [52:44<1:37:24, 5.41s/it] 84%|████████▎ | 5560/6640 [52:49<1:36:04, 5.34s/it] {'loss': 0.544, 'learning_rate': 1.3560525629510567e-06, 'epoch': 0.84} 84%|████████▎ | 5560/6640 [52:49<1:36:04, 5.34s/it] 
84%|████████▍ | 5561/6640 [52:55<1:36:36, 5.37s/it] {'loss': 0.5591, 'learning_rate': 1.3536007403662632e-06, 'epoch': 0.84} 84%|████████▍ | 5561/6640 [52:55<1:36:36, 5.37s/it] 84%|████████▍ | 5562/6640 [53:00<1:36:43, 5.38s/it] {'loss': 0.5721, 'learning_rate': 1.3511509753925422e-06, 'epoch': 0.84} 84%|████████▍ | 5562/6640 [53:00<1:36:43, 5.38s/it] 84%|████████▍ | 5563/6640 [53:06<1:37:02, 5.41s/it] {'loss': 0.5835, 'learning_rate': 1.3487032686128653e-06, 'epoch': 0.84} 84%|████████▍ | 5563/6640 [53:06<1:37:02, 5.41s/it] 84%|████████▍ | 5564/6640 [53:11<1:37:07, 5.42s/it] {'loss': 0.5664, 'learning_rate': 1.34625762060973e-06, 'epoch': 0.84} 84%|████████▍ | 5564/6640 [53:11<1:37:07, 5.42s/it] 84%|████████▍ | 5565/6640 [53:16<1:37:22, 5.43s/it] {'loss': 0.5696, 'learning_rate': 1.3438140319651283e-06, 'epoch': 0.84} 84%|████████▍ | 5565/6640 [53:16<1:37:22, 5.43s/it] 84%|████████▍ | 5566/6640 [53:22<1:38:06, 5.48s/it] {'loss': 0.5962, 'learning_rate': 1.341372503260574e-06, 'epoch': 0.84} 84%|████████▍ | 5566/6640 [53:22<1:38:06, 5.48s/it] 84%|████████▍ | 5567/6640 [53:28<1:37:59, 5.48s/it] {'loss': 0.5734, 'learning_rate': 1.33893303507708e-06, 'epoch': 0.84} 84%|████████▍ | 5567/6640 [53:28<1:37:59, 5.48s/it] 84%|████████▍ | 5568/6640 [53:33<1:37:18, 5.45s/it] {'loss': 0.5796, 'learning_rate': 1.3364956279951768e-06, 'epoch': 0.84} 84%|████████▍ | 5568/6640 [53:33<1:37:18, 5.45s/it] 84%|████████▍ | 5569/6640 [53:38<1:37:42, 5.47s/it] {'loss': 0.5769, 'learning_rate': 1.3340602825949024e-06, 'epoch': 0.84} 84%|████████▍ | 5569/6640 [53:38<1:37:42, 5.47s/it] 84%|████████▍ | 5570/6640 [53:44<1:38:15, 5.51s/it] {'loss': 0.5635, 'learning_rate': 1.331626999455804e-06, 'epoch': 0.84} 84%|████████▍ | 5570/6640 [53:44<1:38:15, 5.51s/it] 84%|████████▍ | 5571/6640 [53:50<1:38:21, 5.52s/it] {'loss': 0.5809, 'learning_rate': 1.3291957791569376e-06, 'epoch': 0.84} 84%|████████▍ | 5571/6640 [53:50<1:38:21, 5.52s/it] 84%|████████▍ | 5572/6640 [53:55<1:38:36, 5.54s/it] 
{'loss': 0.586, 'learning_rate': 1.3267666222768637e-06, 'epoch': 0.84} 84%|████████▍ | 5572/6640 [53:55<1:38:36, 5.54s/it] 84%|████████▍ | 5573/6640 [54:01<1:37:45, 5.50s/it] {'loss': 0.5709, 'learning_rate': 1.3243395293936657e-06, 'epoch': 0.84} 84%|████████▍ | 5573/6640 [54:01<1:37:45, 5.50s/it] 84%|████████▍ | 5574/6640 [54:06<1:37:41, 5.50s/it] {'loss': 0.5985, 'learning_rate': 1.3219145010849188e-06, 'epoch': 0.84} 84%|████████▍ | 5574/6640 [54:06<1:37:41, 5.50s/it] 84%|████████▍ | 5575/6640 [54:11<1:36:26, 5.43s/it] {'loss': 0.594, 'learning_rate': 1.3194915379277195e-06, 'epoch': 0.84} 84%|████████▍ | 5575/6640 [54:11<1:36:26, 5.43s/it] 84%|████████▍ | 5576/6640 [54:17<1:36:33, 5.44s/it] {'loss': 0.5627, 'learning_rate': 1.3170706404986645e-06, 'epoch': 0.84} 84%|████████▍ | 5576/6640 [54:17<1:36:33, 5.44s/it] 84%|████████▍ | 5577/6640 [54:22<1:36:14, 5.43s/it] {'loss': 0.5705, 'learning_rate': 1.3146518093738624e-06, 'epoch': 0.84} 84%|████████▍ | 5577/6640 [54:22<1:36:14, 5.43s/it] 84%|████████▍ | 5578/6640 [54:28<1:37:03, 5.48s/it] {'loss': 0.5844, 'learning_rate': 1.3122350451289323e-06, 'epoch': 0.84} 84%|████████▍ | 5578/6640 [54:28<1:37:03, 5.48s/it] 84%|████████▍ | 5579/6640 [54:33<1:36:08, 5.44s/it] {'loss': 0.5743, 'learning_rate': 1.309820348338998e-06, 'epoch': 0.84} 84%|████████▍ | 5579/6640 [54:33<1:36:08, 5.44s/it] 84%|████████▍ | 5580/6640 [54:38<1:35:30, 5.41s/it] {'loss': 0.5798, 'learning_rate': 1.307407719578696e-06, 'epoch': 0.84} 84%|████████▍ | 5580/6640 [54:38<1:35:30, 5.41s/it] 84%|████████▍ | 5581/6640 [54:44<1:34:59, 5.38s/it] {'loss': 0.5649, 'learning_rate': 1.30499715942216e-06, 'epoch': 0.84} 84%|████████▍ | 5581/6640 [54:44<1:34:59, 5.38s/it] 84%|████████▍ | 5582/6640 [54:49<1:34:54, 5.38s/it] {'loss': 0.583, 'learning_rate': 1.3025886684430467e-06, 'epoch': 0.84} 84%|████████▍ | 5582/6640 [54:49<1:34:54, 5.38s/it] 84%|████████▍ | 5583/6640 [54:54<1:34:29, 5.36s/it] {'loss': 0.5649, 'learning_rate': 1.3001822472145066e-06, 
'epoch': 0.84} 84%|████████▍ | 5583/6640 [54:54<1:34:29, 5.36s/it] 84%|████████▍ | 5584/6640 [55:00<1:34:52, 5.39s/it] {'loss': 0.5704, 'learning_rate': 1.297777896309207e-06, 'epoch': 0.84} 84%|████████▍ | 5584/6640 [55:00<1:34:52, 5.39s/it] 84%|████████▍ | 5585/6640 [55:05<1:34:32, 5.38s/it] {'loss': 0.5781, 'learning_rate': 1.2953756162993158e-06, 'epoch': 0.84} 84%|████████▍ | 5585/6640 [55:05<1:34:32, 5.38s/it] 84%|████████▍ | 5586/6640 [55:11<1:35:11, 5.42s/it] {'loss': 0.6005, 'learning_rate': 1.2929754077565126e-06, 'epoch': 0.84} 84%|████████▍ | 5586/6640 [55:11<1:35:11, 5.42s/it] 84%|████████▍ | 5587/6640 [55:16<1:34:37, 5.39s/it] {'loss': 0.5844, 'learning_rate': 1.2905772712519826e-06, 'epoch': 0.84} 84%|████████▍ | 5587/6640 [55:16<1:34:37, 5.39s/it] 84%|████████▍ | 5588/6640 [55:22<1:34:36, 5.40s/it] {'loss': 0.5743, 'learning_rate': 1.288181207356417e-06, 'epoch': 0.84} 84%|████████▍ | 5588/6640 [55:22<1:34:36, 5.40s/it] 84%|████████▍ | 5589/6640 [55:27<1:35:12, 5.43s/it] {'loss': 0.5919, 'learning_rate': 1.2857872166400198e-06, 'epoch': 0.84} 84%|████████▍ | 5589/6640 [55:27<1:35:12, 5.43s/it] 84%|████████▍ | 5590/6640 [55:33<1:36:22, 5.51s/it] {'loss': 0.5748, 'learning_rate': 1.2833952996724864e-06, 'epoch': 0.84} 84%|████████▍ | 5590/6640 [55:33<1:36:22, 5.51s/it] 84%|████████▍ | 5591/6640 [55:38<1:36:10, 5.50s/it] {'loss': 0.5769, 'learning_rate': 1.28100545702304e-06, 'epoch': 0.84} 84%|████████▍ | 5591/6640 [55:38<1:36:10, 5.50s/it] 84%|████████▍ | 5592/6640 [55:44<1:36:02, 5.50s/it] {'loss': 0.585, 'learning_rate': 1.278617689260393e-06, 'epoch': 0.84} 84%|████████▍ | 5592/6640 [55:44<1:36:02, 5.50s/it] 84%|████████▍ | 5593/6640 [55:49<1:35:28, 5.47s/it] {'loss': 0.5884, 'learning_rate': 1.2762319969527725e-06, 'epoch': 0.84} 84%|████████▍ | 5593/6640 [55:49<1:35:28, 5.47s/it] 84%|████████▍ | 5594/6640 [55:55<1:35:24, 5.47s/it] {'loss': 0.5809, 'learning_rate': 1.2738483806679057e-06, 'epoch': 0.84} 84%|████████▍ | 5594/6640 [55:55<1:35:24, 
5.47s/it] 84%|████████▍ | 5595/6640 [56:00<1:34:44, 5.44s/it] {'loss': 0.5975, 'learning_rate': 1.2714668409730312e-06, 'epoch': 0.84} 84%|████████▍ | 5595/6640 [56:00<1:34:44, 5.44s/it] 84%|████████▍ | 5596/6640 [56:05<1:34:46, 5.45s/it] {'loss': 0.5872, 'learning_rate': 1.2690873784348923e-06, 'epoch': 0.84} 84%|████████▍ | 5596/6640 [56:05<1:34:46, 5.45s/it] 84%|████████▍ | 5597/6640 [56:11<1:34:39, 5.45s/it] {'loss': 0.5884, 'learning_rate': 1.266709993619737e-06, 'epoch': 0.84} 84%|████████▍ | 5597/6640 [56:11<1:34:39, 5.45s/it] 84%|████████▍ | 5598/6640 [56:16<1:34:24, 5.44s/it] {'loss': 0.5801, 'learning_rate': 1.2643346870933204e-06, 'epoch': 0.84} 84%|████████▍ | 5598/6640 [56:16<1:34:24, 5.44s/it] 84%|████████▍ | 5599/6640 [56:22<1:33:29, 5.39s/it] {'loss': 0.5573, 'learning_rate': 1.2619614594208972e-06, 'epoch': 0.84} 84%|████████▍ | 5599/6640 [56:22<1:33:29, 5.39s/it]04 AutoResumeHook: Checking whether to suspend... 16 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 32 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 84%|████████▍ | 5600/6640 [56:27<1:34:09, 5.43s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5656, 'learning_rate': 1.259590311167238e-06, 'epoch': 0.84} 84%|████████▍ | 5600/6640 [56:27<1:34:09, 5.43s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 84%|████████▍ | 5601/6640 [56:55<3:31:57, 12.24s/it] {'loss': 0.5894, 'learning_rate': 1.2572212428966079e-06, 'epoch': 0.84} 84%|████████▍ | 5601/6640 [56:55<3:31:57, 12.24s/it] 84%|████████▍ | 5602/6640 [57:01<2:56:17, 10.19s/it] {'loss': 0.5793, 'learning_rate': 1.2548542551727837e-06, 'epoch': 0.84} 84%|████████▍ | 5602/6640 [57:01<2:56:17, 10.19s/it] 84%|████████▍ | 5603/6640 [57:06<2:31:32, 8.77s/it] {'loss': 0.5871, 'learning_rate': 1.2524893485590462e-06, 'epoch': 0.84} 84%|████████▍ | 5603/6640 [57:06<2:31:32, 8.77s/it] 84%|████████▍ | 5604/6640 [57:12<2:15:46, 7.86s/it] {'loss': 0.5606, 'learning_rate': 1.2501265236181736e-06, 'epoch': 0.84} 84%|████████▍ | 5604/6640 [57:12<2:15:46, 7.86s/it] 84%|████████▍ | 5605/6640 [57:17<2:03:13, 7.14s/it] {'loss': 0.5726, 'learning_rate': 1.2477657809124632e-06, 'epoch': 0.84} 84%|████████▍ | 5605/6640 [57:17<2:03:13, 7.14s/it] 84%|████████▍ | 5606/6640 [57:23<1:54:06, 6.62s/it] {'loss': 0.5927, 'learning_rate': 1.2454071210037033e-06, 'epoch': 0.84} 84%|████████▍ | 5606/6640 [57:23<1:54:06, 6.62s/it] 84%|████████▍ | 5607/6640 
[57:28<1:47:17, 6.23s/it] {'loss': 0.61, 'learning_rate': 1.2430505444531937e-06, 'epoch': 0.84} 84%|████████▍ | 5607/6640 [57:28<1:47:17, 6.23s/it] 84%|████████▍ | 5608/6640 [57:34<1:43:16, 6.00s/it] {'loss': 0.5725, 'learning_rate': 1.2406960518217325e-06, 'epoch': 0.84} 84%|████████▍ | 5608/6640 [57:34<1:43:16, 6.00s/it] 84%|████████▍ | 5609/6640 [57:39<1:40:37, 5.86s/it] {'loss': 0.5994, 'learning_rate': 1.2383436436696328e-06, 'epoch': 0.84} 84%|████████▍ | 5609/6640 [57:39<1:40:37, 5.86s/it] 84%|████████▍ | 5610/6640 [57:45<1:38:38, 5.75s/it] {'loss': 0.5842, 'learning_rate': 1.2359933205566987e-06, 'epoch': 0.84} 84%|████████▍ | 5610/6640 [57:45<1:38:38, 5.75s/it] 85%|████████▍ | 5611/6640 [57:50<1:37:26, 5.68s/it] {'loss': 0.5825, 'learning_rate': 1.2336450830422452e-06, 'epoch': 0.85} 85%|████████▍ | 5611/6640 [57:50<1:37:26, 5.68s/it] 85%|████████▍ | 5612/6640 [57:55<1:36:05, 5.61s/it] {'loss': 0.5995, 'learning_rate': 1.2312989316850932e-06, 'epoch': 0.85} 85%|████████▍ | 5612/6640 [57:55<1:36:05, 5.61s/it] 85%|████████▍ | 5613/6640 [58:01<1:34:52, 5.54s/it] {'loss': 0.5609, 'learning_rate': 1.2289548670435568e-06, 'epoch': 0.85} 85%|████████▍ | 5613/6640 [58:01<1:34:52, 5.54s/it] 85%|████████▍ | 5614/6640 [58:06<1:33:16, 5.45s/it] {'loss': 0.5837, 'learning_rate': 1.2266128896754703e-06, 'epoch': 0.85} 85%|████████▍ | 5614/6640 [58:06<1:33:16, 5.45s/it] 85%|████████▍ | 5615/6640 [58:12<1:34:06, 5.51s/it] {'loss': 0.5955, 'learning_rate': 1.2242730001381532e-06, 'epoch': 0.85} 85%|████████▍ | 5615/6640 [58:12<1:34:06, 5.51s/it] 85%|████████▍ | 5616/6640 [58:17<1:35:15, 5.58s/it] {'loss': 0.5679, 'learning_rate': 1.221935198988441e-06, 'epoch': 0.85} 85%|████████▍ | 5616/6640 [58:17<1:35:15, 5.58s/it] 85%|████████▍ | 5617/6640 [58:23<1:34:42, 5.55s/it] {'loss': 0.5533, 'learning_rate': 1.2195994867826622e-06, 'epoch': 0.85} 85%|████████▍ | 5617/6640 [58:23<1:34:42, 5.55s/it] 85%|████████▍ | 5618/6640 [58:28<1:33:24, 5.48s/it] {'loss': 0.5697, 
'learning_rate': 1.2172658640766622e-06, 'epoch': 0.85} 85%|████████▍ | 5618/6640 [58:28<1:33:24, 5.48s/it] 85%|████████▍ | 5619/6640 [58:34<1:32:28, 5.43s/it] {'loss': 0.5981, 'learning_rate': 1.2149343314257733e-06, 'epoch': 0.85} 85%|████████▍ | 5619/6640 [58:34<1:32:28, 5.43s/it] 85%|████████▍ | 5620/6640 [58:39<1:31:34, 5.39s/it] {'loss': 0.5914, 'learning_rate': 1.2126048893848396e-06, 'epoch': 0.85} 85%|████████▍ | 5620/6640 [58:39<1:31:34, 5.39s/it] 85%|████████▍ | 5621/6640 [58:44<1:31:17, 5.38s/it] {'loss': 0.5923, 'learning_rate': 1.21027753850821e-06, 'epoch': 0.85} 85%|████████▍ | 5621/6640 [58:44<1:31:17, 5.38s/it] 85%|████████▍ | 5622/6640 [58:50<1:31:58, 5.42s/it] {'loss': 0.5585, 'learning_rate': 1.2079522793497233e-06, 'epoch': 0.85} 85%|████████▍ | 5622/6640 [58:50<1:31:58, 5.42s/it] 85%|████████▍ | 5623/6640 [58:55<1:31:40, 5.41s/it] {'loss': 0.5545, 'learning_rate': 1.2056291124627374e-06, 'epoch': 0.85} 85%|████████▍ | 5623/6640 [58:55<1:31:40, 5.41s/it] 85%|████████▍ | 5624/6640 [59:01<1:31:54, 5.43s/it] {'loss': 0.6039, 'learning_rate': 1.2033080384000983e-06, 'epoch': 0.85} 85%|████████▍ | 5624/6640 [59:01<1:31:54, 5.43s/it] 85%|████████▍ | 5625/6640 [59:06<1:32:12, 5.45s/it] {'loss': 0.555, 'learning_rate': 1.2009890577141625e-06, 'epoch': 0.85} 85%|████████▍ | 5625/6640 [59:06<1:32:12, 5.45s/it] 85%|████████▍ | 5626/6640 [59:11<1:31:08, 5.39s/it] {'loss': 0.5674, 'learning_rate': 1.1986721709567795e-06, 'epoch': 0.85} 85%|████████▍ | 5626/6640 [59:11<1:31:08, 5.39s/it] 85%|████████▍ | 5627/6640 [59:17<1:31:43, 5.43s/it] {'loss': 0.5796, 'learning_rate': 1.1963573786793148e-06, 'epoch': 0.85} 85%|████████▍ | 5627/6640 [59:17<1:31:43, 5.43s/it] 85%|████████▍ | 5628/6640 [59:22<1:31:33, 5.43s/it] {'loss': 0.5674, 'learning_rate': 1.19404468143262e-06, 'epoch': 0.85} 85%|████████▍ | 5628/6640 [59:22<1:31:33, 5.43s/it] 85%|████████▍ | 5629/6640 [59:28<1:33:24, 5.54s/it] {'loss': 0.5838, 'learning_rate': 1.1917340797670584e-06, 'epoch': 0.85} 
85%|████████▍ | 5629/6640 [59:28<1:33:24, 5.54s/it] 85%|████████▍ | 5630/6640 [59:34<1:32:24, 5.49s/it] {'loss': 0.5964, 'learning_rate': 1.189425574232491e-06, 'epoch': 0.85} 85%|████████▍ | 5630/6640 [59:34<1:32:24, 5.49s/it] 85%|████████▍ | 5631/6640 [59:39<1:31:58, 5.47s/it] {'loss': 0.5836, 'learning_rate': 1.1871191653782764e-06, 'epoch': 0.85} 85%|████████▍ | 5631/6640 [59:39<1:31:58, 5.47s/it] 85%|████████▍ | 5632/6640 [59:44<1:31:39, 5.46s/it] {'loss': 0.5802, 'learning_rate': 1.1848148537532845e-06, 'epoch': 0.85} 85%|████████▍ | 5632/6640 [59:44<1:31:39, 5.46s/it] 85%|████████▍ | 5633/6640 [59:50<1:31:05, 5.43s/it] {'loss': 0.5753, 'learning_rate': 1.182512639905874e-06, 'epoch': 0.85} 85%|████████▍ | 5633/6640 [59:50<1:31:05, 5.43s/it] 85%|████████▍ | 5634/6640 [59:55<1:32:01, 5.49s/it] {'loss': 0.574, 'learning_rate': 1.1802125243839146e-06, 'epoch': 0.85} 85%|████████▍ | 5634/6640 [59:55<1:32:01, 5.49s/it] 85%|████████▍ | 5635/6640 [1:00:01<1:32:08, 5.50s/it] {'loss': 0.5685, 'learning_rate': 1.1779145077347653e-06, 'epoch': 0.85} 85%|████████▍ | 5635/6640 [1:00:01<1:32:08, 5.50s/it] 85%|████████▍ | 5636/6640 [1:00:06<1:31:18, 5.46s/it] {'loss': 0.5904, 'learning_rate': 1.1756185905053019e-06, 'epoch': 0.85} 85%|████████▍ | 5636/6640 [1:00:06<1:31:18, 5.46s/it] 85%|████████▍ | 5637/6640 [1:00:12<1:32:07, 5.51s/it] {'loss': 0.6022, 'learning_rate': 1.1733247732418828e-06, 'epoch': 0.85} 85%|████████▍ | 5637/6640 [1:00:12<1:32:07, 5.51s/it] 85%|████████▍ | 5638/6640 [1:00:17<1:31:06, 5.46s/it] {'loss': 0.562, 'learning_rate': 1.1710330564903782e-06, 'epoch': 0.85} 85%|████████▍ | 5638/6640 [1:00:17<1:31:06, 5.46s/it] 85%|████████▍ | 5639/6640 [1:00:22<1:30:03, 5.40s/it] {'loss': 0.5898, 'learning_rate': 1.1687434407961584e-06, 'epoch': 0.85} 85%|████████▍ | 5639/6640 [1:00:22<1:30:03, 5.40s/it] 85%|████████▍ | 5640/6640 [1:00:28<1:30:02, 5.40s/it] {'loss': 0.5681, 'learning_rate': 1.1664559267040821e-06, 'epoch': 0.85} 85%|████████▍ | 5640/6640 
[1:00:28<1:30:02, 5.40s/it] 85%|████████▍ | 5641/6640 [1:00:33<1:29:31, 5.38s/it] {'loss': 0.5592, 'learning_rate': 1.1641705147585258e-06, 'epoch': 0.85} 85%|████████▍ | 5641/6640 [1:00:33<1:29:31, 5.38s/it] 85%|████████▍ | 5642/6640 [1:00:39<1:29:51, 5.40s/it] {'loss': 0.5992, 'learning_rate': 1.1618872055033503e-06, 'epoch': 0.85} 85%|████████▍ | 5642/6640 [1:00:39<1:29:51, 5.40s/it] 85%|████████▍ | 5643/6640 [1:00:44<1:30:13, 5.43s/it] {'loss': 0.5924, 'learning_rate': 1.1596059994819253e-06, 'epoch': 0.85} 85%|████████▍ | 5643/6640 [1:00:44<1:30:13, 5.43s/it] 85%|████████▌ | 5644/6640 [1:00:49<1:29:37, 5.40s/it] {'loss': 0.5876, 'learning_rate': 1.157326897237111e-06, 'epoch': 0.85} 85%|████████▌ | 5644/6640 [1:00:49<1:29:37, 5.40s/it] 85%|████████▌ | 5645/6640 [1:00:55<1:29:05, 5.37s/it] {'loss': 0.5813, 'learning_rate': 1.1550498993112812e-06, 'epoch': 0.85} 85%|████████▌ | 5645/6640 [1:00:55<1:29:05, 5.37s/it] 85%|████████▌ | 5646/6640 [1:01:00<1:29:13, 5.39s/it] {'loss': 0.5755, 'learning_rate': 1.1527750062462928e-06, 'epoch': 0.85} 85%|████████▌ | 5646/6640 [1:01:00<1:29:13, 5.39s/it] 85%|████████▌ | 5647/6640 [1:01:06<1:28:41, 5.36s/it] {'loss': 0.5732, 'learning_rate': 1.1505022185835124e-06, 'epoch': 0.85} 85%|████████▌ | 5647/6640 [1:01:06<1:28:41, 5.36s/it] 85%|████████▌ | 5648/6640 [1:01:11<1:28:24, 5.35s/it] {'loss': 0.5522, 'learning_rate': 1.1482315368638041e-06, 'epoch': 0.85} 85%|████████▌ | 5648/6640 [1:01:11<1:28:24, 5.35s/it] 85%|████████▌ | 5649/6640 [1:01:17<1:30:51, 5.50s/it] {'loss': 0.5771, 'learning_rate': 1.1459629616275236e-06, 'epoch': 0.85} 85%|████████▌ | 5649/6640 [1:01:17<1:30:51, 5.50s/it]1 AutoResumeHook: Checking whether to suspend... 62 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 
85%|████████▌ | 5650/6640 [1:01:22<1:30:48, 5.50s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.5759, 'learning_rate': 1.143696493414539e-06, 'epoch': 0.85} 85%|████████▌ | 5650/6640 [1:01:22<1:30:48, 5.50s/it] 85%|████████▌ | 5651/6640 [1:01:28<1:30:06, 5.47s/it] {'loss': 0.5817, 'learning_rate': 1.1414321327642019e-06, 'epoch': 0.85} 85%|████████▌ | 5651/6640 [1:01:28<1:30:06, 5.47s/it] 85%|████████▌ | 5652/6640 [1:01:33<1:28:44, 5.39s/it] {'loss': 0.5763, 'learning_rate': 1.1391698802153728e-06, 'epoch': 0.85} 85%|████████▌ | 5652/6640 [1:01:33<1:28:44, 5.39s/it] 85%|████████▌ | 5653/6640 [1:01:38<1:28:43, 5.39s/it] {'loss': 0.5803, 'learning_rate': 1.1369097363064064e-06, 'epoch': 0.85} 85%|████████▌ | 5653/6640 [1:01:38<1:28:43, 5.39s/it] 85%|████████▌ | 5654/6640 [1:01:44<1:29:53, 5.47s/it] {'loss': 0.5851, 'learning_rate': 1.134651701575157e-06, 'epoch': 0.85} 85%|████████▌ | 5654/6640 [1:01:44<1:29:53, 5.47s/it] 85%|████████▌ | 5655/6640 [1:01:49<1:29:17, 5.44s/it] {'loss': 0.5805, 'learning_rate': 1.1323957765589766e-06, 'epoch': 0.85} 85%|████████▌ | 5655/6640 [1:01:49<1:29:17, 5.44s/it] 85%|████████▌ | 5656/6640 [1:01:55<1:29:16, 5.44s/it] {'loss': 0.5722, 'learning_rate': 1.1301419617947118e-06, 'epoch': 0.85} 85%|████████▌ | 5656/6640 [1:01:55<1:29:16, 5.44s/it] 85%|████████▌ | 5657/6640 [1:02:00<1:30:03, 5.50s/it] {'loss': 0.5923, 'learning_rate': 1.1278902578187157e-06, 'epoch': 0.85} 85%|████████▌ | 5657/6640 [1:02:00<1:30:03, 5.50s/it] 85%|████████▌ | 5658/6640 [1:02:06<1:29:56, 5.50s/it] {'loss': 0.5836, 'learning_rate': 1.125640665166825e-06, 'epoch': 0.85} 85%|████████▌ | 5658/6640 [1:02:06<1:29:56, 5.50s/it] 85%|████████▌ | 5659/6640 [1:02:11<1:29:27, 5.47s/it] {'loss': 0.5635, 'learning_rate': 1.1233931843743906e-06, 'epoch': 0.85} 85%|████████▌ | 5659/6640 [1:02:11<1:29:27, 5.47s/it] 85%|████████▌ | 5660/6640 [1:02:17<1:29:01, 5.45s/it] {'loss': 0.5626, 'learning_rate': 1.121147815976248e-06, 'epoch': 0.85} 85%|████████▌ | 
5660/6640 [1:02:17<1:29:01, 5.45s/it] 85%|████████▌ | 5661/6640 [1:02:22<1:28:55, 5.45s/it] {'loss': 0.5664, 'learning_rate': 1.1189045605067362e-06, 'epoch': 0.85} 85%|████████▌ | 5661/6640 [1:02:22<1:28:55, 5.45s/it] 85%|████████▌ | 5662/6640 [1:02:27<1:28:41, 5.44s/it] {'loss': 0.5666, 'learning_rate': 1.1166634184996894e-06, 'epoch': 0.85} 85%|████████▌ | 5662/6640 [1:02:27<1:28:41, 5.44s/it] 85%|████████▌ | 5663/6640 [1:02:33<1:31:08, 5.60s/it] {'loss': 0.5778, 'learning_rate': 1.1144243904884399e-06, 'epoch': 0.85} 85%|████████▌ | 5663/6640 [1:02:33<1:31:08, 5.60s/it] 85%|████████▌ | 5664/6640 [1:02:39<1:29:56, 5.53s/it] {'loss': 0.5627, 'learning_rate': 1.1121874770058171e-06, 'epoch': 0.85} 85%|████████▌ | 5664/6640 [1:02:39<1:29:56, 5.53s/it] 85%|████████▌ | 5665/6640 [1:02:44<1:28:08, 5.42s/it] {'loss': 0.5717, 'learning_rate': 1.109952678584144e-06, 'epoch': 0.85} 85%|████████▌ | 5665/6640 [1:02:44<1:28:08, 5.42s/it] 85%|████████▌ | 5666/6640 [1:02:49<1:28:22, 5.44s/it] {'loss': 0.5714, 'learning_rate': 1.1077199957552453e-06, 'epoch': 0.85} 85%|████████▌ | 5666/6640 [1:02:49<1:28:22, 5.44s/it] 85%|████████▌ | 5667/6640 [1:02:55<1:28:25, 5.45s/it] {'loss': 0.5775, 'learning_rate': 1.1054894290504348e-06, 'epoch': 0.85} 85%|████████▌ | 5667/6640 [1:02:55<1:28:25, 5.45s/it] 85%|████████▌ | 5668/6640 [1:03:00<1:28:24, 5.46s/it] {'loss': 0.5769, 'learning_rate': 1.1032609790005344e-06, 'epoch': 0.85} 85%|████████▌ | 5668/6640 [1:03:00<1:28:24, 5.46s/it] 85%|████████▌ | 5669/6640 [1:03:06<1:28:06, 5.44s/it] {'loss': 0.5763, 'learning_rate': 1.1010346461358501e-06, 'epoch': 0.85} 85%|████████▌ | 5669/6640 [1:03:06<1:28:06, 5.44s/it] 85%|████████▌ | 5670/6640 [1:03:11<1:27:57, 5.44s/it] {'loss': 0.597, 'learning_rate': 1.0988104309861913e-06, 'epoch': 0.85} 85%|████████▌ | 5670/6640 [1:03:11<1:27:57, 5.44s/it] 85%|████████▌ | 5671/6640 [1:03:17<1:27:23, 5.41s/it] {'loss': 0.5616, 'learning_rate': 1.0965883340808614e-06, 'epoch': 0.85} 85%|████████▌ | 5671/6640 
[1:03:17<1:27:23, 5.41s/it] 85%|████████▌ | 5672/6640 [1:03:22<1:28:27, 5.48s/it] {'loss': 0.5914, 'learning_rate': 1.0943683559486607e-06, 'epoch': 0.85} 85%|████████▌ | 5672/6640 [1:03:22<1:28:27, 5.48s/it] 85%|████████▌ | 5673/6640 [1:03:28<1:27:45, 5.45s/it] {'loss': 0.5758, 'learning_rate': 1.0921504971178854e-06, 'epoch': 0.85} 85%|████████▌ | 5673/6640 [1:03:28<1:27:45, 5.45s/it] 85%|████████▌ | 5674/6640 [1:03:33<1:26:36, 5.38s/it] {'loss': 0.551, 'learning_rate': 1.0899347581163222e-06, 'epoch': 0.85} 85%|████████▌ | 5674/6640 [1:03:33<1:26:36, 5.38s/it] 85%|████████▌ | 5675/6640 [1:03:38<1:26:25, 5.37s/it] {'loss': 0.5895, 'learning_rate': 1.0877211394712617e-06, 'epoch': 0.85} 85%|████████▌ | 5675/6640 [1:03:38<1:26:25, 5.37s/it] 85%|████████▌ | 5676/6640 [1:03:44<1:27:41, 5.46s/it] {'loss': 0.57, 'learning_rate': 1.0855096417094813e-06, 'epoch': 0.85} 85%|████████▌ | 5676/6640 [1:03:44<1:27:41, 5.46s/it] 85%|████████▌ | 5677/6640 [1:03:49<1:27:12, 5.43s/it] {'loss': 0.5486, 'learning_rate': 1.0833002653572633e-06, 'epoch': 0.85} 85%|████████▌ | 5677/6640 [1:03:49<1:27:12, 5.43s/it] 86%|████████▌ | 5678/6640 [1:03:55<1:28:10, 5.50s/it] {'loss': 0.5736, 'learning_rate': 1.0810930109403772e-06, 'epoch': 0.86} 86%|████████▌ | 5678/6640 [1:03:55<1:28:10, 5.50s/it] 86%|████████▌ | 5679/6640 [1:04:00<1:28:07, 5.50s/it] {'loss': 0.5726, 'learning_rate': 1.0788878789840895e-06, 'epoch': 0.86} 86%|████████▌ | 5679/6640 [1:04:00<1:28:07, 5.50s/it] 86%|████████▌ | 5680/6640 [1:04:06<1:28:18, 5.52s/it] {'loss': 0.5638, 'learning_rate': 1.076684870013165e-06, 'epoch': 0.86} 86%|████████▌ | 5680/6640 [1:04:06<1:28:18, 5.52s/it] 86%|████████▌ | 5681/6640 [1:04:11<1:26:55, 5.44s/it] {'loss': 0.5987, 'learning_rate': 1.0744839845518585e-06, 'epoch': 0.86} 86%|████████▌ | 5681/6640 [1:04:11<1:26:55, 5.44s/it] 86%|████████▌ | 5682/6640 [1:04:17<1:27:09, 5.46s/it] {'loss': 0.5922, 'learning_rate': 1.0722852231239245e-06, 'epoch': 0.86} 86%|████████▌ | 5682/6640 
[1:04:17<1:27:09, 5.46s/it] 86%|████████▌ | 5683/6640 [1:04:23<1:29:00, 5.58s/it] {'loss': 0.5799, 'learning_rate': 1.0700885862526066e-06, 'epoch': 0.86} 86%|████████▌ | 5683/6640 [1:04:23<1:29:00, 5.58s/it] 86%|████████▌ | 5684/6640 [1:04:28<1:29:20, 5.61s/it] {'loss': 0.5918, 'learning_rate': 1.0678940744606469e-06, 'epoch': 0.86} 86%|████████▌ | 5684/6640 [1:04:28<1:29:20, 5.61s/it] 86%|████████▌ | 5685/6640 [1:04:34<1:28:12, 5.54s/it] {'loss': 0.5419, 'learning_rate': 1.0657016882702764e-06, 'epoch': 0.86} 86%|████████▌ | 5685/6640 [1:04:34<1:28:12, 5.54s/it] 86%|████████▌ | 5686/6640 [1:04:39<1:26:54, 5.47s/it] {'loss': 0.5771, 'learning_rate': 1.063511428203231e-06, 'epoch': 0.86} 86%|████████▌ | 5686/6640 [1:04:39<1:26:54, 5.47s/it] 86%|████████▌ | 5687/6640 [1:04:44<1:26:59, 5.48s/it] {'loss': 0.5819, 'learning_rate': 1.0613232947807294e-06, 'epoch': 0.86} 86%|████████▌ | 5687/6640 [1:04:44<1:26:59, 5.48s/it] 86%|████████▌ | 5688/6640 [1:04:50<1:25:43, 5.40s/it] {'loss': 0.5628, 'learning_rate': 1.0591372885234885e-06, 'epoch': 0.86} 86%|████████▌ | 5688/6640 [1:04:50<1:25:43, 5.40s/it] 86%|████████▌ | 5689/6640 [1:04:55<1:25:18, 5.38s/it] {'loss': 0.5519, 'learning_rate': 1.056953409951721e-06, 'epoch': 0.86} 86%|████████▌ | 5689/6640 [1:04:55<1:25:18, 5.38s/it] 86%|████████▌ | 5690/6640 [1:05:00<1:24:47, 5.36s/it] {'loss': 0.5842, 'learning_rate': 1.0547716595851298e-06, 'epoch': 0.86} 86%|████████▌ | 5690/6640 [1:05:00<1:24:47, 5.36s/it] 86%|████████▌ | 5691/6640 [1:05:06<1:27:08, 5.51s/it] {'loss': 0.562, 'learning_rate': 1.0525920379429155e-06, 'epoch': 0.86} 86%|████████▌ | 5691/6640 [1:05:06<1:27:08, 5.51s/it] 86%|████████▌ | 5692/6640 [1:05:11<1:26:20, 5.46s/it] {'loss': 0.5993, 'learning_rate': 1.0504145455437664e-06, 'epoch': 0.86} 86%|████████▌ | 5692/6640 [1:05:11<1:26:20, 5.46s/it] 86%|████████▌ | 5693/6640 [1:05:17<1:26:03, 5.45s/it] {'loss': 0.5761, 'learning_rate': 1.0482391829058691e-06, 'epoch': 0.86} 86%|████████▌ | 5693/6640 
[1:05:17<1:26:03, 5.45s/it] 86%|████████▌ | 5694/6640 [1:05:22<1:26:15, 5.47s/it] {'loss': 0.5804, 'learning_rate': 1.0460659505469006e-06, 'epoch': 0.86} 86%|████████▌ | 5694/6640 [1:05:22<1:26:15, 5.47s/it] 86%|████████▌ | 5695/6640 [1:05:28<1:27:06, 5.53s/it] {'loss': 0.5903, 'learning_rate': 1.0438948489840327e-06, 'epoch': 0.86} 86%|████████▌ | 5695/6640 [1:05:28<1:27:06, 5.53s/it] 86%|████████▌ | 5696/6640 [1:05:34<1:27:23, 5.55s/it] {'loss': 0.5659, 'learning_rate': 1.0417258787339314e-06, 'epoch': 0.86} 86%|████████▌ | 5696/6640 [1:05:34<1:27:23, 5.55s/it] 86%|████████▌ | 5697/6640 [1:05:39<1:25:53, 5.47s/it] {'loss': 0.5798, 'learning_rate': 1.0395590403127487e-06, 'epoch': 0.86} 86%|████████▌ | 5697/6640 [1:05:39<1:25:53, 5.47s/it] 86%|████████▌ | 5698/6640 [1:05:44<1:24:44, 5.40s/it] {'loss': 0.5762, 'learning_rate': 1.037394334236137e-06, 'epoch': 0.86} 86%|████████▌ | 5698/6640 [1:05:44<1:24:44, 5.40s/it] 86%|████████▌ | 5699/6640 [1:05:50<1:24:48, 5.41s/it] {'loss': 0.5789, 'learning_rate': 1.0352317610192376e-06, 'epoch': 0.86} 86%|████████▌ | 5699/6640 [1:05:50<1:24:48, 5.41s/it]1 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...7 AutoResumeHook: Checking whether to suspend... 86%|████████▌ | 5700/6640 [1:05:55<1:24:57, 5.42s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5655, 'learning_rate': 1.0330713211766864e-06, 'epoch': 0.86} 86%|████████▌ | 5700/6640 [1:05:55<1:24:57, 5.42s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 86%|████████▌ | 5701/6640 [1:06:24<3:13:07, 12.34s/it] {'loss': 0.5465, 'learning_rate': 1.0309130152226077e-06, 'epoch': 0.86} 86%|████████▌ | 5701/6640 [1:06:24<3:13:07, 12.34s/it] 86%|████████▌ | 5702/6640 [1:06:29<2:39:59, 10.23s/it] {'loss': 0.5733, 'learning_rate': 1.0287568436706208e-06, 'epoch': 0.86} 86%|████████▌ | 5702/6640 [1:06:29<2:39:59, 10.23s/it] 86%|████████▌ | 5703/6640 [1:06:34<2:17:03, 8.78s/it] {'loss': 0.5543, 'learning_rate': 1.0266028070338374e-06, 'epoch': 0.86} 86%|████████▌ | 5703/6640 [1:06:34<2:17:03, 8.78s/it] 86%|████████▌ | 5704/6640 [1:06:40<2:01:33, 7.79s/it] {'loss': 0.5644, 'learning_rate': 1.0244509058248608e-06, 'epoch': 0.86} 86%|████████▌ | 5704/6640 [1:06:40<2:01:33, 7.79s/it] 86%|████████▌ | 5705/6640 [1:06:45<1:49:54, 7.05s/it] {'loss': 0.5576, 'learning_rate': 1.022301140555787e-06, 'epoch': 0.86} 86%|████████▌ | 5705/6640 [1:06:45<1:49:54, 7.05s/it] 86%|████████▌ | 5706/6640 [1:06:51<1:43:03, 6.62s/it] {'loss': 0.5852, 'learning_rate': 1.0201535117382e-06, 'epoch': 0.86} 86%|████████▌ | 5706/6640 [1:06:51<1:43:03, 6.62s/it] 
86%|████████▌ | 5707/6640 [1:06:56<1:37:06, 6.25s/it] {'loss': 0.6101, 'learning_rate': 1.0180080198831778e-06, 'epoch': 0.86} 86%|████████▌ | 5707/6640 [1:06:56<1:37:06, 6.25s/it] 86%|████████▌ | 5708/6640 [1:07:02<1:34:44, 6.10s/it] {'loss': 0.6014, 'learning_rate': 1.0158646655012905e-06, 'epoch': 0.86} 86%|████████▌ | 5708/6640 [1:07:02<1:34:44, 6.10s/it] 86%|████████▌ | 5709/6640 [1:07:07<1:31:13, 5.88s/it] {'loss': 0.5971, 'learning_rate': 1.0137234491026015e-06, 'epoch': 0.86} 86%|████████▌ | 5709/6640 [1:07:07<1:31:13, 5.88s/it] 86%|████████▌ | 5710/6640 [1:07:13<1:29:53, 5.80s/it] {'loss': 0.5909, 'learning_rate': 1.0115843711966577e-06, 'epoch': 0.86} 86%|████████▌ | 5710/6640 [1:07:13<1:29:53, 5.80s/it] 86%|████████▌ | 5711/6640 [1:07:19<1:29:26, 5.78s/it] {'loss': 0.5846, 'learning_rate': 1.0094474322925062e-06, 'epoch': 0.86} 86%|████████▌ | 5711/6640 [1:07:19<1:29:26, 5.78s/it] 86%|████████▌ | 5712/6640 [1:07:24<1:27:45, 5.67s/it] {'loss': 0.577, 'learning_rate': 1.0073126328986805e-06, 'epoch': 0.86} 86%|████████▌ | 5712/6640 [1:07:24<1:27:45, 5.67s/it] 86%|████████▌ | 5713/6640 [1:07:30<1:28:04, 5.70s/it] {'loss': 0.5809, 'learning_rate': 1.0051799735232038e-06, 'epoch': 0.86} 86%|████████▌ | 5713/6640 [1:07:30<1:28:04, 5.70s/it] 86%|████████▌ | 5714/6640 [1:07:35<1:26:23, 5.60s/it] {'loss': 0.5917, 'learning_rate': 1.0030494546735958e-06, 'epoch': 0.86} 86%|████████▌ | 5714/6640 [1:07:35<1:26:23, 5.60s/it] 86%|████████▌ | 5715/6640 [1:07:41<1:25:56, 5.57s/it] {'loss': 0.5853, 'learning_rate': 1.000921076856859e-06, 'epoch': 0.86} 86%|████████▌ | 5715/6640 [1:07:41<1:25:56, 5.57s/it] 86%|████████▌ | 5716/6640 [1:07:46<1:24:43, 5.50s/it] {'loss': 0.5636, 'learning_rate': 9.987948405794912e-07, 'epoch': 0.86} 86%|████████▌ | 5716/6640 [1:07:46<1:24:43, 5.50s/it] 86%|████████▌ | 5717/6640 [1:07:51<1:23:57, 5.46s/it] {'loss': 0.5878, 'learning_rate': 9.966707463474789e-07, 'epoch': 0.86} 86%|████████▌ | 5717/6640 [1:07:51<1:23:57, 5.46s/it] 
86%|████████▌ | 5718/6640 [1:07:57<1:23:46, 5.45s/it] {'loss': 0.5599, 'learning_rate': 9.945487946663036e-07, 'epoch': 0.86} 86%|████████▌ | 5718/6640 [1:07:57<1:23:46, 5.45s/it] 86%|████████▌ | 5719/6640 [1:08:02<1:23:09, 5.42s/it] {'loss': 0.6007, 'learning_rate': 9.924289860409264e-07, 'epoch': 0.86} 86%|████████▌ | 5719/6640 [1:08:02<1:23:09, 5.42s/it] 86%|████████▌ | 5720/6640 [1:08:08<1:23:09, 5.42s/it] {'loss': 0.5823, 'learning_rate': 9.903113209758098e-07, 'epoch': 0.86} 86%|████████▌ | 5720/6640 [1:08:08<1:23:09, 5.42s/it] 86%|████████▌ | 5721/6640 [1:08:13<1:23:42, 5.47s/it] {'loss': 0.5614, 'learning_rate': 9.88195799974898e-07, 'epoch': 0.86} 86%|████████▌ | 5721/6640 [1:08:13<1:23:42, 5.47s/it] 86%|████████▌ | 5722/6640 [1:08:19<1:23:43, 5.47s/it] {'loss': 0.5947, 'learning_rate': 9.86082423541631e-07, 'epoch': 0.86} 86%|████████▌ | 5722/6640 [1:08:19<1:23:43, 5.47s/it] 86%|████████▌ | 5723/6640 [1:08:24<1:23:00, 5.43s/it] {'loss': 0.5964, 'learning_rate': 9.839711921789363e-07, 'epoch': 0.86} 86%|████████▌ | 5723/6640 [1:08:24<1:23:00, 5.43s/it] 86%|████████▌ | 5724/6640 [1:08:29<1:22:33, 5.41s/it] {'loss': 0.5716, 'learning_rate': 9.818621063892276e-07, 'epoch': 0.86} 86%|████████▌ | 5724/6640 [1:08:29<1:22:33, 5.41s/it] 86%|████████▌ | 5725/6640 [1:08:35<1:22:11, 5.39s/it] {'loss': 0.5756, 'learning_rate': 9.79755166674411e-07, 'epoch': 0.86} 86%|████████▌ | 5725/6640 [1:08:35<1:22:11, 5.39s/it] 86%|████████▌ | 5726/6640 [1:08:40<1:23:33, 5.48s/it] {'loss': 0.5803, 'learning_rate': 9.776503735358833e-07, 'epoch': 0.86} 86%|████████▌ | 5726/6640 [1:08:40<1:23:33, 5.48s/it] 86%|████████▋ | 5727/6640 [1:08:46<1:23:12, 5.47s/it] {'loss': 0.568, 'learning_rate': 9.755477274745284e-07, 'epoch': 0.86} 86%|████████▋ | 5727/6640 [1:08:46<1:23:12, 5.47s/it] 86%|████████▋ | 5728/6640 [1:08:51<1:23:54, 5.52s/it] {'loss': 0.5727, 'learning_rate': 9.734472289907182e-07, 'epoch': 0.86} 86%|████████▋ | 5728/6640 [1:08:51<1:23:54, 5.52s/it] 86%|████████▋ | 
5729/6640 [1:08:57<1:23:09, 5.48s/it] {'loss': 0.5773, 'learning_rate': 9.713488785843161e-07, 'epoch': 0.86} 86%|████████▋ | 5729/6640 [1:08:57<1:23:09, 5.48s/it] 86%|████████▋ | 5730/6640 [1:09:02<1:22:33, 5.44s/it] {'loss': 0.5692, 'learning_rate': 9.692526767546727e-07, 'epoch': 0.86} 86%|████████▋ | 5730/6640 [1:09:02<1:22:33, 5.44s/it] 86%|████████▋ | 5731/6640 [1:09:07<1:21:55, 5.41s/it] {'loss': 0.5738, 'learning_rate': 9.671586240006292e-07, 'epoch': 0.86} 86%|████████▋ | 5731/6640 [1:09:07<1:21:55, 5.41s/it] 86%|████████▋ | 5732/6640 [1:09:13<1:21:23, 5.38s/it] {'loss': 0.579, 'learning_rate': 9.650667208205145e-07, 'epoch': 0.86} 86%|████████▋ | 5732/6640 [1:09:13<1:21:23, 5.38s/it] 86%|████████▋ | 5733/6640 [1:09:18<1:21:16, 5.38s/it] {'loss': 0.5839, 'learning_rate': 9.629769677121414e-07, 'epoch': 0.86} 86%|████████▋ | 5733/6640 [1:09:18<1:21:16, 5.38s/it] 86%|████████▋ | 5734/6640 [1:09:24<1:22:10, 5.44s/it] {'loss': 0.5906, 'learning_rate': 9.608893651728179e-07, 'epoch': 0.86} 86%|████████▋ | 5734/6640 [1:09:24<1:22:10, 5.44s/it] 86%|████████▋ | 5735/6640 [1:09:29<1:21:45, 5.42s/it] {'loss': 0.5503, 'learning_rate': 9.588039136993366e-07, 'epoch': 0.86} 86%|████████▋ | 5735/6640 [1:09:29<1:21:45, 5.42s/it] 86%|████████▋ | 5736/6640 [1:09:34<1:20:55, 5.37s/it] {'loss': 0.575, 'learning_rate': 9.567206137879804e-07, 'epoch': 0.86} 86%|████████▋ | 5736/6640 [1:09:34<1:20:55, 5.37s/it] 86%|████████▋ | 5737/6640 [1:09:40<1:21:01, 5.38s/it] {'loss': 0.5745, 'learning_rate': 9.546394659345192e-07, 'epoch': 0.86} 86%|████████▋ | 5737/6640 [1:09:40<1:21:01, 5.38s/it] 86%|████████▋ | 5738/6640 [1:09:45<1:21:45, 5.44s/it] {'loss': 0.5637, 'learning_rate': 9.525604706342073e-07, 'epoch': 0.86} 86%|████████▋ | 5738/6640 [1:09:45<1:21:45, 5.44s/it] 86%|████████▋ | 5739/6640 [1:09:51<1:21:26, 5.42s/it] {'loss': 0.5781, 'learning_rate': 9.50483628381792e-07, 'epoch': 0.86} 86%|████████▋ | 5739/6640 [1:09:51<1:21:26, 5.42s/it] 86%|████████▋ | 5740/6640 
[1:09:56<1:20:51, 5.39s/it] {'loss': 0.5663, 'learning_rate': 9.484089396715057e-07, 'epoch': 0.86} 86%|████████▋ | 5740/6640 [1:09:56<1:20:51, 5.39s/it] 86%|████████▋ | 5741/6640 [1:10:01<1:20:18, 5.36s/it] {'loss': 0.5749, 'learning_rate': 9.463364049970691e-07, 'epoch': 0.86} 86%|████████▋ | 5741/6640 [1:10:01<1:20:18, 5.36s/it] 86%|████████▋ | 5742/6640 [1:10:07<1:20:39, 5.39s/it] {'loss': 0.5665, 'learning_rate': 9.442660248516889e-07, 'epoch': 0.86} 86%|████████▋ | 5742/6640 [1:10:07<1:20:39, 5.39s/it] 86%|████████▋ | 5743/6640 [1:10:12<1:20:56, 5.41s/it] {'loss': 0.589, 'learning_rate': 9.421977997280596e-07, 'epoch': 0.86} 86%|████████▋ | 5743/6640 [1:10:12<1:20:56, 5.41s/it] 87%|████████▋ | 5744/6640 [1:10:18<1:20:59, 5.42s/it] {'loss': 0.6015, 'learning_rate': 9.401317301183655e-07, 'epoch': 0.87} 87%|████████▋ | 5744/6640 [1:10:18<1:20:59, 5.42s/it] 87%|████████▋ | 5745/6640 [1:10:23<1:21:07, 5.44s/it] {'loss': 0.5576, 'learning_rate': 9.380678165142732e-07, 'epoch': 0.87} 87%|████████▋ | 5745/6640 [1:10:23<1:21:07, 5.44s/it] 87%|████████▋ | 5746/6640 [1:10:28<1:20:32, 5.41s/it] {'loss': 0.5791, 'learning_rate': 9.36006059406942e-07, 'epoch': 0.87} 87%|████████▋ | 5746/6640 [1:10:28<1:20:32, 5.41s/it] 87%|████████▋ | 5747/6640 [1:10:34<1:20:24, 5.40s/it] {'loss': 0.6025, 'learning_rate': 9.339464592870107e-07, 'epoch': 0.87} 87%|████████▋ | 5747/6640 [1:10:34<1:20:24, 5.40s/it] 87%|████████▋ | 5748/6640 [1:10:39<1:20:26, 5.41s/it] {'loss': 0.576, 'learning_rate': 9.318890166446115e-07, 'epoch': 0.87} 87%|████████▋ | 5748/6640 [1:10:39<1:20:26, 5.41s/it] 87%|████████▋ | 5749/6640 [1:10:45<1:20:14, 5.40s/it] {'loss': 0.5866, 'learning_rate': 9.298337319693596e-07, 'epoch': 0.87} 87%|████████▋ | 5749/6640 [1:10:45<1:20:14, 5.40s/it]3 AutoResumeHook: Checking whether to suspend... 26 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 
7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 87%|████████▋ | 5750/6640 [1:10:50<1:21:10, 5.47s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.6046, 'learning_rate': 9.277806057503592e-07, 'epoch': 0.87} 87%|████████▋ | 5750/6640 [1:10:50<1:21:10, 5.47s/it] 87%|████████▋ | 5751/6640 [1:10:56<1:20:12, 5.41s/it] {'loss': 0.5574, 'learning_rate': 9.257296384761971e-07, 'epoch': 0.87} 87%|████████▋ | 5751/6640 [1:10:56<1:20:12, 5.41s/it] 87%|████████▋ | 5752/6640 [1:11:01<1:19:47, 5.39s/it] {'loss': 0.5761, 'learning_rate': 9.236808306349498e-07, 'epoch': 0.87} 87%|████████▋ | 5752/6640 [1:11:01<1:19:47, 5.39s/it] 87%|████████▋ | 5753/6640 [1:11:06<1:19:55, 5.41s/it] {'loss': 0.6038, 'learning_rate': 9.216341827141772e-07, 'epoch': 0.87} 87%|████████▋ | 5753/6640 [1:11:06<1:19:55, 5.41s/it] 87%|████████▋ | 5754/6640 [1:11:12<1:19:15, 5.37s/it] {'loss': 0.5473, 'learning_rate': 9.195896952009287e-07, 'epoch': 0.87} 87%|████████▋ | 5754/6640 [1:11:12<1:19:15, 5.37s/it] 87%|████████▋ | 5755/6640 [1:11:17<1:19:07, 5.36s/it] {'loss': 0.5669, 'learning_rate': 9.175473685817371e-07, 'epoch': 0.87} 87%|████████▋ | 5755/6640 [1:11:17<1:19:07, 5.36s/it] 87%|████████▋ | 5756/6640 [1:11:23<1:19:42, 5.41s/it] {'loss': 0.5701, 'learning_rate': 9.155072033426194e-07, 'epoch': 0.87} 87%|████████▋ | 5756/6640 [1:11:23<1:19:42, 5.41s/it] 87%|████████▋ | 5757/6640 [1:11:28<1:19:40, 5.41s/it] {'loss': 0.5529, 'learning_rate': 9.134691999690815e-07, 'epoch': 0.87} 87%|████████▋ | 5757/6640 [1:11:28<1:19:40, 5.41s/it] 87%|████████▋ | 5758/6640 [1:11:34<1:21:08, 5.52s/it] {'loss': 0.5877, 'learning_rate': 9.114333589461144e-07, 'epoch': 0.87} 87%|████████▋ | 5758/6640 [1:11:34<1:21:08, 5.52s/it] 87%|████████▋ | 5759/6640 [1:11:39<1:20:53, 5.51s/it] {'loss': 0.5718, 'learning_rate': 9.093996807581929e-07, 'epoch': 0.87} 87%|████████▋ | 5759/6640 [1:11:39<1:20:53, 5.51s/it] 
87%|████████▋ | 5760/6640 [1:11:45<1:21:21, 5.55s/it] {'loss': 0.5685, 'learning_rate': 9.073681658892775e-07, 'epoch': 0.87} 87%|████████▋ | 5760/6640 [1:11:45<1:21:21, 5.55s/it] 87%|████████▋ | 5761/6640 [1:11:51<1:21:51, 5.59s/it] {'loss': 0.5607, 'learning_rate': 9.053388148228138e-07, 'epoch': 0.87} 87%|████████▋ | 5761/6640 [1:11:51<1:21:51, 5.59s/it] 87%|████████▋ | 5762/6640 [1:11:56<1:21:31, 5.57s/it] {'loss': 0.5691, 'learning_rate': 9.033116280417331e-07, 'epoch': 0.87} 87%|████████▋ | 5762/6640 [1:11:56<1:21:31, 5.57s/it] 87%|████████▋ | 5763/6640 [1:12:01<1:20:43, 5.52s/it] {'loss': 0.559, 'learning_rate': 9.01286606028452e-07, 'epoch': 0.87} 87%|████████▋ | 5763/6640 [1:12:01<1:20:43, 5.52s/it] 87%|████████▋ | 5764/6640 [1:12:07<1:19:50, 5.47s/it] {'loss': 0.559, 'learning_rate': 8.992637492648737e-07, 'epoch': 0.87} 87%|████████▋ | 5764/6640 [1:12:07<1:19:50, 5.47s/it] 87%|████████▋ | 5765/6640 [1:12:12<1:20:40, 5.53s/it] {'loss': 0.5731, 'learning_rate': 8.972430582323788e-07, 'epoch': 0.87} 87%|████████▋ | 5765/6640 [1:12:12<1:20:40, 5.53s/it] 87%|████████▋ | 5766/6640 [1:12:18<1:20:59, 5.56s/it] {'loss': 0.5572, 'learning_rate': 8.952245334118415e-07, 'epoch': 0.87} 87%|████████▋ | 5766/6640 [1:12:18<1:20:59, 5.56s/it] 87%|████████▋ | 5767/6640 [1:12:23<1:20:04, 5.50s/it] {'loss': 0.5653, 'learning_rate': 8.932081752836141e-07, 'epoch': 0.87} 87%|████████▋ | 5767/6640 [1:12:23<1:20:04, 5.50s/it] 87%|████████▋ | 5768/6640 [1:12:29<1:19:12, 5.45s/it] {'loss': 0.5861, 'learning_rate': 8.911939843275396e-07, 'epoch': 0.87} 87%|████████▋ | 5768/6640 [1:12:29<1:19:12, 5.45s/it] 87%|████████▋ | 5769/6640 [1:12:34<1:20:01, 5.51s/it] {'loss': 0.5674, 'learning_rate': 8.891819610229369e-07, 'epoch': 0.87} 87%|████████▋ | 5769/6640 [1:12:34<1:20:01, 5.51s/it] 87%|████████▋ | 5770/6640 [1:12:40<1:19:58, 5.52s/it] {'loss': 0.5861, 'learning_rate': 8.871721058486149e-07, 'epoch': 0.87} 87%|████████▋ | 5770/6640 [1:12:40<1:19:58, 5.52s/it] 87%|████████▋ | 
5771/6640 [1:12:46<1:20:31, 5.56s/it] {'loss': 0.5981, 'learning_rate': 8.851644192828668e-07, 'epoch': 0.87} 87%|████████▋ | 5771/6640 [1:12:46<1:20:31, 5.56s/it] 87%|████████▋ | 5772/6640 [1:12:51<1:19:52, 5.52s/it] {'loss': 0.5871, 'learning_rate': 8.831589018034659e-07, 'epoch': 0.87} 87%|████████▋ | 5772/6640 [1:12:51<1:19:52, 5.52s/it] 87%|████████▋ | 5773/6640 [1:12:56<1:18:59, 5.47s/it] {'loss': 0.5656, 'learning_rate': 8.811555538876748e-07, 'epoch': 0.87} 87%|████████▋ | 5773/6640 [1:12:56<1:18:59, 5.47s/it] 87%|████████▋ | 5774/6640 [1:13:02<1:19:54, 5.54s/it] {'loss': 0.5739, 'learning_rate': 8.791543760122335e-07, 'epoch': 0.87} 87%|████████▋ | 5774/6640 [1:13:02<1:19:54, 5.54s/it] 87%|████████▋ | 5775/6640 [1:13:08<1:19:10, 5.49s/it] {'loss': 0.576, 'learning_rate': 8.771553686533684e-07, 'epoch': 0.87} 87%|████████▋ | 5775/6640 [1:13:08<1:19:10, 5.49s/it] 87%|████████▋ | 5776/6640 [1:13:13<1:18:31, 5.45s/it] {'loss': 0.5981, 'learning_rate': 8.751585322867917e-07, 'epoch': 0.87} 87%|████████▋ | 5776/6640 [1:13:13<1:18:31, 5.45s/it] 87%|████████▋ | 5777/6640 [1:13:18<1:18:16, 5.44s/it] {'loss': 0.5845, 'learning_rate': 8.731638673876964e-07, 'epoch': 0.87} 87%|████████▋ | 5777/6640 [1:13:18<1:18:16, 5.44s/it] 87%|████████▋ | 5778/6640 [1:13:24<1:18:25, 5.46s/it] {'loss': 0.6049, 'learning_rate': 8.711713744307603e-07, 'epoch': 0.87} 87%|████████▋ | 5778/6640 [1:13:24<1:18:25, 5.46s/it] 87%|████████▋ | 5779/6640 [1:13:29<1:18:39, 5.48s/it] {'loss': 0.5546, 'learning_rate': 8.69181053890138e-07, 'epoch': 0.87} 87%|████████▋ | 5779/6640 [1:13:29<1:18:39, 5.48s/it] 87%|████████▋ | 5780/6640 [1:13:35<1:18:39, 5.49s/it] {'loss': 0.5656, 'learning_rate': 8.671929062394802e-07, 'epoch': 0.87} 87%|████████▋ | 5780/6640 [1:13:35<1:18:39, 5.49s/it] 87%|████████▋ | 5781/6640 [1:13:41<1:19:42, 5.57s/it] {'loss': 0.5967, 'learning_rate': 8.652069319519074e-07, 'epoch': 0.87} 87%|████████▋ | 5781/6640 [1:13:41<1:19:42, 5.57s/it] 87%|████████▋ | 5782/6640 
[1:13:46<1:18:55, 5.52s/it] {'loss': 0.5955, 'learning_rate': 8.632231315000305e-07, 'epoch': 0.87} 87%|████████▋ | 5782/6640 [1:13:46<1:18:55, 5.52s/it] 87%|████████▋ | 5783/6640 [1:13:52<1:19:11, 5.54s/it] {'loss': 0.5693, 'learning_rate': 8.612415053559386e-07, 'epoch': 0.87} 87%|████████▋ | 5783/6640 [1:13:52<1:19:11, 5.54s/it] 87%|████████▋ | 5784/6640 [1:13:57<1:18:34, 5.51s/it] {'loss': 0.5795, 'learning_rate': 8.592620539912067e-07, 'epoch': 0.87} 87%|████████▋ | 5784/6640 [1:13:57<1:18:34, 5.51s/it] 87%|████████▋ | 5785/6640 [1:14:03<1:18:52, 5.54s/it] {'loss': 0.6188, 'learning_rate': 8.572847778768912e-07, 'epoch': 0.87} 87%|████████▋ | 5785/6640 [1:14:03<1:18:52, 5.54s/it] 87%|████████▋ | 5786/6640 [1:14:08<1:17:41, 5.46s/it] {'loss': 0.5739, 'learning_rate': 8.553096774835312e-07, 'epoch': 0.87} 87%|████████▋ | 5786/6640 [1:14:08<1:17:41, 5.46s/it] 87%|████████▋ | 5787/6640 [1:14:13<1:18:10, 5.50s/it] {'loss': 0.5797, 'learning_rate': 8.533367532811487e-07, 'epoch': 0.87} 87%|████████▋ | 5787/6640 [1:14:13<1:18:10, 5.50s/it] 87%|████████▋ | 5788/6640 [1:14:19<1:18:56, 5.56s/it] {'loss': 0.5745, 'learning_rate': 8.513660057392415e-07, 'epoch': 0.87} 87%|████████▋ | 5788/6640 [1:14:19<1:18:56, 5.56s/it] 87%|████████▋ | 5789/6640 [1:14:25<1:20:00, 5.64s/it] {'loss': 0.5926, 'learning_rate': 8.493974353268019e-07, 'epoch': 0.87} 87%|████████▋ | 5789/6640 [1:14:25<1:20:00, 5.64s/it] 87%|████████▋ | 5790/6640 [1:14:30<1:18:43, 5.56s/it] {'loss': 0.5726, 'learning_rate': 8.474310425122923e-07, 'epoch': 0.87} 87%|████████▋ | 5790/6640 [1:14:30<1:18:43, 5.56s/it] 87%|████████▋ | 5791/6640 [1:14:36<1:17:27, 5.47s/it] {'loss': 0.5774, 'learning_rate': 8.454668277636636e-07, 'epoch': 0.87} 87%|████████▋ | 5791/6640 [1:14:36<1:17:27, 5.47s/it] 87%|████████▋ | 5792/6640 [1:14:41<1:16:34, 5.42s/it] {'loss': 0.5748, 'learning_rate': 8.435047915483441e-07, 'epoch': 0.87} 87%|████████▋ | 5792/6640 [1:14:41<1:16:34, 5.42s/it] 87%|████████▋ | 5793/6640 [1:14:46<1:16:31, 
5.42s/it] {'loss': 0.5939, 'learning_rate': 8.41544934333246e-07, 'epoch': 0.87} 87%|████████▋ | 5793/6640 [1:14:46<1:16:31, 5.42s/it] 87%|████████▋ | 5794/6640 [1:14:52<1:16:59, 5.46s/it] {'loss': 0.5833, 'learning_rate': 8.39587256584764e-07, 'epoch': 0.87} 87%|████████▋ | 5794/6640 [1:14:52<1:16:59, 5.46s/it] 87%|████████▋ | 5795/6640 [1:14:57<1:16:38, 5.44s/it] {'loss': 0.5702, 'learning_rate': 8.376317587687721e-07, 'epoch': 0.87} 87%|████████▋ | 5795/6640 [1:14:57<1:16:38, 5.44s/it] 87%|████████▋ | 5796/6640 [1:15:03<1:16:32, 5.44s/it] {'loss': 0.5684, 'learning_rate': 8.356784413506281e-07, 'epoch': 0.87} 87%|████████▋ | 5796/6640 [1:15:03<1:16:32, 5.44s/it] 87%|████████▋ | 5797/6640 [1:15:08<1:16:20, 5.43s/it] {'loss': 0.5689, 'learning_rate': 8.337273047951655e-07, 'epoch': 0.87} 87%|████████▋ | 5797/6640 [1:15:08<1:16:20, 5.43s/it] 87%|████████▋ | 5798/6640 [1:15:14<1:16:16, 5.44s/it] {'loss': 0.5787, 'learning_rate': 8.317783495667075e-07, 'epoch': 0.87} 87%|████████▋ | 5798/6640 [1:15:14<1:16:16, 5.44s/it] 87%|████████▋ | 5799/6640 [1:15:19<1:16:16, 5.44s/it] {'loss': 0.5822, 'learning_rate': 8.298315761290477e-07, 'epoch': 0.87} 87%|████████▋ | 5799/6640 [1:15:19<1:16:16, 5.44s/it]6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 2 4AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 87%|████████▋ | 5800/6640 [1:15:24<1:15:46, 5.41s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5816, 'learning_rate': 8.278869849454718e-07, 'epoch': 0.87} 87%|████████▋ | 5800/6640 [1:15:24<1:15:46, 5.41s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 87%|████████▋ | 5801/6640 [1:15:52<2:48:49, 12.07s/it] {'loss': 0.5869, 'learning_rate': 8.259445764787344e-07, 'epoch': 0.87} 87%|████████▋ | 5801/6640 [1:15:52<2:48:49, 12.07s/it] 87%|████████▋ | 5802/6640 [1:15:57<2:20:37, 10.07s/it] {'loss': 0.5866, 'learning_rate': 8.24004351191079e-07, 'epoch': 0.87} 87%|████████▋ | 5802/6640 [1:15:57<2:20:37, 10.07s/it] 87%|████████▋ | 5803/6640 [1:16:03<2:01:11, 8.69s/it] {'loss': 0.5801, 'learning_rate': 8.220663095442283e-07, 'epoch': 0.87} 87%|████████▋ | 5803/6640 [1:16:03<2:01:11, 8.69s/it] 87%|████████▋ | 5804/6640 [1:16:09<1:49:01, 7.82s/it] {'loss': 0.5943, 'learning_rate': 8.201304519993836e-07, 'epoch': 0.87} 87%|████████▋ | 5804/6640 [1:16:09<1:49:01, 7.82s/it] 87%|████████▋ | 5805/6640 [1:16:14<1:38:37, 7.09s/it] {'loss': 0.5788, 'learning_rate': 8.181967790172274e-07, 'epoch': 0.87} 87%|████████▋ | 5805/6640 [1:16:14<1:38:37, 7.09s/it] 87%|████████▋ | 5806/6640 [1:16:19<1:31:31, 6.58s/it] {'loss': 0.5691, 'learning_rate': 8.162652910579194e-07, 'epoch': 0.87} 87%|████████▋ | 5806/6640 [1:16:19<1:31:31, 6.58s/it] 
87%|████████▋ | 5807/6640 [1:16:25<1:26:58, 6.26s/it] {'loss': 0.5702, 'learning_rate': 8.143359885811064e-07, 'epoch': 0.87} 87%|████████▋ | 5807/6640 [1:16:25<1:26:58, 6.26s/it] 87%|████████▋ | 5808/6640 [1:16:31<1:24:38, 6.10s/it] {'loss': 0.5868, 'learning_rate': 8.124088720459067e-07, 'epoch': 0.87} 87%|████████▋ | 5808/6640 [1:16:31<1:24:38, 6.10s/it] 87%|████████▋ | 5809/6640 [1:16:36<1:21:48, 5.91s/it] {'loss': 0.5842, 'learning_rate': 8.104839419109256e-07, 'epoch': 0.87} 87%|████████▋ | 5809/6640 [1:16:36<1:21:48, 5.91s/it] 88%|████████▊ | 5810/6640 [1:16:42<1:19:15, 5.73s/it] {'loss': 0.5614, 'learning_rate': 8.085611986342423e-07, 'epoch': 0.88} 88%|████████▊ | 5810/6640 [1:16:42<1:19:15, 5.73s/it] 88%|████████▊ | 5811/6640 [1:16:47<1:18:29, 5.68s/it] {'loss': 0.5796, 'learning_rate': 8.066406426734174e-07, 'epoch': 0.88} 88%|████████▊ | 5811/6640 [1:16:47<1:18:29, 5.68s/it] 88%|████████▊ | 5812/6640 [1:16:53<1:17:35, 5.62s/it] {'loss': 0.5687, 'learning_rate': 8.047222744854943e-07, 'epoch': 0.88} 88%|████████▊ | 5812/6640 [1:16:53<1:17:35, 5.62s/it] 88%|████████▊ | 5813/6640 [1:16:58<1:16:37, 5.56s/it] {'loss': 0.5957, 'learning_rate': 8.028060945269911e-07, 'epoch': 0.88} 88%|████████▊ | 5813/6640 [1:16:58<1:16:37, 5.56s/it] 88%|████████▊ | 5814/6640 [1:17:03<1:16:10, 5.53s/it] {'loss': 0.578, 'learning_rate': 8.008921032539108e-07, 'epoch': 0.88} 88%|████████▊ | 5814/6640 [1:17:03<1:16:10, 5.53s/it] 88%|████████▊ | 5815/6640 [1:17:09<1:15:09, 5.47s/it] {'loss': 0.5589, 'learning_rate': 7.989803011217256e-07, 'epoch': 0.88} 88%|████████▊ | 5815/6640 [1:17:09<1:15:09, 5.47s/it] 88%|████████▊ | 5816/6640 [1:17:14<1:16:13, 5.55s/it] {'loss': 0.5999, 'learning_rate': 7.970706885854007e-07, 'epoch': 0.88} 88%|████████▊ | 5816/6640 [1:17:14<1:16:13, 5.55s/it] 88%|████████▊ | 5817/6640 [1:17:20<1:15:54, 5.53s/it] {'loss': 0.5583, 'learning_rate': 7.951632660993669e-07, 'epoch': 0.88} 88%|████████▊ | 5817/6640 [1:17:20<1:15:54, 5.53s/it] 88%|████████▊ | 
5818/6640 [1:17:25<1:14:26, 5.43s/it] {'loss': 0.5733, 'learning_rate': 7.932580341175411e-07, 'epoch': 0.88} 88%|████████▊ | 5818/6640 [1:17:25<1:14:26, 5.43s/it] 88%|████████▊ | 5819/6640 [1:17:31<1:14:35, 5.45s/it] {'loss': 0.5904, 'learning_rate': 7.913549930933196e-07, 'epoch': 0.88} 88%|████████▊ | 5819/6640 [1:17:31<1:14:35, 5.45s/it] 88%|████████▊ | 5820/6640 [1:17:36<1:14:14, 5.43s/it] {'loss': 0.5879, 'learning_rate': 7.8945414347957e-07, 'epoch': 0.88} 88%|████████▊ | 5820/6640 [1:17:36<1:14:14, 5.43s/it] 88%|████████▊ | 5821/6640 [1:17:42<1:14:34, 5.46s/it] {'loss': 0.5744, 'learning_rate': 7.875554857286504e-07, 'epoch': 0.88} 88%|████████▊ | 5821/6640 [1:17:42<1:14:34, 5.46s/it] 88%|████████▊ | 5822/6640 [1:17:47<1:14:45, 5.48s/it] {'loss': 0.598, 'learning_rate': 7.856590202923852e-07, 'epoch': 0.88} 88%|████████▊ | 5822/6640 [1:17:47<1:14:45, 5.48s/it] 88%|████████▊ | 5823/6640 [1:17:53<1:15:09, 5.52s/it] {'loss': 0.5703, 'learning_rate': 7.837647476220855e-07, 'epoch': 0.88} 88%|████████▊ | 5823/6640 [1:17:53<1:15:09, 5.52s/it] 88%|████████▊ | 5824/6640 [1:17:58<1:14:47, 5.50s/it] {'loss': 0.594, 'learning_rate': 7.818726681685329e-07, 'epoch': 0.88} 88%|████████▊ | 5824/6640 [1:17:58<1:14:47, 5.50s/it] 88%|████████▊ | 5825/6640 [1:18:04<1:14:42, 5.50s/it] {'loss': 0.6021, 'learning_rate': 7.799827823819972e-07, 'epoch': 0.88} 88%|████████▊ | 5825/6640 [1:18:04<1:14:42, 5.50s/it] 88%|████████▊ | 5826/6640 [1:18:09<1:14:05, 5.46s/it] {'loss': 0.5919, 'learning_rate': 7.780950907122164e-07, 'epoch': 0.88} 88%|████████▊ | 5826/6640 [1:18:09<1:14:05, 5.46s/it] 88%|████████▊ | 5827/6640 [1:18:14<1:13:47, 5.45s/it] {'loss': 0.573, 'learning_rate': 7.762095936084124e-07, 'epoch': 0.88} 88%|████████▊ | 5827/6640 [1:18:14<1:13:47, 5.45s/it] 88%|████████▊ | 5828/6640 [1:18:20<1:13:07, 5.40s/it] {'loss': 0.5785, 'learning_rate': 7.743262915192839e-07, 'epoch': 0.88} 88%|████████▊ | 5828/6640 [1:18:20<1:13:07, 5.40s/it] 88%|████████▊ | 5829/6640 
[1:18:25<1:12:37, 5.37s/it] {'loss': 0.5624, 'learning_rate': 7.724451848930014e-07, 'epoch': 0.88} 88%|████████▊ | 5829/6640 [1:18:25<1:12:37, 5.37s/it] 88%|████████▊ | 5830/6640 [1:18:31<1:13:07, 5.42s/it] {'loss': 0.5845, 'learning_rate': 7.705662741772235e-07, 'epoch': 0.88} 88%|████████▊ | 5830/6640 [1:18:31<1:13:07, 5.42s/it] 88%|████████▊ | 5831/6640 [1:18:36<1:13:02, 5.42s/it] {'loss': 0.591, 'learning_rate': 7.68689559819078e-07, 'epoch': 0.88} 88%|████████▊ | 5831/6640 [1:18:36<1:13:02, 5.42s/it] 88%|████████▊ | 5832/6640 [1:18:41<1:13:01, 5.42s/it] {'loss': 0.5561, 'learning_rate': 7.668150422651743e-07, 'epoch': 0.88} 88%|████████▊ | 5832/6640 [1:18:41<1:13:01, 5.42s/it] 88%|████████▊ | 5833/6640 [1:18:47<1:14:35, 5.55s/it] {'loss': 0.572, 'learning_rate': 7.649427219615912e-07, 'epoch': 0.88} 88%|████████▊ | 5833/6640 [1:18:47<1:14:35, 5.55s/it] 88%|████████▊ | 5834/6640 [1:18:53<1:15:11, 5.60s/it] {'loss': 0.5832, 'learning_rate': 7.630725993539001e-07, 'epoch': 0.88} 88%|████████▊ | 5834/6640 [1:18:53<1:15:11, 5.60s/it] 88%|████████▊ | 5835/6640 [1:18:58<1:14:22, 5.54s/it] {'loss': 0.5649, 'learning_rate': 7.612046748871327e-07, 'epoch': 0.88} 88%|████████▊ | 5835/6640 [1:18:58<1:14:22, 5.54s/it] 88%|████████▊ | 5836/6640 [1:19:04<1:14:32, 5.56s/it] {'loss': 0.587, 'learning_rate': 7.593389490058078e-07, 'epoch': 0.88} 88%|████████▊ | 5836/6640 [1:19:04<1:14:32, 5.56s/it] 88%|████████▊ | 5837/6640 [1:19:10<1:14:55, 5.60s/it] {'loss': 0.5769, 'learning_rate': 7.574754221539194e-07, 'epoch': 0.88} 88%|████████▊ | 5837/6640 [1:19:10<1:14:55, 5.60s/it] 88%|████████▊ | 5838/6640 [1:19:15<1:14:27, 5.57s/it] {'loss': 0.6036, 'learning_rate': 7.556140947749324e-07, 'epoch': 0.88} 88%|████████▊ | 5838/6640 [1:19:15<1:14:27, 5.57s/it] 88%|████████▊ | 5839/6640 [1:19:21<1:13:59, 5.54s/it] {'loss': 0.5818, 'learning_rate': 7.537549673117983e-07, 'epoch': 0.88} 88%|████████▊ | 5839/6640 [1:19:21<1:13:59, 5.54s/it] 88%|████████▊ | 5840/6640 [1:19:26<1:12:30, 
5.44s/it] {'loss': 0.5815, 'learning_rate': 7.518980402069354e-07, 'epoch': 0.88} 88%|████████▊ | 5840/6640 [1:19:26<1:12:30, 5.44s/it] 88%|████████▊ | 5841/6640 [1:19:31<1:11:54, 5.40s/it] {'loss': 0.5817, 'learning_rate': 7.500433139022456e-07, 'epoch': 0.88} 88%|████████▊ | 5841/6640 [1:19:31<1:11:54, 5.40s/it] 88%|████████▊ | 5842/6640 [1:19:37<1:12:27, 5.45s/it] {'loss': 0.5795, 'learning_rate': 7.481907888390994e-07, 'epoch': 0.88} 88%|████████▊ | 5842/6640 [1:19:37<1:12:27, 5.45s/it] 88%|████████▊ | 5843/6640 [1:19:42<1:13:07, 5.50s/it] {'loss': 0.5986, 'learning_rate': 7.46340465458354e-07, 'epoch': 0.88} 88%|████████▊ | 5843/6640 [1:19:42<1:13:07, 5.50s/it] 88%|████████▊ | 5844/6640 [1:19:48<1:12:45, 5.48s/it] {'loss': 0.5567, 'learning_rate': 7.444923442003327e-07, 'epoch': 0.88} 88%|████████▊ | 5844/6640 [1:19:48<1:12:45, 5.48s/it] 88%|████████▊ | 5845/6640 [1:19:53<1:12:05, 5.44s/it] {'loss': 0.5764, 'learning_rate': 7.426464255048393e-07, 'epoch': 0.88} 88%|████████▊ | 5845/6640 [1:19:53<1:12:05, 5.44s/it] 88%|████████▊ | 5846/6640 [1:19:59<1:11:44, 5.42s/it] {'loss': 0.5839, 'learning_rate': 7.408027098111559e-07, 'epoch': 0.88} 88%|████████▊ | 5846/6640 [1:19:59<1:11:44, 5.42s/it] 88%|████████▊ | 5847/6640 [1:20:04<1:11:05, 5.38s/it] {'loss': 0.5579, 'learning_rate': 7.389611975580313e-07, 'epoch': 0.88} 88%|████████▊ | 5847/6640 [1:20:04<1:11:05, 5.38s/it] 88%|████████▊ | 5848/6640 [1:20:09<1:11:47, 5.44s/it] {'loss': 0.5651, 'learning_rate': 7.371218891837029e-07, 'epoch': 0.88} 88%|████████▊ | 5848/6640 [1:20:09<1:11:47, 5.44s/it] 88%|████████▊ | 5849/6640 [1:20:15<1:11:15, 5.41s/it] {'loss': 0.5725, 'learning_rate': 7.352847851258716e-07, 'epoch': 0.88} 88%|████████▊ | 5849/6640 [1:20:15<1:11:15, 5.41s/it]7 AutoResumeHook: Checking whether to suspend... 16 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 42 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 88%|████████▊ | 5850/6640 [1:20:20<1:11:17, 5.41s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.6183, 'learning_rate': 7.334498858217231e-07, 'epoch': 0.88} 88%|████████▊ | 5850/6640 [1:20:20<1:11:17, 5.41s/it] 88%|████████▊ | 5851/6640 [1:20:26<1:10:58, 5.40s/it] {'loss': 0.5928, 'learning_rate': 7.316171917079084e-07, 'epoch': 0.88} 88%|████████▊ | 5851/6640 [1:20:26<1:10:58, 5.40s/it] 88%|████████▊ | 5852/6640 [1:20:31<1:11:08, 5.42s/it] {'loss': 0.5809, 'learning_rate': 7.297867032205664e-07, 'epoch': 0.88} 88%|████████▊ | 5852/6640 [1:20:31<1:11:08, 5.42s/it] 88%|████████▊ | 5853/6640 [1:20:37<1:11:58, 5.49s/it] {'loss': 0.5641, 'learning_rate': 7.279584207952995e-07, 'epoch': 0.88} 88%|████████▊ | 5853/6640 [1:20:37<1:11:58, 5.49s/it] 88%|████████▊ | 5854/6640 [1:20:42<1:12:00, 5.50s/it] {'loss': 0.582, 'learning_rate': 7.261323448671919e-07, 'epoch': 0.88} 88%|████████▊ | 5854/6640 [1:20:42<1:12:00, 5.50s/it] 88%|████████▊ | 5855/6640 [1:20:48<1:11:25, 5.46s/it] {'loss': 0.6091, 'learning_rate': 7.243084758708007e-07, 'epoch': 0.88} 88%|████████▊ | 5855/6640 [1:20:48<1:11:25, 5.46s/it] 88%|████████▊ | 5856/6640 [1:20:53<1:11:43, 5.49s/it] {'loss': 0.5801, 'learning_rate': 7.224868142401542e-07, 'epoch': 0.88} 88%|████████▊ | 5856/6640 [1:20:53<1:11:43, 5.49s/it] 88%|████████▊ | 5857/6640 [1:20:58<1:10:56, 5.44s/it] {'loss': 0.5714, 'learning_rate': 7.206673604087655e-07, 'epoch': 0.88} 88%|████████▊ | 5857/6640 [1:20:58<1:10:56, 5.44s/it] 88%|████████▊ | 5858/6640 [1:21:04<1:10:49, 5.43s/it] {'loss': 0.5613, 'learning_rate': 7.188501148096117e-07, 'epoch': 0.88} 88%|████████▊ | 5858/6640 [1:21:04<1:10:49, 5.43s/it] 88%|████████▊ | 5859/6640 [1:21:09<1:10:23, 5.41s/it] {'loss': 0.5608, 'learning_rate': 7.170350778751478e-07, 'epoch': 0.88} 88%|████████▊ | 5859/6640 [1:21:09<1:10:23, 5.41s/it] 88%|████████▊ | 5860/6640 [1:21:15<1:10:35, 5.43s/it] 
{'loss': 0.5498, 'learning_rate': 7.152222500373052e-07, 'epoch': 0.88} 88%|████████▊ | 5860/6640 [1:21:15<1:10:35, 5.43s/it] 88%|████████▊ | 5861/6640 [1:21:20<1:09:57, 5.39s/it] {'loss': 0.5614, 'learning_rate': 7.134116317274886e-07, 'epoch': 0.88} 88%|████████▊ | 5861/6640 [1:21:20<1:09:57, 5.39s/it] 88%|████████▊ | 5862/6640 [1:21:25<1:10:21, 5.43s/it] {'loss': 0.581, 'learning_rate': 7.11603223376578e-07, 'epoch': 0.88} 88%|████████▊ | 5862/6640 [1:21:25<1:10:21, 5.43s/it] 88%|████████▊ | 5863/6640 [1:21:31<1:10:20, 5.43s/it] {'loss': 0.5874, 'learning_rate': 7.097970254149222e-07, 'epoch': 0.88} 88%|████████▊ | 5863/6640 [1:21:31<1:10:20, 5.43s/it] 88%|████████▊ | 5864/6640 [1:21:37<1:11:11, 5.50s/it] {'loss': 0.5622, 'learning_rate': 7.079930382723521e-07, 'epoch': 0.88} 88%|████████▊ | 5864/6640 [1:21:37<1:11:11, 5.50s/it] 88%|████████▊ | 5865/6640 [1:21:42<1:11:19, 5.52s/it] {'loss': 0.5752, 'learning_rate': 7.06191262378163e-07, 'epoch': 0.88} 88%|████████▊ | 5865/6640 [1:21:42<1:11:19, 5.52s/it] 88%|████████▊ | 5866/6640 [1:21:48<1:11:29, 5.54s/it] {'loss': 0.574, 'learning_rate': 7.043916981611354e-07, 'epoch': 0.88} 88%|████████▊ | 5866/6640 [1:21:48<1:11:29, 5.54s/it] 88%|████████▊ | 5867/6640 [1:21:53<1:11:14, 5.53s/it] {'loss': 0.571, 'learning_rate': 7.025943460495133e-07, 'epoch': 0.88} 88%|████████▊ | 5867/6640 [1:21:53<1:11:14, 5.53s/it] 88%|████████▊ | 5868/6640 [1:21:59<1:10:42, 5.50s/it] {'loss': 0.5736, 'learning_rate': 7.007992064710189e-07, 'epoch': 0.88} 88%|████████▊ | 5868/6640 [1:21:59<1:10:42, 5.50s/it] 88%|████████▊ | 5869/6640 [1:22:04<1:10:30, 5.49s/it] {'loss': 0.5747, 'learning_rate': 6.990062798528474e-07, 'epoch': 0.88} 88%|████████▊ | 5869/6640 [1:22:04<1:10:30, 5.49s/it] 88%|████████▊ | 5870/6640 [1:22:10<1:11:41, 5.59s/it] {'loss': 0.5624, 'learning_rate': 6.972155666216684e-07, 'epoch': 0.88} 88%|████████▊ | 5870/6640 [1:22:10<1:11:41, 5.59s/it] 88%|████████▊ | 5871/6640 [1:22:15<1:10:29, 5.50s/it] {'loss': 0.5882, 
'learning_rate': 6.954270672036234e-07, 'epoch': 0.88} 88%|████████▊ | 5871/6640 [1:22:15<1:10:29, 5.50s/it] 88%|████████▊ | 5872/6640 [1:22:21<1:10:00, 5.47s/it] {'loss': 0.5698, 'learning_rate': 6.936407820243251e-07, 'epoch': 0.88} 88%|████████▊ | 5872/6640 [1:22:21<1:10:00, 5.47s/it] 88%|████████▊ | 5873/6640 [1:22:26<1:09:01, 5.40s/it] {'loss': 0.5637, 'learning_rate': 6.918567115088648e-07, 'epoch': 0.88} 88%|████████▊ | 5873/6640 [1:22:26<1:09:01, 5.40s/it] 88%|████████▊ | 5874/6640 [1:22:31<1:09:16, 5.43s/it] {'loss': 0.6001, 'learning_rate': 6.900748560817982e-07, 'epoch': 0.88} 88%|████████▊ | 5874/6640 [1:22:31<1:09:16, 5.43s/it] 88%|████████▊ | 5875/6640 [1:22:37<1:09:52, 5.48s/it] {'loss': 0.5881, 'learning_rate': 6.882952161671652e-07, 'epoch': 0.88} 88%|████████▊ | 5875/6640 [1:22:37<1:09:52, 5.48s/it] 88%|████████▊ | 5876/6640 [1:22:42<1:09:53, 5.49s/it] {'loss': 0.5661, 'learning_rate': 6.865177921884681e-07, 'epoch': 0.88} 88%|████████▊ | 5876/6640 [1:22:42<1:09:53, 5.49s/it] 89%|████████▊ | 5877/6640 [1:22:48<1:09:25, 5.46s/it] {'loss': 0.6031, 'learning_rate': 6.847425845686884e-07, 'epoch': 0.89} 89%|████████▊ | 5877/6640 [1:22:48<1:09:25, 5.46s/it]WARNING: tokenization mismatch: 1 vs. 737. [[{'from': 'human', 'value': 'ASSISTANT: You are a top consultant specialising in advising tax firms and have the following skills and qualities.\n\nKNOWLEDGE: Consultant in a top consulting firm specialising in tax consulting, senior strategist, digital expert.\n\nCHARACTERISTICS: Customer-focused, empathetic, solution-oriented, personable, persuasive, creative.\n\nTASK: Use your knowledge to develop 10 measures to encourage clients to submit their documents digitally instead of by post. The measures should be formulated in such a way that the client recognises the advantages of digital filing. 
The law firm is digitised and can accept digital documents.\n\nOUTPUT: A list of the 10 actions, each with the name of the action as a heading and 2 short additional bullet points.\n\nOUTPUT LANGUAGE: German'}, {'from': 'gpt', 'value': '1. Persönlicher Kontakt und Beratung:\n\t* Persönliche Beratung und Unterstützung bei der Umstellung auf digitales Einreichen\n\t* Betonung der Vorteile, wie Zeit- und Kosteneinsparungen\n2. Schulung und Unterstützung:\n\t* Schulungen für die digitale Einreichung von Dokumenten\n\t* Anleitung zur Nutzung von Tools und Software\n3. Anreize schaffen:\n\t* Ermäßigung bei der Beauftragung digitaler Dienstleistungen\n\t* Sonderangebote für Kunden, die ihre Dokumente digital einreichen\n4. Informationen bereitstellen:\n\t* Veröffentlichung von Leitfäden und Handbüchern für die digitale Einreichung von Dokumenten\n\t* FAQ-Liste für häufig gestellte Fragen zum digitalen Einreichen\n5. Klare Kommunikation:\n\t* Verständliche Kommunikation der Vorteile der digitalen Einreichung\n\t* Bereitstellung von Informationen über die Sicherheit der digitalen Übermittlung\n6. Einfache Zugänglichkeit:\n\t* Vereinfachung des digitalen Einreichungsprozesses durch intuitive Online-Plattformen\n\t* Bereitstellung von Unterstützung für den Fall technischer Schwierigkeiten\n7. Zeitersparnis betonen:\n\t* Betonung der Vorteile von Zeitersparnis durch digitales Einreichen\n\t* Erinnerungen an Fristen und den Nutzen der pünktlichen Einreichung\n8. Umweltbewusstsein:\n\t* Betonung des Umweltvorteils durch digitales Einreichen von Dokumenten\n\t* Verweis auf die Möglichkeit, Papierverbrauch zu reduzieren\n9. Kosteneffizienz:\n\t* Betonung der Vorteile von Kosteneinsparungen durch digitales Einreichen\n\t* Verweis auf die Möglichkeit, Papier- und Portokosten zu sparen\n10. 
Feedback einholen:\n* Umfrage zur Bewertung des digitalen Einreichungsprozesses\n* Analyse der Ergebnisse zur Verbesserung der digitalen Dienstleistungen'}]] (ignored) 89%|████████▊ | 5878/6640 [1:22:53<1:08:38, 5.40s/it] {'loss': 0.5837, 'learning_rate': 6.82969593730276e-07, 'epoch': 0.89} 89%|████████▊ | 5878/6640 [1:22:53<1:08:38, 5.40s/it] 89%|████████▊ | 5879/6640 [1:22:59<1:08:24, 5.39s/it] {'loss': 0.553, 'learning_rate': 6.81198820095158e-07, 'epoch': 0.89} 89%|████████▊ | 5879/6640 [1:22:59<1:08:24, 5.39s/it] 89%|████████▊ | 5880/6640 [1:23:04<1:08:39, 5.42s/it] {'loss': 0.5612, 'learning_rate': 6.794302640847294e-07, 'epoch': 0.89} 89%|████████▊ | 5880/6640 [1:23:04<1:08:39, 5.42s/it] 89%|████████▊ | 5881/6640 [1:23:09<1:08:28, 5.41s/it] {'loss': 0.5929, 'learning_rate': 6.776639261198581e-07, 'epoch': 0.89} 89%|████████▊ | 5881/6640 [1:23:09<1:08:28, 5.41s/it] 89%|████████▊ | 5882/6640 [1:23:15<1:08:13, 5.40s/it] {'loss': 0.5937, 'learning_rate': 6.758998066208866e-07, 'epoch': 0.89} 89%|████████▊ | 5882/6640 [1:23:15<1:08:13, 5.40s/it] 89%|████████▊ | 5883/6640 [1:23:20<1:07:49, 5.38s/it] {'loss': 0.5718, 'learning_rate': 6.741379060076258e-07, 'epoch': 0.89} 89%|████████▊ | 5883/6640 [1:23:20<1:07:49, 5.38s/it] 89%|████████▊ | 5884/6640 [1:23:25<1:07:04, 5.32s/it] {'loss': 0.5675, 'learning_rate': 6.723782246993648e-07, 'epoch': 0.89} 89%|████████▊ | 5884/6640 [1:23:25<1:07:04, 5.32s/it] 89%|████████▊ | 5885/6640 [1:23:31<1:07:46, 5.39s/it] {'loss': 0.5772, 'learning_rate': 6.706207631148564e-07, 'epoch': 0.89} 89%|████████▊ | 5885/6640 [1:23:31<1:07:46, 5.39s/it] 89%|████████▊ | 5886/6640 [1:23:36<1:08:14, 5.43s/it] {'loss': 0.5906, 'learning_rate': 6.688655216723317e-07, 'epoch': 0.89} 89%|████████▊ | 5886/6640 [1:23:36<1:08:14, 5.43s/it] 89%|████████▊ | 5887/6640 [1:23:42<1:08:16, 5.44s/it] {'loss': 0.5918, 'learning_rate': 6.671125007894896e-07, 'epoch': 0.89} 89%|████████▊ | 5887/6640 [1:23:42<1:08:16, 5.44s/it] 89%|████████▊ | 5888/6640 
[1:23:47<1:08:25, 5.46s/it] {'loss': 0.6002, 'learning_rate': 6.653617008835034e-07, 'epoch': 0.89} 89%|████████▊ | 5888/6640 [1:23:47<1:08:25, 5.46s/it] 89%|████████▊ | 5889/6640 [1:23:53<1:08:13, 5.45s/it] {'loss': 0.6049, 'learning_rate': 6.636131223710174e-07, 'epoch': 0.89} 89%|████████▊ | 5889/6640 [1:23:53<1:08:13, 5.45s/it] 89%|████████▊ | 5890/6640 [1:23:58<1:07:54, 5.43s/it] {'loss': 0.5478, 'learning_rate': 6.618667656681444e-07, 'epoch': 0.89} 89%|████████▊ | 5890/6640 [1:23:58<1:07:54, 5.43s/it] 89%|████████▊ | 5891/6640 [1:24:04<1:10:27, 5.64s/it] {'loss': 0.5617, 'learning_rate': 6.601226311904718e-07, 'epoch': 0.89} 89%|████████▊ | 5891/6640 [1:24:04<1:10:27, 5.64s/it] 89%|████████▊ | 5892/6640 [1:24:10<1:09:19, 5.56s/it] {'loss': 0.585, 'learning_rate': 6.583807193530523e-07, 'epoch': 0.89} 89%|████████▊ | 5892/6640 [1:24:10<1:09:19, 5.56s/it] 89%|████████▉ | 5893/6640 [1:24:15<1:09:16, 5.56s/it] {'loss': 0.6006, 'learning_rate': 6.566410305704218e-07, 'epoch': 0.89} 89%|████████▉ | 5893/6640 [1:24:15<1:09:16, 5.56s/it] 89%|████████▉ | 5894/6640 [1:24:21<1:09:46, 5.61s/it] {'loss': 0.5646, 'learning_rate': 6.549035652565739e-07, 'epoch': 0.89} 89%|████████▉ | 5894/6640 [1:24:21<1:09:46, 5.61s/it] 89%|████████▉ | 5895/6640 [1:24:26<1:08:57, 5.55s/it] {'loss': 0.5813, 'learning_rate': 6.531683238249809e-07, 'epoch': 0.89} 89%|████████▉ | 5895/6640 [1:24:26<1:08:57, 5.55s/it] 89%|████████▉ | 5896/6640 [1:24:32<1:08:33, 5.53s/it] {'loss': 0.5897, 'learning_rate': 6.514353066885837e-07, 'epoch': 0.89} 89%|████████▉ | 5896/6640 [1:24:32<1:08:33, 5.53s/it] 89%|████████▉ | 5897/6640 [1:24:37<1:08:02, 5.49s/it] {'loss': 0.5761, 'learning_rate': 6.497045142597924e-07, 'epoch': 0.89} 89%|████████▉ | 5897/6640 [1:24:37<1:08:02, 5.49s/it] 89%|████████▉ | 5898/6640 [1:24:43<1:08:21, 5.53s/it] {'loss': 0.566, 'learning_rate': 6.479759469504931e-07, 'epoch': 0.89} 89%|████████▉ | 5898/6640 [1:24:43<1:08:21, 5.53s/it] 89%|████████▉ | 5899/6640 [1:24:48<1:08:05, 
5.51s/it] {'loss': 0.5849, 'learning_rate': 6.462496051720335e-07, 'epoch': 0.89} 89%|████████▉ | 5899/6640 [1:24:48<1:08:05, 5.51s/it]7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 0AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 89%|████████▉ | 5900/6640 [1:24:54<1:07:40, 5.49s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.566, 'learning_rate': 6.445254893352381e-07, 'epoch': 0.89} 89%|████████▉ | 5900/6640 [1:24:54<1:07:40, 5.49s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-5900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 89%|████████▉ | 5901/6640 [1:25:22<2:30:26, 12.21s/it] {'loss': 0.5821, 'learning_rate': 6.428035998504023e-07, 'epoch': 0.89} 89%|████████▉ | 5901/6640 [1:25:22<2:30:26, 12.21s/it] 89%|████████▉ | 5902/6640 [1:25:27<2:05:15, 10.18s/it] {'loss': 0.5472, 'learning_rate': 6.41083937127287e-07, 'epoch': 0.89} 89%|████████▉ | 5902/6640 [1:25:27<2:05:15, 10.18s/it] 89%|████████▉ | 5903/6640 [1:25:32<1:47:11, 8.73s/it] {'loss': 0.5845, 'learning_rate': 6.393665015751293e-07, 'epoch': 0.89} 89%|████████▉ | 5903/6640 [1:25:32<1:47:11, 8.73s/it] 89%|████████▉ | 5904/6640 [1:25:38<1:35:20, 7.77s/it] {'loss': 0.5582, 'learning_rate': 6.37651293602628e-07, 'epoch': 0.89} 89%|████████▉ | 5904/6640 [1:25:38<1:35:20, 7.77s/it] 89%|████████▉ | 5905/6640 [1:25:44<1:27:49, 7.17s/it] {'loss': 0.5921, 'learning_rate': 6.359383136179598e-07, 'epoch': 0.89} 89%|████████▉ | 5905/6640 [1:25:44<1:27:49, 7.17s/it] 89%|████████▉ | 5906/6640 [1:25:49<1:21:39, 6.67s/it] {'loss': 0.5699, 'learning_rate': 6.342275620287675e-07, 'epoch': 0.89} 89%|████████▉ | 5906/6640 [1:25:49<1:21:39, 6.67s/it] 89%|████████▉ | 5907/6640 [1:25:55<1:18:34, 6.43s/it] {'loss': 0.6096, 'learning_rate': 6.325190392421654e-07, 'epoch': 0.89} 89%|████████▉ | 5907/6640 [1:25:55<1:18:34, 6.43s/it] 89%|████████▉ | 5908/6640 [1:26:00<1:14:31, 6.11s/it] {'loss': 0.5502, 'learning_rate': 6.308127456647317e-07, 'epoch': 0.89} 89%|████████▉ | 5908/6640 [1:26:00<1:14:31, 6.11s/it] 89%|████████▉ | 5909/6640 [1:26:06<1:12:24, 5.94s/it] {'loss': 0.5867, 'learning_rate': 6.291086817025228e-07, 'epoch': 0.89} 89%|████████▉ | 5909/6640 [1:26:06<1:12:24, 5.94s/it] 89%|████████▉ | 5910/6640 [1:26:11<1:10:08, 5.77s/it] {'loss': 0.5794, 'learning_rate': 6.274068477610584e-07, 'epoch': 0.89} 89%|████████▉ | 5910/6640 [1:26:11<1:10:08, 5.77s/it] 89%|████████▉ | 5911/6640 [1:26:17<1:08:11, 5.61s/it] {'loss': 0.5667, 'learning_rate': 6.257072442453293e-07, 'epoch': 0.89} 89%|████████▉ | 5911/6640 [1:26:17<1:08:11, 5.61s/it] 
89%|████████▉ | 5912/6640 [1:26:22<1:07:55, 5.60s/it] {'loss': 0.558, 'learning_rate': 6.240098715597975e-07, 'epoch': 0.89} 89%|████████▉ | 5912/6640 [1:26:22<1:07:55, 5.60s/it] 89%|████████▉ | 5913/6640 [1:26:28<1:07:01, 5.53s/it] {'loss': 0.5531, 'learning_rate': 6.223147301083887e-07, 'epoch': 0.89} 89%|████████▉ | 5913/6640 [1:26:28<1:07:01, 5.53s/it] 89%|████████▉ | 5914/6640 [1:26:33<1:06:14, 5.47s/it] {'loss': 0.5756, 'learning_rate': 6.206218202945036e-07, 'epoch': 0.89} 89%|████████▉ | 5914/6640 [1:26:33<1:06:14, 5.47s/it] 89%|████████▉ | 5915/6640 [1:26:38<1:06:02, 5.47s/it] {'loss': 0.5839, 'learning_rate': 6.189311425210087e-07, 'epoch': 0.89} 89%|████████▉ | 5915/6640 [1:26:38<1:06:02, 5.47s/it] 89%|████████▉ | 5916/6640 [1:26:44<1:05:55, 5.46s/it] {'loss': 0.5739, 'learning_rate': 6.172426971902412e-07, 'epoch': 0.89} 89%|████████▉ | 5916/6640 [1:26:44<1:05:55, 5.46s/it] 89%|████████▉ | 5917/6640 [1:26:49<1:05:08, 5.41s/it] {'loss': 0.5785, 'learning_rate': 6.15556484704003e-07, 'epoch': 0.89} 89%|████████▉ | 5917/6640 [1:26:49<1:05:08, 5.41s/it] 89%|████████▉ | 5918/6640 [1:26:54<1:04:57, 5.40s/it] {'loss': 0.5951, 'learning_rate': 6.138725054635697e-07, 'epoch': 0.89} 89%|████████▉ | 5918/6640 [1:26:54<1:04:57, 5.40s/it] 89%|████████▉ | 5919/6640 [1:27:00<1:04:55, 5.40s/it] {'loss': 0.579, 'learning_rate': 6.121907598696808e-07, 'epoch': 0.89} 89%|████████▉ | 5919/6640 [1:27:00<1:04:55, 5.40s/it] 89%|████████▉ | 5920/6640 [1:27:05<1:04:25, 5.37s/it] {'loss': 0.5832, 'learning_rate': 6.105112483225495e-07, 'epoch': 0.89} 89%|████████▉ | 5920/6640 [1:27:05<1:04:25, 5.37s/it] 89%|████████▉ | 5921/6640 [1:27:11<1:04:34, 5.39s/it] {'loss': 0.5722, 'learning_rate': 6.08833971221855e-07, 'epoch': 0.89} 89%|████████▉ | 5921/6640 [1:27:11<1:04:34, 5.39s/it] 89%|████████▉ | 5922/6640 [1:27:16<1:04:11, 5.36s/it] {'loss': 0.5697, 'learning_rate': 6.071589289667402e-07, 'epoch': 0.89} 89%|████████▉ | 5922/6640 [1:27:16<1:04:11, 5.36s/it] 89%|████████▉ | 
5923/6640 [1:27:21<1:04:15, 5.38s/it] {'loss': 0.5873, 'learning_rate': 6.054861219558228e-07, 'epoch': 0.89} 89%|████████▉ | 5923/6640 [1:27:21<1:04:15, 5.38s/it] 89%|████████▉ | 5924/6640 [1:27:27<1:04:35, 5.41s/it] {'loss': 0.583, 'learning_rate': 6.038155505871857e-07, 'epoch': 0.89} 89%|████████▉ | 5924/6640 [1:27:27<1:04:35, 5.41s/it] 89%|████████▉ | 5925/6640 [1:27:32<1:04:43, 5.43s/it] {'loss': 0.5839, 'learning_rate': 6.021472152583818e-07, 'epoch': 0.89} 89%|████████▉ | 5925/6640 [1:27:32<1:04:43, 5.43s/it] 89%|████████▉ | 5926/6640 [1:27:38<1:04:32, 5.42s/it] {'loss': 0.589, 'learning_rate': 6.00481116366427e-07, 'epoch': 0.89} 89%|████████▉ | 5926/6640 [1:27:38<1:04:32, 5.42s/it] 89%|████████▉ | 5927/6640 [1:27:43<1:04:16, 5.41s/it] {'loss': 0.5649, 'learning_rate': 5.988172543078097e-07, 'epoch': 0.89} 89%|████████▉ | 5927/6640 [1:27:43<1:04:16, 5.41s/it] 89%|████████▉ | 5928/6640 [1:27:49<1:04:46, 5.46s/it] {'loss': 0.6002, 'learning_rate': 5.971556294784842e-07, 'epoch': 0.89} 89%|████████▉ | 5928/6640 [1:27:49<1:04:46, 5.46s/it] 89%|████████▉ | 5929/6640 [1:27:54<1:04:39, 5.46s/it] {'loss': 0.5657, 'learning_rate': 5.95496242273873e-07, 'epoch': 0.89} 89%|████████▉ | 5929/6640 [1:27:54<1:04:39, 5.46s/it] 89%|████████▉ | 5930/6640 [1:28:00<1:04:21, 5.44s/it] {'loss': 0.5911, 'learning_rate': 5.938390930888671e-07, 'epoch': 0.89} 89%|████████▉ | 5930/6640 [1:28:00<1:04:21, 5.44s/it] 89%|████████▉ | 5931/6640 [1:28:05<1:04:18, 5.44s/it] {'loss': 0.5793, 'learning_rate': 5.921841823178198e-07, 'epoch': 0.89} 89%|████████▉ | 5931/6640 [1:28:05<1:04:18, 5.44s/it] 89%|████████▉ | 5932/6640 [1:28:10<1:04:11, 5.44s/it] {'loss': 0.5642, 'learning_rate': 5.905315103545573e-07, 'epoch': 0.89} 89%|████████▉ | 5932/6640 [1:28:10<1:04:11, 5.44s/it] 89%|████████▉ | 5933/6640 [1:28:16<1:04:01, 5.43s/it] {'loss': 0.5781, 'learning_rate': 5.888810775923726e-07, 'epoch': 0.89} 89%|████████▉ | 5933/6640 [1:28:16<1:04:01, 5.43s/it] 89%|████████▉ | 5934/6640 
[1:28:21<1:04:30, 5.48s/it] {'loss': 0.5765, 'learning_rate': 5.872328844240249e-07, 'epoch': 0.89} 89%|████████▉ | 5934/6640 [1:28:21<1:04:30, 5.48s/it] 89%|████████▉ | 5935/6640 [1:28:26<1:02:56, 5.36s/it] {'loss': 0.5789, 'learning_rate': 5.855869312417362e-07, 'epoch': 0.89} 89%|████████▉ | 5935/6640 [1:28:26<1:02:56, 5.36s/it] 89%|████████▉ | 5936/6640 [1:28:32<1:03:44, 5.43s/it] {'loss': 0.5699, 'learning_rate': 5.839432184372018e-07, 'epoch': 0.89} 89%|████████▉ | 5936/6640 [1:28:32<1:03:44, 5.43s/it] 89%|████████▉ | 5937/6640 [1:28:38<1:03:53, 5.45s/it] {'loss': 0.582, 'learning_rate': 5.823017464015801e-07, 'epoch': 0.89} 89%|████████▉ | 5937/6640 [1:28:38<1:03:53, 5.45s/it] 89%|████████▉ | 5938/6640 [1:28:43<1:03:43, 5.45s/it] {'loss': 0.5577, 'learning_rate': 5.806625155254986e-07, 'epoch': 0.89} 89%|████████▉ | 5938/6640 [1:28:43<1:03:43, 5.45s/it] 89%|████████▉ | 5939/6640 [1:28:48<1:03:21, 5.42s/it] {'loss': 0.5725, 'learning_rate': 5.790255261990518e-07, 'epoch': 0.89} 89%|████████▉ | 5939/6640 [1:28:48<1:03:21, 5.42s/it] 89%|████████▉ | 5940/6640 [1:28:54<1:03:28, 5.44s/it] {'loss': 0.5993, 'learning_rate': 5.77390778811796e-07, 'epoch': 0.89} 89%|████████▉ | 5940/6640 [1:28:54<1:03:28, 5.44s/it] 89%|████████▉ | 5941/6640 [1:28:59<1:03:17, 5.43s/it] {'loss': 0.602, 'learning_rate': 5.757582737527579e-07, 'epoch': 0.89} 89%|████████▉ | 5941/6640 [1:28:59<1:03:17, 5.43s/it] 89%|████████▉ | 5942/6640 [1:29:05<1:02:56, 5.41s/it] {'loss': 0.5701, 'learning_rate': 5.741280114104309e-07, 'epoch': 0.89} 89%|████████▉ | 5942/6640 [1:29:05<1:02:56, 5.41s/it] 90%|████████▉ | 5943/6640 [1:29:10<1:02:45, 5.40s/it] {'loss': 0.5629, 'learning_rate': 5.724999921727725e-07, 'epoch': 0.9} 90%|████████▉ | 5943/6640 [1:29:10<1:02:45, 5.40s/it] 90%|████████▉ | 5944/6640 [1:29:16<1:03:20, 5.46s/it] {'loss': 0.5825, 'learning_rate': 5.708742164272108e-07, 'epoch': 0.9} 90%|████████▉ | 5944/6640 [1:29:16<1:03:20, 5.46s/it] 90%|████████▉ | 5945/6640 [1:29:21<1:03:11, 
5.46s/it] {'loss': 0.6206, 'learning_rate': 5.692506845606327e-07, 'epoch': 0.9} 90%|████████▉ | 5945/6640 [1:29:21<1:03:11, 5.46s/it] 90%|████████▉ | 5946/6640 [1:29:27<1:03:27, 5.49s/it] {'loss': 0.585, 'learning_rate': 5.676293969593971e-07, 'epoch': 0.9} 90%|████████▉ | 5946/6640 [1:29:27<1:03:27, 5.49s/it] 90%|████████▉ | 5947/6640 [1:29:32<1:03:14, 5.48s/it] {'loss': 0.5763, 'learning_rate': 5.660103540093265e-07, 'epoch': 0.9} 90%|████████▉ | 5947/6640 [1:29:32<1:03:14, 5.48s/it] 90%|████████▉ | 5948/6640 [1:29:38<1:03:18, 5.49s/it] {'loss': 0.582, 'learning_rate': 5.643935560957114e-07, 'epoch': 0.9} 90%|████████▉ | 5948/6640 [1:29:38<1:03:18, 5.49s/it] 90%|████████▉ | 5949/6640 [1:29:43<1:03:01, 5.47s/it] {'loss': 0.5625, 'learning_rate': 5.627790036033031e-07, 'epoch': 0.9} 90%|████████▉ | 5949/6640 [1:29:43<1:03:01, 5.47s/it]7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 90%|████████▉ | 5950/6640 [1:29:48<1:02:52, 5.47s/it]5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5761, 'learning_rate': 5.611666969163243e-07, 'epoch': 0.9} 90%|████████▉ | 5950/6640 [1:29:48<1:02:52, 5.47s/it] 90%|████████▉ | 5951/6640 [1:29:54<1:02:43, 5.46s/it] {'loss': 0.5752, 'learning_rate': 5.595566364184602e-07, 'epoch': 0.9} 90%|████████▉ | 5951/6640 [1:29:54<1:02:43, 5.46s/it] 90%|████████▉ | 5952/6640 [1:29:59<1:02:22, 5.44s/it] {'loss': 0.5978, 'learning_rate': 5.57948822492862e-07, 'epoch': 0.9} 90%|████████▉ | 5952/6640 [1:29:59<1:02:22, 5.44s/it] 90%|████████▉ | 5953/6640 [1:30:05<1:02:18, 5.44s/it] {'loss': 0.5653, 'learning_rate': 5.563432555221471e-07, 'epoch': 0.9} 90%|████████▉ | 5953/6640 [1:30:05<1:02:18, 5.44s/it] 90%|████████▉ | 5954/6640 [1:30:10<1:02:28, 5.46s/it] {'loss': 0.574, 'learning_rate': 5.547399358883953e-07, 'epoch': 0.9} 90%|████████▉ | 5954/6640 [1:30:10<1:02:28, 5.46s/it] 90%|████████▉ | 5955/6640 [1:30:16<1:02:26, 5.47s/it] {'loss': 0.5741, 'learning_rate': 5.53138863973155e-07, 'epoch': 0.9} 90%|████████▉ | 5955/6640 [1:30:16<1:02:26, 5.47s/it] 90%|████████▉ | 5956/6640 [1:30:21<1:01:57, 5.43s/it] {'loss': 0.6071, 'learning_rate': 5.515400401574378e-07, 'epoch': 0.9} 90%|████████▉ | 5956/6640 [1:30:21<1:01:57, 5.43s/it] 90%|████████▉ | 5957/6640 [1:30:26<1:01:24, 5.39s/it] {'loss': 0.5775, 'learning_rate': 5.499434648217216e-07, 'epoch': 0.9} 90%|████████▉ | 5957/6640 [1:30:26<1:01:24, 5.39s/it] 90%|████████▉ | 5958/6640 [1:30:32<1:01:28, 5.41s/it] {'loss': 0.5939, 'learning_rate': 5.483491383459482e-07, 'epoch': 0.9} 90%|████████▉ | 5958/6640 [1:30:32<1:01:28, 5.41s/it] 90%|████████▉ | 5959/6640 [1:30:37<1:00:44, 5.35s/it] {'loss': 0.573, 'learning_rate': 5.467570611095241e-07, 'epoch': 0.9} 90%|████████▉ | 5959/6640 [1:30:37<1:00:44, 5.35s/it] 90%|████████▉ | 5960/6640 [1:30:43<1:01:55, 5.46s/it] {'loss': 0.597, 'learning_rate': 5.451672334913216e-07, 'epoch': 0.9} 90%|████████▉ | 5960/6640 [1:30:43<1:01:55, 5.46s/it] 90%|████████▉ | 5961/6640 [1:30:48<1:01:52, 5.47s/it] {'loss': 0.5647, 'learning_rate': 
5.43579655869676e-07, 'epoch': 0.9} 90%|████████▉ | 5961/6640 [1:30:48<1:01:52, 5.47s/it] 90%|████████▉ | 5962/6640 [1:30:54<1:02:02, 5.49s/it] {'loss': 0.5862, 'learning_rate': 5.419943286223905e-07, 'epoch': 0.9} 90%|████████▉ | 5962/6640 [1:30:54<1:02:02, 5.49s/it] 90%|████████▉ | 5963/6640 [1:30:59<1:01:29, 5.45s/it] {'loss': 0.5708, 'learning_rate': 5.40411252126728e-07, 'epoch': 0.9} 90%|████████▉ | 5963/6640 [1:30:59<1:01:29, 5.45s/it] 90%|████████▉ | 5964/6640 [1:31:05<1:01:04, 5.42s/it] {'loss': 0.5811, 'learning_rate': 5.388304267594191e-07, 'epoch': 0.9} 90%|████████▉ | 5964/6640 [1:31:05<1:01:04, 5.42s/it] 90%|████████▉ | 5965/6640 [1:31:10<1:00:58, 5.42s/it] {'loss': 0.5848, 'learning_rate': 5.372518528966575e-07, 'epoch': 0.9} 90%|████████▉ | 5965/6640 [1:31:10<1:00:58, 5.42s/it] 90%|████████▉ | 5966/6640 [1:31:15<1:00:33, 5.39s/it] {'loss': 0.5899, 'learning_rate': 5.356755309141026e-07, 'epoch': 0.9} 90%|████████▉ | 5966/6640 [1:31:15<1:00:33, 5.39s/it] 90%|████████▉ | 5967/6640 [1:31:21<1:00:16, 5.37s/it] {'loss': 0.571, 'learning_rate': 5.341014611868756e-07, 'epoch': 0.9} 90%|████████▉ | 5967/6640 [1:31:21<1:00:16, 5.37s/it] 90%|████████▉ | 5968/6640 [1:31:26<1:00:31, 5.40s/it] {'loss': 0.6007, 'learning_rate': 5.325296440895622e-07, 'epoch': 0.9} 90%|████████▉ | 5968/6640 [1:31:26<1:00:31, 5.40s/it] 90%|████████▉ | 5969/6640 [1:31:32<1:01:18, 5.48s/it] {'loss': 0.582, 'learning_rate': 5.309600799962145e-07, 'epoch': 0.9} 90%|████████▉ | 5969/6640 [1:31:32<1:01:18, 5.48s/it] 90%|████████▉ | 5970/6640 [1:31:37<1:01:21, 5.50s/it] {'loss': 0.5791, 'learning_rate': 5.293927692803458e-07, 'epoch': 0.9} 90%|████████▉ | 5970/6640 [1:31:37<1:01:21, 5.50s/it] 90%|████████▉ | 5971/6640 [1:31:43<1:01:20, 5.50s/it] {'loss': 0.607, 'learning_rate': 5.278277123149355e-07, 'epoch': 0.9} 90%|████████▉ | 5971/6640 [1:31:43<1:01:20, 5.50s/it] 90%|████████▉ | 5972/6640 [1:31:48<1:00:57, 5.48s/it] {'loss': 0.591, 'learning_rate': 5.262649094724226e-07, 'epoch': 0.9} 
90%|████████▉ | 5972/6640 [1:31:48<1:00:57, 5.48s/it] 90%|████████▉ | 5973/6640 [1:31:54<1:01:19, 5.52s/it] {'loss': 0.5798, 'learning_rate': 5.247043611247127e-07, 'epoch': 0.9} 90%|████████▉ | 5973/6640 [1:31:54<1:01:19, 5.52s/it] 90%|████████▉ | 5974/6640 [1:31:59<1:00:53, 5.49s/it] {'loss': 0.5904, 'learning_rate': 5.231460676431755e-07, 'epoch': 0.9} 90%|████████▉ | 5974/6640 [1:31:59<1:00:53, 5.49s/it] 90%|████████▉ | 5975/6640 [1:32:05<1:01:44, 5.57s/it] {'loss': 0.5867, 'learning_rate': 5.215900293986431e-07, 'epoch': 0.9} 90%|████████▉ | 5975/6640 [1:32:05<1:01:44, 5.57s/it] 90%|█████████ | 5976/6640 [1:32:10<1:01:14, 5.53s/it] {'loss': 0.58, 'learning_rate': 5.200362467614106e-07, 'epoch': 0.9} 90%|█████████ | 5976/6640 [1:32:10<1:01:14, 5.53s/it] 90%|█████████ | 5977/6640 [1:32:16<1:00:53, 5.51s/it] {'loss': 0.5607, 'learning_rate': 5.184847201012355e-07, 'epoch': 0.9} 90%|█████████ | 5977/6640 [1:32:16<1:00:53, 5.51s/it] 90%|█████████ | 5978/6640 [1:32:21<1:00:29, 5.48s/it] {'loss': 0.5641, 'learning_rate': 5.169354497873402e-07, 'epoch': 0.9} 90%|█████████ | 5978/6640 [1:32:21<1:00:29, 5.48s/it] 90%|█████████ | 5979/6640 [1:32:27<59:52, 5.44s/it] {'loss': 0.5691, 'learning_rate': 5.153884361884098e-07, 'epoch': 0.9} 90%|█████████ | 5979/6640 [1:32:27<59:52, 5.44s/it] 90%|█████████ | 5980/6640 [1:32:32<59:31, 5.41s/it] {'loss': 0.5539, 'learning_rate': 5.138436796725942e-07, 'epoch': 0.9} 90%|█████████ | 5980/6640 [1:32:32<59:31, 5.41s/it] 90%|█████████ | 5981/6640 [1:32:37<59:05, 5.38s/it] {'loss': 0.5928, 'learning_rate': 5.123011806074995e-07, 'epoch': 0.9} 90%|█████████ | 5981/6640 [1:32:37<59:05, 5.38s/it] 90%|█████████ | 5982/6640 [1:32:43<59:36, 5.44s/it] {'loss': 0.5872, 'learning_rate': 5.107609393602019e-07, 'epoch': 0.9} 90%|█████████ | 5982/6640 [1:32:43<59:36, 5.44s/it] 90%|█████████ | 5983/6640 [1:32:48<59:51, 5.47s/it] {'loss': 0.6165, 'learning_rate': 5.092229562972373e-07, 'epoch': 0.9} 90%|█████████ | 5983/6640 [1:32:48<59:51, 
5.47s/it] 90%|█████████ | 5984/6640 [1:32:54<59:13, 5.42s/it] {'loss': 0.5908, 'learning_rate': 5.076872317846027e-07, 'epoch': 0.9} 90%|█████████ | 5984/6640 [1:32:54<59:13, 5.42s/it] 90%|█████████ | 5985/6640 [1:32:59<59:31, 5.45s/it] {'loss': 0.5621, 'learning_rate': 5.061537661877636e-07, 'epoch': 0.9} 90%|█████████ | 5985/6640 [1:32:59<59:31, 5.45s/it] 90%|█████████ | 5986/6640 [1:33:05<59:34, 5.47s/it] {'loss': 0.5647, 'learning_rate': 5.04622559871637e-07, 'epoch': 0.9} 90%|█████████ | 5986/6640 [1:33:05<59:34, 5.47s/it] 90%|█████████ | 5987/6640 [1:33:11<1:00:47, 5.59s/it] {'loss': 0.5793, 'learning_rate': 5.030936132006159e-07, 'epoch': 0.9} 90%|█████████ | 5987/6640 [1:33:11<1:00:47, 5.59s/it] 90%|█████████ | 5988/6640 [1:33:16<1:00:37, 5.58s/it] {'loss': 0.6094, 'learning_rate': 5.015669265385436e-07, 'epoch': 0.9} 90%|█████████ | 5988/6640 [1:33:16<1:00:37, 5.58s/it] 90%|█████████ | 5989/6640 [1:33:22<1:00:14, 5.55s/it] {'loss': 0.5718, 'learning_rate': 5.000425002487342e-07, 'epoch': 0.9} 90%|█████████ | 5989/6640 [1:33:22<1:00:14, 5.55s/it] 90%|█████████ | 5990/6640 [1:33:27<1:00:25, 5.58s/it] {'loss': 0.5905, 'learning_rate': 4.98520334693956e-07, 'epoch': 0.9} 90%|█████████ | 5990/6640 [1:33:27<1:00:25, 5.58s/it] 90%|█████████ | 5991/6640 [1:33:33<59:54, 5.54s/it] {'loss': 0.5977, 'learning_rate': 4.970004302364473e-07, 'epoch': 0.9} 90%|█████████ | 5991/6640 [1:33:33<59:54, 5.54s/it] 90%|█████████ | 5992/6640 [1:33:38<59:23, 5.50s/it] {'loss': 0.5557, 'learning_rate': 4.954827872379031e-07, 'epoch': 0.9} 90%|█████████ | 5992/6640 [1:33:38<59:23, 5.50s/it] 90%|█████████ | 5993/6640 [1:33:43<58:39, 5.44s/it] {'loss': 0.5802, 'learning_rate': 4.939674060594813e-07, 'epoch': 0.9} 90%|█████████ | 5993/6640 [1:33:43<58:39, 5.44s/it] 90%|█████████ | 5994/6640 [1:33:49<58:50, 5.47s/it] {'loss': 0.5764, 'learning_rate': 4.924542870618043e-07, 'epoch': 0.9} 90%|█████████ | 5994/6640 [1:33:49<58:50, 5.47s/it] 90%|█████████ | 5995/6640 [1:33:54<58:14, 
5.42s/it] {'loss': 0.5712, 'learning_rate': 4.909434306049487e-07, 'epoch': 0.9} 90%|█████████ | 5995/6640 [1:33:54<58:14, 5.42s/it] 90%|█████████ | 5996/6640 [1:34:00<57:44, 5.38s/it] {'loss': 0.5572, 'learning_rate': 4.894348370484648e-07, 'epoch': 0.9} 90%|█████████ | 5996/6640 [1:34:00<57:44, 5.38s/it] 90%|█████████ | 5997/6640 [1:34:05<58:12, 5.43s/it] {'loss': 0.594, 'learning_rate': 4.879285067513518e-07, 'epoch': 0.9} 90%|█████████ | 5997/6640 [1:34:05<58:12, 5.43s/it] 90%|█████████ | 5998/6640 [1:34:11<58:12, 5.44s/it] {'loss': 0.5541, 'learning_rate': 4.864244400720797e-07, 'epoch': 0.9} 90%|█████████ | 5998/6640 [1:34:11<58:12, 5.44s/it] 90%|█████████ | 5999/6640 [1:34:16<57:31, 5.39s/it] {'loss': 0.5604, 'learning_rate': 4.849226373685723e-07, 'epoch': 0.9} 90%|█████████ | 5999/6640 [1:34:16<57:31, 5.39s/it]7 AutoResumeHook: Checking whether to suspend... 04 3AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 90%|█████████ | 6000/6640 [1:34:21<57:24, 5.38s/it]6 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.564, 'learning_rate': 4.834230989982214e-07, 'epoch': 0.9} 90%|█████████ | 6000/6640 [1:34:21<57:24, 5.38s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 90%|█████████ | 6001/6640 [1:34:48<2:04:39, 11.71s/it] {'loss': 0.5622, 'learning_rate': 4.819258253178749e-07, 'epoch': 0.9} 90%|█████████ | 6001/6640 [1:34:48<2:04:39, 11.71s/it] 90%|█████████ | 6002/6640 [1:34:53<1:44:15, 9.80s/it] {'loss': 0.5632, 'learning_rate': 4.804308166838445e-07, 'epoch': 0.9} 90%|█████████ | 6002/6640 [1:34:53<1:44:15, 9.80s/it] 90%|█████████ | 6003/6640 [1:34:58<1:29:43, 8.45s/it] {'loss': 0.5689, 'learning_rate': 4.789380734519045e-07, 'epoch': 0.9} 90%|█████████ | 6003/6640 [1:34:58<1:29:43, 8.45s/it] 90%|█████████ | 6004/6640 [1:35:04<1:19:52, 7.54s/it] {'loss': 0.5797, 'learning_rate': 4.774475959772818e-07, 'epoch': 0.9} 90%|█████████ | 6004/6640 [1:35:04<1:19:52, 7.54s/it] 90%|█████████ | 6005/6640 [1:35:09<1:12:45, 6.87s/it] {'loss': 0.5863, 'learning_rate': 4.7595938461467706e-07, 'epoch': 0.9} 90%|█████████ | 6005/6640 [1:35:09<1:12:45, 6.87s/it] 90%|█████████ | 6006/6640 [1:35:14<1:07:41, 6.41s/it] {'loss': 0.5644, 'learning_rate': 4.7447343971824044e-07, 'epoch': 0.9} 90%|█████████ | 6006/6640 [1:35:14<1:07:41, 6.41s/it] 90%|█████████ | 
6007/6640 [1:35:20<1:05:37, 6.22s/it] {'loss': 0.5522, 'learning_rate': 4.7298976164158795e-07, 'epoch': 0.9} 90%|█████████ | 6007/6640 [1:35:20<1:05:37, 6.22s/it] 90%|█████████ | 6008/6640 [1:35:25<1:02:21, 5.92s/it] {'loss': 0.5979, 'learning_rate': 4.715083507377949e-07, 'epoch': 0.9} 90%|█████████ | 6008/6640 [1:35:25<1:02:21, 5.92s/it] 90%|█████████ | 6009/6640 [1:35:31<1:01:37, 5.86s/it] {'loss': 0.5667, 'learning_rate': 4.7002920735939705e-07, 'epoch': 0.9} 90%|█████████ | 6009/6640 [1:35:31<1:01:37, 5.86s/it] 91%|█████████ | 6010/6640 [1:35:36<59:37, 5.68s/it] {'loss': 0.5953, 'learning_rate': 4.6855233185839175e-07, 'epoch': 0.91} 91%|█████████ | 6010/6640 [1:35:36<59:37, 5.68s/it] 91%|█████████ | 6011/6640 [1:35:42<59:05, 5.64s/it] {'loss': 0.5857, 'learning_rate': 4.6707772458623566e-07, 'epoch': 0.91} 91%|█████████ | 6011/6640 [1:35:42<59:05, 5.64s/it] 91%|█████████ | 6012/6640 [1:35:47<58:49, 5.62s/it] {'loss': 0.5633, 'learning_rate': 4.6560538589384585e-07, 'epoch': 0.91} 91%|█████████ | 6012/6640 [1:35:47<58:49, 5.62s/it] 91%|█████████ | 6013/6640 [1:35:53<57:59, 5.55s/it] {'loss': 0.5894, 'learning_rate': 4.6413531613159755e-07, 'epoch': 0.91} 91%|█████████ | 6013/6640 [1:35:53<57:59, 5.55s/it] 91%|█████████ | 6014/6640 [1:35:59<58:26, 5.60s/it] {'loss': 0.5953, 'learning_rate': 4.62667515649331e-07, 'epoch': 0.91} 91%|█████████ | 6014/6640 [1:35:59<58:26, 5.60s/it] 91%|█████████ | 6015/6640 [1:36:04<57:50, 5.55s/it] {'loss': 0.6021, 'learning_rate': 4.6120198479634117e-07, 'epoch': 0.91} 91%|█████████ | 6015/6640 [1:36:04<57:50, 5.55s/it] 91%|█████████ | 6016/6640 [1:36:10<57:32, 5.53s/it] {'loss': 0.5626, 'learning_rate': 4.5973872392138686e-07, 'epoch': 0.91} 91%|█████████ | 6016/6640 [1:36:10<57:32, 5.53s/it] 91%|█████████ | 6017/6640 [1:36:15<57:19, 5.52s/it] {'loss': 0.5804, 'learning_rate': 4.5827773337268177e-07, 'epoch': 0.91} 91%|█████████ | 6017/6640 [1:36:15<57:19, 5.52s/it] 91%|█████████ | 6018/6640 [1:36:20<56:35, 5.46s/it] {'loss': 
0.5647, 'learning_rate': 4.5681901349790556e-07, 'epoch': 0.91} 91%|█████████ | 6018/6640 [1:36:20<56:35, 5.46s/it] 91%|█████████ | 6019/6640 [1:36:26<55:57, 5.41s/it] {'loss': 0.5826, 'learning_rate': 4.553625646441928e-07, 'epoch': 0.91} 91%|█████████ | 6019/6640 [1:36:26<55:57, 5.41s/it] 91%|█████████ | 6020/6640 [1:36:31<55:57, 5.41s/it] {'loss': 0.5612, 'learning_rate': 4.5390838715813956e-07, 'epoch': 0.91} 91%|█████████ | 6020/6640 [1:36:31<55:57, 5.41s/it] 91%|█████████ | 6021/6640 [1:36:37<56:00, 5.43s/it] {'loss': 0.5694, 'learning_rate': 4.5245648138580234e-07, 'epoch': 0.91} 91%|█████████ | 6021/6640 [1:36:37<56:00, 5.43s/it] 91%|█████████ | 6022/6640 [1:36:42<55:47, 5.42s/it] {'loss': 0.5623, 'learning_rate': 4.5100684767269256e-07, 'epoch': 0.91} 91%|█████████ | 6022/6640 [1:36:42<55:47, 5.42s/it] 91%|█████████ | 6023/6640 [1:36:47<55:30, 5.40s/it] {'loss': 0.5506, 'learning_rate': 4.495594863637875e-07, 'epoch': 0.91} 91%|█████████ | 6023/6640 [1:36:47<55:30, 5.40s/it] 91%|█████████ | 6024/6640 [1:36:53<55:05, 5.37s/it] {'loss': 0.5375, 'learning_rate': 4.481143978035196e-07, 'epoch': 0.91} 91%|█████████ | 6024/6640 [1:36:53<55:05, 5.37s/it] 91%|█████████ | 6025/6640 [1:36:58<55:27, 5.41s/it] {'loss': 0.5868, 'learning_rate': 4.4667158233577925e-07, 'epoch': 0.91} 91%|█████████ | 6025/6640 [1:36:58<55:27, 5.41s/it] 91%|█████████ | 6026/6640 [1:37:04<56:01, 5.48s/it] {'loss': 0.5892, 'learning_rate': 4.452310403039217e-07, 'epoch': 0.91} 91%|█████████ | 6026/6640 [1:37:04<56:01, 5.48s/it] 91%|█████████ | 6027/6640 [1:37:09<56:45, 5.56s/it] {'loss': 0.5758, 'learning_rate': 4.4379277205075175e-07, 'epoch': 0.91} 91%|█████████ | 6027/6640 [1:37:09<56:45, 5.56s/it] 91%|█████████ | 6028/6640 [1:37:15<56:29, 5.54s/it] {'loss': 0.5909, 'learning_rate': 4.4235677791854445e-07, 'epoch': 0.91} 91%|█████████ | 6028/6640 [1:37:15<56:29, 5.54s/it] 91%|█████████ | 6029/6640 [1:37:21<56:44, 5.57s/it] {'loss': 0.5429, 'learning_rate': 4.4092305824902317e-07, 
'epoch': 0.91} 91%|█████████ | 6029/6640 [1:37:21<56:44, 5.57s/it] 91%|█████████ | 6030/6640 [1:37:26<55:54, 5.50s/it] {'loss': 0.5949, 'learning_rate': 4.394916133833782e-07, 'epoch': 0.91} 91%|█████████ | 6030/6640 [1:37:26<55:54, 5.50s/it] 91%|█████████ | 6031/6640 [1:37:31<55:44, 5.49s/it] {'loss': 0.5681, 'learning_rate': 4.380624436622516e-07, 'epoch': 0.91} 91%|█████████ | 6031/6640 [1:37:31<55:44, 5.49s/it] 91%|█████████ | 6032/6640 [1:37:37<55:59, 5.53s/it] {'loss': 0.5815, 'learning_rate': 4.3663554942575235e-07, 'epoch': 0.91} 91%|█████████ | 6032/6640 [1:37:37<55:59, 5.53s/it] 91%|█████████ | 6033/6640 [1:37:42<55:33, 5.49s/it] {'loss': 0.5572, 'learning_rate': 4.352109310134378e-07, 'epoch': 0.91} 91%|█████████ | 6033/6640 [1:37:42<55:33, 5.49s/it] 91%|█████████ | 6034/6640 [1:37:48<55:03, 5.45s/it] {'loss': 0.5935, 'learning_rate': 4.3378858876433117e-07, 'epoch': 0.91} 91%|█████████ | 6034/6640 [1:37:48<55:03, 5.45s/it] 91%|█████████ | 6035/6640 [1:37:53<54:34, 5.41s/it] {'loss': 0.5578, 'learning_rate': 4.323685230169128e-07, 'epoch': 0.91} 91%|█████████ | 6035/6640 [1:37:53<54:34, 5.41s/it] 91%|█████████ | 6036/6640 [1:37:58<54:05, 5.37s/it] {'loss': 0.5707, 'learning_rate': 4.309507341091179e-07, 'epoch': 0.91} 91%|█████████ | 6036/6640 [1:37:58<54:05, 5.37s/it] 91%|█████████ | 6037/6640 [1:38:04<53:52, 5.36s/it] {'loss': 0.5596, 'learning_rate': 4.295352223783444e-07, 'epoch': 0.91} 91%|█████████ | 6037/6640 [1:38:04<53:52, 5.36s/it] 91%|█████████ | 6038/6640 [1:38:09<53:49, 5.37s/it] {'loss': 0.5826, 'learning_rate': 4.281219881614451e-07, 'epoch': 0.91} 91%|█████████ | 6038/6640 [1:38:09<53:49, 5.37s/it] 91%|█████████ | 6039/6640 [1:38:14<53:36, 5.35s/it] {'loss': 0.5782, 'learning_rate': 4.267110317947332e-07, 'epoch': 0.91} 91%|█████████ | 6039/6640 [1:38:14<53:36, 5.35s/it] 91%|█████████ | 6040/6640 [1:38:20<53:33, 5.36s/it] {'loss': 0.5868, 'learning_rate': 4.253023536139733e-07, 'epoch': 0.91} 91%|█████████ | 6040/6640 [1:38:20<53:33, 
5.36s/it] 91%|█████████ | 6041/6640 [1:38:25<53:35, 5.37s/it] {'loss': 0.5852, 'learning_rate': 4.2389595395440074e-07, 'epoch': 0.91} 91%|█████████ | 6041/6640 [1:38:25<53:35, 5.37s/it] 91%|█████████ | 6042/6640 [1:38:31<53:54, 5.41s/it] {'loss': 0.6051, 'learning_rate': 4.224918331506955e-07, 'epoch': 0.91} 91%|█████████ | 6042/6640 [1:38:31<53:54, 5.41s/it] 91%|█████████ | 6043/6640 [1:38:36<53:39, 5.39s/it] {'loss': 0.5591, 'learning_rate': 4.2108999153700257e-07, 'epoch': 0.91} 91%|█████████ | 6043/6640 [1:38:36<53:39, 5.39s/it] 91%|█████████ | 6044/6640 [1:38:42<54:23, 5.48s/it] {'loss': 0.6087, 'learning_rate': 4.1969042944692284e-07, 'epoch': 0.91} 91%|█████████ | 6044/6640 [1:38:42<54:23, 5.48s/it] 91%|█████████ | 6045/6640 [1:38:47<54:07, 5.46s/it] {'loss': 0.5556, 'learning_rate': 4.1829314721351213e-07, 'epoch': 0.91} 91%|█████████ | 6045/6640 [1:38:47<54:07, 5.46s/it] 91%|█████████ | 6046/6640 [1:38:53<54:23, 5.49s/it] {'loss': 0.5796, 'learning_rate': 4.1689814516928994e-07, 'epoch': 0.91} 91%|█████████ | 6046/6640 [1:38:53<54:23, 5.49s/it] 91%|█████████ | 6047/6640 [1:39:00<59:55, 6.06s/it] {'loss': 0.5747, 'learning_rate': 4.155054236462264e-07, 'epoch': 0.91} 91%|█████████ | 6047/6640 [1:39:00<59:55, 6.06s/it] 91%|█████████ | 6048/6640 [1:39:06<58:09, 5.90s/it] {'loss': 0.5814, 'learning_rate': 4.14114982975754e-07, 'epoch': 0.91} 91%|█████████ | 6048/6640 [1:39:06<58:09, 5.90s/it] 91%|█████████ | 6049/6640 [1:39:11<56:36, 5.75s/it] {'loss': 0.5811, 'learning_rate': 4.1272682348875604e-07, 'epoch': 0.91} 91%|█████████ | 6049/6640 [1:39:11<56:36, 5.75s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 051 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 
91%|█████████ | 6050/6640 [1:39:17<55:58, 5.69s/it]4 AutoResumeHook: Checking whether to suspend... {'loss': 0.6015, 'learning_rate': 4.113409455155837e-07, 'epoch': 0.91} 91%|█████████ | 6050/6640 [1:39:17<55:58, 5.69s/it] 91%|█████████ | 6051/6640 [1:39:22<55:05, 5.61s/it] {'loss': 0.5694, 'learning_rate': 4.099573493860343e-07, 'epoch': 0.91} 91%|█████████ | 6051/6640 [1:39:22<55:05, 5.61s/it] 91%|█████████ | 6052/6640 [1:39:27<54:10, 5.53s/it] {'loss': 0.5704, 'learning_rate': 4.0857603542936776e-07, 'epoch': 0.91} 91%|█████████ | 6052/6640 [1:39:27<54:10, 5.53s/it] 91%|█████████ | 6053/6640 [1:39:33<53:51, 5.51s/it] {'loss': 0.5725, 'learning_rate': 4.0719700397430116e-07, 'epoch': 0.91} 91%|█████████ | 6053/6640 [1:39:33<53:51, 5.51s/it] 91%|█████████ | 6054/6640 [1:39:38<53:51, 5.51s/it] {'loss': 0.5861, 'learning_rate': 4.05820255349002e-07, 'epoch': 0.91} 91%|█████████ | 6054/6640 [1:39:38<53:51, 5.51s/it] 91%|█████████ | 6055/6640 [1:39:44<53:27, 5.48s/it] {'loss': 0.5853, 'learning_rate': 4.0444578988110715e-07, 'epoch': 0.91} 91%|█████████ | 6055/6640 [1:39:44<53:27, 5.48s/it] 91%|█████████ | 6056/6640 [1:39:49<53:14, 5.47s/it] {'loss': 0.5764, 'learning_rate': 4.0307360789769603e-07, 'epoch': 0.91} 91%|█████████ | 6056/6640 [1:39:49<53:14, 5.47s/it] 91%|█████████ | 6057/6640 [1:39:54<52:28, 5.40s/it] {'loss': 0.5597, 'learning_rate': 4.017037097253151e-07, 'epoch': 0.91} 91%|█████████ | 6057/6640 [1:39:54<52:28, 5.40s/it] 91%|█████████ | 6058/6640 [1:40:00<53:12, 5.49s/it] {'loss': 0.5422, 'learning_rate': 4.0033609568995934e-07, 'epoch': 0.91} 91%|█████████ | 6058/6640 [1:40:00<53:12, 5.49s/it] 91%|█████████▏| 6059/6640 [1:40:05<52:59, 5.47s/it] {'loss': 0.5899, 'learning_rate': 3.989707661170894e-07, 'epoch': 0.91} 91%|█████████▏| 6059/6640 [1:40:05<52:59, 5.47s/it] 91%|█████████▏| 6060/6640 [1:40:11<52:10, 5.40s/it] {'loss': 0.5558, 'learning_rate': 3.976077213316132e-07, 'epoch': 0.91} 91%|█████████▏| 6060/6640 [1:40:11<52:10, 5.40s/it] 
91%|█████████▏| 6061/6640 [1:40:16<52:09, 5.41s/it] {'loss': 0.5544, 'learning_rate': 3.96246961657899e-07, 'epoch': 0.91} 91%|█████████▏| 6061/6640 [1:40:16<52:09, 5.41s/it] 91%|█████████▏| 6062/6640 [1:40:21<51:23, 5.33s/it] {'loss': 0.58, 'learning_rate': 3.9488848741977447e-07, 'epoch': 0.91} 91%|█████████▏| 6062/6640 [1:40:21<51:23, 5.33s/it] 91%|█████████▏| 6063/6640 [1:40:27<51:48, 5.39s/it] {'loss': 0.5845, 'learning_rate': 3.935322989405155e-07, 'epoch': 0.91} 91%|█████████▏| 6063/6640 [1:40:27<51:48, 5.39s/it] 91%|█████████▏| 6064/6640 [1:40:32<52:01, 5.42s/it] {'loss': 0.5683, 'learning_rate': 3.9217839654286274e-07, 'epoch': 0.91} 91%|█████████▏| 6064/6640 [1:40:32<52:01, 5.42s/it] 91%|█████████▏| 6065/6640 [1:40:38<51:34, 5.38s/it] {'loss': 0.5818, 'learning_rate': 3.908267805490051e-07, 'epoch': 0.91} 91%|█████████▏| 6065/6640 [1:40:38<51:34, 5.38s/it] 91%|█████████▏| 6066/6640 [1:40:43<51:51, 5.42s/it] {'loss': 0.5713, 'learning_rate': 3.894774512805932e-07, 'epoch': 0.91} 91%|█████████▏| 6066/6640 [1:40:43<51:51, 5.42s/it] 91%|█████████▏| 6067/6640 [1:40:50<57:18, 6.00s/it] {'loss': 0.6017, 'learning_rate': 3.88130409058729e-07, 'epoch': 0.91} 91%|█████████▏| 6067/6640 [1:40:50<57:18, 6.00s/it] 91%|█████████▏| 6068/6640 [1:40:56<55:42, 5.84s/it] {'loss': 0.5507, 'learning_rate': 3.8678565420397606e-07, 'epoch': 0.91} 91%|█████████▏| 6068/6640 [1:40:56<55:42, 5.84s/it] 91%|█████████▏| 6069/6640 [1:41:01<53:59, 5.67s/it] {'loss': 0.5677, 'learning_rate': 3.854431870363473e-07, 'epoch': 0.91} 91%|█████████▏| 6069/6640 [1:41:01<53:59, 5.67s/it] 91%|█████████▏| 6070/6640 [1:41:09<1:00:26, 6.36s/it] {'loss': 0.5542, 'learning_rate': 3.8410300787531385e-07, 'epoch': 0.91} 91%|█████████▏| 6070/6640 [1:41:09<1:00:26, 6.36s/it] 91%|█████████▏| 6071/6640 [1:41:15<57:53, 6.10s/it] {'loss': 0.5819, 'learning_rate': 3.827651170398039e-07, 'epoch': 0.91} 91%|█████████▏| 6071/6640 [1:41:15<57:53, 6.10s/it] 91%|█████████▏| 6072/6640 [1:41:22<1:01:24, 6.49s/it] 
{'loss': 0.5672, 'learning_rate': 3.814295148481961e-07, 'epoch': 0.91} 91%|█████████▏| 6072/6640 [1:41:22<1:01:24, 6.49s/it] 91%|█████████▏| 6073/6640 [1:41:27<57:40, 6.10s/it] {'loss': 0.5618, 'learning_rate': 3.8009620161833295e-07, 'epoch': 0.91} 91%|█████████▏| 6073/6640 [1:41:27<57:40, 6.10s/it] 91%|█████████▏| 6074/6640 [1:41:33<56:01, 5.94s/it] {'loss': 0.5935, 'learning_rate': 3.7876517766750386e-07, 'epoch': 0.91} 91%|█████████▏| 6074/6640 [1:41:33<56:01, 5.94s/it] 91%|█████████▏| 6075/6640 [1:41:38<54:20, 5.77s/it] {'loss': 0.5583, 'learning_rate': 3.774364433124578e-07, 'epoch': 0.91} 91%|█████████▏| 6075/6640 [1:41:38<54:20, 5.77s/it] 92%|█████████▏| 6076/6640 [1:41:45<58:19, 6.21s/it] {'loss': 0.569, 'learning_rate': 3.7610999886939725e-07, 'epoch': 0.92} 92%|█████████▏| 6076/6640 [1:41:45<58:19, 6.21s/it] 92%|█████████▏| 6077/6640 [1:41:53<1:02:13, 6.63s/it] {'loss': 0.5604, 'learning_rate': 3.7478584465398206e-07, 'epoch': 0.92} 92%|█████████▏| 6077/6640 [1:41:53<1:02:13, 6.63s/it] 92%|█████████▏| 6078/6640 [1:41:58<58:31, 6.25s/it] {'loss': 0.5725, 'learning_rate': 3.734639809813245e-07, 'epoch': 0.92} 92%|█████████▏| 6078/6640 [1:41:58<58:31, 6.25s/it] 92%|█████████▏| 6079/6640 [1:42:04<56:36, 6.05s/it] {'loss': 0.5568, 'learning_rate': 3.7214440816599193e-07, 'epoch': 0.92} 92%|█████████▏| 6079/6640 [1:42:04<56:36, 6.05s/it] 92%|█████████▏| 6080/6640 [1:42:10<55:11, 5.91s/it] {'loss': 0.5748, 'learning_rate': 3.708271265220087e-07, 'epoch': 0.92} 92%|█████████▏| 6080/6640 [1:42:10<55:11, 5.91s/it] 92%|█████████▏| 6081/6640 [1:42:15<53:09, 5.71s/it] {'loss': 0.5364, 'learning_rate': 3.695121363628484e-07, 'epoch': 0.92} 92%|█████████▏| 6081/6640 [1:42:15<53:09, 5.71s/it] 92%|█████████▏| 6082/6640 [1:42:20<52:20, 5.63s/it] {'loss': 0.543, 'learning_rate': 3.6819943800144975e-07, 'epoch': 0.92} 92%|█████████▏| 6082/6640 [1:42:20<52:20, 5.63s/it] 92%|█████████▏| 6083/6640 [1:42:26<52:17, 5.63s/it] {'loss': 0.5752, 'learning_rate': 
3.6688903175019607e-07, 'epoch': 0.92} 92%|█████████▏| 6083/6640 [1:42:26<52:17, 5.63s/it] 92%|█████████▏| 6084/6640 [1:42:31<51:50, 5.59s/it] {'loss': 0.5853, 'learning_rate': 3.6558091792092907e-07, 'epoch': 0.92} 92%|█████████▏| 6084/6640 [1:42:31<51:50, 5.59s/it] 92%|█████████▏| 6085/6640 [1:42:37<50:59, 5.51s/it] {'loss': 0.5739, 'learning_rate': 3.642750968249442e-07, 'epoch': 0.92} 92%|█████████▏| 6085/6640 [1:42:37<50:59, 5.51s/it] 92%|█████████▏| 6086/6640 [1:42:43<52:09, 5.65s/it] {'loss': 0.5786, 'learning_rate': 3.62971568772994e-07, 'epoch': 0.92} 92%|█████████▏| 6086/6640 [1:42:43<52:09, 5.65s/it] 92%|█████████▏| 6087/6640 [1:42:48<52:15, 5.67s/it] {'loss': 0.5737, 'learning_rate': 3.616703340752825e-07, 'epoch': 0.92} 92%|█████████▏| 6087/6640 [1:42:48<52:15, 5.67s/it] 92%|█████████▏| 6088/6640 [1:42:54<51:45, 5.63s/it] {'loss': 0.573, 'learning_rate': 3.603713930414676e-07, 'epoch': 0.92} 92%|█████████▏| 6088/6640 [1:42:54<51:45, 5.63s/it] 92%|█████████▏| 6089/6640 [1:42:59<50:58, 5.55s/it] {'loss': 0.5973, 'learning_rate': 3.5907474598066315e-07, 'epoch': 0.92} 92%|█████████▏| 6089/6640 [1:42:59<50:58, 5.55s/it] 92%|█████████▏| 6090/6640 [1:43:05<49:54, 5.44s/it] {'loss': 0.5589, 'learning_rate': 3.5778039320143456e-07, 'epoch': 0.92} 92%|█████████▏| 6090/6640 [1:43:05<49:54, 5.44s/it] 92%|█████████▏| 6091/6640 [1:43:10<49:27, 5.41s/it] {'loss': 0.562, 'learning_rate': 3.5648833501180645e-07, 'epoch': 0.92} 92%|█████████▏| 6091/6640 [1:43:10<49:27, 5.41s/it] 92%|█████████▏| 6092/6640 [1:43:15<49:54, 5.46s/it] {'loss': 0.5786, 'learning_rate': 3.551985717192519e-07, 'epoch': 0.92} 92%|█████████▏| 6092/6640 [1:43:15<49:54, 5.46s/it] 92%|█████████▏| 6093/6640 [1:43:21<49:20, 5.41s/it] {'loss': 0.58, 'learning_rate': 3.539111036306997e-07, 'epoch': 0.92} 92%|█████████▏| 6093/6640 [1:43:21<49:20, 5.41s/it] 92%|█████████▏| 6094/6640 [1:43:26<49:26, 5.43s/it] {'loss': 0.5577, 'learning_rate': 3.5262593105253374e-07, 'epoch': 0.92} 92%|█████████▏| 
6094/6640 [1:43:26<49:26, 5.43s/it] 92%|█████████▏| 6095/6640 [1:43:32<49:29, 5.45s/it] {'loss': 0.5919, 'learning_rate': 3.5134305429058935e-07, 'epoch': 0.92} 92%|█████████▏| 6095/6640 [1:43:32<49:29, 5.45s/it] 92%|█████████▏| 6096/6640 [1:43:37<49:13, 5.43s/it] {'loss': 0.5694, 'learning_rate': 3.50062473650159e-07, 'epoch': 0.92} 92%|█████████▏| 6096/6640 [1:43:37<49:13, 5.43s/it] 92%|█████████▏| 6097/6640 [1:43:42<48:50, 5.40s/it] {'loss': 0.5812, 'learning_rate': 3.4878418943598446e-07, 'epoch': 0.92} 92%|█████████▏| 6097/6640 [1:43:42<48:50, 5.40s/it] 92%|█████████▏| 6098/6640 [1:43:48<49:07, 5.44s/it] {'loss': 0.5872, 'learning_rate': 3.475082019522646e-07, 'epoch': 0.92} 92%|█████████▏| 6098/6640 [1:43:48<49:07, 5.44s/it] 92%|█████████▏| 6099/6640 [1:43:53<49:15, 5.46s/it] {'loss': 0.5602, 'learning_rate': 3.462345115026478e-07, 'epoch': 0.92} 92%|█████████▏| 6099/6640 [1:43:53<49:15, 5.46s/it]3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 92%|█████████▏| 6100/6640 [1:43:59<49:04, 5.45s/it]6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5858, 'learning_rate': 3.4496311839024133e-07, 'epoch': 0.92} 92%|█████████▏| 6100/6640 [1:43:59<49:04, 5.45s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 92%|█████████▏| 6101/6640 [1:44:27<1:49:44, 12.22s/it] {'loss': 0.577, 'learning_rate': 3.436940229176e-07, 'epoch': 0.92} 92%|█████████▏| 6101/6640 [1:44:27<1:49:44, 12.22s/it] 92%|█████████▏| 6102/6640 [1:44:32<1:31:10, 10.17s/it] {'loss': 0.5721, 'learning_rate': 3.4242722538673557e-07, 'epoch': 0.92} 92%|█████████▏| 6102/6640 [1:44:32<1:31:10, 10.17s/it] 92%|█████████▏| 6103/6640 [1:44:38<1:17:54, 8.70s/it] {'loss': 0.5525, 'learning_rate': 3.411627260991124e-07, 'epoch': 0.92} 92%|█████████▏| 6103/6640 [1:44:38<1:17:54, 8.70s/it] 92%|█████████▏| 6104/6640 [1:44:43<1:08:54, 7.71s/it] {'loss': 0.5608, 'learning_rate': 3.3990052535564644e-07, 'epoch': 0.92} 92%|█████████▏| 6104/6640 [1:44:43<1:08:54, 7.71s/it] 92%|█████████▏| 6105/6640 [1:44:48<1:02:25, 7.00s/it] {'loss': 0.577, 'learning_rate': 3.386406234567086e-07, 'epoch': 0.92} 92%|█████████▏| 6105/6640 [1:44:48<1:02:25, 7.00s/it] 92%|█████████▏| 6106/6640 [1:44:54<58:00, 6.52s/it] {'loss': 0.5805, 'learning_rate': 3.3738302070212114e-07, 'epoch': 0.92} 92%|█████████▏| 6106/6640 [1:44:54<58:00, 6.52s/it] 92%|█████████▏| 
6107/6640 [1:44:59<54:42, 6.16s/it] {'loss': 0.5685, 'learning_rate': 3.3612771739116035e-07, 'epoch': 0.92} 92%|█████████▏| 6107/6640 [1:44:59<54:42, 6.16s/it] 92%|█████████▏| 6108/6640 [1:45:04<52:43, 5.95s/it] {'loss': 0.5971, 'learning_rate': 3.3487471382255277e-07, 'epoch': 0.92} 92%|█████████▏| 6108/6640 [1:45:04<52:43, 5.95s/it] 92%|█████████▏| 6109/6640 [1:45:10<50:58, 5.76s/it] {'loss': 0.5666, 'learning_rate': 3.336240102944821e-07, 'epoch': 0.92} 92%|█████████▏| 6109/6640 [1:45:10<50:58, 5.76s/it] 92%|█████████▏| 6110/6640 [1:45:15<49:47, 5.64s/it] {'loss': 0.5474, 'learning_rate': 3.3237560710458137e-07, 'epoch': 0.92} 92%|█████████▏| 6110/6640 [1:45:15<49:47, 5.64s/it] 92%|█████████▏| 6111/6640 [1:45:21<49:20, 5.60s/it] {'loss': 0.583, 'learning_rate': 3.3112950454993633e-07, 'epoch': 0.92} 92%|█████████▏| 6111/6640 [1:45:21<49:20, 5.60s/it] 92%|█████████▏| 6112/6640 [1:45:26<49:27, 5.62s/it] {'loss': 0.5726, 'learning_rate': 3.298857029270863e-07, 'epoch': 0.92} 92%|█████████▏| 6112/6640 [1:45:26<49:27, 5.62s/it] 92%|█████████▏| 6113/6640 [1:45:32<49:48, 5.67s/it] {'loss': 0.5858, 'learning_rate': 3.2864420253202357e-07, 'epoch': 0.92} 92%|█████████▏| 6113/6640 [1:45:32<49:48, 5.67s/it] 92%|█████████▏| 6114/6640 [1:45:38<49:25, 5.64s/it] {'loss': 0.5669, 'learning_rate': 3.274050036601917e-07, 'epoch': 0.92} 92%|█████████▏| 6114/6640 [1:45:38<49:25, 5.64s/it] 92%|█████████▏| 6115/6640 [1:45:43<48:38, 5.56s/it] {'loss': 0.5923, 'learning_rate': 3.261681066064859e-07, 'epoch': 0.92} 92%|█████████▏| 6115/6640 [1:45:43<48:38, 5.56s/it] 92%|█████████▏| 6116/6640 [1:45:48<48:07, 5.51s/it] {'loss': 0.5762, 'learning_rate': 3.2493351166525413e-07, 'epoch': 0.92} 92%|█████████▏| 6116/6640 [1:45:48<48:07, 5.51s/it] 92%|█████████▏| 6117/6640 [1:45:54<47:46, 5.48s/it] {'loss': 0.5505, 'learning_rate': 3.2370121913029796e-07, 'epoch': 0.92} 92%|█████████▏| 6117/6640 [1:45:54<47:46, 5.48s/it] 92%|█████████▏| 6118/6640 [1:45:59<47:24, 5.45s/it] {'loss': 0.5782, 
'learning_rate': 3.224712292948695e-07, 'epoch': 0.92} 92%|█████████▏| 6118/6640 [1:45:59<47:24, 5.45s/it] 92%|█████████▏| 6119/6640 [1:46:05<47:12, 5.44s/it] {'loss': 0.592, 'learning_rate': 3.212435424516758e-07, 'epoch': 0.92} 92%|█████████▏| 6119/6640 [1:46:05<47:12, 5.44s/it] 92%|█████████▏| 6120/6640 [1:46:10<47:08, 5.44s/it] {'loss': 0.5714, 'learning_rate': 3.2001815889286856e-07, 'epoch': 0.92} 92%|█████████▏| 6120/6640 [1:46:10<47:08, 5.44s/it] 92%|█████████▏| 6121/6640 [1:46:16<47:20, 5.47s/it] {'loss': 0.5707, 'learning_rate': 3.1879507891005914e-07, 'epoch': 0.92} 92%|█████████▏| 6121/6640 [1:46:16<47:20, 5.47s/it] 92%|█████████▏| 6122/6640 [1:46:21<46:56, 5.44s/it] {'loss': 0.5602, 'learning_rate': 3.175743027943079e-07, 'epoch': 0.92} 92%|█████████▏| 6122/6640 [1:46:21<46:56, 5.44s/it] 92%|█████████▏| 6123/6640 [1:46:27<47:31, 5.52s/it] {'loss': 0.5962, 'learning_rate': 3.16355830836127e-07, 'epoch': 0.92} 92%|█████████▏| 6123/6640 [1:46:27<47:31, 5.52s/it] 92%|█████████▏| 6124/6640 [1:46:32<47:27, 5.52s/it] {'loss': 0.5732, 'learning_rate': 3.1513966332547887e-07, 'epoch': 0.92} 92%|█████████▏| 6124/6640 [1:46:32<47:27, 5.52s/it] 92%|█████████▏| 6125/6640 [1:46:38<47:02, 5.48s/it] {'loss': 0.5845, 'learning_rate': 3.1392580055177867e-07, 'epoch': 0.92} 92%|█████████▏| 6125/6640 [1:46:38<47:02, 5.48s/it] 92%|█████████▏| 6126/6640 [1:46:43<46:09, 5.39s/it] {'loss': 0.5938, 'learning_rate': 3.1271424280389317e-07, 'epoch': 0.92} 92%|█████████▏| 6126/6640 [1:46:43<46:09, 5.39s/it] 92%|█████████▏| 6127/6640 [1:46:48<45:39, 5.34s/it] {'loss': 0.5872, 'learning_rate': 3.115049903701417e-07, 'epoch': 0.92} 92%|█████████▏| 6127/6640 [1:46:48<45:39, 5.34s/it] 92%|█████████▏| 6128/6640 [1:46:53<45:45, 5.36s/it] {'loss': 0.5567, 'learning_rate': 3.1029804353829517e-07, 'epoch': 0.92} 92%|█████████▏| 6128/6640 [1:46:53<45:45, 5.36s/it] 92%|█████████▏| 6129/6640 [1:46:59<45:47, 5.38s/it] {'loss': 0.5722, 'learning_rate': 3.0909340259557055e-07, 'epoch': 0.92} 
92%|█████████▏| 6129/6640 [1:46:59<45:47, 5.38s/it] 92%|█████████▏| 6130/6640 [1:47:04<45:42, 5.38s/it] {'loss': 0.5871, 'learning_rate': 3.0789106782864285e-07, 'epoch': 0.92} 92%|█████████▏| 6130/6640 [1:47:04<45:42, 5.38s/it] 92%|█████████▏| 6131/6640 [1:47:10<45:57, 5.42s/it] {'loss': 0.5581, 'learning_rate': 3.0669103952363557e-07, 'epoch': 0.92} 92%|█████████▏| 6131/6640 [1:47:10<45:57, 5.42s/it] 92%|█████████▏| 6132/6640 [1:47:15<45:54, 5.42s/it] {'loss': 0.571, 'learning_rate': 3.0549331796612237e-07, 'epoch': 0.92} 92%|█████████▏| 6132/6640 [1:47:15<45:54, 5.42s/it] 92%|█████████▏| 6133/6640 [1:47:21<45:48, 5.42s/it] {'loss': 0.5905, 'learning_rate': 3.042979034411286e-07, 'epoch': 0.92} 92%|█████████▏| 6133/6640 [1:47:21<45:48, 5.42s/it] 92%|█████████▏| 6134/6640 [1:47:26<45:36, 5.41s/it] {'loss': 0.5675, 'learning_rate': 3.0310479623313125e-07, 'epoch': 0.92} 92%|█████████▏| 6134/6640 [1:47:26<45:36, 5.41s/it] 92%|█████████▏| 6135/6640 [1:47:32<46:16, 5.50s/it] {'loss': 0.5872, 'learning_rate': 3.019139966260587e-07, 'epoch': 0.92} 92%|█████████▏| 6135/6640 [1:47:32<46:16, 5.50s/it] 92%|█████████▏| 6136/6640 [1:47:37<45:43, 5.44s/it] {'loss': 0.5758, 'learning_rate': 3.0072550490328754e-07, 'epoch': 0.92} 92%|█████████▏| 6136/6640 [1:47:37<45:43, 5.44s/it] 92%|█████████▏| 6137/6640 [1:47:42<45:35, 5.44s/it] {'loss': 0.573, 'learning_rate': 2.995393213476494e-07, 'epoch': 0.92} 92%|█████████▏| 6137/6640 [1:47:42<45:35, 5.44s/it] 92%|█████████▏| 6138/6640 [1:47:48<45:13, 5.41s/it] {'loss': 0.5745, 'learning_rate': 2.983554462414218e-07, 'epoch': 0.92} 92%|█████████▏| 6138/6640 [1:47:48<45:13, 5.41s/it] 92%|█████████▏| 6139/6640 [1:47:53<45:06, 5.40s/it] {'loss': 0.5857, 'learning_rate': 2.971738798663348e-07, 'epoch': 0.92} 92%|█████████▏| 6139/6640 [1:47:53<45:06, 5.40s/it] 92%|█████████▏| 6140/6640 [1:47:59<46:13, 5.55s/it] {'loss': 0.5743, 'learning_rate': 2.959946225035726e-07, 'epoch': 0.92} 92%|█████████▏| 6140/6640 [1:47:59<46:13, 5.55s/it] 
92%|█████████▏| 6141/6640 [1:48:04<45:43, 5.50s/it] {'loss': 0.571, 'learning_rate': 2.9481767443376496e-07, 'epoch': 0.92} 92%|█████████▏| 6141/6640 [1:48:04<45:43, 5.50s/it] 92%|█████████▎| 6142/6640 [1:48:10<45:36, 5.50s/it] {'loss': 0.5665, 'learning_rate': 2.936430359369935e-07, 'epoch': 0.93} 92%|█████████▎| 6142/6640 [1:48:10<45:36, 5.50s/it] 93%|█████████▎| 6143/6640 [1:48:15<45:29, 5.49s/it] {'loss': 0.5711, 'learning_rate': 2.9247070729279017e-07, 'epoch': 0.93} 93%|█████████▎| 6143/6640 [1:48:15<45:29, 5.49s/it] 93%|█████████▎| 6144/6640 [1:48:21<44:46, 5.42s/it] {'loss': 0.5929, 'learning_rate': 2.913006887801395e-07, 'epoch': 0.93} 93%|█████████▎| 6144/6640 [1:48:21<44:46, 5.42s/it] 93%|█████████▎| 6145/6640 [1:48:26<45:14, 5.48s/it] {'loss': 0.5718, 'learning_rate': 2.901329806774744e-07, 'epoch': 0.93} 93%|█████████▎| 6145/6640 [1:48:26<45:14, 5.48s/it] 93%|█████████▎| 6146/6640 [1:48:32<45:19, 5.51s/it] {'loss': 0.556, 'learning_rate': 2.8896758326267703e-07, 'epoch': 0.93} 93%|█████████▎| 6146/6640 [1:48:32<45:19, 5.51s/it] 93%|█████████▎| 6147/6640 [1:48:37<44:49, 5.45s/it] {'loss': 0.6051, 'learning_rate': 2.878044968130811e-07, 'epoch': 0.93} 93%|█████████▎| 6147/6640 [1:48:37<44:49, 5.45s/it] 93%|█████████▎| 6148/6640 [1:48:43<45:07, 5.50s/it] {'loss': 0.5697, 'learning_rate': 2.866437216054696e-07, 'epoch': 0.93} 93%|█████████▎| 6148/6640 [1:48:43<45:07, 5.50s/it] 93%|█████████▎| 6149/6640 [1:48:48<44:41, 5.46s/it] {'loss': 0.5878, 'learning_rate': 2.854852579160772e-07, 'epoch': 0.93} 93%|█████████▎| 6149/6640 [1:48:48<44:41, 5.46s/it]7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 02 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 93%|█████████▎| 6150/6640 [1:48:54<44:52, 5.49s/it]6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
5 AutoResumeHook: Checking whether to suspend... {'loss': 0.5643, 'learning_rate': 2.843291060205855e-07, 'epoch': 0.93} 93%|█████████▎| 6150/6640 [1:48:54<44:52, 5.49s/it] 93%|█████████▎| 6151/6640 [1:48:59<44:52, 5.51s/it] {'loss': 0.5839, 'learning_rate': 2.8317526619412895e-07, 'epoch': 0.93} 93%|█████████▎| 6151/6640 [1:48:59<44:52, 5.51s/it] 93%|█████████▎| 6152/6640 [1:49:05<44:59, 5.53s/it] {'loss': 0.5789, 'learning_rate': 2.8202373871128897e-07, 'epoch': 0.93} 93%|█████████▎| 6152/6640 [1:49:05<44:59, 5.53s/it] 93%|█████████▎| 6153/6640 [1:49:10<45:01, 5.55s/it] {'loss': 0.583, 'learning_rate': 2.808745238460997e-07, 'epoch': 0.93} 93%|█████████▎| 6153/6640 [1:49:10<45:01, 5.55s/it] 93%|█████████▎| 6154/6640 [1:49:16<44:44, 5.52s/it] {'loss': 0.5794, 'learning_rate': 2.7972762187204237e-07, 'epoch': 0.93} 93%|█████████▎| 6154/6640 [1:49:16<44:44, 5.52s/it] 93%|█████████▎| 6155/6640 [1:49:21<44:17, 5.48s/it] {'loss': 0.5747, 'learning_rate': 2.785830330620509e-07, 'epoch': 0.93} 93%|█████████▎| 6155/6640 [1:49:21<44:17, 5.48s/it] 93%|█████████▎| 6156/6640 [1:49:27<43:54, 5.44s/it] {'loss': 0.6032, 'learning_rate': 2.774407576885041e-07, 'epoch': 0.93} 93%|█████████▎| 6156/6640 [1:49:27<43:54, 5.44s/it] 93%|█████████▎| 6157/6640 [1:49:32<43:44, 5.43s/it] {'loss': 0.5852, 'learning_rate': 2.7630079602323447e-07, 'epoch': 0.93} 93%|█████████▎| 6157/6640 [1:49:32<43:44, 5.43s/it] 93%|█████████▎| 6158/6640 [1:49:38<44:03, 5.49s/it] {'loss': 0.5747, 'learning_rate': 2.751631483375217e-07, 'epoch': 0.93} 93%|█████████▎| 6158/6640 [1:49:38<44:03, 5.49s/it] 93%|█████████▎| 6159/6640 [1:49:43<43:36, 5.44s/it] {'loss': 0.5658, 'learning_rate': 2.7402781490209473e-07, 'epoch': 0.93} 93%|█████████▎| 6159/6640 [1:49:43<43:36, 5.44s/it] 93%|█████████▎| 6160/6640 [1:49:48<43:15, 5.41s/it] {'loss': 0.5738, 'learning_rate': 2.728947959871353e-07, 'epoch': 0.93} 93%|█████████▎| 6160/6640 [1:49:48<43:15, 5.41s/it] 93%|█████████▎| 6161/6640 [1:49:53<42:27, 5.32s/it] {'loss': 
0.6298, 'learning_rate': 2.717640918622677e-07, 'epoch': 0.93} 93%|█████████▎| 6161/6640 [1:49:53<42:27, 5.32s/it] 93%|█████████▎| 6162/6640 [1:49:59<42:56, 5.39s/it] {'loss': 0.5589, 'learning_rate': 2.7063570279657114e-07, 'epoch': 0.93} 93%|█████████▎| 6162/6640 [1:49:59<42:56, 5.39s/it] 93%|█████████▎| 6163/6640 [1:50:05<43:23, 5.46s/it] {'loss': 0.5576, 'learning_rate': 2.695096290585719e-07, 'epoch': 0.93} 93%|█████████▎| 6163/6640 [1:50:05<43:23, 5.46s/it] 93%|█████████▎| 6164/6640 [1:50:10<43:04, 5.43s/it] {'loss': 0.5672, 'learning_rate': 2.683858709162468e-07, 'epoch': 0.93} 93%|█████████▎| 6164/6640 [1:50:10<43:04, 5.43s/it] 93%|█████████▎| 6165/6640 [1:50:15<42:33, 5.38s/it] {'loss': 0.5709, 'learning_rate': 2.672644286370163e-07, 'epoch': 0.93} 93%|█████████▎| 6165/6640 [1:50:15<42:33, 5.38s/it] 93%|█████████▎| 6166/6640 [1:50:21<43:03, 5.45s/it] {'loss': 0.5785, 'learning_rate': 2.661453024877569e-07, 'epoch': 0.93} 93%|█████████▎| 6166/6640 [1:50:21<43:03, 5.45s/it] 93%|█████████▎| 6167/6640 [1:50:26<42:42, 5.42s/it] {'loss': 0.5694, 'learning_rate': 2.6502849273478903e-07, 'epoch': 0.93} 93%|█████████▎| 6167/6640 [1:50:26<42:42, 5.42s/it] 93%|█████████▎| 6168/6640 [1:50:31<42:21, 5.38s/it] {'loss': 0.5766, 'learning_rate': 2.639139996438844e-07, 'epoch': 0.93} 93%|█████████▎| 6168/6640 [1:50:31<42:21, 5.38s/it] 93%|█████████▎| 6169/6640 [1:50:37<42:26, 5.41s/it] {'loss': 0.5502, 'learning_rate': 2.6280182348026206e-07, 'epoch': 0.93} 93%|█████████▎| 6169/6640 [1:50:37<42:26, 5.41s/it] 93%|█████████▎| 6170/6640 [1:50:42<42:17, 5.40s/it] {'loss': 0.5622, 'learning_rate': 2.616919645085902e-07, 'epoch': 0.93} 93%|█████████▎| 6170/6640 [1:50:42<42:17, 5.40s/it] 93%|█████████▎| 6171/6640 [1:50:48<41:46, 5.35s/it] {'loss': 0.5665, 'learning_rate': 2.6058442299298437e-07, 'epoch': 0.93} 93%|█████████▎| 6171/6640 [1:50:48<41:46, 5.35s/it] 93%|█████████▎| 6172/6640 [1:50:53<41:51, 5.37s/it] {'loss': 0.5725, 'learning_rate': 2.594791991970114e-07, 'epoch': 
0.93} 93%|█████████▎| 6172/6640 [1:50:53<41:51, 5.37s/it] 93%|█████████▎| 6173/6640 [1:50:58<41:45, 5.37s/it] {'loss': 0.5665, 'learning_rate': 2.583762933836853e-07, 'epoch': 0.93} 93%|█████████▎| 6173/6640 [1:50:58<41:45, 5.37s/it] 93%|█████████▎| 6174/6640 [1:51:04<41:58, 5.40s/it] {'loss': 0.577, 'learning_rate': 2.572757058154651e-07, 'epoch': 0.93} 93%|█████████▎| 6174/6640 [1:51:04<41:58, 5.40s/it] 93%|█████████▎| 6175/6640 [1:51:09<41:55, 5.41s/it] {'loss': 0.5792, 'learning_rate': 2.5617743675426354e-07, 'epoch': 0.93} 93%|█████████▎| 6175/6640 [1:51:09<41:55, 5.41s/it] 93%|█████████▎| 6176/6640 [1:51:15<41:36, 5.38s/it] {'loss': 0.5805, 'learning_rate': 2.5508148646143927e-07, 'epoch': 0.93} 93%|█████████▎| 6176/6640 [1:51:15<41:36, 5.38s/it] 93%|█████████▎| 6177/6640 [1:51:20<41:48, 5.42s/it] {'loss': 0.5664, 'learning_rate': 2.5398785519779814e-07, 'epoch': 0.93} 93%|█████████▎| 6177/6640 [1:51:20<41:48, 5.42s/it] 93%|█████████▎| 6178/6640 [1:51:25<41:27, 5.38s/it] {'loss': 0.5786, 'learning_rate': 2.5289654322359526e-07, 'epoch': 0.93} 93%|█████████▎| 6178/6640 [1:51:25<41:27, 5.38s/it] 93%|█████████▎| 6179/6640 [1:51:31<41:51, 5.45s/it] {'loss': 0.6066, 'learning_rate': 2.5180755079853405e-07, 'epoch': 0.93} 93%|█████████▎| 6179/6640 [1:51:31<41:51, 5.45s/it] 93%|█████████▎| 6180/6640 [1:51:36<41:38, 5.43s/it] {'loss': 0.5756, 'learning_rate': 2.507208781817638e-07, 'epoch': 0.93} 93%|█████████▎| 6180/6640 [1:51:36<41:38, 5.43s/it] 93%|█████████▎| 6181/6640 [1:51:42<41:18, 5.40s/it] {'loss': 0.5749, 'learning_rate': 2.4963652563188557e-07, 'epoch': 0.93} 93%|█████████▎| 6181/6640 [1:51:42<41:18, 5.40s/it] 93%|█████████▎| 6182/6640 [1:51:47<41:33, 5.44s/it] {'loss': 0.5715, 'learning_rate': 2.4855449340694616e-07, 'epoch': 0.93} 93%|█████████▎| 6182/6640 [1:51:47<41:33, 5.44s/it] 93%|█████████▎| 6183/6640 [1:51:53<41:30, 5.45s/it] {'loss': 0.5748, 'learning_rate': 2.474747817644396e-07, 'epoch': 0.93} 93%|█████████▎| 6183/6640 [1:51:53<41:30, 5.45s/it] 
93%|█████████▎| 6184/6640 [1:51:58<41:05, 5.41s/it] {'loss': 0.5892, 'learning_rate': 2.463973909613071e-07, 'epoch': 0.93} 93%|█████████▎| 6184/6640 [1:51:58<41:05, 5.41s/it] 93%|█████████▎| 6185/6640 [1:52:03<40:35, 5.35s/it] {'loss': 0.5702, 'learning_rate': 2.453223212539391e-07, 'epoch': 0.93} 93%|█████████▎| 6185/6640 [1:52:03<40:35, 5.35s/it] 93%|█████████▎| 6186/6640 [1:52:09<40:37, 5.37s/it] {'loss': 0.5938, 'learning_rate': 2.442495728981753e-07, 'epoch': 0.93} 93%|█████████▎| 6186/6640 [1:52:09<40:37, 5.37s/it] 93%|█████████▎| 6187/6640 [1:52:14<40:45, 5.40s/it] {'loss': 0.6038, 'learning_rate': 2.4317914614930047e-07, 'epoch': 0.93} 93%|█████████▎| 6187/6640 [1:52:14<40:45, 5.40s/it] 93%|█████████▎| 6188/6640 [1:52:19<40:36, 5.39s/it] {'loss': 0.567, 'learning_rate': 2.421110412620453e-07, 'epoch': 0.93} 93%|█████████▎| 6188/6640 [1:52:20<40:36, 5.39s/it] 93%|█████████▎| 6189/6640 [1:52:25<41:20, 5.50s/it] {'loss': 0.5401, 'learning_rate': 2.41045258490592e-07, 'epoch': 0.93} 93%|█████████▎| 6189/6640 [1:52:25<41:20, 5.50s/it] 93%|█████████▎| 6190/6640 [1:52:31<41:07, 5.48s/it] {'loss': 0.5932, 'learning_rate': 2.399817980885677e-07, 'epoch': 0.93} 93%|█████████▎| 6190/6640 [1:52:31<41:07, 5.48s/it] 93%|█████████▎| 6191/6640 [1:52:36<40:43, 5.44s/it] {'loss': 0.5644, 'learning_rate': 2.389206603090488e-07, 'epoch': 0.93} 93%|█████████▎| 6191/6640 [1:52:36<40:43, 5.44s/it] 93%|█████████▎| 6192/6640 [1:52:42<40:51, 5.47s/it] {'loss': 0.5908, 'learning_rate': 2.3786184540455449e-07, 'epoch': 0.93} 93%|█████████▎| 6192/6640 [1:52:42<40:51, 5.47s/it] 93%|█████████▎| 6193/6640 [1:52:47<40:56, 5.50s/it] {'loss': 0.5614, 'learning_rate': 2.3680535362705758e-07, 'epoch': 0.93} 93%|█████████▎| 6193/6640 [1:52:47<40:56, 5.50s/it] 93%|█████████▎| 6194/6640 [1:52:53<40:47, 5.49s/it] {'loss': 0.5822, 'learning_rate': 2.3575118522797258e-07, 'epoch': 0.93} 93%|█████████▎| 6194/6640 [1:52:53<40:47, 5.49s/it] 93%|█████████▎| 6195/6640 [1:52:58<40:26, 5.45s/it] {'loss': 
0.57, 'learning_rate': 2.3469934045816435e-07, 'epoch': 0.93} 93%|█████████▎| 6195/6640 [1:52:58<40:26, 5.45s/it] 93%|█████████▎| 6196/6640 [1:53:03<40:18, 5.45s/it] {'loss': 0.5902, 'learning_rate': 2.3364981956794487e-07, 'epoch': 0.93} 93%|█████████▎| 6196/6640 [1:53:03<40:18, 5.45s/it] 93%|█████████▎| 6197/6640 [1:53:09<39:51, 5.40s/it] {'loss': 0.5971, 'learning_rate': 2.326026228070688e-07, 'epoch': 0.93} 93%|█████████▎| 6197/6640 [1:53:09<39:51, 5.40s/it] 93%|█████████▎| 6198/6640 [1:53:14<40:08, 5.45s/it] {'loss': 0.5786, 'learning_rate': 2.315577504247435e-07, 'epoch': 0.93} 93%|█████████▎| 6198/6640 [1:53:14<40:08, 5.45s/it] 93%|█████████▎| 6199/6640 [1:53:19<39:40, 5.40s/it] {'loss': 0.5761, 'learning_rate': 2.3051520266961892e-07, 'epoch': 0.93} 93%|█████████▎| 6199/6640 [1:53:19<39:40, 5.40s/it]01 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 93%|█████████▎| 6200/6640 [1:53:25<39:37, 5.40s/it]5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5646, 'learning_rate': 2.294749797897955e-07, 'epoch': 0.93} 93%|█████████▎| 6200/6640 [1:53:25<39:37, 5.40s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6200/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6200/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6200/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 93%|█████████▎| 6201/6640 [1:53:53<1:28:50, 12.14s/it] {'loss': 0.5799, 'learning_rate': 2.2843708203281745e-07, 'epoch': 0.93} 93%|█████████▎| 6201/6640 [1:53:53<1:28:50, 12.14s/it] 93%|█████████▎| 6202/6640 [1:53:58<1:13:58, 10.13s/it] {'loss': 0.5797, 'learning_rate': 2.2740150964567387e-07, 'epoch': 0.93} 93%|█████████▎| 6202/6640 [1:53:58<1:13:58, 10.13s/it] 93%|█████████▎| 6203/6640 [1:54:04<1:03:28, 8.72s/it] {'loss': 0.611, 'learning_rate': 2.2636826287480872e-07, 'epoch': 0.93} 93%|█████████▎| 6203/6640 [1:54:04<1:03:28, 8.72s/it] 93%|█████████▎| 6204/6640 [1:54:09<55:52, 7.69s/it] {'loss': 0.5732, 'learning_rate': 2.2533734196610203e-07, 'epoch': 0.93} 93%|█████████▎| 6204/6640 [1:54:09<55:52, 7.69s/it] 93%|█████████▎| 6205/6640 [1:54:14<50:53, 7.02s/it] {'loss': 0.5775, 'learning_rate': 2.243087471648886e-07, 'epoch': 0.93} 93%|█████████▎| 6205/6640 [1:54:14<50:53, 7.02s/it] 93%|█████████▎| 6206/6640 [1:54:20<47:19, 6.54s/it] {'loss': 0.5783, 'learning_rate': 2.2328247871594379e-07, 'epoch': 0.93} 93%|█████████▎| 6206/6640 [1:54:20<47:19, 6.54s/it] 93%|█████████▎| 
6207/6640 [1:54:25<45:07, 6.25s/it] {'loss': 0.5768, 'learning_rate': 2.222585368634933e-07, 'epoch': 0.93} 93%|█████████▎| 6207/6640 [1:54:25<45:07, 6.25s/it] 93%|█████████▎| 6208/6640 [1:54:31<43:34, 6.05s/it] {'loss': 0.5885, 'learning_rate': 2.2123692185120783e-07, 'epoch': 0.93} 93%|█████████▎| 6208/6640 [1:54:31<43:34, 6.05s/it] 94%|█████████▎| 6209/6640 [1:54:36<42:19, 5.89s/it] {'loss': 0.59, 'learning_rate': 2.2021763392220396e-07, 'epoch': 0.94} 94%|█████████▎| 6209/6640 [1:54:36<42:19, 5.89s/it] 94%|█████████▎| 6210/6640 [1:54:42<41:08, 5.74s/it] {'loss': 0.5825, 'learning_rate': 2.192006733190466e-07, 'epoch': 0.94} 94%|█████████▎| 6210/6640 [1:54:42<41:08, 5.74s/it] 94%|█████████▎| 6211/6640 [1:54:47<40:16, 5.63s/it] {'loss': 0.581, 'learning_rate': 2.1818604028374212e-07, 'epoch': 0.94} 94%|█████████▎| 6211/6640 [1:54:47<40:16, 5.63s/it] 94%|█████████▎| 6212/6640 [1:54:52<39:15, 5.50s/it] {'loss': 0.5649, 'learning_rate': 2.1717373505774854e-07, 'epoch': 0.94} 94%|█████████▎| 6212/6640 [1:54:52<39:15, 5.50s/it] 94%|█████████▎| 6213/6640 [1:54:58<39:23, 5.54s/it] {'loss': 0.6015, 'learning_rate': 2.161637578819653e-07, 'epoch': 0.94} 94%|█████████▎| 6213/6640 [1:54:58<39:23, 5.54s/it] 94%|█████████▎| 6214/6640 [1:55:03<38:55, 5.48s/it] {'loss': 0.5556, 'learning_rate': 2.1515610899674244e-07, 'epoch': 0.94} 94%|█████████▎| 6214/6640 [1:55:03<38:55, 5.48s/it] 94%|█████████▎| 6215/6640 [1:55:09<38:45, 5.47s/it] {'loss': 0.5879, 'learning_rate': 2.1415078864187034e-07, 'epoch': 0.94} 94%|█████████▎| 6215/6640 [1:55:09<38:45, 5.47s/it] 94%|█████████▎| 6216/6640 [1:55:14<38:26, 5.44s/it] {'loss': 0.6022, 'learning_rate': 2.1314779705658982e-07, 'epoch': 0.94} 94%|█████████▎| 6216/6640 [1:55:14<38:26, 5.44s/it] 94%|█████████▎| 6217/6640 [1:55:20<38:26, 5.45s/it] {'loss': 0.5708, 'learning_rate': 2.1214713447958556e-07, 'epoch': 0.94} 94%|█████████▎| 6217/6640 [1:55:20<38:26, 5.45s/it] 94%|█████████▎| 6218/6640 [1:55:25<38:04, 5.41s/it] {'loss': 0.5642, 
'learning_rate': 2.1114880114898928e-07, 'epoch': 0.94} 94%|█████████▎| 6218/6640 [1:55:25<38:04, 5.41s/it] 94%|█████████▎| 6219/6640 [1:55:31<38:09, 5.44s/it] {'loss': 0.5603, 'learning_rate': 2.1015279730237757e-07, 'epoch': 0.94} 94%|█████████▎| 6219/6640 [1:55:31<38:09, 5.44s/it] 94%|█████████▎| 6220/6640 [1:55:36<37:57, 5.42s/it] {'loss': 0.5896, 'learning_rate': 2.091591231767709e-07, 'epoch': 0.94} 94%|█████████▎| 6220/6640 [1:55:36<37:57, 5.42s/it] 94%|█████████▎| 6221/6640 [1:55:41<37:51, 5.42s/it] {'loss': 0.5774, 'learning_rate': 2.0816777900863895e-07, 'epoch': 0.94} 94%|█████████▎| 6221/6640 [1:55:41<37:51, 5.42s/it] 94%|█████████▎| 6222/6640 [1:55:47<37:43, 5.42s/it] {'loss': 0.5863, 'learning_rate': 2.071787650338941e-07, 'epoch': 0.94} 94%|█████████▎| 6222/6640 [1:55:47<37:43, 5.42s/it] 94%|█████████▎| 6223/6640 [1:55:52<37:43, 5.43s/it] {'loss': 0.5717, 'learning_rate': 2.0619208148789593e-07, 'epoch': 0.94} 94%|█████████▎| 6223/6640 [1:55:52<37:43, 5.43s/it] 94%|█████████▎| 6224/6640 [1:55:58<38:10, 5.51s/it] {'loss': 0.5405, 'learning_rate': 2.0520772860544768e-07, 'epoch': 0.94} 94%|█████████▎| 6224/6640 [1:55:58<38:10, 5.51s/it] 94%|█████████▍| 6225/6640 [1:56:03<38:14, 5.53s/it] {'loss': 0.5645, 'learning_rate': 2.0422570662079866e-07, 'epoch': 0.94} 94%|█████████▍| 6225/6640 [1:56:03<38:14, 5.53s/it] 94%|█████████▍| 6226/6640 [1:56:09<37:43, 5.47s/it] {'loss': 0.5656, 'learning_rate': 2.0324601576764525e-07, 'epoch': 0.94} 94%|█████████▍| 6226/6640 [1:56:09<37:43, 5.47s/it] 94%|█████████▍| 6227/6640 [1:56:14<37:25, 5.44s/it] {'loss': 0.5941, 'learning_rate': 2.022686562791254e-07, 'epoch': 0.94} 94%|█████████▍| 6227/6640 [1:56:14<37:25, 5.44s/it] 94%|█████████▍| 6228/6640 [1:56:20<37:38, 5.48s/it] {'loss': 0.5785, 'learning_rate': 2.012936283878275e-07, 'epoch': 0.94} 94%|█████████▍| 6228/6640 [1:56:20<37:38, 5.48s/it] 94%|█████████▍| 6229/6640 [1:56:25<37:30, 5.48s/it] {'loss': 0.5852, 'learning_rate': 2.0032093232577822e-07, 'epoch': 0.94} 
94%|█████████▍| 6229/6640 [1:56:25<37:30, 5.48s/it] 94%|█████████▍| 6230/6640 [1:56:31<37:09, 5.44s/it] {'loss': 0.6114, 'learning_rate': 1.9935056832445676e-07, 'epoch': 0.94} 94%|█████████▍| 6230/6640 [1:56:31<37:09, 5.44s/it] 94%|█████████▍| 6231/6640 [1:56:36<37:52, 5.56s/it] {'loss': 0.5972, 'learning_rate': 1.983825366147818e-07, 'epoch': 0.94} 94%|█████████▍| 6231/6640 [1:56:36<37:52, 5.56s/it] 94%|█████████▍| 6232/6640 [1:56:42<37:09, 5.46s/it] {'loss': 0.5727, 'learning_rate': 1.9741683742712014e-07, 'epoch': 0.94} 94%|█████████▍| 6232/6640 [1:56:42<37:09, 5.46s/it] 94%|█████████▍| 6233/6640 [1:56:47<36:34, 5.39s/it] {'loss': 0.597, 'learning_rate': 1.964534709912813e-07, 'epoch': 0.94} 94%|█████████▍| 6233/6640 [1:56:47<36:34, 5.39s/it] 94%|█████████▍| 6234/6640 [1:56:52<36:14, 5.36s/it] {'loss': 0.5808, 'learning_rate': 1.954924375365197e-07, 'epoch': 0.94} 94%|█████████▍| 6234/6640 [1:56:52<36:14, 5.36s/it] 94%|█████████▍| 6235/6640 [1:56:57<36:03, 5.34s/it] {'loss': 0.5641, 'learning_rate': 1.945337372915368e-07, 'epoch': 0.94} 94%|█████████▍| 6235/6640 [1:56:57<36:03, 5.34s/it] 94%|█████████▍| 6236/6640 [1:57:03<36:02, 5.35s/it] {'loss': 0.5877, 'learning_rate': 1.935773704844779e-07, 'epoch': 0.94} 94%|█████████▍| 6236/6640 [1:57:03<36:02, 5.35s/it] 94%|█████████▍| 6237/6640 [1:57:08<36:04, 5.37s/it] {'loss': 0.5838, 'learning_rate': 1.9262333734293203e-07, 'epoch': 0.94} 94%|█████████▍| 6237/6640 [1:57:08<36:04, 5.37s/it] 94%|█████████▍| 6238/6640 [1:57:14<36:15, 5.41s/it] {'loss': 0.5819, 'learning_rate': 1.9167163809393207e-07, 'epoch': 0.94} 94%|█████████▍| 6238/6640 [1:57:14<36:15, 5.41s/it] 94%|█████████▍| 6239/6640 [1:57:19<36:07, 5.41s/it] {'loss': 0.5719, 'learning_rate': 1.9072227296396017e-07, 'epoch': 0.94} 94%|█████████▍| 6239/6640 [1:57:19<36:07, 5.41s/it] 94%|█████████▍| 6240/6640 [1:57:25<36:17, 5.44s/it] {'loss': 0.5828, 'learning_rate': 1.8977524217893782e-07, 'epoch': 0.94} 94%|█████████▍| 6240/6640 [1:57:25<36:17, 5.44s/it] 
94%|█████████▍| 6241/6640 [1:57:30<36:06, 5.43s/it] {'loss': 0.5742, 'learning_rate': 1.8883054596423255e-07, 'epoch': 0.94} 94%|█████████▍| 6241/6640 [1:57:30<36:06, 5.43s/it] 94%|█████████▍| 6242/6640 [1:57:36<36:07, 5.45s/it] {'loss': 0.573, 'learning_rate': 1.8788818454465674e-07, 'epoch': 0.94} 94%|█████████▍| 6242/6640 [1:57:36<36:07, 5.45s/it] 94%|█████████▍| 6243/6640 [1:57:41<36:06, 5.46s/it] {'loss': 0.5784, 'learning_rate': 1.8694815814446875e-07, 'epoch': 0.94} 94%|█████████▍| 6243/6640 [1:57:41<36:06, 5.46s/it] 94%|█████████▍| 6244/6640 [1:57:46<35:53, 5.44s/it] {'loss': 0.5671, 'learning_rate': 1.8601046698736858e-07, 'epoch': 0.94} 94%|█████████▍| 6244/6640 [1:57:46<35:53, 5.44s/it] 94%|█████████▍| 6245/6640 [1:57:52<36:09, 5.49s/it] {'loss': 0.562, 'learning_rate': 1.85075111296501e-07, 'epoch': 0.94} 94%|█████████▍| 6245/6640 [1:57:52<36:09, 5.49s/it] 94%|█████████▍| 6246/6640 [1:57:58<36:14, 5.52s/it] {'loss': 0.5742, 'learning_rate': 1.8414209129445692e-07, 'epoch': 0.94} 94%|█████████▍| 6246/6640 [1:57:58<36:14, 5.52s/it] 94%|█████████▍| 6247/6640 [1:58:03<36:16, 5.54s/it] {'loss': 0.571, 'learning_rate': 1.8321140720326758e-07, 'epoch': 0.94} 94%|█████████▍| 6247/6640 [1:58:03<36:16, 5.54s/it] 94%|█████████▍| 6248/6640 [1:58:09<35:46, 5.48s/it] {'loss': 0.5855, 'learning_rate': 1.8228305924441469e-07, 'epoch': 0.94} 94%|█████████▍| 6248/6640 [1:58:09<35:46, 5.48s/it] 94%|█████████▍| 6249/6640 [1:58:14<35:21, 5.43s/it] {'loss': 0.5685, 'learning_rate': 1.81357047638816e-07, 'epoch': 0.94} 94%|█████████▍| 6249/6640 [1:58:14<35:21, 5.43s/it]7 AutoResumeHook: Checking whether to suspend... 35 1AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend...6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
94%|█████████▍| 6250/6640 [1:58:19<35:01, 5.39s/it]4 AutoResumeHook: Checking whether to suspend... {'loss': 0.5745, 'learning_rate': 1.804333726068408e-07, 'epoch': 0.94} 94%|█████████▍| 6250/6640 [1:58:19<35:01, 5.39s/it] 94%|█████████▍| 6251/6640 [1:58:25<34:56, 5.39s/it] {'loss': 0.5653, 'learning_rate': 1.7951203436829656e-07, 'epoch': 0.94} 94%|█████████▍| 6251/6640 [1:58:25<34:56, 5.39s/it] 94%|█████████▍| 6252/6640 [1:58:30<34:50, 5.39s/it] {'loss': 0.5677, 'learning_rate': 1.785930331424379e-07, 'epoch': 0.94} 94%|█████████▍| 6252/6640 [1:58:30<34:50, 5.39s/it] 94%|█████████▍| 6253/6640 [1:58:35<34:53, 5.41s/it] {'loss': 0.5821, 'learning_rate': 1.7767636914796437e-07, 'epoch': 0.94} 94%|█████████▍| 6253/6640 [1:58:35<34:53, 5.41s/it] 94%|█████████▍| 6254/6640 [1:58:41<35:00, 5.44s/it] {'loss': 0.5887, 'learning_rate': 1.7676204260301477e-07, 'epoch': 0.94} 94%|█████████▍| 6254/6640 [1:58:41<35:00, 5.44s/it] 94%|█████████▍| 6255/6640 [1:58:46<35:09, 5.48s/it] {'loss': 0.5721, 'learning_rate': 1.7585005372517504e-07, 'epoch': 0.94} 94%|█████████▍| 6255/6640 [1:58:46<35:09, 5.48s/it] 94%|█████████▍| 6256/6640 [1:58:52<34:36, 5.41s/it] {'loss': 0.5442, 'learning_rate': 1.7494040273147384e-07, 'epoch': 0.94} 94%|█████████▍| 6256/6640 [1:58:52<34:36, 5.41s/it] 94%|█████████▍| 6257/6640 [1:58:57<34:49, 5.46s/it] {'loss': 0.5753, 'learning_rate': 1.7403308983838462e-07, 'epoch': 0.94} 94%|█████████▍| 6257/6640 [1:58:57<34:49, 5.46s/it] 94%|█████████▍| 6258/6640 [1:59:03<34:47, 5.47s/it] {'loss': 0.5625, 'learning_rate': 1.731281152618225e-07, 'epoch': 0.94} 94%|█████████▍| 6258/6640 [1:59:03<34:47, 5.47s/it] 94%|█████████▍| 6259/6640 [1:59:08<34:40, 5.46s/it] {'loss': 0.5852, 'learning_rate': 1.722254792171485e-07, 'epoch': 0.94} 94%|█████████▍| 6259/6640 [1:59:08<34:40, 5.46s/it] 94%|█████████▍| 6260/6640 [1:59:14<34:23, 5.43s/it] {'loss': 0.5771, 'learning_rate': 1.7132518191916413e-07, 'epoch': 0.94} 94%|█████████▍| 6260/6640 [1:59:14<34:23, 5.43s/it] 
94%|█████████▍| 6261/6640 [1:59:19<34:11, 5.41s/it] {'loss': 0.6081, 'learning_rate': 1.7042722358211694e-07, 'epoch': 0.94} 94%|█████████▍| 6261/6640 [1:59:19<34:11, 5.41s/it] 94%|█████████▍| 6262/6640 [1:59:24<34:11, 5.43s/it] {'loss': 0.569, 'learning_rate': 1.6953160441969707e-07, 'epoch': 0.94} 94%|█████████▍| 6262/6640 [1:59:24<34:11, 5.43s/it] 94%|█████████▍| 6263/6640 [1:59:30<34:42, 5.52s/it] {'loss': 0.5603, 'learning_rate': 1.686383246450374e-07, 'epoch': 0.94} 94%|█████████▍| 6263/6640 [1:59:30<34:42, 5.52s/it] 94%|█████████▍| 6264/6640 [1:59:36<34:17, 5.47s/it] {'loss': 0.5674, 'learning_rate': 1.677473844707156e-07, 'epoch': 0.94} 94%|█████████▍| 6264/6640 [1:59:36<34:17, 5.47s/it] 94%|█████████▍| 6265/6640 [1:59:41<34:14, 5.48s/it] {'loss': 0.5606, 'learning_rate': 1.6685878410874768e-07, 'epoch': 0.94} 94%|█████████▍| 6265/6640 [1:59:41<34:14, 5.48s/it] 94%|█████████▍| 6266/6640 [1:59:46<34:03, 5.46s/it] {'loss': 0.5529, 'learning_rate': 1.6597252377060335e-07, 'epoch': 0.94} 94%|█████████▍| 6266/6640 [1:59:46<34:03, 5.46s/it] 94%|█████████▍| 6267/6640 [1:59:52<34:01, 5.47s/it] {'loss': 0.5803, 'learning_rate': 1.6508860366718283e-07, 'epoch': 0.94} 94%|█████████▍| 6267/6640 [1:59:52<34:01, 5.47s/it] 94%|█████████▍| 6268/6640 [1:59:57<34:02, 5.49s/it] {'loss': 0.5728, 'learning_rate': 1.642070240088378e-07, 'epoch': 0.94} 94%|█████████▍| 6268/6640 [1:59:57<34:02, 5.49s/it] 94%|█████████▍| 6269/6640 [2:00:03<34:26, 5.57s/it] {'loss': 0.5664, 'learning_rate': 1.633277850053605e-07, 'epoch': 0.94} 94%|█████████▍| 6269/6640 [2:00:03<34:26, 5.57s/it] 94%|█████████▍| 6270/6640 [2:00:09<33:55, 5.50s/it] {'loss': 0.5582, 'learning_rate': 1.6245088686598686e-07, 'epoch': 0.94} 94%|█████████▍| 6270/6640 [2:00:09<33:55, 5.50s/it] 94%|█████████▍| 6271/6640 [2:00:14<33:40, 5.48s/it] {'loss': 0.6029, 'learning_rate': 1.615763297993944e-07, 'epoch': 0.94} 94%|█████████▍| 6271/6640 [2:00:14<33:40, 5.48s/it] 94%|█████████▍| 6272/6640 [2:00:19<33:38, 5.49s/it] 
{'loss': 0.6093, 'learning_rate': 1.6070411401370335e-07, 'epoch': 0.94} 94%|█████████▍| 6272/6640 [2:00:19<33:38, 5.49s/it] 94%|█████████▍| 6273/6640 [2:00:25<33:30, 5.48s/it] {'loss': 0.5755, 'learning_rate': 1.5983423971647983e-07, 'epoch': 0.94} 94%|█████████▍| 6273/6640 [2:00:25<33:30, 5.48s/it] 94%|█████████▍| 6274/6640 [2:00:30<33:11, 5.44s/it] {'loss': 0.5717, 'learning_rate': 1.5896670711472828e-07, 'epoch': 0.94} 94%|█████████▍| 6274/6640 [2:00:30<33:11, 5.44s/it] 95%|█████████▍| 6275/6640 [2:00:36<32:58, 5.42s/it] {'loss': 0.5786, 'learning_rate': 1.5810151641489912e-07, 'epoch': 0.95} 95%|█████████▍| 6275/6640 [2:00:36<32:58, 5.42s/it] 95%|█████████▍| 6276/6640 [2:00:41<33:16, 5.48s/it] {'loss': 0.562, 'learning_rate': 1.5723866782288545e-07, 'epoch': 0.95} 95%|█████████▍| 6276/6640 [2:00:41<33:16, 5.48s/it] 95%|█████████▍| 6277/6640 [2:00:47<32:52, 5.43s/it] {'loss': 0.5858, 'learning_rate': 1.5637816154402075e-07, 'epoch': 0.95} 95%|█████████▍| 6277/6640 [2:00:47<32:52, 5.43s/it] 95%|█████████▍| 6278/6640 [2:00:52<33:03, 5.48s/it] {'loss': 0.5787, 'learning_rate': 1.5551999778308235e-07, 'epoch': 0.95} 95%|█████████▍| 6278/6640 [2:00:52<33:03, 5.48s/it] 95%|█████████▍| 6279/6640 [2:00:58<32:37, 5.42s/it] {'loss': 0.5842, 'learning_rate': 1.5466417674429136e-07, 'epoch': 0.95} 95%|█████████▍| 6279/6640 [2:00:58<32:37, 5.42s/it] 95%|█████████▍| 6280/6640 [2:01:03<32:33, 5.43s/it] {'loss': 0.5715, 'learning_rate': 1.5381069863131037e-07, 'epoch': 0.95} 95%|█████████▍| 6280/6640 [2:01:03<32:33, 5.43s/it] 95%|█████████▍| 6281/6640 [2:01:08<32:31, 5.43s/it] {'loss': 0.5904, 'learning_rate': 1.529595636472425e-07, 'epoch': 0.95} 95%|█████████▍| 6281/6640 [2:01:08<32:31, 5.43s/it] 95%|█████████▍| 6282/6640 [2:01:14<32:49, 5.50s/it] {'loss': 0.5577, 'learning_rate': 1.5211077199463685e-07, 'epoch': 0.95} 95%|█████████▍| 6282/6640 [2:01:14<32:49, 5.50s/it] 95%|█████████▍| 6283/6640 [2:01:19<32:16, 5.43s/it] {'loss': 0.5759, 'learning_rate': 
1.5126432387548185e-07, 'epoch': 0.95} 95%|█████████▍| 6283/6640 [2:01:19<32:16, 5.43s/it] 95%|█████████▍| 6284/6640 [2:01:25<31:58, 5.39s/it] {'loss': 0.5651, 'learning_rate': 1.5042021949120967e-07, 'epoch': 0.95} 95%|█████████▍| 6284/6640 [2:01:25<31:58, 5.39s/it] 95%|█████████▍| 6285/6640 [2:01:30<31:50, 5.38s/it] {'loss': 0.5625, 'learning_rate': 1.495784590426963e-07, 'epoch': 0.95} 95%|█████████▍| 6285/6640 [2:01:30<31:50, 5.38s/it] 95%|█████████▍| 6286/6640 [2:01:35<31:51, 5.40s/it] {'loss': 0.5766, 'learning_rate': 1.4873904273025486e-07, 'epoch': 0.95} 95%|█████████▍| 6286/6640 [2:01:35<31:51, 5.40s/it] 95%|█████████▍| 6287/6640 [2:01:41<31:45, 5.40s/it] {'loss': 0.5753, 'learning_rate': 1.4790197075364666e-07, 'epoch': 0.95} 95%|█████████▍| 6287/6640 [2:01:41<31:45, 5.40s/it] 95%|█████████▍| 6288/6640 [2:01:46<31:42, 5.41s/it] {'loss': 0.5843, 'learning_rate': 1.4706724331207122e-07, 'epoch': 0.95} 95%|█████████▍| 6288/6640 [2:01:46<31:42, 5.41s/it] 95%|█████████▍| 6289/6640 [2:01:52<31:43, 5.42s/it] {'loss': 0.5709, 'learning_rate': 1.462348606041708e-07, 'epoch': 0.95} 95%|█████████▍| 6289/6640 [2:01:52<31:43, 5.42s/it] 95%|█████████▍| 6290/6640 [2:01:57<31:42, 5.43s/it] {'loss': 0.5849, 'learning_rate': 1.4540482282803136e-07, 'epoch': 0.95} 95%|█████████▍| 6290/6640 [2:01:57<31:42, 5.43s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4901 > 4096). 
Running this sequence through the model will result in indexing errors 95%|█████████▍| 6291/6640 [2:02:03<31:32, 5.42s/it] {'loss': 0.5503, 'learning_rate': 1.4457713018117935e-07, 'epoch': 0.95} 95%|█████████▍| 6291/6640 [2:02:03<31:32, 5.42s/it] 95%|█████████▍| 6292/6640 [2:02:08<31:24, 5.41s/it] {'loss': 0.5959, 'learning_rate': 1.4375178286058167e-07, 'epoch': 0.95} 95%|█████████▍| 6292/6640 [2:02:08<31:24, 5.41s/it] 95%|█████████▍| 6293/6640 [2:02:13<31:16, 5.41s/it] {'loss': 0.5713, 'learning_rate': 1.4292878106265118e-07, 'epoch': 0.95} 95%|█████████▍| 6293/6640 [2:02:13<31:16, 5.41s/it] 95%|█████████▍| 6294/6640 [2:02:19<31:33, 5.47s/it] {'loss': 0.5681, 'learning_rate': 1.4210812498324012e-07, 'epoch': 0.95} 95%|█████████▍| 6294/6640 [2:02:19<31:33, 5.47s/it] 95%|█████████▍| 6295/6640 [2:02:25<31:38, 5.50s/it] {'loss': 0.5981, 'learning_rate': 1.4128981481764115e-07, 'epoch': 0.95} 95%|█████████▍| 6295/6640 [2:02:25<31:38, 5.50s/it] 95%|█████████▍| 6296/6640 [2:02:30<31:31, 5.50s/it] {'loss': 0.5828, 'learning_rate': 1.4047385076059072e-07, 'epoch': 0.95} 95%|█████████▍| 6296/6640 [2:02:30<31:31, 5.50s/it] 95%|█████████▍| 6297/6640 [2:02:35<31:16, 5.47s/it] {'loss': 0.5534, 'learning_rate': 1.3966023300626685e-07, 'epoch': 0.95} 95%|█████████▍| 6297/6640 [2:02:35<31:16, 5.47s/it] 95%|█████████▍| 6298/6640 [2:02:41<31:12, 5.47s/it] {'loss': 0.5661, 'learning_rate': 1.388489617482891e-07, 'epoch': 0.95} 95%|█████████▍| 6298/6640 [2:02:41<31:12, 5.47s/it] 95%|█████████▍| 6299/6640 [2:02:46<30:59, 5.45s/it] {'loss': 0.5653, 'learning_rate': 1.3804003717971637e-07, 'epoch': 0.95} 95%|█████████▍| 6299/6640 [2:02:46<30:59, 5.45s/it]1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
95%|█████████▍| 6300/6640 [2:02:52<30:37, 5.40s/it]5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... {'loss': 0.5938, 'learning_rate': 1.3723345949305245e-07, 'epoch': 0.95} 95%|█████████▍| 6300/6640 [2:02:52<30:37, 5.40s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( 95%|█████████▍| 6301/6640 [2:03:21<1:11:07, 12.59s/it] {'loss': 0.5771, 'learning_rate': 1.3642922888024047e-07, 'epoch': 0.95} 95%|█████████▍| 6301/6640 [2:03:21<1:11:07, 12.59s/it] 95%|█████████▍| 6302/6640 [2:03:27<59:14, 10.52s/it] {'loss': 0.5746, 'learning_rate': 1.356273455326662e-07, 'epoch': 0.95} 95%|█████████▍| 6302/6640 [2:03:27<59:14, 10.52s/it] 95%|█████████▍| 6303/6640 [2:03:32<50:24, 8.97s/it] {'loss': 0.5863, 'learning_rate': 1.3482780964115705e-07, 'epoch': 0.95} 95%|█████████▍| 6303/6640 [2:03:32<50:24, 8.97s/it] 95%|█████████▍| 6304/6640 [2:03:38<44:27, 7.94s/it] {'loss': 0.5685, 'learning_rate': 1.3403062139598078e-07, 'epoch': 0.95} 95%|█████████▍| 6304/6640 [2:03:38<44:27, 7.94s/it] 95%|█████████▍| 6305/6640 [2:03:43<40:06, 7.18s/it] {'loss': 0.5929, 'learning_rate': 1.3323578098684565e-07, 'epoch': 0.95} 95%|█████████▍| 6305/6640 [2:03:43<40:06, 7.18s/it] 95%|█████████▍| 6306/6640 [2:03:48<36:51, 6.62s/it] {'loss': 0.5687, 'learning_rate': 1.3244328860290257e-07, 'epoch': 0.95} 95%|█████████▍| 6306/6640 [2:03:48<36:51, 6.62s/it] 95%|█████████▍| 6307/6640 [2:03:54<34:49, 6.27s/it] {'loss': 0.5665, 'learning_rate': 1.3165314443274623e-07, 'epoch': 0.95} 95%|█████████▍| 6307/6640 [2:03:54<34:49, 6.27s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 95%|█████████▌| 6308/6640 [2:03:59<33:30, 6.05s/it] {'loss': 0.5619, 'learning_rate': 1.3086534866440515e-07, 'epoch': 0.95} 95%|█████████▌| 6308/6640 [2:03:59<33:30, 6.05s/it] 95%|█████████▌| 6309/6640 [2:04:05<32:04, 5.81s/it] {'loss': 0.5884, 'learning_rate': 1.300799014853571e-07, 'epoch': 0.95} 95%|█████████▌| 6309/6640 [2:04:05<32:04, 5.81s/it] 95%|█████████▌| 6310/6640 [2:04:10<31:19, 5.70s/it] {'loss': 0.575, 'learning_rate': 1.292968030825159e-07, 'epoch': 0.95} 95%|█████████▌| 6310/6640 [2:04:10<31:19, 5.70s/it] 95%|█████████▌| 6311/6640 [2:04:15<30:42, 5.60s/it] {'loss': 0.5769, 'learning_rate': 1.285160536422392e-07, 'epoch': 0.95} 95%|█████████▌| 6311/6640 [2:04:15<30:42, 5.60s/it] 95%|█████████▌| 6312/6640 [2:04:21<30:16, 5.54s/it] {'loss': 0.5732, 'learning_rate': 1.2773765335032384e-07, 'epoch': 0.95} 95%|█████████▌| 6312/6640 [2:04:21<30:16, 5.54s/it] 95%|█████████▌| 6313/6640 [2:04:26<29:57, 5.50s/it] {'loss': 0.566, 'learning_rate': 1.2696160239200728e-07, 'epoch': 0.95} 95%|█████████▌| 6313/6640 [2:04:26<29:57, 5.50s/it] 95%|█████████▌| 6314/6640 [2:04:32<29:42, 5.47s/it] {'loss': 0.5875, 'learning_rate': 1.2618790095196953e-07, 'epoch': 0.95} 95%|█████████▌| 6314/6640 [2:04:32<29:42, 5.47s/it] 95%|█████████▌| 6315/6640 [2:04:37<29:18, 5.41s/it] {'loss': 0.5725, 'learning_rate': 1.2541654921432998e-07, 'epoch': 0.95} 95%|█████████▌| 6315/6640 [2:04:37<29:18, 5.41s/it] 95%|█████████▌| 6316/6640 [2:04:42<28:57, 5.36s/it] {'loss': 0.5856, 'learning_rate': 1.2464754736265183e-07, 'epoch': 0.95} 95%|█████████▌| 6316/6640 [2:04:42<28:57, 5.36s/it] 95%|█████████▌| 6317/6640 [2:04:47<28:49, 5.35s/it] {'loss': 0.5792, 'learning_rate': 1.2388089557993533e-07, 'epoch': 0.95} 95%|█████████▌| 6317/6640 [2:04:47<28:49, 5.35s/it] 95%|█████████▌| 6318/6640 [2:04:53<29:05, 5.42s/it] {'loss': 0.571, 'learning_rate': 1.231165940486234e-07, 'epoch': 0.95} 95%|█████████▌| 6318/6640 [2:04:53<29:05, 5.42s/it] 95%|█████████▌| 
6319/6640 [2:04:58<29:00, 5.42s/it] {'loss': 0.5714, 'learning_rate': 1.223546429505984e-07, 'epoch': 0.95} 95%|█████████▌| 6319/6640 [2:04:58<29:00, 5.42s/it] 95%|█████████▌| 6320/6640 [2:05:04<29:12, 5.48s/it] {'loss': 0.5879, 'learning_rate': 1.2159504246718522e-07, 'epoch': 0.95} 95%|█████████▌| 6320/6640 [2:05:04<29:12, 5.48s/it] 95%|█████████▌| 6321/6640 [2:05:09<29:06, 5.47s/it] {'loss': 0.5666, 'learning_rate': 1.208377927791482e-07, 'epoch': 0.95} 95%|█████████▌| 6321/6640 [2:05:09<29:06, 5.47s/it] 95%|█████████▌| 6322/6640 [2:05:15<28:48, 5.43s/it] {'loss': 0.5814, 'learning_rate': 1.2008289406669206e-07, 'epoch': 0.95} 95%|█████████▌| 6322/6640 [2:05:15<28:48, 5.43s/it] 95%|█████████▌| 6323/6640 [2:05:20<28:49, 5.45s/it] {'loss': 0.5905, 'learning_rate': 1.1933034650946306e-07, 'epoch': 0.95} 95%|█████████▌| 6323/6640 [2:05:20<28:49, 5.45s/it] 95%|█████████▌| 6324/6640 [2:05:26<28:29, 5.41s/it] {'loss': 0.5905, 'learning_rate': 1.1858015028654801e-07, 'epoch': 0.95} 95%|█████████▌| 6324/6640 [2:05:26<28:29, 5.41s/it] 95%|█████████▌| 6325/6640 [2:05:31<28:07, 5.36s/it] {'loss': 0.5713, 'learning_rate': 1.1783230557647075e-07, 'epoch': 0.95} 95%|█████████▌| 6325/6640 [2:05:31<28:07, 5.36s/it] 95%|█████████▌| 6326/6640 [2:05:37<28:39, 5.48s/it] {'loss': 0.5901, 'learning_rate': 1.1708681255720223e-07, 'epoch': 0.95} 95%|█████████▌| 6326/6640 [2:05:37<28:39, 5.48s/it] 95%|█████████▌| 6327/6640 [2:05:42<28:23, 5.44s/it] {'loss': 0.5577, 'learning_rate': 1.1634367140614611e-07, 'epoch': 0.95} 95%|█████████▌| 6327/6640 [2:05:42<28:23, 5.44s/it] 95%|█████████▌| 6328/6640 [2:05:47<28:09, 5.42s/it] {'loss': 0.5735, 'learning_rate': 1.1560288230015204e-07, 'epoch': 0.95} 95%|█████████▌| 6328/6640 [2:05:47<28:09, 5.42s/it] 95%|█████████▌| 6329/6640 [2:05:53<28:00, 5.40s/it] {'loss': 0.5947, 'learning_rate': 1.1486444541550679e-07, 'epoch': 0.95} 95%|█████████▌| 6329/6640 [2:05:53<28:00, 5.40s/it] 95%|█████████▌| 6330/6640 [2:05:58<27:58, 5.41s/it] {'loss': 0.5668, 
'learning_rate': 1.1412836092793977e-07, 'epoch': 0.95} 95%|█████████▌| 6330/6640 [2:05:58<27:58, 5.41s/it] 95%|█████████▌| 6331/6640 [2:06:03<27:40, 5.37s/it] {'loss': 0.5692, 'learning_rate': 1.1339462901261867e-07, 'epoch': 0.95} 95%|█████████▌| 6331/6640 [2:06:03<27:40, 5.37s/it] 95%|█████████▌| 6332/6640 [2:06:09<28:03, 5.47s/it] {'loss': 0.5398, 'learning_rate': 1.1266324984415266e-07, 'epoch': 0.95} 95%|█████████▌| 6332/6640 [2:06:09<28:03, 5.47s/it] 95%|█████████▌| 6333/6640 [2:06:15<28:12, 5.51s/it] {'loss': 0.6161, 'learning_rate': 1.1193422359658924e-07, 'epoch': 0.95} 95%|█████████▌| 6333/6640 [2:06:15<28:12, 5.51s/it] 95%|█████████▌| 6334/6640 [2:06:20<27:51, 5.46s/it] {'loss': 0.5451, 'learning_rate': 1.1120755044341736e-07, 'epoch': 0.95} 95%|█████████▌| 6334/6640 [2:06:20<27:51, 5.46s/it] 95%|█████████▌| 6335/6640 [2:06:25<27:35, 5.43s/it] {'loss': 0.5682, 'learning_rate': 1.1048323055756649e-07, 'epoch': 0.95} 95%|█████████▌| 6335/6640 [2:06:25<27:35, 5.43s/it] 95%|█████████▌| 6336/6640 [2:06:31<27:17, 5.39s/it] {'loss': 0.5775, 'learning_rate': 1.097612641114043e-07, 'epoch': 0.95} 95%|█████████▌| 6336/6640 [2:06:31<27:17, 5.39s/it] 95%|█████████▌| 6337/6640 [2:06:36<27:29, 5.44s/it] {'loss': 0.5768, 'learning_rate': 1.0904165127674116e-07, 'epoch': 0.95} 95%|█████████▌| 6337/6640 [2:06:36<27:29, 5.44s/it] 95%|█████████▌| 6338/6640 [2:06:42<27:42, 5.50s/it] {'loss': 0.5738, 'learning_rate': 1.0832439222482338e-07, 'epoch': 0.95} 95%|█████████▌| 6338/6640 [2:06:42<27:42, 5.50s/it] 95%|█████████▌| 6339/6640 [2:06:47<27:34, 5.50s/it] {'loss': 0.59, 'learning_rate': 1.0760948712634112e-07, 'epoch': 0.95} 95%|█████████▌| 6339/6640 [2:06:47<27:34, 5.50s/it] 95%|█████████▌| 6340/6640 [2:06:53<27:30, 5.50s/it] {'loss': 0.582, 'learning_rate': 1.068969361514216e-07, 'epoch': 0.95} 95%|█████████▌| 6340/6640 [2:06:53<27:30, 5.50s/it] 95%|█████████▌| 6341/6640 [2:06:58<27:15, 5.47s/it] {'loss': 0.5856, 'learning_rate': 1.0618673946963365e-07, 'epoch': 0.95} 
95%|█████████▌| 6341/6640 [2:06:58<27:15, 5.47s/it] 96%|█████████▌| 6342/6640 [2:07:04<27:09, 5.47s/it] {'loss': 0.5882, 'learning_rate': 1.0547889724998428e-07, 'epoch': 0.96} 96%|█████████▌| 6342/6640 [2:07:04<27:09, 5.47s/it] 96%|█████████▌| 6343/6640 [2:07:09<27:00, 5.46s/it] {'loss': 0.5924, 'learning_rate': 1.0477340966092097e-07, 'epoch': 0.96} 96%|█████████▌| 6343/6640 [2:07:09<27:00, 5.46s/it] 96%|█████████▌| 6344/6640 [2:07:14<26:41, 5.41s/it] {'loss': 0.5514, 'learning_rate': 1.0407027687033166e-07, 'epoch': 0.96} 96%|█████████▌| 6344/6640 [2:07:14<26:41, 5.41s/it] 96%|█████████▌| 6345/6640 [2:07:20<26:48, 5.45s/it] {'loss': 0.5486, 'learning_rate': 1.033694990455425e-07, 'epoch': 0.96} 96%|█████████▌| 6345/6640 [2:07:20<26:48, 5.45s/it] 96%|█████████▌| 6346/6640 [2:07:25<26:39, 5.44s/it] {'loss': 0.5684, 'learning_rate': 1.0267107635331897e-07, 'epoch': 0.96} 96%|█████████▌| 6346/6640 [2:07:25<26:39, 5.44s/it] 96%|█████████▌| 6347/6640 [2:07:31<26:37, 5.45s/it] {'loss': 0.5843, 'learning_rate': 1.0197500895986922e-07, 'epoch': 0.96} 96%|█████████▌| 6347/6640 [2:07:31<26:37, 5.45s/it] 96%|█████████▌| 6348/6640 [2:07:37<26:49, 5.51s/it] {'loss': 0.5744, 'learning_rate': 1.0128129703083634e-07, 'epoch': 0.96} 96%|█████████▌| 6348/6640 [2:07:37<26:49, 5.51s/it] 96%|█████████▌| 6349/6640 [2:07:42<26:53, 5.54s/it] {'loss': 0.5901, 'learning_rate': 1.0058994073130712e-07, 'epoch': 0.96} 96%|█████████▌| 6349/6640 [2:07:42<26:53, 5.54s/it]63 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 12 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 07 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 96%|█████████▌| 6350/6640 [2:07:48<26:43, 5.53s/it]4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5637, 'learning_rate': 9.990094022580332e-08, 'epoch': 0.96} 96%|█████████▌| 6350/6640 [2:07:48<26:43, 5.53s/it] 96%|█████████▌| 6351/6640 [2:07:53<26:19, 5.46s/it] {'loss': 0.5888, 'learning_rate': 9.921429567829043e-08, 'epoch': 0.96} 96%|█████████▌| 6351/6640 [2:07:53<26:19, 5.46s/it] 96%|█████████▌| 6352/6640 [2:07:58<26:02, 5.43s/it] {'loss': 0.5546, 'learning_rate': 9.85300072521711e-08, 'epoch': 0.96} 96%|█████████▌| 6352/6640 [2:07:58<26:02, 5.43s/it] 96%|█████████▌| 6353/6640 [2:08:04<25:57, 5.43s/it] {'loss': 0.5898, 'learning_rate': 9.784807511028837e-08, 'epoch': 0.96} 96%|█████████▌| 6353/6640 [2:08:04<25:57, 5.43s/it] 96%|█████████▌| 6354/6640 [2:08:09<25:58, 5.45s/it] {'loss': 0.5804, 'learning_rate': 9.716849941492135e-08, 'epoch': 0.96} 96%|█████████▌| 6354/6640 [2:08:09<25:58, 5.45s/it] 96%|█████████▌| 6355/6640 [2:08:15<26:03, 5.48s/it] {'loss': 0.5814, 'learning_rate': 9.649128032779287e-08, 'epoch': 0.96} 96%|█████████▌| 6355/6640 [2:08:15<26:03, 5.48s/it] 96%|█████████▌| 6356/6640 [2:08:20<25:56, 5.48s/it] {'loss': 0.5735, 'learning_rate': 9.581641801006292e-08, 'epoch': 0.96} 96%|█████████▌| 6356/6640 [2:08:20<25:56, 5.48s/it] 96%|█████████▌| 6357/6640 [2:08:26<25:42, 5.45s/it] {'loss': 0.5882, 'learning_rate': 9.514391262233081e-08, 'epoch': 0.96} 96%|█████████▌| 6357/6640 [2:08:26<25:42, 5.45s/it] 96%|█████████▌| 6358/6640 [2:08:31<25:47, 5.49s/it] {'loss': 0.5367, 'learning_rate': 9.447376432463295e-08, 'epoch': 0.96} 96%|█████████▌| 6358/6640 [2:08:31<25:47, 5.49s/it] 96%|█████████▌| 6359/6640 [2:08:37<25:42, 5.49s/it] {'loss': 0.5896, 'learning_rate': 9.380597327644847e-08, 'epoch': 0.96} 96%|█████████▌| 6359/6640 [2:08:37<25:42, 5.49s/it] 96%|█████████▌| 6360/6640 [2:08:42<25:35, 5.48s/it] {'loss': 0.5897, 'learning_rate': 9.314053963669245e-08, 'epoch': 0.96} 96%|█████████▌| 6360/6640 [2:08:42<25:35, 5.48s/it] 96%|█████████▌| 6361/6640 [2:08:48<25:22, 5.46s/it] {'loss': 0.5805, 'learning_rate': 9.247746356372156e-08, 
'epoch': 0.96} 96%|█████████▌| 6361/6640 [2:08:48<25:22, 5.46s/it] 96%|█████████▌| 6362/6640 [2:08:53<25:25, 5.49s/it] {'loss': 0.5732, 'learning_rate': 9.181674521532957e-08, 'epoch': 0.96} 96%|█████████▌| 6362/6640 [2:08:53<25:25, 5.49s/it] 96%|█████████▌| 6363/6640 [2:08:58<25:03, 5.43s/it] {'loss': 0.5747, 'learning_rate': 9.115838474874849e-08, 'epoch': 0.96} 96%|█████████▌| 6363/6640 [2:08:58<25:03, 5.43s/it] 96%|█████████▌| 6364/6640 [2:09:04<24:56, 5.42s/it] {'loss': 0.5821, 'learning_rate': 9.0502382320653e-08, 'epoch': 0.96} 96%|█████████▌| 6364/6640 [2:09:04<24:56, 5.42s/it] 96%|█████████▌| 6365/6640 [2:09:09<24:41, 5.39s/it] {'loss': 0.5732, 'learning_rate': 8.984873808715155e-08, 'epoch': 0.96} 96%|█████████▌| 6365/6640 [2:09:09<24:41, 5.39s/it] 96%|█████████▌| 6366/6640 [2:09:15<24:31, 5.37s/it] {'loss': 0.5638, 'learning_rate': 8.919745220379528e-08, 'epoch': 0.96} 96%|█████████▌| 6366/6640 [2:09:15<24:31, 5.37s/it] 96%|█████████▌| 6367/6640 [2:09:20<24:22, 5.36s/it] {'loss': 0.5568, 'learning_rate': 8.854852482557242e-08, 'epoch': 0.96} 96%|█████████▌| 6367/6640 [2:09:20<24:22, 5.36s/it] 96%|█████████▌| 6368/6640 [2:09:25<24:24, 5.38s/it] {'loss': 0.5683, 'learning_rate': 8.790195610691054e-08, 'epoch': 0.96} 96%|█████████▌| 6368/6640 [2:09:25<24:24, 5.38s/it] 96%|█████████▌| 6369/6640 [2:09:31<24:32, 5.43s/it] {'loss': 0.5491, 'learning_rate': 8.725774620167549e-08, 'epoch': 0.96} 96%|█████████▌| 6369/6640 [2:09:31<24:32, 5.43s/it] 96%|█████████▌| 6370/6640 [2:09:37<25:01, 5.56s/it] {'loss': 0.5885, 'learning_rate': 8.661589526317238e-08, 'epoch': 0.96} 96%|█████████▌| 6370/6640 [2:09:37<25:01, 5.56s/it] 96%|█████████▌| 6371/6640 [2:09:42<24:46, 5.53s/it] {'loss': 0.5556, 'learning_rate': 8.597640344414348e-08, 'epoch': 0.96} 96%|█████████▌| 6371/6640 [2:09:42<24:46, 5.53s/it] 96%|█████████▌| 6372/6640 [2:09:48<24:34, 5.50s/it] {'loss': 0.5468, 'learning_rate': 8.533927089677152e-08, 'epoch': 0.96} 96%|█████████▌| 6372/6640 [2:09:48<24:34, 
5.50s/it] 96%|█████████▌| 6373/6640 [2:09:53<24:38, 5.54s/it] {'loss': 0.5768, 'learning_rate': 8.470449777267631e-08, 'epoch': 0.96} 96%|█████████▌| 6373/6640 [2:09:53<24:38, 5.54s/it] 96%|█████████▌| 6374/6640 [2:09:59<24:31, 5.53s/it] {'loss': 0.5862, 'learning_rate': 8.407208422291702e-08, 'epoch': 0.96} 96%|█████████▌| 6374/6640 [2:09:59<24:31, 5.53s/it] 96%|█████████▌| 6375/6640 [2:10:04<24:17, 5.50s/it] {'loss': 0.5491, 'learning_rate': 8.344203039799214e-08, 'epoch': 0.96} 96%|█████████▌| 6375/6640 [2:10:04<24:17, 5.50s/it] 96%|█████████▌| 6376/6640 [2:10:10<24:08, 5.49s/it] {'loss': 0.5593, 'learning_rate': 8.281433644783621e-08, 'epoch': 0.96} 96%|█████████▌| 6376/6640 [2:10:10<24:08, 5.49s/it] 96%|█████████▌| 6377/6640 [2:10:15<24:04, 5.49s/it] {'loss': 0.5875, 'learning_rate': 8.218900252182415e-08, 'epoch': 0.96} 96%|█████████▌| 6377/6640 [2:10:15<24:04, 5.49s/it] 96%|█████████▌| 6378/6640 [2:10:21<23:52, 5.47s/it] {'loss': 0.5567, 'learning_rate': 8.156602876876918e-08, 'epoch': 0.96} 96%|█████████▌| 6378/6640 [2:10:21<23:52, 5.47s/it] 96%|█████████▌| 6379/6640 [2:10:26<23:32, 5.41s/it] {'loss': 0.5783, 'learning_rate': 8.094541533692047e-08, 'epoch': 0.96} 96%|█████████▌| 6379/6640 [2:10:26<23:32, 5.41s/it] 96%|█████████▌| 6380/6640 [2:10:31<23:25, 5.40s/it] {'loss': 0.5712, 'learning_rate': 8.032716237396987e-08, 'epoch': 0.96} 96%|█████████▌| 6380/6640 [2:10:31<23:25, 5.40s/it] 96%|█████████▌| 6381/6640 [2:10:36<23:06, 5.35s/it] {'loss': 0.5533, 'learning_rate': 7.971127002704304e-08, 'epoch': 0.96} 96%|█████████▌| 6381/6640 [2:10:36<23:06, 5.35s/it] 96%|█████████▌| 6382/6640 [2:10:42<22:51, 5.31s/it] {'loss': 0.5793, 'learning_rate': 7.909773844270718e-08, 'epoch': 0.96} 96%|█████████▌| 6382/6640 [2:10:42<22:51, 5.31s/it] 96%|█████████▌| 6383/6640 [2:10:47<22:48, 5.32s/it] {'loss': 0.5741, 'learning_rate': 7.84865677669655e-08, 'epoch': 0.96} 96%|█████████▌| 6383/6640 [2:10:47<22:48, 5.32s/it] 96%|█████████▌| 6384/6640 [2:10:52<22:50, 5.35s/it] 
{'loss': 0.5875, 'learning_rate': 7.787775814526055e-08, 'epoch': 0.96} 96%|█████████▌| 6384/6640 [2:10:52<22:50, 5.35s/it] 96%|█████████▌| 6385/6640 [2:10:58<22:44, 5.35s/it] {'loss': 0.5878, 'learning_rate': 7.727130972247199e-08, 'epoch': 0.96} 96%|█████████▌| 6385/6640 [2:10:58<22:44, 5.35s/it] 96%|█████████▌| 6386/6640 [2:11:03<22:49, 5.39s/it] {'loss': 0.5696, 'learning_rate': 7.666722264291882e-08, 'epoch': 0.96} 96%|█████████▌| 6386/6640 [2:11:03<22:49, 5.39s/it] 96%|█████████▌| 6387/6640 [2:11:09<22:34, 5.35s/it] {'loss': 0.5664, 'learning_rate': 7.606549705035937e-08, 'epoch': 0.96} 96%|█████████▌| 6387/6640 [2:11:09<22:34, 5.35s/it] 96%|█████████▌| 6388/6640 [2:11:14<22:41, 5.40s/it] {'loss': 0.5825, 'learning_rate': 7.546613308798468e-08, 'epoch': 0.96} 96%|█████████▌| 6388/6640 [2:11:14<22:41, 5.40s/it] 96%|█████████▌| 6389/6640 [2:11:20<22:47, 5.45s/it] {'loss': 0.5427, 'learning_rate': 7.48691308984295e-08, 'epoch': 0.96} 96%|█████████▌| 6389/6640 [2:11:20<22:47, 5.45s/it] 96%|█████████▌| 6390/6640 [2:11:25<22:35, 5.42s/it] {'loss': 0.5776, 'learning_rate': 7.427449062376468e-08, 'epoch': 0.96} 96%|█████████▌| 6390/6640 [2:11:25<22:35, 5.42s/it] 96%|█████████▋| 6391/6640 [2:11:31<22:40, 5.46s/it] {'loss': 0.6003, 'learning_rate': 7.3682212405497e-08, 'epoch': 0.96} 96%|█████████▋| 6391/6640 [2:11:31<22:40, 5.46s/it] 96%|█████████▋| 6392/6640 [2:11:36<22:30, 5.44s/it] {'loss': 0.5649, 'learning_rate': 7.309229638457372e-08, 'epoch': 0.96} 96%|█████████▋| 6392/6640 [2:11:36<22:30, 5.44s/it] 96%|█████████▋| 6393/6640 [2:11:41<22:19, 5.42s/it] {'loss': 0.6047, 'learning_rate': 7.250474270137919e-08, 'epoch': 0.96} 96%|█████████▋| 6393/6640 [2:11:41<22:19, 5.42s/it] 96%|█████████▋| 6394/6640 [2:11:46<21:55, 5.35s/it] {'loss': 0.5868, 'learning_rate': 7.191955149573492e-08, 'epoch': 0.96} 96%|█████████▋| 6394/6640 [2:11:46<21:55, 5.35s/it] 96%|█████████▋| 6395/6640 [2:11:52<21:58, 5.38s/it] {'loss': 0.5589, 'learning_rate': 7.133672290690064e-08, 'epoch': 
0.96} 96%|█████████▋| 6395/6640 [2:11:52<21:58, 5.38s/it] 96%|█████████▋| 6396/6640 [2:11:57<21:53, 5.38s/it] {'loss': 0.5747, 'learning_rate': 7.075625707357537e-08, 'epoch': 0.96} 96%|█████████▋| 6396/6640 [2:11:57<21:53, 5.38s/it] 96%|█████████▋| 6397/6640 [2:12:03<21:44, 5.37s/it] {'loss': 0.5682, 'learning_rate': 7.017815413389306e-08, 'epoch': 0.96} 96%|█████████▋| 6397/6640 [2:12:03<21:44, 5.37s/it] 96%|█████████▋| 6398/6640 [2:12:08<21:36, 5.36s/it] {'loss': 0.5517, 'learning_rate': 6.960241422542702e-08, 'epoch': 0.96} 96%|█████████▋| 6398/6640 [2:12:08<21:36, 5.36s/it] 96%|█████████▋| 6399/6640 [2:12:13<21:42, 5.40s/it] {'loss': 0.5657, 'learning_rate': 6.902903748518764e-08, 'epoch': 0.96} 96%|█████████▋| 6399/6640 [2:12:13<21:42, 5.40s/it]03 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 17 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 96%|█████████▋| 6400/6640 [2:12:19<21:35, 5.40s/it] 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... {'loss': 0.5738, 'learning_rate': 6.845802404962243e-08, 'epoch': 0.96} 96%|█████████▋| 6400/6640 [2:12:19<21:35, 5.40s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. 
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 96%|█████████▋| 6401/6640 [2:12:46<47:36, 11.95s/it] {'loss': 0.585, 'learning_rate': 6.788937405461937e-08, 'epoch': 0.96} 96%|█████████▋| 6401/6640 [2:12:46<47:36, 11.95s/it] 96%|█████████▋| 6402/6640 [2:12:51<39:31, 9.97s/it] {'loss': 0.5771, 'learning_rate': 6.732308763550022e-08, 'epoch': 0.96} 96%|█████████▋| 6402/6640 [2:12:51<39:31, 9.97s/it] 96%|█████████▋| 6403/6640 [2:12:57<33:58, 8.60s/it] {'loss': 0.57, 'learning_rate': 6.675916492702717e-08, 'epoch': 0.96} 96%|█████████▋| 6403/6640 [2:12:57<33:58, 8.60s/it] 96%|█████████▋| 6404/6640 [2:13:02<30:00, 7.63s/it] {'loss': 0.6263, 'learning_rate': 6.619760606339731e-08, 'epoch': 0.96} 96%|█████████▋| 6404/6640 [2:13:02<30:00, 7.63s/it] 96%|█████████▋| 6405/6640 [2:13:08<27:24, 7.00s/it] {'loss': 0.5823, 'learning_rate': 6.56384111782482e-08, 'epoch': 0.96} 96%|█████████▋| 6405/6640 [2:13:08<27:24, 7.00s/it] 96%|█████████▋| 6406/6640 [2:13:13<25:25, 6.52s/it] {'loss': 0.5713, 'learning_rate': 6.508158040465118e-08, 'epoch': 0.96} 96%|█████████▋| 6406/6640 [2:13:13<25:25, 6.52s/it] 96%|█████████▋| 6407/6640 [2:13:18<23:58, 6.17s/it] {'loss': 0.5421, 'learning_rate': 6.452711387511912e-08, 'epoch': 0.96} 96%|█████████▋| 6407/6640 [2:13:19<23:58, 6.17s/it] 97%|█████████▋| 6408/6640 [2:13:24<23:04, 5.97s/it] {'loss': 0.5782, 'learning_rate': 6.39750117215987e-08, 'epoch': 0.97} 97%|█████████▋| 6408/6640 [2:13:24<23:04, 5.97s/it] 97%|█████████▋| 6409/6640 [2:13:29<22:14, 5.78s/it] {'loss': 0.568, 'learning_rate': 6.342527407547594e-08, 'epoch': 0.97} 97%|█████████▋| 6409/6640 [2:13:29<22:14, 5.78s/it] 97%|█████████▋| 6410/6640 [2:13:35<21:48, 5.69s/it] {'loss': 0.5921, 'learning_rate': 6.287790106757396e-08, 'epoch': 0.97} 97%|█████████▋| 6410/6640 [2:13:35<21:48, 5.69s/it] 97%|█████████▋| 6411/6640 [2:13:40<21:40, 5.68s/it] {'loss': 0.5507, 'learning_rate': 6.233289282815302e-08, 
'epoch': 0.97} 97%|█████████▋| 6411/6640 [2:13:40<21:40, 5.68s/it] 97%|█████████▋| 6412/6640 [2:13:46<21:26, 5.64s/it] {'loss': 0.5693, 'learning_rate': 6.179024948690938e-08, 'epoch': 0.97} 97%|█████████▋| 6412/6640 [2:13:46<21:26, 5.64s/it] 97%|█████████▋| 6413/6640 [2:13:51<21:04, 5.57s/it] {'loss': 0.5823, 'learning_rate': 6.124997117297859e-08, 'epoch': 0.97} 97%|█████████▋| 6413/6640 [2:13:51<21:04, 5.57s/it] 97%|█████████▋| 6414/6640 [2:13:57<20:44, 5.50s/it] {'loss': 0.5727, 'learning_rate': 6.07120580149323e-08, 'epoch': 0.97} 97%|█████████▋| 6414/6640 [2:13:57<20:44, 5.50s/it] 97%|█████████▋| 6415/6640 [2:14:02<20:37, 5.50s/it] {'loss': 0.5582, 'learning_rate': 6.017651014077807e-08, 'epoch': 0.97} 97%|█████████▋| 6415/6640 [2:14:02<20:37, 5.50s/it] 97%|█████████▋| 6416/6640 [2:14:08<20:39, 5.53s/it] {'loss': 0.6008, 'learning_rate': 5.964332767796399e-08, 'epoch': 0.97} 97%|█████████▋| 6416/6640 [2:14:08<20:39, 5.53s/it] 97%|█████████▋| 6417/6640 [2:14:13<20:40, 5.56s/it] {'loss': 0.552, 'learning_rate': 5.911251075337188e-08, 'epoch': 0.97} 97%|█████████▋| 6417/6640 [2:14:13<20:40, 5.56s/it] 97%|█████████▋| 6418/6640 [2:14:19<20:11, 5.46s/it] {'loss': 0.6009, 'learning_rate': 5.85840594933218e-08, 'epoch': 0.97} 97%|█████████▋| 6418/6640 [2:14:19<20:11, 5.46s/it] 97%|█████████▋| 6419/6640 [2:14:24<20:11, 5.48s/it] {'loss': 0.5857, 'learning_rate': 5.805797402357205e-08, 'epoch': 0.97} 97%|█████████▋| 6419/6640 [2:14:24<20:11, 5.48s/it] 97%|█████████▋| 6420/6640 [2:14:30<19:58, 5.45s/it] {'loss': 0.5597, 'learning_rate': 5.753425446931582e-08, 'epoch': 0.97} 97%|█████████▋| 6420/6640 [2:14:30<19:58, 5.45s/it] 97%|█████████▋| 6421/6640 [2:14:35<19:57, 5.47s/it] {'loss': 0.5786, 'learning_rate': 5.701290095518564e-08, 'epoch': 0.97} 97%|█████████▋| 6421/6640 [2:14:35<19:57, 5.47s/it] 97%|█████████▋| 6422/6640 [2:14:41<19:46, 5.44s/it] {'loss': 0.5964, 'learning_rate': 5.6493913605246696e-08, 'epoch': 0.97} 97%|█████████▋| 6422/6640 [2:14:41<19:46, 
5.44s/it] 97%|█████████▋| 6423/6640 [2:14:46<19:52, 5.49s/it] {'loss': 0.5562, 'learning_rate': 5.5977292543007987e-08, 'epoch': 0.97} 97%|█████████▋| 6423/6640 [2:14:46<19:52, 5.49s/it] 97%|█████████▋| 6424/6640 [2:14:52<20:02, 5.57s/it] {'loss': 0.5625, 'learning_rate': 5.5463037891408944e-08, 'epoch': 0.97} 97%|█████████▋| 6424/6640 [2:14:52<20:02, 5.57s/it] 97%|█████████▋| 6425/6640 [2:14:57<19:54, 5.55s/it] {'loss': 0.5741, 'learning_rate': 5.495114977282945e-08, 'epoch': 0.97} 97%|█████████▋| 6425/6640 [2:14:57<19:54, 5.55s/it] 97%|█████████▋| 6426/6640 [2:15:03<19:44, 5.54s/it] {'loss': 0.5634, 'learning_rate': 5.444162830908428e-08, 'epoch': 0.97} 97%|█████████▋| 6426/6640 [2:15:03<19:44, 5.54s/it] 97%|█████████▋| 6427/6640 [2:15:08<19:27, 5.48s/it] {'loss': 0.5712, 'learning_rate': 5.393447362142645e-08, 'epoch': 0.97} 97%|█████████▋| 6427/6640 [2:15:08<19:27, 5.48s/it] 97%|█████████▋| 6428/6640 [2:15:14<19:24, 5.49s/it] {'loss': 0.5751, 'learning_rate': 5.342968583054497e-08, 'epoch': 0.97} 97%|█████████▋| 6428/6640 [2:15:14<19:24, 5.49s/it] 97%|█████████▋| 6429/6640 [2:15:19<19:09, 5.45s/it] {'loss': 0.5754, 'learning_rate': 5.292726505656598e-08, 'epoch': 0.97} 97%|█████████▋| 6429/6640 [2:15:19<19:09, 5.45s/it] 97%|█████████▋| 6430/6640 [2:15:24<18:59, 5.43s/it] {'loss': 0.5803, 'learning_rate': 5.2427211419051605e-08, 'epoch': 0.97} 97%|█████████▋| 6430/6640 [2:15:24<18:59, 5.43s/it] 97%|█████████▋| 6431/6640 [2:15:30<18:48, 5.40s/it] {'loss': 0.5769, 'learning_rate': 5.192952503700222e-08, 'epoch': 0.97} 97%|█████████▋| 6431/6640 [2:15:30<18:48, 5.40s/it] 97%|█████████▋| 6432/6640 [2:15:35<18:46, 5.41s/it] {'loss': 0.5707, 'learning_rate': 5.1434206028853074e-08, 'epoch': 0.97} 97%|█████████▋| 6432/6640 [2:15:35<18:46, 5.41s/it] 97%|█████████▋| 6433/6640 [2:15:41<18:58, 5.50s/it] {'loss': 0.5559, 'learning_rate': 5.094125451247656e-08, 'epoch': 0.97} 97%|█████████▋| 6433/6640 [2:15:41<18:58, 5.50s/it] 97%|█████████▋| 6434/6640 [2:15:47<19:00, 
5.54s/it] {'loss': 0.5712, 'learning_rate': 5.0450670605182165e-08, 'epoch': 0.97} 97%|█████████▋| 6434/6640 [2:15:47<19:00, 5.54s/it] 97%|█████████▋| 6435/6640 [2:15:52<18:36, 5.45s/it] {'loss': 0.6173, 'learning_rate': 4.99624544237165e-08, 'epoch': 0.97} 97%|█████████▋| 6435/6640 [2:15:52<18:36, 5.45s/it] 97%|█████████▋| 6436/6640 [2:15:57<18:26, 5.43s/it] {'loss': 0.5755, 'learning_rate': 4.947660608426108e-08, 'epoch': 0.97} 97%|█████████▋| 6436/6640 [2:15:57<18:26, 5.43s/it] 97%|█████████▋| 6437/6640 [2:16:03<18:28, 5.46s/it] {'loss': 0.5705, 'learning_rate': 4.899312570243453e-08, 'epoch': 0.97} 97%|█████████▋| 6437/6640 [2:16:03<18:28, 5.46s/it] 97%|█████████▋| 6438/6640 [2:16:08<18:13, 5.41s/it] {'loss': 0.5646, 'learning_rate': 4.851201339329259e-08, 'epoch': 0.97} 97%|█████████▋| 6438/6640 [2:16:08<18:13, 5.41s/it] 97%|█████████▋| 6439/6640 [2:16:13<18:04, 5.40s/it] {'loss': 0.5631, 'learning_rate': 4.803326927132812e-08, 'epoch': 0.97} 97%|█████████▋| 6439/6640 [2:16:13<18:04, 5.40s/it] 97%|█████████▋| 6440/6640 [2:16:19<17:57, 5.39s/it] {'loss': 0.5812, 'learning_rate': 4.7556893450466656e-08, 'epoch': 0.97} 97%|█████████▋| 6440/6640 [2:16:19<17:57, 5.39s/it] 97%|█████████▋| 6441/6640 [2:16:24<17:59, 5.43s/it] {'loss': 0.5723, 'learning_rate': 4.70828860440764e-08, 'epoch': 0.97} 97%|█████████▋| 6441/6640 [2:16:24<17:59, 5.43s/it] 97%|█████████▋| 6442/6640 [2:16:30<17:56, 5.44s/it] {'loss': 0.5765, 'learning_rate': 4.661124716495602e-08, 'epoch': 0.97} 97%|█████████▋| 6442/6640 [2:16:30<17:56, 5.44s/it] 97%|█████████▋| 6443/6640 [2:16:35<17:56, 5.47s/it] {'loss': 0.5953, 'learning_rate': 4.61419769253435e-08, 'epoch': 0.97} 97%|█████████▋| 6443/6640 [2:16:35<17:56, 5.47s/it] 97%|█████████▋| 6444/6640 [2:16:41<17:50, 5.46s/it] {'loss': 0.5763, 'learning_rate': 4.567507543691174e-08, 'epoch': 0.97} 97%|█████████▋| 6444/6640 [2:16:41<17:50, 5.46s/it] 97%|█████████▋| 6445/6640 [2:16:46<17:42, 5.45s/it] {'loss': 0.5567, 'learning_rate': 
4.5210542810771864e-08, 'epoch': 0.97} 97%|█████████▋| 6445/6640 [2:16:46<17:42, 5.45s/it] 97%|█████████▋| 6446/6640 [2:16:51<17:23, 5.38s/it] {'loss': 0.5972, 'learning_rate': 4.474837915747099e-08, 'epoch': 0.97} 97%|█████████▋| 6446/6640 [2:16:51<17:23, 5.38s/it] 97%|█████████▋| 6447/6640 [2:16:57<17:19, 5.38s/it] {'loss': 0.5728, 'learning_rate': 4.4288584586990034e-08, 'epoch': 0.97} 97%|█████████▋| 6447/6640 [2:16:57<17:19, 5.38s/it] 97%|█████████▋| 6448/6640 [2:17:02<17:12, 5.38s/it] {'loss': 0.5695, 'learning_rate': 4.3831159208748145e-08, 'epoch': 0.97} 97%|█████████▋| 6448/6640 [2:17:02<17:12, 5.38s/it] 97%|█████████▋| 6449/6640 [2:17:08<17:11, 5.40s/it] {'loss': 0.559, 'learning_rate': 4.337610313159935e-08, 'epoch': 0.97} 97%|█████████▋| 6449/6640 [2:17:08<17:11, 5.40s/it]23 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 1 0AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 97%|█████████▋| 6450/6640 [2:17:13<16:58, 5.36s/it]4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.592, 'learning_rate': 4.292341646383813e-08, 'epoch': 0.97} 97%|█████████▋| 6450/6640 [2:17:13<16:58, 5.36s/it] 97%|█████████▋| 6451/6640 [2:17:18<16:56, 5.38s/it] {'loss': 0.5764, 'learning_rate': 4.247309931318722e-08, 'epoch': 0.97} 97%|█████████▋| 6451/6640 [2:17:18<16:56, 5.38s/it] 97%|█████████▋| 6452/6640 [2:17:24<16:48, 5.36s/it] {'loss': 0.5837, 'learning_rate': 4.202515178681421e-08, 'epoch': 0.97} 97%|█████████▋| 6452/6640 [2:17:24<16:48, 5.36s/it] 97%|█████████▋| 6453/6640 [2:17:29<16:52, 5.41s/it] {'loss': 0.5982, 'learning_rate': 4.157957399131607e-08, 'epoch': 0.97} 97%|█████████▋| 6453/6640 [2:17:29<16:52, 5.41s/it] 97%|█████████▋| 6454/6640 [2:17:35<16:47, 5.42s/it] {'loss': 0.5983, 'learning_rate': 4.113636603272797e-08, 'epoch': 0.97} 97%|█████████▋| 6454/6640 [2:17:35<16:47, 5.42s/it] 97%|█████████▋| 6455/6640 [2:17:40<16:35, 5.38s/it] {'loss': 0.5608, 'learning_rate': 4.069552801652443e-08, 'epoch': 0.97} 97%|█████████▋| 6455/6640 [2:17:40<16:35, 5.38s/it] 97%|█████████▋| 6456/6640 [2:17:45<16:30, 5.38s/it] {'loss': 0.58, 'learning_rate': 4.025706004760932e-08, 'epoch': 0.97} 97%|█████████▋| 6456/6640 [2:17:45<16:30, 5.38s/it] 97%|█████████▋| 6457/6640 [2:17:51<16:34, 5.43s/it] {'loss': 0.5823, 'learning_rate': 3.982096223032916e-08, 'epoch': 0.97} 97%|█████████▋| 6457/6640 [2:17:51<16:34, 5.43s/it] 97%|█████████▋| 6458/6640 [2:17:56<16:35, 5.47s/it] {'loss': 0.5946, 'learning_rate': 3.938723466846206e-08, 'epoch': 0.97} 97%|█████████▋| 6458/6640 [2:17:56<16:35, 5.47s/it] 97%|█████████▋| 6459/6640 [2:18:02<16:26, 5.45s/it] {'loss': 0.5859, 'learning_rate': 3.8955877465224335e-08, 'epoch': 0.97} 97%|█████████▋| 6459/6640 [2:18:02<16:26, 5.45s/it] 97%|█████████▋| 6460/6640 [2:18:07<16:12, 5.40s/it] {'loss': 0.5668, 'learning_rate': 3.852689072326832e-08, 'epoch': 0.97} 97%|█████████▋| 6460/6640 [2:18:07<16:12, 5.40s/it] 97%|█████████▋| 6461/6640 [2:18:13<16:12, 5.43s/it] {'loss': 0.5846, 'learning_rate': 3.8100274544677904e-08, 
'epoch': 0.97} 97%|█████████▋| 6461/6640 [2:18:13<16:12, 5.43s/it] 97%|█████████▋| 6462/6640 [2:18:18<16:15, 5.48s/it] {'loss': 0.5558, 'learning_rate': 3.7676029030979666e-08, 'epoch': 0.97} 97%|█████████▋| 6462/6640 [2:18:18<16:15, 5.48s/it] 97%|█████████▋| 6463/6640 [2:18:24<16:04, 5.45s/it] {'loss': 0.5558, 'learning_rate': 3.725415428313173e-08, 'epoch': 0.97} 97%|█████████▋| 6463/6640 [2:18:24<16:04, 5.45s/it] 97%|█████████▋| 6464/6640 [2:18:29<15:56, 5.44s/it] {'loss': 0.575, 'learning_rate': 3.6834650401528226e-08, 'epoch': 0.97} 97%|█████████▋| 6464/6640 [2:18:29<15:56, 5.44s/it] 97%|█████████▋| 6465/6640 [2:18:34<15:47, 5.41s/it] {'loss': 0.5889, 'learning_rate': 3.641751748600042e-08, 'epoch': 0.97} 97%|█████████▋| 6465/6640 [2:18:34<15:47, 5.41s/it] 97%|█████████▋| 6466/6640 [2:18:40<15:50, 5.46s/it] {'loss': 0.5838, 'learning_rate': 3.6002755635814455e-08, 'epoch': 0.97} 97%|█████████▋| 6466/6640 [2:18:40<15:50, 5.46s/it] 97%|█████████▋| 6467/6640 [2:18:45<15:51, 5.50s/it] {'loss': 0.5877, 'learning_rate': 3.5590364949672495e-08, 'epoch': 0.97} 97%|█████████▋| 6467/6640 [2:18:45<15:51, 5.50s/it] 97%|█████████▋| 6468/6640 [2:18:51<15:33, 5.43s/it] {'loss': 0.5663, 'learning_rate': 3.51803455257127e-08, 'epoch': 0.97} 97%|█████████▋| 6468/6640 [2:18:51<15:33, 5.43s/it] 97%|█████████▋| 6469/6640 [2:18:56<15:35, 5.47s/it] {'loss': 0.5783, 'learning_rate': 3.477269746150924e-08, 'epoch': 0.97} 97%|█████████▋| 6469/6640 [2:18:56<15:35, 5.47s/it] 97%|█████████▋| 6470/6640 [2:19:02<15:28, 5.46s/it] {'loss': 0.5826, 'learning_rate': 3.436742085407119e-08, 'epoch': 0.97} 97%|█████████▋| 6470/6640 [2:19:02<15:28, 5.46s/it] 97%|█████████▋| 6471/6640 [2:19:07<15:31, 5.51s/it] {'loss': 0.5682, 'learning_rate': 3.39645157998425e-08, 'epoch': 0.97} 97%|█████████▋| 6471/6640 [2:19:07<15:31, 5.51s/it] 97%|█████████▋| 6472/6640 [2:19:13<15:30, 5.54s/it] {'loss': 0.5893, 'learning_rate': 3.356398239470427e-08, 'epoch': 0.97} 97%|█████████▋| 6472/6640 [2:19:13<15:30, 
5.54s/it] 97%|█████████▋| 6473/6640 [2:19:18<15:17, 5.50s/it] {'loss': 0.5931, 'learning_rate': 3.3165820733973566e-08, 'epoch': 0.97} 97%|█████████▋| 6473/6640 [2:19:18<15:17, 5.50s/it] 98%|█████████▊| 6474/6640 [2:19:24<15:16, 5.52s/it] {'loss': 0.5602, 'learning_rate': 3.277003091240128e-08, 'epoch': 0.97} 98%|█████████▊| 6474/6640 [2:19:24<15:16, 5.52s/it] 98%|█████████▊| 6475/6640 [2:19:29<15:08, 5.51s/it] {'loss': 0.5833, 'learning_rate': 3.2376613024175384e-08, 'epoch': 0.98} 98%|█████████▊| 6475/6640 [2:19:29<15:08, 5.51s/it] 98%|█████████▊| 6476/6640 [2:19:35<15:03, 5.51s/it] {'loss': 0.5704, 'learning_rate': 3.198556716291767e-08, 'epoch': 0.98} 98%|█████████▊| 6476/6640 [2:19:35<15:03, 5.51s/it] 98%|█████████▊| 6477/6640 [2:19:40<14:56, 5.50s/it] {'loss': 0.5773, 'learning_rate': 3.159689342168704e-08, 'epoch': 0.98} 98%|█████████▊| 6477/6640 [2:19:40<14:56, 5.50s/it] 98%|█████████▊| 6478/6640 [2:19:46<14:47, 5.48s/it] {'loss': 0.5768, 'learning_rate': 3.121059189297837e-08, 'epoch': 0.98} 98%|█████████▊| 6478/6640 [2:19:46<14:47, 5.48s/it] 98%|█████████▊| 6479/6640 [2:19:51<14:39, 5.46s/it] {'loss': 0.5694, 'learning_rate': 3.082666266872036e-08, 'epoch': 0.98} 98%|█████████▊| 6479/6640 [2:19:51<14:39, 5.46s/it] 98%|█████████▊| 6480/6640 [2:19:57<14:30, 5.44s/it] {'loss': 0.5746, 'learning_rate': 3.044510584027771e-08, 'epoch': 0.98} 98%|█████████▊| 6480/6640 [2:19:57<14:30, 5.44s/it] 98%|█████████▊| 6481/6640 [2:20:02<14:20, 5.41s/it] {'loss': 0.5904, 'learning_rate': 3.0065921498451113e-08, 'epoch': 0.98} 98%|█████████▊| 6481/6640 [2:20:02<14:20, 5.41s/it] 98%|█████████▊| 6482/6640 [2:20:07<14:16, 5.42s/it] {'loss': 0.639, 'learning_rate': 2.9689109733475052e-08, 'epoch': 0.98} 98%|█████████▊| 6482/6640 [2:20:07<14:16, 5.42s/it] 98%|█████████▊| 6483/6640 [2:20:13<14:08, 5.41s/it] {'loss': 0.5658, 'learning_rate': 2.9314670635022247e-08, 'epoch': 0.98} 98%|█████████▊| 6483/6640 [2:20:13<14:08, 5.41s/it] 98%|█████████▊| 6484/6640 [2:20:18<14:07, 
5.43s/it] {'loss': 0.5932, 'learning_rate': 2.8942604292199195e-08, 'epoch': 0.98} 98%|█████████▊| 6484/6640 [2:20:18<14:07, 5.43s/it] 98%|█████████▊| 6485/6640 [2:20:24<14:01, 5.43s/it] {'loss': 0.5711, 'learning_rate': 2.8572910793546183e-08, 'epoch': 0.98} 98%|█████████▊| 6485/6640 [2:20:24<14:01, 5.43s/it] 98%|█████████▊| 6486/6640 [2:20:29<14:08, 5.51s/it] {'loss': 0.5551, 'learning_rate': 2.8205590227040613e-08, 'epoch': 0.98} 98%|█████████▊| 6486/6640 [2:20:29<14:08, 5.51s/it] 98%|█████████▊| 6487/6640 [2:20:35<13:57, 5.47s/it] {'loss': 0.5657, 'learning_rate': 2.7840642680097007e-08, 'epoch': 0.98} 98%|█████████▊| 6487/6640 [2:20:35<13:57, 5.47s/it] 98%|█████████▊| 6488/6640 [2:20:40<13:48, 5.45s/it] {'loss': 0.5681, 'learning_rate': 2.747806823956034e-08, 'epoch': 0.98} 98%|█████████▊| 6488/6640 [2:20:40<13:48, 5.45s/it] 98%|█████████▊| 6489/6640 [2:20:46<13:51, 5.50s/it] {'loss': 0.5626, 'learning_rate': 2.7117866991714925e-08, 'epoch': 0.98} 98%|█████████▊| 6489/6640 [2:20:46<13:51, 5.50s/it] 98%|█████████▊| 6490/6640 [2:20:51<13:42, 5.48s/it] {'loss': 0.5673, 'learning_rate': 2.676003902227886e-08, 'epoch': 0.98} 98%|█████████▊| 6490/6640 [2:20:51<13:42, 5.48s/it] 98%|█████████▊| 6491/6640 [2:20:57<13:30, 5.44s/it] {'loss': 0.5728, 'learning_rate': 2.6404584416406254e-08, 'epoch': 0.98} 98%|█████████▊| 6491/6640 [2:20:57<13:30, 5.44s/it] 98%|█████████▊| 6492/6640 [2:21:02<13:34, 5.50s/it] {'loss': 0.5592, 'learning_rate': 2.6051503258684995e-08, 'epoch': 0.98} 98%|█████████▊| 6492/6640 [2:21:02<13:34, 5.50s/it] 98%|█████████▊| 6493/6640 [2:21:08<13:36, 5.56s/it] {'loss': 0.5776, 'learning_rate': 2.5700795633138987e-08, 'epoch': 0.98} 98%|█████████▊| 6493/6640 [2:21:08<13:36, 5.56s/it] 98%|█████████▊| 6494/6640 [2:21:14<13:34, 5.58s/it] {'loss': 0.5709, 'learning_rate': 2.535246162322702e-08, 'epoch': 0.98} 98%|█████████▊| 6494/6640 [2:21:14<13:34, 5.58s/it] 98%|█████████▊| 6495/6640 [2:21:19<13:20, 5.52s/it] {'loss': 0.5807, 'learning_rate': 
2.50065013118439e-08, 'epoch': 0.98} 98%|█████████▊| 6495/6640 [2:21:19<13:20, 5.52s/it] 98%|█████████▊| 6496/6640 [2:21:24<13:08, 5.48s/it] {'loss': 0.5704, 'learning_rate': 2.4662914781318215e-08, 'epoch': 0.98} 98%|█████████▊| 6496/6640 [2:21:24<13:08, 5.48s/it] 98%|█████████▊| 6497/6640 [2:21:30<12:56, 5.43s/it] {'loss': 0.5765, 'learning_rate': 2.432170211341567e-08, 'epoch': 0.98} 98%|█████████▊| 6497/6640 [2:21:30<12:56, 5.43s/it] 98%|█████████▊| 6498/6640 [2:21:35<12:59, 5.49s/it] {'loss': 0.5996, 'learning_rate': 2.3982863389333532e-08, 'epoch': 0.98} 98%|█████████▊| 6498/6640 [2:21:35<12:59, 5.49s/it] 98%|█████████▊| 6499/6640 [2:21:41<13:00, 5.54s/it] {'loss': 0.5876, 'learning_rate': 2.3646398689707307e-08, 'epoch': 0.98} 98%|█████████▊| 6499/6640 [2:21:41<13:00, 5.54s/it]3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 06 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 98%|█████████▊| 6500/6640 [2:21:46<12:55, 5.54s/it]4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.577, 'learning_rate': 2.3312308094607382e-08, 'epoch': 0.98} 98%|█████████▊| 6500/6640 [2:21:46<12:55, 5.54s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 98%|█████████▊| 6501/6640 [2:22:15<28:41, 12.38s/it] {'loss': 0.5857, 'learning_rate': 2.2980591683539056e-08, 'epoch': 0.98} 98%|█████████▊| 6501/6640 [2:22:15<28:41, 12.38s/it] 98%|█████████▊| 6502/6640 [2:22:20<23:38, 10.28s/it] {'loss': 0.5679, 'learning_rate': 2.265124953543918e-08, 'epoch': 0.98} 98%|█████████▊| 6502/6640 [2:22:20<23:38, 10.28s/it] 98%|█████████▊| 6503/6640 [2:22:26<20:07, 8.82s/it] {'loss': 0.5664, 'learning_rate': 2.232428172868395e-08, 'epoch': 0.98} 98%|█████████▊| 6503/6640 [2:22:26<20:07, 8.82s/it] 98%|█████████▊| 6504/6640 [2:22:31<17:40, 7.80s/it] {'loss': 0.5538, 'learning_rate': 2.1999688341083348e-08, 'epoch': 0.98} 98%|█████████▊| 6504/6640 [2:22:31<17:40, 7.80s/it] 98%|█████████▊| 6505/6640 [2:22:36<15:52, 7.06s/it] {'loss': 0.5794, 'learning_rate': 2.167746944988114e-08, 'epoch': 0.98} 98%|█████████▊| 6505/6640 [2:22:36<15:52, 7.06s/it] 98%|█████████▊| 6506/6640 [2:22:42<14:38, 6.56s/it] {'loss': 0.5697, 'learning_rate': 2.135762513175599e-08, 'epoch': 0.98} 98%|█████████▊| 6506/6640 [2:22:42<14:38, 6.56s/it] 98%|█████████▊| 6507/6640 
[2:22:47<13:45, 6.21s/it] {'loss': 0.5713, 'learning_rate': 2.1040155462824786e-08, 'epoch': 0.98} 98%|█████████▊| 6507/6640 [2:22:47<13:45, 6.21s/it] 98%|█████████▊| 6508/6640 [2:22:53<13:13, 6.01s/it] {'loss': 0.5788, 'learning_rate': 2.0725060518633765e-08, 'epoch': 0.98} 98%|█████████▊| 6508/6640 [2:22:53<13:13, 6.01s/it] 98%|█████████▊| 6509/6640 [2:22:58<12:43, 5.83s/it] {'loss': 0.5845, 'learning_rate': 2.0412340374169615e-08, 'epoch': 0.98} 98%|█████████▊| 6509/6640 [2:22:58<12:43, 5.83s/it] 98%|█████████▊| 6510/6640 [2:23:03<12:16, 5.67s/it] {'loss': 0.5939, 'learning_rate': 2.010199510385058e-08, 'epoch': 0.98} 98%|█████████▊| 6510/6640 [2:23:03<12:16, 5.67s/it] 98%|█████████▊| 6511/6640 [2:23:09<11:59, 5.58s/it] {'loss': 0.5672, 'learning_rate': 1.9794024781529808e-08, 'epoch': 0.98} 98%|█████████▊| 6511/6640 [2:23:09<11:59, 5.58s/it] 98%|█████████▊| 6512/6640 [2:23:14<11:57, 5.61s/it] {'loss': 0.5814, 'learning_rate': 1.948842948049534e-08, 'epoch': 0.98} 98%|█████████▊| 6512/6640 [2:23:14<11:57, 5.61s/it] 98%|█████████▊| 6513/6640 [2:23:20<11:44, 5.55s/it] {'loss': 0.5598, 'learning_rate': 1.9185209273473448e-08, 'epoch': 0.98} 98%|█████████▊| 6513/6640 [2:23:20<11:44, 5.55s/it] 98%|█████████▊| 6514/6640 [2:23:25<11:31, 5.49s/it] {'loss': 0.586, 'learning_rate': 1.8884364232619744e-08, 'epoch': 0.98} 98%|█████████▊| 6514/6640 [2:23:25<11:31, 5.49s/it] 98%|█████████▊| 6515/6640 [2:23:31<11:28, 5.51s/it] {'loss': 0.5992, 'learning_rate': 1.8585894429528073e-08, 'epoch': 0.98} 98%|█████████▊| 6515/6640 [2:23:31<11:28, 5.51s/it] 98%|█████████▊| 6516/6640 [2:23:36<11:15, 5.45s/it] {'loss': 0.5741, 'learning_rate': 1.828979993522717e-08, 'epoch': 0.98} 98%|█████████▊| 6516/6640 [2:23:36<11:15, 5.45s/it] 98%|█████████▊| 6517/6640 [2:23:41<11:10, 5.45s/it] {'loss': 0.5572, 'learning_rate': 1.7996080820178453e-08, 'epoch': 0.98} 98%|█████████▊| 6517/6640 [2:23:41<11:10, 5.45s/it] 98%|█████████▊| 6518/6640 [2:23:47<11:04, 5.45s/it] {'loss': 0.5699, 
'learning_rate': 1.770473715427934e-08, 'epoch': 0.98} 98%|█████████▊| 6518/6640 [2:23:47<11:04, 5.45s/it] 98%|█████████▊| 6519/6640 [2:23:52<11:01, 5.46s/it] {'loss': 0.5833, 'learning_rate': 1.741576900686215e-08, 'epoch': 0.98} 98%|█████████▊| 6519/6640 [2:23:52<11:01, 5.46s/it] 98%|█████████▊| 6520/6640 [2:23:58<10:54, 5.45s/it] {'loss': 0.5857, 'learning_rate': 1.7129176446692986e-08, 'epoch': 0.98} 98%|█████████▊| 6520/6640 [2:23:58<10:54, 5.45s/it] 98%|█████████▊| 6521/6640 [2:24:03<10:50, 5.47s/it] {'loss': 0.6023, 'learning_rate': 1.6844959541973958e-08, 'epoch': 0.98} 98%|█████████▊| 6521/6640 [2:24:03<10:50, 5.47s/it] 98%|█████████▊| 6522/6640 [2:24:09<10:43, 5.45s/it] {'loss': 0.5433, 'learning_rate': 1.6563118360339858e-08, 'epoch': 0.98} 98%|█████████▊| 6522/6640 [2:24:09<10:43, 5.45s/it] 98%|█████████▊| 6523/6640 [2:24:14<10:42, 5.49s/it] {'loss': 0.5752, 'learning_rate': 1.6283652968863695e-08, 'epoch': 0.98} 98%|█████████▊| 6523/6640 [2:24:14<10:42, 5.49s/it] 98%|█████████▊| 6524/6640 [2:24:20<10:36, 5.49s/it] {'loss': 0.5559, 'learning_rate': 1.6006563434047828e-08, 'epoch': 0.98} 98%|█████████▊| 6524/6640 [2:24:20<10:36, 5.49s/it] 98%|█████████▊| 6525/6640 [2:24:25<10:25, 5.44s/it] {'loss': 0.5781, 'learning_rate': 1.5731849821833955e-08, 'epoch': 0.98} 98%|█████████▊| 6525/6640 [2:24:25<10:25, 5.44s/it] 98%|█████████▊| 6526/6640 [2:24:31<10:20, 5.44s/it] {'loss': 0.5743, 'learning_rate': 1.545951219759645e-08, 'epoch': 0.98} 98%|█████████▊| 6526/6640 [2:24:31<10:20, 5.44s/it] 98%|█████████▊| 6527/6640 [2:24:36<10:15, 5.45s/it] {'loss': 0.5459, 'learning_rate': 1.5189550626144577e-08, 'epoch': 0.98} 98%|█████████▊| 6527/6640 [2:24:36<10:15, 5.45s/it] 98%|█████████▊| 6528/6640 [2:24:42<10:11, 5.46s/it] {'loss': 0.5642, 'learning_rate': 1.4921965171720288e-08, 'epoch': 0.98} 98%|█████████▊| 6528/6640 [2:24:42<10:11, 5.46s/it] 98%|█████████▊| 6529/6640 [2:24:47<10:12, 5.51s/it] {'loss': 0.569, 'learning_rate': 1.465675589800375e-08, 'epoch': 0.98} 
98%|█████████▊| 6529/6640 [2:24:47<10:12, 5.51s/it] 98%|█████████▊| 6530/6640 [2:24:52<09:59, 5.45s/it] {'loss': 0.5581, 'learning_rate': 1.4393922868105591e-08, 'epoch': 0.98} 98%|█████████▊| 6530/6640 [2:24:52<09:59, 5.45s/it] 98%|█████████▊| 6531/6640 [2:24:58<09:53, 5.44s/it] {'loss': 0.6045, 'learning_rate': 1.413346614457467e-08, 'epoch': 0.98} 98%|█████████▊| 6531/6640 [2:24:58<09:53, 5.44s/it] 98%|█████████▊| 6532/6640 [2:25:03<09:43, 5.41s/it] {'loss': 0.5749, 'learning_rate': 1.3875385789392514e-08, 'epoch': 0.98} 98%|█████████▊| 6532/6640 [2:25:03<09:43, 5.41s/it] 98%|█████████▊| 6533/6640 [2:25:09<09:39, 5.41s/it] {'loss': 0.6037, 'learning_rate': 1.3619681863974443e-08, 'epoch': 0.98} 98%|█████████▊| 6533/6640 [2:25:09<09:39, 5.41s/it] 98%|█████████▊| 6534/6640 [2:25:14<09:36, 5.44s/it] {'loss': 0.5919, 'learning_rate': 1.336635442917289e-08, 'epoch': 0.98} 98%|█████████▊| 6534/6640 [2:25:14<09:36, 5.44s/it] 98%|█████████▊| 6535/6640 [2:25:19<09:25, 5.39s/it] {'loss': 0.5647, 'learning_rate': 1.3115403545270744e-08, 'epoch': 0.98} 98%|█████████▊| 6535/6640 [2:25:19<09:25, 5.39s/it] 98%|█████████▊| 6536/6640 [2:25:25<09:21, 5.40s/it] {'loss': 0.5788, 'learning_rate': 1.2866829271989122e-08, 'epoch': 0.98} 98%|█████████▊| 6536/6640 [2:25:25<09:21, 5.40s/it] 98%|█████████▊| 6537/6640 [2:25:30<09:15, 5.40s/it] {'loss': 0.5897, 'learning_rate': 1.2620631668481819e-08, 'epoch': 0.98} 98%|█████████▊| 6537/6640 [2:25:30<09:15, 5.40s/it] 98%|█████████▊| 6538/6640 [2:25:36<09:12, 5.41s/it] {'loss': 0.5923, 'learning_rate': 1.2376810793336413e-08, 'epoch': 0.98} 98%|█████████▊| 6538/6640 [2:25:36<09:12, 5.41s/it] 98%|█████████▊| 6539/6640 [2:25:41<09:08, 5.43s/it] {'loss': 0.5615, 'learning_rate': 1.213536670457649e-08, 'epoch': 0.98} 98%|█████████▊| 6539/6640 [2:25:41<09:08, 5.43s/it] 98%|█████████▊| 6540/6640 [2:25:46<08:58, 5.38s/it] {'loss': 0.5885, 'learning_rate': 1.1896299459658311e-08, 'epoch': 0.98} 98%|█████████▊| 6540/6640 [2:25:46<08:58, 5.38s/it] 
99%|█████████▊| 6541/6640 [2:25:52<09:00, 5.46s/it] {'loss': 0.6008, 'learning_rate': 1.1659609115475257e-08, 'epoch': 0.99} 99%|█████████▊| 6541/6640 [2:25:52<09:00, 5.46s/it] 99%|█████████▊| 6542/6640 [2:25:58<08:56, 5.47s/it] {'loss': 0.571, 'learning_rate': 1.142529572835227e-08, 'epoch': 0.99} 99%|█████████▊| 6542/6640 [2:25:58<08:56, 5.47s/it] 99%|█████████▊| 6543/6640 [2:26:03<08:46, 5.43s/it] {'loss': 0.6195, 'learning_rate': 1.1193359354049194e-08, 'epoch': 0.99} 99%|█████████▊| 6543/6640 [2:26:03<08:46, 5.43s/it] 99%|█████████▊| 6544/6640 [2:26:08<08:37, 5.39s/it] {'loss': 0.5668, 'learning_rate': 1.0963800047760764e-08, 'epoch': 0.99} 99%|█████████▊| 6544/6640 [2:26:08<08:37, 5.39s/it] 99%|█████████▊| 6545/6640 [2:26:14<08:32, 5.40s/it] {'loss': 0.5573, 'learning_rate': 1.0736617864117727e-08, 'epoch': 0.99} 99%|█████████▊| 6545/6640 [2:26:14<08:32, 5.40s/it] 99%|█████████▊| 6546/6640 [2:26:19<08:26, 5.39s/it] {'loss': 0.5736, 'learning_rate': 1.051181285718128e-08, 'epoch': 0.99} 99%|█████████▊| 6546/6640 [2:26:19<08:26, 5.39s/it] 99%|█████████▊| 6547/6640 [2:26:25<08:29, 5.48s/it] {'loss': 0.5549, 'learning_rate': 1.0289385080448632e-08, 'epoch': 0.99} 99%|█████████▊| 6547/6640 [2:26:25<08:29, 5.48s/it] 99%|█████████▊| 6548/6640 [2:26:30<08:18, 5.42s/it] {'loss': 0.5774, 'learning_rate': 1.0069334586854106e-08, 'epoch': 0.99} 99%|█████████▊| 6548/6640 [2:26:30<08:18, 5.42s/it] 99%|█████████▊| 6549/6640 [2:26:35<08:15, 5.45s/it] {'loss': 0.584, 'learning_rate': 9.851661428761372e-09, 'epoch': 0.99} 99%|█████████▊| 6549/6640 [2:26:35<08:15, 5.45s/it]6 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
99%|█████████▊| 6550/6640 [2:26:41<08:07, 5.41s/it] {'loss': 0.5904, 'learning_rate': 9.636365657971215e-09, 'epoch': 0.99} 99%|█████████▊| 6550/6640 [2:26:41<08:07, 5.41s/it] 99%|█████████▊| 6551/6640 [2:26:47<08:11, 5.52s/it] {'loss': 0.5429, 'learning_rate': 9.423447325720425e-09, 'epoch': 0.99} 99%|█████████▊| 6551/6640 [2:26:47<08:11, 5.52s/it] 99%|█████████▊| 6552/6640 [2:26:52<08:00, 5.46s/it] {'loss': 0.5906, 'learning_rate': 9.212906482675143e-09, 'epoch': 0.99} 99%|█████████▊| 6552/6640 [2:26:52<08:00, 5.46s/it] 99%|█████████▊| 6553/6640 [2:26:57<07:51, 5.42s/it] {'loss': 0.5542, 'learning_rate': 9.004743178939735e-09, 'epoch': 0.99} 99%|█████████▊| 6553/6640 [2:26:57<07:51, 5.42s/it] 99%|█████████▊| 6554/6640 [2:27:03<07:47, 5.44s/it] {'loss': 0.5631, 'learning_rate': 8.798957464051238e-09, 'epoch': 0.99} 99%|█████████▊| 6554/6640 [2:27:03<07:47, 5.44s/it] 99%|█████████▊| 6555/6640 [2:27:08<07:43, 5.45s/it] {'loss': 0.5891, 'learning_rate': 8.595549386981595e-09, 'epoch': 0.99} 99%|█████████▊| 6555/6640 [2:27:08<07:43, 5.45s/it] 99%|█████████▊| 6556/6640 [2:27:14<07:36, 5.44s/it] {'loss': 0.5814, 'learning_rate': 8.394518996135414e-09, 'epoch': 0.99} 99%|█████████▊| 6556/6640 [2:27:14<07:36, 5.44s/it] 99%|█████████▉| 6557/6640 [2:27:19<07:28, 5.40s/it] {'loss': 0.5801, 'learning_rate': 8.19586633935332e-09, 'epoch': 0.99} 99%|█████████▉| 6557/6640 [2:27:19<07:28, 5.40s/it] 99%|█████████▉| 6558/6640 [2:27:24<07:23, 5.41s/it] {'loss': 0.5592, 'learning_rate': 7.999591463908607e-09, 'epoch': 0.99} 99%|█████████▉| 6558/6640 [2:27:24<07:23, 5.41s/it] 99%|█████████▉| 6559/6640 [2:27:30<07:18, 5.41s/it] {'loss': 0.5781, 'learning_rate': 7.805694416510579e-09, 'epoch': 0.99} 99%|█████████▉| 6559/6640 [2:27:30<07:18, 5.41s/it] 99%|█████████▉| 6560/6640 [2:27:35<07:14, 5.43s/it] {'loss': 0.5883, 'learning_rate': 7.614175243301213e-09, 'epoch': 0.99} 99%|█████████▉| 6560/6640 [2:27:35<07:14, 5.43s/it] 99%|█████████▉| 6561/6640 [2:27:41<07:17, 5.53s/it] {'loss': 
0.5602, 'learning_rate': 7.425033989856279e-09, 'epoch': 0.99} 99%|█████████▉| 6561/6640 [2:27:41<07:17, 5.53s/it] 99%|█████████▉| 6562/6640 [2:27:46<07:09, 5.50s/it] {'loss': 0.574, 'learning_rate': 7.238270701186434e-09, 'epoch': 0.99} 99%|█████████▉| 6562/6640 [2:27:46<07:09, 5.50s/it] 99%|█████████▉| 6563/6640 [2:27:52<06:59, 5.45s/it] {'loss': 0.5913, 'learning_rate': 7.053885421737239e-09, 'epoch': 0.99} 99%|█████████▉| 6563/6640 [2:27:52<06:59, 5.45s/it] 99%|█████████▉| 6564/6640 [2:27:57<06:54, 5.46s/it] {'loss': 0.57, 'learning_rate': 6.871878195386927e-09, 'epoch': 0.99} 99%|█████████▉| 6564/6640 [2:27:57<06:54, 5.46s/it] 99%|█████████▉| 6565/6640 [2:28:03<06:46, 5.42s/it] {'loss': 0.5691, 'learning_rate': 6.692249065447521e-09, 'epoch': 0.99} 99%|█████████▉| 6565/6640 [2:28:03<06:46, 5.42s/it] 99%|█████████▉| 6566/6640 [2:28:08<06:41, 5.43s/it] {'loss': 0.5535, 'learning_rate': 6.514998074667045e-09, 'epoch': 0.99} 99%|█████████▉| 6566/6640 [2:28:08<06:41, 5.43s/it] 99%|█████████▉| 6567/6640 [2:28:13<06:35, 5.42s/it] {'loss': 0.5797, 'learning_rate': 6.340125265226205e-09, 'epoch': 0.99} 99%|█████████▉| 6567/6640 [2:28:13<06:35, 5.42s/it] 99%|█████████▉| 6568/6640 [2:28:19<06:29, 5.41s/it] {'loss': 0.5865, 'learning_rate': 6.167630678739489e-09, 'epoch': 0.99} 99%|█████████▉| 6568/6640 [2:28:19<06:29, 5.41s/it] 99%|█████████▉| 6569/6640 [2:28:24<06:27, 5.46s/it] {'loss': 0.5808, 'learning_rate': 5.997514356256285e-09, 'epoch': 0.99} 99%|█████████▉| 6569/6640 [2:28:24<06:27, 5.46s/it] 99%|█████████▉| 6570/6640 [2:28:30<06:23, 5.48s/it] {'loss': 0.5781, 'learning_rate': 5.8297763382597625e-09, 'epoch': 0.99} 99%|█████████▉| 6570/6640 [2:28:30<06:23, 5.48s/it] 99%|█████████▉| 6571/6640 [2:28:35<06:19, 5.50s/it] {'loss': 0.5627, 'learning_rate': 5.664416664666883e-09, 'epoch': 0.99} 99%|█████████▉| 6571/6640 [2:28:35<06:19, 5.50s/it] 99%|█████████▉| 6572/6640 [2:28:41<06:09, 5.43s/it] {'loss': 0.5808, 'learning_rate': 5.5014353748295e-09, 'epoch': 0.99} 
99%|█████████▉| 6572/6640 [2:28:41<06:09, 5.43s/it] 99%|█████████▉| 6573/6640 [2:28:46<06:01, 5.40s/it] {'loss': 0.5943, 'learning_rate': 5.3408325075332555e-09, 'epoch': 0.99} 99%|█████████▉| 6573/6640 [2:28:46<06:01, 5.40s/it] 99%|█████████▉| 6574/6640 [2:28:51<05:56, 5.40s/it] {'loss': 0.603, 'learning_rate': 5.182608100995356e-09, 'epoch': 0.99} 99%|█████████▉| 6574/6640 [2:28:51<05:56, 5.40s/it] 99%|█████████▉| 6575/6640 [2:28:57<05:52, 5.42s/it] {'loss': 0.5835, 'learning_rate': 5.026762192870127e-09, 'epoch': 0.99} 99%|█████████▉| 6575/6640 [2:28:57<05:52, 5.42s/it] 99%|█████████▉| 6576/6640 [2:29:02<05:44, 5.39s/it] {'loss': 0.5959, 'learning_rate': 4.873294820244568e-09, 'epoch': 0.99} 99%|█████████▉| 6576/6640 [2:29:02<05:44, 5.39s/it] 99%|█████████▉| 6577/6640 [2:29:07<05:37, 5.36s/it] {'loss': 0.5628, 'learning_rate': 4.7222060196394634e-09, 'epoch': 0.99} 99%|█████████▉| 6577/6640 [2:29:08<05:37, 5.36s/it] 99%|█████████▉| 6578/6640 [2:29:13<05:32, 5.36s/it] {'loss': 0.5771, 'learning_rate': 4.573495827010499e-09, 'epoch': 0.99} 99%|█████████▉| 6578/6640 [2:29:13<05:32, 5.36s/it] 99%|█████████▉| 6579/6640 [2:29:18<05:28, 5.38s/it] {'loss': 0.5967, 'learning_rate': 4.4271642777471425e-09, 'epoch': 0.99} 99%|█████████▉| 6579/6640 [2:29:18<05:28, 5.38s/it] 99%|█████████▉| 6580/6640 [2:29:24<05:24, 5.40s/it] {'loss': 0.5568, 'learning_rate': 4.283211406670429e-09, 'epoch': 0.99} 99%|█████████▉| 6580/6640 [2:29:24<05:24, 5.40s/it] 99%|█████████▉| 6581/6640 [2:29:29<05:16, 5.37s/it] {'loss': 0.5747, 'learning_rate': 4.14163724803962e-09, 'epoch': 0.99} 99%|█████████▉| 6581/6640 [2:29:29<05:16, 5.37s/it] 99%|█████████▉| 6582/6640 [2:29:34<05:10, 5.36s/it] {'loss': 0.587, 'learning_rate': 4.002441835544435e-09, 'epoch': 0.99} 99%|█████████▉| 6582/6640 [2:29:34<05:10, 5.36s/it] 99%|█████████▉| 6583/6640 [2:29:40<05:04, 5.34s/it] {'loss': 0.5736, 'learning_rate': 3.865625202309487e-09, 'epoch': 0.99} 99%|█████████▉| 6583/6640 [2:29:40<05:04, 5.34s/it] 
99%|█████████▉| 6584/6640 [2:29:45<05:01, 5.39s/it] {'loss': 0.5664, 'learning_rate': 3.731187380893176e-09, 'epoch': 0.99} 99%|█████████▉| 6584/6640 [2:29:45<05:01, 5.39s/it] 99%|█████████▉| 6585/6640 [2:29:51<04:56, 5.40s/it] {'loss': 0.5785, 'learning_rate': 3.5991284032899087e-09, 'epoch': 0.99} 99%|█████████▉| 6585/6640 [2:29:51<04:56, 5.40s/it] 99%|█████████▉| 6586/6640 [2:29:56<04:50, 5.38s/it] {'loss': 0.5627, 'learning_rate': 3.4694483009234393e-09, 'epoch': 0.99} 99%|█████████▉| 6586/6640 [2:29:56<04:50, 5.38s/it] 99%|█████████▉| 6587/6640 [2:30:01<04:44, 5.37s/it] {'loss': 0.5671, 'learning_rate': 3.3421471046568564e-09, 'epoch': 0.99} 99%|█████████▉| 6587/6640 [2:30:01<04:44, 5.37s/it] 99%|█████████▉| 6588/6640 [2:30:07<04:37, 5.34s/it] {'loss': 0.5783, 'learning_rate': 3.217224844782596e-09, 'epoch': 0.99} 99%|█████████▉| 6588/6640 [2:30:07<04:37, 5.34s/it] 99%|█████████▉| 6589/6640 [2:30:12<04:36, 5.41s/it] {'loss': 0.6104, 'learning_rate': 3.0946815510290995e-09, 'epoch': 0.99} 99%|█████████▉| 6589/6640 [2:30:12<04:36, 5.41s/it] 99%|█████████▉| 6590/6640 [2:30:18<04:32, 5.44s/it] {'loss': 0.5767, 'learning_rate': 2.974517252558595e-09, 'epoch': 0.99} 99%|█████████▉| 6590/6640 [2:30:18<04:32, 5.44s/it] 99%|█████████▉| 6591/6640 [2:30:23<04:27, 5.45s/it] {'loss': 0.5775, 'learning_rate': 2.856731977968208e-09, 'epoch': 0.99} 99%|█████████▉| 6591/6640 [2:30:23<04:27, 5.45s/it] 99%|█████████▉| 6592/6640 [2:30:29<04:22, 5.47s/it] {'loss': 0.5763, 'learning_rate': 2.741325755285518e-09, 'epoch': 0.99} 99%|█████████▉| 6592/6640 [2:30:29<04:22, 5.47s/it] 99%|█████████▉| 6593/6640 [2:30:34<04:20, 5.54s/it] {'loss': 0.5735, 'learning_rate': 2.628298611975222e-09, 'epoch': 0.99} 99%|█████████▉| 6593/6640 [2:30:34<04:20, 5.54s/it] 99%|█████████▉| 6594/6640 [2:30:40<04:14, 5.52s/it] {'loss': 0.5789, 'learning_rate': 2.5176505749346937e-09, 'epoch': 0.99} 99%|█████████▉| 6594/6640 [2:30:40<04:14, 5.52s/it] 99%|█████████▉| 6595/6640 [2:30:45<04:05, 5.46s/it] 
{'loss': 0.5708, 'learning_rate': 2.4093816704950924e-09, 'epoch': 0.99} 99%|█████████▉| 6595/6640 [2:30:45<04:05, 5.46s/it] 99%|█████████▉| 6596/6640 [2:30:50<03:58, 5.42s/it] {'loss': 0.6031, 'learning_rate': 2.3034919244213637e-09, 'epoch': 0.99} 99%|█████████▉| 6596/6640 [2:30:50<03:58, 5.42s/it] 99%|█████████▉| 6597/6640 [2:30:56<03:52, 5.41s/it] {'loss': 0.5794, 'learning_rate': 2.19998136191224e-09, 'epoch': 0.99} 99%|█████████▉| 6597/6640 [2:30:56<03:52, 5.41s/it] 99%|█████████▉| 6598/6640 [2:31:01<03:47, 5.41s/it] {'loss': 0.5814, 'learning_rate': 2.0988500076013494e-09, 'epoch': 0.99} 99%|█████████▉| 6598/6640 [2:31:01<03:47, 5.41s/it] 99%|█████████▉| 6599/6640 [2:31:07<03:42, 5.42s/it] {'loss': 0.5666, 'learning_rate': 2.000097885554997e-09, 'epoch': 0.99} 99%|█████████▉| 6599/6640 [2:31:07<03:42, 5.42s/it]06 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 99%|█████████▉| 6600/6640 [2:31:12<03:36, 5.42s/it]4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.5527, 'learning_rate': 1.9037250192732728e-09, 'epoch': 0.99} 99%|█████████▉| 6600/6640 [2:31:12<03:36, 5.42s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/tmp-checkpoint-6600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( 99%|█████████▉| 6601/6640 [2:31:40<07:54, 12.16s/it] {'loss': 0.5772, 'learning_rate': 1.8097314316900537e-09, 'epoch': 0.99} 99%|█████████▉| 6601/6640 [2:31:40<07:54, 12.16s/it] 99%|█████████▉| 6602/6640 [2:31:45<06:25, 10.15s/it] {'loss': 0.5721, 'learning_rate': 1.7181171451730039e-09, 'epoch': 0.99} 99%|█████████▉| 6602/6640 [2:31:45<06:25, 10.15s/it] 99%|█████████▉| 6603/6640 [2:31:51<05:24, 8.78s/it] {'loss': 0.5542, 'learning_rate': 1.6288821815246825e-09, 'epoch': 0.99} 99%|█████████▉| 6603/6640 [2:31:51<05:24, 8.78s/it] 99%|█████████▉| 6604/6640 [2:31:56<04:40, 7.79s/it] {'loss': 0.5872, 'learning_rate': 1.5420265619803255e-09, 'epoch': 0.99} 99%|█████████▉| 6604/6640 [2:31:56<04:40, 7.79s/it] 99%|█████████▉| 6605/6640 [2:32:02<04:09, 7.14s/it] {'loss': 0.5882, 'learning_rate': 1.4575503072100649e-09, 'epoch': 0.99} 99%|█████████▉| 6605/6640 [2:32:02<04:09, 7.14s/it] 99%|█████████▉| 6606/6640 [2:32:07<03:43, 6.58s/it] {'loss': 0.5906, 'learning_rate': 1.3754534373155992e-09, 'epoch': 0.99} 99%|█████████▉| 6606/6640 [2:32:07<03:43, 6.58s/it]Token indices sequence length 
is longer than the specified maximum sequence length for this model (4214 > 4096). Running this sequence through the model will result in indexing errors 100%|█████████▉| 6607/6640 [2:32:13<03:25, 6.24s/it] {'loss': 0.549, 'learning_rate': 1.295735971834633e-09, 'epoch': 1.0} 100%|█████████▉| 6607/6640 [2:32:13<03:25, 6.24s/it] 100%|█████████▉| 6608/6640 [2:32:18<03:12, 6.00s/it] {'loss': 0.5821, 'learning_rate': 1.2183979297364368e-09, 'epoch': 1.0} 100%|█████████▉| 6608/6640 [2:32:18<03:12, 6.00s/it] 100%|█████████▉| 6609/6640 [2:32:24<03:00, 5.83s/it] {'loss': 0.5962, 'learning_rate': 1.1434393294273981e-09, 'epoch': 1.0} 100%|█████████▉| 6609/6640 [2:32:24<03:00, 5.83s/it] 100%|█████████▉| 6610/6640 [2:32:29<02:50, 5.68s/it] {'loss': 0.5829, 'learning_rate': 1.0708601887454706e-09, 'epoch': 1.0} 100%|█████████▉| 6610/6640 [2:32:29<02:50, 5.68s/it] 100%|█████████▉| 6611/6640 [2:32:34<02:42, 5.59s/it] {'loss': 0.5778, 'learning_rate': 1.000660524960173e-09, 'epoch': 1.0} 100%|█████████▉| 6611/6640 [2:32:34<02:42, 5.59s/it] 100%|█████████▉| 6612/6640 [2:32:40<02:39, 5.68s/it] {'loss': 0.5532, 'learning_rate': 9.328403547792518e-10, 'epoch': 1.0} 100%|█████████▉| 6612/6640 [2:32:40<02:39, 5.68s/it] 100%|█████████▉| 6613/6640 [2:32:46<02:31, 5.60s/it] {'loss': 0.5881, 'learning_rate': 8.673996943420193e-10, 'epoch': 1.0} 100%|█████████▉| 6613/6640 [2:32:46<02:31, 5.60s/it] 100%|█████████▉| 6614/6640 [2:32:51<02:25, 5.59s/it] {'loss': 0.568, 'learning_rate': 8.043385592215735e-10, 'epoch': 1.0} 100%|█████████▉| 6614/6640 [2:32:51<02:25, 5.59s/it] 100%|█████████▉| 6615/6640 [2:32:57<02:18, 5.53s/it] {'loss': 0.5679, 'learning_rate': 7.43656964423689e-10, 'epoch': 1.0} 100%|█████████▉| 6615/6640 [2:32:57<02:18, 5.53s/it] 100%|█████████▉| 6616/6640 [2:33:02<02:11, 5.47s/it] {'loss': 0.5428, 'learning_rate': 6.85354924390147e-10, 'epoch': 1.0} 100%|█████████▉| 6616/6640 [2:33:02<02:11, 5.47s/it] 100%|█████████▉| 6617/6640 [2:33:08<02:06, 5.49s/it] {'loss': 0.554, 
'learning_rate': 6.294324529942942e-10, 'epoch': 1.0} 100%|█████████▉| 6617/6640 [2:33:08<02:06, 5.49s/it] 100%|█████████▉| 6618/6640 [2:33:13<02:01, 5.51s/it] {'loss': 0.5606, 'learning_rate': 5.75889563544374e-10, 'epoch': 1.0} 100%|█████████▉| 6618/6640 [2:33:13<02:01, 5.51s/it] 100%|█████████▉| 6619/6640 [2:33:19<01:54, 5.47s/it] {'loss': 0.5738, 'learning_rate': 5.247262687835264e-10, 'epoch': 1.0} 100%|█████████▉| 6619/6640 [2:33:19<01:54, 5.47s/it] 100%|█████████▉| 6620/6640 [2:33:24<01:49, 5.45s/it] {'loss': 0.5297, 'learning_rate': 4.759425808853468e-10, 'epoch': 1.0} 100%|█████████▉| 6620/6640 [2:33:24<01:49, 5.45s/it] 100%|█████████▉| 6621/6640 [2:33:29<01:43, 5.44s/it] {'loss': 0.5881, 'learning_rate': 4.295385114594375e-10, 'epoch': 1.0} 100%|█████████▉| 6621/6640 [2:33:29<01:43, 5.44s/it] 100%|█████████▉| 6622/6640 [2:33:35<01:36, 5.38s/it] {'loss': 0.5572, 'learning_rate': 3.8551407155029697e-10, 'epoch': 1.0} 100%|█████████▉| 6622/6640 [2:33:35<01:36, 5.38s/it] 100%|█████████▉| 6623/6640 [2:33:40<01:31, 5.37s/it] {'loss': 0.5794, 'learning_rate': 3.4386927163287953e-10, 'epoch': 1.0} 100%|█████████▉| 6623/6640 [2:33:40<01:31, 5.37s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! warnings.warn("Inputs truncated!") [2025-05-28 04:45:53,134] [WARNING] [stage3.py:1850:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time 100%|█████████▉| 6624/6640 [2:33:46<01:29, 5.60s/it] {'loss': 0.5926, 'learning_rate': 3.0460412161814613e-10, 'epoch': 1.0} 100%|█████████▉| 6624/6640 [2:33:46<01:29, 5.60s/it] 100%|█████████▉| 6625/6640 [2:33:51<01:22, 5.50s/it] {'loss': 0.5677, 'learning_rate': 2.677186308497337e-10, 'epoch': 1.0} 100%|█████████▉| 6625/6640 [2:33:51<01:22, 5.50s/it] 100%|█████████▉| 6626/6640 [2:33:57<01:17, 5.50s/it] {'loss': 0.6055, 'learning_rate': 2.3321280810617575e-10, 'epoch': 1.0} 100%|█████████▉| 6626/6640 [2:33:57<01:17, 5.50s/it] 100%|█████████▉| 6627/6640 [2:34:02<01:11, 5.53s/it] {'loss': 0.5717, 'learning_rate': 2.0108666159757151e-10, 'epoch': 1.0} 100%|█████████▉| 6627/6640 [2:34:02<01:11, 5.53s/it] 100%|█████████▉| 6628/6640 [2:34:08<01:06, 5.50s/it] {'loss': 0.5814, 'learning_rate': 1.7134019897113718e-10, 'epoch': 1.0} 100%|█████████▉| 6628/6640 [2:34:08<01:06, 5.50s/it] 100%|█████████▉| 6629/6640 [2:34:13<01:00, 5.47s/it] {'loss': 0.5843, 'learning_rate': 1.4397342730343434e-10, 'epoch': 1.0} 100%|█████████▉| 6629/6640 [2:34:13<01:00, 5.47s/it] 100%|█████████▉| 6630/6640 [2:34:19<00:54, 5.43s/it] {'loss': 0.572, 'learning_rate': 1.1898635310925167e-10, 'epoch': 1.0} 100%|█████████▉| 6630/6640 [2:34:19<00:54, 5.43s/it] 100%|█████████▉| 6631/6640 [2:34:24<00:48, 5.42s/it] {'loss': 0.5871, 'learning_rate': 9.637898233272324e-11, 'epoch': 1.0} 100%|█████████▉| 6631/6640 [2:34:24<00:48, 5.42s/it] 100%|█████████▉| 6632/6640 [2:34:29<00:43, 5.39s/it] {'loss': 0.5929, 'learning_rate': 7.615132035510008e-11, 'epoch': 1.0} 100%|█████████▉| 6632/6640 [2:34:29<00:43, 5.39s/it] 100%|█████████▉| 6633/6640 [2:34:35<00:37, 5.42s/it] {'loss': 0.5959, 'learning_rate': 5.830337199030922e-11, 'epoch': 1.0} 100%|█████████▉| 6633/6640 [2:34:35<00:37, 5.42s/it] 100%|█████████▉| 6634/6640 
[2:34:40<00:32, 5.38s/it] {'loss': 0.6106, 'learning_rate': 4.2835141484953715e-11, 'epoch': 1.0} 100%|█████████▉| 6634/6640 [2:34:40<00:32, 5.38s/it] 100%|█████████▉| 6635/6640 [2:34:46<00:26, 5.40s/it] {'loss': 0.599, 'learning_rate': 2.9746632520533116e-11, 'epoch': 1.0} 100%|█████████▉| 6635/6640 [2:34:46<00:26, 5.40s/it] 100%|█████████▉| 6636/6640 [2:34:51<00:21, 5.40s/it] {'loss': 0.5793, 'learning_rate': 1.903784821122301e-11, 'epoch': 1.0} 100%|█████████▉| 6636/6640 [2:34:51<00:21, 5.40s/it] 100%|█████████▉| 6637/6640 [2:34:56<00:16, 5.39s/it] {'loss': 0.5687, 'learning_rate': 1.070879110498524e-11, 'epoch': 1.0} 100%|█████████▉| 6637/6640 [2:34:56<00:16, 5.39s/it] 100%|█████████▉| 6638/6640 [2:35:02<00:10, 5.40s/it] {'loss': 0.598, 'learning_rate': 4.759463185788349e-12, 'epoch': 1.0} 100%|█████████▉| 6638/6640 [2:35:02<00:10, 5.40s/it] 100%|█████████▉| 6639/6640 [2:35:07<00:05, 5.39s/it] {'loss': 0.561, 'learning_rate': 1.1898658669462494e-12, 'epoch': 1.0} 100%|█████████▉| 6639/6640 [2:35:07<00:05, 5.39s/it] 100%|██████████| 6640/6640 [2:35:14<00:00, 5.76s/it] {'loss': 0.5804, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 6640/6640 [2:35:14<00:00, 5.76s/it] {'train_runtime': 9316.3349, 'train_samples_per_second': 182.496, 'train_steps_per_second': 0.713, 'train_loss': 0.14256353377398237, 'epoch': 1.0} 100%|██████████| 6640/6640 [2:35:14<00:00, 5.76s/it] 100%|██████████| 6640/6640 [2:35:14<00:00, 1.40s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask/mm_projector wandb: 🚀 View run vila_3b_path_mask at: https://wandb.ai/memmelma/VILA/runs/cvs7yr6k wandb: Find logs at: 
../../../../../../../../fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/wandb/run-20250528_021205-cvs7yr6k/logs srun: job 8281991 queued and waiting for resources srun: job 8281991 has been allocated resources wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block7-00733 JobID: 8281991 | Full list: batch-block7-00733 NETWORK=Efficient-Large-Model/VILA1.5-3b WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! Did not find AutoResume SDK! [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:07,925] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2025-05-28 04:50:09,156] [WARNING] 
[comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 04:50:09,156] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 04:50:09,157] [INFO] [comm.py:594:init_distributed] cdb=None [2025-05-28 04:50:09,157] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented [2025-05-28 04:50:09,157] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/checkpoints/finetuned/nvila/vila_3b_path_mask.
Skipp training