srun: job 8262253 queued and waiting for resources
srun: job 8262253 has been allocated resources
wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block4-0002
JobID: 8262253 | Full list: batch-block4-0002
NETWORK=Efficient-Large-Model/VILA1.5-13b
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Did not find AutoResume SDK!
[2025-05-27 18:08:49,522] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-27 18:08:51,531] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:08:51,531] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:08:51,532] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Fetching 21 files:   0%|          | 0/21 [00:00<?, ?it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (… > 4096). Running this sequence through the model will result in indexing errors
29%|██▊ | 1894/6640 [1:23:20<21:16:34, 16.14s/it] {'loss': 0.5448, 'learning_rate': 1.677545983796741e-05, 'epoch': 0.29}
29%|██▊ | 1895/6640 [1:23:36<21:28:32, 16.29s/it] {'loss': 0.5378, 'learning_rate': 1.677187117498536e-05, 'epoch': 0.29}
29%|██▊ | 1896/6640 [1:23:53<21:43:11, 16.48s/it] {'loss': 0.5428, 'learning_rate': 1.6768280900479634e-05, 'epoch': 0.29}
29%|██▊ | 1897/6640 [1:24:10<21:33:21, 16.36s/it] {'loss': 0.5363, 'learning_rate': 1.6764689015304624e-05, 'epoch': 0.29}
29%|██▊ | 1898/6640 [1:24:26<21:39:00, 16.44s/it] {'loss': 0.5567, 'learning_rate': 1.67610955203151e-05, 'epoch': 0.29}
29%|██▊ | 1899/6640 [1:24:43<21:48:56, 16.57s/it] {'loss': 0.5249, 'learning_rate': 1.6757500416366225e-05, 'epoch': 0.29}
AutoResumeHook: Checking whether to suspend...
29%|██▊ | 1900/6640 [1:24:59<21:38:15, 16.43s/it]
{'loss': 0.5475, 'learning_rate': 1.6753903704313527e-05, 'epoch': 0.29}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-1900/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-1900/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-1900/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
29%|██▊ | 1901/6640 [1:26:48<58:07:25, 44.15s/it] {'loss': 0.552, 'learning_rate': 1.6750305385012936e-05, 'epoch': 0.29}
29%|██▊ | 1902/6640 [1:27:04<47:04:53, 35.77s/it] {'loss': 0.5565, 'learning_rate': 1.6746705459320746e-05, 'epoch': 0.29}
29%|██▊ | 1903/6640 [1:27:20<39:12:27, 29.80s/it] {'loss': 0.5363, 'learning_rate': 1.674310392809365e-05, 'epoch': 0.29}
29%|██▊ | 1904/6640 [1:27:36<33:36:46, 25.55s/it] {'loss': 0.5597, 'learning_rate': 1.673950079218871e-05, 'epoch': 0.29}
29%|██▊ | 1905/6640 [1:27:52<29:50:48, 22.69s/it] {'loss': 0.5216, 'learning_rate': 1.6735896052463384e-05, 'epoch': 0.29}
29%|██▊ | 1906/6640 [1:28:10<27:57:56, 21.27s/it] {'loss': 0.5398, 'learning_rate': 1.6732289709775496e-05, 'epoch': 0.29}
29%|██▊ | 1907/6640 [1:28:26<25:52:51, 19.69s/it] {'loss': 0.5552, 'learning_rate': 1.672868176498326e-05, 'epoch': 0.29}
29%|██▊ | 1908/6640 [1:28:42<24:42:15, 18.79s/it] {'loss': 0.5538, 'learning_rate': 1.6725072218945274e-05, 'epoch': 0.29}
29%|██▉ | 1909/6640 [1:28:59<23:40:05, 18.01s/it] {'loss': 0.5287, 'learning_rate': 1.672146107252051e-05, 'epoch': 0.29}
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated!
  warnings.warn("Inputs truncated!")
29%|██▉ | 1910/6640 [1:29:15<23:01:27, 17.52s/it] {'loss': 0.5529, 'learning_rate': 1.6717848326568327e-05, 'epoch': 0.29}
29%|██▉ | 1911/6640 [1:29:31<22:32:55, 17.17s/it] {'loss': 0.5551, 'learning_rate': 1.6714233981948457e-05, 'epoch': 0.29}
29%|██▉ | 1912/6640 [1:29:47<22:09:38, 16.87s/it] {'loss': 0.5621, 'learning_rate': 1.6710618039521017e-05, 'epoch': 0.29}
29%|██▉ | 1913/6640 [1:30:03<21:47:56, 16.60s/it] {'loss': 0.5448, 'learning_rate': 1.6707000500146505e-05, 'epoch': 0.29}
29%|██▉ | 1914/6640 [1:30:19<21:28:21, 16.36s/it] {'loss': 0.553, 'learning_rate': 1.6703381364685805e-05, 'epoch': 0.29}
29%|██▉ | 1915/6640 [1:30:36<21:37:34, 16.48s/it] {'loss': 0.531, 'learning_rate': 1.6699760634000166e-05, 'epoch': 0.29}
29%|██▉ | 1916/6640 [1:30:53<21:41:40, 16.53s/it] {'loss': 0.5166, 'learning_rate': 1.6696138308951227e-05, 'epoch': 0.29}
29%|██▉ | 1917/6640 [1:31:09<21:38:57, 16.50s/it] {'loss': 0.5323, 'learning_rate': 1.669251439040101e-05, 'epoch': 0.29}
29%|██▉ | 1918/6640 [1:31:26<21:41:39, 16.54s/it] {'loss': 0.5436, 'learning_rate': 1.66888888792119e-05, 'epoch': 0.29}
29%|██▉ | 1919/6640 [1:31:42<21:27:46, 16.37s/it] {'loss': 0.5486, 'learning_rate': 1.668526177624668e-05, 'epoch': 0.29}
29%|██▉ | 1920/6640 [1:31:58<21:23:07, 16.31s/it] {'loss': 0.545, 'learning_rate': 1.66816330823685e-05, 'epoch': 0.29}
29%|██▉ | 1921/6640 [1:32:14<21:28:56, 16.39s/it] {'loss': 0.536, 'learning_rate': 1.6678002798440887e-05, 'epoch': 0.29}
29%|██▉ | 1922/6640 [1:32:31<21:26:51, 16.37s/it] {'loss': 0.527, 'learning_rate': 1.667437092532776e-05, 'epoch': 0.29}
29%|██▉ | 1923/6640 [1:32:47<21:33:35, 16.45s/it] {'loss': 0.548, 'learning_rate': 1.6670737463893403e-05, 'epoch': 0.29}
29%|██▉ | 1924/6640 [1:33:03<21:18:26, 16.27s/it] {'loss': 0.5472, 'learning_rate': 1.6667102415002482e-05, 'epoch': 0.29}
29%|██▉ | 1925/6640 [1:33:20<21:23:51, 16.34s/it] {'loss': 0.5428, 'learning_rate': 1.6663465779520042e-05, 'epoch': 0.29}
29%|██▉ | 1926/6640 [1:33:36<21:26:40, 16.38s/it] {'loss': 0.5307, 'learning_rate': 1.6659827558311504e-05, 'epoch': 0.29}
29%|██▉ | 1927/6640 [1:33:52<21:17:12, 16.26s/it] {'loss': 0.5313, 'learning_rate': 1.665618775224267e-05, 'epoch': 0.29}
29%|██▉ | 1928/6640 [1:34:08<21:06:52, 16.13s/it] {'loss': 0.5488, 'learning_rate': 1.665254636217971e-05, 'epoch': 0.29}
29%|██▉ | 1929/6640 [1:34:24<21:16:10, 16.25s/it] {'loss': 0.5289, 'learning_rate': 1.6648903388989182e-05, 'epoch': 0.29}
29%|██▉ | 1930/6640 [1:34:41<21:11:39, 16.20s/it] {'loss': 0.5402, 'learning_rate': 1.6645258833538015e-05, 'epoch': 0.29}
29%|██▉ | 1931/6640 [1:34:57<21:25:07, 16.37s/it] {'loss': 0.5536, 'learning_rate': 1.6641612696693513e-05, 'epoch': 0.29}
29%|██▉ | 1932/6640 [1:35:13<21:16:21, 16.27s/it] {'loss': 0.5545, 'learning_rate': 1.6637964979323363e-05, 'epoch': 0.29}
29%|██▉ | 1933/6640 [1:35:30<21:22:55, 16.35s/it] {'loss': 0.5354, 'learning_rate': 1.6634315682295622e-05, 'epoch': 0.29}
29%|██▉ | 1934/6640 [1:35:46<21:14:44, 16.25s/it] {'loss': 0.5232, 'learning_rate': 1.6630664806478726e-05, 'epoch': 0.29}
29%|██▉ | 1935/6640 [1:36:03<21:26:57, 16.41s/it] {'loss': 0.5584, 'learning_rate': 1.6627012352741482e-05, 'epoch': 0.29}
29%|██▉ | 1936/6640 [1:36:19<21:21:30, 16.35s/it] {'loss': 0.5434, 'learning_rate': 1.662335832195308e-05, 'epoch': 0.29}
29%|██▉ | 1937/6640 [1:36:35<21:07:24, 16.17s/it] {'loss': 0.5677, 'learning_rate': 1.6619702714983077e-05, 'epoch': 0.29}
29%|██▉ | 1938/6640 [1:36:50<20:56:00, 16.03s/it] {'loss': 0.5341, 'learning_rate': 1.661604553270141e-05, 'epoch': 0.29}
29%|██▉ | 1939/6640 [1:37:07<21:11:38, 16.23s/it] {'loss': 0.5603, 'learning_rate': 1.6612386775978398e-05, 'epoch': 0.29}
29%|██▉ | 1940/6640 [1:37:23<20:58:56, 16.07s/it] {'loss': 0.5372, 'learning_rate': 1.6608726445684715e-05, 'epoch': 0.29}
29%|██▉ | 1941/6640 [1:37:39<21:12:20, 16.25s/it] {'loss': 0.5369, 'learning_rate': 1.660506454269143e-05, 'epoch': 0.29}
29%|██▉ | 1942/6640 [1:37:55<21:08:05, 16.20s/it] {'loss': 0.5604, 'learning_rate': 1.6601401067869978e-05, 'epoch': 0.29}
29%|██▉ | 1943/6640 [1:38:12<21:05:05, 16.16s/it] {'loss': 0.5412, 'learning_rate': 1.659773602209216e-05, 'epoch': 0.29}
29%|██▉ | 1944/6640 [1:38:27<20:46:49, 15.93s/it] {'loss': 0.5659, 'learning_rate': 1.6594069406230167e-05, 'epoch': 0.29}
29%|██▉ | 1945/6640 [1:38:43<20:43:37, 15.89s/it] {'loss': 0.5308, 'learning_rate': 1.659040122115655e-05, 'epoch': 0.29}
29%|██▉ | 1946/6640 [1:38:59<20:50:48, 15.99s/it] {'loss': 0.5351, 'learning_rate': 1.658673146774424e-05, 'epoch': 0.29}
29%|██▉ | 1947/6640 [1:39:16<21:05:44, 16.18s/it] {'loss': 0.5553, 'learning_rate': 1.6583060146866542e-05, 'epoch': 0.29}
29%|██▉ | 1948/6640 [1:39:33<21:22:15, 16.40s/it] {'loss': 0.5241, 'learning_rate': 1.657938725939713e-05, 'epoch': 0.29}
29%|██▉ | 1949/6640 [1:39:49<21:14:46, 16.31s/it] {'loss': 0.5339, 'learning_rate': 1.657571280621005e-05, 'epoch': 0.29}
AutoResumeHook: Checking whether to suspend...
29%|██▉ | 1950/6640 [1:40:05<21:10:45, 16.26s/it]
{'loss': 0.5543, 'learning_rate': 1.6572036788179728e-05, 'epoch': 0.29}
29%|██▉ | 1951/6640 [1:40:20<20:57:13, 16.09s/it] {'loss': 0.5498, 'learning_rate': 1.6568359206180952e-05, 'epoch': 0.29}
29%|██▉ | 1952/6640 [1:40:37<21:11:16, 16.27s/it] {'loss': 0.5479, 'learning_rate': 1.6564680061088897e-05, 'epoch': 0.29}
29%|██▉ | 1953/6640 [1:40:54<21:33:10, 16.55s/it] {'loss': 0.5534, 'learning_rate': 1.6560999353779092e-05, 'epoch': 0.29}
29%|██▉ | 1954/6640 [1:41:11<21:26:25, 16.47s/it] {'loss': 0.5333, 'learning_rate': 1.655731708512745e-05, 'epoch': 0.29}
29%|██▉ | 1955/6640 [1:41:27<21:24:25, 16.45s/it] {'loss': 0.5391, 'learning_rate': 1.6553633256010254e-05, 'epoch': 0.29}
29%|██▉ | 1956/6640 [1:41:44<21:27:28, 16.49s/it] {'loss': 0.5433, 'learning_rate': 1.6549947867304154e-05, 'epoch': 0.29}
29%|██▉ | 1957/6640 [1:42:00<21:22:29, 16.43s/it] {'loss': 0.5644, 'learning_rate': 1.654626091988617e-05, 'epoch': 0.29}
29%|██▉ | 1958/6640 [1:42:16<21:22:56, 16.44s/it] {'loss': 0.5376, 'learning_rate': 1.6542572414633707e-05, 'epoch': 0.29}
30%|██▉ | 1959/6640 [1:42:33<21:15:18, 16.35s/it] {'loss': 0.55, 'learning_rate': 1.653888235242452e-05, 'epoch': 0.3}
30%|██▉ | 1960/6640 [1:42:49<21:24:47, 16.47s/it] {'loss': 0.5491, 'learning_rate': 1.653519073413675e-05, 'epoch': 0.3}
30%|██▉ | 1961/6640 [1:43:06<21:19:48, 16.41s/it] {'loss': 0.5561, 'learning_rate': 1.6531497560648903e-05, 'epoch': 0.3}
30%|██▉ | 1962/6640 [1:43:22<21:23:01, 16.46s/it] {'loss': 0.526, 'learning_rate': 1.6527802832839853e-05, 'epoch': 0.3}
30%|██▉ | 1963/6640 [1:43:38<21:06:11, 16.24s/it] {'loss': 0.5333, 'learning_rate': 1.652410655158885e-05, 'epoch': 0.3}
30%|██▉ | 1964/6640 [1:43:54<21:08:43, 16.28s/it] {'loss': 0.5528, 'learning_rate': 1.6520408717775507e-05, 'epoch': 0.3}
30%|██▉ | 1965/6640 [1:44:11<21:18:05, 16.40s/it] {'loss': 0.5416, 'learning_rate': 1.6516709332279806e-05, 'epoch': 0.3}
30%|██▉ | 1966/6640 [1:44:27<21:14:53, 16.37s/it] {'loss': 0.5492, 'learning_rate': 1.6513008395982107e-05, 'epoch': 0.3}
30%|██▉ | 1967/6640 [1:44:45<21:47:40, 16.79s/it] {'loss': 0.5433, 'learning_rate': 1.650930590976313e-05, 'epoch': 0.3}
30%|██▉ | 1968/6640 [1:45:01<21:37:03, 16.66s/it] {'loss': 0.5409, 'learning_rate': 1.650560187450397e-05, 'epoch': 0.3}
30%|██▉ | 1969/6640 [1:45:18<21:29:34, 16.56s/it] {'loss': 0.546, 'learning_rate': 1.650189629108609e-05, 'epoch': 0.3}
30%|██▉ | 1970/6640 [1:45:35<21:37:21, 16.67s/it] {'loss': 0.525, 'learning_rate': 1.649818916039131e-05, 'epoch': 0.3}
30%|██▉ | 1971/6640 [1:45:51<21:26:59, 16.54s/it] {'loss': 0.5394, 'learning_rate': 1.6494480483301836e-05, 'epoch': 0.3}
30%|██▉ | 1972/6640 [1:46:08<21:39:24, 16.70s/it] {'loss': 0.52, 'learning_rate': 1.6490770260700234e-05, 'epoch': 0.3}
30%|██▉ | 1973/6640 [1:46:24<21:35:56, 16.66s/it] {'loss': 0.5668, 'learning_rate': 1.6487058493469437e-05, 'epoch': 0.3}
30%|██▉ | 1974/6640 [1:46:40<21:06:04, 16.28s/it] {'loss': 0.5117, 'learning_rate': 1.6483345182492742e-05, 'epoch': 0.3}
30%|██▉ | 1975/6640 [1:46:56<21:12:32, 16.37s/it] {'loss': 0.5503, 'learning_rate': 1.6479630328653814e-05, 'epoch': 0.3}
30%|██▉ | 1976/6640 [1:47:13<21:07:55, 16.31s/it] {'loss': 0.5409, 'learning_rate': 1.64759139328367e-05, 'epoch': 0.3}
30%|██▉ | 1977/6640 [1:47:29<21:01:09, 16.23s/it] {'loss': 0.5467, 'learning_rate': 1.6472195995925796e-05, 'epoch': 0.3}
30%|██▉ | 1978/6640 [1:47:46<21:16:21, 16.43s/it] {'loss': 0.5382, 'learning_rate': 1.6468476518805872e-05, 'epoch': 0.3}
30%|██▉ | 1979/6640 [1:48:03<21:40:04, 16.74s/it] {'loss': 0.5383, 'learning_rate': 1.6464755502362063e-05, 'epoch': 0.3}
30%|██▉ | 1980/6640 [1:48:19<21:20:59, 16.49s/it] {'loss': 0.5462, 'learning_rate': 1.646103294747987e-05, 'epoch': 0.3}
30%|██▉ | 1981/6640 [1:48:35<21:05:03, 16.29s/it] {'loss': 0.5279, 'learning_rate': 1.6457308855045165e-05, 'epoch': 0.3}
30%|██▉ | 1982/6640 [1:48:52<21:24:22, 16.54s/it] {'loss': 0.5587, 'learning_rate': 1.645358322594418e-05, 'epoch': 0.3}
30%|██▉ | 1983/6640 [1:49:08<21:18:57, 16.48s/it] {'loss': 0.5571, 'learning_rate': 1.6449856061063513e-05, 'epoch': 0.3}
30%|██▉ | 1984/6640 [1:49:26<21:48:08, 16.86s/it] {'loss': 0.5332, 'learning_rate': 1.644612736129013e-05, 'epoch': 0.3}
30%|██▉ | 1985/6640 [1:49:42<21:33:28, 16.67s/it] {'loss': 0.5251, 'learning_rate': 1.6442397127511366e-05, 'epoch': 0.3}
30%|██▉ | 1986/6640 [1:49:58<21:23:18, 16.54s/it] {'loss': 0.5254, 'learning_rate': 1.643866536061491e-05, 'epoch': 0.3}
30%|██▉ | 1987/6640 [1:50:14<21:05:29, 16.32s/it] {'loss': 0.526, 'learning_rate': 1.6434932061488827e-05, 'epoch': 0.3}
30%|██▉ | 1988/6640 [1:50:31<21:09:34, 16.37s/it] {'loss': 0.5597, 'learning_rate': 1.6431197231021543e-05, 'epoch': 0.3}
30%|██▉ | 1989/6640 [1:50:47<21:12:33, 16.42s/it] {'loss': 0.59, 'learning_rate': 1.6427460870101837e-05, 'epoch': 0.3}
30%|██▉ | 1990/6640 [1:51:03<21:04:49, 16.32s/it] {'loss': 0.5368, 'learning_rate': 1.6423722979618883e-05, 'epoch': 0.3}
30%|██▉ | 1991/6640 [1:51:20<21:16:15, 16.47s/it] {'loss': 0.5697, 'learning_rate': 1.6419983560462178e-05, 'epoch': 0.3}
30%|███ | 1992/6640 [1:51:37<21:17:50, 16.50s/it] {'loss': 0.5482, 'learning_rate': 1.6416242613521612e-05, 'epoch': 0.3}
30%|███ | 1993/6640 [1:51:53<21:07:03, 16.36s/it] {'loss': 0.5264, 'learning_rate': 1.641250013968743e-05, 'epoch': 0.3}
30%|███ | 1994/6640 [1:52:09<21:06:21, 16.35s/it] {'loss': 0.5465, 'learning_rate': 1.6408756139850243e-05, 'epoch': 0.3}
30%|███ | 1995/6640 [1:52:25<20:54:31, 16.20s/it] {'loss': 0.5405, 'learning_rate': 1.6405010614901017e-05, 'epoch': 0.3}
30%|███ | 1996/6640 [1:52:41<20:46:13, 16.10s/it] {'loss': 0.5535, 'learning_rate': 1.640126356573109e-05, 'epoch': 0.3}
30%|███ | 1997/6640 [1:52:58<21:18:35, 16.52s/it] {'loss': 0.5457, 'learning_rate': 1.639751499323216e-05, 'epoch': 0.3}
30%|███ | 1998/6640 [1:53:15<21:14:25, 16.47s/it] {'loss': 0.5268, 'learning_rate': 1.6393764898296283e-05, 'epoch': 0.3}
30%|███ | 1999/6640 [1:53:31<21:03:48, 16.34s/it] {'loss': 0.5446, 'learning_rate': 1.6390013281815884e-05, 'epoch': 0.3}
AutoResumeHook: Checking whether to suspend...
30%|███ | 2000/6640 [1:53:47<21:12:58, 16.46s/it] {'loss': 0.5436, 'learning_rate': 1.6386260144683744e-05, 'epoch': 0.3}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2000/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2000/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2000/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 30%|███ | 2001/6640 [1:55:35<56:17:19, 43.68s/it] {'loss': 0.5411, 'learning_rate': 1.6382505487793015e-05, 'epoch': 0.3} 30%|███ | 2001/6640 [1:55:35<56:17:19, 43.68s/it] 30%|███ | 2002/6640 [1:55:51<45:44:49, 35.51s/it] {'loss': 0.5432, 'learning_rate': 1.6378749312037197e-05, 'epoch': 0.3} 30%|███ | 2002/6640 [1:55:51<45:44:49, 35.51s/it] 30%|███ | 2003/6640 [1:56:08<38:22:26, 29.79s/it] {'loss': 0.5471, 'learning_rate': 1.6374991618310165e-05, 'epoch': 0.3} 30%|███ | 2003/6640 [1:56:08<38:22:26, 29.79s/it] 30%|███ | 2004/6640 [1:56:23<32:56:54, 25.59s/it] {'loss': 0.5402, 'learning_rate': 1.6371232407506146e-05, 'epoch': 0.3} 30%|███ | 2004/6640 [1:56:23<32:56:54, 25.59s/it] 30%|███ | 2005/6640 [1:56:39<29:10:03, 22.65s/it] {'loss': 0.5572, 'learning_rate': 1.6367471680519734e-05, 'epoch': 0.3} 30%|███ | 2005/6640 [1:56:39<29:10:03, 22.65s/it] 30%|███ | 2006/6640 [1:56:55<26:34:45, 20.65s/it] {'loss': 0.552, 'learning_rate': 1.6363709438245877e-05, 'epoch': 0.3} 30%|███ | 2006/6640 [1:56:55<26:34:45, 20.65s/it] 30%|███ | 2007/6640 [1:57:12<25:01:11, 19.44s/it] {'loss': 0.5511, 'learning_rate': 1.635994568157989e-05, 'epoch': 0.3} 30%|███ | 2007/6640 [1:57:12<25:01:11, 19.44s/it] 30%|███ | 2008/6640 [1:57:28<23:57:48, 18.62s/it] {'loss': 0.5509, 'learning_rate': 1.6356180411417448e-05, 'epoch': 0.3} 30%|███ | 2008/6640 [1:57:28<23:57:48, 18.62s/it] 30%|███ | 2009/6640 [1:57:45<23:17:28, 18.11s/it] {'loss': 0.5499, 'learning_rate': 1.6352413628654584e-05, 'epoch': 0.3} 30%|███ | 2009/6640 [1:57:45<23:17:28, 18.11s/it] 30%|███ | 2010/6640 [1:58:02<22:47:53, 17.73s/it] {'loss': 0.5389, 'learning_rate': 1.6348645334187686e-05, 'epoch': 0.3} 30%|███ | 2010/6640 [1:58:02<22:47:53, 17.73s/it] 30%|███ | 2011/6640 [1:58:19<22:25:00, 17.43s/it] {'loss': 0.5392, 'learning_rate': 1.6344875528913517e-05, 'epoch': 0.3} 30%|███ | 2011/6640 [1:58:19<22:25:00, 17.43s/it] 30%|███ | 2012/6640 [1:58:35<21:43:49, 16.90s/it] {'loss': 0.5469, 'learning_rate': 
1.6341104213729177e-05, 'epoch': 0.3} 30%|███ | 2012/6640 [1:58:35<21:43:49, 16.90s/it] 30%|███ | 2013/6640 [1:58:51<21:23:51, 16.65s/it] {'loss': 0.5464, 'learning_rate': 1.6337331389532148e-05, 'epoch': 0.3} 30%|███ | 2013/6640 [1:58:51<21:23:51, 16.65s/it] 30%|███ | 2014/6640 [1:59:08<21:35:24, 16.80s/it] {'loss': 0.5626, 'learning_rate': 1.633355705722025e-05, 'epoch': 0.3} 30%|███ | 2014/6640 [1:59:08<21:35:24, 16.80s/it] 30%|███ | 2015/6640 [1:59:24<21:10:15, 16.48s/it] {'loss': 0.5243, 'learning_rate': 1.632978121769169e-05, 'epoch': 0.3} 30%|███ | 2015/6640 [1:59:24<21:10:15, 16.48s/it] 30%|███ | 2016/6640 [1:59:40<20:59:30, 16.34s/it] {'loss': 0.5503, 'learning_rate': 1.6326003871845003e-05, 'epoch': 0.3} 30%|███ | 2016/6640 [1:59:40<20:59:30, 16.34s/it] 30%|███ | 2017/6640 [1:59:56<21:10:26, 16.49s/it] {'loss': 0.5325, 'learning_rate': 1.63222250205791e-05, 'epoch': 0.3} 30%|███ | 2017/6640 [1:59:56<21:10:26, 16.49s/it] 30%|███ | 2018/6640 [2:00:13<21:05:02, 16.42s/it] {'loss': 0.5461, 'learning_rate': 1.6318444664793243e-05, 'epoch': 0.3} 30%|███ | 2018/6640 [2:00:13<21:05:02, 16.42s/it] 30%|███ | 2019/6640 [2:00:29<20:54:39, 16.29s/it] {'loss': 0.521, 'learning_rate': 1.631466280538706e-05, 'epoch': 0.3} 30%|███ | 2019/6640 [2:00:29<20:54:39, 16.29s/it] 30%|███ | 2020/6640 [2:00:45<20:52:19, 16.26s/it] {'loss': 0.5416, 'learning_rate': 1.631087944326053e-05, 'epoch': 0.3} 30%|███ | 2020/6640 [2:00:45<20:52:19, 16.26s/it] 30%|███ | 2021/6640 [2:01:01<20:47:57, 16.21s/it] {'loss': 0.5333, 'learning_rate': 1.630709457931399e-05, 'epoch': 0.3} 30%|███ | 2021/6640 [2:01:01<20:47:57, 16.21s/it] 30%|███ | 2022/6640 [2:01:17<20:51:53, 16.27s/it] {'loss': 0.5574, 'learning_rate': 1.630330821444814e-05, 'epoch': 0.3} 30%|███ | 2022/6640 [2:01:17<20:51:53, 16.27s/it] 30%|███ | 2023/6640 [2:01:33<20:46:07, 16.19s/it] {'loss': 0.5322, 'learning_rate': 1.629952034956403e-05, 'epoch': 0.3} 30%|███ | 2023/6640 [2:01:33<20:46:07, 16.19s/it] 30%|███ | 2024/6640 
[2:01:49<20:38:54, 16.10s/it] {'loss': 0.5356, 'learning_rate': 1.6295730985563074e-05, 'epoch': 0.3} 30%|███ | 2024/6640 [2:01:49<20:38:54, 16.10s/it] 30%|███ | 2025/6640 [2:02:06<20:52:50, 16.29s/it] {'loss': 0.538, 'learning_rate': 1.6291940123347033e-05, 'epoch': 0.3} 30%|███ | 2025/6640 [2:02:06<20:52:50, 16.29s/it] 31%|███ | 2026/6640 [2:02:22<20:43:09, 16.17s/it] {'loss': 0.5374, 'learning_rate': 1.6288147763818038e-05, 'epoch': 0.31} 31%|███ | 2026/6640 [2:02:22<20:43:09, 16.17s/it] 31%|███ | 2027/6640 [2:02:40<21:28:18, 16.76s/it] {'loss': 0.5458, 'learning_rate': 1.6284353907878557e-05, 'epoch': 0.31} 31%|███ | 2027/6640 [2:02:40<21:28:18, 16.76s/it] 31%|███ | 2028/6640 [2:02:56<21:13:10, 16.56s/it] {'loss': 0.5437, 'learning_rate': 1.6280558556431437e-05, 'epoch': 0.31} 31%|███ | 2028/6640 [2:02:56<21:13:10, 16.56s/it] 31%|███ | 2029/6640 [2:03:12<21:03:50, 16.45s/it] {'loss': 0.5423, 'learning_rate': 1.627676171037987e-05, 'epoch': 0.31} 31%|███ | 2029/6640 [2:03:12<21:03:50, 16.45s/it] 31%|███ | 2030/6640 [2:03:28<20:50:05, 16.27s/it] {'loss': 0.5385, 'learning_rate': 1.6272963370627398e-05, 'epoch': 0.31} 31%|███ | 2030/6640 [2:03:28<20:50:05, 16.27s/it] 31%|███ | 2031/6640 [2:03:44<20:48:47, 16.26s/it] {'loss': 0.5317, 'learning_rate': 1.626916353807793e-05, 'epoch': 0.31} 31%|███ | 2031/6640 [2:03:44<20:48:47, 16.26s/it] 31%|███ | 2032/6640 [2:04:00<20:45:32, 16.22s/it] {'loss': 0.5441, 'learning_rate': 1.6265362213635714e-05, 'epoch': 0.31} 31%|███ | 2032/6640 [2:04:00<20:45:32, 16.22s/it] 31%|███ | 2033/6640 [2:04:18<21:15:27, 16.61s/it] {'loss': 0.5483, 'learning_rate': 1.626155939820537e-05, 'epoch': 0.31} 31%|███ | 2033/6640 [2:04:18<21:15:27, 16.61s/it] 31%|███ | 2034/6640 [2:04:35<21:21:49, 16.70s/it] {'loss': 0.5497, 'learning_rate': 1.6257755092691865e-05, 'epoch': 0.31} 31%|███ | 2034/6640 [2:04:35<21:21:49, 16.70s/it] 31%|███ | 2035/6640 [2:04:51<21:12:14, 16.58s/it] {'loss': 0.5488, 'learning_rate': 1.6253949298000527e-05, 'epoch': 0.31} 
31%|███ | 2036/6640 [2:05:08<21:19:24, 16.67s/it] {'loss': 0.5363, 'learning_rate': 1.6250142015037024e-05, 'epoch': 0.31}
31%|███ | 2037/6640 [2:05:25<21:15:09, 16.62s/it] {'loss': 0.5575, 'learning_rate': 1.624633324470739e-05, 'epoch': 0.31}
31%|███ | 2038/6640 [2:05:41<21:19:49, 16.69s/it] {'loss': 0.5284, 'learning_rate': 1.6242522987918016e-05, 'epoch': 0.31}
31%|███ | 2039/6640 [2:05:59<21:39:20, 16.94s/it] {'loss': 0.5484, 'learning_rate': 1.6238711245575632e-05, 'epoch': 0.31}
31%|███ | 2040/6640 [2:06:15<21:19:55, 16.69s/it] {'loss': 0.531, 'learning_rate': 1.6234898018587336e-05, 'epoch': 0.31}
31%|███ | 2041/6640 [2:06:31<20:57:26, 16.40s/it] {'loss': 0.5364, 'learning_rate': 1.6231083307860574e-05, 'epoch': 0.31}
31%|███ | 2042/6640 [2:06:48<21:06:36, 16.53s/it] {'loss': 0.541, 'learning_rate': 1.6227267114303145e-05, 'epoch': 0.31}
31%|███ | 2043/6640 [2:07:04<20:58:48, 16.43s/it] {'loss': 0.5375, 'learning_rate': 1.6223449438823194e-05, 'epoch': 0.31}
31%|███ | 2044/6640 [2:07:20<20:48:03, 16.29s/it] {'loss': 0.5367, 'learning_rate': 1.6219630282329232e-05, 'epoch': 0.31}
31%|███ | 2045/6640 [2:07:36<20:43:08, 16.23s/it] {'loss': 0.539, 'learning_rate': 1.6215809645730115e-05, 'epoch': 0.31}
31%|███ | 2046/6640 [2:07:52<20:39:33, 16.19s/it] {'loss': 0.5527, 'learning_rate': 1.6211987529935055e-05, 'epoch': 0.31}
31%|███ | 2047/6640 [2:08:08<20:25:08, 16.00s/it] {'loss': 0.5433, 'learning_rate': 1.6208163935853605e-05, 'epoch': 0.31}
31%|███ | 2048/6640 [2:08:23<20:17:20, 15.91s/it] {'loss': 0.5162, 'learning_rate': 1.6204338864395683e-05, 'epoch': 0.31}
31%|███ | 2049/6640 [2:08:40<20:43:21, 16.25s/it] {'loss': 0.5488, 'learning_rate': 1.620051231647155e-05, 'epoch': 0.31}
AutoResumeHook: Checking whether to suspend... (x8, one line per rank)
31%|███ | 2050/6640 [2:08:57<20:55:21, 16.41s/it] {'loss': 0.5391, 'learning_rate': 1.6196684292991827e-05, 'epoch': 0.31}
31%|███ | 2051/6640 [2:09:14<21:03:30, 16.52s/it] {'loss': 0.5494, 'learning_rate': 1.6192854794867477e-05, 'epoch': 0.31}
31%|███ | 2052/6640 [2:09:30<20:56:43, 16.43s/it] {'loss': 0.525, 'learning_rate': 1.618902382300982e-05, 'epoch': 0.31}
31%|███ | 2053/6640 [2:09:47<21:00:39, 16.49s/it] {'loss': 0.5463, 'learning_rate': 1.6185191378330523e-05, 'epoch': 0.31}
31%|███ | 2054/6640 [2:10:03<21:07:20, 16.58s/it] {'loss': 0.5545, 'learning_rate': 1.6181357461741603e-05, 'epoch': 0.31}
31%|███ | 2055/6640 [2:10:20<21:09:08, 16.61s/it] {'loss': 0.5397, 'learning_rate': 1.6177522074155436e-05, 'epoch': 0.31}
31%|███ | 2056/6640 [2:10:36<20:52:49, 16.40s/it] {'loss': 0.5522, 'learning_rate': 1.6173685216484734e-05,
'epoch': 0.31}
31%|███ | 2057/6640 [2:10:54<21:33:28, 16.93s/it] {'loss': 0.538, 'learning_rate': 1.6169846889642572e-05, 'epoch': 0.31}
31%|███ | 2058/6640 [2:11:11<21:19:50, 16.76s/it] {'loss': 0.559, 'learning_rate': 1.6166007094542365e-05, 'epoch': 0.31}
31%|███ | 2059/6640 [2:11:27<21:13:32, 16.68s/it] {'loss': 0.5778, 'learning_rate': 1.616216583209788e-05, 'epoch': 0.31}
31%|███ | 2060/6640 [2:11:43<21:04:14, 16.56s/it] {'loss': 0.5396, 'learning_rate': 1.615832310322324e-05, 'epoch': 0.31}
31%|███ | 2061/6640 [2:12:00<20:58:04, 16.49s/it] {'loss': 0.5541, 'learning_rate': 1.6154478908832907e-05, 'epoch': 0.31}
31%|███ | 2062/6640 [2:12:15<20:38:56, 16.24s/it] {'loss': 0.5373, 'learning_rate': 1.6150633249841696e-05, 'epoch': 0.31}
WARNING: tokenization mismatch: 1 vs. 1419. [[{'from': 'human', 'value': 'ASSISTANT: You are an innovative tax advisory firm with the following knowledge and traits.\n\nKNOWLEDGE: Tax advisor, junior tax advisor, senior tax advisor, financial strategist, tax regulations.\n\nTRAITS: Empathy, innovation, solution orientation, professionalism, communication, client focus.\n\nTASK: Use your knowledge to develop an internal strategy document. We are a tax advisory firm looking to identify 10 creative ways to improve client satisfaction and loyalty.
Focus on ideas that are tailored to the needs of tax advisors and enhance the overall client experience.\n\nOUTPUT: A table in German containing 10 creative strategies to improve client satisfaction and loyalty for a tax advisory firm, including a brief description for each.'}, {'from': 'gpt', 'value': 'Table: 10 creative strategies for increasing client satisfaction and loyalty at a tax advisory firm\n\n| No. | Strategy | Description |\n| --- | --- | --- |\n| 1 | Personalized service | Every client gets a dedicated tax advisor who attends to their individual needs and concerns. |\n| 2 | Proactive tax planning | Regular updates and recommendations for tax optimization, based on current legislative changes and the client's goals. |\n| 3 | Comprehensive training offerings | Free workshops and webinars to broaden clients' tax knowledge and make tax topics easier to handle. |\n| 4 | Digital platform | A user-friendly online platform that gives clients easy access to their tax documents and to their advisor. |\n| 5 | Introduction of tax coaching | Personal coaching sessions that help clients better understand their tax goals and options. |\n| 6 | Fast response times | A binding promise to respond to client inquiries within 24 hours, ensuring efficient communication. |\n| 7 | Transparent pricing | Clear, fair price structures that clients know in advance, avoiding unpleasant surprises. |\n| 8 | Individual annual reviews | Regular personal annual meetings to address clients' needs and goals and strengthen the working relationship. |\n| 9 | Client feedback system | Setting up a simple feedback system to actively collect and act on client opinions and suggestions for improvement.
|\n| 10 | Tax newsletter and information offering | A monthly newsletter with current tax information and tips that adds value for clients and keeps them up to date. |'}, {'from': 'human', 'value': 'Please be a bit more creative and focus on the interpersonal level.'}, {'from': 'gpt', 'value': 'Table: 10 creative strategies for increasing client satisfaction and loyalty at a tax advisory firm, with a focus on the interpersonal level\n\n| No. | Strategy | Description |\n| --- | --- | --- |\n| 1 | Client events | Organizing regular, informal events to foster personal contact between clients and advisors. |\n| 2 | Personal birthday greetings | Individual birthday cards or messages for every client to show appreciation and strengthen the relationship. |\n| 3 | Client surveys & adaptation | Actively gathering client opinions on their individual service and adapting our offerings to client needs. |\n| 4 | Team introduction & empathy | Introducing our advisors on the company website and on social media to build a personal, empathetic relationship with clients. |\n| 5 | Client mentoring program | Creating a program in which long-standing clients act as mentors for new clients, building trust and loyalty. |\n| 6 | "Open house day" | Inviting clients to an open house at our firm to give them a personal look at how we work. |\n| 7 | Individual success stories | Writing up individual success stories for each client to highlight their tax wins and strengthen their loyalty. |\n| 8 | Joint social projects | Working with clients on social projects and fundraising campaigns to share common values and strengthen collaboration.
|\n| 9 | Flexibility in communication | Offering a range of communication channels (phone, email, video call, etc.) to meet each client's individual needs. |\n| 10 | Annual client thank-you | Surprising clients with a small gift or a thank-you event to express appreciation for their loyalty. |'}]] (ignored)
31%|███ | 2063/6640 [2:12:32<20:57:46, 16.49s/it] {'loss': 0.5401, 'learning_rate': 1.6146786127164773e-05, 'epoch': 0.31}
31%|███ | 2064/6640 [2:12:50<21:22:51, 16.82s/it] {'loss': 0.5621, 'learning_rate': 1.614293754171765e-05, 'epoch': 0.31}
31%|███ | 2065/6640 [2:13:06<21:07:13, 16.62s/it] {'loss': 0.5522, 'learning_rate': 1.6139087494416184e-05, 'epoch': 0.31}
31%|███ | 2066/6640 [2:13:22<20:59:46, 16.53s/it] {'loss': 0.5335, 'learning_rate': 1.6135235986176584e-05, 'epoch': 0.31}
31%|███ | 2067/6640 [2:13:38<20:43:53, 16.32s/it] {'loss': 0.5459, 'learning_rate': 1.613138301791541e-05, 'epoch': 0.31}
31%|███ | 2068/6640 [2:13:55<20:51:14, 16.42s/it] {'loss': 0.5526, 'learning_rate': 1.6127528590549563e-05, 'epoch': 0.31}
31%|███ | 2069/6640 [2:14:11<20:43:56, 16.33s/it] {'loss': 0.5312, 'learning_rate': 1.612367270499629e-05, 'epoch': 0.31}
31%|███ | 2070/6640 [2:14:28<20:56:10, 16.49s/it] {'loss': 0.539, 'learning_rate': 1.6119815362173188e-05, 'epoch': 0.31}
31%|███ | 2071/6640 [2:14:44<20:45:43, 16.36s/it] {'loss': 0.5536, 'learning_rate': 1.6115956562998208e-05, 'epoch': 0.31}
31%|███ | 2072/6640 [2:15:00<20:32:23, 16.19s/it] {'loss': 0.5534,
'learning_rate': 1.611209630838963e-05, 'epoch': 0.31}
31%|███ | 2073/6640 [2:15:16<20:33:46, 16.21s/it] {'loss': 0.5341, 'learning_rate': 1.6108234599266102e-05, 'epoch': 0.31}
31%|███ | 2074/6640 [2:15:32<20:25:50, 16.11s/it] {'loss': 0.544, 'learning_rate': 1.6104371436546604e-05, 'epoch': 0.31}
31%|███▏ | 2075/6640 [2:15:48<20:30:41, 16.18s/it] {'loss': 0.5391, 'learning_rate': 1.6100506821150455e-05, 'epoch': 0.31}
31%|███▏ | 2076/6640 [2:16:05<20:44:23, 16.36s/it] {'loss': 0.548, 'learning_rate': 1.609664075399735e-05, 'epoch': 0.31}
31%|███▏ | 2077/6640 [2:16:21<20:43:36, 16.35s/it] {'loss': 0.5496, 'learning_rate': 1.6092773236007288e-05, 'epoch': 0.31}
31%|███▏ | 2078/6640 [2:16:37<20:38:09, 16.28s/it] {'loss': 0.5206, 'learning_rate': 1.6088904268100648e-05, 'epoch': 0.31}
31%|███▏ | 2079/6640 [2:16:54<20:35:53, 16.26s/it] {'loss': 0.5249, 'learning_rate': 1.6085033851198136e-05, 'epoch': 0.31}
31%|███▏ | 2080/6640 [2:17:10<20:38:18, 16.29s/it] {'loss': 0.5266, 'learning_rate': 1.6081161986220807e-05, 'epoch': 0.31}
31%|███▏ | 2081/6640 [2:17:26<20:35:01, 16.25s/it] {'loss': 0.5438, 'learning_rate': 1.6077288674090063e-05, 'epoch': 0.31}
31%|███▏ | 2082/6640 [2:17:43<20:38:47, 16.31s/it] {'loss': 0.5245, 'learning_rate': 1.6073413915727648e-05, 'epoch': 0.31}
31%|███▏ | 2083/6640 [2:17:59<20:38:37, 16.31s/it] {'loss': 0.5324, 'learning_rate': 1.6069537712055652e-05, 'epoch': 0.31}
31%|███▏ | 2084/6640 [2:18:15<20:41:24, 16.35s/it] {'loss': 0.5593, 'learning_rate': 1.6065660063996502e-05, 'epoch': 0.31}
31%|███▏ | 2085/6640 [2:18:32<20:40:29, 16.34s/it] {'loss': 0.5567, 'learning_rate': 1.6061780972472978e-05, 'epoch': 0.31}
31%|███▏ | 2086/6640 [2:18:48<20:39:38, 16.33s/it] {'loss': 0.5248, 'learning_rate': 1.60579004384082e-05, 'epoch': 0.31}
31%|███▏ | 2087/6640 [2:19:04<20:23:50, 16.13s/it] {'loss': 0.5352, 'learning_rate': 1.605401846272563e-05, 'epoch': 0.31}
31%|███▏ | 2088/6640 [2:19:20<20:35:21, 16.28s/it] {'loss': 0.5618, 'learning_rate': 1.6050135046349073e-05, 'epoch': 0.31}
31%|███▏ | 2089/6640 [2:19:36<20:15:17, 16.02s/it] {'loss': 0.5372, 'learning_rate': 1.6046250190202684e-05, 'epoch': 0.31}
31%|███▏ | 2090/6640 [2:19:52<20:15:47, 16.03s/it] {'loss': 0.5417, 'learning_rate': 1.6042363895210948e-05, 'epoch': 0.31}
31%|███▏ | 2091/6640 [2:20:08<20:14:51, 16.02s/it] {'loss': 0.5423, 'learning_rate': 1.60384761622987e-05, 'epoch': 0.31}
32%|███▏ | 2092/6640 [2:20:24<20:11:54, 15.99s/it] {'loss': 0.5562, 'learning_rate': 1.603458699239112e-05, 'epoch': 0.32}
32%|███▏ | 2093/6640 [2:20:40<20:11:20, 15.98s/it] {'loss': 0.5448, 'learning_rate': 1.6030696386413715e-05, 'epoch': 0.32}
32%|███▏ | 2094/6640 [2:20:56<20:10:20, 15.97s/it] {'loss': 0.5219, 'learning_rate': 1.602680434529236e-05, 'epoch': 0.32}
32%|███▏ | 2095/6640 [2:21:11<19:57:27, 15.81s/it]
{'loss': 0.5342, 'learning_rate': 1.6022910869953245e-05, 'epoch': 0.32}
32%|███▏ | 2096/6640 [2:21:27<19:54:30, 15.77s/it] {'loss': 0.5353, 'learning_rate': 1.601901596132292e-05, 'epoch': 0.32}
32%|███▏ | 2097/6640 [2:21:44<20:34:16, 16.30s/it] {'loss': 0.5483, 'learning_rate': 1.6015119620328266e-05, 'epoch': 0.32}
32%|███▏ | 2098/6640 [2:22:01<20:41:30, 16.40s/it] {'loss': 0.5217, 'learning_rate': 1.601122184789651e-05, 'epoch': 0.32}
32%|███▏ | 2099/6640 [2:22:18<21:04:28, 16.71s/it] {'loss': 0.5395, 'learning_rate': 1.6007322644955208e-05, 'epoch': 0.32}
AutoResumeHook: Checking whether to suspend... (x8, one line per rank)
32%|███▏ | 2100/6640 [2:22:34<20:40:36, 16.40s/it]
{'loss': 0.5241, 'learning_rate': 1.6003422012432275e-05, 'epoch': 0.32}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2100/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2100/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2100/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
32%|███▏ | 2101/6640 [2:24:22<55:27:34, 43.99s/it] {'loss': 0.5408, 'learning_rate': 1.5999519951255957e-05, 'epoch': 0.32}
32%|███▏ | 2102/6640 [2:24:39<45:13:02, 35.87s/it] {'loss': 0.5498, 'learning_rate': 1.5995616462354835e-05, 'epoch': 0.32}
32%|███▏ | 2103/6640 [2:24:56<38:00:29, 30.16s/it] {'loss': 0.5354, 'learning_rate': 1.5991711546657837e-05, 'epoch': 0.32}
32%|███▏ | 2104/6640 [2:25:13<32:50:32, 26.07s/it] {'loss': 0.5507, 'learning_rate': 1.5987805205094225e-05, 'epoch': 0.32}
32%|███▏ | 2105/6640 [2:25:30<29:28:43, 23.40s/it] {'loss': 0.5457, 'learning_rate': 1.5983897438593612e-05, 'epoch': 0.32}
32%|███▏ | 2106/6640 [2:25:47<27:00:21, 21.44s/it] {'loss': 0.5294, 'learning_rate': 1.597998824808593e-05, 'epoch': 0.32}
32%|███▏ | 2107/6640 [2:26:03<24:57:51,
19.83s/it] {'loss': 0.5515, 'learning_rate': 1.5976077634501476e-05, 'epoch': 0.32}
32%|███▏ | 2108/6640 [2:26:19<23:31:06, 18.68s/it] {'loss': 0.5448, 'learning_rate': 1.5972165598770853e-05, 'epoch': 0.32}
32%|███▏ | 2109/6640 [2:26:35<22:42:35, 18.04s/it] {'loss': 0.523, 'learning_rate': 1.5968252141825038e-05, 'epoch': 0.32}
32%|███▏ | 2110/6640 [2:26:52<22:07:47, 17.59s/it] {'loss': 0.5583, 'learning_rate': 1.596433726459532e-05, 'epoch': 0.32}
32%|███▏ | 2111/6640 [2:27:08<21:33:24, 17.14s/it] {'loss': 0.5392, 'learning_rate': 1.5960420968013334e-05, 'epoch': 0.32}
32%|███▏ | 2112/6640 [2:27:24<21:14:19, 16.89s/it] {'loss': 0.5051, 'learning_rate': 1.5956503253011052e-05, 'epoch': 0.32}
32%|███▏ | 2113/6640 [2:27:41<21:03:28, 16.75s/it] {'loss': 0.5223, 'learning_rate': 1.595258412052079e-05, 'epoch': 0.32}
32%|███▏ | 2114/6640 [2:27:57<20:53:17, 16.61s/it] {'loss': 0.5374, 'learning_rate': 1.5948663571475197e-05, 'epoch': 0.32}
32%|███▏ | 2115/6640 [2:28:13<20:33:53, 16.36s/it] {'loss': 0.5301, 'learning_rate': 1.5944741606807257e-05, 'epoch': 0.32}
32%|███▏ | 2116/6640 [2:28:29<20:28:01, 16.29s/it] {'loss': 0.5228, 'learning_rate': 1.5940818227450292e-05, 'epoch': 0.32}
32%|███▏ | 2117/6640 [2:28:46<20:45:18, 16.52s/it] {'loss': 0.5411, 'learning_rate': 1.5936893434337957e-05, 'epoch': 0.32}
32%|███▏ | 2118/6640 [2:29:02<20:26:14, 16.27s/it] {'loss': 0.547, 'learning_rate': 1.5932967228404255e-05, 'epoch': 0.32}
32%|███▏ | 2119/6640 [2:29:18<20:23:42, 16.24s/it] {'loss': 0.5509, 'learning_rate': 1.592903961058351e-05, 'epoch': 0.32}
32%|███▏ | 2120/6640 [2:29:35<20:50:30, 16.60s/it] {'loss': 0.5679, 'learning_rate': 1.5925110581810396e-05, 'epoch': 0.32}
32%|███▏ | 2121/6640 [2:29:51<20:34:06, 16.39s/it] {'loss': 0.5273, 'learning_rate': 1.5921180143019915e-05, 'epoch': 0.32}
32%|███▏ | 2122/6640 [2:30:07<20:22:47, 16.24s/it] {'loss': 0.559, 'learning_rate': 1.59172482951474e-05, 'epoch': 0.32}
32%|███▏ | 2123/6640 [2:30:23<20:20:16, 16.21s/it] {'loss': 0.5337, 'learning_rate': 1.5913315039128534e-05, 'epoch': 0.32}
32%|███▏ | 2124/6640 [2:30:40<20:34:43, 16.40s/it] {'loss': 0.5464, 'learning_rate': 1.5909380375899323e-05, 'epoch': 0.32}
32%|███▏ | 2125/6640 [2:30:56<20:21:17, 16.23s/it] {'loss': 0.5327, 'learning_rate': 1.590544306396110e-05, 'epoch': 0.32}
32%|███▏ | 2126/6640 [2:31:13<20:35:19, 16.42s/it] {'loss': 0.529, 'learning_rate': 1.5901506831555575e-05, 'epoch': 0.32}
32%|███▏ | 2127/6640 [2:31:29<20:36:07, 16.43s/it] {'loss': 0.5262, 'learning_rate': 1.5897567952314733e-05, 'epoch': 0.32}
32%|███▏ | 2128/6640 [2:31:45<20:13:11, 16.13s/it] {'loss': 0.552, 'learning_rate': 1.5893627669610926e-05, 'epoch': 0.32}
32%|███▏ | 2129/6640 [2:32:01<20:12:38, 16.13s/it] {'loss': 0.5311, 'learning_rate': 1.588968598438184e-05, 'epoch': 0.32}
32%|███▏ | 2130/6640 [2:32:17<20:14:20, 16.16s/it] {'loss': 0.5499, 'learning_rate': 1.5885742897565494e-05, 'epoch': 0.32}
32%|███▏ | 2131/6640 [2:32:33<20:07:00, 16.06s/it] {'loss': 0.5213, 'learning_rate': 1.588179841010023e-05, 'epoch': 0.32}
32%|███▏ | 2132/6640 [2:32:49<20:01:46, 16.00s/it] {'loss': 0.5302, 'learning_rate': 1.5877852522924733e-05, 'epoch': 0.32}
32%|███▏ | 2133/6640 [2:33:06<20:28:05, 16.35s/it] {'loss': 0.5458, 'learning_rate': 1.5873905236978017e-05, 'epoch': 0.32}
32%|███▏ | 2134/6640 [2:33:23<20:37:15, 16.47s/it] {'loss': 0.5509, 'learning_rate': 1.5869956553199432e-05, 'epoch': 0.32}
32%|███▏ | 2135/6640 [2:33:39<20:26:01, 16.33s/it] {'loss': 0.5422, 'learning_rate': 1.586600647252866e-05, 'epoch': 0.32}
32%|███▏ | 2136/6640 [2:33:55<20:27:42, 16.35s/it] {'loss': 0.5524, 'learning_rate': 1.5862054995905712e-05, 'epoch': 0.32}
32%|███▏ | 2137/6640 [2:34:13<21:13:32, 16.97s/it] {'loss': 0.5365, 'learning_rate': 1.5858102124270933e-05, 'epoch': 0.32}
32%|███▏ | 2138/6640 [2:34:30<20:57:19, 16.76s/it] {'loss': 0.5379, 'learning_rate': 1.5854147858565002e-05, 'epoch': 0.32}
32%|███▏ | 2139/6640 [2:34:47<21:02:05, 16.82s/it] {'loss': 0.5278, 'learning_rate': 1.5850192199728927e-05, 'epoch': 0.32}
32%|███▏ | 2140/6640 [2:35:03<20:47:52, 16.64s/it] {'loss': 0.5365, 'learning_rate': 1.5846235148704047e-05, 'epoch': 0.32}
32%|███▏ | 2141/6640 [2:35:19<20:38:27, 16.52s/it] {'loss': 0.5431, 'learning_rate': 1.584227670643204e-05, 'epoch': 0.32}
32%|███▏ | 2142/6640 [2:35:35<20:16:16, 16.22s/it] {'loss': 0.5563, 'learning_rate': 1.58383168738549e-05, 'epoch': 0.32}
32%|███▏ | 2143/6640 [2:35:52<20:36:00, 16.49s/it] {'loss': 0.5476, 'learning_rate': 1.583435565191497e-05, 'epoch': 0.32}
32%|███▏ | 2144/6640 [2:36:08<20:41:05, 16.56s/it] {'loss': 0.5354, 'learning_rate': 1.583039304155491e-05, 'epoch': 0.32}
32%|███▏ | 2145/6640 [2:36:25<20:37:15, 16.52s/it] {'loss': 0.5519, 'learning_rate': 1.5826429043717716e-05, 'epoch': 0.32}
32%|███▏ | 2146/6640 [2:36:41<20:30:48, 16.43s/it] {'loss': 0.5385, 'learning_rate': 1.582246365934671e-05, 'epoch': 0.32}
32%|███▏ | 2147/6640 [2:36:57<20:22:30, 16.33s/it] {'loss': 0.5437, 'learning_rate': 1.5818496889385554e-05, 'epoch': 0.32}
32%|███▏ | 2148/6640 [2:37:13<20:09:17, 16.15s/it] {'loss': 0.5292, 'learning_rate': 1.5814528734778228e-05, 'epoch': 0.32}
32%|███▏ | 2149/6640 [2:37:29<20:07:54, 16.14s/it] {'loss': 0.5301, 'learning_rate': 1.5810559196469043e-05, 'epoch': 0.32}
AutoResumeHook: Checking whether to suspend... (x8, one line per rank)
32%|███▏ | 2150/6640 [2:37:45<20:11:55, 16.19s/it]
{'loss': 0.5446, 'learning_rate': 1.580658827540265e-05, 'epoch': 0.32}
32%|███▏ | 2151/6640 [2:38:03<20:46:28, 16.66s/it] {'loss': 0.5343, 'learning_rate': 1.5802615972524017e-05, 'epoch': 0.32}
32%|███▏ | 2152/6640 [2:38:20<20:42:19, 16.61s/it] {'loss': 0.5146, 'learning_rate': 1.579864228877845e-05, 'epoch': 0.32}
32%|███▏ | 2153/6640 [2:38:36<20:38:57, 16.57s/it] {'loss': 0.5399, 'learning_rate': 1.5794667225111572e-05, 'epoch': 0.32}
32%|███▏ | 2154/6640 [2:38:52<20:21:13, 16.33s/it] {'loss': 0.5431, 'learning_rate': 1.5790690782469345e-05, 'epoch': 0.32}
32%|███▏ | 2155/6640 [2:39:07<20:04:00, 16.11s/it] {'loss': 0.5339, 'learning_rate': 1.578671296179806e-05, 'epoch': 0.32}
32%|███▏ | 2156/6640 [2:39:26<20:55:14, 16.80s/it] {'loss': 0.5332, 'learning_rate': 1.5782733764044326e-05, 'epoch': 0.32}
32%|███▏ | 2157/6640 [2:39:42<20:33:44, 16.51s/it] {'loss': 0.5243, 'learning_rate': 1.5778753190155085e-05, 'epoch': 0.32}
32%|███▎ | 2158/6640 [2:39:58<20:25:28, 16.41s/it] {'loss': 0.5195, 'learning_rate': 1.5774771241077612e-05, 'epoch': 0.33}
33%|███▎ | 2159/6640 [2:40:15<20:35:07, 16.54s/it] {'loss': 0.5456, 'learning_rate': 1.57707879177595e-05, 'epoch': 0.33}
33%|███▎ | 2160/6640 [2:40:31<20:36:45, 16.56s/it] {'loss': 0.5389, 'learning_rate': 1.5766803221148676e-05, 'epoch': 0.33}
33%|███▎ | 2161/6640 [2:40:47<20:19:23, 16.33s/it] {'loss': 0.5558, 'learning_rate': 1.5762817152193383e-05, 'epoch': 0.33}
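In each record, the bracketed `[elapsed<remaining, rate]` field comes from tqdm, and the remaining-time figure is roughly (total steps - current step) × seconds-per-iteration. tqdm smooths the rate exponentially, which is why the product of the printed values lands near but not exactly on the printed estimate; a rough reconstruction (hypothetical helpers, using the instantaneous rate) for the 2150/6640 record at 16.19 s/it gives 20:11:33 against the logged 20:11:55:

```python
def eta_seconds(step: int, total: int, sec_per_it: float) -> float:
    """Remaining wall-clock seconds at a fixed seconds-per-iteration rate."""
    return (total - step) * sec_per_it

def fmt_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, matching tqdm's remaining-time column."""
    s = int(seconds)
    return f"{s // 3600}:{(s % 3600) // 60:02d}:{s % 60:02d}"
```

The same arithmetic explains the 55:27:34 estimate right after the checkpoint at step 2100: the one slow checkpointing iteration (43.99 s/it) briefly dominates the smoothed rate before it decays back toward ~16 s/it.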
33%|███▎ | 2161/6640 [2:40:47<20:19:23, 16.33s/it] 33%|███▎ | 2162/6640 [2:41:04<20:21:32, 16.37s/it] {'loss': 0.5429, 'learning_rate': 1.5758829711842208e-05, 'epoch': 0.33} 33%|███▎ | 2162/6640 [2:41:04<20:21:32, 16.37s/it] 33%|███▎ | 2163/6640 [2:41:19<20:10:22, 16.22s/it] {'loss': 0.5264, 'learning_rate': 1.5754840901044054e-05, 'epoch': 0.33} 33%|███▎ | 2163/6640 [2:41:19<20:10:22, 16.22s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5014 > 4096). Running this sequence through the model will result in indexing errors 33%|███▎ | 2164/6640 [2:41:35<20:01:20, 16.10s/it] {'loss': 0.5484, 'learning_rate': 1.5750850720748146e-05, 'epoch': 0.33} 33%|███▎ | 2164/6640 [2:41:35<20:01:20, 16.10s/it] 33%|███▎ | 2165/6640 [2:41:52<20:07:47, 16.19s/it] {'loss': 0.5429, 'learning_rate': 1.574685917190404e-05, 'epoch': 0.33} 33%|███▎ | 2165/6640 [2:41:52<20:07:47, 16.19s/it] 33%|███▎ | 2166/6640 [2:42:07<19:46:16, 15.91s/it] {'loss': 0.5343, 'learning_rate': 1.574286625546162e-05, 'epoch': 0.33} 33%|███▎ | 2166/6640 [2:42:07<19:46:16, 15.91s/it] 33%|███▎ | 2167/6640 [2:42:23<19:56:04, 16.04s/it] {'loss': 0.5151, 'learning_rate': 1.5738871972371096e-05, 'epoch': 0.33} 33%|███▎ | 2167/6640 [2:42:23<19:56:04, 16.04s/it] 33%|███▎ | 2168/6640 [2:42:40<20:22:49, 16.41s/it] {'loss': 0.5498, 'learning_rate': 1.5734876323582996e-05, 'epoch': 0.33} 33%|███▎ | 2168/6640 [2:42:40<20:22:49, 16.41s/it] 33%|███▎ | 2169/6640 [2:42:56<20:09:03, 16.23s/it] {'loss': 0.5419, 'learning_rate': 1.5730879310048175e-05, 'epoch': 0.33} 33%|███▎ | 2169/6640 [2:42:56<20:09:03, 16.23s/it] 33%|███▎ | 2170/6640 [2:43:14<20:32:23, 16.54s/it] {'loss': 0.5455, 'learning_rate': 1.572688093271782e-05, 'epoch': 0.33} 33%|███▎ | 2170/6640 [2:43:14<20:32:23, 16.54s/it] 33%|███▎ | 2171/6640 [2:43:31<20:45:07, 16.72s/it] {'loss': 0.5374, 'learning_rate': 1.5722881192543433e-05, 'epoch': 0.33} 33%|███▎ | 2171/6640 [2:43:31<20:45:07, 16.72s/it] 33%|███▎ | 
2172/6640 [2:43:49<21:15:24, 17.13s/it] {'loss': 0.5396, 'learning_rate': 1.5718880090476852e-05, 'epoch': 0.33}
2173/6640 [2:44:05<21:02:47, 16.96s/it] {'loss': 0.5275, 'learning_rate': 1.5714877627470225e-05, 'epoch': 0.33}
2174/6640 [2:44:21<20:29:53, 16.52s/it] {'loss': 0.5424, 'learning_rate': 1.5710873804476035e-05, 'epoch': 0.33}
2175/6640 [2:44:37<20:26:10, 16.48s/it] {'loss': 0.5321, 'learning_rate': 1.5706868622447084e-05, 'epoch': 0.33}
2176/6640 [2:44:53<20:12:37, 16.30s/it] {'loss': 0.5323, 'learning_rate': 1.570286208233649e-05, 'epoch': 0.33}
2177/6640 [2:45:10<20:17:35, 16.37s/it] {'loss': 0.5561, 'learning_rate': 1.5698854185097713e-05, 'epoch': 0.33}
2178/6640 [2:45:27<20:32:40, 16.58s/it] {'loss': 0.5395, 'learning_rate': 1.569484493168452e-05, 'epoch': 0.33}
2179/6640 [2:45:43<20:20:01, 16.41s/it] {'loss': 0.5234, 'learning_rate': 1.569083432305101e-05, 'epoch': 0.33}
2180/6640 [2:46:00<20:43:10, 16.72s/it] {'loss': 0.536, 'learning_rate': 1.568682236015159e-05, 'epoch': 0.33}
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated!
  warnings.warn("Inputs truncated!")
2181/6640 [2:46:18<21:11:27, 17.11s/it] {'loss': 0.543, 'learning_rate': 1.5682809043941013e-05, 'epoch': 0.33}
2182/6640 [2:46:35<20:56:30, 16.91s/it] {'loss': 0.5243, 'learning_rate': 1.567879437537433e-05, 'epoch': 0.33}
2183/6640 [2:46:51<20:40:37, 16.70s/it] {'loss': 0.5394, 'learning_rate': 1.5674778355406928e-05, 'epoch': 0.33}
2184/6640 [2:47:06<20:13:29, 16.34s/it] {'loss': 0.5477, 'learning_rate': 1.5670760984994516e-05, 'epoch': 0.33}
2185/6640 [2:47:22<20:08:09, 16.27s/it] {'loss': 0.5414, 'learning_rate': 1.566674226509311e-05, 'epoch': 0.33}
2186/6640 [2:47:39<20:09:11, 16.29s/it] {'loss': 0.5671, 'learning_rate': 1.566272219665907e-05, 'epoch': 0.33}
2187/6640 [2:47:54<19:54:25, 16.09s/it] {'loss': 0.5405, 'learning_rate': 1.5658700780649057e-05, 'epoch': 0.33}
2188/6640 [2:48:11<19:57:10, 16.13s/it] {'loss': 0.5411, 'learning_rate': 1.565467801802006e-05, 'epoch': 0.33}
2189/6640 [2:48:27<20:02:27, 16.21s/it] {'loss': 0.5605, 'learning_rate': 1.565065390972939e-05, 'epoch': 0.33}
2190/6640 [2:48:43<19:59:45, 16.18s/it] {'loss': 0.5354, 'learning_rate': 1.564662845673468e-05, 'epoch': 0.33}
2191/6640 [2:49:00<20:07:48, 16.29s/it] {'loss': 0.5334, 'learning_rate': 1.5642601659993877e-05, 'epoch': 0.33}
2192/6640 [2:49:16<20:18:31, 16.44s/it] {'loss': 0.5506, 'learning_rate': 1.563857352046525e-05, 'epoch': 0.33}
2193/6640 [2:49:33<20:20:29, 16.47s/it] {'loss': 0.5502, 'learning_rate': 1.563454403910739e-05, 'epoch': 0.33}
2194/6640 [2:49:50<20:30:51, 16.61s/it] {'loss': 0.549, 'learning_rate': 1.5630513216879203e-05, 'epoch': 0.33}
2195/6640 [2:50:06<20:24:08, 16.52s/it] {'loss': 0.5367, 'learning_rate': 1.5626481054739916e-05, 'epoch': 0.33}
2196/6640 [2:50:23<20:21:33, 16.49s/it] {'loss': 0.5519, 'learning_rate': 1.562244755364908e-05, 'epoch': 0.33}
2197/6640 [2:50:39<20:21:25, 16.49s/it] {'loss': 0.5518, 'learning_rate': 1.5618412714566555e-05, 'epoch': 0.33}
2198/6640 [2:50:55<20:01:03, 16.22s/it] {'loss': 0.5516, 'learning_rate': 1.5614376538452524e-05, 'epoch': 0.33}
2199/6640 [2:51:11<20:06:21, 16.30s/it] {'loss': 0.5389, 'learning_rate': 1.5610339026267497e-05, 'epoch': 0.33}
AutoResumeHook: Checking whether to suspend...
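The "Token indices sequence length is longer than the specified maximum" and "Inputs truncated!" warnings above come from samples that exceed the model's 4096-token context window. A minimal sketch of the kind of guard that emits such a warning; the function name and logic are illustrative, not VILA's actual code:

```python
import warnings

MODEL_MAX_LEN = 4096  # context window reported in the warning above


def clip_token_ids(token_ids, max_len=MODEL_MAX_LEN):
    """Warn and truncate when a tokenized sample exceeds the model context."""
    if len(token_ids) > max_len:
        warnings.warn(
            f"Token indices sequence length is longer than the specified "
            f"maximum sequence length for this model ({len(token_ids)} > {max_len})."
        )
        return token_ids[:max_len]
    return token_ids


# 5014 tokens, matching the overlong sample in the log
clipped = clip_token_ids(list(range(5014)))
```

Samples clipped this way lose their tail tokens, which is why the trainer merely warns and continues rather than crashing on indexing errors.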
2200/6640 [2:51:28<20:07:28, 16.32s/it] {'loss': 0.5363, 'learning_rate': 1.560630017897229e-05, 'epoch': 0.33}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2200/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
2201/6640 [2:53:16<54:09:40, 43.92s/it] {'loss': 0.5334, 'learning_rate': 1.5602259997528028e-05, 'epoch': 0.33}
2202/6640 [2:53:33<44:02:09, 35.72s/it] {'loss': 0.5343, 'learning_rate': 1.5598218482896182e-05, 'epoch': 0.33}
2203/6640 [2:53:48<36:42:03, 29.78s/it] {'loss': 0.5505, 'learning_rate': 1.559417563603852e-05, 'epoch': 0.33}
2204/6640 [2:54:04<31:33:23, 25.61s/it] {'loss': 0.55, 'learning_rate': 1.5590131457917128e-05, 'epoch': 0.33}
2205/6640 [2:54:20<27:48:55, 22.58s/it] {'loss': 0.5423, 'learning_rate': 1.558608594949441e-05, 'epoch': 0.33}
2206/6640 [2:54:37<25:42:54, 20.88s/it] {'loss': 0.5387, 'learning_rate': 1.5582039111733096e-05, 'epoch': 0.33}
2207/6640 [2:54:53<24:02:32, 19.52s/it] {'loss': 0.5358, 'learning_rate': 1.5577990945596217e-05, 'epoch': 0.33}
2208/6640 [2:55:10<23:05:14, 18.75s/it] {'loss': 0.5211, 'learning_rate': 1.557394145204713e-05, 'epoch': 0.33}
2209/6640 [2:55:26<21:55:32, 17.81s/it] {'loss': 0.5374, 'learning_rate': 1.5569890632049515e-05, 'epoch': 0.33}
2210/6640 [2:55:42<21:16:15, 17.29s/it] {'loss': 0.5473, 'learning_rate': 1.5565838486567343e-05, 'epoch': 0.33}
2211/6640 [2:55:59<21:04:30, 17.13s/it] {'loss': 0.5432, 'learning_rate': 1.5561785016564928e-05, 'epoch': 0.33}
2212/6640 [2:56:16<21:01:31, 17.09s/it] {'loss': 0.5412, 'learning_rate': 1.555773022300688e-05, 'epoch': 0.33}
2213/6640 [2:56:34<21:21:00, 17.36s/it] {'loss': 0.5341, 'learning_rate': 1.5553674106858135e-05, 'epoch': 0.33}
2214/6640 [2:56:50<21:06:46, 17.17s/it] {'loss': 0.5513, 'learning_rate': 1.5549616669083937e-05, 'epoch': 0.33}
2215/6640 [2:57:07<20:55:01, 17.02s/it] {'loss': 0.5401, 'learning_rate': 1.554555791064985e-05, 'epoch': 0.33}
2216/6640 [2:57:24<20:55:46, 17.03s/it] {'loss': 0.5405, 'learning_rate': 1.554149783252175e-05, 'epoch': 0.33}
2217/6640 [2:57:40<20:25:41, 16.63s/it] {'loss': 0.5576, 'learning_rate': 1.5537436435665823e-05, 'epoch': 0.33}
2218/6640 [2:57:56<20:19:50, 16.55s/it] {'loss': 0.5315, 'learning_rate': 1.5533373721048576e-05, 'epoch': 0.33}
2219/6640 [2:58:12<20:15:59, 16.50s/it] {'loss': 0.5426, 'learning_rate': 1.5529309689636826e-05, 'epoch': 0.33}
2220/6640 [2:58:29<20:08:36, 16.41s/it] {'loss': 0.5435, 'learning_rate': 1.55252443423977e-05, 'epoch': 0.33}
2221/6640 [2:58:44<19:49:43, 16.15s/it] {'loss': 0.522, 'learning_rate': 1.5521177680298645e-05, 'epoch': 0.33}
2222/6640 [2:59:01<20:12:49, 16.47s/it] {'loss': 0.5476, 'learning_rate': 1.5517109704307417e-05, 'epoch': 0.33}
2223/6640 [2:59:17<19:46:53, 16.12s/it] {'loss': 0.5319, 'learning_rate': 1.551304041539208e-05, 'epoch': 0.33}
2224/6640 [2:59:33<19:41:57, 16.06s/it] {'loss': 0.5476, 'learning_rate': 1.5508969814521026e-05, 'epoch': 0.33}
2225/6640 [2:59:50<20:04:59, 16.38s/it] {'loss': 0.5508, 'learning_rate': 1.550489790266294e-05, 'epoch': 0.34}
2226/6640 [3:00:06<20:00:30, 16.32s/it] {'loss': 0.5422, 'learning_rate': 1.5500824680786832e-05, 'epoch': 0.34}
2227/6640 [3:00:22<19:58:25, 16.29s/it] {'loss': 0.563, 'learning_rate': 1.549675014986202e-05, 'epoch': 0.34}
2228/6640 [3:00:38<19:45:37, 16.12s/it] {'loss': 0.5359, 'learning_rate': 1.5492674310858127e-05, 'epoch': 0.34}
2229/6640 [3:00:54<19:38:51, 16.04s/it] {'loss': 0.5291, 'learning_rate': 1.5488597164745104e-05, 'epoch': 0.34}
2230/6640 [3:01:10<19:53:47, 16.24s/it] {'loss': 0.5469, 'learning_rate': 1.5484518712493188e-05, 'epoch': 0.34}
2231/6640 [3:01:27<20:03:16, 16.37s/it] {'loss': 0.5411, 'learning_rate': 1.5480438955072954e-05, 'epoch': 0.34}
2232/6640 [3:01:44<20:13:48, 16.52s/it] {'loss': 0.5471, 'learning_rate': 1.5476357893455268e-05, 'epoch': 0.34}
2233/6640 [3:02:00<20:00:44, 16.35s/it] {'loss': 0.5366, 'learning_rate': 1.5472275528611317e-05, 'epoch': 0.34}
2234/6640 [3:02:17<20:14:52, 16.54s/it] {'loss': 0.5467, 'learning_rate': 1.546819186151259e-05, 'epoch': 0.34}
2235/6640 [3:02:34<20:15:58, 16.56s/it] {'loss': 0.5372, 'learning_rate': 1.5464106893130896e-05, 'epoch': 0.34}
2236/6640 [3:02:49<19:56:20, 16.30s/it] {'loss': 0.5281, 'learning_rate': 1.5460020624438346e-05, 'epoch': 0.34}
2237/6640 [3:03:05<19:44:42, 16.14s/it] {'loss': 0.536, 'learning_rate': 1.545593305640736e-05, 'epoch': 0.34}
2238/6640 [3:03:21<19:32:00, 15.97s/it] {'loss': 0.5307, 'learning_rate': 1.5451844190010666e-05, 'epoch': 0.34}
2239/6640 [3:03:37<19:44:18, 16.15s/it] {'loss': 0.5342, 'learning_rate': 1.5447754026221313e-05, 'epoch': 0.34}
2240/6640 [3:03:52<19:26:10, 15.90s/it] {'loss': 0.511, 'learning_rate': 1.5443662566012645e-05, 'epoch': 0.34}
2241/6640 [3:04:09<19:31:09, 15.97s/it] {'loss': 0.5322, 'learning_rate': 1.5439569810358324e-05, 'epoch': 0.34}
2242/6640 [3:04:26<19:54:28, 16.30s/it] {'loss': 0.5518, 'learning_rate': 1.543547576023231e-05, 'epoch': 0.34}
2243/6640 [3:04:41<19:44:39, 16.17s/it] {'loss': 0.5383, 'learning_rate': 1.543138041660888e-05, 'epoch': 0.34}
2244/6640 [3:04:58<19:47:58, 16.21s/it] {'loss': 0.5395, 'learning_rate': 1.542728378046262e-05, 'epoch': 0.34}
2245/6640 [3:05:14<19:52:48, 16.28s/it] {'loss': 0.5497, 'learning_rate': 1.542318585276841e-05, 'epoch': 0.34}
2246/6640 [3:05:30<19:49:46, 16.25s/it] {'loss': 0.5454, 'learning_rate': 1.5419086634501455e-05, 'epoch': 0.34}
2247/6640 [3:05:46<19:43:15, 16.16s/it] {'loss': 0.5699, 'learning_rate': 1.541498612663726e-05, 'epoch': 0.34}
2248/6640 [3:06:02<19:37:06, 16.08s/it] {'loss': 0.5249, 'learning_rate': 1.5410884330151628e-05, 'epoch': 0.34}
2249/6640 [3:06:19<19:48:14, 16.24s/it] {'loss': 0.5468, 'learning_rate': 1.5406781246020683e-05, 'epoch': 0.34}
AutoResumeHook: Checking whether to suspend...
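The per-step metric lines above follow a regular pattern (progress counter, timing, then a dict of loss, learning rate, and epoch), so the loss curve can be recovered from the raw console output. A small sketch of such a parser; the pattern and function name are my own, not part of the training code:

```python
import re

# Matches Hugging Face Trainer-style console lines such as:
# 2162/6640 [2:41:04<20:21:32, 16.37s/it] {'loss': 0.5429, 'learning_rate': 1.57e-05, 'epoch': 0.33}
PATTERN = re.compile(
    r"(?P<step>\d+)/(?P<total>\d+).*?"
    r"\{'loss': (?P<loss>[\d.]+), 'learning_rate': (?P<lr>[\deE.+-]+), 'epoch': (?P<epoch>[\d.]+)\}"
)


def parse_metrics(log_text):
    """Extract (step, loss, learning_rate) tuples from raw trainer output."""
    return [
        (int(m["step"]), float(m["loss"]), float(m["lr"]))
        for m in PATTERN.finditer(log_text)
    ]


sample = "2162/6640 [2:41:04<20:21:32, 16.37s/it] {'loss': 0.5429, 'learning_rate': 1.5758829711842208e-05, 'epoch': 0.33}"
metrics = parse_metrics(sample)
```

Feeding the full log through `parse_metrics` yields one tuple per logged step, suitable for plotting loss against step number.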
2250/6640 [3:06:35<19:54:58, 16.33s/it] {'loss': 0.5627, 'learning_rate': 1.5402676875220847e-05, 'epoch': 0.34}
2251/6640 [3:06:51<19:49:07, 16.26s/it] {'loss': 0.5261, 'learning_rate': 1.539857121872885e-05, 'epoch': 0.34}
2252/6640 [3:07:08<19:53:18, 16.32s/it] {'loss': 0.5402, 'learning_rate': 1.5394464277521727e-05, 'epoch': 0.34}
2253/6640 [3:07:24<19:48:38, 16.26s/it] {'loss': 0.5491, 'learning_rate': 1.539035605257682e-05, 'epoch': 0.34}
2254/6640 [3:07:41<20:04:45, 16.48s/it] {'loss': 0.5423, 'learning_rate': 1.538624654487178e-05, 'epoch': 0.34}
2255/6640 [3:07:58<20:08:43, 16.54s/it] {'loss': 0.5284, 'learning_rate': 1.5382135755384554e-05, 'epoch': 0.34}
2256/6640 [3:08:13<19:50:47, 16.30s/it] {'loss': 0.5645, 'learning_rate': 1.5378023685093408e-05, 'epoch': 0.34}
2257/6640 [3:08:29<19:44:09, 16.21s/it] {'loss': 0.5343, 'learning_rate': 1.537391033497689e-05, 'epoch': 0.34}
2258/6640 [3:08:45<19:35:47, 16.10s/it] {'loss': 0.5427, 'learning_rate': 1.536979570601388e-05, 'epoch': 0.34}
2259/6640 [3:09:02<19:37:25, 16.13s/it] {'loss': 0.5344, 'learning_rate': 1.5365679799183548e-05, 'epoch': 0.34}
2260/6640 [3:09:18<19:40:26, 16.17s/it] {'loss': 0.5206, 'learning_rate': 1.5361562615465366e-05, 'epoch': 0.34}
2261/6640 [3:09:34<19:40:54, 16.18s/it] {'loss': 0.5379, 'learning_rate': 1.535744415583911e-05, 'epoch': 0.34}
2262/6640 [3:09:50<19:46:23, 16.26s/it] {'loss': 0.5361, 'learning_rate': 1.535332442128487e-05, 'epoch': 0.34}
2263/6640 [3:10:07<19:43:47, 16.23s/it] {'loss': 0.5266, 'learning_rate': 1.5349203412783028e-05, 'epoch': 0.34}
2264/6640 [3:10:24<20:06:30, 16.54s/it] {'loss': 0.5077, 'learning_rate': 1.5345081131314276e-05, 'epoch': 0.34}
2265/6640 [3:10:40<19:55:02, 16.39s/it] {'loss': 0.5516, 'learning_rate': 1.5340957577859605e-05, 'epoch': 0.34}
2266/6640 [3:10:57<20:00:08, 16.46s/it] {'loss': 0.5218, 'learning_rate': 1.533683275340031e-05, 'epoch': 0.34}
2267/6640 [3:11:13<19:54:31, 16.39s/it] {'loss': 0.5419, 'learning_rate': 1.5332706658917985e-05, 'epoch': 0.34}
2268/6640 [3:11:29<19:54:25, 16.39s/it] {'loss': 0.5304, 'learning_rate': 1.5328579295394534e-05, 'epoch': 0.34}
2269/6640 [3:11:46<19:57:00, 16.43s/it] {'loss': 0.5259, 'learning_rate': 1.5324450663812164e-05, 'epoch': 0.34}
2270/6640 [3:12:02<19:44:35, 16.26s/it] {'loss': 0.5483, 'learning_rate': 1.5320320765153367e-05, 'epoch': 0.34}
2271/6640 [3:12:18<19:52:23, 16.38s/it] {'loss': 0.5315, 'learning_rate': 1.5316189600400955e-05, 'epoch': 0.34}
2272/6640 [3:12:34<19:47:20, 16.31s/it] {'loss': 0.5193, 'learning_rate': 1.5312057170538033e-05, 'epoch': 0.34}
2273/6640 [3:12:51<19:53:30, 16.40s/it] {'loss': 0.5508, 'learning_rate': 1.530792347654801e-05, 'epoch': 0.34}
2274/6640 [3:13:07<19:47:10, 16.31s/it] {'loss': 0.536, 'learning_rate': 1.5303788519414594e-05, 'epoch': 0.34}
2275/6640 [3:13:23<19:37:21, 16.18s/it] {'loss': 0.539, 'learning_rate': 1.5299652300121792e-05, 'epoch': 0.34}
2276/6640 [3:13:39<19:31:23, 16.11s/it] {'loss': 0.5525, 'learning_rate': 1.5295514819653913e-05, 'epoch': 0.34}
2277/6640 [3:13:55<19:33:08, 16.13s/it] {'loss': 0.5539, 'learning_rate': 1.529137607899557e-05, 'epoch': 0.34}
2278/6640 [3:14:10<19:15:03, 15.89s/it] {'loss': 0.5577, 'learning_rate': 1.5287236079131668e-05, 'epoch': 0.34}
2279/6640 [3:14:26<19:18:58, 15.95s/it] {'loss': 0.538, 'learning_rate': 1.5283094821047416e-05, 'epoch': 0.34}
2280/6640 [3:14:42<19:13:53, 15.88s/it] {'loss': 0.5079, 'learning_rate': 1.5278952305728325e-05, 'epoch': 0.34}
2281/6640 [3:14:59<19:25:48, 16.05s/it] {'loss': 0.5534, 'learning_rate': 1.5274808534160203e-05, 'epoch': 0.34}
2282/6640 [3:15:15<19:28:09, 16.08s/it] {'loss': 0.5286, 'learning_rate': 1.5270663507329152e-05, 'epoch': 0.34}
2283/6640 [3:15:32<19:41:38, 16.27s/it] {'loss': 0.5218, 'learning_rate': 1.5266517226221585e-05, 'epoch': 0.34}
2284/6640 [3:15:48<19:40:15, 16.26s/it] {'loss': 0.534, 'learning_rate': 1.52623696918242e-05, 'epoch': 0.34}
2285/6640 [3:16:05<19:57:26, 16.50s/it] {'loss': 0.5358, 'learning_rate': 1.5258220905123997e-05, 'epoch': 0.34}
2286/6640 [3:16:21<19:55:50, 16.48s/it] {'loss': 0.525, 'learning_rate': 1.5254070867108277e-05, 'epoch': 0.34}
2287/6640 [3:16:37<19:38:13, 16.24s/it] {'loss': 0.5391, 'learning_rate': 1.5249919578764641e-05, 'epoch': 0.34}
2288/6640 [3:16:53<19:40:26, 16.27s/it] {'loss': 0.5202, 'learning_rate': 1.5245767041080983e-05, 'epoch': 0.34}
2289/6640 [3:17:09<19:25:17, 16.07s/it] {'loss': 0.5266, 'learning_rate': 1.5241613255045495e-05, 'epoch': 0.34}
2290/6640 [3:17:26<19:41:18, 16.29s/it] {'loss': 0.5435, 'learning_rate': 1.5237458221646668e-05, 'epoch': 0.34}
WARNING: tokenization mismatch: 1 vs. 1440. [[{'from': 'human', 'value': 'ASSISTANT: You are a recruitment marketing expert with the following knowledge and traits.\n\nKNOWLEDGE: recruitment marketing, social media marketing, Facebook marketing, campaign development, target group analysis, content strategy, ad creatives, performance measurement, talent acquisition, tax clerks (Steuerfachangestellte).\n\nTRAITS: Creative, strategic, analytical, communicative, goal-oriented, detail-oriented, team-oriented.\n\nTASK: Develop a successful recruitment marketing strategy for attracting tax clerks via Facebook. Create a table with all measures and assign each measure to the responsible specialist in a marketing agency. Cover target group analysis, content strategy, ads, and performance measurement.\n\nOUTPUT: A table with all measures for a successful recruitment marketing campaign for attracting tax clerks on Facebook, showing the responsibilities of the specialists in a marketing agency, including target group analysis, content strategy, ads, and performance measurement.'}, {'from': 'gpt', 'value': '| Measure | Description | Responsible specialist |\n| --- | --- | --- |\n| Target group analysis | Analysis of the demographics, interests, and behavior of potential applicants. | Market analyst |\n| Persona creation | Development of fictional, representative applicant profiles for better audience targeting. | Recruitment marketing expert |\n| Content strategy | Planning of relevant, engaging, audience-specific content. | Content strategist |\n| Editorial calendar | Creation of a schedule for publishing content on Facebook. | Social media manager |\n| Ad creation | Design and copywriting of engaging, audience-specific Facebook ads. | Art director & copywriter |\n| Targeting selection | Definition of the audience parameters for the Facebook ad campaigns. | Facebook marketing expert |\n| Budgeting and scheduling | Setting the budget and runtime of the Facebook ad campaigns. | Media planner |\n| Landing page creation | Design and implementation of dedicated landing pages for applications. | Web designer |\n| Performance measurement | Monitoring and analysis of campaign performance, including KPIs such as clicks and applications. | Data analyst |\n| A/B testing | Running A/B tests to optimize the effectiveness of ads and landing pages. | Conversion optimization expert |\n| Community management | Interacting with the target group on Facebook, answering questions, and moderating discussions. | Community manager |\n| Reporting and optimization | Regular reporting on campaign results and adjustment of the strategy. | Performance marketing manager |'}, {'from': 'human', 'value': 'Continue.'}, {'from': 'gpt', 'value': 'Here are some additional measures you can include in your recruitment marketing strategy for attracting tax clerks via Facebook:\n\n| Measure | Description | Responsible specialist |\n| --- | --- | --- |\n| Video content | Production of video content such as interviews with current employees or day-in-the-life videos. | Video producer |\n| Influencer marketing | Collaboration with influencers from the industry to strengthen your employer brand. | Influencer marketing expert |\n| Job ad adaptation | Tailoring the job ads on Facebook to better address the target group. | Recruitment marketing expert |\n| Employee testimonials | Sharing employee testimonials and quotes on Facebook to create authenticity. | Content strategist |\n| Virtual career events | Hosting virtual career events or information sessions on Facebook. | Event manager |\n| Interactive polls and quizzes | Using interactive polls and quizzes on Facebook to increase engagement. | Social media manager |\n| Remarketing campaigns | Using remarketing ads to re-engage people who have already shown interest. | Facebook marketing expert |\n| Competitor monitoring | Monitoring of competitor recruitment marketing activities on Facebook and deriving optimization potential. | Market analyst |\n| Chatbot integration | Implementing chatbots on the Facebook page and the landing page to automate the application process. | Chatbot developer |\n| Employer branding campaign | Development of an employer branding campaign that highlights the company culture and values. | Employer branding expert |\n\nWith these additional measures you can build an even more comprehensive recruitment marketing strategy that increases the chances of attracting qualified tax clerks via Facebook.'}]] (ignored)
2291/6640 [3:17:42<19:37:18, 16.24s/it] {'loss': 0.5122, 'learning_rate': 1.5233301941873286e-05, 'epoch': 0.35}
2292/6640 [3:17:57<19:24:09, 16.06s/it] {'loss': 0.5323, 'learning_rate': 1.5229144416714435e-05, 'epoch': 0.35}
2293/6640 [3:18:13<19:12:38, 15.91s/it] {'loss': 0.5278, 'learning_rate': 1.5224985647159489e-05, 'epoch': 0.35}
2294/6640 [3:18:29<19:18:00, 15.99s/it] {'loss': 0.5292, 'learning_rate': 1.5220825634198133e-05, 'epoch': 0.35}
2295/6640 [3:18:45<19:08:48, 15.86s/it] {'loss': 0.5238, 'learning_rate': 1.5216664378820327e-05, 'epoch': 0.35}
2296/6640 [3:19:01<19:08:15, 15.86s/it] {'loss': 0.5307, 'learning_rate': 1.5212501882016346e-05, 'epoch': 0.35}
2297/6640 [3:19:16<19:05:12, 15.82s/it] {'loss': 0.5476, 'learning_rate': 1.5208338144776754e-05, 'epoch': 0.35}
2298/6640 [3:19:32<19:05:11, 15.82s/it] {'loss': 0.5265, 'learning_rate': 1.5204173168092405e-05, 'epoch': 0.35}
2299/6640 [3:19:48<19:12:17, 15.93s/it] {'loss': 0.5432, 'learning_rate': 1.520000695295445e-05, 'epoch': 0.35}
AutoResumeHook: Checking whether
to suspend...
2300/6640 [3:20:05<19:37:23, 16.28s/it] {'loss': 0.5604, 'learning_rate': 1.5195839500354337e-05, 'epoch': 0.35}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2300/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2300/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2300/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
2301/6640 [3:21:50<51:32:22, 42.76s/it] {'loss': 0.528, 'learning_rate': 1.5191670811283813e-05, 'epoch': 0.35}
2302/6640 [3:22:06<41:57:31, 34.82s/it] {'loss': 0.5343, 'learning_rate': 1.5187500886734908e-05, 'epoch': 0.35}
2303/6640 [3:22:23<35:15:38, 29.27s/it] {'loss': 0.5427, 'learning_rate': 1.5183329727699957e-05, 'epoch': 0.35}
WARNING: tokenization mismatch: 1 vs. 64. [[{'from': 'human', 'value': '\nWhat vitamin is this vegetable associated with?\nAnswer the question using a single word or phrase.'}, {'from': 'gpt', 'value': ''}]] (ignored)
2304/6640 [3:22:39<30:45:04, 25.53s/it] {'loss': 0.5366, 'learning_rate': 1.5179157335171579e-05, 'epoch': 0.35}
2305/6640 [3:22:55<27:20:15, 22.70s/it] {'loss': 0.5439, 'learning_rate': 1.5174983710142694e-05, 'epoch': 0.35}
2306/6640 [3:23:12<25:12:30, 20.94s/it] {'loss': 0.5105, 'learning_rate': 1.5170808853606515e-05, 'epoch': 0.35}
2307/6640 [3:23:29<23:30:58, 19.54s/it] {'loss': 0.5296, 'learning_rate': 1.5166632766556546e-05, 'epoch': 0.35}
2308/6640 [3:23:45<22:17:23, 18.52s/it] {'loss': 0.528, 'learning_rate': 1.516245544998658e-05, 'epoch': 0.35}
2309/6640 [3:24:01<21:35:48, 17.95s/it] {'loss': 0.5441, 'learning_rate': 1.5158276904890708e-05, 'epoch': 0.35}
2310/6640 [3:24:18<20:59:40, 17.45s/it] {'loss': 0.5315, 'learning_rate': 1.515409713226331e-05, 'epoch': 0.35}
2311/6640 [3:24:34<20:35:37, 17.13s/it] {'loss': 0.5542, 'learning_rate': 1.5149916133099063e-05, 'epoch': 0.35}
2312/6640 [3:24:50<20:12:20, 16.81s/it] {'loss': 0.5063, 'learning_rate': 1.514573390839293e-05, 'epoch': 0.35}
2313/6640 [3:25:06<19:50:04, 16.50s/it] {'loss': 0.5433, 'learning_rate': 1.5141550459140168e-05, 'epoch': 0.35}
2314/6640 [3:25:23<20:08:17, 16.76s/it] {'loss': 0.5573, 'learning_rate': 1.5137365786336329e-05, 'epoch': 0.35}
2315/6640 [3:25:39<19:46:44, 16.46s/it] {'loss': 0.5393, 'learning_rate': 1.513317989097725e-05, 'epoch': 0.35}
2316/6640 [3:25:56<19:49:33, 16.51s/it] {'loss': 0.5467, 'learning_rate': 1.5128992774059063e-05, 'epoch': 0.35}
2317/6640 [3:26:13<19:59:10, 16.64s/it] {'loss': 0.5295, 'learning_rate': 1.5124804436578191e-05, 'epoch': 0.35}
2318/6640 [3:26:29<19:49:23, 16.51s/it] {'loss': 0.5352, 'learning_rate': 1.512061487953134e-05, 'epoch': 0.35}
2319/6640 [3:26:45<19:36:25, 16.34s/it] {'loss': 0.5673, 'learning_rate': 1.5116424103915519e-05, 'epoch': 0.35}
2320/6640 [3:27:01<19:39:48, 16.39s/it] {'loss': 0.5455, 'learning_rate': 1.5112232110728016e-05, 'epoch': 0.35}
2321/6640 [3:27:18<19:40:38, 16.40s/it] {'loss': 0.521, 'learning_rate': 1.5108038900966416e-05, 'epoch': 0.35}
2322/6640 [3:27:34<19:36:24, 16.35s/it] {'loss': 0.5408, 'learning_rate': 1.5103844475628585e-05, 'epoch': 0.35}
2323/6640 [3:27:50<19:22:22, 16.16s/it] {'loss': 0.5307, 'learning_rate': 1.509964883571269e-05, 'epoch': 0.35}
2324/6640 [3:28:06<19:22:09, 16.16s/it] {'loss': 0.5519, 'learning_rate': 1.5095451982217177e-05, 'epoch': 0.35}
2325/6640 [3:28:22<19:19:27, 16.12s/it] {'loss': 0.5373, 'learning_rate': 1.5091253916140789e-05, 'epoch': 0.35}
2326/6640 [3:28:38<19:29:03, 16.26s/it] {'loss': 0.5393, 'learning_rate': 1.508705463848255e-05, 'epoch': 0.35}
2327/6640 [3:28:55<19:37:08, 16.38s/it] {'loss': 0.5403, 'learning_rate': 1.5082854150241773e-05, 'epoch': 0.35}
2328/6640 [3:29:12<19:42:15, 16.45s/it] {'loss': 0.5219, 'learning_rate': 1.5078652452418063e-05, 'epoch': 0.35}
2329/6640 [3:29:28<19:49:13, 16.55s/it] {'loss': 0.5289, 'learning_rate': 1.5074449546011312e-05, 'epoch': 0.35}
2330/6640 [3:29:44<19:38:19, 16.40s/it] {'loss': 0.5408, 'learning_rate': 1.5070245432021699e-05, 'epoch': 0.35}
2331/6640 [3:30:01<19:32:00, 16.32s/it] {'loss': 0.5243, 'learning_rate': 1.5066040111449692e-05, 'epoch': 0.35}
2332/6640 [3:30:17<19:43:47, 16.49s/it] {'loss': 0.5491, 'learning_rate': 1.5061833585296044e-05, 'epoch': 0.35}
2333/6640 [3:30:34<19:34:07, 16.36s/it] {'loss': 0.5397, 'learning_rate': 1.505762585456179e-05, 'epoch': 0.35}
2334/6640 [3:30:50<19:34:37, 16.37s/it] {'loss': 0.5267, 'learning_rate': 1.5053416920248267e-05, 'epoch': 0.35}
2335/6640 [3:31:06<19:25:15, 16.24s/it] {'loss': 0.5248, 'learning_rate': 1.5049206783357082e-05, 'epoch': 0.35}
2336/6640 [3:31:22<19:14:06, 16.09s/it] {'loss': 0.5371, 'learning_rate': 1.504499544489013e-05, 'epoch': 0.35}
2337/6640 [3:31:38<19:22:50, 16.21s/it] {'loss': 0.5387, 'learning_rate': 1.504078290584961e-05, 'epoch': 0.35}
2338/6640 [3:31:55<19:29:33, 16.31s/it] {'loss': 0.5438, 'learning_rate': 1.5036569167237978e-05, 'epoch': 0.35}
2339/6640 [3:32:13<20:06:15, 16.83s/it] {'loss': 0.5435, 'learning_rate': 1.5032354230058004e-05, 'epoch': 0.35}
2340/6640 [3:32:29<19:56:39, 16.70s/it] {'loss': 0.5315, 'learning_rate': 1.502813809531272e-05, 'epoch': 0.35}
2341/6640 [3:32:45<19:43:52, 16.52s/it] {'loss': 0.5247, 'learning_rate': 1.502392076400546e-05, 'epoch': 0.35}
2342/6640 [3:33:02<19:50:46, 16.62s/it] {'loss': 0.538, 'learning_rate': 1.501970223713983e-05, 'epoch': 0.35}
2343/6640 [3:33:18<19:46:06, 16.56s/it] {'loss': 0.5346, 'learning_rate': 1.501548251571973e-05, 'epoch': 0.35}
2344/6640 [3:33:35<19:38:37, 16.46s/it] {'loss': 0.5487, 'learning_rate': 1.501126160074934e-05, 'epoch': 0.35}
2345/6640 [3:33:51<19:28:00, 16.32s/it] {'loss': 0.5391, 'learning_rate': 1.5007039493233123e-05, 'epoch': 0.35}
2346/6640 [3:34:07<19:27:03, 16.31s/it] {'loss': 0.5287, 'learning_rate': 1.5002816194175829e-05, 'epoch': 0.35}
2347/6640 [3:34:23<19:21:40, 16.24s/it] {'loss': 0.5251, 'learning_rate': 1.4998591704582488e-05, 'epoch': 0.35}
2348/6640 [3:34:39<19:11:00, 16.09s/it] {'loss': 0.5448, 'learning_rate': 1.4994366025458421e-05, 'epoch':
0.35} 35%|███▌ | 2348/6640 [3:34:39<19:11:00, 16.09s/it]
35%|███▌ | 2349/6640 [3:34:55<19:22:37, 16.26s/it] {'loss': 0.5411, 'learning_rate': 1.4990139157809217e-05, 'epoch': 0.35}
AutoResumeHook: Checking whether to suspend... (x8, one per rank)
35%|███▌ | 2350/6640 [3:35:12<19:38:25, 16.48s/it] {'loss': 0.5428, 'learning_rate': 1.4985911102640762e-05, 'epoch': 0.35}
35%|███▌ | 2351/6640 [3:35:29<19:35:18, 16.44s/it] {'loss': 0.5339, 'learning_rate': 1.4981681860959222e-05, 'epoch': 0.35}
35%|███▌ | 2352/6640 [3:35:45<19:36:43, 16.47s/it] {'loss': 0.5229, 'learning_rate': 1.4977451433771037e-05, 'epoch': 0.35}
35%|███▌ | 2353/6640 [3:36:02<19:42:48, 16.55s/it] {'loss': 0.5415, 'learning_rate': 1.4973219822082942e-05, 'epoch': 0.35}
35%|███▌ | 2354/6640 [3:36:19<19:46:37, 16.61s/it] {'loss': 0.5282, 'learning_rate': 1.4968987026901942e-05, 'epoch': 0.35}
35%|███▌ | 2355/6640 [3:36:35<19:27:59, 16.35s/it] {'loss': 0.5274, 'learning_rate': 1.4964753049235333e-05, 'epoch': 0.35}
35%|███▌ | 2356/6640 [3:36:50<19:18:04, 16.22s/it] {'loss': 0.5309, 'learning_rate': 1.4960517890090683e-05, 'epoch': 0.35}
35%|███▌ | 2357/6640 [3:37:06<19:10:33, 16.12s/it] {'loss': 0.5267, 'learning_rate': 1.4956281550475851e-05, 'epoch': 0.35}
35%|███▌ | 2357/6640
[3:37:06<19:10:33, 16.12s/it] 36%|███▌ | 2358/6640 [3:37:23<19:21:40, 16.28s/it] {'loss': 0.5381, 'learning_rate': 1.4952044031398966e-05, 'epoch': 0.36} 36%|███▌ | 2358/6640 [3:37:23<19:21:40, 16.28s/it] 36%|███▌ | 2359/6640 [3:37:39<19:14:53, 16.19s/it] {'loss': 0.5113, 'learning_rate': 1.4947805333868453e-05, 'epoch': 0.36} 36%|███▌ | 2359/6640 [3:37:39<19:14:53, 16.19s/it] 36%|███▌ | 2360/6640 [3:37:56<19:26:16, 16.35s/it] {'loss': 0.5265, 'learning_rate': 1.4943565458892999e-05, 'epoch': 0.36} 36%|███▌ | 2360/6640 [3:37:56<19:26:16, 16.35s/it] 36%|███▌ | 2361/6640 [3:38:12<19:26:14, 16.35s/it] {'loss': 0.54, 'learning_rate': 1.4939324407481588e-05, 'epoch': 0.36} 36%|███▌ | 2361/6640 [3:38:12<19:26:14, 16.35s/it] 36%|███▌ | 2362/6640 [3:38:29<19:34:41, 16.48s/it] {'loss': 0.5375, 'learning_rate': 1.493508218064347e-05, 'epoch': 0.36} 36%|███▌ | 2362/6640 [3:38:29<19:34:41, 16.48s/it] 36%|███▌ | 2363/6640 [3:38:45<19:26:28, 16.36s/it] {'loss': 0.5437, 'learning_rate': 1.4930838779388186e-05, 'epoch': 0.36} 36%|███▌ | 2363/6640 [3:38:45<19:26:28, 16.36s/it] 36%|███▌ | 2364/6640 [3:39:02<19:37:03, 16.52s/it] {'loss': 0.548, 'learning_rate': 1.4926594204725552e-05, 'epoch': 0.36} 36%|███▌ | 2364/6640 [3:39:02<19:37:03, 16.52s/it] 36%|███▌ | 2365/6640 [3:39:18<19:32:04, 16.45s/it] {'loss': 0.5367, 'learning_rate': 1.4922348457665656e-05, 'epoch': 0.36} 36%|███▌ | 2365/6640 [3:39:18<19:32:04, 16.45s/it] 36%|███▌ | 2366/6640 [3:39:34<19:29:38, 16.42s/it] {'loss': 0.5644, 'learning_rate': 1.491810153921888e-05, 'epoch': 0.36} 36%|███▌ | 2366/6640 [3:39:34<19:29:38, 16.42s/it] 36%|███▌ | 2367/6640 [3:39:51<19:34:44, 16.50s/it] {'loss': 0.5173, 'learning_rate': 1.4913853450395874e-05, 'epoch': 0.36} 36%|███▌ | 2367/6640 [3:39:51<19:34:44, 16.50s/it] 36%|███▌ | 2368/6640 [3:40:07<19:27:06, 16.39s/it] {'loss': 0.5416, 'learning_rate': 1.4909604192207569e-05, 'epoch': 0.36} 36%|███▌ | 2368/6640 [3:40:07<19:27:06, 16.39s/it] 36%|███▌ | 2369/6640 [3:40:24<19:30:51, 
16.45s/it] {'loss': 0.5395, 'learning_rate': 1.4905353765665171e-05, 'epoch': 0.36} 36%|███▌ | 2369/6640 [3:40:24<19:30:51, 16.45s/it] 36%|███▌ | 2370/6640 [3:40:40<19:20:38, 16.31s/it] {'loss': 0.5502, 'learning_rate': 1.4901102171780175e-05, 'epoch': 0.36} 36%|███▌ | 2370/6640 [3:40:40<19:20:38, 16.31s/it] 36%|███▌ | 2371/6640 [3:40:56<19:15:04, 16.23s/it] {'loss': 0.5273, 'learning_rate': 1.4896849411564337e-05, 'epoch': 0.36} 36%|███▌ | 2371/6640 [3:40:56<19:15:04, 16.23s/it] 36%|███▌ | 2372/6640 [3:41:12<19:09:35, 16.16s/it] {'loss': 0.5266, 'learning_rate': 1.4892595486029709e-05, 'epoch': 0.36} 36%|███▌ | 2372/6640 [3:41:12<19:09:35, 16.16s/it] 36%|███▌ | 2373/6640 [3:41:29<19:23:16, 16.36s/it] {'loss': 0.5609, 'learning_rate': 1.4888340396188606e-05, 'epoch': 0.36} 36%|███▌ | 2373/6640 [3:41:29<19:23:16, 16.36s/it] 36%|███▌ | 2374/6640 [3:41:45<19:15:43, 16.25s/it] {'loss': 0.5403, 'learning_rate': 1.4884084143053622e-05, 'epoch': 0.36} 36%|███▌ | 2374/6640 [3:41:45<19:15:43, 16.25s/it] 36%|███▌ | 2375/6640 [3:42:01<19:25:23, 16.39s/it] {'loss': 0.5243, 'learning_rate': 1.487982672763764e-05, 'epoch': 0.36} 36%|███▌ | 2375/6640 [3:42:01<19:25:23, 16.39s/it] 36%|███▌ | 2376/6640 [3:42:17<19:07:28, 16.15s/it] {'loss': 0.5337, 'learning_rate': 1.4875568150953805e-05, 'epoch': 0.36} 36%|███▌ | 2376/6640 [3:42:17<19:07:28, 16.15s/it] 36%|███▌ | 2377/6640 [3:42:33<18:54:14, 15.96s/it] {'loss': 0.5169, 'learning_rate': 1.4871308414015547e-05, 'epoch': 0.36} 36%|███▌ | 2377/6640 [3:42:33<18:54:14, 15.96s/it] 36%|███▌ | 2378/6640 [3:42:49<19:11:02, 16.20s/it] {'loss': 0.5151, 'learning_rate': 1.486704751783656e-05, 'epoch': 0.36} 36%|███▌ | 2378/6640 [3:42:49<19:11:02, 16.20s/it] 36%|███▌ | 2379/6640 [3:43:05<19:09:32, 16.19s/it] {'loss': 0.5603, 'learning_rate': 1.4862785463430836e-05, 'epoch': 0.36} 36%|███▌ | 2379/6640 [3:43:05<19:09:32, 16.19s/it] 36%|███▌ | 2380/6640 [3:43:23<19:30:19, 16.48s/it] {'loss': 0.5287, 'learning_rate': 1.4858522251812621e-05, 
'epoch': 0.36} 36%|███▌ | 2380/6640 [3:43:23<19:30:19, 16.48s/it] 36%|███▌ | 2381/6640 [3:43:39<19:18:02, 16.31s/it] {'loss': 0.531, 'learning_rate': 1.4854257883996449e-05, 'epoch': 0.36} 36%|███▌ | 2381/6640 [3:43:39<19:18:02, 16.31s/it] 36%|███▌ | 2382/6640 [3:43:55<19:14:07, 16.26s/it] {'loss': 0.555, 'learning_rate': 1.4849992360997126e-05, 'epoch': 0.36} 36%|███▌ | 2382/6640 [3:43:55<19:14:07, 16.26s/it] 36%|███▌ | 2383/6640 [3:44:11<19:05:19, 16.14s/it] {'loss': 0.5381, 'learning_rate': 1.4845725683829723e-05, 'epoch': 0.36} 36%|███▌ | 2383/6640 [3:44:11<19:05:19, 16.14s/it] 36%|███▌ | 2384/6640 [3:44:27<19:20:12, 16.36s/it] {'loss': 0.5568, 'learning_rate': 1.4841457853509606e-05, 'epoch': 0.36} 36%|███▌ | 2384/6640 [3:44:27<19:20:12, 16.36s/it] 36%|███▌ | 2385/6640 [3:44:45<19:40:13, 16.64s/it] {'loss': 0.5519, 'learning_rate': 1.4837188871052399e-05, 'epoch': 0.36} 36%|███▌ | 2385/6640 [3:44:45<19:40:13, 16.64s/it] 36%|███▌ | 2386/6640 [3:45:01<19:28:05, 16.48s/it] {'loss': 0.5527, 'learning_rate': 1.4832918737474007e-05, 'epoch': 0.36} 36%|███▌ | 2386/6640 [3:45:01<19:28:05, 16.48s/it] 36%|███▌ | 2387/6640 [3:45:18<19:33:47, 16.56s/it] {'loss': 0.563, 'learning_rate': 1.4828647453790606e-05, 'epoch': 0.36} 36%|███▌ | 2387/6640 [3:45:18<19:33:47, 16.56s/it] 36%|███▌ | 2388/6640 [3:45:33<19:19:23, 16.36s/it] {'loss': 0.5405, 'learning_rate': 1.4824375021018645e-05, 'epoch': 0.36} 36%|███▌ | 2388/6640 [3:45:33<19:19:23, 16.36s/it] 36%|███▌ | 2389/6640 [3:45:50<19:19:36, 16.37s/it] {'loss': 0.5199, 'learning_rate': 1.4820101440174852e-05, 'epoch': 0.36} 36%|███▌ | 2389/6640 [3:45:50<19:19:36, 16.37s/it] 36%|███▌ | 2390/6640 [3:46:06<19:22:58, 16.42s/it] {'loss': 0.5588, 'learning_rate': 1.481582671227622e-05, 'epoch': 0.36} 36%|███▌ | 2390/6640 [3:46:06<19:22:58, 16.42s/it] 36%|███▌ | 2391/6640 [3:46:22<19:15:30, 16.32s/it] {'loss': 0.549, 'learning_rate': 1.4811550838340028e-05, 'epoch': 0.36} 36%|███▌ | 2391/6640 [3:46:22<19:15:30, 16.32s/it] 36%|███▌ | 
2392/6640 [3:46:39<19:19:16, 16.37s/it] {'loss': 0.5431, 'learning_rate': 1.4807273819383809e-05, 'epoch': 0.36}
36%|███▌ | 2393/6640 [3:46:55<19:01:57, 16.13s/it] {'loss': 0.5177, 'learning_rate': 1.4802995656425387e-05, 'epoch': 0.36}
36%|███▌ | 2394/6640 [3:47:11<19:04:28, 16.17s/it] {'loss': 0.5193, 'learning_rate': 1.4798716350482845e-05, 'epoch': 0.36}
36%|███▌ | 2395/6640 [3:47:27<18:58:37, 16.09s/it] {'loss': 0.5584, 'learning_rate': 1.4794435902574543e-05, 'epoch': 0.36}
36%|███▌ | 2396/6640 [3:47:43<19:01:15, 16.13s/it] {'loss': 0.5374, 'learning_rate': 1.4790154313719117e-05, 'epoch': 0.36}
36%|███▌ | 2397/6640 [3:47:59<19:10:32, 16.27s/it] {'loss': 0.5599, 'learning_rate': 1.4785871584935469e-05, 'epoch': 0.36}
36%|███▌ | 2398/6640 [3:48:16<19:16:36, 16.36s/it] {'loss': 0.5215, 'learning_rate': 1.4781587717242772e-05, 'epoch': 0.36}
36%|███▌ | 2399/6640 [3:48:32<19:13:56, 16.33s/it] {'loss': 0.5278, 'learning_rate': 1.4777302711660469e-05, 'epoch': 0.36}
AutoResumeHook: Checking whether to suspend... (x8, one per rank)
36%|███▌ | 2400/6640 [3:48:49<19:22:33, 16.45s/it]
{'loss': 0.5452, 'learning_rate': 1.4773016569208283e-05, 'epoch': 0.36}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-2400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
36%|███▌ | 2401/6640 [3:50:36<51:28:22, 43.71s/it] {'loss': 0.5432, 'learning_rate': 1.4768729290906194e-05, 'epoch': 0.36}
36%|███▌ | 2402/6640 [3:50:52<41:35:52, 35.34s/it] {'loss': 0.5374, 'learning_rate': 1.4764440877774465e-05, 'epoch': 0.36}
36%|███▌ | 2403/6640 [3:51:08<34:32:37, 29.35s/it] {'loss': 0.5209, 'learning_rate': 1.476015133083362e-05, 'epoch': 0.36}
36%|███▌ | 2404/6640 [3:51:24<29:50:53, 25.37s/it] {'loss': 0.5227, 'learning_rate': 1.4755860651104455e-05, 'epoch': 0.36}
36%|███▌ | 2405/6640 [3:51:40<26:37:25, 22.63s/it] {'loss': 0.5245, 'learning_rate': 1.4751568839608036e-05, 'epoch': 0.36}
36%|███▌ | 2406/6640 [3:51:56<24:22:34, 20.73s/it] {'loss': 0.535, 'learning_rate': 1.4747275897365707e-05, 'epoch': 0.36}
36%|███▋ | 2407/6640 [3:52:13<23:01:45,
19.59s/it] {'loss': 0.5431, 'learning_rate': 1.4742981825399067e-05, 'epoch': 0.36} 36%|███▋ | 2407/6640 [3:52:13<23:01:45, 19.59s/it] 36%|███▋ | 2408/6640 [3:52:29<21:53:56, 18.63s/it] {'loss': 0.5433, 'learning_rate': 1.4738686624729987e-05, 'epoch': 0.36} 36%|███▋ | 2408/6640 [3:52:29<21:53:56, 18.63s/it] 36%|███▋ | 2409/6640 [3:52:46<20:59:28, 17.86s/it] {'loss': 0.5243, 'learning_rate': 1.4734390296380618e-05, 'epoch': 0.36} 36%|███▋ | 2409/6640 [3:52:46<20:59:28, 17.86s/it] 36%|███▋ | 2410/6640 [3:53:02<20:19:52, 17.30s/it] {'loss': 0.5298, 'learning_rate': 1.4730092841373362e-05, 'epoch': 0.36} 36%|███▋ | 2410/6640 [3:53:02<20:19:52, 17.30s/it] 36%|███▋ | 2411/6640 [3:53:18<20:02:29, 17.06s/it] {'loss': 0.5291, 'learning_rate': 1.4725794260730903e-05, 'epoch': 0.36} 36%|███▋ | 2411/6640 [3:53:18<20:02:29, 17.06s/it] 36%|███▋ | 2412/6640 [3:53:35<19:57:37, 17.00s/it] {'loss': 0.5439, 'learning_rate': 1.4721494555476189e-05, 'epoch': 0.36} 36%|███▋ | 2412/6640 [3:53:35<19:57:37, 17.00s/it] 36%|███▋ | 2413/6640 [3:53:52<19:52:19, 16.92s/it] {'loss': 0.5514, 'learning_rate': 1.4717193726632428e-05, 'epoch': 0.36} 36%|███▋ | 2413/6640 [3:53:52<19:52:19, 16.92s/it] 36%|███▋ | 2414/6640 [3:54:08<19:36:44, 16.71s/it] {'loss': 0.5379, 'learning_rate': 1.4712891775223108e-05, 'epoch': 0.36} 36%|███▋ | 2414/6640 [3:54:08<19:36:44, 16.71s/it] 36%|███▋ | 2415/6640 [3:54:25<19:41:28, 16.78s/it] {'loss': 0.5423, 'learning_rate': 1.4708588702271978e-05, 'epoch': 0.36} 36%|███▋ | 2415/6640 [3:54:25<19:41:28, 16.78s/it] 36%|███▋ | 2416/6640 [3:54:43<20:01:24, 17.07s/it] {'loss': 0.5293, 'learning_rate': 1.470428450880305e-05, 'epoch': 0.36} 36%|███▋ | 2416/6640 [3:54:43<20:01:24, 17.07s/it] 36%|███▋ | 2417/6640 [3:54:59<19:56:57, 17.01s/it] {'loss': 0.5388, 'learning_rate': 1.469997919584061e-05, 'epoch': 0.36} 36%|███▋ | 2417/6640 [3:54:59<19:56:57, 17.01s/it] 36%|███▋ | 2418/6640 [3:55:17<20:07:36, 17.16s/it] {'loss': 0.5442, 'learning_rate': 1.4695672764409205e-05, 
'epoch': 0.36} 36%|███▋ | 2418/6640 [3:55:17<20:07:36, 17.16s/it]
36%|███▋ | 2419/6640 [3:55:33<19:41:45, 16.80s/it] {'loss': 0.5227, 'learning_rate': 1.4691365215533653e-05, 'epoch': 0.36}
36%|███▋ | 2420/6640 [3:55:50<19:41:19, 16.80s/it] {'loss': 0.5473, 'learning_rate': 1.468705655023903e-05, 'epoch': 0.36}
36%|███▋ | 2421/6640 [3:56:06<19:28:41, 16.62s/it] {'loss': 0.5487, 'learning_rate': 1.4682746769550686e-05, 'epoch': 0.36}
36%|███▋ | 2422/6640 [3:56:23<19:31:28, 16.66s/it] {'loss': 0.5385, 'learning_rate': 1.4678435874494234e-05, 'epoch': 0.36}
36%|███▋ | 2423/6640 [3:56:40<19:36:33, 16.74s/it] {'loss': 0.552, 'learning_rate': 1.4674123866095551e-05, 'epoch': 0.36}
37%|███▋ | 2424/6640 [3:56:57<19:49:13, 16.92s/it] {'loss': 0.5313, 'learning_rate': 1.4669810745380778e-05, 'epoch': 0.37}
May 28 06:10:44.238270 342306 slurmstepd 0x155550ab8700: error: *** STEP 8277401.0 ON batch-block7-01076 CANCELLED AT 2025-05-28T06:10:44 DUE TO TIME LIMIT ***
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.
srun: error: batch-block7-01076: task 0: Terminated
srun: Terminating StepId=8277401.0
srun: job 8284398 queued and waiting for resources
srun: job 8284398 has been allocated resources
wandb: Currently logged in as: memmelma.
Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block1-0066
JobID: 8284398 | Full list: batch-block1-0066
NETWORK=Efficient-Large-Model/VILA1.5-13b
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Did not find AutoResume SDK! (x8, one per rank)
[2025-05-28 06:12:58,146] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) (x8, one per rank)
[2025-05-28 06:12:59,726] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented (x8, one per rank)
[2025-05-28 06:12:59,726] [INFO] [comm.py:594:init_distributed] cdb=None (x8, one per rank)
[2025-05-28 06:12:59,726] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. (x8, one per rank)
[2025-05-28 06:13:06,907] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 13.02B parameters
Loading checkpoint shards: 0%| | 0/6 [00:00 4096). Running this sequence through the model will result in indexing errors
AutoResumeHook: Checking whether to suspend... (x8, one per rank)
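The learning-rate values recorded throughout this log trace a smooth decay (e.g. 1.5179e-05 at step 2304, 1.2035e-05 at step 3000, out of 6640 total steps). As a sanity check, they are consistent with a linear-warmup + cosine-decay schedule of the shape used by `transformers`' `get_cosine_schedule_with_warmup`. The peak learning rate of 2e-5 and the 3% warmup ratio below are assumptions for illustration; neither value is printed anywhere in this log.

```python
import math

def cosine_lr(step, total_steps, peak_lr, warmup_steps):
    """Linear warmup followed by cosine decay to zero (same shape as
    transformers' get_cosine_schedule_with_warmup)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Assumed hyperparameters (NOT printed in this log): peak_lr=2e-5, warmup_ratio=0.03.
total, warmup = 6640, int(0.03 * 6640)  # warmup = 199 steps
print(cosine_lr(2304, total, 2e-5, warmup))  # close to the logged 1.5179157e-05
print(cosine_lr(3000, total, 2e-5, warmup))  # close to the logged 1.2034560e-05
```

Both reconstructed values land within roughly 0.03% of the logged ones, which supports (but does not prove) the assumed schedule and peak.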
45%|████▌ | 3000/6640 [2:52:11<16:34:17, 16.39s/it] {'loss': 0.5432, 'learning_rate': 1.2034560130526341e-05, 'epoch': 0.45}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3000/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3000/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3000/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
45%|████▌ | 3001/6640 [2:54:00<44:35:53, 44.12s/it] {'loss': 0.515, 'learning_rate': 1.2029783672206326e-05, 'epoch': 0.45}
45%|████▌ | 3002/6640 [2:54:16<36:13:46, 35.85s/it] {'loss': 0.5288, 'learning_rate': 1.202500673085225e-05, 'epoch': 0.45}
45%|████▌ | 3003/6640 [2:54:33<30:20:27, 30.03s/it] {'loss': 0.5211, 'learning_rate': 1.2020229307600897e-05, 'epoch': 0.45}
45%|████▌ | 3004/6640 [2:54:50<26:28:14, 26.21s/it] {'loss': 0.5225, 'learning_rate': 1.2015451403589164e-05, 'epoch': 0.45}
45%|████▌ | 3005/6640 [2:55:08<24:01:35, 23.80s/it] {'loss': 0.5281, 'learning_rate': 1.201067301995407e-05, 'epoch': 0.45}
45%|████▌ | 3006/6640 [2:55:25<22:03:16, 21.85s/it] {'loss': 0.5659, 'learning_rate': 1.200589415783273e-05, 'epoch': 0.45}
45%|████▌ | 3006/6640
[2:55:25<22:03:16, 21.85s/it] 45%|████▌ | 3007/6640 [2:55:42<20:19:31, 20.14s/it] {'loss': 0.5263, 'learning_rate': 1.2001114818362394e-05, 'epoch': 0.45} 45%|████▌ | 3007/6640 [2:55:42<20:19:31, 20.14s/it] 45%|████▌ | 3008/6640 [2:55:58<19:15:12, 19.08s/it] {'loss': 0.5315, 'learning_rate': 1.1996335002680413e-05, 'epoch': 0.45} 45%|████▌ | 3008/6640 [2:55:58<19:15:12, 19.08s/it] 45%|████▌ | 3009/6640 [2:56:14<18:22:55, 18.23s/it] {'loss': 0.5372, 'learning_rate': 1.1991554711924256e-05, 'epoch': 0.45} 45%|████▌ | 3009/6640 [2:56:14<18:22:55, 18.23s/it] 45%|████▌ | 3010/6640 [2:56:30<17:38:37, 17.50s/it] {'loss': 0.5314, 'learning_rate': 1.1986773947231505e-05, 'epoch': 0.45} 45%|████▌ | 3010/6640 [2:56:30<17:38:37, 17.50s/it] 45%|████▌ | 3011/6640 [2:56:47<17:31:41, 17.39s/it] {'loss': 0.5357, 'learning_rate': 1.1981992709739853e-05, 'epoch': 0.45} 45%|████▌ | 3011/6640 [2:56:47<17:31:41, 17.39s/it] 45%|████▌ | 3012/6640 [2:57:04<17:11:50, 17.06s/it] {'loss': 0.5141, 'learning_rate': 1.1977211000587109e-05, 'epoch': 0.45} 45%|████▌ | 3012/6640 [2:57:04<17:11:50, 17.06s/it] 45%|████▌ | 3013/6640 [2:57:20<17:05:13, 16.96s/it] {'loss': 0.5335, 'learning_rate': 1.1972428820911185e-05, 'epoch': 0.45} 45%|████▌ | 3013/6640 [2:57:20<17:05:13, 16.96s/it] 45%|████▌ | 3014/6640 [2:57:37<16:52:41, 16.76s/it] {'loss': 0.5271, 'learning_rate': 1.1967646171850118e-05, 'epoch': 0.45} 45%|████▌ | 3014/6640 [2:57:37<16:52:41, 16.76s/it] 45%|████▌ | 3015/6640 [2:57:54<17:05:28, 16.97s/it] {'loss': 0.5567, 'learning_rate': 1.1962863054542045e-05, 'epoch': 0.45} 45%|████▌ | 3015/6640 [2:57:54<17:05:28, 16.97s/it] 45%|████▌ | 3016/6640 [2:58:11<16:57:11, 16.84s/it] {'loss': 0.5243, 'learning_rate': 1.1958079470125223e-05, 'epoch': 0.45} 45%|████▌ | 3016/6640 [2:58:11<16:57:11, 16.84s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!")
45%|████▌ | 3017/6640 [2:58:29<17:26:55, 17.34s/it] {'loss': 0.5339, 'learning_rate': 1.1953295419738013e-05, 'epoch': 0.45}
45%|████▌ | 3018/6640 [2:58:45<17:05:13, 16.98s/it] {'loss': 0.5339, 'learning_rate': 1.1948510904518895e-05, 'epoch': 0.45}
45%|████▌ | 3019/6640 [2:59:02<16:51:16, 16.76s/it] {'loss': 0.5267, 'learning_rate': 1.1943725925606453e-05, 'epoch': 0.45}
45%|████▌ | 3020/6640 [2:59:18<16:43:58, 16.64s/it] {'loss': 0.5334, 'learning_rate': 1.1938940484139387e-05, 'epoch': 0.45}
45%|████▌ | 3021/6640 [2:59:34<16:41:41, 16.61s/it] {'loss': 0.5372, 'learning_rate': 1.1934154581256498e-05, 'epoch': 0.45}
46%|████▌ | 3022/6640 [2:59:51<16:35:02, 16.50s/it] {'loss': 0.5216, 'learning_rate': 1.1929368218096708e-05, 'epoch': 0.46}
46%|████▌ | 3023/6640 [3:00:07<16:29:07, 16.41s/it] {'loss': 0.5265, 'learning_rate': 1.1924581395799039e-05, 'epoch': 0.46}
46%|████▌ | 3024/6640 [3:00:23<16:25:00, 16.34s/it] {'loss': 0.5289, 'learning_rate': 1.1919794115502628e-05, 'epoch': 0.46}
46%|████▌ | 3025/6640 [3:00:40<16:40:09, 16.60s/it] {'loss': 0.5555, 'learning_rate': 1.1915006378346719e-05, 'epoch': 0.46}
46%|████▌ | 3026/6640 [3:00:56<16:22:10, 16.31s/it] {'loss': 0.5185, 'learning_rate': 1.1910218185470663e-05, 'epoch': 0.46}
46%|████▌ | 3027/6640 [3:01:12<16:26:55, 16.39s/it] {'loss': 0.5356, 'learning_rate': 1.1905429538013926e-05, 'epoch': 0.46}
46%|████▌ | 3028/6640 [3:01:28<16:05:29, 16.04s/it] {'loss': 0.5074, 'learning_rate': 1.1900640437116074e-05, 'epoch': 0.46}
46%|████▌ | 3029/6640 [3:01:45<16:20:34, 16.29s/it] {'loss': 0.5278, 'learning_rate': 1.1895850883916786e-05, 'epoch': 0.46}
46%|████▌ | 3030/6640 [3:02:01<16:20:41, 16.30s/it] {'loss': 0.5326, 'learning_rate': 1.1891060879555847e-05, 'epoch': 0.46}
46%|████▌ | 3031/6640 [3:02:18<16:29:22, 16.45s/it] {'loss': 0.5446, 'learning_rate': 1.188627042517315e-05, 'epoch': 0.46}
46%|████▌ | 3032/6640 [3:02:34<16:19:38, 16.29s/it] {'loss': 0.5423, 'learning_rate': 1.1881479521908694e-05, 'epoch': 0.46}
Token indices sequence length is longer than the specified maximum sequence length for this model (5158 > 4096). Running this sequence through the model will result in indexing errors
46%|████▌ | 3033/6640 [3:02:50<16:28:26, 16.44s/it] {'loss': 0.5141, 'learning_rate': 1.1876688170902583e-05, 'epoch': 0.46}
46%|████▌ | 3034/6640 [3:03:07<16:23:05, 16.36s/it] {'loss': 0.5411, 'learning_rate': 1.1871896373295033e-05, 'epoch': 0.46}
46%|████▌ | 3035/6640 [3:03:23<16:22:22, 16.35s/it] {'loss': 0.5543, 'learning_rate': 1.1867104130226363e-05, 'epoch': 0.46}
46%|████▌ | 3036/6640 [3:03:39<16:16:13, 16.25s/it] {'loss': 0.5301, 'learning_rate': 1.1862311442837e-05, 'epoch': 0.46}
46%|████▌ | 3037/6640 [3:03:56<16:28:49, 16.47s/it] {'loss': 0.5216, 'learning_rate': 1.185751831226747e-05, 'epoch': 0.46}
46%|████▌ | 3038/6640 [3:04:12<16:25:46, 16.42s/it] {'loss': 0.5261, 'learning_rate': 1.1852724739658417e-05, 'epoch': 0.46}
46%|████▌ | 3039/6640 [3:04:29<16:23:36, 16.39s/it] {'loss': 0.544, 'learning_rate': 1.1847930726150574e-05, 'epoch': 0.46}
46%|████▌ | 3040/6640 [3:04:45<16:24:29, 16.41s/it] {'loss': 0.5157, 'learning_rate': 1.1843136272884795e-05, 'epoch': 0.46}
46%|████▌ | 3041/6640 [3:05:01<16:25:01, 16.42s/it] {'loss': 0.5283, 'learning_rate': 1.1838341381002027e-05, 'epoch': 0.46}
46%|████▌ | 3042/6640 [3:05:18<16:20:33, 16.35s/it] {'loss': 0.5268, 'learning_rate': 1.1833546051643325e-05, 'epoch': 0.46}
46%|████▌ | 3043/6640 [3:05:34<16:19:10, 16.33s/it] {'loss': 0.5216, 'learning_rate': 1.182875028594985e-05, 'epoch': 0.46}
46%|████▌ | 3044/6640 [3:05:50<16:16:28, 16.29s/it] {'loss': 0.5304, 'learning_rate': 1.1823954085062867e-05, 'epoch': 0.46}
46%|████▌ | 3045/6640 [3:06:07<16:31:11, 16.54s/it] {'loss': 0.5517, 'learning_rate': 1.1819157450123745e-05, 'epoch': 0.46}
46%|████▌ | 3046/6640 [3:06:23<16:19:56, 16.36s/it] {'loss': 0.5238, 'learning_rate': 1.1814360382273949e-05, 'epoch': 0.46}
46%|████▌ | 3047/6640 [3:06:40<16:22:22, 16.40s/it] {'loss': 0.5408, 'learning_rate': 1.1809562882655054e-05, 'epoch': 0.46}
46%|████▌ | 3048/6640 [3:06:56<16:21:35, 16.40s/it] {'loss': 0.5245, 'learning_rate': 1.180476495240874e-05, 'epoch': 0.46}
46%|████▌ | 3049/6640 [3:07:12<16:18:28, 16.35s/it] {'loss': 0.5272, 'learning_rate': 1.1799966592676784e-05, 'epoch': 0.46}
AutoResumeHook: Checking whether to suspend...
46%|████▌ | 3050/6640 [3:07:29<16:16:45, 16.32s/it] {'loss': 0.5565, 'learning_rate': 1.1795167804601062e-05, 'epoch': 0.46}
46%|████▌ | 3051/6640 [3:07:46<16:37:23, 16.67s/it] {'loss': 0.5296, 'learning_rate': 1.1790368589323562e-05, 'epoch': 0.46}
46%|████▌ | 3052/6640 [3:08:03<16:36:49, 16.67s/it] {'loss': 0.5083, 'learning_rate': 1.1785568947986368e-05, 'epoch': 0.46}
46%|████▌ | 3053/6640 [3:08:19<16:33:58, 16.63s/it] {'loss': 0.5149, 'learning_rate': 1.1780768881731664e-05, 'epoch': 0.46}
46%|████▌ | 3054/6640 [3:08:37<16:48:23, 16.87s/it] {'loss': 0.5238, 'learning_rate': 1.177596839170174e-05, 'epoch': 0.46}
46%|████▌ | 3055/6640 [3:08:53<16:41:55, 16.77s/it] {'loss': 0.5078, 'learning_rate': 1.1771167479038978e-05, 'epoch': 0.46}
46%|████▌ | 3056/6640 [3:09:10<16:40:45, 16.75s/it] {'loss': 0.5218, 'learning_rate': 1.1766366144885877e-05, 'epoch': 0.46}
46%|████▌ | 3057/6640 [3:09:26<16:28:40, 16.56s/it] {'loss': 0.5195, 'learning_rate': 1.1761564390385015e-05, 'epoch': 0.46}
46%|████▌ | 3058/6640 [3:09:42<16:21:41, 16.44s/it] {'loss': 0.5476, 'learning_rate': 1.1756762216679085e-05, 'epoch': 0.46}
46%|████▌ | 3059/6640 [3:09:59<16:23:38, 16.48s/it] {'loss': 0.5306, 'learning_rate': 1.1751959624910874e-05, 'epoch': 0.46}
46%|████▌ | 3060/6640 [3:10:15<16:24:47, 16.50s/it] {'loss': 0.5293, 'learning_rate': 1.1747156616223272e-05, 'epoch': 0.46}
46%|████▌ | 3061/6640 [3:10:31<16:12:15, 16.30s/it] {'loss': 0.563, 'learning_rate': 1.1742353191759267e-05, 'epoch': 0.46}
46%|████▌ | 3062/6640 [3:10:47<16:05:09, 16.18s/it] {'loss': 0.5282, 'learning_rate': 1.173754935266194e-05, 'epoch': 0.46}
46%|████▌ | 3063/6640 [3:11:03<15:59:54, 16.10s/it] {'loss': 0.5289, 'learning_rate': 1.1732745100074485e-05, 'epoch': 0.46}
46%|████▌ | 3064/6640 [3:11:20<16:08:08, 16.24s/it] {'loss': 0.5488, 'learning_rate': 1.1727940435140177e-05, 'epoch': 0.46}
46%|████▌ | 3065/6640 [3:11:35<16:01:17, 16.13s/it] {'loss': 0.5503, 'learning_rate': 1.1723135359002403e-05, 'epoch': 0.46}
46%|████▌ | 3066/6640 [3:11:51<15:53:14, 16.00s/it] {'loss': 0.5287, 'learning_rate': 1.1718329872804635e-05, 'epoch': 0.46}
46%|████▌ | 3067/6640 [3:12:08<16:05:09, 16.21s/it] {'loss': 0.5354, 'learning_rate': 1.1713523977690458e-05, 'epoch': 0.46}
46%|████▌ | 3068/6640 [3:12:24<15:58:51, 16.11s/it] {'loss': 0.5321, 'learning_rate': 1.1708717674803538e-05, 'epoch': 0.46}
46%|████▌ | 3069/6640 [3:12:41<16:27:53, 16.60s/it] {'loss': 0.5484, 'learning_rate': 1.1703910965287653e-05, 'epoch': 0.46}
46%|████▌ | 3070/6640 [3:12:59<16:45:04, 16.89s/it] {'loss': 0.523, 'learning_rate': 1.1699103850286668e-05, 'epoch': 0.46}
46%|████▋ | 3071/6640 [3:13:15<16:35:35, 16.74s/it] {'loss': 0.5323, 'learning_rate': 1.1694296330944548e-05, 'epoch': 0.46}
46%|████▋ | 3072/6640 [3:13:32<16:29:55, 16.65s/it] {'loss': 0.5284, 'learning_rate': 1.1689488408405354e-05, 'epoch': 0.46}
46%|████▋ | 3073/6640 [3:13:50<16:56:11, 17.09s/it] {'loss': 0.5431, 'learning_rate': 1.168468008381324e-05, 'epoch': 0.46}
Token indices sequence length is longer than the specified maximum sequence length for this model (4399 > 4096). Running this sequence through the model will result in indexing errors
46%|████▋ | 3074/6640 [3:14:06<16:37:15, 16.78s/it] {'loss': 0.5218, 'learning_rate': 1.1679871358312462e-05, 'epoch': 0.46}
46%|████▋ | 3075/6640 [3:14:23<16:36:30, 16.77s/it] {'loss': 0.5069, 'learning_rate': 1.1675062233047365e-05, 'epoch': 0.46}
46%|████▋ | 3076/6640 [3:14:39<16:24:12, 16.57s/it] {'loss': 0.5276, 'learning_rate': 1.1670252709162393e-05, 'epoch': 0.46}
46%|████▋ | 3077/6640 [3:14:55<16:14:42, 16.41s/it] {'loss': 0.5249, 'learning_rate': 1.1665442787802083e-05, 'epoch': 0.46}
46%|████▋ | 3078/6640 [3:15:12<16:18:02, 16.47s/it] {'loss': 0.5174, 'learning_rate': 1.1660632470111069e-05, 'epoch': 0.46}
46%|████▋ | 3079/6640 [3:15:28<16:09:34, 16.34s/it] {'loss': 0.5327, 'learning_rate': 1.1655821757234075e-05, 'epoch': 0.46}
46%|████▋ | 3080/6640 [3:15:44<16:12:11, 16.39s/it] {'loss': 0.5117, 'learning_rate': 1.1651010650315923e-05, 'epoch': 0.46}
46%|████▋ | 3081/6640 [3:16:01<16:16:34, 16.46s/it] {'loss': 0.533, 'learning_rate': 1.164619915050153e-05, 'epoch': 0.46}
46%|████▋ | 3082/6640 [3:16:17<16:16:47, 16.47s/it] {'loss': 0.5266, 'learning_rate': 1.1641387258935896e-05, 'epoch': 0.46}
46%|████▋ | 3083/6640 [3:16:33<16:02:39, 16.24s/it] {'loss': 0.5237, 'learning_rate': 1.1636574976764133e-05, 'epoch': 0.46}
46%|████▋ | 3084/6640 [3:16:49<16:05:42, 16.29s/it] {'loss': 0.5156, 'learning_rate': 1.1631762305131424e-05, 'epoch': 0.46}
46%|████▋ | 3085/6640 [3:17:07<16:31:20, 16.73s/it] {'loss': 0.5246, 'learning_rate': 1.1626949245183061e-05, 'epoch': 0.46}
46%|████▋ | 3086/6640 [3:17:24<16:29:11, 16.70s/it] {'loss': 0.5339, 'learning_rate': 1.1622135798064427e-05, 'epoch': 0.46}
46%|████▋ | 3087/6640 [3:17:40<16:20:07, 16.55s/it] {'loss': 0.5118, 'learning_rate': 1.1617321964920986e-05, 'epoch': 0.46}
47%|████▋ | 3088/6640 [3:17:56<16:15:50, 16.48s/it] {'loss': 0.5153, 'learning_rate': 1.1612507746898307e-05, 'epoch': 0.47}
47%|████▋ | 3089/6640 [3:18:12<16:10:53, 16.40s/it] {'loss': 0.5473, 'learning_rate': 1.160769314514204e-05, 'epoch': 0.47}
47%|████▋ | 3090/6640 [3:18:29<16:19:15, 16.55s/it] {'loss': 0.5615, 'learning_rate': 1.1602878160797936e-05, 'epoch': 0.47}
47%|████▋ | 3091/6640 [3:18:46<16:20:45, 16.58s/it] {'loss': 0.5484, 'learning_rate': 1.1598062795011827e-05, 'epoch': 0.47}
47%|████▋ | 3092/6640 [3:19:02<16:18:01, 16.54s/it] {'loss': 0.5351, 'learning_rate': 1.1593247048929644e-05, 'epoch': 0.47}
47%|████▋ | 3093/6640 [3:19:19<16:13:04, 16.46s/it] {'loss': 0.5261, 'learning_rate': 1.1588430923697404e-05, 'epoch': 0.47}
47%|████▋ | 3094/6640 [3:19:35<16:14:27, 16.49s/it] {'loss': 0.5333, 'learning_rate': 1.1583614420461218e-05, 'epoch': 0.47}
47%|████▋ | 3095/6640 [3:19:51<16:08:15, 16.39s/it] {'loss': 0.5141, 'learning_rate': 1.1578797540367284e-05, 'epoch': 0.47}
47%|████▋ | 3096/6640 [3:20:09<16:33:49, 16.83s/it] {'loss': 0.533, 'learning_rate': 1.1573980284561886e-05, 'epoch': 0.47}
47%|████▋ | 3097/6640 [3:20:25<16:21:22, 16.62s/it] {'loss': 0.5208, 'learning_rate': 1.1569162654191408e-05, 'epoch': 0.47}
47%|████▋ | 3098/6640 [3:20:41<16:09:59, 16.43s/it] {'loss': 0.5163, 'learning_rate': 1.156434465040231e-05, 'epoch': 0.47}
47%|████▋ | 3099/6640 [3:20:59<16:23:44, 16.67s/it] {'loss': 0.5354, 'learning_rate': 1.1559526274341155e-05, 'epoch': 0.47}
AutoResumeHook: Checking whether to suspend...
47%|████▋ | 3100/6640 [3:21:17<16:56:55, 17.24s/it] {'loss': 0.56, 'learning_rate': 1.155470752715458e-05, 'epoch': 0.47}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3100/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3100/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3100/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
47%|████▋ | 3101/6640 [3:23:06<44:00:38, 44.77s/it] {'loss': 0.5296, 'learning_rate': 1.154988840998932e-05, 'epoch': 0.47}
47%|████▋ | 3102/6640 [3:23:23<35:52:59, 36.51s/it] {'loss': 0.5239, 'learning_rate': 1.1545068923992199e-05, 'epoch': 0.47}
47%|████▋ | 3103/6640 [3:23:39<29:40:26, 30.20s/it] {'loss': 0.5369, 'learning_rate': 1.1540249070310124e-05, 'epoch': 0.47}
47%|████▋ | 3104/6640 [3:23:55<25:35:21, 26.05s/it] {'loss': 0.5387, 'learning_rate': 1.1535428850090092e-05, 'epoch': 0.47}
47%|████▋ | 3105/6640 [3:24:12<22:45:53, 23.18s/it] {'loss': 0.518, 'learning_rate': 1.153060826447918e-05, 'epoch': 0.47}
47%|████▋ | 3106/6640 [3:24:29<20:56:45, 21.34s/it] {'loss': 0.5412, 'learning_rate': 1.1525787314624564e-05, 'epoch': 0.47}
47%|████▋ | 3107/6640 [3:24:46<19:51:32, 20.24s/it] {'loss': 0.5284, 'learning_rate': 1.1520966001673496e-05, 'epoch': 0.47}
47%|████▋ | 3108/6640 [3:25:02<18:33:35, 18.92s/it] {'loss': 0.5235, 'learning_rate': 1.1516144326773324e-05, 'epoch': 0.47}
47%|████▋ | 3109/6640 [3:25:19<17:49:41, 18.18s/it] {'loss': 0.5367, 'learning_rate': 1.1511322291071474e-05, 'epoch': 0.47}
47%|████▋ | 3110/6640 [3:25:36<17:29:25, 17.84s/it] {'loss': 0.5097, 'learning_rate': 1.1506499895715462e-05, 'epoch': 0.47}
47%|████▋ | 3111/6640 [3:25:52<17:02:43, 17.39s/it] {'loss': 0.5289, 'learning_rate': 1.150167714185289e-05, 'epoch': 0.47}
47%|████▋ | 3112/6640 [3:26:08<16:44:19, 17.08s/it] {'loss': 0.5329, 'learning_rate': 1.1496854030631443e-05, 'epoch': 0.47}
47%|████▋ | 3113/6640 [3:26:25<16:40:18, 17.02s/it] {'loss': 0.5149, 'learning_rate': 1.1492030563198895e-05, 'epoch': 0.47}
47%|████▋ | 3114/6640 [3:26:41<16:22:23, 16.72s/it] {'loss': 0.5241, 'learning_rate': 1.1487206740703094e-05, 'epoch': 0.47}
47%|████▋ | 3115/6640 [3:26:58<16:19:12, 16.67s/it] {'loss': 0.5044, 'learning_rate': 1.148238256429199e-05, 'epoch': 0.47}
47%|████▋ | 3116/6640 [3:27:14<16:09:04, 16.50s/it] {'loss': 0.5004, 'learning_rate': 1.14775580351136e-05, 'epoch': 0.47}
47%|████▋ | 3117/6640 [3:27:30<16:07:11, 16.47s/it] {'loss': 0.5282, 'learning_rate': 1.1472733154316037e-05, 'epoch': 0.47}
47%|████▋ | 3118/6640 [3:27:47<16:00:15, 16.36s/it] {'loss': 0.5311, 'learning_rate': 1.1467907923047488e-05, 'epoch': 0.47}
47%|████▋ | 3119/6640 [3:28:03<15:58:24, 16.33s/it] {'loss': 0.5293, 'learning_rate': 1.1463082342456238e-05, 'epoch': 0.47}
47%|████▋ | 3120/6640 [3:28:20<16:07:27, 16.49s/it] {'loss': 0.5444, 'learning_rate': 1.1458256413690634e-05, 'epoch': 0.47}
47%|████▋ | 3121/6640 [3:28:36<16:05:44, 16.47s/it] {'loss': 0.5324, 'learning_rate': 1.1453430137899129e-05, 'epoch': 0.47}
47%|████▋ | 3122/6640 [3:28:53<16:05:45, 16.47s/it] {'loss': 0.5441, 'learning_rate': 1.1448603516230241e-05, 'epoch': 0.47}
47%|████▋ | 3123/6640 [3:29:09<16:07:06, 16.50s/it] {'loss': 0.5161, 'learning_rate': 1.1443776549832574e-05, 'epoch': 0.47}
47%|████▋ | 3124/6640 [3:29:26<16:11:24, 16.58s/it] {'loss': 0.5246, 'learning_rate': 1.1438949239854822e-05, 'epoch': 0.47}
47%|████▋ | 3125/6640 [3:29:42<15:56:47, 16.33s/it] {'loss': 0.5349, 'learning_rate': 1.1434121587445752e-05, 'epoch': 0.47}
47%|████▋ | 3126/6640 [3:29:59<16:07:15, 16.52s/it] {'loss': 0.5434, 'learning_rate': 1.1429293593754216e-05, 'epoch': 0.47}
47%|████▋ | 3127/6640 [3:30:15<16:08:08, 16.54s/it] {'loss': 0.5429, 'learning_rate': 1.1424465259929148e-05, 'epoch': 0.47}
47%|████▋ | 3128/6640 [3:30:32<16:04:56, 16.49s/it] {'loss': 0.5148, 'learning_rate': 1.1419636587119563e-05, 'epoch': 0.47}
47%|████▋ | 3129/6640 [3:30:48<15:59:17, 16.39s/it] {'loss': 0.5407, 'learning_rate': 1.1414807576474554e-05, 'epoch': 0.47}
47%|████▋ | 3130/6640 [3:31:04<16:04:01, 16.48s/it] {'loss': 0.5335, 'learning_rate': 1.1409978229143297e-05, 'epoch': 0.47}
47%|████▋ | 3131/6640 [3:31:21<16:06:07, 16.52s/it] {'loss': 0.5317, 'learning_rate': 1.1405148546275046e-05, 'epoch': 0.47}
47%|████▋ | 3132/6640 [3:31:39<16:28:46, 16.91s/it] {'loss': 0.5325, 'learning_rate': 1.1400318529019134e-05, 'epoch': 0.47}
47%|████▋ | 3133/6640 [3:31:55<16:10:25, 16.60s/it] {'loss': 0.5386, 'learning_rate': 1.1395488178524982e-05, 'epoch': 0.47}
47%|████▋ | 3134/6640 [3:32:11<15:57:03, 16.38s/it] {'loss': 0.5301, 'learning_rate': 1.1390657495942077e-05, 'epoch': 0.47}
47%|████▋ | 3135/6640 [3:32:26<15:42:49, 16.14s/it] {'loss': 0.5361, 'learning_rate': 1.1385826482419993e-05, 'epoch': 0.47}
47%|████▋ | 3136/6640 [3:32:42<15:42:21, 16.14s/it] {'loss': 0.538, 'learning_rate': 1.1380995139108383e-05, 'epoch': 0.47}
47%|████▋ | 3137/6640 [3:32:58<15:42:59, 16.15s/it] {'loss': 0.5433, 'learning_rate': 1.137616346715698e-05, 'epoch': 0.47}
47%|████▋ | 3138/6640 [3:33:14<15:34:25, 16.01s/it] {'loss': 0.5247, 'learning_rate': 1.137133146771559e-05, 'epoch': 0.47}
47%|████▋ | 3139/6640 [3:33:30<15:35:16, 16.03s/it] {'loss': 0.524, 'learning_rate': 1.1366499141934098e-05, 'epoch': 0.47}
47%|████▋ | 3140/6640 [3:33:47<15:49:40, 16.28s/it] {'loss': 0.5336, 'learning_rate': 1.1361666490962468e-05, 'epoch': 0.47}
47%|████▋ | 3141/6640 [3:34:03<15:48:49, 16.27s/it] {'loss': 0.5466, 'learning_rate': 1.1356833515950743e-05, 'epoch': 0.47}
47%|████▋ | 3142/6640 [3:34:21<16:16:07, 16.74s/it] {'loss': 0.5443, 'learning_rate': 1.1352000218049038e-05, 'epoch': 0.47}
47%|████▋ | 3143/6640 [3:34:38<16:12:09, 16.68s/it] {'loss': 0.5298, 'learning_rate': 1.1347166598407551e-05, 'epoch': 0.47}
47%|████▋ | 3144/6640 [3:34:54<16:10:57, 16.66s/it] {'loss': 0.5518, 'learning_rate': 1.1342332658176556e-05, 'epoch': 0.47}
47%|████▋ | 3145/6640 [3:35:12<16:23:38, 16.89s/it] {'loss': 0.5407, 'learning_rate': 1.1337498398506397e-05, 'epoch': 0.47}
47%|████▋ | 3146/6640 [3:35:29<16:26:54, 16.95s/it] {'loss': 0.5101, 'learning_rate': 1.13326638205475e-05, 'epoch': 0.47}
47%|████▋ | 3147/6640 [3:35:45<16:17:58, 16.80s/it] {'loss': 0.5464, 'learning_rate': 1.1327828925450363e-05, 'epoch': 0.47}
47%|████▋ | 3148/6640 [3:36:01<16:05:40, 16.59s/it] {'loss': 0.5156, 'learning_rate': 1.1322993714365567e-05, 'epoch': 0.47}
47%|████▋ | 3149/6640 [3:36:18<16:07:50, 16.63s/it] {'loss': 0.5334, 'learning_rate': 1.1318158188443758e-05, 'epoch': 0.47}
AutoResumeHook: Checking whether to suspend...
47%|████▋ | 3150/6640 [3:36:33<15:44:35, 16.24s/it] {'loss': 0.5203, 'learning_rate': 1.1313322348835658e-05, 'epoch': 0.47}
47%|████▋ | 3151/6640 [3:36:50<15:56:40, 16.45s/it] {'loss': 0.5244, 'learning_rate': 1.130848619669207e-05, 'epoch': 0.47}
47%|████▋ | 3152/6640 [3:37:07<15:58:39, 16.49s/it] {'loss': 0.5287, 'learning_rate': 1.130364973316387e-05, 'epoch': 0.47}
47%|████▋ | 3153/6640 [3:37:25<16:19:40, 16.86s/it] {'loss': 0.5457, 'learning_rate': 1.129881295940201e-05, 'epoch': 0.47}
48%|████▊ | 3154/6640 [3:37:40<15:56:28, 16.46s/it] {'loss': 0.5327, 'learning_rate': 1.1293975876557506e-05, 'epoch': 0.47}
48%|████▊ | 3155/6640 [3:37:57<16:02:40, 16.57s/it] {'loss': 0.5284, 'learning_rate': 1.1289138485781456e-05, 'epoch': 0.48}
48%|████▊ | 3156/6640 [3:38:14<16:04:34, 16.61s/it] {'loss': 0.5452, 'learning_rate': 1.1284300788225032e-05, 'epoch': 0.48}
48%|████▊ | 3157/6640 [3:38:30<16:00:10, 16.54s/it] {'loss': 0.5411, 'learning_rate': 1.1279462785039472e-05, 'epoch': 0.48}
48%|████▊ | 3158/6640 [3:38:46<15:54:27, 16.45s/it] {'loss': 0.5211, 'learning_rate': 1.1274624477376091e-05, 'epoch': 0.48}
48%|████▊ | 3159/6640 [3:39:04<16:09:09, 16.70s/it] {'loss': 0.5078, 'learning_rate': 1.1269785866386279e-05, 'epoch': 0.48}
48%|████▊ | 3160/6640 [3:39:20<15:57:59, 16.52s/it] {'loss': 0.5272, 'learning_rate': 1.1264946953221496e-05, 'epoch': 0.48}
48%|████▊ | 3161/6640 [3:39:37<16:09:52, 16.73s/it] {'loss': 0.5269, 'learning_rate': 1.126010773903327e-05, 'epoch': 0.48}
48%|████▊ | 3162/6640 [3:39:53<16:03:43, 16.63s/it] {'loss': 0.5437, 'learning_rate': 1.125526822497321e-05, 'epoch': 0.48}
48%|████▊ | 3163/6640 [3:40:10<16:06:06, 16.67s/it] {'loss': 0.5254, 'learning_rate': 1.1250428412192985e-05, 'epoch': 0.48}
48%|████▊ | 3164/6640 [3:40:27<16:02:22, 16.61s/it] {'loss': 0.5361, 'learning_rate': 1.1245588301844343e-05, 'epoch': 0.48}
48%|████▊ | 3165/6640 [3:40:42<15:39:07, 16.22s/it] {'loss': 0.5381, 'learning_rate': 1.12407478950791e-05, 'epoch': 0.48}
48%|████▊ | 3166/6640 [3:40:59<15:46:52, 16.35s/it] {'loss': 0.5265, 'learning_rate': 1.1235907193049145e-05, 'epoch': 0.48}
48%|████▊ | 3167/6640 [3:41:15<15:48:21, 16.38s/it] {'loss': 0.5327, 'learning_rate': 1.123106619690643e-05, 'epoch': 0.48}
48%|████▊ | 3168/6640 [3:41:31<15:36:03, 16.18s/it] {'loss': 0.5374, 'learning_rate': 1.1226224907802986e-05, 'epoch': 0.48}
48%|████▊ | 3169/6640 [3:41:47<15:30:17, 16.08s/it] {'loss': 0.5442, 'learning_rate': 1.1221383326890911e-05, 'epoch': 0.48}
48%|████▊ | 3170/6640 [3:42:03<15:32:18, 16.12s/it] {'loss': 0.5288, 'learning_rate': 1.1216541455322367e-05, 'epoch': 0.48}
48%|████▊ | 3171/6640 [3:42:20<15:42:33, 16.30s/it] {'loss': 0.5333, 'learning_rate': 1.1211699294249597e-05, 'epoch': 0.48}
Token indices sequence length is longer than the specified maximum sequence length for this model (4338 > 4096). Running this sequence through the model will result in indexing errors
48%|████▊ | 3172/6640 [3:42:36<15:50:56, 16.45s/it] {'loss': 0.5329, 'learning_rate': 1.1206856844824896e-05, 'epoch': 0.48}
48%|████▊ | 3173/6640 [3:42:53<15:46:46, 16.38s/it] {'loss': 0.5429, 'learning_rate': 1.1202014108200645e-05, 'epoch': 0.48}
48%|████▊ | 3174/6640 [3:43:09<15:47:00, 16.39s/it] {'loss': 0.52, 'learning_rate': 1.119717108552928e-05, 'epoch': 0.48}
48%|████▊ | 3175/6640 [3:43:26<15:51:47, 16.48s/it] {'loss': 0.539, 'learning_rate': 1.1192327777963313e-05, 'epoch': 0.48}
48%|████▊ | 3176/6640 [3:43:42<15:58:01, 16.59s/it] {'loss': 0.5187, 'learning_rate': 1.118748418665532e-05, 'epoch': 0.48}
48%|████▊ | 3177/6640 [3:43:59<15:49:08, 16.44s/it] {'loss': 0.5353, 'learning_rate': 1.1182640312757949e-05, 'epoch': 0.48}
48%|████▊ | 3178/6640 [3:44:15<15:48:09, 16.43s/it] {'loss': 0.5425, 'learning_rate': 1.1177796157423908e-05, 'epoch': 0.48}
48%|████▊ | 3179/6640 [3:44:31<15:46:00, 16.40s/it] {'loss': 0.5356, 'learning_rate': 1.1172951721805977e-05, 'epoch': 0.48}
48%|████▊ | 3180/6640 [3:44:48<15:48:21, 16.45s/it] {'loss': 0.5281, 'learning_rate': 1.1168107007057006e-05, 'epoch': 0.48}
48%|████▊ | 3181/6640 [3:45:05<15:58:37, 16.63s/it] {'loss': 0.52, 'learning_rate': 1.1163262014329902e-05, 'epoch': 0.48}
48%|████▊ | 3182/6640 [3:45:22<16:02:45, 16.70s/it] {'loss': 0.5246, 'learning_rate': 1.1158416744777644e-05, 'epoch': 0.48}
48%|████▊ | 3183/6640 [3:45:38<16:01:58, 16.70s/it] {'loss': 0.5232, 'learning_rate': 1.1153571199553276e-05, 'epoch': 0.48}
48%|████▊ | 3184/6640 [3:45:55<15:54:31, 16.57s/it] {'loss': 0.5384, 'learning_rate': 1.1148725379809911e-05, 'epoch': 0.48}
48%|████▊ | 3185/6640 [3:46:12<16:00:15, 16.68s/it] {'loss': 0.5264, 'learning_rate': 1.1143879286700723e-05, 'epoch': 0.48}
48%|████▊ | 3186/6640 [3:46:28<15:45:43, 16.43s/it] {'loss': 0.5514, 'learning_rate': 1.1139032921378947e-05, 'epoch': 0.48}
48%|████▊ | 3187/6640 [3:46:44<15:38:29, 16.31s/it] {'loss': 0.5206, 'learning_rate': 1.1134186284997897e-05, 'epoch': 0.48}
48%|████▊ | 3188/6640 [3:47:01<15:50:05, 16.51s/it] {'loss': 0.5177, 'learning_rate': 1.1129339378710933e-05, 'epoch': 0.48}
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated!
  warnings.warn("Inputs truncated!")
48%|████▊ | 3189/6640 [3:47:17<15:54:40, 16.60s/it] {'loss': 0.5404, 'learning_rate': 1.1124492203671498e-05, 'epoch': 0.48}
48%|████▊ | 3190/6640 [3:47:33<15:39:04, 16.33s/it] {'loss': 0.53, 'learning_rate': 1.1119644761033079e-05, 'epoch': 0.48}
48%|████▊ | 3191/6640 [3:47:51<15:58:36, 16.68s/it] {'loss': 0.523, 'learning_rate': 1.1114797051949248e-05, 'epoch': 0.48}
48%|████▊ | 3192/6640 [3:48:07<15:46:07, 16.46s/it] {'loss': 0.5258, 'learning_rate': 1.1109949077573623e-05, 'epoch': 0.48}
48%|████▊ | 3193/6640 [3:48:23<15:45:11, 16.45s/it] {'loss': 0.5233, 'learning_rate': 1.1105100839059892e-05, 'epoch': 0.48}
48%|████▊ | 3194/6640 [3:48:39<15:46:34, 16.48s/it] {'loss': 0.5382, 'learning_rate': 1.110025233756181e-05, 'epoch': 0.48}
48%|████▊ | 3195/6640 [3:48:56<15:44:11, 16.44s/it] {'loss': 0.543, 'learning_rate': 1.1095403574233185e-05, 'epoch': 0.48}
48%|████▊ | 3196/6640 [3:49:13<15:55:19, 16.64s/it] {'loss': 0.5444, 'learning_rate': 1.1090554550227899e-05, 'epoch': 0.48}
48%|████▊ | 3197/6640 [3:49:29<15:53:20, 16.61s/it] {'loss': 0.5342, 'learning_rate': 1.1085705266699884e-05, 'epoch': 0.48}
48%|████▊ | 3198/6640 [3:49:46<15:47:09, 16.51s/it] {'loss': 0.5425, 'learning_rate': 1.1080855724803141e-05, 'epoch': 0.48}
48%|████▊ | 3199/6640 [3:50:02<15:49:35, 16.56s/it] {'loss': 0.5268, 'learning_rate': 1.1076005925691731e-05, 'epoch': 0.48}
AutoResumeHook: Checking whether to suspend...
48%|████▊ | 3200/6640 [3:50:19<15:47:01, 16.52s/it] {'loss': 0.5176, 'learning_rate': 1.1071155870519777e-05, 'epoch': 0.48}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-3200/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
48%|████▊ | 3201/6640 [3:52:10<42:56:00, 44.94s/it] {'loss': 0.5322, 'learning_rate': 1.106630556044146e-05, 'epoch': 0.48}
48%|████▊ | 3202/6640 [3:52:27<34:48:32, 36.45s/it] {'loss': 0.5227, 'learning_rate': 1.1061454996611026e-05, 'epoch': 0.48}
48%|████▊ | 3203/6640 [3:52:43<28:56:26, 30.31s/it] {'loss': 0.5302, 'learning_rate': 1.1056604180182777e-05, 'epoch': 0.48}
48%|████▊ | 3204/6640 [3:52:59<24:51:30, 26.05s/it] {'loss': 0.5335, 'learning_rate': 1.105175311231108e-05, 'epoch': 0.48}
48%|████▊ | 3205/6640 [3:53:15<22:06:55, 23.18s/it] {'loss': 0.5355, 'learning_rate': 1.1046901794150358e-05, 'epoch': 0.48}
48%|████▊ | 3206/6640 [3:53:32<20:13:40, 21.21s/it] {'loss': 0.5141, 'learning_rate': 1.104205022685509e-05, 'epoch': 0.48}
48%|████▊ | 3207/6640 [3:53:48<18:40:06, 19.58s/it] {'loss': 0.5121, 'learning_rate': 1.1037198411579826e-05, 'epoch': 0.48}
48%|████▊ | 3208/6640 [3:54:04<17:46:35, 18.65s/it] {'loss': 0.5398, 'learning_rate': 1.1032346349479162e-05, 'epoch': 0.48}
48%|████▊ | 3209/6640 [3:54:20<16:56:42, 17.78s/it] {'loss': 0.5194, 'learning_rate': 1.1027494041707761e-05, 'epoch': 0.48}
48%|████▊ | 3210/6640 [3:54:36<16:30:08, 17.32s/it] {'loss': 0.5329, 'learning_rate': 1.1022641489420342e-05, 'epoch': 0.48}
48%|████▊ | 3211/6640 [3:54:52<16:12:17, 17.01s/it] {'loss': 0.5437, 'learning_rate': 1.1017788693771685e-05, 'epoch': 0.48}
48%|████▊ | 3212/6640 [3:55:08<15:48:35,
16.60s/it] {'loss': 0.5382, 'learning_rate': 1.1012935655916624e-05, 'epoch': 0.48} 48%|████▊ | 3212/6640 [3:55:08<15:48:35, 16.60s/it] 48%|████▊ | 3213/6640 [3:55:25<15:52:20, 16.67s/it] {'loss': 0.5245, 'learning_rate': 1.1008082377010045e-05, 'epoch': 0.48} 48%|████▊ | 3213/6640 [3:55:25<15:52:20, 16.67s/it] 48%|████▊ | 3214/6640 [3:55:42<15:55:02, 16.73s/it] {'loss': 0.5401, 'learning_rate': 1.100322885820691e-05, 'epoch': 0.48} 48%|████▊ | 3214/6640 [3:55:42<15:55:02, 16.73s/it] 48%|████▊ | 3215/6640 [3:55:58<15:47:43, 16.60s/it] {'loss': 0.5337, 'learning_rate': 1.0998375100662215e-05, 'epoch': 0.48} 48%|████▊ | 3215/6640 [3:55:58<15:47:43, 16.60s/it] 48%|████▊ | 3216/6640 [3:56:14<15:42:51, 16.52s/it] {'loss': 0.516, 'learning_rate': 1.0993521105531033e-05, 'epoch': 0.48} 48%|████▊ | 3216/6640 [3:56:14<15:42:51, 16.52s/it] 48%|████▊ | 3217/6640 [3:56:31<15:41:40, 16.51s/it] {'loss': 0.5332, 'learning_rate': 1.0988666873968477e-05, 'epoch': 0.48} 48%|████▊ | 3217/6640 [3:56:31<15:41:40, 16.51s/it] 48%|████▊ | 3218/6640 [3:56:47<15:33:27, 16.37s/it] {'loss': 0.5187, 'learning_rate': 1.0983812407129728e-05, 'epoch': 0.48} 48%|████▊ | 3218/6640 [3:56:47<15:33:27, 16.37s/it]May 28 10:12:09.774501 1384361 slurmstepd 0x155550ab8700: error: *** STEP 8284398.0 ON batch-block1-0066 CANCELLED AT 2025-05-28T10:12:09 DUE TO TIME LIMIT *** srun: Job step aborted: Waiting up to 122 seconds for job step to finish. srun: error: batch-block1-0066: task 0: Terminated srun: Terminating StepId=8284398.0 srun: job 8289522 queued and waiting for resources srun: job 8289522 has been allocated resources wandb: Currently logged in as: memmelma. 
Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block1-0105
JobID: 8289522 | Full list: batch-block1-0105
NETWORK=Efficient-Large-Model/VILA1.5-13b
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Did not find AutoResume SDK! (×8, once per rank)
[2025-05-28 10:14:11,640] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) (×8, once per rank)
[2025-05-28 10:14:12,794] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-28 10:14:12,794] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-28 10:14:12,794] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented (repeated on the remaining ranks)
[2025-05-28 10:14:12,794] [INFO] [comm.py:594:init_distributed] cdb=None (repeated on the remaining ranks)
[2025-05-28 10:14:12,794] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. (×8, once per rank)
[2025-05-28 10:14:20,455] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 13.02B parameters
Loading checkpoint shards: 0%| | 0/6 [00:00
\nWould this person be more likely to be a type a or b person?\nAnswer the question using a single word or phrase.'}, {'from': 'gpt', 'value': ''}]] (ignored)
5336/6640 [2:35:08<5:57:45, 16.46s/it] {'loss': 0.5179, 'learning_rate': 1.955953829794711e-06, 'epoch': 0.8}
5337/6640 [2:35:25<6:01:24, 16.64s/it] {'loss': 0.4977, 'learning_rate': 1.953056707711005e-06, 'epoch': 0.8}
5338/6640 [2:35:42<6:03:06, 16.73s/it] {'loss': 0.4984, 'learning_rate': 1.95016150058393e-06, 'epoch': 0.8}
5339/6640 [2:35:58<5:58:45, 16.55s/it] {'loss': 0.5054, 'learning_rate': 1.9472682091024696e-06, 'epoch': 0.8}
5340/6640 [2:36:15<5:59:32, 16.59s/it] {'loss': 0.5115, 'learning_rate': 1.944376833955147e-06, 'epoch': 0.8}
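Each progress tick above has the shape `[elapsed<remaining, s/it]`: the remaining-time estimate is essentially (total steps - done) × seconds per step, though tqdm smooths the rate, so the logged ETA can differ by a few seconds from the naive product. A small sketch of that arithmetic:

```python
def eta_str(steps_left: int, sec_per_it: float) -> str:
    """Format a remaining-time estimate the way tqdm renders it (H:MM:SS)."""
    total = round(steps_left * sec_per_it)  # total remaining seconds
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


# For step 5340/6640 at 16.59 s/it there are 1300 steps left;
# the naive product lands close to the logged remaining time of 5:59:32.
estimate = eta_str(1300, 16.59)
```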
5341/6640 [2:36:31<5:57:17, 16.50s/it] {'loss': 0.4995, 'learning_rate': 1.941487375830037e-06, 'epoch': 0.8}
5342/6640 [2:36:48<5:57:48, 16.54s/it] {'loss': 0.512, 'learning_rate': 1.938599835414745e-06, 'epoch': 0.8}
5343/6640 [2:37:04<5:57:03, 16.52s/it] {'loss': 0.5187, 'learning_rate': 1.9357142133964336e-06, 'epoch': 0.8}
5344/6640 [2:37:22<6:05:06, 16.90s/it] {'loss': 0.4909, 'learning_rate': 1.932830510461802e-06, 'epoch': 0.8}
5345/6640 [2:37:38<5:58:51, 16.63s/it] {'loss': 0.514, 'learning_rate': 1.929948727297096e-06, 'epoch': 0.8}
5346/6640 [2:37:55<5:58:02, 16.60s/it] {'loss': 0.5018, 'learning_rate': 1.9270688645881e-06, 'epoch': 0.81}
5347/6640 [2:38:11<5:56:32, 16.54s/it] {'loss': 0.5213, 'learning_rate': 1.924190923020144e-06, 'epoch': 0.81}
5348/6640 [2:38:28<5:57:37, 16.61s/it] {'loss': 0.5285, 'learning_rate': 1.921314903278102e-06, 'epoch': 0.81}
5349/6640 [2:38:44<5:52:42, 16.39s/it] {'loss': 0.5083, 'learning_rate': 1.918440806046391e-06, 'epoch': 0.81}
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
5350/6640 [2:39:00<5:53:00, 16.42s/it] {'loss': 0.4943, 'learning_rate': 1.9155686320089684e-06, 'epoch': 0.81}
5351/6640 [2:39:16<5:50:03, 16.29s/it] {'loss': 0.5015, 'learning_rate': 1.912698381849333e-06, 'epoch': 0.81}
5352/6640 [2:39:32<5:47:14, 16.18s/it] {'loss': 0.5126, 'learning_rate': 1.9098300562505266e-06, 'epoch': 0.81}
5353/6640 [2:39:48<5:46:04, 16.13s/it] {'loss': 0.5051, 'learning_rate': 1.9069636558951354e-06, 'epoch': 0.81}
5354/6640 [2:40:04<5:44:41, 16.08s/it] {'loss': 0.5198, 'learning_rate': 1.9040991814652864e-06, 'epoch': 0.81}
5355/6640 [2:40:21<5:47:58, 16.25s/it] {'loss': 0.5235, 'learning_rate': 1.901236633642649e-06, 'epoch': 0.81}
5356/6640 [2:40:37<5:47:00, 16.22s/it] {'loss': 0.4959, 'learning_rate': 1.8983760131084283e-06, 'epoch': 0.81}
5357/6640 [2:40:54<5:51:09, 16.42s/it] {'loss': 0.4919, 'learning_rate': 1.8955173205433774e-06, 'epoch': 0.81}
5358/6640 [2:41:10<5:50:43, 16.41s/it] {'loss': 0.504, 'learning_rate': 1.892660556627789e-06, 'epoch': 0.81}
5359/6640 [2:41:26<5:47:41, 16.28s/it] {'loss': 0.5288, 'learning_rate': 1.889805722041499e-06, 'epoch': 0.81}
5360/6640 [2:41:43<5:48:39, 16.34s/it] {'loss': 0.5107, 'learning_rate': 1.8869528174638752e-06, 'epoch': 0.81}
5361/6640 [2:41:59<5:49:31, 16.40s/it] {'loss': 0.5216, 'learning_rate': 1.8841018435738357e-06, 'epoch': 0.81}
5362/6640 [2:42:15<5:47:14, 16.30s/it] {'loss': 0.5253, 'learning_rate': 1.8812528010498355e-06, 'epoch': 0.81}
5363/6640 [2:42:33<5:53:06, 16.59s/it] {'loss': 0.5142, 'learning_rate': 1.878405690569871e-06, 'epoch': 0.81}
5364/6640 [2:42:49<5:48:08, 16.37s/it] {'loss': 0.4968, 'learning_rate': 1.8755605128114796e-06, 'epoch': 0.81}
5365/6640 [2:43:04<5:43:56, 16.19s/it] {'loss': 0.5, 'learning_rate': 1.8727172684517325e-06, 'epoch': 0.81}
5366/6640 [2:43:21<5:48:26, 16.41s/it] {'loss': 0.5044, 'learning_rate': 1.8698759581672487e-06, 'epoch': 0.81}
5367/6640 [2:43:37<5:44:00, 16.21s/it] {'loss': 0.4973, 'learning_rate': 1.8670365826341842e-06, 'epoch': 0.81}
5368/6640 [2:43:53<5:43:11, 16.19s/it] {'loss': 0.5292, 'learning_rate': 1.8641991425282347e-06, 'epoch': 0.81}
5369/6640 [2:44:11<5:54:27, 16.73s/it] {'loss': 0.4997, 'learning_rate': 1.8613636385246326e-06, 'epoch': 0.81}
5370/6640 [2:44:27<5:50:36, 16.56s/it] {'loss': 0.534, 'learning_rate': 1.8585300712981514e-06, 'epoch': 0.81}
5371/6640 [2:44:44<5:50:37, 16.58s/it] {'loss': 0.5092, 'learning_rate': 1.855698441523106e-06, 'epoch': 0.81}
5372/6640 [2:45:00<5:48:22, 16.48s/it] {'loss': 0.5177, 'learning_rate': 1.8528687498733478e-06, 'epoch': 0.81}
5373/6640 [2:45:17<5:51:46, 16.66s/it] {'loss': 0.5304, 'learning_rate': 1.85004099702227e-06, 'epoch': 0.81}
5374/6640 [2:45:34<5:54:32, 16.80s/it] {'loss': 0.5202, 'learning_rate': 1.8472151836427976e-06, 'epoch': 0.81}
5375/6640 [2:45:51<5:51:26, 16.67s/it] {'loss': 0.5258, 'learning_rate': 1.8443913104073984e-06, 'epoch': 0.81}
5376/6640 [2:46:07<5:46:44, 16.46s/it] {'loss': 0.5188, 'learning_rate': 1.8415693779880816e-06, 'epoch': 0.81}
5377/6640 [2:46:23<5:47:02, 16.49s/it] {'loss': 0.5087, 'learning_rate': 1.8387493870563933e-06, 'epoch': 0.81}
5378/6640 [2:46:40<5:47:47, 16.54s/it] {'loss': 0.533, 'learning_rate': 1.8359313382834088e-06, 'epoch': 0.81}
5379/6640 [2:46:58<5:55:20, 16.91s/it] {'loss': 0.5258, 'learning_rate': 1.8331152323397515e-06, 'epoch': 0.81}
5380/6640 [2:47:15<5:56:46, 16.99s/it] {'loss': 0.4973, 'learning_rate': 1.8303010698955803e-06, 'epoch': 0.81}
5381/6640 [2:47:32<5:55:02, 16.92s/it] {'loss': 0.495, 'learning_rate': 1.827488851620589e-06, 'epoch': 0.81}
5382/6640 [2:47:47<5:45:58, 16.50s/it] {'loss': 0.5107, 'learning_rate': 1.8246785781840138e-06, 'epoch': 0.81}
5383/6640 [2:48:03<5:42:19, 16.34s/it] {'loss': 0.5148, 'learning_rate': 1.821870250254617e-06, 'epoch': 0.81}
5384/6640 [2:48:18<5:35:42, 16.04s/it] {'loss': 0.4884, 'learning_rate': 1.8190638685007111e-06, 'epoch': 0.81}
5385/6640 [2:48:35<5:37:56, 16.16s/it] {'loss': 0.506, 'learning_rate': 1.8162594335901363e-06, 'epoch': 0.81}
5386/6640 [2:48:51<5:36:42, 16.11s/it] {'loss': 0.4977, 'learning_rate': 1.8134569461902785e-06, 'epoch': 0.81}
5387/6640 [2:49:07<5:35:34, 16.07s/it] {'loss': 0.5402, 'learning_rate': 1.8106564069680476e-06, 'epoch': 0.81}
5388/6640 [2:49:23<5:36:23, 16.12s/it] {'loss': 0.5355, 'learning_rate': 1.8078578165898997e-06, 'epoch': 0.81}
5389/6640 [2:49:40<5:40:44, 16.34s/it] {'loss': 0.5233, 'learning_rate': 1.8050611757218251e-06, 'epoch': 0.81}
5390/6640 [2:49:58<5:49:07, 16.76s/it] {'loss': 0.5141, 'learning_rate': 1.802266485029347e-06, 'epoch': 0.81}
5391/6640 [2:50:14<5:48:08, 16.72s/it] {'loss': 0.5075, 'learning_rate': 1.7994737451775324e-06, 'epoch': 0.81}
5392/6640 [2:50:30<5:41:40, 16.43s/it] {'loss': 0.5107, 'learning_rate': 1.7966829568309708e-06, 'epoch': 0.81}
5393/6640 [2:50:47<5:42:06, 16.46s/it] {'loss': 0.501, 'learning_rate': 1.7938941206537997e-06, 'epoch': 0.81}
5394/6640 [2:51:03<5:40:16, 16.39s/it] {'loss': 0.5285, 'learning_rate': 1.791107237309685e-06, 'epoch': 0.81}
5395/6640 [2:51:19<5:39:20, 16.35s/it] {'loss': 0.5124, 'learning_rate': 1.7883223074618316e-06, 'epoch': 0.81}
5396/6640 [2:51:35<5:39:21, 16.37s/it] {'loss': 0.5189, 'learning_rate': 1.7855393317729808e-06, 'epoch': 0.81}
5397/6640 [2:51:52<5:37:40, 16.30s/it] {'loss': 0.5087, 'learning_rate': 1.782758310905398e-06, 'epoch': 0.81}
5398/6640 [2:52:08<5:36:10, 16.24s/it] {'loss': 0.5053, 'learning_rate': 1.7799792455209019e-06, 'epoch': 0.81}
5399/6640 [2:52:24<5:34:14, 16.16s/it] {'loss': 0.5203, 'learning_rate': 1.7772021362808279e-06, 'epoch': 0.81}
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
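The AutoResumeHook banner fires on every rank just before steps that are multiples of 50 (3200, 5350, 5400, 5450 above), and full three-part checkpoints (llm, vision_tower, mm_projector) appear at 3200 and 5400; both look like plain step-interval triggers. A hedged sketch of that cadence (the interval values are guesses read off this log, not taken from the training config):

```python
def fires_at(step: int, interval: int) -> bool:
    """True when a step-interval hook (suspend check, checkpoint save)
    would trigger at this global step."""
    return step > 0 and step % interval == 0


# Guessed intervals: suspend check every 50 steps, checkpoint every 200.
checks = [s for s in range(3190, 3210) if fires_at(s, 50)]
```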
5400/6640 [2:52:41<5:42:55, 16.59s/it] {'loss': 0.525, 'learning_rate': 1.774426983846058e-06, 'epoch': 0.81}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
5401/6640 [2:54:23<14:32:11, 42.24s/it] {'loss': 0.5107, 'learning_rate': 1.771653788876999e-06, 'epoch': 0.81}
5402/6640 [2:54:41<11:59:06, 34.85s/it] {'loss': 0.5222, 'learning_rate': 1.7688825520336017e-06, 'epoch': 0.81}
5403/6640 [2:54:57<10:02:16, 29.21s/it] {'loss': 0.5194, 'learning_rate': 1.7661132739753429e-06, 'epoch': 0.81}
5404/6640 [2:55:13<8:39:38, 25.23s/it] {'loss': 0.5301, 'learning_rate': 1.7633459553612387e-06, 'epoch': 0.81}
5405/6640 [2:55:29<7:44:48, 22.58s/it] {'loss': 0.5067, 'learning_rate': 1.760580596849838e-06, 'epoch': 0.81}
5406/6640 [2:55:45<7:02:42, 20.55s/it] {'loss': 0.5234, 'learning_rate': 1.7578171990992144e-06, 'epoch': 0.81}
5407/6640 [2:56:02<6:36:34, 19.30s/it] {'loss': 0.5168, 'learning_rate': 1.7550557627669928e-06, 'epoch': 0.81}
5408/6640 [2:56:18<6:19:15, 18.47s/it] {'loss': 0.5125, 'learning_rate': 1.7522962885103145e-06, 'epoch': 0.81}
5409/6640 [2:56:34<6:04:32, 17.77s/it] {'loss': 0.512, 'learning_rate': 1.749538776985864e-06, 'epoch': 0.81}
5410/6640 [2:56:51<6:01:02, 17.61s/it] {'loss': 0.5318, 'learning_rate': 1.746783228849851e-06, 'epoch': 0.81}
5411/6640 [2:57:08<5:54:45, 17.32s/it] {'loss': 0.5115, 'learning_rate': 1.744029644758023e-06, 'epoch': 0.81}
5412/6640 [2:57:25<5:49:29, 17.08s/it] {'loss': 0.5089, 'learning_rate': 1.7412780253656603e-06, 'epoch': 0.82}
5413/6640 [2:57:40<5:40:48, 16.67s/it] {'loss': 0.5, 'learning_rate': 1.7385283713275746e-06, 'epoch': 0.82}
5414/6640 [2:57:56<5:35:29, 16.42s/it] {'loss': 0.5235, 'learning_rate': 1.7357806832981127e-06, 'epoch': 0.82}
5415/6640 [2:58:13<5:35:15, 16.42s/it] {'loss': 0.5179, 'learning_rate': 1.7330349619311415e-06, 'epoch': 0.82}
5416/6640 [2:58:29<5:33:30, 16.35s/it] {'loss': 0.5072, 'learning_rate': 1.7302912078800805e-06, 'epoch': 0.82}
5417/6640 [2:58:45<5:31:32, 16.27s/it] {'loss': 0.4983, 'learning_rate': 1.7275494217978616e-06, 'epoch': 0.82}
5418/6640 [2:59:01<5:29:33, 16.18s/it] {'loss': 0.501, 'learning_rate': 1.724809604336961e-06, 'epoch': 0.82}
5419/6640 [2:59:17<5:27:49, 16.11s/it] {'loss': 0.4958, 'learning_rate': 1.7220717561493773e-06, 'epoch': 0.82}
5420/6640 [2:59:35<5:38:35, 16.65s/it] {'loss': 0.5059, 'learning_rate': 1.7193358778866464e-06, 'epoch': 0.82}
5421/6640 [2:59:52<5:42:24, 16.85s/it] {'loss': 0.5161, 'learning_rate': 1.716601970199836e-06, 'epoch': 0.82}
5422/6640 [3:00:08<5:39:45, 16.74s/it] {'loss': 0.5143, 'learning_rate': 1.713870033739541e-06, 'epoch': 0.82}
5423/6640 [3:00:25<5:35:47, 16.55s/it] {'loss': 0.5093, 'learning_rate': 1.7111400691558911e-06, 'epoch': 0.82}
5424/6640 [3:00:40<5:31:27, 16.35s/it] {'loss': 0.5128, 'learning_rate': 1.708412077098539e-06, 'epoch': 0.82}
5425/6640 [3:00:57<5:30:55, 16.34s/it] {'loss': 0.5245, 'learning_rate': 1.7056860582166823e-06, 'epoch': 0.82}
5426/6640 [3:01:13<5:28:06, 16.22s/it] {'loss': 0.523, 'learning_rate': 1.702962013159033e-06, 'epoch': 0.82}
5427/6640 [3:01:28<5:23:29, 16.00s/it] {'loss': 0.5062, 'learning_rate': 1.7002399425738459e-06, 'epoch': 0.82}
5428/6640 [3:01:44<5:23:16, 16.00s/it] {'loss': 0.5211, 'learning_rate': 1.6975198471088973e-06, 'epoch': 0.82}
5429/6640 [3:02:00<5:24:09, 16.06s/it] {'loss': 0.5289, 'learning_rate': 1.6948017274114959e-06, 'epoch': 0.82}
5430/6640 [3:02:17<5:27:25, 16.24s/it] {'loss': 0.5263, 'learning_rate': 1.6920855841284844e-06, 'epoch': 0.82}
5431/6640 [3:02:33<5:24:02, 16.08s/it] {'loss': 0.5046, 'learning_rate': 1.6893714179062315e-06, 'epoch': 0.82}
5432/6640 [3:02:49<5:23:19, 16.06s/it] {'loss': 0.4939, 'learning_rate': 1.6866592293906369e-06, 'epoch': 0.82}
5433/6640 [3:03:06<5:30:53, 16.45s/it] {'loss': 0.4983, 'learning_rate': 1.6839490192271225e-06, 'epoch': 0.82}
5434/6640 [3:03:23<5:33:03, 16.57s/it] {'loss': 0.504, 'learning_rate': 1.6812407880606563e-06, 'epoch': 0.82}
5435/6640 [3:03:39<5:31:12, 16.49s/it] {'loss': 0.4839, 'learning_rate': 1.6785345365357153e-06, 'epoch': 0.82}
5436/6640 [3:03:56<5:31:27, 16.52s/it] {'loss': 0.5269, 'learning_rate': 1.6758302652963176e-06, 'epoch': 0.82}
5437/6640 [3:04:12<5:27:51, 16.35s/it] {'loss': 0.5379, 'learning_rate': 1.6731279749860086e-06, 'epoch': 0.82}
5438/6640 [3:04:27<5:23:16, 16.14s/it] {'loss': 0.5041, 'learning_rate': 1.6704276662478602e-06, 'epoch': 0.82}
5439/6640 [3:04:44<5:23:23, 16.16s/it] {'loss': 0.5087, 'learning_rate': 1.6677293397244753e-06, 'epoch': 0.82}
5440/6640 [3:05:01<5:31:16, 16.56s/it] {'loss': 0.5175, 'learning_rate': 1.6650329960579792e-06, 'epoch': 0.82}
5441/6640 [3:05:17<5:24:21, 16.23s/it] {'loss': 0.493, 'learning_rate': 1.6623386358900339e-06, 'epoch': 0.82}
5442/6640 [3:05:33<5:25:56, 16.32s/it] {'loss': 0.5451, 'learning_rate': 1.6596462598618179e-06, 'epoch': 0.82}
5443/6640 [3:05:49<5:25:19, 16.31s/it] {'loss': 0.516, 'learning_rate': 1.656955868614053e-06, 'epoch': 0.82}
5444/6640 [3:06:06<5:23:24, 16.22s/it] {'loss': 0.5214, 'learning_rate': 1.6542674627869738e-06, 'epoch': 0.82}
5445/6640 [3:06:22<5:23:32, 16.24s/it] {'loss': 0.5153, 'learning_rate': 1.6515810430203516e-06, 'epoch': 0.82}
5446/6640 [3:06:38<5:20:35, 16.11s/it] {'loss': 0.5314, 'learning_rate': 1.648896609953481e-06, 'epoch': 0.82}
5447/6640 [3:06:54<5:21:55, 16.19s/it] {'loss': 0.5132, 'learning_rate': 1.6462141642251862e-06, 'epoch': 0.82}
5448/6640 [3:07:10<5:19:11, 16.07s/it] {'loss': 0.5056, 'learning_rate': 1.643533706473819e-06, 'epoch': 0.82}
5449/6640 [3:07:25<5:15:36, 15.90s/it] {'loss': 0.5167, 'learning_rate': 1.640855237337252e-06, 'epoch': 0.82}
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
5450/6640 [3:07:42<5:21:24, 16.21s/it] {'loss': 0.5092, 'learning_rate': 1.638178757452894e-06, 'epoch': 0.82}
5451/6640 [3:07:59<5:26:59, 16.50s/it] {'loss': 0.5095, 'learning_rate': 1.6355042674576671e-06, 'epoch': 0.82}
5452/6640 [3:08:16<5:27:37, 16.55s/it] {'loss': 0.5124, 'learning_rate': 1.632831767988039e-06, 'epoch': 0.82}
5453/6640 [3:08:33<5:27:51, 16.57s/it] {'loss': 0.4922, 'learning_rate': 1.6301612596799854e-06, 'epoch': 0.82}
5454/6640 [3:08:49<5:25:10, 16.45s/it] {'loss': 0.5065, 'learning_rate': 1.627492743169018e-06, 'epoch': 0.82}
5455/6640 [3:09:05<5:21:23, 16.27s/it] {'loss': 0.4938, 'learning_rate': 1.624826219090172e-06, 'epoch': 0.82}
5456/6640 [3:09:21<5:19:35, 16.20s/it] {'loss': 0.5161, 'learning_rate': 1.6221616880780078e-06, 'epoch': 0.82}
5457/6640 [3:09:37<5:20:53, 16.27s/it] {'loss': 0.5106, 'learning_rate': 1.6194991507666159e-06, 'epoch': 0.82}
5458/6640 [3:09:54<5:21:59, 16.34s/it] {'loss': 0.5034, 'learning_rate': 1.6168386077896036e-06, 'epoch': 0.82}
5459/6640 [3:10:10<5:21:20, 16.33s/it] {'loss': 0.5282, 'learning_rate': 1.6141800597801139e-06, 'epoch': 0.82}
5460/6640 [3:10:26<5:22:27, 16.40s/it] {'loss': 0.5082, 'learning_rate': 1.6115235073708024e-06, 'epoch': 0.82}
5461/6640 [3:10:43<5:20:44, 16.32s/it] {'loss': 0.5126, 'learning_rate': 1.608868951193867e-06, 'epoch': 0.82}
5462/6640 [3:11:00<5:24:29, 16.53s/it] {'loss': 0.5137, 'learning_rate': 1.6062163918810136e-06, 'epoch': 0.82}
5463/6640 [3:11:17<5:26:44, 16.66s/it] {'loss': 0.5233, 'learning_rate': 1.6035658300634816e-06, 'epoch': 0.82}
5464/6640 [3:11:33<5:23:59, 16.53s/it] {'loss': 0.5025, 'learning_rate': 1.6009172663720352e-06, 'epoch': 0.82}
5465/6640 [3:11:49<5:23:28, 16.52s/it] {'loss': 0.5132, 'learning_rate': 1.5982707014369603e-06, 'epoch': 0.82}
5466/6640 [3:12:07<5:27:40, 16.75s/it] {'loss': 0.5164, 'learning_rate': 1.595626135888071e-06, 'epoch': 0.82}
5467/6640 [3:12:22<5:20:44, 16.41s/it] {'loss': 0.5088, 'learning_rate': 1.5929835703546992e-06, 'epoch': 0.82}
5468/6640 [3:12:39<5:22:37, 16.52s/it] {'loss': 0.5226, 'learning_rate': 1.5903430054657077e-06, 'epoch': 0.82}
5469/6640 [3:12:56<5:24:12, 16.61s/it] {'loss': 0.4909, 'learning_rate': 1.5877044418494747e-06, 'epoch': 0.82}
82%|████████▏ | 5470/6640 [3:13:12<5:19:39, 16.39s/it] {'loss': 0.5008, 'learning_rate': 1.585067880133916e-06, 'epoch': 0.82} 82%|████████▏ | 5470/6640 [3:13:12<5:19:39, 16.39s/it] 82%|████████▏ | 5471/6640 [3:13:29<5:23:00, 16.58s/it] {'loss': 0.5014, 'learning_rate': 1.582433320946456e-06, 'epoch': 0.82} 82%|████████▏ | 5471/6640 [3:13:29<5:23:00, 16.58s/it] 82%|████████▏ | 5472/6640 [3:13:46<5:28:05, 16.85s/it] {'loss': 0.5257, 'learning_rate': 1.57980076491405e-06, 'epoch': 0.82} 82%|████████▏ | 5472/6640 [3:13:46<5:28:05, 16.85s/it] 82%|████████▏ | 5473/6640 [3:14:03<5:25:21, 16.73s/it] {'loss': 0.5192, 'learning_rate': 1.5771702126631784e-06, 'epoch': 0.82} 82%|████████▏ | 5473/6640 [3:14:03<5:25:21, 16.73s/it] 82%|████████▏ | 5474/6640 [3:14:19<5:23:37, 16.65s/it] {'loss': 0.5135, 'learning_rate': 1.5745416648198386e-06, 'epoch': 0.82} 82%|████████▏ | 5474/6640 [3:14:19<5:23:37, 16.65s/it] 82%|████████▏ | 5475/6640 [3:14:36<5:23:20, 16.65s/it] {'loss': 0.5226, 'learning_rate': 1.5719151220095596e-06, 'epoch': 0.82} 82%|████████▏ | 5475/6640 [3:14:36<5:23:20, 16.65s/it] 82%|████████▏ | 5476/6640 [3:14:53<5:24:35, 16.73s/it] {'loss': 0.5002, 'learning_rate': 1.5692905848573836e-06, 'epoch': 0.82} 82%|████████▏ | 5476/6640 [3:14:53<5:24:35, 16.73s/it] 82%|████████▏ | 5477/6640 [3:15:09<5:21:06, 16.57s/it] {'loss': 0.5352, 'learning_rate': 1.5666680539878797e-06, 'epoch': 0.82} 82%|████████▏ | 5477/6640 [3:15:09<5:21:06, 16.57s/it] 82%|████████▎ | 5478/6640 [3:15:25<5:20:52, 16.57s/it] {'loss': 0.5131, 'learning_rate': 1.5640475300251423e-06, 'epoch': 0.82} 82%|████████▎ | 5478/6640 [3:15:25<5:20:52, 16.57s/it] 83%|████████▎ | 5479/6640 [3:15:42<5:21:24, 16.61s/it] {'loss': 0.5026, 'learning_rate': 1.5614290135927857e-06, 'epoch': 0.83} 83%|████████▎ | 5479/6640 [3:15:42<5:21:24, 16.61s/it] 83%|████████▎ | 5480/6640 [3:15:59<5:21:21, 16.62s/it] {'loss': 0.5061, 'learning_rate': 1.558812505313947e-06, 'epoch': 0.83} 83%|████████▎ | 5480/6640 [3:15:59<5:21:21, 
16.62s/it] 83%|████████▎ | 5481/6640 [3:16:15<5:21:05, 16.62s/it] {'loss': 0.5001, 'learning_rate': 1.5561980058112825e-06, 'epoch': 0.83} 83%|████████▎ | 5481/6640 [3:16:15<5:21:05, 16.62s/it] 83%|████████▎ | 5482/6640 [3:16:32<5:20:37, 16.61s/it] {'loss': 0.4972, 'learning_rate': 1.5535855157069734e-06, 'epoch': 0.83} 83%|████████▎ | 5482/6640 [3:16:32<5:20:37, 16.61s/it] 83%|████████▎ | 5483/6640 [3:16:48<5:15:59, 16.39s/it] {'loss': 0.5159, 'learning_rate': 1.5509750356227249e-06, 'epoch': 0.83} 83%|████████▎ | 5483/6640 [3:16:48<5:15:59, 16.39s/it] 83%|████████▎ | 5484/6640 [3:17:04<5:16:13, 16.41s/it] {'loss': 0.513, 'learning_rate': 1.5483665661797598e-06, 'epoch': 0.83} 83%|████████▎ | 5484/6640 [3:17:04<5:16:13, 16.41s/it] 83%|████████▎ | 5485/6640 [3:17:20<5:13:07, 16.27s/it] {'loss': 0.4897, 'learning_rate': 1.5457601079988226e-06, 'epoch': 0.83} 83%|████████▎ | 5485/6640 [3:17:20<5:13:07, 16.27s/it] 83%|████████▎ | 5486/6640 [3:17:36<5:11:46, 16.21s/it] {'loss': 0.5073, 'learning_rate': 1.5431556617001808e-06, 'epoch': 0.83} 83%|████████▎ | 5486/6640 [3:17:36<5:11:46, 16.21s/it] 83%|████████▎ | 5487/6640 [3:17:52<5:09:01, 16.08s/it] {'loss': 0.5248, 'learning_rate': 1.540553227903624e-06, 'epoch': 0.83} 83%|████████▎ | 5487/6640 [3:17:52<5:09:01, 16.08s/it] 83%|████████▎ | 5488/6640 [3:18:09<5:10:29, 16.17s/it] {'loss': 0.5126, 'learning_rate': 1.53795280722846e-06, 'epoch': 0.83} 83%|████████▎ | 5488/6640 [3:18:09<5:10:29, 16.17s/it] 83%|████████▎ | 5489/6640 [3:18:25<5:12:16, 16.28s/it] {'loss': 0.5165, 'learning_rate': 1.5353544002935229e-06, 'epoch': 0.83} 83%|████████▎ | 5489/6640 [3:18:25<5:12:16, 16.28s/it] 83%|████████▎ | 5490/6640 [3:18:41<5:12:17, 16.29s/it] {'loss': 0.4936, 'learning_rate': 1.5327580077171589e-06, 'epoch': 0.83} 83%|████████▎ | 5490/6640 [3:18:41<5:12:17, 16.29s/it] 83%|████████▎ | 5491/6640 [3:18:58<5:12:35, 16.32s/it] {'loss': 0.4918, 'learning_rate': 1.5301636301172418e-06, 'epoch': 0.83} 83%|████████▎ | 5491/6640 
[3:18:58<5:12:35, 16.32s/it] 83%|████████▎ | 5492/6640 [3:19:14<5:12:58, 16.36s/it] {'loss': 0.5104, 'learning_rate': 1.5275712681111643e-06, 'epoch': 0.83} 83%|████████▎ | 5492/6640 [3:19:14<5:12:58, 16.36s/it] 83%|████████▎ | 5493/6640 [3:19:30<5:12:00, 16.32s/it] {'loss': 0.5209, 'learning_rate': 1.5249809223158406e-06, 'epoch': 0.83} 83%|████████▎ | 5493/6640 [3:19:30<5:12:00, 16.32s/it] 83%|████████▎ | 5494/6640 [3:19:47<5:13:43, 16.43s/it] {'loss': 0.5278, 'learning_rate': 1.5223925933477002e-06, 'epoch': 0.83} 83%|████████▎ | 5494/6640 [3:19:47<5:13:43, 16.43s/it] 83%|████████▎ | 5495/6640 [3:20:03<5:11:29, 16.32s/it] {'loss': 0.5076, 'learning_rate': 1.5198062818226967e-06, 'epoch': 0.83} 83%|████████▎ | 5495/6640 [3:20:03<5:11:29, 16.32s/it] 83%|████████▎ | 5496/6640 [3:20:20<5:11:03, 16.31s/it] {'loss': 0.5086, 'learning_rate': 1.5172219883563033e-06, 'epoch': 0.83} 83%|████████▎ | 5496/6640 [3:20:20<5:11:03, 16.31s/it] 83%|████████▎ | 5497/6640 [3:20:37<5:18:50, 16.74s/it] {'loss': 0.4917, 'learning_rate': 1.514639713563514e-06, 'epoch': 0.83} 83%|████████▎ | 5497/6640 [3:20:37<5:18:50, 16.74s/it] 83%|████████▎ | 5498/6640 [3:20:53<5:14:25, 16.52s/it] {'loss': 0.5207, 'learning_rate': 1.512059458058841e-06, 'epoch': 0.83} 83%|████████▎ | 5498/6640 [3:20:53<5:14:25, 16.52s/it] 83%|████████▎ | 5499/6640 [3:21:10<5:16:27, 16.64s/it] {'loss': 0.5045, 'learning_rate': 1.5094812224563117e-06, 'epoch': 0.83} 83%|████████▎ | 5499/6640 [3:21:10<5:16:27, 16.64s/it]5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend...6 4 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 03 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 
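The per-step entries above all share one shape: a tqdm bar fragment followed by the Trainer's metrics dict. A small, hypothetical helper (not part of the training code) for pulling step, loss, and learning rate out of such lines, e.g. before plotting the curve:

```python
import re

# Matches "<step>/<total> [<times>] {'loss': ..., 'learning_rate': ..., 'epoch': ...}"
# as printed in this log. The helper name and dict layout are our own choices.
LINE_RE = re.compile(
    r"(\d+)/(\d+) \[[^\]]+\]\s*\{'loss': ([\d.]+), "
    r"'learning_rate': ([\d.e+-]+), 'epoch': ([\d.]+)\}"
)

def parse_step(line):
    """Return the first progress entry found in `line`, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    step, total, loss, lr, epoch = m.groups()
    return {"step": int(step), "total": int(total),
            "loss": float(loss), "lr": float(lr), "epoch": float(epoch)}

sample = ("82%|████████▏ | 5450/6640 [3:07:42<5:21:24, 16.21s/it] "
          "{'loss': 0.5092, 'learning_rate': 1.638178757452894e-06, 'epoch': 0.82}")
rec = parse_step(sample)
# rec -> {'step': 5450, 'total': 6640, 'loss': 0.5092, 'lr': 1.638178757452894e-06, 'epoch': 0.82}
```

Because tqdm echoes each bar twice, deduplicating on `step` is advisable before analysis.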
 83%|████████▎ | 5500/6640 [3:21:27<5:17:03, 16.69s/it] {'loss': 0.5128, 'learning_rate': 1.5069050073694813e-06, 'epoch': 0.83}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn(
 83%|████████▎ | 5501/6640 [3:23:14<13:52:32, 43.86s/it] {'loss': 0.5102, 'learning_rate': 1.5043308134114177e-06, 'epoch': 0.83}
[steps 5502–5548 elided: per-step time recovers from the checkpoint write (43.86 s/it at step 5501) back to ~16.3 s/it; loss 0.486–0.539, learning rate decaying from 1.502e-06 to 1.386e-06]
 84%|████████▎ | 5549/6640 [3:36:16<4:54:15, 16.18s/it] {'loss': 0.4927, 'learning_rate': 1.3831582464927685e-06, 'epoch': 0.84}
[all ranks, interleaved] AutoResumeHook: Checking whether to suspend...
 84%|████████▎ | 5550/6640 [3:36:32<4:54:53, 16.23s/it] {'loss': 0.5106, 'learning_rate': 1.3806838288027113e-06, 'epoch': 0.84}
[steps 5551–5598 elided: loss 0.486–0.534, learning rate decaying from 1.378e-06 to 1.264e-06, ~16.0–16.8 s/it]
 84%|████████▍ | 5599/6640 [3:49:54<4:40:27, 16.16s/it] {'loss': 0.5082, 'learning_rate': 1.2619614594208972e-06, 'epoch': 0.84}
[all ranks, interleaved] AutoResumeHook: Checking whether to suspend...
 84%|████████▍ | 5600/6640 [3:50:11<4:42:42, 16.31s/it]
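The remaining-time field on the bar is internally consistent with the reported rate: tqdm estimates it as (total − n) × seconds-per-iteration. A quick check against the 5600/6640 entry from this log:

```python
# ETA sanity check using the values printed at step 5600 (16.31 s/it).
remaining_steps = 6640 - 5600
eta_seconds = int(remaining_steps * 16.31)

h, rem = divmod(eta_seconds, 3600)
m, s = divmod(rem, 60)
eta = f"{h}:{m:02d}:{s:02d}"
# eta -> "4:42:42", matching the "<4:42:42" shown on the bar
```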
{'loss': 0.5053, 'learning_rate': 1.259590311167238e-06, 'epoch': 0.84}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5600/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5600/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-5600/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn(
 84%|████████▍ | 5601/6640 [3:52:02<12:53:25, 44.66s/it] {'loss': 0.5196, 'learning_rate': 1.2572212428966079e-06, 'epoch': 0.84}
[steps 5602–5617 elided: per-step time recovers from the checkpoint write (44.66 s/it at step 5601) back to ~16.5–17 s/it; loss 0.488–0.534, learning rate decaying from 1.255e-06 to 1.220e-06]
 85%|████████▍ | 5618/6640 [3:56:44<4:43:27, 16.64s/it] {'loss': 0.5005, 'learning_rate': 1.2172658640766622e-06, 'epoch': 0.85}
May 28 22:16:36.587682 1503970 slurmstepd 0x155550a06700: error: *** STEP 8296786.0 ON batch-block1-2107 CANCELLED AT 2025-05-28T22:16:36 DUE TO TIME LIMIT ***
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.
srun: error: batch-block1-2107: task 0: Terminated
srun: Terminating StepId=8296786.0
srun: job 8299338 queued and waiting for resources
srun: job 8299338 has been allocated resources
wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block1-2107
JobID: 8299338 | Full list: batch-block1-2107
NETWORK=Efficient-Large-Model/VILA1.5-13b
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Did not find AutoResume SDK! (printed once per rank, ×8)
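The two slow iterations immediately after each save (steps 5501 and 5601) put a rough number on the cost of a full checkpoint write. A back-of-envelope from the values in this log, where the ~16.3 s/it steady-state baseline is an eyeball estimate rather than a computed mean:

```python
# Estimated extra wall-clock cost of each checkpoint write in this log.
# baseline_s_per_it is an eyeballed steady-state assumption, not measured exactly.
baseline_s_per_it = 16.3
post_save_s_per_it = {5501: 43.86, 5601: 44.66}  # s/it reported right after each save

overhead = {step: round(t - baseline_s_per_it, 2)
            for step, t in post_save_s_per_it.items()}
# overhead -> {5501: 27.56, 5601: 28.36}: roughly half a minute per save
```

At a save interval of 100 steps this is well under 2% of wall-clock time, so the save frequency is not what cost this run its time limit.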
[2025-05-28 22:18:38,802] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-28 22:18:39,708] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-28 22:18:39,708] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-28 22:18:39,709] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
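The Flash Attention warning above asks for the initialize-on-CPU, then move-to-GPU pattern. A sketch of the generic transformers load pattern that triggers and then silences this warning (the helper is hypothetical; the exact loader VILA uses may differ):

```python
# Sketch of the load pattern the warning asks for: initialize on CPU,
# then move the model to GPU. kwargs shown are the standard transformers
# flags; this is an illustration, not VILA's actual loading code.

def flash_attn_load_kwargs() -> dict:
    # attn_implementation="flash_attention_2" is what emits the
    # "model not initialized on GPU" warning while weights are on CPU.
    return {
        "attn_implementation": "flash_attention_2",
        "torch_dtype": "bfloat16",  # flash-attn requires fp16/bf16
        "low_cpu_mem_usage": True,
    }

# Usage (not executed here):
#   model = AutoModelForCausalLM.from_pretrained(name, **flash_attn_load_kwargs())
#   model.to("cuda")  # silences the warning: weights now live on GPU
print(flash_attn_load_kwargs()["attn_implementation"])  # flash_attention_2
```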
[2025-05-28 22:18:47,167] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 13.02B parameters
Loading checkpoint shards: 0%| | 0/6 [00:00
Token indices sequence length is longer than the specified maximum sequence length for this model (… > 4096). Running this sequence through the model will result in indexing errors
 95%|█████████▍| 6291/6640 [3:18:45<1:35:03, 16.34s/it] {'loss': 0.4974, 'learning_rate': 1.4457713018117935e-07, 'epoch': 0.95}
 95%|█████████▍| 6292/6640 [3:19:01<1:34:41, 16.33s/it] {'loss': 0.5232, 'learning_rate': 1.4375178286058167e-07, 'epoch': 0.95}
 95%|█████████▍| 6293/6640 [3:19:18<1:34:27, 16.33s/it] {'loss': 0.5091, 'learning_rate': 1.4292878106265118e-07, 'epoch': 0.95}
 95%|█████████▍| 6294/6640 [3:19:35<1:35:21, 16.54s/it] {'loss': 0.5132, 'learning_rate': 1.4210812498324012e-07, 'epoch': 0.95}
 95%|█████████▍| 6295/6640 [3:19:52<1:35:55, 16.68s/it] {'loss': 0.5198, 'learning_rate': 1.4128981481764115e-07, 'epoch': 0.95}
 95%|█████████▍| 6296/6640 [3:20:08<1:35:16, 16.62s/it] {'loss': 0.5193, 'learning_rate': 1.4047385076059072e-07, 'epoch': 0.95}
 95%|█████████▍| 6297/6640 [3:20:25<1:34:16, 16.49s/it] {'loss': 0.4945, 'learning_rate': 1.3966023300626685e-07, 'epoch': 0.95}
 95%|█████████▍| 6298/6640 [3:20:41<1:33:55, 16.48s/it] {'loss': 0.5177, 'learning_rate': 1.388489617482891e-07, 'epoch': 0.95}
 95%|█████████▍| 6299/6640 [3:20:57<1:33:27, 16.44s/it] {'loss': 0.5007, 'learning_rate': 1.3804003717971637e-07, 'epoch': 0.95}
AutoResumeHook: Checking whether to suspend...
 95%|█████████▍| 6300/6640 [3:21:13<1:32:05, 16.25s/it] {'loss': 0.5133, 'learning_rate': 1.3723345949305245e-07, 'epoch': 0.95}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-6300/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-6300/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-6300/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( 95%|█████████▍| 6301/6640 [3:22:38<3:28:34, 36.92s/it] {'loss': 0.519, 'learning_rate': 1.3642922888024047e-07, 'epoch': 0.95} 95%|█████████▍| 6301/6640 [3:22:38<3:28:34, 36.92s/it] 95%|█████████▍| 6302/6640 [3:22:56<2:54:46, 31.03s/it] {'loss': 0.5095, 'learning_rate': 1.356273455326662e-07, 'epoch': 0.95} 95%|█████████▍| 6302/6640 [3:22:56<2:54:46, 31.03s/it] 95%|█████████▍| 6303/6640 [3:23:12<2:29:07, 26.55s/it] {'loss': 0.5007, 'learning_rate': 1.3482780964115705e-07, 'epoch': 0.95} 95%|█████████▍| 6303/6640 [3:23:12<2:29:07, 26.55s/it] 95%|█████████▍| 6304/6640 [3:23:28<2:12:18, 23.63s/it] {'loss': 0.5117, 'learning_rate': 1.3403062139598078e-07, 'epoch': 0.95} 95%|█████████▍| 6304/6640 [3:23:28<2:12:18, 23.63s/it] 95%|█████████▍| 6305/6640 [3:23:45<1:59:33, 21.41s/it] {'loss': 0.5231, 'learning_rate': 1.3323578098684565e-07, 'epoch': 0.95} 95%|█████████▍| 6305/6640 [3:23:45<1:59:33, 21.41s/it] 95%|█████████▍| 6306/6640 [3:24:00<1:49:49, 19.73s/it] {'loss': 0.5157, 'learning_rate': 1.3244328860290257e-07, 'epoch': 0.95} 95%|█████████▍| 6306/6640 [3:24:00<1:49:49, 19.73s/it] 95%|█████████▍| 6307/6640 [3:24:17<1:43:59, 18.74s/it] {'loss': 0.4963, 'learning_rate': 1.3165314443274623e-07, 'epoch': 0.95} 95%|█████████▍| 6307/6640 [3:24:17<1:43:59, 18.74s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated! 
warnings.warn("Inputs truncated!") 95%|█████████▌| 6308/6640 [3:24:33<1:40:05, 18.09s/it] {'loss': 0.5086, 'learning_rate': 1.3086534866440515e-07, 'epoch': 0.95} 95%|█████████▌| 6308/6640 [3:24:33<1:40:05, 18.09s/it] 95%|█████████▌| 6309/6640 [3:24:49<1:35:56, 17.39s/it] {'loss': 0.5183, 'learning_rate': 1.300799014853571e-07, 'epoch': 0.95} 95%|█████████▌| 6309/6640 [3:24:49<1:35:56, 17.39s/it] 95%|█████████▌| 6310/6640 [3:25:06<1:33:48, 17.06s/it] {'loss': 0.5058, 'learning_rate': 1.292968030825159e-07, 'epoch': 0.95} 95%|█████████▌| 6310/6640 [3:25:06<1:33:48, 17.06s/it] 95%|█████████▌| 6311/6640 [3:25:22<1:32:02, 16.79s/it] {'loss': 0.505, 'learning_rate': 1.285160536422392e-07, 'epoch': 0.95} 95%|█████████▌| 6311/6640 [3:25:22<1:32:02, 16.79s/it] 95%|█████████▌| 6312/6640 [3:25:38<1:30:55, 16.63s/it] {'loss': 0.5056, 'learning_rate': 1.2773765335032384e-07, 'epoch': 0.95} 95%|█████████▌| 6312/6640 [3:25:38<1:30:55, 16.63s/it] 95%|█████████▌| 6313/6640 [3:25:54<1:30:03, 16.52s/it] {'loss': 0.4945, 'learning_rate': 1.2696160239200728e-07, 'epoch': 0.95} 95%|█████████▌| 6313/6640 [3:25:54<1:30:03, 16.52s/it] 95%|█████████▌| 6314/6640 [3:26:10<1:29:10, 16.41s/it] {'loss': 0.5074, 'learning_rate': 1.2618790095196953e-07, 'epoch': 0.95} 95%|█████████▌| 6314/6640 [3:26:10<1:29:10, 16.41s/it] 95%|█████████▌| 6315/6640 [3:26:26<1:27:28, 16.15s/it] {'loss': 0.5069, 'learning_rate': 1.2541654921432998e-07, 'epoch': 0.95} 95%|█████████▌| 6315/6640 [3:26:26<1:27:28, 16.15s/it] 95%|█████████▌| 6316/6640 [3:26:42<1:26:30, 16.02s/it] {'loss': 0.5196, 'learning_rate': 1.2464754736265183e-07, 'epoch': 0.95} 95%|█████████▌| 6316/6640 [3:26:42<1:26:30, 16.02s/it] 95%|█████████▌| 6317/6640 [3:26:58<1:26:02, 15.98s/it] {'loss': 0.5131, 'learning_rate': 1.2388089557993533e-07, 'epoch': 0.95} 95%|█████████▌| 6317/6640 [3:26:58<1:26:02, 15.98s/it] 95%|█████████▌| 6318/6640 [3:27:15<1:27:26, 16.29s/it] {'loss': 0.5044, 'learning_rate': 1.231165940486234e-07, 'epoch': 0.95} 
95%|█████████▌| 6318/6640 [3:27:15<1:27:26, 16.29s/it] 95%|█████████▌| 6319/6640 [3:27:31<1:27:21, 16.33s/it] {'loss': 0.5023, 'learning_rate': 1.223546429505984e-07, 'epoch': 0.95} 95%|█████████▌| 6319/6640 [3:27:31<1:27:21, 16.33s/it] 95%|█████████▌| 6320/6640 [3:27:48<1:28:09, 16.53s/it] {'loss': 0.5185, 'learning_rate': 1.2159504246718522e-07, 'epoch': 0.95} 95%|█████████▌| 6320/6640 [3:27:48<1:28:09, 16.53s/it] 95%|█████████▌| 6321/6640 [3:28:05<1:27:58, 16.55s/it] {'loss': 0.502, 'learning_rate': 1.208377927791482e-07, 'epoch': 0.95} 95%|█████████▌| 6321/6640 [3:28:05<1:27:58, 16.55s/it] 95%|█████████▌| 6322/6640 [3:28:20<1:26:38, 16.35s/it] {'loss': 0.5066, 'learning_rate': 1.2008289406669206e-07, 'epoch': 0.95} 95%|█████████▌| 6322/6640 [3:28:20<1:26:38, 16.35s/it] 95%|█████████▌| 6323/6640 [3:28:37<1:26:57, 16.46s/it] {'loss': 0.5247, 'learning_rate': 1.1933034650946306e-07, 'epoch': 0.95} 95%|█████████▌| 6323/6640 [3:28:37<1:26:57, 16.46s/it] 95%|█████████▌| 6324/6640 [3:28:53<1:25:47, 16.29s/it] {'loss': 0.5164, 'learning_rate': 1.1858015028654801e-07, 'epoch': 0.95} 95%|█████████▌| 6324/6640 [3:28:53<1:25:47, 16.29s/it] 95%|█████████▌| 6325/6640 [3:29:09<1:24:20, 16.07s/it] {'loss': 0.4956, 'learning_rate': 1.1783230557647075e-07, 'epoch': 0.95} 95%|█████████▌| 6325/6640 [3:29:09<1:24:20, 16.07s/it] 95%|█████████▌| 6326/6640 [3:29:26<1:26:39, 16.56s/it] {'loss': 0.5129, 'learning_rate': 1.1708681255720223e-07, 'epoch': 0.95} 95%|█████████▌| 6326/6640 [3:29:26<1:26:39, 16.56s/it] 95%|█████████▌| 6327/6640 [3:29:43<1:26:01, 16.49s/it] {'loss': 0.492, 'learning_rate': 1.1634367140614611e-07, 'epoch': 0.95} 95%|█████████▌| 6327/6640 [3:29:43<1:26:01, 16.49s/it] 95%|█████████▌| 6328/6640 [3:29:59<1:24:51, 16.32s/it] {'loss': 0.506, 'learning_rate': 1.1560288230015204e-07, 'epoch': 0.95} 95%|█████████▌| 6328/6640 [3:29:59<1:24:51, 16.32s/it] 95%|█████████▌| 6329/6640 [3:30:15<1:24:20, 16.27s/it] {'loss': 0.5171, 'learning_rate': 1.1486444541550679e-07, 
'epoch': 0.95} 95%|█████████▌| 6329/6640 [3:30:15<1:24:20, 16.27s/it] 95%|█████████▌| 6330/6640 [3:30:31<1:24:19, 16.32s/it] {'loss': 0.5045, 'learning_rate': 1.1412836092793977e-07, 'epoch': 0.95} 95%|█████████▌| 6330/6640 [3:30:31<1:24:19, 16.32s/it] 95%|█████████▌| 6331/6640 [3:30:47<1:23:20, 16.18s/it] {'loss': 0.5081, 'learning_rate': 1.1339462901261867e-07, 'epoch': 0.95} 95%|█████████▌| 6331/6640 [3:30:47<1:23:20, 16.18s/it] 95%|█████████▌| 6332/6640 [3:31:05<1:25:13, 16.60s/it] {'loss': 0.4999, 'learning_rate': 1.1266324984415266e-07, 'epoch': 0.95} 95%|█████████▌| 6332/6640 [3:31:05<1:25:13, 16.60s/it] 95%|█████████▌| 6333/6640 [3:31:22<1:25:33, 16.72s/it] {'loss': 0.5367, 'learning_rate': 1.1193422359658924e-07, 'epoch': 0.95} 95%|█████████▌| 6333/6640 [3:31:22<1:25:33, 16.72s/it] 95%|█████████▌| 6334/6640 [3:31:38<1:24:17, 16.53s/it] {'loss': 0.4759, 'learning_rate': 1.1120755044341736e-07, 'epoch': 0.95} 95%|█████████▌| 6334/6640 [3:31:38<1:24:17, 16.53s/it] 95%|█████████▌| 6335/6640 [3:31:54<1:23:04, 16.34s/it] {'loss': 0.5079, 'learning_rate': 1.1048323055756649e-07, 'epoch': 0.95} 95%|█████████▌| 6335/6640 [3:31:54<1:23:04, 16.34s/it] 95%|█████████▌| 6336/6640 [3:32:09<1:21:57, 16.18s/it] {'loss': 0.506, 'learning_rate': 1.097612641114043e-07, 'epoch': 0.95} 95%|█████████▌| 6336/6640 [3:32:09<1:21:57, 16.18s/it] 95%|█████████▌| 6337/6640 [3:32:26<1:22:54, 16.42s/it] {'loss': 0.5012, 'learning_rate': 1.0904165127674116e-07, 'epoch': 0.95} 95%|█████████▌| 6337/6640 [3:32:26<1:22:54, 16.42s/it] 95%|█████████▌| 6338/6640 [3:32:44<1:23:57, 16.68s/it] {'loss': 0.5093, 'learning_rate': 1.0832439222482338e-07, 'epoch': 0.95} 95%|█████████▌| 6338/6640 [3:32:44<1:23:57, 16.68s/it] 95%|█████████▌| 6339/6640 [3:33:00<1:23:29, 16.64s/it] {'loss': 0.5248, 'learning_rate': 1.0760948712634112e-07, 'epoch': 0.95} 95%|█████████▌| 6339/6640 [3:33:00<1:23:29, 16.64s/it] 95%|█████████▌| 6340/6640 [3:33:17<1:23:16, 16.66s/it] {'loss': 0.5235, 'learning_rate': 
1.068969361514216e-07, 'epoch': 0.95}
 95%|█████████▌| 6341/6640 [3:33:33<1:22:11, 16.49s/it] {'loss': 0.5253, 'learning_rate': 1.0618673946963365e-07, 'epoch': 0.95}
 96%|█████████▌| 6342/6640 [3:33:49<1:21:46, 16.47s/it] {'loss': 0.5189, 'learning_rate': 1.0547889724998428e-07, 'epoch': 0.96}
 96%|█████████▌| 6343/6640 [3:34:06<1:21:14, 16.41s/it] {'loss': 0.5156, 'learning_rate': 1.0477340966092097e-07, 'epoch': 0.96}
 96%|█████████▌| 6344/6640 [3:34:22<1:20:14, 16.26s/it] {'loss': 0.4865, 'learning_rate': 1.0407027687033166e-07, 'epoch': 0.96}
 96%|█████████▌| 6345/6640 [3:34:38<1:20:39, 16.41s/it] {'loss': 0.4949, 'learning_rate': 1.033694990455425e-07, 'epoch': 0.96}
 96%|█████████▌| 6346/6640 [3:34:54<1:19:58, 16.32s/it] {'loss': 0.5116, 'learning_rate': 1.0267107635331897e-07, 'epoch': 0.96}
 96%|█████████▌| 6347/6640 [3:35:11<1:20:15, 16.43s/it] {'loss': 0.5137, 'learning_rate': 1.0197500895986922e-07, 'epoch': 0.96}
 96%|█████████▌| 6348/6640 [3:35:28<1:20:31, 16.55s/it] {'loss': 0.5034, 'learning_rate': 1.0128129703083634e-07, 'epoch': 0.96}
 96%|█████████▌| 6349/6640 [3:35:45<1:20:42, 16.64s/it] {'loss': 0.5018, 'learning_rate': 1.0058994073130712e-07, 'epoch': 0.96}
AutoResumeHook: Checking whether to suspend...
 96%|█████████▌| 6350/6640 [3:36:02<1:20:35, 16.67s/it] {'loss': 0.5132, 'learning_rate': 9.990094022580332e-08, 'epoch': 0.96}
 96%|█████████▌| 6351/6640 [3:36:17<1:19:05, 16.42s/it] {'loss': 0.5322, 'learning_rate': 9.921429567829043e-08, 'epoch': 0.96}
 96%|█████████▌| 6352/6640 [3:36:33<1:18:16, 16.31s/it] {'loss': 0.4868, 'learning_rate': 9.85300072521711e-08, 'epoch': 0.96}
 96%|█████████▌| 6353/6640 [3:36:50<1:18:04, 16.32s/it] {'loss': 0.5353, 'learning_rate': 9.784807511028837e-08, 'epoch': 0.96}
 96%|█████████▌| 6354/6640 [3:37:06<1:18:06, 16.39s/it] {'loss': 0.5015, 'learning_rate': 9.716849941492135e-08, 'epoch': 0.96}
 96%|█████████▌| 6355/6640 [3:37:23<1:18:20, 16.49s/it] {'loss': 0.5273, 'learning_rate': 9.649128032779287e-08, 'epoch': 0.96}
 96%|█████████▌| 6356/6640 [3:37:40<1:18:12, 16.52s/it] {'loss': 0.4961, 'learning_rate': 9.581641801006292e-08, 'epoch': 0.96}
 96%|█████████▌| 6357/6640 [3:37:56<1:17:21, 16.40s/it] {'loss': 0.5202, 'learning_rate': 9.514391262233081e-08, 'epoch': 0.96}
 96%|█████████▌| 6358/6640 [3:38:13<1:17:54, 16.58s/it] {'loss': 0.4762, 'learning_rate': 9.447376432463295e-08, 'epoch': 0.96}
 96%|█████████▌| 6359/6640 [3:38:30<1:17:50, 16.62s/it] {'loss': 0.5346, 'learning_rate': 9.380597327644847e-08, 'epoch': 0.96}
 96%|█████████▌| 6359/6640 
[3:38:30<1:17:50, 16.62s/it] 96%|█████████▌| 6360/6640 [3:38:46<1:17:23, 16.58s/it] {'loss': 0.5251, 'learning_rate': 9.314053963669245e-08, 'epoch': 0.96} 96%|█████████▌| 6360/6640 [3:38:46<1:17:23, 16.58s/it] 96%|█████████▌| 6361/6640 [3:39:02<1:16:40, 16.49s/it] {'loss': 0.5154, 'learning_rate': 9.247746356372156e-08, 'epoch': 0.96} 96%|█████████▌| 6361/6640 [3:39:02<1:16:40, 16.49s/it] 96%|█████████▌| 6362/6640 [3:39:19<1:16:48, 16.58s/it] {'loss': 0.5087, 'learning_rate': 9.181674521532957e-08, 'epoch': 0.96} 96%|█████████▌| 6362/6640 [3:39:19<1:16:48, 16.58s/it] 96%|█████████▌| 6363/6640 [3:39:35<1:15:29, 16.35s/it] {'loss': 0.5181, 'learning_rate': 9.115838474874849e-08, 'epoch': 0.96} 96%|█████████▌| 6363/6640 [3:39:35<1:15:29, 16.35s/it] 96%|█████████▌| 6364/6640 [3:39:51<1:15:06, 16.33s/it] {'loss': 0.5208, 'learning_rate': 9.0502382320653e-08, 'epoch': 0.96} 96%|█████████▌| 6364/6640 [3:39:51<1:15:06, 16.33s/it] 96%|█████████▌| 6365/6640 [3:40:07<1:14:09, 16.18s/it] {'loss': 0.51, 'learning_rate': 8.984873808715155e-08, 'epoch': 0.96} 96%|█████████▌| 6365/6640 [3:40:07<1:14:09, 16.18s/it] 96%|█████████▌| 6366/6640 [3:40:23<1:13:33, 16.11s/it] {'loss': 0.5046, 'learning_rate': 8.919745220379528e-08, 'epoch': 0.96} 96%|█████████▌| 6366/6640 [3:40:23<1:13:33, 16.11s/it] 96%|█████████▌| 6367/6640 [3:40:39<1:13:05, 16.06s/it] {'loss': 0.5019, 'learning_rate': 8.854852482557242e-08, 'epoch': 0.96} 96%|█████████▌| 6367/6640 [3:40:39<1:13:05, 16.06s/it] 96%|█████████▌| 6368/6640 [3:40:55<1:13:06, 16.13s/it] {'loss': 0.5059, 'learning_rate': 8.790195610691054e-08, 'epoch': 0.96} 96%|█████████▌| 6368/6640 [3:40:55<1:13:06, 16.13s/it] 96%|█████████▌| 6369/6640 [3:41:12<1:13:43, 16.32s/it] {'loss': 0.4818, 'learning_rate': 8.725774620167549e-08, 'epoch': 0.96} 96%|█████████▌| 6369/6640 [3:41:12<1:13:43, 16.32s/it] 96%|█████████▌| 6370/6640 [3:41:30<1:15:43, 16.83s/it] {'loss': 0.5013, 'learning_rate': 8.661589526317238e-08, 'epoch': 0.96} 96%|█████████▌| 6370/6640 
[3:41:30<1:15:43, 16.83s/it] 96%|█████████▌| 6371/6640 [3:41:46<1:14:51, 16.70s/it] {'loss': 0.4931, 'learning_rate': 8.597640344414348e-08, 'epoch': 0.96} 96%|█████████▌| 6371/6640 [3:41:46<1:14:51, 16.70s/it] 96%|█████████▌| 6372/6640 [3:42:03<1:13:59, 16.56s/it] {'loss': 0.4834, 'learning_rate': 8.533927089677152e-08, 'epoch': 0.96} 96%|█████████▌| 6372/6640 [3:42:03<1:13:59, 16.56s/it] 96%|█████████▌| 6373/6640 [3:42:20<1:14:37, 16.77s/it] {'loss': 0.5084, 'learning_rate': 8.470449777267631e-08, 'epoch': 0.96} 96%|█████████▌| 6373/6640 [3:42:20<1:14:37, 16.77s/it] 96%|█████████▌| 6374/6640 [3:42:37<1:14:17, 16.76s/it] {'loss': 0.5185, 'learning_rate': 8.407208422291702e-08, 'epoch': 0.96} 96%|█████████▌| 6374/6640 [3:42:37<1:14:17, 16.76s/it] 96%|█████████▌| 6375/6640 [3:42:53<1:13:42, 16.69s/it] {'loss': 0.4926, 'learning_rate': 8.344203039799214e-08, 'epoch': 0.96} 96%|█████████▌| 6375/6640 [3:42:53<1:13:42, 16.69s/it] 96%|█████████▌| 6376/6640 [3:43:09<1:13:01, 16.60s/it] {'loss': 0.5038, 'learning_rate': 8.281433644783621e-08, 'epoch': 0.96} 96%|█████████▌| 6376/6640 [3:43:09<1:13:01, 16.60s/it] 96%|█████████▌| 6377/6640 [3:43:26<1:12:51, 16.62s/it] {'loss': 0.5242, 'learning_rate': 8.218900252182415e-08, 'epoch': 0.96} 96%|█████████▌| 6377/6640 [3:43:26<1:12:51, 16.62s/it] 96%|█████████▌| 6378/6640 [3:43:42<1:12:03, 16.50s/it] {'loss': 0.5032, 'learning_rate': 8.156602876876918e-08, 'epoch': 0.96} 96%|█████████▌| 6378/6640 [3:43:42<1:12:03, 16.50s/it] 96%|█████████▌| 6379/6640 [3:43:58<1:10:51, 16.29s/it] {'loss': 0.5082, 'learning_rate': 8.094541533692047e-08, 'epoch': 0.96} 96%|█████████▌| 6379/6640 [3:43:58<1:10:51, 16.29s/it] 96%|█████████▌| 6380/6640 [3:44:14<1:10:26, 16.26s/it] {'loss': 0.5187, 'learning_rate': 8.032716237396987e-08, 'epoch': 0.96} 96%|█████████▌| 6380/6640 [3:44:14<1:10:26, 16.26s/it] 96%|█████████▌| 6381/6640 [3:44:30<1:09:16, 16.05s/it] {'loss': 0.494, 'learning_rate': 7.971127002704304e-08, 'epoch': 0.96} 96%|█████████▌| 
6381/6640 [3:44:30<1:09:16, 16.05s/it] 96%|█████████▌| 6382/6640 [3:44:45<1:08:17, 15.88s/it] {'loss': 0.5188, 'learning_rate': 7.909773844270718e-08, 'epoch': 0.96} 96%|█████████▌| 6382/6640 [3:44:45<1:08:17, 15.88s/it] 96%|█████████▌| 6383/6640 [3:45:01<1:07:59, 15.88s/it] {'loss': 0.514, 'learning_rate': 7.84865677669655e-08, 'epoch': 0.96} 96%|█████████▌| 6383/6640 [3:45:01<1:07:59, 15.88s/it] 96%|█████████▌| 6384/6640 [3:45:18<1:08:29, 16.05s/it] {'loss': 0.5246, 'learning_rate': 7.787775814526055e-08, 'epoch': 0.96} 96%|█████████▌| 6384/6640 [3:45:18<1:08:29, 16.05s/it] 96%|█████████▌| 6385/6640 [3:45:34<1:08:09, 16.04s/it] {'loss': 0.5081, 'learning_rate': 7.727130972247199e-08, 'epoch': 0.96} 96%|█████████▌| 6385/6640 [3:45:34<1:08:09, 16.04s/it] 96%|█████████▌| 6386/6640 [3:45:50<1:08:39, 16.22s/it] {'loss': 0.5192, 'learning_rate': 7.666722264291882e-08, 'epoch': 0.96} 96%|█████████▌| 6386/6640 [3:45:50<1:08:39, 16.22s/it] 96%|█████████▌| 6387/6640 [3:46:06<1:07:40, 16.05s/it] {'loss': 0.5009, 'learning_rate': 7.606549705035937e-08, 'epoch': 0.96} 96%|█████████▌| 6387/6640 [3:46:06<1:07:40, 16.05s/it] 96%|█████████▌| 6388/6640 [3:46:23<1:08:17, 16.26s/it] {'loss': 0.512, 'learning_rate': 7.546613308798468e-08, 'epoch': 0.96} 96%|█████████▌| 6388/6640 [3:46:23<1:08:17, 16.26s/it] 96%|█████████▌| 6389/6640 [3:46:40<1:08:42, 16.42s/it] {'loss': 0.497, 'learning_rate': 7.48691308984295e-08, 'epoch': 0.96} 96%|█████████▌| 6389/6640 [3:46:40<1:08:42, 16.42s/it] 96%|█████████▌| 6390/6640 [3:46:56<1:08:09, 16.36s/it] {'loss': 0.5028, 'learning_rate': 7.427449062376468e-08, 'epoch': 0.96} 96%|█████████▌| 6390/6640 [3:46:56<1:08:09, 16.36s/it] 96%|█████████▋| 6391/6640 [3:47:13<1:08:38, 16.54s/it] {'loss': 0.5307, 'learning_rate': 7.3682212405497e-08, 'epoch': 0.96} 96%|█████████▋| 6391/6640 [3:47:13<1:08:38, 16.54s/it] 96%|█████████▋| 6392/6640 [3:47:29<1:08:05, 16.47s/it] {'loss': 0.5085, 'learning_rate': 7.309229638457372e-08, 'epoch': 0.96} 96%|█████████▋| 
6392/6640 [3:47:29<1:08:05, 16.47s/it]
 96%|█████████▋| 6393/6640 [3:47:45<1:07:22, 16.37s/it] {'loss': 0.5371, 'learning_rate': 7.250474270137919e-08, 'epoch': 0.96}
 96%|█████████▋| 6394/6640 [3:48:00<1:05:45, 16.04s/it] {'loss': 0.5146, 'learning_rate': 7.191955149573492e-08, 'epoch': 0.96}
 96%|█████████▋| 6395/6640 [3:48:17<1:06:00, 16.17s/it] {'loss': 0.5038, 'learning_rate': 7.133672290690064e-08, 'epoch': 0.96}
 96%|█████████▋| 6396/6640 [3:48:33<1:05:55, 16.21s/it] {'loss': 0.5108, 'learning_rate': 7.075625707357537e-08, 'epoch': 0.96}
 96%|█████████▋| 6397/6640 [3:48:49<1:05:36, 16.20s/it] {'loss': 0.4978, 'learning_rate': 7.017815413389306e-08, 'epoch': 0.96}
 96%|█████████▋| 6398/6640 [3:49:06<1:05:12, 16.17s/it] {'loss': 0.5166, 'learning_rate': 6.960241422542702e-08, 'epoch': 0.96}
 96%|█████████▋| 6399/6640 [3:49:22<1:05:25, 16.29s/it] {'loss': 0.5081, 'learning_rate': 6.902903748518764e-08, 'epoch': 0.96}
AutoResumeHook: Checking whether to suspend...
 96%|█████████▋| 6400/6640 [3:49:38<1:05:11, 16.30s/it] {'loss': 0.5034, 'learning_rate': 6.845802404962243e-08, 'epoch': 0.96}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-6400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-6400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/tmp-checkpoint-6400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
 96%|█████████▋| 6401/6640 [3:51:21<2:48:17, 42.25s/it] {'loss': 0.523, 'learning_rate': 6.788937405461937e-08, 'epoch': 0.96}
 96%|█████████▋| 6402/6640 [3:51:37<2:16:19, 34.37s/it] {'loss': 0.5158, 'learning_rate': 6.732308763550022e-08, 'epoch': 0.96}
 96%|█████████▋| 6403/6640 [3:51:54<1:54:28, 28.98s/it] {'loss': 0.4973, 'learning_rate': 6.675916492702717e-08, 'epoch': 0.96}
 96%|█████████▋| 6404/6640 [3:52:10<1:38:52, 25.14s/it] {'loss': 0.5525, 'learning_rate': 6.619760606339731e-08, 'epoch': 0.96}
 96%|█████████▋| 6405/6640 [3:52:26<1:28:26, 22.58s/it] {'loss': 0.4992, 'learning_rate': 6.56384111782482e-08, 'epoch': 0.96}
 96%|█████████▋| 6406/6640 [3:52:43<1:20:35, 20.67s/it] {'loss': 0.5044, 'learning_rate': 6.508158040465118e-08, 'epoch': 0.96}
 96%|█████████▋| 6406/6640 [3:52:43<1:20:35, 
20.67s/it] 96%|█████████▋| 6407/6640 [3:52:59<1:15:00, 19.31s/it] {'loss': 0.4805, 'learning_rate': 6.452711387511912e-08, 'epoch': 0.96} 96%|█████████▋| 6407/6640 [3:52:59<1:15:00, 19.31s/it] 97%|█████████▋| 6408/6640 [3:53:15<1:11:30, 18.49s/it] {'loss': 0.5101, 'learning_rate': 6.39750117215987e-08, 'epoch': 0.97} 97%|█████████▋| 6408/6640 [3:53:15<1:11:30, 18.49s/it] 97%|█████████▋| 6409/6640 [3:53:31<1:08:08, 17.70s/it] {'loss': 0.5122, 'learning_rate': 6.342527407547594e-08, 'epoch': 0.97} 97%|█████████▋| 6409/6640 [3:53:31<1:08:08, 17.70s/it] 97%|█████████▋| 6410/6640 [3:53:48<1:06:28, 17.34s/it] {'loss': 0.512, 'learning_rate': 6.287790106757396e-08, 'epoch': 0.97} 97%|█████████▋| 6410/6640 [3:53:48<1:06:28, 17.34s/it] 97%|█████████▋| 6411/6640 [3:54:05<1:06:10, 17.34s/it] {'loss': 0.4956, 'learning_rate': 6.233289282815302e-08, 'epoch': 0.97} 97%|█████████▋| 6411/6640 [3:54:05<1:06:10, 17.34s/it] 97%|█████████▋| 6412/6640 [3:54:22<1:05:35, 17.26s/it] {'loss': 0.5012, 'learning_rate': 6.179024948690938e-08, 'epoch': 0.97} 97%|█████████▋| 6412/6640 [3:54:22<1:05:35, 17.26s/it] 97%|█████████▋| 6413/6640 [3:54:38<1:04:05, 16.94s/it] {'loss': 0.5122, 'learning_rate': 6.124997117297859e-08, 'epoch': 0.97} 97%|█████████▋| 6413/6640 [3:54:38<1:04:05, 16.94s/it] 97%|█████████▋| 6414/6640 [3:54:54<1:02:51, 16.69s/it] {'loss': 0.5091, 'learning_rate': 6.07120580149323e-08, 'epoch': 0.97} 97%|█████████▋| 6414/6640 [3:54:54<1:02:51, 16.69s/it] 97%|█████████▋| 6415/6640 [3:55:11<1:02:37, 16.70s/it] {'loss': 0.5013, 'learning_rate': 6.017651014077807e-08, 'epoch': 0.97} 97%|█████████▋| 6415/6640 [3:55:11<1:02:37, 16.70s/it] 97%|█████████▋| 6416/6640 [3:55:28<1:02:43, 16.80s/it] {'loss': 0.5189, 'learning_rate': 5.964332767796399e-08, 'epoch': 0.97} 97%|█████████▋| 6416/6640 [3:55:28<1:02:43, 16.80s/it] 97%|█████████▋| 6417/6640 [3:55:45<1:02:54, 16.92s/it] {'loss': 0.4995, 'learning_rate': 5.911251075337188e-08, 'epoch': 0.97} 97%|█████████▋| 6417/6640 [3:55:45<1:02:54, 
16.92s/it] 97%|█████████▋| 6418/6640 [3:56:01<1:01:05, 16.51s/it] {'loss': 0.5385, 'learning_rate': 5.85840594933218e-08, 'epoch': 0.97} 97%|█████████▋| 6418/6640 [3:56:01<1:01:05, 16.51s/it] 97%|█████████▋| 6419/6640 [3:56:18<1:01:11, 16.61s/it] {'loss': 0.5158, 'learning_rate': 5.805797402357205e-08, 'epoch': 0.97} 97%|█████████▋| 6419/6640 [3:56:18<1:01:11, 16.61s/it] 97%|█████████▋| 6420/6640 [3:56:34<1:00:21, 16.46s/it] {'loss': 0.5014, 'learning_rate': 5.753425446931582e-08, 'epoch': 0.97} 97%|█████████▋| 6420/6640 [3:56:34<1:00:21, 16.46s/it] 97%|█████████▋| 6421/6640 [3:56:51<1:00:28, 16.57s/it] {'loss': 0.5063, 'learning_rate': 5.701290095518564e-08, 'epoch': 0.97} 97%|█████████▋| 6421/6640 [3:56:51<1:00:28, 16.57s/it] 97%|█████████▋| 6422/6640 [3:57:07<59:45, 16.45s/it] {'loss': 0.5155, 'learning_rate': 5.6493913605246696e-08, 'epoch': 0.97} 97%|█████████▋| 6422/6640 [3:57:07<59:45, 16.45s/it] 97%|█████████▋| 6423/6640 [3:57:24<1:00:10, 16.64s/it] {'loss': 0.4869, 'learning_rate': 5.5977292543007987e-08, 'epoch': 0.97} 97%|█████████▋| 6423/6640 [3:57:24<1:00:10, 16.64s/it] 97%|█████████▋| 6424/6640 [3:57:42<1:01:05, 16.97s/it] {'loss': 0.5116, 'learning_rate': 5.5463037891408944e-08, 'epoch': 0.97} 97%|█████████▋| 6424/6640 [3:57:42<1:01:05, 16.97s/it] 97%|█████████▋| 6425/6640 [3:57:58<1:00:37, 16.92s/it] {'loss': 0.5167, 'learning_rate': 5.495114977282945e-08, 'epoch': 0.97} 97%|█████████▋| 6425/6640 [3:57:58<1:00:37, 16.92s/it]May 29 02:18:00.807282 1643482 slurmstepd 0x155550a06700: error: *** STEP 8299338.0 ON batch-block1-2107 CANCELLED AT 2025-05-29T02:18:00 DUE TO TIME LIMIT *** srun: Job step aborted: Waiting up to 122 seconds for job step to finish. 
srun: error: batch-block1-2107: task 0: Terminated
srun: Terminating StepId=8299338.0
srun: job 8515132 queued and waiting for resources
srun: job 8515132 has been allocated resources
srun: job 8515153 queued and waiting for resources
srun: job 8515153 has been allocated resources
wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block1-0048
JobID: 8515153 | Full list: batch-block1-0048 batch-block1-2006
NETWORK=Efficient-Large-Model/VILA1.5-13b
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Did not find AutoResume SDK!
[2025-06-03 16:10:59,577] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-06-03 16:11:00,828] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-06-03 16:11:00,828] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-06-03 16:11:00,828] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2025-06-03 16:11:09,422] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 13.02B parameters Loading checkpoint shards: 0%| | 0/6 [00:00 train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File 
"/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return 
MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint 
sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File 
"/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop 
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File 
"/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File 
"/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ 
self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2604, in load_checkpoint load_path, client_states = self._load_checkpoint(load_dir, File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2635, in _load_checkpoint sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader return MegatronSDLoader(ckpt_list, version, checkpoint_engine) File 
"/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__ super().__init__(ckpt_list, version, checkpoint_engine) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__ self.check_ckpt_list() File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list assert len(self.ckpt_list) > 0 AssertionError Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2614, in load_checkpoint success = self._load_zero_checkpoint(load_dir, tag, load_optimizer_states=load_optimizer_states) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint raise 
ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 8 but the current world size is 16. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2614, in load_checkpoint success = self._load_zero_checkpoint(load_dir, tag, load_optimizer_states=load_optimizer_states) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 8 but the current world size is 16. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
Traceback (most recent call last): File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train_mem.py", line 36, in train() File "/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/train/train.py", line 436, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train return inner_training_loop( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/trainer.py", line 1693, in _inner_training_loop deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint load_path, _ = deepspeed_engine.load_checkpoint( File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2614, in load_checkpoint success = self._load_zero_checkpoint(load_dir, tag, load_optimizer_states=load_optimizer_states) File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2756, in _load_zero_checkpoint raise ZeRORuntimeException("The checkpoint being loaded used a DP " \ deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 8 but the current world size is 16. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. 
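For context on the resume failure above: ZeRO shards optimizer state across data-parallel ranks and writes one shard file per rank, so a checkpoint saved on 8 GPUs cannot be loaded directly into a 16-GPU job. One way to fail fast before launching is to count the per-rank optimizer-state shards in the checkpoint's tag directory. The `zero_pp_rank_<r>_..._optim_states.pt` naming follows DeepSpeed's usual convention, but treat this as a hedged sketch, not part of the training code in this log:

```python
import glob
import os


def saved_dp_world_size(tag_dir: str) -> int:
    """Infer the DP world size a ZeRO checkpoint was saved with.

    Assumes DeepSpeed's usual naming: one optimizer-state shard per
    data-parallel rank, e.g. zero_pp_rank_3_mp_rank_00_optim_states.pt.
    Returns 0 if no shards are found (not a ZeRO checkpoint dir).
    """
    return len(glob.glob(os.path.join(tag_dir, "*_optim_states.pt")))


def check_resume(tag_dir: str, current_world_size: int) -> None:
    """Raise before training starts if the world sizes disagree."""
    saved = saved_dp_world_size(tag_dir)
    if saved and saved != current_world_size:
        raise RuntimeError(
            f"checkpoint was saved with DP world size {saved}, "
            f"but this job runs with {current_world_size}"
        )
```

For the run above, `check_resume(tag_dir, 16)` on the 8-shard checkpoint would raise immediately instead of failing inside `deepspeed_load_checkpoint` on every rank.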
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2258980) of binary: /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/bin/python3.10
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3014318) of binary: /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/bin/python3.10
Traceback (most recent call last):
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/vila/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llava/train/train_mem.py FAILED
------------------------------------------------------------
Failures (time: 2025-06-03_16:12:23, host: batch-block1-0048.cm.cluster; to enable tracebacks see https://pytorch.org/docs/stable/elastic/errors.html):
[1]: rank 1 (local_rank: 1), exitcode: 1 (pid: 2258981)
[2]: rank 2 (local_rank: 2), exitcode: 1 (pid: 2258982)
[3]: rank 3 (local_rank: 3), exitcode: 1 (pid: 2258983)
[4]: rank 4 (local_rank: 4), exitcode: 1 (pid: 2258984)
[5]: rank 5 (local_rank: 5), exitcode: 1 (pid: 2258985)
[6]: rank 6 (local_rank: 6), exitcode: 1 (pid: 2258986)
[7]: rank 7 (local_rank: 7), exitcode: 1 (pid: 2258987)
------------------------------------------------------------
Root Cause (first observed failure):
[0]: rank 0 (local_rank: 0), exitcode: 1 (pid: 2258980)
============================================================
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llava/train/train_mem.py FAILED
------------------------------------------------------------
Failures (time: 2025-06-03_16:12:23, host: batch-block1-2006.cm.cluster):
[1]: rank 9 (local_rank: 1), exitcode: 1 (pid: 3014319)
[2]: rank 10 (local_rank: 2), exitcode: 1 (pid: 3014320)
[3]: rank 11 (local_rank: 3), exitcode: 1 (pid: 3014321)
[4]: rank 12 (local_rank: 4), exitcode: 1 (pid: 3014322)
[5]: rank 13 (local_rank: 5), exitcode: 1 (pid: 3014323)
[6]: rank 14 (local_rank: 6), exitcode: 1 (pid: 3014324)
[7]: rank 15 (local_rank: 7), exitcode: 1 (pid: 3014325)
------------------------------------------------------------
Root Cause (first observed failure):
[0]: rank 8 (local_rank: 0), exitcode: 1 (pid: 3014318)
============================================================
srun: error: batch-block1-2006: task 1: Exited with exit code 1
srun: Terminating StepId=8515153.0
srun: error: batch-block1-0048: task 0: Exited with exit code 1
srun: job 8515163 queued and waiting for resources
srun: job 8515163 has been allocated resources
wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block5-00142
JobID: 8515163 | Full list: batch-block5-00142
NETWORK=Efficient-Large-Model/VILA1.5-13b
[2025-06-03 16:14:28,596] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-06-03 16:14:30,252] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-06-03 16:14:30,252] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-06-03 16:14:30,252] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2025-06-03 16:14:39,868] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 13.02B parameters
Loading checkpoint shards:   0%|          | 0/6 [00:00<…]
Token indices sequence length is longer than the specified maximum sequence length for this model (… > 4096). Running this sequence through the model will result in indexing errors
100%|█████████▉| 6607/6640 [1:00:09<11:04, 20.14s/it] {'loss': 0.5004, 'learning_rate': 1.295735971834633e-09, 'epoch': 1.0}
100%|█████████▉| 6608/6640 [1:00:26<10:08, 19.02s/it] {'loss': 0.5098, 'learning_rate': 1.2183979297364368e-09, 'epoch': 1.0}
100%|█████████▉| 6609/6640 [1:00:42<09:23, 18.18s/it] {'loss': 0.5136, 'learning_rate': 1.1434393294273981e-09, 'epoch': 1.0}
100%|█████████▉| 6610/6640 [1:00:58<08:43, 17.45s/it] {'loss': 0.5004, 'learning_rate': 1.0708601887454706e-09, 'epoch': 1.0}
100%|█████████▉| 6611/6640 [1:01:14<08:14, 17.06s/it] {'loss': 0.5095, 'learning_rate': 1.000660524960173e-09, 'epoch': 1.0}
100%|█████████▉| 6612/6640 [1:01:32<08:07, 17.40s/it] {'loss': 0.5033, 'learning_rate': 9.328403547792518e-10, 'epoch': 1.0}
100%|█████████▉| 6613/6640 [1:01:49<07:41, 17.10s/it] {'loss': 0.5187, 'learning_rate': 8.673996943420193e-10, 'epoch': 1.0}
100%|█████████▉| 6614/6640 [1:02:05<07:21, 16.99s/it] {'loss': 0.4937, 'learning_rate': 8.043385592215735e-10, 'epoch': 1.0}
100%|█████████▉| 6615/6640 [1:02:22<06:59, 16.77s/it] {'loss': 0.5089, 'learning_rate': 7.43656964423689e-10, 'epoch': 1.0}
100%|█████████▉| 6616/6640 [1:02:38<06:36, 16.54s/it] {'loss': 0.4785, 'learning_rate': 6.85354924390147e-10, 'epoch': 1.0}
100%|█████████▉| 6617/6640 [1:02:54<06:21, 16.61s/it] {'loss': 0.4915, 'learning_rate': 6.294324529942942e-10, 'epoch': 1.0}
100%|█████████▉| 6618/6640 [1:03:11<06:06, 16.67s/it] {'loss': 0.5025, 'learning_rate': 5.75889563544374e-10, 'epoch': 1.0}
100%|█████████▉| 6619/6640 [1:03:27<05:45, 16.47s/it] {'loss': 0.5055, 'learning_rate': 5.247262687835264e-10, 'epoch': 1.0}
100%|█████████▉| 6620/6640 [1:03:43<05:27, 16.40s/it] {'loss': 0.4739, 'learning_rate': 4.759425808853468e-10, 'epoch': 1.0}
100%|█████████▉| 6621/6640 [1:04:00<05:10, 16.37s/it] {'loss': 0.5151, 'learning_rate': 4.295385114594375e-10, 'epoch': 1.0}
100%|█████████▉| 6622/6640 [1:04:15<04:50, 16.13s/it] {'loss': 0.5007, 'learning_rate': 3.8551407155029697e-10, 'epoch': 1.0}
100%|█████████▉| 6623/6640 [1:04:31<04:33, 16.10s/it] {'loss': 0.4975, 'learning_rate': 3.4386927163287953e-10, 'epoch': 1.0}
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/llava/model/llava_arch.py:397: UserWarning: Inputs truncated!
  warnings.warn("Inputs truncated!")
100%|█████████▉| 6624/6640 [1:04:50<04:28, 16.75s/it] {'loss': 0.5179, 'learning_rate': 3.0460412161814613e-10, 'epoch': 1.0}
100%|█████████▉| 6625/6640 [1:05:05<04:06, 16.43s/it] {'loss': 0.5025, 'learning_rate': 2.677186308497337e-10, 'epoch': 1.0}
100%|█████████▉| 6626/6640 [1:05:22<03:51, 16.55s/it] {'loss': 0.5267, 'learning_rate': 2.3321280810617575e-10, 'epoch': 1.0}
100%|█████████▉| 6627/6640 [1:05:39<03:37, 16.70s/it] {'loss': 0.5048, 'learning_rate': 2.0108666159757151e-10, 'epoch': 1.0}
100%|█████████▉| 6628/6640 [1:05:56<03:19, 16.63s/it] {'loss': 0.5102, 'learning_rate': 1.7134019897113718e-10, 'epoch': 1.0}
100%|█████████▉| 6629/6640 [1:06:12<03:01, 16.46s/it] {'loss': 0.5099, 'learning_rate': 1.4397342730343434e-10, 'epoch': 1.0}
100%|█████████▉| 6630/6640 [1:06:28<02:43, 16.33s/it] {'loss': 0.5086, 'learning_rate': 1.1898635310925167e-10, 'epoch': 1.0}
100%|█████████▉| 6631/6640 [1:06:44<02:26, 16.33s/it] {'loss': 0.5192, 'learning_rate': 9.637898233272324e-11, 'epoch': 1.0}
100%|█████████▉| 6632/6640 [1:07:00<02:09, 16.21s/it] {'loss': 0.5252, 'learning_rate': 7.615132035510008e-11, 'epoch': 1.0}
100%|█████████▉| 6633/6640 [1:07:17<01:54, 16.33s/it] {'loss': 0.5239, 'learning_rate': 5.830337199030922e-11, 'epoch': 1.0}
100%|█████████▉| 6634/6640 [1:07:32<01:37, 16.17s/it] {'loss': 0.524, 'learning_rate': 4.2835141484953715e-11, 'epoch': 1.0}
100%|█████████▉| 6635/6640 [1:07:49<01:21, 16.24s/it] {'loss': 0.5149, 'learning_rate': 2.9746632520533116e-11, 'epoch': 1.0}
100%|█████████▉| 6636/6640 [1:08:05<01:04, 16.23s/it] {'loss': 0.512, 'learning_rate': 1.903784821122301e-11, 'epoch': 1.0}
100%|█████████▉| 6637/6640 [1:08:21<00:48, 16.16s/it] {'loss': 0.5005, 'learning_rate': 1.070879110498524e-11, 'epoch': 1.0}
100%|█████████▉| 6638/6640 [1:08:37<00:32, 16.18s/it] {'loss': 0.526, 'learning_rate': 4.759463185788349e-12, 'epoch': 1.0}
100%|█████████▉| 6639/6640 [1:08:53<00:16, 16.16s/it] {'loss': 0.4951, 'learning_rate': 1.1898658669462494e-12, 'epoch': 1.0}
100%|██████████| 6640/6640 [1:09:12<00:00, 16.90s/it] {'loss': 0.5203, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 4155.3143, 'train_samples_per_second': 409.162, 'train_steps_per_second': 1.598, 'train_loss': 0.01843953949051449, 'epoch': 1.0}
100%|██████████| 6640/6640 [1:09:12<00:00, 1.60it/s]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask/mm_projector
wandb: 🚀 View run vila_13b_path_mask at: https://wandb.ai/memmelma/VILA/runs/dqplhl83
wandb: Find logs at: ../../../../../../../../fs12/portfolios/nvr/users/mmemmel/projects/vila/VILA/wandb/run-20250603_161638-dqplhl83/logs
srun: job 8517763 queued and waiting for resources
srun: job 8517763 has been allocated resources
wandb: Currently logged in as: memmelma. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block1-0101
JobID: 8517763 | Full list: batch-block1-0101
NETWORK=Efficient-Large-Model/VILA1.5-13b
[2025-06-03 17:29:34,854] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-06-03 17:29:35,970] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-06-03 17:29:35,970] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-06-03 17:29:35,970] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Models have been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/vila_13b_path_mask. Skipping training.