srun: job 8262265 queued and waiting for resources
srun: job 8262265 has been allocated resources
wandb: Currently logged in as: memmelma to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block5-00222
JobID: 8262265 | Full list: batch-block5-00222
NETWORK=Efficient-Large-Model/NVILA-Lite-2B
W0527 18:08:36.759000 23456244102976 torch/distributed/run.py:757]
W0527 18:08:36.759000 23456244102976 torch/distributed/run.py:757] *****************************************
W0527 18:08:36.759000 23456244102976 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0527 18:08:36.759000 23456244102976 torch/distributed/run.py:757] *****************************************
2025-05-27 18:08:59.343 | INFO | llava.data.builder:register_datasets:39 - Registering datasets from environment: 'default'.
2025-05-27 18:08:59.346 | INFO | llava.data.builder:register_datasets:44 - Registering datasets from: '/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/data/registry/datasets/default.yaml'.
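The setup output below repeatedly emits two deprecation warnings from this environment: transformers renaming `evaluation_strategy` to `eval_strategy`, and PyTorch renaming `NCCL_ASYNC_ERROR_HANDLING` to `TORCH_NCCL_ASYNC_ERROR_HANDLING`. A minimal, hedged sketch of both fixes follows; the actual launch script and TrainingArguments for this run are not shown in the log, so the kwarg values here are purely illustrative:

```python
import os

# PyTorch deprecates NCCL_ASYNC_ERROR_HANDLING in favor of
# TORCH_NCCL_ASYNC_ERROR_HANDLING; migrate the old variable (or default to "1",
# the torchrun default) before torch.distributed is initialized.
if "NCCL_ASYNC_ERROR_HANDLING" in os.environ:
    os.environ.setdefault(
        "TORCH_NCCL_ASYNC_ERROR_HANDLING",
        os.environ.pop("NCCL_ASYNC_ERROR_HANDLING"),
    )
else:
    os.environ.setdefault("TORCH_NCCL_ASYNC_ERROR_HANDLING", "1")

# transformers 4.46 removes `evaluation_strategy`; pass `eval_strategy` instead.
# Hypothetical kwargs only -- the real values for this job are not in the log.
training_kwargs = dict(
    eval_strategy="no",  # was: evaluation_strategy="no"
)
```

These kwargs would be forwarded to `TrainingArguments(...)` in the training script; the rename is purely mechanical and silences the FutureWarning.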
[2025-05-27 18:08:59,554] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Did not find AutoResume SDK!
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/transformers/training_args.py:1559: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead warnings.warn(
[2025-05-27 18:09:16,339] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 18:09:16,339] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 18:09:16,339] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
Fetching 16 files: 0%| | 0/16 [00:00 4096). Running this sequence through the model will result in indexing errors 14%|█▎ | 902/6638 [51:39<9:18:11, 5.84s/it] {'loss': 0.6762, 'grad_norm': 0.68187390854547, 'learning_rate': 1.941898019186298e-05, 'epoch': 0.14} 14%|█▎ | 902/6638 [51:39<9:18:11, 5.84s/it] 14%|█▎ | 903/6638 [51:42<8:04:41, 5.07s/it] {'loss': 0.6644, 'grad_norm': 0.6739168908087529, 'learning_rate': 1.9417339962465084e-05, 'epoch': 0.14} 14%|█▎ | 903/6638 [51:42<8:04:41, 5.07s/it] 14%|█▎ | 904/6638 [51:45<7:09:14, 4.49s/it] {'loss': 0.6476, 'grad_norm': 0.749032536326029, 'learning_rate': 1.9415697490600287e-05, 'epoch': 0.14} 14%|█▎ | 904/6638 [51:45<7:09:14, 4.49s/it] 14%|█▎ | 905/6638 [51:48<6:34:27, 4.13s/it] {'loss': 0.6833, 'grad_norm': 0.7254317253676545, 'learning_rate': 1.9414052776659705e-05, 'epoch': 0.14} 14%|█▎ | 905/6638 [51:48<6:34:27, 4.13s/it] 14%|█▎ | 906/6638 [51:52<6:09:21, 3.87s/it] {'loss': 0.6654, 'grad_norm': 0.6866042147891699, 'learning_rate': 1.9412405821034973e-05, 'epoch': 0.14} 14%|█▎ | 906/6638
[51:52<6:09:21, 3.87s/it] 14%|█▎ | 907/6638 [51:55<5:56:54, 3.74s/it] {'loss': 0.7016, 'grad_norm': 0.637081479197643, 'learning_rate': 1.9410756624118267e-05, 'epoch': 0.14} 14%|█▎ | 907/6638 [51:55<5:56:54, 3.74s/it] 14%|█▎ | 908/6638 [51:59<5:45:29, 3.62s/it] {'loss': 0.6787, 'grad_norm': 0.5984286519566647, 'learning_rate': 1.9409105186302292e-05, 'epoch': 0.14} 14%|█▎ | 908/6638 [51:59<5:45:29, 3.62s/it] 14%|█▎ | 909/6638 [52:02<5:35:49, 3.52s/it] {'loss': 0.6895, 'grad_norm': 0.6639572930479347, 'learning_rate': 1.9407451507980298e-05, 'epoch': 0.14} 14%|█▎ | 909/6638 [52:02<5:35:49, 3.52s/it] 14%|█▎ | 910/6638 [52:05<5:28:20, 3.44s/it] {'loss': 0.6883, 'grad_norm': 0.7151899295154446, 'learning_rate': 1.940579558954606e-05, 'epoch': 0.14} 14%|█▎ | 910/6638 [52:05<5:28:20, 3.44s/it] 14%|█▎ | 911/6638 [52:08<5:23:28, 3.39s/it] {'loss': 0.7168, 'grad_norm': 0.7431178337130052, 'learning_rate': 1.940413743139388e-05, 'epoch': 0.14} 14%|█▎ | 911/6638 [52:08<5:23:28, 3.39s/it] 14%|█▎ | 912/6638 [52:11<5:17:08, 3.32s/it] {'loss': 0.7105, 'grad_norm': 0.8071901965463575, 'learning_rate': 1.9402477033918607e-05, 'epoch': 0.14} 14%|█▎ | 912/6638 [52:12<5:17:08, 3.32s/it] 14%|█▍ | 913/6638 [52:15<5:17:19, 3.33s/it] {'loss': 0.7042, 'grad_norm': 0.6849112546929427, 'learning_rate': 1.9400814397515612e-05, 'epoch': 0.14} 14%|█▍ | 913/6638 [52:15<5:17:19, 3.33s/it] 14%|█▍ | 914/6638 [52:18<5:14:50, 3.30s/it] {'loss': 0.672, 'grad_norm': 0.72959503877506, 'learning_rate': 1.9399149522580805e-05, 'epoch': 0.14} 14%|█▍ | 914/6638 [52:18<5:14:50, 3.30s/it] 14%|█▍ | 915/6638 [52:21<5:14:47, 3.30s/it] {'loss': 0.7367, 'grad_norm': 0.7002244289421035, 'learning_rate': 1.9397482409510627e-05, 'epoch': 0.14} 14%|█▍ | 915/6638 [52:21<5:14:47, 3.30s/it] 14%|█▍ | 916/6638 [52:25<5:14:19, 3.30s/it] {'loss': 0.6769, 'grad_norm': 0.6821784662908217, 'learning_rate': 1.9395813058702057e-05, 'epoch': 0.14} 14%|█▍ | 916/6638 [52:25<5:14:19, 3.30s/it] 14%|█▍ | 917/6638 [52:28<5:15:00, 
3.30s/it] {'loss': 0.7156, 'grad_norm': 0.7791119177325236, 'learning_rate': 1.93941414705526e-05, 'epoch': 0.14} 14%|█▍ | 917/6638 [52:28<5:15:00, 3.30s/it]
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096). warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).")
[2025-05-27 19:02:42,282] [WARNING] [stage3.py:1850:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
14%|█▍ | 918/6638 [52:32<5:43:24, 3.60s/it] {'loss': 0.6951, 'grad_norm': 0.6114130199722557, 'learning_rate': 1.9392467645460293e-05, 'epoch': 0.14} 14%|█▍ | 918/6638 [52:32<5:43:24, 3.60s/it] 14%|█▍ | 919/6638 [52:36<5:34:07, 3.51s/it] {'loss': 0.6819, 'grad_norm': 0.7200306500085004, 'learning_rate': 1.9390791583823713e-05, 'epoch': 0.14} 14%|█▍ | 919/6638 [52:36<5:34:07, 3.51s/it] 14%|█▍ | 920/6638 [52:39<5:27:44, 3.44s/it] {'loss': 0.7052, 'grad_norm': 0.6473553876862537, 'learning_rate': 1.9389113286041965e-05, 'epoch': 0.14} 14%|█▍ | 920/6638 [52:39<5:27:44, 3.44s/it] 14%|█▍ | 921/6638 [52:42<5:25:07, 3.41s/it] {'loss': 0.7553, 'grad_norm': 0.8009744195646337, 'learning_rate': 1.9387432752514686e-05, 'epoch': 0.14} 14%|█▍ | 921/6638 [52:42<5:25:07, 3.41s/it] 14%|█▍ | 922/6638 [52:45<5:19:52, 3.36s/it] {'loss': 0.6637, 'grad_norm': 0.638723995765676, 'learning_rate': 1.938574998364205e-05, 'epoch': 0.14} 14%|█▍ | 922/6638 [52:45<5:19:52, 3.36s/it] 14%|█▍ | 923/6638 [52:49<5:16:57, 3.33s/it] {'loss': 0.6659, 'grad_norm': 0.6957279405671217, 'learning_rate': 1.9384064979824753e-05, 'epoch':
0.14} 14%|█▍ | 923/6638 [52:49<5:16:57, 3.33s/it] 14%|█▍ | 924/6638 [52:52<5:13:47, 3.30s/it] {'loss': 0.647, 'grad_norm': 0.5994879633527312, 'learning_rate': 1.9382377741464032e-05, 'epoch': 0.14} 14%|█▍ | 924/6638 [52:52<5:13:47, 3.30s/it] 14%|█▍ | 925/6638 [52:55<5:17:25, 3.33s/it] {'loss': 0.6717, 'grad_norm': 0.685158244426605, 'learning_rate': 1.938068826896166e-05, 'epoch': 0.14} 14%|█▍ | 925/6638 [52:55<5:17:25, 3.33s/it] 14%|█▍ | 926/6638 [52:59<5:13:14, 3.29s/it] {'loss': 0.6775, 'grad_norm': 0.6573910983382236, 'learning_rate': 1.937899656271993e-05, 'epoch': 0.14} 14%|█▍ | 926/6638 [52:59<5:13:14, 3.29s/it] 14%|█▍ | 927/6638 [53:02<5:12:39, 3.28s/it] {'loss': 0.6817, 'grad_norm': 0.7211897141125094, 'learning_rate': 1.9377302623141672e-05, 'epoch': 0.14} 14%|█▍ | 927/6638 [53:02<5:12:39, 3.28s/it] 14%|█▍ | 928/6638 [53:05<5:11:44, 3.28s/it] {'loss': 0.6591, 'grad_norm': 0.699541966401071, 'learning_rate': 1.9375606450630253e-05, 'epoch': 0.14} 14%|█▍ | 928/6638 [53:05<5:11:44, 3.28s/it] 14%|█▍ | 929/6638 [53:08<5:12:46, 3.29s/it] {'loss': 0.6888, 'grad_norm': 0.7292388927026444, 'learning_rate': 1.9373908045589566e-05, 'epoch': 0.14} 14%|█▍ | 929/6638 [53:08<5:12:46, 3.29s/it] 14%|█▍ | 930/6638 [53:12<5:11:16, 3.27s/it] {'loss': 0.6682, 'grad_norm': 0.6621935966410205, 'learning_rate': 1.9372207408424034e-05, 'epoch': 0.14} 14%|█▍ | 930/6638 [53:12<5:11:16, 3.27s/it] 14%|█▍ | 931/6638 [53:15<5:10:38, 3.27s/it] {'loss': 0.6618, 'grad_norm': 0.6722098554107628, 'learning_rate': 1.937050453953862e-05, 'epoch': 0.14} 14%|█▍ | 931/6638 [53:15<5:10:38, 3.27s/it] 14%|█▍ | 932/6638 [53:18<5:12:31, 3.29s/it] {'loss': 0.6833, 'grad_norm': 0.6415563321435204, 'learning_rate': 1.936879943933881e-05, 'epoch': 0.14} 14%|█▍ | 932/6638 [53:18<5:12:31, 3.29s/it] 14%|█▍ | 933/6638 [53:22<5:14:58, 3.31s/it] {'loss': 0.7171, 'grad_norm': 0.6584053163180253, 'learning_rate': 1.936709210823062e-05, 'epoch': 0.14} 14%|█▍ | 933/6638 [53:22<5:14:58, 3.31s/it] 14%|█▍ | 934/6638 
[53:25<5:11:41, 3.28s/it] {'loss': 0.6682, 'grad_norm': 0.6966973158614526, 'learning_rate': 1.9365382546620607e-05, 'epoch': 0.14} 14%|█▍ | 934/6638 [53:25<5:11:41, 3.28s/it] 14%|█▍ | 935/6638 [53:28<5:10:27, 3.27s/it] {'loss': 0.6542, 'grad_norm': 0.6454049957459709, 'learning_rate': 1.9363670754915855e-05, 'epoch': 0.14} 14%|█▍ | 935/6638 [53:28<5:10:27, 3.27s/it] 14%|█▍ | 936/6638 [53:31<5:11:11, 3.27s/it] {'loss': 0.6897, 'grad_norm': 0.7081437540545048, 'learning_rate': 1.936195673352397e-05, 'epoch': 0.14} 14%|█▍ | 936/6638 [53:31<5:11:11, 3.27s/it] 14%|█▍ | 937/6638 [53:35<5:11:34, 3.28s/it] {'loss': 0.6967, 'grad_norm': 0.7239302516345713, 'learning_rate': 1.9360240482853104e-05, 'epoch': 0.14} 14%|█▍ | 937/6638 [53:35<5:11:34, 3.28s/it] 14%|█▍ | 938/6638 [53:38<5:09:48, 3.26s/it] {'loss': 0.7043, 'grad_norm': 0.7182121420899091, 'learning_rate': 1.9358522003311927e-05, 'epoch': 0.14} 14%|█▍ | 938/6638 [53:38<5:09:48, 3.26s/it] 14%|█▍ | 939/6638 [53:41<5:07:52, 3.24s/it] {'loss': 0.6609, 'grad_norm': 0.7336055024529935, 'learning_rate': 1.935680129530965e-05, 'epoch': 0.14} 14%|█▍ | 939/6638 [53:41<5:07:52, 3.24s/it] 14%|█▍ | 940/6638 [53:44<5:07:05, 3.23s/it] {'loss': 0.6605, 'grad_norm': 0.6660929494707624, 'learning_rate': 1.935507835925601e-05, 'epoch': 0.14} 14%|█▍ | 940/6638 [53:44<5:07:05, 3.23s/it] 14%|█▍ | 941/6638 [53:47<5:06:18, 3.23s/it] {'loss': 0.7187, 'grad_norm': 0.6957962134708899, 'learning_rate': 1.9353353195561274e-05, 'epoch': 0.14} 14%|█▍ | 941/6638 [53:47<5:06:18, 3.23s/it] 14%|█▍ | 942/6638 [53:51<5:09:16, 3.26s/it] {'loss': 0.6824, 'grad_norm': 0.7010321614075179, 'learning_rate': 1.9351625804636232e-05, 'epoch': 0.14} 14%|█▍ | 942/6638 [53:51<5:09:16, 3.26s/it] 14%|█▍ | 943/6638 [53:54<5:08:55, 3.25s/it] {'loss': 0.6828, 'grad_norm': 0.7084903352321915, 'learning_rate': 1.934989618689222e-05, 'epoch': 0.14} 14%|█▍ | 943/6638 [53:54<5:08:55, 3.25s/it] 14%|█▍ | 944/6638 [53:57<5:12:55, 3.30s/it] {'loss': 0.6685, 'grad_norm': 
0.6868743911135619, 'learning_rate': 1.9348164342741095e-05, 'epoch': 0.14} 14%|█▍ | 944/6638 [53:57<5:12:55, 3.30s/it] 14%|█▍ | 945/6638 [54:01<5:09:12, 3.26s/it] {'loss': 0.6742, 'grad_norm': 0.7459921665109648, 'learning_rate': 1.9346430272595244e-05, 'epoch': 0.14} 14%|█▍ | 945/6638 [54:01<5:09:12, 3.26s/it] 14%|█▍ | 946/6638 [54:04<5:10:15, 3.27s/it] {'loss': 0.6659, 'grad_norm': 0.6530061998187889, 'learning_rate': 1.934469397686759e-05, 'epoch': 0.14} 14%|█▍ | 946/6638 [54:04<5:10:15, 3.27s/it] 14%|█▍ | 947/6638 [54:07<5:10:08, 3.27s/it] {'loss': 0.6947, 'grad_norm': 0.6184133272490362, 'learning_rate': 1.9342955455971576e-05, 'epoch': 0.14} 14%|█▍ | 947/6638 [54:07<5:10:08, 3.27s/it] 14%|█▍ | 948/6638 [54:10<5:11:57, 3.29s/it] {'loss': 0.6748, 'grad_norm': 0.645456121113939, 'learning_rate': 1.9341214710321178e-05, 'epoch': 0.14} 14%|█▍ | 948/6638 [54:10<5:11:57, 3.29s/it] 14%|█▍ | 949/6638 [54:14<5:13:40, 3.31s/it] {'loss': 0.745, 'grad_norm': 0.7628978630266898, 'learning_rate': 1.9339471740330917e-05, 'epoch': 0.14} 14%|█▍ | 949/6638 [54:14<5:13:40, 3.31s/it]2 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 14%|█▍ | 950/6638 [54:17<5:11:27, 3.29s/it]1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6829, 'grad_norm': 0.665241141724793, 'learning_rate': 1.9337726546415815e-05, 'epoch': 0.14} 14%|█▍ | 950/6638 [54:17<5:11:27, 3.29s/it] 14%|█▍ | 951/6638 [54:21<5:17:21, 3.35s/it] {'loss': 0.7337, 'grad_norm': 0.6794911399292006, 'learning_rate': 1.933597912899145e-05, 'epoch': 0.14} 14%|█▍ | 951/6638 [54:21<5:17:21, 3.35s/it] 14%|█▍ | 952/6638 [54:24<5:19:25, 3.37s/it] {'loss': 0.7107, 'grad_norm': 0.6390229643265372, 'learning_rate': 1.9334229488473917e-05, 'epoch': 0.14} 14%|█▍ | 952/6638 [54:24<5:19:25, 3.37s/it] 14%|█▍ | 953/6638 [54:27<5:17:10, 3.35s/it] {'loss': 0.6832, 'grad_norm': 0.7171073173692172, 'learning_rate': 1.933247762527984e-05, 'epoch': 0.14} 14%|█▍ | 953/6638 [54:27<5:17:10, 3.35s/it] 14%|█▍ | 954/6638 [54:31<5:18:48, 3.37s/it] {'loss': 0.6601, 'grad_norm': 0.6114464170048621, 'learning_rate': 1.9330723539826373e-05, 'epoch': 0.14} 14%|█▍ | 954/6638 [54:31<5:18:48, 3.37s/it] 14%|█▍ | 955/6638 [54:34<5:16:34, 3.34s/it] {'loss': 0.6597, 'grad_norm': 0.5434412760405166, 'learning_rate': 1.932896723253121e-05, 'epoch': 0.14} 14%|█▍ | 955/6638 [54:34<5:16:34, 3.34s/it] 14%|█▍ | 956/6638 [54:37<5:16:02, 3.34s/it] {'loss': 0.6772, 'grad_norm': 0.6080144258771906, 'learning_rate': 1.9327208703812553e-05, 'epoch': 0.14} 14%|█▍ | 956/6638 [54:37<5:16:02, 3.34s/it] 14%|█▍ | 957/6638 [54:41<5:14:28, 3.32s/it] {'loss': 0.6622, 'grad_norm': 0.6831540423757874, 'learning_rate': 1.9325447954089148e-05, 'epoch': 0.14} 14%|█▍ | 957/6638 [54:41<5:14:28, 3.32s/it] 14%|█▍ | 958/6638 [54:44<5:13:48, 3.31s/it] {'loss': 0.6664, 'grad_norm': 0.6074707914411165, 'learning_rate': 1.9323684983780273e-05, 'epoch': 0.14} 14%|█▍ | 958/6638 [54:44<5:13:48, 3.31s/it] 14%|█▍ | 959/6638 [54:47<5:08:21, 3.26s/it] {'loss': 0.7148, 'grad_norm': 0.7678712621386089, 'learning_rate': 1.9321919793305723e-05, 'epoch': 0.14} 14%|█▍ | 959/6638 [54:47<5:08:21, 3.26s/it] 14%|█▍ | 960/6638 [54:50<5:07:59, 3.25s/it] {'loss': 0.6309, 'grad_norm': 0.5696733042789716, 
'learning_rate': 1.9320152383085826e-05, 'epoch': 0.14} 14%|█▍ | 960/6638 [54:50<5:07:59, 3.25s/it] 14%|█▍ | 961/6638 [54:54<5:08:34, 3.26s/it] {'loss': 0.7025, 'grad_norm': 0.6798585370652108, 'learning_rate': 1.931838275354144e-05, 'epoch': 0.14} 14%|█▍ | 961/6638 [54:54<5:08:34, 3.26s/it] 14%|█▍ | 962/6638 [54:57<5:06:11, 3.24s/it] {'loss': 0.7037, 'grad_norm': 0.6924794171065392, 'learning_rate': 1.9316610905093957e-05, 'epoch': 0.14} 14%|█▍ | 962/6638 [54:57<5:06:11, 3.24s/it] 15%|█▍ | 963/6638 [55:00<5:08:17, 3.26s/it] {'loss': 0.7129, 'grad_norm': 0.6635034530710712, 'learning_rate': 1.931483683816528e-05, 'epoch': 0.15} 15%|█▍ | 963/6638 [55:00<5:08:17, 3.26s/it] 15%|█▍ | 964/6638 [55:03<5:07:50, 3.26s/it] {'loss': 0.6585, 'grad_norm': 0.6013089955085775, 'learning_rate': 1.9313060553177865e-05, 'epoch': 0.15} 15%|█▍ | 964/6638 [55:03<5:07:50, 3.26s/it] 15%|█▍ | 965/6638 [55:07<5:08:23, 3.26s/it] {'loss': 0.7139, 'grad_norm': 0.7236541446257583, 'learning_rate': 1.931128205055467e-05, 'epoch': 0.15} 15%|█▍ | 965/6638 [55:07<5:08:23, 3.26s/it] 15%|█▍ | 966/6638 [55:10<5:07:38, 3.25s/it] {'loss': 0.6567, 'grad_norm': 0.6809089608421448, 'learning_rate': 1.9309501330719205e-05, 'epoch': 0.15} 15%|█▍ | 966/6638 [55:10<5:07:38, 3.25s/it] 15%|█▍ | 967/6638 [55:13<5:11:14, 3.29s/it] {'loss': 0.7118, 'grad_norm': 0.7258564212099405, 'learning_rate': 1.9307718394095493e-05, 'epoch': 0.15} 15%|█▍ | 967/6638 [55:13<5:11:14, 3.29s/it] 15%|█▍ | 968/6638 [55:16<5:10:22, 3.28s/it] {'loss': 0.6801, 'grad_norm': 0.6491023152982844, 'learning_rate': 1.9305933241108086e-05, 'epoch': 0.15} 15%|█▍ | 968/6638 [55:16<5:10:22, 3.28s/it] 15%|█▍ | 969/6638 [55:20<5:14:37, 3.33s/it] {'loss': 0.6708, 'grad_norm': 0.6117321751901337, 'learning_rate': 1.9304145872182064e-05, 'epoch': 0.15} 15%|█▍ | 969/6638 [55:20<5:14:37, 3.33s/it] 15%|█▍ | 970/6638 [55:23<5:10:50, 3.29s/it] {'loss': 0.7121, 'grad_norm': 1.173012384172919, 'learning_rate': 1.9302356287743048e-05, 'epoch': 0.15} 15%|█▍ 
| 970/6638 [55:23<5:10:50, 3.29s/it] 15%|█▍ | 971/6638 [55:26<5:10:56, 3.29s/it] {'loss': 0.7078, 'grad_norm': 0.6970044894892841, 'learning_rate': 1.9300564488217164e-05, 'epoch': 0.15} 15%|█▍ | 971/6638 [55:26<5:10:56, 3.29s/it] 15%|█▍ | 972/6638 [55:30<5:07:46, 3.26s/it] {'loss': 0.6459, 'grad_norm': 0.6996069571964895, 'learning_rate': 1.9298770474031086e-05, 'epoch': 0.15} 15%|█▍ | 972/6638 [55:30<5:07:46, 3.26s/it] 15%|█▍ | 973/6638 [55:33<5:12:43, 3.31s/it] {'loss': 0.669, 'grad_norm': 0.5840887358286527, 'learning_rate': 1.9296974245612e-05, 'epoch': 0.15} 15%|█▍ | 973/6638 [55:33<5:12:43, 3.31s/it] 15%|█▍ | 974/6638 [55:36<5:11:56, 3.30s/it] {'loss': 0.7142, 'grad_norm': 0.6885994407019949, 'learning_rate': 1.9295175803387633e-05, 'epoch': 0.15} 15%|█▍ | 974/6638 [55:36<5:11:56, 3.30s/it] 15%|█▍ | 975/6638 [55:40<5:11:03, 3.30s/it] {'loss': 0.664, 'grad_norm': 0.7624037358520149, 'learning_rate': 1.9293375147786225e-05, 'epoch': 0.15} 15%|█▍ | 975/6638 [55:40<5:11:03, 3.30s/it] 15%|█▍ | 976/6638 [55:43<5:10:13, 3.29s/it] {'loss': 0.7114, 'grad_norm': 0.7210536588162779, 'learning_rate': 1.929157227923655e-05, 'epoch': 0.15} 15%|█▍ | 976/6638 [55:43<5:10:13, 3.29s/it] 15%|█▍ | 977/6638 [55:46<5:11:05, 3.30s/it] {'loss': 0.6986, 'grad_norm': 0.6815754258108876, 'learning_rate': 1.9289767198167918e-05, 'epoch': 0.15} 15%|█▍ | 977/6638 [55:46<5:11:05, 3.30s/it] 15%|█▍ | 978/6638 [55:49<5:11:54, 3.31s/it] {'loss': 0.6654, 'grad_norm': 0.6698931916821557, 'learning_rate': 1.9287959905010144e-05, 'epoch': 0.15} 15%|█▍ | 978/6638 [55:49<5:11:54, 3.31s/it] 15%|█▍ | 979/6638 [55:53<5:12:37, 3.31s/it] {'loss': 0.6715, 'grad_norm': 0.6438431792026867, 'learning_rate': 1.9286150400193593e-05, 'epoch': 0.15} 15%|█▍ | 979/6638 [55:53<5:12:37, 3.31s/it] 15%|█▍ | 980/6638 [55:56<5:11:46, 3.31s/it] {'loss': 0.7263, 'grad_norm': 0.6865920155132048, 'learning_rate': 1.928433868414914e-05, 'epoch': 0.15} 15%|█▍ | 980/6638 [55:56<5:11:46, 3.31s/it] 15%|█▍ | 981/6638 
[55:59<5:09:16, 3.28s/it] {'loss': 0.6847, 'grad_norm': 0.7179143008061655, 'learning_rate': 1.9282524757308197e-05, 'epoch': 0.15} 15%|█▍ | 981/6638 [55:59<5:09:16, 3.28s/it] 15%|█▍ | 982/6638 [56:02<5:06:41, 3.25s/it] {'loss': 0.6709, 'grad_norm': 0.6415902612018246, 'learning_rate': 1.9280708620102695e-05, 'epoch': 0.15} 15%|█▍ | 982/6638 [56:02<5:06:41, 3.25s/it] 15%|█▍ | 983/6638 [56:06<5:07:36, 3.26s/it] {'loss': 0.6514, 'grad_norm': 0.5889278200659385, 'learning_rate': 1.9278890272965097e-05, 'epoch': 0.15} 15%|█▍ | 983/6638 [56:06<5:07:36, 3.26s/it] 15%|█▍ | 984/6638 [56:09<5:11:55, 3.31s/it] {'loss': 0.6612, 'grad_norm': 0.6350550631507899, 'learning_rate': 1.9277069716328385e-05, 'epoch': 0.15} 15%|█▍ | 984/6638 [56:09<5:11:55, 3.31s/it] 15%|█▍ | 985/6638 [56:12<5:12:17, 3.31s/it] {'loss': 0.7135, 'grad_norm': 0.7027245097133012, 'learning_rate': 1.9275246950626077e-05, 'epoch': 0.15} 15%|█▍ | 985/6638 [56:12<5:12:17, 3.31s/it] 15%|█▍ | 986/6638 [56:16<5:12:33, 3.32s/it] {'loss': 0.6583, 'grad_norm': 0.5874439055625105, 'learning_rate': 1.927342197629221e-05, 'epoch': 0.15} 15%|█▍ | 986/6638 [56:16<5:12:33, 3.32s/it] 15%|█▍ | 987/6638 [56:19<5:08:22, 3.27s/it] {'loss': 0.6519, 'grad_norm': 0.648794122167457, 'learning_rate': 1.927159479376135e-05, 'epoch': 0.15} 15%|█▍ | 987/6638 [56:19<5:08:22, 3.27s/it] 15%|█▍ | 988/6638 [56:22<5:10:12, 3.29s/it] {'loss': 0.728, 'grad_norm': 0.642737134746567, 'learning_rate': 1.9269765403468583e-05, 'epoch': 0.15} 15%|█▍ | 988/6638 [56:22<5:10:12, 3.29s/it] 15%|█▍ | 989/6638 [56:26<5:10:11, 3.29s/it] {'loss': 0.7101, 'grad_norm': 0.7034490378604505, 'learning_rate': 1.9267933805849534e-05, 'epoch': 0.15} 15%|█▍ | 989/6638 [56:26<5:10:11, 3.29s/it] 15%|█▍ | 990/6638 [56:29<5:10:47, 3.30s/it] {'loss': 0.6652, 'grad_norm': 0.6873717813493707, 'learning_rate': 1.9266100001340337e-05, 'epoch': 0.15} 15%|█▍ | 990/6638 [56:29<5:10:47, 3.30s/it] 15%|█▍ | 991/6638 [56:32<5:11:38, 3.31s/it] {'loss': 0.6667, 'grad_norm': 
0.5772147546393677, 'learning_rate': 1.926426399037766e-05, 'epoch': 0.15} 15%|█▍ | 991/6638 [56:32<5:11:38, 3.31s/it] 15%|█▍ | 992/6638 [56:36<5:09:36, 3.29s/it] {'loss': 0.7154, 'grad_norm': 0.7238097123761295, 'learning_rate': 1.92624257733987e-05, 'epoch': 0.15} 15%|█▍ | 992/6638 [56:36<5:09:36, 3.29s/it] 15%|█▍ | 993/6638 [56:39<5:09:34, 3.29s/it] {'loss': 0.7146, 'grad_norm': 0.7799333646402666, 'learning_rate': 1.9260585350841174e-05, 'epoch': 0.15} 15%|█▍ | 993/6638 [56:39<5:09:34, 3.29s/it] 15%|█▍ | 994/6638 [56:42<5:07:55, 3.27s/it] {'loss': 0.6825, 'grad_norm': 0.6546011996752934, 'learning_rate': 1.9258742723143324e-05, 'epoch': 0.15} 15%|█▍ | 994/6638 [56:42<5:07:55, 3.27s/it] 15%|█▍ | 995/6638 [56:45<5:08:10, 3.28s/it] {'loss': 0.7057, 'grad_norm': 0.7446907841028834, 'learning_rate': 1.925689789074392e-05, 'epoch': 0.15} 15%|█▍ | 995/6638 [56:45<5:08:10, 3.28s/it] 15%|█▌ | 996/6638 [56:49<5:10:20, 3.30s/it] {'loss': 0.6691, 'grad_norm': 0.7114756092805848, 'learning_rate': 1.9255050854082255e-05, 'epoch': 0.15} 15%|█▌ | 996/6638 [56:49<5:10:20, 3.30s/it] 15%|█▌ | 997/6638 [56:52<5:09:51, 3.30s/it] {'loss': 0.7098, 'grad_norm': 0.622701405278449, 'learning_rate': 1.9253201613598145e-05, 'epoch': 0.15} 15%|█▌ | 997/6638 [56:52<5:09:51, 3.30s/it] 15%|█▌ | 998/6638 [56:55<5:08:11, 3.28s/it] {'loss': 0.7164, 'grad_norm': 0.7507502088822822, 'learning_rate': 1.9251350169731935e-05, 'epoch': 0.15} 15%|█▌ | 998/6638 [56:55<5:08:11, 3.28s/it] 15%|█▌ | 999/6638 [56:58<5:07:42, 3.27s/it] {'loss': 0.642, 'grad_norm': 0.6668151445321151, 'learning_rate': 1.9249496522924492e-05, 'epoch': 0.15} 15%|█▌ | 999/6638 [56:58<5:07:42, 3.27s/it]0 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 15%|█▌ | 1000/6638 [57:02<5:04:13, 3.24s/it]1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
7 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
{'loss': 0.687, 'grad_norm': 0.6325200621168311, 'learning_rate': 1.9247640673617213e-05, 'epoch': 0.15} 15%|█▌ | 1000/6638 [57:02<5:04:13, 3.24s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1000/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1000/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1000/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 15%|█▌ | 1001/6638 [57:17<10:48:20, 6.90s/it] {'loss': 0.6989, 'grad_norm': 0.6038252260538217, 'learning_rate': 1.9245782622252008e-05, 'epoch': 0.15} 15%|█▌ | 1001/6638 [57:17<10:48:20, 6.90s/it] 15%|█▌ | 1002/6638 [57:20<9:07:51, 5.83s/it] {'loss': 0.6602, 'grad_norm': 0.5986769014462575, 'learning_rate': 1.924392236927132e-05, 'epoch': 0.15} 15%|█▌ | 1002/6638 [57:20<9:07:51, 5.83s/it] 15%|█▌ | 1003/6638 [57:24<7:57:43, 5.09s/it] {'loss': 0.7007, 'grad_norm': 0.7040418925689108, 'learning_rate': 1.9242059915118117e-05, 'epoch': 0.15} 15%|█▌ | 1003/6638 [57:24<7:57:43, 5.09s/it] 15%|█▌ | 1004/6638 [57:27<7:06:02, 4.54s/it] {'loss': 0.6665, 'grad_norm': 0.6445000866351138, 'learning_rate': 1.9240195260235883e-05, 'epoch': 0.15} 15%|█▌ | 1004/6638 [57:27<7:06:02, 4.54s/it] 15%|█▌ | 1005/6638 [57:30<6:28:34, 4.14s/it] {'loss': 0.7022, 'grad_norm': 0.7383361452886437, 'learning_rate': 1.923832840506864e-05, 'epoch': 0.15} 15%|█▌ | 1005/6638 [57:30<6:28:34, 4.14s/it] 15%|█▌ | 1006/6638 [57:33<6:01:20, 3.85s/it] {'loss': 0.6826, 'grad_norm': 0.7435451338788902, 'learning_rate': 1.9236459350060918e-05, 'epoch': 0.15} 15%|█▌ | 1006/6638 [57:33<6:01:20, 3.85s/it] 15%|█▌ | 1007/6638 [57:37<5:45:45, 3.68s/it] {'loss': 0.6746, 'grad_norm': 0.7487729902811388, 'learning_rate': 1.9234588095657783e-05, 'epoch': 0.15} 15%|█▌ | 1007/6638 [57:37<5:45:45, 3.68s/it] 15%|█▌ | 1008/6638 [57:40<5:35:03, 3.57s/it] {'loss': 0.6684, 'grad_norm': 0.6748245989287394, 'learning_rate': 1.923271464230482e-05, 'epoch': 0.15} 15%|█▌ | 1008/6638 [57:40<5:35:03, 3.57s/it] 15%|█▌ | 1009/6638 [57:43<5:24:22, 3.46s/it] {'loss': 0.6818, 'grad_norm': 0.7896620809984121, 'learning_rate': 1.9230838990448134e-05, 'epoch': 0.15} 15%|█▌ | 1009/6638 [57:43<5:24:22, 3.46s/it] 15%|█▌ | 1010/6638 [57:47<5:23:47, 3.45s/it] {'loss': 0.6749, 'grad_norm': 0.6226262127981447, 'learning_rate': 1.922896114053436e-05, 
'epoch': 0.15} 15%|█▌ | 1010/6638 [57:47<5:23:47, 3.45s/it] 15%|█▌ | 1011/6638 [57:50<5:20:29, 3.42s/it] {'loss': 0.6621, 'grad_norm': 0.8184356884866267, 'learning_rate': 1.9227081093010647e-05, 'epoch': 0.15} 15%|█▌ | 1011/6638 [57:50<5:20:29, 3.42s/it] 15%|█▌ | 1012/6638 [57:53<5:16:18, 3.37s/it] {'loss': 0.7076, 'grad_norm': 0.6858446002992756, 'learning_rate': 1.9225198848324687e-05, 'epoch': 0.15} 15%|█▌ | 1012/6638 [57:53<5:16:18, 3.37s/it] 15%|█▌ | 1013/6638 [57:56<5:10:58, 3.32s/it] {'loss': 0.6859, 'grad_norm': 0.7073251531493238, 'learning_rate': 1.9223314406924675e-05, 'epoch': 0.15} 15%|█▌ | 1013/6638 [57:56<5:10:58, 3.32s/it] 15%|█▌ | 1014/6638 [58:00<5:07:12, 3.28s/it] {'loss': 0.6631, 'grad_norm': 0.6812064022326099, 'learning_rate': 1.9221427769259333e-05, 'epoch': 0.15} 15%|█▌ | 1014/6638 [58:00<5:07:12, 3.28s/it] 15%|█▌ | 1015/6638 [58:03<5:06:18, 3.27s/it] {'loss': 0.6652, 'grad_norm': 0.6564253774667392, 'learning_rate': 1.9219538935777912e-05, 'epoch': 0.15} 15%|█▌ | 1015/6638 [58:03<5:06:18, 3.27s/it] 15%|█▌ | 1016/6638 [58:06<5:05:01, 3.26s/it] {'loss': 0.6924, 'grad_norm': 0.6708957609890859, 'learning_rate': 1.9217647906930183e-05, 'epoch': 0.15} 15%|█▌ | 1016/6638 [58:06<5:05:01, 3.26s/it] 15%|█▌ | 1017/6638 [58:09<5:04:09, 3.25s/it] {'loss': 0.6529, 'grad_norm': 0.7546208183161142, 'learning_rate': 1.9215754683166442e-05, 'epoch': 0.15} 15%|█▌ | 1017/6638 [58:09<5:04:09, 3.25s/it] 15%|█▌ | 1018/6638 [58:13<5:03:40, 3.24s/it] {'loss': 0.695, 'grad_norm': 0.6740733564022454, 'learning_rate': 1.9213859264937503e-05, 'epoch': 0.15} 15%|█▌ | 1018/6638 [58:13<5:03:40, 3.24s/it] 15%|█▌ | 1019/6638 [58:16<5:07:36, 3.28s/it] {'loss': 0.6679, 'grad_norm': 0.6710880331572703, 'learning_rate': 1.9211961652694704e-05, 'epoch': 0.15} 15%|█▌ | 1019/6638 [58:16<5:07:36, 3.28s/it] 15%|█▌ | 1020/6638 [58:19<5:07:31, 3.28s/it] {'loss': 0.6614, 'grad_norm': 0.6654567390116445, 'learning_rate': 1.9210061846889908e-05, 'epoch': 0.15} 15%|█▌ | 1020/6638 
[58:19<5:07:31, 3.28s/it] 15%|█▌ | 1021/6638 [58:23<5:08:52, 3.30s/it] {'loss': 0.746, 'grad_norm': 0.8399855468657729, 'learning_rate': 1.92081598479755e-05, 'epoch': 0.15} 15%|█▌ | 1021/6638 [58:23<5:08:52, 3.30s/it] 15%|█▌ | 1022/6638 [58:26<5:11:49, 3.33s/it] {'loss': 0.7564, 'grad_norm': 0.7855262800197367, 'learning_rate': 1.9206255656404384e-05, 'epoch': 0.15} 15%|█▌ | 1022/6638 [58:26<5:11:49, 3.33s/it] 15%|█▌ | 1023/6638 [58:29<5:10:00, 3.31s/it] {'loss': 0.6716, 'grad_norm': 0.7084898998441156, 'learning_rate': 1.9204349272629988e-05, 'epoch': 0.15} 15%|█▌ | 1023/6638 [58:29<5:10:00, 3.31s/it] 15%|█▌ | 1024/6638 [58:32<5:08:43, 3.30s/it] {'loss': 0.6428, 'grad_norm': 0.6121217736215039, 'learning_rate': 1.9202440697106263e-05, 'epoch': 0.15} 15%|█▌ | 1024/6638 [58:32<5:08:43, 3.30s/it] 15%|█▌ | 1025/6638 [58:36<5:10:28, 3.32s/it] {'loss': 0.6805, 'grad_norm': 0.6495429843355885, 'learning_rate': 1.920052993028768e-05, 'epoch': 0.15} 15%|█▌ | 1025/6638 [58:36<5:10:28, 3.32s/it] 15%|█▌ | 1026/6638 [58:39<5:12:23, 3.34s/it] {'loss': 0.6632, 'grad_norm': 0.6469132132646218, 'learning_rate': 1.919861697262923e-05, 'epoch': 0.15} 15%|█▌ | 1026/6638 [58:39<5:12:23, 3.34s/it] 15%|█▌ | 1027/6638 [58:43<5:10:30, 3.32s/it] {'loss': 0.6801, 'grad_norm': 0.7182377837864267, 'learning_rate': 1.919670182458644e-05, 'epoch': 0.15} 15%|█▌ | 1027/6638 [58:43<5:10:30, 3.32s/it] 15%|█▌ | 1028/6638 [58:46<5:08:46, 3.30s/it] {'loss': 0.7183, 'grad_norm': 0.7080633459609597, 'learning_rate': 1.9194784486615333e-05, 'epoch': 0.15} 15%|█▌ | 1028/6638 [58:46<5:08:46, 3.30s/it] 16%|█▌ | 1029/6638 [58:49<5:02:48, 3.24s/it] {'loss': 0.6565, 'grad_norm': 0.6061557601540913, 'learning_rate': 1.9192864959172475e-05, 'epoch': 0.16} 16%|█▌ | 1029/6638 [58:49<5:02:48, 3.24s/it] 16%|█▌ | 1030/6638 [58:52<5:02:30, 3.24s/it] {'loss': 0.7062, 'grad_norm': 0.6786927557425264, 'learning_rate': 1.9190943242714947e-05, 'epoch': 0.16} 16%|█▌ | 1030/6638 [58:52<5:02:30, 3.24s/it] 16%|█▌ | 1031/6638 
[58:55<5:04:10, 3.26s/it] {'loss': 0.6602, 'grad_norm': 0.6572504269545003, 'learning_rate': 1.9189019337700344e-05, 'epoch': 0.16} 16%|█▌ | 1031/6638 [58:55<5:04:10, 3.26s/it] 16%|█▌ | 1032/6638 [58:59<5:02:08, 3.23s/it] {'loss': 0.6345, 'grad_norm': 0.6366776656094071, 'learning_rate': 1.9187093244586793e-05, 'epoch': 0.16} 16%|█▌ | 1032/6638 [58:59<5:02:08, 3.23s/it] 16%|█▌ | 1033/6638 [59:02<5:02:17, 3.24s/it] {'loss': 0.6676, 'grad_norm': 0.6996154293644766, 'learning_rate': 1.9185164963832938e-05, 'epoch': 0.16} 16%|█▌ | 1033/6638 [59:02<5:02:17, 3.24s/it] 16%|█▌ | 1034/6638 [59:05<5:01:33, 3.23s/it] {'loss': 0.6467, 'grad_norm': 0.6821955119182719, 'learning_rate': 1.918323449589794e-05, 'epoch': 0.16} 16%|█▌ | 1034/6638 [59:05<5:01:33, 3.23s/it] 16%|█▌ | 1035/6638 [59:08<5:05:22, 3.27s/it] {'loss': 0.6976, 'grad_norm': 0.7731450004444251, 'learning_rate': 1.9181301841241486e-05, 'epoch': 0.16} 16%|█▌ | 1035/6638 [59:08<5:05:22, 3.27s/it] 16%|█▌ | 1036/6638 [59:12<5:05:07, 3.27s/it] {'loss': 0.6513, 'grad_norm': 0.656862125684745, 'learning_rate': 1.917936700032378e-05, 'epoch': 0.16} 16%|█▌ | 1036/6638 [59:12<5:05:07, 3.27s/it] 16%|█▌ | 1037/6638 [59:15<5:06:00, 3.28s/it] {'loss': 0.686, 'grad_norm': 0.636552986438872, 'learning_rate': 1.917742997360555e-05, 'epoch': 0.16} 16%|█▌ | 1037/6638 [59:15<5:06:00, 3.28s/it] 16%|█▌ | 1038/6638 [59:18<5:07:13, 3.29s/it] {'loss': 0.7158, 'grad_norm': 0.7254697376122644, 'learning_rate': 1.9175490761548047e-05, 'epoch': 0.16} 16%|█▌ | 1038/6638 [59:18<5:07:13, 3.29s/it] 16%|█▌ | 1039/6638 [59:21<5:04:46, 3.27s/it] {'loss': 0.661, 'grad_norm': 0.639337175423338, 'learning_rate': 1.917354936461303e-05, 'epoch': 0.16} 16%|█▌ | 1039/6638 [59:21<5:04:46, 3.27s/it] 16%|█▌ | 1040/6638 [59:25<5:05:22, 3.27s/it] {'loss': 0.6777, 'grad_norm': 0.6236975620651775, 'learning_rate': 1.917160578326279e-05, 'epoch': 0.16} 16%|█▌ | 1040/6638 [59:25<5:05:22, 3.27s/it] 16%|█▌ | 1041/6638 [59:28<5:05:51, 3.28s/it] {'loss': 0.6846, 
'grad_norm': 0.6920842598672201, 'learning_rate': 1.9169660017960135e-05, 'epoch': 0.16} 16%|█▌ | 1041/6638 [59:28<5:05:51, 3.28s/it] 16%|█▌ | 1042/6638 [59:31<5:06:52, 3.29s/it] {'loss': 0.6584, 'grad_norm': 0.5467316520439687, 'learning_rate': 1.916771206916839e-05, 'epoch': 0.16} 16%|█▌ | 1042/6638 [59:31<5:06:52, 3.29s/it] 16%|█▌ | 1043/6638 [59:35<5:06:41, 3.29s/it] {'loss': 0.6527, 'grad_norm': 0.6562135244355156, 'learning_rate': 1.9165761937351412e-05, 'epoch': 0.16} 16%|█▌ | 1043/6638 [59:35<5:06:41, 3.29s/it] 16%|█▌ | 1044/6638 [59:38<5:04:27, 3.27s/it] {'loss': 0.6485, 'grad_norm': 0.6793906972740594, 'learning_rate': 1.9163809622973555e-05, 'epoch': 0.16} 16%|█▌ | 1044/6638 [59:38<5:04:27, 3.27s/it] 16%|█▌ | 1045/6638 [59:41<5:03:04, 3.25s/it] {'loss': 0.6753, 'grad_norm': 0.6946328126178045, 'learning_rate': 1.9161855126499716e-05, 'epoch': 0.16} 16%|█▌ | 1045/6638 [59:41<5:03:04, 3.25s/it] 16%|█▌ | 1046/6638 [59:44<5:03:04, 3.25s/it] {'loss': 0.6727, 'grad_norm': 2.1910808234617813, 'learning_rate': 1.91598984483953e-05, 'epoch': 0.16} 16%|█▌ | 1046/6638 [59:44<5:03:04, 3.25s/it] 16%|█▌ | 1047/6638 [59:48<5:10:20, 3.33s/it] {'loss': 0.7312, 'grad_norm': 0.7048516289577554, 'learning_rate': 1.9157939589126226e-05, 'epoch': 0.16} 16%|█▌ | 1047/6638 [59:48<5:10:20, 3.33s/it] 16%|█▌ | 1048/6638 [59:51<5:10:33, 3.33s/it] {'loss': 0.6741, 'grad_norm': 0.6381295787354312, 'learning_rate': 1.9155978549158952e-05, 'epoch': 0.16} 16%|█▌ | 1048/6638 [59:51<5:10:33, 3.33s/it] 16%|█▌ | 1049/6638 [59:54<5:07:10, 3.30s/it] {'loss': 0.6515, 'grad_norm': 0.6072166883394099, 'learning_rate': 1.915401532896043e-05, 'epoch': 0.16} 16%|█▌ | 1049/6638 [59:54<5:07:10, 3.30s/it]
4 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
16%|█▌ | 1050/6638 [59:58<5:09:09, 3.32s/it]
5 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
{'loss': 0.7076, 'grad_norm': 0.758532075068989, 'learning_rate': 1.9152049928998157e-05, 'epoch': 0.16} 16%|█▌ | 1050/6638 [59:58<5:09:09, 3.32s/it] 16%|█▌ | 1051/6638 [1:00:01<5:07:57, 3.31s/it] {'loss': 0.6608, 'grad_norm': 0.6277235951437938, 'learning_rate': 1.9150082349740123e-05, 'epoch': 0.16} 16%|█▌ | 1051/6638 [1:00:01<5:07:57, 3.31s/it] 16%|█▌ | 1052/6638 [1:00:04<5:10:04, 3.33s/it] {'loss': 0.7474, 'grad_norm': 0.740227847294947, 'learning_rate': 1.9148112591654858e-05, 'epoch': 0.16} 16%|█▌ | 1052/6638 [1:00:04<5:10:04, 3.33s/it] 16%|█▌ | 1053/6638 [1:00:08<5:06:59, 3.30s/it] {'loss': 0.7156, 'grad_norm': 0.696678750660825, 'learning_rate': 1.9146140655211405e-05, 'epoch': 0.16} 16%|█▌ | 1053/6638 [1:00:08<5:06:59, 3.30s/it] 16%|█▌ | 1054/6638 [1:00:11<5:06:23, 3.29s/it] {'loss': 0.6686, 'grad_norm': 0.660657557694167, 'learning_rate': 1.9144166540879315e-05, 'epoch': 0.16} 16%|█▌ | 1054/6638 [1:00:11<5:06:23, 3.29s/it] 16%|█▌ | 1055/6638 [1:00:14<5:05:55, 3.29s/it] {'loss': 0.6753, 'grad_norm': 0.7452713626678901, 'learning_rate': 1.9142190249128677e-05, 'epoch': 0.16} 16%|█▌ | 1055/6638 [1:00:14<5:05:55, 3.29s/it] 16%|█▌ | 1056/6638 [1:00:17<5:03:06, 3.26s/it] {'loss': 0.6902, 'grad_norm': 0.7315635339181775, 'learning_rate': 1.9140211780430076e-05, 'epoch': 0.16} 16%|█▌ | 1056/6638 [1:00:17<5:03:06, 3.26s/it] 16%|█▌ | 1057/6638 [1:00:21<5:03:27, 3.26s/it] {'loss': 0.6612, 'grad_norm': 0.6849237308448028, 'learning_rate': 1.9138231135254635e-05, 'epoch': 0.16} 16%|█▌ | 1057/6638 [1:00:21<5:03:27, 3.26s/it] 16%|█▌ | 1058/6638 [1:00:24<5:04:16, 3.27s/it] {'loss': 0.6659, 'grad_norm': 0.6792683894431087, 'learning_rate': 1.9136248314073985e-05, 'epoch': 0.16} 16%|█▌ | 1058/6638 [1:00:24<5:04:16, 3.27s/it] 16%|█▌ | 1059/6638 [1:00:27<5:05:10, 3.28s/it] {'loss': 0.6818, 'grad_norm': 0.7784820995396766,
'learning_rate': 1.9134263317360277e-05, 'epoch': 0.16} 16%|█▌ | 1059/6638 [1:00:27<5:05:10, 3.28s/it] 16%|█▌ | 1060/6638 [1:00:31<5:03:47, 3.27s/it] {'loss': 0.6936, 'grad_norm': 0.7803843939931899, 'learning_rate': 1.913227614558618e-05, 'epoch': 0.16} 16%|█▌ | 1060/6638 [1:00:31<5:03:47, 3.27s/it] 16%|█▌ | 1061/6638 [1:00:34<5:02:22, 3.25s/it] {'loss': 0.7033, 'grad_norm': 0.6410272948683576, 'learning_rate': 1.9130286799224887e-05, 'epoch': 0.16} 16%|█▌ | 1061/6638 [1:00:34<5:02:22, 3.25s/it] 16%|█▌ | 1062/6638 [1:00:37<5:03:30, 3.27s/it] {'loss': 0.6616, 'grad_norm': 0.6577455423504229, 'learning_rate': 1.9128295278750094e-05, 'epoch': 0.16} 16%|█▌ | 1062/6638 [1:00:37<5:03:30, 3.27s/it] 16%|█▌ | 1063/6638 [1:00:40<5:01:55, 3.25s/it] {'loss': 0.6995, 'grad_norm': 0.7323010070309606, 'learning_rate': 1.9126301584636034e-05, 'epoch': 0.16} 16%|█▌ | 1063/6638 [1:00:40<5:01:55, 3.25s/it] 16%|█▌ | 1064/6638 [1:00:44<5:03:25, 3.27s/it] {'loss': 0.6679, 'grad_norm': 0.8294468396369925, 'learning_rate': 1.9124305717357437e-05, 'epoch': 0.16} 16%|█▌ | 1064/6638 [1:00:44<5:03:25, 3.27s/it] 16%|█▌ | 1065/6638 [1:00:47<5:05:29, 3.29s/it] {'loss': 0.6603, 'grad_norm': 0.6457312694123527, 'learning_rate': 1.912230767738957e-05, 'epoch': 0.16} 16%|█▌ | 1065/6638 [1:00:47<5:05:29, 3.29s/it] 16%|█▌ | 1066/6638 [1:00:50<5:07:58, 3.32s/it] {'loss': 0.7404, 'grad_norm': 0.7299775098232685, 'learning_rate': 1.91203074652082e-05, 'epoch': 0.16} 16%|█▌ | 1066/6638 [1:00:50<5:07:58, 3.32s/it] 16%|█▌ | 1067/6638 [1:00:54<5:08:46, 3.33s/it] {'loss': 0.6936, 'grad_norm': 0.7253011386850066, 'learning_rate': 1.9118305081289626e-05, 'epoch': 0.16} 16%|█▌ | 1067/6638 [1:00:54<5:08:46, 3.33s/it] 16%|█▌ | 1068/6638 [1:00:57<5:10:42, 3.35s/it] {'loss': 0.6982, 'grad_norm': 0.7904856472658858, 'learning_rate': 1.911630052611066e-05, 'epoch': 0.16} 16%|█▌ | 1068/6638 [1:00:57<5:10:42, 3.35s/it] 16%|█▌ | 1069/6638 [1:01:00<5:08:15, 3.32s/it] {'loss': 0.6736, 'grad_norm': 0.7033517442893636, 
'learning_rate': 1.9114293800148622e-05, 'epoch': 0.16} 16%|█▌ | 1069/6638 [1:01:00<5:08:15, 3.32s/it] 16%|█▌ | 1070/6638 [1:01:03<5:03:49, 3.27s/it] {'loss': 0.6564, 'grad_norm': 0.6472466848201528, 'learning_rate': 1.911228490388136e-05, 'epoch': 0.16} 16%|█▌ | 1070/6638 [1:01:03<5:03:49, 3.27s/it] 16%|█▌ | 1071/6638 [1:01:07<5:06:42, 3.31s/it] {'loss': 0.7094, 'grad_norm': 0.6606697726695491, 'learning_rate': 1.911027383778723e-05, 'epoch': 0.16} 16%|█▌ | 1071/6638 [1:01:07<5:06:42, 3.31s/it] 16%|█▌ | 1072/6638 [1:01:10<5:03:41, 3.27s/it] {'loss': 0.6991, 'grad_norm': 0.7166285966939416, 'learning_rate': 1.9108260602345114e-05, 'epoch': 0.16} 16%|█▌ | 1072/6638 [1:01:10<5:03:41, 3.27s/it] 16%|█▌ | 1073/6638 [1:01:13<5:00:05, 3.24s/it] {'loss': 0.6492, 'grad_norm': 0.616919326904523, 'learning_rate': 1.9106245198034402e-05, 'epoch': 0.16} 16%|█▌ | 1073/6638 [1:01:13<5:00:05, 3.24s/it] 16%|█▌ | 1074/6638 [1:01:17<5:02:36, 3.26s/it] {'loss': 0.6628, 'grad_norm': 0.7154516592100144, 'learning_rate': 1.910422762533501e-05, 'epoch': 0.16} 16%|█▌ | 1074/6638 [1:01:17<5:02:36, 3.26s/it] 16%|█▌ | 1075/6638 [1:01:20<5:04:52, 3.29s/it] {'loss': 0.6893, 'grad_norm': 0.7031092945476152, 'learning_rate': 1.910220788472736e-05, 'epoch': 0.16} 16%|█▌ | 1075/6638 [1:01:20<5:04:52, 3.29s/it] 16%|█▌ | 1076/6638 [1:01:23<5:00:04, 3.24s/it] {'loss': 0.6444, 'grad_norm': 0.655370999988725, 'learning_rate': 1.910018597669239e-05, 'epoch': 0.16} 16%|█▌ | 1076/6638 [1:01:23<5:00:04, 3.24s/it] 16%|█▌ | 1077/6638 [1:01:26<4:59:46, 3.23s/it] {'loss': 0.7377, 'grad_norm': 0.8159126901365852, 'learning_rate': 1.909816190171157e-05, 'epoch': 0.16} 16%|█▌ | 1077/6638 [1:01:26<4:59:46, 3.23s/it] 16%|█▌ | 1078/6638 [1:01:29<4:59:49, 3.24s/it] {'loss': 0.6685, 'grad_norm': 0.6833891701619472, 'learning_rate': 1.909613566026687e-05, 'epoch': 0.16} 16%|█▌ | 1078/6638 [1:01:29<4:59:49, 3.24s/it] 16%|█▋ | 1079/6638 [1:01:33<5:01:24, 3.25s/it] {'loss': 0.6518, 'grad_norm': 0.7724479379214189, 
'learning_rate': 1.9094107252840778e-05, 'epoch': 0.16} 16%|█▋ | 1079/6638 [1:01:33<5:01:24, 3.25s/it] 16%|█▋ | 1080/6638 [1:01:36<5:01:34, 3.26s/it] {'loss': 0.6993, 'grad_norm': 0.6663562233257743, 'learning_rate': 1.9092076679916302e-05, 'epoch': 0.16} 16%|█▋ | 1080/6638 [1:01:36<5:01:34, 3.26s/it] 16%|█▋ | 1081/6638 [1:01:39<5:03:26, 3.28s/it] {'loss': 0.6529, 'grad_norm': 0.6420863125575358, 'learning_rate': 1.909004394197696e-05, 'epoch': 0.16} 16%|█▋ | 1081/6638 [1:01:39<5:03:26, 3.28s/it] 16%|█▋ | 1082/6638 [1:01:43<5:03:25, 3.28s/it] {'loss': 0.7137, 'grad_norm': 0.7459801162275626, 'learning_rate': 1.90880090395068e-05, 'epoch': 0.16} 16%|█▋ | 1082/6638 [1:01:43<5:03:25, 3.28s/it] 16%|█▋ | 1083/6638 [1:01:46<5:04:24, 3.29s/it] {'loss': 0.6527, 'grad_norm': 0.7220522269559936, 'learning_rate': 1.9085971972990366e-05, 'epoch': 0.16} 16%|█▋ | 1083/6638 [1:01:46<5:04:24, 3.29s/it] 16%|█▋ | 1084/6638 [1:01:49<5:04:49, 3.29s/it] {'loss': 0.7127, 'grad_norm': 0.7527361506976167, 'learning_rate': 1.9083932742912733e-05, 'epoch': 0.16} 16%|█▋ | 1084/6638 [1:01:49<5:04:49, 3.29s/it] 16%|█▋ | 1085/6638 [1:01:53<5:04:51, 3.29s/it] {'loss': 0.6543, 'grad_norm': 0.6876645682362502, 'learning_rate': 1.908189134975948e-05, 'epoch': 0.16} 16%|█▋ | 1085/6638 [1:01:53<5:04:51, 3.29s/it] 16%|█▋ | 1086/6638 [1:01:56<5:12:24, 3.38s/it] {'loss': 0.7094, 'grad_norm': 0.6672926378542664, 'learning_rate': 1.907984779401671e-05, 'epoch': 0.16} 16%|█▋ | 1086/6638 [1:01:56<5:12:24, 3.38s/it] 16%|█▋ | 1087/6638 [1:01:59<5:07:49, 3.33s/it] {'loss': 0.6416, 'grad_norm': 0.5840455741382731, 'learning_rate': 1.907780207617103e-05, 'epoch': 0.16} 16%|█▋ | 1087/6638 [1:01:59<5:07:49, 3.33s/it] 16%|█▋ | 1088/6638 [1:02:03<5:05:31, 3.30s/it] {'loss': 0.6613, 'grad_norm': 0.6812746332155833, 'learning_rate': 1.9075754196709574e-05, 'epoch': 0.16} 16%|█▋ | 1088/6638 [1:02:03<5:05:31, 3.30s/it] 16%|█▋ | 1089/6638 [1:02:06<5:01:39, 3.26s/it] {'loss': 0.6922, 'grad_norm': 0.8080877378794457, 
'learning_rate': 1.907370415611998e-05, 'epoch': 0.16} 16%|█▋ | 1089/6638 [1:02:06<5:01:39, 3.26s/it] 16%|█▋ | 1090/6638 [1:02:09<5:04:13, 3.29s/it] {'loss': 0.6599, 'grad_norm': 0.638937911158776, 'learning_rate': 1.907165195489041e-05, 'epoch': 0.16} 16%|█▋ | 1090/6638 [1:02:09<5:04:13, 3.29s/it] 16%|█▋ | 1091/6638 [1:02:12<5:04:47, 3.30s/it] {'loss': 0.6793, 'grad_norm': 0.7463731355381771, 'learning_rate': 1.9069597593509538e-05, 'epoch': 0.16} 16%|█▋ | 1091/6638 [1:02:12<5:04:47, 3.30s/it] 16%|█▋ | 1092/6638 [1:02:16<5:05:30, 3.31s/it] {'loss': 0.6819, 'grad_norm': 0.7384442576343866, 'learning_rate': 1.906754107246655e-05, 'epoch': 0.16} 16%|█▋ | 1092/6638 [1:02:16<5:05:30, 3.31s/it] 16%|█▋ | 1093/6638 [1:02:19<5:02:20, 3.27s/it] {'loss': 0.6709, 'grad_norm': 0.7840406734379601, 'learning_rate': 1.9065482392251142e-05, 'epoch': 0.16} 16%|█▋ | 1093/6638 [1:02:19<5:02:20, 3.27s/it] 16%|█▋ | 1094/6638 [1:02:22<4:58:55, 3.24s/it] {'loss': 0.722, 'grad_norm': 0.7124609729549104, 'learning_rate': 1.9063421553353535e-05, 'epoch': 0.16} 16%|█▋ | 1094/6638 [1:02:22<4:58:55, 3.24s/it] 16%|█▋ | 1095/6638 [1:02:25<5:01:06, 3.26s/it] {'loss': 0.681, 'grad_norm': 0.6787160957100058, 'learning_rate': 1.9061358556264455e-05, 'epoch': 0.16} 16%|█▋ | 1095/6638 [1:02:25<5:01:06, 3.26s/it] 17%|█▋ | 1096/6638 [1:02:29<5:00:50, 3.26s/it] {'loss': 0.6956, 'grad_norm': 0.6765992943539654, 'learning_rate': 1.905929340147514e-05, 'epoch': 0.17} 17%|█▋ | 1096/6638 [1:02:29<5:00:50, 3.26s/it] 17%|█▋ | 1097/6638 [1:02:32<5:00:05, 3.25s/it] {'loss': 0.7016, 'grad_norm': 0.7311144749862097, 'learning_rate': 1.9057226089477358e-05, 'epoch': 0.17} 17%|█▋ | 1097/6638 [1:02:32<5:00:05, 3.25s/it] 17%|█▋ | 1098/6638 [1:02:35<5:00:21, 3.25s/it] {'loss': 0.6644, 'grad_norm': 0.7383818566410839, 'learning_rate': 1.9055156620763372e-05, 'epoch': 0.17} 17%|█▋ | 1098/6638 [1:02:35<5:00:21, 3.25s/it] 17%|█▋ | 1099/6638 [1:02:38<5:03:55, 3.29s/it] {'loss': 0.7098, 'grad_norm': 0.6270844532526362, 
'learning_rate': 1.905308499582597e-05, 'epoch': 0.17} 17%|█▋ | 1099/6638 [1:02:38<5:03:55, 3.29s/it]
4 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
17%|█▋ | 1100/6638 [1:02:42<5:02:27, 3.28s/it]
5 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
{'loss': 0.6634, 'grad_norm': 0.6184381090758786, 'learning_rate': 1.9051011215158445e-05, 'epoch': 0.17} 17%|█▋ | 1100/6638 [1:02:42<5:02:27, 3.28s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1100/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1100/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1100/mm_projector
17%|█▋ | 1101/6638 [1:02:58<10:50:23, 7.05s/it] {'loss': 0.6824, 'grad_norm': 0.6982110456542928, 'learning_rate': 1.904893527925461e-05, 'epoch': 0.17} 17%|█▋ | 1101/6638 [1:02:58<10:50:23, 7.05s/it] 17%|█▋ | 1102/6638 [1:03:01<9:05:09, 5.91s/it] {'loss': 0.665, 'grad_norm': 0.6281098946912066, 'learning_rate': 1.904685718860879e-05, 'epoch': 0.17} 17%|█▋ | 1102/6638 [1:03:01<9:05:09, 5.91s/it] 17%|█▋ | 1103/6638 [1:03:04<7:50:57, 5.11s/it] {'loss': 0.6473, 'grad_norm': 0.645189211458104, 'learning_rate': 1.904477694371582e-05, 'epoch': 0.17} 17%|█▋ | 1103/6638 [1:03:04<7:50:57, 5.11s/it] 17%|█▋ | 1104/6638 [1:03:07<7:01:41, 4.57s/it] {'loss': 0.7078, 'grad_norm': 0.6907122322846925, 'learning_rate': 1.9042694545071055e-05, 'epoch': 0.17} 17%|█▋ | 1104/6638 [1:03:07<7:01:41, 4.57s/it] 17%|█▋ | 1105/6638 [1:03:11<6:27:10, 4.20s/it] {'loss': 0.67, 'grad_norm': 0.6756314654721772, 'learning_rate': 1.9040609993170352e-05, 'epoch': 0.17} 17%|█▋ | 1105/6638 [1:03:11<6:27:10, 4.20s/it] 17%|█▋ | 1106/6638 [1:03:14<6:03:21, 3.94s/it] {'loss': 0.6196, 'grad_norm': 0.5521497734240342, 'learning_rate': 1.9038523288510088e-05, 'epoch': 0.17} 17%|█▋ | 1106/6638 [1:03:14<6:03:21, 3.94s/it] 17%|█▋ | 1107/6638 [1:03:17<5:46:01, 3.75s/it] {'loss': 0.6551, 'grad_norm': 0.6531585351208821, 'learning_rate': 1.9036434431587155e-05, 'epoch': 0.17} 17%|█▋ | 1107/6638 [1:03:17<5:46:01, 3.75s/it] 17%|█▋ | 1108/6638 [1:03:21<5:29:24, 3.57s/it] {'loss': 0.6558, 'grad_norm': 0.6808626925894595, 'learning_rate': 1.9034343422898952e-05, 'epoch': 0.17} 17%|█▋ | 1108/6638
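The two PyTorch `UserWarning`s printed around each checkpoint save (the positional-args `state_dict` deprecation and the `c10d::broadcast_` autograd fallback) are benign for this run. If the noise is unwanted, they can be silenced at process startup with Python's standard `warnings` filters; a minimal sketch, with the match patterns taken from the warning texts above (whether to suppress them at all is a judgment call for this codebase):

```python
import warnings

# Silence the two known-benign warnings emitted around each checkpoint save.
# filterwarnings() matches the regex against the start of the warning message.
warnings.filterwarnings(
    "ignore",
    message=r"Positional args are being deprecated, use kwargs instead",
    category=UserWarning,
)
warnings.filterwarnings(
    "ignore",
    message=r"c10d::broadcast_: an autograd kernel was not registered",
    category=UserWarning,
)
```

Note these filters only affect the process they run in; under `torchrun` each rank would need to install them (e.g. near the top of the training entry point).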
[1:03:21<5:29:24, 3.57s/it] 17%|█▋ | 1109/6638 [1:03:24<5:26:37, 3.54s/it] {'loss': 0.7062, 'grad_norm': 0.7221261468257616, 'learning_rate': 1.9032250262943388e-05, 'epoch': 0.17} 17%|█▋ | 1109/6638 [1:03:24<5:26:37, 3.54s/it] 17%|█▋ | 1110/6638 [1:03:28<5:25:55, 3.54s/it] {'loss': 0.7019, 'grad_norm': 0.7005599117351788, 'learning_rate': 1.9030154952218895e-05, 'epoch': 0.17} 17%|█▋ | 1110/6638 [1:03:28<5:25:55, 3.54s/it] 17%|█▋ | 1111/6638 [1:03:31<5:19:27, 3.47s/it] {'loss': 0.6736, 'grad_norm': 0.7005083731811114, 'learning_rate': 1.9028057491224405e-05, 'epoch': 0.17} 17%|█▋ | 1111/6638 [1:03:31<5:19:27, 3.47s/it] 17%|█▋ | 1112/6638 [1:03:34<5:13:34, 3.40s/it] {'loss': 0.6819, 'grad_norm': 0.6318047183793386, 'learning_rate': 1.902595788045937e-05, 'epoch': 0.17} 17%|█▋ | 1112/6638 [1:03:34<5:13:34, 3.40s/it] 17%|█▋ | 1113/6638 [1:03:37<5:09:43, 3.36s/it] {'loss': 0.6546, 'grad_norm': 0.6762707243326003, 'learning_rate': 1.9023856120423754e-05, 'epoch': 0.17} 17%|█▋ | 1113/6638 [1:03:37<5:09:43, 3.36s/it] 17%|█▋ | 1114/6638 [1:03:41<5:06:39, 3.33s/it] {'loss': 0.6738, 'grad_norm': 0.6755409377327407, 'learning_rate': 1.9021752211618026e-05, 'epoch': 0.17} 17%|█▋ | 1114/6638 [1:03:41<5:06:39, 3.33s/it] 17%|█▋ | 1115/6638 [1:03:44<5:03:37, 3.30s/it] {'loss': 0.6513, 'grad_norm': 0.6090385161895328, 'learning_rate': 1.9019646154543173e-05, 'epoch': 0.17} 17%|█▋ | 1115/6638 [1:03:44<5:03:37, 3.30s/it] 17%|█▋ | 1116/6638 [1:03:47<5:05:10, 3.32s/it] {'loss': 0.6984, 'grad_norm': 0.6335283281677424, 'learning_rate': 1.9017537949700693e-05, 'epoch': 0.17} 17%|█▋ | 1116/6638 [1:03:47<5:05:10, 3.32s/it] 17%|█▋ | 1117/6638 [1:03:51<5:10:17, 3.37s/it] {'loss': 0.7056, 'grad_norm': 0.6719824707263344, 'learning_rate': 1.901542759759259e-05, 'epoch': 0.17} 17%|█▋ | 1117/6638 [1:03:51<5:10:17, 3.37s/it] 17%|█▋ | 1118/6638 [1:03:54<5:05:54, 3.33s/it] {'loss': 0.691, 'grad_norm': 0.6405667163015825, 'learning_rate': 1.901331509872139e-05, 'epoch': 0.17} 17%|█▋ | 1118/6638 
[1:03:54<5:05:54, 3.33s/it] 17%|█▋ | 1119/6638 [1:03:57<5:05:51, 3.33s/it] {'loss': 0.6764, 'grad_norm': 0.6503706620592271, 'learning_rate': 1.9011200453590116e-05, 'epoch': 0.17} 17%|█▋ | 1119/6638 [1:03:57<5:05:51, 3.33s/it] 17%|█▋ | 1120/6638 [1:04:01<5:06:16, 3.33s/it] {'loss': 0.676, 'grad_norm': 0.6337900583286221, 'learning_rate': 1.900908366270231e-05, 'epoch': 0.17} 17%|█▋ | 1120/6638 [1:04:01<5:06:16, 3.33s/it] 17%|█▋ | 1121/6638 [1:04:04<5:03:50, 3.30s/it] {'loss': 0.7318, 'grad_norm': 0.742236595238669, 'learning_rate': 1.900696472656203e-05, 'epoch': 0.17} 17%|█▋ | 1121/6638 [1:04:04<5:03:50, 3.30s/it] 17%|█▋ | 1122/6638 [1:04:07<5:03:27, 3.30s/it] {'loss': 0.6445, 'grad_norm': 0.5491771012079956, 'learning_rate': 1.9004843645673835e-05, 'epoch': 0.17} 17%|█▋ | 1122/6638 [1:04:07<5:03:27, 3.30s/it] 17%|█▋ | 1123/6638 [1:04:11<5:07:22, 3.34s/it] {'loss': 0.6897, 'grad_norm': 0.7419637336582277, 'learning_rate': 1.9002720420542803e-05, 'epoch': 0.17} 17%|█▋ | 1123/6638 [1:04:11<5:07:22, 3.34s/it] 17%|█▋ | 1124/6638 [1:04:14<5:06:06, 3.33s/it] {'loss': 0.6591, 'grad_norm': 0.6126857408047494, 'learning_rate': 1.9000595051674518e-05, 'epoch': 0.17} 17%|█▋ | 1124/6638 [1:04:14<5:06:06, 3.33s/it] 17%|█▋ | 1125/6638 [1:04:17<5:04:46, 3.32s/it] {'loss': 0.6815, 'grad_norm': 0.7023087140556897, 'learning_rate': 1.899846753957507e-05, 'epoch': 0.17} 17%|█▋ | 1125/6638 [1:04:17<5:04:46, 3.32s/it] 17%|█▋ | 1126/6638 [1:04:21<5:10:06, 3.38s/it] {'loss': 0.7665, 'grad_norm': 0.7287076768426669, 'learning_rate': 1.8996337884751064e-05, 'epoch': 0.17} 17%|█▋ | 1126/6638 [1:04:21<5:10:06, 3.38s/it] 17%|█▋ | 1127/6638 [1:04:24<5:12:00, 3.40s/it] {'loss': 0.6962, 'grad_norm': 0.651193938533364, 'learning_rate': 1.8994206087709623e-05, 'epoch': 0.17} 17%|█▋ | 1127/6638 [1:04:24<5:12:00, 3.40s/it] 17%|█▋ | 1128/6638 [1:04:27<5:07:04, 3.34s/it] {'loss': 0.6641, 'grad_norm': 0.6354900280195703, 'learning_rate': 1.8992072148958368e-05, 'epoch': 0.17} 17%|█▋ | 1128/6638 
[1:04:27<5:07:04, 3.34s/it] 17%|█▋ | 1129/6638 [1:04:31<5:05:20, 3.33s/it] {'loss': 0.7243, 'grad_norm': 0.7060444331893995, 'learning_rate': 1.8989936069005437e-05, 'epoch': 0.17} 17%|█▋ | 1129/6638 [1:04:31<5:05:20, 3.33s/it] 17%|█▋ | 1130/6638 [1:04:34<5:06:18, 3.34s/it] {'loss': 0.6507, 'grad_norm': 0.6594784043609074, 'learning_rate': 1.8987797848359472e-05, 'epoch': 0.17} 17%|█▋ | 1130/6638 [1:04:34<5:06:18, 3.34s/it] 17%|█▋ | 1131/6638 [1:04:37<5:09:06, 3.37s/it] {'loss': 0.6795, 'grad_norm': 0.6324649356969944, 'learning_rate': 1.8985657487529633e-05, 'epoch': 0.17} 17%|█▋ | 1131/6638 [1:04:37<5:09:06, 3.37s/it] 17%|█▋ | 1132/6638 [1:04:41<5:04:13, 3.32s/it] {'loss': 0.6907, 'grad_norm': 0.7011440144061115, 'learning_rate': 1.8983514987025583e-05, 'epoch': 0.17} 17%|█▋ | 1132/6638 [1:04:41<5:04:13, 3.32s/it] 17%|█▋ | 1133/6638 [1:04:44<5:07:09, 3.35s/it] {'loss': 0.6953, 'grad_norm': 0.7488523074549616, 'learning_rate': 1.8981370347357494e-05, 'epoch': 0.17} 17%|█▋ | 1133/6638 [1:04:44<5:07:09, 3.35s/it] 17%|█▋ | 1134/6638 [1:04:47<5:06:58, 3.35s/it] {'loss': 0.6902, 'grad_norm': 0.6364677926303742, 'learning_rate': 1.8979223569036055e-05, 'epoch': 0.17} 17%|█▋ | 1134/6638 [1:04:47<5:06:58, 3.35s/it] 17%|█▋ | 1135/6638 [1:04:51<5:03:21, 3.31s/it] {'loss': 0.6586, 'grad_norm': 0.6374684510759611, 'learning_rate': 1.8977074652572452e-05, 'epoch': 0.17} 17%|█▋ | 1135/6638 [1:04:51<5:03:21, 3.31s/it] 17%|█▋ | 1136/6638 [1:04:54<5:07:05, 3.35s/it] {'loss': 0.6698, 'grad_norm': 0.5738996793847926, 'learning_rate': 1.8974923598478393e-05, 'epoch': 0.17} 17%|█▋ | 1136/6638 [1:04:54<5:07:05, 3.35s/it] 17%|█▋ | 1137/6638 [1:04:57<5:03:35, 3.31s/it] {'loss': 0.7179, 'grad_norm': 0.7356648120582814, 'learning_rate': 1.897277040726609e-05, 'epoch': 0.17} 17%|█▋ | 1137/6638 [1:04:57<5:03:35, 3.31s/it] 17%|█▋ | 1138/6638 [1:05:01<5:02:02, 3.30s/it] {'loss': 0.6578, 'grad_norm': 0.659632303821525, 'learning_rate': 1.8970615079448254e-05, 'epoch': 0.17} 17%|█▋ | 1138/6638 
[1:05:01<5:02:02, 3.30s/it] 17%|█▋ | 1139/6638 [1:05:04<5:01:08, 3.29s/it] {'loss': 0.6818, 'grad_norm': 0.5981811594734429, 'learning_rate': 1.8968457615538127e-05, 'epoch': 0.17} 17%|█▋ | 1139/6638 [1:05:04<5:01:08, 3.29s/it] 17%|█▋ | 1140/6638 [1:05:07<5:00:16, 3.28s/it] {'loss': 0.6488, 'grad_norm': 0.6427453896987189, 'learning_rate': 1.8966298016049438e-05, 'epoch': 0.17} 17%|█▋ | 1140/6638 [1:05:07<5:00:16, 3.28s/it] 17%|█▋ | 1141/6638 [1:05:10<5:02:35, 3.30s/it] {'loss': 0.6793, 'grad_norm': 0.5958287259765811, 'learning_rate': 1.8964136281496433e-05, 'epoch': 0.17} 17%|█▋ | 1141/6638 [1:05:10<5:02:35, 3.30s/it] 17%|█▋ | 1142/6638 [1:05:14<5:06:28, 3.35s/it] {'loss': 0.6921, 'grad_norm': 0.6177758839794466, 'learning_rate': 1.8961972412393873e-05, 'epoch': 0.17} 17%|█▋ | 1142/6638 [1:05:14<5:06:28, 3.35s/it] 17%|█▋ | 1143/6638 [1:05:17<5:05:06, 3.33s/it] {'loss': 0.6799, 'grad_norm': 0.7270114369385174, 'learning_rate': 1.8959806409257014e-05, 'epoch': 0.17} 17%|█▋ | 1143/6638 [1:05:17<5:05:06, 3.33s/it] 17%|█▋ | 1144/6638 [1:05:20<5:05:20, 3.33s/it] {'loss': 0.6684, 'grad_norm': 0.653145294632963, 'learning_rate': 1.895763827260163e-05, 'epoch': 0.17} 17%|█▋ | 1144/6638 [1:05:20<5:05:20, 3.33s/it] 17%|█▋ | 1145/6638 [1:05:24<5:03:43, 3.32s/it] {'loss': 0.6911, 'grad_norm': 0.7660360605949962, 'learning_rate': 1.8955468002943996e-05, 'epoch': 0.17} 17%|█▋ | 1145/6638 [1:05:24<5:03:43, 3.32s/it] 17%|█▋ | 1146/6638 [1:05:27<5:00:44, 3.29s/it] {'loss': 0.7328, 'grad_norm': 0.6979456526886678, 'learning_rate': 1.8953295600800906e-05, 'epoch': 0.17} 17%|█▋ | 1146/6638 [1:05:27<5:00:44, 3.29s/it] 17%|█▋ | 1147/6638 [1:05:30<5:01:28, 3.29s/it] {'loss': 0.6846, 'grad_norm': 0.7210162725518932, 'learning_rate': 1.8951121066689655e-05, 'epoch': 0.17} 17%|█▋ | 1147/6638 [1:05:30<5:01:28, 3.29s/it] 17%|█▋ | 1148/6638 [1:05:34<5:01:27, 3.29s/it] {'loss': 0.6908, 'grad_norm': 0.6567963598579117, 'learning_rate': 1.8948944401128035e-05, 'epoch': 0.17} 17%|█▋ | 1148/6638 
[1:05:34<5:01:27, 3.29s/it] 17%|█▋ | 1149/6638 [1:05:37<4:58:16, 3.26s/it] {'loss': 0.6653, 'grad_norm': 0.6824559377898468, 'learning_rate': 1.8946765604634368e-05, 'epoch': 0.17} 17%|█▋ | 1149/6638 [1:05:37<4:58:16, 3.26s/it]
2 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
17%|█▋ | 1150/6638 [1:05:40<4:59:25, 3.27s/it]
6 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
{'loss': 0.6653, 'grad_norm': 0.667978776659425, 'learning_rate': 1.894458467772746e-05, 'epoch': 0.17} 17%|█▋ | 1150/6638 [1:05:40<4:59:25, 3.27s/it] 17%|█▋ | 1151/6638 [1:05:43<5:00:34, 3.29s/it] {'loss': 0.6777, 'grad_norm': 0.6555888548221162, 'learning_rate': 1.8942401620926643e-05, 'epoch': 0.17} 17%|█▋ | 1151/6638 [1:05:43<5:00:34, 3.29s/it] 17%|█▋ | 1152/6638 [1:05:47<4:59:04, 3.27s/it] {'loss': 0.659, 'grad_norm': 0.709414831713894, 'learning_rate': 1.894021643475175e-05, 'epoch': 0.17} 17%|█▋ | 1152/6638 [1:05:47<4:59:04, 3.27s/it] 17%|█▋ | 1153/6638 [1:05:50<4:57:09, 3.25s/it] {'loss': 0.7243, 'grad_norm': 0.7939165973020618, 'learning_rate': 1.8938029119723113e-05, 'epoch': 0.17} 17%|█▋ | 1153/6638 [1:05:50<4:57:09, 3.25s/it] 17%|█▋ | 1154/6638 [1:05:53<4:57:25, 3.25s/it] {'loss': 0.6735, 'grad_norm': 0.6602832663758746, 'learning_rate': 1.8935839676361584e-05, 'epoch': 0.17} 17%|█▋ | 1154/6638 [1:05:53<4:57:25, 3.25s/it] 17%|█▋ | 1155/6638 [1:05:57<5:02:49, 3.31s/it] {'loss': 0.6268, 'grad_norm': 0.5906175751916928, 'learning_rate': 1.893364810518851e-05, 'epoch': 0.17} 17%|█▋ | 1155/6638 [1:05:57<5:02:49, 3.31s/it] 16%|█▌ | 1156/6638 [1:06:00<5:02:03, 3.31s/it] {'loss': 0.6626, 'grad_norm': 0.6947141869182695, 'learning_rate': 1.8931454406725757e-05, 'epoch': 0.17} 17%|█▋ | 1156/6638
[1:06:00<5:02:03, 3.31s/it] 17%|█▋ | 1157/6638 [1:06:03<4:58:00, 3.26s/it] {'loss': 0.6766, 'grad_norm': 0.7562607818360452, 'learning_rate': 1.8929258581495688e-05, 'epoch': 0.17} 17%|█▋ | 1157/6638 [1:06:03<4:58:00, 3.26s/it] 17%|█▋ | 1158/6638 [1:06:06<4:56:50, 3.25s/it] {'loss': 0.6732, 'grad_norm': 0.6602287615991296, 'learning_rate': 1.892706063002117e-05, 'epoch': 0.17} 17%|█▋ | 1158/6638 [1:06:06<4:56:50, 3.25s/it] 17%|█▋ | 1159/6638 [1:06:09<4:56:13, 3.24s/it] {'loss': 0.6925, 'grad_norm': 0.6060685496185907, 'learning_rate': 1.892486055282559e-05, 'epoch': 0.17} 17%|█▋ | 1159/6638 [1:06:09<4:56:13, 3.24s/it] 17%|█▋ | 1160/6638 [1:06:13<4:56:07, 3.24s/it] {'loss': 0.6841, 'grad_norm': 0.7173287122945146, 'learning_rate': 1.8922658350432827e-05, 'epoch': 0.17} 17%|█▋ | 1160/6638 [1:06:13<4:56:07, 3.24s/it] 17%|█▋ | 1161/6638 [1:06:16<4:54:59, 3.23s/it] {'loss': 0.6704, 'grad_norm': 0.7008026350149225, 'learning_rate': 1.892045402336727e-05, 'epoch': 0.17} 17%|█▋ | 1161/6638 [1:06:16<4:54:59, 3.23s/it] 18%|█▊ | 1162/6638 [1:06:19<4:55:24, 3.24s/it] {'loss': 0.6702, 'grad_norm': 0.6531601421601132, 'learning_rate': 1.8918247572153822e-05, 'epoch': 0.18} 18%|█▊ | 1162/6638 [1:06:19<4:55:24, 3.24s/it] 18%|█▊ | 1163/6638 [1:06:22<4:54:10, 3.22s/it] {'loss': 0.6464, 'grad_norm': 0.7365435203447176, 'learning_rate': 1.8916038997317887e-05, 'epoch': 0.18} 18%|█▊ | 1163/6638 [1:06:22<4:54:10, 3.22s/it] 18%|█▊ | 1164/6638 [1:06:26<4:56:33, 3.25s/it] {'loss': 0.6954, 'grad_norm': 0.6553150026760167, 'learning_rate': 1.8913828299385362e-05, 'epoch': 0.18} 18%|█▊ | 1164/6638 [1:06:26<4:56:33, 3.25s/it] 18%|█▊ | 1165/6638 [1:06:29<4:57:21, 3.26s/it] {'loss': 0.6517, 'grad_norm': 0.6239049865390233, 'learning_rate': 1.8911615478882672e-05, 'epoch': 0.18} 18%|█▊ | 1165/6638 [1:06:29<4:57:21, 3.26s/it] 18%|█▊ | 1166/6638 [1:06:32<4:59:45, 3.29s/it] {'loss': 0.7245, 'grad_norm': 0.7548766031424601, 'learning_rate': 1.8909400536336728e-05, 'epoch': 0.18} 18%|█▊ | 1166/6638 
1167/6638 [1:06:36<5:00:58, 3.30s/it] {'loss': 0.6751, 'grad_norm': 0.61548470910966, 'learning_rate': 1.890718347227496e-05, 'epoch': 0.18}
1168/6638 [1:06:39<4:58:21, 3.27s/it] {'loss': 0.6987, 'grad_norm': 0.7052660437870065, 'learning_rate': 1.8904964287225292e-05, 'epoch': 0.18}
1169/6638 [1:06:42<4:59:05, 3.28s/it] {'loss': 0.6821, 'grad_norm': 0.6430569417193543, 'learning_rate': 1.8902742981716166e-05, 'epoch': 0.18}
1170/6638 [1:06:45<4:59:20, 3.28s/it] {'loss': 0.6426, 'grad_norm': 0.6615253477565682, 'learning_rate': 1.890051955627652e-05, 'epoch': 0.18}
1171/6638 [1:06:49<4:58:55, 3.28s/it] {'loss': 0.6562, 'grad_norm': 0.6769749274796191, 'learning_rate': 1.889829401143579e-05, 'epoch': 0.18}
1172/6638 [1:06:52<4:59:03, 3.28s/it] {'loss': 0.6742, 'grad_norm': 0.580857391302448, 'learning_rate': 1.889606634772394e-05, 'epoch': 0.18}
1173/6638 [1:06:55<5:00:49, 3.30s/it] {'loss': 0.7142, 'grad_norm': 0.726516717124712, 'learning_rate': 1.8893836565671408e-05, 'epoch': 0.18}
1174/6638 [1:06:59<4:58:42, 3.28s/it] {'loss': 0.7094, 'grad_norm': 0.6954936712231816, 'learning_rate': 1.8891604665809162e-05, 'epoch': 0.18}
1175/6638 [1:07:02<4:57:14, 3.26s/it] {'loss': 0.7335, 'grad_norm': 0.6892788792696719, 'learning_rate': 1.888937064866866e-05, 'epoch': 0.18}
1176/6638 [1:07:05<4:56:01, 3.25s/it] {'loss': 0.7027, 'grad_norm': 0.771947108729116, 'learning_rate': 1.8887134514781872e-05, 'epoch': 0.18}
1177/6638 [1:07:08<4:57:02, 3.26s/it] {'loss': 0.6614, 'grad_norm': 0.6438716262108823, 'learning_rate': 1.8884896264681264e-05, 'epoch': 0.18}
1178/6638 [1:07:12<5:03:23, 3.33s/it] {'loss': 0.6785, 'grad_norm': 0.5533864525770534, 'learning_rate': 1.8882655898899812e-05, 'epoch': 0.18}
1179/6638 [1:07:15<5:01:14, 3.31s/it] {'loss': 0.6587, 'grad_norm': 0.6353040056331015, 'learning_rate': 1.8880413417970998e-05, 'epoch': 0.18}
1180/6638 [1:07:18<5:03:36, 3.34s/it] {'loss': 0.7466, 'grad_norm': 0.8544070888893099, 'learning_rate': 1.88781688224288e-05, 'epoch': 0.18}
1181/6638 [1:07:22<5:02:52, 3.33s/it] {'loss': 0.6651, 'grad_norm': 0.6128168670759729, 'learning_rate': 1.8875922112807706e-05, 'epoch': 0.18}
1182/6638 [1:07:25<5:03:21, 3.34s/it] {'loss': 0.6953, 'grad_norm': 0.697509424744347, 'learning_rate': 1.88736732896427e-05, 'epoch': 0.18}
1183/6638 [1:07:28<5:03:57, 3.34s/it] {'loss': 0.7177, 'grad_norm': 0.7250077274995088, 'learning_rate': 1.887142235346928e-05, 'epoch': 0.18}
1184/6638 [1:07:32<5:02:55, 3.33s/it] {'loss': 0.6581, 'grad_norm': 0.6158906885849874, 'learning_rate': 1.8869169304823438e-05, 'epoch': 0.18}
1185/6638 [1:07:35<4:59:38, 3.30s/it] {'loss': 0.6666, 'grad_norm': 0.6420610755094848, 'learning_rate': 1.8866914144241673e-05, 'epoch': 0.18}
1186/6638 [1:07:38<4:57:19, 3.27s/it] {'loss': 0.6684, 'grad_norm': 0.6639904552796635, 'learning_rate': 1.8864656872260985e-05, 'epoch': 0.18}
1187/6638 [1:07:42<5:01:54, 3.32s/it] {'loss': 0.6766, 'grad_norm': 0.5637657964647235, 'learning_rate': 1.8862397489418885e-05, 'epoch': 0.18}
1188/6638 [1:07:45<4:59:05, 3.29s/it] {'loss': 0.6908, 'grad_norm': 0.7756481803309391, 'learning_rate': 1.8860135996253368e-05, 'epoch': 0.18}
1189/6638 [1:07:48<4:56:51, 3.27s/it] {'loss': 0.6882, 'grad_norm': 0.6814481017653025, 'learning_rate': 1.8857872393302955e-05, 'epoch': 0.18}
1190/6638 [1:07:51<4:55:41, 3.26s/it] {'loss': 0.6712, 'grad_norm': 0.6141669349285444, 'learning_rate': 1.8855606681106646e-05, 'epoch': 0.18}
1191/6638 [1:07:54<4:54:13, 3.24s/it] {'loss': 0.7239, 'grad_norm': 0.822067132003536, 'learning_rate': 1.8853338860203962e-05, 'epoch': 0.18}
1192/6638 [1:07:58<4:54:35, 3.25s/it] {'loss': 0.6836, 'grad_norm': 0.58534529255156, 'learning_rate': 1.885106893113492e-05, 'epoch': 0.18}
1193/6638 [1:08:02<5:14:02, 3.46s/it] {'loss': 0.671, 'grad_norm': 0.6896820923023519, 'learning_rate': 1.884879689444003e-05, 'epoch': 0.18}
1194/6638 [1:08:05<5:08:34, 3.40s/it] {'loss': 0.6903, 'grad_norm': 0.7343608536951383, 'learning_rate': 1.884652275066032e-05, 'epoch': 0.18}
1195/6638 [1:08:08<5:05:49, 3.37s/it] {'loss': 0.6567, 'grad_norm': 0.5918102371994218, 'learning_rate': 1.8844246500337308e-05, 'epoch': 0.18}
1196/6638 [1:08:11<5:01:06, 3.32s/it] {'loss': 0.652, 'grad_norm': 0.645714986584149, 'learning_rate': 1.884196814401302e-05, 'epoch': 0.18}
1197/6638 [1:08:15<5:00:58, 3.32s/it] {'loss': 0.6582, 'grad_norm': 0.6547859230691749, 'learning_rate': 1.8839687682229978e-05, 'epoch': 0.18}
1198/6638 [1:08:18<4:58:57, 3.30s/it] {'loss': 0.6782, 'grad_norm': 0.6149244017100866, 'learning_rate': 1.8837405115531205e-05, 'epoch': 0.18}
1199/6638 [1:08:21<4:55:39, 3.26s/it] {'loss': 0.6806, 'grad_norm': 0.6491381922213095, 'learning_rate': 1.883512044446023e-05, 'epoch': 0.18}
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
1200/6638 [1:08:24<4:53:47, 3.24s/it] {'loss': 0.681, 'grad_norm': 0.6607833232081943, 'learning_rate': 1.8832833669561087e-05, 'epoch': 0.18}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1200/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
1201/6638 [1:08:40<10:40:55, 7.07s/it] {'loss': 0.7072, 'grad_norm': 0.6722807627072672, 'learning_rate': 1.8830544791378298e-05, 'epoch': 0.18}
1202/6638 [1:08:44<8:56:08, 5.92s/it] {'loss': 0.6513, 'grad_norm': 0.633551446572449, 'learning_rate': 1.8828253810456898e-05, 'epoch': 0.18}
1203/6638 [1:08:47<7:45:35, 5.14s/it] {'loss': 0.7066, 'grad_norm': 0.617702278667922, 'learning_rate': 1.8825960727342414e-05, 'epoch': 0.18}
1204/6638 [1:08:50<6:55:19, 4.59s/it] {'loss': 0.682, 'grad_norm': 0.7001561356067693, 'learning_rate': 1.8823665542580878e-05, 'epoch': 0.18}
1205/6638 [1:08:54<6:19:57, 4.20s/it] {'loss': 0.6443, 'grad_norm': 0.5870137178572862, 'learning_rate': 1.8821368256718825e-05, 'epoch': 0.18}
1206/6638 [1:08:57<5:54:19, 3.91s/it] {'loss': 0.6447, 'grad_norm': 0.5946178156723558, 'learning_rate': 1.8819068870303287e-05, 'epoch': 0.18}
1207/6638 [1:09:00<5:43:59, 3.80s/it] {'loss': 0.7051, 'grad_norm': 0.6649442997888244, 'learning_rate': 1.881676738388179e-05, 'epoch': 0.18}
1208/6638 [1:09:03<5:26:27, 3.61s/it] {'loss': 0.6743, 'grad_norm': 0.7355827177060826, 'learning_rate': 1.881446379800237e-05, 'epoch': 0.18}
1209/6638 [1:09:07<5:14:28, 3.48s/it] {'loss': 0.6818, 'grad_norm': 0.6571046500774244, 'learning_rate': 1.8812158113213564e-05, 'epoch': 0.18}
1210/6638 [1:09:10<5:11:07, 3.44s/it] {'loss': 0.6374, 'grad_norm': 0.5530991839684608, 'learning_rate': 1.88098503300644e-05, 'epoch': 0.18}
1211/6638 [1:09:13<5:08:18, 3.41s/it] {'loss': 0.6708, 'grad_norm': 0.728000400341443, 'learning_rate': 1.880754044910441e-05, 'epoch': 0.18}
1212/6638 [1:09:17<5:04:31, 3.37s/it] {'loss': 0.6714, 'grad_norm': 0.7019337909936959, 'learning_rate': 1.8805228470883624e-05, 'epoch': 0.18}
1213/6638 [1:09:20<5:03:31, 3.36s/it] {'loss': 0.706, 'grad_norm': 0.8053057953941767, 'learning_rate': 1.880291439595257e-05, 'epoch': 0.18}
1214/6638 [1:09:23<5:03:16, 3.35s/it] {'loss': 0.7291, 'grad_norm': 1.3162085537943795, 'learning_rate': 1.8800598224862286e-05, 'epoch': 0.18}
1215/6638 [1:09:27<5:01:02, 3.33s/it] {'loss': 0.7343, 'grad_norm': 0.8523682141729249, 'learning_rate': 1.8798279958164295e-05, 'epoch': 0.18}
1216/6638 [1:09:30<4:58:06, 3.30s/it] {'loss': 0.6679, 'grad_norm': 0.6675639992395456, 'learning_rate': 1.879595959641063e-05, 'epoch': 0.18}
1217/6638 [1:09:33<4:55:15, 3.27s/it] {'loss': 0.6749, 'grad_norm': 0.6625984202764811, 'learning_rate': 1.8793637140153812e-05, 'epoch': 0.18}
1218/6638 [1:09:36<4:55:39, 3.27s/it] {'loss': 0.6155, 'grad_norm': 0.6138418506468304, 'learning_rate': 1.8791312589946867e-05, 'epoch': 0.18}
1219/6638 [1:09:39<4:53:19, 3.25s/it] {'loss': 0.7359, 'grad_norm': 0.6782100596818343, 'learning_rate': 1.8788985946343327e-05, 'epoch': 0.18}
1220/6638 [1:09:43<5:08:23, 3.42s/it] {'loss': 0.7025, 'grad_norm': 0.625469908199866, 'learning_rate': 1.8786657209897207e-05, 'epoch': 0.18}
1221/6638 [1:09:47<5:14:01, 3.48s/it] {'loss': 0.7367, 'grad_norm': 0.6555598093291926, 'learning_rate': 1.8784326381163038e-05, 'epoch': 0.18}
1222/6638 [1:09:50<5:07:18, 3.40s/it] {'loss': 0.6672, 'grad_norm': 0.6854265011123543, 'learning_rate': 1.8781993460695824e-05, 'epoch': 0.18}
1223/6638 [1:09:53<5:04:02, 3.37s/it] {'loss': 0.6965, 'grad_norm': 0.6599427818786912, 'learning_rate': 1.8779658449051092e-05, 'epoch': 0.18}
1224/6638 [1:09:57<5:01:01, 3.34s/it] {'loss': 0.7101, 'grad_norm': 0.7539322958746176, 'learning_rate': 1.8777321346784858e-05, 'epoch': 0.18}
1225/6638 [1:10:00<4:59:58, 3.32s/it] {'loss': 0.6909, 'grad_norm': 0.6385155851339123, 'learning_rate': 1.8774982154453632e-05, 'epoch': 0.18}
1226/6638 [1:10:03<4:57:41, 3.30s/it] {'loss': 0.682, 'grad_norm': 0.6805399582394873, 'learning_rate': 1.877264087261443e-05, 'epoch': 0.18}
1227/6638 [1:10:07<4:58:31, 3.31s/it] {'loss': 0.6585, 'grad_norm': 0.6667697803203647, 'learning_rate': 1.877029750182475e-05, 'epoch': 0.18}
1228/6638 [1:10:10<4:56:02, 3.28s/it] {'loss': 0.6837, 'grad_norm': 0.6457652700094633, 'learning_rate': 1.8767952042642613e-05, 'epoch': 0.18}
1229/6638 [1:10:13<4:52:36, 3.25s/it] {'loss': 0.6827, 'grad_norm': 0.6706528073733432, 'learning_rate': 1.876560449562651e-05, 'epoch': 0.19}
1230/6638 [1:10:16<4:53:53, 3.26s/it] {'loss': 0.7255, 'grad_norm': 0.681470295673143, 'learning_rate': 1.8763254861335445e-05, 'epoch': 0.19}
1231/6638 [1:10:19<4:52:16, 3.24s/it] {'loss': 0.6811, 'grad_norm': 0.6644485303880913, 'learning_rate': 1.8760903140328917e-05, 'epoch': 0.19}
1232/6638 [1:10:23<4:51:00, 3.23s/it] {'loss': 0.66, 'grad_norm': 0.6383526451340901, 'learning_rate': 1.875854933316692e-05, 'epoch': 0.19}
1233/6638 [1:10:26<4:52:47, 3.25s/it] {'loss': 0.6896, 'grad_norm': 0.6460923455109141, 'learning_rate': 1.8756193440409945e-05, 'epoch': 0.19}
1234/6638 [1:10:29<4:52:18, 3.25s/it] {'loss': 0.6569, 'grad_norm': 0.6153933951413478, 'learning_rate': 1.8753835462618976e-05, 'epoch': 0.19}
1235/6638 [1:10:32<4:51:58, 3.24s/it] {'loss': 0.6502, 'grad_norm': 0.6068673282153165, 'learning_rate': 1.87514754003555e-05, 'epoch': 0.19}
1236/6638 [1:10:36<4:56:27, 3.29s/it] {'loss': 0.6938, 'grad_norm': 0.6135006016741416, 'learning_rate': 1.8749113254181498e-05, 'epoch': 0.19}
1237/6638 [1:10:39<4:55:41, 3.28s/it] {'loss': 0.6493, 'grad_norm': 0.5820519261213161, 'learning_rate': 1.8746749024659445e-05, 'epoch': 0.19}
1238/6638 [1:10:42<4:53:48, 3.26s/it] {'loss': 0.6335, 'grad_norm': 0.5773702778270887, 'learning_rate': 1.8744382712352317e-05, 'epoch': 0.19}
1239/6638 [1:10:46<4:53:48, 3.27s/it] {'loss': 0.7017, 'grad_norm': 0.8761569807720294, 'learning_rate': 1.8742014317823583e-05, 'epoch': 0.19}
1240/6638 [1:10:49<4:50:57, 3.23s/it] {'loss': 0.6523, 'grad_norm': 0.7028081702902207, 'learning_rate': 1.8739643841637202e-05, 'epoch': 0.19}
1241/6638 [1:10:52<4:52:13, 3.25s/it] {'loss': 0.7029, 'grad_norm': 0.654631159720294, 'learning_rate': 1.873727128435764e-05, 'epoch': 0.19}
1242/6638 [1:10:55<4:54:25, 3.27s/it] {'loss': 0.6542, 'grad_norm': 0.6083001922646839, 'learning_rate': 1.873489664654985e-05, 'epoch': 0.19}
1243/6638 [1:10:59<4:52:26, 3.25s/it] {'loss': 0.639, 'grad_norm': 0.5840482262308646, 'learning_rate': 1.873251992877928e-05, 'epoch': 0.19}
1244/6638 [1:11:02<4:59:29, 3.33s/it] {'loss': 0.6933, 'grad_norm': 0.7558945204612608, 'learning_rate': 1.8730141131611882e-05, 'epoch': 0.19}
1245/6638 [1:11:05<4:59:09, 3.33s/it] {'loss': 0.7166, 'grad_norm': 0.6533565207040796, 'learning_rate': 1.8727760255614097e-05, 'epoch': 0.19}
1246/6638 [1:11:09<4:56:09, 3.30s/it] {'loss': 0.6987, 'grad_norm': 0.6649908857250304, 'learning_rate': 1.872537730135286e-05, 'epoch': 0.19}
1247/6638 [1:11:12<4:53:23, 3.27s/it] {'loss': 0.6747, 'grad_norm': 0.6450044529355627, 'learning_rate': 1.8722992269395605e-05, 'epoch': 0.19}
1248/6638 [1:11:15<4:50:03, 3.23s/it] {'loss': 0.645, 'grad_norm': 0.6436106920558151, 'learning_rate': 1.8720605160310257e-05, 'epoch': 0.19}
1249/6638 [1:11:18<4:49:51, 3.23s/it] {'loss': 0.6841, 'grad_norm': 0.661468838194671, 'learning_rate': 1.8718215974665235e-05, 'epoch': 0.19}
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
1250/6638 [1:11:21<4:46:56, 3.20s/it] {'loss': 0.6473, 'grad_norm': 0.6471639946937242, 'learning_rate': 1.8715824713029455e-05, 'epoch': 0.19}
1251/6638 [1:11:25<4:47:59, 3.21s/it] {'loss': 0.6631, 'grad_norm': 0.6382674970543903, 'learning_rate': 1.871343137597233e-05, 'epoch': 0.19}
1252/6638 [1:11:28<4:52:02, 3.25s/it] {'loss': 0.6999, 'grad_norm': 0.6535373044457387, 'learning_rate': 1.871103596406376e-05, 'epoch': 0.19}
1253/6638 [1:11:31<4:52:45, 3.26s/it] {'loss': 0.6605, 'grad_norm': 0.701963326658867, 'learning_rate': 1.8708638477874145e-05, 'epoch': 0.19}
1254/6638 [1:11:34<4:53:51, 3.27s/it] {'loss': 0.693, 'grad_norm': 0.6939751587499933, 'learning_rate': 1.8706238917974377e-05, 'epoch': 0.19}
1255/6638 [1:11:38<4:54:32, 3.28s/it] {'loss': 0.679, 'grad_norm': 0.6660456954537872, 'learning_rate': 1.870383728493584e-05, 'epoch': 0.19}
1256/6638 [1:11:41<4:56:08, 3.30s/it] {'loss': 0.661, 'grad_norm': 0.9106870638976644, 'learning_rate': 1.8701433579330414e-05, 'epoch': 0.19}
1257/6638 [1:11:44<4:55:47, 3.30s/it] {'loss': 0.678, 'grad_norm': 0.6755047202442478, 'learning_rate': 1.8699027801730473e-05, 'epoch': 0.19}
1258/6638 [1:11:48<4:58:50, 3.33s/it] {'loss': 0.7001, 'grad_norm': 0.7410857914080149, 'learning_rate': 1.8696619952708885e-05, 'epoch': 0.19}
1259/6638 [1:11:51<4:55:16, 3.29s/it] {'loss': 0.6585, 'grad_norm': 0.5996962311753415, 'learning_rate': 1.8694210032839005e-05, 'epoch': 0.19}
1260/6638 [1:11:54<4:53:54, 3.28s/it] {'loss': 0.66, 'grad_norm': 0.5972354054193265, 'learning_rate': 1.869179804269469e-05, 'epoch': 0.19}
1261/6638 [1:11:57<4:51:34, 3.25s/it] {'loss': 0.6793, 'grad_norm': 0.74304000991722, 'learning_rate': 1.8689383982850284e-05, 'epoch': 0.19}
1262/6638 [1:12:01<4:49:12, 3.23s/it] {'loss': 0.6783, 'grad_norm': 0.6366071257304082, 'learning_rate': 1.868696785388062e-05, 'epoch': 0.19}
1263/6638 [1:12:04<4:55:39, 3.30s/it] {'loss': 0.7349, 'grad_norm': 0.6520457240014027, 'learning_rate': 1.868454965636104e-05, 'epoch': 0.19}
1264/6638 [1:12:07<4:50:43, 3.25s/it] {'loss': 0.6361, 'grad_norm': 0.6598910381869969, 'learning_rate': 1.868212939086736e-05, 'epoch': 0.19}
1265/6638 [1:12:10<4:50:26, 3.24s/it] {'loss': 0.6503, 'grad_norm': 0.5997685067583168, 'learning_rate': 1.8679707057975894e-05, 'epoch': 0.19}
1266/6638 [1:12:14<4:50:00, 3.24s/it] {'loss': 0.6678, 'grad_norm': 0.7656029436269592, 'learning_rate': 1.867728265826346e-05, 'epoch': 0.19}
1267/6638 [1:12:17<4:52:49, 3.27s/it] {'loss': 0.6879, 'grad_norm': 0.6467333745128909, 'learning_rate': 1.8674856192307354e-05, 'epoch': 0.19}
1268/6638 [1:12:20<4:50:57, 3.25s/it] {'loss': 0.6694, 'grad_norm': 0.7825542558394035, 'learning_rate': 1.8672427660685365e-05, 'epoch': 0.19}
1269/6638 [1:12:24<4:54:51, 3.30s/it] {'loss': 0.6572, 'grad_norm': 0.6120126955954358, 'learning_rate': 1.8669997063975783e-05, 'epoch': 0.19}
1270/6638 [1:12:27<4:54:01, 3.29s/it] {'loss': 0.709, 'grad_norm': 0.692458320799285, 'learning_rate': 1.866756440275738e-05, 'epoch': 0.19}
1271/6638 [1:12:30<4:54:51, 3.30s/it] {'loss': 0.7076, 'grad_norm': 0.7299291693512933, 'learning_rate': 1.8665129677609425e-05, 'epoch': 0.19}
1272/6638 [1:12:33<4:53:17, 3.28s/it] {'loss': 0.6578, 'grad_norm': 0.6579057561668477, 'learning_rate': 1.8662692889111675e-05, 'epoch': 0.19}
1273/6638 [1:12:37<4:54:30, 3.29s/it] {'loss': 0.6909, 'grad_norm': 0.7515949361353235, 'learning_rate': 1.866025403784439e-05, 'epoch': 0.19}
1274/6638 [1:12:40<4:53:02, 3.28s/it] {'loss': 0.6527, 'grad_norm': 0.5820174535011281, 'learning_rate': 1.86578131243883e-05, 'epoch': 0.19}
1275/6638 [1:12:43<4:52:43, 3.27s/it] {'loss': 0.6342, 'grad_norm': 0.6528916445753146, 'learning_rate': 1.8655370149324647e-05, 'epoch': 0.19}
1276/6638 [1:12:46<4:50:15, 3.25s/it] {'loss': 0.6467, 'grad_norm': 0.6306934300008409, 'learning_rate': 1.8652925113235146e-05, 'epoch': 0.19}
1277/6638 [1:12:50<4:49:55, 3.24s/it] {'loss': 0.6938, 'grad_norm': 0.6839169657183893, 'learning_rate': 1.865047801670202e-05, 'epoch': 0.19}
1278/6638 [1:12:53<4:51:49, 3.27s/it] {'loss': 0.724, 'grad_norm': 0.8366384952485972, 'learning_rate': 1.864802886030797e-05, 'epoch': 0.19}
1279/6638 [1:12:56<4:56:47, 3.32s/it] {'loss': 0.6344, 'grad_norm': 0.5491485833180568, 'learning_rate': 1.8645577644636193e-05, 'epoch': 0.19}
1280/6638 [1:13:00<4:52:06, 3.27s/it] {'loss': 0.6754, 'grad_norm': 0.7711588630767937, 'learning_rate': 1.8643124370270373e-05, 'epoch': 0.19}
1281/6638 [1:13:03<4:52:06, 3.27s/it] {'loss': 0.6833, 'grad_norm': 0.6793963767609885, 'learning_rate': 1.864066903779469e-05, 'epoch': 0.19}
1282/6638 [1:13:06<4:48:35, 3.23s/it] {'loss': 0.6781, 'grad_norm': 0.8641168567957884, 'learning_rate': 1.8638211647793807e-05, 'epoch': 0.19}
1283/6638 [1:13:10<4:57:45, 3.34s/it] {'loss': 0.6722, 'grad_norm': 0.6964547642622946, 'learning_rate': 1.8635752200852878e-05, 'epoch': 0.19}
1284/6638 [1:13:13<4:58:33, 3.35s/it] {'loss': 0.7037, 'grad_norm': 0.6975680859749758, 'learning_rate': 1.8633290697557557e-05, 'epoch': 0.19}
1285/6638 [1:13:16<5:00:22, 3.37s/it] {'loss': 0.7098, 'grad_norm': 0.6854202896269833, 'learning_rate': 1.8630827138493975e-05, 'epoch': 0.19}
1286/6638 [1:13:20<5:00:59, 3.37s/it] {'loss': 0.6914, 'grad_norm': 0.6273829367846141, 'learning_rate': 1.8628361524248755e-05, 'epoch': 0.19}
1287/6638 [1:13:23<4:56:10, 3.32s/it] {'loss': 0.6594, 'grad_norm': 0.6417164051787517, 'learning_rate': 1.8625893855409014e-05, 'epoch': 0.19}
1288/6638 [1:13:26<4:53:34, 3.29s/it] {'loss': 0.6903, 'grad_norm': 0.7606541274546917, 'learning_rate': 1.8623424132562358e-05, 'epoch': 0.19}
1289/6638 [1:13:30<4:54:00, 3.30s/it] {'loss': 0.672, 'grad_norm': 0.6230852227185121, 'learning_rate': 1.8620952356296876e-05, 'epoch': 0.19}
1290/6638 [1:13:33<4:56:40, 3.33s/it] {'loss': 0.6556, 'grad_norm': 0.7164997419602537, 'learning_rate': 1.8618478527201154e-05, 'epoch': 0.19}
1291/6638 [1:13:36<4:56:02, 3.32s/it] {'loss': 0.7253, 'grad_norm': 0.6700850119944358, 'learning_rate': 1.861600264586426e-05, 'epoch': 0.19}
1292/6638 [1:13:39<4:52:27, 3.28s/it] {'loss': 0.6692, 'grad_norm': 0.6997845913021749, 'learning_rate': 1.8613524712875755e-05, 'epoch': 0.19}
1293/6638 [1:13:43<4:50:31, 3.26s/it] {'loss': 0.6781, 'grad_norm': 0.7719910147661084, 'learning_rate': 1.8611044728825683e-05, 'epoch': 0.19}
1294/6638 [1:13:46<4:48:50, 3.24s/it] {'loss': 0.6873, 'grad_norm': 0.7505363653014495, 'learning_rate': 1.8608562694304592e-05, 'epoch': 0.19}
1295/6638 [1:13:49<4:50:50, 3.27s/it] {'loss': 0.6809, 'grad_norm': 0.6292219155452681, 'learning_rate': 1.860607860990349e-05, 'epoch': 0.2}
1296/6638 [1:13:52<4:51:13, 3.27s/it] {'loss': 0.6829, 'grad_norm': 0.7522469028125686, 'learning_rate': 1.86035924762139e-05, 'epoch': 0.2}
1297/6638 [1:13:56<4:52:44, 3.29s/it] {'loss': 0.7309, 'grad_norm': 0.706198152840812, 'learning_rate': 1.8601104293827823e-05, 'epoch': 0.2}
1298/6638 [1:13:59<4:52:36, 3.29s/it] {'loss': 0.7188, 'grad_norm': 0.7397753890051006, 'learning_rate': 1.8598614063337743e-05, 'epoch': 0.2}
1299/6638 [1:14:02<4:52:30, 3.29s/it] {'loss': 0.6501, 'grad_norm': 0.7189320294472463, 'learning_rate': 1.859612178533664e-05, 'epoch': 0.2}
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
AutoResumeHook: Checking whether to suspend... 5 20%|█▉ | 1300/6638 [1:14:06<4:52:50, 3.29s/it]AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 6AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... {'loss': 0.6581, 'grad_norm': 0.6368397066485124, 'learning_rate': 1.8593627460417977e-05, 'epoch': 0.2} 20%|█▉ | 1300/6638 [1:14:06<4:52:50, 3.29s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). 
If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
20%|█▉ | 1301/6638 [1:14:21<10:15:50, 6.92s/it] {'loss': 0.6832, 'grad_norm': 0.7923600286749349, 'learning_rate': 1.8591131089175704e-05, 'epoch': 0.2}
20%|█▉ | 1302/6638 [1:14:24<8:40:47, 5.86s/it] {'loss': 0.6776, 'grad_norm': 0.6638851008705375, 'learning_rate': 1.8588632672204264e-05, 'epoch': 0.2}
20%|█▉ | 1303/6638 [1:14:28<7:29:48, 5.06s/it] {'loss': 0.6874, 'grad_norm': 0.6639449148805172, 'learning_rate': 1.8586132210098573e-05, 'epoch': 0.2}
20%|█▉ | 1304/6638 [1:14:31<6:39:24, 4.49s/it] {'loss': 0.6981, 'grad_norm': 0.7427749412804127, 'learning_rate': 1.8583629703454055e-05, 'epoch': 0.2}
20%|█▉ | 1305/6638 [1:14:34<6:08:05, 4.14s/it] {'loss': 0.7061, 'grad_norm': 0.6396134725400101, 'learning_rate': 1.85811251528666e-05, 'epoch': 0.2}
20%|█▉ | 1306/6638 [1:14:37<5:43:21, 3.86s/it] {'loss': 0.6606, 'grad_norm': 0.6326135844428522, 'learning_rate': 1.85786185589326e-05, 'epoch': 0.2}
20%|█▉ | 1307/6638 [1:14:40<5:22:58, 3.64s/it] {'loss': 0.6417, 'grad_norm': 0.6720121642224005, 'learning_rate': 1.8576109922248932e-05, 'epoch': 0.2}
20%|█▉ | 1308/6638 [1:14:44<5:15:48, 3.56s/it] {'loss': 0.6969, 'grad_norm': 0.6373592766502154, 'learning_rate': 1.8573599243412944e-05, 'epoch': 0.2}
20%|█▉ | 1309/6638 [1:14:47<5:06:38,
3.45s/it] {'loss': 0.642, 'grad_norm': 0.6260691811938939, 'learning_rate': 1.8571086523022486e-05, 'epoch': 0.2}
20%|█▉ | 1310/6638 [1:14:50<5:04:46, 3.43s/it] {'loss': 0.6926, 'grad_norm': 0.6388368554724736, 'learning_rate': 1.8568571761675893e-05, 'epoch': 0.2}
20%|█▉ | 1311/6638 [1:14:54<5:02:07, 3.40s/it] {'loss': 0.6601, 'grad_norm': 0.5883678737357282, 'learning_rate': 1.8566054959971983e-05, 'epoch': 0.2}
20%|█▉ | 1312/6638 [1:14:57<4:59:46, 3.38s/it] {'loss': 0.6644, 'grad_norm': 0.6003048867024199, 'learning_rate': 1.856353611851005e-05, 'epoch': 0.2}
20%|█▉ | 1313/6638 [1:15:00<4:55:08, 3.33s/it] {'loss': 0.6924, 'grad_norm': 0.6115551554325095, 'learning_rate': 1.8561015237889896e-05, 'epoch': 0.2}
20%|█▉ | 1314/6638 [1:15:03<4:52:48, 3.30s/it] {'loss': 0.6451, 'grad_norm': 0.5703001687962591, 'learning_rate': 1.8558492318711786e-05, 'epoch': 0.2}
20%|█▉ | 1315/6638 [1:15:07<4:51:30, 3.29s/it] {'loss': 0.6817, 'grad_norm': 0.6145011092225048, 'learning_rate': 1.8555967361576484e-05, 'epoch': 0.2}
20%|█▉ | 1316/6638 [1:15:10<4:49:48, 3.27s/it] {'loss': 0.6885, 'grad_norm': 0.7022323549469678, 'learning_rate': 1.8553440367085238e-05, 'epoch': 0.2}
20%|█▉ | 1317/6638 [1:15:13<4:50:35, 3.28s/it] {'loss': 0.7427, 'grad_norm': 0.7440141704601526, 'learning_rate': 1.8550911335839774e-05, 'epoch': 0.2}
20%|█▉ | 1318/6638 [1:15:17<4:49:31, 3.27s/it] {'loss': 0.6718, 'grad_norm': 0.6266363035709854, 'learning_rate': 1.854838026844231e-05, 'epoch': 0.2}
20%|█▉ | 1319/6638 [1:15:20<4:49:40, 3.27s/it]
{'loss': 0.7033, 'grad_norm': 0.6738878562790707, 'learning_rate': 1.8545847165495547e-05, 'epoch': 0.2}
20%|█▉ | 1320/6638 [1:15:23<4:51:58, 3.29s/it] {'loss': 0.7048, 'grad_norm': 0.6754931748029022, 'learning_rate': 1.8543312027602668e-05, 'epoch': 0.2}
20%|█▉ | 1321/6638 [1:15:26<4:51:45, 3.29s/it] {'loss': 0.6714, 'grad_norm': 0.6463293177138033, 'learning_rate': 1.8540774855367344e-05, 'epoch': 0.2}
20%|█▉ | 1322/6638 [1:15:30<4:53:35, 3.31s/it] {'loss': 0.7097, 'grad_norm': 0.6902170658610308, 'learning_rate': 1.8538235649393727e-05, 'epoch': 0.2}
20%|█▉ | 1323/6638 [1:15:33<4:50:04, 3.27s/it] {'loss': 0.6864, 'grad_norm': 0.6668150881500824, 'learning_rate': 1.853569441028646e-05, 'epoch': 0.2}
20%|█▉ | 1324/6638 [1:15:36<4:48:25, 3.26s/it] {'loss': 0.6613, 'grad_norm': 0.6180602257302807, 'learning_rate': 1.8533151138650657e-05, 'epoch': 0.2}
20%|█▉ | 1325/6638 [1:15:39<4:49:27, 3.27s/it] {'loss': 0.7159, 'grad_norm': 0.6137201114147662, 'learning_rate': 1.8530605835091936e-05, 'epoch': 0.2}
20%|█▉ | 1326/6638 [1:15:43<4:50:36, 3.28s/it] {'loss': 0.7491, 'grad_norm': 0.8057416209181951, 'learning_rate': 1.8528058500216383e-05, 'epoch': 0.2}
20%|█▉ | 1327/6638 [1:15:46<4:48:36, 3.26s/it] {'loss': 0.6492, 'grad_norm': 0.5929730100565535, 'learning_rate': 1.8525509134630566e-05, 'epoch': 0.2}
20%|██ | 1328/6638 [1:15:49<4:49:08, 3.27s/it] {'loss': 0.6627, 'grad_norm': 0.6362586509291763, 'learning_rate': 1.852295773894155e-05, 'epoch': 0.2}
20%|██ | 1329/6638 [1:15:53<4:47:52, 3.25s/it] {'loss':
0.6793, 'grad_norm': 0.6383946011431068, 'learning_rate': 1.852040431375687e-05, 'epoch': 0.2}
20%|██ | 1330/6638 [1:15:56<4:46:27, 3.24s/it] {'loss': 0.6532, 'grad_norm': 0.691963055933888, 'learning_rate': 1.851784885968456e-05, 'epoch': 0.2}
20%|██ | 1331/6638 [1:15:59<4:49:40, 3.28s/it] {'loss': 0.6418, 'grad_norm': 0.561528685814138, 'learning_rate': 1.8515291377333114e-05, 'epoch': 0.2}
20%|██ | 1332/6638 [1:16:02<4:48:48, 3.27s/it] {'loss': 0.6669, 'grad_norm': 0.6148535701317194, 'learning_rate': 1.8512731867311534e-05, 'epoch': 0.2}
20%|██ | 1333/6638 [1:16:06<4:49:38, 3.28s/it] {'loss': 0.6841, 'grad_norm': 0.6479722673675775, 'learning_rate': 1.851017033022929e-05, 'epoch': 0.2}
20%|██ | 1334/6638 [1:16:09<4:50:33, 3.29s/it] {'loss': 0.6943, 'grad_norm': 0.7123339109272768, 'learning_rate': 1.850760676669633e-05, 'epoch': 0.2}
20%|██ | 1335/6638 [1:16:12<4:49:03, 3.27s/it] {'loss': 0.6594, 'grad_norm': 0.6033259395714131, 'learning_rate': 1.85050411773231e-05, 'epoch': 0.2}
20%|██ | 1336/6638 [1:16:15<4:48:39, 3.27s/it] {'loss': 0.6994, 'grad_norm': 0.6651586822868355, 'learning_rate': 1.8502473562720525e-05, 'epoch': 0.2}
20%|██ | 1337/6638 [1:16:19<4:51:42, 3.30s/it] {'loss': 0.7111, 'grad_norm': 0.6591158048281441, 'learning_rate': 1.8499903923500002e-05, 'epoch': 0.2}
20%|██ | 1338/6638 [1:16:22<4:49:14, 3.27s/it] {'loss': 0.6836, 'grad_norm': 0.6007064292321194, 'learning_rate': 1.8497332260273415e-05, 'epoch': 0.2}
20%|██ | 1339/6638 [1:16:25<4:49:17, 3.28s/it] {'loss': 0.6727,
'grad_norm': 0.636670357654969, 'learning_rate': 1.8494758573653134e-05, 'epoch': 0.2}
20%|██ | 1340/6638 [1:16:29<4:52:45, 3.32s/it] {'loss': 0.7282, 'grad_norm': 0.6525220103796071, 'learning_rate': 1.849218286425201e-05, 'epoch': 0.2}
20%|██ | 1341/6638 [1:16:32<4:50:32, 3.29s/it] {'loss': 0.6386, 'grad_norm': 0.5538695425429433, 'learning_rate': 1.8489605132683365e-05, 'epoch': 0.2}
20%|██ | 1342/6638 [1:16:35<4:51:16, 3.30s/it] {'loss': 0.7269, 'grad_norm': 0.7089285226416117, 'learning_rate': 1.848702537956102e-05, 'epoch': 0.2}
20%|██ | 1343/6638 [1:16:39<4:50:03, 3.29s/it] {'loss': 0.7056, 'grad_norm': 0.7271504741902867, 'learning_rate': 1.8484443605499266e-05, 'epoch': 0.2}
20%|██ | 1344/6638 [1:16:42<4:50:13, 3.29s/it] {'loss': 0.723, 'grad_norm': 0.6842319188400756, 'learning_rate': 1.848185981111288e-05, 'epoch': 0.2}
20%|██ | 1345/6638 [1:16:45<4:50:32, 3.29s/it] {'loss': 0.6668, 'grad_norm': 0.654186533124208, 'learning_rate': 1.8479273997017112e-05, 'epoch': 0.2}
20%|██ | 1346/6638 [1:16:48<4:50:19, 3.29s/it] {'loss': 0.717, 'grad_norm': 0.6262112105542913, 'learning_rate': 1.84766861638277e-05, 'epoch': 0.2}
20%|██ | 1347/6638 [1:16:52<4:52:17, 3.31s/it] {'loss': 0.7169, 'grad_norm': 0.6924284904108939, 'learning_rate': 1.8474096312160866e-05, 'epoch': 0.2}
20%|██ | 1348/6638 [1:16:55<4:52:36, 3.32s/it] {'loss': 0.7882, 'grad_norm': 1.0487927835333088, 'learning_rate': 1.8471504442633304e-05, 'epoch': 0.2}
20%|██ | 1349/6638 [1:16:58<4:51:42, 3.31s/it] {'loss': 0.6772, 'grad_norm':
0.6415212295248469, 'learning_rate': 1.8468910555862196e-05, 'epoch': 0.2}
AutoResumeHook: Checking whether to suspend...
20%|██ | 1350/6638 [1:17:02<4:52:13, 3.32s/it] {'loss': 0.6895, 'grad_norm': 0.6661915554690784, 'learning_rate': 1.84663146524652e-05, 'epoch': 0.2}
20%|██ | 1351/6638 [1:17:05<4:49:51, 3.29s/it] {'loss': 0.7044, 'grad_norm': 0.7398288220146225, 'learning_rate': 1.8463716733060454e-05, 'epoch': 0.2}
20%|██ | 1352/6638 [1:17:08<4:49:28, 3.29s/it] {'loss': 0.6651, 'grad_norm': 0.6620694824146058, 'learning_rate': 1.846111679826658e-05, 'epoch': 0.2}
20%|██ | 1353/6638 [1:17:11<4:47:58, 3.27s/it] {'loss': 0.6559, 'grad_norm': 0.6639608396946841, 'learning_rate': 1.845851484870267e-05, 'epoch': 0.2}
20%|██ | 1354/6638 [1:17:15<4:55:03, 3.35s/it] {'loss': 0.7143, 'grad_norm': 0.7158491509548275, 'learning_rate': 1.8455910884988312e-05, 'epoch': 0.2}
20%|██ | 1355/6638 [1:17:18<4:53:17, 3.33s/it] {'loss': 0.7084, 'grad_norm': 0.7510125849038746, 'learning_rate': 1.8453304907743562e-05, 'epoch': 0.2}
20%|██ | 1356/6638 [1:17:22<4:51:21, 3.31s/it] {'loss': 0.6719, 'grad_norm': 0.7447024663372724, 'learning_rate': 1.8450696917588954e-05, 'epoch': 0.2}
20%|██ | 1357/6638 [1:17:25<4:50:53, 3.30s/it] {'loss': 0.67, 'grad_norm': 0.650049016295174,
'learning_rate': 1.8448086915145508e-05, 'epoch': 0.2}
20%|██ | 1358/6638 [1:17:28<4:45:48, 3.25s/it] {'loss': 0.6577, 'grad_norm': 0.6426188278301328, 'learning_rate': 1.844547490103472e-05, 'epoch': 0.2}
20%|██ | 1359/6638 [1:17:31<4:45:35, 3.25s/it] {'loss': 0.6436, 'grad_norm': 0.625066326770112, 'learning_rate': 1.844286087587857e-05, 'epoch': 0.2}
20%|██ | 1360/6638 [1:17:34<4:43:59, 3.23s/it] {'loss': 0.6337, 'grad_norm': 0.6002975782356008, 'learning_rate': 1.8440244840299507e-05, 'epoch': 0.2}
21%|██ | 1361/6638 [1:17:38<4:47:11, 3.27s/it] {'loss': 0.6729, 'grad_norm': 0.7148213047356087, 'learning_rate': 1.8437626794920464e-05, 'epoch': 0.21}
21%|██ | 1362/6638 [1:17:41<4:45:13, 3.24s/it] {'loss': 0.6832, 'grad_norm': 0.645800450641402, 'learning_rate': 1.8435006740364857e-05, 'epoch': 0.21}
21%|██ | 1363/6638 [1:17:44<4:46:46, 3.26s/it] {'loss': 0.6791, 'grad_norm': 0.6667030047021277, 'learning_rate': 1.843238467725657e-05, 'epoch': 0.21}
21%|██ | 1364/6638 [1:17:47<4:45:34, 3.25s/it] {'loss': 0.6631, 'grad_norm': 0.6586002193976124, 'learning_rate': 1.8429760606219974e-05, 'epoch': 0.21}
21%|██ | 1365/6638 [1:17:51<4:47:06, 3.27s/it] {'loss': 0.6711, 'grad_norm': 0.6758626421966961, 'learning_rate': 1.8427134527879923e-05, 'epoch': 0.21}
21%|██ | 1366/6638 [1:17:54<4:45:24, 3.25s/it] {'loss': 0.714, 'grad_norm': 0.9271985501108518, 'learning_rate': 1.8424506442861733e-05, 'epoch': 0.21}
21%|██ | 1367/6638 [1:17:57<4:52:14, 3.33s/it] {'loss': 0.7012, 'grad_norm': 0.8131683553489338,
'learning_rate': 1.8421876351791206e-05, 'epoch': 0.21}
21%|██ | 1368/6638 [1:18:01<4:55:18, 3.36s/it] {'loss': 0.699, 'grad_norm': 0.6265243460377387, 'learning_rate': 1.8419244255294628e-05, 'epoch': 0.21}
21%|██ | 1369/6638 [1:18:04<4:53:08, 3.34s/it] {'loss': 0.7001, 'grad_norm': 0.7043324872525049, 'learning_rate': 1.8416610153998748e-05, 'epoch': 0.21}
21%|██ | 1370/6638 [1:18:07<4:49:25, 3.30s/it] {'loss': 0.6849, 'grad_norm': 0.6996773366908847, 'learning_rate': 1.8413974048530813e-05, 'epoch': 0.21}
21%|██ | 1371/6638 [1:18:11<4:46:15, 3.26s/it] {'loss': 0.7093, 'grad_norm': 0.7583174689868039, 'learning_rate': 1.8411335939518523e-05, 'epoch': 0.21}
21%|██ | 1372/6638 [1:18:14<4:51:20, 3.32s/it] {'loss': 0.6947, 'grad_norm': 0.648628371637579, 'learning_rate': 1.8408695827590073e-05, 'epoch': 0.21}
21%|██ | 1373/6638 [1:18:17<4:51:39, 3.32s/it] {'loss': 0.6641, 'grad_norm': 0.664846633104132, 'learning_rate': 1.840605371337413e-05, 'epoch': 0.21}
21%|██ | 1374/6638 [1:18:21<4:48:40, 3.29s/it] {'loss': 0.6758, 'grad_norm': 0.6308553386697108, 'learning_rate': 1.8403409597499835e-05, 'epoch': 0.21}
21%|██ | 1375/6638 [1:18:24<4:48:13, 3.29s/it] {'loss': 0.6282, 'grad_norm': 0.5556427904151043, 'learning_rate': 1.840076348059681e-05, 'epoch': 0.21}
21%|██ | 1376/6638 [1:18:27<4:48:00, 3.28s/it] {'loss': 0.6858, 'grad_norm': 0.621108825833484, 'learning_rate': 1.8398115363295152e-05, 'epoch': 0.21}
21%|██ | 1377/6638 [1:18:31<4:55:02, 3.36s/it] {'loss': 0.6758, 'grad_norm': 0.6352190290973909,
'learning_rate': 1.8395465246225425e-05, 'epoch': 0.21}
21%|██ | 1378/6638 [1:18:34<4:55:24, 3.37s/it] {'loss': 0.6669, 'grad_norm': 0.5867656054741606, 'learning_rate': 1.8392813130018687e-05, 'epoch': 0.21}
21%|██ | 1379/6638 [1:18:37<4:53:07, 3.34s/it] {'loss': 0.6826, 'grad_norm': 0.6305991091450499, 'learning_rate': 1.839015901530646e-05, 'epoch': 0.21}
21%|██ | 1380/6638 [1:18:41<4:50:01, 3.31s/it] {'loss': 0.685, 'grad_norm': 0.6631617149575023, 'learning_rate': 1.8387502902720744e-05, 'epoch': 0.21}
21%|██ | 1381/6638 [1:18:44<4:48:26, 3.29s/it] {'loss': 0.6745, 'grad_norm': 0.630483105143307, 'learning_rate': 1.8384844792894014e-05, 'epoch': 0.21}
21%|██ | 1382/6638 [1:18:47<4:47:21, 3.28s/it] {'loss': 0.6729, 'grad_norm': 0.704474502772705, 'learning_rate': 1.8382184686459225e-05, 'epoch': 0.21}
21%|██ | 1383/6638 [1:18:50<4:50:35, 3.32s/it] {'loss': 0.6534, 'grad_norm': 0.6415121658493754, 'learning_rate': 1.8379522584049803e-05, 'epoch': 0.21}
21%|██ | 1384/6638 [1:18:54<4:48:29, 3.29s/it] {'loss': 0.6881, 'grad_norm': 0.6133715236667957, 'learning_rate': 1.837685848629965e-05, 'epoch': 0.21}
21%|██ | 1385/6638 [1:18:57<4:49:34, 3.31s/it] {'loss': 0.6582, 'grad_norm': 0.6237425079183482, 'learning_rate': 1.8374192393843143e-05, 'epoch': 0.21}
21%|██ | 1386/6638 [1:19:00<4:50:26, 3.32s/it] {'loss': 0.6623, 'grad_norm': 0.6433697866983755, 'learning_rate': 1.8371524307315135e-05, 'epoch': 0.21}
21%|██ | 1387/6638 [1:19:04<4:49:18, 3.31s/it] {'loss': 0.6603, 'grad_norm': 0.5905312400997305,
'learning_rate': 1.8368854227350955e-05, 'epoch': 0.21}
21%|██ | 1388/6638 [1:19:07<4:47:15, 3.28s/it] {'loss': 0.6972, 'grad_norm': 0.6616293078311177, 'learning_rate': 1.8366182154586408e-05, 'epoch': 0.21}
21%|██ | 1389/6638 [1:19:10<4:44:55, 3.26s/it] {'loss': 0.6741, 'grad_norm': 0.6629074015334505, 'learning_rate': 1.8363508089657763e-05, 'epoch': 0.21}
21%|██ | 1390/6638 [1:19:13<4:45:36, 3.27s/it] {'loss': 0.6935, 'grad_norm': 0.6774692740541267, 'learning_rate': 1.8360832033201777e-05, 'epoch': 0.21}
21%|██ | 1391/6638 [1:19:17<4:46:08, 3.27s/it] {'loss': 0.6626, 'grad_norm': 0.6675787724049153, 'learning_rate': 1.8358153985855675e-05, 'epoch': 0.21}
21%|██ | 1392/6638 [1:19:20<4:44:49, 3.26s/it] {'loss': 0.6829, 'grad_norm': 0.6545388666717739, 'learning_rate': 1.8355473948257156e-05, 'epoch': 0.21}
21%|██ | 1393/6638 [1:19:23<4:45:04, 3.26s/it] {'loss': 0.6429, 'grad_norm': 0.6085687550442133, 'learning_rate': 1.835279192104439e-05, 'epoch': 0.21}
21%|██ | 1394/6638 [1:19:26<4:42:46, 3.24s/it] {'loss': 0.6887, 'grad_norm': 0.6770069696604777, 'learning_rate': 1.8350107904856026e-05, 'epoch': 0.21}
21%|██ | 1395/6638 [1:19:30<4:46:24, 3.28s/it] {'loss': 0.6947, 'grad_norm': 0.6896491106557917, 'learning_rate': 1.834742190033119e-05, 'epoch': 0.21}
21%|██ | 1396/6638 [1:19:33<4:50:56, 3.33s/it] {'loss': 0.6875, 'grad_norm': 0.5553997513193227, 'learning_rate': 1.834473390810947e-05, 'epoch': 0.21}
21%|██ | 1397/6638 [1:19:37<4:51:13, 3.33s/it] {'loss': 0.6972, 'grad_norm': 0.6745064160727037,
'learning_rate': 1.8342043928830932e-05, 'epoch': 0.21}
21%|██ | 1398/6638 [1:19:40<4:48:43, 3.31s/it] {'loss': 0.648, 'grad_norm': 0.6318389998936145, 'learning_rate': 1.833935196313612e-05, 'epoch': 0.21}
21%|██ | 1399/6638 [1:19:43<4:47:47, 3.30s/it] {'loss': 0.7187, 'grad_norm': 0.6747985983695389, 'learning_rate': 1.8336658011666055e-05, 'epoch': 0.21}
AutoResumeHook: Checking whether to suspend...
21%|██ | 1400/6638 [1:19:46<4:46:24, 3.28s/it] {'loss': 0.7026, 'grad_norm': 0.6745542341133071, 'learning_rate': 1.833396207506221e-05, 'epoch': 0.21}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
21%|██ | 1401/6638 [1:20:02<10:08:16, 6.97s/it] {'loss': 0.6727, 'grad_norm': 0.6234422767133969, 'learning_rate': 1.833126415396655e-05, 'epoch': 0.21}
21%|██ | 1402/6638 [1:20:05<8:30:05, 5.85s/it] {'loss': 0.7109, 'grad_norm': 0.6410067283889025, 'learning_rate': 1.8328564249021514e-05, 'epoch': 0.21}
21%|██ | 1403/6638 [1:20:08<7:22:56, 5.08s/it] {'loss': 0.6827, 'grad_norm': 0.6025808676580767, 'learning_rate': 1.8325862360869994e-05, 'epoch': 0.21}
21%|██ | 1404/6638 [1:20:12<6:36:33, 4.55s/it] {'loss': 0.8119, 'grad_norm': 0.8200431867171021, 'learning_rate': 1.8323158490155374e-05, 'epoch': 0.21}
21%|██ | 1405/6638 [1:20:15<6:04:03, 4.17s/it] {'loss': 0.6837, 'grad_norm': 0.6503189526345259, 'learning_rate': 1.83204526375215e-05, 'epoch': 0.21}
21%|██ | 1406/6638 [1:20:18<5:45:16, 3.96s/it]
{'loss': 0.7015, 'grad_norm': 0.632832944468238, 'learning_rate': 1.8317744803612693e-05, 'epoch': 0.21}
21%|██ | 1407/6638 [1:20:22<5:24:54, 3.73s/it] {'loss': 0.6251, 'grad_norm': 0.6066199392643846, 'learning_rate': 1.831503498907375e-05, 'epoch': 0.21}
21%|██ | 1408/6638 [1:20:25<5:11:52, 3.58s/it] {'loss': 0.6445, 'grad_norm': 0.605836070371619, 'learning_rate': 1.8312323194549923e-05, 'epoch': 0.21}
21%|██ | 1409/6638 [1:20:28<5:03:48, 3.49s/it] {'loss': 0.6408, 'grad_norm': 0.5869842654436523, 'learning_rate': 1.8309609420686956e-05, 'epoch': 0.21}
21%|██ | 1410/6638 [1:20:31<4:55:53, 3.40s/it] {'loss': 0.6549, 'grad_norm': 0.6299633931757662, 'learning_rate': 1.8306893668131058e-05, 'epoch': 0.21}
21%|██▏ | 1411/6638 [1:20:34<4:49:18, 3.32s/it] {'loss': 0.7071, 'grad_norm': 0.7317499523018466, 'learning_rate': 1.8304175937528902e-05, 'epoch': 0.21}
21%|██▏ | 1412/6638 [1:20:38<4:49:28, 3.32s/it] {'loss': 0.707, 'grad_norm': 0.6805344446534591, 'learning_rate': 1.8301456229527637e-05, 'epoch': 0.21}
21%|██▏ | 1413/6638 [1:20:41<4:48:38, 3.31s/it] {'loss': 0.7041, 'grad_norm': 0.7344327116628767, 'learning_rate': 1.8298734544774882e-05, 'epoch': 0.21}
21%|██▏ | 1414/6638 [1:20:44<4:46:45, 3.29s/it] {'loss': 0.7186, 'grad_norm': 0.7661745892170709, 'learning_rate': 1.829601088391873e-05, 'epoch': 0.21}
21%|██▏ | 1415/6638 [1:20:48<4:45:47, 3.28s/it] {'loss': 0.6814, 'grad_norm': 0.6526691148641242, 'learning_rate': 1.8293285247607743e-05, 'epoch': 0.21}
21%|██▏ | 1416/6638 [1:20:51<4:44:55,
3.27s/it] {'loss': 0.7477, 'grad_norm': 0.7135755432300975, 'learning_rate': 1.829055763649095e-05, 'epoch': 0.21}
21%|██▏ | 1417/6638 [1:20:54<4:45:50, 3.28s/it] {'loss': 0.6553, 'grad_norm': 0.7068537492489374, 'learning_rate': 1.8287828051217854e-05, 'epoch': 0.21}
21%|██▏ | 1418/6638 [1:20:57<4:46:26, 3.29s/it] {'loss': 0.6629, 'grad_norm': 0.5843680194126959, 'learning_rate': 1.8285096492438424e-05, 'epoch': 0.21}
21%|██▏ | 1419/6638 [1:21:01<4:47:36, 3.31s/it] {'loss': 0.6859, 'grad_norm': 0.6872816254511924, 'learning_rate': 1.82823629608031e-05, 'epoch': 0.21}
21%|██▏ | 1420/6638 [1:21:04<4:45:36, 3.28s/it] {'loss': 0.706, 'grad_norm': 0.6823533077574995, 'learning_rate': 1.8279627456962804e-05, 'epoch': 0.21}
21%|██▏ | 1421/6638 [1:21:07<4:44:17, 3.27s/it] {'loss': 0.6785, 'grad_norm': 0.5836688490970761, 'learning_rate': 1.827688998156891e-05, 'epoch': 0.21}
21%|██▏ | 1422/6638 [1:21:10<4:42:53, 3.25s/it] {'loss': 0.7182, 'grad_norm': 0.6994185290896885, 'learning_rate': 1.8274150535273264e-05, 'epoch': 0.21}
21%|██▏ | 1423/6638 [1:21:14<4:46:42, 3.30s/it] {'loss': 0.6712, 'grad_norm': 0.5819563110385442, 'learning_rate': 1.8271409118728192e-05, 'epoch': 0.21}
21%|██▏ | 1424/6638 [1:21:17<4:47:08, 3.30s/it] {'loss': 0.6579, 'grad_norm': 0.5985097630441871, 'learning_rate': 1.826866573258648e-05, 'epoch': 0.21}
21%|██▏ | 1425/6638 [1:21:25<6:52:09, 4.74s/it] {'loss': 0.6434, 'grad_norm': 0.5531479680573015, 'learning_rate': 1.826592037750139e-05, 'epoch': 0.21}
21%|██▏ | 1426/6638
[1:21:29<6:14:05, 4.31s/it] {'loss': 0.6737, 'grad_norm': 0.6060977657676204, 'learning_rate': 1.8263173054126643e-05, 'epoch': 0.21}
21%|██▏ | 1427/6638 [1:21:32<5:44:38, 3.97s/it] {'loss': 0.7157, 'grad_norm': 0.6367886566429432, 'learning_rate': 1.826042376311644e-05, 'epoch': 0.21}
22%|██▏ | 1428/6638 [1:21:36<6:04:16, 4.20s/it] {'loss': 0.7136, 'grad_norm': 0.6675228660754237, 'learning_rate': 1.8257672505125445e-05, 'epoch': 0.22}
22%|██▏ | 1429/6638 [1:21:42<6:37:11, 4.58s/it] {'loss': 0.6393, 'grad_norm': 0.5458787329277566, 'learning_rate': 1.8254919280808782e-05, 'epoch': 0.22}
22%|██▏ | 1430/6638 [1:21:47<6:39:51, 4.61s/it] {'loss': 0.7128, 'grad_norm': 0.6519547298866878, 'learning_rate': 1.8252164090822064e-05, 'epoch': 0.22}
22%|██▏ | 1431/6638 [1:21:50<6:03:47, 4.19s/it] {'loss': 0.6334, 'grad_norm': 0.6278453194407896, 'learning_rate': 1.824940693582135e-05, 'epoch': 0.22}
22%|██▏ | 1432/6638 [1:21:53<5:38:16, 3.90s/it] {'loss': 0.7196, 'grad_norm': 0.6825449116147547, 'learning_rate': 1.824664781646318e-05, 'epoch': 0.22}
22%|██▏ | 1433/6638 [1:21:56<5:24:26, 3.74s/it] {'loss': 0.7327, 'grad_norm': 0.7231333145221025, 'learning_rate': 1.8243886733404563e-05, 'epoch': 0.22}
22%|██▏ | 1434/6638 [1:22:00<5:11:26, 3.59s/it] {'loss': 0.689, 'grad_norm': 0.5977486069272752, 'learning_rate': 1.824112368730296e-05, 'epoch': 0.22}
22%|██▏ | 1435/6638 [1:22:03<5:02:34, 3.49s/it] {'loss': 0.6643, 'grad_norm': 0.5897517915437822, 'learning_rate': 1.8238358678816324e-05, 'epoch': 0.22}
22%|██▏ | 1436/6638 [1:22:06<4:56:27, 3.42s/it] {'loss': 0.6562, 'grad_norm': 0.5776726052543243, 'learning_rate': 1.823559170860305e-05, 'epoch': 0.22}
22%|██▏ | 1437/6638 [1:22:11<5:34:22, 3.86s/it] {'loss': 0.6642, 'grad_norm': 0.6165143935899642, 'learning_rate': 1.823282277732202e-05, 'epoch': 0.22}
22%|██▏ | 1438/6638 [1:22:14<5:21:09, 3.71s/it] {'loss': 0.6976, 'grad_norm': 0.6417789079805943, 'learning_rate': 1.823005188563257e-05, 'epoch': 0.22}
22%|██▏ | 1439/6638 [1:22:19<5:52:29, 4.07s/it] {'loss': 0.6845, 'grad_norm': 0.6058712536592873, 'learning_rate': 1.822727903419451e-05, 'epoch': 0.22}
22%|██▏ | 1440/6638 [1:22:23<5:32:51, 3.84s/it] {'loss': 0.7064, 'grad_norm': 0.7151235621756298, 'learning_rate': 1.8224504223668112e-05, 'epoch': 0.22}
22%|██▏ | 1441/6638 [1:22:26<5:16:12, 3.65s/it] {'loss': 0.6837, 'grad_norm': 0.6242112443093313, 'learning_rate': 1.822172745471412e-05, 'epoch': 0.22}
22%|██▏ | 1442/6638 [1:22:29<5:04:19, 3.51s/it] {'loss': 0.7032, 'grad_norm': 0.727912182459285, 'learning_rate': 1.8218948727993736e-05, 'epoch': 0.22}
22%|██▏ | 1443/6638 [1:22:32<4:58:00, 3.44s/it] {'loss': 0.6768, 'grad_norm': 0.8904042633156036, 'learning_rate': 1.821616804416864e-05, 'epoch': 0.22}
22%|██▏ | 1444/6638 [1:22:37<5:33:16, 3.85s/it] {'loss': 0.6257, 'grad_norm': 0.569105195123786, 'learning_rate': 1.8213385403900966e-05, 'epoch': 0.22}
22%|██▏ | 1445/6638 [1:22:40<5:17:43, 3.67s/it] {'loss': 0.7176, 'grad_norm': 0.6941894032535014, 'learning_rate': 1.8210600807853325e-05, 'epoch': 0.22}
22%|██▏ | 1446/6638 [1:22:44<5:08:53, 3.57s/it] {'loss': 0.7093, 'grad_norm': 0.6711894542316306, 'learning_rate': 1.820781425668878e-05, 'epoch': 0.22}
22%|██▏ | 1447/6638 [1:22:47<4:58:49, 3.45s/it] {'loss': 0.649, 'grad_norm': 0.6117549512809066, 'learning_rate': 1.8205025751070878e-05, 'epoch': 0.22}
22%|██▏ | 1448/6638 [1:22:50<4:53:02, 3.39s/it] {'loss': 0.7145, 'grad_norm': 0.6494622073087296, 'learning_rate': 1.820223529166361e-05, 'epoch': 0.22}
22%|██▏ | 1449/6638 [1:22:53<4:49:09, 3.34s/it] {'loss': 0.6487, 'grad_norm': 0.5910394740054993, 'learning_rate': 1.819944287913145e-05, 'epoch': 0.22}
AutoResumeHook: Checking whether to suspend...
22%|██▏ | 1450/6638 [1:22:57<4:48:25, 3.34s/it]
{'loss': 0.6762, 'grad_norm': 0.6322989112118631, 'learning_rate': 1.8196648514139325e-05, 'epoch': 0.22} 22%|██▏ | 1450/6638 [1:22:57<4:48:25, 3.34s/it] 22%|██▏ | 1451/6638 [1:23:00<4:47:19, 3.32s/it] {'loss': 0.6451, 'grad_norm': 0.632378511007854, 'learning_rate': 1.8193852197352636e-05, 'epoch': 0.22} 22%|██▏ | 1451/6638 [1:23:00<4:47:19, 3.32s/it] 22%|██▏ | 1452/6638 [1:23:03<4:46:57, 3.32s/it] {'loss': 0.7186, 'grad_norm': 0.7474545432413087, 'learning_rate': 1.8191053929437243e-05, 'epoch': 0.22} 22%|██▏ | 1452/6638 [1:23:03<4:46:57, 3.32s/it] 22%|██▏ | 1453/6638 [1:23:06<4:42:36, 3.27s/it] {'loss': 0.6819, 'grad_norm': 0.7566024591517276, 'learning_rate': 1.8188253711059475e-05, 'epoch': 0.22} 22%|██▏ | 1453/6638 [1:23:06<4:42:36, 3.27s/it] 22%|██▏ | 1454/6638 [1:23:10<4:45:54, 3.31s/it] {'loss': 0.7169, 'grad_norm': 0.8102584979059796, 'learning_rate': 1.8185451542886123e-05, 'epoch': 0.22} 22%|██▏ | 1454/6638 [1:23:10<4:45:54, 3.31s/it] 22%|██▏ | 1455/6638 [1:23:13<4:45:34, 3.31s/it] {'loss': 0.6777, 'grad_norm': 0.6548908659986206, 'learning_rate': 1.818264742558444e-05, 'epoch': 0.22} 22%|██▏ | 1455/6638 [1:23:13<4:45:34, 3.31s/it] 22%|██▏ | 1456/6638 [1:23:16<4:43:55, 3.29s/it] {'loss': 0.6731, 'grad_norm': 0.7203694896888035, 'learning_rate': 1.8179841359822144e-05, 'epoch': 0.22} 22%|██▏ | 1456/6638 [1:23:16<4:43:55, 3.29s/it] 22%|██▏ | 1457/6638 [1:23:20<4:42:08, 3.27s/it] {'loss': 0.6641, 'grad_norm': 0.6184515426913213, 'learning_rate': 1.8177033346267424e-05, 'epoch': 0.22} 22%|██▏ | 1457/6638 [1:23:20<4:42:08, 3.27s/it] 22%|██▏ | 1458/6638 [1:23:23<4:42:25, 3.27s/it] {'loss': 0.67, 'grad_norm': 0.6256655375201219, 'learning_rate': 1.817422338558892e-05, 'epoch': 0.22} 22%|██▏ | 1458/6638 [1:23:23<4:42:25, 3.27s/it] 22%|██▏ | 1459/6638 [1:23:26<4:45:29, 3.31s/it] {'loss': 0.6986, 'grad_norm': 0.7083019998253568, 'learning_rate': 1.8171411478455746e-05, 'epoch': 0.22} 22%|██▏ | 1459/6638 [1:23:26<4:45:29, 3.31s/it] 22%|██▏ | 1460/6638 
[1:23:29<4:41:06, 3.26s/it] {'loss': 0.6533, 'grad_norm': 0.6896906049401709, 'learning_rate': 1.816859762553748e-05, 'epoch': 0.22} 22%|██▏ | 1460/6638 [1:23:29<4:41:06, 3.26s/it] 22%|██▏ | 1461/6638 [1:23:33<4:48:29, 3.34s/it] {'loss': 0.6759, 'grad_norm': 0.6459422625287965, 'learning_rate': 1.8165781827504155e-05, 'epoch': 0.22} 22%|██▏ | 1461/6638 [1:23:33<4:48:29, 3.34s/it] 22%|██▏ | 1462/6638 [1:23:36<4:46:34, 3.32s/it] {'loss': 0.657, 'grad_norm': 0.6754709094984873, 'learning_rate': 1.8162964085026272e-05, 'epoch': 0.22} 22%|██▏ | 1462/6638 [1:23:36<4:46:34, 3.32s/it] 22%|██▏ | 1463/6638 [1:23:39<4:44:14, 3.30s/it] {'loss': 0.6884, 'grad_norm': 0.6685611266623434, 'learning_rate': 1.8160144398774797e-05, 'epoch': 0.22} 22%|██▏ | 1463/6638 [1:23:39<4:44:14, 3.30s/it] 22%|██▏ | 1464/6638 [1:23:43<4:41:24, 3.26s/it] {'loss': 0.6886, 'grad_norm': 0.7071402093796584, 'learning_rate': 1.8157322769421155e-05, 'epoch': 0.22} 22%|██▏ | 1464/6638 [1:23:43<4:41:24, 3.26s/it] 22%|██▏ | 1465/6638 [1:23:46<4:41:27, 3.26s/it] {'loss': 0.666, 'grad_norm': 0.6840867660413421, 'learning_rate': 1.8154499197637236e-05, 'epoch': 0.22} 22%|██▏ | 1465/6638 [1:23:46<4:41:27, 3.26s/it] 22%|██▏ | 1466/6638 [1:23:49<4:40:11, 3.25s/it] {'loss': 0.6588, 'grad_norm': 0.6477127376384746, 'learning_rate': 1.8151673684095393e-05, 'epoch': 0.22} 22%|██▏ | 1466/6638 [1:23:49<4:40:11, 3.25s/it] 22%|██▏ | 1467/6638 [1:23:52<4:39:46, 3.25s/it] {'loss': 0.6566, 'grad_norm': 0.6427108096258277, 'learning_rate': 1.8148846229468436e-05, 'epoch': 0.22} 22%|██▏ | 1467/6638 [1:23:52<4:39:46, 3.25s/it] 22%|██▏ | 1468/6638 [1:23:56<4:38:52, 3.24s/it] {'loss': 0.6666, 'grad_norm': 0.5899106807751352, 'learning_rate': 1.814601683442965e-05, 'epoch': 0.22} 22%|██▏ | 1468/6638 [1:23:56<4:38:52, 3.24s/it] 22%|██▏ | 1469/6638 [1:23:59<4:38:41, 3.23s/it] {'loss': 0.6553, 'grad_norm': 0.6194561664055582, 'learning_rate': 1.8143185499652757e-05, 'epoch': 0.22} 22%|██▏ | 1469/6638 [1:23:59<4:38:41, 3.23s/it] 
22%|██▏ | 1470/6638 [1:24:02<4:43:43, 3.29s/it] {'loss': 0.6951, 'grad_norm': 0.6611677598059661, 'learning_rate': 1.8140352225811976e-05, 'epoch': 0.22} 22%|██▏ | 1470/6638 [1:24:02<4:43:43, 3.29s/it] 22%|██▏ | 1471/6638 [1:24:06<4:45:32, 3.32s/it] {'loss': 0.6735, 'grad_norm': 0.5725180872615226, 'learning_rate': 1.8137517013581963e-05, 'epoch': 0.22} 22%|██▏ | 1471/6638 [1:24:06<4:45:32, 3.32s/it] 22%|██▏ | 1472/6638 [1:24:09<4:43:00, 3.29s/it] {'loss': 0.629, 'grad_norm': 0.578242721994188, 'learning_rate': 1.813467986363784e-05, 'epoch': 0.22} 22%|██▏ | 1472/6638 [1:24:09<4:43:00, 3.29s/it] 22%|██▏ | 1473/6638 [1:24:12<4:49:59, 3.37s/it] {'loss': 0.707, 'grad_norm': 0.6415812800777787, 'learning_rate': 1.8131840776655186e-05, 'epoch': 0.22} 22%|██▏ | 1473/6638 [1:24:12<4:49:59, 3.37s/it] 22%|██▏ | 1474/6638 [1:24:16<4:47:19, 3.34s/it] {'loss': 0.6804, 'grad_norm': 0.6489992948742392, 'learning_rate': 1.8128999753310062e-05, 'epoch': 0.22} 22%|██▏ | 1474/6638 [1:24:16<4:47:19, 3.34s/it] 22%|██▏ | 1475/6638 [1:24:19<4:43:21, 3.29s/it] {'loss': 0.6723, 'grad_norm': 0.6864882530484773, 'learning_rate': 1.8126156794278962e-05, 'epoch': 0.22} 22%|██▏ | 1475/6638 [1:24:19<4:43:21, 3.29s/it] 22%|██▏ | 1476/6638 [1:24:22<4:42:40, 3.29s/it] {'loss': 0.6738, 'grad_norm': 0.6877544008493974, 'learning_rate': 1.812331190023886e-05, 'epoch': 0.22} 22%|██▏ | 1476/6638 [1:24:22<4:42:40, 3.29s/it] 22%|██▏ | 1477/6638 [1:24:25<4:43:36, 3.30s/it] {'loss': 0.735, 'grad_norm': 0.7165019157511239, 'learning_rate': 1.8120465071867186e-05, 'epoch': 0.22} 22%|██▏ | 1477/6638 [1:24:25<4:43:36, 3.30s/it] 22%|██▏ | 1478/6638 [1:24:29<4:44:23, 3.31s/it] {'loss': 0.6858, 'grad_norm': 0.5975553390156703, 'learning_rate': 1.811761630984183e-05, 'epoch': 0.22} 22%|██▏ | 1478/6638 [1:24:29<4:44:23, 3.31s/it] 22%|██▏ | 1479/6638 [1:24:32<4:44:39, 3.31s/it] {'loss': 0.6301, 'grad_norm': 0.5382462044667531, 'learning_rate': 1.8114765614841137e-05, 'epoch': 0.22} 22%|██▏ | 1479/6638 
[1:24:32<4:44:39, 3.31s/it] 22%|██▏ | 1480/6638 [1:24:35<4:42:17, 3.28s/it] {'loss': 0.6436, 'grad_norm': 0.6634942931109418, 'learning_rate': 1.8111912987543924e-05, 'epoch': 0.22} 22%|██▏ | 1480/6638 [1:24:35<4:42:17, 3.28s/it] 22%|██▏ | 1481/6638 [1:24:39<4:42:10, 3.28s/it] {'loss': 0.6369, 'grad_norm': 0.6225165383747934, 'learning_rate': 1.8109058428629457e-05, 'epoch': 0.22} 22%|██▏ | 1481/6638 [1:24:39<4:42:10, 3.28s/it] 22%|██▏ | 1482/6638 [1:24:42<4:43:37, 3.30s/it] {'loss': 0.6563, 'grad_norm': 0.5920353917151138, 'learning_rate': 1.810620193877747e-05, 'epoch': 0.22} 22%|██▏ | 1482/6638 [1:24:42<4:43:37, 3.30s/it] 22%|██▏ | 1483/6638 [1:24:45<4:42:15, 3.29s/it] {'loss': 0.6597, 'grad_norm': 0.6660099567288938, 'learning_rate': 1.810334351866815e-05, 'epoch': 0.22} 22%|██▏ | 1483/6638 [1:24:45<4:42:15, 3.29s/it] 22%|██▏ | 1484/6638 [1:24:48<4:40:35, 3.27s/it] {'loss': 0.6586, 'grad_norm': 0.5534224834357307, 'learning_rate': 1.8100483168982152e-05, 'epoch': 0.22} 22%|██▏ | 1484/6638 [1:24:48<4:40:35, 3.27s/it] 22%|██▏ | 1485/6638 [1:24:52<4:42:37, 3.29s/it] {'loss': 0.6467, 'grad_norm': 0.5381148445905984, 'learning_rate': 1.8097620890400583e-05, 'epoch': 0.22} 22%|██▏ | 1485/6638 [1:24:52<4:42:37, 3.29s/it] 22%|██▏ | 1486/6638 [1:24:55<4:46:17, 3.33s/it] {'loss': 0.7518, 'grad_norm': 0.8297686080051694, 'learning_rate': 1.809475668360501e-05, 'epoch': 0.22} 22%|██▏ | 1486/6638 [1:24:55<4:46:17, 3.33s/it] 22%|██▏ | 1487/6638 [1:24:58<4:43:58, 3.31s/it] {'loss': 0.6843, 'grad_norm': 0.6824763961644341, 'learning_rate': 1.809189054927746e-05, 'epoch': 0.22} 22%|██▏ | 1487/6638 [1:24:58<4:43:58, 3.31s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5020 > 4096). 
Running this sequence through the model will result in indexing errors 22%|██▏ | 1488/6638 [1:25:02<4:42:23, 3.29s/it] {'loss': 0.7054, 'grad_norm': 0.6619232954400589, 'learning_rate': 1.808902248810042e-05, 'epoch': 0.22} 22%|██▏ | 1488/6638 [1:25:02<4:42:23, 3.29s/it] 22%|██▏ | 1489/6638 [1:25:05<4:43:33, 3.30s/it] {'loss': 0.6712, 'grad_norm': 0.6391819799859523, 'learning_rate': 1.808615250075684e-05, 'epoch': 0.22} 22%|██▏ | 1489/6638 [1:25:05<4:43:33, 3.30s/it] 22%|██▏ | 1490/6638 [1:25:08<4:41:20, 3.28s/it] {'loss': 0.7182, 'grad_norm': 0.7040859546116723, 'learning_rate': 1.8083280587930124e-05, 'epoch': 0.22} 22%|██▏ | 1490/6638 [1:25:08<4:41:20, 3.28s/it] 22%|██▏ | 1491/6638 [1:25:12<4:45:36, 3.33s/it] {'loss': 0.6995, 'grad_norm': 0.674691226537342, 'learning_rate': 1.8080406750304133e-05, 'epoch': 0.22} 22%|██▏ | 1491/6638 [1:25:12<4:45:36, 3.33s/it] 22%|██▏ | 1492/6638 [1:25:15<4:45:54, 3.33s/it] {'loss': 0.6527, 'grad_norm': 0.7036783799433514, 'learning_rate': 1.8077530988563184e-05, 'epoch': 0.22} 22%|██▏ | 1492/6638 [1:25:15<4:45:54, 3.33s/it] 22%|██▏ | 1493/6638 [1:25:18<4:40:48, 3.27s/it] {'loss': 0.6893, 'grad_norm': 0.7877631804951788, 'learning_rate': 1.8074653303392063e-05, 'epoch': 0.22} 22%|██▏ | 1493/6638 [1:25:18<4:40:48, 3.27s/it] 23%|██▎ | 1494/6638 [1:25:21<4:40:36, 3.27s/it] {'loss': 0.6494, 'grad_norm': 0.6079825083227418, 'learning_rate': 1.8071773695476003e-05, 'epoch': 0.23} 23%|██▎ | 1494/6638 [1:25:21<4:40:36, 3.27s/it] 23%|██▎ | 1495/6638 [1:25:25<4:43:43, 3.31s/it] {'loss': 0.6885, 'grad_norm': 0.6447492685101002, 'learning_rate': 1.8068892165500704e-05, 'epoch': 0.23} 23%|██▎ | 1495/6638 [1:25:25<4:43:43, 3.31s/it] 23%|██▎ | 1496/6638 [1:25:28<4:47:32, 3.36s/it] {'loss': 0.6959, 'grad_norm': 0.7278602597841082, 'learning_rate': 1.8066008714152317e-05, 'epoch': 0.23} 23%|██▎ | 1496/6638 [1:25:28<4:47:32, 3.36s/it] 23%|██▎ | 1497/6638 [1:25:32<4:46:45, 3.35s/it] {'loss': 0.6285, 'grad_norm': 0.5987139153721941, 
'learning_rate': 1.806312334211745e-05, 'epoch': 0.23} 23%|██▎ | 1497/6638 [1:25:32<4:46:45, 3.35s/it] 23%|██▎ | 1498/6638 [1:25:35<4:46:35, 3.35s/it] {'loss': 0.7023, 'grad_norm': 0.7029564115933683, 'learning_rate': 1.8060236050083173e-05, 'epoch': 0.23} 23%|██▎ | 1498/6638 [1:25:35<4:46:35, 3.35s/it] 23%|██▎ | 1499/6638 [1:25:38<4:49:10, 3.38s/it] {'loss': 0.6815, 'grad_norm': 0.5891711995304789, 'learning_rate': 1.805734683873701e-05, 'epoch': 0.23} 23%|██▎ | 1499/6638 [1:25:38<4:49:10, 3.38s/it]
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
23%|██▎ | 1500/6638 [1:25:42<4:43:44, 3.31s/it]
4 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
{'loss': 0.6522, 'grad_norm': 0.7164058852559086, 'learning_rate': 1.8054455708766946e-05, 'epoch': 0.23} 23%|██▎ | 1500/6638 [1:25:42<4:43:44, 3.31s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 23%|██▎ | 1501/6638 [1:25:57<9:59:27, 7.00s/it] {'loss': 0.7093, 'grad_norm': 0.6916106108712289, 'learning_rate': 1.8051562660861415e-05, 'epoch': 0.23} 23%|██▎ | 1501/6638 [1:25:57<9:59:27, 7.00s/it] 23%|██▎ | 1502/6638 [1:26:00<8:22:27, 5.87s/it] {'loss': 0.6753, 'grad_norm': 0.6273005285963468, 'learning_rate': 1.8048667695709317e-05, 'epoch': 0.23} 23%|██▎ | 1502/6638 [1:26:00<8:22:27, 5.87s/it] 23%|██▎ | 1503/6638 [1:26:04<7:17:04, 5.11s/it] {'loss': 0.6587, 'grad_norm': 0.6961014210088764, 'learning_rate': 1.8045770813999998e-05, 'epoch': 0.23} 23%|██▎ | 1503/6638 [1:26:04<7:17:04, 5.11s/it] 23%|██▎ | 1504/6638 [1:26:07<6:26:07, 4.51s/it] {'loss': 0.6582, 'grad_norm': 0.6413897938312669, 'learning_rate': 1.8042872016423274e-05, 'epoch': 0.23} 23%|██▎ | 1504/6638 [1:26:07<6:26:07, 4.51s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096). 
warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).") 23%|██▎ | 1505/6638 [1:26:10<6:02:27, 4.24s/it] {'loss': 0.7042, 'grad_norm': 0.6797141156017452, 'learning_rate': 1.8039971303669407e-05, 'epoch': 0.23} 23%|██▎ | 1505/6638 [1:26:10<6:02:27, 4.24s/it] 23%|██▎ | 1506/6638 [1:26:14<5:37:44, 3.95s/it] {'loss': 0.6412, 'grad_norm': 0.5807301432678297, 'learning_rate': 1.8037068676429114e-05, 'epoch': 0.23} 23%|██▎ | 1506/6638 [1:26:14<5:37:44, 3.95s/it] 23%|██▎ | 1507/6638 [1:26:17<5:19:50, 3.74s/it] {'loss': 0.6984, 'grad_norm': 0.6688353934895215, 'learning_rate': 1.803416413539358e-05, 'epoch': 0.23} 23%|██▎ | 1507/6638 [1:26:17<5:19:50, 3.74s/it] 23%|██▎ | 1508/6638 [1:26:20<5:08:03, 3.60s/it] {'loss': 0.6971, 'grad_norm': 0.645383028070827, 'learning_rate': 1.803125768125443e-05, 'epoch': 0.23} 23%|██▎ | 1508/6638 [1:26:20<5:08:03, 3.60s/it] 23%|██▎ | 1509/6638 [1:26:23<4:57:22, 3.48s/it] {'loss': 0.6381, 'grad_norm': 0.6119741925555319, 'learning_rate': 1.8028349314703754e-05, 'epoch': 0.23} 23%|██▎ | 1509/6638 [1:26:23<4:57:22, 3.48s/it] 23%|██▎ | 1510/6638 [1:26:27<4:52:08, 3.42s/it] {'loss': 0.6459, 'grad_norm': 0.6208288745831833, 'learning_rate': 1.8025439036434094e-05, 'epoch': 0.23} 23%|██▎ | 1510/6638 [1:26:27<4:52:08, 3.42s/it] 23%|██▎ | 1511/6638 [1:26:30<4:50:19, 3.40s/it] {'loss': 0.6405, 'grad_norm': 0.5796304989912535, 'learning_rate': 1.8022526847138454e-05, 'epoch': 0.23} 23%|██▎ | 1511/6638 [1:26:30<4:50:19, 3.40s/it] 23%|██▎ | 1512/6638 [1:26:33<4:47:22, 3.36s/it] {'loss': 0.7013, 'grad_norm': 0.7307888019966192, 'learning_rate': 1.801961274751028e-05, 'epoch': 0.23} 23%|██▎ | 1512/6638 [1:26:33<4:47:22, 3.36s/it] 23%|██▎ | 1513/6638 [1:26:37<4:46:44, 3.36s/it] {'loss': 0.7045, 'grad_norm': 0.6047944153523338, 'learning_rate': 1.8016696738243483e-05, 'epoch': 0.23} 23%|██▎ | 1513/6638 [1:26:37<4:46:44, 3.36s/it] 23%|██▎ | 1514/6638 [1:26:40<4:42:19, 3.31s/it] {'loss': 0.6456, 'grad_norm': 
0.6518445468230745, 'learning_rate': 1.801377882003243e-05, 'epoch': 0.23} 23%|██▎ | 1514/6638 [1:26:40<4:42:19, 3.31s/it] 23%|██▎ | 1515/6638 [1:26:43<4:41:08, 3.29s/it] {'loss': 0.6909, 'grad_norm': 0.6982381210060918, 'learning_rate': 1.8010858993571937e-05, 'epoch': 0.23} 23%|██▎ | 1515/6638 [1:26:43<4:41:08, 3.29s/it] 23%|██▎ | 1516/6638 [1:26:46<4:39:48, 3.28s/it] {'loss': 0.6596, 'grad_norm': 0.5929816700747245, 'learning_rate': 1.8007937259557274e-05, 'epoch': 0.23} 23%|██▎ | 1516/6638 [1:26:46<4:39:48, 3.28s/it] 23%|██▎ | 1517/6638 [1:26:50<4:38:16, 3.26s/it] {'loss': 0.6916, 'grad_norm': 0.5994026481216576, 'learning_rate': 1.8005013618684168e-05, 'epoch': 0.23} 23%|██▎ | 1517/6638 [1:26:50<4:38:16, 3.26s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5181 > 4096). Running this sequence through the model will result in indexing errors 23%|██▎ | 1518/6638 [1:26:53<4:37:13, 3.25s/it] {'loss': 0.6756, 'grad_norm': 0.6046242811366526, 'learning_rate': 1.80020880716488e-05, 'epoch': 0.23} 23%|██▎ | 1518/6638 [1:26:53<4:37:13, 3.25s/it] 23%|██▎ | 1519/6638 [1:26:56<4:43:12, 3.32s/it] {'loss': 0.6699, 'grad_norm': 0.6247356658257286, 'learning_rate': 1.7999160619147806e-05, 'epoch': 0.23} 23%|██▎ | 1519/6638 [1:26:56<4:43:12, 3.32s/it] 23%|██▎ | 1520/6638 [1:27:00<4:42:00, 3.31s/it] {'loss': 0.6662, 'grad_norm': 0.6559443251996006, 'learning_rate': 1.799623126187827e-05, 'epoch': 0.23} 23%|██▎ | 1520/6638 [1:27:00<4:42:00, 3.31s/it] 23%|██▎ | 1521/6638 [1:27:03<4:48:30, 3.38s/it] {'loss': 0.6637, 'grad_norm': 0.5574773343062076, 'learning_rate': 1.799330000053774e-05, 'epoch': 0.23} 23%|██▎ | 1521/6638 [1:27:03<4:48:30, 3.38s/it] 23%|██▎ | 1522/6638 [1:27:06<4:46:25, 3.36s/it] {'loss': 0.6744, 'grad_norm': 0.6248249664628928, 'learning_rate': 1.79903668358242e-05, 'epoch': 0.23} 23%|██▎ | 1522/6638 [1:27:06<4:46:25, 3.36s/it] 23%|██▎ | 1523/6638 [1:27:10<4:44:12, 3.33s/it] {'loss': 0.6395, 'grad_norm': 
0.5859892485601295, 'learning_rate': 1.798743176843611e-05, 'epoch': 0.23} 23%|██▎ | 1523/6638 [1:27:10<4:44:12, 3.33s/it] 23%|██▎ | 1524/6638 [1:27:13<4:45:59, 3.36s/it] {'loss': 0.7005, 'grad_norm': 0.6871294689756297, 'learning_rate': 1.798449479907237e-05, 'epoch': 0.23} 23%|██▎ | 1524/6638 [1:27:13<4:45:59, 3.36s/it] 23%|██▎ | 1525/6638 [1:27:17<4:47:47, 3.38s/it] {'loss': 0.7115, 'grad_norm': 0.6082876663801163, 'learning_rate': 1.7981555928432327e-05, 'epoch': 0.23} 23%|██▎ | 1525/6638 [1:27:17<4:47:47, 3.38s/it] 23%|██▎ | 1526/6638 [1:27:20<4:45:38, 3.35s/it] {'loss': 0.6616, 'grad_norm': 0.614979866074247, 'learning_rate': 1.7978615157215792e-05, 'epoch': 0.23} 23%|██▎ | 1526/6638 [1:27:20<4:45:38, 3.35s/it] 23%|██▎ | 1527/6638 [1:27:23<4:40:50, 3.30s/it] {'loss': 0.6797, 'grad_norm': 0.9757373506943542, 'learning_rate': 1.7975672486123025e-05, 'epoch': 0.23} 23%|██▎ | 1527/6638 [1:27:23<4:40:50, 3.30s/it] 23%|██▎ | 1528/6638 [1:27:26<4:40:02, 3.29s/it] {'loss': 0.735, 'grad_norm': 0.6761439407019542, 'learning_rate': 1.7972727915854735e-05, 'epoch': 0.23} 23%|██▎ | 1528/6638 [1:27:26<4:40:02, 3.29s/it] 23%|██▎ | 1529/6638 [1:27:30<4:39:07, 3.28s/it] {'loss': 0.6711, 'grad_norm': 0.628400105042271, 'learning_rate': 1.796978144711209e-05, 'epoch': 0.23} 23%|██▎ | 1529/6638 [1:27:30<4:39:07, 3.28s/it] 23%|██▎ | 1530/6638 [1:27:33<4:38:48, 3.27s/it] {'loss': 0.6293, 'grad_norm': 0.6690868280430496, 'learning_rate': 1.7966833080596705e-05, 'epoch': 0.23} 23%|██▎ | 1530/6638 [1:27:33<4:38:48, 3.27s/it] 23%|██▎ | 1531/6638 [1:27:36<4:40:51, 3.30s/it] {'loss': 0.6879, 'grad_norm': 0.7073005544758819, 'learning_rate': 1.7963882817010647e-05, 'epoch': 0.23} 23%|██▎ | 1531/6638 [1:27:36<4:40:51, 3.30s/it] 23%|██▎ | 1532/6638 [1:27:40<4:48:52, 3.39s/it] {'loss': 0.6804, 'grad_norm': 0.6395138603252815, 'learning_rate': 1.796093065705644e-05, 'epoch': 0.23} 23%|██▎ | 1532/6638 [1:27:40<4:48:52, 3.39s/it] 23%|██▎ | 1533/6638 [1:27:43<4:44:34, 3.34s/it] {'loss': 0.6904, 
'grad_norm': 0.6255743753607244, 'learning_rate': 1.795797660143705e-05, 'epoch': 0.23} 23%|██▎ | 1533/6638 [1:27:43<4:44:34, 3.34s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096). warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).") 23%|██▎ | 1534/6638 [1:27:47<4:56:22, 3.48s/it] {'loss': 0.7127, 'grad_norm': 0.6000352138021722, 'learning_rate': 1.79550206508559e-05, 'epoch': 0.23} 23%|██▎ | 1534/6638 [1:27:47<4:56:22, 3.48s/it] 23%|██▎ | 1535/6638 [1:27:50<4:51:42, 3.43s/it] {'loss': 0.6921, 'grad_norm': 0.6809975103648269, 'learning_rate': 1.7952062806016867e-05, 'epoch': 0.23} 23%|██▎ | 1535/6638 [1:27:50<4:51:42, 3.43s/it] 23%|██▎ | 1536/6638 [1:27:53<4:49:16, 3.40s/it] {'loss': 0.7229, 'grad_norm': 0.7153943627695147, 'learning_rate': 1.7949103067624277e-05, 'epoch': 0.23} 23%|██▎ | 1536/6638 [1:27:53<4:49:16, 3.40s/it] 23%|██▎ | 1537/6638 [1:27:57<4:45:50, 3.36s/it] {'loss': 0.6753, 'grad_norm': 0.6115024844445701, 'learning_rate': 1.7946141436382904e-05, 'epoch': 0.23} 23%|██▎ | 1537/6638 [1:27:57<4:45:50, 3.36s/it] 23%|██▎ | 1538/6638 [1:28:00<4:43:29, 3.34s/it] {'loss': 0.7013, 'grad_norm': 0.6673599778218593, 'learning_rate': 1.7943177912997974e-05, 'epoch': 0.23} 23%|██▎ | 1538/6638 [1:28:00<4:43:29, 3.34s/it] 23%|██▎ | 1539/6638 [1:28:03<4:40:06, 3.30s/it] {'loss': 0.6667, 'grad_norm': 0.657667888512, 'learning_rate': 1.7940212498175168e-05, 'epoch': 0.23} 23%|██▎ | 1539/6638 [1:28:03<4:40:06, 3.30s/it] 23%|██▎ | 1540/6638 [1:28:07<4:40:33, 3.30s/it] {'loss': 0.6558, 'grad_norm': 0.650656395136392, 'learning_rate': 1.7937245192620606e-05, 'epoch': 0.23} 23%|██▎ | 1540/6638 [1:28:07<4:40:33, 3.30s/it] 23%|██▎ | 1541/6638 [1:28:10<4:39:10, 3.29s/it] {'loss': 0.6528, 'grad_norm': 0.737449628560089, 'learning_rate': 1.7934275997040873e-05, 'epoch': 0.23} 23%|██▎ | 1541/6638 [1:28:10<4:39:10, 3.29s/it] 
23%|██▎ | 1542/6638 [1:28:13<4:37:28, 3.27s/it] {'loss': 0.6853, 'grad_norm': 0.7323666888313801, 'learning_rate': 1.7931304912142998e-05, 'epoch': 0.23} 23%|██▎ | 1542/6638 [1:28:13<4:37:28, 3.27s/it] 23%|██▎ | 1543/6638 [1:28:16<4:34:14, 3.23s/it] {'loss': 0.7032, 'grad_norm': 0.6456766253847315, 'learning_rate': 1.7928331938634452e-05, 'epoch': 0.23} 23%|██▎ | 1543/6638 [1:28:16<4:34:14, 3.23s/it] 23%|██▎ | 1544/6638 [1:28:19<4:34:33, 3.23s/it] {'loss': 0.6981, 'grad_norm': 0.6381705304421529, 'learning_rate': 1.792535707722317e-05, 'epoch': 0.23} 23%|██▎ | 1544/6638 [1:28:19<4:34:33, 3.23s/it] 23%|██▎ | 1545/6638 [1:28:23<4:36:54, 3.26s/it] {'loss': 0.6842, 'grad_norm': 0.6478779436340525, 'learning_rate': 1.792238032861752e-05, 'epoch': 0.23} 23%|██▎ | 1545/6638 [1:28:23<4:36:54, 3.26s/it] 23%|██▎ | 1546/6638 [1:28:26<4:38:52, 3.29s/it] {'loss': 0.6907, 'grad_norm': 0.6137189128503192, 'learning_rate': 1.7919401693526338e-05, 'epoch': 0.23} 23%|██▎ | 1546/6638 [1:28:26<4:38:52, 3.29s/it] 23%|██▎ | 1547/6638 [1:28:29<4:36:10, 3.25s/it] {'loss': 0.6518, 'grad_norm': 0.5587274756335808, 'learning_rate': 1.7916421172658892e-05, 'epoch': 0.23} 23%|██▎ | 1547/6638 [1:28:29<4:36:10, 3.25s/it] 23%|██▎ | 1548/6638 [1:28:33<4:36:38, 3.26s/it] {'loss': 0.6618, 'grad_norm': 0.5688255401165506, 'learning_rate': 1.7913438766724914e-05, 'epoch': 0.23} 23%|██▎ | 1548/6638 [1:28:33<4:36:38, 3.26s/it] 23%|██▎ | 1549/6638 [1:28:36<4:36:13, 3.26s/it] {'loss': 0.6617, 'grad_norm': 0.6104706508121968, 'learning_rate': 1.7910454476434574e-05, 'epoch': 0.23} 23%|██▎ | 1549/6638 [1:28:36<4:36:13, 3.26s/it]
3 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
23%|██▎ | 1550/6638 [1:28:39<4:36:46, 3.26s/it]
4 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... {'loss': 0.6437, 'grad_norm': 0.6377743773493715, 'learning_rate': 1.7907468302498495e-05, 'epoch': 0.23} 23%|██▎ | 1550/6638 [1:28:39<4:36:46, 3.26s/it] 23%|██▎ | 1551/6638 [1:28:42<4:36:07, 3.26s/it] {'loss': 0.6686, 'grad_norm': 0.7263605196656489, 'learning_rate': 1.7904480245627743e-05, 'epoch': 0.23} 23%|██▎ | 1551/6638 [1:28:42<4:36:07, 3.26s/it] 23%|██▎ | 1552/6638 [1:28:46<4:40:44, 3.31s/it] {'loss': 0.7156, 'grad_norm': 0.6311855652063565, 'learning_rate': 1.7901490306533845e-05, 'epoch': 0.23} 23%|██▎ | 1552/6638 [1:28:46<4:40:44, 3.31s/it] 23%|██▎ | 1553/6638 [1:28:49<4:39:35, 3.30s/it] {'loss': 0.6893, 'grad_norm': 0.6200510867951841, 'learning_rate': 1.7898498485928764e-05, 'epoch': 0.23} 23%|██▎ | 1553/6638 [1:28:49<4:39:35, 3.30s/it] 23%|██▎ | 1554/6638 [1:28:52<4:39:51, 3.30s/it] {'loss': 0.6725, 'grad_norm': 0.6655962482988895, 'learning_rate': 1.7895504784524917e-05, 'epoch': 0.23} 23%|██▎ | 1554/6638 [1:28:52<4:39:51, 3.30s/it] 23%|██▎ | 1555/6638 [1:28:56<4:38:44, 3.29s/it] {'loss': 0.6974, 'grad_norm': 0.646853498728999, 'learning_rate': 1.7892509203035166e-05, 'epoch': 0.23} 23%|██▎ | 1555/6638 [1:28:56<4:38:44, 3.29s/it] 23%|██▎ | 1556/6638 [1:28:59<4:39:29, 3.30s/it] {'loss': 0.6565, 'grad_norm': 0.5432464600609034, 'learning_rate': 1.7889511742172822e-05, 'epoch': 0.23} 23%|██▎ | 1556/6638 [1:28:59<4:39:29, 3.30s/it] 23%|██▎ | 1557/6638 [1:29:02<4:38:43, 3.29s/it] {'loss': 0.6953, 'grad_norm': 0.6937563691035745, 'learning_rate': 1.7886512402651645e-05, 'epoch': 0.23} 23%|██▎ | 1557/6638 [1:29:02<4:38:43, 3.29s/it] 23%|██▎ | 1558/6638 [1:29:05<4:37:29, 3.28s/it] {'loss': 0.6493, 'grad_norm': 0.601592658714127, 'learning_rate': 1.7883511185185843e-05, 'epoch': 0.23} 23%|██▎ | 1558/6638 [1:29:05<4:37:29, 3.28s/it] 23%|██▎ | 1559/6638 [1:29:09<4:36:05, 3.26s/it] {'loss': 0.7191, 'grad_norm': 0.7160758268598757, 'learning_rate': 
1.7880508090490062e-05, 'epoch': 0.23} 23%|██▎ | 1559/6638 [1:29:09<4:36:05, 3.26s/it] 24%|██▎ | 1560/6638 [1:29:12<4:35:38, 3.26s/it] {'loss': 0.6572, 'grad_norm': 0.6363740208611255, 'learning_rate': 1.7877503119279408e-05, 'epoch': 0.24} 24%|██▎ | 1560/6638 [1:29:12<4:35:38, 3.26s/it] 24%|██▎ | 1561/6638 [1:29:15<4:38:05, 3.29s/it] {'loss': 0.7057, 'grad_norm': 0.6316173030959046, 'learning_rate': 1.7874496272269426e-05, 'epoch': 0.24} 24%|██▎ | 1561/6638 [1:29:15<4:38:05, 3.29s/it] 24%|██▎ | 1562/6638 [1:29:19<4:41:19, 3.33s/it] {'loss': 0.7101, 'grad_norm': 0.6905577433474661, 'learning_rate': 1.787148755017611e-05, 'epoch': 0.24} 24%|██▎ | 1562/6638 [1:29:19<4:41:19, 3.33s/it] 24%|██▎ | 1563/6638 [1:29:22<4:44:47, 3.37s/it] {'loss': 0.6254, 'grad_norm': 0.5383256780342487, 'learning_rate': 1.78684769537159e-05, 'epoch': 0.24} 24%|██▎ | 1563/6638 [1:29:22<4:44:47, 3.37s/it] 24%|██▎ | 1564/6638 [1:29:26<4:45:03, 3.37s/it] {'loss': 0.6558, 'grad_norm': 0.6265091414671405, 'learning_rate': 1.7865464483605684e-05, 'epoch': 0.24} 24%|██▎ | 1564/6638 [1:29:26<4:45:03, 3.37s/it] 24%|██▎ | 1565/6638 [1:29:29<4:42:56, 3.35s/it] {'loss': 0.6559, 'grad_norm': 0.58886794766804, 'learning_rate': 1.786245014056279e-05, 'epoch': 0.24} 24%|██▎ | 1565/6638 [1:29:29<4:42:56, 3.35s/it] 24%|██▎ | 1566/6638 [1:29:32<4:44:03, 3.36s/it] {'loss': 0.6585, 'grad_norm': 0.6932704162372019, 'learning_rate': 1.7859433925305e-05, 'epoch': 0.24} 24%|██▎ | 1566/6638 [1:29:32<4:44:03, 3.36s/it] 24%|██▎ | 1567/6638 [1:29:35<4:39:11, 3.30s/it] {'loss': 0.7086, 'grad_norm': 0.6832575282994122, 'learning_rate': 1.785641583855054e-05, 'epoch': 0.24} 24%|██▎ | 1567/6638 [1:29:35<4:39:11, 3.30s/it] 24%|██▎ | 1568/6638 [1:29:39<4:38:42, 3.30s/it] {'loss': 0.6852, 'grad_norm': 0.6391080163302464, 'learning_rate': 1.7853395881018075e-05, 'epoch': 0.24} 24%|██▎ | 1568/6638 [1:29:39<4:38:42, 3.30s/it] 24%|██▎ | 1569/6638 [1:29:42<4:38:07, 3.29s/it] {'loss': 0.6287, 'grad_norm': 0.6274069201850614, 
'learning_rate': 1.7850374053426725e-05, 'epoch': 0.24} 24%|██▎ | 1569/6638 [1:29:42<4:38:07, 3.29s/it] 24%|██▎ | 1570/6638 [1:29:45<4:36:44, 3.28s/it] {'loss': 0.6878, 'grad_norm': 0.6073487746597761, 'learning_rate': 1.7847350356496047e-05, 'epoch': 0.24} 24%|██▎ | 1570/6638 [1:29:45<4:36:44, 3.28s/it] 24%|██▎ | 1571/6638 [1:29:48<4:36:33, 3.27s/it] {'loss': 0.6403, 'grad_norm': 0.6674199492041577, 'learning_rate': 1.784432479094605e-05, 'epoch': 0.24} 24%|██▎ | 1571/6638 [1:29:48<4:36:33, 3.27s/it] 24%|██▎ | 1572/6638 [1:29:52<4:34:38, 3.25s/it] {'loss': 0.6324, 'grad_norm': 0.6108008785450532, 'learning_rate': 1.7841297357497184e-05, 'epoch': 0.24} 24%|██▎ | 1572/6638 [1:29:52<4:34:38, 3.25s/it] 24%|██▎ | 1573/6638 [1:29:55<4:41:32, 3.34s/it] {'loss': 0.6382, 'grad_norm': 0.5941204983803486, 'learning_rate': 1.7838268056870345e-05, 'epoch': 0.24} 24%|██▎ | 1573/6638 [1:29:55<4:41:32, 3.34s/it] 24%|██▎ | 1574/6638 [1:29:58<4:38:58, 3.31s/it] {'loss': 0.6742, 'grad_norm': 0.634609667825837, 'learning_rate': 1.7835236889786868e-05, 'epoch': 0.24} 24%|██▎ | 1574/6638 [1:29:58<4:38:58, 3.31s/it] 24%|██▎ | 1575/6638 [1:30:02<4:37:23, 3.29s/it] {'loss': 0.6737, 'grad_norm': 0.6749205147272049, 'learning_rate': 1.7832203856968545e-05, 'epoch': 0.24} 24%|██▎ | 1575/6638 [1:30:02<4:37:23, 3.29s/it] 24%|██▎ | 1576/6638 [1:30:05<4:35:19, 3.26s/it] {'loss': 0.6857, 'grad_norm': 0.7219188832786193, 'learning_rate': 1.7829168959137606e-05, 'epoch': 0.24} 24%|██▎ | 1576/6638 [1:30:05<4:35:19, 3.26s/it] 24%|██▍ | 1577/6638 [1:30:08<4:38:07, 3.30s/it] {'loss': 0.6839, 'grad_norm': 0.7134624886370099, 'learning_rate': 1.7826132197016715e-05, 'epoch': 0.24} 24%|██▍ | 1577/6638 [1:30:08<4:38:07, 3.30s/it] 24%|██▍ | 1578/6638 [1:30:12<4:38:49, 3.31s/it] {'loss': 0.6778, 'grad_norm': 0.6816256617233114, 'learning_rate': 1.7823093571328998e-05, 'epoch': 0.24} 24%|██▍ | 1578/6638 [1:30:12<4:38:49, 3.31s/it] 24%|██▍ | 1579/6638 [1:30:15<4:37:21, 3.29s/it] {'loss': 0.6564, 'grad_norm': 
0.6342466090229744, 'learning_rate': 1.782005308279801e-05, 'epoch': 0.24}
[steps 1579-1600/6638, 24%, epoch 0.24, ~3.25-3.33 s/it: loss 0.59-0.75, grad_norm 0.58-0.80, learning_rate 1.7820e-05 -> 1.7756e-05]
[step 1600: AutoResumeHook on ranks 0-7: Checking whether to suspend...]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1600/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1600/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1600/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[steps 1601-1699/6638, 24-26%, epoch 0.24-0.26: loss 0.62-0.75, grad_norm 0.55-0.87, learning_rate 1.7753e-05 -> 1.7442e-05; step 1601 took 7.26 s/it while checkpoint 1600 flushed, then ~3.25-3.35 s/it again]
[step 1650: AutoResumeHook on ranks 0-7: Checking whether to suspend...]
[step 1699: AutoResumeHook on ranks 0-7: Checking whether to suspend...]
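Every per-step record in this log has the same shape: a tqdm progress fragment followed by a Python-literal metrics dict. A minimal sketch for pulling the metrics out of such a line; the `parse_step` helper and its regex are illustrative assumptions, not part of the training code:

```python
import ast
import re

# Matches "step/total [elapsed<remaining, rate] {'loss': ...}".
# The dict printed by the HF Trainer is valid Python literal syntax,
# so ast.literal_eval can parse it safely (no eval of arbitrary code).
STEP_RE = re.compile(r"(\d+)/(\d+) \[[^\]]+\] (\{'loss'.*?\})")

def parse_step(line: str) -> dict:
    """Extract step index, total steps, and the metrics dict from one log line."""
    m = STEP_RE.search(line)
    if m is None:
        raise ValueError("no step record found in line")
    step, total, metrics = m.groups()
    out = ast.literal_eval(metrics)
    out.update(step=int(step), total=int(total))
    return out

record = parse_step(
    "24%|██▍ | 1580/6638 [1:30:18<4:40:39, 3.33s/it] "
    "{'loss': 0.7483, 'grad_norm': 0.7662655928651118, "
    "'learning_rate': 1.7817010732147758e-05, 'epoch': 0.24}"
)
# record["loss"] -> 0.7483, record["step"] -> 1580
```

Because tqdm redraws the bar, each step may match twice per line; deduplicating on `record["step"]` keeps one entry per optimizer step.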
26%|██▌ | 1700/6638 [1:37:05<4:29:31, 3.27s/it] {'loss': 0.6877, 'grad_norm': 0.5996327907450989, 'learning_rate': 1.743862741112772e-05, 'epoch': 0.26} 26%|██▌ | 1700/6638 [1:37:05<4:29:31, 3.27s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) 
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 26%|██▌ | 1701/6638 [1:37:21<9:54:11, 7.22s/it] {'loss': 0.6813, 'grad_norm': 0.7170654542639548, 'learning_rate': 1.7435365220518746e-05, 'epoch': 0.26} 26%|██▌ | 1701/6638 [1:37:21<9:54:11, 7.22s/it] 26%|██▌ | 1702/6638 [1:37:25<8:15:31, 6.02s/it] {'loss': 0.6513, 'grad_norm': 0.6140792078059165, 'learning_rate': 1.7432101259392786e-05, 'epoch': 0.26} 26%|██▌ | 1702/6638 [1:37:25<8:15:31, 6.02s/it] 26%|██▌ | 1703/6638 [1:37:28<7:06:26, 5.18s/it] {'loss': 0.6542, 'grad_norm': 0.5681542790868173, 'learning_rate': 1.742883552852706e-05, 'epoch': 0.26} 26%|██▌ | 1703/6638 [1:37:28<7:06:26, 5.18s/it] 26%|██▌ | 1704/6638 [1:37:31<6:21:53, 4.64s/it] {'loss': 0.6327, 'grad_norm': 0.5602283490479433, 'learning_rate': 1.74255680286992e-05, 'epoch': 0.26} 26%|██▌ | 1704/6638 [1:37:31<6:21:53, 4.64s/it] 26%|██▌ | 1705/6638 [1:37:35<5:47:06, 4.22s/it] {'loss': 0.6405, 'grad_norm': 0.5950672363275054, 'learning_rate': 1.7422298760687275e-05, 'epoch': 0.26} 26%|██▌ | 1705/6638 [1:37:35<5:47:06, 4.22s/it] 26%|██▌ | 1706/6638 [1:37:38<5:22:43, 3.93s/it] {'loss': 0.6272, 'grad_norm': 0.5589488840423971, 'learning_rate': 1.7419027725269763e-05, 'epoch': 0.26} 26%|██▌ | 1706/6638 [1:37:38<5:22:43, 3.93s/it] 26%|██▌ | 1707/6638 [1:37:41<5:04:26, 3.70s/it] {'loss': 0.6903, 'grad_norm': 0.7034803170073902, 'learning_rate': 1.7415754923225564e-05, 'epoch': 0.26} 26%|██▌ | 1707/6638 [1:37:41<5:04:26, 3.70s/it] 26%|██▌ | 1708/6638 [1:37:44<4:56:14, 3.61s/it] {'loss': 0.7238, 'grad_norm': 0.7308788472771279, 'learning_rate': 1.7412480355334006e-05, 'epoch': 0.26} 26%|██▌ | 1708/6638 [1:37:44<4:56:14, 3.61s/it] 26%|██▌ | 1709/6638 [1:37:48<4:49:30, 3.52s/it] {'loss': 0.6844, 'grad_norm': 0.7448420362510647, 'learning_rate': 1.7409204022374826e-05, 'epoch': 0.26} 26%|██▌ | 1709/6638 [1:37:48<4:49:30, 3.52s/it] 26%|██▌ | 1710/6638 [1:37:51<4:43:47, 3.46s/it] {'loss': 0.6632, 'grad_norm': 
0.6605782845131091, 'learning_rate': 1.7405925925128193e-05, 'epoch': 0.26} 26%|██▌ | 1710/6638 [1:37:51<4:43:47, 3.46s/it] 26%|██▌ | 1711/6638 [1:37:54<4:42:08, 3.44s/it] {'loss': 0.7035, 'grad_norm': 0.6432516911915026, 'learning_rate': 1.740264606437469e-05, 'epoch': 0.26} 26%|██▌ | 1711/6638 [1:37:54<4:42:08, 3.44s/it] 26%|██▌ | 1712/6638 [1:37:58<4:38:30, 3.39s/it] {'loss': 0.7263, 'grad_norm': 0.6808999706420098, 'learning_rate': 1.7399364440895323e-05, 'epoch': 0.26} 26%|██▌ | 1712/6638 [1:37:58<4:38:30, 3.39s/it] 26%|██▌ | 1713/6638 [1:38:01<4:34:47, 3.35s/it] {'loss': 0.6319, 'grad_norm': 0.7221329711600994, 'learning_rate': 1.739608105547151e-05, 'epoch': 0.26} 26%|██▌ | 1713/6638 [1:38:01<4:34:47, 3.35s/it] 26%|██▌ | 1714/6638 [1:38:04<4:32:46, 3.32s/it] {'loss': 0.689, 'grad_norm': 0.7186738060431389, 'learning_rate': 1.73927959088851e-05, 'epoch': 0.26} 26%|██▌ | 1714/6638 [1:38:04<4:32:46, 3.32s/it] 26%|██▌ | 1715/6638 [1:38:07<4:30:40, 3.30s/it] {'loss': 0.6889, 'grad_norm': 0.6373439873593904, 'learning_rate': 1.738950900191835e-05, 'epoch': 0.26} 26%|██▌ | 1715/6638 [1:38:07<4:30:40, 3.30s/it] 26%|██▌ | 1716/6638 [1:38:11<4:31:09, 3.31s/it] {'loss': 0.637, 'grad_norm': 0.5944403631514575, 'learning_rate': 1.7386220335353945e-05, 'epoch': 0.26} 26%|██▌ | 1716/6638 [1:38:11<4:31:09, 3.31s/it] 26%|██▌ | 1717/6638 [1:38:14<4:33:38, 3.34s/it] {'loss': 0.7442, 'grad_norm': 0.7047476344504595, 'learning_rate': 1.7382929909974988e-05, 'epoch': 0.26} 26%|██▌ | 1717/6638 [1:38:14<4:33:38, 3.34s/it] 26%|██▌ | 1718/6638 [1:38:17<4:29:02, 3.28s/it] {'loss': 0.636, 'grad_norm': 0.5767492515109623, 'learning_rate': 1.7379637726564994e-05, 'epoch': 0.26} 26%|██▌ | 1718/6638 [1:38:17<4:29:02, 3.28s/it] 26%|██▌ | 1719/6638 [1:38:21<4:27:11, 3.26s/it] {'loss': 0.6779, 'grad_norm': 0.6693423276681383, 'learning_rate': 1.7376343785907905e-05, 'epoch': 0.26} 26%|██▌ | 1719/6638 [1:38:21<4:27:11, 3.26s/it] 26%|██▌ | 1720/6638 [1:38:24<4:30:32, 3.30s/it] {'loss': 0.6543, 
'grad_norm': 0.6526823844800301, 'learning_rate': 1.7373048088788076e-05, 'epoch': 0.26} 26% | 1720/6638 [1:38:24<4:30:32, 3.30s/it]
26% | 1721/6638 [1:38:27<4:29:00, 3.28s/it] {'loss': 0.697, 'grad_norm': 0.7315355874865944, 'learning_rate': 1.7369750635990278e-05, 'epoch': 0.26}
26% | 1722/6638 [1:38:30<4:27:44, 3.27s/it] {'loss': 0.7009, 'grad_norm': 0.6979358268362879, 'learning_rate': 1.7366451428299713e-05, 'epoch': 0.26}
26% | 1723/6638 [1:38:34<4:30:46, 3.31s/it] {'loss': 0.695, 'grad_norm': 0.698404826128714, 'learning_rate': 1.736315046650198e-05, 'epoch': 0.26}
26% | 1724/6638 [1:38:37<4:34:43, 3.35s/it] {'loss': 0.7533, 'grad_norm': 0.7422437738088142, 'learning_rate': 1.7359847751383115e-05, 'epoch': 0.26}
26% | 1725/6638 [1:38:41<4:33:43, 3.34s/it] {'loss': 0.6753, 'grad_norm': 0.6144391784573607, 'learning_rate': 1.735654328372957e-05, 'epoch': 0.26}
26% | 1726/6638 [1:38:44<4:31:37, 3.32s/it] {'loss': 0.6419, 'grad_norm': 0.6695052009874012, 'learning_rate': 1.735323706432819e-05, 'epoch': 0.26}
26% | 1727/6638 [1:38:47<4:28:53, 3.29s/it] {'loss': 0.6236, 'grad_norm': 0.6349836806037713, 'learning_rate': 1.7349929093966275e-05, 'epoch': 0.26}
26% | 1728/6638 [1:38:50<4:28:16, 3.28s/it] {'loss': 0.7292, 'grad_norm': 0.6462519982670244, 'learning_rate': 1.7346619373431513e-05, 'epoch': 0.26}
26% | 1729/6638 [1:38:54<4:29:05, 3.29s/it] {'loss': 0.6588, 'grad_norm': 0.6065828182690474, 'learning_rate': 1.734330790351202e-05, 'epoch': 0.26}
26% | 1730/6638 [1:38:57<4:30:16, 3.30s/it] {'loss': 0.6787, 'grad_norm': 0.6621131756695192, 'learning_rate': 1.733999468499632e-05, 'epoch': 0.26}
26% | 1731/6638 [1:39:00<4:30:00, 3.30s/it] {'loss': 0.683, 'grad_norm': 0.726278587470733, 'learning_rate': 1.733667971867337e-05, 'epoch': 0.26}
26% | 1732/6638 [1:39:03<4:27:20, 3.27s/it] {'loss': 0.6767, 'grad_norm': 0.6116187006619267, 'learning_rate': 1.733336300533253e-05, 'epoch': 0.26}
26% | 1733/6638 [1:39:07<4:28:53, 3.29s/it] {'loss': 0.6593, 'grad_norm': 0.6332261430512078, 'learning_rate': 1.7330044545763574e-05, 'epoch': 0.26}
26% | 1734/6638 [1:39:10<4:30:30, 3.31s/it] {'loss': 0.6328, 'grad_norm': 0.6071049239862439, 'learning_rate': 1.7326724340756706e-05, 'epoch': 0.26}
26% | 1735/6638 [1:39:13<4:29:43, 3.30s/it] {'loss': 0.6661, 'grad_norm': 0.6937541362569136, 'learning_rate': 1.732340239110253e-05, 'epoch': 0.26}
26% | 1736/6638 [1:39:17<4:33:50, 3.35s/it] {'loss': 0.7178, 'grad_norm': 0.6474811819097174, 'learning_rate': 1.7320078697592077e-05, 'epoch': 0.26}
26% | 1737/6638 [1:39:20<4:33:36, 3.35s/it] {'loss': 0.7257, 'grad_norm': 0.7145651962141898, 'learning_rate': 1.7316753261016782e-05, 'epoch': 0.26}
26% | 1738/6638 [1:39:24<4:34:37, 3.36s/it] {'loss': 0.6642, 'grad_norm': 0.6237443620041919, 'learning_rate': 1.731342608216851e-05, 'epoch': 0.26}
26% | 1739/6638 [1:39:27<4:34:50, 3.37s/it] {'loss': 0.6427, 'grad_norm': 0.6793531856802376, 'learning_rate': 1.7310097161839526e-05, 'epoch': 0.26}
26% | 1740/6638 [1:39:30<4:30:07, 3.31s/it] {'loss': 0.6331, 'grad_norm': 0.6094112247077411, 'learning_rate': 1.7306766500822516e-05, 'epoch': 0.26}
26% | 1741/6638 [1:39:33<4:30:01, 3.31s/it] {'loss': 0.726, 'grad_norm': 0.6683527919063151, 'learning_rate': 1.730343409991058e-05, 'epoch': 0.26}
26% | 1742/6638 [1:39:37<4:31:30, 3.33s/it] {'loss': 0.6931, 'grad_norm': 0.6599434191708372, 'learning_rate': 1.730009995989724e-05, 'epoch': 0.26}
26% | 1743/6638 [1:39:40<4:30:48, 3.32s/it] {'loss': 0.6387, 'grad_norm': 0.564720375706006, 'learning_rate': 1.7296764081576417e-05, 'epoch': 0.26}
26% | 1744/6638 [1:39:43<4:30:43, 3.32s/it] {'loss': 0.6895, 'grad_norm': 0.672238587518248, 'learning_rate': 1.729342646574246e-05, 'epoch': 0.26}
26% | 1745/6638 [1:39:47<4:33:48, 3.36s/it] {'loss': 0.6451, 'grad_norm': 0.5888447609152252, 'learning_rate': 1.7290087113190117e-05, 'epoch': 0.26}
26% | 1746/6638 [1:39:50<4:31:59, 3.34s/it] {'loss': 0.6895, 'grad_norm': 0.601293361515266, 'learning_rate': 1.7286746024714566e-05, 'epoch': 0.26}
26% | 1747/6638 [1:39:54<4:32:23, 3.34s/it] {'loss': 0.6908, 'grad_norm': 0.63994495150914, 'learning_rate': 1.7283403201111384e-05, 'epoch': 0.26}
26% | 1748/6638 [1:39:57<4:31:51, 3.34s/it] {'loss': 0.6573, 'grad_norm': 0.592703200783084, 'learning_rate': 1.728005864317658e-05, 'epoch': 0.26}
26% | 1749/6638 [1:40:00<4:31:25, 3.33s/it] {'loss': 0.6921, 'grad_norm': 0.7076779353510051, 'learning_rate': 1.7276712351706548e-05, 'epoch': 0.26}
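The bracketed tqdm fields above follow the pattern `[elapsed<remaining, sec/it]`, where the remaining-time estimate is roughly the number of steps left times the current (smoothed) seconds per iteration. A minimal sketch reproducing that estimate from the values shown in the log (the `eta` helper is ours, not part of tqdm or the training code; tqdm's own smoothing may differ slightly):

```python
from datetime import timedelta

def eta(step, total, sec_per_it):
    """Estimate remaining wall time from the displayed steps-per-second rate."""
    return timedelta(seconds=round((total - step) * sec_per_it))

# At step 1749 of 6638, the log shows 3.33 s/it and remaining 4:31:25.
print(eta(1749, 6638, 3.33))  # 4:31:20, close to tqdm's smoothed estimate
```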
AutoResumeHook: Checking whether to suspend... (printed once per rank, ranks 0-7)
26% | 1750/6638 [1:40:04<4:32:18, 3.34s/it] {'loss': 0.6706, 'grad_norm': 0.6842947029351112, 'learning_rate': 1.7273364327498118e-05, 'epoch': 0.26}
26% | 1751/6638 [1:40:07<4:31:51, 3.34s/it] {'loss': 0.7011, 'grad_norm': 0.7272144202507804, 'learning_rate': 1.7270014571348528e-05, 'epoch': 0.26}
26% | 1752/6638 [1:40:10<4:29:53, 3.31s/it] {'loss': 0.6597, 'grad_norm': 0.6966698722366903, 'learning_rate': 1.726666308405542e-05, 'epoch': 0.26}
26% | 1753/6638 [1:40:13<4:29:01, 3.30s/it] {'loss': 0.6647, 'grad_norm': 0.7133462056741577, 'learning_rate': 1.726330986641686e-05, 'epoch': 0.26}
26% | 1754/6638 [1:40:17<4:29:05, 3.31s/it] {'loss': 0.707, 'grad_norm': 0.6554299466171177, 'learning_rate': 1.725995491923131e-05, 'epoch': 0.26}
26% | 1755/6638 [1:40:20<4:24:39, 3.25s/it] {'loss': 0.6463, 'grad_norm': 0.6107311133389134, 'learning_rate': 1.7256598243297662e-05, 'epoch': 0.26}
26% | 1756/6638 [1:40:23<4:31:48, 3.34s/it] {'loss': 0.6944, 'grad_norm': 0.6807254244713388, 'learning_rate': 1.7253239839415207e-05, 'epoch': 0.26}
26% | 1757/6638 [1:40:27<4:29:24, 3.31s/it] {'loss': 0.6656, 'grad_norm': 0.640248951966066, 'learning_rate': 1.7249879708383654e-05, 'epoch': 0.26}
26% | 1758/6638 [1:40:30<4:26:42, 3.28s/it] {'loss': 0.6173, 'grad_norm': 0.5600860685669731, 'learning_rate': 1.724651785100312e-05, 'epoch': 0.26}
26% | 1759/6638 [1:40:33<4:26:24, 3.28s/it] {'loss': 0.6577, 'grad_norm': 0.600067715481128, 'learning_rate': 1.7243154268074132e-05, 'epoch': 0.26}
27% | 1760/6638 [1:40:36<4:27:24, 3.29s/it] {'loss': 0.6532, 'grad_norm': 0.670205696387704, 'learning_rate': 1.7239788960397636e-05, 'epoch': 0.27}
27% | 1761/6638 [1:40:40<4:29:16, 3.31s/it] {'loss': 0.6719, 'grad_norm': 0.5896699153381546, 'learning_rate': 1.7236421928774975e-05, 'epoch': 0.27}
27% | 1762/6638 [1:40:43<4:29:05, 3.31s/it] {'loss': 0.6962, 'grad_norm': 0.7029617939067102, 'learning_rate': 1.7233053174007914e-05, 'epoch': 0.27}
27% | 1763/6638 [1:40:46<4:29:26, 3.32s/it] {'loss': 0.6698, 'grad_norm': 0.5969573065381526, 'learning_rate': 1.7229682696898623e-05, 'epoch': 0.27}
27% | 1764/6638 [1:40:50<4:31:17, 3.34s/it] {'loss': 0.6783, 'grad_norm': 0.6854783369886926, 'learning_rate': 1.7226310498249688e-05, 'epoch': 0.27}
27% | 1765/6638 [1:40:53<4:32:37, 3.36s/it] {'loss': 0.6846, 'grad_norm': 0.6718644669078616, 'learning_rate': 1.7222936578864094e-05, 'epoch': 0.27}
27% | 1766/6638 [1:40:56<4:30:18, 3.33s/it] {'loss': 0.682, 'grad_norm': 0.6977483869454257, 'learning_rate': 1.7219560939545246e-05, 'epoch': 0.27}
27% | 1767/6638 [1:41:00<4:33:15, 3.37s/it] {'loss': 0.6853, 'grad_norm': 0.6120031204825922, 'learning_rate': 1.7216183581096955e-05, 'epoch': 0.27}
27% | 1768/6638 [1:41:03<4:31:23, 3.34s/it] {'loss': 0.6921, 'grad_norm': 0.6628460517711222, 'learning_rate': 1.7212804504323437e-05, 'epoch': 0.27}
27% | 1769/6638 [1:41:06<4:28:06, 3.30s/it] {'loss': 0.6315, 'grad_norm': 0.6250890713822089, 'learning_rate': 1.720942371002933e-05, 'epoch': 0.27}
27% | 1770/6638 [1:41:10<4:28:04, 3.30s/it] {'loss': 0.6595, 'grad_norm': 0.6083530396896053, 'learning_rate': 1.7206041199019664e-05, 'epoch': 0.27}
27% | 1771/6638 [1:41:13<4:29:22, 3.32s/it] {'loss': 0.7095, 'grad_norm': 0.6324163876579746, 'learning_rate': 1.7202656972099886e-05, 'epoch': 0.27}
27% | 1772/6638 [1:41:16<4:30:11, 3.33s/it] {'loss': 0.6365, 'grad_norm': 0.6036258153024646, 'learning_rate': 1.719927103007586e-05, 'epoch': 0.27}
27% | 1773/6638 [1:41:20<4:25:47, 3.28s/it] {'loss': 0.5982, 'grad_norm': 0.5207515609667718, 'learning_rate': 1.719588337375384e-05, 'epoch': 0.27}
27% | 1774/6638 [1:41:23<4:26:03, 3.28s/it] {'loss': 0.6172, 'grad_norm': 0.8277791924403283, 'learning_rate': 1.7192494003940504e-05, 'epoch': 0.27}
27% | 1775/6638 [1:41:26<4:24:26, 3.26s/it] {'loss': 0.6902, 'grad_norm': 0.6327302687338655, 'learning_rate': 1.718910292144293e-05, 'epoch': 0.27}
27% | 1776/6638 [1:41:29<4:22:54, 3.24s/it] {'loss': 0.6583, 'grad_norm': 0.6045643929909895, 'learning_rate': 1.7185710127068614e-05, 'epoch': 0.27}
27% | 1777/6638 [1:41:33<4:29:47, 3.33s/it] {'loss': 0.7138, 'grad_norm': 0.5628849195056524, 'learning_rate': 1.718231562162544e-05, 'epoch': 0.27}
27% | 1778/6638 [1:41:36<4:27:41, 3.30s/it] {'loss': 0.674, 'grad_norm': 0.6892699343257249, 'learning_rate': 1.7178919405921717e-05, 'epoch': 0.27}
27% | 1779/6638 [1:41:39<4:26:38, 3.29s/it] {'loss': 0.6562, 'grad_norm': 0.6195150521767394, 'learning_rate': 1.7175521480766152e-05, 'epoch': 0.27}
27% | 1780/6638 [1:41:43<4:24:32, 3.27s/it] {'loss': 0.6566, 'grad_norm': 0.6540284151311114, 'learning_rate': 1.717212184696787e-05, 'epoch': 0.27}
27% | 1781/6638 [1:41:46<4:27:44, 3.31s/it] {'loss': 0.6689, 'grad_norm': 0.5957102131175296, 'learning_rate': 1.7168720505336385e-05, 'epoch': 0.27}
27% | 1782/6638 [1:41:50<4:37:34, 3.43s/it] {'loss': 0.7416, 'grad_norm': 0.7302933209979361, 'learning_rate': 1.7165317456681642e-05, 'epoch': 0.27}
27% | 1783/6638 [1:41:53<4:35:37, 3.41s/it] {'loss': 0.7065, 'grad_norm': 0.6059704256264147, 'learning_rate': 1.716191270181396e-05, 'epoch': 0.27}
27% | 1784/6638 [1:41:56<4:35:31, 3.41s/it] {'loss': 0.6265, 'grad_norm': 0.6278798930567646, 'learning_rate': 1.71585062415441e-05, 'epoch': 0.27}
27% | 1785/6638 [1:42:00<4:31:09, 3.35s/it] {'loss': 0.6683, 'grad_norm': 0.6489076847078263, 'learning_rate': 1.7155098076683203e-05, 'epoch': 0.27}
27% | 1786/6638 [1:42:03<4:31:54, 3.36s/it] {'loss': 0.644, 'grad_norm': 0.5485527745776785, 'learning_rate': 1.7151688208042826e-05, 'epoch': 0.27}
27% | 1787/6638 [1:42:06<4:31:33, 3.36s/it] {'loss': 0.671, 'grad_norm': 0.6813421172575378, 'learning_rate': 1.7148276636434933e-05, 'epoch': 0.27}
27% | 1788/6638 [1:42:10<4:27:34, 3.31s/it] {'loss': 0.6699, 'grad_norm': 0.747149428874087, 'learning_rate': 1.714486336267189e-05, 'epoch': 0.27}
27% | 1789/6638 [1:42:13<4:26:21, 3.30s/it] {'loss': 0.6593, 'grad_norm': 0.5741838078489052, 'learning_rate': 1.7141448387566467e-05, 'epoch': 0.27}
27% | 1790/6638 [1:42:16<4:24:05, 3.27s/it] {'loss': 0.6527, 'grad_norm': 0.6675588511078216, 'learning_rate': 1.7138031711931842e-05, 'epoch': 0.27}
27% | 1791/6638 [1:42:19<4:24:59, 3.28s/it] {'loss': 0.6807, 'grad_norm': 0.6733759326944493, 'learning_rate': 1.7134613336581602e-05, 'epoch': 0.27}
27% | 1792/6638 [1:42:23<4:28:11, 3.32s/it] {'loss': 0.6484, 'grad_norm': 0.5948337822555234, 'learning_rate': 1.7131193262329726e-05, 'epoch': 0.27}
27% | 1793/6638 [1:42:26<4:28:02, 3.32s/it] {'loss': 0.705, 'grad_norm': 0.7359712557173704, 'learning_rate': 1.7127771489990616e-05, 'epoch': 0.27}
27% | 1794/6638 [1:42:29<4:26:06, 3.30s/it] {'loss': 0.6904, 'grad_norm': 0.6298549871401545, 'learning_rate': 1.7124348020379056e-05, 'epoch': 0.27}
27% | 1795/6638 [1:42:33<4:26:38, 3.30s/it] {'loss': 0.6567, 'grad_norm': 0.564276168724458, 'learning_rate': 1.712092285431026e-05, 'epoch': 0.27}
27% | 1796/6638 [1:42:36<4:26:24, 3.30s/it] {'loss': 0.685, 'grad_norm': 0.6731704331634146, 'learning_rate': 1.7117495992599814e-05, 'epoch': 0.27}
27% | 1797/6638 [1:42:39<4:25:27, 3.29s/it] {'loss': 0.6634, 'grad_norm': 0.6108164324376457, 'learning_rate':
1.7114067436063745e-05, 'epoch': 0.27}
27% | 1798/6638 [1:42:43<4:28:38, 3.33s/it] {'loss': 0.7085, 'grad_norm': 0.60111680414402, 'learning_rate': 1.7110637185518453e-05, 'epoch': 0.27}
27% | 1799/6638 [1:42:46<4:25:30, 3.29s/it] {'loss': 0.7215, 'grad_norm': 0.7078011760219505, 'learning_rate': 1.710720524178076e-05, 'epoch': 0.27}
AutoResumeHook: Checking whether to suspend... (printed once per rank, ranks 0-7)
27% | 1800/6638 [1:42:49<4:25:00, 3.29s/it] {'loss': 0.6489, 'grad_norm': 0.5875510126526048, 'learning_rate': 1.7103771605667874e-05, 'epoch': 0.27}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1800/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1800/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1800/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
27% | 1801/6638 [1:43:06<9:43:10, 7.23s/it] {'loss': 0.6975, 'grad_norm': 0.6364138460534913, 'learning_rate': 1.7100336277997424e-05, 'epoch': 0.27}
27% | 1802/6638 [1:43:09<8:06:15, 6.03s/it] {'loss': 0.6537, 'grad_norm': 0.6556296656078353, 'learning_rate': 1.7096899259587432e-05, 'epoch': 0.27}
27% | 1803/6638 [1:43:12<6:57:35, 5.18s/it] {'loss': 0.6488, 'grad_norm': 0.5708335760629363, 'learning_rate': 1.7093460551256325e-05, 'epoch': 0.27}
27% | 1804/6638 [1:43:15<6:09:33, 4.59s/it] {'loss': 0.6452, 'grad_norm': 0.6108966535594041, 'learning_rate': 1.7090020153822934e-05, 'epoch': 0.27}
27% | 1805/6638 [1:43:19<5:38:02, 4.20s/it] {'loss': 0.6445, 'grad_norm': 0.577153457125943, 'learning_rate': 1.7086578068106484e-05, 'epoch': 0.27}
27% | 1806/6638 [1:43:22<5:14:21,
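Each periodic checkpoint writes the three model components (llm, vision_tower, mm_projector) to separate subdirectories under `tmp-checkpoint-<step>`. A minimal sketch for pulling those component/path pairs out of raw log text like the above (the `parse_checkpoint_saves` helper and the short `/ckpt` paths are ours, for illustration only):

```python
import re

def parse_checkpoint_saves(log_text):
    """Extract (component, path) pairs from 'saving <component> to <path>' lines."""
    return re.findall(r"saving (\w+) to (\S+)", log_text)

log = (
    "saving llm to /ckpt/tmp-checkpoint-1800/llm "
    "saving vision_tower to /ckpt/tmp-checkpoint-1800/vision_tower "
    "saving mm_projector to /ckpt/tmp-checkpoint-1800/mm_projector"
)
print(parse_checkpoint_saves(log))
```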
3.90s/it] {'loss': 0.6725, 'grad_norm': 0.7342401897997367, 'learning_rate': 1.708313429492661e-05, 'epoch': 0.27}
27% | 1807/6638 [1:43:25<5:00:45, 3.74s/it] {'loss': 0.6807, 'grad_norm': 0.6144567322080671, 'learning_rate': 1.707968883510335e-05, 'epoch': 0.27}
27% | 1808/6638 [1:43:28<4:49:48, 3.60s/it] {'loss': 0.6525, 'grad_norm': 0.6149057188994369, 'learning_rate': 1.7076241689457134e-05, 'epoch': 0.27}
27% | 1809/6638 [1:43:32<4:42:01, 3.50s/it] {'loss': 0.643, 'grad_norm': 0.5315164224038265, 'learning_rate': 1.707279285880881e-05, 'epoch': 0.27}
27% | 1810/6638 [1:43:35<4:37:15, 3.45s/it] {'loss': 0.7403, 'grad_norm': 0.7630563936963908, 'learning_rate': 1.706934234397961e-05, 'epoch': 0.27}
27% | 1811/6638 [1:43:38<4:36:54, 3.44s/it] {'loss': 0.6682, 'grad_norm': 0.5676545668802969, 'learning_rate': 1.7065890145791177e-05, 'epoch': 0.27}
27% | 1812/6638 [1:43:42<4:30:48, 3.37s/it] {'loss': 0.6716, 'grad_norm': 0.6228916789384723, 'learning_rate': 1.706243626506555e-05, 'epoch': 0.27}
27% | 1813/6638 [1:43:45<4:28:29, 3.34s/it] {'loss': 0.6454, 'grad_norm': 0.6119232835813476, 'learning_rate': 1.7058980702625172e-05, 'epoch': 0.27}
27% | 1814/6638 [1:43:48<4:27:12, 3.32s/it] {'loss': 0.6493, 'grad_norm': 0.6266826685057171, 'learning_rate': 1.7055523459292888e-05, 'epoch': 0.27}
27% | 1815/6638 [1:43:51<4:25:43, 3.31s/it] {'loss': 0.6788, 'grad_norm': 0.6759073887600364, 'learning_rate': 1.7052064535891935e-05, 'epoch': 0.27}
27% | 1816/6638 [1:43:55<4:23:08, 3.27s/it] {'loss': 0.6721, 'grad_norm': 0.6375560167344843, 'learning_rate': 1.7048603933245956e-05, 'epoch': 0.27}
27% | 1817/6638 [1:43:58<4:21:05, 3.25s/it] {'loss': 0.661, 'grad_norm': 0.604322452295215, 'learning_rate': 1.7045141652178997e-05, 'epoch': 0.27}
27% | 1818/6638 [1:44:01<4:20:14, 3.24s/it] {'loss': 0.672, 'grad_norm': 0.6724622664185036, 'learning_rate': 1.7041677693515497e-05, 'epoch': 0.27}
27% | 1819/6638 [1:44:04<4:23:17, 3.28s/it] {'loss': 0.6317, 'grad_norm': 0.5747279925459927, 'learning_rate': 1.70382120580803e-05, 'epoch': 0.27}
27% | 1820/6638 [1:44:08<4:22:43, 3.27s/it] {'loss': 0.6989, 'grad_norm': 0.6335049741859766, 'learning_rate': 1.7034744746698644e-05, 'epoch': 0.27}
27% | 1821/6638 [1:44:11<4:24:21, 3.29s/it] {'loss': 0.7043, 'grad_norm': 0.6312679863655236, 'learning_rate': 1.7031275760196172e-05, 'epoch': 0.27}
27% | 1822/6638 [1:44:14<4:25:31, 3.31s/it] {'loss': 0.7229, 'grad_norm': 0.7667066818615915, 'learning_rate': 1.702780509939892e-05, 'epoch': 0.27}
27% | 1823/6638 [1:44:18<4:28:01, 3.34s/it] {'loss': 0.6715, 'grad_norm': 0.6082280125772999, 'learning_rate': 1.7024332765133325e-05, 'epoch': 0.27}
27% | 1824/6638 [1:44:21<4:24:40, 3.30s/it] {'loss': 0.6449, 'grad_norm': 0.5861182056800907, 'learning_rate': 1.702085875822623e-05, 'epoch': 0.27}
27% | 1825/6638 [1:44:24<4:23:17, 3.28s/it] {'loss': 0.6699, 'grad_norm': 0.585275424590033, 'learning_rate': 1.7017383079504858e-05, 'epoch': 0.27}
28% | 1826/6638 [1:44:27<4:23:48, 3.29s/it] {'loss': 0.6653, 'grad_norm': 0.601667382104673, 'learning_rate': 1.7013905729796845e-05, 'epoch': 0.28}
28% | 1827/6638 [1:44:31<4:26:55, 3.33s/it] {'loss': 0.6709, 'grad_norm': 0.587916357593917, 'learning_rate': 1.701042670993023e-05, 'epoch': 0.28}
28% | 1828/6638 [1:44:34<4:25:14, 3.31s/it] {'loss': 0.6694, 'grad_norm': 0.5922175796915872, 'learning_rate': 1.7006946020733426e-05, 'epoch': 0.28}
28% | 1829/6638 [1:44:37<4:22:07, 3.27s/it] {'loss': 0.6472, 'grad_norm': 0.576560817618092, 'learning_rate': 1.700346366303527e-05, 'epoch': 0.28}
28% | 1830/6638 [1:44:41<4:20:54, 3.26s/it] {'loss': 0.643, 'grad_norm': 0.5726073315721737, 'learning_rate': 1.6999979637664982e-05, 'epoch': 0.28}
28% | 1831/6638 [1:44:44<4:18:46, 3.23s/it] {'loss': 0.6568, 'grad_norm': 0.6518660069723208, 'learning_rate': 1.699649394545218e-05, 'epoch': 0.28}
28% | 1832/6638 [1:44:47<4:20:41, 3.25s/it] {'loss': 0.6878, 'grad_norm': 0.6219910953069202, 'learning_rate': 1.6993006587226876e-05, 'epoch': 0.28}
28% | 1833/6638 [1:44:50<4:22:41, 3.28s/it] {'loss': 0.6773, 'grad_norm': 0.6201879731342221, 'learning_rate': 1.6989517563819496e-05, 'epoch': 0.28}
28% | 1834/6638 [1:44:54<4:22:33, 3.28s/it] {'loss': 0.647, 'grad_norm': 0.5752562207355584, 'learning_rate': 1.698602687606084e-05, 'epoch': 0.28}
28% | 1835/6638 [1:44:57<4:23:12, 3.29s/it] {'loss': 0.662, 'grad_norm': 0.5591788246217959, 'learning_rate': 1.6982534524782117e-05, 'epoch': 0.28}
28% | 1836/6638 [1:45:00<4:24:52, 3.31s/it] {'loss': 0.6991, 'grad_norm': 1.4354525388871866, 'learning_rate': 1.697904051081493e-05, 'epoch': 0.28}
28% | 1837/6638 [1:45:04<4:27:43, 3.35s/it] {'loss': 0.7033, 'grad_norm': 0.7494823282482305, 'learning_rate': 1.6975544834991273e-05, 'epoch': 0.28}
28% | 1838/6638 [1:45:07<4:25:15, 3.32s/it] {'loss': 0.6586, 'grad_norm': 0.6266708092507675, 'learning_rate': 1.6972047498143544e-05, 'epoch': 0.28}
28% | 1839/6638 [1:45:10<4:24:46, 3.31s/it] {'loss': 0.6887, 'grad_norm': 0.5852728930941969, 'learning_rate': 1.6968548501104532e-05, 'epoch': 0.28}
28% | 1840/6638 [1:45:14<4:24:08, 3.30s/it] {'loss': 0.6311, 'grad_norm': 0.5541035076170161, 'learning_rate': 1.6965047844707428e-05, 'epoch': 0.28}
28% | 1841/6638 [1:45:17<4:22:45, 3.29s/it] {'loss': 0.7048, 'grad_norm': 0.7023069961968147, 'learning_rate': 1.69615455297858e-05, 'epoch': 0.28}
28% | 1842/6638 [1:45:20<4:22:34, 3.28s/it] {'loss': 0.6456, 'grad_norm': 0.5236691088979891, 'learning_rate': 1.695804155717363e-05, 'epoch': 0.28}
28% | 1843/6638 [1:45:23<4:21:02, 3.27s/it] {'loss': 0.6553, 'grad_norm': 0.5786152230161066, 'learning_rate': 1.6954535927705288e-05, 'epoch': 0.28}
28% | 1844/6638 [1:45:27<4:20:11, 3.26s/it] {'loss': 0.7224, 'grad_norm': 0.7542905274544525, 'learning_rate': 1.6951028642215533e-05, 'epoch': 0.28}
28% | 1845/6638 [1:45:30<4:20:15, 3.26s/it] {'loss': 0.6355, 'grad_norm': 0.6735127013301859, 'learning_rate': 1.694751970153953e-05, 'epoch': 0.28}
28% | 1846/6638 [1:45:33<4:19:32, 3.25s/it] {'loss': 0.6329, 'grad_norm': 0.5906269731011513, 'learning_rate': 1.6944009106512828e-05, 'epoch': 0.28}
28% | 1847/6638 [1:45:36<4:19:00, 3.24s/it] {'loss': 0.649, 'grad_norm': 0.5682587126205089, 'learning_rate': 1.6940496857971375e-05, 'epoch': 0.28}
28% | 1848/6638 [1:45:40<4:18:28, 3.24s/it] {'loss': 0.6718, 'grad_norm': 0.6860459937785327, 'learning_rate': 1.693698295675151e-05, 'epoch': 0.28}
28% | 1849/6638 [1:45:43<4:19:02, 3.25s/it] {'loss': 0.6769, 'grad_norm': 0.6616078670428808, 'learning_rate': 1.693346740368997e-05, 'epoch': 0.28}
AutoResumeHook: Checking whether to suspend... (printed once per rank, ranks 0-7)
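The AutoResumeHook message is emitted by every one of the eight ranks, so the raw log shows eight interleaved copies at each check. A small post-processing sketch that collapses such runs of identical lines (the `collapse_repeats` helper is ours, not part of the training code; it assumes the per-rank prefixes have already been stripped):

```python
from itertools import groupby

def collapse_repeats(lines):
    """Collapse consecutive identical log lines into 'line (xN)' entries."""
    out = []
    for line, run in groupby(lines):
        n = len(list(run))
        out.append(line if n == 1 else f"{line} (x{n})")
    return out

lines = ["AutoResumeHook: Checking whether to suspend..."] * 8 + ["step 1850"]
print(collapse_repeats(lines))
```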
28% | 1850/6638 [1:45:46<4:16:57, 3.22s/it] {'loss': 0.6598, 'grad_norm': 0.668037381111998, 'learning_rate': 1.6929950199623875e-05, 'epoch': 0.28}
28% | 1851/6638 [1:45:49<4:18:32, 3.24s/it] {'loss': 0.6747, 'grad_norm': 0.6729220035629909, 'learning_rate': 1.692643134539075e-05, 'epoch': 0.28}
28% | 1852/6638 [1:45:52<4:18:18, 3.24s/it] {'loss': 0.6973, 'grad_norm': 0.6845956916463783, 'learning_rate': 1.6922910841828513e-05, 'epoch': 0.28}
28% | 1853/6638 [1:45:56<4:23:32, 3.30s/it] {'loss': 0.7226, 'grad_norm': 0.6362581779756593, 'learning_rate': 1.6919388689775463e-05, 'epoch': 0.28}
28% | 1854/6638 [1:45:59<4:26:19, 3.34s/it] {'loss': 0.6837, 'grad_norm': 0.6847104971081109, 'learning_rate': 1.69158648900703e-05, 'epoch': 0.28}
28% | 1855/6638 [1:46:03<4:25:10, 3.33s/it] {'loss': 0.6711, 'grad_norm': 0.6365029551974952, 'learning_rate': 1.691233944355212e-05, 'epoch': 0.28}
28% | 1856/6638 [1:46:06<4:27:47, 3.36s/it] {'loss': 0.715, 'grad_norm': 0.7987729376176234, 'learning_rate': 1.6908812351060397e-05, 'epoch': 0.28}
28% | 1857/6638 [1:46:09<4:24:43, 3.32s/it] {'loss': 0.6567, 'grad_norm': 0.6092034844894908, 'learning_rate': 1.6905283613435012e-05, 'epoch': 0.28}
28% | 1858/6638 [1:46:13<4:25:14, 3.33s/it] {'loss': 0.662, 'grad_norm': 0.6636559318941768, 'learning_rate': 1.690175323151623e-05, 'epoch': 0.28}
28% | 1859/6638 [1:46:16<4:24:04, 3.32s/it] {'loss': 0.6474, 'grad_norm': 1.2140188633857907, 'learning_rate': 1.689822120614471e-05, 'epoch': 0.28}
28% | 1860/6638 [1:46:19<4:21:26, 3.28s/it] {'loss': 0.6523, 'grad_norm': 0.670433182321442, 'learning_rate': 1.6894687538161503e-05, 'epoch': 0.28}
28% | 1861/6638 [1:46:23<4:24:16, 3.32s/it] {'loss': 0.633, 'grad_norm': 0.656716268116018, 'learning_rate': 1.6891152228408046e-05, 'epoch': 0.28}
28% | 1862/6638 [1:46:26<4:22:02, 3.29s/it] {'loss': 0.7203, 'grad_norm': 0.7698686326637686, 'learning_rate': 1.6887615277726174e-05, 'epoch': 0.28}
28% | 1863/6638 [1:46:29<4:20:48, 3.28s/it] {'loss': 0.6368, 'grad_norm': 0.5705635851102934, 'learning_rate': 1.6884076686958107e-05, 'epoch': 0.28}
28% | 1864/6638 [1:46:32<4:20:55, 3.28s/it] {'loss': 0.6828, 'grad_norm': 0.635979744570959, 'learning_rate': 1.688053645694646e-05, 'epoch': 0.28}
28% | 1865/6638 [1:46:36<4:20:07, 3.27s/it] {'loss': 0.6223, 'grad_norm': 0.590251344705017, 'learning_rate': 1.6876994588534234e-05, 'epoch': 0.28}
28% | 1866/6638 [1:46:39<4:19:32, 3.26s/it] {'loss': 0.6398, 'grad_norm': 0.5833778975328824, 'learning_rate': 1.6873451082564828e-05, 'epoch': 0.28}
28% | 1867/6638 [1:46:42<4:17:57, 3.24s/it] {'loss': 0.6446, 'grad_norm': 0.6395306582245023, 'learning_rate': 1.6869905939882015e-05, 'epoch': 0.28}
28% | 1868/6638 [1:46:45<4:18:28, 3.25s/it] {'loss': 0.637, 'grad_norm': 0.5542809438657942, 'learning_rate': 1.686635916132998e-05, 'epoch': 0.28}
28% | 1869/6638 [1:46:49<4:19:19, 3.26s/it] {'loss': 0.6718, 'grad_norm': 0.6423792679594741, 'learning_rate': 1.6862810747753274e-05, 'epoch': 0.28}
28% | 1870/6638 [1:46:52<4:21:17, 3.29s/it] {'loss': 0.7005, 'grad_norm': 0.7055052493844649, 'learning_rate': 1.685926069999686e-05, 'epoch': 0.28}
28% | 1871/6638 [1:46:55<4:19:21, 3.26s/it] {'loss': 0.6185, 'grad_norm': 0.5753882165278605, 'learning_rate': 1.6855709018906073e-05, 'epoch': 0.28}
28% | 1872/6638 [1:46:58<4:19:07, 3.26s/it] {'loss': 0.6334, 'grad_norm': 0.5270128503281205, 'learning_rate': 1.6852155705326644e-05, 'epoch': 0.28}
28% | 1873/6638 [1:47:02<4:18:37, 3.26s/it] {'loss': 0.6869, 'grad_norm': 0.6716793613345466, 'learning_rate': 1.684860076010469e-05, 'epoch': 0.28}
28% | 1874/6638 [1:47:05<4:19:39, 3.27s/it] {'loss': 0.7577, 'grad_norm': 0.7783201117004758, 'learning_rate': 1.684504418408672e-05, 'epoch': 0.28}
28% | 1875/6638 [1:47:08<4:18:43, 3.26s/it] {'loss': 0.6716, 'grad_norm': 0.6201723542371773, 'learning_rate': 1.684148597811963e-05, 'epoch': 0.28}
28% | 1876/6638 [1:47:11<4:16:48, 3.24s/it] {'loss': 0.6815, 'grad_norm': 0.6231622134502004, 'learning_rate': 1.6837926143050707e-05, 'epoch': 0.28}
28% | 1877/6638 [1:47:15<4:18:35, 3.26s/it] {'loss': 0.6756, 'grad_norm': 0.6225411632954607, 'learning_rate': 1.6834364679727614e-05, 'epoch': 0.28}
28% | 1878/6638 [1:47:18<4:15:52, 3.23s/it] {'loss': 0.622, 'grad_norm': 0.5532240934425172, 'learning_rate': 1.683080158899842e-05, 'epoch': 0.28}
28% | 1879/6638 [1:47:21<4:14:13, 3.21s/it] {'loss': 0.6614, 'grad_norm': 0.6368825089308193, 'learning_rate': 1.6827236871711566e-05, 'epoch': 0.28}
28% | 1880/6638 [1:47:24<4:13:17, 3.19s/it] {'loss': 0.6563, 'grad_norm': 0.6164470463944469, 'learning_rate': 1.6823670528715886e-05, 'epoch': 0.28}
28% | 1881/6638 [1:47:27<4:15:14, 3.22s/it] {'loss': 0.6535, 'grad_norm': 0.6518940093706134, 'learning_rate': 1.6820102560860607e-05, 'epoch': 0.28}
28% | 1882/6638 [1:47:31<4:14:10, 3.21s/it] {'loss': 0.6921, 'grad_norm': 0.7155480852377768, 'learning_rate': 1.681653296899533e-05, 'epoch': 0.28}
28% | 1883/6638 [1:47:34<4:13:35, 3.20s/it] {'loss': 0.6697, 'grad_norm': 0.5952513603215248, 'learning_rate': 1.6812961753970054e-05, 'epoch': 0.28}
28% | 1884/6638 [1:47:37<4:16:49, 3.24s/it] {'loss': 0.6346, 'grad_norm': 0.5607413667347206, 'learning_rate': 1.6809388916635158e-05, 'epoch': 0.28}
28% | 1885/6638 [1:47:40<4:19:45, 3.28s/it] {'loss': 0.682, 'grad_norm': 0.6267620316776615, 'learning_rate': 1.6805814457841416e-05, 'epoch': 0.28}
28% | 1886/6638 [1:47:44<4:17:41, 3.25s/it] {'loss': 0.6726, 'grad_norm': 0.6873545089225107, 'learning_rate': 1.6802238378439976e-05, 'epoch': 0.28}
28% | 1887/6638 [1:47:47<4:16:44, 3.24s/it] {'loss': 0.6518, 'grad_norm': 0.5828008812385425, 'learning_rate': 1.6798660679282374e-05, 'epoch': 0.28}
28% | 1888/6638 [1:47:50<4:19:48, 3.28s/it] {'loss': 0.6785, 'grad_norm': 0.663121559720079, 'learning_rate': 1.6795081361220547e-05, 'epoch': 0.28}
28% | 1889/6638 [1:47:54<4:21:20, 3.30s/it] {'loss': 0.7802, 'grad_norm': 0.7448376661541555, 'learning_rate': 1.6791500425106795e-05, 'epoch': 0.28}
28% | 1890/6638 [1:47:57<4:23:17, 3.33s/it] {'loss': 0.6835, 'grad_norm': 0.6721148910913721, 'learning_rate': 1.6787917871793823e-05, 'epoch': 0.28}
28% | 1891/6638 [1:48:00<4:18:45, 3.27s/it] {'loss': 0.6682, 'grad_norm': 0.7334319159573572, 'learning_rate': 1.678433370213471e-05, 'epoch': 0.28}
29% | 1892/6638 [1:48:03<4:20:48, 3.30s/it] {'loss': 0.7075, 'grad_norm': 0.6572195268828025, 'learning_rate': 1.6780747916982912e-05, 'epoch': 0.29}
29% | 1893/6638 [1:48:07<4:17:29, 3.26s/it] {'loss': 0.6562, 'grad_norm': 0.5801566479984088, 'learning_rate': 1.6777160517192297e-05, 'epoch': 0.29}
29% | 1894/6638 [1:48:10<4:19:43, 3.28s/it] {'loss': 0.622, 'grad_norm': 0.5300131825847929, 'learning_rate': 1.677357150361709e-05, 'epoch': 0.29}
29% | 1895/6638 [1:48:13<4:19:41, 3.29s/it] {'loss': 0.7386, 'grad_norm': 0.688093302925135, 'learning_rate': 1.6769980877111906e-05, 'epoch': 0.29}
29% | 1896/6638 [1:48:17<4:20:56, 3.30s/it] {'loss': 0.6574, 'grad_norm': 0.6061432387744066, 'learning_rate': 1.6766388638531762e-05, 'epoch': 0.29}
29% | 1897/6638 [1:48:20<4:17:46, 3.26s/it] {'loss': 0.6348, 'grad_norm': 0.6074929829470956, 'learning_rate': 1.6762794788732037e-05, 'epoch': 0.29}
29% | 1898/6638 [1:48:23<4:17:30, 3.26s/it] {'loss': 0.6397, 'grad_norm': 0.6144528333643647, 'learning_rate': 1.6759199328568506e-05, 'epoch': 0.29}
29% | 1899/6638 [1:48:26<4:17:18, 3.26s/it] {'loss': 0.6806, 'grad_norm': 0.606108543143607,
'learning_rate': 1.6755602258897323e-05, 'epoch': 0.29} 29%|██▊ | 1899/6638 [1:48:26<4:17:18, 3.26s/it]5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 01 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 29%|██▊ | 1900/6638 [1:48:30<4:19:33, 3.29s/it]7 AutoResumeHook: Checking whether to suspend... {'loss': 0.6593, 'grad_norm': 0.5835461442945231, 'learning_rate': 1.675200358057502e-05, 'epoch': 0.29} 29%|██▊ | 1900/6638 [1:48:30<4:19:33, 3.29s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-1900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. 
DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 29%|██▊ | 1901/6638 [1:48:46<9:33:32, 7.26s/it] {'loss': 0.7106, 'grad_norm': 0.6550617425620094, 'learning_rate': 1.6748403294458528e-05, 'epoch': 0.29} 29%|██▊ | 1901/6638 [1:48:46<9:33:32, 7.26s/it] 29%|██▊ | 1902/6638 [1:48:49<7:57:45, 6.05s/it] {'loss': 0.664, 'grad_norm': 0.6384558955663704, 'learning_rate': 1.6744801401405138e-05, 'epoch': 0.29} 29%|██▊ | 1902/6638 [1:48:49<7:57:45, 6.05s/it] 29%|██▊ | 1903/6638 [1:48:53<6:52:54, 5.23s/it] {'loss': 0.6482, 'grad_norm': 0.6108512249423601, 'learning_rate': 1.6741197902272553e-05, 'epoch': 0.29} 29%|██▊ | 1903/6638 [1:48:53<6:52:54, 5.23s/it] 29%|██▊ | 1904/6638 [1:48:56<6:09:20, 4.68s/it] {'loss': 0.6889, 'grad_norm': 0.6306103706444607, 'learning_rate': 1.673759279791883e-05, 'epoch': 0.29} 29%|██▊ | 1904/6638 [1:48:56<6:09:20, 4.68s/it] 29%|██▊ | 1905/6638 [1:48:59<5:35:15, 4.25s/it] {'loss': 0.6485, 'grad_norm': 0.6896467138238298, 'learning_rate': 1.6733986089202427e-05, 'epoch': 0.29} 29%|██▊ | 1905/6638 [1:48:59<5:35:15, 4.25s/it] 29%|██▊ | 1906/6638 [1:49:03<5:13:15, 3.97s/it] {'loss': 0.6715, 'grad_norm': 0.5588275521759509, 'learning_rate': 1.6730377776982173e-05, 'epoch': 0.29} 29%|██▊ | 1906/6638 [1:49:03<5:13:15, 3.97s/it] 29%|██▊ | 1907/6638 [1:49:06<4:56:39, 3.76s/it] {'loss': 0.6385, 'grad_norm': 0.6504797683479632, 'learning_rate': 1.6726767862117283e-05, 'epoch': 0.29} 29%|██▊ | 1907/6638 [1:49:06<4:56:39, 3.76s/it] 29%|██▊ | 1908/6638 [1:49:09<4:44:36, 3.61s/it] {'loss': 0.6832, 'grad_norm': 0.6663634341157721, 'learning_rate': 1.6723156345467354e-05, 'epoch': 0.29} 
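The per-step `{'loss': ..., 'grad_norm': ..., 'learning_rate': ..., 'epoch': ...}` payloads above are valid Python dict literals, so they can be recovered from the raw log for plotting or analysis with the standard library alone. A minimal sketch (the regex and the helper name `parse_metrics` are mine, not part of the NVILA training code):

```python
import ast
import re

# The trainer prints each step's metrics as a Python dict literal; the dicts
# are flat (no nested braces), so a simple regex can isolate each one and
# ast.literal_eval can parse it safely.
METRIC_RE = re.compile(r"\{'loss':[^}]*\}")

def parse_metrics(log_text: str) -> list[dict]:
    """Return one metrics dict per logged training step found in log_text."""
    return [ast.literal_eval(m.group(0)) for m in METRIC_RE.finditer(log_text)]

sample = (
    "28%|██▊ | 1870/6638 [1:46:52<4:21:17, 3.29s/it] "
    "{'loss': 0.7005, 'grad_norm': 0.7055052493844649, "
    "'learning_rate': 1.685926069999686e-05, 'epoch': 0.28}"
)
records = parse_metrics(sample)
print(records[0]['loss'])  # → 0.7005
```

Note that tqdm re-prints each bar, so deduplicating by step index is advisable when the same dict can appear more than once per step.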
[training progress: steps 1909–1948/6638 at ~3.2–3.4 s/it; loss 0.61–0.74, learning_rate 1.6720e-05 → 1.6577e-05, epoch 0.29]
 29%|██▉ | 1949/6638 [1:51:24<4:19:31, 3.32s/it] {'loss': 0.7058, 'grad_norm': 1.5319022129137818, 'learning_rate': 1.6573715685740647e-05, 'epoch': 0.29}
AutoResumeHook (ranks 0–7): Checking whether to suspend...
[training progress: steps 1950–1999/6638 at ~3.2–3.6 s/it; loss 0.61–0.75, learning_rate 1.6570e-05 → 1.6388e-05, epoch 0.29 → 0.3]
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 30%|███ | 2000/6638 [1:54:12<4:15:12, 3.30s/it]
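The bar fields logged for each step, e.g. `2000/6638 [1:54:12<4:15:12, 3.30s/it]`, encode elapsed time, estimated time remaining, and seconds per iteration. The remaining-time estimate is essentially `(total - done) * sec_per_it`; a quick sketch of that arithmetic (`eta_hms` is a hypothetical helper, and tqdm smooths the per-iteration rate, so the reconstruction only approximates the logged value):

```python
# Sketch of tqdm-style ETA arithmetic behind fields like
# "2000/6638 [1:54:12<4:15:12, 3.30s/it]". tqdm uses a smoothed rate,
# so this reconstruction is approximate rather than exact.
def eta_hms(done: int, total: int, sec_per_it: float) -> str:
    remaining = int((total - done) * sec_per_it)
    h, rest = divmod(remaining, 3600)
    m, s = divmod(rest, 60)
    return f"{h}:{m:02d}:{s:02d}"

print(eta_hms(2000, 6638, 3.30))  # → 4:15:05, close to the logged 4:15:12
```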
{'loss': 0.6734, 'grad_norm': 0.662424993013341, 'learning_rate': 1.6384160803384782e-05, 'epoch': 0.3} 30%|███ | 2000/6638 [1:54:12<4:15:12, 3.30s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) 
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 30%|███ | 2001/6638 [1:54:28<9:17:38, 7.22s/it] {'loss': 0.6711, 'grad_norm': 0.6311246977435147, 'learning_rate': 1.6380404130160528e-05, 'epoch': 0.3} 30%|███ | 2001/6638 [1:54:28<9:17:38, 7.22s/it] 30%|███ | 2002/6638 [1:54:31<7:44:32, 6.01s/it] {'loss': 0.68, 'grad_norm': 0.6113031485053102, 'learning_rate': 1.6376645937627734e-05, 'epoch': 0.3} 30%|███ | 2002/6638 [1:54:31<7:44:32, 6.01s/it] 30%|███ | 2003/6638 [1:54:34<6:42:02, 5.20s/it] {'loss': 0.6611, 'grad_norm': 0.5642387667535429, 'learning_rate': 1.6372886226681303e-05, 'epoch': 0.3} 30%|███ | 2003/6638 [1:54:34<6:42:02, 5.20s/it] 30%|███ | 2004/6638 [1:54:38<5:57:11, 4.62s/it] {'loss': 0.6429, 'grad_norm': 0.5685619007569851, 'learning_rate': 1.63691249982165e-05, 'epoch': 0.3} 30%|███ | 2004/6638 [1:54:38<5:57:11, 4.62s/it] 30%|███ | 2005/6638 [1:54:41<5:25:33, 4.22s/it] {'loss': 0.6363, 'grad_norm': 0.6073090242363802, 'learning_rate': 1.6365362253128956e-05, 'epoch': 0.3} 30%|███ | 2005/6638 [1:54:41<5:25:33, 4.22s/it] 30%|███ | 2006/6638 [1:54:44<5:03:47, 3.94s/it] {'loss': 0.666, 'grad_norm': 0.6357708796569858, 'learning_rate': 1.6361597992314657e-05, 'epoch': 0.3} 30%|███ | 2006/6638 [1:54:44<5:03:47, 3.94s/it] 30%|███ | 2007/6638 [1:54:48<4:48:27, 3.74s/it] {'loss': 0.6518, 'grad_norm': 0.6338431794855766, 'learning_rate': 1.6357832216669956e-05, 'epoch': 0.3} 30%|███ | 2007/6638 [1:54:48<4:48:27, 3.74s/it] 30%|███ | 2008/6638 [1:54:51<4:39:38, 3.62s/it] {'loss': 0.6682, 'grad_norm': 0.5546250307887723, 'learning_rate': 1.6354064927091555e-05, 'epoch': 0.3} 30%|███ | 2008/6638 [1:54:51<4:39:38, 3.62s/it] 30%|███ | 2009/6638 [1:54:54<4:32:30, 3.53s/it] {'loss': 0.6705, 'grad_norm': 0.6155822056785621, 'learning_rate': 1.6350296124476534e-05, 'epoch': 0.3} 30%|███ | 2009/6638 [1:54:54<4:32:30, 3.53s/it] 30%|███ | 2010/6638 [1:54:57<4:26:50, 3.46s/it] {'loss': 0.6738, 'grad_norm': 
0.5884186963091748, 'learning_rate': 1.634652580972232e-05, 'epoch': 0.3} 30%|███ | 2010/6638 [1:54:57<4:26:50, 3.46s/it] 30%|███ | 2011/6638 [1:55:01<4:22:39, 3.41s/it] {'loss': 0.6459, 'grad_norm': 0.6394582909685321, 'learning_rate': 1.6342753983726704e-05, 'epoch': 0.3} 30%|███ | 2011/6638 [1:55:01<4:22:39, 3.41s/it] 30%|███ | 2012/6638 [1:55:04<4:20:15, 3.38s/it] {'loss': 0.6628, 'grad_norm': 0.6775901837566848, 'learning_rate': 1.6338980647387843e-05, 'epoch': 0.3} 30%|███ | 2012/6638 [1:55:04<4:20:15, 3.38s/it] 30%|███ | 2013/6638 [1:55:08<4:22:53, 3.41s/it] {'loss': 0.6641, 'grad_norm': 0.6696672783709704, 'learning_rate': 1.6335205801604242e-05, 'epoch': 0.3} 30%|███ | 2013/6638 [1:55:08<4:22:53, 3.41s/it] 30%|███ | 2014/6638 [1:55:11<4:17:04, 3.34s/it] {'loss': 0.6969, 'grad_norm': 0.8061374403770122, 'learning_rate': 1.633142944727477e-05, 'epoch': 0.3} 30%|███ | 2014/6638 [1:55:11<4:17:04, 3.34s/it] 30%|███ | 2015/6638 [1:55:14<4:15:53, 3.32s/it] {'loss': 0.676, 'grad_norm': 0.6136737339041879, 'learning_rate': 1.6327651585298657e-05, 'epoch': 0.3} 30%|███ | 2015/6638 [1:55:14<4:15:53, 3.32s/it] 30%|███ | 2016/6638 [1:55:17<4:17:29, 3.34s/it] {'loss': 0.6363, 'grad_norm': 0.5586728696878206, 'learning_rate': 1.6323872216575498e-05, 'epoch': 0.3} 30%|███ | 2016/6638 [1:55:17<4:17:29, 3.34s/it] 30%|███ | 2017/6638 [1:55:21<4:15:58, 3.32s/it] {'loss': 0.706, 'grad_norm': 0.7141331787310959, 'learning_rate': 1.6320091342005238e-05, 'epoch': 0.3} 30%|███ | 2017/6638 [1:55:21<4:15:58, 3.32s/it] 30%|███ | 2018/6638 [1:55:24<4:13:25, 3.29s/it] {'loss': 0.6805, 'grad_norm': 0.6790261166829864, 'learning_rate': 1.6316308962488173e-05, 'epoch': 0.3} 30%|███ | 2018/6638 [1:55:24<4:13:25, 3.29s/it] 30%|███ | 2019/6638 [1:55:27<4:11:31, 3.27s/it] {'loss': 0.6926, 'grad_norm': 0.6395614204615089, 'learning_rate': 1.631252507892498e-05, 'epoch': 0.3} 30%|███ | 2019/6638 [1:55:27<4:11:31, 3.27s/it] 30%|███ | 2020/6638 [1:55:30<4:11:37, 3.27s/it] {'loss': 0.6362, 
'grad_norm': 0.6015402609099046, 'learning_rate': 1.6308739692216675e-05, 'epoch': 0.3} 30%|███ | 2020/6638 [1:55:30<4:11:37, 3.27s/it] 30%|███ | 2021/6638 [1:55:34<4:10:08, 3.25s/it] {'loss': 0.6602, 'grad_norm': 0.6315548959156808, 'learning_rate': 1.6304952803264643e-05, 'epoch': 0.3} 30%|███ | 2021/6638 [1:55:34<4:10:08, 3.25s/it] 30%|███ | 2022/6638 [1:55:37<4:08:45, 3.23s/it] {'loss': 0.6453, 'grad_norm': 0.5640970818555194, 'learning_rate': 1.6301164412970612e-05, 'epoch': 0.3} 30%|███ | 2022/6638 [1:55:37<4:08:45, 3.23s/it] 30%|███ | 2023/6638 [1:55:40<4:09:48, 3.25s/it] {'loss': 0.6844, 'grad_norm': 0.6134402329204184, 'learning_rate': 1.629737452223669e-05, 'epoch': 0.3} 30%|███ | 2023/6638 [1:55:40<4:09:48, 3.25s/it] 30%|███ | 2024/6638 [1:55:43<4:10:12, 3.25s/it] {'loss': 0.7432, 'grad_norm': 0.7055642167529573, 'learning_rate': 1.6293583131965317e-05, 'epoch': 0.3} 30%|███ | 2024/6638 [1:55:43<4:10:12, 3.25s/it] 31%|███ | 2025/6638 [1:55:47<4:08:56, 3.24s/it] {'loss': 0.7299, 'grad_norm': 0.8062064353935884, 'learning_rate': 1.6289790243059313e-05, 'epoch': 0.31} 31%|███ | 2025/6638 [1:55:47<4:08:56, 3.24s/it] 31%|███ | 2026/6638 [1:55:50<4:08:44, 3.24s/it] {'loss': 0.6876, 'grad_norm': 0.6457775432452351, 'learning_rate': 1.6285995856421843e-05, 'epoch': 0.31} 31%|███ | 2026/6638 [1:55:50<4:08:44, 3.24s/it] 31%|███ | 2027/6638 [1:55:53<4:09:28, 3.25s/it] {'loss': 0.664, 'grad_norm': 0.6416806516584062, 'learning_rate': 1.6282199972956425e-05, 'epoch': 0.31} 31%|███ | 2027/6638 [1:55:53<4:09:28, 3.25s/it] 31%|███ | 2028/6638 [1:55:56<4:10:41, 3.26s/it] {'loss': 0.6074, 'grad_norm': 0.591546496016183, 'learning_rate': 1.6278402593566946e-05, 'epoch': 0.31} 31%|███ | 2028/6638 [1:55:56<4:10:41, 3.26s/it] 31%|███ | 2029/6638 [1:56:00<4:11:46, 3.28s/it] {'loss': 0.6698, 'grad_norm': 0.6364923515256288, 'learning_rate': 1.6274603719157633e-05, 'epoch': 0.31} 31%|███ | 2029/6638 [1:56:00<4:11:46, 3.28s/it] 31%|███ | 2030/6638 [1:56:03<4:14:47, 3.32s/it] 
{'loss': 0.6443, 'grad_norm': 0.6070622192888235, 'learning_rate': 1.6270803350633085e-05, 'epoch': 0.31} 31%|███ | 2030/6638 [1:56:03<4:14:47, 3.32s/it] 31%|███ | 2031/6638 [1:56:06<4:11:56, 3.28s/it] {'loss': 0.6901, 'grad_norm': 0.7771214824719491, 'learning_rate': 1.626700148889825e-05, 'epoch': 0.31} 31%|███ | 2031/6638 [1:56:06<4:11:56, 3.28s/it] 31%|███ | 2032/6638 [1:56:10<4:11:29, 3.28s/it] {'loss': 0.6622, 'grad_norm': 0.6802329158377185, 'learning_rate': 1.6263198134858428e-05, 'epoch': 0.31} 31%|███ | 2032/6638 [1:56:10<4:11:29, 3.28s/it] 31%|███ | 2033/6638 [1:56:13<4:11:51, 3.28s/it] {'loss': 0.6996, 'grad_norm': 0.6628426904191672, 'learning_rate': 1.625939328941928e-05, 'epoch': 0.31} 31%|███ | 2033/6638 [1:56:13<4:11:51, 3.28s/it] 31%|███ | 2034/6638 [1:56:16<4:13:48, 3.31s/it] {'loss': 0.6986, 'grad_norm': 0.6327123931601856, 'learning_rate': 1.6255586953486817e-05, 'epoch': 0.31} 31%|███ | 2034/6638 [1:56:16<4:13:48, 3.31s/it] 31%|███ | 2035/6638 [1:56:20<4:14:23, 3.32s/it] {'loss': 0.7144, 'grad_norm': 0.7726937972188457, 'learning_rate': 1.6251779127967412e-05, 'epoch': 0.31} 31%|███ | 2035/6638 [1:56:20<4:14:23, 3.32s/it] 31%|███ | 2036/6638 [1:56:23<4:14:56, 3.32s/it] {'loss': 0.664, 'grad_norm': 0.6930111918427974, 'learning_rate': 1.6247969813767784e-05, 'epoch': 0.31} 31%|███ | 2036/6638 [1:56:23<4:14:56, 3.32s/it] 31%|███ | 2037/6638 [1:56:26<4:13:50, 3.31s/it] {'loss': 0.618, 'grad_norm': 0.6056279378230718, 'learning_rate': 1.6244159011795012e-05, 'epoch': 0.31} 31%|███ | 2037/6638 [1:56:26<4:13:50, 3.31s/it] 31%|███ | 2038/6638 [1:56:30<4:15:07, 3.33s/it] {'loss': 0.6234, 'grad_norm': 0.6035005867854618, 'learning_rate': 1.624034672295653e-05, 'epoch': 0.31} 31%|███ | 2038/6638 [1:56:30<4:15:07, 3.33s/it] 31%|███ | 2039/6638 [1:56:33<4:15:03, 3.33s/it] {'loss': 0.6779, 'grad_norm': 0.6561557467784587, 'learning_rate': 1.6236532948160123e-05, 'epoch': 0.31} 31%|███ | 2039/6638 [1:56:33<4:15:03, 3.33s/it] 31%|███ | 2040/6638 
[1:56:36<4:12:01, 3.29s/it] {'loss': 0.6454, 'grad_norm': 0.6651308725477398, 'learning_rate': 1.6232717688313933e-05, 'epoch': 0.31} 31%|███ | 2040/6638 [1:56:36<4:12:01, 3.29s/it] 31%|███ | 2041/6638 [1:56:39<4:12:15, 3.29s/it] {'loss': 0.691, 'grad_norm': 0.6266780046417394, 'learning_rate': 1.622890094432645e-05, 'epoch': 0.31} 31%|███ | 2041/6638 [1:56:39<4:12:15, 3.29s/it] 31%|███ | 2042/6638 [1:56:43<4:10:57, 3.28s/it] {'loss': 0.6792, 'grad_norm': 0.6347703537885047, 'learning_rate': 1.6225082717106525e-05, 'epoch': 0.31} 31%|███ | 2042/6638 [1:56:43<4:10:57, 3.28s/it] 31%|███ | 2043/6638 [1:56:46<4:14:55, 3.33s/it] {'loss': 0.7017, 'grad_norm': 0.6120324054598476, 'learning_rate': 1.6221263007563352e-05, 'epoch': 0.31} 31%|███ | 2043/6638 [1:56:46<4:14:55, 3.33s/it] 31%|███ | 2044/6638 [1:56:49<4:11:02, 3.28s/it] {'loss': 0.6522, 'grad_norm': 0.6089712708461548, 'learning_rate': 1.6217441816606494e-05, 'epoch': 0.31} 31%|███ | 2044/6638 [1:56:49<4:11:02, 3.28s/it] 31%|███ | 2045/6638 [1:56:52<4:09:07, 3.25s/it] {'loss': 0.6652, 'grad_norm': 0.591746422136342, 'learning_rate': 1.621361914514585e-05, 'epoch': 0.31} 31%|███ | 2045/6638 [1:56:52<4:09:07, 3.25s/it] 31%|███ | 2046/6638 [1:56:56<4:10:50, 3.28s/it] {'loss': 0.6834, 'grad_norm': 0.6131209519796731, 'learning_rate': 1.6209794994091676e-05, 'epoch': 0.31} 31%|███ | 2046/6638 [1:56:56<4:10:50, 3.28s/it] 31%|███ | 2047/6638 [1:56:59<4:13:15, 3.31s/it] {'loss': 0.6598, 'grad_norm': 0.6229136906495194, 'learning_rate': 1.620596936435459e-05, 'epoch': 0.31} 31%|███ | 2047/6638 [1:56:59<4:13:15, 3.31s/it] 31%|███ | 2048/6638 [1:57:02<4:10:33, 3.28s/it] {'loss': 0.6945, 'grad_norm': 0.7122562213432869, 'learning_rate': 1.6202142256845553e-05, 'epoch': 0.31} 31%|███ | 2048/6638 [1:57:02<4:10:33, 3.28s/it] 31%|███ | 2049/6638 [1:57:06<4:10:34, 3.28s/it] {'loss': 0.6349, 'grad_norm': 0.612723223123844, 'learning_rate': 1.6198313672475875e-05, 'epoch': 0.31} 31%|███ | 2049/6638 [1:57:06<4:10:34, 3.28s/it]
[ranks 0-7] AutoResumeHook: Checking whether to suspend... 31%|███ | 2050/6638 [1:57:09<4:11:22, 3.29s/it] {'loss': 0.7233, 'grad_norm': 0.7170655774469341, 'learning_rate': 1.6194483612157232e-05, 'epoch': 0.31} 31%|███ | 2050/6638 [1:57:09<4:11:22, 3.29s/it] 31%|███ | 2051/6638 [1:57:12<4:12:45, 3.31s/it] {'loss': 0.6483, 'grad_norm': 0.5726519839135734, 'learning_rate': 1.6190652076801635e-05, 'epoch': 0.31} 31%|███ | 2051/6638 [1:57:12<4:12:45, 3.31s/it] 31%|███ | 2052/6638 [1:57:15<4:10:58, 3.28s/it] {'loss': 0.6529, 'grad_norm': 0.6285600033913243, 'learning_rate': 1.618681906732145e-05, 'epoch': 0.31} 31%|███ | 2052/6638 [1:57:15<4:10:58, 3.28s/it] 31%|███ | 2053/6638 [1:57:19<4:11:50, 3.30s/it] {'loss': 0.6044, 'grad_norm': 0.6030865119605279, 'learning_rate': 1.6182984584629405e-05, 'epoch': 0.31} 31%|███ | 2053/6638 [1:57:19<4:11:50, 3.30s/it] 31%|███ | 2054/6638 [1:57:22<4:10:05, 3.27s/it] {'loss': 0.6425, 'grad_norm': 0.5728218368347316, 'learning_rate': 1.6179148629638567e-05, 'epoch': 0.31} 31%|███ | 2054/6638 [1:57:22<4:10:05, 3.27s/it] 31%|███ | 2055/6638 [1:57:25<4:11:06, 3.29s/it] {'loss': 0.7186, 'grad_norm': 0.8085170872899482, 'learning_rate': 1.617531120326236e-05, 'epoch': 0.31} 31%|███ | 2055/6638 [1:57:25<4:11:06, 3.29s/it] 31%|███ | 2056/6638 [1:57:28<4:07:38, 3.24s/it] {'loss': 0.7277, 'grad_norm': 0.7410983220562306, 'learning_rate': 1.6171472306414554e-05, 'epoch': 0.31} 31%|███ | 2056/6638 [1:57:28<4:07:38, 3.24s/it] 31%|███ | 2057/6638 [1:57:32<4:06:13, 3.22s/it] {'loss': 0.6164, 'grad_norm': 0.6257918687745256, 'learning_rate': 1.6167631940009273e-05, 'epoch': 0.31} 31%|███ | 2057/6638 [1:57:32<4:06:13,
3.22s/it] 31%|███ | 2058/6638 [1:57:35<4:05:50, 3.22s/it] {'loss': 0.6593, 'grad_norm': 0.6872127083293577, 'learning_rate': 1.6163790104960984e-05, 'epoch': 0.31} 31%|███ | 2058/6638 [1:57:35<4:05:50, 3.22s/it] 31%|███ | 2059/6638 [1:57:38<4:05:59, 3.22s/it] {'loss': 0.6734, 'grad_norm': 0.6068097986682842, 'learning_rate': 1.6159946802184518e-05, 'epoch': 0.31} 31%|███ | 2059/6638 [1:57:38<4:05:59, 3.22s/it] 31%|███ | 2060/6638 [1:57:41<4:06:50, 3.24s/it] {'loss': 0.7016, 'grad_norm': 0.7789006176312792, 'learning_rate': 1.6156102032595032e-05, 'epoch': 0.31} 31%|███ | 2060/6638 [1:57:41<4:06:50, 3.24s/it] 31%|███ | 2061/6638 [1:57:45<4:09:33, 3.27s/it] {'loss': 0.7686, 'grad_norm': 0.709591104083925, 'learning_rate': 1.6152255797108064e-05, 'epoch': 0.31} 31%|███ | 2061/6638 [1:57:45<4:09:33, 3.27s/it] 31%|███ | 2062/6638 [1:57:48<4:09:11, 3.27s/it] {'loss': 0.6346, 'grad_norm': 0.5590333153065523, 'learning_rate': 1.614840809663947e-05, 'epoch': 0.31} 31%|███ | 2062/6638 [1:57:48<4:09:11, 3.27s/it] 31%|███ | 2063/6638 [1:57:52<4:15:05, 3.35s/it] {'loss': 0.6803, 'grad_norm': 0.6051579054488705, 'learning_rate': 1.6144558932105473e-05, 'epoch': 0.31} 31%|███ | 2063/6638 [1:57:52<4:15:05, 3.35s/it] 31%|███ | 2064/6638 [1:57:55<4:13:49, 3.33s/it] {'loss': 0.6791, 'grad_norm': 0.6225330472186236, 'learning_rate': 1.6140708304422647e-05, 'epoch': 0.31} 31%|███ | 2064/6638 [1:57:55<4:13:49, 3.33s/it] 31%|███ | 2065/6638 [1:57:58<4:11:55, 3.31s/it] {'loss': 0.6496, 'grad_norm': 0.6149344117190298, 'learning_rate': 1.61368562145079e-05, 'epoch': 0.31} 31%|███ | 2065/6638 [1:57:58<4:11:55, 3.31s/it] 31%|███ | 2066/6638 [1:58:01<4:12:01, 3.31s/it] {'loss': 0.6332, 'grad_norm': 0.6400434014211173, 'learning_rate': 1.613300266327849e-05, 'epoch': 0.31} 31%|███ | 2066/6638 [1:58:01<4:12:01, 3.31s/it] 31%|███ | 2067/6638 [1:58:05<4:09:34, 3.28s/it] {'loss': 0.6464, 'grad_norm': 0.5981530620455798, 'learning_rate': 1.612914765165204e-05, 'epoch': 0.31} 31%|███ | 2067/6638 
[1:58:05<4:09:34, 3.28s/it] 31%|███ | 2068/6638 [1:58:08<4:09:55, 3.28s/it] {'loss': 0.6795, 'grad_norm': 0.6721058145649954, 'learning_rate': 1.6125291180546508e-05, 'epoch': 0.31} 31%|███ | 2068/6638 [1:58:08<4:09:55, 3.28s/it] 31%|███ | 2069/6638 [1:58:11<4:06:55, 3.24s/it] {'loss': 0.6383, 'grad_norm': 0.5700712824498524, 'learning_rate': 1.6121433250880193e-05, 'epoch': 0.31} 31%|███ | 2069/6638 [1:58:11<4:06:55, 3.24s/it] 31%|███ | 2070/6638 [1:58:14<4:07:26, 3.25s/it] {'loss': 0.632, 'grad_norm': 0.5780841913244117, 'learning_rate': 1.6117573863571756e-05, 'epoch': 0.31} 31%|███ | 2070/6638 [1:58:14<4:07:26, 3.25s/it] 31%|███ | 2071/6638 [1:58:18<4:09:01, 3.27s/it] {'loss': 0.6589, 'grad_norm': 0.5720711228113943, 'learning_rate': 1.6113713019540196e-05, 'epoch': 0.31} 31%|███ | 2071/6638 [1:58:18<4:09:01, 3.27s/it] 31%|███ | 2072/6638 [1:58:21<4:07:11, 3.25s/it] {'loss': 0.6736, 'grad_norm': 0.6308421194539277, 'learning_rate': 1.6109850719704864e-05, 'epoch': 0.31} 31%|███ | 2072/6638 [1:58:21<4:07:11, 3.25s/it] 31%|███ | 2073/6638 [1:58:24<4:07:15, 3.25s/it] {'loss': 0.683, 'grad_norm': 0.672292241361755, 'learning_rate': 1.6105986964985453e-05, 'epoch': 0.31} 31%|███ | 2073/6638 [1:58:24<4:07:15, 3.25s/it] 31%|███ | 2074/6638 [1:58:27<4:06:43, 3.24s/it] {'loss': 0.676, 'grad_norm': 0.5714502642540338, 'learning_rate': 1.6102121756302e-05, 'epoch': 0.31} 31%|███ | 2074/6638 [1:58:27<4:06:43, 3.24s/it] 31%|███▏ | 2075/6638 [1:58:31<4:09:33, 3.28s/it] {'loss': 0.7097, 'grad_norm': 0.6247396582094072, 'learning_rate': 1.60982550945749e-05, 'epoch': 0.31} 31%|███▏ | 2075/6638 [1:58:31<4:09:33, 3.28s/it] 31%|███▏ | 2076/6638 [1:58:34<4:08:57, 3.27s/it] {'loss': 0.6998, 'grad_norm': 0.6802755497234451, 'learning_rate': 1.609438698072488e-05, 'epoch': 0.31} 31%|███▏ | 2076/6638 [1:58:34<4:08:57, 3.27s/it] 31%|███▏ | 2077/6638 [1:58:37<4:09:07, 3.28s/it] {'loss': 0.664, 'grad_norm': 0.5709842023224919, 'learning_rate': 1.6090517415673024e-05, 'epoch': 0.31} 
31%|███▏ | 2077/6638 [1:58:37<4:09:07, 3.28s/it] 31%|███▏ | 2078/6638 [1:58:40<4:08:02, 3.26s/it] {'loss': 0.6569, 'grad_norm': 0.5689112088706544, 'learning_rate': 1.6086646400340756e-05, 'epoch': 0.31} 31%|███▏ | 2078/6638 [1:58:40<4:08:02, 3.26s/it] 31%|███▏ | 2079/6638 [1:58:44<4:07:02, 3.25s/it] {'loss': 0.6549, 'grad_norm': 0.576899102916866, 'learning_rate': 1.6082773935649843e-05, 'epoch': 0.31} 31%|███▏ | 2079/6638 [1:58:44<4:07:02, 3.25s/it] 31%|███▏ | 2080/6638 [1:58:47<4:05:06, 3.23s/it] {'loss': 0.6708, 'grad_norm': 0.5889648745281073, 'learning_rate': 1.607890002252241e-05, 'epoch': 0.31} 31%|███▏ | 2080/6638 [1:58:47<4:05:06, 3.23s/it] 31%|███▏ | 2081/6638 [1:58:50<4:05:25, 3.23s/it] {'loss': 0.645, 'grad_norm': 0.5921803442118254, 'learning_rate': 1.6075024661880903e-05, 'epoch': 0.31} 31%|███▏ | 2081/6638 [1:58:50<4:05:25, 3.23s/it] 31%|███▏ | 2082/6638 [1:58:53<4:05:43, 3.24s/it] {'loss': 0.6648, 'grad_norm': 0.5787701053099829, 'learning_rate': 1.6071147854648137e-05, 'epoch': 0.31} 31%|███▏ | 2082/6638 [1:58:53<4:05:43, 3.24s/it] 31%|███▏ | 2083/6638 [1:58:57<4:07:07, 3.26s/it] {'loss': 0.696, 'grad_norm': 0.6427870852369068, 'learning_rate': 1.606726960174726e-05, 'epoch': 0.31} 31%|███▏ | 2083/6638 [1:58:57<4:07:07, 3.26s/it] 31%|███▏ | 2084/6638 [1:59:00<4:06:38, 3.25s/it] {'loss': 0.6804, 'grad_norm': 0.690219246004093, 'learning_rate': 1.6063389904101764e-05, 'epoch': 0.31} 31%|███▏ | 2084/6638 [1:59:00<4:06:38, 3.25s/it] 31%|███▏ | 2085/6638 [1:59:03<4:07:22, 3.26s/it] {'loss': 0.6604, 'grad_norm': 0.6262334726164459, 'learning_rate': 1.6059508762635482e-05, 'epoch': 0.31} 31%|███▏ | 2085/6638 [1:59:03<4:07:22, 3.26s/it] 31%|███▏ | 2086/6638 [1:59:06<4:08:51, 3.28s/it] {'loss': 0.6775, 'grad_norm': 0.6139019327731335, 'learning_rate': 1.6055626178272606e-05, 'epoch': 0.31} 31%|███▏ | 2086/6638 [1:59:06<4:08:51, 3.28s/it] 31%|███▏ | 2087/6638 [1:59:10<4:10:54, 3.31s/it] {'loss': 0.6622, 'grad_norm': 0.6635206878934325, 'learning_rate': 
1.6051742151937655e-05, 'epoch': 0.31} 31%|███▏ | 2087/6638 [1:59:10<4:10:54, 3.31s/it] 31%|███▏ | 2088/6638 [1:59:13<4:10:36, 3.30s/it] {'loss': 0.691, 'grad_norm': 0.6794140028865384, 'learning_rate': 1.6047856684555493e-05, 'epoch': 0.31} 31%|███▏ | 2088/6638 [1:59:13<4:10:36, 3.30s/it] 31%|███▏ | 2089/6638 [1:59:16<4:08:52, 3.28s/it] {'loss': 0.6888, 'grad_norm': 0.5951735471819878, 'learning_rate': 1.6043969777051342e-05, 'epoch': 0.31} 31%|███▏ | 2089/6638 [1:59:16<4:08:52, 3.28s/it] 31%|███▏ | 2090/6638 [1:59:20<4:09:09, 3.29s/it] {'loss': 0.6974, 'grad_norm': 0.6212232324082853, 'learning_rate': 1.604008143035075e-05, 'epoch': 0.31} 31%|███▏ | 2090/6638 [1:59:20<4:09:09, 3.29s/it] 32%|███▏ | 2091/6638 [1:59:23<4:09:47, 3.30s/it] {'loss': 0.67, 'grad_norm': 0.6233069926413143, 'learning_rate': 1.6036191645379613e-05, 'epoch': 0.32} 32%|███▏ | 2091/6638 [1:59:23<4:09:47, 3.30s/it] 32%|███▏ | 2092/6638 [1:59:26<4:09:05, 3.29s/it] {'loss': 0.7081, 'grad_norm': 0.6541115388235261, 'learning_rate': 1.6032300423064174e-05, 'epoch': 0.32} 32%|███▏ | 2092/6638 [1:59:26<4:09:05, 3.29s/it] 32%|███▏ | 2093/6638 [1:59:30<4:10:58, 3.31s/it] {'loss': 0.6984, 'grad_norm': 0.7589672837867865, 'learning_rate': 1.6028407764331015e-05, 'epoch': 0.32} 32%|███▏ | 2093/6638 [1:59:30<4:10:58, 3.31s/it] 32%|███▏ | 2094/6638 [1:59:33<4:13:38, 3.35s/it] {'loss': 0.6553, 'grad_norm': 0.5807973774577591, 'learning_rate': 1.602451367010706e-05, 'epoch': 0.32} 32%|███▏ | 2094/6638 [1:59:33<4:13:38, 3.35s/it] 32%|███▏ | 2095/6638 [1:59:36<4:13:36, 3.35s/it] {'loss': 0.6879, 'grad_norm': 0.587526288478221, 'learning_rate': 1.602061814131957e-05, 'epoch': 0.32} 32%|███▏ | 2095/6638 [1:59:36<4:13:36, 3.35s/it] 32%|███▏ | 2096/6638 [1:59:40<4:12:48, 3.34s/it] {'loss': 0.6567, 'grad_norm': 0.9329526054086108, 'learning_rate': 1.601672117889616e-05, 'epoch': 0.32} 32%|███▏ | 2096/6638 [1:59:40<4:12:48, 3.34s/it] 32%|███▏ | 2097/6638 [1:59:43<4:13:44, 3.35s/it] {'loss': 0.6627, 'grad_norm': 
0.6430696572869131, 'learning_rate': 1.6012822783764774e-05, 'epoch': 0.32} 32%|███▏ | 2097/6638 [1:59:43<4:13:44, 3.35s/it] 32%|███▏ | 2098/6638 [1:59:46<4:12:02, 3.33s/it] {'loss': 0.7029, 'grad_norm': 0.706816002672648, 'learning_rate': 1.60089229568537e-05, 'epoch': 0.32} 32%|███▏ | 2098/6638 [1:59:46<4:12:02, 3.33s/it] 32%|███▏ | 2099/6638 [1:59:50<4:09:31, 3.30s/it] {'loss': 0.6674, 'grad_norm': 0.6826788977670271, 'learning_rate': 1.6005021699091576e-05, 'epoch': 0.32} 32%|███▏ | 2099/6638 [1:59:50<4:09:31, 3.30s/it] [ranks 0-7] AutoResumeHook: Checking whether to suspend... 32%|███▏ | 2100/6638 [1:59:53<4:08:26, 3.28s/it] {'loss': 0.6747, 'grad_norm': 0.6096739384932762, 'learning_rate': 1.6001119011407366e-05, 'epoch': 0.32} 32%|███▏ | 2100/6638 [1:59:53<4:08:26, 3.28s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2100/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2100/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2100/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 32%|███▏ | 2101/6638 [2:00:09<9:09:09, 7.26s/it] {'loss': 0.6557, 'grad_norm': 0.5434128055688305, 'learning_rate': 1.5997214894730387e-05, 'epoch': 0.32} 32%|███▏ | 2101/6638 [2:00:09<9:09:09, 7.26s/it] 32%|███▏ | 2102/6638 [2:00:13<7:39:36, 6.08s/it] {'loss': 0.7242, 'grad_norm': 1.0056041223169552, 'learning_rate': 1.5993309349990293e-05, 'epoch': 0.32} 32%|███▏ | 2102/6638 [2:00:13<7:39:36, 6.08s/it] 32%|███▏ | 2103/6638 [2:00:16<6:36:01, 5.24s/it] {'loss': 0.6618, 'grad_norm': 0.5659556918794301, 'learning_rate': 1.5989402378117068e-05, 'epoch': 0.32} 32%|███▏ | 2103/6638 [2:00:16<6:36:01, 5.24s/it] 32%|███▏ | 2104/6638 [2:00:19<5:51:04, 4.65s/it] {'loss': 0.6985, 'grad_norm': 0.6436582961581966, 'learning_rate': 1.598549398004105e-05, 'epoch': 0.32} 32%|███▏ | 2104/6638 [2:00:19<5:51:04, 4.65s/it] 32%|███▏ | 2105/6638 [2:00:23<5:21:00, 4.25s/it] {'loss': 0.6666, 'grad_norm': 0.6626810292013945, 'learning_rate': 1.5981584156692912e-05, 'epoch': 0.32} 32%|███▏ | 2105/6638 [2:00:23<5:21:00, 4.25s/it] 32%|███▏ | 2106/6638 
[2:00:26<5:04:59, 4.04s/it] {'loss': 0.6487, 'grad_norm': 0.5842095721328012, 'learning_rate': 1.5977672909003664e-05, 'epoch': 0.32} 32%|███▏ | 2106/6638 [2:00:26<5:04:59, 4.04s/it] 32%|███▏ | 2107/6638 [2:00:29<4:49:54, 3.84s/it] {'loss': 0.6624, 'grad_norm': 0.6820043577244419, 'learning_rate': 1.5973760237904653e-05, 'epoch': 0.32} 32%|███▏ | 2107/6638 [2:00:29<4:49:54, 3.84s/it] 32%|███▏ | 2108/6638 [2:00:33<4:39:50, 3.71s/it] {'loss': 0.6643, 'grad_norm': 0.6139830596661715, 'learning_rate': 1.5969846144327574e-05, 'epoch': 0.32} 32%|███▏ | 2108/6638 [2:00:33<4:39:50, 3.71s/it] 32%|███▏ | 2109/6638 [2:00:36<4:29:40, 3.57s/it] {'loss': 0.6571, 'grad_norm': 0.6315719064622022, 'learning_rate': 1.596593062920445e-05, 'epoch': 0.32} 32%|███▏ | 2109/6638 [2:00:36<4:29:40, 3.57s/it] 32%|███▏ | 2110/6638 [2:00:40<4:26:56, 3.54s/it] {'loss': 0.6844, 'grad_norm': 0.8317171038116228, 'learning_rate': 1.5962013693467652e-05, 'epoch': 0.32} 32%|███▏ | 2110/6638 [2:00:40<4:26:56, 3.54s/it] 32%|███▏ | 2111/6638 [2:00:43<4:23:50, 3.50s/it] {'loss': 0.6909, 'grad_norm': 0.7054288086332844, 'learning_rate': 1.5958095338049882e-05, 'epoch': 0.32} 32%|███▏ | 2111/6638 [2:00:43<4:23:50, 3.50s/it] 32%|███▏ | 2112/6638 [2:00:46<4:17:42, 3.42s/it] {'loss': 0.6668, 'grad_norm': 0.6369913726720163, 'learning_rate': 1.5954175563884187e-05, 'epoch': 0.32} 32%|███▏ | 2112/6638 [2:00:46<4:17:42, 3.42s/it] 32%|███▏ | 2113/6638 [2:00:50<4:14:32, 3.38s/it] {'loss': 0.6398, 'grad_norm': 0.7216595071816856, 'learning_rate': 1.595025437190394e-05, 'epoch': 0.32} 32%|███▏ | 2113/6638 [2:00:50<4:14:32, 3.38s/it] 32%|███▏ | 2114/6638 [2:00:53<4:13:11, 3.36s/it] {'loss': 0.6424, 'grad_norm': 0.6632851551060056, 'learning_rate': 1.594633176304287e-05, 'epoch': 0.32} 32%|███▏ | 2114/6638 [2:00:53<4:13:11, 3.36s/it] 32%|███▏ | 2115/6638 [2:00:56<4:10:20, 3.32s/it] {'loss': 0.6543, 'grad_norm': 0.5837640290112748, 'learning_rate': 1.594240773823502e-05, 'epoch': 0.32} 32%|███▏ | 2115/6638 
[2:00:56<4:10:20, 3.32s/it] 32%|███▏ | 2116/6638 [2:00:59<4:09:42, 3.31s/it] {'loss': 0.7019, 'grad_norm': 0.664427674774738, 'learning_rate': 1.5938482298414794e-05, 'epoch': 0.32} 32%|███▏ | 2116/6638 [2:00:59<4:09:42, 3.31s/it] 32%|███▏ | 2117/6638 [2:01:03<4:11:35, 3.34s/it] {'loss': 0.6417, 'grad_norm': 0.6595686522710361, 'learning_rate': 1.5934555444516916e-05, 'epoch': 0.32} 32%|███▏ | 2117/6638 [2:01:03<4:11:35, 3.34s/it] 32%|███▏ | 2118/6638 [2:01:06<4:09:44, 3.32s/it] {'loss': 0.6709, 'grad_norm': 0.7712645385392205, 'learning_rate': 1.5930627177476452e-05, 'epoch': 0.32} 32%|███▏ | 2118/6638 [2:01:06<4:09:44, 3.32s/it] 32%|███▏ | 2119/6638 [2:01:09<4:08:58, 3.31s/it] {'loss': 0.675, 'grad_norm': 0.6167504871080524, 'learning_rate': 1.5926697498228808e-05, 'epoch': 0.32} 32%|███▏ | 2119/6638 [2:01:09<4:08:58, 3.31s/it] 32%|███▏ | 2120/6638 [2:01:13<4:07:39, 3.29s/it] {'loss': 0.6495, 'grad_norm': 0.8511541975949969, 'learning_rate': 1.5922766407709724e-05, 'epoch': 0.32} 32%|███▏ | 2120/6638 [2:01:13<4:07:39, 3.29s/it] 32%|███▏ | 2121/6638 [2:01:16<4:07:15, 3.28s/it] {'loss': 0.6467, 'grad_norm': 0.6051628785548695, 'learning_rate': 1.5918833906855274e-05, 'epoch': 0.32} 32%|███▏ | 2121/6638 [2:01:16<4:07:15, 3.28s/it] 32%|███▏ | 2122/6638 [2:01:19<4:05:14, 3.26s/it] {'loss': 0.6109, 'grad_norm': 0.7329593365476854, 'learning_rate': 1.5914899996601865e-05, 'epoch': 0.32} 32%|███▏ | 2122/6638 [2:01:19<4:05:14, 3.26s/it] 32%|███▏ | 2123/6638 [2:01:22<4:04:21, 3.25s/it] {'loss': 0.6783, 'grad_norm': 0.6443277157196383, 'learning_rate': 1.591096467788625e-05, 'epoch': 0.32} 32%|███▏ | 2123/6638 [2:01:22<4:04:21, 3.25s/it] 32%|███▏ | 2124/6638 [2:01:26<4:06:29, 3.28s/it] {'loss': 0.6383, 'grad_norm': 0.6056794946995239, 'learning_rate': 1.590702795164551e-05, 'epoch': 0.32} 32%|███▏ | 2124/6638 [2:01:26<4:06:29, 3.28s/it] 32%|███▏ | 2125/6638 [2:01:29<4:07:39, 3.29s/it] {'loss': 0.6816, 'grad_norm': 0.6606272883614269, 'learning_rate': 1.5903089818817067e-05, 
'epoch': 0.32} 32%|███▏ | 2125/6638 [2:01:29<4:07:39, 3.29s/it] 32%|███▏ | 2126/6638 [2:01:32<4:08:40, 3.31s/it] {'loss': 0.7183, 'grad_norm': 0.7471248644271963, 'learning_rate': 1.589915028033866e-05, 'epoch': 0.32} 32%|███▏ | 2126/6638 [2:01:32<4:08:40, 3.31s/it] 32%|███▏ | 2127/6638 [2:01:36<4:09:38, 3.32s/it] {'loss': 0.6526, 'grad_norm': 0.612037851519659, 'learning_rate': 1.5895209337148388e-05, 'epoch': 0.32} 32%|███▏ | 2127/6638 [2:01:36<4:09:38, 3.32s/it] 32%|███▏ | 2128/6638 [2:01:39<4:09:20, 3.32s/it] {'loss': 0.6385, 'grad_norm': 0.6346726272199661, 'learning_rate': 1.5891266990184665e-05, 'epoch': 0.32} 32%|███▏ | 2128/6638 [2:01:39<4:09:20, 3.32s/it] 32%|███▏ | 2129/6638 [2:01:42<4:07:34, 3.29s/it] {'loss': 0.6383, 'grad_norm': 0.5782998102228555, 'learning_rate': 1.5887323240386252e-05, 'epoch': 0.32} 32%|███▏ | 2129/6638 [2:01:42<4:07:34, 3.29s/it] 32%|███▏ | 2130/6638 [2:01:45<4:08:29, 3.31s/it] {'loss': 0.6205, 'grad_norm': 0.549969060178427, 'learning_rate': 1.5883378088692238e-05, 'epoch': 0.32} 32%|███▏ | 2130/6638 [2:01:45<4:08:29, 3.31s/it] 32%|███▏ | 2131/6638 [2:01:49<4:06:32, 3.28s/it] {'loss': 0.6456, 'grad_norm': 0.6318110960918721, 'learning_rate': 1.5879431536042047e-05, 'epoch': 0.32} 32%|███▏ | 2131/6638 [2:01:49<4:06:32, 3.28s/it] 32%|███▏ | 2132/6638 [2:01:52<4:09:09, 3.32s/it] {'loss': 0.6547, 'grad_norm': 0.6383833584192691, 'learning_rate': 1.5875483583375434e-05, 'epoch': 0.32} 32%|███▏ | 2132/6638 [2:01:52<4:09:09, 3.32s/it] 32%|███▏ | 2133/6638 [2:01:55<4:08:53, 3.31s/it] {'loss': 0.6944, 'grad_norm': 0.6838474972204014, 'learning_rate': 1.5871534231632488e-05, 'epoch': 0.32} 32%|███▏ | 2133/6638 [2:01:55<4:08:53, 3.31s/it] 32%|███▏ | 2134/6638 [2:01:59<4:08:44, 3.31s/it] {'loss': 0.6602, 'grad_norm': 0.6379584799646452, 'learning_rate': 1.5867583481753638e-05, 'epoch': 0.32} 32%|███▏ | 2134/6638 [2:01:59<4:08:44, 3.31s/it] 32%|███▏ | 2135/6638 [2:02:02<4:06:05, 3.28s/it] {'loss': 0.6514, 'grad_norm': 0.6504619994110028, 
'learning_rate': 1.5863631334679638e-05, 'epoch': 0.32} 32%|███▏ | 2135/6638 [2:02:02<4:06:05, 3.28s/it] 32%|███▏ | 2136/6638 [2:02:05<4:10:10, 3.33s/it] {'loss': 0.6728, 'grad_norm': 0.5273360192439609, 'learning_rate': 1.5859677791351577e-05, 'epoch': 0.32} 32%|███▏ | 2136/6638 [2:02:05<4:10:10, 3.33s/it] 32%|███▏ | 2137/6638 [2:02:10<4:42:45, 3.77s/it] {'loss': 0.661, 'grad_norm': 0.5853017235403376, 'learning_rate': 1.5855722852710878e-05, 'epoch': 0.32} 32%|███▏ | 2137/6638 [2:02:10<4:42:45, 3.77s/it] 32%|███▏ | 2138/6638 [2:02:13<4:29:56, 3.60s/it] {'loss': 0.6926, 'grad_norm': 0.5968570501126174, 'learning_rate': 1.5851766519699294e-05, 'epoch': 0.32} 32%|███▏ | 2138/6638 [2:02:13<4:29:56, 3.60s/it] 32%|███▏ | 2139/6638 [2:02:17<4:23:22, 3.51s/it] {'loss': 0.6437, 'grad_norm': 0.5803862783637905, 'learning_rate': 1.584780879325891e-05, 'epoch': 0.32} 32%|███▏ | 2139/6638 [2:02:17<4:23:22, 3.51s/it] 32%|███▏ | 2140/6638 [2:02:20<4:17:05, 3.43s/it] {'loss': 0.6606, 'grad_norm': 0.6118293271099865, 'learning_rate': 1.584384967433215e-05, 'epoch': 0.32} 32%|███▏ | 2140/6638 [2:02:20<4:17:05, 3.43s/it] 32%|███▏ | 2141/6638 [2:02:23<4:15:38, 3.41s/it] {'loss': 0.6524, 'grad_norm': 0.5666928499555595, 'learning_rate': 1.5839889163861756e-05, 'epoch': 0.32} 32%|███▏ | 2141/6638 [2:02:23<4:15:38, 3.41s/it] 32%|███▏ | 2142/6638 [2:02:29<5:01:35, 4.02s/it] {'loss': 0.6801, 'grad_norm': 0.6957329264570076, 'learning_rate': 1.5835927262790812e-05, 'epoch': 0.32} 32%|███▏ | 2142/6638 [2:02:29<5:01:35, 4.02s/it] 32%|███▏ | 2143/6638 [2:02:32<4:44:18, 3.80s/it] {'loss': 0.7071, 'grad_norm': 0.6536737219511788, 'learning_rate': 1.5831963972062734e-05, 'epoch': 0.32} 32%|███▏ | 2143/6638 [2:02:32<4:44:18, 3.80s/it] 32%|███▏ | 2144/6638 [2:02:35<4:33:00, 3.64s/it] {'loss': 0.6774, 'grad_norm': 0.667801810881733, 'learning_rate': 1.5827999292621263e-05, 'epoch': 0.32} 32%|███▏ | 2144/6638 [2:02:35<4:33:00, 3.64s/it] 32%|███▏ | 2145/6638 [2:02:40<4:59:27, 4.00s/it] {'loss': 
0.6484, 'grad_norm': 0.6129040290348441, 'learning_rate': 1.5824033225410463e-05, 'epoch': 0.32} 32%|███▏ | 2145/6638 [2:02:40<4:59:27, 4.00s/it] 32%|███▏ | 2146/6638 [2:02:44<4:47:42, 3.84s/it] {'loss': 0.6712, 'grad_norm': 0.5721070882493647, 'learning_rate': 1.582006577137475e-05, 'epoch': 0.32} 32%|███▏ | 2146/6638 [2:02:44<4:47:42, 3.84s/it] 32%|███▏ | 2147/6638 [2:02:47<4:31:46, 3.63s/it] {'loss': 0.6668, 'grad_norm': 0.691795665165791, 'learning_rate': 1.5816096931458854e-05, 'epoch': 0.32} 32%|███▏ | 2147/6638 [2:02:47<4:31:46, 3.63s/it] 32%|███▏ | 2148/6638 [2:02:50<4:22:22, 3.51s/it] {'loss': 0.6431, 'grad_norm': 0.6030718248327733, 'learning_rate': 1.5812126706607846e-05, 'epoch': 0.32} 32%|███▏ | 2148/6638 [2:02:50<4:22:22, 3.51s/it] 32%|███▏ | 2149/6638 [2:02:53<4:17:52, 3.45s/it] {'loss': 0.6235, 'grad_norm': 0.6336052556079328, 'learning_rate': 1.5808155097767107e-05, 'epoch': 0.32} 32%|███▏ | 2149/6638 [2:02:53<4:17:52, 3.45s/it] [ranks 0-7] AutoResumeHook: Checking whether to suspend... 
32%|███▏ | 2150/6638 [2:02:57<4:14:46, 3.41s/it] {'loss': 0.6713, 'grad_norm': 0.6990554280479107, 'learning_rate': 1.580418210588237e-05, 'epoch': 0.32} 32%|███▏ | 2150/6638 [2:02:57<4:14:46, 3.41s/it] 32%|███▏ | 2151/6638 [2:03:00<4:09:49, 3.34s/it] {'loss': 0.666, 'grad_norm': 0.5833404478827101, 'learning_rate': 1.5800207731899683e-05, 'epoch': 0.32} 32%|███▏ | 2151/6638 [2:03:00<4:09:49, 3.34s/it] 32%|███▏ | 2152/6638 [2:03:03<4:08:34, 3.32s/it] {'loss': 0.674, 'grad_norm': 0.635469986641034, 'learning_rate': 1.579623197676543e-05, 'epoch': 0.32} 32%|███▏ | 2152/6638 [2:03:03<4:08:34, 3.32s/it] 32%|███▏ | 2153/6638 [2:03:06<4:06:52, 3.30s/it] {'loss': 0.6553, 'grad_norm': 0.6304266927080252, 'learning_rate': 1.579225484142633e-05, 'epoch': 0.32} 32%|███▏ | 2153/6638 [2:03:06<4:06:52, 3.30s/it] 32%|███▏ | 2154/6638 [2:03:12<4:59:33, 4.01s/it] {'loss': 0.6917, 'grad_norm': 0.6518681291916703, 'learning_rate': 1.5788276326829408e-05, 'epoch': 0.32} 32%|███▏ | 2154/6638 [2:03:12<4:59:33, 4.01s/it] 32%|███▏ | 2155/6638 [2:03:15<4:43:10, 3.79s/it] {'loss': 0.7235, 'grad_norm': 0.6378730899877907, 'learning_rate': 1.578429643392204e-05, 'epoch': 0.32} 32%|███▏ | 2155/6638 [2:03:15<4:43:10, 3.79s/it] 32%|███▏ | 2156/6638 [2:03:18<4:29:37, 3.61s/it] {'loss': 0.6323, 'grad_norm': 0.5848110748651554, 'learning_rate': 1.5780315163651922e-05, 'epoch': 0.32} 32%|███▏ | 2156/6638 [2:03:18<4:29:37, 3.61s/it] 32%|███▏ | 2157/6638 [2:03:22<4:22:32, 3.52s/it] {'loss': 0.6075, 'grad_norm': 0.5368811828871636, 'learning_rate': 1.577633251696708e-05, 'epoch': 0.32} 32%|███▏ | 2157/6638 [2:03:22<4:22:32, 3.52s/it] 33%|███▎ | 2158/6638 [2:03:27<5:09:47, 4.15s/it] {'loss': 0.6978, 'grad_norm': 0.6311798239075903, 'learning_rate': 1.5772348494815864e-05, 'epoch': 0.33} 33%|███▎ | 2158/6638 [2:03:27<5:09:47, 4.15s/it] 33%|███▎ | 2159/6638 [2:03:31<4:50:44, 3.89s/it] {'loss': 0.6926, 'grad_norm': 0.6398466755067335, 'learning_rate': 1.576836309814695e-05, 'epoch': 0.33} 33%|███▎ | 
 33%|███▎      | 2159/6638 [2:03:31<4:50:44, 3.89s/it]
 33%|███▎      | 2160/6638 [2:03:34<4:35:31, 3.69s/it] {'loss': 0.6414, 'grad_norm': 0.5505383195973236, 'learning_rate': 1.5764376327909353e-05, 'epoch': 0.33}
 … (steps 2161–2198: loss 0.61–0.71, grad_norm 0.55–0.78, learning_rate decaying 1.576e-05 → 1.561e-05, ~3.2–3.9 s/it, epoch 0.33) …
 33%|███▎      | 2199/6638 [2:05:47<4:02:42, 3.28s/it] {'loss': 0.6141, 'grad_norm': 0.5756083539413442, 'learning_rate': 1.560783105789642e-05, 'epoch': 0.33}
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 33%|███▎      | 2200/6638 [2:05:50<4:06:57, 3.34s/it] {'loss': 0.6946, 'grad_norm': 0.7282168663700926, 'learning_rate': 1.560379012682639e-05, 'epoch': 0.33}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2200/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
 33%|███▎      | 2201/6638 [2:06:07<8:59:41, 7.30s/it] {'loss': 0.6639, 'grad_norm': 0.6883372637845102, 'learning_rate': 1.5599747861375957e-05, 'epoch': 0.33}  (step-time spike from checkpoint save)
 … (steps 2202–2249: loss 0.62–0.74, learning_rate 1.560e-05 → 1.540e-05, step time settling back to ~3.3 s/it; epoch ticks 0.33 → 0.34 at step 2224) …
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 34%|███▍      | 2250/6638 [2:08:48<3:58:44, 3.26s/it] {'loss': 0.6688, 'grad_norm': 0.6916247843301067, 'learning_rate': 1.540006236219594e-05, 'epoch': 0.34}
 … (steps 2251–2299: loss 0.62–0.74, learning_rate → 1.520e-05, ~3.3 s/it; epoch ticks 0.34 → 0.35 at step 2291) …
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 35%|███▍      | 2300/6638 [2:11:33<3:57:25, 3.28s/it] {'loss': 0.6644, 'grad_norm': 0.6178987085917577, 'learning_rate': 1.5193120085650996e-05, 'epoch': 0.35}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2300/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2300/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2300/mm_projector
(same torch UserWarnings as at checkpoint 2200)
 35%|███▍      | 2301/6638 [2:11:50<8:58:22, 7.45s/it] {'loss': 0.6842, 'grad_norm': 0.6393303461297282, 'learning_rate': 1.5188949295002431e-05, 'epoch': 0.35}  (step-time spike from checkpoint save)
 … (steps 2302–2338: loss 0.60–0.75, learning_rate 1.518e-05 → 1.503e-05, ~3.3 s/it, epoch 0.35) …
 35%|███▌      | 2339/6638 [2:13:56<3:56:58, 3.31s/it] {'loss': 0.6768, 'grad_norm': 0.7007101223073923, 'learning_rate': 1.5029552775350746e-05,
'epoch': 0.35} 35%|███▌ | 2339/6638 [2:13:56<3:56:58, 3.31s/it] 35%|███▌ | 2340/6638 [2:13:59<3:56:45, 3.31s/it] {'loss': 0.6759, 'grad_norm': 0.6077990912426472, 'learning_rate': 1.5025334535189368e-05, 'epoch': 0.35} 35%|███▌ | 2340/6638 [2:13:59<3:56:45, 3.31s/it] 35%|███▌ | 2341/6638 [2:14:02<3:58:07, 3.33s/it] {'loss': 0.6365, 'grad_norm': 0.5929859278332409, 'learning_rate': 1.5021115098390046e-05, 'epoch': 0.35} 35%|███▌ | 2341/6638 [2:14:02<3:58:07, 3.33s/it] 35%|███▌ | 2342/6638 [2:14:06<3:57:30, 3.32s/it] {'loss': 0.6695, 'grad_norm': 0.6100181704519515, 'learning_rate': 1.5016894465957514e-05, 'epoch': 0.35} 35%|███▌ | 2342/6638 [2:14:06<3:57:30, 3.32s/it] 35%|███▌ | 2343/6638 [2:14:09<3:55:21, 3.29s/it] {'loss': 0.6644, 'grad_norm': 0.7015371417481819, 'learning_rate': 1.5012672638896797e-05, 'epoch': 0.35} 35%|███▌ | 2343/6638 [2:14:09<3:55:21, 3.29s/it] 35%|███▌ | 2344/6638 [2:14:12<3:53:16, 3.26s/it] {'loss': 0.6631, 'grad_norm': 0.5736179031423525, 'learning_rate': 1.5008449618213196e-05, 'epoch': 0.35} 35%|███▌ | 2344/6638 [2:14:12<3:53:16, 3.26s/it] 35%|███▌ | 2345/6638 [2:14:15<3:52:36, 3.25s/it] {'loss': 0.6866, 'grad_norm': 0.6942837515137859, 'learning_rate': 1.500422540491231e-05, 'epoch': 0.35} 35%|███▌ | 2345/6638 [2:14:15<3:52:36, 3.25s/it] 35%|███▌ | 2346/6638 [2:14:19<3:56:54, 3.31s/it] {'loss': 0.6963, 'grad_norm': 0.6147237691034974, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.35} 35%|███▌ | 2346/6638 [2:14:19<3:56:54, 3.31s/it] 35%|███▌ | 2347/6638 [2:14:22<3:55:31, 3.29s/it] {'loss': 0.6902, 'grad_norm': 0.6600630181101814, 'learning_rate': 1.4995773404482436e-05, 'epoch': 0.35} 35%|███▌ | 2347/6638 [2:14:22<3:55:31, 3.29s/it] 35%|███▌ | 2348/6638 [2:14:25<3:54:46, 3.28s/it] {'loss': 0.6317, 'grad_norm': 0.5317105216610785, 'learning_rate': 1.4991545619366055e-05, 'epoch': 0.35} 35%|███▌ | 2348/6638 [2:14:25<3:54:46, 3.28s/it] 35%|███▌ | 2349/6638 [2:14:29<3:55:29, 3.29s/it] {'loss': 0.7027, 'grad_norm': 0.7280609161116578, 
'learning_rate': 1.4987316645657581e-05, 'epoch': 0.35} 35%|███▌ | 2349/6638 [2:14:29<3:55:29, 3.29s/it]
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
35%|███▌ | 2350/6638 [2:14:32<3:56:57, 3.32s/it] {'loss': 0.638, 'grad_norm': 0.6128541473169643, 'learning_rate': 1.4983086484364023e-05, 'epoch': 0.35} 35%|███▌ | 2350/6638 [2:14:32<3:56:57, 3.32s/it] 35%|███▌ | 2351/6638 [2:14:35<3:58:35, 3.34s/it] {'loss': 0.6761, 'grad_norm': 0.6268947818253358, 'learning_rate': 1.4978855136492666e-05, 'epoch': 0.35} 35%|███▌ | 2351/6638 [2:14:35<3:58:35, 3.34s/it] 35%|███▌ | 2352/6638 [2:14:39<4:02:33, 3.40s/it] {'loss': 0.7179, 'grad_norm': 0.6132618739742185, 'learning_rate': 1.4974622603051093e-05, 'epoch': 0.35} 35%|███▌ | 2352/6638 [2:14:39<4:02:33, 3.40s/it] 35%|███▌ | 2353/6638 [2:14:42<4:00:02, 3.36s/it] {'loss': 0.668, 'grad_norm': 0.6064607716990325, 'learning_rate': 1.497038888504715e-05, 'epoch': 0.35} 35%|███▌ | 2353/6638 [2:14:42<4:00:02, 3.36s/it] 35%|███▌ | 2354/6638 [2:14:45<3:56:48, 3.32s/it] {'loss': 0.6863, 'grad_norm': 0.8158963243878032, 'learning_rate': 1.4966153983488978e-05, 'epoch': 0.35} 35%|███▌ | 2354/6638 [2:14:45<3:56:48, 3.32s/it] 35%|███▌ | 2355/6638 [2:14:49<3:58:43, 3.34s/it] {'loss': 0.7133, 'grad_norm': 0.783915678120556, 'learning_rate': 1.4961917899384999e-05, 'epoch': 0.35} 35%|███▌ | 2355/6638 [2:14:49<3:58:43, 3.34s/it] 35%|███▌ | 2356/6638 [2:14:52<3:58:23, 3.34s/it] {'loss': 0.6895, 'grad_norm': 0.6804238823446701, 'learning_rate': 1.495768063374391e-05, 'epoch': 0.35} 35%|███▌ | 2356/6638 [2:14:52<3:58:23, 3.34s/it] 36%|███▌ | 2357/6638 [2:14:56<3:58:31, 3.34s/it] {'loss': 0.6442, 'grad_norm':
0.6024962294425096, 'learning_rate': 1.4953442187574694e-05, 'epoch': 0.36} 36%|███▌ | 2357/6638 [2:14:56<3:58:31, 3.34s/it] 36%|███▌ | 2358/6638 [2:14:59<3:58:00, 3.34s/it] {'loss': 0.6385, 'grad_norm': 0.546825163996202, 'learning_rate': 1.4949202561886615e-05, 'epoch': 0.36} 36%|███▌ | 2358/6638 [2:14:59<3:58:00, 3.34s/it] 36%|███▌ | 2359/6638 [2:15:02<3:56:25, 3.32s/it] {'loss': 0.6221, 'grad_norm': 0.544504834775375, 'learning_rate': 1.4944961757689216e-05, 'epoch': 0.36} 36%|███▌ | 2359/6638 [2:15:02<3:56:25, 3.32s/it] 36%|███▌ | 2360/6638 [2:15:05<3:53:53, 3.28s/it] {'loss': 0.61, 'grad_norm': 0.5897508939950297, 'learning_rate': 1.4940719775992326e-05, 'epoch': 0.36} 36%|███▌ | 2360/6638 [2:15:05<3:53:53, 3.28s/it] 36%|███▌ | 2361/6638 [2:15:09<3:54:04, 3.28s/it] {'loss': 0.6374, 'grad_norm': 0.5528348654903592, 'learning_rate': 1.4936476617806041e-05, 'epoch': 0.36} 36%|███▌ | 2361/6638 [2:15:09<3:54:04, 3.28s/it] 36%|███▌ | 2362/6638 [2:15:12<3:55:02, 3.30s/it] {'loss': 0.6942, 'grad_norm': 0.6442104857684823, 'learning_rate': 1.4932232284140755e-05, 'epoch': 0.36} 36%|███▌ | 2362/6638 [2:15:12<3:55:02, 3.30s/it] 36%|███▌ | 2363/6638 [2:15:15<3:53:30, 3.28s/it] {'loss': 0.6618, 'grad_norm': 0.6398585316508066, 'learning_rate': 1.4927986776007129e-05, 'epoch': 0.36} 36%|███▌ | 2363/6638 [2:15:15<3:53:30, 3.28s/it] 36%|███▌ | 2364/6638 [2:15:18<3:51:34, 3.25s/it] {'loss': 0.6425, 'grad_norm': 0.5708409968941804, 'learning_rate': 1.492374009441611e-05, 'epoch': 0.36} 36%|███▌ | 2364/6638 [2:15:18<3:51:34, 3.25s/it] 36%|███▌ | 2365/6638 [2:15:22<3:50:31, 3.24s/it] {'loss': 0.6514, 'grad_norm': 0.612168358396315, 'learning_rate': 1.4919492240378923e-05, 'epoch': 0.36} 36%|███▌ | 2365/6638 [2:15:22<3:50:31, 3.24s/it] 36%|███▌ | 2366/6638 [2:15:25<3:47:35, 3.20s/it] {'loss': 0.6655, 'grad_norm': 0.7146204590473682, 'learning_rate': 1.4915243214907067e-05, 'epoch': 0.36} 36%|███▌ | 2366/6638 [2:15:25<3:47:35, 3.20s/it] 36%|███▌ | 2367/6638 [2:15:28<3:50:07, 
3.23s/it] {'loss': 0.7318, 'grad_norm': 0.7471847237376508, 'learning_rate': 1.4910993019012328e-05, 'epoch': 0.36} 36%|███▌ | 2367/6638 [2:15:28<3:50:07, 3.23s/it] 36%|███▌ | 2368/6638 [2:15:31<3:52:44, 3.27s/it] {'loss': 0.6587, 'grad_norm': 0.5573851231492808, 'learning_rate': 1.4906741653706767e-05, 'epoch': 0.36} 36%|███▌ | 2368/6638 [2:15:31<3:52:44, 3.27s/it] 36%|███▌ | 2369/6638 [2:15:35<3:52:15, 3.26s/it] {'loss': 0.6727, 'grad_norm': 0.6613193679294401, 'learning_rate': 1.4902489120002722e-05, 'epoch': 0.36} 36%|███▌ | 2369/6638 [2:15:35<3:52:15, 3.26s/it] 36%|███▌ | 2370/6638 [2:15:38<3:55:11, 3.31s/it] {'loss': 0.6305, 'grad_norm': 0.5918425277229854, 'learning_rate': 1.4898235418912812e-05, 'epoch': 0.36} 36%|███▌ | 2370/6638 [2:15:38<3:55:11, 3.31s/it] 36%|███▌ | 2371/6638 [2:15:41<3:52:25, 3.27s/it] {'loss': 0.6053, 'grad_norm': 0.5551768413596739, 'learning_rate': 1.4893980551449932e-05, 'epoch': 0.36} 36%|███▌ | 2371/6638 [2:15:41<3:52:25, 3.27s/it] 36%|███▌ | 2372/6638 [2:15:44<3:52:20, 3.27s/it] {'loss': 0.6539, 'grad_norm': 0.6285350261956619, 'learning_rate': 1.4889724518627255e-05, 'epoch': 0.36} 36%|███▌ | 2372/6638 [2:15:44<3:52:20, 3.27s/it] 36%|███▌ | 2373/6638 [2:15:48<3:50:23, 3.24s/it] {'loss': 0.6388, 'grad_norm': 0.6413606026457522, 'learning_rate': 1.4885467321458234e-05, 'epoch': 0.36} 36%|███▌ | 2373/6638 [2:15:48<3:50:23, 3.24s/it] 36%|███▌ | 2374/6638 [2:15:51<3:49:06, 3.22s/it] {'loss': 0.6906, 'grad_norm': 0.6003344283822484, 'learning_rate': 1.4881208960956596e-05, 'epoch': 0.36} 36%|███▌ | 2374/6638 [2:15:51<3:49:06, 3.22s/it] 36%|███▌ | 2375/6638 [2:15:54<3:53:44, 3.29s/it] {'loss': 0.6505, 'grad_norm': 0.6151930583371212, 'learning_rate': 1.4876949438136348e-05, 'epoch': 0.36} 36%|███▌ | 2375/6638 [2:15:54<3:53:44, 3.29s/it] 36%|███▌ | 2376/6638 [2:15:58<3:53:47, 3.29s/it] {'loss': 0.6854, 'grad_norm': 0.6172083428522603, 'learning_rate': 1.4872688754011768e-05, 'epoch': 0.36} 36%|███▌ | 2376/6638 [2:15:58<3:53:47, 
3.29s/it] 36%|███▌ | 2377/6638 [2:16:01<3:52:23, 3.27s/it] {'loss': 0.6747, 'grad_norm': 0.6593079221062361, 'learning_rate': 1.4868426909597417e-05, 'epoch': 0.36} 36%|███▌ | 2377/6638 [2:16:01<3:52:23, 3.27s/it] 36%|███▌ | 2378/6638 [2:16:04<3:52:44, 3.28s/it] {'loss': 0.7491, 'grad_norm': 0.6679391093479401, 'learning_rate': 1.4864163905908133e-05, 'epoch': 0.36} 36%|███▌ | 2378/6638 [2:16:04<3:52:44, 3.28s/it] 36%|███▌ | 2379/6638 [2:16:07<3:52:10, 3.27s/it] {'loss': 0.6525, 'grad_norm': 0.6026971470058301, 'learning_rate': 1.4859899743959023e-05, 'epoch': 0.36} 36%|███▌ | 2379/6638 [2:16:07<3:52:10, 3.27s/it] 36%|███▌ | 2380/6638 [2:16:11<3:52:55, 3.28s/it] {'loss': 0.6507, 'grad_norm': 0.5862075133116889, 'learning_rate': 1.4855634424765473e-05, 'epoch': 0.36} 36%|███▌ | 2380/6638 [2:16:11<3:52:55, 3.28s/it] 36%|███▌ | 2381/6638 [2:16:14<3:55:56, 3.33s/it] {'loss': 0.713, 'grad_norm': 0.6506253905135553, 'learning_rate': 1.485136794934315e-05, 'epoch': 0.36} 36%|███▌ | 2381/6638 [2:16:14<3:55:56, 3.33s/it] 36%|███▌ | 2382/6638 [2:16:17<3:55:09, 3.32s/it] {'loss': 0.6596, 'grad_norm': 0.582995570124845, 'learning_rate': 1.4847100318707983e-05, 'epoch': 0.36} 36%|███▌ | 2382/6638 [2:16:17<3:55:09, 3.32s/it] 36%|███▌ | 2383/6638 [2:16:21<3:53:05, 3.29s/it] {'loss': 0.6378, 'grad_norm': 0.5702034936045776, 'learning_rate': 1.4842831533876196e-05, 'epoch': 0.36} 36%|███▌ | 2383/6638 [2:16:21<3:53:05, 3.29s/it] 36%|███▌ | 2384/6638 [2:16:24<3:54:19, 3.30s/it] {'loss': 0.6327, 'grad_norm': 0.5909691276398431, 'learning_rate': 1.4838561595864269e-05, 'epoch': 0.36} 36%|███▌ | 2384/6638 [2:16:24<3:54:19, 3.30s/it] 36%|███▌ | 2385/6638 [2:16:27<3:53:14, 3.29s/it] {'loss': 0.6731, 'grad_norm': 0.5716024720736343, 'learning_rate': 1.4834290505688967e-05, 'epoch': 0.36} 36%|███▌ | 2385/6638 [2:16:27<3:53:14, 3.29s/it] 36%|███▌ | 2386/6638 [2:16:30<3:51:30, 3.27s/it] {'loss': 0.5977, 'grad_norm': 0.6082747550059899, 'learning_rate': 1.4830018264367322e-05, 'epoch': 0.36} 
36%|███▌ | 2386/6638 [2:16:30<3:51:30, 3.27s/it] 36%|███▌ | 2387/6638 [2:16:34<3:52:55, 3.29s/it] {'loss': 0.6358, 'grad_norm': 0.5946619790002314, 'learning_rate': 1.482574487291665e-05, 'epoch': 0.36} 36%|███▌ | 2387/6638 [2:16:34<3:52:55, 3.29s/it] 36%|███▌ | 2388/6638 [2:16:37<3:54:29, 3.31s/it] {'loss': 0.6661, 'grad_norm': 0.6012037985493774, 'learning_rate': 1.4821470332354536e-05, 'epoch': 0.36} 36%|███▌ | 2388/6638 [2:16:37<3:54:29, 3.31s/it] 36%|███▌ | 2389/6638 [2:16:40<3:53:51, 3.30s/it] {'loss': 0.6748, 'grad_norm': 0.6709969307126975, 'learning_rate': 1.481719464369883e-05, 'epoch': 0.36} 36%|███▌ | 2389/6638 [2:16:40<3:53:51, 3.30s/it] 36%|███▌ | 2390/6638 [2:16:44<3:52:53, 3.29s/it] {'loss': 0.631, 'grad_norm': 0.5898513546726796, 'learning_rate': 1.4812917807967672e-05, 'epoch': 0.36} 36%|███▌ | 2390/6638 [2:16:44<3:52:53, 3.29s/it] 36%|███▌ | 2391/6638 [2:16:47<3:51:41, 3.27s/it] {'loss': 0.6605, 'grad_norm': 0.5755448994900213, 'learning_rate': 1.4808639826179464e-05, 'epoch': 0.36} 36%|███▌ | 2391/6638 [2:16:47<3:51:41, 3.27s/it] 36%|███▌ | 2392/6638 [2:16:50<3:51:28, 3.27s/it] {'loss': 0.6283, 'grad_norm': 0.6053392771496432, 'learning_rate': 1.4804360699352884e-05, 'epoch': 0.36} 36%|███▌ | 2392/6638 [2:16:50<3:51:28, 3.27s/it] 36%|███▌ | 2393/6638 [2:16:54<3:54:58, 3.32s/it] {'loss': 0.6559, 'grad_norm': 0.5845958401624122, 'learning_rate': 1.4800080428506883e-05, 'epoch': 0.36} 36%|███▌ | 2393/6638 [2:16:54<3:54:58, 3.32s/it] 36%|███▌ | 2394/6638 [2:16:57<3:55:26, 3.33s/it] {'loss': 0.6401, 'grad_norm': 0.5831199528218411, 'learning_rate': 1.4795799014660677e-05, 'epoch': 0.36} 36%|███▌ | 2394/6638 [2:16:57<3:55:26, 3.33s/it] 36%|███▌ | 2395/6638 [2:17:00<3:52:52, 3.29s/it] {'loss': 0.6808, 'grad_norm': 0.6616689287368194, 'learning_rate': 1.4791516458833771e-05, 'epoch': 0.36} 36%|███▌ | 2395/6638 [2:17:00<3:52:52, 3.29s/it] 36%|███▌ | 2396/6638 [2:17:03<3:53:35, 3.30s/it] {'loss': 0.6615, 'grad_norm': 0.6519344485868159, 'learning_rate': 
1.478723276204592e-05, 'epoch': 0.36} 36%|███▌ | 2396/6638 [2:17:03<3:53:35, 3.30s/it] 36%|███▌ | 2397/6638 [2:17:07<3:53:05, 3.30s/it] {'loss': 0.6551, 'grad_norm': 0.5910959909114449, 'learning_rate': 1.4782947925317173e-05, 'epoch': 0.36} 36%|███▌ | 2397/6638 [2:17:07<3:53:05, 3.30s/it] 36%|███▌ | 2398/6638 [2:17:10<3:51:31, 3.28s/it] {'loss': 0.6723, 'grad_norm': 0.5673299601337196, 'learning_rate': 1.4778661949667836e-05, 'epoch': 0.36} 36%|███▌ | 2398/6638 [2:17:10<3:51:31, 3.28s/it] 36%|███▌ | 2399/6638 [2:17:13<3:49:55, 3.25s/it] {'loss': 0.6587, 'grad_norm': 0.5918251472862424, 'learning_rate': 1.477437483611849e-05, 'epoch': 0.36} 36%|███▌ | 2399/6638 [2:17:13<3:49:55, 3.25s/it]
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
36%|███▌ | 2400/6638 [2:17:16<3:51:05, 3.27s/it] {'loss': 0.6592, 'grad_norm': 0.5858651891815303, 'learning_rate': 1.4770086585689982e-05, 'epoch': 0.36} 36%|███▌ | 2400/6638 [2:17:16<3:51:05, 3.27s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead.
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
36%|███▌ | 2401/6638 [2:17:33<8:25:53, 7.16s/it] {'loss': 0.6605, 'grad_norm': 0.6056140742971021, 'learning_rate': 1.4765797199403445e-05, 'epoch': 0.36} 36%|███▌ | 2401/6638 [2:17:33<8:25:53, 7.16s/it] 36%|███▌ | 2402/6638 [2:17:36<7:03:46, 6.00s/it] {'loss': 0.7088, 'grad_norm': 0.6907879623800623, 'learning_rate': 1.4761506678280264e-05, 'epoch': 0.36} 36%|███▌ | 2402/6638 [2:17:36<7:03:46, 6.00s/it] 36%|███▌ | 2403/6638 [2:17:39<6:06:15, 5.19s/it] {'loss': 0.6477, 'grad_norm': 0.6238477602557383, 'learning_rate': 1.4757215023342105e-05, 'epoch': 0.36} 36%|███▌ | 2403/6638 [2:17:39<6:06:15, 5.19s/it] 36%|███▌ | 2404/6638 [2:17:43<5:25:05, 4.61s/it] {'loss': 0.6785, 'grad_norm': 0.5851535360382526, 'learning_rate': 1.47529222356109e-05, 'epoch': 0.36} 36%|███▌ | 2404/6638 [2:17:43<5:25:05, 4.61s/it] 36%|███▌ | 2405/6638 [2:17:46<5:01:32, 4.27s/it] {'loss': 0.6786, 'grad_norm': 0.629218360067542, 'learning_rate':
1.4748628316108855e-05, 'epoch': 0.36} 36%|███▌ | 2405/6638 [2:17:46<5:01:32, 4.27s/it] 36%|███▌ | 2406/6638 [2:17:49<4:39:24, 3.96s/it] {'loss': 0.6446, 'grad_norm': 0.6065829907941402, 'learning_rate': 1.4744333265858442e-05, 'epoch': 0.36} 36%|███▌ | 2406/6638 [2:17:49<4:39:24, 3.96s/it] 36%|███▋ | 2407/6638 [2:17:53<4:23:59, 3.74s/it] {'loss': 0.6797, 'grad_norm': 0.6259508892366518, 'learning_rate': 1.4740037085882401e-05, 'epoch': 0.36} 36%|███▋ | 2407/6638 [2:17:53<4:23:59, 3.74s/it] 36%|███▋ | 2408/6638 [2:17:56<4:16:39, 3.64s/it] {'loss': 0.6514, 'grad_norm': 0.6332826302205673, 'learning_rate': 1.4735739777203746e-05, 'epoch': 0.36} 36%|███▋ | 2408/6638 [2:17:56<4:16:39, 3.64s/it] 36%|███▋ | 2409/6638 [2:18:00<4:16:27, 3.64s/it] {'loss': 0.6678, 'grad_norm': 0.6099284281815012, 'learning_rate': 1.473144134084575e-05, 'epoch': 0.36} 36%|███▋ | 2409/6638 [2:18:00<4:16:27, 3.64s/it] 36%|███▋ | 2410/6638 [2:18:03<4:06:27, 3.50s/it] {'loss': 0.6769, 'grad_norm': 0.655453222484744, 'learning_rate': 1.4727141777831969e-05, 'epoch': 0.36} 36%|███▋ | 2410/6638 [2:18:03<4:06:27, 3.50s/it] 36%|███▋ | 2411/6638 [2:18:06<4:06:42, 3.50s/it] {'loss': 0.6501, 'grad_norm': 0.547260405571351, 'learning_rate': 1.4722841089186214e-05, 'epoch': 0.36} 36%|███▋ | 2411/6638 [2:18:06<4:06:42, 3.50s/it] 36%|███▋ | 2412/6638 [2:18:10<4:02:15, 3.44s/it] {'loss': 0.7589, 'grad_norm': 0.804373068508164, 'learning_rate': 1.4718539275932575e-05, 'epoch': 0.36} 36%|███▋ | 2412/6638 [2:18:10<4:02:15, 3.44s/it] 36%|███▋ | 2413/6638 [2:18:13<3:59:47, 3.41s/it] {'loss': 0.6523, 'grad_norm': 0.6790793399556456, 'learning_rate': 1.4714236339095397e-05, 'epoch': 0.36} 36%|███▋ | 2413/6638 [2:18:13<3:59:47, 3.41s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096). 
warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).") 36%|███▋ | 2414/6638 [2:18:16<4:03:15, 3.46s/it] {'loss': 0.6665, 'grad_norm': 0.6558086786270529, 'learning_rate': 1.4709932279699309e-05, 'epoch': 0.36} 36%|███▋ | 2414/6638 [2:18:16<4:03:15, 3.46s/it] 36%|███▋ | 2415/6638 [2:18:20<4:00:34, 3.42s/it] {'loss': 0.6792, 'grad_norm': 0.6385887066095195, 'learning_rate': 1.4705627098769186e-05, 'epoch': 0.36} 36%|███▋ | 2415/6638 [2:18:20<4:00:34, 3.42s/it] 36%|███▋ | 2416/6638 [2:18:23<3:56:44, 3.36s/it] {'loss': 0.6553, 'grad_norm': 0.5886211900343357, 'learning_rate': 1.4701320797330196e-05, 'epoch': 0.36} 36%|███▋ | 2416/6638 [2:18:23<3:56:44, 3.36s/it] 36%|███▋ | 2417/6638 [2:18:26<3:56:53, 3.37s/it] {'loss': 0.7246, 'grad_norm': 0.6641236318328706, 'learning_rate': 1.469701337640775e-05, 'epoch': 0.36} 36%|███▋ | 2417/6638 [2:18:26<3:56:53, 3.37s/it] 36%|███▋ | 2418/6638 [2:18:30<3:57:56, 3.38s/it] {'loss': 0.6414, 'grad_norm': 0.6045949374520351, 'learning_rate': 1.4692704837027538e-05, 'epoch': 0.36} 36%|███▋ | 2418/6638 [2:18:30<3:57:56, 3.38s/it] 36%|███▋ | 2419/6638 [2:18:33<3:57:11, 3.37s/it] {'loss': 0.6535, 'grad_norm': 0.6489437033351406, 'learning_rate': 1.4688395180215515e-05, 'epoch': 0.36} 36%|███▋ | 2419/6638 [2:18:33<3:57:11, 3.37s/it] 36%|███▋ | 2420/6638 [2:18:36<3:53:40, 3.32s/it] {'loss': 0.6491, 'grad_norm': 0.6111023918176923, 'learning_rate': 1.4684084406997903e-05, 'epoch': 0.36} 36%|███▋ | 2420/6638 [2:18:36<3:53:40, 3.32s/it] 36%|███▋ | 2421/6638 [2:18:40<3:54:17, 3.33s/it] {'loss': 0.6585, 'grad_norm': 0.6810413619616432, 'learning_rate': 1.4679772518401185e-05, 'epoch': 0.36} 36%|███▋ | 2421/6638 [2:18:40<3:54:17, 3.33s/it] 36%|███▋ | 2422/6638 [2:18:43<3:52:37, 3.31s/it] {'loss': 0.6751, 'grad_norm': 0.6517235720310675, 'learning_rate': 1.4675459515452113e-05, 'epoch': 0.36} 36%|███▋ | 2422/6638 [2:18:43<3:52:37, 3.31s/it] 37%|███▋ | 2423/6638 [2:18:46<3:52:42, 3.31s/it] {'loss': 0.6728, 
'grad_norm': 0.5726557631472354, 'learning_rate': 1.4671145399177702e-05, 'epoch': 0.37} 37%|███▋ | 2423/6638 [2:18:46<3:52:42, 3.31s/it] 37%|███▋ | 2424/6638 [2:18:50<3:54:08, 3.33s/it] {'loss': 0.6741, 'grad_norm': 0.5689017516318359, 'learning_rate': 1.4666830170605238e-05, 'epoch': 0.37} 37%|███▋ | 2424/6638 [2:18:50<3:54:08, 3.33s/it] 37%|███▋ | 2425/6638 [2:18:53<3:53:28, 3.33s/it] {'loss': 0.6471, 'grad_norm': 0.5882563463273841, 'learning_rate': 1.4662513830762262e-05, 'epoch': 0.37} 37%|███▋ | 2425/6638 [2:18:53<3:53:28, 3.33s/it] 37%|███▋ | 2426/6638 [2:18:56<3:51:29, 3.30s/it] {'loss': 0.6554, 'grad_norm': 0.599133209824609, 'learning_rate': 1.465819638067659e-05, 'epoch': 0.37} 37%|███▋ | 2426/6638 [2:18:56<3:51:29, 3.30s/it] 37%|███▋ | 2427/6638 [2:18:59<3:49:18, 3.27s/it] {'loss': 0.669, 'grad_norm': 0.6351162842313347, 'learning_rate': 1.4653877821376301e-05, 'epoch': 0.37} 37%|███▋ | 2427/6638 [2:18:59<3:49:18, 3.27s/it] 37%|███▋ | 2428/6638 [2:19:03<3:52:50, 3.32s/it] {'loss': 0.7222, 'grad_norm': 0.6029821953309522, 'learning_rate': 1.4649558153889726e-05, 'epoch': 0.37} 37%|███▋ | 2428/6638 [2:19:03<3:52:50, 3.32s/it] 37%|███▋ | 2429/6638 [2:19:06<3:49:23, 3.27s/it] {'loss': 0.6305, 'grad_norm': 0.6220720184626304, 'learning_rate': 1.4645237379245476e-05, 'epoch': 0.37} 37%|███▋ | 2429/6638 [2:19:06<3:49:23, 3.27s/it] 37%|███▋ | 2430/6638 [2:19:09<3:51:28, 3.30s/it] {'loss': 0.6634, 'grad_norm': 0.5956982162541038, 'learning_rate': 1.4640915498472415e-05, 'epoch': 0.37} 37%|███▋ | 2430/6638 [2:19:09<3:51:28, 3.30s/it] 37%|███▋ | 2431/6638 [2:19:13<3:51:18, 3.30s/it] {'loss': 0.6278, 'grad_norm': 0.5388906062220454, 'learning_rate': 1.4636592512599674e-05, 'epoch': 0.37} 37%|███▋ | 2431/6638 [2:19:13<3:51:18, 3.30s/it] 37%|███▋ | 2432/6638 [2:19:16<3:51:32, 3.30s/it] {'loss': 0.6231, 'grad_norm': 0.5578488635685822, 'learning_rate': 1.4632268422656645e-05, 'epoch': 0.37} 37%|███▋ | 2432/6638 [2:19:16<3:51:32, 3.30s/it] 37%|███▋ | 2433/6638 
[2:19:19<3:51:01, 3.30s/it] {'loss': 0.6683, 'grad_norm': 0.6143643364861328, 'learning_rate': 1.4627943229672992e-05, 'epoch': 0.37} 37%|███▋ | 2433/6638 [2:19:19<3:51:01, 3.30s/it] 37%|███▋ | 2434/6638 [2:19:23<3:49:57, 3.28s/it] {'loss': 0.7053, 'grad_norm': 0.6992959048019253, 'learning_rate': 1.4623616934678627e-05, 'epoch': 0.37} 37%|███▋ | 2434/6638 [2:19:23<3:49:57, 3.28s/it] 37%|███▋ | 2435/6638 [2:19:26<3:47:06, 3.24s/it] {'loss': 0.66, 'grad_norm': 0.6613182177800517, 'learning_rate': 1.4619289538703736e-05, 'epoch': 0.37} 37%|███▋ | 2435/6638 [2:19:26<3:47:06, 3.24s/it] 37%|███▋ | 2436/6638 [2:19:29<3:46:01, 3.23s/it] {'loss': 0.6542, 'grad_norm': 0.6903825072517508, 'learning_rate': 1.4614961042778762e-05, 'epoch': 0.37} 37%|███▋ | 2436/6638 [2:19:29<3:46:01, 3.23s/it] 37%|███▋ | 2437/6638 [2:19:32<3:46:37, 3.24s/it] {'loss': 0.679, 'grad_norm': 0.6382664015802364, 'learning_rate': 1.461063144793441e-05, 'epoch': 0.37} 37%|███▋ | 2437/6638 [2:19:32<3:46:37, 3.24s/it] 37%|███▋ | 2438/6638 [2:19:35<3:47:25, 3.25s/it] {'loss': 0.6765, 'grad_norm': 0.6684878162536315, 'learning_rate': 1.4606300755201646e-05, 'epoch': 0.37} 37%|███▋ | 2438/6638 [2:19:35<3:47:25, 3.25s/it] 37%|███▋ | 2439/6638 [2:19:39<3:46:27, 3.24s/it] {'loss': 0.6519, 'grad_norm': 0.6599976374294302, 'learning_rate': 1.4601968965611704e-05, 'epoch': 0.37} 37%|███▋ | 2439/6638 [2:19:39<3:46:27, 3.24s/it] 37%|███▋ | 2440/6638 [2:19:42<3:47:23, 3.25s/it] {'loss': 0.7213, 'grad_norm': 0.6513705498464333, 'learning_rate': 1.4597636080196074e-05, 'epoch': 0.37} 37%|███▋ | 2440/6638 [2:19:42<3:47:23, 3.25s/it] 37%|███▋ | 2441/6638 [2:19:45<3:47:47, 3.26s/it] {'loss': 0.6773, 'grad_norm': 0.6926430433090888, 'learning_rate': 1.4593302099986502e-05, 'epoch': 0.37} 37%|███▋ | 2441/6638 [2:19:45<3:47:47, 3.26s/it] 37%|███▋ | 2442/6638 [2:19:48<3:46:40, 3.24s/it] {'loss': 0.6243, 'grad_norm': 0.6165397731557533, 'learning_rate': 1.4588967026015004e-05, 'epoch': 0.37} 37%|███▋ | 2442/6638 
[2:19:48<3:46:40, 3.24s/it] 37%|███▋ | 2443/6638 [2:19:52<3:46:18, 3.24s/it] {'loss': 0.6563, 'grad_norm': 0.6390625638816929, 'learning_rate': 1.458463085931385e-05, 'epoch': 0.37} 37%|███▋ | 2443/6638 [2:19:52<3:46:18, 3.24s/it] 37%|███▋ | 2444/6638 [2:19:55<3:46:52, 3.25s/it] {'loss': 0.7025, 'grad_norm': 0.6713916884658255, 'learning_rate': 1.4580293600915579e-05, 'epoch': 0.37} 37%|███▋ | 2444/6638 [2:19:55<3:46:52, 3.25s/it] 37%|███▋ | 2445/6638 [2:19:58<3:47:25, 3.25s/it] {'loss': 0.6445, 'grad_norm': 0.6176429942244668, 'learning_rate': 1.4575955251852973e-05, 'epoch': 0.37} 37%|███▋ | 2445/6638 [2:19:58<3:47:25, 3.25s/it] 37%|███▋ | 2446/6638 [2:20:01<3:47:56, 3.26s/it] {'loss': 0.6965, 'grad_norm': 0.656813519418741, 'learning_rate': 1.457161581315909e-05, 'epoch': 0.37} 37%|███▋ | 2446/6638 [2:20:01<3:47:56, 3.26s/it] 37%|███▋ | 2447/6638 [2:20:05<3:48:38, 3.27s/it] {'loss': 0.6284, 'grad_norm': 0.6134080888410173, 'learning_rate': 1.4567275285867243e-05, 'epoch': 0.37} 37%|███▋ | 2447/6638 [2:20:05<3:48:38, 3.27s/it] 37%|███▋ | 2448/6638 [2:20:08<3:47:11, 3.25s/it] {'loss': 0.6952, 'grad_norm': 0.709630382052189, 'learning_rate': 1.4562933671011001e-05, 'epoch': 0.37} 37%|███▋ | 2448/6638 [2:20:08<3:47:11, 3.25s/it] 37%|███▋ | 2449/6638 [2:20:11<3:52:50, 3.34s/it] {'loss': 0.6633, 'grad_norm': 0.581202963360411, 'learning_rate': 1.4558590969624198e-05, 'epoch': 0.37} 37%|███▋ | 2449/6638 [2:20:11<3:52:50, 3.34s/it]2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
37%|███▋ | 2450/6638 [2:20:15<3:54:55, 3.37s/it] {'loss': 0.6664, 'grad_norm': 0.632329201354008, 'learning_rate': 1.4554247182740915e-05, 'epoch': 0.37}
37%|███▋ | 2451/6638 [2:20:18<3:56:14, 3.39s/it] {'loss': 0.6515, 'grad_norm': 0.7124530915635278, 'learning_rate': 1.4549902311395503e-05, 'epoch': 0.37}
37%|███▋ | 2452/6638 [2:20:21<3:52:10, 3.33s/it] {'loss': 0.6579, 'grad_norm': 0.6734267479168843, 'learning_rate': 1.4545556356622568e-05, 'epoch': 0.37}
37%|███▋ | 2453/6638 [2:20:25<3:52:55, 3.34s/it] {'loss': 0.7185, 'grad_norm': 0.7352757949565729, 'learning_rate': 1.4541209319456972e-05, 'epoch': 0.37}
37%|███▋ | 2454/6638 [2:20:28<3:49:40, 3.29s/it] {'loss': 0.6492, 'grad_norm': 0.5987450000111949, 'learning_rate': 1.4536861200933838e-05, 'epoch': 0.37}
37%|███▋ | 2455/6638 [2:20:31<3:51:21, 3.32s/it] {'loss': 0.6308, 'grad_norm': 0.5762836658298375, 'learning_rate': 1.4532512002088544e-05, 'epoch': 0.37}
37%|███▋ | 2456/6638 [2:20:35<3:49:32, 3.29s/it] {'loss': 0.6325, 'grad_norm': 0.5535114067113039, 'learning_rate': 1.4528161723956725e-05, 'epoch': 0.37}
37%|███▋ | 2457/6638 [2:20:38<3:48:55, 3.29s/it] {'loss': 0.6506, 'grad_norm': 0.5260670542713122, 'learning_rate': 1.4523810367574271e-05, 'epoch': 0.37}
37%|███▋ | 2458/6638 [2:20:41<3:49:18, 3.29s/it] {'loss': 0.6763, 'grad_norm': 0.5912853941656662, 'learning_rate': 1.4519457933977337e-05, 'epoch': 0.37}
37%|███▋ | 2459/6638 [2:20:45<3:48:55, 3.29s/it] {'loss': 0.6479, 'grad_norm': 0.6086184647279831, 'learning_rate': 1.4515104424202325e-05, 'epoch': 0.37}
37%|███▋ | 2460/6638 [2:20:48<3:48:25, 3.28s/it] {'loss': 0.6329, 'grad_norm': 0.583609166458799, 'learning_rate': 1.45107498392859e-05, 'epoch': 0.37}
37%|███▋ | 2461/6638 [2:20:51<3:50:25, 3.31s/it] {'loss': 0.7188, 'grad_norm': 0.6616179530726007, 'learning_rate': 1.4506394180264978e-05, 'epoch': 0.37}
37%|███▋ | 2462/6638 [2:20:54<3:50:46, 3.32s/it] {'loss': 0.6461, 'grad_norm': 0.8419791224759622, 'learning_rate': 1.4502037448176734e-05, 'epoch': 0.37}
37%|███▋ | 2463/6638 [2:20:58<3:50:02, 3.31s/it] {'loss': 0.6761, 'grad_norm': 0.6871412902223601, 'learning_rate': 1.44976796440586e-05, 'epoch': 0.37}
37%|███▋ | 2464/6638 [2:21:01<3:51:05, 3.32s/it] {'loss': 0.6443, 'grad_norm': 0.5695731861141183, 'learning_rate': 1.4493320768948258e-05, 'epoch': 0.37}
37%|███▋ | 2465/6638 [2:21:04<3:51:10, 3.32s/it] {'loss': 0.6428, 'grad_norm': 0.5868669746220312, 'learning_rate': 1.4488960823883647e-05, 'epoch': 0.37}
37%|███▋ | 2466/6638 [2:21:08<3:48:05, 3.28s/it] {'loss': 0.6188, 'grad_norm': 0.6194870350458747, 'learning_rate': 1.4484599809902964e-05, 'epoch': 0.37}
37%|███▋ | 2467/6638 [2:21:11<3:46:39, 3.26s/it] {'loss': 0.6761, 'grad_norm': 0.6297520562215129, 'learning_rate': 1.448023772804466e-05, 'epoch': 0.37}
37%|███▋ | 2468/6638 [2:21:14<3:48:33, 3.29s/it] {'loss': 0.6734, 'grad_norm': 0.6605046284590742, 'learning_rate': 1.4475874579347436e-05, 'epoch': 0.37}
37%|███▋ | 2469/6638 [2:21:17<3:48:13, 3.28s/it] {'loss': 0.6519, 'grad_norm': 0.6366626860538016, 'learning_rate': 1.4471510364850248e-05, 'epoch': 0.37}
37%|███▋ | 2470/6638 [2:21:21<3:51:02, 3.33s/it] {'loss': 0.6642, 'grad_norm': 0.6781976255038451, 'learning_rate': 1.4467145085592309e-05, 'epoch': 0.37}
37%|███▋ | 2471/6638 [2:21:24<3:49:16, 3.30s/it] {'loss': 0.6293, 'grad_norm': 0.5931963871361927, 'learning_rate': 1.4462778742613082e-05, 'epoch': 0.37}
37%|███▋ | 2472/6638 [2:21:28<3:50:43, 3.32s/it] {'loss': 0.6572, 'grad_norm': 0.6229979192057076, 'learning_rate': 1.4458411336952292e-05, 'epoch': 0.37}
37%|███▋ | 2473/6638 [2:21:31<3:53:36, 3.37s/it] {'loss': 0.6625, 'grad_norm': 0.7234483326641485, 'learning_rate': 1.4454042869649902e-05, 'epoch': 0.37}
37%|███▋ | 2474/6638 [2:21:34<3:54:21, 3.38s/it] {'loss': 0.6589, 'grad_norm': 0.5699169975503667, 'learning_rate': 1.4449673341746142e-05, 'epoch': 0.37}
37%|███▋ | 2475/6638 [2:21:38<3:52:14, 3.35s/it] {'loss': 0.6378, 'grad_norm': 0.6434390379701718, 'learning_rate': 1.4445302754281483e-05, 'epoch': 0.37}
37%|███▋ | 2476/6638 [2:21:41<3:49:29, 3.31s/it] {'loss': 0.6574, 'grad_norm': 0.6048556073106861, 'learning_rate': 1.4440931108296658e-05, 'epoch': 0.37}
37%|███▋ | 2477/6638 [2:21:44<3:51:42, 3.34s/it] {'loss': 0.6704, 'grad_norm': 0.6567034588690397, 'learning_rate': 1.443655840483265e-05, 'epoch': 0.37}
37%|███▋ | 2478/6638 [2:21:48<3:51:12, 3.33s/it] {'loss': 0.7242, 'grad_norm': 0.7156720017982519, 'learning_rate': 1.4432184644930685e-05, 'epoch': 0.37}
37%|███▋ | 2479/6638 [2:21:51<3:54:38, 3.39s/it] {'loss': 0.6355, 'grad_norm': 0.5365216478922022, 'learning_rate': 1.4427809829632252e-05, 'epoch': 0.37}
37%|███▋ | 2480/6638 [2:21:54<3:52:36, 3.36s/it] {'loss': 0.6862, 'grad_norm': 0.6322354665914075, 'learning_rate': 1.4423433959979085e-05, 'epoch': 0.37}
37%|███▋ | 2481/6638 [2:21:58<3:52:19, 3.35s/it] {'loss': 0.6283, 'grad_norm': 0.5740100714017188, 'learning_rate': 1.4419057037013171e-05, 'epoch': 0.37}
37%|███▋ | 2482/6638 [2:22:01<3:49:11, 3.31s/it] {'loss': 0.6479, 'grad_norm': 0.6017498315694673, 'learning_rate': 1.4414679061776746e-05, 'epoch': 0.37}
37%|███▋ | 2483/6638 [2:22:04<3:48:56, 3.31s/it] {'loss': 0.6377, 'grad_norm': 0.5945376084921447, 'learning_rate': 1.4410300035312304e-05, 'epoch': 0.37}
37%|███▋ | 2484/6638 [2:22:07<3:46:16, 3.27s/it] {'loss': 0.6564, 'grad_norm': 0.6531973751735513, 'learning_rate': 1.4405919958662575e-05, 'epoch': 0.37}
37%|███▋ | 2485/6638 [2:22:11<3:47:08, 3.28s/it] {'loss': 0.6472, 'grad_norm': 0.561665397799652, 'learning_rate': 1.4401538832870555e-05, 'epoch': 0.37}
37%|███▋ | 2486/6638 [2:22:14<3:46:43, 3.28s/it] {'loss': 0.6557, 'grad_norm': 0.634005915003834, 'learning_rate': 1.439715665897948e-05, 'epoch': 0.37}
37%|███▋ | 2487/6638 [2:22:17<3:49:08, 3.31s/it] {'loss': 0.6485, 'grad_norm': 0.6530826065814677, 'learning_rate': 1.4392773438032833e-05, 'epoch': 0.37}
37%|███▋ | 2488/6638 [2:22:21<3:48:27, 3.30s/it] {'loss': 0.6648, 'grad_norm': 0.70066936968313, 'learning_rate': 1.4388389171074356e-05, 'epoch': 0.37}
37%|███▋ | 2489/6638 [2:22:24<3:48:04, 3.30s/it] {'loss': 0.6819, 'grad_norm': 0.5854477347434514, 'learning_rate': 1.4384003859148035e-05, 'epoch': 0.37}
38%|███▊ | 2490/6638 [2:22:27<3:49:25, 3.32s/it] {'loss': 0.6636, 'grad_norm': 0.6542850173939749, 'learning_rate': 1.43796175032981e-05, 'epoch': 0.38}
38%|███▊ | 2491/6638 [2:22:31<3:49:14, 3.32s/it] {'loss': 0.6408, 'grad_norm': 0.5359119993556994, 'learning_rate': 1.4375230104569044e-05, 'epoch': 0.38}
38%|███▊ | 2492/6638 [2:22:34<3:46:26, 3.28s/it] {'loss': 0.6253, 'grad_norm': 0.636769821797502, 'learning_rate': 1.4370841664005592e-05, 'epoch': 0.38}
38%|███▊ | 2493/6638 [2:22:37<3:46:35, 3.28s/it] {'loss': 0.7492, 'grad_norm': 0.6536506405036735, 'learning_rate': 1.4366452182652729e-05, 'epoch': 0.38}
38%|███▊ | 2494/6638 [2:22:40<3:47:07, 3.29s/it] {'loss': 0.6654, 'grad_norm': 0.5784068929983779, 'learning_rate': 1.4362061661555675e-05, 'epoch': 0.38}
38%|███▊ | 2495/6638 [2:22:44<3:47:16, 3.29s/it] {'loss': 0.6065, 'grad_norm': 0.5315215659691942, 'learning_rate': 1.4357670101759914e-05, 'epoch': 0.38}
38%|███▊ | 2496/6638 [2:22:47<3:46:22, 3.28s/it] {'loss': 0.6468, 'grad_norm': 0.5656207709889908, 'learning_rate': 1.4353277504311164e-05, 'epoch': 0.38}
38%|███▊ | 2497/6638 [2:22:50<3:47:54, 3.30s/it] {'loss': 0.7312, 'grad_norm': 0.630033836591218, 'learning_rate': 1.4348883870255397e-05, 'epoch': 0.38}
38%|███▊ | 2498/6638 [2:22:54<3:45:23, 3.27s/it] {'loss': 0.6584, 'grad_norm': 0.6236608433413159, 'learning_rate': 1.4344489200638827e-05, 'epoch': 0.38}
38%|███▊ | 2499/6638 [2:22:57<3:46:00, 3.28s/it] {'loss': 0.6636, 'grad_norm': 0.5605637001532346, 'learning_rate': 1.4340093496507921e-05, 'epoch': 0.38}
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
38%|███▊ | 2500/6638 [2:23:00<3:44:27, 3.25s/it] {'loss': 0.6972, 'grad_norm': 0.6672061714576163, 'learning_rate': 1.4335696758909386e-05, 'epoch': 0.38}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch.
If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
38%|███▊ | 2501/6638 [2:23:16<8:10:54, 7.12s/it] {'loss': 0.6678, 'grad_norm': 0.554445654781431, 'learning_rate': 1.433129898889018e-05, 'epoch': 0.38}
38%|███▊ | 2502/6638 [2:23:19<6:50:57, 5.96s/it] {'loss': 0.6344, 'grad_norm': 0.5399170220629289, 'learning_rate': 1.4326900187497504e-05, 'epoch': 0.38}
38%|███▊ | 2503/6638 [2:23:23<5:55:20, 5.16s/it] {'loss': 0.649, 'grad_norm': 0.5693561898610656, 'learning_rate': 1.4322500355778804e-05, 'epoch': 0.38}
38%|███▊ | 2504/6638 [2:23:26<5:15:52, 4.58s/it] {'loss': 0.6886, 'grad_norm': 0.5822653381431755, 'learning_rate': 1.431809949478177e-05, 'epoch': 0.38}
38%|███▊ | 2505/6638 [2:23:29<4:49:16, 4.20s/it] {'loss': 0.6633, 'grad_norm': 0.5225861882317324, 'learning_rate': 1.4313697605554349e-05, 'epoch': 0.38}
38%|███▊ | 2506/6638 [2:23:33<4:31:46, 3.95s/it] {'loss': 0.6623, 'grad_norm': 0.6437624759252302, 'learning_rate': 1.4309294689144712e-05, 'epoch': 0.38}
38%|███▊ | 2507/6638 [2:23:36<4:18:27, 3.75s/it] {'loss': 0.6392, 'grad_norm': 0.5711936303728562, 'learning_rate': 1.4304890746601294e-05, 'epoch': 0.38}
38%|███▊ | 2508/6638 [2:23:39<4:08:55, 3.62s/it] {'loss': 0.7019, 'grad_norm': 0.7712522881158431, 'learning_rate': 1.4300485778972761e-05, 'epoch': 0.38}
38%|███▊ | 2509/6638 [2:23:42<4:00:56, 3.50s/it] {'loss': 0.6218, 'grad_norm': 0.5821483409162017, 'learning_rate': 1.4296079787308035e-05, 'epoch': 0.38}
38%|███▊ | 2510/6638 [2:23:46<3:57:58, 3.46s/it] {'loss': 0.6945, 'grad_norm': 0.5944597519553532, 'learning_rate': 1.429167277265627e-05, 'epoch': 0.38}
38%|███▊ | 2511/6638 [2:23:49<3:54:34, 3.41s/it] {'loss': 0.6539, 'grad_norm': 0.6115679909554104, 'learning_rate': 1.428726473606687e-05, 'epoch': 0.38}
38%|███▊ | 2512/6638 [2:23:52<3:52:10, 3.38s/it] {'loss': 0.6842, 'grad_norm': 0.6547018653135271, 'learning_rate': 1.4282855678589482e-05, 'epoch': 0.38}
38%|███▊ | 2513/6638 [2:23:56<3:49:51, 3.34s/it] {'loss': 0.6458, 'grad_norm': 0.5969545922326593, 'learning_rate': 1.4278445601273998e-05, 'epoch': 0.38}
38%|███▊ | 2514/6638 [2:23:59<3:48:28, 3.32s/it] {'loss': 0.669, 'grad_norm': 0.7458112298373144, 'learning_rate': 1.4274034505170543e-05, 'epoch': 0.38}
38%|███▊ | 2515/6638 [2:24:02<3:44:03, 3.26s/it] {'loss': 0.6952, 'grad_norm': 0.6880253584032108, 'learning_rate': 1.4269622391329501e-05, 'epoch': 0.38}
38%|███▊ | 2516/6638 [2:24:06<3:48:27, 3.33s/it] {'loss': 0.7009, 'grad_norm': 0.7753709017225635, 'learning_rate': 1.4265209260801483e-05, 'epoch': 0.38}
38%|███▊ | 2517/6638 [2:24:09<3:49:30, 3.34s/it] {'loss': 0.6926, 'grad_norm': 0.6071094648578409, 'learning_rate': 1.426079511463735e-05, 'epoch': 0.38}
38%|███▊ | 2518/6638 [2:24:12<3:52:19, 3.38s/it] {'loss': 0.6568, 'grad_norm': 0.6383329775779488, 'learning_rate': 1.4256379953888202e-05, 'epoch': 0.38}
38%|███▊ | 2519/6638 [2:24:16<3:51:14, 3.37s/it] {'loss': 0.6566, 'grad_norm': 0.5655349485402484, 'learning_rate': 1.4251963779605383e-05, 'epoch': 0.38}
38%|███▊ | 2520/6638 [2:24:19<3:48:30, 3.33s/it] {'loss': 0.6683, 'grad_norm': 0.5946805117972763, 'learning_rate': 1.424754659284048e-05, 'epoch': 0.38}
38%|███▊ | 2521/6638 [2:24:22<3:45:55, 3.29s/it] {'loss': 0.6743, 'grad_norm': 0.6462174259225573, 'learning_rate': 1.424312839464531e-05, 'epoch': 0.38}
38%|███▊ | 2522/6638 [2:24:25<3:45:31, 3.29s/it] {'loss': 0.6222, 'grad_norm': 0.7084056662168857, 'learning_rate': 1.4238709186071948e-05, 'epoch': 0.38}
38%|███▊ | 2523/6638 [2:24:29<3:46:49, 3.31s/it] {'loss': 0.6605, 'grad_norm': 0.5823820009863608, 'learning_rate': 1.4234288968172696e-05, 'epoch': 0.38}
38%|███▊ | 2524/6638 [2:24:32<3:46:02, 3.30s/it] {'loss': 0.6572, 'grad_norm': 0.5989100278626966, 'learning_rate': 1.4229867742000103e-05, 'epoch': 0.38}
38%|███▊ | 2525/6638 [2:24:35<3:45:51, 3.29s/it] {'loss': 0.6534, 'grad_norm': 0.6055769486352905, 'learning_rate': 1.4225445508606951e-05, 'epoch': 0.38}
38%|███▊ | 2526/6638 [2:24:39<3:44:09, 3.27s/it] {'loss': 0.6564, 'grad_norm': 0.6267599657468905, 'learning_rate': 1.4221022269046273e-05, 'epoch': 0.38}
38%|███▊ | 2527/6638 [2:24:42<3:44:59, 3.28s/it] {'loss': 0.7262, 'grad_norm': 0.7383951119117791, 'learning_rate': 1.4216598024371332e-05, 'epoch': 0.38}
38%|███▊ | 2528/6638 [2:24:45<3:46:01, 3.30s/it] {'loss': 0.6891, 'grad_norm': 0.6420919179649183, 'learning_rate': 1.4212172775635633e-05, 'epoch': 0.38}
38%|███▊ | 2529/6638 [2:24:49<3:45:20, 3.29s/it] {'loss': 0.6415, 'grad_norm': 0.6088454008991646, 'learning_rate': 1.4207746523892928e-05, 'epoch': 0.38}
38%|███▊ | 2530/6638 [2:24:52<3:44:47, 3.28s/it] {'loss': 0.6413, 'grad_norm': 0.5615943694925443, 'learning_rate': 1.4203319270197188e-05, 'epoch': 0.38}
38%|███▊ | 2531/6638 [2:24:55<3:42:56, 3.26s/it] {'loss': 0.6142, 'grad_norm': 0.5522317592594196, 'learning_rate': 1.4198891015602648e-05, 'epoch': 0.38}
38%|███▊ | 2532/6638 [2:24:58<3:42:15, 3.25s/it] {'loss': 0.665, 'grad_norm': 0.6109228146719775, 'learning_rate': 1.419446176116376e-05, 'epoch': 0.38}
38%|███▊ | 2533/6638 [2:25:01<3:41:43, 3.24s/it] {'loss': 0.6226, 'grad_norm': 0.5778531648343218, 'learning_rate': 1.4190031507935227e-05, 'epoch': 0.38}
38%|███▊ | 2534/6638 [2:25:05<3:41:33, 3.24s/it] {'loss': 0.6706, 'grad_norm': 0.6698259219096365, 'learning_rate': 1.4185600256971987e-05, 'epoch': 0.38}
38%|███▊ | 2535/6638 [2:25:08<3:42:46, 3.26s/it] {'loss': 0.6328, 'grad_norm': 0.556987809624015, 'learning_rate': 1.4181168009329209e-05, 'epoch': 0.38}
38%|███▊ | 2536/6638 [2:25:11<3:45:24, 3.30s/it] {'loss': 0.6609, 'grad_norm': 0.5544718389819929, 'learning_rate': 1.4176734766062307e-05, 'epoch': 0.38}
38%|███▊ | 2537/6638 [2:25:15<3:43:46, 3.27s/it] {'loss': 0.6445, 'grad_norm': 0.6407616903658884, 'learning_rate': 1.417230052822693e-05, 'epoch': 0.38}
38%|███▊ | 2538/6638 [2:25:18<3:42:47, 3.26s/it] {'loss': 0.6235, 'grad_norm': 0.5692465436773012, 'learning_rate': 1.4167865296878964e-05, 'epoch': 0.38}
38%|███▊ | 2539/6638 [2:25:21<3:43:04, 3.27s/it] {'loss': 0.6901, 'grad_norm': 0.6053552179949804, 'learning_rate': 1.416342907307453e-05, 'epoch': 0.38}
38%|███▊ | 2540/6638 [2:25:24<3:41:54, 3.25s/it] {'loss': 0.6426, 'grad_norm': 0.5509862785018763, 'learning_rate': 1.4158991857869987e-05, 'epoch': 0.38}
38%|███▊ | 2541/6638 [2:25:28<3:42:18, 3.26s/it] {'loss': 0.6568, 'grad_norm': 0.5930987438654906, 'learning_rate': 1.4154553652321928e-05, 'epoch': 0.38}
38%|███▊ | 2542/6638 [2:25:31<3:43:00, 3.27s/it] {'loss': 0.6381, 'grad_norm': 0.567652701798271, 'learning_rate': 1.4150114457487183e-05, 'epoch': 0.38}
38%|███▊ | 2543/6638 [2:25:34<3:43:49, 3.28s/it] {'loss': 0.6766, 'grad_norm': 0.5589043594154687, 'learning_rate': 1.414567427442282e-05, 'epoch': 0.38}
38%|███▊ | 2544/6638 [2:25:38<3:46:06, 3.31s/it] {'loss': 0.6842, 'grad_norm': 0.6478322105603844, 'learning_rate': 1.4141233104186136e-05, 'epoch': 0.38}
38%|███▊ | 2545/6638 [2:25:41<3:47:27, 3.33s/it] {'loss': 0.7294, 'grad_norm': 0.694637233128844, 'learning_rate': 1.4136790947834673e-05, 'epoch': 0.38}
38%|███▊ | 2546/6638 [2:25:44<3:47:35, 3.34s/it] {'loss': 0.6693, 'grad_norm': 0.6483902011806453, 'learning_rate': 1.4132347806426197e-05, 'epoch': 0.38}
38%|███▊ | 2547/6638 [2:25:48<3:45:52, 3.31s/it] {'loss': 0.6916, 'grad_norm': 0.7187281721995423, 'learning_rate': 1.4127903681018715e-05, 'epoch': 0.38}
38%|███▊ | 2548/6638 [2:25:51<3:44:18, 3.29s/it] {'loss': 0.6634, 'grad_norm': 0.6230454766063559, 'learning_rate': 1.4123458572670468e-05, 'epoch': 0.38}
38%|███▊ | 2549/6638 [2:25:54<3:44:02, 3.29s/it] {'loss': 0.6337, 'grad_norm': 0.5676642903985553, 'learning_rate': 1.4119012482439929e-05, 'epoch': 0.38}
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
38%|███▊ | 2550/6638 [2:25:57<3:44:19, 3.29s/it] {'loss': 0.654, 'grad_norm': 0.689344589077052, 'learning_rate': 1.4114565411385803e-05, 'epoch': 0.38}
38%|███▊ | 2551/6638 [2:26:01<3:43:40, 3.28s/it] {'loss': 0.6483, 'grad_norm': 0.6722400605617483, 'learning_rate': 1.4110117360567037e-05, 'epoch': 0.38}
38%|███▊ | 2552/6638 [2:26:04<3:44:11, 3.29s/it] {'loss': 0.6655, 'grad_norm': 0.6771499239757427, 'learning_rate': 1.41056683310428e-05, 'epoch': 0.38}
38%|███▊ | 2553/6638 [2:26:07<3:46:34, 3.33s/it] {'loss': 0.647, 'grad_norm': 0.6039597514670614, 'learning_rate': 1.4101218323872506e-05, 'epoch': 0.38}
38%|███▊ | 2554/6638 [2:26:11<3:47:41, 3.35s/it] {'loss': 0.6507, 'grad_norm': 0.5609531620963247, 'learning_rate': 1.4096767340115787e-05, 'epoch': 0.38}
38%|███▊ | 2555/6638 [2:26:14<3:46:52, 3.33s/it] {'loss': 0.6979, 'grad_norm': 0.5750237624472484, 'learning_rate': 1.4092315380832522e-05, 'epoch': 0.38}
39%|███▊ | 2556/6638 [2:26:17<3:45:06, 3.31s/it] {'loss': 0.6476, 'grad_norm': 0.619712444938613, 'learning_rate': 1.4087862447082814e-05, 'epoch': 0.39}
39%|███▊ | 2557/6638 [2:26:21<3:44:41, 3.30s/it] {'loss': 0.6829, 'grad_norm': 0.5850548070466677, 'learning_rate': 1.4083408539927002e-05, 'epoch': 0.39}
39%|███▊ | 2558/6638 [2:26:24<3:44:02, 3.29s/it] {'loss': 0.678, 'grad_norm': 0.6522980045300786, 'learning_rate': 1.4078953660425651e-05, 'epoch': 0.39}
39%|███▊ | 2559/6638 [2:26:27<3:45:53, 3.32s/it] {'loss': 0.6793, 'grad_norm': 0.7005171524357422, 'learning_rate': 1.4074497809639569e-05, 'epoch': 0.39}
39%|███▊ | 2560/6638 [2:26:30<3:43:52, 3.29s/it] {'loss': 0.6645, 'grad_norm': 0.5656101763867986, 'learning_rate': 1.4070040988629778e-05, 'epoch': 0.39}
39%|███▊ | 2561/6638 [2:26:34<3:46:08, 3.33s/it] {'loss': 0.6903, 'grad_norm': 0.6109466872748092, 'learning_rate': 1.406558319845755e-05, 'epoch': 0.39}
39%|███▊ | 2562/6638 [2:26:37<3:47:12, 3.34s/it] {'loss': 0.6367, 'grad_norm': 0.5434653296593239, 'learning_rate': 1.4061124440184375e-05, 'epoch': 0.39}
39%|███▊ | 2563/6638 [2:26:41<3:46:06, 3.33s/it] {'loss': 0.6564, 'grad_norm': 0.6191191930068383, 'learning_rate': 1.4056664714871974e-05, 'epoch': 0.39}
39%|███▊ | 2564/6638 [2:26:44<3:43:50, 3.30s/it] {'loss': 0.646, 'grad_norm': 0.5981473566336004, 'learning_rate': 1.4052204023582305e-05, 'epoch': 0.39}
39%|███▊ | 2565/6638 [2:26:47<3:43:06, 3.29s/it] {'loss': 0.6534, 'grad_norm': 0.6314994508609677, 'learning_rate': 1.4047742367377555e-05, 'epoch': 0.39}
39%|███▊ | 2566/6638 [2:26:50<3:43:41, 3.30s/it] {'loss': 0.6814, 'grad_norm': 0.6336994581357146, 'learning_rate': 1.4043279747320133e-05, 'epoch': 0.39}
39%|███▊ | 2567/6638 [2:26:54<3:46:17, 3.34s/it] {'loss': 0.6883, 'grad_norm': 0.6876304076951957, 'learning_rate': 1.4038816164472686e-05, 'epoch': 0.39}
39%|███▊ | 2568/6638 [2:26:57<3:45:57, 3.33s/it] {'loss': 0.7074, 'grad_norm': 0.6300368107778678, 'learning_rate': 1.4034351619898088e-05, 'epoch': 0.39}
39%|███▊ | 2569/6638 [2:27:00<3:47:18, 3.35s/it] {'loss': 0.7008, 'grad_norm': 0.5873476999308366, 'learning_rate': 1.4029886114659434e-05, 'epoch': 0.39}
39%|███▊ | 2570/6638 [2:27:04<3:43:24, 3.29s/it] {'loss': 0.66, 'grad_norm': 0.6116991421402854, 'learning_rate': 1.4025419649820065e-05, 'epoch': 0.39}
39%|███▊ | 2571/6638 [2:27:07<3:41:24, 3.27s/it] {'loss': 0.6692, 'grad_norm': 0.5883588767768978, 'learning_rate': 1.4020952226443534e-05, 'epoch': 0.39}
39%|███▊ | 2572/6638 [2:27:10<3:45:03, 3.32s/it] {'loss': 0.6754, 'grad_norm': 0.6451279677283954, 'learning_rate': 1.4016483845593629e-05, 'epoch': 0.39}
39%|███▉ | 2573/6638 [2:27:14<3:44:37, 3.32s/it] {'loss': 0.6326, 'grad_norm': 0.5457878381498928, 'learning_rate': 1.4012014508334366e-05, 'epoch': 0.39}
39%|███▉ | 2574/6638 [2:27:17<3:41:41, 3.27s/it] {'loss': 0.6647, 'grad_norm': 0.6497318276980403, 'learning_rate': 1.4007544215729991e-05, 'epoch': 0.39}
39%|███▉ | 2575/6638 [2:27:20<3:39:57, 3.25s/it] {'loss': 0.6595, 'grad_norm': 0.660849012655293, 'learning_rate': 1.4003072968844969e-05, 'epoch': 0.39}
39%|███▉ | 2576/6638 [2:27:23<3:41:28, 3.27s/it] {'loss': 0.6401, 'grad_norm': 0.601312461146337, 'learning_rate': 1.3998600768744006e-05, 'epoch': 0.39}
39%|███▉ | 2577/6638 [2:27:27<3:44:13, 3.31s/it] {'loss': 0.6516, 'grad_norm': 0.5911011933938373, 'learning_rate': 1.399412761649202e-05, 'epoch': 0.39}
39%|███▉ | 2578/6638 [2:27:30<3:43:16, 3.30s/it] {'loss': 0.64, 'grad_norm': 0.6064500098111769, 'learning_rate': 1.3989653513154165e-05, 'epoch': 0.39}
39%|███▉ | 2579/6638 [2:27:33<3:40:42, 3.26s/it] {'loss': 0.6233, 'grad_norm': 0.5941879997297846, 'learning_rate': 1.3985178459795819e-05, 'epoch': 0.39}
39%|███▉ | 2580/6638 [2:27:37<3:45:54, 3.34s/it] {'loss': 0.6141, 'grad_norm': 0.5651537216428345, 'learning_rate': 1.3980702457482588e-05, 'epoch': 0.39}
39%|███▉ | 2581/6638 [2:27:40<3:43:22, 3.30s/it] {'loss': 0.6259, 'grad_norm': 0.6065785485148942, 'learning_rate': 1.39762255072803e-05, 'epoch': 0.39}
39%|███▉ | 2582/6638 [2:27:43<3:42:14, 3.29s/it] {'loss': 0.6461, 'grad_norm': 0.8086359894166403, 'learning_rate': 1.3971747610255015e-05, 'epoch': 0.39}
39%|███▉ | 2583/6638 [2:27:46<3:40:53, 3.27s/it] {'loss': 0.6702, 'grad_norm': 0.6325842552825035, 'learning_rate': 1.3967268767473008e-05, 'epoch': 0.39}
39%|███▉ | 2584/6638 [2:27:50<3:39:49, 3.25s/it] {'loss': 0.6383, 'grad_norm': 0.5276798774725563, 'learning_rate': 1.3962788980000793e-05, 'epoch': 0.39}
39%|███▉ | 2585/6638 [2:27:53<3:40:37, 3.27s/it] {'loss': 0.6364, 'grad_norm': 0.5894808367787141, 'learning_rate': 1.3958308248905097e-05, 'epoch': 0.39}
39%|███▉ | 2586/6638 [2:27:56<3:40:36, 3.27s/it] {'loss': 0.6343, 'grad_norm': 0.6277358371186186, 'learning_rate': 1.3953826575252878e-05, 'epoch': 0.39}
39%|███▉ | 2587/6638 [2:27:59<3:42:04, 3.29s/it] {'loss': 0.7173, 'grad_norm': 0.6784445708613361, 'learning_rate': 1.3949343960111315e-05, 'epoch': 0.39}
39%|███▉ | 2588/6638 [2:28:03<3:40:09, 3.26s/it] {'loss': 0.6548, 'grad_norm': 0.627111502022629, 'learning_rate': 1.3944860404547816e-05, 'epoch': 0.39}
39%|███▉ | 2589/6638 [2:28:06<3:39:25, 3.25s/it] {'loss': 0.7291, 'grad_norm': 0.6537634436571356, 'learning_rate': 1.3940375909630006e-05, 'epoch': 0.39}
39%|███▉ | 2590/6638 [2:28:09<3:40:30, 3.27s/it] {'loss': 0.6753, 'grad_norm': 0.5951414533091234, 'learning_rate': 1.3935890476425744e-05, 'epoch': 0.39}
39%|███▉ | 2591/6638 [2:28:13<3:43:25, 3.31s/it] {'loss': 0.6724, 'grad_norm': 0.7985624090762588, 'learning_rate': 1.3931404106003101e-05, 'epoch': 0.39}
39%|███▉ | 2592/6638 [2:28:16<3:40:51, 3.28s/it] {'loss': 0.6559, 'grad_norm': 0.5871075713598061, 'learning_rate': 1.3926916799430376e-05, 'epoch': 0.39}
39%|███▉ | 2593/6638 [2:28:19<3:41:23, 3.28s/it] {'loss': 0.6616, 'grad_norm': 0.6057763147957962, 'learning_rate': 1.3922428557776094e-05, 'epoch': 0.39}
39%|███▉ | 2594/6638 [2:28:22<3:40:36, 3.27s/it] {'loss': 0.6473, 'grad_norm': 0.634869037000274, 'learning_rate': 1.3917939382108998e-05, 'epoch': 0.39}
39%|███▉ | 2595/6638 [2:28:26<3:44:25, 3.33s/it] {'loss': 0.661, 'grad_norm': 0.6486941778773625, 'learning_rate': 1.3913449273498057e-05, 'epoch': 0.39}
39%|███▉ | 2596/6638 [2:28:29<3:44:34, 3.33s/it] {'loss': 0.6624, 'grad_norm': 0.5807026766699924, 'learning_rate': 1.3908958233012457e-05, 'epoch': 0.39}
39%|███▉ | 2597/6638 [2:28:32<3:43:46, 3.32s/it] {'loss': 0.689, 'grad_norm': 0.7299304899205923, 'learning_rate': 1.3904466261721613e-05, 'epoch': 0.39}
39%|███▉ | 2598/6638 [2:28:36<3:45:56, 3.36s/it] {'loss': 0.7072, 'grad_norm': 0.6685247205726235, 'learning_rate': 1.3899973360695154e-05, 'epoch': 0.39}
39%|███▉ | 2599/6638 [2:28:39<3:42:08, 3.30s/it] {'loss': 0.622, 'grad_norm': 0.715758680409633, 'learning_rate': 1.3895479531002942e-05, 'epoch': 0.39}
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
39%|███▉ | 2600/6638 [2:28:42<3:40:57, 3.28s/it] {'loss': 0.652, 'grad_norm': 0.6132565239530762, 'learning_rate': 1.3890984773715042e-05, 'epoch': 0.39}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2600/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2600/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2600/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch.
If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
39%|███▉ | 2601/6638 [2:28:58<7:56:50, 7.09s/it] {'loss': 0.5884, 'grad_norm': 0.5472194266116885, 'learning_rate': 1.3886489089901756e-05, 'epoch': 0.39}
39%|███▉ | 2602/6638 [2:29:02<6:40:05, 5.95s/it] {'loss': 0.6353, 'grad_norm': 0.7074589238993341, 'learning_rate': 1.3881992480633601e-05, 'epoch': 0.39}
39%|███▉ | 2603/6638 [2:29:05<5:45:14, 5.13s/it] {'loss': 0.6608, 'grad_norm': 0.6592602997329216, 'learning_rate': 1.3877494946981313e-05, 'epoch': 0.39}
39%|███▉ | 2604/6638 [2:29:08<5:06:24, 4.56s/it] {'loss': 0.5931, 'grad_norm': 0.5664386679046488, 'learning_rate': 1.3872996490015853e-05, 'epoch': 0.39}
39%|███▉ | 2605/6638 [2:29:11<4:41:29, 4.19s/it] {'loss': 0.7323, 'grad_norm': 0.7578968950840951, 'learning_rate': 1.3868497110808394e-05, 'epoch': 0.39}
39%|███▉ | 2606/6638 [2:29:15<4:23:59, 3.93s/it] {'loss': 0.6864, 'grad_norm': 0.6910917085124056, 'learning_rate': 1.3863996810430333e-05, 'epoch': 0.39}
39%|███▉ | 2607/6638 [2:29:18<4:09:48, 3.72s/it] {'loss': 0.6674, 'grad_norm': 0.6295301759165914, 'learning_rate': 1.3859495589953289e-05, 'epoch': 0.39}
39%|███▉ | 2608/6638 [2:29:21<3:58:55, 3.56s/it] {'loss': 0.7467, 'grad_norm': 0.7413298702431517, 'learning_rate': 1.3854993450449095e-05, 'epoch': 0.39}
39%|███▉ | 2609/6638 [2:29:25<3:56:10, 3.52s/it] {'loss': 0.6367, 'grad_norm': 0.6356020840887571, 'learning_rate': 1.3850490392989808e-05, 'epoch': 0.39}
39%|███▉ | 2610/6638 [2:29:28<3:49:36, 3.42s/it] {'loss': 0.6582, 'grad_norm': 0.6137165646978151, 'learning_rate': 1.3845986418647697e-05, 'epoch': 0.39}
39%|███▉ | 2611/6638 [2:29:31<3:46:24, 3.37s/it] {'loss': 0.6794, 'grad_norm': 0.7013657013575136, 'learning_rate': 1.3841481528495255e-05, 'epoch': 0.39}
39%|███▉ | 2612/6638 [2:29:34<3:43:58, 3.34s/it] {'loss': 0.6489, 'grad_norm': 0.6405192909786248, 'learning_rate': 1.383697572360519e-05, 'epoch': 0.39}
39%|███▉ | 2613/6638 [2:29:38<3:43:48, 3.34s/it] {'loss': 0.641, 'grad_norm': 0.6019724623979493, 'learning_rate': 1.3832469005050433e-05, 'epoch': 0.39}
39%|███▉ | 2614/6638 [2:29:41<3:42:57, 3.32s/it] {'loss': 0.6508, 'grad_norm': 0.6328796846111497, 'learning_rate': 1.3827961373904126e-05, 'epoch': 0.39}
39%|███▉ | 2615/6638 [2:29:44<3:41:02, 3.30s/it] {'loss': 0.6518, 'grad_norm': 0.6905948222465111, 'learning_rate': 1.382345283123963e-05, 'epoch': 0.39}
39%|███▉ | 2616/6638 [2:29:47<3:40:05, 3.28s/it] {'loss': 0.6452, 'grad_norm': 0.587907833003692, 'learning_rate': 1.3818943378130522e-05, 'epoch': 0.39}
39%|███▉ | 2617/6638 [2:29:51<3:39:50, 3.28s/it] {'loss': 0.6699, 'grad_norm': 0.6035620432104476, 'learning_rate': 1.3814433015650605e-05, 'epoch': 0.39}
39%|███▉ | 2618/6638 [2:29:54<3:39:32, 3.28s/it] {'loss': 0.6598, 'grad_norm': 0.6280751365526488, 'learning_rate': 1.3809921744873885e-05, 'epoch': 0.39}
39%|███▉ | 2619/6638 [2:29:57<3:37:58, 3.25s/it] {'loss': 0.6584, 'grad_norm': 0.6266514938510479, 'learning_rate': 1.380540956687459e-05, 'epoch': 0.39}
39%|███▉ | 2620/6638 [2:30:00<3:38:11, 3.26s/it] {'loss': 0.6355, 'grad_norm': 0.6128566474583504, 'learning_rate': 1.380089648272717e-05, 'epoch': 0.39}
39%|███▉ | 2621/6638 [2:30:04<3:38:53, 3.27s/it] {'loss': 0.6438, 'grad_norm': 0.5781439071546397, 'learning_rate': 1.3796382493506277e-05, 'epoch': 0.39}
39%|███▉ | 2622/6638 [2:30:07<3:40:33, 3.30s/it] {'loss': 0.6904, 'grad_norm': 0.6567437137833017, 'learning_rate': 1.3791867600286795e-05, 'epoch': 0.39}
40%|███▉ | 2623/6638 [2:30:10<3:44:21, 3.35s/it] {'loss': 0.6659, 'grad_norm': 0.6543573539122424, 'learning_rate': 1.3787351804143812e-05, 'epoch': 0.4}
40%|███▉ | 2624/6638 [2:30:14<3:46:05, 3.38s/it] {'loss': 0.6313, 'grad_norm': 0.5609769981994861, 'learning_rate': 1.3782835106152634e-05, 'epoch': 0.4}
40%|███▉ | 2625/6638 [2:30:17<3:43:48, 3.35s/it] {'loss': 0.6994, 'grad_norm': 0.9447356102121621, 'learning_rate': 1.3778317507388778e-05, 'epoch': 0.4}
40%|███▉ | 2626/6638 [2:30:21<3:45:32, 3.37s/it] {'loss': 0.7061, 'grad_norm': 0.7142427791965085, 'learning_rate': 1.377379900892799e-05, 'epoch': 0.4}
40%|███▉ | 2627/6638 [2:30:24<3:43:20, 3.34s/it] {'loss': 0.6667, 'grad_norm': 0.5813880763542632, 'learning_rate': 1.3769279611846209e-05,
'epoch': 0.4} 40%|███▉ | 2627/6638 [2:30:24<3:43:20, 3.34s/it] 40%|███▉ | 2628/6638 [2:30:27<3:41:22, 3.31s/it] {'loss': 0.6722, 'grad_norm': 0.5951796144297049, 'learning_rate': 1.3764759317219607e-05, 'epoch': 0.4} 40%|███▉ | 2628/6638 [2:30:27<3:41:22, 3.31s/it] 40%|███▉ | 2629/6638 [2:30:30<3:39:46, 3.29s/it] {'loss': 0.6943, 'grad_norm': 0.6533736833776682, 'learning_rate': 1.3760238126124552e-05, 'epoch': 0.4} 40%|███▉ | 2629/6638 [2:30:30<3:39:46, 3.29s/it] 40%|███▉ | 2630/6638 [2:30:34<3:37:24, 3.25s/it] {'loss': 0.6091, 'grad_norm': 0.5344637796861096, 'learning_rate': 1.375571603963764e-05, 'epoch': 0.4} 40%|███▉ | 2630/6638 [2:30:34<3:37:24, 3.25s/it] 40%|███▉ | 2631/6638 [2:30:37<3:44:00, 3.35s/it] {'loss': 0.6837, 'grad_norm': 0.6226169398317385, 'learning_rate': 1.3751193058835675e-05, 'epoch': 0.4} 40%|███▉ | 2631/6638 [2:30:37<3:44:00, 3.35s/it] 40%|███▉ | 2632/6638 [2:30:40<3:41:25, 3.32s/it] {'loss': 0.6583, 'grad_norm': 0.5733354805403619, 'learning_rate': 1.3746669184795677e-05, 'epoch': 0.4} 40%|███▉ | 2632/6638 [2:30:40<3:41:25, 3.32s/it] 40%|███▉ | 2633/6638 [2:30:44<3:40:48, 3.31s/it] {'loss': 0.6534, 'grad_norm': 0.5572526870439747, 'learning_rate': 1.3742144418594869e-05, 'epoch': 0.4} 40%|███▉ | 2633/6638 [2:30:44<3:40:48, 3.31s/it] 40%|███▉ | 2634/6638 [2:30:47<3:44:49, 3.37s/it] {'loss': 0.6551, 'grad_norm': 0.5786652592081828, 'learning_rate': 1.3737618761310695e-05, 'epoch': 0.4} 40%|███▉ | 2634/6638 [2:30:47<3:44:49, 3.37s/it] 40%|███▉ | 2635/6638 [2:30:50<3:42:58, 3.34s/it] {'loss': 0.6766, 'grad_norm': 0.6004841146766007, 'learning_rate': 1.3733092214020813e-05, 'epoch': 0.4} 40%|███▉ | 2635/6638 [2:30:50<3:42:58, 3.34s/it] 40%|███▉ | 2636/6638 [2:30:54<3:41:39, 3.32s/it] {'loss': 0.6526, 'grad_norm': 0.5954329767163178, 'learning_rate': 1.3728564777803089e-05, 'epoch': 0.4} 40%|███▉ | 2636/6638 [2:30:54<3:41:39, 3.32s/it] 40%|███▉ | 2637/6638 [2:30:57<3:40:37, 3.31s/it] {'loss': 0.6425, 'grad_norm': 0.6453779427230295, 
'learning_rate': 1.3724036453735593e-05, 'epoch': 0.4} 40%|███▉ | 2637/6638 [2:30:57<3:40:37, 3.31s/it] 40%|███▉ | 2638/6638 [2:31:00<3:39:37, 3.29s/it] {'loss': 0.6865, 'grad_norm': 0.6207255169311229, 'learning_rate': 1.3719507242896625e-05, 'epoch': 0.4} 40%|███▉ | 2638/6638 [2:31:00<3:39:37, 3.29s/it] 40%|███▉ | 2639/6638 [2:31:04<3:39:53, 3.30s/it] {'loss': 0.6866, 'grad_norm': 0.6173350650392379, 'learning_rate': 1.3714977146364676e-05, 'epoch': 0.4} 40%|███▉ | 2639/6638 [2:31:04<3:39:53, 3.30s/it] 40%|███▉ | 2640/6638 [2:31:07<3:38:31, 3.28s/it] {'loss': 0.6665, 'grad_norm': 0.6872325900497246, 'learning_rate': 1.3710446165218465e-05, 'epoch': 0.4} 40%|███▉ | 2640/6638 [2:31:07<3:38:31, 3.28s/it] 40%|███▉ | 2641/6638 [2:31:10<3:39:42, 3.30s/it] {'loss': 0.6932, 'grad_norm': 0.6971249108996206, 'learning_rate': 1.370591430053691e-05, 'epoch': 0.4} 40%|███▉ | 2641/6638 [2:31:10<3:39:42, 3.30s/it] 40%|███▉ | 2642/6638 [2:31:13<3:38:22, 3.28s/it] {'loss': 0.7074, 'grad_norm': 0.6656731297861789, 'learning_rate': 1.3701381553399147e-05, 'epoch': 0.4} 40%|███▉ | 2642/6638 [2:31:13<3:38:22, 3.28s/it] 40%|███▉ | 2643/6638 [2:31:17<3:41:07, 3.32s/it] {'loss': 0.6859, 'grad_norm': 0.6417526711797407, 'learning_rate': 1.369684792488451e-05, 'epoch': 0.4} 40%|███▉ | 2643/6638 [2:31:17<3:41:07, 3.32s/it] 40%|███▉ | 2644/6638 [2:31:20<3:40:39, 3.31s/it] {'loss': 0.6332, 'grad_norm': 0.613906625330289, 'learning_rate': 1.369231341607256e-05, 'epoch': 0.4} 40%|███▉ | 2644/6638 [2:31:20<3:40:39, 3.31s/it] 40%|███▉ | 2645/6638 [2:31:24<3:43:43, 3.36s/it] {'loss': 0.6578, 'grad_norm': 0.6041212929606182, 'learning_rate': 1.3687778028043055e-05, 'epoch': 0.4} 40%|███▉ | 2645/6638 [2:31:24<3:43:43, 3.36s/it] 40%|███▉ | 2646/6638 [2:31:27<3:39:12, 3.29s/it] {'loss': 0.6402, 'grad_norm': 0.5795516871485059, 'learning_rate': 1.368324176187597e-05, 'epoch': 0.4} 40%|███▉ | 2646/6638 [2:31:27<3:39:12, 3.29s/it] 40%|███▉ | 2647/6638 [2:31:30<3:38:11, 3.28s/it] {'loss': 0.6653, 
'grad_norm': 0.5850659996610701, 'learning_rate': 1.3678704618651481e-05, 'epoch': 0.4} 40%|███▉ | 2647/6638 [2:31:30<3:38:11, 3.28s/it] 40%|███▉ | 2648/6638 [2:31:33<3:38:24, 3.28s/it] {'loss': 0.6785, 'grad_norm': 0.6551816286849942, 'learning_rate': 1.3674166599449978e-05, 'epoch': 0.4} 40%|███▉ | 2648/6638 [2:31:33<3:38:24, 3.28s/it] 40%|███▉ | 2649/6638 [2:31:37<3:40:59, 3.32s/it] {'loss': 0.668, 'grad_norm': 0.6208187891826744, 'learning_rate': 1.3669627705352063e-05, 'epoch': 0.4} 40%|███▉ | 2649/6638 [2:31:37<3:40:59, 3.32s/it]2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 40%|███▉ | 2650/6638 [2:31:40<3:40:38, 3.32s/it]1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... {'loss': 0.6392, 'grad_norm': 0.613205087818952, 'learning_rate': 1.3665087937438537e-05, 'epoch': 0.4} 40%|███▉ | 2650/6638 [2:31:40<3:40:38, 3.32s/it] 40%|███▉ | 2651/6638 [2:31:43<3:39:53, 3.31s/it] {'loss': 0.7029, 'grad_norm': 0.7284908154489154, 'learning_rate': 1.3660547296790418e-05, 'epoch': 0.4} 40%|███▉ | 2651/6638 [2:31:43<3:39:53, 3.31s/it] 40%|███▉ | 2652/6638 [2:31:47<3:39:58, 3.31s/it] {'loss': 0.6739, 'grad_norm': 0.5542745511775922, 'learning_rate': 1.3656005784488925e-05, 'epoch': 0.4} 40%|███▉ | 2652/6638 [2:31:47<3:39:58, 3.31s/it] 40%|███▉ | 2653/6638 [2:31:50<3:38:05, 3.28s/it] {'loss': 0.6721, 'grad_norm': 0.6034799327971151, 'learning_rate': 1.365146340161549e-05, 'epoch': 0.4} 40%|███▉ | 2653/6638 [2:31:50<3:38:05, 3.28s/it] 40%|███▉ | 2654/6638 [2:31:53<3:35:58, 3.25s/it] {'loss': 0.6598, 'grad_norm': 0.5743394089085412, 'learning_rate': 1.364692014925175e-05, 'epoch': 0.4} 40%|███▉ | 2654/6638 [2:31:53<3:35:58, 3.25s/it] 40%|███▉ | 2655/6638 [2:31:56<3:39:58, 3.31s/it] 
{'loss': 0.6727, 'grad_norm': 0.6201758819994845, 'learning_rate': 1.364237602847955e-05, 'epoch': 0.4} 40%|███▉ | 2655/6638 [2:31:56<3:39:58, 3.31s/it] 40%|████ | 2656/6638 [2:32:00<3:37:33, 3.28s/it] {'loss': 0.6649, 'grad_norm': 0.590157712143155, 'learning_rate': 1.3637831040380937e-05, 'epoch': 0.4} 40%|████ | 2656/6638 [2:32:00<3:37:33, 3.28s/it] 40%|████ | 2657/6638 [2:32:03<3:36:19, 3.26s/it] {'loss': 0.6655, 'grad_norm': 0.6030358413759392, 'learning_rate': 1.3633285186038172e-05, 'epoch': 0.4} 40%|████ | 2657/6638 [2:32:03<3:36:19, 3.26s/it] 40%|████ | 2658/6638 [2:32:06<3:38:19, 3.29s/it] {'loss': 0.7335, 'grad_norm': 0.6402159426070578, 'learning_rate': 1.3628738466533716e-05, 'epoch': 0.4} 40%|████ | 2658/6638 [2:32:06<3:38:19, 3.29s/it] 40%|████ | 2659/6638 [2:32:09<3:37:49, 3.28s/it] {'loss': 0.6489, 'grad_norm': 0.5641091309653401, 'learning_rate': 1.3624190882950241e-05, 'epoch': 0.4} 40%|████ | 2659/6638 [2:32:09<3:37:49, 3.28s/it] 40%|████ | 2660/6638 [2:32:13<3:37:51, 3.29s/it] {'loss': 0.6596, 'grad_norm': 0.6362925435725149, 'learning_rate': 1.361964243637062e-05, 'epoch': 0.4} 40%|████ | 2660/6638 [2:32:13<3:37:51, 3.29s/it] 40%|████ | 2661/6638 [2:32:16<3:39:09, 3.31s/it] {'loss': 0.6515, 'grad_norm': 0.5698077998161807, 'learning_rate': 1.3615093127877938e-05, 'epoch': 0.4} 40%|████ | 2661/6638 [2:32:16<3:39:09, 3.31s/it] 40%|████ | 2662/6638 [2:32:19<3:37:53, 3.29s/it] {'loss': 0.6995, 'grad_norm': 0.6990184084943794, 'learning_rate': 1.3610542958555475e-05, 'epoch': 0.4} 40%|████ | 2662/6638 [2:32:19<3:37:53, 3.29s/it] 40%|████ | 2663/6638 [2:32:23<3:36:03, 3.26s/it] {'loss': 0.6956, 'grad_norm': 0.6539938220802892, 'learning_rate': 1.3605991929486728e-05, 'epoch': 0.4} 40%|████ | 2663/6638 [2:32:23<3:36:03, 3.26s/it] 40%|████ | 2664/6638 [2:32:26<3:35:23, 3.25s/it] {'loss': 0.641, 'grad_norm': 0.6009756045212526, 'learning_rate': 1.3601440041755386e-05, 'epoch': 0.4} 40%|████ | 2664/6638 [2:32:26<3:35:23, 3.25s/it] 40%|████ | 2665/6638 
[2:32:29<3:35:43, 3.26s/it] {'loss': 0.655, 'grad_norm': 0.5585560098467558, 'learning_rate': 1.359688729644536e-05, 'epoch': 0.4} 40%|████ | 2665/6638 [2:32:29<3:35:43, 3.26s/it] 40%|████ | 2666/6638 [2:32:33<3:39:42, 3.32s/it] {'loss': 0.6594, 'grad_norm': 0.5554522453841938, 'learning_rate': 1.3592333694640743e-05, 'epoch': 0.4} 40%|████ | 2666/6638 [2:32:33<3:39:42, 3.32s/it] 40%|████ | 2667/6638 [2:32:36<3:39:47, 3.32s/it] {'loss': 0.6367, 'grad_norm': 0.5832132981455574, 'learning_rate': 1.358777923742585e-05, 'epoch': 0.4} 40%|████ | 2667/6638 [2:32:36<3:39:47, 3.32s/it] 40%|████ | 2668/6638 [2:32:39<3:39:57, 3.32s/it] {'loss': 0.65, 'grad_norm': 0.5732745188601805, 'learning_rate': 1.3583223925885191e-05, 'epoch': 0.4} 40%|████ | 2668/6638 [2:32:39<3:39:57, 3.32s/it] 40%|████ | 2669/6638 [2:32:43<3:40:43, 3.34s/it] {'loss': 0.6861, 'grad_norm': 0.6279616089555227, 'learning_rate': 1.3578667761103483e-05, 'epoch': 0.4} 40%|████ | 2669/6638 [2:32:43<3:40:43, 3.34s/it] 40%|████ | 2670/6638 [2:32:46<3:38:29, 3.30s/it] {'loss': 0.6468, 'grad_norm': 0.7155811177058317, 'learning_rate': 1.3574110744165644e-05, 'epoch': 0.4} 40%|████ | 2670/6638 [2:32:46<3:38:29, 3.30s/it] 40%|████ | 2671/6638 [2:32:49<3:38:12, 3.30s/it] {'loss': 0.6294, 'grad_norm': 0.5989400263182268, 'learning_rate': 1.3569552876156798e-05, 'epoch': 0.4} 40%|████ | 2671/6638 [2:32:49<3:38:12, 3.30s/it] 40%|████ | 2672/6638 [2:32:52<3:38:56, 3.31s/it] {'loss': 0.6666, 'grad_norm': 0.7111790473424381, 'learning_rate': 1.3564994158162261e-05, 'epoch': 0.4} 40%|████ | 2672/6638 [2:32:52<3:38:56, 3.31s/it] 40%|████ | 2673/6638 [2:32:56<3:39:15, 3.32s/it] {'loss': 0.657, 'grad_norm': 0.6105831369066208, 'learning_rate': 1.356043459126757e-05, 'epoch': 0.4} 40%|████ | 2673/6638 [2:32:56<3:39:15, 3.32s/it] 40%|████ | 2674/6638 [2:32:59<3:42:33, 3.37s/it] {'loss': 0.6736, 'grad_norm': 0.6800520606305775, 'learning_rate': 1.3555874176558445e-05, 'epoch': 0.4} 40%|████ | 2674/6638 [2:32:59<3:42:33, 
3.37s/it] 40%|████ | 2675/6638 [2:33:03<3:41:25, 3.35s/it] {'loss': 0.7156, 'grad_norm': 0.6510144722689402, 'learning_rate': 1.3551312915120826e-05, 'epoch': 0.4} 40%|████ | 2675/6638 [2:33:03<3:41:25, 3.35s/it] 40%|████ | 2676/6638 [2:33:06<3:41:22, 3.35s/it] {'loss': 0.6356, 'grad_norm': 0.6531394005678803, 'learning_rate': 1.3546750808040841e-05, 'epoch': 0.4} 40%|████ | 2676/6638 [2:33:06<3:41:22, 3.35s/it] 40%|████ | 2677/6638 [2:33:09<3:40:39, 3.34s/it] {'loss': 0.6774, 'grad_norm': 0.6561500720772209, 'learning_rate': 1.3542187856404823e-05, 'epoch': 0.4} 40%|████ | 2677/6638 [2:33:09<3:40:39, 3.34s/it] 40%|████ | 2678/6638 [2:33:13<3:40:00, 3.33s/it] {'loss': 0.6728, 'grad_norm': 0.6115965584647265, 'learning_rate': 1.3537624061299303e-05, 'epoch': 0.4} 40%|████ | 2678/6638 [2:33:13<3:40:00, 3.33s/it] 40%|████ | 2679/6638 [2:33:16<3:39:35, 3.33s/it] {'loss': 0.7027, 'grad_norm': 0.6602142521836323, 'learning_rate': 1.3533059423811026e-05, 'epoch': 0.4} 40%|████ | 2679/6638 [2:33:16<3:39:35, 3.33s/it] 40%|████ | 2680/6638 [2:33:19<3:37:07, 3.29s/it] {'loss': 0.6211, 'grad_norm': 0.6799135684507582, 'learning_rate': 1.352849394502692e-05, 'epoch': 0.4} 40%|████ | 2680/6638 [2:33:19<3:37:07, 3.29s/it] 40%|████ | 2681/6638 [2:33:22<3:38:00, 3.31s/it] {'loss': 0.6739, 'grad_norm': 0.669940906628508, 'learning_rate': 1.3523927626034126e-05, 'epoch': 0.4} 40%|████ | 2681/6638 [2:33:22<3:38:00, 3.31s/it] 40%|████ | 2682/6638 [2:33:26<3:39:24, 3.33s/it] {'loss': 0.6894, 'grad_norm': 0.6763557626841712, 'learning_rate': 1.3519360467919977e-05, 'epoch': 0.4} 40%|████ | 2682/6638 [2:33:26<3:39:24, 3.33s/it] 40%|████ | 2683/6638 [2:33:29<3:40:00, 3.34s/it] {'loss': 0.6239, 'grad_norm': 0.5665882995191839, 'learning_rate': 1.3514792471772013e-05, 'epoch': 0.4} 40%|████ | 2683/6638 [2:33:29<3:40:00, 3.34s/it] 40%|████ | 2684/6638 [2:33:32<3:37:37, 3.30s/it] {'loss': 0.6576, 'grad_norm': 0.6456437987956417, 'learning_rate': 1.3510223638677973e-05, 'epoch': 0.4} 40%|████ | 
2684/6638 [2:33:32<3:37:37, 3.30s/it] 40%|████ | 2685/6638 [2:33:36<3:37:01, 3.29s/it] {'loss': 0.6406, 'grad_norm': 0.6596032118763698, 'learning_rate': 1.3505653969725785e-05, 'epoch': 0.4} 40%|████ | 2685/6638 [2:33:36<3:37:01, 3.29s/it] 40%|████ | 2686/6638 [2:33:39<3:36:08, 3.28s/it] {'loss': 0.6525, 'grad_norm': 0.6294384622675823, 'learning_rate': 1.3501083466003589e-05, 'epoch': 0.4} 40%|████ | 2686/6638 [2:33:39<3:36:08, 3.28s/it] 40%|████ | 2687/6638 [2:33:42<3:35:21, 3.27s/it] {'loss': 0.6623, 'grad_norm': 0.6849114503802156, 'learning_rate': 1.3496512128599713e-05, 'epoch': 0.4} 40%|████ | 2687/6638 [2:33:42<3:35:21, 3.27s/it] 40%|████ | 2688/6638 [2:33:45<3:34:01, 3.25s/it] {'loss': 0.6494, 'grad_norm': 0.5750352908501657, 'learning_rate': 1.3491939958602691e-05, 'epoch': 0.4} 40%|████ | 2688/6638 [2:33:45<3:34:01, 3.25s/it] 41%|████ | 2689/6638 [2:33:48<3:32:26, 3.23s/it] {'loss': 0.666, 'grad_norm': 0.6388510020550072, 'learning_rate': 1.3487366957101258e-05, 'epoch': 0.41} 41%|████ | 2689/6638 [2:33:48<3:32:26, 3.23s/it] 41%|████ | 2690/6638 [2:33:52<3:33:22, 3.24s/it] {'loss': 0.6553, 'grad_norm': 0.608562845920392, 'learning_rate': 1.3482793125184332e-05, 'epoch': 0.41} 41%|████ | 2690/6638 [2:33:52<3:33:22, 3.24s/it] 41%|████ | 2691/6638 [2:33:55<3:32:38, 3.23s/it] {'loss': 0.6625, 'grad_norm': 0.6907210702126629, 'learning_rate': 1.3478218463941047e-05, 'epoch': 0.41} 41%|████ | 2691/6638 [2:33:55<3:32:38, 3.23s/it] 41%|████ | 2692/6638 [2:33:58<3:34:45, 3.27s/it] {'loss': 0.6437, 'grad_norm': 0.5839041500626532, 'learning_rate': 1.3473642974460724e-05, 'epoch': 0.41} 41%|████ | 2692/6638 [2:33:58<3:34:45, 3.27s/it] 41%|████ | 2693/6638 [2:34:02<3:34:33, 3.26s/it] {'loss': 0.6414, 'grad_norm': 0.6303946892213776, 'learning_rate': 1.3469066657832882e-05, 'epoch': 0.41} 41%|████ | 2693/6638 [2:34:02<3:34:33, 3.26s/it] 41%|████ | 2694/6638 [2:34:05<3:33:32, 3.25s/it] {'loss': 0.6423, 'grad_norm': 0.585201875014543, 'learning_rate': 
1.3464489515147239e-05, 'epoch': 0.41} 41%|████ | 2694/6638 [2:34:05<3:33:32, 3.25s/it] 41%|████ | 2695/6638 [2:34:08<3:33:29, 3.25s/it] {'loss': 0.6814, 'grad_norm': 0.6654440645529996, 'learning_rate': 1.3459911547493704e-05, 'epoch': 0.41} 41%|████ | 2695/6638 [2:34:08<3:33:29, 3.25s/it] 41%|████ | 2696/6638 [2:34:11<3:36:02, 3.29s/it] {'loss': 0.6782, 'grad_norm': 0.6463427974536462, 'learning_rate': 1.3455332755962398e-05, 'epoch': 0.41} 41%|████ | 2696/6638 [2:34:11<3:36:02, 3.29s/it] 41%|████ | 2697/6638 [2:34:15<3:35:27, 3.28s/it] {'loss': 0.6816, 'grad_norm': 0.6477695257007249, 'learning_rate': 1.345075314164362e-05, 'epoch': 0.41} 41%|████ | 2697/6638 [2:34:15<3:35:27, 3.28s/it] 41%|████ | 2698/6638 [2:34:18<3:35:39, 3.28s/it] {'loss': 0.6864, 'grad_norm': 0.6359381833017643, 'learning_rate': 1.3446172705627878e-05, 'epoch': 0.41} 41%|████ | 2698/6638 [2:34:18<3:35:39, 3.28s/it] 41%|████ | 2699/6638 [2:34:21<3:37:20, 3.31s/it] {'loss': 0.6336, 'grad_norm': 0.544741947022796, 'learning_rate': 1.3441591449005862e-05, 'epoch': 0.41} 41%|████ | 2699/6638 [2:34:21<3:37:20, 3.31s/it]2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 41%|████ | 2700/6638 [2:34:25<3:36:39, 3.30s/it]6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6777, 'grad_norm': 0.6576452211765447, 'learning_rate': 1.3437009372868476e-05, 'epoch': 0.41} 41%|████ | 2700/6638 [2:34:25<3:36:39, 3.30s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2700/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) 
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 41%|████ | 2701/6638 [2:34:41<7:52:50, 7.21s/it] {'loss': 0.6901, 'grad_norm': 0.6340216314224353, 'learning_rate': 1.3432426478306796e-05, 'epoch': 0.41} 41%|████ | 2701/6638 [2:34:41<7:52:50, 7.21s/it] 41%|████ | 2702/6638 [2:34:44<6:34:41, 6.02s/it] {'loss': 0.6196, 'grad_norm': 0.5606044581517264, 'learning_rate': 1.3427842766412118e-05, 'epoch': 0.41} 41%|████ | 2702/6638 [2:34:44<6:34:41, 6.02s/it] 41%|████ | 2703/6638 [2:34:47<5:39:52, 5.18s/it] {'loss': 0.6616, 'grad_norm': 0.6215281633119211, 'learning_rate': 1.3423258238275915e-05, 'epoch': 0.41} 41%|████ | 2703/6638 [2:34:47<5:39:52, 5.18s/it] 41%|████ | 2704/6638 [2:34:51<5:01:51, 4.60s/it] {'loss': 0.6756, 'grad_norm': 0.5807217636900105, 'learning_rate': 1.341867289498986e-05, 'epoch': 0.41} 41%|████ | 2704/6638 [2:34:51<5:01:51, 4.60s/it] 41%|████ | 2705/6638 [2:34:54<4:35:42, 4.21s/it] {'loss': 0.6613, 'grad_norm': 0.568987456593198, 'learning_rate': 1.3414086737645819e-05, 'epoch': 0.41} 41%|████ | 2705/6638 [2:34:54<4:35:42, 4.21s/it] 41%|████ | 2706/6638 [2:34:57<4:18:02, 3.94s/it] {'loss': 0.7093, 'grad_norm': 0.7444610601586726, 'learning_rate': 1.3409499767335854e-05, 'epoch': 0.41} 41%|████ | 2706/6638 [2:34:57<4:18:02, 3.94s/it] 41%|████ | 2707/6638 [2:35:01<4:05:09, 3.74s/it] {'loss': 0.6224, 'grad_norm': 0.546622966313527, 'learning_rate': 1.3404911985152216e-05, 'epoch': 0.41} 41%|████ | 2707/6638 [2:35:01<4:05:09, 3.74s/it] 41%|████ | 2708/6638 [2:35:04<3:58:21, 3.64s/it] {'loss': 0.6547, 'grad_norm': 0.5757518689801999, 'learning_rate': 1.3400323392187358e-05, 'epoch': 0.41} 41%|████ | 2708/6638 [2:35:04<3:58:21, 3.64s/it] 41%|████ | 2709/6638 [2:35:07<3:52:11, 3.55s/it] {'loss': 0.6342, 'grad_norm': 0.5535878399968526, 'learning_rate': 1.3395733989533916e-05, 'epoch': 0.41} 41%|████ | 2709/6638 [2:35:07<3:52:11, 3.55s/it] 41%|████ | 2710/6638 [2:35:11<3:48:55, 3.50s/it] {'loss': 0.7418, 
'grad_norm': 0.7483050438299772, 'learning_rate': 1.3391143778284725e-05, 'epoch': 0.41} 41%|████ | 2710/6638 [2:35:11<3:48:55, 3.50s/it] 41%|████ | 2711/6638 [2:35:14<3:43:59, 3.42s/it] {'loss': 0.6744, 'grad_norm': 0.6601472216298319, 'learning_rate': 1.3386552759532814e-05, 'epoch': 0.41} 41%|████ | 2711/6638 [2:35:14<3:43:59, 3.42s/it] 41%|████ | 2712/6638 [2:35:17<3:42:01, 3.39s/it] {'loss': 0.6727, 'grad_norm': 0.61115350758707, 'learning_rate': 1.3381960934371394e-05, 'epoch': 0.41} 41%|████ | 2712/6638 [2:35:17<3:42:01, 3.39s/it] 41%|████ | 2713/6638 [2:35:20<3:38:28, 3.34s/it] {'loss': 0.6427, 'grad_norm': 0.6103093611790263, 'learning_rate': 1.3377368303893882e-05, 'epoch': 0.41} 41%|████ | 2713/6638 [2:35:20<3:38:28, 3.34s/it] 41%|████ | 2714/6638 [2:35:24<3:37:37, 3.33s/it] {'loss': 0.6412, 'grad_norm': 0.5374028274945735, 'learning_rate': 1.3372774869193876e-05, 'epoch': 0.41} 41%|████ | 2714/6638 [2:35:24<3:37:37, 3.33s/it] 41%|████ | 2715/6638 [2:35:27<3:36:00, 3.30s/it] {'loss': 0.6547, 'grad_norm': 0.5061432528311283, 'learning_rate': 1.3368180631365171e-05, 'epoch': 0.41} 41%|████ | 2715/6638 [2:35:27<3:36:00, 3.30s/it] 41%|████ | 2716/6638 [2:35:30<3:33:45, 3.27s/it] {'loss': 0.6567, 'grad_norm': 0.5903834235550341, 'learning_rate': 1.3363585591501751e-05, 'epoch': 0.41} 41%|████ | 2716/6638 [2:35:30<3:33:45, 3.27s/it] 41%|████ | 2717/6638 [2:35:33<3:34:12, 3.28s/it] {'loss': 0.6359, 'grad_norm': 0.5427983112783539, 'learning_rate': 1.3358989750697797e-05, 'epoch': 0.41} 41%|████ | 2717/6638 [2:35:33<3:34:12, 3.28s/it] 41%|████ | 2718/6638 [2:35:37<3:33:13, 3.26s/it] {'loss': 0.6297, 'grad_norm': 0.5326323408114971, 'learning_rate': 1.3354393110047665e-05, 'epoch': 0.41} 41%|████ | 2718/6638 [2:35:37<3:33:13, 3.26s/it] 41%|████ | 2719/6638 [2:35:40<3:33:00, 3.26s/it] {'loss': 0.6825, 'grad_norm': 0.6253780002443668, 'learning_rate': 1.334979567064592e-05, 'epoch': 0.41} 41%|████ | 2719/6638 [2:35:40<3:33:00, 3.26s/it] 41%|████ | 2720/6638 
[2:35:43<3:35:47, 3.30s/it] {'loss': 0.6949, 'grad_norm': 0.6129625111437799, 'learning_rate': 1.3345197433587306e-05, 'epoch': 0.41} 41%|████ | 2720/6638 [2:35:43<3:35:47, 3.30s/it] 41%|████ | 2721/6638 [2:35:47<3:36:02, 3.31s/it] {'loss': 0.6846, 'grad_norm': 0.6131868691246162, 'learning_rate': 1.3340598399966762e-05, 'epoch': 0.41} 41%|████ | 2721/6638 [2:35:47<3:36:02, 3.31s/it] 41%|████ | 2722/6638 [2:35:50<3:33:32, 3.27s/it] {'loss': 0.6344, 'grad_norm': 0.5698292654260778, 'learning_rate': 1.3335998570879414e-05, 'epoch': 0.41} 41%|████ | 2722/6638 [2:35:50<3:33:32, 3.27s/it] 41%|████ | 2723/6638 [2:35:53<3:35:11, 3.30s/it] {'loss': 0.6759, 'grad_norm': 0.6504691865181805, 'learning_rate': 1.3331397947420578e-05, 'epoch': 0.41} 41%|████ | 2723/6638 [2:35:53<3:35:11, 3.30s/it] 41%|████ | 2724/6638 [2:35:56<3:34:09, 3.28s/it] {'loss': 0.6825, 'grad_norm': 0.63200924686832, 'learning_rate': 1.3326796530685757e-05, 'epoch': 0.41} 41%|████ | 2724/6638 [2:35:56<3:34:09, 3.28s/it] 41%|████ | 2725/6638 [2:36:00<3:34:51, 3.29s/it] {'loss': 0.6719, 'grad_norm': 0.6318064225173494, 'learning_rate': 1.3322194321770647e-05, 'epoch': 0.41} 41%|████ | 2725/6638 [2:36:00<3:34:51, 3.29s/it] 41%|████ | 2726/6638 [2:36:03<3:35:55, 3.31s/it] {'loss': 0.704, 'grad_norm': 0.6685916002634905, 'learning_rate': 1.3317591321771135e-05, 'epoch': 0.41} 41%|████ | 2726/6638 [2:36:03<3:35:55, 3.31s/it] 41%|████ | 2727/6638 [2:36:07<3:36:46, 3.33s/it] {'loss': 0.7246, 'grad_norm': 0.61557148865403, 'learning_rate': 1.3312987531783285e-05, 'epoch': 0.41} 41%|████ | 2727/6638 [2:36:07<3:36:46, 3.33s/it] 41%|████ | 2728/6638 [2:36:10<3:38:27, 3.35s/it] {'loss': 0.6346, 'grad_norm': 0.6121352121340222, 'learning_rate': 1.3308382952903358e-05, 'epoch': 0.41} 41%|████ | 2728/6638 [2:36:10<3:38:27, 3.35s/it] 41%|████ | 2729/6638 [2:36:13<3:35:43, 3.31s/it] {'loss': 0.6815, 'grad_norm': 0.5841672731359733, 'learning_rate': 1.3303777586227806e-05, 'epoch': 0.41} 41%|████ | 2729/6638 
[2:36:13<3:35:43, 3.31s/it] 41%|████ | 2730/6638 [2:36:16<3:34:43, 3.30s/it] {'loss': 0.6589, 'grad_norm': 0.5542788376709353, 'learning_rate': 1.3299171432853259e-05, 'epoch': 0.41} 41%|████ | 2730/6638 [2:36:16<3:34:43, 3.30s/it] 41%|████ | 2731/6638 [2:36:20<3:32:34, 3.26s/it] {'loss': 0.6505, 'grad_norm': 0.6499377523006158, 'learning_rate': 1.329456449387654e-05, 'epoch': 0.41} 41%|████ | 2731/6638 [2:36:20<3:32:34, 3.26s/it] 41%|████ | 2732/6638 [2:36:23<3:35:29, 3.31s/it] {'loss': 0.6798, 'grad_norm': 0.5972437252220775, 'learning_rate': 1.3289956770394661e-05, 'epoch': 0.41} 41%|████ | 2732/6638 [2:36:23<3:35:29, 3.31s/it] 41%|████ | 2733/6638 [2:36:26<3:34:19, 3.29s/it] {'loss': 0.6823, 'grad_norm': 0.5756173246227202, 'learning_rate': 1.3285348263504814e-05, 'epoch': 0.41} 41%|████ | 2733/6638 [2:36:26<3:34:19, 3.29s/it] 41%|████ | 2734/6638 [2:36:30<3:34:24, 3.30s/it] {'loss': 0.6493, 'grad_norm': 0.6241966373622901, 'learning_rate': 1.3280738974304383e-05, 'epoch': 0.41} 41%|████ | 2734/6638 [2:36:30<3:34:24, 3.30s/it] 41%|████ | 2735/6638 [2:36:33<3:34:44, 3.30s/it] {'loss': 0.689, 'grad_norm': 0.6244416979638299, 'learning_rate': 1.327612890389094e-05, 'epoch': 0.41} 41%|████ | 2735/6638 [2:36:33<3:34:44, 3.30s/it] 41%|████ | 2736/6638 [2:36:36<3:32:44, 3.27s/it] {'loss': 0.6387, 'grad_norm': 0.5725976154185093, 'learning_rate': 1.3271518053362233e-05, 'epoch': 0.41} 41%|████ | 2736/6638 [2:36:36<3:32:44, 3.27s/it] 41%|████ | 2737/6638 [2:36:40<3:36:28, 3.33s/it] {'loss': 0.6631, 'grad_norm': 0.5802597509343959, 'learning_rate': 1.3266906423816206e-05, 'epoch': 0.41} 41%|████ | 2737/6638 [2:36:40<3:36:28, 3.33s/it] 41%|████ | 2738/6638 [2:36:43<3:36:18, 3.33s/it] {'loss': 0.6579, 'grad_norm': 0.6716463971378479, 'learning_rate': 1.3262294016350987e-05, 'epoch': 0.41} 41%|████ | 2738/6638 [2:36:43<3:36:18, 3.33s/it] 41%|████▏ | 2739/6638 [2:36:46<3:35:25, 3.32s/it] {'loss': 0.6718, 'grad_norm': 0.6630437230750295, 'learning_rate': 
1.3257680832064884e-05, 'epoch': 0.41} 41%|████▏ | 2739/6638 [2:36:46<3:35:25, 3.32s/it] 41%|████▏ | 2740/6638 [2:36:49<3:34:32, 3.30s/it] {'loss': 0.6188, 'grad_norm': 0.6434928094280721, 'learning_rate': 1.3253066872056402e-05, 'epoch': 0.41} 41%|████▏ | 2740/6638 [2:36:49<3:34:32, 3.30s/it] 41%|████▏ | 2741/6638 [2:36:53<3:32:48, 3.28s/it] {'loss': 0.6245, 'grad_norm': 0.5834737694944573, 'learning_rate': 1.3248452137424208e-05, 'epoch': 0.41} 41%|████▏ | 2741/6638 [2:36:53<3:32:48, 3.28s/it] 41%|████▏ | 2742/6638 [2:36:56<3:34:47, 3.31s/it] {'loss': 0.6376, 'grad_norm': 0.5801040212664361, 'learning_rate': 1.3243836629267177e-05, 'epoch': 0.41} 41%|████▏ | 2742/6638 [2:36:56<3:34:47, 3.31s/it] 41%|████▏ | 2743/6638 [2:36:59<3:32:38, 3.28s/it] {'loss': 0.6203, 'grad_norm': 0.5705208060584088, 'learning_rate': 1.3239220348684353e-05, 'epoch': 0.41} 41%|████▏ | 2743/6638 [2:36:59<3:32:38, 3.28s/it] 41%|████▏ | 2744/6638 [2:37:02<3:31:39, 3.26s/it] {'loss': 0.625, 'grad_norm': 0.5758298024654307, 'learning_rate': 1.3234603296774978e-05, 'epoch': 0.41} 41%|████▏ | 2744/6638 [2:37:02<3:31:39, 3.26s/it] 41%|████▏ | 2745/6638 [2:37:06<3:32:15, 3.27s/it] {'loss': 0.6346, 'grad_norm': 0.68375986313181, 'learning_rate': 1.322998547463846e-05, 'epoch': 0.41} 41%|████▏ | 2745/6638 [2:37:06<3:32:15, 3.27s/it] 41%|████▏ | 2746/6638 [2:37:09<3:29:53, 3.24s/it] {'loss': 0.6931, 'grad_norm': 0.6983333030381318, 'learning_rate': 1.3225366883374409e-05, 'epoch': 0.41} 41%|████▏ | 2746/6638 [2:37:09<3:29:53, 3.24s/it] 41%|████▏ | 2747/6638 [2:37:12<3:31:29, 3.26s/it] {'loss': 0.641, 'grad_norm': 0.5810242097336138, 'learning_rate': 1.3220747524082595e-05, 'epoch': 0.41} 41%|████▏ | 2747/6638 [2:37:12<3:31:29, 3.26s/it] 41%|████▏ | 2748/6638 [2:37:15<3:29:22, 3.23s/it] {'loss': 0.6896, 'grad_norm': 0.7047600277528262, 'learning_rate': 1.3216127397863001e-05, 'epoch': 0.41} 41%|████▏ | 2748/6638 [2:37:15<3:29:22, 3.23s/it] 41%|████▏ | 2749/6638 [2:37:19<3:32:16, 3.28s/it] {'loss': 
0.6503, 'grad_norm': 0.5928698969011725, 'learning_rate': 1.3211506505815764e-05, 'epoch': 0.41} 41%|████▏ | 2749/6638 [2:37:19<3:32:16, 3.28s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 41%|████▏ | 2750/6638 [2:37:22<3:35:52, 3.33s/it] {'loss': 0.7379, 'grad_norm': 0.6467691093009679, 'learning_rate': 1.3206884849041224e-05, 'epoch': 0.41} 41%|████▏ | 2750/6638 [2:37:22<3:35:52, 3.33s/it] 41%|████▏ | 2751/6638 [2:37:26<3:35:16, 3.32s/it] {'loss': 0.671, 'grad_norm': 0.6362073414521544, 'learning_rate': 1.3202262428639886e-05, 'epoch': 0.41} 41%|████▏ | 2751/6638 [2:37:26<3:35:16, 3.32s/it] 41%|████▏ | 2752/6638 [2:37:29<3:34:40, 3.31s/it] {'loss': 0.6758, 'grad_norm': 0.5696782627858229, 'learning_rate': 1.3197639245712454e-05, 'epoch': 0.41} 41%|████▏ | 2752/6638 [2:37:29<3:34:40, 3.31s/it] 41%|████▏ | 2753/6638 [2:37:32<3:37:54, 3.37s/it] {'loss': 0.6522, 'grad_norm': 0.6234574877826108, 'learning_rate': 1.31930153013598e-05, 'epoch': 0.41} 41%|████▏ | 2753/6638 [2:37:32<3:37:54, 3.37s/it] 41%|████▏ | 2754/6638 [2:37:36<3:35:38, 3.33s/it] {'loss': 0.6418, 'grad_norm': 0.5971208799355912, 'learning_rate': 1.3188390596682985e-05, 'epoch': 0.41} 41%|████▏ | 2754/6638 [2:37:36<3:35:38, 3.33s/it] 42%|████▏ | 2755/6638 [2:37:39<3:33:46, 3.30s/it] {'loss': 0.6725, 'grad_norm': 0.6389141343591987, 'learning_rate': 1.318376513278325e-05, 'epoch': 0.42} 42%|████▏ | 2755/6638 [2:37:39<3:33:46, 3.30s/it] 42%|████▏ | 2756/6638 [2:37:42<3:34:02, 3.31s/it] {'loss': 0.6745, 'grad_norm': 0.6249559779116554, 'learning_rate': 1.3179138910762013e-05, 'epoch': 0.42} 42%|████▏ | 2756/6638 [2:37:42<3:34:02, 3.31s/it] 42%|████▏ | 2757/6638 
[2:37:45<3:34:03, 3.31s/it] {'loss': 0.6555, 'grad_norm': 0.6130153307502321, 'learning_rate': 1.317451193172087e-05, 'epoch': 0.42} 42%|████▏ | 2757/6638 [2:37:45<3:34:03, 3.31s/it] 42%|████▏ | 2758/6638 [2:37:49<3:32:21, 3.28s/it] {'loss': 0.6658, 'grad_norm': 0.6176664595044026, 'learning_rate': 1.316988419676161e-05, 'epoch': 0.42} 42%|████▏ | 2758/6638 [2:37:49<3:32:21, 3.28s/it] 42%|████▏ | 2759/6638 [2:37:52<3:30:54, 3.26s/it] {'loss': 0.6489, 'grad_norm': 0.6304810569199839, 'learning_rate': 1.3165255706986193e-05, 'epoch': 0.42} 42%|████▏ | 2759/6638 [2:37:52<3:30:54, 3.26s/it] 42%|████▏ | 2760/6638 [2:37:55<3:30:41, 3.26s/it] {'loss': 0.6241, 'grad_norm': 0.5876803010532439, 'learning_rate': 1.3160626463496756e-05, 'epoch': 0.42} 42%|████▏ | 2760/6638 [2:37:55<3:30:41, 3.26s/it] 42%|████▏ | 2761/6638 [2:37:58<3:29:40, 3.24s/it] {'loss': 0.6611, 'grad_norm': 0.664296433822662, 'learning_rate': 1.3155996467395623e-05, 'epoch': 0.42} 42%|████▏ | 2761/6638 [2:37:58<3:29:40, 3.24s/it] 42%|████▏ | 2762/6638 [2:38:02<3:31:38, 3.28s/it] {'loss': 0.6712, 'grad_norm': 0.6286050130567421, 'learning_rate': 1.315136571978529e-05, 'epoch': 0.42} 42%|████▏ | 2762/6638 [2:38:02<3:31:38, 3.28s/it] 42%|████▏ | 2763/6638 [2:38:05<3:30:38, 3.26s/it] {'loss': 0.6199, 'grad_norm': 0.6210552396630105, 'learning_rate': 1.3146734221768439e-05, 'epoch': 0.42} 42%|████▏ | 2763/6638 [2:38:05<3:30:38, 3.26s/it] 42%|████▏ | 2764/6638 [2:38:08<3:30:50, 3.27s/it] {'loss': 0.6621, 'grad_norm': 0.6512627904274132, 'learning_rate': 1.3142101974447923e-05, 'epoch': 0.42} 42%|████▏ | 2764/6638 [2:38:08<3:30:50, 3.27s/it] 42%|████▏ | 2765/6638 [2:38:11<3:30:33, 3.26s/it] {'loss': 0.681, 'grad_norm': 0.6391158939604487, 'learning_rate': 1.3137468978926784e-05, 'epoch': 0.42} 42%|████▏ | 2765/6638 [2:38:11<3:30:33, 3.26s/it] 42%|████▏ | 2766/6638 [2:38:15<3:30:02, 3.25s/it] {'loss': 0.679, 'grad_norm': 0.630879819845237, 'learning_rate': 1.3132835236308228e-05, 'epoch': 0.42} 42%|████▏ | 
2766/6638 [2:38:15<3:30:02, 3.25s/it] 42%|████▏ | 2767/6638 [2:38:18<3:29:13, 3.24s/it] {'loss': 0.6447, 'grad_norm': 0.7712270186277468, 'learning_rate': 1.3128200747695651e-05, 'epoch': 0.42} 42%|████▏ | 2767/6638 [2:38:18<3:29:13, 3.24s/it] 42%|████▏ | 2768/6638 [2:38:21<3:30:31, 3.26s/it] {'loss': 0.6909, 'grad_norm': 0.622233122116063, 'learning_rate': 1.3123565514192625e-05, 'epoch': 0.42} 42%|████▏ | 2768/6638 [2:38:21<3:30:31, 3.26s/it] 42%|████▏ | 2769/6638 [2:38:24<3:28:38, 3.24s/it] {'loss': 0.6458, 'grad_norm': 0.6103549971851475, 'learning_rate': 1.3118929536902894e-05, 'epoch': 0.42} 42%|████▏ | 2769/6638 [2:38:24<3:28:38, 3.24s/it] 42%|████▏ | 2770/6638 [2:38:28<3:30:07, 3.26s/it] {'loss': 0.6138, 'grad_norm': 0.5482076598618298, 'learning_rate': 1.3114292816930378e-05, 'epoch': 0.42} 42%|████▏ | 2770/6638 [2:38:28<3:30:07, 3.26s/it] 42%|████▏ | 2771/6638 [2:38:31<3:32:20, 3.29s/it] {'loss': 0.6578, 'grad_norm': 0.6417095756796075, 'learning_rate': 1.3109655355379183e-05, 'epoch': 0.42} 42%|████▏ | 2771/6638 [2:38:31<3:32:20, 3.29s/it] 42%|████▏ | 2772/6638 [2:38:34<3:31:02, 3.28s/it] {'loss': 0.7065, 'grad_norm': 0.6799074223219738, 'learning_rate': 1.3105017153353583e-05, 'epoch': 0.42} 42%|████▏ | 2772/6638 [2:38:34<3:31:02, 3.28s/it] 42%|████▏ | 2773/6638 [2:38:38<3:31:55, 3.29s/it] {'loss': 0.6522, 'grad_norm': 0.5516247850447001, 'learning_rate': 1.3100378211958036e-05, 'epoch': 0.42} 42%|████▏ | 2773/6638 [2:38:38<3:31:55, 3.29s/it] 42%|████▏ | 2774/6638 [2:38:41<3:30:26, 3.27s/it] {'loss': 0.6228, 'grad_norm': 0.5669057588122889, 'learning_rate': 1.3095738532297164e-05, 'epoch': 0.42} 42%|████▏ | 2774/6638 [2:38:41<3:30:26, 3.27s/it] 42%|████▏ | 2775/6638 [2:38:44<3:30:43, 3.27s/it] {'loss': 0.678, 'grad_norm': 0.5817163773718856, 'learning_rate': 1.309109811547578e-05, 'epoch': 0.42} 42%|████▏ | 2775/6638 [2:38:44<3:30:43, 3.27s/it] 42%|████▏ | 2776/6638 [2:38:47<3:31:20, 3.28s/it] {'loss': 0.6282, 'grad_norm': 0.5789110293577137, 
'learning_rate': 1.3086456962598859e-05, 'epoch': 0.42} 42%|████▏ | 2776/6638 [2:38:47<3:31:20, 3.28s/it] 42%|████▏ | 2777/6638 [2:38:51<3:32:11, 3.30s/it] {'loss': 0.6546, 'grad_norm': 0.6937263815158488, 'learning_rate': 1.3081815074771562e-05, 'epoch': 0.42} 42%|████▏ | 2777/6638 [2:38:51<3:32:11, 3.30s/it] 42%|████▏ | 2778/6638 [2:38:54<3:29:05, 3.25s/it] {'loss': 0.6543, 'grad_norm': 0.5439185365986875, 'learning_rate': 1.3077172453099219e-05, 'epoch': 0.42} 42%|████▏ | 2778/6638 [2:38:54<3:29:05, 3.25s/it] 42%|████▏ | 2779/6638 [2:38:57<3:30:50, 3.28s/it] {'loss': 0.6336, 'grad_norm': 0.5520059473816465, 'learning_rate': 1.3072529098687334e-05, 'epoch': 0.42} 42%|████▏ | 2779/6638 [2:38:57<3:30:50, 3.28s/it] 42%|████▏ | 2780/6638 [2:39:00<3:28:54, 3.25s/it] {'loss': 0.7252, 'grad_norm': 0.8351795741738078, 'learning_rate': 1.3067885012641589e-05, 'epoch': 0.42} 42%|████▏ | 2780/6638 [2:39:00<3:28:54, 3.25s/it] 42%|████▏ | 2781/6638 [2:39:04<3:28:38, 3.25s/it] {'loss': 0.6625, 'grad_norm': 0.6685379997412128, 'learning_rate': 1.3063240196067837e-05, 'epoch': 0.42} 42%|████▏ | 2781/6638 [2:39:04<3:28:38, 3.25s/it] 42%|████▏ | 2782/6638 [2:39:07<3:30:18, 3.27s/it] {'loss': 0.7172, 'grad_norm': 0.643009381206245, 'learning_rate': 1.3058594650072106e-05, 'epoch': 0.42} 42%|████▏ | 2782/6638 [2:39:07<3:30:18, 3.27s/it] 42%|████▏ | 2783/6638 [2:39:10<3:29:27, 3.26s/it] {'loss': 0.6371, 'grad_norm': 0.5983931070971233, 'learning_rate': 1.3053948375760604e-05, 'epoch': 0.42} 42%|████▏ | 2783/6638 [2:39:10<3:29:27, 3.26s/it] 42%|████▏ | 2784/6638 [2:39:14<3:30:48, 3.28s/it] {'loss': 0.7057, 'grad_norm': 0.6629791904444136, 'learning_rate': 1.3049301374239702e-05, 'epoch': 0.42} 42%|████▏ | 2784/6638 [2:39:14<3:30:48, 3.28s/it] 42%|████▏ | 2785/6638 [2:39:17<3:31:02, 3.29s/it] {'loss': 0.6855, 'grad_norm': 0.6733455657900466, 'learning_rate': 1.3044653646615948e-05, 'epoch': 0.42} 42%|████▏ | 2785/6638 [2:39:17<3:31:02, 3.29s/it] 42%|████▏ | 2786/6638 [2:39:20<3:30:40, 
3.28s/it] {'loss': 0.6542, 'grad_norm': 0.5632708063698049, 'learning_rate': 1.3040005193996065e-05, 'epoch': 0.42} 42%|████▏ | 2786/6638 [2:39:20<3:30:40, 3.28s/it] 42%|████▏ | 2787/6638 [2:39:23<3:31:37, 3.30s/it] {'loss': 0.6874, 'grad_norm': 0.643742042509703, 'learning_rate': 1.3035356017486951e-05, 'epoch': 0.42} 42%|████▏ | 2787/6638 [2:39:23<3:31:37, 3.30s/it] 42%|████▏ | 2788/6638 [2:39:27<3:30:01, 3.27s/it] {'loss': 0.7025, 'grad_norm': 0.6032433036799046, 'learning_rate': 1.3030706118195669e-05, 'epoch': 0.42} 42%|████▏ | 2788/6638 [2:39:27<3:30:01, 3.27s/it] 42%|████▏ | 2789/6638 [2:39:30<3:28:06, 3.24s/it] {'loss': 0.6517, 'grad_norm': 0.6093143341722966, 'learning_rate': 1.3026055497229457e-05, 'epoch': 0.42} 42%|████▏ | 2789/6638 [2:39:30<3:28:06, 3.24s/it] 42%|████▏ | 2790/6638 [2:39:33<3:29:44, 3.27s/it] {'loss': 0.6519, 'grad_norm': 0.5584857979289496, 'learning_rate': 1.3021404155695728e-05, 'epoch': 0.42} 42%|████▏ | 2790/6638 [2:39:33<3:29:44, 3.27s/it] 42%|████▏ | 2791/6638 [2:39:37<3:32:35, 3.32s/it] {'loss': 0.6201, 'grad_norm': 0.6038109741553553, 'learning_rate': 1.3016752094702064e-05, 'epoch': 0.42} 42%|████▏ | 2791/6638 [2:39:37<3:32:35, 3.32s/it] 42%|████▏ | 2792/6638 [2:39:40<3:31:25, 3.30s/it] {'loss': 0.6584, 'grad_norm': 0.6722181793259361, 'learning_rate': 1.3012099315356222e-05, 'epoch': 0.42} 42%|████▏ | 2792/6638 [2:39:40<3:31:25, 3.30s/it] 42%|████▏ | 2793/6638 [2:39:43<3:30:37, 3.29s/it] {'loss': 0.6706, 'grad_norm': 0.5910331704104179, 'learning_rate': 1.3007445818766116e-05, 'epoch': 0.42} 42%|████▏ | 2793/6638 [2:39:43<3:30:37, 3.29s/it] 42%|████▏ | 2794/6638 [2:39:46<3:30:41, 3.29s/it] {'loss': 0.6469, 'grad_norm': 0.6578302533899463, 'learning_rate': 1.3002791606039853e-05, 'epoch': 0.42} 42%|████▏ | 2794/6638 [2:39:46<3:30:41, 3.29s/it] 42%|████▏ | 2795/6638 [2:39:50<3:31:57, 3.31s/it] {'loss': 0.6912, 'grad_norm': 0.5970087271025868, 'learning_rate': 1.2998136678285694e-05, 'epoch': 0.42} 42%|████▏ | 2795/6638 
[2:39:50<3:31:57, 3.31s/it] 42%|████▏ | 2796/6638 [2:39:53<3:30:50, 3.29s/it] {'loss': 0.6588, 'grad_norm': 0.6578682892294766, 'learning_rate': 1.2993481036612074e-05, 'epoch': 0.42} 42%|████▏ | 2796/6638 [2:39:53<3:30:50, 3.29s/it] 42%|████▏ | 2797/6638 [2:39:56<3:30:19, 3.29s/it] {'loss': 0.6751, 'grad_norm': 0.6529571661887568, 'learning_rate': 1.2988824682127605e-05, 'epoch': 0.42} 42%|████▏ | 2797/6638 [2:39:56<3:30:19, 3.29s/it] 42%|████▏ | 2798/6638 [2:40:00<3:29:52, 3.28s/it] {'loss': 0.6564, 'grad_norm': 0.6115738166234164, 'learning_rate': 1.2984167615941056e-05, 'epoch': 0.42} 42%|████▏ | 2798/6638 [2:40:00<3:29:52, 3.28s/it] 42%|████▏ | 2799/6638 [2:40:03<3:31:03, 3.30s/it] {'loss': 0.6816, 'grad_norm': 0.6300701822442761, 'learning_rate': 1.2979509839161377e-05, 'epoch': 0.42} 42%|████▏ | 2799/6638 [2:40:03<3:31:03, 3.30s/it]5 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
42%|████▏ | 2800/6638 [2:40:06<3:29:50, 3.28s/it] {'loss': 0.6075, 'grad_norm': 0.5564468187551493, 'learning_rate': 1.2974851352897681e-05, 'epoch': 0.42} 42%|████▏ | 2800/6638 [2:40:06<3:29:50, 3.28s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) 
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 42%|████▏ | 2801/6638 [2:40:23<7:40:47, 7.21s/it] {'loss': 0.6551, 'grad_norm': 0.5924991011902887, 'learning_rate': 1.2970192158259251e-05, 'epoch': 0.42} 42%|████▏ | 2801/6638 [2:40:23<7:40:47, 7.21s/it] 42%|████▏ | 2802/6638 [2:40:26<6:23:02, 5.99s/it] {'loss': 0.6778, 'grad_norm': 0.5928843397922696, 'learning_rate': 1.2965532256355547e-05, 'epoch': 0.42} 42%|████▏ | 2802/6638 [2:40:26<6:23:02, 5.99s/it] 42%|████▏ | 2803/6638 [2:40:29<5:29:59, 5.16s/it] {'loss': 0.6706, 'grad_norm': 0.6145192709462333, 'learning_rate': 1.2960871648296176e-05, 'epoch': 0.42} 42%|████▏ | 2803/6638 [2:40:29<5:29:59, 5.16s/it] 42%|████▏ | 2804/6638 [2:40:32<4:52:43, 4.58s/it] {'loss': 0.6704, 'grad_norm': 0.6629748561580991, 'learning_rate': 1.2956210335190934e-05, 'epoch': 0.42} 42%|████▏ | 2804/6638 [2:40:32<4:52:43, 4.58s/it] 42%|████▏ | 2805/6638 [2:40:35<4:26:05, 4.17s/it] {'loss': 0.6359, 'grad_norm': 0.5931194886208482, 'learning_rate': 1.295154831814978e-05, 'epoch': 0.42} 42%|████▏ | 2805/6638 [2:40:35<4:26:05, 4.17s/it] 42%|████▏ | 2806/6638 [2:40:39<4:10:10, 3.92s/it] {'loss': 0.6674, 'grad_norm': 0.5754863782198096, 'learning_rate': 1.2946885598282839e-05, 'epoch': 0.42} 42%|████▏ | 2806/6638 [2:40:39<4:10:10, 3.92s/it] 42%|████▏ | 2807/6638 [2:40:42<3:57:51, 3.73s/it] {'loss': 0.7344, 'grad_norm': 0.646870463941353, 'learning_rate': 1.2942222176700397e-05, 'epoch': 0.42} 42%|████▏ | 2807/6638 [2:40:42<3:57:51, 3.73s/it] 42%|████▏ | 2808/6638 [2:40:45<3:48:58, 3.59s/it] {'loss': 0.6708, 'grad_norm': 0.6251167678581274, 'learning_rate': 1.2937558054512916e-05, 'epoch': 0.42} 42%|████▏ | 2808/6638 [2:40:45<3:48:58, 3.59s/it] 42%|████▏ | 2809/6638 [2:40:49<3:44:17, 3.51s/it] {'loss': 0.6881, 'grad_norm': 0.6007715852676514, 'learning_rate': 1.2932893232831021e-05, 'epoch': 0.42} 42%|████▏ | 2809/6638 [2:40:49<3:44:17, 3.51s/it] 42%|████▏ | 2810/6638 [2:40:52<3:37:48, 
3.41s/it] {'loss': 0.627, 'grad_norm': 0.6052291658238009, 'learning_rate': 1.2928227712765504e-05, 'epoch': 0.42} 42%|████▏ | 2810/6638 [2:40:52<3:37:48, 3.41s/it] 42%|████▏ | 2811/6638 [2:40:55<3:36:52, 3.40s/it] {'loss': 0.6655, 'grad_norm': 0.5777398173063868, 'learning_rate': 1.2923561495427327e-05, 'epoch': 0.42} 42%|████▏ | 2811/6638 [2:40:55<3:36:52, 3.40s/it] 42%|████▏ | 2812/6638 [2:40:58<3:35:36, 3.38s/it] {'loss': 0.6784, 'grad_norm': 0.6122705835177884, 'learning_rate': 1.291889458192761e-05, 'epoch': 0.42} 42%|████▏ | 2812/6638 [2:40:58<3:35:36, 3.38s/it] 42%|████▏ | 2813/6638 [2:41:02<3:33:47, 3.35s/it] {'loss': 0.6862, 'grad_norm': 0.6238718258555352, 'learning_rate': 1.2914226973377646e-05, 'epoch': 0.42} 42%|████▏ | 2813/6638 [2:41:02<3:33:47, 3.35s/it] 42%|████▏ | 2814/6638 [2:41:05<3:32:07, 3.33s/it] {'loss': 0.618, 'grad_norm': 0.5859298667830953, 'learning_rate': 1.2909558670888891e-05, 'epoch': 0.42} 42%|████▏ | 2814/6638 [2:41:05<3:32:07, 3.33s/it] 42%|████▏ | 2815/6638 [2:41:08<3:32:16, 3.33s/it] {'loss': 0.711, 'grad_norm': 0.642448248123815, 'learning_rate': 1.2904889675572964e-05, 'epoch': 0.42} 42%|████▏ | 2815/6638 [2:41:08<3:32:16, 3.33s/it] 42%|████▏ | 2816/6638 [2:41:11<3:28:47, 3.28s/it] {'loss': 0.6441, 'grad_norm': 0.6057170657735091, 'learning_rate': 1.2900219988541652e-05, 'epoch': 0.42} 42%|████▏ | 2816/6638 [2:41:11<3:28:47, 3.28s/it] 42%|████▏ | 2817/6638 [2:41:15<3:28:53, 3.28s/it] {'loss': 0.6693, 'grad_norm': 0.6388496295623081, 'learning_rate': 1.2895549610906909e-05, 'epoch': 0.42} 42%|████▏ | 2817/6638 [2:41:15<3:28:53, 3.28s/it] 42%|████▏ | 2818/6638 [2:41:18<3:26:49, 3.25s/it] {'loss': 0.6827, 'grad_norm': 0.7453954098596746, 'learning_rate': 1.2890878543780843e-05, 'epoch': 0.42} 42%|████▏ | 2818/6638 [2:41:18<3:26:49, 3.25s/it] 42%|████▏ | 2819/6638 [2:41:21<3:26:21, 3.24s/it] {'loss': 0.6521, 'grad_norm': 0.585730598174937, 'learning_rate': 1.2886206788275739e-05, 'epoch': 0.42} 42%|████▏ | 2819/6638 
[2:41:21<3:26:21, 3.24s/it] 42%|████▏ | 2820/6638 [2:41:24<3:26:20, 3.24s/it] {'loss': 0.6848, 'grad_norm': 0.6166033850790518, 'learning_rate': 1.2881534345504041e-05, 'epoch': 0.42} 42%|████▏ | 2820/6638 [2:41:24<3:26:20, 3.24s/it] 42%|████▏ | 2821/6638 [2:41:28<3:25:53, 3.24s/it] {'loss': 0.6622, 'grad_norm': 0.5809182356152129, 'learning_rate': 1.2876861216578354e-05, 'epoch': 0.42} 42%|████▏ | 2821/6638 [2:41:28<3:25:53, 3.24s/it] 43%|████▎ | 2822/6638 [2:41:31<3:31:17, 3.32s/it] {'loss': 0.7174, 'grad_norm': 0.6552357810289784, 'learning_rate': 1.2872187402611446e-05, 'epoch': 0.43} 43%|████▎ | 2822/6638 [2:41:31<3:31:17, 3.32s/it] 43%|████▎ | 2823/6638 [2:41:35<3:34:05, 3.37s/it] {'loss': 0.7066, 'grad_norm': 0.6649004911984603, 'learning_rate': 1.2867512904716255e-05, 'epoch': 0.43} 43%|████▎ | 2823/6638 [2:41:35<3:34:05, 3.37s/it] 43%|████▎ | 2824/6638 [2:41:38<3:33:31, 3.36s/it] {'loss': 0.6723, 'grad_norm': 0.7745825397675499, 'learning_rate': 1.2862837724005872e-05, 'epoch': 0.43} 43%|████▎ | 2824/6638 [2:41:38<3:33:31, 3.36s/it] 43%|████▎ | 2825/6638 [2:41:41<3:32:52, 3.35s/it] {'loss': 0.6778, 'grad_norm': 0.5751755560151685, 'learning_rate': 1.2858161861593566e-05, 'epoch': 0.43} 43%|████▎ | 2825/6638 [2:41:41<3:32:52, 3.35s/it] 43%|████▎ | 2826/6638 [2:41:45<3:30:16, 3.31s/it] {'loss': 0.6482, 'grad_norm': 0.5611199375583438, 'learning_rate': 1.2853485318592744e-05, 'epoch': 0.43} 43%|████▎ | 2826/6638 [2:41:45<3:30:16, 3.31s/it] 43%|████▎ | 2827/6638 [2:41:48<3:27:42, 3.27s/it] {'loss': 0.6844, 'grad_norm': 0.6126995494614742, 'learning_rate': 1.2848808096117003e-05, 'epoch': 0.43} 43%|████▎ | 2827/6638 [2:41:48<3:27:42, 3.27s/it] 43%|████▎ | 2828/6638 [2:41:51<3:29:17, 3.30s/it] {'loss': 0.6439, 'grad_norm': 0.6178831456425566, 'learning_rate': 1.2844130195280076e-05, 'epoch': 0.43} 43%|████▎ | 2828/6638 [2:41:51<3:29:17, 3.30s/it] 43%|████▎ | 2829/6638 [2:41:54<3:29:10, 3.29s/it] {'loss': 0.6396, 'grad_norm': 0.5595033511120197, 'learning_rate': 
1.2839451617195879e-05, 'epoch': 0.43} 43%|████▎ | 2829/6638 [2:41:54<3:29:10, 3.29s/it] 43%|████▎ | 2830/6638 [2:41:58<3:30:25, 3.32s/it] {'loss': 0.6674, 'grad_norm': 0.6141261604442094, 'learning_rate': 1.2834772362978476e-05, 'epoch': 0.43} 43%|████▎ | 2830/6638 [2:41:58<3:30:25, 3.32s/it] 43%|████▎ | 2831/6638 [2:42:01<3:30:08, 3.31s/it] {'loss': 0.6486, 'grad_norm': 0.6061043862653257, 'learning_rate': 1.2830092433742098e-05, 'epoch': 0.43} 43%|████▎ | 2831/6638 [2:42:01<3:30:08, 3.31s/it] 43%|████▎ | 2832/6638 [2:42:04<3:27:56, 3.28s/it] {'loss': 0.6275, 'grad_norm': 0.6541118613003541, 'learning_rate': 1.282541183060113e-05, 'epoch': 0.43} 43%|████▎ | 2832/6638 [2:42:04<3:27:56, 3.28s/it] 43%|████▎ | 2833/6638 [2:42:07<3:27:32, 3.27s/it] {'loss': 0.6884, 'grad_norm': 0.581834458170945, 'learning_rate': 1.2820730554670128e-05, 'epoch': 0.43} 43%|████▎ | 2833/6638 [2:42:07<3:27:32, 3.27s/it] 43%|████▎ | 2834/6638 [2:42:11<3:28:34, 3.29s/it] {'loss': 0.661, 'grad_norm': 0.5719579847194571, 'learning_rate': 1.28160486070638e-05, 'epoch': 0.43} 43%|████▎ | 2834/6638 [2:42:11<3:28:34, 3.29s/it] 43%|████▎ | 2835/6638 [2:42:14<3:26:18, 3.25s/it] {'loss': 0.669, 'grad_norm': 0.5773865771226409, 'learning_rate': 1.2811365988897015e-05, 'epoch': 0.43} 43%|████▎ | 2835/6638 [2:42:14<3:26:18, 3.25s/it] 43%|████▎ | 2836/6638 [2:42:17<3:26:30, 3.26s/it] {'loss': 0.631, 'grad_norm': 0.5943511562697634, 'learning_rate': 1.2806682701284803e-05, 'epoch': 0.43} 43%|████▎ | 2836/6638 [2:42:17<3:26:30, 3.26s/it] 43%|████▎ | 2837/6638 [2:42:21<3:26:36, 3.26s/it] {'loss': 0.6388, 'grad_norm': 0.5807342429249078, 'learning_rate': 1.2801998745342354e-05, 'epoch': 0.43} 43%|████▎ | 2837/6638 [2:42:21<3:26:36, 3.26s/it] 43%|████▎ | 2838/6638 [2:42:24<3:28:34, 3.29s/it] {'loss': 0.6491, 'grad_norm': 0.6052062254960481, 'learning_rate': 1.2797314122185018e-05, 'epoch': 0.43} 43%|████▎ | 2838/6638 [2:42:24<3:28:34, 3.29s/it] 43%|████▎ | 2839/6638 [2:42:27<3:29:31, 3.31s/it] {'loss': 
0.7094, 'grad_norm': 0.6604853054134291, 'learning_rate': 1.2792628832928302e-05, 'epoch': 0.43} 43%|████▎ | 2839/6638 [2:42:27<3:29:31, 3.31s/it] 43%|████▎ | 2840/6638 [2:42:30<3:28:53, 3.30s/it] {'loss': 0.6664, 'grad_norm': 0.6658484304051011, 'learning_rate': 1.2787942878687871e-05, 'epoch': 0.43} 43%|████▎ | 2840/6638 [2:42:30<3:28:53, 3.30s/it] 43%|████▎ | 2841/6638 [2:42:34<3:27:48, 3.28s/it] {'loss': 0.6703, 'grad_norm': 0.670389775600166, 'learning_rate': 1.278325626057955e-05, 'epoch': 0.43} 43%|████▎ | 2841/6638 [2:42:34<3:27:48, 3.28s/it] 43%|████▎ | 2842/6638 [2:42:37<3:27:19, 3.28s/it] {'loss': 0.654, 'grad_norm': 0.6295257865520946, 'learning_rate': 1.277856897971932e-05, 'epoch': 0.43} 43%|████▎ | 2842/6638 [2:42:37<3:27:19, 3.28s/it] 43%|████▎ | 2843/6638 [2:42:40<3:27:49, 3.29s/it] {'loss': 0.6374, 'grad_norm': 0.5484574515233612, 'learning_rate': 1.2773881037223321e-05, 'epoch': 0.43} 43%|████▎ | 2843/6638 [2:42:40<3:27:49, 3.29s/it] 43%|████▎ | 2844/6638 [2:42:44<3:29:03, 3.31s/it] {'loss': 0.6711, 'grad_norm': 0.5887415517685555, 'learning_rate': 1.2769192434207853e-05, 'epoch': 0.43} 43%|████▎ | 2844/6638 [2:42:44<3:29:03, 3.31s/it] 43%|████▎ | 2845/6638 [2:42:47<3:27:44, 3.29s/it] {'loss': 0.6719, 'grad_norm': 0.6612483801202264, 'learning_rate': 1.2764503171789369e-05, 'epoch': 0.43} 43%|████▎ | 2845/6638 [2:42:47<3:27:44, 3.29s/it] 43%|████▎ | 2846/6638 [2:42:50<3:26:04, 3.26s/it] {'loss': 0.648, 'grad_norm': 0.6098155643636902, 'learning_rate': 1.2759813251084486e-05, 'epoch': 0.43} 43%|████▎ | 2846/6638 [2:42:50<3:26:04, 3.26s/it] 43%|████▎ | 2847/6638 [2:42:55<4:05:46, 3.89s/it] {'loss': 0.6958, 'grad_norm': 0.5937145023156897, 'learning_rate': 1.2755122673209963e-05, 'epoch': 0.43} 43%|████▎ | 2847/6638 [2:42:55<4:05:46, 3.89s/it] 43%|████▎ | 2848/6638 [2:42:59<3:57:01, 3.75s/it] {'loss': 0.6562, 'grad_norm': 0.5458624364019167, 'learning_rate': 1.2750431439282736e-05, 'epoch': 0.43} 43%|████▎ | 2848/6638 [2:42:59<3:57:01, 3.75s/it] 
43%|████▎ | 2849/6638 [2:43:02<3:49:40, 3.64s/it] {'loss': 0.692, 'grad_norm': 0.5520046111923109, 'learning_rate': 1.2745739550419882e-05, 'epoch': 0.43} 43%|████▎ | 2849/6638 [2:43:02<3:49:40, 3.64s/it]3 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 43%|████▎ | 2850/6638 [2:43:05<3:40:41, 3.50s/it]2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... {'loss': 0.6892, 'grad_norm': 0.7120779787400626, 'learning_rate': 1.2741047007738638e-05, 'epoch': 0.43} 43%|████▎ | 2850/6638 [2:43:05<3:40:41, 3.50s/it] 43%|████▎ | 2851/6638 [2:43:09<3:39:46, 3.48s/it] {'loss': 0.6706, 'grad_norm': 0.6177485117200477, 'learning_rate': 1.2736353812356396e-05, 'epoch': 0.43} 43%|████▎ | 2851/6638 [2:43:09<3:39:46, 3.48s/it] 43%|████▎ | 2852/6638 [2:43:12<3:35:30, 3.42s/it] {'loss': 0.6015, 'grad_norm': 0.510668124373749, 'learning_rate': 1.2731659965390707e-05, 'epoch': 0.43} 43%|████▎ | 2852/6638 [2:43:12<3:35:30, 3.42s/it] 43%|████▎ | 2853/6638 [2:43:15<3:31:18, 3.35s/it] {'loss': 0.6148, 'grad_norm': 0.5773394825680992, 'learning_rate': 1.2726965467959275e-05, 'epoch': 0.43} 43%|████▎ | 2853/6638 [2:43:15<3:31:18, 3.35s/it] 43%|████▎ | 2854/6638 [2:43:19<3:30:32, 3.34s/it] {'loss': 0.6497, 'grad_norm': 0.6418460446206453, 'learning_rate': 1.2722270321179958e-05, 'epoch': 0.43} 43%|████▎ | 2854/6638 [2:43:19<3:30:32, 3.34s/it] 43%|████▎ | 2855/6638 [2:43:22<3:28:27, 3.31s/it] {'loss': 0.6211, 'grad_norm': 0.5696647133460181, 'learning_rate': 1.2717574526170769e-05, 'epoch': 0.43} 43%|████▎ | 2855/6638 [2:43:22<3:28:27, 3.31s/it] 43%|████▎ | 2856/6638 [2:43:25<3:26:27, 3.28s/it] {'loss': 0.7205, 'grad_norm': 0.7566649377207536, 'learning_rate': 1.2712878084049873e-05, 'epoch': 0.43} 43%|████▎ | 
2856/6638 [2:43:25<3:26:27, 3.28s/it] 43%|████▎ | 2857/6638 [2:43:28<3:23:44, 3.23s/it] {'loss': 0.6478, 'grad_norm': 0.5972661052614049, 'learning_rate': 1.2708180995935595e-05, 'epoch': 0.43} 43%|████▎ | 2857/6638 [2:43:28<3:23:44, 3.23s/it] 43%|████▎ | 2858/6638 [2:43:32<3:26:54, 3.28s/it] {'loss': 0.6875, 'grad_norm': 0.7031549267907364, 'learning_rate': 1.2703483262946415e-05, 'epoch': 0.43} 43%|████▎ | 2858/6638 [2:43:32<3:26:54, 3.28s/it] 43%|████▎ | 2859/6638 [2:43:35<3:28:34, 3.31s/it] {'loss': 0.6773, 'grad_norm': 0.608420418903322, 'learning_rate': 1.2698784886200953e-05, 'epoch': 0.43} 43%|████▎ | 2859/6638 [2:43:35<3:28:34, 3.31s/it] 43%|████▎ | 2860/6638 [2:43:38<3:29:16, 3.32s/it] {'loss': 0.6349, 'grad_norm': 0.540358436509329, 'learning_rate': 1.2694085866817995e-05, 'epoch': 0.43} 43%|████▎ | 2860/6638 [2:43:38<3:29:16, 3.32s/it] 43%|████▎ | 2861/6638 [2:43:42<3:26:26, 3.28s/it] {'loss': 0.6714, 'grad_norm': 0.6153288940649264, 'learning_rate': 1.2689386205916477e-05, 'epoch': 0.43} 43%|████▎ | 2861/6638 [2:43:42<3:26:26, 3.28s/it] 43%|████▎ | 2862/6638 [2:43:45<3:25:19, 3.26s/it] {'loss': 0.674, 'grad_norm': 0.6531263538097931, 'learning_rate': 1.2684685904615488e-05, 'epoch': 0.43} 43%|████▎ | 2862/6638 [2:43:45<3:25:19, 3.26s/it] 43%|████▎ | 2863/6638 [2:43:48<3:25:49, 3.27s/it] {'loss': 0.6935, 'grad_norm': 0.5898668370330908, 'learning_rate': 1.2679984964034269e-05, 'epoch': 0.43} 43%|████▎ | 2863/6638 [2:43:48<3:25:49, 3.27s/it] 43%|████▎ | 2864/6638 [2:43:54<4:11:32, 4.00s/it] {'loss': 0.659, 'grad_norm': 0.6059460760630884, 'learning_rate': 1.2675283385292212e-05, 'epoch': 0.43} 43%|████▎ | 2864/6638 [2:43:54<4:11:32, 4.00s/it] 43%|████▎ | 2865/6638 [2:43:57<4:00:42, 3.83s/it] {'loss': 0.6451, 'grad_norm': 0.5341088163025515, 'learning_rate': 1.2670581169508857e-05, 'epoch': 0.43} 43%|████▎ | 2865/6638 [2:43:57<4:00:42, 3.83s/it] 43%|████▎ | 2866/6638 [2:44:01<3:52:43, 3.70s/it] {'loss': 0.665, 'grad_norm': 0.5743285407274343, 
'learning_rate': 1.266587831780391e-05, 'epoch': 0.43} 43%|████▎ | 2866/6638 [2:44:01<3:52:43, 3.70s/it] 43%|████▎ | 2867/6638 [2:44:06<4:27:41, 4.26s/it] {'loss': 0.669, 'grad_norm': 0.6020825077087806, 'learning_rate': 1.2661174831297212e-05, 'epoch': 0.43} 43%|████▎ | 2867/6638 [2:44:06<4:27:41, 4.26s/it] 43%|████▎ | 2868/6638 [2:44:09<4:10:07, 3.98s/it] {'loss': 0.6507, 'grad_norm': 0.5840314732696104, 'learning_rate': 1.2656470711108763e-05, 'epoch': 0.43} 43%|████▎ | 2868/6638 [2:44:09<4:10:07, 3.98s/it] 43%|████▎ | 2869/6638 [2:44:13<3:58:19, 3.79s/it] {'loss': 0.6791, 'grad_norm': 0.5910181025871062, 'learning_rate': 1.2651765958358717e-05, 'epoch': 0.43} 43%|████▎ | 2869/6638 [2:44:13<3:58:19, 3.79s/it] 43%|████▎ | 2870/6638 [2:44:16<3:46:36, 3.61s/it] {'loss': 0.6283, 'grad_norm': 0.6090788855865854, 'learning_rate': 1.2647060574167374e-05, 'epoch': 0.43} 43%|████▎ | 2870/6638 [2:44:16<3:46:36, 3.61s/it] 43%|████▎ | 2871/6638 [2:44:19<3:40:07, 3.51s/it] {'loss': 0.651, 'grad_norm': 0.6360147290652944, 'learning_rate': 1.2642354559655177e-05, 'epoch': 0.43} 43%|████▎ | 2871/6638 [2:44:19<3:40:07, 3.51s/it] 43%|████▎ | 2872/6638 [2:44:23<3:35:11, 3.43s/it] {'loss': 0.6794, 'grad_norm': 0.6899175361388559, 'learning_rate': 1.263764791594274e-05, 'epoch': 0.43} 43%|████▎ | 2872/6638 [2:44:23<3:35:11, 3.43s/it] 43%|████▎ | 2873/6638 [2:44:26<3:32:04, 3.38s/it] {'loss': 0.6596, 'grad_norm': 0.5969677159166767, 'learning_rate': 1.2632940644150803e-05, 'epoch': 0.43} 43%|████▎ | 2873/6638 [2:44:26<3:32:04, 3.38s/it] 43%|████▎ | 2874/6638 [2:44:29<3:31:00, 3.36s/it] {'loss': 0.6368, 'grad_norm': 0.6517951700803196, 'learning_rate': 1.2628232745400269e-05, 'epoch': 0.43} 43%|████▎ | 2874/6638 [2:44:29<3:31:00, 3.36s/it] 43%|████▎ | 2875/6638 [2:44:32<3:27:00, 3.30s/it] {'loss': 0.6386, 'grad_norm': 0.5886127153081238, 'learning_rate': 1.2623524220812193e-05, 'epoch': 0.43} 43%|████▎ | 2875/6638 [2:44:32<3:27:00, 3.30s/it] 43%|████▎ | 2876/6638 [2:44:36<3:27:04, 
3.30s/it] {'loss': 0.6251, 'grad_norm': 0.5656127249539123, 'learning_rate': 1.2618815071507768e-05, 'epoch': 0.43} 43%|████▎ | 2876/6638 [2:44:36<3:27:04, 3.30s/it] 43%|████▎ | 2877/6638 [2:44:39<3:27:16, 3.31s/it] {'loss': 0.6246, 'grad_norm': 0.5348742963703079, 'learning_rate': 1.2614105298608347e-05, 'epoch': 0.43} 43%|████▎ | 2877/6638 [2:44:39<3:27:16, 3.31s/it] 43%|████▎ | 2878/6638 [2:44:42<3:28:20, 3.32s/it] {'loss': 0.7117, 'grad_norm': 0.6441122166831321, 'learning_rate': 1.260939490323542e-05, 'epoch': 0.43} 43%|████▎ | 2878/6638 [2:44:42<3:28:20, 3.32s/it] 43%|████▎ | 2879/6638 [2:44:46<3:27:10, 3.31s/it] {'loss': 0.6352, 'grad_norm': 0.5379343990742881, 'learning_rate': 1.2604683886510635e-05, 'epoch': 0.43} 43%|████▎ | 2879/6638 [2:44:46<3:27:10, 3.31s/it] 43%|████▎ | 2880/6638 [2:44:49<3:27:42, 3.32s/it] {'loss': 0.6516, 'grad_norm': 0.5629406465024573, 'learning_rate': 1.2599972249555782e-05, 'epoch': 0.43} 43%|████▎ | 2880/6638 [2:44:49<3:27:42, 3.32s/it] 43%|████▎ | 2881/6638 [2:44:52<3:28:46, 3.33s/it] {'loss': 0.657, 'grad_norm': 0.6025074363245875, 'learning_rate': 1.2595259993492808e-05, 'epoch': 0.43} 43%|████▎ | 2881/6638 [2:44:52<3:28:46, 3.33s/it] 43%|████▎ | 2882/6638 [2:44:55<3:27:30, 3.31s/it] {'loss': 0.6538, 'grad_norm': 0.6303338920042446, 'learning_rate': 1.259054711944379e-05, 'epoch': 0.43} 43%|████▎ | 2882/6638 [2:44:55<3:27:30, 3.31s/it] 43%|████▎ | 2883/6638 [2:44:59<3:26:10, 3.29s/it] {'loss': 0.6789, 'grad_norm': 0.6217991001987672, 'learning_rate': 1.2585833628530967e-05, 'epoch': 0.43} 43%|████▎ | 2883/6638 [2:44:59<3:26:10, 3.29s/it] 43%|████▎ | 2884/6638 [2:45:04<4:11:30, 4.02s/it] {'loss': 0.7227, 'grad_norm': 0.6948605798078938, 'learning_rate': 1.2581119521876724e-05, 'epoch': 0.43} 43%|████▎ | 2884/6638 [2:45:04<4:11:30, 4.02s/it] 43%|████▎ | 2885/6638 [2:45:08<3:59:19, 3.83s/it] {'loss': 0.721, 'grad_norm': 0.6955379993025818, 'learning_rate': 1.2576404800603583e-05, 'epoch': 0.43} 43%|████▎ | 2885/6638 
[2:45:08<3:59:19, 3.83s/it] 43%|████▎ | 2886/6638 [2:45:13<4:29:08, 4.30s/it] {'loss': 0.5922, 'grad_norm': 0.5224432075849884, 'learning_rate': 1.2571689465834223e-05, 'epoch': 0.43} 43%|████▎ | 2886/6638 [2:45:13<4:29:08, 4.30s/it] 43%|████▎ | 2887/6638 [2:45:17<4:10:58, 4.01s/it] {'loss': 0.6344, 'grad_norm': 0.5237176453850042, 'learning_rate': 1.2566973518691463e-05, 'epoch': 0.43} 43%|████▎ | 2887/6638 [2:45:17<4:10:58, 4.01s/it] 44%|████▎ | 2888/6638 [2:45:20<3:56:06, 3.78s/it] {'loss': 0.6881, 'grad_norm': 0.6502544771724825, 'learning_rate': 1.2562256960298267e-05, 'epoch': 0.44} 44%|████▎ | 2888/6638 [2:45:20<3:56:06, 3.78s/it] 44%|████▎ | 2889/6638 [2:45:23<3:48:17, 3.65s/it] {'loss': 0.7433, 'grad_norm': 0.6620987060708412, 'learning_rate': 1.2557539791777749e-05, 'epoch': 0.44} 44%|████▎ | 2889/6638 [2:45:23<3:48:17, 3.65s/it] 44%|████▎ | 2890/6638 [2:45:27<3:42:16, 3.56s/it] {'loss': 0.624, 'grad_norm': 0.5775529850495867, 'learning_rate': 1.2552822014253165e-05, 'epoch': 0.44} 44%|████▎ | 2890/6638 [2:45:27<3:42:16, 3.56s/it] 44%|████▎ | 2891/6638 [2:45:30<3:40:38, 3.53s/it] {'loss': 0.6897, 'grad_norm': 0.5939629225590424, 'learning_rate': 1.2548103628847923e-05, 'epoch': 0.44} 44%|████▎ | 2891/6638 [2:45:30<3:40:38, 3.53s/it] 44%|████▎ | 2892/6638 [2:45:35<4:01:11, 3.86s/it] {'loss': 0.6552, 'grad_norm': 0.5829690414118291, 'learning_rate': 1.2543384636685561e-05, 'epoch': 0.44} 44%|████▎ | 2892/6638 [2:45:35<4:01:11, 3.86s/it] 44%|████▎ | 2893/6638 [2:45:38<3:49:25, 3.68s/it] {'loss': 0.6572, 'grad_norm': 0.6490394752774048, 'learning_rate': 1.2538665038889775e-05, 'epoch': 0.44} 44%|████▎ | 2893/6638 [2:45:38<3:49:25, 3.68s/it] 44%|████▎ | 2894/6638 [2:45:41<3:41:29, 3.55s/it] {'loss': 0.6536, 'grad_norm': 0.5848156501377424, 'learning_rate': 1.2533944836584397e-05, 'epoch': 0.44} 44%|████▎ | 2894/6638 [2:45:41<3:41:29, 3.55s/it] 44%|████▎ | 2895/6638 [2:45:44<3:38:04, 3.50s/it] {'loss': 0.6357, 'grad_norm': 0.5121406996679957, 'learning_rate': 
1.2529224030893415e-05, 'epoch': 0.44} 44%|████▎ | 2895/6638 [2:45:44<3:38:04, 3.50s/it] 44%|████▎ | 2896/6638 [2:45:48<3:34:45, 3.44s/it] {'loss': 0.6849, 'grad_norm': 0.6189141283355242, 'learning_rate': 1.2524502622940944e-05, 'epoch': 0.44} 44%|████▎ | 2896/6638 [2:45:48<3:34:45, 3.44s/it] 44%|████▎ | 2897/6638 [2:45:51<3:30:03, 3.37s/it] {'loss': 0.6356, 'grad_norm': 0.6322114861875807, 'learning_rate': 1.2519780613851254e-05, 'epoch': 0.44} 44%|████▎ | 2897/6638 [2:45:51<3:30:03, 3.37s/it] 44%|████▎ | 2898/6638 [2:45:54<3:26:51, 3.32s/it] {'loss': 0.6741, 'grad_norm': 0.6508572552240017, 'learning_rate': 1.2515058004748753e-05, 'epoch': 0.44} 44%|████▎ | 2898/6638 [2:45:54<3:26:51, 3.32s/it] 44%|████▎ | 2899/6638 [2:45:58<3:28:18, 3.34s/it] {'loss': 0.6526, 'grad_norm': 0.6310132821530597, 'learning_rate': 1.2510334796757997e-05, 'epoch': 0.44} 44%|████▎ | 2899/6638 [2:45:58<3:28:18, 3.34s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 44%|████▎ | 2900/6638 [2:46:01<3:28:18, 3.34s/it]6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6574, 'grad_norm': 0.5985131414839913, 'learning_rate': 1.2505610991003678e-05, 'epoch': 0.44} 44%|████▎ | 2900/6638 [2:46:01<3:28:18, 3.34s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-2900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) 
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 44%|████▎ | 2901/6638 [2:46:20<8:21:55, 8.06s/it] {'loss': 0.6876, 'grad_norm': 0.5803557735786551, 'learning_rate': 1.250088658861063e-05, 'epoch': 0.44} 44%|████▎ | 2901/6638 [2:46:20<8:21:55, 8.06s/it] 44%|████▎ | 2902/6638 [2:46:23<6:50:30, 6.59s/it] {'loss': 0.6398, 'grad_norm': 0.5814403461107563, 'learning_rate': 1.2496161590703844e-05, 'epoch': 0.44} 44%|████▎ | 2902/6638 [2:46:23<6:50:30, 6.59s/it] 44%|████▎ | 2903/6638 [2:46:26<5:48:28, 5.60s/it] {'loss': 0.6061, 'grad_norm': 0.667881729593825, 'learning_rate': 1.249143599840843e-05, 'epoch': 0.44} 44%|████▎ | 2903/6638 [2:46:26<5:48:28, 5.60s/it] 44%|████▎ | 2904/6638 [2:46:30<5:05:10, 4.90s/it] {'loss': 0.6304, 'grad_norm': 0.6170360603393908, 'learning_rate': 1.2486709812849659e-05, 'epoch': 0.44} 44%|████▎ | 2904/6638 [2:46:30<5:05:10, 4.90s/it] 44%|████▍ | 2905/6638 [2:46:33<4:35:48, 4.43s/it] {'loss': 0.6427, 'grad_norm': 0.622093602604225, 'learning_rate': 1.248198303515293e-05, 'epoch': 0.44} 44%|████▍ | 2905/6638 [2:46:33<4:35:48, 4.43s/it] 44%|████▍ | 2906/6638 [2:46:36<4:14:38, 4.09s/it] {'loss': 0.6442, 'grad_norm': 0.6101365600806363, 'learning_rate': 1.2477255666443793e-05, 'epoch': 0.44} 44%|████▍ | 2906/6638 [2:46:36<4:14:38, 4.09s/it] 44%|████▍ | 2907/6638 [2:46:41<4:30:18, 4.35s/it] {'loss': 0.6127, 'grad_norm': 0.48501198487682606, 'learning_rate': 1.2472527707847926e-05, 'epoch': 0.44} 44%|████▍ | 2907/6638 [2:46:41<4:30:18, 4.35s/it] 44%|████▍ | 2908/6638 [2:46:45<4:10:53, 4.04s/it] {'loss': 0.6488, 'grad_norm': 0.6515109459665956, 'learning_rate': 1.2467799160491165e-05, 'epoch': 0.44} 44%|████▍ | 2908/6638 [2:46:45<4:10:53, 4.04s/it] 44%|████▍ | 2909/6638 [2:46:48<3:56:25, 3.80s/it] {'loss': 0.6619, 'grad_norm': 0.6269801919369047, 'learning_rate': 1.246307002549947e-05, 'epoch': 0.44} 44%|████▍ | 2909/6638 [2:46:48<3:56:25, 3.80s/it] 44%|████▍ | 2910/6638 [2:46:51<3:48:11, 3.67s/it] 
{'loss': 0.6671, 'grad_norm': 0.7158553834446646, 'learning_rate': 1.2458340303998954e-05, 'epoch': 0.44} 44%|████▍ | 2910/6638 [2:46:51<3:48:11, 3.67s/it] 44%|████▍ | 2911/6638 [2:46:55<3:40:29, 3.55s/it] {'loss': 0.6741, 'grad_norm': 0.5989814118471336, 'learning_rate': 1.2453609997115856e-05, 'epoch': 0.44} 44%|████▍ | 2911/6638 [2:46:55<3:40:29, 3.55s/it] 44%|████▍ | 2912/6638 [2:46:58<3:35:06, 3.46s/it] {'loss': 0.7025, 'grad_norm': 0.7969915034366338, 'learning_rate': 1.2448879105976567e-05, 'epoch': 0.44} 44%|████▍ | 2912/6638 [2:46:58<3:35:06, 3.46s/it] 44%|████▍ | 2913/6638 [2:47:01<3:30:34, 3.39s/it] {'loss': 0.619, 'grad_norm': 0.6211812175383659, 'learning_rate': 1.2444147631707606e-05, 'epoch': 0.44} 44%|████▍ | 2913/6638 [2:47:01<3:30:34, 3.39s/it] 44%|████▍ | 2914/6638 [2:47:04<3:29:40, 3.38s/it] {'loss': 0.6408, 'grad_norm': 0.6932658421036221, 'learning_rate': 1.2439415575435647e-05, 'epoch': 0.44} 44%|████▍ | 2914/6638 [2:47:04<3:29:40, 3.38s/it] 44%|████▍ | 2915/6638 [2:47:08<3:29:18, 3.37s/it] {'loss': 0.6252, 'grad_norm': 0.550647956404607, 'learning_rate': 1.2434682938287478e-05, 'epoch': 0.44} 44%|████▍ | 2915/6638 [2:47:08<3:29:18, 3.37s/it] 44%|████▍ | 2916/6638 [2:47:11<3:27:55, 3.35s/it] {'loss': 0.6613, 'grad_norm': 0.5770154521483221, 'learning_rate': 1.2429949721390053e-05, 'epoch': 0.44} 44%|████▍ | 2916/6638 [2:47:11<3:27:55, 3.35s/it] 44%|████▍ | 2917/6638 [2:47:14<3:26:34, 3.33s/it] {'loss': 0.66, 'grad_norm': 0.5976705537739693, 'learning_rate': 1.2425215925870439e-05, 'epoch': 0.44} 44%|████▍ | 2917/6638 [2:47:14<3:26:34, 3.33s/it] 44%|████▍ | 2918/6638 [2:47:17<3:23:20, 3.28s/it] {'loss': 0.5945, 'grad_norm': 0.552395095537515, 'learning_rate': 1.2420481552855862e-05, 'epoch': 0.44} 44%|████▍ | 2918/6638 [2:47:17<3:23:20, 3.28s/it] 44%|████▍ | 2919/6638 [2:47:21<3:22:49, 3.27s/it] {'loss': 0.6351, 'grad_norm': 0.5506246650721106, 'learning_rate': 1.2415746603473673e-05, 'epoch': 0.44} 44%|████▍ | 2919/6638 [2:47:21<3:22:49, 
3.27s/it] 44%|████▍ | 2920/6638 [2:47:24<3:23:52, 3.29s/it] {'loss': 0.664, 'grad_norm': 0.6199986378009769, 'learning_rate': 1.2411011078851361e-05, 'epoch': 0.44} 44%|████▍ | 2920/6638 [2:47:24<3:23:52, 3.29s/it] 44%|████▍ | 2921/6638 [2:47:27<3:25:24, 3.32s/it] {'loss': 0.6799, 'grad_norm': 0.6039042182495659, 'learning_rate': 1.2406274980116552e-05, 'epoch': 0.44} 44%|████▍ | 2921/6638 [2:47:27<3:25:24, 3.32s/it] 44%|████▍ | 2922/6638 [2:47:31<3:24:15, 3.30s/it] {'loss': 0.6941, 'grad_norm': 0.5970905124393634, 'learning_rate': 1.2401538308397019e-05, 'epoch': 0.44} 44%|████▍ | 2922/6638 [2:47:31<3:24:15, 3.30s/it] 44%|████▍ | 2923/6638 [2:47:34<3:23:49, 3.29s/it] {'loss': 0.6427, 'grad_norm': 0.5593260375157922, 'learning_rate': 1.2396801064820654e-05, 'epoch': 0.44} 44%|████▍ | 2923/6638 [2:47:34<3:23:49, 3.29s/it] 44%|████▍ | 2924/6638 [2:47:37<3:23:53, 3.29s/it] {'loss': 0.6661, 'grad_norm': 0.5925935745091451, 'learning_rate': 1.23920632505155e-05, 'epoch': 0.44} 44%|████▍ | 2924/6638 [2:47:37<3:23:53, 3.29s/it] 44%|████▍ | 2925/6638 [2:47:41<3:28:39, 3.37s/it] {'loss': 0.6532, 'grad_norm': 0.5365332559805847, 'learning_rate': 1.2387324866609732e-05, 'epoch': 0.44} 44%|████▍ | 2925/6638 [2:47:41<3:28:39, 3.37s/it] 44%|████▍ | 2926/6638 [2:47:44<3:27:52, 3.36s/it] {'loss': 0.7021, 'grad_norm': 0.6323392499982149, 'learning_rate': 1.238258591423165e-05, 'epoch': 0.44} 44%|████▍ | 2926/6638 [2:47:44<3:27:52, 3.36s/it] 44%|████▍ | 2927/6638 [2:47:48<3:28:11, 3.37s/it] {'loss': 0.7145, 'grad_norm': 0.6315525509603241, 'learning_rate': 1.237784639450971e-05, 'epoch': 0.44} 44%|████▍ | 2927/6638 [2:47:48<3:28:11, 3.37s/it] 44%|████▍ | 2928/6638 [2:47:51<3:25:47, 3.33s/it] {'loss': 0.6459, 'grad_norm': 0.610770950520282, 'learning_rate': 1.2373106308572483e-05, 'epoch': 0.44} 44%|████▍ | 2928/6638 [2:47:51<3:25:47, 3.33s/it] 44%|████▍ | 2929/6638 [2:47:54<3:24:09, 3.30s/it] {'loss': 0.6491, 'grad_norm': 0.5729195820735082, 'learning_rate': 1.2368365657548686e-05, 
'epoch': 0.44} 44%|████▍ | 2929/6638 [2:47:54<3:24:09, 3.30s/it] 44%|████▍ | 2930/6638 [2:47:57<3:21:04, 3.25s/it] {'loss': 0.6143, 'grad_norm': 0.5793399131618046, 'learning_rate': 1.2363624442567167e-05, 'epoch': 0.44} 44%|████▍ | 2930/6638 [2:47:57<3:21:04, 3.25s/it] 44%|████▍ | 2931/6638 [2:48:00<3:20:46, 3.25s/it] {'loss': 0.6665, 'grad_norm': 0.592571681546521, 'learning_rate': 1.235888266475691e-05, 'epoch': 0.44} 44%|████▍ | 2931/6638 [2:48:00<3:20:46, 3.25s/it] 44%|████▍ | 2932/6638 [2:48:04<3:21:32, 3.26s/it] {'loss': 0.6366, 'grad_norm': 0.6287442554892508, 'learning_rate': 1.2354140325247033e-05, 'epoch': 0.44} 44%|████▍ | 2932/6638 [2:48:04<3:21:32, 3.26s/it] 44%|████▍ | 2933/6638 [2:48:07<3:20:06, 3.24s/it] {'loss': 0.6916, 'grad_norm': 0.6446567366623647, 'learning_rate': 1.2349397425166788e-05, 'epoch': 0.44} 44%|████▍ | 2933/6638 [2:48:07<3:20:06, 3.24s/it] 44%|████▍ | 2934/6638 [2:48:10<3:19:41, 3.23s/it] {'loss': 0.6604, 'grad_norm': 0.5691119490759666, 'learning_rate': 1.2344653965645553e-05, 'epoch': 0.44} 44%|████▍ | 2934/6638 [2:48:10<3:19:41, 3.23s/it] 44%|████▍ | 2935/6638 [2:48:13<3:22:24, 3.28s/it] {'loss': 0.6671, 'grad_norm': 0.5592132470552621, 'learning_rate': 1.2339909947812851e-05, 'epoch': 0.44} 44%|████▍ | 2935/6638 [2:48:13<3:22:24, 3.28s/it] 44%|████▍ | 2936/6638 [2:48:17<3:24:04, 3.31s/it] {'loss': 0.616, 'grad_norm': 0.5181448853928653, 'learning_rate': 1.233516537279833e-05, 'epoch': 0.44} 44%|████▍ | 2936/6638 [2:48:17<3:24:04, 3.31s/it] 44%|████▍ | 2937/6638 [2:48:20<3:26:06, 3.34s/it] {'loss': 0.6586, 'grad_norm': 0.5991386649704505, 'learning_rate': 1.2330420241731778e-05, 'epoch': 0.44} 44%|████▍ | 2937/6638 [2:48:20<3:26:06, 3.34s/it] 44%|████▍ | 2938/6638 [2:48:24<3:25:50, 3.34s/it] {'loss': 0.6802, 'grad_norm': 0.6321259446644523, 'learning_rate': 1.2325674555743106e-05, 'epoch': 0.44} 44%|████▍ | 2938/6638 [2:48:24<3:25:50, 3.34s/it] 44%|████▍ | 2939/6638 [2:48:27<3:25:26, 3.33s/it] {'loss': 0.6495, 'grad_norm': 
0.622782540706548, 'learning_rate': 1.2320928315962362e-05, 'epoch': 0.44} 44%|████▍ | 2939/6638 [2:48:27<3:25:26, 3.33s/it] 44%|████▍ | 2940/6638 [2:48:30<3:25:02, 3.33s/it] {'loss': 0.6864, 'grad_norm': 0.6489402445303075, 'learning_rate': 1.2316181523519725e-05, 'epoch': 0.44} 44%|████▍ | 2940/6638 [2:48:30<3:25:02, 3.33s/it] 44%|████▍ | 2941/6638 [2:48:34<3:26:04, 3.34s/it] {'loss': 0.6624, 'grad_norm': 0.5949231955313828, 'learning_rate': 1.231143417954551e-05, 'epoch': 0.44} 44%|████▍ | 2941/6638 [2:48:34<3:26:04, 3.34s/it] 44%|████▍ | 2942/6638 [2:48:37<3:23:54, 3.31s/it] {'loss': 0.6486, 'grad_norm': 0.646954316747398, 'learning_rate': 1.2306686285170156e-05, 'epoch': 0.44} 44%|████▍ | 2942/6638 [2:48:37<3:23:54, 3.31s/it] 44%|████▍ | 2943/6638 [2:48:40<3:23:15, 3.30s/it] {'loss': 0.6616, 'grad_norm': 0.600652095790168, 'learning_rate': 1.230193784152424e-05, 'epoch': 0.44} 44%|████▍ | 2943/6638 [2:48:40<3:23:15, 3.30s/it] 44%|████▍ | 2944/6638 [2:48:43<3:22:24, 3.29s/it] {'loss': 0.6493, 'grad_norm': 0.5468823343056689, 'learning_rate': 1.2297188849738462e-05, 'epoch': 0.44} 44%|████▍ | 2944/6638 [2:48:43<3:22:24, 3.29s/it] 44%|████▍ | 2945/6638 [2:48:47<3:22:48, 3.30s/it] {'loss': 0.6559, 'grad_norm': 0.578695070001257, 'learning_rate': 1.2292439310943658e-05, 'epoch': 0.44} 44%|████▍ | 2945/6638 [2:48:47<3:22:48, 3.30s/it] 44%|████▍ | 2946/6638 [2:48:50<3:22:00, 3.28s/it] {'loss': 0.6734, 'grad_norm': 0.6294330582846047, 'learning_rate': 1.2287689226270794e-05, 'epoch': 0.44} 44%|████▍ | 2946/6638 [2:48:50<3:22:00, 3.28s/it] 44%|████▍ | 2947/6638 [2:48:53<3:24:38, 3.33s/it] {'loss': 0.6416, 'grad_norm': 0.6630906624152908, 'learning_rate': 1.2282938596850968e-05, 'epoch': 0.44} 44%|████▍ | 2947/6638 [2:48:53<3:24:38, 3.33s/it] 44%|████▍ | 2948/6638 [2:48:57<3:23:56, 3.32s/it] {'loss': 0.6714, 'grad_norm': 0.7258808545453754, 'learning_rate': 1.2278187423815402e-05, 'epoch': 0.44} 44%|████▍ | 2948/6638 [2:48:57<3:23:56, 3.32s/it] 44%|████▍ | 2949/6638 
[2:49:00<3:21:41, 3.28s/it] {'loss': 0.6307, 'grad_norm': 0.5678405929460838, 'learning_rate': 1.2273435708295451e-05, 'epoch': 0.44} 44%|████▍ | 2949/6638 [2:49:00<3:21:41, 3.28s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 44%|████▍ | 2950/6638 [2:49:03<3:20:11, 3.26s/it]1 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... {'loss': 0.6239, 'grad_norm': 0.632140822383796, 'learning_rate': 1.2268683451422596e-05, 'epoch': 0.44} 44%|████▍ | 2950/6638 [2:49:03<3:20:11, 3.26s/it] 44%|████▍ | 2951/6638 [2:49:06<3:19:57, 3.25s/it] {'loss': 0.6688, 'grad_norm': 0.6211201905729622, 'learning_rate': 1.2263930654328452e-05, 'epoch': 0.44} 44%|████▍ | 2951/6638 [2:49:06<3:19:57, 3.25s/it] 44%|████▍ | 2952/6638 [2:49:10<3:19:33, 3.25s/it] {'loss': 0.6786, 'grad_norm': 0.630383286469929, 'learning_rate': 1.2259177318144762e-05, 'epoch': 0.44} 44%|████▍ | 2952/6638 [2:49:10<3:19:33, 3.25s/it] 44%|████▍ | 2953/6638 [2:49:13<3:18:41, 3.24s/it] {'loss': 0.6731, 'grad_norm': 0.624396037191847, 'learning_rate': 1.2254423444003387e-05, 'epoch': 0.44} 44%|████▍ | 2953/6638 [2:49:13<3:18:41, 3.24s/it] 45%|████▍ | 2954/6638 [2:49:16<3:19:10, 3.24s/it] {'loss': 0.6263, 'grad_norm': 0.5736438858195643, 'learning_rate': 1.2249669033036336e-05, 'epoch': 0.45} 45%|████▍ | 2954/6638 [2:49:16<3:19:10, 3.24s/it] 45%|████▍ | 2955/6638 [2:49:19<3:19:06, 3.24s/it] {'loss': 0.6872, 'grad_norm': 0.6192969050648685, 'learning_rate': 1.2244914086375726e-05, 'epoch': 0.45} 45%|████▍ | 2955/6638 [2:49:19<3:19:06, 3.24s/it] 45%|████▍ | 2956/6638 [2:49:22<3:18:16, 3.23s/it] {'loss': 0.6742, 'grad_norm': 0.5764695652619181, 'learning_rate': 1.2240158605153814e-05, 'epoch': 0.45} 45%|████▍ | 2956/6638 
[2:49:22<3:18:16, 3.23s/it] 45%|████▍ | 2957/6638 [2:49:26<3:17:19, 3.22s/it] {'loss': 0.6254, 'grad_norm': 0.6028134177032795, 'learning_rate': 1.223540259050298e-05, 'epoch': 0.45} 45%|████▍ | 2957/6638 [2:49:26<3:17:19, 3.22s/it] 45%|████▍ | 2958/6638 [2:49:29<3:18:31, 3.24s/it] {'loss': 0.6608, 'grad_norm': 0.5849097378495991, 'learning_rate': 1.2230646043555729e-05, 'epoch': 0.45} 45%|████▍ | 2958/6638 [2:49:29<3:18:31, 3.24s/it] 45%|████▍ | 2959/6638 [2:49:32<3:17:51, 3.23s/it] {'loss': 0.6398, 'grad_norm': 0.5741480237401129, 'learning_rate': 1.2225888965444694e-05, 'epoch': 0.45} 45%|████▍ | 2959/6638 [2:49:32<3:17:51, 3.23s/it] 45%|████▍ | 2960/6638 [2:49:35<3:17:40, 3.22s/it] {'loss': 0.6404, 'grad_norm': 0.5864280503341437, 'learning_rate': 1.2221131357302643e-05, 'epoch': 0.45} 45%|████▍ | 2960/6638 [2:49:35<3:17:40, 3.22s/it] 45%|████▍ | 2961/6638 [2:49:39<3:20:47, 3.28s/it] {'loss': 0.6289, 'grad_norm': 0.629742556160432, 'learning_rate': 1.2216373220262453e-05, 'epoch': 0.45} 45%|████▍ | 2961/6638 [2:49:39<3:20:47, 3.28s/it] 45%|████▍ | 2962/6638 [2:49:42<3:20:37, 3.27s/it] {'loss': 0.6511, 'grad_norm': 0.5562675998514632, 'learning_rate': 1.221161455545714e-05, 'epoch': 0.45} 45%|████▍ | 2962/6638 [2:49:42<3:20:37, 3.27s/it] 45%|████▍ | 2963/6638 [2:49:45<3:19:53, 3.26s/it] {'loss': 0.6458, 'grad_norm': 0.6780498537868972, 'learning_rate': 1.2206855364019845e-05, 'epoch': 0.45} 45%|████▍ | 2963/6638 [2:49:45<3:19:53, 3.26s/it] 45%|████▍ | 2964/6638 [2:49:49<3:21:06, 3.28s/it] {'loss': 0.6661, 'grad_norm': 0.6449280785072801, 'learning_rate': 1.220209564708383e-05, 'epoch': 0.45} 45%|████▍ | 2964/6638 [2:49:49<3:21:06, 3.28s/it] 45%|████▍ | 2965/6638 [2:49:52<3:20:51, 3.28s/it] {'loss': 0.6296, 'grad_norm': 0.49796530252914994, 'learning_rate': 1.219733540578248e-05, 'epoch': 0.45} 45%|████▍ | 2965/6638 [2:49:52<3:20:51, 3.28s/it] 45%|████▍ | 2966/6638 [2:49:55<3:19:24, 3.26s/it] {'loss': 0.6262, 'grad_norm': 0.617450205764656, 'learning_rate': 
1.2192574641249318e-05, 'epoch': 0.45} 45%|████▍ | 2966/6638 [2:49:55<3:19:24, 3.26s/it] 45%|████▍ | 2967/6638 [2:49:58<3:20:29, 3.28s/it] {'loss': 0.6273, 'grad_norm': 0.518809155839549, 'learning_rate': 1.2187813354617973e-05, 'epoch': 0.45} 45%|████▍ | 2967/6638 [2:49:58<3:20:29, 3.28s/it] 45%|████▍ | 2968/6638 [2:50:02<3:18:45, 3.25s/it] {'loss': 0.6834, 'grad_norm': 0.6227225416771147, 'learning_rate': 1.2183051547022212e-05, 'epoch': 0.45} 45%|████▍ | 2968/6638 [2:50:02<3:18:45, 3.25s/it] 45%|████▍ | 2969/6638 [2:50:05<3:18:18, 3.24s/it] {'loss': 0.6548, 'grad_norm': 0.617261173180276, 'learning_rate': 1.2178289219595917e-05, 'epoch': 0.45} 45%|████▍ | 2969/6638 [2:50:05<3:18:18, 3.24s/it] 45%|████▍ | 2970/6638 [2:50:08<3:16:01, 3.21s/it] {'loss': 0.6184, 'grad_norm': 0.6127107103419333, 'learning_rate': 1.2173526373473105e-05, 'epoch': 0.45} 45%|████▍ | 2970/6638 [2:50:08<3:16:01, 3.21s/it] 45%|████▍ | 2971/6638 [2:50:11<3:17:43, 3.24s/it] {'loss': 0.6482, 'grad_norm': 0.5893368212386408, 'learning_rate': 1.216876300978791e-05, 'epoch': 0.45} 45%|████▍ | 2971/6638 [2:50:11<3:17:43, 3.24s/it] 45%|████▍ | 2972/6638 [2:50:14<3:17:11, 3.23s/it] {'loss': 0.6993, 'grad_norm': 0.6334554689630555, 'learning_rate': 1.2163999129674583e-05, 'epoch': 0.45} 45%|████▍ | 2972/6638 [2:50:14<3:17:11, 3.23s/it] 45%|████▍ | 2973/6638 [2:50:18<3:20:27, 3.28s/it] {'loss': 0.6411, 'grad_norm': 0.5654192811515041, 'learning_rate': 1.2159234734267504e-05, 'epoch': 0.45} 45%|████▍ | 2973/6638 [2:50:18<3:20:27, 3.28s/it] 45%|████▍ | 2974/6638 [2:50:21<3:20:23, 3.28s/it] {'loss': 0.6733, 'grad_norm': 0.661250535730681, 'learning_rate': 1.2154469824701185e-05, 'epoch': 0.45} 45%|████▍ | 2974/6638 [2:50:21<3:20:23, 3.28s/it] 45%|████▍ | 2975/6638 [2:50:25<3:24:38, 3.35s/it] {'loss': 0.6861, 'grad_norm': 0.6281835335770245, 'learning_rate': 1.2149704402110243e-05, 'epoch': 0.45} 45%|████▍ | 2975/6638 [2:50:25<3:24:38, 3.35s/it] 45%|████▍ | 2976/6638 [2:50:28<3:22:36, 3.32s/it] {'loss': 
0.6636, 'grad_norm': 0.6360566052516738, 'learning_rate': 1.2144938467629426e-05, 'epoch': 0.45} 45%|████▍ | 2976/6638 [2:50:28<3:22:36, 3.32s/it] 45%|████▍ | 2977/6638 [2:50:31<3:22:12, 3.31s/it] {'loss': 0.6358, 'grad_norm': 0.5377599931394372, 'learning_rate': 1.2140172022393608e-05, 'epoch': 0.45} 45%|████▍ | 2977/6638 [2:50:31<3:22:12, 3.31s/it] 45%|████▍ | 2978/6638 [2:50:35<3:23:14, 3.33s/it] {'loss': 0.6801, 'grad_norm': 0.602247636925524, 'learning_rate': 1.2135405067537778e-05, 'epoch': 0.45} 45%|████▍ | 2978/6638 [2:50:35<3:23:14, 3.33s/it] 45%|████▍ | 2979/6638 [2:50:38<3:21:43, 3.31s/it] {'loss': 0.6458, 'grad_norm': 0.5999306201648944, 'learning_rate': 1.2130637604197043e-05, 'epoch': 0.45} 45%|████▍ | 2979/6638 [2:50:38<3:21:43, 3.31s/it] 45%|████▍ | 2980/6638 [2:50:41<3:22:26, 3.32s/it] {'loss': 0.6775, 'grad_norm': 0.6103843738031605, 'learning_rate': 1.2125869633506643e-05, 'epoch': 0.45} 45%|████▍ | 2980/6638 [2:50:41<3:22:26, 3.32s/it] 45%|████▍ | 2981/6638 [2:50:44<3:20:49, 3.29s/it] {'loss': 0.6222, 'grad_norm': 0.5609918640303528, 'learning_rate': 1.2121101156601932e-05, 'epoch': 0.45} 45%|████▍ | 2981/6638 [2:50:44<3:20:49, 3.29s/it] 45%|████▍ | 2982/6638 [2:50:48<3:19:39, 3.28s/it] {'loss': 0.6428, 'grad_norm': 0.6008886822608164, 'learning_rate': 1.2116332174618378e-05, 'epoch': 0.45} 45%|████▍ | 2982/6638 [2:50:48<3:19:39, 3.28s/it] 45%|████▍ | 2983/6638 [2:50:51<3:19:54, 3.28s/it] {'loss': 0.6502, 'grad_norm': 0.5539301958962769, 'learning_rate': 1.2111562688691587e-05, 'epoch': 0.45} 45%|████▍ | 2983/6638 [2:50:51<3:19:54, 3.28s/it] 45%|████▍ | 2984/6638 [2:50:54<3:17:19, 3.24s/it] {'loss': 0.688, 'grad_norm': 0.701974132307586, 'learning_rate': 1.2106792699957264e-05, 'epoch': 0.45} 45%|████▍ | 2984/6638 [2:50:54<3:17:19, 3.24s/it] 45%|████▍ | 2985/6638 [2:50:57<3:18:34, 3.26s/it] {'loss': 0.6494, 'grad_norm': 0.5927804786786363, 'learning_rate': 1.2102022209551249e-05, 'epoch': 0.45} 45%|████▍ | 2985/6638 [2:50:57<3:18:34, 3.26s/it] 
45%|████▍ | 2986/6638 [2:51:01<3:21:51, 3.32s/it] {'loss': 0.6408, 'grad_norm': 0.5674928665498596, 'learning_rate': 1.2097251218609494e-05, 'epoch': 0.45} 45%|████▍ | 2986/6638 [2:51:01<3:21:51, 3.32s/it] 45%|████▍ | 2987/6638 [2:51:04<3:21:06, 3.31s/it] {'loss': 0.6314, 'grad_norm': 0.5865207098208618, 'learning_rate': 1.2092479728268073e-05, 'epoch': 0.45} 45%|████▍ | 2987/6638 [2:51:04<3:21:06, 3.31s/it] 45%|████▌ | 2988/6638 [2:51:08<3:23:22, 3.34s/it] {'loss': 0.6702, 'grad_norm': 0.6276884776111197, 'learning_rate': 1.2087707739663177e-05, 'epoch': 0.45} 45%|████▌ | 2988/6638 [2:51:08<3:23:22, 3.34s/it] 45%|████▌ | 2989/6638 [2:51:11<3:21:25, 3.31s/it] {'loss': 0.6764, 'grad_norm': 0.5404726839562132, 'learning_rate': 1.2082935253931121e-05, 'epoch': 0.45} 45%|████▌ | 2989/6638 [2:51:11<3:21:25, 3.31s/it] 45%|████▌ | 2990/6638 [2:51:14<3:19:51, 3.29s/it] {'loss': 0.6266, 'grad_norm': 0.5447237294197628, 'learning_rate': 1.2078162272208332e-05, 'epoch': 0.45} 45%|████▌ | 2990/6638 [2:51:14<3:19:51, 3.29s/it] 45%|████▌ | 2991/6638 [2:51:17<3:19:36, 3.28s/it] {'loss': 0.6785, 'grad_norm': 0.6331279220723166, 'learning_rate': 1.2073388795631355e-05, 'epoch': 0.45} 45%|████▌ | 2991/6638 [2:51:17<3:19:36, 3.28s/it] 45%|████▌ | 2992/6638 [2:51:21<3:20:08, 3.29s/it] {'loss': 0.6397, 'grad_norm': 0.5443690935550074, 'learning_rate': 1.2068614825336856e-05, 'epoch': 0.45} 45%|████▌ | 2992/6638 [2:51:21<3:20:08, 3.29s/it] 45%|████▌ | 2993/6638 [2:51:24<3:17:55, 3.26s/it] {'loss': 0.6475, 'grad_norm': 0.6465153672357878, 'learning_rate': 1.2063840362461621e-05, 'epoch': 0.45} 45%|████▌ | 2993/6638 [2:51:24<3:17:55, 3.26s/it] 45%|████▌ | 2994/6638 [2:51:27<3:17:54, 3.26s/it] {'loss': 0.6581, 'grad_norm': 0.6436260987113472, 'learning_rate': 1.205906540814255e-05, 'epoch': 0.45} 45%|████▌ | 2994/6638 [2:51:27<3:17:54, 3.26s/it] 45%|████▌ | 2995/6638 [2:51:30<3:17:43, 3.26s/it] {'loss': 0.6595, 'grad_norm': 0.6339139750297472, 'learning_rate': 1.2054289963516656e-05, 
'epoch': 0.45} 45%|████▌ | 2995/6638 [2:51:30<3:17:43, 3.26s/it] 45%|████▌ | 2996/6638 [2:51:34<3:18:59, 3.28s/it] {'loss': 0.6775, 'grad_norm': 0.6495057720080203, 'learning_rate': 1.2049514029721077e-05, 'epoch': 0.45} 45%|████▌ | 2996/6638 [2:51:34<3:18:59, 3.28s/it] 45%|████▌ | 2997/6638 [2:51:37<3:19:24, 3.29s/it] {'loss': 0.638, 'grad_norm': 0.823850257873031, 'learning_rate': 1.204473760789306e-05, 'epoch': 0.45} 45%|████▌ | 2997/6638 [2:51:37<3:19:24, 3.29s/it] 45%|████▌ | 2998/6638 [2:51:40<3:17:35, 3.26s/it] {'loss': 0.6388, 'grad_norm': 0.5602735506281792, 'learning_rate': 1.2039960699169972e-05, 'epoch': 0.45} 45%|████▌ | 2998/6638 [2:51:40<3:17:35, 3.26s/it] 45%|████▌ | 2999/6638 [2:51:43<3:16:44, 3.24s/it] {'loss': 0.6106, 'grad_norm': 0.5177276132898304, 'learning_rate': 1.2035183304689303e-05, 'epoch': 0.45} 45%|████▌ | 2999/6638 [2:51:43<3:16:44, 3.24s/it]2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 45%|████▌ | 3000/6638 [2:51:47<3:18:27, 3.27s/it]6 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.652, 'grad_norm': 0.6151880187745138, 'learning_rate': 1.2030405425588638e-05, 'epoch': 0.45} 45%|████▌ | 3000/6638 [2:51:47<3:18:27, 3.27s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3000/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3000/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3000/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) 
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 45%|████▌ | 3001/6638 [2:52:04<7:26:25, 7.36s/it] {'loss': 0.6388, 'grad_norm': 0.5725036207935896, 'learning_rate': 1.2025627063005703e-05, 'epoch': 0.45} 45%|████▌ | 3001/6638 [2:52:04<7:26:25, 7.36s/it] 45%|████▌ | 3002/6638 [2:52:07<6:12:30, 6.15s/it] {'loss': 0.6266, 'grad_norm': 0.5687447969559645, 'learning_rate': 1.2020848218078315e-05, 'epoch': 0.45} 45%|████▌ | 3002/6638 [2:52:07<6:12:30, 6.15s/it] 45%|████▌ | 3003/6638 [2:52:10<5:19:08, 5.27s/it] {'loss': 0.6528, 'grad_norm': 0.5983227724752451, 'learning_rate': 1.201606889194443e-05, 'epoch': 0.45} 45%|████▌ | 3003/6638 [2:52:10<5:19:08, 5.27s/it] 45%|████▌ | 3004/6638 [2:52:13<4:43:35, 4.68s/it] {'loss': 0.6932, 'grad_norm': 0.6137047673898204, 'learning_rate': 1.2011289085742099e-05, 'epoch': 0.45} 45%|████▌ | 3004/6638 [2:52:13<4:43:35, 4.68s/it] 45%|████▌ | 3005/6638 [2:52:17<4:17:33, 4.25s/it] {'loss': 0.6302, 'grad_norm': 0.5616251947551949, 'learning_rate': 1.2006508800609495e-05, 'epoch': 0.45} 45%|████▌ | 3005/6638 [2:52:17<4:17:33, 4.25s/it] 45%|████▌ | 3006/6638 [2:52:20<3:58:25, 3.94s/it] {'loss': 0.662, 'grad_norm': 0.6535326246351111, 'learning_rate': 1.2001728037684903e-05, 'epoch': 0.45} 45%|████▌ | 3006/6638 [2:52:20<3:58:25, 3.94s/it] 45%|████▌ | 3007/6638 [2:52:23<3:45:22, 3.72s/it] {'loss': 0.6548, 'grad_norm': 0.7034630558714904, 'learning_rate': 1.1996946798106728e-05, 'epoch': 0.45} 45%|████▌ | 3007/6638 [2:52:23<3:45:22, 3.72s/it] 45%|████▌ | 3008/6638 [2:52:26<3:35:08, 3.56s/it] {'loss': 0.6325, 'grad_norm': 0.6217649984619366, 'learning_rate': 1.199216508301348e-05, 'epoch': 0.45} 45%|████▌ | 3008/6638 [2:52:26<3:35:08, 3.56s/it] 45%|████▌ | 3009/6638 [2:52:30<3:31:07, 3.49s/it] {'loss': 0.6577, 'grad_norm': 0.5547108317661548, 'learning_rate': 1.1987382893543786e-05, 'epoch': 0.45} 45%|████▌ | 3009/6638 [2:52:30<3:31:07, 3.49s/it] 45%|████▌ | 3010/6638 [2:52:33<3:27:18, 
3.43s/it] {'loss': 0.6504, 'grad_norm': 0.5527128928061764, 'learning_rate': 1.198260023083639e-05, 'epoch': 0.45} 45%|████▌ | 3010/6638 [2:52:33<3:27:18, 3.43s/it] 45%|████▌ | 3011/6638 [2:52:36<3:25:37, 3.40s/it] {'loss': 0.6578, 'grad_norm': 0.5718474586244586, 'learning_rate': 1.1977817096030138e-05, 'epoch': 0.45} 45%|████▌ | 3011/6638 [2:52:36<3:25:37, 3.40s/it] 45%|████▌ | 3012/6638 [2:52:40<3:28:03, 3.44s/it] {'loss': 0.7132, 'grad_norm': 0.5953894607240025, 'learning_rate': 1.1973033490264e-05, 'epoch': 0.45} 45%|████▌ | 3012/6638 [2:52:40<3:28:03, 3.44s/it] 45%|████▌ | 3013/6638 [2:52:43<3:29:57, 3.48s/it] {'loss': 0.6583, 'grad_norm': 0.5956491939946196, 'learning_rate': 1.1968249414677055e-05, 'epoch': 0.45} 45%|████▌ | 3013/6638 [2:52:43<3:29:57, 3.48s/it] 45%|████▌ | 3014/6638 [2:52:47<3:28:13, 3.45s/it] {'loss': 0.7037, 'grad_norm': 0.6396365573550484, 'learning_rate': 1.196346487040849e-05, 'epoch': 0.45} 45%|████▌ | 3014/6638 [2:52:47<3:28:13, 3.45s/it] 45%|████▌ | 3015/6638 [2:52:50<3:24:31, 3.39s/it] {'loss': 0.5905, 'grad_norm': 0.5455275222744983, 'learning_rate': 1.1958679858597599e-05, 'epoch': 0.45} 45%|████▌ | 3015/6638 [2:52:50<3:24:31, 3.39s/it] 45%|████▌ | 3016/6638 [2:52:53<3:22:58, 3.36s/it] {'loss': 0.654, 'grad_norm': 0.6758694374743819, 'learning_rate': 1.1953894380383805e-05, 'epoch': 0.45} 45%|████▌ | 3016/6638 [2:52:53<3:22:58, 3.36s/it] 45%|████▌ | 3017/6638 [2:52:57<3:22:12, 3.35s/it] {'loss': 0.6428, 'grad_norm': 0.5929331277554535, 'learning_rate': 1.1949108436906625e-05, 'epoch': 0.45} 45%|████▌ | 3017/6638 [2:52:57<3:22:12, 3.35s/it] 45%|████▌ | 3018/6638 [2:53:00<3:19:41, 3.31s/it] {'loss': 0.6435, 'grad_norm': 0.6363850271095031, 'learning_rate': 1.1944322029305697e-05, 'epoch': 0.45} 45%|████▌ | 3018/6638 [2:53:00<3:19:41, 3.31s/it] 45%|████▌ | 3019/6638 [2:53:03<3:18:50, 3.30s/it] {'loss': 0.6418, 'grad_norm': 0.6379709262313806, 'learning_rate': 1.193953515872076e-05, 'epoch': 0.45} 45%|████▌ | 3019/6638 
45%|████▌ | 3020/6638 [2:53:06<3:20:45, 3.33s/it] {'loss': 0.6557, 'grad_norm': 0.5657183347481478, 'learning_rate': 1.193474782629167e-05, 'epoch': 0.45}
46%|████▌ | 3021/6638 [2:53:10<3:21:55, 3.35s/it] {'loss': 0.6779, 'grad_norm': 0.6057441686233195, 'learning_rate': 1.1929960033158392e-05, 'epoch': 0.46}
46%|████▌ | 3022/6638 [2:53:13<3:19:18, 3.31s/it] {'loss': 0.6411, 'grad_norm': 0.5564049457893279, 'learning_rate': 1.1925171780461004e-05, 'epoch': 0.46}
46%|████▌ | 3023/6638 [2:53:16<3:18:34, 3.30s/it] {'loss': 0.643, 'grad_norm': 0.623784466415611, 'learning_rate': 1.1920383069339684e-05, 'epoch': 0.46}
46%|████▌ | 3024/6638 [2:53:19<3:16:25, 3.26s/it] {'loss': 0.6775, 'grad_norm': 0.6609351418279199, 'learning_rate': 1.191559390093473e-05, 'epoch': 0.46}
46%|████▌ | 3025/6638 [2:53:23<3:16:45, 3.27s/it] {'loss': 0.6303, 'grad_norm': 0.5313111076229331, 'learning_rate': 1.1910804276386541e-05, 'epoch': 0.46}
46%|████▌ | 3026/6638 [2:53:26<3:17:13, 3.28s/it] {'loss': 0.6777, 'grad_norm': 0.6967216143823634, 'learning_rate': 1.190601419683563e-05, 'epoch': 0.46}
46%|████▌ | 3027/6638 [2:53:29<3:16:11, 3.26s/it] {'loss': 0.6449, 'grad_norm': 0.6105205520172671, 'learning_rate': 1.1901223663422611e-05, 'epoch': 0.46}
46%|████▌ | 3028/6638 [2:53:33<3:15:43, 3.25s/it] {'loss': 0.6631, 'grad_norm': 0.5709273498108195, 'learning_rate': 1.1896432677288215e-05, 'epoch': 0.46}
46%|████▌ | 3029/6638 [2:53:36<3:23:14, 3.38s/it] {'loss': 0.6545, 'grad_norm': 0.5537969742479244, 'learning_rate': 1.1891641239573273e-05, 'epoch': 0.46}
46%|████▌ | 3030/6638 [2:53:39<3:20:27, 3.33s/it] {'loss': 0.6438, 'grad_norm': 0.6222377205839881, 'learning_rate': 1.1886849351418732e-05, 'epoch': 0.46}
46%|████▌ | 3031/6638 [2:53:43<3:18:28, 3.30s/it] {'loss': 0.6493, 'grad_norm': 0.589007121756015, 'learning_rate': 1.1882057013965637e-05, 'epoch': 0.46}
46%|████▌ | 3032/6638 [2:53:46<3:17:17, 3.28s/it] {'loss': 0.65, 'grad_norm': 0.6111550284051496, 'learning_rate': 1.187726422835515e-05, 'epoch': 0.46}
46%|████▌ | 3033/6638 [2:53:49<3:16:44, 3.27s/it] {'loss': 0.6147, 'grad_norm': 0.6404676746017174, 'learning_rate': 1.1872470995728529e-05, 'epoch': 0.46}
46%|████▌ | 3034/6638 [2:53:53<3:25:15, 3.42s/it] {'loss': 0.71, 'grad_norm': 0.6920142474566029, 'learning_rate': 1.1867677317227144e-05, 'epoch': 0.46}
46%|████▌ | 3035/6638 [2:53:56<3:22:17, 3.37s/it] {'loss': 0.6221, 'grad_norm': 0.5467144265831788, 'learning_rate': 1.1862883193992471e-05, 'epoch': 0.46}
46%|████▌ | 3036/6638 [2:53:59<3:20:56, 3.35s/it] {'loss': 0.645, 'grad_norm': 0.6048367321217213, 'learning_rate': 1.1858088627166096e-05, 'epoch': 0.46}
46%|████▌ | 3037/6638 [2:54:03<3:19:43, 3.33s/it] {'loss': 0.6959, 'grad_norm': 0.6655780139528195, 'learning_rate': 1.1853293617889701e-05, 'epoch': 0.46}
46%|████▌ | 3038/6638 [2:54:06<3:18:51, 3.31s/it] {'loss': 0.6676, 'grad_norm': 0.6370611174336352, 'learning_rate': 1.1848498167305078e-05, 'epoch': 0.46}
46%|████▌ | 3039/6638 [2:54:09<3:18:26, 3.31s/it] {'loss': 0.6144, 'grad_norm': 0.5906192217820777, 'learning_rate': 1.1843702276554131e-05, 'epoch': 0.46}
46%|████▌ | 3040/6638 [2:54:13<3:17:13, 3.29s/it] {'loss': 0.6439, 'grad_norm': 0.6369648521651176, 'learning_rate': 1.1838905946778856e-05, 'epoch': 0.46}
46%|████▌ | 3041/6638 [2:54:16<3:18:08, 3.30s/it] {'loss': 0.6452, 'grad_norm': 0.5871280447973884, 'learning_rate': 1.1834109179121367e-05, 'epoch': 0.46}
46%|████▌ | 3042/6638 [2:54:19<3:16:44, 3.28s/it] {'loss': 0.6548, 'grad_norm': 0.5972135917402859, 'learning_rate': 1.1829311974723868e-05, 'epoch': 0.46}
46%|████▌ | 3043/6638 [2:54:23<3:20:00, 3.34s/it] {'loss': 0.6711, 'grad_norm': 0.6562920805498269, 'learning_rate': 1.1824514334728678e-05, 'epoch': 0.46}
46%|████▌ | 3044/6638 [2:54:26<3:21:33, 3.36s/it] {'loss': 0.7041, 'grad_norm': 0.6265578005216372, 'learning_rate': 1.1819716260278215e-05, 'epoch': 0.46}
46%|████▌ | 3045/6638 [2:54:29<3:21:36, 3.37s/it] {'loss': 0.6571, 'grad_norm': 0.6224890190954586, 'learning_rate': 1.1814917752515005e-05, 'epoch': 0.46}
46%|████▌ | 3046/6638 [2:54:33<3:25:30, 3.43s/it] {'loss': 0.7136, 'grad_norm': 0.7325415707537838, 'learning_rate': 1.1810118812581671e-05, 'epoch': 0.46}
46%|████▌ | 3047/6638 [2:54:36<3:22:56, 3.39s/it] {'loss': 0.6596, 'grad_norm': 0.6245946518051633, 'learning_rate': 1.180531944162094e-05, 'epoch': 0.46}
46%|████▌ | 3048/6638 [2:54:40<3:22:23, 3.38s/it] {'loss': 0.6652, 'grad_norm': 0.6212586660928074, 'learning_rate': 1.1800519640775642e-05, 'epoch': 0.46}
46%|████▌ | 3049/6638 [2:54:43<3:21:22, 3.37s/it] {'loss': 0.6042, 'grad_norm': 0.611434337801836, 'learning_rate': 1.1795719411188717e-05, 'epoch': 0.46}
AutoResumeHook: Checking whether to suspend... (x8 ranks)
46%|████▌ | 3050/6638 [2:54:46<3:20:44, 3.36s/it] {'loss': 0.6515, 'grad_norm': 0.5832251472416285, 'learning_rate': 1.1790918754003195e-05, 'epoch': 0.46}
46%|████▌ | 3051/6638 [2:54:50<3:18:25, 3.32s/it] {'loss': 0.675, 'grad_norm': 0.6636422953162753, 'learning_rate': 1.178611767036222e-05, 'epoch': 0.46}
46%|████▌ | 3052/6638 [2:54:53<3:15:56, 3.28s/it] {'loss': 0.6567, 'grad_norm': 0.6504167933847731, 'learning_rate': 1.1781316161409024e-05, 'epoch': 0.46}
46%|████▌ | 3053/6638 [2:54:56<3:14:37, 3.26s/it] {'loss': 0.6821, 'grad_norm': 0.6513190423457916, 'learning_rate': 1.1776514228286951e-05, 'epoch': 0.46}
46%|████▌ | 3054/6638 [2:54:59<3:14:49, 3.26s/it] {'loss': 0.6698, 'grad_norm': 0.6852390565108801, 'learning_rate': 1.177171187213944e-05, 'epoch': 0.46}
46%|████▌ | 3055/6638 [2:55:03<3:16:52, 3.30s/it] {'loss': 0.647, 'grad_norm': 0.6411709502767542, 'learning_rate': 1.1766909094110036e-05, 'epoch': 0.46}
46%|████▌ | 3056/6638 [2:55:06<3:16:43, 3.30s/it] {'loss': 0.6444, 'grad_norm': 0.5742162845384132, 'learning_rate': 1.176210589534238e-05, 'epoch': 0.46}
46%|████▌ | 3057/6638 [2:55:09<3:16:02, 3.28s/it] {'loss': 0.6688, 'grad_norm': 0.6377489971424426, 'learning_rate': 1.1757302276980213e-05, 'epoch': 0.46}
46%|████▌ | 3058/6638 [2:55:12<3:15:51, 3.28s/it] {'loss': 0.6551, 'grad_norm': 0.5908469668254238, 'learning_rate': 1.175249824016738e-05, 'epoch': 0.46}
46%|████▌ | 3059/6638 [2:55:16<3:16:08, 3.29s/it] {'loss': 0.6786, 'grad_norm': 0.6469836717046201, 'learning_rate': 1.1747693786047827e-05, 'epoch': 0.46}
46%|████▌ | 3060/6638 [2:55:19<3:14:17, 3.26s/it] {'loss': 0.6651, 'grad_norm': 0.5985930705242963, 'learning_rate': 1.174288891576559e-05, 'epoch': 0.46}
46%|████▌ | 3061/6638 [2:55:22<3:14:55, 3.27s/it] {'loss': 0.7173, 'grad_norm': 0.7448979219405333, 'learning_rate': 1.1738083630464807e-05, 'epoch': 0.46}
46%|████▌ | 3062/6638 [2:55:25<3:14:01, 3.26s/it] {'loss': 0.648, 'grad_norm': 0.5811968991435029, 'learning_rate': 1.1733277931289726e-05, 'epoch': 0.46}
46%|████▌ | 3063/6638 [2:55:29<3:15:09, 3.28s/it] {'loss': 0.6287, 'grad_norm': 0.5173288884985489, 'learning_rate': 1.1728471819384679e-05, 'epoch': 0.46}
46%|████▌ | 3064/6638 [2:55:32<3:15:24, 3.28s/it] {'loss': 0.6355, 'grad_norm': 0.576590194675849, 'learning_rate': 1.1723665295894104e-05, 'epoch': 0.46}
46%|████▌ | 3065/6638 [2:55:35<3:17:07, 3.31s/it] {'loss': 0.684, 'grad_norm': 0.5952812780673078, 'learning_rate': 1.171885836196254e-05, 'epoch': 0.46}
46%|████▌ | 3066/6638 [2:55:39<3:17:34, 3.32s/it] {'loss': 0.6276, 'grad_norm': 0.6229544769165379, 'learning_rate': 1.1714051018734612e-05, 'epoch': 0.46}
46%|████▌ | 3067/6638 [2:55:42<3:14:31, 3.27s/it] {'loss': 0.632, 'grad_norm': 0.5927584143497022, 'learning_rate': 1.170924326735505e-05, 'epoch': 0.46}
46%|████▌ | 3068/6638 [2:55:45<3:14:52, 3.28s/it] {'loss': 0.63, 'grad_norm': 0.5218722398881489, 'learning_rate': 1.1704435108968688e-05, 'epoch': 0.46}
46%|████▌ | 3069/6638 [2:55:49<3:15:30, 3.29s/it] {'loss': 0.6722, 'grad_norm': 0.5526156068075213, 'learning_rate': 1.1699626544720446e-05, 'epoch': 0.46}
46%|████▌ | 3070/6638 [2:55:52<3:14:07, 3.26s/it] {'loss': 0.68, 'grad_norm': 0.6189547885299199, 'learning_rate': 1.1694817575755342e-05, 'epoch': 0.46}
46%|████▋ | 3071/6638 [2:55:55<3:12:49, 3.24s/it] {'loss': 0.638, 'grad_norm': 0.5948864914367371, 'learning_rate': 1.1690008203218493e-05, 'epoch': 0.46}
46%|████▋ | 3072/6638 [2:55:58<3:14:49, 3.28s/it] {'loss': 0.6924, 'grad_norm': 0.7425582896742682, 'learning_rate': 1.1685198428255114e-05, 'epoch': 0.46}
46%|████▋ | 3073/6638 [2:56:01<3:13:40, 3.26s/it] {'loss': 0.6525, 'grad_norm': 0.585536988743151, 'learning_rate': 1.1680388252010511e-05, 'epoch': 0.46}
46%|████▋ | 3074/6638 [2:56:05<3:12:33, 3.24s/it] {'loss': 0.6562, 'grad_norm': 0.6107186724248828, 'learning_rate': 1.1675577675630093e-05, 'epoch': 0.46}
46%|████▋ | 3075/6638 [2:56:08<3:12:31, 3.24s/it] {'loss': 0.6604, 'grad_norm': 0.5719909052981282, 'learning_rate': 1.1670766700259355e-05, 'epoch': 0.46}
46%|████▋ | 3076/6638 [2:56:11<3:13:20, 3.26s/it] {'loss': 0.6025, 'grad_norm': 0.5151132992769731, 'learning_rate': 1.1665955327043897e-05, 'epoch': 0.46}
46%|████▋ | 3077/6638 [2:56:14<3:13:25, 3.26s/it] {'loss': 0.6771, 'grad_norm': 0.647415219012351, 'learning_rate': 1.1661143557129397e-05, 'epoch': 0.46}
46%|████▋ | 3078/6638 [2:56:18<3:13:40, 3.26s/it] {'loss': 0.677, 'grad_norm': 0.58739330820463, 'learning_rate': 1.1656331391661652e-05, 'epoch': 0.46}
46%|████▋ | 3079/6638 [2:56:21<3:12:11, 3.24s/it] {'loss': 0.6332, 'grad_norm': 0.6118776219710536, 'learning_rate': 1.1651518831786533e-05, 'epoch': 0.46}
46%|████▋ | 3080/6638 [2:56:24<3:15:42, 3.30s/it] {'loss': 0.656, 'grad_norm': 0.6497739561983196, 'learning_rate': 1.1646705878650012e-05, 'epoch': 0.46}
46%|████▋ | 3081/6638 [2:56:28<3:14:40, 3.28s/it] {'loss': 0.633, 'grad_norm': 0.5778855201587444, 'learning_rate': 1.1641892533398156e-05, 'epoch': 0.46}
46%|████▋ | 3082/6638 [2:56:31<3:12:00, 3.24s/it] {'loss': 0.6138, 'grad_norm': 0.653177116904927, 'learning_rate': 1.1637078797177122e-05, 'epoch': 0.46}
46%|████▋ | 3083/6638 [2:56:34<3:10:26, 3.21s/it] {'loss': 0.6504, 'grad_norm': 0.6676214069249781, 'learning_rate': 1.1632264671133163e-05, 'epoch': 0.46}
46%|████▋ | 3084/6638 [2:56:37<3:10:49, 3.22s/it] {'loss': 0.6243, 'grad_norm': 0.6283334675050346, 'learning_rate': 1.1627450156412628e-05, 'epoch': 0.46}
46%|████▋ | 3085/6638 [2:56:41<3:13:30, 3.27s/it] {'loss': 0.6609, 'grad_norm': 0.6495532791999455, 'learning_rate': 1.1622635254161945e-05, 'epoch': 0.46}
46%|████▋ | 3086/6638 [2:56:44<3:14:36, 3.29s/it] {'loss': 0.6931, 'grad_norm': 0.6010451396522176, 'learning_rate': 1.161781996552765e-05, 'epoch': 0.46}
47%|████▋ | 3087/6638 [2:56:47<3:15:45, 3.31s/it] {'loss': 0.6327, 'grad_norm': 0.5714368134152359, 'learning_rate': 1.1613004291656362e-05, 'epoch': 0.47}
47%|████▋ | 3088/6638 [2:56:51<3:16:19, 3.32s/it] {'loss': 0.6533, 'grad_norm': 0.58902999051483, 'learning_rate': 1.1608188233694797e-05, 'epoch': 0.47}
47%|████▋ | 3089/6638 [2:56:54<3:15:24, 3.30s/it] {'loss': 0.6523, 'grad_norm': 0.5950617619156188, 'learning_rate': 1.1603371792789759e-05, 'epoch': 0.47}
47%|████▋ | 3090/6638 [2:56:57<3:15:11, 3.30s/it] {'loss': 0.665, 'grad_norm': 0.6228488101533582, 'learning_rate': 1.1598554970088142e-05, 'epoch': 0.47}
47%|████▋ | 3091/6638 [2:57:01<3:17:14, 3.34s/it] {'loss': 0.6393, 'grad_norm': 0.5442370488183695, 'learning_rate': 1.1593737766736936e-05, 'epoch': 0.47}
47%|████▋ | 3092/6638 [2:57:04<3:15:15, 3.30s/it] {'loss': 0.6539, 'grad_norm': 0.5586532586385713, 'learning_rate': 1.1588920183883216e-05, 'epoch': 0.47}
47%|████▋ | 3093/6638 [2:57:07<3:12:46, 3.26s/it] {'loss': 0.6628, 'grad_norm': 0.5698078988504277, 'learning_rate': 1.1584102222674152e-05, 'epoch': 0.47}
47%|████▋ | 3094/6638 [2:57:10<3:11:11, 3.24s/it] {'loss': 0.6459, 'grad_norm': 0.5506358596831756, 'learning_rate': 1.1579283884256997e-05, 'epoch': 0.47}
47%|████▋ | 3095/6638 [2:57:13<3:12:16, 3.26s/it] {'loss': 0.6518, 'grad_norm': 0.5374286383504657, 'learning_rate': 1.1574465169779106e-05, 'epoch': 0.47}
47%|████▋ | 3096/6638 [2:57:17<3:15:36, 3.31s/it] {'loss': 0.649, 'grad_norm': 0.6006926683966278, 'learning_rate': 1.156964608038791e-05, 'epoch': 0.47}
47%|████▋ | 3097/6638 [2:57:20<3:14:46, 3.30s/it] {'loss': 0.6415, 'grad_norm': 0.5701814342705697, 'learning_rate': 1.1564826617230943e-05, 'epoch': 0.47}
47%|████▋ | 3098/6638 [2:57:23<3:13:22, 3.28s/it] {'loss': 0.6335, 'grad_norm': 0.5420058068651814, 'learning_rate': 1.1560006781455813e-05, 'epoch': 0.47}
47%|████▋ | 3099/6638 [2:57:27<3:12:34, 3.26s/it] {'loss': 0.684, 'grad_norm': 0.5771834722043935, 'learning_rate': 1.1555186574210229e-05, 'epoch': 0.47}
AutoResumeHook: Checking whether to suspend... (x8 ranks)
47%|████▋ | 3100/6638 [2:57:30<3:12:16, 3.26s/it]
{'loss': 0.6608, 'grad_norm': 0.5575329119880695, 'learning_rate': 1.155036599664198e-05, 'epoch': 0.47}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3100/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3100/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3100/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
47%|████▋ | 3101/6638 [2:57:47<7:25:57, 7.57s/it] {'loss': 0.6541, 'grad_norm': 0.6077118034911164, 'learning_rate': 1.154554504989895e-05, 'epoch': 0.47}
47%|████▋ | 3102/6638 [2:57:51<6:10:08, 6.28s/it] {'loss': 0.6695, 'grad_norm': 0.6411527448145701, 'learning_rate': 1.154072373512911e-05, 'epoch': 0.47}
47%|████▋ | 3103/6638 [2:57:54<5:21:30, 5.46s/it] {'loss': 0.684, 'grad_norm': 0.7271722140715154, 'learning_rate': 1.153590205348051e-05, 'epoch': 0.47}
47%|████▋ | 3104/6638 [2:57:58<4:49:07, 4.91s/it] {'loss': 0.6331, 'grad_norm': 0.5154913704916394, 'learning_rate': 1.1531080006101298e-05, 'epoch': 0.47}
47%|████▋ | 3105/6638 [2:58:01<4:20:06, 4.42s/it] {'loss': 0.6351, 'grad_norm': 0.5443522602551507, 'learning_rate': 1.1526257594139706e-05, 'epoch': 0.47}
47%|████▋ | 3106/6638 [2:58:04<3:58:50, 4.06s/it] {'loss': 0.6177, 'grad_norm': 0.5802936223546313, 'learning_rate': 1.152143481874405e-05, 'epoch': 0.47}
47%|████▋ | 3107/6638 [2:58:08<3:46:08, 3.84s/it] {'loss': 0.6971, 'grad_norm': 0.6191215877664102, 'learning_rate': 1.1516611681062738e-05, 'epoch': 0.47}
47%|████▋ | 3108/6638 [2:58:11<3:35:22, 3.66s/it] {'loss': 0.6521, 'grad_norm': 0.6060943703855068, 'learning_rate': 1.1511788182244252e-05, 'epoch': 0.47}
47%|████▋ | 3109/6638 [2:58:14<3:27:30, 3.53s/it] {'loss': 0.6303, 'grad_norm': 0.5874668596128356, 'learning_rate': 1.1506964323437178e-05, 'epoch': 0.47}
47%|████▋ | 3110/6638 [2:58:17<3:21:21, 3.42s/it] {'loss': 0.6561, 'grad_norm': 0.5922185840560525, 'learning_rate': 1.1502140105790172e-05, 'epoch': 0.47}
47%|████▋ | 3111/6638 [2:58:21<3:18:43, 3.38s/it] {'loss': 0.6542, 'grad_norm': 0.5323401129405644, 'learning_rate': 1.1497315530451985e-05, 'epoch': 0.47}
47%|████▋ | 3112/6638 [2:58:24<3:16:20, 3.34s/it] {'loss': 0.6035, 'grad_norm': 0.5732909699495967, 'learning_rate': 1.1492490598571451e-05, 'epoch': 0.47}
47%|████▋ | 3113/6638 [2:58:27<3:17:53, 3.37s/it] {'loss': 0.636, 'grad_norm': 0.5621830169157023, 'learning_rate': 1.1487665311297484e-05, 'epoch': 0.47}
47%|████▋ | 3114/6638 [2:58:31<3:17:03, 3.36s/it] {'loss': 0.6757, 'grad_norm': 0.6298263263754819, 'learning_rate': 1.1482839669779087e-05, 'epoch': 0.47}
47%|████▋ | 3115/6638 [2:58:34<3:15:46, 3.33s/it] {'loss': 0.6104, 'grad_norm': 0.6027164355093692, 'learning_rate': 1.147801367516535e-05, 'epoch': 0.47}
47%|████▋ | 3116/6638 [2:58:37<3:14:07, 3.31s/it] {'loss': 0.651, 'grad_norm': 0.5918973683551977, 'learning_rate': 1.1473187328605444e-05, 'epoch': 0.47}
47%|████▋ | 3117/6638 [2:58:40<3:14:10, 3.31s/it] {'loss': 0.6178, 'grad_norm': 0.5301056035386077, 'learning_rate': 1.1468360631248618e-05, 'epoch': 0.47}
47%|████▋ | 3118/6638 [2:58:44<3:13:32, 3.30s/it] {'loss': 0.6782, 'grad_norm': 0.6092642121983647, 'learning_rate': 1.1463533584244218e-05, 'epoch': 0.47}
47%|████▋ | 3119/6638 [2:58:47<3:14:34, 3.32s/it] {'loss': 0.6601, 'grad_norm': 0.535644268670881, 'learning_rate': 1.1458706188741657e-05, 'epoch': 0.47}
47%|████▋ | 3120/6638 [2:58:50<3:12:20, 3.28s/it] {'loss': 0.6586, 'grad_norm': 0.6003040987202732, 'learning_rate': 1.145387844589045e-05, 'epoch': 0.47}
47%|████▋ | 3121/6638 [2:58:54<3:14:27, 3.32s/it] {'loss': 0.7013, 'grad_norm': 1.065879692720387, 'learning_rate': 1.1449050356840175e-05, 'epoch': 0.47}
47%|████▋ | 3122/6638 [2:58:57<3:16:13, 3.35s/it] {'loss': 0.681, 'grad_norm': 0.5502386472707947, 'learning_rate': 1.1444221922740507e-05, 'epoch': 0.47}
47%|████▋ | 3123/6638 [2:59:01<3:18:44, 3.39s/it] {'loss': 0.6333, 'grad_norm': 0.5728174301639225, 'learning_rate': 1.1439393144741192e-05, 'epoch': 0.47}
47%|████▋ | 3124/6638 [2:59:04<3:16:14, 3.35s/it] {'loss': 0.6862, 'grad_norm': 0.613190726752255, 'learning_rate': 1.143456402399207e-05, 'epoch': 0.47}
47%|████▋ | 3125/6638 [2:59:07<3:13:28, 3.30s/it] {'loss': 0.6062, 'grad_norm': 0.5578981858504176, 'learning_rate': 1.1429734561643053e-05, 'epoch': 0.47}
47%|████▋ | 3126/6638 [2:59:10<3:12:08, 3.28s/it] {'loss': 0.6394, 'grad_norm': 0.7046269364028893, 'learning_rate': 1.1424904758844141e-05, 'epoch': 0.47}
47%|████▋ | 3127/6638 [2:59:14<3:12:58, 3.30s/it] {'loss': 0.6425, 'grad_norm': 0.5891096508740109, 'learning_rate': 1.1420074616745407e-05, 'epoch': 0.47}
47%|████▋ | 3128/6638 [2:59:17<3:11:19, 3.27s/it] {'loss': 0.6514, 'grad_norm': 0.6000430287909104, 'learning_rate': 1.1415244136497012e-05, 'epoch': 0.47}
47%|████▋ | 3129/6638 [2:59:20<3:12:41, 3.29s/it] {'loss': 0.6767, 'grad_norm': 0.6087482035739222, 'learning_rate': 1.1410413319249193e-05, 'epoch': 0.47}
47%|████▋ | 3130/6638 [2:59:24<3:12:38, 3.29s/it] {'loss': 0.6722, 'grad_norm': 0.6180481364235578, 'learning_rate': 1.1405582166152276e-05, 'epoch': 0.47}
47%|████▋ | 3131/6638 [2:59:27<3:12:35, 3.30s/it] {'loss': 0.6331, 'grad_norm': 0.6099411048935439, 'learning_rate': 1.1400750678356652e-05, 'epoch': 0.47}
47%|████▋ | 3132/6638 [2:59:30<3:10:36, 3.26s/it] {'loss': 0.6654, 'grad_norm': 0.6135557442434662, 'learning_rate': 1.1395918857012803e-05, 'epoch': 0.47}
47%|████▋ | 3133/6638 [2:59:33<3:11:46, 3.28s/it] {'loss': 0.6423, 'grad_norm': 0.5603495993562917, 'learning_rate': 1.1391086703271285e-05, 'epoch': 0.47}
47%|████▋ | 3134/6638 [2:59:37<3:10:22, 3.26s/it] {'loss': 0.6436, 'grad_norm': 0.6442444362020561, 'learning_rate': 1.1386254218282744e-05, 'epoch': 0.47}
47%|████▋ | 3135/6638 [2:59:40<3:09:37, 3.25s/it] {'loss': 0.6655, 'grad_norm': 0.6099188614658567, 'learning_rate': 1.1381421403197888e-05, 'epoch': 0.47}
47%|████▋ | 3136/6638 [2:59:43<3:10:50, 3.27s/it] {'loss': 0.6545, 'grad_norm': 0.6018885712714029, 'learning_rate': 1.1376588259167516e-05, 'epoch': 0.47}
47%|████▋ | 3137/6638 [2:59:46<3:09:37, 3.25s/it] {'loss': 0.6565, 'grad_norm': 0.6082841041958562, 'learning_rate': 1.1371754787342497e-05, 'epoch': 0.47}
47%|████▋ | 3138/6638 [2:59:50<3:10:32, 3.27s/it] {'loss': 0.7013, 'grad_norm': 0.6312950532101589, 'learning_rate': 1.1366920988873787e-05, 'epoch': 0.47}
47%|████▋ | 3139/6638 [2:59:53<3:09:19, 3.25s/it] {'loss': 0.6417, 'grad_norm': 0.5478539410660216, 'learning_rate': 1.1362086864912411e-05, 'epoch': 0.47}
47%|████▋ | 3140/6638 [2:59:56<3:08:57, 3.24s/it] {'loss': 0.6867, 'grad_norm': 0.6181402681234909, 'learning_rate': 1.1357252416609482e-05, 'epoch': 0.47}
47%|████▋ | 3141/6638 [2:59:59<3:10:05, 3.26s/it] {'loss': 0.6604, 'grad_norm': 0.5668203314523277, 'learning_rate': 1.1352417645116178e-05, 'epoch': 0.47}
47%|████▋ | 3142/6638 [3:00:03<3:10:55, 3.28s/it] {'loss': 0.6919, 'grad_norm': 0.6689805672127258, 'learning_rate': 1.1347582551583764e-05, 'epoch': 0.47}
47%|████▋ | 3143/6638 [3:00:06<3:11:01, 3.28s/it] {'loss': 0.6619, 'grad_norm': 0.5797660535082493, 'learning_rate': 1.1342747137163571e-05, 'epoch': 0.47}
47%|████▋ | 3144/6638 [3:00:09<3:10:43, 3.28s/it] {'loss': 0.6421, 'grad_norm': 0.6015071501182226, 'learning_rate': 1.1337911403007023e-05, 'epoch': 0.47}
47%|████▋ | 3145/6638 [3:00:12<3:10:32, 3.27s/it] {'loss': 0.6482, 'grad_norm': 0.5890617739914294, 'learning_rate': 1.1333075350265601e-05, 'epoch': 0.47}
47%|████▋ | 3146/6638 [3:00:16<3:12:52, 3.31s/it] {'loss': 0.6513, 'grad_norm': 0.6722436314816655, 'learning_rate': 1.1328238980090876e-05, 'epoch': 0.47}
47%|████▋ | 3147/6638 [3:00:19<3:12:08, 3.30s/it] {'loss': 0.6483, 'grad_norm': 0.5760789034146997, 'learning_rate': 1.1323402293634487e-05, 'epoch': 0.47}
47%|████▋ | 3148/6638 [3:00:23<3:13:35, 3.33s/it] {'loss': 0.651, 'grad_norm': 0.5812421052837944, 'learning_rate': 1.131856529204815e-05, 'epoch': 0.47}
47%|████▋ | 3149/6638 [3:00:26<3:12:30, 3.31s/it] {'loss': 0.6452, 'grad_norm': 0.5684066556841098, 'learning_rate': 1.1313727976483666e-05, 'epoch': 0.47}
AutoResumeHook: Checking whether to suspend... (x8 ranks)
47%|████▋ | 3150/6638 [3:00:29<3:11:43, 3.30s/it] {'loss': 0.647, 'grad_norm': 0.5882681509770046, 'learning_rate': 1.130889034809289e-05, 'epoch': 0.47}
47%|████▋ | 3151/6638 [3:00:32<3:10:34, 3.28s/it] {'loss': 0.6445, 'grad_norm': 0.5964807573690597, 'learning_rate': 1.1304052408027766e-05, 'epoch': 0.47}
47%|████▋ | 3152/6638 [3:00:36<3:10:13, 3.27s/it] {'loss': 0.6137, 'grad_norm': 0.5268223353113644, 'learning_rate': 1.1299214157440313e-05, 'epoch': 0.47}
47%|████▋ | 3153/6638 [3:00:39<3:10:23, 3.28s/it] {'loss': 0.6253, 'grad_norm': 0.5765290872300178, 'learning_rate': 1.129437559748262e-05, 'epoch': 0.47}
48%|████▊ | 3154/6638 [3:00:42<3:14:53, 3.36s/it] {'loss': 0.65, 'grad_norm': 0.6399623175772048, 'learning_rate': 1.1289536729306844e-05, 'epoch': 0.48}
48%|████▊ | 3155/6638 [3:00:46<3:14:09, 3.34s/it] {'loss': 0.6339, 'grad_norm': 0.5516826424444452, 'learning_rate': 1.128469755406523e-05, 'epoch': 0.48}
48%|████▊ | 3156/6638 [3:00:49<3:14:03, 3.34s/it] {'loss': 0.6867, 'grad_norm': 0.6267611493206303, 'learning_rate': 1.127985807291008e-05, 'epoch': 0.48}
48%|████▊ | 3157/6638 [3:00:52<3:12:41, 3.32s/it] {'loss': 0.6703, 'grad_norm': 0.5688417645086975, 'learning_rate': 1.1275018286993782e-05, 'epoch': 0.48}
48%|████▊ | 3158/6638 [3:00:56<3:11:12, 3.30s/it] {'loss': 0.618, 'grad_norm': 0.561165530135773, 'learning_rate': 1.1270178197468788e-05, 'epoch': 0.48}
48%|████▊ | 3159/6638 [3:00:59<3:13:11, 3.33s/it] {'loss': 0.6083, 'grad_norm': 0.4988953960891633, 'learning_rate': 1.1265337805487628e-05, 'epoch': 0.48}
48%|████▊ | 3160/6638 [3:01:02<3:11:16, 3.30s/it] {'loss': 0.6604, 'grad_norm': 0.5702175627020051, 'learning_rate': 1.1260497112202895e-05, 'epoch': 0.48}
48%|████▊ | 3161/6638 [3:01:05<3:10:53, 3.29s/it] {'loss': 0.6778, 'grad_norm': 0.6504878411463226, 'learning_rate': 1.1255656118767266e-05, 'epoch': 0.48}
48%|████▊ | 3162/6638 [3:01:09<3:11:13, 3.30s/it] {'loss': 0.6131, 'grad_norm': 0.6007100960772531, 'learning_rate': 1.125081482633348e-05, 'epoch': 0.48}
48%|████▊ | 3163/6638 [3:01:12<3:11:47, 3.31s/it] {'loss': 0.6322, 'grad_norm': 0.9380848710593346, 'learning_rate': 1.1245973236054355e-05, 'epoch': 0.48}
48%|████▊ | 3164/6638 [3:01:15<3:11:28, 3.31s/it] {'loss': 0.6473, 'grad_norm': 0.6077103134512968, 'learning_rate': 1.124113134908277e-05, 'epoch': 0.48}
48%|████▊ | 3165/6638 [3:01:19<3:12:22, 3.32s/it] {'loss': 0.6813, 'grad_norm': 0.6083923500745261, 'learning_rate': 1.1236289166571683e-05, 'epoch': 0.48}
48%|████▊ | 3166/6638 [3:01:22<3:11:29, 3.31s/it] {'loss': 0.7341, 'grad_norm': 0.6181612349175906, 'learning_rate': 1.1231446689674119e-05, 'epoch': 0.48}
48%|████▊ | 3167/6638 [3:01:25<3:10:44, 3.30s/it] {'loss': 0.6473, 'grad_norm': 0.5769060154044675, 'learning_rate': 1.1226603919543176e-05, 'epoch': 0.48}
48%|████▊ | 3168/6638 [3:01:29<3:10:58, 3.30s/it] {'loss': 0.7041, 'grad_norm': 0.6565353453953743, 'learning_rate': 1.1221760857332016e-05, 'epoch': 0.48}
48%|████▊ | 3169/6638 [3:01:32<3:10:19, 3.29s/it] {'loss': 0.6793, 'grad_norm': 0.655007682652244, 'learning_rate': 1.1216917504193877e-05, 'epoch': 0.48}
48%|████▊ | 3170/6638 [3:01:35<3:13:46, 3.35s/it] {'loss': 0.6537, 'grad_norm': 0.5378464812859659, 'learning_rate': 1.121207386128206e-05, 'epoch': 0.48}
48%|████▊ | 3171/6638 [3:01:39<3:14:56, 3.37s/it] {'loss': 0.6685, 'grad_norm': 0.6027720393852165, 'learning_rate': 1.1207229929749944e-05, 'epoch': 0.48}
48%|████▊ | 3172/6638 [3:01:42<3:14:26, 3.37s/it] {'loss': 0.6424, 'grad_norm': 0.543854710539828, 'learning_rate': 1.1202385710750962e-05, 'epoch': 0.48}
48%|████▊ | 3173/6638 [3:01:45<3:12:28, 3.33s/it] {'loss': 0.6904, 'grad_norm': 0.6895228448736652, 'learning_rate': 1.1197541205438634e-05, 'epoch': 0.48}
48%|████▊ | 3174/6638 [3:01:49<3:13:52, 3.36s/it] {'loss': 0.6699, 'grad_norm': 0.5882567024885752, 'learning_rate': 1.1192696414966533e-05, 'epoch': 0.48}
48%|████▊ | 3175/6638 [3:01:52<3:11:18, 3.31s/it] {'loss': 0.6393, 'grad_norm': 0.5886819921982224, 'learning_rate': 1.1187851340488308e-05, 'epoch': 0.48}
48%|████▊ | 3176/6638 [3:01:55<3:08:47, 3.27s/it] {'loss': 0.6295, 'grad_norm': 0.5894767170440438, 'learning_rate': 1.1183005983157672e-05, 'epoch': 0.48}
48%|████▊ | 3177/6638 [3:01:58<3:08:33, 3.27s/it] {'loss': 0.7161, 'grad_norm': 0.7103950429063557, 'learning_rate': 1.1178160344128409e-05, 'epoch': 0.48}
48%|████▊ | 3178/6638 [3:02:02<3:12:36, 3.34s/it] {'loss': 0.7033, 'grad_norm': 0.6563500235453776, 'learning_rate': 1.1173314424554365e-05, 'epoch': 0.48}
48%|████▊ | 3179/6638 [3:02:05<3:09:38, 3.29s/it] {'loss': 0.659, 'grad_norm': 0.6230967782355684, 'learning_rate': 1.1168468225589452e-05, 'epoch': 0.48}
48%|████▊ | 3180/6638 [3:02:09<3:11:13, 3.32s/it] {'loss': 0.6485, 'grad_norm': 0.6014615753512634, 'learning_rate': 1.1163621748387663e-05, 'epoch': 0.48}
48%|████▊ | 3181/6638 [3:02:12<3:11:49, 3.33s/it] {'loss': 0.6214, 'grad_norm': 0.49311583444678375, 'learning_rate': 1.1158774994103035e-05, 'epoch': 0.48}
48%|████▊ | 3182/6638 [3:02:15<3:10:46, 3.31s/it] {'loss': 0.6287, 'grad_norm': 0.5521672744881212, 'learning_rate': 1.115392796388969e-05, 'epoch': 0.48}
48%|████▊ | 3183/6638 [3:02:19<3:10:58, 3.32s/it] {'loss': 0.6757, 'grad_norm': 0.6407704161005185, 'learning_rate': 1.11490806589018e-05, 'epoch': 0.48}
48%|████▊ | 3184/6638 [3:02:22<3:08:58, 3.28s/it] {'loss': 0.6427, 'grad_norm': 0.5966193618813824, 'learning_rate': 1.1144233080293619e-05, 'epoch': 0.48}
48%|████▊ | 3185/6638 [3:02:25<3:06:49, 3.25s/it] {'loss': 0.6849, 'grad_norm': 0.6258064626277853, 'learning_rate': 1.1139385229219451e-05, 'epoch': 0.48}
48%|████▊ | 3186/6638 [3:02:28<3:11:53, 3.34s/it] {'loss': 0.6895, 'grad_norm': 0.6181626848848918, 'learning_rate': 1.1134537106833673e-05, 'epoch': 0.48}
48%|████▊ | 3187/6638 [3:02:32<3:09:47, 3.30s/it] {'loss': 0.6766, 'grad_norm': 0.650500664189861, 'learning_rate': 1.112968871429073e-05, 'epoch': 0.48}
48%|████▊ | 3188/6638 [3:02:35<3:07:35, 3.26s/it] {'loss': 0.6342, 'grad_norm': 0.5581708736089859, 'learning_rate': 1.1124840052745121e-05, 'epoch': 0.48}
48%|████▊ | 3189/6638 [3:02:38<3:08:26, 3.28s/it] {'loss': 0.666, 'grad_norm': 0.5850324223746082, 'learning_rate': 1.1119991123351413e-05, 'epoch': 0.48}
48%|████▊ | 3190/6638 [3:02:41<3:07:51, 3.27s/it] {'loss': 0.6341, 'grad_norm': 0.644014622146031, 'learning_rate': 1.1115141927264244e-05, 'epoch': 0.48}
48%|████▊ | 3191/6638 [3:02:45<3:07:10, 3.26s/it] {'loss': 0.6707, 'grad_norm': 0.6160262969187155, 'learning_rate': 1.1110292465638301e-05, 'epoch': 0.48}
48%|████▊ | 3192/6638 [3:02:48<3:08:06, 3.28s/it] {'loss': 0.692, 'grad_norm': 0.614069193494685, 'learning_rate': 1.1105442739628357e-05, 'epoch': 0.48}
48%|████▊ | 3193/6638 [3:02:51<3:07:19, 3.26s/it] {'loss': 0.6547, 'grad_norm': 0.5937438551159684, 'learning_rate': 1.1100592750389219e-05, 'epoch': 0.48}
48%|████▊ | 3194/6638 [3:02:54<3:05:17, 3.23s/it] {'loss': 0.6347, 'grad_norm': 0.5529641416961014, 'learning_rate': 1.109574249907578e-05, 'epoch': 0.48}
48%|████▊ | 3195/6638 [3:02:58<3:04:40, 3.22s/it] {'loss': 0.6772, 'grad_norm': 0.6347297569245365, 'learning_rate': 1.1090891986842982e-05, 'epoch': 0.48}
48%|████▊ | 3196/6638 [3:03:01<3:06:03, 3.24s/it] {'loss': 0.6163, 'grad_norm': 0.5483401970158034, 'learning_rate': 1.108604121484584e-05, 'epoch': 0.48}
48%|████▊ | 3197/6638 [3:03:04<3:09:26, 3.30s/it] {'loss': 0.6893, 'grad_norm': 0.6159036262505644, 'learning_rate': 1.1081190184239418e-05, 'epoch': 0.48}
48%|████▊ | 3198/6638 [3:03:08<3:10:13, 3.32s/it] {'loss': 0.6332, 'grad_norm': 0.5537782426093334, 'learning_rate': 1.1076338896178858e-05, 'epoch': 0.48}
48%|████▊ | 3199/6638 [3:03:11<3:10:06, 3.32s/it] {'loss': 0.6402, 'grad_norm': 0.5348461398998254, 'learning_rate': 1.1071487351819341e-05, 'epoch': 0.48}
AutoResumeHook: Checking whether to suspend... (x8 ranks)
48%|████▊ | 3200/6638 [3:03:14<3:09:33, 3.31s/it]
{'loss': 0.6333, 'grad_norm': 0.5472255780517213, 'learning_rate': 1.1066635552316133e-05, 'epoch': 0.48}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3200/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
 48%|████▊ | 3201/6638 [3:03:31<7:03:24, 7.39s/it] {'loss': 0.6211, 'grad_norm': 0.5527542876391877, 'learning_rate': 1.1061783498824545e-05, 'epoch': 0.48}
[steps 3202-3209 elided: loss 0.64-0.68, learning_rate 1.1057e-05 -> 1.1023e-05; s/it recovering 6.13 -> 3.53 after the checkpoint pause]
 48%|████▊ | 3210/6638 [3:04:01<3:17:23,
3.46s/it] {'loss': 0.6772, 'grad_norm': 0.5847960314164485, 'learning_rate': 1.1018103778579966e-05, 'epoch': 0.48}
[per-step entries for steps 3211-3299 elided: loss 0.59-0.73, grad_norm 0.50-1.04, learning_rate 1.1013e-05 -> 1.0585e-05, epoch reaching 0.49 at step 3220 and 0.5 at step 3286, ~3.2-3.4 s/it]
AutoResumeHook: Checking whether to suspend... (printed once per rank, at steps 3250 and 3300)
 50%|████▉ | 3300/6638 [3:08:56<3:04:59, 3.33s/it]
{'loss': 0.6299, 'grad_norm': 0.6226295840502906, 'learning_rate': 1.0580365728104433e-05, 'epoch': 0.5}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3300/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3300/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3300/mm_projector
 50%|████▉ | 3301/6638 [3:09:14<7:06:01, 7.66s/it] {'loss': 0.6294, 'grad_norm': 0.6002162041723934, 'learning_rate': 1.0575494119332838e-05, 'epoch': 0.5}
[steps 3302-3309 elided: loss 0.63-0.73, learning_rate 1.0571e-05 -> 1.0537e-05; s/it recovering 6.35 -> 3.55 after the checkpoint pause]
 50%|████▉ | 3310/6638 [3:09:44<3:12:02, 3.46s/it]
{'loss': 0.677, 'grad_norm': 0.7042450640817068, 'learning_rate': 1.0531643612926136e-05, 'epoch': 0.5}
[per-step entries for steps 3311-3337 elided: loss 0.62-0.73, learning_rate 1.0527e-05 -> 1.0400e-05, epoch 0.5, ~3.2-3.4 s/it]
 50%|█████ | 3338/6638 [3:11:15<2:58:17, 3.24s/it] {'loss': 0.6738, 'grad_norm': 0.6470015036544696, 'learning_rate': 1.0395158046146608e-05, 'epoch': 0.5}
 50%|█████ | 3339/6638 [3:11:19<2:59:18, 3.26s/it] {'loss': 0.6551, 'grad_norm': 0.5343422485432398,
'learning_rate': 1.039028204573736e-05, 'epoch': 0.5} 50%|█████ | 3339/6638 [3:11:19<2:59:18, 3.26s/it] 50%|█████ | 3340/6638 [3:11:22<2:59:46, 3.27s/it] {'loss': 0.6379, 'grad_norm': 0.517637109388519, 'learning_rate': 1.0385405952393743e-05, 'epoch': 0.5} 50%|█████ | 3340/6638 [3:11:22<2:59:46, 3.27s/it] 50%|█████ | 3341/6638 [3:11:25<3:00:15, 3.28s/it] {'loss': 0.6208, 'grad_norm': 0.5279051067432731, 'learning_rate': 1.0380529767276856e-05, 'epoch': 0.5} 50%|█████ | 3341/6638 [3:11:25<3:00:15, 3.28s/it] 50%|█████ | 3342/6638 [3:11:29<3:01:02, 3.30s/it] {'loss': 0.6786, 'grad_norm': 0.654567671734777, 'learning_rate': 1.0375653491547822e-05, 'epoch': 0.5} 50%|█████ | 3342/6638 [3:11:29<3:01:02, 3.30s/it] 50%|█████ | 3343/6638 [3:11:32<3:04:24, 3.36s/it] {'loss': 0.6726, 'grad_norm': 0.5519331114865923, 'learning_rate': 1.037077712636778e-05, 'epoch': 0.5} 50%|█████ | 3343/6638 [3:11:32<3:04:24, 3.36s/it] 50%|█████ | 3344/6638 [3:11:35<3:03:56, 3.35s/it] {'loss': 0.6636, 'grad_norm': 0.6177448773209662, 'learning_rate': 1.03659006728979e-05, 'epoch': 0.5} 50%|█████ | 3344/6638 [3:11:35<3:03:56, 3.35s/it] 50%|█████ | 3345/6638 [3:11:39<3:01:11, 3.30s/it] {'loss': 0.6343, 'grad_norm': 0.6178477598013582, 'learning_rate': 1.0361024132299364e-05, 'epoch': 0.5} 50%|█████ | 3345/6638 [3:11:39<3:01:11, 3.30s/it] 50%|█████ | 3346/6638 [3:11:42<3:01:33, 3.31s/it] {'loss': 0.6424, 'grad_norm': 0.5639163725770046, 'learning_rate': 1.0356147505733385e-05, 'epoch': 0.5} 50%|█████ | 3346/6638 [3:11:42<3:01:33, 3.31s/it] 50%|█████ | 3347/6638 [3:11:45<3:01:50, 3.32s/it] {'loss': 0.6347, 'grad_norm': 0.6339658393821023, 'learning_rate': 1.0351270794361188e-05, 'epoch': 0.5} 50%|█████ | 3347/6638 [3:11:45<3:01:50, 3.32s/it] 50%|█████ | 3348/6638 [3:11:49<3:02:08, 3.32s/it] {'loss': 0.6051, 'grad_norm': 0.5091114903603996, 'learning_rate': 1.034639399934402e-05, 'epoch': 0.5} 50%|█████ | 3348/6638 [3:11:49<3:02:08, 3.32s/it] 50%|█████ | 3349/6638 [3:11:52<3:02:55, 3.34s/it] 
{'loss': 0.596, 'grad_norm': 0.6248194650409539, 'learning_rate': 1.0341517121843147e-05, 'epoch': 0.5} 50%|█████ | 3349/6638 [3:11:52<3:02:55, 3.34s/it]
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
[per-step log for steps 3350-3356 trimmed (duplicate tqdm progress bars removed): loss 0.62-0.69, learning_rate 1.034e-05 -> 1.031e-05]
[per-step log for steps 3357-3399 trimmed (duplicate tqdm progress bars removed): loss 0.60-0.72, learning_rate decaying 1.030e-05 -> 1.010e-05, ~3.3 s/it]
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
 51%|█████ | 3400/6638 [3:14:41<3:06:05, 3.45s/it]
{'loss': 0.6832, 'grad_norm': 0.6911437998549225, 'learning_rate': 1.0092714205098622e-05, 'epoch': 0.51} 51%|█████ | 3400/6638 [3:14:41<3:06:05, 3.45s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
 51%|█████ | 3401/6638 [3:14:59<6:59:01, 7.77s/it] {'loss': 0.6814, 'grad_norm': 0.6580852551704737, 'learning_rate': 1.0087834639071578e-05, 'epoch': 0.51}
[per-step log for steps 3402-3448 trimmed (duplicate tqdm progress bars removed): loss 0.60-0.71, learning_rate decaying 1.008e-05 -> 9.859e-06, step time recovering from the checkpoint stall back to ~3.3 s/it]
{'loss': 0.6431, 'grad_norm': 0.5939886179656391, 'learning_rate': 9.853612281319296e-06, 'epoch': 0.52} 52%|█████▏ | 3449/6638 [3:17:36<2:53:46, 3.27s/it]
AutoResumeHook: Checking whether to suspend... (ranks 0-7)
[per-step log for steps 3450-3455 trimmed (duplicate tqdm progress bars removed): loss 0.62-0.67, learning_rate 9.849e-06 -> 9.824e-06]
[per-step log for steps 3456-3484 trimmed (duplicate tqdm progress bars removed): loss 0.61-0.71, learning_rate decaying 9.819e-06 -> 9.683e-06, ~3.3 s/it]
 53%|█████▎ | 3485/6638 [3:19:34<2:49:43, 3.23s/it]
0.558107524369887, 'learning_rate': 9.677991189771347e-06, 'epoch': 0.53} 53%|█████▎ | 3485/6638 [3:19:34<2:49:43, 3.23s/it] 53%|█████▎ | 3486/6638 [3:19:38<2:51:25, 3.26s/it] {'loss': 0.6568, 'grad_norm': 0.5999440722096248, 'learning_rate': 9.673113993948726e-06, 'epoch': 0.53} 53%|█████▎ | 3486/6638 [3:19:38<2:51:25, 3.26s/it] 53%|█████▎ | 3487/6638 [3:19:41<2:52:35, 3.29s/it] {'loss': 0.6298, 'grad_norm': 0.5652789975455611, 'learning_rate': 9.668236875964542e-06, 'epoch': 0.53} 53%|█████▎ | 3487/6638 [3:19:41<2:52:35, 3.29s/it] 53%|█████▎ | 3488/6638 [3:19:44<2:51:52, 3.27s/it] {'loss': 0.6846, 'grad_norm': 0.6494971930358875, 'learning_rate': 9.663359836980144e-06, 'epoch': 0.53} 53%|█████▎ | 3488/6638 [3:19:44<2:51:52, 3.27s/it] 53%|█████▎ | 3489/6638 [3:19:48<2:52:22, 3.28s/it] {'loss': 0.6346, 'grad_norm': 0.6239870791000398, 'learning_rate': 9.658482878156854e-06, 'epoch': 0.53} 53%|█████▎ | 3489/6638 [3:19:48<2:52:22, 3.28s/it] 53%|█████▎ | 3490/6638 [3:19:51<2:52:37, 3.29s/it] {'loss': 0.6531, 'grad_norm': 0.5760188066085569, 'learning_rate': 9.653606000655983e-06, 'epoch': 0.53} 53%|█████▎ | 3490/6638 [3:19:51<2:52:37, 3.29s/it] 53%|█████▎ | 3491/6638 [3:19:54<2:53:31, 3.31s/it] {'loss': 0.639, 'grad_norm': 0.5932626316752189, 'learning_rate': 9.648729205638816e-06, 'epoch': 0.53} 53%|█████▎ | 3491/6638 [3:19:54<2:53:31, 3.31s/it] 53%|█████▎ | 3492/6638 [3:19:57<2:51:50, 3.28s/it] {'loss': 0.6339, 'grad_norm': 0.6060185450749461, 'learning_rate': 9.643852494266615e-06, 'epoch': 0.53} 53%|█████▎ | 3492/6638 [3:19:57<2:51:50, 3.28s/it] 53%|█████▎ | 3493/6638 [3:20:01<2:51:20, 3.27s/it] {'loss': 0.6868, 'grad_norm': 0.6433795150201838, 'learning_rate': 9.638975867700638e-06, 'epoch': 0.53} 53%|█████▎ | 3493/6638 [3:20:01<2:51:20, 3.27s/it] 53%|█████▎ | 3494/6638 [3:20:04<2:51:00, 3.26s/it] {'loss': 0.6762, 'grad_norm': 0.6864420161175502, 'learning_rate': 9.634099327102102e-06, 'epoch': 0.53} 53%|█████▎ | 3494/6638 [3:20:04<2:51:00, 3.26s/it] 53%|█████▎ | 
3495/6638 [3:20:07<2:51:34, 3.28s/it] {'loss': 0.6722, 'grad_norm': 0.605634490234443, 'learning_rate': 9.629222873632224e-06, 'epoch': 0.53} 53%|█████▎ | 3495/6638 [3:20:07<2:51:34, 3.28s/it] 53%|█████▎ | 3496/6638 [3:20:11<2:52:49, 3.30s/it] {'loss': 0.6905, 'grad_norm': 0.6255093565011718, 'learning_rate': 9.624346508452185e-06, 'epoch': 0.53} 53%|█████▎ | 3496/6638 [3:20:11<2:52:49, 3.30s/it] 53%|█████▎ | 3497/6638 [3:20:14<2:51:42, 3.28s/it] {'loss': 0.6333, 'grad_norm': 0.6151925656431706, 'learning_rate': 9.619470232723145e-06, 'epoch': 0.53} 53%|█████▎ | 3497/6638 [3:20:14<2:51:42, 3.28s/it] 53%|█████▎ | 3498/6638 [3:20:17<2:50:56, 3.27s/it] {'loss': 0.6735, 'grad_norm': 0.6276486591911116, 'learning_rate': 9.61459404760626e-06, 'epoch': 0.53} 53%|█████▎ | 3498/6638 [3:20:17<2:50:56, 3.27s/it] 53%|█████▎ | 3499/6638 [3:20:20<2:52:02, 3.29s/it] {'loss': 0.6543, 'grad_norm': 0.5785367660149637, 'learning_rate': 9.609717954262643e-06, 'epoch': 0.53} 53%|█████▎ | 3499/6638 [3:20:20<2:52:02, 3.29s/it]2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 30 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 53%|█████▎ | 3500/6638 [3:20:24<2:50:28, 3.26s/it]7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6335, 'grad_norm': 0.5528564258523113, 'learning_rate': 9.604841953853396e-06, 'epoch': 0.53} 53%|█████▎ | 3500/6638 [3:20:24<2:50:28, 3.26s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[per-step training log, steps 3501-3509 of 6638 (epoch 0.53): loss 0.60-0.67, learning_rate 9.600e-06 -> 9.561e-06; iteration time spikes to 7.58 s/it immediately after the checkpoint save, then settles back toward ~3.3 s/it]
[per-step training log, steps 3510-3599 of 6638 (epoch 0.53-0.54): loss 0.59-0.73, grad_norm 0.49-0.80, learning_rate 9.556e-06 -> 9.123e-06, ~3.3 s/it with occasional spikes up to ~3.9 s/it; duplicate tqdm progress-bar echoes removed]
AutoResumeHook: Checking whether to suspend... (ranks 0-7, around steps 3550 and 3600)
 54%|█████▍ | 3600/6638 [3:26:13<2:45:47, 3.27s/it]
{'loss': 0.6461, 'grad_norm': 0.5692178459410088, 'learning_rate': 9.117910470419575e-06, 'epoch': 0.54} 54%|█████▍ | 3600/6638 [3:26:13<2:45:47, 3.27s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3600/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3600/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3600/mm_projector
[torch state_dict and c10d::broadcast_ UserWarnings repeated verbatim from the step-3500 checkpoint save; omitted here]
[per-step training log, steps 3601-3609 of 6638 (epoch 0.54): loss 0.62-0.70, learning_rate 9.113e-06 -> 9.074e-06; iteration time spikes to 7.85 s/it immediately after the checkpoint save, then recovers toward ~3.7 s/it]
 54%|█████▍ | 3610/6638 [3:27:04<2:59:08,
3.55s/it] {'loss': 0.613, 'grad_norm': 0.5761605354824348, 'learning_rate': 9.069313729914356e-06, 'epoch': 0.54} 54%|█████▍ | 3610/6638 [3:27:04<2:59:08, 3.55s/it] 54%|█████▍ | 3611/6638 [3:27:07<2:55:25, 3.48s/it] {'loss': 0.6084, 'grad_norm': 0.5135064986414383, 'learning_rate': 9.064455255660274e-06, 'epoch': 0.54} 54%|█████▍ | 3611/6638 [3:27:07<2:55:25, 3.48s/it] 54%|█████▍ | 3612/6638 [3:27:12<3:17:50, 3.92s/it] {'loss': 0.6452, 'grad_norm': 0.5774172473519033, 'learning_rate': 9.059597004179093e-06, 'epoch': 0.54} 54%|█████▍ | 3612/6638 [3:27:12<3:17:50, 3.92s/it] 54%|█████▍ | 3613/6638 [3:27:15<3:08:02, 3.73s/it] {'loss': 0.6506, 'grad_norm': 0.6132024241999775, 'learning_rate': 9.054738976627662e-06, 'epoch': 0.54} 54%|█████▍ | 3613/6638 [3:27:15<3:08:02, 3.73s/it] 54%|█████▍ | 3614/6638 [3:27:19<3:02:20, 3.62s/it] {'loss': 0.652, 'grad_norm': 0.6565251913700021, 'learning_rate': 9.049881174162779e-06, 'epoch': 0.54} 54%|█████▍ | 3614/6638 [3:27:19<3:02:20, 3.62s/it] 54%|█████▍ | 3615/6638 [3:27:22<2:58:03, 3.53s/it] {'loss': 0.6994, 'grad_norm': 0.6836401367635316, 'learning_rate': 9.04502359794119e-06, 'epoch': 0.54} 54%|█████▍ | 3615/6638 [3:27:22<2:58:03, 3.53s/it] 54%|█████▍ | 3616/6638 [3:27:25<2:54:45, 3.47s/it] {'loss': 0.6532, 'grad_norm': 0.5951698123158793, 'learning_rate': 9.040166249119583e-06, 'epoch': 0.54} 54%|█████▍ | 3616/6638 [3:27:25<2:54:45, 3.47s/it] 54%|█████▍ | 3617/6638 [3:27:28<2:50:33, 3.39s/it] {'loss': 0.6272, 'grad_norm': 0.5901048980826633, 'learning_rate': 9.035309128854605e-06, 'epoch': 0.54} 54%|█████▍ | 3617/6638 [3:27:28<2:50:33, 3.39s/it] 55%|█████▍ | 3618/6638 [3:27:33<3:09:45, 3.77s/it] {'loss': 0.651, 'grad_norm': 0.5583043193040368, 'learning_rate': 9.030452238302831e-06, 'epoch': 0.55} 55%|█████▍ | 3618/6638 [3:27:33<3:09:45, 3.77s/it] 55%|█████▍ | 3619/6638 [3:27:36<3:01:37, 3.61s/it] {'loss': 0.6122, 'grad_norm': 0.5380141494163744, 'learning_rate': 9.025595578620783e-06, 'epoch': 0.55} 55%|█████▍ | 3619/6638 
[3:27:36<3:01:37, 3.61s/it] 55%|█████▍ | 3620/6638 [3:27:40<2:57:22, 3.53s/it] {'loss': 0.6974, 'grad_norm': 0.5975720589500002, 'learning_rate': 9.020739150964947e-06, 'epoch': 0.55} 55%|█████▍ | 3620/6638 [3:27:40<2:57:22, 3.53s/it] 55%|█████▍ | 3621/6638 [3:27:43<2:54:17, 3.47s/it] {'loss': 0.6448, 'grad_norm': 0.5425108075831295, 'learning_rate': 9.015882956491732e-06, 'epoch': 0.55} 55%|█████▍ | 3621/6638 [3:27:43<2:54:17, 3.47s/it] 55%|█████▍ | 3622/6638 [3:27:46<2:51:01, 3.40s/it] {'loss': 0.6449, 'grad_norm': 0.6167033902787153, 'learning_rate': 9.011026996357504e-06, 'epoch': 0.55} 55%|█████▍ | 3622/6638 [3:27:46<2:51:01, 3.40s/it] 55%|█████▍ | 3623/6638 [3:27:52<3:28:16, 4.14s/it] {'loss': 0.664, 'grad_norm': 0.5907899354388564, 'learning_rate': 9.006171271718567e-06, 'epoch': 0.55} 55%|█████▍ | 3623/6638 [3:27:52<3:28:16, 4.14s/it] 55%|█████▍ | 3624/6638 [3:27:56<3:18:27, 3.95s/it] {'loss': 0.6647, 'grad_norm': 0.5386208287381058, 'learning_rate': 9.001315783731169e-06, 'epoch': 0.55} 55%|█████▍ | 3624/6638 [3:27:56<3:18:27, 3.95s/it] 55%|█████▍ | 3625/6638 [3:27:59<3:06:14, 3.71s/it] {'loss': 0.6135, 'grad_norm': 0.5477930759086108, 'learning_rate': 8.996460533551512e-06, 'epoch': 0.55} 55%|█████▍ | 3625/6638 [3:27:59<3:06:14, 3.71s/it] 55%|█████▍ | 3626/6638 [3:28:02<2:58:14, 3.55s/it] {'loss': 0.6528, 'grad_norm': 0.597973851253595, 'learning_rate': 8.991605522335727e-06, 'epoch': 0.55} 55%|█████▍ | 3626/6638 [3:28:02<2:58:14, 3.55s/it] 55%|█████▍ | 3627/6638 [3:28:05<2:54:43, 3.48s/it] {'loss': 0.7116, 'grad_norm': 0.59950547578153, 'learning_rate': 8.98675075123989e-06, 'epoch': 0.55} 55%|█████▍ | 3627/6638 [3:28:05<2:54:43, 3.48s/it] 55%|█████▍ | 3628/6638 [3:28:09<2:52:50, 3.45s/it] {'loss': 0.6253, 'grad_norm': 0.5669948395762286, 'learning_rate': 8.981896221420039e-06, 'epoch': 0.55} 55%|█████▍ | 3628/6638 [3:28:09<2:52:50, 3.45s/it] 55%|█████▍ | 3629/6638 [3:28:12<2:50:01, 3.39s/it] {'loss': 0.6424, 'grad_norm': 0.6434618706449703, 
'learning_rate': 8.97704193403212e-06, 'epoch': 0.55} 55%|█████▍ | 3629/6638 [3:28:12<2:50:01, 3.39s/it] 55%|█████▍ | 3630/6638 [3:28:15<2:48:19, 3.36s/it] {'loss': 0.6688, 'grad_norm': 0.5964732183358888, 'learning_rate': 8.972187890232061e-06, 'epoch': 0.55} 55%|█████▍ | 3630/6638 [3:28:15<2:48:19, 3.36s/it] 55%|█████▍ | 3631/6638 [3:28:18<2:46:00, 3.31s/it] {'loss': 0.6353, 'grad_norm': 0.5992863043210648, 'learning_rate': 8.967334091175698e-06, 'epoch': 0.55} 55%|█████▍ | 3631/6638 [3:28:18<2:46:00, 3.31s/it] 55%|█████▍ | 3632/6638 [3:28:22<2:47:26, 3.34s/it] {'loss': 0.6634, 'grad_norm': 0.5547211164481164, 'learning_rate': 8.962480538018828e-06, 'epoch': 0.55} 55%|█████▍ | 3632/6638 [3:28:22<2:47:26, 3.34s/it] 55%|█████▍ | 3633/6638 [3:28:25<2:47:03, 3.34s/it] {'loss': 0.6339, 'grad_norm': 0.5959831360344352, 'learning_rate': 8.957627231917183e-06, 'epoch': 0.55} 55%|█████▍ | 3633/6638 [3:28:25<2:47:03, 3.34s/it] 55%|█████▍ | 3634/6638 [3:28:28<2:47:39, 3.35s/it] {'loss': 0.6428, 'grad_norm': 0.6261844852869836, 'learning_rate': 8.952774174026437e-06, 'epoch': 0.55} 55%|█████▍ | 3634/6638 [3:28:28<2:47:39, 3.35s/it] 55%|█████▍ | 3635/6638 [3:28:33<3:08:39, 3.77s/it] {'loss': 0.6019, 'grad_norm': 0.5745200817826617, 'learning_rate': 8.947921365502208e-06, 'epoch': 0.55} 55%|█████▍ | 3635/6638 [3:28:33<3:08:39, 3.77s/it] 55%|█████▍ | 3636/6638 [3:28:36<3:01:15, 3.62s/it] {'loss': 0.6551, 'grad_norm': 0.637336219631641, 'learning_rate': 8.94306880750005e-06, 'epoch': 0.55} 55%|█████▍ | 3636/6638 [3:28:37<3:01:15, 3.62s/it] 55%|█████▍ | 3637/6638 [3:28:40<2:59:32, 3.59s/it] {'loss': 0.6685, 'grad_norm': 0.538793343599026, 'learning_rate': 8.938216501175457e-06, 'epoch': 0.55} 55%|█████▍ | 3637/6638 [3:28:40<2:59:32, 3.59s/it] 55%|█████▍ | 3638/6638 [3:28:43<2:52:42, 3.45s/it] {'loss': 0.6232, 'grad_norm': 0.5987783683253713, 'learning_rate': 8.933364447683868e-06, 'epoch': 0.55} 55%|█████▍ | 3638/6638 [3:28:43<2:52:42, 3.45s/it] 55%|█████▍ | 3639/6638 
[3:28:47<2:51:50, 3.44s/it] {'loss': 0.6621, 'grad_norm': 0.5913726381610234, 'learning_rate': 8.928512648180659e-06, 'epoch': 0.55} 55%|█████▍ | 3639/6638 [3:28:47<2:51:50, 3.44s/it] 55%|█████▍ | 3640/6638 [3:28:50<2:49:47, 3.40s/it] {'loss': 0.6543, 'grad_norm': 0.6260162680444022, 'learning_rate': 8.923661103821146e-06, 'epoch': 0.55} 55%|█████▍ | 3640/6638 [3:28:50<2:49:47, 3.40s/it] 55%|█████▍ | 3641/6638 [3:28:53<2:46:42, 3.34s/it] {'loss': 0.6458, 'grad_norm': 0.6038588801566152, 'learning_rate': 8.918809815760585e-06, 'epoch': 0.55} 55%|█████▍ | 3641/6638 [3:28:53<2:46:42, 3.34s/it] 55%|█████▍ | 3642/6638 [3:28:56<2:44:45, 3.30s/it] {'loss': 0.6772, 'grad_norm': 0.5968721685150119, 'learning_rate': 8.913958785154165e-06, 'epoch': 0.55} 55%|█████▍ | 3642/6638 [3:28:56<2:44:45, 3.30s/it] 55%|█████▍ | 3643/6638 [3:29:00<2:44:15, 3.29s/it] {'loss': 0.6712, 'grad_norm': 0.6194190171762216, 'learning_rate': 8.909108013157021e-06, 'epoch': 0.55} 55%|█████▍ | 3643/6638 [3:29:00<2:44:15, 3.29s/it] 55%|█████▍ | 3644/6638 [3:29:03<2:45:13, 3.31s/it] {'loss': 0.6897, 'grad_norm': 0.5864614189694572, 'learning_rate': 8.904257500924224e-06, 'epoch': 0.55} 55%|█████▍ | 3644/6638 [3:29:03<2:45:13, 3.31s/it] 55%|█████▍ | 3645/6638 [3:29:06<2:44:03, 3.29s/it] {'loss': 0.667, 'grad_norm': 0.620465229097593, 'learning_rate': 8.899407249610785e-06, 'epoch': 0.55} 55%|█████▍ | 3645/6638 [3:29:06<2:44:03, 3.29s/it] 55%|█████▍ | 3646/6638 [3:29:09<2:42:47, 3.26s/it] {'loss': 0.6488, 'grad_norm': 0.6533083973189928, 'learning_rate': 8.894557260371648e-06, 'epoch': 0.55} 55%|█████▍ | 3646/6638 [3:29:09<2:42:47, 3.26s/it] 55%|█████▍ | 3647/6638 [3:29:13<2:43:25, 3.28s/it] {'loss': 0.6573, 'grad_norm': 0.6349196702074634, 'learning_rate': 8.889707534361699e-06, 'epoch': 0.55} 55%|█████▍ | 3647/6638 [3:29:13<2:43:25, 3.28s/it] 55%|█████▍ | 3648/6638 [3:29:16<2:42:38, 3.26s/it] {'loss': 0.6531, 'grad_norm': 0.5929767534323438, 'learning_rate': 8.88485807273576e-06, 'epoch': 0.55} 
55%|█████▍ | 3648/6638 [3:29:16<2:42:38, 3.26s/it] 55%|█████▍ | 3649/6638 [3:29:19<2:44:35, 3.30s/it] {'loss': 0.6614, 'grad_norm': 0.5992201517114123, 'learning_rate': 8.880008876648588e-06, 'epoch': 0.55} 55%|█████▍ | 3649/6638 [3:29:19<2:44:35, 3.30s/it]1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 55%|█████▍ | 3650/6638 [3:29:23<2:43:37, 3.29s/it]7 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... {'loss': 0.6711, 'grad_norm': 0.6186924731094804, 'learning_rate': 8.875159947254882e-06, 'epoch': 0.55} 55%|█████▍ | 3650/6638 [3:29:23<2:43:37, 3.29s/it] 55%|█████▌ | 3651/6638 [3:29:26<2:45:19, 3.32s/it] {'loss': 0.6737, 'grad_norm': 0.5279283883435486, 'learning_rate': 8.870311285709274e-06, 'epoch': 0.55} 55%|█████▌ | 3651/6638 [3:29:26<2:45:19, 3.32s/it] 55%|█████▌ | 3652/6638 [3:29:29<2:47:01, 3.36s/it] {'loss': 0.6548, 'grad_norm': 0.578821658041875, 'learning_rate': 8.865462893166325e-06, 'epoch': 0.55} 55%|█████▌ | 3652/6638 [3:29:29<2:47:01, 3.36s/it] 55%|█████▌ | 3653/6638 [3:29:33<2:46:21, 3.34s/it] {'loss': 0.6327, 'grad_norm': 0.5845163707762345, 'learning_rate': 8.860614770780552e-06, 'epoch': 0.55} 55%|█████▌ | 3653/6638 [3:29:33<2:46:21, 3.34s/it] 55%|█████▌ | 3654/6638 [3:29:36<2:45:52, 3.34s/it] {'loss': 0.6175, 'grad_norm': 0.5267292471077838, 'learning_rate': 8.855766919706386e-06, 'epoch': 0.55} 55%|█████▌ | 3654/6638 [3:29:36<2:45:52, 3.34s/it] 55%|█████▌ | 3655/6638 [3:29:39<2:43:44, 3.29s/it] {'loss': 0.6357, 'grad_norm': 0.5890171592013441, 'learning_rate': 8.850919341098202e-06, 'epoch': 0.55} 55%|█████▌ | 3655/6638 [3:29:39<2:43:44, 3.29s/it] 55%|█████▌ | 3656/6638 [3:29:43<2:45:30, 3.33s/it] {'loss': 0.6355, 'grad_norm': 0.5819904009741759, 
'learning_rate': 8.846072036110316e-06, 'epoch': 0.55} 55%|█████▌ | 3656/6638 [3:29:43<2:45:30, 3.33s/it] 55%|█████▌ | 3657/6638 [3:29:46<2:45:32, 3.33s/it] {'loss': 0.6633, 'grad_norm': 0.5888614092612092, 'learning_rate': 8.841225005896967e-06, 'epoch': 0.55} 55%|█████▌ | 3657/6638 [3:29:46<2:45:32, 3.33s/it] 55%|█████▌ | 3658/6638 [3:29:49<2:45:24, 3.33s/it] {'loss': 0.653, 'grad_norm': 0.605809057344195, 'learning_rate': 8.83637825161234e-06, 'epoch': 0.55} 55%|█████▌ | 3658/6638 [3:29:49<2:45:24, 3.33s/it] 55%|█████▌ | 3659/6638 [3:29:53<2:44:22, 3.31s/it] {'loss': 0.6178, 'grad_norm': 0.5441028747858697, 'learning_rate': 8.83153177441055e-06, 'epoch': 0.55} 55%|█████▌ | 3659/6638 [3:29:53<2:44:22, 3.31s/it] 55%|█████▌ | 3660/6638 [3:29:56<2:41:45, 3.26s/it] {'loss': 0.6299, 'grad_norm': 0.5690643713522285, 'learning_rate': 8.826685575445638e-06, 'epoch': 0.55} 55%|█████▌ | 3660/6638 [3:29:56<2:41:45, 3.26s/it] 55%|█████▌ | 3661/6638 [3:29:59<2:44:22, 3.31s/it] {'loss': 0.7398, 'grad_norm': 0.7032832190289898, 'learning_rate': 8.821839655871593e-06, 'epoch': 0.55} 55%|█████▌ | 3661/6638 [3:29:59<2:44:22, 3.31s/it] 55%|█████▌ | 3662/6638 [3:30:02<2:44:39, 3.32s/it] {'loss': 0.6833, 'grad_norm': 0.6347481437038773, 'learning_rate': 8.81699401684233e-06, 'epoch': 0.55} 55%|█████▌ | 3662/6638 [3:30:02<2:44:39, 3.32s/it] 55%|█████▌ | 3663/6638 [3:30:06<2:44:23, 3.32s/it] {'loss': 0.6281, 'grad_norm': 0.5697686680088727, 'learning_rate': 8.812148659511695e-06, 'epoch': 0.55} 55%|█████▌ | 3663/6638 [3:30:06<2:44:23, 3.32s/it] 55%|█████▌ | 3664/6638 [3:30:09<2:43:33, 3.30s/it] {'loss': 0.6329, 'grad_norm': 0.5721067510239032, 'learning_rate': 8.80730358503347e-06, 'epoch': 0.55} 55%|█████▌ | 3664/6638 [3:30:09<2:43:33, 3.30s/it] 55%|█████▌ | 3665/6638 [3:30:12<2:42:31, 3.28s/it] {'loss': 0.6594, 'grad_norm': 0.5910557921598735, 'learning_rate': 8.80245879456137e-06, 'epoch': 0.55} 55%|█████▌ | 3665/6638 [3:30:12<2:42:31, 3.28s/it] 55%|█████▌ | 3666/6638 
[3:30:15<2:41:17, 3.26s/it] {'loss': 0.6156, 'grad_norm': 0.5428013803120085, 'learning_rate': 8.797614289249041e-06, 'epoch': 0.55} 55%|█████▌ | 3666/6638 [3:30:15<2:41:17, 3.26s/it] 55%|█████▌ | 3667/6638 [3:30:19<2:43:27, 3.30s/it] {'loss': 0.6307, 'grad_norm': 0.5163696786115048, 'learning_rate': 8.792770070250061e-06, 'epoch': 0.55} 55%|█████▌ | 3667/6638 [3:30:19<2:43:27, 3.30s/it] 55%|█████▌ | 3668/6638 [3:30:22<2:43:56, 3.31s/it] {'loss': 0.6485, 'grad_norm': 0.5686212713761608, 'learning_rate': 8.787926138717942e-06, 'epoch': 0.55} 55%|█████▌ | 3668/6638 [3:30:22<2:43:56, 3.31s/it] 55%|█████▌ | 3669/6638 [3:30:25<2:43:41, 3.31s/it] {'loss': 0.6756, 'grad_norm': 0.6642030111079748, 'learning_rate': 8.783082495806128e-06, 'epoch': 0.55} 55%|█████▌ | 3669/6638 [3:30:25<2:43:41, 3.31s/it] 55%|█████▌ | 3670/6638 [3:30:29<2:42:36, 3.29s/it] {'loss': 0.6563, 'grad_norm': 0.6550035054659576, 'learning_rate': 8.778239142667985e-06, 'epoch': 0.55} 55%|█████▌ | 3670/6638 [3:30:29<2:42:36, 3.29s/it] 55%|█████▌ | 3671/6638 [3:30:32<2:42:38, 3.29s/it] {'loss': 0.6532, 'grad_norm': 0.5657732302245114, 'learning_rate': 8.773396080456828e-06, 'epoch': 0.55} 55%|█████▌ | 3671/6638 [3:30:32<2:42:38, 3.29s/it] 55%|█████▌ | 3672/6638 [3:30:35<2:43:20, 3.30s/it] {'loss': 0.6664, 'grad_norm': 0.5716220607601459, 'learning_rate': 8.768553310325883e-06, 'epoch': 0.55} 55%|█████▌ | 3672/6638 [3:30:35<2:43:20, 3.30s/it] 55%|█████▌ | 3673/6638 [3:30:39<2:43:22, 3.31s/it] {'loss': 0.6908, 'grad_norm': 0.6001503152406291, 'learning_rate': 8.763710833428319e-06, 'epoch': 0.55} 55%|█████▌ | 3673/6638 [3:30:39<2:43:22, 3.31s/it] 55%|█████▌ | 3674/6638 [3:30:42<2:45:28, 3.35s/it] {'loss': 0.6933, 'grad_norm': 0.6205921714018865, 'learning_rate': 8.758868650917236e-06, 'epoch': 0.55} 55%|█████▌ | 3674/6638 [3:30:42<2:45:28, 3.35s/it] 55%|█████▌ | 3675/6638 [3:30:45<2:45:49, 3.36s/it] {'loss': 0.6667, 'grad_norm': 0.6368960239390975, 'learning_rate': 8.754026763945648e-06, 'epoch': 0.55} 
55%|█████▌ | 3675/6638 [3:30:45<2:45:49, 3.36s/it] 55%|█████▌ | 3676/6638 [3:30:49<2:46:03, 3.36s/it] {'loss': 0.6322, 'grad_norm': 0.5155150158842029, 'learning_rate': 8.749185173666523e-06, 'epoch': 0.55} 55%|█████▌ | 3676/6638 [3:30:49<2:46:03, 3.36s/it] 55%|█████▌ | 3677/6638 [3:30:52<2:47:09, 3.39s/it] {'loss': 0.7162, 'grad_norm': 0.6868252787642397, 'learning_rate': 8.744343881232736e-06, 'epoch': 0.55} 55%|█████▌ | 3677/6638 [3:30:52<2:47:09, 3.39s/it] 55%|█████▌ | 3678/6638 [3:30:56<2:44:50, 3.34s/it] {'loss': 0.7126, 'grad_norm': 0.6380353665622891, 'learning_rate': 8.739502887797108e-06, 'epoch': 0.55} 55%|█████▌ | 3678/6638 [3:30:56<2:44:50, 3.34s/it] 55%|█████▌ | 3679/6638 [3:30:59<2:46:06, 3.37s/it] {'loss': 0.6001, 'grad_norm': 0.5538459145846046, 'learning_rate': 8.734662194512377e-06, 'epoch': 0.55} 55%|█████▌ | 3679/6638 [3:30:59<2:46:06, 3.37s/it] 55%|█████▌ | 3680/6638 [3:31:02<2:44:31, 3.34s/it] {'loss': 0.6889, 'grad_norm': 0.6545697166775194, 'learning_rate': 8.729821802531213e-06, 'epoch': 0.55} 55%|█████▌ | 3680/6638 [3:31:02<2:44:31, 3.34s/it] 55%|█████▌ | 3681/6638 [3:31:06<2:44:01, 3.33s/it] {'loss': 0.7064, 'grad_norm': 0.6770655036451891, 'learning_rate': 8.72498171300622e-06, 'epoch': 0.55} 55%|█████▌ | 3681/6638 [3:31:06<2:44:01, 3.33s/it] 55%|█████▌ | 3682/6638 [3:31:09<2:42:44, 3.30s/it] {'loss': 0.6834, 'grad_norm': 0.6354104361793783, 'learning_rate': 8.720141927089921e-06, 'epoch': 0.55} 55%|█████▌ | 3682/6638 [3:31:09<2:42:44, 3.30s/it] 55%|█████▌ | 3683/6638 [3:31:12<2:42:37, 3.30s/it] {'loss': 0.6429, 'grad_norm': 0.5659266664766285, 'learning_rate': 8.715302445934773e-06, 'epoch': 0.55} 55%|█████▌ | 3683/6638 [3:31:12<2:42:37, 3.30s/it] 55%|█████▌ | 3684/6638 [3:31:15<2:43:36, 3.32s/it] {'loss': 0.6217, 'grad_norm': 0.5354289693253862, 'learning_rate': 8.710463270693159e-06, 'epoch': 0.55} 55%|█████▌ | 3684/6638 [3:31:15<2:43:36, 3.32s/it] 56%|█████▌ | 3685/6638 [3:31:19<2:44:09, 3.34s/it] {'loss': 0.67, 'grad_norm': 
0.6236263520429809, 'learning_rate': 8.705624402517382e-06, 'epoch': 0.56} 56%|█████▌ | 3685/6638 [3:31:19<2:44:09, 3.34s/it] 56%|█████▌ | 3686/6638 [3:31:22<2:44:56, 3.35s/it] {'loss': 0.6888, 'grad_norm': 0.6945505461307269, 'learning_rate': 8.70078584255969e-06, 'epoch': 0.56} 56%|█████▌ | 3686/6638 [3:31:22<2:44:56, 3.35s/it] 56%|█████▌ | 3687/6638 [3:31:26<2:43:49, 3.33s/it] {'loss': 0.6446, 'grad_norm': 0.6199194043946838, 'learning_rate': 8.695947591972238e-06, 'epoch': 0.56} 56%|█████▌ | 3687/6638 [3:31:26<2:43:49, 3.33s/it] 56%|█████▌ | 3688/6638 [3:31:29<2:43:42, 3.33s/it] {'loss': 0.6348, 'grad_norm': 0.5423271616553267, 'learning_rate': 8.691109651907114e-06, 'epoch': 0.56} 56%|█████▌ | 3688/6638 [3:31:29<2:43:42, 3.33s/it] 56%|█████▌ | 3689/6638 [3:31:32<2:43:07, 3.32s/it] {'loss': 0.6537, 'grad_norm': 0.6618150530719946, 'learning_rate': 8.686272023516339e-06, 'epoch': 0.56} 56%|█████▌ | 3689/6638 [3:31:32<2:43:07, 3.32s/it] 56%|█████▌ | 3690/6638 [3:31:35<2:41:52, 3.29s/it] {'loss': 0.6709, 'grad_norm': 0.6867942807301246, 'learning_rate': 8.68143470795185e-06, 'epoch': 0.56} 56%|█████▌ | 3690/6638 [3:31:35<2:41:52, 3.29s/it] 56%|█████▌ | 3691/6638 [3:31:39<2:41:31, 3.29s/it] {'loss': 0.6149, 'grad_norm': 0.551895526501202, 'learning_rate': 8.676597706365516e-06, 'epoch': 0.56} 56%|█████▌ | 3691/6638 [3:31:39<2:41:31, 3.29s/it] 56%|█████▌ | 3692/6638 [3:31:42<2:40:52, 3.28s/it] {'loss': 0.6391, 'grad_norm': 0.5898231416424234, 'learning_rate': 8.671761019909129e-06, 'epoch': 0.56} 56%|█████▌ | 3692/6638 [3:31:42<2:40:52, 3.28s/it] 56%|█████▌ | 3693/6638 [3:31:45<2:41:31, 3.29s/it] {'loss': 0.6515, 'grad_norm': 0.7031142603585757, 'learning_rate': 8.6669246497344e-06, 'epoch': 0.56} 56%|█████▌ | 3693/6638 [3:31:45<2:41:31, 3.29s/it] 56%|█████▌ | 3694/6638 [3:31:49<2:42:19, 3.31s/it] {'loss': 0.6642, 'grad_norm': 0.7090585319981514, 'learning_rate': 8.662088596992982e-06, 'epoch': 0.56} 56%|█████▌ | 3694/6638 [3:31:49<2:42:19, 3.31s/it] 56%|█████▌ | 
3695/6638 [3:31:52<2:42:40, 3.32s/it] {'loss': 0.5965, 'grad_norm': 0.5965561403928547, 'learning_rate': 8.65725286283643e-06, 'epoch': 0.56} 56%|█████▌ | 3695/6638 [3:31:52<2:42:40, 3.32s/it] 56%|█████▌ | 3696/6638 [3:31:55<2:41:40, 3.30s/it] {'loss': 0.6231, 'grad_norm': 0.5815453901422586, 'learning_rate': 8.652417448416239e-06, 'epoch': 0.56} 56%|█████▌ | 3696/6638 [3:31:55<2:41:40, 3.30s/it] 56%|█████▌ | 3697/6638 [3:31:58<2:40:18, 3.27s/it] {'loss': 0.6411, 'grad_norm': 0.6685984980298318, 'learning_rate': 8.647582354883827e-06, 'epoch': 0.56} 56%|█████▌ | 3697/6638 [3:31:58<2:40:18, 3.27s/it] 56%|█████▌ | 3698/6638 [3:32:02<2:40:05, 3.27s/it] {'loss': 0.6618, 'grad_norm': 0.5804464467530895, 'learning_rate': 8.642747583390522e-06, 'epoch': 0.56} 56%|█████▌ | 3698/6638 [3:32:02<2:40:05, 3.27s/it] 56%|█████▌ | 3699/6638 [3:32:05<2:41:28, 3.30s/it] {'loss': 0.6502, 'grad_norm': 0.6275852124002468, 'learning_rate': 8.637913135087592e-06, 'epoch': 0.56} 56%|█████▌ | 3699/6638 [3:32:05<2:41:28, 3.30s/it]2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 56%|█████▌ | 3700/6638 [3:32:08<2:41:11, 3.29s/it]4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6112, 'grad_norm': 0.5398652204816912, 'learning_rate': 8.633079011126216e-06, 'epoch': 0.56} 56%|█████▌ | 3700/6638 [3:32:08<2:41:11, 3.29s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3700/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3700/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3700/mm_projector 
56%|█████▌ | 3701/6638 [3:32:26<6:09:03, 7.54s/it] {'loss': 0.6656, 'grad_norm': 0.6815996340289507, 'learning_rate': 8.628245212657506e-06, 'epoch': 0.56} 56%|█████▌ | 3701/6638 [3:32:26<6:09:03, 7.54s/it] 56%|█████▌ | 3702/6638 [3:32:29<5:08:36, 6.31s/it] {'loss': 0.6894, 'grad_norm': 0.6106966496113011, 'learning_rate': 8.62341174083249e-06, 'epoch': 0.56} 56%|█████▌ | 3702/6638 [3:32:29<5:08:36, 6.31s/it] 56%|█████▌ | 3703/6638 [3:32:32<4:25:00, 5.42s/it] {'loss': 0.6409, 'grad_norm': 0.5890006377504723, 'learning_rate': 8.618578596802114e-06, 'epoch': 0.56} 56%|█████▌ | 3703/6638 [3:32:32<4:25:00, 5.42s/it] 56%|█████▌ | 3704/6638 [3:32:36<3:53:41, 4.78s/it] {'loss': 0.6663, 'grad_norm': 0.6158904713266281, 'learning_rate': 8.61374578171726e-06, 'epoch': 0.56} 56%|█████▌ | 3704/6638 [3:32:36<3:53:41, 4.78s/it] 56%|█████▌ | 3705/6638 [3:32:39<3:31:52, 4.33s/it] {'loss': 0.6139, 'grad_norm': 0.6240070585928978, 'learning_rate': 8.608913296728713e-06, 'epoch': 0.56} 56%|█████▌ | 3705/6638 [3:32:39<3:31:52, 4.33s/it] 56%|█████▌ | 3706/6638 [3:32:42<3:15:15, 4.00s/it] {'loss': 0.6319, 'grad_norm': 0.6089440775812587, 'learning_rate': 8.604081142987199e-06, 'epoch': 0.56} 56%|█████▌ | 3706/6638 [3:32:42<3:15:15, 4.00s/it] 56%|█████▌ | 3707/6638 [3:32:45<3:03:34, 3.76s/it] {'loss': 0.6619, 'grad_norm': 0.6686915943455939, 'learning_rate': 8.599249321643353e-06, 'epoch': 0.56} 56%|█████▌ | 3707/6638 [3:32:45<3:03:34, 3.76s/it] 56%|█████▌ | 3708/6638 [3:32:49<2:57:26, 3.63s/it] {'loss': 0.6831, 'grad_norm': 0.6912670207956769, 'learning_rate': 8.594417833847728e-06, 'epoch': 0.56} 56%|█████▌ | 3708/6638 [3:32:49<2:57:26, 3.63s/it] 56%|█████▌ | 3709/6638 [3:32:52<2:51:58, 3.52s/it] {'loss': 0.6572, 'grad_norm': 0.5750862444330145, 'learning_rate': 8.58958668075081e-06, 'epoch': 0.56} 56%|█████▌ | 3709/6638 [3:32:52<2:51:58, 3.52s/it] 56%|█████▌ | 3710/6638 
[3:32:55<2:49:46, 3.48s/it] {'loss': 0.725, 'grad_norm': 0.711025885158913, 'learning_rate': 8.584755863502988e-06, 'epoch': 0.56} 56%|█████▌ | 3710/6638 [3:32:55<2:49:46, 3.48s/it] 56%|█████▌ | 3711/6638 [3:32:59<2:46:39, 3.42s/it] {'loss': 0.6441, 'grad_norm': 0.6092789787824064, 'learning_rate': 8.579925383254598e-06, 'epoch': 0.56} 56%|█████▌ | 3711/6638 [3:32:59<2:46:39, 3.42s/it] 56%|█████▌ | 3712/6638 [3:33:02<2:46:45, 3.42s/it] {'loss': 0.6397, 'grad_norm': 0.5428573138969033, 'learning_rate': 8.575095241155864e-06, 'epoch': 0.56} 56%|█████▌ | 3712/6638 [3:33:02<2:46:45, 3.42s/it] 56%|█████▌ | 3713/6638 [3:33:05<2:43:59, 3.36s/it] {'loss': 0.6311, 'grad_norm': 0.6181275032816865, 'learning_rate': 8.570265438356949e-06, 'epoch': 0.56} 56%|█████▌ | 3713/6638 [3:33:05<2:43:59, 3.36s/it] 56%|█████▌ | 3714/6638 [3:33:09<2:41:59, 3.32s/it] {'loss': 0.65, 'grad_norm': 0.6231167062186854, 'learning_rate': 8.565435976007932e-06, 'epoch': 0.56} 56%|█████▌ | 3714/6638 [3:33:09<2:41:59, 3.32s/it] 56%|█████▌ | 3715/6638 [3:33:12<2:43:24, 3.35s/it] {'loss': 0.6465, 'grad_norm': 0.5725902311896598, 'learning_rate': 8.560606855258808e-06, 'epoch': 0.56} 56%|█████▌ | 3715/6638 [3:33:12<2:43:24, 3.35s/it] 56%|█████▌ | 3716/6638 [3:33:15<2:41:23, 3.31s/it] {'loss': 0.6271, 'grad_norm': 0.5581226464041724, 'learning_rate': 8.555778077259496e-06, 'epoch': 0.56} 56%|█████▌ | 3716/6638 [3:33:15<2:41:23, 3.31s/it] 56%|█████▌ | 3717/6638 [3:33:19<2:40:17, 3.29s/it] {'loss': 0.6561, 'grad_norm': 0.6936070777042979, 'learning_rate': 8.550949643159829e-06, 'epoch': 0.56} 56%|█████▌ | 3717/6638 [3:33:19<2:40:17, 3.29s/it] 56%|█████▌ | 3718/6638 [3:33:22<2:38:49, 3.26s/it] {'loss': 0.6843, 'grad_norm': 0.6699959546973364, 'learning_rate': 8.546121554109552e-06, 'epoch': 0.56} 56%|█████▌ | 3718/6638 [3:33:22<2:38:49, 3.26s/it] 56%|█████▌ | 3719/6638 [3:33:25<2:40:51, 3.31s/it] {'loss': 0.6564, 'grad_norm': 0.5657690714524138, 'learning_rate': 8.541293811258345e-06, 'epoch': 0.56} 
56%|█████▌ | 3719/6638 [3:33:25<2:40:51, 3.31s/it] 56%|█████▌ | 3720/6638 [3:33:28<2:40:14, 3.29s/it] {'loss': 0.646, 'grad_norm': 0.629440534381659, 'learning_rate': 8.536466415755787e-06, 'epoch': 0.56} 56%|█████▌ | 3720/6638 [3:33:28<2:40:14, 3.29s/it] 56%|█████▌ | 3721/6638 [3:33:32<2:40:47, 3.31s/it] {'loss': 0.6565, 'grad_norm': 0.5824436418427704, 'learning_rate': 8.531639368751384e-06, 'epoch': 0.56} 56%|█████▌ | 3721/6638 [3:33:32<2:40:47, 3.31s/it] 56%|█████▌ | 3722/6638 [3:33:35<2:40:55, 3.31s/it] {'loss': 0.6572, 'grad_norm': 0.6102838194137313, 'learning_rate': 8.52681267139456e-06, 'epoch': 0.56} 56%|█████▌ | 3722/6638 [3:33:35<2:40:55, 3.31s/it] 56%|█████▌ | 3723/6638 [3:33:38<2:40:40, 3.31s/it] {'loss': 0.66, 'grad_norm': 0.5840814554362087, 'learning_rate': 8.521986324834653e-06, 'epoch': 0.56} 56%|█████▌ | 3723/6638 [3:33:38<2:40:40, 3.31s/it] 56%|█████▌ | 3724/6638 [3:33:42<2:39:48, 3.29s/it] {'loss': 0.6455, 'grad_norm': 0.56764297923849, 'learning_rate': 8.517160330220915e-06, 'epoch': 0.56} 56%|█████▌ | 3724/6638 [3:33:42<2:39:48, 3.29s/it] 56%|█████▌ | 3725/6638 [3:33:45<2:39:05, 3.28s/it] {'loss': 0.6923, 'grad_norm': 0.58088664466149, 'learning_rate': 8.512334688702521e-06, 'epoch': 0.56} 56%|█████▌ | 3725/6638 [3:33:45<2:39:05, 3.28s/it] 56%|█████▌ | 3726/6638 [3:33:48<2:40:16, 3.30s/it] {'loss': 0.6719, 'grad_norm': 0.5808110848565889, 'learning_rate': 8.507509401428554e-06, 'epoch': 0.56} 56%|█████▌ | 3726/6638 [3:33:48<2:40:16, 3.30s/it] 56%|█████▌ | 3727/6638 [3:33:51<2:38:49, 3.27s/it] {'loss': 0.6556, 'grad_norm': 0.6446102983045031, 'learning_rate': 8.502684469548017e-06, 'epoch': 0.56} 56%|█████▌ | 3727/6638 [3:33:51<2:38:49, 3.27s/it] 56%|█████▌ | 3728/6638 [3:33:55<2:39:04, 3.28s/it] {'loss': 0.6427, 'grad_norm': 0.5978973015576857, 'learning_rate': 8.49785989420983e-06, 'epoch': 0.56} 56%|█████▌ | 3728/6638 [3:33:55<2:39:04, 3.28s/it] 56%|█████▌ | 3729/6638 [3:33:58<2:40:31, 3.31s/it] {'loss': 0.6654, 'grad_norm': 
0.6247445864146708, 'learning_rate': 8.493035676562826e-06, 'epoch': 0.56} 56%|█████▌ | 3729/6638 [3:33:58<2:40:31, 3.31s/it]
[training-progress log, steps 3730–3799: one record per step with loss ≈0.59–0.73, grad_norm ≈0.51–0.76, and learning_rate decaying from 8.488e-06 to 8.156e-06 over epoch 0.56–0.57 at ~3.3 s/it; duplicated tqdm progress-bar lines and per-rank "AutoResumeHook: Checking whether to suspend..." messages around steps 3750 and 3800 omitted]
{'loss': 0.6529, 'grad_norm': 0.5917500664462062, 'learning_rate': 8.151501832694923e-06, 'epoch': 0.57} 57%|█████▋ | 3800/6638 [3:37:54<2:35:22, 3.28s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3800/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3800/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3800/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 57%|█████▋ | 3801/6638 [3:38:12<6:02:51, 7.67s/it] {'loss': 0.6731, 'grad_norm': 0.5870788182127502, 'learning_rate': 8.146706382110304e-06, 'epoch': 0.57} 57%|█████▋ | 3801/6638 [3:38:12<6:02:51, 7.67s/it] 57%|█████▋ | 3802/6638 [3:38:15<5:01:29, 6.38s/it] {'loss': 0.6178, 'grad_norm': 0.591961803469202, 'learning_rate': 8.141911372833909e-06, 'epoch': 0.57} 57%|█████▋ | 3802/6638 [3:38:15<5:01:29, 6.38s/it] 57%|█████▋ | 3803/6638 [3:38:18<4:17:18, 5.45s/it] {'loss': 0.7229, 'grad_norm': 0.707998507861507, 'learning_rate': 8.137116806007532e-06, 'epoch': 0.57} 57%|█████▋ | 3803/6638 [3:38:18<4:17:18, 5.45s/it] 57%|█████▋ | 3804/6638 [3:38:21<3:46:04, 4.79s/it] {'loss': 0.6109, 'grad_norm': 0.5471148358396926, 'learning_rate': 8.132322682772859e-06, 'epoch': 0.57} 57%|█████▋ | 3804/6638 [3:38:21<3:46:04, 4.79s/it] 57%|█████▋ | 3805/6638 [3:38:25<3:27:00, 4.38s/it] {'loss': 0.6575, 'grad_norm': 0.539206518536386, 'learning_rate': 8.127529004271474e-06, 'epoch': 0.57} 57%|█████▋ | 3805/6638 [3:38:25<3:27:00, 4.38s/it] 57%|█████▋ | 3806/6638 [3:38:28<3:11:17, 4.05s/it] {'loss': 0.6651, 'grad_norm': 0.6497608484901408, 'learning_rate': 8.122735771644853e-06, 'epoch': 0.57} 57%|█████▋ | 3806/6638 [3:38:28<3:11:17, 4.05s/it] 57%|█████▋ | 3807/6638 [3:38:31<3:00:13, 3.82s/it] {'loss': 0.6513, 'grad_norm': 0.585995966054769, 'learning_rate': 8.117942986034365e-06, 'epoch': 0.57} 57%|█████▋ | 3807/6638 [3:38:31<3:00:13, 3.82s/it] 57%|█████▋ | 3808/6638 [3:38:35<2:53:14, 3.67s/it] {'loss': 0.6596, 'grad_norm': 0.5788947611996383, 'learning_rate': 8.11315064858127e-06, 'epoch': 0.57} 57%|█████▋ | 3808/6638 [3:38:35<2:53:14, 3.67s/it] 57%|█████▋ | 3809/6638 [3:38:38<2:47:41, 3.56s/it] {'loss': 0.647, 'grad_norm': 0.5883575786808966, 'learning_rate': 8.108358760426727e-06, 'epoch': 0.57} 57%|█████▋ | 3809/6638 [3:38:38<2:47:41, 3.56s/it] 57%|█████▋ | 3810/6638 [3:38:41<2:43:46, 
3.47s/it] {'loss': 0.6553, 'grad_norm': 0.6377754899651957, 'learning_rate': 8.103567322711787e-06, 'epoch': 0.57} 57%|█████▋ | 3810/6638 [3:38:41<2:43:46, 3.47s/it] 57%|█████▋ | 3811/6638 [3:38:45<2:41:45, 3.43s/it] {'loss': 0.6821, 'grad_norm': 0.6477377603030101, 'learning_rate': 8.098776336577394e-06, 'epoch': 0.57} 57%|█████▋ | 3811/6638 [3:38:45<2:41:45, 3.43s/it] 57%|█████▋ | 3812/6638 [3:38:48<2:39:10, 3.38s/it] {'loss': 0.673, 'grad_norm': 0.6118518437191774, 'learning_rate': 8.093985803164372e-06, 'epoch': 0.57} 57%|█████▋ | 3812/6638 [3:38:48<2:39:10, 3.38s/it] 57%|█████▋ | 3813/6638 [3:38:51<2:39:48, 3.39s/it] {'loss': 0.5988, 'grad_norm': 0.4763003410414989, 'learning_rate': 8.089195723613462e-06, 'epoch': 0.57} 57%|█████▋ | 3813/6638 [3:38:51<2:39:48, 3.39s/it] 57%|█████▋ | 3814/6638 [3:38:55<2:39:58, 3.40s/it] {'loss': 0.6448, 'grad_norm': 0.5481137884976035, 'learning_rate': 8.084406099065273e-06, 'epoch': 0.57} 57%|█████▋ | 3814/6638 [3:38:55<2:39:58, 3.40s/it] 57%|█████▋ | 3815/6638 [3:38:58<2:39:49, 3.40s/it] {'loss': 0.6618, 'grad_norm': 0.6268633291126467, 'learning_rate': 8.079616930660317e-06, 'epoch': 0.57} 57%|█████▋ | 3815/6638 [3:38:58<2:39:49, 3.40s/it] 57%|█████▋ | 3816/6638 [3:39:01<2:38:28, 3.37s/it] {'loss': 0.6912, 'grad_norm': 0.6519384516070182, 'learning_rate': 8.074828219539001e-06, 'epoch': 0.57} 57%|█████▋ | 3816/6638 [3:39:01<2:38:28, 3.37s/it] 58%|█████▊ | 3817/6638 [3:39:05<2:37:32, 3.35s/it] {'loss': 0.6923, 'grad_norm': 1.427904658154348, 'learning_rate': 8.07003996684161e-06, 'epoch': 0.58} 58%|█████▊ | 3817/6638 [3:39:05<2:37:32, 3.35s/it] 58%|█████▊ | 3818/6638 [3:39:08<2:39:26, 3.39s/it] {'loss': 0.6746, 'grad_norm': 0.5681236709877884, 'learning_rate': 8.065252173708334e-06, 'epoch': 0.58} 58%|█████▊ | 3818/6638 [3:39:08<2:39:26, 3.39s/it] 58%|█████▊ | 3819/6638 [3:39:12<2:38:02, 3.36s/it] {'loss': 0.6183, 'grad_norm': 0.5638160946533768, 'learning_rate': 8.060464841279247e-06, 'epoch': 0.58} 58%|█████▊ | 3819/6638 
[3:39:12<2:38:02, 3.36s/it] 58%|█████▊ | 3820/6638 [3:39:15<2:36:43, 3.34s/it] {'loss': 0.6588, 'grad_norm': 0.5873599293251087, 'learning_rate': 8.055677970694306e-06, 'epoch': 0.58} 58%|█████▊ | 3820/6638 [3:39:15<2:36:43, 3.34s/it] 58%|█████▊ | 3821/6638 [3:39:18<2:34:49, 3.30s/it] {'loss': 0.6142, 'grad_norm': 0.5644891036948774, 'learning_rate': 8.050891563093378e-06, 'epoch': 0.58} 58%|█████▊ | 3821/6638 [3:39:18<2:34:49, 3.30s/it] 58%|█████▊ | 3822/6638 [3:39:21<2:34:35, 3.29s/it] {'loss': 0.6171, 'grad_norm': 0.5833443922116182, 'learning_rate': 8.046105619616195e-06, 'epoch': 0.58} 58%|█████▊ | 3822/6638 [3:39:21<2:34:35, 3.29s/it] 58%|█████▊ | 3823/6638 [3:39:25<2:35:10, 3.31s/it] {'loss': 0.6076, 'grad_norm': 0.5319321449204986, 'learning_rate': 8.041320141402403e-06, 'epoch': 0.58} 58%|█████▊ | 3823/6638 [3:39:25<2:35:10, 3.31s/it] 58%|█████▊ | 3824/6638 [3:39:28<2:34:32, 3.29s/it] {'loss': 0.6619, 'grad_norm': 0.5751613918644324, 'learning_rate': 8.036535129591517e-06, 'epoch': 0.58} 58%|█████▊ | 3824/6638 [3:39:28<2:34:32, 3.29s/it] 58%|█████▊ | 3825/6638 [3:39:31<2:35:39, 3.32s/it] {'loss': 0.6595, 'grad_norm': 0.6453953777480755, 'learning_rate': 8.031750585322948e-06, 'epoch': 0.58} 58%|█████▊ | 3825/6638 [3:39:31<2:35:39, 3.32s/it] 58%|█████▊ | 3826/6638 [3:39:35<2:35:09, 3.31s/it] {'loss': 0.6595, 'grad_norm': 0.6358697690839977, 'learning_rate': 8.026966509736001e-06, 'epoch': 0.58} 58%|█████▊ | 3826/6638 [3:39:35<2:35:09, 3.31s/it] 58%|█████▊ | 3827/6638 [3:39:38<2:34:49, 3.30s/it] {'loss': 0.6356, 'grad_norm': 0.5316358568556936, 'learning_rate': 8.022182903969863e-06, 'epoch': 0.58} 58%|█████▊ | 3827/6638 [3:39:38<2:34:49, 3.30s/it] 58%|█████▊ | 3828/6638 [3:39:41<2:33:47, 3.28s/it] {'loss': 0.638, 'grad_norm': 0.6221436637607994, 'learning_rate': 8.017399769163614e-06, 'epoch': 0.58} 58%|█████▊ | 3828/6638 [3:39:41<2:33:47, 3.28s/it] 58%|█████▊ | 3829/6638 [3:39:44<2:33:44, 3.28s/it] {'loss': 0.6463, 'grad_norm': 0.5763476013345645, 
'learning_rate': 8.012617106456217e-06, 'epoch': 0.58} 58%|█████▊ | 3829/6638 [3:39:44<2:33:44, 3.28s/it] 58%|█████▊ | 3830/6638 [3:39:48<2:33:42, 3.28s/it] {'loss': 0.6551, 'grad_norm': 0.6071855331360987, 'learning_rate': 8.007834916986523e-06, 'epoch': 0.58} 58%|█████▊ | 3830/6638 [3:39:48<2:33:42, 3.28s/it] 58%|█████▊ | 3831/6638 [3:39:51<2:33:44, 3.29s/it] {'loss': 0.6602, 'grad_norm': 0.6743949786397936, 'learning_rate': 8.003053201893275e-06, 'epoch': 0.58} 58%|█████▊ | 3831/6638 [3:39:51<2:33:44, 3.29s/it] 58%|█████▊ | 3832/6638 [3:39:54<2:32:34, 3.26s/it] {'loss': 0.6441, 'grad_norm': 0.562774050054745, 'learning_rate': 7.998271962315099e-06, 'epoch': 0.58} 58%|█████▊ | 3832/6638 [3:39:54<2:32:34, 3.26s/it] 58%|█████▊ | 3833/6638 [3:39:57<2:33:04, 3.27s/it] {'loss': 0.6261, 'grad_norm': 0.5927657558131579, 'learning_rate': 7.993491199390508e-06, 'epoch': 0.58} 58%|█████▊ | 3833/6638 [3:39:57<2:33:04, 3.27s/it] 58%|█████▊ | 3834/6638 [3:40:01<2:33:16, 3.28s/it] {'loss': 0.6285, 'grad_norm': 0.532651047549077, 'learning_rate': 7.988710914257906e-06, 'epoch': 0.58} 58%|█████▊ | 3834/6638 [3:40:01<2:33:16, 3.28s/it] 58%|█████▊ | 3835/6638 [3:40:04<2:33:19, 3.28s/it] {'loss': 0.6395, 'grad_norm': 0.6239212451881343, 'learning_rate': 7.983931108055574e-06, 'epoch': 0.58} 58%|█████▊ | 3835/6638 [3:40:04<2:33:19, 3.28s/it] 58%|█████▊ | 3836/6638 [3:40:07<2:33:18, 3.28s/it] {'loss': 0.6378, 'grad_norm': 0.5580293305000313, 'learning_rate': 7.979151781921686e-06, 'epoch': 0.58} 58%|█████▊ | 3836/6638 [3:40:07<2:33:18, 3.28s/it] 58%|█████▊ | 3837/6638 [3:40:11<2:31:51, 3.25s/it] {'loss': 0.608, 'grad_norm': 0.5834837826561033, 'learning_rate': 7.974372936994302e-06, 'epoch': 0.58} 58%|█████▊ | 3837/6638 [3:40:11<2:31:51, 3.25s/it] 58%|█████▊ | 3838/6638 [3:40:14<2:32:38, 3.27s/it] {'loss': 0.6733, 'grad_norm': 0.6264977480934764, 'learning_rate': 7.969594574411364e-06, 'epoch': 0.58} 58%|█████▊ | 3838/6638 [3:40:14<2:32:38, 3.27s/it] 58%|█████▊ | 3839/6638 
[3:40:17<2:33:22, 3.29s/it] {'loss': 0.6685, 'grad_norm': 0.5816974554370805, 'learning_rate': 7.964816695310702e-06, 'epoch': 0.58} 58%|█████▊ | 3839/6638 [3:40:17<2:33:22, 3.29s/it] 58%|█████▊ | 3840/6638 [3:40:21<2:35:39, 3.34s/it] {'loss': 0.7215, 'grad_norm': 0.66053720114772, 'learning_rate': 7.960039300830028e-06, 'epoch': 0.58} 58%|█████▊ | 3840/6638 [3:40:21<2:35:39, 3.34s/it] 58%|█████▊ | 3841/6638 [3:40:24<2:34:48, 3.32s/it] {'loss': 0.6696, 'grad_norm': 0.6019555650087384, 'learning_rate': 7.955262392106943e-06, 'epoch': 0.58} 58%|█████▊ | 3841/6638 [3:40:24<2:34:48, 3.32s/it] 58%|█████▊ | 3842/6638 [3:40:27<2:33:54, 3.30s/it] {'loss': 0.6942, 'grad_norm': 0.6258300102372018, 'learning_rate': 7.95048597027893e-06, 'epoch': 0.58} 58%|█████▊ | 3842/6638 [3:40:27<2:33:54, 3.30s/it] 58%|█████▊ | 3843/6638 [3:40:30<2:33:13, 3.29s/it] {'loss': 0.6615, 'grad_norm': 0.5964965162179036, 'learning_rate': 7.945710036483345e-06, 'epoch': 0.58} 58%|█████▊ | 3843/6638 [3:40:30<2:33:13, 3.29s/it] 58%|█████▊ | 3844/6638 [3:40:34<2:34:16, 3.31s/it] {'loss': 0.6472, 'grad_norm': 0.6098797052914648, 'learning_rate': 7.940934591857455e-06, 'epoch': 0.58} 58%|█████▊ | 3844/6638 [3:40:34<2:34:16, 3.31s/it] 58%|█████▊ | 3845/6638 [3:40:37<2:33:22, 3.29s/it] {'loss': 0.652, 'grad_norm': 0.583834235594532, 'learning_rate': 7.936159637538379e-06, 'epoch': 0.58} 58%|█████▊ | 3845/6638 [3:40:37<2:33:22, 3.29s/it] 58%|█████▊ | 3846/6638 [3:40:40<2:33:56, 3.31s/it] {'loss': 0.6652, 'grad_norm': 0.6070132637795885, 'learning_rate': 7.931385174663146e-06, 'epoch': 0.58} 58%|█████▊ | 3846/6638 [3:40:40<2:33:56, 3.31s/it] 58%|█████▊ | 3847/6638 [3:40:44<2:33:55, 3.31s/it] {'loss': 0.6508, 'grad_norm': 0.55382214765578, 'learning_rate': 7.92661120436865e-06, 'epoch': 0.58} 58%|█████▊ | 3847/6638 [3:40:44<2:33:55, 3.31s/it] 58%|█████▊ | 3848/6638 [3:40:47<2:33:26, 3.30s/it] {'loss': 0.6541, 'grad_norm': 0.7599819869591969, 'learning_rate': 7.921837727791673e-06, 'epoch': 0.58} 58%|█████▊ 
| 3848/6638 [3:40:47<2:33:26, 3.30s/it] 58%|█████▊ | 3849/6638 [3:40:50<2:32:08, 3.27s/it] {'loss': 0.6116, 'grad_norm': 0.5392317241417762, 'learning_rate': 7.917064746068882e-06, 'epoch': 0.58} 58%|█████▊ | 3849/6638 [3:40:50<2:32:08, 3.27s/it]1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 58%|█████▊ | 3850/6638 [3:40:53<2:31:55, 3.27s/it]7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... {'loss': 0.6319, 'grad_norm': 0.5769418990652698, 'learning_rate': 7.912292260336823e-06, 'epoch': 0.58} 58%|█████▊ | 3850/6638 [3:40:53<2:31:55, 3.27s/it] 58%|█████▊ | 3851/6638 [3:40:57<2:32:13, 3.28s/it] {'loss': 0.6239, 'grad_norm': 0.5647730305221652, 'learning_rate': 7.90752027173193e-06, 'epoch': 0.58} 58%|█████▊ | 3851/6638 [3:40:57<2:32:13, 3.28s/it] 58%|█████▊ | 3852/6638 [3:41:00<2:32:18, 3.28s/it] {'loss': 0.6388, 'grad_norm': 0.5639392429440219, 'learning_rate': 7.902748781390509e-06, 'epoch': 0.58} 58%|█████▊ | 3852/6638 [3:41:00<2:32:18, 3.28s/it] 58%|█████▊ | 3853/6638 [3:41:04<2:37:11, 3.39s/it] {'loss': 0.6766, 'grad_norm': 0.5877943736136881, 'learning_rate': 7.897977790448753e-06, 'epoch': 0.58} 58%|█████▊ | 3853/6638 [3:41:04<2:37:11, 3.39s/it] 58%|█████▊ | 3854/6638 [3:41:07<2:35:04, 3.34s/it] {'loss': 0.6406, 'grad_norm': 0.6956474526386301, 'learning_rate': 7.89320730004274e-06, 'epoch': 0.58} 58%|█████▊ | 3854/6638 [3:41:07<2:35:04, 3.34s/it] 58%|█████▊ | 3855/6638 [3:41:10<2:34:52, 3.34s/it] {'loss': 0.6546, 'grad_norm': 0.5429517703900083, 'learning_rate': 7.888437311308415e-06, 'epoch': 0.58} 58%|█████▊ | 3855/6638 [3:41:10<2:34:52, 3.34s/it] 58%|█████▊ | 3856/6638 [3:41:13<2:33:10, 3.30s/it] {'loss': 0.6028, 'grad_norm': 0.5631393887063232, 'learning_rate': 
7.883667825381623e-06, 'epoch': 0.58} 58%|█████▊ | 3856/6638 [3:41:13<2:33:10, 3.30s/it] 58%|█████▊ | 3857/6638 [3:41:17<2:32:49, 3.30s/it] {'loss': 0.6887, 'grad_norm': 0.644794938122964, 'learning_rate': 7.878898843398073e-06, 'epoch': 0.58} 58%|█████▊ | 3857/6638 [3:41:17<2:32:49, 3.30s/it] 58%|█████▊ | 3858/6638 [3:41:20<2:33:14, 3.31s/it] {'loss': 0.6967, 'grad_norm': 0.6879095652236981, 'learning_rate': 7.874130366493359e-06, 'epoch': 0.58} 58%|█████▊ | 3858/6638 [3:41:20<2:33:14, 3.31s/it] 58%|█████▊ | 3859/6638 [3:41:23<2:32:31, 3.29s/it] {'loss': 0.685, 'grad_norm': 0.6425319546293986, 'learning_rate': 7.869362395802959e-06, 'epoch': 0.58} 58%|█████▊ | 3859/6638 [3:41:23<2:32:31, 3.29s/it] 58%|█████▊ | 3860/6638 [3:41:27<2:33:47, 3.32s/it] {'loss': 0.6639, 'grad_norm': 0.5641516639005049, 'learning_rate': 7.864594932462227e-06, 'epoch': 0.58} 58%|█████▊ | 3860/6638 [3:41:27<2:33:47, 3.32s/it] 58%|█████▊ | 3861/6638 [3:41:30<2:32:54, 3.30s/it] {'loss': 0.6434, 'grad_norm': 0.6780005299403008, 'learning_rate': 7.859827977606394e-06, 'epoch': 0.58} 58%|█████▊ | 3861/6638 [3:41:30<2:32:54, 3.30s/it] 58%|█████▊ | 3862/6638 [3:41:33<2:35:06, 3.35s/it] {'loss': 0.6601, 'grad_norm': 0.6075397352196308, 'learning_rate': 7.855061532370575e-06, 'epoch': 0.58} 58%|█████▊ | 3862/6638 [3:41:33<2:35:06, 3.35s/it] 58%|█████▊ | 3863/6638 [3:41:37<2:34:07, 3.33s/it] {'loss': 0.6221, 'grad_norm': 0.5705764195359126, 'learning_rate': 7.85029559788976e-06, 'epoch': 0.58} 58%|█████▊ | 3863/6638 [3:41:37<2:34:07, 3.33s/it] 58%|█████▊ | 3864/6638 [3:41:40<2:32:35, 3.30s/it] {'loss': 0.7246, 'grad_norm': 0.6699312471249709, 'learning_rate': 7.845530175298818e-06, 'epoch': 0.58} 58%|█████▊ | 3864/6638 [3:41:40<2:32:35, 3.30s/it] 58%|█████▊ | 3865/6638 [3:41:43<2:31:33, 3.28s/it] {'loss': 0.6794, 'grad_norm': 0.610800664137685, 'learning_rate': 7.840765265732495e-06, 'epoch': 0.58} 58%|█████▊ | 3865/6638 [3:41:43<2:31:33, 3.28s/it] 58%|█████▊ | 3866/6638 [3:41:46<2:30:28, 3.26s/it] 
{'loss': 0.6355, 'grad_norm': 0.5985905212074952, 'learning_rate': 7.83600087032542e-06, 'epoch': 0.58} 58%|█████▊ | 3866/6638 [3:41:46<2:30:28, 3.26s/it] 58%|█████▊ | 3867/6638 [3:41:50<2:31:56, 3.29s/it] {'loss': 0.6476, 'grad_norm': 0.5730356750525049, 'learning_rate': 7.831236990212097e-06, 'epoch': 0.58} 58%|█████▊ | 3867/6638 [3:41:50<2:31:56, 3.29s/it] 58%|█████▊ | 3868/6638 [3:41:53<2:32:06, 3.29s/it] {'loss': 0.6628, 'grad_norm': 0.6078234706774344, 'learning_rate': 7.826473626526895e-06, 'epoch': 0.58} 58%|█████▊ | 3868/6638 [3:41:53<2:32:06, 3.29s/it] 58%|█████▊ | 3869/6638 [3:41:56<2:33:44, 3.33s/it] {'loss': 0.6684, 'grad_norm': 0.571855991508189, 'learning_rate': 7.821710780404086e-06, 'epoch': 0.58} 58%|█████▊ | 3869/6638 [3:41:56<2:33:44, 3.33s/it] 58%|█████▊ | 3870/6638 [3:42:00<2:35:07, 3.36s/it] {'loss': 0.6504, 'grad_norm': 0.6975826117039501, 'learning_rate': 7.816948452977792e-06, 'epoch': 0.58} 58%|█████▊ | 3870/6638 [3:42:00<2:35:07, 3.36s/it] 58%|█████▊ | 3871/6638 [3:42:03<2:34:33, 3.35s/it] {'loss': 0.6292, 'grad_norm': 0.5830537849568173, 'learning_rate': 7.81218664538203e-06, 'epoch': 0.58} 58%|█████▊ | 3871/6638 [3:42:03<2:34:33, 3.35s/it] 58%|█████▊ | 3872/6638 [3:42:07<2:34:47, 3.36s/it] {'loss': 0.6417, 'grad_norm': 0.5280617961505175, 'learning_rate': 7.807425358750687e-06, 'epoch': 0.58} 58%|█████▊ | 3872/6638 [3:42:07<2:34:47, 3.36s/it] 58%|█████▊ | 3873/6638 [3:42:10<2:35:44, 3.38s/it] {'loss': 0.6832, 'grad_norm': 0.5873078090230428, 'learning_rate': 7.80266459421752e-06, 'epoch': 0.58} 58%|█████▊ | 3873/6638 [3:42:10<2:35:44, 3.38s/it] 58%|█████▊ | 3874/6638 [3:42:13<2:35:21, 3.37s/it] {'loss': 0.6519, 'grad_norm': 0.5744483191811454, 'learning_rate': 7.797904352916174e-06, 'epoch': 0.58} 58%|█████▊ | 3874/6638 [3:42:13<2:35:21, 3.37s/it] 58%|█████▊ | 3875/6638 [3:42:17<2:33:00, 3.32s/it] {'loss': 0.6195, 'grad_norm': 0.5875727257131952, 'learning_rate': 7.79314463598016e-06, 'epoch': 0.58} 58%|█████▊ | 3875/6638 
[training log, steps 3876–3899/6638 (58–59%); duplicate tqdm redraws collapsed]
58%|█████▊ | 3876/6638 [3:42:20<2:32:02, 3.30s/it] {'loss': 0.6079, 'grad_norm': 0.5157232626367692, 'learning_rate': 7.78838544454286e-06, 'epoch': 0.58}
...
59%|█████▊ | 3899/6638 [3:43:36<2:33:43, 3.37s/it] {'loss': 0.6988, 'grad_norm': 0.6083982786300928, 'learning_rate': 7.67907168403764e-06, 'epoch': 0.59}
(loss 0.61–0.70, grad_norm 0.52–0.70, learning_rate decaying 7.788e-06 → 7.679e-06, steady ~3.3 s/it)
AutoResumeHook (ranks 0–7): Checking whether to suspend...
59%|█████▉ | 3900/6638 [3:43:39<2:33:28, 3.36s/it]
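The `AutoResumeHook: Checking whether to suspend...` messages printed by every rank above come from a hook that periodically checks whether the job should checkpoint and stop before the scheduler's wall-clock limit. A minimal sketch of that idea (hypothetical class and thresholds, not the NVILA implementation; the real hook in this run also queries an AutoResume SDK, which the later relaunch reports as missing):

```python
import time

class AutoResumeHook:
    """Sketch: request a checkpoint once the remaining wall-clock budget
    drops below a safety margin, so the save completes before the
    scheduler kills the job."""

    def __init__(self, time_limit_s: float, safety_margin_s: float = 600.0):
        self.start = time.monotonic()
        self.time_limit_s = time_limit_s
        self.safety_margin_s = safety_margin_s

    def should_suspend(self) -> bool:
        elapsed = time.monotonic() - self.start
        # True once less than safety_margin_s of the budget remains.
        return elapsed >= self.time_limit_s - self.safety_margin_s
```

In a multi-rank job the decision has to be agreed across ranks (e.g. computed on rank 0 and broadcast) so all workers checkpoint together, which is why the check is printed by every rank at the same step.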
{'loss': 0.6764, 'grad_norm': 0.6123800226399995, 'learning_rate': 7.674325444256899e-06, 'epoch': 0.59} 59%|█████▉ | 3900/6638 [3:43:39<2:33:28, 3.36s/it]
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3900/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3900/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3900/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[training log, steps 3901–3919/6638 (59%); duplicate tqdm redraws collapsed]
59%|█████▉ | 3901/6638 [3:43:57<5:47:05, 7.61s/it] {'loss': 0.6749, 'grad_norm': 0.5877318462041119, 'learning_rate': 7.669579758268222e-06, 'epoch': 0.59}
(step time spikes to 7.61 s/it immediately after the checkpoint write, then settles back to ~3.3 s/it by step 3910)
...
59%|█████▉ | 3919/6638 [3:44:56<2:30:17, 3.32s/it] {'loss': 0.6425, 'grad_norm': 0.56244749054037, 'learning_rate': 7.58425339652633e-06, 'epoch': 0.59}
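The `saving llm/vision_tower/mm_projector to .../tmp-checkpoint-3900/...` lines above suggest a write-to-temp-then-rename checkpoint layout with one subdirectory per model component. A sketch of that pattern (assumed from the directory names, not the actual trainer code; `weights.bin` is a placeholder file name):

```python
import os

def save_checkpoint(output_dir: str, step: int, components: dict[str, bytes]) -> str:
    """Write each component under tmp-checkpoint-<step>/, then rename the
    whole directory in one step, so an interrupted save never leaves a
    plausible-looking but half-written checkpoint-<step> behind."""
    tmp = os.path.join(output_dir, f"tmp-checkpoint-{step}")
    final = os.path.join(output_dir, f"checkpoint-{step}")
    for name, blob in components.items():  # e.g. llm, vision_tower, mm_projector
        comp_dir = os.path.join(tmp, name)
        os.makedirs(comp_dir, exist_ok=True)
        with open(os.path.join(comp_dir, "weights.bin"), "wb") as f:
            f.write(blob)
    os.replace(tmp, final)  # single rename: readers see old state or new, never partial
    return final
```

The rename-at-the-end design matters here precisely because this job is killed by a timeout mid-run: a resume can trust any `checkpoint-*` directory it finds.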
[training log, steps 3920–3994/6638 (59–60%); duplicate tqdm redraws collapsed]
59%|█████▉ | 3920/6638 [3:44:59<2:30:15, 3.32s/it] {'loss': 0.6535, 'grad_norm': 0.5878402857206421, 'learning_rate': 7.579518447144139e-06, 'epoch': 0.59}
AutoResumeHook (ranks 0–7): Checking whether to suspend...   (at step 3950)
...
60%|██████ | 3994/6638 [3:49:05<2:27:58, 3.36s/it] {'loss': 0.6079, 'grad_norm': 0.5788646660727541, 'learning_rate': 7.230807565792151e-06, 'epoch': 0.6}
(loss 0.60–0.74, grad_norm 0.52–0.89, learning_rate decaying 7.580e-06 → 7.231e-06, steady ~3.3 s/it)
[2025-05-27 21:59:18] Ranks 0–7: Timeout, start to save checkpoint....
60%|██████ | 3995/6638 [3:49:08<2:26:29, 3.33s/it] {'loss': 0.6119, 'grad_norm': 0.6092605628351098, 'learning_rate': 7.226118962776683e-06, 'epoch': 0.6}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3995/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3995/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-3995/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
{'train_runtime': 13764.3348, 'train_samples_per_second': 123.522, 'train_steps_per_second': 0.482, 'train_loss': 0.6748497788688268, 'epoch': 0.6}
60%|██████ | 3995/6638 [3:49:22<2:31:45, 3.45s/it]
wandb: 🚀 View run nvila_2b_path_mask at: https://wandb.ai/memmelma/VILA/runs/us20txks
wandb: Find logs at: ../../../../../../../../fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/wandb/run-20250527_181007-us20txks/logs
E0527 21:59:48.308000 23456244102976 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 124) local_rank: 0 (pid: 29851) of binary: /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/bin/python3.10
Traceback (most recent call last):
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llava/train/train_mem.py FAILED
------------------------------------------------------------
Failures (ranks 1–7, identical apart from rank/pid):
  time       : 2025-05-27_21:59:48
  host       : batch-block5-00222.cm.cluster
  exitcode   : 124 (pids 29852–29858)
  error_file :
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2025-05-27_21:59:48
  host       : batch-block5-00222.cm.cluster
  rank       : 0 (local_rank: 0)
  exitcode   : 124 (pid: 29851)
  error_file :
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
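Every rank above exits with code 124, the conventional exit status of coreutils `timeout(1)` when it kills a command at its limit, which is consistent with the timeout-triggered checkpoint save just before the crash. A hedged sketch of how a launch wrapper can detect that status and decide to requeue (`run_with_timeout` is illustrative, not part of this codebase):

```python
import subprocess

def run_with_timeout(cmd: list[str], limit_s: float) -> str:
    """Run `cmd` under coreutils `timeout`, which kills it after `limit_s`
    seconds and exits with status 124 -- the same code every rank reports
    in the failure table above."""
    proc = subprocess.run(["timeout", str(limit_s), *cmd])
    if proc.returncode == 124:
        return "timed_out"  # caller can requeue and resume from the last checkpoint
    return "finished" if proc.returncode == 0 else "failed"
```

Distinguishing "timed out, checkpoint exists" from "genuinely failed" is what lets the next `srun` submission (job 8269229 below) restart cleanly instead of treating the exit as a fatal error.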
============================================================
srun: error: batch-block5-00222: task 0: Exited with exit code 1
srun: Terminating StepId=8262265.0
srun: job 8269229 queued and waiting for resources
srun: job 8269229 has been allocated resources
wandb: Currently logged in as: memmelma to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
MASTER_ADDR=batch-block7-00363
JobID: 8269229 | Full list: batch-block7-00363
NETWORK=Efficient-Large-Model/NVILA-Lite-2B
W0527 22:01:17.202000 23456244102976 torch/distributed/run.py:757] *****************************************
W0527 22:01:17.202000 23456244102976 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0527 22:01:17.202000 23456244102976 torch/distributed/run.py:757] *****************************************
2025-05-27 22:01:30.625 | INFO | llava.data.builder:register_datasets:39 - Registering datasets from environment: 'default'.   (repeated once per rank, 8×)
2025-05-27 22:01:30.627 | INFO | llava.data.builder:register_datasets:44 - Registering datasets from: '/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/data/registry/datasets/default.yaml'.   (8×)
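The dataset-registration INFO lines repeat eight times because each of the eight ranks logs independently. A common remedy is to guard log calls on the global rank, sketched below (hypothetical helper, not the llava logger; torchrun exports `RANK` into every worker's environment):

```python
import logging
import os

def get_rank() -> int:
    # torchrun sets RANK per worker; default to 0 for single-process runs.
    return int(os.environ.get("RANK", "0"))

def log_rank0(logger: logging.Logger, msg: str) -> bool:
    """Emit `msg` only on the rank-0 process; return whether it was logged."""
    if get_rank() == 0:
        logger.info(msg)
        return True
    return False
```

With this guard, messages that are identical on every rank (dataset registration, config echoes) appear once, while genuinely per-rank events can still use the unguarded logger.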
[2025-05-27 22:01:30,753] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Did not find AutoResume SDK!
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/transformers/training_args.py:1559: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
[2025-05-27 22:01:41,467] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-27 22:01:41,467] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-27 22:01:41,467] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2025-05-27 22:01:47,001] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 1.78B parameters
[2025-05-27 22:01:51,142] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.19B parameters
[2025-05-27 22:01:52,509] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.23B parameters
[dist-0-of-8] LlavaLlamaModel(
  (llm): Qwen2ForCausalLM(
    (model): Qwen2Model(
      (embed_tokens): Embedding(151651, 1536)
      (layers): ModuleList(
        (0-27): 28 x Qwen2DecoderLayer(
          (self_attn): Qwen2FlashAttention2(
            (q_proj): Linear(in_features=1536, out_features=1536, bias=True)
            (k_proj): Linear(in_features=1536, out_features=256, bias=True)
            (v_proj): Linear(in_features=1536, out_features=256, bias=True)
            (o_proj): Linear(in_features=1536, out_features=1536, bias=False)
            (rotary_emb): Qwen2RotaryEmbedding()
          )
          (mlp): Qwen2MLP(
            (gate_proj): Linear(in_features=1536, out_features=8960, bias=False)
            (up_proj): Linear(in_features=1536, out_features=8960, bias=False)
            (down_proj): Linear(in_features=8960, out_features=1536, bias=False)
            (act_fn): SiLU()
          )
          (input_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
          (post_attention_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
        )
      )
      (norm): Qwen2RMSNorm((0,), eps=1e-06)
      (rotary_emb): Qwen2RotaryEmbedding()
    )
    (lm_head): Linear(in_features=1536, out_features=151651, bias=False)
  )
  (vision_tower): SiglipVisionTower(
    (vision_tower): SiglipVisionModel(
      (vision_model): SiglipVisionTransformer(
        (embeddings): SiglipVisionEmbeddings(
          (patch_embedding): Conv2d(3, 1152, kernel_size=(14, 14), stride=(14, 14), padding=valid)
          (position_embedding): Embedding(1024, 1152)
        )
        (encoder): SiglipEncoder(
          (layers): ModuleList(
            (0-26): 27 x SiglipEncoderLayer(
              (self_attn): SiglipFlashAttention2(
                (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
                (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
                (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
                (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
              )
              (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
              (mlp): SiglipMLP(
                (activation_fn): PytorchGELUTanh()
                (fc1): Linear(in_features=1152, out_features=4304, bias=True)
                (fc2): Linear(in_features=4304, out_features=1152, bias=True)
              )
              (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
            )
          )
        )
        (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
      )
    )
  )
  (mm_projector): MultimodalProjector(
    (layers): Sequential(
      (0): DownSample3x3BlockFix()
      (1): LayerNorm((10368,), eps=1e-05, elementwise_affine=True)
      (2): Linear(in_features=10368, out_features=3456, bias=True)
      (3): GELU(approximate='none')
      (4): LayerNorm((3456,), eps=1e-05, elementwise_affine=True)
      (5): Linear(in_features=3456, out_features=1536, bias=True)
      (6): GELU(approximate='none')
      (7): Linear(in_features=1536, out_features=1536, bias=True)
    )
  )
)
[dist-0-of-8] Tunable parameters:
[dist-0-of-8] language model True
[dist-0-of-8] vision tower True
[dist-0-of-8] mm projector True
trainable params: 2,000,137,328 || all params: 2,000,137,328 || trainable%: 100.0000
2025-05-27 22:01:52.691 | WARNING | llava.data.builder:build_dataset:91 - Using mixture 'robopoint_1432k+austin_buds_dataset_converted_externally_to_rlds_primary_path_mask+austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask+austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask+austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask+austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask+austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask+austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask+austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask+austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask+bc_z_primary_path_mask+bc_z_secondary_path_mask+bc_z_tertiary_path_mask+berkeley_autolab_ur5_primary_path_mask+berkeley_autolab_ur5_secondary_path_mask+berkeley_autolab_ur5_tertiary_path_mask+berkeley_fanuc_manipulation_primary_path_mask+berkeley_fanuc_manipulation_secondary_path_mask+berkeley_fanuc_manipulation_tertiary_path_mask+bridge_v2_primary_path_mask+bridge_v2_secondary_path_mask+bridge_v2_tertiary_path_mask+cmu_stretch_primary_path_mask+cmu_stretch_secondary_path_mask+cmu_stretch_tertiary_path_mask+dlr_edan_shared_control_converted_externally_to_rlds_primary_path_mask+dlr_edan_shared_control_converted_externally_to_rlds_secondary_path_mask+dlr_edan_shared_control_converted_externally_to_rlds_tertiary_path_mask+droid_primary_path_mask+droid_secondary_path_mask+droid_tertiary_path_mask+fmb_primary_path_mask+fmb_secondary_path_mask+fmb_tertiary_path_mask+fractal20220817_data_primary_path_mask+fractal20220817_data_secondary_path_mask+fractal20220817_data_tertiary_path_mask+iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask+iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask+iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask+jaco_play_primary_path_mask+jaco_play_secondary_path_mask+jaco_play_tertiary_path_mask+nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask+nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask+nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask+stanford_hydra_dataset_converted_externally_to_rlds_primary_path_mask+stanford_hydra_dataset_converted_externally_to_rlds_secondary_path_mask+stanford_hydra_dataset_converted_externally_to_rlds_tertiary_path_mask+taco_play_primary_path_mask+taco_play_secondary_path_mask+taco_play_tertiary_path_mask+toto_primary_path_mask+toto_secondary_path_mask+toto_tertiary_path_mask+ucsd_kitchen_dataset_converted_externally_to_rlds_primary_path_mask+ucsd_kitchen_dataset_converted_externally_to_rlds_secondary_path_mask+ucsd_kitchen_dataset_converted_externally_to_rlds_tertiary_path_mask+utaustin_mutex_primary_path_mask+utaustin_mutex_secondary_path_mask+utaustin_mutex_tertiary_path_mask+viola_primary_path_mask+viola_secondary_path_mask+viola_tertiary_path_mask'.
2025-05-27 22:01:52.691 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:01:52.692 | WARNING | llava.data.builder:build_dataset:91 - Using mixture 'robopoint_1432k+austin_buds_dataset_converted_externally_to_rlds_primary_path_mask+austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask+austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask+austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask+austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask+austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask+austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask+austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask+austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask+bc_z_primary_path_mask+bc_z_secondary_path_mask+bc_z_tertiary_path_mask+berkeley_autolab_ur5_primary_path_mask+berkeley_autolab_ur5_secondary_path_mask+berkeley_autolab_ur5_tertiary_path_mask+berkeley_fanuc_manipulation_primary_path_mask+berkeley_fanuc_manipulation_secondary_path_mask+berkeley_fanuc_manipulation_tertiary_path_mask+bridge_v2_primary_path_mask+bridge_v2_secondary_path_mask+bridge_v2_tertiary_path_mask+cmu_stretch_primary_path_mask+cmu_stretch_secondary_path_mask+cmu_stretch_tertiary_path_mask+dlr_edan_shared_control_converted_externally_to_rlds_primary_path_mask+dlr_edan_shared_control_converted_externally_to_rlds_secondary_path_mask+dlr_edan_shared_control_converted_externally_to_rlds_tertiary_path_mask+droid_primary_path_mask+droid_secondary_path_mask+droid_tertiary_path_mask+fmb_primary_path_mask+fmb_secondary_path_mask+fmb_tertiary_path_mask+fractal20220817_data_primary_path_mask+fractal20220817_data_secondary_path_mask+fractal20220817_data_tertiary_path_mask+iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask+iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask+iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask+jaco_play_primary_path_mask+jaco_play_secondary_path_mask+j
aco_play_tertiary_path_mask+nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask+nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask+nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask+stanford_hydra_dataset_converted_externally_to_rlds_primary_path_mask+stanford_hydra_dataset_converted_externally_to_rlds_secondary_path_mask+stanford_hydra_dataset_converted_externally_to_rlds_tertiary_path_mask+taco_play_primary_path_mask+taco_play_secondary_path_mask+taco_play_tertiary_path_mask+toto_primary_path_mask+toto_secondary_path_mask+toto_tertiary_path_mask+ucsd_kitchen_dataset_converted_externally_to_rlds_primary_path_mask+ucsd_kitchen_dataset_converted_externally_to_rlds_secondary_path_mask+ucsd_kitchen_dataset_converted_externally_to_rlds_tertiary_path_mask+utaustin_mutex_primary_path_mask+utaustin_mutex_secondary_path_mask+utaustin_mutex_tertiary_path_mask+viola_primary_path_mask+viola_secondary_path_mask+viola_tertiary_path_mask'. 2025-05-27 22:01:52.692 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:52.806 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.806 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
2025-05-27 22:01:52.806 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.806 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.806 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.807 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.807 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:52.807 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. 
Please consider migrating it. 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:52.808 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_buds_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. 
Please consider migrating it. 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:52.810 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.024 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. 
Please consider migrating it. 2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.025 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.026 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sailor_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. 
Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.029 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. 
Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.030 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'austin_sirius_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.032 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_primary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.397 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.399 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.399 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.400 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.401 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.401 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.401 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.402 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.403 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.403 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.403 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.403 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.403 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.404 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy modeFormatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_fanuc_manipulation_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_fanuc_manipulation_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_fanuc_manipulation_primary_path_mask' is from the legacy registry. Please consider migrating it. 
2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_fanuc_manipulation_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.405 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bc_z_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.406 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.407 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.407 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_autolab_ur5_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.407 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_fanuc_manipulation_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.409 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'berkeley_fanuc_manipulation_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:53.410 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bridge_v2_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.023 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bridge_v2_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.026 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'bridge_v2_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.028 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'cmu_stretch_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.029 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'cmu_stretch_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.031 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'cmu_stretch_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.033 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'dlr_edan_shared_control_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.034 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'dlr_edan_shared_control_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.037 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'dlr_edan_shared_control_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.038 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'droid_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:54.784 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'droid_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:55.310 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'droid_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:55.313 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fmb_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:55.314 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fmb_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:55.316 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fmb_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:55.318 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.230 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.245 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.247 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.248 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.250 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.252 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.253 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
2025-05-27 22:01:56.255 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
Formatting inputs...Skip in lazy mode
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.256 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.256 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.257 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:56.257 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.257 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.258 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.258 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.258 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. 
Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.258 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.259 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.259 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.259 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:56.259 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.260 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.261 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.261 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.261 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.261 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.262 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.276 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.277 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.278 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.278 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.279 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.280 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.280 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.281 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.281 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.282 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.282 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.283 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.298 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.299 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.299 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.300 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.300 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.301 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.302 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.302 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.303 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.303 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.304 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.305 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.306 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.306 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.307 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.308 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.308 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.309 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.309 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.310 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.311 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.311 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.312 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.312 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.328 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.329 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.330 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 2025-05-27 22:01:56.330 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.330 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.331 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'fractal20220817_data_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.331 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.331 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.332 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.332 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.332 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.332 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'iamlab_cmu_pickup_insert_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.333 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.333 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.333 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.334 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.334 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.334 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'jaco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.334 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.335 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.335 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.335 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.336 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'nyu_franka_play_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:01:56.336 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'robopoint_1432k' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.012 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.015 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.017 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.018 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.020 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.022 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.024 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.026 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.027 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.029 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. 
Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.033 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.035 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.037 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.038 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.040 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.041 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.043 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.044 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode [2025-05-27 22:02:09] Rank 4: Timer for terminate callback has been set. 
Total limit: 240min Pre terminate time: 10min elapsed_time: 27.65906023979187s /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/train/llava_trainer.py:591: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `LLaVATrainer.__init__`. Use `processing_class` instead. super().__init__(*args, **kwargs) Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.050 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.051 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.052 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.052 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.053 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.053 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.054 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_primary_path_mask' is from the legacy registry. Please consider migrating it. 
Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.055 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.055 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.056 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.056 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.057 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.057 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_primary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.058 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_secondary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.058 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_tertiary_path_mask' is from the legacy registry. Please consider migrating it. Formatting inputs...Skip in lazy mode 2025-05-27 22:02:09.059 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_primary_path_mask' is from the legacy registry. 
Formatting inputs...Skip in lazy mode
2025-05-27 22:02:09.060 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.060 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
[2025-05-27 22:02:09] Rank 2: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 27.673946380615234s
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/train/llava_trainer.py:591: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `LLaVATrainer.__init__`. Use `processing_class` instead.
  super().__init__(*args, **kwargs)
length of dataloader: 13276 1700195
[GPU memory] before trainer 0.955507755279541
2025-05-27 22:02:09.453 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.454 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.455 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'stanford_hydra_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.456 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.457 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.457 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'taco_play_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.458 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.459 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.459 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'toto_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.460 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.461 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.461 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'ucsd_kitchen_dataset_converted_externally_to_rlds_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.462 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.462 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.463 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'utaustin_mutex_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.463 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_primary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.464 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_secondary_path_mask' is from the legacy registry. Please consider migrating it.
2025-05-27 22:02:09.464 | WARNING | llava.data.builder:build_dataset:127 - Dataset 'viola_tertiary_path_mask' is from the legacy registry. Please consider migrating it.
[2025-05-27 22:02:09] Rank 0: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 28.078200340270996s
[2025-05-27 22:02:09] Rank 6: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 28.15425682067871s
[2025-05-27 22:02:09] Rank 3: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 28.32365584373474s
[2025-05-27 22:02:09] Rank 1: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 28.324324131011963s
[2025-05-27 22:02:09] Rank 5: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 28.358332872390747s
[2025-05-27 22:02:09] Rank 7: Timer for terminate callback has been set. Total limit: 240min Pre terminate time: 10min
elapsed_time: 28.526604890823364s
[rank4]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank2]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank0]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank6]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank1]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank3]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank5]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[rank7]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
Parameter Offload: Total persistent parameters: 578672 in 421 params
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
wandb: Tracking run with wandb version 0.19.9
wandb: Run data is saved locally in /lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/wandb/run-20250527_220238-i92sb1t7
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run nvila_2b_path_mask
wandb: ⭐️ View project at https://wandb.ai/memmelma/VILA
wandb: 🚀 View run at https://wandb.ai/memmelma/VILA/runs/i92sb1t7
  0%|          | 0/6638 [00:00<?, ?it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (… > 4096). Running this sequence through the model will result in indexing errors
 65%|██████▌  | 4344/6638 [20:23<2:06:16, 3.30s/it] {'loss': 0.6388, 'grad_norm': 0.586431694171502, 'learning_rate': 5.637938338444325e-06, 'epoch': 0.65}
 65%|██████▌  | 4345/6638 [20:26<2:05:42, 3.29s/it] {'loss': 0.64, 'grad_norm': 0.5745418824229569, 'learning_rate': 5.633547817347277e-06, 'epoch': 0.65}
 65%|██████▌  | 4346/6638 [20:30<2:05:00, 3.27s/it] {'loss': 0.6221, 'grad_norm': 0.5729844075976919, 'learning_rate': 5.6291583359944095e-06, 'epoch': 0.65}
 65%|██████▌  | 4347/6638 [20:33<2:04:44, 3.27s/it] {'loss': 0.6356, 'grad_norm': 0.5390268749551785, 'learning_rate': 5.6247698954309616e-06, 'epoch': 0.65}
 66%|██████▌  | 4348/6638 [20:36<2:04:57, 3.27s/it] {'loss': 0.656, 'grad_norm': 0.5674880654069336, 'learning_rate': 5.620382496701897e-06, 'epoch': 0.66}
 66%|██████▌  | 4349/6638 [20:40<2:05:12, 3.28s/it] {'loss': 0.6369, 'grad_norm': 0.5900488831135604, 'learning_rate': 5.6159961408519695e-06, 'epoch': 0.66}
6 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
0 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
 66%|██████▌  | 4350/6638 [20:43<2:04:53, 3.28s/it] {'loss': 0.6216, 'grad_norm': 0.6246229258288113, 'learning_rate': 5.61161082892565e-06, 'epoch': 0.66}
 66%|██████▌  | 4351/6638 [20:46<2:04:05, 3.26s/it] {'loss': 0.6989, 'grad_norm': 0.6624571630031104, 'learning_rate': 5.607226561967171e-06, 'epoch': 0.66}
 66%|██████▌  | 4352/6638 [20:49<2:05:19, 3.29s/it] {'loss': 0.633, 'grad_norm': 0.6363931168777079, 'learning_rate': 5.602843341020525e-06, 'epoch': 0.66}
 66%|██████▌  | 4353/6638 [20:53<2:04:46, 3.28s/it] {'loss': 0.6079, 'grad_norm': 0.5988018706368636, 'learning_rate': 5.598461167129445e-06, 'epoch': 0.66}
 66%|██████▌  | 4354/6638 [20:56<2:04:45, 3.28s/it] {'loss': 0.6574, 'grad_norm': 0.6392157499140725, 'learning_rate': 5.594080041337426e-06, 'epoch': 0.66}
 66%|██████▌  | 4355/6638 [20:59<2:04:49, 3.28s/it] {'loss': 0.6302, 'grad_norm': 0.5773451538244594, 'learning_rate': 5.589699964687698e-06, 'epoch': 0.66}
 66%|██████▌  | 4356/6638 [21:02<2:04:49, 3.28s/it] {'loss': 0.6711, 'grad_norm': 0.6468617245930083, 'learning_rate': 5.585320938223253e-06, 'epoch': 0.66}
 66%|██████▌  | 4357/6638 [21:06<2:05:29, 3.30s/it] {'loss': 0.6336, 'grad_norm': 0.5902805756456831, 'learning_rate': 5.580942962986833e-06, 'epoch': 0.66}
 66%|██████▌  | 4358/6638 [21:09<2:04:39, 3.28s/it] {'loss': 0.6178, 'grad_norm': 0.5782754165275018, 'learning_rate': 5.576566040020917e-06, 'epoch': 0.66}
 66%|██████▌  | 4359/6638 [21:12<2:05:48, 3.31s/it] {'loss': 0.6294, 'grad_norm': 0.5732083362159325, 'learning_rate': 5.57219017036775e-06, 'epoch': 0.66}
/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096).
  warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).")
 66%|██████▌  | 4360/6638 [21:16<2:09:43, 3.42s/it] {'loss': 0.6639, 'grad_norm': 0.565531868872224, 'learning_rate': 5.567815355069319e-06, 'epoch': 0.66}
 66%|██████▌  | 4361/6638 [21:19<2:06:45, 3.34s/it] {'loss': 0.6376, 'grad_norm': 0.5914672828126165, 'learning_rate': 5.5634415951673536e-06, 'epoch': 0.66}
 66%|██████▌  | 4362/6638 [21:23<2:05:40, 3.31s/it] {'loss': 0.6399, 'grad_norm': 0.575243978095837, 'learning_rate': 5.5590688917033454e-06, 'epoch': 0.66}
 66%|██████▌  | 4363/6638 [21:26<2:06:13, 3.33s/it] {'loss': 0.6666, 'grad_norm': 0.630613430957318, 'learning_rate': 5.554697245718519e-06, 'epoch': 0.66}
 66%|██████▌  | 4364/6638 [21:29<2:04:38, 3.29s/it] {'loss': 0.607, 'grad_norm': 0.5926500114290508, 'learning_rate': 5.550326658253861e-06, 'epoch': 0.66}
 66%|██████▌  | 4365/6638 [21:32<2:03:50, 3.27s/it] {'loss': 0.6428, 'grad_norm': 0.5957275412669756, 'learning_rate': 5.545957130350102e-06, 'epoch': 0.66}
 66%|██████▌  | 4366/6638 [21:35<2:02:56, 3.25s/it] {'loss': 0.6358, 'grad_norm': 0.5934797380232196, 'learning_rate': 5.541588663047711e-06, 'epoch': 0.66}
 66%|██████▌  | 4367/6638 [21:39<2:02:50, 3.25s/it] {'loss': 0.6359, 'grad_norm': 0.5631959408660062, 'learning_rate': 5.5372212573869175e-06, 'epoch': 0.66}
 66%|██████▌  | 4368/6638 [21:42<2:04:06, 3.28s/it] {'loss': 0.6493, 'grad_norm': 0.6356639381744377, 'learning_rate': 5.532854914407693e-06, 'epoch': 0.66}
 66%|██████▌  | 4369/6638 [21:45<2:03:31, 3.27s/it] {'loss': 0.6292, 'grad_norm': 0.5506967980828745, 'learning_rate': 5.5284896351497566e-06, 'epoch': 0.66}
 66%|██████▌  | 4370/6638 [21:49<2:03:00, 3.25s/it] {'loss': 0.6327, 'grad_norm': 0.533144762702268, 'learning_rate': 5.524125420652571e-06, 'epoch': 0.66}
 66%|██████▌  | 4371/6638 [21:52<2:04:35, 3.30s/it] {'loss': 0.6757, 'grad_norm': 0.6779911022828954, 'learning_rate': 5.51976227195534e-06, 'epoch': 0.66}
 66%|██████▌  | 4372/6638 [21:55<2:03:39, 3.27s/it] {'loss': 0.6453, 'grad_norm': 0.6296207380301424, 'learning_rate': 5.515400190097038e-06, 'epoch': 0.66}
 66%|██████▌  | 4373/6638 [21:59<2:04:28, 3.30s/it] {'loss': 0.6684, 'grad_norm': 0.583333366814411, 'learning_rate': 5.511039176116357e-06, 'epoch': 0.66}
 66%|██████▌  | 4374/6638 [22:02<2:04:53, 3.31s/it] {'loss': 0.6375, 'grad_norm': 0.537837746067476, 'learning_rate': 5.506679231051747e-06, 'epoch': 0.66}
 66%|██████▌  | 4375/6638 [22:05<2:04:16, 3.29s/it] {'loss': 0.6292,
'grad_norm': 0.5795975477840978, 'learning_rate': 5.502320355941404e-06, 'epoch': 0.66} 66%|██████▌ | 4375/6638 [22:05<2:04:16, 3.29s/it] 66%|██████▌ | 4376/6638 [22:08<2:03:18, 3.27s/it] {'loss': 0.6326, 'grad_norm': 0.584632017411363, 'learning_rate': 5.497962551823266e-06, 'epoch': 0.66} 66%|██████▌ | 4376/6638 [22:08<2:03:18, 3.27s/it] 66%|██████▌ | 4377/6638 [22:12<2:03:22, 3.27s/it] {'loss': 0.604, 'grad_norm': 0.5237303963379295, 'learning_rate': 5.493605819735026e-06, 'epoch': 0.66} 66%|██████▌ | 4377/6638 [22:12<2:03:22, 3.27s/it] 66%|██████▌ | 4378/6638 [22:15<2:03:58, 3.29s/it] {'loss': 0.6349, 'grad_norm': 0.567995773670271, 'learning_rate': 5.4892501607141036e-06, 'epoch': 0.66} 66%|██████▌ | 4378/6638 [22:15<2:03:58, 3.29s/it] 66%|██████▌ | 4379/6638 [22:18<2:02:26, 3.25s/it] {'loss': 0.6737, 'grad_norm': 0.6318647403855284, 'learning_rate': 5.4848955757976775e-06, 'epoch': 0.66} 66%|██████▌ | 4379/6638 [22:18<2:02:26, 3.25s/it] 66%|██████▌ | 4380/6638 [22:22<2:05:43, 3.34s/it] {'loss': 0.6333, 'grad_norm': 0.5437471185773303, 'learning_rate': 5.480542066022667e-06, 'epoch': 0.66} 66%|██████▌ | 4380/6638 [22:22<2:05:43, 3.34s/it] 66%|██████▌ | 4381/6638 [22:25<2:04:24, 3.31s/it] {'loss': 0.6352, 'grad_norm': 0.5923171939139149, 'learning_rate': 5.476189632425732e-06, 'epoch': 0.66} 66%|██████▌ | 4381/6638 [22:25<2:04:24, 3.31s/it] 66%|██████▌ | 4382/6638 [22:28<2:03:54, 3.30s/it] {'loss': 0.6147, 'grad_norm': 0.5413687824298313, 'learning_rate': 5.471838276043278e-06, 'epoch': 0.66} 66%|██████▌ | 4382/6638 [22:28<2:03:54, 3.30s/it] 66%|██████▌ | 4383/6638 [22:31<2:03:17, 3.28s/it] {'loss': 0.6344, 'grad_norm': 0.5405768351737796, 'learning_rate': 5.46748799791146e-06, 'epoch': 0.66} 66%|██████▌ | 4383/6638 [22:31<2:03:17, 3.28s/it] 66%|██████▌ | 4384/6638 [22:35<2:02:22, 3.26s/it] {'loss': 0.6078, 'grad_norm': 0.5551413816354016, 'learning_rate': 5.4631387990661635e-06, 'epoch': 0.66} 66%|██████▌ | 4384/6638 [22:35<2:02:22, 3.26s/it] 66%|██████▌ | 
4385/6638 [22:38<2:04:00, 3.30s/it] {'loss': 0.6741, 'grad_norm': 0.6331780495048941, 'learning_rate': 5.458790680543031e-06, 'epoch': 0.66} 66%|██████▌ | 4385/6638 [22:38<2:04:00, 3.30s/it] 66%|██████▌ | 4386/6638 [22:41<2:03:26, 3.29s/it] {'loss': 0.6048, 'grad_norm': 0.5457809330792106, 'learning_rate': 5.454443643377435e-06, 'epoch': 0.66} 66%|██████▌ | 4386/6638 [22:41<2:03:26, 3.29s/it] 66%|██████▌ | 4387/6638 [22:45<2:02:46, 3.27s/it] {'loss': 0.6392, 'grad_norm': 0.5828550171958836, 'learning_rate': 5.450097688604498e-06, 'epoch': 0.66} 66%|██████▌ | 4387/6638 [22:45<2:02:46, 3.27s/it] 66%|██████▌ | 4388/6638 [22:48<2:03:48, 3.30s/it] {'loss': 0.5942, 'grad_norm': 0.5240214886338841, 'learning_rate': 5.44575281725909e-06, 'epoch': 0.66} 66%|██████▌ | 4388/6638 [22:48<2:03:48, 3.30s/it] 66%|██████▌ | 4389/6638 [22:51<2:03:38, 3.30s/it] {'loss': 0.6711, 'grad_norm': 0.6141060384636402, 'learning_rate': 5.441409030375806e-06, 'epoch': 0.66} 66%|██████▌ | 4389/6638 [22:51<2:03:38, 3.30s/it] 66%|██████▌ | 4390/6638 [22:54<2:02:17, 3.26s/it] {'loss': 0.6603, 'grad_norm': 0.6608077773087019, 'learning_rate': 5.437066328988999e-06, 'epoch': 0.66} 66%|██████▌ | 4390/6638 [22:54<2:02:17, 3.26s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (5165 > 4096). 
Running this sequence through the model will result in indexing errors 66%|██████▌ | 4391/6638 [22:58<2:03:03, 3.29s/it] {'loss': 0.6113, 'grad_norm': 0.6047979566669626, 'learning_rate': 5.432724714132756e-06, 'epoch': 0.66} 66%|██████▌ | 4391/6638 [22:58<2:03:03, 3.29s/it] 66%|██████▌ | 4392/6638 [23:01<2:03:46, 3.31s/it] {'loss': 0.6355, 'grad_norm': 0.5797700000610989, 'learning_rate': 5.428384186840912e-06, 'epoch': 0.66} 66%|██████▌ | 4392/6638 [23:01<2:03:46, 3.31s/it] 66%|██████▌ | 4393/6638 [23:04<2:03:19, 3.30s/it] {'loss': 0.6381, 'grad_norm': 0.5802013126817335, 'learning_rate': 5.424044748147032e-06, 'epoch': 0.66} 66%|██████▌ | 4393/6638 [23:04<2:03:19, 3.30s/it] 66%|██████▌ | 4394/6638 [23:08<2:02:45, 3.28s/it] {'loss': 0.6511, 'grad_norm': 0.640587266733435, 'learning_rate': 5.419706399084424e-06, 'epoch': 0.66} 66%|██████▌ | 4394/6638 [23:08<2:02:45, 3.28s/it] 66%|██████▌ | 4395/6638 [23:11<2:03:08, 3.29s/it] {'loss': 0.6686, 'grad_norm': 0.5748615941883094, 'learning_rate': 5.415369140686151e-06, 'epoch': 0.66} 66%|██████▌ | 4395/6638 [23:11<2:03:08, 3.29s/it] 66%|██████▌ | 4396/6638 [23:14<2:03:45, 3.31s/it] {'loss': 0.6749, 'grad_norm': 0.6591126187733815, 'learning_rate': 5.411032973984997e-06, 'epoch': 0.66} 66%|██████▌ | 4396/6638 [23:14<2:03:45, 3.31s/it] 66%|██████▌ | 4397/6638 [23:18<2:04:11, 3.33s/it] {'loss': 0.6941, 'grad_norm': 0.7065871852920755, 'learning_rate': 5.406697900013502e-06, 'epoch': 0.66} 66%|██████▌ | 4397/6638 [23:18<2:04:11, 3.33s/it] 66%|██████▋ | 4398/6638 [23:21<2:02:41, 3.29s/it] {'loss': 0.6111, 'grad_norm': 0.5987637112695803, 'learning_rate': 5.40236391980393e-06, 'epoch': 0.66} 66%|██████▋ | 4398/6638 [23:21<2:02:41, 3.29s/it] 66%|██████▋ | 4399/6638 [23:24<2:02:44, 3.29s/it] {'loss': 0.6929, 'grad_norm': 0.646370113353539, 'learning_rate': 5.3980310343882955e-06, 'epoch': 0.66} 66%|██████▋ | 4399/6638 [23:24<2:02:44, 3.29s/it]2 AutoResumeHook: Checking whether to suspend... 
6 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 66%|██████▋ | 4400/6638 [23:27<2:03:40, 3.32s/it]5 AutoResumeHook: Checking whether to suspend... {'loss': 0.647, 'grad_norm': 0.6241131834879239, 'learning_rate': 5.393699244798357e-06, 'epoch': 0.66} 66%|██████▋ | 4400/6638 [23:27<2:03:40, 3.32s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4400/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4400/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4400/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). 
If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 66%|██████▋ | 4401/6638 [23:45<4:38:50, 7.48s/it] {'loss': 0.6535, 'grad_norm': 0.6357214241294098, 'learning_rate': 5.389368552065595e-06, 'epoch': 0.66} 66%|██████▋ | 4401/6638 [23:45<4:38:50, 7.48s/it] 66%|██████▋ | 4402/6638 [23:48<3:50:53, 6.20s/it] {'loss': 0.6148, 'grad_norm': 0.5982807472319754, 'learning_rate': 5.385038957221241e-06, 'epoch': 0.66} 66%|██████▋ | 4402/6638 [23:48<3:50:53, 6.20s/it] 66%|██████▋ | 4403/6638 [23:51<3:18:41, 5.33s/it] {'loss': 0.6358, 'grad_norm': 0.5375857485743747, 'learning_rate': 5.380710461296268e-06, 'epoch': 0.66} 66%|██████▋ | 4403/6638 [23:51<3:18:41, 5.33s/it] 66%|██████▋ | 4404/6638 [23:54<2:55:40, 4.72s/it] {'loss': 0.6349, 'grad_norm': 0.5743976903566017, 'learning_rate': 5.376383065321376e-06, 'epoch': 0.66} 66%|██████▋ | 4404/6638 [23:54<2:55:40, 4.72s/it] 66%|██████▋ | 4405/6638 [23:58<2:39:28, 4.28s/it] {'loss': 0.6394, 'grad_norm': 0.6144918859936048, 'learning_rate': 5.3720567703270135e-06, 'epoch': 0.66} 66%|██████▋ | 4405/6638 [23:58<2:39:28, 4.28s/it] 66%|██████▋ | 4406/6638 [24:01<2:27:58, 3.98s/it] {'loss': 0.7101, 'grad_norm': 0.6717591977346143, 'learning_rate': 5.367731577343357e-06, 'epoch': 0.66} 66%|██████▋ | 4406/6638 [24:01<2:27:58, 3.98s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096). 
warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).") 66%|██████▋ | 4407/6638 [24:05<2:24:42, 3.89s/it] {'loss': 0.6596, 'grad_norm': 0.5600534701720957, 'learning_rate': 5.36340748740033e-06, 'epoch': 0.66} 66%|██████▋ | 4407/6638 [24:05<2:24:42, 3.89s/it] 66%|██████▋ | 4408/6638 [24:08<2:17:06, 3.69s/it] {'loss': 0.6524, 'grad_norm': 0.5734426412845384, 'learning_rate': 5.35908450152759e-06, 'epoch': 0.66} 66%|██████▋ | 4408/6638 [24:08<2:17:06, 3.69s/it] 66%|██████▋ | 4409/6638 [24:11<2:13:02, 3.58s/it] {'loss': 0.6075, 'grad_norm': 0.5184322052495842, 'learning_rate': 5.354762620754528e-06, 'epoch': 0.66} 66%|██████▋ | 4409/6638 [24:11<2:13:02, 3.58s/it] 66%|██████▋ | 4410/6638 [24:15<2:10:26, 3.51s/it] {'loss': 0.6636, 'grad_norm': 0.6219501534534768, 'learning_rate': 5.350441846110274e-06, 'epoch': 0.66} 66%|██████▋ | 4410/6638 [24:15<2:10:26, 3.51s/it] 66%|██████▋ | 4411/6638 [24:18<2:08:25, 3.46s/it] {'loss': 0.7053, 'grad_norm': 0.6133397988675606, 'learning_rate': 5.346122178623705e-06, 'epoch': 0.66} 66%|██████▋ | 4411/6638 [24:18<2:08:25, 3.46s/it] 66%|██████▋ | 4412/6638 [24:21<2:06:20, 3.41s/it] {'loss': 0.6739, 'grad_norm': 0.6222778873976943, 'learning_rate': 5.3418036193234115e-06, 'epoch': 0.66} 66%|██████▋ | 4412/6638 [24:21<2:06:20, 3.41s/it] 66%|██████▋ | 4413/6638 [24:25<2:06:15, 3.40s/it] {'loss': 0.7215, 'grad_norm': 0.7028492191009001, 'learning_rate': 5.337486169237739e-06, 'epoch': 0.66} 66%|██████▋ | 4413/6638 [24:25<2:06:15, 3.40s/it] 66%|██████▋ | 4414/6638 [24:28<2:05:15, 3.38s/it] {'loss': 0.6441, 'grad_norm': 0.6027512388327543, 'learning_rate': 5.3331698293947645e-06, 'epoch': 0.66} 66%|██████▋ | 4414/6638 [24:28<2:05:15, 3.38s/it] 67%|██████▋ | 4415/6638 [24:32<2:07:31, 3.44s/it] {'loss': 0.6731, 'grad_norm': 0.579827401056567, 'learning_rate': 5.328854600822302e-06, 'epoch': 0.67} 67%|██████▋ | 4415/6638 [24:32<2:07:31, 3.44s/it] 67%|██████▋ | 4416/6638 [24:35<2:07:58, 3.46s/it] 
{'loss': 0.6049, 'grad_norm': 0.6057359484794456, 'learning_rate': 5.324540484547894e-06, 'epoch': 0.67} 67%|██████▋ | 4416/6638 [24:35<2:07:58, 3.46s/it] 67%|██████▋ | 4417/6638 [24:38<2:06:35, 3.42s/it] {'loss': 0.6765, 'grad_norm': 0.7063350263949465, 'learning_rate': 5.320227481598816e-06, 'epoch': 0.67} 67%|██████▋ | 4417/6638 [24:38<2:06:35, 3.42s/it] 67%|██████▋ | 4418/6638 [24:42<2:04:21, 3.36s/it] {'loss': 0.6105, 'grad_norm': 0.5518369071010597, 'learning_rate': 5.3159155930021e-06, 'epoch': 0.67} 67%|██████▋ | 4418/6638 [24:42<2:04:21, 3.36s/it] 67%|██████▋ | 4419/6638 [24:45<2:04:16, 3.36s/it] {'loss': 0.6357, 'grad_norm': 0.5504984392268166, 'learning_rate': 5.3116048197844845e-06, 'epoch': 0.67} 67%|██████▋ | 4419/6638 [24:45<2:04:16, 3.36s/it] 67%|██████▋ | 4420/6638 [24:48<2:02:12, 3.31s/it] {'loss': 0.6182, 'grad_norm': 0.5881541729392261, 'learning_rate': 5.307295162972466e-06, 'epoch': 0.67} 67%|██████▋ | 4420/6638 [24:48<2:02:12, 3.31s/it] 67%|██████▋ | 4421/6638 [24:51<2:02:53, 3.33s/it] {'loss': 0.6748, 'grad_norm': 0.6341959363637606, 'learning_rate': 5.302986623592253e-06, 'epoch': 0.67} 67%|██████▋ | 4421/6638 [24:51<2:02:53, 3.33s/it] 67%|██████▋ | 4422/6638 [24:55<2:01:17, 3.28s/it] {'loss': 0.6137, 'grad_norm': 0.6092374267654018, 'learning_rate': 5.298679202669806e-06, 'epoch': 0.67} 67%|██████▋ | 4422/6638 [24:55<2:01:17, 3.28s/it] 67%|██████▋ | 4423/6638 [24:58<2:01:02, 3.28s/it] {'loss': 0.622, 'grad_norm': 0.545874098986164, 'learning_rate': 5.294372901230815e-06, 'epoch': 0.67} 67%|██████▋ | 4423/6638 [24:58<2:01:02, 3.28s/it] 67%|██████▋ | 4424/6638 [25:01<1:59:56, 3.25s/it] {'loss': 0.5998, 'grad_norm': 0.7183101275241254, 'learning_rate': 5.290067720300695e-06, 'epoch': 0.67} 67%|██████▋ | 4424/6638 [25:01<1:59:56, 3.25s/it] 67%|██████▋ | 4425/6638 [25:04<1:58:57, 3.23s/it] {'loss': 0.685, 'grad_norm': 0.6586379963537001, 'learning_rate': 5.2857636609046026e-06, 'epoch': 0.67} 67%|██████▋ | 4425/6638 [25:04<1:58:57, 3.23s/it] 
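The "Token indices sequence length is longer than the specified maximum sequence length for this model (… > 4096)" and "Truncating sequences to `model_max_length` (4096)" warnings that recur in this log come from samples whose token count exceeds the 4096-token limit. A minimal illustrative sketch of that truncation behavior (not the NVILA training code; `truncate_to_max_length` is a hypothetical stand-in for the logic in `llava_arch.py`):

```python
# Illustrative sketch of why the truncation warning appears:
# sequences longer than model_max_length are cut down before the
# forward pass, mirroring the UserWarning emitted in llava_arch.py.
import warnings

MODEL_MAX_LENGTH = 4096  # matches `model_max_length` in this log


def truncate_to_max_length(token_ids, max_length=MODEL_MAX_LENGTH):
    """Drop tokens beyond max_length, warning like the training code does."""
    if len(token_ids) > max_length:
        warnings.warn(
            f"Truncating sequences to `model_max_length` ({max_length})."
        )
        return token_ids[:max_length]
    return token_ids


# A 5165-token sample (the length reported earlier in this log)
# is cut down to exactly 4096 tokens.
sample = list(range(5165))
truncated = truncate_to_max_length(sample)
print(len(truncated))  # 4096
```

Sequences at or under the limit pass through unchanged, so only the oversized samples trigger the warning lines seen above.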
67%|██████▋ | 4426/6638 [25:08<2:00:03, 3.26s/it] {'loss': 0.6754, 'grad_norm': 0.6674166413048284, 'learning_rate': 5.2814607240674285e-06, 'epoch': 0.67} 67%|██████▋ | 4426/6638 [25:08<2:00:03, 3.26s/it] 67%|██████▋ | 4427/6638 [25:11<2:00:37, 3.27s/it] {'loss': 0.6784, 'grad_norm': 0.5660949519930977, 'learning_rate': 5.277158910813786e-06, 'epoch': 0.67} 67%|██████▋ | 4427/6638 [25:11<2:00:37, 3.27s/it] 67%|██████▋ | 4428/6638 [25:14<2:01:39, 3.30s/it] {'loss': 0.6123, 'grad_norm': 0.5375655324848172, 'learning_rate': 5.2728582221680355e-06, 'epoch': 0.67} 67%|██████▋ | 4428/6638 [25:14<2:01:39, 3.30s/it] 67%|██████▋ | 4429/6638 [25:18<2:00:53, 3.28s/it] {'loss': 0.6743, 'grad_norm': 0.8261443975697808, 'learning_rate': 5.268558659154248e-06, 'epoch': 0.67} 67%|██████▋ | 4429/6638 [25:18<2:00:53, 3.28s/it] 67%|██████▋ | 4430/6638 [25:21<2:00:43, 3.28s/it] {'loss': 0.6882, 'grad_norm': 0.6561356326312763, 'learning_rate': 5.264260222796258e-06, 'epoch': 0.67} 67%|██████▋ | 4430/6638 [25:21<2:00:43, 3.28s/it] 67%|██████▋ | 4431/6638 [25:24<2:01:47, 3.31s/it] {'loss': 0.6663, 'grad_norm': 0.5856179892311929, 'learning_rate': 5.259962914117603e-06, 'epoch': 0.67} 67%|██████▋ | 4431/6638 [25:24<2:01:47, 3.31s/it] 67%|██████▋ | 4432/6638 [25:28<2:01:49, 3.31s/it] {'loss': 0.6241, 'grad_norm': 0.6007358791559934, 'learning_rate': 5.255666734141561e-06, 'epoch': 0.67} 67%|██████▋ | 4432/6638 [25:28<2:01:49, 3.31s/it] 67%|██████▋ | 4433/6638 [25:31<2:01:57, 3.32s/it] {'loss': 0.6723, 'grad_norm': 0.7008020690037614, 'learning_rate': 5.2513716838911465e-06, 'epoch': 0.67} 67%|██████▋ | 4433/6638 [25:31<2:01:57, 3.32s/it] 67%|██████▋ | 4434/6638 [25:34<1:59:52, 3.26s/it] {'loss': 0.6086, 'grad_norm': 0.5801572447945543, 'learning_rate': 5.247077764389099e-06, 'epoch': 0.67} 67%|██████▋ | 4434/6638 [25:34<1:59:52, 3.26s/it] 67%|██████▋ | 4435/6638 [25:37<2:00:59, 3.30s/it] {'loss': 0.6815, 'grad_norm': 0.6357343792337611, 'learning_rate': 5.242784976657899e-06, 'epoch': 
0.67} 67%|██████▋ | 4435/6638 [25:37<2:00:59, 3.30s/it] 67%|██████▋ | 4436/6638 [25:41<2:01:25, 3.31s/it] {'loss': 0.6556, 'grad_norm': 0.6307404802435987, 'learning_rate': 5.238493321719739e-06, 'epoch': 0.67} 67%|██████▋ | 4436/6638 [25:41<2:01:25, 3.31s/it] 67%|██████▋ | 4437/6638 [25:44<2:01:09, 3.30s/it] {'loss': 0.6321, 'grad_norm': 0.5769539990351875, 'learning_rate': 5.2342028005965575e-06, 'epoch': 0.67} 67%|██████▋ | 4437/6638 [25:44<2:01:09, 3.30s/it] 67%|██████▋ | 4438/6638 [25:47<2:00:21, 3.28s/it] {'loss': 0.6387, 'grad_norm': 0.6495195292077808, 'learning_rate': 5.229913414310019e-06, 'epoch': 0.67} 67%|██████▋ | 4438/6638 [25:47<2:00:21, 3.28s/it] 67%|██████▋ | 4439/6638 [25:51<2:03:15, 3.36s/it] {'loss': 0.6699, 'grad_norm': 0.6080611592760381, 'learning_rate': 5.225625163881518e-06, 'epoch': 0.67} 67%|██████▋ | 4439/6638 [25:51<2:03:15, 3.36s/it] 67%|██████▋ | 4440/6638 [25:54<2:02:32, 3.34s/it] {'loss': 0.6169, 'grad_norm': 0.6204435715087747, 'learning_rate': 5.22133805033217e-06, 'epoch': 0.67} 67%|██████▋ | 4440/6638 [25:54<2:02:32, 3.34s/it] 67%|██████▋ | 4441/6638 [25:57<2:01:47, 3.33s/it] {'loss': 0.6928, 'grad_norm': 0.698170487607999, 'learning_rate': 5.217052074682829e-06, 'epoch': 0.67} 67%|██████▋ | 4441/6638 [25:57<2:01:47, 3.33s/it] 67%|██████▋ | 4442/6638 [26:01<2:03:41, 3.38s/it] {'loss': 0.6186, 'grad_norm': 0.5686054364450623, 'learning_rate': 5.212767237954081e-06, 'epoch': 0.67} 67%|██████▋ | 4442/6638 [26:01<2:03:41, 3.38s/it] 67%|██████▋ | 4443/6638 [26:04<2:03:45, 3.38s/it] {'loss': 0.665, 'grad_norm': 0.6503745201297412, 'learning_rate': 5.208483541166236e-06, 'epoch': 0.67} 67%|██████▋ | 4443/6638 [26:04<2:03:45, 3.38s/it] 67%|██████▋ | 4444/6638 [26:08<2:04:10, 3.40s/it] {'loss': 0.6723, 'grad_norm': 0.6384264083963853, 'learning_rate': 5.2042009853393245e-06, 'epoch': 0.67} 67%|██████▋ | 4444/6638 [26:08<2:04:10, 3.40s/it] 67%|██████▋ | 4445/6638 [26:11<2:02:11, 3.34s/it] {'loss': 0.6825, 'grad_norm': 0.6843066652832854, 
'learning_rate': 5.1999195714931205e-06, 'epoch': 0.67} 67%|██████▋ | 4445/6638 [26:11<2:02:11, 3.34s/it] 67%|██████▋ | 4446/6638 [26:14<2:02:20, 3.35s/it] {'loss': 0.6493, 'grad_norm': 0.6038460950227601, 'learning_rate': 5.195639300647119e-06, 'epoch': 0.67} 67%|██████▋ | 4446/6638 [26:14<2:02:20, 3.35s/it] 67%|██████▋ | 4447/6638 [26:17<2:00:55, 3.31s/it] {'loss': 0.6297, 'grad_norm': 0.6460536370049031, 'learning_rate': 5.191360173820536e-06, 'epoch': 0.67} 67%|██████▋ | 4447/6638 [26:17<2:00:55, 3.31s/it] 67%|██████▋ | 4448/6638 [26:21<2:01:58, 3.34s/it] {'loss': 0.6385, 'grad_norm': 0.5902629285371016, 'learning_rate': 5.1870821920323275e-06, 'epoch': 0.67} 67%|██████▋ | 4448/6638 [26:21<2:01:58, 3.34s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (4403 > 4096). Running this sequence through the model will result in indexing errors 67%|██████▋ | 4449/6638 [26:24<2:00:59, 3.32s/it] {'loss': 0.6819, 'grad_norm': 0.6436585763976147, 'learning_rate': 5.182805356301173e-06, 'epoch': 0.67} 67%|██████▋ | 4449/6638 [26:24<2:00:59, 3.32s/it]6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 67%|██████▋ | 4450/6638 [26:27<2:00:35, 3.31s/it]7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6626, 'grad_norm': 0.5938976826922733, 'learning_rate': 5.1785296676454685e-06, 'epoch': 0.67} 67%|██████▋ | 4450/6638 [26:27<2:00:35, 3.31s/it] 67%|██████▋ | 4451/6638 [26:31<2:01:19, 3.33s/it] {'loss': 0.6857, 'grad_norm': 0.6253562419629894, 'learning_rate': 5.174255127083354e-06, 'epoch': 0.67} 67%|██████▋ | 4451/6638 [26:31<2:01:19, 3.33s/it] 67%|██████▋ | 4452/6638 [26:34<2:01:27, 3.33s/it] {'loss': 0.6358, 'grad_norm': 0.5621015562668804, 'learning_rate': 5.169981735632677e-06, 'epoch': 0.67} 67%|██████▋ | 4452/6638 [26:34<2:01:27, 3.33s/it] 67%|██████▋ | 4453/6638 [26:37<2:00:26, 3.31s/it] {'loss': 0.6084, 'grad_norm': 0.5648159911959051, 'learning_rate': 5.165709494311037e-06, 'epoch': 0.67} 67%|██████▋ | 4453/6638 [26:37<2:00:26, 3.31s/it] 67%|██████▋ | 4454/6638 [26:41<1:59:42, 3.29s/it] {'loss': 0.6491, 'grad_norm': 0.6104058752448683, 'learning_rate': 5.1614384041357356e-06, 'epoch': 0.67} 67%|██████▋ | 4454/6638 [26:41<1:59:42, 3.29s/it] 67%|██████▋ | 4455/6638 [26:44<1:58:47, 3.27s/it] {'loss': 0.6208, 'grad_norm': 0.5703889871315514, 'learning_rate': 5.1571684661238075e-06, 'epoch': 0.67} 67%|██████▋ | 4455/6638 [26:44<1:58:47, 3.27s/it] 67%|██████▋ | 4456/6638 [26:47<1:58:53, 3.27s/it] {'loss': 0.6613, 'grad_norm': 0.5932649606666124, 'learning_rate': 5.1528996812920166e-06, 'epoch': 0.67} 67%|██████▋ | 4456/6638 [26:47<1:58:53, 3.27s/it] 67%|██████▋ | 4457/6638 [26:50<1:59:30, 3.29s/it] {'loss': 0.6673, 'grad_norm': 0.6062387390809086, 'learning_rate': 5.148632050656852e-06, 'epoch': 0.67} 67%|██████▋ | 4457/6638 [26:50<1:59:30, 3.29s/it] 67%|██████▋ | 4458/6638 [26:54<1:58:22, 3.26s/it] {'loss': 0.6556, 'grad_norm': 0.5636798876168666, 'learning_rate': 5.144365575234529e-06, 'epoch': 0.67} 67%|██████▋ | 4458/6638 [26:54<1:58:22, 3.26s/it] 67%|██████▋ | 4459/6638 [26:57<1:57:48, 3.24s/it] {'loss': 0.6299, 'grad_norm': 0.5435068371766716, 'learning_rate': 5.140100256040979e-06, 'epoch': 0.67} 67%|██████▋ | 4459/6638 [26:57<1:57:48, 
3.24s/it] 67%|██████▋ | 4460/6638 [27:00<1:56:58, 3.22s/it] {'loss': 0.6689, 'grad_norm': 0.6332652086771197, 'learning_rate': 5.135836094091867e-06, 'epoch': 0.67} 67%|██████▋ | 4460/6638 [27:00<1:56:58, 3.22s/it] 67%|██████▋ | 4461/6638 [27:03<1:57:00, 3.22s/it] {'loss': 0.6603, 'grad_norm': 0.579792686586688, 'learning_rate': 5.131573090402584e-06, 'epoch': 0.67} 67%|██████▋ | 4461/6638 [27:03<1:57:00, 3.22s/it] 67%|██████▋ | 4462/6638 [27:06<1:56:36, 3.22s/it] {'loss': 0.6196, 'grad_norm': 0.5686803465019463, 'learning_rate': 5.127311245988233e-06, 'epoch': 0.67} 67%|██████▋ | 4462/6638 [27:06<1:56:36, 3.22s/it] 67%|██████▋ | 4463/6638 [27:10<1:57:06, 3.23s/it] {'loss': 0.588, 'grad_norm': 0.5514480016049988, 'learning_rate': 5.1230505618636575e-06, 'epoch': 0.67} 67%|██████▋ | 4463/6638 [27:10<1:57:06, 3.23s/it] 67%|██████▋ | 4464/6638 [27:13<1:57:25, 3.24s/it] {'loss': 0.6748, 'grad_norm': 0.6044854856768775, 'learning_rate': 5.118791039043407e-06, 'epoch': 0.67} 67%|██████▋ | 4464/6638 [27:13<1:57:25, 3.24s/it] 67%|██████▋ | 4465/6638 [27:17<2:01:49, 3.36s/it] {'loss': 0.6374, 'grad_norm': 0.5585278487777795, 'learning_rate': 5.114532678541768e-06, 'epoch': 0.67} 67%|██████▋ | 4465/6638 [27:17<2:01:49, 3.36s/it] 67%|██████▋ | 4466/6638 [27:20<2:01:43, 3.36s/it] {'loss': 0.6488, 'grad_norm': 0.5866234540721269, 'learning_rate': 5.110275481372748e-06, 'epoch': 0.67} 67%|██████▋ | 4466/6638 [27:20<2:01:43, 3.36s/it] 67%|██████▋ | 4467/6638 [27:23<2:00:36, 3.33s/it] {'loss': 0.6766, 'grad_norm': 0.62646547299651, 'learning_rate': 5.106019448550073e-06, 'epoch': 0.67} 67%|██████▋ | 4467/6638 [27:23<2:00:36, 3.33s/it] 67%|██████▋ | 4468/6638 [27:27<2:01:11, 3.35s/it] {'loss': 0.7204, 'grad_norm': 0.6748111331153808, 'learning_rate': 5.10176458108719e-06, 'epoch': 0.67} 67%|██████▋ | 4468/6638 [27:27<2:01:11, 3.35s/it] 67%|██████▋ | 4469/6638 [27:30<2:00:20, 3.33s/it] {'loss': 0.6711, 'grad_norm': 0.6470263414045041, 'learning_rate': 5.097510879997283e-06, 'epoch': 
0.67} 67%|██████▋ | 4469/6638 [27:30<2:00:20, 3.33s/it] 67%|██████▋ | 4470/6638 [27:33<1:58:49, 3.29s/it] {'loss': 0.6235, 'grad_norm': 0.533199057896142, 'learning_rate': 5.093258346293237e-06, 'epoch': 0.67} 67%|██████▋ | 4470/6638 [27:33<1:58:49, 3.29s/it] 67%|██████▋ | 4471/6638 [27:36<1:57:55, 3.27s/it] {'loss': 0.6353, 'grad_norm': 0.7337850522358201, 'learning_rate': 5.089006980987674e-06, 'epoch': 0.67} 67%|██████▋ | 4471/6638 [27:36<1:57:55, 3.27s/it] 67%|██████▋ | 4472/6638 [27:40<1:57:23, 3.25s/it] {'loss': 0.6434, 'grad_norm': 0.5426072393759763, 'learning_rate': 5.084756785092937e-06, 'epoch': 0.67} 67%|██████▋ | 4472/6638 [27:40<1:57:23, 3.25s/it] 67%|██████▋ | 4473/6638 [27:43<1:57:17, 3.25s/it] {'loss': 0.653, 'grad_norm': 0.5573843114857178, 'learning_rate': 5.080507759621081e-06, 'epoch': 0.67} 67%|██████▋ | 4473/6638 [27:43<1:57:17, 3.25s/it] 67%|██████▋ | 4474/6638 [27:46<1:59:47, 3.32s/it] {'loss': 0.7028, 'grad_norm': 0.6001285030534518, 'learning_rate': 5.076259905583894e-06, 'epoch': 0.67} 67%|██████▋ | 4474/6638 [27:46<1:59:47, 3.32s/it] 67%|██████▋ | 4475/6638 [27:50<1:58:58, 3.30s/it] {'loss': 0.6571, 'grad_norm': 0.6087955728470112, 'learning_rate': 5.07201322399287e-06, 'epoch': 0.67} 67%|██████▋ | 4475/6638 [27:50<1:58:58, 3.30s/it] 67%|██████▋ | 4476/6638 [27:53<1:59:21, 3.31s/it] {'loss': 0.6941, 'grad_norm': 0.6730599630064146, 'learning_rate': 5.067767715859248e-06, 'epoch': 0.67} 67%|██████▋ | 4476/6638 [27:53<1:59:21, 3.31s/it] 67%|██████▋ | 4477/6638 [27:56<1:59:21, 3.31s/it] {'loss': 0.6693, 'grad_norm': 0.55376408522941, 'learning_rate': 5.063523382193963e-06, 'epoch': 0.67} 67%|██████▋ | 4477/6638 [27:56<1:59:21, 3.31s/it] 67%|██████▋ | 4478/6638 [27:59<1:58:41, 3.30s/it] {'loss': 0.5859, 'grad_norm': 0.5189538418597406, 'learning_rate': 5.05928022400768e-06, 'epoch': 0.67} 67%|██████▋ | 4478/6638 [27:59<1:58:41, 3.30s/it] 67%|██████▋ | 4479/6638 [28:03<2:00:43, 3.36s/it] {'loss': 0.63, 'grad_norm': 0.5397538223923345, 
'learning_rate': 5.055038242310786e-06, 'epoch': 0.67} 67%|██████▋ | 4479/6638 [28:03<2:00:43, 3.36s/it] 67%|██████▋ | 4480/6638 [28:06<1:59:20, 3.32s/it] {'loss': 0.6371, 'grad_norm': 0.5758297141603071, 'learning_rate': 5.050797438113387e-06, 'epoch': 0.67} 67%|██████▋ | 4480/6638 [28:06<1:59:20, 3.32s/it] 68%|██████▊ | 4481/6638 [28:10<1:59:41, 3.33s/it] {'loss': 0.6475, 'grad_norm': 0.5737221437482916, 'learning_rate': 5.0465578124253104e-06, 'epoch': 0.68} 68%|██████▊ | 4481/6638 [28:10<1:59:41, 3.33s/it] 68%|██████▊ | 4482/6638 [28:13<1:58:43, 3.30s/it] {'loss': 0.6312, 'grad_norm': 0.6045247684753445, 'learning_rate': 5.042319366256096e-06, 'epoch': 0.68} 68%|██████▊ | 4482/6638 [28:13<1:58:43, 3.30s/it] 68%|██████▊ | 4483/6638 [28:16<1:57:34, 3.27s/it] {'loss': 0.6237, 'grad_norm': 0.5513488360941877, 'learning_rate': 5.038082100615003e-06, 'epoch': 0.68} 68%|██████▊ | 4483/6638 [28:16<1:57:34, 3.27s/it] 68%|██████▊ | 4484/6638 [28:19<1:57:07, 3.26s/it] {'loss': 0.6509, 'grad_norm': 0.5961456529644494, 'learning_rate': 5.0338460165110235e-06, 'epoch': 0.68} 68%|██████▊ | 4484/6638 [28:19<1:57:07, 3.26s/it] 68%|██████▊ | 4485/6638 [28:23<1:58:06, 3.29s/it] {'loss': 0.6693, 'grad_norm': 0.614421872571021, 'learning_rate': 5.029611114952852e-06, 'epoch': 0.68} 68%|██████▊ | 4485/6638 [28:23<1:58:06, 3.29s/it] 68%|██████▊ | 4486/6638 [28:26<1:57:21, 3.27s/it] {'loss': 0.6341, 'grad_norm': 0.5588040900725986, 'learning_rate': 5.025377396948914e-06, 'epoch': 0.68} 68%|██████▊ | 4486/6638 [28:26<1:57:21, 3.27s/it] 68%|██████▊ | 4487/6638 [28:29<1:59:04, 3.32s/it] {'loss': 0.6515, 'grad_norm': 0.583466430245106, 'learning_rate': 5.021144863507337e-06, 'epoch': 0.68} 68%|██████▊ | 4487/6638 [28:29<1:59:04, 3.32s/it] 68%|██████▊ | 4488/6638 [28:32<1:57:41, 3.28s/it] {'loss': 0.6645, 'grad_norm': 0.6045005312589322, 'learning_rate': 5.016913515635981e-06, 'epoch': 0.68} 68%|██████▊ | 4488/6638 [28:32<1:57:41, 3.28s/it] 68%|██████▊ | 4489/6638 [28:36<1:57:08, 3.27s/it] 
{'loss': 0.6254, 'grad_norm': 0.6393030855208734, 'learning_rate': 5.012683354342424e-06, 'epoch': 0.68} 68%|██████▊ | 4489/6638 [28:36<1:57:08, 3.27s/it] 68%|██████▊ | 4490/6638 [28:39<1:57:15, 3.28s/it] {'loss': 0.6149, 'grad_norm': 0.594850529212135, 'learning_rate': 5.008454380633948e-06, 'epoch': 0.68} 68%|██████▊ | 4490/6638 [28:39<1:57:15, 3.28s/it] 68%|██████▊ | 4491/6638 [28:43<2:00:08, 3.36s/it] {'loss': 0.6325, 'grad_norm': 0.4869186543060522, 'learning_rate': 5.004226595517565e-06, 'epoch': 0.68} 68%|██████▊ | 4491/6638 [28:43<2:00:08, 3.36s/it] 68%|██████▊ | 4492/6638 [28:46<1:59:13, 3.33s/it] {'loss': 0.6187, 'grad_norm': 0.5603898938276428, 'learning_rate': 5.000000000000003e-06, 'epoch': 0.68} 68%|██████▊ | 4492/6638 [28:46<1:59:13, 3.33s/it] 68%|██████▊ | 4493/6638 [28:49<1:58:35, 3.32s/it] {'loss': 0.6312, 'grad_norm': 0.5591475143426067, 'learning_rate': 4.995774595087695e-06, 'epoch': 0.68} 68%|██████▊ | 4493/6638 [28:49<1:58:35, 3.32s/it] 68%|██████▊ | 4494/6638 [28:52<1:57:39, 3.29s/it] {'loss': 0.6862, 'grad_norm': 0.7753996746058034, 'learning_rate': 4.991550381786804e-06, 'epoch': 0.68} 68%|██████▊ | 4494/6638 [28:52<1:57:39, 3.29s/it] 68%|██████▊ | 4495/6638 [28:56<1:58:46, 3.33s/it] {'loss': 0.6305, 'grad_norm': 0.5215382709005446, 'learning_rate': 4.9873273611032035e-06, 'epoch': 0.68} 68%|██████▊ | 4495/6638 [28:56<1:58:46, 3.33s/it] 68%|██████▊ | 4496/6638 [28:59<1:57:26, 3.29s/it] {'loss': 0.6552, 'grad_norm': 0.625867651142659, 'learning_rate': 4.983105534042489e-06, 'epoch': 0.68} 68%|██████▊ | 4496/6638 [28:59<1:57:26, 3.29s/it] 68%|██████▊ | 4497/6638 [29:02<1:57:30, 3.29s/it] {'loss': 0.647, 'grad_norm': 0.5876909043537415, 'learning_rate': 4.9788849016099595e-06, 'epoch': 0.68} 68%|██████▊ | 4497/6638 [29:02<1:57:30, 3.29s/it] 68%|██████▊ | 4498/6638 [29:06<1:58:48, 3.33s/it] {'loss': 0.6493, 'grad_norm': 0.626450504646023, 'learning_rate': 4.974665464810635e-06, 'epoch': 0.68} 68%|██████▊ | 4498/6638 [29:06<1:58:48, 3.33s/it] 
68%|██████▊ | 4499/6638 [29:09<1:57:48, 3.30s/it] {'loss': 0.6786, 'grad_norm': 0.6670892252077338, 'learning_rate': 4.970447224649255e-06, 'epoch': 0.68} 68%|██████▊ | 4499/6638 [29:09<1:57:48, 3.30s/it] AutoResumeHook: Checking whether to suspend... (interleaved output from ranks 0-7) 68%|██████▊ | 4500/6638 [29:12<1:56:47, 3.28s/it] {'loss': 0.7005, 'grad_norm': 0.7093401539339975, 'learning_rate': 4.966230182130275e-06, 'epoch': 0.68} 68%|██████▊ | 4500/6638 [29:12<1:56:47, 3.28s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4500/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4500/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4500/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. 
If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 68%|██████▊ | 4501/6638 [29:27<4:03:39, 6.84s/it] {'loss': 0.657, 'grad_norm': 0.5741649731961028, 'learning_rate': 4.962014338257856e-06, 'epoch': 0.68} 68%|██████▊ | 4501/6638 [29:27<4:03:39, 6.84s/it] 68%|██████▊ | 4502/6638 [29:31<3:25:57, 5.79s/it] {'loss': 0.6457, 'grad_norm': 0.5996085866822314, 'learning_rate': 4.95779969403588e-06, 'epoch': 0.68} 68%|██████▊ | 4502/6638 [29:31<3:25:57, 5.79s/it] 68%|██████▊ | 4503/6638 [29:34<2:59:54, 5.06s/it] {'loss': 0.6846, 'grad_norm': 0.619258157791514, 'learning_rate': 4.9535862504679435e-06, 'epoch': 0.68} 68%|██████▊ | 4503/6638 [29:34<2:59:54, 5.06s/it] 68%|██████▊ | 4504/6638 [29:37<2:41:53, 4.55s/it] {'loss': 0.6645, 'grad_norm': 0.6041271526964724, 'learning_rate': 4.949374008557361e-06, 'epoch': 0.68} 68%|██████▊ | 4504/6638 [29:37<2:41:53, 4.55s/it] 68%|██████▊ | 4505/6638 [29:41<2:29:21, 4.20s/it] {'loss': 0.6695, 'grad_norm': 0.6987164590942083, 'learning_rate': 4.945162969307152e-06, 'epoch': 0.68} 68%|██████▊ | 4505/6638 [29:41<2:29:21, 4.20s/it] 68%|██████▊ | 4506/6638 [29:44<2:19:32, 3.93s/it] {'loss': 0.6602, 'grad_norm': 0.5831164159018772, 'learning_rate': 4.940953133720045e-06, 'epoch': 0.68} 68%|██████▊ | 4506/6638 [29:44<2:19:32, 3.93s/it] 68%|██████▊ | 4507/6638 [29:47<2:13:01, 3.75s/it] {'loss': 0.68, 'grad_norm': 0.6303626563196097, 'learning_rate': 4.936744502798506e-06, 'epoch': 0.68} 68%|██████▊ | 4507/6638 [29:47<2:13:01, 3.75s/it] 68%|██████▊ 
| 4508/6638 [29:51<2:07:38, 3.60s/it] {'loss': 0.64, 'grad_norm': 0.6100527741346303, 'learning_rate': 4.9325370775446866e-06, 'epoch': 0.68} 68%|██████▊ | 4508/6638 [29:51<2:07:38, 3.60s/it] 68%|██████▊ | 4509/6638 [29:54<2:03:31, 3.48s/it] {'loss': 0.6948, 'grad_norm': 0.5983622375397902, 'learning_rate': 4.9283308589604725e-06, 'epoch': 0.68} 68%|██████▊ | 4509/6638 [29:54<2:03:31, 3.48s/it] 68%|██████▊ | 4510/6638 [29:57<2:00:50, 3.41s/it] {'loss': 0.5886, 'grad_norm': 0.5035765310646819, 'learning_rate': 4.9241258480474454e-06, 'epoch': 0.68} 68%|██████▊ | 4510/6638 [29:57<2:00:50, 3.41s/it] 68%|██████▊ | 4511/6638 [30:00<1:58:20, 3.34s/it] {'loss': 0.6507, 'grad_norm': 0.6103943583032858, 'learning_rate': 4.9199220458069085e-06, 'epoch': 0.68} 68%|██████▊ | 4511/6638 [30:00<1:58:20, 3.34s/it] 68%|██████▊ | 4512/6638 [30:03<1:56:27, 3.29s/it] {'loss': 0.6491, 'grad_norm': 0.5831807826175426, 'learning_rate': 4.915719453239882e-06, 'epoch': 0.68} 68%|██████▊ | 4512/6638 [30:03<1:56:27, 3.29s/it] 68%|██████▊ | 4513/6638 [30:07<1:56:21, 3.29s/it] {'loss': 0.6494, 'grad_norm': 0.564761040170534, 'learning_rate': 4.911518071347081e-06, 'epoch': 0.68} 68%|██████▊ | 4513/6638 [30:07<1:56:21, 3.29s/it] 68%|██████▊ | 4514/6638 [30:10<1:57:26, 3.32s/it] {'loss': 0.6371, 'grad_norm': 0.5730082504101534, 'learning_rate': 4.907317901128951e-06, 'epoch': 0.68} 68%|██████▊ | 4514/6638 [30:10<1:57:26, 3.32s/it] 68%|██████▊ | 4515/6638 [30:13<1:56:25, 3.29s/it] {'loss': 0.6081, 'grad_norm': 0.5751738247846389, 'learning_rate': 4.90311894358564e-06, 'epoch': 0.68} 68%|██████▊ | 4515/6638 [30:13<1:56:25, 3.29s/it] 68%|██████▊ | 4516/6638 [30:16<1:55:01, 3.25s/it] {'loss': 0.6207, 'grad_norm': 0.5728735795465605, 'learning_rate': 4.898921199717004e-06, 'epoch': 0.68} 68%|██████▊ | 4516/6638 [30:16<1:55:01, 3.25s/it] 68%|██████▊ | 4517/6638 [30:20<1:55:10, 3.26s/it] {'loss': 0.6325, 'grad_norm': 0.6336143653699485, 'learning_rate': 4.894724670522617e-06, 'epoch': 0.68} 68%|██████▊ 
| 4517/6638 [30:20<1:55:10, 3.26s/it] 68%|██████▊ | 4518/6638 [30:23<1:54:41, 3.25s/it] {'loss': 0.6343, 'grad_norm': 0.580014638931637, 'learning_rate': 4.89052935700176e-06, 'epoch': 0.68} 68%|██████▊ | 4518/6638 [30:23<1:54:41, 3.25s/it] 68%|██████▊ | 4519/6638 [30:26<1:54:10, 3.23s/it] {'loss': 0.6953, 'grad_norm': 0.6075380148373959, 'learning_rate': 4.886335260153431e-06, 'epoch': 0.68} 68%|██████▊ | 4519/6638 [30:26<1:54:10, 3.23s/it] 68%|██████▊ | 4520/6638 [30:29<1:54:48, 3.25s/it] {'loss': 0.653, 'grad_norm': 0.7149674501463699, 'learning_rate': 4.882142380976327e-06, 'epoch': 0.68} 68%|██████▊ | 4520/6638 [30:29<1:54:48, 3.25s/it] 68%|██████▊ | 4521/6638 [30:33<1:54:30, 3.25s/it] {'loss': 0.6343, 'grad_norm': 0.6614143638460859, 'learning_rate': 4.8779507204688595e-06, 'epoch': 0.68} 68%|██████▊ | 4521/6638 [30:33<1:54:30, 3.25s/it] 68%|██████▊ | 4522/6638 [30:36<1:54:38, 3.25s/it] {'loss': 0.644, 'grad_norm': 0.6530223134153217, 'learning_rate': 4.873760279629152e-06, 'epoch': 0.68} 68%|██████▊ | 4522/6638 [30:36<1:54:38, 3.25s/it] 68%|██████▊ | 4523/6638 [30:39<1:55:00, 3.26s/it] {'loss': 0.6919, 'grad_norm': 0.6594102408289327, 'learning_rate': 4.86957105945504e-06, 'epoch': 0.68} 68%|██████▊ | 4523/6638 [30:39<1:55:00, 3.26s/it] 68%|██████▊ | 4524/6638 [30:42<1:54:09, 3.24s/it] {'loss': 0.6428, 'grad_norm': 0.5634515988620643, 'learning_rate': 4.865383060944065e-06, 'epoch': 0.68} 68%|██████▊ | 4524/6638 [30:42<1:54:09, 3.24s/it] 68%|██████▊ | 4525/6638 [30:46<1:53:46, 3.23s/it] {'loss': 0.6309, 'grad_norm': 0.5671118080793927, 'learning_rate': 4.861196285093473e-06, 'epoch': 0.68} 68%|██████▊ | 4525/6638 [30:46<1:53:46, 3.23s/it] 68%|██████▊ | 4526/6638 [30:49<1:53:32, 3.23s/it] {'loss': 0.6444, 'grad_norm': 0.597642508338841, 'learning_rate': 4.857010732900225e-06, 'epoch': 0.68} 68%|██████▊ | 4526/6638 [30:49<1:53:32, 3.23s/it] 68%|██████▊ | 4527/6638 [30:52<1:55:00, 3.27s/it] {'loss': 0.6269, 'grad_norm': 0.5136759527563403, 'learning_rate': 
4.852826405360996e-06, 'epoch': 0.68} 68%|██████▊ | 4527/6638 [30:52<1:55:00, 3.27s/it] 68%|██████▊ | 4528/6638 [30:55<1:55:11, 3.28s/it] {'loss': 0.6449, 'grad_norm': 0.6124789653974039, 'learning_rate': 4.848643303472151e-06, 'epoch': 0.68} 68%|██████▊ | 4528/6638 [30:55<1:55:11, 3.28s/it] 68%|██████▊ | 4529/6638 [30:59<1:54:01, 3.24s/it] {'loss': 0.6343, 'grad_norm': 0.6205895991664818, 'learning_rate': 4.844461428229782e-06, 'epoch': 0.68} 68%|██████▊ | 4529/6638 [30:59<1:54:01, 3.24s/it] 68%|██████▊ | 4530/6638 [31:02<1:55:26, 3.29s/it] {'loss': 0.6221, 'grad_norm': 0.5439635453152547, 'learning_rate': 4.840280780629683e-06, 'epoch': 0.68} 68%|██████▊ | 4530/6638 [31:02<1:55:26, 3.29s/it] 68%|██████▊ | 4531/6638 [31:05<1:56:52, 3.33s/it] {'loss': 0.6032, 'grad_norm': 0.5140164724359639, 'learning_rate': 4.836101361667348e-06, 'epoch': 0.68} 68%|██████▊ | 4531/6638 [31:05<1:56:52, 3.33s/it] 68%|██████▊ | 4532/6638 [31:09<1:56:25, 3.32s/it] {'loss': 0.6559, 'grad_norm': 0.6348938310971186, 'learning_rate': 4.831923172337991e-06, 'epoch': 0.68} 68%|██████▊ | 4532/6638 [31:09<1:56:25, 3.32s/it] 68%|██████▊ | 4533/6638 [31:12<1:56:42, 3.33s/it] {'loss': 0.6654, 'grad_norm': 0.5706105093616531, 'learning_rate': 4.827746213636519e-06, 'epoch': 0.68} 68%|██████▊ | 4533/6638 [31:12<1:56:42, 3.33s/it] 68%|██████▊ | 4534/6638 [31:15<1:56:32, 3.32s/it] {'loss': 0.6524, 'grad_norm': 0.5502295389638404, 'learning_rate': 4.823570486557561e-06, 'epoch': 0.68} 68%|██████▊ | 4534/6638 [31:15<1:56:32, 3.32s/it] 68%|██████▊ | 4535/6638 [31:19<1:54:47, 3.28s/it] {'loss': 0.607, 'grad_norm': 0.5734933895220836, 'learning_rate': 4.8193959920954435e-06, 'epoch': 0.68} 68%|██████▊ | 4535/6638 [31:19<1:54:47, 3.28s/it] 68%|██████▊ | 4536/6638 [31:22<1:54:22, 3.26s/it] {'loss': 0.7067, 'grad_norm': 0.6226242831701556, 'learning_rate': 4.815222731244199e-06, 'epoch': 0.68} 68%|██████▊ | 4536/6638 [31:22<1:54:22, 3.26s/it] 68%|██████▊ | 4537/6638 [31:25<1:54:14, 3.26s/it] {'loss': 0.6437, 
'grad_norm': 0.5735518322185169, 'learning_rate': 4.8110507049975705e-06, 'epoch': 0.68} 68%|██████▊ | 4537/6638 [31:25<1:54:14, 3.26s/it] 68%|██████▊ | 4538/6638 [31:28<1:53:50, 3.25s/it] {'loss': 0.6362, 'grad_norm': 0.6147970180016977, 'learning_rate': 4.80687991434901e-06, 'epoch': 0.68} 68%|██████▊ | 4538/6638 [31:28<1:53:50, 3.25s/it] 68%|██████▊ | 4539/6638 [31:32<1:56:29, 3.33s/it] {'loss': 0.6487, 'grad_norm': 0.5986320983514333, 'learning_rate': 4.8027103602916615e-06, 'epoch': 0.68} 68%|██████▊ | 4539/6638 [31:32<1:56:29, 3.33s/it] 68%|██████▊ | 4540/6638 [31:35<1:56:22, 3.33s/it] {'loss': 0.6589, 'grad_norm': 0.5605298741442357, 'learning_rate': 4.798542043818391e-06, 'epoch': 0.68} 68%|██████▊ | 4540/6638 [31:35<1:56:22, 3.33s/it] 68%|██████▊ | 4541/6638 [31:38<1:55:39, 3.31s/it] {'loss': 0.6291, 'grad_norm': 0.5571069855701517, 'learning_rate': 4.794374965921754e-06, 'epoch': 0.68} 68%|██████▊ | 4541/6638 [31:38<1:55:39, 3.31s/it] 68%|██████▊ | 4542/6638 [31:42<1:56:29, 3.33s/it] {'loss': 0.685, 'grad_norm': 0.6465537181174057, 'learning_rate': 4.790209127594032e-06, 'epoch': 0.68} 68%|██████▊ | 4542/6638 [31:42<1:56:29, 3.33s/it] 68%|██████▊ | 4543/6638 [31:45<1:57:52, 3.38s/it] {'loss': 0.6717, 'grad_norm': 0.6289266299141641, 'learning_rate': 4.786044529827191e-06, 'epoch': 0.68} 68%|██████▊ | 4543/6638 [31:45<1:57:52, 3.38s/it] 68%|██████▊ | 4544/6638 [31:49<1:58:46, 3.40s/it] {'loss': 0.6309, 'grad_norm': 0.6513131470890856, 'learning_rate': 4.781881173612906e-06, 'epoch': 0.68} 68%|██████▊ | 4544/6638 [31:49<1:58:46, 3.40s/it] 68%|██████▊ | 4545/6638 [31:52<1:58:58, 3.41s/it] {'loss': 0.6661, 'grad_norm': 0.6142054341886785, 'learning_rate': 4.777719059942566e-06, 'epoch': 0.68} 68%|██████▊ | 4545/6638 [31:52<1:58:58, 3.41s/it] 68%|██████▊ | 4546/6638 [31:55<1:56:26, 3.34s/it] {'loss': 0.6575, 'grad_norm': 0.5987977299664704, 'learning_rate': 4.773558189807255e-06, 'epoch': 0.68} 68%|██████▊ | 4546/6638 [31:55<1:56:26, 3.34s/it] 68%|██████▊ | 
4547/6638 [31:59<1:56:15, 3.34s/it] {'loss': 0.6425, 'grad_norm': 0.5977942228421141, 'learning_rate': 4.769398564197768e-06, 'epoch': 0.68} 68%|██████▊ | 4547/6638 [31:59<1:56:15, 3.34s/it] 69%|██████▊ | 4548/6638 [32:02<1:55:32, 3.32s/it] {'loss': 0.653, 'grad_norm': 0.6502146701404702, 'learning_rate': 4.765240184104592e-06, 'epoch': 0.69} 69%|██████▊ | 4548/6638 [32:02<1:55:32, 3.32s/it] 69%|██████▊ | 4549/6638 [32:05<1:54:00, 3.27s/it] {'loss': 0.6073, 'grad_norm': 0.5891740032153289, 'learning_rate': 4.7610830505179295e-06, 'epoch': 0.69} 69%|██████▊ | 4549/6638 [32:05<1:54:00, 3.27s/it] AutoResumeHook: Checking whether to suspend... (interleaved output from ranks 0-7) 69%|██████▊ | 4550/6638 [32:08<1:54:12, 3.28s/it] 
{'loss': 0.6569, 'grad_norm': 0.5948060039076123, 'learning_rate': 4.756927164427685e-06, 'epoch': 0.69} 69%|██████▊ | 4550/6638 [32:08<1:54:12, 3.28s/it] 69%|██████▊ | 4551/6638 [32:12<1:54:01, 3.28s/it] {'loss': 0.6205, 'grad_norm': 0.5564543888689412, 'learning_rate': 4.752772526823453e-06, 'epoch': 0.69} 69%|██████▊ | 4551/6638 [32:12<1:54:01, 3.28s/it] 69%|██████▊ | 4552/6638 [32:15<1:54:13, 3.29s/it] {'loss': 0.6223, 'grad_norm': 0.588672360365074, 'learning_rate': 4.748619138694548e-06, 'epoch': 0.69} 69%|██████▊ | 4552/6638 [32:15<1:54:13, 3.29s/it] 69%|██████▊ | 4553/6638 [32:18<1:53:59, 3.28s/it] {'loss': 0.6594, 'grad_norm': 0.5885609847047052, 'learning_rate': 4.74446700102998e-06, 'epoch': 0.69} 69%|██████▊ | 4553/6638 [32:18<1:53:59, 3.28s/it] 69%|██████▊ | 4554/6638 [32:21<1:53:20, 3.26s/it] {'loss': 0.6264, 'grad_norm': 0.5862972235567164, 'learning_rate': 4.7403161148184515e-06, 'epoch': 0.69} 69%|██████▊ | 4554/6638 [32:21<1:53:20, 3.26s/it] 69%|██████▊ | 4555/6638 [32:25<1:53:57, 3.28s/it] {'loss': 0.6612, 'grad_norm': 0.730896193143443, 'learning_rate': 4.736166481048388e-06, 'epoch': 0.69} 69%|██████▊ | 4555/6638 [32:25<1:53:57, 3.28s/it] 69%|██████▊ | 4556/6638 [32:28<1:52:52, 3.25s/it] {'loss': 0.592, 'grad_norm': 0.6093838260976762, 'learning_rate': 4.732018100707892e-06, 'epoch': 0.69} 69%|██████▊ | 4556/6638 [32:28<1:52:52, 3.25s/it] 69%|██████▊ | 4557/6638 [32:31<1:52:10, 3.23s/it] {'loss': 0.623, 'grad_norm': 0.6324887246060251, 'learning_rate': 4.727870974784787e-06, 'epoch': 0.69} 69%|██████▊ | 4557/6638 [32:31<1:52:10, 3.23s/it] 69%|██████▊ | 4558/6638 [32:34<1:52:34, 3.25s/it] {'loss': 0.6416, 'grad_norm': 0.6115804277509346, 'learning_rate': 4.723725104266594e-06, 'epoch': 0.69} 69%|██████▊ | 4558/6638 [32:34<1:52:34, 3.25s/it] 69%|██████▊ | 4559/6638 [32:38<1:53:05, 3.26s/it] {'loss': 0.6104, 'grad_norm': 0.5439869497381614, 'learning_rate': 4.719580490140525e-06, 'epoch': 0.69} 69%|██████▊ | 4559/6638 [32:38<1:53:05, 3.26s/it] 
69%|██████▊ | 4560/6638 [32:41<1:52:40, 3.25s/it] {'loss': 0.5831, 'grad_norm': 0.576633234464236, 'learning_rate': 4.715437133393503e-06, 'epoch': 0.69} 69%|██████▊ | 4560/6638 [32:41<1:52:40, 3.25s/it] 69%|██████▊ | 4561/6638 [32:44<1:52:00, 3.24s/it] {'loss': 0.6559, 'grad_norm': 0.6452992818715159, 'learning_rate': 4.711295035012153e-06, 'epoch': 0.69} 69%|██████▊ | 4561/6638 [32:44<1:52:00, 3.24s/it] 69%|██████▊ | 4562/6638 [32:47<1:51:58, 3.24s/it] {'loss': 0.6318, 'grad_norm': 0.6983379332902983, 'learning_rate': 4.707154195982788e-06, 'epoch': 0.69} 69%|██████▊ | 4562/6638 [32:47<1:51:58, 3.24s/it] 69%|██████▊ | 4563/6638 [32:51<1:52:29, 3.25s/it] {'loss': 0.63, 'grad_norm': 0.5801994061487162, 'learning_rate': 4.703014617291436e-06, 'epoch': 0.69} 69%|██████▊ | 4563/6638 [32:51<1:52:29, 3.25s/it] 69%|██████▉ | 4564/6638 [32:54<1:51:36, 3.23s/it] {'loss': 0.6035, 'grad_norm': 0.5859341789820068, 'learning_rate': 4.698876299923807e-06, 'epoch': 0.69} 69%|██████▉ | 4564/6638 [32:54<1:51:36, 3.23s/it] 69%|██████▉ | 4565/6638 [32:57<1:52:26, 3.25s/it] {'loss': 0.682, 'grad_norm': 0.6225234608130751, 'learning_rate': 4.694739244865335e-06, 'epoch': 0.69} 69%|██████▉ | 4565/6638 [32:57<1:52:26, 3.25s/it] 69%|██████▉ | 4566/6638 [33:00<1:52:27, 3.26s/it] {'loss': 0.6218, 'grad_norm': 0.5385353058127919, 'learning_rate': 4.690603453101134e-06, 'epoch': 0.69} 69%|██████▉ | 4566/6638 [33:00<1:52:27, 3.26s/it] 69%|██████▉ | 4567/6638 [33:04<1:51:36, 3.23s/it] {'loss': 0.6275, 'grad_norm': 0.6082707484286983, 'learning_rate': 4.686468925616021e-06, 'epoch': 0.69} 69%|██████▉ | 4567/6638 [33:04<1:51:36, 3.23s/it] 69%|██████▉ | 4568/6638 [33:07<1:53:48, 3.30s/it] {'loss': 0.6326, 'grad_norm': 0.5328980647459903, 'learning_rate': 4.6823356633945136e-06, 'epoch': 0.69} 69%|██████▉ | 4568/6638 [33:07<1:53:48, 3.30s/it] 69%|██████▉ | 4569/6638 [33:10<1:53:09, 3.28s/it] {'loss': 0.6265, 'grad_norm': 0.5637397924350044, 'learning_rate': 4.678203667420832e-06, 'epoch': 0.69} 
69%|██████▉ | 4569/6638 [33:10<1:53:09, 3.28s/it] 69%|██████▉ | 4570/6638 [33:14<1:55:02, 3.34s/it] {'loss': 0.6418, 'grad_norm': 0.5497913793729812, 'learning_rate': 4.674072938678894e-06, 'epoch': 0.69} 69%|██████▉ | 4570/6638 [33:14<1:55:02, 3.34s/it] 69%|██████▉ | 4571/6638 [33:17<1:54:35, 3.33s/it] {'loss': 0.6869, 'grad_norm': 0.6622239263006957, 'learning_rate': 4.669943478152305e-06, 'epoch': 0.69} 69%|██████▉ | 4571/6638 [33:17<1:54:35, 3.33s/it] 69%|██████▉ | 4572/6638 [33:20<1:52:43, 3.27s/it] {'loss': 0.6511, 'grad_norm': 0.6223144490500898, 'learning_rate': 4.665815286824381e-06, 'epoch': 0.69} 69%|██████▉ | 4572/6638 [33:20<1:52:43, 3.27s/it] 69%|██████▉ | 4573/6638 [33:24<1:55:08, 3.35s/it] {'loss': 0.6615, 'grad_norm': 0.6584585669431199, 'learning_rate': 4.661688365678135e-06, 'epoch': 0.69} 69%|██████▉ | 4573/6638 [33:24<1:55:08, 3.35s/it] 69%|██████▉ | 4574/6638 [33:27<1:54:03, 3.32s/it] {'loss': 0.6518, 'grad_norm': 0.7067685337447733, 'learning_rate': 4.657562715696266e-06, 'epoch': 0.69} 69%|██████▉ | 4574/6638 [33:27<1:54:03, 3.32s/it] 69%|██████▉ | 4575/6638 [33:30<1:53:33, 3.30s/it] {'loss': 0.6291, 'grad_norm': 0.5496478491255385, 'learning_rate': 4.653438337861182e-06, 'epoch': 0.69} 69%|██████▉ | 4575/6638 [33:30<1:53:33, 3.30s/it] 69%|██████▉ | 4576/6638 [33:34<1:55:15, 3.35s/it] {'loss': 0.7383, 'grad_norm': 0.7172997024773935, 'learning_rate': 4.649315233154988e-06, 'epoch': 0.69} 69%|██████▉ | 4576/6638 [33:34<1:55:15, 3.35s/it] 69%|██████▉ | 4577/6638 [33:37<1:53:26, 3.30s/it] {'loss': 0.644, 'grad_norm': 0.6389772541410005, 'learning_rate': 4.645193402559473e-06, 'epoch': 0.69} 69%|██████▉ | 4577/6638 [33:37<1:53:26, 3.30s/it] 69%|██████▉ | 4578/6638 [33:40<1:52:46, 3.28s/it] {'loss': 0.6689, 'grad_norm': 0.6148282083754012, 'learning_rate': 4.641072847056142e-06, 'epoch': 0.69} 69%|██████▉ | 4578/6638 [33:40<1:52:46, 3.28s/it] 69%|██████▉ | 4579/6638 [33:44<1:53:50, 3.32s/it] {'loss': 0.6644, 'grad_norm': 0.6950684121851165, 
'learning_rate': 4.636953567626176e-06, 'epoch': 0.69} 69%|██████▉ | 4579/6638 [33:44<1:53:50, 3.32s/it] 69%|██████▉ | 4580/6638 [33:47<1:54:07, 3.33s/it] {'loss': 0.6898, 'grad_norm': 0.6525680627756094, 'learning_rate': 4.6328355652504686e-06, 'epoch': 0.69} 69%|██████▉ | 4580/6638 [33:47<1:54:07, 3.33s/it] 69%|██████▉ | 4581/6638 [33:50<1:55:14, 3.36s/it] {'loss': 0.6732, 'grad_norm': 0.620239882683244, 'learning_rate': 4.628718840909604e-06, 'epoch': 0.69} 69%|██████▉ | 4581/6638 [33:50<1:55:14, 3.36s/it] 69%|██████▉ | 4582/6638 [33:54<1:53:44, 3.32s/it] {'loss': 0.6482, 'grad_norm': 0.5936034700713024, 'learning_rate': 4.624603395583854e-06, 'epoch': 0.69} 69%|██████▉ | 4582/6638 [33:54<1:53:44, 3.32s/it] 69%|██████▉ | 4583/6638 [33:57<1:51:50, 3.27s/it] {'loss': 0.6347, 'grad_norm': 0.5806966947775536, 'learning_rate': 4.620489230253198e-06, 'epoch': 0.69} 69%|██████▉ | 4583/6638 [33:57<1:51:50, 3.27s/it] 69%|██████▉ | 4584/6638 [34:00<1:52:35, 3.29s/it] {'loss': 0.6654, 'grad_norm': 0.5796437685498608, 'learning_rate': 4.616376345897303e-06, 'epoch': 0.69} 69%|██████▉ | 4584/6638 [34:00<1:52:35, 3.29s/it] 69%|██████▉ | 4585/6638 [34:03<1:51:44, 3.27s/it] {'loss': 0.6441, 'grad_norm': 0.6220891341779629, 'learning_rate': 4.612264743495539e-06, 'epoch': 0.69} 69%|██████▉ | 4585/6638 [34:03<1:51:44, 3.27s/it] 69%|██████▉ | 4586/6638 [34:07<1:52:30, 3.29s/it] {'loss': 0.6371, 'grad_norm': 0.5715746812761383, 'learning_rate': 4.60815442402696e-06, 'epoch': 0.69} 69%|██████▉ | 4586/6638 [34:07<1:52:30, 3.29s/it] 69%|██████▉ | 4587/6638 [34:10<1:53:29, 3.32s/it] {'loss': 0.646, 'grad_norm': 0.5285792833299414, 'learning_rate': 4.604045388470314e-06, 'epoch': 0.69} 69%|██████▉ | 4587/6638 [34:10<1:53:29, 3.32s/it] 69%|██████▉ | 4588/6638 [34:13<1:52:31, 3.29s/it] {'loss': 0.6154, 'grad_norm': 0.5604278919232287, 'learning_rate': 4.599937637804063e-06, 'epoch': 0.69} 69%|██████▉ | 4588/6638 [34:13<1:52:31, 3.29s/it]Token indices sequence length is longer than the 
specified maximum sequence length for this model (4345 > 4096). Running this sequence through the model will result in indexing errors 69%|██████▉ | 4589/6638 [34:16<1:51:06, 3.25s/it] {'loss': 0.6118, 'grad_norm': 0.5955832455026644, 'learning_rate': 4.595831173006335e-06, 'epoch': 0.69} 69%|██████▉ | 4589/6638 [34:16<1:51:06, 3.25s/it] 69%|██████▉ | 4590/6638 [34:20<1:50:44, 3.24s/it] {'loss': 0.6345, 'grad_norm': 0.57691685885883, 'learning_rate': 4.5917259950549775e-06, 'epoch': 0.69} 69%|██████▉ | 4590/6638 [34:20<1:50:44, 3.24s/it] 69%|██████▉ | 4591/6638 [34:23<1:51:01, 3.25s/it] {'loss': 0.6203, 'grad_norm': 0.6187948803262753, 'learning_rate': 4.587622104927511e-06, 'epoch': 0.69} 69%|██████▉ | 4591/6638 [34:23<1:51:01, 3.25s/it] 69%|██████▉ | 4592/6638 [34:26<1:51:13, 3.26s/it] {'loss': 0.6481, 'grad_norm': 0.66863026739009, 'learning_rate': 4.583519503601159e-06, 'epoch': 0.69} 69%|██████▉ | 4592/6638 [34:26<1:51:13, 3.26s/it] 69%|██████▉ | 4593/6638 [34:29<1:51:08, 3.26s/it] {'loss': 0.6443, 'grad_norm': 0.6076941100517935, 'learning_rate': 4.579418192052844e-06, 'epoch': 0.69} 69%|██████▉ | 4593/6638 [34:29<1:51:08, 3.26s/it] 69%|██████▉ | 4594/6638 [34:33<1:51:49, 3.28s/it] {'loss': 0.6452, 'grad_norm': 0.6179008766602995, 'learning_rate': 4.5753181712591675e-06, 'epoch': 0.69} 69%|██████▉ | 4594/6638 [34:33<1:51:49, 3.28s/it] 69%|██████▉ | 4595/6638 [34:36<1:51:07, 3.26s/it] {'loss': 0.6378, 'grad_norm': 0.614858733061887, 'learning_rate': 4.571219442196433e-06, 'epoch': 0.69} 69%|██████▉ | 4595/6638 [34:36<1:51:07, 3.26s/it] 69%|██████▉ | 4596/6638 [34:39<1:53:23, 3.33s/it] {'loss': 0.6653, 'grad_norm': 0.595782440307248, 'learning_rate': 4.567122005840639e-06, 'epoch': 0.69} 69%|██████▉ | 4596/6638 [34:39<1:53:23, 3.33s/it] 69%|██████▉ | 4597/6638 [34:43<1:52:47, 3.32s/it] {'loss': 0.617, 'grad_norm': 0.5736077158345351, 'learning_rate': 4.563025863167461e-06, 'epoch': 0.69} 69%|██████▉ | 4597/6638 [34:43<1:52:47, 3.32s/it] 69%|██████▉ | 4598/6638 
[34:46<1:51:45, 3.29s/it] {'loss': 0.661, 'grad_norm': 0.6163308917005301, 'learning_rate': 4.558931015152288e-06, 'epoch': 0.69} 69%|██████▉ | 4598/6638 [34:46<1:51:45, 3.29s/it] 69%|██████▉ | 4599/6638 [34:49<1:52:05, 3.30s/it] {'loss': 0.6307, 'grad_norm': 0.6479167528407923, 'learning_rate': 4.55483746277018e-06, 'epoch': 0.69} 69%|██████▉ | 4599/6638 [34:49<1:52:05, 3.30s/it] AutoResumeHook: Checking whether to suspend... (interleaved output from ranks 0-7) 69%|██████▉ | 4600/6638 [34:53<1:51:47, 3.29s/it] {'loss': 0.6414, 'grad_norm': 0.6467722804365335, 'learning_rate': 4.550745206995901e-06, 'epoch': 0.69} 69%|██████▉ | 4600/6638 [34:53<1:51:47, 3.29s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 69%|██████▉ | 4601/6638 [35:09<4:06:23, 7.26s/it] {'loss': 0.6532, 'grad_norm': 0.5633054184549432, 'learning_rate': 4.54665424880391e-06, 'epoch': 0.69} 69%|██████▉ | 4601/6638 [35:09<4:06:23, 7.26s/it] 69%|██████▉ | 4602/6638 [35:12<3:25:19, 6.05s/it] {'loss': 0.6493, 'grad_norm': 0.5888137652630362, 'learning_rate': 4.54256458916834e-06, 'epoch': 0.69} 69%|██████▉ | 4602/6638 [35:12<3:25:19, 6.05s/it] 69%|██████▉ | 4603/6638 [35:16<2:57:49, 5.24s/it] {'loss': 0.6401, 'grad_norm': 0.5794862749636462, 'learning_rate': 4.53847622906303e-06, 'epoch': 0.69} 69%|██████▉ | 4603/6638 [35:16<2:57:49, 5.24s/it] 69%|██████▉ | 4604/6638 [35:19<2:37:19, 4.64s/it] {'loss': 0.6163, 'grad_norm': 0.6853966492926579, 'learning_rate': 4.5343891694615094e-06, 'epoch': 0.69} 69%|██████▉ | 4604/6638 [35:19<2:37:19, 4.64s/it]/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/model/llava_arch.py:500: UserWarning: Truncating sequences to `model_max_length` (4096). 
warnings.warn(f"Truncating sequences to `model_max_length` ({self.tokenizer.model_max_length}).") [2025-05-27 22:38:03,381] [WARNING] [stage3.py:1850:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time 69%|██████▉ | 4605/6638 [35:23<2:36:07, 4.61s/it] {'loss': 0.6401, 'grad_norm': 0.5737436358078047, 'learning_rate': 4.530303411336983e-06, 'epoch': 0.69} 69%|██████▉ | 4605/6638 [35:23<2:36:07, 4.61s/it] 69%|██████▉ | 4606/6638 [35:27<2:22:42, 4.21s/it] {'loss': 0.6568, 'grad_norm': 0.629829594489583, 'learning_rate': 4.526218955662361e-06, 'epoch': 0.69} 69%|██████▉ | 4606/6638 [35:27<2:22:42, 4.21s/it] 69%|██████▉ | 4607/6638 [35:30<2:12:23, 3.91s/it] {'loss': 0.6822, 'grad_norm': 0.6405348304931355, 'learning_rate': 4.522135803410238e-06, 'epoch': 0.69} 69%|██████▉ | 4607/6638 [35:30<2:12:23, 3.91s/it] 69%|██████▉ | 4608/6638 [35:33<2:05:21, 3.71s/it] {'loss': 0.6431, 'grad_norm': 0.5867316223737478, 'learning_rate': 4.518053955552903e-06, 'epoch': 0.69} 69%|██████▉ | 4608/6638 [35:33<2:05:21, 3.71s/it] 69%|██████▉ | 4609/6638 [35:37<2:02:23, 3.62s/it] {'loss': 0.6293, 'grad_norm': 0.573013552804018, 'learning_rate': 4.513973413062326e-06, 'epoch': 0.69} 69%|██████▉ | 4609/6638 [35:37<2:02:23, 3.62s/it] 69%|██████▉ | 4610/6638 [35:40<1:58:29, 3.51s/it] {'loss': 0.7089, 'grad_norm': 0.6709361777409769, 'learning_rate': 4.509894176910161e-06, 'epoch': 0.69} 69%|██████▉ | 4610/6638 [35:40<1:58:29, 3.51s/it] 69%|██████▉ | 4611/6638 [35:43<1:56:19, 3.44s/it] {'loss': 0.6551, 'grad_norm': 0.6062400852781089, 'learning_rate': 4.505816248067778e-06, 'epoch': 0.69} 69%|██████▉ | 4611/6638 [35:43<1:56:19, 
 69%|██████▉ | 4612/6638 [35:46<1:53:51, 3.37s/it] {'loss': 0.618, 'grad_norm': 0.6016462674167294, 'learning_rate': 4.501739627506203e-06, 'epoch': 0.69}
[... steps 4613–4649 condensed (duplicate tqdm echoes removed): loss fluctuates between ~0.60 and ~0.71, grad_norm ~0.49–0.69, learning rate decays from 4.50e-06 to 4.35e-06 at ~3.3 s/it ...]
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 70%|███████ | 4650/6638 [37:52<1:50:00, 3.32s/it] {'loss': 0.6928, 'grad_norm': 0.6523273297442697, 'learning_rate': 4.347807044409874e-06, 'epoch': 0.7}
[... steps 4651–4699 condensed: loss ~0.60–0.71, learning rate decays to 4.15e-06; step times rise to ~4–5 s/it from step 4689 onward ...]
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 71%|███████ | 4700/6638 [40:50<2:14:30, 4.16s/it] {'loss': 0.6115, 'grad_norm': 0.5749145692398162, 'learning_rate': 4.148233480300709e-06, 'epoch': 0.71}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4700/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4700/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4700/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
 71%|███████ | 4701/6638 [41:08<4:26:10, 8.24s/it] {'loss': 0.6416, 'grad_norm': 0.5977130559153471, 'learning_rate': 4.144277147289125e-06, 'epoch': 0.71}
[... steps 4702–4749 condensed: step time recovers to ~3.3 s/it after the checkpoint write; loss ~0.59–0.72, learning rate decays to 3.96e-06 ...]
AutoResumeHook (ranks 0–7): Checking whether to suspend...
 72%|███████▏ | 4750/6638 [43:53<1:42:47, 3.27s/it] {'loss': 0.6276, 'grad_norm': 0.5793474223688159, 'learning_rate': 3.952143315444506e-06, 'epoch': 0.72}
[... steps 4751–4797 condensed: loss ~0.59–0.71, learning rate decays to 3.77e-06 ...]
 72%|███████▏ | 4798/6638 [46:31<1:40:58, 3.29s/it] {'loss': 0.6705, 'grad_norm':
0.6043935009501462, 'learning_rate': 3.767282311686069e-06, 'epoch': 0.72} 72%|███████▏ | 4798/6638 [46:31<1:40:58, 3.29s/it] 72%|███████▏ | 4799/6638 [46:35<1:41:05, 3.30s/it] {'loss': 0.609, 'grad_norm': 0.5347539149076873, 'learning_rate': 3.7634670518398764e-06, 'epoch': 0.72} 72%|███████▏ | 4799/6638 [46:35<1:41:05, 3.30s/it]3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 72%|███████▏ | 4800/6638 [46:38<1:40:14, 3.27s/it]7 AutoResumeHook: Checking whether to suspend... {'loss': 0.6201, 'grad_norm': 0.564631469755337, 'learning_rate': 3.759653277043469e-06, 'epoch': 0.72} 72%|███████▏ | 4800/6638 [46:38<1:40:14, 3.27s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. 
This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 72%|███████▏ | 4801/6638 [46:55<3:43:46, 7.31s/it] {'loss': 0.6576, 'grad_norm': 0.6132443740275679, 'learning_rate': 3.7558409882049897e-06, 'epoch': 0.72} 72%|███████▏ | 4801/6638 [46:55<3:43:46, 7.31s/it] 72%|███████▏ | 4802/6638 [46:58<3:06:17, 6.09s/it] {'loss': 0.6106, 'grad_norm': 0.5718682378837646, 'learning_rate': 3.7520301862322207e-06, 'epoch': 0.72} 72%|███████▏ | 4802/6638 [46:58<3:06:17, 6.09s/it] 72%|███████▏ | 4803/6638 [47:01<2:41:14, 5.27s/it] {'loss': 0.6514, 'grad_norm': 0.7186963188741096, 'learning_rate': 3.7482208720325886e-06, 'epoch': 0.72} 72%|███████▏ | 4803/6638 [47:01<2:41:14, 5.27s/it] 72%|███████▏ | 4804/6638 [47:04<2:22:40, 4.67s/it] {'loss': 0.6154, 'grad_norm': 0.5739946105307934, 'learning_rate': 3.7444130465131836e-06, 'epoch': 0.72} 72%|███████▏ | 4804/6638 [47:04<2:22:40, 4.67s/it] 72%|███████▏ | 4805/6638 [47:08<2:09:58, 4.25s/it] {'loss': 0.6301, 'grad_norm': 0.6026167838299119, 'learning_rate': 3.740606710580721e-06, 'epoch': 0.72} 72%|███████▏ | 4805/6638 [47:08<2:09:58, 4.25s/it] 72%|███████▏ | 4806/6638 [47:11<1:59:37, 3.92s/it] {'loss': 0.6526, 'grad_norm': 0.6094216241758607, 'learning_rate': 3.7368018651415738e-06, 'epoch': 0.72} 72%|███████▏ | 4806/6638 [47:11<1:59:37, 3.92s/it] 72%|███████▏ | 4807/6638 [47:14<1:53:56, 3.73s/it] {'loss': 0.6499, 
'grad_norm': 0.6182637383194867, 'learning_rate': 3.732998511101752e-06, 'epoch': 0.72} 72%|███████▏ | 4807/6638 [47:14<1:53:56, 3.73s/it] 72%|███████▏ | 4808/6638 [47:18<1:50:15, 3.62s/it] {'loss': 0.6281, 'grad_norm': 0.5630343518155598, 'learning_rate': 3.729196649366914e-06, 'epoch': 0.72} 72%|███████▏ | 4808/6638 [47:18<1:50:15, 3.62s/it] 72%|███████▏ | 4809/6638 [47:21<1:47:16, 3.52s/it] {'loss': 0.6663, 'grad_norm': 0.6251648752438228, 'learning_rate': 3.725396280842369e-06, 'epoch': 0.72} 72%|███████▏ | 4809/6638 [47:21<1:47:16, 3.52s/it] 72%|███████▏ | 4810/6638 [47:24<1:44:14, 3.42s/it] {'loss': 0.6608, 'grad_norm': 0.610152258786723, 'learning_rate': 3.721597406433062e-06, 'epoch': 0.72} 72%|███████▏ | 4810/6638 [47:24<1:44:14, 3.42s/it] 72%|███████▏ | 4811/6638 [47:27<1:42:21, 3.36s/it] {'loss': 0.6693, 'grad_norm': 0.623600678122956, 'learning_rate': 3.7178000270435765e-06, 'epoch': 0.72} 72%|███████▏ | 4811/6638 [47:27<1:42:21, 3.36s/it] 72%|███████▏ | 4812/6638 [47:31<1:42:20, 3.36s/it] {'loss': 0.6012, 'grad_norm': 0.5386097563478505, 'learning_rate': 3.7140041435781616e-06, 'epoch': 0.72} 72%|███████▏ | 4812/6638 [47:31<1:42:20, 3.36s/it] 73%|███████▎ | 4813/6638 [47:34<1:41:39, 3.34s/it] {'loss': 0.6386, 'grad_norm': 0.5980710979170977, 'learning_rate': 3.7102097569406893e-06, 'epoch': 0.73} 73%|███████▎ | 4813/6638 [47:34<1:41:39, 3.34s/it] 73%|███████▎ | 4814/6638 [47:37<1:43:27, 3.40s/it] {'loss': 0.6617, 'grad_norm': 0.6834272911106406, 'learning_rate': 3.706416868034687e-06, 'epoch': 0.73} 73%|███████▎ | 4814/6638 [47:37<1:43:27, 3.40s/it] 73%|███████▎ | 4815/6638 [47:41<1:42:53, 3.39s/it] {'loss': 0.6371, 'grad_norm': 0.6362757713554309, 'learning_rate': 3.7026254777633164e-06, 'epoch': 0.73} 73%|███████▎ | 4815/6638 [47:41<1:42:53, 3.39s/it] 73%|███████▎ | 4816/6638 [47:44<1:41:03, 3.33s/it] {'loss': 0.6587, 'grad_norm': 0.5884730752235714, 'learning_rate': 3.698835587029389e-06, 'epoch': 0.73} 73%|███████▎ | 4816/6638 [47:44<1:41:03, 
3.33s/it] 73%|███████▎ | 4817/6638 [47:47<1:41:06, 3.33s/it] {'loss': 0.6902, 'grad_norm': 0.6846261829920376, 'learning_rate': 3.6950471967353642e-06, 'epoch': 0.73} 73%|███████▎ | 4817/6638 [47:47<1:41:06, 3.33s/it] 73%|███████▎ | 4818/6638 [47:51<1:40:19, 3.31s/it] {'loss': 0.6638, 'grad_norm': 0.6308898734203886, 'learning_rate': 3.6912603077833274e-06, 'epoch': 0.73} 73%|███████▎ | 4818/6638 [47:51<1:40:19, 3.31s/it] 73%|███████▎ | 4819/6638 [47:54<1:40:36, 3.32s/it] {'loss': 0.6515, 'grad_norm': 0.6560614145311191, 'learning_rate': 3.6874749210750207e-06, 'epoch': 0.73} 73%|███████▎ | 4819/6638 [47:54<1:40:36, 3.32s/it] 73%|███████▎ | 4820/6638 [47:57<1:40:08, 3.31s/it] {'loss': 0.6368, 'grad_norm': 0.5696533434674188, 'learning_rate': 3.6836910375118294e-06, 'epoch': 0.73} 73%|███████▎ | 4820/6638 [47:57<1:40:08, 3.31s/it] 73%|███████▎ | 4821/6638 [48:01<1:40:12, 3.31s/it] {'loss': 0.6114, 'grad_norm': 0.5434357208299432, 'learning_rate': 3.6799086579947674e-06, 'epoch': 0.73} 73%|███████▎ | 4821/6638 [48:01<1:40:12, 3.31s/it] 73%|███████▎ | 4822/6638 [48:04<1:39:51, 3.30s/it] {'loss': 0.6072, 'grad_norm': 0.5139142565634449, 'learning_rate': 3.6761277834245023e-06, 'epoch': 0.73} 73%|███████▎ | 4822/6638 [48:04<1:39:51, 3.30s/it] 73%|███████▎ | 4823/6638 [48:07<1:41:51, 3.37s/it] {'loss': 0.6264, 'grad_norm': 0.58225614566743, 'learning_rate': 3.672348414701341e-06, 'epoch': 0.73} 73%|███████▎ | 4823/6638 [48:07<1:41:51, 3.37s/it] 73%|███████▎ | 4824/6638 [48:11<1:41:18, 3.35s/it] {'loss': 0.6634, 'grad_norm': 0.6270127055546907, 'learning_rate': 3.6685705527252337e-06, 'epoch': 0.73} 73%|███████▎ | 4824/6638 [48:11<1:41:18, 3.35s/it] 73%|███████▎ | 4825/6638 [48:14<1:39:54, 3.31s/it] {'loss': 0.6274, 'grad_norm': 0.5950156361954594, 'learning_rate': 3.6647941983957647e-06, 'epoch': 0.73} 73%|███████▎ | 4825/6638 [48:14<1:39:54, 3.31s/it] 73%|███████▎ | 4826/6638 [48:17<1:40:18, 3.32s/it] {'loss': 0.6426, 'grad_norm': 0.5793160045731715, 'learning_rate': 
3.661019352612162e-06, 'epoch': 0.73} 73%|███████▎ | 4826/6638 [48:17<1:40:18, 3.32s/it] 73%|███████▎ | 4827/6638 [48:20<1:39:19, 3.29s/it] {'loss': 0.645, 'grad_norm': 0.5961782015009275, 'learning_rate': 3.657246016273297e-06, 'epoch': 0.73} 73%|███████▎ | 4827/6638 [48:20<1:39:19, 3.29s/it] 73%|███████▎ | 4828/6638 [48:24<1:39:39, 3.30s/it] {'loss': 0.6381, 'grad_norm': 0.5483509612338087, 'learning_rate': 3.6534741902776816e-06, 'epoch': 0.73} 73%|███████▎ | 4828/6638 [48:24<1:39:39, 3.30s/it] 73%|███████▎ | 4829/6638 [48:27<1:38:57, 3.28s/it] {'loss': 0.6364, 'grad_norm': 0.5950700267192727, 'learning_rate': 3.6497038755234703e-06, 'epoch': 0.73} 73%|███████▎ | 4829/6638 [48:27<1:38:57, 3.28s/it] 73%|███████▎ | 4830/6638 [48:30<1:38:53, 3.28s/it] {'loss': 0.6475, 'grad_norm': 0.567014320616211, 'learning_rate': 3.6459350729084473e-06, 'epoch': 0.73} 73%|███████▎ | 4830/6638 [48:30<1:38:53, 3.28s/it] 73%|███████▎ | 4831/6638 [48:34<1:39:30, 3.30s/it] {'loss': 0.6348, 'grad_norm': 0.6124392065875399, 'learning_rate': 3.6421677833300474e-06, 'epoch': 0.73} 73%|███████▎ | 4831/6638 [48:34<1:39:30, 3.30s/it] 73%|███████▎ | 4832/6638 [48:37<1:40:45, 3.35s/it] {'loss': 0.6538, 'grad_norm': 0.6454193582641201, 'learning_rate': 3.6384020076853453e-06, 'epoch': 0.73} 73%|███████▎ | 4832/6638 [48:37<1:40:45, 3.35s/it] 73%|███████▎ | 4833/6638 [48:40<1:39:48, 3.32s/it] {'loss': 0.6881, 'grad_norm': 0.6381006018556225, 'learning_rate': 3.634637746871045e-06, 'epoch': 0.73} 73%|███████▎ | 4833/6638 [48:40<1:39:48, 3.32s/it] 73%|███████▎ | 4834/6638 [48:44<1:39:04, 3.30s/it] {'loss': 0.6383, 'grad_norm': 0.5716652062972422, 'learning_rate': 3.6308750017834984e-06, 'epoch': 0.73} 73%|███████▎ | 4834/6638 [48:44<1:39:04, 3.30s/it] 73%|███████▎ | 4835/6638 [48:47<1:39:18, 3.30s/it] {'loss': 0.6545, 'grad_norm': 0.5648145252856824, 'learning_rate': 3.6271137733186977e-06, 'epoch': 0.73} 73%|███████▎ | 4835/6638 [48:47<1:39:18, 3.30s/it] 73%|███████▎ | 4836/6638 [48:50<1:38:23, 
3.28s/it] {'loss': 0.6665, 'grad_norm': 0.6313142922651922, 'learning_rate': 3.6233540623722662e-06, 'epoch': 0.73} 73%|███████▎ | 4836/6638 [48:50<1:38:23, 3.28s/it] 73%|███████▎ | 4837/6638 [48:53<1:38:23, 3.28s/it] {'loss': 0.6249, 'grad_norm': 0.5856737072243342, 'learning_rate': 3.619595869839474e-06, 'epoch': 0.73} 73%|███████▎ | 4837/6638 [48:53<1:38:23, 3.28s/it] 73%|███████▎ | 4838/6638 [48:57<1:37:16, 3.24s/it] {'loss': 0.611, 'grad_norm': 0.6345508188219477, 'learning_rate': 3.6158391966152175e-06, 'epoch': 0.73} 73%|███████▎ | 4838/6638 [48:57<1:37:16, 3.24s/it] 73%|███████▎ | 4839/6638 [49:00<1:37:29, 3.25s/it] {'loss': 0.7029, 'grad_norm': 0.7498412845592278, 'learning_rate': 3.6120840435940517e-06, 'epoch': 0.73} 73%|███████▎ | 4839/6638 [49:00<1:37:29, 3.25s/it] 73%|███████▎ | 4840/6638 [49:03<1:37:25, 3.25s/it] {'loss': 0.6282, 'grad_norm': 0.5674740803930446, 'learning_rate': 3.6083304116701535e-06, 'epoch': 0.73} 73%|███████▎ | 4840/6638 [49:03<1:37:25, 3.25s/it] 73%|███████▎ | 4841/6638 [49:06<1:36:36, 3.23s/it] {'loss': 0.6179, 'grad_norm': 0.6231371451104385, 'learning_rate': 3.604578301737336e-06, 'epoch': 0.73} 73%|███████▎ | 4841/6638 [49:06<1:36:36, 3.23s/it] 73%|███████▎ | 4842/6638 [49:09<1:36:25, 3.22s/it] {'loss': 0.6623, 'grad_norm': 0.608918185398886, 'learning_rate': 3.600827714689059e-06, 'epoch': 0.73} 73%|███████▎ | 4842/6638 [49:09<1:36:25, 3.22s/it] 73%|███████▎ | 4843/6638 [49:13<1:35:59, 3.21s/it] {'loss': 0.6622, 'grad_norm': 0.6162521275563141, 'learning_rate': 3.59707865141842e-06, 'epoch': 0.73} 73%|███████▎ | 4843/6638 [49:13<1:35:59, 3.21s/it] 73%|███████▎ | 4844/6638 [49:16<1:36:32, 3.23s/it] {'loss': 0.6183, 'grad_norm': 0.5919968827691272, 'learning_rate': 3.593331112818144e-06, 'epoch': 0.73} 73%|███████▎ | 4844/6638 [49:16<1:36:32, 3.23s/it] 73%|███████▎ | 4845/6638 [49:19<1:37:05, 3.25s/it] {'loss': 0.6569, 'grad_norm': 0.5716446453041846, 'learning_rate': 3.5895850997806016e-06, 'epoch': 0.73} 73%|███████▎ | 
4845/6638 [49:19<1:37:05, 3.25s/it] 73%|███████▎ | 4846/6638 [49:22<1:37:19, 3.26s/it] {'loss': 0.6619, 'grad_norm': 0.6111883650055916, 'learning_rate': 3.5858406131977965e-06, 'epoch': 0.73} 73%|███████▎ | 4846/6638 [49:22<1:37:19, 3.26s/it] 73%|███████▎ | 4847/6638 [49:26<1:37:06, 3.25s/it] {'loss': 0.6797, 'grad_norm': 0.6086084976359746, 'learning_rate': 3.5820976539613738e-06, 'epoch': 0.73} 73%|███████▎ | 4847/6638 [49:26<1:37:06, 3.25s/it] 73%|███████▎ | 4848/6638 [49:29<1:37:08, 3.26s/it] {'loss': 0.6538, 'grad_norm': 0.580255533151388, 'learning_rate': 3.5783562229626078e-06, 'epoch': 0.73} 73%|███████▎ | 4848/6638 [49:29<1:37:08, 3.26s/it] 73%|███████▎ | 4849/6638 [49:32<1:38:18, 3.30s/it] {'loss': 0.673, 'grad_norm': 0.5824374614523866, 'learning_rate': 3.574616321092409e-06, 'epoch': 0.73} 73%|███████▎ | 4849/6638 [49:32<1:38:18, 3.30s/it]2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 05 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
73%|███████▎ | 4850/6638 [49:36<1:39:02, 3.32s/it] {'loss': 0.7033, 'grad_norm': 0.6971329656406097, 'learning_rate': 3.5708779492413294e-06, 'epoch': 0.73} 73%|███████▎ | 4850/6638 [49:36<1:39:02, 3.32s/it] 73%|███████▎ | 4851/6638 [49:39<1:40:26, 3.37s/it] {'loss': 0.6417, 'grad_norm': 0.5526587558470905, 'learning_rate': 3.567141108299554e-06, 'epoch': 0.73} 73%|███████▎ | 4851/6638 [49:39<1:40:26, 3.37s/it] 73%|███████▎ | 4852/6638 [49:43<1:39:52, 3.36s/it] {'loss': 0.6156, 'grad_norm': 0.5651624883078525, 'learning_rate': 3.5634057991569082e-06, 'epoch': 0.73} 73%|███████▎ | 4852/6638 [49:43<1:39:52, 3.36s/it] 73%|███████▎ | 4853/6638 [49:46<1:39:30, 3.34s/it] {'loss': 0.6609, 'grad_norm': 0.568383352142188, 'learning_rate': 3.5596720227028382e-06, 'epoch': 0.73} 73%|███████▎ | 4853/6638 [49:46<1:39:30, 3.34s/it] 73%|███████▎ | 4854/6638 [49:49<1:39:00, 3.33s/it] {'loss': 0.6106, 'grad_norm': 0.5806419423607537, 'learning_rate': 3.5559397798264404e-06, 'epoch': 0.73} 73%|███████▎ | 4854/6638 [49:49<1:39:00, 3.33s/it] 73%|███████▎ | 4855/6638 [49:52<1:38:46, 3.32s/it] {'loss': 0.6699, 'grad_norm': 0.6770821580446511, 'learning_rate': 3.5522090714164425e-06, 'epoch': 0.73} 73%|███████▎ | 4855/6638 [49:52<1:38:46, 3.32s/it] 73%|███████▎ | 4856/6638 [49:56<1:40:10, 3.37s/it] {'loss': 0.6284, 'grad_norm': 0.6335532001518972, 'learning_rate': 3.548479898361199e-06, 'epoch': 0.73} 73%|███████▎ | 4856/6638 [49:56<1:40:10, 3.37s/it] 73%|███████▎ | 4857/6638 [49:59<1:39:47, 3.36s/it] {'loss': 0.6989, 'grad_norm': 0.6345536237827524, 'learning_rate': 3.544752261548706e-06, 'epoch': 0.73} 73%|███████▎ | 4857/6638 [49:59<1:39:47, 3.36s/it] 73%|███████▎ | 4858/6638 [50:03<1:40:31, 3.39s/it] {'loss': 0.6326, 'grad_norm': 0.5332925107944096, 'learning_rate': 3.5410261618665985e-06, 'epoch': 0.73} 73%|███████▎ | 4858/6638 [50:03<1:40:31, 3.39s/it] 73%|███████▎ | 4859/6638 [50:06<1:38:37, 3.33s/it] {'loss': 0.6356, 'grad_norm': 0.6020916461767946, 'learning_rate': 
3.537301600202131e-06, 'epoch': 0.73} 73%|███████▎ | 4859/6638 [50:06<1:38:37, 3.33s/it] 73%|███████▎ | 4860/6638 [50:09<1:37:50, 3.30s/it] {'loss': 0.6414, 'grad_norm': 0.5653282849364668, 'learning_rate': 3.533578577442206e-06, 'epoch': 0.73} 73%|███████▎ | 4860/6638 [50:09<1:37:50, 3.30s/it] 73%|███████▎ | 4861/6638 [50:13<1:38:07, 3.31s/it] {'loss': 0.6383, 'grad_norm': 0.5376722461828289, 'learning_rate': 3.5298570944733447e-06, 'epoch': 0.73} 73%|███████▎ | 4861/6638 [50:13<1:38:07, 3.31s/it] 73%|███████▎ | 4862/6638 [50:16<1:38:21, 3.32s/it] {'loss': 0.6295, 'grad_norm': 0.5691397355973509, 'learning_rate': 3.5261371521817247e-06, 'epoch': 0.73} 73%|███████▎ | 4862/6638 [50:16<1:38:21, 3.32s/it] 73%|███████▎ | 4863/6638 [50:19<1:38:30, 3.33s/it] {'loss': 0.6064, 'grad_norm': 0.5800783084581392, 'learning_rate': 3.522418751453133e-06, 'epoch': 0.73} 73%|███████▎ | 4863/6638 [50:19<1:38:30, 3.33s/it] 73%|███████▎ | 4864/6638 [50:23<1:39:55, 3.38s/it] {'loss': 0.6405, 'grad_norm': 0.5989892812468943, 'learning_rate': 3.5187018931729987e-06, 'epoch': 0.73} 73%|███████▎ | 4864/6638 [50:23<1:39:55, 3.38s/it] 73%|███████▎ | 4865/6638 [50:26<1:38:48, 3.34s/it] {'loss': 0.6169, 'grad_norm': 0.5078159006407036, 'learning_rate': 3.514986578226386e-06, 'epoch': 0.73} 73%|███████▎ | 4865/6638 [50:26<1:38:48, 3.34s/it] 73%|███████▎ | 4866/6638 [50:29<1:38:43, 3.34s/it] {'loss': 0.6559, 'grad_norm': 0.5671179501966621, 'learning_rate': 3.5112728074979896e-06, 'epoch': 0.73} 73%|███████▎ | 4866/6638 [50:29<1:38:43, 3.34s/it] 73%|███████▎ | 4867/6638 [50:33<1:39:46, 3.38s/it] {'loss': 0.6195, 'grad_norm': 0.6274641462956946, 'learning_rate': 3.507560581872139e-06, 'epoch': 0.73} 73%|███████▎ | 4867/6638 [50:33<1:39:46, 3.38s/it] 73%|███████▎ | 4868/6638 [50:36<1:40:50, 3.42s/it] {'loss': 0.7102, 'grad_norm': 0.6832276415949672, 'learning_rate': 3.503849902232792e-06, 'epoch': 0.73} 73%|███████▎ | 4868/6638 [50:36<1:40:50, 3.42s/it] 73%|███████▎ | 4869/6638 [50:40<1:40:52, 
3.42s/it] {'loss': 0.6543, 'grad_norm': 0.6278525533508338, 'learning_rate': 3.5001407694635326e-06, 'epoch': 0.73} 73%|███████▎ | 4869/6638 [50:40<1:40:52, 3.42s/it] 73%|███████▎ | 4870/6638 [50:43<1:39:49, 3.39s/it] {'loss': 0.6125, 'grad_norm': 0.5754534530592758, 'learning_rate': 3.4964331844475953e-06, 'epoch': 0.73} 73%|███████▎ | 4870/6638 [50:43<1:39:49, 3.39s/it] 73%|███████▎ | 4871/6638 [50:46<1:38:43, 3.35s/it] {'loss': 0.6352, 'grad_norm': 0.5954941713949875, 'learning_rate': 3.492727148067824e-06, 'epoch': 0.73} 73%|███████▎ | 4871/6638 [50:46<1:38:43, 3.35s/it] 73%|███████▎ | 4872/6638 [50:50<1:37:47, 3.32s/it] {'loss': 0.6266, 'grad_norm': 0.6500890356607806, 'learning_rate': 3.4890226612067124e-06, 'epoch': 0.73} 73%|███████▎ | 4872/6638 [50:50<1:37:47, 3.32s/it] 73%|███████▎ | 4873/6638 [50:53<1:36:55, 3.29s/it] {'loss': 0.6457, 'grad_norm': 0.6624317392622164, 'learning_rate': 3.485319724746369e-06, 'epoch': 0.73} 73%|███████▎ | 4873/6638 [50:53<1:36:55, 3.29s/it] 73%|███████▎ | 4874/6638 [50:56<1:36:44, 3.29s/it] {'loss': 0.6065, 'grad_norm': 0.5357280522284574, 'learning_rate': 3.4816183395685433e-06, 'epoch': 0.73} 73%|███████▎ | 4874/6638 [50:56<1:36:44, 3.29s/it] 73%|███████▎ | 4875/6638 [50:59<1:37:17, 3.31s/it] {'loss': 0.6746, 'grad_norm': 0.6470389266558952, 'learning_rate': 3.4779185065546183e-06, 'epoch': 0.73} 73%|███████▎ | 4875/6638 [50:59<1:37:17, 3.31s/it] 73%|███████▎ | 4876/6638 [51:03<1:37:18, 3.31s/it] {'loss': 0.6628, 'grad_norm': 0.6062130473113019, 'learning_rate': 3.474220226585595e-06, 'epoch': 0.73} 73%|███████▎ | 4876/6638 [51:03<1:37:18, 3.31s/it] 73%|███████▎ | 4877/6638 [51:06<1:36:52, 3.30s/it] {'loss': 0.6022, 'grad_norm': 0.5539555242057337, 'learning_rate': 3.4705235005421144e-06, 'epoch': 0.73} 73%|███████▎ | 4877/6638 [51:06<1:36:52, 3.30s/it] 73%|███████▎ | 4878/6638 [51:09<1:36:46, 3.30s/it] {'loss': 0.6004, 'grad_norm': 0.5768662455870615, 'learning_rate': 3.4668283293044492e-06, 'epoch': 0.73} 73%|███████▎ | 
4878/6638 [51:09<1:36:46, 3.30s/it] 74%|███████▎ | 4879/6638 [51:13<1:36:36, 3.30s/it] {'loss': 0.6589, 'grad_norm': 0.5815804892178797, 'learning_rate': 3.4631347137524896e-06, 'epoch': 0.74} 74%|███████▎ | 4879/6638 [51:13<1:36:36, 3.30s/it] 74%|███████▎ | 4880/6638 [51:16<1:36:45, 3.30s/it] {'loss': 0.6941, 'grad_norm': 0.6130376381666288, 'learning_rate': 3.4594426547657667e-06, 'epoch': 0.74} 74%|███████▎ | 4880/6638 [51:16<1:36:45, 3.30s/it] 74%|███████▎ | 4881/6638 [51:19<1:36:32, 3.30s/it] {'loss': 0.6364, 'grad_norm': 0.5893678235264955, 'learning_rate': 3.4557521532234405e-06, 'epoch': 0.74} 74%|███████▎ | 4881/6638 [51:19<1:36:32, 3.30s/it] 74%|███████▎ | 4882/6638 [51:22<1:35:40, 3.27s/it] {'loss': 0.6583, 'grad_norm': 0.6580731285630764, 'learning_rate': 3.4520632100042916e-06, 'epoch': 0.74} 74%|███████▎ | 4882/6638 [51:22<1:35:40, 3.27s/it] 74%|███████▎ | 4883/6638 [51:26<1:36:03, 3.28s/it] {'loss': 0.6603, 'grad_norm': 0.6562476758847667, 'learning_rate': 3.448375825986742e-06, 'epoch': 0.74} 74%|███████▎ | 4883/6638 [51:26<1:36:03, 3.28s/it] 74%|███████▎ | 4884/6638 [51:29<1:35:31, 3.27s/it] {'loss': 0.6335, 'grad_norm': 0.5777460598811183, 'learning_rate': 3.444690002048826e-06, 'epoch': 0.74} 74%|███████▎ | 4884/6638 [51:29<1:35:31, 3.27s/it] 74%|███████▎ | 4885/6638 [51:32<1:35:51, 3.28s/it] {'loss': 0.6374, 'grad_norm': 0.6063490288989268, 'learning_rate': 3.441005739068223e-06, 'epoch': 0.74} 74%|███████▎ | 4885/6638 [51:32<1:35:51, 3.28s/it] 74%|███████▎ | 4886/6638 [51:35<1:35:17, 3.26s/it] {'loss': 0.639, 'grad_norm': 0.6115407090219576, 'learning_rate': 3.4373230379222344e-06, 'epoch': 0.74} 74%|███████▎ | 4886/6638 [51:35<1:35:17, 3.26s/it] 74%|███████▎ | 4887/6638 [51:39<1:35:21, 3.27s/it] {'loss': 0.6279, 'grad_norm': 0.5988889735037497, 'learning_rate': 3.433641899487783e-06, 'epoch': 0.74} 74%|███████▎ | 4887/6638 [51:39<1:35:21, 3.27s/it] 74%|███████▎ | 4888/6638 [51:42<1:35:00, 3.26s/it] {'loss': 0.6508, 'grad_norm': 
0.6381051786176061, 'learning_rate': 3.4299623246414283e-06, 'epoch': 0.74} 74%|███████▎ | 4888/6638 [51:42<1:35:00, 3.26s/it] 74%|███████▎ | 4889/6638 [51:45<1:35:09, 3.26s/it] {'loss': 0.6221, 'grad_norm': 0.555584774246887, 'learning_rate': 3.4262843142593536e-06, 'epoch': 0.74} 74%|███████▎ | 4889/6638 [51:45<1:35:09, 3.26s/it] 74%|███████▎ | 4890/6638 [51:49<1:35:33, 3.28s/it] {'loss': 0.6249, 'grad_norm': 0.5790724372068735, 'learning_rate': 3.422607869217377e-06, 'epoch': 0.74} 74%|███████▎ | 4890/6638 [51:49<1:35:33, 3.28s/it] 74%|███████▎ | 4891/6638 [51:52<1:35:56, 3.29s/it] {'loss': 0.6281, 'grad_norm': 0.6048568888931377, 'learning_rate': 3.4189329903909307e-06, 'epoch': 0.74} 74%|███████▎ | 4891/6638 [51:52<1:35:56, 3.29s/it] 74%|███████▎ | 4892/6638 [51:55<1:35:58, 3.30s/it] {'loss': 0.6065, 'grad_norm': 0.556265524566053, 'learning_rate': 3.4152596786550764e-06, 'epoch': 0.74} 74%|███████▎ | 4892/6638 [51:55<1:35:58, 3.30s/it] 74%|███████▎ | 4893/6638 [51:58<1:35:23, 3.28s/it] {'loss': 0.6192, 'grad_norm': 0.5672820080616493, 'learning_rate': 3.4115879348845194e-06, 'epoch': 0.74} 74%|███████▎ | 4893/6638 [51:58<1:35:23, 3.28s/it] 74%|███████▎ | 4894/6638 [52:02<1:35:46, 3.30s/it] {'loss': 0.6183, 'grad_norm': 0.592462452699497, 'learning_rate': 3.4079177599535695e-06, 'epoch': 0.74} 74%|███████▎ | 4894/6638 [52:02<1:35:46, 3.30s/it] 74%|███████▎ | 4895/6638 [52:05<1:35:09, 3.28s/it] {'loss': 0.6743, 'grad_norm': 0.6356674575554835, 'learning_rate': 3.404249154736179e-06, 'epoch': 0.74} 74%|███████▎ | 4895/6638 [52:05<1:35:09, 3.28s/it] 74%|███████▍ | 4896/6638 [52:08<1:35:58, 3.31s/it] {'loss': 0.6328, 'grad_norm': 0.608787771498633, 'learning_rate': 3.400582120105913e-06, 'epoch': 0.74} 74%|███████▍ | 4896/6638 [52:08<1:35:58, 3.31s/it] 74%|███████▍ | 4897/6638 [52:12<1:35:54, 3.31s/it] {'loss': 0.6573, 'grad_norm': 0.7186597846440141, 'learning_rate': 3.3969166569359734e-06, 'epoch': 0.74} 74%|███████▍ | 4897/6638 [52:12<1:35:54, 3.31s/it] 
74%|███████▍ | 4898/6638 [52:15<1:35:18, 3.29s/it] {'loss': 0.6673, 'grad_norm': 0.6327540683704423, 'learning_rate': 3.3932527660991877e-06, 'epoch': 0.74} 74%|███████▍ | 4898/6638 [52:15<1:35:18, 3.29s/it] 74%|███████▍ | 4899/6638 [52:18<1:35:54, 3.31s/it] {'loss': 0.6123, 'grad_norm': 0.577146676444663, 'learning_rate': 3.3895904484679986e-06, 'epoch': 0.74} 74%|███████▍ | 4899/6638 [52:18<1:35:54, 3.31s/it]2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 74%|███████▍ | 4900/6638 [52:22<1:36:00, 3.31s/it] {'loss': 0.6257, 'grad_norm': 0.5181727135437159, 'learning_rate': 3.3859297049144833e-06, 'epoch': 0.74} 74%|███████▍ | 4900/6638 [52:22<1:36:00, 3.31s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-4900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 74%|███████▍ | 4901/6638 [52:40<3:43:22, 7.72s/it] {'loss': 0.6607, 'grad_norm': 0.5830519832888628, 'learning_rate': 3.382270536310347e-06, 'epoch': 0.74} 74%|███████▍ | 4901/6638 [52:40<3:43:22, 7.72s/it] 74%|███████▍ | 4902/6638 [52:43<3:05:13, 6.40s/it] {'loss': 0.682, 'grad_norm': 0.6320146337926597, 'learning_rate': 3.3786129435269076e-06, 'epoch': 0.74} 74%|███████▍ | 4902/6638 [52:43<3:05:13, 6.40s/it] 74%|███████▍ | 4903/6638 [52:46<2:37:54, 5.46s/it] {'loss': 0.616, 'grad_norm': 0.582405191925418, 'learning_rate': 3.374956927435118e-06, 'epoch': 0.74} 74%|███████▍ | 4903/6638 [52:46<2:37:54, 5.46s/it] 74%|███████▍ | 4904/6638 [52:50<2:19:39, 4.83s/it] {'loss': 0.6307, 'grad_norm': 0.5747612746832665, 'learning_rate': 3.3713024889055557e-06, 'epoch': 0.74} 74%|███████▍ | 4904/6638 [52:50<2:19:39, 4.83s/it] 74%|███████▍ | 4905/6638 [52:53<2:06:37, 4.38s/it] {'loss': 0.6536, 'grad_norm': 0.6242091146908949, 'learning_rate': 3.3676496288084127e-06, 'epoch': 0.74} 74%|███████▍ | 4905/6638 [52:53<2:06:37, 4.38s/it] 
74%|███████▍ | 4906/6638 [52:56<1:57:00, 4.05s/it] {'loss': 0.6333, 'grad_norm': 0.5792376581288031, 'learning_rate': 3.3639983480135197e-06, 'epoch': 0.74}
74%|███████▍ | 4907/6638 [52:59<1:50:03, 3.81s/it] {'loss': 0.613, 'grad_norm': 0.6047715100991005, 'learning_rate': 3.3603486473903147e-06, 'epoch': 0.74}
74%|███████▍ | 4908/6638 [53:03<1:45:30, 3.66s/it] {'loss': 0.6087, 'grad_norm': 0.5794452049351857, 'learning_rate': 3.3567005278078736e-06, 'epoch': 0.74}
74%|███████▍ | 4909/6638 [53:06<1:40:58, 3.50s/it] {'loss': 0.6394, 'grad_norm': 0.5785136505336567, 'learning_rate': 3.3530539901348937e-06, 'epoch': 0.74}
74%|███████▍ | 4910/6638 [53:09<1:38:41, 3.43s/it] {'loss': 0.6065, 'grad_norm': 0.5797559674575653, 'learning_rate': 3.349409035239685e-06, 'epoch': 0.74}
74%|███████▍ | 4911/6638 [53:12<1:36:53, 3.37s/it] {'loss': 0.6437, 'grad_norm': 0.5692698726297202, 'learning_rate': 3.3457656639901903e-06, 'epoch': 0.74}
74%|███████▍ | 4912/6638 [53:16<1:35:29, 3.32s/it] {'loss': 0.6334, 'grad_norm': 0.5740269880886669, 'learning_rate': 3.3421238772539755e-06, 'epoch': 0.74}
74%|███████▍ | 4913/6638 [53:19<1:33:56, 3.27s/it] {'loss': 0.6577, 'grad_norm': 0.6498356586820666, 'learning_rate': 3.3384836758982277e-06, 'epoch': 0.74}
74%|███████▍ | 4914/6638 [53:22<1:33:44, 3.26s/it] {'loss': 0.6388, 'grad_norm': 0.6324613768806919, 'learning_rate': 3.334845060789753e-06, 'epoch': 0.74}
74%|███████▍ | 4915/6638 [53:25<1:35:03, 3.31s/it] {'loss': 0.6847, 'grad_norm': 0.6038927001437132, 'learning_rate': 3.331208032794977e-06, 'epoch': 0.74}
74%|███████▍ | 4916/6638 [53:29<1:34:17, 3.29s/it] {'loss': 0.6416, 'grad_norm': 0.594096762824588, 'learning_rate': 3.327572592779963e-06, 'epoch': 0.74}
74%|███████▍ | 4917/6638 [53:32<1:33:53, 3.27s/it] {'loss': 0.6092, 'grad_norm': 0.5649234834245428, 'learning_rate': 3.3239387416103786e-06, 'epoch': 0.74}
74%|███████▍ | 4918/6638 [53:35<1:34:54, 3.31s/it] {'loss': 0.6724, 'grad_norm': 0.6337249646560003, 'learning_rate': 3.320306480151526e-06, 'epoch': 0.74}
74%|███████▍ | 4919/6638 [53:39<1:35:05, 3.32s/it] {'loss': 0.6349, 'grad_norm': 0.5320064387271356, 'learning_rate': 3.316675809268316e-06, 'epoch': 0.74}
74%|███████▍ | 4920/6638 [53:42<1:35:22, 3.33s/it] {'loss': 0.6674, 'grad_norm': 0.6461030757445936, 'learning_rate': 3.313046729825291e-06, 'epoch': 0.74}
74%|███████▍ | 4921/6638 [53:45<1:34:30, 3.30s/it] {'loss': 0.6184, 'grad_norm': 0.6396617744122047, 'learning_rate': 3.3094192426866144e-06, 'epoch': 0.74}
74%|███████▍ | 4922/6638 [53:48<1:33:54, 3.28s/it] {'loss': 0.6226, 'grad_norm': 0.5992459406486402, 'learning_rate': 3.305793348716062e-06, 'epoch': 0.74}
74%|███████▍ | 4923/6638 [53:52<1:33:13, 3.26s/it] {'loss': 0.6044, 'grad_norm': 0.6101264917243836, 'learning_rate': 3.3021690487770374e-06, 'epoch': 0.74}
74%|███████▍ | 4924/6638 [53:55<1:32:30, 3.24s/it] {'loss': 0.616, 'grad_norm': 0.5842496869721702, 'learning_rate': 3.298546343732567e-06, 'epoch': 0.74}
74%|███████▍ | 4925/6638 [53:58<1:32:48, 3.25s/it] {'loss': 0.6528, 'grad_norm': 0.5794399544522776, 'learning_rate': 3.2949252344452855e-06, 'epoch': 0.74}
74%|███████▍ | 4926/6638 [54:01<1:33:10, 3.27s/it] {'loss': 0.5922, 'grad_norm': 0.576546785624887, 'learning_rate': 3.2913057217774636e-06, 'epoch': 0.74}
74%|███████▍ | 4927/6638 [54:05<1:33:33, 3.28s/it] {'loss': 0.6113, 'grad_norm': 0.5109465580907435, 'learning_rate': 3.2876878065909714e-06, 'epoch': 0.74}
74%|███████▍ | 4928/6638 [54:08<1:33:18, 3.27s/it] {'loss': 0.6957, 'grad_norm': 0.667980348461984, 'learning_rate': 3.2840714897473247e-06, 'epoch': 0.74}
74%|███████▍ | 4929/6638 [54:11<1:33:33, 3.28s/it] {'loss': 0.6078, 'grad_norm': 0.5522471231794776, 'learning_rate': 3.2804567721076385e-06, 'epoch': 0.74}
74%|███████▍ | 4930/6638 [54:15<1:33:55, 3.30s/it] {'loss': 0.6665, 'grad_norm': 0.6146445637082344, 'learning_rate': 3.276843654532649e-06, 'epoch': 0.74}
74%|███████▍ | 4931/6638 [54:18<1:34:50, 3.33s/it] {'loss': 0.6534, 'grad_norm': 0.5512449314035418, 'learning_rate': 3.2732321378827204e-06, 'epoch': 0.74}
74%|███████▍ | 4932/6638 [54:21<1:34:16, 3.32s/it] {'loss': 0.6183, 'grad_norm': 0.5868406128907335, 'learning_rate': 3.2696222230178286e-06, 'epoch': 0.74}
74%|███████▍ | 4933/6638 [54:25<1:33:38, 3.30s/it] {'loss': 0.6532, 'grad_norm': 0.6389971539846168, 'learning_rate': 3.2660139107975764e-06, 'epoch': 0.74}
74%|███████▍ | 4934/6638 [54:28<1:33:37, 3.30s/it] {'loss': 0.6793, 'grad_norm': 0.6750542309788151, 'learning_rate': 3.2624072020811703e-06, 'epoch': 0.74}
74%|███████▍ | 4935/6638 [54:31<1:33:10, 3.28s/it] {'loss': 0.6295, 'grad_norm': 0.5805671440353636, 'learning_rate': 3.258802097727447e-06, 'epoch': 0.74}
74%|███████▍ | 4936/6638 [54:34<1:31:53, 3.24s/it] {'loss': 0.6078, 'grad_norm': 0.5635623766774938, 'learning_rate': 3.255198598594862e-06, 'epoch': 0.74}
74%|███████▍ | 4937/6638 [54:38<1:32:16, 3.25s/it] {'loss': 0.6593, 'grad_norm': 0.7026927226369474, 'learning_rate': 3.25159670554148e-06, 'epoch': 0.74}
74%|███████▍ | 4938/6638 [54:41<1:32:16, 3.26s/it] {'loss': 0.6273, 'grad_norm': 0.5439068660732794, 'learning_rate': 3.2479964194249813e-06, 'epoch': 0.74}
74%|███████▍ | 4939/6638 [54:44<1:31:57, 3.25s/it] {'loss': 0.6521, 'grad_norm': 0.6036524880137908, 'learning_rate': 3.244397741102683e-06, 'epoch': 0.74}
74%|███████▍ | 4940/6638 [54:47<1:32:33, 3.27s/it] {'loss': 0.5991, 'grad_norm': 0.5095813777366938, 'learning_rate': 3.2408006714314967e-06, 'epoch': 0.74}
74%|███████▍ | 4941/6638 [54:51<1:32:12, 3.26s/it] {'loss': 0.642, 'grad_norm': 0.5903525696412014, 'learning_rate': 3.2372052112679666e-06, 'epoch': 0.74}
74%|███████▍ | 4942/6638 [54:54<1:32:46, 3.28s/it] {'loss': 0.6461, 'grad_norm': 0.5263653848532588, 'learning_rate': 3.2336113614682405e-06, 'epoch': 0.74}
74%|███████▍ | 4943/6638 [54:57<1:32:53, 3.29s/it] {'loss': 0.631, 'grad_norm': 0.5410089824886936, 'learning_rate': 3.230019122888094e-06, 'epoch': 0.74}
74%|███████▍ | 4944/6638 [55:00<1:32:27, 3.27s/it] {'loss': 0.6955, 'grad_norm': 0.7080438740372297, 'learning_rate': 3.2264284963829175e-06, 'epoch': 0.74}
74%|███████▍ | 4945/6638 [55:04<1:32:34, 3.28s/it] {'loss': 0.6021, 'grad_norm': 0.5955956182139692, 'learning_rate': 3.222839482807707e-06, 'epoch': 0.74}
75%|███████▍ | 4946/6638 [55:07<1:32:17, 3.27s/it] {'loss': 0.6687, 'grad_norm': 0.590515220944185, 'learning_rate': 3.2192520830170882e-06, 'epoch': 0.75}
75%|███████▍ | 4947/6638 [55:10<1:33:02, 3.30s/it] {'loss': 0.5914, 'grad_norm': 0.5209866037471098, 'learning_rate': 3.2156662978652975e-06, 'epoch': 0.75}
75%|███████▍ | 4948/6638 [55:14<1:33:46, 3.33s/it] {'loss': 0.6383, 'grad_norm': 0.549314452593222, 'learning_rate': 3.2120821282061798e-06, 'epoch': 0.75}
75%|███████▍ | 4949/6638 [55:17<1:33:49, 3.33s/it] {'loss': 0.6124, 'grad_norm': 0.5710525962314787, 'learning_rate': 3.208499574893208e-06, 'epoch': 0.75}
AutoResumeHook: Checking whether to suspend...
75%|███████▍ | 4950/6638 [55:20<1:33:29, 3.32s/it] {'loss': 0.6228, 'grad_norm': 0.5425772607630055, 'learning_rate': 3.2049186387794538e-06, 'epoch': 0.75}
75%|███████▍ | 4951/6638 [55:24<1:33:01, 3.31s/it] {'loss': 0.6616, 'grad_norm': 0.6203314227783292, 'learning_rate': 3.2013393207176267e-06, 'epoch': 0.75}
75%|███████▍ | 4952/6638 [55:27<1:33:06, 3.31s/it] {'loss': 0.6206, 'grad_norm': 0.573961849399598, 'learning_rate': 3.1977616215600304e-06, 'epoch': 0.75}
75%|███████▍ | 4953/6638 [55:30<1:32:40, 3.30s/it] {'loss': 0.6536, 'grad_norm': 0.5891896624888643, 'learning_rate': 3.1941855421585876e-06, 'epoch': 0.75}
75%|███████▍ | 4954/6638 [55:34<1:32:20, 3.29s/it] {'loss': 0.6882, 'grad_norm': 0.6283066216729972, 'learning_rate': 3.1906110833648417e-06, 'epoch': 0.75}
75%|███████▍ | 4955/6638 [55:37<1:31:26, 3.26s/it] {'loss': 0.6206, 'grad_norm': 0.6137520033424941, 'learning_rate': 3.1870382460299466e-06, 'epoch': 0.75}
75%|███████▍ | 4956/6638 [55:40<1:31:41, 3.27s/it] {'loss': 0.6137, 'grad_norm': 0.5208179692307497, 'learning_rate': 3.1834670310046735e-06, 'epoch': 0.75}
75%|███████▍ | 4957/6638 [55:43<1:31:54, 3.28s/it] {'loss': 0.6372, 'grad_norm': 0.5244121439458276, 'learning_rate': 3.1798974391393953e-06, 'epoch': 0.75}
75%|███████▍ | 4958/6638 [55:47<1:31:07, 3.25s/it] {'loss': 0.5957, 'grad_norm': 0.5639330675573557, 'learning_rate': 3.176329471284113e-06, 'epoch': 0.75}
75%|███████▍ | 4959/6638 [55:50<1:30:31, 3.24s/it] {'loss': 0.6456, 'grad_norm': 0.5906767629244062, 'learning_rate': 3.1727631282884363e-06, 'epoch': 0.75}
75%|███████▍ | 4960/6638 [55:53<1:30:29, 3.24s/it] {'loss': 0.6367, 'grad_norm': 0.7188976702078511, 'learning_rate': 3.1691984110015818e-06, 'epoch': 0.75}
75%|███████▍ | 4961/6638 [55:56<1:30:40, 3.24s/it] {'loss': 0.6119, 'grad_norm': 0.5692252214042318, 'learning_rate': 3.1656353202723876e-06, 'epoch': 0.75}
75%|███████▍ | 4962/6638 [56:00<1:31:35, 3.28s/it] {'loss': 0.6223, 'grad_norm': 0.5694541587111395, 'learning_rate': 3.162073856949296e-06, 'epoch': 0.75}
75%|███████▍ | 4963/6638 [56:03<1:32:42, 3.32s/it] {'loss': 0.6377, 'grad_norm': 0.5238745402544027, 'learning_rate': 3.15851402188037e-06, 'epoch': 0.75}
75%|███████▍ | 4964/6638 [56:06<1:32:33, 3.32s/it] {'loss': 0.688, 'grad_norm': 0.6999375731570368, 'learning_rate': 3.154955815913283e-06, 'epoch': 0.75}
75%|███████▍ | 4965/6638 [56:10<1:31:56, 3.30s/it] {'loss': 0.6451, 'grad_norm': 0.5765996328489743, 'learning_rate': 3.151399239895313e-06, 'epoch': 0.75}
75%|███████▍ | 4966/6638 [56:13<1:31:34, 3.29s/it] {'loss': 0.6376, 'grad_norm': 0.6223372267810997, 'learning_rate': 3.147844294673359e-06, 'epoch': 0.75}
75%|███████▍ | 4967/6638 [56:16<1:31:54, 3.30s/it] {'loss': 0.6303, 'grad_norm': 0.5788927334702014, 'learning_rate': 3.1442909810939316e-06, 'epoch': 0.75}
75%|███████▍ | 4968/6638 [56:19<1:30:59, 3.27s/it] {'loss': 0.6371, 'grad_norm': 0.5567160798220882, 'learning_rate': 3.1407393000031426e-06, 'epoch': 0.75}
75%|███████▍ | 4969/6638 [56:23<1:30:40, 3.26s/it] {'loss': 0.6302, 'grad_norm': 0.6472879379650571, 'learning_rate': 3.137189252246726e-06, 'epoch': 0.75}
75%|███████▍ | 4970/6638 [56:26<1:29:58, 3.24s/it] {'loss': 0.6094, 'grad_norm': 0.5711201858667391, 'learning_rate': 3.1336408386700256e-06, 'epoch': 0.75}
75%|███████▍ | 4971/6638 [56:29<1:29:34, 3.22s/it] {'loss': 0.6145, 'grad_norm': 0.6126492144504562, 'learning_rate': 3.130094060117986e-06, 'epoch': 0.75}
75%|███████▍ | 4972/6638 [56:32<1:29:42, 3.23s/it] {'loss': 0.6501, 'grad_norm': 0.6104933308597638, 'learning_rate': 3.126548917435179e-06, 'epoch': 0.75}
75%|███████▍ | 4973/6638 [56:35<1:29:34, 3.23s/it] {'loss': 0.6458, 'grad_norm': 0.614732802555654, 'learning_rate': 3.123005411465766e-06, 'epoch': 0.75}
75%|███████▍ | 4974/6638 [56:39<1:28:55, 3.21s/it] {'loss': 0.6326, 'grad_norm': 0.6132170925516323, 'learning_rate': 3.1194635430535426e-06, 'epoch': 0.75}
75%|███████▍ | 4975/6638 [56:42<1:29:47, 3.24s/it] {'loss': 0.6709, 'grad_norm': 0.6677707804577803, 'learning_rate': 3.1159233130418975e-06, 'epoch': 0.75}
75%|███████▍ | 4976/6638 [56:45<1:29:59, 3.25s/it] {'loss': 0.6152, 'grad_norm': 0.5828368664405392, 'learning_rate': 3.11238472227383e-06, 'epoch': 0.75}
75%|███████▍ | 4977/6638 [56:48<1:30:11, 3.26s/it] {'loss': 0.6377, 'grad_norm': 0.6253483235058489, 'learning_rate': 3.108847771591956e-06, 'epoch': 0.75}
75%|███████▍ | 4978/6638 [56:52<1:29:41, 3.24s/it] {'loss': 0.6392, 'grad_norm': 0.6147144487806033, 'learning_rate': 3.105312461838499e-06, 'epoch': 0.75}
75%|███████▌ | 4979/6638 [56:55<1:30:24, 3.27s/it] {'loss': 0.6078, 'grad_norm': 0.5417167700164547, 'learning_rate': 3.1017787938552925e-06, 'epoch': 0.75}
75%|███████▌ | 4980/6638 [56:58<1:31:30, 3.31s/it] {'loss': 0.6237, 'grad_norm': 0.5243527372189473, 'learning_rate': 3.0982467684837724e-06, 'epoch': 0.75}
75%|███████▌ | 4981/6638 [57:02<1:31:13, 3.30s/it] {'loss': 0.6552, 'grad_norm': 0.6020984565410329, 'learning_rate': 3.0947163865649897e-06, 'epoch': 0.75}
75%|███████▌ | 4982/6638 [57:05<1:30:59, 3.30s/it] {'loss': 0.6152, 'grad_norm': 0.5599686880840606, 'learning_rate': 3.0911876489396063e-06, 'epoch': 0.75}
75%|███████▌ | 4983/6638 [57:08<1:30:39, 3.29s/it] {'loss': 0.6419, 'grad_norm': 0.6329945646046816, 'learning_rate': 3.0876605564478832e-06, 'epoch': 0.75}
75%|███████▌ | 4984/6638 [57:12<1:31:06, 3.30s/it] {'loss': 0.6638, 'grad_norm': 0.6182704030073183, 'learning_rate': 3.0841351099297025e-06, 'epoch': 0.75}
75%|███████▌ | 4985/6638 [57:15<1:30:07, 3.27s/it] {'loss': 0.638, 'grad_norm': 0.5860724863989494, 'learning_rate': 3.0806113102245395e-06, 'epoch': 0.75}
75%|███████▌ | 4986/6638 [57:18<1:30:55, 3.30s/it] {'loss': 0.6156, 'grad_norm': 0.592998465050267, 'learning_rate': 3.0770891581714877e-06, 'epoch': 0.75}
75%|███████▌ | 4987/6638 [57:21<1:30:52, 3.30s/it] {'loss': 0.6773, 'grad_norm': 0.6661069083941697, 'learning_rate': 3.0735686546092514e-06, 'epoch': 0.75}
75%|███████▌ | 4988/6638 [57:25<1:30:06, 3.28s/it] {'loss': 0.6117, 'grad_norm': 0.5555240241353264, 'learning_rate': 3.070049800376127e-06, 'epoch': 0.75}
75%|███████▌ | 4989/6638 [57:28<1:31:40, 3.34s/it] {'loss': 0.6808, 'grad_norm': 0.6204641670072029, 'learning_rate': 3.0665325963100334e-06, 'epoch': 0.75}
75%|███████▌ | 4990/6638 [57:32<1:31:57, 3.35s/it] {'loss': 0.6258, 'grad_norm': 0.5990450324644958, 'learning_rate': 3.063017043248493e-06, 'epoch': 0.75}
75%|███████▌ | 4991/6638 [57:35<1:31:03, 3.32s/it] {'loss': 0.5807, 'grad_norm': 0.49470506833929573, 'learning_rate': 3.059503142028627e-06, 'epoch': 0.75}
75%|███████▌ | 4992/6638 [57:38<1:31:51, 3.35s/it] {'loss': 0.631, 'grad_norm': 0.5624475655713633, 'learning_rate': 3.055990893487173e-06, 'epoch': 0.75}
75%|███████▌ | 4993/6638 [57:41<1:31:15, 3.33s/it] {'loss': 0.6222, 'grad_norm': 0.5540787677513154, 'learning_rate': 3.0524802984604694e-06, 'epoch': 0.75}
75%|███████▌ | 4994/6638 [57:45<1:30:13, 3.29s/it] {'loss': 0.6212, 'grad_norm': 0.5757370553584168, 'learning_rate': 3.0489713577844683e-06, 'epoch': 0.75}
75%|███████▌ | 4995/6638 [57:48<1:30:15, 3.30s/it] {'loss': 0.6562, 'grad_norm': 0.5575440950367183, 'learning_rate': 3.045464072294717e-06, 'epoch': 0.75}
75%|███████▌ | 4996/6638 [57:51<1:29:59, 3.29s/it] {'loss': 0.6341, 'grad_norm': 0.5837979128907169, 'learning_rate': 3.0419584428263692e-06, 'epoch': 0.75}
75%|███████▌ | 4997/6638 [57:55<1:31:17, 3.34s/it] {'loss': 0.7053, 'grad_norm': 0.652609817683554, 'learning_rate': 3.038454470214203e-06, 'epoch': 0.75}
75%|███████▌ | 4998/6638 [57:58<1:30:43, 3.32s/it] {'loss': 0.6305, 'grad_norm': 0.6149192171202899, 'learning_rate': 3.0349521552925774e-06, 'epoch': 0.75}
75%|███████▌ | 4999/6638 [58:01<1:30:12, 3.30s/it] {'loss': 0.5836, 'grad_norm': 0.536745782757973, 'learning_rate': 3.031451498895468e-06, 'epoch': 0.75}
AutoResumeHook: Checking whether to suspend...
75%|███████▌ | 5000/6638 [58:05<1:30:19, 3.31s/it] {'loss': 0.6353, 'grad_norm': 0.6417585628844823, 'learning_rate': 3.027952501856457e-06, 'epoch': 0.75}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5000/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5000/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5000/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
75%|███████▌ | 5001/6638 [58:22<3:25:18, 7.53s/it] {'loss': 0.6771, 'grad_norm': 0.6123562368698487, 'learning_rate': 3.0244551650087286e-06, 'epoch': 0.75}
75%|███████▌ | 5002/6638 [58:25<2:51:35, 6.29s/it] {'loss': 0.6522, 'grad_norm': 0.572810652978113, 'learning_rate': 3.0209594891850757e-06, 'epoch': 0.75}
75%|███████▌ | 5003/6638 [58:29<2:27:48, 5.42s/it] {'loss': 0.645, 'grad_norm': 0.6005166433601818, 'learning_rate': 3.01746547521789e-06, 'epoch': 0.75}
75%|███████▌ | 5004/6638 [58:32<2:10:55, 4.81s/it] {'loss': 0.6613, 'grad_norm': 0.5961676383865272, 'learning_rate': 3.0139731239391625e-06, 'epoch': 0.75}
75%|███████▌ | 5005/6638 [58:35<1:58:52, 4.37s/it] {'loss': 0.6367, 'grad_norm': 0.5781750110509635, 'learning_rate': 3.0104824361805074e-06, 'epoch': 0.75}
75%|███████▌ | 5006/6638 [58:39<1:50:46, 4.07s/it] {'loss': 0.6091, 'grad_norm': 0.5802318498265906, 'learning_rate': 3.006993412773124e-06, 'epoch': 0.75}
75%|███████▌ | 5007/6638 [58:42<1:45:08, 3.87s/it] {'loss': 0.5931, 'grad_norm': 0.49266356289414936, 'learning_rate': 3.003506054547827e-06, 'epoch': 0.75}
75%|███████▌ | 5008/6638 [58:46<1:40:19, 3.69s/it] {'loss': 0.6181, 'grad_norm': 0.5771276211813346, 'learning_rate': 3.0000203623350223e-06, 'epoch': 0.75}
75%|███████▌ | 5009/6638 [58:49<1:37:27, 3.59s/it] {'loss': 0.6259, 'grad_norm': 0.5248359373518521, 'learning_rate': 2.9965363369647317e-06, 'epoch': 0.75}
75%|███████▌ | 5010/6638 [58:52<1:34:31, 3.48s/it] {'loss': 0.6497, 'grad_norm': 0.6385234684577478, 'learning_rate': 2.9930539792665767e-06, 'epoch': 0.75}
75%|███████▌ | 5011/6638 [58:55<1:33:02, 3.43s/it] {'loss': 0.6566, 'grad_norm': 0.5495985603804383, 'learning_rate': 2.989573290069776e-06, 'epoch': 0.75}
76%|███████▌ | 5012/6638 [58:59<1:30:53, 3.35s/it] {'loss': 0.6273, 'grad_norm': 0.6201803880995693, 'learning_rate': 2.986094270203156e-06, 'epoch': 0.76}
76%|███████▌ | 5013/6638 [59:02<1:30:27, 3.34s/it] {'loss': 0.6381, 'grad_norm': 0.5743829876625187, 'learning_rate': 2.982616920495147e-06, 'epoch': 0.76}
76%|███████▌ | 5014/6638 [59:05<1:31:52, 3.39s/it] {'loss': 0.6469, 'grad_norm': 0.5609926295528549, 'learning_rate': 2.979141241773775e-06, 'epoch': 0.76}
76%|███████▌ | 5015/6638 [59:09<1:32:09, 3.41s/it] {'loss': 0.6355, 'grad_norm': 0.5402810692856813, 'learning_rate': 2.975667234866675e-06, 'epoch': 0.76}
76%|███████▌ | 5016/6638 [59:12<1:31:17, 3.38s/it] {'loss': 0.6393, 'grad_norm': 0.544290158882814, 'learning_rate': 2.9721949006010807e-06, 'epoch': 0.76}
76%|███████▌ | 5017/6638 [59:15<1:29:44, 3.32s/it] {'loss': 0.6368, 'grad_norm': 0.607647456983409, 'learning_rate': 2.968724239803831e-06, 'epoch': 0.76}
76%|███████▌ | 5018/6638 [59:19<1:29:25, 3.31s/it] {'loss': 0.6284, 'grad_norm': 0.562125749408481, 'learning_rate': 2.96525525330136e-06, 'epoch': 0.76}
76%|███████▌ | 5019/6638 [59:22<1:28:58, 3.30s/it] {'loss': 0.628, 'grad_norm': 0.6223018873377979, 'learning_rate': 2.9617879419197037e-06, 'epoch': 0.76}
76%|███████▌ | 5020/6638 [59:25<1:30:24, 3.35s/it] {'loss': 0.615, 'grad_norm': 0.5796296582167931, 'learning_rate': 2.9583223064845057e-06, 'epoch': 0.76}
76%|███████▌ | 5021/6638 [59:29<1:31:37, 3.40s/it] {'loss': 0.6499, 'grad_norm': 0.5148049464969962, 'learning_rate': 2.9548583478210045e-06, 'epoch': 0.76}
76%|███████▌ | 5022/6638 [59:32<1:30:56, 3.38s/it] {'loss': 0.7336, 'grad_norm': 0.6732116124384073, 'learning_rate': 2.9513960667540475e-06, 'epoch': 0.76}
76%|███████▌ | 5023/6638 [59:36<1:30:33, 3.36s/it] {'loss': 0.6767, 'grad_norm': 0.6521969519941786, 'learning_rate': 2.947935464108069e-06, 'epoch': 0.76}
76%|███████▌ | 5024/6638 [59:39<1:29:24, 3.32s/it] {'loss': 0.6309, 'grad_norm': 0.6027112238047306, 'learning_rate': 2.9444765407071153e-06, 'epoch': 0.76}
76%|███████▌ | 5025/6638 [59:42<1:28:19, 3.29s/it] {'loss': 0.6346, 'grad_norm': 0.5610403187097133, 'learning_rate': 2.9410192973748298e-06, 'epoch': 0.76}
76%|███████▌ | 5026/6638 [59:45<1:28:18, 3.29s/it] {'loss': 0.6409, 'grad_norm': 0.5800291931185813, 'learning_rate': 2.937563734934451e-06, 'epoch': 0.76}
76%|███████▌ | 5027/6638 [59:49<1:27:54, 3.27s/it] {'loss': 0.6895, 'grad_norm': 0.6873825780156744, 'learning_rate': 2.9341098542088232e-06, 'epoch': 0.76}
76%|███████▌ | 5028/6638 [59:52<1:27:34, 3.26s/it] {'loss': 0.6158, 'grad_norm': 0.5655100869228301, 'learning_rate': 2.9306576560203926e-06, 'epoch': 0.76}
76%|███████▌ | 5029/6638 [59:55<1:28:10, 3.29s/it] {'loss': 0.621, 'grad_norm': 0.5776926250933179, 'learning_rate': 2.927207141191192e-06, 'epoch': 0.76}
76%|███████▌ | 5030/6638 [59:58<1:28:10, 3.29s/it] {'loss': 0.6469, 'grad_norm': 0.606690087475936, 'learning_rate': 2.923758310542868e-06, 'epoch': 0.76}
76%|███████▌ | 5031/6638 [1:00:02<1:27:23, 3.26s/it] {'loss': 0.637, 'grad_norm': 0.6071345256488706, 'learning_rate': 2.920311164896655e-06, 'epoch': 0.76}
76%|███████▌ | 5032/6638 [1:00:05<1:30:24, 3.38s/it] {'loss': 0.6651, 'grad_norm': 0.6761929808550046, 'learning_rate': 2.916865705073393e-06, 'epoch': 0.76}
76%|███████▌ | 5033/6638 [1:00:09<1:31:18, 3.41s/it] {'loss': 0.6348, 'grad_norm': 0.5361212734464413, 'learning_rate': 2.913421931893523e-06, 'epoch': 0.76}
76%|███████▌ | 5034/6638 [1:00:12<1:29:46, 3.36s/it] {'loss': 0.6473, 'grad_norm': 0.6110313985771125, 'learning_rate': 2.909979846177071e-06, 'epoch': 0.76}
76%|███████▌ | 5035/6638 [1:00:15<1:28:43, 3.32s/it] {'loss': 0.6467, 'grad_norm': 0.6196826618062274, 'learning_rate': 2.906539448743676e-06, 'epoch': 0.76}
76%|███████▌ | 5036/6638 [1:00:18<1:27:21, 3.27s/it] {'loss': 0.6347, 'grad_norm': 0.6269192729053048, 'learning_rate': 2.903100740412571e-06, 'epoch': 0.76}
76%|███████▌ | 5037/6638 [1:00:22<1:27:05, 3.26s/it] {'loss': 0.67, 'grad_norm': 0.6562014733363246, 'learning_rate': 2.8996637220025782e-06, 'epoch': 0.76}
76%|███████▌ | 5038/6638 [1:00:25<1:27:13, 3.27s/it] {'loss': 0.6223, 'grad_norm': 0.6028274832409745, 'learning_rate': 2.8962283943321277e-06, 'epoch': 0.76}
76%|███████▌ | 5039/6638 [1:00:28<1:28:55, 3.34s/it] {'loss': 0.6181, 'grad_norm': 0.5580483442829662, 'learning_rate': 2.892794758219243e-06, 'epoch': 0.76}
76%|███████▌ | 5040/6638 [1:00:32<1:29:15, 3.35s/it] {'loss': 0.6885, 'grad_norm': 0.7037961277869906, 'learning_rate': 2.889362814481549e-06, 'epoch': 0.76}
76%|███████▌ | 5041/6638 [1:00:35<1:28:33, 3.33s/it] {'loss': 0.6672, 'grad_norm': 0.652409766412618, 'learning_rate': 2.885932563936259e-06, 'epoch': 0.76}
76%|███████▌ | 5042/6638 [1:00:38<1:29:10, 3.35s/it] {'loss': 0.6388, 'grad_norm': 0.566926841583556, 'learning_rate': 2.8825040074001877e-06, 'epoch': 0.76}
76%|███████▌ | 5043/6638 [1:00:42<1:27:42, 3.30s/it] {'loss': 0.6264, 'grad_norm': 0.648535690151687, 'learning_rate': 2.879077145689746e-06, 'epoch': 0.76}
76%|███████▌ | 5044/6638 [1:00:45<1:27:35, 3.30s/it] {'loss': 0.6199, 'grad_norm': 0.5919021213675583, 'learning_rate': 2.875651979620945e-06, 'epoch': 0.76}
76%|███████▌ | 5045/6638 [1:00:48<1:27:29, 3.30s/it] {'loss': 0.6627, 'grad_norm': 0.60793625755289, 'learning_rate': 2.8722285100093894e-06, 'epoch': 0.76}
76%|███████▌ | 5046/6638 [1:00:51<1:27:09, 3.28s/it] {'loss': 0.6224, 'grad_norm': 0.5365654445491085, 'learning_rate': 2.8688067376702743e-06, 'epoch': 0.76}
76%|███████▌ | 5047/6638 [1:00:55<1:28:04, 3.32s/it] {'loss': 0.6137, 'grad_norm': 0.672065479612535, 'learning_rate': 2.8653866634184e-06, 'epoch': 0.76}
76%|███████▌ | 5048/6638 [1:00:58<1:28:10, 3.33s/it] {'loss': 0.6405, 'grad_norm': 0.5960838816662613, 'learning_rate': 2.86196828806816e-06, 'epoch': 0.76}
76%|███████▌ | 5049/6638 [1:01:02<1:27:49, 3.32s/it] {'loss': 0.5952, 'grad_norm': 0.5871666750393915, 'learning_rate': 2.8585516124335355e-06, 'epoch': 0.76}
AutoResumeHook: Checking whether to suspend...
76%|███████▌ | 5050/6638 [1:01:05<1:27:23, 3.30s/it] {'loss': 0.639, 'grad_norm': 0.5762455316911813, 'learning_rate': 2.8551366373281107e-06, 'epoch': 0.76}
76%|███████▌ | 5051/6638 [1:01:08<1:27:47, 3.32s/it] {'loss': 0.6298, 'grad_norm': 0.5354884723874843, 'learning_rate': 2.8517233635650687e-06, 'epoch': 0.76}
76%|███████▌ | 5052/6638 [1:01:11<1:26:38, 3.28s/it] {'loss': 0.6615, 'grad_norm': 0.6125381129956596, 'learning_rate': 2.8483117919571748e-06, 'epoch': 0.76}
76%|███████▌ | 5053/6638 [1:01:15<1:26:59, 3.29s/it] {'loss': 0.6806, 'grad_norm': 0.672762147934036, 'learning_rate': 2.844901923316801e-06, 'epoch': 0.76}
76%|███████▌ | 5054/6638 [1:01:18<1:26:42, 3.28s/it] {'loss': 0.6549, 'grad_norm': 0.60124289317949, 'learning_rate': 2.8414937584559e-06, 'epoch': 0.76}
76%|███████▌ | 5055/6638 [1:01:21<1:26:20, 3.27s/it] {'loss': 0.6216, 'grad_norm': 0.6034404287842172, 'learning_rate': 2.8380872981860396e-06, 'epoch': 0.76}
76%|███████▌ | 5056/6638 [1:01:24<1:26:23, 3.28s/it] {'loss': 0.6285, 'grad_norm': 0.5178416616828233, 'learning_rate': 2.8346825433183654e-06, 'epoch': 0.76}
76%|███████▌ | 5057/6638 [1:01:28<1:25:15, 3.24s/it] {'loss': 0.6491, 'grad_norm': 0.6383479499569441, 'learning_rate': 2.831279494663616e-06, 'epoch': 0.76}
76%|███████▌ | 5058/6638 [1:01:31<1:24:47, 3.22s/it] {'loss': 0.6146, 'grad_norm': 0.6157815876889629, 'learning_rate': 2.827878153032133e-06, 'epoch': 0.76}
76%|███████▌ | 5059/6638 [1:01:34<1:25:04, 3.23s/it] {'loss': 0.6562, 'grad_norm': 0.6242455240162906, 'learning_rate': 2.8244785192338488e-06, 'epoch': 0.76}
76%|███████▌ | 5060/6638 [1:01:37<1:25:38, 3.26s/it] {'loss': 0.6739, 'grad_norm': 0.6077750558025291, 'learning_rate': 2.821080594078288e-06, 'epoch': 0.76}
76%|███████▌ | 5061/6638 [1:01:41<1:25:41, 3.26s/it] {'loss': 0.6519, 'grad_norm': 0.6063913772425584, 'learning_rate': 2.8176843783745665e-06, 'epoch': 0.76}
76%|███████▋ | 5062/6638 [1:01:44<1:25:17, 3.25s/it] {'loss': 0.618, 'grad_norm': 0.5455713743959035, 'learning_rate': 2.8142898729313885e-06, 'epoch': 0.76}
76%|███████▋ | 5063/6638 [1:01:47<1:25:30, 3.26s/it] {'loss': 0.6419, 'grad_norm': 0.5844317392683479, 'learning_rate': 2.81089707855707e-06, 'epoch': 0.76}
76%|███████▋ | 5064/6638 [1:01:50<1:26:03, 3.28s/it] {'loss': 0.6377, 'grad_norm': 0.5916473109637415, 'learning_rate': 2.8075059960594998e-06, 'epoch': 0.76}
76%|███████▋ | 5065/6638 [1:01:54<1:26:05, 3.28s/it] {'loss': 0.6648, 'grad_norm': 0.5903693633563627, 'learning_rate': 2.804116626246164e-06, 'epoch': 0.76}
76%|███████▋ | 5066/6638 [1:01:57<1:25:36, 3.27s/it] {'loss': 0.6407, 'grad_norm': 0.6201882164858642, 'learning_rate': 2.8007289699241435e-06, 'epoch': 0.76}
76%|███████▋ | 5067/6638 [1:02:00<1:25:44, 3.27s/it] {'loss': 0.6646, 'grad_norm': 0.6345480990910604, 'learning_rate': 2.7973430279001146e-06, 'epoch': 0.76}
76%|███████▋ | 5068/6638 [1:02:03<1:25:08, 3.25s/it] {'loss': 0.6293, 'grad_norm': 0.5963947689201611, 'learning_rate': 2.793958800980341e-06, 'epoch': 0.76}
76%|███████▋ | 5069/6638 [1:02:07<1:24:31, 3.23s/it] {'loss': 0.6788, 'grad_norm': 0.6397437544068969, 'learning_rate': 2.7905762899706734e-06, 'epoch': 0.76}
76%|███████▋ | 5070/6638 [1:02:10<1:23:51, 3.21s/it] {'loss': 0.6515, 'grad_norm': 0.6777407197266302, 'learning_rate': 2.7871954956765625e-06, 'epoch': 0.76}
76%|███████▋ | 5071/6638 [1:02:13<1:24:30, 3.24s/it] {'loss': 0.6541, 'grad_norm': 0.6598458129309446, 'learning_rate': 2.7838164189030493e-06, 'epoch': 0.76}
76%|███████▋ | 5072/6638 [1:02:16<1:23:56, 3.22s/it] {'loss': 0.6077, 'grad_norm': 0.5594747933737536, 'learning_rate': 2.780439060454756e-06, 'epoch': 0.76}
76%|███████▋ | 5073/6638 [1:02:20<1:24:27, 3.24s/it] {'loss': 0.6461, 'grad_norm': 0.58068648492506, 'learning_rate': 2.777063421135907e-06, 'epoch': 0.76}
76%|███████▋ | 5074/6638 [1:02:23<1:23:57, 3.22s/it] {'loss': 0.6299, 'grad_norm': 0.6014424676729205, 'learning_rate': 2.7736895017503163e-06, 'epoch': 0.76}
76%|███████▋ | 5075/6638 [1:02:26<1:24:34, 3.25s/it] {'loss': 0.6716, 'grad_norm': 0.6006827394726385, 'learning_rate': 2.7703173031013773e-06, 'epoch': 0.76}
76%|███████▋ | 5076/6638 [1:02:29<1:25:11, 3.27s/it] {'loss': 0.6101, 'grad_norm': 0.574217909456707, 'learning_rate': 2.766946825992091e-06, 'epoch': 0.76}
76%|███████▋ | 5077/6638 [1:02:33<1:24:45, 3.26s/it] {'loss': 0.6422, 'grad_norm': 0.6475494693926311, 'learning_rate': 2.763578071225028e-06, 'epoch': 0.76}
76%|███████▋ | 5078/6638 [1:02:36<1:24:52, 3.26s/it] {'loss': 0.6335, 'grad_norm': 0.6378022827094, 'learning_rate':
2.7602110396023672e-06, 'epoch': 0.76} 76%|███████▋ | 5078/6638 [1:02:36<1:24:52, 3.26s/it] 77%|███████▋ | 5079/6638 [1:02:39<1:24:43, 3.26s/it] {'loss': 0.6364, 'grad_norm': 0.5754232582847884, 'learning_rate': 2.7568457319258714e-06, 'epoch': 0.77} 77%|███████▋ | 5079/6638 [1:02:39<1:24:43, 3.26s/it] 77%|███████▋ | 5080/6638 [1:02:42<1:24:09, 3.24s/it] {'loss': 0.6237, 'grad_norm': 0.5517064485406961, 'learning_rate': 2.7534821489968833e-06, 'epoch': 0.77} 77%|███████▋ | 5080/6638 [1:02:42<1:24:09, 3.24s/it] 77%|███████▋ | 5081/6638 [1:02:46<1:25:45, 3.30s/it] {'loss': 0.6403, 'grad_norm': 0.5656895123011692, 'learning_rate': 2.7501202916163484e-06, 'epoch': 0.77} 77%|███████▋ | 5081/6638 [1:02:46<1:25:45, 3.30s/it] 77%|███████▋ | 5082/6638 [1:02:49<1:25:47, 3.31s/it] {'loss': 0.6757, 'grad_norm': 0.6450219960186464, 'learning_rate': 2.7467601605847937e-06, 'epoch': 0.77} 77%|███████▋ | 5082/6638 [1:02:49<1:25:47, 3.31s/it] 77%|███████▋ | 5083/6638 [1:02:52<1:25:43, 3.31s/it] {'loss': 0.6643, 'grad_norm': 0.6503145945241453, 'learning_rate': 2.743401756702341e-06, 'epoch': 0.77} 77%|███████▋ | 5083/6638 [1:02:52<1:25:43, 3.31s/it] 77%|███████▋ | 5084/6638 [1:02:56<1:26:15, 3.33s/it] {'loss': 0.6079, 'grad_norm': 0.548856584003815, 'learning_rate': 2.740045080768694e-06, 'epoch': 0.77} 77%|███████▋ | 5084/6638 [1:02:56<1:26:15, 3.33s/it] 77%|███████▋ | 5085/6638 [1:02:59<1:26:47, 3.35s/it] {'loss': 0.6459, 'grad_norm': 0.600431328712268, 'learning_rate': 2.736690133583143e-06, 'epoch': 0.77} 77%|███████▋ | 5085/6638 [1:02:59<1:26:47, 3.35s/it] 77%|███████▋ | 5086/6638 [1:03:02<1:26:15, 3.33s/it] {'loss': 0.5992, 'grad_norm': 0.5387205189453892, 'learning_rate': 2.733336915944581e-06, 'epoch': 0.77} 77%|███████▋ | 5086/6638 [1:03:03<1:26:15, 3.33s/it] 77%|███████▋ | 5087/6638 [1:03:06<1:25:23, 3.30s/it] {'loss': 0.6277, 'grad_norm': 0.5574135207018568, 'learning_rate': 2.7299854286514727e-06, 'epoch': 0.77} 77%|███████▋ | 5087/6638 [1:03:06<1:25:23, 3.30s/it] 
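The ETA in these tqdm progress lines is plain arithmetic: remaining steps times the rolling s/it average. A minimal sketch, using the numbers reported for step 5044 (the variable names here are illustrative, not from the training code):

```python
# ETA arithmetic behind a tqdm progress line such as
# "5044/6638 [1:00:45<1:27:35, 3.30s/it]".
total_steps = 6638
step = 5044
sec_per_it = 3.30  # rolling per-step average reported by tqdm

remaining = (total_steps - step) * sec_per_it
h, rem = divmod(int(remaining), 3600)
m, s = divmod(rem, 60)
print(f"~{h}:{m:02d}:{s:02d} remaining")  # ~1:27:40, close to the logged <1:27:35
```

The small gap to the logged `<1:27:35` comes from tqdm averaging at higher precision than the displayed 3.30 s/it.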
 77%|███████▋ | 5088/6638 [1:03:09<1:25:11, 3.30s/it] {'loss': 0.6261, 'grad_norm': 0.6165568722891932, 'learning_rate': 2.726635672501884e-06, 'epoch': 0.77}
[steps 5089-5099 elided: loss 0.61-0.66, lr 2.72e-06 -> 2.69e-06, ~3.3 s/it]
[ranks 0-7] AutoResumeHook: Checking whether to suspend...
 77%|███████▋ | 5100/6638 [1:03:48<1:23:28, 3.26s/it] {'loss': 0.6261, 'grad_norm': 0.592577392229917, 'learning_rate': 2.6865739178314933e-06, 'epoch': 0.77}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5100/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5100/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5100/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
 77%|███████▋ | 5101/6638 [1:04:06<3:12:07, 7.50s/it] {'loss': 0.627, 'grad_norm': 0.6090712639776322, 'learning_rate': 2.683246738983217e-06, 'epoch': 0.77}
[steps 5102-5144 elided: loss 0.60-0.68, lr 2.68e-06 -> 2.54e-06; per-step time recovers from the checkpoint write back to ~3.3 s/it]
 78%|███████▊ | 5145/6638 [1:06:30<1:21:19, 3.27s/it] {'loss': 0.6046, 'grad_norm': 0.584795571010535, 'learning_rate': 2.5385868942828042e-06, 'epoch': 0.78}
[steps 5146-5149 elided: loss 0.62-0.68, lr 2.54e-06 -> 2.53e-06, ~3.3 s/it]
[ranks 0-7] AutoResumeHook: Checking whether to suspend...
 78%|███████▊ | 5150/6638 [1:06:47<1:21:35, 3.29s/it] {'loss': 0.6422, 'grad_norm': 0.6600295573575681, 'learning_rate': 2.522364684220796e-06, 'epoch': 0.78}
[steps 5151-5186 elided: loss 0.60-0.70, lr 2.52e-06 -> 2.41e-06, ~3.3 s/it]
 78%|███████▊ | 5187/6638 [1:08:49<1:18:22, 3.24s/it] {'loss': 0.6254, 'grad_norm': 0.5969065971992484, 'learning_rate': 2.403710203287033e-06, 'epoch': 0.78}
 78%|███████▊ | 5188/6638 [1:08:53<1:19:33, 3.29s/it] {'loss': 0.6627, 'grad_norm': 0.6113656629994944, 'learning_rate': 2.4005375201130275e-06, 'epoch': 0.78}
[steps 5189-5199 elided: loss 0.61-0.67, lr 2.40e-06 -> 2.37e-06, ~3.3 s/it]
[ranks 0-7] AutoResumeHook: Checking whether to suspend...
 78%|███████▊ | 5200/6638 [1:09:32<1:18:21, 3.27s/it] {'loss': 0.6314, 'grad_norm': 0.6016569715429921, 'learning_rate': 2.3626066858371477e-06, 'epoch': 0.78}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5200/mm_projector
 78%|███████▊ | 5201/6638 [1:09:51<3:05:06, 7.73s/it] {'loss': 0.6274, 'grad_norm': 0.5545019088591738, 'learning_rate': 2.3594575861297355e-06, 'epoch': 0.78}
 78%|███████▊ | 5202/6638 [1:09:54<2:36:12, 6.53s/it] {'loss': 0.6658, 'grad_norm': 0.5111985635698727, 'learning_rate': 2.3563103057963188e-06, 'epoch': 0.78}
 78%|███████▊ | 5203/6638 [1:09:58<2:12:17, 5.53s/it] {'loss': 0.6208, 'grad_norm': 0.57439745055704, 'learning_rate': 2.3531648455863345e-06, 'epoch': 0.78}
 78%|███████▊ | 5204/6638 [1:10:01<1:58:58, 4.98s/it] {'loss': 0.7117, 'grad_norm': 0.7039122552966799, 'learning_rate': 2.350021206248777e-06, 'epoch': 0.78}
 78%|███████▊ | 5205/6638 [1:10:04<1:46:44, 4.47s/it] {'loss': 0.6236, 'grad_norm': 0.6151822610487079, 'learning_rate': 2.346879388532215e-06, 'epoch': 0.78}
4.47s/it] 78%|███████▊ | 5206/6638 [1:10:08<1:37:58, 4.11s/it] {'loss': 0.6029, 'grad_norm': 0.582924655602004, 'learning_rate': 2.3437393931847842e-06, 'epoch': 0.78} 78%|███████▊ | 5206/6638 [1:10:08<1:37:58, 4.11s/it] 78%|███████▊ | 5207/6638 [1:10:11<1:32:35, 3.88s/it] {'loss': 0.6046, 'grad_norm': 0.5254249791675618, 'learning_rate': 2.3406012209541795e-06, 'epoch': 0.78} 78%|███████▊ | 5207/6638 [1:10:11<1:32:35, 3.88s/it] 78%|███████▊ | 5208/6638 [1:10:15<1:30:45, 3.81s/it] {'loss': 0.6667, 'grad_norm': 0.5883956977631936, 'learning_rate': 2.337464872587667e-06, 'epoch': 0.78} 78%|███████▊ | 5208/6638 [1:10:15<1:30:45, 3.81s/it] 78%|███████▊ | 5209/6638 [1:10:18<1:26:25, 3.63s/it] {'loss': 0.6177, 'grad_norm': 0.5715221842484964, 'learning_rate': 2.334330348832079e-06, 'epoch': 0.78} 78%|███████▊ | 5209/6638 [1:10:18<1:26:25, 3.63s/it] 78%|███████▊ | 5210/6638 [1:10:21<1:23:41, 3.52s/it] {'loss': 0.625, 'grad_norm': 0.589874498056512, 'learning_rate': 2.331197650433813e-06, 'epoch': 0.78} 78%|███████▊ | 5210/6638 [1:10:21<1:23:41, 3.52s/it] 79%|███████▊ | 5211/6638 [1:10:24<1:21:41, 3.43s/it] {'loss': 0.6413, 'grad_norm': 0.581556581909065, 'learning_rate': 2.3280667781388276e-06, 'epoch': 0.79} 79%|███████▊ | 5211/6638 [1:10:24<1:21:41, 3.43s/it] 79%|███████▊ | 5212/6638 [1:10:28<1:20:20, 3.38s/it] {'loss': 0.6464, 'grad_norm': 0.5573586809722952, 'learning_rate': 2.324937732692647e-06, 'epoch': 0.79} 79%|███████▊ | 5212/6638 [1:10:28<1:20:20, 3.38s/it] 79%|███████▊ | 5213/6638 [1:10:31<1:19:25, 3.34s/it] {'loss': 0.6538, 'grad_norm': 0.579952947066484, 'learning_rate': 2.3218105148403657e-06, 'epoch': 0.79} 79%|███████▊ | 5213/6638 [1:10:31<1:19:25, 3.34s/it] 79%|███████▊ | 5214/6638 [1:10:35<1:21:31, 3.44s/it] {'loss': 0.7003, 'grad_norm': 0.7065650984278975, 'learning_rate': 2.3186851253266397e-06, 'epoch': 0.79} 79%|███████▊ | 5214/6638 [1:10:35<1:21:31, 3.44s/it] 79%|███████▊ | 5215/6638 [1:10:38<1:19:40, 3.36s/it] {'loss': 0.6093, 'grad_norm': 
0.5843000546067109, 'learning_rate': 2.3155615648956944e-06, 'epoch': 0.79} 79%|███████▊ | 5215/6638 [1:10:38<1:19:40, 3.36s/it] 79%|███████▊ | 5216/6638 [1:10:41<1:18:25, 3.31s/it] {'loss': 0.6227, 'grad_norm': 0.6421815556418948, 'learning_rate': 2.312439834291307e-06, 'epoch': 0.79} 79%|███████▊ | 5216/6638 [1:10:41<1:18:25, 3.31s/it] 79%|███████▊ | 5217/6638 [1:10:44<1:18:12, 3.30s/it] {'loss': 0.6223, 'grad_norm': 0.5661658442676469, 'learning_rate': 2.3093199342568316e-06, 'epoch': 0.79} 79%|███████▊ | 5217/6638 [1:10:44<1:18:12, 3.30s/it] 79%|███████▊ | 5218/6638 [1:10:48<1:17:45, 3.29s/it] {'loss': 0.6092, 'grad_norm': 0.5821973485681161, 'learning_rate': 2.306201865535186e-06, 'epoch': 0.79} 79%|███████▊ | 5218/6638 [1:10:48<1:17:45, 3.29s/it] 79%|███████▊ | 5219/6638 [1:10:51<1:17:19, 3.27s/it] {'loss': 0.6086, 'grad_norm': 0.5643115961747673, 'learning_rate': 2.303085628868843e-06, 'epoch': 0.79} 79%|███████▊ | 5219/6638 [1:10:51<1:17:19, 3.27s/it] 79%|███████▊ | 5220/6638 [1:10:54<1:17:18, 3.27s/it] {'loss': 0.6339, 'grad_norm': 0.5772934309748639, 'learning_rate': 2.2999712249998396e-06, 'epoch': 0.79} 79%|███████▊ | 5220/6638 [1:10:54<1:17:18, 3.27s/it] 79%|███████▊ | 5221/6638 [1:10:57<1:17:13, 3.27s/it] {'loss': 0.6126, 'grad_norm': 0.5890820746900108, 'learning_rate': 2.2968586546697914e-06, 'epoch': 0.79} 79%|███████▊ | 5221/6638 [1:10:57<1:17:13, 3.27s/it] 79%|███████▊ | 5222/6638 [1:11:01<1:17:22, 3.28s/it] {'loss': 0.6079, 'grad_norm': 0.5344900059497018, 'learning_rate': 2.293747918619861e-06, 'epoch': 0.79} 79%|███████▊ | 5222/6638 [1:11:01<1:17:22, 3.28s/it] 79%|███████▊ | 5223/6638 [1:11:04<1:16:46, 3.26s/it] {'loss': 0.6412, 'grad_norm': 0.6148023069046679, 'learning_rate': 2.2906390175907823e-06, 'epoch': 0.79} 79%|███████▊ | 5223/6638 [1:11:04<1:16:46, 3.26s/it] 79%|███████▊ | 5224/6638 [1:11:07<1:16:34, 3.25s/it] {'loss': 0.6222, 'grad_norm': 0.5701679012628815, 'learning_rate': 2.2875319523228466e-06, 'epoch': 0.79} 79%|███████▊ | 
5224/6638 [1:11:07<1:16:34, 3.25s/it] 79%|███████▊ | 5225/6638 [1:11:10<1:16:16, 3.24s/it] {'loss': 0.6462, 'grad_norm': 0.6014334393863157, 'learning_rate': 2.284426723555914e-06, 'epoch': 0.79} 79%|███████▊ | 5225/6638 [1:11:10<1:16:16, 3.24s/it] 79%|███████▊ | 5226/6638 [1:11:14<1:17:12, 3.28s/it] {'loss': 0.6915, 'grad_norm': 0.7040053249002608, 'learning_rate': 2.281323332029407e-06, 'epoch': 0.79} 79%|███████▊ | 5226/6638 [1:11:14<1:17:12, 3.28s/it] 79%|███████▊ | 5227/6638 [1:11:17<1:16:29, 3.25s/it] {'loss': 0.6524, 'grad_norm': 0.6707503652571731, 'learning_rate': 2.2782217784823036e-06, 'epoch': 0.79} 79%|███████▊ | 5227/6638 [1:11:17<1:16:29, 3.25s/it] 79%|███████▉ | 5228/6638 [1:11:20<1:16:21, 3.25s/it] {'loss': 0.6593, 'grad_norm': 0.6007979297090515, 'learning_rate': 2.2751220636531523e-06, 'epoch': 0.79} 79%|███████▉ | 5228/6638 [1:11:20<1:16:21, 3.25s/it] 79%|███████▉ | 5229/6638 [1:11:23<1:16:19, 3.25s/it] {'loss': 0.6493, 'grad_norm': 0.5663396527341037, 'learning_rate': 2.272024188280062e-06, 'epoch': 0.79} 79%|███████▉ | 5229/6638 [1:11:23<1:16:19, 3.25s/it] 79%|███████▉ | 5230/6638 [1:11:27<1:15:56, 3.24s/it] {'loss': 0.6127, 'grad_norm': 0.5612064473840379, 'learning_rate': 2.268928153100697e-06, 'epoch': 0.79} 79%|███████▉ | 5230/6638 [1:11:27<1:15:56, 3.24s/it] 79%|███████▉ | 5231/6638 [1:11:30<1:16:00, 3.24s/it] {'loss': 0.6279, 'grad_norm': 0.5972695495586305, 'learning_rate': 2.2658339588522906e-06, 'epoch': 0.79} 79%|███████▉ | 5231/6638 [1:11:30<1:16:00, 3.24s/it] 79%|███████▉ | 5232/6638 [1:11:33<1:17:08, 3.29s/it] {'loss': 0.6509, 'grad_norm': 0.5675716840492354, 'learning_rate': 2.2627416062716366e-06, 'epoch': 0.79} 79%|███████▉ | 5232/6638 [1:11:33<1:17:08, 3.29s/it] 79%|███████▉ | 5233/6638 [1:11:36<1:16:23, 3.26s/it] {'loss': 0.6868, 'grad_norm': 0.6667785114975984, 'learning_rate': 2.259651096095091e-06, 'epoch': 0.79} 79%|███████▉ | 5233/6638 [1:11:36<1:16:23, 3.26s/it] 79%|███████▉ | 5234/6638 [1:11:40<1:16:33, 3.27s/it] 
{'loss': 0.643, 'grad_norm': 0.8420377716134365, 'learning_rate': 2.2565624290585674e-06, 'epoch': 0.79} 79%|███████▉ | 5234/6638 [1:11:40<1:16:33, 3.27s/it] 79%|███████▉ | 5235/6638 [1:11:43<1:16:47, 3.28s/it] {'loss': 0.6269, 'grad_norm': 0.57150950053485, 'learning_rate': 2.2534756058975403e-06, 'epoch': 0.79} 79%|███████▉ | 5235/6638 [1:11:43<1:16:47, 3.28s/it] 79%|███████▉ | 5236/6638 [1:11:46<1:16:28, 3.27s/it] {'loss': 0.6796, 'grad_norm': 0.7401038318690882, 'learning_rate': 2.250390627347049e-06, 'epoch': 0.79} 79%|███████▉ | 5236/6638 [1:11:46<1:16:28, 3.27s/it] 79%|███████▉ | 5237/6638 [1:11:49<1:16:26, 3.27s/it] {'loss': 0.6287, 'grad_norm': 0.6778562858646258, 'learning_rate': 2.2473074941416916e-06, 'epoch': 0.79} 79%|███████▉ | 5237/6638 [1:11:49<1:16:26, 3.27s/it] 79%|███████▉ | 5238/6638 [1:11:53<1:16:44, 3.29s/it] {'loss': 0.6348, 'grad_norm': 0.5790058110665176, 'learning_rate': 2.2442262070156294e-06, 'epoch': 0.79} 79%|███████▉ | 5238/6638 [1:11:53<1:16:44, 3.29s/it] 79%|███████▉ | 5239/6638 [1:11:56<1:16:28, 3.28s/it] {'loss': 0.6208, 'grad_norm': 0.5665255582331759, 'learning_rate': 2.2411467667025787e-06, 'epoch': 0.79} 79%|███████▉ | 5239/6638 [1:11:56<1:16:28, 3.28s/it] 79%|███████▉ | 5240/6638 [1:11:59<1:16:33, 3.29s/it] {'loss': 0.6349, 'grad_norm': 0.6598568363670676, 'learning_rate': 2.2380691739358186e-06, 'epoch': 0.79} 79%|███████▉ | 5240/6638 [1:11:59<1:16:33, 3.29s/it] 79%|███████▉ | 5241/6638 [1:12:03<1:16:29, 3.29s/it] {'loss': 0.6466, 'grad_norm': 0.6034877712726772, 'learning_rate': 2.2349934294481943e-06, 'epoch': 0.79} 79%|███████▉ | 5241/6638 [1:12:03<1:16:29, 3.29s/it] 79%|███████▉ | 5242/6638 [1:12:06<1:17:08, 3.32s/it] {'loss': 0.6748, 'grad_norm': 0.7081100647330546, 'learning_rate': 2.231919533972098e-06, 'epoch': 0.79} 79%|███████▉ | 5242/6638 [1:12:06<1:17:08, 3.32s/it] 79%|███████▉ | 5243/6638 [1:12:09<1:16:32, 3.29s/it] {'loss': 0.5923, 'grad_norm': 0.5382275382209557, 'learning_rate': 2.228847488239492e-06, 
'epoch': 0.79} 79%|███████▉ | 5243/6638 [1:12:09<1:16:32, 3.29s/it] 79%|███████▉ | 5244/6638 [1:12:13<1:16:37, 3.30s/it] {'loss': 0.677, 'grad_norm': 0.6548339569159749, 'learning_rate': 2.2257772929818977e-06, 'epoch': 0.79} 79%|███████▉ | 5244/6638 [1:12:13<1:16:37, 3.30s/it] 79%|███████▉ | 5245/6638 [1:12:16<1:16:20, 3.29s/it] {'loss': 0.6448, 'grad_norm': 0.6133340432040966, 'learning_rate': 2.222708948930389e-06, 'epoch': 0.79} 79%|███████▉ | 5245/6638 [1:12:16<1:16:20, 3.29s/it] 79%|███████▉ | 5246/6638 [1:12:19<1:15:48, 3.27s/it] {'loss': 0.6534, 'grad_norm': 0.6343624550155408, 'learning_rate': 2.2196424568156073e-06, 'epoch': 0.79} 79%|███████▉ | 5246/6638 [1:12:19<1:15:48, 3.27s/it] 79%|███████▉ | 5247/6638 [1:12:22<1:16:17, 3.29s/it] {'loss': 0.6016, 'grad_norm': 0.5188145470166905, 'learning_rate': 2.216577817367741e-06, 'epoch': 0.79} 79%|███████▉ | 5247/6638 [1:12:22<1:16:17, 3.29s/it] 79%|███████▉ | 5248/6638 [1:12:26<1:15:57, 3.28s/it] {'loss': 0.6221, 'grad_norm': 0.6219545039339853, 'learning_rate': 2.2135150313165566e-06, 'epoch': 0.79} 79%|███████▉ | 5248/6638 [1:12:26<1:15:57, 3.28s/it] 79%|███████▉ | 5249/6638 [1:12:29<1:16:42, 3.31s/it] {'loss': 0.6688, 'grad_norm': 0.5984742510677109, 'learning_rate': 2.2104540993913613e-06, 'epoch': 0.79} 79%|███████▉ | 5249/6638 [1:12:29<1:16:42, 3.31s/it]6 AutoResumeHook: Checking whether to suspend... 52 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 0 3 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 79%|███████▉ | 5250/6638 [1:12:32<1:16:18, 3.30s/it]1 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6498, 'grad_norm': 0.7804706954722974, 'learning_rate': 2.2073950223210274e-06, 'epoch': 0.79} 79%|███████▉ | 5250/6638 [1:12:32<1:16:18, 3.30s/it] 79%|███████▉ | 5251/6638 [1:12:36<1:16:00, 3.29s/it] {'loss': 0.6058, 'grad_norm': 0.5789505122271277, 'learning_rate': 2.2043378008339845e-06, 'epoch': 0.79} 79%|███████▉ | 5251/6638 [1:12:36<1:16:00, 3.29s/it] 79%|███████▉ | 5252/6638 [1:12:39<1:16:23, 3.31s/it] {'loss': 0.6668, 'grad_norm': 0.6165692278863945, 'learning_rate': 2.2012824356582275e-06, 'epoch': 0.79} 79%|███████▉ | 5252/6638 [1:12:39<1:16:23, 3.31s/it] 79%|███████▉ | 5253/6638 [1:12:42<1:17:25, 3.35s/it] {'loss': 0.6271, 'grad_norm': 0.5041300635659789, 'learning_rate': 2.1982289275212954e-06, 'epoch': 0.79} 79%|███████▉ | 5253/6638 [1:12:42<1:17:25, 3.35s/it] 79%|███████▉ | 5254/6638 [1:12:46<1:16:43, 3.33s/it] {'loss': 0.6385, 'grad_norm': 0.6349233098494725, 'learning_rate': 2.1951772771503e-06, 'epoch': 0.79} 79%|███████▉ | 5254/6638 [1:12:46<1:16:43, 3.33s/it] 79%|███████▉ | 5255/6638 [1:12:49<1:16:54, 3.34s/it] {'loss': 0.6397, 'grad_norm': 0.5696581614566679, 'learning_rate': 2.1921274852718944e-06, 'epoch': 0.79} 79%|███████▉ | 5255/6638 [1:12:49<1:16:54, 3.34s/it] 79%|███████▉ | 5256/6638 [1:12:52<1:16:20, 3.31s/it] {'loss': 0.6518, 'grad_norm': 0.604961847359679, 'learning_rate': 2.1890795526123086e-06, 'epoch': 0.79} 79%|███████▉ | 5256/6638 [1:12:52<1:16:20, 3.31s/it] 79%|███████▉ | 5257/6638 [1:12:56<1:15:38, 3.29s/it] {'loss': 0.6017, 'grad_norm': 0.5861251034548115, 'learning_rate': 2.186033479897316e-06, 'epoch': 0.79} 79%|███████▉ | 5257/6638 [1:12:56<1:15:38, 3.29s/it] 79%|███████▉ | 5258/6638 [1:12:59<1:16:21, 3.32s/it] {'loss': 0.6753, 'grad_norm': 0.6035507483476106, 'learning_rate': 2.182989267852246e-06, 'epoch': 0.79} 79%|███████▉ | 5258/6638 [1:12:59<1:16:21, 3.32s/it] 79%|███████▉ | 5259/6638 [1:13:02<1:16:07, 3.31s/it] {'loss': 0.6211, 'grad_norm': 0.575594324951664, 'learning_rate': 2.1799469172019914e-06, 
'epoch': 0.79} 79%|███████▉ | 5259/6638 [1:13:02<1:16:07, 3.31s/it] 79%|███████▉ | 5260/6638 [1:13:05<1:15:47, 3.30s/it] {'loss': 0.6723, 'grad_norm': 0.5994460615009043, 'learning_rate': 2.176906428671003e-06, 'epoch': 0.79} 79%|███████▉ | 5260/6638 [1:13:05<1:15:47, 3.30s/it] 79%|███████▉ | 5261/6638 [1:13:09<1:16:40, 3.34s/it] {'loss': 0.6606, 'grad_norm': 0.6324883227725829, 'learning_rate': 2.173867802983286e-06, 'epoch': 0.79} 79%|███████▉ | 5261/6638 [1:13:09<1:16:40, 3.34s/it] 79%|███████▉ | 5262/6638 [1:13:12<1:17:32, 3.38s/it] {'loss': 0.6463, 'grad_norm': 0.586111850511905, 'learning_rate': 2.170831040862397e-06, 'epoch': 0.79} 79%|███████▉ | 5262/6638 [1:13:12<1:17:32, 3.38s/it] 79%|███████▉ | 5263/6638 [1:13:16<1:18:23, 3.42s/it] {'loss': 0.643, 'grad_norm': 0.5772598755478007, 'learning_rate': 2.167796143031453e-06, 'epoch': 0.79} 79%|███████▉ | 5263/6638 [1:13:16<1:18:23, 3.42s/it] 79%|███████▉ | 5264/6638 [1:13:19<1:17:39, 3.39s/it] {'loss': 0.6311, 'grad_norm': 0.6019731684349254, 'learning_rate': 2.1647631102131328e-06, 'epoch': 0.79} 79%|███████▉ | 5264/6638 [1:13:19<1:17:39, 3.39s/it] 79%|███████▉ | 5265/6638 [1:13:22<1:16:39, 3.35s/it] {'loss': 0.657, 'grad_norm': 0.5807225295392592, 'learning_rate': 2.161731943129658e-06, 'epoch': 0.79} 79%|███████▉ | 5265/6638 [1:13:22<1:16:39, 3.35s/it] 79%|███████▉ | 5266/6638 [1:13:26<1:16:39, 3.35s/it] {'loss': 0.6181, 'grad_norm': 0.5916279987636685, 'learning_rate': 2.1587026425028168e-06, 'epoch': 0.79} 79%|███████▉ | 5266/6638 [1:13:26<1:16:39, 3.35s/it] 79%|███████▉ | 5267/6638 [1:13:29<1:15:57, 3.32s/it] {'loss': 0.6585, 'grad_norm': 0.5939250670942606, 'learning_rate': 2.1556752090539523e-06, 'epoch': 0.79} 79%|███████▉ | 5267/6638 [1:13:29<1:15:57, 3.32s/it] 79%|███████▉ | 5268/6638 [1:13:32<1:16:11, 3.34s/it] {'loss': 0.6921, 'grad_norm': 0.6424984643481668, 'learning_rate': 2.1526496435039547e-06, 'epoch': 0.79} 79%|███████▉ | 5268/6638 [1:13:32<1:16:11, 3.34s/it] 79%|███████▉ | 5269/6638 
[1:13:36<1:16:41, 3.36s/it] {'loss': 0.6357, 'grad_norm': 0.5905460527240731, 'learning_rate': 2.1496259465732783e-06, 'epoch': 0.79} 79%|███████▉ | 5269/6638 [1:13:36<1:16:41, 3.36s/it] 79%|███████▉ | 5270/6638 [1:13:39<1:16:52, 3.37s/it] {'loss': 0.6965, 'grad_norm': 0.6626213989178882, 'learning_rate': 2.1466041189819266e-06, 'epoch': 0.79} 79%|███████▉ | 5270/6638 [1:13:39<1:16:52, 3.37s/it] 79%|███████▉ | 5271/6638 [1:13:43<1:18:42, 3.45s/it] {'loss': 0.6526, 'grad_norm': 0.5483000102859629, 'learning_rate': 2.1435841614494625e-06, 'epoch': 0.79} 79%|███████▉ | 5271/6638 [1:13:43<1:18:42, 3.45s/it] 79%|███████▉ | 5272/6638 [1:13:46<1:19:12, 3.48s/it] {'loss': 0.6412, 'grad_norm': 0.5539391068920089, 'learning_rate': 2.140566074695002e-06, 'epoch': 0.79} 79%|███████▉ | 5272/6638 [1:13:46<1:19:12, 3.48s/it] 79%|███████▉ | 5273/6638 [1:13:50<1:17:27, 3.40s/it] {'loss': 0.6807, 'grad_norm': 0.6232840665531627, 'learning_rate': 2.1375498594372113e-06, 'epoch': 0.79} 79%|███████▉ | 5273/6638 [1:13:50<1:17:27, 3.40s/it] 79%|███████▉ | 5274/6638 [1:13:53<1:16:07, 3.35s/it] {'loss': 0.6074, 'grad_norm': 0.573636063049436, 'learning_rate': 2.1345355163943173e-06, 'epoch': 0.79} 79%|███████▉ | 5274/6638 [1:13:53<1:16:07, 3.35s/it] 79%|███████▉ | 5275/6638 [1:13:56<1:16:01, 3.35s/it] {'loss': 0.637, 'grad_norm': 0.5395098135107963, 'learning_rate': 2.1315230462840985e-06, 'epoch': 0.79} 79%|███████▉ | 5275/6638 [1:13:56<1:16:01, 3.35s/it] 79%|███████▉ | 5276/6638 [1:14:00<1:16:38, 3.38s/it] {'loss': 0.6321, 'grad_norm': 0.5129888326597918, 'learning_rate': 2.1285124498238905e-06, 'epoch': 0.79} 79%|███████▉ | 5276/6638 [1:14:00<1:16:38, 3.38s/it] 79%|███████▉ | 5277/6638 [1:14:03<1:15:43, 3.34s/it] {'loss': 0.6276, 'grad_norm': 0.5731477162168674, 'learning_rate': 2.125503727730577e-06, 'epoch': 0.79} 79%|███████▉ | 5277/6638 [1:14:03<1:15:43, 3.34s/it] 80%|███████▉ | 5278/6638 [1:14:06<1:15:25, 3.33s/it] {'loss': 0.6831, 'grad_norm': 0.632800443149916, 'learning_rate': 
2.1224968807205914e-06, 'epoch': 0.8} 80%|███████▉ | 5278/6638 [1:14:06<1:15:25, 3.33s/it] 80%|███████▉ | 5279/6638 [1:14:10<1:15:16, 3.32s/it] {'loss': 0.6416, 'grad_norm': 0.6360364154041436, 'learning_rate': 2.1194919095099396e-06, 'epoch': 0.8} 80%|███████▉ | 5279/6638 [1:14:10<1:15:16, 3.32s/it] 80%|███████▉ | 5280/6638 [1:14:13<1:15:06, 3.32s/it] {'loss': 0.6526, 'grad_norm': 0.5859627876760501, 'learning_rate': 2.116488814814163e-06, 'epoch': 0.8} 80%|███████▉ | 5280/6638 [1:14:13<1:15:06, 3.32s/it] 80%|███████▉ | 5281/6638 [1:14:16<1:15:25, 3.33s/it] {'loss': 0.6713, 'grad_norm': 0.639287244463641, 'learning_rate': 2.113487597348357e-06, 'epoch': 0.8} 80%|███████▉ | 5281/6638 [1:14:16<1:15:25, 3.33s/it] 80%|███████▉ | 5282/6638 [1:14:20<1:15:21, 3.33s/it] {'loss': 0.6301, 'grad_norm': 0.5493311362699541, 'learning_rate': 2.110488257827179e-06, 'epoch': 0.8} 80%|███████▉ | 5282/6638 [1:14:20<1:15:21, 3.33s/it] 80%|███████▉ | 5283/6638 [1:14:23<1:14:59, 3.32s/it] {'loss': 0.6249, 'grad_norm': 0.6052738147583528, 'learning_rate': 2.107490796964835e-06, 'epoch': 0.8} 80%|███████▉ | 5283/6638 [1:14:23<1:14:59, 3.32s/it] 80%|███████▉ | 5284/6638 [1:14:26<1:14:42, 3.31s/it] {'loss': 0.6067, 'grad_norm': 0.5483961724678146, 'learning_rate': 2.1044952154750864e-06, 'epoch': 0.8} 80%|███████▉ | 5284/6638 [1:14:26<1:14:42, 3.31s/it] 80%|███████▉ | 5285/6638 [1:14:29<1:14:22, 3.30s/it] {'loss': 0.6294, 'grad_norm': 0.5378459756116889, 'learning_rate': 2.101501514071238e-06, 'epoch': 0.8} 80%|███████▉ | 5285/6638 [1:14:29<1:14:22, 3.30s/it] 80%|███████▉ | 5286/6638 [1:14:33<1:13:55, 3.28s/it] {'loss': 0.6389, 'grad_norm': 0.5571840730649165, 'learning_rate': 2.0985096934661563e-06, 'epoch': 0.8} 80%|███████▉ | 5286/6638 [1:14:33<1:13:55, 3.28s/it] 80%|███████▉ | 5287/6638 [1:14:36<1:13:48, 3.28s/it] {'loss': 0.651, 'grad_norm': 0.5752308959772201, 'learning_rate': 2.0955197543722595e-06, 'epoch': 0.8} 80%|███████▉ | 5287/6638 [1:14:36<1:13:48, 3.28s/it] 80%|███████▉ | 
5288/6638 [1:14:39<1:13:28, 3.27s/it] {'loss': 0.6499, 'grad_norm': 0.6414428709102453, 'learning_rate': 2.0925316975015087e-06, 'epoch': 0.8} 80%|███████▉ | 5288/6638 [1:14:39<1:13:28, 3.27s/it] 80%|███████▉ | 5289/6638 [1:14:43<1:14:37, 3.32s/it] {'loss': 0.6805, 'grad_norm': 0.6199836620237379, 'learning_rate': 2.0895455235654306e-06, 'epoch': 0.8} 80%|███████▉ | 5289/6638 [1:14:43<1:14:37, 3.32s/it] 80%|███████▉ | 5290/6638 [1:14:46<1:14:19, 3.31s/it] {'loss': 0.6513, 'grad_norm': 0.6530834594099323, 'learning_rate': 2.0865612332750883e-06, 'epoch': 0.8} 80%|███████▉ | 5290/6638 [1:14:46<1:14:19, 3.31s/it] 80%|███████▉ | 5291/6638 [1:14:49<1:14:18, 3.31s/it] {'loss': 0.6612, 'grad_norm': 0.617538836445298, 'learning_rate': 2.0835788273411084e-06, 'epoch': 0.8} 80%|███████▉ | 5291/6638 [1:14:49<1:14:18, 3.31s/it] 80%|███████▉ | 5292/6638 [1:14:53<1:14:28, 3.32s/it] {'loss': 0.6315, 'grad_norm': 0.5491071021173958, 'learning_rate': 2.0805983064736668e-06, 'epoch': 0.8} 80%|███████▉ | 5292/6638 [1:14:53<1:14:28, 3.32s/it] 80%|███████▉ | 5293/6638 [1:14:56<1:15:11, 3.35s/it] {'loss': 0.6557, 'grad_norm': 0.5970246222229599, 'learning_rate': 2.0776196713824825e-06, 'epoch': 0.8} 80%|███████▉ | 5293/6638 [1:14:56<1:15:11, 3.35s/it] 80%|███████▉ | 5294/6638 [1:14:59<1:14:39, 3.33s/it] {'loss': 0.6619, 'grad_norm': 0.5745694632056544, 'learning_rate': 2.074642922776834e-06, 'epoch': 0.8} 80%|███████▉ | 5294/6638 [1:14:59<1:14:39, 3.33s/it] 80%|███████▉ | 5295/6638 [1:15:03<1:14:14, 3.32s/it] {'loss': 0.6422, 'grad_norm': 0.5978627158498283, 'learning_rate': 2.0716680613655515e-06, 'epoch': 0.8} 80%|███████▉ | 5295/6638 [1:15:03<1:14:14, 3.32s/it] 80%|███████▉ | 5296/6638 [1:15:06<1:14:12, 3.32s/it] {'loss': 0.6112, 'grad_norm': 0.6407242111904723, 'learning_rate': 2.0686950878570058e-06, 'epoch': 0.8} 80%|███████▉ | 5296/6638 [1:15:06<1:14:12, 3.32s/it] 80%|███████▉ | 5297/6638 [1:15:09<1:13:36, 3.29s/it] {'loss': 0.6392, 'grad_norm': 0.5589793583080808, 
'learning_rate': 2.0657240029591276e-06, 'epoch': 0.8} 80%|███████▉ | 5297/6638 [1:15:09<1:13:36, 3.29s/it] 80%|███████▉ | 5298/6638 [1:15:12<1:13:09, 3.28s/it] {'loss': 0.6082, 'grad_norm': 0.5981553921601727, 'learning_rate': 2.0627548073793933e-06, 'epoch': 0.8} 80%|███████▉ | 5298/6638 [1:15:12<1:13:09, 3.28s/it] 80%|███████▉ | 5299/6638 [1:15:16<1:12:53, 3.27s/it] {'loss': 0.6172, 'grad_norm': 0.5922335159405336, 'learning_rate': 2.059787501824836e-06, 'epoch': 0.8} 80%|███████▉ | 5299/6638 [1:15:16<1:12:53, 3.27s/it]2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 01 AutoResumeHook: Checking whether to suspend... AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 80%|███████▉ | 5300/6638 [1:15:19<1:14:20, 3.33s/it] {'loss': 0.6519, 'grad_norm': 0.6207732681176767, 'learning_rate': 2.0568220870020296e-06, 'epoch': 0.8} 80%|███████▉ | 5300/6638 [1:15:19<1:14:20, 3.33s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5300/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5300/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5300/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 80%|███████▉ | 5301/6638 [1:15:36<2:48:08, 7.55s/it] {'loss': 0.6643, 'grad_norm': 0.6422249246356544, 'learning_rate': 2.053858563617096e-06, 'epoch': 0.8} 80%|███████▉ | 5301/6638 [1:15:36<2:48:08, 7.55s/it] 80%|███████▉ | 5302/6638 [1:15:40<2:19:29, 6.26s/it] {'loss': 0.6068, 'grad_norm': 0.5595810181237673, 'learning_rate': 2.0508969323757243e-06, 'epoch': 0.8} 80%|███████▉ | 5302/6638 [1:15:40<2:19:29, 6.26s/it] 80%|███████▉ | 5303/6638 [1:15:43<2:00:43, 5.43s/it] {'loss': 0.6191, 'grad_norm': 0.7101296418092183, 'learning_rate': 2.0479371939831325e-06, 'epoch': 0.8} 80%|███████▉ | 5303/6638 [1:15:43<2:00:43, 5.43s/it] 80%|███████▉ | 5304/6638 [1:15:46<1:46:13, 4.78s/it] {'loss': 0.633, 'grad_norm': 0.5524105143526903, 'learning_rate': 2.0449793491441026e-06, 'epoch': 0.8} 80%|███████▉ | 5304/6638 [1:15:46<1:46:13, 4.78s/it] 80%|███████▉ | 5305/6638 [1:15:50<1:36:20, 4.34s/it] {'loss': 0.6484, 'grad_norm': 0.6190656440567665, 'learning_rate': 2.0420233985629534e-06, 'epoch': 0.8} 80%|███████▉ | 5305/6638 [1:15:50<1:36:20, 
4.34s/it] 80%|███████▉ | 5306/6638 [1:15:53<1:28:49, 4.00s/it] {'loss': 0.6271, 'grad_norm': 0.5820494199558492, 'learning_rate': 2.0390693429435626e-06, 'epoch': 0.8} 80%|███████▉ | 5306/6638 [1:15:53<1:28:49, 4.00s/it] 80%|███████▉ | 5307/6638 [1:15:56<1:23:23, 3.76s/it] {'loss': 0.6332, 'grad_norm': 0.5947764516315042, 'learning_rate': 2.0361171829893545e-06, 'epoch': 0.8} 80%|███████▉ | 5307/6638 [1:15:56<1:23:23, 3.76s/it] 80%|███████▉ | 5308/6638 [1:15:59<1:19:26, 3.58s/it] {'loss': 0.5963, 'grad_norm': 0.5582205623330546, 'learning_rate': 2.0331669194032968e-06, 'epoch': 0.8} 80%|███████▉ | 5308/6638 [1:15:59<1:19:26, 3.58s/it] 80%|███████▉ | 5309/6638 [1:16:03<1:16:58, 3.48s/it] {'loss': 0.6391, 'grad_norm': 0.6011548457741809, 'learning_rate': 2.0302185528879104e-06, 'epoch': 0.8} 80%|███████▉ | 5309/6638 [1:16:03<1:16:58, 3.48s/it] 80%|███████▉ | 5310/6638 [1:16:06<1:15:23, 3.41s/it] {'loss': 0.6124, 'grad_norm': 0.5892762835897339, 'learning_rate': 2.0272720841452674e-06, 'epoch': 0.8} 80%|███████▉ | 5310/6638 [1:16:06<1:15:23, 3.41s/it] 80%|████████ | 5311/6638 [1:16:09<1:15:06, 3.40s/it] {'loss': 0.6395, 'grad_norm': 0.6436910725773076, 'learning_rate': 2.024327513876979e-06, 'epoch': 0.8} 80%|████████ | 5311/6638 [1:16:09<1:15:06, 3.40s/it] 80%|████████ | 5312/6638 [1:16:13<1:14:35, 3.38s/it] {'loss': 0.6372, 'grad_norm': 0.5761224940516166, 'learning_rate': 2.0213848427842133e-06, 'epoch': 0.8} 80%|████████ | 5312/6638 [1:16:13<1:14:35, 3.38s/it] 80%|████████ | 5313/6638 [1:16:16<1:14:22, 3.37s/it] {'loss': 0.6181, 'grad_norm': 0.5793378605036774, 'learning_rate': 2.0184440715676767e-06, 'epoch': 0.8} 80%|████████ | 5313/6638 [1:16:16<1:14:22, 3.37s/it] 80%|████████ | 5314/6638 [1:16:19<1:14:03, 3.36s/it] {'loss': 0.683, 'grad_norm': 0.621418178995121, 'learning_rate': 2.015505200927633e-06, 'epoch': 0.8} 80%|████████ | 5314/6638 [1:16:19<1:14:03, 3.36s/it] 80%|████████ | 5315/6638 [1:16:23<1:14:01, 3.36s/it] {'loss': 0.7422, 'grad_norm': 
0.7142754364381517, 'learning_rate': 2.0125682315638915e-06, 'epoch': 0.8} 80%|████████ | 5315/6638 [1:16:23<1:14:01, 3.36s/it] 80%|████████ | 5316/6638 [1:16:26<1:13:49, 3.35s/it] {'loss': 0.6241, 'grad_norm': 0.5525355364949112, 'learning_rate': 2.0096331641758006e-06, 'epoch': 0.8} 80%|████████ | 5316/6638 [1:16:26<1:13:49, 3.35s/it] 80%|████████ | 5317/6638 [1:16:29<1:13:19, 3.33s/it] {'loss': 0.6226, 'grad_norm': 0.6228353045605665, 'learning_rate': 2.006699999462264e-06, 'epoch': 0.8} 80%|████████ | 5317/6638 [1:16:29<1:13:19, 3.33s/it] 80%|████████ | 5318/6638 [1:16:32<1:12:49, 3.31s/it] {'loss': 0.6348, 'grad_norm': 0.5616253236467167, 'learning_rate': 2.003768738121732e-06, 'epoch': 0.8} 80%|████████ | 5318/6638 [1:16:32<1:12:49, 3.31s/it] 80%|████████ | 5319/6638 [1:16:36<1:12:47, 3.31s/it] {'loss': 0.6314, 'grad_norm': 0.567446992004555, 'learning_rate': 2.0008393808521966e-06, 'epoch': 0.8} 80%|████████ | 5319/6638 [1:16:36<1:12:47, 3.31s/it] 80%|████████ | 5320/6638 [1:16:39<1:12:03, 3.28s/it] {'loss': 0.623, 'grad_norm': 0.5686927315113607, 'learning_rate': 1.9979119283512006e-06, 'epoch': 0.8} 80%|████████ | 5320/6638 [1:16:39<1:12:03, 3.28s/it] 80%|████████ | 5321/6638 [1:16:42<1:12:35, 3.31s/it] {'loss': 0.6565, 'grad_norm': 0.5824498572056942, 'learning_rate': 1.994986381315832e-06, 'epoch': 0.8} 80%|████████ | 5321/6638 [1:16:42<1:12:35, 3.31s/it] 80%|████████ | 5322/6638 [1:16:46<1:12:04, 3.29s/it] {'loss': 0.6462, 'grad_norm': 0.619547610125703, 'learning_rate': 1.9920627404427275e-06, 'epoch': 0.8} 80%|████████ | 5322/6638 [1:16:46<1:12:04, 3.29s/it] 80%|████████ | 5323/6638 [1:16:49<1:12:36, 3.31s/it] {'loss': 0.6663, 'grad_norm': 0.5725472454152855, 'learning_rate': 1.989141006428066e-06, 'epoch': 0.8} 80%|████████ | 5323/6638 [1:16:49<1:12:36, 3.31s/it] 80%|████████ | 5324/6638 [1:16:52<1:12:23, 3.31s/it] {'loss': 0.6169, 'grad_norm': 0.5799450654905547, 'learning_rate': 1.986221179967568e-06, 'epoch': 0.8} 80%|████████ | 5324/6638 
5325/6638 [1:16:55<1:11:50, 3.28s/it] {'loss': 0.6566, 'grad_norm': 0.6133753118176537, 'learning_rate': 1.9833032617565173e-06, 'epoch': 0.8}
5326/6638 [1:16:59<1:11:07, 3.25s/it] {'loss': 0.6052, 'grad_norm': 0.5856104466120295, 'learning_rate': 1.9803872524897215e-06, 'epoch': 0.8}
5327/6638 [1:17:02<1:11:31, 3.27s/it] {'loss': 0.6716, 'grad_norm': 0.6180043386080523, 'learning_rate': 1.97747315286155e-06, 'epoch': 0.8}
5328/6638 [1:17:05<1:11:15, 3.26s/it] {'loss': 0.6359, 'grad_norm': 0.5644991703618135, 'learning_rate': 1.974560963565907e-06, 'epoch': 0.8}
5329/6638 [1:17:08<1:11:16, 3.27s/it] {'loss': 0.6296, 'grad_norm': 0.5617139144840703, 'learning_rate': 1.971650685296247e-06, 'epoch': 0.8}
5330/6638 [1:17:12<1:11:40, 3.29s/it] {'loss': 0.6589, 'grad_norm': 0.6501341873561901, 'learning_rate': 1.9687423187455735e-06, 'epoch': 0.8}
5331/6638 [1:17:15<1:11:28, 3.28s/it] {'loss': 0.6137, 'grad_norm': 0.5260340023737857, 'learning_rate': 1.965835864606422e-06, 'epoch': 0.8}
5332/6638 [1:17:18<1:11:15, 3.27s/it] {'loss': 0.6347, 'grad_norm': 0.5721345821473178, 'learning_rate': 1.9629313235708846e-06, 'epoch': 0.8}
5333/6638 [1:17:22<1:11:30, 3.29s/it] {'loss': 0.6176, 'grad_norm': 0.5529535301862363, 'learning_rate': 1.960028696330596e-06, 'epoch': 0.8}
5334/6638 [1:17:25<1:11:29, 3.29s/it] {'loss': 0.6526, 'grad_norm': 0.6401901004151478, 'learning_rate': 1.9571279835767276e-06, 'epoch': 0.8}
5335/6638 [1:17:28<1:11:48, 3.31s/it] {'loss': 0.6376, 'grad_norm': 0.5499512589786653, 'learning_rate': 1.954229186000005e-06, 'epoch': 0.8}
5336/6638 [1:17:32<1:11:31, 3.30s/it] {'loss': 0.6235, 'grad_norm': 0.5529392393722641, 'learning_rate': 1.951332304290685e-06, 'epoch': 0.8}
5337/6638 [1:17:35<1:11:58, 3.32s/it] {'loss': 0.6809, 'grad_norm': 0.6073549099335528, 'learning_rate': 1.948437339138588e-06, 'epoch': 0.8}
5338/6638 [1:17:38<1:11:40, 3.31s/it] {'loss': 0.6223, 'grad_norm': 0.6000364925981816, 'learning_rate': 1.945544291233059e-06, 'epoch': 0.8}
5339/6638 [1:17:41<1:11:11, 3.29s/it] {'loss': 0.5924, 'grad_norm': 0.5193653023832907, 'learning_rate': 1.9426531612629917e-06, 'epoch': 0.8}
5340/6638 [1:17:45<1:11:53, 3.32s/it] {'loss': 0.6595, 'grad_norm': 0.6111543354300594, 'learning_rate': 1.9397639499168285e-06, 'epoch': 0.8}
5341/6638 [1:17:48<1:11:33, 3.31s/it] {'loss': 0.6091, 'grad_norm': 0.5415450678089272, 'learning_rate': 1.9368766578825503e-06, 'epoch': 0.8}
5342/6638 [1:17:51<1:11:23, 3.31s/it] {'loss': 0.63, 'grad_norm': 0.5671432505169294, 'learning_rate': 1.933991285847686e-06, 'epoch': 0.8}
5343/6638 [1:17:55<1:12:22, 3.35s/it] {'loss': 0.6388, 'grad_norm': 0.5795615559338682, 'learning_rate': 1.931107834499296e-06, 'epoch': 0.8}
5344/6638 [1:17:58<1:12:24, 3.36s/it] {'loss': 0.6725, 'grad_norm': 0.6163010179761984, 'learning_rate': 1.928226304523996e-06, 'epoch': 0.81}
5345/6638 [1:18:02<1:12:01, 3.34s/it] {'loss': 0.6519, 'grad_norm': 0.5375602721865744, 'learning_rate': 1.9253466966079393e-06, 'epoch': 0.81}
5346/6638 [1:18:05<1:11:38, 3.33s/it] {'loss': 0.655, 'grad_norm': 0.5824996738444489, 'learning_rate': 1.9224690114368205e-06, 'epoch': 0.81}
5347/6638 [1:18:08<1:11:13, 3.31s/it] {'loss': 0.6199, 'grad_norm': 0.5624369942730951, 'learning_rate': 1.9195932496958738e-06, 'epoch': 0.81}
5348/6638 [1:18:11<1:10:19, 3.27s/it] {'loss': 0.6421, 'grad_norm': 0.5892830438157106, 'learning_rate': 1.9167194120698797e-06, 'epoch': 0.81}
5349/6638 [1:18:15<1:10:57, 3.30s/it] {'loss': 0.6307, 'grad_norm': 0.6060512665847316, 'learning_rate': 1.913847499243161e-06, 'epoch': 0.81}
AutoResumeHook (ranks 0-7): Checking whether to suspend...
5350/6638 [1:18:18<1:10:18, 3.28s/it] {'loss': 0.6252, 'grad_norm': 0.6457904459560664, 'learning_rate': 1.9109775118995823e-06, 'epoch': 0.81}
5351/6638 [1:18:21<1:10:38, 3.29s/it] {'loss': 0.6318, 'grad_norm': 0.5833930014550824, 'learning_rate': 1.908109450722544e-06, 'epoch': 0.81}
5352/6638 [1:18:25<1:10:47, 3.30s/it] {'loss': 0.6067, 'grad_norm': 0.534971343533535, 'learning_rate': 1.9052433163949935e-06, 'epoch': 0.81}
5353/6638 [1:18:28<1:10:44, 3.30s/it] {'loss': 0.5926, 'grad_norm': 0.5689453871869492, 'learning_rate': 1.9023791095994214e-06, 'epoch': 0.81}
5354/6638 [1:18:31<1:10:25, 3.29s/it] {'loss': 0.6565, 'grad_norm': 0.6518017321647102, 'learning_rate': 1.899516831017849e-06, 'epoch': 0.81}
5355/6638 [1:18:34<1:10:20, 3.29s/it] {'loss': 0.6638, 'grad_norm': 0.6306312602951168, 'learning_rate': 1.8966564813318489e-06, 'epoch': 0.81}
5356/6638 [1:18:38<1:11:34, 3.35s/it] {'loss': 0.6519, 'grad_norm': 0.5906273376488597, 'learning_rate': 1.8937980612225315e-06, 'epoch': 0.81}
5357/6638 [1:18:41<1:10:41, 3.31s/it] {'loss': 0.6328, 'grad_norm': 0.5854008951716949, 'learning_rate': 1.8909415713705438e-06, 'epoch': 0.81}
5358/6638 [1:18:44<1:10:42, 3.31s/it] {'loss': 0.6654, 'grad_norm': 0.6105384567478837, 'learning_rate': 1.888087012456079e-06, 'epoch': 0.81}
5359/6638 [1:18:48<1:10:12, 3.29s/it] {'loss': 0.6277, 'grad_norm': 0.6325294695062104, 'learning_rate': 1.8852343851588617e-06, 'epoch': 0.81}
5360/6638 [1:18:51<1:09:37, 3.27s/it] {'loss': 0.6505, 'grad_norm': 0.6489077786845736, 'learning_rate': 1.8823836901581727e-06, 'epoch': 0.81}
5361/6638 [1:18:54<1:09:27, 3.26s/it] {'loss': 0.6345, 'grad_norm': 1.5395315378375034, 'learning_rate': 1.8795349281328167e-06, 'epoch': 0.81}
5362/6638 [1:18:57<1:09:40, 3.28s/it] {'loss': 0.665, 'grad_norm': 0.663663356272197, 'learning_rate': 1.8766880997611424e-06, 'epoch': 0.81}
5363/6638 [1:19:01<1:09:09, 3.25s/it] {'loss': 0.6109, 'grad_norm': 0.5547855995576231, 'learning_rate': 1.8738432057210398e-06, 'epoch': 0.81}
5364/6638 [1:19:04<1:09:11, 3.26s/it] {'loss': 0.654, 'grad_norm': 0.6057128199823747, 'learning_rate': 1.8710002466899414e-06, 'epoch': 0.81}
5365/6638 [1:19:07<1:08:43, 3.24s/it] {'loss': 0.68, 'grad_norm': 0.667398285556605, 'learning_rate': 1.8681592233448142e-06, 'epoch': 0.81}
5366/6638 [1:19:11<1:09:36, 3.28s/it] {'loss': 0.6012, 'grad_norm': 0.5510244616652831, 'learning_rate': 1.8653201363621643e-06, 'epoch': 0.81}
5367/6638 [1:19:14<1:08:55, 3.25s/it] {'loss': 0.6345, 'grad_norm': 0.6092217123883804, 'learning_rate': 1.8624829864180384e-06, 'epoch': 0.81}
5368/6638 [1:19:17<1:10:03, 3.31s/it] {'loss': 0.6825, 'grad_norm': 0.690762552154813, 'learning_rate': 1.8596477741880248e-06, 'epoch': 0.81}
5369/6638 [1:19:20<1:09:04, 3.27s/it] {'loss': 0.6577, 'grad_norm': 0.6043600037182532, 'learning_rate': 1.8568145003472427e-06, 'epoch': 0.81}
5370/6638 [1:19:24<1:08:51, 3.26s/it] {'loss': 0.614, 'grad_norm': 0.5663815814147979, 'learning_rate': 1.8539831655703577e-06, 'epoch': 0.81}
5371/6638 [1:19:27<1:09:06, 3.27s/it] {'loss': 0.6568, 'grad_norm': 0.5774983341960735, 'learning_rate': 1.8511537705315674e-06, 'epoch': 0.81}
5372/6638 [1:19:30<1:08:10, 3.23s/it] {'loss': 0.5923, 'grad_norm': 0.5071329036889999, 'learning_rate': 1.8483263159046104e-06, 'epoch': 0.81}
5373/6638 [1:19:33<1:07:41, 3.21s/it] {'loss': 0.6418, 'grad_norm': 0.5525768333078327, 'learning_rate': 1.8455008023627674e-06, 'epoch': 0.81}
5374/6638 [1:19:36<1:07:39, 3.21s/it] {'loss': 0.6288, 'grad_norm': 0.6077775053711314, 'learning_rate': 1.8426772305788476e-06, 'epoch': 0.81}
5375/6638 [1:19:40<1:07:28, 3.21s/it] {'loss': 0.65, 'grad_norm': 0.5922312230483827, 'learning_rate': 1.8398556012252044e-06, 'epoch': 0.81}
5376/6638 [1:19:43<1:09:02, 3.28s/it] {'loss': 0.6108, 'grad_norm': 0.5974427298239384, 'learning_rate': 1.8370359149737304e-06, 'epoch': 0.81}
5377/6638 [1:19:46<1:09:22, 3.30s/it] {'loss': 0.6254, 'grad_norm': 0.5552505487592003, 'learning_rate': 1.8342181724958474e-06, 'epoch': 0.81}
5378/6638 [1:19:50<1:08:53, 3.28s/it] {'loss': 0.6469, 'grad_norm': 0.5761198915515551, 'learning_rate': 1.831402374462521e-06, 'epoch': 0.81}
5379/6638 [1:19:53<1:09:21, 3.31s/it] {'loss': 0.6671, 'grad_norm': 0.5541448308076478, 'learning_rate': 1.8285885215442556e-06, 'epoch': 0.81}
5380/6638 [1:19:56<1:09:45, 3.33s/it] {'loss': 0.6555, 'grad_norm': 0.5450996398727578, 'learning_rate': 1.8257766144110823e-06, 'epoch': 0.81}
5381/6638 [1:20:00<1:09:25, 3.31s/it] {'loss': 0.6136, 'grad_norm': 0.5495760780246419, 'learning_rate': 1.8229666537325819e-06, 'epoch': 0.81}
5382/6638 [1:20:03<1:08:54, 3.29s/it] {'loss': 0.6341, 'grad_norm': 0.6791195928400154, 'learning_rate': 1.820158640177856e-06, 'epoch': 0.81}
5383/6638 [1:20:06<1:08:29, 3.27s/it] {'loss': 0.6169, 'grad_norm': 0.5890394652648847, 'learning_rate': 1.817352574415563e-06, 'epoch': 0.81}
5384/6638 [1:20:09<1:08:37, 3.28s/it] {'loss': 0.669, 'grad_norm': 0.5657764743798778, 'learning_rate': 1.8145484571138805e-06, 'epoch': 0.81}
5385/6638 [1:20:13<1:08:23, 3.27s/it] {'loss': 0.6201, 'grad_norm': 0.5976838773233205, 'learning_rate': 1.811746288940527e-06, 'epoch': 0.81}
5386/6638 [1:20:16<1:08:30, 3.28s/it] {'loss': 0.6421, 'grad_norm': 0.590797095502106, 'learning_rate': 1.808946070562758e-06, 'epoch': 0.81}
5387/6638 [1:20:19<1:08:15, 3.27s/it] {'loss': 0.6224, 'grad_norm': 0.5179954259279903, 'learning_rate': 1.8061478026473656e-06, 'epoch': 0.81}
5388/6638 [1:20:22<1:08:14, 3.28s/it] {'loss': 0.6364, 'grad_norm': 0.5609190169479462, 'learning_rate': 1.8033514858606783e-06, 'epoch': 0.81}
5389/6638 [1:20:26<1:08:32, 3.29s/it] {'loss': 0.7011, 'grad_norm': 0.6318134320914576, 'learning_rate': 1.8005571208685567e-06, 'epoch': 0.81}
5390/6638 [1:20:29<1:07:52, 3.26s/it] {'loss': 0.61, 'grad_norm': 0.5395034844758563, 'learning_rate': 1.7977647083363914e-06, 'epoch': 0.81}
5391/6638 [1:20:32<1:07:16, 3.24s/it] {'loss': 0.5955, 'grad_norm': 0.5485556414735956, 'learning_rate': 1.7949742489291256e-06, 'epoch': 0.81}
5392/6638 [1:20:36<1:07:40, 3.26s/it] {'loss': 0.6443, 'grad_norm': 0.596774865241629, 'learning_rate': 1.7921857433112188e-06, 'epoch': 0.81}
5393/6638 [1:20:39<1:07:45, 3.27s/it] {'loss': 0.6613, 'grad_norm': 0.6150146972895579, 'learning_rate': 1.789399192146678e-06, 'epoch': 0.81}
5394/6638 [1:20:42<1:07:42, 3.27s/it] {'loss': 0.6173, 'grad_norm': 0.5492652509786584, 'learning_rate': 1.7866145960990334e-06, 'epoch': 0.81}
5395/6638 [1:20:45<1:07:32, 3.26s/it] {'loss': 0.652, 'grad_norm': 0.6449192279061234, 'learning_rate': 1.7838319558313598e-06, 'epoch': 0.81}
5396/6638 [1:20:49<1:07:39, 3.27s/it] {'loss': 0.6404, 'grad_norm': 0.6052542416117769, 'learning_rate': 1.7810512720062655e-06, 'epoch': 0.81}
5397/6638 [1:20:52<1:07:17, 3.25s/it] {'loss': 0.6114, 'grad_norm': 0.5649894108967775, 'learning_rate': 1.778272545285883e-06, 'epoch': 0.81}
5398/6638 [1:20:55<1:07:14, 3.25s/it] {'loss': 0.6655, 'grad_norm': 0.6907479787116432, 'learning_rate': 1.7754957763318892e-06, 'epoch': 0.81}
5399/6638 [1:20:58<1:07:22, 3.26s/it] {'loss': 0.6448, 'grad_norm': 0.5777603813040537, 'learning_rate': 1.7727209658054933e-06, 'epoch': 0.81}
AutoResumeHook (ranks 0-7): Checking whether to suspend...
5400/6638 [1:21:02<1:06:48, 3.24s/it] {'loss': 0.5953, 'grad_norm': 0.5366719202124755, 'learning_rate': 1.7699481143674323e-06, 'epoch': 0.81}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
5401/6638 [1:21:21<2:48:49, 8.19s/it] {'loss': 0.649, 'grad_norm': 0.5680263508773344, 'learning_rate': 1.7671772226779814e-06, 'epoch': 0.81}
5402/6638 [1:21:25<2:18:46, 6.74s/it] {'loss': 0.6618, 'grad_norm': 0.594175494328018, 'learning_rate': 1.7644082913969496e-06, 'epoch': 0.81}
5403/6638 [1:21:28<1:57:11, 5.69s/it] {'loss': 0.6388, 'grad_norm': 0.6155493580977169, 'learning_rate': 1.7616413211836792e-06, 'epoch': 0.81}
5404/6638 [1:21:31<1:42:16, 4.97s/it] {'loss': 0.6759, 'grad_norm': 0.6760592954910069, 'learning_rate': 1.758876312697041e-06, 'epoch': 0.81}
5405/6638 [1:21:34<1:31:45, 4.47s/it] {'loss': 0.6007, 'grad_norm': 0.5973953723420654, 'learning_rate': 1.7561132665954418e-06, 'epoch': 0.81}
5406/6638 [1:21:38<1:23:58, 4.09s/it] {'loss': 0.6292, 'grad_norm': 0.5731225131822129, 'learning_rate': 1.7533521835368206e-06, 'epoch': 0.81}
5407/6638 [1:21:41<1:19:34, 3.88s/it] {'loss': 0.6407, 'grad_norm': 0.5466398140450313, 'learning_rate': 1.750593064178654e-06, 'epoch': 0.81}
5408/6638 [1:21:44<1:15:26, 3.68s/it] {'loss': 0.6425, 'grad_norm': 0.5238772714801607, 'learning_rate': 1.7478359091779395e-06, 'epoch': 0.81}
5409/6638 [1:21:48<1:12:49, 3.56s/it] {'loss': 0.614, 'grad_norm': 0.5996520558671959, 'learning_rate': 1.7450807191912188e-06, 'epoch': 0.81}
5410/6638 [1:21:51<1:11:44, 3.50s/it] {'loss': 0.626, 'grad_norm': 0.5360863431725162, 'learning_rate': 1.7423274948745584e-06, 'epoch': 0.82}
5411/6638 [1:21:54<1:10:15, 3.44s/it] {'loss': 0.611, 'grad_norm': 0.5461604832957181, 'learning_rate': 1.7395762368835623e-06, 'epoch': 0.82}
5412/6638 [1:21:59<1:17:17, 3.78s/it] {'loss': 0.5979, 'grad_norm': 0.5305367501354007, 'learning_rate': 1.7368269458733611e-06, 'epoch': 0.82}
5413/6638 [1:22:02<1:14:24, 3.64s/it] {'loss': 0.6415, 'grad_norm': 0.6095827007257164, 'learning_rate': 1.7340796224986123e-06, 'epoch': 0.82}
5414/6638 [1:22:05<1:11:49, 3.52s/it] {'loss': 0.6172, 'grad_norm': 0.5753196757394888, 'learning_rate': 1.7313342674135225e-06, 'epoch': 0.82}
5415/6638 [1:22:13<1:39:08, 4.86s/it] {'loss': 0.6678, 'grad_norm': 0.6500240084358402, 'learning_rate': 1.728590881271811e-06, 'epoch': 0.82}
5416/6638 [1:22:20<1:47:57, 5.30s/it] {'loss': 0.6188, 'grad_norm': 0.544797192980225, 'learning_rate': 1.725849464726741e-06, 'epoch': 0.82}
5417/6638 [1:22:23<1:35:10, 4.68s/it] {'loss': 0.6257, 'grad_norm': 0.5805764238565119, 'learning_rate': 1.7231100184310955e-06, 'epoch': 0.82}
5418/6638 [1:22:26<1:26:26, 4.25s/it] {'loss': 0.6487, 'grad_norm': 0.594398990663391, 'learning_rate': 1.7203725430371976e-06, 'epoch': 0.82}
5419/6638 [1:22:31<1:29:45, 4.42s/it] {'loss': 0.6731, 'grad_norm': 0.6205225768535414, 'learning_rate': 1.7176370391968999e-06, 'epoch': 0.82}
5420/6638 [1:22:34<1:21:57, 4.04s/it] {'loss': 0.6172, 'grad_norm': 0.5702170202691866, 'learning_rate': 1.7149035075615795e-06, 'epoch': 0.82}
5421/6638 [1:22:37<1:17:18, 3.81s/it] {'loss': 0.6478, 'grad_norm': 0.5749182141596967, 'learning_rate': 1.7121719487821498e-06, 'epoch': 0.82}
5422/6638 [1:22:41<1:14:54, 3.70s/it] {'loss': 0.6419, 'grad_norm': 0.6126278962613442, 'learning_rate': 1.709442363509054e-06, 'epoch': 0.82}
5423/6638 [1:22:44<1:12:28, 3.58s/it] {'loss': 0.6384, 'grad_norm': 0.5700692187006029, 'learning_rate': 1.7067147523922589e-06, 'epoch': 0.82}
5424/6638 [1:22:48<1:11:43, 3.55s/it] {'loss': 0.6602, 'grad_norm': 0.6068119270927137, 'learning_rate': 1.7039891160812705e-06, 'epoch': 0.82}
5425/6638 [1:22:51<1:10:47, 3.50s/it] {'loss': 0.6645, 'grad_norm': 0.586322483155354, 'learning_rate': 1.7012654552251183e-06, 'epoch': 0.82}
5426/6638 [1:22:54<1:09:34, 3.44s/it] {'loss': 0.6691, 'grad_norm': 0.6117673010947995, 'learning_rate': 1.6985437704723673e-06, 'epoch': 0.82}
5427/6638 [1:22:59<1:18:16, 3.88s/it] {'loss': 0.6323, 'grad_norm': 0.583284989913916, 'learning_rate': 1.6958240624711032e-06, 'epoch': 0.82}
5428/6638 [1:23:02<1:14:32, 3.70s/it] {'loss': 0.6444, 'grad_norm': 0.5794840454474547, 'learning_rate': 1.6931063318689456e-06, 'epoch': 0.82}
5429/6638 [1:23:06<1:12:24, 3.59s/it] {'loss': 0.6533, 'grad_norm': 0.5520690492319196, 'learning_rate': 1.690390579313045e-06, 'epoch': 0.82}
5430/6638 [1:23:09<1:10:48, 3.52s/it] {'loss': 0.6379, 'grad_norm': 0.5434647221620179, 'learning_rate': 1.6876768054500781e-06, 'epoch': 0.82}
5431/6638 [1:23:12<1:09:11, 3.44s/it] {'loss': 0.6099, 'grad_norm': 0.5943067573327374, 'learning_rate': 1.684965010926256e-06, 'epoch': 0.82}
5432/6638 [1:23:16<1:08:28, 3.41s/it] {'loss': 0.6569, 'grad_norm': 0.5970016884231482, 'learning_rate': 1.682255196387308e-06, 'epoch': 0.82}
5433/6638 [1:23:19<1:08:50, 3.43s/it] {'loss': 0.6128, 'grad_norm': 0.5769782143563907, 'learning_rate': 1.6795473624784998e-06, 'epoch': 0.82}
5434/6638 [1:23:24<1:16:06, 3.79s/it] {'loss': 0.623, 'grad_norm': 0.5833727824942555, 'learning_rate': 1.6768415098446277e-06, 'epoch': 0.82}
5435/6638 [1:23:27<1:12:46, 3.63s/it] {'loss': 0.6205, 'grad_norm': 0.5962057550616303, 'learning_rate': 1.674137639130009e-06, 'epoch': 0.82}
5436/6638 [1:23:30<1:10:26, 3.52s/it] {'loss': 0.6354, 'grad_norm': 0.6229223566911081, 'learning_rate': 1.6714357509784873e-06, 'epoch': 0.82}
5437/6638 [1:23:34<1:08:46, 3.44s/it] {'loss': 0.6223, 'grad_norm': 0.6016763880076427, 'learning_rate': 1.6687358460334491e-06, 'epoch': 0.82}
5438/6638 [1:23:37<1:08:30, 3.43s/it] {'loss': 0.6392, 'grad_norm': 0.6075567686817372, 'learning_rate': 1.6660379249377912e-06, 'epoch': 0.82}
5439/6638 [1:23:40<1:07:19, 3.37s/it] {'loss': 0.6341, 'grad_norm': 0.5891182479032939, 'learning_rate': 1.6633419883339496e-06, 'epoch': 0.82}
5440/6638 [1:23:44<1:06:50, 3.35s/it] {'loss': 0.6769, 'grad_norm': 0.6535582544671188, 'learning_rate': 1.66064803686388e-06, 'epoch': 0.82}
5441/6638 [1:23:47<1:06:55, 3.35s/it] {'loss': 0.6622, 'grad_norm': 0.5730080090603066, 'learning_rate': 1.6579560711690702e-06, 'epoch': 0.82}
5442/6638 [1:23:50<1:06:13, 3.32s/it] {'loss': 0.6221, 'grad_norm': 0.5976623230484678, 'learning_rate': 1.6552660918905361e-06, 'epoch': 0.82}
5443/6638 [1:23:53<1:05:48, 3.30s/it] {'loss': 0.6247, 'grad_norm': 0.589713690726368, 'learning_rate': 1.6525780996688146e-06, 'epoch': 0.82}
5444/6638 [1:23:57<1:05:12, 3.28s/it] {'loss': 0.638, 'grad_norm': 0.6564219783493453, 'learning_rate': 1.6498920951439756e-06, 'epoch': 0.82}
5445/6638 [1:24:00<1:04:55, 3.26s/it] {'loss': 0.5983, 'grad_norm': 0.5460782161271552, 'learning_rate': 1.6472080789556145e-06, 'epoch': 0.82}
5446/6638 [1:24:03<1:04:36, 3.25s/it] {'loss': 0.6131, 'grad_norm': 0.5954669833908589, 'learning_rate': 1.6445260517428486e-06, 'epoch': 0.82}
5447/6638 [1:24:06<1:04:53, 3.27s/it] {'loss': 0.6184, 'grad_norm': 0.5959036946627355, 'learning_rate': 1.6418460141443303e-06, 'epoch': 0.82}
5448/6638 [1:24:10<1:04:48, 3.27s/it] {'loss': 0.6293, 'grad_norm': 0.5760479809798742, 'learning_rate': 1.6391679667982241e-06, 'epoch': 0.82}
5449/6638 [1:24:13<1:04:12, 3.24s/it] {'loss': 0.633, 'grad_norm': 0.6461462313121973, 'learning_rate': 1.6364919103422394e-06, 'epoch': 0.82}
AutoResumeHook (ranks 0-7): Checking whether to suspend...
5450/6638 [1:24:17<1:06:47, 3.37s/it] {'loss': 0.6291, 'grad_norm': 0.5218106346755002, 'learning_rate': 1.6338178454135977e-06, 'epoch': 0.82}
5451/6638 [1:24:20<1:05:53, 3.33s/it] {'loss': 0.6055, 'grad_norm': 0.5770507448025285, 'learning_rate': 1.6311457726490466e-06, 'epoch': 0.82}
5452/6638 [1:24:23<1:05:33, 3.32s/it] {'loss': 0.6465, 'grad_norm': 0.6210188605714616, 'learning_rate': 1.6284756926848667e-06, 'epoch': 0.82}
5453/6638 [1:24:26<1:05:24, 3.31s/it] {'loss': 0.6692, 'grad_norm': 0.6655603843603201, 'learning_rate': 1.6258076061568585e-06, 'epoch': 0.82}
5454/6638 [1:24:30<1:05:09, 3.30s/it] {'loss': 0.6114, 'grad_norm': 0.5409592900799889, 'learning_rate': 1.6231415137003536e-06, 'epoch': 0.82}
5455/6638 [1:24:33<1:04:49, 3.29s/it] {'loss': 0.6033, 'grad_norm': 0.5868603489977539, 'learning_rate': 1.6204774159501991e-06, 'epoch': 0.82}
5456/6638 [1:24:36<1:04:29, 3.27s/it] {'loss': 0.6173, 'grad_norm': 0.8079539045295225, 'learning_rate': 1.6178153135407748e-06, 'epoch': 0.82}
5457/6638 [1:24:39<1:04:27, 3.28s/it] {'loss': 0.6203, 'grad_norm': 0.5967431370853294, 'learning_rate': 1.6151552071059874e-06, 'epoch': 0.82}
5458/6638 [1:24:43<1:04:10, 3.26s/it] {'loss': 0.6714, 'grad_norm': 0.6162704426981865, 'learning_rate': 1.6124970972792574e-06, 'epoch': 0.82}
5459/6638 [1:24:46<1:04:39, 3.29s/it] {'loss': 0.6418, 'grad_norm': 0.533378284553322, 'learning_rate': 1.6098409846935402e-06, 'epoch': 0.82}
5460/6638 [1:24:49<1:04:14, 3.27s/it] {'loss': 0.5974, 'grad_norm': 0.566786748847034, 'learning_rate': 1.6071868699813143e-06, 'epoch': 0.82}
5461/6638 [1:24:53<1:04:29, 3.29s/it] {'loss': 0.6298, 'grad_norm': 0.6335956679674069, 'learning_rate': 1.6045347537745759e-06, 'epoch': 0.82}
5462/6638 [1:24:56<1:03:59, 3.26s/it] {'loss': 0.6478, 'grad_norm': 0.6597519360180987, 'learning_rate': 1.6018846367048534e-06, 'epoch': 0.82}
5463/6638 [1:24:59<1:04:01, 3.27s/it] {'loss': 0.6359, 'grad_norm': 0.5585580770262589, 'learning_rate': 1.5992365194031922e-06, 'epoch': 0.82}
5464/6638 [1:25:02<1:03:59, 3.27s/it] {'loss': 0.6335, 'grad_norm': 0.5629451409251001, 'learning_rate': 1.5965904025001656e-06, 'epoch': 0.82}
5465/6638 [1:25:06<1:04:15, 3.29s/it] {'loss': 0.6389, 'grad_norm': 0.5647982305246608, 'learning_rate': 1.5939462866258725e-06, 'epoch': 0.82}
5466/6638 [1:25:09<1:04:11, 3.29s/it] {'loss': 0.6071, 'grad_norm': 0.539506373739615, 'learning_rate': 1.5913041724099288e-06, 'epoch': 0.82}
5467/6638 [1:25:12<1:03:44, 3.27s/it] {'loss': 0.593, 'grad_norm': 0.5299276536836022, 'learning_rate': 1.5886640604814796e-06, 'epoch': 0.82}
5468/6638 [1:25:15<1:04:10, 3.29s/it] {'loss': 0.6148, 'grad_norm': 0.5456044604065817, 'learning_rate': 1.5860259514691933e-06, 'epoch': 0.82}
5469/6638 [1:25:19<1:04:05, 3.29s/it] {'loss': 0.6154, 'grad_norm': 0.5516165578002747, 'learning_rate': 1.5833898460012531e-06, 'epoch': 0.82}
5470/6638 [1:25:22<1:04:17, 3.30s/it] {'loss': 0.6543, 'grad_norm': 0.6909815538868701, 'learning_rate': 1.5807557447053779e-06, 'epoch': 0.82}
5471/6638 [1:25:25<1:04:26, 3.31s/it] {'loss': 0.6378, 'grad_norm': 0.5680210179000609, 'learning_rate': 1.5781236482087947e-06, 'epoch': 0.82}
5472/6638 [1:25:29<1:05:01, 3.35s/it] {'loss': 0.6422, 'grad_norm': 0.57549283325941, 'learning_rate': 1.5754935571382712e-06, 'epoch': 0.82}
5473/6638 [1:25:32<1:04:42, 3.33s/it] {'loss': 0.6729, 'grad_norm': 0.6138952066079331, 'learning_rate': 1.5728654721200808e-06, 'epoch': 0.82}
5474/6638 [1:25:36<1:04:53, 3.35s/it] {'loss': 0.6796, 'grad_norm': 0.636304602881301, 'learning_rate': 1.5702393937800264e-06, 'epoch': 0.82}
5475/6638 [1:25:39<1:04:38, 3.33s/it] {'loss': 0.6592, 'grad_norm': 0.6065095713859062, 'learning_rate': 1.567615322743432e-06, 'epoch': 0.82}
5476/6638 [1:25:42<1:04:44, 3.34s/it] {'loss': 0.6266, 'grad_norm': 0.5683541577729511, 'learning_rate': 1.5649932596351469e-06, 'epoch': 0.82}
5477/6638 [1:25:45<1:04:11, 3.32s/it] {'loss': 0.631, 'grad_norm': 0.5614993622254344, 'learning_rate': 1.5623732050795393e-06, 'epoch': 0.83}
5478/6638 [1:25:49<1:04:36, 3.34s/it] {'loss': 0.6463, 'grad_norm': 0.5818991566399383, 'learning_rate': 1.5597551597004968e-06, 'epoch': 0.83}
5479/6638 [1:25:52<1:04:01, 3.31s/it] {'loss': 0.6643, 'grad_norm': 0.6220516096894037, 'learning_rate': 1.557139124121433e-06, 'epoch': 0.83}
5480/6638 [1:25:55<1:04:04, 3.32s/it] {'loss': 0.6678, 'grad_norm': 0.6419324647512994, 'learning_rate': 1.5545250989652816e-06, 'epoch': 0.83}
5481/6638 [1:25:59<1:03:37, 3.30s/it] {'loss': 0.6458, 'grad_norm': 0.6400201463451172, 'learning_rate': 1.551913084854494e-06, 'epoch': 0.83}
5482/6638 [1:26:02<1:03:18, 3.29s/it] {'loss': 0.6472, 'grad_norm': 0.6263401074149925, 'learning_rate': 1.54930308241105e-06, 'epoch': 0.83}
5483/6638 [1:26:05<1:03:14, 3.29s/it] {'loss': 0.6302, 'grad_norm': 0.591013928356304, 'learning_rate': 1.5466950922564428e-06, 'epoch': 0.83}
5484/6638 [1:26:09<1:03:15, 3.29s/it] {'loss': 0.6442, 'grad_norm': 0.5913097104310754, 'learning_rate': 1.5440891150116888e-06, 'epoch': 0.83}
5485/6638 [1:26:12<1:03:14, 3.29s/it] {'loss': 0.6309, 'grad_norm': 0.6597174881338814, 'learning_rate': 1.5414851512973317e-06, 'epoch': 0.83}
5486/6638 [1:26:15<1:03:29, 3.31s/it] {'loss': 0.6516, 'grad_norm': 0.6081881073415707, 'learning_rate': 1.5388832017334244e-06, 'epoch': 0.83}
5487/6638 [1:26:19<1:04:29, 3.36s/it] {'loss': 0.7095, 'grad_norm': 0.6507491028050093, 'learning_rate': 1.5362832669395466e-06, 'epoch': 0.83}
5488/6638 [1:26:22<1:04:29, 3.36s/it] {'loss': 0.625, 'grad_norm': 0.5713183552759289, 'learning_rate': 1.5336853475348023e-06, 'epoch': 0.83}
5489/6638 [1:26:25<1:04:07, 3.35s/it] {'loss': 0.6331, 'grad_norm': 0.5592592198732924, 'learning_rate': 1.5310894441378043e-06, 'epoch': 0.83}
5490/6638 [1:26:29<1:03:33, 3.32s/it] {'loss': 0.6226, 'grad_norm': 0.5482072477772422, 'learning_rate': 1.528495557366696e-06, 'epoch': 0.83}
5491/6638 [1:26:32<1:02:56, 3.29s/it] {'loss': 0.6481, 'grad_norm': 0.5761603236506806, 'learning_rate': 1.5259036878391342e-06, 'epoch': 0.83}
5492/6638 [1:26:35<1:02:58, 3.30s/it] {'loss': 0.6132, 'grad_norm': 0.5815595263234301, 'learning_rate': 1.523313836172301e-06, 'epoch': 0.83}
5493/6638 [1:26:38<1:02:07, 3.26s/it] {'loss': 0.6141, 'grad_norm': 0.5596651114901866, 'learning_rate': 1.520726002982893e-06, 'epoch': 0.83}
5494/6638 [1:26:42<1:02:07, 3.26s/it] {'loss': 0.655, 'grad_norm': 0.608648065680168, 'learning_rate': 1.5181401888871218e-06, 'epoch': 0.83}
5495/6638 [1:26:45<1:02:13, 3.27s/it] {'loss': 0.6152, 'grad_norm': 0.528910920449669, 'learning_rate': 1.5155563945007345e-06, 'epoch': 0.83}
5496/6638 [1:26:48<1:02:35, 3.29s/it] {'loss': 0.6913, 'grad_norm': 0.6423110121626411, 'learning_rate': 1.5129746204389795e-06, 'epoch': 0.83}
5497/6638 [1:26:52<1:02:48, 3.30s/it] {'loss': 0.5993, 'grad_norm': 0.5573649673511151, 'learning_rate': 1.5103948673166369e-06, 'epoch': 0.83}
5498/6638 [1:26:55<1:03:04, 3.32s/it] {'loss': 0.6454, 'grad_norm': 0.5915926189032693, 'learning_rate': 1.5078171357479942e-06, 'epoch': 0.83}
5499/6638 [1:26:58<1:02:53, 3.31s/it] {'loss': 0.6346, 'grad_norm': 0.5720866752945992, 'learning_rate': 1.5052414263468663e-06, 'epoch': 0.83}
AutoResumeHook (ranks 0-7): Checking whether to suspend...
5500/6638 [1:27:01<1:02:44, 3.31s/it] {'loss': 0.6126, 'grad_norm': 0.5906192798031685, 'learning_rate': 1.5026677397265876e-06, 'epoch': 0.83}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead.
Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 83%|████████▎ | 5501/6638 [1:27:19<2:21:26, 7.46s/it] {'loss': 0.6232, 'grad_norm': 0.555224455552178, 'learning_rate': 1.5000960765000006e-06, 'epoch': 0.83} 83%|████████▎ | 5501/6638 [1:27:19<2:21:26, 7.46s/it] 83%|████████▎ | 5502/6638 [1:27:22<1:57:25, 6.20s/it] {'loss': 0.6618, 'grad_norm': 0.6552873830853004, 'learning_rate': 1.497526437279475e-06, 'epoch': 0.83} 83%|████████▎ | 5502/6638 [1:27:22<1:57:25, 6.20s/it] 83%|████████▎ | 5503/6638 [1:27:25<1:40:59, 5.34s/it] {'loss': 0.6049, 'grad_norm': 0.5841737259153761, 'learning_rate': 1.4949588226768996e-06, 'epoch': 0.83} 83%|████████▎ | 5503/6638 [1:27:25<1:40:59, 5.34s/it] 83%|████████▎ | 5504/6638 [1:27:28<1:29:11, 4.72s/it] {'loss': 0.6276, 'grad_norm': 0.570265682156102, 'learning_rate': 1.4923932333036717e-06, 'epoch': 0.83} 83%|████████▎ | 5504/6638 [1:27:28<1:29:11, 4.72s/it] 83%|████████▎ | 5505/6638 [1:27:32<1:20:49, 4.28s/it] {'loss': 0.6169, 
'grad_norm': 0.5721743721084642, 'learning_rate': 1.489829669770717e-06, 'epoch': 0.83} 83%|████████▎ | 5505/6638 [1:27:32<1:20:49, 4.28s/it] 83%|████████▎ | 5506/6638 [1:27:35<1:14:43, 3.96s/it] {'loss': 0.605, 'grad_norm': 0.5784049857249902, 'learning_rate': 1.4872681326884696e-06, 'epoch': 0.83} 83%|████████▎ | 5506/6638 [1:27:35<1:14:43, 3.96s/it] 83%|████████▎ | 5507/6638 [1:27:38<1:10:36, 3.75s/it] {'loss': 0.6843, 'grad_norm': 0.6617032921724881, 'learning_rate': 1.4847086226668871e-06, 'epoch': 0.83} 83%|████████▎ | 5507/6638 [1:27:38<1:10:36, 3.75s/it] 83%|████████▎ | 5508/6638 [1:27:41<1:07:40, 3.59s/it] {'loss': 0.5938, 'grad_norm': 0.5204118763060616, 'learning_rate': 1.4821511403154465e-06, 'epoch': 0.83} 83%|████████▎ | 5508/6638 [1:27:41<1:07:40, 3.59s/it] 83%|████████▎ | 5509/6638 [1:27:45<1:05:27, 3.48s/it] {'loss': 0.6317, 'grad_norm': 0.5890799567714525, 'learning_rate': 1.4795956862431315e-06, 'epoch': 0.83} 83%|████████▎ | 5509/6638 [1:27:45<1:05:27, 3.48s/it] 83%|████████▎ | 5510/6638 [1:27:48<1:03:50, 3.40s/it] {'loss': 0.6305, 'grad_norm': 0.5970337128919423, 'learning_rate': 1.4770422610584534e-06, 'epoch': 0.83} 83%|████████▎ | 5510/6638 [1:27:48<1:03:50, 3.40s/it] 83%|████████▎ | 5511/6638 [1:27:51<1:03:15, 3.37s/it] {'loss': 0.6273, 'grad_norm': 0.6424730521010404, 'learning_rate': 1.474490865369438e-06, 'epoch': 0.83} 83%|████████▎ | 5511/6638 [1:27:51<1:03:15, 3.37s/it] 83%|████████▎ | 5512/6638 [1:27:54<1:02:28, 3.33s/it] {'loss': 0.6313, 'grad_norm': 0.5786229079096712, 'learning_rate': 1.4719414997836223e-06, 'epoch': 0.83} 83%|████████▎ | 5512/6638 [1:27:54<1:02:28, 3.33s/it] 83%|████████▎ | 5513/6638 [1:27:58<1:02:55, 3.36s/it] {'loss': 0.6351, 'grad_norm': 0.6213689548312222, 'learning_rate': 1.4693941649080656e-06, 'epoch': 0.83} 83%|████████▎ | 5513/6638 [1:27:58<1:02:55, 3.36s/it] 83%|████████▎ | 5514/6638 [1:28:01<1:01:59, 3.31s/it] {'loss': 0.6372, 'grad_norm': 0.5734718064623414, 'learning_rate': 1.4668488613493426e-06, 
'epoch': 0.83} 83%|████████▎ | 5514/6638 [1:28:01<1:01:59, 3.31s/it] 83%|████████▎ | 5515/6638 [1:28:04<1:01:46, 3.30s/it] {'loss': 0.6022, 'grad_norm': 0.5142358213250061, 'learning_rate': 1.464305589713545e-06, 'epoch': 0.83} 83%|████████▎ | 5515/6638 [1:28:04<1:01:46, 3.30s/it] 83%|████████▎ | 5516/6638 [1:28:08<1:01:55, 3.31s/it] {'loss': 0.623, 'grad_norm': 0.6498780659928741, 'learning_rate': 1.4617643506062774e-06, 'epoch': 0.83} 83%|████████▎ | 5516/6638 [1:28:08<1:01:55, 3.31s/it] 83%|████████▎ | 5517/6638 [1:28:11<1:02:12, 3.33s/it] {'loss': 0.6503, 'grad_norm': 0.5744265963037115, 'learning_rate': 1.459225144632659e-06, 'epoch': 0.83} 83%|████████▎ | 5517/6638 [1:28:11<1:02:12, 3.33s/it] 83%|████████▎ | 5518/6638 [1:28:14<1:02:26, 3.35s/it] {'loss': 0.6392, 'grad_norm': 0.5589145039618867, 'learning_rate': 1.456687972397336e-06, 'epoch': 0.83} 83%|████████▎ | 5518/6638 [1:28:14<1:02:26, 3.35s/it] 83%|████████▎ | 5519/6638 [1:28:18<1:02:18, 3.34s/it] {'loss': 0.5953, 'grad_norm': 0.5594723697538111, 'learning_rate': 1.4541528345044553e-06, 'epoch': 0.83} 83%|████████▎ | 5519/6638 [1:28:18<1:02:18, 3.34s/it] 83%|████████▎ | 5520/6638 [1:28:21<1:01:55, 3.32s/it] {'loss': 0.6111, 'grad_norm': 0.5573470039638402, 'learning_rate': 1.451619731557693e-06, 'epoch': 0.83} 83%|████████▎ | 5520/6638 [1:28:21<1:01:55, 3.32s/it] 83%|████████▎ | 5521/6638 [1:28:24<1:01:38, 3.31s/it] {'loss': 0.6349, 'grad_norm': 0.5679934321529275, 'learning_rate': 1.4490886641602275e-06, 'epoch': 0.83} 83%|████████▎ | 5521/6638 [1:28:24<1:01:38, 3.31s/it] 83%|████████▎ | 5522/6638 [1:28:28<1:01:53, 3.33s/it] {'loss': 0.6421, 'grad_norm': 0.5917332022331867, 'learning_rate': 1.4465596329147635e-06, 'epoch': 0.83} 83%|████████▎ | 5522/6638 [1:28:28<1:01:53, 3.33s/it] 83%|████████▎ | 5523/6638 [1:28:31<1:02:12, 3.35s/it] {'loss': 0.6377, 'grad_norm': 0.5202244128893034, 'learning_rate': 1.444032638423517e-06, 'epoch': 0.83} 83%|████████▎ | 5523/6638 [1:28:31<1:02:12, 3.35s/it] 
83%|████████▎ | 5524/6638 [1:28:34<1:02:38, 3.37s/it] {'loss': 0.6606, 'grad_norm': 0.574113815282679, 'learning_rate': 1.4415076812882156e-06, 'epoch': 0.83} 83%|████████▎ | 5524/6638 [1:28:34<1:02:38, 3.37s/it] 83%|████████▎ | 5525/6638 [1:28:38<1:01:44, 3.33s/it] {'loss': 0.6231, 'grad_norm': 0.5607194766009382, 'learning_rate': 1.4389847621101061e-06, 'epoch': 0.83} 83%|████████▎ | 5525/6638 [1:28:38<1:01:44, 3.33s/it] 83%|████████▎ | 5526/6638 [1:28:41<1:01:43, 3.33s/it] {'loss': 0.6253, 'grad_norm': 0.5352441861694789, 'learning_rate': 1.4364638814899513e-06, 'epoch': 0.83} 83%|████████▎ | 5526/6638 [1:28:41<1:01:43, 3.33s/it] 83%|████████▎ | 5527/6638 [1:28:44<1:00:56, 3.29s/it] {'loss': 0.6028, 'grad_norm': 0.5916882081832974, 'learning_rate': 1.4339450400280208e-06, 'epoch': 0.83} 83%|████████▎ | 5527/6638 [1:28:44<1:00:56, 3.29s/it] 83%|████████▎ | 5528/6638 [1:28:47<1:00:31, 3.27s/it] {'loss': 0.6217, 'grad_norm': 0.6089489866186537, 'learning_rate': 1.4314282383241097e-06, 'epoch': 0.83} 83%|████████▎ | 5528/6638 [1:28:47<1:00:31, 3.27s/it] 83%|████████▎ | 5529/6638 [1:28:51<1:00:41, 3.28s/it] {'loss': 0.6361, 'grad_norm': 0.5865906320537629, 'learning_rate': 1.4289134769775147e-06, 'epoch': 0.83} 83%|████████▎ | 5529/6638 [1:28:51<1:00:41, 3.28s/it] 83%|████████▎ | 5530/6638 [1:28:54<1:01:39, 3.34s/it] {'loss': 0.596, 'grad_norm': 0.5470763136851472, 'learning_rate': 1.4264007565870586e-06, 'epoch': 0.83} 83%|████████▎ | 5530/6638 [1:28:54<1:01:39, 3.34s/it] 83%|████████▎ | 5531/6638 [1:28:58<1:01:21, 3.33s/it] {'loss': 0.6335, 'grad_norm': 0.5517344530035794, 'learning_rate': 1.4238900777510734e-06, 'epoch': 0.83} 83%|████████▎ | 5531/6638 [1:28:58<1:01:21, 3.33s/it] 83%|████████▎ | 5532/6638 [1:29:01<1:00:40, 3.29s/it] {'loss': 0.6399, 'grad_norm': 0.5945058240902356, 'learning_rate': 1.4213814410673999e-06, 'epoch': 0.83} 83%|████████▎ | 5532/6638 [1:29:01<1:00:40, 3.29s/it] 83%|████████▎ | 5533/6638 [1:29:04<1:00:31, 3.29s/it] {'loss': 0.6402, 
'grad_norm': 0.5818015649694234, 'learning_rate': 1.4188748471334003e-06, 'epoch': 0.83} 83%|████████▎ | 5533/6638 [1:29:04<1:00:31, 3.29s/it] 83%|████████▎ | 5534/6638 [1:29:07<1:00:03, 3.26s/it] {'loss': 0.6128, 'grad_norm': 0.6579302743066495, 'learning_rate': 1.416370296545949e-06, 'epoch': 0.83} 83%|████████▎ | 5534/6638 [1:29:07<1:00:03, 3.26s/it] 83%|████████▎ | 5535/6638 [1:29:11<1:00:13, 3.28s/it] {'loss': 0.6456, 'grad_norm': 0.6005576111841454, 'learning_rate': 1.4138677899014276e-06, 'epoch': 0.83} 83%|████████▎ | 5535/6638 [1:29:11<1:00:13, 3.28s/it] 83%|████████▎ | 5536/6638 [1:29:14<59:58, 3.27s/it] {'loss': 0.6334, 'grad_norm': 0.5752360049797938, 'learning_rate': 1.4113673277957395e-06, 'epoch': 0.83} 83%|████████▎ | 5536/6638 [1:29:14<59:58, 3.27s/it] 83%|████████▎ | 5537/6638 [1:29:17<1:00:31, 3.30s/it] {'loss': 0.6691, 'grad_norm': 0.6074811262943146, 'learning_rate': 1.4088689108242959e-06, 'epoch': 0.83} 83%|████████▎ | 5537/6638 [1:29:17<1:00:31, 3.30s/it] 83%|████████▎ | 5538/6638 [1:29:21<1:01:33, 3.36s/it] {'loss': 0.6382, 'grad_norm': 0.5980688202785429, 'learning_rate': 1.406372539582025e-06, 'epoch': 0.83} 83%|████████▎ | 5538/6638 [1:29:21<1:01:33, 3.36s/it] 83%|████████▎ | 5539/6638 [1:29:24<1:01:21, 3.35s/it] {'loss': 0.5739, 'grad_norm': 0.5086644212800159, 'learning_rate': 1.4038782146633634e-06, 'epoch': 0.83} 83%|████████▎ | 5539/6638 [1:29:24<1:01:21, 3.35s/it] 83%|████████▎ | 5540/6638 [1:29:27<1:00:31, 3.31s/it] {'loss': 0.6138, 'grad_norm': 0.5602098726469357, 'learning_rate': 1.4013859366622595e-06, 'epoch': 0.83} 83%|████████▎ | 5540/6638 [1:29:27<1:00:31, 3.31s/it] 83%|████████▎ | 5541/6638 [1:29:31<1:00:47, 3.32s/it] {'loss': 0.6552, 'grad_norm': 0.5544331442649344, 'learning_rate': 1.3988957061721797e-06, 'epoch': 0.83} 83%|████████▎ | 5541/6638 [1:29:31<1:00:47, 3.32s/it] 83%|████████▎ | 5542/6638 [1:29:34<1:01:06, 3.34s/it] {'loss': 0.6349, 'grad_norm': 0.5489075039165476, 'learning_rate': 1.3964075237861007e-06, 
'epoch': 0.83} 83%|████████▎ | 5542/6638 [1:29:34<1:01:06, 3.34s/it] 84%|████████▎ | 5543/6638 [1:29:37<1:00:26, 3.31s/it] {'loss': 0.6877, 'grad_norm': 0.7044348779313309, 'learning_rate': 1.3939213900965133e-06, 'epoch': 0.84} 84%|████████▎ | 5543/6638 [1:29:37<1:00:26, 3.31s/it] 84%|████████▎ | 5544/6638 [1:29:41<1:00:34, 3.32s/it] {'loss': 0.6445, 'grad_norm': 0.6015187641546037, 'learning_rate': 1.3914373056954122e-06, 'epoch': 0.84} 84%|████████▎ | 5544/6638 [1:29:41<1:00:34, 3.32s/it] 84%|████████▎ | 5545/6638 [1:29:44<1:00:03, 3.30s/it] {'loss': 0.629, 'grad_norm': 0.5680560939351582, 'learning_rate': 1.3889552711743147e-06, 'epoch': 0.84} 84%|████████▎ | 5545/6638 [1:29:44<1:00:03, 3.30s/it] 84%|████████▎ | 5546/6638 [1:29:47<59:35, 3.27s/it] {'loss': 0.6278, 'grad_norm': 0.6080628600067711, 'learning_rate': 1.386475287124247e-06, 'epoch': 0.84} 84%|████████▎ | 5546/6638 [1:29:47<59:35, 3.27s/it] 84%|████████▎ | 5547/6638 [1:29:50<59:07, 3.25s/it] {'loss': 0.6199, 'grad_norm': 0.5493655634324306, 'learning_rate': 1.383997354135741e-06, 'epoch': 0.84} 84%|████████▎ | 5547/6638 [1:29:50<59:07, 3.25s/it] 84%|████████▎ | 5548/6638 [1:29:54<59:26, 3.27s/it] {'loss': 0.6296, 'grad_norm': 0.5382817613190258, 'learning_rate': 1.381521472798847e-06, 'epoch': 0.84} 84%|████████▎ | 5548/6638 [1:29:54<59:26, 3.27s/it] 84%|████████▎ | 5549/6638 [1:29:57<59:02, 3.25s/it] {'loss': 0.596, 'grad_norm': 0.5536694340853511, 'learning_rate': 1.3790476437031252e-06, 'epoch': 0.84} 84%|████████▎ | 5549/6638 [1:29:57<59:02, 3.25s/it]3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 
84%|████████▎ | 5550/6638 [1:30:00<59:17, 3.27s/it] {'loss': 0.6526, 'grad_norm': 0.6241865174484311, 'learning_rate': 1.376575867437644e-06, 'epoch': 0.84} 84%|████████▎ | 5550/6638 [1:30:00<59:17, 3.27s/it] 84%|████████▎ | 5551/6638 [1:30:03<59:20, 3.28s/it] {'loss': 0.6334, 'grad_norm': 0.5890266739963437, 'learning_rate': 1.3741061445909886e-06, 'epoch': 0.84} 84%|████████▎ | 5551/6638 [1:30:03<59:20, 3.28s/it] 84%|████████▎ | 5552/6638 [1:30:07<59:15, 3.27s/it] {'loss': 0.6084, 'grad_norm': 0.5557095954987888, 'learning_rate': 1.3716384757512458e-06, 'epoch': 0.84} 84%|████████▎ | 5552/6638 [1:30:07<59:15, 3.27s/it] 84%|████████▎ | 5553/6638 [1:30:10<59:10, 3.27s/it] {'loss': 0.6388, 'grad_norm': 0.6354869199547376, 'learning_rate': 1.3691728615060284e-06, 'epoch': 0.84} 84%|████████▎ | 5553/6638 [1:30:10<59:10, 3.27s/it] 84%|████████▎ | 5554/6638 [1:30:13<59:35, 3.30s/it] {'loss': 0.6072, 'grad_norm': 0.4802338577789856, 'learning_rate': 1.366709302442446e-06, 'epoch': 0.84} 84%|████████▎ | 5554/6638 [1:30:13<59:35, 3.30s/it] 84%|████████▎ | 5555/6638 [1:30:16<59:13, 3.28s/it] {'loss': 0.6465, 'grad_norm': 0.5574733682444385, 'learning_rate': 1.3642477991471225e-06, 'epoch': 0.84} 84%|████████▎ | 5555/6638 [1:30:16<59:13, 3.28s/it] 84%|████████▎ | 5556/6638 [1:30:20<1:00:05, 3.33s/it] {'loss': 0.6932, 'grad_norm': 0.67916481937743, 'learning_rate': 1.361788352206196e-06, 'epoch': 0.84} 84%|████████▎ | 5556/6638 [1:30:20<1:00:05, 3.33s/it] 84%|████████▎ | 5557/6638 [1:30:23<59:34, 3.31s/it] {'loss': 0.6188, 'grad_norm': 0.5164530277371384, 'learning_rate': 1.3593309622053118e-06, 'epoch': 0.84} 84%|████████▎ | 5557/6638 [1:30:23<59:34, 3.31s/it] 84%|████████▎ | 5558/6638 [1:30:26<59:13, 3.29s/it] {'loss': 0.6073, 'grad_norm': 0.5522976910795454, 'learning_rate': 1.3568756297296292e-06, 'epoch': 0.84} 84%|████████▎ | 5558/6638 [1:30:26<59:13, 3.29s/it] 84%|████████▎ | 5559/6638 [1:30:30<59:04, 3.29s/it] {'loss': 0.6247, 'grad_norm': 0.5427974880207577, 
'learning_rate': 1.3544223553638091e-06, 'epoch': 0.84} 84%|████████▎ | 5559/6638 [1:30:30<59:04, 3.29s/it] 84%|████████▍ | 5560/6638 [1:30:33<58:56, 3.28s/it] {'loss': 0.6385, 'grad_norm': 0.5696721487932583, 'learning_rate': 1.351971139692031e-06, 'epoch': 0.84} 84%|████████▍ | 5560/6638 [1:30:33<58:56, 3.28s/it] 84%|████████▍ | 5561/6638 [1:30:36<58:46, 3.27s/it] {'loss': 0.5984, 'grad_norm': 0.520960300540554, 'learning_rate': 1.3495219832979821e-06, 'epoch': 0.84} 84%|████████▍ | 5561/6638 [1:30:36<58:46, 3.27s/it] 84%|████████▍ | 5562/6638 [1:30:39<58:10, 3.24s/it] {'loss': 0.6401, 'grad_norm': 0.6388558389215967, 'learning_rate': 1.3470748867648576e-06, 'epoch': 0.84} 84%|████████▍ | 5562/6638 [1:30:39<58:10, 3.24s/it] 84%|████████▍ | 5563/6638 [1:30:43<58:29, 3.27s/it] {'loss': 0.6559, 'grad_norm': 0.5744884298210894, 'learning_rate': 1.3446298506753585e-06, 'epoch': 0.84} 84%|████████▍ | 5563/6638 [1:30:43<58:29, 3.27s/it] 84%|████████▍ | 5564/6638 [1:30:46<58:33, 3.27s/it] {'loss': 0.662, 'grad_norm': 0.5980281525075573, 'learning_rate': 1.3421868756117028e-06, 'epoch': 0.84} 84%|████████▍ | 5564/6638 [1:30:46<58:33, 3.27s/it] 84%|████████▍ | 5565/6638 [1:30:49<58:35, 3.28s/it] {'loss': 0.6495, 'grad_norm': 0.6198311362817575, 'learning_rate': 1.339745962155613e-06, 'epoch': 0.84} 84%|████████▍ | 5565/6638 [1:30:49<58:35, 3.28s/it] 84%|████████▍ | 5566/6638 [1:30:53<59:16, 3.32s/it] {'loss': 0.6254, 'grad_norm': 0.5360840629863481, 'learning_rate': 1.3373071108883263e-06, 'epoch': 0.84} 84%|████████▍ | 5566/6638 [1:30:53<59:16, 3.32s/it] 84%|████████▍ | 5567/6638 [1:30:56<59:02, 3.31s/it] {'loss': 0.6258, 'grad_norm': 0.5260603437380613, 'learning_rate': 1.334870322390579e-06, 'epoch': 0.84} 84%|████████▍ | 5567/6638 [1:30:56<59:02, 3.31s/it] 84%|████████▍ | 5568/6638 [1:30:59<58:49, 3.30s/it] {'loss': 0.6448, 'grad_norm': 0.5520202632086123, 'learning_rate': 1.3324355972426228e-06, 'epoch': 0.84} 84%|████████▍ | 5568/6638 [1:30:59<58:49, 3.30s/it] 
84%|████████▍ | 5569/6638 [1:31:02<58:17, 3.27s/it] {'loss': 0.6381, 'grad_norm': 0.5889378782842535, 'learning_rate': 1.3300029360242218e-06, 'epoch': 0.84} 84%|████████▍ | 5569/6638 [1:31:02<58:17, 3.27s/it] 84%|████████▍ | 5570/6638 [1:31:06<58:33, 3.29s/it] {'loss': 0.6141, 'grad_norm': 0.5542706070125352, 'learning_rate': 1.3275723393146367e-06, 'epoch': 0.84} 84%|████████▍ | 5570/6638 [1:31:06<58:33, 3.29s/it] 84%|████████▍ | 5571/6638 [1:31:09<58:10, 3.27s/it] {'loss': 0.657, 'grad_norm': 0.612130730750346, 'learning_rate': 1.325143807692648e-06, 'epoch': 0.84} 84%|████████▍ | 5571/6638 [1:31:09<58:10, 3.27s/it] 84%|████████▍ | 5572/6638 [1:31:12<57:53, 3.26s/it] {'loss': 0.6835, 'grad_norm': 0.6588848370998984, 'learning_rate': 1.3227173417365413e-06, 'epoch': 0.84} 84%|████████▍ | 5572/6638 [1:31:12<57:53, 3.26s/it] 84%|████████▍ | 5573/6638 [1:31:16<57:59, 3.27s/it] {'loss': 0.624, 'grad_norm': 0.5551213072428679, 'learning_rate': 1.320292942024105e-06, 'epoch': 0.84} 84%|████████▍ | 5573/6638 [1:31:16<57:59, 3.27s/it] 84%|████████▍ | 5574/6638 [1:31:19<57:33, 3.25s/it] {'loss': 0.6528, 'grad_norm': 0.635860747597213, 'learning_rate': 1.3178706091326455e-06, 'epoch': 0.84} 84%|████████▍ | 5574/6638 [1:31:19<57:33, 3.25s/it] 84%|████████▍ | 5575/6638 [1:31:22<57:27, 3.24s/it] {'loss': 0.6206, 'grad_norm': 0.5943367086226626, 'learning_rate': 1.315450343638962e-06, 'epoch': 0.84} 84%|████████▍ | 5575/6638 [1:31:22<57:27, 3.24s/it] 84%|████████▍ | 5576/6638 [1:31:25<57:32, 3.25s/it] {'loss': 0.637, 'grad_norm': 0.6194009400484228, 'learning_rate': 1.3130321461193807e-06, 'epoch': 0.84} 84%|████████▍ | 5576/6638 [1:31:25<57:32, 3.25s/it] 84%|████████▍ | 5577/6638 [1:31:29<57:38, 3.26s/it] {'loss': 0.6396, 'grad_norm': 0.6098951450534407, 'learning_rate': 1.3106160171497217e-06, 'epoch': 0.84} 84%|████████▍ | 5577/6638 [1:31:29<57:38, 3.26s/it] 84%|████████▍ | 5578/6638 [1:31:32<57:27, 3.25s/it] {'loss': 0.6471, 'grad_norm': 0.5635442652882399, 
'learning_rate': 1.308201957305314e-06, 'epoch': 0.84} 84%|████████▍ | 5578/6638 [1:31:32<57:27, 3.25s/it] 84%|████████▍ | 5579/6638 [1:31:35<57:19, 3.25s/it] {'loss': 0.6134, 'grad_norm': 0.5225568325877107, 'learning_rate': 1.3057899671609974e-06, 'epoch': 0.84} 84%|████████▍ | 5579/6638 [1:31:35<57:19, 3.25s/it] 84%|████████▍ | 5580/6638 [1:31:38<57:49, 3.28s/it] {'loss': 0.6541, 'grad_norm': 0.5764319679028337, 'learning_rate': 1.3033800472911174e-06, 'epoch': 0.84} 84%|████████▍ | 5580/6638 [1:31:38<57:49, 3.28s/it] 84%|████████▍ | 5581/6638 [1:31:42<57:44, 3.28s/it] {'loss': 0.6001, 'grad_norm': 0.5151795632311689, 'learning_rate': 1.300972198269529e-06, 'epoch': 0.84} 84%|████████▍ | 5581/6638 [1:31:42<57:44, 3.28s/it] 84%|████████▍ | 5582/6638 [1:31:45<57:52, 3.29s/it] {'loss': 0.6025, 'grad_norm': 0.5221607909150064, 'learning_rate': 1.2985664206695902e-06, 'epoch': 0.84} 84%|████████▍ | 5582/6638 [1:31:45<57:52, 3.29s/it] 84%|████████▍ | 5583/6638 [1:31:48<57:26, 3.27s/it] {'loss': 0.6362, 'grad_norm': 0.596225352778634, 'learning_rate': 1.296162715064162e-06, 'epoch': 0.84} 84%|████████▍ | 5583/6638 [1:31:48<57:26, 3.27s/it] 84%|████████▍ | 5584/6638 [1:31:51<57:32, 3.28s/it] {'loss': 0.6407, 'grad_norm': 0.6453381463796578, 'learning_rate': 1.2937610820256275e-06, 'epoch': 0.84} 84%|████████▍ | 5584/6638 [1:31:51<57:32, 3.28s/it] 84%|████████▍ | 5585/6638 [1:31:55<57:20, 3.27s/it] {'loss': 0.6558, 'grad_norm': 0.6288633863943021, 'learning_rate': 1.2913615221258579e-06, 'epoch': 0.84} 84%|████████▍ | 5585/6638 [1:31:55<57:20, 3.27s/it] 84%|████████▍ | 5586/6638 [1:31:58<57:16, 3.27s/it] {'loss': 0.6192, 'grad_norm': 0.6181008112084332, 'learning_rate': 1.2889640359362432e-06, 'epoch': 0.84} 84%|████████▍ | 5586/6638 [1:31:58<57:16, 3.27s/it] 84%|████████▍ | 5587/6638 [1:32:01<57:33, 3.29s/it] {'loss': 0.607, 'grad_norm': 0.5064683977555812, 'learning_rate': 1.2865686240276732e-06, 'epoch': 0.84} 84%|████████▍ | 5587/6638 [1:32:01<57:33, 3.29s/it] 
84%|████████▍ | 5588/6638 [1:32:05<57:38, 3.29s/it] {'loss': 0.63, 'grad_norm': 0.5705831448996721, 'learning_rate': 1.2841752869705459e-06, 'epoch': 0.84} 84%|████████▍ | 5588/6638 [1:32:05<57:38, 3.29s/it] 84%|████████▍ | 5589/6638 [1:32:08<57:42, 3.30s/it] {'loss': 0.679, 'grad_norm': 0.6428192821423052, 'learning_rate': 1.2817840253347679e-06, 'epoch': 0.84} 84%|████████▍ | 5589/6638 [1:32:08<57:42, 3.30s/it] 84%|████████▍ | 5590/6638 [1:32:11<57:07, 3.27s/it] {'loss': 0.6043, 'grad_norm': 0.6552455202166059, 'learning_rate': 1.279394839689746e-06, 'epoch': 0.84} 84%|████████▍ | 5590/6638 [1:32:11<57:07, 3.27s/it] 84%|████████▍ | 5591/6638 [1:32:14<57:00, 3.27s/it] {'loss': 0.6773, 'grad_norm': 0.5824126406356132, 'learning_rate': 1.277007730604396e-06, 'epoch': 0.84} 84%|████████▍ | 5591/6638 [1:32:14<57:00, 3.27s/it] 84%|████████▍ | 5592/6638 [1:32:18<56:56, 3.27s/it] {'loss': 0.6472, 'grad_norm': 0.5932656177487232, 'learning_rate': 1.274622698647141e-06, 'epoch': 0.84} 84%|████████▍ | 5592/6638 [1:32:18<56:56, 3.27s/it] 84%|████████▍ | 5593/6638 [1:32:21<57:29, 3.30s/it] {'loss': 0.6617, 'grad_norm': 0.5986361178247416, 'learning_rate': 1.2722397443859036e-06, 'epoch': 0.84} 84%|████████▍ | 5593/6638 [1:32:21<57:29, 3.30s/it] 84%|████████▍ | 5594/6638 [1:32:24<57:04, 3.28s/it] {'loss': 0.6023, 'grad_norm': 0.5563447415559574, 'learning_rate': 1.2698588683881185e-06, 'epoch': 0.84} 84%|████████▍ | 5594/6638 [1:32:24<57:04, 3.28s/it] 84%|████████▍ | 5595/6638 [1:32:28<57:20, 3.30s/it] {'loss': 0.6263, 'grad_norm': 0.543919936878136, 'learning_rate': 1.2674800712207226e-06, 'epoch': 0.84} 84%|████████▍ | 5595/6638 [1:32:28<57:20, 3.30s/it] 84%|████████▍ | 5596/6638 [1:32:31<57:35, 3.32s/it] {'loss': 0.6342, 'grad_norm': 0.6432555812091575, 'learning_rate': 1.2651033534501543e-06, 'epoch': 0.84} 84%|████████▍ | 5596/6638 [1:32:31<57:35, 3.32s/it] 84%|████████▍ | 5597/6638 [1:32:34<58:21, 3.36s/it] {'loss': 0.6713, 'grad_norm': 0.6818207600436881, 
'learning_rate': 1.262728715642365e-06, 'epoch': 0.84} 84%|████████▍ | 5597/6638 [1:32:34<58:21, 3.36s/it] 84%|████████▍ | 5598/6638 [1:32:38<59:05, 3.41s/it] {'loss': 0.6743, 'grad_norm': 0.5784629957144864, 'learning_rate': 1.260356158362801e-06, 'epoch': 0.84} 84%|████████▍ | 5598/6638 [1:32:38<59:05, 3.41s/it] 84%|████████▍ | 5599/6638 [1:32:41<58:12, 3.36s/it] {'loss': 0.6249, 'grad_norm': 0.6169025443996694, 'learning_rate': 1.2579856821764202e-06, 'epoch': 0.84} 84%|████████▍ | 5599/6638 [1:32:41<58:12, 3.36s/it]2 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 84%|████████▍ | 5600/6638 [1:32:44<57:41, 3.33s/it] {'loss': 0.6538, 'grad_norm': 0.647485818906738, 'learning_rate': 1.2556172876476846e-06, 'epoch': 0.84} 84%|████████▍ | 5600/6638 [1:32:44<57:41, 3.33s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5600/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5600/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5600/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 84%|████████▍ | 5601/6638 [1:33:05<2:26:01, 8.45s/it] {'loss': 0.6526, 'grad_norm': 0.6277070880017616, 'learning_rate': 1.2532509753405554e-06, 'epoch': 0.84} 84%|████████▍ | 5601/6638 [1:33:05<2:26:01, 8.45s/it] 84%|████████▍ | 5602/6638 [1:33:08<1:58:51, 6.88s/it] {'loss': 0.6589, 'grad_norm': 0.5959622007995076, 'learning_rate': 1.2508867458185037e-06, 'epoch': 0.84} 84%|████████▍ | 5602/6638 [1:33:08<1:58:51, 6.88s/it] 84%|████████▍ | 5603/6638 [1:33:11<1:40:21, 5.82s/it] {'loss': 0.6203, 'grad_norm': 0.5437694290952105, 'learning_rate': 1.2485245996445006e-06, 'epoch': 0.84} 84%|████████▍ | 5603/6638 [1:33:11<1:40:21, 5.82s/it] 84%|████████▍ | 5604/6638 [1:33:15<1:28:42, 5.15s/it] {'loss': 0.6858, 'grad_norm': 0.6178772856461408, 'learning_rate': 1.2461645373810272e-06, 'epoch': 0.84} 84%|████████▍ | 5604/6638 [1:33:15<1:28:42, 5.15s/it] 84%|████████▍ | 5605/6638 [1:33:18<1:18:22, 4.55s/it] {'loss': 0.7073, 'grad_norm': 0.6878101325607899, 'learning_rate': 1.2438065595900605e-06, 'epoch': 0.84} 84%|████████▍ | 5605/6638 
[training log condensed from duplicated tqdm output]

Steps 5606–5788 of 6638 (84% → 87%), elapsed 1:33:21 → 1:43:36, typically 3.2–3.4 s/it.
Per-step metrics: loss fluctuated between 0.5775 and 0.7038; grad_norm between 0.51 and 1.07; learning_rate decayed from 1.2415e-06 to 8.4992e-07; epoch advanced 0.84 → 0.87.
All 8 ranks logged "AutoResumeHook: Checking whether to suspend..." after steps 5649, 5699, and 5749.

Checkpoint saved at step 5700:
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5700/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5700/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5700/mm_projector

Warnings emitted during checkpointing:
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)

Step 5701 slowed to 7.51 s/it while the checkpoint was written, recovering to ~3.3 s/it by step 5706.
{'loss': 0.6391, 'grad_norm': 0.6116527692837278, 'learning_rate': 8.479500710018484e-07, 'epoch': 0.87} 87%|████████▋ | 5788/6638 [1:43:36<46:51, 3.31s/it] 87%|████████▋ | 5789/6638 [1:43:40<47:04, 3.33s/it] {'loss': 0.6115, 'grad_norm': 0.4996776486407763, 'learning_rate': 8.459846710395714e-07, 'epoch': 0.87} 87%|████████▋ | 5789/6638 [1:43:40<47:04, 3.33s/it] 87%|████████▋ | 5790/6638 [1:43:43<46:31, 3.29s/it] {'loss': 0.6122, 'grad_norm': 0.6619389772916252, 'learning_rate': 8.440214508410527e-07, 'epoch': 0.87} 87%|████████▋ | 5790/6638 [1:43:43<46:31, 3.29s/it] 87%|████████▋ | 5791/6638 [1:43:46<46:20, 3.28s/it] {'loss': 0.6261, 'grad_norm': 0.5841128636122658, 'learning_rate': 8.420604108737751e-07, 'epoch': 0.87} 87%|████████▋ | 5791/6638 [1:43:46<46:20, 3.28s/it] 87%|████████▋ | 5792/6638 [1:43:49<46:18, 3.28s/it] {'loss': 0.59, 'grad_norm': 0.5186440695649832, 'learning_rate': 8.401015516047039e-07, 'epoch': 0.87} 87%|████████▋ | 5792/6638 [1:43:49<46:18, 3.28s/it] 87%|████████▋ | 5793/6638 [1:43:53<47:20, 3.36s/it] {'loss': 0.6447, 'grad_norm': 0.5691842362031674, 'learning_rate': 8.381448735002861e-07, 'epoch': 0.87} 87%|████████▋ | 5793/6638 [1:43:53<47:20, 3.36s/it] 87%|████████▋ | 5794/6638 [1:43:56<47:19, 3.36s/it] {'loss': 0.6223, 'grad_norm': 0.5571575679809576, 'learning_rate': 8.361903770264457e-07, 'epoch': 0.87} 87%|████████▋ | 5794/6638 [1:43:56<47:19, 3.36s/it] 87%|████████▋ | 5795/6638 [1:44:00<46:51, 3.33s/it] {'loss': 0.6144, 'grad_norm': 0.5636896929315716, 'learning_rate': 8.342380626485902e-07, 'epoch': 0.87} 87%|████████▋ | 5795/6638 [1:44:00<46:51, 3.33s/it] 87%|████████▋ | 5796/6638 [1:44:03<47:00, 3.35s/it] {'loss': 0.6088, 'grad_norm': 0.5458104411616461, 'learning_rate': 8.322879308316078e-07, 'epoch': 0.87} 87%|████████▋ | 5796/6638 [1:44:03<47:00, 3.35s/it] 87%|████████▋ | 5797/6638 [1:44:06<46:28, 3.32s/it] {'loss': 0.6454, 'grad_norm': 0.5792301647876723, 'learning_rate': 8.303399820398672e-07, 'epoch': 0.87} 87%|████████▋ | 
5797/6638 [1:44:06<46:28, 3.32s/it] 87%|████████▋ | 5798/6638 [1:44:10<46:52, 3.35s/it] {'loss': 0.6434, 'grad_norm': 0.5495631304185286, 'learning_rate': 8.283942167372128e-07, 'epoch': 0.87} 87%|████████▋ | 5798/6638 [1:44:10<46:52, 3.35s/it] 87%|████████▋ | 5799/6638 [1:44:13<46:26, 3.32s/it] {'loss': 0.6083, 'grad_norm': 0.5521436889947023, 'learning_rate': 8.264506353869717e-07, 'epoch': 0.87} 87%|████████▋ | 5799/6638 [1:44:13<46:26, 3.32s/it]2 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 1 87%|████████▋ | 5800/6638 [1:44:16<45:46, 3.28s/it] AutoResumeHook: Checking whether to suspend... {'loss': 0.6337, 'grad_norm': 0.6046561332297875, 'learning_rate': 8.24509238451956e-07, 'epoch': 0.87} 87%|████████▋ | 5800/6638 [1:44:16<45:46, 3.28s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5800/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5800/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5800/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 87%|████████▋ | 5801/6638 [1:44:35<1:48:58, 7.81s/it] {'loss': 0.6642, 'grad_norm': 0.615589385463001, 'learning_rate': 8.225700263944481e-07, 'epoch': 0.87} 87%|████████▋ | 5801/6638 [1:44:35<1:48:58, 7.81s/it] 87%|████████▋ | 5802/6638 [1:44:38<1:30:02, 6.46s/it] {'loss': 0.6573, 'grad_norm': 0.610914318281943, 'learning_rate': 8.206329996762208e-07, 'epoch': 0.87} 87%|████████▋ | 5802/6638 [1:44:38<1:30:02, 6.46s/it] 87%|████████▋ | 5803/6638 [1:44:41<1:16:07, 5.47s/it] {'loss': 0.615, 'grad_norm': 0.5762254404003129, 'learning_rate': 8.186981587585152e-07, 'epoch': 0.87} 87%|████████▋ | 5803/6638 [1:44:41<1:16:07, 5.47s/it] 87%|████████▋ | 5804/6638 [1:44:44<1:06:55, 4.81s/it] {'loss': 0.6277, 'grad_norm': 0.5820889563794533, 'learning_rate': 8.167655041020606e-07, 'epoch': 0.87} 87%|████████▋ | 5804/6638 [1:44:44<1:06:55, 4.81s/it] 87%|████████▋ | 5805/6638 [1:44:48<1:00:39, 4.37s/it] {'loss': 0.6474, 'grad_norm': 0.5797305828314728, 'learning_rate': 8.148350361670643e-07, 'epoch': 0.87} 87%|████████▋ | 5805/6638 
[1:44:48<1:00:39, 4.37s/it] 87%|████████▋ | 5806/6638 [1:44:51<56:17, 4.06s/it] {'loss': 0.6498, 'grad_norm': 0.5466377361873056, 'learning_rate': 8.129067554132075e-07, 'epoch': 0.87} 87%|████████▋ | 5806/6638 [1:44:51<56:17, 4.06s/it] 87%|████████▋ | 5807/6638 [1:44:54<53:43, 3.88s/it] {'loss': 0.6311, 'grad_norm': 0.649780479694499, 'learning_rate': 8.109806622996563e-07, 'epoch': 0.87} 87%|████████▋ | 5807/6638 [1:44:54<53:43, 3.88s/it] 87%|████████▋ | 5808/6638 [1:44:58<51:16, 3.71s/it] {'loss': 0.618, 'grad_norm': 0.8329106133876925, 'learning_rate': 8.090567572850561e-07, 'epoch': 0.87} 87%|████████▋ | 5808/6638 [1:44:58<51:16, 3.71s/it] 88%|████████▊ | 5809/6638 [1:45:01<50:00, 3.62s/it] {'loss': 0.6561, 'grad_norm': 0.6292281258526561, 'learning_rate': 8.071350408275258e-07, 'epoch': 0.88} 88%|████████▊ | 5809/6638 [1:45:01<50:00, 3.62s/it] 88%|████████▊ | 5810/6638 [1:45:04<48:31, 3.52s/it] {'loss': 0.634, 'grad_norm': 0.6684706878097948, 'learning_rate': 8.052155133846695e-07, 'epoch': 0.88} 88%|████████▊ | 5810/6638 [1:45:04<48:31, 3.52s/it] 88%|████████▊ | 5811/6638 [1:45:08<47:47, 3.47s/it] {'loss': 0.6359, 'grad_norm': 0.574166408722281, 'learning_rate': 8.032981754135638e-07, 'epoch': 0.88} 88%|████████▊ | 5811/6638 [1:45:08<47:47, 3.47s/it] 88%|████████▊ | 5812/6638 [1:45:11<47:12, 3.43s/it] {'loss': 0.6176, 'grad_norm': 0.5661408022552478, 'learning_rate': 8.01383027370769e-07, 'epoch': 0.88} 88%|████████▊ | 5812/6638 [1:45:11<47:12, 3.43s/it] 88%|████████▊ | 5813/6638 [1:45:14<46:11, 3.36s/it] {'loss': 0.6347, 'grad_norm': 0.5867061154555937, 'learning_rate': 7.994700697123247e-07, 'epoch': 0.88} 88%|████████▊ | 5813/6638 [1:45:14<46:11, 3.36s/it] 88%|████████▊ | 5814/6638 [1:45:18<46:23, 3.38s/it] {'loss': 0.6317, 'grad_norm': 0.6101748116897953, 'learning_rate': 7.975593028937412e-07, 'epoch': 0.88} 88%|████████▊ | 5814/6638 [1:45:18<46:23, 3.38s/it] 88%|████████▊ | 5815/6638 [1:45:21<46:02, 3.36s/it] {'loss': 0.6064, 'grad_norm': 
0.5884383207913281, 'learning_rate': 7.956507273700154e-07, 'epoch': 0.88} 88%|████████▊ | 5815/6638 [1:45:21<46:02, 3.36s/it] 88%|████████▊ | 5816/6638 [1:45:24<45:34, 3.33s/it] {'loss': 0.6046, 'grad_norm': 0.5523415518084107, 'learning_rate': 7.937443435956205e-07, 'epoch': 0.88} 88%|████████▊ | 5816/6638 [1:45:24<45:34, 3.33s/it] 88%|████████▊ | 5817/6638 [1:45:28<45:22, 3.32s/it] {'loss': 0.6444, 'grad_norm': 0.6009301050749767, 'learning_rate': 7.918401520245033e-07, 'epoch': 0.88} 88%|████████▊ | 5817/6638 [1:45:28<45:22, 3.32s/it] 88%|████████▊ | 5818/6638 [1:45:31<45:31, 3.33s/it] {'loss': 0.6327, 'grad_norm': 0.5406386443792514, 'learning_rate': 7.899381531100936e-07, 'epoch': 0.88} 88%|████████▊ | 5818/6638 [1:45:31<45:31, 3.33s/it] 88%|████████▊ | 5819/6638 [1:45:34<45:32, 3.34s/it] {'loss': 0.6213, 'grad_norm': 0.5430711705636, 'learning_rate': 7.880383473052966e-07, 'epoch': 0.88} 88%|████████▊ | 5819/6638 [1:45:34<45:32, 3.34s/it] 88%|████████▊ | 5820/6638 [1:45:38<45:00, 3.30s/it] {'loss': 0.6487, 'grad_norm': 0.6025626119295586, 'learning_rate': 7.861407350624994e-07, 'epoch': 0.88} 88%|████████▊ | 5820/6638 [1:45:38<45:00, 3.30s/it] 88%|████████▊ | 5821/6638 [1:45:41<45:01, 3.31s/it] {'loss': 0.6545, 'grad_norm': 0.5691970847919465, 'learning_rate': 7.842453168335607e-07, 'epoch': 0.88} 88%|████████▊ | 5821/6638 [1:45:41<45:01, 3.31s/it] 88%|████████▊ | 5822/6638 [1:45:44<44:44, 3.29s/it] {'loss': 0.6078, 'grad_norm': 0.5858343019180352, 'learning_rate': 7.82352093069817e-07, 'epoch': 0.88} 88%|████████▊ | 5822/6638 [1:45:44<44:44, 3.29s/it] 88%|████████▊ | 5823/6638 [1:45:47<44:27, 3.27s/it] {'loss': 0.662, 'grad_norm': 0.6346110541813998, 'learning_rate': 7.8046106422209e-07, 'epoch': 0.88} 88%|████████▊ | 5823/6638 [1:45:47<44:27, 3.27s/it] 88%|████████▊ | 5824/6638 [1:45:51<44:12, 3.26s/it] {'loss': 0.6241, 'grad_norm': 0.5841355476012583, 'learning_rate': 7.785722307406685e-07, 'epoch': 0.88} 88%|████████▊ | 5824/6638 [1:45:51<44:12, 
3.26s/it] 88%|████████▊ | 5825/6638 [1:45:54<44:27, 3.28s/it] {'loss': 0.6103, 'grad_norm': 0.5408877279995526, 'learning_rate': 7.766855930753281e-07, 'epoch': 0.88} 88%|████████▊ | 5825/6638 [1:45:54<44:27, 3.28s/it] 88%|████████▊ | 5826/6638 [1:45:57<44:27, 3.28s/it] {'loss': 0.6319, 'grad_norm': 0.5941029302371503, 'learning_rate': 7.74801151675314e-07, 'epoch': 0.88} 88%|████████▊ | 5826/6638 [1:45:57<44:27, 3.28s/it] 88%|████████▊ | 5827/6638 [1:46:00<44:29, 3.29s/it] {'loss': 0.6394, 'grad_norm': 0.5430879302760822, 'learning_rate': 7.729189069893506e-07, 'epoch': 0.88} 88%|████████▊ | 5827/6638 [1:46:00<44:29, 3.29s/it] 88%|████████▊ | 5828/6638 [1:46:04<44:38, 3.31s/it] {'loss': 0.6541, 'grad_norm': 0.6130160581522125, 'learning_rate': 7.710388594656449e-07, 'epoch': 0.88} 88%|████████▊ | 5828/6638 [1:46:04<44:38, 3.31s/it] 88%|████████▊ | 5829/6638 [1:46:07<44:16, 3.28s/it] {'loss': 0.6331, 'grad_norm': 0.6284798866564837, 'learning_rate': 7.691610095518686e-07, 'epoch': 0.88} 88%|████████▊ | 5829/6638 [1:46:07<44:16, 3.28s/it] 88%|████████▊ | 5830/6638 [1:46:10<43:56, 3.26s/it] {'loss': 0.6438, 'grad_norm': 0.5828867867252755, 'learning_rate': 7.672853576951822e-07, 'epoch': 0.88} 88%|████████▊ | 5830/6638 [1:46:10<43:56, 3.26s/it] 88%|████████▊ | 5831/6638 [1:46:14<43:51, 3.26s/it] {'loss': 0.631, 'grad_norm': 0.6014967107755692, 'learning_rate': 7.654119043422192e-07, 'epoch': 0.88} 88%|████████▊ | 5831/6638 [1:46:14<43:51, 3.26s/it] 88%|████████▊ | 5832/6638 [1:46:17<44:06, 3.28s/it] {'loss': 0.6309, 'grad_norm': 0.5399576721869768, 'learning_rate': 7.635406499390829e-07, 'epoch': 0.88} 88%|████████▊ | 5832/6638 [1:46:17<44:06, 3.28s/it] 88%|████████▊ | 5833/6638 [1:46:20<43:49, 3.27s/it] {'loss': 0.6131, 'grad_norm': 0.6394456391770202, 'learning_rate': 7.616715949313636e-07, 'epoch': 0.88} 88%|████████▊ | 5833/6638 [1:46:20<43:49, 3.27s/it] 88%|████████▊ | 5834/6638 [1:46:24<44:41, 3.34s/it] {'loss': 0.6613, 'grad_norm': 0.6150671477501684, 
'learning_rate': 7.598047397641162e-07, 'epoch': 0.88} 88%|████████▊ | 5834/6638 [1:46:24<44:41, 3.34s/it] 88%|████████▊ | 5835/6638 [1:46:27<44:59, 3.36s/it] {'loss': 0.5986, 'grad_norm': 0.5198039195754711, 'learning_rate': 7.579400848818864e-07, 'epoch': 0.88} 88%|████████▊ | 5835/6638 [1:46:27<44:59, 3.36s/it] 88%|████████▊ | 5836/6638 [1:46:30<44:56, 3.36s/it] {'loss': 0.6526, 'grad_norm': 0.6171161625776362, 'learning_rate': 7.560776307286843e-07, 'epoch': 0.88} 88%|████████▊ | 5836/6638 [1:46:30<44:56, 3.36s/it] 88%|████████▊ | 5837/6638 [1:46:34<44:56, 3.37s/it] {'loss': 0.6138, 'grad_norm': 0.5382131772734865, 'learning_rate': 7.542173777479966e-07, 'epoch': 0.88} 88%|████████▊ | 5837/6638 [1:46:34<44:56, 3.37s/it] 88%|████████▊ | 5838/6638 [1:46:37<45:20, 3.40s/it] {'loss': 0.6118, 'grad_norm': 0.5196754184928892, 'learning_rate': 7.5235932638279e-07, 'epoch': 0.88} 88%|████████▊ | 5838/6638 [1:46:37<45:20, 3.40s/it] 88%|████████▊ | 5839/6638 [1:46:41<45:12, 3.39s/it] {'loss': 0.6538, 'grad_norm': 0.5861215543467417, 'learning_rate': 7.505034770755082e-07, 'epoch': 0.88} 88%|████████▊ | 5839/6638 [1:46:41<45:12, 3.39s/it] 88%|████████▊ | 5840/6638 [1:46:44<44:40, 3.36s/it] {'loss': 0.6368, 'grad_norm': 0.6036550142350379, 'learning_rate': 7.486498302680679e-07, 'epoch': 0.88} 88%|████████▊ | 5840/6638 [1:46:44<44:40, 3.36s/it] 88%|████████▊ | 5841/6638 [1:46:47<44:02, 3.32s/it] {'loss': 0.6326, 'grad_norm': 0.6255831640388663, 'learning_rate': 7.467983864018568e-07, 'epoch': 0.88} 88%|████████▊ | 5841/6638 [1:46:47<44:02, 3.32s/it] 88%|████████▊ | 5842/6638 [1:46:50<43:50, 3.30s/it] {'loss': 0.6579, 'grad_norm': 0.6186850088854041, 'learning_rate': 7.449491459177471e-07, 'epoch': 0.88} 88%|████████▊ | 5842/6638 [1:46:50<43:50, 3.30s/it] 88%|████████▊ | 5843/6638 [1:46:54<43:29, 3.28s/it] {'loss': 0.6741, 'grad_norm': 0.6434807104851008, 'learning_rate': 7.431021092560819e-07, 'epoch': 0.88} 88%|████████▊ | 5843/6638 [1:46:54<43:29, 3.28s/it] 88%|████████▊ 
| 5844/6638 [1:46:57<43:27, 3.28s/it] {'loss': 0.6658, 'grad_norm': 0.6453286497020413, 'learning_rate': 7.412572768566783e-07, 'epoch': 0.88} 88%|████████▊ | 5844/6638 [1:46:57<43:27, 3.28s/it] 88%|████████▊ | 5845/6638 [1:47:00<42:57, 3.25s/it] {'loss': 0.5999, 'grad_norm': 0.5730995366428759, 'learning_rate': 7.394146491588261e-07, 'epoch': 0.88} 88%|████████▊ | 5845/6638 [1:47:00<42:57, 3.25s/it] 88%|████████▊ | 5846/6638 [1:47:04<43:46, 3.32s/it] {'loss': 0.6172, 'grad_norm': 0.6118764645073685, 'learning_rate': 7.375742266013009e-07, 'epoch': 0.88} 88%|████████▊ | 5846/6638 [1:47:04<43:46, 3.32s/it] 88%|████████▊ | 5847/6638 [1:47:07<43:37, 3.31s/it] {'loss': 0.6106, 'grad_norm': 0.5434347079089347, 'learning_rate': 7.357360096223409e-07, 'epoch': 0.88} 88%|████████▊ | 5847/6638 [1:47:07<43:37, 3.31s/it] 88%|████████▊ | 5848/6638 [1:47:10<43:06, 3.27s/it] {'loss': 0.6516, 'grad_norm': 0.5830836428323192, 'learning_rate': 7.338999986596673e-07, 'epoch': 0.88} 88%|████████▊ | 5848/6638 [1:47:10<43:06, 3.27s/it] 88%|████████▊ | 5849/6638 [1:47:13<43:27, 3.30s/it] {'loss': 0.6469, 'grad_norm': 0.5558987223036662, 'learning_rate': 7.320661941504703e-07, 'epoch': 0.88} 88%|████████▊ | 5849/6638 [1:47:13<43:27, 3.30s/it]6 AutoResumeHook: Checking whether to suspend... 74 AutoResumeHook: Checking whether to suspend...AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 88%|████████▊ | 5850/6638 [1:47:17<43:10, 3.29s/it]1 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 
{'loss': 0.6682, 'grad_norm': 0.5983755797800859, 'learning_rate': 7.302345965314173e-07, 'epoch': 0.88} 88%|████████▊ | 5850/6638 [1:47:17<43:10, 3.29s/it] 88%|████████▊ | 5851/6638 [1:47:20<42:59, 3.28s/it] {'loss': 0.6001, 'grad_norm': 0.574069601121629, 'learning_rate': 7.284052062386538e-07, 'epoch': 0.88} 88%|████████▊ | 5851/6638 [1:47:20<42:59, 3.28s/it] 88%|████████▊ | 5852/6638 [1:47:23<42:55, 3.28s/it] {'loss': 0.6304, 'grad_norm': 0.5421846925587166, 'learning_rate': 7.265780237077924e-07, 'epoch': 0.88} 88%|████████▊ | 5852/6638 [1:47:23<42:55, 3.28s/it] 88%|████████▊ | 5853/6638 [1:47:27<43:12, 3.30s/it] {'loss': 0.6355, 'grad_norm': 0.5875998592863374, 'learning_rate': 7.247530493739252e-07, 'epoch': 0.88} 88%|████████▊ | 5853/6638 [1:47:27<43:12, 3.30s/it] 88%|████████▊ | 5854/6638 [1:47:30<43:15, 3.31s/it] {'loss': 0.621, 'grad_norm': 0.5616317875267481, 'learning_rate': 7.229302836716179e-07, 'epoch': 0.88} 88%|████████▊ | 5854/6638 [1:47:30<43:15, 3.31s/it] 88%|████████▊ | 5855/6638 [1:47:33<43:40, 3.35s/it] {'loss': 0.6321, 'grad_norm': 0.5550373396670925, 'learning_rate': 7.211097270349065e-07, 'epoch': 0.88} 88%|████████▊ | 5855/6638 [1:47:33<43:40, 3.35s/it] 88%|████████▊ | 5856/6638 [1:47:37<43:41, 3.35s/it] {'loss': 0.6079, 'grad_norm': 0.5109527053757466, 'learning_rate': 7.192913798973089e-07, 'epoch': 0.88} 88%|████████▊ | 5856/6638 [1:47:37<43:41, 3.35s/it] 88%|████████▊ | 5857/6638 [1:47:40<43:23, 3.33s/it] {'loss': 0.6419, 'grad_norm': 0.5769376267182442, 'learning_rate': 7.174752426918041e-07, 'epoch': 0.88} 88%|████████▊ | 5857/6638 [1:47:40<43:23, 3.33s/it] 88%|████████▊ | 5858/6638 [1:47:43<43:00, 3.31s/it] {'loss': 0.6266, 'grad_norm': 0.5601592187187671, 'learning_rate': 7.156613158508618e-07, 'epoch': 0.88} 88%|████████▊ | 5858/6638 [1:47:43<43:00, 3.31s/it] 88%|████████▊ | 5859/6638 [1:47:47<43:10, 3.33s/it] {'loss': 0.6214, 'grad_norm': 0.5659097592685369, 'learning_rate': 7.138495998064099e-07, 'epoch': 0.88} 88%|████████▊ | 
5859/6638 [1:47:47<43:10, 3.33s/it] 88%|████████▊ | 5860/6638 [1:47:50<42:55, 3.31s/it] {'loss': 0.6142, 'grad_norm': 0.5864835149607528, 'learning_rate': 7.120400949898576e-07, 'epoch': 0.88} 88%|████████▊ | 5860/6638 [1:47:50<42:55, 3.31s/it] 88%|████████▊ | 5861/6638 [1:47:53<43:27, 3.36s/it] {'loss': 0.651, 'grad_norm': 0.6540290832094439, 'learning_rate': 7.102328018320859e-07, 'epoch': 0.88} 88%|████████▊ | 5861/6638 [1:47:53<43:27, 3.36s/it] 88%|████████▊ | 5862/6638 [1:47:57<43:06, 3.33s/it] {'loss': 0.6273, 'grad_norm': 0.6099600538489549, 'learning_rate': 7.084277207634494e-07, 'epoch': 0.88} 88%|████████▊ | 5862/6638 [1:47:57<43:06, 3.33s/it] 88%|████████▊ | 5863/6638 [1:48:00<43:13, 3.35s/it] {'loss': 0.6301, 'grad_norm': 0.5738926185612215, 'learning_rate': 7.066248522137786e-07, 'epoch': 0.88} 88%|████████▊ | 5863/6638 [1:48:00<43:13, 3.35s/it] 88%|████████▊ | 5864/6638 [1:48:03<43:02, 3.34s/it] {'loss': 0.63, 'grad_norm': 0.5686552147684807, 'learning_rate': 7.048241966123703e-07, 'epoch': 0.88} 88%|████████▊ | 5864/6638 [1:48:03<43:02, 3.34s/it] 88%|████████▊ | 5865/6638 [1:48:07<42:50, 3.33s/it] {'loss': 0.6576, 'grad_norm': 0.6012694188773599, 'learning_rate': 7.030257543879992e-07, 'epoch': 0.88} 88%|████████▊ | 5865/6638 [1:48:07<42:50, 3.33s/it] 88%|████████▊ | 5866/6638 [1:48:10<42:40, 3.32s/it] {'loss': 0.6473, 'grad_norm': 0.6235723725207094, 'learning_rate': 7.012295259689161e-07, 'epoch': 0.88} 88%|████████▊ | 5866/6638 [1:48:10<42:40, 3.32s/it] 88%|████████▊ | 5867/6638 [1:48:13<42:44, 3.33s/it] {'loss': 0.6437, 'grad_norm': 0.5846875133798515, 'learning_rate': 6.994355117828366e-07, 'epoch': 0.88} 88%|████████▊ | 5867/6638 [1:48:13<42:44, 3.33s/it] 88%|████████▊ | 5868/6638 [1:48:17<42:39, 3.32s/it] {'loss': 0.6499, 'grad_norm': 0.6495847014325569, 'learning_rate': 6.976437122569557e-07, 'epoch': 0.88} 88%|████████▊ | 5868/6638 [1:48:17<42:39, 3.32s/it] 88%|████████▊ | 5869/6638 [1:48:20<42:19, 3.30s/it] {'loss': 0.6161, 'grad_norm': 
0.5813930730989171, 'learning_rate': 6.958541278179365e-07, 'epoch': 0.88} 88%|████████▊ | 5869/6638 [1:48:20<42:19, 3.30s/it] 88%|████████▊ | 5870/6638 [1:48:23<42:15, 3.30s/it] {'loss': 0.6041, 'grad_norm': 0.5573986673247344, 'learning_rate': 6.94066758891917e-07, 'epoch': 0.88} 88%|████████▊ | 5870/6638 [1:48:23<42:15, 3.30s/it] 88%|████████▊ | 5871/6638 [1:48:26<42:00, 3.29s/it] {'loss': 0.6031, 'grad_norm': 0.5431131271107926, 'learning_rate': 6.922816059045112e-07, 'epoch': 0.88} 88%|████████▊ | 5871/6638 [1:48:26<42:00, 3.29s/it] 88%|████████▊ | 5872/6638 [1:48:30<41:53, 3.28s/it] {'loss': 0.6296, 'grad_norm': 0.5911740893804087, 'learning_rate': 6.904986692807958e-07, 'epoch': 0.88} 88%|████████▊ | 5872/6638 [1:48:30<41:53, 3.28s/it] 88%|████████▊ | 5873/6638 [1:48:33<41:40, 3.27s/it] {'loss': 0.6044, 'grad_norm': 0.543523483440301, 'learning_rate': 6.887179494453289e-07, 'epoch': 0.88} 88%|████████▊ | 5873/6638 [1:48:33<41:40, 3.27s/it] 88%|████████▊ | 5874/6638 [1:48:36<41:20, 3.25s/it] {'loss': 0.6316, 'grad_norm': 0.5821302995718013, 'learning_rate': 6.86939446822138e-07, 'epoch': 0.88} 88%|████████▊ | 5874/6638 [1:48:36<41:20, 3.25s/it] 89%|████████▊ | 5875/6638 [1:48:39<41:40, 3.28s/it] {'loss': 0.6742, 'grad_norm': 0.6097791073719494, 'learning_rate': 6.851631618347198e-07, 'epoch': 0.89} 89%|████████▊ | 5875/6638 [1:48:39<41:40, 3.28s/it] 89%|████████▊ | 5876/6638 [1:48:43<41:39, 3.28s/it] {'loss': 0.6229, 'grad_norm': 0.5784579037547632, 'learning_rate': 6.833890949060462e-07, 'epoch': 0.89} 89%|████████▊ | 5876/6638 [1:48:43<41:39, 3.28s/it] 89%|████████▊ | 5877/6638 [1:48:46<41:45, 3.29s/it] {'loss': 0.5997, 'grad_norm': 0.4939915158709655, 'learning_rate': 6.816172464585614e-07, 'epoch': 0.89} 89%|████████▊ | 5877/6638 [1:48:46<41:45, 3.29s/it] 89%|████████▊ | 5878/6638 [1:48:49<41:19, 3.26s/it] {'loss': 0.6373, 'grad_norm': 0.5915235450384535, 'learning_rate': 6.798476169141766e-07, 'epoch': 0.89} 89%|████████▊ | 5878/6638 [1:48:49<41:19, 
3.26s/it] 89%|████████▊ | 5879/6638 [1:48:52<41:14, 3.26s/it] {'loss': 0.649, 'grad_norm': 0.5812045773456718, 'learning_rate': 6.780802066942816e-07, 'epoch': 0.89} 89%|████████▊ | 5879/6638 [1:48:52<41:14, 3.26s/it] 89%|████████▊ | 5880/6638 [1:48:56<41:27, 3.28s/it] {'loss': 0.6145, 'grad_norm': 0.5103605275279485, 'learning_rate': 6.763150162197274e-07, 'epoch': 0.89} 89%|████████▊ | 5880/6638 [1:48:56<41:27, 3.28s/it] 89%|████████▊ | 5881/6638 [1:48:59<41:26, 3.28s/it] {'loss': 0.631, 'grad_norm': 0.5734711704815715, 'learning_rate': 6.745520459108523e-07, 'epoch': 0.89} 89%|████████▊ | 5881/6638 [1:48:59<41:26, 3.28s/it] 89%|████████▊ | 5882/6638 [1:49:02<41:49, 3.32s/it] {'loss': 0.7134, 'grad_norm': 0.6593095477552278, 'learning_rate': 6.727912961874505e-07, 'epoch': 0.89} 89%|████████▊ | 5882/6638 [1:49:02<41:49, 3.32s/it] 89%|████████▊ | 5883/6638 [1:49:06<42:00, 3.34s/it] {'loss': 0.6466, 'grad_norm': 0.6693969697094416, 'learning_rate': 6.710327674687944e-07, 'epoch': 0.89} 89%|████████▊ | 5883/6638 [1:49:06<42:00, 3.34s/it] 89%|████████▊ | 5884/6638 [1:49:09<41:43, 3.32s/it] {'loss': 0.6088, 'grad_norm': 0.5705569484915374, 'learning_rate': 6.692764601736268e-07, 'epoch': 0.89} 89%|████████▊ | 5884/6638 [1:49:09<41:43, 3.32s/it] 89%|████████▊ | 5885/6638 [1:49:12<41:13, 3.28s/it] {'loss': 0.6374, 'grad_norm': 0.5949867236458639, 'learning_rate': 6.67522374720162e-07, 'epoch': 0.89} 89%|████████▊ | 5885/6638 [1:49:12<41:13, 3.28s/it] 89%|████████▊ | 5886/6638 [1:49:16<40:59, 3.27s/it] {'loss': 0.6367, 'grad_norm': 0.5777994723112566, 'learning_rate': 6.657705115260859e-07, 'epoch': 0.89} 89%|████████▊ | 5886/6638 [1:49:16<40:59, 3.27s/it] 89%|████████▊ | 5887/6638 [1:49:19<41:02, 3.28s/it] {'loss': 0.6212, 'grad_norm': 0.612352058872266, 'learning_rate': 6.640208710085517e-07, 'epoch': 0.89} 89%|████████▊ | 5887/6638 [1:49:19<41:02, 3.28s/it] 89%|████████▊ | 5888/6638 [1:49:22<40:36, 3.25s/it] {'loss': 0.6594, 'grad_norm': 0.633749576381575, 
'learning_rate': 6.622734535841868e-07, 'epoch': 0.89} 89%|████████▊ | 5888/6638 [1:49:22<40:36, 3.25s/it] 89%|████████▊ | 5889/6638 [1:49:25<41:04, 3.29s/it] {'loss': 0.6178, 'grad_norm': 0.5335186654277781, 'learning_rate': 6.605282596690888e-07, 'epoch': 0.89} 89%|████████▊ | 5889/6638 [1:49:25<41:04, 3.29s/it] 89%|████████▊ | 5890/6638 [1:49:29<41:00, 3.29s/it] {'loss': 0.629, 'grad_norm': 0.5985312561489637, 'learning_rate': 6.587852896788227e-07, 'epoch': 0.89} 89%|████████▊ | 5890/6638 [1:49:29<41:00, 3.29s/it] 89%|████████▊ | 5891/6638 [1:49:32<40:58, 3.29s/it] {'loss': 0.6628, 'grad_norm': 0.6339336539479519, 'learning_rate': 6.57044544028429e-07, 'epoch': 0.89} 89%|████████▊ | 5891/6638 [1:49:32<40:58, 3.29s/it] 89%|████████▉ | 5892/6638 [1:49:35<40:34, 3.26s/it] {'loss': 0.6357, 'grad_norm': 0.6503607347808754, 'learning_rate': 6.553060231324137e-07, 'epoch': 0.89} 89%|████████▉ | 5892/6638 [1:49:35<40:34, 3.26s/it] 89%|████████▉ | 5893/6638 [1:49:38<40:33, 3.27s/it] {'loss': 0.6469, 'grad_norm': 0.5758696837879141, 'learning_rate': 6.53569727404757e-07, 'epoch': 0.89} 89%|████████▉ | 5893/6638 [1:49:38<40:33, 3.27s/it] 89%|████████▉ | 5894/6638 [1:49:42<41:01, 3.31s/it] {'loss': 0.63, 'grad_norm': 0.5968849390246834, 'learning_rate': 6.518356572589079e-07, 'epoch': 0.89} 89%|████████▉ | 5894/6638 [1:49:42<41:01, 3.31s/it] 89%|████████▉ | 5895/6638 [1:49:45<40:36, 3.28s/it] {'loss': 0.6079, 'grad_norm': 0.5605355490666375, 'learning_rate': 6.50103813107782e-07, 'epoch': 0.89} 89%|████████▉ | 5895/6638 [1:49:45<40:36, 3.28s/it] 89%|████████▉ | 5896/6638 [1:49:48<40:29, 3.27s/it] {'loss': 0.6783, 'grad_norm': 0.6732103016665512, 'learning_rate': 6.48374195363769e-07, 'epoch': 0.89} 89%|████████▉ | 5896/6638 [1:49:48<40:29, 3.27s/it] 89%|████████▉ | 5897/6638 [1:49:52<40:23, 3.27s/it] {'loss': 0.6477, 'grad_norm': 0.5995675720875036, 'learning_rate': 6.466468044387308e-07, 'epoch': 0.89} 89%|████████▉ | 5897/6638 [1:49:52<40:23, 3.27s/it] 89%|████████▉ | 
5898/6638 [1:49:55<40:30, 3.28s/it] {'loss': 0.6401, 'grad_norm': 0.6030086068632776, 'learning_rate': 6.449216407439906e-07, 'epoch': 0.89} 89%|████████▉ | 5898/6638 [1:49:55<40:30, 3.28s/it] 89%|████████▉ | 5899/6638 [1:49:58<40:22, 3.28s/it] {'loss': 0.6293, 'grad_norm': 0.606530434075043, 'learning_rate': 6.431987046903487e-07, 'epoch': 0.89} 89%|████████▉ | 5899/6638 [1:49:58<40:22, 3.28s/it]6 AutoResumeHook: Checking whether to suspend... 4 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 0 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 89%|████████▉ | 5900/6638 [1:50:01<40:03, 3.26s/it]1 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... {'loss': 0.6134, 'grad_norm': 0.5498262446210423, 'learning_rate': 6.414779966880714e-07, 'epoch': 0.89} 89%|████████▉ | 5900/6638 [1:50:01<40:03, 3.26s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5900/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5900/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-5900/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
 89%|████████▉ | 5901/6638 [1:50:19<1:32:18, 7.52s/it] {'loss': 0.6546, 'grad_norm': 0.6011595858012405, 'learning_rate': 6.397595171468984e-07, 'epoch': 0.89}
[steps 5902-5999 elided; each step was logged twice by tqdm in the original. Over this range, loss fluctuates between ~0.58 and ~0.72 with no clear trend, grad_norm stays in ~0.51-0.86, and learning_rate decays monotonically from 6.38e-07 to 4.82e-07. Step time settles from 7.5 s/it back down to ~3.3 s/it. "AutoResumeHook: Checking whether to suspend..." is printed by each of the 8 ranks around steps 5950 and 6000.]
 90%|█████████ | 6000/6638 [1:55:47<35:25, 3.33s/it] {'loss': 0.6525, 'grad_norm': 0.599838151669707, 'learning_rate': 4.807269447087348e-07, 'epoch': 0.9}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6000/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6000/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6000/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
[the c10d::broadcast_ UserWarning above is printed again verbatim after the checkpoint save; the step immediately following the save runs at 7.57 s/it before the rate recovers]
 90%|█████████ | 6001/6638 [1:56:04<1:20:20, 7.57s/it] {'loss': 0.6126, 'grad_norm': 0.5207282047738883, 'learning_rate': 4.792332889335915e-07, 'epoch': 0.9}
[steps 6002-6067 elided; loss ~0.60-0.71, grad_norm ~0.50-0.69, learning_rate decays from 4.78e-07 to 3.86e-07, epoch crosses 0.9 -> 0.91 at step 6008. "AutoResumeHook: Checking whether to suspend..." is printed by each of the 8 ranks around step 6050.]
 91%|█████████▏| 6068/6638 [1:59:46<31:41, 3.34s/it] {'loss': 0.675, 'grad_norm': 0.602484158443118, 'learning_rate': 3.843401515413392e-07, 'epoch': 0.91}
 91%|█████████▏| 6069/6638 [1:59:49<31:36, 3.33s/it] {'loss': 0.6036, [entry truncated in source]
'grad_norm': 0.5682678905465649, 'learning_rate': 3.8300144008708516e-07, 'epoch': 0.91} 91%|█████████▏| 6069/6638 [1:59:49<31:36, 3.33s/it] 91%|█████████▏| 6070/6638 [1:59:52<31:33, 3.33s/it] {'loss': 0.6609, 'grad_norm': 0.6466964561201467, 'learning_rate': 3.8166501864264404e-07, 'epoch': 0.91} 91%|█████████▏| 6070/6638 [1:59:52<31:33, 3.33s/it] 91%|█████████▏| 6071/6638 [1:59:56<31:30, 3.33s/it] {'loss': 0.6104, 'grad_norm': 0.5319048282830694, 'learning_rate': 3.8033088752624347e-07, 'epoch': 0.91} 91%|█████████▏| 6071/6638 [1:59:56<31:30, 3.33s/it] 91%|█████████▏| 6072/6638 [1:59:59<31:47, 3.37s/it] {'loss': 0.5898, 'grad_norm': 0.5430535639798405, 'learning_rate': 3.7899904705556936e-07, 'epoch': 0.91} 91%|█████████▏| 6072/6638 [1:59:59<31:47, 3.37s/it] 91%|█████████▏| 6073/6638 [2:00:02<31:25, 3.34s/it] {'loss': 0.6984, 'grad_norm': 1.0733192683321482, 'learning_rate': 3.7766949754775905e-07, 'epoch': 0.91} 91%|█████████▏| 6073/6638 [2:00:02<31:25, 3.34s/it] 92%|█████████▏| 6074/6638 [2:00:06<31:03, 3.30s/it] {'loss': 0.629, 'grad_norm': 0.6531332867478565, 'learning_rate': 3.763422393194105e-07, 'epoch': 0.92} 92%|█████████▏| 6074/6638 [2:00:06<31:03, 3.30s/it] 92%|█████████▏| 6075/6638 [2:00:09<31:04, 3.31s/it] {'loss': 0.7159, 'grad_norm': 0.644144478745638, 'learning_rate': 3.7501727268656974e-07, 'epoch': 0.92} 92%|█████████▏| 6075/6638 [2:00:09<31:04, 3.31s/it] 92%|█████████▏| 6076/6638 [2:00:12<30:57, 3.30s/it] {'loss': 0.6398, 'grad_norm': 0.5808301247658475, 'learning_rate': 3.736945979647377e-07, 'epoch': 0.92} 92%|█████████▏| 6076/6638 [2:00:12<30:57, 3.30s/it] 92%|█████████▏| 6077/6638 [2:00:16<31:06, 3.33s/it] {'loss': 0.6121, 'grad_norm': 0.5444357695005286, 'learning_rate': 3.723742154688714e-07, 'epoch': 0.92} 92%|█████████▏| 6077/6638 [2:00:16<31:06, 3.33s/it] 92%|█████████▏| 6078/6638 [2:00:19<31:04, 3.33s/it] {'loss': 0.6754, 'grad_norm': 0.5500064891434332, 'learning_rate': 3.7105612551338377e-07, 'epoch': 0.92} 92%|█████████▏| 6078/6638 
[2:00:19<31:04, 3.33s/it] 92%|█████████▏| 6079/6638 [2:00:22<31:09, 3.34s/it] {'loss': 0.608, 'grad_norm': 0.5434933952363872, 'learning_rate': 3.6974032841213923e-07, 'epoch': 0.92} 92%|█████████▏| 6079/6638 [2:00:22<31:09, 3.34s/it] 92%|█████████▏| 6080/6638 [2:00:26<30:48, 3.31s/it] {'loss': 0.6411, 'grad_norm': 0.5935654448820096, 'learning_rate': 3.68426824478455e-07, 'epoch': 0.92} 92%|█████████▏| 6080/6638 [2:00:26<30:48, 3.31s/it] 92%|█████████▏| 6081/6638 [2:00:29<30:49, 3.32s/it] {'loss': 0.6512, 'grad_norm': 0.6000777166666407, 'learning_rate': 3.671156140251053e-07, 'epoch': 0.92} 92%|█████████▏| 6081/6638 [2:00:29<30:49, 3.32s/it] 92%|█████████▏| 6082/6638 [2:00:32<31:08, 3.36s/it] {'loss': 0.6994, 'grad_norm': 0.6336354631072932, 'learning_rate': 3.658066973643171e-07, 'epoch': 0.92} 92%|█████████▏| 6082/6638 [2:00:32<31:08, 3.36s/it] 92%|█████████▏| 6083/6638 [2:00:36<30:54, 3.34s/it] {'loss': 0.6103, 'grad_norm': 0.554080576414848, 'learning_rate': 3.645000748077709e-07, 'epoch': 0.92} 92%|█████████▏| 6083/6638 [2:00:36<30:54, 3.34s/it] 92%|█████████▏| 6084/6638 [2:00:39<31:17, 3.39s/it] {'loss': 0.7239, 'grad_norm': 0.6319935208658817, 'learning_rate': 3.631957466666003e-07, 'epoch': 0.92} 92%|█████████▏| 6084/6638 [2:00:39<31:17, 3.39s/it] 92%|█████████▏| 6085/6638 [2:00:43<30:50, 3.35s/it] {'loss': 0.6518, 'grad_norm': 0.6059627741751412, 'learning_rate': 3.6189371325139444e-07, 'epoch': 0.92} 92%|█████████▏| 6085/6638 [2:00:43<30:50, 3.35s/it] 92%|█████████▏| 6086/6638 [2:00:46<30:25, 3.31s/it] {'loss': 0.6399, 'grad_norm': 0.5835087653551146, 'learning_rate': 3.6059397487219315e-07, 'epoch': 0.92} 92%|█████████▏| 6086/6638 [2:00:46<30:25, 3.31s/it] 92%|█████████▏| 6087/6638 [2:00:49<30:25, 3.31s/it] {'loss': 0.6419, 'grad_norm': 0.5296553508147939, 'learning_rate': 3.5929653183849444e-07, 'epoch': 0.92} 92%|█████████▏| 6087/6638 [2:00:49<30:25, 3.31s/it] 92%|█████████▏| 6088/6638 [2:00:52<30:39, 3.35s/it] {'loss': 0.6381, 'grad_norm': 
0.5982141352189237, 'learning_rate': 3.5800138445924336e-07, 'epoch': 0.92} 92%|█████████▏| 6088/6638 [2:00:52<30:39, 3.35s/it] 92%|█████████▏| 6089/6638 [2:00:56<30:32, 3.34s/it] {'loss': 0.6613, 'grad_norm': 0.5822003812119316, 'learning_rate': 3.5670853304284324e-07, 'epoch': 0.92} 92%|█████████▏| 6089/6638 [2:00:56<30:32, 3.34s/it] 92%|█████████▏| 6090/6638 [2:00:59<30:15, 3.31s/it] {'loss': 0.6319, 'grad_norm': 0.5920003481816221, 'learning_rate': 3.5541797789715115e-07, 'epoch': 0.92} 92%|█████████▏| 6090/6638 [2:00:59<30:15, 3.31s/it] 92%|█████████▏| 6091/6638 [2:01:02<30:05, 3.30s/it] {'loss': 0.6566, 'grad_norm': 0.5694324705890619, 'learning_rate': 3.5412971932947236e-07, 'epoch': 0.92} 92%|█████████▏| 6091/6638 [2:01:02<30:05, 3.30s/it] 92%|█████████▏| 6092/6638 [2:01:06<30:05, 3.31s/it] {'loss': 0.6503, 'grad_norm': 0.5843984338648892, 'learning_rate': 3.528437576465693e-07, 'epoch': 0.92} 92%|█████████▏| 6092/6638 [2:01:06<30:05, 3.31s/it] 92%|█████████▏| 6093/6638 [2:01:09<30:26, 3.35s/it] {'loss': 0.6871, 'grad_norm': 0.5780169302727206, 'learning_rate': 3.5156009315465813e-07, 'epoch': 0.92} 92%|█████████▏| 6093/6638 [2:01:09<30:26, 3.35s/it] 92%|█████████▏| 6094/6638 [2:01:12<29:57, 3.31s/it] {'loss': 0.6044, 'grad_norm': 0.5798089917509044, 'learning_rate': 3.5027872615940426e-07, 'epoch': 0.92} 92%|█████████▏| 6094/6638 [2:01:12<29:57, 3.31s/it] 92%|█████████▏| 6095/6638 [2:01:16<29:45, 3.29s/it] {'loss': 0.6239, 'grad_norm': 0.5321109208936341, 'learning_rate': 3.489996569659293e-07, 'epoch': 0.92} 92%|█████████▏| 6095/6638 [2:01:16<29:45, 3.29s/it] 92%|█████████▏| 6096/6638 [2:01:19<30:06, 3.33s/it] {'loss': 0.607, 'grad_norm': 0.5625588982607101, 'learning_rate': 3.477228858788051e-07, 'epoch': 0.92} 92%|█████████▏| 6096/6638 [2:01:19<30:06, 3.33s/it] 92%|█████████▏| 6097/6638 [2:01:22<29:49, 3.31s/it] {'loss': 0.6394, 'grad_norm': 0.5651141973750426, 'learning_rate': 3.464484132020607e-07, 'epoch': 0.92} 92%|█████████▏| 6097/6638 
[2:01:22<29:49, 3.31s/it]
92%|█████████▏| 6098/6638 [2:01:25<29:22, 3.26s/it] {'loss': 0.5951, 'grad_norm': 0.5420875064677734, 'learning_rate': 3.451762392391733e-07, 'epoch': 0.92}
92%|█████████▏| 6099/6638 [2:01:29<29:36, 3.30s/it] {'loss': 0.6177, 'grad_norm': 0.5531625305048364, 'learning_rate': 3.439063642930729e-07, 'epoch': 0.92}
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
92%|█████████▏| 6100/6638 [2:01:32<29:21, 3.27s/it] {'loss': 0.6154, 'grad_norm': 0.5468225067715557, 'learning_rate': 3.426387886661442e-07, 'epoch': 0.92}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6100/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6100/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6100/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
92%|█████████▏| 6101/6638 [2:01:49<1:06:33, 7.44s/it] {'loss': 0.6492, 'grad_norm': 0.5720643182573477, 'learning_rate': 3.413735126602247e-07, 'epoch': 0.92}
92%|█████████▏| 6102/6638 [2:01:52<55:14, 6.18s/it] {'loss': 0.6545, 'grad_norm': 0.6164660874749502, 'learning_rate': 3.401105365766033e-07, 'epoch': 0.92}
92%|█████████▏| 6103/6638 [2:01:56<47:12, 5.29s/it] {'loss': 0.6245, 'grad_norm': 0.5390208266759375, 'learning_rate': 3.388498607160207e-07, 'epoch': 0.92}
92%|█████████▏| 6104/6638 [2:01:59<42:00, 4.72s/it] {'loss': 0.6862, 'grad_norm': 0.7035174042008812, 'learning_rate': 3.375914853786677e-07, 'epoch': 0.92}
92%|█████████▏| 6105/6638 [2:02:02<38:15, 4.31s/it] {'loss': 0.6568, 'grad_norm': 0.5342663903110745, 'learning_rate': 3.3633541086419477e-07, 'epoch': 0.92}
92%|█████████▏| 6105/6638 [2:02:02<38:15,
4.31s/it] 92%|█████████▏| 6106/6638 [2:02:06<35:22, 3.99s/it] {'loss': 0.6449, 'grad_norm': 0.6700185531497062, 'learning_rate': 3.35081637471697e-07, 'epoch': 0.92} 92%|█████████▏| 6106/6638 [2:02:06<35:22, 3.99s/it] 92%|█████████▏| 6107/6638 [2:02:09<33:35, 3.80s/it] {'loss': 0.6497, 'grad_norm': 0.5495774183582908, 'learning_rate': 3.338301654997245e-07, 'epoch': 0.92} 92%|█████████▏| 6107/6638 [2:02:09<33:35, 3.80s/it] 92%|█████████▏| 6108/6638 [2:02:12<31:46, 3.60s/it] {'loss': 0.6225, 'grad_norm': 0.5847409704492039, 'learning_rate': 3.3258099524627884e-07, 'epoch': 0.92} 92%|█████████▏| 6108/6638 [2:02:12<31:46, 3.60s/it] 92%|█████████▏| 6109/6638 [2:02:15<30:37, 3.47s/it] {'loss': 0.5807, 'grad_norm': 0.5864678919400549, 'learning_rate': 3.313341270088144e-07, 'epoch': 0.92} 92%|█████████▏| 6109/6638 [2:02:15<30:37, 3.47s/it] 92%|█████████▏| 6110/6638 [2:02:19<30:15, 3.44s/it] {'loss': 0.6209, 'grad_norm': 0.5109278627742498, 'learning_rate': 3.3008956108423696e-07, 'epoch': 0.92} 92%|█████████▏| 6110/6638 [2:02:19<30:15, 3.44s/it] 92%|█████████▏| 6111/6638 [2:02:22<29:54, 3.41s/it] {'loss': 0.6517, 'grad_norm': 0.5589500640661748, 'learning_rate': 3.2884729776890276e-07, 'epoch': 0.92} 92%|█████████▏| 6111/6638 [2:02:22<29:54, 3.41s/it] 92%|█████████▏| 6112/6638 [2:02:25<29:29, 3.36s/it] {'loss': 0.6465, 'grad_norm': 0.5686708736657715, 'learning_rate': 3.276073373586208e-07, 'epoch': 0.92} 92%|█████████▏| 6112/6638 [2:02:25<29:29, 3.36s/it] 92%|█████████▏| 6113/6638 [2:02:28<29:05, 3.32s/it] {'loss': 0.6196, 'grad_norm': 0.5737012961217427, 'learning_rate': 3.263696801486538e-07, 'epoch': 0.92} 92%|█████████▏| 6113/6638 [2:02:28<29:05, 3.32s/it] 92%|█████████▏| 6114/6638 [2:02:32<28:54, 3.31s/it] {'loss': 0.6464, 'grad_norm': 0.565825221053484, 'learning_rate': 3.2513432643371144e-07, 'epoch': 0.92} 92%|█████████▏| 6114/6638 [2:02:32<28:54, 3.31s/it] 92%|█████████▏| 6115/6638 [2:02:35<28:56, 3.32s/it] {'loss': 0.6276, 'grad_norm': 0.614059308643849, 
'learning_rate': 3.2390127650795857e-07, 'epoch': 0.92} 92%|█████████▏| 6115/6638 [2:02:35<28:56, 3.32s/it] 92%|█████████▏| 6116/6638 [2:02:38<28:44, 3.30s/it] {'loss': 0.6749, 'grad_norm': 0.6493902608225195, 'learning_rate': 3.226705306650113e-07, 'epoch': 0.92} 92%|█████████▏| 6116/6638 [2:02:38<28:44, 3.30s/it] 92%|█████████▏| 6117/6638 [2:02:42<28:31, 3.28s/it] {'loss': 0.6169, 'grad_norm': 0.6539848883671762, 'learning_rate': 3.214420891979353e-07, 'epoch': 0.92} 92%|█████████▏| 6117/6638 [2:02:42<28:31, 3.28s/it] 92%|█████████▏| 6118/6638 [2:02:46<32:33, 3.76s/it] {'loss': 0.6548, 'grad_norm': 0.5573706905195398, 'learning_rate': 3.2021595239924874e-07, 'epoch': 0.92} 92%|█████████▏| 6118/6638 [2:02:46<32:33, 3.76s/it] 92%|█████████▏| 6119/6638 [2:02:50<31:17, 3.62s/it] {'loss': 0.5986, 'grad_norm': 0.5602955007757981, 'learning_rate': 3.1899212056091923e-07, 'epoch': 0.92} 92%|█████████▏| 6119/6638 [2:02:50<31:17, 3.62s/it] 92%|█████████▏| 6120/6638 [2:02:53<30:37, 3.55s/it] {'loss': 0.6535, 'grad_norm': 0.6292937636549887, 'learning_rate': 3.1777059397436693e-07, 'epoch': 0.92} 92%|█████████▏| 6120/6638 [2:02:53<30:37, 3.55s/it] 92%|█████████▏| 6121/6638 [2:02:56<29:52, 3.47s/it] {'loss': 0.6385, 'grad_norm': 1.1601231150658262, 'learning_rate': 3.165513729304648e-07, 'epoch': 0.92} 92%|█████████▏| 6121/6638 [2:02:56<29:52, 3.47s/it] 92%|█████████▏| 6122/6638 [2:03:00<29:16, 3.40s/it] {'loss': 0.6468, 'grad_norm': 0.5864255017191508, 'learning_rate': 3.15334457719535e-07, 'epoch': 0.92} 92%|█████████▏| 6122/6638 [2:03:00<29:16, 3.40s/it] 92%|█████████▏| 6123/6638 [2:03:03<28:39, 3.34s/it] {'loss': 0.6064, 'grad_norm': 0.5821932419542304, 'learning_rate': 3.141198486313479e-07, 'epoch': 0.92} 92%|█████████▏| 6123/6638 [2:03:03<28:39, 3.34s/it] 92%|█████████▏| 6124/6638 [2:03:06<28:46, 3.36s/it] {'loss': 0.606, 'grad_norm': 0.5543705589589368, 'learning_rate': 3.129075459551301e-07, 'epoch': 0.92} 92%|█████████▏| 6124/6638 [2:03:06<28:46, 3.36s/it] 
92%|█████████▏| 6125/6638 [2:03:09<28:27, 3.33s/it] {'loss': 0.6069, 'grad_norm': 0.5758351905106047, 'learning_rate': 3.1169754997955715e-07, 'epoch': 0.92} 92%|█████████▏| 6125/6638 [2:03:09<28:27, 3.33s/it] 92%|█████████▏| 6126/6638 [2:03:13<28:18, 3.32s/it] {'loss': 0.6111, 'grad_norm': 0.553888491986526, 'learning_rate': 3.1048986099275204e-07, 'epoch': 0.92} 92%|█████████▏| 6126/6638 [2:03:13<28:18, 3.32s/it] 92%|█████████▏| 6127/6638 [2:03:16<28:13, 3.31s/it] {'loss': 0.6397, 'grad_norm': 0.6205101338208776, 'learning_rate': 3.092844792822902e-07, 'epoch': 0.92} 92%|█████████▏| 6127/6638 [2:03:16<28:13, 3.31s/it] 92%|█████████▏| 6128/6638 [2:03:19<28:06, 3.31s/it] {'loss': 0.6477, 'grad_norm': 0.6350950778614778, 'learning_rate': 3.0808140513520213e-07, 'epoch': 0.92} 92%|█████████▏| 6128/6638 [2:03:19<28:06, 3.31s/it] 92%|█████████▏| 6129/6638 [2:03:23<27:40, 3.26s/it] {'loss': 0.6088, 'grad_norm': 0.5665091634127224, 'learning_rate': 3.06880638837963e-07, 'epoch': 0.92} 92%|█████████▏| 6129/6638 [2:03:23<27:40, 3.26s/it] 92%|█████████▏| 6130/6638 [2:03:26<27:32, 3.25s/it] {'loss': 0.6429, 'grad_norm': 0.5673514943356314, 'learning_rate': 3.056821806765009e-07, 'epoch': 0.92} 92%|█████████▏| 6130/6638 [2:03:26<27:32, 3.25s/it] 92%|█████████▏| 6131/6638 [2:03:29<27:18, 3.23s/it] {'loss': 0.6279, 'grad_norm': 0.5834066327838611, 'learning_rate': 3.04486030936193e-07, 'epoch': 0.92} 92%|█████████▏| 6131/6638 [2:03:29<27:18, 3.23s/it] 92%|█████████▏| 6132/6638 [2:03:32<27:09, 3.22s/it] {'loss': 0.6256, 'grad_norm': 0.5719010084236786, 'learning_rate': 3.032921899018681e-07, 'epoch': 0.92} 92%|█████████▏| 6132/6638 [2:03:32<27:09, 3.22s/it] 92%|█████████▏| 6133/6638 [2:03:37<30:28, 3.62s/it] {'loss': 0.7154, 'grad_norm': 0.6653292063111752, 'learning_rate': 3.021006578578067e-07, 'epoch': 0.92} 92%|█████████▏| 6133/6638 [2:03:37<30:28, 3.62s/it] 92%|█████████▏| 6134/6638 [2:03:40<29:52, 3.56s/it] {'loss': 0.6499, 'grad_norm': 0.5436034691077714, 'learning_rate': 
3.009114350877351e-07, 'epoch': 0.92} 92%|█████████▏| 6134/6638 [2:03:40<29:52, 3.56s/it] 92%|█████████▏| 6135/6638 [2:03:43<29:08, 3.48s/it] {'loss': 0.6438, 'grad_norm': 0.5438315021495866, 'learning_rate': 2.997245218748335e-07, 'epoch': 0.92} 92%|█████████▏| 6135/6638 [2:03:43<29:08, 3.48s/it] 92%|█████████▏| 6136/6638 [2:03:48<32:32, 3.89s/it] {'loss': 0.6425, 'grad_norm': 0.5673974849383622, 'learning_rate': 2.985399185017324e-07, 'epoch': 0.92} 92%|█████████▏| 6136/6638 [2:03:48<32:32, 3.89s/it] 92%|█████████▏| 6137/6638 [2:03:52<30:58, 3.71s/it] {'loss': 0.662, 'grad_norm': 0.6202425771267865, 'learning_rate': 2.9735762525050727e-07, 'epoch': 0.92} 92%|█████████▏| 6137/6638 [2:03:52<30:58, 3.71s/it] 92%|█████████▏| 6138/6638 [2:03:56<33:36, 4.03s/it] {'loss': 0.6283, 'grad_norm': 0.5378753234191951, 'learning_rate': 2.9617764240269076e-07, 'epoch': 0.92} 92%|█████████▏| 6138/6638 [2:03:56<33:36, 4.03s/it] 92%|█████████▏| 6139/6638 [2:04:00<31:51, 3.83s/it] {'loss': 0.6199, 'grad_norm': 0.5133061694321637, 'learning_rate': 2.94999970239257e-07, 'epoch': 0.92} 92%|█████████▏| 6139/6638 [2:04:00<31:51, 3.83s/it] 92%|█████████▏| 6140/6638 [2:04:03<30:19, 3.65s/it] {'loss': 0.5814, 'grad_norm': 0.5173228731831929, 'learning_rate': 2.938246090406405e-07, 'epoch': 0.92} 92%|█████████▏| 6140/6638 [2:04:03<30:19, 3.65s/it] 93%|█████████▎| 6141/6638 [2:04:06<29:41, 3.58s/it] {'loss': 0.6294, 'grad_norm': 0.5506252534021951, 'learning_rate': 2.9265155908671516e-07, 'epoch': 0.93} 93%|█████████▎| 6141/6638 [2:04:06<29:41, 3.58s/it] 93%|█████████▎| 6142/6638 [2:04:10<29:08, 3.53s/it] {'loss': 0.6067, 'grad_norm': 0.5900373778756737, 'learning_rate': 2.914808206568098e-07, 'epoch': 0.93} 93%|█████████▎| 6142/6638 [2:04:10<29:08, 3.53s/it] 93%|█████████▎| 6143/6638 [2:04:13<28:37, 3.47s/it] {'loss': 0.6748, 'grad_norm': 0.6240051740976595, 'learning_rate': 2.903123940297015e-07, 'epoch': 0.93} 93%|█████████▎| 6143/6638 [2:04:13<28:37, 3.47s/it] 93%|█████████▎| 6144/6638 
[2:04:16<27:55, 3.39s/it] {'loss': 0.6338, 'grad_norm': 0.6018277650990203, 'learning_rate': 2.891462794836186e-07, 'epoch': 0.93}
93%|█████████▎| 6145/6638 [2:04:20<27:48, 3.39s/it] {'loss': 0.6273, 'grad_norm': 0.5059973467106739, 'learning_rate': 2.879824772962381e-07, 'epoch': 0.93}
93%|█████████▎| 6146/6638 [2:04:25<31:23, 3.83s/it] {'loss': 0.6217, 'grad_norm': 0.5727646567811949, 'learning_rate': 2.8682098774468257e-07, 'epoch': 0.93}
93%|█████████▎| 6147/6638 [2:04:28<29:56, 3.66s/it] {'loss': 0.6615, 'grad_norm': 0.6499457517893343, 'learning_rate': 2.856618111055298e-07, 'epoch': 0.93}
93%|█████████▎| 6148/6638 [2:04:31<29:12, 3.58s/it] {'loss': 0.6836, 'grad_norm': 0.6171622560869778, 'learning_rate': 2.845049476548045e-07, 'epoch': 0.93}
93%|█████████▎| 6149/6638 [2:04:37<34:17, 4.21s/it] {'loss': 0.6395, 'grad_norm': 0.6523637297904507, 'learning_rate': 2.8335039766797745e-07, 'epoch': 0.93}
0 AutoResumeHook: Checking whether to suspend...
1 AutoResumeHook: Checking whether to suspend...
2 AutoResumeHook: Checking whether to suspend...
3 AutoResumeHook: Checking whether to suspend...
4 AutoResumeHook: Checking whether to suspend...
5 AutoResumeHook: Checking whether to suspend...
6 AutoResumeHook: Checking whether to suspend...
7 AutoResumeHook: Checking whether to suspend...
93%|█████████▎| 6150/6638 [2:04:40<32:02, 3.94s/it]
{'loss': 0.6402, 'grad_norm': 0.6324514989492938, 'learning_rate': 2.8219816141997315e-07, 'epoch': 0.93} 93%|█████████▎| 6150/6638 [2:04:40<32:02, 3.94s/it] 93%|█████████▎| 6151/6638 [2:04:43<30:25, 3.75s/it] {'loss': 0.7274, 'grad_norm': 0.6636143135909965, 'learning_rate': 2.810482391851643e-07, 'epoch': 0.93} 93%|█████████▎| 6151/6638 [2:04:43<30:25, 3.75s/it] 93%|█████████▎| 6152/6638 [2:04:47<29:01, 3.58s/it] {'loss': 0.6291, 'grad_norm': 0.6240362266865643, 'learning_rate': 2.799006312373698e-07, 'epoch': 0.93} 93%|█████████▎| 6152/6638 [2:04:47<29:01, 3.58s/it] 93%|█████████▎| 6153/6638 [2:04:50<28:22, 3.51s/it] {'loss': 0.6578, 'grad_norm': 0.6274455365640224, 'learning_rate': 2.787553378498609e-07, 'epoch': 0.93} 93%|█████████▎| 6153/6638 [2:04:50<28:22, 3.51s/it] 93%|█████████▎| 6154/6638 [2:04:53<27:40, 3.43s/it] {'loss': 0.6594, 'grad_norm': 0.5842999171130657, 'learning_rate': 2.776123592953539e-07, 'epoch': 0.93} 93%|█████████▎| 6154/6638 [2:04:53<27:40, 3.43s/it] 93%|█████████▎| 6155/6638 [2:04:57<27:13, 3.38s/it] {'loss': 0.6028, 'grad_norm': 0.5219125352475782, 'learning_rate': 2.7647169584601677e-07, 'epoch': 0.93} 93%|█████████▎| 6155/6638 [2:04:57<27:13, 3.38s/it] 93%|█████████▎| 6156/6638 [2:05:01<30:46, 3.83s/it] {'loss': 0.6332, 'grad_norm': 0.6668240010339948, 'learning_rate': 2.7533334777346764e-07, 'epoch': 0.93} 93%|█████████▎| 6156/6638 [2:05:01<30:46, 3.83s/it] 93%|█████████▎| 6157/6638 [2:05:05<29:29, 3.68s/it] {'loss': 0.5801, 'grad_norm': 0.5324302648548015, 'learning_rate': 2.741973153487687e-07, 'epoch': 0.93} 93%|█████████▎| 6157/6638 [2:05:05<29:29, 3.68s/it] 93%|█████████▎| 6158/6638 [2:05:08<28:44, 3.59s/it] {'loss': 0.677, 'grad_norm': 0.5736605880763282, 'learning_rate': 2.730635988424335e-07, 'epoch': 0.93} 93%|█████████▎| 6158/6638 [2:05:08<28:44, 3.59s/it] 93%|█████████▎| 6159/6638 [2:05:11<27:53, 3.49s/it] {'loss': 0.6193, 'grad_norm': 0.6226203126910289, 'learning_rate': 2.71932198524425e-07, 'epoch': 0.93} 
93%|█████████▎| 6159/6638 [2:05:11<27:53, 3.49s/it] 93%|█████████▎| 6160/6638 [2:05:15<27:09, 3.41s/it] {'loss': 0.6117, 'grad_norm': 0.607735416222736, 'learning_rate': 2.708031146641521e-07, 'epoch': 0.93} 93%|█████████▎| 6160/6638 [2:05:15<27:09, 3.41s/it] 93%|█████████▎| 6161/6638 [2:05:18<26:37, 3.35s/it] {'loss': 0.6298, 'grad_norm': 0.6518897335214827, 'learning_rate': 2.6967634753047424e-07, 'epoch': 0.93} 93%|█████████▎| 6161/6638 [2:05:18<26:37, 3.35s/it] 93%|█████████▎| 6162/6638 [2:05:21<26:32, 3.34s/it] {'loss': 0.6406, 'grad_norm': 0.5877501971553757, 'learning_rate': 2.6855189739169673e-07, 'epoch': 0.93} 93%|█████████▎| 6162/6638 [2:05:21<26:32, 3.34s/it] 93%|█████████▎| 6163/6638 [2:05:24<26:25, 3.34s/it] {'loss': 0.6931, 'grad_norm': 0.5769954079941123, 'learning_rate': 2.674297645155788e-07, 'epoch': 0.93} 93%|█████████▎| 6163/6638 [2:05:24<26:25, 3.34s/it] 93%|█████████▎| 6164/6638 [2:05:28<26:24, 3.34s/it] {'loss': 0.661, 'grad_norm': 0.5509488511531662, 'learning_rate': 2.6630994916932107e-07, 'epoch': 0.93} 93%|█████████▎| 6164/6638 [2:05:28<26:24, 3.34s/it] 93%|█████████▎| 6165/6638 [2:05:31<26:19, 3.34s/it] {'loss': 0.6202, 'grad_norm': 0.5632626410681506, 'learning_rate': 2.6519245161957364e-07, 'epoch': 0.93} 93%|█████████▎| 6165/6638 [2:05:31<26:19, 3.34s/it] 93%|█████████▎| 6166/6638 [2:05:34<26:11, 3.33s/it] {'loss': 0.6309, 'grad_norm': 0.5375565333599538, 'learning_rate': 2.640772721324392e-07, 'epoch': 0.93} 93%|█████████▎| 6166/6638 [2:05:34<26:11, 3.33s/it] 93%|█████████▎| 6167/6638 [2:05:38<26:15, 3.35s/it] {'loss': 0.6669, 'grad_norm': 0.6136104914011774, 'learning_rate': 2.62964410973463e-07, 'epoch': 0.93} 93%|█████████▎| 6167/6638 [2:05:38<26:15, 3.35s/it] 93%|█████████▎| 6168/6638 [2:05:41<26:04, 3.33s/it] {'loss': 0.6373, 'grad_norm': 0.5935544331490508, 'learning_rate': 2.618538684076444e-07, 'epoch': 0.93} 93%|█████████▎| 6168/6638 [2:05:41<26:04, 3.33s/it] 93%|█████████▎| 6169/6638 [2:05:44<25:39, 3.28s/it] {'loss': 
0.6509, 'grad_norm': 0.6071668285948337, 'learning_rate': 2.6074564469942277e-07, 'epoch': 0.93} 93%|█████████▎| 6169/6638 [2:05:44<25:39, 3.28s/it] 93%|█████████▎| 6170/6638 [2:05:48<25:40, 3.29s/it] {'loss': 0.6586, 'grad_norm': 0.6273699506739634, 'learning_rate': 2.5963974011269156e-07, 'epoch': 0.93} 93%|█████████▎| 6170/6638 [2:05:48<25:40, 3.29s/it] 93%|█████████▎| 6171/6638 [2:05:51<25:36, 3.29s/it] {'loss': 0.6453, 'grad_norm': 0.6061975187248205, 'learning_rate': 2.585361549107901e-07, 'epoch': 0.93} 93%|█████████▎| 6171/6638 [2:05:51<25:36, 3.29s/it] 93%|█████████▎| 6172/6638 [2:05:54<25:29, 3.28s/it] {'loss': 0.62, 'grad_norm': 0.5287565088547715, 'learning_rate': 2.5743488935650483e-07, 'epoch': 0.93} 93%|█████████▎| 6172/6638 [2:05:54<25:29, 3.28s/it] 93%|█████████▎| 6173/6638 [2:05:59<28:32, 3.68s/it] {'loss': 0.6054, 'grad_norm': 0.7904630926574795, 'learning_rate': 2.563359437120694e-07, 'epoch': 0.93} 93%|█████████▎| 6173/6638 [2:05:59<28:32, 3.68s/it] 93%|█████████▎| 6174/6638 [2:06:02<27:41, 3.58s/it] {'loss': 0.68, 'grad_norm': 0.6036999584609193, 'learning_rate': 2.552393182391677e-07, 'epoch': 0.93} 93%|█████████▎| 6174/6638 [2:06:02<27:41, 3.58s/it] 93%|█████████▎| 6175/6638 [2:06:06<27:34, 3.57s/it] {'loss': 0.6458, 'grad_norm': 0.6072600924612748, 'learning_rate': 2.541450131989287e-07, 'epoch': 0.93} 93%|█████████▎| 6175/6638 [2:06:06<27:34, 3.57s/it] 93%|█████████▎| 6176/6638 [2:06:09<26:43, 3.47s/it] {'loss': 0.6434, 'grad_norm': 0.5865963754341607, 'learning_rate': 2.530530288519284e-07, 'epoch': 0.93} 93%|█████████▎| 6176/6638 [2:06:09<26:43, 3.47s/it] 93%|█████████▎| 6177/6638 [2:06:12<26:20, 3.43s/it] {'loss': 0.6246, 'grad_norm': 0.5320496700660496, 'learning_rate': 2.519633654581921e-07, 'epoch': 0.93} 93%|█████████▎| 6177/6638 [2:06:12<26:20, 3.43s/it] 93%|█████████▎| 6178/6638 [2:06:15<25:51, 3.37s/it] {'loss': 0.6262, 'grad_norm': 0.6002584146012878, 'learning_rate': 2.5087602327719117e-07, 'epoch': 0.93} 93%|█████████▎| 
6178/6638 [2:06:15<25:51, 3.37s/it] 93%|█████████▎| 6179/6638 [2:06:19<25:48, 3.37s/it] {'loss': 0.6137, 'grad_norm': 0.52213048240119, 'learning_rate': 2.497910025678463e-07, 'epoch': 0.93} 93%|█████████▎| 6179/6638 [2:06:19<25:48, 3.37s/it] 93%|█████████▎| 6180/6638 [2:06:22<25:28, 3.34s/it] {'loss': 0.6372, 'grad_norm': 0.6358189125768254, 'learning_rate': 2.487083035885218e-07, 'epoch': 0.93} 93%|█████████▎| 6180/6638 [2:06:22<25:28, 3.34s/it] 93%|█████████▎| 6181/6638 [2:06:26<26:08, 3.43s/it] {'loss': 0.6604, 'grad_norm': 0.579128097892997, 'learning_rate': 2.476279265970316e-07, 'epoch': 0.93} 93%|█████████▎| 6181/6638 [2:06:26<26:08, 3.43s/it] 93%|█████████▎| 6182/6638 [2:06:29<25:37, 3.37s/it] {'loss': 0.6061, 'grad_norm': 0.5690704482078023, 'learning_rate': 2.465498718506354e-07, 'epoch': 0.93} 93%|█████████▎| 6182/6638 [2:06:29<25:37, 3.37s/it] 93%|█████████▎| 6183/6638 [2:06:32<25:40, 3.38s/it] {'loss': 0.6497, 'grad_norm': 0.6055284642460009, 'learning_rate': 2.4547413960604336e-07, 'epoch': 0.93} 93%|█████████▎| 6183/6638 [2:06:32<25:40, 3.38s/it] 93%|█████████▎| 6184/6638 [2:06:36<25:28, 3.37s/it] {'loss': 0.6246, 'grad_norm': 0.5507189174474786, 'learning_rate': 2.4440073011940845e-07, 'epoch': 0.93} 93%|█████████▎| 6184/6638 [2:06:36<25:28, 3.37s/it] 93%|█████████▎| 6185/6638 [2:06:39<25:06, 3.32s/it] {'loss': 0.6096, 'grad_norm': 0.5998079512243744, 'learning_rate': 2.433296436463306e-07, 'epoch': 0.93} 93%|█████████▎| 6185/6638 [2:06:39<25:06, 3.32s/it] 93%|█████████▎| 6186/6638 [2:06:42<25:03, 3.33s/it] {'loss': 0.6779, 'grad_norm': 0.6226790406506423, 'learning_rate': 2.4226088044186026e-07, 'epoch': 0.93} 93%|█████████▎| 6186/6638 [2:06:42<25:03, 3.33s/it] 93%|█████████▎| 6187/6638 [2:06:46<24:55, 3.32s/it] {'loss': 0.6778, 'grad_norm': 0.6665110004912472, 'learning_rate': 2.411944407604927e-07, 'epoch': 0.93} 93%|█████████▎| 6187/6638 [2:06:46<24:55, 3.32s/it] 93%|█████████▎| 6188/6638 [2:06:49<24:42, 3.29s/it] {'loss': 0.6292, 'grad_norm': 
93%|█████████▎| 6188-6199/6638 [2:06:49-2:07:26, ~3.3s/it] loss ~0.57-0.67, grad_norm ~0.47-1.92, learning_rate 2.40e-07 -> 2.29e-07, epoch 0.93
AutoResumeHook (ranks 0-7): Checking whether to suspend...
93%|█████████▎| 6200/6638 [2:07:29<24:27, 3.35s/it] {'loss': 0.624, 'grad_norm': 0.5841726533571824, 'learning_rate': 2.2754228029294833e-07, 'epoch': 0.93}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6200/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6200/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6200/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
93%|█████████▎| 6201-6205/6638 [2:07:47-2:08:00, step time recovering after checkpoint: 7.64s/it -> 4.41s/it] loss ~0.61-0.75, learning_rate 2.27e-07 -> 2.22e-07, epoch 0.93
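The first UserWarning above comes from calling `nn.Module.state_dict` with positional arguments, which PyTorch is deprecating in favor of keyword arguments. A minimal sketch of the keyword form, assuming the same torch API as in this environment (the `nn.Linear` stand-in and the `"llm."` prefix are illustrative, not taken from this run):

```python
import torch.nn as nn

# Stand-in for one of the saved submodules (llm / vision_tower / mm_projector).
module = nn.Linear(4, 2)

# Deprecated positional form, which triggers the warning:
#     module.state_dict({}, "llm.", False)
# Supported form: pass destination/prefix/keep_vars as keywords.
sd = module.state_dict(prefix="llm.", keep_vars=False)

# Every key in the returned dict carries the requested prefix.
prefixed = sorted(sd)
```

For an `nn.Linear`, `prefixed` is `["llm.bias", "llm.weight"]`; the warning here is emitted inside the checkpoint-saving path, not by user code.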
93%|█████████▎| 6206/6638 [2:08:04<29:26, 4.09s/it] {'loss': 0.6221, 'grad_norm': 0.604302647801351, 'learning_rate': 2.2137389063334846e-07, 'epoch': 0.93}
94%|█████████▎| 6207-6272/6638 [2:08:07-2:11:43, ~3.3s/it] loss ~0.59-0.68, grad_norm ~0.50-1.60, learning_rate 2.20e-07 -> 1.59e-07, epoch 0.94
AutoResumeHook (ranks 0-7) at step 6250: Checking whether to suspend...
95%|█████████▍| 6273-6297/6638 [2:11:46-2:13:06, ~3.3s/it] loss ~0.59-0.68, learning_rate 1.58e-07 -> 1.38e-07, epoch 0.95
95%|█████████▍| 6298-6299/6638 [2:13:10-2:13:13, ~3.3s/it] loss ~0.60-0.62, learning_rate 1.37e-07 -> 1.37e-07, epoch 0.95
AutoResumeHook (ranks 0-7): Checking whether to suspend...
95%|█████████▍| 6300/6638 [2:13:16<18:35, 3.30s/it] {'loss': 0.65, 'grad_norm': 0.5784215214697694, 'learning_rate': 1.357114342304755e-07, 'epoch': 0.95}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6300/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6300/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6300/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
95%|█████████▍| 6301-6305/6638 [2:13:33-2:13:46, step time recovering after checkpoint: 7.32s/it -> 4.31s/it] loss ~0.59-0.65, learning_rate 1.35e-07 -> 1.32e-07, epoch 0.95
95%|█████████▌| 6306-6339/6638 [2:13:49-2:15:39, ~3.3s/it] loss ~0.60-0.71, grad_norm ~0.51-0.67, learning_rate 1.31e-07 -> 1.06e-07, epoch 0.95
96%|█████████▌| 6340-6343/6638 [2:15:43-2:15:53, ~3.3s/it] loss ~0.61-0.67, learning_rate 1.06e-07 -> 1.03e-07, epoch 0.96
96%|█████████▌| 6344/6638 [2:15:56<16:14, 3.31s/it] {'loss': 0.6077, 'grad_norm': 0.5693482014650948, 'learning_rate': 1.0273476746008649e-07, 'epoch': 0.96} 96%|█████████▌| 6344/6638 [2:15:56<16:14, 3.31s/it] 96%|█████████▌| 6345/6638 [2:15:59<16:11, 3.31s/it] {'loss': 0.6847, 'grad_norm': 0.6796777836711217, 'learning_rate': 1.0203826900591962e-07, 'epoch': 0.96} 96%|█████████▌| 6345/6638 [2:15:59<16:11, 3.31s/it] 96%|█████████▌| 6346/6638 [2:16:03<16:18, 3.35s/it] {'loss': 0.6222, 'grad_norm': 0.512609581463159, 'learning_rate': 1.0134412746481082e-07, 'epoch': 0.96} 96%|█████████▌| 6346/6638 [2:16:03<16:18, 3.35s/it] 96%|█████████▌| 6347/6638 [2:16:06<16:13, 3.34s/it] {'loss': 0.6121, 'grad_norm': 0.5522451415887526, 'learning_rate': 1.0065234300204895e-07, 'epoch': 0.96} 96%|█████████▌| 6347/6638 [2:16:06<16:13, 3.34s/it] 96%|█████████▌| 6348/6638 [2:16:09<16:12, 3.35s/it] {'loss': 0.6098, 'grad_norm': 0.5174842134706665, 'learning_rate': 9.996291578236228e-08, 'epoch': 0.96} 96%|█████████▌| 6348/6638 [2:16:09<16:12, 3.35s/it] 96%|█████████▌| 6349/6638 [2:16:13<16:05, 3.34s/it] {'loss': 0.6423, 'grad_norm': 0.5607147637944799, 'learning_rate': 9.927584596991835e-08, 'epoch': 0.96} 96%|█████████▌| 6349/6638 [2:16:13<16:05, 3.34s/it]0 AutoResumeHook: Checking whether to suspend... 96%|█████████▌| 6350/6638 [2:16:16<15:52, 3.31s/it]4 AutoResumeHook: Checking whether to suspend... 6 AutoResumeHook: Checking whether to suspend... 5 AutoResumeHook: Checking whether to suspend... 2 AutoResumeHook: Checking whether to suspend... 1 AutoResumeHook: Checking whether to suspend... 7 AutoResumeHook: Checking whether to suspend... 3 AutoResumeHook: Checking whether to suspend... 
96%|█████████▌| 6350/6638 → 6399/6638 [2:16:16 → 2:18:59, ~3.3s/it] loss 0.59–0.71, grad_norm 0.51–0.84, learning_rate 9.86e-08 → 6.79e-08, epoch 0.96
AutoResumeHook (ranks 0–7): Checking whether to suspend...
96%|█████████▋| 6400/6638 [2:19:02<13:16, 3.35s/it] {'loss': 0.6791, 'grad_norm': 0.6186587638256952, 'learning_rate': 6.736487567163719e-08, 'epoch': 0.96}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6400/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6400/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6400/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
96%|█████████▋| 6401/6638 → 6405/6638 [2:19:19 → 2:19:32, 7.29s/it → 4.27s/it after the checkpoint save] loss 0.61–0.63, learning_rate 6.68e-08 → 6.46e-08, epoch 0.96
97%|█████████▋| 6406/6638 → 6449/6638 [2:19:35 → 2:21:57, 3.98s/it → ~3.3s/it] loss 0.59–0.68, grad_norm 0.49–0.84, learning_rate 6.40e-08 → 4.25e-08, epoch 0.97
AutoResumeHook (ranks 0–7): Checking whether to suspend...
98%|█████████▊| 6450/6638 → 6497/6638 [2:22:00 → 2:24:35, ~3.3s/it] loss 0.59–0.70, grad_norm 0.49–0.93, learning_rate 4.21e-08 → 2.37e-08, epoch 0.97–0.98
[2:24:35<07:38, 3.25s/it] 98%|█████████▊| 6498/6638 [2:24:38<07:31, 3.22s/it] {'loss': 0.6227, 'grad_norm': 0.6056864315859855, 'learning_rate': 2.3326788902404695e-08, 'epoch': 0.98}
98%|█████████▊| 6499/6638 [2:24:41<07:28, 3.23s/it] {'loss': 0.6347, 'grad_norm': 0.5921484344370785, 'learning_rate': 2.299486651952898e-08, 'epoch': 0.98}
AutoResumeHook: Checking whether to suspend...
98%|█████████▊| 6500/6638 [2:24:45<07:24, 3.22s/it] {'loss': 0.6084, 'grad_norm': 0.5201394439580721, 'learning_rate': 2.2665319871607584e-08, 'epoch': 0.98}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6500/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6500/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6500/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
98%|█████████▊| 6501/6638 [2:25:03<17:42, 7.75s/it] {'loss': 0.6172, 'grad_norm': 0.5301183064372011, 'learning_rate': 2.2338149037113287e-08, 'epoch': 0.98}
98%|█████████▊| 6502/6638 [2:25:06<14:32, 6.42s/it] {'loss': 0.6639, 'grad_norm': 0.6489515707577186, 'learning_rate': 2.2013354093953775e-08, 'epoch': 0.98}
98%|█████████▊| 6503/6638 [2:25:10<12:20, 5.48s/it] {'loss': 0.6114, 'grad_norm': 0.5745900005785098, 'learning_rate': 2.1690935119468294e-08, 'epoch': 0.98}
98%|█████████▊| 6504/6638 [2:25:13<10:58, 4.91s/it] {'loss': 0.6616, 'grad_norm': 0.5968251453135843, 'learning_rate': 2.137089219043098e-08, 'epoch': 0.98}
98%|█████████▊| 6505/6638 [2:25:17<09:50, 4.44s/it] {'loss': 0.6422, 'grad_norm': 0.5775629148372096, 'learning_rate': 2.1053225383051988e-08, 'epoch': 0.98} 98%|█████████▊| 6505/6638 [2:25:17<09:50,
4.44s/it] 98%|█████████▊| 6506/6638 [2:25:20<09:00, 4.10s/it] {'loss': 0.6043, 'grad_norm': 0.561293650937059, 'learning_rate': 2.0737934772975254e-08, 'epoch': 0.98} 98%|█████████▊| 6506/6638 [2:25:20<09:00, 4.10s/it] 98%|█████████▊| 6507/6638 [2:25:23<08:35, 3.93s/it] {'loss': 0.6928, 'grad_norm': 0.6242190639987647, 'learning_rate': 2.0425020435276278e-08, 'epoch': 0.98} 98%|█████████▊| 6507/6638 [2:25:23<08:35, 3.93s/it] 98%|█████████▊| 6508/6638 [2:25:27<08:03, 3.72s/it] {'loss': 0.6227, 'grad_norm': 0.5722151409679282, 'learning_rate': 2.0114482444466565e-08, 'epoch': 0.98} 98%|█████████▊| 6508/6638 [2:25:27<08:03, 3.72s/it] 98%|█████████▊| 6509/6638 [2:25:30<07:42, 3.59s/it] {'loss': 0.6503, 'grad_norm': 0.5978865014920516, 'learning_rate': 1.9806320874492523e-08, 'epoch': 0.98} 98%|█████████▊| 6509/6638 [2:25:30<07:42, 3.59s/it] 98%|█████████▊| 6510/6638 [2:25:34<08:17, 3.88s/it] {'loss': 0.6654, 'grad_norm': 0.5705812598065839, 'learning_rate': 1.9500535798734342e-08, 'epoch': 0.98} 98%|█████████▊| 6510/6638 [2:25:34<08:17, 3.88s/it] 98%|█████████▊| 6511/6638 [2:25:38<07:50, 3.70s/it] {'loss': 0.6258, 'grad_norm': 0.5788095942497655, 'learning_rate': 1.9197127290006003e-08, 'epoch': 0.98} 98%|█████████▊| 6511/6638 [2:25:38<07:50, 3.70s/it] 98%|█████████▊| 6512/6638 [2:25:41<07:31, 3.58s/it] {'loss': 0.6113, 'grad_norm': 0.5958252579373877, 'learning_rate': 1.889609542055415e-08, 'epoch': 0.98} 98%|█████████▊| 6512/6638 [2:25:41<07:31, 3.58s/it] 98%|█████████▊| 6513/6638 [2:25:44<07:14, 3.48s/it] {'loss': 0.6762, 'grad_norm': 0.6703892793317782, 'learning_rate': 1.859744026206145e-08, 'epoch': 0.98} 98%|█████████▊| 6513/6638 [2:25:44<07:14, 3.48s/it] 98%|█████████▊| 6514/6638 [2:25:48<07:04, 3.42s/it] {'loss': 0.6541, 'grad_norm': 0.6077291195921807, 'learning_rate': 1.8301161885644347e-08, 'epoch': 0.98} 98%|█████████▊| 6514/6638 [2:25:48<07:04, 3.42s/it] 98%|█████████▊| 6515/6638 [2:25:51<06:52, 3.35s/it] {'loss': 0.6475, 'grad_norm': 0.5832911425705974, 
'learning_rate': 1.8007260361851963e-08, 'epoch': 0.98} 98%|█████████▊| 6515/6638 [2:25:51<06:52, 3.35s/it] 98%|█████████▊| 6516/6638 [2:25:54<06:45, 3.33s/it] {'loss': 0.6448, 'grad_norm': 0.7037907838341546, 'learning_rate': 1.7715735760669428e-08, 'epoch': 0.98} 98%|█████████▊| 6516/6638 [2:25:54<06:45, 3.33s/it] 98%|█████████▊| 6517/6638 [2:25:57<06:43, 3.33s/it] {'loss': 0.6973, 'grad_norm': 0.6131734150662725, 'learning_rate': 1.7426588151514546e-08, 'epoch': 0.98} 98%|█████████▊| 6517/6638 [2:25:57<06:43, 3.33s/it] 98%|█████████▊| 6518/6638 [2:26:01<06:36, 3.31s/it] {'loss': 0.6482, 'grad_norm': 0.9911575529363237, 'learning_rate': 1.7139817603240015e-08, 'epoch': 0.98} 98%|█████████▊| 6518/6638 [2:26:01<06:36, 3.31s/it] 98%|█████████▊| 6519/6638 [2:26:04<06:32, 3.30s/it] {'loss': 0.631, 'grad_norm': 0.60101523938589, 'learning_rate': 1.685542418413122e-08, 'epoch': 0.98} 98%|█████████▊| 6519/6638 [2:26:04<06:32, 3.30s/it] 98%|█████████▊| 6520/6638 [2:26:07<06:34, 3.35s/it] {'loss': 0.6608, 'grad_norm': 0.6076611616014528, 'learning_rate': 1.6573407961907317e-08, 'epoch': 0.98} 98%|█████████▊| 6520/6638 [2:26:07<06:34, 3.35s/it] 98%|█████████▊| 6521/6638 [2:26:11<06:29, 3.33s/it] {'loss': 0.6498, 'grad_norm': 0.5909191154925721, 'learning_rate': 1.6293769003724592e-08, 'epoch': 0.98} 98%|█████████▊| 6521/6638 [2:26:11<06:29, 3.33s/it] 98%|█████████▊| 6522/6638 [2:26:14<06:26, 3.33s/it] {'loss': 0.6372, 'grad_norm': 0.5389957903725826, 'learning_rate': 1.6016507376169776e-08, 'epoch': 0.98} 98%|█████████▊| 6522/6638 [2:26:14<06:26, 3.33s/it] 98%|█████████▊| 6523/6638 [2:26:17<06:21, 3.31s/it] {'loss': 0.6336, 'grad_norm': 0.6523406639139536, 'learning_rate': 1.5741623145263396e-08, 'epoch': 0.98} 98%|█████████▊| 6523/6638 [2:26:17<06:21, 3.31s/it] 98%|█████████▊| 6524/6638 [2:26:21<06:20, 3.34s/it] {'loss': 0.6504, 'grad_norm': 0.5872097916177307, 'learning_rate': 1.54691163764642e-08, 'epoch': 0.98} 98%|█████████▊| 6524/6638 [2:26:21<06:20, 3.34s/it] 
98%|█████████▊| 6525/6638 [2:26:24<06:18, 3.35s/it] {'loss': 0.6513, 'grad_norm': 0.6296955649862527, 'learning_rate': 1.519898713465806e-08, 'epoch': 0.98} 98%|█████████▊| 6525/6638 [2:26:24<06:18, 3.35s/it] 98%|█████████▊| 6526/6638 [2:26:27<06:11, 3.31s/it] {'loss': 0.6026, 'grad_norm': 0.5901492121195361, 'learning_rate': 1.4931235484172412e-08, 'epoch': 0.98} 98%|█████████▊| 6526/6638 [2:26:27<06:11, 3.31s/it] 98%|█████████▊| 6527/6638 [2:26:31<06:09, 3.33s/it] {'loss': 0.6562, 'grad_norm': 0.5904374804989442, 'learning_rate': 1.4665861488761813e-08, 'epoch': 0.98} 98%|█████████▊| 6527/6638 [2:26:31<06:09, 3.33s/it] 98%|█████████▊| 6528/6638 [2:26:34<06:15, 3.41s/it] {'loss': 0.5836, 'grad_norm': 0.4867819306519119, 'learning_rate': 1.4402865211617934e-08, 'epoch': 0.98} 98%|█████████▊| 6528/6638 [2:26:34<06:15, 3.41s/it] 98%|█████████▊| 6529/6638 [2:26:37<06:03, 3.34s/it] {'loss': 0.6483, 'grad_norm': 0.6221413704946709, 'learning_rate': 1.4142246715366238e-08, 'epoch': 0.98} 98%|█████████▊| 6529/6638 [2:26:37<06:03, 3.34s/it] 98%|█████████▊| 6530/6638 [2:26:41<06:11, 3.44s/it] {'loss': 0.6563, 'grad_norm': 0.5790329583061306, 'learning_rate': 1.388400606206486e-08, 'epoch': 0.98} 98%|█████████▊| 6530/6638 [2:26:41<06:11, 3.44s/it] 98%|█████████▊| 6531/6638 [2:26:44<06:04, 3.41s/it] {'loss': 0.6154, 'grad_norm': 0.603455379789257, 'learning_rate': 1.3628143313206833e-08, 'epoch': 0.98} 98%|█████████▊| 6531/6638 [2:26:44<06:04, 3.41s/it] 98%|█████████▊| 6532/6638 [2:26:48<05:57, 3.37s/it] {'loss': 0.607, 'grad_norm': 0.5697138551256044, 'learning_rate': 1.3374658529717866e-08, 'epoch': 0.98} 98%|█████████▊| 6532/6638 [2:26:48<05:57, 3.37s/it] 98%|█████████▊| 6533/6638 [2:26:51<05:50, 3.34s/it] {'loss': 0.6658, 'grad_norm': 0.6457304457255153, 'learning_rate': 1.3123551771958564e-08, 'epoch': 0.98} 98%|█████████▊| 6533/6638 [2:26:51<05:50, 3.34s/it] 98%|█████████▊| 6534/6638 [2:26:54<05:49, 3.36s/it] {'loss': 0.6328, 'grad_norm': 0.5397352276596348, 
'learning_rate': 1.287482309972332e-08, 'epoch': 0.98} 98%|█████████▊| 6534/6638 [2:26:54<05:49, 3.36s/it] 98%|█████████▊| 6535/6638 [2:26:58<05:45, 3.36s/it] {'loss': 0.6351, 'grad_norm': 0.5526633899087758, 'learning_rate': 1.2628472572239204e-08, 'epoch': 0.98} 98%|█████████▊| 6535/6638 [2:26:58<05:45, 3.36s/it] 98%|█████████▊| 6536/6638 [2:27:01<05:35, 3.29s/it] {'loss': 0.6294, 'grad_norm': 0.6463577284822516, 'learning_rate': 1.2384500248165954e-08, 'epoch': 0.98} 98%|█████████▊| 6536/6638 [2:27:01<05:35, 3.29s/it] 98%|█████████▊| 6537/6638 [2:27:04<05:33, 3.30s/it] {'loss': 0.6132, 'grad_norm': 0.5587006930321442, 'learning_rate': 1.2142906185600433e-08, 'epoch': 0.98} 98%|█████████▊| 6537/6638 [2:27:04<05:33, 3.30s/it] 98%|█████████▊| 6538/6638 [2:27:07<05:29, 3.30s/it] {'loss': 0.6876, 'grad_norm': 0.6297575509902006, 'learning_rate': 1.190369044206996e-08, 'epoch': 0.98} 98%|█████████▊| 6538/6638 [2:27:07<05:29, 3.30s/it] 99%|█████████▊| 6539/6638 [2:27:11<05:27, 3.30s/it] {'loss': 0.5916, 'grad_norm': 0.5719479991012716, 'learning_rate': 1.1666853074538965e-08, 'epoch': 0.99} 99%|█████████▊| 6539/6638 [2:27:11<05:27, 3.30s/it] 99%|█████████▊| 6540/6638 [2:27:14<05:29, 3.36s/it] {'loss': 0.6855, 'grad_norm': 0.605606927512474, 'learning_rate': 1.1432394139401226e-08, 'epoch': 0.99} 99%|█████████▊| 6540/6638 [2:27:14<05:29, 3.36s/it] 99%|█████████▊| 6541/6638 [2:27:18<05:26, 3.37s/it] {'loss': 0.6426, 'grad_norm': 0.552275475735969, 'learning_rate': 1.120031369248653e-08, 'epoch': 0.99} 99%|█████████▊| 6541/6638 [2:27:18<05:26, 3.37s/it] 99%|█████████▊| 6542/6638 [2:27:21<05:18, 3.32s/it] {'loss': 0.6947, 'grad_norm': 0.7684999634930956, 'learning_rate': 1.097061178905956e-08, 'epoch': 0.99} 99%|█████████▊| 6542/6638 [2:27:21<05:18, 3.32s/it] 99%|█████████▊| 6543/6638 [2:27:24<05:13, 3.30s/it] {'loss': 0.6302, 'grad_norm': 0.5462768376923832, 'learning_rate': 1.074328848381545e-08, 'epoch': 0.99} 99%|█████████▊| 6543/6638 [2:27:24<05:13, 3.30s/it] 
99%|█████████▊| 6544/6638 [2:27:27<05:08, 3.28s/it] {'loss': 0.6029, 'grad_norm': 0.5205320436091547, 'learning_rate': 1.0518343830885346e-08, 'epoch': 0.99}
99%|█████████▊| 6545/6638 [2:27:32<05:34, 3.60s/it] {'loss': 0.6721, 'grad_norm': 0.6412038284024641, 'learning_rate': 1.029577788383418e-08, 'epoch': 0.99}
99%|█████████▊| 6546/6638 [2:27:35<05:20, 3.48s/it] {'loss': 0.6286, 'grad_norm': 0.5422733461951271, 'learning_rate': 1.0075590695658444e-08, 'epoch': 0.99}
99%|█████████▊| 6547/6638 [2:27:38<05:10, 3.42s/it] {'loss': 0.6429, 'grad_norm': 0.6072011979454488, 'learning_rate': 9.857782318790643e-09, 'epoch': 0.99}
99%|█████████▊| 6548/6638 [2:27:41<05:02, 3.36s/it] {'loss': 0.6236, 'grad_norm': 0.6056668673974686, 'learning_rate': 9.642352805093735e-09, 'epoch': 0.99}
99%|█████████▊| 6549/6638 [2:27:45<04:58, 3.35s/it] {'loss': 0.6242, 'grad_norm': 0.5423213706667919, 'learning_rate': 9.429302205866686e-09, 'epoch': 0.99}
AutoResumeHook: Checking whether to suspend...
99%|█████████▊| 6550/6638 [2:27:48<04:52, 3.32s/it]
{'loss': 0.5902, 'grad_norm': 0.5304907964063323, 'learning_rate': 9.218630571842247e-09, 'epoch': 0.99} 99%|█████████▊| 6550/6638 [2:27:48<04:52, 3.32s/it] 99%|█████████▊| 6551/6638 [2:27:51<04:46, 3.30s/it] {'loss': 0.6437, 'grad_norm': 0.5957659246667145, 'learning_rate': 9.010337953185843e-09, 'epoch': 0.99} 99%|█████████▊| 6551/6638 [2:27:51<04:46, 3.30s/it] 99%|█████████▊| 6552/6638 [2:27:54<04:42, 3.29s/it] {'loss': 0.6545, 'grad_norm': 0.6021025199765869, 'learning_rate': 8.80442439949447e-09, 'epoch': 0.99} 99%|█████████▊| 6552/6638 [2:27:54<04:42, 3.29s/it] 99%|█████████▊| 6553/6638 [2:27:58<04:36, 3.25s/it] {'loss': 0.6348, 'grad_norm': 0.6171969007675581, 'learning_rate': 8.600889959801128e-09, 'epoch': 0.99} 99%|█████████▊| 6553/6638 [2:27:58<04:36, 3.25s/it] 99%|█████████▊| 6554/6638 [2:28:01<04:33, 3.26s/it] {'loss': 0.6148, 'grad_norm': 0.5309237470610597, 'learning_rate': 8.399734682573712e-09, 'epoch': 0.99} 99%|█████████▊| 6554/6638 [2:28:01<04:33, 3.26s/it] 99%|█████████▊| 6555/6638 [2:28:04<04:33, 3.29s/it] {'loss': 0.681, 'grad_norm': 0.6496622439693244, 'learning_rate': 8.200958615708354e-09, 'epoch': 0.99} 99%|█████████▊| 6555/6638 [2:28:04<04:33, 3.29s/it] 99%|█████████▉| 6556/6638 [2:28:08<04:29, 3.29s/it] {'loss': 0.6552, 'grad_norm': 0.6480206047724423, 'learning_rate': 8.004561806540523e-09, 'epoch': 0.99} 99%|█████████▉| 6556/6638 [2:28:08<04:29, 3.29s/it] 99%|█████████▉| 6557/6638 [2:28:11<04:26, 3.29s/it] {'loss': 0.6203, 'grad_norm': 0.7354690151515584, 'learning_rate': 7.810544301835032e-09, 'epoch': 0.99} 99%|█████████▉| 6557/6638 [2:28:11<04:26, 3.29s/it] 99%|█████████▉| 6558/6638 [2:28:14<04:24, 3.30s/it] {'loss': 0.6481, 'grad_norm': 0.5753691080846072, 'learning_rate': 7.618906147791594e-09, 'epoch': 0.99} 99%|█████████▉| 6558/6638 [2:28:14<04:24, 3.30s/it] 99%|█████████▉| 6559/6638 [2:28:17<04:20, 3.29s/it] {'loss': 0.6377, 'grad_norm': 0.5777739687645824, 'learning_rate': 7.429647390043704e-09, 'epoch': 0.99} 99%|█████████▉| 
6559/6638 [2:28:17<04:20, 3.29s/it] 99%|█████████▉| 6560/6638 [2:28:21<04:20, 3.34s/it] {'loss': 0.6576, 'grad_norm': 0.6010202873071521, 'learning_rate': 7.2427680736575355e-09, 'epoch': 0.99} 99%|█████████▉| 6560/6638 [2:28:21<04:20, 3.34s/it] 99%|█████████▉| 6561/6638 [2:28:24<04:18, 3.35s/it] {'loss': 0.619, 'grad_norm': 0.5737385405355222, 'learning_rate': 7.058268243133048e-09, 'epoch': 0.99} 99%|█████████▉| 6561/6638 [2:28:24<04:18, 3.35s/it] 99%|█████████▉| 6562/6638 [2:28:28<04:15, 3.36s/it] {'loss': 0.6219, 'grad_norm': 0.53977981027929, 'learning_rate': 6.876147942403988e-09, 'epoch': 0.99} 99%|█████████▉| 6562/6638 [2:28:28<04:15, 3.36s/it] 99%|█████████▉| 6563/6638 [2:28:31<04:09, 3.33s/it] {'loss': 0.6167, 'grad_norm': 0.5861988357652791, 'learning_rate': 6.696407214835665e-09, 'epoch': 0.99} 99%|█████████▉| 6563/6638 [2:28:31<04:09, 3.33s/it] 99%|█████████▉| 6564/6638 [2:28:34<04:07, 3.34s/it] {'loss': 0.6238, 'grad_norm': 0.6016148856420481, 'learning_rate': 6.5190461032305085e-09, 'epoch': 0.99} 99%|█████████▉| 6564/6638 [2:28:34<04:07, 3.34s/it] 99%|█████████▉| 6565/6638 [2:28:38<04:03, 3.33s/it] {'loss': 0.6392, 'grad_norm': 0.594275385148315, 'learning_rate': 6.344064649819182e-09, 'epoch': 0.99} 99%|█████████▉| 6565/6638 [2:28:38<04:03, 3.33s/it] 99%|█████████▉| 6566/6638 [2:28:41<03:59, 3.33s/it] {'loss': 0.6271, 'grad_norm': 0.6139121257545322, 'learning_rate': 6.171462896270575e-09, 'epoch': 0.99} 99%|█████████▉| 6566/6638 [2:28:41<03:59, 3.33s/it] 99%|█████████▉| 6567/6638 [2:28:44<03:53, 3.29s/it] {'loss': 0.6402, 'grad_norm': 0.6511170602773988, 'learning_rate': 6.001240883684034e-09, 'epoch': 0.99} 99%|█████████▉| 6567/6638 [2:28:44<03:53, 3.29s/it] 99%|█████████▉| 6568/6638 [2:28:47<03:49, 3.28s/it] {'loss': 0.6225, 'grad_norm': 0.5615508428458121, 'learning_rate': 5.833398652593802e-09, 'epoch': 0.99} 99%|█████████▉| 6568/6638 [2:28:47<03:49, 3.28s/it] 99%|█████████▉| 6569/6638 [2:28:51<03:48, 3.30s/it] {'loss': 0.6234, 'grad_norm': 
0.5391436449511654, 'learning_rate': 5.667936242964578e-09, 'epoch': 0.99} 99%|█████████▉| 6569/6638 [2:28:51<03:48, 3.30s/it] 99%|█████████▉| 6570/6638 [2:28:54<03:49, 3.37s/it] {'loss': 0.6897, 'grad_norm': 0.6263875133550579, 'learning_rate': 5.504853694198176e-09, 'epoch': 0.99} 99%|█████████▉| 6570/6638 [2:28:54<03:49, 3.37s/it] 99%|█████████▉| 6571/6638 [2:28:58<03:44, 3.35s/it] {'loss': 0.6342, 'grad_norm': 0.5577853494395312, 'learning_rate': 5.344151045126866e-09, 'epoch': 0.99} 99%|█████████▉| 6571/6638 [2:28:58<03:44, 3.35s/it] 99%|█████████▉| 6572/6638 [2:29:01<03:40, 3.35s/it] {'loss': 0.5944, 'grad_norm': 0.5193307681062426, 'learning_rate': 5.185828334018928e-09, 'epoch': 0.99} 99%|█████████▉| 6572/6638 [2:29:01<03:40, 3.35s/it] 99%|█████████▉| 6573/6638 [2:29:04<03:36, 3.34s/it] {'loss': 0.6691, 'grad_norm': 0.5720692450694399, 'learning_rate': 5.029885598573092e-09, 'epoch': 0.99} 99%|█████████▉| 6573/6638 [2:29:04<03:36, 3.34s/it] 99%|█████████▉| 6574/6638 [2:29:07<03:31, 3.30s/it] {'loss': 0.6301, 'grad_norm': 0.5761892577334394, 'learning_rate': 4.87632287592299e-09, 'epoch': 0.99} 99%|█████████▉| 6574/6638 [2:29:07<03:31, 3.30s/it] 99%|█████████▉| 6575/6638 [2:29:11<03:26, 3.28s/it] {'loss': 0.5774, 'grad_norm': 0.5262942438949221, 'learning_rate': 4.725140202634926e-09, 'epoch': 0.99} 99%|█████████▉| 6575/6638 [2:29:11<03:26, 3.28s/it] 99%|█████████▉| 6576/6638 [2:29:14<03:24, 3.30s/it] {'loss': 0.6561, 'grad_norm': 0.5813258173055315, 'learning_rate': 4.576337614708992e-09, 'epoch': 0.99} 99%|█████████▉| 6576/6638 [2:29:14<03:24, 3.30s/it] 99%|█████████▉| 6577/6638 [2:29:18<03:25, 3.36s/it] {'loss': 0.6616, 'grad_norm': 0.5719557994209247, 'learning_rate': 4.4299151475779565e-09, 'epoch': 0.99} 99%|█████████▉| 6577/6638 [2:29:18<03:25, 3.36s/it] 99%|█████████▉| 6578/6638 [2:29:21<03:20, 3.34s/it] {'loss': 0.6614, 'grad_norm': 0.5618428756413156, 'learning_rate': 4.285872836108374e-09, 'epoch': 0.99} 99%|█████████▉| 6578/6638 [2:29:21<03:20, 
3.34s/it] 99%|█████████▉| 6579/6638 [2:29:24<03:17, 3.35s/it] {'loss': 0.6573, 'grad_norm': 0.6247870370053787, 'learning_rate': 4.144210714599472e-09, 'epoch': 0.99} 99%|█████████▉| 6579/6638 [2:29:24<03:17, 3.35s/it] 99%|█████████▉| 6580/6638 [2:29:28<03:14, 3.35s/it] {'loss': 0.6529, 'grad_norm': 0.5684999801076744, 'learning_rate': 4.00492881678427e-09, 'epoch': 0.99} 99%|█████████▉| 6580/6638 [2:29:28<03:14, 3.35s/it] 99%|█████████▉| 6581/6638 [2:29:31<03:09, 3.33s/it] {'loss': 0.67, 'grad_norm': 0.6010355814990287, 'learning_rate': 3.868027175827349e-09, 'epoch': 0.99} 99%|█████████▉| 6581/6638 [2:29:31<03:09, 3.33s/it] 99%|█████████▉| 6582/6638 [2:29:34<03:05, 3.32s/it] {'loss': 0.6564, 'grad_norm': 0.5986436201176855, 'learning_rate': 3.733505824330408e-09, 'epoch': 0.99} 99%|█████████▉| 6582/6638 [2:29:34<03:05, 3.32s/it] 99%|█████████▉| 6583/6638 [2:29:37<03:02, 3.31s/it] {'loss': 0.6199, 'grad_norm': 0.5468806552460332, 'learning_rate': 3.6013647943233808e-09, 'epoch': 0.99} 99%|█████████▉| 6583/6638 [2:29:37<03:02, 3.31s/it] 99%|█████████▉| 6584/6638 [2:29:41<02:57, 3.28s/it] {'loss': 0.6017, 'grad_norm': 0.5729351000274381, 'learning_rate': 3.4716041172733195e-09, 'epoch': 0.99} 99%|█████████▉| 6584/6638 [2:29:41<02:57, 3.28s/it] 99%|█████████▉| 6585/6638 [2:29:44<02:54, 3.29s/it] {'loss': 0.6415, 'grad_norm': 0.6293339959514882, 'learning_rate': 3.3442238240788403e-09, 'epoch': 0.99} 99%|█████████▉| 6585/6638 [2:29:44<02:54, 3.29s/it] 99%|█████████▉| 6586/6638 [2:29:47<02:52, 3.32s/it] {'loss': 0.6269, 'grad_norm': 0.5877429465978267, 'learning_rate': 3.219223945071237e-09, 'epoch': 0.99} 99%|█████████▉| 6586/6638 [2:29:47<02:52, 3.32s/it] 99%|█████████▉| 6587/6638 [2:29:51<02:48, 3.30s/it] {'loss': 0.5915, 'grad_norm': 0.5286501861826971, 'learning_rate': 3.0966045100155885e-09, 'epoch': 0.99} 99%|█████████▉| 6587/6638 [2:29:51<02:48, 3.30s/it] 99%|█████████▉| 6588/6638 [2:29:54<02:46, 3.34s/it] {'loss': 0.6295, 'grad_norm': 0.6138589858590751, 
'learning_rate': 2.9763655481118705e-09, 'epoch': 0.99} 99%|█████████▉| 6588/6638 [2:29:54<02:46, 3.34s/it] 99%|█████████▉| 6589/6638 [2:29:57<02:42, 3.31s/it] {'loss': 0.5973, 'grad_norm': 0.5460164492004935, 'learning_rate': 2.858507087988294e-09, 'epoch': 0.99} 99%|█████████▉| 6589/6638 [2:29:57<02:42, 3.31s/it] 99%|█████████▉| 6590/6638 [2:30:01<02:41, 3.35s/it] {'loss': 0.6378, 'grad_norm': 0.5993105491100291, 'learning_rate': 2.743029157712407e-09, 'epoch': 0.99} 99%|█████████▉| 6590/6638 [2:30:01<02:41, 3.35s/it] 99%|█████████▉| 6591/6638 [2:30:04<02:35, 3.32s/it] {'loss': 0.6178, 'grad_norm': 0.6279289063284441, 'learning_rate': 2.6299317847811034e-09, 'epoch': 0.99} 99%|█████████▉| 6591/6638 [2:30:04<02:35, 3.32s/it] 99%|█████████▉| 6592/6638 [2:30:07<02:31, 3.30s/it] {'loss': 0.6456, 'grad_norm': 0.5837017651193578, 'learning_rate': 2.5192149961250633e-09, 'epoch': 0.99} 99%|█████████▉| 6592/6638 [2:30:07<02:31, 3.30s/it] 99%|█████████▉| 6593/6638 [2:30:10<02:27, 3.28s/it] {'loss': 0.6299, 'grad_norm': 0.5919578146584236, 'learning_rate': 2.4108788181076428e-09, 'epoch': 0.99} 99%|█████████▉| 6593/6638 [2:30:10<02:27, 3.28s/it] 99%|█████████▉| 6594/6638 [2:30:14<02:25, 3.30s/it] {'loss': 0.6828, 'grad_norm': 0.6245517786835428, 'learning_rate': 2.3049232765259834e-09, 'epoch': 0.99} 99%|█████████▉| 6594/6638 [2:30:14<02:25, 3.30s/it] 99%|█████████▉| 6595/6638 [2:30:17<02:21, 3.29s/it] {'loss': 0.6337, 'grad_norm': 0.618459792467591, 'learning_rate': 2.201348396612124e-09, 'epoch': 0.99} 99%|█████████▉| 6595/6638 [2:30:17<02:21, 3.29s/it] 99%|█████████▉| 6596/6638 [2:30:20<02:17, 3.28s/it] {'loss': 0.6428, 'grad_norm': 0.5924956103817831, 'learning_rate': 2.100154203027449e-09, 'epoch': 0.99} 99%|█████████▉| 6596/6638 [2:30:20<02:17, 3.28s/it] 99%|█████████▉| 6597/6638 [2:30:24<02:14, 3.28s/it] {'loss': 0.665, 'grad_norm': 0.6832058526344117, 'learning_rate': 2.0013407198693492e-09, 'epoch': 0.99} 99%|█████████▉| 6597/6638 [2:30:24<02:14, 3.28s/it] 
99%|█████████▉| 6598/6638 [2:30:27<02:11, 3.28s/it] {'loss': 0.6159, 'grad_norm': 0.6026995731113939, 'learning_rate': 1.9049079706667804e-09, 'epoch': 0.99}
99%|█████████▉| 6599/6638 [2:30:30<02:08, 3.29s/it] {'loss': 0.6533, 'grad_norm': 0.6567819612143354, 'learning_rate': 1.8108559783824863e-09, 'epoch': 0.99}
AutoResumeHook: Checking whether to suspend...
99%|█████████▉| 6600/6638 [2:30:33<02:04, 3.28s/it] {'loss': 0.6281, 'grad_norm': 0.6539974380245679, 'learning_rate': 1.719184765414106e-09, 'epoch': 0.99}
saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6600/llm
saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6600/vision_tower
saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6600/mm_projector
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
99%|█████████▉| 6601/6638 [2:30:51<04:34, 7.42s/it] {'loss': 0.6692, 'grad_norm': 0.6011584956495697, 'learning_rate': 1.6298943535875133e-09, 'epoch': 0.99}
99%|█████████▉| 6602/6638 [2:30:54<03:44, 6.23s/it] {'loss': 0.5997, 'grad_norm': 0.5567646935757352, 'learning_rate': 1.5429847641668106e-09, 'epoch': 0.99}
99%|█████████▉| 6603/6638 [2:30:57<03:07, 5.34s/it] {'loss': 0.6644, 'grad_norm': 0.6563134794455975, 'learning_rate': 1.4584560178465546e-09, 'epoch': 0.99}
99%|█████████▉| 6604/6638 [2:31:01<02:40, 4.73s/it] {'loss': 0.6745, 'grad_norm': 0.5973246417246344, 'learning_rate': 1.3763081347539785e-09, 'epoch': 0.99}
100%|█████████▉| 6605/6638 [2:31:04<02:22, 4.31s/it] {'loss': 0.7297, 'grad_norm': 0.7147734756065831, 'learning_rate': 1.2965411344501023e-09, 'epoch': 1.0} 100%|█████████▉| 6605/6638 [2:31:04<02:22,
4.31s/it] 100%|█████████▉| 6606/6638 [2:31:07<02:07, 3.98s/it] {'loss': 0.6182, 'grad_norm': 0.6371332086045757, 'learning_rate': 1.2191550359308413e-09, 'epoch': 1.0} 100%|█████████▉| 6606/6638 [2:31:07<02:07, 3.98s/it] 100%|█████████▉| 6607/6638 [2:31:10<01:57, 3.78s/it] {'loss': 0.6241, 'grad_norm': 0.5637130424790793, 'learning_rate': 1.1441498576225673e-09, 'epoch': 1.0} 100%|█████████▉| 6607/6638 [2:31:10<01:57, 3.78s/it] 100%|█████████▉| 6608/6638 [2:31:14<01:49, 3.64s/it] {'loss': 0.6511, 'grad_norm': 0.6342355761355769, 'learning_rate': 1.071525617384328e-09, 'epoch': 1.0} 100%|█████████▉| 6608/6638 [2:31:14<01:49, 3.64s/it] 100%|█████████▉| 6609/6638 [2:31:17<01:42, 3.52s/it] {'loss': 0.6829, 'grad_norm': 0.8103521420491312, 'learning_rate': 1.0012823325111776e-09, 'epoch': 1.0} 100%|█████████▉| 6609/6638 [2:31:17<01:42, 3.52s/it] 100%|█████████▉| 6610/6638 [2:31:20<01:36, 3.44s/it] {'loss': 0.6695, 'grad_norm': 0.5881555337380749, 'learning_rate': 9.334200197286258e-10, 'epoch': 1.0} 100%|█████████▉| 6610/6638 [2:31:20<01:36, 3.44s/it] 100%|█████████▉| 6611/6638 [2:31:23<01:31, 3.37s/it] {'loss': 0.6321, 'grad_norm': 0.6134543407509627, 'learning_rate': 8.679386951970792e-10, 'epoch': 1.0} 100%|█████████▉| 6611/6638 [2:31:23<01:31, 3.37s/it] 100%|█████████▉| 6612/6638 [2:31:27<01:27, 3.37s/it] {'loss': 0.6083, 'grad_norm': 0.5737370851069261, 'learning_rate': 8.048383745085097e-10, 'epoch': 1.0} 100%|█████████▉| 6612/6638 [2:31:27<01:27, 3.37s/it] 100%|█████████▉| 6613/6638 [2:31:30<01:23, 3.35s/it] {'loss': 0.6369, 'grad_norm': 0.5731090749637703, 'learning_rate': 7.441190726875658e-10, 'epoch': 1.0} 100%|█████████▉| 6613/6638 [2:31:30<01:23, 3.35s/it] 100%|█████████▉| 6614/6638 [2:31:33<01:19, 3.32s/it] {'loss': 0.5783, 'grad_norm': 0.547197112109774, 'learning_rate': 6.857808041937919e-10, 'epoch': 1.0} 100%|█████████▉| 6614/6638 [2:31:33<01:19, 3.32s/it] 100%|█████████▉| 6615/6638 [2:31:37<01:17, 3.37s/it] {'loss': 0.6335, 'grad_norm': 
0.5189643114211046, 'learning_rate': 6.298235829182986e-10, 'epoch': 1.0} 100%|█████████▉| 6615/6638 [2:31:37<01:17, 3.37s/it] 100%|█████████▉| 6616/6638 [2:31:40<01:13, 3.33s/it] {'loss': 0.6535, 'grad_norm': 0.5999805533032511, 'learning_rate': 5.762474221859826e-10, 'epoch': 1.0} 100%|█████████▉| 6616/6638 [2:31:40<01:13, 3.33s/it] 100%|█████████▉| 6617/6638 [2:31:43<01:09, 3.32s/it] {'loss': 0.5886, 'grad_norm': 0.5088694159603242, 'learning_rate': 5.250523347544167e-10, 'epoch': 1.0} 100%|█████████▉| 6617/6638 [2:31:43<01:09, 3.32s/it] 100%|█████████▉| 6618/6638 [2:31:47<01:05, 3.29s/it] {'loss': 0.6589, 'grad_norm': 0.6003608229473136, 'learning_rate': 4.762383328138498e-10, 'epoch': 1.0} 100%|█████████▉| 6618/6638 [2:31:47<01:05, 3.29s/it] 100%|█████████▉| 6619/6638 [2:31:50<01:02, 3.27s/it] {'loss': 0.6283, 'grad_norm': 0.6017343417860531, 'learning_rate': 4.298054279883168e-10, 'epoch': 1.0} 100%|█████████▉| 6619/6638 [2:31:50<01:02, 3.27s/it] 100%|█████████▉| 6620/6638 [2:31:53<00:58, 3.26s/it] {'loss': 0.5826, 'grad_norm': 0.5333562315314205, 'learning_rate': 3.857536313345289e-10, 'epoch': 1.0} 100%|█████████▉| 6620/6638 [2:31:53<00:58, 3.26s/it] 100%|█████████▉| 6621/6638 [2:31:56<00:55, 3.27s/it] {'loss': 0.6116, 'grad_norm': 0.5399575055459586, 'learning_rate': 3.4408295334187324e-10, 'epoch': 1.0} 100%|█████████▉| 6621/6638 [2:31:56<00:55, 3.27s/it] 100%|█████████▉| 6622/6638 [2:32:00<00:52, 3.29s/it] {'loss': 0.6157, 'grad_norm': 0.5295329537327718, 'learning_rate': 3.047934039335232e-10, 'epoch': 1.0} 100%|█████████▉| 6622/6638 [2:32:00<00:52, 3.29s/it] 100%|█████████▉| 6623/6638 [2:32:03<00:48, 3.26s/it] {'loss': 0.6213, 'grad_norm': 0.5714253484848915, 'learning_rate': 2.6788499246421794e-10, 'epoch': 1.0} 100%|█████████▉| 6623/6638 [2:32:03<00:48, 3.26s/it] 100%|█████████▉| 6624/6638 [2:32:06<00:45, 3.27s/it] {'loss': 0.6657, 'grad_norm': 0.581543304106417, 'learning_rate': 2.333577277235932e-10, 'epoch': 1.0} 100%|█████████▉| 6624/6638 
[2:32:06<00:45, 3.27s/it] 100%|█████████▉| 6625/6638 [2:32:09<00:42, 3.29s/it] {'loss': 0.6369, 'grad_norm': 0.5523635553907636, 'learning_rate': 2.012116179328505e-10, 'epoch': 1.0} 100%|█████████▉| 6625/6638 [2:32:10<00:42, 3.29s/it] 100%|█████████▉| 6626/6638 [2:32:13<00:39, 3.28s/it] {'loss': 0.5975, 'grad_norm': 0.5234195067554501, 'learning_rate': 1.714466707469775e-10, 'epoch': 1.0} 100%|█████████▉| 6626/6638 [2:32:13<00:39, 3.28s/it] 100%|█████████▉| 6627/6638 [2:32:16<00:36, 3.32s/it] {'loss': 0.6296, 'grad_norm': 0.7443845870569801, 'learning_rate': 1.4406289325252788e-10, 'epoch': 1.0} 100%|█████████▉| 6627/6638 [2:32:16<00:36, 3.32s/it] 100%|█████████▉| 6628/6638 [2:32:19<00:33, 3.32s/it] {'loss': 0.6191, 'grad_norm': 0.5764806422534288, 'learning_rate': 1.190602919709516e-10, 'epoch': 1.0} 100%|█████████▉| 6628/6638 [2:32:19<00:33, 3.32s/it] 100%|█████████▉| 6629/6638 [2:32:23<00:29, 3.30s/it] {'loss': 0.6205, 'grad_norm': 0.5742374148413487, 'learning_rate': 9.643887285637476e-11, 'epoch': 1.0} 100%|█████████▉| 6629/6638 [2:32:23<00:29, 3.30s/it] 100%|█████████▉| 6630/6638 [2:32:26<00:26, 3.35s/it] {'loss': 0.6717, 'grad_norm': 0.6217822284566791, 'learning_rate': 7.619864129559951e-11, 'epoch': 1.0} 100%|█████████▉| 6630/6638 [2:32:26<00:26, 3.35s/it] 100%|█████████▉| 6631/6638 [2:32:29<00:23, 3.33s/it] {'loss': 0.6436, 'grad_norm': 0.5677023649534383, 'learning_rate': 5.833960210588352e-11, 'epoch': 1.0} 100%|█████████▉| 6631/6638 [2:32:29<00:23, 3.33s/it] 100%|█████████▉| 6632/6638 [2:32:33<00:20, 3.33s/it] {'loss': 0.612, 'grad_norm': 0.5205482931287522, 'learning_rate': 4.286175954271166e-11, 'epoch': 1.0} 100%|█████████▉| 6632/6638 [2:32:33<00:20, 3.33s/it] 100%|█████████▉| 6633/6638 [2:32:36<00:16, 3.31s/it] {'loss': 0.633, 'grad_norm': 0.592462560942973, 'learning_rate': 2.976511729091414e-11, 'epoch': 1.0} 100%|█████████▉| 6633/6638 [2:32:36<00:16, 3.31s/it] 100%|█████████▉| 6634/6638 [2:32:40<00:13, 3.35s/it] {'loss': 0.6091, 'grad_norm': 
0.50172562016045, 'learning_rate': 1.9049678467997213e-11, 'epoch': 1.0} 100%|█████████▉| 6634/6638 [2:32:40<00:13, 3.35s/it] 100%|█████████▉| 6635/6638 [2:32:43<00:09, 3.29s/it] {'loss': 0.613, 'grad_norm': 0.6117331639131375, 'learning_rate': 1.0715445626363618e-11, 'epoch': 1.0} 100%|█████████▉| 6635/6638 [2:32:43<00:09, 3.29s/it] 100%|█████████▉| 6636/6638 [2:32:46<00:06, 3.31s/it] {'loss': 0.6324, 'grad_norm': 0.5904727283060205, 'learning_rate': 4.762420751092123e-12, 'epoch': 1.0} 100%|█████████▉| 6636/6638 [2:32:46<00:06, 3.31s/it] 100%|█████████▉| 6637/6638 [2:32:49<00:03, 3.30s/it] {'loss': 0.6307, 'grad_norm': 0.5799284100340704, 'learning_rate': 1.1906052588273044e-12, 'epoch': 1.0} 100%|█████████▉| 6637/6638 [2:32:49<00:03, 3.30s/it] 100%|██████████| 6638/6638 [2:32:54<00:00, 3.66s/it] {'loss': 0.6072, 'grad_norm': 0.5441621753148557, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 6638/6638 [2:32:54<00:00, 3.66s/it]saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6638/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6638/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/tmp-checkpoint-6638/mm_projector /lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/torch/nn/modules/module.py:1898: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. 
warnings.warn( {'train_runtime': 9191.1745, 'train_samples_per_second': 184.981, 'train_steps_per_second': 0.722, 'train_loss': 0.2544525241898499, 'epoch': 1.0} 100%|██████████| 6638/6638 [2:33:09<00:00, 3.66s/it] 100%|██████████| 6638/6638 [2:33:09<00:00, 1.38s/it] saving llm to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/llm saving vision_tower to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/vision_tower saving mm_projector to /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask/mm_projector wandb: wandb: 🚀 View run nvila_2b_path_mask at: https://wandb.ai/memmelma/VILA/runs/i92sb1t7 wandb: Find logs at: ../../../../../../../../fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/wandb/run-20250527_220238-i92sb1t7/logs srun: job 8273814 queued and waiting for resources srun: job 8273814 has been allocated resources wandb: Currently logged in as: memmelma to https://api.wandb.ai. Use `wandb login --relogin` to force relogin MASTER_ADDR=batch-block1-0083 JobID: 8273814 | Full list: batch-block1-0083 NETWORK=Efficient-Large-Model/NVILA-Lite-2B W0528 00:37:44.790000 23456244102976 torch/distributed/run.py:757] W0528 00:37:44.790000 23456244102976 torch/distributed/run.py:757] ***************************************** W0528 00:37:44.790000 23456244102976 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0528 00:37:44.790000 23456244102976 torch/distributed/run.py:757] ***************************************** 2025-05-28 00:37:57.762 | INFO | llava.data.builder:register_datasets:39 - Registering datasets from environment: 'default'. 
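As a sanity check, the numbers in the run summary above are mutually consistent. The sketch below verifies this; note that the effective global batch size (~256) and the peak learning rate (2e-5) are inferred assumptions, not values shown anywhere in this log excerpt, and the cosine-schedule comparison ignores any warmup offset the real trainer may apply.

```python
import math

# Values copied from the trainer's final summary line above.
train_runtime = 9191.1745      # seconds
samples_per_second = 184.981
steps_per_second = 0.722
total_steps = 6638             # from the progress bar

# Throughput and total step count agree with each other.
assert abs(steps_per_second * train_runtime - total_steps) < 10

# Samples per optimizer step, i.e. the effective global batch size
# (a derived estimate, not stated in the log).
assert round(samples_per_second / steps_per_second) == 256

# The learning-rate tail (1.19e-12 at step 6637, exactly 0.0 at 6638)
# matches a cosine decay; peak_lr here is an assumption.
peak_lr = 2e-5

def cosine_lr(step: int, total: int = total_steps) -> float:
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total))

assert cosine_lr(total_steps) < 1e-15                 # decays to ~0 at the last step
assert 1e-13 < cosine_lr(total_steps - 1) < 1e-11     # same order as the logged 1.19e-12
```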
2025-05-28 00:37:57.763 | INFO | llava.data.builder:register_datasets:44 - Registering datasets from: '/lustre/fs12/portfolios/nvr/users/mmemmel/projects/vila/NVILA/llava/data/registry/datasets/default.yaml'.
[2025-05-28 00:37:58,091] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Did not find AutoResume SDK!
/lustre/fs12/portfolios/nvr/users/mmemmel/miniforge3/envs/nvila/lib/python3.10/site-packages/transformers/training_args.py:1559: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
[2025-05-28 00:38:08,431] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-05-28 00:38:08,431] [INFO] [comm.py:594:init_distributed] cdb=None
[2025-05-28 00:38:08,431] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
Models has been ready under /lustre/fs12/portfolios/nvr/users/mmemmel/projects/nvila/checkpoints/finetuned/nvila/nvila_2b_path_mask. Skipp training
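The second job (8273814) exits early because the output directory already holds a finished model from job 8262265. A minimal sketch of that resume guard, assuming the script treats the presence of the three saved components as "done" (a hypothetical simplification; the real check in the training script may use different marker files):

```python
import os

def should_skip_training(output_dir: str) -> bool:
    """Return True when a finished model already exists under output_dir.

    Hypothetical reconstruction of the guard behind the
    'Models has been ready ... Skipp training' log line: the final
    (non-tmp) checkpoint writes these three subdirectories.
    """
    parts = ("llm", "vision_tower", "mm_projector")
    return all(os.path.isdir(os.path.join(output_dir, p)) for p in parts)
```

With this guard, requeueing the same srun command after a successful run is a cheap no-op rather than a second 2.5-hour epoch.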