| [2026-01-11 12:03:08,973][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 12:03:18,410][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 12:03:18,749][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) |
| [2026-01-11 12:03:19,057][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. |
| [2026-01-11 12:03:19,482][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) |
| [2026-01-11 12:03:19,762][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. |
| [2026-01-11 12:03:19,835][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 12:07:36,667][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 12:07:45,491][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 12:07:45,829][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) |
| [2026-01-11 12:07:46,111][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. |
| [2026-01-11 12:07:46,501][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) |
| [2026-01-11 12:07:47,551][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. |
| [2026-01-11 12:07:47,691][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 12:08:39,736][__main__][ERROR] - |
| Training failed for seed 242: |
| [2026-01-11 12:08:39,736][__main__][ERROR] - ================================================================================ |
| [2026-01-11 12:08:39,737][__main__][ERROR] - Error type: TypeError |
| [2026-01-11 12:08:39,737][__main__][ERROR] - Error message: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' |
| [2026-01-11 12:08:39,737][__main__][ERROR] - Full traceback: |
| [2026-01-11 12:08:39,737][__main__][ERROR] - Traceback (most recent call last): |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train |
| raise e |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train |
| trainer.fit(model, datamodule=datamodule) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit |
| call._call_and_handle_interrupt( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch |
| return function(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run |
| results = self._run_stage() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in _run_stage |
| self._run_sanity_check() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1150, in _run_sanity_check |
| val_loop.run() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator |
| return loop_run(self, *args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 146, in run |
| self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 441, in _evaluation_step |
| output = call._call_strategy_hook(trainer, hook_name, *step_args) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook |
| output = fn(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step |
| return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 641, in __call__ |
| wrapper_output = wrapper_module(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward |
| else self._run_ddp_forward(*inputs, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward |
| return self.module(*inputs, **kwargs) # type: ignore[index] |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 634, in wrapped_forward |
| out = method(*_args, **_kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context |
| return func(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 487, in validation_step |
| perceptual_emb, latent_goal = self.compute_input_embeddings(dataset_batch) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 616, in compute_input_embeddings |
| track_tokens = self.track_adapter( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/onestep_tracker.py", line 566, in forward |
| raw_tokens = self.track_backbone.get_global_token( |
| TypeError: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' |
|
|
| [2026-01-11 12:08:39,740][__main__][ERROR] - ================================================================================ |
| [2026-01-11 12:08:39,996][__main__][ERROR] - |
| Training script failed: |
| [2026-01-11 12:08:39,996][__main__][ERROR] - ================================================================================ |
| [2026-01-11 12:08:39,996][__main__][ERROR] - Error type: TypeError |
| [2026-01-11 12:08:39,996][__main__][ERROR] - Error message: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' |
| [2026-01-11 12:08:39,996][__main__][ERROR] - Full traceback: |
| [2026-01-11 12:08:39,997][__main__][ERROR] - Traceback (most recent call last): |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 231, in <module> |
| train() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| _run_hydra( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| _run_app( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| run_and_report( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| raise ex |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| return func() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| lambda: hydra.run( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| _ = ret.return_value |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| raise self._return_value |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| ret.return_value = task_function(task_cfg) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 212, in train |
| raise e |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train |
| raise e |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train |
| trainer.fit(model, datamodule=datamodule) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit |
| call._call_and_handle_interrupt( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch |
| return function(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run |
| results = self._run_stage() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in _run_stage |
| self._run_sanity_check() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1150, in _run_sanity_check |
| val_loop.run() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator |
| return loop_run(self, *args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 146, in run |
| self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 441, in _evaluation_step |
| output = call._call_strategy_hook(trainer, hook_name, *step_args) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook |
| output = fn(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step |
| return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 641, in __call__ |
| wrapper_output = wrapper_module(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward |
| else self._run_ddp_forward(*inputs, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward |
| return self.module(*inputs, **kwargs) # type: ignore[index] |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 634, in wrapped_forward |
| out = method(*_args, **_kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context |
| return func(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 487, in validation_step |
| perceptual_emb, latent_goal = self.compute_input_embeddings(dataset_batch) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 616, in compute_input_embeddings |
| track_tokens = self.track_adapter( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl |
| return self._call_impl(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl |
| return forward_call(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/onestep_tracker.py", line 566, in forward |
| raw_tokens = self.track_backbone.get_global_token( |
| TypeError: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' |
|
|
| [2026-01-11 12:08:39,997][__main__][ERROR] - ================================================================================ |
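
Note on the TypeError above: the call in `onestep_tracker.py` line 566 passes `return_img_token` to `SingleStageGlobalTrack.get_global_token()`, which does not accept it. This usually means the caller and the callee come from different code versions. The sketch below shows one way to detect such a mismatch before the call, using `inspect.signature`. `call_with_supported_kwargs` and `StubTrack` are hypothetical names used only for illustration, and the stub's signature is an assumption, not the real method's.

```python
import inspect


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, dropping keyword arguments that its signature does not accept.

    Returns (result, dropped), where dropped lists the rejected keyword names.
    """
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means every keyword is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs), []
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    dropped = sorted(set(kwargs) - set(supported))
    return fn(*args, **supported), dropped


class StubTrack:
    # Hypothetical stand-in for SingleStageGlobalTrack; the real signature is unknown.
    def get_global_token(self, frames):
        return f"tokens({frames})"


result, dropped = call_with_supported_kwargs(
    StubTrack().get_global_token, "frames", return_img_token=True
)
print(result, dropped)
```

Silently dropping an argument can hide a real behavioral difference, so a helper like this is better suited to diagnosing the mismatch than to serving as a permanent fix. Aligning the two code versions is likely the cleaner remedy.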
| [2026-01-11 12:10:13,991][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 12:10:23,086][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 12:10:23,421][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) |
| [2026-01-11 12:10:23,744][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. |
| [2026-01-11 12:10:24,132][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) |
| [2026-01-11 12:10:24,617][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. |
| [2026-01-11 12:10:24,774][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 12:11:31,321][root][INFO] - Creating EMA weights copy. |
| [2026-01-11 12:37:15,159][__main__][ERROR] - |
| Training failed for seed 242: |
| [2026-01-11 12:37:15,186][__main__][ERROR] - ================================================================================ |
| [2026-01-11 12:37:15,186][__main__][ERROR] - Error type: MisconfigurationException |
| [2026-01-11 12:37:15,186][__main__][ERROR] - Error message: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? |
| [2026-01-11 12:37:15,188][__main__][ERROR] - Full traceback: |
| [2026-01-11 12:37:15,188][__main__][ERROR] - Traceback (most recent call last): |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train |
| raise e |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train |
| trainer.fit(model, datamodule=datamodule) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit |
| call._call_and_handle_interrupt( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch |
| return function(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run |
| results = self._run_stage() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1123, in _run_stage |
| self.fit_loop.run() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 218, in run |
| self.on_advance_end() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 480, in on_advance_end |
| call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 228, in _call_callback_hooks |
| fn(trainer, trainer.lightning_module, *args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 493, in on_train_epoch_end |
| self._save_topk_checkpoint(trainer, monitor_candidates) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 587, in _save_topk_checkpoint |
| raise MisconfigurationException(m) |
| lightning_fabric.utilities.exceptions.MisconfigurationException: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? |
|
|
| [2026-01-11 12:37:19,255][__main__][ERROR] - ================================================================================ |
| [2026-01-11 12:37:19,972][__main__][ERROR] - |
| Training script failed: |
| [2026-01-11 12:37:19,973][__main__][ERROR] - ================================================================================ |
| [2026-01-11 12:37:19,973][__main__][ERROR] - Error type: MisconfigurationException |
| [2026-01-11 12:37:19,973][__main__][ERROR] - Error message: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? |
| [2026-01-11 12:37:19,986][__main__][ERROR] - Full traceback: |
| [2026-01-11 12:37:19,987][__main__][ERROR] - Traceback (most recent call last): |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 231, in <module> |
| train() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main |
| _run_hydra( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra |
| _run_app( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app |
| run_and_report( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report |
| raise ex |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report |
| return func() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> |
| lambda: hydra.run( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run |
| _ = ret.return_value |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value |
| raise self._return_value |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job |
| ret.return_value = task_function(task_cfg) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 212, in train |
| raise e |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train |
| raise e |
| File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train |
| trainer.fit(model, datamodule=datamodule) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit |
| call._call_and_handle_interrupt( |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt |
| return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch |
| return function(*args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl |
| self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run |
| results = self._run_stage() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1123, in _run_stage |
| self.fit_loop.run() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 218, in run |
| self.on_advance_end() |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 480, in on_advance_end |
| call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 228, in _call_callback_hooks |
| fn(trainer, trainer.lightning_module, *args, **kwargs) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 493, in on_train_epoch_end |
| self._save_topk_checkpoint(trainer, monitor_candidates) |
| File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 587, in _save_topk_checkpoint |
| raise MisconfigurationException(m) |
| lightning_fabric.utilities.exceptions.MisconfigurationException: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? |
|
|
| [2026-01-11 12:37:19,987][__main__][ERROR] - ================================================================================ |
| [2026-01-11 23:26:47,573][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 23:27:25,307][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e603b20>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 6615de3c-feaa-4818-8d1b-4ed73d711ed6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:27:25,308][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:27:36,323][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e620430>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 5aeb566d-4c0d-4484-a44e-0478b8a3c2bf)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:27:36,323][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:27:48,335][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e6200d0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 48fcdc23-f950-41ac-85d3-0e7b9ec1cb03)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:27:48,336][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:28:02,352][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e620760>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 38748308-51b3-4ec8-aae1-1ce7a84cee1f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:28:02,352][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:28:04,171][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 23:28:20,371][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e620a60>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 530cae3e-b194-4897-805e-e2527ef95fde)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:28:20,397][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:28:38,415][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e620d60>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 43493718-e156-4926-ac4f-a5f834e633f1)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:28:38,843][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88f9720>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 993daa5e-3e92-4ac4-8447-1b3cb39ec12b)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:28:38,852][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:28:48,429][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e600580>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 9ed40347-6fda-42c1-9091-1bbd35c17060)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:28:48,433][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:28:49,867][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88f9e40>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 911156df-3232-4135-9a9c-fe660e8ad559)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:28:49,870][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:28:59,448][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e8a9030>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 877689a3-ed71-4242-b898-b52c62f2d734)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:28:59,465][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:29:01,887][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88f9b70>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 58286c00-1491-49d3-9d48-d47457aec0f4)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:29:01,890][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:29:11,479][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e861570>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 11dcd2c3-f4da-4d6f-b655-6bfac2639fb8)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:29:11,482][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:29:15,903][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fa260>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 8a1be1fa-f318-4bf3-a362-d6fbefcb8b99)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:29:15,907][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:29:25,495][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e860940>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 86c0433e-086a-4e41-a542-1ac4d4c15642)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:29:25,497][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:29:34,047][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fa560>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 45441ad1-5f75-4d53-955d-eb649f85bb0a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:29:34,082][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:29:43,516][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e620e50>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 7408b013-6c7a-4b01-8f25-90576bd23f31)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:29:43,521][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:29:52,099][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fa860>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 51ffdbf9-44e4-484e-9d7c-404458e3b0c1)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:30:01,540][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e620850>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c47ee3bb-4465-4b82-839b-eb86c5bc1779)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:30:02,115][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f8b3ac80>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 15299db9-f91e-4a36-bff0-cdecb1e02671)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:30:02,134][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:30:05,831][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 23:30:07,263][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 23:30:13,148][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f8b39f00>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 7f91f5cd-03c0-45c1-97eb-d3d729425820)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:30:13,150][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:30:25,163][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fab90>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 3b8396e1-96b8-40e4-bbb0-25d8fed14110)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:30:25,164][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:30:39,179][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fa890>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 28baa919-1679-470b-9008-d01eaa7cd1d5)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:30:39,199][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:30:57,219][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fa590>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4e4b06bd-0f15-414c-8307-bcb7aec63270)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:30:57,219][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:31:15,238][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f88fa290>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: fa940326-327b-4b45-ac68-03f00e450355)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:31:17,784][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 23:31:18,355][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 23:31:25,991][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f8740d49000>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 9944b16e-a3e4-4864-8ed8-a9c6799077e3)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:31:25,995][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:31:37,010][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e66b4f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 6716f22e-9ec4-4a14-b311-9d011ac92bc7)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:31:37,011][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:31:49,023][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e66bb50>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 51af9e95-de51-4842-afe0-a4f50da4e7f8)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:31:49,024][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:32:03,039][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e230130>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 593224a6-5a93-457e-adbd-819fc7d1f71d)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:32:03,066][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:32:21,083][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e232080>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 1a11ef69-5b75-415c-b692-90b61c8ef6ba)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:32:21,084][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:32:39,105][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e0b79d0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: ea0dd451-1818-4204-97fe-895b9b0cf8e8)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:32:49,119][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f8740cbdcf0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 434f8c78-2917-44ba-8b52-050e57c0867f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:32:49,120][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:33:00,131][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f8758bd6ad0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 69f94c24-90d6-448f-863f-09339754fb6f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:33:00,132][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:33:12,147][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f858bcb81f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: d495c732-11fc-46f2-9704-127db1ce7c42)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:33:12,148][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:33:26,163][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f873e621ea0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 56068754-bda2-48b0-9034-4f326a74f613)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:33:26,164][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:33:44,183][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f858bcb84c0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: ca60038c-1cae-4817-a304-e580bdecbd58)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:33:44,184][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:33:59,370][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b333b760>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 072793cf-34d4-4d02-8d2d-3fe82e8bf095)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:33:59,730][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:34:02,203][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7f8740cbc100>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: a0179ad2-bec9-423f-adf3-ab7db9f25308)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:34:10,742][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b333b280>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: e81c1f27-0452-4865-9d22-248f3abe7722)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:34:10,746][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:34:22,758][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34ee7d0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 29c06739-da5e-452f-8077-757ec5223af6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:34:22,777][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:34:30,997][root][INFO] - Creating EMA weights copy. |
| [2026-01-11 23:34:36,795][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34edd20>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: b6983c7f-5729-4d8a-990b-4572736acc17)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:34:36,801][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:34:54,820][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34ed870>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 8319cf1d-b434-4b2c-b630-fc784ddd0e95)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:34:54,821][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:35:12,843][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34edde0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 3cfc8e65-254d-49fd-9dce-eb8d07da537b)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json |
| [2026-01-11 23:35:22,857][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34ecd60>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 52c84543-1a06-4dd6-ab4f-64547d5652a3)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:35:22,857][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. |
| [2026-01-11 23:35:33,870][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34ecb50>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4f5be885-87ae-4400-8255-42a0f201b251)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:35:33,871][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. |
| [2026-01-11 23:35:45,887][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f8b39e70>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 9071f97e-b9eb-41be-b83c-cb65161b213d)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:35:45,887][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. |
| [2026-01-11 23:35:59,898][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5f8a8a380>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 2fa34707-9f5d-4056-9491-7b3e41be438a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:35:59,899][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. |
| [2026-01-11 23:36:17,919][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34ec550>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: b75addd4-cd4d-431a-80b6-fcd3dfc05d7e)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:36:17,919][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. |
| [2026-01-11 23:36:35,956][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd5b34ecdf0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 53e3a271-f041-4a16-ba9c-4ce445f8866a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json |
| [2026-01-11 23:37:05,396][root][INFO] - Creating EMA weights copy. |
| [2026-01-11 23:46:11,642][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 23:46:43,654][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 23:46:44,240][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 23:46:57,878][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 23:47:27,117][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 23:47:27,891][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 23:48:20,014][root][INFO] - Creating EMA weights copy. |
| [2026-01-11 23:49:03,156][root][INFO] - Creating EMA weights copy. |
| [2026-01-11 23:53:20,798][datasets][INFO] - PyTorch version 2.2.2+cu118 available. |
| [2026-01-11 23:53:51,953][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method |
| [2026-01-11 23:53:52,749][dinov2][INFO] - using MLP layer as FFN |
| [2026-01-11 23:56:37,443][root][INFO] - Creating EMA weights copy. |
|
|