[2026-01-11 12:03:08,973][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 12:03:18,410][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 12:03:18,749][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:03:19,057][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:03:19,482][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:03:19,762][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:03:19,835][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 12:07:36,667][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 12:07:45,491][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 12:07:45,829][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:07:46,111][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:07:46,501][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:07:47,551][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:07:47,691][dinov2][INFO] - using MLP layer as FFN [2026-01-11 12:08:39,736][__main__][ERROR] - Training failed for seed 242: [2026-01-11 12:08:39,736][__main__][ERROR] - ================================================================================ [2026-01-11 12:08:39,737][__main__][ERROR] - Error type: TypeError [2026-01-11 12:08:39,737][__main__][ERROR] - Error message: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' [2026-01-11 12:08:39,737][__main__][ERROR] - Full traceback: [2026-01-11 12:08:39,737][__main__][ERROR] - Traceback (most recent call last): File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train raise e File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train trainer.fit(model, datamodule=datamodule) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit call._call_and_handle_interrupt( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch return function(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run 
results = self._run_stage() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in _run_stage self._run_sanity_check() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1150, in _run_sanity_check val_loop.run() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator return loop_run(self, *args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 146, in run self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 441, in _evaluation_step output = call._call_strategy_hook(trainer, hook_name, *step_args) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook output = fn(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 641, in __call__ wrapper_output = wrapper_module(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return 
self._call_impl(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward else self._run_ddp_forward(*inputs, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward return self.module(*inputs, **kwargs) # type: ignore[index] File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 634, in wrapped_forward out = method(*_args, **_kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 487, in validation_step perceptual_emb, latent_goal = self.compute_input_embeddings(dataset_batch) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 616, in compute_input_embeddings track_tokens = self.track_adapter( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, 
in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/onestep_tracker.py", line 566, in forward raw_tokens = self.track_backbone.get_global_token( TypeError: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' [2026-01-11 12:08:39,740][__main__][ERROR] - ================================================================================ [2026-01-11 12:08:39,996][__main__][ERROR] - Training script failed: [2026-01-11 12:08:39,996][__main__][ERROR] - ================================================================================ [2026-01-11 12:08:39,996][__main__][ERROR] - Error type: TypeError [2026-01-11 12:08:39,996][__main__][ERROR] - Error message: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' [2026-01-11 12:08:39,996][__main__][ERROR] - Full traceback: [2026-01-11 12:08:39,997][__main__][ERROR] - Traceback (most recent call last): File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 231, in train() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main _run_hydra( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra _run_app( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app run_and_report( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in 
run_and_report raise ex File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report return func() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in lambda: hydra.run( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run _ = ret.return_value File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value raise self._return_value File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job ret.return_value = task_function(task_cfg) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 212, in train raise e File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train raise e File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train trainer.fit(model, datamodule=datamodule) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit call._call_and_handle_interrupt( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch return function(*args, 
**kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run results = self._run_stage() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in _run_stage self._run_sanity_check() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1150, in _run_sanity_check val_loop.run() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator return loop_run(self, *args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 146, in run self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 441, in _evaluation_step output = call._call_strategy_hook(trainer, hook_name, *step_args) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook output = fn(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs) File 
"/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 641, in __call__ wrapper_output = wrapper_module(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward else self._run_ddp_forward(*inputs, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward return self.module(*inputs, **kwargs) # type: ignore[index] File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 634, in wrapped_forward out = method(*_args, **_kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 487, in validation_step 
perceptual_emb, latent_goal = self.compute_input_embeddings(dataset_batch) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 616, in compute_input_embeddings track_tokens = self.track_adapter( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/onestep_tracker.py", line 566, in forward raw_tokens = self.track_backbone.get_global_token( TypeError: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token' [2026-01-11 12:08:39,997][__main__][ERROR] - ================================================================================ [2026-01-11 12:10:13,991][datasets][INFO] - PyTorch version 2.2.2+cu118 available. [2026-01-11 12:10:23,086][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method [2026-01-11 12:10:23,421][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) [2026-01-11 12:10:23,744][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. [2026-01-11 12:10:24,132][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k) [2026-01-11 12:10:24,617][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors. 
[2026-01-11 12:10:24,774][dinov2][INFO] - using MLP layer as FFN [2026-01-11 12:11:31,321][root][INFO] - Creating EMA weights copy. [2026-01-11 12:37:15,159][__main__][ERROR] - Training failed for seed 242: [2026-01-11 12:37:15,186][__main__][ERROR] - ================================================================================ [2026-01-11 12:37:15,186][__main__][ERROR] - Error type: MisconfigurationException [2026-01-11 12:37:15,186][__main__][ERROR] - Error message: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 
'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 
'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 
'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 
'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. 
HINT: Did you call `log('val_loss', value)` in the `LightningModule`? [2026-01-11 12:37:15,188][__main__][ERROR] - Full traceback: [2026-01-11 12:37:15,188][__main__][ERROR] - Traceback (most recent call last): File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train raise e File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train trainer.fit(model, datamodule=datamodule) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit call._call_and_handle_interrupt( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch return function(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run results = self._run_stage() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1123, in _run_stage self.fit_loop.run() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 218, in run self.on_advance_end() File 
"/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 480, in on_advance_end call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 228, in _call_callback_hooks fn(trainer, trainer.lightning_module, *args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 493, in on_train_epoch_end self._save_topk_checkpoint(trainer, monitor_candidates) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 587, in _save_topk_checkpoint raise MisconfigurationException(m) lightning_fabric.utilities.exceptions.MisconfigurationException: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 
'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 
'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 
'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 
'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 
'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? [2026-01-11 12:37:19,255][__main__][ERROR] - ================================================================================ [2026-01-11 12:37:19,972][__main__][ERROR] - Training script failed: [2026-01-11 12:37:19,973][__main__][ERROR] - ================================================================================ [2026-01-11 12:37:19,973][__main__][ERROR] - Error type: MisconfigurationException [2026-01-11 12:37:19,973][__main__][ERROR] - Error message: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 
'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 
'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 
'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 
'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 
'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 
'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? [2026-01-11 12:37:19,986][__main__][ERROR] - Full traceback: [2026-01-11 12:37:19,987][__main__][ERROR] - Traceback (most recent call last): File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 231, in <module> train() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main _run_hydra( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra _run_app( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app run_and_report( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report raise ex File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report return func() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda> lambda: hydra.run( File 
"/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run _ = ret.return_value File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value raise self._return_value File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job ret.return_value = task_function(task_cfg) File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 212, in train raise e File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train raise e File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train trainer.fit(model, datamodule=datamodule) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit call._call_and_handle_interrupt( File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch return function(*args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl self._run(model, ckpt_path=ckpt_path, weights_only=weights_only) File 
"/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run results = self._run_stage() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1123, in _run_stage self.fit_loop.run() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 218, in run self.on_advance_end() File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 480, in on_advance_end call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 228, in _call_callback_hooks fn(trainer, trainer.lightning_module, *args, **kwargs) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 493, in on_train_epoch_end self._save_topk_checkpoint(trainer, monitor_candidates) File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 587, in _save_topk_checkpoint raise MisconfigurationException(m) lightning_fabric.utilities.exceptions.MisconfigurationException: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 
'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 
'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 
'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 
'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 
'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 
'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 
'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 
'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`? [2026-01-11 12:37:19,987][__main__][ERROR] - ================================================================================ [2026-01-11 23:26:47,573][datasets][INFO] - PyTorch version 2.2.2+cu118 available. [2026-01-11 23:27:25,307][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 6615de3c-feaa-4818-8d1b-4ed73d711ed6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:27:25,308][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2026-01-11 23:27:36,323][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 5aeb566d-4c0d-4484-a44e-0478b8a3c2bf)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:27:36,323][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:27:48,335][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 48fcdc23-f950-41ac-85d3-0e7b9ec1cb03)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:27:48,336][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:28:02,352][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 38748308-51b3-4ec8-aae1-1ce7a84cee1f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:28:02,352][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. [2026-01-11 23:28:04,171][datasets][INFO] - PyTorch version 2.2.2+cu118 available. [2026-01-11 23:28:20,371][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 530cae3e-b194-4897-805e-e2527ef95fde)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:28:20,397][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:28:38,415][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 43493718-e156-4926-ac4f-a5f834e633f1)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:28:38,843][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 993daa5e-3e92-4ac4-8447-1b3cb39ec12b)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:28:38,852][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2026-01-11 23:28:48,429][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 9ed40347-6fda-42c1-9091-1bbd35c17060)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:28:48,433][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2026-01-11 23:28:49,867][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 911156df-3232-4135-9a9c-fe660e8ad559)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:28:49,870][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:28:59,448][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 877689a3-ed71-4242-b898-b52c62f2d734)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:28:59,465][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:29:01,887][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 58286c00-1491-49d3-9d48-d47457aec0f4)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:29:01,890][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:29:11,479][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 11dcd2c3-f4da-4d6f-b655-6bfac2639fb8)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:29:11,482][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:29:15,903][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 8a1be1fa-f318-4bf3-a362-d6fbefcb8b99)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:29:15,907][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. [2026-01-11 23:29:25,495][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 86c0433e-086a-4e41-a542-1ac4d4c15642)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:29:25,497][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. [2026-01-11 23:29:34,047][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 45441ad1-5f75-4d53-955d-eb649f85bb0a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:29:34,082][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:29:43,516][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 7408b013-6c7a-4b01-8f25-90576bd23f31)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:29:43,521][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:29:52,099][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 51ffdbf9-44e4-484e-9d7c-404458e3b0c1)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:30:01,540][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c47ee3bb-4465-4b82-839b-eb86c5bc1779)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:30:02,115][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 15299db9-f91e-4a36-bff0-cdecb1e02671)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:30:02,134][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. 
[2026-01-11 23:30:05,831][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method [2026-01-11 23:30:07,263][dinov2][INFO] - using MLP layer as FFN [2026-01-11 23:30:13,148][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 7f91f5cd-03c0-45c1-97eb-d3d729425820)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:30:13,150][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:30:25,163][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 3b8396e1-96b8-40e4-bbb0-25d8fed14110)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:30:25,164][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:30:39,179][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 28baa919-1679-470b-9008-d01eaa7cd1d5)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:30:39,199][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. 
[2026-01-11 23:30:57,219][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4e4b06bd-0f15-414c-8307-bcb7aec63270)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:30:57,219][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:31:15,238][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: fa940326-327b-4b45-ac68-03f00e450355)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:31:17,784][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method [2026-01-11 23:31:18,355][dinov2][INFO] - using MLP layer as FFN [2026-01-11 23:31:25,991][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 9944b16e-a3e4-4864-8ed8-a9c6799077e3)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:31:25,995][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. 
[2026-01-11 23:31:37,010][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 6716f22e-9ec4-4a14-b311-9d011ac92bc7)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:31:37,011][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:31:49,023][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 51af9e95-de51-4842-afe0-a4f50da4e7f8)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:31:49,024][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:32:03,039][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 593224a6-5a93-457e-adbd-819fc7d1f71d)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:32:03,066][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. 
[2026-01-11 23:32:21,083][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 1a11ef69-5b75-415c-b692-90b61c8ef6ba)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:32:21,084][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:32:39,105][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: ea0dd451-1818-4204-97fe-895b9b0cf8e8)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:32:49,119][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 434f8c78-2917-44ba-8b52-050e57c0867f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:32:49,120][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2026-01-11 23:33:00,131][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 69f94c24-90d6-448f-863f-09339754fb6f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:33:00,132][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:33:12,147][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: d495c732-11fc-46f2-9704-127db1ce7c42)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:33:12,148][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:33:26,163][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 56068754-bda2-48b0-9034-4f326a74f613)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:33:26,164][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. [2026-01-11 23:33:44,183][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: ca60038c-1cae-4817-a304-e580bdecbd58)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:33:44,184][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:33:59,370][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 072793cf-34d4-4d02-8d2d-3fe82e8bf095)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:33:59,730][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2026-01-11 23:34:02,203][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: a0179ad2-bec9-423f-adf3-ab7db9f25308)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:34:10,742][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: e81c1f27-0452-4865-9d22-248f3abe7722)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:34:10,746][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. 
[2026-01-11 23:34:22,758][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 29c06739-da5e-452f-8077-757ec5223af6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:34:22,777][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:34:30,997][root][INFO] - Creating EMA weights copy. [2026-01-11 23:34:36,795][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: b6983c7f-5729-4d8a-990b-4572736acc17)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:34:36,801][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. [2026-01-11 23:34:54,820][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 8319cf1d-b434-4b2c-b630-fc784ddd0e95)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:34:54,821][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. 
[2026-01-11 23:35:12,843][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 3cfc8e65-254d-49fd-9dce-eb8d07da537b)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json [2026-01-11 23:35:22,857][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 52c84543-1a06-4dd6-ab4f-64547d5652a3)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:35:22,857][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5]. [2026-01-11 23:35:33,870][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4f5be885-87ae-4400-8255-42a0f201b251)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:35:33,871][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5]. [2026-01-11 23:35:45,887][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. 
(connect timeout=10)'))"), '(Request ID: 9071f97e-b9eb-41be-b83c-cb65161b213d)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:35:45,887][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5]. [2026-01-11 23:35:59,898][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 2fa34707-9f5d-4056-9491-7b3e41be438a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:35:59,899][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5]. [2026-01-11 23:36:17,919][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: b75addd4-cd4d-431a-80b6-fcd3dfc05d7e)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:36:17,919][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5]. [2026-01-11 23:36:35,956][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 53e3a271-f041-4a16-ba9c-4ce445f8866a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json [2026-01-11 23:37:05,396][root][INFO] - Creating EMA weights copy. 
[2026-01-11 23:46:11,642][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:46:43,654][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:46:44,240][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:46:57,878][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:47:27,117][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:47:27,891][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:48:20,014][root][INFO] - Creating EMA weights copy.
[2026-01-11 23:49:03,156][root][INFO] - Creating EMA weights copy.
[2026-01-11 23:53:20,798][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:53:51,953][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:53:52,749][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:56:37,443][root][INFO] - Creating EMA weights copy.