track-flowmatching / train_ddp_process_2.log
[2026-01-11 12:03:08,977][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 12:03:17,922][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 12:03:18,258][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:03:18,543][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:03:18,936][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:03:19,374][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:03:19,525][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 12:07:36,634][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 12:07:47,297][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 12:07:47,675][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:07:47,973][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:07:48,361][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:07:48,659][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:07:48,804][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 12:08:38,931][__main__][ERROR] -
Training failed for seed 242:
[2026-01-11 12:08:38,931][__main__][ERROR] - ================================================================================
[2026-01-11 12:08:38,931][__main__][ERROR] - Error type: TypeError
[2026-01-11 12:08:38,932][__main__][ERROR] - Error message: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token'
[2026-01-11 12:08:38,932][__main__][ERROR] - Full traceback:
[2026-01-11 12:08:38,932][__main__][ERROR] - Traceback (most recent call last):
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train
raise e
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train
trainer.fit(model, datamodule=datamodule)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit
call._call_and_handle_interrupt(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl
self._run(model, ckpt_path=ckpt_path, weights_only=weights_only)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run
results = self._run_stage()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in _run_stage
self._run_sanity_check()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1150, in _run_sanity_check
val_loop.run()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator
return loop_run(self, *args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 146, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 441, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step
return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 641, in __call__
wrapper_output = wrapper_module(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 634, in wrapped_forward
out = method(*_args, **_kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 487, in validation_step
perceptual_emb, latent_goal = self.compute_input_embeddings(dataset_batch)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 616, in compute_input_embeddings
track_tokens = self.track_adapter(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/onestep_tracker.py", line 566, in forward
raw_tokens = self.track_backbone.get_global_token(
TypeError: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token'
[2026-01-11 12:08:38,936][__main__][ERROR] - ================================================================================
[2026-01-11 12:08:39,220][__main__][ERROR] -
Training script failed:
[2026-01-11 12:08:39,220][__main__][ERROR] - ================================================================================
[2026-01-11 12:08:39,221][__main__][ERROR] - Error type: TypeError
[2026-01-11 12:08:39,221][__main__][ERROR] - Error message: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token'
[2026-01-11 12:08:39,221][__main__][ERROR] - Full traceback:
[2026-01-11 12:08:39,223][__main__][ERROR] - Traceback (most recent call last):
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 231, in <module>
train()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 212, in train
raise e
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train
raise e
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train
trainer.fit(model, datamodule=datamodule)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit
call._call_and_handle_interrupt(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl
self._run(model, ckpt_path=ckpt_path, weights_only=weights_only)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run
results = self._run_stage()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1121, in _run_stage
self._run_sanity_check()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1150, in _run_sanity_check
val_loop.run()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 179, in _decorator
return loop_run(self, *args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 146, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 441, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 329, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 411, in validation_step
return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 641, in __call__
wrapper_output = wrapper_module(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 634, in wrapped_forward
out = method(*_args, **_kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 487, in validation_step
perceptual_emb, latent_goal = self.compute_input_embeddings(dataset_batch)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/mode_agent.py", line 616, in compute_input_embeddings
track_tokens = self.track_adapter(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/models/onestep_tracker.py", line 566, in forward
raw_tokens = self.track_backbone.get_global_token(
TypeError: SingleStageGlobalTrack.get_global_token() got an unexpected keyword argument 'return_img_token'
[2026-01-11 12:08:39,223][__main__][ERROR] - ================================================================================
[2026-01-11 12:10:14,015][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 12:10:22,772][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 12:10:23,109][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:10:23,392][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:10:23,781][timm.models._builder][INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
[2026-01-11 12:10:24,248][timm.models._hub][INFO] - [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
[2026-01-11 12:10:24,321][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 12:11:31,321][root][INFO] - Creating EMA weights copy.
[2026-01-11 12:37:14,939][__main__][ERROR] -
Training failed for seed 242:
[2026-01-11 12:37:14,940][__main__][ERROR] - ================================================================================
[2026-01-11 12:37:14,940][__main__][ERROR] - Error type: MisconfigurationException
[2026-01-11 12:37:14,940][__main__][ERROR] - Error message: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`?
[2026-01-11 12:37:14,941][__main__][ERROR] - Full traceback:
[2026-01-11 12:37:14,941][__main__][ERROR] - Traceback (most recent call last):
  File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train
    raise e
  File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train
    trainer.fit(model, datamodule=datamodule)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit
    call._call_and_handle_interrupt(
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl
    self._run(model, ckpt_path=ckpt_path, weights_only=weights_only)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run
    results = self._run_stage()
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1123, in _run_stage
    self.fit_loop.run()
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 218, in run
    self.on_advance_end()
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 480, in on_advance_end
    call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 228, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 493, in on_train_epoch_end
    self._save_topk_checkpoint(trainer, monitor_candidates)
  File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 587, in _save_topk_checkpoint
    raise MisconfigurationException(m)
lightning_fabric.utilities.exceptions.MisconfigurationException: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`?
[2026-01-11 12:37:14,941][__main__][ERROR] - ================================================================================
[2026-01-11 12:37:19,254][__main__][ERROR] -
Training script failed:
[2026-01-11 12:37:19,255][__main__][ERROR] - ================================================================================
[2026-01-11 12:37:19,255][__main__][ERROR] - Error type: MisconfigurationException
[2026-01-11 12:37:19,255][__main__][ERROR] - Error message: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`?
[2026-01-11 12:37:19,256][__main__][ERROR] - Full traceback:
[2026-01-11 12:37:19,260][__main__][ERROR] - Traceback (most recent call last):
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 231, in <module>
train()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 212, in train
raise e
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 201, in train
raise e
File "/inspire/hdd/global_user/xuzijun-253108540220/MoDE_Diffusion_Policy/mode/training_realworld.py", line 186, in train
trainer.fit(model, datamodule=datamodule)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 584, in fit
call._call_and_handle_interrupt(
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 630, in _fit_impl
self._run(model, ckpt_path=ckpt_path, weights_only=weights_only)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run
results = self._run_stage()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1123, in _run_stage
self.fit_loop.run()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 218, in run
self.on_advance_end()
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 480, in on_advance_end
call._call_callback_hooks(trainer, "on_train_epoch_end", monitoring_callbacks=True)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 228, in _call_callback_hooks
fn(trainer, trainer.lightning_module, *args, **kwargs)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 493, in on_train_epoch_end
self._save_topk_checkpoint(trainer, monitor_candidates)
File "/inspire/hdd/global_user/xuzijun-253108540220/conda/envs/mode_env_310/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 587, in _save_topk_checkpoint
raise MisconfigurationException(m)
lightning_fabric.utilities.exceptions.MisconfigurationException: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['debug/total_grad_norm', 'debug/input_layers_grad_norm', 'train/ema_rate', 'debug/block_0_ln_1.g_grad_norm', 'debug/block_0_attn.key.weight_grad_norm', 'debug/block_0_attn.key.bias_grad_norm', 'debug/block_0_attn.query.weight_grad_norm', 'debug/block_0_attn.query.bias_grad_norm', 'debug/block_0_attn.value.weight_grad_norm', 'debug/block_0_attn.value.bias_grad_norm', 'debug/block_0_attn.c_proj.weight_grad_norm', 'debug/block_0_attn.q_norm.g_grad_norm', 'debug/block_0_attn.k_norm.g_grad_norm', 'debug/block_0_ln_2.g_grad_norm', 'debug/block_0_router.router.mlp.0.weight_grad_norm', 'debug/block_0_router.router.mlp.0.bias_grad_norm', 'debug/block_0_router.router.mlp.3.weight_grad_norm', 'debug/block_0_router.router.mlp.3.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_0_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_0_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_1_ln_1.g_grad_norm', 'debug/block_1_attn.key.weight_grad_norm', 'debug/block_1_attn.key.bias_grad_norm', 'debug/block_1_attn.query.weight_grad_norm', 'debug/block_1_attn.query.bias_grad_norm', 'debug/block_1_attn.value.weight_grad_norm', 'debug/block_1_attn.value.bias_grad_norm', 'debug/block_1_attn.c_proj.weight_grad_norm', 
'debug/block_1_attn.q_norm.g_grad_norm', 'debug/block_1_attn.k_norm.g_grad_norm', 'debug/block_1_ln_2.g_grad_norm', 'debug/block_1_router.router.mlp.0.weight_grad_norm', 'debug/block_1_router.router.mlp.0.bias_grad_norm', 'debug/block_1_router.router.mlp.3.weight_grad_norm', 'debug/block_1_router.router.mlp.3.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_1_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_1_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_2_ln_1.g_grad_norm', 'debug/block_2_attn.key.weight_grad_norm', 'debug/block_2_attn.key.bias_grad_norm', 'debug/block_2_attn.query.weight_grad_norm', 'debug/block_2_attn.query.bias_grad_norm', 'debug/block_2_attn.value.weight_grad_norm', 'debug/block_2_attn.value.bias_grad_norm', 'debug/block_2_attn.c_proj.weight_grad_norm', 'debug/block_2_attn.q_norm.g_grad_norm', 'debug/block_2_attn.k_norm.g_grad_norm', 'debug/block_2_ln_2.g_grad_norm', 'debug/block_2_router.router.mlp.0.weight_grad_norm', 'debug/block_2_router.router.mlp.0.bias_grad_norm', 'debug/block_2_router.router.mlp.3.weight_grad_norm', 'debug/block_2_router.router.mlp.3.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_2_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_2_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_2_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_3_ln_1.g_grad_norm', 'debug/block_3_attn.key.weight_grad_norm', 'debug/block_3_attn.key.bias_grad_norm', 'debug/block_3_attn.query.weight_grad_norm', 'debug/block_3_attn.query.bias_grad_norm', 'debug/block_3_attn.value.weight_grad_norm', 'debug/block_3_attn.value.bias_grad_norm', 'debug/block_3_attn.c_proj.weight_grad_norm', 'debug/block_3_attn.q_norm.g_grad_norm', 'debug/block_3_attn.k_norm.g_grad_norm', 'debug/block_3_ln_2.g_grad_norm', 'debug/block_3_router.router.mlp.0.weight_grad_norm', 'debug/block_3_router.router.mlp.0.bias_grad_norm', 'debug/block_3_router.router.mlp.3.weight_grad_norm', 'debug/block_3_router.router.mlp.3.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_3_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_3_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_4_ln_1.g_grad_norm', 'debug/block_4_attn.key.weight_grad_norm', 
'debug/block_4_attn.key.bias_grad_norm', 'debug/block_4_attn.query.weight_grad_norm', 'debug/block_4_attn.query.bias_grad_norm', 'debug/block_4_attn.value.weight_grad_norm', 'debug/block_4_attn.value.bias_grad_norm', 'debug/block_4_attn.c_proj.weight_grad_norm', 'debug/block_4_attn.q_norm.g_grad_norm', 'debug/block_4_attn.k_norm.g_grad_norm', 'debug/block_4_ln_2.g_grad_norm', 'debug/block_4_router.router.mlp.0.weight_grad_norm', 'debug/block_4_router.router.mlp.0.bias_grad_norm', 'debug/block_4_router.router.mlp.3.weight_grad_norm', 'debug/block_4_router.router.mlp.3.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_4_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_4_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_5_ln_1.g_grad_norm', 'debug/block_5_attn.key.weight_grad_norm', 'debug/block_5_attn.key.bias_grad_norm', 'debug/block_5_attn.query.weight_grad_norm', 'debug/block_5_attn.query.bias_grad_norm', 'debug/block_5_attn.value.weight_grad_norm', 'debug/block_5_attn.value.bias_grad_norm', 'debug/block_5_attn.c_proj.weight_grad_norm', 'debug/block_5_attn.q_norm.g_grad_norm', 'debug/block_5_attn.k_norm.g_grad_norm', 'debug/block_5_ln_2.g_grad_norm', 'debug/block_5_router.router.mlp.0.weight_grad_norm', 'debug/block_5_router.router.mlp.0.bias_grad_norm', 'debug/block_5_router.router.mlp.3.weight_grad_norm', 'debug/block_5_router.router.mlp.3.bias_grad_norm', 
'debug/block_5_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_5_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_5_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_6_ln_1.g_grad_norm', 'debug/block_6_attn.key.weight_grad_norm', 'debug/block_6_attn.key.bias_grad_norm', 'debug/block_6_attn.query.weight_grad_norm', 'debug/block_6_attn.query.bias_grad_norm', 'debug/block_6_attn.value.weight_grad_norm', 'debug/block_6_attn.value.bias_grad_norm', 'debug/block_6_attn.c_proj.weight_grad_norm', 'debug/block_6_attn.q_norm.g_grad_norm', 'debug/block_6_attn.k_norm.g_grad_norm', 'debug/block_6_ln_2.g_grad_norm', 'debug/block_6_router.router.mlp.0.weight_grad_norm', 'debug/block_6_router.router.mlp.0.bias_grad_norm', 'debug/block_6_router.router.mlp.3.weight_grad_norm', 'debug/block_6_router.router.mlp.3.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_2.mlp.2.weight_grad_norm', 
'debug/block_6_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_6_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_6_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_7_ln_1.g_grad_norm', 'debug/block_7_attn.key.weight_grad_norm', 'debug/block_7_attn.key.bias_grad_norm', 'debug/block_7_attn.query.weight_grad_norm', 'debug/block_7_attn.query.bias_grad_norm', 'debug/block_7_attn.value.weight_grad_norm', 'debug/block_7_attn.value.bias_grad_norm', 'debug/block_7_attn.c_proj.weight_grad_norm', 'debug/block_7_attn.q_norm.g_grad_norm', 'debug/block_7_attn.k_norm.g_grad_norm', 'debug/block_7_ln_2.g_grad_norm', 'debug/block_7_router.router.mlp.0.weight_grad_norm', 'debug/block_7_router.router.mlp.0.bias_grad_norm', 'debug/block_7_router.router.mlp.3.weight_grad_norm', 'debug/block_7_router.router.mlp.3.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_7_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_7_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_8_ln_1.g_grad_norm', 'debug/block_8_attn.key.weight_grad_norm', 'debug/block_8_attn.key.bias_grad_norm', 'debug/block_8_attn.query.weight_grad_norm', 'debug/block_8_attn.query.bias_grad_norm', 'debug/block_8_attn.value.weight_grad_norm', 'debug/block_8_attn.value.bias_grad_norm', 'debug/block_8_attn.c_proj.weight_grad_norm', 'debug/block_8_attn.q_norm.g_grad_norm', 
'debug/block_8_attn.k_norm.g_grad_norm', 'debug/block_8_ln_2.g_grad_norm', 'debug/block_8_router.router.mlp.0.weight_grad_norm', 'debug/block_8_router.router.mlp.0.bias_grad_norm', 'debug/block_8_router.router.mlp.3.weight_grad_norm', 'debug/block_8_router.router.mlp.3.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_8_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_8_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_9_ln_1.g_grad_norm', 'debug/block_9_attn.key.weight_grad_norm', 'debug/block_9_attn.key.bias_grad_norm', 'debug/block_9_attn.query.weight_grad_norm', 'debug/block_9_attn.query.bias_grad_norm', 'debug/block_9_attn.value.weight_grad_norm', 'debug/block_9_attn.value.bias_grad_norm', 'debug/block_9_attn.c_proj.weight_grad_norm', 'debug/block_9_attn.q_norm.g_grad_norm', 'debug/block_9_attn.k_norm.g_grad_norm', 'debug/block_9_ln_2.g_grad_norm', 'debug/block_9_router.router.mlp.0.weight_grad_norm', 'debug/block_9_router.router.mlp.0.bias_grad_norm', 'debug/block_9_router.router.mlp.3.weight_grad_norm', 'debug/block_9_router.router.mlp.3.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_1.mlp.0.project.weight_grad_norm', 
'debug/block_9_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_9_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_9_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_10_ln_1.g_grad_norm', 'debug/block_10_attn.key.weight_grad_norm', 'debug/block_10_attn.key.bias_grad_norm', 'debug/block_10_attn.query.weight_grad_norm', 'debug/block_10_attn.query.bias_grad_norm', 'debug/block_10_attn.value.weight_grad_norm', 'debug/block_10_attn.value.bias_grad_norm', 'debug/block_10_attn.c_proj.weight_grad_norm', 'debug/block_10_attn.q_norm.g_grad_norm', 'debug/block_10_attn.k_norm.g_grad_norm', 'debug/block_10_ln_2.g_grad_norm', 'debug/block_10_router.router.mlp.0.weight_grad_norm', 'debug/block_10_router.router.mlp.0.bias_grad_norm', 'debug/block_10_router.router.mlp.3.weight_grad_norm', 'debug/block_10_router.router.mlp.3.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_10_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_10_experts.expert_3.mlp.2.weight_grad_norm', 'debug/block_11_ln_1.g_grad_norm', 
'debug/block_11_attn.key.weight_grad_norm', 'debug/block_11_attn.key.bias_grad_norm', 'debug/block_11_attn.query.weight_grad_norm', 'debug/block_11_attn.query.bias_grad_norm', 'debug/block_11_attn.value.weight_grad_norm', 'debug/block_11_attn.value.bias_grad_norm', 'debug/block_11_attn.c_proj.weight_grad_norm', 'debug/block_11_attn.q_norm.g_grad_norm', 'debug/block_11_attn.k_norm.g_grad_norm', 'debug/block_11_ln_2.g_grad_norm', 'debug/block_11_router.router.mlp.0.weight_grad_norm', 'debug/block_11_router.router.mlp.0.bias_grad_norm', 'debug/block_11_router.router.mlp.3.weight_grad_norm', 'debug/block_11_router.router.mlp.3.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_0.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_0.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_1.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_1.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_2.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_2.mlp.2.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.weight_grad_norm', 'debug/block_11_experts.expert_3.mlp.0.project.bias_grad_norm', 'debug/block_11_experts.expert_3.mlp.2.weight_grad_norm', 'val_act/lang_act_loss_pp', 'train/action_loss', 'train/total_loss', 'lr-AdamW/pg1', 'lr-AdamW/pg2', 'lr-AdamW/pg3', 'lr-AdamW/pg4', 'lr-AdamW/pg5', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`?
[2026-01-11 12:37:19,633][__main__][ERROR] - ================================================================================
[2026-01-11 23:26:47,573][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:27:25,235][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85ff118d0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 576c6585-8a86-4aae-85fe-79c5b8bf5a05)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:27:25,236][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:27:36,251][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc83d0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 20c72a9e-eec0-4d45-939d-a1bd77a7ab7e)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:27:36,252][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:27:48,267][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc8100>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c953a127-3e05-430c-8cf2-32bd5ee0d9be)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:27:48,269][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:28:02,285][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc87f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 12ae90fd-7faf-436d-b333-c7c4ac947745)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:28:02,286][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:28:04,200][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:28:20,308][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc8af0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: a12f574a-a014-4b0d-85bf-1b62a3711807)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:28:20,311][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:28:38,327][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc8df0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 7abb9f43-796b-449f-a91b-f376f3143f89)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:28:38,791][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded57b0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: ce6f4724-3fe3-4453-b5fd-96e3e96e3d88)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:28:38,799][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:28:48,339][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85ff111e0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 9ffe1431-b74d-468c-9021-3c20a9e79df0)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:28:48,406][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:28:49,811][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded5ea0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: b2321e11-cec5-42de-a7f6-891c93f9e30f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:28:49,814][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:28:59,419][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85ff11480>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 95dd6648-1fd4-45e2-ad2b-3784dacb7a3a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:28:59,435][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:29:01,831][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded5bd0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 72156e75-dae6-4a15-a5f3-78a11ddfa699)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:29:01,833][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:29:11,451][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcaba30>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 242bd94f-19db-4b67-8990-247412094b14)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:29:11,470][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:29:15,851][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded62c0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 2d1b5e32-eed8-4836-89a5-b7d3d13cad5a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:29:15,854][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:29:25,487][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc9030>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4f476237-2e10-43a3-83a9-6fc40302e820)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:29:25,493][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:29:33,875][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded65c0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 1eafbfba-c132-47b1-95b6-0bf1312150ab)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:29:33,878][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:29:43,515][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc8eb0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 385c91da-5256-4122-9b1a-f73666c41ee6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:29:43,519][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:29:51,901][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded68c0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 10b5e95a-4310-4c5f-a7e6-92f52a315054)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:30:01,539][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcc8bb0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: f325c1b3-00f0-408e-a752-352aa912d240)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:30:01,922][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81e97ed10>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: ce5b7a8c-5ac2-4dca-903f-aa640ffb6458)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:30:01,991][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:30:05,821][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:30:07,320][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:30:13,007][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81e97df90>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c677f464-f98e-41a0-aeed-e26d5bedb714)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:30:13,009][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:30:25,027][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded6bf0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 475811a0-4d8c-4d04-8189-81646ee25278)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:30:25,027][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:30:39,043][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded68f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: e28465f0-5c70-44a3-a162-a3e651608182)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:30:39,057][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:30:57,067][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded65f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: d41eccd5-b5f3-4974-9450-e57335d9031a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:30:57,067][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:31:15,079][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81ded62f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 62bcc0a9-8527-456e-be86-873250b8ae9f)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:31:17,686][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:31:18,295][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:31:25,991][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85fcab9d0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4ef2c4bd-22da-4712-8e99-bce6f1082cdb)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:31:25,995][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:31:37,010][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa8a6359de0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: f74d345f-eb4a-4d75-982f-b7ea24c6e049)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:31:37,010][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:31:49,023][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f8d6ce0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 8e687400-056d-49f6-a748-235c6fdb5f3c)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:31:49,024][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:32:03,039][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f8d7ee0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 0b984ed9-ee53-487d-91fb-cd06abc44dd1)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:32:03,065][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:32:21,083][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f8d7b20>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 42e4f4fe-25dd-43bd-a6dd-241a40fb99e7)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:32:21,084][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:32:39,103][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f8e43a0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 62ad11c1-5362-47d4-9a93-450eb33107a5)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:32:49,115][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f769030>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: f219f247-2567-4373-982e-1387eb55ef56)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:32:49,116][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:33:00,131][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa8be25eb90>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 4f6a7979-3962-42ac-92f2-c35f74e36b83)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:33:00,132][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:33:12,148][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa7b9358280>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 76f519c7-822d-4c7e-a691-1ec7c9a40f15)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:33:12,148][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:33:26,166][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f516560>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 188eeb52-4d05-4749-808c-b13fb3492cd2)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:33:26,166][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:33:44,184][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa7b9358310>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 69d34dce-1970-488c-afb1-fe871ed4df66)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:33:44,185][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:33:59,466][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df2c5e0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 3b7ff38f-022b-43fc-9744-a89614de5ac6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:33:59,532][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:34:02,203][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fa85f76be20>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 0e9f0fe1-3493-4bb1-91b0-a87c3256799e)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:34:10,547][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81c1a56f0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: b90d5d09-daef-4bea-964b-2942280ecd2e)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:34:10,551][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:34:22,559][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df21ba0>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: f3a3713a-9a1b-4c89-a8e5-5aa49ceca7f5)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:34:22,561][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:34:30,997][root][INFO] - Creating EMA weights copy.
[2026-01-11 23:34:36,578][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df23880>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c5806ebf-73e4-49bb-80a3-9a10722e3511)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:34:36,582][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:34:54,604][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df21270>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 641845ba-b43f-4066-98d8-37c3bbf8ca1b)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:34:54,606][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:35:12,618][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df21900>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 2663fb79-269d-4b44-821f-0438b243a4eb)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/tokenizer_config.json
[2026-01-11 23:35:22,630][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df20e80>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c2cd55f3-39c2-446b-937a-45885bd21877)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:35:22,631][huggingface_hub.utils._http][WARNING] - Retrying in 1s [Retry 1/5].
[2026-01-11 23:35:33,643][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df20a30>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: f3cac1d6-1ce1-4594-af62-f91dd149b021)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:35:33,643][huggingface_hub.utils._http][WARNING] - Retrying in 2s [Retry 2/5].
[2026-01-11 23:35:45,658][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81e97f010>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: dd20cdd4-64dc-4904-9e93-27809f092854)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:35:45,659][huggingface_hub.utils._http][WARNING] - Retrying in 4s [Retry 3/5].
[2026-01-11 23:35:59,676][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81e947730>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: 0c40b4e0-4a49-48b4-9d37-8abd5555b42a)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:35:59,677][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 4/5].
[2026-01-11 23:36:17,691][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df20b80>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: c77ce026-5f90-44d9-b1a6-4997de18be25)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:36:17,691][huggingface_hub.utils._http][WARNING] - Retrying in 8s [Retry 5/5].
[2026-01-11 23:36:35,793][huggingface_hub.utils._http][WARNING] - '(MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /openai/clip-vit-base-patch32/resolve/main/config.json (Caused by ConnectTimeoutError(<HTTPSConnection(host='hf-mirror.com', port=443) at 0x7fd81df20d60>, 'Connection to hf-mirror.com timed out. (connect timeout=10)'))"), '(Request ID: ecf535fa-a8a2-4daf-8633-57c342bc5bd6)')' thrown while requesting HEAD https://hf-mirror.com/openai/clip-vit-base-patch32/resolve/main/config.json
[2026-01-11 23:37:05,395][root][INFO] - Creating EMA weights copy.
[2026-01-11 23:46:11,745][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:46:43,730][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:46:44,356][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:46:57,879][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:47:27,131][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:47:27,911][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:48:20,013][root][INFO] - Creating EMA weights copy.
[2026-01-11 23:49:03,155][root][INFO] - Creating EMA weights copy.
[2026-01-11 23:53:20,801][datasets][INFO] - PyTorch version 2.2.2+cu118 available.
[2026-01-11 23:53:51,742][mode.models.networks.modedit][INFO] - Weights initialized using custom _init_weights method
[2026-01-11 23:53:52,649][dinov2][INFO] - using MLP layer as FFN
[2026-01-11 23:56:37,442][root][INFO] - Creating EMA weights copy.