Crash when reloading a trained checkpoint
#17 by BenedictWJIrwin - opened
I ran a small training job and saw that it produced some checkpoints. When I come back to run inference with one of these new model checkpoints, I get a crash:
File "/home/ec2-user/micromamba/envs/saq2/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2629, in load_state_dict
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for BindingAffinityPredictor:
Missing key(s) in state_dict: "model_1.binding_affinity_head.ReLULayers.0.0.weight", "model_1.binding_affinity_head.ReLULayers.0.0.bias", "model_1.binding_affinity_head.ReLULayers.1.0.weight", "model_1.binding_affinity_head.ReLULayers.1.0.bias", "model_1.binding_affinity_head.linear_out.weight", "model_1.binding_affinity_head.linear_out.bias", "model_1.binding_classifier_head.ReLULayers.0.0.weight", "model_1.binding_classifier_head.ReLULayers.0.0.bias", "model_1.binding_classifier_head.ReLULayers.1.0.weight", "model_1.binding_classifier_head.ReLULayers.1.0.bias", "model_1.binding_classifier_head.linear_out.weight", "model_1.binding_classifier_head.linear_out.bias", "model_1.binding_classifier_head.linear_binary.weight", "model_1.binding_classifier_head.linear_binary.bias", "model_1.s_input_i_mlp.weight", "model_1.s_input_j_mlp.weight", "model_1.transition_z.0.layer_norm.weight", "model_1.transition_z.0.layer_norm.bias", "model_1.transition_z.0.swiglu.linear_a.weight", "model_1.transition_z.0.swiglu.linear_b.weight", "model_1.transition_z.0.linear_out.weight", "model_1.transition_z.1.layer_norm.weight", "model_1.transition_z.1.layer_norm.bias", "model_1.transition_z.1.swiglu.linear_a.weight"
...
Is this perhaps because one of the heads is missing from the repo? Any thoughts?
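One way to narrow this down is to compare the key names the model expects with the key names actually stored in the checkpoint, grouped by submodule. The following is a minimal sketch, not the project's own tooling: the helper `diff_state_dict_keys` is hypothetical, and the `saved` list is invented purely for illustration. In practice the two key lists would come from `model.state_dict().keys()` and from the keys of the loaded checkpoint's state dict.

```python
from collections import Counter


def diff_state_dict_keys(model_keys, checkpoint_keys, depth=2):
    """Compare the keys a model expects with the keys stored in a checkpoint.

    Hypothetical helper for illustration; it only looks at key names.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    # Expected by the model but absent from the checkpoint.
    missing = sorted(model_keys - checkpoint_keys)
    # Present in the checkpoint but unknown to the model.
    unexpected = sorted(checkpoint_keys - model_keys)

    def by_prefix(keys):
        # Group by the first `depth` dotted components,
        # e.g. "model_1.binding_affinity_head".
        return Counter(".".join(k.split(".")[:depth]) for k in keys)

    return missing, unexpected, by_prefix(missing), by_prefix(unexpected)


# Keys the model expects (a few taken from the error message above).
expected = [
    "model_1.binding_affinity_head.linear_out.weight",
    "model_1.binding_affinity_head.linear_out.bias",
    "model_1.s_input_i_mlp.weight",
]
# Keys in the checkpoint: INVENTED for this example, not read from a real file.
saved = [
    "model_1.s_input_i_mlp.weight",
    "model_0.binding_affinity_head.linear_out.weight",
]

missing, unexpected, missing_groups, unexpected_groups = diff_state_dict_keys(
    expected, saved
)
print("missing by submodule:", dict(missing_groups))
print("unexpected by submodule:", dict(unexpected_groups))
```

If the unexpected keys turn out to be the same names under a different prefix, that would point to a naming mismatch between how the checkpoint was saved and how the model is built, rather than to weights that were never saved. Calling `load_state_dict(..., strict=False)` also returns the `missing_keys` and `unexpected_keys` lists without raising, which gives the same information directly from PyTorch.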