How do I use this?
I'm really confused; I don't see anyone else running into problems, but I've run out of avenues to explore.
I'm trying to run the default .json you've included. I've also tried the one embedded in the provided image, and I get the same error either way.
CLIPLoader
Error(s) in loading state_dict for Llama2:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([151936, 1024]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
etc...
I've tried updating ComfyUI (0.80.0), running a fresh venv, and checking for node updates; nothing helps.
Sorry if the answer is really simple. I'm literally just importing your .json and hitting run, with no luck.
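For context, the errors above are PyTorch's standard strict `load_state_dict` failure: the tensor shapes saved in the checkpoint file don't match the model the loader built. Here that suggests the wrong text-encoder file is selected in CLIPLoader: the checkpoint's 151936-token embedding is characteristic of Qwen-family encoders, while the expected 128256 × 4096 shapes match a Llama-3-8B-style encoder (this is an inference from the shapes, not something the log states directly). A minimal sketch of the same failure class, using hypothetical stand-in layers whose sizes mirror `q_proj` from the log:

```python
import torch.nn as nn

# Stand-ins for the mismatched encoders (not actual CLIPLoader code):
# the checkpoint on disk was saved from a smaller architecture than
# the model the loader constructs, so strict loading must fail.
saved = nn.Linear(1024, 2048, bias=False)     # checkpoint-side weight: [2048, 1024]
expected = nn.Linear(4096, 4096, bias=False)  # loader-side weight: [4096, 4096]

try:
    # strict=True (the default) raises on any shape mismatch
    expected.load_state_dict(saved.state_dict())
except RuntimeError as err:
    print("size mismatch" in str(err))  # True
```

The fix is to point the node at the matching text-encoder checkpoint, not to retry loading; no setting makes mismatched shapes load.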
There is an official template included in ComfyUI.
Be sure to update!
Same problem here; the CLIP fails to load. Any suggestions?
CLIPLoader
Error(s) in loading state_dict for Llama2:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([151936, 1024]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.0.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.0.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.0.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.0.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
(... the same size mismatches repeat for model.layers.1 through model.layers.19 ...)
size mismatch for model.layers.19.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.19.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.20.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.20.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.20.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.20.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.20.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.20.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.20.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.20.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.20.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.21.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.21.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.21.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.21.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.21.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.21.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.21.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.21.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.21.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.22.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.22.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.22.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.22.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.22.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.22.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.22.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.22.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.22.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.23.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.23.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.23.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.23.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.23.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.23.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.23.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.23.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.23.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.24.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.24.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.24.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.24.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.24.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.24.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.24.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.24.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.24.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.25.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.25.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.25.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.25.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.25.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.25.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.25.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.25.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.25.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.26.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.26.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.26.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.26.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.26.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.26.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.26.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.26.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.26.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.27.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.27.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.27.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for model.layers.27.self_attn.o_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.27.mlp.gate_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.27.mlp.up_proj.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([14336, 4096]).
size mismatch for model.layers.27.mlp.down_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([4096, 14336]).
size mismatch for model.layers.27.input_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.layers.27.post_attention_layernorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for model.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([4096]).
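For anyone else debugging this: the left-hand shapes in the error come from the checkpoint file CLIPLoader was given, and the right-hand shapes are what the expected Llama-class text encoder needs (128256 is the Llama 3 vocabulary size, 4096 its hidden size), so the loader is most likely being pointed at the wrong .safetensors file. A quick way to see what a file actually contains, without loading any weights, is to read the safetensors header, which is an 8-byte little-endian length followed by a JSON index of tensor names and shapes. This is just a diagnostic sketch; `safetensors_shapes` is a hypothetical helper name, and the key `model.embed_tokens.weight` is taken from the error log above.

```python
import json
import struct
import sys


def safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors file by
    parsing only its JSON header (no weights are loaded)."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian uint64 length of the JSON header.
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    # Skip the optional "__metadata__" entry; everything else is a tensor.
    return {k: v["shape"] for k, v in header.items() if k != "__metadata__"}


if __name__ == "__main__" and len(sys.argv) > 1:
    shapes = safetensors_shapes(sys.argv[1])
    print("embed_tokens shape:", shapes.get("model.embed_tokens.weight"))
    # A first dimension of 128256 suggests a Llama-3-class encoder (what the
    # workflow expects); 151936 suggests a Qwen-family checkpoint instead.
```

If the embedding shape printed here doesn't match the "current model" shapes in the error, swap in the text encoder file the workflow's template actually calls for.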