varbspkrl committed
Commit 9a450cb · verified · Parent: f9cc239

config_update


Fixes a weights shape mismatch when loading the checkpoint: `text_config` in config.json was missing explicit `num_attention_heads` and `head_dim`, so the attention projections were built with the wrong shapes.

Script:
```python
from transformers import LlavaForConditionalGeneration, AutoProcessor

model_id = "hf-internal-testing/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# The Pixtral processor accepts image URLs directly.
IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

inputs = processor(images=IMG_URLS, text=PROMPT, return_tensors="pt")
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)
```

Error:
```
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [00:30<00:00, 5.16s/it]
Traceback (most recent call last):
File "/Users/varb/exo/test.py", line 70, in <module>
model = LlavaForConditionalGeneration.from_pretrained(model_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/varb/transformers/src/transformers/modeling_utils.py", line 3976, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/varb/transformers/src/transformers/modeling_utils.py", line 4511, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlavaForConditionalGeneration:
size mismatch for language_model.model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.1.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.1.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.2.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.2.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.2.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.2.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.3.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.3.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.3.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.3.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.4.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.4.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.4.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.4.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.5.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.5.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.5.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.5.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.6.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.6.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.6.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.6.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.7.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.7.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.7.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.7.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.8.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.8.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.8.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.8.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.9.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.9.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.9.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.9.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.10.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size ... [log truncated; the same q/k/v/o mismatches repeat for the remaining layers]
```

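The mismatch is plain arithmetic. The checkpoint stores attention weights for 32 query heads of dimension 128 (and, judging from the 1024-row k/v projections, 8 KV heads; `num_key_value_heads` itself is not part of this diff). Without an explicit `head_dim` in `text_config`, transformers derives `head_dim = hidden_size // num_attention_heads = 5120 // 32 = 160` and builds larger projections. A quick sanity check:

```python
hidden_size = 5120
num_attention_heads = 32                 # added to config.json by this commit
head_dim = 128                           # added to config.json by this commit
num_key_value_heads = 1024 // head_dim   # = 8, inferred from the checkpoint's k_proj shape

# Shapes the checkpoint actually contains:
assert (num_attention_heads * head_dim, hidden_size) == (4096, 5120)   # q_proj
assert (num_key_value_heads * head_dim, hidden_size) == (1024, 5120)   # k_proj / v_proj

# Shapes the model was built with before the fix (head_dim defaulted to 5120 // 32 = 160):
default_head_dim = hidden_size // num_attention_heads
assert (num_attention_heads * default_head_dim, hidden_size) == (5120, 5120)       # q_proj in the error
assert (num_key_value_heads * default_head_dim, hidden_size) == (1280, 5120)       # k_proj in the error
```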
Files changed (1): config.json (+2 -0)
config.json CHANGED
```diff
@@ -11,6 +11,8 @@
   "text_config": {
     "hidden_size": 5120,
     "intermediate_size": 14336,
+    "num_attention_heads": 32,
+    "head_dim": 128,
     "max_position_embeddings": 1024000,
     "model_type": "mistral",
     "num_hidden_layers": 40,
```