Error loading the SigLIP2 Vision model
Description
Running the code snippet provided in the documentation for Siglip2VisionModel yields a RuntimeError due to shape mismatch when loading the model checkpoint.
Steps to Reproduce
- Install transformers library
- Run the following code:
from transformers import Siglip2VisionModel
model = Siglip2VisionModel.from_pretrained("google/siglip2-base-patch16-224")
Expected Behavior
The model should load successfully without errors.
Actual Behavior
The following error is raised:
You are using a model of type siglip_vision_model to instantiate a model of type siglip2_vision_model.
This is not supported for all configurations of models and can yield errors.
[...]
RuntimeError: Error(s) in loading state_dict for Linear:
size mismatch for weight: copying a param with shape torch.Size([768, 3, 16, 16])
from checkpoint, the shape in current model is torch.Size([768, 768]).
Additional Investigation
Loading a SigLIP2 checkpoint with AutoModel silently falls back to the wrong classes, instantiating SigLIP classes instead of SigLIP2 ones:
from transformers import AutoModel
model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")
model.vision_model.__class__
# Output: transformers.models.siglip.modeling_siglip.SiglipVisionTransformer
Root Cause Analysis
I believe this is because this checkpoint, like most other SigLIP2 checkpoints, is defined in src/transformers/models/siglip/convert_siglip_to_hf.py rather than in src/transformers/models/siglip2/convert_siglip2_to_hf.py.
Proposed Solution
Porting the SigLIP2 checkpoints to the SigLIP2 conversion file (src/transformers/models/siglip2/convert_siglip2_to_hf.py) may fix the error.
Environment
- transformers version: 4.52.4
- PyTorch version: 2.6.0
- Python version: 3.11.11
- Operating System: linux
Additional Context
This affects the google/siglip2-base-patch16-224 checkpoint and all other SigLIP2 checkpoints except google/siglip2-base-patch16-naflex and google/siglip2-so400m-patch16-naflex, which are correctly defined in src/transformers/models/siglip2/convert_siglip2_to_hf.py.
Hello, I've also encountered the same problem. Could you please tell me how you solved it?
Unfortunately, I haven't. I'm currently using checkpoints from outside the Hugging Face Hub transformers integration. The following works for me with open_clip_torch==3.2.0 and timm==1.0.15:
from open_clip import create_model_from_pretrained # works on open-clip-torch >= 2.31.0, timm >= 1.0.15
siglip2, preprocess = create_model_from_pretrained('hf-hub:timm/ViT-L-16-SigLIP2-512')
Thanks, I just saw this discussion. It says that Siglip2Model works only for the -naflex checkpoints, and the other checkpoints should be loaded with SiglipModel (the same applies to SiglipVisionModel / Siglip2VisionModel).
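Based on that discussion, the rule can be sketched as a small helper that picks the class name from the checkpoint id. This helper is hypothetical (not part of transformers) and assumes the behavior described above: only the two -naflex SigLIP2 checkpoints use the Siglip2 classes, and everything else needs the original Siglip classes.

```python
def pick_vision_class_name(checkpoint: str) -> str:
    """Return the transformers vision class name expected to load `checkpoint`.

    Hypothetical helper illustrating the rule from the linked discussion:
    only -naflex SigLIP2 checkpoints load with the Siglip2 classes.
    """
    if "siglip2" in checkpoint and checkpoint.endswith("-naflex"):
        return "Siglip2VisionModel"
    return "SiglipVisionModel"

print(pick_vision_class_name("google/siglip2-base-patch16-naflex"))  # Siglip2VisionModel
print(pick_vision_class_name("google/siglip2-base-patch16-224"))     # SiglipVisionModel
```

So for google/siglip2-base-patch16-224 the workaround is simply to use SiglipVisionModel.from_pretrained(...) instead of Siglip2VisionModel.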