SigLIP2 or SigLIP1

by JosephusCheung - opened May 21, 2025

May 21, 2025

License

BAGEL is licensed under the Apache 2.0 license. It is finetuned from Qwen2.5-7B-Instruct and siglip-so400m-14-980-flash-attn2-navit model, and uses the FLUX.1-schnell VAE model, all under Apache 2.0.

siglip-so400m-14-980-flash-attn2-navit by HuggingFaceM4 is SigLIP1, but your paper says:

We adopt SigLIP2-so400m/14 [74] with a fixed 384-resolution as the initialization of the ViT encoder. Building

tsutikgiau

ByteDance Seed org May 21, 2025

Thanks for pointing out this issue! We use siglip-so400m-14-384-flash-attn2. The information in the license has been updated.

JosephusCheung

May 21, 2025

I'm still confused.

'siglip-so400m-14-384-flash-attn2' seems to be SigLIP1. But SigLIP2-so400m/14 was mentioned in your paper.

bearcat

May 21, 2025

I'm still confused.

'siglip-so400m-14-384-flash-attn2' seems to be SigLIP1. But SigLIP2-so400m/14 was mentioned in your paper.

Let me clarify this. We use SigLIP2-so400m/14 with a 384x384 input resolution, then we interpolate the position embeddings to 980x980.
This is actually what siglip-so400m-14-384-flash-attn2 has done to siglip-so400m-14-384.
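
For readers unfamiliar with the step described above, here is a minimal sketch of resizing a ViT position-embedding table from a 384x384 input to a 980x980 input. It is not the BAGEL code: the thread does not say which interpolation mode was used, so bilinear interpolation with endpoints mapped to endpoints is an assumption, the helper name `interpolate_pos_embed` is hypothetical, and the embedding width of 8 is a toy value. With a patch size of 14, a 384 input gives a 27x27 grid (729 positions) and a 980 input gives a 70x70 grid (4900 positions).

```python
import numpy as np


def interpolate_pos_embed(pos_embed: np.ndarray, old_grid: int, new_grid: int) -> np.ndarray:
    """Bilinearly resize a (old_grid**2, dim) position table to (new_grid**2, dim).

    Hypothetical helper for illustration; the interpolation mode is an assumption.
    """
    dim = pos_embed.shape[1]
    grid = pos_embed.reshape(old_grid, old_grid, dim)

    # Coordinates of the new grid points, expressed in old-grid units.
    # Endpoints map to endpoints (an "align corners" style convention).
    coords = np.linspace(0.0, old_grid - 1, new_grid)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, old_grid - 1)
    frac = coords - lo

    # Interpolate along the row axis, then along the column axis.
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return out.reshape(new_grid * new_grid, dim)


patch = 14
old_grid = 384 // patch  # 27 patches per side
new_grid = 980 // patch  # 70 patches per side

rng = np.random.default_rng(0)
pos = rng.standard_normal((old_grid * old_grid, 8))  # toy embedding width
resized = interpolate_pos_embed(pos, old_grid, new_grid)
print(resized.shape)
```

The pretrained weights stay the same under this kind of resizing; only the position table is resampled so the encoder can accept more patches per image.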
