Google's SigLIP is another alternative to OpenAI's CLIP, it just got merged into 🤗 transformers, and it's super easy to use!
To celebrate this, I have created a repository of notebooks and a bunch of Spaces on various SigLIP-based projects 🥳
Search for art 👉 merve/draw_to_search_art
Compare SigLIP with CLIP 👉 merve/compare_clip_siglip
How does SigLIP work?
SigLIP is a vision-text pre-training technique based on contrastive learning. It jointly trains an image encoder and a text encoder such that the dot product of the embeddings is highest for matching image-text pairs.
The image below is taken from the CLIP paper, where this contrastive pre-training is done with a softmax; SigLIP replaces the softmax with a sigmoid. 📎
Highlights from the paper on why you should use it ✨
🖼️📝 The authors used a medium-sized B/16 ViT for the image encoder and a B-sized transformer for the text encoder
😍 More performant than CLIP on zero-shot classification
🗣️ The authors trained a multilingual model too!
⚡️ Super efficient: the sigmoid loss enables batch sizes of up to 1M items, but the authors chose 32k because performance saturates beyond that (see the sketch below)
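To make the softmax vs. sigmoid point concrete, here is a minimal sketch of the pairwise sigmoid loss described in the paper (my own PyTorch simplification, not the authors' code; shapes and names are illustrative). Every image-text pair in the batch becomes an independent binary classification, so no batch-wide normalization is needed:

import torch
import torch.nn.functional as F

def siglip_loss(image_embeds, text_embeds, temperature, bias):
    # L2-normalize both sets of embeddings so the dot product is a cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # pairwise logits: every image in the batch against every text
    logits = image_embeds @ text_embeds.t() * temperature + bias
    # +1 on the diagonal (matching pairs), -1 everywhere else (non-matching pairs)
    n = logits.size(0)
    labels = 2 * torch.eye(n, device=logits.device) - 1
    # each pair is an independent binary problem; no softmax over the batch is needed
    return -F.logsigmoid(labels * logits).sum() / n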
It's super easy to use thanks to transformers 👇
from transformers import pipeline
from PIL import Image
import requests
# load the zero-shot image classification pipeline with the multilingual SigLIP checkpoint
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-base-patch16-256-i18n")
# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
# run zero-shot classification with free-form candidate labels
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
outputs = [{"score": round(output["score"], 4), "label": output["label"] } for output in outputs]
print(outputs)

For all the SigLIP notebooks on similarity search and indexing, you can check this [repository](https://github.com/merveenoyan/siglip) out. 🤗
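If you need raw embeddings rather than the classification pipeline (for example for the similarity search and indexing notebooks), here is a minimal sketch using AutoModel with the same checkpoint; the example image, texts, and the final dot-product comparison are illustrative, not from the repository:

import torch
import requests
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("google/siglip-base-patch16-256-i18n")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-256-i18n")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    # image embedding
    image_inputs = processor(images=image, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)
    # text embeddings (SigLIP was trained with padding="max_length")
    text_inputs = processor(text=["2 cats", "a plane"], padding="max_length", return_tensors="pt")
    text_embeds = model.get_text_features(**text_inputs)

# normalize and compare with a dot product, e.g. before putting the vectors in an index
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
print(image_embeds @ text_embeds.t())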