Instructions for using google/siglip2-base-patch16-224 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use google/siglip2-base-patch16-224 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="google/siglip2-base-patch16-224")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("google/siglip2-base-patch16-224", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
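Beyond the snippets above, inference can also be run with the lower-level `AutoProcessor`/`AutoModel` API. A minimal sketch, assuming the parrots image from the pipeline example; the candidate prompts are illustrative, and `padding="max_length", max_length=64` follows the SigLIP 2 processor convention:

```python
# Zero-shot classification with the low-level API: encode the image and the
# candidate prompts jointly, then apply a sigmoid (not a softmax) to the logits.
import torch
import requests
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-224")

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of animals", "a photo of humans", "a photo of a landscape"]

inputs = processor(text=texts, images=image, padding="max_length", max_length=64, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores each image-text pair independently, so the per-label
# probabilities need not sum to 1.
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```

Unlike CLIP-style models, there is no softmax over the candidate labels, so each probability can be read as a standalone match score.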
Upload README.md with huggingface_hub
README.md (CHANGED):

````diff
--- README.md
+++ README.md
@@ -2,15 +2,20 @@
 license: apache-2.0
 tags:
 - vision
+widget:
+- src: >-
+    https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg
+  candidate_labels: bee in the sky, bee on the flower
+  example_title: Bee
+library_name: transformers
+pipeline_tag: zero-shot-image-classification
 ---
 
 # SigLIP 2 Base
 
-[SigLIP 2](https://huggingface.co/
-
-
-with prior, independently developed techniques into a unified recipe, for improved semantic
-understanding, localization, and dense features.
+[SigLIP 2](https://huggingface.co/papers/2502.14786) extends the pretraining objective of
+[SigLIP](https://huggingface.co/papers/2303.15343) with prior, independently developed techniques
+into a unified recipe, for improved semantic understanding, localization, and dense features.
 
 ## Intended uses
 
@@ -80,10 +85,18 @@ The model was trained on up to 2048 TPU-v5e chips.
 
 Evaluation of SigLIP 2 is shown below (taken from the paper).
 
-[Evaluation Table](
+
 
 ### BibTeX entry and citation info
 
 ```bibtex
-
+@misc{tschannen2025siglip2multilingualvisionlanguage,
+      title={SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features},
+      author={Michael Tschannen and Alexey Gritsenko and Xiao Wang and Muhammad Ferjad Naeem and Ibrahim Alabdulmohsin and Nikhil Parthasarathy and Talfan Evans and Lucas Beyer and Ye Xia and Basil Mustafa and Olivier Hénaff and Jeremiah Harmsen and Andreas Steiner and Xiaohua Zhai},
+      year={2025},
+      eprint={2502.14786},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2502.14786},
+}
 ```
````
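The SigLIP objective that SigLIP 2 extends, mentioned in the README above, is a pairwise sigmoid loss: every image-text pair in a batch is scored as an independent binary match/non-match classification, rather than with CLIP's batch-wide softmax. A minimal NumPy sketch; the function name and the temperature `t` and bias `b` values are illustrative, not the trained parameters:

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # L2-normalize embeddings so the logits are scaled cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T * t + b          # (n, n) pairwise logits
    n = img.shape[0]
    labels = 2 * np.eye(n) - 1            # +1 on the diagonal (matches), -1 elsewhere
    # Binary log-loss per pair, averaged over all n^2 pairs in the batch.
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-labels * logits))))
```

Because each pair contributes an independent sigmoid term, the loss needs no global normalization across the batch, which is what makes the objective robust to batch-size changes.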