
	---
	license: apache-2.0
	base_model: google/siglip2-base-patch16-224
	tags:
	- onnx
	- vision
	- image-text-matching
	- nebula
	---

	# siglip2-base-patch16-224 (ONNX)

	This is [Google's SigLIP 2 base/224](https://huggingface.co/google/siglip2-base-patch16-224) exported to ONNX format for CPU inference, used by [Nebula](https://github.com/diegohh0411/nebula) for local, offline image search.

	## What's inside

	| File | Description |
	|---|---|
	| `model.onnx` | Combined vision + text encoder (~110 MB) |
	| `tokenizer.json` | SigLIP tokenizer |

	## Model inputs & outputs

	The single `model.onnx` file contains both encoders. To run only one of them, pass a dummy tensor for the other input and ignore the outputs that depend on it.
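
The dummy-tensor approach could be sketched as below. This is a minimal illustration, not the code Nebula uses: the helper names, the placeholder length `DUMMY_SEQ_LEN`, and the placeholder id `DUMMY_TOKEN_ID` are assumptions, and `session` stands for any object with an ONNX Runtime style `run(output_names, input_feed)` method.

```python
import numpy as np

IMAGE_SIZE = 224      # from the `pixel_values` shape documented in this card
DUMMY_SEQ_LEN = 64    # assumption: placeholder sequence length for the unused text branch
DUMMY_TOKEN_ID = 0    # assumption: placeholder token id; its embedding is discarded


def image_only_feeds(pixel_values):
    """Build feeds for encoding images; `input_ids` is a throwaway placeholder."""
    pixel_values = np.asarray(pixel_values, dtype=np.float32)
    dummy_ids = np.full((1, DUMMY_SEQ_LEN), DUMMY_TOKEN_ID, dtype=np.int64)
    return {"pixel_values": pixel_values, "input_ids": dummy_ids}


def text_only_feeds(input_ids):
    """Build feeds for encoding text; `pixel_values` is a throwaway placeholder."""
    input_ids = np.asarray(input_ids, dtype=np.int64)
    dummy_pixels = np.zeros((1, 3, IMAGE_SIZE, IMAGE_SIZE), dtype=np.float32)
    return {"pixel_values": dummy_pixels, "input_ids": input_ids}


def encode(session, feeds, output_name):
    """Run the session and return only the requested output array."""
    return session.run([output_name], feeds)[0]
```

With ONNX Runtime installed, `session` would typically be `onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])`, and `encode(session, image_only_feeds(batch), "image_embeds")` would return the image embeddings while the text-dependent outputs are simply not requested.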

	Inputs

	| Name | Shape | dtype |
	|---|---|---|
	| `pixel_values` | `[image_batch, 3, 224, 224]` | float32 |
	| `input_ids` | `[text_batch, seq_len]` | int64 |
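
As a rough sketch of how an image could be turned into a `pixel_values` tensor of the shape listed above: the function name `to_pixel_values` is hypothetical, and the rescale to `[0, 1]` followed by normalization with mean 0.5 and std 0.5 is an assumption based on upstream SigLIP preprocessing, so check the upstream `preprocessor_config.json` before relying on it. Resizing to 224×224 is left to an image library of your choice.

```python
import numpy as np


def to_pixel_values(images_uint8, mean=0.5, std=0.5):
    """Convert RGB uint8 images [N, 224, 224, 3] to float32 [N, 3, 224, 224].

    Assumes the images are already resized to 224x224.
    mean/std of 0.5 are assumed defaults, not confirmed by this card.
    """
    x = np.asarray(images_uint8)
    if x.ndim == 3:                       # single image: add a batch axis
        x = x[None]
    if x.shape[1:] != (224, 224, 3):
        raise ValueError(f"expected [N, 224, 224, 3], got {x.shape}")
    x = x.astype(np.float32) / 255.0      # rescale to [0, 1]
    x = (x - mean) / std                  # normalize to roughly [-1, 1]
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))  # NHWC -> NCHW
```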

	Outputs

	| Name | Shape | dtype | Description |
	|---|---|---|---|
	| `image_embeds` | `[image_batch, 768]` | float32 | Image embedding (L2-normalize before computing cosine similarity) |
	| `text_embeds` | `[text_batch, 768]` | float32 | Text embedding (L2-normalize before computing cosine similarity) |
	| `logits_per_image` | `[image_batch, text_batch]` | float32 | Image-to-text similarity logits (upstream SigLIP applies a learned scale and bias to the cosine similarity) |
	| `logits_per_text` | `[text_batch, image_batch]` | float32 | Text-to-image similarity logits (transpose of `logits_per_image`) |
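
To illustrate how these outputs might be consumed, here is a small NumPy sketch. The helper names are hypothetical. `match_probabilities` assumes the logits follow the upstream SigLIP convention, where a sigmoid turns each image-text logit into an independent match probability; `cosine_similarity` works directly from the embeddings and normalizes them itself, so it does not depend on whether the exported embeddings are already unit length.

```python
import numpy as np


def l2_normalize(embeds, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero vectors)."""
    embeds = np.asarray(embeds, dtype=np.float32)
    norms = np.linalg.norm(embeds, axis=-1, keepdims=True)
    return embeds / np.maximum(norms, eps)


def cosine_similarity(image_embeds, text_embeds):
    """Return a [image_batch, text_batch] matrix of cosine similarities."""
    return l2_normalize(image_embeds) @ l2_normalize(text_embeds).T


def match_probabilities(logits):
    """Element-wise sigmoid; assumes SigLIP-style pairwise logits."""
    logits = np.asarray(logits, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-logits))


def rank_images(scores, k=5):
    """Indices of the k highest-scoring images for one text query."""
    scores = np.asarray(scores)
    return np.argsort(-scores)[:k]
```

For a search use case, image embeddings can be computed once and stored, and each query then needs only one text-encoder pass followed by `cosine_similarity` and `rank_images`.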

	## How it was exported

	```bash
	optimum-cli export onnx \
	--model google/siglip2-base-patch16-224 \
	--task zero-shot-image-classification \
	--opset 18 \
	./models/
	```

	Requires `optimum[onnxruntime]` and `transformers`.

	## License

	This export inherits the [Apache 2.0](https://huggingface.co/google/siglip2-base-patch16-224) license of the original Google SigLIP 2 model.