# siglip2-base-patch16-224 (ONNX)

This is Google's SigLIP 2 base/224 exported to ONNX format for CPU inference, used by Nebula for local, offline image search.

## What's inside

| File | Description |
|---|---|
| `model.onnx` | Combined vision + text encoder (~110 MB) |
| `tokenizer.json` | SigLIP tokenizer |

## Model inputs & outputs

The single `model.onnx` file contains both encoders. You can run either encoder on its own by passing a dummy tensor for the unused branch's input.
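For example, to embed text only, a minimal onnxruntime sketch (the session setup, helper names, and model path are assumptions for illustration, not part of this repo):

```python
import numpy as np

def make_dummy_pixels(batch: int = 1) -> np.ndarray:
    """Zero-filled image tensor to feed the unused vision branch."""
    return np.zeros((batch, 3, 224, 224), dtype=np.float32)

def encode_text(session, input_ids: np.ndarray) -> np.ndarray:
    """Run only the text encoder of the combined model.

    `session` is an onnxruntime.InferenceSession over model.onnx (assumed path);
    `input_ids` comes from the SigLIP tokenizer, shape [text_batch, seq_len].
    """
    (text_embeds,) = session.run(
        ["text_embeds"],
        {
            "pixel_values": make_dummy_pixels(),  # dummy tensor for the vision branch
            "input_ids": input_ids.astype(np.int64),
        },
    )
    return text_embeds  # shape [text_batch, 768]
```

Create the session with `onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])`; the mirror-image call (real `pixel_values`, dummy `input_ids`) yields `image_embeds`.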

### Inputs

| Name | Shape | dtype |
|---|---|---|
| `pixel_values` | `[image_batch, 3, 224, 224]` | float32 |
| `input_ids` | `[text_batch, seq_len]` | int64 |
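`pixel_values` should be 224×224 RGB normalized the way SigLIP was trained; to my understanding that is mean 0.5 / std 0.5 per channel (values in [-1, 1]) — verify against the exporter's preprocessor config. A NumPy-only sketch, leaving the resize to your image library of choice:

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """uint8 HWC RGB image, already resized to 224x224 -> float32 [1, 3, 224, 224].

    Normalization assumes SigLIP's mean=0.5, std=0.5 per channel.
    """
    assert image.shape == (224, 224, 3), "resize to 224x224 first"
    x = image.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - 0.5) / 0.5                   # shift/scale to [-1, 1]
    x = x.transpose(2, 0, 1)              # HWC -> CHW
    return x[np.newaxis, ...]             # add batch dimension
```

With PIL, for instance: `np.asarray(Image.open(path).convert("RGB").resize((224, 224)))` feeds straight into `preprocess`.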

### Outputs

| Name | Shape | dtype | Description |
|---|---|---|---|
| `image_embeds` | `[image_batch, 768]` | float32 | L2-normalizable image embedding |
| `text_embeds` | `[text_batch, 768]` | float32 | L2-normalizable text embedding |
| `logits_per_image` | `[image_batch, text_batch]` | float32 | Cosine similarity scores |
| `logits_per_text` | `[text_batch, image_batch]` | float32 | Cosine similarity scores (transposed) |
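The relationship between the embedding and logit outputs can be reproduced offline: L2-normalize the embeddings and take pairwise dot products (note that SigLIP's exported logits may additionally apply a learned scale and bias, which this sketch omits). With random stand-in embeddings of the model's 768-dim output shape:

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

rng = np.random.default_rng(42)
image_embeds = l2_normalize(rng.normal(size=(2, 768)).astype(np.float32))
text_embeds = l2_normalize(rng.normal(size=(3, 768)).astype(np.float32))

# Pairwise cosine similarities, matching the output shapes above.
logits_per_image = image_embeds @ text_embeds.T  # [image_batch, text_batch]
logits_per_text = logits_per_image.T             # [text_batch, image_batch]
```

For image search, you would typically store only `image_embeds` and rank them by similarity against a query's `text_embeds` computed this way.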

## How it was exported

```bash
optimum-cli export onnx \
  --model google/siglip2-base-patch16-224 \
  --task zero-shot-image-classification \
  --opset 18 \
  ./models/
```

Requires `optimum[onnxruntime]` and `transformers`.

## License

Inherits Apache 2.0 from the original Google SigLIP 2 model.
