# siglip2-base-patch16-224 (ONNX)

This is Google's SigLIP 2 base/224 exported to ONNX format for CPU inference, used by Nebula for local, offline image search.

## What's inside

| File | Description |
|---|---|
| `model.onnx` | Combined vision + text encoder (~110 MB) |
| `tokenizer.json` | SigLIP tokenizer |

## Model inputs & outputs

The single `model.onnx` file contains both encoders. You can run either encoder on its own by passing a dummy tensor for the unused branch's input.
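For example, to embed text only, a minimal onnxruntime sketch (the session setup, helper names, and model path are assumptions for illustration, not part of this repo):

```python
import numpy as np

def make_dummy_pixels(batch: int = 1) -> np.ndarray:
    """Zero-filled image tensor to feed the unused vision branch."""
    return np.zeros((batch, 3, 224, 224), dtype=np.float32)

def encode_text(session, input_ids: np.ndarray) -> np.ndarray:
    """Run only the text encoder of the combined model.

    `session` is an onnxruntime.InferenceSession over model.onnx (assumed path);
    `input_ids` comes from the SigLIP tokenizer, shape [text_batch, seq_len].
    """
    (text_embeds,) = session.run(
        ["text_embeds"],
        {
            "pixel_values": make_dummy_pixels(),  # dummy tensor for the vision branch
            "input_ids": input_ids.astype(np.int64),
        },
    )
    return text_embeds  # shape [text_batch, 768]
```

Create the session with `onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])`; the mirror-image call (real `pixel_values`, dummy `input_ids`) yields `image_embeds`.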

### Inputs

| Name | Shape | dtype |
|---|---|---|
| `pixel_values` | `[image_batch, 3, 224, 224]` | float32 |
| `input_ids` | `[text_batch, seq_len]` | int64 |
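`pixel_values` should be 224×224 RGB normalized the way SigLIP was trained; to my understanding that is mean 0.5 / std 0.5 per channel (values in [-1, 1]) — verify against the exporter's preprocessor config. A NumPy-only sketch, leaving the resize to your image library of choice:

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """uint8 HWC RGB image, already resized to 224x224 -> float32 [1, 3, 224, 224].

    Normalization assumes SigLIP's mean=0.5, std=0.5 per channel.
    """
    assert image.shape == (224, 224, 3), "resize to 224x224 first"
    x = image.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - 0.5) / 0.5                   # shift/scale to [-1, 1]
    x = x.transpose(2, 0, 1)              # HWC -> CHW
    return x[np.newaxis, ...]             # add batch dimension
```

With PIL, for instance: `np.asarray(Image.open(path).convert("RGB").resize((224, 224)))` feeds straight into `preprocess`.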

### Outputs

| Name | Shape | dtype | Description |
|---|---|---|---|
| `image_embeds` | `[image_batch, 768]` | float32 | L2-normalizable image embedding |
| `text_embeds` | `[text_batch, 768]` | float32 | L2-normalizable text embedding |
| `logits_per_image` | `[image_batch, text_batch]` | float32 | Cosine similarity scores |
| `logits_per_text` | `[text_batch, image_batch]` | float32 | Cosine similarity scores (transposed) |
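The relationship between the embedding and logit outputs can be reproduced offline: L2-normalize the embeddings and take pairwise dot products (note that SigLIP's exported logits may additionally apply a learned scale and bias, which this sketch omits). With random stand-in embeddings of the model's 768-dim output shape:

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

rng = np.random.default_rng(42)
image_embeds = l2_normalize(rng.normal(size=(2, 768)).astype(np.float32))
text_embeds = l2_normalize(rng.normal(size=(3, 768)).astype(np.float32))

# Pairwise cosine similarities, matching the output shapes above.
logits_per_image = image_embeds @ text_embeds.T  # [image_batch, text_batch]
logits_per_text = logits_per_image.T             # [text_batch, image_batch]
```

For image search, you would typically store only `image_embeds` and rank them by similarity against a query's `text_embeds` computed this way.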

## How it was exported

```bash
optimum-cli export onnx \
  --model google/siglip2-base-patch16-224 \
  --task zero-shot-image-classification \
  --opset 18 \
  ./models/
```

Requires `optimum[onnxruntime]` and `transformers`.

## License

Inherits Apache 2.0 from the original Google SigLIP 2 model.
