siglip2-large-patch16-384 (ONNX)

ONNX export of google/siglip2-large-patch16-384 — the image and text towers as two separate ONNX files, sharing a tokenizer.

Source model: google/siglip2-large-patch16-384
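Image input: NCHW, 384×384, float32, SigLIP normalization (pixels scaled to [0, 1], then per-channel mean 0.5 and std 0.5, giving values in [-1, 1])
Text input: exactly 64 token IDs per sequence (SigLIP2's training-time maximum); truncate longer texts and pad shorter ones to 64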
Embedding dim: 1024 (both towers, joint embedding space)
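The image spec above can be met with a short preprocessing step. The sketch below assumes Pillow and NumPy. The mean/std values of 0.5 are SigLIP's standard normalization. The bicubic resampling filter is an assumption, so check the upstream `preprocessor_config.json` if you need exact parity with Transformers. `preprocess_images` is a hypothetical helper name.

```python
import numpy as np
from PIL import Image

SIZE = 384  # input resolution stated on this card


def preprocess_images(images):
    """Turn a list of PIL images into a float32 NCHW batch, SigLIP-normalized."""
    batch = []
    for img in images:
        # Force 3 channels and resize to 384x384 (resampling filter is an assumption)
        img = img.convert("RGB").resize((SIZE, SIZE), Image.BICUBIC)
        x = np.asarray(img, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
        x = (x - 0.5) / 0.5                            # mean 0.5, std 0.5 -> [-1, 1]
        batch.append(x.transpose(2, 0, 1))             # HWC -> CHW
    return np.stack(batch).astype(np.float32)          # (N, 3, 384, 384)
```

The result can be passed directly as `chw_image_batch` in the usage code.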

Files

| File | Purpose | Size |
|---|---|---|
| `siglip2-large-patch16-384_image.onnx` | image tower (self-contained) | 1.2 GB |
| `siglip2-large-patch16-384_text.onnx` + `siglip2-large-patch16-384_text.onnx_data` | text tower (one external-data blob) | 0.4 MB + 2.1 GB |
| `siglip2-large-patch16-384.json` | fast-tokenizer artifact | 33 MB |

The text tower's external-data blob (`siglip2-large-patch16-384_text.onnx_data`) must sit in the same directory as the `.onnx` file. When you load the model by file path, both `onnx.load(...)` and `onnxruntime.InferenceSession(...)` resolve this relative reference automatically.

Usage

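import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Image tower: chw_image_batch is a float32 array of shape (N, 3, 384, 384),
# already resized and SigLIP-normalized
img_sess = ort.InferenceSession("siglip2-large-patch16-384_image.onnx", providers=providers)
img_emb = img_sess.run(None, {"pixel_values": chw_image_batch})[0]

# Text tower: every sequence must be exactly 64 IDs, so truncate AND pad.
# Assumes the Gemma vocabulary's "<pad>" token and int64 input_ids.
tok = Tokenizer.from_file("siglip2-large-patch16-384.json")
tok.enable_truncation(max_length=64)
tok.enable_padding(length=64, pad_id=tok.token_to_id("<pad>"), pad_token="<pad>")
ids = np.array([enc.ids for enc in tok.encode_batch(texts)], dtype=np.int64)  # texts: list[str]
txt_sess = ort.InferenceSession("siglip2-large-patch16-384_text.onnx", providers=providers)
txt_emb = txt_sess.run(None, {"input_ids": ids})[0]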
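Both towers map into the same 1024-dimensional space, so images and texts can be compared directly. The minimal ranking sketch below uses NumPy: it L2-normalizes both sets of embeddings and takes dot products. SigLIP converts these similarities into probabilities by applying a learned scale and bias, followed by a sigmoid. This card describes the ONNX files only as the two towers, so it does not say whether those parameters are included. Cosine ranking is unaffected either way, because the scale is positive and the sigmoid is monotonic. `cosine_scores` and `rank_texts` are hypothetical helper names.

```python
import numpy as np


def cosine_scores(img_emb, txt_emb):
    """Cosine similarity matrix of shape (num_images, num_texts)."""
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    return img @ txt.T


def rank_texts(img_emb, txt_emb):
    """Text indices per image, ordered from best to worst match."""
    return np.argsort(-cosine_scores(img_emb, txt_emb), axis=1)
```

With `img_emb` and `txt_emb` from the usage code, `rank_texts(img_emb, txt_emb)[:, 0]` gives the best-matching text index for each image.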

Source license

Inherits Apache-2.0 from the upstream model.
