siglip2-large-patch16-384 (ONNX)

ONNX export of google/siglip2-large-patch16-384: the image and text towers are exported as two separate ONNX files, plus one tokenizer file used by the text tower.

  • Source model: google/siglip2-large-patch16-384
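  • Image input: NCHW, 384×384, float32, SigLIP normalization (pixel values rescaled to [0, 1], then normalized with per-channel mean 0.5 and std 0.5, i.e. mapped to [-1, 1])
  • Text input: exactly 64 token IDs per sequence (SigLIP2's training-time max length); truncate longer inputs and pad shorter ones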
  • Embedding dim: 1024 (both towers, joint embedding space)
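The image-input contract above can be sketched in NumPy. This is a minimal sketch, assuming the images are already RGB and resized to 384×384 by the caller (for example with Pillow's `Image.resize((384, 384), Image.Resampling.BILINEAR)`), and assuming "SigLIP normalization" means rescaling to [0, 1] followed by per-channel mean 0.5 / std 0.5. `to_pixel_values` is a hypothetical helper name, not part of this repo.

```python
import numpy as np

# Assumed SigLIP normalization constants: mean = std = 0.5 for every channel.
MEAN = np.float32(0.5)
STD = np.float32(0.5)
SIZE = 384


def to_pixel_values(images: np.ndarray) -> np.ndarray:
    """Convert a uint8 RGB batch (N, 384, 384, 3) into float32 NCHW pixel_values."""
    if images.dtype != np.uint8 or images.ndim != 4 or images.shape[1:] != (SIZE, SIZE, 3):
        raise ValueError(f"expected uint8 (N, {SIZE}, {SIZE}, 3), got {images.dtype} {images.shape}")
    x = images.astype(np.float32) / 255.0  # rescale to [0, 1]
    x = (x - MEAN) / STD  # normalize to [-1, 1]
    # NHWC -> NCHW, made contiguous so ONNX Runtime can consume it without a copy.
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))
```

The result has shape (N, 3, 384, 384) and dtype float32, matching the image tower's declared input.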

Files

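| File | Purpose | Size |
|---|---|---|
| siglip2-large-patch16-384_image.onnx | image tower (self-contained) | 1.2 GB |
| siglip2-large-patch16-384_text.onnx | text tower graph | 0.4 MB |
| siglip2-large-patch16-384_text.onnx_data | text tower weights (single external-data file) | 2.1 GB |
| siglip2-large-patch16-384.json | tokenizer for the text tower (Hugging Face tokenizers "fast" JSON format) | 33 MB |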

The text tower's external-data file (siglip2-large-patch16-384_text.onnx_data) must sit in the same directory as the .onnx file when loading; onnx.load(...) and onnxruntime.InferenceSession(...) resolve the relative reference automatically when given the .onnx file path.

Usage

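import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Execution providers in priority order; CPU is the fallback.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Image tower. chw_image_batch: float32 array of shape (N, 3, 384, 384),
# already resized and SigLIP-normalized by the caller.
img_sess = ort.InferenceSession("siglip2-large-patch16-384_image.onnx", providers=providers)
img_emb = img_sess.run(None, {"pixel_values": chw_image_batch.astype(np.float32)})[0]  # (N, 1024)

# Text tower. texts: list of str. Every sequence must be exactly 64 tokens,
# so truncate long inputs and right-pad short ones.
tok = Tokenizer.from_file("siglip2-large-patch16-384.json")
tok.enable_truncation(max_length=64)
tok.enable_padding(length=64, pad_id=tok.token_to_id("<pad>"), pad_token="<pad>")  # "<pad>" is an assumed pad token
txt_sess = ort.InferenceSession("siglip2-large-patch16-384_text.onnx", providers=providers)
# Match the integer width the graph declares for input_ids, e.g. "tensor(int64)".
ids_type = next(i.type for i in txt_sess.get_inputs() if i.name == "input_ids")
ids = np.array([enc.ids for enc in tok.encode_batch(texts)], dtype=np.int64 if "int64" in ids_type else np.int32)
txt_emb = txt_sess.run(None, {"input_ids": ids})[0]  # (M, 1024)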
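Because both towers project into the same 1024-dimensional space, the two embedding batches can be compared directly. SigLIP scores a pair as sigmoid(t · cos(u, v) + b), where t is the exponential of a learned logit scale and b a learned bias; this card does not say whether those two scalars are exported, so the sketch below ranks by cosine similarity alone. Since t > 0 and sigmoid is increasing, the ranking is the same as ranking by probability. It also does not assume the towers already L2-normalize their outputs (normalizing again is harmless). `l2_normalize` and `rank_texts` are hypothetical helper names.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm; eps guards against all-zero rows."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def rank_texts(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """For each image, return text indices sorted from best to worst match."""
    # (N_images, N_texts) matrix of cosine similarities.
    sims = l2_normalize(img_emb) @ l2_normalize(txt_emb).T
    # Negate so argsort (ascending) puts the highest similarity first.
    return np.argsort(-sims, axis=1)
```

With the embeddings from the usage code, `texts[rank_texts(img_emb, txt_emb)[i, 0]]` is the best-matching text for image i.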

Source license

Inherits Apache-2.0 from the upstream model.
