# siglip2-large-patch16-384 (ONNX)

ONNX export of `google/siglip2-large-patch16-384`: the image and text towers as two separate ONNX files that share one tokenizer.

- Source model: `google/siglip2-large-patch16-384`
- Image input: NCHW, 384×384, float32, SigLIP normalization (per-channel mean 0.5, std 0.5, i.e. pixel values mapped to [-1, 1])
- Text input: fixed-length 64 tokens (SigLIP2's training-time max)
- Embedding dim: 1024 (both towers, joint embedding space)
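
The image-input spec can be met with a small NumPy helper. This is a minimal sketch, assuming images are already decoded as RGB `uint8` arrays and resized to 384×384 (for example with Pillow; the exact resampling filter used upstream is not stated here), and assuming SigLIP normalization means rescaling to [0, 1] followed by per-channel mean 0.5 and std 0.5. The name `to_pixel_values` is hypothetical.

```python
import numpy as np


def to_pixel_values(images, size=384):
    """Stack RGB uint8 HWC images into an NCHW float32 batch in [-1, 1].

    Hypothetical helper. Assumes every image is already resized to
    (size, size, 3); resizing is left to the caller.
    """
    batch = []
    for img in images:
        arr = np.asarray(img)
        if arr.shape != (size, size, 3):
            raise ValueError(f"expected ({size}, {size}, 3), got {arr.shape}")
        x = arr.astype(np.float32) / 255.0  # rescale to [0, 1]
        x = (x - 0.5) / 0.5                 # assumed SigLIP mean/std 0.5 -> [-1, 1]
        batch.append(x.transpose(2, 0, 1))  # HWC -> CHW
    return np.stack(batch)                  # (N, 3, size, size), float32
```

With Pillow, each input could be produced by something like `np.asarray(Image.open(path).convert("RGB").resize((384, 384)))` before calling the helper; the result can then be fed as `pixel_values`.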
## Files

| File | Purpose | Size |
|---|---|---|
| `siglip2-large-patch16-384_image.onnx` | image tower (self-contained) | 1.2 GB |
| `siglip2-large-patch16-384_text.onnx` + `siglip2-large-patch16-384_text.onnx_data` | text tower (one external-data blob) | 0.4 MB + 2.1 GB |
| `siglip2-large-patch16-384.json` | fast-tokenizer artifact | 33 MB |

The text tower's external-data blob (`siglip2-large-patch16-384_text.onnx_data`) must sit in the same directory as `siglip2-large-patch16-384_text.onnx`. When the model is loaded by file path, both `onnx.load(...)` and `onnxruntime.InferenceSession(...)` resolve the relative reference to the blob automatically.
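
Because a missing sidecar only surfaces as a load error, a preflight check can make the failure clearer. A minimal sketch using only the standard library, assuming the naming convention of this repo (the blob is the `.onnx` file name with `_data` appended); `require_external_data` is a hypothetical helper, not part of ONNX or ONNX Runtime.

```python
from pathlib import Path


def require_external_data(onnx_path):
    """Return the sidecar path for an ONNX file, raising if either file is missing.

    Hypothetical helper. Assumes the sidecar is named '<model>.onnx' + '_data'
    and lives in the same directory as the model file.
    """
    model = Path(onnx_path)
    sidecar = model.with_name(model.name + "_data")
    if not model.is_file():
        raise FileNotFoundError(f"model file not found: {model}")
    if not sidecar.is_file():
        raise FileNotFoundError(
            f"external data {sidecar.name} must sit in the same directory as {model.name}"
        )
    return sidecar
```

Calling `require_external_data("siglip2-large-patch16-384_text.onnx")` before creating the text-tower session turns a misplaced blob into an explicit error message.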
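## Usage

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Image tower. chw_image_batch: float32 array of shape (N, 3, 384, 384),
# already resized and SigLIP-normalized.
img_sess = ort.InferenceSession("siglip2-large-patch16-384_image.onnx", providers=providers)
img_emb = img_sess.run(None, {"pixel_values": chw_image_batch})[0]

# Text tower. texts: list of str. Every row must be exactly 64 token IDs,
# so truncate and then pad; ragged lists cannot be passed as one tensor.
tok = Tokenizer.from_file("siglip2-large-patch16-384.json")
tok.enable_truncation(max_length=64)
tok.enable_padding(length=64, pad_id=0)  # pad_id=0 is an assumption; check the tokenizer JSON
ids = np.array([enc.ids for enc in tok.encode_batch(texts)], dtype=np.int64)  # int64 assumed
txt_sess = ort.InferenceSession("siglip2-large-patch16-384_text.onnx", providers=providers)
txt_emb = txt_sess.run(None, {"input_ids": ids})[0]
```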
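
Once both towers have run, images and texts can be compared in the joint 1024-dim space. A minimal sketch, assuming the first output of each session is the pooled embedding (which output index holds it is not stated here). Cosine similarity is enough for ranking; SigLIP additionally scores each pair independently as `sigmoid(exp(t) * cos + b)` with a learned log-scale `t` and bias `b`. Those two values are assumed not to be included in these ONNX files and would have to come from the upstream checkpoint. Both function names are hypothetical.

```python
import numpy as np


def cosine_scores(img_emb, txt_emb, eps=1e-12):
    """(N, D) image and (M, D) text embeddings -> (N, M) cosine similarities."""
    def l2_normalize(x):
        x = np.asarray(x, dtype=np.float32)
        return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

    return l2_normalize(img_emb) @ l2_normalize(txt_emb).T


def sigmoid_probs(cos, logit_scale, logit_bias):
    """SigLIP-style per-pair probability: sigmoid(exp(logit_scale) * cos + logit_bias).

    logit_scale (log space) and logit_bias are assumed to be taken from the
    upstream checkpoint; they are not read from the ONNX files here.
    """
    logits = np.exp(logit_scale) * np.asarray(cos) + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))
```

For zero-shot labeling, `cosine_scores(img_emb, txt_emb).argmax(axis=1)` picks the best-matching text for each image; the sigmoid step only matters when calibrated per-pair scores are needed.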
## Source license

Inherits Apache-2.0 from the upstream model.