metadata
license: apache-2.0
base_model: google/siglip-base-patch16-224
tags:
  - vision
  - onnx
  - int8
  - mobile
  - flutter

WakeUp SigLIP-1 Base INT8 (ONNX)

ONNX INT8 exports of google/siglip-base-patch16-224 for use in the WakeUp Flutter alarm app's "Travel Mode" feature.

Files

| File | Size | Purpose |
| --- | --- | --- |
| siglip1_image_encoder_int8.onnx | ~99 MB | Image feature extraction (per-scan) |
| siglip1_text_encoder_int8.onnx | ~111 MB | Text feature extraction (Custom Text mode only) |
| model_metadata.json | | logit_scale, logit_bias, normalization params |
| tokenizer/ | | SigLIP-1 tokenizer files |

Scoring

Both encoders L2-normalize their output, so the dot product of an image embedding and a text embedding is their cosine similarity. SigLIP scoring is:

logits = exp(logit_scale) * cosine(image_emb, text_emb) + logit_bias
prob   = sigmoid(logits)

Constants from model_metadata.json:

  • logit_scale = 4.765 (so exp(logit_scale) ≈ 117.33)
  • logit_bias = -12.932
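To make the scale of these constants concrete, here is a minimal sketch of the scoring formula in plain Python. The function name `siglip_prob` and the `break_even` variable are illustrative and not part of this repository; the constants are the ones listed above.

```python
import math

LOGIT_SCALE = 4.765    # logit_scale from model_metadata.json (as listed above)
LOGIT_BIAS = -12.932   # logit_bias from model_metadata.json (as listed above)


def siglip_prob(cosine: float) -> float:
    """Map a cosine similarity to a match probability (hypothetical helper)."""
    logit = math.exp(LOGIT_SCALE) * cosine + LOGIT_BIAS
    return 1.0 / (1.0 + math.exp(-logit))


# Cosine similarity at which the logit is zero, i.e. the probability is 0.5.
break_even = -LOGIT_BIAS / math.exp(LOGIT_SCALE)

for c in (0.0, 0.05, break_even, 0.15, 0.2):
    print(f"cos={c:.4f}  prob={siglip_prob(c):.6f}")
```

Because exp(logit_scale) is large, the probability moves from near 0 to near 1 over a narrow band of cosine values around `break_even` (roughly 0.11), so small differences in cosine similarity matter a lot near that point.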

Inference

import onnxruntime as ort
import numpy as np

img_sess = ort.InferenceSession("siglip1_image_encoder_int8.onnx")
txt_sess = ort.InferenceSession("siglip1_text_encoder_int8.onnx")

# pixel_values: image tensor of shape 1x3x224x224, normalized with mean/std [0.5, 0.5, 0.5]
image_emb = img_sess.run(None, {"pixel_values": pixel_values})[0]
# ids: tokenized text (input_ids); pass an all-ones attention_mask for canonical inference
text_emb = txt_sess.run(None, {"input_ids": ids, "attention_mask": np.ones_like(ids)})[0]

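# Embeddings are already L2-normalized, so the dot product is the cosine similarity.
cos = text_emb @ image_emb.T  # shape: (num_texts, num_images)
logits = np.exp(4.765) * cos - 12.932  # exp(logit_scale) * cos + logit_bias
prob = 1 / (1 + np.exp(-logits))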
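The snippet above expects a ready-made `pixel_values` array. The following is a rough sketch of how one might build it with NumPy, under some assumptions that are not stated in this card: the image has already been resized to 224x224 RGB, pixel values are rescaled to [0, 1] before the mean/std normalization, and the encoder takes float32 input. The helper name `to_pixel_values` is hypothetical.

```python
import numpy as np


def to_pixel_values(rgb: np.ndarray) -> np.ndarray:
    """Turn a 224x224x3 uint8 RGB array into a 1x3x224x224 float32 tensor.

    Assumptions (not confirmed by this card): the input is already resized,
    values are rescaled to [0, 1] first, and the model expects float32.
    """
    if rgb.shape != (224, 224, 3) or rgb.dtype != np.uint8:
        raise ValueError("expected a 224x224x3 uint8 RGB array")
    x = rgb.astype(np.float32) / 255.0   # rescale to [0, 1]
    x = (x - 0.5) / 0.5                  # mean/std of 0.5 per channel -> [-1, 1]
    x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
    return np.ascontiguousarray(x[np.newaxis, ...])  # add batch dimension


# Stand-in for a decoded and resized camera frame.
dummy = np.full((224, 224, 3), 255, dtype=np.uint8)
pixel_values = to_pixel_values(dummy)
print(pixel_values.shape, pixel_values.dtype)
```

If model_metadata.json lists different normalization parameters, those take precedence over the values hard-coded in this sketch.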

License

Apache 2.0 (inherited from the base model).