license: apache-2.0
base_model: google/siglip-base-patch16-224
tags:
  - vision
  - onnx
  - int8
  - mobile
  - flutter

WakeUp SigLIP-1 Base INT8 (ONNX)

ONNX INT8 exports of google/siglip-base-patch16-224 for use in the WakeUp Flutter alarm app's "Travel Mode" feature.

Files

| File | Size | Purpose |
|---|---|---|
| `siglip1_image_encoder_int8.onnx` | ~99 MB | Image feature extraction (per-scan) |
| `siglip1_text_encoder_int8.onnx` | ~111 MB | Text feature extraction (Custom Text mode only) |
| `model_metadata.json` | n/a | `logit_scale`, `logit_bias`, normalization params |
| `tokenizer/` | n/a | SigLIP-1 tokenizer files |
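
The scoring constants can be read from `model_metadata.json` rather than hard-coded. The following is a minimal sketch, assuming `logit_scale` and `logit_bias` are top-level numeric keys in that file; the actual layout is not documented here, and the helper name `load_scoring_constants` is hypothetical.

```python
import json
from pathlib import Path


def load_scoring_constants(path="model_metadata.json"):
    """Return (logit_scale, logit_bias) read from the metadata file.

    Assumption: both values are top-level numeric keys in the JSON object.
    Adjust the lookups if the file nests them differently.
    """
    meta = json.loads(Path(path).read_text(encoding="utf-8"))
    return float(meta["logit_scale"]), float(meta["logit_bias"])
```

If the keys are missing, the lookup raises `KeyError`, which is a useful early signal that the file is laid out differently from what this sketch assumes.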

Scoring

Both encoders L2-normalize their outputs, so the dot product of an image embedding and a text embedding is their cosine similarity. SigLIP scoring is:

logits = exp(logit_scale) * cosine(image_emb, text_emb) + logit_bias
prob   = sigmoid(logits)

Constants from model_metadata.json:

logit_scale = 4.765 (so exp(logit_scale) ≈ 117.33)
logit_bias = -12.932
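
To show how these constants shape the output, here is a small sketch that applies the scoring formula to a scalar cosine similarity using only the standard library. The function names `siglip_prob` and `cosine_at_even_odds` are illustrative and not part of this repository.

```python
import math

LOGIT_SCALE = 4.765    # stored in log space; exp() is applied at scoring time
LOGIT_BIAS = -12.932


def siglip_prob(cosine, logit_scale=LOGIT_SCALE, logit_bias=LOGIT_BIAS):
    """Map a cosine similarity in [-1, 1] to a SigLIP match probability."""
    logit = math.exp(logit_scale) * cosine + logit_bias
    return 1.0 / (1.0 + math.exp(-logit))


def cosine_at_even_odds(logit_scale=LOGIT_SCALE, logit_bias=LOGIT_BIAS):
    """Cosine similarity at which the logit is 0, i.e. the probability is 0.5."""
    return -logit_bias / math.exp(logit_scale)


if __name__ == "__main__":
    for c in (0.05, 0.10, 0.15):
        print(f"cosine={c:.2f} -> prob={siglip_prob(c):.4f}")
    print(f"prob crosses 0.5 near cosine={cosine_at_even_odds():.3f}")
```

With these constants the probability crosses 0.5 at a cosine similarity of roughly 0.11, and because the scale factor is large, small changes in cosine similarity around that point move the probability sharply.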

Inference

import numpy as np
import onnxruntime as ort

img_sess = ort.InferenceSession("siglip1_image_encoder_int8.onnx")
txt_sess = ort.InferenceSession("siglip1_text_encoder_int8.onnx")

# pixel_values: image array of shape 1x3x224x224, normalized with mean/std [0.5, 0.5, 0.5]
image_emb = img_sess.run(None, {"pixel_values": pixel_values})[0]
# ids: token IDs from the tokenizer in tokenizer/; use an all-ones attention mask for canonical inference
text_emb = txt_sess.run(None, {"input_ids": ids, "attention_mask": np.ones_like(ids)})[0]

# Both embeddings are L2-normalized, so the dot product is the cosine similarity
# (one row per text, one column per image).
cos = text_emb @ image_emb.T
logit_scale, logit_bias = 4.765, -12.932  # from model_metadata.json
prob = 1 / (1 + np.exp(-(np.exp(logit_scale) * cos + logit_bias)))
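
The snippet above expects `pixel_values` to be prepared already. The following sketch shows one way to build it from an RGB image that has been resized to 224x224 beforehand. It assumes pixel values are first rescaled from [0, 255] to [0, 1] before the mean/std normalization; that step matches the usual SigLIP preprocessing but is not stated in this README. The helper name `to_pixel_values` is hypothetical.

```python
import numpy as np


def to_pixel_values(rgb):
    """Convert a 224x224x3 uint8 RGB array to a 1x3x224x224 float32 array."""
    if rgb.shape != (224, 224, 3):
        raise ValueError(f"expected shape (224, 224, 3), got {rgb.shape}")
    x = rgb.astype(np.float32) / 255.0   # assumption: rescale to [0, 1] first
    x = (x - 0.5) / 0.5                  # mean/std of 0.5 per channel -> [-1, 1]
    x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
    x = x[np.newaxis, ...]               # add the batch dimension
    return np.ascontiguousarray(x)       # contiguous layout for the runtime
```

Resizing is left to the caller so that the sketch depends only on NumPy; the resampling filter should match whatever the app uses on-device, since a mismatch can shift the embeddings slightly.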

License

Apache 2.0 (inherited from the base model).