Sentrix-Code/wakeup-siglip1-base-int8

	---
	license: apache-2.0
	base_model: google/siglip-base-patch16-224
	tags:
	- vision
	- onnx
	- int8
	- mobile
	- flutter
	---

	# WakeUp SigLIP-1 Base INT8 (ONNX)

	ONNX INT8 exports of [`google/siglip-base-patch16-224`](https://huggingface.co/google/siglip-base-patch16-224) for use in the WakeUp Flutter alarm app's "Travel Mode" feature.

	## Files

	| File | Size | Purpose |
	|---|---|---|
	| `siglip1_image_encoder_int8.onnx` | ~99 MB | Image feature extraction (per scan) |
	| `siglip1_text_encoder_int8.onnx` | ~111 MB | Text feature extraction (Custom Text mode only) |
	| `model_metadata.json` | n/a | `logit_scale`, `logit_bias`, normalization params |
	| `tokenizer/` | n/a | SigLIP-1 tokenizer files |
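
As a minimal sketch of how the scoring constants might be read from `model_metadata.json`: the helper name `load_scoring_constants` is hypothetical, and the code assumes `logit_scale` and `logit_bias` are top-level keys in the file, which this card does not confirm. If the file or a key is missing, it falls back to the rounded values quoted in this card.

```python
import json
from pathlib import Path

# Rounded values quoted in this model card, used only as a fallback.
CARD_DEFAULTS = {"logit_scale": 4.765, "logit_bias": -12.932}


def load_scoring_constants(path="model_metadata.json"):
    """Return (logit_scale, logit_bias).

    Assumption: the JSON file is an object with top-level
    "logit_scale" and "logit_bias" keys. Adjust if the layout differs.
    """
    meta = {}
    p = Path(path)
    if p.is_file():
        with p.open(encoding="utf-8") as f:
            meta = json.load(f)
    scale = float(meta.get("logit_scale", CARD_DEFAULTS["logit_scale"]))
    bias = float(meta.get("logit_bias", CARD_DEFAULTS["logit_bias"]))
    return scale, bias
```

Reading the values from the file rather than hard-coding them keeps the app in sync if the export is ever regenerated from a different checkpoint.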

	## Scoring

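	Both encoders L2-normalize their output, so the cosine similarity between an image embedding and a text embedding is simply their dot product. SigLIP scoring is: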

	```
	logits = exp(logit_scale) * cosine(image_emb, text_emb) + logit_bias
	prob = sigmoid(logits)
	```

	Constants from `model_metadata.json`:
	- `logit_scale = 4.765` (so `exp(scale) ≈ 117.33`)
	- `logit_bias = -12.932`
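
To make the formula concrete, the sketch below evaluates it with the constants listed above using only the standard library. The function name `siglip_prob` is illustrative and not part of this repository. It also derives the cosine similarity at which the probability crosses 0.5, which follows from setting the logit to zero.

```python
import math

LOGIT_SCALE = 4.765    # from this model card
LOGIT_BIAS = -12.932   # from this model card


def siglip_prob(cos_sim, logit_scale=LOGIT_SCALE, logit_bias=LOGIT_BIAS):
    """Sigmoid match probability for one cosine similarity in [-1, 1]."""
    logit = math.exp(logit_scale) * cos_sim + logit_bias
    return 1.0 / (1.0 + math.exp(-logit))


# prob = 0.5 exactly when the logit is 0, i.e. cos = -bias / exp(scale).
threshold = -LOGIT_BIAS / math.exp(LOGIT_SCALE)
print(f"{threshold:.3f}")  # 0.110
```

Because the scale is large, the probability moves from near 0 to near 1 over a narrow band of cosine values around that threshold, so small changes in similarity near 0.11 have a large effect on the score.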

	## Inference

	```python
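	import onnxruntime as ort
	import numpy as np

	LOGIT_SCALE = 4.765    # from model_metadata.json
	LOGIT_BIAS = -12.932   # from model_metadata.json

	img_sess = ort.InferenceSession("siglip1_image_encoder_int8.onnx")
	txt_sess = ort.InferenceSession("siglip1_text_encoder_int8.onnx")

	# pixel_values: 1x3x224x224 array prepared by the caller,
	# normalized with mean/std [0.5, 0.5, 0.5]
	image_emb = img_sess.run(None, {"pixel_values": pixel_values})[0]
	# ids: token IDs prepared by the caller with the tokenizer in tokenizer/;
	# use an all-ones attention mask for canonical inference
	text_emb = txt_sess.run(None, {"input_ids": ids, "attention_mask": np.ones_like(ids)})[0]

	# Embeddings are already L2-normalized, so the dot product is the cosine similarity.
	cos = text_emb @ image_emb.T  # shape: (num_texts, num_images)
	logits = np.exp(LOGIT_SCALE) * cos + LOGIT_BIAS
	prob = 1 / (1 + np.exp(-logits))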
	```
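
The snippet above expects `pixel_values` to be prepared already. One possible way to build it with NumPy is sketched below. The function name `preprocess` is hypothetical, and two details are assumptions rather than facts from this card: the image is rescaled from 0..255 to 0..1 before normalization, and the encoder accepts float32 input. Resizing to 224x224 is left to the caller.

```python
import numpy as np


def preprocess(image_rgb):
    """Turn a 224x224 RGB uint8 array (H, W, 3) into a 1x3x224x224 float32 array."""
    if image_rgb.shape != (224, 224, 3):
        raise ValueError("expected a 224x224 RGB image; resize before calling")
    x = image_rgb.astype(np.float32) / 255.0  # assumption: rescale to [0, 1] first
    x = (x - 0.5) / 0.5                       # mean/std 0.5 per channel, giving [-1, 1]
    x = np.transpose(x, (2, 0, 1))            # HWC to CHW
    return x[np.newaxis, ...]                 # add the batch dimension
```

With this normalization a fully white image maps to all ones and a fully black image to all minus ones, which is a quick sanity check before running the encoder.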

	## License

	Apache 2.0 (inherited from the base model).