---
license: apache-2.0
base_model: google/siglip-base-patch16-224
tags:
- vision
- onnx
- int8
- mobile
- flutter
---

# WakeUp SigLIP-1 Base INT8 (ONNX)

INT8-quantized ONNX exports of the image and text encoders of [`google/siglip-base-patch16-224`](https://huggingface.co/google/siglip-base-patch16-224), for use in the "Travel Mode" feature of the **WakeUp** Flutter alarm app.
15
+
16
+ ## Files
17
+
18
+ | File | Size | Purpose |
19
+ |---|---|---|
20
+ | `siglip1_image_encoder_int8.onnx` | ~99 MB | Image feature extraction (per-scan) |
21
+ | `siglip1_text_encoder_int8.onnx` | ~111 MB | Text feature extraction (Custom Text mode only) |
22
+ | `model_metadata.json` | — | `logit_scale`, `logit_bias`, normalization params |
23
+ | `tokenizer/` | — | SigLIP-1 tokenizer files |
24
+
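The table lists `logit_scale` and `logit_bias` as contents of `model_metadata.json`. A minimal sketch for reading them instead of hard-coding them, assuming the file is a flat JSON object with those two top-level numeric keys (the exact layout is not documented here); `load_scoring_constants` is a hypothetical helper name.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def load_scoring_constants(path: str | Path = "model_metadata.json") -> tuple[float, float]:
    """Return (multiplier, bias) for SigLIP scoring from the metadata file.

    ASSUMPTION: model_metadata.json is a flat JSON object whose top-level
    "logit_scale" and "logit_bias" keys hold plain numbers; adjust the
    lookups if the real file nests them or uses different names.
    """
    meta = json.loads(Path(path).read_text(encoding="utf-8"))
    logit_scale = float(meta["logit_scale"])
    logit_bias = float(meta["logit_bias"])
    # logit_scale is stored in log space, so the cosine is multiplied by exp(logit_scale)
    return math.exp(logit_scale), logit_bias
```

With the constants listed under Scoring, the returned multiplier is about 117.33.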
25
+ ## Scoring
26
+
27
+ Both encoders L2-normalize their output. SigLIP scoring is:
28
+
29
+ ```
30
+ logits = exp(logit_scale) * cosine(image_emb, text_emb) + logit_bias
31
+ prob = sigmoid(logits)
32
+ ```
33
+
34
+ Constants from `model_metadata.json`:
35
+ - `logit_scale = 4.765` (so `exp(scale) ≈ 117.33`)
36
+ - `logit_bias = -12.932`
37
+
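As a worked check of the formula: with these constants a pair scores exactly 0.5 when `exp(logit_scale) * cos + logit_bias = 0`, i.e. at a cosine similarity of 12.932 / 117.33 ≈ 0.110, so noticeably positive similarity is needed before a match is likely. The sketch below implements the scoring in NumPy for batches of embeddings; the helper names are illustrative, and the re-normalization is a defensive no-op for encoder outputs that are already L2-normalized.

```python
import numpy as np

# Constants from model_metadata.json, as listed above
LOGIT_SCALE = 4.765
LOGIT_BIAS = -12.932


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (no-op for outputs that are already normalized)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def siglip_prob(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Independent sigmoid probability for every (text, image) pair.

    text_emb: (num_texts, D), image_emb: (num_images, D) -> (num_texts, num_images)
    """
    cos = l2_normalize(np.atleast_2d(text_emb)) @ l2_normalize(np.atleast_2d(image_emb)).T
    logits = np.exp(LOGIT_SCALE) * cos + LOGIT_BIAS
    return 1.0 / (1.0 + np.exp(-logits))


# Cosine similarity at which the probability crosses 0.5 (logits == 0)
THRESHOLD_COS = -LOGIT_BIAS / np.exp(LOGIT_SCALE)  # about 0.110
```

Identical embeddings (cosine 1) score close to 1, while orthogonal ones (cosine 0) score about `sigmoid(-12.932)`, which is effectively 0.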
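## Inference

The example assumes `onnxruntime`, Pillow, and `transformers` are installed (SigLIP-1's tokenizer also needs `sentencepiece`) and that the files in `tokenizer/` load with `AutoTokenizer`; `photo.jpg` and the prompt text are placeholders.

```python
import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import AutoTokenizer

img_sess = ort.InferenceSession("siglip1_image_encoder_int8.onnx")
txt_sess = ort.InferenceSession("siglip1_text_encoder_int8.onnx")

# image: 1x3x224x224 float32; RGB resized to 224x224, scaled to [0, 1],
# then normalized with mean/std [0.5, 0.5, 0.5] ("photo.jpg" is a placeholder)
img = Image.open("photo.jpg").convert("RGB").resize((224, 224), Image.BICUBIC)
x = np.asarray(img, dtype=np.float32) / 255.0
pixel_values = np.ascontiguousarray(((x - 0.5) / 0.5).transpose(2, 0, 1)[np.newaxis])  # HWC -> 1xCxHxW
image_emb = img_sess.run(None, {"pixel_values": pixel_values})[0]

# text: input_ids + attention_mask (use all-ones mask for canonical inference);
# pad to the 64-token context SigLIP was trained with (the prompt is a placeholder)
tok = AutoTokenizer.from_pretrained("tokenizer")
ids = tok(["a photo of a train station"], padding="max_length", max_length=64, return_tensors="np")["input_ids"]
text_emb = txt_sess.run(None, {"input_ids": ids, "attention_mask": np.ones_like(ids)})[0]

# Both embeddings are L2-normalized, so the dot product is the cosine similarity
cos = text_emb @ image_emb.T
prob = 1 / (1 + np.exp(-(np.exp(4.765) * cos - 12.932)))  # logit_scale, logit_bias from model_metadata.json
```
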
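The text encoder is only needed in Custom Text mode, while the image encoder runs on every scan. Because SigLIP scores each image-text pair independently, the prompt embedding can be computed once and reused for every scan. A hypothetical sketch of that caching pattern (the class and method names are illustrative, not part of the WakeUp app), assuming the cached text embeddings are the L2-normalized encoder outputs:

```python
import numpy as np

LOGIT_SCALE = 4.765  # from model_metadata.json
LOGIT_BIAS = -12.932  # from model_metadata.json


class CachedTextScorer:
    """Hypothetical helper: encode the Custom Text prompt(s) once, then score each scan."""

    def __init__(self, text_emb: np.ndarray):
        # (num_texts, D) output of the text encoder, assumed already L2-normalized
        self.text_emb = np.atleast_2d(text_emb)

    def score(self, image_emb: np.ndarray) -> np.ndarray:
        """Return one sigmoid probability per cached prompt for a single scan."""
        cos = self.text_emb @ np.ravel(image_emb)  # (num_texts,)
        logits = np.exp(LOGIT_SCALE) * cos + LOGIT_BIAS
        return 1.0 / (1.0 + np.exp(-logits))
```

Since each prompt gets its own sigmoid, the returned probabilities are not a softmax distribution and need not sum to 1; each one can be compared against a threshold on its own.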
## License

Apache 2.0 (inherited from the base model).