# OptGuideOnDeviceClassifierModel — Complete Analysis

## Overview

**OptGuideOnDeviceClassifierModel** is a 120 MB on-device language model shipped with Chrome Canary as a Chrome Component. Its manifest names it **"Optimization Guide On Device Taxonomy Model"**, with a base model spec called **`taxonomy-tiny`**.

It is a **Gemma 2 variant** purpose-built for **page-level classification** — specifically extracting the **brand** and **intent** of web pages for Chrome's client-side **scam/phishing detection** pipeline.

| Field | Value |
|---|---|
| Manifest name | Optimization Guide On Device Taxonomy Model |
| Base model | `taxonomy-tiny` v0.0.0.0 |
| Component version | `2026.2.12.1554` |
| Component ID (CRX) | `eidcjfoningnkhpoelgpjemmhmopkeoi` |
| File | `weights.bin` (126,025,728 bytes / 120.19 MB) |
| Execution config | Empty (0 bytes) — no prompt template bundled |
| Performance hint | `3` |
| Availability | **Chrome Canary** (not tested in Stable) |
| Optimization target | `OPTIMIZATION_TARGET_MODEL_EXECUTION_FEATURE_CLASSIFIER` (ID 72) |
| Chrome feature flag | `ClientSideDetectionBrandAndIntentForScamDetection` |

---

## Purpose: Scam Detection via Brand + Intent Classification

Chrome's Client-Side Detection (CSD) system extracts page text from suspicious websites and sends it to this model with the following prompt (decoded from `on_device_model_execution_config.pb` of model ID 55):

```
You are a web page text scanner. Your task is to carefully review text from
a web page and answer the following questions in English:

1) What brand does the page represent?
2) In one complete sentence, summarize what this page aims to do.
   Do not leak PII data.

You should output your answers strictly in the following JSON format:

{"brand": "<brand>", "intent": "<intent>"}

Do not use ```json``` block in your output.

Text: [PAGE CONTENT HERE]
```

The expected response conforms to this JSON schema:

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "brand": { "type": "string" },
    "intent": { "type": "string" }
  },
  "required": ["brand", "intent"]
}
```

When the detected brand/intent combination is inconsistent with the actual page behavior (e.g., a page claiming to be PayPal but actually harvesting credentials on an unrelated domain), Chrome flags the page as a potential scam via Safe Browsing.
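
One plausible shape for that consistency check is a lookup from the claimed brand to its legitimate domains. The sketch below is purely illustrative: the `KNOWN_BRAND_DOMAINS` table, the brand names, and the matching rules are all assumptions, not Chrome's actual verdict logic.

```python
from urllib.parse import urlparse

# Hypothetical brand -> legitimate-domain table. Chrome's real mapping
# and matching rules are not visible in the model package.
KNOWN_BRAND_DOMAINS = {
    "paypal": {"paypal.com"},
    "google": {"google.com"},
}

def brand_mismatch(brand: str, page_url: str) -> bool:
    """True if the page claims a known brand but is served from a
    domain not associated with that brand."""
    domains = KNOWN_BRAND_DOMAINS.get(brand.strip().lower())
    if not domains:
        return False  # unknown brand: this signal stays silent
    host = urlparse(page_url).hostname or ""
    # Accept the apex domain or any subdomain of it.
    return not any(host == d or host.endswith("." + d) for d in domains)

brand_mismatch("PayPal", "https://paypa1-login.example.net/")  # True: mismatch
brand_mismatch("PayPal", "https://www.paypal.com/signin")      # False: consistent
```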

---

## Binary Format: LITERTLM Container

The `weights.bin` file is **not** a raw TFLite model. It uses the **LITERTLM** (LiteRT Language Model) container format — a proprietary Google ODML packaging format with a FlatBuffer header and multiple embedded submodels.

### File Layout

```
Offset          Component                        Size
────────────────────────────────────────────────────────────
0x00000000      LITERTLM FlatBuffer header       32 KB
                  Magic: "LITERTLM"
                  Version: 1
                  Submodels: 4 entries declared
                  Metadata:
                    model_type = "tf_lite_prefill_decode"
                    model_type = "tf_lite_embedder"
                    model_version = "1.0.1"
                    Authors = "ODML team"

0x00008000      TFLite #1 — Embedder             8.20 MB (8,601,600 bytes)
                  Input:  token_ids [1, 1] int32
                  Output: embeddings [1, 1, 1024] float32
                  Op: lookup_embedding_table
                  TFLite runtime: 2.18.0

0x0083C000      TFLite #2 — Prefill + Decode     111.63 MB (117,055,216 bytes)
                  2 signatures: "prefill" and "decode"
                  39 inputs (embeddings + position + mask + 36 KV cache)
                  37 outputs (36 KV cache + logits [1, 1, 16384])
                  18 transformer layers
                  Full Gemma 2 architecture

0x077E0000      SentencePiece tokenizer          305.6 KB (312,918 bytes)
                  Vocab size: 16,384 tokens
                  Special tokens: <pad>=0, </s>=1, <s>=2, <unk>=3
                  256 byte-fallback tokens
                  Normalizer: nmt_nfkc

0x0782C656      Zero padding to alignment        14.7 KB
0x07830000      End of file                      126,025,728 bytes total
```
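
Before slicing out the submodels, the layout above can be sanity-checked against the file. A minimal sketch (offsets and sizes hardcoded from the table; a real parser would read them from the FlatBuffer header, whose exact field layout is not reproduced here):

```python
def check_litertlm(data: bytes) -> None:
    """Verify the LITERTLM container magic and that each known
    section fits inside the file."""
    assert data[:8] == b"LITERTLM", "not a LITERTLM container"
    # (offset, size) pairs taken from the layout table above.
    sections = [
        (0x0008000, 8_601_600),    # TFLite embedder
        (0x083C000, 117_055_216),  # TFLite prefill+decode
        (0x77E0000, 312_918),      # SentencePiece tokenizer
    ]
    for offset, size in sections:
        assert offset + size <= len(data), f"section at {offset:#x} overruns file"

# e.g.: check_litertlm(open('weights.bin', 'rb').read())
```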

### How to Extract the Submodels

```python
data = open('weights.bin', 'rb').read()

# TFLite embedder (8,601,600 bytes)
open('embedder.tflite', 'wb').write(data[0x8000:0x83C000])

# TFLite prefill+decode transformer (117,055,216 bytes)
open('decoder.tflite', 'wb').write(data[0x83C000:0x77DDEF0])

# SentencePiece tokenizer (312,918 bytes)
open('tokenizer.model', 'wb').write(data[0x77E0000:0x782C656])
```

---

## Architecture: Gemma 2 "taxonomy-tiny"

The model is a **distilled Gemma 2** with reduced dimensions, confirmed by layer name analysis of the TFLite graph.

### Specifications

| Parameter | Value | Evidence |
|---|---|---|
| Architecture family | **Gemma 2** | QK normalization + post-FFN norm = Gemma 2 exclusive features |
| Transformer layers | **18** | `layer_0` through `layer_17` in tensor names |
| Embedding dimension | **1024** | Embedder output shape `[1, 1, 1024]` |
| KV cache dimension | **256** per layer | KV input/output shape `[1, 1, 1, 256]` |
| Vocabulary size | **16,384** | Logits output shape `[1, 1, 16384]`; SentencePiece vocab |
| Normalization | **RMSNorm** | `rms_norm/mul`, `rms_norm/rsqrt`, `rms_norm/square` |
| Pre-attention norm | **Yes** | `pre_attention_norm/rms_norm` |
| Pre-FFN norm | **Yes** | `pre_ffw_norm` patterns |
| Post-FFN norm | **Yes** | Post-FFN norm present (Gemma 2 specific) |
| QK normalization | **Yes** | `key_norm/rms_norm` (Gemma 2 specific) |
| Positional encoding | **RoPE** | `maybe_rope/concatenate` |
| Attention type | **Full attention** | No sliding window patterns found |
| Activation | **GeLU** (likely) | Standard for Gemma 2 |
| Quantization | **Mixed INT4/INT8** | 120 MB for 18 layers with 1024 dim implies heavy quantization |
| Estimated parameters | **~100–200M** | Based on file size and quantization assumptions |
| TFLite signatures | `prefill` (no logits) + `decode` (with logits) | Standard ODML LLM execution pattern |

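The parameter estimate in the table can be roughed out from the confirmed dimensions. In the sketch below, the 2,048-wide gated FFN and the tied output head are assumptions (the FFN width is not directly visible in the dumped tensor shapes); everything else comes from the specifications above:

```python
VOCAB, D_MODEL, LAYERS = 16_384, 1_024, 18  # confirmed by tensor shapes
KV_DIM = 256                                 # from KV cache shape [1, 1, 1, 256]
FFN_HIDDEN = 2_048                           # ASSUMPTION: not visible in the graph

embed = VOCAB * D_MODEL                              # embedding table (output head assumed tied)
attn = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * KV_DIM  # Q and O full-width; K and V project to 256
ffn = 3 * D_MODEL * FFN_HIDDEN                       # gate + up + down projections
total = embed + LAYERS * (attn + ffn)
print(f"~{total / 1e6:.0f}M parameters")             # ~177M, inside the estimated range
```

At mixed INT4/INT8 precision, a count in this range is roughly consistent with the observed ~120 MB of weights.
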
### Comparison with Known Models

| | **taxonomy-tiny** | Gemma 2 2B | Gemini Nano v3 |
|---|---|---|---|
| Layers | 18 | 26 | ~32 |
| Embed dim | 1,024 | 2,304 | unknown |
| Vocab size | 16,384 | 256,128 | 256,128 |
| File size | 120 MB | ~2.6 GB | 4.07 GB |
| QK norm | Yes | Yes | Yes |
| Post-FFN norm | Yes | Yes | Yes |
| Sliding window | No | Yes (alternating) | Yes |
| Purpose | Classification | General | General |

### Single Transformer Block Structure

From tensor name analysis, each of the 18 layers contains:

```
layer_N/
├── layer_N.pre_qkv/
│   ├── pre_attention_norm/rms_norm/          (RMSNorm)
│   └── attn._pre_attention_fn/
│       └── maybe_rope/                       (RoPE positional encoding)
├── attn.dot_product_attention/
│   └── dot_attn._qkv_fn/
│       ├── key_norm/rms_norm/                (QK normalization)
│       ├── dot_general (Q*K)
│       ├── tfl_softmax
│       ├── dot_general (attn*V)
│       └── reshape/transpose
├── layer_N.post_qkv/
│   ├── attn.post_qkv/attn_vec_einsum/        (output projection)
│   ├── add (residual)
│   └── add1 (post-attention residual)
├── layer_N.update_cache/
│   └── attn.update_cache/concatenate         (KV cache update)
└── [pre_ffw_norm + FFN + post_ffw_norm]      (feed-forward block)
```

Final output: `final_norm/rms_norm` → `decode_softmax` → logits `[1, 1, 16384]`
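
The `rms_norm/square`, `rms_norm/rsqrt`, and `rms_norm/mul` ops compose into the standard RMSNorm computation. A pure-Python sketch of that op sequence (the epsilon is an assumed typical value; the graph does not expose it):

```python
import math

def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """x / sqrt(mean(x^2) + eps) * weight, matching the
    square -> mean -> rsqrt -> mul ops seen in the TFLite graph."""
    mean_sq = sum(v * v for v in x) / len(x)            # rms_norm/square (+ mean)
    scale = 1.0 / math.sqrt(mean_sq + eps)              # rms_norm/rsqrt
    return [v * scale * w for v, w in zip(x, weight)]   # rms_norm/mul

out = rms_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4)  # with unit weights, output RMS ~= 1
```

One caveat: Gemma-family checkpoints typically store the norm weight as an offset from one (applied as `1 + w`); whether this export does the same is not visible from the op names alone.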

---

## Tokenizer: Reduced Gemma Vocabulary

The embedded SentencePiece model uses a **16,384-token vocabulary** — a dramatic reduction from Gemma's standard 256,128 tokens. This is consistent with a classification-focused model that doesn't need the full multilingual generative vocabulary.

| Property | Value |
|---|---|
| Vocab size | 16,384 |
| BOS token | `<s>` (id=2) |
| EOS token | `</s>` (id=1) |
| PAD token | `<pad>` (id=0) |
| UNK token | `<unk>` (id=3) |
| Byte fallbacks | 256 tokens (`<0x00>` through `<0xFF>`) |
| Normalizer | `nmt_nfkc` |

Notably, Gemma's conversation tokens (`<start_of_turn>`, `<end_of_turn>`) are **absent** from this vocabulary — they map to UNK (id=3). The model does not use chat-turn formatting.

Sample vocabulary entries:

```
[  260] = '.'            [  500] = '▁such'       [ 1000] = '▁amount'
[ 2000] = '▁Q'           [ 5000] = '▁tradition'  [10000] = '▁Computer'
[15000] = '▁Philosophy'  [16383] = '▁<custom370>'
```

---

## Chrome Integration Pipeline

```
User visits a page
        │
        ▼
┌─────────────────────────────┐
│  Safe Browsing Heuristics   │  Pre-filter: URL reputation, phishing
│  (CSD - Client Side Det.)   │  patterns, keyboard lock API, etc.
└──────────┬──────────────────┘
           │ Page flagged as suspicious
           ▼
┌─────────────────────────────┐
│  Page Text Extraction       │  Extract visible text content from DOM
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Prompt Construction        │  "You are a web page text scanner..."
│  (from model ID 55 config)  │  + page text appended
└──────────┬──────────────────┘
           │
     ┌─────┴──────┐
     ▼            ▼
┌─────────┐  ┌──────────────┐
│ Gemini  │  │ taxonomy-    │   Whichever model is available
│ Nano    │  │ tiny         │   (taxonomy-tiny is 33x smaller)
│ (4 GB)  │  │ (120 MB)     │
└────┬────┘  └──────┬───────┘
     │              │
     └──────┬───────┘
            ▼
┌─────────────────────────────┐
│  JSON Response Parsing      │  {"brand": "PayPal",
│                             │   "intent": "credential harvesting"}
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Verdict Logic              │  Compare brand claim vs. actual domain,
│                             │  intent vs. page behavior
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Safe Browsing Warning      │  Red interstitial page shown to user
└─────────────────────────────┘
```

### Trigger Conditions

The classifier does **not** run on every page. It triggers when Chrome's CSD heuristics detect suspicious signals:

- Phishing URL patterns (Safe Browsing prefix match)
- Keyboard Lock API usage (common in tech support scams)
- Aggressive popups or fullscreen requests
- Form fields requesting sensitive data (passwords, SSN, credit cards)
- Urgency language patterns
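
The gating above can be sketched as a simple signal aggregator. The signal names below are hypothetical labels for the heuristics listed, not Chromium's actual identifiers:

```python
def should_run_classifier(signals: dict) -> bool:
    """Hypothetical pre-filter: run the on-device classifier only when
    at least one CSD-style suspicion signal has fired."""
    triggers = (
        "phishing_url_pattern",   # Safe Browsing prefix match
        "keyboard_lock_api",      # common in tech support scams
        "aggressive_fullscreen",  # popups / forced fullscreen
        "sensitive_form_fields",  # passwords, SSN, credit cards
        "urgency_language",       # urgency language patterns
    )
    return any(signals.get(t, False) for t in triggers)

should_run_classifier({"keyboard_lock_api": True})  # True
should_run_classifier({})                           # False
```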

---