# OptGuideOnDeviceClassifierModel: Complete Analysis
## Overview
**OptGuideOnDeviceClassifierModel** is a 120 MB on-device language model shipped with Chrome Canary as a Chrome Component. Its manifest names it **"Optimization Guide On Device Taxonomy Model"**, with a base model spec called **`taxonomy-tiny`**.
It is a **Gemma 2 variant** purpose-built for **page-level classification**: it extracts the **brand** and **intent** of web pages for Chrome's client-side **scam/phishing detection** pipeline.
| Field | Value |
|---|---|
| Manifest name | Optimization Guide On Device Taxonomy Model |
| Base model | `taxonomy-tiny` v0.0.0.0 |
| Component version | `2026.2.12.1554` |
| Component ID (CRX) | `eidcjfoningnkhpoelgpjemmhmopkeoi` |
| File | `weights.bin` (126,025,728 bytes / 120.19 MB) |
| Execution config | Empty (0 bytes); no prompt template bundled |
| Performance hint | `3` |
| Availability | **Chrome Canary** (not tested in Stable) |
| Optimization target | `OPTIMIZATION_TARGET_MODEL_EXECUTION_FEATURE_CLASSIFIER` (ID 72) |
| Chrome feature flag | `ClientSideDetectionBrandAndIntentForScamDetection` |
---
## Purpose: Scam Detection via Brand + Intent Classification
Chrome's Client-Side Detection (CSD) system extracts page text from suspicious websites and sends it to this model with the following prompt (decoded from `on_device_model_execution_config.pb` of model ID 55):
```
You are a web page text scanner. Your task is to carefully review text from
a web page and answer the following questions in English:
1) What brand does the page represent?
2) In one complete sentence, summarize what this page aims to do.
Do not leak PII data.
You should output your answers strictly in the following JSON format:
{"brand": "<brand>", "intent": "<intent>"}
Do not use ```json``` block in your output.
Text: [PAGE CONTENT HERE]
```
The expected response conforms to this JSON schema:
```json
{
"type": "object",
"additionalProperties": false,
"properties": {
"brand": { "type": "string" },
"intent": { "type": "string" }
},
"required": ["brand", "intent"]
}
```
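A minimal, stdlib-only sketch of validating a model response against this schema (the function name is illustrative, not Chrome's actual code):

```python
import json

def parse_classifier_output(raw: str) -> dict:
    """Parse and validate a {"brand": ..., "intent": ...} response."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("response must be a JSON object")
    extra = set(obj) - {"brand", "intent"}
    if extra:  # enforces "additionalProperties": false
        raise ValueError(f"unexpected keys: {extra}")
    for key in ("brand", "intent"):  # enforces "required" + string type
        if not isinstance(obj.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    return obj

result = parse_classifier_output('{"brand": "PayPal", "intent": "Sign in to your account."}')
print(result["brand"])  # PayPal
```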
When the detected brand/intent combination is inconsistent with the actual page behavior (e.g., a page claiming to be PayPal but actually harvesting credentials on an unrelated domain), Chrome flags the page as a potential scam via Safe Browsing.
---
## Binary Format: LITERTLM Container
The `weights.bin` file is **not** a raw TFLite model. It uses the **LITERTLM** (LiteRT Language Model) container format, a proprietary Google ODML packaging format with a FlatBuffer header and multiple embedded submodels.
### File Layout
```
Offset       Component                        Size
────────────────────────────────────────────────────────────────────
0x00000000   LITERTLM FlatBuffer header       32 KB
             Magic: "LITERTLM"
             Version: 1
             Submodels: 4 entries declared
             Metadata:
               model_type    = "tf_lite_prefill_decode"
               model_type    = "tf_lite_embedder"
               model_version = "1.0.1"
               Authors       = "ODML team"
0x00008000   TFLite #1 - Embedder             8.20 MB (8,601,600 bytes)
             Input:  token_ids  [1, 1] int32
             Output: embeddings [1, 1, 1024] float32
             Op: lookup_embedding_table
             TFLite runtime: 2.18.0
0x0083C000   TFLite #2 - Prefill + Decode     111.63 MB (117,055,216 bytes)
             2 signatures: "prefill" and "decode"
             39 inputs (embeddings + position + mask + 36 KV cache)
             37 outputs (36 KV cache + logits [1, 1, 16384])
             18 transformer layers
             Full Gemma 2 architecture
0x077E0000   SentencePiece tokenizer          305.6 KB (312,918 bytes)
             Vocab size: 16,384 tokens
             Special tokens: <pad>=0, </s>=1, <s>=2, <unk>=3
             256 byte-fallback tokens
             Normalizer: nmt_nfkc
0x0782C656   Zero padding to alignment        14.4 KB
0x07830000   End of file                      126,025,728 bytes total
```
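The offsets above can be sanity-checked before carving: the container starts with the `LITERTLM` magic, and every TFLite FlatBuffer carries the file identifier `TFL3` at byte offset 4. A minimal sketch (the helper name is illustrative):

```python
def check_layout(data: bytes) -> bool:
    """Verify the LITERTLM magic and the TFLite submodel offsets above."""
    assert data[:8] == b'LITERTLM', 'missing LITERTLM container magic'
    # TFLite FlatBuffers store the file identifier "TFL3" at byte offset 4.
    for name, off in [('embedder', 0x8000), ('prefill+decode', 0x83C000)]:
        assert data[off + 4:off + 8] == b'TFL3', f'{name}: no TFLite identifier'
    return True

# Usage: check_layout(open('weights.bin', 'rb').read())
```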
### How to Extract the Submodels
```python
# Offsets taken from the LITERTLM file layout above.
with open('weights.bin', 'rb') as f:
    data = f.read()

with open('embedder.tflite', 'wb') as f:    # TFLite #1: embedder
    f.write(data[0x8000:0x83C000])
with open('decoder.tflite', 'wb') as f:     # TFLite #2: prefill + decode transformer
    f.write(data[0x83C000:0x77DDEF0])
with open('tokenizer.model', 'wb') as f:    # SentencePiece tokenizer
    f.write(data[0x77E0000:0x782C656])
```
---
## Architecture: Gemma 2 "taxonomy-tiny"
The model is a **distilled Gemma 2** with reduced dimensions, confirmed by layer name analysis of the TFLite graph.
### Specifications
| Parameter | Value | Evidence |
|---|---|---|
| Architecture family | **Gemma 2** | QK normalization + post-FFN norm = Gemma 2 exclusive features |
| Transformer layers | **18** | `layer_0` through `layer_17` in tensor names |
| Embedding dimension | **1024** | Embedder output shape `[1, 1, 1024]` |
| KV cache dimension | **256** per layer | KV input/output shape `[1, 1, 1, 256]` |
| Vocabulary size | **16,384** | Logits output shape `[1, 1, 16384]`; SentencePiece vocab |
| Normalization | **RMSNorm** | `rms_norm/mul`, `rms_norm/rsqrt`, `rms_norm/square` |
| Pre-attention norm | **Yes** | `pre_attention_norm/rms_norm` |
| Pre-FFN norm | **Yes** | `pre_ffw_norm` patterns |
| Post-FFN norm | **Yes** | Post-FFN norm present (Gemma 2 specific) |
| QK normalization | **Yes** | `key_norm/rms_norm` (Gemma 2 specific) |
| Positional encoding | **RoPE** | `maybe_rope/concatenate` |
| Attention type | **Full attention** | No sliding window patterns found |
| Activation | **GeLU** (likely) | Standard for Gemma 2 |
| Quantization | **Mixed INT4/INT8** | 120 MB for 18 layers with 1024 dim implies heavy quantization |
| Estimated parameters | **~100–200M** | Based on file size and quantization assumptions |
| TFLite signatures | `prefill` (no logits) + `decode` (with logits) | Standard ODML LLM execution pattern |
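The parameter estimate follows from the file layout alone: the transformer submodel is 117,055,216 bytes, so pure INT8 would imply roughly 117M parameters and pure INT4 roughly 234M, bracketing the ~100–200M figure for a mixed scheme (ignoring quantization scales and graph overhead):

```python
# Back-of-envelope parameter count from the transformer submodel size.
transformer_bytes = 117_055_216  # TFLite #2 size from the file layout
for scheme, bytes_per_param in [('INT8', 1.0), ('INT4', 0.5)]:
    params = transformer_bytes / bytes_per_param
    print(f'{scheme}: ~{params / 1e6:.0f}M parameters')
```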
### Comparison with Known Models
| | **taxonomy-tiny** | Gemma 2 2B | Gemini Nano v3 |
|---|---|---|---|
| Layers | 18 | 26 | ~32 |
| Embed dim | 1,024 | 2,304 | unknown |
| Vocab size | 16,384 | 256,128 | 256,128 |
| File size | 120 MB | ~2.6 GB | 4.07 GB |
| QK norm | Yes | Yes | Yes |
| Post-FFN norm | Yes | Yes | Yes |
| Sliding window | No | Yes (alternating) | Yes |
| Purpose | Classification | General | General |
### Single Transformer Block Structure
From tensor name analysis, each of the 18 layers contains:
```
layer_N/
├── layer_N.pre_qkv/
│   ├── pre_attention_norm/rms_norm/       (RMSNorm)
│   └── attn._pre_attention_fn/
│       └── maybe_rope/                    (RoPE positional encoding)
├── attn.dot_product_attention/
│   ├── dot_attn._qkv_fn/
│   ├── key_norm/rms_norm/                 (QK normalization)
│   ├── dot_general                        (Q*K)
│   ├── tfl_softmax
│   ├── dot_general                        (attn*V)
│   └── reshape/transpose
├── layer_N.post_qkv/
│   ├── attn.post_qkv/attn_vec_einsum/     (output projection)
│   ├── add                                (residual)
│   └── add1                               (post-attention residual)
├── layer_N.update_cache/
│   └── attn.update_cache/concatenate      (KV cache update)
└── [pre_ffw_norm + FFN + post_ffw_norm]   (feed-forward block)
```
Final output: `final_norm/rms_norm` → `decode_softmax` → logits `[1, 1, 16384]`
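The norm placement implied by these tensor names can be sketched as a toy forward pass (NumPy, single head, no RoPE or KV cache; the weights and the GeLU stand-in are illustrative, not extracted values):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Matches the rms_norm/{square, mul, rsqrt} op pattern in the graph.
    return x / np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GeLU (the activation is "likely GeLU" per the table).
    return 0.5 * x * (1 + np.tanh(0.7978845608 * (x + 0.044715 * x**3)))

def gemma2_block(x, wq, wk, wv, wo, w_up, w_down):
    """One layer's norm placement: pre-attn, QK, post-attn, pre-FFN, post-FFN."""
    h = rms_norm(x)                                   # pre_attention_norm
    q, k, v = h @ wq, h @ wk, h @ wv
    q, k = rms_norm(q), rms_norm(k)                   # QK normalization (key_norm)
    scores = q @ k.T / np.sqrt(q.shape[-1])           # dot_general (Q*K)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)               # tfl_softmax
    x = x + rms_norm((attn @ v) @ wo)                 # post-attention norm + residual
    h = rms_norm(x)                                   # pre_ffw_norm
    return x + rms_norm(gelu(h @ w_up) @ w_down)      # post_ffw_norm + residual

# Toy dimensions, not the real 1024-dim model.
rng = np.random.default_rng(0)
d = 8
out = gemma2_block(rng.normal(size=(4, d)), *[rng.normal(size=(d, d)) for _ in range(6)])
print(out.shape)  # (4, 8)
```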
---
## Tokenizer: Reduced Gemma Vocabulary
The embedded SentencePiece model uses a **16,384-token vocabulary**, a dramatic reduction from Gemma's standard 256,128 tokens. This is consistent with a classification-focused model that doesn't need the full multilingual generative vocabulary.
| Property | Value |
|---|---|
| Vocab size | 16,384 |
| BOS token | `<s>` (id=2) |
| EOS token | `</s>` (id=1) |
| PAD token | `<pad>` (id=0) |
| UNK token | `<unk>` (id=3) |
| Byte fallbacks | 256 tokens (`<0x00>` through `<0xFF>`) |
| Normalizer | `nmt_nfkc` |
Notably, Gemma's conversation tokens (`<start_of_turn>`, `<end_of_turn>`) are **absent** from this vocabulary; they map to UNK (id=3). The model does not use chat-turn formatting.
Sample vocabulary entries:
```
[  260] = '.'            [  500] = '▁such'       [ 1000] = '▁amount'
[ 2000] = '▁Q'           [ 5000] = '▁tradition'  [10000] = '▁Computer'
[15000] = '▁Philosophy'  [16383] = '▁<custom370>'
```
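With only 16,384 tokens, out-of-vocabulary text falls through to the 256 byte-fallback entries. The mechanism can be illustrated directly (this mimics how byte fallback works in SentencePiece, not the actual tokenizer):

```python
def byte_fallback(text: str) -> list[str]:
    # Each UTF-8 byte of an uncovered character maps to one <0xNN> token.
    return [f'<0x{b:02X}>' for b in text.encode('utf-8')]

print(byte_fallback('€'))  # ['<0xE2>', '<0x82>', '<0xAC>']
```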
---
## Chrome Integration Pipeline
```
User visits a page
        │
        ▼
┌──────────────────────────────┐
│  Safe Browsing Heuristics    │  Pre-filter: URL reputation, phishing
│  (CSD - Client Side Det.)    │  patterns, keyboard lock API, etc.
└──────────────┬───────────────┘
               │ Page flagged as suspicious
               ▼
┌──────────────────────────────┐
│  Page Text Extraction        │  Extract visible text content from DOM
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│  Prompt Construction         │  "You are a web page text scanner..."
│  (from model ID 55 config)   │  + page text appended
└──────────────┬───────────────┘
               │
        ┌──────┴──────┐
        ▼             ▼
  ┌───────────┐  ┌───────────┐
  │  Gemini   │  │ taxonomy- │  Whichever model is available
  │   Nano    │  │   tiny    │  (taxonomy-tiny is 33x smaller)
  │  (4 GB)   │  │  (120 MB) │
  └─────┬─────┘  └─────┬─────┘
        │              │
        └──────┬───────┘
               ▼
┌──────────────────────────────┐
│  JSON Response Parsing       │  {"brand": "PayPal",
│                              │   "intent": "credential harvesting"}
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│  Verdict Logic               │  Compare brand claim vs. actual domain,
│                              │  intent vs. page behavior
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│  Safe Browsing Warning       │  Red interstitial page shown to user
└──────────────────────────────┘
```
### Trigger Conditions
The classifier does **not** run on every page. It triggers when Chrome's CSD heuristics detect suspicious signals:
- Phishing URL patterns (Safe Browsing prefix match)
- Keyboard Lock API usage (common in tech support scams)
- Aggressive popups or fullscreen requests
- Form fields requesting sensitive data (passwords, SSN, credit cards)
- Urgency language patterns
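A minimal sketch of the brand-vs-domain comparison in the verdict step (the mapping and function names are illustrative, not Chrome's actual implementation):

```python
from urllib.parse import urlparse

# Illustrative brand -> legitimate-domain mapping; Chrome's real list
# is server-side and far larger.
KNOWN_BRAND_DOMAINS = {
    "paypal": {"paypal.com"},
    "google": {"google.com", "accounts.google.com"},
}

def looks_like_brand_spoof(brand: str, page_url: str) -> bool:
    """Flag when a page claims a known brand but sits on an unrelated domain."""
    domains = KNOWN_BRAND_DOMAINS.get(brand.strip().lower())
    if not domains:
        return False  # unknown brand: no basis for a mismatch verdict
    host = urlparse(page_url).hostname or ""
    return not any(host == d or host.endswith("." + d) for d in domains)

print(looks_like_brand_spoof("PayPal", "https://paypa1-login.example"))   # True
print(looks_like_brand_spoof("PayPal", "https://www.paypal.com/signin"))  # False
```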
---