# OptGuideOnDeviceClassifierModel: Complete Analysis
## Overview
**OptGuideOnDeviceClassifierModel** is a 120 MB on-device language model shipped with Chrome Canary as a Chrome Component. Its manifest names it **"Optimization Guide On Device Taxonomy Model"**, with a base model spec called **`taxonomy-tiny`**.
It is a **Gemma 2 variant** purpose-built for **page-level classification**: extracting the **brand** and **intent** of web pages for Chrome's client-side **scam/phishing detection** pipeline.
| Field | Value |
|---|---|
| Manifest name | Optimization Guide On Device Taxonomy Model |
| Base model | `taxonomy-tiny` v0.0.0.0 |
| Component version | `2026.2.12.1554` |
| Component ID (CRX) | `eidcjfoningnkhpoelgpjemmhmopkeoi` |
| File | `weights.bin` (126,025,728 bytes / 120.19 MB) |
| Execution config | Empty (0 bytes); no prompt template bundled |
| Performance hint | `3` |
| Availability | **Chrome Canary** (not tested in Stable) |
| Optimization target | `OPTIMIZATION_TARGET_MODEL_EXECUTION_FEATURE_CLASSIFIER` (ID 72) |
| Chrome feature flag | `ClientSideDetectionBrandAndIntentForScamDetection` |
---
## Purpose: Scam Detection via Brand + Intent Classification
Chrome's Client-Side Detection (CSD) system extracts page text from suspicious websites and sends it to this model with the following prompt (decoded from `on_device_model_execution_config.pb` of model ID 55):
````
You are a web page text scanner. Your task is to carefully review text from
a web page and answer the following questions in English:
1) What brand does the page represent?
2) In one complete sentence, summarize what this page aims to do.
Do not leak PII data.
You should output your answers strictly in the following JSON format:
{"brand": "<brand>", "intent": "<intent>"}
Do not use ```json``` block in your output.
Text: [PAGE CONTENT HERE]
````
The expected response conforms to this JSON schema:
```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "brand": { "type": "string" },
    "intent": { "type": "string" }
  },
  "required": ["brand", "intent"]
}
```
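Because the model must emit bare JSON with no code fence, the caller has to parse the raw string and enforce the schema itself. A minimal validation sketch; the function name and error handling are illustrative, not Chrome's internal parser:

```python
import json

def parse_classifier_output(raw: str) -> dict:
    """Parse a classifier response and enforce the {"brand", "intent"} schema."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("response must be a JSON object")
    # "required" + "additionalProperties: false" => exactly these two keys.
    if set(obj) != {"brand", "intent"}:
        raise ValueError("unexpected or missing fields")
    if not all(isinstance(v, str) for v in obj.values()):
        raise ValueError("brand and intent must be strings")
    return obj

# Example:
parse_classifier_output('{"brand": "PayPal", "intent": "Asks the user to log in."}')
```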
When the detected brand/intent combination is inconsistent with the actual page behavior (e.g., a page claiming to be PayPal but actually harvesting credentials on an unrelated domain), Chrome flags the page as a potential scam via Safe Browsing.
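The brand-vs-domain comparison can be illustrated with a toy check. The allowlist and function below are hypothetical; Chrome's real verdict logic and brand data are internal to Safe Browsing:

```python
# Hypothetical brand -> legitimate-domain map, for illustration only.
KNOWN_BRAND_DOMAINS = {
    "paypal": {"paypal.com"},
    "google": {"google.com", "accounts.google.com"},
}

def looks_like_brand_spoof(brand: str, host: str) -> bool:
    """Flag pages whose claimed brand does not match the serving domain."""
    domains = KNOWN_BRAND_DOMAINS.get(brand.lower())
    if not domains:
        return False  # unknown brand: no mismatch signal
    return not any(host == d or host.endswith("." + d) for d in domains)

# looks_like_brand_spoof("PayPal", "paypal-login.example") -> True
# looks_like_brand_spoof("PayPal", "www.paypal.com")       -> False
```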
---
## Binary Format: LITERTLM Container
The `weights.bin` file is **not** a raw TFLite model. It uses the **LITERTLM** (LiteRT Language Model) container format: a proprietary Google ODML packaging format with a FlatBuffer header and multiple embedded submodels.
### File Layout
```
Offset       Component                        Size
────────────────────────────────────────────────────────────
0x00000000   LITERTLM FlatBuffer header       32 KB
             Magic: "LITERTLM"
             Version: 1
             Submodels: 4 entries declared
             Metadata:
               model_type    = "tf_lite_prefill_decode"
               model_type    = "tf_lite_embedder"
               model_version = "1.0.1"
               Authors       = "ODML team"
0x00008000   TFLite #1 - Embedder             8.20 MB (8,601,600 bytes)
             Input:  token_ids [1, 1] int32
             Output: embeddings [1, 1, 1024] float32
             Op: lookup_embedding_table
             TFLite runtime: 2.18.0
0x0083C000   TFLite #2 - Prefill + Decode     111.63 MB (117,055,216 bytes)
             2 signatures: "prefill" and "decode"
             39 inputs (embeddings + position + mask + 36 KV cache)
             37 outputs (36 KV cache + logits [1, 1, 16384])
             18 transformer layers
             Full Gemma 2 architecture
0x077E0000   SentencePiece tokenizer          305.6 KB (312,918 bytes)
             Vocab size: 16,384 tokens
             Special tokens: <pad>=0, </s>=1, <s>=2, <unk>=3
             256 byte-fallback tokens
             Normalizer: nmt_nfkc
0x0782C656   Zero padding to alignment        14.4 KB (14,762 bytes)
0x07830000   End of file                      126,025,728 bytes total
```
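Before slicing the container, it is worth sanity-checking the magic string. This sketch assumes only the 8-byte `"LITERTLM"` magic at offset 0; the FlatBuffer header that follows is not parsed here, since its schema is not public:

```python
def is_litertlm(data: bytes) -> bool:
    """True if the buffer starts with the LITERTLM container magic."""
    return data[:8] == b"LITERTLM"

# Synthetic check; on the real file, pass open('weights.bin', 'rb').read(8).
assert is_litertlm(b"LITERTLM" + b"\x00" * 24)
assert not is_litertlm(b"TFL3")  # a raw TFLite file would fail the check
```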
### How to Extract the Submodels
```python
# Offsets and sizes come from the LITERTLM header (see the file layout above).
data = open('weights.bin', 'rb').read()

# TFLite embedder: token_ids -> embeddings
open('embedder.tflite', 'wb').write(data[0x8000:0x83C000])

# TFLite prefill+decode transformer
open('decoder.tflite', 'wb').write(data[0x83C000:0x77DDEF0])

# SentencePiece tokenizer
open('tokenizer.model', 'wb').write(data[0x77E0000:0x782C656])
```
---
## Architecture: Gemma 2 "taxonomy-tiny"
The model is a **distilled Gemma 2** with reduced dimensions, confirmed by layer-name analysis of the TFLite graph.
### Specifications
| Parameter | Value | Evidence |
|---|---|---|
| Architecture family | **Gemma 2** | QK normalization + post-FFN norm are Gemma 2-exclusive features |
| Transformer layers | **18** | `layer_0` through `layer_17` in tensor names |
| Embedding dimension | **1024** | Embedder output shape `[1, 1, 1024]` |
| KV cache dimension | **256** per layer | KV input/output shape `[1, 1, 1, 256]` |
| Vocabulary size | **16,384** | Logits output shape `[1, 1, 16384]`; SentencePiece vocab |
| Normalization | **RMSNorm** | `rms_norm/mul`, `rms_norm/rsqrt`, `rms_norm/square` |
| Pre-attention norm | **Yes** | `pre_attention_norm/rms_norm` |
| Pre-FFN norm | **Yes** | `pre_ffw_norm` patterns |
| Post-FFN norm | **Yes** | Post-FFN norm present (Gemma 2 specific) |
| QK normalization | **Yes** | `key_norm/rms_norm` (Gemma 2 specific) |
| Positional encoding | **RoPE** | `maybe_rope/concatenate` |
| Attention type | **Full attention** | No sliding-window patterns found |
| Activation | **GeLU** (likely) | Standard for Gemma 2 |
| Quantization | **Mixed INT4/INT8** | 120 MB for 18 layers at 1024 dim implies heavy quantization |
| Estimated parameters | **~100-200M** | Based on file size and quantization assumptions |
| TFLite signatures | `prefill` (no logits) + `decode` (with logits) | Standard ODML LLM execution pattern |
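The ~100-200M estimate can be reproduced with back-of-envelope arithmetic from the shapes in the table. The FFN hidden width (2048, gated) and the per-layer projection shapes (Q/O at 1024x1024, K/V at 1024x256, following the KV-cache shape) are assumptions, not extracted values:

```python
# Back-of-envelope parameter count under the stated assumptions.
embed_dim, kv_dim, vocab, layers, ffn_hidden = 1024, 256, 16_384, 18, 2048

embedding = vocab * embed_dim                              # also tied output head
attn = 2 * embed_dim * embed_dim + 2 * embed_dim * kv_dim  # Q, O, K, V projections
ffn = 3 * embed_dim * ffn_hidden                           # gate, up, down matrices
total = embedding + layers * (attn + ffn)

print(f"{total / 1e6:.0f}M parameters")  # 177M, inside the 100-200M estimate
# At roughly 4-8 bits per weight that is ~90-180 MB, consistent with
# a 120 MB file under mixed INT4/INT8 quantization.
```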
### Comparison with Known Models
| | **taxonomy-tiny** | Gemma 2 2B | Gemini Nano v3 |
|---|---|---|---|
| Layers | 18 | 26 | ~32 |
| Embed dim | 1,024 | 2,304 | unknown |
| Vocab size | 16,384 | 256,128 | 256,128 |
| File size | 120 MB | ~2.6 GB | 4.07 GB |
| QK norm | Yes | Yes | Yes |
| Post-FFN norm | Yes | Yes | Yes |
| Sliding window | No | Yes (alternating) | Yes |
| Purpose | Classification | General | General |
### Single Transformer Block Structure
From tensor-name analysis, each of the 18 layers contains:
```
layer_N/
├── layer_N.pre_qkv/
│   ├── pre_attention_norm/rms_norm/   (RMSNorm)
│   ├── attn._pre_attention_fn/
│   └── maybe_rope/                    (RoPE positional encoding)
├── attn.dot_product_attention/
│   ├── dot_attn._qkv_fn/
│   ├── key_norm/rms_norm/             (QK normalization)
│   ├── dot_general                    (Q*K)
│   ├── tfl_softmax
│   ├── dot_general                    (attn*V)
│   └── reshape/transpose
├── layer_N.post_qkv/
│   ├── attn.post_qkv/attn_vec_einsum/ (output projection)
│   ├── add                            (residual)
│   └── add1                           (post-attention residual)
├── layer_N.update_cache/
│   └── attn.update_cache/concatenate  (KV cache update)
└── [pre_ffw_norm + FFN + post_ffw_norm] (feed-forward block)
```
Final output: `final_norm/rms_norm` → `decode_softmax` → logits `[1, 1, 16384]`
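The 39-input/37-output counts quoted for the decode signature follow directly from 18 layers with separate K and V caches. A sketch of that bookkeeping; the tensor names are illustrative, not the exact TFLite names:

```python
LAYERS = 18  # layer_0 .. layer_17

def decode_io_names(layers: int = LAYERS):
    """Enumerate decode-signature I/O: 18 layers x (K, V) = 36 cache tensors."""
    kv = [f"kv_cache_{kind}_{i}" for i in range(layers) for kind in ("k", "v")]
    inputs = ["embeddings", "position", "attention_mask"] + kv  # 3 + 36 = 39
    outputs = kv + ["logits"]                                   # 36 + 1 = 37
    return inputs, outputs

inputs, outputs = decode_io_names()
print(len(inputs), len(outputs))  # 39 37
```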
---
## Tokenizer: Reduced Gemma Vocabulary
The embedded SentencePiece model uses a **16,384-token vocabulary**, a dramatic reduction from Gemma's standard 256,128 tokens. This is consistent with a classification-focused model that doesn't need the full multilingual generative vocabulary.
| Property | Value |
|---|---|
| Vocab size | 16,384 |
| BOS token | `<s>` (id=2) |
| EOS token | `</s>` (id=1) |
| PAD token | `<pad>` (id=0) |
| UNK token | `<unk>` (id=3) |
| Byte fallbacks | 256 tokens (`<0x00>` through `<0xFF>`) |
| Normalizer | `nmt_nfkc` |
Notably, Gemma's conversation tokens (`<start_of_turn>`, `<end_of_turn>`) are **absent** from this vocabulary; they map to UNK (id=3). The model does not use chat-turn formatting.
Sample vocabulary entries:
```
[  260] = '.'             [  500] = '▁such'        [ 1000] = '▁amount'
[ 2000] = '▁Q'            [ 5000] = '▁tradition'   [10000] = '▁Computer'
[15000] = '▁Philosophy'   [16383] = '▁<custom370>'
```
---
## Chrome Integration Pipeline
```
User visits a page
        │
        ▼
┌──────────────────────────────┐
│  Safe Browsing Heuristics    │  Pre-filter: URL reputation, phishing
│  (CSD - Client Side Det.)    │  patterns, keyboard lock API, etc.
└──────────┬───────────────────┘
           │  Page flagged as suspicious
           ▼
┌──────────────────────────────┐
│  Page Text Extraction        │  Extract visible text content from DOM
└──────────┬───────────────────┘
           │
           ▼
┌──────────────────────────────┐
│  Prompt Construction         │  "You are a web page text scanner..."
│  (from model ID 55 config)   │  + page text appended
└──────────┬───────────────────┘
           │
     ┌─────┴──────┐
     ▼            ▼
┌─────────┐  ┌────────────┐
│ Gemini  │  │ taxonomy-  │  Whichever model is available
│ Nano    │  │ tiny       │  (taxonomy-tiny is 33x smaller)
│ (4 GB)  │  │ (120 MB)   │
└────┬────┘  └─────┬──────┘
     │             │
     └──────┬──────┘
            ▼
┌──────────────────────────────┐
│  JSON Response Parsing       │  {"brand": "PayPal",
│                              │   "intent": "credential harvesting"}
└──────────┬───────────────────┘
           │
           ▼
┌──────────────────────────────┐
│  Verdict Logic               │  Compare brand claim vs. actual domain,
│                              │  intent vs. page behavior
└──────────┬───────────────────┘
           │
           ▼
┌──────────────────────────────┐
│  Safe Browsing Warning       │  Red interstitial page shown to user
└──────────────────────────────┘
```
### Trigger Conditions
The classifier does **not** run on every page. It triggers when Chrome's CSD heuristics detect suspicious signals:
- Phishing URL patterns (Safe Browsing prefix match)
- Keyboard Lock API usage (common in tech support scams)
- Aggressive popups or fullscreen requests
- Form fields requesting sensitive data (passwords, SSN, credit cards)
- Urgency language patterns
---