
OptGuideOnDeviceClassifierModel: Complete Analysis

Overview

OptGuideOnDeviceClassifierModel is a 120 MB on-device language model shipped with Chrome Canary as a Chrome Component. Its manifest names it "Optimization Guide On Device Taxonomy Model", with a base model spec called taxonomy-tiny.

It is a Gemma 2 variant purpose-built for page-level classification: specifically, extracting the brand and intent of web pages for Chrome's client-side scam/phishing detection pipeline.

| Field | Value |
| --- | --- |
| Manifest name | Optimization Guide On Device Taxonomy Model |
| Base model | taxonomy-tiny v0.0.0.0 |
| Component version | 2026.2.12.1554 |
| Component ID (CRX) | eidcjfoningnkhpoelgpjemmhmopkeoi |
| File | weights.bin (126,025,728 bytes / 120.19 MB) |
| Execution config | Empty (0 bytes); no prompt template bundled |
| Performance hint | 3 |
| Availability | Chrome Canary (not tested in Stable) |
| Optimization target | OPTIMIZATION_TARGET_MODEL_EXECUTION_FEATURE_CLASSIFIER (ID 72) |
| Chrome feature flag | ClientSideDetectionBrandAndIntentForScamDetection |

Purpose: Scam Detection via Brand + Intent Classification

Chrome's Client-Side Detection (CSD) system extracts page text from suspicious websites and sends it to this model with the following prompt (decoded from on_device_model_execution_config.pb of model ID 55):

You are a web page text scanner. Your task is to carefully review text from
a web page and answer the following questions in English:

1) What brand does the page represent?
2) In one complete sentence, summarize what this page aims to do.
   Do not leak PII data.

You should output your answers strictly in the following JSON format:

{"brand": "<brand>", "intent": "<intent>"}

Do not use ```json``` block in your output.

Text: [PAGE CONTENT HERE]

The expected response conforms to this JSON schema:

{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "brand": { "type": "string" },
    "intent": { "type": "string" }
  },
  "required": ["brand", "intent"]
}
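
A minimal Python sketch of parsing and validating a response of this shape (illustrative only; Chrome's actual parsing code is not reproduced here):

```python
import json

REQUIRED_KEYS = {"brand", "intent"}

def parse_classifier_output(raw: str) -> dict:
    """Validate a model response against the schema above:
    exactly the keys "brand" and "intent", both strings."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != REQUIRED_KEYS:
        raise ValueError("expected exactly the keys 'brand' and 'intent'")
    if not all(isinstance(v, str) for v in obj.values()):
        raise ValueError("'brand' and 'intent' must be strings")
    return obj

example = parse_classifier_output(
    '{"brand": "PayPal", "intent": "Collect user login credentials."}'
)
```

Requiring the key set to match exactly mirrors the schema's "additionalProperties": false constraint.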

When the detected brand/intent combination is inconsistent with the actual page behavior (e.g., a page claiming to be PayPal but actually harvesting credentials on an unrelated domain), Chrome flags the page as a potential scam via Safe Browsing.


Binary Format: LITERTLM Container

The weights.bin file is not a raw TFLite model. It uses the LITERTLM (LiteRT Language Model) container format, a proprietary Google ODML packaging format with a FlatBuffer header and multiple embedded submodels.

File Layout

Offset          Component                        Size
────────────────────────────────────────────────────────────
0x00000000      LITERTLM FlatBuffer header       32 KB
                  Magic: "LITERTLM"
                  Version: 1
                  Submodels: 4 entries declared
                  Metadata:
                    model_type = "tf_lite_prefill_decode"
                    model_type = "tf_lite_embedder"
                    model_version = "1.0.1"
                    Authors = "ODML team"

0x00008000      TFLite #1 β€” Embedder             8.20 MB (8,601,600 bytes)
                  Input:  token_ids [1, 1] int32
                  Output: embeddings [1, 1, 1024] float32
                  Op: lookup_embedding_table
                  TFLite runtime: 2.18.0

0x0083C000      TFLite #2 β€” Prefill + Decode     111.63 MB (117,055,216 bytes)
                  2 signatures: "prefill" and "decode"
                  39 inputs (embeddings + position + mask + 36 KV cache)
                  37 outputs (36 KV cache + logits [1, 1, 16384])
                  18 transformer layers
                  Full Gemma 2 architecture

0x077E0000      SentencePiece tokenizer          305.6 KB (312,918 bytes)
                  Vocab size: 16,384 tokens
                  Special tokens: <pad>=0, </s>=1, <s>=2, <unk>=3
                  256 byte-fallback tokens
                  Normalizer: nmt_nfkc

0x0782C656      Zero padding to alignment        14.4 KB
0x07830000      End of file                      126,025,728 bytes total
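
The container can be identified and the offsets above cross-checked with a few lines of Python (a sketch; only the leading 8-byte "LITERTLM" magic is assumed, not the full FlatBuffer header layout):

```python
LITERTLM_MAGIC = b"LITERTLM"

def is_litertlm(header: bytes) -> bool:
    """True if the buffer begins with the LITERTLM container magic."""
    return header[:8] == LITERTLM_MAGIC

# Cross-check the section sizes implied by the offsets in the layout table.
assert 0x0083C000 - 0x00008000 == 8_601_600      # embedder
assert 0x077DDEF0 - 0x0083C000 == 117_055_216    # prefill + decode
assert 0x0782C656 - 0x077E0000 == 312_918        # tokenizer

# e.g. is_litertlm(open('weights.bin', 'rb').read(8)) should return True
```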

How to Extract the Submodels

# Offsets come from the file layout above.
with open('weights.bin', 'rb') as f:
    data = f.read()

# TFLite #1: embedder
with open('embedder.tflite', 'wb') as f:
    f.write(data[0x8000:0x83C000])

# TFLite #2: prefill+decode transformer
with open('decoder.tflite', 'wb') as f:
    f.write(data[0x83C000:0x77DDEF0])

# SentencePiece tokenizer
with open('tokenizer.model', 'wb') as f:
    f.write(data[0x77E0000:0x782C656])

Architecture: Gemma 2 "taxonomy-tiny"

The model is a distilled Gemma 2 with reduced dimensions, confirmed by layer name analysis of the TFLite graph.

Specifications

| Parameter | Value | Evidence |
| --- | --- | --- |
| Architecture family | Gemma 2 | QK normalization + post-FFN norm are Gemma 2-exclusive features |
| Transformer layers | 18 | layer_0 through layer_17 in tensor names |
| Embedding dimension | 1024 | Embedder output shape [1, 1, 1024] |
| KV cache dimension | 256 per layer | KV input/output shape [1, 1, 1, 256] |
| Vocabulary size | 16,384 | Logits output shape [1, 1, 16384]; SentencePiece vocab |
| Normalization | RMSNorm | rms_norm/mul, rms_norm/rsqrt, rms_norm/square |
| Pre-attention norm | Yes | pre_attention_norm/rms_norm |
| Pre-FFN norm | Yes | pre_ffw_norm patterns |
| Post-FFN norm | Yes | Post-FFN norm present (Gemma 2-specific) |
| QK normalization | Yes | key_norm/rms_norm (Gemma 2-specific) |
| Positional encoding | RoPE | maybe_rope/concatenate |
| Attention type | Full attention | No sliding-window patterns found |
| Activation | GeLU (likely) | Standard for Gemma 2 |
| Quantization | Mixed INT4/INT8 | 120 MB for 18 layers at dim 1024 implies heavy quantization |
| Estimated parameters | ~100–200M | Based on file size and quantization assumptions |
| TFLite signatures | prefill (no logits) + decode (with logits) | Standard ODML LLM execution pattern |
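
The ~100–200M parameter estimate follows from simple arithmetic on the 111.63 MB prefill+decode blob: pure INT8 (1 byte per parameter) and pure INT4 (half a byte per parameter) bracket the mixed-quantization case.

```python
# TFLite #2 (prefill + decode) size from the file layout.
decoder_bytes = 117_055_216

# Bounds ignoring quantization scales, zero-points, and graph overhead:
params_if_int8 = decoder_bytes      # 1 byte per parameter
params_if_int4 = decoder_bytes * 2  # half a byte per parameter

print(f"~{params_if_int8 / 1e6:.0f}M to ~{params_if_int4 / 1e6:.0f}M parameters")
# -> ~117M to ~234M parameters
```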

Comparison with Known Models

|  | taxonomy-tiny | Gemma 2 2B | Gemini Nano v3 |
| --- | --- | --- | --- |
| Layers | 18 | 26 | ~32 |
| Embed dim | 1,024 | 2,304 | unknown |
| Vocab size | 16,384 | 256,128 | 256,128 |
| File size | 120 MB | ~2.6 GB | 4.07 GB |
| QK norm | Yes | Yes | Yes |
| Post-FFN norm | Yes | Yes | Yes |
| Sliding window | No | Yes (alternating) | Yes |
| Purpose | Classification | General | General |

Single Transformer Block Structure

From tensor name analysis, each of the 18 layers contains:

layer_N/
├── layer_N.pre_qkv/
│   ├── pre_attention_norm/rms_norm/          (RMSNorm)
│   └── attn._pre_attention_fn/
│       └── maybe_rope/                       (RoPE positional encoding)
├── attn.dot_product_attention/
│   └── dot_attn._qkv_fn/
│       ├── key_norm/rms_norm/                (QK normalization)
│       ├── dot_general (Q*K)
│       ├── tfl_softmax
│       ├── dot_general (attn*V)
│       └── reshape/transpose
├── layer_N.post_qkv/
│   ├── attn.post_qkv/attn_vec_einsum/        (output projection)
│   ├── add (residual)
│   └── add1 (post-attention residual)
├── layer_N.update_cache/
│   └── attn.update_cache/concatenate         (KV cache update)
└── [pre_ffw_norm + FFN + post_ffw_norm]      (feed-forward block)

Final output: final_norm/rms_norm → decode_softmax → logits [1, 1, 16384]
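
The rms_norm/square, rms_norm/rsqrt, and rms_norm/mul ops correspond to standard RMSNorm. A NumPy reference (a sketch: the epsilon value and the unit scale weight are assumptions; Gemma-family models typically parameterize the learned scale as 1 + w):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # square -> mean -> rsqrt -> mul, matching the op chain in the graph
    mean_sq = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(mean_sq + eps) * weight

x = np.random.default_rng(0).standard_normal((1, 1, 1024)).astype(np.float32)
y = rms_norm(x, np.ones(1024, dtype=np.float32))  # embedder-shaped activations
```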


Tokenizer: Reduced Gemma Vocabulary

The embedded SentencePiece model uses a 16,384-token vocabulary, a dramatic reduction from Gemma's standard 256,128 tokens. This is consistent with a classification-focused model that doesn't need the full multilingual generative vocabulary.

| Property | Value |
| --- | --- |
| Vocab size | 16,384 |
| BOS token | `<s>` (id=2) |
| EOS token | `</s>` (id=1) |
| PAD token | `<pad>` (id=0) |
| UNK token | `<unk>` (id=3) |
| Byte fallbacks | 256 tokens (`<0x00>` through `<0xFF>`) |
| Normalizer | nmt_nfkc |

Notably, Gemma's conversation tokens (<start_of_turn>, <end_of_turn>) are absent from this vocabulary: they map to UNK (id=3). The model does not use chat-turn formatting.

Sample vocabulary entries:

[  260] = '.'           [  500] = '▁such'       [ 1000] = '▁amount'
[ 2000] = '▁Q'          [ 5000] = '▁tradition'  [10000] = '▁Computer'
[15000] = '▁Philosophy'  [16383] = '▁<custom370>'

Chrome Integration Pipeline

User visits a page
        │
        ▼
┌─────────────────────────────┐
│  Safe Browsing Heuristics   │  Pre-filter: URL reputation, phishing
│  (CSD - Client Side Det.)   │  patterns, keyboard lock API, etc.
└──────────┬──────────────────┘
           │ Page flagged as suspicious
           ▼
┌─────────────────────────────┐
│  Page Text Extraction       │  Extract visible text content from DOM
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Prompt Construction        │  "You are a web page text scanner..."
│  (from model ID 55 config)  │  + page text appended
└──────────┬──────────────────┘
           │
     ┌─────┴──────┐
     ▼            ▼
┌─────────┐  ┌──────────────┐
│ Gemini  │  │ taxonomy-    │   Whichever model is available
│ Nano    │  │ tiny         │   (taxonomy-tiny is 33x smaller)
│ (4 GB)  │  │ (120 MB)     │
└────┬────┘  └──────┬───────┘
     │              │
     └──────┬───────┘
            ▼
┌─────────────────────────────┐
│  JSON Response Parsing      │  {"brand": "PayPal",
│                             │   "intent": "credential harvesting"}
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Verdict Logic              │  Compare brand claim vs. actual domain,
│                             │  intent vs. page behavior
└──────────┬──────────────────┘
           │
           ▼
┌─────────────────────────────┐
│  Safe Browsing Warning      │  Red interstitial page shown to user
└─────────────────────────────┘
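
Chrome's actual verdict logic is not public; the brand-vs-domain comparison step can be sketched as follows (the brand-to-domain map and the function name are hypothetical, for illustration only):

```python
from urllib.parse import urlparse

# Hypothetical brand -> legitimate-domain map; illustration only.
KNOWN_BRAND_DOMAINS = {
    "paypal": {"paypal.com"},
    "google": {"google.com"},
}

def looks_like_brand_spoof(brand: str, page_url: str) -> bool:
    """Flag pages whose model-reported brand mismatches the serving domain."""
    expected = KNOWN_BRAND_DOMAINS.get(brand.lower())
    if expected is None:
        return False  # unknown brand: no basis for a mismatch verdict
    host = urlparse(page_url).hostname or ""
    return not any(host == d or host.endswith("." + d) for d in expected)

# A page claiming to be PayPal on an unrelated domain is flagged:
# looks_like_brand_spoof("PayPal", "https://paypa1-secure.example/login") -> True
```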

Trigger Conditions

The classifier does not run on every page. It triggers when Chrome's CSD heuristics detect suspicious signals:

  • Phishing URL patterns (Safe Browsing prefix match)
  • Keyboard Lock API usage (common in tech support scams)
  • Aggressive popups or fullscreen requests
  • Form fields requesting sensitive data (passwords, SSN, credit cards)
  • Urgency language patterns