TinyBERT Address Autofill

A compact field-type classifier for HTML form autofill, developed by Firefox's Credentials Management team. Given a string describing a single form field's attributes, it predicts one of 66 classes: an autofill field type (given-name, family-name, email, postal-code, address-line1, cc-number, etc.) or other when the field should not be filled.

The model is fine-tuned from huawei-noah/TinyBERT_General_4L_312D on a corpus of manually annotated shopping and address forms collected by Mozilla. It is intended to run client-side inside Firefox (or any Transformers.js host) as a replacement for, or augmentation of, the existing regex-based heuristic field detector.

ONNX variants

All variants live under onnx/ and are loadable through Transformers.js by passing the corresponding dtype argument.

File                        Precision                      Size      Transformers.js dtype
onnx/model.onnx             fp32                           57.6 MB   fp32
onnx/model_fp16.onnx        fp16                           28.9 MB   fp16
onnx/model_quantized.onnx   int8 dynamic (default)         14.6 MB   q8
onnx/model_int8.onnx        int8 dynamic                   14.6 MB   int8
onnx/model_uint8.onnx       uint8 dynamic                  14.6 MB   uint8
onnx/model_q4.onnx          4-bit weight-only on MatMul    42.3 MB   q4
onnx/model_q4f16.onnx       4-bit on top of fp16           22.4 MB   q4f16
onnx/model_bnb4.onnx        bitsandbytes NF4               41.9 MB   bnb4

How to use

Transformers.js (browser)

import { pipeline } from "@huggingface/transformers";

const classifier = await pipeline(
  "text-classification",
  "vazish/tinybert-address-autofill",
  { dtype: "q8" }   // try "fp16" for highest fidelity, "q4f16" for smallest
);

const out = await classifier(
  "a-c-postal-code billing zip code dwfrm billing address fields postal code"
);
// → [{ label: "postal-code", score: 0.99 }]

Python (Optimum + ONNX Runtime)

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForSequenceClassification.from_pretrained(
    "vazish/tinybert-address-autofill",
    file_name="onnx/model.onnx",   # or onnx/model_quantized.onnx, etc.
)
tokenizer = AutoTokenizer.from_pretrained("vazish/tinybert-address-autofill")
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

clf("email email mail **email")
# → [{"label": "email", "score": 0.99}]

Input format

The model expects a single string per field, built by concatenating that field's HTML attributes after light normalisation:

  1. Concatenate (in order): type + autocomplete + id + name + placeholder + the field's computed <label> text.
  2. Split camelCase boundaries to whitespace (firstName → first name).
  3. Lowercase the whole thing.
  4. If the field declares an autocomplete attribute, prepend an a-c-<value> token (e.g. a-c-postal-code).
  5. Optionally include adjacent-field context: bb-prefixed tokens for the previous field on the same form and aa-prefixed tokens for the next. Including adjacent context improves accuracy by roughly 8 percentage points relative to the same model trained on isolated fields; a preprocessing sketch follows the example below.

Example input for a "first name" field followed by a "last name" field:

first name first name enter first name aaa-c-family-name aalast aaname
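
A minimal sketch of this normalisation in Python, assuming field attributes arrive as a plain dict. The helper names are illustrative, not the team's implementation; note that it represents the autocomplete value only through the a-c-<value> token, which is what the worked example above shows:

import re

def _field_tokens(attrs):
    # Steps 1-3: concatenate attributes in order, split camelCase, lowercase.
    order = ("type", "id", "name", "placeholder", "label")
    text = " ".join(attrs.get(key, "") for key in order)
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)  # firstName -> first Name
    tokens = text.lower().split()
    # Step 4: prepend an a-c-<value> token when autocomplete is declared.
    if attrs.get("autocomplete"):
        tokens.insert(0, "a-c-" + attrs["autocomplete"].lower())
    return tokens

def build_input(attrs, prev_attrs=None, next_attrs=None):
    tokens = _field_tokens(attrs)
    # Step 5: optional adjacent-field context with bb/aa prefixes.
    if prev_attrs:
        tokens += ["bb" + t for t in _field_tokens(prev_attrs)]
    if next_attrs:
        tokens += ["aa" + t for t in _field_tokens(next_attrs)]
    return " ".join(tokens)

# Reproduces the example above:
print(build_input(
    {"id": "firstName", "name": "firstName", "placeholder": "Enter first name"},
    next_attrs={"autocomplete": "family-name", "id": "lastName"},
))
# first name first name enter first name aaa-c-family-name aalast aaname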

Training

Base model: huawei-noah/TinyBERT_General_4L_312D (4 layers, hidden size 312, intermediate size 1200, 12 attention heads, ~14M parameters, max sequence length 512)
Head: BertForSequenceClassification, 66 output classes
Training set: ~360 real shopping / checkout / address forms, 6,691 labelled fields
Validation / test: ~246 forms, 4,300 fields, split into validation and test sets
Regions covered: US, CA, GB, FR, DE, BR, ES, JP, AT, IN, IT, PL, AU, CH (supported); some additional regions are also represented for evaluation
Optimizer / schedule: Hugging Face Trainer defaults, 50 epochs
Hardware: Apple M1 MacBook Pro, ~75 minutes wall time

Each form field is annotated with data-mozautofill-type="<type>" set to the expected autofill class; fields that should not be filled receive no attribute and are mapped to other.
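
A minimal fine-tuning sketch under those settings. The two-example dataset and the label ids are placeholders; the real annotated corpus, the full 66-label mapping, and any Trainer overrides are not published here:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "huawei-noah/TinyBERT_General_4L_312D"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=66)

# Stand-in for the annotated corpus: one preprocessed string per field,
# with fields lacking data-mozautofill-type mapped to the "other" class.
train = Dataset.from_dict({
    "text": ["a-c-postal-code billing zip code", "search search products"],
    "label": [0, 1],  # placeholder ids, e.g. 0 = postal-code, 1 = other
})
train = train.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

args = TrainingArguments(output_dir="tinybert-autofill", num_train_epochs=50)
Trainer(model=model, args=args, train_dataset=train,
        tokenizer=tokenizer).train()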

Evaluation

Evaluated on the project's held-out test set (2,168 labelled fields drawn from real address / shopping forms) using ONNX Runtime on CPU.

  • Total: strict exact-match accuracy.
  • Close: counts predictions of closely related labels as correct (e.g. street-address predicted when the ground truth is address-line1, or tel when the ground truth is tel-national).
  • Blank: false-fill rate, the fraction of other-labelled fields the model predicted as a real autofill type. Lower is better; this metric matters most for user experience, since a high false-fill rate means filling search boxes, comments, and gift-card fields with personal data.

Variant            Total     Close     Blank    Throughput (CPU)
fp32               89.62%    91.51%    2.40%    ~218/s
fp16               89.71%    91.61%    2.31%    ~132/s
bnb4               88.42%    90.64%    2.77%    ~214/s
q4                 88.01%    90.54%    2.58%    ~209/s
q4f16              88.01%    90.54%    2.58%    ~95/s
uint8              87.27%    89.53%    3.27%    ~163/s
int8 / quantized   84.82%    87.73%    1.94%    ~257/s

For reference, the existing Firefox regex-based heuristic detector reaches roughly 85% total accuracy on comparable test sets.
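
For concreteness, the three metrics above can be computed from paired prediction/ground-truth label lists along these lines (a sketch; the RELATED set contains only the two example pairs given earlier, not the team's full equivalence list):

# `preds` and `golds` are equal-length lists of label strings.
RELATED = {
    frozenset({"street-address", "address-line1"}),
    frozenset({"tel", "tel-national"}),
}

def evaluate(preds, golds):
    n = len(golds)
    total = sum(p == g for p, g in zip(preds, golds)) / n
    close = sum(p == g or frozenset({p, g}) in RELATED
                for p, g in zip(preds, golds)) / n
    other = [(p, g) for p, g in zip(preds, golds) if g == "other"]
    blank = sum(p != "other" for p, _ in other) / len(other)  # false-fill rate
    return total, close, blank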

Highlights:

  • fp16 is statistically indistinguishable from fp32 across all metrics while halving the file size; it is the recommended high-fidelity variant. CPU throughput is lower than fp32 (~132/s vs ~218/s above) because most CPUs lack native fp16 ops, but the gap closes on hardware with fp16 support and on WebGPU.
  • int8 / quantized has the lowest exact accuracy but also the lowest false-fill rate of any variant (1.94%, below the fp32 baseline). It errs toward other when uncertain, which is the safer failure mode for an autofill UI, and is the recommended size-constrained default.
  • The 4-bit variants (q4, q4f16, bnb4) cluster around 88% total accuracy, with q4f16 the smallest of the three at 22.4 MB.

Limitations

  • Trained primarily on the supported-region list above. Accuracy on regions absent from the training data drops by roughly 5–10 percentage points; adding region-specific samples to the training set typically recovers most of that gap.
  • Underrepresented field types (address-line3, additional-name, phonetic-*, tel-local-prefix, etc.) have very few training examples and are sometimes confidently misclassified.
  • Quantized variants disagree with fp32 on roughly 0.1% (fp16) to ~5% (int8) of inputs; the aggregate effect is visible in the evaluation table above. A sketch for measuring this disagreement directly follows this list.
  • The model assumes the team's preprocessing format (camelCase-split, lowercased, with optional a-c-/bb/aa markers). Feeding raw HTML attribute strings without this normalisation will degrade accuracy.
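
One way to measure quantization disagreement directly, reusing the Optimum loading pattern from the Python example above (the two sample strings are placeholders for a real preprocessed field list):

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo = "vazish/tinybert-address-autofill"
tokenizer = AutoTokenizer.from_pretrained(repo)

def load(file_name):
    model = ORTModelForSequenceClassification.from_pretrained(
        repo, file_name=file_name)
    return pipeline("text-classification", model=model, tokenizer=tokenizer)

fp32 = load("onnx/model.onnx")
int8 = load("onnx/model_quantized.onnx")

fields = [
    "a-c-postal-code billing zip code dwfrm billing address fields postal code",
    "email email mail **email",
]
disagree = sum(fp32(s)[0]["label"] != int8(s)[0]["label"] for s in fields)
print(f"disagreement: {disagree}/{len(fields)}")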

Citation

This model is built on TinyBERT:

@inproceedings{jiao-etal-2020-tinybert,
  title     = {{TinyBERT}: Distilling {BERT} for Natural Language Understanding},
  author    = {Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin
               and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year      = {2020},
  pages     = {4163--4174},
  url       = {https://aclanthology.org/2020.findings-emnlp.372}
}

If you use this checkpoint, please also cite the Mozilla autofill ML investigation that produced it (citation forthcoming).

License

Apache 2.0.
