---
license: apache-2.0
language: multilingual
library_name: transformers.js
pipeline_tag: text-classification
base_model: huawei-noah/TinyBERT_General_4L_312D
tags:
- autofill
- field-classification
- bert
- tinybert
- onnx
- transformers.js
- browser
---
# TinyBERT Address Autofill
A compact field-type classifier for HTML form autofill developed by the
Credentials Management Team on Firefox. Given a string describing a single form
field's attributes, it predicts one of 66 autofill field types (`given-name`,
`family-name`, `email`, `postal-code`, `address-line1`, `cc-number`, etc.) or
`other` when the field should not be filled.
The model is fine-tuned from `huawei-noah/TinyBERT_General_4L_312D` on a
corpus of manually annotated shopping and address forms collected by Mozilla, and is
intended to run client-side inside Firefox (or any Transformers.js host) as
a replacement or augmentation for the existing regex-based heuristic field
detector.
## ONNX variants
All variants live under `onnx/` and are loadable through Transformers.js by
passing the corresponding `dtype` argument.
| File | Precision | Size | Transformers.js `dtype` |
| --- | --- | ---: | --- |
| `onnx/model.onnx` | fp32 | 57.6 MB | `fp32` |
| `onnx/model_fp16.onnx` | fp16 | 28.9 MB | `fp16` |
| `onnx/model_quantized.onnx` | int8 dynamic (default) | 14.6 MB | `q8` |
| `onnx/model_int8.onnx` | int8 dynamic | 14.6 MB | `int8` |
| `onnx/model_uint8.onnx` | uint8 dynamic | 14.6 MB | `uint8` |
| `onnx/model_q4.onnx` | 4-bit weight-only on MatMul | 42.3 MB | `q4` |
| `onnx/model_q4f16.onnx` | 4-bit on top of fp16 | 22.4 MB | `q4f16` |
| `onnx/model_bnb4.onnx` | bitsandbytes NF4 | 41.9 MB | `bnb4` |
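
For runtimes that take a file path rather than a `dtype` (the Optimum example further down, for instance), the table can be read as a plain lookup. This is a minimal sketch; `ONNX_FILES` and `onnx_file_for` are hypothetical names introduced here for illustration, not part of Transformers.js or Optimum.

```python
# Lookup copied from the table above; the names are illustrative only.
ONNX_FILES = {
    "fp32": "onnx/model.onnx",
    "fp16": "onnx/model_fp16.onnx",
    "q8": "onnx/model_quantized.onnx",
    "int8": "onnx/model_int8.onnx",
    "uint8": "onnx/model_uint8.onnx",
    "q4": "onnx/model_q4.onnx",
    "q4f16": "onnx/model_q4f16.onnx",
    "bnb4": "onnx/model_bnb4.onnx",
}


def onnx_file_for(dtype: str) -> str:
    """Return the repository path of the ONNX file for a Transformers.js dtype."""
    try:
        return ONNX_FILES[dtype]
    except KeyError:
        known = ", ".join(sorted(ONNX_FILES))
        raise ValueError(f"unknown dtype {dtype!r}; expected one of: {known}") from None


print(onnx_file_for("q8"))  # onnx/model_quantized.onnx
```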
## How to use
### Transformers.js (browser)
```js
import { pipeline } from "@huggingface/transformers";

// Load the classifier once; the ONNX file is chosen by `dtype`.
const classifier = await pipeline(
  "text-classification",
  "vazish/tinybert-address-autofill",
  { dtype: "q8" } // smallest file (14.6 MB); try "fp16" for the recommended high-fidelity variant
);

// Input is one preprocessed string per form field (see "Input format").
const out = await classifier(
  "a-c-postal-code billing zip code dwfrm billing address fields postal code"
);
// → [{ label: "postal-code", score: 0.99 }]
```
### Python (Optimum + ONNX Runtime)
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForSequenceClassification.from_pretrained(
    "vazish/tinybert-address-autofill",
    file_name="onnx/model.onnx",  # or onnx/model_quantized.onnx, etc.
)
tokenizer = AutoTokenizer.from_pretrained("vazish/tinybert-address-autofill")
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

clf("email email mail **email")
# → [{"label": "email", "score": 0.99}]
```
## Input format
The model expects a single string per field, built by concatenating that
field's HTML attributes after light normalisation:
1. Concatenate (in order): `type` + `autocomplete` + `id` + `name` +
   `placeholder` + the field's computed `<label>` text.
2. Split camelCase boundaries to whitespace (`firstName` → `first name`).
3. Lowercase the whole thing.
4. If the field declares an `autocomplete` attribute, prepend an
   `a-c-<value>` token (e.g. `a-c-postal-code`).
5. Optionally include adjacent-field context: `bb`-prefixed tokens for
   the previous field on the same form and `aa`-prefixed tokens for the
   next. Including adjacent context improves accuracy by roughly 8 percentage
   points relative to the same model trained on isolated fields.
Example input for a "first name" field followed by a "last name" field:
```
first name first name enter first name aaa-c-family-name aalast aaname
```
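
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's preprocessing code: the helper names (`field_tokens`, `build_input`) and the dict-based field representation are hypothetical; the `a-c-<value>` token is assumed to stand in for the raw `autocomplete` value (which is what the example strings in this card suggest); context tokens are assumed to be appended after the field's own tokens; and separators such as underscores are left untouched because the card does not specify how they are handled.

```python
import re

# Attribute order from step 1. `autocomplete` is handled separately below
# (assumption: it only appears as the `a-c-<value>` token).
TEXT_ATTRS = ("type", "id", "name", "placeholder", "label")


def field_tokens(field):
    """Normalise one field (a dict of attribute name -> string) into tokens."""
    values = [field.get(key, "") for key in TEXT_ATTRS]
    text = " ".join(v for v in values if v)
    # Step 2: split camelCase boundaries (firstName -> first Name).
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Step 3: lowercase, then collapse whitespace into tokens.
    tokens = text.lower().split()
    # Step 4: prepend the autocomplete marker when the attribute is present.
    autocomplete = field.get("autocomplete", "").strip().lower()
    if autocomplete:
        tokens.insert(0, f"a-c-{autocomplete}")
    return tokens


def build_input(field, previous=None, following=None):
    """Step 5: add bb-/aa-prefixed tokens for the neighbouring fields."""
    tokens = field_tokens(field)
    if previous:
        tokens += ["bb" + t for t in field_tokens(previous)]
    if following:
        tokens += ["aa" + t for t in field_tokens(following)]
    return " ".join(tokens)


first = {"id": "firstName", "name": "firstName", "placeholder": "Enter first name"}
last = {"autocomplete": "family-name", "name": "lastName"}
print(build_input(first, following=last))
# first name first name enter first name aaa-c-family-name aalast aaname
```

Under these assumptions the sketch reproduces the example string shown above; inputs that rely on other normalisation (for instance underscore-separated `name` values) would need the production rules.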
## Training
| | |
| --- | --- |
| Base model | `huawei-noah/TinyBERT_General_4L_312D` (4 layers, hidden 312, intermediate 1200, 12 heads, ~14M params, max sequence length 512) |
| Head | `BertForSequenceClassification`, 66 output classes |
| Training set | ~360 real shopping / checkout / address forms, 6,691 labelled fields |
| Validation / test | ~246 forms, 4,300 fields, split into validation and test |
| Regions covered | US, CA, GB, FR, DE, BR, ES, JP, AT, IN, IT, PL, AU, CH (supported); some additional regions also represented for evaluation |
| Optimizer / schedule | Hugging Face `Trainer` defaults, 50 epochs |
| Hardware | Apple M1 MacBook Pro, ~75 minutes wall time |
Each form field is annotated with `data-mozautofill-type="<type>"` set to
the expected autofill class; fields that should not be filled receive no
attribute and are mapped to `other`.
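
As a rough illustration of that labelling rule, the sketch below reads labels out of an annotated form with Python's standard `html.parser`. The set of tags treated as form fields and the use of `name`/`id` to identify a field are assumptions made for the example; the project's actual extraction tooling is not described in this card.

```python
from html.parser import HTMLParser

# Assumption: these tags are the ones treated as fillable form fields.
FIELD_TAGS = {"input", "select", "textarea"}


class LabelExtractor(HTMLParser):
    """Collect (field identifier, autofill label) pairs from annotated HTML."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag not in FIELD_TAGS:
            return
        attributes = dict(attrs)
        identifier = attributes.get("name") or attributes.get("id") or ""
        # Fields without the annotation are mapped to `other`.
        label = attributes.get("data-mozautofill-type") or "other"
        self.labels.append((identifier, label))


def extract_labels(html):
    parser = LabelExtractor()
    parser.feed(html)
    return parser.labels


form = (
    '<form>'
    '<input name="email" data-mozautofill-type="email">'
    '<input name="q" placeholder="Search">'
    '</form>'
)
print(extract_labels(form))
# [('email', 'email'), ('q', 'other')]
```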
## Evaluation
Evaluated on the project's held-out test set (2,168 labelled fields drawn
from real address / shopping forms) using ONNX Runtime on CPU.
- **Total**: strict exact-match accuracy.
- **Close**: counts predictions on closely related labels as correct
  (e.g. `street-address` predicted when ground truth is `address-line1`,
  `tel` predicted when ground truth is `tel-national`).
- **Blank**: false-fill rate. Fraction of `other`-labelled fields the
  model predicted as a real autofill type. Lower is better; this metric
  matters most for user experience because high false-fill means filling
  search boxes, comments, and gift-card fields with personal data.
| Variant | Total | Close | Blank | Throughput (CPU) |
| --- | ---: | ---: | ---: | ---: |
| fp32 | **89.62%** | 91.51% | 2.40% | ~218/s |
| fp16 | **89.71%** | 91.61% | 2.31% | ~132/s |
| bnb4 | 88.42% | 90.64% | 2.77% | ~214/s |
| q4 | 88.01% | 90.54% | 2.58% | ~209/s |
| q4f16 | 88.01% | 90.54% | 2.58% | ~95/s |
| uint8 | 87.27% | 89.53% | 3.27% | ~163/s |
| int8 / quantized | 84.82% | 87.73% | **1.94%** | ~257/s |
For reference, the existing Firefox regex-based heuristic detector reaches
roughly 85% total accuracy on comparable test sets.
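
The Total, Close, and Blank columns can be computed from paired ground-truth and predicted labels along the following lines. This is a sketch, not the project's evaluation script: `CLOSE_PAIRS` lists only the two related-label pairs named above (the full list is not given in this card), and `autofill_metrics` is a hypothetical helper name.

```python
# (ground truth, prediction) pairs counted as "close".
# Assumption: only the two examples named in this card; the real list is longer.
CLOSE_PAIRS = {
    ("address-line1", "street-address"),
    ("tel-national", "tel"),
}


def autofill_metrics(truth, predicted):
    """Return Total, Close, and Blank as fractions in [0, 1]."""
    pairs = list(zip(truth, predicted))
    exact = sum(t == p for t, p in pairs)
    # Close includes exact matches plus predictions on a related label.
    close = sum(t == p or (t, p) in CLOSE_PAIRS for t, p in pairs)
    # Blank: share of `other` fields that received a real autofill type.
    on_other = [p for t, p in pairs if t == "other"]
    false_fills = sum(p != "other" for p in on_other)
    return {
        "total": exact / len(pairs),
        "close": close / len(pairs),
        "blank": false_fills / len(on_other) if on_other else 0.0,
    }


truth = ["email", "address-line1", "other", "other", "tel-national"]
predicted = ["email", "street-address", "other", "email", "given-name"]
print(autofill_metrics(truth, predicted))
# {'total': 0.4, 'close': 0.6, 'blank': 0.5}
```
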
Highlights:
- **fp16** is within 0.1 percentage points of fp32 on every metric
  while halving the file size. It is the recommended high-fidelity
  variant. Latency on CPU is roughly 1.65× fp32 (~132/s vs ~218/s) because
  most CPUs lack native fp16 ops, but the gap closes on hardware with fp16
  support and on WebGPU.
- **int8 / quantized** has the lowest exact accuracy but **the lowest
  false-fill rate of any variant** (1.94%, below the fp32 baseline). It
  errs toward `other` when uncertain, which is the safer failure mode for an
  autofill UI. This is the recommended size-constrained default.
- 4-bit variants (`q4`, `q4f16`, `bnb4`) cluster around 88% total accuracy,
  with `q4f16` the smallest of the three at 22.4 MB.
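
The size and speed trade-offs relative to fp32 follow directly from the two tables in this card. The short calculation below uses only those published figures; the throughput numbers are approximate and specific to the CPU used for evaluation, and `relative_to_fp32` is a hypothetical helper name.

```python
# (file size in MB, approximate CPU throughput in fields/s), from the tables above.
VARIANTS = {
    "fp32": (57.6, 218),
    "fp16": (28.9, 132),
    "bnb4": (41.9, 214),
    "q4": (42.3, 209),
    "q4f16": (22.4, 95),
    "uint8": (14.6, 163),
    "q8": (14.6, 257),
}


def relative_to_fp32(name):
    """Return (size ratio, latency ratio) of a variant versus fp32."""
    size, throughput = VARIANTS[name]
    base_size, base_throughput = VARIANTS["fp32"]
    # Latency per field is the inverse of throughput.
    return size / base_size, base_throughput / throughput


size_ratio, latency_ratio = relative_to_fp32("fp16")
print(f"fp16: {size_ratio:.2f}x size, {latency_ratio:.2f}x latency")
# fp16: 0.50x size, 1.65x latency
```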
## Limitations
- Trained primarily on the supported-region list above. Accuracy on
  regions with no training data drops ~5–10 percentage points;
  adding region-specific samples to the training set typically recovers
  most of that gap.
- Underrepresented field types (`address-line3`, `additional-name`,
`phonetic-*`, `tel-local-prefix`, etc.) have very few training examples
and are sometimes confidently misclassified.
- Reduced-precision variants disagree with fp32 on roughly 0.1% (`fp16`) to
  ~5% (`int8`) of inputs. The net effect on accuracy and false-fill rate is
  shown in the evaluation table above.
- The model assumes the team's preprocessing format (camelCase-split,
lowercased, with optional `a-c-`/`bb`/`aa` markers). Feeding raw HTML
attribute strings without this normalisation will degrade accuracy.
## Citation
This model is built on TinyBERT:
```bibtex
@inproceedings{jiao-etal-2020-tinybert,
title = {{TinyBERT}: Distilling {BERT} for Natural Language Understanding},
author = {Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin
and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
year = {2020},
pages = {4163--4174},
url = {https://aclanthology.org/2020.findings-emnlp.372}
}
```
If you use this checkpoint, please also cite the Mozilla autofill ML
investigation that produced it (citation forthcoming).
## License
Apache 2.0.