# WebBERT Action Classifier
DistilBERT-based action classifier for web browser navigation. Given a task goal, page elements, and domain, predicts the next browser action.
## Model Details

- Base model: `distilbert-base-uncased`
- Fine-tuned on: 9,025 synthetic + hard-case examples
- Classes: 15 web action types
- Input format: `[TASK] goal [ELEMENTS] label:type @(cx,cy) ... [PAGE] domain`
- Max sequence length: 256
- Export format: ONNX (opset 14)
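The input format can be assembled with a small helper. This is an illustrative sketch: `build_input` and the `(label, type, cx, cy)` element tuple shape are assumptions for the example, not part of the released files.

```python
def build_input(goal, elements, domain):
    """Format a WebBERT input string: [TASK] goal [ELEMENTS] ... [PAGE] domain.

    elements: iterable of (label, type, cx, cy) tuples with normalized
    center coordinates, rendered to two decimal places.
    """
    parts = [f"{label}:{etype} @({cx:.2f},{cy:.2f})" for label, etype, cx, cy in elements]
    return f"[TASK] {goal} [ELEMENTS] {' '.join(parts)} [PAGE] {domain}"

text = build_input("click login button", [("Login", "button", 0.5, 0.3)], "example.com")
# → "[TASK] click login button [ELEMENTS] Login:button @(0.50,0.30) [PAGE] example.com"
```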
## Classes
click, type, scroll_down, scroll_up, wait, go_back, skip, extract_content, dismiss_popup, accept_cookies, fill_form, submit_form, click_next, download, select_dropdown
## Performance

| Metric | Value |
|---|---|
| Overall accuracy | 90.9% |
| Macro F1 | 0.909 |
| Accuracy (typical scenarios) | 92.0% |
| Accuracy (complex edge cases) | 89.5% |
| Inference latency (CPU) | ~5 ms |
| Model size | ~256 MB |
## Usage

### Python (ONNX Runtime)
```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the exported model and its tokenizer, matching the 256-token
# sequence length used at training time.
session = ort.InferenceSession("webbert.onnx")
tokenizer = Tokenizer.from_file("webbert-tokenizer.json")
tokenizer.enable_padding(length=256, pad_id=0, pad_token="[PAD]")
tokenizer.enable_truncation(max_length=256)

# Encode one input in the [TASK] ... [ELEMENTS] ... [PAGE] format.
text = "[TASK] click login button [ELEMENTS] Login:button @(0.50,0.30) [PAGE] example.com"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# Run inference; outputs[0] holds the logits, one per action class.
outputs = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
pred = np.argmax(outputs[0], axis=-1)[0]  # index into webbert-classes.json
```
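To turn the predicted index into an action label with a confidence score, apply a softmax over the logits and index into the class list. A minimal sketch follows; the labels are hard-coded here for self-containment, but in practice you would load them with `json.load(open("webbert-classes.json"))`, which is assumed to be index-ordered.

```python
import numpy as np

# Class labels in index order (in practice: json.load from webbert-classes.json).
CLASSES = [
    "click", "type", "scroll_down", "scroll_up", "wait", "go_back", "skip",
    "extract_content", "dismiss_popup", "accept_cookies", "fill_form",
    "submit_form", "click_next", "download", "select_dropdown",
]

def decode_prediction(logits, classes):
    """Return (label, softmax confidence) for a 1-D logit vector."""
    probs = np.exp(logits - logits.max())  # shift for numerical stability
    probs /= probs.sum()
    idx = int(probs.argmax())
    return classes[idx], float(probs[idx])

# Demo with synthetic logits favoring index 0 ("click"):
demo = np.zeros(len(CLASSES), dtype=np.float32)
demo[0] = 5.0
label, conf = decode_prediction(demo, CLASSES)  # → ("click", ~0.91)
```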
### Rust (ort + tokenizers)
The same ONNX model and tokenizer files load from Rust via the `ort` and `tokenizers` crates. Used in nyaya-agent as Layer 2 in the browser navigation cascade.
## Files

- `webbert.onnx` – ONNX model (DistilBERT fine-tuned, ~256 MB)
- `webbert-tokenizer.json` – HuggingFace tokenizer (single JSON file)
- `webbert-classes.json` – Ordered class label list
## Training
Trained with HuggingFace Transformers on 9,025 examples (6,000 base + 3,025 hard-case disambiguation). 5 epochs, lr=2e-5, batch_size=32, warmup_steps=100.
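The hyperparameters above can be collected in one place for reproduction. The commented `Trainer` invocation below is a sketch of a standard HuggingFace fine-tuning setup, not the exact training script, which is not included in this repo.

```python
# Hyperparameters reported for the training run above.
HPARAMS = {
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,
    "warmup_steps": 100,
    "max_length": 256,  # tokenizer truncation/padding length
}

# Sketch of a typical Trainer setup (illustrative, not the original script):
# from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
# model = AutoModelForSequenceClassification.from_pretrained(
#     "distilbert-base-uncased", num_labels=15)
# args = TrainingArguments(
#     output_dir="out",
#     **{k: v for k, v in HPARAMS.items() if k != "max_length"})
# Trainer(model=model, args=args, train_dataset=train_ds).train()
```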