---
license: apache-2.0
library_name: onnx
tags:
- onnx
- distilbert
- text-classification
- browser-automation
- web-navigation
pipeline_tag: text-classification
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: webbert-action-classifier
  results:
  - task:
      type: text-classification
      name: Web Action Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.909
    - name: Macro F1
      type: f1
      value: 0.909
---

# WebBERT Action Classifier

DistilBERT-based action classifier for web browser navigation. Given a task goal, the visible page elements, and the page domain, it predicts the next browser action.
|
## Model Details

- **Base model:** distilbert-base-uncased
- **Fine-tuned on:** 9,025 synthetic + hard-case examples
- **Classes:** 15 web action types
- **Input format:** `[TASK] goal [ELEMENTS] label:type @(cx,cy) ... [PAGE] domain`
- **Max sequence length:** 256 tokens
- **Export format:** ONNX (opset 14)
|
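The input string can be assembled from structured page data. A minimal sketch; the `build_input` helper and the element-dict fields are illustrative, not part of the released API:

```python
def build_input(goal: str, elements: list[dict], domain: str) -> str:
    """Assemble the [TASK]/[ELEMENTS]/[PAGE] prompt the model expects.

    Each element dict is assumed to carry a visible label, an element
    type, and normalized center coordinates (cx, cy) in [0, 1].
    """
    parts = [
        f"{e['label']}:{e['type']} @({e['cx']:.2f},{e['cy']:.2f})"
        for e in elements
    ]
    return f"[TASK] {goal} [ELEMENTS] {' '.join(parts)} [PAGE] {domain}"

text = build_input(
    "click login button",
    [{"label": "Login", "type": "button", "cx": 0.50, "cy": 0.30}],
    "example.com",
)
# → "[TASK] click login button [ELEMENTS] Login:button @(0.50,0.30) [PAGE] example.com"
```
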
## Classes

click, type, scroll_down, scroll_up, wait, go_back, skip, extract_content, dismiss_popup, accept_cookies, fill_form, submit_form, click_next, download, select_dropdown
|
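Predictions come back as an index into this ordered list; `webbert-classes.json` ships the same ordering. A minimal sketch, with the list inlined for illustration (loading the shipped JSON file is the intended path):

```python
# Same ordering as webbert-classes.json; inlined here for illustration.
CLASSES = [
    "click", "type", "scroll_down", "scroll_up", "wait",
    "go_back", "skip", "extract_content", "dismiss_popup", "accept_cookies",
    "fill_form", "submit_form", "click_next", "download", "select_dropdown",
]

# In practice, load the shipped file so the ordering always matches the model:
# with open("webbert-classes.json") as f:
#     CLASSES = json.load(f)

def label_for(pred_index: int) -> str:
    """Map an argmax index from the model output to its action label."""
    return CLASSES[pred_index]

print(label_for(0))  # → click
```
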
## Performance

| Metric | Value |
|--------|-------|
| Overall accuracy | 90.9% |
| Macro F1 | 0.909 |
| Typical scenarios (accuracy) | 92.0% |
| Complex edge cases (accuracy) | 89.5% |
| Inference latency (CPU) | ~5 ms |
| Model size | ~256 MB |
|
## Usage

### Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the exported model and its tokenizer, then pad/truncate to the
# model's 256-token sequence length.
session = ort.InferenceSession("webbert.onnx")
tokenizer = Tokenizer.from_file("webbert-tokenizer.json")
tokenizer.enable_padding(length=256, pad_id=0, pad_token="[PAD]")
tokenizer.enable_truncation(max_length=256)

text = "[TASK] click login button [ELEMENTS] Login:button @(0.50,0.30) [PAGE] example.com"
encoding = tokenizer.encode(text)

input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

outputs = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
pred = int(np.argmax(outputs[0], axis=-1)[0])  # index into webbert-classes.json
```
|
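The logits in `outputs[0]` can also be turned into a confidence score, e.g. to decide whether to act on the prediction or fall through to another layer of a cascade. A minimal sketch using a stand-in logits array; the threshold value is illustrative, not a tuned recommendation:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Stand-in for outputs[0] from session.run: batch of 1, 15 class logits.
logits = np.zeros((1, 15), dtype=np.float32)
logits[0, 0] = 4.0  # pretend class 0 ("click") dominates

probs = softmax(logits)
pred = int(np.argmax(probs, axis=-1)[0])
confidence = float(probs[0, pred])

# Only act on confident predictions; otherwise defer to a fallback layer.
CONFIDENCE_THRESHOLD = 0.7  # illustrative value
if confidence >= CONFIDENCE_THRESHOLD:
    print(pred, round(confidence, 3))
```
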
### Rust (ort + tokenizers)

Used in [nyaya-agent](https://github.com/biztiger/nyaya-agent) as Layer 2 in the browser navigation cascade.
|
## Files

- `webbert.onnx` — ONNX model (DistilBERT fine-tuned, ~256 MB)
- `webbert-tokenizer.json` — HuggingFace tokenizer (single JSON file)
- `webbert-classes.json` — ordered class label list
|
## Training

Trained with HuggingFace Transformers on 9,025 examples (6,000 base + 3,025 hard-case disambiguation examples) for 5 epochs with learning rate 2e-5, batch size 32, and 100 warmup steps.
|
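Given those hyperparameters, the schedule works out as follows (a quick arithmetic check; assumes the final partial batch is kept):

```python
import math

num_examples = 9025
batch_size = 32
epochs = 5
warmup_steps = 100

steps_per_epoch = math.ceil(num_examples / batch_size)  # 283
total_steps = steps_per_epoch * epochs                  # 1415
warmup_fraction = warmup_steps / total_steps            # ~7% of training

print(steps_per_epoch, total_steps, round(warmup_fraction, 3))
```
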