---
license: apache-2.0
library_name: onnx
tags:
  - onnx
  - distilbert
  - text-classification
  - browser-automation
  - web-navigation
pipeline_tag: text-classification
datasets:
  - custom
metrics:
  - accuracy
  - f1
model-index:
  - name: webbert-action-classifier
    results:
      - task:
          type: text-classification
          name: Web Action Classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.909
          - name: Macro F1
            type: f1
            value: 0.909
---

# WebBERT Action Classifier

DistilBERT-based action classifier for web browser navigation. Given a task goal, the page's elements, and the current domain, it predicts the next browser action to take.

## Model Details

- **Base model:** distilbert-base-uncased
- **Fine-tuned on:** 9,025 synthetic + hard-case examples
- **Classes:** 15 web action types
- **Input format:** `[TASK] goal [ELEMENTS] label:type @(cx,cy) ... [PAGE] domain`
- **Max sequence length:** 256
- **Export format:** ONNX (opset 14)
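
The input format above can be assembled with a small helper. This is a minimal sketch; `build_input` is a hypothetical name, not part of the released code, and element coordinates are assumed to be normalized to `[0, 1]` as in the example strings used throughout this card:

```python
def build_input(goal: str, elements: list[tuple[str, str, float, float]], domain: str) -> str:
    """Format a task goal, page elements, and domain into the model's input string:
    [TASK] goal [ELEMENTS] label:type @(cx,cy) ... [PAGE] domain
    """
    parts = [f"{label}:{etype} @({cx:.2f},{cy:.2f})" for label, etype, cx, cy in elements]
    return f"[TASK] {goal} [ELEMENTS] {' '.join(parts)} [PAGE] {domain}"

example = build_input(
    "click login button",
    [("Login", "button", 0.50, 0.30)],
    "example.com",
)
# example == "[TASK] click login button [ELEMENTS] Login:button @(0.50,0.30) [PAGE] example.com"
```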

## Classes

click, type, scroll_down, scroll_up, wait, go_back, skip, extract_content, dismiss_popup, accept_cookies, fill_form, submit_form, click_next, download, select_dropdown

## Performance

| Metric | Value |
|--------|-------|
| Overall accuracy | 90.9% |
| Macro F1 | 0.909 |
| Accuracy, typical scenarios | 92.0% |
| Accuracy, complex edge cases | 89.5% |
| Inference latency (CPU) | ~5 ms |
| Model size | ~256 MB |

## Usage

### Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

session = ort.InferenceSession("webbert.onnx")
tokenizer = Tokenizer.from_file("webbert-tokenizer.json")
tokenizer.enable_padding(length=256, pad_id=0, pad_token="[PAD]")
tokenizer.enable_truncation(max_length=256)

text = "[TASK] click login button [ELEMENTS] Login:button @(0.50,0.30) [PAGE] example.com"
encoding = tokenizer.encode(text)

input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# outputs[0] has shape (1, 15): one logit per action class
outputs = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
pred = np.argmax(outputs[0], axis=-1)[0]  # index into webbert-classes.json
```

### Rust (ort + tokenizers)

The same model runs in Rust via the `ort` and `tokenizers` crates; it is used in [nyaya-agent](https://github.com/biztiger/nyaya-agent) as Layer 2 in the browser navigation cascade.

## Files

- `webbert.onnx` — ONNX model (DistilBERT fine-tuned, ~256 MB)
- `webbert-tokenizer.json` — HuggingFace tokenizer (single JSON file)
- `webbert-classes.json` — Ordered class label list

## Training

Trained with HuggingFace Transformers on 9,025 examples (6,000 base + 3,025 hard-case disambiguation examples) for 5 epochs with lr=2e-5, batch_size=32, and warmup_steps=100.
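
In Transformers terms, those hyperparameters correspond roughly to the following `TrainingArguments`. This is a configuration sketch only; the output path is hypothetical and dataset loading, model instantiation, and `Trainer` setup are omitted:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="webbert-out",          # hypothetical path
    num_train_epochs=5,
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    warmup_steps=100,
)
```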