# Find Architecture

This page covers the implementation details behind PinchTab's semantic `find` pipeline.

## Overview

The `find` system converts accessibility snapshot nodes into lightweight descriptors, scores them against a natural-language query, and returns the best matching `ref`.

The implementation is designed to stay:

- local
- fast
- dependency-light
- recoverable after page re-renders

## Pipeline

```text
accessibility snapshot
  -> element descriptors
  -> lexical matcher
  -> embedding matcher
  -> combined score
  -> best ref
  -> intent cache / recovery hooks
```

## Element Descriptors

Each accessibility node is converted into a descriptor with:

- `ref`
- `role`
- `name`
- `value`

Those fields are also combined into a composite string used for matching.
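
A descriptor of this shape can be sketched as follows. The four fields come from the list above; the composite rule shown here (join the non-empty fields) is illustrative and may differ from the real implementation:

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    ref: str
    role: str
    name: str
    value: str

    def composite(self) -> str:
        # Join the matchable fields into one string, skipping empties.
        return " ".join(p for p in (self.role, self.name, self.value) if p)

d = Descriptor(ref="e12", role="button", name="Submit", value="")
print(d.composite())  # -> "button Submit"
```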

## Matchers

PinchTab currently uses a combined matcher built from:

- a lexical matcher
- an embedding matcher based on a hashing embedder

Default weighting is:

```text
0.6 lexical + 0.4 embedding
```

Per-request overrides are available via the `lexicalWeight` and `embeddingWeight` request parameters.
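
In code the blend is just a weighted sum. This minimal sketch uses the documented defaults; the parameter names mirror the request fields, but the function itself is hypothetical:

```python
def combined_score(lexical: float, embedding: float,
                   lexical_weight: float = 0.6,
                   embedding_weight: float = 0.4) -> float:
    # Weighted blend of the two matcher scores; callers may
    # override the default 0.6/0.4 split per request.
    return lexical_weight * lexical + embedding_weight * embedding

print(combined_score(1.0, 0.5))  # 0.6*1.0 + 0.4*0.5, i.e. about 0.8
```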

## Lexical Side

The lexical matcher focuses on exact and near-exact token overlap, including role-aware matching behavior.

Useful properties:

- strong for exact words
- easy to reason about
- good precision on explicit queries like `submit button`
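
The idea can be sketched as a toy Jaccard-style token-overlap scorer with a role bonus. This is a stand-in, not the real matcher's scoring function:

```python
def lexical_score(query: str, composite: str, role: str) -> float:
    # Jaccard overlap over lowercased tokens.
    q = set(query.lower().split())
    c = set(composite.lower().split())
    if not q or not c:
        return 0.0
    score = len(q & c) / len(q | c)
    # Role-aware bonus: reward queries that name the element's role,
    # as in "submit button" matching a button element.
    if role.lower() in q:
        score = min(1.0, score + 0.2)
    return score

print(lexical_score("submit button", "button Submit", "button"))  # -> 1.0
```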

## Embedding Side

The embedding matcher uses a feature-hashing approach rather than an external ML model.

Useful properties:

- catches fuzzy similarity
- handles partial and sub-word overlap better
- has no model download or network dependency
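
Feature hashing can be sketched in a few lines: each token and character trigram is hashed into a bucket of a fixed-size vector, so similar strings land in overlapping buckets without any model download. Dimensionality, hash choice, and the feature set below are illustrative, not the actual embedder:

```python
import hashlib
import math

DIM = 64  # illustrative dimensionality

def embed(text: str) -> list[float]:
    # Features: whitespace tokens plus character trigrams,
    # which is what gives sub-word/partial overlap a signal.
    t = text.lower()
    feats = t.split() + [t[i:i + 3] for i in range(len(t) - 2)]
    v = [0.0] * DIM
    for f in feats:
        # Stable hash picks the bucket; collisions are accepted.
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already L2-normalized, so the dot product is cosine.
    return sum(x * y for x, y in zip(a, b))

sim = cosine(embed("submit button"), embed("submit btn"))
print(f"{sim:.2f}")
```

Note how `"submit btn"` still scores well against `"submit button"` through the shared token and trigrams, which a purely lexical token match would undervalue.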

## Combined Matching

The combined matcher runs lexical and embedding scoring concurrently, merges results by element ref, and applies the weighted final score.

It also uses a lower internal threshold before the final merge, so that candidates that are strong on only one side are not discarded too early.

## Snapshot Dependency

`find` depends on the same accessibility snapshot/ref-cache infrastructure used by snapshot-driven interaction.

If a cached snapshot is missing, the handler tries to refresh it automatically before giving up.
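
The fallback is a plain cache-then-refresh pattern. `Tab` and its methods below are stand-ins for the real snapshot/ref-cache infrastructure:

```python
class Tab:
    # Minimal stand-in for the real snapshot cache.
    def __init__(self):
        self._snapshot = None

    def cached_snapshot(self):
        return self._snapshot

    def refresh_snapshot(self):
        # Pretend to re-capture the accessibility tree.
        self._snapshot = {"e1": "button Submit"}
        return self._snapshot

def get_snapshot(tab: Tab):
    # Use the cache if present; otherwise attempt one automatic
    # refresh before giving up, mirroring the handler's fallback.
    snap = tab.cached_snapshot()
    if snap is None:
        snap = tab.refresh_snapshot()
    if snap is None:
        raise LookupError("no accessibility snapshot available")
    return snap

print(get_snapshot(Tab()))
```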

## Intent Cache And Recovery

After a successful match, PinchTab records:

- the original query
- the matched descriptor
- score/confidence metadata

This allows recovery logic to attempt a semantic re-match if a later action fails because the old ref became stale after a page update.
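
The cache-and-recover flow can be sketched like this. The record shape and helper names are illustrative; `refind` stands in for running the original query back through the find pipeline:

```python
from dataclasses import dataclass

@dataclass
class IntentEntry:
    # Illustrative record: what the doc says gets remembered.
    query: str
    descriptor: dict
    score: float

cache: dict[str, IntentEntry] = {}

def remember(ref: str, query: str, descriptor: dict, score: float) -> None:
    cache[ref] = IntentEntry(query, descriptor, score)

def recover(stale_ref: str, refind):
    # On a stale ref, replay the original query through the
    # matcher to obtain a fresh ref for the same intent.
    entry = cache.get(stale_ref)
    if entry is None:
        return None
    return refind(entry.query)

remember("e7", "submit button", {"role": "button", "name": "Submit"}, 0.91)
print(recover("e7", lambda q: "e42"))  # -> e42 (fresh ref from a re-match)
```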

## Orchestrator Routing

The orchestrator exposes `POST /tabs/{id}/find` and proxies it to the correct running instance. The actual matching implementation remains in the shared handler layer.
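
A request might look like the following. The weight fields are the documented per-request overrides; the `query` field name and the exact body shape are assumptions, not confirmed by this page:

```text
POST /tabs/42/find
Content-Type: application/json

{"query": "submit button", "lexicalWeight": 0.6, "embeddingWeight": 0.4}
```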

## Design Constraints

The current design intentionally avoids:

- external embedding services
- heavyweight model dependencies
- selector-first coupling

That keeps the system portable and fast, but it also means the quality ceiling is bounded by the in-process matcher design and the quality of the accessibility snapshot.

## Performance

Benchmarks measured on an Intel i5-4300U @ 1.90 GHz:

| Operation | Elements | Latency | Allocations |
| --- | --- | --- | --- |
| Lexical Find | 16 | ~71 µs | 134 allocs |
| HashingEmbedder (single) | 1 | ~11 µs | 3 allocs |
| HashingEmbedder (batch) | 16 | ~171 µs | 49 allocs |
| Embedding Find | 16 | ~180 µs | 98 allocs |
| **Combined Find** | **16** | **~233 µs** | **263 allocs** |
| Combined Find | 100 | ~1.5 ms | 1685 allocs |