# Find Architecture
This page covers the implementation details behind PinchTab's semantic `find` pipeline.
## Overview
The `find` system converts accessibility snapshot nodes into lightweight descriptors, scores them against a natural-language query, and returns the best matching `ref`.
The implementation is designed to stay:
- local
- fast
- dependency-light
- recoverable after page re-renders
## Pipeline
```text
accessibility snapshot
-> element descriptors
-> lexical matcher
-> embedding matcher
-> combined score
-> best ref
-> intent cache / recovery hooks
```
## Element Descriptors
Each accessibility node is converted into a descriptor with:
- `ref`
- `role`
- `name`
- `value`
Those fields are also combined into a composite string used for matching.
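A minimal sketch of this descriptor shape (the class name, field layout, and `composite` helper are assumptions for illustration, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class ElementDescriptor:
    ref: str    # stable handle into the snapshot's ref cache
    role: str   # accessibility role, e.g. "button"
    name: str   # accessible name
    value: str  # current value, if any

    def composite(self) -> str:
        # Combine the non-empty fields into one lowercase string
        # that both matchers score against.
        return " ".join(p for p in (self.role, self.name, self.value) if p).lower()

btn = ElementDescriptor(ref="e12", role="button", name="Submit", value="")
```

Here `btn.composite()` yields `"button submit"`, which is what the query is matched against.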
## Matchers
PinchTab currently uses a combined matcher built from:
- a lexical matcher
- an embedding matcher based on a hashing embedder
Default weighting is:
```text
0.6 lexical + 0.4 embedding
```
Both weights can be overridden per request via the `lexicalWeight` and `embeddingWeight` parameters.
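The weighting itself is just a linear combination; a sketch (the function name and defaults mirror the documented `0.6 lexical + 0.4 embedding` split, everything else is assumed):

```python
def combined_score(lexical: float, embedding: float,
                   lexical_weight: float = 0.6,
                   embedding_weight: float = 0.4) -> float:
    # Weighted sum of the two matcher scores; the weights correspond
    # to the per-request lexicalWeight / embeddingWeight overrides.
    return lexical_weight * lexical + embedding_weight * embedding

# Default weighting: 0.6 * 0.9 + 0.4 * 0.5 = 0.74
score = combined_score(0.9, 0.5)
```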
## Lexical Side
The lexical matcher focuses on exact and near-exact token overlap, including role-aware matching behavior.
Useful properties:
- strong for exact words
- easy to reason about
- good precision on explicit queries like `submit button`
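In its simplest form, exact token overlap can be sketched like this (a toy version: the real matcher also includes role-aware behavior, which is omitted here):

```python
def lexical_score(query: str, composite: str) -> float:
    # Fraction of query tokens that appear verbatim in the element's
    # composite string -- high precision on explicit queries.
    q_tokens = set(query.lower().split())
    c_tokens = set(composite.lower().split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0
```

For the query `submit button` against a descriptor whose composite is `"button submit"`, every query token matches, so the score is 1.0.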
## Embedding Side
The embedding matcher uses a feature-hashing approach rather than an external ML model.
Useful properties:
- catches fuzzy similarity
- handles partial and sub-word overlap better
- has no model download or network dependency
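The idea behind feature hashing can be illustrated with character trigrams bucketed into a fixed-size vector (a sketch of the general technique, not PinchTab's actual embedder; the dimension, hash function, and n-gram size are assumptions):

```python
import hashlib
import math

def hash_embed(text: str, dims: int = 256) -> list:
    # Feature hashing: bucket character trigrams into a fixed-size
    # vector. No model weights, no network dependency; similar strings
    # share trigrams, so partial/sub-word overlap falls out naturally.
    vec = [0.0] * dims
    padded = f" {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        bucket = int(hashlib.md5(trigram.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list, b: list) -> float:
    # Vectors are already unit-normalized, so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))
```

This is why the embedding side handles fuzzy queries: `"submit"` and `"submitting"` share most of their trigrams, so their vectors are close even though the tokens differ.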
## Combined Matching
The combined matcher runs lexical and embedding scoring concurrently, merges results by element ref, and applies the weighted final score.
It also applies a lower internal threshold before the final merge, so candidates that are strong on only one side are not discarded too early.
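The merge-by-ref step with the two-threshold behavior can be sketched as follows (the threshold values and function shape are assumptions chosen to illustrate the described behavior):

```python
def merge_candidates(lexical: dict, embedding: dict,
                     lw: float = 0.6, ew: float = 0.4,
                     pre_threshold: float = 0.1,
                     final_threshold: float = 0.3):
    # Take the union of refs from both matchers: a candidate that is
    # strong on only one side still enters the merge, because the
    # pre-filter threshold is deliberately low. The real cut is
    # applied to the weighted combined score.
    merged = {}
    for ref in set(lexical) | set(embedding):
        l = lexical.get(ref, 0.0)
        e = embedding.get(ref, 0.0)
        if max(l, e) < pre_threshold:
            continue  # weak on both sides: drop early
        score = lw * l + ew * e
        if score >= final_threshold:
            merged[ref] = score
    return max(merged, key=merged.get) if merged else None
```

For example, a ref scoring 0.9 lexically but 0.0 on the embedding side still survives the pre-filter and can win on its combined score of 0.54.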
## Snapshot Dependency
`find` depends on the same accessibility snapshot/ref-cache infrastructure used by snapshot-driven interaction.
If a cached snapshot is missing, the handler tries to refresh it automatically before giving up.
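The refresh-before-giving-up behavior amounts to a single retry around the cache lookup; a sketch, with every name (`take_snapshot`, `match`, the cache shape) assumed for illustration:

```python
def find_with_refresh(query, snapshot_cache, tab_id, take_snapshot, match):
    # If the cached snapshot is missing, refresh it once before
    # giving up, then run the match against the fresh snapshot.
    snapshot = snapshot_cache.get(tab_id)
    if snapshot is None:
        snapshot = take_snapshot(tab_id)  # re-snapshot the page
        if snapshot is None:
            raise LookupError(f"no snapshot available for tab {tab_id}")
        snapshot_cache[tab_id] = snapshot
    return match(query, snapshot)
```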
## Intent Cache And Recovery
After a successful match, PinchTab records:
- the original query
- the matched descriptor
- score/confidence metadata
This allows recovery logic to attempt a semantic re-match if a later action fails because the old ref became stale after a page update.
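A sketch of what such a cached intent record and a re-match attempt might look like (the record fields follow the list above; the `recover` helper, its `match` callback, and the confidence floor are assumptions):

```python
from dataclasses import dataclass

@dataclass
class IntentRecord:
    query: str      # the original natural-language query
    ref: str        # the ref that matched at the time
    composite: str  # the matched descriptor's composite string
    score: float    # score/confidence metadata from the match

def recover(record, fresh_descriptors, match, min_score: float = 0.5):
    # The old ref went stale after a re-render: re-run the semantic
    # match with the original query against the fresh snapshot, and
    # only accept a sufficiently confident result.
    ref, score = match(record.query, fresh_descriptors)
    return ref if score >= min_score else None
```

The key point is that recovery keys off the stored query, not the stale ref, so a re-rendered page can still resolve to the semantically equivalent element.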
## Orchestrator Routing
The orchestrator exposes `POST /tabs/{id}/find` and proxies it to the correct running instance. The actual matching implementation remains in the shared handler layer.
## Design Constraints
The current design intentionally avoids:
- external embedding services
- heavyweight model dependencies
- selector-first coupling
That keeps the system portable and fast, but it also means the quality ceiling is bounded by the in-process matcher design and the quality of the accessibility snapshot.
## Performance
Benchmarks on Intel i5-4300U @ 1.90GHz:
| Operation | Elements | Latency | Allocations |
| --- | --- | --- | --- |
| Lexical Find | 16 | ~71 µs | 134 allocs |
| HashingEmbedder (single) | 1 | ~11 µs | 3 allocs |
| HashingEmbedder (batch) | 16 | ~171 µs | 49 allocs |
| Embedding Find | 16 | ~180 µs | 98 allocs |
| **Combined Find** | **16** | **~233 µs** | **263 allocs** |
| Combined Find | 100 | ~1.5 ms | 1685 allocs |