
Find Architecture

This page covers the implementation details behind PinchTab's semantic find pipeline.

Overview

The find system converts accessibility snapshot nodes into lightweight descriptors, scores them against a natural-language query, and returns the best matching ref.

The implementation is designed to stay:

  • local
  • fast
  • dependency-light
  • recoverable after page re-renders

Pipeline

accessibility snapshot
  -> element descriptors
  -> lexical matcher
  -> embedding matcher
  -> combined score
  -> best ref
  -> intent cache / recovery hooks

Element Descriptors

Each accessibility node is converted into a descriptor with:

  • ref
  • role
  • name
  • value

Those fields are also combined into a composite string used for matching.
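As a sketch of that descriptor shape (the struct and field names here are assumptions for illustration, not PinchTab's exact types), the composite string simply joins the non-empty fields:

```go
package main

import (
	"fmt"
	"strings"
)

// ElementDescriptor mirrors the fields listed above; the exact
// struct and field names are assumptions for illustration.
type ElementDescriptor struct {
	Ref   string
	Role  string
	Name  string
	Value string
}

// Composite joins the descriptor fields into the single string
// that the matchers score against.
func (d ElementDescriptor) Composite() string {
	parts := []string{d.Role, d.Name, d.Value}
	nonEmpty := parts[:0]
	for _, p := range parts {
		if p != "" {
			nonEmpty = append(nonEmpty, p)
		}
	}
	return strings.Join(nonEmpty, " ")
}

func main() {
	d := ElementDescriptor{Ref: "e12", Role: "button", Name: "Submit"}
	fmt.Println(d.Composite()) // button Submit
}
```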

Matchers

PinchTab currently uses a combined matcher built from:

  • a lexical matcher
  • an embedding matcher based on a hashing embedder

Default weighting is:

0.6 lexical + 0.4 embedding

Per-request overrides are available through the lexicalWeight and embeddingWeight parameters.
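The blend itself is plain weighted arithmetic. A minimal sketch (function name is illustrative):

```go
package main

import "fmt"

// weightedScore applies the 0.6/0.4 default blend described above.
// Per-request overrides just substitute different weights.
func weightedScore(lexical, embedding, lexicalWeight, embeddingWeight float64) float64 {
	return lexicalWeight*lexical + embeddingWeight*embedding
}

func main() {
	// An element scoring 0.9 lexically and 0.5 on embeddings
	// under the default weights: 0.6*0.9 + 0.4*0.5 = 0.74.
	fmt.Println(weightedScore(0.9, 0.5, 0.6, 0.4))
}
```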

Lexical Side

The lexical matcher focuses on exact and near-exact token overlap, including role-aware matching behavior.

Useful properties:

  • strong for exact words
  • easy to reason about
  • good precision on explicit queries like "submit button"
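The core idea can be sketched as token overlap between the query and the descriptor's composite string. This is a simplified illustration, not the actual matcher, and it omits the role-aware behavior:

```go
package main

import (
	"fmt"
	"strings"
)

// lexicalScore is a simplified sketch: the fraction of query
// tokens that appear verbatim in the composite text.
func lexicalScore(query, composite string) float64 {
	queryTokens := strings.Fields(strings.ToLower(query))
	if len(queryTokens) == 0 {
		return 0
	}
	elementTokens := make(map[string]bool)
	for _, t := range strings.Fields(strings.ToLower(composite)) {
		elementTokens[t] = true
	}
	hits := 0
	for _, t := range queryTokens {
		if elementTokens[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(queryTokens))
}

func main() {
	fmt.Println(lexicalScore("submit button", "button Submit order")) // 1
}
```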

Embedding Side

The embedding matcher uses a feature-hashing approach rather than an external ML model.

Useful properties:

  • catches fuzzy similarity
  • handles partial and sub-word overlap better
  • has no model download or network dependency
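Feature hashing can be sketched as hashing character n-grams into a fixed-size vector and comparing with cosine similarity. The dimension, trigram choice, and hash function below are assumptions for illustration, not PinchTab's actual embedder:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 256 // assumed vector size for illustration

// embed hashes character trigrams into a fixed-size vector.
func embed(text string) [dims]float64 {
	var v [dims]float64
	s := strings.ToLower(text)
	for i := 0; i+3 <= len(s); i++ {
		h := fnv.New32a()
		h.Write([]byte(s[i : i+3]))
		v[h.Sum32()%dims]++
	}
	return v
}

// cosine compares two hashed vectors.
func cosine(a, b [dims]float64) float64 {
	var dot, na, nb float64
	for i := 0; i < dims; i++ {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Sub-word overlap: "submit" and "submitting" share trigrams,
	// so they score above zero even without an exact token match.
	fmt.Printf("%.2f\n", cosine(embed("submit button"), embed("submitting")))
}
```

Because the vector is produced by hashing, there is nothing to download and no network dependency, which is the property the list above highlights.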

Combined Matching

The combined matcher runs lexical and embedding scoring concurrently, merges results by element ref, and applies the weighted final score.

It also uses a lower internal threshold before the final merge so that candidates which are only strong on one side are not discarded too early.
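The merge step can be sketched as follows. Candidates from either side are unioned by ref, missing scores default to zero, and the weighted blend picks the winner; all names here are illustrative:

```go
package main

import "fmt"

// mergeScores unions per-matcher results keyed by element ref and
// applies the weighted blend. A candidate strong on only one side
// still keeps its partial contribution, mirroring the low internal
// threshold described above.
func mergeScores(lexical, embedding map[string]float64, lw, ew float64) (bestRef string, bestScore float64) {
	refs := make(map[string]bool)
	for r := range lexical {
		refs[r] = true
	}
	for r := range embedding {
		refs[r] = true
	}
	for r := range refs {
		score := lw*lexical[r] + ew*embedding[r]
		if score > bestScore {
			bestRef, bestScore = r, score
		}
	}
	return bestRef, bestScore
}

func main() {
	lex := map[string]float64{"e1": 0.9, "e2": 0.2}
	emb := map[string]float64{"e2": 0.3, "e3": 0.95}
	ref, score := mergeScores(lex, emb, 0.6, 0.4)
	fmt.Println(ref, score) // e1 wins despite no embedding score
}
```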

Snapshot Dependency

find depends on the same accessibility snapshot/ref-cache infrastructure used by snapshot-driven interaction.

If a cached snapshot is missing, the handler tries to refresh it automatically before giving up.

Intent Cache And Recovery

After a successful match, PinchTab records:

  • the original query
  • the matched descriptor
  • score/confidence metadata

This allows recovery logic to attempt a semantic re-match if a later action fails because the old ref became stale after a page update.
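A sketch of that recovery flow, with assumed names throughout (the cache type, staleness check, and injected find function are all illustrative):

```go
package main

import "fmt"

// CachedIntent records what a successful find resolved to, so a
// failed action can trigger a semantic re-match. Field names are
// assumptions for illustration.
type CachedIntent struct {
	Query      string
	MatchedRef string
	Score      float64
}

// recoverRef re-runs a find when the cached ref has gone stale.
// The find function is injected here; in practice it would be the
// combined matcher running against a fresh snapshot.
func recoverRef(cache CachedIntent, isStale func(string) bool, find func(string) (string, float64)) (string, error) {
	if !isStale(cache.MatchedRef) {
		return cache.MatchedRef, nil
	}
	ref, _ := find(cache.Query)
	if ref == "" {
		return "", fmt.Errorf("re-match failed for query %q", cache.Query)
	}
	return ref, nil
}

func main() {
	cache := CachedIntent{Query: "submit button", MatchedRef: "e12", Score: 0.9}
	// Simulate a page re-render that invalidated e12.
	stale := func(ref string) bool { return ref == "e12" }
	find := func(q string) (string, float64) { return "e47", 0.88 }
	ref, _ := recoverRef(cache, stale, find)
	fmt.Println(ref) // e47
}
```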

Orchestrator Routing

The orchestrator exposes POST /tabs/{id}/find and proxies it to the correct running instance. The actual matching implementation remains in the shared handler layer.
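A client-side sketch of building that request. The query field and weight override names follow the parameters mentioned earlier; the base URL, tab id, and exact body shape are assumptions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// findRequest mirrors the per-request weight overrides described
// above; the field set beyond the weights is an assumption.
type findRequest struct {
	Query           string  `json:"query"`
	LexicalWeight   float64 `json:"lexicalWeight,omitempty"`
	EmbeddingWeight float64 `json:"embeddingWeight,omitempty"`
}

// newFindRequest builds the POST that the orchestrator would
// proxy to the owning instance. Base URL and tab id are placeholders.
func newFindRequest(base, tabID string, body findRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/tabs/%s/find", base, tabID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := newFindRequest("http://localhost:8080", "42",
		findRequest{Query: "submit button", LexicalWeight: 0.7, EmbeddingWeight: 0.3})
	fmt.Println(req.Method, req.URL.String()) // POST http://localhost:8080/tabs/42/find
}
```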

Design Constraints

The current design intentionally avoids:

  • external embedding services
  • heavyweight model dependencies
  • selector-first coupling

That keeps the system portable and fast, but it also means the quality ceiling is bounded by the in-process matcher design and the quality of the accessibility snapshot.

Performance

Benchmarks on Intel i5-4300U @ 1.90GHz:

Operation                  Elements  Latency  Allocations
Lexical Find               16        ~71 us   134
HashingEmbedder (single)   1         ~11 us   3
HashingEmbedder (batch)    16        ~171 us  49
Embedding Find             16        ~180 us  98
Combined Find              16        ~233 us  263
Combined Find              100       ~1.5 ms  1685