Web Agent Bi-Encoder (v1)

A fine-tuned bi-encoder for web element selection, trained on the Mind2Web dataset.

What this model does

Given a natural-language task description (e.g., "click the search button") and a set of serialized web page elements, this model identifies the correct element to interact with.

Training

  • Base model: BAAI/bge-small-en-v1.5
  • Training data: Mind2Web (~5825 examples)
  • Loss: MultipleNegativesRankingLoss
  • Format version: v1
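MultipleNegativesRankingLoss uses in-batch negatives: for each (task, gold element) pair, every other gold element in the batch serves as a negative, and the loss is softmax cross-entropy over scaled cosine similarities. A minimal NumPy sketch of the loss value (the function name is illustrative; the scale of 20 matches the sentence-transformers default):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy over scaled cosine similarities,
    where positives[i] is the target class for anchors[i]."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = scale * (a @ p.T)                    # (B, B) similarity logits
    m = sim.max(axis=1, keepdims=True)         # stabilize the softmax
    logsumexp = np.log(np.exp(sim - m).sum(axis=1)) + m[:, 0]
    return float(np.mean(logsumexp - np.diag(sim)))
```

The loss approaches zero when each task embedding is far more similar to its own gold element than to the other elements in the batch.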

Evaluation

  • Top-1 accuracy: 80.4%
  • Δ vs zero-shot baseline: +13.8 points
  • Baseline: BAAI/bge-small-en-v1.5
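Top-1 accuracy here means the model's highest-scoring candidate matches the annotated gold element. A minimal sketch, assuming the metric is a simple argmax over cosine similarities (names are illustrative, not from the evaluation code):

```python
import numpy as np

def top1_accuracy(query_embs, candidate_emb_sets, gold_indices):
    """Fraction of tasks where the highest-cosine-similarity candidate
    is the annotated gold element."""
    hits = 0
    for q, cands, gold in zip(query_embs, candidate_emb_sets, gold_indices):
        sims = (cands @ q) / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q))
        hits += int(np.argmax(sims) == gold)
    return hits / len(gold_indices)
```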

Serialization format

This model expects inputs serialized in the web-agent v1 format. See the repository's docs/spec/03-node-serialization.md for the full specification. A model trained on one format version cannot be used with inputs serialized in another.

Limitations

  • Trained primarily on English-language web tasks
  • US-centric site distribution in Mind2Web
  • Dataset collected circa 2023; some site layouts may have changed
  • Best for common web interaction patterns (forms, search, navigation)

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("doeve/web-agent-bge-small-v1", revision="v1.0.0")
query_emb = model.encode(["search for flights"])
candidate_embs = model.encode(["textbox \"Search\" | — | in form:\"\", ..."])
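The element is then selected by cosine similarity between the query embedding and each candidate embedding; a small helper sketch (the function name is illustrative, not part of the model API):

```python
import numpy as np

def rank_candidates(query_emb, candidate_embs):
    """Indices of candidates sorted by cosine similarity to the query, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))
```

Taking the first index gives the element the model predicts should be interacted with.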

Or via Transformers.js in the browser:

import { pipeline } from "@huggingface/transformers";
const extractor = await pipeline("feature-extraction", "doeve/web-agent-bge-small-v1",
  { revision: "v1.0.0", dtype: "q8" });