# Web Agent Bi-Encoder (v1)
A fine-tuned bi-encoder for web element selection, trained on the Mind2Web dataset.
## What this model does
Given a natural-language task description (e.g., "click the search button") and a set of serialized web page elements, this model identifies the correct element to interact with.
## Training
- Base model: BAAI/bge-small-en-v1.5
- Training data: Mind2Web (~5825 examples)
- Loss: MultipleNegativesRankingLoss
- Format version: v1
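MultipleNegativesRankingLoss trains with in-batch negatives: for each (task, gold element) pair in a batch, every other example's gold element serves as a negative, and the model is pushed to rank the matching pair highest. The actual training uses `sentence_transformers.losses.MultipleNegativesRankingLoss`; the sketch below is an illustrative NumPy re-derivation of the computation (the `scale=20.0` default mirrors the sentence-transformers implementation, but treat the details as an approximation):

```python
import numpy as np

def multiple_negatives_ranking_loss(query_embs, pos_embs, scale=20.0):
    """In-batch negatives loss: query i should rank candidate i above all others."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = pos_embs / np.linalg.norm(pos_embs, axis=1, keepdims=True)
    sims = scale * (q @ p.T)             # (batch, batch) scaled cosine similarities
    m = sims.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = sims - m - np.log(np.exp(sims - m).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i
    return float(-np.mean(np.diag(log_probs)))
```

When query and positive embeddings line up, the diagonal dominates each row and the loss is near zero; mismatched pairs drive it up.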
## Evaluation
- Top-1 accuracy: 80.4%
- Zero-shot baseline (BAAI/bge-small-en-v1.5): 66.6%
- Δ vs baseline: +13.8 points
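Top-1 accuracy here means the model's highest-ranked candidate is the annotated gold element for the task. A trivial sketch of the metric (`top1_accuracy` is a hypothetical helper, not part of this repo):

```python
def top1_accuracy(predicted_ids, gold_ids):
    """Fraction of tasks where the top-ranked element matches the gold element."""
    assert len(predicted_ids) == len(gold_ids)
    return sum(p == g for p, g in zip(predicted_ids, gold_ids)) / len(gold_ids)
```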
## Serialization format
This model expects inputs serialized in the web-agent v1 format. See the
repository's docs/spec/03-node-serialization.md for the full specification.
Models trained on a different format version are incompatible.
## Limitations
- Trained primarily on English-language web tasks
- US-centric site distribution in Mind2Web
- Dataset collected circa 2023; some site layouts may have changed
- Best for common web interaction patterns (forms, search, navigation)
## Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("doeve/web-agent-bge-small-v1", revision="v1.0.0")
query_emb = model.encode(["search for flights"])
candidate_embs = model.encode(['textbox "Search" | — | in form:"", ...'])
```
Or via Transformers.js in the browser:
```js
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "doeve/web-agent-bge-small-v1",
  { revision: "v1.0.0", dtype: "q8" },
);
```
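Whichever runtime you use, element selection reduces to nearest-neighbor search over the embeddings: encode the task once, encode every serialized candidate element, and pick the candidate with the highest cosine similarity. A minimal NumPy sketch of that final step (`select_element` is an illustrative helper, not part of this repo):

```python
import numpy as np

def select_element(query_emb, candidate_embs):
    """Return the index of the candidate most cosine-similar to the query."""
    q = np.asarray(query_emb, dtype=float)
    c = np.asarray(candidate_embs, dtype=float)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    return int(np.argmax(c @ q))
```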