Web Agent Bi-Encoder (v1)

A fine-tuned bi-encoder for web element selection, trained on the Mind2Web dataset.

What this model does

Given a natural-language task description (e.g., "click the search button") and a set of serialized web page elements, this model identifies the correct element to interact with.

Training

  • Base model: BAAI/bge-small-en-v1.5
  • Training data: Mind2Web (~5825 examples)
  • Loss: MultipleNegativesRankingLoss
  • Format version: v1
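MultipleNegativesRankingLoss uses in-batch negatives: for each (task, gold element) pair, every other gold element in the batch serves as a negative, and the loss is softmax cross-entropy over scaled cosine similarities. A minimal NumPy sketch of the loss value (the function name is illustrative; the scale of 20 matches the sentence-transformers default):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy over scaled cosine similarities,
    where positives[i] is the target class for anchors[i]."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = scale * (a @ p.T)                    # (B, B) similarity logits
    m = sim.max(axis=1, keepdims=True)         # stabilize the softmax
    logsumexp = np.log(np.exp(sim - m).sum(axis=1)) + m[:, 0]
    return float(np.mean(logsumexp - np.diag(sim)))
```

The loss approaches zero when each task embedding is far more similar to its own gold element than to the other elements in the batch.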

Evaluation

  • Top-1 accuracy: 80.4%
  • Δ vs zero-shot baseline: +13.8 points
  • Baseline: BAAI/bge-small-en-v1.5
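Top-1 accuracy here means the model's highest-scoring candidate matches the annotated gold element. A minimal sketch, assuming the metric is a simple argmax over cosine similarities (names are illustrative, not from the evaluation code):

```python
import numpy as np

def top1_accuracy(query_embs, candidate_emb_sets, gold_indices):
    """Fraction of tasks where the highest-cosine-similarity candidate
    is the annotated gold element."""
    hits = 0
    for q, cands, gold in zip(query_embs, candidate_emb_sets, gold_indices):
        sims = (cands @ q) / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q))
        hits += int(np.argmax(sims) == gold)
    return hits / len(gold_indices)
```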

Serialization format

This model expects inputs serialized in the web-agent v1 format. See the repository's docs/spec/03-node-serialization.md for the full specification. A model trained on one format version cannot be used with inputs serialized in another.

Limitations

  • Trained primarily on English-language web tasks
  • US-centric site distribution in Mind2Web
  • Dataset collected circa 2023; some site layouts may have changed
  • Best for common web interaction patterns (forms, search, navigation)

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("doeve/web-agent-bge-small-v1", revision="v1.0.0")
query_emb = model.encode(["search for flights"])
candidate_embs = model.encode(["textbox \"Search\" | — | in form:\"\", ..."])
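The element is then selected by cosine similarity between the query embedding and each candidate embedding; a small helper sketch (the function name is illustrative, not part of the model API):

```python
import numpy as np

def rank_candidates(query_emb, candidate_embs):
    """Indices of candidates sorted by cosine similarity to the query, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))
```

Taking the first index gives the element the model predicts should be interacted with.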

Or via Transformers.js in the browser:

import { pipeline } from "@huggingface/transformers";
const extractor = await pipeline("feature-extraction", "doeve/web-agent-bge-small-v1",
  { revision: "v1.0.0", dtype: "q8" });