WebShepherd_8B / README.md

library_name: transformers
tags: []
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
  - LangAGI-Lab/WebPRMCollection_preference_pair
language:
  - en

Model Card for Web-Shepherd

Web-Shepherd is the first process reward model (PRM) designed specifically for web agents, as presented in the paper "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents". It evaluates trajectories at the step level to provide interpretable and cost-efficient feedback for both learning and inference-time decision making in web navigation tasks.

Model Details

Developed by: [More Information Needed]
Model type: Language Model
License: apache-2.0
Finetuned from model: Qwen3

Model Sources

Repository: https://github.com/LangAGI-Lab/WebShepherd
Paper: https://arxiv.org/abs/2505.15277
Dataset: https://huggingface.co/datasets/LangAGI-Lab/WebPRMCollection_preference_pair

Uses

Direct Use

This model can be used to assess web navigation trajectories at the step level.
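
As a rough illustration of how step-level assessment could guide inference-time decisions, the sketch below scores each candidate next action and keeps the highest-scoring one. The `score_step` callable is a hypothetical stand-in for a call to Web-Shepherd. Its name and signature, the action strings, and the toy keyword scorer are all assumptions made for illustration and are not part of the released code.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signature (an assumption, not the model's API):
# (task instruction, action history, candidate action) -> step-level reward.
StepScorer = Callable[[str, Sequence[str], str], float]


def select_action(
    task: str,
    history: Sequence[str],
    candidates: Sequence[str],
    score_step: StepScorer,
) -> Tuple[str, List[float]]:
    """Return the candidate with the highest step-level reward and all rewards."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    rewards = [score_step(task, history, action) for action in candidates]
    # On ties, max() keeps the earliest candidate.
    best_index = max(range(len(candidates)), key=rewards.__getitem__)
    return candidates[best_index], rewards


def toy_scorer(task: str, history: Sequence[str], action: str) -> float:
    """Placeholder reward: fraction of task words that also appear in the action."""
    task_words = set(task.lower().split())
    action_words = set(action.lower().split())
    return len(task_words & action_words) / max(len(task_words), 1)


if __name__ == "__main__":
    best, rewards = select_action(
        "search for red shoes",
        ["click [search box]"],
        ["type [search box] red shoes", "click [cart]"],
        toy_scorer,
    )
    print(best, rewards)
```

In practice `toy_scorer` would be replaced by a function that prompts the model with the task, the trajectory so far, and the candidate action, then converts the model's judgment into a number. How that prompt is formatted is defined in the repository, not here.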

Downstream Use

The model can be fine-tuned for web navigation tasks.

Training Details

Training Data

The model was trained on the WebPRM Collection, a large-scale dataset containing 40K step-level preference pairs and annotated checklists that span diverse domains and difficulty levels.

Evaluation

The model was evaluated on WebRewardBench, the first meta-evaluation benchmark for PRMs.
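
One generic way to check a reward model against preference pairs is pairwise accuracy: the share of pairs in which the preferred step receives a strictly higher reward than the rejected one. The sketch below is a minimal illustration of that idea only. It is not the official WebRewardBench scoring code, and the function name and the `(chosen, rejected)` input layout are hypothetical.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(reward_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) reward pairs where chosen scores strictly higher."""
    pairs = list(reward_pairs)
    if not pairs:
        raise ValueError("reward_pairs must not be empty")
    # Ties count as incorrect because the model failed to separate the two steps.
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up rewards for four preference pairs, for illustration only.
    example = [(0.9, 0.1), (0.4, 0.6), (0.7, 0.7), (0.8, 0.2)]
    print(pairwise_accuracy(example))
```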