---
library_name: transformers
tags: []
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- LangAGI-Lab/WebPRMCollection_preference_pair
language:
- en
---
# Model Card for Web-Shepherd
Web-Shepherd is the first process reward model (PRM) designed specifically for web agents, as presented in the paper [Web-Shepherd: Advancing PRMs for Reinforcing Web Agents](https://arxiv.org/abs/2505.15277). It evaluates trajectories at the step level to provide interpretable and cost-efficient feedback for both learning and inference-time decision making in web navigation tasks.
## Model Details
* **Developed by:** [More Information Needed]
* **Model type:** Language Model
* **License:** apache-2.0
* **Finetuned from model:** Qwen3
### Model Sources
- **Repository:** https://github.com/LangAGI-Lab/WebShepherd
- **Paper:** https://arxiv.org/abs/2505.15277
- **Dataset:** https://huggingface.co/datasets/LangAGI-Lab/WebPRMCollection_preference_pair
## Uses
### Direct Use
This model can be used to assess web navigation trajectories at the step level.
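
As a rough illustration of step-level assessment at inference time, the sketch below picks the highest-scoring candidate step for a task. Everything named here is an assumption made for illustration: the `Step` schema, the `StepScorer` signature, and `toy_scorer`, which is a simple word-overlap stand-in and not Web-Shepherd's actual scoring interface. In practice the scorer would wrap a call to the reward model; see the repository for the supported usage.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Step:
    """One candidate step in a web-navigation trajectory (hypothetical schema)."""
    observation: str  # e.g. a text rendering of the current page
    action: str       # e.g. "click [Add to cart]"


# Hypothetical signature: (task instruction, candidate step) -> scalar reward.
StepScorer = Callable[[str, Step], float]


def select_best_step(task: str, candidates: Sequence[Step], score_step: StepScorer) -> Step:
    """Return the candidate step with the highest step-level reward."""
    if not candidates:
        raise ValueError("need at least one candidate step")
    return max(candidates, key=lambda step: score_step(task, step))


def toy_scorer(task: str, step: Step) -> float:
    """Stand-in for a reward model: fraction of task words that appear in the action."""
    task_words = set(task.lower().split())
    action_words = set(step.action.lower().replace("[", " ").replace("]", " ").split())
    return len(task_words & action_words) / max(len(task_words), 1)


if __name__ == "__main__":
    task = "add the red mug to cart"
    candidates = [
        Step("product page", "click [Sign in]"),
        Step("product page", "click [Add to cart]"),
    ]
    print(select_best_step(task, candidates, toy_scorer).action)
```

Passing the scorer in as a callable keeps the selection logic independent of how rewards are produced, so the toy scorer can be swapped for a model-backed one without changing `select_best_step`.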
### Downstream Use
The model can be fine-tuned for web navigation tasks.
## Training Details
### Training Data
The model was trained on the WebPRM Collection, a large-scale dataset of 40K step-level preference pairs with annotated checklists, spanning diverse domains and difficulty levels.
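
To make the idea of a step-level preference pair concrete, here is a minimal sketch of one way such pairs could be used to check a scorer: the scorer agrees with a pair when it rates the chosen action above the rejected one. The `PreferencePair` fields and the `score` signature are hypothetical and do not reflect the dataset's actual column names or the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record layout; the real dataset columns may differ."""
    task: str             # natural-language instruction
    chosen_action: str    # action preferred at this step
    rejected_action: str  # action dispreferred at this step


def pairwise_accuracy(
    pairs: Iterable[PreferencePair],
    score: Callable[[str, str], float],  # hypothetical: (task, action) -> reward
) -> float:
    """Fraction of pairs where the chosen action scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(
        score(p.task, p.chosen_action) > score(p.task, p.rejected_action)
        for p in pairs
    )
    return correct / len(pairs)
```

A strict inequality is used so that a scorer assigning identical rewards to both actions gets no credit for ties.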
## Evaluation
The model was evaluated on WebRewardBench, the first meta-evaluation benchmark for PRMs.