---
library_name: transformers
tags: []
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- LangAGI-Lab/WebPRMCollection_preference_pair
language:
- en
---

# Model Card for Web-Shepherd

Web-Shepherd is the first process reward model (PRM) designed specifically for web agents, as presented in the paper [Web-Shepherd: Advancing PRMs for Reinforcing Web Agents](https://arxiv.org/abs/2505.15277). It evaluates trajectories at the step level to provide interpretable and cost-efficient feedback for both learning and inference-time decision making in web navigation tasks.

## Model Details

* **Developed by:** [More Information Needed]
* **Model type:** Language model (process reward model)
* **License:** apache-2.0
* **Finetuned from model:** Qwen3

### Model Sources

- **Repository:** https://github.com/LangAGI-Lab/WebShepherd
- **Paper:** https://arxiv.org/abs/2505.15277
- **Dataset:** https://huggingface.co/datasets/LangAGI-Lab/WebPRMCollection_preference_pair

## Uses

### Direct Use

This model can be used to assess web navigation trajectories at the step level.
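
Because the model scores individual steps, one way to use it for inference-time decision making is best-of-n action selection: score several candidate next actions and keep the highest-scoring one. The sketch below illustrates only that selection pattern. It assumes a hypothetical `score_step(task, history, action)` callable returning a scalar reward; this is not the official Web-Shepherd API, and the `toy_score_step` stand-in is a keyword-overlap heuristic so the example runs without the model.

```python
from typing import Callable, Sequence

# Hypothetical scorer signature: (task, history, candidate_action) -> reward.
# In practice this would wrap a call to the PRM; here it is only a placeholder type.
Scorer = Callable[[str, Sequence[str], str], float]


def toy_score_step(task: str, history: Sequence[str], action: str) -> float:
    """Toy stand-in for a PRM (assumption): share of task words found in the action."""
    task_words = set(task.lower().split())
    action_words = set(action.lower().split())
    return len(task_words & action_words) / max(len(task_words), 1)


def select_best_action(
    task: str,
    history: Sequence[str],
    candidates: Sequence[str],
    score_step: Scorer,
) -> str:
    """Best-of-n selection: return the candidate with the highest step reward."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda a: score_step(task, history, a))


if __name__ == "__main__":
    task = "search for wireless headphones"
    history = ["open shopping site"]
    candidates = [
        "click the login button",
        "type wireless headphones into search box",
        "scroll down",
    ]
    print(select_best_action(task, history, candidates, toy_score_step))
    # type wireless headphones into search box
```

Swapping `toy_score_step` for a function that queries the reward model leaves `select_best_action` unchanged, which keeps the selection logic independent of how the reward is computed.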
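
### Downstream Use

The model can be further fine-tuned for web navigation tasks. Its step-level rewards can also serve as feedback when training web agents and when choosing actions at inference time.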

## Training Details

### Training Data

The model has been trained on the WebPRM Collection, a large-scale dataset with 40K step-level preference pairs and annotated checklists spanning diverse domains and difficulty levels.

## Evaluation

The model was evaluated on WebRewardBench, the first meta-evaluation benchmark for evaluating PRMs.
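
One common way to meta-evaluate a reward model on preference data is pairwise accuracy: the fraction of pairs in which the model scores the preferred (chosen) step strictly above the rejected one. The sketch below illustrates that metric only. The `ScoredPair` record and its fields are hypothetical, and WebRewardBench's exact metrics and data format should be taken from the paper and repository.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScoredPair:
    """Hypothetical record: PRM scores for the chosen and the rejected step."""
    chosen_score: float
    rejected_score: float


def pairwise_accuracy(pairs: Iterable[ScoredPair]) -> float:
    """Fraction of pairs where the chosen step strictly outscores the rejected one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(p.chosen_score > p.rejected_score for p in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    pairs = [
        ScoredPair(0.9, 0.2),
        ScoredPair(0.4, 0.6),
        ScoredPair(0.7, 0.7),  # a tie counts as incorrect
        ScoredPair(0.8, 0.1),
    ]
    print(pairwise_accuracy(pairs))  # 0.5
```

Counting ties as errors is a design choice in this sketch; it prevents a model that outputs a constant score from appearing to rank pairs correctly.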