---
library_name: transformers
tags: []
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- LangAGI-Lab/WebPRMCollection_preference_pair
language:
- en
---

# Model Card for Web-Shepherd

Web-Shepherd is the first process reward model (PRM) designed specifically for web agents, as presented in the paper [Web-Shepherd: Advancing PRMs for Reinforcing Web Agents](https://arxiv.org/abs/2505.15277). It evaluates trajectories at the step level to provide interpretable and cost-efficient feedback for both learning and inference-time decision making in web navigation tasks.

## Model Details

* **Developed by:** [More Information Needed]
* **Model type:** Language Model
* **License:** apache-2.0
* **Finetuned from model:** Qwen3

### Model Sources

- **Repository:** https://github.com/LangAGI-Lab/WebShepherd
- **Paper:** https://arxiv.org/abs/2505.15277
- **Dataset:** https://huggingface.co/datasets/LangAGI-Lab/WebPRMCollection_preference_pair

## Uses

### Direct Use

This model can be used to assess web navigation trajectories at the step level.

### Downstream Use

The model can be fine-tuned for web navigation tasks.

## Training Details

### Training Data

The model was trained on the WebPRM Collection, a large-scale dataset with 40K step-level preference pairs and annotated checklists spanning diverse domains and difficulty levels.

## Evaluation

The model was evaluated on WebRewardBench, the first meta-evaluation benchmark for PRMs.
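
To make "step-level preference pairs" concrete, here is a minimal sketch of how a step-level scorer could be checked against such pairs. It is not the Web-Shepherd inference code and not the WebRewardBench protocol. `StepPreferencePair`, its field names, `pairwise_accuracy`, and `toy_scorer` are all hypothetical and do not reflect the dataset's actual schema; the toy scorer is a word-overlap stand-in for a real PRM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepPreferencePair:
    """Hypothetical record; field names are illustrative, not the dataset schema."""
    instruction: str      # the user's task
    chosen_action: str    # action preferred at this step
    rejected_action: str  # action dispreferred at this step


def pairwise_accuracy(
    pairs: List[StepPreferencePair],
    score_step: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the chosen action scores strictly higher."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(
        score_step(p.instruction, p.chosen_action)
        > score_step(p.instruction, p.rejected_action)
        for p in pairs
    )
    return correct / len(pairs)


def toy_scorer(instruction: str, action: str) -> float:
    """Stand-in for a real PRM: share of instruction words found in the action."""
    words = set(instruction.lower().split())
    overlap = words & set(action.lower().split())
    return len(overlap) / max(len(words), 1)


pairs = [
    StepPreferencePair(
        "search for red shoes",
        "type red shoes into the search box",
        "click the login button",
    ),
    StepPreferencePair(
        "open the cart page",
        "click the cart icon",
        "scroll down the page",
    ),
]

accuracy = pairwise_accuracy(pairs, toy_scorer)
print(f"pairwise accuracy: {accuracy:.2f}")
```

The second pair ties under the toy scorer, so it counts as a miss. A real reward model would replace `toy_scorer` with a function that takes whatever the model actually consumes (for example, the task, the page observation, and the candidate action) and returns a scalar reward.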