LangAGI-Lab/WebShepherd_8B

	---
	library_name: transformers
	tags: []
	pipeline_tag: image-text-to-text
	license: apache-2.0
	datasets:
	- LangAGI-Lab/WebPRMCollection_preference_pair
	language:
	- en
	---

	# Model Card for Web-Shepherd

	Web-Shepherd is the first process reward model (PRM) designed specifically for web agents, as presented in the paper [Web-Shepherd: Advancing PRMs for Reinforcing Web Agents](https://arxiv.org/abs/2505.15277). It evaluates trajectories at the step level to provide interpretable and cost-efficient feedback for both learning and inference-time decision making in web navigation tasks.

	## Model Details

	* Developed by: [More Information Needed]
	* Model type: Process reward model (PRM) built on a language model
	* License: apache-2.0
	* Finetuned from model: Qwen3

	### Model Sources

	- Repository: https://github.com/LangAGI-Lab/WebShepherd
	- Paper: https://arxiv.org/abs/2505.15277
	- Dataset: https://huggingface.co/datasets/LangAGI-Lab/WebPRMCollection_preference_pair

	## Uses

	### Direct Use

	This model can be used to assess web navigation trajectories at the step level.
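
As a rough illustration of step-level assessment at inference time, the sketch below picks the candidate action with the highest step reward. It is a minimal sketch, not the repository's API: `pick_best_action`, the `StepScorer` signature, and `toy_scorer` are hypothetical names introduced here, and `toy_scorer` is a keyword-matching stand-in for a real call to the reward model.

```python
from typing import Callable, Sequence

# Hypothetical scorer signature (an assumption, not the repository's API):
# (instruction, action history, candidate action) -> step-level reward.
StepScorer = Callable[[str, Sequence[str], str], float]


def pick_best_action(
    instruction: str,
    history: Sequence[str],
    candidates: Sequence[str],
    score_step: StepScorer,
) -> str:
    """Return the candidate action that receives the highest step reward."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(candidates, key=lambda action: score_step(instruction, history, action))


def toy_scorer(instruction: str, history: Sequence[str], action: str) -> float:
    """Stand-in for the reward model: counts instruction words found in the action."""
    words = set(instruction.lower().split())
    return float(sum(word in action.lower() for word in words))


best = pick_best_action(
    "search for red shoes",
    history=[],
    candidates=["click(login)", "type(search box, red shoes)"],
    score_step=toy_scorer,
)
print(best)
```

In practice `score_step` would wrap a call to the model; keeping it as a plain callable lets the selection logic be tested without loading any weights.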

	### Downstream Use

	The model can be fine-tuned for web navigation tasks.

	## Training Details

	### Training Data

	The model has been trained on the WebPRM Collection, a large-scale dataset with 40K step-level preference pairs and annotated checklists spanning diverse domains and difficulty levels.
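
To make the idea of a step-level preference pair with an annotated checklist concrete, here is one possible in-memory representation. The class name and every field name are hypothetical and chosen for illustration only; they are not the dataset's actual schema, which should be checked on the dataset page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepPreferencePair:
    """Illustrative record; field names are assumptions, not the dataset schema."""

    instruction: str        # the user's task
    checklist: List[str]    # annotated subgoals for the task
    observation: str        # page state at this step
    chosen_action: str      # the preferred action
    rejected_action: str    # the less preferred action

    def is_well_formed(self) -> bool:
        # A usable pair needs a checklist and two distinct actions to compare.
        return bool(self.checklist) and self.chosen_action != self.rejected_action


example = StepPreferencePair(
    instruction="Find the return policy",
    checklist=["Open the help page", "Locate the returns section"],
    observation="Home page with a footer link labelled Help",
    chosen_action="click(Help)",
    rejected_action="click(Cart)",
)
print(example.is_well_formed())
```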

	## Evaluation

	The model was evaluated on WebRewardBench, the first meta-evaluation benchmark for PRMs.
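
A common way to meta-evaluate a reward model is to check how often it scores the preferred step above the rejected one. The sketch below computes that pairwise accuracy. It is a generic illustration under that assumption and is not claimed to be the benchmark's official metric; `pairwise_accuracy` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (chosen_score, rejected_score) pairs where chosen is strictly higher."""
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("score_pairs must be non-empty")
    # Ties count as incorrect because the model failed to separate the two steps.
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Made-up scores for illustration only, not real model outputs.
scores = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1), (0.5, 0.5)]
print(pairwise_accuracy(scores))
```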