---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- Gnonymous/Web-CogDataset
language:
- en
- zh
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Web-CogReasoner

[**Web-CogReasoner**](https://huggingface.co/papers/2508.01858) is a knowledge-driven multimodal agent for cognitive reasoning in web environments. It introduces a paradigm shift by building agent capabilities systematically through a two-stage training process: knowledge content learning (factual and conceptual knowledge) followed by cognitive process learning (procedural knowledge).

- **Paper:** [Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents](https://huggingface.co/papers/2508.01858)
- **Project Page:** [https://eohan.me/Web-CogReasoner](https://eohan.me/Web-CogReasoner)
- **Repository:** [https://github.com/Gnonymous/Web-CogReasoner](https://github.com/Gnonymous/Web-CogReasoner)

Web-CogReasoner is trained using the [Web-CogDataset](https://huggingface.co/datasets/Gnonymous/Web-CogDataset) and employs a knowledge-driven Chain-of-Thought (CoT) reasoning framework to generalize to unseen web tasks.
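
## Usage

A minimal inference sketch, assuming this checkpoint follows the standard Transformers interface of its base model, Qwen2.5-VL-7B-Instruct. The repo id `Gnonymous/Web-CogReasoner`, the screenshot path, and the instruction text are illustrative assumptions, not confirmed details of this release.

```python
# Hypothetical usage sketch: the repo id below and the Qwen2.5-VL-style
# Transformers interface are assumptions based on the base model.
MODEL_ID = "Gnonymous/Web-CogReasoner"  # assumed repo id


def build_messages(instruction: str, image_path: str = "screenshot.png") -> list:
    """Build a Qwen2.5-VL-style chat turn pairing a web screenshot with a task."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def main() -> None:
    # Heavy dependencies are imported here so build_messages stays importable.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_messages("Find the search box and type 'laptops'.")
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image = Image.open("screenshot.png")  # placeholder web screenshot
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(
        model.device
    )

    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens (the model's reasoning and action).
    reply = processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    print(reply)


if __name__ == "__main__":
    main()
```

The chat-message layout (an image entry followed by a text entry) mirrors the base model's expected multimodal input format; adapt the instruction to the web task at hand.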

## Performance

Web-CogReasoner substantially outperforms existing models across several benchmarks:

| Benchmark | Score |
| :--- | :---: |
| Web-CogBench | 84.4 |
| VisualWebBench | 86.3 |
| WebVoyager | 30.2% |
| Online Multimodal-Mind2Web (Cross-Tasks) | 17.0% |
| Online Multimodal-Mind2Web (Cross-Webs) | 10.1% |

## Citation

If you find this work helpful, please cite the following paper:
```bibtex
@article{guo2025web,
  title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
  author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
  journal={arXiv preprint arXiv:2508.01858},
  year={2025}
}
```