metadata
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- Gnonymous/Web-CogDataset
language:
- en
- zh
license: apache-2.0
pipeline_tag: image-text-to-text
Web-CogReasoner
Web-CogReasoner is a knowledge-driven multimodal agent designed for cognitive reasoning in web environments. It introduces a paradigm shift by systematically building agent capabilities through a two-stage training process: knowledge content learning (Factual, Conceptual) and cognitive processes (Procedural).
- Paper: Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
- Project Page: https://eohan.me/Web-CogReasoner
- Repository: https://github.com/Gnonymous/Web-CogReasoner
Web-CogReasoner is trained using the Web-CogDataset and employs a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework to generalize to unseen web tasks.
Performance
Web-CogReasoner demonstrates significant superiority over existing models across various benchmarks:
| Benchmark | Score |
|---|---|
| Web-CogBench | 84.4 |
| VisualWebBench | 86.3 |
| WebVoyager | 30.2% |
| Online Multimodal-Mind2Web (Cross-Tasks) | 17.0% |
| Online Multimodal-Mind2Web (Cross-Webs) | 10.1% |
Citation
If you find this work helpful, please cite the following paper:
@article{guo2025web,
title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
journal={arXiv preprint arXiv:2508.01858},
year={2025}
}