---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- Gnonymous/Web-CogDataset
language:
- en
- zh
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Web-CogReasoner

[**Web-CogReasoner**](https://huggingface.co/papers/2508.01858) is a knowledge-driven multimodal agent for cognitive reasoning in web environments. It introduces a paradigm shift by building agent capabilities systematically through a two-stage training process: knowledge content learning (factual and conceptual knowledge) followed by cognitive process learning (procedural knowledge).

- **Paper:** [Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents](https://huggingface.co/papers/2508.01858)
- **Project Page:** [https://eohan.me/Web-CogReasoner](https://eohan.me/Web-CogReasoner)
- **Repository:** [https://github.com/Gnonymous/Web-CogReasoner](https://github.com/Gnonymous/Web-CogReasoner)

Web-CogReasoner is trained using the [Web-CogDataset](https://huggingface.co/datasets/Gnonymous/Web-CogDataset) and employs a knowledge-driven Chain-of-Thought (CoT) reasoning framework to generalize to unseen web tasks.
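
## Usage

A minimal inference sketch, assuming this checkpoint follows the standard Transformers interface of its base model, Qwen2.5-VL-7B-Instruct. The repo id `Gnonymous/Web-CogReasoner`, the screenshot path, and the instruction text are illustrative assumptions, not confirmed details of this release.

```python
# Hypothetical usage sketch: the repo id below and the Qwen2.5-VL-style
# Transformers interface are assumptions based on the base model.
MODEL_ID = "Gnonymous/Web-CogReasoner"  # assumed repo id


def build_messages(instruction: str, image_path: str = "screenshot.png") -> list:
    """Build a Qwen2.5-VL-style chat turn pairing a web screenshot with a task."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def main() -> None:
    # Heavy dependencies are imported here so build_messages stays importable.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_messages("Find the search box and type 'laptops'.")
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image = Image.open("screenshot.png")  # placeholder web screenshot
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(
        model.device
    )

    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens (the model's reasoning and action).
    reply = processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    print(reply)


if __name__ == "__main__":
    main()
```

The chat-message layout (an image entry followed by a text entry) mirrors the base model's expected multimodal input format; adapt the instruction to the web task at hand.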

## Performance

Web-CogReasoner substantially outperforms existing models across several benchmarks:

| Benchmark | Score |
| :--- | :---: |
| Web-CogBench | 84.4 |
| VisualWebBench | 86.3 |
| WebVoyager | 30.2% |
| Online Multimodal-Mind2Web (Cross-Tasks) | 17.0% |
| Online Multimodal-Mind2Web (Cross-Webs) | 10.1% |

## Citation

If you find this work helpful, please cite the following paper:
```bibtex
@article{guo2025web,
  title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
  author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
  journal={arXiv preprint arXiv:2508.01858},
  year={2025}
}
```