Web-CogReasoner / README.md
nielsr's picture
nielsr HF Staff
Update model card with pipeline tag and project links
c98c45e verified
|
raw
history blame
1.88 kB
metadata
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
datasets:
  - Gnonymous/Web-CogDataset
language:
  - en
  - zh
license: apache-2.0
pipeline_tag: image-text-to-text

Web-CogReasoner

Web-CogReasoner is a knowledge-driven multimodal agent designed for cognitive reasoning in web environments. It introduces a paradigm shift by systematically building agent capabilities through a two-stage training process: knowledge content learning (Factual, Conceptual) and cognitive processes (Procedural).

Web-CogReasoner is trained using the Web-CogDataset and employs a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework to generalize to unseen web tasks.

Performance

Web-CogReasoner demonstrates significant superiority over existing models across various benchmarks:

Benchmark Score
Web-CogBench 84.4
VisualWebBench 86.3
WebVoyager 30.2%
Online Multimodal-Mind2Web (Cross-Tasks) 17.0%
Online Multimodal-Mind2Web (Cross-Webs) 10.1%

Citation

If you find this work helpful, please cite the following paper:

@article{guo2025web,
  title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
  author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
  journal={arXiv preprint arXiv:2508.01858},
  year={2025}
}