CognitiveKernel/WebAggregator-32B


Improve model card: Add pipeline tag, detailed description, and code link

by nielsr HF Staff - opened Oct 21, 2025

base: refs/heads/main ← from: refs/pr/1


README.md +40 -3


@@ -1,9 +1,46 @@
 ---
 license: other
 license_name: webaggregator
 license_link: https://huggingface.co/CognitiveKernel/WebAggregator-32B/blob/main/LICENSE
-base_model:
-- Qwen/Qwen3-32B
 ---
-This model was the WebAggregator-32B model mentioned in the paper [Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents](https://arxiv.org/abs/2510.14438).

 ---
+base_model:
+- Qwen/Qwen3-32B
 license: other
 license_name: webaggregator
 license_link: https://huggingface.co/CognitiveKernel/WebAggregator-32B/blob/main/LICENSE
+pipeline_tag: image-text-to-text
 ---
+# WebAggregator-32B
+This model is **WebAggregator-32B**, a deep research web agent presented in the paper [Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents](https://arxiv.org/abs/2510.14438).
+WebAggregator models are designed to enhance the information aggregation capabilities of web agents, going beyond mere information seeking. They rigorously analyze and aggregate knowledge from diverse sources, including web environments, files, and multimodal inputs, to support in-depth research. The framework employs an "Explore to Evolve" paradigm to scalably construct verifiable training data for web agents.
+The 32B variant of WebAggregator demonstrates strong performance, surpassing GPT-4.1 by more than 10% on GAIA-text and closely approaching Claude-3.7-sonnet, particularly on challenging information aggregation benchmarks where other agents struggle.
+## ✨ Features
+-   🤖 **Fully Automated and Verifiable QA Construction**: Enables scalable generation of high-quality training data for web agents.
+-   😄 **Open Source**: Provides a complete codebase including the QA construction engine, queries, trajectories, and models.
+-   👍 **Highly Customizable**: Allows users to collect data tailored to their specific needs with minimal human effort, and easily customize their own agents.
+## 🔗 Code Repository
+The official code for the WebAggregator project can be found on GitHub: [https://github.com/Tencent/WebAggregator](https://github.com/Tencent/WebAggregator)
+## 🚀 Getting Started
+To get started with the WebAggregator project, please refer to the comprehensive instructions in the [official GitHub repository's Quick Start and Usage sections](https://github.com/Tencent/WebAggregator#quick-start). The repository provides details on cloning, installing dependencies, configuring, and running evaluation, QA construction, and trajectory sampling scripts.
+## 📚 Citation
+If you find this work helpful, please cite the original paper:
+```bibtex
+@misc{wang2025exploreevolvescalingevolved,
+      title={Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents},
+      author={Rui Wang and Ce Zhang and Jun-Yu Ma and Jianshu Zhang and Hongru Wang and Yi Chen and Boyang Xue and Tianqing Fang and Zhisong Zhang and Hongming Zhang and Haitao Mi and Dong Yu and Kam-Fai Wong},
+      year={2025},
+      eprint={2510.14438},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2510.14438},
+}
+```
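
The metadata block this PR rewrites (the section between the `---` markers) is what the Hub reads for fields such as `base_model` and `pipeline_tag`. As a rough illustration of what that block contains after the change, the sketch below parses it with the standard library only. The function name `parse_front_matter` is hypothetical, and the parser is an assumption-laden simplification: it handles flat `key: value` pairs and `- item` lists, which is all this block uses, and is not a general YAML parser.

```python
def parse_front_matter(text):
    """Parse a leading '---' delimited metadata block into a dict.

    Hypothetical helper for illustration. Supports only flat
    `key: value` pairs and `- item` lists under a key with no value.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata block at the top of the file

    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: end of the metadata block
        if stripped.startswith("- "):
            # A list item belongs to the most recent key that had no value.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            meta[current_key] = value if value else []
    return meta


# The metadata block as it appears on the new side of this diff.
README = """---
base_model:
- Qwen/Qwen3-32B
license: other
license_name: webaggregator
license_link: https://huggingface.co/CognitiveKernel/WebAggregator-32B/blob/main/LICENSE
pipeline_tag: image-text-to-text
---
# WebAggregator-32B
"""

meta = parse_front_matter(README)
print(meta["pipeline_tag"])
print(meta["base_model"])
```

For anything beyond a quick check, a real YAML library would be the more robust choice, since model card metadata can contain nested structures that this sketch ignores.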