Improve model card: Add pipeline tag, detailed description, and code link

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +40 -3
README.md CHANGED
@@ -1,9 +1,46 @@
  ---
+ base_model:
+ - Qwen/Qwen3-32B
  license: other
  license_name: webaggregator
  license_link: https://huggingface.co/CognitiveKernel/WebAggregator-32B/blob/main/LICENSE
- base_model:
- - Qwen/Qwen3-32B
+ pipeline_tag: image-text-to-text
  ---

- This model was the WebAggregator-32B model mentioned in the paper [Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents](https://arxiv.org/abs/2510.14438).
+ # WebAggregator-32B
+
+ This model is **WebAggregator-32B**, a deep research web agent presented in the paper [Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents](https://arxiv.org/abs/2510.14438).
+
+ WebAggregator models are designed to enhance the information aggregation capabilities of web agents, going beyond mere information seeking. They rigorously analyze and aggregate knowledge from diverse sources, including web environments, files, and multimodal inputs, to support in-depth research. The framework employs an "Explore to Evolve" paradigm to scalably construct verifiable training data for web agents.
+
+ The 32B variant of WebAggregator demonstrates strong performance, surpassing GPT-4.1 by more than 10% on GAIA-text and closely approaching Claude-3.7-sonnet, particularly on challenging information aggregation benchmarks where other agents struggle.
+
+ ## ✨ Features
+
+ - 🤖 **Fully Automated and Verifiable QA Construction**: Enables scalable generation of high-quality training data for web agents.
+ - 😄 **Open Source**: Provides a complete codebase including the QA construction engine, queries, trajectories, and models.
+ - 👍 **Highly Customizable**: Allows users to collect data tailored to their specific needs with minimal human effort, and easily customize their own agents.
+
+ ## 🔗 Code Repository
+
+ The official code for the WebAggregator project can be found on GitHub: [https://github.com/Tencent/WebAggregator](https://github.com/Tencent/WebAggregator)
+
+ ## 🚀 Getting Started
+
+ To get started with the WebAggregator project, please refer to the comprehensive instructions in the [official GitHub repository's Quick Start and Usage sections](https://github.com/Tencent/WebAggregator#quick-start). The repository provides details on cloning, installing dependencies, configuring, and running evaluation, QA construction, and trajectory sampling scripts.
+
+ ## 📚 Citation
+
+ If you find this work helpful, please cite the original paper:
+
+ ```bibtex
+ @misc{wang2025exploreevolvescalingevolved,
+ title={Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents},
+ author={Rui Wang and Ce Zhang and Jun-Yu Ma and Jianshu Zhang and Hongru Wang and Yi Chen and Boyang Xue and Tianqing Fang and Zhisong Zhang and Hongming Zhang and Haitao Mi and Dong Yu and Kam-Fai Wong},
+ year={2025},
+ eprint={2510.14438},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2510.14438},
+ }
+ ```