Kaguya-19 committed on
Commit 68f0493 · verified · 1 Parent(s): ebb191f

Update README.md

Files changed (1)
  1. README.md +331 -329
README.md CHANGED
@@ -1,329 +1,331 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
- # AgentCPM-Report: Gemini-2.5-pro-DeepResearch Level Local DeepResearch
5
-
6
- <p align="center">
7
- <a href='https://huggingface.co/openbmb/AgentCPM-Report'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-AgentCPM--Report-yellow'>
8
- <a href='https://huggingface.co/openbmb/AgentCPM-Report-GGUF'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-AgentCPM--Report--GGUF-yellow'>
9
- <a href='https://github.com/OpenBMB/AgentCPM'><img src='https://img.shields.io/badge/GitHub-AgentCPM-blue?logo=github'>
10
- <a href='https://github.com/OpenBMB/UltraRAG'><img src='https://img.shields.io/badge/GitHub-UltraRAG-blue?logo=github'>
11
- </p>
12
-
13
- ## Links
14
- - [AgentCPM-Report](https://huggingface.co/openbmb/AgentCPM-Report) The Gemini-2.5-pro-DeepResearch Level Local DeepResearch Model
15
- - [AgentCPM-Report-GGUF](https://huggingface.co/openbmb/AgentCPM-Report-GGUF) The GGUF version
16
- - [AgentCPM](https://github.com/OpenBMB/AgentCPM) Our code for AgentCPM Series
17
- - [UltraRAG](https://github.com/OpenBMB/UltraRAG) The low-code RAG framework
18
-
19
-
20
- ## News
21
- - [2026-01-20] 🚀🚀🚀 We open-sourced AgentCPM-Report built on MiniCPM4.1-8B, capable of matching top closed-source commercial systems like Gemini-2.5-pro-DeepResearch in report generation.
22
-
23
- ## Overview
24
- AgentCPM-Report is an open-source large language model agent jointly developed by [THUNLP](https://nlp.csai.tsinghua.edu.cn), Renmin University of China [RUCBM](https://github.com/RUCBM), and [ModelBest](https://modelbest.cn/en). It is based on the [MiniCPM4.1](https://github.com/OpenBMB/MiniCPM4.1) 8B-parameter base model. It accepts user instructions as input and autonomously generates long-form reports. Key highlights:
25
-
26
- - **Strong advantages in insight and comprehensiveness**: The first 8B edge-side model to surpass closed-source DeepResearch systems on deep research report generation tasks, raising the performance ceiling for small-scale agent systems, with SOTA results on the Insight metric in particular.
27
- - **Lightweight and local deployment**: Supports agile local deployment. With frameworks like UltraRAG, it enables large-scale knowledge base construction and can generate reports that are even more professional and in-depth than those from large models. A lightweight model plus a local knowledge base makes it feasible to deploy a deep-research report writing system on a personal computer, laying the foundation for report writing over private personal data or private-domain data.
28
-
29
- ## Demo Cases
30
- `YouTube link or Bilibili link for the video`
31
-
32
- ## Quick Start
33
- ### Docker Deployment
34
- We provide a minimal one-click `docker-compose` deployment integrated with UltraRAG. It bundles the RAG framework UltraRAG 2.0, the model inference framework vLLM, and the vector database Milvus. For CPU inference, we also provide a llama.cpp-based version for GGUF models: use `docker-compose.cpu.yml` in place of `docker-compose.yml`.
35
-
36
- ``` bash
37
- git clone git@github.com:OpenBMB/UltraRAG.git
38
- cd UltraRAG
39
- git checkout agentcpm-report-demo
40
- cd agentcpm-report-demo
41
- cp env.example .env
42
- docker-compose -f docker-compose.yml up -d --build
43
- docker-compose -f docker-compose.yml logs -f ultrarag-ui
44
- ```
45
- The first startup pulls images, downloads the model, and configures the environment, which takes about 30 minutes.
46
- Then open `http://localhost:5050`. If you can see the UI, your deployment is successful.
47
- Follow the UI instructions to upload local files, chunk them, and build indexes; then in the Chat section, select AgentCPM-Report in the pipeline to start your workflow.
48
-
49
- (Optional) You can import [Wiki2024](https://modelscope.cn/datasets/UltraRAG/UltraRAG_Benchmark/tree/master/corpus/wiki24) as the writing database.
50
-
51
- You can read more tutorials about AgentCPM-Report in the [documentation](https://ultrarag.openbmb.cn/pages/cn/pipeline/agentcpm-report).
52
-
53
-
54
- ## Evaluation
55
- <table align="center">
56
- <thead>
57
- <tr>
58
- <th align="center">DeepResearch Bench</th>
59
- <th align="center">Overall</th>
60
- <th align="center">Comprehensiveness</th>
61
- <th align="center">Insight</th>
62
- <th align="center">Instruction Following</th>
63
- <th align="center">Readability</th>
64
- </tr>
65
- </thead>
66
- <tbody>
67
- <tr>
68
- <td align="center">Doubao-research</td>
69
- <td align="center">44.34</td>
70
- <td align="center">44.84</td>
71
- <td align="center">40.56</td>
72
- <td align="center">47.95</td>
73
- <td align="center">44.69</td>
74
- </tr>
75
- <tr>
76
- <td align="center">Claude-research</td>
77
- <td align="center">45</td>
78
- <td align="center">45.34</td>
79
- <td align="center">42.79</td>
80
- <td align="center">47.58</td>
81
- <td align="center">44.66</td>
82
- </tr>
83
- <tr>
84
- <td align="center">OpenAI-deepresearch</td>
85
- <td align="center">46.45</td>
86
- <td align="center">46.46</td>
87
- <td align="center">43.73</td>
88
- <td align="center">49.39</td>
89
- <td align="center">47.22</td>
90
- </tr>
91
- <tr>
92
- <td align="center">Gemini-2.5-Pro-deepresearch</td>
93
- <td align="center">49.71</td>
94
- <td align="center">49.51</td>
95
- <td align="center">49.45</td>
96
- <td align="center">50.12</td>
97
- <td align="center">50</td>
98
- </tr>
99
- <tr>
100
- <td align="center">WebWeaver(Qwen3-30B-A3B)</td>
101
- <td align="center">46.77</td>
102
- <td align="center">45.15</td>
103
- <td align="center">45.78</td>
104
- <td align="center">49.21</td>
105
- <td align="center">47.34</td>
106
- </tr>
107
- <tr>
108
- <td align="center">WebWeaver(Claude-Sonnet-4)</td>
109
- <td align="center">50.58</td>
110
- <td align="center">51.45</td>
111
- <td align="center">50.02</td>
112
- <td align="center">50.81</td>
113
- <td align="center">49.79</td>
114
- </tr>
115
- <tr>
116
- <td align="center">Enterprise-DR(Gemini-2.5-Pro)</td>
117
- <td align="center">49.86</td>
118
- <td align="center">49.01</td>
119
- <td align="center">50.28</td>
120
- <td align="center">50.03</td>
121
- <td align="center">49.98</td>
122
- </tr>
123
- <tr>
124
- <td align="center">RhinoInsight(Gemini-2.5-Pro)</td>
125
- <td align="center">50.92</td>
126
- <td align="center">50.51</td>
127
- <td align="center">51.45</td>
128
- <td align="center">51.72</td>
129
- <td align="center">50</td>
130
- </tr>
131
- <tr>
132
- <td align="center">AgentCPM-Report</td>
133
- <td align="center">50.11</td>
134
- <td align="center">50.54</td>
135
- <td align="center">52.64</td>
136
- <td align="center">48.87</td>
137
- <td align="center">44.17</td>
138
- </tr>
139
- </tbody>
140
- </table>
141
-
142
- <table align="center">
143
- <thead>
144
- <tr>
145
- <th align="center">DeepResearch Gym</th>
146
- <th align="center">Avg.</th>
147
- <th align="center">Clarity</th>
148
- <th align="center">Depth</th>
149
- <th align="center">Balance</th>
150
- <th align="center">Breadth</th>
151
- <th align="center">Support</th>
152
- <th align="center">Insightfulness</th>
153
- </tr>
154
- </thead>
155
- <tbody>
156
- <tr>
157
- <td align="center">Doubao-research</td>
158
- <td align="center">84.46</td>
159
- <td align="center">68.85</td>
160
- <td align="center">93.12</td>
161
- <td align="center">83.96</td>
162
- <td align="center">93.33</td>
163
- <td align="center">84.38</td>
164
- <td align="center">83.12</td>
165
- </tr>
166
- <tr>
167
- <td align="center">Claude-research</td>
168
- <td align="center">80.25</td>
169
- <td align="center">86.67</td>
170
- <td align="center">96.88</td>
171
- <td align="center">84.41</td>
172
- <td align="center">96.56</td>
173
- <td align="center">26.77</td>
174
- <td align="center">90.22</td>
175
- </tr>
176
- <tr>
177
- <td align="center">OpenAI-deepresearch</td>
178
- <td align="center">91.27</td>
179
- <td align="center">84.90</td>
180
- <td align="center">98.10</td>
181
- <td align="center">89.80</td>
182
- <td align="center">97.40</td>
183
- <td align="center">88.40</td>
184
- <td align="center">89.00</td>
185
- </tr>
186
- <tr>
187
- <td align="center">Gemini-2.5-pro-deepresearch</td>
188
- <td align="center">96.02</td>
189
- <td align="center">90.71</td>
190
- <td align="center">99.90</td>
191
- <td align="center">93.37</td>
192
- <td align="center">99.69</td>
193
- <td align="center">95.00</td>
194
- <td align="center">97.45</td>
195
- </tr>
196
- <tr>
197
- <td align="center">WebWeaver (Qwen3-30b-a3b)</td>
198
- <td align="center">77.27</td>
199
- <td align="center">71.88</td>
200
- <td align="center">85.51</td>
201
- <td align="center">75.80</td>
202
- <td align="center">84.78</td>
203
- <td align="center">63.77</td>
204
- <td align="center">81.88</td>
205
- </tr>
206
- <tr>
207
- <td align="center">WebWeaver (Claude-sonnet-4)</td>
208
- <td align="center">96.77</td>
209
- <td align="center">90.50</td>
210
- <td align="center">99.87</td>
211
- <td align="center">94.30</td>
212
- <td align="center">100.00</td>
213
- <td align="center">98.73</td>
214
- <td align="center">97.22</td>
215
- </tr>
216
- <tr>
217
- <td align="center">AgentCPM-Report</td>
218
- <td align="center">98.48</td>
219
- <td align="center">95.1</td>
220
- <td align="center">100.0</td>
221
- <td align="center">98.5</td>
222
- <td align="center">100.0</td>
223
- <td align="center">97.3</td>
224
- <td align="center">100.0</td>
225
- </tr>
226
- </tbody>
227
- </table>
228
-
229
- <table align="center">
230
- <thead>
231
- <tr>
232
- <th align="center">DeepConsult</th>
233
- <th align="center">Avg.</th>
234
- <th align="center">Win</th>
235
- <th align="center">Tie</th>
236
- <th align="center">Lose</th>
237
- </tr>
238
- </thead>
239
- <tbody>
240
- <tr>
241
- <td align="center">Doubao-research</td>
242
- <td align="center">5.42</td>
243
- <td align="center">29.95</td>
244
- <td align="center">40.35</td>
245
- <td align="center">29.7</td>
246
- </tr>
247
- <tr>
248
- <td align="center">Claude-research</td>
249
- <td align="center">4.6</td>
250
- <td align="center">25</td>
251
- <td align="center">38.89</td>
252
- <td align="center">36.11</td>
253
- </tr>
254
- <tr>
255
- <td align="center">OpenAI-deepresearch</td>
256
- <td align="center">5</td>
257
- <td align="center">0</td>
258
- <td align="center">100</td>
259
- <td align="center">0</td>
260
- </tr>
261
- <tr>
262
- <td align="center">Gemini-2.5-Pro-deepresearch</td>
263
- <td align="center">6.7</td>
264
- <td align="center">61.27</td>
265
- <td align="center">31.13</td>
266
- <td align="center">7.6</td>
267
- </tr>
268
- <tr>
269
- <td align="center">WebWeaver(Qwen3-30B-A3B)</td>
270
- <td align="center">4.57</td>
271
- <td align="center">28.65</td>
272
- <td align="center">34.9</td>
273
- <td align="center">36.46</td>
274
- </tr>
275
- <tr>
276
- <td align="center">WebWeaver(Claude-Sonnet-4)</td>
277
- <td align="center">6.96</td>
278
- <td align="center">66.86</td>
279
- <td align="center">10.47</td>
280
- <td align="center">22.67</td>
281
- </tr>
282
- <tr>
283
- <td align="center">Enterprise-DR(Gemini-2.5-Pro)</td>
284
- <td align="center">6.82</td>
285
- <td align="center">71.57</td>
286
- <td align="center">19.12</td>
287
- <td align="center">9.31</td>
288
- </tr>
289
- <tr>
290
- <td align="center">RhinoInsight(Gemini-2.5-Pro)</td>
291
- <td align="center">6.82</td>
292
- <td align="center">68.51</td>
293
- <td align="center">11.02</td>
294
- <td align="center">20.47</td>
295
- </tr>
296
- <tr>
297
- <td align="center">AgentCPM-Report</td>
298
- <td align="center">6.6</td>
299
- <td align="center">57.6</td>
300
- <td align="center">13.73</td>
301
- <td align="center">28.68</td>
302
- </tr>
303
- </tbody>
304
- </table>
305
-
306
- Our evaluation datasets include DeepResearch Bench, DeepConsult, and DeepResearch Gym. The writing-time knowledge base includes about 2.7 million [arXiv papers](https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv) and about 200,000 internal webpage summaries.
307
-
308
- ## Acknowledgements
309
- This project would not be possible without the support and contributions of the open-source community. During development, we referred to and used multiple excellent open-source frameworks, models, and data resources, including [verl](https://github.com/volcengine/verl), [UltraRAG](https://github.com/OpenBMB/UltraRAG), [MiniCPM4.1](https://github.com/OpenBMB/MiniCPM4.1), and [SurveyGo](https://surveygo.modelbest.cn/).
310
-
311
- ## Contributions
312
- Project leads: Yishan Li, Wentong Chen
313
-
314
- Contributors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
315
-
316
- Advisors: Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
317
-
318
- ## Citation
319
-
320
- If **AgentCPM-Report** is helpful for your research, please cite it as follows:
321
-
322
- ```bibtex
323
- @software{AgentCPMReport2026,
324
- title = {AgentCPM-Report: Gemini-2.5-pro-DeepResearch Level Local DeepResearch},
325
- author = {Yishan Li and Wentong Chen and Yukun Yan and Mingwei Li and Sen Mei and Xiaorong Wang and Kunpeng Liu and Xin Cong and Shuo Wang and Zhong Zhang and Yaxi Lu and Zhenghao Liu and Yankai Lin and Zhiyuan Liu and Maosong Sun},
326
- year = {2026},
327
- url = {https://github.com/OpenBMB/AgentCPM}
328
- }
329
- ```
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ # AgentCPM-Report: Gemini-2.5-pro-DeepResearch Level Local DeepResearch
5
+
6
+ <p align="center">
7
+ <a href='https://huggingface.co/openbmb/AgentCPM-Report'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-AgentCPM--Report-yellow'>
8
+ <a href='https://huggingface.co/openbmb/AgentCPM-Report-GGUF'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-AgentCPM--Report--GGUF-yellow'>
9
+ <a href='https://github.com/OpenBMB/AgentCPM'><img src='https://img.shields.io/badge/GitHub-AgentCPM-blue?logo=github'>
10
+ <a href='https://github.com/OpenBMB/UltraRAG'><img src='https://img.shields.io/badge/GitHub-UltraRAG-blue?logo=github'>
11
+ </p>
12
+
13
+ ## Links
14
+ - [AgentCPM-Report](https://huggingface.co/openbmb/AgentCPM-Report) The Gemini-2.5-pro-DeepResearch Level Local DeepResearch Model
15
+ - [AgentCPM-Report-GGUF](https://huggingface.co/openbmb/AgentCPM-Report-GGUF) The GGUF version of AgentCPM-Report
16
+ - [AgentCPM-Explore](https://huggingface.co/openbmb/AgentCPM-Explore) The first open-source agent model with 4B parameters to appear on 8 widely used long-horizon agent benchmarks.
17
+ - [AgentCPM-Explore-GGUF](https://huggingface.co/openbmb/AgentCPM-Explore-GGUF) The GGUF version of AgentCPM-Explore
18
+ - [AgentCPM](https://github.com/OpenBMB/AgentCPM) Our code for AgentCPM Series
19
+ - [UltraRAG](https://github.com/OpenBMB/UltraRAG) The low-code RAG framework
20
+
21
+
22
+ ## News
23
+ - [2026-01-20] 🚀🚀🚀 We open-sourced AgentCPM-Report built on MiniCPM4.1-8B, capable of matching top closed-source commercial systems like Gemini-2.5-pro-DeepResearch in report generation.
24
+
25
+ ## Overview
26
+ AgentCPM-Report is an open-source large language model agent jointly developed by [THUNLP](https://nlp.csai.tsinghua.edu.cn), Renmin University of China [RUCBM](https://github.com/RUCBM), and [ModelBest](https://modelbest.cn/en). It is based on the [MiniCPM4.1](https://github.com/OpenBMB/MiniCPM4.1) 8B-parameter base model. It accepts user instructions as input and autonomously generates long-form reports. Key highlights:
27
+
28
+ - **Strong advantages in insight and comprehensiveness**: The first 8B edge-side model to surpass closed-source DeepResearch systems on deep research report generation tasks, raising the performance ceiling for small-scale agent systems, with SOTA results on the Insight metric in particular.
29
+ - **Lightweight and local deployment**: Supports agile local deployment. With frameworks like UltraRAG, it enables large-scale knowledge base construction and can generate reports that are even more professional and in-depth than those from large models. A lightweight model plus a local knowledge base makes it feasible to deploy a deep-research report writing system on a personal computer, laying the foundation for report writing over private personal data or private-domain data.
30
+
31
+ ## Demo Cases
32
+ `YouTube link or Bilibili link for the video`
33
+
34
+ ## Quick Start
35
+ ### Docker Deployment
36
+ We provide a minimal one-click `docker-compose` deployment integrated with UltraRAG. It bundles the RAG framework UltraRAG 2.0, the model inference framework vLLM, and the vector database Milvus. For CPU inference, we also provide a llama.cpp-based version for GGUF models: use `docker-compose.cpu.yml` in place of `docker-compose.yml`.
37
+
38
+ ``` bash
39
+ git clone git@github.com:OpenBMB/UltraRAG.git
40
+ cd UltraRAG
41
+ git checkout agentcpm-report-demo
42
+ cd agentcpm-report-demo
43
+ cp env.example .env
44
+ docker-compose -f docker-compose.yml up -d --build
45
+ docker-compose -f docker-compose.yml logs -f ultrarag-ui
46
+ ```
47
+ The first startup pulls images, downloads the model, and configures the environment, which takes about 30 minutes.
48
+ Then open `http://localhost:5050`. If you can see the UI, your deployment is successful.
49
+ Follow the UI instructions to upload local files, chunk them, and build indexes; then in the Chat section, select AgentCPM-Report in the pipeline to start your workflow.
50
+
51
+ (Optional) You can import [Wiki2024](https://modelscope.cn/datasets/UltraRAG/UltraRAG_Benchmark/tree/master/corpus/wiki24) as the writing database.
52
+
53
+ You can read more tutorials about AgentCPM-Report in the [documentation](https://ultrarag.openbmb.cn/pages/cn/pipeline/agentcpm-report).
54
+
55
+
56
+ ## Evaluation
57
+ <table align="center">
58
+ <thead>
59
+ <tr>
60
+ <th align="center">DeepResearch Bench</th>
61
+ <th align="center">Overall</th>
62
+ <th align="center">Comprehensiveness</th>
63
+ <th align="center">Insight</th>
64
+ <th align="center">Instruction Following</th>
65
+ <th align="center">Readability</th>
66
+ </tr>
67
+ </thead>
68
+ <tbody>
69
+ <tr>
70
+ <td align="center">Doubao-research</td>
71
+ <td align="center">44.34</td>
72
+ <td align="center">44.84</td>
73
+ <td align="center">40.56</td>
74
+ <td align="center">47.95</td>
75
+ <td align="center">44.69</td>
76
+ </tr>
77
+ <tr>
78
+ <td align="center">Claude-research</td>
79
+ <td align="center">45</td>
80
+ <td align="center">45.34</td>
81
+ <td align="center">42.79</td>
82
+ <td align="center">47.58</td>
83
+ <td align="center">44.66</td>
84
+ </tr>
85
+ <tr>
86
+ <td align="center">OpenAI-deepresearch</td>
87
+ <td align="center">46.45</td>
88
+ <td align="center">46.46</td>
89
+ <td align="center">43.73</td>
90
+ <td align="center">49.39</td>
91
+ <td align="center">47.22</td>
92
+ </tr>
93
+ <tr>
94
+ <td align="center">Gemini-2.5-Pro-deepresearch</td>
95
+ <td align="center">49.71</td>
96
+ <td align="center">49.51</td>
97
+ <td align="center">49.45</td>
98
+ <td align="center">50.12</td>
99
+ <td align="center">50</td>
100
+ </tr>
101
+ <tr>
102
+ <td align="center">WebWeaver(Qwen3-30B-A3B)</td>
103
+ <td align="center">46.77</td>
104
+ <td align="center">45.15</td>
105
+ <td align="center">45.78</td>
106
+ <td align="center">49.21</td>
107
+ <td align="center">47.34</td>
108
+ </tr>
109
+ <tr>
110
+ <td align="center">WebWeaver(Claude-Sonnet-4)</td>
111
+ <td align="center">50.58</td>
112
+ <td align="center">51.45</td>
113
+ <td align="center">50.02</td>
114
+ <td align="center">50.81</td>
115
+ <td align="center">49.79</td>
116
+ </tr>
117
+ <tr>
118
+ <td align="center">Enterprise-DR(Gemini-2.5-Pro)</td>
119
+ <td align="center">49.86</td>
120
+ <td align="center">49.01</td>
121
+ <td align="center">50.28</td>
122
+ <td align="center">50.03</td>
123
+ <td align="center">49.98</td>
124
+ </tr>
125
+ <tr>
126
+ <td align="center">RhinoInsight(Gemini-2.5-Pro)</td>
127
+ <td align="center">50.92</td>
128
+ <td align="center">50.51</td>
129
+ <td align="center">51.45</td>
130
+ <td align="center">51.72</td>
131
+ <td align="center">50</td>
132
+ </tr>
133
+ <tr>
134
+ <td align="center">AgentCPM-Report</td>
135
+ <td align="center">50.11</td>
136
+ <td align="center">50.54</td>
137
+ <td align="center">52.64</td>
138
+ <td align="center">48.87</td>
139
+ <td align="center">44.17</td>
140
+ </tr>
141
+ </tbody>
142
+ </table>
143
+
144
+ <table align="center">
145
+ <thead>
146
+ <tr>
147
+ <th align="center">DeepResearch Gym</th>
148
+ <th align="center">Avg.</th>
149
+ <th align="center">Clarity</th>
150
+ <th align="center">Depth</th>
151
+ <th align="center">Balance</th>
152
+ <th align="center">Breadth</th>
153
+ <th align="center">Support</th>
154
+ <th align="center">Insightfulness</th>
155
+ </tr>
156
+ </thead>
157
+ <tbody>
158
+ <tr>
159
+ <td align="center">Doubao-research</td>
160
+ <td align="center">84.46</td>
161
+ <td align="center">68.85</td>
162
+ <td align="center">93.12</td>
163
+ <td align="center">83.96</td>
164
+ <td align="center">93.33</td>
165
+ <td align="center">84.38</td>
166
+ <td align="center">83.12</td>
167
+ </tr>
168
+ <tr>
169
+ <td align="center">Claude-research</td>
170
+ <td align="center">80.25</td>
171
+ <td align="center">86.67</td>
172
+ <td align="center">96.88</td>
173
+ <td align="center">84.41</td>
174
+ <td align="center">96.56</td>
175
+ <td align="center">26.77</td>
176
+ <td align="center">90.22</td>
177
+ </tr>
178
+ <tr>
179
+ <td align="center">OpenAI-deepresearch</td>
180
+ <td align="center">91.27</td>
181
+ <td align="center">84.90</td>
182
+ <td align="center">98.10</td>
183
+ <td align="center">89.80</td>
184
+ <td align="center">97.40</td>
185
+ <td align="center">88.40</td>
186
+ <td align="center">89.00</td>
187
+ </tr>
188
+ <tr>
189
+ <td align="center">Gemini-2.5-pro-deepresearch</td>
190
+ <td align="center">96.02</td>
191
+ <td align="center">90.71</td>
192
+ <td align="center">99.90</td>
193
+ <td align="center">93.37</td>
194
+ <td align="center">99.69</td>
195
+ <td align="center">95.00</td>
196
+ <td align="center">97.45</td>
197
+ </tr>
198
+ <tr>
199
+ <td align="center">WebWeaver (Qwen3-30b-a3b)</td>
200
+ <td align="center">77.27</td>
201
+ <td align="center">71.88</td>
202
+ <td align="center">85.51</td>
203
+ <td align="center">75.80</td>
204
+ <td align="center">84.78</td>
205
+ <td align="center">63.77</td>
206
+ <td align="center">81.88</td>
207
+ </tr>
208
+ <tr>
209
+ <td align="center">WebWeaver (Claude-sonnet-4)</td>
210
+ <td align="center">96.77</td>
211
+ <td align="center">90.50</td>
212
+ <td align="center">99.87</td>
213
+ <td align="center">94.30</td>
214
+ <td align="center">100.00</td>
215
+ <td align="center">98.73</td>
216
+ <td align="center">97.22</td>
217
+ </tr>
218
+ <tr>
219
+ <td align="center">AgentCPM-Report</td>
220
+ <td align="center">98.48</td>
221
+ <td align="center">95.1</td>
222
+ <td align="center">100.0</td>
223
+ <td align="center">98.5</td>
224
+ <td align="center">100.0</td>
225
+ <td align="center">97.3</td>
226
+ <td align="center">100.0</td>
227
+ </tr>
228
+ </tbody>
229
+ </table>
230
+
231
+ <table align="center">
232
+ <thead>
233
+ <tr>
234
+ <th align="center">DeepConsult</th>
235
+ <th align="center">Avg.</th>
236
+ <th align="center">Win</th>
237
+ <th align="center">Tie</th>
238
+ <th align="center">Lose</th>
239
+ </tr>
240
+ </thead>
241
+ <tbody>
242
+ <tr>
243
+ <td align="center">Doubao-research</td>
244
+ <td align="center">5.42</td>
245
+ <td align="center">29.95</td>
246
+ <td align="center">40.35</td>
247
+ <td align="center">29.7</td>
248
+ </tr>
249
+ <tr>
250
+ <td align="center">Claude-research</td>
251
+ <td align="center">4.6</td>
252
+ <td align="center">25</td>
253
+ <td align="center">38.89</td>
254
+ <td align="center">36.11</td>
255
+ </tr>
256
+ <tr>
257
+ <td align="center">OpenAI-deepresearch</td>
258
+ <td align="center">5</td>
259
+ <td align="center">0</td>
260
+ <td align="center">100</td>
261
+ <td align="center">0</td>
262
+ </tr>
263
+ <tr>
264
+ <td align="center">Gemini-2.5-Pro-deepresearch</td>
265
+ <td align="center">6.7</td>
266
+ <td align="center">61.27</td>
267
+ <td align="center">31.13</td>
268
+ <td align="center">7.6</td>
269
+ </tr>
270
+ <tr>
271
+ <td align="center">WebWeaver(Qwen3-30B-A3B)</td>
272
+ <td align="center">4.57</td>
273
+ <td align="center">28.65</td>
274
+ <td align="center">34.9</td>
275
+ <td align="center">36.46</td>
276
+ </tr>
277
+ <tr>
278
+ <td align="center">WebWeaver(Claude-Sonnet-4)</td>
279
+ <td align="center">6.96</td>
280
+ <td align="center">66.86</td>
281
+ <td align="center">10.47</td>
282
+ <td align="center">22.67</td>
283
+ </tr>
284
+ <tr>
285
+ <td align="center">Enterprise-DR(Gemini-2.5-Pro)</td>
286
+ <td align="center">6.82</td>
287
+ <td align="center">71.57</td>
288
+ <td align="center">19.12</td>
289
+ <td align="center">9.31</td>
290
+ </tr>
291
+ <tr>
292
+ <td align="center">RhinoInsight(Gemini-2.5-Pro)</td>
293
+ <td align="center">6.82</td>
294
+ <td align="center">68.51</td>
295
+ <td align="center">11.02</td>
296
+ <td align="center">20.47</td>
297
+ </tr>
298
+ <tr>
299
+ <td align="center">AgentCPM-Report</td>
300
+ <td align="center">6.6</td>
301
+ <td align="center">57.6</td>
302
+ <td align="center">13.73</td>
303
+ <td align="center">28.68</td>
304
+ </tr>
305
+ </tbody>
306
+ </table>
307
+
308
+ Our evaluation datasets include DeepResearch Bench, DeepConsult, and DeepResearch Gym. The writing-time knowledge base includes about 2.7 million [arXiv papers](https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv) and about 200,000 internal webpage summaries.
309
+
310
+ ## Acknowledgements
311
+ This project would not be possible without the support and contributions of the open-source community. During development, we referred to and used multiple excellent open-source frameworks, models, and data resources, including [verl](https://github.com/volcengine/verl), [UltraRAG](https://github.com/OpenBMB/UltraRAG), [MiniCPM4.1](https://github.com/OpenBMB/MiniCPM4.1), and [SurveyGo](https://surveygo.modelbest.cn/).
312
+
313
+ ## Contributions
314
+ Project leads: Yishan Li, Wentong Chen
315
+
316
+ Contributors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
317
+
318
+ Advisors: Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
319
+
320
+ ## Citation
321
+
322
+ If **AgentCPM-Report** is helpful for your research, please cite it as follows:
323
+
324
+ ```bibtex
325
+ @software{AgentCPMReport2026,
326
+ title = {AgentCPM-Report: Gemini-2.5-pro-DeepResearch Level Local DeepResearch},
327
+ author = {Yishan Li and Wentong Chen and Yukun Yan and Mingwei Li and Sen Mei and Xiaorong Wang and Kunpeng Liu and Xin Cong and Shuo Wang and Zhong Zhang and Yaxi Lu and Zhenghao Liu and Yankai Lin and Zhiyuan Liu and Maosong Sun},
328
+ year = {2026},
329
+ url = {https://github.com/OpenBMB/AgentCPM}
330
+ }
331
+ ```
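
Note on the Quick Start section of this README: the text describes choosing between `docker-compose.yml` and `docker-compose.cpu.yml`, then waiting roughly 30 minutes before `http://localhost:5050` shows the UI. A minimal sketch of those steps as shell helpers might look like the following. The function names `compose_file`, `wait_for_ui`, and `start_demo` are hypothetical and do not come from the UltraRAG repository. The sketch also assumes that the CPU compose file serves the UI on the same port and that `curl` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the README's Quick Start commands.
# Run from UltraRAG/agentcpm-report-demo after `cp env.example .env`.

# Choose the compose file: vLLM (GPU) by default, llama.cpp + GGUF for "cpu".
compose_file() {
  if [ "${1:-gpu}" = "cpu" ]; then
    echo "docker-compose.cpu.yml"
  else
    echo "docker-compose.yml"
  fi
}

# Poll the UI until it answers or the attempts run out.
# Args: URL, number of attempts, seconds between attempts.
# The defaults (60 x 30 s) cover the ~30 minute first startup.
wait_for_ui() {
  local url attempts delay i
  url="${1:-http://localhost:5050}"
  attempts="${2:-60}"
  delay="${3:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fs -o /dev/null "$url"; then
      echo "UI is reachable at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "UI not reachable at $url after $attempts attempts" >&2
  return 1
}

# Build and start the stack, then wait for the UI to come up.
start_demo() {
  local file
  file="$(compose_file "${1:-gpu}")"
  docker-compose -f "$file" up -d --build && wait_for_ui
}
```

Under these assumptions, `start_demo` would start the default stack and `start_demo cpu` the llama.cpp variant. Neither runs until called, so the file can be sourced safely.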