license: apache-2.0

AgentCPM-Report: Gemini-2.5-pro-DeepResearch Level Local DeepResearch

Links

News

  • [2026-01-20] 🚀🚀🚀 We open-sourced AgentCPM-Report, built on MiniCPM4.1-8B and capable of matching top closed-source commercial systems such as Gemini-2.5-pro-DeepResearch in report generation.

Overview

AgentCPM-Report is an open-source large language model agent jointly developed by THUNLP, Renmin University of China (RUCBM), and ModelBest. It is built on MiniCPM4.1, an 8B-parameter base model. Given a user instruction, it autonomously generates a long-form report. Key highlights:

  • Strong advantages in insight and comprehensiveness: The first 8B edge-side model to surpass closed-source DeepResearch systems on deep research report generation tasks, redefining the performance ceiling for small-scale agent systems, with SOTA results on the Insight metric in particular.
  • Lightweight and local deployment: Supports agile local deployment. With frameworks like UltraRAG, it enables large-scale knowledge base construction and can generate reports that are even more professional and in-depth than those from large models. A lightweight model plus a local knowledge base makes it feasible to deploy a deep-research report writing system on a personal computer, laying the foundation for report writing based on personal privacy data or private-domain data.

Demo Cases

YouTube link or Bilibili link for the video

Quick Start

Docker Deployment

We provide a minimal one-click docker-compose deployment integrated with UltraRAG. It bundles the RAG framework UltraRAG2.0, the model inference framework vLLM, and the vector database Milvus. For CPU inference, we also provide a llama.cpp-based version for GGUF models: use docker-compose.cpu.yml in place of docker-compose.yml.

git clone git@github.com:OpenBMB/UltraRAG.git
cd UltraRAG
git checkout agentcpm-report-demo
cd agentcpm-report-demo
cp env.example .env
docker-compose -f docker-compose.yml up -d --build
docker-compose -f docker-compose.yml logs -f ultrarag-ui

The first startup pulls images, downloads the model, and configures the environment, which takes about 30 minutes. Then open http://localhost:5050. If you can see the UI, your deployment is successful. Follow the UI instructions to upload local files, chunk them, and build indexes; then in the Chat section, select AgentCPM-Report in the pipeline to start your workflow.
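The deployment steps above can also be scripted. The following is a minimal sketch, not part of the official tooling: the helper names `compose_command`, `wait_for_ui`, and `deploy` are hypothetical, the 45-minute timeout is an assumption derived from the "about 30 minutes" estimate, and the check assumes the UI answers a plain GET on http://localhost:5050 with HTTP 200.

```python
import subprocess
import time
import urllib.request


def compose_command(action, cpu=False):
    """Build the docker-compose argv for the GPU (default) or CPU stack.

    The two file names come from the README; everything else is illustrative.
    """
    compose_file = "docker-compose.cpu.yml" if cpu else "docker-compose.yml"
    return ["docker-compose", "-f", compose_file, *action]


def wait_for_ui(url="http://localhost:5050", timeout=2700.0, interval=15.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed.

    Returns True once the UI responds, False if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError and connection errors subclass OSError:
            # the server is not reachable yet, so keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def deploy(cpu=False):
    """Start the stack in the background, then wait for the UI (hypothetical helper)."""
    subprocess.run(compose_command(["up", "-d", "--build"], cpu=cpu), check=True)
    return wait_for_ui()
```

Calling `deploy(cpu=True)` from the agentcpm-report-demo directory would start the llama.cpp-based stack instead of the vLLM one and report whether the UI came up within the timeout.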

(Optional) You can import Wiki2024 as the writing database.

You can read more tutorials about AgentCPM-Report in the documentation.

Evaluation

| DeepResearch Bench | Overall | Comprehensiveness | Insight | Instruction Following | Readability |
| --- | --- | --- | --- | --- | --- |
| Doubao-research | 44.34 | 44.84 | 40.56 | 47.95 | 44.69 |
| Claude-research | 45 | 45.34 | 42.79 | 47.58 | 44.66 |
| OpenAI-deepresearch | 46.45 | 46.46 | 43.73 | 49.39 | 47.22 |
| Gemini-2.5-Pro-deepresearch | 49.71 | 49.51 | 49.45 | 50.12 | 50 |
| WebWeaver (Qwen3-30B-A3B) | 46.77 | 45.15 | 45.78 | 49.21 | 47.34 |
| WebWeaver (Claude-Sonnet-4) | 50.58 | 51.45 | 50.02 | 50.81 | 49.79 |
| Enterprise-DR (Gemini-2.5-Pro) | 49.86 | 49.01 | 50.28 | 50.03 | 49.98 |
| RhinoInsight (Gemini-2.5-Pro) | 50.92 | 50.51 | 51.45 | 51.72 | 50 |
| AgentCPM-Report | 50.11 | 50.54 | 52.64 | 48.87 | 44.17 |

| DeepResearch Gym | Avg. | Clarity | Depth | Balance | Breadth | Support | Insightfulness |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Doubao-research | 84.46 | 68.85 | 93.12 | 83.96 | 93.33 | 84.38 | 83.12 |
| Claude-research | 80.25 | 86.67 | 96.88 | 84.41 | 96.56 | 26.77 | 90.22 |
| OpenAI-deepresearch | 91.27 | 84.90 | 98.10 | 89.80 | 97.40 | 88.40 | 89.00 |
| Gemini-2.5-Pro-deepresearch | 96.02 | 90.71 | 99.90 | 93.37 | 99.69 | 95.00 | 97.45 |
| WebWeaver (Qwen3-30B-A3B) | 77.27 | 71.88 | 85.51 | 75.80 | 84.78 | 63.77 | 81.88 |
| WebWeaver (Claude-Sonnet-4) | 96.77 | 90.50 | 99.87 | 94.30 | 100.00 | 98.73 | 97.22 |
| AgentCPM-Report | 98.48 | 95.1 | 100.0 | 98.5 | 100.0 | 97.3 | 100.0 |

| DeepConsult | Avg. | Win | Tie | Lose |
| --- | --- | --- | --- | --- |
| Doubao-research | 5.42 | 29.95 | 40.35 | 29.7 |
| Claude-research | 4.6 | 25 | 38.89 | 36.11 |
| OpenAI-deepresearch | 5 | 0 | 100 | 0 |
| Gemini-2.5-Pro-deepresearch | 6.7 | 61.27 | 31.13 | 7.6 |
| WebWeaver (Qwen3-30B-A3B) | 4.57 | 28.65 | 34.9 | 36.46 |
| WebWeaver (Claude-Sonnet-4) | 6.96 | 66.86 | 10.47 | 22.67 |
| Enterprise-DR (Gemini-2.5-Pro) | 6.82 | 71.57 | 19.12 | 9.31 |
| RhinoInsight (Gemini-2.5-Pro) | 6.82 | 68.51 | 11.02 | 20.47 |
| AgentCPM-Report | 6.6 | 57.6 | 13.73 | 28.68 |

Our evaluation datasets include DeepResearch Bench, DeepConsult, and DeepResearch Gym. The writing-time knowledge base includes about 2.7 million arXiv papers and about 200,000 internal webpage summaries.
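The per-metric standing of each system can be read off the DeepResearch Bench figures reported above. The snippet below is a small illustrative sketch, not part of the evaluation code: the scores are copied from that table, and the helper names `leader` and `rank` are hypothetical.

```python
# DeepResearch Bench scores copied from the table above.
METRICS = ("Overall", "Comprehensiveness", "Insight", "Instruction Following", "Readability")

SCORES = {
    "Doubao-research": (44.34, 44.84, 40.56, 47.95, 44.69),
    "Claude-research": (45, 45.34, 42.79, 47.58, 44.66),
    "OpenAI-deepresearch": (46.45, 46.46, 43.73, 49.39, 47.22),
    "Gemini-2.5-Pro-deepresearch": (49.71, 49.51, 49.45, 50.12, 50),
    "WebWeaver (Qwen3-30B-A3B)": (46.77, 45.15, 45.78, 49.21, 47.34),
    "WebWeaver (Claude-Sonnet-4)": (50.58, 51.45, 50.02, 50.81, 49.79),
    "Enterprise-DR (Gemini-2.5-Pro)": (49.86, 49.01, 50.28, 50.03, 49.98),
    "RhinoInsight (Gemini-2.5-Pro)": (50.92, 50.51, 51.45, 51.72, 50),
    "AgentCPM-Report": (50.11, 50.54, 52.64, 48.87, 44.17),
}


def leader(metric):
    """Return the system with the highest score on `metric` (first one on ties)."""
    col = METRICS.index(metric)
    return max(SCORES, key=lambda name: SCORES[name][col])


def rank(system, metric):
    """Return the 1-based position of `system` when sorted by `metric`, best first."""
    col = METRICS.index(metric)
    ordered = sorted(SCORES, key=lambda name: SCORES[name][col], reverse=True)
    return ordered.index(system) + 1


for metric in METRICS:
    print(f"{metric}: leader={leader(metric)}, AgentCPM-Report rank={rank('AgentCPM-Report', metric)}")
```

On these numbers, AgentCPM-Report leads on Insight, places third on Overall, and places last on Readability.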

Acknowledgements

This project would not be possible without the support and contributions of the open-source community. During development, we referred to and used multiple excellent open-source frameworks, models, and data resources, including verl, UltraRAG, MiniCPM4.1, and SurveyGo.

Contributions

Project leads: Yishan Li, Wentong Chen

Contributors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

Advisors: Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun

Citation

If AgentCPM-Report is helpful for your research, please cite it as follows:

@software{AgentCPMReport2026,
  title  = {AgentCPM-Report: Gemini-2.5-pro-DeepResearch Level Local DeepResearch},
  author = {Yishan Li and Wentong Chen and Yukun Yan and Mingwei Li and Sen Mei and Xiaorong Wang and Kunpeng Liu and Xin Cong and Shuo Wang and Zhong Zhang and Yaxi Lu and Zhenghao Liu and Yankai Lin and Zhiyuan Liu and Maosong Sun},
  year   = {2026},
  url    = {https://github.com/OpenBMB/AgentCPM}
}