---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---

## Latest News

* [2026-01-12] 🚀🚀🚀 We have open-sourced **AgentCPM-Explore**, an agent foundation model with only **4B parameters**, together with its **entire training and inference infrastructure**. AgentCPM-Explore ranks on **8 classic long-horizon agent benchmarks**, including **GAIA, HLE, and BrowseComp**. It achieves **SOTA performance at the same parameter scale** and demonstrates **accurate deep research capabilities**, effectively breaking the performance bottleneck for **on-device agents**.

## Overview

Key highlights of AgentCPM-Explore include:

- The **first full-parameter 4B agent model** to rank on **8 long-horizon and complex agent benchmarks**, including **GAIA, HLE, and BrowseComp**, in the on-device setting.
- Capable of **over 100 rounds of continuous environment interaction**, supporting **multi-source information cross-validation**, **dynamic search strategy adjustment**, and **real-time verification of up-to-date information**, enabling sustained deep exploration until task completion.
- **Fully open-sourced end-to-end**, including (1) **AgentRL**, a fully asynchronous reinforcement learning framework for agent training, (2) **AgentDock**, a unified management and scheduling platform for tool sandboxes, and (3) **AgentToLeaP**, a one-click evaluation platform for agent tool-learning capabilities. These components collectively support **community collaboration and custom extensibility**.

We elaborate on the entire construction pipeline of AgentCPM-Explore on [GitHub](https://github.com/OpenBMB/AgentCPM).

## Experimental Results

| Model | GAIA (text-only) | BrowseComp | BrowseComp (ZH) | HLE | Frames | WebWalker | Seal-0 | Xbench-DeepSearch |
|---|---|---|---|---|---|---|---|---|
| **Closed-Source Models** | | | | | | | | |
| Claude-4.5-sonnet | 71.2% | 19.6% | 40.8% | 24.5% | 85.0% | / | 53.4% | 66.0% |
| Gemini Deep Research | / | / | / | 26.9% | / | / | / | / |
| DeepSeek-V3.2 | 63.5% | 67.6% | 65.0% | 40.8% | 80.2% | / | 38.5% | 71.0% |
| MiniMax-M2 | 75.7% | 44.0% | 48.5% | 31.8% | / | / | / | 72.0% |
| OpenAI-GPT-5-high | 76.4% | 54.9% | 65.0% | 35.2% | / | / | 51.4% | 77.8% |
| GLM-4.6 | 71.9% | 45.1% | 49.5% | 30.4% | / | / | / | 70.0% |
| Kimi-Researcher | / | / | / | 26.9% | 78.8% | / | 36.0% | 69.0% |
| Seed-1.8 | 87.4% | 67.6% | 81.3% | 40.9% | / | / | / | / |
| **Open-Source Models** | | | | | | | | |
| MiroThinker 8B | 66.4% | 31.1% | 40.2% | 21.5% | 80.6% | 60.6% | 40.4% | 60.6% |
| Tongyi DeepResearch 30B | 70.9% | 43.4% | 46.7% | 32.9% | 90.6% | 72.2% | / | 75.0% |
| ASearcher QWQ 32B v2 | 58.7% | / | / | / | 74.5% | / | / | 51.1% |
| iterresearch-30B-A3B | 72.8% | 37.3% | 45.2% | 28.8% | 71.0% | / | 39.6% | / |
| WebSailor-V2-30B-A3B (RL) | 74.1% | 35.3% | 44.1% | 30.6% | / | / | / | 73.7% |
| WebLeaper-30B-A3B-RUC | 73.2% | 38.8% | / | / | / | / | 48.6% | 72.0% |
| WebDancer (QWQ-32B) | 51.5% | 3.8% | 18.0% | / | / | 47.9% | / | 38.3% |
| **AgentCPM-Explore 4B** | 63.9% | 25.0% | 29.0% | 19.1% | 82.7% | 68.1% | 40.0% | 70.0% |
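
Because many cells in the table above are `/` (not reported), per-benchmark comparisons are only meaningful among the models that report a score. The following is a minimal sketch of one way to read the open-source rows. It assumes `/` means "not reported" and that higher is better on every benchmark. The helper name `rank_among_reported` is hypothetical and is not part of the AgentCPM tooling; the numbers are copied from the table.

```python
# Scores copied from the open-source rows of the table; None stands for "/" (assumed: not reported).
BENCHMARKS = ["GAIA (text-only)", "BrowseComp", "BrowseComp (ZH)", "HLE",
              "Frames", "WebWalker", "Seal-0", "Xbench-DeepSearch"]

OPEN_SOURCE = {
    "MiroThinker 8B":            [66.4, 31.1, 40.2, 21.5, 80.6, 60.6, 40.4, 60.6],
    "Tongyi DeepResearch 30B":   [70.9, 43.4, 46.7, 32.9, 90.6, 72.2, None, 75.0],
    "ASearcher QWQ 32B v2":      [58.7, None, None, None, 74.5, None, None, 51.1],
    "iterresearch-30B-A3B":      [72.8, 37.3, 45.2, 28.8, 71.0, None, 39.6, None],
    "WebSailor-V2-30B-A3B (RL)": [74.1, 35.3, 44.1, 30.6, None, None, None, 73.7],
    "WebLeaper-30B-A3B-RUC":     [73.2, 38.8, None, None, None, None, 48.6, 72.0],
    "WebDancer (QWQ-32B)":       [51.5, 3.8, 18.0, None, None, 47.9, None, 38.3],
    "AgentCPM-Explore 4B":       [63.9, 25.0, 29.0, 19.1, 82.7, 68.1, 40.0, 70.0],
}


def rank_among_reported(model, scores=OPEN_SOURCE, benchmarks=BENCHMARKS):
    """Hypothetical helper: return {benchmark: (rank, n_reported)} for `model`.

    Models without a reported score on a benchmark are left out of that
    benchmark's ranking. Rank 1 is the highest score; ties share the better rank.
    """
    result = {}
    for i, name in enumerate(benchmarks):
        own = scores[model][i]
        if own is None:
            continue  # nothing to rank if the model itself has no score
        reported = [row[i] for row in scores.values() if row[i] is not None]
        rank = 1 + sum(1 for s in reported if s > own)
        result[name] = (rank, len(reported))
    return result


ranks = rank_among_reported("AgentCPM-Explore 4B")
for name, (rank, n) in ranks.items():
    print(f"{name}: {rank} of {n} reporting open-source models")
```

Under these assumptions, the 4B model places second among the reporting open-source models on Frames and WebWalker, even though every other model in that group is 8B or larger.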