## Overview
ToolMind-web-3B is a specialized lightweight agent built on top of the Nanbeige4-3B-Thinking-2511 foundation model. After extensive supervised fine-tuning (SFT) and reinforcement learning (RL) focused on search behaviors, the model attains leading performance among small-scale models on multiple long-horizon benchmarks, including Xbench-Deepsearch, HLE, and GAIA, and can reliably execute up to hundreds of consecutive tool invocations.
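The long-horizon behavior described above — many consecutive tool calls before a final answer — can be sketched as a simple agent loop. This is a minimal illustration with mock stand-ins for the model and tools; the function names (`call_model`, `run_tool`, `agent_loop`) and message format are hypothetical, not the actual ToolMind-web-3B interface.

```python
# Minimal sketch of a long-horizon search-agent loop.
# `call_model` and `run_tool` are mock stand-ins for demonstration only.

def call_model(history):
    """Stand-in for the model: returns either a tool call or a final answer.
    Here we script two tool calls followed by an answer."""
    n_tool_calls = sum(1 for turn in history if turn["role"] == "tool")
    if n_tool_calls == 0:
        return {"type": "tool_call", "tool": "search", "args": {"query": "example"}}
    if n_tool_calls == 1:
        return {"type": "tool_call", "tool": "open_page", "args": {"url": "https://example.org"}}
    return {"type": "answer", "text": "final answer"}

def run_tool(name, args):
    """Stand-in tool executor; a real agent would hit a search/browse backend."""
    return f"result of {name}({args})"

def agent_loop(question, max_turns=200):
    """Iterate model -> tool -> observation until the model emits an answer.
    A generous max_turns budget allows hundreds of consecutive tool calls."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        step = call_model(history)
        if step["type"] == "answer":
            return step["text"], history
        observation = run_tool(step["tool"], step["args"])
        history.append({"role": "tool", "content": observation})
    return None, history  # turn budget exhausted without an answer
```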
## Highlights
- **Strong Performance at Compact Scale**
ToolMind-web-3B delivers high-quality long-horizon reasoning and tool-augmented search while maintaining a lightweight 3B-parameter footprint. Despite its compact size, it achieves competitive results across multiple benchmarks, including Xbench-Deepsearch, GAIA, and HLE. The model is evaluated under the MiroThinker workflow, ensuring standardized and reproducible assessment.
- **An Open-Source Complex QA Dataset Synthesized from Wikipedia Entity–Relation Knowledge Graphs**
We provide a rich, structured QA dataset derived from Wikipedia entity–relation knowledge graphs, designed to support supervised fine-tuning and reinforcement learning of search-augmented agents.
- **Turn-Level-Judge-Guided SFT and Reinforcement Learning**
In the supervised fine-tuning (SFT) stage, a turn-level judge selects which interaction turns are used for training. In the reinforcement learning (RL) stage, a turn-level reward provides feedback that refines the model's multi-turn search and tool-invocation behavior.
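The turn-level judging idea above can be sketched as follows. The judge here is a trivial heuristic stand-in, and all function names (`judge_turn`, `select_sft_turns`, `turn_level_rewards`) and trajectory fields are illustrative assumptions; the actual judge model and reward shaping have not yet been released.

```python
# Sketch of turn-level judging: filter turns for SFT, score turns for RL.
# The scoring rule is a placeholder heuristic, not the released judge.

def judge_turn(turn):
    """Return a score in [0, 1] for one interaction turn (illustrative heuristic)."""
    if turn.get("tool_error"):
        return 0.0
    return 1.0 if turn.get("useful_evidence") else 0.3

def select_sft_turns(trajectory, threshold=0.5):
    """SFT stage: keep only the turns the judge approves for training."""
    return [t for t in trajectory if judge_turn(t) >= threshold]

def turn_level_rewards(trajectory):
    """RL stage: per-turn feedback instead of a single trajectory-level reward."""
    return [judge_turn(t) for t in trajectory]

# Toy trajectory: one productive search, one failed tool call, one productive search.
trajectory = [
    {"action": "search", "useful_evidence": True},
    {"action": "open_page", "tool_error": True},
    {"action": "search", "useful_evidence": True},
]
```

Per-turn rewards give the policy denser credit assignment than a single end-of-trajectory score, which matters when trajectories span hundreds of tool calls.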
Note: Training data, the full technical report, and evaluation details will be released soon.
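The knowledge-graph QA synthesis described in the highlights can be sketched with toy triples. The triples, templates, and `synthesize_two_hop` function below are illustrative assumptions, not the recipe used for the released dataset.

```python
# Sketch of multi-hop QA synthesis from an entity-relation knowledge graph.
# Toy (head, relation, tail) triples stand in for facts extracted from Wikipedia.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
]

def synthesize_two_hop(triples):
    """Chain two triples that share a bridge entity into one multi-hop question."""
    questions = []
    for h1, r1, t1 in triples:
        for h2, r2, t2 in triples:
            if t1 == h2:  # tail of fact 1 is the head of fact 2: a bridge entity
                q = (f"Consider the entity that '{h1}' has relation '{r1}' to: "
                     f"what does that entity have relation '{r2}' to?")
                questions.append({"question": q, "answer": t2, "bridge": t1})
    return questions
```

Because the bridge entity is never named in the question, answering it requires at least two search hops, which is what makes such synthetic data useful for training search-augmented agents.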
## Benchmark Results
| Model | GAIA | BrowseComp | BrowseComp-zh | HLE | Seal-0 | Xbench-Deepsearch | Xbench-Deepsearch-10 | DSQA |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-V3.2 | 0.635 | 0.676 | 0.650 | 0.408 | 0.385 | 0.710 | / | / |
| MiniMax-M2 | 0.757 | 0.440 | 0.485 | 0.318 | / | 0.720 | / | / |
| GLM-4.6 | 0.719 | 0.451 | 0.495 | 0.304 | / | 0.700 | / | / |
| MiroThinker 8B | 0.664 | 0.311 | 0.402 | 0.215 | 0.404 | 0.606 | / | / |
| AgentCPM-Explore 4B | 0.639 | 0.250 | 0.290 | 0.191 | 0.400 | 0.700 | / | / |
| **Ours** | | | | | | | | |
| ToolMind-web-3B (w/ synthetic QA only) | 0.583 | 0.144 | 0.301 | 0.224 | 0.360 | 0.760 | 0.300 | 0.308 |
| ToolMind-web-3B | 0.670 | 0.174 | 0.308 | 0.248 | 0.477 | 0.751 | 0.370 | 0.458 |

Entries marked "/" are not reported.
## Limitations
Although we place great emphasis on model safety during training and strive to ensure that outputs align with ethical and legal requirements, the model's small size and probabilistic nature mean it cannot completely avoid generating unexpected outputs. These outputs may include harmful content such as bias or discrimination. Please do not propagate such content. We assume no responsibility for consequences resulting from the dissemination of inappropriate information.
## Citation
If you find our model useful or use it in your projects, please cite this project.
## Contact
If you have any questions, please raise an issue or contact us at nanbeige@126.com.