Update README.md
# Overview

ToolMind-Web-3B is a specialized lightweight agent built on top of the [**Nanbeige4-3B-Thinking-2511**](https://huggingface.co/Nanbeige/Nanbeige4-3B-Thinking-2511) foundation model.
Following extensive **SFT (Supervised Fine-Tuning)** and **RL (Reinforcement Learning)** focused on search behaviors,
the model attains leading performance among small-scale models on multiple long-horizon leaderboards such as **Xbench-Deepsearch, HLE, and GAIA**, enabling reliable execution of up to hundreds of consecutive tool invocations.
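The long-horizon behavior described above can be pictured as a bounded tool-invocation loop. The sketch below is purely illustrative: the message format, `fake_model` stand-in, and `search` tool are invented placeholders, not the actual ToolMind-Web-3B interface.

```python
# Minimal sketch of a multi-turn tool loop in the spirit of the agent's
# long-horizon search behavior. All names here (fake_model, TOOLS, the
# message schema) are hypothetical placeholders for illustration only.

def fake_model(messages):
    """Stand-in for a model call: first requests a search, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search",
                              "arguments": {"query": "capital of France"}}}
    return {"content": "Paris"}

TOOLS = {
    "search": lambda query: f"Top result for {query!r}: Paris is the capital of France.",
}

def run_agent(question, max_turns=100):
    """Loop until the model emits a final answer or the turn budget runs out."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):            # bounded long-horizon loop
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:                  # no tool requested: final answer
            return reply["content"]
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("turn budget exhausted")

print(run_agent("What is the capital of France?"))  # → Paris
```

A real deployment would replace `fake_model` with a call to the served model and `TOOLS` with actual search/browse tools; the loop structure itself is the part the README's "hundreds of consecutive tool invocations" claim refers to.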
1. **Strong Performance at Compact Scale**

ToolMind-Web-3B delivers high-quality long-horizon reasoning and tool-augmented search capabilities while maintaining a lightweight 3B-parameter footprint. Despite its compact size, it achieves competitive performance across multiple benchmarks such as **Xbench-Deepsearch, GAIA, and HLE**. The model is evaluated under the **[MiroThinker workflow](https://github.com/MiroMindAI/MiroThinker)**, ensuring standardized and reproducible assessment.
2. **An Open-Source Complex QA Dataset Synthesized from Wikipedia Entity–Relation Knowledge Graphs**
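One way to read "synthesized from entity–relation knowledge graphs" is that multi-hop questions are composed by chaining relations between entities. The toy sketch below shows that idea on invented triples and templates; it is not the dataset's actual construction pipeline.

```python
# Toy sketch of composing a multi-hop question from entity-relation triples,
# in the spirit of a Wikipedia-KG-based QA synthesis. The triples, relation
# templates, and compose_question helper are all invented for illustration.

TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
]

TEMPLATES = {
    "born_in": "the city where {} was born",
    "capital_of": "the country {} is the capital of",
}

def compose_question(start, hops):
    """Walk `hops` relations from `start`, nesting one template per hop;
    return the composed question and the gold answer entity."""
    phrase, entity = start, start
    for rel_wanted in hops:
        for head, rel, tail in TRIPLES:
            if head == entity and rel == rel_wanted:
                phrase = TEMPLATES[rel].format(phrase)
                entity = tail
                break
    return f"What is {phrase}?", entity

q, a = compose_question("Marie Curie", ["born_in", "capital_of"])
print(q)  # → What is the country the city where Marie Curie was born is the capital of?
print(a)  # → Poland
```

Chaining two relations already yields a question whose answer requires two retrieval steps; longer hop chains over a large graph like Wikipedia's give progressively harder long-horizon queries.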
| MiroThinker 8B | 0.664 | 0.311 | 0.402 | 0.215 | 0.404 | 0.606 | | / |
| AgentCPM-Explore 4B | 0.639 | 0.25 | 0.29 | 0.191 | 0.4 | 0.7 | / | / |
| **Ours** | | | | | | | | |
| **ToolMind-Web-3B (w/ Synthetic QA only)** | 0.583 | 0.144 | 0.301 | 0.224 | 0.36 | 0.76 | 0.3 | 0.308 |
| **ToolMind-Web-3B** | 0.670 | 0.174 | 0.308 | 0.248 | 0.477 | 0.751 | 0.37 | 0.458 |