Add model card for DIVE-8B-RL
#1
by nielsr HF Staff - opened
README.md
ADDED
|
@@ -0,0 +1,67 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
library_name: transformers
|
| 3 |
+
pipeline_tag: text-generation
|
| 4 |
+
license: other
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
# DIVE-8B-RL
|
| 8 |
+
|
| 9 |
+
DIVE-8B-RL is a tool-using Large Language Model based on the Qwen3-8B architecture. It was fine-tuned using the **DIVE** (**Di**verse, **V**erifiable, and **E**xecutable) recipe, an evidence-driven framework that synthesizes agentic tasks by inverting the synthesis order: executing diverse, real-world tools first and reverse-deriving tasks strictly entailed by the resulting traces.
|
| 10 |
+
|
| 11 |
+
Detailed information can be found in the paper [DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use](https://huggingface.co/papers/2603.11076).
|
| 12 |
+
|
| 13 |
+
- **Project Page:** [https://sheep333c.github.io/DIVE/](https://sheep333c.github.io/DIVE/)
|
| 14 |
+
- **Repository:** [https://github.com/sheep333c/DIVE](https://github.com/sheep333c/DIVE)
|
| 15 |
+
- **Paper:** [arXiv:2603.11076](https://arxiv.org/abs/2603.11076)
|
| 16 |
+
|
| 17 |
+
## Model Description
|
| 18 |
+
|
| 19 |
+
Recent work synthesizes agentic tasks for tool-using LLMs, yet robust generalization remains challenging. DIVE traces this to insufficient diversity in synthesized tasks. The DIVE recipe scales structural diversity along two controllable axes—tool-pool coverage and per-task toolset variety—using an Evidence Collection–Task Derivation loop that induces rich multi-step tool-use patterns across 373 tools in five domains.
|
| 20 |
+
|
| 21 |
+
Training Qwen3-8B on DIVE data (48k SFT + 3.2k RL) improves performance by +22 average points across 9 OOD benchmarks and outperforms the strongest 8B baseline by +68. Controlled scaling analysis reveals that diversity scaling consistently outperforms quantity scaling for OOD generalization.
|
| 22 |
+
|
| 23 |
+
## Installation
|
| 24 |
+
|
| 25 |
+
```bash
|
| 26 |
+
conda create -n dive python=3.10
|
| 27 |
+
conda activate dive
|
| 28 |
+
pip install -e .
|
| 29 |
+
|
| 30 |
+
# Optional: domain-specific tool dependencies
|
| 31 |
+
pip install -e ".[all-tools]"
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
## Quick Start (CLI)
|
| 35 |
+
|
| 36 |
+
To synthesize tasks using the DIVE framework:
|
| 37 |
+
|
| 38 |
+
```bash
|
| 39 |
+
# Configure API keys and model settings in dive.yaml
|
| 40 |
+
dive --config dive.yaml synthesize --domain medical --count 10 --workers 4
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
To run the full end-to-end pipeline (synthesize → solve → verify → aggregate):
|
| 44 |
+
|
| 45 |
+
```bash
|
| 46 |
+
dive --config dive.yaml end2end \
|
| 47 |
+
--domain medical \
|
| 48 |
+
--count 100 \
|
| 49 |
+
--workers 10 \
|
| 50 |
+
--batch_size 20
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
## Citation
|
| 54 |
+
|
| 55 |
+
If our paper or related resources prove valuable to your research, please consider citing:
|
| 56 |
+
|
| 57 |
+
```bibtex
|
| 58 |
+
@misc{chen2026divescalingdiversityagentic,
|
| 59 |
+
title={DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use},
|
| 60 |
+
author={Aili Chen and Chi Zhang and Junteng Liu and Jiangjie Chen and Chengyu Du and Yunji Li and Ming Zhong and Qin Wang and Zhengmao Zhu and Jiayuan Song and Ke Ji and Junxian He and Pengyu Zhao and Yanghua Xiao},
|
| 61 |
+
year={2026},
|
| 62 |
+
eprint={2603.11076},
|
| 63 |
+
archivePrefix={arXiv},
|
| 64 |
+
primaryClass={cs.AI},
|
| 65 |
+
url={https://arxiv.org/abs/2603.11076},
|
| 66 |
+
}
|
| 67 |
+
```
|