Add model card for DIVE-8B-RL

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +67 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ license: other
+ ---
+
+ # DIVE-8B-RL
+
+ DIVE-8B-RL is a tool-using Large Language Model based on the Qwen3-8B architecture. It was fine-tuned using the **DIVE** (**Di**verse, **V**erifiable, and **E**xecutable) recipe, an evidence-driven framework that synthesizes agentic tasks by inverting the synthesis order: executing diverse, real-world tools first and reverse-deriving tasks strictly entailed by the resulting traces.
+
+ Detailed information can be found in the paper [DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use](https://huggingface.co/papers/2603.11076).
+
+ - **Project Page:** [https://sheep333c.github.io/DIVE/](https://sheep333c.github.io/DIVE/)
+ - **Repository:** [https://github.com/sheep333c/DIVE](https://github.com/sheep333c/DIVE)
+ - **Paper:** [arXiv:2603.11076](https://arxiv.org/abs/2603.11076)
+
+ ## Model Description
+
+ Recent work synthesizes agentic tasks for tool-using LLMs, yet robust generalization remains challenging. DIVE traces this to insufficient diversity in synthesized tasks. The DIVE recipe scales structural diversity along two controllable axes (tool-pool coverage and per-task toolset variety) using an Evidence Collection–Task Derivation loop that induces rich multi-step tool-use patterns across 373 tools in five domains.
+
+ Training Qwen3-8B on DIVE data (48k SFT + 3.2k RL) improves performance by +22 points on average across 9 out-of-distribution (OOD) benchmarks and outperforms the strongest 8B baseline by +68. Controlled scaling analysis reveals that diversity scaling consistently outperforms quantity scaling for OOD generalization.
+
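+ ## Installation
+
+ Run these commands from the root of a local clone of the repository linked above, since `pip install -e .` installs the package found in the current directory:
+
+ ```bash
+ conda create -n dive python=3.10
+ conda activate dive
+ pip install -e .
+
+ # Optional: domain-specific tool dependencies
+ pip install -e ".[all-tools]"
+ ```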
+
+ ## Quick Start (CLI)
+
+ To synthesize tasks using the DIVE framework:
+
+ ```bash
+ # Configure API keys and model settings in dive.yaml
+ dive --config dive.yaml synthesize --domain medical --count 10 --workers 4
+ ```
+
+ To run the full end-to-end pipeline (synthesize → solve → verify → aggregate):
+
+ ```bash
+ dive --config dive.yaml end2end \
+   --domain medical \
+   --count 100 \
+   --workers 10 \
+   --batch_size 20
+ ```
+
+ ## Citation
+
+ If our paper or related resources prove valuable to your research, please consider citing:
+
+ ```bibtex
+ @misc{chen2026divescalingdiversityagentic,
+   title={DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use},
+   author={Aili Chen and Chi Zhang and Junteng Liu and Jiangjie Chen and Chengyu Du and Yunji Li and Ming Zhong and Qin Wang and Zhengmao Zhu and Jiayuan Song and Ke Ji and Junxian He and Pengyu Zhao and Yanghua Xiao},
+   year={2026},
+   eprint={2603.11076},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2603.11076},
+ }
+ ```
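
The Model Description names two diversity axes, tool-pool coverage and per-task toolset variety, without defining them. As a rough illustration only, the sketch below shows one way such quantities could be measured over a set of synthesized tasks. The metric definitions, the function names, and the tool names are all assumptions made for this example; they are not taken from the paper or the DIVE repository.

```python
from typing import FrozenSet, List, Set


def tool_pool_coverage(tasks: List[FrozenSet[str]], pool: Set[str]) -> float:
    """Assumed definition: fraction of the tool pool used by at least one task."""
    if not pool:
        return 0.0
    # Union of every task's toolset, restricted to tools that are in the pool.
    used = set().union(*tasks) & pool
    return len(used) / len(pool)


def toolset_variety(tasks: List[FrozenSet[str]]) -> float:
    """Assumed definition: fraction of tasks whose toolset is distinct."""
    if not tasks:
        return 0.0
    return len(set(tasks)) / len(tasks)


# Hypothetical tool pool and per-task toolsets, for illustration only.
pool = {"pubmed_search", "drug_lookup", "stock_quote", "geo_lookup"}
tasks = [
    frozenset({"pubmed_search", "drug_lookup"}),
    frozenset({"pubmed_search", "drug_lookup"}),  # repeats the first toolset
    frozenset({"stock_quote"}),
]

print(tool_pool_coverage(tasks, pool))  # 3 of the 4 pool tools are used
print(toolset_variety(tasks))           # 2 distinct toolsets across 3 tasks
```

Under these assumed definitions, adding more tasks that reuse the same toolset raises the task count but leaves coverage unchanged and lowers variety. This gives a simple picture of the contrast the card draws between quantity scaling and diversity scaling.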