Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
This model is trained using the SPARK framework proposed in the paper:
SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
Paper: arXiv:2601.20209
SPARK is a reinforcement learning framework that enables autonomous strategic exploration for long-horizon agentic tasks. Instead of exploring uniformly at every step, SPARK branches selectively at critical decision points using intrinsic `<explore>` signals. According to the paper, this yields better performance with significantly fewer training samples.
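The selective branching idea can be illustrated with a small toy sketch. This is not the paper's implementation: the tree expansion, the `expand_tree` and `toy_policy` names, the `explore_width` and `max_leaves` parameters, and the rule "branch only when the sampled step contains `<explore>`" are all assumptions made for illustration. The actual branching criterion, budget handling, and training objective are defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Signal named in the model card; its exact position in the output is an assumption.
EXPLORE_TAG = "<explore>"


@dataclass
class Node:
    """One partial trajectory: the policy outputs produced so far."""
    history: List[str] = field(default_factory=list)


# Hypothetical policy interface: (history, sample index) -> text of the next step.
Policy = Callable[[List[str], int], str]


def expand_tree(policy: Policy, horizon: int,
                explore_width: int = 3, max_leaves: int = 16) -> List[Node]:
    """Grow trajectories step by step, branching only where the policy asks to explore."""
    frontier = [Node()]
    for _ in range(horizon):
        next_frontier: List[Node] = []
        for i, node in enumerate(frontier):
            first = policy(node.history, 0)
            # Branch only if the first sample carries the explore signal.
            width = explore_width if EXPLORE_TAG in first else 1
            # Keep at least one child per node while respecting the leaf budget.
            remaining = len(frontier) - i - 1
            room = max_leaves - len(next_frontier) - remaining
            width = max(1, min(width, room))
            outputs = [first] + [policy(node.history, s) for s in range(1, width)]
            next_frontier.extend(Node(node.history + [o]) for o in outputs)
        frontier = next_frontier
    return frontier


def toy_policy(history: List[str], sample_idx: int) -> str:
    """Deterministic stand-in for a language model: signals exploration at step 1 only."""
    if len(history) == 1:
        return f"{EXPLORE_TAG} option {sample_idx}"
    return f"act {len(history)}"


leaves = expand_tree(toy_policy, horizon=3)
print(len(leaves))  # 3 trajectories, versus 27 if every step branched 3 ways
```

In this toy setting, only the step that emits `<explore>` fans out, so the number of trajectories grows with the number of flagged decision points rather than with the horizon.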
| Benchmark | SPARK-1.5B | GPT-5 | Gemini-2.5-Pro |
|---|---|---|---|
| ALFWorld L2 | 80.5% | 63.3% | 55.5% |
| ScienceWorld L2 | 49.2% | 33.6% | 30.5% |
| WebShop | 75.8% | 29.7% | 32.0% |
If you use this model or the SPARK framework in your research, please cite:
@article{wu2026spark,
title={SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning},
author={Wu, Jinyang and Yang, Shuo and Yang, Changpeng and Shen, Yuhao and Zhang, Shuai and Wen, Zhengqi and Tao, Jianhua},
journal={arXiv preprint arXiv:2601.20209},
year={2026}
}