Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
This model is trained using the SPARK framework proposed in the paper:
SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
Paper: arXiv:2601.20209
SPARK is a reinforcement learning framework for autonomous, strategic exploration in long-horizon agentic tasks. Rather than exploring uniformly at every step, SPARK branches only at critical decision points, which are identified through intrinsic `<explore>` signals. The paper reports stronger performance with significantly fewer training samples.
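To make the idea of selective branching concrete, here is a minimal toy sketch. It is not the SPARK training algorithm from the paper. It only illustrates how branching solely at steps whose output carries an `<explore>` tag shrinks the number of rollouts compared with branching at every step. The names `toy_policy`, `uniform_policy`, and `rollout`, the horizon, and the branch factor are all assumptions made for illustration.

```python
import random
from typing import Callable, List

EXPLORE_TAG = "<explore>"  # signal name taken from the description above

# Hypothetical policy type: maps the steps so far to the next step's text.
Policy = Callable[[List[str], random.Random], str]


def toy_policy(prefix: List[str], rng: random.Random) -> str:
    """Toy stand-in for a policy: flags only step 2 as a critical decision."""
    step = len(prefix)
    tag = EXPLORE_TAG if step == 2 else ""
    return f"{tag}action_{step}_{rng.randint(0, 9)}"


def uniform_policy(prefix: List[str], rng: random.Random) -> str:
    """Toy baseline: flags every step, so every step branches."""
    return f"{EXPLORE_TAG}action_{len(prefix)}_{rng.randint(0, 9)}"


def rollout(policy: Policy, prefix: List[str], horizon: int,
            branch_factor: int, rng: random.Random) -> List[List[str]]:
    """Grow trajectories from `prefix`, branching only where the tag appears."""
    if len(prefix) == horizon:
        return [prefix]
    first = policy(prefix, rng)
    n_children = branch_factor if EXPLORE_TAG in first else 1
    trajectories: List[List[str]] = []
    for i in range(n_children):
        # The first child reuses the sampled step; the others resample it.
        step_text = first if i == 0 else policy(prefix, rng)
        trajectories.extend(
            rollout(policy, prefix + [step_text], horizon, branch_factor, rng)
        )
    return trajectories


rng = random.Random(0)
selective = rollout(toy_policy, [], horizon=5, branch_factor=3, rng=rng)
uniform = rollout(uniform_policy, [], horizon=5, branch_factor=3, rng=rng)
print(len(selective), len(uniform))
```

With these toy settings, selective branching produces 3 trajectories while branching at every step produces 3^5 = 243, which is the kind of sample saving the description refers to.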
| Benchmark | SPARK-1.5B | GPT-5 | Gemini-2.5-Pro |
|---|---|---|---|
| ALFWorld L2 | 80.5% | 63.3% | 55.5% |
| ScienceWorld L2 | 49.2% | 33.6% | 30.5% |
| WebShop | 75.8% | 29.7% | 32.0% |
The following example shows how to run inference with the Hugging Face transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jinyang23/Spark-1.5B-ALFWorld"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Calculate the sum of 123 and 456. Provide only the numerical answer."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
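The example above is a single-turn call, whereas long-horizon agentic tasks alternate between model actions and environment observations. The sketch below shows one way such a loop could be structured. It is an assumption-laden illustration: `ToyEnv` is a hypothetical stand-in and not the ALFWorld API, the observation strings and system prompt are invented, and `scripted_generate` replaces the real model so the snippet runs without downloading weights. In practice, `generate` would wrap the `apply_chat_template` and `model.generate` calls shown above.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


class ToyEnv:
    """Hypothetical text environment used only for illustration."""

    def __init__(self, goal_action: str, max_steps: int = 5) -> None:
        self.goal_action = goal_action
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return "You are in a room with a drawer. Your task: open the drawer."

    def step(self, action: str) -> Tuple[str, bool]:
        self.steps += 1
        solved = action.strip() == self.goal_action
        done = solved or self.steps >= self.max_steps
        return ("Task complete." if solved else "Nothing happens."), done


def run_episode(env: ToyEnv,
                generate: Callable[[List[Message]], str],
                system_prompt: str) -> List[Message]:
    """Alternate model actions and environment observations until done."""
    messages: List[Message] = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": env.reset()},
    ]
    done = False
    while not done:
        action = generate(messages)  # the model proposes the next action
        messages.append({"role": "assistant", "content": action})
        observation, done = env.step(action)
        messages.append({"role": "user", "content": observation})
    return messages


def scripted_generate(messages: List[Message]) -> str:
    """Stand-in for the model: looks around first, then opens the drawer."""
    turns = sum(m["role"] == "assistant" for m in messages)
    return ["look", "open drawer"][min(turns, 1)]


history = run_episode(
    ToyEnv(goal_action="open drawer"),
    scripted_generate,
    system_prompt="You are an agent acting in a text environment.",
)
for message in history:
    print(f'{message["role"]}: {message["content"]}')
```

Keeping the full message history in `messages` lets each new action be conditioned on all earlier observations, at the cost of a growing prompt length over long episodes.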
If you use this model or the SPARK framework in your research, please cite:
@article{wu2026spark,
  title={SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning},
  author={Wu, Jinyang and Yang, Shuo and Yang, Changpeng and Shen, Yuhao and Zhang, Shuai and Wen, Zhengqi and Tao, Jianhua},
  journal={arXiv preprint arXiv:2601.20209},
  year={2026}
}