hzeng / syn-plan-research-4B

+---
+pipeline_tag: text-generation
+library_name: transformers
+---
+# SynPlanResearch-R1-8B
+SynPlanResearch-R1 is a framework designed to improve the exploration behavior of research agents. It synthesizes tool-use trajectories that encourage deeper investigation and uses them for cold-start supervised fine-tuning (SFT), which provides a robust initialization for subsequent reinforcement learning (RL). This checkpoint is based on the Qwen3-8B backbone.
+## Resources
+- **Paper**: [SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans](https://huggingface.co/papers/2603.07853)
+- **Repository**: [https://github.com/HansiZeng/syn-plan-research](https://github.com/HansiZeng/syn-plan-research)
+## Description
+Research agents answer user queries by gathering information from the web with tools, which requires them to dynamically interleave internal reasoning with tool use. While such capabilities can be learned via reinforcement learning with verifiable rewards (RLVR), agents often exhibit poor exploration behaviors, including premature termination and biased tool usage.
+SynPlanResearch-R1 addresses these challenges by synthesizing trajectories that steer the model toward more comprehensive exploration. Across seven multi-hop and open-web benchmarks, it improves performance over state-of-the-art baselines by up to 6.0% with the Qwen3-8B backbone and by up to 5.8% with the Qwen3-4B backbone.
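+To make the interleaving of reasoning and tool use described above more concrete, here is a minimal, hypothetical Python sketch of a generic research-agent loop. It is not the SynPlanResearch-R1 implementation or its prompt format: the `("search", query)` / `("answer", text)` action tuples, `run_agent`, `Trace`, the toy `toy_search` tool, and both policies are illustrative assumptions. `eager_policy` mimics the premature-termination behavior by answering without any tool call, while `exploring_policy` searches before answering.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Dict, List, Tuple
+
+# Hypothetical action format (an assumption, not the model's real format):
+# ("search", query) calls a tool; ("answer", text) ends the episode.
+Action = Tuple[str, str]
+
+
+@dataclass
+class Trace:
+    answer: str
+    tool_calls: int
+    history: List[str]
+
+
+def run_agent(policy: Callable[[List[str]], Action],
+              tools: Dict[str, Callable[[str], str]],
+              question: str,
+              max_turns: int = 8) -> Trace:
+    """Alternate policy steps and tool calls until the policy answers."""
+    history = [f"question: {question}"]
+    calls = 0
+    for _ in range(max_turns):
+        kind, payload = policy(history)
+        if kind == "answer":
+            return Trace(payload, calls, history)
+        # Any other action names a tool; its result is fed back to the policy.
+        observation = tools[kind](payload)
+        calls += 1
+        history.append(f"{kind}({payload}) -> {observation}")
+    return Trace("", calls, history)  # turn budget exhausted without an answer
+
+
+# Toy tool and policies, for illustration only.
+FACTS = {"capital of France": "Paris"}
+
+
+def toy_search(query: str) -> str:
+    return FACTS.get(query, "no result")
+
+
+def eager_policy(history: List[str]) -> Action:
+    # Answers immediately: the premature-termination failure mode.
+    return ("answer", "unknown")
+
+
+def exploring_policy(history: List[str]) -> Action:
+    # Searches once, then answers from the last observation.
+    if len(history) == 1:
+        return ("search", "capital of France")
+    return ("answer", history[-1].split("-> ")[-1])
+
+
+tools = {"search": toy_search}
+eager = run_agent(eager_policy, tools, "What is the capital of France?")
+explorer = run_agent(exploring_policy, tools, "What is the capital of France?")
+print(eager.tool_calls, eager.answer)        # 0 unknown
+print(explorer.tool_calls, explorer.answer)  # 1 Paris
+```
+
+In this sketch, `tool_calls` gives a simple per-trajectory measure of how much the agent explored before committing to an answer.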
+## Training and Evaluation
+The repository provides scripts for each stage:
+- **Supervised Fine-Tuning (SFT)**: `bash examples/syn_plan_research/sft_syn_plan_research.sh`
+- **Reinforcement Learning (RL)**: `bash examples/syn_plan_research/rl_syn_plan_research.sh`
+- **Evaluation**: `bash examples/syn_plan_research/eval_syn_plan_research_all.sh`
+
+For detailed environment setup and data configuration, please refer to the [official GitHub repository](https://github.com/HansiZeng/syn-plan-research).