SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search • Paper • 2605.29796 • Published 8 days ago • 24
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text • Paper • 2601.10355 • Published Jan 15 • 39
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search • Paper • 2601.11037 • Published Jan 16 • 17