Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 12 days ago • 63
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published May 27 • 93