Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 16 days ago
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 21 days ago
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published May 27