Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 5 days ago • 52
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 5 days ago • 191
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research Paper • 2605.26114 • Published 27 days ago • 64
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks Paper • 2605.22535 • Published about 1 month ago • 11