ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published May 28 • 97
Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning Paper • 2606.07602 • Published about 1 month ago • 6
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published May 19 • 108
Stabilizing Rubric Integration Training via Decoupled Advantage Normalization Paper • 2603.26535 • Published Mar 27 • 3
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning Paper • 2509.25300 • Published Sep 29, 2025 • 8
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 239