NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? Paper • 2606.24530 • Published 5 days ago • 60
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 6 days ago • 80
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025 • 119
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook Paper • 2509.14142 • Published Sep 17, 2025 • 10
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper • 2509.09674 • Published Sep 11, 2025 • 81
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? Paper • 2509.07894 • Published Sep 9, 2025 • 32
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4, 2025 • 77