SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 2 days ago • 18
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published Feb 23 • 56
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild Paper • 2510.14240 • Published Oct 16, 2025 • 13