SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 21 days ago
CooperBench: Why Coding Agents Cannot be Your Teammates Yet Paper • 2601.13295 • Published Jan 19