AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Paper • 2604.25256 • Published 11 days ago • 29
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? Paper • 2603.03194 • Published Mar 3 • 57
MemoBrain: Executive Memory as an Agentic Brain for Reasoning Paper • 2601.08079 • Published Jan 12 • 39