Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 17 days ago
DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch Paper • 2606.10728 • Published 18 days ago
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Paper • 2604.25256 • Published Apr 28
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? Paper • 2603.03194 • Published Mar 3
MemoBrain: Executive Memory as an Agentic Brain for Reasoning Paper • 2601.08079 • Published Jan 12