SWE-rebench-V2 Collection • SWE-rebench-V2 is a curated dataset of software-engineering tasks derived from real GitHub issues and pull requests. • 3 items • Updated Mar 3 • 18
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 325
AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions Paper • 2508.16402 • Published Aug 22, 2025 • 14 • 4
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17, 2025 • 46 • 5