Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 2 days ago • 14
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 52
msc-smart-contract-auditing/vulnerability-severity-classification • Updated May 4, 2024 • 2.91k • 127 • 3