DynaGuard: A Dynamic Guardrail Model With User-Defined Policies • Paper 2509.02563 • Published Sep 2, 2025
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents • Paper 2505.20411 • Published May 26, 2025
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis • Paper 2502.20383 • Published Feb 27, 2025