TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use Paper • 2510.04550 • Published Oct 6, 2025 • 2
Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents Paper • 2602.02164 • Published 10 days ago • 1