PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 4 days ago
Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge Paper • 2605.08518 • Published May 8
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines Paper • 2605.20630 • Published May 20
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments Paper • 2605.09131 • Published May 9