Trimming the Long-Tail of Visual World Modeling Evaluation Paper • 2606.24256 • Published 8 days ago • 34
GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems Paper • 2606.28187 • Published 5 days ago • 10
BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery Paper • 2606.20997 • Published 12 days ago • 3
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 10 days ago • 95
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces Paper • 2604.04017 • Published Apr 5 • 8
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 27 days ago • 44 • 4
Brick-Composer: Using MLLMs for Assembly with Diverse Bricks Paper • 2606.05445 • Published 28 days ago • 8