Trimming the Long-Tail of Visual World Modeling Evaluation • Paper • 2606.24256 • Published 8 days ago • 31
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints • Paper • 2606.05622 • Published 27 days ago • 44
Does Synthetic Layered Design Data Benefit Layered Design Decomposition? • Paper • 2605.15167 • Published May 14 • 9
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems • Paper • 2606.22388 • Published 10 days ago • 95