PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 10 days ago • 95
Trimming the Long-Tail of Visual World Modeling Evaluation Paper • 2606.24256 • Published 8 days ago • 34
Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation Paper • 2605.12975 • Published May 13 • 9
Learning to Predict Future-Aligned Research Proposals with Language Models Paper • 2603.27146 • Published Apr 6 • 6
Can Language Models Solve Graph Problems in Natural Language? Paper • 2305.10037 • Published May 17, 2023 • 2