PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 4 days ago • 89
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Paper • 2606.09030 • Published 17 days ago • 30
Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents Paper • 2606.06036 • Published 21 days ago • 73
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? Paper • 2606.05553 • Published 21 days ago • 48
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 24 days ago • 55
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 24 days ago • 57
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 28 days ago • 146
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models Paper • 2605.11887 • Published May 12 • 17