Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use Paper • 2605.14038 • Published 7 days ago • 11
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 23 days ago • 67
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors Paper • 2505.23001 • Published May 29, 2025 • 8