WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents Paper • 2606.18847 • Published 8 days ago • 5
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 8 days ago • 60
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents Paper • 2606.19704 • Published 7 days ago • 39
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning Paper • 2606.17682 • Published 9 days ago • 26
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains Paper • 2606.14702 • Published 13 days ago • 31
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 9 days ago • 203
MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding Paper • 2605.30794 • Published 27 days ago • 5
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA Paper • 2606.10572 • Published 16 days ago • 16
Video2LoRA: Parametric Video Internalization for Vision-Language Models Paper • 2606.04351 • Published 22 days ago • 4
Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory Paper • 2606.09365 • Published 17 days ago • 3