HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 9 days ago • 73
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 8 days ago • 89
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published Nov 14, 2025 • 15 • 2
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025 • 77
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling Paper • 2508.03404 • Published Aug 5, 2025 • 4
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning Paper • 2508.06259 • Published Aug 8, 2025 • 2
DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Paper • 2510.02253 • Published Oct 2, 2025 • 15
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow Paper • 2509.21789 • Published Sep 26, 2025 • 9
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published Oct 21, 2025 • 22
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning Paper • 2509.23768 • Published Sep 28, 2025 • 49