Efficient Multimodal Planning Agent for Visual Question-Answering • Paper • 2601.20676 • Published Jan 28
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios • Paper • 2602.23166 • Published Feb 26 • 45
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces • Paper • 2604.04017 • Published Apr 5 • 8
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents • Paper • 2605.10832 • Published May 11 • 22
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation • Paper • 2605.21605 • Published May 20 • 14
Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking • Paper • 2606.07689 • Published 23 days ago • 5
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent • Paper • 2508.05748 • Published Aug 7, 2025 • 143