Efficient Multimodal Planning Agent for Visual Question-Answering Paper • 2601.20676 • Published Jan 28
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published Feb 26 • 45
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces Paper • 2604.04017 • Published Apr 5 • 8
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents Paper • 2605.10832 • Published May 11 • 22
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation Paper • 2605.21605 • Published May 20 • 14
Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking Paper • 2606.07689 • Published 21 days ago • 5
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces Paper • 2604.04017 • Published Apr 5 • 8
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents Paper • 2605.10832 • Published May 11 • 22