Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking
Abstract
Struct-Searcher introduces a structural agentic workflow for multimodal information seeking, grounded in belief revision theory, that improves accuracy over existing vision-language models and deep research agents.
Deep research agents have attracted increasing attention for their ability to collect large-scale online information to acquire target knowledge, and recent efforts have shifted from purely text-based information seeking to multimodal settings. However, existing agentic workflows largely follow evidence accumulation models, which aggregate evidence linearly and lack principled mechanisms for handling contradictory information across heterogeneous modalities. To this end, we propose Struct-Searcher, a structural agentic workflow grounded in belief revision theory. It explicitly maintains an evolving multimodal structural graph throughout the reasoning process, enabling effective conflict-aware multimodal deep information seeking. Extensive experiments across multiple benchmark datasets and backbone models demonstrate that Struct-Searcher is (1) plug-and-play and model-agnostic, yielding an average relative accuracy improvement of 17.2% on BrowseComp-VL across five different backbones; and (2) top-performing, consistently outperforming state-of-the-art vision-language models (VLMs) and deep research agents, with relative accuracy improvements over the second-best competing approach of 3.7% on MM-BrowseComp, 1.5% on HLE-VL, and 0.7% on BrowseComp-VL.
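The abstract does not specify how the structural graph is revised when modalities disagree, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a simple belief-revision policy: each (subject, relation) slot holds one belief, a new claim that agrees with it is merged, and a new claim that contradicts it replaces the current belief only if it carries a higher credibility score. The names `Claim`, `BeliefGraph`, `revise`, and the `confidence` scores are all hypothetical. The contrast with linear evidence accumulation is that contradictory values are never simply stored side by side.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """Hypothetical evidence unit: subject --relation--> value, from one modality."""
    subject: str
    relation: str
    value: str
    modality: str      # illustrative label, e.g. "text" or "image"
    confidence: float  # hypothetical credibility score in [0, 1]


@dataclass
class BeliefGraph:
    """Hypothetical evolving graph: each (subject, relation) slot holds one belief."""
    beliefs: dict[tuple[str, str], Claim] = field(default_factory=dict)
    retracted: list[Claim] = field(default_factory=list)

    def revise(self, claim: Claim) -> str:
        key = (claim.subject, claim.relation)
        current = self.beliefs.get(key)
        if current is None:
            # Expansion: no prior belief in this slot, so add the claim.
            self.beliefs[key] = claim
            return "added"
        if current.value == claim.value:
            # Agreement: keep whichever copy of the same value is better supported.
            if claim.confidence > current.confidence:
                self.beliefs[key] = claim
            return "confirmed"
        # Conflict: keep the more credible claim and retract the other,
        # instead of accumulating both contradictory values.
        if claim.confidence > current.confidence:
            self.retracted.append(current)
            self.beliefs[key] = claim
            return "revised"
        self.retracted.append(claim)
        return "rejected"


# Usage with made-up evidence about a hypothetical artwork.
g = BeliefGraph()
g.revise(Claim("ArtworkX", "created_in", "1889", "text", 0.6))   # "added"
g.revise(Claim("ArtworkX", "created_in", "1890", "image", 0.9))  # "revised"
g.revise(Claim("ArtworkX", "created_in", "1888", "text", 0.3))   # "rejected"
print(g.beliefs[("ArtworkX", "created_in")].value)  # 1890
print(len(g.retracted))  # 2
```

A confidence-threshold rule is only one possible revision policy; a real agent could instead re-query sources or ask the backbone model to adjudicate a conflict before updating the graph.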
Community
Struct-Searcher is a training-free agentic workflow that advances multimodal deep research with structure-aware thinking mechanisms.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Towards Long-horizon Agentic Multimodal Search (2026)
- MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA (2026)
- SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning (2026)
- Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation (2026)
- InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search (2026)
- PaperScope: A Multi-Modal Multi-Document Benchmark for Agentic Deep Research Across Massive Scientific Papers (2026)
- MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments (2026)