Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?
Abstract
Lexical retrievers remain effective for deep research tasks when paired with advanced LLMs, outperforming dense retrievers in answer accuracy and evidence recall.
Does a lexical retriever suffice as large language models (LLMs) become more capable in an agentic loop? This question naturally arises when building deep research systems. We revisit it by pairing BM25 with frontier LLMs that have better reasoning and tool-use abilities. To support researchers asking the same question, we introduce Pi-Serini, a search agent equipped with three tools for retrieving, browsing, and reading documents. Our results show that, on BrowseComp-Plus, a well-configured lexical retriever with sufficient retrieval depth can support effective deep research when paired with more capable LLMs. Specifically, Pi-Serini with GPT-5.5 achieves 83.1% answer accuracy and 94.7% surfaced evidence recall, outperforming released search agents that use dense retrievers. Controlled ablations further show that BM25 tuning improves answer accuracy by 18.0% and surfaced evidence recall by 11.1% over the default BM25 setting, while increasing retrieval depth further improves surfaced evidence recall by 25.3% over the shallow-retrieval setting. Source code is available at https://github.com/justram/pi-serini.
Community
Does a lexical retriever suffice for agentic search when agents can keep refining their queries?
As LLMs become more capable in agentic loops, agents can continuously refine their behavior, including tool use and reasoning, based on feedback from the environment. Motivated by this, I couldn’t help but ask the question above.
To answer it, we introduce Pi-Serini (= PI + Anserini), a minimal BM25-based search agent equipped with search, browse, and read tools. This interface allows agents to cache retrieved rankings locally and selectively decide which content should enter their context window, much like how people use Google Search. This design enables agents to retrieve more deeply.
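To make the interface concrete, here is a minimal sketch of a three-tool agent loop of this kind. All names and the toy scorer are assumptions for illustration, not Pi-Serini's actual implementation (which uses BM25 via Anserini); the point is the division of labor: `search` caches a deep ranking locally, `browse` pages through that cached ranking, and `read` selectively pulls full documents into the context window.

```python
from dataclasses import dataclass, field

@dataclass
class SearchAgentTools:
    """Hypothetical sketch of a search/browse/read tool interface."""
    corpus: dict            # docid -> full document text
    cache: dict = field(default_factory=dict)  # query -> ranked docids

    def search(self, query, k=100):
        """Retrieve top-k docids and cache the full ranking locally.
        A toy term-count scorer stands in for BM25 here."""
        terms = query.lower().split()
        ranked = sorted(
            self.corpus,
            key=lambda d: -sum(self.corpus[d].lower().count(t) for t in terms),
        )
        self.cache[query] = ranked[:k]
        # Only docids plus short snippets enter the context window.
        return [(d, self.corpus[d][:80]) for d in self.cache[query][:10]]

    def browse(self, query, start, count=10):
        """Page deeper into a cached ranking without re-retrieving."""
        return self.cache[query][start:start + count]

    def read(self, docid):
        """Pull one document's full text into context on demand."""
        return self.corpus[docid]
```

Because the ranking lives in the cache rather than in the prompt, the agent can retrieve deeply (large `k`) while spending context only on the documents it chooses to read.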
Results on BrowseComp-Plus:
Pi-Serini with GPT-5.5 achieves 83.1% answer accuracy and 94.7% surfaced evidence recall.
Controlled ablations show that:
Well-configured BM25 improves answer accuracy by 18.0% and surfaced evidence recall by 11.1% over the default BM25 setting.
Increasing retrieval depth improves surfaced evidence recall by 25.3% over shallow retrieval.
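The specific tuned values are not restated in this summary, but "configuring" BM25 mostly means choosing two parameters: k1 (term-frequency saturation) and b (document-length normalization). A self-contained sketch of the scoring function, written from the standard BM25 formula rather than taken from the Anserini implementation, shows what these knobs control:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=0.9, b=0.4):
    """Score one tokenized document for a query with BM25.
    corpus is a list of tokenized documents. k1 controls how quickly
    repeated terms saturate; b controls how strongly long documents
    are penalized. Lucene's defaults are k1=1.2, b=0.75; values tuned
    per corpus (as in the ablation above) can differ."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in set(query_terms):
        df = sum(1 for d in corpus if t in d)   # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        tf = doc_terms.count(t)
        norm = 1 - b + b * len(doc_terms) / avgdl
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score
```

With b near 1, a long document's term matches are discounted toward the corpus-average length; with b = 0, length is ignored entirely. Sweeping (k1, b) on a held-out set is the usual way such a retriever gets "well configured."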
Overall, our answer is: Yes, a lexical retriever can suffice for agentic search when it is well configured and the search agent is equipped with a tool interface that enables deeper retrieval.
Notably, if you are worried about the high cost of deep research, Pi-Serini may help: it reduces evaluation cost by 3.3×–10×.
Paper: https://arxiv.org/abs/2605.10848
Code: https://github.com/justram/pi-serini
Project site: https://ricky42613.github.io/piserini