arxiv:2603.15653

Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context

Published on Mar 7

· Submitted by

Parshin Shojaee on Mar 18

Apple

Upvote

Authors:

Keivan Alizadeh ,

Parshin Shojaee ,

Mehrdad Farajtabar

Abstract

Language models struggle with long-context handling, but a new framework called SRLM improves performance by incorporating uncertainty-aware self-reflection to guide programmatic context interaction, outperforming existing methods like RLM across various tasks and context lengths.

AI-generated summary

Long-context handling remains a core challenge for language models: even with extended context windows, models often fail to reliably extract, reason over, and use the information across long contexts. Recent works like Recursive Language Models (RLM) have approached this challenge by agentic way of decomposing long contexts into recursive sub-calls through programmatic interaction at inference. While promising, the success of RLM critically depends on how these context-interaction programs are selected, which has remained largely unexplored. In this paper, we study this problem and introduce SRLM, a framework that augments programmatic context interaction with uncertainty-aware Self-Reflection. SRLM leverages three intrinsic signals: self consistency, reasoning length, and verbalized confidence. These serve as complementary indicators of a model's internal uncertainty, and the model uses them to evaluate and compare candidate context-interaction programs. Extensive experiments across diverse benchmark datasets, context lengths, and backbone models, show that SRLM consistently outperforms state-of-the-art baselines, yielding up to 22% improvement over RLM under the same time budget. Our findings show that recursion itself is not the primary driver of performance in RLM, and a simple self-reflective program search can match or surpass RLM without requiring self-query or explicit recursion mechanisms. We find that for context lengths within the model's window, RLMs with recursion often degrade performance relative to the base model, whereas SRLM yields consistent gains across both short and long contexts. We also find that RLM is less effective in tasks with semantically intensive nature, where heuristic program search is insufficient and broader contextual understanding is required, while self-reflection in SRLM provides a semantic signal that better steers reasoning in these scenarios.

View arXiv page View PDF Add to collection

Community

parshinsh

Paper author Paper submitter about 17 hours ago

•

edited about 16 hours ago

Recursive Language Models (RLMs) have been a new direction for long context. Ever wondered what is the main driver of RLM? Is it always effective or depends on the context? How is the role of recursion or self-query tool call in this setting?
In our recent work, we look closely into these questions. Our analysis show some interesting findings & alternatives for RLM-style program-based context interaction frameworks

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.15653 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.15653 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.15653 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.