Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Abstract
A memory mechanism called Memex enables large language model agents to handle long-horizon tasks more effectively by maintaining compact context through structured summaries while storing full interaction details in an external database, allowing selective retrieval based on learned criteria.
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard the past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing the full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
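The write/read loop described in the abstract can be pictured as a small data structure: full records go to an external store under stable indices, while the working context keeps only summaries and those indices, which can later be dereferenced. The following is a minimal sketch under that reading; the class and method names (`MemexStore`, `archive`, `dereference`) are illustrative inventions, not the paper's API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MemexStore:
    """Sketch of an indexed experience memory: a compact working
    context backed by an external database of full records.
    Names here are hypothetical, not from the paper."""
    _db: dict = field(default_factory=dict)               # index -> full record
    working_context: list = field(default_factory=list)   # (index, summary) pairs
    _ids: count = field(default_factory=count)

    def archive(self, full_record: str, summary: str) -> str:
        """Write path: store the full-fidelity record externally and
        keep only a concise summary plus a stable index in context."""
        idx = f"mem:{next(self._ids)}"
        self._db[idx] = full_record
        self.working_context.append((idx, summary))
        return idx

    def dereference(self, idx: str) -> str:
        """Read path: recover the exact past evidence on demand."""
        return self._db[idx]


store = MemexStore()
idx = store.archive(
    full_record="raw tool output: several thousand tokens of search results ...",
    summary="searched docs for 'context budget'; full results archived",
)
# The working context carries only the summary and the index; the
# full record is recovered exactly when the agent dereferences it.
print(store.working_context)   # [('mem:0', "searched docs for ...")]
print(store.dereference(idx))  # the original full record, unchanged
```

In the paper's framing, the learned policy (trained with MemexRL) decides *when* to call the write and read paths; this sketch only shows that dereferencing is lossless, in contrast to summary-only compression.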