RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction
Abstract
RealMem benchmark evaluates memory systems for long-term project-oriented interactions in large language models, revealing challenges in managing dynamic context dependencies.
As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchmarks primarily focus on casual conversation or task-oriented dialogue, failing to capture **"long-term project-oriented"** interactions where agents must track evolving goals. To bridge this gap, we introduce **RealMem**, the first benchmark grounded in realistic project scenarios. RealMem comprises over 2,000 cross-session dialogues across eleven scenarios, utilizing natural user queries for evaluation. We propose a synthesis pipeline that integrates Project Foundation Construction, Multi-Agent Dialogue Generation, and Memory and Schedule Management to simulate the dynamic evolution of memory. Experiments reveal that current memory systems face significant challenges in managing the long-term project states and dynamic context dependencies inherent in real-world projects. Our code and datasets are available at [https://github.com/AvatarMemory/RealMemBench](https://github.com/AvatarMemory/RealMemBench).
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory (2026)
- Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents (2026)
- Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI (2025)
- MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards (2026)
- KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions (2026)
- MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents (2026)
- Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper