arxiv:2509.18436

Memory-QA: Answering Recall Questions Based on Multimodal Memories

Published on Sep 22, 2025

Abstract

AI-generated summary: Memory-QA is a novel task for recalling visual content from multimodal memories, addressed through the Pensieve pipeline that integrates memory augmentation, temporal-location aware retrieval, and multi-memory QA fine-tuning.

We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To address these challenges, we propose a comprehensive pipeline, Pensieve, integrating memory-specific augmentation, time- and location-aware multi-signal retrieval, and multi-memory QA fine-tuning. We create a multimodal benchmark that illustrates the real challenges of this task and show that Pensieve outperforms state-of-the-art solutions by up to 14% in QA accuracy.
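To make the idea of time- and location-aware multi-signal retrieval more concrete, here is a minimal toy sketch in Python. It is not the Pensieve implementation, which the abstract does not describe in detail. The `Memory` fields, the Jaccard text signal, the weights, and the `score` and `retrieve` helpers are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Memory:
    # All fields are hypothetical; a real system would also store the image itself.
    caption: str        # text description attached to the stored memory
    taken_at: datetime  # capture time
    place: str          # coarse location label


def tokens(text: str) -> set:
    """Lowercased whitespace tokens; a stand-in for a real text encoder."""
    return set(text.lower().split())


def score(memory: Memory, query: str,
          after: Optional[datetime] = None,
          before: Optional[datetime] = None,
          place: Optional[str] = None,
          w_text: float = 1.0, w_time: float = 0.5, w_place: float = 0.5) -> float:
    """Combine text, time, and location signals into one score (assumed weights)."""
    q, m = tokens(query), tokens(memory.caption)
    union = q | m
    text_sim = len(q & m) / len(union) if union else 0.0  # Jaccard overlap

    # The time signal only fires when the question actually carries a time hint.
    has_time_hint = after is not None or before is not None
    in_window = ((after is None or memory.taken_at >= after)
                 and (before is None or memory.taken_at <= before))
    time_sig = 1.0 if has_time_hint and in_window else 0.0

    # The location signal is an exact, case-insensitive label match.
    place_sig = 1.0 if place and memory.place.lower() == place.lower() else 0.0

    return w_text * text_sim + w_time * time_sig + w_place * place_sig


def retrieve(memories, query, k=2, **hints):
    """Return the k highest-scoring memories for a recall question."""
    ranked = sorted(memories, key=lambda mem: score(mem, query, **hints), reverse=True)
    return ranked[:k]


# Made-up example data, not taken from the paper's benchmark.
memories = [
    Memory("parking spot level 3 row B", datetime(2025, 9, 20, 9, 0), "airport"),
    Memory("parking spot level 1 row A", datetime(2025, 8, 1, 9, 0), "mall"),
    Memory("menu at the sushi restaurant", datetime(2025, 9, 20, 19, 0), "downtown"),
]
top = retrieve(memories, "parking spot", k=2,
               after=datetime(2025, 9, 15), place="airport")
print(top[0].caption)
```

Returning several memories rather than one reflects the abstract's point that a recall question may need to draw on multiple memories; a downstream QA model would then read all of them together.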
