PRISM-Memory Extraction Skill
Hook: Turn conversations into durable, searchable memory.
This is the single extraction skill the public release keeps.
- Released model: PRISM-Memory 7B Adapter
- Base model: Qwen/Qwen2.5-7B-Instruct
- Role: proposition extraction for long-term conversational memory
- Why this one: strongest confirmed overall release profile, strongest adversarial behavior, and the best confirmed LongMemEval score among the release candidates
Skill Definition
The extractor operates turn by turn and emits 0-5 atomic memory records per
turn. Each record should be a standalone fact about a person, event,
preference, plan, or property, with dates carried into the fact when available.
Canonical prompt:
You are a memory extraction assistant. Given a conversation turn, extract 0-5 atomic, standalone facts. Each fact must be a complete sentence about a specific person, event, preference, or property. Include dates/times when mentioned. Skip greetings, filler, and questions. Output ONLY a JSON array of strings, e.g. ["fact1", "fact2"] or [].
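As a concrete illustration, the prompt-and-parse loop around the extractor can be sketched as below. This is a minimal sketch, not the released inference code: `format_turn` and `parse_extraction` are hypothetical helper names, and the turn template is an assumption; only the system prompt text comes from the skill definition above.

```python
import json

# Canonical system prompt, copied from the skill definition.
SYSTEM_PROMPT = (
    "You are a memory extraction assistant. Given a conversation turn, "
    "extract 0-5 atomic, standalone facts. Each fact must be a complete "
    "sentence about a specific person, event, preference, or property. "
    "Include dates/times when mentioned. Skip greetings, filler, and "
    "questions. Output ONLY a JSON array of strings, e.g. "
    '["fact1", "fact2"] or [].'
)

def format_turn(speaker: str, session_date: str, text: str) -> str:
    # Hypothetical turn template: carry speaker and session date,
    # as the inference contract below requires.
    return f"[{session_date}] {speaker}: {text}"

def parse_extraction(raw: str, max_facts: int = 5) -> list[str]:
    # The model is instructed to emit only a JSON array of strings;
    # treat anything malformed as "no records for this turn".
    try:
        facts = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(facts, list):
        return []
    return [f for f in facts if isinstance(f, str)][:max_facts]

# Example: parsing a well-formed model response.
raw = '["Alice adopted a cat named Miso on 2024-03-02."]'
print(parse_extraction(raw))
# ['Alice adopted a cat named Miso on 2024-03-02.']
```

The cap at five records mirrors the 0-5 limit in the prompt; dropping malformed output rather than retrying is one possible policy, not the documented one.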
Inference Contract
- Format the current turn with speaker and session date.
- Extract 0-5 propositions as a JSON array.
- Clean speaker references so generic labels become real names when possible.
- Resolve relative temporal expressions against the session date.
- Prefix each stored proposition with the normalized session date before indexing.
- Pair the extractor with the hybrid retrieval stack, not with raw transcript search alone.
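The two date-handling steps in the contract can be sketched as follows. This is a minimal sketch under stated assumptions: `resolve_relative_dates` and `finalize_record` are hypothetical names, and the tiny relative-date map covers only three expressions, far less than a production resolver would.

```python
import re
from datetime import date, timedelta

# Minimal relative-date map (assumption; real resolvers cover many more forms).
_RELATIVE = {"today": 0, "yesterday": -1, "tomorrow": 1}

def resolve_relative_dates(fact: str, session_date: date) -> str:
    # Replace bare relative expressions with absolute ISO dates,
    # anchored to the session date per the contract.
    def repl(m: re.Match) -> str:
        offset = _RELATIVE[m.group(0).lower()]
        return (session_date + timedelta(days=offset)).isoformat()
    return re.sub(r"\b(today|yesterday|tomorrow)\b", repl, fact, flags=re.I)

def finalize_record(fact: str, session_date: date) -> str:
    # Prefix the normalized session date before indexing.
    resolved = resolve_relative_dates(fact, session_date)
    return f"[{session_date.isoformat()}] {resolved}"

print(finalize_record("Bob's flight leaves tomorrow.", date(2024, 3, 2)))
# [2024-03-02] Bob's flight leaves 2024-03-03.
```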
Retrieval Setup To Keep
- Retriever: PRISMv3Rerank
- Sparse retrieval: BM25
- Dense retrieval: all-MiniLM-L6-v2
- Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
Best confirmed retrieval settings:
- LoCoMo: adversarial k=5, multi-hop k=10, all other categories k=8
- LongMemEval: multi-session k=20, all other categories k=8, except single-session-user k=5
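The per-category settings above fit naturally in a lookup table, and the hybrid stack reduces to "union sparse and dense candidates, then rerank." A minimal sketch, assuming the benchmark/category key names and treating `rerank_score` as a stand-in for the cross-encoder (the real stack uses BM25, all-MiniLM-L6-v2, and ms-marco-MiniLM-L-6-v2):

```python
# Best confirmed top-k settings, keyed by (benchmark, category).
TOP_K = {
    ("locomo", "adversarial"): 5,
    ("locomo", "multi-hop"): 10,
    ("longmemeval", "multi-session"): 20,
    ("longmemeval", "single-session-user"): 5,
}
DEFAULT_K = 8  # all other categories on both benchmarks

def top_k(benchmark: str, category: str) -> int:
    return TOP_K.get((benchmark.lower(), category.lower()), DEFAULT_K)

def hybrid_retrieve(query, sparse_hits, dense_hits, rerank_score, k):
    # Union the BM25 and dense candidate lists (order-preserving dedupe),
    # then let the cross-encoder reranker order the merged pool.
    pool = list(dict.fromkeys(sparse_hits + dense_hits))
    pool.sort(key=lambda doc: rerank_score(query, doc), reverse=True)
    return pool[:k]

print(top_k("locomo", "adversarial"))    # 5
print(top_k("longmemeval", "temporal"))  # 8
```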
What Held Up In The Repo
- The stable 20,000-example supervised base mattered more than aggressive benchmark-specific add-ons.
- Four epochs was enough to reach the useful local optimum for this 7B line.
- Explicit date anchoring helped. Benchmark-style relative-date imitation did not.
- Post-processing mattered. Speaker cleanup and relative-date resolution made the extracted records usable.
- Hybrid retrieval beat simpler sparse-only or dense-only retrieval.
- Turn-local extraction worked better than feeding long recent-context windows into the extractor.
What To Avoid
- Benchmark-specific format hacks, especially relative-date answer imitation.
- Narrow LoCoMo-style SFT add-ons that improve one slice and hurt balance.
- Overtraining follow-on variants that trade adversarial precision for narrow gains.
- Treating the extractor as a standalone answer model instead of a memory writer.
Release Rule
Public surfaces should expose exactly one extraction behavior and one released model. Other runs remain internal research artifacts.
Related docs: