PRISM-Memory Extraction Skill

Hook: Turn conversations into durable, searchable memory.

This is the single extraction skill the public release keeps.

Released model: PRISM-Memory 7B Adapter
Base model: Qwen/Qwen2.5-7B-Instruct
Role: proposition extraction for long-term conversational memory
Why this one: strongest confirmed overall release profile, strongest adversarial behavior, and best confirmed LongMemEval score among the release candidates

Skill Definition

The extractor operates turn by turn and emits 0-5 atomic memory records per turn. Each record should be a standalone fact about a person, event, preference, plan, or property, with dates carried into the fact when available.

Canonical prompt:

You are a memory extraction assistant. Given a conversation turn, extract 0-5 atomic, standalone facts. Each fact must be a complete sentence about a specific person, event, preference, or property. Include dates/times when mentioned. Skip greetings, filler, and questions. Output ONLY a JSON array of strings, e.g. ["fact1", "fact2"] or [].

Inference Contract

Format the current turn with speaker and session date.
Extract 0-5 propositions as a JSON array.
Clean speaker references so generic labels become real names when possible.
Resolve relative temporal expressions against the session date.
Prefix each stored proposition with the normalized session date before indexing.
Pair the extractor with the hybrid retrieval stack, not with raw transcript search alone.

Retrieval Setup To Keep

Retriever: PRISMv3Rerank
Sparse retrieval: BM25
Dense retrieval: all-MiniLM-L6-v2
Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2

Best confirmed retrieval settings:

LoCoMo: adversarial k=5, multi-hop k=10, all other categories k=8
LongMemEval: multi-session k=20, all other categories k=8 except single-session-user k=5

What Held Up In The Repo

The stable 20,000-example supervised base mattered more than aggressive benchmark-specific add-ons.
Four epochs was enough to reach the useful local optimum for this 7B line.
Explicit date anchoring helped. Benchmark-style relative-date imitation did not.
Post-processing mattered. Speaker cleanup and relative-date resolution made the extracted records usable.
Hybrid retrieval beat simpler sparse-only or dense-only retrieval.
Turn-local extraction worked better than feeding long recent-context windows into the extractor.

What To Avoid

Benchmark-specific format hacks, especially relative-date answer imitation.
Narrow LoCoMo-style SFT add-ons that improve one slice and hurt balance.
Overtraining follow-on variants that trade adversarial precision for narrow gains.
Treating the extractor as a standalone answer model instead of a memory writer.

Release Rule

Public surfaces should expose exactly one extraction behavior and one released model. Other runs remain internal research artifacts.

Related docs: