---
base_model: Qwen/Qwen2.5-7B-Instruct
base_model_relation: adapter
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- conversational-memory
- information-extraction
- long-context
- peft
- lora
- qwen2.5
---

# PRISM-Memory

PRISM-Memory is a LoRA adapter for `Qwen/Qwen2.5-7B-Instruct` that specializes
the base model to write proposition-level memory from dialogue. It is a
memory-writing component, not a general chat model.

## Released model

- Model name: `PRISM-Memory 7B Adapter`
- Base model: `Qwen/Qwen2.5-7B-Instruct`
- Adapter type: `LoRA`

## What this release shows

- A 7B open model can replace GPT-4.1 for the extraction step in this memory pipeline.
- On the confirmed release surface, PRISM-Memory scores `0.4768` on LongMemEval and `0.4981` on LoCoMo.
- The GPT-4.1-based PropMem reference scores `0.4650` on LongMemEval and `0.5360` on LoCoMo.

This comparison holds the QA layer constant. It compares extractor against
extractor, not a full end-to-end GPT-4.1 system.

## Why this is useful

- It keeps hard limits and preferences available for later workflow generation.
- It keeps current state separate from future plans.
- It supports dated recall and clean refusal on unsupported questions.

See [docs/release/memory-scenarios.md](docs/release/memory-scenarios.md) for
compact end-to-end examples.

## Load the adapter

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "AsadIsmail/prism-memory"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
```

This repo contains adapter weights only. You still need the base model.
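With the adapter loaded, a dialogue turn has to be wrapped in a prompt and the generated text parsed back into records. A minimal sketch of that plumbing, assuming an instruction-style prompt and one `- `-prefixed record per output line (both are assumptions, not the released format; see [docs/release/extraction-skill.md](docs/release/extraction-skill.md) for the real prompt):

```python
def build_extraction_prompt(turn: str) -> str:
    """Wrap a dialogue turn in a (hypothetical) memory-extraction instruction."""
    return (
        "Extract proposition-level memory records from the user turn below.\n"
        "Write one record per line, prefixed with '- '.\n\n"
        f"User turn: {turn}"
    )


def parse_memory_records(output: str) -> list[str]:
    """Collect every '- '-prefixed line of generated text as one memory record."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- "):
            records.append(line[2:].strip())
    return records


# With `tokenizer` and `model` from the snippet above (needs the weights; sketch only):
# prompt = build_extraction_prompt("We cap GitHub Actions at 20 concurrent jobs.")
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=256)
# new_tokens = out[0][inputs["input_ids"].shape[1]:]
# memories = parse_memory_records(tokenizer.decode(new_tokens, skip_special_tokens=True))
```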

## Training data

PRISM-Memory was trained on **synthetic** multi-session memory conversations
with **GPT-4.1-derived** memory-writing labels. The public release does not use
real user chat logs.

| Item | Count | Notes |
|---|---:|---|
| synthetic training conversations | `2,329` | multi-session conversations with inserts, updates, and deletes |
| synthetic held-out conversations | `584` | evaluation split used for held-out examples |
| supervised extraction examples | `100,427` | memory-writing labels derived from the synthetic corpus |
| released training subset | `20,000` | supervised examples used for the public adapter |

### Example training item

**Synthetic scenario**

- Domain: cloud infrastructure performance optimization
- Persona: senior cloud systems engineer at a fintech startup

**Synthetic user turn**

> Here’s the initial architecture outline: deploy microservices on AWS Fargate, use PostgreSQL 13 as the primary database, plan Kubernetes orchestration, use Redis for caching, and keep API latency under 50ms.

**Target memory records**

- Deploy microservices on AWS Fargate
- Orchestrate containers on a Kubernetes cluster (planned)
- Primary database: PostgreSQL 13
- Use Redis as an in-memory caching layer
- Latency target: API responses under 50ms
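Serialized for supervised fine-tuning, an item like this plausibly becomes one JSONL row pairing the turn with its target records. The field names below are illustrative guesses, not the released schema (see [docs/release/datasets.md](docs/release/datasets.md) for the actual dataset design):

```python
import json

# Hypothetical JSONL row for one supervised extraction example.
example = {
    "domain": "cloud infrastructure performance optimization",
    "persona": "senior cloud systems engineer at a fintech startup",
    "turn": "Here's the initial architecture outline: deploy microservices "
            "on AWS Fargate, use PostgreSQL 13 as the primary database, "
            "plan Kubernetes orchestration, use Redis for caching, and keep "
            "API latency under 50ms.",
    "target_records": [
        "Deploy microservices on AWS Fargate",
        "Orchestrate containers on a Kubernetes cluster (planned)",
        "Primary database: PostgreSQL 13",
        "Use Redis as an in-memory caching layer",
        "Latency target: API responses under 50ms",
    ],
}

row = json.dumps(example, ensure_ascii=False)  # one row per supervised example
```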

The release makes the dataset design, counts, and example records public. It
does not bundle the full raw corpus files.

## Confirmed results

| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference |
|---|---:|---:|
| LongMemEval | `0.4768` | `0.4650` |
| LoCoMo | `0.4981` | `0.5360` |

The reproduced evaluation hit the cached QA surface exactly: `460` hits,
`0` misses.

## Extraction examples

### Infrastructure bottlenecks stay structured
- Session date: `2025-01-04 15:34:00`
- Overlap score: `0.909`
- Note: Near-exact match on two operational facts from a single held-out turn.

**Turn**

> yeah, no real caching beyond basic Docker layer caching. Jenkins nodes have limited capacity, and we sometimes hit queue delays during peak commits.

**GPT-4.1 reference**

- No caching beyond basic Docker layer caching
- Jenkins nodes have limited capacity and experience queue delays during peak commits

**PRISM-Memory**

- No Docker caching beyond basic layer caching
- Jenkins nodes have limited capacity; peak commits cause queue delays
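The card does not define how the overlap score is computed. One plausible reading is a bag-of-tokens F1 between the reference and predicted record lists, sketched below strictly as an assumption (check the bundled results files for the actual metric):

```python
from collections import Counter


def token_f1(reference: list[str], predicted: list[str]) -> float:
    """Bag-of-tokens F1 between two record lists (one plausible overlap metric)."""
    ref = Counter(" ".join(reference).lower().split())
    pred = Counter(" ".join(predicted).lower().split())
    overlap = sum((ref & pred).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```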

### Numeric constraints and preferences survive extraction
- Session date: `2025-03-01 15:07:00`
- Overlap score: `0.800`
- Note: The trained model keeps both the hard concurrency cap and the desired notification style.

**Turn**

> yeah, I think starting with incremental scans and parallel matrix jobs makes sense. We have 20 concurrent jobs max on GitHub Actions currently. Also want to keep Slack notifications from Snyk consistent with other pipeline alerts—aggregated and concise. Can you help draft the workflow?

**GPT-4.1 reference**

- GitHub Actions concurrency limit: 20 concurrent jobs
- Wants Snyk Slack notifications aggregated and concise, consistent with other pipeline alerts

**PRISM-Memory**

- GitHub Actions concurrency limit: 20 concurrent jobs
- Snyk Slack notifications should be aggregated and concise

More held-out examples live in
[docs/release/extraction-examples.md](docs/release/extraction-examples.md).

## Bundled docs and artifacts

- [docs/release/datasets.md](docs/release/datasets.md)
- [docs/release/extraction-examples.md](docs/release/extraction-examples.md)
- [docs/release/extraction-skill.md](docs/release/extraction-skill.md)
- [docs/release/memory-scenarios.md](docs/release/memory-scenarios.md)
- [docs/release/release-results.md](docs/release/release-results.md)
- [docs/release/technical-blog.md](docs/release/technical-blog.md)
- [results/release_summary.json](results/release_summary.json)
- [results/extraction_examples.json](results/extraction_examples.json)
- [results/try_it_sessions.json](results/try_it_sessions.json)

## Demo

The companion Space is live at
[AsadIsmail/prism-memory](https://huggingface.co/spaces/AsadIsmail/prism-memory).

## Limitations

- This is a memory-writing component, not a general chat model.
- It is a LoRA adapter, not a standalone full checkpoint.
- The evaluation pipeline still uses a separate QA model to score retrieved memory.
- Temporal and inferential categories still trail stronger larger-model baselines.