Text Generation
PEFT
Safetensors
conversational-memory
information-extraction
long-context
lora
qwen2.5
conversational
How to use AsadIsmail/prism-memory with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the Qwen2.5-7B-Instruct base model first.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# Attach the PRISM-Memory LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "AsadIsmail/prism-memory")
```
Publish PRISM-Memory adapter bundle
- README.md +1 -0
- docs/release/memory-scenarios.md +105 -0
- docs/release/technical-blog.md +67 -0
README.md
CHANGED

```diff
@@ -137,6 +137,7 @@ More held-out examples live in
 - [docs/release/datasets.md](docs/release/datasets.md)
 - [docs/release/extraction-examples.md](docs/release/extraction-examples.md)
 - [docs/release/extraction-skill.md](docs/release/extraction-skill.md)
+- [docs/release/memory-scenarios.md](docs/release/memory-scenarios.md)
 - [docs/release/release-results.md](docs/release/release-results.md)
 - [docs/release/technical-blog.md](docs/release/technical-blog.md)
 - [results/confirmed_exp15_summary.json](results/confirmed_exp15_summary.json)
```

docs/release/memory-scenarios.md
ADDED
@@ -0,0 +1,105 @@

# PRISM-Memory End-To-End Scenarios

These are compact product-style scenarios built from the public release artifacts.

- The first two use the released held-out extraction examples.
- The last two use confirmed held-out benchmark cases from [../../results/scenario_comparisons.json](../../results/scenario_comparisons.json).

The point is not just that the extractor matches GPT-4.1-style labels. The point is that a later system can ask a concrete question and get back a useful, inspectable answer from stored memory.


## 1. Keep hard limits and notification preferences

**Conversation turn**

> yeah, I think starting with incremental scans and parallel matrix jobs makes sense. We have 20 concurrent jobs max on GitHub Actions currently. Also want to keep Slack notifications from Snyk consistent with other pipeline alerts, aggregated and concise.

**Stored memory**

- GitHub Actions concurrency limit: 20 concurrent jobs
- Snyk Slack notifications should be aggregated and concise

**Later question**

What is our GitHub Actions concurrency limit, and how should Snyk alerts look?

**Answer from memory**

20 concurrent jobs. Snyk alerts should be aggregated and concise.

**Why it matters**

This is the kind of operational detail that gets buried in chat but needs to survive into later workflow drafts and agent actions.

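
The store-then-ask loop in this scenario can be sketched in a few lines, assuming memory is a flat list of proposition strings retrieved by plain word overlap. The `MemoryStore` class, its `add`/`search` methods, and the `tokens` helper are hypothetical illustrations, not PRISM-Memory's API; the release describes a hybrid retrieval stack that this toy scorer does not reproduce.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately crude stand-in for a real retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryStore:
    """Hypothetical flat store of proposition-level memory records."""
    records: list[str] = field(default_factory=list)

    def add(self, proposition: str) -> None:
        self.records.append(proposition)

    def search(self, question: str, k: int = 2) -> list[str]:
        # Score each record by how many question words it shares; drop zero-overlap records.
        q = tokens(question)
        scored = [(len(q & tokens(r)), r) for r in self.records]
        scored = [(s, r) for s, r in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:k]]


store = MemoryStore()
store.add("GitHub Actions concurrency limit: 20 concurrent jobs")
store.add("Snyk Slack notifications should be aggregated and concise")
store.add("Sidecar CPU limits set and monitored via Prometheus")

question = "What is our GitHub Actions concurrency limit, and how should Snyk alerts look?"
# Prints the concurrency record first, then the Snyk record.
for record in store.search(question):
    print(record)
```

Because the stored records are short, human-readable propositions, the answer can be checked against exactly the lines that were retrieved, which is the "inspectable" property the scenarios emphasize.
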

## 2. Keep current state separate from the roadmap

**Conversation turn**

> yeah good point about resource overhead, we set CPU limits for all sidecars and monitor with Prometheus now. no mTLS yet, but it’s on the roadmap for phase two. as for routing, we want to start with canary deployments and traffic splitting, maybe some basic fault injection for testing.

**Stored memory**

- Sidecar CPU limits set and monitored via Prometheus
- Istio mTLS planned for phase two
- Routing strategy: canary deployments and traffic splitting; basic fault injection planned

**Later question**

Did we already enable mTLS, and what rollout strategy are we planning?

**Answer from memory**

mTLS is not enabled yet; it is planned for phase two. The rollout plan is canary deployments and traffic splitting, with basic fault injection planned.

**Why it matters**

Memory systems often blur the current state with the planned state. This is the kind of distinction that matters in deployment and incident work.

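
One way to keep this distinction explicit is to tag each record with a status and let the question decide which status to read. The sketch below assumes a hypothetical `status` field with `"current"` and `"planned"` values, plus hypothetical `by_status` and `is_enabled` helpers; nothing in the release says PRISM-Memory stores records this way.

```python
from dataclasses import dataclass
from typing import Literal

Status = Literal["current", "planned"]


@dataclass(frozen=True)
class Fact:
    """Hypothetical memory record: a proposition plus whether it is true now or only planned."""
    text: str
    status: Status


# Records mirroring the stored memory above, with an assumed explicit status tag.
facts = [
    Fact("Sidecar CPU limits set and monitored via Prometheus", "current"),
    Fact("Istio mTLS for phase two", "planned"),
    Fact("Routing via canary deployments and traffic splitting", "planned"),
    Fact("Basic fault injection for testing", "planned"),
]


def by_status(records: list[Fact], status: Status) -> list[str]:
    """Return the texts of records with the requested status, in insertion order."""
    return [f.text for f in records if f.status == status]


def is_enabled(records: list[Fact], keyword: str) -> bool:
    """True only if a current record mentions the keyword; planned records never count."""
    needle = keyword.lower()
    return any(needle in text.lower() for text in by_status(records, "current"))


print(is_enabled(facts, "mTLS"))  # False: mTLS appears only as a planned record
print(by_status(facts, "planned"))
```

Keeping status as data rather than leaving it implicit in wording ("planned", "yet", "on the roadmap") means the "did we already enable X?" question reduces to a filter instead of a judgment over free text.
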

## 3. Answer dated questions instead of only remembering themes

**Question**

Which hobby did Sam take up in May 2023?

**Retrieved memory**

- Sam: [18 May 2023] Sam is considering trying painting as a new hobby.
- Sam: [24 May 2023] Sam has been considering trying painting as a new hobby.

**Answer from memory**

painting

**Why it matters**

A useful memory system should not just remember that someone talked about hobbies. It should recover the dated fact that actually answers the later question.

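
The retrieved propositions above carry a `[D Month YYYY]` prefix. A minimal sketch of recovering that date and keeping only propositions from the month the question names, assuming the `Speaker: [D Month YYYY] text` shape shown above is stable; the regex and the `parse_dated`/`in_month` helpers are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed record shape, taken from the examples above: "Speaker: [D Month YYYY] text".
DATED = re.compile(r"^(?P<speaker>[^:]+):\s*\[(?P<when>\d{1,2} [A-Za-z]+ \d{4})\]\s*(?P<text>.+)$")


def parse_dated(line: str) -> Optional[tuple[str, date, str]]:
    """Split a dated proposition into (speaker, date, text); return None if it has no date."""
    m = DATED.match(line)
    if m is None:
        return None
    # %B matches full month names such as "May" in the default C/English locale.
    when = datetime.strptime(m.group("when"), "%d %B %Y").date()
    return m.group("speaker").strip(), when, m.group("text").strip()


def in_month(lines: list[str], year: int, month: int) -> list[tuple[str, date, str]]:
    """Keep only propositions dated inside the requested calendar month."""
    hits = []
    for line in lines:
        parsed = parse_dated(line)
        if parsed is not None and parsed[1].year == year and parsed[1].month == month:
            hits.append(parsed)
    return hits


memory = [
    "Sam: [18 May 2023] Sam is considering trying painting as a new hobby.",
    "Sam: [24 May 2023] Sam has been considering trying painting as a new hobby.",
]
for speaker, when, text in in_month(memory, 2023, 5):
    print(when.isoformat(), speaker, text)
```

This only narrows the evidence to the right time window; turning those propositions into the short answer "painting" is still the reader model's job.
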

## 4. Refuse unsupported claims instead of inventing a reason

**Question**

Why did Dave get his guitar customized with a shiny finish?

**Retrieved memory**

- Dave: That guitar has a gorgeous purple hue. Why did you make it so shiny?
- Good pick! The customized purple glow gives it a unique look that really stands out.
- Dave: The guitar was in bad condition when Dave found it.

**Answer from memory**

None / unsupported

**Why it matters**

Memory systems are more useful when they can refuse cleanly. Here the retrieved context talks about the guitar and the finish, but it never actually supports the premise that Dave customized it for a specific reason.

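
A toy version of the refuse-instead-of-invent contract: answer a why-question only when a single retrieved proposition mentions every part of the premise and states a reason, and otherwise return `None`. The `supported_reason` helper and its keyword test are hypothetical stand-ins for the model's own judgment and are far cruder than a learned extractor; the sketch only illustrates returning `None` for an unsupported premise.

```python
from typing import Optional

# Hypothetical surface cues for an explicit reason.
REASON_MARKERS = ("because", "so that", "in order to", "due to")


def supported_reason(premise_terms: list[str], retrieved: list[str]) -> Optional[str]:
    """Return the first proposition that mentions every premise term and a reason marker.

    Returns None when no single proposition supports the premise, instead of
    stitching a reason together from separate, unrelated snippets.
    """
    for prop in retrieved:
        lowered = prop.lower()
        has_premise = all(term.lower() in lowered for term in premise_terms)
        has_reason = any(marker in lowered for marker in REASON_MARKERS)
        if has_premise and has_reason:
            return prop
    return None


retrieved = [
    "Dave: That guitar has a gorgeous purple hue. Why did you make it so shiny?",
    "Good pick! The customized purple glow gives it a unique look that really stands out.",
    "Dave: The guitar was in bad condition when Dave found it.",
]
print(supported_reason(["guitar", "shiny", "custom"], retrieved))  # None
```

Requiring support from a single proposition is the key design choice: each snippet above touches part of the premise, but none ties the guitar, the shiny finish, and a reason together.
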
docs/release/technical-blog.md
CHANGED

```diff
@@ -6,6 +6,9 @@
 dialogue into proposition-level memory and retrieves it with an inspectable
 hybrid stack.
 
+The point is not that a 7B model chats well. The point is that a 7B open model
+can write memory records that another system can actually use later.
+
 This package now ships one public extraction skill and one public checkpoint:
 
 - **Checkpoint:** `exp15_sft_qwen7b_4ep`
@@ -17,6 +20,70 @@ The public hook is simple:
 
 **PRISM-Memory turns conversations into durable, searchable memory.**
 
+## Why This Is Useful In Practice
+
+A memory writer is only interesting if a later system can ask a pointed
+question and get back a useful answer without rereading the original chat. The
+public release artifacts already show that pattern.
+
+### 1. Keep hard limits and preferences available for later work
+
+The extractor can turn a single conversational turn into stable memory like:
+
+- GitHub Actions concurrency limit: `20` concurrent jobs
+- Snyk Slack notifications should be aggregated and concise
+
+That means a later system can answer:
+
+> What is our GitHub Actions concurrency limit, and how should Snyk alerts look?
+
+with:
+
+> `20` concurrent jobs. Alerts should be aggregated and concise.
+
+That is a real product use case. Teams mention constraints and preferences once,
+then expect downstream tools and agents to remember them.
+
+### 2. Keep current state separate from the roadmap
+
+The released extractor can also preserve the difference between what is true
+now and what is only planned:
+
+- sidecar CPU limits are already set and monitored
+- mTLS is planned for phase two
+- rollout strategy is canary deployments plus traffic splitting
+
+So a later question like:
+
+> Did we already enable mTLS, and what rollout strategy are we planning?
+
+can be answered without confusing the present state with the future plan.
+
+This is a core memory problem, not a style problem. Chat history tends to blur
+these states together.
+
+### 3. Answer dated questions with dated evidence
+
+One confirmed held-out benchmark case asks:
+
+> Which hobby did Sam take up in May 2023?
+
+The retrieved memory contains explicit dated propositions about Sam trying
+painting in May 2023, and the released system answers:
+
+> painting
+
+That matters because the useful behavior is not “remember that hobbies were
+discussed.” The useful behavior is “recover the dated fact that actually
+answers the later question.”
+
+There is a fourth practical behavior that matters too: refusal. On the held-out
+adversarial guitar case, the released model returns `None` instead of inventing
+a reason for an unsupported premise. That is also part of being useful.
+
+For the compact scenario version of this story, see
+[memory-scenarios.md](memory-scenarios.md).
+
 ## What The Repo Actually Contributed
 
 The core contribution is not another opaque memory model. The repo showed that a
```

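
The blog's context lines say PRISM-Memory retrieves with an "inspectable hybrid stack", but this diff does not say how the stack combines its signals. Purely as an illustration of one common way to fuse a lexical ranking with a dense one, here is reciprocal rank fusion (RRF); whether PRISM-Memory uses RRF, the `k=60` constant, and the record IDs below are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of record IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings of three memory records from two retrievers.
lexical = ["m2", "m1", "m3"]  # e.g. a keyword/BM25-style index
dense = ["m1", "m3", "m2"]    # e.g. an embedding index
for doc_id, score in reciprocal_rank_fusion([lexical, dense]):
    print(doc_id, round(score, 5))
```

RRF uses only rank positions, so every fused score decomposes into one term per retriever, which keeps the ranking easy to inspect record by record.
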