# PRISM-Memory Release Results
This page summarizes the confirmed public release metrics and the internal
comparison evidence that informed the release choice.
## Released Model
- Model: `PRISM-Memory 7B Adapter`
- Base model: `Qwen/Qwen2.5-7B-Instruct`
- Adapter type: LoRA
- Confirmed LoCoMo mean: `0.4981204463`
- Confirmed LongMemEval mean: `0.4767574431`
- QA cache hits during confirmation: `460`
- QA cache misses during confirmation: `0`
## Public Comparison
PRISM-Memory fine-tunes `Qwen/Qwen2.5-7B-Instruct` for the memory-extraction
step, which the PropMem reference performs with GPT-4.1.
| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference | Result |
|---|---:|---:|---|
| LongMemEval | `0.4768` | `0.4650` | PRISM leads by `0.0118` |
| LoCoMo | `0.4981` | `0.5360` | PRISM trails by `0.0379` but stays competitive |
The question-answering (QA) layer is held constant across both systems, so this
compares only the extraction step. It is not a claim that PRISM-Memory replaces
GPT-4.1 end to end.
## LoCoMo Breakdown
| Category | Score |
|---|---:|
| factual | `0.3339551926` |
| temporal | `0.4978785870` |
| inferential | `0.2605997475` |
| multi-hop | `0.5144477744` |
| adversarial | `0.8837209302` |
## LongMemEval Breakdown
| Category | Score |
|---|---:|
| knowledge-update | `0.5588405797` |
| multi-session | `0.1390977444` |
| single-session-assistant | `0.7656395892` |
| single-session-preference | `0.0519667456` |
| single-session-user | `0.9133333333` |
| temporal-reasoning | `0.4316666667` |
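The confirmed means in the Released Model section appear to be unweighted (macro) averages of these category scores. Each category counts once, regardless of how many questions it contains. The sketch below recomputes them from the two breakdown tables to check this reading. The dictionary names and the `macro_mean` helper are illustrative and are not part of any released tooling.

```python
# Category scores copied from the LoCoMo and LongMemEval breakdown tables.
LOCOMO = {
    "factual": 0.3339551926,
    "temporal": 0.4978785870,
    "inferential": 0.2605997475,
    "multi-hop": 0.5144477744,
    "adversarial": 0.8837209302,
}

LONGMEMEVAL = {
    "knowledge-update": 0.5588405797,
    "multi-session": 0.1390977444,
    "single-session-assistant": 0.7656395892,
    "single-session-preference": 0.0519667456,
    "single-session-user": 0.9133333333,
    "temporal-reasoning": 0.4316666667,
}


def macro_mean(scores: dict[str, float]) -> float:
    """Unweighted mean over categories: each category contributes equally."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(f"LoCoMo macro mean:      {macro_mean(LOCOMO):.10f}")
    print(f"LongMemEval macro mean: {macro_mean(LONGMEMEVAL):.10f}")
```

Both recomputed values match the confirmed LoCoMo (`0.4981204463`) and LongMemEval (`0.4767574431`) means to within about `1e-10`. This suggests, but does not prove, that the headline numbers are category averages. Under that reading, a weak category such as `single-session-preference` pulls the LongMemEval mean down as much as any other category, whatever its question count.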
## Why This Model Was Released
The closest internal runner-up nearly tied the released model on overall
LoCoMo, but it lost on the broader release profile:
- lower LongMemEval score: `0.4689`
- weaker adversarial precision
- less balanced behavior across the full evaluation surface
Question-level comparison on held-out LoCoMo:
- disagreements: `152 / 400`
- questions favoring PRISM-Memory: `56`
- questions favoring the runner-up: `52`
The split is close enough to make this a genuine head-to-head comparison, but it
gives the runner-up no clear edge that would justify releasing two public models.
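The source does not say whether any significance test was run on the question-level split. One standard way to judge whether 56 versus 52 is "close" is an exact two-sided sign test. This test treats each decided question as a fair coin flip under the hypothesis that neither model is better. The sketch below assumes that framing. It drops the remaining 44 of the 152 disagreements, which the source does not attribute to either model, in the same way a sign test drops ties. The `sign_test_p_value` helper is hypothetical.

```python
import math


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test under H0: each model wins a decided question with p = 0.5."""
    n = wins_a + wins_b  # only decided questions count; ties/unattributed are excluded
    k = max(wins_a, wins_b)
    # Upper-tail probability P(X >= k) for X ~ Binomial(n, 0.5).
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    # Double for a two-sided test; cap at 1.0 when the split is (nearly) even.
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    # Held-out LoCoMo counts: 56 questions favor PRISM-Memory, 52 favor the runner-up.
    p = sign_test_p_value(56, 52)
    print(f"two-sided sign-test p-value: {p:.3f}")
```

The resulting p-value is far above conventional thresholds such as 0.05. That is consistent with reading the head-to-head split as close, with neither model clearly winning at the question level.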
## Artifact Files
- [../../results/release_summary.json](../../results/release_summary.json)
- [../../results/release_model.json](../../results/release_model.json)
- [../../results/try_it_sessions.json](../../results/try_it_sessions.json)
- [../../results/internal_locomo_pairwise_diffs.json](../../results/internal_locomo_pairwise_diffs.json)
Related docs:
- [extraction-skill.md](extraction-skill.md)
- [extraction-examples.md](extraction-examples.md)
- [datasets.md](datasets.md)
- [model-card.md](model-card.md)