---
base_model: Qwen/Qwen2.5-7B-Instruct
base_model_relation: adapter
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
  - conversational-memory
  - information-extraction
  - long-context
  - peft
  - lora
  - qwen2.5
---

# PRISM-Memory

PRISM-Memory is a LoRA adapter that adapts `Qwen/Qwen2.5-7B-Instruct` to write proposition-level memory from dialogue. It is a memory-writing component, not a general chat model.

## Released model

- Model name: `PRISM-Memory 7B Adapter`
- Base model: `Qwen/Qwen2.5-7B-Instruct`
- Adapter type: `LoRA`

## What this release shows

- A 7B open model can replace GPT-4.1 for the extraction step in this memory pipeline.
- On the confirmed release surface, PRISM-Memory scores `0.4768` on LongMemEval and `0.4981` on LoCoMo.
- The GPT-4.1-based PropMem reference scores `0.4650` on LongMemEval and `0.5360` on LoCoMo.

This comparison holds the QA layer constant. It compares extractor against extractor, not a full end-to-end GPT-4.1 system.

## Why this is useful

- It keeps hard limits and preferences available for later workflow generation.
- It keeps current state separate from future plans.
- It supports dated recall and clean refusal on unsupported questions.

See [docs/release/memory-scenarios.md](docs/release/memory-scenarios.md) for compact end-to-end examples.

## Load the adapter

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "AsadIsmail/prism-memory"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
```

This repo contains adapter weights only. You still need the base model.

## Training data

PRISM-Memory was trained on **synthetic** multi-session memory conversations with **GPT-4.1-derived** memory-writing labels.
The public release does not use real user chat logs.

| Item | Count | Notes |
|---|---:|---|
| synthetic training conversations | `2,329` | multi-session conversations with inserts, updates, and deletes |
| synthetic held-out conversations | `584` | evaluation split used for held-out examples |
| supervised extraction examples | `100,427` | memory-writing labels derived from the synthetic corpus |
| released training subset | `20,000` | supervised examples used for the public adapter |

### Example training item

**Synthetic scenario**

- Domain: cloud infrastructure performance optimization
- Persona: senior cloud systems engineer at a fintech startup

**Synthetic user turn**

> Here’s the initial architecture outline: deploy microservices on AWS Fargate, use PostgreSQL 13 as the primary database, plan Kubernetes orchestration, use Redis for caching, and keep API latency under 50ms.

**Target memory records**

- Deploy microservices on AWS Fargate
- Orchestrate containers on a Kubernetes cluster (planned)
- Primary database: PostgreSQL 13
- Use Redis as an in-memory caching layer
- Latency target: API responses under 50ms

The release makes the dataset design, counts, and example records public. It does not bundle the full raw corpus files.

## Confirmed results

| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference |
|---|---:|---:|
| LongMemEval | `0.4768` | `0.4650` |
| LoCoMo | `0.4981` | `0.5360` |

The reproduced evaluation hit the cached QA surface exactly: `460` hits, `0` misses.

## Extraction examples

### Infrastructure bottlenecks stay structured

- Session date: `2025-01-04 15:34:00`
- Overlap score: `0.909`
- Note: Near-exact match on two operational facts from a single held-out turn.

**Turn**

> yeah, no real caching beyond basic Docker layer caching. Jenkins nodes have limited capacity, and we sometimes hit queue delays during peak commits.
**GPT-4.1 reference**

- No caching beyond basic Docker layer caching
- Jenkins nodes have limited capacity and experience queue delays during peak commits

**PRISM-Memory**

- No Docker caching beyond basic layer caching
- Jenkins nodes have limited capacity; peak commits cause queue delays

### Numeric constraints and preferences survive extraction

- Session date: `2025-03-01 15:07:00`
- Overlap score: `0.800`
- Note: The trained model keeps both the hard concurrency cap and the desired notification style.

**Turn**

> yeah, I think starting with incremental scans and parallel matrix jobs makes sense. We have 20 concurrent jobs max on GitHub Actions currently. Also want to keep Slack notifications from Snyk consistent with other pipeline alerts—aggregated and concise. Can you help draft the workflow?

**GPT-4.1 reference**

- GitHub Actions concurrency limit: 20 concurrent jobs
- Wants Snyk Slack notifications aggregated and concise, consistent with other pipeline alerts

**PRISM-Memory**

- GitHub Actions concurrency limit: 20 concurrent jobs
- Snyk Slack notifications should be aggregated and concise

More held-out examples live in [docs/release/extraction-examples.md](docs/release/extraction-examples.md).

## Bundled docs and artifacts

- [docs/release/datasets.md](docs/release/datasets.md)
- [docs/release/extraction-examples.md](docs/release/extraction-examples.md)
- [docs/release/extraction-skill.md](docs/release/extraction-skill.md)
- [docs/release/memory-scenarios.md](docs/release/memory-scenarios.md)
- [docs/release/release-results.md](docs/release/release-results.md)
- [docs/release/technical-blog.md](docs/release/technical-blog.md)
- [results/release_summary.json](results/release_summary.json)
- [results/extraction_examples.json](results/extraction_examples.json)
- [results/try_it_sessions.json](results/try_it_sessions.json)

## Demo

The companion Space is live at `https://huggingface.co/spaces/AsadIsmail/prism-memory`.
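The extraction examples earlier in this card report an overlap score against the GPT-4.1 reference. The release does not publish the exact formula, so purely as an illustration of what such a comparison can look like, here is a token-level Jaccard overlap between two sets of memory records. The `overlap` function and the tokenization choices are assumptions, not the released metric.

```python
def overlap(reference: list[str], predicted: list[str]) -> float:
    """Token-level Jaccard overlap between two sets of memory records.

    Illustrative only: the released evaluation may use a different metric.
    """
    def tokens(records: list[str]) -> set[str]:
        # Lowercased whitespace tokens pooled across all records.
        return {tok.lower() for rec in records for tok in rec.split()}

    ref, pred = tokens(reference), tokens(predicted)
    if not ref and not pred:
        return 1.0
    return len(ref & pred) / len(ref | pred)


reference = [
    "GitHub Actions concurrency limit: 20 concurrent jobs",
    "Wants Snyk Slack notifications aggregated and concise",
]
predicted = [
    "GitHub Actions concurrency limit: 20 concurrent jobs",
    "Snyk Slack notifications should be aggregated and concise",
]
print(round(overlap(reference, predicted), 3))
```

A set-based metric like this rewards preserving the facts while staying insensitive to rephrasing, which matches the behavior the held-out examples are meant to show.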
## Limitations

- This is a memory-writing component, not a general chat model.
- It is a LoRA adapter, not a standalone full checkpoint.
- The evaluation pipeline still uses a separate QA model to score retrieved memory.
- Temporal and inferential categories still trail stronger larger-model baselines.
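Because this is a memory-writing component, downstream code has to turn raw generations into stored records before a QA layer can use them. A minimal parsing sketch, assuming one bullet-prefixed record per line (an assumption; see `results/extraction_examples.json` for the release's actual record format):

```python
def parse_memory_records(generated: str) -> list[str]:
    """Extract bullet-style memory records from model output.

    Assumes one record per line prefixed with "- "; the adapter's
    actual output serialization may differ.
    """
    records = []
    for line in generated.splitlines():
        line = line.strip()
        if line.startswith("- "):
            records.append(line[2:].strip())
    return records


sample_output = """\
- GitHub Actions concurrency limit: 20 concurrent jobs
- Snyk Slack notifications should be aggregated and concise
"""
print(parse_memory_records(sample_output))
```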