---
title: Emma Assistant
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.9.1
app_file: Emma/app.py 
pinned: false
---

# EMMA – Empathetic Memory-Augmented Multi-layer Assistant
*(Research Prototype)*

**Empathetic, privacy-aware memory for psychologically informed conversational agents.**

This repository provides a reference implementation of a mobile-friendly, memory-augmented AI assistant inspired by the **EMMA architecture**. The system integrates **session**, **episodic**, and **semantic** memory layers, dynamic query classification, **LlamaIndex-based retrieval**, and a **Gradio** demonstration interface. The codebase includes the core application logic, data-processing scripts, memory indexing and retrieval components, the query classifier, and the evaluation tooling used in the research prototype.

> ⚠️ **Research prototype: not a clinical tool.**  
> This system is intended solely for research, experimentation, and controlled simulations. It must not be used for clinical diagnosis or treatment. See **Limitations & Safety** below.

---

## Table of Contents
- [EMMA β€” Empathetic Memory-Augmented Multi-layer Assistant](#emma--empathetic-memory-augmented-multi-layer-assistant)
  - [Table of Contents](#table-of-contents)
  - [Key Features](#key-features)
  - [Architecture (High Level)](#architecture-high-level)
  - [Personalized Response Generation Workflow](#personalized-response-generation-workflow)
    - [Step 1 – User Query](#step-1--user-query)
    - [Step 2 – Query Classification](#step-2--query-classification)
    - [Step 3 – Memory Routing](#step-3--memory-routing)
    - [Step 4 – Memory Retrieval](#step-4--memory-retrieval)
    - [Step 5 – Prompt Composition](#step-5--prompt-composition)
    - [Step 6 – Response Generation](#step-6--response-generation)
  - [Evaluation \& Metrics](#evaluation--metrics)
    - [Reproducibility](#reproducibility)
  - [Limitations \& Safety](#limitations--safety)


---

## Key Features

- **Three-tier memory architecture**:  
  Session memory (raw conversational transcripts), episodic memory (session summaries), and semantic memory (long-term traits and values).

- **Dynamic query classifier**:  
  Routes user queries to *Episodic*, *Semantic*, *Hybrid*, or *Unrelated* processing pipelines.

- **Privacy-aware retrieval**:  
  Uses **LlamaIndex** and vector-based indexing for efficient local semantic search.

- **Therapy-aligned prompt templates**:  
  Combines retrieved memory with therapist-inspired prompt scaffolding to promote emotionally aligned responses.

- **Gradio-based demo interface**:  
  Lightweight chat UI with access to memory summaries and session history.

- **Evaluation tooling**:  
  Scripts for measuring memory retrieval accuracy (quantitative) and running Likert-scale evaluations (qualitative).



---

## Architecture (High Level)

- **Indexing**: Episodic and semantic memory items are embedded and stored in vector indexes.
- **Routing**: A classifier determines which memory layer(s) should be queried. Hybrid queries may combine episodic and semantic retrieval.
- **Prompting**: Retrieved memory is merged into therapy-aware prompt templates prior to LLM invocation.
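
As a rough illustration of the indexing step, the sketch below keeps one in-memory index per memory layer and ranks entries by cosine similarity. It is not the project's LlamaIndex code: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `VectorIndex` is an illustrative class name.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorIndex:
    """Minimal in-memory index: stores (text, vector) pairs and ranks them by similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def query(self, text: str, top_k: int = 2) -> list[str]:
        q = embed(text)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [t for t, _ in ranked[:top_k]]

# One index per memory layer, mirroring the episodic/semantic split.
indexes = {"episodic": VectorIndex(), "semantic": VectorIndex()}
indexes["episodic"].add("Session 3: user felt anxious before a job interview")
indexes["episodic"].add("Session 5: user described a conflict with a roommate")
indexes["semantic"].add("User values honesty and prefers direct feedback")
print(indexes["episodic"].query("interview anxiety", top_k=1))
# ['Session 3: user felt anxious before a job interview']
```

With a real embedding model the ranking would reflect meaning rather than shared words; the per-layer split is what lets the router query episodic and semantic memory independently.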

---

## Personalized Response Generation Workflow

The figure below illustrates the end-to-end workflow used by **EMMA** to generate personalized, psychologically informed responses. The pipeline consists of six sequential stages:

![Personalized Response Generation Workflow](assets/emma_workflow.jpg)

### Step 1 – User Query
The interaction begins when the user submits a query, which may express a psychological concern, emotional state, or a general question.

### Step 2 – Query Classification
The user query, combined with a task-specific prompt, is passed to a language model (e.g., GPT-3.5) that acts as the query classifier. The query is assigned to one of the following categories:

- **Episodic**: Past experiences or events  
- **Semantic**: Stable traits, preferences, or beliefs  
- **Hybrid**: Requires both episodic and semantic context  
- **Unrelated**: No memory retrieval required
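
A minimal sketch of the classification call, assuming the model is asked to answer with a single label. The prompt wording, `build_classifier_prompt`, and `parse_label` are hypothetical; falling back to *Unrelated* on unparseable output is one conservative choice (no memory is injected), not necessarily what the prototype does.

```python
from enum import Enum

class MemoryType(Enum):
    EPISODIC = "Episodic"
    SEMANTIC = "Semantic"
    HYBRID = "Hybrid"
    UNRELATED = "Unrelated"

# Hypothetical task prompt; the prototype's actual wording may differ.
CLASSIFIER_PROMPT = (
    "Classify the user's message by the kind of memory needed to answer it.\n"
    "Episodic: past experiences or events.\n"
    "Semantic: stable traits, preferences, or beliefs.\n"
    "Hybrid: both episodic and semantic context.\n"
    "Unrelated: no memory retrieval required.\n"
    "Answer with exactly one label.\n\n"
    "Message: {query}\n"
    "Label:"
)

def build_classifier_prompt(query: str) -> str:
    """Fill the task prompt with the user's query before sending it to the LLM."""
    return CLASSIFIER_PROMPT.format(query=query)

def parse_label(llm_output: str) -> MemoryType:
    """Map the model's reply to a label; unparseable replies fall back to UNRELATED."""
    text = llm_output.strip().lower()
    for label in MemoryType:
        if text.startswith(label.value.lower()):
            return label
    return MemoryType.UNRELATED

print(build_classifier_prompt("Why do I always panic before exams?"))
print(parse_label("Hybrid."))  # MemoryType.HYBRID
```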

### Step 3 – Memory Routing
Based on the predicted memory type, the system determines which memory layer(s) should be accessed and forwards the query together with the memory label to the retrieval module.
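
One way to express the routing step is a lookup table from label to memory layers. The sketch assumes layer names `episodic` and `semantic` and treats session context as always available, so it is not routed; `ROUTES` and `route` are illustrative names.

```python
# Hypothetical routing table: which memory layers each label unlocks.
# Session context is assumed to be passed along separately, so it is not routed here.
ROUTES: dict[str, tuple[str, ...]] = {
    "Episodic": ("episodic",),
    "Semantic": ("semantic",),
    "Hybrid": ("episodic", "semantic"),
    "Unrelated": (),
}

def route(query: str, label: str) -> dict:
    """Bundle the query with its memory label and the layers the retriever should search."""
    layers = ROUTES.get(label, ())  # unknown labels retrieve nothing
    return {"query": query, "label": label, "layers": layers}

print(route("How did I handle the last argument with my sister?", "Hybrid"))
```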

### Step 4 – Memory Retrieval
EMMA leverages **LlamaIndex** to retrieve relevant memory chunks from its structured memory store, which includes:

- **Session memory** (short-term conversational context)
- **Episodic memory** (summarized past interactions)
- **Semantic memory** (long-term psychological attributes and behavioral patterns)
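
The retrieval step can be sketched as follows, assuming memory is held as plain lists of text chunks per layer. A word-overlap score stands in for LlamaIndex's vector similarity so the example runs without dependencies; `retrieve`, `overlap_score`, and the `recent_turns` parameter are hypothetical.

```python
def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str,
             layers: tuple[str, ...],
             store: dict[str, list[str]],
             session: list[str],
             top_k: int = 2,
             recent_turns: int = 3) -> dict[str, list[str]]:
    """Return recent session turns plus up to top_k relevant chunks from each routed layer."""
    result = {"session": session[max(0, len(session) - recent_turns):]}
    for layer in layers:
        ranked = sorted(store.get(layer, []), key=lambda c: overlap_score(query, c), reverse=True)
        # Keep only chunks that share at least one word with the query.
        result[layer] = [c for c in ranked[:top_k] if overlap_score(query, c) > 0]
    return result

store = {
    "episodic": ["user felt anxious before a job interview", "user adopted a cat"],
    "semantic": ["user prefers direct feedback", "user values honesty"],
}
session = ["hi", "i have another interview tomorrow", "i am nervous"]
print(retrieve("another job interview tomorrow", ("episodic",), store, session, top_k=1, recent_turns=1))
# {'session': ['i am nervous'], 'episodic': ['user felt anxious before a job interview']}
```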

### Step 5 – Prompt Composition
Retrieved memory content is merged with the user query using task-specific prompt templates designed to preserve emotional tone, maintain psychological coherence, and align responses with empathic counseling principles.
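
A sketch of prompt composition, assuming a single text template with one slot per memory layer. The template wording is hypothetical and only illustrates keeping memory clearly separated from the user's message; `compose_prompt` and `format_section` are illustrative names.

```python
# Hypothetical template; the wording is illustrative, not the prototype's actual prompt.
TEMPLATE = """You are a warm, supportive assistant guided by empathic counseling principles.
Use the memories below only if they are relevant, and do not invent details they do not contain.

Recent conversation:
{session}

Relevant past sessions:
{episodic}

Known traits and preferences:
{semantic}

User: {query}
Assistant:"""

def format_section(items: list[str]) -> str:
    """Render a memory layer as a bulleted list, or a placeholder when it is empty."""
    return "\n".join(f"- {item}" for item in items) if items else "- (none)"

def compose_prompt(query: str, memories: dict[str, list[str]]) -> str:
    """Merge retrieved memory for each layer and the user query into one prompt string."""
    return TEMPLATE.format(
        session=format_section(memories.get("session", [])),
        episodic=format_section(memories.get("episodic", [])),
        semantic=format_section(memories.get("semantic", [])),
        query=query,
    )

print(compose_prompt("I feel stuck at work again.", {"semantic": ["prefers direct feedback"]}))
```

Marking empty sections as `(none)` keeps the template shape fixed, which may make the model less inclined to fill missing slots with invented memories.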

### Step 6 – Response Generation
The composed prompt is forwarded to the language model (e.g., GPT-3.5), which generates a personalized, memory-informed, and emotionally aligned response.  
Optionally, a post-processing module may refine tone and safety to ensure therapeutic appropriateness.
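
The generation step can be kept model-agnostic by passing the model in as a plain text-in/text-out callable. In the sketch below, `generate_response`, `safety_filter`, `BLOCKED_PHRASES`, and `FALLBACK` are hypothetical; the keyword check is only a toy placeholder for the optional post-processing module, not a real safety mechanism.

```python
from typing import Callable, Optional

# Hypothetical examples of phrases a tone/safety pass might reject.
BLOCKED_PHRASES = ("you are diagnosed with", "stop taking your medication")
FALLBACK = ("I'm not able to give medical advice, but it may help to talk this "
            "through with a licensed professional.")

def safety_filter(reply: str) -> str:
    """Toy post-processor: replace replies containing blocked phrases with a safe fallback."""
    lowered = reply.lower()
    return FALLBACK if any(p in lowered for p in BLOCKED_PHRASES) else reply

def generate_response(prompt: str,
                      llm: Callable[[str], str],
                      post_process: Optional[Callable[[str], str]] = None) -> str:
    """Send the composed prompt to any text-in/text-out model, then optionally post-process."""
    reply = llm(prompt).strip()
    return post_process(reply) if post_process else reply

# Stand-in for a real model call (e.g. a chat-completion API wrapped as a function).
fake_llm = lambda prompt: "  That sounds really hard. What felt most stressful?  "
print(generate_response("...", fake_llm, safety_filter))
# That sounds really hard. What felt most stressful?
```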

**Privacy Note:**  
Since psychologically relevant information is abstracted into episodic and semantic memory, raw session transcripts can be periodically discarded. This reduces storage overhead while enhancing user privacy and data security.
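
A minimal sketch of this retention policy, assuming summarization and trait extraction are supplied as callables (for example, LLM calls). `MemoryStore` and `close_session` are hypothetical names; the key point is that the raw transcript is cleared only after it has been abstracted.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MemoryStore:
    session: list[str] = field(default_factory=list)   # raw conversational turns
    episodic: list[str] = field(default_factory=list)  # per-session summaries
    semantic: list[str] = field(default_factory=list)  # long-term traits and values

def close_session(store: MemoryStore,
                  summarize: Callable[[list[str]], str],
                  extract_traits: Callable[[list[str]], list[str]]) -> None:
    """Abstract the finished session into episodic/semantic memory, then drop the raw transcript."""
    if not store.session:
        return
    store.episodic.append(summarize(store.session))
    for trait in extract_traits(store.session):
        if trait not in store.semantic:  # skip traits that are already stored
            store.semantic.append(trait)
    store.session.clear()  # raw transcript is discarded only after abstraction

store = MemoryStore(session=["I dread Monday mornings", "Work has been stressful lately"])
close_session(store,
              summarize=lambda turns: f"{len(turns)} turns about work stress",
              extract_traits=lambda turns: ["finds the start of the work week stressful"])
print(store)
```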

---

## Evaluation & Metrics

The original prototype evaluation included:

- **Qualitative evaluation**:  
  90 prompts rated on a 5-point Likert scale across *Personalization*, *Continuity*, and *Empathy* dimensions.

- **Quantitative evaluation**:  
  Memory retrieval accuracy computed as the normalized mean of 5-point Likert scores.

- **Automatic metrics**:  
  Rubric-based assessment performed by a more capable LLM acting as the evaluator.
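
The normalization behind the retrieval-accuracy score is not spelled out here. Assuming min-max scaling of each 1 to 5 rating onto the range [0, 1], the metric over N rated prompts, where s_i is the rating for prompt i, would be:

$$
\mathrm{Acc} = \frac{1}{N} \sum_{i=1}^{N} \frac{s_i - 1}{4}, \qquad s_i \in \{1, 2, 3, 4, 5\}
$$

For example, ratings of 4, 5, and 3 give (0.75 + 1 + 0.5) / 3 = 0.75. Dividing each rating by 5 instead would be another plausible reading, yielding values in [0.2, 1].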

### Reproducibility
Evaluation can be reproduced by preparing:
- A test set of prompts linked to memory entries.
- Scripts comparing generated responses with ground-truth memory and computing Likert-aligned scores.
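
A possible starting point, assuming the test set is stored as JSON Lines with one prompt, its linked memory IDs, and per-dimension Likert ratings per record. The field names, `SAMPLE` data, and helper functions are hypothetical, and the normalization assumes min-max scaling, i.e. (score - 1) / 4.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: one JSON object per line.
SAMPLE = """\
{"prompt": "How did my interview go last time?", "memory_ids": ["ep-3"], "ratings": {"Personalization": 4, "Continuity": 5, "Empathy": 4}}
{"prompt": "What kind of feedback do I like?", "memory_ids": ["sem-1"], "ratings": {"Personalization": 5, "Continuity": 3, "Empathy": 4}}
"""

def load_test_set(text: str) -> list[dict]:
    """Parse JSON Lines, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def summarize_ratings(records: list[dict]) -> dict[str, dict[str, float]]:
    """Per-dimension mean Likert score and its min-max normalization to [0, 1]."""
    by_dim: dict[str, list[int]] = defaultdict(list)
    for rec in records:
        for dim, score in rec["ratings"].items():
            if not 1 <= score <= 5:
                raise ValueError(f"Likert score out of range: {score}")
            by_dim[dim].append(score)
    return {dim: {"mean": mean(s), "normalized": (mean(s) - 1) / 4} for dim, s in by_dim.items()}

print(summarize_ratings(load_test_set(SAMPLE)))
```

Keeping `memory_ids` in each record makes it possible to trace a low score back to the memory entry the prompt was meant to exercise.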

---

## Limitations & Safety

- This system is **not a clinical or diagnostic tool** and must not replace licensed mental health professionals.
- Automated evaluators and LLM judgments may be noisy or biased; safety-critical use cases require clinician oversight and human-in-the-loop validation.
- The system may occasionally hallucinate memory-grounded facts; retrieval traces should always be logged for auditing and debugging.
- See the associated paper for a detailed discussion of limitations and evaluation methodology.
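
To make the logging recommendation concrete, the sketch below writes one JSON line per turn with the query, the predicted memory label, and the retrieved chunks, using only the standard `logging` and `json` modules. The logger name, `log_retrieval_trace`, and the record fields are assumptions. Because these traces contain user content, they would need the same protection as the memory store itself.

```python
import json
import logging
import time

logger = logging.getLogger("emma.retrieval")  # hypothetical logger name

def log_retrieval_trace(query: str, label: str, retrieved: dict[str, list[str]]) -> str:
    """Log one JSON line per turn so responses can be audited against the memory they used."""
    record = {"ts": time.time(), "query": query, "label": label, "retrieved": retrieved}
    line = json.dumps(record, ensure_ascii=False)
    logger.info(line)
    return line

logging.basicConfig(level=logging.INFO, format="%(message)s")
log_retrieval_trace("How did my interview go?", "Episodic",
                    {"episodic": ["user felt anxious before a job interview"]})
```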