Spaces:

Keyvan1986
/

Emma

Running

App Files Files Community

Emma / README.md

Keyvan1986

Update README.md

20e0c2a verified 6 months ago

preview code

Raw

History Blame Contribute Delete

6.67 kB

	---
	title: Emma Assistant
	emoji: 🤖
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.9.1
	app_file: Emma/app.py
	pinned: false
	---

	# EMMA — Empathetic Memory-Augmented Multi-layer Assistant
	(Research Prototype)

	Empathetic, privacy-aware memory for psychologically informed conversational agents.

	This repository provides a reference implementation of a mobile-friendly, memory-augmented artificial intelligence assistant inspired by the EMMA architecture. The system integrates session, episodic, and semantic memory layers, dynamic query classification, LlamaIndex-based retrieval, and a Gradio demonstration interface. The codebase includes implementation logic, data processing scripts, memory indexing and retrieval components, query classification, and evaluation tooling used in the research prototype.

	> ⚠️ Research prototype — not a clinical tool.
	> This system is intended solely for research, experimentation, and controlled simulations. It must not be used for clinical diagnosis or treatment. See Limitations & Safety below.

	---

	## Table of Contents
	- [EMMA — Empathetic Memory-Augmented Multi-layer Assistant](#emma--empathetic-memory-augmented-multi-layer-assistant)
	- [Table of Contents](#table-of-contents)
	- [Key Features](#key-features)
	- [Architecture (High Level)](#architecture-high-level)
	- [Personalized Response Generation Workflow](#personalized-response-generation-workflow)
	- [Step 1 – User Query](#step-1--user-query)
	- [Step 2 – Query Classification](#step-2--query-classification)
	- [Step 3 – Memory Routing](#step-3--memory-routing)
	- [Step 4 – Memory Retrieval](#step-4--memory-retrieval)
	- [Step 5 – Prompt Composition](#step-5--prompt-composition)
	- [Step 6 – Response Generation](#step-6--response-generation)
	- [Evaluation \& Metrics](#evaluation--metrics)
	- [Reproducibility](#reproducibility)
	- [Limitations \& Safety](#limitations--safety)


	---

	## Key Features

	- Three-tier memory architecture:
	Session memory (raw conversational transcripts), episodic memory (session summaries), and semantic memory (long-term traits and values).

	- Dynamic query classifier:
	Routes user queries to Episodic, Semantic, Hybrid, or Unrelated processing pipelines.

	- Privacy-aware retrieval:
	Uses LlamaIndex and vector-based indexing for efficient local semantic search.

	- Therapy-aligned prompt templates:
	Combines retrieved memory with therapist-inspired prompt scaffolding to ensure emotional alignment.

	- Gradio-based demo interface:
	Lightweight chat UI with access to memory summaries and session history.

	- Evaluation tooling:
	Scripts supporting quantitative memory retrieval accuracy and qualitative Likert-scale evaluation pipelines.



	---

	## Architecture (High Level)

	- Indexing: Episodic and semantic memory items are embedded and stored in vector indexes.
	- Routing: A classifier determines which memory layer(s) should be queried. Hybrid queries may combine episodic and semantic retrieval.
	- Prompting: Retrieved memory is merged into therapy-aware prompt templates prior to LLM invocation.

	---

	## Personalized Response Generation Workflow

	The figure illustrates the end-to-end workflow used by EMMA for generating personalized and psychologically informed responses. The pipeline consists of six sequential stages:

	![Personalized Response Generation Workflow](assets/emma_workflow.jpg)

	### Step 1 – User Query
	The interaction begins when the user submits a query, which may express a psychological concern, emotional state, or a general question.

	### Step 2 – Query Classification
	The user query, combined with a task-specific prompt, is passed to a language model (e.g., GPT-3.5) acting as a query recognition mechanism. The query is classified into one of the following categories:

	- Episodic: Past experiences or events
	- Semantic: Stable traits, preferences, or beliefs
	- Hybrid: Requires both episodic and semantic context
	- Unrelated: No memory retrieval required

	### Step 3 – Memory Routing
	Based on the predicted memory type, the system determines which memory layer(s) should be accessed and forwards the query together with the memory label to the retrieval module.

	### Step 4 – Memory Retrieval
	EMMA leverages LlamaIndex to retrieve relevant memory chunks from its structured memory store, which includes:

	- Session memory (short-term conversational context)
	- Episodic memory (summarized past interactions)
	- Semantic memory (long-term psychological attributes and behavioral patterns)

	### Step 5 – Prompt Composition
	Retrieved memory content is merged with the user query using task-specific prompt templates designed to preserve emotional tone, maintain psychological coherence, and align responses with empathic counseling principles.

	### Step 6 – Response Generation
	The composed prompt is forwarded to the language model (e.g., GPT-3.5), which generates a personalized, memory-informed, and emotionally aligned response.
	Optionally, a post-processing module may refine tone and safety to ensure therapeutic appropriateness.

	Privacy Note:
	Since psychologically relevant information is abstracted into episodic and semantic memory, raw session transcripts can be periodically discarded. This reduces storage overhead while enhancing user privacy and data security.

	---

	## Evaluation & Metrics

	The original prototype evaluation included:

	- Qualitative evaluation:
	90 prompts rated on a 5-point Likert scale across Personalization, Continuity, and Empathy dimensions.

	- Quantitative evaluation:
	Memory retrieval accuracy computed as the normalized mean of 5-point Likert scores.

	- Automatic metrics:
	Automated rubric-based assessment using a stronger LLM evaluator.

	### Reproducibility
	Evaluation can be reproduced by preparing:
	- A test set of prompts linked to memory entries.
	- Scripts comparing generated responses with ground-truth memory and computing Likert-aligned scores.

	---

	## Limitations & Safety

	- This system is not a clinical or diagnostic tool and must not replace licensed mental health professionals.
	- Automated evaluators and LLM judgments may be noisy or biased; safety-critical use cases require clinician oversight and human-in-the-loop validation.
	- The system may occasionally hallucinate memory-grounded facts; retrieval traces should always be logged for auditing and debugging.
	- See the associated paper for a detailed discussion of limitations and evaluation methodology.