Episodic Memory
Source file: agent/memory.py
Overview
LifeStackMemory gives the agent access to a history of past decisions and their outcomes. On each new conflict, it retrieves the most relevant past high-reward decisions and injects them as few-shot context into the prompt. This is not RAG over a knowledge base β it's a replay buffer that shapes the agent's prior before the model even generates a token.
The 116.81% reward improvement reported in data/before_after_comparison.json compares the memory-augmented agent against the same model without memory context. It is not a trained-vs-untrained comparison.
Storage
Three ChromaDB collections, all in ./lifestack_memory/ by default:
| Collection | Content |
|---|---|
decisions |
Per-step action records: conflict title, action type, domain, reward, reasoning |
trajectories |
Per-episode summaries: task ID, route taken, total reward, milestone hits |
feedback |
Human outcome feedback: episode ID, effectiveness rating, improved/worsened domains |
memory = LifeStackMemory(path="./lifestack_memory")
# Persistent ChromaDB; falls back to in-memory client on path failure
On first initialization, if the decisions collection is empty, LifeStackMemory auto-hydrates from data/preseeded_memory*.json (3 partitioned files). This ensures the agent has something useful to retrieve on a fresh deployment without requiring prior training runs.
Embedding
LifeStackMemory uses SentenceTransformer('all-MiniLM-L6-v2') for 384-dim embeddings. If the model isn't available locally (local_files_only=True), it falls back to a deterministic hash-based embedding:
# Hash fallback: adler32 hash of each token, bucketed into 384 dimensions
for token in text.lower().split():
idx = zlib.adler32(token.encode()) % len(buckets)
buckets[idx] += 1.0
The hash fallback preserves semantic retrieval quality well enough for the lexically consistent LifeStack vocabulary (conflict titles, domain names, action types appear repeatedly).
Storing decisions
memory.store_decision(
conflict_title="Friday 6PM",
action_type="communicate",
target_domain="relationships",
reward=0.72,
metrics_snapshot={"relationships.romantic": 55, "mental_wellbeing.stress_level": 80},
reasoning="A quick call prevents relationship erosion during high-stress periods."
)
The stored text is f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}". This text is embedded and stored with the full metadata dict. Only the embedding is used at retrieval time.
Retrieving similar decisions
similar = memory.retrieve_similar(
conflict_title="The Perfect Storm",
current_metrics={"career.workload": 90, "mental_wellbeing.stress_level": 85},
n=3
)
Query construction: embeds f"{conflict_title} <top_3_most_stressed_metrics>" β the most stressed metrics are the 3 with lowest values after sorting current_metrics.items(). This grounds the query in the agent's current situation rather than just the conflict name.
Results are filtered to reward >= 0.05 before returning. The function retrieves n*2 candidates and selects the top n by cosine similarity after filtering.
Return format:
[{
"action_type": "communicate",
"target_domain": "relationships",
"reward": 0.72,
"reasoning": "...",
"similarity_score": 0.87,
...
}]
Few-shot prompt injection
few_shot = memory.build_few_shot_prompt("Friday 6PM", current_metrics)
# Output:
# --- PAST EXPERIENCE & HUMAN VERIFICATION ---
# - Action Taken: [COMMUNICATE] on RELATIONSHIPS
# Agent's Initial Reasoning: A quick call prevents relationship erosion...
# HUMAN FEEDBACK: Rated 8/10. Notes: Partner appreciated the transparency.
build_few_shot_prompt() calls retrieve_similar(), then for each retrieved decision checks whether its episode_id has stored feedback in the feedback collection. If feedback exists, it appends the effectiveness rating and unexpected effects as additional context. This is the mechanism that brings human feedback into the agent's prompt without any fine-tuning.
Trajectory storage and retrieval
memory.store_trajectory(
task_id="flight_crisis_task_main",
route_taken="rebook_premium",
total_reward=2.5,
trajectory_summary={"milestones_hit": ["m1"], "steps": 8}
)
similar_trajectories = memory.retrieve_similar_trajectories(
task_domain="flight_crisis",
current_world={"lounge_access": True, "flight_rebooked": False},
n=3
)
Trajectories are stored in the separate traj_collection. Query construction uses f"TaskDomain: {task_domain} <top_3_most_stressed_world_values>". These are surfaced to the agent in LifeStackAgent.plan() but are not currently injected into the main GRPO training prompt (the training prompt uses decisions only).
Human feedback storage
from core.feedback import OutcomeFeedback
from datetime import datetime
feedback = OutcomeFeedback(
episode_id="ep_12345",
overall_effectiveness=8,
domains_improved=["relationships", "mental_wellbeing"],
domains_worsened=[],
unexpected_effects="Partner called back and offered help with finances.",
resolution_time_hours=2.5
)
memory.store_feedback(feedback)
Stored at doc ID f"fb_{episode_id}". Retrieved by reward_human_feedback_fn during GRPO training via embedding similarity on the prompt text. This is what closes the loop between real-world outcomes and training signal β if a human reports that the agent's relationship actions worked well, future training batches for similar conflicts will reward those actions more.
Memory stats
stats = memory.get_stats()
# {"total_memories": 145, "average_reward": 0.623, "by_action_type": {"communicate": 38, ...}}
Related files
agent/agent.pyβLifeStackAgentusesLifeStackMemory.build_few_shot_prompt()core/feedback.pyβOutcomeFeedbackdataclass,compute_human_feedback_reward()scripts/train_trl.pyβreward_human_feedback_fnqueries the feedback collection during trainingdata/preseeded_memory*.jsonβ initial hydration data (decisions collection)