# Episodic Memory

**Source file:** `agent/memory.py`

---

## Overview

`LifeStackMemory` gives the agent access to a history of past decisions and their outcomes. On each new conflict, it retrieves the most relevant past high-reward decisions and injects them as few-shot context into the prompt. This is not RAG over a knowledge base; it is a replay buffer that shapes the agent's prior before the model even generates a token.

The 116.81% reward improvement reported in `data/before_after_comparison.json` compares the memory-augmented agent against the same model without memory context. It is not a trained-vs-untrained comparison.

---

## Storage

Three ChromaDB collections, all in `./lifestack_memory/` by default:

| Collection | Content |
|------------|---------|
| `decisions` | Per-step action records: conflict title, action type, domain, reward, reasoning |
| `trajectories` | Per-episode summaries: task ID, route taken, total reward, milestone hits |
| `feedback` | Human outcome feedback: episode ID, effectiveness rating, improved/worsened domains |

```python
memory = LifeStackMemory(path="./lifestack_memory")
# Persistent ChromaDB; falls back to in-memory client on path failure
```

On first initialization, if the `decisions` collection is empty, `LifeStackMemory` auto-hydrates from `data/preseeded_memory*.json` (3 partitioned files). This ensures the agent has something useful to retrieve on a fresh deployment without requiring prior training runs.

---

## Embedding

`LifeStackMemory` uses `SentenceTransformer('all-MiniLM-L6-v2')` for 384-dim embeddings.
If the model isn't available locally (`local_files_only=True`), it falls back to a deterministic hash-based embedding:

```python
import zlib

# Hash fallback: adler32 hash of each token, bucketed into 384 dimensions
buckets = [0.0] * 384
for token in text.lower().split():  # `text` is the string being embedded
    idx = zlib.adler32(token.encode()) % len(buckets)
    buckets[idx] += 1.0
```

The hash fallback matches on shared tokens rather than meaning, but retrieval quality holds up well enough because the LifeStack vocabulary is lexically consistent (conflict titles, domain names, and action types appear repeatedly).

---

## Storing decisions

```python
memory.store_decision(
    conflict_title="Friday 6PM",
    action_type="communicate",
    target_domain="relationships",
    reward=0.72,
    metrics_snapshot={"relationships.romantic": 55, "mental_wellbeing.stress_level": 80},
    reasoning="A quick call prevents relationship erosion during high-stress periods."
)
```

The stored text is `f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}"`. This text is embedded and stored with the full metadata dict. Only the embedding is used for similarity matching at retrieval time.

---

## Retrieving similar decisions

```python
similar = memory.retrieve_similar(
    conflict_title="The Perfect Storm",
    current_metrics={"career.workload": 90, "mental_wellbeing.stress_level": 85},
    n=3
)
```

Query construction: embeds `f"{conflict_title} "` together with the most stressed metrics, taken as the 3 with the lowest values after sorting `current_metrics.items()`. This grounds the query in the agent's current situation rather than just the conflict name.

Results are filtered to `reward >= 0.05` before returning. The function retrieves `n*2` candidates and selects the top `n` by cosine similarity after filtering.

Return format:

```python
[{
    "action_type": "communicate",
    "target_domain": "relationships",
    "reward": 0.72,
    "reasoning": "...",
    "similarity_score": 0.87,
    ...
}]
```

---

## Few-shot prompt injection

```python
few_shot = memory.build_few_shot_prompt("Friday 6PM", current_metrics)
# Output:
# --- PAST EXPERIENCE & HUMAN VERIFICATION ---
# - Action Taken: [COMMUNICATE] on RELATIONSHIPS
#   Agent's Initial Reasoning: A quick call prevents relationship erosion...
#   HUMAN FEEDBACK: Rated 8/10. Notes: Partner appreciated the transparency.
```

`build_few_shot_prompt()` calls `retrieve_similar()`, then for each retrieved decision checks whether its `episode_id` has stored feedback in the `feedback` collection. If feedback exists, it appends the effectiveness rating and unexpected effects as additional context. This is the mechanism that brings human feedback into the agent's prompt without any fine-tuning.

---

## Trajectory storage and retrieval

```python
memory.store_trajectory(
    task_id="flight_crisis_task_main",
    route_taken="rebook_premium",
    total_reward=2.5,
    trajectory_summary={"milestones_hit": ["m1"], "steps": 8}
)

similar_trajectories = memory.retrieve_similar_trajectories(
    task_domain="flight_crisis",
    current_world={"lounge_access": True, "flight_rebooked": False},
    n=3
)
```

Trajectories are stored in the separate `traj_collection`. Query construction uses `f"TaskDomain: {task_domain} "`. Retrieved trajectories are surfaced to the agent in `LifeStackAgent.plan()` but are not currently injected into the main GRPO training prompt (the training prompt uses `decisions` only).

---

## Human feedback storage

```python
from core.feedback import OutcomeFeedback
from datetime import datetime

feedback = OutcomeFeedback(
    episode_id="ep_12345",
    overall_effectiveness=8,
    domains_improved=["relationships", "mental_wellbeing"],
    domains_worsened=[],
    unexpected_effects="Partner called back and offered help with finances.",
    resolution_time_hours=2.5
)
memory.store_feedback(feedback)
```

Feedback is stored under doc ID `f"fb_{episode_id}"`. It is retrieved by `reward_human_feedback_fn` during GRPO training via embedding similarity on the prompt text.
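
The similarity-based lookup can be illustrated without ChromaDB. The following is a minimal sketch under stated assumptions, not the code in `scripts/train_trl.py`: it reuses the adler32 hash fallback from the Embedding section, keeps feedback texts in a plain dict keyed by `fb_{episode_id}`, and returns the entry closest to the prompt by cosine similarity. The names `hash_embed`, `cosine`, `nearest_feedback`, and the sample feedback texts are hypothetical and exist only for this example.

```python
import math
import zlib

DIM = 384  # matches the 384-dim embedding size used by LifeStackMemory


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Bag-of-tokens embedding: each token increments one adler32-selected bucket."""
    buckets = [0.0] * dim
    for token in text.lower().split():
        buckets[zlib.adler32(token.encode()) % dim] += 1.0
    return buckets


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_feedback(prompt: str, store: dict[str, str]) -> tuple[str, float]:
    """Return (doc_id, similarity) of the stored feedback text closest to the prompt."""
    query = hash_embed(prompt)
    scored = ((doc_id, cosine(query, hash_embed(text))) for doc_id, text in store.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical in-memory stand-in for the `feedback` collection.
feedback_store = {
    "fb_ep_12345": "relationships conflict partner communicate call",
    "fb_ep_67890": "flight crisis rebook premium lounge",
}

doc_id, score = nearest_feedback("partner conflict on friday evening", feedback_store)
print(doc_id, round(score, 3))
```

Because the prompt shares the tokens `partner` and `conflict` with the first entry and none with the second, the first entry wins. With the real sentence-transformer embeddings, paraphrases without shared tokens could also match.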
This is what closes the loop between real-world outcomes and training signal: if a human reports that the agent's relationship actions worked well, future training batches for similar conflicts will reward those actions more.

---

## Memory stats

```python
stats = memory.get_stats()
# {"total_memories": 145, "average_reward": 0.623, "by_action_type": {"communicate": 38, ...}}
```

---

## Related files

- `agent/agent.py`: `LifeStackAgent` uses `LifeStackMemory.build_few_shot_prompt()`
- `core/feedback.py`: `OutcomeFeedback` dataclass, `compute_human_feedback_reward()`
- `scripts/train_trl.py`: `reward_human_feedback_fn` queries the feedback collection during training
- `data/preseeded_memory*.json`: initial hydration data (decisions collection)
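
---

## Retrieval filtering sketch

The over-fetch, filter, and rank step described under "Retrieving similar decisions" can be sketched in plain Python. This is an illustrative assumption about how the step might look, not the code in `agent/memory.py`: the helper name `filter_and_rank`, the constant `MIN_REWARD`, and the sample candidates (including the action types `delegate`, `rest`, and `negotiate`) are hypothetical. Only the `0.05` reward floor, the `n*2` candidate count, and the top-`n` selection by similarity come from the description.

```python
from typing import Any

MIN_REWARD = 0.05  # reward floor stated in the retrieval section


def filter_and_rank(candidates: list[dict[str, Any]], n: int) -> list[dict[str, Any]]:
    """Drop low-reward candidates, then keep the n most similar of the rest."""
    kept = [c for c in candidates if c["reward"] >= MIN_REWARD]
    kept.sort(key=lambda c: c["similarity_score"], reverse=True)
    return kept[:n]


# Hypothetical n*2 = 4 candidates, as a vector query with n = 2 might return.
candidates = [
    {"action_type": "communicate", "reward": 0.72, "similarity_score": 0.87},
    {"action_type": "delegate", "reward": 0.01, "similarity_score": 0.91},
    {"action_type": "rest", "reward": 0.40, "similarity_score": 0.65},
    {"action_type": "negotiate", "reward": 0.30, "similarity_score": 0.80},
]

top = filter_and_rank(candidates, n=2)
print([c["action_type"] for c in top])  # ['communicate', 'negotiate']
```

The most similar candidate (`delegate`) is discarded because its reward falls below the floor. Fetching twice as many candidates as needed makes it more likely that `n` results survive the filter.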