# Episodic Memory
**Source file:** `agent/memory.py`
---
## Overview
`LifeStackMemory` gives the agent access to a history of past decisions and their outcomes. On each new conflict, it retrieves the most relevant past high-reward decisions and injects them as few-shot context into the prompt. This is not RAG over a knowledge base; it's a replay buffer that shapes the agent's prior before the model even generates a token.
The 116.81% reward improvement reported in `data/before_after_comparison.json` compares the memory-augmented agent against the same model without memory context. It is not a trained-vs-untrained comparison.
---
## Storage
Three ChromaDB collections, all in `./lifestack_memory/` by default:
| Collection | Content |
|------------|---------|
| `decisions` | Per-step action records: conflict title, action type, domain, reward, reasoning |
| `trajectories` | Per-episode summaries: task ID, route taken, total reward, milestone hits |
| `feedback` | Human outcome feedback: episode ID, effectiveness rating, improved/worsened domains |
```python
memory = LifeStackMemory(path="./lifestack_memory")
# Persistent ChromaDB; falls back to in-memory client on path failure
```
On first initialization, if the `decisions` collection is empty, `LifeStackMemory` auto-hydrates from `data/preseeded_memory*.json` (3 partitioned files). This ensures the agent has something useful to retrieve on a fresh deployment without requiring prior training runs.
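The hydration rule above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/memory.py`: the helper names `load_preseeded` and `hydrate_if_empty`, and the assumption that each partition file holds a JSON list of records, are hypothetical. The demo writes throwaway files to a temporary directory rather than reading the real `data/` folder.

```python
import glob
import json
import os
import tempfile


def load_preseeded(pattern: str) -> list[dict]:
    """Read every file matching the glob pattern (assumed: each holds a JSON list)."""
    records: list[dict] = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            records.extend(json.load(fh))
    return records


def hydrate_if_empty(existing_count: int, pattern: str) -> list[dict]:
    """Return seed records only when the collection currently holds nothing."""
    return load_preseeded(pattern) if existing_count == 0 else []


# Demo with three hypothetical partition files.
with tempfile.TemporaryDirectory() as tmp:
    for i, reward in enumerate([0.72, 0.41, 0.55], start=1):
        path = os.path.join(tmp, f"preseeded_memory_{i}.json")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump([{"conflict_title": f"demo conflict {i}", "reward": reward}], fh)
    pattern = os.path.join(tmp, "preseeded_memory*.json")
    seeded = hydrate_if_empty(existing_count=0, pattern=pattern)
    skipped = hydrate_if_empty(existing_count=145, pattern=pattern)
```

Gating on the current collection size keeps hydration idempotent: a second start-up against an already populated store adds nothing.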
---
## Embedding
`LifeStackMemory` uses `SentenceTransformer('all-MiniLM-L6-v2')` for 384-dim embeddings. The model is loaded with `local_files_only=True`; if it isn't available locally, the class falls back to a deterministic hash-based embedding:
```python
import zlib

# Hash fallback: adler32 hash of each token, bucketed into 384 dimensions
buckets = [0.0] * 384
for token in text.lower().split():  # `text` is the string being embedded
    idx = zlib.adler32(token.encode()) % len(buckets)
    buckets[idx] += 1.0
```
The hash fallback is purely lexical and carries no semantic similarity, but it preserves retrieval quality well enough for the lexically consistent LifeStack vocabulary (conflict titles, domain names, and action types appear repeatedly).
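To see why token overlap is enough here, the fallback can be wrapped in a function and compared with cosine similarity. This is a sketch under assumptions: the names `hash_embed` and `cosine` are illustrative, and the real class may normalize or compare vectors differently.

```python
import math
import zlib


def hash_embed(text: str, dim: int = 384) -> list[float]:
    """Bag-of-tokens vector: each token increments one adler32-selected bucket."""
    buckets = [0.0] * dim
    for token in text.lower().split():
        idx = zlib.adler32(token.encode()) % dim
        buckets[idx] += 1.0
    return buckets


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two decisions that share most tokens, and one unrelated text.
a = hash_embed("Friday 6PM Action: communicate Domain: relationships")
b = hash_embed("Friday 6PM Action: rest Domain: relationships")
c = hash_embed("quarterly tax filing deadline")

print(cosine(a, b), cosine(a, c))  # shared tokens push the first score well above the second
```

Because similarity comes only from shared tokens, synonyms or paraphrases score no better than unrelated text; the approach works only as long as stored and queried strings reuse the same vocabulary.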
---
## Storing decisions
```python
memory.store_decision(
    conflict_title="Friday 6PM",
    action_type="communicate",
    target_domain="relationships",
    reward=0.72,
    metrics_snapshot={"relationships.romantic": 55, "mental_wellbeing.stress_level": 80},
    reasoning="A quick call prevents relationship erosion during high-stress periods."
)
```
The stored text is `f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}"`. This text is embedded and stored with the full metadata dict. Only the embedding is used for similarity matching at retrieval time; the metadata travels with each match.
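As a concrete illustration, the template can be applied to the call shown earlier. The helper name `build_decision_document` is hypothetical; only the f-string itself comes from the description above.

```python
def build_decision_document(
    conflict_title: str,
    action_type: str,
    target_domain: str,
    reward: float,
    reasoning: str,
) -> str:
    """Compose the text that gets embedded; reasoning is cut to 100 characters."""
    return (
        f"{conflict_title} Action: {action_type} Domain: {target_domain} "
        f"Reward: {reward:.2f} {reasoning[:100]}"
    )


doc = build_decision_document(
    "Friday 6PM",
    "communicate",
    "relationships",
    0.72,
    "A quick call prevents relationship erosion during high-stress periods.",
)
# A long reasoning string is truncated, so the embedded text stays short.
long_doc = build_decision_document("Friday 6PM", "communicate", "relationships", 0.5, "x" * 250)
```

Note that the reward is part of the embedded text, so two otherwise identical decisions with different rewards produce slightly different documents.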
---
## Retrieving similar decisions
```python
similar = memory.retrieve_similar(
    conflict_title="The Perfect Storm",
    current_metrics={"career.workload": 90, "mental_wellbeing.stress_level": 85},
    n=3
)
```
Query construction: embeds `f"{conflict_title} <top_3_most_stressed_metrics>"`, where the most stressed metrics are the 3 with lowest values after sorting `current_metrics.items()`. This grounds the query in the agent's current situation rather than just the conflict name.
Results are filtered to `reward >= 0.05` before returning. The function retrieves `n*2` candidates and selects the top `n` by cosine similarity after filtering.
Return format:
```python
[{
    "action_type": "communicate",
    "target_domain": "relationships",
    "reward": 0.72,
    "reasoning": "...",
    "similarity_score": 0.87,
    ...
}]
```
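The query construction and post-filtering described in this section can be sketched without ChromaDB. Several details are assumptions: `build_query` and `select_top` are illustrative names, the `name=value` rendering of metrics is a guess at the placeholder's format, and `finances.liquidity` plus the `delegate`, `rest`, and `negotiate` action types are made up for the demo.

```python
def build_query(conflict_title: str, current_metrics: dict[str, float], k: int = 3) -> str:
    """Append the k lowest-valued metrics to the conflict title."""
    stressed = sorted(current_metrics.items(), key=lambda kv: kv[1])[:k]
    stressed_text = " ".join(f"{name}={value}" for name, value in stressed)
    return f"{conflict_title} {stressed_text}"


def select_top(candidates: list[dict], n: int, min_reward: float = 0.05) -> list[dict]:
    """Drop low-reward candidates, then keep the n most similar of the rest."""
    kept = [c for c in candidates if c["reward"] >= min_reward]
    kept.sort(key=lambda c: c["similarity_score"], reverse=True)
    return kept[:n]


metrics = {
    "career.workload": 90,
    "mental_wellbeing.stress_level": 85,
    "relationships.romantic": 55,
    "finances.liquidity": 40,  # hypothetical metric, added so the top-3 cut is visible
}
query = build_query("The Perfect Storm", metrics)

# Pretend the vector store returned n*2 = 4 candidates for n = 2.
candidates = [
    {"action_type": "communicate", "reward": 0.72, "similarity_score": 0.87},
    {"action_type": "delegate", "reward": 0.01, "similarity_score": 0.95},
    {"action_type": "rest", "reward": 0.40, "similarity_score": 0.60},
    {"action_type": "negotiate", "reward": 0.05, "similarity_score": 0.75},
]
top = select_top(candidates, n=2)
```

Over-fetching `n*2` candidates leaves headroom for the reward filter: here the most similar candidate is discarded for its low reward, and two usable results still remain.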
---
## Few-shot prompt injection
```python
few_shot = memory.build_few_shot_prompt("Friday 6PM", current_metrics)
# Output:
# --- PAST EXPERIENCE & HUMAN VERIFICATION ---
# - Action Taken: [COMMUNICATE] on RELATIONSHIPS
# Agent's Initial Reasoning: A quick call prevents relationship erosion...
# HUMAN FEEDBACK: Rated 8/10. Notes: Partner appreciated the transparency.
```
`build_few_shot_prompt()` calls `retrieve_similar()`, then for each retrieved decision checks whether its `episode_id` has stored feedback in the `feedback` collection. If feedback exists, it appends the effectiveness rating and unexpected effects as additional context. This is the mechanism that brings human feedback into the agent's prompt without any fine-tuning.
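A rough sketch of the formatting step, assuming the retrieved decisions and the feedback lookup are already available as plain dicts. The function name `format_few_shot`, the dict-based feedback lookup, and the two-space indentation of continuation lines are illustrative assumptions; the real method queries the `feedback` collection itself.

```python
def format_few_shot(decisions: list[dict], feedback_by_episode: dict[str, dict]) -> str:
    """Render retrieved decisions, adding human feedback where an episode has any."""
    lines = ["--- PAST EXPERIENCE & HUMAN VERIFICATION ---"]
    for d in decisions:
        action = d["action_type"].upper()
        domain = d["target_domain"].upper()
        lines.append(f"- Action Taken: [{action}] on {domain}")
        lines.append(f"  Agent's Initial Reasoning: {d['reasoning']}")
        fb = feedback_by_episode.get(d.get("episode_id"))
        if fb:  # decisions without feedback are listed with reasoning only
            rating = fb["overall_effectiveness"]
            notes = fb["unexpected_effects"]
            lines.append(f"  HUMAN FEEDBACK: Rated {rating}/10. Notes: {notes}")
    return "\n".join(lines)


decisions = [{
    "episode_id": "ep_12345",
    "action_type": "communicate",
    "target_domain": "relationships",
    "reasoning": "A quick call prevents relationship erosion during high-stress periods.",
}]
feedback_by_episode = {
    "ep_12345": {
        "overall_effectiveness": 8,
        "unexpected_effects": "Partner appreciated the transparency.",
    }
}
prompt_block = format_few_shot(decisions, feedback_by_episode)
print(prompt_block)
```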
---
## Trajectory storage and retrieval
```python
memory.store_trajectory(
    task_id="flight_crisis_task_main",
    route_taken="rebook_premium",
    total_reward=2.5,
    trajectory_summary={"milestones_hit": ["m1"], "steps": 8}
)
similar_trajectories = memory.retrieve_similar_trajectories(
    task_domain="flight_crisis",
    current_world={"lounge_access": True, "flight_rebooked": False},
    n=3
)
```
Trajectories are stored in the separate `traj_collection`. Query construction uses `f"TaskDomain: {task_domain} <top_3_most_stressed_world_values>"`. These are surfaced to the agent in `LifeStackAgent.plan()` but are not currently injected into the main GRPO training prompt (the training prompt uses `decisions` only).
---
## Human feedback storage
```python
from core.feedback import OutcomeFeedback
from datetime import datetime
feedback = OutcomeFeedback(
    episode_id="ep_12345",
    overall_effectiveness=8,
    domains_improved=["relationships", "mental_wellbeing"],
    domains_worsened=[],
    unexpected_effects="Partner called back and offered help with finances.",
    resolution_time_hours=2.5
)
memory.store_feedback(feedback)
```
Stored at doc ID `f"fb_{episode_id}"`. Retrieved by `reward_human_feedback_fn` during GRPO training via embedding similarity on the prompt text. This is what closes the loop between real-world outcomes and training signal: if a human reports that the agent's relationship actions worked well, future training batches for similar conflicts will reward those actions more.
---
## Memory stats
```python
stats = memory.get_stats()
# {"total_memories": 145, "average_reward": 0.623, "by_action_type": {"communicate": 38, ...}}
```
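The shape of that dictionary suggests a simple aggregation over stored decision metadata. The following is a sketch of how such stats could be computed, assuming each record carries `reward` and `action_type` keys; `summarize` is a hypothetical helper, not the body of `get_stats()`.

```python
from collections import Counter


def summarize(metadatas: list[dict]) -> dict:
    """Aggregate count, mean reward, and per-action-type counts."""
    rewards = [m["reward"] for m in metadatas]
    return {
        "total_memories": len(metadatas),
        "average_reward": round(sum(rewards) / len(rewards), 3) if rewards else 0.0,
        "by_action_type": dict(Counter(m["action_type"] for m in metadatas)),
    }


records = [
    {"action_type": "communicate", "reward": 0.72},
    {"action_type": "communicate", "reward": 0.50},
    {"action_type": "rest", "reward": 0.10},  # "rest" is a made-up action type
]
stats_demo = summarize(records)
```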
---
## Related files
- `agent/agent.py`: `LifeStackAgent` uses `LifeStackMemory.build_few_shot_prompt()`
- `core/feedback.py`: `OutcomeFeedback` dataclass, `compute_human_feedback_reward()`
- `scripts/train_trl.py`: `reward_human_feedback_fn` queries the feedback collection during training
- `data/preseeded_memory*.json`: initial hydration data (decisions collection)