Spaces:

jdsb06
/

meta-r2

Sleeping

App Files Files Community

meta-r2 / docs /memory.md

github-actions[bot]

Deploy Space snapshot

ddbc1ba about 1 month ago

preview code

raw

history blame contribute delete

6.37 kB

	# Episodic Memory

	Source file: `agent/memory.py`

	---

	## Overview

	`LifeStackMemory` gives the agent access to a history of past decisions and their outcomes. On each new conflict, it retrieves the most relevant past high-reward decisions and injects them as few-shot context into the prompt. This is not RAG over a knowledge base — it's a replay buffer that shapes the agent's prior before the model even generates a token.

	The 116.81% reward improvement reported in `data/before_after_comparison.json` compares the memory-augmented agent against the same model without memory context. It is not a trained-vs-untrained comparison.

	---

	## Storage

	Three ChromaDB collections, all in `./lifestack_memory/` by default:

	\| Collection \| Content \|
	\|------------\|---------\|
	\| `decisions` \| Per-step action records: conflict title, action type, domain, reward, reasoning \|
	\| `trajectories` \| Per-episode summaries: task ID, route taken, total reward, milestone hits \|
	\| `feedback` \| Human outcome feedback: episode ID, effectiveness rating, improved/worsened domains \|

	```python
	memory = LifeStackMemory(path="./lifestack_memory")
	# Persistent ChromaDB; falls back to in-memory client on path failure
	```

	On first initialization, if the `decisions` collection is empty, `LifeStackMemory` auto-hydrates from `data/preseeded_memory*.json` (3 partitioned files). This ensures the agent has something useful to retrieve on a fresh deployment without requiring prior training runs.

	---

	## Embedding

	`LifeStackMemory` uses `SentenceTransformer('all-MiniLM-L6-v2')` for 384-dim embeddings. If the model isn't available locally (`local_files_only=True`), it falls back to a deterministic hash-based embedding:

	```python
	# Hash fallback: adler32 hash of each token, bucketed into 384 dimensions
	for token in text.lower().split():
	idx = zlib.adler32(token.encode()) % len(buckets)
	buckets[idx] += 1.0
	```

	The hash fallback preserves semantic retrieval quality well enough for the lexically consistent LifeStack vocabulary (conflict titles, domain names, action types appear repeatedly).

	---

	## Storing decisions

	```python
	memory.store_decision(
	conflict_title="Friday 6PM",
	action_type="communicate",
	target_domain="relationships",
	reward=0.72,
	metrics_snapshot={"relationships.romantic": 55, "mental_wellbeing.stress_level": 80},
	reasoning="A quick call prevents relationship erosion during high-stress periods."
	)
	```

	The stored text is `f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}"`. This text is embedded and stored with the full metadata dict. Only the embedding is used at retrieval time.

	---

	## Retrieving similar decisions

	```python
	similar = memory.retrieve_similar(
	conflict_title="The Perfect Storm",
	current_metrics={"career.workload": 90, "mental_wellbeing.stress_level": 85},
	n=3
	)
	```

	Query construction: embeds `f"{conflict_title} <top_3_most_stressed_metrics>"` — the most stressed metrics are the 3 with lowest values after sorting `current_metrics.items()`. This grounds the query in the agent's current situation rather than just the conflict name.

	Results are filtered to `reward >= 0.05` before returning. The function retrieves `n*2` candidates and selects the top `n` by cosine similarity after filtering.

	Return format:
	```python
	[{
	"action_type": "communicate",
	"target_domain": "relationships",
	"reward": 0.72,
	"reasoning": "...",
	"similarity_score": 0.87,
	...
	}]
	```

	---

	## Few-shot prompt injection

	```python
	few_shot = memory.build_few_shot_prompt("Friday 6PM", current_metrics)
	# Output:
	# --- PAST EXPERIENCE & HUMAN VERIFICATION ---
	# - Action Taken: [COMMUNICATE] on RELATIONSHIPS
	# Agent's Initial Reasoning: A quick call prevents relationship erosion...
	# HUMAN FEEDBACK: Rated 8/10. Notes: Partner appreciated the transparency.
	```

	`build_few_shot_prompt()` calls `retrieve_similar()`, then for each retrieved decision checks whether its `episode_id` has stored feedback in the `feedback` collection. If feedback exists, it appends the effectiveness rating and unexpected effects as additional context. This is the mechanism that brings human feedback into the agent's prompt without any fine-tuning.

	---

	## Trajectory storage and retrieval

	```python
	memory.store_trajectory(
	task_id="flight_crisis_task_main",
	route_taken="rebook_premium",
	total_reward=2.5,
	trajectory_summary={"milestones_hit": ["m1"], "steps": 8}
	)

	similar_trajectories = memory.retrieve_similar_trajectories(
	task_domain="flight_crisis",
	current_world={"lounge_access": True, "flight_rebooked": False},
	n=3
	)
	```

	Trajectories are stored in the separate `traj_collection`. Query construction uses `f"TaskDomain: {task_domain} <top_3_most_stressed_world_values>"`. These are surfaced to the agent in `LifeStackAgent.plan()` but are not currently injected into the main GRPO training prompt (the training prompt uses `decisions` only).

	---

	## Human feedback storage

	```python
	from core.feedback import OutcomeFeedback
	from datetime import datetime

	feedback = OutcomeFeedback(
	episode_id="ep_12345",
	overall_effectiveness=8,
	domains_improved=["relationships", "mental_wellbeing"],
	domains_worsened=[],
	unexpected_effects="Partner called back and offered help with finances.",
	resolution_time_hours=2.5
	)
	memory.store_feedback(feedback)
	```

	Stored at doc ID `f"fb_{episode_id}"`. Retrieved by `reward_human_feedback_fn` during GRPO training via embedding similarity on the prompt text. This is what closes the loop between real-world outcomes and training signal — if a human reports that the agent's relationship actions worked well, future training batches for similar conflicts will reward those actions more.

	---

	## Memory stats

	```python
	stats = memory.get_stats()
	# {"total_memories": 145, "average_reward": 0.623, "by_action_type": {"communicate": 38, ...}}
	```

	---

	## Related files

	- `agent/agent.py` — `LifeStackAgent` uses `LifeStackMemory.build_few_shot_prompt()`
	- `core/feedback.py` — `OutcomeFeedback` dataclass, `compute_human_feedback_reward()`
	- `scripts/train_trl.py` — `reward_human_feedback_fn` queries the feedback collection during training
	- `data/preseeded_memory*.json` — initial hydration data (decisions collection)