Spaces:

jdsb06
/

meta-r2

Sleeping

App Files Files Community

meta-r2 / docs /memory.md

github-actions[bot]

Deploy Space snapshot

ddbc1ba about 1 month ago

preview code

raw

history blame contribute delete

6.37 kB

Episodic Memory

Source file: agent/memory.py

Overview

LifeStackMemory gives the agent access to a history of past decisions and their outcomes. On each new conflict, it retrieves the most relevant past high-reward decisions and injects them as few-shot context into the prompt. This is not RAG over a knowledge base — it's a replay buffer that shapes the agent's prior before the model even generates a token.

The 116.81% reward improvement reported in data/before_after_comparison.json compares the memory-augmented agent against the same model without memory context. It is not a trained-vs-untrained comparison.

Storage

Three ChromaDB collections, all in ./lifestack_memory/ by default:

Collection	Content
`decisions`	Per-step action records: conflict title, action type, domain, reward, reasoning
`trajectories`	Per-episode summaries: task ID, route taken, total reward, milestone hits
`feedback`	Human outcome feedback: episode ID, effectiveness rating, improved/worsened domains

memory = LifeStackMemory(path="./lifestack_memory")
# Persistent ChromaDB; falls back to in-memory client on path failure

On first initialization, if the decisions collection is empty, LifeStackMemory auto-hydrates from data/preseeded_memory*.json (3 partitioned files). This ensures the agent has something useful to retrieve on a fresh deployment without requiring prior training runs.

Embedding

LifeStackMemory uses SentenceTransformer('all-MiniLM-L6-v2') for 384-dim embeddings. If the model isn't available locally (local_files_only=True), it falls back to a deterministic hash-based embedding:

# Hash fallback: adler32 hash of each token, bucketed into 384 dimensions
for token in text.lower().split():
    idx = zlib.adler32(token.encode()) % len(buckets)
    buckets[idx] += 1.0

The hash fallback preserves semantic retrieval quality well enough for the lexically consistent LifeStack vocabulary (conflict titles, domain names, action types appear repeatedly).

Storing decisions

memory.store_decision(
    conflict_title="Friday 6PM",
    action_type="communicate",
    target_domain="relationships",
    reward=0.72,
    metrics_snapshot={"relationships.romantic": 55, "mental_wellbeing.stress_level": 80},
    reasoning="A quick call prevents relationship erosion during high-stress periods."
)

The stored text is f"{conflict_title} Action: {action_type} Domain: {target_domain} Reward: {reward:.2f} {reasoning[:100]}". This text is embedded and stored with the full metadata dict. Only the embedding is used at retrieval time.

Retrieving similar decisions

similar = memory.retrieve_similar(
    conflict_title="The Perfect Storm",
    current_metrics={"career.workload": 90, "mental_wellbeing.stress_level": 85},
    n=3
)

Query construction: embeds f"{conflict_title} <top_3_most_stressed_metrics>" — the most stressed metrics are the 3 with lowest values after sorting current_metrics.items(). This grounds the query in the agent's current situation rather than just the conflict name.

Results are filtered to reward >= 0.05 before returning. The function retrieves n*2 candidates and selects the top n by cosine similarity after filtering.

Return format:

[{
    "action_type": "communicate",
    "target_domain": "relationships",
    "reward": 0.72,
    "reasoning": "...",
    "similarity_score": 0.87,
    ...
}]

Few-shot prompt injection

few_shot = memory.build_few_shot_prompt("Friday 6PM", current_metrics)
# Output:
# --- PAST EXPERIENCE & HUMAN VERIFICATION ---
# - Action Taken: [COMMUNICATE] on RELATIONSHIPS
#   Agent's Initial Reasoning: A quick call prevents relationship erosion...
#   HUMAN FEEDBACK: Rated 8/10. Notes: Partner appreciated the transparency.

build_few_shot_prompt() calls retrieve_similar(), then for each retrieved decision checks whether its episode_id has stored feedback in the feedback collection. If feedback exists, it appends the effectiveness rating and unexpected effects as additional context. This is the mechanism that brings human feedback into the agent's prompt without any fine-tuning.

Trajectory storage and retrieval

memory.store_trajectory(
    task_id="flight_crisis_task_main",
    route_taken="rebook_premium",
    total_reward=2.5,
    trajectory_summary={"milestones_hit": ["m1"], "steps": 8}
)

similar_trajectories = memory.retrieve_similar_trajectories(
    task_domain="flight_crisis",
    current_world={"lounge_access": True, "flight_rebooked": False},
    n=3
)

Trajectories are stored in the separate traj_collection. Query construction uses f"TaskDomain: {task_domain} <top_3_most_stressed_world_values>". These are surfaced to the agent in LifeStackAgent.plan() but are not currently injected into the main GRPO training prompt (the training prompt uses decisions only).

Human feedback storage

from core.feedback import OutcomeFeedback
from datetime import datetime

feedback = OutcomeFeedback(
    episode_id="ep_12345",
    overall_effectiveness=8,
    domains_improved=["relationships", "mental_wellbeing"],
    domains_worsened=[],
    unexpected_effects="Partner called back and offered help with finances.",
    resolution_time_hours=2.5
)
memory.store_feedback(feedback)

Stored at doc ID f"fb_{episode_id}". Retrieved by reward_human_feedback_fn during GRPO training via embedding similarity on the prompt text. This is what closes the loop between real-world outcomes and training signal — if a human reports that the agent's relationship actions worked well, future training batches for similar conflicts will reward those actions more.

Memory stats

stats = memory.get_stats()
# {"total_memories": 145, "average_reward": 0.623, "by_action_type": {"communicate": 38, ...}}

Related files

agent/agent.py — LifeStackAgent uses LifeStackMemory.build_few_shot_prompt()
core/feedback.py — OutcomeFeedback dataclass, compute_human_feedback_reward()
scripts/train_trl.py — reward_human_feedback_fn queries the feedback collection during training
data/preseeded_memory*.json — initial hydration data (decisions collection)