Spaces:
Running on Zero
Running on Zero
| # Memory Architecture | |
| ## The Core Insight | |
| Agent memory is not a separate store. | |
| It is a **filtered view over the shared append-only ledger**, computed fresh each turn. | |
| This solves four problems at once: | |
| - **Consistency**: memory is always in sync with the ledger β no sync bugs possible | |
| - **Crash recovery**: reload the ledger, rebuild every memory view from scratch | |
| - **Testability**: memory retrieval is a pure function (events β recalled events) β trivial to test | |
| - **Privacy**: an agent's memory can only see events it was authorised to see | |
| --- | |
| ## Three Layers | |
| ```mermaid | |
| flowchart TD | |
| L[("Append-only Ledger")] --> V["Visibility filter<br/>own events βͺ globally-visible kinds"] | |
| V --> E["Layer 1 Β· EpisodicMemory<br/>recent window (always on)"] | |
| V --> S["Layer 2 Β· SalienceMemory<br/>top-k: relevance Γ recency Γ importance"] | |
| Idx["MemoryIndex Β· optional Β· ADR-0018<br/>semantic relevance"] -.->|upgrades relevance term| S | |
| V --> Rf["Layer 3 Β· ReflectionMemory<br/>emits agent.reflected every N events"] | |
| Rf -->|"agent.reflected (globally visible)"| L | |
| E --> CB["ContextBuilder β prompt"] | |
| S --> CB | |
| ``` | |
| All three layers are *views over the one ledger* β none holds separate state. | |
| ### Layer 1: EpisodicMemory (always on) | |
| The simplest layer. An agent sees: | |
| - Its own events (any kind, any turn) | |
| - Globally-visible event kinds: `world.observed`, `judge.verdict`, | |
| `user.injected`, `run.started`, `agent.reflected` | |
| The window is capped at `manifest.memory.window` (default 8) for small-model | |
| context budgets. Returns the most-recent N visible events in chronological order. | |
| ```python | |
| class EpisodicMemory: | |
| agent_name: str | |
| max_recent: int = 8 | |
| def visible(self, events) -> list[Event]: | |
| return [e for e in events if mine_or_global(e)][-max_recent:] | |
| ``` | |
| **When to use**: always. It is the baseline memory layer and is always enabled. | |
| --- | |
| ### Layer 2: SalienceMemory (optional, manifest.memory.use_salience=True) | |
| Replaces recency-window ranking with composite salience scoring: | |
| ``` | |
| salience(e) = w_relΒ·relevance(e, query) + w_recΒ·recency(e, turn) + w_impΒ·importance(e.kind) | |
| ``` | |
| | Component | How computed | Default weight | | |
| |---|---|---| | |
| | relevance | Semantic similarity when a `MemoryIndex` is attached (ADR-0018); else Jaccard similarity between event text and current scene | 0.30 | | |
| | recency | exp(βλ·Ξturn), Ξ»=0.1 β half-life β7 turns | 0.40 | | |
| | importance | Kind-based weight table | 0.30 | | |
| **Importance weights** (from `memory.py`): | |
| | Event kind | Weight | | |
| |---|---| | |
| | `user.injected` | 0.95 | | |
| | `verdict.final` | 1.00 | | |
| | `judge.verdict` | 0.90 | | |
| | `agent.reflected` | 0.85 | | |
| | `clue.found` | 0.80 | | |
| | `world.observed` | 0.70 | | |
| | `agent.spoke` | 0.50 | | |
| | `agent.thought` | 0.40 | | |
| | `run.started` | 0.30 | | |
| Top-K events by salience score are returned in chronological order so the | |
| prompt reads naturally (not by importance descending). | |
| **When to use**: enable when agents run for many turns and need to surface | |
| important but older memories over irrelevant recent ones. | |
| First enable point: when the agent window fills up (>30 turns). | |
| **Semantic relevance (ADR-0018, implemented)**: the keyword-Jaccard relevance is | |
| the offline default; attaching a `MemoryIndex` upgrades only that term to | |
| semantic search (see "Semantic Relevance Index" below). Recency, importance, the | |
| visibility filter, and the `format_for_prompt` shape are unchanged. | |
| --- | |
| ### Layer 3: ReflectionMemory (optional, manifest.memory.reflection_threshold=N) | |
| Triggered when an agent has seen `N` visible events since the last reflection. | |
| The agent is instructed to emit an `agent.reflected` event whose payload is a | |
| high-level belief synthesising recent experience: | |
| ``` | |
| agent.reflected β {"belief": "the baker resents me", "based_on": ["evt-123", "evt-456"]} | |
| ``` | |
| Reflection events are globally visible β every agent sees them, including the | |
| reflector itself. This means beliefs accumulate over time without the cost | |
| of carrying raw episodic history, and the judge can read an agent's current | |
| belief state without full access to its memory. | |
| **Compaction effect**: each reflection replaces N raw events with 1 belief. | |
| After K reflections, the effective context window is `KΒ·1 + recent_window` | |
| instead of `NΒ·K + recent_window`. This is how you keep a villager coherent | |
| over 200 turns with an 8-event context window. | |
| **When to implement**: Phase 2 milestone. The `ReflectionTracker` class is | |
| already present in `src/core/memory.py` β it just needs the agent to check | |
| `tracker.observe(events)` each turn and emit the reflection when due. | |
| --- | |
| ## Semantic Relevance Index (ADR-0018, optional) | |
| The `relevance` term in Layer 2 can be computed by **semantic search** instead of | |
| keyword overlap. This is a *derived, rebuildable lens over the ledger* β it | |
| changes how relevance is scored, never which events are eligible (the visibility | |
| filter and the recency/importance terms are untouched). The ledger stays the | |
| single source of truth (ADR-0005): the index is keyed by `event.id` (re-indexing | |
| is idempotent) and can be wiped and rebuilt from the ledger. | |
| ```python | |
| @runtime_checkable | |
| class MemoryIndex(Protocol): | |
| def index(self, events: tuple[Event, ...]) -> None: ... # derive, idempotent by id | |
| def search(self, query: str, k: int) -> list[Event]: ... # read back by relevance | |
| ``` | |
| `SalienceMemory(..., index=...)` derives, then reads: it indexes the visible | |
| candidates first, then queries, so a hit can never be an event the ledger has not | |
| produced. With `index=None` (the offline default) the relevance term is keyword | |
| Jaccard, byte-for-byte unchanged. | |
| **Backend (`Mem0MemoryIndex`)**: stores each event as one raw memory with | |
| inference disabled (the text is embedded verbatim β no model-driven fact | |
| extraction), carrying the full event in metadata so a hit reconstructs the | |
| `Event`. Lazy-imported, so `import src.*` / `import app` work with the package not | |
| installed. | |
| **Gate**: `memory_index_from_env()` returns `None` unless `MEMORY_INDEX` is | |
| truthy. When active, embeddings run **locally** via sentence-transformers | |
| (`all-MiniLM-L6-v2`) by default β no API key, fully offline once the model is | |
| cached β or are repointed (e.g. to a different embedder, or the project's | |
| Postgres+pgvector, ADR-0014) via `MEMORY_INDEX_CONFIG` (a JSON blob forwarded | |
| verbatim to the backend's `from_config`). Install the `memory` extra (`mem0ai` + | |
| `sentence-transformers`). See ADR-0019. | |
| **Hosted backend (opt-in, ADR-0020)**: set `MEMORY_INDEX=cloud` (or | |
| `MEMORY_INDEX_BACKEND=cloud`) to use mem0's managed platform (`MemoryClient`, | |
| api.mem0.ai) instead of the local embedder. `Mem0CloudIndex` satisfies the *same* | |
| `MemoryIndex` protocol β derived, idempotent by `event.id`, ledger-is-truth, | |
| verbatim `infer=False` storage β so nothing downstream changes; only *where* the | |
| embedding and retrieval run differs. It needs `MEM0_API_KEY` (plus optional | |
| `MEM0_ORG_ID` / `MEM0_PROJECT_ID` / `MEM0_HOST`). **Caveat:** activating it sends | |
| ledger event text to mem0's servers β a deliberate departure from the | |
| off-the-grid default, which is why the local backend remains the default. | |
| **Alternative backends**: the two-method protocol can wrap any retrieval store β | |
| a stateful agent-memory service (e.g. a Letta-style memory server) could be a | |
| `MemoryIndex` too, as long as it stays derived from and rebuildable from the | |
| ledger. | |
| --- | |
| ## Context Builder Layering | |
| The ContextBuilder assembles layers in this order (permanent cost β variable cost): | |
| ``` | |
| IDENTITY β persona (never compresses) | |
| CURRENT SCENE β world state from the projection | |
| YOUR MEMORY β EpisodicMemory or SalienceMemory output | |
| VISITOR β recent user_artifacts (last 3) | |
| [EXTRA] β scenario-specific, from _build_extra_prompt() | |
| [OUTPUT FORMAT] β JSON constraint (added by structured.py) | |
| ``` | |
| The layering order is deliberate: | |
| - The model must read IDENTITY first to stay in character | |
| - Scene before memory β what's happening now is more important than what happened before | |
| - Visitor disturbances are always included because they are the most salient inputs | |
| - JSON instruction is last so the model focuses on generating before being constrained | |
| --- | |
| ## Phase 3 Upgrade Path | |
| | Feature | Phase | Mechanism | | |
| |---|---|---| | |
| | Keyword salience | 2 | `SalienceMemory` with Jaccard relevance | | |
| | Reflection events | 2 | `ReflectionTracker` + `agent.reflected` kind | | |
| | Embedding relevance | done | `MemoryIndex` semantic search for the relevance term (ADR-0018) | | |
| | pgvector retrieval | done | `MEMORY_INDEX_CONFIG` persists vectors in the ADR-0014 Postgres/pgvector store | | |
| | Belief graph | 4 | Structured belief store derived from reflection events | | |