# Memory Architecture

## The Core Insight

Agent memory is not a separate store.
It is a **filtered view over the shared append-only ledger**, computed fresh each turn.

This solves four problems at once:
- **Consistency**: memory is recomputed from the ledger each turn, so there is no second store to drift out of sync
- **Crash recovery**: reload the ledger, rebuild every memory view from scratch
- **Testability**: memory retrieval is a pure function (events → recalled events) — trivial to test
- **Privacy**: an agent's memory can only see events it was authorised to see
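
A minimal sketch of the crash-recovery and purity claims above, assuming a simplified `Event` record persisted as JSON lines. The field names and the helpers `recall`, `dump_ledger`, and `load_ledger` are illustrative, not the project's actual API.

```python
import json
from dataclasses import asdict, dataclass

# Kinds every agent may see (the globally-visible list given under Layer 1).
GLOBAL_KINDS = frozenset(
    {"world.observed", "judge.verdict", "user.injected", "run.started", "agent.reflected"}
)


@dataclass(frozen=True)
class Event:  # hypothetical, simplified event record
    id: str
    kind: str
    agent: str
    turn: int
    text: str = ""


def recall(ledger: tuple[Event, ...], agent: str) -> list[Event]:
    """Pure function: same ledger and agent in, same view out. No stored state."""
    return [e for e in ledger if e.agent == agent or e.kind in GLOBAL_KINDS]


def dump_ledger(ledger: tuple[Event, ...]) -> str:
    """Serialise the ledger as JSON lines (one event per line)."""
    return "\n".join(json.dumps(asdict(e)) for e in ledger)


def load_ledger(blob: str) -> tuple[Event, ...]:
    """Rebuild the ledger from JSON lines, for example after a crash."""
    return tuple(Event(**json.loads(line)) for line in blob.splitlines() if line)


ledger = (
    Event("evt-1", "run.started", "system", 0),
    Event("evt-2", "agent.thought", "baker", 1, "the flour is late"),
    Event("evt-3", "agent.thought", "miller", 1, "I hid the flour"),
)
restored = load_ledger(dump_ledger(ledger))
```

Because `recall` keeps nothing between calls, the view computed from `restored` matches the view computed from `ledger`, and the miller's private thought never reaches the baker.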

---

## Three Layers

```mermaid
flowchart TD
    L[("Append-only Ledger")] --> V["Visibility filter<br/>own events ∪ globally-visible kinds"]
    V --> E["Layer 1 · EpisodicMemory<br/>recent window (always on)"]
    V --> S["Layer 2 · SalienceMemory<br/>top-k: relevance × recency × importance"]
    Idx["MemoryIndex · optional · ADR-0018<br/>semantic relevance"] -.->|upgrades relevance term| S
    V --> Rf["Layer 3 · ReflectionMemory<br/>emits agent.reflected every N events"]
    Rf -->|"agent.reflected (globally visible)"| L
    E --> CB["ContextBuilder → prompt"]
    S --> CB
```

All three layers are *views over the one ledger* — none holds separate state.

### Layer 1: EpisodicMemory (always on)

The simplest layer.  An agent sees:
- Its own events (any kind, any turn)
- Globally-visible event kinds: `world.observed`, `judge.verdict`,
  `user.injected`, `run.started`, `agent.reflected`

The window is capped at `manifest.memory.window` (default 8) to fit small-model
context budgets.  The layer returns the N most recent visible events in chronological order.

```python
class EpisodicMemory:
    agent_name: str
    max_recent: int = 8

    def visible(self, events) -> list[Event]:
        # Own events plus globally-visible kinds; keep the newest max_recent, oldest first.
        return [e for e in events if mine_or_global(e)][-self.max_recent:]
```

**When to use**: always.  It is the baseline memory layer and is always enabled.
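
The snippet above leaves `Event` and `mine_or_global` undefined. A self-contained sketch might look like the following, assuming an event carries `agent` and `kind` fields and that the visibility helper takes the agent name as a second argument. `EpisodicWindow` is a stand-in name, not the class in `memory.py`.

```python
from dataclasses import dataclass
from typing import Iterable

GLOBAL_KINDS = frozenset(
    {"world.observed", "judge.verdict", "user.injected", "run.started", "agent.reflected"}
)


@dataclass(frozen=True)
class Event:  # hypothetical, simplified event record
    id: str
    kind: str
    agent: str


def mine_or_global(event: Event, agent_name: str) -> bool:
    """Visibility filter: the agent's own events plus globally-visible kinds."""
    return event.agent == agent_name or event.kind in GLOBAL_KINDS


@dataclass
class EpisodicWindow:  # stand-in for EpisodicMemory
    agent_name: str
    max_recent: int = 8

    def visible(self, events: Iterable[Event]) -> list[Event]:
        if self.max_recent <= 0:
            return []  # guard: seen[-0:] would return the whole list
        seen = [e for e in events if mine_or_global(e, self.agent_name)]
        return seen[-self.max_recent:]


# Twelve baker thoughts interleaved with the miller's private thoughts.
ledger = []
for turn in range(12):
    ledger.append(Event(f"a-{turn}", "agent.thought", "baker"))
    ledger.append(Event(f"b-{turn}", "agent.thought", "miller"))

window = EpisodicWindow("baker").visible(ledger)
```

Filtering happens before windowing, so another agent's private events never consume slots in the baker's window.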

---

### Layer 2: SalienceMemory (optional, manifest.memory.use_salience=True)

Replaces recency-window ranking with composite salience scoring:

```
salience(e) = w_rel·relevance(e, query) + w_rec·recency(e, turn) + w_imp·importance(e.kind)
```

| Component | How computed | Default weight |
|---|---|---|
| relevance | Semantic similarity when a `MemoryIndex` is attached (ADR-0018); else Jaccard similarity between event text and current scene | 0.30 |
| recency | exp(−λ·Δturn), λ=0.1 → half-life ≈7 turns | 0.40 |
| importance | Kind-based weight table | 0.30 |

**Importance weights** (from `memory.py`):

| Event kind | Weight |
|---|---|
| `verdict.final` | 1.00 |
| `user.injected` | 0.95 |
| `judge.verdict` | 0.90 |
| `agent.reflected` | 0.85 |
| `clue.found` | 0.80 |
| `world.observed` | 0.70 |
| `agent.spoke` | 0.50 |
| `agent.thought` | 0.40 |
| `run.started` | 0.30 |

The top-K events by salience score are selected, then returned in chronological
order (not by salience descending) so the prompt reads naturally.
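
The scoring and selection described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `memory.py`: the `Event` fields, the whitespace tokeniser, and the `DEFAULT_IMPORTANCE` fallback for unlisted kinds are all hypothetical.

```python
import math
from dataclasses import dataclass

IMPORTANCE = {  # kind-based weights from the table above
    "verdict.final": 1.00, "user.injected": 0.95, "judge.verdict": 0.90,
    "agent.reflected": 0.85, "clue.found": 0.80, "world.observed": 0.70,
    "agent.spoke": 0.50, "agent.thought": 0.40, "run.started": 0.30,
}
DEFAULT_IMPORTANCE = 0.50  # assumption: fallback for kinds not in the table


@dataclass(frozen=True)
class Event:  # hypothetical, simplified event record
    id: str
    kind: str
    turn: int
    text: str


def jaccard(a: str, b: str) -> float:
    """Keyword overlap between two texts: |A ∩ B| / |A ∪ B| over lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def recency(event_turn: float, now: float, lam: float = 0.1) -> float:
    """exp(-lam * delta_turn); with lam = 0.1 the value halves about every 7 turns."""
    return math.exp(-lam * max(0.0, now - event_turn))


def salience(e: Event, query: str, now: int,
             w_rel: float = 0.30, w_rec: float = 0.40, w_imp: float = 0.30) -> float:
    return (w_rel * jaccard(e.text, query)
            + w_rec * recency(e.turn, now)
            + w_imp * IMPORTANCE.get(e.kind, DEFAULT_IMPORTANCE))


def top_k(events: list[Event], query: str, now: int, k: int) -> list[Event]:
    """Pick the k highest-salience events, then restore ledger order."""
    order = sorted(range(len(events)),
                   key=lambda i: salience(events[i], query, now), reverse=True)
    chosen = set(order[:k])
    return [e for i, e in enumerate(events) if i in chosen]


events = [
    Event("evt-1", "clue.found", 2, "the baker hid the flour"),
    Event("evt-2", "agent.thought", 30, "nice weather today"),
    Event("evt-3", "agent.spoke", 33, "good morning"),
    Event("evt-4", "user.injected", 40, "where is the flour"),
]
picked = top_k(events, query="who hid the flour", now=40, k=2)
```

In this example the clue from turn 2 outranks the two more recent small-talk events because its relevance and importance terms outweigh its low recency, which is the behaviour the layer is meant to provide.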

**When to use**: enable when agents run for many turns and need to surface
important but older memories over irrelevant recent ones.
A reasonable point to turn it on is when the agent window fills up (>30 turns).

**Semantic relevance (ADR-0018, implemented)**: the keyword-Jaccard relevance is
the offline default; attaching a `MemoryIndex` upgrades only that term to
semantic search (see "Semantic Relevance Index" below). Recency, importance, the
visibility filter, and the `format_for_prompt` shape are unchanged.

---

### Layer 3: ReflectionMemory (optional, manifest.memory.reflection_threshold=N)

Triggered when an agent has seen `N` visible events since the last reflection.
The agent is instructed to emit an `agent.reflected` event whose payload is a
high-level belief synthesising recent experience:

```
agent.reflected → {"belief": "the baker resents me", "based_on": ["evt-123", "evt-456"]}
```

Reflection events are globally visible — every agent sees them, including the
reflector itself.  This means beliefs accumulate over time without the cost
of carrying raw episodic history, and the judge can read an agent's current
belief state without full access to its memory.

**Compaction effect**: each reflection replaces N raw events with 1 belief.
After K reflections, the prompt carries `K·1 + recent_window` events
instead of `N·K + recent_window`.  This is how you keep a villager coherent
over 200 turns with an 8-event context window.
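
A worked instance of the compaction arithmetic, using illustrative numbers that are assumptions rather than project defaults: one visible event per turn over 200 turns, a reflection every N = 25 events (so K = 8), and a recent window of 8. The function name `context_events` is hypothetical.

```python
def context_events(reflections: int, events_per_reflection: int,
                   recent_window: int, compacted: bool) -> int:
    """Events the prompt must carry: K*1 + window if compacted, else N*K + window."""
    per_reflection = 1 if compacted else events_per_reflection
    return reflections * per_reflection + recent_window


N, K, WINDOW = 25, 8, 8  # assumed values: 200 events / 25 per reflection = 8 reflections
with_reflection = context_events(K, N, WINDOW, compacted=True)
without_reflection = context_events(K, N, WINDOW, compacted=False)
```

Under these assumptions the prompt carries 16 items (8 beliefs plus 8 recent events) rather than 208.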

**When to implement**: Phase 2 milestone.  The `ReflectionTracker` class is
already present in `src/core/memory.py` — it just needs the agent to check
`tracker.observe(events)` each turn and emit the reflection when due.
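
The per-turn check could be sketched as below. This is a hypothetical stand-in, not the `ReflectionTracker` in `src/core/memory.py`: the class name `ReflectionCheck`, the `Event` fields, and the exact meaning of `observe` (count visible events since the agent's own last `agent.reflected`) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:  # hypothetical, simplified event record
    id: str
    kind: str
    agent: str
    payload: dict = field(default_factory=dict)


@dataclass
class ReflectionCheck:  # stand-in; not the project's ReflectionTracker
    agent_name: str
    threshold: int  # N, i.e. manifest.memory.reflection_threshold

    def observe(self, visible_events: list[Event]) -> bool:
        """True when at least `threshold` visible events follow this agent's last reflection."""
        since = 0
        for e in reversed(visible_events):
            if e.kind == "agent.reflected" and e.agent == self.agent_name:
                break
            since += 1
        return since >= self.threshold


check = ReflectionCheck("villager", threshold=3)
events = [Event(f"evt-{i}", "agent.thought", "villager") for i in range(1, 4)]
due_before = check.observe(events)  # three events since the start of the run

# The agent emits its belief, citing the events it is based on.
events.append(Event("evt-4", "agent.reflected", "villager",
                    {"belief": "the baker resents me", "based_on": ["evt-1", "evt-3"]}))
due_after = check.observe(events)  # counter restarts at the reflection
```

Because the count is derived from the visible events each time, the check holds no state of its own and survives a ledger reload.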

---

## Semantic Relevance Index (ADR-0018, optional)

The `relevance` term in Layer 2 can be computed by **semantic search** instead of
keyword overlap. This is a *derived, rebuildable lens over the ledger* — it
changes how relevance is scored, never which events are eligible (the visibility
filter and the recency/importance terms are untouched). The ledger stays the
single source of truth (ADR-0005): the index is keyed by `event.id` (re-indexing
is idempotent) and can be wiped and rebuilt from the ledger.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class MemoryIndex(Protocol):
    def index(self, events: tuple[Event, ...]) -> None: ...   # derive, idempotent by id
    def search(self, query: str, k: int) -> list[Event]: ...  # read back by relevance
```

`SalienceMemory(..., index=...)` derives, then reads: it indexes the visible
candidates first, then queries, so a hit can never be an event the ledger has not
produced. With `index=None` (the offline default) the relevance term is keyword
Jaccard, byte-for-byte unchanged.
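
To make the protocol concrete, here is a toy in-memory backend that satisfies it. `KeywordIndex` is hypothetical and ranks by shared keywords rather than embeddings; it only illustrates the two properties the text requires, namely idempotent indexing by `event.id` and hits drawn solely from indexed events.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Event:  # hypothetical, simplified event record
    id: str
    text: str


@runtime_checkable
class MemoryIndex(Protocol):
    def index(self, events: tuple[Event, ...]) -> None: ...
    def search(self, query: str, k: int) -> list[Event]: ...


class KeywordIndex:  # toy backend for illustration only
    def __init__(self) -> None:
        self._by_id: dict[str, Event] = {}

    def __len__(self) -> int:
        return len(self._by_id)

    def index(self, events: tuple[Event, ...]) -> None:
        # Keyed by event id, so re-indexing the same events changes nothing.
        for e in events:
            self._by_id[e.id] = e

    def search(self, query: str, k: int) -> list[Event]:
        q = set(query.lower().split())
        scored = [(len(q & set(e.text.lower().split())), e) for e in self._by_id.values()]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insert order
        return [e for _, e in hits[:k]]


visible = (
    Event("evt-1", "the baker hid the flour"),
    Event("evt-2", "good morning"),
)
idx = KeywordIndex()
idx.index(visible)   # derive
idx.index(visible)   # derive again: idempotent
hits = idx.search("where is the flour", k=1)  # then read
```

Since `MemoryIndex` is `runtime_checkable`, `isinstance(idx, MemoryIndex)` passes for any object exposing both methods, which is what lets alternative backends drop in.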

**Backend (`Mem0MemoryIndex`)**: stores each event as one raw memory with
inference disabled (the text is embedded verbatim — no model-driven fact
extraction), carrying the full event in metadata so a hit reconstructs the
`Event`. Lazy-imported, so `import src.*` / `import app` work with the package not
installed.

**Gate**: `memory_index_from_env()` returns `None` unless `MEMORY_INDEX` is
truthy. When active, embeddings run **locally** via sentence-transformers
(`all-MiniLM-L6-v2`) by default — no API key, fully offline once the model is
cached — or are repointed (e.g. to a different embedder, or the project's
Postgres+pgvector, ADR-0014) via `MEMORY_INDEX_CONFIG` (a JSON blob forwarded
verbatim to the backend's `from_config`). Install the `memory` extra (`mem0ai` +
`sentence-transformers`). See ADR-0019.
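
The gating logic can be pictured roughly as follows. This is a sketch, not the body of `memory_index_from_env()`: the helper name `index_settings_from_env`, the set of values treated as falsy, and the returned dictionary shape are assumptions. It returns settings instead of constructing a backend so that it runs without the `memory` extra installed.

```python
import json
import os
from typing import Mapping, Optional

_FALSY = {"", "0", "false", "no", "off"}  # assumption: values that keep the index off


def index_settings_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return None when MEMORY_INDEX is unset or falsy, else the parsed settings."""
    env = os.environ if env is None else env
    flag = env.get("MEMORY_INDEX", "").strip().lower()
    if flag in _FALSY:
        return None  # offline default: keyword-Jaccard relevance
    raw = env.get("MEMORY_INDEX_CONFIG", "").strip()
    # The JSON blob is passed through untouched; its schema belongs to the backend.
    config = json.loads(raw) if raw else {}
    return {"enabled": True, "config": config}


disabled = index_settings_from_env({})
defaults = index_settings_from_env({"MEMORY_INDEX": "1"})
custom = index_settings_from_env({
    "MEMORY_INDEX": "true",
    "MEMORY_INDEX_CONFIG": '{"example_key": "example_value"}',  # placeholder blob
})
```

Passing the environment as a mapping keeps the gate testable without mutating `os.environ`.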

**Hosted backend (opt-in, ADR-0020)**: set `MEMORY_INDEX=cloud` (or
`MEMORY_INDEX_BACKEND=cloud`) to use mem0's managed platform (`MemoryClient`,
api.mem0.ai) instead of the local embedder. `Mem0CloudIndex` satisfies the *same*
`MemoryIndex` protocol — derived, idempotent by `event.id`, ledger-is-truth,
verbatim `infer=False` storage — so nothing downstream changes; only *where* the
embedding and retrieval run differs. It needs `MEM0_API_KEY` (plus optional
`MEM0_ORG_ID` / `MEM0_PROJECT_ID` / `MEM0_HOST`). **Caveat:** activating it sends
ledger event text to mem0's servers — a deliberate departure from the
off-the-grid default, which is why the local backend remains the default.

**Alternative backends**: the two-method protocol can wrap any retrieval store —
a stateful agent-memory service (e.g. a Letta-style memory server) could be a
`MemoryIndex` too, as long as it stays derived from and rebuildable from the
ledger.

---

## Context Builder Layering

The ContextBuilder assembles layers in this order (permanent cost → variable cost):

```
IDENTITY          ← persona (never compresses)
CURRENT SCENE     ← world state from the projection
YOUR MEMORY       ← EpisodicMemory or SalienceMemory output
VISITOR           ← recent user_artifacts (last 3)
[EXTRA]           ← scenario-specific, from _build_extra_prompt()
[OUTPUT FORMAT]   ← JSON constraint (added by structured.py)
```

The layering order is deliberate:
- The model must read IDENTITY first to stay in character
- Scene before memory — what's happening now is more important than what happened before
- Visitor disturbances are always included because they are the most salient inputs
- JSON instruction is last so the model focuses on generating before being constrained
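
A minimal sketch of the ordering rule, assuming each layer arrives as a plain string keyed by its section name and that the bracketed sections are optional. `build_prompt` and `last_visitor_artifacts` are hypothetical helpers, not the project's `ContextBuilder`.

```python
SECTION_ORDER = (
    "IDENTITY",        # persona, always first
    "CURRENT SCENE",   # world state from the projection
    "YOUR MEMORY",     # EpisodicMemory or SalienceMemory output
    "VISITOR",         # recent user artifacts
    "EXTRA",           # scenario-specific, optional
    "OUTPUT FORMAT",   # JSON constraint, always last when present
)


def last_visitor_artifacts(artifacts: list[str], limit: int = 3) -> str:
    """Keep only the most recent `limit` user artifacts, oldest first."""
    return "\n".join(artifacts[-limit:]) if limit > 0 else ""


def build_prompt(sections: dict[str, str]) -> str:
    """Join the non-empty sections in the fixed layering order."""
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name, "").strip()
        if body:
            parts.append(f"{name}\n{body}")
    return "\n\n".join(parts)


prompt = build_prompt({
    "OUTPUT FORMAT": "Reply with a single JSON object.",
    "YOUR MEMORY": "turn 2: the baker hid the flour",
    "IDENTITY": "You are the miller.",
    "VISITOR": last_visitor_artifacts(["a", "b", "c", "d"]),
    "CURRENT SCENE": "Market square, morning.",
})
```

The caller can supply sections in any order; the fixed tuple is what guarantees that identity comes first and the output constraint comes last.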

---

## Phase 3 Upgrade Path

| Feature | Phase | Mechanism |
|---|---|---|
| Keyword salience | 2 | `SalienceMemory` with Jaccard relevance |
| Reflection events | 2 | `ReflectionTracker` + `agent.reflected` kind |
| Embedding relevance | done | `MemoryIndex` semantic search for the relevance term (ADR-0018) |
| pgvector retrieval | done | `MEMORY_INDEX_CONFIG` persists vectors in the ADR-0014 Postgres/pgvector store |
| Belief graph | 4 | Structured belief store derived from reflection events |