# Phase 4.3 — Agent evolution

Characters evolve after each match. Evolution is global (one shared
state per character, the same for every player), bounded per match, and
clamped to the character's identity. Elo is already handled by
Phase 4.0b; this phase adds the human-like learning layer on top.

## Scope

Five channels, in priority order:

1. **Stylistic drift** — the `aggression`, `risk_tolerance`, `patience`,
   and `trash_talk` sliders shift a small amount based on match outcome
   and opponent quality.
2. **Opening-book evolution** — per-opening score updated by a simple
   EMA of (opening_played · result_signal). Over time a character favours
   openings that won and drifts away from openings that lost.
3. **Trap memory** — pattern-specific counters ("fell for scholar's mate
   vs. @alice"). Probability of falling again drops but never zeros —
   humans still slip.
4. **Tone drift** — confidence/tilt baselines shift over streaks. Not a
   slider; a bias the Soul sees in its mood state.
5. **Hard clamps** — a Viktor who loses 50 games in a row can't turn
   into a pacifist. Sliders cannot drift more than ±2 from base;
   opening-book weights cannot flip sign for the character's base openings.

## Invariants

- **Private matches skip the whole evolution step** (per Phase 4.0b
  memory). Checked once at entry: `if match.is_private: return`.
- **Global, not per-player.** One evolution state row per character.
  Per-player memory (who bullied whom) stays in `OpponentProfile` and
  character memories.
- **Per-match deltas are small.** A single match can only move a slider
  by ≤ 1 step, a tone baseline by ≤ 0.05, an opening EMA by ≤ 0.1.
- **Identity clamps bound cumulative drift.** Character's base sliders
  are the anchor. Cumulative drift is capped at ±2 slider steps from
  base. The character can change — slowly — but can't become unrecognisable.
- **Cadence: every match.** Kept batch-friendly so we can flip to
  "every N matches" later without rewriting the pipeline.

## Data model

`character_evolution_state` — one row per character, lazy-created on
first post-match run.

| column | type | notes |
|---|---|---|
| `character_id` (PK) | String(36) FK | one row per character |
| `slider_drift` | JSON | `{aggression, risk_tolerance, patience, trash_talk}` — signed deltas from base; bounded [-2, +2] |
| `opening_scores` | JSON | `{"<opening_label>": float}` — EMA-updated result signal in [-1, +1] |
| `trap_memory` | JSON | list of `{"pattern": str, "fell_for": int, "avoided": int, "last_seen_at": iso}` |
| `tone_drift` | JSON | `{"confidence_baseline": float, "tilt_baseline": float}` in [-0.3, +0.3] |
| `matches_processed` | int | counter |
| `last_updated_at` | datetime | |
| `last_match_id` | str nullable | for idempotency check — skip if we've already processed this match |

Migration `0012_character_evolution_state`.
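
As an illustration of the row shape, the table above could be mirrored by a plain dataclass. This is a minimal sketch, not the ORM model: the class name `CharacterEvolutionState` and the zeroed defaults are assumptions, and the real table would live behind migration `0012`.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

SLIDERS = ("aggression", "risk_tolerance", "patience", "trash_talk")


def _zero_sliders() -> Dict[str, float]:
    # Assumption: a lazily created row starts with no drift at all.
    return {name: 0.0 for name in SLIDERS}


def _zero_tone() -> Dict[str, float]:
    return {"confidence_baseline": 0.0, "tilt_baseline": 0.0}


@dataclass
class CharacterEvolutionState:
    """In-memory mirror of one `character_evolution_state` row (hypothetical)."""

    character_id: str
    slider_drift: Dict[str, float] = field(default_factory=_zero_sliders)  # each in [-2, +2]
    opening_scores: Dict[str, float] = field(default_factory=dict)         # each in [-1, +1]
    trap_memory: List[dict] = field(default_factory=list)
    tone_drift: Dict[str, float] = field(default_factory=_zero_tone)       # each in [-0.3, +0.3]
    matches_processed: int = 0
    last_updated_at: Optional[datetime] = None
    last_match_id: Optional[str] = None  # idempotency guard
```

Using `default_factory` keeps each row's JSON-like containers independent, so mutating one character's drift never leaks into another's.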

## Drift math

Let `win_value` be `+1` on a win (character POV), `-1` on a loss, and `0` on a draw.
Let `opponent_strength` be `player_elo_at_start / character_elo_at_start` (centered
around 1.0). Strong opponents amplify drift; weak opponents attenuate it.

```
signal        = win_value * clamp(opponent_strength, 0.3, 2.0)
```
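
The signal formula can be sketched in Python as follows. The function names `clamp` and `match_signal` and the guard against a non-positive character Elo are assumptions added for illustration.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def match_signal(win_value: int, player_elo: float, character_elo: float) -> float:
    """Signed result signal, scaled by relative opponent strength.

    win_value is +1 (win), -1 (loss) or 0 (draw) from the character's POV.
    Both Elo values are the ratings at the start of the match.
    """
    if character_elo <= 0:
        # Assumption: ratings are always positive; fail loudly otherwise.
        raise ValueError("character_elo must be positive")
    opponent_strength = player_elo / character_elo
    return win_value * clamp(opponent_strength, 0.3, 2.0)
```

For example, a win against an equally rated player yields `1.0`, while a loss to a player rated more than twice as high is capped at `-2.0`.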

**Slider drift.** Each match picks at most ONE slider to nudge,
selected by which one would have helped most given the outcome:

- Lost, played cautiously (avg ACPL < 30) → nudge `aggression` +0.5.
- Lost, played recklessly (avg ACPL > 80) → nudge `patience` +0.5.
- Won, opponent higher-rated → nudge `confidence` in tone (not a slider).
- On tilt (recent loss streak ≥ 3) → nudge `trash_talk` slightly back
  toward the character's base, i.e. against the sign of its current drift.

```
new_drift = clamp(old_drift + delta, -2.0, +2.0)
```

Cap: at most one slider updated per match. Delta magnitude ≤ 0.5.
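
One possible reading of the selection rules, as a sketch. Several details are assumptions rather than part of the spec: the rules are tried in the order listed and the first match wins, the tilt nudge uses a hypothetical `TILT_STEP` of 0.25, and the "won against a higher-rated opponent" case returns no slider nudge because it only affects tone.

```python
from typing import Dict, Optional, Tuple

MAX_DRIFT = 2.0   # cumulative cap from the identity clamp
STEP = 0.5        # per-match delta magnitude
TILT_STEP = 0.25  # assumption: the spec only says "slightly"


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def pick_slider_nudge(
    lost: bool, avg_acpl: float, loss_streak: int, trash_talk_drift: float
) -> Optional[Tuple[str, float]]:
    """Return (slider_name, delta) for at most one slider, or None."""
    if lost and avg_acpl < 30:   # lost while playing cautiously
        return ("aggression", STEP)
    if lost and avg_acpl > 80:   # lost while playing recklessly
        return ("patience", STEP)
    if loss_streak >= 3 and trash_talk_drift != 0:
        # On tilt: pull trash_talk back toward base without overshooting it.
        step = min(TILT_STEP, abs(trash_talk_drift))
        return ("trash_talk", -step if trash_talk_drift > 0 else step)
    return None


def apply_nudge(
    drift: Dict[str, float], nudge: Optional[Tuple[str, float]]
) -> Dict[str, float]:
    """Return a copy of drift with the nudge applied and clamped to ±MAX_DRIFT."""
    updated = dict(drift)
    if nudge is not None:
        name, delta = nudge
        updated[name] = clamp(updated.get(name, 0.0) + delta, -MAX_DRIFT, MAX_DRIFT)
    return updated
```

Returning a single optional tuple makes the "at most one slider per match" cap structural rather than something callers must remember to enforce.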

**Opening-book EMA.**
For each opening the character played in the match:
```
score'       = 0.9 * score_prev  +  0.1 * signal
```
Only applied to openings in `character.opening_preferences` or to
openings whose score has been built up over ≥ 3 matches.
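
A sketch of the EMA step with the eligibility rule. Three things here are assumptions: a separate `seen_counts` map is used to decide when an opening has been seen in ≥ 3 matches, the per-match step is clamped to ±0.1 to honour the invariant (the raw formula can move further when `signal` approaches ±2.0), and the stored score is clamped to the [-1, +1] range given in the data model.

```python
from typing import Dict, Iterable, Set

MIN_MATCHES = 3  # openings outside the preferences need this many sightings


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def ema_step(score_prev: float, signal: float) -> float:
    """One EMA update, bounded per match and overall."""
    target = 0.9 * score_prev + 0.1 * signal
    step = clamp(target - score_prev, -0.1, 0.1)    # per-match invariant
    return clamp(score_prev + step, -1.0, 1.0)      # data-model range


def update_opening_scores(
    scores: Dict[str, float],
    seen_counts: Dict[str, int],
    played: Iterable[str],
    preferences: Set[str],
    signal: float,
) -> None:
    """Update scores in place for every eligible opening played in the match."""
    for opening in played:
        seen_counts[opening] = seen_counts.get(opening, 0) + 1
        eligible = opening in preferences or seen_counts[opening] >= MIN_MATCHES
        if eligible:
            scores[opening] = ema_step(scores.get(opening, 0.0), signal)
```

With this gating, a one-off opening outside the character's preferences leaves `scores` untouched until it has appeared in three matches.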

**Trap memory.**
A "trap" is a sharp loss in ≤ 15 plies, or a loss to a named mating
pattern detectable by the analysis step. Detection is cheap: reuse
`MatchAnalysis.critical_moments` — the first critical moment in the
first 12 plies with swing ≥ 400cp is tagged as a trap and fed into
`trap_memory`.

```
if detected_trap(pattern):
    entry = trap_memory[pattern] or new(entry)
    if character_fell: entry.fell_for += 1
    else:              entry.avoided += 1
    entry.last_seen_at = now
```

A learning Memory row is generated the FIRST time a pattern is seen
(narrative: "Last time an opponent tried X, I stepped right into it
..."). Subsequent hits only bump the counter — no new memory row
per match (prevents the Subconscious from drowning in redundant copies).
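
The detection and bookkeeping described above might look like the sketch below. The shape of a critical moment (a dict with `ply` and `swing_cp` keys) is an assumption, as are the function names; the real `MatchAnalysis.critical_moments` structure may differ. `trap_memory` is kept as a list of dicts, matching the data model.

```python
from datetime import datetime, timezone
from typing import List, Optional


def find_trap_moment(
    critical_moments: List[dict], max_ply: int = 12, min_swing_cp: int = 400
) -> Optional[dict]:
    """Return the earliest critical moment that qualifies as a trap, if any."""
    for moment in sorted(critical_moments, key=lambda m: m["ply"]):
        if moment["ply"] <= max_ply and abs(moment["swing_cp"]) >= min_swing_cp:
            return moment
    return None


def record_trap(
    trap_memory: List[dict],
    pattern: str,
    character_fell: bool,
    now: Optional[datetime] = None,
) -> bool:
    """Bump the counters for pattern; return True the first time it is seen.

    The caller can use the return value to decide whether to generate
    a learning Memory row.
    """
    now = now or datetime.now(timezone.utc)
    entry = next((e for e in trap_memory if e["pattern"] == pattern), None)
    first_time = entry is None
    if first_time:
        entry = {"pattern": pattern, "fell_for": 0, "avoided": 0, "last_seen_at": None}
        trap_memory.append(entry)
    if character_fell:
        entry["fell_for"] += 1
    else:
        entry["avoided"] += 1
    entry["last_seen_at"] = now.isoformat()
    return first_time
```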

**Tone drift.** EMA toward current streak:
```
confidence_baseline'  = 0.95 * confidence_prev + 0.05 * clamp(win_streak / 5, -0.3, +0.3)
tilt_baseline'        = 0.95 * tilt_prev       + 0.05 * clamp(-loss_streak / 5, -0.3, +0.3)
```

Applied every match. Fed to the Soul as an additive nudge on the
initial `MoodState` before each turn.
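
The two tone formulas translate directly into code. This is a sketch; the function name `update_tone` and the dict representation of `tone_drift` are assumptions.

```python
from typing import Dict


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def update_tone(tone: Dict[str, float], win_streak: int, loss_streak: int) -> Dict[str, float]:
    """Return new baselines after one match, leaving the input untouched."""
    confidence_target = clamp(win_streak / 5, -0.3, 0.3)
    tilt_target = clamp(-loss_streak / 5, -0.3, 0.3)
    return {
        "confidence_baseline": 0.95 * tone["confidence_baseline"] + 0.05 * confidence_target,
        "tilt_baseline": 0.95 * tone["tilt_baseline"] + 0.05 * tilt_target,
    }
```

Because each baseline is a convex combination of its previous value and a target in [-0.3, +0.3], a baseline that starts inside that range stays inside it, and a single match moves it by at most 0.05 × 0.6 = 0.03.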

## Pipeline

Added as a new step after `elo_ratchet`, before `memory_generation` in
`app/post_match/processor.py`. That way generated memories can reference
newly-detected traps.

```
engine_analysis → feature_extraction → elo_ratchet
  → evolution                ← NEW
  → memory_generation → narrative_summary
```

Contract: `apply_evolution(session, match, analysis)` — raises on
programmer errors, swallows LLM errors (shouldn't invoke LLM anyway —
the hook is pure data). Idempotent via `last_match_id` guard.
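
A minimal sketch of the guard structure behind that contract. The stand-in types `Match` and `EvolutionState`, the in-memory `store` dict used in place of a database session, and the boolean return value are all assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Match:  # hypothetical stand-in for the real match model
    id: str
    character_id: str
    is_private: bool = False


@dataclass
class EvolutionState:  # hypothetical stand-in for the state row
    character_id: str
    matches_processed: int = 0
    last_match_id: Optional[str] = None


def apply_evolution(store: Dict[str, EvolutionState], match: Match, analysis: dict) -> bool:
    """Apply evolution once per match; return True if state was changed."""
    if match.is_private:
        return False  # private matches: no state read, no state write
    # Lazy-create the row on the first post-match run.
    state = store.setdefault(match.character_id, EvolutionState(match.character_id))
    if state.last_match_id == match.id:
        return False  # already processed: idempotent no-op
    # ... slider, opening, trap and tone updates would run here ...
    state.matches_processed += 1
    state.last_match_id = match.id
    return True
```

Checking `is_private` before touching the store is what keeps the private path free of even a lazy row creation.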

## Integration with Subconscious

- Learned memories from trap memory use a new `MemoryType.LEARNING`
  scope. They enter the normal vector-retrieval flow; the Subconscious
  doesn't need changes.
- Tone drift is read at Soul call time — `_build_soul_input`
  adds `tone_drift` to the `mood` parameter before passing to the Soul.
  No Soul-side changes needed.
- Slider drift is read whenever the Director builds the engine config
  and whenever the style prompt is rendered. Effective slider =
  `character.slider + drift`.
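
The effective-slider rule can be sketched as a small helper. The name `effective_sliders` is hypothetical, and re-clamping the stored drift to ±2 on read is a defensive assumption rather than something the spec requires.

```python
from typing import Dict


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def effective_sliders(base: Dict[str, float], drift: Dict[str, float]) -> Dict[str, float]:
    """Combine base sliders with drift; sliders without a drift entry stay at base."""
    return {
        name: value + clamp(drift.get(name, 0.0), -2.0, 2.0)
        for name, value in base.items()
    }
```

Both the Director's engine config and the style prompt could call the same helper so the two views of a character never disagree.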

## Private-match exclusion

At entry: `if match.is_private: return`. No state is read, no state is
written, and no learning memories are generated. The skip is logged at
INFO so a demo can grep `evolution.log` to verify the guardrail is working.

## Testing

- **Unit tests** per math function (slider delta selection, opening
  EMA step, trap detection, tone EMA, clamps).
- **Integration test** — simulate a 50-match burst against a fixed
  opponent profile and assert cumulative drift stays in bounds.
- **Private-match test** — confirm `apply_evolution` is a no-op on
  `match.is_private = True`.
- **Idempotency test** — re-running the pipeline on the same match
  doesn't double-apply drift.
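
The burst test could start from something like the sketch below, written pytest-style. It simulates only a single slider taking the maximum nudge every match, which is a simplifying assumption; the real integration test would drive `apply_evolution` against a fixed opponent profile.

```python
from typing import List

MAX_DRIFT = 2.0
STEP = 0.5


def simulate_burst(matches: int = 50, delta: float = STEP) -> List[float]:
    """Apply the same clamped nudge `matches` times and record the drift."""
    drift = 0.0
    history = []
    for _ in range(matches):
        drift = max(-MAX_DRIFT, min(MAX_DRIFT, drift + delta))
        history.append(drift)
    return history


def test_cumulative_drift_stays_in_bounds() -> None:
    for delta in (STEP, -STEP):
        history = simulate_burst(50, delta)
        assert len(history) == 50
        assert all(-MAX_DRIFT <= d <= MAX_DRIFT for d in history)
        # The cap is reached after four matches and held from then on.
        assert abs(history[3]) == MAX_DRIFT
        assert abs(history[-1]) == MAX_DRIFT
```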

## What this phase does NOT do

- Does not build a multi-player personality model (global only).
- Does not feed PvP matches into evolution. Characters only play PvE;
  PvP is player-vs-player and involves no character.
- Does not build a UI for viewing a character's drift. Phase 4.5 polish
  pass adds a read-only "this character has learned…" panel on the
  detail page if there's time.