# Phase 4.3: Agent evolution
Characters evolve after each match. Evolution is global (every player meets
the same drifted character), bounded per match, and clamped to the
character's base identity. Elo is already handled by Phase 4.0b; this phase
adds the human-like learning layer on top.
## Scope
Five channels, in priority order:
1. **Stylistic drift** — `aggression`, `risk_tolerance`, `patience`,
`trash_talk` sliders shift a small amount based on match outcome +
opponent quality.
2. **Opening-book evolution** — per-opening score updated by a simple
EMA of (opening_played · result_signal). Over time a character favours
openings that won and drifts away from openings that lost.
3. **Trap memory** — pattern-specific counters ("fell for scholar's mate
vs. @alice"). Probability of falling again drops but never zeros —
humans still slip.
4. **Tone drift** — confidence/tilt baselines shift over streaks. Not a
slider; a bias the Soul sees in its mood state.
5. **Hard clamps** — a Viktor who loses 50 games in a row can't turn
into a pacifist. Sliders cannot drift more than ±2 from base;
opening-book weights cannot flip sign on their basis openings.
## Invariants
- **Private matches skip the whole evolution step** (per Phase 4.0b
memory). Set once: `if match.is_private: return`.
- **Global, not per-player.** One evolution state row per character.
Per-player memory (who bullied whom) stays in `OpponentProfile` and
character memories.
- **Per-match deltas are small.** A single match can only move a slider
by ≤ 1 step, a tone baseline by ≤ 0.05, an opening EMA by ≤ 0.1.
- **Identity clamps bound cumulative drift.** Character's base sliders
are the anchor. Cumulative drift is capped at ±2 slider steps from
base. The character can change — slowly — but can't become unrecognisable.
- **Cadence: every match.** Kept batch-friendly so we can flip to
"every N matches" later without rewriting the pipeline.
## Data model
`character_evolution_state` — one row per character, lazy-created on
first post-match run.
| column | type | notes |
|---|---|---|
| `character_id` (PK) | String(36) FK | one row per character |
| `slider_drift` | JSON | `{aggression, risk_tolerance, patience, trash_talk}` — signed deltas from base; bounded [-2, +2] |
| `opening_scores` | JSON | `{"<opening_label>": float}` — EMA-updated result signal in [-1, +1] |
| `trap_memory` | JSON | list of `{"pattern": str, "fell_for": int, "avoided": int, "last_seen_at": iso}` |
| `tone_drift` | JSON | `{"confidence_baseline": float, "tilt_baseline": float}` in [-0.3, +0.3] |
| `matches_processed` | int | counter |
| `last_updated_at` | datetime | |
| `last_match_id` | str nullable | for idempotency check — skip if we've already processed this match |
Migration `0012_character_evolution_state`.
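As a rough illustration of the row shape, the table can be mirrored as a plain in-memory dataclass. This is a sketch, not the ORM model or the migration: `EvolutionState`, `SLIDERS`, and `new_state` are hypothetical names, and only the field names and ranges come from the table above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

SLIDERS = ("aggression", "risk_tolerance", "patience", "trash_talk")


@dataclass
class EvolutionState:
    """In-memory mirror of one character_evolution_state row (hypothetical)."""

    character_id: str
    # Signed deltas from the character's base sliders, each in [-2, +2].
    slider_drift: dict = field(default_factory=lambda: {s: 0.0 for s in SLIDERS})
    # "<opening_label>" -> EMA-updated result signal in [-1, +1].
    opening_scores: dict = field(default_factory=dict)
    # Entries shaped {"pattern", "fell_for", "avoided", "last_seen_at"}.
    trap_memory: list = field(default_factory=list)
    # Baselines in [-0.3, +0.3].
    tone_drift: dict = field(
        default_factory=lambda: {"confidence_baseline": 0.0, "tilt_baseline": 0.0}
    )
    matches_processed: int = 0
    last_updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Idempotency guard: the last match already folded into this state.
    last_match_id: Optional[str] = None


def new_state(character_id: str) -> EvolutionState:
    """Lazy-create a neutral state on the first post-match run."""
    return EvolutionState(character_id=character_id)
```

Using `default_factory` for every JSON column keeps two characters from sharing the same mutable dict or list.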
## Drift math
Let `win_value` be `+1` on a win (character POV), `-1` on a loss, and `0` on a draw.
Let `opponent_strength` be `player_elo_at_start / character_elo_at_start`
(centered around 1.0). Strong opponents amplify drift; weak opponents attenuate it.
```
signal = win_value * clamp(opponent_strength, 0.3, 2.0)
```
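The signal formula translates directly into Python. The function name `match_signal` is an assumption made for this sketch; the parameter names follow the definitions above.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def match_signal(
    win_value: int,
    player_elo_at_start: float,
    character_elo_at_start: float,
) -> float:
    """Signed result signal, scaled by how strong the opponent was."""
    opponent_strength = player_elo_at_start / character_elo_at_start
    return win_value * clamp(opponent_strength, 0.3, 2.0)
```

With these bounds the signal always lies in [-2.0, +2.0], and a draw always yields 0 regardless of opponent strength.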
**Slider drift.** Each match picks at most ONE slider to nudge,
selected by which one would have helped most given the outcome:
- Lost, played cautiously (avg ACPL < 30) → nudge `aggression` +0.5.
- Lost, played recklessly (avg ACPL > 80) → nudge `patience` +0.5.
- Won, opponent higher-rated → nudge `confidence` in tone (not a slider).
- On tilt (recent loss streak ≥ 3) → nudge `trash_talk` slightly back
toward the character's base: down if its current drift is positive, up
if it is negative.
```
new_drift = clamp(old_drift + delta, -2.0, +2.0)
```
Cap: at most one slider updated per match. Delta magnitude ≤ 0.5.
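A minimal sketch of the selection and clamp steps follows. Two things here are assumptions rather than part of the spec: the rules are tried in the order listed, and the tilt nudge uses a step of 0.25 (`TILT_STEP`), since the text only says "slightly". The function names are hypothetical.

```python
from typing import Optional, Tuple

MAX_DELTA = 0.5    # per-match cap on a slider delta
DRIFT_LIMIT = 2.0  # cumulative cap from base
TILT_STEP = 0.25   # assumption: the spec only says "slightly"


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def pick_slider_delta(
    win_value: int, avg_acpl: float, loss_streak: int, trash_talk_drift: float
) -> Optional[Tuple[str, float]]:
    """Return (slider, delta) for at most one slider, or None."""
    if win_value < 0 and avg_acpl < 30:   # lost while playing cautiously
        return ("aggression", MAX_DELTA)
    if win_value < 0 and avg_acpl > 80:   # lost while playing recklessly
        return ("patience", MAX_DELTA)
    if loss_streak >= 3 and trash_talk_drift != 0.0:
        # On tilt: move trash_talk back toward base without overshooting it.
        step = min(TILT_STEP, abs(trash_talk_drift))
        return ("trash_talk", -step if trash_talk_drift > 0 else step)
    return None  # a win over a stronger opponent is handled by tone drift


def apply_slider_delta(slider_drift: dict, choice: Optional[Tuple[str, float]]) -> dict:
    """Apply the chosen delta and clamp cumulative drift to [-2, +2]."""
    updated = dict(slider_drift)
    if choice is not None:
        name, delta = choice
        delta = clamp(delta, -MAX_DELTA, MAX_DELTA)
        updated[name] = clamp(updated.get(name, 0.0) + delta, -DRIFT_LIMIT, DRIFT_LIMIT)
    return updated
```

Returning a single optional pair makes the "at most one slider per match" cap structural rather than something each caller has to remember.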
**Opening-book EMA.**
For each opening the character played in the match:
```
score' = 0.9 * score_prev + 0.1 * signal
```
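Only applied to openings in `character.opening_preferences` or to
openings the character has played in ≥ 3 matches. Because `signal` ranges
over [-2, +2], the result also needs a final clamp to [-1, +1] to stay
within the `opening_scores` range.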
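The EMA step can be sketched as below. Several parts are assumptions: "≥ 3 matches" is read as a count of matches in which the opening was played, tracked in a hypothetical `play_counts` dict that is not in the data model; the per-match step is capped at 0.1 to honour the Invariants section; and the result is clamped to [-1, +1] to match the column range.

```python
ALPHA = 0.1      # EMA weight from the formula above
MAX_STEP = 0.1   # per-match bound on an opening score (Invariants)
MIN_MATCHES = 3  # threshold for openings outside the preference list


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def update_opening_scores(
    scores: dict, play_counts: dict, played: list, preferences: set, signal: float
):
    """Return updated (scores, play_counts) without mutating the inputs."""
    new_scores, new_counts = dict(scores), dict(play_counts)
    for opening in played:
        new_counts[opening] = new_counts.get(opening, 0) + 1
        if opening not in preferences and new_counts[opening] < MIN_MATCHES:
            continue  # not eligible yet: only the count is recorded
        prev = new_scores.get(opening, 0.0)
        # Same as 0.9 * prev + 0.1 * signal, written as a step from prev.
        step = clamp(ALPHA * (signal - prev), -MAX_STEP, MAX_STEP)
        new_scores[opening] = clamp(prev + step, -1.0, 1.0)
    return new_scores, new_counts
```

Without the step cap, a score of -1 meeting a signal of +2 would move by 0.3 in one match, which is why the cap is included here.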
**Trap memory.**
A "trap" is a sharp loss in ≤ 15 plies, or a loss to a named mating
pattern detectable by the analysis step. Detection is cheap: reuse
`MatchAnalysis.critical_moments` — the first critical moment in the
first 12 plies with swing ≥ 400cp is tagged as a trap and fed into
`trap_memory`.
```
if detected_trap(pattern):
    entry = trap_memory[pattern] or new_entry(pattern)
    if character_fell:
        entry.fell_for += 1
    else:
        entry.avoided += 1
    entry.last_seen_at = now
```
A learning Memory row is generated the FIRST time a pattern is seen
(narrative: "Last time an opponent tried X, I stepped right into it
..."). Subsequent hits only bump the counter — no new memory row
per match (prevents the Subconscious from drowning in redundant copies).
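A runnable sketch of detection plus the counter update, using the list-shaped `trap_memory` from the data model. The shape of a critical moment (a dict with `ply` and `swing_cp` keys) is an assumption, as are the function names; the 12-ply and 400cp thresholds come from the text above.

```python
from datetime import datetime, timezone
from typing import Optional

TRAP_MAX_PLY = 12
TRAP_MIN_SWING_CP = 400


def first_trap_moment(critical_moments: list) -> Optional[dict]:
    """First critical moment within 12 plies whose swing is >= 400cp."""
    for moment in sorted(critical_moments, key=lambda m: m["ply"]):
        if moment["ply"] > TRAP_MAX_PLY:
            break  # sorted by ply, so nothing later can qualify
        if abs(moment["swing_cp"]) >= TRAP_MIN_SWING_CP:
            return moment
    return None


def record_trap(
    trap_memory: list, pattern: str, character_fell: bool,
    now: Optional[datetime] = None,
) -> bool:
    """Bump the counters for `pattern`; return True on the first sighting."""
    now = now or datetime.now(timezone.utc)
    entry = next((e for e in trap_memory if e["pattern"] == pattern), None)
    first_sighting = entry is None
    if entry is None:
        entry = {"pattern": pattern, "fell_for": 0, "avoided": 0, "last_seen_at": None}
        trap_memory.append(entry)
    if character_fell:
        entry["fell_for"] += 1
    else:
        entry["avoided"] += 1
    entry["last_seen_at"] = now.isoformat()
    return first_sighting
```

The boolean return value gives the caller a single place to decide whether a learning Memory row should be generated, so repeat hits only touch the counters.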
**Tone drift.** EMA toward current streak:
```
confidence_baseline' = 0.95 * confidence_prev + 0.05 * clamp(win_streak / 5, -0.3, +0.3)
tilt_baseline' = 0.95 * tilt_prev + 0.05 * clamp(-loss_streak / 5, -0.3, +0.3)
```
Applied every match. Fed to the Soul as an additive nudge on the
initial `MoodState` before each turn.
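The two tone updates can be written as one small function. `update_tone` is a hypothetical name; the dict keys match the `tone_drift` column.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def update_tone(tone_drift: dict, win_streak: int, loss_streak: int) -> dict:
    """One EMA step of both baselines toward the current streak."""
    confidence_target = clamp(win_streak / 5, -0.3, 0.3)
    tilt_target = clamp(-loss_streak / 5, -0.3, 0.3)
    return {
        "confidence_baseline": 0.95 * tone_drift["confidence_baseline"]
        + 0.05 * confidence_target,
        "tilt_baseline": 0.95 * tone_drift["tilt_baseline"] + 0.05 * tilt_target,
    }
```

Since both the target and the previous baseline stay within [-0.3, +0.3], one update moves a baseline by at most 0.05 × 0.6 = 0.03, which sits inside the 0.05 per-match bound from the Invariants section.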
## Pipeline
Added as a new step after `elo_ratchet`, before `memory_generation` in
`app/post_match/processor.py`. That way generated memories can reference
newly-detected traps.
```
engine_analysis → feature_extraction → elo_ratchet
→ evolution ← NEW
→ memory_generation → narrative_summary
```
Contract: `apply_evolution(session, match, analysis)` raises on
programmer errors and swallows LLM errors (it should never invoke an LLM
anyway, since the hook is pure data). It is idempotent via the
`last_match_id` guard.
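The two guards in the contract can be sketched as follows. This is a simplified stand-in, not the real hook: a plain dict (`states`) replaces the database session, the match is any object with `id`, `character_id`, and `is_private` attributes, and the boolean return value is an addition for illustration.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("evolution")


def apply_evolution(states: dict, match, analysis) -> bool:
    """Return True if evolution state was updated for this match."""
    if match.is_private:
        # Guard 1: private matches never read or write evolution state.
        logger.info("evolution skipped: private match %s", match.id)
        return False
    # Lazy-create the per-character row on the first post-match run.
    state = states.setdefault(
        match.character_id, {"matches_processed": 0, "last_match_id": None}
    )
    if state["last_match_id"] == match.id:
        # Guard 2: this match was already processed, so do nothing.
        return False
    # Slider, opening, trap, and tone updates would run here.
    state["matches_processed"] += 1
    state["last_match_id"] = match.id
    return True


# Usage with stand-in objects: the second call on the same match is a no-op.
states = {}
public = SimpleNamespace(id="m1", character_id="viktor", is_private=False)
private = SimpleNamespace(id="m2", character_id="anna", is_private=True)
first_run = apply_evolution(states, public, analysis=None)
second_run = apply_evolution(states, public, analysis=None)
private_run = apply_evolution(states, private, analysis=None)
```

Note that a single `last_match_id` only protects against re-running the most recent match; replaying an older match would need a different guard.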
## Integration with Subconscious
- Learned memories from trap memory use a new `MemoryType.LEARNING`
scope. They enter the normal vector-retrieval flow; the Subconscious
doesn't need changes.
- Tone drift is read at Soul call time — `_build_soul_input`
adds `tone_drift` to the `mood` parameter before passing to the Soul.
No Soul-side changes needed.
- Slider drift is read whenever the Director builds the engine config
and whenever the style prompt is rendered. Effective slider =
`character.slider + drift`.
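The effective-slider rule is a one-liner; the sketch below also re-clamps the drift defensively in case a stored value is out of range. `effective_sliders` is a hypothetical helper name, and no assumption is made about the slider scale itself.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def effective_sliders(base: dict, drift: dict) -> dict:
    """Effective slider = base slider + drift, with drift held to [-2, +2]."""
    return {
        name: value + clamp(drift.get(name, 0.0), -2.0, 2.0)
        for name, value in base.items()
    }
```

Sliders with no recorded drift fall back to their base value, so a character without an evolution row behaves exactly as before.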
## Private-match exclusion
At entry: `if match.is_private: return`. That means no state read, no
state write, and no generation of learning memories. The skip is logged at
INFO so a demo can grep `evolution.log` to verify the guardrail is working.
## Testing
- **Unit tests** per math function (slider delta selection, opening
EMA step, trap detection, tone EMA, clamps).
- **Integration test** — simulate a 50-match burst against a fixed
opponent profile and assert cumulative drift stays in bounds.
- **Private-match test** — confirm `apply_evolution` is a no-op on
`match.is_private = True`.
- **Idempotency test** — re-running the pipeline on the same match
doesn't double-apply drift.
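A sketch of the burst test, reduced to the slider clamp alone. It models the worst case named in the Scope section (50 cautious losses in a row, each nudging the same slider by +0.5); the helper and test names are hypothetical, and a real integration test would drive the full pipeline instead.

```python
MAX_DELTA = 0.5
DRIFT_LIMIT = 2.0


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def simulate_losing_streak(n_matches: int = 50) -> list:
    """Drift of one slider after each of n consecutive +0.5 nudges."""
    drift, history = 0.0, []
    for _ in range(n_matches):
        drift = clamp(drift + MAX_DELTA, -DRIFT_LIMIT, DRIFT_LIMIT)
        history.append(drift)
    return history


def test_drift_stays_in_bounds():
    history = simulate_losing_streak(50)
    assert len(history) == 50
    # Cumulative drift never leaves [-2, +2].
    assert all(-DRIFT_LIMIT <= d <= DRIFT_LIMIT for d in history)
    # No single match moves the slider by more than 0.5.
    steps = [b - a for a, b in zip([0.0] + history, history)]
    assert all(abs(s) <= MAX_DELTA for s in steps)
    # The streak saturates at the identity clamp.
    assert history[-1] == DRIFT_LIMIT
```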
## What this phase does NOT do
- Does not build a multi-player personality model (global only).
- Does not feed PvP matches into evolution. Characters only appear in
PvE; PvP is player-vs-player, so there is no character to evolve.
- Does not build a UI for viewing a character's drift. Phase 4.5 polish
pass adds a read-only "this character has learned…" panel on the
detail page if there's time.