# Phase 4.3: Agent evolution

Characters evolve after each match. The same evolved character is served to everyone (global drift). Change is bounded per match and clamped to the character's identity. Elo is already handled by Phase 4.0b; this phase adds the human-like learning layer on top.

## Scope

Five channels, in priority order:

1. **Stylistic drift**: the `aggression`, `risk_tolerance`, `patience` and `trash_talk` sliders shift by a small amount based on match outcome and opponent quality.
2. **Opening-book evolution**: a per-opening score is updated by a simple EMA of (opening_played · result_signal). Over time a character favours openings that won and drifts away from openings that lost.
3. **Trap memory**: pattern-specific counters ("fell for scholar's mate vs. @alice"). The probability of falling for the same trap again drops but never reaches zero; humans still slip.
4. **Tone drift**: confidence/tilt baselines shift over streaks. This is not a slider but a bias the Soul sees in its mood state.
5. **Hard clamps**: a Viktor who loses 50 games in a row can't turn into a pacifist. Sliders cannot drift more than ±2 from base, and opening-book weights cannot flip sign on the character's base openings.

## Invariants

- **Private matches skip the whole evolution step** (per Phase 4.0b memory). The check happens once, at entry: `if match.is_private: return`.
- **Global, not per-player.** There is one evolution state row per character. Per-player memory (who bullied whom) stays in `OpponentProfile` and the character's memories.
- **Per-match deltas are small.** A single match can move a slider by ≤ 1 step, a tone baseline by ≤ 0.05, and an opening EMA score by ≤ 0.1.
- **Identity clamps bound cumulative drift.** The character's base sliders are the anchor, and cumulative drift is capped at ±2 slider steps from base. The character can change, slowly, but can't become unrecognisable.
- **Cadence: every match.** The step is kept batch-friendly so we can switch to "every N matches" later without rewriting the pipeline.

## Data model

`character_evolution_state`: one row per character, lazily created on the first post-match run.
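
For illustration, the lazy creation can be sketched with a plain dataclass standing in for the ORM model and a dict standing in for the table. `CharacterEvolutionState`, `get_or_create_state` and the all-zero initial values are assumptions made for this sketch, not the project's actual code:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

SLIDERS = ("aggression", "risk_tolerance", "patience", "trash_talk")


@dataclass
class CharacterEvolutionState:
    """Hypothetical in-memory stand-in for one character_evolution_state row."""

    character_id: str
    # Signed deltas from the character's base sliders; a new row has no drift.
    slider_drift: Dict[str, float] = field(default_factory=lambda: {s: 0.0 for s in SLIDERS})
    opening_scores: Dict[str, float] = field(default_factory=dict)
    trap_memory: List[dict] = field(default_factory=list)
    tone_drift: Dict[str, float] = field(
        default_factory=lambda: {"confidence_baseline": 0.0, "tilt_baseline": 0.0}
    )
    matches_processed: int = 0
    last_updated_at: Optional[datetime] = None
    last_match_id: Optional[str] = None


def get_or_create_state(
    store: Dict[str, CharacterEvolutionState], character_id: str
) -> CharacterEvolutionState:
    """Return the character's row, creating a zero-drift row on first use."""
    state = store.get(character_id)
    if state is None:
        state = CharacterEvolutionState(character_id=character_id)
        store[character_id] = state
    return state
```

Starting every drift field at zero means a character that has never been processed behaves exactly like its base definition.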

| column | type | notes |
|---|---|---|
| `character_id` (PK) | String(36) FK | one row per character |
| `slider_drift` | JSON | `{aggression, risk_tolerance, patience, trash_talk}`: signed deltas from base, bounded to [-2, +2] |
| `opening_scores` | JSON | per-opening float, keyed by opening: EMA-updated result signal in [-1, +1] |
| `trap_memory` | JSON | list of `{"pattern": str, "fell_for": int, "avoided": int, "last_seen_at": iso}` |
| `tone_drift` | JSON | `{"confidence_baseline": float, "tilt_baseline": float}`, each in [-0.3, +0.3] |
| `matches_processed` | int | counter |
| `last_updated_at` | datetime | |
| `last_match_id` | str, nullable | idempotency check: skip if this match has already been processed |

Migration: `0012_character_evolution_state`.

## Drift math

Let `win_value` be `+1` on a win (from the character's POV), `-1` on a loss and `0` on a draw.

Let `opponent_strength` be `player_elo_at_start / character_elo_at_start` (centered around 1.0). Strong opponents amplify drift; weak opponents attenuate it.

```
signal = win_value * clamp(opponent_strength, 0.3, 2.0)
```

So `signal` lies in [-2, +2] and is 0 on a draw.

**Slider drift.** Each match nudges at most ONE slider, chosen by which one would have helped most given the outcome:

- Lost, played cautiously (avg ACPL < 30) → nudge `aggression` +0.5.
- Lost, played recklessly (avg ACPL > 80) → nudge `patience` +0.5.
- Won against a higher-rated opponent → nudge `confidence` in tone (not a slider).
- On tilt (recent loss streak ≥ 3) → nudge `trash_talk` slightly back toward the character's base, up or down depending on the direction of its drift.

```
new_drift = clamp(old_drift + delta, -2.0, +2.0)
```

Cap: at most one slider is updated per match, with delta magnitude ≤ 0.5.

**Opening-book EMA.** For each opening the character played in the match:

```
score' = 0.9 * score_prev + 0.1 * signal
```

Only applied to openings in `character.opening_preferences` or to openings whose score has been built up over ≥ 3 matches. Because `signal` can reach ±2, the raw EMA step can exceed the 0.1 per-match cap from Invariants (for example, `score_prev = -1` and `signal = +2` give a step of 0.3), so the step should additionally be clamped to ±0.1 and the score to [-1, +1].
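
The drift rules above can be sketched as plain functions. This is a minimal sketch, not the project's implementation: the function names, the priority among the slider rules (taken to be the bullet order), the `TILT_STEP = 0.25` size of the "slight" `trash_talk` nudge, and the extra clamps on the opening EMA (step to ±0.1, score to [-1, +1], added so the update respects the Invariants and the `opening_scores` range) are all assumptions:

```python
from typing import Dict, Optional, Tuple

SLIDER_STEP = 0.5       # max per-match slider delta
MAX_DRIFT = 2.0         # identity clamp on cumulative slider drift
TILT_STEP = 0.25        # ASSUMED size of the "slight" trash_talk nudge
EMA_ALPHA = 0.1         # weight of the new signal in the opening EMA
MAX_OPENING_STEP = 0.1  # per-match cap on opening-score change (Invariants)

Nudge = Optional[Tuple[str, float]]


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def compute_signal(win_value: int, player_elo: float, character_elo: float) -> float:
    """win_value is +1/-1/0 from the character's POV; strength is the Elo ratio."""
    opponent_strength = player_elo / character_elo
    return win_value * clamp(opponent_strength, 0.3, 2.0)


def pick_slider_nudge(win_value: int, avg_acpl: float, loss_streak: int,
                      slider_drift: Dict[str, float]) -> Nudge:
    """Pick at most one slider; rules are tried in bullet order (ASSUMED priority).

    The "won vs. higher-rated opponent" rule nudges tone, not a slider,
    so it is not handled here.
    """
    if win_value < 0 and avg_acpl < 30:
        return ("aggression", SLIDER_STEP)
    if win_value < 0 and avg_acpl > 80:
        return ("patience", SLIDER_STEP)
    if loss_streak >= 3:
        drift = slider_drift.get("trash_talk", 0.0)
        if drift != 0.0:
            # Step back toward base (drift 0) without overshooting it.
            step = min(TILT_STEP, abs(drift))
            return ("trash_talk", -step if drift > 0 else step)
    return None


def apply_slider_nudge(slider_drift: Dict[str, float], nudge: Nudge) -> None:
    """Apply the nudge in place, enforcing the per-match and cumulative clamps."""
    if nudge is None:
        return
    slider, delta = nudge
    delta = clamp(delta, -SLIDER_STEP, SLIDER_STEP)
    slider_drift[slider] = clamp(slider_drift.get(slider, 0.0) + delta, -MAX_DRIFT, MAX_DRIFT)


def update_opening_score(score_prev: float, signal: float) -> float:
    """score' = 0.9 * score_prev + 0.1 * signal, with the step and result clamped."""
    target = (1 - EMA_ALPHA) * score_prev + EMA_ALPHA * signal
    step = clamp(target - score_prev, -MAX_OPENING_STEP, MAX_OPENING_STEP)
    return clamp(score_prev + step, -1.0, 1.0)
```

Keeping each rule a pure function of its inputs makes the unit tests listed under Testing straightforward, and the 50-match burst check reduces to looping `apply_slider_nudge` and asserting every drift stays within ±2.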

**Trap memory.** A "trap" is a sharp loss in ≤ 15 plies, or a loss to a named mating pattern that the analysis step can detect. Detection is cheap: reuse `MatchAnalysis.critical_moments`. The first critical moment within the first 12 plies with a swing ≥ 400cp is tagged as a trap and fed into `trap_memory`.

```python
from datetime import datetime, timezone

def record_trap(trap_memory: list, pattern: str, character_fell: bool) -> bool:
    """Call once a trap is detected. Updates the trap_memory list in place; True if the pattern is new."""
    entry = next((e for e in trap_memory if e["pattern"] == pattern), None)
    is_new = entry is None
    if is_new:
        entry = {"pattern": pattern, "fell_for": 0, "avoided": 0}
        trap_memory.append(entry)
    entry["fell_for" if character_fell else "avoided"] += 1
    entry["last_seen_at"] = datetime.now(timezone.utc).isoformat()
    return is_new
```

A learning Memory row is generated the FIRST time a pattern is seen (narrative: "Last time an opponent tried X, I stepped right into it ..."). Subsequent hits only bump the counters; no new memory row is written per match, which keeps the Subconscious from drowning in redundant copies.

**Tone drift.** EMA toward the current streak:

```
confidence_baseline' = 0.95 * confidence_prev + 0.05 * clamp(win_streak / 5, -0.3, +0.3)
tilt_baseline'       = 0.95 * tilt_prev       + 0.05 * clamp(-loss_streak / 5, -0.3, +0.3)
```

Applied every match. The result is fed to the Soul as an additive nudge on the initial `MoodState` before each turn.

## Pipeline

Evolution is added as a new step after `elo_ratchet` and before `memory_generation` in `app/post_match/processor.py`, so that generated memories can reference newly detected traps.

```
engine_analysis
  → feature_extraction
  → elo_ratchet
  → evolution            ← NEW
  → memory_generation
  → narrative_summary
```

Contract: `apply_evolution(session, match, analysis)` raises on programmer errors and swallows LLM errors (it shouldn't invoke an LLM anyway; the hook is pure data). It is idempotent via the `last_match_id` guard.

## Integration with Subconscious

- Learned memories from trap memory use a new `MemoryType.LEARNING` scope. They enter the normal vector-retrieval flow, so the Subconscious doesn't need changes.
- Tone drift is read at Soul call time: `_build_soul_input` adds `tone_drift` to the `mood` parameter before passing it to the Soul. No Soul-side changes are needed.
- Slider drift is read whenever the Director builds the engine config and whenever the style prompt is rendered. Effective slider = `character.slider + drift`.

## Private-match exclusion

At entry: `if match.is_private: return`. No state read, no state write, and no generation of learned memories. The skip is logged at INFO so a demo can grep `evolution.log` to verify the guardrail is working.

## Testing

- **Unit tests** for each math function (slider delta selection, opening EMA step, trap detection, tone EMA, clamps).
- **Integration test**: simulate a 50-match burst against a fixed opponent profile and assert that cumulative drift stays in bounds.
- **Private-match test**: confirm `apply_evolution` is a no-op when `match.is_private = True`.
- **Idempotency test**: re-running the pipeline on the same match doesn't double-apply drift.

## What this phase does NOT do

- Does not build a per-player personality model (evolution is global only).
- Does not feed PvP matches into evolution. Characters only appear in PvE; PvP matches are player-vs-player.
- Does not build a UI for viewing a character's drift. The Phase 4.5 polish pass adds a read-only "this character has learned…" panel on the detail page if there's time.
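
The private-match and idempotency tests listed under Testing can be exercised against a small sketch of the entry point. This is a hedged illustration, not the project's `apply_evolution`: it takes an in-memory `store` and a character id instead of `(session, match, analysis)`, the `Match`/`EvolutionState` stand-ins and the post-match streak fields are assumptions, and only the tone channel is wired in. What it shows is the order of the guards: the private check runs before any state is touched, and the `last_match_id` check turns a re-run into a no-op.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, Optional

log = logging.getLogger("evolution")


@dataclass
class Match:
    """Hypothetical minimal stand-in for the real Match model."""
    id: str
    is_private: bool
    win_streak: int = 0   # ASSUMED: streak counts including this match
    loss_streak: int = 0


@dataclass
class EvolutionState:
    """Hypothetical stand-in for a character_evolution_state row (tone only)."""
    tone_drift: Dict[str, float] = field(
        default_factory=lambda: {"confidence_baseline": 0.0, "tilt_baseline": 0.0}
    )
    matches_processed: int = 0
    last_match_id: Optional[str] = None


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def update_tone(tone: Dict[str, float], win_streak: int, loss_streak: int) -> None:
    # Tone-drift formulas transcribed as written, including the sign on loss_streak.
    tone["confidence_baseline"] = (
        0.95 * tone["confidence_baseline"] + 0.05 * clamp(win_streak / 5, -0.3, 0.3)
    )
    tone["tilt_baseline"] = (
        0.95 * tone["tilt_baseline"] + 0.05 * clamp(-loss_streak / 5, -0.3, 0.3)
    )


def apply_evolution(store: Dict[str, EvolutionState], character_id: str, match: Match) -> bool:
    """Return True if drift was applied, False if the match was skipped."""
    # Private-match guard runs before any state read or write.
    if match.is_private:
        log.info("evolution skipped: private match %s", match.id)
        return False
    state = store.setdefault(character_id, EvolutionState())
    # Idempotency guard: re-running on the last processed match is a no-op.
    if state.last_match_id == match.id:
        log.info("evolution skipped: match %s already processed", match.id)
        return False
    update_tone(state.tone_drift, match.win_streak, match.loss_streak)
    # Slider, opening-book and trap updates would run here.
    state.matches_processed += 1
    state.last_match_id = match.id
    return True
```

Because the private check comes first, a private match never even creates the lazily created row, which matches the "no state read, no state write" rule.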