Spaces:
Running
Phase 4.3 — Agent evolution
Characters evolve after each match. Same character for everyone (global drift), bounded per match, clamped to identity. Elo is already handled by Phase 4.0b; this phase handles the human-like learning layer on top.
Scope
Five channels, in priority order:
- Stylistic drift —
aggression,risk_tolerance,patience,trash_talksliders shift a small amount based on match outcome + opponent quality. - Opening-book evolution — per-opening score updated by a simple EMA of (opening_played · result_signal). Over time a character favours openings that won and drifts away from openings that lost.
- Trap memory — pattern-specific counters ("fell for scholar's mate vs. @alice"). Probability of falling again drops but never zeros — humans still slip.
- Tone drift — confidence/tilt baselines shift over streaks. Not a slider; a bias the Soul sees in its mood state.
- Hard clamps — a Viktor who loses 50 games in a row can't turn into a pacifist. Sliders cannot drift more than ±2 from base; opening-book weights cannot flip sign on their basis openings.
Invariants
- Private matches skip the whole evolution step (per Phase 4.0b
memory). Set once:
if match.is_private: return. - Global, not per-player. One evolution state row per character.
Per-player memory (who bullied whom) stays in
OpponentProfileand character memories. - Per-match deltas are small. A single match can only move a slider by ≤ 1 step, a tone baseline by ≤ 0.05, an opening EMA by ≤ 0.1.
- Identity clamps bound cumulative drift. Character's base sliders are the anchor. Cumulative drift is capped at ±2 slider steps from base. The character can change — slowly — but can't become unrecognisable.
- Cadence: every match. Kept batch-friendly so we can flip to "every N matches" later without rewriting the pipeline.
Data model
character_evolution_state — one row per character, lazy-created on
first post-match run.
| column | type | notes |
|---|---|---|
character_id (PK) |
String(36) FK | one row per character |
slider_drift |
JSON | {aggression, risk_tolerance, patience, trash_talk} — signed deltas from base; bounded [-2, +2] |
opening_scores |
JSON | {"<opening_label>": float} — EMA-updated result signal in [-1, +1] |
trap_memory |
JSON | list of {"pattern": str, "fell_for": int, "avoided": int, "last_seen_at": iso} |
tone_drift |
JSON | {"confidence_baseline": float, "tilt_baseline": float} in [-0.3, +0.3] |
matches_processed |
int | counter |
last_updated_at |
datetime | |
last_match_id |
str nullable | for idempotency check — skip if we've already processed this match |
Migration 0012_character_evolution_state.
Drift math
Let win_value: +1 on a win (character POV), -1 on loss, 0 on draw.
Let opponent_strength: player_elo_at_start / character_elo_at_start (centered
around 1.0). Strong opponents amplify drift; weak opponents attenuate it.
signal = win_value * clamp(opponent_strength, 0.3, 2.0)
Slider drift. Each match picks at most ONE slider to nudge, selected by which one would have helped most given the outcome:
- Lost, played cautiously (avg ACPL < 30) → nudge
aggression+0.5. - Lost, played recklessly (avg ACPL > 80) → nudge
patience+0.5. - Won, opponent higher-rated → nudge
confidencein tone (not a slider). - On tilt (recent loss streak ≥ 3) → nudge
trash_talkslightly up or down toward the character's base depending on direction-of-drift.
new_drift = clamp(old_drift + delta, -2.0, +2.0)
Cap: at most one slider updated per match. Delta magnitude ≤ 0.5.
Opening-book EMA. For each opening the character played in the match:
score' = 0.9 * score_prev + 0.1 * signal
Only applied to openings in character.opening_preferences or to
openings whose score has been built up over ≥ 3 matches.
Trap memory.
A "trap" is a sharp loss in ≤ 15 plies, or a loss to a named mating
pattern detectable by the analysis step. Detection is cheap: reuse
MatchAnalysis.critical_moments — the first critical moment in the
first 12 plies with swing ≥ 400cp is tagged as a trap and fed into
trap_memory.
if detected_trap(pattern):
entry = trap_memory[pattern] or new(entry)
if character_fell: entry.fell_for += 1
else: entry.avoided += 1
entry.last_seen_at = now
A learning Memory row is generated the FIRST time a pattern is seen (narrative: "Last time an opponent tried X, I stepped right into it ..."). Subsequent hits only bump the counter — no new memory row per match (prevents the Subconscious from drowning in redundant copies).
Tone drift. EMA toward current streak:
confidence_baseline' = 0.95 * confidence_prev + 0.05 * clamp(win_streak / 5, -0.3, +0.3)
tilt_baseline' = 0.95 * tilt_prev + 0.05 * clamp(-loss_streak / 5, -0.3, +0.3)
Applied every match. Fed to the Soul as an additive nudge on the
initial MoodState before each turn.
Pipeline
Added as a new step after elo_ratchet, before memory_generation in
app/post_match/processor.py. That way generated memories can reference
newly-detected traps.
engine_analysis → feature_extraction → elo_ratchet
→ evolution ← NEW
→ memory_generation → narrative_summary
Contract: apply_evolution(session, match, analysis) — raises on
programmer errors, swallows LLM errors (shouldn't invoke LLM anyway —
the hook is pure data). Idempotent via last_match_id guard.
Integration with Subconscious
- Learned memories from trap memory use a new
MemoryType.LEARNINGscope. They enter the normal vector-retrieval flow; the Subconscious doesn't need changes. - Tone drift is read at Soul call time —
_build_soul_inputaddstone_driftto themoodparameter before passing to the Soul. No Soul-side changes needed. - Slider drift is read whenever the Director builds the engine config
and whenever the style prompt is rendered. Effective slider =
character.slider + drift.
Private-match exclusion
At entry: if match.is_private: return — no state read, no state
write, no memory generation of learned entries. Logged at INFO so a
demo can grep evolution.log to verify the guardrail is working.
Testing
- Unit tests per math function (slider delta selection, opening EMA step, trap detection, tone EMA, clamps).
- Integration test — simulate a 50-match burst against a fixed opponent profile and assert cumulative drift stays in bounds.
- Private-match test — confirm
apply_evolutionis a no-op onmatch.is_private = True. - Idempotency test — re-running the pipeline on the same match doesn't double-apply drift.
What this phase does NOT do
- Does not build a multi-player personality model (global only).
- Does not feed PvP matches into evolution (only PvE — characters are PvE only; PvP is player-vs-player).
- Does not build a UI for viewing a character's drift. Phase 4.5 polish pass adds a read-only "this character has learned…" panel on the detail page if there's time.