# AI Run Backlog
Live observations from autonomous PyCatan AI runs.
Current run: `examples/ai_testing/my_games/session_20260515_231622`
## Observations
| ID | Status | Area | Severity | Observation | Evidence | Suggested next step |
| --- | --- | --- | --- | --- | --- | --- |
| AI-RUN-001 | Fixed - smoke verified | Agent prompt/actions | High | `wait_for_response` is injected into allowed actions even during setup turns, but `AIUser` does not map it to a real action. If selected, it will likely fall back to `END_TURN`, which can prematurely end or fail a required setup step. | `Bob/prompts/prompt_1.txt` and `Alice/prompts/prompt_2.txt` list `wait_for_response` beside required setup actions. `AIUser._decision_to_action` has no `wait_for_response` mapping. | Removed automatic `WAIT_FOR_RESPONSE` injection from AI prompts, mapped unexpected `wait_for_response` outputs to `END_TURN`, and added `GameManager.execute_action` allowed-action validation so `END_TURN` cannot skip required setup/robber phases. Follow-up cleanup removed stale `wait_for_response` instruction text when the action is not actually allowed. Because pytest is not installed in the active env, verification used manual smoke checks, which passed. |
| AI-RUN-002 | Fixed - smoke verified | Agent reasoning/tools | Medium | Road-placement prompts do not strongly encourage using `analyze_path_potential`; agents chose roads by manually interpreting neighbors from compact state. | Alice prompt #2 chose `20 -> 10`, Bob prompt #2 chose `42 -> 41`, Charlie prompt #2 chose `12 -> 13`; no tool calls in those road prompts. | Added action-specific prompt guidance for `place_starting_road` and `build_road` to use `analyze_path_potential` before choosing. Smoke check confirms road instructions include the tool guidance. |
| AI-RUN-003 | Fixed - smoke verified | Agent factual accuracy | Medium | Agent reasoning contains board-summary inaccuracies even when the final action is valid. Bob described Alice's node as a `9-5-4` spot, while Alice's node 20 is Wood 11, Brick 6, Sheep 8. | `Bob/responses/response_1.json`; actual state in `Bob/prompts/prompt_1.txt` has `bld` at node 20 and tool data earlier shows node 20 as 11/6/8. | Added prompt guardrails: agents are told not to state node resources or opponent settlement facts unless they come from filtered `game_state` or a tool result, and settlement prompts now push `find_best_nodes`/`inspect_node`. Smoke check confirms the guidance is present. |
| AI-RUN-004 | Fixed - smoke verified | Response schema/parser | Low | The schema requires `internal_thinking` to have minLength 1000, but responses are shorter, so the parser repairs them by appending `[Response was too brief]`. This adds noise and shows that the schema constraint does not match the desired behavior. | `Alice/responses/response_1.json`, `response_2.json`, `Bob/responses/response_1.json`, `response_2.json`. | Reduced active-turn `internal_thinking` minLength to 120 and removed the parser repair suffix. `get_schema_description()` no longer asks for 1000+ chars. Compile checks passed. |
| AI-RUN-005 | Open | Agent architecture | Medium | The prompt gives the LLM a dense raw compact board (`H`, `N`, `state`) plus tools. This works, but still invites manual decoding and hallucination instead of tool-grounded decisions. | All prompt files include full compact lookup tables and tool list. Initial settlement prompts use `find_best_nodes`; road prompts do not use tools. | Move toward a planner/validator loop: propose candidate, verify with tools/rules, then emit action. |
| AI-RUN-006 | Fixed - smoke verified | Config/runtime | Medium | `play_with_ai.py` creates `AIConfig()` directly, so it may not load `config_dev.yaml`; runtime behavior can drift from documented dev config. | `examples/ai_testing/play_with_ai.py` called `AIConfig()` in `create_game`. First rerun after the fix crashed because `config_dev.yaml` contains richer `agent` keys such as `personality` that `AgentConfig` did not accept. | `play_with_ai.py` now loads `pycatan/ai/config_dev.yaml` by default when present, accepts `--config`, and prints the selected provider/model. `AIConfig.from_dict()` now ignores unknown YAML keys per section, so richer config files stay backward-compatible. Compile and config-load smoke checks passed. |
| AI-RUN-007 | Fixed - smoke verified | Runtime setup | Medium | `.env` exists, but the runtime does not load it automatically; the API key must already be in the process environment. This can block runs started from another terminal or session. | An earlier check showed that `.env` exists and `GEMINI_API_KEY_SET=False` in this shell; the code read `os.environ` only. | Added lightweight `.env` loading at startup for simple `KEY=VALUE` entries without requiring a new dependency. Compile checks passed. |
| AI-RUN-008 | Fixed - smoke verified | Agent reasoning/tools | Medium | Tool use is inconsistent for high-impact setup decisions. Charlie's second settlement was selected without `find_best_nodes` or `inspect_node`, despite explicit resource goals and available tools. | `Charlie/prompts/prompt_3.txt`; `llm_communication.log` API Call #10 returns text directly with no tool request. | Added action-specific prompt guidance for `place_starting_settlement` and `build_settlement` to use `find_best_nodes` and `inspect_node` instead of manual array decoding. Smoke check confirms the guidance is present. |
| AI-RUN-009 | Fixed - smoke verified | Prompt context | Medium | Setup second-round prompt context is too generic: `what_just_happened` says `It's your turn` instead of explaining this is the second setup placement and that starting resources will be granted from this settlement. | `Charlie/prompts/prompt_3.txt` has phase `SETUP_SECOND_ROUND` but task context is generic. | `_get_prompt_message_for_phase()` is now setup-phase aware: first settlement, starting road, and second starting settlement get explicit instructions, including the second-placement resource rule. Manual smoke checks passed; needs live rerun verification because the current running process may not reload patched code. |
| AI-RUN-010 | Fixed - smoke verified | Viewer/UI | Medium | Unified AI Analysis memory view displays `[object Object]` instead of each agent's memory text. The JSON data is correct; the renderer treats the whole memory object as a string. | Screenshot from `localhost:5000/unified`; `agent_memories.json` stores `{note_to_self, last_updated}` objects. `pycatan/static/js/unified.js` rendered `escapeHtml(memory)`. | Updated Unified memory rendering to extract `note_to_self`, `current_note`, or the latest `recent_notes` entry before falling back to JSON. Compile checks passed. |
| AI-RUN-011 | Fixed - smoke verified | Agent memory | Low | `agent_memories.json` rewrites `last_updated` for all agents whenever memories are saved, so timestamps look like all memories updated at the same time even if only one agent acted. | After Charlie prompt #4, Alice/Bob/Charlie all had `last_updated` `2026-05-15T19:23:19.*`. `AILogger.save_agent_memories` assigned `datetime.now()` while iterating all agents. | Added `AgentState.memory_updated_at`; logger now writes each agent's own memory timestamp instead of save time. Smoke check confirms timestamps are stored from per-agent state. |
| AI-RUN-012 | Fixed - smoke verified | Agent memory | Medium | Agent memory is a single overwritten `note_from_last_turn`, not a structured memory system. It is consistent for short continuity, but it loses older strategic notes and makes long games fragile. | Prompts included one `memory.note_from_last_turn`; `agent_memories.json` contained only the latest note per agent. | Added bounded per-agent `memory_history` and prompt `recent_notes` based on `memory.short_term_turns`, while preserving `note_from_last_turn` for compatibility. Smoke check confirms history is saved. |
| AI-RUN-013 | Fixed - smoke verified | Agent communication/trading | High | Chat messages are stored and passed into later prompts, so basic table-talk works. However, communication is passive broadcast only; there is no explicit negotiation state, addressed messages, offers, or response tracking. | `chat_history.json` has 8 messages; later prompts include `social_context.recent_chat`. The `session_20260515_194515` trade loop also showed that natural chat alone was not enough to execute or track an offer. | Added structured player-to-player trade state: every trade gets a `trade_id`, proposer, target, offer/request, and pending/accepted/rejected status. Pending trades are included in `social_context.pending_trades`, the target prompt contains the exact offer and accept/reject actions, and responses update the trade state. Also fixed `trade_bank` give/receive conversion to engine `offer`/`request`. Smoke checks passed. |
| AI-RUN-014 | Fixed - smoke verified | Viewer/API | High | Player Hub shows total card count after second setup resources, but per-resource counters remain 0. The game/action log knows Charlie received `1xSheep 1xOre 1xWheat`; the player card still shows all resource icons as 0. | Screenshot from Game Board; `/api/game-state` returned `cards_list: ["sheep","ore","wheat"]` and `total_cards: 3` for Charlie, but no `resources` object. `pycatan/static/js/unified.js` only rendered per-resource counts from `player.resources`. | `WebVisualization._convert_players` now emits a normalized `resources` map, and Unified Player Hub can also derive counts from `cards_list` as fallback. Smoke check confirms cards map to resource counters. |
| AI-RUN-015 | Open | Agent architecture/events | High | Agents are currently invoked mainly when the game needs an action from them. Observing-mode schemas exist, but event notifications do not trigger a passive LLM observation after significant game events, so agents cannot react to table talk, builds, dice, or opponent moves until their next actionable turn. | Code has `OBSERVING` response schema and `_create_prompt(... is_active_turn=False)`, but `process_agent_turn` creates active-turn prompts and `notify_game_event` only stores recent events through `AIManager.on_game_event`. No caller was found for observing prompts during live events. | Wire a budgeted event-observer loop for meaningful triggers such as dice rolls, builds, robber moves, trade/chat proposals, and accepted/rejected trades. Observer output should update memory/communication intent, not emit normal game actions. |
| AI-RUN-016 | Fixed - smoke verified | Agent prompt/resources | High | Resource counts visible to agents do update after dice rolls and resource spending, but the prompt does not explicitly state the dice result or resource distribution. Agents must infer what happened by diffing state, and `meta.dice` remains `null`. | `Alice/prompts/prompt_5.txt` before roll: Alice `O1 Wh2`, Bob `W1 B1 Wh1`, Charlie `S1 O1 Wh1`. `Alice/prompts/prompt_6.txt` after roll: Alice `S1 O1 Wh2`, Bob unchanged, Charlie `S1 O1 Wh2`, matching an 8 roll. But `what_just_happened` only says Alice rolled the dice and `meta.dice` is `null`. | `GameManager.get_full_state()` now preserves `dice_rolled` for AI state/meta, and dice events include explicit per-player resource distribution summaries. Added unit coverage for dice state and distribution formatting; because pytest is not installed in the active env, verification used manual smoke checks, which passed. |
| AI-RUN-017 | Fixed - smoke verified | Agent action mapping | Critical | `steal_card` can be prompted and emitted with a player name, but the game engine expects a numeric player id. Bob repeatedly tried `{"target_player": "Charlie"}` after moving the robber, causing `ACTION_FAILED` instead of stealing. | `Bob/responses/response_7.json` and later responses emit `steal_card` with `"Charlie"`. `pycatan/ai/ai_user.py` passes `target_player` through unchanged for `ActionType.STEAL_CARD`, while `pycatan/management/game_manager.py` treats it as an integer (checks `target_player < 0` and indexes `self.game.players[target_player]`). The prompt example says `{"target_player": "Red"}`. | Implemented name/color/string-id normalization in `AIUser._convert_parameters` for `STEAL_CARD`; unknown names map to invalid id `-1` so the engine can return recoverable feedback. Added unit coverage; because pytest is not installed in the active env, verification used manual smoke checks, which passed. |
| AI-RUN-018 | Fixed - smoke verified | Agent failure recovery | High | Failed actions are not fed back into the next AI prompt in a useful way, so the agent repeats the same invalid action. After Bob's failed `steal_card`, later prompts still show only `steal_card`/`wait_for_response`, and `what_just_happened` incorrectly falls back to `Game is starting. Place your first settlement.` | `Bob/Bob.md` requests #8-#13 repeat `steal_card` against Charlie. `Bob/prompts/prompt_8.json` and later have `what_just_happened: "Game is starting. Place your first settlement."` despite being in `NORMAL_PLAY` with the robber on hex 5. `AIUser.notify_action` only prints failed actions and does not call `AIManager.on_game_event`. | `AIUser.notify_action` now sends failure events to the acting agent; prompt context combines the last event with the current phase prompt instead of falling back to setup text. Added unit coverage; because pytest is not installed in the active env, verification used manual smoke checks, which passed. |
| AI-RUN-019 | Fixed - smoke verified | Debug workflow/session replay | High | There was no way to resume or fast-replay an AI game from a specific prior point with board state, turn phase, resources, chat history, and agent memory intact. Debugging late-turn bugs required replaying from the beginning, which wastes time and LLM tokens. | Session folders contain prompts/responses/chat/memory logs, but no authoritative GameManager/Game snapshot that can be loaded by `PLAY_AI_AUTO.BAT` / `play_with_ai.py`. | Added fast action replay for existing sessions: `play_with_ai.py` accepts `--replay-session` / `--resume-session`, infers player names in original first-response order, replays parsed decisions through normal `GameManager` logic without LLM calls, writes derived-session lineage metadata, rebuilds replayed memory/chat from recorded `note_to_self` / `say_outloud`, and then falls back to live AI when recorded decisions are exhausted or no longer match `allowed_actions`. Added `--replay-through`, `--replay-stop-before`, and `--replay-max-decisions`. Smoke checks passed for loader/order/marker validation/CLI; this is not yet a full authoritative snapshot restore. |
| AI-RUN-020 | Fixed - smoke verified | Agent action mapping/trading | Critical | `trade_propose` can be emitted without `target_player`, but the engine requires that parameter. The action construction fails before a normal `Action` exists, so the failure-feedback path does not notify the agent and Alice repeats the same invalid trade proposal. | In `session_20260515_194515`, `Alice/responses/response_6.json` through `response_9.json` all emit `trade_propose` with only `offer` and `request`. `Alice/prompts/prompt_6.json` shows the `trade_propose` example also omits `target_player`. `ActionType.TRADE_PROPOSE` requires `target_player`, and construction failures return `ACTION_PROCESSING_ERROR` without an action to pass into `AIUser.notify_action`. | `trade_propose` examples/schema now require `target_player`; `AIUser` normalizes trade target names/colors/ids and aliases like `to`; `GameManager` now surfaces pre-Action processing errors back to AI agents. Smoke check confirms Alice-style trade maps Charlie to id `2` and construction errors become agent events. |
| AI-RUN-021 | Fixed - smoke verified | Response parsing/fallback | Critical | A truncated LLM JSON response during a required setup action is parsed as `null`; the unsafe fallback can choose an illegal action, an illegal node, or an illegal `END_TURN`. | `session_20260515_202759/Bob/responses/response_1.json` has `parsed: null` and raw content cut at `"note`, causing fallback `END_TURN`. In `session_20260515_203358/Charlie/responses/response_1.json`, the model says node 42 is the strong choice, but the old fallback selected the earlier-mentioned node 12. `Charlie/responses/response_5.json` and `response_6.json` were cut before `action`, causing an illegal `END_TURN` while `PLACE_STARTING_ROAD` was required. | Raised `config_dev.yaml` `max_tokens` from 4096 to 20000 after discovering that, once `config_dev.yaml` was loaded by default, its 4096 cap replaced the runtime default of 20000. Disabled thinking to keep JSON output stable in auto-runs, added `finish_reason` logging, made settlement-node recovery skip occupied nodes and their neighbors, recovered setup roads only from legal edges mentioned in the text, and replaced the unsafe `END_TURN` fallback for parameterized actions with a retry of the required action. Compile and log-replay smoke checks passed. |
| AI-RUN-022 | Fixed - smoke verified | Turn phase/trading | Critical | After a successful dice roll, `allowed_actions` moved to post-roll actions but `turn_phase` stayed `ROLL_DICE`, so prompts still told the agent to roll. Also AI trade resources used compact keys like `S`/`B`, causing valid offers to fail resource validation. | `session_20260515_205233/Alice/prompts/prompt_6.json` has post-roll actions and resources `S:1,O:1,Wh:2`, but `what_happened` still says `Roll the dice`. `Alice/response_6.json` proposes `offer: {"S":1}` for `{"B":1}` and the engine rejects it with `You don't have the required cards to offer`; prompts #7/#9 then repeat stale roll instructions, producing illegal `ROLL_DICE` failures and a misleading `[0,0]=0` failed-roll log entry. | `AIUser` now normalizes trade resource bundles from compact prompt codes (`W/B/S/Wh/O`) and natural aliases to engine names (`wood/brick/sheep/wheat/ore`). `GameManager._resource_name_to_card()` accepts the same aliases defensively. `_handle_roll_dice()` now sets `turn_phase = PLAYER_ACTIONS` after non-7 rolls so prompt text matches allowed actions. Compile and smoke checks passed. Requires a fresh run because the active process loaded old code. |
| AI-RUN-023 | Fixed - replay smoke verified | Development cards | High | AI can hold and choose `use_dev_card` for Road Building, but the emitted card type `road_building` was rejected by the engine as invalid. The failure was fed back correctly, but the card could not be used and the agent could waste turns or loop around the same plan. | `session_20260515_211742/Alice/responses/response_10.json` emits `use_dev_card {"card_type": "road_building", "road_1": [45,35], "road_2": [35,34]}`. `Alice/prompts/prompt_11.json` says `Your previous action failed ... Error: Invalid card type: road_building`. | Added dev-card alias normalization in `AIUser` and defensive alias handling in `GameManager`: `road_building`/`road` -> `Road`, `year_of_plenty` -> `YearOfPlenty`, plus resource aliases for Monopoly/Year of Plenty. `AIUser` now converts AI road params like `road_1: [45,35]` into `road_one_coords`. Prompt examples now show exact parameter shapes for Knight, Road Building, Monopoly, and Year of Plenty. Replay smoke through `session_20260515_211742` `Alice:10` verifies Alice's Road card is consumed successfully instead of failing. |
| AI-RUN-024 | Open | Debug workflow/session replay | Medium | Fast replay replays recorded `say_outloud` and `note_to_self` for replayed actions, so if the source session contained a wrong belief near the bug, the derived session can inherit stale chat/memory even though the fixed engine state is now correct. | In derived `session_20260515_220558`, `--replay-through Alice:10` correctly executes Road Building, but also replays Alice's old table-talk: `nice, a 5! that brick is exactly what i needed...`. The live follow-up immediately corrects the state with `darn, no resources for me on that 5...`, and memory is corrected afterward. | Add a replay mode option such as `--replay-skip-side-effects-after Player:N` or recommend `--replay-stop-before Player:N` for bug-point re-generation. Consider marking replayed chat/memory as replayed and optionally excluding it from live prompt history after the cut point. |
| AI-RUN-025 | Fixed - smoke verified | Bank trade execution | Critical | Successful bank trades could corrupt a player's hand by adding the requested card as a nested list, then later UI/state code failed with `'list' object has no attribute 'name'`. On Windows, a Unicode success print could also turn the already-mutated trade into a reported failure. | `session_20260515_220558/Charlie/response_9.json` emitted valid `trade_bank {"give":"wheat","receive":"brick"}` after Monopoly. The session then logged repeated processing failures: `'list' object has no attribute 'name'`. Local smoke reproduced the issue in `_execute_trade_bank`. | `GameManager._execute_trade_bank()` now passes the single requested `ResCard` to `game.trade_to_bank()` instead of a list, validates that bank trades request exactly one card, and uses ASCII logging so console encoding cannot fail after state mutation. `TRADE_BANK` prompt text now says the default bank trade is 4:1 and shows `give_amount: 4`. Compile and direct bank-trade smoke passed with no nested cards. |
| AI-RUN-026 | Fixed - smoke verified | Viewer/replay chat | Medium | Replayed `say_outloud` messages were written to session chat logs, but the Unified chat panel could appear empty after fast replay because the browser connected after replay SSE chat events had already been emitted. | `session_20260515_231622` was derived from `session_20260515_224216` with 71 replay decisions; `chat_history.json` contained replayed chat, but the live board chat panel did not hydrate historical chat on load. | `WebVisualization` now keeps a bounded `chat_history`, exposes `/api/chat`, and `main.js` loads prior chat messages during initial connection before listening for live SSE updates. `py_compile` and `node --check` passed. |
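
AI-RUN-007 adds lightweight `.env` loading for simple `KEY=VALUE` entries without a new dependency. The following is a minimal sketch of that idea, not the project's loader. `parse_env_lines` and `load_dotenv_simple` are hypothetical names, and two behaviors are assumptions rather than documented facts: stripping quotes, and letting already-exported variables take precedence over the file.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_lines(lines: list[str]) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; comments, blanks, and malformed lines are skipped."""
    values: dict[str, str] = {}
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Assumption: drop one pair of matching surrounding quotes, e.g. KEY="abc" -> abc.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        if key:
            values[key] = value
    return values


def load_dotenv_simple(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical loader: copy .env values into os.environ; existing variables win."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}  # a missing .env is fine; the key may already be exported
    loaded: dict[str, str] = {}
    text = env_path.read_text(encoding="utf-8")
    for key, value in parse_env_lines(text.splitlines()).items():
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Because already-exported variables win by default, a key set in the shell is never silently replaced by the file.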
## Run Notes
- Alice: placed settlement at node 20, road 20 -> 10.
- Bob: placed settlement at node 42, road 42 -> 41.
- Charlie: placed settlement at node 12, road 12 -> 13.
- Charlie's second setup settlement: node 25; prompt #4 (Charlie's road) was created.
- Resource display bug observed: Charlie has 3 cards from second setup resources, but Player Hub resource counters stay 0.
- Current live point when first logged: Charlie prompt #3, second setup round starting.
## Sprint Notes
### 2026-05-15 - Progress blockers and prompt correctness
- Selected for immediate sprint: `AI-RUN-017`, `AI-RUN-018`, `AI-RUN-001`, `AI-RUN-016`.
- Fixed `steal_card` name/color/id normalization so Bob choosing `"Charlie"` becomes player id `2`.
- Added recoverable failure feedback into the acting agent's next prompt, including action type, parameters, and error message.
- Removed automatic `wait_for_response` prompt injection and added allowed-action validation in `GameManager` to prevent illegal actions from silently advancing phases.
- Preserved `dice_rolled` in AI-facing game state and added explicit resource distribution summaries to dice events.
- Verification: `pytest` is not installed in the active Python or `.venv`; ran `py_compile` and manual smoke checks for the sprint scenarios successfully.
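
The `steal_card` fix maps a name, color, or numeric string to an integer player id, and unknown targets become `-1` so the engine can return recoverable feedback. Below is a minimal sketch of that mapping, not the actual `AIUser._convert_parameters` code. It assumes a simple ordered player list with `name` and `color` attributes; `PlayerInfo` and `normalize_target_player` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlayerInfo:
    """Hypothetical stand-in for an engine player record."""
    name: str
    color: str


INVALID_PLAYER_ID = -1  # engine rejects it, and the agent gets failure feedback


def normalize_target_player(value: object, players: list[PlayerInfo]) -> int:
    """Map an AI-emitted target (id, numeric string, name, or color) to a player index."""
    if isinstance(value, bool):  # bool is an int subclass; never treat True/False as ids
        return INVALID_PLAYER_ID
    if isinstance(value, int):
        return value if 0 <= value < len(players) else INVALID_PLAYER_ID
    if not isinstance(value, str):
        return INVALID_PLAYER_ID
    text = value.strip()
    if text.isdecimal():  # e.g. "2"
        index = int(text)
        return index if index < len(players) else INVALID_PLAYER_ID
    wanted = text.casefold()
    for index, player in enumerate(players):
        if wanted in (player.name.casefold(), player.color.casefold()):
            return index
    return INVALID_PLAYER_ID
```

With players ordered Alice, Bob, Charlie, Bob's `"Charlie"` target becomes `2`. The player colors used here are illustrative only.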
### 2026-05-15 - Live rerun verification (`session_20260515_194515`)
- `AI-RUN-001`: partially live-verified. Setup prompts no longer include `wait_for_response` in `allowed_actions`, so the action surface is safe. The running process still emitted stale instruction text mentioning `wait_for_response`, and stale setup prompt text saying `Roll the dice`; these were fixed in code after the run had already started and require the next run to verify.
- `AI-RUN-009`: not live-verified in this run because the running process was already using the old prompt-message code. Code smoke checks pass for first settlement, starting road, and second starting settlement context.
- `AI-RUN-016`: partially live-verified. `Alice/prompts/prompt_6.json` has `meta.dice: [5,3]` and updated resources after an 8 roll. The explicit distribution text was not live-verified because this running process still had stale prompt/event text.
- `AI-RUN-017`: waiting for a robber steal event in the live run to verify name/color-to-player-id conversion.
- `AI-RUN-018`: live run found a limitation. Failure feedback works for failures after an `Action` object exists, but `trade_propose` without required parameters fails during action construction and therefore does not reach `AIUser.notify_action`.
- `AI-RUN-020`: new critical blocker from this run. Alice repeated invalid `trade_propose` outputs in responses #6-#9 because the prompt example omitted `target_player` and the construction failure was not fed back to the agent.
### 2026-05-15 - Pre-next-session backlog cleanup
- Closed or smoke-verified 16 of 20 backlog items, exceeding the 50% cleanup target before the next run.
- Fixed the critical `trade_propose` loop: prompt/schema now require `target_player`, AI trade targets are normalized like robber steals, and pre-Action construction errors are fed back to the acting agent.
- Closed UI/debugging blockers for this iteration: Unified memory no longer renders `[object Object]`, Player Hub receives normalized resource counts, and `.env` / `config_dev.yaml` load automatically for `play_with_ai.py`.
- Improved agent prompt discipline for setup and road decisions: settlement actions point agents to `find_best_nodes`/`inspect_node`; road actions point to `analyze_path_potential`; factual board claims are explicitly tied to filtered state or tool results.
- Improved memory persistence from a single overwritten note to a bounded `recent_notes` history with per-agent update timestamps.
- Verification: `pytest` is still not installed in either active Python or `.venv`; ran `py_compile` plus manual smoke checks for trade mapping, action-processing feedback, prompt guidance, memory history/timestamps, and resource-counter conversion.
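
The memory change keeps `note_from_last_turn` for compatibility and adds a bounded `recent_notes` history sized by `memory.short_term_turns`, plus a per-agent `memory_updated_at`. The sketch below shows one way to structure that, assuming the bound comes from `collections.deque(maxlen=...)`. `AgentMemory`, `remember`, and `serialize_memories` are hypothetical names, not the project's `AgentState` or `AILogger` code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentMemory:
    """Hypothetical per-agent memory: the latest note plus a bounded history."""

    short_term_turns: int = 3
    note_from_last_turn: str = ""  # kept for prompt compatibility
    memory_updated_at: datetime | None = None
    memory_history: deque = field(default_factory=deque, init=False)

    def __post_init__(self) -> None:
        # maxlen makes the deque drop the oldest note once the bound is reached.
        self.memory_history = deque(maxlen=self.short_term_turns)

    def remember(self, note: str, now: datetime | None = None) -> None:
        stamp = now or datetime.now(timezone.utc)
        self.note_from_last_turn = note
        self.memory_history.append({"note": note, "at": stamp.isoformat()})
        self.memory_updated_at = stamp  # this agent's own update time

    def prompt_memory(self) -> dict:
        return {
            "note_from_last_turn": self.note_from_last_turn,
            "recent_notes": list(self.memory_history),
        }


def serialize_memories(agents: dict[str, AgentMemory]) -> dict[str, dict]:
    """Emit each agent's own timestamp rather than the time the file is saved."""
    return {
        name: {
            "note_to_self": mem.note_from_last_turn,
            "last_updated": mem.memory_updated_at.isoformat() if mem.memory_updated_at else None,
        }
        for name, mem in agents.items()
    }
```

Writing `memory_updated_at` per agent avoids the AI-RUN-011 symptom where every agent appeared to update at save time.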
### 2026-05-15 - Trading mechanism hardening
- Promoted `AI-RUN-013` from low-priority passive communication to high-priority trading infrastructure because trade loops can block the game.
- Added structured trade lifecycle: `pending` offer with `trade_id`, target wake-up prompt with the full offer, and `accepted`/`rejected` resolution callbacks to AI state.
- `social_context.pending_trades` is now populated for active prompts, so agents can see actual pending offers instead of inferring from chat messages.
- Added safety validation for invalid/self trade targets before indexing `self.users[target_id]`.
- Added `trade_bank` conversion from AI-facing `give`/`receive` to engine-facing `offer`/`request`.
- Verification: `py_compile` passed; smoke checks covered structured trade prompt state, accepted trade execution/resource transfer, invalid target rejection, AIManager pending-trade prompt context, and bank-trade conversion. `pytest` remains unavailable in the current env.
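
The trade lifecycle above (a `trade_id`, a proposer and target, a pending status resolved once to accepted or rejected, and target validation before any indexing) can be sketched as a small state object. This is a minimal illustration under assumptions, not the project's implementation. `TradeOffer`, `propose_trade`, `pending_trades_for`, and the `trade_N` id format are hypothetical. The dictionary shape returned for `social_context.pending_trades` is also assumed.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from enum import Enum


class TradeStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


_next_trade_number = itertools.count(1)  # hypothetical id scheme


@dataclass
class TradeOffer:
    proposer: int
    target: int
    offer: dict[str, int]
    request: dict[str, int]
    status: TradeStatus = TradeStatus.PENDING
    trade_id: str = field(default_factory=lambda: f"trade_{next(_next_trade_number)}")

    def resolve(self, accepted: bool) -> None:
        """Move a pending trade to accepted/rejected exactly once."""
        if self.status is not TradeStatus.PENDING:
            raise ValueError(f"{self.trade_id} is already {self.status.value}")
        self.status = TradeStatus.ACCEPTED if accepted else TradeStatus.REJECTED


def propose_trade(proposer: int, target: int, offer: dict[str, int],
                  request: dict[str, int], num_players: int) -> TradeOffer:
    # Validate the target before anything indexes a users/players list with it.
    if not 0 <= target < num_players or target == proposer:
        raise ValueError(f"invalid trade target {target!r}")
    return TradeOffer(proposer, target, offer, request)


def pending_trades_for(player_id: int, trades: list[TradeOffer]) -> list[dict]:
    """Entries a target player would see in its prompt (shape is illustrative)."""
    return [
        {"trade_id": t.trade_id, "from": t.proposer, "offer": t.offer, "request": t.request}
        for t in trades
        if t.target == player_id and t.status is TradeStatus.PENDING
    ]
```

Raising on a second `resolve` call keeps a duplicated accept or reject from executing a trade twice.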
### 2026-05-15 - Rerun startup fix
- Fixed a startup crash introduced by default `config_dev.yaml` loading: `AgentConfig.__init__()` rejected richer YAML keys like `personality`.
- `AIConfig.from_dict()` now filters each YAML section to dataclass-supported fields instead of failing on unknown keys.
- Verification: `py_compile` passed and `load_ai_config()` successfully loaded `pycatan/ai/config_dev.yaml`.
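
Filtering each YAML section down to the fields a dataclass accepts can be done with `dataclasses.fields`. The sketch below is a minimal version of that approach and assumes nested dataclass sections. The section and field names (`llm`, `agent`, `provider`, `name`, and so on) are illustrative, not the real `AIConfig` schema. Only `max_tokens: 20000` mirrors the code default mentioned in these notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class LLMConfig:
    # Illustrative fields only; the real config sections differ.
    provider: str = "gemini"
    model: str = ""
    max_tokens: int = 20000


@dataclass
class AgentConfig:
    name: str = "agent"
    enabled: bool = True


def known_kwargs(cls: type, section: dict[str, Any] | None) -> dict[str, Any]:
    """Keep only keys the dataclass accepts in __init__; drop the rest."""
    accepted = {f.name for f in fields(cls) if f.init}
    return {key: value for key, value in (section or {}).items() if key in accepted}


@dataclass
class AIConfig:
    llm: LLMConfig = field(default_factory=LLMConfig)
    agent: AgentConfig = field(default_factory=AgentConfig)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> AIConfig:
        # Unknown keys such as agent.personality are ignored instead of raising TypeError.
        return cls(
            llm=LLMConfig(**known_kwargs(LLMConfig, data.get("llm"))),
            agent=AgentConfig(**known_kwargs(AgentConfig, data.get("agent"))),
        )
```

The trade-off is that a misspelled key is dropped silently. Logging dropped keys would make that kind of drift visible.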
### 2026-05-15 - Current rerun parse failure (`session_20260515_202759`)
- Bob's first setup response was truncated before the `action` field, so parsing returned `null`.
- The previous fallback returned `end_turn`, which is illegal while `PLACE_STARTING_SETTLEMENT` is the only allowed action; this produced repeated failed `END_TURN` rows in the UI.
- Failure feedback worked correctly by showing the failed `END_TURN` in the next Bob prompt, but the fallback itself was unsafe.
- Fixed by disabling thinking in `config_dev.yaml` to keep JSON output stable, and by adding a safe parse-failure fallback that does not blindly end the turn during required setup actions.
- This fix requires restarting the current run, because the active Python process already loaded the old config/code.
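
The safe-fallback rule can be stated in a few lines. Choose `end_turn` only when it is in `allowed_actions`. Otherwise, re-prompt for the required action instead of guessing its parameters. The following is a minimal sketch. The response keys `action`/`parameters`, the `Decision` type, and `decide_from_response` are assumptions, and node/road recovery from the raw text is left out.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str
    parameters: dict = field(default_factory=dict)
    needs_retry: bool = False  # True: re-prompt for this action instead of executing


def decide_from_response(raw_text: str, allowed_actions: list[str]) -> Decision:
    if not allowed_actions:
        raise ValueError("no allowed actions to choose from")
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        parsed = None  # e.g. output truncated mid-object by a token limit
    if isinstance(parsed, dict) and parsed.get("action") in allowed_actions:
        return Decision(parsed["action"], parsed.get("parameters") or {})
    # Parsing failed or the action is illegal here. END_TURN is only a safe
    # fallback when the engine would actually accept it.
    if "end_turn" in allowed_actions:
        return Decision("end_turn")
    # A required parameterized action (e.g. place_starting_settlement) cannot be
    # guessed blindly, so ask the model again for the same action.
    return Decision(allowed_actions[0], needs_retry=True)
```

For Bob's truncated first response, where only `place_starting_settlement` was allowed, this yields a retry rather than an illegal `END_TURN`.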
### 2026-05-15 - Fallback node legality follow-up (`session_20260515_203358`)
- Bob's first response was again truncated. The fallback recovered the first `Node N` mentioned in the text, but that was Alice's occupied Node 20, not Bob's intended final choice.
- The engine correctly rejected the illegal settlement with `Location is blocked`, and failure feedback correctly led Bob to choose Node 12 on the next prompt.
- Updated fallback recovery to skip occupied nodes and their neighboring nodes using compact `state.bld` and `N`, so a truncated response cannot recover a settlement on an already blocked location.
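
Skipping occupied nodes and their neighbors applies the Catan distance rule to nodes recovered from text. The sketch below is a minimal version. It assumes `state.bld` can be reduced to a list of occupied node ids and `N` to an adjacency dict of node to neighbor list; neither exact format is recorded here. It also assumes the first legal mention wins. `recover_settlement_node` and `blocked_nodes` are hypothetical names.

```python
from __future__ import annotations

import re

NODE_MENTION = re.compile(r"\bnode\s+(\d+)", re.IGNORECASE)


def blocked_nodes(occupied: list[int], neighbors: dict[int, list[int]]) -> set[int]:
    """Occupied nodes plus every neighbor (distance rule)."""
    blocked = set(occupied)
    for node in occupied:
        blocked.update(neighbors.get(node, ()))
    return blocked


def recover_settlement_node(text: str, occupied: list[int],
                            neighbors: dict[int, list[int]]) -> int | None:
    """Return the first node mentioned in `text` that exists and is not blocked."""
    blocked = blocked_nodes(occupied, neighbors)
    for match in NODE_MENTION.finditer(text):
        node = int(match.group(1))
        if node in neighbors and node not in blocked:
            return node
    return None  # nothing legal mentioned; caller should re-prompt instead
```

On text that mentions Node 20 (occupied) before the intended choice, the occupied node is skipped rather than recovered.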
### 2026-05-15 - Truncation and road fallback hardening (`session_20260515_203358`)
- Root cause for the new truncation pattern: after `play_with_ai.py` started correctly loading `config_dev.yaml`, auto-runs inherited `max_tokens: 4096` instead of the code default `20000`. Dev config is now aligned back to `20000`.
- Confirmed agents do see blocked locations: Charlie prompt #1 includes `state.bld` with Bob at 12 and Alice at 20, and the `find_best_nodes` tool response at 20:36 excludes 12 and 20. The bad action came from truncated-response fallback, not from tool availability filtering.
- Added `finish_reason` capture to response logs so future cutoffs show whether Gemini stopped normally or hit a limit/safety/other finish reason.
- Added setup-road recovery from truncated text. Replaying Charlie response #5 now recovers `place_starting_road {"from": 14, "to": 24}` instead of illegal `END_TURN`.
- Verification: `py_compile` passed for touched AI modules; log-replay smoke checks recover Charlie's settlement fallback as node 42 and road fallback as 14 -> 24.
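
Setup-road recovery has to reject edges that do not touch the new settlement, are not real board edges, or already carry a road. The following is a minimal sketch of that filter. It assumes an adjacency dict for the board and a set of existing road edges. It also assumes the first legal mention in the text is the intended one. The regex and the `recover_starting_road` name are hypothetical.

```python
from __future__ import annotations

import re

# Matches "14 -> 24", "14-24", "14 to 24", or "14 to node 24" (case-insensitive).
EDGE_MENTION = re.compile(r"(\d+)\s*(?:->|-|to)\s*(?:node\s*)?(\d+)", re.IGNORECASE)


def recover_starting_road(text: str, settlement: int,
                          neighbors: dict[int, list[int]],
                          existing_roads: set[frozenset[int]]) -> dict | None:
    """Return the first mentioned edge that is a legal starting road, else None."""
    for match in EDGE_MENTION.finditer(text):
        a, b = int(match.group(1)), int(match.group(2))
        if settlement not in (a, b):
            continue  # a starting road must touch the settlement just placed
        if b not in neighbors.get(a, ()):
            continue  # not an actual board edge (e.g. dice numbers like "9-5")
        if frozenset((a, b)) in existing_roads:
            continue  # edge already has a road
        other = b if a == settlement else a
        return {"from": settlement, "to": other}
    return None
```

Returning `None` when nothing legal is found lets the caller retry `place_starting_road` instead of falling back to `END_TURN`.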
### 2026-05-15 - Normal-play trade/roll issue (`session_20260515_205233`)
- Setup and truncation fixes are verified live up to the start of normal play: all setup responses parsed, no truncation occurred, and resources displayed correctly.
- New blocker found after Alice's first roll: `dice_rolled` was set but `turn_phase` stayed `ROLL_DICE`, creating contradictory prompts: post-roll `allowed_actions` with stale "Roll the dice" instructions.
- Alice's valid-looking trade failed because the AI emitted compact resource keys from the prompt (`S` and `B`), while engine card validation expected full resource names.
- Fixed resource normalization for player trades and bank trades, and set turn phase to `PLAYER_ACTIONS` immediately after non-7 dice rolls.
- Verification: `py_compile` passed; smoke checks confirm `{"S": 1}` -> `{"sheep": 1}`, `{"B": 1}` -> `{"brick": 1}`, and GameManager accepts compact resource keys.
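
The resource normalization maps the compact prompt codes (`W/B/S/Wh/O`) and natural aliases to engine names. The sketch below is a minimal table-driven version. The compact codes come from these notes. The extra aliases (`lumber`, `wool`, `grain`) are assumptions about what "natural aliases" covers, and `normalize_bundle` is a hypothetical name.

```python
from __future__ import annotations

RESOURCE_ALIASES = {
    "w": "wood", "wood": "wood", "lumber": "wood",
    "b": "brick", "brick": "brick",
    "s": "sheep", "sheep": "sheep", "wool": "sheep",
    "wh": "wheat", "wheat": "wheat", "grain": "wheat",
    "o": "ore", "ore": "ore",
}


def normalize_resource(name: str) -> str:
    """Map a compact code or alias (case-insensitive) to an engine resource name."""
    try:
        return RESOURCE_ALIASES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown resource {name!r}") from None


def normalize_bundle(bundle: dict[str, int]) -> dict[str, int]:
    """Normalize an offer/request bundle, merging counts that map to the same resource."""
    out: dict[str, int] = {}
    for name, count in bundle.items():
        resource = normalize_resource(name)
        out[resource] = out.get(resource, 0) + int(count)
    return out
```

Because the lookup is case-insensitive, `W` and `Wh` stay distinct: they lower-case to `w` and `wh`.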
### 2026-05-15 - Fast replay / derived sessions
- Implemented first-pass session replay for `AI-RUN-019`.
- Usage examples:
  - `play_ai_auto.bat --replay-session session_20260515_205233 --replay-stop-before Alice:6`
  - `python examples/ai_testing/play_with_ai.py --auto --replay-session session_20260515_205233 --replay-through Bob:4`
- Replay creates a new session instead of mutating the old one. `session_metadata.json` records `derived_from`, source session name, replay markers, and number of loaded decisions.
- Replay uses parsed response decisions from `response_*.json`, sorted by timestamp, and feeds them back through `AIUser._decision_to_action()` and normal `GameManager` rules.
- Replay also reapplies recorded `note_to_self` and `say_outloud`, so derived sessions rebuild agent memory and chat as the replay advances.
- Replay now infers player order from first response timestamps instead of folder-name sorting, validates replay markers case-insensitively, and stops replay cleanly if an old decision no longer matches the current state's `allowed_actions`.
- Verification: `py_compile` passed; dry loader checks on `session_20260515_205233` and `session_20260515_211742` inferred `Alice, Bob, Charlie`, loaded chronological decisions, validated `--replay-stop-before Alice:6`, `--replay-through Bob:4`, and rejected a missing marker.
- Limitations: this is fast-replay, not a full `GameManager` snapshot restore. If old logged decisions include a bug that now behaves differently, use `--replay-stop-before Player:N` to stop before that point and continue live with fixed code.
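
The replay bookkeeping described above can be sketched as pure functions. These cover inferring player order from first-response timestamps, parsing `Player:N` markers case-insensitively, and cutting the chronological decision list for `--replay-through` (inclusive) or `--replay-stop-before` (exclusive). The sketch assumes `N` is the index from `response_N.json` and that timestamps are ISO-8601 strings. `RecordedDecision`, `parse_marker`, and `select_decisions` are hypothetical names, not the `play_with_ai.py` code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime

MARKER_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(\d+)\s*$")


@dataclass(frozen=True)
class RecordedDecision:
    player: str
    index: int      # assumed: N from Player/responses/response_N.json
    timestamp: str  # assumed: ISO-8601 timestamp recorded with the response


def _when(decision: RecordedDecision) -> datetime:
    return datetime.fromisoformat(decision.timestamp)


def infer_player_order(decisions: list[RecordedDecision]) -> list[str]:
    """Order players by their first response, not by folder name."""
    order: list[str] = []
    for decision in sorted(decisions, key=_when):
        if decision.player not in order:
            order.append(decision.player)
    return order


def parse_marker(text: str, players: list[str]) -> tuple[str, int]:
    """Parse 'Player:N' case-insensitively and return the canonical player name."""
    match = MARKER_RE.match(text)
    if match is None:
        raise ValueError(f"expected Player:N, got {text!r}")
    wanted = match.group(1).casefold()
    for name in players:
        if name.casefold() == wanted:
            return name, int(match.group(2))
    raise ValueError(f"unknown player in replay marker {text!r}")


def select_decisions(decisions: list[RecordedDecision], marker: tuple[str, int],
                     inclusive: bool) -> list[RecordedDecision]:
    """inclusive=True mimics --replay-through; False mimics --replay-stop-before."""
    ordered = sorted(decisions, key=_when)
    for position, decision in enumerate(ordered):
        if (decision.player, decision.index) == marker:
            return ordered[: position + 1] if inclusive else ordered[:position]
    raise ValueError(f"replay marker {marker[0]}:{marker[1]} not found in session")
```

Raising on a missing marker, instead of replaying everything, matches the validation behavior these notes describe.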
### 2026-05-15 - Live run watch (`session_20260515_211742`)
- Game progressed into normal play: setup completed, resources/cards are visible in prompts, trade proposal/rejection worked, robber/steal sequence progressed, and agents are using tools before expansion decisions.
- Failure feedback is working live: after Alice's failed Road Building attempt, prompt #11 contains the exact failed action and engine error.
- New blocker logged as `AI-RUN-023`: the AI-facing dev-card name `road_building` does not match the card types accepted by the engine's card-type validation.
### 2026-05-15 - Development-card replay fix
- Fixed `AI-RUN-023` after stopping `session_20260515_211742`.
- `AIUser` now normalizes dev-card aliases and parameter shapes before creating `Action(USE_DEV_CARD)`.
- `GameManager` now defensively accepts dev-card aliases and compact resource names for Monopoly/Year of Plenty.
- `USE_DEV_CARD` prompt examples now show card-specific params instead of only `{"card_type": "knight"}`.
- Verification:
  - `py_compile` passed for `pycatan/ai/ai_user.py`, `pycatan/ai/ai_manager.py`, `pycatan/management/game_manager.py`, and `examples/ai_testing/play_with_ai.py`.
  - A conversion smoke check maps Alice's failed payload to `{'card_type': 'Road', 'road_one_coords': ..., 'road_two_coords': ...}`.
  - Fast replay through `session_20260515_211742 --replay-through Alice:10` consumes Alice's `Road` dev card successfully instead of returning `Invalid card type: road_building`.
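Here is a rough sketch of the alias and parameter-shape normalization. Only `road_building` -> `Road` and the `road_one_coords`/`road_two_coords` keys are confirmed by the smoke check above. The other alias spellings, the incoming `roads` list shape, and the name `normalize_dev_card_params` are assumptions for illustration. Unknown card types pass through unchanged so that engine validation still decides.

```python
# Hypothetical alias table: only "road_building" -> "Road" is confirmed by the log.
DEV_CARD_ALIASES = {
    "road_building": "Road",
    "roadbuilding": "Road",
    "road": "Road",
}


def normalize_dev_card_params(params: dict) -> dict:
    """Return a copy of USE_DEV_CARD params with an engine-facing card_type."""
    out = dict(params)  # never mutate the AI's original decision payload
    raw = str(out.get("card_type", "")).strip()
    key = raw.lower().replace("-", "_").replace(" ", "_")
    # Unknown names pass through unchanged and are left to engine validation.
    out["card_type"] = DEV_CARD_ALIASES.get(key, raw)
    # Assumed input shape: {"roads": [coords_one, coords_two]}.
    if out["card_type"] == "Road" and "roads" in out:
        roads = out.pop("roads")
        if len(roads) != 2:
            raise ValueError("Road Building needs exactly two roads")
        out["road_one_coords"], out["road_two_coords"] = roads[0], roads[1]
    return out
```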
### 2026-05-15 - Replay verification run (`session_20260515_220558`)
- Ran `play_ai_auto.bat --replay-session session_20260515_211742 --replay-through Alice:10`.
- Derived session metadata is correct: source session is `session_20260515_211742`, `decisions_loaded: 28`, `replay_through: Alice:10`.
- Road Building fix is live-verified: Alice starts the live portion after roads to nodes 35/34 are in place and chooses `end_turn`; no `Invalid card type` or `ACTION_FAILED` appears.
- Structured trading still works after replay: Bob proposes Wheat for Charlie's Sheep, Charlie receives the targeted prompt and rejects with `trade_reject`, then control returns to Bob.
- Logged `AI-RUN-024`: `--replay-through` can carry stale replayed chat/memory from the bug point; use `--replay-stop-before` when the goal is to regenerate the suspect action's communication and memory too.
### 2026-05-15 - Bank trade execution blocker (`session_20260515_220558`)
- Charlie's post-Monopoly bank trade was strategically valid: `trade_bank {"give": "wheat", "receive": "brick"}` with four Wheat available.
- Root cause was execution-side, not decision-side: `GameManager` passed `request_cards` as a list into `game.trade_to_bank()`, while the core API expects a single requested `ResCard`. This inserted a nested list into Charlie's cards and later broke state rendering/logging with `'list' object has no attribute 'name'`.
- Fixed `AI-RUN-025` by passing `request_cards[0]`, validating exactly one requested card, and replacing the Unicode bank-trade success print with ASCII to avoid Windows console encoding failures after mutation.
- Prompt examples now make the 4:1 bank-trade amount explicit: `{"give": "wheat", "give_amount": 4, "receive": "brick"}`.
- Verification: `py_compile` passed and a direct smoke trade leaves Charlie with `['Wood', 'Brick']`, `success=True`, and `nested_lists=False`.
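The `AI-RUN-025` fix boils down to "validate, then pass one card, not a list". The sketch below shows that pattern. `ResCard` and `trade_to_bank` here are simplified stand-ins, because the real core signature is not reproduced in this log. `execute_bank_trade` is a hypothetical wrapper name.

```python
from collections import Counter
from enum import Enum


class ResCard(Enum):
    """Stand-in for the engine's resource card type."""
    Wood = "wood"
    Brick = "brick"
    Sheep = "sheep"
    Wheat = "wheat"
    Ore = "ore"


def trade_to_bank(hand: list[ResCard], give: list[ResCard], request: ResCard) -> bool:
    """Stand-in for the core API: expects ONE requested card, not a list."""
    for card in give:
        hand.remove(card)
    hand.append(request)
    return True


def execute_bank_trade(hand: list[ResCard], give_cards: list[ResCard],
                       request_cards: list[ResCard]) -> bool:
    # Validate everything before mutating, so a bad payload cannot corrupt the hand.
    if len(request_cards) != 1:
        raise ValueError(f"Bank trade must request exactly one card, got {len(request_cards)}")
    have = Counter(hand)
    if any(have[card] < n for card, n in Counter(give_cards).items()):
        raise ValueError("Not enough cards to give")
    # The bug passed request_cards itself, appending a nested list to the hand.
    return trade_to_bank(hand, give_cards, request_cards[0])
```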
### 2026-05-15 - Replay chat hydration (`session_20260515_231622`)
- `--replay-session session_20260515_224216 --hebrew-chat` did replay old chat, but those chat events were emitted before the browser's SSE connection was ready, so the board chat panel could look empty.
- Fixed `AI-RUN-026`: the web visualization now stores replay/live chat history, exposes `/api/chat`, and the frontend hydrates the chat panel on initial load.
- Verification: `py_compile` passed for `web_visualization.py`; `node --check` passed for `main.js`.
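The hydration fix depends on the server keeping chat history that a late-connecting page can fetch, instead of relying only on SSE events it may have missed. Here is a framework-agnostic sketch of that server-side store. `ChatHistory`, the event fields and the `{"messages": [...]}` response shape are hypothetical, and the actual route wiring in `web_visualization.py` is not shown in this log. `ensure_ascii=False` keeps Hebrew chat readable in the JSON body.

```python
import json
import threading
from collections import deque
from typing import Any


class ChatHistory:
    """Thread-safe, bounded store of chat events for late-joining clients."""

    def __init__(self, max_events: int = 500) -> None:
        # deque(maxlen=...) drops the oldest event once the cap is reached.
        self._events: deque[dict[str, Any]] = deque(maxlen=max_events)
        self._lock = threading.Lock()

    def append(self, event: dict[str, Any]) -> None:
        """Record a replayed or live chat event (call this wherever SSE is emitted)."""
        with self._lock:
            self._events.append(event)

    def snapshot(self) -> list[dict[str, Any]]:
        """Return a copy so callers never iterate while another thread appends."""
        with self._lock:
            return list(self._events)

    def to_json(self) -> str:
        """Body a GET /api/chat handler could return on initial page load."""
        return json.dumps({"messages": self.snapshot()}, ensure_ascii=False)
```

On load, the frontend would render this snapshot first and then append new SSE events, so replayed chat shows up even if it was emitted before the browser connected.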