
🗺️ AI Agent Development Work Plan

Date: January 3, 2026
Status: ✅ Phase 1 - Foundation & Infrastructure (100% Complete)
Current Task: Phase 3 - Core AI Agent (3.1) - NEXT

🎯 Project Goal

Build a fully functional LLM-based AI agent that can play Settlers of Catan autonomously, making intelligent strategic decisions and interacting naturally with other players.


📊 Development Phases

Note: Phase 4 (Monitoring & Debugging) should be developed early and in parallel with Phase 3.
The web dashboard and logging are critical for observing agent behavior during development!

Phase 1: Foundation & Infrastructure 🏗️

Goal: Build the core infrastructure needed to support AI agents

1.1 Configuration Management ✅ COMPLETED

  • Create centralized configuration system
    • LLM settings (model, temperature, max_tokens, etc.)
    • API credentials management
    • Agent parameters (custom instructions only)
    • Performance settings (timeouts, retries, caching)
  • Create config file format (YAML)
  • Build configuration loader and validator
  • Add environment variable support for sensitive data

Files created:

  • ✅ pycatan/ai/config.py - Configuration management
  • ✅ pycatan/ai/config_example.yaml - Example configuration file
  • ✅ pycatan/ai/config_dev.yaml - Default dev configuration
  • ✅ .env.example - Environment variables template
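
As a rough illustration of the loader and validator described above, the sketch below validates a few LLM settings and lets an environment variable override the API key. It is not the code in pycatan/ai/config.py: the `LLMConfig` class, the `load_llm_config` function, the default values, the temperature range, and the `CATAN_LLM_API_KEY` variable name are all assumptions made for this example, and the input dict stands in for an already-parsed YAML section.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for the LLM settings listed in 1.1."""
    model: str
    temperature: float = 0.7
    max_tokens: int = 1024
    api_key: str = ""


def load_llm_config(raw: dict, env=None) -> LLMConfig:
    """Validate a parsed config section and apply environment overrides."""
    env = os.environ if env is None else env
    if "model" not in raw:
        raise ValueError("config is missing the required key 'model'")

    temperature = float(raw.get("temperature", 0.7))
    # Assumed valid range; the real bounds depend on the provider.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature out of range: {temperature}")

    max_tokens = int(raw.get("max_tokens", 1024))
    if max_tokens <= 0:
        raise ValueError(f"max_tokens must be positive: {max_tokens}")

    # Sensitive data: the environment wins over anything in the file.
    api_key = env.get("CATAN_LLM_API_KEY", raw.get("api_key", ""))
    return LLMConfig(raw["model"], temperature, max_tokens, api_key)
```

Passing `env` explicitly keeps the loader easy to test without touching the real process environment.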

1.2 Prompt Management Layer ✅ COMPLETED

  • Design prompt processing pipeline
  • Implement game state filtering
    • Hide opponent's private information
    • Filter development cards
    • Remove non-visible game elements
  • Build perspective transformation
    • Convert game state to agent's viewpoint
    • Format resources and points
    • Present relative positioning
  • Create prompt template system
    • Meta data section
    • Task context section
    • Game state section
    • Social context section
    • Memory section
    • Constraints section
  • Build custom instruction injection per agent

Files created:

  • ✅ pycatan/ai/prompt_manager.py - Main prompt processing
  • ✅ pycatan/ai/state_filter.py - Game state filtering logic
  • ✅ pycatan/ai/prompt_templates.py - Template definitions
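
The filtering step can be pictured with a small sketch. It assumes a state dict shaped like `{"players": {name: {...}}}` with `resources` and `dev_cards` lists; that shape, the `PRIVATE_FIELDS` tuple, and the `filter_state_for` function are hypothetical and are not taken from pycatan/ai/state_filter.py.

```python
import copy

# Assumed names of the fields an opponent should not see in full.
PRIVATE_FIELDS = ("resources", "dev_cards")


def filter_state_for(state: dict, viewer: str) -> dict:
    """Return a copy of the state as seen by `viewer`.

    Opponents' private lists are replaced by their sizes, since hand
    sizes are public information in Catan while the contents are not.
    """
    visible = copy.deepcopy(state)  # never mutate the real game state
    for name, player in visible["players"].items():
        if name == viewer:
            continue
        for field in PRIVATE_FIELDS:
            hidden = player.pop(field, [])
            player[f"{field}_count"] = len(hidden)
    return visible
```

Working on a deep copy means the same source state can be filtered once per agent without one agent's view leaking into another's.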

1.2.5 Game State Optimization ✅ COMPLETED

Goal: Optimize the game state capture and representation for better LLM consumption

  • Review current game state structure from play_and_capture.py
  • Design improved game state format
    • Compress player information structure
    • Improve board representation (lookup tables H & N)
    • Add resource/harbor code mappings
    • Reduce redundancy and token usage (removed pixel_coords, board_graph)
    • Add status flags (Longest Road, Largest Army)
  • Create optimized state format with legend
  • Update game state capture to save both formats (.json + .txt)
  • Fix timing: capture state at turn START (not just after actions)
  • Test with real game scenarios

Files modified:

  • ✅ examples/ai_testing/play_and_capture.py - Optimized state capture
  • ✅ pycatan/management/game_manager.py - Added state capture at turn start

Key achievements:

  • 🎯 State representation optimized by ~60% (removed redundant fields)
  • πŸ“Š Compressed format with lookup tables (H=hexes, N=nodes)
  • πŸ”„ Real-time state updates at current_state_optimized.txt
  • πŸ“ Clear legend/documentation included in output
  • βœ… Captures state at decision point (turn start)

1.3 Response Parser ✅ COMPLETED

  • Define structured response format (JSON schema)
  • Build response parser and validator
  • Implement error handling for malformed responses
  • Create fallback mechanisms for parsing failures
  • Add response logging for debugging

Files created:

  • ✅ pycatan/ai/response_parser.py - Parse and validate LLM responses
  • ✅ pycatan/ai/schemas.py - JSON schemas for requests/responses

Key features:

  • 🎯 Dual schema support: Active turn (with action) & Observing (no action)
  • πŸ›‘οΈ Error handling: Invalid JSON, missing fields, type validation
  • πŸ”§ Fallback mechanisms: JSON repair, structure repair, default values
  • πŸ“Š Parse statistics tracking
  • πŸ” Flexible parsing: Handles markdown code blocks, extra text
  • βœ… Action parameter validation against expected schemas
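
To make the "flexible parsing" and fallback ideas concrete, here is a minimal sketch that pulls a JSON object out of a reply wrapped in a markdown code block or surrounded by extra text. The function names and the `internal_thinking`, `action`, and `error` keys are assumptions for illustration; the actual schema lives in pycatan/ai/schemas.py and may differ.

```python
import json
import re

# Built from single backticks so this example does not contain a literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str):
    """Return the first JSON object found in `text`, or None."""
    candidates = FENCE_RE.findall(text)          # fenced blocks first
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:              # then the widest {...} span
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def parse_response(text: str, active_turn: bool) -> dict:
    """Parse a reply; on failure, fall back to safe defaults plus an error."""
    data = extract_json(text)
    if data is None:
        return {"internal_thinking": "", "action": None,
                "error": "no JSON object found"}
    # Only the active-turn schema requires an action.
    if active_turn and not isinstance(data.get("action"), dict):
        return {**data, "action": None, "error": "missing or invalid 'action'"}
    return {**data, "error": None}
```

Returning an `error` field instead of raising lets the caller decide whether to retry the LLM call or fall back to a default move.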

Phase 2: Memory System 🧠

Goal: Enable agents to maintain context and learning across turns

2.1 Memory Structure

  • Design memory data model
    • Short-term observations (last N turns)
    • Strategic notes (persistent)
    • Social tracking (player relationships)
    • Game insights (patterns observed)
  • Implement memory storage (in-memory for now)
  • Build memory retrieval and formatting

Files to create:

  • pycatan/ai/memory.py - Memory management system
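
One possible shape for the in-memory data model is sketched below. The `AgentMemory` class, its attribute names, and the prompt layout are assumptions for this example only; memory.py has not been written yet, so nothing here reflects an existing API.

```python
from collections import deque


class AgentMemory:
    """Hypothetical in-memory store for one agent."""

    def __init__(self, max_observations: int = 5):
        # Short-term observations: a bounded deque drops the oldest entry
        # automatically once more than N turns have been recorded.
        self.observations = deque(maxlen=max_observations)
        self.strategic_notes = []   # persistent notes
        self.social = {}            # player name -> free-form relationship note

    def observe(self, turn: int, text: str) -> None:
        self.observations.append((turn, text))

    def add_note(self, text: str) -> None:
        if text not in self.strategic_notes:   # avoid duplicate notes
            self.strategic_notes.append(text)

    def format_for_prompt(self) -> str:
        """Render the memory as plain text for the prompt's memory section."""
        lines = ["Recent observations:"]
        lines += [f"- turn {turn}: {text}" for turn, text in self.observations]
        lines.append("Strategic notes:")
        lines += [f"- {note}" for note in self.strategic_notes]
        return "\n".join(lines)
```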

2.2 Memory Operations

  • Add note creation and updates
  • Implement memory pruning (keep relevant, remove old)
  • Build memory summarization for context limits
  • Create memory persistence (save/load between games)

2.3 Chat History Summarization ⚡

  • Implement automatic chat summarization
    • Configure separate smaller LLM for summarization (cost-effective)
    • Monitor chat history length (e.g., last 10 messages)
    • Trigger summarization when threshold reached
    • Create summarization prompt template
  • Build chat memory management
    • Keep only most recent message after summarization
    • Store summary in agent's memory
    • Maintain summary history for context
  • Add configuration for summarization settings
    • Summarization model selection
    • Message threshold for triggering
    • Summary format and length

Files to update:

  • pycatan/ai/memory.py - Add chat summarization logic
  • pycatan/ai/config.py - Add summarization configuration
  • pycatan/ai/llm_client.py - Support multiple models (main + summarization)
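
The trigger-and-trim behavior planned for 2.3 could look roughly like the sketch below. The `compact_chat` function and its parameters are hypothetical, and `summarize` is any callable standing in for the smaller summarization model.

```python
from typing import Callable, List, Tuple


def compact_chat(
    messages: List[str],
    summaries: List[str],
    summarize: Callable[[List[str]], str],
    threshold: int = 10,
) -> Tuple[List[str], List[str]]:
    """Summarize older chat once the history reaches `threshold` messages.

    Returns the (possibly trimmed) message list and the updated summary
    history. Inputs are not modified.
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    if len(messages) < threshold:
        return list(messages), list(summaries)
    # Keep only the most recent message; everything older is summarized.
    older, latest = messages[:-1], messages[-1]
    return [latest], list(summaries) + [summarize(older)]
```

Injecting the summarizer as a callable keeps this logic testable without any API calls, for example with a stub that only counts messages.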

Phase 3: Core AI Agent 🤖

Goal: Implement the main AI agent class

3.1 Base Agent Implementation

  • Create AIAgent class inheriting from User
  • Implement required User interface methods
    • get_choice() for decision-making
    • Other interaction methods as needed
  • Integrate with prompt manager
  • Integrate with memory system
  • Add agent state management

Files to create:

  • pycatan/players/ai_agent.py - Main AI agent implementation (update existing stub)

3.2 LLM Integration

  • Create LLM client abstraction
    • Support for OpenAI API
    • Support for Anthropic Claude
    • Support for other providers (Azure, etc.)
  • Implement API call handling
    • Request formatting
    • Response parsing
    • Error handling and retries
    • Rate limiting
  • Add logging for all LLM interactions
  • Implement cost tracking

Files to create:

  • pycatan/ai/llm_client.py - LLM API abstraction
  • pycatan/ai/providers/ - Provider-specific implementations
    • openai_provider.py
    • anthropic_provider.py
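
A provider-agnostic client with retries might be structured as in this sketch. The `LLMProvider` protocol, the `complete` method name, and the backoff parameters are assumptions for illustration; no real provider SDK is called here.

```python
import time
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Hypothetical interface each provider module would implement."""

    def complete(self, prompt: str) -> str: ...


class LLMClient:
    """Wraps a provider with retries and exponential backoff."""

    def __init__(self, provider: LLMProvider, max_retries: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep):
        self.provider = provider
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests do not actually wait

    def complete(self, prompt: str) -> str:
        last_error = None
        for attempt in range(self.max_retries):
            try:
                return self.provider.complete(prompt)
            except Exception as exc:
                # A real client would catch only the provider's transient
                # errors (timeouts, rate limits) rather than everything.
                last_error = exc
                if attempt < self.max_retries - 1:
                    self.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError(
            f"LLM call failed after {self.max_retries} attempts"
        ) from last_error
```

Because the client depends only on the protocol, an OpenAI or Anthropic provider can be swapped in without changing the agent code.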

3.3 Decision Pipeline

  • Build event-to-prompt conversion
  • Implement action extraction from responses
  • Create action validation before execution
  • Add decision logging and debugging
  • Implement decision timeout handling
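
Validating an extracted action before execution can be sketched as follows. The action names, the `REQUIRED_PARAMS` table, and the `end_turn` fallback are invented for this example and do not describe the real action set in pycatan/management/actions.py.

```python
# Hypothetical action types and the parameters each one needs.
REQUIRED_PARAMS = {
    "build_road": {"edge"},
    "build_settlement": {"node"},
    "end_turn": set(),
}


def validate_action(action, allowed_types) -> list:
    """Return a list of problems; an empty list means the action is usable."""
    if not isinstance(action, dict):
        return ["action must be an object"]
    kind = action.get("type")
    if kind not in allowed_types:
        return [f"action type {kind!r} is not allowed right now"]
    params = action.get("params") or {}
    if not isinstance(params, dict):
        return ["params must be an object"]
    missing = REQUIRED_PARAMS.get(kind, set()) - set(params)
    if missing:
        return [f"missing parameters: {sorted(missing)}"]
    return []


def resolve_action(action, allowed_types, fallback_type="end_turn"):
    """Return (action_to_execute, errors), falling back when invalid."""
    errors = validate_action(action, allowed_types)
    if errors:
        return {"type": fallback_type, "params": {}}, errors
    return action, []
```

Returning the error list alongside the fallback gives the decision log something concrete to record when the LLM proposes an illegal move.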

Phase 4: Monitoring & Debugging Infrastructure 🔍

Goal: Build essential tools for observing and debugging agent behavior

⚠️ CRITICAL: These tools are essential for development and must be built early!


4.1 Web Dashboard for Real-Time Monitoring 🌐

Priority: HIGH - Required before agent testing

  • Design web dashboard UI
    • Multi-agent view (tabs or split screen per agent)
    • Live prompt display with syntax highlighting
    • Agent reasoning/thinking display
    • Action selection visualization
    • Chat window with all messages
    • Game state summary panel
  • Build backend API for dashboard
    • WebSocket connection for live updates
    • Endpoints for prompt/response history
    • Agent state endpoints
    • Chat history endpoint
  • Implement prompt logging and streaming
    • Capture all prompts sent to LLM
    • Capture all responses from LLM
    • Stream to dashboard in real-time
    • Format for readability
  • Build agent reasoning viewer
    • Display internal_thinking/reasoning
    • Show action selection process
    • Highlight tool usage
    • Show memory updates

Files to create:

  • pycatan/monitoring/ - NEW monitoring package
    • dashboard_server.py - Flask/FastAPI server for dashboard
    • event_logger.py - Captures and broadcasts events
    • prompt_tracker.py - Tracks all LLM interactions
  • pycatan/monitoring/web/ - Dashboard frontend
    • index.html - Main dashboard page
    • dashboard.js - Dashboard functionality
    • dashboard.css - Dashboard styling
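
The "captures and broadcasts events" role of event_logger.py can be illustrated with a small publish/subscribe sketch. The `EventLogger` class, the event fields, and the callback signature are assumptions; a real dashboard would register a subscriber that forwards each event over its WebSocket connection.

```python
import time


class EventLogger:
    """Hypothetical in-process event bus for prompts, responses, and chat."""

    def __init__(self, clock=time.time):
        self._subscribers = []
        self.history = []        # kept so late subscribers can replay events
        self._clock = clock

    def subscribe(self, callback) -> None:
        """Register a callable that receives every future event dict."""
        self._subscribers.append(callback)

    def emit(self, kind: str, agent: str, payload: dict) -> dict:
        event = {"time": self._clock(), "kind": kind,
                 "agent": agent, "payload": payload}
        self.history.append(event)
        for callback in self._subscribers:
            callback(event)
        return event

    def events_for(self, agent: str) -> list:
        """History filtered to one agent, e.g. for a per-agent dashboard tab."""
        return [e for e in self.history if e["agent"] == agent]
```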

4.2 Local Documentation & Logging 📝

Priority: HIGH - Required for debugging

  • Design local documentation structure
    • One folder per game session
    • One file per agent with structured log
    • Timestamp-based organization
  • Implement per-agent documentation
    • Agent configuration snapshot
    • All prompts sent (formatted)
    • All responses received (formatted)
    • Decision timeline with reasoning
    • Memory state snapshots
    • Tool usage log
    • Errors and warnings
  • Build structured logging format
    • JSON-based for easy parsing
    • Markdown reports for human reading
    • Searchable and filterable
  • Add game session documentation
    • Game state at each turn
    • All chat messages with timestamps
    • Final game results and statistics

Files to create:

  • pycatan/monitoring/local_logger.py - Local file logging
  • pycatan/monitoring/session_recorder.py - Game session recording
  • pycatan/monitoring/report_generator.py - Generate readable reports

Output structure:

logs/
└── game_sessions/
    └── 2026-01-03_15-30-45/
        ├── game_summary.json
        ├── chat_log.txt
        ├── agent_blue/
        │   ├── config.json
        │   ├── prompts.log
        │   ├── decisions.log
        │   └── memory_snapshots.json
        ├── agent_red/
        │   └── ...
        └── agent_white/
            └── ...
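
A minimal sketch of writing into this layout, assuming one JSON object per line in decisions.log. The `session_dir` and `log_decision` helpers and the record fields are hypothetical; only the directory names come from the structure above.

```python
import json
from datetime import datetime
from pathlib import Path


def session_dir(root: Path, started: datetime) -> Path:
    """Folder for one game session, named by its start timestamp."""
    return root / "game_sessions" / started.strftime("%Y-%m-%d_%H-%M-%S")


def log_decision(root: Path, started: datetime, color: str, record: dict) -> Path:
    """Append one decision record to the agent's decisions.log.

    JSON Lines keeps the file append-only and easy to parse or filter later.
    """
    agent_dir = session_dir(root, started) / f"agent_{color}"
    agent_dir.mkdir(parents=True, exist_ok=True)
    path = agent_dir / "decisions.log"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return path
```

For example, `log_decision(Path("logs"), datetime(2026, 1, 3, 15, 30, 45), "blue", {"turn": 1})` appends to `logs/game_sessions/2026-01-03_15-30-45/agent_blue/decisions.log`.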

4.3 Chat Management System 💬

Priority: HIGH - Core game feature

  • Design chat system architecture
    • Centralized chat manager
    • Message routing between players
    • Chat history per game
    • Public vs private messages
  • Implement chat manager component
    • Message queue/buffer
    • Broadcast to all players
    • Direct messages between players
    • Integration with GameManager
  • Build chat observation interface
    • Real-time chat display in web dashboard
    • Chat log export
    • Filter by sender/time
  • Define chat protocol
    • Message format (sender, content, timestamp, type)
    • Chat commands (if any)
    • Trade negotiation messages

Files to create:

  • pycatan/management/chat_manager.py - Central chat management
  • pycatan/management/message.py - Message data structure

Integration points:

  • GameManager receives messages from players
  • ChatManager distributes to other players and dashboard
  • AI agents see messages in their prompt context
  • Web dashboard shows live chat
  • Local logs record all messages
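
The message format and routing described in 4.3 could be prototyped along these lines. The `Message` fields follow the listed format (sender, content, timestamp, type) plus an assumed `recipient` for private messages; the `ChatManager` API shown here is a guess at a design, not existing code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Message:
    sender: str
    content: str
    type: str = "public"               # assumed values: "public" or "private"
    recipient: Optional[str] = None    # only used for private messages
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class ChatManager:
    """Hypothetical central router with one inbox per player."""

    def __init__(self, players):
        self.inboxes = {player: [] for player in players}
        self.history = []              # full log for dashboard and local logs

    def send(self, message: Message) -> None:
        if message.type == "private":
            if message.recipient not in self.inboxes:
                raise ValueError(f"unknown recipient: {message.recipient!r}")
            targets = [message.recipient]
        else:
            # Broadcast to everyone except the sender.
            targets = [p for p in self.inboxes if p != message.sender]
        self.history.append(message)
        for target in targets:
            self.inboxes[target].append(message)
```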

Phase 5: Tool System 🔧

Goal: Provide computational tools for agent decision-making

5.1 Core Tools

  • Probability Calculator
    • Dice roll probabilities for tiles
    • Expected resource generation rates
    • Statistical analysis helpers
  • Resource Tracker
    • Historical resource generation
    • Resource scarcity analysis
    • Production trend analysis
  • Path Finder
    • Optimal road placement
    • Longest road calculation
    • Connectivity analysis
  • Trade Evaluator
    • Fair trade assessment
    • Trade benefit calculation
    • Market value estimation

Files to create:

  • pycatan/ai/tools/ - Tool implementations
    • probability_tool.py
    • resource_tool.py
    • pathfinding_tool.py
    • trade_tool.py
    • tool_manager.py - Tool orchestration
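
For the probability calculator, the underlying arithmetic is standard: with two six-sided dice, a total of n can be rolled in 6 - |7 - n| ways out of 36. The sketch below applies that to expected production per roll; the function names are hypothetical, and the robber and resource types are deliberately ignored.

```python
from fractions import Fraction


def roll_probability(number: int) -> Fraction:
    """Probability that two six-sided dice sum to `number`."""
    if not 2 <= number <= 12:
        return Fraction(0)
    return Fraction(6 - abs(7 - number), 36)


def expected_yield(tile_numbers, multiplier: int = 1) -> Fraction:
    """Expected cards per roll for a building touching the given tiles.

    `multiplier` is 1 for a settlement and 2 for a city. Tile numbers in
    Catan never include 7, so callers should pass tokens such as 6 or 8.
    """
    return multiplier * sum(
        (roll_probability(n) for n in tile_numbers), Fraction(0))
```

A settlement on tiles numbered 6, 8, and 5 has (5 + 5 + 4) / 36 = 7/18 expected cards per roll, which is what `expected_yield([6, 8, 5])` returns.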

5.2 Tool Integration

  • Define tool interface/protocol
  • Implement tool calling from prompts
  • Add tool usage limits per decision
  • Create tool result formatting
  • Build tool usage logging
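
One way to combine the tool interface, per-decision usage limits, and usage logging is sketched below. The `ToolManager` API, the `ToolLimitExceeded` exception, and the default limit of three calls are assumptions for illustration.

```python
class ToolLimitExceeded(RuntimeError):
    """Raised when an agent exceeds its tool budget for one decision."""


class ToolManager:
    """Hypothetical registry that dispatches and logs tool calls."""

    def __init__(self, max_calls_per_decision: int = 3):
        self.max_calls = max_calls_per_decision
        self._tools = {}
        self._calls = 0
        self.log = []                  # (name, kwargs, result) per call

    def register(self, name: str, func) -> None:
        """Any callable taking keyword arguments can serve as a tool."""
        self._tools[name] = func

    def start_decision(self) -> None:
        """Reset the per-decision call counter."""
        self._calls = 0

    def call(self, name: str, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        if self._calls >= self.max_calls:
            raise ToolLimitExceeded(
                f"limit of {self.max_calls} tool calls reached")
        self._calls += 1
        result = self._tools[name](**kwargs)
        self.log.append((name, kwargs, result))
        return result
```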

Phase 6: Testing & Validation ✅

Goal: Ensure agent works correctly and plays reasonably

6.1 Unit Tests

  • Test prompt manager filtering
  • Test response parser with various inputs
  • Test memory operations
  • Test each tool independently
  • Test configuration loading

Files to create:

  • tests/unit/test_ai_agent.py
  • tests/unit/test_prompt_manager.py
  • tests/unit/test_memory.py
  • tests/unit/test_tools.py

6.2 Integration Tests

  • Test agent in complete game loop
  • Test agent vs human player
  • Test multiple AI agents playing together
  • Test edge cases and error scenarios
  • Test long-running games (memory management)

Files to create:

  • tests/integration/test_ai_gameplay.py
  • tests/integration/test_multi_agent.py

6.3 Gameplay Validation

  • Verify legal moves only
  • Check strategic decision quality
  • Evaluate social interaction naturalness
  • Monitor LLM costs and performance
  • Collect agent behavior metrics

Phase 7: Optimization & Enhancement 🚀

Goal: Improve agent performance and capabilities

7.1 Performance Optimization

  • Reduce prompt token usage
  • Implement response caching for similar situations
  • Optimize tool execution
  • Improve decision speed

7.2 Strategy Enhancement

  • Tune agent personalities
  • Improve opening game strategy
  • Enhance mid-game adaptation
  • Refine end-game tactics
  • Better negotiation and trading

7.3 Advanced Features

  • Multi-turn planning capability
  • Opponent modeling
  • Meta-strategy learning
  • Tournament play support
  • Statistical performance tracking

📁 Project Structure (Proposed)

pycatan/
├── ai/                          # NEW: AI agent infrastructure
│   ├── __init__.py
│   ├── config.py                # Configuration management
│   ├── prompt_manager.py        # Prompt processing pipeline
│   ├── state_filter.py          # Game state filtering
│   ├── prompt_templates.py      # Prompt templates
│   ├── response_parser.py       # Response parsing
│   ├── schemas.py               # JSON schemas
│   ├── memory.py                # Memory system + chat summarization
│   ├── llm_client.py            # LLM abstraction (multi-model)
│   ├── providers/               # LLM provider implementations
│   │   ├── __init__.py
│   │   ├── openai_provider.py
│   │   └── anthropic_provider.py
│   └── tools/                   # Agent tools
│       ├── __init__.py
│       ├── tool_manager.py
│       ├── probability_tool.py
│       ├── resource_tool.py
│       ├── pathfinding_tool.py
│       └── trade_tool.py
├── monitoring/                  # NEW: Monitoring & debugging
│   ├── __init__.py
│   ├── dashboard_server.py      # Web dashboard backend
│   ├── event_logger.py          # Event capture and broadcast
│   ├── prompt_tracker.py        # LLM interaction tracking
│   ├── local_logger.py          # Local file logging
│   ├── session_recorder.py      # Game session recording
│   ├── report_generator.py      # Report generation
│   └── web/                     # Dashboard frontend
│       ├── index.html
│       ├── dashboard.js
│       └── dashboard.css
├── management/
│   ├── actions.py               # Existing
│   ├── game_manager.py          # Existing
│   ├── log_events.py            # Existing
│   ├── chat_manager.py          # NEW: Chat management
│   └── message.py               # NEW: Message data structure
├── players/
│   ├── ai_agent.py              # UPDATE: Full AI agent implementation
│   ├── human_user.py            # Existing
│   └── user.py                  # Existing
└── ...                          # Existing structure

logs/                            # NEW: Local documentation
└── game_sessions/
    └── YYYY-MM-DD_HH-MM-SS/
        ├── game_summary.json
        ├── chat_log.txt
        └── agent_<color>/
            ├── config.json
            ├── prompts.log
            ├── decisions.log
            └── memory_snapshots.json

examples/
├── ai_testing/
│   ├── config_example.yaml      # NEW: Example configuration
│   ├── test_single_agent.py     # NEW: Test script
│   └── test_multi_agent.py      # NEW: Multi-agent test
└── ...

tests/
├── unit/
│   ├── test_ai_agent.py         # NEW
│   ├── test_prompt_manager.py   # NEW
│   ├── test_memory.py           # NEW
│   └── test_tools.py            # NEW
├── integration/
│   ├── test_ai_gameplay.py      # NEW
│   └── test_multi_agent.py      # NEW
└── ...