# 🗺️ AI Agent Development Work Plan
**Date:** January 3, 2026
**Status:** ✅ Phase 1 - Foundation & Infrastructure (100% Complete)
**Current Task:** Phase 3 - Core AI Agent (3.1) - **NEXT**
## 🎯 Project Goal
Build a fully functional LLM-based AI agent that can play Settlers of Catan autonomously, making intelligent strategic decisions and interacting naturally with other players.
---
## 📊 Development Phases
> **Note:** Phase 4 (Monitoring & Debugging) should be developed **early and in parallel** with Phase 3.
> The web dashboard and logging are **critical** for observing agent behavior during development!
### Phase 1: Foundation & Infrastructure 🏗️
**Goal:** Build the core infrastructure needed to support AI agents
#### 1.1 Configuration Management ✅ **COMPLETED**
- [x] Create centralized configuration system
  - [x] LLM settings (model, temperature, max_tokens, etc.)
  - [x] API credentials management
  - [x] Agent parameters (custom instructions only)
  - [x] Performance settings (timeouts, retries, caching)
- [x] Create config file format (YAML)
- [x] Build configuration loader and validator
- [x] Add environment variable support for sensitive data
**Files created:**
- ✅ `pycatan/ai/config.py` - Configuration management
- ✅ `pycatan/ai/config_example.yaml` - Example configuration file
- ✅ `pycatan/ai/config_dev.yaml` - Default dev configuration
- ✅ `.env.example` - Environment variables template
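
A minimal sketch of how a loader of this kind might merge file settings with environment variables and validate the result. The names `LLMConfig`, `load_config`, and `LLM_API_KEY` are illustrative assumptions, not the actual contents of `pycatan/ai/config.py`, and the `raw` dict stands in for the already parsed YAML file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class LLMConfig:
    """Hypothetical settings container; the field names are assumptions."""
    model: str
    temperature: float = 0.7
    max_tokens: int = 1024
    api_key: str = ""


def load_config(raw: dict, env: Optional[Mapping[str, str]] = None) -> LLMConfig:
    """Build a validated config from parsed YAML data plus the environment."""
    env = os.environ if env is None else env
    cfg = LLMConfig(
        model=raw["model"],
        temperature=float(raw.get("temperature", 0.7)),
        max_tokens=int(raw.get("max_tokens", 1024)),
        # Secrets are read from the environment, not from the YAML file.
        api_key=env.get("LLM_API_KEY", ""),
    )
    if not 0.0 <= cfg.temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if cfg.max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return cfg
```

Passing `env` explicitly keeps the loader testable without touching the real process environment.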
---
#### 1.2 Prompt Management Layer ✅ **COMPLETED**
- [x] Design prompt processing pipeline
- [x] Implement game state filtering
  - [x] Hide opponent's private information
  - [x] Filter development cards
  - [x] Remove non-visible game elements
- [x] Build perspective transformation
  - [x] Convert game state to agent's viewpoint
  - [x] Format resources and points
  - [x] Present relative positioning
- [x] Create prompt template system
  - [x] Meta data section
  - [x] Task context section
  - [x] Game state section
  - [x] Social context section
  - [x] Memory section
  - [x] Constraints section
- [x] Build custom instruction injection per agent
**Files created:**
- ✅ `pycatan/ai/prompt_manager.py` - Main prompt processing
- ✅ `pycatan/ai/state_filter.py` - Game state filtering logic
- ✅ `pycatan/ai/prompt_templates.py` - Template definitions
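
The filtering step can be illustrated with a short sketch. The state layout (`players`, `resources`, `dev_cards`) and the function name `filter_state_for` are assumptions made for this example and may differ from what `state_filter.py` actually uses.

```python
import copy


def filter_state_for(state: dict, viewer: str) -> dict:
    """Return a copy of the state with opponents' private data hidden."""
    visible = copy.deepcopy(state)  # never mutate the real game state
    for name, player in visible["players"].items():
        if name == viewer:
            continue
        # Opponents' hands are hidden; only the card counts stay visible.
        player["resources"] = {"count": sum(player["resources"].values())}
        player["dev_cards"] = {"count": len(player["dev_cards"])}
    return visible
```

Working on a deep copy means the same full state can be filtered once per agent without one agent's view leaking into another's.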
---
#### 1.2.5 Game State Optimization ✅ **COMPLETED**
**Goal:** Optimize the game state capture and representation for better LLM consumption
- [x] Review current game state structure from `play_and_capture.py`
- [x] Design improved game state format
  - [x] Compress player information structure
  - [x] Improve board representation (lookup tables H & N)
  - [x] Add resource/harbor code mappings
  - [x] Reduce redundancy and token usage (removed pixel_coords, board_graph)
  - [x] Add status flags (Longest Road, Largest Army)
- [x] Create optimized state format with legend
- [x] Update game state capture to save both formats (.json + .txt)
- [x] Fix timing: capture state at turn START (not just after actions)
- [x] Test with real game scenarios
**Files modified:**
- ✅ `examples/ai_testing/play_and_capture.py` - Optimized state capture
- ✅ `pycatan/management/game_manager.py` - Added state capture at turn start
**Key achievements:**
- 🎯 State representation optimized by ~60% (removed redundant fields)
- πŸ“Š Compressed format with lookup tables (H=hexes, N=nodes)
- πŸ”„ Real-time state updates at `current_state_optimized.txt`
- πŸ“ Clear legend/documentation included in output
- βœ… Captures state at decision point (turn start)
---
#### 1.3 Response Parser ✅ **COMPLETED**
- [x] Define structured response format (JSON schema)
- [x] Build response parser and validator
- [x] Implement error handling for malformed responses
- [x] Create fallback mechanisms for parsing failures
- [x] Add response logging for debugging
**Files created:**
- ✅ `pycatan/ai/response_parser.py` - Parse and validate LLM responses
- ✅ `pycatan/ai/schemas.py` - JSON schemas for requests/responses
**Key features:**
- 🎯 Dual schema support: Active turn (with action) & Observing (no action)
- 🛡️ Error handling: Invalid JSON, missing fields, type validation
- 🔧 Fallback mechanisms: JSON repair, structure repair, default values
- 📊 Parse statistics tracking
- 🔍 Flexible parsing: Handles markdown code blocks, extra text
- ✅ Action parameter validation against expected schemas
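
As an illustration of the flexible parsing and dual-schema idea listed above, here is a small sketch. The function name `parse_response`, the `action` field, and the error codes are assumptions for this example; the real `response_parser.py` may be structured differently.

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json".
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_response(text: str, active_turn: bool) -> dict:
    """Extract a JSON object from an LLM reply, reporting failures as data."""
    match = FENCE.search(text)
    candidate = match.group(1) if match else text
    # Tolerate surrounding prose by keeping only the outermost {...} span.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        return {"ok": False, "error": "no_json_object"}
    try:
        data = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return {"ok": False, "error": "invalid_json"}
    # Active-turn replies must carry an action; observing replies need not.
    if active_turn and not isinstance(data.get("action"), dict):
        return {"ok": False, "error": "missing_action"}
    return {"ok": True, "data": data}
```

Returning an error record instead of raising lets the caller decide whether to retry the LLM call or fall back to a default action.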
---
### Phase 2: Memory System 🧠
**Goal:** Enable agents to maintain context and learning across turns
#### 2.1 Memory Structure
- [ ] Design memory data model
  - [ ] Short-term observations (last N turns)
  - [ ] Strategic notes (persistent)
  - [ ] Social tracking (player relationships)
  - [ ] Game insights (patterns observed)
- [ ] Implement memory storage (in-memory for now)
- [ ] Build memory retrieval and formatting
**Files to create:**
- `pycatan/ai/memory.py` - Memory management system
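
One possible shape for the planned data model, shown only as a sketch: the class name `AgentMemory` and its fields are assumptions, and a bounded `deque` is used here as one simple way to keep the last N turns.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical in-memory store for one agent."""
    max_recent: int = 5
    recent: deque = field(init=False)                     # short-term observations
    strategic_notes: list = field(default_factory=list)   # persistent notes
    social: dict = field(default_factory=dict)            # player -> impression

    def __post_init__(self):
        # A bounded deque drops the oldest observation automatically.
        self.recent = deque(maxlen=self.max_recent)

    def observe(self, turn: int, text: str) -> None:
        self.recent.append((turn, text))

    def format_for_prompt(self) -> str:
        """Render the memory as plain text for the prompt's memory section."""
        lines = [f"[turn {t}] {obs}" for t, obs in self.recent]
        lines += [f"[note] {n}" for n in self.strategic_notes]
        lines += [f"[{p}] {view}" for p, view in sorted(self.social.items())]
        return "\n".join(lines)
```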
---
#### 2.2 Memory Operations
- [ ] Add note creation and updates
- [ ] Implement memory pruning (keep relevant, remove old)
- [ ] Build memory summarization for context limits
- [ ] Create memory persistence (save/load between games)
---
#### 2.3 Chat History Summarization ⚡
- [ ] Implement automatic chat summarization
  - [ ] Configure separate smaller LLM for summarization (cost-effective)
  - [ ] Monitor chat history length (e.g., last 10 messages)
  - [ ] Trigger summarization when threshold reached
  - [ ] Create summarization prompt template
- [ ] Build chat memory management
  - [ ] Keep only most recent message after summarization
  - [ ] Store summary in agent's memory
  - [ ] Maintain summary history for context
- [ ] Add configuration for summarization settings
  - [ ] Summarization model selection
  - [ ] Message threshold for triggering
  - [ ] Summary format and length
**Files to update:**
- `pycatan/ai/memory.py` - Add chat summarization logic
- `pycatan/ai/config.py` - Add summarization configuration
- `pycatan/ai/llm_client.py` - Support multiple models (main + summarization)
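
The threshold-triggered flow described above could look roughly like this. `ChatMemory` and the `summarize` callable are hypothetical names; in practice the callable would wrap the smaller summarization model, while here any function from a list of messages to a string works.

```python
from typing import Callable, List


class ChatMemory:
    """Hypothetical chat buffer that summarizes itself at a threshold."""

    def __init__(self, summarize: Callable[[List[str]], str], threshold: int = 10):
        self.summarize = summarize
        self.threshold = threshold
        self.messages: List[str] = []
        self.summaries: List[str] = []  # summary history kept for context

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) >= self.threshold:
            # Summarize the older messages and keep only the newest verbatim.
            self.summaries.append(self.summarize(self.messages[:-1]))
            self.messages = self.messages[-1:]
```

Summarizing everything except the newest message is a design choice for this sketch: it avoids having the same message appear both verbatim and inside the summary.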
---
### Phase 3: Core AI Agent 🤖
**Goal:** Implement the main AI agent class
#### 3.1 Base Agent Implementation
- [ ] Create `AIAgent` class inheriting from `User`
- [ ] Implement required User interface methods
  - [ ] `get_choice()` for decision-making
  - [ ] Other interaction methods as needed
- [ ] Integrate with prompt manager
- [ ] Integrate with memory system
- [ ] Add agent state management
**Files to create:**
- `pycatan/players/ai_agent.py` - Main AI agent implementation (update existing stub)
---
#### 3.2 LLM Integration
- [ ] Create LLM client abstraction
  - [ ] Support for OpenAI API
  - [ ] Support for Anthropic Claude
  - [ ] Support for other providers (Azure, etc.)
- [ ] Implement API call handling
  - [ ] Request formatting
  - [ ] Response parsing
  - [ ] Error handling and retries
  - [ ] Rate limiting
- [ ] Add logging for all LLM interactions
- [ ] Implement cost tracking
**Files to create:**
- `pycatan/ai/llm_client.py` - LLM API abstraction
- `pycatan/ai/providers/` - Provider-specific implementations
  - `openai_provider.py`
  - `anthropic_provider.py`
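
A sketch of what the provider abstraction with retries might look like. `LLMProvider`, `LLMClient`, and their parameters are assumed names for illustration; no real provider SDK is called here, and a concrete provider would catch its own specific error types rather than every exception.

```python
import time
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Hypothetical interface each provider module would implement."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class LLMClient:
    """Wraps a provider with retries and exponential backoff."""

    def __init__(self, provider: LLMProvider, max_retries: int = 3,
                 base_delay: float = 1.0, sleep=time.sleep):
        self.provider = provider
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests do not actually wait

    def complete(self, prompt: str) -> str:
        last_error = None
        for attempt in range(self.max_retries):
            try:
                return self.provider.complete(prompt)
            except Exception as exc:  # simplification for the sketch
                last_error = exc
                if attempt < self.max_retries - 1:
                    # Wait 1x, 2x, 4x, ... the base delay between attempts.
                    self.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError(
            f"LLM call failed after {self.max_retries} attempts"
        ) from last_error
```

Keeping retries in the client rather than in each provider means the OpenAI and Anthropic modules only need to implement `complete`.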
---
#### 3.3 Decision Pipeline
- [ ] Build event-to-prompt conversion
- [ ] Implement action extraction from responses
- [ ] Create action validation before execution
- [ ] Add decision logging and debugging
- [ ] Implement decision timeout handling
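
The validation and timeout items can be combined in one small wrapper. This is only a sketch under assumptions: `decide`, `ask_llm`, and the idea of a precomputed `legal_actions` list are hypothetical, and a worker thread is just one way to bound the wait.

```python
from concurrent.futures import ThreadPoolExecutor


def decide(ask_llm, legal_actions, fallback, timeout_s=30.0):
    """Run ask_llm with a time limit and return only a legal action."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ask_llm)
    try:
        action = future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and an error raised inside ask_llm.
        action = None
    finally:
        # Do not block on a call that is still running in the background.
        pool.shutdown(wait=False)
    # Validate before execution: anything not legal becomes the fallback.
    return action if action in legal_actions else fallback
```

Note that the timed-out call is abandoned rather than cancelled, so a production version would also want the underlying HTTP request to carry its own timeout.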
---
### Phase 4: Monitoring & Debugging Infrastructure 🔍
**Goal:** Build essential tools for observing and debugging agent behavior
**⚠️ CRITICAL: These tools are essential for development and must be built early!**
---
#### 4.1 Web Dashboard for Real-Time Monitoring 🌐
**Priority: HIGH - Required before agent testing**
- [ ] Design web dashboard UI
  - [ ] Multi-agent view (tabs or split screen per agent)
  - [ ] Live prompt display with syntax highlighting
  - [ ] Agent reasoning/thinking display
  - [ ] Action selection visualization
  - [ ] Chat window with all messages
  - [ ] Game state summary panel
- [ ] Build backend API for dashboard
  - [ ] WebSocket connection for live updates
  - [ ] Endpoints for prompt/response history
  - [ ] Agent state endpoints
  - [ ] Chat history endpoint
- [ ] Implement prompt logging and streaming
  - [ ] Capture all prompts sent to LLM
  - [ ] Capture all responses from LLM
  - [ ] Stream to dashboard in real-time
  - [ ] Format for readability
- [ ] Build agent reasoning viewer
  - [ ] Display internal_thinking/reasoning
  - [ ] Show action selection process
  - [ ] Highlight tool usage
  - [ ] Show memory updates
**Files to create:**
- `pycatan/monitoring/` - NEW monitoring package
  - `dashboard_server.py` - Flask/FastAPI server for dashboard
  - `event_logger.py` - Captures and broadcasts events
  - `prompt_tracker.py` - Tracks all LLM interactions
- `pycatan/monitoring/web/` - Dashboard frontend
  - `index.html` - Main dashboard page
  - `dashboard.js` - Dashboard functionality
  - `dashboard.css` - Dashboard styling
---
#### 4.2 Local Documentation & Logging 📝
**Priority: HIGH - Required for debugging**
- [ ] Design local documentation structure
  - [ ] One folder per game session
  - [ ] One file per agent with structured log
  - [ ] Timestamp-based organization
- [ ] Implement per-agent documentation
  - [ ] Agent configuration snapshot
  - [ ] All prompts sent (formatted)
  - [ ] All responses received (formatted)
  - [ ] Decision timeline with reasoning
  - [ ] Memory state snapshots
  - [ ] Tool usage log
  - [ ] Errors and warnings
- [ ] Build structured logging format
  - [ ] JSON-based for easy parsing
  - [ ] Markdown reports for human reading
  - [ ] Searchable and filterable
- [ ] Add game session documentation
  - [ ] Game state at each turn
  - [ ] All chat messages with timestamps
  - [ ] Final game results and statistics
**Files to create:**
- `pycatan/monitoring/local_logger.py` - Local file logging
- `pycatan/monitoring/session_recorder.py` - Game session recording
- `pycatan/monitoring/report_generator.py` - Generate readable reports
**Output structure:**
```
logs/
└── game_sessions/
    └── 2026-01-03_15-30-45/
        ├── game_summary.json
        ├── chat_log.txt
        ├── agent_blue/
        │   ├── config.json
        │   ├── prompts.log
        │   ├── decisions.log
        │   └── memory_snapshots.json
        ├── agent_red/
        │   └── ...
        └── agent_white/
            └── ...
```
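
The layout above could be created with a helper along these lines. The function name `create_session_dirs` and its parameters are assumptions for illustration; only the folder naming follows the structure shown.

```python
from datetime import datetime
from pathlib import Path


def create_session_dirs(root: Path, agent_colors, now=None) -> Path:
    """Create logs for one game session and return the session folder."""
    now = now or datetime.now()
    # Timestamped folder name, e.g. 2026-01-03_15-30-45.
    session = root / "game_sessions" / now.strftime("%Y-%m-%d_%H-%M-%S")
    session.mkdir(parents=True, exist_ok=True)
    for color in agent_colors:
        (session / f"agent_{color}").mkdir(exist_ok=True)
    return session
```

For example, `create_session_dirs(Path("logs"), ["blue", "red", "white"])` would create one timestamped session folder with three agent subfolders.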
---
#### 4.3 Chat Management System 💬
**Priority: HIGH - Core game feature**
- [ ] Design chat system architecture
  - [ ] Centralized chat manager
  - [ ] Message routing between players
  - [ ] Chat history per game
  - [ ] Public vs private messages
- [ ] Implement chat manager component
  - [ ] Message queue/buffer
  - [ ] Broadcast to all players
  - [ ] Direct messages between players
  - [ ] Integration with GameManager
- [ ] Build chat observation interface
  - [ ] Real-time chat display in web dashboard
  - [ ] Chat log export
  - [ ] Filter by sender/time
- [ ] Define chat protocol
  - [ ] Message format (sender, content, timestamp, type)
  - [ ] Chat commands (if any)
  - [ ] Trade negotiation messages
**Files to create:**
- `pycatan/management/chat_manager.py` - Central chat management
- `pycatan/management/message.py` - Message data structure
**Integration points:**
- GameManager receives messages from players
- ChatManager distributes to other players and dashboard
- AI agents see messages in their prompt context
- Web dashboard shows live chat
- Local logs record all messages
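
A minimal sketch of the message format and routing described above. The `Message` fields follow the listed format (sender, content, timestamp, type); the `recipient` field, the `register`/`send` methods, and the per-player inboxes are assumptions for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    content: str
    recipient: Optional[str] = None  # None means broadcast to everyone
    type: str = "chat"               # e.g. "chat" or "trade"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class ChatManager:
    """Hypothetical central router with a full history and per-player inboxes."""

    def __init__(self):
        self.history: List[Message] = []
        self.inboxes: Dict[str, List[Message]] = {}

    def register(self, player: str) -> None:
        self.inboxes.setdefault(player, [])

    def send(self, msg: Message) -> None:
        self.history.append(msg)  # everything is recorded for logs/dashboard
        if msg.recipient is not None:
            targets = [msg.recipient]
        else:
            targets = [p for p in self.inboxes if p != msg.sender]
        for player in targets:
            self.inboxes.setdefault(player, []).append(msg)
```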
---
### Phase 5: Tool System 🔧
**Goal:** Provide computational tools for agent decision-making
#### 5.1 Core Tools
- [ ] **Probability Calculator**
  - [ ] Dice roll probabilities for tiles
  - [ ] Expected resource generation rates
  - [ ] Statistical analysis helpers
- [ ] **Resource Tracker**
  - [ ] Historical resource generation
  - [ ] Resource scarcity analysis
  - [ ] Production trend analysis
- [ ] **Path Finder**
  - [ ] Optimal road placement
  - [ ] Longest road calculation
  - [ ] Connectivity analysis
- [ ] **Trade Evaluator**
  - [ ] Fair trade assessment
  - [ ] Trade benefit calculation
  - [ ] Market value estimation
**Files to create:**
- `pycatan/ai/tools/` - Tool implementations
  - `probability_tool.py`
  - `resource_tool.py`
  - `pathfinding_tool.py`
  - `trade_tool.py`
  - `tool_manager.py` - Tool orchestration
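
The core of the probability calculator follows directly from two six-sided dice: a total of `n` can be rolled in `6 - |7 - n|` of the 36 equally likely ways. The function names below are illustrative, not the planned API of `probability_tool.py`.

```python
from fractions import Fraction


def roll_probability(number: int) -> Fraction:
    """Probability that two six-sided dice sum to `number`."""
    if not 2 <= number <= 12:
        return Fraction(0)
    return Fraction(6 - abs(7 - number), 36)


def expected_yield(adjacent_numbers, per_hit: int = 1) -> Fraction:
    """Expected resource cards per roll for one building.

    `adjacent_numbers` are the number tokens on the hexes touching the
    building; `per_hit` is 1 for a settlement and 2 for a city.
    """
    return sum(roll_probability(n) for n in adjacent_numbers) * per_hit
```

A settlement touching hexes numbered 6, 8, and 5 has an expected yield of (5 + 5 + 4) / 36 = 7/18 cards per roll. The sketch ignores the robber, which blocks production on the hex it occupies.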
---
#### 5.2 Tool Integration
- [ ] Define tool interface/protocol
- [ ] Implement tool calling from prompts
- [ ] Add tool usage limits per decision
- [ ] Create tool result formatting
- [ ] Build tool usage logging
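
One way the tool interface, per-decision limit, result formatting, and usage log could fit together is sketched below. `ToolManager` matches the planned file name, but its methods and the `[tool:name]` result format are assumptions.

```python
from typing import Callable, Dict, List, Tuple


class ToolManager:
    """Hypothetical registry that enforces a per-decision call budget."""

    def __init__(self, max_calls_per_decision: int = 3):
        self.tools: Dict[str, Callable] = {}
        self.max_calls = max_calls_per_decision
        self.calls_used = 0
        self.log: List[Tuple[str, dict, object]] = []  # usage log

    def register(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

    def start_decision(self) -> None:
        """Reset the call budget at the start of each agent decision."""
        self.calls_used = 0

    def call(self, name: str, **kwargs) -> str:
        if name not in self.tools:
            return f"[tool:{name}] error: unknown tool"
        if self.calls_used >= self.max_calls:
            return f"[tool:{name}] error: call limit reached"
        self.calls_used += 1
        result = self.tools[name](**kwargs)
        self.log.append((name, kwargs, result))
        # Results are returned as text so they can be pasted into a prompt.
        return f"[tool:{name}] {result}"
```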
---
### Phase 6: Testing & Validation ✅
**Goal:** Ensure agent works correctly and plays reasonably
#### 6.1 Unit Tests
- [ ] Test prompt manager filtering
- [ ] Test response parser with various inputs
- [ ] Test memory operations
- [ ] Test each tool independently
- [ ] Test configuration loading
**Files to create:**
- `tests/unit/test_ai_agent.py`
- `tests/unit/test_prompt_manager.py`
- `tests/unit/test_memory.py`
- `tests/unit/test_tools.py`
---
#### 6.2 Integration Tests
- [ ] Test agent in complete game loop
- [ ] Test agent vs human player
- [ ] Test multiple AI agents playing together
- [ ] Test edge cases and error scenarios
- [ ] Test long-running games (memory management)
**Files to create:**
- `tests/integration/test_ai_gameplay.py`
- `tests/integration/test_multi_agent.py`
---
#### 6.3 Gameplay Validation
- [ ] Verify legal moves only
- [ ] Check strategic decision quality
- [ ] Evaluate social interaction naturalness
- [ ] Monitor LLM costs and performance
- [ ] Collect agent behavior metrics
---
### Phase 7: Optimization & Enhancement 🚀
**Goal:** Improve agent performance and capabilities
#### 7.1 Performance Optimization
- [ ] Reduce prompt token usage
- [ ] Implement response caching for similar situations
- [ ] Optimize tool execution
- [ ] Improve decision speed
---
#### 7.2 Strategy Enhancement
- [ ] Tune agent personalities
- [ ] Improve opening game strategy
- [ ] Enhance mid-game adaptation
- [ ] Refine end-game tactics
- [ ] Better negotiation and trading
---
#### 7.3 Advanced Features
- [ ] Multi-turn planning capability
- [ ] Opponent modeling
- [ ] Meta-strategy learning
- [ ] Tournament play support
- [ ] Statistical performance tracking
---
## 📁 Project Structure (Proposed)
```
pycatan/
├── ai/                          # NEW: AI agent infrastructure
│   ├── __init__.py
│   ├── config.py                # Configuration management
│   ├── prompt_manager.py        # Prompt processing pipeline
│   ├── state_filter.py          # Game state filtering
│   ├── prompt_templates.py      # Prompt templates
│   ├── response_parser.py       # Response parsing
│   ├── schemas.py               # JSON schemas
│   ├── memory.py                # Memory system + chat summarization
│   ├── llm_client.py            # LLM abstraction (multi-model)
│   ├── providers/               # LLM provider implementations
│   │   ├── __init__.py
│   │   ├── openai_provider.py
│   │   └── anthropic_provider.py
│   └── tools/                   # Agent tools
│       ├── __init__.py
│       ├── tool_manager.py
│       ├── probability_tool.py
│       ├── resource_tool.py
│       ├── pathfinding_tool.py
│       └── trade_tool.py
├── monitoring/                  # NEW: Monitoring & debugging
│   ├── __init__.py
│   ├── dashboard_server.py      # Web dashboard backend
│   ├── event_logger.py          # Event capture and broadcast
│   ├── prompt_tracker.py        # LLM interaction tracking
│   ├── local_logger.py          # Local file logging
│   ├── session_recorder.py      # Game session recording
│   ├── report_generator.py      # Report generation
│   └── web/                     # Dashboard frontend
│       ├── index.html
│       ├── dashboard.js
│       └── dashboard.css
├── management/
│   ├── actions.py               # Existing
│   ├── game_manager.py          # Existing
│   ├── log_events.py            # Existing
│   ├── chat_manager.py          # NEW: Chat management
│   └── message.py               # NEW: Message data structure
├── players/
│   ├── ai_agent.py              # UPDATE: Full AI agent implementation
│   ├── human_user.py            # Existing
│   └── user.py                  # Existing
└── ...                          # Existing structure

logs/                            # NEW: Local documentation
└── game_sessions/
    └── YYYY-MM-DD_HH-MM-SS/
        ├── game_summary.json
        ├── chat_log.txt
        └── agent_<color>/
            ├── config.json
            ├── prompts.log
            ├── decisions.log
            └── memory_snapshots.json

examples/
├── ai_testing/
│   ├── config_example.yaml      # NEW: Example configuration
│   ├── test_single_agent.py     # NEW: Test script
│   └── test_multi_agent.py      # NEW: Multi-agent test
└── ...

tests/
├── unit/
│   ├── test_ai_agent.py         # NEW
│   ├── test_prompt_manager.py   # NEW
│   ├── test_memory.py           # NEW
│   └── test_tools.py            # NEW
├── integration/
│   ├── test_ai_gameplay.py      # NEW
│   └── test_multi_agent.py      # NEW
└── ...
```