# 🗺️ AI Agent Development Work Plan

**Date:** January 3, 2026
**Status:** ✅ Phase 1 - Foundation & Infrastructure (100% Complete)
**Current Task:** Phase 3 - Core AI Agent (3.1) - **NEXT**

## 🎯 Project Goal

Build a fully functional LLM-based AI agent that can play Settlers of Catan autonomously, making intelligent strategic decisions and interacting naturally with other players.

---

## Development Phases

> **Note:** Phase 4 (Monitoring & Debugging) should be developed **early and in parallel** with Phase 3.
> The web dashboard and logging are **critical** for observing agent behavior during development!

### Phase 1: Foundation & Infrastructure

**Goal:** Build the core infrastructure needed to support AI agents
#### 1.1 Configuration Management ✅ **COMPLETED**

- [x] Create centralized configuration system
  - [x] LLM settings (model, temperature, max_tokens, etc.)
  - [x] API credentials management
  - [x] Agent parameters (custom instructions only)
  - [x] Performance settings (timeouts, retries, caching)
- [x] Create config file format (YAML)
- [x] Build configuration loader and validator
- [x] Add environment variable support for sensitive data

**Files created:**

- ✅ `pycatan/ai/config.py` - Configuration management
- ✅ `pycatan/ai/config_example.yaml` - Example configuration file
- ✅ `pycatan/ai/config_dev.yaml` - Default dev configuration
- ✅ `.env.example` - Environment variables template
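
The loader-and-validator step with environment overrides can be sketched as below. This is a minimal illustration, not the contents of `pycatan/ai/config.py`: the `LLMSettings` class, the `load_llm_settings` function, the `LLM_API_KEY` variable name, and the 0 to 2 temperature range are all assumptions made for the example. The `raw` argument stands in for the dict a YAML parser would return.

```python
import os
from dataclasses import dataclass


@dataclass
class LLMSettings:
    """Hypothetical settings object; field names mirror the checklist above."""
    model: str
    temperature: float = 0.7
    max_tokens: int = 1024
    api_key: str = ""


def load_llm_settings(raw: dict, env=os.environ) -> LLMSettings:
    """Build validated settings from a parsed config mapping."""
    if "model" not in raw:
        raise ValueError("config is missing required key 'model'")
    settings = LLMSettings(
        model=str(raw["model"]),
        temperature=float(raw.get("temperature", 0.7)),
        max_tokens=int(raw.get("max_tokens", 1024)),
        # Secrets come from the environment, never from the config file.
        api_key=env.get("LLM_API_KEY", ""),  # assumed variable name
    )
    # Assumed bounds; adjust to whatever the chosen provider accepts.
    if not 0.0 <= settings.temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    if settings.max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return settings
```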
---

#### 1.2 Prompt Management Layer ✅ **COMPLETED**

- [x] Design prompt processing pipeline
- [x] Implement game state filtering
  - [x] Hide opponents' private information
  - [x] Filter development cards
  - [x] Remove non-visible game elements
- [x] Build perspective transformation
  - [x] Convert game state to agent's viewpoint
  - [x] Format resources and points
  - [x] Present relative positioning
- [x] Create prompt template system
  - [x] Metadata section
  - [x] Task context section
  - [x] Game state section
  - [x] Social context section
  - [x] Memory section
  - [x] Constraints section
- [x] Build custom instruction injection per agent

**Files created:**

- ✅ `pycatan/ai/prompt_manager.py` - Main prompt processing
- ✅ `pycatan/ai/state_filter.py` - Game state filtering logic
- ✅ `pycatan/ai/prompt_templates.py` - Template definitions
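
To make the filtering step concrete, here is a small sketch of hiding opponents' private information. The state layout (`players`, `resources`, `dev_cards`) and the function name are hypothetical and chosen only for illustration; the real logic lives in `pycatan/ai/state_filter.py` and may differ.

```python
import copy


def filter_state_for_player(state: dict, viewer: str) -> dict:
    """Return a copy of `state` in which opponents' hands are reduced to counts."""
    visible = copy.deepcopy(state)  # never mutate the authoritative game state
    for name, player in visible["players"].items():
        if name == viewer:
            continue  # the viewer sees their own hand in full
        # Opponents' cards are private: expose only how many they hold.
        player["resource_count"] = sum(player.pop("resources").values())
        player["dev_card_count"] = len(player.pop("dev_cards"))
    return visible
```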
---

#### 1.2.5 Game State Optimization ✅ **COMPLETED**

**Goal:** Optimize the game state capture and representation for better LLM consumption

- [x] Review current game state structure from `play_and_capture.py`
- [x] Design improved game state format
  - [x] Compress player information structure
  - [x] Improve board representation (lookup tables H & N)
  - [x] Add resource/harbor code mappings
  - [x] Reduce redundancy and token usage (removed pixel_coords, board_graph)
  - [x] Add status flags (Longest Road, Largest Army)
- [x] Create optimized state format with legend
- [x] Update game state capture to save both formats (.json + .txt)
- [x] Fix timing: capture state at turn START (not just after actions)
- [x] Test with real game scenarios

**Files modified:**

- ✅ `examples/ai_testing/play_and_capture.py` - Optimized state capture
- ✅ `pycatan/management/game_manager.py` - Added state capture at turn start

**Key achievements:**

- 🎯 State representation optimized by ~60% (removed redundant fields)
- Compressed format with lookup tables (H=hexes, N=nodes)
- Real-time state updates at `current_state_optimized.txt`
- Clear legend/documentation included in output
- ✅ Captures state at decision point (turn start)
---

#### 1.3 Response Parser ✅ **COMPLETED**

- [x] Define structured response format (JSON schema)
- [x] Build response parser and validator
- [x] Implement error handling for malformed responses
- [x] Create fallback mechanisms for parsing failures
- [x] Add response logging for debugging

**Files created:**

- ✅ `pycatan/ai/response_parser.py` - Parse and validate LLM responses
- ✅ `pycatan/ai/schemas.py` - JSON schemas for requests/responses

**Key features:**

- 🎯 Dual schema support: Active turn (with action) & Observing (no action)
- 🛡️ Error handling: Invalid JSON, missing fields, type validation
- 🔧 Fallback mechanisms: JSON repair, structure repair, default values
- Parse statistics tracking
- Flexible parsing: Handles markdown code blocks, extra text
- ✅ Action parameter validation against expected schemas
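
The flexible-parsing and fallback behaviour listed above could look roughly like the sketch below. It is an illustration under assumptions, not the code in `pycatan/ai/response_parser.py`: the function name, the `parse_error` flag, and the default fallback payload are invented for the example, and schema validation is left out.

```python
import json
import re
from typing import Optional

TICKS = "`" * 3  # built at runtime so this example contains no literal fence
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL)


def parse_agent_response(text: str, fallback: Optional[dict] = None) -> dict:
    """Extract a JSON object from an LLM reply, or return a fallback value."""
    match = FENCE.search(text)
    candidate = match.group(1) if match else text
    # Trim any chatter around the outermost braces.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start != -1 and end > start:
        try:
            parsed = json.loads(candidate[start:end + 1])
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass  # fall through to the default below
    return dict(fallback or {"action": None, "parse_error": True})
```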
---

### Phase 2: Memory System 🧠

**Goal:** Enable agents to maintain context and learn across turns

#### 2.1 Memory Structure

- [ ] Design memory data model
  - [ ] Short-term observations (last N turns)
  - [ ] Strategic notes (persistent)
  - [ ] Social tracking (player relationships)
  - [ ] Game insights (patterns observed)
- [ ] Implement memory storage (in-memory for now)
- [ ] Build memory retrieval and formatting

**Files to create:**

- `pycatan/ai/memory.py` - Memory management system
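
One possible shape for the in-memory data model is sketched here, covering only short-term observations and strategic notes. The `AgentMemory` class and its method names are hypothetical; the planned `memory.py` does not exist yet, so this is a starting point rather than a specification. A bounded `deque` handles the "last N turns" window without explicit pruning code.

```python
from collections import deque


class AgentMemory:
    """Hypothetical memory store: a rolling observation window plus persistent notes."""

    def __init__(self, max_observations: int = 5):
        # A bounded deque drops the oldest observation automatically.
        self.observations = deque(maxlen=max_observations)
        self.strategic_notes = []

    def observe(self, turn: int, text: str) -> None:
        self.observations.append((turn, text))

    def add_note(self, text: str) -> None:
        self.strategic_notes.append(text)

    def format_for_prompt(self) -> str:
        """Render memory as plain text for the prompt's memory section."""
        lines = ["Recent observations:"]
        lines += [f"- turn {turn}: {text}" for turn, text in self.observations]
        lines.append("Strategic notes:")
        lines += [f"- {note}" for note in self.strategic_notes]
        return "\n".join(lines)
```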
---

#### 2.2 Memory Operations

- [ ] Add note creation and updates
- [ ] Implement memory pruning (keep relevant, remove old)
- [ ] Build memory summarization for context limits
- [ ] Create memory persistence (save/load between games)

---

#### 2.3 Chat History Summarization ⚡

- [ ] Implement automatic chat summarization
  - [ ] Configure separate smaller LLM for summarization (cost-effective)
  - [ ] Monitor chat history length (e.g., last 10 messages)
  - [ ] Trigger summarization when threshold reached
  - [ ] Create summarization prompt template
- [ ] Build chat memory management
  - [ ] Keep only the most recent message after summarization
  - [ ] Store summary in agent's memory
  - [ ] Maintain summary history for context
- [ ] Add configuration for summarization settings
  - [ ] Summarization model selection
  - [ ] Message threshold for triggering
  - [ ] Summary format and length

**Files to update:**

- `pycatan/ai/memory.py` - Add chat summarization logic
- `pycatan/ai/config.py` - Add summarization configuration
- `pycatan/ai/llm_client.py` - Support multiple models (main + summarization)
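
The threshold-triggered flow above can be sketched as follows. The `ChatMemory` class is hypothetical, and the `summarize` callable stands in for a call to the smaller summarization model. One assumption worth flagging: this version summarizes every buffered message (including the newest) and then keeps the newest one verbatim; summarizing only the older messages would be an equally reasonable reading of the plan.

```python
from typing import Callable, List


class ChatMemory:
    """Hypothetical chat buffer that condenses itself once it grows too long."""

    def __init__(self, summarize: Callable[[List[str]], str], threshold: int = 10):
        self.summarize = summarize  # e.g. a wrapper around a cheaper LLM
        self.threshold = threshold
        self.messages: List[str] = []
        self.summaries: List[str] = []  # summary history kept for context

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) >= self.threshold:
            # Condense the buffer, then keep only the most recent message.
            self.summaries.append(self.summarize(self.messages))
            self.messages = self.messages[-1:]
```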
---

### Phase 3: Core AI Agent 🤖

**Goal:** Implement the main AI agent class

#### 3.1 Base Agent Implementation

- [ ] Create `AIAgent` class inheriting from `User`
- [ ] Implement required User interface methods
  - [ ] `get_choice()` for decision-making
  - [ ] Other interaction methods as needed
- [ ] Integrate with prompt manager
- [ ] Integrate with memory system
- [ ] Add agent state management

**Files to update:**

- `pycatan/players/ai_agent.py` - Main AI agent implementation (existing stub)

---

#### 3.2 LLM Integration

- [ ] Create LLM client abstraction
  - [ ] Support for OpenAI API
  - [ ] Support for Anthropic Claude
  - [ ] Support for other providers (Azure, etc.)
- [ ] Implement API call handling
  - [ ] Request formatting
  - [ ] Response parsing
  - [ ] Error handling and retries
  - [ ] Rate limiting
- [ ] Add logging for all LLM interactions
- [ ] Implement cost tracking

**Files to create:**

- `pycatan/ai/llm_client.py` - LLM API abstraction
- `pycatan/ai/providers/` - Provider-specific implementations
  - `openai_provider.py`
  - `anthropic_provider.py`
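
A provider abstraction with retries might be structured like this sketch. `LLMProvider`, `complete`, and `call_with_retries` are hypothetical names, and no real provider SDK is called; a concrete provider would wrap the vendor client behind `complete()`. Exponential backoff is one common retry policy and is an assumption here, not something the plan prescribes. The `sleep` parameter is injectable so the retry logic can be tested without waiting.

```python
import time
from typing import Protocol


class LLMProvider(Protocol):
    """Minimal interface each provider module would implement (assumed)."""

    def complete(self, prompt: str) -> str: ...


def call_with_retries(provider: LLMProvider, prompt: str, max_retries: int = 3,
                      base_delay: float = 1.0, sleep=time.sleep) -> str:
    """Call the provider, backing off exponentially between failed attempts."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return provider.complete(prompt)
        except Exception as error:  # a real client would catch provider-specific errors
            last_error = error
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"LLM call failed after {max_retries + 1} attempts") from last_error
```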
---

#### 3.3 Decision Pipeline

- [ ] Build event-to-prompt conversion
- [ ] Implement action extraction from responses
- [ ] Create action validation before execution
- [ ] Add decision logging and debugging
- [ ] Implement decision timeout handling

---

### Phase 4: Monitoring & Debugging Infrastructure

**Goal:** Build essential tools for observing and debugging agent behavior

**⚠️ CRITICAL: These tools are essential for development and must be built early!**

---

#### 4.1 Web Dashboard for Real-Time Monitoring

**Priority: HIGH - Required before agent testing**

- [ ] Design web dashboard UI
  - [ ] Multi-agent view (tabs or split screen per agent)
  - [ ] Live prompt display with syntax highlighting
  - [ ] Agent reasoning/thinking display
  - [ ] Action selection visualization
  - [ ] Chat window with all messages
  - [ ] Game state summary panel
- [ ] Build backend API for dashboard
  - [ ] WebSocket connection for live updates
  - [ ] Endpoints for prompt/response history
  - [ ] Agent state endpoints
  - [ ] Chat history endpoint
- [ ] Implement prompt logging and streaming
  - [ ] Capture all prompts sent to LLM
  - [ ] Capture all responses from LLM
  - [ ] Stream to dashboard in real-time
  - [ ] Format for readability
- [ ] Build agent reasoning viewer
  - [ ] Display internal_thinking/reasoning
  - [ ] Show action selection process
  - [ ] Highlight tool usage
  - [ ] Show memory updates

**Files to create:**

- `pycatan/monitoring/` - NEW monitoring package
  - `dashboard_server.py` - Flask/FastAPI server for dashboard
  - `event_logger.py` - Captures and broadcasts events
  - `prompt_tracker.py` - Tracks all LLM interactions
- `pycatan/monitoring/web/` - Dashboard frontend
  - `index.html` - Main dashboard page
  - `dashboard.js` - Dashboard functionality
  - `dashboard.css` - Dashboard styling
---

#### 4.2 Local Documentation & Logging

**Priority: HIGH - Required for debugging**

- [ ] Design local documentation structure
  - [ ] One folder per game session
  - [ ] One file per agent with structured log
  - [ ] Timestamp-based organization
- [ ] Implement per-agent documentation
  - [ ] Agent configuration snapshot
  - [ ] All prompts sent (formatted)
  - [ ] All responses received (formatted)
  - [ ] Decision timeline with reasoning
  - [ ] Memory state snapshots
  - [ ] Tool usage log
  - [ ] Errors and warnings
- [ ] Build structured logging format
  - [ ] JSON-based for easy parsing
  - [ ] Markdown reports for human reading
  - [ ] Searchable and filterable
- [ ] Add game session documentation
  - [ ] Game state at each turn
  - [ ] All chat messages with timestamps
  - [ ] Final game results and statistics

**Files to create:**

- `pycatan/monitoring/local_logger.py` - Local file logging
- `pycatan/monitoring/session_recorder.py` - Game session recording
- `pycatan/monitoring/report_generator.py` - Generate readable reports

**Output structure:**

```
logs/
└── game_sessions/
    └── 2026-01-03_15-30-45/
        ├── game_summary.json
        ├── chat_log.txt
        ├── agent_blue/
        │   ├── config.json
        │   ├── prompts.log
        │   ├── decisions.log
        │   └── memory_snapshots.json
        ├── agent_red/
        │   └── ...
        └── agent_white/
            └── ...
```
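
Writing into that folder layout could be done along the lines of the sketch below, which appends one JSON object per line to an agent's `decisions.log`. The `log_decision` function, the JSON Lines format, and the record fields are assumptions for illustration; the plan only specifies that logs are JSON-based and organized per session and per agent.

```python
import json
from datetime import datetime
from pathlib import Path


def log_decision(root: Path, session: str, agent: str, record: dict) -> Path:
    """Append one JSON line to the agent's decisions.log and return its path."""
    agent_dir = Path(root) / "game_sessions" / session / f"agent_{agent}"
    agent_dir.mkdir(parents=True, exist_ok=True)  # create the session tree on demand
    path = agent_dir / "decisions.log"
    entry = {"timestamp": datetime.now().isoformat(timespec="seconds"), **record}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return path
```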
---

#### 4.3 Chat Management System 💬

**Priority: HIGH - Core game feature**

- [ ] Design chat system architecture
  - [ ] Centralized chat manager
  - [ ] Message routing between players
  - [ ] Chat history per game
  - [ ] Public vs private messages
- [ ] Implement chat manager component
  - [ ] Message queue/buffer
  - [ ] Broadcast to all players
  - [ ] Direct messages between players
  - [ ] Integration with GameManager
- [ ] Build chat observation interface
  - [ ] Real-time chat display in web dashboard
  - [ ] Chat log export
  - [ ] Filter by sender/time
- [ ] Define chat protocol
  - [ ] Message format (sender, content, timestamp, type)
  - [ ] Chat commands (if any)
  - [ ] Trade negotiation messages

**Files to create:**

- `pycatan/management/chat_manager.py` - Central chat management
- `pycatan/management/message.py` - Message data structure

**Integration points:**

- GameManager receives messages from players
- ChatManager distributes to other players and dashboard
- AI agents see messages in their prompt context
- Web dashboard shows live chat
- Local logs record all messages
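
The message format and the broadcast versus direct routing described above can be sketched as follows. The `Message` and `ChatManager` shapes here are hypothetical: the field names follow the "sender, content, timestamp, type" bullet, while the `recipient` field and the per-player inboxes are assumptions added to make routing expressible. Dashboard and log integration are omitted.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    sender: str
    content: str
    recipient: Optional[str] = None  # None means a public broadcast
    timestamp: float = field(default_factory=time.time)

    @property
    def type(self) -> str:
        return "public" if self.recipient is None else "private"


class ChatManager:
    """Hypothetical central router keeping one inbox per player."""

    def __init__(self, players: List[str]):
        self.history: List[Message] = []  # full chat history for the game
        self.inboxes: Dict[str, List[Message]] = {name: [] for name in players}

    def send(self, message: Message) -> None:
        self.history.append(message)
        if message.recipient is None:
            # Broadcast to everyone except the sender.
            targets = [n for n in self.inboxes if n != message.sender]
        else:
            targets = [message.recipient]
        for name in targets:
            self.inboxes[name].append(message)
```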
---

### Phase 5: Tool System 🔧

**Goal:** Provide computational tools for agent decision-making

#### 5.1 Core Tools

- [ ] **Probability Calculator**
  - [ ] Dice roll probabilities for tiles
  - [ ] Expected resource generation rates
  - [ ] Statistical analysis helpers
- [ ] **Resource Tracker**
  - [ ] Historical resource generation
  - [ ] Resource scarcity analysis
  - [ ] Production trend analysis
- [ ] **Path Finder**
  - [ ] Optimal road placement
  - [ ] Longest road calculation
  - [ ] Connectivity analysis
- [ ] **Trade Evaluator**
  - [ ] Fair trade assessment
  - [ ] Trade benefit calculation
  - [ ] Market value estimation

**Files to create:**

- `pycatan/ai/tools/` - Tool implementations
  - `probability_tool.py`
  - `resource_tool.py`
  - `pathfinding_tool.py`
  - `trade_tool.py`
  - `tool_manager.py` - Tool orchestration
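
As a worked example for the Probability Calculator, the sketch below computes two-dice roll probabilities and the expected cards per roll for a building touching several numbered tiles. The function names are hypothetical. The model is deliberately simple and ignores the robber; `cards_per_hit` would be 1 for a settlement and 2 for a city under standard Catan rules.

```python
from fractions import Fraction


def roll_probability(number: int) -> Fraction:
    """Probability that two six-sided dice sum to `number`."""
    if not 2 <= number <= 12:
        return Fraction(0)
    ways = 6 - abs(7 - number)  # 1 way for 2 and 12, rising to 6 ways for 7
    return Fraction(ways, 36)


def expected_yield(tile_numbers, cards_per_hit: int = 1) -> Fraction:
    """Expected resource cards per roll from the adjacent tile numbers."""
    return sum(
        (roll_probability(n) * cards_per_hit for n in tile_numbers),
        Fraction(0),
    )
```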
---

#### 5.2 Tool Integration

- [ ] Define tool interface/protocol
- [ ] Implement tool calling from prompts
- [ ] Add tool usage limits per decision
- [ ] Create tool result formatting
- [ ] Build tool usage logging

---

### Phase 6: Testing & Validation ✅

**Goal:** Ensure agent works correctly and plays reasonably

#### 6.1 Unit Tests

- [ ] Test prompt manager filtering
- [ ] Test response parser with various inputs
- [ ] Test memory operations
- [ ] Test each tool independently
- [ ] Test configuration loading

**Files to create:**

- `tests/unit/test_ai_agent.py`
- `tests/unit/test_prompt_manager.py`
- `tests/unit/test_memory.py`
- `tests/unit/test_tools.py`

---

#### 6.2 Integration Tests

- [ ] Test agent in complete game loop
- [ ] Test agent vs human player
- [ ] Test multiple AI agents playing together
- [ ] Test edge cases and error scenarios
- [ ] Test long-running games (memory management)

**Files to create:**

- `tests/integration/test_ai_gameplay.py`
- `tests/integration/test_multi_agent.py`

---

#### 6.3 Gameplay Validation

- [ ] Verify legal moves only
- [ ] Check strategic decision quality
- [ ] Evaluate social interaction naturalness
- [ ] Monitor LLM costs and performance
- [ ] Collect agent behavior metrics
---

### Phase 7: Optimization & Enhancement

**Goal:** Improve agent performance and capabilities

#### 7.1 Performance Optimization

- [ ] Reduce prompt token usage
- [ ] Implement response caching for similar situations
- [ ] Optimize tool execution
- [ ] Improve decision speed

---

#### 7.2 Strategy Enhancement

- [ ] Tune agent personalities
- [ ] Improve opening game strategy
- [ ] Enhance mid-game adaptation
- [ ] Refine end-game tactics
- [ ] Better negotiation and trading

---

#### 7.3 Advanced Features

- [ ] Multi-turn planning capability
- [ ] Opponent modeling
- [ ] Meta-strategy learning
- [ ] Tournament play support
- [ ] Statistical performance tracking

---

## Project Structure (Proposed)

```
pycatan/
├── ai/                          # NEW: AI agent infrastructure
│   ├── __init__.py
│   ├── config.py                # Configuration management
│   ├── prompt_manager.py        # Prompt processing pipeline
│   ├── state_filter.py          # Game state filtering
│   ├── prompt_templates.py      # Prompt templates
│   ├── response_parser.py       # Response parsing
│   ├── schemas.py               # JSON schemas
│   ├── memory.py                # Memory system + chat summarization
│   ├── llm_client.py            # LLM abstraction (multi-model)
│   ├── providers/               # LLM provider implementations
│   │   ├── __init__.py
│   │   ├── openai_provider.py
│   │   └── anthropic_provider.py
│   └── tools/                   # Agent tools
│       ├── __init__.py
│       ├── tool_manager.py
│       ├── probability_tool.py
│       ├── resource_tool.py
│       ├── pathfinding_tool.py
│       └── trade_tool.py
├── monitoring/                  # NEW: Monitoring & debugging
│   ├── __init__.py
│   ├── dashboard_server.py      # Web dashboard backend
│   ├── event_logger.py          # Event capture and broadcast
│   ├── prompt_tracker.py        # LLM interaction tracking
│   ├── local_logger.py          # Local file logging
│   ├── session_recorder.py      # Game session recording
│   ├── report_generator.py      # Report generation
│   └── web/                     # Dashboard frontend
│       ├── index.html
│       ├── dashboard.js
│       └── dashboard.css
├── management/
│   ├── actions.py               # Existing
│   ├── game_manager.py          # Existing
│   ├── log_events.py            # Existing
│   ├── chat_manager.py          # NEW: Chat management
│   └── message.py               # NEW: Message data structure
├── players/
│   ├── ai_agent.py              # UPDATE: Full AI agent implementation
│   ├── human_user.py            # Existing
│   └── user.py                  # Existing
└── ...                          # Existing structure
logs/                            # NEW: Local documentation
└── game_sessions/
    └── YYYY-MM-DD_HH-MM-SS/
        ├── game_summary.json
        ├── chat_log.txt
        └── agent_<color>/
            ├── config.json
            ├── prompts.log
            ├── decisions.log
            └── memory_snapshots.json
examples/
├── ai_testing/
│   ├── config_example.yaml      # NEW: Example configuration
│   ├── test_single_agent.py     # NEW: Test script
│   └── test_multi_agent.py      # NEW: Multi-agent test
└── ...
tests/
├── unit/
│   ├── test_ai_agent.py         # NEW
│   ├── test_prompt_manager.py   # NEW
│   ├── test_memory.py           # NEW
│   └── test_tools.py            # NEW
├── integration/
│   ├── test_ai_gameplay.py      # NEW
│   └── test_multi_agent.py      # NEW
└── ...
```