Spaces:

shon98
/

PyCatan-AI

Configuration error

App Files Files Community

PyCatan-AI / .github /instructions /WORK_PLAN.md

EZTIME2025

update response parser

21fb2c3 5 months ago

preview code

raw

history blame contribute delete

17.8 kB

	# 🗺️ AI Agent Development Work Plan

	Date: January 3, 2026
	Status: ✅ Phase 1 - Foundation & Infrastructure (100% Complete)
	Current Task: Phase 3 - Core AI Agent (3.1) - NEXT

	## 🎯 Project Goal

	Build a fully functional LLM-based AI agent that can play Settlers of Catan autonomously, making intelligent strategic decisions and interacting naturally with other players.

	---

	## 📊 Development Phases

	> Note: Phase 4 (Monitoring & Debugging) should be developed early and in parallel with Phase 3.
	> The web dashboard and logging are critical for observing agent behavior during development!

	### Phase 1: Foundation & Infrastructure 🏗️
	Goal: Build the core infrastructure needed to support AI agents

	#### 1.1 Configuration Management ✅ COMPLETED
	- [x] Create centralized configuration system
	- [x] LLM settings (model, temperature, max_tokens, etc.)
	- [x] API credentials management
	- [x] Agent parameters (custom instructions only)
	- [x] Performance settings (timeouts, retries, caching)
	- [x] Create config file format (YAML)
	- [x] Build configuration loader and validator
	- [x] Add environment variable support for sensitive data

	Files created:
	- ✅ `pycatan/ai/config.py` - Configuration management
	- ✅ `pycatan/ai/config_example.yaml` - Example configuration file
	- ✅ `pycatan/ai/config_dev.yaml` - Default dev configuration
	- ✅ `.env.example` - Environment variables template

	---

	#### 1.2 Prompt Management Layer ✅ COMPLETED
	- [x] Design prompt processing pipeline
	- [x] Implement game state filtering
	- [x] Hide opponent's private information
	- [x] Filter development cards
	- [x] Remove non-visible game elements
	- [x] Build perspective transformation
	- [x] Convert game state to agent's viewpoint
	- [x] Format resources and points
	- [x] Present relative positioning
	- [x] Create prompt template system
	- [x] Meta data section
	- [x] Task context section
	- [x] Game state section
	- [x] Social context section
	- [x] Memory section
	- [x] Constraints section
	- [x] Build custom instruction injection per agent

	Files created:
	- ✅ `pycatan/ai/prompt_manager.py` - Main prompt processing
	- ✅ `pycatan/ai/state_filter.py` - Game state filtering logic
	- ✅ `pycatan/ai/prompt_templates.py` - Template definitions

	---

	#### 1.2.5 Game State Optimization ✅ COMPLETED
	Goal: Optimize the game state capture and representation for better LLM consumption

	- [x] Review current game state structure from `play_and_capture.py`
	- [x] Design improved game state format
	- [x] Compress player information structure
	- [x] Improve board representation (lookup tables H & N)
	- [x] Add resource/harbor code mappings
	- [x] Reduce redundancy and token usage (removed pixel_coords, board_graph)
	- [x] Add status flags (Longest Road, Largest Army)
	- [x] Create optimized state format with legend
	- [x] Update game state capture to save both formats (.json + .txt)
	- [x] Fix timing: capture state at turn START (not just after actions)
	- [x] Test with real game scenarios

	Files modified:
	- ✅ `examples/ai_testing/play_and_capture.py` - Optimized state capture
	- ✅ `pycatan/management/game_manager.py` - Added state capture at turn start

	Key achievements:
	- 🎯 State representation optimized by ~60% (removed redundant fields)
	- 📊 Compressed format with lookup tables (H=hexes, N=nodes)
	- 🔄 Real-time state updates at `current_state_optimized.txt`
	- 📝 Clear legend/documentation included in output
	- ✅ Captures state at decision point (turn start)

	---

	#### 1.3 Response Parser ✅ COMPLETED
	- [x] Define structured response format (JSON schema)
	- [x] Build response parser and validator
	- [x] Implement error handling for malformed responses
	- [x] Create fallback mechanisms for parsing failures
	- [x] Add response logging for debugging

	Files created:
	- ✅ `pycatan/ai/response_parser.py` - Parse and validate LLM responses
	- ✅ `pycatan/ai/schemas.py` - JSON schemas for requests/responses

	Key features:
	- 🎯 Dual schema support: Active turn (with action) & Observing (no action)
	- 🛡️ Error handling: Invalid JSON, missing fields, type validation
	- 🔧 Fallback mechanisms: JSON repair, structure repair, default values
	- 📊 Parse statistics tracking
	- 🔍 Flexible parsing: Handles markdown code blocks, extra text
	- ✅ Action parameter validation against expected schemas

	---

	### Phase 2: Memory System 🧠
	Goal: Enable agents to maintain context and learning across turns

	#### 2.1 Memory Structure
	- [ ] Design memory data model
	- [ ] Short-term observations (last N turns)
	- [ ] Strategic notes (persistent)
	- [ ] Social tracking (player relationships)
	- [ ] Game insights (patterns observed)
	- [ ] Implement memory storage (in-memory for now)
	- [ ] Build memory retrieval and formatting

	Files to create:
	- `pycatan/ai/memory.py` - Memory management system

	---

	#### 2.2 Memory Operations
	- [ ] Add note creation and updates
	- [ ] Implement memory pruning (keep relevant, remove old)
	- [ ] Build memory summarization for context limits
	- [ ] Create memory persistence (save/load between games)

	---

	#### 2.3 Chat History Summarization ⚡
	- [ ] Implement automatic chat summarization
	- [ ] Configure separate smaller LLM for summarization (cost-effective)
	- [ ] Monitor chat history length (e.g., last 10 messages)
	- [ ] Trigger summarization when threshold reached
	- [ ] Create summarization prompt template
	- [ ] Build chat memory management
	- [ ] Keep only most recent message after summarization
	- [ ] Store summary in agent's memory
	- [ ] Maintain summary history for context
	- [ ] Add configuration for summarization settings
	- [ ] Summarization model selection
	- [ ] Message threshold for triggering
	- [ ] Summary format and length

	Files to update:
	- `pycatan/ai/memory.py` - Add chat summarization logic
	- `pycatan/ai/config.py` - Add summarization configuration
	- `pycatan/ai/llm_client.py` - Support multiple models (main + summarization)

	---

	### Phase 3: Core AI Agent 🤖
	Goal: Implement the main AI agent class

	#### 3.1 Base Agent Implementation
	- [ ] Create `AIAgent` class inheriting from `User`
	- [ ] Implement required User interface methods
	- [ ] `get_choice()` for decision-making
	- [ ] Other interaction methods as needed
	- [ ] Integrate with prompt manager
	- [ ] Integrate with memory system
	- [ ] Add agent state management

	Files to create:
	- `pycatan/players/ai_agent.py` - Main AI agent implementation (update existing stub)

	---

	#### 3.2 LLM Integration
	- [ ] Create LLM client abstraction
	- [ ] Support for OpenAI API
	- [ ] Support for Anthropic Claude
	- [ ] Support for other providers (Azure, etc.)
	- [ ] Implement API call handling
	- [ ] Request formatting
	- [ ] Response parsing
	- [ ] Error handling and retries
	- [ ] Rate limiting
	- [ ] Add logging for all LLM interactions
	- [ ] Implement cost tracking

	Files to create:
	- `pycatan/ai/llm_client.py` - LLM API abstraction
	- `pycatan/ai/providers/` - Provider-specific implementations
	- `openai_provider.py`
	- `anthropic_provider.py`

	---

	#### 3.3 Decision Pipeline
	- [ ] Build event-to-prompt conversion
	- [ ] Implement action extraction from responses
	- [ ] Create action validation before execution
	- [ ] Add decision logging and debugging
	- [ ] Implement decision timeout handling

	---

	### Phase 4: Monitoring & Debugging Infrastructure 🔍
	Goal: Build essential tools for observing and debugging agent behavior

	⚠️ CRITICAL: These tools are essential for development and must be built early!

	---

	#### 4.1 Web Dashboard for Real-Time Monitoring 🌐
	Priority: HIGH - Required before agent testing

	- [ ] Design web dashboard UI
	- [ ] Multi-agent view (tabs or split screen per agent)
	- [ ] Live prompt display with syntax highlighting
	- [ ] Agent reasoning/thinking display
	- [ ] Action selection visualization
	- [ ] Chat window with all messages
	- [ ] Game state summary panel
	- [ ] Build backend API for dashboard
	- [ ] WebSocket connection for live updates
	- [ ] Endpoints for prompt/response history
	- [ ] Agent state endpoints
	- [ ] Chat history endpoint
	- [ ] Implement prompt logging and streaming
	- [ ] Capture all prompts sent to LLM
	- [ ] Capture all responses from LLM
	- [ ] Stream to dashboard in real-time
	- [ ] Format for readability
	- [ ] Build agent reasoning viewer
	- [ ] Display internal_thinking/reasoning
	- [ ] Show action selection process
	- [ ] Highlight tool usage
	- [ ] Show memory updates

	Files to create:
	- `pycatan/monitoring/` - NEW monitoring package
	- `dashboard_server.py` - Flask/FastAPI server for dashboard
	- `event_logger.py` - Captures and broadcasts events
	- `prompt_tracker.py` - Tracks all LLM interactions
	- `pycatan/monitoring/web/` - Dashboard frontend
	- `index.html` - Main dashboard page
	- `dashboard.js` - Dashboard functionality
	- `dashboard.css` - Dashboard styling

	---

	#### 4.2 Local Documentation & Logging 📁
	Priority: HIGH - Required for debugging

	- [ ] Design local documentation structure
	- [ ] One folder per game session
	- [ ] One file per agent with structured log
	- [ ] Timestamp-based organization
	- [ ] Implement per-agent documentation
	- [ ] Agent configuration snapshot
	- [ ] All prompts sent (formatted)
	- [ ] All responses received (formatted)
	- [ ] Decision timeline with reasoning
	- [ ] Memory state snapshots
	- [ ] Tool usage log
	- [ ] Errors and warnings
	- [ ] Build structured logging format
	- [ ] JSON-based for easy parsing
	- [ ] Markdown reports for human reading
	- [ ] Searchable and filterable
	- [ ] Add game session documentation
	- [ ] Game state at each turn
	- [ ] All chat messages with timestamps
	- [ ] Final game results and statistics

	Files to create:
	- `pycatan/monitoring/local_logger.py` - Local file logging
	- `pycatan/monitoring/session_recorder.py` - Game session recording
	- `pycatan/monitoring/report_generator.py` - Generate readable reports

	Output structure:
	```
	logs/
	└── game_sessions/
	└── 2026-01-03_15-30-45/
	├── game_summary.json
	├── chat_log.txt
	├── agent_blue/
	│ ├── config.json
	│ ├── prompts.log
	│ ├── decisions.log
	│ └── memory_snapshots.json
	├── agent_red/
	│ └── ...
	└── agent_white/
	└── ...
	```

	---

	#### 4.3 Chat Management System 💬
	Priority: HIGH - Core game feature

	- [ ] Design chat system architecture
	- [ ] Centralized chat manager
	- [ ] Message routing between players
	- [ ] Chat history per game
	- [ ] Public vs private messages
	- [ ] Implement chat manager component
	- [ ] Message queue/buffer
	- [ ] Broadcast to all players
	- [ ] Direct messages between players
	- [ ] Integration with GameManager
	- [ ] Build chat observation interface
	- [ ] Real-time chat display in web dashboard
	- [ ] Chat log export
	- [ ] Filter by sender/time
	- [ ] Define chat protocol
	- [ ] Message format (sender, content, timestamp, type)
	- [ ] Chat commands (if any)
	- [ ] Trade negotiation messages

	Files to create:
	- `pycatan/management/chat_manager.py` - Central chat management
	- `pycatan/management/message.py` - Message data structure

	Integration points:
	- GameManager receives messages from players
	- ChatManager distributes to other players and dashboard
	- AI agents see messages in their prompt context
	- Web dashboard shows live chat
	- Local logs record all messages

	---

	### Phase 5: Tool System 🔧
	Goal: Provide computational tools for agent decision-making

	#### 5.1 Core Tools
	- [ ] Probability Calculator
	- [ ] Dice roll probabilities for tiles
	- [ ] Expected resource generation rates
	- [ ] Statistical analysis helpers
	- [ ] Resource Tracker
	- [ ] Historical resource generation
	- [ ] Resource scarcity analysis
	- [ ] Production trend analysis
	- [ ] Path Finder
	- [ ] Optimal road placement
	- [ ] Longest road calculation
	- [ ] Connectivity analysis
	- [ ] Trade Evaluator
	- [ ] Fair trade assessment
	- [ ] Trade benefit calculation
	- [ ] Market value estimation

	Files to create:
	- `pycatan/ai/tools/` - Tool implementations
	- `probability_tool.py`
	- `resource_tool.py`
	- `pathfinding_tool.py`
	- `trade_tool.py`
	- `tool_manager.py` - Tool orchestration

	---

	#### 5.2 Tool Integration
	- [ ] Define tool interface/protocol
	- [ ] Implement tool calling from prompts
	- [ ] Add tool usage limits per decision
	- [ ] Create tool result formatting
	- [ ] Build tool usage logging

	---

	### Phase 6: Testing & Validation ✅
	Goal: Ensure agent works correctly and plays reasonably

	#### 6.1 Unit Tests
	- [ ] Test prompt manager filtering
	- [ ] Test response parser with various inputs
	- [ ] Test memory operations
	- [ ] Test each tool independently
	- [ ] Test configuration loading

	Files to create:
	- `tests/unit/test_ai_agent.py`
	- `tests/unit/test_prompt_manager.py`
	- `tests/unit/test_memory.py`
	- `tests/unit/test_tools.py`

	---

	#### 6.2 Integration Tests
	- [ ] Test agent in complete game loop
	- [ ] Test agent vs human player
	- [ ] Test multiple AI agents playing together
	- [ ] Test edge cases and error scenarios
	- [ ] Test long-running games (memory management)

	Files to create:
	- `tests/integration/test_ai_gameplay.py`
	- `tests/integration/test_multi_agent.py`

	---

	#### 6.3 Gameplay Validation
	- [ ] Verify legal moves only
	- [ ] Check strategic decision quality
	- [ ] Evaluate social interaction naturalness
	- [ ] Monitor LLM costs and performance
	- [ ] Collect agent behavior metrics

	---

	### Phase 7: Optimization & Enhancement 🚀
	Goal: Improve agent performance and capabilities

	#### 7.1 Performance Optimization
	- [ ] Reduce prompt token usage
	- [ ] Implement response caching for similar situations
	- [ ] Optimize tool execution
	- [ ] Improve decision speed

	---

	#### 7.2 Strategy Enhancement
	- [ ] Tune agent personalities
	- [ ] Improve opening game strategy
	- [ ] Enhance mid-game adaptation
	- [ ] Refine end-game tactics
	- [ ] Better negotiation and trading

	---

	#### 7.3 Advanced Features
	- [ ] Multi-turn planning capability
	- [ ] Opponent modeling
	- [ ] Meta-strategy learning
	- [ ] Tournament play support
	- [ ] Statistical performance tracking

	---

	## 📁 Project Structure (Proposed)

	```
	pycatan/
	├── ai/ # NEW: AI agent infrastructure
	│ ├── __init__.py
	│ ├── config.py # Configuration management
	│ ├── prompt_manager.py # Prompt processing pipeline
	│ ├── state_filter.py # Game state filtering
	│ ├── prompt_templates.py # Prompt templates
	│ ├── response_parser.py # Response parsing
	│ ├── schemas.py # JSON schemas
	│ ├── memory.py # Memory system + chat summarization
	│ ├── llm_client.py # LLM abstraction (multi-model)
	│ ├── providers/ # LLM provider implementations
	│ │ ├── __init__.py
	│ │ ├── openai_provider.py
	│ │ └── anthropic_provider.py
	│ └── tools/ # Agent tools
	│ ├── __init__.py
	│ ├── tool_manager.py
	│ ├── probability_tool.py
	│ ├── resource_tool.py
	│ ├── pathfinding_tool.py
	│ └── trade_tool.py
	├── monitoring/ # NEW: Monitoring & debugging
	│ ├── __init__.py
	│ ├── dashboard_server.py # Web dashboard backend
	│ ├── event_logger.py # Event capture and broadcast
	│ ├── prompt_tracker.py # LLM interaction tracking
	│ ├── local_logger.py # Local file logging
	│ ├── session_recorder.py # Game session recording
	│ ├── report_generator.py # Report generation
	│ └── web/ # Dashboard frontend
	│ ├── index.html
	│ ├── dashboard.js
	│ └── dashboard.css
	├── management/
	│ ├── actions.py # Existing
	│ ├── game_manager.py # Existing
	│ ├── log_events.py # Existing
	│ ├── chat_manager.py # NEW: Chat management
	│ └── message.py # NEW: Message data structure
	├── players/
	│ ├── ai_agent.py # UPDATE: Full AI agent implementation
	│ ├── human_user.py # Existing
	│ └── user.py # Existing
	└── ... # Existing structure

	logs/ # NEW: Local documentation
	└── game_sessions/
	└── YYYY-MM-DD_HH-MM-SS/
	├── game_summary.json
	├── chat_log.txt
	└── agent_<color>/
	├── config.json
	├── prompts.log
	├── decisions.log
	└── memory_snapshots.json

	examples/
	├── ai_testing/
	│ ├── config_example.yaml # NEW: Example configuration
	│ ├── test_single_agent.py # NEW: Test script
	│ └── test_multi_agent.py # NEW: Multi-agent test
	└── ...

	tests/
	├── unit/
	│ ├── test_ai_agent.py # NEW
	│ ├── test_prompt_manager.py # NEW
	│ ├── test_memory.py # NEW
	│ └── test_tools.py # NEW
	├── integration/
	│ ├── test_ai_gameplay.py # NEW
	│ └── test_multi_agent.py # NEW
	└── ...
	```