🗺️ AI Agent Development Work Plan

Date: January 3, 2026
Status: ✅ Phase 1 - Foundation & Infrastructure (100% Complete)
Current Task: Phase 3 - Core AI Agent (3.1) - NEXT

🎯 Project Goal

Build a fully functional LLM-based AI agent that can play Settlers of Catan autonomously, making intelligent strategic decisions and interacting naturally with other players.

📊 Development Phases

Note: Phase 4 (Monitoring & Debugging) should be developed early and in parallel with Phase 3.
The web dashboard and logging are critical for observing agent behavior during development!

Phase 1: Foundation & Infrastructure 🏗️

Goal: Build the core infrastructure needed to support AI agents

1.1 Configuration Management ✅ COMPLETED

Create centralized configuration system
- LLM settings (model, temperature, max_tokens, etc.)
- API credentials management
- Agent parameters (custom instructions only)
- Performance settings (timeouts, retries, caching)
Create config file format (YAML)
Build configuration loader and validator
Add environment variable support for sensitive data

Files created:

✅ pycatan/ai/config.py - Configuration management
✅ pycatan/ai/config_example.yaml - Example configuration file
✅ pycatan/ai/config_dev.yaml - Default dev configuration
✅ .env.example - Environment variables template
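
The loader and validator steps above can be sketched roughly as follows. This is an illustration only, not the contents of pycatan/ai/config.py: the `LLMSettings` fields, the defaults, the temperature range, and the `LLM_API_KEY` variable name are all assumptions, and the parsed YAML is represented by a plain dict so the sketch needs no third-party packages.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical LLM settings; field names and defaults are illustrative."""
    model: str
    temperature: float = 0.7
    max_tokens: int = 1024
    api_key: Optional[str] = None


def load_llm_settings(raw: Mapping[str, Any],
                      env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Validate an already-parsed config mapping and read the key from `env`."""
    if "model" not in raw:
        raise ValueError("config is missing required key 'model'")
    temperature = float(raw.get("temperature", 0.7))
    if not 0.0 <= temperature <= 2.0:  # assumed valid range
        raise ValueError(f"temperature out of range: {temperature}")
    max_tokens = int(raw.get("max_tokens", 1024))
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # Secrets come from the environment, not from the config file.
    return LLMSettings(
        model=str(raw["model"]),
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=env.get("LLM_API_KEY"),  # assumed variable name
    )
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.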

1.2 Prompt Management Layer ✅ COMPLETED

Design prompt processing pipeline
Implement game state filtering
- Hide opponents' private information
- Filter development cards
- Remove non-visible game elements
Build perspective transformation
- Convert game state to agent's viewpoint
- Format resources and points
- Present relative positioning
Create prompt template system
- Meta data section
- Task context section
- Game state section
- Social context section
- Memory section
- Constraints section
Build custom instruction injection per agent

Files created:

✅ pycatan/ai/prompt_manager.py - Main prompt processing
✅ pycatan/ai/state_filter.py - Game state filtering logic
✅ pycatan/ai/prompt_templates.py - Template definitions

1.2.5 Game State Optimization ✅ COMPLETED

Goal: Optimize the game state capture and representation for better LLM consumption

Review current game state structure from play_and_capture.py
Design improved game state format
- Compress player information structure
- Improve board representation (lookup tables H & N)
- Add resource/harbor code mappings
- Reduce redundancy and token usage (removed pixel_coords, board_graph)
- Add status flags (Longest Road, Largest Army)
Create optimized state format with legend
Update game state capture to save both formats (.json + .txt)
Fix timing: capture state at turn START (not just after actions)
Test with real game scenarios

Files modified:

✅ examples/ai_testing/play_and_capture.py - Optimized state capture
✅ pycatan/management/game_manager.py - Added state capture at turn start

Key achievements:

🎯 State representation optimized by ~60% (removed redundant fields)
📊 Compressed format with lookup tables (H=hexes, N=nodes)
🔄 Real-time state updates written to current_state_optimized.txt
📝 Clear legend/documentation included in output
✅ Captures state at decision point (turn start)

1.3 Response Parser ✅ COMPLETED

Define structured response format (JSON schema)
Build response parser and validator
Implement error handling for malformed responses
Create fallback mechanisms for parsing failures
Add response logging for debugging

Files created:

✅ pycatan/ai/response_parser.py - Parse and validate LLM responses
✅ pycatan/ai/schemas.py - JSON schemas for requests/responses

Key features:

🎯 Dual schema support: Active turn (with action) & Observing (no action)
🛡️ Error handling: Invalid JSON, missing fields, type validation
🔧 Fallback mechanisms: JSON repair, structure repair, default values
📊 Parse statistics tracking
🔍 Flexible parsing: Handles markdown code blocks, extra text
✅ Action parameter validation against expected schemas
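
The flexible parsing and fallback features listed above might look something like the sketch below. It is not the code in pycatan/ai/response_parser.py; the field names `reasoning` and `action`, the `parse_ok` flag, and both function names are assumptions chosen for illustration.

```python
import json
import re
from typing import Any, Dict, Optional

_TICKS = "`" * 3  # markdown fence marker, built here to keep this example readable
_FENCED = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def extract_json(text: str) -> Optional[Dict[str, Any]]:
    """Best-effort extraction of one JSON object from an LLM reply."""
    candidates = _FENCED.findall(text)           # fenced blocks first
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])   # then the outermost braces
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def parse_response(text: str, active_turn: bool) -> Dict[str, Any]:
    """Parse a reply; return a default result when it is unusable."""
    fallback = {"reasoning": "", "action": None, "parse_ok": False}
    data = extract_json(text)
    if data is None:
        return fallback
    if active_turn and not isinstance(data.get("action"), dict):
        return fallback  # an active turn must carry an action object
    return {"reasoning": str(data.get("reasoning", "")),
            "action": data.get("action"),
            "parse_ok": True}
```

The `active_turn` flag mirrors the dual-schema idea: an observing agent may reply without an action, while an agent on its turn may not.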

Phase 2: Memory System 🧠

Goal: Enable agents to maintain context and learn across turns

2.1 Memory Structure

Design memory data model
- Short-term observations (last N turns)
- Strategic notes (persistent)
- Social tracking (player relationships)
- Game insights (patterns observed)
Implement memory storage (in-memory for now)
Build memory retrieval and formatting

Files to create:

pycatan/ai/memory.py - Memory management system
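
One possible shape for the memory data model is sketched below, with one attribute per category listed above. Since pycatan/ai/memory.py has not been written yet, the class name, attribute names, and prompt format are placeholders.

```python
from collections import deque
from typing import Deque, Dict, List


class AgentMemory:
    """Hypothetical in-memory store with the four categories from the plan."""

    def __init__(self, max_observations: int = 5) -> None:
        self.observations: Deque[str] = deque(maxlen=max_observations)
        self.strategic_notes: List[str] = []
        self.social: Dict[str, str] = {}   # player name -> relationship note
        self.insights: List[str] = []

    def observe(self, turn: int, text: str) -> None:
        # deque(maxlen=N) drops the oldest entry once it is full,
        # which gives the "last N turns" behaviour for free.
        self.observations.append(f"T{turn}: {text}")

    def format_for_prompt(self) -> str:
        """Render the memory as a plain-text block for the Memory section."""
        lines = ["Recent observations:"]
        lines += [f"- {o}" for o in self.observations]
        lines.append("Strategic notes:")
        lines += [f"- {n}" for n in self.strategic_notes]
        lines.append("Players:")
        lines += [f"- {p}: {note}" for p, note in sorted(self.social.items())]
        lines.append("Insights:")
        lines += [f"- {i}" for i in self.insights]
        return "\n".join(lines)
```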

2.2 Memory Operations

Add note creation and updates
Implement memory pruning (keep relevant, remove old)
Build memory summarization for context limits
Create memory persistence (save/load between games)

2.3 Chat History Summarization ⚡

Implement automatic chat summarization
- Configure separate smaller LLM for summarization (cost-effective)
- Monitor chat history length (e.g., a threshold of 10 messages)
- Trigger summarization when threshold reached
- Create summarization prompt template
Build chat memory management
- Keep only most recent message after summarization
- Store summary in agent's memory
- Maintain summary history for context
Add configuration for summarization settings
- Summarization model selection
- Message threshold for triggering
- Summary format and length

Files to update:

pycatan/ai/memory.py - Add chat summarization logic
pycatan/ai/config.py - Add summarization configuration
pycatan/ai/llm_client.py - Support multiple models (main + summarization)
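
The trigger-and-trim behaviour described above can be sketched as below. The summarizer is passed in as a plain callable standing in for the smaller summarization model, and the `ChatMemory` name and its attributes are hypothetical.

```python
from typing import Callable, List

# Stand-in for a call to the (smaller) summarization model.
Summarizer = Callable[[List[str]], str]


class ChatMemory:
    """Summarize the chat once it reaches `threshold` messages."""

    def __init__(self, summarize: Summarizer, threshold: int = 10) -> None:
        self.summarize = summarize
        self.threshold = threshold
        self.messages: List[str] = []
        self.summaries: List[str] = []   # summary history kept for context

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) >= self.threshold:
            self.summaries.append(self.summarize(self.messages))
            # Keep only the most recent message after summarizing.
            self.messages = self.messages[-1:]
```

Injecting the summarizer keeps this logic independent of any particular provider and lets tests use a trivial fake instead of a real model call.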

Phase 3: Core AI Agent 🤖

Goal: Implement the main AI agent class

3.1 Base Agent Implementation

Create AIAgent class inheriting from User
Implement required User interface methods
- get_choice() for decision-making
- Other interaction methods as needed
Integrate with prompt manager
Integrate with memory system
Add agent state management

Files to create:

pycatan/players/ai_agent.py - Main AI agent implementation (update existing stub)

3.2 LLM Integration

Create LLM client abstraction
- Support for OpenAI API
- Support for Anthropic Claude
- Support for other providers (Azure, etc.)
Implement API call handling
- Request formatting
- Response parsing
- Error handling and retries
- Rate limiting
Add logging for all LLM interactions
Implement cost tracking

Files to create:

pycatan/ai/llm_client.py - LLM API abstraction
pycatan/ai/providers/ - Provider-specific implementations
- openai_provider.py
- anthropic_provider.py
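
A minimal sketch of the client abstraction with retries follows. The `LLMProvider` protocol, the `complete` method name, and the backoff parameters are assumptions; real providers would wrap the OpenAI or Anthropic SDKs and catch their specific error types rather than a bare `Exception`.

```python
import time
from typing import Callable, Optional, Protocol


class LLMProvider(Protocol):
    """Hypothetical provider interface; one method is enough for the sketch."""
    def complete(self, prompt: str) -> str: ...


class LLMClient:
    """Wrap a provider with retries and exponential backoff."""

    def __init__(self, provider: LLMProvider, max_retries: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.provider = provider
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests do not actually wait

    def complete(self, prompt: str) -> str:
        last_error: Optional[Exception] = None
        for attempt in range(self.max_retries + 1):
            try:
                return self.provider.complete(prompt)
            except Exception as exc:  # sketch only: narrow this in real code
                last_error = exc
                if attempt < self.max_retries:
                    # Delays grow as base, 2*base, 4*base, ...
                    self.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError(
            f"LLM call failed after {self.max_retries + 1} attempts"
        ) from last_error
```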

3.3 Decision Pipeline

Build event-to-prompt conversion
Implement action extraction from responses
Create action validation before execution
Add decision logging and debugging
Implement decision timeout handling
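
The "action validation before execution" step could start as simply as the check below. The action layout (`type` plus `parameters`) and the `allowed` mapping are assumptions made for this sketch; the real schema lives in pycatan/ai/schemas.py and may differ.

```python
from typing import Any, Dict, List, Optional


def validate_action(action: Optional[Dict[str, Any]],
                    allowed: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means the action may run.

    `allowed` maps each currently legal action type to the parameter
    names it requires (hypothetical structure).
    """
    if not isinstance(action, dict):
        return ["action is missing or not an object"]
    kind = action.get("type")
    if not isinstance(kind, str) or kind not in allowed:
        return [f"action type {kind!r} is not allowed now"]
    params = action.get("parameters", {})
    if not isinstance(params, dict):
        return ["parameters must be an object"]
    return [f"missing parameter {name!r}"
            for name in allowed[kind] if name not in params]
```

Returning the list of problems, rather than a bare boolean, gives the decision log something concrete to record and could be fed back to the model on a retry.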

Phase 4: Monitoring & Debugging Infrastructure 🔍

Goal: Build essential tools for observing and debugging agent behavior

⚠️ CRITICAL: These tools are essential for development and must be built early!

4.1 Web Dashboard for Real-Time Monitoring 🌐

Priority: HIGH - Required before agent testing

Design web dashboard UI
- Multi-agent view (tabs or split screen per agent)
- Live prompt display with syntax highlighting
- Agent reasoning/thinking display
- Action selection visualization
- Chat window with all messages
- Game state summary panel
Build backend API for dashboard
- WebSocket connection for live updates
- Endpoints for prompt/response history
- Agent state endpoints
- Chat history endpoint
Implement prompt logging and streaming
- Capture all prompts sent to LLM
- Capture all responses from LLM
- Stream to dashboard in real-time
- Format for readability
Build agent reasoning viewer
- Display internal_thinking/reasoning
- Show action selection process
- Highlight tool usage
- Show memory updates

Files to create:

pycatan/monitoring/ - NEW monitoring package
- dashboard_server.py - Flask/FastAPI server for dashboard
- event_logger.py - Captures and broadcasts events
- prompt_tracker.py - Tracks all LLM interactions
pycatan/monitoring/web/ - Dashboard frontend
- index.html - Main dashboard page
- dashboard.js - Dashboard functionality
- dashboard.css - Dashboard styling

4.2 Local Documentation & Logging 📁

Priority: HIGH - Required for debugging

Design local documentation structure
- One folder per game session
- One file per agent with structured log
- Timestamp-based organization
Implement per-agent documentation
- Agent configuration snapshot
- All prompts sent (formatted)
- All responses received (formatted)
- Decision timeline with reasoning
- Memory state snapshots
- Tool usage log
- Errors and warnings
Build structured logging format
- JSON-based for easy parsing
- Markdown reports for human reading
- Searchable and filterable
Add game session documentation
- Game state at each turn
- All chat messages with timestamps
- Final game results and statistics

Files to create:

pycatan/monitoring/local_logger.py - Local file logging
pycatan/monitoring/session_recorder.py - Game session recording
pycatan/monitoring/report_generator.py - Generate readable reports

Output structure:

logs/
└── game_sessions/
    └── 2026-01-03_15-30-45/
        ├── game_summary.json
        ├── chat_log.txt
        ├── agent_blue/
        │   ├── config.json
        │   ├── prompts.log
        │   ├── decisions.log
        │   └── memory_snapshots.json
        ├── agent_red/
        │   └── ...
        └── agent_white/
            └── ...
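
A small sketch of how the layout above could be created and written to is shown below. The function names are hypothetical, and storing decisions.log as one JSON object per line is an assumption that follows from the "JSON-based for easy parsing" requirement.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Iterable


def create_session_dir(root: Path, agents: Iterable[str], now: datetime) -> Path:
    """Create <root>/game_sessions/<timestamp>/agent_<color>/ folders."""
    session = root / "game_sessions" / now.strftime("%Y-%m-%d_%H-%M-%S")
    session.mkdir(parents=True, exist_ok=True)
    for color in agents:
        (session / f"agent_{color}").mkdir(exist_ok=True)
    return session


def log_decision(session: Path, color: str, record: Dict[str, Any]) -> None:
    """Append one JSON object per line to the agent's decisions.log."""
    path = session / f"agent_{color}" / "decisions.log"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

For example, `create_session_dir(Path("logs"), ["blue", "red", "white"], datetime.now())` would create a timestamped session folder with one subfolder per agent.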

4.3 Chat Management System 💬

Priority: HIGH - Core game feature

Design chat system architecture
- Centralized chat manager
- Message routing between players
- Chat history per game
- Public vs private messages
Implement chat manager component
- Message queue/buffer
- Broadcast to all players
- Direct messages between players
- Integration with GameManager
Build chat observation interface
- Real-time chat display in web dashboard
- Chat log export
- Filter by sender/time
Define chat protocol
- Message format (sender, content, timestamp, type)
- Chat commands (if any)
- Trade negotiation messages

Files to create:

pycatan/management/chat_manager.py - Central chat management
pycatan/management/message.py - Message data structure

Integration points:

GameManager receives messages from players
ChatManager distributes to other players and dashboard
AI agents see messages in their prompt context
Web dashboard shows live chat
Local logs record all messages
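
The message format and routing rules above might be sketched as follows. The `Message` fields follow the protocol listed in the plan (sender, content, timestamp, type); the `recipient` field, the inbox structure, and the method names are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    content: str
    timestamp: float
    type: str = "chat"               # e.g. "chat" or "trade"
    recipient: Optional[str] = None  # None means a public broadcast


class ChatManager:
    """Keep one game-wide history and one inbox per player."""

    def __init__(self, players: List[str]) -> None:
        self.history: List[Message] = []
        self.inboxes: Dict[str, List[Message]] = {p: [] for p in players}

    def send(self, message: Message) -> None:
        self.history.append(message)
        if message.recipient is None:
            targets = [p for p in self.inboxes if p != message.sender]
        else:
            targets = [message.recipient]
        for player in targets:
            self.inboxes[player].append(message)

    def visible_to(self, player: str) -> List[Message]:
        """Messages a player may see: public ones plus their own DMs."""
        return [m for m in self.history
                if m.recipient is None or player in (m.sender, m.recipient)]
```

`visible_to` is the view an AI agent's prompt context would draw from, while the dashboard and local logs could read the full `history`.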

Phase 5: Tool System 🔧

Goal: Provide computational tools for agent decision-making

5.1 Core Tools

Probability Calculator
- Dice roll probabilities for tiles
- Expected resource generation rates
- Statistical analysis helpers
Resource Tracker
- Historical resource generation
- Resource scarcity analysis
- Production trend analysis
Path Finder
- Optimal road placement
- Longest road calculation
- Connectivity analysis
Trade Evaluator
- Fair trade assessment
- Trade benefit calculation
- Market value estimation

Files to create:

pycatan/ai/tools/ - Tool implementations
- probability_tool.py
- resource_tool.py
- pathfinding_tool.py
- trade_tool.py
- tool_manager.py - Tool orchestration
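
For the Probability Calculator, the core arithmetic is small: with two six-sided dice, a total of n can be rolled in 6 - |7 - n| ways out of 36. The sketch below uses that fact; the function names are placeholders, and the expected-yield helper ignores the robber and assumes one card per settlement (pass `cards_per_hit=2` for a city).

```python
from fractions import Fraction
from typing import Iterable


def roll_probability(number: int) -> Fraction:
    """Probability that two six-sided dice sum to `number`."""
    if not 2 <= number <= 12:
        return Fraction(0)
    return Fraction(6 - abs(7 - number), 36)


def expected_cards_per_roll(adjacent_numbers: Iterable[int],
                            cards_per_hit: int = 1) -> Fraction:
    """Expected cards per dice roll for a building touching these number tokens."""
    return sum((roll_probability(n) * cards_per_hit for n in adjacent_numbers),
               Fraction(0))
```

Using `Fraction` keeps results exact, so an agent can compare two candidate spots without floating-point noise.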

5.2 Tool Integration

Define tool interface/protocol
Implement tool calling from prompts
Add tool usage limits per decision
Create tool result formatting
Build tool usage logging
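
The integration tasks above (a tool interface, per-decision usage limits, result formatting, and logging) could be combined in one small manager, sketched below. All names here are hypothetical, and the default limit of three calls is an arbitrary placeholder.

```python
from typing import Any, Callable, Dict, List


class ToolLimitExceeded(RuntimeError):
    """Raised when an agent asks for more tool calls than allowed."""


class ToolManager:
    def __init__(self, max_calls_per_decision: int = 3) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self.max_calls = max_calls_per_decision
        self.calls_used = 0
        self.log: List[Dict[str, Any]] = []  # usage log across decisions

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._tools[name] = func

    def start_decision(self) -> None:
        """Reset the per-decision budget; call once per agent decision."""
        self.calls_used = 0

    def call(self, name: str, **kwargs: Any) -> str:
        if self.calls_used >= self.max_calls:
            raise ToolLimitExceeded(f"limit of {self.max_calls} calls reached")
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        self.calls_used += 1
        result = self._tools[name](**kwargs)
        self.log.append({"tool": name, "args": kwargs, "result": result})
        return f"[{name}] {result}"  # formatted for inclusion in a prompt
```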

Phase 6: Testing & Validation ✅

Goal: Ensure agent works correctly and plays reasonably

6.1 Unit Tests

Test prompt manager filtering
Test response parser with various inputs
Test memory operations
Test each tool independently
Test configuration loading

Files to create:

tests/unit/test_ai_agent.py
tests/unit/test_prompt_manager.py
tests/unit/test_memory.py
tests/unit/test_tools.py

6.2 Integration Tests

Test agent in complete game loop
Test agent vs human player
Test multiple AI agents playing together
Test edge cases and error scenarios
Test long-running games (memory management)

Files to create:

tests/integration/test_ai_gameplay.py
tests/integration/test_multi_agent.py

6.3 Gameplay Validation

Verify legal moves only
Check strategic decision quality
Evaluate social interaction naturalness
Monitor LLM costs and performance
Collect agent behavior metrics

Phase 7: Optimization & Enhancement 🚀

Goal: Improve agent performance and capabilities

7.1 Performance Optimization

Reduce prompt token usage
Implement response caching for similar situations
Optimize tool execution
Improve decision speed

7.2 Strategy Enhancement

Tune agent personalities
Improve opening game strategy
Enhance mid-game adaptation
Refine end-game tactics
Better negotiation and trading

7.3 Advanced Features

Multi-turn planning capability
Opponent modeling
Meta-strategy learning
Tournament play support
Statistical performance tracking

📁 Project Structure (Proposed)

pycatan/
├── ai/                          # NEW: AI agent infrastructure
│   ├── __init__.py
│   ├── config.py                # Configuration management
│   ├── prompt_manager.py        # Prompt processing pipeline
│   ├── state_filter.py          # Game state filtering
│   ├── prompt_templates.py      # Prompt templates
│   ├── response_parser.py       # Response parsing
│   ├── schemas.py               # JSON schemas
│   ├── memory.py                # Memory system + chat summarization
│   ├── llm_client.py            # LLM abstraction (multi-model)
│   ├── providers/               # LLM provider implementations
│   │   ├── __init__.py
│   │   ├── openai_provider.py
│   │   └── anthropic_provider.py
│   └── tools/                   # Agent tools
│       ├── __init__.py
│       ├── tool_manager.py
│       ├── probability_tool.py
│       ├── resource_tool.py
│       ├── pathfinding_tool.py
│       └── trade_tool.py
├── monitoring/                  # NEW: Monitoring & debugging
│   ├── __init__.py
│   ├── dashboard_server.py      # Web dashboard backend
│   ├── event_logger.py          # Event capture and broadcast
│   ├── prompt_tracker.py        # LLM interaction tracking
│   ├── local_logger.py          # Local file logging
│   ├── session_recorder.py      # Game session recording
│   ├── report_generator.py      # Report generation
│   └── web/                     # Dashboard frontend
│       ├── index.html
│       ├── dashboard.js
│       └── dashboard.css
├── management/
│   ├── actions.py               # Existing
│   ├── game_manager.py          # Existing
│   ├── log_events.py            # Existing
│   ├── chat_manager.py          # NEW: Chat management
│   └── message.py               # NEW: Message data structure
├── players/
│   ├── ai_agent.py              # UPDATE: Full AI agent implementation
│   ├── human_user.py            # Existing
│   └── user.py                  # Existing
└── ...                          # Existing structure

logs/                            # NEW: Local documentation
└── game_sessions/
    └── YYYY-MM-DD_HH-MM-SS/
        ├── game_summary.json
        ├── chat_log.txt
        └── agent_<color>/
            ├── config.json
            ├── prompts.log
            ├── decisions.log
            └── memory_snapshots.json

examples/
├── ai_testing/
│   ├── config_example.yaml      # NEW: Example configuration
│   ├── test_single_agent.py     # NEW: Test script
│   └── test_multi_agent.py      # NEW: Multi-agent test
└── ...

tests/
├── unit/
│   ├── test_ai_agent.py         # NEW
│   ├── test_prompt_manager.py   # NEW
│   ├── test_memory.py           # NEW
│   └── test_tools.py            # NEW
├── integration/
│   ├── test_ai_gameplay.py      # NEW
│   └── test_multi_agent.py      # NEW
└── ...