Spaces:
Configuration error
Configuration error
| # 🔧 Tool Calling System for AI Agents | |
| ## 📋 Overview | |
| The PyCatan AI system now supports **function calling** (tool use) for LLM agents. This allows the AI to query specific information about the game state instead of trying to interpret raw data, which **prevents hallucinations** and improves decision quality. | |
| ## 🎯 Key Features | |
| ### ✅ Complete Tool System | |
| - **3 powerful tools** for game state analysis | |
| - **Multiple tool calls** in a single turn | |
| - **Automatic execution** and result formatting | |
| - **Full logging** with token tracking | |
| ### ✅ Token Tracking | |
| - Input tokens (tool parameters) | |
| - Output tokens (tool results) | |
| - Separate tracking from LLM tokens | |
| - Cost calculation for tool usage | |
| ### ✅ Detailed Logging | |
| - Every tool call logged with parameters | |
| - Execution time per tool | |
| - Success/failure status | |
| - Results preview in logs | |
| - Separate `tool_executions.json` file | |
| ### ✅ LLM Integration | |
| - Works with Gemini function calling | |
| - Supports multiple iterations | |
| - Automatic tool result formatting | |
| - Seamless conversation flow | |
| --- | |
| ## 🛠️ Available Tools | |
| ### 1. **inspect_node** | |
| Get detailed information about a specific node. | |
| **Use case:** "What resources does node 14 provide?" | |
| **Parameters:** | |
| - `node_id` (int): The node to inspect | |
| **Returns:** | |
| ```json | |
| { | |
| "node_id": 14, | |
| "exists": true, | |
| "resources": {"Wheat": 6, "Wood": 8, "Brick": 5}, | |
| "total_pips": 14, | |
| "port": "3:1", | |
| "neighbors": [10, 11, 18], | |
| "occupied": false, | |
| "can_build_here": true | |
| } | |
| ``` | |
| ### 2. **find_best_nodes** | |
| Search for the best available nodes matching criteria. | |
| **Use case:** "Find the best spots with high ore production" | |
| **Parameters:** | |
| - `min_pips` (int): Minimum pip value (default: 0) | |
| - `must_have_resource` (str): Required resource (optional) | |
| - `exclude_blocked` (bool): Skip unbuildable nodes (default: true) | |
| - `prefer_port` (bool): Prioritize ports (default: false) | |
| - `limit` (int): Max results (default: 10) | |
| **Returns:** | |
| ```json | |
| { | |
| "query": {...}, | |
| "total_found": 15, | |
| "nodes": [ | |
| { | |
| "node_id": 18, | |
| "resources": {"Ore": 10, "Wheat": 6}, | |
| "total_pips": 13, | |
| "port": null, | |
| "score": 15.0 | |
| }, | |
| ... | |
| ] | |
| } | |
| ``` | |
| ### 3. **analyze_path_potential** | |
| Analyze where roads lead and what opportunities exist ahead. | |
| **Use case:** "If I build a road from node 10, what can I reach?" | |
| **Parameters:** | |
| - `from_node` (int): Starting node | |
| - `direction_node` (int): Specific direction (optional) | |
| - `max_depth` (int): How far to look (1 or 2, default: 2) | |
| **Returns:** | |
| ```json | |
| { | |
| "from_node": 10, | |
| "total_directions": 3, | |
| "paths": [ | |
| { | |
| "direction": 14, | |
| "depth_1": { | |
| "node_id": 14, | |
| "total_pips": 12, | |
| "port": "3:1" | |
| }, | |
| "depth_2": { | |
| "best_node": 18, | |
| "best_pips": 13 | |
| }, | |
| "highlights": ["Port (3:1) at depth 1"], | |
| "score": 14.5 | |
| }, | |
| ... | |
| ] | |
| } | |
| ``` | |
| --- | |
| ## 🔄 How It Works | |
| ### Architecture Flow | |
| ``` | |
| ┌─────────────┐ | |
| │ AI Manager │ | |
| └──────┬──────┘ | |
| │ | |
| ├─────► Update AgentTools with game state | |
| │ | |
| ├─────► Send prompt to LLM (with tool schemas) | |
| │ | |
| ▼ | |
| ┌─────────────────┐ | |
| │ LLM Client │ ◄──── Tools available via function calling | |
| └────────┬────────┘ | |
| │ | |
| ├──── Response with tool_calls? | |
| │ | |
| ▼ YES | |
| ┌──────────────────┐ | |
| │ Tool Executor │ | |
| └────────┬─────────┘ | |
| │ | |
| ├─────► Execute each tool call | |
| ├─────► Log execution (time, tokens) | |
| ├─────► Format results | |
| │ | |
| ▼ | |
| Back to LLM with results ──► Final answer | |
| ``` | |
| ### Execution Loop | |
| 1. **Prompt sent** with tool schemas | |
| 2. **LLM decides** to call one or more tools | |
| 3. **Tools executed** in parallel | |
| 4. **Results logged** with full details | |
| 5. **Results sent back** to LLM | |
| 6. **LLM provides** final answer based on tool data | |
| **Maximum iterations:** 5 (prevents infinite loops) | |
| --- | |
| ## 📊 Logging & Tracking | |
| ### Tool Execution Log | |
| Every tool call is logged to `tool_executions.json`: | |
| ```json | |
| [ | |
| { | |
| "timestamp": "2026-01-09T12:34:56", | |
| "total_calls": 2, | |
| "successful": 2, | |
| "failed": 0, | |
| "total_time_ms": 45.2, | |
| "tokens": { | |
| "input": 15, | |
| "output": 127, | |
| "total": 142 | |
| }, | |
| "calls": [ | |
| { | |
| "id": "call_1", | |
| "name": "inspect_node", | |
| "parameters": {"node_id": 14}, | |
| "result": {...}, | |
| "success": true, | |
| "execution_time_ms": 12.3, | |
| "tokens": { | |
| "input": 5, | |
| "output": 45, | |
| "total": 50 | |
| } | |
| }, | |
| ... | |
| ] | |
| } | |
| ] | |
| ``` | |
| ### LLM Communication Log | |
| Tool activity is logged to `llm_communication.log`: | |
| ``` | |
| [12:34:56] [TOOL_REQUEST] 🔧 LLM requested 2 tool(s) (iteration 1) | |
| [12:34:56] [TOOL] === Tool Execution Batch (2 calls) === | |
| [12:34:56] [TOOL] ✅ inspect_node({"node_id": 14}) | |
| [12:34:56] [TOOL] Time: 12.3ms | Tokens: 5 in + 45 out = 50 total | |
| [12:34:56] [TOOL] Result: {"node_id": 14, "exists": true... | |
| [12:34:56] [TOOL] ✅ find_best_nodes({"min_pips": 10}) | |
| [12:34:56] [TOOL] Time: 32.9ms | Tokens: 10 in + 82 out = 92 total | |
| [12:34:56] [TOOL] Total: 2/2 successful | 142 tokens | 45.2ms | |
| [12:34:56] [TOOL_RESULTS] ✅ Tool results sent back to LLM (142 tokens) | |
| ``` | |
| ### Token Statistics | |
| The LLM stats now include tool tokens: | |
| ```python | |
| { | |
| "total_requests": 5, | |
| "total_tokens": 15432, | |
| "tool_tokens": 1250, # From tool inputs/outputs | |
| "llm_tokens": 14182, # From prompts/completions | |
| "total_cost_usd": "$0.0145" | |
| } | |
| ``` | |
| --- | |
| ## 🧪 Testing | |
| ### Run the Test Suite | |
| ```bash | |
| python examples/ai_testing/test_tools_integration.py | |
| ``` | |
| This tests: | |
| 1. ✅ Basic tool operations | |
| 2. ✅ Multiple tool calls in batch | |
| 3. ✅ Tool schema generation | |
| 4. ✅ Execution history and statistics | |
| ### Expected Output | |
| ``` | |
| 🧪 Testing Tool Integration for AI Agents | |
| ============================================================ | |
| TEST 1: Basic Tool Operations | |
| ============================================================ | |
| ✅ Initialized AgentTools with 54 nodes | |
| 🔧 Testing: inspect_node(10) | |
| { | |
| "node_id": 10, | |
| "exists": true, | |
| "resources": {"Wheat": 6, "Wood": 8}, | |
| "total_pips": 10, | |
| ... | |
| } | |
| ... | |
| ============================================================ | |
| ✅ All Tests Passed! | |
| ============================================================ | |
| ``` | |
| --- | |
| ## 💻 Usage Examples | |
| ### Example 1: Enable Tools in AI Manager | |
| Tools are **automatically enabled** when you use AIManager: | |
| ```python | |
| from pycatan.ai.ai_manager import AIManager | |
| # Create AI manager | |
| ai_manager = AIManager() | |
| # Register agent | |
| ai_manager.register_agent("Alice", player_id=0) | |
| # Process turn (tools automatically available) | |
| result = ai_manager.process_agent_turn( | |
| player_name="Alice", | |
| game_state=game_state, | |
| prompt_message="Your turn", | |
| allowed_actions=["build_settlement"] | |
| ) | |
| ``` | |
| ### Example 2: Direct Tool Usage | |
| You can also use tools directly: | |
| ```python | |
| from pycatan.ai.agent_tools import AgentTools | |
| # Initialize with game state | |
| tools = AgentTools(game_state) | |
| # Inspect a specific node | |
| node_info = tools.inspect_node(14) | |
| print(f"Node 14 has {node_info['total_pips']} pips") | |
| # Find best locations | |
| best_nodes = tools.find_best_nodes(min_pips=10, limit=5) | |
| print(f"Found {len(best_nodes['nodes'])} great spots") | |
| # Analyze road potential | |
| paths = tools.analyze_path_potential(from_node=10, max_depth=2) | |
| print(f"Best direction: {paths['paths'][0]['direction']}") | |
| ``` | |
| ### Example 3: Get Tool Execution Summary | |
| ```python | |
| # After game ends | |
| summary = ai_manager.tool_executor.get_execution_summary() | |
| print(f"Total tool calls: {summary['total_calls']}") | |
| print(f"Success rate: {summary['success_rate']}") | |
| print(f"Total tokens: {summary['total_tokens']}") | |
| # Tool usage breakdown | |
| for tool_name, count in summary['tool_usage'].items(): | |
| print(f" {tool_name}: {count} times") | |
| ``` | |
| --- | |
| ## 🎮 Real Game Usage | |
| ### What the LLM Sees | |
| When the LLM receives a prompt, it also gets tool schemas: | |
| ```json | |
| { | |
| "tools": [ | |
| { | |
| "name": "inspect_node", | |
| "description": "Get detailed information about a node. Prevents hallucinations!", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "node_id": { | |
| "type": "integer", | |
| "description": "The node ID to inspect" | |
| } | |
| }, | |
| "required": ["node_id"] | |
| } | |
| }, | |
| ... | |
| ] | |
| } | |
| ``` | |
| ### LLM Decision Process | |
| 1. **LLM thinks:** "I need to know about node 14 before deciding" | |
| 2. **LLM calls:** `inspect_node(node_id=14)` | |
| 3. **Tool executes:** Returns detailed node info | |
| 4. **LLM receives:** Complete accurate data | |
| 5. **LLM decides:** "Based on the data, I'll build there" | |
| ### Benefits Over Raw Data | |
| **Without tools:** | |
| ``` | |
| "Looking at Array N, I think node 14 has wheat and wood..." ❌ (hallucination) | |
| ``` | |
| **With tools:** | |
| ``` | |
| *calls inspect_node(14)* | |
| "The tool confirms node 14 has 12 pips with ore and wheat..." ✅ (accurate) | |
| ``` | |
| --- | |
| ## 📁 File Structure | |
| ``` | |
| pycatan/ai/ | |
| ├── agent_tools.py # The 3 tools (inspect, find, analyze) | |
| ├── tool_executor.py # Executes and logs tool calls | |
| ├── llm_client.py # LLM with function calling support | |
| ├── ai_manager.py # Integrates everything | |
| └── ai_logger.py # Logs tool executions | |
| examples/ai_testing/ | |
| ├── test_tools_integration.py # Test suite | |
| └── my_games/ | |
| └── session_YYYYMMDD_HHMMSS/ | |
| ├── tool_executions.json # Detailed tool logs | |
| ├── llm_communication.log # Real-time log | |
| └── [player_name]/ | |
| ├── prompts/ | |
| └── responses/ | |
| ``` | |
| --- | |
| ## 🚀 Future Enhancements | |
| ### Potential New Tools | |
| 1. **evaluate_trade** - Check if a trade is fair | |
| 2. **calculate_odds** - Probability of getting specific resources | |
| 3. **check_opponent_threats** - Identify threats from opponents | |
| 4. **plan_resource_path** - Plan how to get needed resources | |
| 5. **estimate_victory_points** - Calculate VP for different strategies | |
| ### Advanced Features | |
| - **Tool chaining** - One tool's output feeds into another | |
| - **Cached results** - Avoid re-executing identical calls | |
| - **Parallel execution** - Run independent tools simultaneously | |
| - **Tool suggestions** - AI Manager suggests which tools to use | |
| --- | |
| ## ⚙️ Configuration | |
| Tools work out-of-the-box, but you can customize: | |
| ### Token Estimation | |
| Tools estimate tokens at ~4 chars per token. Adjust in `tool_executor.py`: | |
| ```python | |
| def _estimate_tokens(self, text: str) -> int: | |
| return len(text) // 4 # Adjust divisor for accuracy | |
| ``` | |
| ### Max Tool Iterations | |
| Prevent infinite loops by setting max iterations in `ai_manager.py`: | |
| ```python | |
| max_tool_iterations = 5 # Increase if needed | |
| ``` | |
| ### Tool Timeout | |
| Add timeout per tool in `tool_executor.py`: | |
| ```python | |
| # Add to _execute_single_tool: | |
| import signal | |
| signal.alarm(5) # 5 second timeout | |
| ``` | |
| --- | |
| ## 🐛 Troubleshooting | |
| ### Issue: Tools not called by LLM | |
| **Check:** | |
| - Is `tools` parameter passed to `llm_client.generate()`? | |
| - Are tool schemas valid JSON? | |
| - Does LLM support function calling? (Gemini 2.0+) | |
| ### Issue: Wrong tool results | |
| **Check:** | |
| - Is game state updated before calling tools? | |
| - Are node IDs correct in the game state? | |
| - Check `tool_executions.json` for actual parameters used | |
| ### Issue: Too many tool iterations | |
| **Check:** | |
| - Is LLM stuck in a loop? | |
| - Are tool results clear enough for LLM to decide? | |
| - Consider adding more context in tool descriptions | |
| --- | |
| ## 📚 Related Documentation | |
| - [AI_ARCHITECTURE.md](../../.github/instructions/AI_ARCHITECTURE.md) - System architecture | |
| - [AGENT_TOOLS_README.md](../../pycatan/ai/AGENT_TOOLS_README.md) - Tool documentation | |
| - [AI_AGENT_PRINCIPLES.md](../../.github/instructions/AI_AGENT_PRINCIPLES.md) - Design principles | |
| --- | |
| ## ✅ Summary | |
| The tool calling system provides: | |
| 1. **3 powerful tools** for game analysis | |
| 2. **Multiple calls** per turn supported | |
| 3. **Full logging** with execution details | |
| 4. **Token tracking** separate from LLM | |
| 5. **Automatic integration** in AIManager | |
| 6. **Easy to test** with provided test suite | |
| **Result:** More accurate AI decisions, fewer hallucinations, better gameplay! 🎯 | |
| --- | |
| **Questions?** Check the test file or open an issue on GitHub. | |