mangubee committed on
Commit
87de1a7
·
1 Parent(s): 8e48c56

Update dev record with correct test/ folder paths

dev/dev_260102_13_stage2_tool_development.md CHANGED
@@ -20,12 +20,14 @@ Stage 1 established the LangGraph StateGraph skeleton with placeholder nodes. St
20
  **Chosen:** Direct Python function implementations for all tools
21
 
22
  **Why:**
 
23
  - HuggingFace Spaces doesn't support running MCP servers (requires separate processes)
24
  - Direct API approach is simpler and more reliable for deployment
25
  - Full control over retry logic, error handling, and timeouts
26
  - MCP servers are external dependencies with additional failure points
27
 
28
  **Rejected alternative:** Using MCP protocol servers for Tavily/Exa
 
29
  - Would require complex Docker configuration on HF Spaces
30
  - Additional process management overhead
31
  - Not necessary for MVP stage
@@ -35,12 +37,14 @@ Stage 1 established the LangGraph StateGraph skeleton with placeholder nodes. St
35
  **Chosen:** Use `tenacity` library with exponential backoff, max 3 retries
36
 
37
  **Why:**
 
38
  - Industry-standard retry library with clean decorator syntax
39
  - Exponential backoff prevents API rate limit issues
40
  - Configurable retry conditions (only retry on connection errors, not on validation errors)
41
  - Easy to test with mocking
42
 
43
  **Configuration:**
 
44
  - Max retries: 3
45
  - Min wait: 1 second
46
  - Max wait: 10 seconds
@@ -49,11 +53,13 @@ Stage 1 established the LangGraph StateGraph skeleton with placeholder nodes. St
49
  ### Decision 3: Tool Architecture - Unified Functions with Fallback
50
 
51
  **Pattern applied to all tools:**
 
52
  - Primary implementation (e.g., `tavily_search`)
53
  - Fallback implementation (e.g., `exa_search`)
54
  - Unified function with automatic fallback (e.g., `search`)
55
 
56
  **Example:**
 
57
  ```python
58
  def search(query):
59
  if default_tool == "tavily":
@@ -70,6 +76,7 @@ def search(query):
70
  **Chosen:** Custom AST visitor with whitelisted operations only
71
 
72
  **Why:**
 
73
  - Python's `eval()` is dangerous (arbitrary code execution)
74
  - `ast.literal_eval()` is too restrictive (doesn't support math operations)
75
  - Custom AST visitor allows precise control over allowed operations
@@ -77,10 +84,12 @@ def search(query):
77
  - Whitelist approach: only allow known-safe operations (add, multiply, sin, cos, etc.)
78
 
79
  **Rejected alternatives:**
 
80
  - Using `eval()`: Major security vulnerability
81
  - Using `sympify()` from sympy: Too complex, allows too much
82
 
83
  **Security layers:**
 
84
  1. AST whitelist (only allow specific node types)
85
  2. Expression length limit (500 chars)
86
  3. Number size limit (prevent huge calculations)
@@ -102,6 +111,7 @@ def parse_file(file_path):
102
  ```
103
 
104
  **Why:**
 
105
  - Simple interface for users (one function for all file types)
106
  - Easy to add new file types (just add new parser and update dispatcher)
107
  - Each parser can have format-specific logic
@@ -112,11 +122,13 @@ def parse_file(file_path):
112
  **Chosen:** Gemini 2.0 Flash as primary, Claude Sonnet 4.5 as fallback
113
 
114
  **Why:**
 
115
  - Gemini 2.0 Flash: Free tier (1500 req/day), fast, good quality
116
  - Claude Sonnet 4.5: Paid but highest quality, automatic fallback if Gemini fails
117
  - Same pattern as web search (primary + fallback = reliability)
118
 
119
  **Image handling:**
 
120
  - Load file, encode as base64
121
  - Check file size (max 10MB)
122
  - Support common formats (JPG, PNG, GIF, WEBP, BMP)
@@ -132,7 +144,7 @@ Successfully implemented 4 production-ready tools with comprehensive error handl
132
  - Tavily API integration (primary, free tier)
133
  - Exa API integration (fallback, paid)
134
  - Automatic fallback if primary fails
135
- - 10 passing tests (mock API, retry logic, fallback mechanism)
136
 
137
  2. **File Parser Tool** ([src/tools/file_parser.py](../src/tools/file_parser.py))
138
  - PDF parsing (PyPDF2)
@@ -140,21 +152,21 @@ Successfully implemented 4 production-ready tools with comprehensive error handl
140
  - Word parsing (python-docx)
141
  - Text/CSV parsing (built-in open)
142
  - Generic `parse_file()` dispatcher
143
- - 19 passing tests (real files + error handling)
144
 
145
  3. **Calculator Tool** ([src/tools/calculator.py](../src/tools/calculator.py))
146
  - Safe AST-based expression evaluation
147
  - Whitelisted operations only (no code execution)
148
  - Mathematical functions (sin, cos, sqrt, factorial, etc.)
149
  - Security hardened (timeout, complexity limits)
150
- - 41 passing tests (arithmetic, functions, security)
151
 
152
  4. **Vision Tool** ([src/tools/vision.py](../src/tools/vision.py))
153
  - Multimodal image analysis using LLMs
154
  - Gemini 2.0 Flash (primary, free)
155
  - Claude Sonnet 4.5 (fallback, paid)
156
  - Image loading and base64 encoding
157
- - 15 passing tests (mock LLM responses)
158
 
159
  5. **Tool Registry** ([src/tools/__init__.py](../src/tools/__init__.py))
160
  - Exports all 4 main tools: `search`, `parse_file`, `safe_eval`, `analyze_image`
@@ -167,12 +179,14 @@ Successfully implemented 4 production-ready tools with comprehensive error handl
167
  - Stage 3: Will add dynamic tool selection and execution
168
 
169
  **Test Coverage:**
 
170
  - 85 tool tests passing (web_search: 10, file_parser: 19, calculator: 41, vision: 15)
171
  - 6 existing agent tests still passing
172
  - 91 total tests passing
173
  - No regressions from Stage 1
174
 
175
  **Deployment:**
 
176
  - All changes committed and pushed to HuggingFace Spaces
177
  - Build succeeded
178
  - Agent now reports: "Stage 2 complete: 4 tools ready for execution in Stage 3"
@@ -198,6 +212,7 @@ def tool_name(args):
198
  ```
199
 
200
  **Why it works:**
 
201
  - Maximizes reliability (2 chances to succeed)
202
  - Transparent to users (single function call)
203
  - Preserves cost optimization (use free tier first, paid only as fallback)
@@ -209,17 +224,19 @@ def tool_name(args):
209
  Creating real test fixtures (sample.pdf, sample.xlsx, etc.) was critical for file parser testing:
210
 
211
  **What worked:**
 
212
  - Tests are realistic (test actual file parsing, not just mocks)
213
  - Easy to add new test cases (just add new fixture files)
214
  - Catches edge cases that mocks miss
215
 
216
  **Created fixtures:**
217
- - `tests/fixtures/sample.txt` - Plain text
218
- - `tests/fixtures/sample.csv` - CSV data
219
- - `tests/fixtures/sample.xlsx` - Excel spreadsheet
220
- - `tests/fixtures/sample.docx` - Word document
221
- - `tests/fixtures/test_image.jpg` - Test image (red square)
222
- - `tests/fixtures/generate_fixtures.py` - Script to regenerate fixtures
 
223
 
224
  **Recommendation:** For any file processing tool, create a comprehensive fixture library.
225
 
@@ -241,6 +258,7 @@ with patch('google.genai.Client') as mock_client:
241
  Initially planned to create `tests/test_tools_integration.py` for cross-tool testing. **Decision:** Skip for Stage 2.
242
 
243
  **Why:**
 
244
  - Tools work independently (don't need to interact yet)
245
  - Integration testing makes sense in Stage 3 when tools are orchestrated
246
  - Unit tests provide sufficient coverage for Stage 2
@@ -255,16 +273,16 @@ Initially planned to create `tests/test_tools_integration.py` for cross-tool tes
255
  - `src/tools/file_parser.py` - PDF/Excel/Word/Text parsing with retry logic
256
  - `src/tools/calculator.py` - Safe AST-based math evaluation
257
  - `src/tools/vision.py` - Multimodal image analysis (Gemini/Claude)
258
- - `tests/test_web_search.py` - 10 tests for web search tool
259
- - `tests/test_file_parser.py` - 19 tests for file parser
260
- - `tests/test_calculator.py` - 41 tests for calculator (including security)
261
- - `tests/test_vision.py` - 15 tests for vision tool
262
- - `tests/fixtures/sample.txt` - Test text file
263
- - `tests/fixtures/sample.csv` - Test CSV file
264
- - `tests/fixtures/sample.xlsx` - Test Excel file
265
- - `tests/fixtures/sample.docx` - Test Word document
266
- - `tests/fixtures/test_image.jpg` - Test image
267
- - `tests/fixtures/generate_fixtures.py` - Fixture generation script
268
 
269
  **What was modified:**
270
 
 
20
  **Chosen:** Direct Python function implementations for all tools
21
 
22
  **Why:**
23
+
24
  - HuggingFace Spaces doesn't support running MCP servers (requires separate processes)
25
  - Direct API approach is simpler and more reliable for deployment
26
  - Full control over retry logic, error handling, and timeouts
27
  - MCP servers are external dependencies with additional failure points
28
 
29
  **Rejected alternative:** Using MCP protocol servers for Tavily/Exa
30
+
31
  - Would require complex Docker configuration on HF Spaces
32
  - Additional process management overhead
33
  - Not necessary for MVP stage
 
37
  **Chosen:** Use `tenacity` library with exponential backoff, max 3 retries
38
 
39
  **Why:**
40
+
41
  - Industry-standard retry library with clean decorator syntax
42
  - Exponential backoff prevents API rate limit issues
43
  - Configurable retry conditions (only retry on connection errors, not on validation errors)
44
  - Easy to test with mocking
45
 
46
  **Configuration:**
47
+
48
  - Max retries: 3
49
  - Min wait: 1 second
50
  - Max wait: 10 seconds
 
53
  ### Decision 3: Tool Architecture - Unified Functions with Fallback
54
 
55
  **Pattern applied to all tools:**
56
+
57
  - Primary implementation (e.g., `tavily_search`)
58
  - Fallback implementation (e.g., `exa_search`)
59
  - Unified function with automatic fallback (e.g., `search`)
60
 
61
  **Example:**
62
+
63
  ```python
64
  def search(query):
65
  if default_tool == "tavily":
 
76
  **Chosen:** Custom AST visitor with whitelisted operations only
77
 
78
  **Why:**
79
+
80
  - Python's `eval()` is dangerous (arbitrary code execution)
81
  - `ast.literal_eval()` is too restrictive (doesn't support math operations)
82
  - Custom AST visitor allows precise control over allowed operations
 
84
  - Whitelist approach: only allow known-safe operations (add, multiply, sin, cos, etc.)
85
 
86
  **Rejected alternatives:**
87
+
88
  - Using `eval()`: Major security vulnerability
89
  - Using `sympify()` from sympy: Too complex, allows too much
90
 
91
  **Security layers:**
92
+
93
  1. AST whitelist (only allow specific node types)
94
  2. Expression length limit (500 chars)
95
  3. Number size limit (prevent huge calculations)
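A minimal sketch of the three layers, using hypothetical names (`safe_eval`, `_eval`, `_BIN_OPS`) rather than the actual `src/tools/calculator.py` internals:

```python
import ast
import math
import operator

# Layer 1: explicit whitelists -- anything not listed is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_FUNCS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt, "factorial": math.factorial}

def safe_eval(expr: str) -> float:
    if len(expr) > 500:                                   # layer 2: length limit
        raise ValueError("expression too long")
    return _eval(ast.parse(expr, mode="eval").body)

def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        if abs(node.value) > 1e100:                       # layer 3: number size limit
            raise ValueError("number too large")
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS):
        return _FUNCS[node.func.id](*[_eval(a) for a in node.args])
    raise ValueError("disallowed expression")             # whitelist miss -> reject
```

Note the difference from `eval()`: an input like `__import__('os')` parses fine, but the `Call` branch only accepts names in `_FUNCS`, so it is rejected before anything executes.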
 
111
  ```
112
 
113
  **Why:**
114
+
115
  - Simple interface for users (one function for all file types)
116
  - Easy to add new file types (just add new parser and update dispatcher)
117
  - Each parser can have format-specific logic
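The dispatcher pattern can be sketched as follows, with only the plain-text parsers filled in (names here are illustrative, not the real module's; the binary formats are stubbed out):

```python
from pathlib import Path

def _parse_text(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

# Extension -> parser dispatch table. Adding a file type means adding
# one parser function and one entry here.
PARSERS = {
    ".txt": _parse_text,
    ".csv": _parse_text,
    # ".pdf": _parse_pdf (PyPDF2), ".xlsx": _parse_xlsx, ".docx": _parse_docx, ...
}

def parse_file(file_path: str) -> str:
    suffix = Path(file_path).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported file type: {suffix}")
    return PARSERS[suffix](file_path)

# Demo: write a scratch file, parse it, clean up.
_demo = Path("parse_demo.txt")
_demo.write_text("hello fixtures", encoding="utf-8")
content = parse_file(str(_demo))
_demo.unlink()
```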
 
122
  **Chosen:** Gemini 2.0 Flash as primary, Claude Sonnet 4.5 as fallback
123
 
124
  **Why:**
125
+
126
  - Gemini 2.0 Flash: Free tier (1500 req/day), fast, good quality
127
  - Claude Sonnet 4.5: Paid but highest quality, automatic fallback if Gemini fails
128
  - Same pattern as web search (primary + fallback = reliability)
129
 
130
  **Image handling:**
131
+
132
  - Load file, encode as base64
133
  - Check file size (max 10MB)
134
  - Support common formats (JPG, PNG, GIF, WEBP, BMP)
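The load-and-encode step might look like the following sketch (`load_image_b64` is a hypothetical name; the real tool passes the encoded payload on to Gemini, or Claude on fallback):

```python
import base64
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024                              # 10MB limit from above
ALLOWED = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}

def load_image_b64(path: str) -> str:
    p = Path(path)
    if p.suffix.lower() not in ALLOWED:                   # format check
        raise ValueError(f"unsupported image format: {p.suffix}")
    data = p.read_bytes()
    if len(data) > MAX_BYTES:                             # size check before encoding
        raise ValueError("image exceeds 10MB limit")
    return base64.b64encode(data).decode("ascii")

# Demo: encode a few bytes standing in for a real image file.
_demo = Path("demo_image.png")
_demo.write_bytes(b"\x89PNG-demo-bytes")
encoded = load_image_b64(str(_demo))
_demo.unlink()
```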
 
144
  - Tavily API integration (primary, free tier)
145
  - Exa API integration (fallback, paid)
146
  - Automatic fallback if primary fails
147
+ - 10 passing tests in [test/test_web_search.py](../test/test_web_search.py)
148
 
149
  2. **File Parser Tool** ([src/tools/file_parser.py](../src/tools/file_parser.py))
150
  - PDF parsing (PyPDF2)
 
152
  - Word parsing (python-docx)
153
  - Text/CSV parsing (built-in open)
154
  - Generic `parse_file()` dispatcher
155
+ - 19 passing tests in [test/test_file_parser.py](../test/test_file_parser.py)
156
 
157
  3. **Calculator Tool** ([src/tools/calculator.py](../src/tools/calculator.py))
158
  - Safe AST-based expression evaluation
159
  - Whitelisted operations only (no code execution)
160
  - Mathematical functions (sin, cos, sqrt, factorial, etc.)
161
  - Security hardened (timeout, complexity limits)
162
+ - 41 passing tests in [test/test_calculator.py](../test/test_calculator.py)
163
 
164
  4. **Vision Tool** ([src/tools/vision.py](../src/tools/vision.py))
165
  - Multimodal image analysis using LLMs
166
  - Gemini 2.0 Flash (primary, free)
167
  - Claude Sonnet 4.5 (fallback, paid)
168
  - Image loading and base64 encoding
169
+ - 15 passing tests in [test/test_vision.py](../test/test_vision.py)
170
 
171
  5. **Tool Registry** ([src/tools/__init__.py](../src/tools/__init__.py))
172
  - Exports all 4 main tools: `search`, `parse_file`, `safe_eval`, `analyze_image`
 
179
  - Stage 3: Will add dynamic tool selection and execution
180
 
181
  **Test Coverage:**
182
+
183
  - 85 tool tests passing (web_search: 10, file_parser: 19, calculator: 41, vision: 15)
184
  - 6 existing agent tests still passing
185
  - 91 total tests passing
186
  - No regressions from Stage 1
187
 
188
  **Deployment:**
189
+
190
  - All changes committed and pushed to HuggingFace Spaces
191
  - Build succeeded
192
  - Agent now reports: "Stage 2 complete: 4 tools ready for execution in Stage 3"
 
212
  ```
213
 
214
  **Why it works:**
215
+
216
  - Maximizes reliability (2 chances to succeed)
217
  - Transparent to users (single function call)
218
  - Preserves cost optimization (use free tier first, paid only as fallback)
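The pattern reduces to a small higher-order wrapper. This is a generic sketch with stand-in functions, not the project's actual code:

```python
def with_fallback(primary, fallback):
    """Wrap two implementations into one tool: try primary, else fallback."""
    def tool(*args, **kwargs):
        try:
            return primary(*args, **kwargs)     # e.g. free-tier Tavily first
        except Exception:
            return fallback(*args, **kwargs)    # e.g. paid Exa only on failure
    return tool

# Stand-ins demonstrating the two paths:
def broken_primary(query):
    raise ConnectionError("primary API down")

def working_fallback(query):
    return f"fallback result for {query!r}"

search = with_fallback(broken_primary, working_fallback)
result = search("capital of France")
```

The caller sees a single `search()` regardless of which branch served the request, which is the "transparent to users" property above.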
 
224
  Creating real test fixtures (sample.pdf, sample.xlsx, etc.) was critical for file parser testing:
225
 
226
  **What worked:**
227
+
228
  - Tests are realistic (test actual file parsing, not just mocks)
229
  - Easy to add new test cases (just add new fixture files)
230
  - Catches edge cases that mocks miss
231
 
232
  **Created fixtures:**
233
+
234
+ - `test/fixtures/sample.txt` - Plain text
235
+ - `test/fixtures/sample.csv` - CSV data
236
+ - `test/fixtures/sample.xlsx` - Excel spreadsheet
237
+ - `test/fixtures/sample.docx` - Word document
238
+ - `test/fixtures/test_image.jpg` - Test image (red square)
239
+ - `test/fixtures/generate_fixtures.py` - Script to regenerate fixtures
240
 
241
  **Recommendation:** For any file processing tool, create a comprehensive fixture library.
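A trimmed-down sketch of what such a regeneration script can look like, covering only the plain-text fixtures (the real `generate_fixtures.py` also builds the .xlsx, .docx, and image fixtures with the respective libraries):

```python
import csv
from pathlib import Path

def generate_fixtures(root: str = "test/fixtures") -> None:
    """Recreate the plain-text fixture files under the given directory."""
    out = Path(root)
    out.mkdir(parents=True, exist_ok=True)
    (out / "sample.txt").write_text("Hello, fixture!\n", encoding="utf-8")
    with (out / "sample.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "value"])
        writer.writerow(["pi", "3.14159"])
```

Keeping fixture generation in a script (rather than committing opaque binaries alone) makes it easy to regenerate or extend the library when a parser changes.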
242
 
 
258
  Initially planned to create `tests/test_tools_integration.py` for cross-tool testing. **Decision:** Skip for Stage 2.
259
 
260
  **Why:**
261
+
262
  - Tools work independently (don't need to interact yet)
263
  - Integration testing makes sense in Stage 3 when tools are orchestrated
264
  - Unit tests provide sufficient coverage for Stage 2
 
273
  - `src/tools/file_parser.py` - PDF/Excel/Word/Text parsing with retry logic
274
  - `src/tools/calculator.py` - Safe AST-based math evaluation
275
  - `src/tools/vision.py` - Multimodal image analysis (Gemini/Claude)
276
+ - `test/test_web_search.py` - 10 tests for web search tool
277
+ - `test/test_file_parser.py` - 19 tests for file parser
278
+ - `test/test_calculator.py` - 41 tests for calculator (including security)
279
+ - `test/test_vision.py` - 15 tests for vision tool
280
+ - `test/fixtures/sample.txt` - Test text file
281
+ - `test/fixtures/sample.csv` - Test CSV file
282
+ - `test/fixtures/sample.xlsx` - Test Excel file
283
+ - `test/fixtures/sample.docx` - Test Word document
284
+ - `test/fixtures/test_image.jpg` - Test image
285
+ - `test/fixtures/generate_fixtures.py` - Fixture generation script
286
 
287
  **What was modified:**
288