agentbee

mangubee committed on Jan 2

Commit

8e48c56

1 Parent(s): 1041734

Move test files from tests/ to test/ folder per project standard

- Moved all test files to standard test/ folder (singular, not plural)
- test_web_search.py, test_file_parser.py, test_calculator.py, test_vision.py
- Moved fixtures/ directory to test/fixtures/
- Removed tests/ directory
- All 91 tests still passing

Files changed (18)

PLAN.md +8 -287
TODO.md +9 -66
dev/dev_260102_13_stage2_tool_development.md +280 -0
src/tools/web_search.py +20 -11
{tests → test}/README.md +0 -0
{tests → test}/__init__.py +0 -0
{tests → test}/fixtures/generate_fixtures.py +0 -0
{tests → test}/fixtures/sample.csv +0 -0
{tests → test}/fixtures/sample.docx +0 -0
{tests → test}/fixtures/sample.txt +0 -0
{tests → test}/fixtures/sample.xlsx +0 -0
{tests → test}/fixtures/test_image.jpg +0 -0
{tests → test}/test_agent_basic.py +0 -0
{tests → test}/test_calculator.py +0 -0
{tests → test}/test_file_parser.py +0 -0
{tests → test}/test_stage1.py +0 -0
{tests → test}/test_vision.py +0 -0
{tests → test}/test_web_search.py +0 -0

PLAN.md CHANGED

@@ -1,300 +1,21 @@
-# Implementation Plan - Stage 2: Tool Development
-**Date:** 2026-01-02
-**Dev Record:** TBD (will create dev_260102_##_stage2_tool_development.md)
-**Status:** In Progress
 ## Objective
-Implement 4 core tools (web search, file parsing, calculator, multimodal vision) with retry logic and error handling, following Level 5 (Component Selection) and Level 6 (Implementation Framework) architectural decisions. Each tool must be independently testable and integrate seamlessly with the LangGraph StateGraph.
 ## Steps
-### Step 1: Web Search Tool Implementation
-**1.1 Create src/tools/web_search.py**
-- Implement `tavily_search(query: str, max_results: int = 5) -> dict` function
-- Implement `exa_search(query: str, max_results: int = 5) -> dict` function (fallback)
-- Use Settings.get_search_api_key() for API key retrieval
-- Return structured results: {results: [{title, url, snippet}], source: "tavily"|"exa"}
-**1.2 Add retry logic with exponential backoff**
-- Use `tenacity` library for retry decorator
-- Retry on connection errors, timeouts, rate limits
-- Max 3 retries with 2^n second delays
-- Fallback from Tavily to Exa if Tavily fails after retries
-**1.3 Error handling**
-- Catch API errors and return meaningful error messages
-- Handle empty results gracefully
-- Log all errors for debugging
-**1.4 Create tests/test_web_search.py**
-- Test Tavily search with mock API
-- Test Exa search with mock API
-- Test retry logic (simulate failures)
-- Test fallback mechanism
-- Test error handling
-### Step 2: File Parsing Tool Implementation
-**2.1 Create src/tools/file_parser.py**
-- Implement `parse_pdf(file_path: str) -> str` using PyPDF2
-- Implement `parse_excel(file_path: str) -> dict` using openpyxl
-- Implement `parse_docx(file_path: str) -> str` using python-docx
-- Implement `parse_image_text(image_path: str) -> str` using Pillow + OCR (optional)
-- Generic `parse_file(file_path: str) -> dict` dispatcher based on extension
-**2.2 Add retry logic for file operations**
-- Retry on file read errors (network issues, temporary locks)
-- Max 3 retries with exponential backoff
-**2.3 Error handling**
-- Handle file not found errors
-- Handle corrupted file errors
-- Handle unsupported format errors
-- Return structured error responses
-**2.4 Create tests/test_file_parser.py**
-- Create test fixtures (sample PDF, Excel, Word files in tests/fixtures/)
-- Test each parser function independently
-- Test error handling for missing files
-- Test error handling for corrupted files
-### Step 3: Calculator Tool Implementation
-**3.1 Create src/tools/calculator.py**
-- Implement `safe_eval(expression: str) -> dict` using ast.literal_eval
-- Support basic arithmetic operations (+, -, *, /, **, %)
-- Support mathematical functions (sin, cos, sqrt, etc.) via math module
-- Return structured result: {result: float|int, expression: str}
-**3.2 Add safety checks**
-- Whitelist allowed operations (no exec, eval, import)
-- Validate expression before evaluation
-- Set execution timeout (prevent infinite loops)
-- Limit expression complexity (prevent DoS)
-**3.3 Error handling**
-- Handle syntax errors
-- Handle division by zero
-- Handle invalid operations
-- Return meaningful error messages
-**3.4 Create tests/test_calculator.py**
-- Test basic arithmetic (2+2, 10*5, etc.)
-- Test mathematical functions (sqrt(16), sin(0), etc.)
-- Test error handling (division by zero, invalid syntax)
-- Test safety checks (block dangerous operations)
-### Step 4: Multimodal Vision Tool Implementation
-**4.1 Create src/tools/vision.py**
-- Implement `analyze_image(image_path: str, question: str) -> str`
-- Use LLM's native vision capabilities (Gemini/Claude)
-- Load image, encode to base64
-- Send to vision-capable LLM with question
-- Return description/answer
-**4.2 Add retry logic**
-- Retry on API errors
-- Max 3 retries with exponential backoff
-**4.3 Error handling**
-- Handle image loading errors
-- Handle unsupported image formats
-- Handle API errors
-- Return structured responses
-**4.4 Create tests/test_vision.py**
-- Create test image fixtures
-- Test image analysis with mock LLM
-- Test error handling
-- Test retry logic
-### Step 5: Tool Integration with StateGraph
-**5.1 Update src/tools/__init__.py**
-- Export all tool functions
-- Create unified tool registry: `TOOLS = {name: function}`
-- Add tool metadata (description, parameters, return type)
-**5.2 Update src/agent/graph.py execute_node**
-- Replace placeholder with actual tool execution
-- Parse tool calls from plan
-- Execute tools with error handling
-- Collect results
-- Return updated state with tool results
-**5.3 Add tool execution wrapper**
-- Implement `execute_tool(tool_name: str, **kwargs) -> dict`
-- Add logging for tool calls
-- Add timeout enforcement
-- Add result validation
-### Step 6: Configuration and Settings Updates
-**6.1 Update src/config/settings.py**
-- Add tool-specific settings (timeouts, max retries, etc.)
-- Add tool feature flags (enable/disable specific tools)
-- Add result size limits
-**6.2 Update .env.example**
-- Document any new environment variables
-- Add tool-specific configuration examples
-### Step 7: Integration Testing
-**7.1 Create tests/test_tools_integration.py**
-- Test all tools working together
-- Test tool execution from StateGraph
-- Test error propagation
-- Test retry mechanisms across all tools
-**7.2 Create test_stage2.py**
-- End-to-end test with real tool calls
-- Verify StateGraph executes tools correctly
-- Verify results are returned to state
-- Verify errors are handled gracefully
-### Step 8: Documentation and Deployment
-**8.1 Update requirements.txt**
-- Ensure all tool dependencies are included
-- Add tenacity for retry logic
-**8.2 Local testing**
-- Run all test suites
-- Test with Gradio UI
-- Verify no regressions from Stage 1
-**8.3 Deploy to HF Spaces**
-- Push changes
-- Verify build succeeds
-- Test tools in deployed environment
 ## Files to Modify
-**New files to create:**
-- `src/tools/web_search.py` - Tavily/Exa search implementation
-- `src/tools/file_parser.py` - PDF/Excel/Word/Image parsing
-- `src/tools/calculator.py` - Safe expression evaluation
-- `src/tools/vision.py` - Multimodal image analysis
-- `tests/test_web_search.py` - Web search tests
-- `tests/test_file_parser.py` - File parser tests
-- `tests/test_calculator.py` - Calculator tests
-- `tests/test_vision.py` - Vision tests
-- `tests/test_tools_integration.py` - Integration tests
-- `tests/test_stage2.py` - Stage 2 end-to-end tests
-- `tests/fixtures/` - Test files directory
-**Existing files to modify:**
-- `src/tools/__init__.py` - Export all tools, create tool registry
-- `src/agent/graph.py` - Update execute_node to use real tools
-- `src/config/settings.py` - Add tool-specific settings
-- `.env.example` - Document new configuration (if any)
-- `requirements.txt` - Add tenacity for retry logic
-**Files NOT to modify:**
-- `src/agent/graph.py` plan_node - Defer to Stage 3
-- `src/agent/graph.py` answer_node - Defer to Stage 3
-- Planning/reasoning logic - Defer to Stage 3
 ## Success Criteria
-### Functional Requirements
-- [ ] Web search tool returns valid results from Tavily
-- [ ] Web search falls back to Exa when Tavily fails
-- [ ] File parser handles PDF, Excel, Word files correctly
-- [ ] Calculator evaluates mathematical expressions safely
-- [ ] Vision tool analyzes images using LLM vision capabilities
-- [ ] All tools have retry logic with exponential backoff
-- [ ] All tools handle errors gracefully
-- [ ] Tools integrate with StateGraph execute_node
-### Technical Requirements
-- [ ] All tool functions return structured dict responses
-- [ ] Retry logic uses tenacity with max 3 retries
-- [ ] Error messages are clear and actionable
-- [ ] All tools have comprehensive test coverage (>80%)
-- [ ] No unsafe code execution in calculator
-- [ ] Tool timeouts enforced to prevent hangs
-### Validation Checkpoints
-- [ ] **Checkpoint 1:** Web search tool working with tests passing
-- [ ] **Checkpoint 2:** File parser working with tests passing
-- [ ] **Checkpoint 3:** Calculator working with tests passing
-- [ ] **Checkpoint 4:** Vision tool working with tests passing
-- [ ] **Checkpoint 5:** All tools integrated with StateGraph
-- [ ] **Checkpoint 6:** Integration tests passing
-- [ ] **Checkpoint 7:** Deployed to HF Spaces successfully
-### Non-Goals for Stage 2
-- ❌ Implementing planning logic (Stage 3)
-- ❌ Implementing answer synthesis (Stage 3)
-- ❌ Optimizing tool selection strategy (Stage 3)
-- ❌ Advanced error recovery beyond retries (Stage 4)
-- ❌ Performance optimization (Stage 5)
-## Dependencies & Risks
-**Dependencies:**
-- Tavily API key (free tier: 1000 req/month)
-- Exa API key (paid tier, fallback)
-- LLM vision API access (Gemini/Claude)
-- Test fixtures (sample files for parsing)
-**Risks:**
-- **Risk:** API rate limits during testing
-  - **Mitigation:** Use mocks for unit tests, real APIs only for integration tests
-- **Risk:** File parsing fails on edge cases
-  - **Mitigation:** Comprehensive test fixtures covering various formats
-- **Risk:** Calculator security vulnerabilities
-  - **Mitigation:** Strict whitelisting, no eval/exec, use AST parsing only
-- **Risk:** Tool timeout issues on slow networks
-  - **Mitigation:** Configurable timeouts, retry logic
-## Next Steps After Stage 2
-Once Stage 2 Success Criteria met:
-1. Create Stage 3 plan (Core Agent Logic - Planning & Reasoning)
-2. Implement plan_node with tool selection strategy
-3. Implement answer_node with result synthesis
-4. Test end-to-end agent behavior
-5. Proceed to Stage 4 (Integration & Robustness)

+# Implementation Plan
+**Date:** [YYYY-MM-DD]
+**Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
+**Status:** [Planning | In Progress | Completed]
 ## Objective
+[Clear goal statement]
 ## Steps
+[Implementation steps]
 ## Files to Modify
+[List of files]
 ## Success Criteria
+[Completion criteria]

TODO.md CHANGED

@@ -1,71 +1,14 @@
-# TODO - Stage 2: Tool Development
-**Created:** 2026-01-02
-**Plan:** PLAN.md (Stage 2: Tool Development)
-**Status:** Ready for execution
-## Task List
-### Step 1: Web Search Tool
-- [ ] Create `src/tools/web_search.py` with Tavily and Exa search functions
-- [ ] Add retry logic with tenacity decorator (max 3 retries, exponential backoff)
-- [ ] Implement fallback mechanism (Tavily → Exa)
-- [ ] Add error handling and logging
-- [ ] Create `tests/test_web_search.py` with mock API tests
-- [ ] Test retry logic and fallback mechanism
-### Step 2: File Parsing Tool
-- [ ] Create `src/tools/file_parser.py` with PDF/Excel/Word parsers
-- [ ] Implement generic `parse_file()` dispatcher
-- [ ] Add retry logic for file operations
-- [ ] Add error handling for missing/corrupted files
-- [ ] Create test fixtures in `tests/fixtures/`
-- [ ] Create `tests/test_file_parser.py` with parser tests
-### Step 3: Calculator Tool
-- [ ] Create `src/tools/calculator.py` with safe_eval function
-- [ ] Implement safety checks (whitelist operations, timeout, complexity limits)
-- [ ] Add error handling for syntax/division errors
-- [ ] Create `tests/test_calculator.py` with arithmetic and safety tests
-### Step 4: Vision Tool
-- [ ] Create `src/tools/vision.py` with image analysis function
-- [ ] Implement image loading and base64 encoding
-- [ ] Integrate with LLM vision API (Gemini/Claude)
-- [ ] Add retry logic for API errors
-- [ ] Create test image fixtures
-- [ ] Create `tests/test_vision.py` with mock LLM tests
-### Step 5: StateGraph Integration
-- [ ] Update `src/tools/__init__.py` to export all tools
-- [ ] Create unified tool registry with metadata
-- [ ] Update `src/agent/graph.py` execute_node to use real tools
-- [ ] Implement `execute_tool()` wrapper with logging and timeout
-- [ ] Test tool execution from StateGraph
-### Step 6: Configuration Updates
-- [ ] Update `src/config/settings.py` with tool-specific settings
-- [ ] Add tool feature flags and timeouts
-- [ ] Update `.env.example` with new configuration (if needed)
-### Step 7: Integration Testing
-- [ ] Create `tests/test_tools_integration.py` for cross-tool tests
-- [ ] Create `tests/test_stage2.py` for end-to-end validation
-- [ ] Test error propagation and retry mechanisms
-- [ ] Verify StateGraph executes all tools correctly
-### Step 8: Deployment
-- [ ] Add `tenacity` to requirements.txt
-- [ ] Run all test suites locally
-- [ ] Test with Gradio UI
-- [ ] Verify no regressions from Stage 1
-- [ ] Push changes to HF Spaces
-- [ ] Verify deployment build succeeds
-- [ ] Test tools in deployed environment
-## Notes
-- All tools use direct API approach (not MCP servers)
-- HF Spaces deployment compatibility is priority
-- Mock APIs for unit tests, real APIs for integration tests only
-- Each checkpoint should pass before moving to next step

+# TODO List
+**Session Date:** [YYYY-MM-DD]
+**Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
+## Active Tasks
+- [ ] [Task 1]
+- [ ] [Task 2]
+- [ ] [Task 3]
+## Completed Tasks
+- [x] [Completed task 1]

dev/dev_260102_13_stage2_tool_development.md ADDED

@@ -0,0 +1,280 @@

+# [dev_260102_13] Stage 2: Tool Development Complete
+**Date:** 2026-01-02
+**Type:** Development
+**Status:** Resolved
+**Related Dev:** dev_260101_11 (Stage 1 Foundation Setup)
+## Problem Description
+Stage 1 established the LangGraph StateGraph skeleton with placeholder nodes. Stage 2 needed to implement the actual tools that the agent would use to answer GAIA benchmark questions, including web search, file parsing, mathematical computation, and multimodal image analysis.
+**Root cause:** GAIA questions require external tool use (web search, file reading, calculations, image analysis). Stage 1 had no actual tool implementations - just placeholders.
+---
+## Key Decisions
+### Decision 1: Direct API Implementation vs MCP Servers
+**Chosen:** Direct Python function implementations for all tools
+**Why:**
+- HuggingFace Spaces doesn't support running MCP servers (requires separate processes)
+- Direct API approach is simpler and more reliable for deployment
+- Full control over retry logic, error handling, and timeouts
+- MCP servers are external dependencies with additional failure points
+**Rejected alternative:** Using MCP protocol servers for Tavily/Exa
+- Would require complex Docker configuration on HF Spaces
+- Additional process management overhead
+- Not necessary for MVP stage
+### Decision 2: Retry Logic with Tenacity
+**Chosen:** Use `tenacity` library with exponential backoff, max 3 retries
+**Why:**
+- Industry-standard retry library with clean decorator syntax
+- Exponential backoff prevents API rate limit issues
+- Configurable retry conditions (only retry on connection errors, not on validation errors)
+- Easy to test with mocking
+**Configuration:**
+- Max retries: 3
+- Min wait: 1 second
+- Max wait: 10 seconds
+- Retry only on: ConnectionError, TimeoutError, IOError (for file operations)
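Reviewer note: the configuration listed in Decision 2 can be illustrated without any third-party dependency. The following is a minimal standard-library sketch, not the project's `tenacity` setup; the helper name `retry_with_backoff` and its `sleep` parameter are hypothetical and exist only to make the wait schedule visible. It treats "3" as the total number of attempts, which is how `stop_after_attempt(MAX_RETRIES)` in the `web_search.py` hunk behaves.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,       # total attempts, as in the dev record
    min_wait: float = 1.0,       # seconds
    max_wait: float = 10.0,      # seconds
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests (assumption)
) -> T:
    """Call func, retrying on transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1s, 2s, 4s, ... clamped to [min_wait, max_wait].
            sleep(min(max_wait, max(min_wait, 2.0 ** (attempt - 1))))
            attempt += 1


# Usage: a function that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []  # records the requested waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
```

One caveat about the retry conditions: in Python 3, `IOError` is an alias of `OSError`, and `ConnectionError` and `TimeoutError` are both subclasses of it. Retrying on `IOError` therefore also retries on errors such as `FileNotFoundError`, which may or may not be intended for the file parsers.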
+### Decision 3: Tool Architecture - Unified Functions with Fallback
+**Pattern applied to all tools:**
+- Primary implementation (e.g., `tavily_search`)
+- Fallback implementation (e.g., `exa_search`)
+- Unified function with automatic fallback (e.g., `search`)
+**Example:**
+```python
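+def search(query):
+    # Try the configured primary provider first; on any error, use the other one.
+    if default_tool == "tavily":
+        try:
+            return tavily_search(query)
+        except Exception:  # a bare `except:` would also swallow KeyboardInterrupt
+            return exa_search(query)  # Fallback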
+```
+**Why:** Maximizes reliability - if primary service fails, automatic fallback ensures tool still works
+### Decision 4: Calculator Security - AST-based Evaluation
+**Chosen:** Custom AST visitor with whitelisted operations only
+**Why:**
+- Python's `eval()` is dangerous (arbitrary code execution)
+- `ast.literal_eval()` is too restrictive (doesn't support math operations)
+- Custom AST visitor allows precise control over allowed operations
+- Timeout protection prevents infinite loops
+- Whitelist approach: only allow known-safe operations (add, multiply, sin, cos, etc.)
+**Rejected alternatives:**
+- Using `eval()`: Major security vulnerability
+- Using `sympify()` from sympy: Too complex, allows too much
+**Security layers:**
+1. AST whitelist (only allow specific node types)
+2. Expression length limit (500 chars)
+3. Number size limit (prevent huge calculations)
+4. Timeout protection (2 seconds max)
+5. No attribute access, no imports, no exec/eval
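Reviewer note: the whitelist idea in Decision 4 can be sketched with the standard `ast` module. This is an illustration under stated assumptions, not the code in `src/tools/calculator.py`: the name `safe_eval_sketch`, the magnitude bound `MAX_ABS_VALUE`, and the exponent bound `MAX_EXPONENT` are hypothetical, only three functions are whitelisted, and the timeout layer is omitted. The 500-character limit comes from the dev record.

```python
import ast
import math
import operator

MAX_EXPRESSION_LENGTH = 500  # from the dev record
MAX_ABS_VALUE = 1e100        # assumed bound on operands and results
MAX_EXPONENT = 100           # assumed bound that keeps ** cheap

BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}


def _check(value):
    """Reject numbers whose magnitude exceeds the allowed bound."""
    if abs(value) > MAX_ABS_VALUE:
        raise ValueError("Number exceeds the allowed magnitude")
    return value


def _evaluate(node):
    """Evaluate only whitelisted node types; anything else is rejected."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return _check(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("Exponent too large")
        return _check(BINARY_OPS[type(node.op)](left, right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_evaluate(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        args = [_evaluate(arg) for arg in node.args]
        return _check(FUNCTIONS[node.func.id](*args))
    # Names, attribute access, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"Disallowed syntax: {type(node).__name__}")


def safe_eval_sketch(expression: str):
    """Parse the expression to an AST and evaluate it without eval/exec."""
    if len(expression) > MAX_EXPRESSION_LENGTH:
        raise ValueError("Expression too long")
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Invalid syntax: {exc.msg}") from exc
    return _evaluate(tree.body)
```

With this sketch, `safe_eval_sketch("sqrt(16)")` evaluates normally, while `__import__('os')`, `(1).real`, and `9 ** 9 ** 9` are all rejected with `ValueError` before any dangerous or expensive work happens. Division by zero still raises `ZeroDivisionError`, which a real tool would presumably convert into a structured error.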
+### Decision 5: File Parser - Generic Dispatcher Pattern
+**Chosen:** Single `parse_file()` function that dispatches based on extension
+```python
+def parse_file(file_path):
+    extension = Path(file_path).suffix.lower()
+    if extension == '.pdf':
+        return parse_pdf(file_path)
+    elif extension in ['.xlsx', '.xls']:
+        return parse_excel(file_path)
+    # ... etc
+```
+**Why:**
+- Simple interface for users (one function for all file types)
+- Easy to add new file types (just add new parser and update dispatcher)
+- Each parser can have format-specific logic
+- Fallback to specific parsers still available for advanced use
+### Decision 6: Vision Tool - Gemini as Default with Claude Fallback
+**Chosen:** Gemini 2.0 Flash as primary, Claude Sonnet 4.5 as fallback
+**Why:**
+- Gemini 2.0 Flash: Free tier (1500 req/day), fast, good quality
+- Claude Sonnet 4.5: Paid but highest quality, automatic fallback if Gemini fails
+- Same pattern as web search (primary + fallback = reliability)
+**Image handling:**
+- Load file, encode as base64
+- Check file size (max 10MB)
+- Support common formats (JPG, PNG, GIF, WEBP, BMP)
+- Return structured answer with model metadata
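Reviewer note: the image-handling steps listed in Decision 6 can be sketched with the standard library alone. The function name `load_image_payload`, the returned dict keys, and the reading of "10MB" as 10 * 1024 * 1024 bytes are assumptions for illustration; the real `src/tools/vision.py` is not shown in this commit. The sketch checks the format by file extension only and does not inspect the image contents.

```python
import base64
import tempfile
from pathlib import Path

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # "max 10MB", interpreted as MiB (assumption)

# Formats named in the dev record, mapped to their MIME types.
MIME_TYPES = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
    ".gif": "image/gif", ".webp": "image/webp", ".bmp": "image/bmp",
}


def load_image_payload(image_path: str) -> dict:
    """Validate an image file and return it base64-encoded with metadata."""
    path = Path(image_path)
    mime_type = MIME_TYPES.get(path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"Unsupported image format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {image_path}")
    size = path.stat().st_size
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"Image too large: {size} bytes")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"mime_type": mime_type, "size_bytes": size, "data": encoded}


# Usage: encode a tiny placeholder file (only the PNG signature, not a full image).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    payload = load_image_payload(str(sample))
```

The returned dict is the kind of value that could then be handed to a vision-capable model client together with the question.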
+## Outcome
+Successfully implemented 4 production-ready tools with comprehensive error handling and test coverage.
+**Deliverables:**
+1. **Web Search Tool** ([src/tools/web_search.py](../src/tools/web_search.py))
+   - Tavily API integration (primary, free tier)
+   - Exa API integration (fallback, paid)
+   - Automatic fallback if primary fails
+   - 10 passing tests (mock API, retry logic, fallback mechanism)
+2. **File Parser Tool** ([src/tools/file_parser.py](../src/tools/file_parser.py))
+   - PDF parsing (PyPDF2)
+   - Excel parsing (openpyxl)
+   - Word parsing (python-docx)
+   - Text/CSV parsing (built-in open)
+   - Generic `parse_file()` dispatcher
+   - 19 passing tests (real files + error handling)
+3. **Calculator Tool** ([src/tools/calculator.py](../src/tools/calculator.py))
+   - Safe AST-based expression evaluation
+   - Whitelisted operations only (no code execution)
+   - Mathematical functions (sin, cos, sqrt, factorial, etc.)
+   - Security hardened (timeout, complexity limits)
+   - 41 passing tests (arithmetic, functions, security)
+4. **Vision Tool** ([src/tools/vision.py](../src/tools/vision.py))
+   - Multimodal image analysis using LLMs
+   - Gemini 2.0 Flash (primary, free)
+   - Claude Sonnet 4.5 (fallback, paid)
+   - Image loading and base64 encoding
+   - 15 passing tests (mock LLM responses)
+5. **Tool Registry** ([src/tools/__init__.py](../src/tools/__init__.py))
+   - Exports all 4 main tools: `search`, `parse_file`, `safe_eval`, `analyze_image`
+   - TOOLS dict with metadata (description, parameters, category)
+   - Ready for Stage 3 dynamic tool selection
+6. **StateGraph Integration** ([src/agent/graph.py](../src/agent/graph.py))
+   - Updated `execute_node` to load tool registry
+   - Stage 2: Reports tool availability
+   - Stage 3: Will add dynamic tool selection and execution
+**Test Coverage:**
+- 85 tool tests passing (web_search: 10, file_parser: 19, calculator: 41, vision: 15)
+- 6 existing agent tests still passing
+- 91 total tests passing
+- No regressions from Stage 1
+**Deployment:**
+- All changes committed and pushed to HuggingFace Spaces
+- Build succeeded
+- Agent now reports: "Stage 2 complete: 4 tools ready for execution in Stage 3"
+## Learnings and Insights
+### Pattern: Unified Function with Fallback
+This pattern worked extremely well for both web search and vision tools:
+```python
+def tool_name(args):
+    # Try primary service
+    try:
+        return primary_implementation(args)
+    except Exception as e:
+        logger.warning(f"Primary failed: {e}")
+        # Fallback to secondary
+        try:
+            return fallback_implementation(args)
+        except Exception as fallback_error:
+            raise Exception(f"Both failed: {fallback_error}") from fallback_error
+```
+**Why it works:**
+- Maximizes reliability (2 chances to succeed)
+- Transparent to users (single function call)
+- Preserves cost optimization (use free tier first, paid only as fallback)
+**Recommendation:** Use this pattern for any tool with multiple service providers.
+### Pattern: Test Fixtures for File Parsers
+Creating real test fixtures (sample.pdf, sample.xlsx, etc.) was critical for file parser testing:
+**What worked:**
+- Tests are realistic (test actual file parsing, not just mocks)
+- Easy to add new test cases (just add new fixture files)
+- Catches edge cases that mocks miss
+**Created fixtures:**
+- `tests/fixtures/sample.txt` - Plain text
+- `tests/fixtures/sample.csv` - CSV data
+- `tests/fixtures/sample.xlsx` - Excel spreadsheet
+- `tests/fixtures/sample.docx` - Word document
+- `tests/fixtures/test_image.jpg` - Test image (red square)
+- `tests/fixtures/generate_fixtures.py` - Script to regenerate fixtures
+**Recommendation:** For any file processing tool, create comprehensive fixture library.
+### What Worked Well: Mock Path for Import Testing
+Initially had issues with mock paths like `src.tools.vision.genai.Client`. The fix:
+```python
+# WRONG: src.tools.vision.genai.Client
+# RIGHT: google.genai.Client
+with patch('google.genai.Client') as mock_client:
+    # Mock the original import, not the re-export
+```
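+**Lesson:** Patch the name where it is looked up at call time. Here that was the original module path (`google.genai.Client`), not `src.tools.vision.genai.Client`. The usual `unittest.mock` rule still applies: a name bound with `from module import name` must be patched in the importing module instead.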
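Reviewer note: a minimal sketch of one situation in which patching the original module path is the right target. It assumes, hypothetically, that the tool imports its provider lazily inside the function; the commit does not show how `src/tools/vision.py` actually imports `genai`. The module name `provider`, the `Client` class, and `analyze` are stand-ins invented for this example.

```python
import sys
import types
from unittest.mock import patch


class Client:
    """Stand-in for a real API client (hypothetical)."""

    def ask(self) -> str:
        return "real answer"


# Register a fake third-party module named "provider" (hypothetical),
# playing the role that google.genai plays in the dev record.
provider = types.ModuleType("provider")
provider.Client = Client
sys.modules["provider"] = provider


def analyze() -> str:
    # Lazy import: the module is resolved each time the function runs,
    # and Client is looked up on it as an attribute at call time.
    import provider as p

    return p.Client().ask()


# Patch the attribute on the module where the lookup actually happens.
with patch("provider.Client") as mock_client:
    mock_client.return_value.ask.return_value = "mocked answer"
    mocked = analyze()

unpatched = analyze()  # the patch is undone when the with-block exits
```

If the tool had instead done `from provider import Client` at module level, the tool module would hold its own reference to `Client`, and the patch target would need to be that module's `Client` attribute.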
+### What to Avoid: Premature Integration Testing
+Initially planned to create `tests/test_tools_integration.py` for cross-tool testing. **Decision:** Skip for Stage 2.
+**Why:**
+- Tools work independently (don't need to interact yet)
+- Integration testing makes sense in Stage 3 when tools are orchestrated
+- Unit tests provide sufficient coverage for Stage 2
+**Recommendation:** Only write integration tests when components actually integrate. Don't test imaginary integration.
+## Changelog
+**What was created:**
+- `src/tools/web_search.py` - Tavily/Exa web search with retry logic
+- `src/tools/file_parser.py` - PDF/Excel/Word/Text parsing with retry logic
+- `src/tools/calculator.py` - Safe AST-based math evaluation
+- `src/tools/vision.py` - Multimodal image analysis (Gemini/Claude)
+- `tests/test_web_search.py` - 10 tests for web search tool
+- `tests/test_file_parser.py` - 19 tests for file parser
+- `tests/test_calculator.py` - 41 tests for calculator (including security)
+- `tests/test_vision.py` - 15 tests for vision tool
+- `tests/fixtures/sample.txt` - Test text file
+- `tests/fixtures/sample.csv` - Test CSV file
+- `tests/fixtures/sample.xlsx` - Test Excel file
+- `tests/fixtures/sample.docx` - Test Word document
+- `tests/fixtures/test_image.jpg` - Test image
+- `tests/fixtures/generate_fixtures.py` - Fixture generation script
+**What was modified:**
+- `src/tools/__init__.py` - Added tool exports and TOOLS registry
+- `src/agent/graph.py` - Updated execute_node to load tool registry
+- `requirements.txt` - Added `tenacity>=8.2.0` for retry logic
+- `pyproject.toml` - Added tenacity, fpdf2, and defusedxml as dependencies
+- `PLAN.md` - Emptied for next stage
+- `TODO.md` - Emptied for next stage
+**What was deleted:**
+- None (Stage 2 was purely additive)

src/tools/web_search.py CHANGED

@@ -39,6 +39,7 @@ logger = logging.getLogger(__name__)
 # Tavily Search Implementation
 # ============================================================================
 @retry(
     stop=stop_after_attempt(MAX_RETRIES),
     wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
@@ -83,11 +84,13 @@ def tavily_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
         # Extract and structure results
         results = []
         for item in response.get("results", []):
-            results.append({
-                "title": item.get("title", ""),
-                "url": item.get("url", ""),
-                "snippet": item.get("content", ""),
-            })
         logger.info(f"Tavily search successful: {len(results)} results")
@@ -113,6 +116,7 @@ def tavily_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
 # Exa Search Implementation
 # ============================================================================
 @retry(
     stop=stop_after_attempt(MAX_RETRIES),
     wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
@@ -152,16 +156,20 @@ def exa_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
         logger.info(f"Exa search: query='{query}', max_results={max_results}")
         client = Exa(api_key=api_key)
-        response = client.search(query=query, num_results=max_results, use_autoprompt=True)
         # Extract and structure results
         results = []
         for item in response.results:
-            results.append({
-                "title": item.title if hasattr(item, 'title') else "",
-                "url": item.url if hasattr(item, 'url') else "",
-                "snippet": item.text if hasattr(item, 'text') else "",
-            })
         logger.info(f"Exa search successful: {len(results)} results")
@@ -187,6 +195,7 @@ def exa_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
 # Unified Search with Fallback
 # ============================================================================
 def search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
     """
     Unified search function with automatic fallback.

 # Tavily Search Implementation
 # ============================================================================
 @retry(
     stop=stop_after_attempt(MAX_RETRIES),
     wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
         # Extract and structure results
         results = []
         for item in response.get("results", []):
+            results.append(
+                {
+                    "title": item.get("title", ""),
+                    "url": item.get("url", ""),
+                    "snippet": item.get("content", ""),
+                }
+            )
         logger.info(f"Tavily search successful: {len(results)} results")
 # Exa Search Implementation
 # ============================================================================
 @retry(
     stop=stop_after_attempt(MAX_RETRIES),
     wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
         logger.info(f"Exa search: query='{query}', max_results={max_results}")
         client = Exa(api_key=api_key)
+        response = client.search(
+            query=query, num_results=max_results, use_autoprompt=True
+        )
         # Extract and structure results
         results = []
         for item in response.results:
+            results.append(
+                {
+                    "title": item.title if hasattr(item, "title") else "",
+                    "url": item.url if hasattr(item, "url") else "",
+                    "snippet": item.text if hasattr(item, "text") else "",
+                }
+            )
         logger.info(f"Exa search successful: {len(results)} results")
 # Unified Search with Fallback
 # ============================================================================
 def search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
     """
     Unified search function with automatic fallback.

{tests → test}/README.md RENAMED

File without changes

{tests → test}/__init__.py RENAMED

File without changes

{tests → test}/fixtures/generate_fixtures.py RENAMED

File without changes

{tests → test}/fixtures/sample.csv RENAMED

File without changes

{tests → test}/fixtures/sample.docx RENAMED

File without changes

{tests → test}/fixtures/sample.txt RENAMED

File without changes

{tests → test}/fixtures/sample.xlsx RENAMED

File without changes

{tests → test}/fixtures/test_image.jpg RENAMED

File without changes

{tests → test}/test_agent_basic.py RENAMED

File without changes

{tests → test}/test_calculator.py RENAMED

File without changes

{tests → test}/test_file_parser.py RENAMED

File without changes

{tests → test}/test_stage1.py RENAMED

File without changes

{tests → test}/test_vision.py RENAMED

File without changes

{tests → test}/test_web_search.py RENAMED

File without changes