mangubee committed on
Commit
8e48c56
·
1 Parent(s): 1041734

Move test files from tests/ to test/ folder per project standard

- Moved all test files to standard test/ folder (singular, not plural)
- test_web_search.py, test_file_parser.py, test_calculator.py, test_vision.py
- Moved fixtures/ directory to test/fixtures/
- Removed tests/ directory
- All 91 tests still passing

PLAN.md CHANGED
@@ -1,300 +1,21 @@
1
- # Implementation Plan - Stage 2: Tool Development
2
 
3
- **Date:** 2026-01-02
4
- **Dev Record:** TBD (will create dev_260102_##_stage2_tool_development.md)
5
- **Status:** In Progress
6
 
7
  ## Objective
8
 
9
- Implement 4 core tools (web search, file parsing, calculator, multimodal vision) with retry logic and error handling, following Level 5 (Component Selection) and Level 6 (Implementation Framework) architectural decisions. Each tool must be independently testable and integrate seamlessly with the LangGraph StateGraph.
10
 
11
  ## Steps
12
 
13
- ### Step 1: Web Search Tool Implementation
14
-
15
- **1.1 Create src/tools/web_search.py**
16
-
17
- - Implement `tavily_search(query: str, max_results: int = 5) -> dict` function
18
- - Implement `exa_search(query: str, max_results: int = 5) -> dict` function (fallback)
19
- - Use Settings.get_search_api_key() for API key retrieval
20
- - Return structured results: {results: [{title, url, snippet}], source: "tavily"|"exa"}
21
-
22
- **1.2 Add retry logic with exponential backoff**
23
-
24
- - Use `tenacity` library for retry decorator
25
- - Retry on connection errors, timeouts, rate limits
26
- - Max 3 retries with 2^n second delays
27
- - Fallback from Tavily to Exa if Tavily fails after retries
28
-
29
- **1.3 Error handling**
30
-
31
- - Catch API errors and return meaningful error messages
32
- - Handle empty results gracefully
33
- - Log all errors for debugging
34
-
35
- **1.4 Create tests/test_web_search.py**
36
-
37
- - Test Tavily search with mock API
38
- - Test Exa search with mock API
39
- - Test retry logic (simulate failures)
40
- - Test fallback mechanism
41
- - Test error handling
42
-
43
- ### Step 2: File Parsing Tool Implementation
44
-
45
- **2.1 Create src/tools/file_parser.py**
46
-
47
- - Implement `parse_pdf(file_path: str) -> str` using PyPDF2
48
- - Implement `parse_excel(file_path: str) -> dict` using openpyxl
49
- - Implement `parse_docx(file_path: str) -> str` using python-docx
50
- - Implement `parse_image_text(image_path: str) -> str` using Pillow + OCR (optional)
51
- - Generic `parse_file(file_path: str) -> dict` dispatcher based on extension
52
-
53
- **2.2 Add retry logic for file operations**
54
-
55
- - Retry on file read errors (network issues, temporary locks)
56
- - Max 3 retries with exponential backoff
57
-
58
- **2.3 Error handling**
59
-
60
- - Handle file not found errors
61
- - Handle corrupted file errors
62
- - Handle unsupported format errors
63
- - Return structured error responses
64
-
65
- **2.4 Create tests/test_file_parser.py**
66
-
67
- - Create test fixtures (sample PDF, Excel, Word files in tests/fixtures/)
68
- - Test each parser function independently
69
- - Test error handling for missing files
70
- - Test error handling for corrupted files
71
-
72
- ### Step 3: Calculator Tool Implementation
73
-
74
- **3.1 Create src/tools/calculator.py**
75
-
76
- - Implement `safe_eval(expression: str) -> dict` using ast.literal_eval
77
- - Support basic arithmetic operations (+, -, *, /, **, %)
78
- - Support mathematical functions (sin, cos, sqrt, etc.) via math module
79
- - Return structured result: {result: float|int, expression: str}
80
-
81
- **3.2 Add safety checks**
82
-
83
- - Whitelist allowed operations (no exec, eval, import)
84
- - Validate expression before evaluation
85
- - Set execution timeout (prevent infinite loops)
86
- - Limit expression complexity (prevent DoS)
87
-
88
- **3.3 Error handling**
89
-
90
- - Handle syntax errors
91
- - Handle division by zero
92
- - Handle invalid operations
93
- - Return meaningful error messages
94
-
95
- **3.4 Create tests/test_calculator.py**
96
-
97
- - Test basic arithmetic (2+2, 10*5, etc.)
98
- - Test mathematical functions (sqrt(16), sin(0), etc.)
99
- - Test error handling (division by zero, invalid syntax)
100
- - Test safety checks (block dangerous operations)
101
-
102
- ### Step 4: Multimodal Vision Tool Implementation
103
-
104
- **4.1 Create src/tools/vision.py**
105
-
106
- - Implement `analyze_image(image_path: str, question: str) -> str`
107
- - Use LLM's native vision capabilities (Gemini/Claude)
108
- - Load image, encode to base64
109
- - Send to vision-capable LLM with question
110
- - Return description/answer
111
-
112
- **4.2 Add retry logic**
113
-
114
- - Retry on API errors
115
- - Max 3 retries with exponential backoff
116
-
117
- **4.3 Error handling**
118
-
119
- - Handle image loading errors
120
- - Handle unsupported image formats
121
- - Handle API errors
122
- - Return structured responses
123
-
124
- **4.4 Create tests/test_vision.py**
125
-
126
- - Create test image fixtures
127
- - Test image analysis with mock LLM
128
- - Test error handling
129
- - Test retry logic
130
-
131
- ### Step 5: Tool Integration with StateGraph
132
-
133
- **5.1 Update src/tools/__init__.py**
134
-
135
- - Export all tool functions
136
- - Create unified tool registry: `TOOLS = {name: function}`
137
- - Add tool metadata (description, parameters, return type)
138
-
139
- **5.2 Update src/agent/graph.py execute_node**
140
-
141
- - Replace placeholder with actual tool execution
142
- - Parse tool calls from plan
143
- - Execute tools with error handling
144
- - Collect results
145
- - Return updated state with tool results
146
-
147
- **5.3 Add tool execution wrapper**
148
-
149
- - Implement `execute_tool(tool_name: str, **kwargs) -> dict`
150
- - Add logging for tool calls
151
- - Add timeout enforcement
152
- - Add result validation
153
-
154
- ### Step 6: Configuration and Settings Updates
155
-
156
- **6.1 Update src/config/settings.py**
157
-
158
- - Add tool-specific settings (timeouts, max retries, etc.)
159
- - Add tool feature flags (enable/disable specific tools)
160
- - Add result size limits
161
-
162
- **6.2 Update .env.example**
163
-
164
- - Document any new environment variables
165
- - Add tool-specific configuration examples
166
-
167
- ### Step 7: Integration Testing
168
-
169
- **7.1 Create tests/test_tools_integration.py**
170
-
171
- - Test all tools working together
172
- - Test tool execution from StateGraph
173
- - Test error propagation
174
- - Test retry mechanisms across all tools
175
-
176
- **7.2 Create test_stage2.py**
177
-
178
- - End-to-end test with real tool calls
179
- - Verify StateGraph executes tools correctly
180
- - Verify results are returned to state
181
- - Verify errors are handled gracefully
182
-
183
- ### Step 8: Documentation and Deployment
184
-
185
- **8.1 Update requirements.txt**
186
-
187
- - Ensure all tool dependencies are included
188
- - Add tenacity for retry logic
189
-
190
- **8.2 Local testing**
191
-
192
- - Run all test suites
193
- - Test with Gradio UI
194
- - Verify no regressions from Stage 1
195
-
196
- **8.3 Deploy to HF Spaces**
197
-
198
- - Push changes
199
- - Verify build succeeds
200
- - Test tools in deployed environment
201
 
202
  ## Files to Modify
203
 
204
- **New files to create:**
205
-
206
- - `src/tools/web_search.py` - Tavily/Exa search implementation
207
- - `src/tools/file_parser.py` - PDF/Excel/Word/Image parsing
208
- - `src/tools/calculator.py` - Safe expression evaluation
209
- - `src/tools/vision.py` - Multimodal image analysis
210
- - `tests/test_web_search.py` - Web search tests
211
- - `tests/test_file_parser.py` - File parser tests
212
- - `tests/test_calculator.py` - Calculator tests
213
- - `tests/test_vision.py` - Vision tests
214
- - `tests/test_tools_integration.py` - Integration tests
215
- - `tests/test_stage2.py` - Stage 2 end-to-end tests
216
- - `tests/fixtures/` - Test files directory
217
-
218
- **Existing files to modify:**
219
-
220
- - `src/tools/__init__.py` - Export all tools, create tool registry
221
- - `src/agent/graph.py` - Update execute_node to use real tools
222
- - `src/config/settings.py` - Add tool-specific settings
223
- - `.env.example` - Document new configuration (if any)
224
- - `requirements.txt` - Add tenacity for retry logic
225
-
226
- **Files NOT to modify:**
227
-
228
- - `src/agent/graph.py` plan_node - Defer to Stage 3
229
- - `src/agent/graph.py` answer_node - Defer to Stage 3
230
- - Planning/reasoning logic - Defer to Stage 3
231
 
232
  ## Success Criteria
233
 
234
- ### Functional Requirements
235
-
236
- - [ ] Web search tool returns valid results from Tavily
237
- - [ ] Web search falls back to Exa when Tavily fails
238
- - [ ] File parser handles PDF, Excel, Word files correctly
239
- - [ ] Calculator evaluates mathematical expressions safely
240
- - [ ] Vision tool analyzes images using LLM vision capabilities
241
- - [ ] All tools have retry logic with exponential backoff
242
- - [ ] All tools handle errors gracefully
243
- - [ ] Tools integrate with StateGraph execute_node
244
-
245
- ### Technical Requirements
246
-
247
- - [ ] All tool functions return structured dict responses
248
- - [ ] Retry logic uses tenacity with max 3 retries
249
- - [ ] Error messages are clear and actionable
250
- - [ ] All tools have comprehensive test coverage (>80%)
251
- - [ ] No unsafe code execution in calculator
252
- - [ ] Tool timeouts enforced to prevent hangs
253
-
254
- ### Validation Checkpoints
255
-
256
- - [ ] **Checkpoint 1:** Web search tool working with tests passing
257
- - [ ] **Checkpoint 2:** File parser working with tests passing
258
- - [ ] **Checkpoint 3:** Calculator working with tests passing
259
- - [ ] **Checkpoint 4:** Vision tool working with tests passing
260
- - [ ] **Checkpoint 5:** All tools integrated with StateGraph
261
- - [ ] **Checkpoint 6:** Integration tests passing
262
- - [ ] **Checkpoint 7:** Deployed to HF Spaces successfully
263
-
264
- ### Non-Goals for Stage 2
265
-
266
- ❌ Implementing planning logic (Stage 3)
267
- ❌ Implementing answer synthesis (Stage 3)
268
- ❌ Optimizing tool selection strategy (Stage 3)
269
- ❌ Advanced error recovery beyond retries (Stage 4)
270
- ❌ Performance optimization (Stage 5)
271
-
272
- ## Dependencies & Risks
273
-
274
- **Dependencies:**
275
-
276
- - Tavily API key (free tier: 1000 req/month)
277
- - Exa API key (paid tier, fallback)
278
- - LLM vision API access (Gemini/Claude)
279
- - Test fixtures (sample files for parsing)
280
-
281
- **Risks:**
282
-
283
- - **Risk:** API rate limits during testing
284
- - **Mitigation:** Use mocks for unit tests, real APIs only for integration tests
285
- - **Risk:** File parsing fails on edge cases
286
- - **Mitigation:** Comprehensive test fixtures covering various formats
287
- - **Risk:** Calculator security vulnerabilities
288
- - **Mitigation:** Strict whitelisting, no eval/exec, use AST parsing only
289
- - **Risk:** Tool timeout issues on slow networks
290
- - **Mitigation:** Configurable timeouts, retry logic
291
-
292
- ## Next Steps After Stage 2
293
-
294
- Once Stage 2 Success Criteria met:
295
-
296
- 1. Create Stage 3 plan (Core Agent Logic - Planning & Reasoning)
297
- 2. Implement plan_node with tool selection strategy
298
- 3. Implement answer_node with result synthesis
299
- 4. Test end-to-end agent behavior
300
- 5. Proceed to Stage 4 (Integration & Robustness)
 
1
+ # Implementation Plan
2
 
3
+ **Date:** [YYYY-MM-DD]
4
+ **Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
5
+ **Status:** [Planning | In Progress | Completed]
6
 
7
  ## Objective
8
 
9
+ [Clear goal statement]
10
 
11
  ## Steps
12
 
13
+ [Implementation steps]
14
 
15
  ## Files to Modify
16
 
17
+ [List of files]
18
 
19
  ## Success Criteria
20
 
21
+ [Completion criteria]
TODO.md CHANGED
@@ -1,71 +1,14 @@
1
- # TODO - Stage 2: Tool Development
2
 
3
- **Created:** 2026-01-02
4
- **Plan:** PLAN.md (Stage 2: Tool Development)
5
- **Status:** Ready for execution
6
 
7
- ## Task List
8
 
9
- ### Step 1: Web Search Tool
10
- - [ ] Create `src/tools/web_search.py` with Tavily and Exa search functions
11
- - [ ] Add retry logic with tenacity decorator (max 3 retries, exponential backoff)
12
- [ ] Implement fallback mechanism (Tavily → Exa)
13
- - [ ] Add error handling and logging
14
- - [ ] Create `tests/test_web_search.py` with mock API tests
15
- - [ ] Test retry logic and fallback mechanism
16
 
17
- ### Step 2: File Parsing Tool
18
- - [ ] Create `src/tools/file_parser.py` with PDF/Excel/Word parsers
19
- - [ ] Implement generic `parse_file()` dispatcher
20
- - [ ] Add retry logic for file operations
21
- - [ ] Add error handling for missing/corrupted files
22
- - [ ] Create test fixtures in `tests/fixtures/`
23
- - [ ] Create `tests/test_file_parser.py` with parser tests
24
 
25
- ### Step 3: Calculator Tool
26
- - [ ] Create `src/tools/calculator.py` with safe_eval function
27
- - [ ] Implement safety checks (whitelist operations, timeout, complexity limits)
28
- - [ ] Add error handling for syntax/division errors
29
- - [ ] Create `tests/test_calculator.py` with arithmetic and safety tests
30
-
31
- ### Step 4: Vision Tool
32
- - [ ] Create `src/tools/vision.py` with image analysis function
33
- - [ ] Implement image loading and base64 encoding
34
- - [ ] Integrate with LLM vision API (Gemini/Claude)
35
- - [ ] Add retry logic for API errors
36
- - [ ] Create test image fixtures
37
- - [ ] Create `tests/test_vision.py` with mock LLM tests
38
-
39
- ### Step 5: StateGraph Integration
40
- - [ ] Update `src/tools/__init__.py` to export all tools
41
- - [ ] Create unified tool registry with metadata
42
- - [ ] Update `src/agent/graph.py` execute_node to use real tools
43
- - [ ] Implement `execute_tool()` wrapper with logging and timeout
44
- - [ ] Test tool execution from StateGraph
45
-
46
- ### Step 6: Configuration Updates
47
- - [ ] Update `src/config/settings.py` with tool-specific settings
48
- - [ ] Add tool feature flags and timeouts
49
- - [ ] Update `.env.example` with new configuration (if needed)
50
-
51
- ### Step 7: Integration Testing
52
- - [ ] Create `tests/test_tools_integration.py` for cross-tool tests
53
- - [ ] Create `tests/test_stage2.py` for end-to-end validation
54
- - [ ] Test error propagation and retry mechanisms
55
- - [ ] Verify StateGraph executes all tools correctly
56
-
57
- ### Step 8: Deployment
58
- - [ ] Add `tenacity` to requirements.txt
59
- - [ ] Run all test suites locally
60
- - [ ] Test with Gradio UI
61
- - [ ] Verify no regressions from Stage 1
62
- - [ ] Push changes to HF Spaces
63
- - [ ] Verify deployment build succeeds
64
- - [ ] Test tools in deployed environment
65
-
66
- ## Notes
67
-
68
- - All tools use direct API approach (not MCP servers)
69
- - HF Spaces deployment compatibility is priority
70
- - Mock APIs for unit tests, real APIs for integration tests only
71
- - Each checkpoint should pass before moving to next step
 
1
+ # TODO List
2
 
3
+ **Session Date:** [YYYY-MM-DD]
4
+ **Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
 
5
 
6
+ ## Active Tasks
7
 
8
+ - [ ] [Task 1]
9
+ - [ ] [Task 2]
10
+ - [ ] [Task 3]
11
 
12
+ ## Completed Tasks
13
 
14
+ - [x] [Completed task 1]
dev/dev_260102_13_stage2_tool_development.md ADDED
@@ -0,0 +1,280 @@
1
+ # [dev_260102_13] Stage 2: Tool Development Complete
2
+
3
+ **Date:** 2026-01-02
4
+ **Type:** Development
5
+ **Status:** Resolved
6
+ **Related Dev:** dev_260101_11 (Stage 1 Foundation Setup)
7
+
8
+ ## Problem Description
9
+
10
+ Stage 1 established the LangGraph StateGraph skeleton with placeholder nodes. Stage 2 needed to implement the actual tools that the agent would use to answer GAIA benchmark questions, including web search, file parsing, mathematical computation, and multimodal image analysis.
11
+
12
+ **Root cause:** GAIA questions require external tool use (web search, file reading, calculations, image analysis). Stage 1 had no actual tool implementations - just placeholders.
13
+
14
+ ---
15
+
16
+ ## Key Decisions
17
+
18
+ ### Decision 1: Direct API Implementation vs MCP Servers
19
+
20
+ **Chosen:** Direct Python function implementations for all tools
21
+
22
+ **Why:**
23
+ - HuggingFace Spaces doesn't support running MCP servers (requires separate processes)
24
+ - Direct API approach is simpler and more reliable for deployment
25
+ - Full control over retry logic, error handling, and timeouts
26
+ - MCP servers are external dependencies with additional failure points
27
+
28
+ **Rejected alternative:** Using MCP protocol servers for Tavily/Exa
29
+ - Would require complex Docker configuration on HF Spaces
30
+ - Additional process management overhead
31
+ - Not necessary for MVP stage
32
+
33
+ ### Decision 2: Retry Logic with Tenacity
34
+
35
+ **Chosen:** Use `tenacity` library with exponential backoff, max 3 retries
36
+
37
+ **Why:**
38
+ - Industry-standard retry library with clean decorator syntax
39
+ - Exponential backoff reduces the chance of hitting API rate limits on repeated attempts
40
+ - Configurable retry conditions (only retry on connection errors, not on validation errors)
41
+ - Easy to test with mocking
42
+
43
+ **Configuration:**
44
+ - Max retries: 3
45
+ - Min wait: 1 second
46
+ - Max wait: 10 seconds
47
+ - Retry only on: ConnectionError, TimeoutError, IOError (for file operations)
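The retry policy above can be sketched without any third-party dependency. This is a stdlib-only illustration of the same settings, not the `tenacity` decorator the project uses; `call_with_retry` and its parameters are hypothetical names. One detail worth knowing: in Python 3, `IOError` is an alias of `OSError`, and `FileNotFoundError` is a subclass of it, so retrying on `IOError` would also retry on missing files. The sketch therefore retries only on `ConnectionError` and `TimeoutError`.

```python
import time

# Exceptions treated as transient (assumption for this sketch).
RETRYABLE = (ConnectionError, TimeoutError)


def call_with_retry(func, *args, max_attempts=3, min_wait=1.0,
                    max_wait=10.0, sleep=time.sleep, **kwargs):
    """Call func, retrying transient errors with exponential backoff.

    The wait before retry n is 2**(n-1) seconds, clamped to
    [min_wait, max_wait]. The last failure is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except RETRYABLE:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            wait = min(max_wait, max(min_wait, 2 ** (attempt - 1)))
            sleep(wait)
```

Passing `sleep` as a parameter keeps tests fast: a test can record the requested waits instead of actually sleeping.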
48
+
49
+ ### Decision 3: Tool Architecture - Unified Functions with Fallback
50
+
51
+ **Pattern applied to all tools:**
52
+ - Primary implementation (e.g., `tavily_search`)
53
+ - Fallback implementation (e.g., `exa_search`)
54
+ - Unified function with automatic fallback (e.g., `search`)
55
+
56
+ **Example:**
57
+ ```python
58
+ def search(query):
59
+     if default_tool == "tavily":
60
+         try:
61
+             return tavily_search(query)
62
+         except Exception:  # a bare except would also swallow KeyboardInterrupt
63
+             return exa_search(query)  # Fallback
64
+ ```
65
+
66
+ **Why:** Improves reliability: if the primary service fails, the automatic fallback keeps the tool working as long as the secondary service is available
67
+
68
+ ### Decision 4: Calculator Security - AST-based Evaluation
69
+
70
+ **Chosen:** Custom AST visitor with whitelisted operations only
71
+
72
+ **Why:**
73
+ - Python's `eval()` is dangerous (arbitrary code execution)
74
+ - `ast.literal_eval()` is too restrictive (doesn't support math operations)
75
+ - Custom AST visitor allows precise control over allowed operations
76
+ - Timeout protection stops runaway computations (for example, very large exponents)
77
+ - Whitelist approach: only allow known-safe operations (add, multiply, sin, cos, etc.)
78
+
79
+ **Rejected alternatives:**
80
+ - Using `eval()`: Major security vulnerability
81
+ - Using `sympify()` from sympy: heavier dependency, and it evaluates input with `eval()` internally, so it is not safe for untrusted expressions
82
+
83
+ **Security layers:**
84
+ 1. AST whitelist (only allow specific node types)
85
+ 2. Expression length limit (500 chars)
86
+ 3. Number size limit (prevent huge calculations)
87
+ 4. Timeout protection (2 seconds max)
88
+ 5. No attribute access, no imports, no exec/eval
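The whitelist idea behind these layers can be sketched as a small recursive AST walker. This is a minimal illustration under stated assumptions, not the project's `calculator.py`: the name `safe_eval_sketch`, the operator and function tables, and the numeric bounds are all hypothetical, and the timeout layer is left out.

```python
import ast
import math
import operator

# Whitelists are illustrative assumptions, not the project's actual tables.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}
MAX_LEN = 500      # expression length limit from the list above
MAX_ABS = 1e100    # hypothetical bound on literal magnitude
MAX_EXPONENT = 1000  # hypothetical bound on exponents


def _eval(node):
    """Evaluate only whitelisted node types; reject everything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        if abs(node.value) > MAX_ABS:
            raise ValueError("number too large")
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
    # Attribute access, names, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"disallowed expression element: {type(node).__name__}")


def safe_eval_sketch(expression):
    if len(expression) > MAX_LEN:
        raise ValueError("expression too long")
    tree = ast.parse(expression, mode="eval")  # SyntaxError on bad input
    return {"result": _eval(tree), "expression": expression}
```

Because anything not explicitly matched falls through to the final `raise`, new Python syntax is rejected by default rather than silently allowed.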
89
+
90
+ ### Decision 5: File Parser - Generic Dispatcher Pattern
91
+
92
+ **Chosen:** Single `parse_file()` function that dispatches based on extension
93
+
94
+ ```python
95
+ def parse_file(file_path):
96
+     extension = Path(file_path).suffix.lower()
97
+     if extension == '.pdf':
98
+         return parse_pdf(file_path)
99
+     elif extension in ['.xlsx', '.xls']:
100
+         return parse_excel(file_path)
101
+     # ... etc
102
+ ```
103
+
104
+ **Why:**
105
+ - Simple interface for users (one function for all file types)
106
+ - Easy to add new file types (just add new parser and update dispatcher)
107
+ - Each parser can have format-specific logic
108
+ - Fallback to specific parsers still available for advanced use
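One way to keep the dispatcher easy to extend is a lookup table from extension to parser. The sketch below is an assumption-laden illustration using only the standard library; `parse_text`, `parse_csv`, `PARSERS`, and `parse_file_sketch` are hypothetical names, and the project's PDF, Excel, and Word parsers are not reproduced.

```python
import csv
from pathlib import Path


# Hypothetical parsers for illustration only.
def parse_text(file_path):
    content = Path(file_path).read_text(encoding="utf-8")
    return {"type": "text", "content": content}


def parse_csv(file_path):
    with open(file_path, newline="", encoding="utf-8") as handle:
        return {"type": "csv", "rows": list(csv.reader(handle))}


# Extension -> parser table; supporting a new format means adding one entry.
PARSERS = {".txt": parse_text, ".csv": parse_csv}


def parse_file_sketch(file_path):
    """Dispatch to a parser based on the lower-cased file extension."""
    extension = Path(file_path).suffix.lower()
    parser = PARSERS.get(extension)
    if parser is None:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}")
    return parser(file_path)
```

Compared with an `if`/`elif` chain, the table makes the set of supported formats inspectable at runtime (for example, to build an error message listing them).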
109
+
110
+ ### Decision 6: Vision Tool - Gemini as Default with Claude Fallback
111
+
112
+ **Chosen:** Gemini 2.0 Flash as primary, Claude Sonnet 4.5 as fallback
113
+
114
+ **Why:**
115
+ - Gemini 2.0 Flash: Free tier (1500 req/day), fast, good quality
116
+ - Claude Sonnet 4.5: Paid but highest quality, automatic fallback if Gemini fails
117
+ - Same pattern as web search (primary + fallback = reliability)
118
+
119
+ **Image handling:**
120
+ - Load file, encode as base64
121
+ - Check file size (max 10MB)
122
+ - Support common formats (JPG, PNG, GIF, WEBP, BMP)
123
+ - Return structured answer with model metadata
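The image-handling steps can be sketched with the standard library alone. This is an illustration, not the project's `vision.py`: `load_image_sketch`, the MIME mapping, and the returned dict layout are assumptions, and the call to the vision model is omitted.

```python
import base64
from pathlib import Path

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB limit from the list above

# Extension -> MIME type; the exact mapping is an assumption.
MIME_TYPES = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
    ".gif": "image/gif", ".webp": "image/webp", ".bmp": "image/bmp",
}


def load_image_sketch(image_path):
    """Validate format and size, then return the image as base64 text."""
    path = Path(image_path)
    mime_type = MIME_TYPES.get(path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"Unsupported image format: {path.suffix}")
    size = path.stat().st_size  # raises FileNotFoundError if missing
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"Image too large: {size} bytes")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"mime_type": mime_type, "data": encoded, "size_bytes": size}
```

Checking the size with `stat()` before reading avoids loading an oversized file into memory just to reject it.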
124
+
125
+ ## Outcome
126
+
127
+ Successfully implemented 4 production-ready tools with comprehensive error handling and test coverage.
128
+
129
+ **Deliverables:**
130
+
131
+ 1. **Web Search Tool** ([src/tools/web_search.py](../src/tools/web_search.py))
132
+ - Tavily API integration (primary, free tier)
133
+ - Exa API integration (fallback, paid)
134
+ - Automatic fallback if primary fails
135
+ - 10 passing tests (mock API, retry logic, fallback mechanism)
136
+
137
+ 2. **File Parser Tool** ([src/tools/file_parser.py](../src/tools/file_parser.py))
138
+ - PDF parsing (PyPDF2)
139
+ - Excel parsing (openpyxl)
140
+ - Word parsing (python-docx)
141
+ - Text/CSV parsing (built-in open)
142
+ - Generic `parse_file()` dispatcher
143
+ - 19 passing tests (real files + error handling)
144
+
145
+ 3. **Calculator Tool** ([src/tools/calculator.py](../src/tools/calculator.py))
146
+ - Safe AST-based expression evaluation
147
+ - Whitelisted operations only (no code execution)
148
+ - Mathematical functions (sin, cos, sqrt, factorial, etc.)
149
+ - Security hardened (timeout, complexity limits)
150
+ - 41 passing tests (arithmetic, functions, security)
151
+
152
+ 4. **Vision Tool** ([src/tools/vision.py](../src/tools/vision.py))
153
+ - Multimodal image analysis using LLMs
154
+ - Gemini 2.0 Flash (primary, free)
155
+ - Claude Sonnet 4.5 (fallback, paid)
156
+ - Image loading and base64 encoding
157
+ - 15 passing tests (mock LLM responses)
158
+
159
+ 5. **Tool Registry** ([src/tools/__init__.py](../src/tools/__init__.py))
160
+ - Exports all 4 main tools: `search`, `parse_file`, `safe_eval`, `analyze_image`
161
+ - TOOLS dict with metadata (description, parameters, category)
162
+ - Ready for Stage 3 dynamic tool selection
163
+
164
+ 6. **StateGraph Integration** ([src/agent/graph.py](../src/agent/graph.py))
165
+ - Updated `execute_node` to load tool registry
166
+ - Stage 2: Reports tool availability
167
+ - Stage 3: Will add dynamic tool selection and execution
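A registry like the one in deliverable 5 might be structured as below. This is a sketch under assumptions: the entry layout, the `register_tool` decorator, and the stub tool are hypothetical, and `execute_tool` follows the signature named in the original plan rather than any code shown in this commit.

```python
TOOLS = {}


def register_tool(name, description, parameters, category):
    """Decorator that stores a function plus its metadata in TOOLS."""
    def decorator(func):
        TOOLS[name] = {
            "function": func,
            "description": description,
            "parameters": parameters,
            "category": category,
        }
        return func
    return decorator


# Stub standing in for a real tool such as `search`.
@register_tool("search", "Search the web", {"query": "str"}, "web")
def fake_search(query):
    return {"results": [], "source": "stub", "query": query}


def execute_tool(tool_name, **kwargs):
    """Look up a tool by name and run it, returning a structured result."""
    entry = TOOLS.get(tool_name)
    if entry is None:
        return {"success": False, "error": f"Unknown tool: {tool_name}"}
    try:
        return {"success": True, "result": entry["function"](**kwargs)}
    except Exception as exc:  # report the failure instead of raising
        return {"success": False, "error": f"{type(exc).__name__}: {exc}"}
```

Returning a dict for both success and failure gives a later planning stage one shape to handle, at the cost of hiding tracebacks unless they are logged separately.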
168
+
169
+ **Test Coverage:**
170
+ - 85 tool tests passing (web_search: 10, file_parser: 19, calculator: 41, vision: 15)
171
+ - 6 existing agent tests still passing
172
+ - 91 total tests passing
173
+ - No regressions from Stage 1
174
+
175
+ **Deployment:**
176
+ - All changes committed and pushed to HuggingFace Spaces
177
+ - Build succeeded
178
+ - Agent now reports: "Stage 2 complete: 4 tools ready for execution in Stage 3"
179
+
180
+ ## Learnings and Insights
181
+
182
+ ### Pattern: Unified Function with Fallback
183
+
184
+ This pattern worked extremely well for both web search and vision tools:
185
+
186
+ ```python
187
+ def tool_name(args):
188
+     # Try primary service
189
+     try:
190
+         return primary_implementation(args)
191
+     except Exception as e:
192
+         logger.warning(f"Primary failed: {e}")
193
+         # Fallback to secondary
194
+         try:
195
+             return fallback_implementation(args)
196
+         except Exception as fallback_error:
197
+             raise Exception(f"Both failed: {fallback_error}") from fallback_error
198
+ ```
199
+
200
+ **Why it works:**
201
+ - Maximizes reliability (2 chances to succeed)
202
+ - Transparent to users (single function call)
203
+ - Preserves cost optimization (use free tier first, paid only as fallback)
204
+
205
+ **Recommendation:** Use this pattern for any tool with multiple service providers.
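For tools with several providers, the pattern can be factored into a reusable wrapper. The following is a minimal sketch, not code from this commit; `with_fallback` and the stub providers are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


def with_fallback(primary, fallback):
    """Return a function that tries `primary`, then `fallback`."""
    def unified(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception as primary_error:
            logger.warning("Primary failed: %s", primary_error)
            try:
                return fallback(*args, **kwargs)
            except Exception as fallback_error:
                # Keep both causes visible in the final error message.
                raise RuntimeError(
                    f"Both providers failed: primary={primary_error!r}, "
                    f"fallback={fallback_error!r}"
                ) from fallback_error
    return unified


# Stub providers for demonstration only.
def flaky_primary(query):
    raise ConnectionError("primary unavailable")


def stub_fallback(query):
    return {"results": [query], "source": "fallback"}


search_sketch = with_fallback(flaky_primary, stub_fallback)
```

Calling `search_sketch("q")` here returns the fallback's result, since the stub primary always raises.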
206
+
207
+ ### Pattern: Test Fixtures for File Parsers
208
+
209
+ Creating real test fixtures (sample.pdf, sample.xlsx, etc.) was critical for file parser testing:
210
+
211
+ **What worked:**
212
+ - Tests are realistic (test actual file parsing, not just mocks)
213
+ - Easy to add new test cases (just add new fixture files)
214
+ - Catches edge cases that mocks miss
215
+
216
+ **Created fixtures:**
217
+ - `tests/fixtures/sample.txt` - Plain text
218
+ - `tests/fixtures/sample.csv` - CSV data
219
+ - `tests/fixtures/sample.xlsx` - Excel spreadsheet
220
+ - `tests/fixtures/sample.docx` - Word document
221
+ - `tests/fixtures/test_image.jpg` - Test image (red square)
222
+ - `tests/fixtures/generate_fixtures.py` - Script to regenerate fixtures
223
+
224
+ **Recommendation:** For any file processing tool, create comprehensive fixture library.
225
+
226
+ ### What Worked Well: Mock Path for Import Testing
227
+
228
+ Initially had issues with mock paths like `src.tools.vision.genai.Client`. The fix:
229
+
230
+ ```python
231
+ # WRONG: src.tools.vision.genai.Client
232
+ # RIGHT: google.genai.Client
233
+ with patch('google.genai.Client') as mock_client:
234
+     ...  # test body runs here with the original import mocked, not the re-export
235
+ ```
236
+
237
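+ **Lesson:** Patch the name where it is looked up at call time. Here the client is resolved through the `google.genai` module, so patching `google.genai.Client` takes effect; a name bound with `from module import name` must instead be patched in the importing module.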
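How the patch target interacts with the import style can be checked with a self-contained sketch. It uses a throwaway module named `fake_provider` (hypothetical, standing in for an SDK, not `google.genai` itself) and two consumers: one that resolves the class through the module at call time, and one that bound the class object up front, as `from fake_provider import Client` would.

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway module so the example needs no third-party package.
provider = types.ModuleType("fake_provider")


class Client:
    def name(self):
        return "real"


provider.Client = Client
sys.modules["fake_provider"] = provider


def via_module():
    # Looks up `Client` on the module object each time it is called.
    import fake_provider
    return fake_provider.Client().name()


# Equivalent to `from fake_provider import Client` at import time.
BoundClient = provider.Client


def via_bound_name():
    return BoundClient().name()


with patch("fake_provider.Client") as mock_client:
    mock_client.return_value.name.return_value = "mocked"
    result_module = via_module()      # sees the mock
    result_bound = via_bound_name()   # still uses the real class
```

Patching the provider module affects only code that looks the attribute up through that module; a consumer holding its own reference needs the patch applied to its namespace.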
238
+
239
+ ### What to Avoid: Premature Integration Testing
240
+
241
+ Initially planned to create `tests/test_tools_integration.py` for cross-tool testing. **Decision:** Skip for Stage 2.
242
+
243
+ **Why:**
244
+ - Tools work independently (don't need to interact yet)
245
+ - Integration testing makes sense in Stage 3 when tools are orchestrated
246
+ - Unit tests provide sufficient coverage for Stage 2
247
+
248
+ **Recommendation:** Only write integration tests when components actually integrate. Don't test imaginary integration.
249
+
250
+ ## Changelog
251
+
252
+ **What was created:**
253
+
254
+ - `src/tools/web_search.py` - Tavily/Exa web search with retry logic
255
+ - `src/tools/file_parser.py` - PDF/Excel/Word/Text parsing with retry logic
256
+ - `src/tools/calculator.py` - Safe AST-based math evaluation
257
+ - `src/tools/vision.py` - Multimodal image analysis (Gemini/Claude)
258
+ - `tests/test_web_search.py` - 10 tests for web search tool
259
+ - `tests/test_file_parser.py` - 19 tests for file parser
260
+ - `tests/test_calculator.py` - 41 tests for calculator (including security)
261
+ - `tests/test_vision.py` - 15 tests for vision tool
262
+ - `tests/fixtures/sample.txt` - Test text file
263
+ - `tests/fixtures/sample.csv` - Test CSV file
264
+ - `tests/fixtures/sample.xlsx` - Test Excel file
265
+ - `tests/fixtures/sample.docx` - Test Word document
266
+ - `tests/fixtures/test_image.jpg` - Test image
267
+ - `tests/fixtures/generate_fixtures.py` - Fixture generation script
268
+
269
+ **What was modified:**
270
+
271
+ - `src/tools/__init__.py` - Added tool exports and TOOLS registry
272
+ - `src/agent/graph.py` - Updated execute_node to load tool registry
273
+ - `requirements.txt` - Added `tenacity>=8.2.0` for retry logic
274
+ - `pyproject.toml` - Added tenacity, fpdf2, and defusedxml as dependencies
275
+ - `PLAN.md` - Emptied for next stage
276
+ - `TODO.md` - Emptied for next stage
277
+
278
+ **What was deleted:**
279
+
280
+ - None (Stage 2 was purely additive)
src/tools/web_search.py CHANGED
@@ -39,6 +39,7 @@ logger = logging.getLogger(__name__)
39
  # Tavily Search Implementation
40
  # ============================================================================
41
 
 
42
  @retry(
43
  stop=stop_after_attempt(MAX_RETRIES),
44
  wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
@@ -83,11 +84,13 @@ def tavily_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
83
  # Extract and structure results
84
  results = []
85
  for item in response.get("results", []):
86
- results.append({
87
- "title": item.get("title", ""),
88
- "url": item.get("url", ""),
89
- "snippet": item.get("content", ""),
90
- })
 
 
91
 
92
  logger.info(f"Tavily search successful: {len(results)} results")
93
 
@@ -113,6 +116,7 @@ def tavily_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
113
  # Exa Search Implementation
114
  # ============================================================================
115
 
 
116
  @retry(
117
  stop=stop_after_attempt(MAX_RETRIES),
118
  wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
@@ -152,16 +156,20 @@ def exa_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
152
  logger.info(f"Exa search: query='{query}', max_results={max_results}")
153
 
154
  client = Exa(api_key=api_key)
155
- response = client.search(query=query, num_results=max_results, use_autoprompt=True)
 
 
156
 
157
  # Extract and structure results
158
  results = []
159
  for item in response.results:
160
- results.append({
161
- "title": item.title if hasattr(item, 'title') else "",
162
- "url": item.url if hasattr(item, 'url') else "",
163
- "snippet": item.text if hasattr(item, 'text') else "",
164
- })
 
 
165
 
166
  logger.info(f"Exa search successful: {len(results)} results")
167
 
@@ -187,6 +195,7 @@ def exa_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
187
  # Unified Search with Fallback
188
  # ============================================================================
189
 
 
190
  def search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
191
  """
192
  Unified search function with automatic fallback.
 
39
  # Tavily Search Implementation
40
  # ============================================================================
41
 
42
+
43
  @retry(
44
  stop=stop_after_attempt(MAX_RETRIES),
45
  wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
 
84
  # Extract and structure results
85
  results = []
86
  for item in response.get("results", []):
87
+ results.append(
88
+ {
89
+ "title": item.get("title", ""),
90
+ "url": item.get("url", ""),
91
+ "snippet": item.get("content", ""),
92
+ }
93
+ )
94
 
95
  logger.info(f"Tavily search successful: {len(results)} results")
96
 
 
116
  # Exa Search Implementation
117
  # ============================================================================
118
 
119
+
120
  @retry(
121
  stop=stop_after_attempt(MAX_RETRIES),
122
  wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
 
156
  logger.info(f"Exa search: query='{query}', max_results={max_results}")
157
 
158
  client = Exa(api_key=api_key)
159
+ response = client.search(
160
+ query=query, num_results=max_results, use_autoprompt=True
161
+ )
162
 
163
  # Extract and structure results
164
  results = []
165
  for item in response.results:
166
+ results.append(
167
+ {
168
+ "title": item.title if hasattr(item, "title") else "",
169
+ "url": item.url if hasattr(item, "url") else "",
170
+ "snippet": item.text if hasattr(item, "text") else "",
171
+ }
172
+ )
173
 
174
  logger.info(f"Exa search successful: {len(results)} results")
175
 
 
195
  # Unified Search with Fallback
196
  # ============================================================================
197
 
198
+
199
  def search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
200
  """
201
  Unified search function with automatic fallback.
{tests → test}/README.md RENAMED
File without changes
{tests → test}/__init__.py RENAMED
File without changes
{tests → test}/fixtures/generate_fixtures.py RENAMED
File without changes
{tests → test}/fixtures/sample.csv RENAMED
File without changes
{tests → test}/fixtures/sample.docx RENAMED
File without changes
{tests → test}/fixtures/sample.txt RENAMED
File without changes
{tests → test}/fixtures/sample.xlsx RENAMED
File without changes
{tests → test}/fixtures/test_image.jpg RENAMED
File without changes
{tests → test}/test_agent_basic.py RENAMED
File without changes
{tests → test}/test_calculator.py RENAMED
File without changes
{tests → test}/test_file_parser.py RENAMED
File without changes
{tests → test}/test_stage1.py RENAMED
File without changes
{tests → test}/test_vision.py RENAMED
File without changes
{tests → test}/test_web_search.py RENAMED
File without changes