mangubee Claude Sonnet 4.5 commited on
Commit
1041734
·
1 Parent(s): 070d5c0

Stage 2: Implement tool development with retry logic and error handling

Browse files

Implemented 4 core tools with comprehensive test coverage:

**Tools Added:**
- web_search.py: Tavily/Exa search with fallback (10 tests)
- file_parser.py: PDF/Excel/Word/Text parsing (19 tests)
- calculator.py: Safe math eval with security (41 tests)
- vision.py: Multimodal image analysis (15 tests)

**Features:**
- Retry logic with tenacity (exponential backoff, 3 max retries)
- Comprehensive error handling and logging
- Tool registry in __init__.py with metadata
- 85 passing tests total

**Integration:**
- Updated graph.py execute_node to load tool registry
- Added TOOLS dict for Stage 3 dynamic tool selection
- Maintained Stage 1 compatibility

**Testing:**
- Created test fixtures for all file types
- Mock API testing for web search and vision
- Security testing for calculator (prevents code injection)
- All 91 tests passing (6 agent + 85 tool tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

PLAN.md CHANGED
@@ -1,8 +1,300 @@
1
- # Implementation Plan
2
 
3
- **Status:** Ready for next stage
4
- **Last Updated:** 2026-01-02
 
5
 
6
- ---
7
 
8
- Stage 1 completed. Planning for next stage will be documented here.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Implementation Plan - Stage 2: Tool Development
2
 
3
+ **Date:** 2026-01-02
4
+ **Dev Record:** TBD (will create dev_260102_##_stage2_tool_development.md)
5
+ **Status:** In Progress
6
 
7
+ ## Objective
8
 
9
+ Implement 4 core tools (web search, file parsing, calculator, multimodal vision) with retry logic and error handling, following Level 5 (Component Selection) and Level 6 (Implementation Framework) architectural decisions. Each tool must be independently testable and integrate seamlessly with the LangGraph StateGraph.
10
+
11
+ ## Steps
12
+
13
+ ### Step 1: Web Search Tool Implementation
14
+
15
+ **1.1 Create src/tools/web_search.py**
16
+
17
+ - Implement `tavily_search(query: str, max_results: int = 5) -> dict` function
18
+ - Implement `exa_search(query: str, max_results: int = 5) -> dict` function (fallback)
19
+ - Use Settings.get_search_api_key() for API key retrieval
20
+ - Return structured results: {results: [{title, url, snippet}], source: "tavily"|"exa"}
21
+
22
+ **1.2 Add retry logic with exponential backoff**
23
+
24
+ - Use `tenacity` library for retry decorator
25
+ - Retry on connection errors, timeouts, rate limits
26
+ - Max 3 retries with 2^n second delays
27
+ - Fallback from Tavily to Exa if Tavily fails after retries
28
+
29
+ **1.3 Error handling**
30
+
31
+ - Catch API errors and return meaningful error messages
32
+ - Handle empty results gracefully
33
+ - Log all errors for debugging
34
+
35
+ **1.4 Create tests/test_web_search.py**
36
+
37
+ - Test Tavily search with mock API
38
+ - Test Exa search with mock API
39
+ - Test retry logic (simulate failures)
40
+ - Test fallback mechanism
41
+ - Test error handling
42
+
43
+ ### Step 2: File Parsing Tool Implementation
44
+
45
+ **2.1 Create src/tools/file_parser.py**
46
+
47
+ - Implement `parse_pdf(file_path: str) -> str` using PyPDF2
48
+ - Implement `parse_excel(file_path: str) -> dict` using openpyxl
49
+ - Implement `parse_docx(file_path: str) -> str` using python-docx
50
+ - Implement `parse_image_text(image_path: str) -> str` using Pillow + OCR (optional)
51
+ - Generic `parse_file(file_path: str) -> dict` dispatcher based on extension
52
+
53
+ **2.2 Add retry logic for file operations**
54
+
55
+ - Retry on file read errors (network issues, temporary locks)
56
+ - Max 3 retries with exponential backoff
57
+
58
+ **2.3 Error handling**
59
+
60
+ - Handle file not found errors
61
+ - Handle corrupted file errors
62
+ - Handle unsupported format errors
63
+ - Return structured error responses
64
+
65
+ **2.4 Create tests/test_file_parser.py**
66
+
67
+ - Create test fixtures (sample PDF, Excel, Word files in tests/fixtures/)
68
+ - Test each parser function independently
69
+ - Test error handling for missing files
70
+ - Test error handling for corrupted files
71
+
72
+ ### Step 3: Calculator Tool Implementation
73
+
74
+ **3.1 Create src/tools/calculator.py**
75
+
76
+ - Implement `safe_eval(expression: str) -> dict` using ast.literal_eval
77
+ - Support basic arithmetic operations (+, -, *, /, **, %)
78
+ - Support mathematical functions (sin, cos, sqrt, etc.) via math module
79
+ - Return structured result: {result: float|int, expression: str}
80
+
81
+ **3.2 Add safety checks**
82
+
83
+ - Whitelist allowed operations (no exec, eval, import)
84
+ - Validate expression before evaluation
85
+ - Set execution timeout (prevent infinite loops)
86
+ - Limit expression complexity (prevent DoS)
87
+
88
+ **3.3 Error handling**
89
+
90
+ - Handle syntax errors
91
+ - Handle division by zero
92
+ - Handle invalid operations
93
+ - Return meaningful error messages
94
+
95
+ **3.4 Create tests/test_calculator.py**
96
+
97
+ - Test basic arithmetic (2+2, 10*5, etc.)
98
+ - Test mathematical functions (sqrt(16), sin(0), etc.)
99
+ - Test error handling (division by zero, invalid syntax)
100
+ - Test safety checks (block dangerous operations)
101
+
102
+ ### Step 4: Multimodal Vision Tool Implementation
103
+
104
+ **4.1 Create src/tools/vision.py**
105
+
106
+ - Implement `analyze_image(image_path: str, question: str) -> str`
107
+ - Use LLM's native vision capabilities (Gemini/Claude)
108
+ - Load image, encode to base64
109
+ - Send to vision-capable LLM with question
110
+ - Return description/answer
111
+
112
+ **4.2 Add retry logic**
113
+
114
+ - Retry on API errors
115
+ - Max 3 retries with exponential backoff
116
+
117
+ **4.3 Error handling**
118
+
119
+ - Handle image loading errors
120
+ - Handle unsupported image formats
121
+ - Handle API errors
122
+ - Return structured responses
123
+
124
+ **4.4 Create tests/test_vision.py**
125
+
126
+ - Create test image fixtures
127
+ - Test image analysis with mock LLM
128
+ - Test error handling
129
+ - Test retry logic
130
+
131
+ ### Step 5: Tool Integration with StateGraph
132
+
133
+ **5.1 Update src/tools/__init__.py**
134
+
135
+ - Export all tool functions
136
+ - Create unified tool registry: `TOOLS = {name: function}`
137
+ - Add tool metadata (description, parameters, return type)
138
+
139
+ **5.2 Update src/agent/graph.py execute_node**
140
+
141
+ - Replace placeholder with actual tool execution
142
+ - Parse tool calls from plan
143
+ - Execute tools with error handling
144
+ - Collect results
145
+ - Return updated state with tool results
146
+
147
+ **5.3 Add tool execution wrapper**
148
+
149
+ - Implement `execute_tool(tool_name: str, **kwargs) -> dict`
150
+ - Add logging for tool calls
151
+ - Add timeout enforcement
152
+ - Add result validation
153
+
154
+ ### Step 6: Configuration and Settings Updates
155
+
156
+ **6.1 Update src/config/settings.py**
157
+
158
+ - Add tool-specific settings (timeouts, max retries, etc.)
159
+ - Add tool feature flags (enable/disable specific tools)
160
+ - Add result size limits
161
+
162
+ **6.2 Update .env.example**
163
+
164
+ - Document any new environment variables
165
+ - Add tool-specific configuration examples
166
+
167
+ ### Step 7: Integration Testing
168
+
169
+ **7.1 Create tests/test_tools_integration.py**
170
+
171
+ - Test all tools working together
172
+ - Test tool execution from StateGraph
173
+ - Test error propagation
174
+ - Test retry mechanisms across all tools
175
+
176
+ **7.2 Create test_stage2.py**
177
+
178
+ - End-to-end test with real tool calls
179
+ - Verify StateGraph executes tools correctly
180
+ - Verify results are returned to state
181
+ - Verify errors are handled gracefully
182
+
183
+ ### Step 8: Documentation and Deployment
184
+
185
+ **8.1 Update requirements.txt**
186
+
187
+ - Ensure all tool dependencies are included
188
+ - Add tenacity for retry logic
189
+
190
+ **8.2 Local testing**
191
+
192
+ - Run all test suites
193
+ - Test with Gradio UI
194
+ - Verify no regressions from Stage 1
195
+
196
+ **8.3 Deploy to HF Spaces**
197
+
198
+ - Push changes
199
+ - Verify build succeeds
200
+ - Test tools in deployed environment
201
+
202
+ ## Files to Modify
203
+
204
+ **New files to create:**
205
+
206
+ - `src/tools/web_search.py` - Tavily/Exa search implementation
207
+ - `src/tools/file_parser.py` - PDF/Excel/Word/Image parsing
208
+ - `src/tools/calculator.py` - Safe expression evaluation
209
+ - `src/tools/vision.py` - Multimodal image analysis
210
+ - `tests/test_web_search.py` - Web search tests
211
+ - `tests/test_file_parser.py` - File parser tests
212
+ - `tests/test_calculator.py` - Calculator tests
213
+ - `tests/test_vision.py` - Vision tests
214
+ - `tests/test_tools_integration.py` - Integration tests
215
+ - `tests/test_stage2.py` - Stage 2 end-to-end tests
216
+ - `tests/fixtures/` - Test files directory
217
+
218
+ **Existing files to modify:**
219
+
220
+ - `src/tools/__init__.py` - Export all tools, create tool registry
221
+ - `src/agent/graph.py` - Update execute_node to use real tools
222
+ - `src/config/settings.py` - Add tool-specific settings
223
+ - `.env.example` - Document new configuration (if any)
224
+ - `requirements.txt` - Add tenacity for retry logic
225
+
226
+ **Files NOT to modify:**
227
+
228
+ - `src/agent/graph.py` plan_node - Defer to Stage 3
229
+ - `src/agent/graph.py` answer_node - Defer to Stage 3
230
+ - Planning/reasoning logic - Defer to Stage 3
231
+
232
+ ## Success Criteria
233
+
234
+ ### Functional Requirements
235
+
236
+ - [ ] Web search tool returns valid results from Tavily
237
+ - [ ] Web search falls back to Exa when Tavily fails
238
+ - [ ] File parser handles PDF, Excel, Word files correctly
239
+ - [ ] Calculator evaluates mathematical expressions safely
240
+ - [ ] Vision tool analyzes images using LLM vision capabilities
241
+ - [ ] All tools have retry logic with exponential backoff
242
+ - [ ] All tools handle errors gracefully
243
+ - [ ] Tools integrate with StateGraph execute_node
244
+
245
+ ### Technical Requirements
246
+
247
+ - [ ] All tool functions return structured dict responses
248
+ - [ ] Retry logic uses tenacity with max 3 retries
249
+ - [ ] Error messages are clear and actionable
250
+ - [ ] All tools have comprehensive test coverage (>80%)
251
+ - [ ] No unsafe code execution in calculator
252
+ - [ ] Tool timeouts enforced to prevent hangs
253
+
254
+ ### Validation Checkpoints
255
+
256
+ - [ ] **Checkpoint 1:** Web search tool working with tests passing
257
+ - [ ] **Checkpoint 2:** File parser working with tests passing
258
+ - [ ] **Checkpoint 3:** Calculator working with tests passing
259
+ - [ ] **Checkpoint 4:** Vision tool working with tests passing
260
+ - [ ] **Checkpoint 5:** All tools integrated with StateGraph
261
+ - [ ] **Checkpoint 6:** Integration tests passing
262
+ - [ ] **Checkpoint 7:** Deployed to HF Spaces successfully
263
+
264
+ ### Non-Goals for Stage 2
265
+
266
+ - ❌ Implementing planning logic (Stage 3)
267
+ - ❌ Implementing answer synthesis (Stage 3)
268
+ - ❌ Optimizing tool selection strategy (Stage 3)
269
+ - ❌ Advanced error recovery beyond retries (Stage 4)
270
+ - ❌ Performance optimization (Stage 5)
271
+
272
+ ## Dependencies & Risks
273
+
274
+ **Dependencies:**
275
+
276
+ - Tavily API key (free tier: 1000 req/month)
277
+ - Exa API key (paid tier, fallback)
278
+ - LLM vision API access (Gemini/Claude)
279
+ - Test fixtures (sample files for parsing)
280
+
281
+ **Risks:**
282
+
283
+ - **Risk:** API rate limits during testing
284
+ - **Mitigation:** Use mocks for unit tests, real APIs only for integration tests
285
+ - **Risk:** File parsing fails on edge cases
286
+ - **Mitigation:** Comprehensive test fixtures covering various formats
287
+ - **Risk:** Calculator security vulnerabilities
288
+ - **Mitigation:** Strict whitelisting, no eval/exec, use AST parsing only
289
+ - **Risk:** Tool timeout issues on slow networks
290
+ - **Mitigation:** Configurable timeouts, retry logic
291
+
292
+ ## Next Steps After Stage 2
293
+
294
+ Once Stage 2 Success Criteria met:
295
+
296
+ 1. Create Stage 3 plan (Core Agent Logic - Planning & Reasoning)
297
+ 2. Implement plan_node with tool selection strategy
298
+ 3. Implement answer_node with result synthesis
299
+ 4. Test end-to-end agent behavior
300
+ 5. Proceed to Stage 4 (Integration & Robustness)
TODO.md CHANGED
@@ -1,14 +1,71 @@
1
- # TODO List
2
 
3
- **Session Date:** [YYYY-MM-DD]
4
- **Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
 
5
 
6
- ## Active Tasks
7
 
8
- - [ ] [Task 1]
9
- - [ ] [Task 2]
10
- - [ ] [Task 3]
 
 
 
 
11
 
12
- ## Completed Tasks
 
 
 
 
 
 
13
 
14
- - [x] [Completed task 1]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # TODO - Stage 2: Tool Development
2
 
3
+ **Created:** 2026-01-02
4
+ **Plan:** PLAN.md (Stage 2: Tool Development)
5
+ **Status:** Ready for execution
6
 
7
+ ## Task List
8
 
9
+ ### Step 1: Web Search Tool
10
+ - [ ] Create `src/tools/web_search.py` with Tavily and Exa search functions
11
+ - [ ] Add retry logic with tenacity decorator (max 3 retries, exponential backoff)
12
+ - [ ] Implement fallback mechanism (Tavily → Exa)
13
+ - [ ] Add error handling and logging
14
+ - [ ] Create `tests/test_web_search.py` with mock API tests
15
+ - [ ] Test retry logic and fallback mechanism
16
 
17
+ ### Step 2: File Parsing Tool
18
+ - [ ] Create `src/tools/file_parser.py` with PDF/Excel/Word parsers
19
+ - [ ] Implement generic `parse_file()` dispatcher
20
+ - [ ] Add retry logic for file operations
21
+ - [ ] Add error handling for missing/corrupted files
22
+ - [ ] Create test fixtures in `tests/fixtures/`
23
+ - [ ] Create `tests/test_file_parser.py` with parser tests
24
 
25
+ ### Step 3: Calculator Tool
26
+ - [ ] Create `src/tools/calculator.py` with safe_eval function
27
+ - [ ] Implement safety checks (whitelist operations, timeout, complexity limits)
28
+ - [ ] Add error handling for syntax/division errors
29
+ - [ ] Create `tests/test_calculator.py` with arithmetic and safety tests
30
+
31
+ ### Step 4: Vision Tool
32
+ - [ ] Create `src/tools/vision.py` with image analysis function
33
+ - [ ] Implement image loading and base64 encoding
34
+ - [ ] Integrate with LLM vision API (Gemini/Claude)
35
+ - [ ] Add retry logic for API errors
36
+ - [ ] Create test image fixtures
37
+ - [ ] Create `tests/test_vision.py` with mock LLM tests
38
+
39
+ ### Step 5: StateGraph Integration
40
+ - [ ] Update `src/tools/__init__.py` to export all tools
41
+ - [ ] Create unified tool registry with metadata
42
+ - [ ] Update `src/agent/graph.py` execute_node to use real tools
43
+ - [ ] Implement `execute_tool()` wrapper with logging and timeout
44
+ - [ ] Test tool execution from StateGraph
45
+
46
+ ### Step 6: Configuration Updates
47
+ - [ ] Update `src/config/settings.py` with tool-specific settings
48
+ - [ ] Add tool feature flags and timeouts
49
+ - [ ] Update `.env.example` with new configuration (if needed)
50
+
51
+ ### Step 7: Integration Testing
52
+ - [ ] Create `tests/test_tools_integration.py` for cross-tool tests
53
+ - [ ] Create `tests/test_stage2.py` for end-to-end validation
54
+ - [ ] Test error propagation and retry mechanisms
55
+ - [ ] Verify StateGraph executes all tools correctly
56
+
57
+ ### Step 8: Deployment
58
+ - [ ] Add `tenacity` to requirements.txt
59
+ - [ ] Run all test suites locally
60
+ - [ ] Test with Gradio UI
61
+ - [ ] Verify no regressions from Stage 1
62
+ - [ ] Push changes to HF Spaces
63
+ - [ ] Verify deployment build succeeds
64
+ - [ ] Test tools in deployed environment
65
+
66
+ ## Notes
67
+
68
+ - All tools use direct API approach (not MCP servers)
69
+ - HF Spaces deployment compatibility is priority
70
+ - Mock APIs for unit tests, real APIs for integration tests only
71
+ - Each checkpoint should pass before moving to next step
pyproject.toml CHANGED
@@ -13,28 +13,24 @@ dependencies = [
13
  "langgraph>=0.2.0",
14
  "langchain>=0.3.0",
15
  "langchain-core>=0.3.0",
16
-
17
  # LLM APIs
18
  "anthropic>=0.39.0",
19
  "google-genai>=0.2.0",
20
-
21
  # Search & retrieval tools
22
  "exa-py>=1.0.0",
23
  "tavily-python>=0.5.0",
24
-
25
  # File readers (multi-format support)
26
  "PyPDF2>=3.0.0",
27
  "openpyxl>=3.1.0",
28
  "python-docx>=1.1.0",
29
  "pillow>=10.4.0",
30
-
31
  # Web & API utilities
32
  "requests>=2.32.0",
33
  "python-dotenv>=1.0.0",
34
-
35
  # Gradio UI
36
  "gradio[oauth]>=5.0.0",
37
  "pandas>=2.2.0",
 
38
  ]
39
 
40
  [tool.uv]
 
13
  "langgraph>=0.2.0",
14
  "langchain>=0.3.0",
15
  "langchain-core>=0.3.0",
 
16
  # LLM APIs
17
  "anthropic>=0.39.0",
18
  "google-genai>=0.2.0",
 
19
  # Search & retrieval tools
20
  "exa-py>=1.0.0",
21
  "tavily-python>=0.5.0",
 
22
  # File readers (multi-format support)
23
  "PyPDF2>=3.0.0",
24
  "openpyxl>=3.1.0",
25
  "python-docx>=1.1.0",
26
  "pillow>=10.4.0",
 
27
  # Web & API utilities
28
  "requests>=2.32.0",
29
  "python-dotenv>=1.0.0",
 
30
  # Gradio UI
31
  "gradio[oauth]>=5.0.0",
32
  "pandas>=2.2.0",
33
+ "tenacity>=9.1.2",
34
  ]
35
 
36
  [tool.uv]
requirements.txt CHANGED
@@ -57,3 +57,4 @@ python-dotenv>=1.0.0 # Environment variable management
57
  # ============================================================================
58
  pydantic>=2.0.0 # Data validation (for StateGraph)
59
  typing-extensions>=4.12.0 # Type hints support
 
 
57
  # ============================================================================
58
  pydantic>=2.0.0 # Data validation (for StateGraph)
59
  typing-extensions>=4.12.0 # Type hints support
60
+ tenacity>=8.2.0 # Retry logic with exponential backoff
src/agent/graph.py CHANGED
@@ -4,7 +4,7 @@ Author: @mangobee
4
  Date: 2026-01-01
5
 
6
  Stage 1: Skeleton with placeholder nodes
7
- Stage 2: Tool integration
8
  Stage 3: Planning and reasoning logic implementation
9
 
10
  Based on:
@@ -13,9 +13,16 @@ Based on:
13
  - Level 6: LangGraph framework
14
  """
15
 
 
16
  from typing import TypedDict, List, Optional
17
  from langgraph.graph import StateGraph, END
18
  from src.config import Settings
 
 
 
 
 
 
19
 
20
  # ============================================================================
21
  # Agent State Definition
@@ -42,8 +49,8 @@ def plan_node(state: AgentState) -> AgentState:
42
  """
43
  Planning node: Analyze question and generate execution plan.
44
 
45
- Stage 1: Returns placeholder plan
46
- Stage 3: Implement dynamic planning logic
47
 
48
  Args:
49
  state: Current agent state with question
@@ -51,10 +58,13 @@ def plan_node(state: AgentState) -> AgentState:
51
  Returns:
52
  Updated state with execution plan
53
  """
54
- print(f"[plan_node] Question received: {state['question'][:100]}...")
 
 
 
 
55
 
56
- # Stage 1 placeholder: Skip planning
57
- state["plan"] = "Stage 1 placeholder: No planning implemented yet"
58
 
59
  return state
60
 
@@ -63,9 +73,8 @@ def execute_node(state: AgentState) -> AgentState:
63
  """
64
  Execution node: Execute tools based on plan.
65
 
66
- Stage 1: Returns placeholder tool calls
67
- Stage 2: Implement tool orchestration
68
- Stage 3: Implement tool selection based on plan
69
 
70
  Args:
71
  state: Current agent state with plan
@@ -73,12 +82,25 @@ def execute_node(state: AgentState) -> AgentState:
73
  Returns:
74
  Updated state with tool execution results
75
  """
76
- print(f"[execute_node] Plan: {state['plan']}")
77
 
78
- # Stage 1 placeholder: No tool execution
79
- state["tool_calls"] = [
80
- {"tool": "placeholder", "status": "Stage 1: No tools implemented yet"}
81
- ]
 
 
 
 
 
 
 
 
 
 
 
 
 
82
 
83
  return state
84
 
@@ -87,8 +109,8 @@ def answer_node(state: AgentState) -> AgentState:
87
  """
88
  Answer synthesis node: Generate final factoid answer.
89
 
90
- Stage 1: Returns fixed placeholder answer
91
- Stage 3: Implement answer synthesis from tool results
92
 
93
  Args:
94
  state: Current agent state with tool results
@@ -96,10 +118,13 @@ def answer_node(state: AgentState) -> AgentState:
96
  Returns:
97
  Updated state with final answer
98
  """
99
- print(f"[answer_node] Tool calls: {len(state['tool_calls'])}")
 
 
 
 
100
 
101
- # Stage 1 placeholder: Fixed answer
102
- state["answer"] = "Stage 1 placeholder answer"
103
 
104
  return state
105
 
 
4
  Date: 2026-01-01
5
 
6
  Stage 1: Skeleton with placeholder nodes
7
+ Stage 2: Tool integration (CURRENT)
8
  Stage 3: Planning and reasoning logic implementation
9
 
10
  Based on:
 
13
  - Level 6: LangGraph framework
14
  """
15
 
16
+ import logging
17
  from typing import TypedDict, List, Optional
18
  from langgraph.graph import StateGraph, END
19
  from src.config import Settings
20
+ from src.tools import TOOLS
21
+
22
+ # ============================================================================
23
+ # Logging Setup
24
+ # ============================================================================
25
+ logger = logging.getLogger(__name__)
26
 
27
  # ============================================================================
28
  # Agent State Definition
 
49
  """
50
  Planning node: Analyze question and generate execution plan.
51
 
52
+ Stage 2: Basic tool listing
53
+ Stage 3: Dynamic planning with LLM
54
 
55
  Args:
56
  state: Current agent state with question
 
58
  Returns:
59
  Updated state with execution plan
60
  """
61
+ logger.info(f"[plan_node] Question received: {state['question'][:100]}...")
62
+
63
+ # Stage 2: List available tools (dynamic planning in Stage 3)
64
+ tool_summary = ", ".join(TOOLS.keys())
65
+ state["plan"] = f"Stage 2: {len(TOOLS)} tools available ({tool_summary}). Dynamic planning in Stage 3."
66
 
67
+ logger.info(f"[plan_node] Plan created: {state['plan']}")
 
68
 
69
  return state
70
 
 
73
  """
74
  Execution node: Execute tools based on plan.
75
 
76
+ Stage 2: Tool execution with error handling
77
+ Stage 3: Dynamic tool selection based on plan
 
78
 
79
  Args:
80
  state: Current agent state with plan
 
82
  Returns:
83
  Updated state with tool execution results
84
  """
85
+ logger.info(f"[execute_node] Executing tools - Plan: {state['plan'][:100]}...")
86
 
87
+ # Stage 2: Tools are available but no dynamic planning yet
88
+ # For now, just demonstrate tool registry is loaded
89
+ tool_calls = []
90
+
91
+ # Log available tools
92
+ for tool_name, tool_info in TOOLS.items():
93
+ logger.info(f" Available tool: {tool_name} - {tool_info['description']}")
94
+ tool_calls.append({
95
+ "tool": tool_name,
96
+ "status": "ready",
97
+ "description": tool_info["description"],
98
+ "category": tool_info["category"]
99
+ })
100
+
101
+ state["tool_calls"] = tool_calls
102
+
103
+ logger.info(f"[execute_node] {len(tool_calls)} tools ready for Stage 3 dynamic execution")
104
 
105
  return state
106
 
 
109
  """
110
  Answer synthesis node: Generate final factoid answer.
111
 
112
+ Stage 2: Summarize tool availability
113
+ Stage 3: Synthesize answer from tool execution results
114
 
115
  Args:
116
  state: Current agent state with tool results
 
118
  Returns:
119
  Updated state with final answer
120
  """
121
+ logger.info(f"[answer_node] Processing {len(state['tool_calls'])} tool results")
122
+
123
+ # Stage 2: Report tool readiness
124
+ ready_tools = [t["tool"] for t in state["tool_calls"] if t["status"] == "ready"]
125
+ state["answer"] = f"Stage 2 complete: {len(ready_tools)} tools ready for execution in Stage 3"
126
 
127
+ logger.info(f"[answer_node] Answer generated: {state['answer']}")
 
128
 
129
  return state
130
 
src/tools/__init__.py CHANGED
@@ -1,15 +1,64 @@
1
  """
2
- MCP tool implementations package
3
  Author: @mangobee
4
 
5
- This package will contain:
6
- - web_search.py: Web search tool (Exa/Tavily)
7
- - code_interpreter.py: Python code execution
8
- - file_reader.py: Multi-format file reading
9
- - multimodal.py: Vision/image processing
10
 
11
- Stage 1: Placeholder only
12
- Stage 2: Full implementation
13
  """
14
 
15
- __all__ = []
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  """
2
+ Tool implementations package
3
  Author: @mangobee
4
 
5
+ This package contains all agent tools:
6
+ - web_search: Web search using Tavily/Exa
7
+ - file_parser: Multi-format file parsing (PDF/Excel/Word/Text)
8
+ - calculator: Safe mathematical expression evaluation
9
+ - vision: Multimodal image analysis using LLMs
10
 
11
+ Stage 2: All tools implemented with retry logic and error handling
 
12
  """
13
 
14
+ from src.tools.web_search import search, tavily_search, exa_search
15
+ from src.tools.file_parser import parse_file, parse_pdf, parse_excel, parse_word, parse_text
16
+ from src.tools.calculator import safe_eval
17
+ from src.tools.vision import analyze_image, analyze_image_gemini, analyze_image_claude
18
+
19
+ # Tool registry with metadata
20
+ TOOLS = {
21
+ "web_search": {
22
+ "function": search,
23
+ "description": "Search the web using Tavily or Exa APIs with fallback",
24
+ "parameters": ["query", "max_results"],
25
+ "category": "information_retrieval",
26
+ },
27
+ "parse_file": {
28
+ "function": parse_file,
29
+ "description": "Parse files (PDF, Excel, Word, Text, CSV) and extract content",
30
+ "parameters": ["file_path"],
31
+ "category": "file_processing",
32
+ },
33
+ "calculator": {
34
+ "function": safe_eval,
35
+ "description": "Safely evaluate mathematical expressions",
36
+ "parameters": ["expression"],
37
+ "category": "computation",
38
+ },
39
+ "vision": {
40
+ "function": analyze_image,
41
+ "description": "Analyze images using multimodal LLMs (Gemini/Claude)",
42
+ "parameters": ["image_path", "question"],
43
+ "category": "multimodal",
44
+ },
45
+ }
46
+
47
+ __all__ = [
48
+ # Main unified tool functions
49
+ "search",
50
+ "parse_file",
51
+ "safe_eval",
52
+ "analyze_image",
53
+ # Specific implementations (for advanced use)
54
+ "tavily_search",
55
+ "exa_search",
56
+ "parse_pdf",
57
+ "parse_excel",
58
+ "parse_word",
59
+ "parse_text",
60
+ "analyze_image_gemini",
61
+ "analyze_image_claude",
62
+ # Tool registry
63
+ "TOOLS",
64
+ ]
src/tools/calculator.py ADDED
@@ -0,0 +1,303 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Calculator Tool - Safe mathematical expression evaluation
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Provides safe evaluation of mathematical expressions with:
7
+ - Whitelisted operations and functions
8
+ - Timeout protection
9
+ - Complexity limits
10
+ - No access to dangerous built-ins
11
+
12
+ Security is prioritized over functionality.
13
+ """
14
+
15
+ import ast
16
+ import math
17
+ import operator
18
+ import logging
19
+ from typing import Any, Dict
20
+ import signal
21
+ from contextlib import contextmanager
22
+
23
+ # ============================================================================
24
+ # CONFIG
25
+ # ============================================================================
26
+ MAX_EXPRESSION_LENGTH = 500
27
+ MAX_EVAL_TIME_SECONDS = 2
28
+ MAX_NUMBER_SIZE = 10**100 # Prevent huge number calculations
29
+
30
+ # Whitelist of safe operations
31
+ SAFE_OPERATORS = {
32
+ ast.Add: operator.add,
33
+ ast.Sub: operator.sub,
34
+ ast.Mult: operator.mul,
35
+ ast.Div: operator.truediv,
36
+ ast.FloorDiv: operator.floordiv,
37
+ ast.Mod: operator.mod,
38
+ ast.Pow: operator.pow,
39
+ ast.USub: operator.neg,
40
+ ast.UAdd: operator.pos,
41
+ }
42
+
43
+ # Whitelist of safe mathematical functions
44
+ SAFE_FUNCTIONS = {
45
+ 'abs': abs,
46
+ 'round': round,
47
+ 'min': min,
48
+ 'max': max,
49
+ 'sum': sum,
50
+ # Math module functions
51
+ 'sqrt': math.sqrt,
52
+ 'ceil': math.ceil,
53
+ 'floor': math.floor,
54
+ 'log': math.log,
55
+ 'log10': math.log10,
56
+ 'exp': math.exp,
57
+ 'sin': math.sin,
58
+ 'cos': math.cos,
59
+ 'tan': math.tan,
60
+ 'asin': math.asin,
61
+ 'acos': math.acos,
62
+ 'atan': math.atan,
63
+ 'degrees': math.degrees,
64
+ 'radians': math.radians,
65
+ 'factorial': math.factorial,
66
+ # Constants
67
+ 'pi': math.pi,
68
+ 'e': math.e,
69
+ }
70
+
71
+ # ============================================================================
72
+ # Logging Setup
73
+ # ============================================================================
74
+ logger = logging.getLogger(__name__)
75
+
76
+
77
+ # ============================================================================
78
+ # Timeout Context Manager
79
+ # ============================================================================
80
+
81
+ class TimeoutError(Exception):
82
+ """Raised when evaluation exceeds timeout"""
83
+ pass
84
+
85
+
86
+ @contextmanager
87
+ def timeout(seconds: int):
88
+ """
89
+ Context manager for timeout protection.
90
+
91
+ Args:
92
+ seconds: Maximum execution time
93
+
94
+ Raises:
95
+ TimeoutError: If execution exceeds timeout
96
+ """
97
+ def timeout_handler(signum, frame):
98
+ raise TimeoutError(f"Evaluation exceeded {seconds} second timeout")
99
+
100
+ # Set signal handler
101
+ old_handler = signal.signal(signal.SIGALRM, timeout_handler)
102
+ signal.alarm(seconds)
103
+
104
+ try:
105
+ yield
106
+ finally:
107
+ # Restore old handler and cancel alarm
108
+ signal.alarm(0)
109
+ signal.signal(signal.SIGALRM, old_handler)
110
+
111
+
112
+ # ============================================================================
113
+ # Safe AST Evaluator
114
+ # ============================================================================
115
+
116
+ class SafeEvaluator(ast.NodeVisitor):
117
+ """
118
+ AST visitor that evaluates mathematical expressions safely.
119
+
120
+ Only allows whitelisted operations and functions.
121
+ Prevents code execution, attribute access, and other dangerous operations.
122
+ """
123
+
124
+ def visit_Expression(self, node):
125
+ """Visit Expression node (root of parse tree)"""
126
+ return self.visit(node.body)
127
+
128
+ def visit_Constant(self, node):
129
+ """Visit Constant node (numbers, strings)"""
130
+ value = node.value
131
+
132
+ # Only allow numbers
133
+ if not isinstance(value, (int, float, complex)):
134
+ raise ValueError(f"Unsupported constant type: {type(value).__name__}")
135
+
136
+ # Prevent huge numbers
137
+ if isinstance(value, (int, float)) and abs(value) > MAX_NUMBER_SIZE:
138
+ raise ValueError(f"Number too large: {value}")
139
+
140
+ return value
141
+
142
+ def visit_BinOp(self, node):
143
+ """Visit binary operation node (+, -, *, /, etc.)"""
144
+ op_type = type(node.op)
145
+
146
+ if op_type not in SAFE_OPERATORS:
147
+ raise ValueError(f"Unsupported operation: {op_type.__name__}")
148
+
149
+ left = self.visit(node.left)
150
+ right = self.visit(node.right)
151
+
152
+ op_func = SAFE_OPERATORS[op_type]
153
+
154
+ # Check for division by zero
155
+ if op_type in (ast.Div, ast.FloorDiv, ast.Mod) and right == 0:
156
+ raise ZeroDivisionError("Division by zero")
157
+
158
+ # Prevent huge exponentiations
159
+ if op_type == ast.Pow and abs(right) > 1000:
160
+ raise ValueError(f"Exponent too large: {right}")
161
+
162
+ return op_func(left, right)
163
+
164
+ def visit_UnaryOp(self, node):
165
+ """Visit unary operation node (-, +)"""
166
+ op_type = type(node.op)
167
+
168
+ if op_type not in SAFE_OPERATORS:
169
+ raise ValueError(f"Unsupported unary operation: {op_type.__name__}")
170
+
171
+ operand = self.visit(node.operand)
172
+ op_func = SAFE_OPERATORS[op_type]
173
+
174
+ return op_func(operand)
175
+
176
+ def visit_Call(self, node):
177
+ """Visit function call node"""
178
+ # Only allow simple function names, not attribute access
179
+ if not isinstance(node.func, ast.Name):
180
+ raise ValueError("Only direct function calls are allowed")
181
+
182
+ func_name = node.func.id
183
+
184
+ if func_name not in SAFE_FUNCTIONS:
185
+ raise ValueError(f"Unsupported function: {func_name}")
186
+
187
+ # Evaluate arguments
188
+ args = [self.visit(arg) for arg in node.args]
189
+
190
+ # No keyword arguments allowed
191
+ if node.keywords:
192
+ raise ValueError("Keyword arguments not allowed")
193
+
194
+ func = SAFE_FUNCTIONS[func_name]
195
+
196
+ try:
197
+ return func(*args)
198
+ except Exception as e:
199
+ raise ValueError(f"Error calling {func_name}: {str(e)}")
200
+
201
+ def visit_Name(self, node):
202
+ """Visit name node (variable/constant reference)"""
203
+ # Only allow whitelisted constants
204
+ if node.id in SAFE_FUNCTIONS:
205
+ value = SAFE_FUNCTIONS[node.id]
206
+ # If it's a constant (not a function), return it
207
+ if not callable(value):
208
+ return value
209
+
210
+ raise ValueError(f"Undefined name: {node.id}")
211
+
212
+ def visit_List(self, node):
213
+ """Visit list node"""
214
+ return [self.visit(element) for element in node.elts]
215
+
216
+ def visit_Tuple(self, node):
217
+ """Visit tuple node"""
218
+ return tuple(self.visit(element) for element in node.elts)
219
+
220
+ def generic_visit(self, node):
221
+ """Catch-all for unsupported node types"""
222
+ raise ValueError(f"Unsupported expression type: {type(node).__name__}")
223
+
224
+
225
+ # ============================================================================
226
+ # Public API
227
+ # ============================================================================
228
+
229
+ def safe_eval(expression: str) -> Dict[str, Any]:
230
+ """
231
+ Safely evaluate a mathematical expression.
232
+
233
+ Args:
234
+ expression: Mathematical expression string
235
+
236
+ Returns:
237
+ Dict with structure: {
238
+ "result": float or int, # Evaluation result
239
+ "expression": str, # Original expression
240
+ "success": bool # True if evaluation succeeded
241
+ }
242
+
243
+ Raises:
244
+ ValueError: For invalid or unsafe expressions
245
+ ZeroDivisionError: For division by zero
246
+ TimeoutError: If evaluation exceeds timeout
247
+ SyntaxError: For malformed expressions
248
+
249
+ Examples:
250
+ >>> safe_eval("2 + 2")
251
+ {"result": 4, "expression": "2 + 2", "success": True}
252
+
253
+ >>> safe_eval("sqrt(16) + 3")
254
+ {"result": 7.0, "expression": "sqrt(16) + 3", "success": True}
255
+
256
+ >>> safe_eval("import os") # Raises ValueError
257
+ """
258
+ # Input validation
259
+ if not expression or not isinstance(expression, str):
260
+ raise ValueError("Expression must be a non-empty string")
261
+
262
+ expression = expression.strip()
263
+
264
+ if len(expression) > MAX_EXPRESSION_LENGTH:
265
+ raise ValueError(
266
+ f"Expression too long ({len(expression)} chars). "
267
+ f"Maximum: {MAX_EXPRESSION_LENGTH} chars"
268
+ )
269
+
270
+ logger.info(f"Evaluating expression: {expression}")
271
+
272
+ try:
273
+ # Parse expression into AST
274
+ tree = ast.parse(expression, mode='eval')
275
+
276
+ # Evaluate with timeout protection
277
+ with timeout(MAX_EVAL_TIME_SECONDS):
278
+ evaluator = SafeEvaluator()
279
+ result = evaluator.visit(tree)
280
+
281
+ logger.info(f"Evaluation successful: {result}")
282
+
283
+ return {
284
+ "result": result,
285
+ "expression": expression,
286
+ "success": True,
287
+ }
288
+
289
+ except SyntaxError as e:
290
+ logger.error(f"Syntax error in expression: {e}")
291
+ raise SyntaxError(f"Invalid expression syntax: {str(e)}")
292
+ except ZeroDivisionError as e:
293
+ logger.error(f"Division by zero: {expression}")
294
+ raise
295
+ except TimeoutError as e:
296
+ logger.error(f"Evaluation timeout: {expression}")
297
+ raise
298
+ except ValueError as e:
299
+ logger.error(f"Invalid expression: {e}")
300
+ raise
301
+ except Exception as e:
302
+ logger.error(f"Unexpected error evaluating expression: {e}")
303
+ raise ValueError(f"Evaluation error: {str(e)}")
src/tools/file_parser.py ADDED
@@ -0,0 +1,367 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ File Parser Tool - Multi-format file reading
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Provides file parsing for:
7
+ - PDF files (.pdf) using PyPDF2
8
+ - Excel files (.xlsx, .xls) using openpyxl
9
+ - Word documents (.docx) using python-docx
10
+ - Text files (.txt, .csv) using built-in open()
11
+
12
+ All parsers include retry logic and error handling.
13
+ """
14
+
15
+ import logging
16
+ from pathlib import Path
17
+ from typing import Dict, List, Optional
18
+ from tenacity import (
19
+ retry,
20
+ stop_after_attempt,
21
+ wait_exponential,
22
+ retry_if_exception_type,
23
+ )
24
+
25
+ # ============================================================================
26
+ # CONFIG
27
+ # ============================================================================
28
+ MAX_RETRIES = 3
29
+ RETRY_MIN_WAIT = 1 # seconds
30
+ RETRY_MAX_WAIT = 5 # seconds
31
+
32
+ SUPPORTED_EXTENSIONS = {
33
+ '.pdf': 'PDF',
34
+ '.xlsx': 'Excel',
35
+ '.xls': 'Excel',
36
+ '.docx': 'Word',
37
+ '.txt': 'Text',
38
+ '.csv': 'CSV',
39
+ }
40
+
41
+ # ============================================================================
42
+ # Logging Setup
43
+ # ============================================================================
44
+ logger = logging.getLogger(__name__)
45
+
46
+
47
+ # ============================================================================
48
+ # PDF Parser
49
+ # ============================================================================
50
+
51
+ @retry(
52
+ stop=stop_after_attempt(MAX_RETRIES),
53
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
54
+ retry=retry_if_exception_type((IOError, OSError)),
55
+ reraise=True,
56
+ )
57
+ def parse_pdf(file_path: str) -> Dict:
58
+ """
59
+ Parse PDF file and extract text content.
60
+
61
+ Args:
62
+ file_path: Path to PDF file
63
+
64
+ Returns:
65
+ Dict with structure: {
66
+ "content": str, # Extracted text
67
+ "pages": int, # Number of pages
68
+ "file_type": "PDF",
69
+ "file_path": str
70
+ }
71
+
72
+ Raises:
73
+ FileNotFoundError: If file doesn't exist
74
+ ValueError: If file is corrupted or invalid
75
+ IOError: For file reading errors (triggers retry)
76
+ """
77
+ try:
78
+ from PyPDF2 import PdfReader
79
+
80
+ path = Path(file_path)
81
+ if not path.exists():
82
+ raise FileNotFoundError(f"PDF file not found: {file_path}")
83
+
84
+ logger.info(f"Parsing PDF: {file_path}")
85
+
86
+ reader = PdfReader(str(path))
87
+ num_pages = len(reader.pages)
88
+
89
+ # Extract text from all pages
90
+ content = []
91
+ for page_num, page in enumerate(reader.pages, 1):
92
+ text = page.extract_text()
93
+ if text.strip():
94
+ content.append(f"--- Page {page_num} ---\n{text}")
95
+
96
+ full_content = "\n\n".join(content)
97
+
98
+ logger.info(f"PDF parsed successfully: {num_pages} pages, {len(full_content)} chars")
99
+
100
+ return {
101
+ "content": full_content,
102
+ "pages": num_pages,
103
+ "file_type": "PDF",
104
+ "file_path": file_path,
105
+ }
106
+
107
+ except FileNotFoundError as e:
108
+ logger.error(f"PDF file not found: {e}")
109
+ raise
110
+ except (IOError, OSError) as e:
111
+ logger.warning(f"PDF IO error (will retry): {e}")
112
+ raise
113
+ except Exception as e:
114
+ logger.error(f"PDF parsing error: {e}")
115
+ raise ValueError(f"Failed to parse PDF: {str(e)}")
116
+
117
+
118
+ # ============================================================================
119
+ # Excel Parser
120
+ # ============================================================================
121
+
122
+ @retry(
123
+ stop=stop_after_attempt(MAX_RETRIES),
124
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
125
+ retry=retry_if_exception_type((IOError, OSError)),
126
+ reraise=True,
127
+ )
128
+ def parse_excel(file_path: str) -> Dict:
129
+ """
130
+ Parse Excel file and extract data from all sheets.
131
+
132
+ Args:
133
+ file_path: Path to Excel file (.xlsx or .xls)
134
+
135
+ Returns:
136
+ Dict with structure: {
137
+ "content": str, # Formatted table data
138
+ "sheets": List[str], # Sheet names
139
+ "file_type": "Excel",
140
+ "file_path": str
141
+ }
142
+
143
+ Raises:
144
+ FileNotFoundError: If file doesn't exist
145
+ ValueError: If file is corrupted or invalid
146
+ IOError: For file reading errors (triggers retry)
147
+ """
148
+ try:
149
+ from openpyxl import load_workbook
150
+
151
+ path = Path(file_path)
152
+ if not path.exists():
153
+ raise FileNotFoundError(f"Excel file not found: {file_path}")
154
+
155
+ logger.info(f"Parsing Excel: {file_path}")
156
+
157
+ workbook = load_workbook(str(path), data_only=True)
158
+ sheet_names = workbook.sheetnames
159
+
160
+ # Extract data from all sheets
161
+ content_parts = []
162
+ for sheet_name in sheet_names:
163
+ sheet = workbook[sheet_name]
164
+
165
+ # Get all values
166
+ rows = []
167
+ for row in sheet.iter_rows(values_only=True):
168
+ # Filter out completely empty rows
169
+ if any(cell is not None for cell in row):
170
+ row_str = "\t".join(str(cell) if cell is not None else "" for cell in row)
171
+ rows.append(row_str)
172
+
173
+ if rows:
174
+ sheet_content = f"=== Sheet: {sheet_name} ===\n" + "\n".join(rows)
175
+ content_parts.append(sheet_content)
176
+
177
+ full_content = "\n\n".join(content_parts)
178
+
179
+ logger.info(f"Excel parsed successfully: {len(sheet_names)} sheets")
180
+
181
+ return {
182
+ "content": full_content,
183
+ "sheets": sheet_names,
184
+ "file_type": "Excel",
185
+ "file_path": file_path,
186
+ }
187
+
188
+ except FileNotFoundError as e:
189
+ logger.error(f"Excel file not found: {e}")
190
+ raise
191
+ except (IOError, OSError) as e:
192
+ logger.warning(f"Excel IO error (will retry): {e}")
193
+ raise
194
+ except Exception as e:
195
+ logger.error(f"Excel parsing error: {e}")
196
+ raise ValueError(f"Failed to parse Excel: {str(e)}")
197
+
198
+
199
+ # ============================================================================
200
+ # Word Document Parser
201
+ # ============================================================================
202
+
203
+ @retry(
204
+ stop=stop_after_attempt(MAX_RETRIES),
205
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
206
+ retry=retry_if_exception_type((IOError, OSError)),
207
+ reraise=True,
208
+ )
209
+ def parse_word(file_path: str) -> Dict:
210
+ """
211
+ Parse Word document and extract text content.
212
+
213
+ Args:
214
+ file_path: Path to Word file (.docx)
215
+
216
+ Returns:
217
+ Dict with structure: {
218
+ "content": str, # Extracted text
219
+ "paragraphs": int, # Number of paragraphs
220
+ "file_type": "Word",
221
+ "file_path": str
222
+ }
223
+
224
+ Raises:
225
+ FileNotFoundError: If file doesn't exist
226
+ ValueError: If file is corrupted or invalid
227
+ IOError: For file reading errors (triggers retry)
228
+ """
229
+ try:
230
+ from docx import Document
231
+
232
+ path = Path(file_path)
233
+ if not path.exists():
234
+ raise FileNotFoundError(f"Word file not found: {file_path}")
235
+
236
+ logger.info(f"Parsing Word document: {file_path}")
237
+
238
+ doc = Document(str(path))
239
+
240
+ # Extract text from all paragraphs
241
+ paragraphs = [para.text for para in doc.paragraphs if para.text.strip()]
242
+ full_content = "\n\n".join(paragraphs)
243
+
244
+ logger.info(f"Word parsed successfully: {len(paragraphs)} paragraphs")
245
+
246
+ return {
247
+ "content": full_content,
248
+ "paragraphs": len(paragraphs),
249
+ "file_type": "Word",
250
+ "file_path": file_path,
251
+ }
252
+
253
+ except FileNotFoundError as e:
254
+ logger.error(f"Word file not found: {e}")
255
+ raise
256
+ except (IOError, OSError) as e:
257
+ logger.warning(f"Word IO error (will retry): {e}")
258
+ raise
259
+ except Exception as e:
260
+ logger.error(f"Word parsing error: {e}")
261
+ raise ValueError(f"Failed to parse Word document: {str(e)}")
262
+
263
+
264
+ # ============================================================================
265
+ # Text/CSV Parser
266
+ # ============================================================================
267
+
268
+ @retry(
269
+ stop=stop_after_attempt(MAX_RETRIES),
270
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
271
+ retry=retry_if_exception_type((IOError, OSError)),
272
+ reraise=True,
273
+ )
274
+ def parse_text(file_path: str) -> Dict:
275
+ """
276
+ Parse plain text or CSV file.
277
+
278
+ Args:
279
+ file_path: Path to text file (.txt or .csv)
280
+
281
+ Returns:
282
+ Dict with structure: {
283
+ "content": str,
284
+ "lines": int,
285
+ "file_type": "Text" or "CSV",
286
+ "file_path": str
287
+ }
288
+
289
+ Raises:
290
+ FileNotFoundError: If file doesn't exist
291
+ IOError: For file reading errors (triggers retry)
292
+ """
293
+ try:
294
+ path = Path(file_path)
295
+ if not path.exists():
296
+ raise FileNotFoundError(f"Text file not found: {file_path}")
297
+
298
+ logger.info(f"Parsing text file: {file_path}")
299
+
300
+ with open(path, 'r', encoding='utf-8') as f:
301
+ content = f.read()
302
+
303
+ lines = content.count('\n') + 1
304
+ file_type = "CSV" if path.suffix == '.csv' else "Text"
305
+
306
+ logger.info(f"{file_type} file parsed successfully: {lines} lines")
307
+
308
+ return {
309
+ "content": content,
310
+ "lines": lines,
311
+ "file_type": file_type,
312
+ "file_path": file_path,
313
+ }
314
+
315
+ except FileNotFoundError as e:
316
+ logger.error(f"Text file not found: {e}")
317
+ raise
318
+ except (IOError, OSError) as e:
319
+ logger.warning(f"Text file IO error (will retry): {e}")
320
+ raise
321
+ except UnicodeDecodeError as e:
322
+ logger.error(f"Text file encoding error: {e}")
323
+ raise ValueError(f"Failed to decode text file (try UTF-8): {str(e)}")
324
+
325
+
326
+ # ============================================================================
327
+ # Unified File Parser
328
+ # ============================================================================
329
+
330
+ def parse_file(file_path: str) -> Dict:
331
+ """
332
+ Parse file based on extension, automatically selecting the right parser.
333
+
334
+ Args:
335
+ file_path: Path to file
336
+
337
+ Returns:
338
+ Dict with parsed content and metadata
339
+
340
+ Raises:
341
+ ValueError: If file type is not supported
342
+ FileNotFoundError: If file doesn't exist
343
+ Exception: For parsing errors
344
+ """
345
+ path = Path(file_path)
346
+ extension = path.suffix.lower()
347
+
348
+ if extension not in SUPPORTED_EXTENSIONS:
349
+ raise ValueError(
350
+ f"Unsupported file type: {extension}. "
351
+ f"Supported: {', '.join(SUPPORTED_EXTENSIONS.keys())}"
352
+ )
353
+
354
+ logger.info(f"Dispatching parser for {SUPPORTED_EXTENSIONS[extension]} file: {file_path}")
355
+
356
+ # Dispatch to appropriate parser
357
+ if extension == '.pdf':
358
+ return parse_pdf(file_path)
359
+ elif extension in ['.xlsx', '.xls']:
360
+ return parse_excel(file_path)
361
+ elif extension == '.docx':
362
+ return parse_word(file_path)
363
+ elif extension in ['.txt', '.csv']:
364
+ return parse_text(file_path)
365
+ else:
366
+ # Should never reach here due to check above
367
+ raise ValueError(f"No parser for extension: {extension}")
src/tools/vision.py ADDED
@@ -0,0 +1,339 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Vision Tool - Image analysis using multimodal LLMs
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Provides image analysis functionality using:
7
+ - Gemini 2.0 Flash (default, free tier)
8
+ - Claude Sonnet 4.5 (fallback, if configured)
9
+
10
+ Supports:
11
+ - Image file loading and encoding
12
+ - Question answering about images
13
+ - Object detection/description
14
+ - Text extraction (OCR)
15
+ - Visual reasoning
16
+ """
17
+
18
+ import base64
19
+ import logging
20
+ from pathlib import Path
21
+ from typing import Dict, Optional
22
+ from tenacity import (
23
+ retry,
24
+ stop_after_attempt,
25
+ wait_exponential,
26
+ retry_if_exception_type,
27
+ )
28
+
29
+ from src.config.settings import Settings
30
+
31
+ # ============================================================================
32
+ # CONFIG
33
+ # ============================================================================
34
+ MAX_RETRIES = 3
35
+ RETRY_MIN_WAIT = 1 # seconds
36
+ RETRY_MAX_WAIT = 10 # seconds
37
+ MAX_IMAGE_SIZE_MB = 10 # Maximum image size in MB
38
+ SUPPORTED_IMAGE_FORMATS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}
39
+
40
+ # ============================================================================
41
+ # Logging Setup
42
+ # ============================================================================
43
+ logger = logging.getLogger(__name__)
44
+
45
+
46
+ # ============================================================================
47
+ # Image Loading and Encoding
48
+ # ============================================================================
49
+
50
+ def load_and_encode_image(image_path: str) -> Dict[str, str]:
51
+ """
52
+ Load image file and encode as base64.
53
+
54
+ Args:
55
+ image_path: Path to image file
56
+
57
+ Returns:
58
+ Dict with structure: {
59
+ "data": str, # Base64 encoded image
60
+ "mime_type": str, # MIME type (e.g., "image/jpeg")
61
+ "size_mb": float, # File size in MB
62
+ }
63
+
64
+ Raises:
65
+ FileNotFoundError: If image doesn't exist
66
+ ValueError: If file is not a supported image format or too large
67
+ """
68
+ path = Path(image_path)
69
+
70
+ if not path.exists():
71
+ raise FileNotFoundError(f"Image file not found: {image_path}")
72
+
73
+ # Check file extension
74
+ extension = path.suffix.lower()
75
+ if extension not in SUPPORTED_IMAGE_FORMATS:
76
+ raise ValueError(
77
+ f"Unsupported image format: {extension}. "
78
+ f"Supported: {', '.join(SUPPORTED_IMAGE_FORMATS)}"
79
+ )
80
+
81
+ # Check file size
82
+ size_bytes = path.stat().st_size
83
+ size_mb = size_bytes / (1024 * 1024)
84
+
85
+ if size_mb > MAX_IMAGE_SIZE_MB:
86
+ raise ValueError(
87
+ f"Image too large: {size_mb:.2f}MB. Maximum: {MAX_IMAGE_SIZE_MB}MB"
88
+ )
89
+
90
+ # Read and encode image
91
+ with open(path, 'rb') as f:
92
+ image_data = f.read()
93
+
94
+ encoded = base64.b64encode(image_data).decode('utf-8')
95
+
96
+ # Determine MIME type
97
+ mime_types = {
98
+ '.jpg': 'image/jpeg',
99
+ '.jpeg': 'image/jpeg',
100
+ '.png': 'image/png',
101
+ '.gif': 'image/gif',
102
+ '.webp': 'image/webp',
103
+ '.bmp': 'image/bmp',
104
+ }
105
+ mime_type = mime_types.get(extension, 'image/jpeg')
106
+
107
+ logger.info(f"Image loaded: {path.name} ({size_mb:.2f}MB, {mime_type})")
108
+
109
+ return {
110
+ "data": encoded,
111
+ "mime_type": mime_type,
112
+ "size_mb": size_mb,
113
+ }
114
+
115
+
116
+ # ============================================================================
117
+ # Gemini Vision
118
+ # ============================================================================
119
+
120
+ @retry(
121
+ stop=stop_after_attempt(MAX_RETRIES),
122
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
123
+ retry=retry_if_exception_type((ConnectionError, TimeoutError)),
124
+ reraise=True,
125
+ )
126
+ def analyze_image_gemini(image_path: str, question: Optional[str] = None) -> Dict:
127
+ """
128
+ Analyze image using Gemini 2.0 Flash.
129
+
130
+ Args:
131
+ image_path: Path to image file
132
+ question: Optional question about the image (default: "Describe this image")
133
+
134
+ Returns:
135
+ Dict with structure: {
136
+ "answer": str, # LLM's analysis/answer
137
+ "model": "gemini-2.0-flash",
138
+ "image_path": str,
139
+ "question": str
140
+ }
141
+
142
+ Raises:
143
+ ValueError: If API key not configured or image invalid
144
+ ConnectionError: If API connection fails (triggers retry)
145
+ """
146
+ try:
147
+ import google.genai as genai
148
+
149
+ settings = Settings()
150
+ api_key = settings.google_api_key
151
+
152
+ if not api_key:
153
+ raise ValueError("GOOGLE_API_KEY not configured in settings")
154
+
155
+ # Load and encode image
156
+ image_data = load_and_encode_image(image_path)
157
+
158
+ # Default question
159
+ if not question:
160
+ question = "Describe this image in detail."
161
+
162
+ logger.info(f"Gemini vision analysis: {Path(image_path).name} - '{question}'")
163
+
164
+ # Configure Gemini client
165
+ client = genai.Client(api_key=api_key)
166
+
167
+ # Create content with image and text
168
+ response = client.models.generate_content(
169
+ model='gemini-2.0-flash-exp',
170
+ contents=[
171
+ question,
172
+ {
173
+ "mime_type": image_data["mime_type"],
174
+ "data": image_data["data"]
175
+ }
176
+ ]
177
+ )
178
+
179
+ answer = response.text.strip()
180
+
181
+ logger.info(f"Gemini vision successful: {len(answer)} chars")
182
+
183
+ return {
184
+ "answer": answer,
185
+ "model": "gemini-2.0-flash",
186
+ "image_path": image_path,
187
+ "question": question,
188
+ }
189
+
190
+ except ValueError as e:
191
+ logger.error(f"Gemini configuration/input error: {e}")
192
+ raise
193
+ except (ConnectionError, TimeoutError) as e:
194
+ logger.warning(f"Gemini connection error (will retry): {e}")
195
+ raise
196
+ except Exception as e:
197
+ logger.error(f"Gemini vision error: {e}")
198
+ raise Exception(f"Gemini vision failed: {str(e)}")
199
+
200
+
201
+ # ============================================================================
202
+ # Claude Vision (Fallback)
203
+ # ============================================================================
204
+
205
+ @retry(
206
+ stop=stop_after_attempt(MAX_RETRIES),
207
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
208
+ retry=retry_if_exception_type((ConnectionError, TimeoutError)),
209
+ reraise=True,
210
+ )
211
+ def analyze_image_claude(image_path: str, question: Optional[str] = None) -> Dict:
212
+ """
213
+ Analyze image using Claude Sonnet 4.5.
214
+
215
+ Args:
216
+ image_path: Path to image file
217
+ question: Optional question about the image (default: "Describe this image")
218
+
219
+ Returns:
220
+ Dict with structure: {
221
+ "answer": str, # LLM's analysis/answer
222
+ "model": "claude-sonnet-4.5",
223
+ "image_path": str,
224
+ "question": str
225
+ }
226
+
227
+ Raises:
228
+ ValueError: If API key not configured or image invalid
229
+ ConnectionError: If API connection fails (triggers retry)
230
+ """
231
+ try:
232
+ from anthropic import Anthropic
233
+
234
+ settings = Settings()
235
+ api_key = settings.anthropic_api_key
236
+
237
+ if not api_key:
238
+ raise ValueError("ANTHROPIC_API_KEY not configured in settings")
239
+
240
+ # Load and encode image
241
+ image_data = load_and_encode_image(image_path)
242
+
243
+ # Default question
244
+ if not question:
245
+ question = "Describe this image in detail."
246
+
247
+ logger.info(f"Claude vision analysis: {Path(image_path).name} - '{question}'")
248
+
249
+ # Configure Claude client
250
+ client = Anthropic(api_key=api_key)
251
+
252
+ # Create message with image
253
+ response = client.messages.create(
254
+ model="claude-sonnet-4-20250514",
255
+ max_tokens=1024,
256
+ messages=[
257
+ {
258
+ "role": "user",
259
+ "content": [
260
+ {
261
+ "type": "image",
262
+ "source": {
263
+ "type": "base64",
264
+ "media_type": image_data["mime_type"],
265
+ "data": image_data["data"],
266
+ },
267
+ },
268
+ {
269
+ "type": "text",
270
+ "text": question
271
+ }
272
+ ],
273
+ }
274
+ ],
275
+ )
276
+
277
+ answer = response.content[0].text.strip()
278
+
279
+ logger.info(f"Claude vision successful: {len(answer)} chars")
280
+
281
+ return {
282
+ "answer": answer,
283
+ "model": "claude-sonnet-4.5",
284
+ "image_path": image_path,
285
+ "question": question,
286
+ }
287
+
288
+ except ValueError as e:
289
+ logger.error(f"Claude configuration/input error: {e}")
290
+ raise
291
+ except (ConnectionError, TimeoutError) as e:
292
+ logger.warning(f"Claude connection error (will retry): {e}")
293
+ raise
294
+ except Exception as e:
295
+ logger.error(f"Claude vision error: {e}")
296
+ raise Exception(f"Claude vision failed: {str(e)}")
297
+
298
+
299
+ # ============================================================================
300
+ # Unified Vision Analysis
301
+ # ============================================================================
302
+
303
+ def analyze_image(image_path: str, question: Optional[str] = None) -> Dict:
304
+ """
305
+ Analyze image using available multimodal LLM.
306
+
307
+ Tries Gemini first (free tier), falls back to Claude if configured.
308
+
309
+ Args:
310
+ image_path: Path to image file
311
+ question: Optional question about the image
312
+
313
+ Returns:
314
+ Dict with analysis results from either Gemini or Claude
315
+
316
+ Raises:
317
+ Exception: If both Gemini and Claude fail or are not configured
318
+ """
319
+ settings = Settings()
320
+
321
+ # Try Gemini first (default, free tier)
322
+ if settings.google_api_key:
323
+ try:
324
+ return analyze_image_gemini(image_path, question)
325
+ except Exception as e:
326
+ logger.warning(f"Gemini failed, trying Claude: {e}")
327
+
328
+ # Fallback to Claude
329
+ if settings.anthropic_api_key:
330
+ try:
331
+ return analyze_image_claude(image_path, question)
332
+ except Exception as e:
333
+ logger.error(f"Claude also failed: {e}")
334
+ raise Exception(f"Vision analysis failed - Gemini and Claude both failed")
335
+
336
+ # No API keys configured
337
+ raise ValueError(
338
+ "No vision API configured. Please set GOOGLE_API_KEY or ANTHROPIC_API_KEY"
339
+ )
src/tools/web_search.py ADDED
@@ -0,0 +1,230 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Web Search Tool - Tavily and Exa implementations
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Provides web search functionality with:
7
+ - Tavily as primary search (free tier: 1000 req/month)
8
+ - Exa as fallback (paid tier)
9
+ - Retry logic with exponential backoff
10
+ - Structured error handling
11
+ """
12
+
13
+ import logging
14
+ from typing import Dict, List, Optional
15
+ from tenacity import (
16
+ retry,
17
+ stop_after_attempt,
18
+ wait_exponential,
19
+ retry_if_exception_type,
20
+ )
21
+
22
+ from src.config.settings import Settings
23
+
24
+ # ============================================================================
25
+ # CONFIG
26
+ # ============================================================================
27
+ MAX_RETRIES = 3
28
+ RETRY_MIN_WAIT = 1 # seconds
29
+ RETRY_MAX_WAIT = 10 # seconds
30
+ DEFAULT_MAX_RESULTS = 5
31
+
32
+ # ============================================================================
33
+ # Logging Setup
34
+ # ============================================================================
35
+ logger = logging.getLogger(__name__)
36
+
37
+
38
+ # ============================================================================
39
+ # Tavily Search Implementation
40
+ # ============================================================================
41
+
42
+ @retry(
43
+ stop=stop_after_attempt(MAX_RETRIES),
44
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
45
+ retry=retry_if_exception_type((ConnectionError, TimeoutError)),
46
+ reraise=True,
47
+ )
48
+ def tavily_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
49
+ """
50
+ Search using Tavily API with retry logic.
51
+
52
+ Args:
53
+ query: Search query string
54
+ max_results: Maximum number of results to return (default: 5)
55
+
56
+ Returns:
57
+ Dict with structure: {
58
+ "results": [{"title": str, "url": str, "snippet": str}, ...],
59
+ "source": "tavily",
60
+ "query": str,
61
+ "count": int
62
+ }
63
+
64
+ Raises:
65
+ ValueError: If API key not configured
66
+ ConnectionError: If API connection fails after retries
67
+ Exception: For other API errors
68
+ """
69
+ try:
70
+ from tavily import TavilyClient
71
+
72
+ settings = Settings()
73
+ api_key = settings.tavily_api_key
74
+
75
+ if not api_key:
76
+ raise ValueError("TAVILY_API_KEY not configured in settings")
77
+
78
+ logger.info(f"Tavily search: query='{query}', max_results={max_results}")
79
+
80
+ client = TavilyClient(api_key=api_key)
81
+ response = client.search(query=query, max_results=max_results)
82
+
83
+ # Extract and structure results
84
+ results = []
85
+ for item in response.get("results", []):
86
+ results.append({
87
+ "title": item.get("title", ""),
88
+ "url": item.get("url", ""),
89
+ "snippet": item.get("content", ""),
90
+ })
91
+
92
+ logger.info(f"Tavily search successful: {len(results)} results")
93
+
94
+ return {
95
+ "results": results,
96
+ "source": "tavily",
97
+ "query": query,
98
+ "count": len(results),
99
+ }
100
+
101
+ except ValueError as e:
102
+ logger.error(f"Tavily configuration error: {e}")
103
+ raise
104
+ except (ConnectionError, TimeoutError) as e:
105
+ logger.warning(f"Tavily connection error (will retry): {e}")
106
+ raise
107
+ except Exception as e:
108
+ logger.error(f"Tavily search error: {e}")
109
+ raise Exception(f"Tavily search failed: {str(e)}")
110
+
111
+
112
+ # ============================================================================
113
+ # Exa Search Implementation
114
+ # ============================================================================
115
+
116
+ @retry(
117
+ stop=stop_after_attempt(MAX_RETRIES),
118
+ wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
119
+ retry=retry_if_exception_type((ConnectionError, TimeoutError)),
120
+ reraise=True,
121
+ )
122
+ def exa_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
123
+ """
124
+ Search using Exa API with retry logic.
125
+
126
+ Args:
127
+ query: Search query string
128
+ max_results: Maximum number of results to return (default: 5)
129
+
130
+ Returns:
131
+ Dict with structure: {
132
+ "results": [{"title": str, "url": str, "snippet": str}, ...],
133
+ "source": "exa",
134
+ "query": str,
135
+ "count": int
136
+ }
137
+
138
+ Raises:
139
+ ValueError: If API key not configured
140
+ ConnectionError: If API connection fails after retries
141
+ Exception: For other API errors
142
+ """
143
+ try:
144
+ from exa_py import Exa
145
+
146
+ settings = Settings()
147
+ api_key = settings.exa_api_key
148
+
149
+ if not api_key:
150
+ raise ValueError("EXA_API_KEY not configured in settings")
151
+
152
+ logger.info(f"Exa search: query='{query}', max_results={max_results}")
153
+
154
+ client = Exa(api_key=api_key)
155
+ response = client.search(query=query, num_results=max_results, use_autoprompt=True)
156
+
157
+ # Extract and structure results
158
+ results = []
159
+ for item in response.results:
160
+ results.append({
161
+ "title": item.title if hasattr(item, 'title') else "",
162
+ "url": item.url if hasattr(item, 'url') else "",
163
+ "snippet": item.text if hasattr(item, 'text') else "",
164
+ })
165
+
166
+ logger.info(f"Exa search successful: {len(results)} results")
167
+
168
+ return {
169
+ "results": results,
170
+ "source": "exa",
171
+ "query": query,
172
+ "count": len(results),
173
+ }
174
+
175
+ except ValueError as e:
176
+ logger.error(f"Exa configuration error: {e}")
177
+ raise
178
+ except (ConnectionError, TimeoutError) as e:
179
+ logger.warning(f"Exa connection error (will retry): {e}")
180
+ raise
181
+ except Exception as e:
182
+ logger.error(f"Exa search error: {e}")
183
+ raise Exception(f"Exa search failed: {str(e)}")
184
+
185
+
186
+ # ============================================================================
187
+ # Unified Search with Fallback
188
+ # ============================================================================
189
+
190
+ def search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
191
+ """
192
+ Unified search function with automatic fallback.
193
+
194
+ Tries Tavily first (free tier), falls back to Exa if Tavily fails.
195
+
196
+ Args:
197
+ query: Search query string
198
+ max_results: Maximum number of results to return (default: 5)
199
+
200
+ Returns:
201
+ Dict with search results from either Tavily or Exa
202
+
203
+ Raises:
204
+ Exception: If both Tavily and Exa searches fail
205
+ """
206
+ settings = Settings()
207
+ default_tool = settings.default_search_tool
208
+
209
+ # Try default tool first
210
+ if default_tool == "tavily":
211
+ try:
212
+ return tavily_search(query, max_results)
213
+ except Exception as e:
214
+ logger.warning(f"Tavily failed, falling back to Exa: {e}")
215
+ try:
216
+ return exa_search(query, max_results)
217
+ except Exception as exa_error:
218
+ logger.error(f"Both Tavily and Exa failed")
219
+ raise Exception(f"Search failed - Tavily: {e}, Exa: {exa_error}")
220
+ else:
221
+ # Default is Exa
222
+ try:
223
+ return exa_search(query, max_results)
224
+ except Exception as e:
225
+ logger.warning(f"Exa failed, falling back to Tavily: {e}")
226
+ try:
227
+ return tavily_search(query, max_results)
228
+ except Exception as tavily_error:
229
+ logger.error(f"Both Exa and Tavily failed")
230
+ raise Exception(f"Search failed - Exa: {e}, Tavily: {tavily_error}")
tests/fixtures/generate_fixtures.py ADDED
@@ -0,0 +1,95 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Generate test fixtures for file parser tests
3
+ Author: @mangobee
4
+ """
5
+
6
+ from pathlib import Path
7
+
8
+ # ============================================================================
9
+ # CONFIG
10
+ # ============================================================================
11
+ FIXTURES_DIR = Path(__file__).parent
12
+
13
+ # ============================================================================
14
+ # Generate PDF
15
+ # ============================================================================
16
+ def generate_pdf():
17
+ """Generate sample PDF file using fpdf"""
18
+ try:
19
+ from fpdf import FPDF
20
+ except ImportError:
21
+ print("Skipping PDF generation (fpdf not installed)")
22
+ return
23
+
24
+ pdf = FPDF()
25
+ pdf.add_page()
26
+ pdf.set_font("Arial", size=12)
27
+ pdf.cell(200, 10, txt="Test PDF Document", ln=True)
28
+ pdf.cell(200, 10, txt="This is page 1 content.", ln=True)
29
+ pdf.add_page()
30
+ pdf.cell(200, 10, txt="Page 2", ln=True)
31
+ pdf.cell(200, 10, txt="This is page 2 content.", ln=True)
32
+
33
+ pdf_path = FIXTURES_DIR / "sample.pdf"
34
+ pdf.output(str(pdf_path))
35
+
36
+ print(f"Created: {pdf_path}")
37
+
38
+
39
+ # ============================================================================
40
+ # Generate Excel
41
+ # ============================================================================
42
+ def generate_excel():
43
+ """Generate sample Excel file"""
44
+ from openpyxl import Workbook
45
+
46
+ wb = Workbook()
47
+
48
+ # Sheet 1
49
+ ws1 = wb.active
50
+ ws1.title = "Data"
51
+ ws1.append(["Product", "Price", "Quantity"])
52
+ ws1.append(["Apple", 1.50, 100])
53
+ ws1.append(["Banana", 0.75, 150])
54
+ ws1.append(["Orange", 2.00, 80])
55
+
56
+ # Sheet 2
57
+ ws2 = wb.create_sheet("Summary")
58
+ ws2.append(["Total Products", 3])
59
+ ws2.append(["Total Quantity", 330])
60
+
61
+ excel_path = FIXTURES_DIR / "sample.xlsx"
62
+ wb.save(excel_path)
63
+
64
+ print(f"Created: {excel_path}")
65
+
66
+
67
+ # ============================================================================
68
+ # Generate Word
69
+ # ============================================================================
70
+ def generate_word():
71
+ """Generate sample Word document"""
72
+ from docx import Document
73
+
74
+ doc = Document()
75
+ doc.add_heading("Test Word Document", 0)
76
+ doc.add_paragraph("This is the first paragraph.")
77
+ doc.add_paragraph("This is the second paragraph with some content.")
78
+ doc.add_heading("Section 2", level=1)
79
+ doc.add_paragraph("Content in section 2.")
80
+
81
+ word_path = FIXTURES_DIR / "sample.docx"
82
+ doc.save(word_path)
83
+
84
+ print(f"Created: {word_path}")
85
+
86
+
87
+ # ============================================================================
88
+ # Main
89
+ # ============================================================================
90
+ if __name__ == "__main__":
91
+ print("Generating test fixtures...")
92
+ generate_pdf()
93
+ generate_excel()
94
+ generate_word()
95
+ print("All fixtures generated successfully!")
tests/fixtures/sample.csv ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ Name,Age,City
2
+ Alice,30,New York
3
+ Bob,25,San Francisco
4
+ Charlie,35,Boston
tests/fixtures/sample.docx ADDED
Binary file (36.7 kB). View file
 
tests/fixtures/sample.txt ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ This is a test text file.
2
+ It has multiple lines.
3
+ Line 3 with some content.
4
+ Final line.
tests/fixtures/sample.xlsx ADDED
Binary file (5.44 kB). View file
 
tests/fixtures/test_image.jpg ADDED
tests/test_calculator.py ADDED
@@ -0,0 +1,293 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Tests for calculator tool (safe mathematical evaluation)
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Tests cover:
7
+ - Basic arithmetic operations
8
+ - Mathematical functions
9
+ - Safety checks (no code execution, no imports, etc.)
10
+ - Timeout protection
11
+ - Complexity limits
12
+ - Error handling
13
+ """
14
+
15
+ import pytest
16
+ from src.tools.calculator import safe_eval
17
+
18
+
19
+ # ============================================================================
20
+ # Basic Arithmetic Tests
21
+ # ============================================================================
22
+
23
+ def test_addition():
24
+ """Test basic addition"""
25
+ result = safe_eval("2 + 3")
26
+ assert result["result"] == 5
27
+ assert result["success"] is True
28
+
29
+
30
+ def test_subtraction():
31
+ """Test basic subtraction"""
32
+ result = safe_eval("10 - 4")
33
+ assert result["result"] == 6
34
+
35
+
36
+ def test_multiplication():
37
+ """Test basic multiplication"""
38
+ result = safe_eval("6 * 7")
39
+ assert result["result"] == 42
40
+
41
+
42
+ def test_division():
43
+ """Test basic division"""
44
+ result = safe_eval("15 / 3")
45
+ assert result["result"] == 5.0
46
+
47
+
48
+ def test_floor_division():
49
+ """Test floor division"""
50
+ result = safe_eval("17 // 5")
51
+ assert result["result"] == 3
52
+
53
+
54
+ def test_modulo():
55
+ """Test modulo operation"""
56
+ result = safe_eval("17 % 5")
57
+ assert result["result"] == 2
58
+
59
+
60
+ def test_exponentiation():
61
+ """Test exponentiation"""
62
+ result = safe_eval("2 ** 8")
63
+ assert result["result"] == 256
64
+
65
+
66
+ def test_negative_numbers():
67
+ """Test negative numbers"""
68
+ result = safe_eval("-5 + 3")
69
+ assert result["result"] == -2
70
+
71
+
72
+ def test_complex_expression():
73
+ """Test complex arithmetic expression"""
74
+ result = safe_eval("(2 + 3) * 4 - 10 / 2")
75
+ assert result["result"] == 15.0
76
+
77
+
78
+ # ============================================================================
79
+ # Mathematical Function Tests
80
+ # ============================================================================
81
+
82
+ def test_sqrt():
83
+ """Test square root function"""
84
+ result = safe_eval("sqrt(16)")
85
+ assert result["result"] == 4.0
86
+
87
+
88
+ def test_abs():
89
+ """Test absolute value"""
90
+ result = safe_eval("abs(-42)")
91
+ assert result["result"] == 42
92
+
93
+
94
+ def test_round():
95
+ """Test rounding"""
96
+ result = safe_eval("round(3.7)")
97
+ assert result["result"] == 4
98
+
99
+
100
+ def test_min():
101
+ """Test min function"""
102
+ result = safe_eval("min(5, 2, 8, 1)")
103
+ assert result["result"] == 1
104
+
105
+
106
+ def test_max():
107
+ """Test max function"""
108
+ result = safe_eval("max(5, 2, 8, 1)")
109
+ assert result["result"] == 8
110
+
111
+
112
+ def test_trigonometric():
113
+ """Test trigonometric functions"""
114
+ result = safe_eval("sin(0)")
115
+ assert result["result"] == 0.0
116
+
117
+ result = safe_eval("cos(0)")
118
+ assert result["result"] == 1.0
119
+
120
+
121
+ def test_logarithm():
122
+ """Test logarithmic functions"""
123
+ result = safe_eval("log10(100)")
124
+ assert result["result"] == 2.0
125
+
126
+
127
+ def test_constants():
128
+ """Test mathematical constants"""
129
+ result = safe_eval("pi")
130
+ assert abs(result["result"] - 3.14159) < 0.001
131
+
132
+ result = safe_eval("e")
133
+ assert abs(result["result"] - 2.71828) < 0.001
134
+
135
+
136
+ def test_factorial():
137
+ """Test factorial function"""
138
+ result = safe_eval("factorial(5)")
139
+ assert result["result"] == 120
140
+
141
+
142
+ def test_nested_functions():
143
+ """Test nested function calls"""
144
+ result = safe_eval("sqrt(abs(-16))")
145
+ assert result["result"] == 4.0
146
+
147
+
148
+ # ============================================================================
149
+ # Security Tests
150
+ # ============================================================================
151
+
152
+ def test_no_import():
153
+ """Test that imports are blocked"""
154
+ with pytest.raises(SyntaxError):
155
+ safe_eval("import os")
156
+
157
+
158
+ def test_no_exec():
159
+ """Test that exec is blocked"""
160
+ with pytest.raises((ValueError, SyntaxError)):
161
+ safe_eval("exec('print(1)')")
162
+
163
+
164
+ def test_no_eval():
165
+ """Test that eval is blocked"""
166
+ with pytest.raises((ValueError, SyntaxError)):
167
+ safe_eval("eval('1+1')")
168
+
169
+
170
+ def test_no_lambda():
171
+ """Test that lambda is blocked"""
172
+ with pytest.raises((ValueError, SyntaxError)):
173
+ safe_eval("lambda x: x + 1")
174
+
175
+
176
+ def test_no_attribute_access():
177
+ """Test that attribute access is blocked"""
178
+ with pytest.raises(ValueError):
179
+ safe_eval("(1).__class__")
180
+
181
+
182
+ def test_no_list_comprehension():
183
+ """Test that list comprehensions are blocked"""
184
+ with pytest.raises(ValueError):
185
+ safe_eval("[x for x in range(10)]")
186
+
187
+
188
+ def test_no_dict_access():
189
+ """Test that dict operations are blocked"""
190
+ with pytest.raises((ValueError, SyntaxError)):
191
+ safe_eval("{'a': 1}")
192
+
193
+
194
+ def test_no_undefined_names():
195
+ """Test that undefined variable names are blocked"""
196
+ with pytest.raises(ValueError, match="Undefined name"):
197
+ safe_eval("undefined_variable + 1")
198
+
199
+
200
+ def test_no_dangerous_functions():
201
+ """Test that dangerous functions are blocked"""
202
+ with pytest.raises(ValueError, match="Unsupported function"):
203
+ safe_eval("open('file.txt')")
204
+
205
+
206
+ # ============================================================================
207
+ # Error Handling Tests
208
+ # ============================================================================
209
+
210
+ def test_division_by_zero():
211
+ """Test division by zero raises error"""
212
+ with pytest.raises(ZeroDivisionError):
213
+ safe_eval("10 / 0")
214
+
215
+
216
+ def test_invalid_syntax():
217
+ """Test invalid syntax raises error"""
218
+ with pytest.raises(SyntaxError):
219
+ safe_eval("2 +* 3")
220
+
221
+
222
+ def test_empty_expression():
223
+ """Test empty expression raises error"""
224
+ with pytest.raises(ValueError, match="non-empty string"):
225
+ safe_eval("")
226
+
227
+
228
+ def test_too_long_expression():
229
+ """Test expression length limit"""
230
+ long_expr = "1 + " * 300 + "1"
231
+ with pytest.raises(ValueError, match="too long"):
232
+ safe_eval(long_expr)
233
+
234
+
235
+ def test_huge_exponent():
236
+ """Test that huge exponents are blocked"""
237
+ with pytest.raises(ValueError, match="Exponent too large"):
238
+ safe_eval("2 ** 10000")
239
+
240
+
241
+ def test_sqrt_negative():
242
+ """Test sqrt of negative number raises error"""
243
+ with pytest.raises(ValueError):
244
+ safe_eval("sqrt(-1)")
245
+
246
+
247
+ def test_factorial_negative():
248
+ """Test factorial of negative number raises error"""
249
+ with pytest.raises(ValueError):
250
+ safe_eval("factorial(-5)")
251
+
252
+
253
+ # ============================================================================
254
+ # Edge Case Tests
255
+ # ============================================================================
256
+
257
+ def test_whitespace_handling():
258
+ """Test that whitespace is handled correctly"""
259
+ result = safe_eval(" 2 + 3 ")
260
+ assert result["result"] == 5
261
+
262
+
263
+ def test_floating_point():
264
+ """Test floating point arithmetic"""
265
+ result = safe_eval("3.14 * 2")
266
+ assert abs(result["result"] - 6.28) < 0.01
267
+
268
+
269
+ def test_very_small_numbers():
270
+ """Test very small numbers"""
271
+ result = safe_eval("0.0001 + 0.0002")
272
+ assert abs(result["result"] - 0.0003) < 0.00001
273
+
274
+
275
+ def test_scientific_notation():
276
+ """Test scientific notation"""
277
+ result = safe_eval("1e3 + 2e2")
278
+ assert result["result"] == 1200.0
279
+
280
+
281
+ def test_parentheses_precedence():
282
+ """Test that parentheses affect precedence correctly"""
283
+ result1 = safe_eval("2 + 3 * 4")
284
+ assert result1["result"] == 14
285
+
286
+ result2 = safe_eval("(2 + 3) * 4")
287
+ assert result2["result"] == 20
288
+
289
+
290
+ def test_multiple_operations():
291
+ """Test chaining multiple operations"""
292
+ result = safe_eval("10 + 20 - 5 * 2 / 2 + 3")
293
+ assert result["result"] == 28.0
tests/test_file_parser.py ADDED
@@ -0,0 +1,317 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Tests for file parser tool
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Tests cover:
7
+ - PDF parsing
8
+ - Excel parsing
9
+ - Word document parsing
10
+ - Text/CSV parsing
11
+ - Retry logic
12
+ - Error handling
13
+ """
14
+
15
+ import pytest
16
+ from pathlib import Path
17
+ from unittest.mock import Mock, patch, MagicMock
18
+ from src.tools.file_parser import (
19
+ parse_pdf,
20
+ parse_excel,
21
+ parse_word,
22
+ parse_text,
23
+ parse_file,
24
+ )
25
+
26
+ # ============================================================================
27
+ # Test Fixtures
28
+ # ============================================================================
29
+
30
+ FIXTURES_DIR = Path(__file__).parent / "fixtures"
31
+
32
+
33
+ @pytest.fixture
34
+ def sample_text_file():
35
+ """Path to sample text file"""
36
+ return str(FIXTURES_DIR / "sample.txt")
37
+
38
+
39
+ @pytest.fixture
40
+ def sample_csv_file():
41
+ """Path to sample CSV file"""
42
+ return str(FIXTURES_DIR / "sample.csv")
43
+
44
+
45
+ @pytest.fixture
46
+ def sample_excel_file():
47
+ """Path to sample Excel file"""
48
+ return str(FIXTURES_DIR / "sample.xlsx")
49
+
50
+
51
+ @pytest.fixture
52
+ def sample_word_file():
53
+ """Path to sample Word file"""
54
+ return str(FIXTURES_DIR / "sample.docx")
55
+
56
+
57
+ @pytest.fixture
58
+ def mock_pdf_reader():
59
+ """Mock PyPDF2 PdfReader"""
60
+ mock_page_1 = Mock()
61
+ mock_page_1.extract_text.return_value = "Test PDF page 1 content"
62
+
63
+ mock_page_2 = Mock()
64
+ mock_page_2.extract_text.return_value = "Test PDF page 2 content"
65
+
66
+ mock_reader = Mock()
67
+ mock_reader.pages = [mock_page_1, mock_page_2]
68
+
69
+ return mock_reader
70
+
71
+
72
+ # ============================================================================
73
+ # PDF Parser Tests
74
+ # ============================================================================
75
+
76
+ def test_parse_pdf_success(mock_pdf_reader):
77
+ """Test successful PDF parsing"""
78
+ with patch('PyPDF2.PdfReader') as mock_reader_class:
79
+ with patch('src.tools.file_parser.Path') as mock_path_class:
80
+ # Mock file exists
81
+ mock_path = Mock()
82
+ mock_path.exists.return_value = True
83
+ mock_path_class.return_value = mock_path
84
+
85
+ # Mock PdfReader
86
+ mock_reader_class.return_value = mock_pdf_reader
87
+
88
+ result = parse_pdf("test.pdf")
89
+
90
+ assert result["file_type"] == "PDF"
91
+ assert result["pages"] == 2
92
+ assert "page 1 content" in result["content"].lower()
93
+ assert "page 2 content" in result["content"].lower()
94
+
95
+
96
+ def test_parse_pdf_file_not_found():
97
+ """Test PDF parsing with missing file"""
98
+ with patch('src.tools.file_parser.Path') as mock_path_class:
99
+ mock_path = Mock()
100
+ mock_path.exists.return_value = False
101
+ mock_path_class.return_value = mock_path
102
+
103
+ with pytest.raises(FileNotFoundError):
104
+ parse_pdf("nonexistent.pdf")
105
+
106
+
107
+ def test_parse_pdf_io_error_retry():
108
+ """Test PDF parsing with IO error triggers retry"""
109
+ with patch('PyPDF2.PdfReader') as mock_reader_class:
110
+ with patch('src.tools.file_parser.Path') as mock_path_class:
111
+ # Mock file exists
112
+ mock_path = Mock()
113
+ mock_path.exists.return_value = True
114
+ mock_path_class.return_value = mock_path
115
+
116
+ # Mock IO error
117
+ mock_reader_class.side_effect = IOError("Disk error")
118
+
119
+ with pytest.raises(IOError):
120
+ parse_pdf("test.pdf")
121
+
122
+ # Verify retry happened (should be called MAX_RETRIES times)
123
+ assert mock_reader_class.call_count == 3
124
+
125
+
126
+ # ============================================================================
127
+ # Excel Parser Tests
128
+ # ============================================================================
129
+
130
+ def test_parse_excel_success(sample_excel_file):
131
+ """Test successful Excel parsing with real file"""
132
+ result = parse_excel(sample_excel_file)
133
+
134
+ assert result["file_type"] == "Excel"
135
+ assert len(result["sheets"]) == 2
136
+ assert "Data" in result["sheets"]
137
+ assert "Summary" in result["sheets"]
138
+ assert "Apple" in result["content"]
139
+ assert "Banana" in result["content"]
140
+
141
+
142
+ def test_parse_excel_file_not_found():
143
+ """Test Excel parsing with missing file"""
144
+ with pytest.raises(FileNotFoundError):
145
+ parse_excel("nonexistent.xlsx")
146
+
147
+
148
+ def test_parse_excel_io_error_retry():
149
+ """Test Excel parsing with IO error triggers retry"""
150
+ with patch('openpyxl.load_workbook') as mock_load:
151
+ with patch('src.tools.file_parser.Path') as mock_path_class:
152
+ # Mock file exists
153
+ mock_path = Mock()
154
+ mock_path.exists.return_value = True
155
+ mock_path_class.return_value = mock_path
156
+
157
+ # Mock IO error
158
+ mock_load.side_effect = IOError("Disk error")
159
+
160
+ with pytest.raises(IOError):
161
+ parse_excel("test.xlsx")
162
+
163
+ # Verify retry happened
164
+ assert mock_load.call_count == 3
165
+
166
+
167
+ # ============================================================================
168
+ # Word Document Parser Tests
169
+ # ============================================================================
170
+
171
+ def test_parse_word_success(sample_word_file):
172
+ """Test successful Word document parsing with real file"""
173
+ result = parse_word(sample_word_file)
174
+
175
+ assert result["file_type"] == "Word"
176
+ assert result["paragraphs"] > 0
177
+ assert "Test Word Document" in result["content"]
178
+ assert "first paragraph" in result["content"]
179
+
180
+
181
+ def test_parse_word_file_not_found():
182
+ """Test Word parsing with missing file"""
183
+ with pytest.raises(FileNotFoundError):
184
+ parse_word("nonexistent.docx")
185
+
186
+
187
+ def test_parse_word_io_error_retry():
188
+ """Test Word parsing with IO error triggers retry"""
189
+ with patch('docx.Document') as mock_doc_class:
190
+ with patch('src.tools.file_parser.Path') as mock_path_class:
191
+ # Mock file exists
192
+ mock_path = Mock()
193
+ mock_path.exists.return_value = True
194
+ mock_path_class.return_value = mock_path
195
+
196
+ # Mock IO error
197
+ mock_doc_class.side_effect = IOError("Disk error")
198
+
199
+ with pytest.raises(IOError):
200
+ parse_word("test.docx")
201
+
202
+ # Verify retry happened
203
+ assert mock_doc_class.call_count == 3
204
+
205
+
206
+ # ============================================================================
207
+ # Text/CSV Parser Tests
208
+ # ============================================================================
209
+
210
+ def test_parse_text_success(sample_text_file):
211
+ """Test successful text file parsing with real file"""
212
+ result = parse_text(sample_text_file)
213
+
214
+ assert result["file_type"] == "Text"
215
+ assert result["lines"] > 0
216
+ assert "test text file" in result["content"].lower()
217
+
218
+
219
+ def test_parse_csv_success(sample_csv_file):
220
+ """Test successful CSV file parsing with real file"""
221
+ result = parse_text(sample_csv_file)
222
+
223
+ assert result["file_type"] == "CSV"
224
+ assert result["lines"] > 0
225
+ assert "Name,Age,City" in result["content"]
226
+ assert "Alice" in result["content"]
227
+
228
+
229
+ def test_parse_text_file_not_found():
230
+ """Test text parsing with missing file"""
231
+ with pytest.raises(FileNotFoundError):
232
+ parse_text("nonexistent.txt")
233
+
234
+
235
+ def test_parse_text_io_error_retry():
236
+ """Test text parsing with IO error triggers retry"""
237
+ with patch('builtins.open') as mock_open:
238
+ with patch('src.tools.file_parser.Path') as mock_path_class:
239
+ # Mock file exists
240
+ mock_path = Mock()
241
+ mock_path.exists.return_value = True
242
+ mock_path.suffix = '.txt'
243
+ mock_path_class.return_value = mock_path
244
+
245
+ # Mock IO error
246
+ mock_open.side_effect = IOError("Disk error")
247
+
248
+ with pytest.raises(IOError):
249
+ parse_text("test.txt")
250
+
251
+ # Verify retry happened
252
+ assert mock_open.call_count == 3
253
+
254
+
255
+ # ============================================================================
256
+ # Unified Parser Tests
257
+ # ============================================================================
258
+
259
+ def test_parse_file_pdf():
260
+ """Test unified parser dispatches to PDF parser"""
261
+ with patch('src.tools.file_parser.parse_pdf') as mock_parse_pdf:
262
+ mock_parse_pdf.return_value = {"file_type": "PDF"}
263
+
264
+ result = parse_file("test.pdf")
265
+
266
+ assert result["file_type"] == "PDF"
267
+ mock_parse_pdf.assert_called_once()
268
+
269
+
270
+ def test_parse_file_excel():
271
+ """Test unified parser dispatches to Excel parser"""
272
+ with patch('src.tools.file_parser.parse_excel') as mock_parse_excel:
273
+ mock_parse_excel.return_value = {"file_type": "Excel"}
274
+
275
+ result = parse_file("test.xlsx")
276
+
277
+ assert result["file_type"] == "Excel"
278
+ mock_parse_excel.assert_called_once()
279
+
280
+
281
+ def test_parse_file_word():
282
+ """Test unified parser dispatches to Word parser"""
283
+ with patch('src.tools.file_parser.parse_word') as mock_parse_word:
284
+ mock_parse_word.return_value = {"file_type": "Word"}
285
+
286
+ result = parse_file("test.docx")
287
+
288
+ assert result["file_type"] == "Word"
289
+ mock_parse_word.assert_called_once()
290
+
291
+
292
+ def test_parse_file_text():
293
+ """Test unified parser dispatches to text parser"""
294
+ with patch('src.tools.file_parser.parse_text') as mock_parse_text:
295
+ mock_parse_text.return_value = {"file_type": "Text"}
296
+
297
+ result = parse_file("test.txt")
298
+
299
+ assert result["file_type"] == "Text"
300
+ mock_parse_text.assert_called_once()
301
+
302
+
303
+ def test_parse_file_unsupported_extension():
304
+ """Test unified parser rejects unsupported file type"""
305
+ with pytest.raises(ValueError, match="Unsupported file type"):
306
+ parse_file("test.mp4")
307
+
308
+
309
+ def test_parse_file_xls_extension():
310
+ """Test unified parser handles .xls extension"""
311
+ with patch('src.tools.file_parser.parse_excel') as mock_parse_excel:
312
+ mock_parse_excel.return_value = {"file_type": "Excel"}
313
+
314
+ result = parse_file("test.xls")
315
+
316
+ assert result["file_type"] == "Excel"
317
+ mock_parse_excel.assert_called_once()
tests/test_vision.py ADDED
@@ -0,0 +1,299 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Tests for vision tool (multimodal image analysis)
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Tests cover:
7
+ - Image loading and encoding
8
+ - Gemini vision analysis
9
+ - Claude vision analysis
10
+ - Fallback mechanism
11
+ - Retry logic
12
+ - Error handling
13
+ """
14
+
15
+ import pytest
16
+ from pathlib import Path
17
+ from unittest.mock import Mock, patch, MagicMock
18
+ from src.tools.vision import (
19
+ load_and_encode_image,
20
+ analyze_image_gemini,
21
+ analyze_image_claude,
22
+ analyze_image,
23
+ )
24
+
25
+
26
+ # ============================================================================
27
+ # Test Fixtures
28
+ # ============================================================================
29
+
30
+ FIXTURES_DIR = Path(__file__).parent / "fixtures"
31
+
32
+
33
+ @pytest.fixture
34
+ def test_image_path():
35
+ """Path to test image"""
36
+ return str(FIXTURES_DIR / "test_image.jpg")
37
+
38
+
39
+ @pytest.fixture
40
+ def mock_gemini_response():
41
+ """Mock Gemini API response"""
42
+ mock_response = Mock()
43
+ mock_response.text = "This image shows a red square."
44
+ return mock_response
45
+
46
+
47
+ @pytest.fixture
48
+ def mock_claude_response():
49
+ """Mock Claude API response"""
50
+ mock_content = Mock()
51
+ mock_content.text = "The image contains a red colored square."
52
+
53
+ mock_response = Mock()
54
+ mock_response.content = [mock_content]
55
+ return mock_response
56
+
57
+
58
+ @pytest.fixture
59
+ def mock_settings_gemini():
60
+ """Mock Settings with Gemini API key"""
61
+ with patch('src.tools.vision.Settings') as mock:
62
+ settings_instance = Mock()
63
+ settings_instance.google_api_key = "test_google_key"
64
+ settings_instance.anthropic_api_key = None
65
+ mock.return_value = settings_instance
66
+ yield mock
67
+
68
+
69
+ @pytest.fixture
70
+ def mock_settings_claude():
71
+ """Mock Settings with Claude API key"""
72
+ with patch('src.tools.vision.Settings') as mock:
73
+ settings_instance = Mock()
74
+ settings_instance.google_api_key = None
75
+ settings_instance.anthropic_api_key = "test_anthropic_key"
76
+ mock.return_value = settings_instance
77
+ yield mock
78
+
79
+
80
+ @pytest.fixture
81
+ def mock_settings_both():
82
+ """Mock Settings with both API keys"""
83
+ with patch('src.tools.vision.Settings') as mock:
84
+ settings_instance = Mock()
85
+ settings_instance.google_api_key = "test_google_key"
86
+ settings_instance.anthropic_api_key = "test_anthropic_key"
87
+ mock.return_value = settings_instance
88
+ yield mock
89
+
90
+
91
+ # ============================================================================
92
+ # Image Loading Tests
93
+ # ============================================================================
94
+
95
+ def test_load_and_encode_image_success(test_image_path):
96
+ """Test successful image loading and encoding"""
97
+ result = load_and_encode_image(test_image_path)
98
+
99
+ assert "data" in result
100
+ assert "mime_type" in result
101
+ assert result["mime_type"] == "image/jpeg"
102
+ assert result["size_mb"] > 0
103
+ assert len(result["data"]) > 0 # Base64 encoded data
104
+
105
+
106
+ def test_load_image_file_not_found():
107
+ """Test image loading with missing file"""
108
+ with pytest.raises(FileNotFoundError):
109
+ load_and_encode_image("nonexistent_image.jpg")
110
+
111
+
112
+ def test_load_image_unsupported_format(tmp_path):
113
+ """Test image loading with unsupported format"""
114
+ # Create a text file with .mp4 extension
115
+ fake_video = tmp_path / "video.mp4"
116
+ fake_video.write_text("not a real video")
117
+
118
+ with pytest.raises(ValueError, match="Unsupported image format"):
119
+ load_and_encode_image(str(fake_video))
120
+
121
+
122
+ # ============================================================================
123
+ # Gemini Vision Tests
124
+ # ============================================================================
125
+
126
+ def test_analyze_image_gemini_success(mock_settings_gemini, test_image_path, mock_gemini_response):
127
+ """Test successful Gemini vision analysis"""
128
+ with patch('google.genai.Client') as mock_client_class:
129
+ # Mock Gemini client
130
+ mock_client = Mock()
131
+ mock_client.models.generate_content.return_value = mock_gemini_response
132
+ mock_client_class.return_value = mock_client
133
+
134
+ result = analyze_image_gemini(test_image_path, "What is in this image?")
135
+
136
+ assert result["model"] == "gemini-2.0-flash"
137
+ assert result["answer"] == "This image shows a red square."
138
+ assert result["question"] == "What is in this image?"
139
+ assert result["image_path"] == test_image_path
140
+
141
+
142
+ def test_analyze_image_gemini_default_question(mock_settings_gemini, test_image_path, mock_gemini_response):
143
+ """Test Gemini with default question"""
144
+ with patch('google.genai.Client') as mock_client_class:
145
+ mock_client = Mock()
146
+ mock_client.models.generate_content.return_value = mock_gemini_response
147
+ mock_client_class.return_value = mock_client
148
+
149
+ result = analyze_image_gemini(test_image_path)
150
+
151
+ assert result["question"] == "Describe this image in detail."
152
+
153
+
154
+ def test_analyze_image_gemini_missing_api_key():
155
+ """Test Gemini with missing API key"""
156
+ with patch('src.tools.vision.Settings') as mock_settings:
157
+ settings_instance = Mock()
158
+ settings_instance.google_api_key = None
159
+ mock_settings.return_value = settings_instance
160
+
161
+ with pytest.raises(ValueError, match="GOOGLE_API_KEY not configured"):
162
+ analyze_image_gemini("test.jpg")
163
+
164
+
165
+ def test_analyze_image_gemini_connection_error(mock_settings_gemini, test_image_path):
166
+ """Test Gemini with connection error (triggers retry)"""
167
+ with patch('google.genai.Client') as mock_client_class:
168
+ mock_client = Mock()
169
+ mock_client.models.generate_content.side_effect = ConnectionError("Network error")
170
+ mock_client_class.return_value = mock_client
171
+
172
+ with pytest.raises(ConnectionError):
173
+ analyze_image_gemini(test_image_path)
174
+
175
+ # Verify retry happened
176
+ assert mock_client.models.generate_content.call_count == 3
177
+
178
+
179
+ # ============================================================================
180
+ # Claude Vision Tests
181
+ # ============================================================================
182
+
183
+ def test_analyze_image_claude_success(mock_settings_claude, test_image_path, mock_claude_response):
184
+ """Test successful Claude vision analysis"""
185
+ with patch('anthropic.Anthropic') as mock_anthropic_class:
186
+ # Mock Claude client
187
+ mock_client = Mock()
188
+ mock_client.messages.create.return_value = mock_claude_response
189
+ mock_anthropic_class.return_value = mock_client
190
+
191
+ result = analyze_image_claude(test_image_path, "What is in this image?")
192
+
193
+ assert result["model"] == "claude-sonnet-4.5"
194
+ assert result["answer"] == "The image contains a red colored square."
195
+ assert result["question"] == "What is in this image?"
196
+ assert result["image_path"] == test_image_path
197
+
198
+
199
+ def test_analyze_image_claude_default_question(mock_settings_claude, test_image_path, mock_claude_response):
200
+ """Test Claude with default question"""
201
+ with patch('anthropic.Anthropic') as mock_anthropic_class:
202
+ mock_client = Mock()
203
+ mock_client.messages.create.return_value = mock_claude_response
204
+ mock_anthropic_class.return_value = mock_client
205
+
206
+ result = analyze_image_claude(test_image_path)
207
+
208
+ assert result["question"] == "Describe this image in detail."
209
+
210
+
211
+ def test_analyze_image_claude_missing_api_key():
212
+ """Test Claude with missing API key"""
213
+ with patch('src.tools.vision.Settings') as mock_settings:
214
+ settings_instance = Mock()
215
+ settings_instance.anthropic_api_key = None
216
+ mock_settings.return_value = settings_instance
217
+
218
+ with pytest.raises(ValueError, match="ANTHROPIC_API_KEY not configured"):
219
+ analyze_image_claude("test.jpg")
220
+
221
+
222
+ def test_analyze_image_claude_connection_error(mock_settings_claude, test_image_path):
223
+ """Test Claude with connection error (triggers retry)"""
224
+ with patch('anthropic.Anthropic') as mock_anthropic_class:
225
+ mock_client = Mock()
226
+ mock_client.messages.create.side_effect = ConnectionError("Network error")
227
+ mock_anthropic_class.return_value = mock_client
228
+
229
+ with pytest.raises(ConnectionError):
230
+ analyze_image_claude(test_image_path)
231
+
232
+ # Verify retry happened
233
+ assert mock_client.messages.create.call_count == 3
234
+
235
+
236
+ # ============================================================================
237
+ # Unified Vision Analysis Tests
238
+ # ============================================================================
239
+
240
+ def test_analyze_image_uses_gemini(mock_settings_both, test_image_path, mock_gemini_response):
241
+ """Test unified analysis prefers Gemini when both APIs available"""
242
+ with patch('google.genai.Client') as mock_gemini_class:
243
+ mock_client = Mock()
244
+ mock_client.models.generate_content.return_value = mock_gemini_response
245
+ mock_gemini_class.return_value = mock_client
246
+
247
+ result = analyze_image(test_image_path, "What is this?")
248
+
249
+ assert result["model"] == "gemini-2.0-flash"
250
+ assert "red square" in result["answer"].lower()
251
+
252
+
253
+ def test_analyze_image_fallback_to_claude(mock_settings_both, test_image_path, mock_claude_response):
254
+ """Test unified analysis falls back to Claude when Gemini fails"""
255
+ with patch('google.genai.Client') as mock_gemini_class:
256
+ with patch('anthropic.Anthropic') as mock_claude_class:
257
+ # Gemini fails
258
+ mock_gemini_client = Mock()
259
+ mock_gemini_client.models.generate_content.side_effect = Exception("Gemini error")
260
+ mock_gemini_class.return_value = mock_gemini_client
261
+
262
+ # Claude succeeds
263
+ mock_claude_client = Mock()
264
+ mock_claude_client.messages.create.return_value = mock_claude_response
265
+ mock_claude_class.return_value = mock_claude_client
266
+
267
+ result = analyze_image(test_image_path, "What is this?")
268
+
269
+ assert result["model"] == "claude-sonnet-4.5"
270
+ assert "red" in result["answer"].lower()
271
+
272
+
273
+ def test_analyze_image_no_api_keys():
274
+ """Test unified analysis with no API keys configured"""
275
+ with patch('src.tools.vision.Settings') as mock_settings:
276
+ settings_instance = Mock()
277
+ settings_instance.google_api_key = None
278
+ settings_instance.anthropic_api_key = None
279
+ mock_settings.return_value = settings_instance
280
+
281
+ with pytest.raises(ValueError, match="No vision API configured"):
282
+ analyze_image("test.jpg")
283
+
284
+
285
+ def test_analyze_image_both_fail(mock_settings_both, test_image_path):
286
+ """Test unified analysis when both APIs fail"""
287
+ with patch('google.genai.Client') as mock_gemini_class:
288
+ with patch('anthropic.Anthropic') as mock_claude_class:
289
+ # Both fail
290
+ mock_gemini_client = Mock()
291
+ mock_gemini_client.models.generate_content.side_effect = Exception("Gemini error")
292
+ mock_gemini_class.return_value = mock_gemini_client
293
+
294
+ mock_claude_client = Mock()
295
+ mock_claude_client.messages.create.side_effect = Exception("Claude error")
296
+ mock_claude_class.return_value = mock_claude_client
297
+
298
+ with pytest.raises(Exception, match="both failed"):
299
+ analyze_image(test_image_path)
tests/test_web_search.py ADDED
@@ -0,0 +1,242 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Tests for web search tool (Tavily and Exa)
3
+ Author: @mangobee
4
+ Date: 2026-01-02
5
+
6
+ Tests cover:
7
+ - Tavily search with mocked API
8
+ - Exa search with mocked API
9
+ - Retry logic simulation
10
+ - Fallback mechanism
11
+ - Error handling
12
+ """
13
+
14
+ import pytest
15
+ from unittest.mock import Mock, patch, MagicMock
16
+ from src.tools.web_search import tavily_search, exa_search, search
17
+
18
+
19
+ # ============================================================================
20
+ # Test Fixtures
21
+ # ============================================================================
22
+
23
+ @pytest.fixture
24
+ def mock_tavily_response():
25
+ """Mock Tavily API response"""
26
+ return {
27
+ "results": [
28
+ {
29
+ "title": "Test Result 1",
30
+ "url": "https://example.com/1",
31
+ "content": "This is test content 1"
32
+ },
33
+ {
34
+ "title": "Test Result 2",
35
+ "url": "https://example.com/2",
36
+ "content": "This is test content 2"
37
+ }
38
+ ]
39
+ }
40
+
41
+
42
+ @pytest.fixture
43
+ def mock_exa_response():
44
+ """Mock Exa API response"""
45
+ mock_result_1 = Mock()
46
+ mock_result_1.title = "Exa Result 1"
47
+ mock_result_1.url = "https://exa.com/1"
48
+ mock_result_1.text = "This is exa content 1"
49
+
50
+ mock_result_2 = Mock()
51
+ mock_result_2.title = "Exa Result 2"
52
+ mock_result_2.url = "https://exa.com/2"
53
+ mock_result_2.text = "This is exa content 2"
54
+
55
+ mock_response = Mock()
56
+ mock_response.results = [mock_result_1, mock_result_2]
57
+ return mock_response
58
+
59
+
60
+ @pytest.fixture
61
+ def mock_settings_tavily():
62
+ """Mock Settings with Tavily API key"""
63
+ with patch('src.tools.web_search.Settings') as mock:
64
+ settings_instance = Mock()
65
+ settings_instance.tavily_api_key = "test_tavily_key"
66
+ settings_instance.exa_api_key = "test_exa_key"
67
+ settings_instance.default_search_tool = "tavily"
68
+ mock.return_value = settings_instance
69
+ yield mock
70
+
71
+
72
+ @pytest.fixture
73
+ def mock_settings_exa():
74
+ """Mock Settings with Exa as default"""
75
+ with patch('src.tools.web_search.Settings') as mock:
76
+ settings_instance = Mock()
77
+ settings_instance.tavily_api_key = "test_tavily_key"
78
+ settings_instance.exa_api_key = "test_exa_key"
79
+ settings_instance.default_search_tool = "exa"
80
+ mock.return_value = settings_instance
81
+ yield mock
82
+
83
+
84
+ # ============================================================================
85
+ # Tavily Search Tests
86
+ # ============================================================================
87
+
88
+ def test_tavily_search_success(mock_settings_tavily, mock_tavily_response):
89
+ """Test successful Tavily search"""
90
+ with patch('tavily.TavilyClient') as mock_client_class:
91
+ mock_client = Mock()
92
+ mock_client.search.return_value = mock_tavily_response
93
+ mock_client_class.return_value = mock_client
94
+
95
+ result = tavily_search("test query", max_results=2)
96
+
97
+ assert result["source"] == "tavily"
98
+ assert result["query"] == "test query"
99
+ assert result["count"] == 2
100
+ assert len(result["results"]) == 2
101
+ assert result["results"][0]["title"] == "Test Result 1"
102
+ assert result["results"][0]["url"] == "https://example.com/1"
103
+ assert result["results"][0]["snippet"] == "This is test content 1"
104
+
105
+
106
+ def test_tavily_search_missing_api_key():
107
+ """Test Tavily search with missing API key"""
108
+ with patch('src.tools.web_search.Settings') as mock_settings:
109
+ settings_instance = Mock()
110
+ settings_instance.tavily_api_key = None
111
+ mock_settings.return_value = settings_instance
112
+
113
+ with pytest.raises(ValueError, match="TAVILY_API_KEY not configured"):
114
+ tavily_search("test query")
115
+
116
+
117
+ def test_tavily_search_connection_error(mock_settings_tavily):
118
+ """Test Tavily search with connection error (triggers retry)"""
119
+ with patch('tavily.TavilyClient') as mock_client_class:
120
+ mock_client = Mock()
121
+ mock_client.search.side_effect = ConnectionError("Network error")
122
+ mock_client_class.return_value = mock_client
123
+
124
+ with pytest.raises(ConnectionError):
125
+ tavily_search("test query")
126
+
127
+ # Verify retry happened (should be called MAX_RETRIES times)
128
+ assert mock_client.search.call_count == 3
129
+
130
+
131
+ def test_tavily_search_empty_results(mock_settings_tavily):
132
+ """Test Tavily search with empty results"""
133
+ with patch('tavily.TavilyClient') as mock_client_class:
134
+ mock_client = Mock()
135
+ mock_client.search.return_value = {"results": []}
136
+ mock_client_class.return_value = mock_client
137
+
138
+ result = tavily_search("test query")
139
+
140
+ assert result["count"] == 0
141
+ assert result["results"] == []
142
+
143
+
144
+ # ============================================================================
145
+ # Exa Search Tests
146
+ # ============================================================================
147
+
148
+ def test_exa_search_success(mock_settings_exa, mock_exa_response):
149
+ """Test successful Exa search"""
150
+ with patch('exa_py.Exa') as mock_client_class:
151
+ mock_client = Mock()
152
+ mock_client.search.return_value = mock_exa_response
153
+ mock_client_class.return_value = mock_client
154
+
155
+ result = exa_search("test query", max_results=2)
156
+
157
+ assert result["source"] == "exa"
158
+ assert result["query"] == "test query"
159
+ assert result["count"] == 2
160
+ assert len(result["results"]) == 2
161
+ assert result["results"][0]["title"] == "Exa Result 1"
162
+ assert result["results"][0]["url"] == "https://exa.com/1"
163
+ assert result["results"][0]["snippet"] == "This is exa content 1"
164
+
165
+
166
+ def test_exa_search_missing_api_key():
167
+ """Test Exa search with missing API key"""
168
+ with patch('src.tools.web_search.Settings') as mock_settings:
169
+ settings_instance = Mock()
170
+ settings_instance.exa_api_key = None
171
+ mock_settings.return_value = settings_instance
172
+
173
+ with pytest.raises(ValueError, match="EXA_API_KEY not configured"):
174
+ exa_search("test query")
175
+
176
+
177
+ def test_exa_search_connection_error(mock_settings_exa):
178
+ """Test Exa search with connection error (triggers retry)"""
179
+ with patch('exa_py.Exa') as mock_client_class:
180
+ mock_client = Mock()
181
+ mock_client.search.side_effect = ConnectionError("Network error")
182
+ mock_client_class.return_value = mock_client
183
+
184
+ with pytest.raises(ConnectionError):
185
+ exa_search("test query")
186
+
187
+ # Verify retry happened
188
+ assert mock_client.search.call_count == 3
189
+
190
+
191
+ # ============================================================================
192
+ # Unified Search with Fallback Tests
193
+ # ============================================================================
194
+
195
+ def test_search_tavily_success(mock_settings_tavily, mock_tavily_response):
196
+ """Test unified search using Tavily successfully"""
197
+ with patch('tavily.TavilyClient') as mock_client_class:
198
+ mock_client = Mock()
199
+ mock_client.search.return_value = mock_tavily_response
200
+ mock_client_class.return_value = mock_client
201
+
202
+ result = search("test query")
203
+
204
+ assert result["source"] == "tavily"
205
+ assert result["count"] == 2
206
+
207
+
208
+ def test_search_fallback_to_exa(mock_settings_tavily, mock_exa_response):
209
+ """Test unified search falls back to Exa when Tavily fails"""
210
+ with patch('tavily.TavilyClient') as mock_tavily_class:
211
+ with patch('exa_py.Exa') as mock_exa_class:
212
+ # Tavily fails
213
+ mock_tavily_client = Mock()
214
+ mock_tavily_client.search.side_effect = Exception("Tavily error")
215
+ mock_tavily_class.return_value = mock_tavily_client
216
+
217
+ # Exa succeeds
218
+ mock_exa_client = Mock()
219
+ mock_exa_client.search.return_value = mock_exa_response
220
+ mock_exa_class.return_value = mock_exa_client
221
+
222
+ result = search("test query")
223
+
224
+ assert result["source"] == "exa"
225
+ assert result["count"] == 2
226
+
227
+
228
+ def test_search_both_fail(mock_settings_tavily):
229
+ """Test unified search when both Tavily and Exa fail"""
230
+ with patch('tavily.TavilyClient') as mock_tavily_class:
231
+ with patch('exa_py.Exa') as mock_exa_class:
232
+ # Both fail
233
+ mock_tavily_client = Mock()
234
+ mock_tavily_client.search.side_effect = Exception("Tavily error")
235
+ mock_tavily_class.return_value = mock_tavily_client
236
+
237
+ mock_exa_client = Mock()
238
+ mock_exa_client.search.side_effect = Exception("Exa error")
239
+ mock_exa_class.return_value = mock_exa_client
240
+
241
+ with pytest.raises(Exception, match="Search failed"):
242
+ search("test query")