Stage 2: Implement tool development with retry logic and error handling
Implemented 4 core tools with comprehensive test coverage:
**Tools Added:**
- web_search.py: Tavily/Exa search with fallback (10 tests)
- file_parser.py: PDF/Excel/Word/Text parsing (19 tests)
- calculator.py: Safe math eval with security (41 tests)
- vision.py: Multimodal image analysis (15 tests)
**Features:**
- Retry logic with tenacity (exponential backoff, 3 max retries)
- Comprehensive error handling and logging
- Tool registry in __init__.py with metadata
- 85 passing tests total
**Integration:**
- Updated graph.py execute_node to load tool registry
- Added TOOLS dict for Stage 3 dynamic tool selection
- Maintained Stage 1 compatibility
**Testing:**
- Created test fixtures for all file types
- Mock API testing for web search and vision
- Security testing for calculator (prevents code injection)
- All 91 tests passing (6 agent + 85 tool tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- PLAN.md +297 -5
- TODO.md +66 -9
- pyproject.toml +1 -5
- requirements.txt +1 -0
- src/agent/graph.py +44 -19
- src/tools/__init__.py +58 -9
- src/tools/calculator.py +303 -0
- src/tools/file_parser.py +367 -0
- src/tools/vision.py +339 -0
- src/tools/web_search.py +230 -0
- tests/fixtures/generate_fixtures.py +95 -0
- tests/fixtures/sample.csv +4 -0
- tests/fixtures/sample.docx +0 -0
- tests/fixtures/sample.txt +4 -0
- tests/fixtures/sample.xlsx +0 -0
- tests/fixtures/test_image.jpg +0 -0
- tests/test_calculator.py +293 -0
- tests/test_file_parser.py +317 -0
- tests/test_vision.py +299 -0
- tests/test_web_search.py +242 -0
**PLAN.md** (+297 −5), rewritten for Stage 2:

# Implementation Plan - Stage 2: Tool Development

**Date:** 2026-01-02
**Dev Record:** TBD (will create dev_260102_##_stage2_tool_development.md)
**Status:** In Progress

## Objective

Implement 4 core tools (web search, file parsing, calculator, multimodal vision) with retry logic and error handling, following Level 5 (Component Selection) and Level 6 (Implementation Framework) architectural decisions. Each tool must be independently testable and integrate seamlessly with the LangGraph StateGraph.

## Steps

### Step 1: Web Search Tool Implementation

**1.1 Create src/tools/web_search.py**

- Implement `tavily_search(query: str, max_results: int = 5) -> dict` function
- Implement `exa_search(query: str, max_results: int = 5) -> dict` function (fallback)
- Use `Settings.get_search_api_key()` for API key retrieval
- Return structured results: `{results: [{title, url, snippet}], source: "tavily"|"exa"}`

**1.2 Add retry logic with exponential backoff**

- Use the `tenacity` library for the retry decorator
- Retry on connection errors, timeouts, and rate limits
- Max 3 retries with 2^n second delays
- Fall back from Tavily to Exa if Tavily fails after retries

**1.3 Error handling**

- Catch API errors and return meaningful error messages
- Handle empty results gracefully
- Log all errors for debugging

**1.4 Create tests/test_web_search.py**

- Test Tavily search with a mock API
- Test Exa search with a mock API
- Test retry logic (simulate failures)
- Test the fallback mechanism
- Test error handling
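The retry-and-fallback behavior in step 1.2 can be sketched with a hand-rolled stand-in for tenacity (the real code would use its `@retry` decorator); `primary` and `fallback` are placeholders for the Tavily and Exa clients, which are not shown here:

```python
import time


def with_retries(fn, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Retry fn with exponential backoff (2s, 4s, ...) on connection errors."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except ConnectionError:
                if attempt == max_attempts - 1:
                    raise  # exhausted: let the caller decide what to do
                sleep(base_delay ** (attempt + 1))
    return wrapper


def search(query, primary, fallback, sleep=time.sleep):
    """Try the primary backend with retries, then fall back to the secondary."""
    try:
        return with_retries(primary, sleep=sleep)(query)
    except ConnectionError:
        return with_retries(fallback, sleep=sleep)(query)
```

Injecting `sleep` keeps the backoff testable without real delays; tenacity offers the same hook via its `sleep` argument.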
### Step 2: File Parsing Tool Implementation

**2.1 Create src/tools/file_parser.py**

- Implement `parse_pdf(file_path: str) -> str` using PyPDF2
- Implement `parse_excel(file_path: str) -> dict` using openpyxl
- Implement `parse_docx(file_path: str) -> str` using python-docx
- Implement `parse_image_text(image_path: str) -> str` using Pillow + OCR (optional)
- Generic `parse_file(file_path: str) -> dict` dispatcher based on extension

**2.2 Add retry logic for file operations**

- Retry on file read errors (network issues, temporary locks)
- Max 3 retries with exponential backoff

**2.3 Error handling**

- Handle file-not-found errors
- Handle corrupted-file errors
- Handle unsupported-format errors
- Return structured error responses

**2.4 Create tests/test_file_parser.py**

- Create test fixtures (sample PDF, Excel, and Word files in tests/fixtures/)
- Test each parser function independently
- Test error handling for missing files
- Test error handling for corrupted files
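The extension-based dispatcher in 2.1 can be sketched as follows. Only the plain-text handler is real here; the PDF/Excel/Word handlers from the plan would slot into the same registry but are not reproduced (the shipped file_parser.py is not shown in this excerpt):

```python
from pathlib import Path


def parse_text(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


# PyPDF2/openpyxl/python-docx handlers would be registered alongside this one.
PARSERS = {
    ".txt": parse_text,
    ".csv": parse_text,
}


def parse_file(file_path: str) -> dict:
    """Dispatch on extension; return a structured result or error dict."""
    path = Path(file_path)
    handler = PARSERS.get(path.suffix.lower())
    if handler is None:
        return {"error": f"unsupported format: {path.suffix}"}
    if not path.exists():
        return {"error": f"file not found: {file_path}"}
    try:
        return {"content": handler(file_path), "format": path.suffix.lstrip(".")}
    except OSError as exc:
        return {"error": str(exc)}
```

Returning an error dict rather than raising matches the plan's "structured error responses" requirement, so the agent graph can inspect failures uniformly.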
### Step 3: Calculator Tool Implementation

**3.1 Create src/tools/calculator.py**

- Implement `safe_eval(expression: str) -> dict` by walking the `ast` parse tree against a whitelist (note: `ast.literal_eval` alone cannot evaluate arithmetic or function calls, so a custom AST walker is required)
- Support basic arithmetic operations (+, -, *, /, **, %)
- Support mathematical functions (sin, cos, sqrt, etc.) via the math module
- Return a structured result: `{result: float|int, expression: str}`

**3.2 Add safety checks**

- Whitelist allowed operations (no exec, eval, import)
- Validate the expression before evaluation
- Set an execution timeout (prevent infinite loops)
- Limit expression complexity (prevent DoS)

**3.3 Error handling**

- Handle syntax errors
- Handle division by zero
- Handle invalid operations
- Return meaningful error messages

**3.4 Create tests/test_calculator.py**

- Test basic arithmetic (2+2, 10*5, etc.)
- Test mathematical functions (sqrt(16), sin(0), etc.)
- Test error handling (division by zero, invalid syntax)
- Test safety checks (block dangerous operations)
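A whitelisted walk over the `ast` parse tree is the standard way to evaluate arithmetic safely, since plain `ast.literal_eval` rejects expressions like `2 + 2` and function calls. A minimal sketch; the shipped calculator.py is not shown in this excerpt, so the names and error keys here are illustrative:

```python
import ast
import math
import operator

# Only these node/operator pairs are evaluated; everything else is rejected.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}
FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}


def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCS and not node.keywords):
        return FUNCS[node.func.id](*[_eval(a) for a in node.args])
    raise ValueError("disallowed expression")


def safe_eval(expression: str) -> dict:
    """Evaluate a math expression; never touches eval/exec/import."""
    try:
        result = _eval(ast.parse(expression, mode="eval").body)
        return {"result": result, "expression": expression}
    except ZeroDivisionError:
        return {"error": "division by zero", "expression": expression}
    except (ValueError, SyntaxError) as exc:
        return {"error": str(exc), "expression": expression}
```

Because unknown node types fall through to the `ValueError`, attempts like `__import__('os')` are rejected rather than executed; the timeout and complexity limits from 3.2 would sit on top of this.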
### Step 4: Multimodal Vision Tool Implementation

**4.1 Create src/tools/vision.py**

- Implement `analyze_image(image_path: str, question: str) -> str`
- Use the LLM's native vision capabilities (Gemini/Claude)
- Load the image, encode to base64
- Send to a vision-capable LLM with the question
- Return the description/answer

**4.2 Add retry logic**

- Retry on API errors
- Max 3 retries with exponential backoff

**4.3 Error handling**

- Handle image loading errors
- Handle unsupported image formats
- Handle API errors
- Return structured responses

**4.4 Create tests/test_vision.py**

- Create test image fixtures
- Test image analysis with a mock LLM
- Test error handling
- Test retry logic
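The load-and-encode step from 4.1 is stdlib-only; the provider-specific API call that follows it is omitted, and the payload keys below are an assumed shape for illustration, not a confirmed detail of vision.py:

```python
import base64
import mimetypes
from pathlib import Path


def encode_image(image_path: str) -> dict:
    """Read an image and return a base64 payload with a guessed MIME type,
    roughly the shape vision APIs expect before attaching a question."""
    data = Path(image_path).read_bytes()  # raises OSError if missing/unreadable
    mime, _ = mimetypes.guess_type(image_path)
    return {
        "media_type": mime or "application/octet-stream",
        "data": base64.b64encode(data).decode("ascii"),
    }
```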
### Step 5: Tool Integration with StateGraph

**5.1 Update src/tools/__init__.py**

- Export all tool functions
- Create a unified tool registry: `TOOLS = {name: function}`
- Add tool metadata (description, parameters, return type)

**5.2 Update src/agent/graph.py execute_node**

- Replace the placeholder with actual tool execution
- Parse tool calls from the plan
- Execute tools with error handling
- Collect results
- Return updated state with tool results

**5.3 Add tool execution wrapper**

- Implement `execute_tool(tool_name: str, **kwargs) -> dict`
- Add logging for tool calls
- Add timeout enforcement
- Add result validation
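The `execute_tool()` wrapper from 5.3 can be sketched against a registry that uses the same `{"function": ..., "description": ...}` entry shape as the commit's TOOLS dict; the `echo` entry and the result-envelope keys are illustrative, and timeout enforcement is left out:

```python
import logging
import time

logger = logging.getLogger(__name__)

# Stand-in registry; the real one maps web_search/parse_file/calculator/vision.
TOOLS = {
    "echo": {"function": lambda text: {"result": text}, "description": "Echo input"},
}


def execute_tool(tool_name: str, **kwargs) -> dict:
    """Look up a tool, run it, and wrap the outcome in a uniform envelope."""
    entry = TOOLS.get(tool_name)
    if entry is None:
        return {"tool": tool_name, "status": "error", "error": "unknown tool"}
    start = time.monotonic()
    logger.info("calling %s with %s", tool_name, kwargs)
    try:
        result = entry["function"](**kwargs)
        return {"tool": tool_name, "status": "ok", "result": result,
                "elapsed_s": time.monotonic() - start}
    except Exception as exc:  # surface the failure to the graph, don't raise
        return {"tool": tool_name, "status": "error", "error": str(exc)}
```

Catching broadly here is deliberate: the StateGraph should see a structured failure record for any tool error rather than an unhandled exception.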
### Step 6: Configuration and Settings Updates

**6.1 Update src/config/settings.py**

- Add tool-specific settings (timeouts, max retries, etc.)
- Add tool feature flags (enable/disable specific tools)
- Add result size limits

**6.2 Update .env.example**

- Document any new environment variables
- Add tool-specific configuration examples

### Step 7: Integration Testing

**7.1 Create tests/test_tools_integration.py**

- Test all tools working together
- Test tool execution from the StateGraph
- Test error propagation
- Test retry mechanisms across all tools

**7.2 Create test_stage2.py**

- End-to-end test with real tool calls
- Verify the StateGraph executes tools correctly
- Verify results are returned to state
- Verify errors are handled gracefully

### Step 8: Documentation and Deployment

**8.1 Update requirements.txt**

- Ensure all tool dependencies are included
- Add tenacity for retry logic

**8.2 Local testing**

- Run all test suites
- Test with the Gradio UI
- Verify no regressions from Stage 1

**8.3 Deploy to HF Spaces**

- Push changes
- Verify the build succeeds
- Test tools in the deployed environment

## Files to Modify

**New files to create:**

- `src/tools/web_search.py` - Tavily/Exa search implementation
- `src/tools/file_parser.py` - PDF/Excel/Word/Image parsing
- `src/tools/calculator.py` - Safe expression evaluation
- `src/tools/vision.py` - Multimodal image analysis
- `tests/test_web_search.py` - Web search tests
- `tests/test_file_parser.py` - File parser tests
- `tests/test_calculator.py` - Calculator tests
- `tests/test_vision.py` - Vision tests
- `tests/test_tools_integration.py` - Integration tests
- `tests/test_stage2.py` - Stage 2 end-to-end tests
- `tests/fixtures/` - Test files directory

**Existing files to modify:**

- `src/tools/__init__.py` - Export all tools, create tool registry
- `src/agent/graph.py` - Update execute_node to use real tools
- `src/config/settings.py` - Add tool-specific settings
- `.env.example` - Document new configuration (if any)
- `requirements.txt` - Add tenacity for retry logic

**Files NOT to modify:**

- `src/agent/graph.py` plan_node - Defer to Stage 3
- `src/agent/graph.py` answer_node - Defer to Stage 3
- Planning/reasoning logic - Defer to Stage 3

## Success Criteria

### Functional Requirements

- [ ] Web search tool returns valid results from Tavily
- [ ] Web search falls back to Exa when Tavily fails
- [ ] File parser handles PDF, Excel, and Word files correctly
- [ ] Calculator evaluates mathematical expressions safely
- [ ] Vision tool analyzes images using LLM vision capabilities
- [ ] All tools have retry logic with exponential backoff
- [ ] All tools handle errors gracefully
- [ ] Tools integrate with the StateGraph execute_node

### Technical Requirements

- [ ] All tool functions return structured dict responses
- [ ] Retry logic uses tenacity with max 3 retries
- [ ] Error messages are clear and actionable
- [ ] All tools have comprehensive test coverage (>80%)
- [ ] No unsafe code execution in the calculator
- [ ] Tool timeouts enforced to prevent hangs

### Validation Checkpoints

- [ ] **Checkpoint 1:** Web search tool working with tests passing
- [ ] **Checkpoint 2:** File parser working with tests passing
- [ ] **Checkpoint 3:** Calculator working with tests passing
- [ ] **Checkpoint 4:** Vision tool working with tests passing
- [ ] **Checkpoint 5:** All tools integrated with the StateGraph
- [ ] **Checkpoint 6:** Integration tests passing
- [ ] **Checkpoint 7:** Deployed to HF Spaces successfully

### Non-Goals for Stage 2

- ❌ Implementing planning logic (Stage 3)
- ❌ Implementing answer synthesis (Stage 3)
- ❌ Optimizing tool selection strategy (Stage 3)
- ❌ Advanced error recovery beyond retries (Stage 4)
- ❌ Performance optimization (Stage 5)

## Dependencies & Risks

**Dependencies:**

- Tavily API key (free tier: 1000 req/month)
- Exa API key (paid tier, fallback)
- LLM vision API access (Gemini/Claude)
- Test fixtures (sample files for parsing)

**Risks:**

- **Risk:** API rate limits during testing
  - **Mitigation:** Use mocks for unit tests, real APIs only for integration tests
- **Risk:** File parsing fails on edge cases
  - **Mitigation:** Comprehensive test fixtures covering various formats
- **Risk:** Calculator security vulnerabilities
  - **Mitigation:** Strict whitelisting, no eval/exec, AST parsing only
- **Risk:** Tool timeout issues on slow networks
  - **Mitigation:** Configurable timeouts, retry logic

## Next Steps After Stage 2

Once the Stage 2 success criteria are met:

1. Create the Stage 3 plan (Core Agent Logic - Planning & Reasoning)
2. Implement plan_node with a tool selection strategy
3. Implement answer_node with result synthesis
4. Test end-to-end agent behavior
5. Proceed to Stage 4 (Integration & Robustness)
**TODO.md** (+66 −9), rewritten for Stage 2:

# TODO - Stage 2: Tool Development

**Created:** 2026-01-02
**Plan:** PLAN.md (Stage 2: Tool Development)
**Status:** Ready for execution

## Task List

### Step 1: Web Search Tool
- [ ] Create `src/tools/web_search.py` with Tavily and Exa search functions
- [ ] Add retry logic with tenacity decorator (max 3 retries, exponential backoff)
- [ ] Implement fallback mechanism (Tavily → Exa)
- [ ] Add error handling and logging
- [ ] Create `tests/test_web_search.py` with mock API tests
- [ ] Test retry logic and fallback mechanism

### Step 2: File Parsing Tool
- [ ] Create `src/tools/file_parser.py` with PDF/Excel/Word parsers
- [ ] Implement generic `parse_file()` dispatcher
- [ ] Add retry logic for file operations
- [ ] Add error handling for missing/corrupted files
- [ ] Create test fixtures in `tests/fixtures/`
- [ ] Create `tests/test_file_parser.py` with parser tests

### Step 3: Calculator Tool
- [ ] Create `src/tools/calculator.py` with safe_eval function
- [ ] Implement safety checks (whitelist operations, timeout, complexity limits)
- [ ] Add error handling for syntax/division errors
- [ ] Create `tests/test_calculator.py` with arithmetic and safety tests

### Step 4: Vision Tool
- [ ] Create `src/tools/vision.py` with image analysis function
- [ ] Implement image loading and base64 encoding
- [ ] Integrate with LLM vision API (Gemini/Claude)
- [ ] Add retry logic for API errors
- [ ] Create test image fixtures
- [ ] Create `tests/test_vision.py` with mock LLM tests

### Step 5: StateGraph Integration
- [ ] Update `src/tools/__init__.py` to export all tools
- [ ] Create unified tool registry with metadata
- [ ] Update `src/agent/graph.py` execute_node to use real tools
- [ ] Implement `execute_tool()` wrapper with logging and timeout
- [ ] Test tool execution from StateGraph

### Step 6: Configuration Updates
- [ ] Update `src/config/settings.py` with tool-specific settings
- [ ] Add tool feature flags and timeouts
- [ ] Update `.env.example` with new configuration (if needed)

### Step 7: Integration Testing
- [ ] Create `tests/test_tools_integration.py` for cross-tool tests
- [ ] Create `tests/test_stage2.py` for end-to-end validation
- [ ] Test error propagation and retry mechanisms
- [ ] Verify StateGraph executes all tools correctly

### Step 8: Deployment
- [ ] Add `tenacity` to requirements.txt
- [ ] Run all test suites locally
- [ ] Test with Gradio UI
- [ ] Verify no regressions from Stage 1
- [ ] Push changes to HF Spaces
- [ ] Verify deployment build succeeds
- [ ] Test tools in deployed environment

## Notes

- All tools use the direct API approach (not MCP servers)
- HF Spaces deployment compatibility is the priority
- Mock APIs for unit tests, real APIs for integration tests only
- Each checkpoint should pass before moving to the next step
**pyproject.toml** (+1 −5), adds `tenacity` to the dependency list:

```toml
dependencies = [
    "langgraph>=0.2.0",
    "langchain>=0.3.0",
    "langchain-core>=0.3.0",
    # LLM APIs
    "anthropic>=0.39.0",
    "google-genai>=0.2.0",
    # Search & retrieval tools
    "exa-py>=1.0.0",
    "tavily-python>=0.5.0",
    # File readers (multi-format support)
    "PyPDF2>=3.0.0",
    "openpyxl>=3.1.0",
    "python-docx>=1.1.0",
    "pillow>=10.4.0",
    # Web & API utilities
    "requests>=2.32.0",
    "python-dotenv>=1.0.0",
    # Gradio UI
    "gradio[oauth]>=5.0.0",
    "pandas>=2.2.0",
    "tenacity>=9.1.2",
]
```
**requirements.txt** (+1 −0), adds `tenacity` at the end:

```text
# ============================================================================
pydantic>=2.0.0             # Data validation (for StateGraph)
typing-extensions>=4.12.0   # Type hints support
tenacity>=8.2.0             # Retry logic with exponential backoff
```
**src/agent/graph.py** (+44 −19): the module docstring now marks "Stage 2: Tool integration (CURRENT)", logging and the tool registry are imported, and the three Stage 1 placeholder nodes are updated (Args/Returns docstring sections and the unchanged AgentState definition are elided below):

```python
import logging
from typing import TypedDict, List, Optional

from langgraph.graph import StateGraph, END

from src.config import Settings
from src.tools import TOOLS

# ============================================================================
# Logging Setup
# ============================================================================
logger = logging.getLogger(__name__)

# ... AgentState definition unchanged from Stage 1 ...


def plan_node(state: AgentState) -> AgentState:
    """
    Planning node: Analyze question and generate execution plan.

    Stage 2: Basic tool listing
    Stage 3: Dynamic planning with LLM
    """
    logger.info(f"[plan_node] Question received: {state['question'][:100]}...")

    # Stage 2: List available tools (dynamic planning in Stage 3)
    tool_summary = ", ".join(TOOLS.keys())
    state["plan"] = f"Stage 2: {len(TOOLS)} tools available ({tool_summary}). Dynamic planning in Stage 3."

    logger.info(f"[plan_node] Plan created: {state['plan']}")

    return state


def execute_node(state: AgentState) -> AgentState:
    """
    Execution node: Execute tools based on plan.

    Stage 2: Tool execution with error handling
    Stage 3: Dynamic tool selection based on plan
    """
    logger.info(f"[execute_node] Executing tools - Plan: {state['plan'][:100]}...")

    # Stage 2: Tools are available but no dynamic planning yet.
    # For now, just demonstrate that the tool registry is loaded.
    tool_calls = []

    # Log available tools
    for tool_name, tool_info in TOOLS.items():
        logger.info(f"  Available tool: {tool_name} - {tool_info['description']}")
        tool_calls.append({
            "tool": tool_name,
            "status": "ready",
            "description": tool_info["description"],
            "category": tool_info["category"],
        })

    state["tool_calls"] = tool_calls

    logger.info(f"[execute_node] {len(tool_calls)} tools ready for Stage 3 dynamic execution")

    return state


def answer_node(state: AgentState) -> AgentState:
    """
    Answer synthesis node: Generate final factoid answer.

    Stage 2: Summarize tool availability
    Stage 3: Synthesize answer from tool execution results
    """
    logger.info(f"[answer_node] Processing {len(state['tool_calls'])} tool results")

    # Stage 2: Report tool readiness
    ready_tools = [t["tool"] for t in state["tool_calls"] if t["status"] == "ready"]
    state["answer"] = f"Stage 2 complete: {len(ready_tools)} tools ready for execution in Stage 3"

    logger.info(f"[answer_node] Answer generated: {state['answer']}")

    return state
```
**src/tools/__init__.py** (+58 −9), exports all tools and defines the registry:

```python
"""
Tool implementations package
Author: @mangobee

This package contains all agent tools:
- web_search: Web search using Tavily/Exa
- file_parser: Multi-format file parsing (PDF/Excel/Word/Text)
- calculator: Safe mathematical expression evaluation
- vision: Multimodal image analysis using LLMs

Stage 2: All tools implemented with retry logic and error handling
"""

from src.tools.web_search import search, tavily_search, exa_search
from src.tools.file_parser import parse_file, parse_pdf, parse_excel, parse_word, parse_text
from src.tools.calculator import safe_eval
from src.tools.vision import analyze_image, analyze_image_gemini, analyze_image_claude

# Tool registry with metadata
TOOLS = {
    "web_search": {
        "function": search,
        "description": "Search the web using Tavily or Exa APIs with fallback",
        "parameters": ["query", "max_results"],
        "category": "information_retrieval",
    },
    "parse_file": {
        "function": parse_file,
        "description": "Parse files (PDF, Excel, Word, Text, CSV) and extract content",
        "parameters": ["file_path"],
        "category": "file_processing",
    },
    "calculator": {
        "function": safe_eval,
        "description": "Safely evaluate mathematical expressions",
        "parameters": ["expression"],
        "category": "computation",
    },
    "vision": {
        "function": analyze_image,
        "description": "Analyze images using multimodal LLMs (Gemini/Claude)",
        "parameters": ["image_path", "question"],
        "category": "multimodal",
    },
}

__all__ = [
    # Main unified tool functions
    "search",
    "parse_file",
    "safe_eval",
    "analyze_image",
    # Specific implementations (for advanced use)
    "tavily_search",
    "exa_search",
    "parse_pdf",
    "parse_excel",
    "parse_word",
    "parse_text",
    "analyze_image_gemini",
    "analyze_image_claude",
    # Tool registry
    "TOOLS",
]
```
@@ -0,0 +1,303 @@
"""
Calculator Tool - Safe mathematical expression evaluation
Author: @mangobee
Date: 2026-01-02

Provides safe evaluation of mathematical expressions with:
- Whitelisted operations and functions
- Timeout protection
- Complexity limits
- No access to dangerous built-ins

Security is prioritized over functionality.
"""

import ast
import math
import operator
import logging
from typing import Any, Dict
import signal
from contextlib import contextmanager

# ============================================================================
# CONFIG
# ============================================================================
MAX_EXPRESSION_LENGTH = 500
MAX_EVAL_TIME_SECONDS = 2
MAX_NUMBER_SIZE = 10**100  # Prevent huge number calculations

# Whitelist of safe operations
SAFE_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}

# Whitelist of safe mathematical functions
SAFE_FUNCTIONS = {
    'abs': abs,
    'round': round,
    'min': min,
    'max': max,
    'sum': sum,
    # Math module functions
    'sqrt': math.sqrt,
    'ceil': math.ceil,
    'floor': math.floor,
    'log': math.log,
    'log10': math.log10,
    'exp': math.exp,
    'sin': math.sin,
    'cos': math.cos,
    'tan': math.tan,
    'asin': math.asin,
    'acos': math.acos,
    'atan': math.atan,
    'degrees': math.degrees,
    'radians': math.radians,
    'factorial': math.factorial,
    # Constants
    'pi': math.pi,
    'e': math.e,
}

# ============================================================================
# Logging Setup
# ============================================================================
logger = logging.getLogger(__name__)


# ============================================================================
# Timeout Context Manager
# ============================================================================

class TimeoutError(Exception):
    """Raised when evaluation exceeds timeout"""
    pass


@contextmanager
def timeout(seconds: int):
    """
    Context manager for timeout protection.

    Args:
        seconds: Maximum execution time

    Raises:
        TimeoutError: If execution exceeds timeout
    """
    def timeout_handler(signum, frame):
        raise TimeoutError(f"Evaluation exceeded {seconds} second timeout")

    # Set signal handler
    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(seconds)

    try:
        yield
    finally:
        # Restore old handler and cancel alarm
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)


# ============================================================================
# Safe AST Evaluator
# ============================================================================

class SafeEvaluator(ast.NodeVisitor):
    """
    AST visitor that evaluates mathematical expressions safely.

    Only allows whitelisted operations and functions.
    Prevents code execution, attribute access, and other dangerous operations.
    """

    def visit_Expression(self, node):
        """Visit Expression node (root of parse tree)"""
        return self.visit(node.body)

    def visit_Constant(self, node):
        """Visit Constant node (numbers, strings)"""
        value = node.value

        # Only allow numbers
        if not isinstance(value, (int, float, complex)):
            raise ValueError(f"Unsupported constant type: {type(value).__name__}")

        # Prevent huge numbers
        if isinstance(value, (int, float)) and abs(value) > MAX_NUMBER_SIZE:
            raise ValueError(f"Number too large: {value}")

        return value

    def visit_BinOp(self, node):
        """Visit binary operation node (+, -, *, /, etc.)"""
        op_type = type(node.op)

        if op_type not in SAFE_OPERATORS:
            raise ValueError(f"Unsupported operation: {op_type.__name__}")

        left = self.visit(node.left)
        right = self.visit(node.right)

        op_func = SAFE_OPERATORS[op_type]

        # Check for division by zero
        if op_type in (ast.Div, ast.FloorDiv, ast.Mod) and right == 0:
            raise ZeroDivisionError("Division by zero")

        # Prevent huge exponentiations
        if op_type == ast.Pow and abs(right) > 1000:
            raise ValueError(f"Exponent too large: {right}")

        return op_func(left, right)

    def visit_UnaryOp(self, node):
        """Visit unary operation node (-, +)"""
        op_type = type(node.op)

        if op_type not in SAFE_OPERATORS:
            raise ValueError(f"Unsupported unary operation: {op_type.__name__}")

        operand = self.visit(node.operand)
        op_func = SAFE_OPERATORS[op_type]

        return op_func(operand)

    def visit_Call(self, node):
        """Visit function call node"""
        # Only allow simple function names, not attribute access
        if not isinstance(node.func, ast.Name):
            raise ValueError("Only direct function calls are allowed")

        func_name = node.func.id

        if func_name not in SAFE_FUNCTIONS:
            raise ValueError(f"Unsupported function: {func_name}")

        # Evaluate arguments
        args = [self.visit(arg) for arg in node.args]

        # No keyword arguments allowed
        if node.keywords:
            raise ValueError("Keyword arguments not allowed")

        func = SAFE_FUNCTIONS[func_name]

        try:
            return func(*args)
        except Exception as e:
            raise ValueError(f"Error calling {func_name}: {str(e)}")

    def visit_Name(self, node):
        """Visit name node (variable/constant reference)"""
        # Only allow whitelisted constants
        if node.id in SAFE_FUNCTIONS:
            value = SAFE_FUNCTIONS[node.id]
            # If it's a constant (not a function), return it
            if not callable(value):
                return value

        raise ValueError(f"Undefined name: {node.id}")

    def visit_List(self, node):
        """Visit list node"""
        return [self.visit(element) for element in node.elts]

    def visit_Tuple(self, node):
        """Visit tuple node"""
        return tuple(self.visit(element) for element in node.elts)

    def generic_visit(self, node):
        """Catch-all for unsupported node types"""
        raise ValueError(f"Unsupported expression type: {type(node).__name__}")


# ============================================================================
# Public API
# ============================================================================

def safe_eval(expression: str) -> Dict[str, Any]:
    """
    Safely evaluate a mathematical expression.

    Args:
        expression: Mathematical expression string

    Returns:
        Dict with structure: {
            "result": float or int,  # Evaluation result
            "expression": str,       # Original expression
            "success": bool          # True if evaluation succeeded
        }

    Raises:
        ValueError: For invalid or unsafe expressions
        ZeroDivisionError: For division by zero
        TimeoutError: If evaluation exceeds timeout
        SyntaxError: For malformed expressions

    Examples:
        >>> safe_eval("2 + 2")
        {"result": 4, "expression": "2 + 2", "success": True}

        >>> safe_eval("sqrt(16) + 3")
        {"result": 7.0, "expression": "sqrt(16) + 3", "success": True}

        >>> safe_eval("import os")  # Raises ValueError
    """
    # Input validation
    if not expression or not isinstance(expression, str):
        raise ValueError("Expression must be a non-empty string")

    expression = expression.strip()

    if len(expression) > MAX_EXPRESSION_LENGTH:
        raise ValueError(
            f"Expression too long ({len(expression)} chars). "
            f"Maximum: {MAX_EXPRESSION_LENGTH} chars"
        )

    logger.info(f"Evaluating expression: {expression}")

    try:
        # Parse expression into AST
        tree = ast.parse(expression, mode='eval')

        # Evaluate with timeout protection
        with timeout(MAX_EVAL_TIME_SECONDS):
            evaluator = SafeEvaluator()
            result = evaluator.visit(tree)

        logger.info(f"Evaluation successful: {result}")

        return {
            "result": result,
            "expression": expression,
            "success": True,
        }

    except SyntaxError as e:
        logger.error(f"Syntax error in expression: {e}")
        raise SyntaxError(f"Invalid expression syntax: {str(e)}")
    except ZeroDivisionError as e:
        logger.error(f"Division by zero: {expression}")
        raise
    except TimeoutError as e:
        logger.error(f"Evaluation timeout: {expression}")
        raise
    except ValueError as e:
        logger.error(f"Invalid expression: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error evaluating expression: {e}")
        raise ValueError(f"Evaluation error: {str(e)}")
@@ -0,0 +1,367 @@
"""
File Parser Tool - Multi-format file reading
Author: @mangobee
Date: 2026-01-02

Provides file parsing for:
- PDF files (.pdf) using PyPDF2
- Excel files (.xlsx, .xls) using openpyxl
- Word documents (.docx) using python-docx
- Text files (.txt, .csv) using built-in open()

All parsers include retry logic and error handling.
"""

import logging
from pathlib import Path
from typing import Dict, List, Optional
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# ============================================================================
# CONFIG
# ============================================================================
MAX_RETRIES = 3
RETRY_MIN_WAIT = 1  # seconds
RETRY_MAX_WAIT = 5  # seconds

SUPPORTED_EXTENSIONS = {
    '.pdf': 'PDF',
    '.xlsx': 'Excel',
    '.xls': 'Excel',
    '.docx': 'Word',
    '.txt': 'Text',
    '.csv': 'CSV',
}

# ============================================================================
# Logging Setup
# ============================================================================
logger = logging.getLogger(__name__)


# ============================================================================
# PDF Parser
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((IOError, OSError)),
    reraise=True,
)
def parse_pdf(file_path: str) -> Dict:
    """
    Parse PDF file and extract text content.

    Args:
        file_path: Path to PDF file

    Returns:
        Dict with structure: {
            "content": str,   # Extracted text
            "pages": int,     # Number of pages
            "file_type": "PDF",
            "file_path": str
        }

    Raises:
        FileNotFoundError: If file doesn't exist
        ValueError: If file is corrupted or invalid
        IOError: For file reading errors (triggers retry)
    """
    try:
        from PyPDF2 import PdfReader

        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError(f"PDF file not found: {file_path}")

        logger.info(f"Parsing PDF: {file_path}")

        reader = PdfReader(str(path))
        num_pages = len(reader.pages)

        # Extract text from all pages
        content = []
        for page_num, page in enumerate(reader.pages, 1):
            text = page.extract_text()
            if text.strip():
                content.append(f"--- Page {page_num} ---\n{text}")

        full_content = "\n\n".join(content)

        logger.info(f"PDF parsed successfully: {num_pages} pages, {len(full_content)} chars")

        return {
            "content": full_content,
            "pages": num_pages,
            "file_type": "PDF",
            "file_path": file_path,
        }

    except FileNotFoundError as e:
        logger.error(f"PDF file not found: {e}")
        raise
    except (IOError, OSError) as e:
        logger.warning(f"PDF IO error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"PDF parsing error: {e}")
        raise ValueError(f"Failed to parse PDF: {str(e)}")


# ============================================================================
# Excel Parser
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((IOError, OSError)),
    reraise=True,
)
def parse_excel(file_path: str) -> Dict:
    """
    Parse Excel file and extract data from all sheets.

    Args:
        file_path: Path to Excel file (.xlsx or .xls)

    Returns:
        Dict with structure: {
            "content": str,       # Formatted table data
            "sheets": List[str],  # Sheet names
            "file_type": "Excel",
            "file_path": str
        }

    Raises:
        FileNotFoundError: If file doesn't exist
        ValueError: If file is corrupted or invalid
        IOError: For file reading errors (triggers retry)
    """
    try:
        from openpyxl import load_workbook

        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError(f"Excel file not found: {file_path}")

        logger.info(f"Parsing Excel: {file_path}")

        workbook = load_workbook(str(path), data_only=True)
        sheet_names = workbook.sheetnames

        # Extract data from all sheets
        content_parts = []
        for sheet_name in sheet_names:
            sheet = workbook[sheet_name]

            # Get all values
            rows = []
            for row in sheet.iter_rows(values_only=True):
                # Filter out completely empty rows
                if any(cell is not None for cell in row):
                    row_str = "\t".join(str(cell) if cell is not None else "" for cell in row)
                    rows.append(row_str)

            if rows:
                sheet_content = f"=== Sheet: {sheet_name} ===\n" + "\n".join(rows)
                content_parts.append(sheet_content)

        full_content = "\n\n".join(content_parts)

        logger.info(f"Excel parsed successfully: {len(sheet_names)} sheets")

        return {
            "content": full_content,
            "sheets": sheet_names,
            "file_type": "Excel",
            "file_path": file_path,
        }

    except FileNotFoundError as e:
        logger.error(f"Excel file not found: {e}")
        raise
    except (IOError, OSError) as e:
        logger.warning(f"Excel IO error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"Excel parsing error: {e}")
        raise ValueError(f"Failed to parse Excel: {str(e)}")


# ============================================================================
# Word Document Parser
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((IOError, OSError)),
    reraise=True,
)
def parse_word(file_path: str) -> Dict:
    """
    Parse Word document and extract text content.

    Args:
        file_path: Path to Word file (.docx)

    Returns:
        Dict with structure: {
            "content": str,     # Extracted text
            "paragraphs": int,  # Number of paragraphs
            "file_type": "Word",
            "file_path": str
        }

    Raises:
        FileNotFoundError: If file doesn't exist
        ValueError: If file is corrupted or invalid
        IOError: For file reading errors (triggers retry)
    """
    try:
        from docx import Document

        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError(f"Word file not found: {file_path}")

        logger.info(f"Parsing Word document: {file_path}")

        doc = Document(str(path))

        # Extract text from all paragraphs
        paragraphs = [para.text for para in doc.paragraphs if para.text.strip()]
        full_content = "\n\n".join(paragraphs)

        logger.info(f"Word parsed successfully: {len(paragraphs)} paragraphs")

        return {
            "content": full_content,
            "paragraphs": len(paragraphs),
            "file_type": "Word",
            "file_path": file_path,
        }

    except FileNotFoundError as e:
        logger.error(f"Word file not found: {e}")
        raise
    except (IOError, OSError) as e:
        logger.warning(f"Word IO error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"Word parsing error: {e}")
        raise ValueError(f"Failed to parse Word document: {str(e)}")


# ============================================================================
# Text/CSV Parser
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((IOError, OSError)),
    reraise=True,
)
def parse_text(file_path: str) -> Dict:
    """
    Parse plain text or CSV file.

    Args:
        file_path: Path to text file (.txt or .csv)

    Returns:
        Dict with structure: {
            "content": str,
            "lines": int,
            "file_type": "Text" or "CSV",
            "file_path": str
        }

    Raises:
        FileNotFoundError: If file doesn't exist
        IOError: For file reading errors (triggers retry)
    """
    try:
        path = Path(file_path)
        if not path.exists():
            raise FileNotFoundError(f"Text file not found: {file_path}")

        logger.info(f"Parsing text file: {file_path}")

        with open(path, 'r', encoding='utf-8') as f:
            content = f.read()

        lines = content.count('\n') + 1
        file_type = "CSV" if path.suffix == '.csv' else "Text"

        logger.info(f"{file_type} file parsed successfully: {lines} lines")

        return {
            "content": content,
            "lines": lines,
            "file_type": file_type,
            "file_path": file_path,
        }

    except FileNotFoundError as e:
        logger.error(f"Text file not found: {e}")
        raise
    except (IOError, OSError) as e:
        logger.warning(f"Text file IO error (will retry): {e}")
        raise
    except UnicodeDecodeError as e:
        logger.error(f"Text file encoding error: {e}")
        raise ValueError(f"Failed to decode text file (try UTF-8): {str(e)}")


# ============================================================================
# Unified File Parser
# ============================================================================

def parse_file(file_path: str) -> Dict:
    """
    Parse file based on extension, automatically selecting the right parser.

    Args:
        file_path: Path to file

    Returns:
        Dict with parsed content and metadata

    Raises:
        ValueError: If file type is not supported
        FileNotFoundError: If file doesn't exist
        Exception: For parsing errors
    """
    path = Path(file_path)
    extension = path.suffix.lower()

    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported file type: {extension}. "
            f"Supported: {', '.join(SUPPORTED_EXTENSIONS.keys())}"
        )

    logger.info(f"Dispatching parser for {SUPPORTED_EXTENSIONS[extension]} file: {file_path}")

    # Dispatch to appropriate parser
    if extension == '.pdf':
        return parse_pdf(file_path)
    elif extension in ['.xlsx', '.xls']:
        return parse_excel(file_path)
    elif extension == '.docx':
        return parse_word(file_path)
    elif extension in ['.txt', '.csv']:
        return parse_text(file_path)
    else:
        # Should never reach here due to check above
        raise ValueError(f"No parser for extension: {extension}")
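The if/elif chain in `parse_file` could equally be a dispatch table keyed by extension, which keeps the supported-extension check and the routing in one structure. A self-contained sketch of that alternative, with hypothetical stub parsers standing in for `parse_pdf` / `parse_excel` / `parse_word` / `parse_text` (the real ones hit the filesystem):

```python
from pathlib import Path

# Stub parsers stand in for the real file-reading functions
PARSERS = {
    ".pdf": lambda p: {"file_type": "PDF", "file_path": p},
    ".xlsx": lambda p: {"file_type": "Excel", "file_path": p},
    ".xls": lambda p: {"file_type": "Excel", "file_path": p},
    ".docx": lambda p: {"file_type": "Word", "file_path": p},
    ".txt": lambda p: {"file_type": "Text", "file_path": p},
    ".csv": lambda p: {"file_type": "CSV", "file_path": p},
}

def parse_file(file_path: str) -> dict:
    """Dispatch on extension via table lookup instead of an if/elif chain."""
    parser = PARSERS.get(Path(file_path).suffix.lower())
    if parser is None:
        raise ValueError(f"Unsupported file type: {Path(file_path).suffix}")
    return parser(file_path)

print(parse_file("report.PDF"))  # {'file_type': 'PDF', 'file_path': 'report.PDF'}
```

With a table, adding a new format is a single dict entry, and the "should never reach here" fallthrough in the chain disappears.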
@@ -0,0 +1,339 @@
"""
Vision Tool - Image analysis using multimodal LLMs
Author: @mangobee
Date: 2026-01-02

Provides image analysis functionality using:
- Gemini 2.0 Flash (default, free tier)
- Claude Sonnet 4.5 (fallback, if configured)

Supports:
- Image file loading and encoding
- Question answering about images
- Object detection/description
- Text extraction (OCR)
- Visual reasoning
"""

import base64
import logging
from pathlib import Path
from typing import Dict, Optional

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

from src.config.settings import Settings

# ============================================================================
# CONFIG
# ============================================================================
MAX_RETRIES = 3
RETRY_MIN_WAIT = 1   # seconds
RETRY_MAX_WAIT = 10  # seconds
MAX_IMAGE_SIZE_MB = 10  # Maximum image size in MB
SUPPORTED_IMAGE_FORMATS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}

# ============================================================================
# Logging Setup
# ============================================================================
logger = logging.getLogger(__name__)


# ============================================================================
# Image Loading and Encoding
# ============================================================================

def load_and_encode_image(image_path: str) -> Dict[str, str]:
    """
    Load image file and encode as base64.

    Args:
        image_path: Path to image file

    Returns:
        Dict with structure: {
            "data": str,       # Base64 encoded image
            "mime_type": str,  # MIME type (e.g., "image/jpeg")
            "size_mb": float,  # File size in MB
        }

    Raises:
        FileNotFoundError: If image doesn't exist
        ValueError: If file is not a supported image format or too large
    """
    path = Path(image_path)

    if not path.exists():
        raise FileNotFoundError(f"Image file not found: {image_path}")

    # Check file extension
    extension = path.suffix.lower()
    if extension not in SUPPORTED_IMAGE_FORMATS:
        raise ValueError(
            f"Unsupported image format: {extension}. "
            f"Supported: {', '.join(SUPPORTED_IMAGE_FORMATS)}"
        )

    # Check file size
    size_bytes = path.stat().st_size
    size_mb = size_bytes / (1024 * 1024)

    if size_mb > MAX_IMAGE_SIZE_MB:
        raise ValueError(
            f"Image too large: {size_mb:.2f}MB. Maximum: {MAX_IMAGE_SIZE_MB}MB"
        )

    # Read and encode image
    with open(path, 'rb') as f:
        image_data = f.read()

    encoded = base64.b64encode(image_data).decode('utf-8')

    # Determine MIME type
    mime_types = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.gif': 'image/gif',
        '.webp': 'image/webp',
        '.bmp': 'image/bmp',
    }
    mime_type = mime_types.get(extension, 'image/jpeg')

    logger.info(f"Image loaded: {path.name} ({size_mb:.2f}MB, {mime_type})")

    return {
        "data": encoded,
        "mime_type": mime_type,
        "size_mb": size_mb,
    }


# ============================================================================
# Gemini Vision
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    reraise=True,
)
def analyze_image_gemini(image_path: str, question: Optional[str] = None) -> Dict:
    """
    Analyze image using Gemini 2.0 Flash.

    Args:
        image_path: Path to image file
        question: Optional question about the image (default: "Describe this image")

    Returns:
        Dict with structure: {
            "answer": str,  # LLM's analysis/answer
            "model": "gemini-2.0-flash",
            "image_path": str,
            "question": str
        }

    Raises:
        ValueError: If API key not configured or image invalid
        ConnectionError: If API connection fails (triggers retry)
    """
    try:
        import google.genai as genai

        settings = Settings()
        api_key = settings.google_api_key

        if not api_key:
            raise ValueError("GOOGLE_API_KEY not configured in settings")

        # Load and encode image
        image_data = load_and_encode_image(image_path)

        # Default question
        if not question:
            question = "Describe this image in detail."

        logger.info(f"Gemini vision analysis: {Path(image_path).name} - '{question}'")

        # Configure Gemini client
        client = genai.Client(api_key=api_key)

        # Create content with image and text
        response = client.models.generate_content(
            model='gemini-2.0-flash-exp',
            contents=[
                question,
                {
                    "mime_type": image_data["mime_type"],
                    "data": image_data["data"]
                }
            ]
        )

        answer = response.text.strip()

        logger.info(f"Gemini vision successful: {len(answer)} chars")

        return {
            "answer": answer,
            "model": "gemini-2.0-flash",
            "image_path": image_path,
            "question": question,
        }

    except ValueError as e:
        logger.error(f"Gemini configuration/input error: {e}")
        raise
    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Gemini connection error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"Gemini vision error: {e}")
        raise Exception(f"Gemini vision failed: {str(e)}")


# ============================================================================
# Claude Vision (Fallback)
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    reraise=True,
)
def analyze_image_claude(image_path: str, question: Optional[str] = None) -> Dict:
    """
    Analyze image using Claude Sonnet 4.5.

    Args:
        image_path: Path to image file
        question: Optional question about the image (default: "Describe this image")

    Returns:
        Dict with structure: {
            "answer": str,  # LLM's analysis/answer
            "model": "claude-sonnet-4.5",
            "image_path": str,
            "question": str
        }

    Raises:
        ValueError: If API key not configured or image invalid
        ConnectionError: If API connection fails (triggers retry)
    """
    try:
        from anthropic import Anthropic

        settings = Settings()
        api_key = settings.anthropic_api_key

        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY not configured in settings")

        # Load and encode image
        image_data = load_and_encode_image(image_path)

        # Default question
        if not question:
            question = "Describe this image in detail."

        logger.info(f"Claude vision analysis: {Path(image_path).name} - '{question}'")

        # Configure Claude client
        client = Anthropic(api_key=api_key)

        # Create message with image
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": image_data["mime_type"],
                                "data": image_data["data"],
                            },
                        },
                        {
                            "type": "text",
                            "text": question
                        }
                    ],
                }
            ],
        )

        answer = response.content[0].text.strip()

        logger.info(f"Claude vision successful: {len(answer)} chars")

        return {
            "answer": answer,
            "model": "claude-sonnet-4.5",
            "image_path": image_path,
            "question": question,
        }

    except ValueError as e:
        logger.error(f"Claude configuration/input error: {e}")
        raise
    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Claude connection error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"Claude vision error: {e}")
        raise Exception(f"Claude vision failed: {str(e)}")


# ============================================================================
# Unified Vision Analysis
# ============================================================================

def analyze_image(image_path: str, question: Optional[str] = None) -> Dict:
    """
    Analyze image using available multimodal LLM.

    Tries Gemini first (free tier), falls back to Claude if configured.

    Args:
        image_path: Path to image file
        question: Optional question about the image

    Returns:
        Dict with analysis results from either Gemini or Claude

    Raises:
        Exception: If both Gemini and Claude fail or are not configured
    """
    settings = Settings()

    # Try Gemini first (default, free tier)
    if settings.google_api_key:
        try:
            return analyze_image_gemini(image_path, question)
        except Exception as e:
            logger.warning(f"Gemini failed, trying Claude: {e}")

    # Fallback to Claude
    if settings.anthropic_api_key:
        try:
            return analyze_image_claude(image_path, question)
        except Exception as e:
            logger.error(f"Claude also failed: {e}")
            raise Exception("Vision analysis failed - Gemini and Claude both failed")

    # No API keys configured
    raise ValueError(
        "No vision API configured. Please set GOOGLE_API_KEY or ANTHROPIC_API_KEY"
    )
|
|
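The validate-and-encode flow in `load_and_encode_image` runs entirely offline, so it can be exercised without any API key. A minimal self-contained sketch of the same steps; the helper name `encode_image` and the trimmed format table are illustrative, not the module's API:

```python
import base64
import tempfile
from pathlib import Path

# Mirrors load_and_encode_image's core steps: extension check,
# size check, base64 encoding, MIME lookup.
SUPPORTED = {'.jpg': 'image/jpeg', '.png': 'image/png'}

def encode_image(path_str, max_mb=10):
    path = Path(path_str)
    ext = path.suffix.lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported image format: {ext}")
    size_mb = path.stat().st_size / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(f"Image too large: {size_mb:.2f}MB")
    encoded = base64.b64encode(path.read_bytes()).decode('utf-8')
    return {"data": encoded, "mime_type": SUPPORTED[ext], "size_mb": size_mb}

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "tiny.png"
    # Not a real PNG; only the extension is validated, as in the tool.
    p.write_bytes(b"\x89PNG fake pixel data")
    info = encode_image(str(p))
    assert info["mime_type"] == "image/png"
    assert base64.b64decode(info["data"]).startswith(b"\x89PNG")
```

Note that, like the tool itself, this trusts the file extension rather than sniffing magic bytes.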
src/tools/web_search.py
@@ -0,0 +1,230 @@
"""
Web Search Tool - Tavily and Exa implementations
Author: @mangobee
Date: 2026-01-02

Provides web search functionality with:
- Tavily as primary search (free tier: 1000 req/month)
- Exa as fallback (paid tier)
- Retry logic with exponential backoff
- Structured error handling
"""

import logging
from typing import Dict, List, Optional

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

from src.config.settings import Settings

# ============================================================================
# CONFIG
# ============================================================================
MAX_RETRIES = 3
RETRY_MIN_WAIT = 1   # seconds
RETRY_MAX_WAIT = 10  # seconds
DEFAULT_MAX_RESULTS = 5

# ============================================================================
# Logging Setup
# ============================================================================
logger = logging.getLogger(__name__)


# ============================================================================
# Tavily Search Implementation
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    reraise=True,
)
def tavily_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
    """
    Search using Tavily API with retry logic.

    Args:
        query: Search query string
        max_results: Maximum number of results to return (default: 5)

    Returns:
        Dict with structure: {
            "results": [{"title": str, "url": str, "snippet": str}, ...],
            "source": "tavily",
            "query": str,
            "count": int
        }

    Raises:
        ValueError: If API key not configured
        ConnectionError: If API connection fails after retries
        Exception: For other API errors
    """
    try:
        from tavily import TavilyClient

        settings = Settings()
        api_key = settings.tavily_api_key

        if not api_key:
            raise ValueError("TAVILY_API_KEY not configured in settings")

        logger.info(f"Tavily search: query='{query}', max_results={max_results}")

        client = TavilyClient(api_key=api_key)
        response = client.search(query=query, max_results=max_results)

        # Extract and structure results
        results = []
        for item in response.get("results", []):
            results.append({
                "title": item.get("title", ""),
                "url": item.get("url", ""),
                "snippet": item.get("content", ""),
            })

        logger.info(f"Tavily search successful: {len(results)} results")

        return {
            "results": results,
            "source": "tavily",
            "query": query,
            "count": len(results),
        }

    except ValueError as e:
        logger.error(f"Tavily configuration error: {e}")
        raise
    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Tavily connection error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"Tavily search error: {e}")
        raise Exception(f"Tavily search failed: {str(e)}")


# ============================================================================
# Exa Search Implementation
# ============================================================================

@retry(
    stop=stop_after_attempt(MAX_RETRIES),
    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    reraise=True,
)
def exa_search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
    """
    Search using Exa API with retry logic.

    Args:
        query: Search query string
        max_results: Maximum number of results to return (default: 5)

    Returns:
        Dict with structure: {
            "results": [{"title": str, "url": str, "snippet": str}, ...],
            "source": "exa",
            "query": str,
            "count": int
        }

    Raises:
        ValueError: If API key not configured
        ConnectionError: If API connection fails after retries
        Exception: For other API errors
    """
    try:
        from exa_py import Exa

        settings = Settings()
        api_key = settings.exa_api_key

        if not api_key:
            raise ValueError("EXA_API_KEY not configured in settings")

        logger.info(f"Exa search: query='{query}', max_results={max_results}")

        client = Exa(api_key=api_key)
        response = client.search(query=query, num_results=max_results, use_autoprompt=True)

        # Extract and structure results
        results = []
        for item in response.results:
            results.append({
                "title": item.title if hasattr(item, 'title') else "",
                "url": item.url if hasattr(item, 'url') else "",
                "snippet": item.text if hasattr(item, 'text') else "",
            })

        logger.info(f"Exa search successful: {len(results)} results")

        return {
            "results": results,
            "source": "exa",
            "query": query,
            "count": len(results),
        }

    except ValueError as e:
        logger.error(f"Exa configuration error: {e}")
        raise
    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Exa connection error (will retry): {e}")
        raise
    except Exception as e:
        logger.error(f"Exa search error: {e}")
        raise Exception(f"Exa search failed: {str(e)}")


# ============================================================================
# Unified Search with Fallback
# ============================================================================

def search(query: str, max_results: int = DEFAULT_MAX_RESULTS) -> Dict:
    """
    Unified search function with automatic fallback.

    Tries Tavily first (free tier), falls back to Exa if Tavily fails.

    Args:
        query: Search query string
        max_results: Maximum number of results to return (default: 5)

    Returns:
        Dict with search results from either Tavily or Exa

    Raises:
        Exception: If both Tavily and Exa searches fail
    """
    settings = Settings()
    default_tool = settings.default_search_tool

    # Try default tool first
    if default_tool == "tavily":
        try:
            return tavily_search(query, max_results)
        except Exception as e:
            logger.warning(f"Tavily failed, falling back to Exa: {e}")
            try:
                return exa_search(query, max_results)
            except Exception as exa_error:
                logger.error("Both Tavily and Exa failed")
                raise Exception(f"Search failed - Tavily: {e}, Exa: {exa_error}")
    else:
        # Default is Exa
        try:
            return exa_search(query, max_results)
        except Exception as e:
            logger.warning(f"Exa failed, falling back to Tavily: {e}")
            try:
                return tavily_search(query, max_results)
            except Exception as tavily_error:
                logger.error("Both Exa and Tavily failed")
                raise Exception(f"Search failed - Exa: {e}, Tavily: {tavily_error}")
|
|
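The tenacity policy configured above (3 attempts, exponential wait between 1s and 10s, retrying only ConnectionError/TimeoutError, reraising the last error) can be seen in action without any network or the tenacity package by hand-rolling the equivalent wrapper; `flaky_search` is a stand-in, not a real backend, and the waits are shortened so the demo runs instantly:

```python
import time

# Hand-rolled equivalent of the @retry decorators used above.
MAX_RETRIES = 3
RETRY_MIN_WAIT = 0.01  # shortened from 1s for the demo
RETRY_MAX_WAIT = 0.05  # shortened from 10s for the demo

def with_retry(fn):
    def wrapper(*args, **kwargs):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return fn(*args, **kwargs)
            except (ConnectionError, TimeoutError):
                if attempt == MAX_RETRIES:
                    raise  # reraise=True behaviour
                # exponential backoff, capped at the max wait
                time.sleep(min(RETRY_MIN_WAIT * 2 ** (attempt - 1), RETRY_MAX_WAIT))
    return wrapper

calls = {"n": 0}

@with_retry
def flaky_search(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return {"results": [], "source": "fake", "query": query, "count": 0}

result = flaky_search("langgraph agents")
assert calls["n"] == 3 and result["source"] == "fake"
```

A ValueError (missing API key) would not be retried here, matching `retry_if_exception_type` above: configuration errors fail fast, only transient network errors burn retry attempts.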
tests/fixtures/generate_fixtures.py
@@ -0,0 +1,95 @@
"""
Generate test fixtures for file parser tests
Author: @mangobee
"""

from pathlib import Path

# ============================================================================
# CONFIG
# ============================================================================
FIXTURES_DIR = Path(__file__).parent

# ============================================================================
# Generate PDF
# ============================================================================
def generate_pdf():
    """Generate sample PDF file using fpdf"""
    try:
        from fpdf import FPDF
    except ImportError:
        print("Skipping PDF generation (fpdf not installed)")
        return

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    pdf.cell(200, 10, txt="Test PDF Document", ln=True)
    pdf.cell(200, 10, txt="This is page 1 content.", ln=True)
    pdf.add_page()
    pdf.cell(200, 10, txt="Page 2", ln=True)
    pdf.cell(200, 10, txt="This is page 2 content.", ln=True)

    pdf_path = FIXTURES_DIR / "sample.pdf"
    pdf.output(str(pdf_path))

    print(f"Created: {pdf_path}")


# ============================================================================
# Generate Excel
# ============================================================================
def generate_excel():
    """Generate sample Excel file"""
    from openpyxl import Workbook

    wb = Workbook()

    # Sheet 1
    ws1 = wb.active
    ws1.title = "Data"
    ws1.append(["Product", "Price", "Quantity"])
    ws1.append(["Apple", 1.50, 100])
    ws1.append(["Banana", 0.75, 150])
    ws1.append(["Orange", 2.00, 80])

    # Sheet 2
    ws2 = wb.create_sheet("Summary")
    ws2.append(["Total Products", 3])
    ws2.append(["Total Quantity", 330])

    excel_path = FIXTURES_DIR / "sample.xlsx"
    wb.save(excel_path)

    print(f"Created: {excel_path}")


# ============================================================================
# Generate Word
# ============================================================================
def generate_word():
    """Generate sample Word document"""
    from docx import Document

    doc = Document()
    doc.add_heading("Test Word Document", 0)
    doc.add_paragraph("This is the first paragraph.")
    doc.add_paragraph("This is the second paragraph with some content.")
    doc.add_heading("Section 2", level=1)
    doc.add_paragraph("Content in section 2.")

    word_path = FIXTURES_DIR / "sample.docx"
    doc.save(word_path)

    print(f"Created: {word_path}")


# ============================================================================
# Main
# ============================================================================
if __name__ == "__main__":
    print("Generating test fixtures...")
    generate_pdf()
    generate_excel()
    generate_word()
    print("All fixtures generated successfully!")
|
|
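Only `generate_pdf` degrades gracefully when its library is missing; `generate_excel` and `generate_word` would raise ImportError outright if openpyxl or python-docx were absent. The same guard generalizes with a feature check instead of a try/except per function; a sketch under that assumption (the `run_if_available` helper is illustrative, not part of the repo):

```python
import importlib.util

def run_if_available(module_name, generator, label):
    # Skip the generator when its optional dependency is not installed,
    # mirroring generate_pdf's behaviour for fpdf.
    if importlib.util.find_spec(module_name) is None:
        print(f"Skipping {label} generation ({module_name} not installed)")
        return False
    generator()
    return True

# json is stdlib, so this generator runs:
made = run_if_available("json", lambda: print("generating demo fixture"), "demo")
assert made is True
# A nonexistent package is skipped without raising:
missing = run_if_available("definitely_not_a_real_pkg", lambda: None, "demo2")
assert missing is False
```

This keeps `python generate_fixtures.py` usable on a machine with only some of the parser dependencies installed.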
tests/fixtures/sample.csv
@@ -0,0 +1,4 @@
Name,Age,City
Alice,30,New York
Bob,25,San Francisco
Charlie,35,Boston
tests/fixtures/sample.docx
Binary file (36.7 kB)
tests/fixtures/sample.txt
@@ -0,0 +1,4 @@
This is a test text file.
It has multiple lines.
Line 3 with some content.
Final line.
tests/fixtures/sample.xlsx
Binary file (5.44 kB)
tests/test_calculator.py
@@ -0,0 +1,293 @@
| 1 |
+
"""
|
| 2 |
+
Tests for calculator tool (safe mathematical evaluation)
|
| 3 |
+
Author: @mangobee
|
| 4 |
+
Date: 2026-01-02
|
| 5 |
+
|
| 6 |
+
Tests cover:
|
| 7 |
+
- Basic arithmetic operations
|
| 8 |
+
- Mathematical functions
|
| 9 |
+
- Safety checks (no code execution, no imports, etc.)
|
| 10 |
+
- Timeout protection
|
| 11 |
+
- Complexity limits
|
| 12 |
+
- Error handling
|
| 13 |
+
"""
|
| 14 |
+
|
| 15 |
+
import pytest
|
| 16 |
+
from src.tools.calculator import safe_eval
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
# ============================================================================
|
| 20 |
+
# Basic Arithmetic Tests
|
| 21 |
+
# ============================================================================
|
| 22 |
+
|
| 23 |
+
def test_addition():
|
| 24 |
+
"""Test basic addition"""
|
| 25 |
+
result = safe_eval("2 + 3")
|
| 26 |
+
assert result["result"] == 5
|
| 27 |
+
assert result["success"] is True
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
def test_subtraction():
|
| 31 |
+
"""Test basic subtraction"""
|
| 32 |
+
result = safe_eval("10 - 4")
|
| 33 |
+
assert result["result"] == 6
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
def test_multiplication():
|
| 37 |
+
"""Test basic multiplication"""
|
| 38 |
+
result = safe_eval("6 * 7")
|
| 39 |
+
assert result["result"] == 42
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
def test_division():
|
| 43 |
+
"""Test basic division"""
|
| 44 |
+
result = safe_eval("15 / 3")
|
| 45 |
+
assert result["result"] == 5.0
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
def test_floor_division():
|
| 49 |
+
"""Test floor division"""
|
| 50 |
+
result = safe_eval("17 // 5")
|
| 51 |
+
assert result["result"] == 3
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
def test_modulo():
|
| 55 |
+
"""Test modulo operation"""
|
| 56 |
+
result = safe_eval("17 % 5")
|
| 57 |
+
assert result["result"] == 2
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
def test_exponentiation():
|
| 61 |
+
"""Test exponentiation"""
|
| 62 |
+
result = safe_eval("2 ** 8")
|
| 63 |
+
assert result["result"] == 256
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def test_negative_numbers():
|
| 67 |
+
"""Test negative numbers"""
|
| 68 |
+
result = safe_eval("-5 + 3")
|
| 69 |
+
assert result["result"] == -2
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
def test_complex_expression():
|
| 73 |
+
"""Test complex arithmetic expression"""
|
| 74 |
+
result = safe_eval("(2 + 3) * 4 - 10 / 2")
|
| 75 |
+
assert result["result"] == 15.0
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
# ============================================================================
# Mathematical Function Tests
# ============================================================================


def test_sqrt():
    """Test square root function"""
    result = safe_eval("sqrt(16)")
    assert result["result"] == 4.0


def test_abs():
    """Test absolute value"""
    result = safe_eval("abs(-42)")
    assert result["result"] == 42


def test_round():
    """Test rounding"""
    result = safe_eval("round(3.7)")
    assert result["result"] == 4


def test_min():
    """Test min function"""
    result = safe_eval("min(5, 2, 8, 1)")
    assert result["result"] == 1


def test_max():
    """Test max function"""
    result = safe_eval("max(5, 2, 8, 1)")
    assert result["result"] == 8


def test_trigonometric():
    """Test trigonometric functions"""
    result = safe_eval("sin(0)")
    assert result["result"] == 0.0

    result = safe_eval("cos(0)")
    assert result["result"] == 1.0


def test_logarithm():
    """Test logarithmic functions"""
    result = safe_eval("log10(100)")
    assert result["result"] == 2.0


def test_constants():
    """Test mathematical constants"""
    result = safe_eval("pi")
    assert abs(result["result"] - 3.14159) < 0.001

    result = safe_eval("e")
    assert abs(result["result"] - 2.71828) < 0.001


def test_factorial():
    """Test factorial function"""
    result = safe_eval("factorial(5)")
    assert result["result"] == 120


def test_nested_functions():
    """Test nested function calls"""
    result = safe_eval("sqrt(abs(-16))")
    assert result["result"] == 4.0


# ============================================================================
# Security Tests
# ============================================================================


def test_no_import():
    """Test that imports are blocked"""
    with pytest.raises(SyntaxError):
        safe_eval("import os")


def test_no_exec():
    """Test that exec is blocked"""
    with pytest.raises((ValueError, SyntaxError)):
        safe_eval("exec('print(1)')")


def test_no_eval():
    """Test that eval is blocked"""
    with pytest.raises((ValueError, SyntaxError)):
        safe_eval("eval('1+1')")


def test_no_lambda():
    """Test that lambda is blocked"""
    with pytest.raises((ValueError, SyntaxError)):
        safe_eval("lambda x: x + 1")


def test_no_attribute_access():
    """Test that attribute access is blocked"""
    with pytest.raises(ValueError):
        safe_eval("(1).__class__")


def test_no_list_comprehension():
    """Test that list comprehensions are blocked"""
    with pytest.raises(ValueError):
        safe_eval("[x for x in range(10)]")


def test_no_dict_access():
    """Test that dict operations are blocked"""
    with pytest.raises((ValueError, SyntaxError)):
        safe_eval("{'a': 1}")


def test_no_undefined_names():
    """Test that undefined variable names are blocked"""
    with pytest.raises(ValueError, match="Undefined name"):
        safe_eval("undefined_variable + 1")


def test_no_dangerous_functions():
    """Test that dangerous functions are blocked"""
    with pytest.raises(ValueError, match="Unsupported function"):
        safe_eval("open('file.txt')")

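The security tests above pin down the evaluator's contract: statements fail to parse at all, and any expression node outside a small whitelist raises `ValueError`. A minimal sketch of how such an evaluator can be built on Python's `ast` module; the whitelists and the `safe_eval_sketch` name are illustrative, not the actual `src/tools/calculator.py` implementation:

```python
import ast
import math

# Hypothetical whitelists -- the real calculator.py supports more functions.
_FUNCS = {"sqrt": math.sqrt, "abs": abs, "min": min, "max": max}
_CONSTS = {"pi": math.pi, "e": math.e}
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def safe_eval_sketch(expr: str) -> dict:
    # mode="eval" only accepts expressions: "import os" is a SyntaxError here
    tree = ast.parse(expr, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id in _CONSTS:
                return _CONSTS[node.id]
            raise ValueError(f"Undefined name: {node.id}")
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in _FUNCS:
                raise ValueError(f"Unsupported function: {node.func.id}")
            return _FUNCS[node.func.id](*[walk(a) for a in node.args])
        # Attribute access, comprehensions, lambdas, dicts all land here
        raise ValueError(f"Unsupported expression: {type(node).__name__}")

    return {"result": walk(tree)}
```

Because rejection is structural (unknown AST node types raise), there is no string-level blacklist to bypass, which is what makes `(1).__class__`-style escapes fail.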
# ============================================================================
# Error Handling Tests
# ============================================================================


def test_division_by_zero():
    """Test division by zero raises error"""
    with pytest.raises(ZeroDivisionError):
        safe_eval("10 / 0")


def test_invalid_syntax():
    """Test invalid syntax raises error"""
    with pytest.raises(SyntaxError):
        safe_eval("2 +* 3")


def test_empty_expression():
    """Test empty expression raises error"""
    with pytest.raises(ValueError, match="non-empty string"):
        safe_eval("")


def test_too_long_expression():
    """Test expression length limit"""
    long_expr = "1 + " * 300 + "1"
    with pytest.raises(ValueError, match="too long"):
        safe_eval(long_expr)


def test_huge_exponent():
    """Test that huge exponents are blocked"""
    with pytest.raises(ValueError, match="Exponent too large"):
        safe_eval("2 ** 10000")


def test_sqrt_negative():
    """Test sqrt of negative number raises error"""
    with pytest.raises(ValueError):
        safe_eval("sqrt(-1)")


def test_factorial_negative():
    """Test factorial of negative number raises error"""
    with pytest.raises(ValueError):
        safe_eval("factorial(-5)")

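`test_empty_expression`, `test_too_long_expression`, and `test_huge_exponent` imply validation that runs before (or during) evaluation rather than after. A sketch of those guards; the thresholds here (1000 characters, exponent magnitude 1000) are assumptions, since the real limits live in `calculator.py`:

```python
MAX_EXPR_LEN = 1000   # assumed threshold; the tested input is 1201 chars
MAX_EXPONENT = 1000   # assumed threshold; the tested exponent is 10000


def check_expression(expr) -> None:
    """Reject inputs that would be empty, ambiguous, or expensive to parse."""
    if not isinstance(expr, str) or not expr.strip():
        raise ValueError("Expression must be a non-empty string")
    if len(expr) > MAX_EXPR_LEN:
        raise ValueError(f"Expression too long ({len(expr)} > {MAX_EXPR_LEN} chars)")


def check_power(base: float, exponent: float) -> float:
    """Guard applied before evaluating a ** node, to cap bigint blow-up."""
    if abs(exponent) > MAX_EXPONENT:
        raise ValueError("Exponent too large")
    return base ** exponent
```

The exponent cap matters because `2 ** 10000` is cheap to type but produces a multi-kilobyte integer; bounding it keeps the tool's worst-case cost predictable.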
# ============================================================================
# Edge Case Tests
# ============================================================================


def test_whitespace_handling():
    """Test that whitespace is handled correctly"""
    result = safe_eval(" 2 + 3 ")
    assert result["result"] == 5


def test_floating_point():
    """Test floating point arithmetic"""
    result = safe_eval("3.14 * 2")
    assert abs(result["result"] - 6.28) < 0.01


def test_very_small_numbers():
    """Test very small numbers"""
    result = safe_eval("0.0001 + 0.0002")
    assert abs(result["result"] - 0.0003) < 0.00001


def test_scientific_notation():
    """Test scientific notation"""
    result = safe_eval("1e3 + 2e2")
    assert result["result"] == 1200.0


def test_parentheses_precedence():
    """Test that parentheses affect precedence correctly"""
    result1 = safe_eval("2 + 3 * 4")
    assert result1["result"] == 14

    result2 = safe_eval("(2 + 3) * 4")
    assert result2["result"] == 20


def test_multiple_operations():
    """Test chaining multiple operations"""
    result = safe_eval("10 + 20 - 5 * 2 / 2 + 3")
    assert result["result"] == 28.0
@@ -0,0 +1,317 @@
"""
Tests for file parser tool
Author: @mangobee
Date: 2026-01-02

Tests cover:
- PDF parsing
- Excel parsing
- Word document parsing
- Text/CSV parsing
- Retry logic
- Error handling
"""

import pytest
from pathlib import Path
from unittest.mock import Mock, patch

from src.tools.file_parser import (
    parse_pdf,
    parse_excel,
    parse_word,
    parse_text,
    parse_file,
)

# ============================================================================
# Test Fixtures
# ============================================================================

FIXTURES_DIR = Path(__file__).parent / "fixtures"


@pytest.fixture
def sample_text_file():
    """Path to sample text file"""
    return str(FIXTURES_DIR / "sample.txt")


@pytest.fixture
def sample_csv_file():
    """Path to sample CSV file"""
    return str(FIXTURES_DIR / "sample.csv")


@pytest.fixture
def sample_excel_file():
    """Path to sample Excel file"""
    return str(FIXTURES_DIR / "sample.xlsx")


@pytest.fixture
def sample_word_file():
    """Path to sample Word file"""
    return str(FIXTURES_DIR / "sample.docx")


@pytest.fixture
def mock_pdf_reader():
    """Mock PyPDF2 PdfReader"""
    mock_page_1 = Mock()
    mock_page_1.extract_text.return_value = "Test PDF page 1 content"

    mock_page_2 = Mock()
    mock_page_2.extract_text.return_value = "Test PDF page 2 content"

    mock_reader = Mock()
    mock_reader.pages = [mock_page_1, mock_page_2]

    return mock_reader


# ============================================================================
# PDF Parser Tests
# ============================================================================


def test_parse_pdf_success(mock_pdf_reader):
    """Test successful PDF parsing"""
    with patch('PyPDF2.PdfReader') as mock_reader_class:
        with patch('src.tools.file_parser.Path') as mock_path_class:
            # Mock file exists
            mock_path = Mock()
            mock_path.exists.return_value = True
            mock_path_class.return_value = mock_path

            # Mock PdfReader
            mock_reader_class.return_value = mock_pdf_reader

            result = parse_pdf("test.pdf")

            assert result["file_type"] == "PDF"
            assert result["pages"] == 2
            assert "page 1 content" in result["content"].lower()
            assert "page 2 content" in result["content"].lower()


def test_parse_pdf_file_not_found():
    """Test PDF parsing with missing file"""
    with patch('src.tools.file_parser.Path') as mock_path_class:
        mock_path = Mock()
        mock_path.exists.return_value = False
        mock_path_class.return_value = mock_path

        with pytest.raises(FileNotFoundError):
            parse_pdf("nonexistent.pdf")


def test_parse_pdf_io_error_retry():
    """Test PDF parsing with IO error triggers retry"""
    with patch('PyPDF2.PdfReader') as mock_reader_class:
        with patch('src.tools.file_parser.Path') as mock_path_class:
            # Mock file exists
            mock_path = Mock()
            mock_path.exists.return_value = True
            mock_path_class.return_value = mock_path

            # Mock IO error
            mock_reader_class.side_effect = IOError("Disk error")

            with pytest.raises(IOError):
                parse_pdf("test.pdf")

            # Verify retry happened (should be called MAX_RETRIES times)
            assert mock_reader_class.call_count == 3

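The IO-error tests in this file all assert `call_count == 3`: the parser gives up after `MAX_RETRIES` attempts and re-raises the original exception rather than wrapping it. The project uses tenacity for this; an equivalent stdlib sketch of the behavior the tests lock in (the decorator name is illustrative, not from `file_parser.py`):

```python
import functools
import time

MAX_RETRIES = 3  # the tests assert exactly 3 attempts


def retry_on_ioerror(max_attempts: int = MAX_RETRIES, base_delay: float = 0.01):
    """Minimal stand-in for tenacity's retry with exponential backoff.

    Roughly: @retry(retry=retry_if_exception_type(IOError),
    stop=stop_after_attempt(3), wait=wait_exponential(...), reraise=True).
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except IOError:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the original IOError
                    time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
        return wrapper
    return decorator
```

Re-raising the original exception type is what lets `pytest.raises(IOError)` pass; with tenacity that requires `reraise=True`, otherwise the caller sees a `RetryError` wrapper instead.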
# ============================================================================
# Excel Parser Tests
# ============================================================================


def test_parse_excel_success(sample_excel_file):
    """Test successful Excel parsing with real file"""
    result = parse_excel(sample_excel_file)

    assert result["file_type"] == "Excel"
    assert len(result["sheets"]) == 2
    assert "Data" in result["sheets"]
    assert "Summary" in result["sheets"]
    assert "Apple" in result["content"]
    assert "Banana" in result["content"]


def test_parse_excel_file_not_found():
    """Test Excel parsing with missing file"""
    with pytest.raises(FileNotFoundError):
        parse_excel("nonexistent.xlsx")


def test_parse_excel_io_error_retry():
    """Test Excel parsing with IO error triggers retry"""
    with patch('openpyxl.load_workbook') as mock_load:
        with patch('src.tools.file_parser.Path') as mock_path_class:
            # Mock file exists
            mock_path = Mock()
            mock_path.exists.return_value = True
            mock_path_class.return_value = mock_path

            # Mock IO error
            mock_load.side_effect = IOError("Disk error")

            with pytest.raises(IOError):
                parse_excel("test.xlsx")

            # Verify retry happened
            assert mock_load.call_count == 3


# ============================================================================
# Word Document Parser Tests
# ============================================================================


def test_parse_word_success(sample_word_file):
    """Test successful Word document parsing with real file"""
    result = parse_word(sample_word_file)

    assert result["file_type"] == "Word"
    assert result["paragraphs"] > 0
    assert "Test Word Document" in result["content"]
    assert "first paragraph" in result["content"]


def test_parse_word_file_not_found():
    """Test Word parsing with missing file"""
    with pytest.raises(FileNotFoundError):
        parse_word("nonexistent.docx")


def test_parse_word_io_error_retry():
    """Test Word parsing with IO error triggers retry"""
    with patch('docx.Document') as mock_doc_class:
        with patch('src.tools.file_parser.Path') as mock_path_class:
            # Mock file exists
            mock_path = Mock()
            mock_path.exists.return_value = True
            mock_path_class.return_value = mock_path

            # Mock IO error
            mock_doc_class.side_effect = IOError("Disk error")

            with pytest.raises(IOError):
                parse_word("test.docx")

            # Verify retry happened
            assert mock_doc_class.call_count == 3


# ============================================================================
# Text/CSV Parser Tests
# ============================================================================


def test_parse_text_success(sample_text_file):
    """Test successful text file parsing with real file"""
    result = parse_text(sample_text_file)

    assert result["file_type"] == "Text"
    assert result["lines"] > 0
    assert "test text file" in result["content"].lower()


def test_parse_csv_success(sample_csv_file):
    """Test successful CSV file parsing with real file"""
    result = parse_text(sample_csv_file)

    assert result["file_type"] == "CSV"
    assert result["lines"] > 0
    assert "Name,Age,City" in result["content"]
    assert "Alice" in result["content"]


def test_parse_text_file_not_found():
    """Test text parsing with missing file"""
    with pytest.raises(FileNotFoundError):
        parse_text("nonexistent.txt")


def test_parse_text_io_error_retry():
    """Test text parsing with IO error triggers retry"""
    with patch('builtins.open') as mock_open:
        with patch('src.tools.file_parser.Path') as mock_path_class:
            # Mock file exists
            mock_path = Mock()
            mock_path.exists.return_value = True
            mock_path.suffix = '.txt'
            mock_path_class.return_value = mock_path

            # Mock IO error
            mock_open.side_effect = IOError("Disk error")

            with pytest.raises(IOError):
                parse_text("test.txt")

            # Verify retry happened
            assert mock_open.call_count == 3

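Note that `test_parse_csv_success` expects `parse_text` itself to report `"CSV"` for `.csv` files while still using the plain-text code path. A sketch of the suffix-based labeling this implies (the helper name is hypothetical):

```python
from pathlib import Path


def text_file_type(file_path: str) -> str:
    """Label a plain-text file by suffix, matching the CSV test's expectation."""
    return "CSV" if Path(file_path).suffix.lower() == ".csv" else "Text"
```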
# ============================================================================
# Unified Parser Tests
# ============================================================================


def test_parse_file_pdf():
    """Test unified parser dispatches to PDF parser"""
    with patch('src.tools.file_parser.parse_pdf') as mock_parse_pdf:
        mock_parse_pdf.return_value = {"file_type": "PDF"}

        result = parse_file("test.pdf")

        assert result["file_type"] == "PDF"
        mock_parse_pdf.assert_called_once()


def test_parse_file_excel():
    """Test unified parser dispatches to Excel parser"""
    with patch('src.tools.file_parser.parse_excel') as mock_parse_excel:
        mock_parse_excel.return_value = {"file_type": "Excel"}

        result = parse_file("test.xlsx")

        assert result["file_type"] == "Excel"
        mock_parse_excel.assert_called_once()


def test_parse_file_word():
    """Test unified parser dispatches to Word parser"""
    with patch('src.tools.file_parser.parse_word') as mock_parse_word:
        mock_parse_word.return_value = {"file_type": "Word"}

        result = parse_file("test.docx")

        assert result["file_type"] == "Word"
        mock_parse_word.assert_called_once()


def test_parse_file_text():
    """Test unified parser dispatches to text parser"""
    with patch('src.tools.file_parser.parse_text') as mock_parse_text:
        mock_parse_text.return_value = {"file_type": "Text"}

        result = parse_file("test.txt")

        assert result["file_type"] == "Text"
        mock_parse_text.assert_called_once()


def test_parse_file_unsupported_extension():
    """Test unified parser rejects unsupported file type"""
    with pytest.raises(ValueError, match="Unsupported file type"):
        parse_file("test.mp4")


def test_parse_file_xls_extension():
    """Test unified parser handles .xls extension"""
    with patch('src.tools.file_parser.parse_excel') as mock_parse_excel:
        mock_parse_excel.return_value = {"file_type": "Excel"}

        result = parse_file("test.xls")

        assert result["file_type"] == "Excel"
        mock_parse_excel.assert_called_once()
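The unified-parser tests fix an extension-to-parser mapping, including the legacy `.xls` alias for the Excel parser and a `ValueError` for anything else. A sketch of the dispatch table they imply; it returns parser names rather than calling the real parsers (which these tests mock anyway), and the table itself is an assumption about `parse_file`'s internals:

```python
from pathlib import Path

# Hypothetical dispatch table inferred from the tests above.
_PARSERS = {
    ".pdf": "parse_pdf",
    ".xlsx": "parse_excel",
    ".xls": "parse_excel",   # legacy Excel alias exercised by the last test
    ".docx": "parse_word",
    ".txt": "parse_text",
    ".csv": "parse_text",
}


def dispatch(file_path: str) -> str:
    """Return the parser name for a path, mirroring parse_file's routing."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in _PARSERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return _PARSERS[suffix]
```

A flat dict keeps the Stage 3 goal in reach: dynamic tool selection can introspect the supported suffixes without touching the individual parsers.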
@@ -0,0 +1,299 @@
"""
Tests for vision tool (multimodal image analysis)
Author: @mangobee
Date: 2026-01-02

Tests cover:
- Image loading and encoding
- Gemini vision analysis
- Claude vision analysis
- Fallback mechanism
- Retry logic
- Error handling
"""

import pytest
from pathlib import Path
from unittest.mock import Mock, patch

from src.tools.vision import (
    load_and_encode_image,
    analyze_image_gemini,
    analyze_image_claude,
    analyze_image,
)


# ============================================================================
# Test Fixtures
# ============================================================================

FIXTURES_DIR = Path(__file__).parent / "fixtures"


@pytest.fixture
def test_image_path():
    """Path to test image"""
    return str(FIXTURES_DIR / "test_image.jpg")


@pytest.fixture
def mock_gemini_response():
    """Mock Gemini API response"""
    mock_response = Mock()
    mock_response.text = "This image shows a red square."
    return mock_response


@pytest.fixture
def mock_claude_response():
    """Mock Claude API response"""
    mock_content = Mock()
    mock_content.text = "The image contains a red colored square."

    mock_response = Mock()
    mock_response.content = [mock_content]
    return mock_response


@pytest.fixture
def mock_settings_gemini():
    """Mock Settings with Gemini API key"""
    with patch('src.tools.vision.Settings') as mock:
        settings_instance = Mock()
        settings_instance.google_api_key = "test_google_key"
        settings_instance.anthropic_api_key = None
        mock.return_value = settings_instance
        yield mock


@pytest.fixture
def mock_settings_claude():
    """Mock Settings with Claude API key"""
    with patch('src.tools.vision.Settings') as mock:
        settings_instance = Mock()
        settings_instance.google_api_key = None
        settings_instance.anthropic_api_key = "test_anthropic_key"
        mock.return_value = settings_instance
        yield mock


@pytest.fixture
def mock_settings_both():
    """Mock Settings with both API keys"""
    with patch('src.tools.vision.Settings') as mock:
        settings_instance = Mock()
        settings_instance.google_api_key = "test_google_key"
        settings_instance.anthropic_api_key = "test_anthropic_key"
        mock.return_value = settings_instance
        yield mock


# ============================================================================
# Image Loading Tests
# ============================================================================


def test_load_and_encode_image_success(test_image_path):
    """Test successful image loading and encoding"""
    result = load_and_encode_image(test_image_path)

    assert "data" in result
    assert "mime_type" in result
    assert result["mime_type"] == "image/jpeg"
    assert result["size_mb"] > 0
    assert len(result["data"]) > 0  # Base64 encoded data


def test_load_image_file_not_found():
    """Test image loading with missing file"""
    with pytest.raises(FileNotFoundError):
        load_and_encode_image("nonexistent_image.jpg")


def test_load_image_unsupported_format(tmp_path):
    """Test image loading with unsupported format"""
    # Create a text file with .mp4 extension
    fake_video = tmp_path / "video.mp4"
    fake_video.write_text("not a real video")

    with pytest.raises(ValueError, match="Unsupported image format"):
        load_and_encode_image(str(fake_video))

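The image-loading tests constrain `load_and_encode_image` to return base64 data, a MIME type, and a size in megabytes, and to reject missing files and unknown extensions. A self-contained sketch of that contract; the supported-format table and function name are assumptions, not the actual `vision.py` code:

```python
import base64
from pathlib import Path

# Assumed supported formats; the real vision.py list may be longer.
_MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}


def load_and_encode_sketch(image_path: str) -> dict:
    """Read an image file and package it for a multimodal API call."""
    path = Path(image_path)
    if not path.exists():
        raise FileNotFoundError(f"Image not found: {image_path}")
    suffix = path.suffix.lower()
    if suffix not in _MIME_TYPES:
        raise ValueError(f"Unsupported image format: {suffix}")
    raw = path.read_bytes()
    return {
        "data": base64.b64encode(raw).decode("ascii"),
        "mime_type": _MIME_TYPES[suffix],
        "size_mb": len(raw) / (1024 * 1024),
    }
```

Validating the suffix before reading is what makes `test_load_image_unsupported_format` pass even though its fake `.mp4` file contains plain text: the content is never inspected.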
# ============================================================================
# Gemini Vision Tests
# ============================================================================


def test_analyze_image_gemini_success(mock_settings_gemini, test_image_path, mock_gemini_response):
    """Test successful Gemini vision analysis"""
    with patch('google.genai.Client') as mock_client_class:
        # Mock Gemini client
        mock_client = Mock()
        mock_client.models.generate_content.return_value = mock_gemini_response
        mock_client_class.return_value = mock_client

        result = analyze_image_gemini(test_image_path, "What is in this image?")

        assert result["model"] == "gemini-2.0-flash"
        assert result["answer"] == "This image shows a red square."
        assert result["question"] == "What is in this image?"
        assert result["image_path"] == test_image_path


def test_analyze_image_gemini_default_question(mock_settings_gemini, test_image_path, mock_gemini_response):
    """Test Gemini with default question"""
    with patch('google.genai.Client') as mock_client_class:
        mock_client = Mock()
        mock_client.models.generate_content.return_value = mock_gemini_response
        mock_client_class.return_value = mock_client

        result = analyze_image_gemini(test_image_path)

        assert result["question"] == "Describe this image in detail."


def test_analyze_image_gemini_missing_api_key():
    """Test Gemini with missing API key"""
    with patch('src.tools.vision.Settings') as mock_settings:
        settings_instance = Mock()
        settings_instance.google_api_key = None
        mock_settings.return_value = settings_instance

        with pytest.raises(ValueError, match="GOOGLE_API_KEY not configured"):
            analyze_image_gemini("test.jpg")


def test_analyze_image_gemini_connection_error(mock_settings_gemini, test_image_path):
    """Test Gemini with connection error (triggers retry)"""
    with patch('google.genai.Client') as mock_client_class:
        mock_client = Mock()
        mock_client.models.generate_content.side_effect = ConnectionError("Network error")
        mock_client_class.return_value = mock_client

        with pytest.raises(ConnectionError):
            analyze_image_gemini(test_image_path)

        # Verify retry happened
        assert mock_client.models.generate_content.call_count == 3


# ============================================================================
# Claude Vision Tests
# ============================================================================


def test_analyze_image_claude_success(mock_settings_claude, test_image_path, mock_claude_response):
    """Test successful Claude vision analysis"""
    with patch('anthropic.Anthropic') as mock_anthropic_class:
        # Mock Claude client
        mock_client = Mock()
        mock_client.messages.create.return_value = mock_claude_response
        mock_anthropic_class.return_value = mock_client

        result = analyze_image_claude(test_image_path, "What is in this image?")

        assert result["model"] == "claude-sonnet-4.5"
        assert result["answer"] == "The image contains a red colored square."
        assert result["question"] == "What is in this image?"
        assert result["image_path"] == test_image_path


def test_analyze_image_claude_default_question(mock_settings_claude, test_image_path, mock_claude_response):
    """Test Claude with default question"""
    with patch('anthropic.Anthropic') as mock_anthropic_class:
        mock_client = Mock()
        mock_client.messages.create.return_value = mock_claude_response
        mock_anthropic_class.return_value = mock_client

        result = analyze_image_claude(test_image_path)

        assert result["question"] == "Describe this image in detail."


def test_analyze_image_claude_missing_api_key():
    """Test Claude with missing API key"""
    with patch('src.tools.vision.Settings') as mock_settings:
        settings_instance = Mock()
        settings_instance.anthropic_api_key = None
        mock_settings.return_value = settings_instance

        with pytest.raises(ValueError, match="ANTHROPIC_API_KEY not configured"):
            analyze_image_claude("test.jpg")


def test_analyze_image_claude_connection_error(mock_settings_claude, test_image_path):
    """Test Claude with connection error (triggers retry)"""
    with patch('anthropic.Anthropic') as mock_anthropic_class:
        mock_client = Mock()
        mock_client.messages.create.side_effect = ConnectionError("Network error")
        mock_anthropic_class.return_value = mock_client

        with pytest.raises(ConnectionError):
            analyze_image_claude(test_image_path)

        # Verify retry happened
        assert mock_client.messages.create.call_count == 3

# ============================================================================
# Unified Vision Analysis Tests
# ============================================================================


def test_analyze_image_uses_gemini(mock_settings_both, test_image_path, mock_gemini_response):
    """Test unified analysis prefers Gemini when both APIs available"""
    with patch('google.genai.Client') as mock_gemini_class:
        mock_client = Mock()
        mock_client.models.generate_content.return_value = mock_gemini_response
        mock_gemini_class.return_value = mock_client

        result = analyze_image(test_image_path, "What is this?")

        assert result["model"] == "gemini-2.0-flash"
        assert "red square" in result["answer"].lower()


def test_analyze_image_fallback_to_claude(mock_settings_both, test_image_path, mock_claude_response):
    """Test unified analysis falls back to Claude when Gemini fails"""
    with patch('google.genai.Client') as mock_gemini_class:
        with patch('anthropic.Anthropic') as mock_claude_class:
            # Gemini fails
            mock_gemini_client = Mock()
            mock_gemini_client.models.generate_content.side_effect = Exception("Gemini error")
            mock_gemini_class.return_value = mock_gemini_client

            # Claude succeeds
            mock_claude_client = Mock()
            mock_claude_client.messages.create.return_value = mock_claude_response
            mock_claude_class.return_value = mock_claude_client

            result = analyze_image(test_image_path, "What is this?")

            assert result["model"] == "claude-sonnet-4.5"
            assert "red" in result["answer"].lower()


def test_analyze_image_no_api_keys():
    """Test unified analysis with no API keys configured"""
    with patch('src.tools.vision.Settings') as mock_settings:
        settings_instance = Mock()
        settings_instance.google_api_key = None
        settings_instance.anthropic_api_key = None
|
| 279 |
+
mock_settings.return_value = settings_instance
|
| 280 |
+
|
| 281 |
+
with pytest.raises(ValueError, match="No vision API configured"):
|
| 282 |
+
analyze_image("test.jpg")
|
| 283 |
+
|
| 284 |
+
|
| 285 |
+
def test_analyze_image_both_fail(mock_settings_both, test_image_path):
|
| 286 |
+
"""Test unified analysis when both APIs fail"""
|
| 287 |
+
with patch('google.genai.Client') as mock_gemini_class:
|
| 288 |
+
with patch('anthropic.Anthropic') as mock_claude_class:
|
| 289 |
+
# Both fail
|
| 290 |
+
mock_gemini_client = Mock()
|
| 291 |
+
mock_gemini_client.models.generate_content.side_effect = Exception("Gemini error")
|
| 292 |
+
mock_gemini_class.return_value = mock_gemini_client
|
| 293 |
+
|
| 294 |
+
mock_claude_client = Mock()
|
| 295 |
+
mock_claude_client.messages.create.side_effect = Exception("Claude error")
|
| 296 |
+
mock_claude_class.return_value = mock_claude_client
|
| 297 |
+
|
| 298 |
+
with pytest.raises(Exception, match="both failed"):
|
| 299 |
+
analyze_image(test_image_path)
|
|
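The `call_count == 3` assertions in the connection-error tests assume the tool functions are wrapped in a tenacity retry decorator with exponential backoff and 3 max attempts, as described in the commit message. A minimal stdlib stand-in (the decorator and function names here are hypothetical, and the delays are shortened for illustration) reproduces that behavior:

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.01):
    """Minimal stand-in for tenacity's exponential-backoff retry.

    Retries on ConnectionError up to max_attempts times, doubling the
    delay between attempts, then re-raises the last error.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the original error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=3)
def flaky_call():
    calls["count"] += 1
    raise ConnectionError("Network error")


try:
    flaky_call()
except ConnectionError:
    pass

print(calls["count"])  # 3 attempts, matching call_count == 3 in the tests
```

With tenacity itself, the equivalent shape would be `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...), reraise=True)`; `reraise=True` is what lets the tests catch the original `ConnectionError` rather than a `RetryError`.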
@@ -0,0 +1,242 @@
"""
Tests for web search tool (Tavily and Exa)
Author: @mangobee
Date: 2026-01-02

Tests cover:
- Tavily search with mocked API
- Exa search with mocked API
- Retry logic simulation
- Fallback mechanism
- Error handling
"""

import pytest
from unittest.mock import Mock, patch, MagicMock
from src.tools.web_search import tavily_search, exa_search, search


# ============================================================================
# Test Fixtures
# ============================================================================

@pytest.fixture
def mock_tavily_response():
    """Mock Tavily API response"""
    return {
        "results": [
            {
                "title": "Test Result 1",
                "url": "https://example.com/1",
                "content": "This is test content 1"
            },
            {
                "title": "Test Result 2",
                "url": "https://example.com/2",
                "content": "This is test content 2"
            }
        ]
    }


@pytest.fixture
def mock_exa_response():
    """Mock Exa API response"""
    mock_result_1 = Mock()
    mock_result_1.title = "Exa Result 1"
    mock_result_1.url = "https://exa.com/1"
    mock_result_1.text = "This is exa content 1"

    mock_result_2 = Mock()
    mock_result_2.title = "Exa Result 2"
    mock_result_2.url = "https://exa.com/2"
    mock_result_2.text = "This is exa content 2"

    mock_response = Mock()
    mock_response.results = [mock_result_1, mock_result_2]
    return mock_response


@pytest.fixture
def mock_settings_tavily():
    """Mock Settings with Tavily API key"""
    with patch('src.tools.web_search.Settings') as mock:
        settings_instance = Mock()
        settings_instance.tavily_api_key = "test_tavily_key"
        settings_instance.exa_api_key = "test_exa_key"
        settings_instance.default_search_tool = "tavily"
        mock.return_value = settings_instance
        yield mock


@pytest.fixture
def mock_settings_exa():
    """Mock Settings with Exa as default"""
    with patch('src.tools.web_search.Settings') as mock:
        settings_instance = Mock()
        settings_instance.tavily_api_key = "test_tavily_key"
        settings_instance.exa_api_key = "test_exa_key"
        settings_instance.default_search_tool = "exa"
        mock.return_value = settings_instance
        yield mock


# ============================================================================
# Tavily Search Tests
# ============================================================================

def test_tavily_search_success(mock_settings_tavily, mock_tavily_response):
    """Test successful Tavily search"""
    with patch('tavily.TavilyClient') as mock_client_class:
        mock_client = Mock()
        mock_client.search.return_value = mock_tavily_response
        mock_client_class.return_value = mock_client

        result = tavily_search("test query", max_results=2)

        assert result["source"] == "tavily"
        assert result["query"] == "test query"
        assert result["count"] == 2
        assert len(result["results"]) == 2
        assert result["results"][0]["title"] == "Test Result 1"
        assert result["results"][0]["url"] == "https://example.com/1"
        assert result["results"][0]["snippet"] == "This is test content 1"


def test_tavily_search_missing_api_key():
    """Test Tavily search with missing API key"""
    with patch('src.tools.web_search.Settings') as mock_settings:
        settings_instance = Mock()
        settings_instance.tavily_api_key = None
        mock_settings.return_value = settings_instance

        with pytest.raises(ValueError, match="TAVILY_API_KEY not configured"):
            tavily_search("test query")


def test_tavily_search_connection_error(mock_settings_tavily):
    """Test Tavily search with connection error (triggers retry)"""
    with patch('tavily.TavilyClient') as mock_client_class:
        mock_client = Mock()
        mock_client.search.side_effect = ConnectionError("Network error")
        mock_client_class.return_value = mock_client

        with pytest.raises(ConnectionError):
            tavily_search("test query")

        # Verify retry happened (should be called MAX_RETRIES times)
        assert mock_client.search.call_count == 3


def test_tavily_search_empty_results(mock_settings_tavily):
    """Test Tavily search with empty results"""
    with patch('tavily.TavilyClient') as mock_client_class:
        mock_client = Mock()
        mock_client.search.return_value = {"results": []}
        mock_client_class.return_value = mock_client

        result = tavily_search("test query")

        assert result["count"] == 0
        assert result["results"] == []


# ============================================================================
# Exa Search Tests
# ============================================================================

def test_exa_search_success(mock_settings_exa, mock_exa_response):
    """Test successful Exa search"""
    with patch('exa_py.Exa') as mock_client_class:
        mock_client = Mock()
        mock_client.search.return_value = mock_exa_response
        mock_client_class.return_value = mock_client

        result = exa_search("test query", max_results=2)

        assert result["source"] == "exa"
        assert result["query"] == "test query"
        assert result["count"] == 2
        assert len(result["results"]) == 2
        assert result["results"][0]["title"] == "Exa Result 1"
        assert result["results"][0]["url"] == "https://exa.com/1"
        assert result["results"][0]["snippet"] == "This is exa content 1"


def test_exa_search_missing_api_key():
    """Test Exa search with missing API key"""
    with patch('src.tools.web_search.Settings') as mock_settings:
        settings_instance = Mock()
        settings_instance.exa_api_key = None
        mock_settings.return_value = settings_instance

        with pytest.raises(ValueError, match="EXA_API_KEY not configured"):
            exa_search("test query")


def test_exa_search_connection_error(mock_settings_exa):
    """Test Exa search with connection error (triggers retry)"""
    with patch('exa_py.Exa') as mock_client_class:
        mock_client = Mock()
        mock_client.search.side_effect = ConnectionError("Network error")
        mock_client_class.return_value = mock_client

        with pytest.raises(ConnectionError):
            exa_search("test query")

        # Verify retry happened
        assert mock_client.search.call_count == 3


# ============================================================================
# Unified Search with Fallback Tests
# ============================================================================

def test_search_tavily_success(mock_settings_tavily, mock_tavily_response):
    """Test unified search using Tavily successfully"""
    with patch('tavily.TavilyClient') as mock_client_class:
        mock_client = Mock()
        mock_client.search.return_value = mock_tavily_response
        mock_client_class.return_value = mock_client

        result = search("test query")

        assert result["source"] == "tavily"
        assert result["count"] == 2


def test_search_fallback_to_exa(mock_settings_tavily, mock_exa_response):
    """Test unified search falls back to Exa when Tavily fails"""
    with patch('tavily.TavilyClient') as mock_tavily_class:
        with patch('exa_py.Exa') as mock_exa_class:
            # Tavily fails
            mock_tavily_client = Mock()
            mock_tavily_client.search.side_effect = Exception("Tavily error")
            mock_tavily_class.return_value = mock_tavily_client

            # Exa succeeds
            mock_exa_client = Mock()
            mock_exa_client.search.return_value = mock_exa_response
            mock_exa_class.return_value = mock_exa_client

            result = search("test query")

            assert result["source"] == "exa"
            assert result["count"] == 2


def test_search_both_fail(mock_settings_tavily):
    """Test unified search when both Tavily and Exa fail"""
    with patch('tavily.TavilyClient') as mock_tavily_class:
        with patch('exa_py.Exa') as mock_exa_class:
            # Both fail
            mock_tavily_client = Mock()
            mock_tavily_client.search.side_effect = Exception("Tavily error")
            mock_tavily_class.return_value = mock_tavily_client

            mock_exa_client = Mock()
            mock_exa_client.search.side_effect = Exception("Exa error")
            mock_exa_class.return_value = mock_exa_client

            with pytest.raises(Exception, match="Search failed"):
                search("test query")
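The fallback tests above assume the unified `search()` tries the default provider first, falls back to the other on any error, and raises a combined "Search failed" error when both providers fail. A minimal sketch of that pattern (the stub provider functions and `search_with_fallback` name are hypothetical, standing in for the real Tavily/Exa clients):

```python
def search_with_fallback(query, primary, fallback):
    """Try the primary provider; on any error, try the fallback.

    If both providers fail, raise a single error that reports both causes,
    mirroring the "Search failed" message the tests match against.
    """
    try:
        return primary(query)
    except Exception as primary_err:
        try:
            return fallback(query)
        except Exception as fallback_err:
            raise RuntimeError(
                f"Search failed: primary ({primary_err}) "
                f"and fallback ({fallback_err}) both errored"
            )


# Stubs imitating the mocked clients in the tests above.
def tavily_stub(query):
    raise ConnectionError("Tavily error")


def exa_stub(query):
    return {"source": "exa", "query": query, "count": 2}


result = search_with_fallback("test query", tavily_stub, exa_stub)
print(result["source"])  # exa: the fallback provider answered
```

In the real tool the result dicts would be normalized to the same shape (`source`, `query`, `count`, `results` with `title`/`url`/`snippet` keys) regardless of which provider answered, which is what lets the tests assert on a single schema.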