diff --git a/5_day_meal_plan.docx b/5_day_meal_plan.docx
deleted file mode 100644
index d4ab64f050c434895cd25414bfb36550afbe5f38..0000000000000000000000000000000000000000
Binary files a/5_day_meal_plan.docx and /dev/null differ
diff --git a/5_day_meal_plan.xlsx b/5_day_meal_plan.xlsx
deleted file mode 100644
index e3e8d0f53a4656f8ffba800e3f016b7f061c811b..0000000000000000000000000000000000000000
Binary files a/5_day_meal_plan.xlsx and /dev/null differ
diff --git a/ANTHROPIC_CONTEXT_ENGINEERING.md b/ANTHROPIC_CONTEXT_ENGINEERING.md
deleted file mode 100644
index 709984aa6ab5a90b760a360aa64c8b5192128a6b..0000000000000000000000000000000000000000
--- a/ANTHROPIC_CONTEXT_ENGINEERING.md
+++ /dev/null
@@ -1,201 +0,0 @@
-# Anthropic Context Engineering Implementation
-
-## Overview
-Enhanced context engineering implementation based on Anthropic's best practices and research.
-
-## Key Principles from Anthropic
-
-### 1. Context as Finite Resource
-- **Context Rot**: As tokens increase, the model's ability to recall information decreases
-- **Attention Budget**: LLMs have a finite attention budget; every token spends part of it
-- **Diminishing Returns**: More context doesn't always mean better performance
-
-### 2. Minimal High-Signal Tokens
-- Find the smallest possible set of high-signal tokens
-- Maximize the likelihood of the desired outcome
-- Balance between too much and too little context
-
-## Implemented Strategies
-
-### 1. Structured Prompt Organization ✅
-**Anthropic's Recommendation**: Use clear sections with XML tags or Markdown headers
-
-**Implementation**:
-- All prompts now use XML-style section tags (e.g. `<system>`, `<context>`, `<rag_results>`, `<task>`)
-- Clear separation of concerns
-- Better model understanding of context structure
-
-**Example Structure**:
-```
-<system>
-  System instructions
-</system>
-
-<context>
-  Context and rules
-</context>
-
-<rag_results>
-  RAG results
-</rag_results>
-
-<task>
-  Task instructions
-</task>
-```
-
-### 2. Compaction (High-Fidelity Summarization) ✅
-**Anthropic's Strategy**: Summarize conversations nearing the context limit while preserving critical details
-
-**Implementation**:
-- `compact_conversation()`: Preserves architectural decisions, unresolved issues, implementation details
-- Discards redundant tool outputs
-- Keeps first message + summary + last N messages
-- High-fidelity compression maintaining coherence
-
-**Key Features**:
-- Preserves: Architectural decisions, unresolved bugs, implementation details, key facts
-- Discards: Redundant tool outputs, repetitive information, verbose explanations
-
-### 3. Tool Result Clearing ✅
-**Anthropic's Safest Compaction**: Clear tool results once processed
-
-**Implementation**:
-- `clear_tool_results()`: Removes large tool outputs while keeping metadata
-- Once a tool call sits deep in the history, its raw results are often no longer needed
-- Safest form of compaction with minimal information loss
-
-**Usage**:
-- Automatically applied before full compaction
-- Reduces tokens without losing critical context
-- Preserves tool call metadata for debugging
-
-### 4. Structured Note-Taking ✅
-**Anthropic's Memory Strategy**: Write notes outside the context window, pull them back in when needed
-
-**Enhanced Implementation**:
-- **Objectives Tracking**: Like Claude playing Pokémon - tracks progress toward goals
-- **Architectural Decisions**: Preserved during compaction
-- **Unresolved Issues**: Tracked separately for later resolution
-- **Structured Summary**: Organized sections (Plan, Objectives, Decisions, Issues, Facts, Notes)
-
-**Example**:
-```
-## Plan
-Multi-step plan: ...
-
-## Objectives
-- Objective 1: Progress (target: ...)
-- Objective 2: Progress (target: ...) - -## Architectural Decisions -- Decision 1 -- Decision 2 - -## Unresolved Issues -- Issue 1 -- Issue 2 -``` - -### 5. Just-in-Time Context Loading ✅ -**Anthropic's Approach**: Use lightweight identifiers, load data at runtime - -**Implementation**: -- Memory selection: Only relevant memories loaded -- Tool selection: Only relevant tools provided -- Progressive disclosure: Context discovered incrementally - -### 6. Context Compression Thresholds ✅ -**Anthropic's Guidance**: Compress at 80% of context window - -**Implementation**: -- Monitors token usage -- Triggers compression at 80% threshold -- Targets 60% after compression -- Uses tool result clearing first (safest), then full compaction - -## Prompt Engineering Improvements - -### System Prompt Structure -- **Right Altitude**: Balance between too specific (brittle) and too vague (ineffective) -- **Clear Sections**: XML tags for better organization -- **Minimal but Complete**: Enough information without bloat - -### Tool Design -- **Token Efficient**: Tools return concise, relevant information -- **Minimal Overlap**: Clear tool boundaries -- **Self-Contained**: Each tool is independent and robust - -### Examples (Few-Shot) -- **Diverse, Canonical**: Not laundry lists of edge cases -- **Effective Portrayal**: Examples that show expected behavior -- **Quality over Quantity**: Few good examples better than many mediocre ones - -## Integration Points - -### In `agent_orchestrator.py`: - -1. **Conversation History Compression**: - - Checks token usage at 80% threshold - - Uses tool result clearing first - - Falls back to full compaction if needed - -2. **Structured Note-Taking**: - - Saves plans, objectives, decisions, issues - - Pulls notes into prompts when relevant - - Preserves across compaction cycles - -3. **Prompt Structure**: - - All prompts use XML-style sections - - Clear organization improves model understanding - - Better separation of concerns - -4. **Tool Output Compression**: - - Automatically compresses RAG/web outputs - - Limits results to top 5 - - Truncates long text fields - -## Benefits - -1. **Better Performance**: Structured prompts improve model understanding -2. **Reduced Token Usage**: Compression and clearing reduce costs -3. **Longer Conversations**: Compaction enables extended agent trajectories -4. **Better Coherence**: Structured notes maintain context across resets -5. **Cost Efficiency**: Fewer tokens = lower API costs - -## Comparison: Before vs After - -### Before: -- Flat prompt structure -- No conversation compression -- All tool outputs kept in context -- No structured note-taking - -### After: -- XML-structured prompts -- Automatic compaction at 80% threshold -- Tool result clearing (safest compaction) -- Structured note-taking with objectives, decisions, issues -- Better context selection - -## Files Modified - -- `backend/api/services/context_engineer.py` - Enhanced with Anthropic strategies -- `backend/api/services/agent_orchestrator.py` - Integrated structured prompts and compaction - -## Testing Recommendations - -1. **Long Conversations**: Test with 20+ message exchanges -2. **Compaction**: Verify compaction preserves critical information -3. **Tool Clearing**: Ensure tool results are cleared appropriately -4. **Note-Taking**: Verify notes persist across compaction cycles -5. **Structured Prompts**: Test that XML structure improves responses - -## Future Enhancements - -1. **Fine-tuned Compaction**: Train models specifically for context compression -2. 
**Hierarchical Summarization**: Multi-level compression for very long conversations -3. **Embedding-based Selection**: Better memory/tool selection using embeddings -4. **Sub-agent Architectures**: Specialized agents with clean context windows -5. **Adaptive Thresholds**: Dynamic compression thresholds based on task complexity - diff --git a/CONTEXT_ENGINEERING_IMPLEMENTATION.md b/CONTEXT_ENGINEERING_IMPLEMENTATION.md deleted file mode 100644 index b9e579d34266e663be942eca1e11ab48f9cdaf9a..0000000000000000000000000000000000000000 --- a/CONTEXT_ENGINEERING_IMPLEMENTATION.md +++ /dev/null @@ -1,128 +0,0 @@ -# Context Engineering Implementation - -## Overview -Implemented comprehensive context engineering strategies based on LangChain's best practices to optimize agent performance and reduce token usage. - -## Four Main Strategies - -### 1. Write Context ✅ -**Purpose**: Save context outside the context window for later use. - -**Implementation**: -- **Scratchpad**: `ContextScratchpad` class saves notes, plans, and key facts during agent execution -- **Plan Saving**: Agent plans are saved to scratchpad for persistence -- **Key Facts**: Important information extracted from responses is saved -- **Notes**: Categorized notes (user_query, intent, tool_execution, etc.) - -**Usage in Agent**: -- Saves user queries to scratchpad -- Saves intent classifications -- Saves agent plans from multi-step decisions -- Saves key facts from LLM responses - -### 2. Select Context ✅ -**Purpose**: Pull only relevant context into the context window. - -**Implementation**: -- **Memory Selection**: `ContextSelector.select_relevant_memories()` selects top N relevant memories -- **Tool Selection**: `ContextSelector.select_relevant_tools()` selects most relevant tools -- **Keyword-based**: Uses keyword matching (can be enhanced with embeddings) - -**Usage in Agent**: -- Selects relevant memories before tool selection -- Filters conversation history to most relevant parts -- Can be extended for better RAG retrieval - -### 3. Compress Context ✅ -**Purpose**: Retain only necessary tokens. - -**Implementation**: -- **Conversation Summarization**: `ContextCompressor.summarize_conversation()` summarizes long conversations -- **Message Trimming**: `ContextCompressor.trim_messages()` keeps first N and last M messages -- **Tool Output Compression**: `ContextCompressor.compress_tool_output()` reduces tool output size - - Limits RAG results to top 5 - - Limits web search results to top 5 - - Truncates long text fields - -**Usage in Agent**: -- Compresses conversation history if > 10 messages -- Compresses RAG tool outputs automatically -- Compresses web search tool outputs automatically -- Summarizes middle sections of long conversations - -### 4. Isolate Context ✅ -**Purpose**: Split context to prevent token bloat. - -**Implementation**: -- **ContextIsolator**: Stores large tool outputs separately -- **Reference System**: Returns references instead of full data -- **Automatic Cleanup**: Clears old isolated data after timeout - -**Usage in Agent**: -- Can isolate large tool outputs (images, audio, large JSON) -- Prevents context window overflow -- Maintains references for later retrieval - -## Integration Points - -### In `agent_orchestrator.py`: - -1. **Request Start**: - - Writes user query to scratchpad - - Compresses conversation history if needed - -2. **Intent Classification**: - - Saves intent to scratchpad - -3. **Memory Retrieval**: - - Selects relevant memories using context selector - -4. 
**Tool Selection**: - - Saves multi-step plans to scratchpad - -5. **Tool Execution**: - - Compresses RAG outputs - - Compresses web search outputs - - Saves key facts from responses - -6. **Prompt Building**: - - Includes scratchpad context in prompts - - Adds context from previous steps - -## Benefits - -1. **Reduced Token Usage**: Compression and selection reduce context window usage -2. **Better Performance**: Relevant context improves agent accuracy -3. **Longer Conversations**: Summarization enables longer agent trajectories -4. **Cost Savings**: Fewer tokens = lower costs -5. **Faster Responses**: Smaller context = faster LLM calls - -## Future Enhancements - -1. **Embedding-based Selection**: Use embeddings for better memory/tool selection -2. **Hierarchical Summarization**: Multi-level summarization for very long conversations -3. **Fine-tuned Compression**: Train models specifically for context compression -4. **Knowledge Graph Integration**: Use knowledge graphs for better context selection -5. **Adaptive Compression**: Adjust compression based on context window usage - -## Files Created - -- `backend/api/services/context_engineer.py` - Main context engineering service - - `ContextScratchpad` - Write context - - `ContextCompressor` - Compress context - - `ContextSelector` - Select context - - `ContextIsolator` - Isolate context - - `ContextEngineer` - Main orchestrator - -## Files Modified - -- `backend/api/services/agent_orchestrator.py` - Integrated context engineering throughout - -## Testing - -Test with: -- Long conversations (> 10 messages) -- Multiple tool calls -- Large tool outputs -- Memory retrieval scenarios - diff --git a/KB_FIRST_IMPLEMENTATION.md b/KB_FIRST_IMPLEMENTATION.md deleted file mode 100644 index 67c4f1492d110dcfa8956e561a353037920e1689..0000000000000000000000000000000000000000 --- a/KB_FIRST_IMPLEMENTATION.md +++ /dev/null @@ -1,81 +0,0 @@ -# KB-First Strategy Implementation - -## Overview -The system now implements a **Knowledge Base (KB) first, web search as fallback** strategy with enhanced safety rules. - -## Key Behavior - -### 1. KB-First Approach -- **Always check Knowledge Base first** - RAG search is performed before any other tool -- **Web search is ONLY a fallback** - Used when KB has no relevant information -- **KB is authoritative** - Knowledge Base information takes priority over web search - -### 2. Safety Rules for Web Search - -When web search is used as a fallback: -- ✅ Keep responses **short, factual, and neutral** -- ✅ **Limit to 2-4 sentences** for web search content -- ❌ Do NOT provide long legal, medical, or highly detailed professional explanations -- ⚠️ For legal, medical, financial, or safety topics: provide brief general explanation + recommend consulting a qualified professional -- 📝 Always clarify that information comes from external sources, not the Knowledge Base - -### 3. Professional Disclaimers - -For topics involving: -- Legal advice -- Medical advice -- Financial advice -- Safety-critical information - -**Response format:** -> "Brief general explanation. For specific advice, please consult a qualified professional." - -## Implementation Details - -### Prompt Updates - -1. **RAG Prompt (when KB has results)** - - Emphasizes KB as primary and authoritative source - - Clarifies that web search is supplementary only - -2. **RAG Prompt (when KB has no results)** - - Includes rules for web search fallback - - Adds safety disclaimers for professional advice topics - -3. 
**Web Search Prompt** - - Explicitly states KB was checked first - - Includes all safety rules and disclaimers - - Enforces 2-4 sentence limit - -4. **Multi-Step Synthesis Prompt** - - Prioritizes KB information over web search - - Distinguishes between authoritative (KB) and supplementary (web) sources - -### Example Test Query - -**Query:** "What are the international laws regarding subletting?" - -**Expected Flow:** -1. ✅ Check Knowledge Base first -2. ✅ No relevant KB information found -3. ✅ Trigger web search as fallback -4. ✅ Generate short, safe answer - -**Expected Response:** -> "I don't have this in the knowledge base, but based on general information from the web, subletting laws differ widely by country. For specific legal advice, please consult a local authority or legal professional." - -## Safety Features - -- ✅ Professional advice disclaimers -- ✅ Source distinction (KB vs web) -- ✅ Response length limits for web content -- ✅ Clear messaging about fallback behavior - -## Configuration - -All rules are built into the prompt templates in: -- `backend/api/services/agent_orchestrator.py` - - `_build_prompt_with_rag()` - - `_build_prompt_with_web()` - - `_execute_multi_step()` (multi-step synthesis) - diff --git a/TESTING_GUIDE.md b/TESTING_GUIDE.md deleted file mode 100644 index b10891fd0d1f241a3024e11e7d4279c77fb79f14..0000000000000000000000000000000000000000 --- a/TESTING_GUIDE.md +++ /dev/null @@ -1,308 +0,0 @@ -# Testing Guide for IntegraChat Improvements - -This guide helps you test all the improvements we've made to the system. - -## Prerequisites - -1. Make sure all services are running: - - Backend API server - - MCP servers (RAG, Web, Admin) - - Ollama (if using local LLM) - -2. Check environment variables in `.env`: - ``` - OLLAMA_URL=http://localhost:11434 - OLLAMA_MODEL=llama3.1:latest - RAG_MCP_URL=http://localhost:8001 - WEB_MCP_URL=http://localhost:8002 - ADMIN_MCP_URL=http://localhost:8003 - ``` - -## Quick Test Script - -Run the test script: -```bash -python test_improvements.py -``` - -## Manual Testing - -### 1. Test Streaming Response (Character-by-Character) - -**Test Query:** -``` -"Tell me about artificial intelligence" -``` - -**What to Check:** -- Response streams character-by-character (not word-by-word) -- Smooth animation in the UI -- No delays or jumps - -**Expected Behavior:** -- Characters appear one by one smoothly -- Response completes without errors - ---- - -### 2. Test Query Expansion for Ambiguous Terms - -**Test Queries:** -``` -"latest news about Al" -"atest news about Al" (typo test) -"What is AI?" -"Tell me about ML" -``` - -**What to Check:** -- System expands "Al" to "artificial intelligence" -- System expands "AI" appropriately -- System expands "ML" to "machine learning" -- News queries still work with typos - -**Expected Behavior:** -- Ambiguous terms are expanded -- Better search results -- No "provided context" errors for news queries - ---- - -### 3. Test Enhanced Error Handling - -**Test Scenarios:** - -**A. Connection Error:** -- Stop Ollama service -- Send any query -- Check error message is user-friendly - -**B. Timeout:** -- Send a very complex query that might timeout -- Check error message explains timeout - -**C. 404 Error:** -- Query something that doesn't exist -- Check error message is helpful - -**Expected Behavior:** -- Clear, actionable error messages -- No technical jargon for users -- Suggestions on what to do next - ---- - -### 4. 
Test Multi-Query Web Search - -**Test Query:** -``` -"latest news about artificial intelligence" -``` - -**What to Check:** -- Multiple query variations are tried in parallel -- Results are merged from multiple queries -- Better coverage of results - -**How to Verify:** -- Check backend logs for "web_multi_query_merge" -- Look for multiple web search calls -- Results should be more comprehensive - ---- - -### 5. Test Caching - -**Test Query:** -``` -"What is Python programming?" -``` - -**Steps:** -1. Send query first time - note response time -2. Send same query immediately - should be faster (cached) -3. Wait 6 minutes - cache should expire -4. Send again - should be slower (cache expired) - -**Expected Behavior:** -- Second query is much faster -- Cache expires after 5 minutes -- Different queries don't interfere - ---- - -### 6. Test Enhanced News Query Detection - -**Test Queries:** -``` -"latest news about AI" -"breaking news technology" -"what happened today" -"current events in tech" -``` - -**What to Check:** -- News queries use web search (not RAG) -- No "provided context" errors -- LLM-based detection works for edge cases - -**Expected Behavior:** -- All news queries route to web search -- No RAG results for news queries -- Helpful responses even if web search fails - ---- - -### 7. Test Enhanced Prompts - -**Test Query:** -``` -"Explain quantum computing" -``` - -**What to Check:** -- Response is well-structured -- Sources are cited -- Response is comprehensive - -**Expected Behavior:** -- Clear sections in response -- Citations when using sources -- Professional and helpful tone - ---- - -### 8. Test Performance (Parallel Execution) - -**Test Query:** -``` -"Compare Python and JavaScript" -``` - -**What to Check:** -- Multiple tools run in parallel -- Faster overall response time -- Better results from parallel execution - -**How to Verify:** -- Check logs for "parallel_execution" -- Response time should be faster -- Multiple tools used simultaneously - ---- - -## Using the Debug Endpoint - -Test the `/agent/debug` endpoint to see detailed reasoning: - -```bash -curl -X POST http://localhost:8000/agent/debug \ - -H "Content-Type: application/json" \ - -d '{ - "tenant_id": "test-tenant", - "message": "latest news about AI" - }' -``` - -This shows: -- Intent classification -- Tool selection reasoning -- Tool scores -- Reasoning trace -- Tool traces - ---- - -## Testing with Python Script - -Create a test script to automate testing: - -```python -import requests -import json -import time - -BASE_URL = "http://localhost:8000" - -def test_query(message, tenant_id="test-tenant"): - """Test a query and return response.""" - response = requests.post( - f"{BASE_URL}/agent/message", - json={ - "tenant_id": tenant_id, - "message": message, - "temperature": 0.0 - } - ) - return response.json() - -# Test cases -test_cases = [ - ("latest news about AI", "News query"), - ("What is Python?", "General query"), - ("Who is the admin?", "Admin query"), - ("atest news about Al", "Typo + ambiguous"), -] - -for query, description in test_cases: - print(f"\n{'='*50}") - print(f"Testing: {description}") - print(f"Query: {query}") - print(f"{'='*50}") - - start = time.time() - result = test_query(query) - elapsed = time.time() - start - - print(f"Response time: {elapsed:.2f}s") - print(f"Response: {result['text'][:200]}...") - print(f"Tools used: {result.get('decision', {}).get('tool', 'unknown')}") -``` - ---- - -## Common Issues and Solutions - -### Issue: "Cannot connect to Ollama" -**Solution:** -- 
Start Ollama: `ollama serve` -- Pull model: `ollama pull llama3.1:latest` - -### Issue: Cache not working -**Solution:** -- Check cache is enabled (it is by default) -- Verify query is exactly the same -- Check cache hasn't expired (5 min TTL) - -### Issue: News queries still using RAG -**Solution:** -- Check logs for "news_query_detection" -- Verify "news" keyword is in query -- Check tool selection decision - -### Issue: Streaming not smooth -**Solution:** -- Check character-by-character streaming is enabled -- Verify no network issues -- Check browser console for errors - ---- - -## Performance Benchmarks - -Expected performance improvements: - -- **Caching**: 90%+ faster for repeated queries -- **Parallel execution**: 30-50% faster for multi-tool queries -- **Multi-query search**: 2-3x more results -- **Streaming**: Smoother UX (subjective) - ---- - -## Next Steps - -1. Run all test cases -2. Check logs for any errors -3. Verify all features work as expected -4. Report any issues found - diff --git a/backend/api/placeholder.txt b/backend/api/placeholder.txt deleted file mode 100644 index ca0f927849523f8e4d1d48c6d4d12fc4b988534a..0000000000000000000000000000000000000000 --- a/backend/api/placeholder.txt +++ /dev/null @@ -1,4 +0,0 @@ -This directory contains the FastAPI backend API code. -For the Hugging Face Space submission, only placeholder files are included. -The full backend implementation exists separately. - diff --git a/backend/tests/README_RETRY_TESTS.md b/backend/tests/README_RETRY_TESTS.md deleted file mode 100644 index 6c0200f5348ea7653a6275692807ff494472db5c..0000000000000000000000000000000000000000 --- a/backend/tests/README_RETRY_TESTS.md +++ /dev/null @@ -1,266 +0,0 @@ -# Retry System Testing Guide - -This guide explains how to test the autonomous retry and self-correction system. - -## Test Files - -### 1. Unit Tests: `test_retry_system.py` - -Comprehensive unit tests that mock all dependencies and test individual retry methods. - -**Run with:** -```bash -# Run all retry tests -pytest backend/tests/test_retry_system.py -v - -# Run specific test -pytest backend/tests/test_retry_system.py::test_rag_with_repair_low_score_retry -v - -# Run with coverage -pytest backend/tests/test_retry_system.py --cov=api.services.agent_orchestrator -v -``` - -**What it tests:** -- ✅ RAG retry with low scores (threshold adjustment) -- ✅ RAG retry with query expansion -- ✅ Web search retry with empty results -- ✅ Safe tool call retry mechanism -- ✅ Rule safe message rewriting -- ✅ Analytics logging verification -- ✅ Reasoning trace integration -- ✅ Edge cases and boundary conditions - -**No backend required** - all tests use mocks. - -### 2. Integration Tests: `test_retry_integration.py` - -Integration tests that require a running backend and test the full system. - -**Prerequisites:** -- FastAPI backend running on `http://localhost:8000` -- MCP server running -- Optional: LLM service available - -**Run with:** -```bash -python test_retry_integration.py -``` - -**What it tests:** -- ✅ RAG retry scenarios with real backend -- ✅ Web search retry scenarios -- ✅ Reasoning trace verification -- ✅ Analytics logging -- ✅ Full agent flow integration -- ✅ Agent plan endpoint - -### 3. Quick Test: `test_retry_quick.py` - -Minimal test to quickly verify retry system is active. 
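-
-As a rough sketch, such a check can be as small as the script below. Assumptions to note: it relies on the `/agent/debug` endpoint and `reasoning_trace` fields shown later in this guide, and the `quick_retry_check` helper name is illustrative, not part of the codebase:
-
-```python
-import requests
-
-BASE_URL = "http://localhost:8000"
-
-
-def quick_retry_check(message: str = "Explain zyxwvutsrqp in detail") -> bool:
-    """Send one debug query and report whether any retry/repair steps appear."""
-    resp = requests.post(
-        f"{BASE_URL}/agent/debug",
-        # Payload mirrors the curl examples in the scenarios below.
-        json={"tenant_id": "test", "message": message},
-        timeout=60,
-    )
-    resp.raise_for_status()
-    trace = resp.json().get("reasoning_trace", [])
-    # Retry/repair steps are named like "rag_retry_low_threshold", "web_retry_rewritten", etc.
-    retry_steps = [s for s in trace if "retry" in s.get("step", "") or "repair" in s.get("step", "")]
-    for step in retry_steps:
-        print("retry step:", step.get("step"))
-    return bool(retry_steps)
-
-
-if __name__ == "__main__":
-    print("retries observed:", quick_retry_check())
-```
-
-Seeing no retry steps is not a failure; as the troubleshooting section notes, retries only fire when the first attempt is weak.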
- -**Prerequisites:** -- Backend running on `http://localhost:8000` - -**Run with:** -```bash -python test_retry_quick.py -``` - -**What it tests:** -- ✅ Basic connectivity -- ✅ Retry steps in reasoning traces -- ✅ Quick verification retry system is active - -## Test Scenarios - -### Scenario 1: RAG Low Score Retry - -**What happens:** -1. Initial RAG search returns score < 0.30 -2. System retries with lower threshold (0.15) -3. If still low (< 0.15), expands query and retries - -**How to test:** -```bash -# Send query that might have low relevance -curl -X POST "http://localhost:8000/agent/debug" \ - -H "Content-Type: application/json" \ - -d '{ - "tenant_id": "test", - "message": "What is quantum field theory and how does it relate to string theory?" - }' | jq '.reasoning_trace[] | select(.step | contains("retry"))' -``` - -**Expected:** -- `rag_retry_low_threshold` step in reasoning trace -- Possibly `rag_retry_expanded_query` if score still low -- Analytics logs showing retry attempts - -### Scenario 2: Web Search Empty Results Retry - -**What happens:** -1. Web search returns empty results -2. System rewrites query as "best explanation of {query}" -3. If still empty, rewrites as "{query} facts summary" - -**How to test:** -```bash -# Send obscure query -curl -X POST "http://localhost:8000/agent/debug" \ - -H "Content-Type: application/json" \ - -d '{ - "tenant_id": "test", - "message": "Explain zyxwvutsrqp in detail" - }' | jq '.reasoning_trace[] | select(.step | contains("web_retry"))' -``` - -**Expected:** -- `web_retry_rewritten` steps in reasoning trace -- Rewritten queries visible in trace -- Analytics logs showing retry attempts - -### Scenario 3: Safe Tool Call Retry - -**What happens:** -1. Tool call fails -2. System retries up to max_retries times -3. Uses fallback params if provided - -**How to test:** -- This is tested automatically in unit tests -- In production, retries happen transparently - -## Verifying Retry Behavior - -### Method 1: Check Reasoning Trace - -The `/agent/debug` endpoint shows all reasoning steps including retries: - -```bash -curl -X POST "http://localhost:8000/agent/debug" \ - -H "Content-Type: application/json" \ - -d '{"tenant_id": "test", "message": "test query"}' \ - | jq '.reasoning_trace[] | select(.step | test("retry|repair"))' -``` - -### Method 2: Check Analytics - -Retry attempts are logged to analytics: - -```bash -curl -X GET "http://localhost:8000/analytics/tool-usage?days=1" \ - -H "x-tenant-id: test" \ - | jq '.logs[] | select(.tool_name | contains("retry"))' -``` - -### Method 3: Check Tool Traces - -Tool traces in agent responses show retry attempts: - -```bash -curl -X POST "http://localhost:8000/agent/message" \ - -H "Content-Type: application/json" \ - -d '{"tenant_id": "test", "message": "test"}' \ - | jq '.tool_traces' -``` - -## Expected Retry Patterns - -### RAG Retries - -- **Low score (< 0.30)**: Retry with threshold 0.15 -- **Very low score (< 0.15)**: Expand query and retry -- **Reasoning trace steps**: - - `rag_retry_low_threshold` - - `rag_retry_expanded_query` - - `rag_expanded_query_result` - -### Web Retries - -- **Empty results**: Rewrite query and retry -- **Reasoning trace steps**: - - `web_retry_rewritten` - - `web_retry_success` - -### Tool Call Retries - -- **Tool failure**: Retry up to max_retries -- **Reasoning trace steps**: - - `retry_attempt` - - `retry_success` or `error` after all retries - -## Troubleshooting - -### Tests Not Showing Retries - -**Possible reasons:** -1. 
**Scores are already high** - Retries only happen when needed
-2. **First attempt succeeded** - System working optimally
-3. **Query doesn't trigger retry** - Try more obscure queries
-
-**Solution:** This is actually good! Retries only happen when needed.
-
-### Backend Not Running
-
-```bash
-# Start backend
-cd backend/api
-uvicorn main:app --port 8000 --reload
-
-# Or run the start script directly (Windows)
-start.bat
-```
-
-### Import Errors
-
-```bash
-# Install dependencies
-pip install -r requirements.txt
-
-# Run from project root
-cd /path/to/IntegraChat
-pytest backend/tests/test_retry_system.py
-```
-
-## Test Coverage
-
-The test suite covers:
-
-- ✅ RAG retry logic (threshold + query expansion)
-- ✅ Web retry logic (query rewriting)
-- ✅ Safe tool call retries
-- ✅ Rule safe message rewriting
-- ✅ Analytics logging
-- ✅ Reasoning trace integration
-- ✅ Edge cases and boundaries
-- ✅ Integration with full agent flow
-
-## Continuous Testing
-
-To run tests automatically:
-
-```bash
-# Watch mode (runs on file changes)
-pytest-watch backend/tests/test_retry_system.py
-
-# With coverage
-pytest backend/tests/test_retry_system.py --cov --cov-report=html
-
-# All tests
-pytest backend/tests/ -v -k retry
-```
-
-## Next Steps
-
-1. ✅ Run unit tests: `pytest backend/tests/test_retry_system.py -v`
-2. ✅ Start backend and run integration tests: `python test_retry_integration.py`
-3. ✅ Quick verification: `python test_retry_quick.py`
-4. ✅ Check reasoning traces for retry steps
-5. ✅ Monitor analytics for retry attempts
-
-For more information, see `TESTING_GUIDE.md` in the project root.
-
-
-
-
-
diff --git a/backend/tests/conftest.py b/backend/tests/conftest.py
deleted file mode 100644
index 8b137891791fe96927ad78e64b0aad7bded08bdc..0000000000000000000000000000000000000000
--- a/backend/tests/conftest.py
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/backend/tests/test_access_control.py b/backend/tests/test_access_control.py
deleted file mode 100644
index 0025c64638a792bfa29ad250b55288ae4dd15d28..0000000000000000000000000000000000000000
--- a/backend/tests/test_access_control.py
+++ /dev/null
@@ -1,55 +0,0 @@
-import sys
-from pathlib import Path
-import pytest
-
-# Ensure backend package is importable
-backend_dir = Path(__file__).parent.parent
-sys.path.insert(0, str(backend_dir))
-
-from mcp_server.common import access_control
-from mcp_server.common.utils import execute_tool
-
-
-@pytest.mark.asyncio
-async def test_execute_tool_denies_without_permission():
-    async def handler(context, payload):
-        return {"ok": True}
-
-    payload = {
-        "tenant_id": "tenant123",
-        "session_id": "s1",
-        "role": "viewer",
-    }
-
-    result = await execute_tool("rag.ingest", payload, handler)
-    assert result["status"] == "error"
-    assert result["error_type"] == "validation_error"
-    assert "not permitted" in result["message"]
-
-
-@pytest.mark.asyncio
-async def test_execute_tool_allows_authorized_role():
-    async def handler(context, payload):
-        return {"ok": True}
-
-    payload = {
-        "tenant_id": "tenant123",
-        "session_id": "s1",
-        "role": "admin",
-    }
-
-    result = await execute_tool("rag.ingest", payload, handler)
-    assert result["status"] == "ok"
-    assert result["data"]["ok"] is True
-
-
-def test_normalize_role_defaults_to_viewer():
-    assert access_control.normalize_role(None) == "viewer"
-    assert access_control.normalize_role("ADMIN") == "admin"
-    assert access_control.normalize_role("unknown") == "viewer"
-
-
-def test_role_allows_matrix():
-    assert access_control.role_allows("owner", "manage_rules")
-    assert not 
access_control.role_allows("viewer", "manage_rules") - diff --git a/backend/tests/test_agent_orchestrator.py b/backend/tests/test_agent_orchestrator.py deleted file mode 100644 index b9e66a89373855a98ccac5aecd1b082576f6165d..0000000000000000000000000000000000000000 --- a/backend/tests/test_agent_orchestrator.py +++ /dev/null @@ -1,230 +0,0 @@ -# ============================================================= -# File: tests/test_agent_orchestrator.py -# ============================================================= - -import sys -from pathlib import Path - -# Add backend directory to Python path -backend_dir = Path(__file__).parent.parent -sys.path.insert(0, str(backend_dir)) - -try: - import pytest - HAS_PYTEST = True -except ImportError: - HAS_PYTEST = False - # Create a mock pytest decorator if pytest is not available - class MockMark: - def asyncio(self, func): - return func - class MockPytest: - mark = MockMark() - def fixture(self, func): - return func - pytest = MockPytest() - -import os -from api.services.agent_orchestrator import AgentOrchestrator -from api.models.agent import AgentRequest, AgentDecision, AgentResponse -from api.models.redflag import RedFlagMatch -from api.services.llm_client import LLMClient - - -# --------------------------- -# Mock classes -# --------------------------- - -class FakeLLM(LLMClient): - def __init__(self, output="LLM_RESPONSE"): - self.output = output - - async def simple_call(self, prompt: str, temperature: float = 0.0): - return self.output - - -class FakeMCP: - """Fake MCP server client used for rag/web/admin calls.""" - def __init__(self): - self.last_rag = None - self.last_web = None - self.last_admin = None - - async def call_rag(self, tenant_id: str, query: str): - self.last_rag = query - return {"results": [{"text": "RAG_DOC_CONTENT"}]} - - async def call_web(self, tenant_id: str, query: str): - self.last_web = query - return {"results": [{"title": "WebResult", "snippet": "Fresh info"}]} - - async def call_admin(self, tenant_id: str, query: str): - self.last_admin = query - return {"action": "allow"} - - -def assert_trace_has_step(resp, step_name): - assert resp.reasoning_trace, "reasoning trace missing" - assert any(entry.get("step") == step_name for entry in resp.reasoning_trace), f"{step_name} missing" - - -# --------------------------- -# Patch orchestrator to use fake MCP + fake redflag -# --------------------------- - -@pytest.fixture -def orchestrator(monkeypatch): - - # Fake LLM that always returns "MOCK_ANSWER" - llm = FakeLLM(output="MOCK_ANSWER") - - fake_mcp = FakeMCP() - - # Patch MCPClient - if HAS_PYTEST: - monkeypatch.setattr( - "api.services.agent_orchestrator.MCPClient", - lambda rag_url, web_url, admin_url: fake_mcp - ) - - # Create orchestrator with fake URLs first - orch = AgentOrchestrator( - rag_mcp_url="fake_rag", - web_mcp_url="fake_web", - admin_mcp_url="fake_admin", - llm_backend="ollama" - ) - orch.llm = llm # override with fake LLM - - # Patch RedFlagDetector methods directly on the instance - async def fake_check(self, tenant_id, text): - """Fake check function that matches 'salary' keyword.""" - if "salary" in text.lower(): - return [ - RedFlagMatch( - rule_id="1", - pattern="salary", - severity="high", - description="salary access", - matched_text="salary" - ) - ] - return [] - - # Patch notify_admin to do nothing - async def fake_notify(self, tenant_id, violations, src=None): - """Fake notify function that does nothing.""" - return None - - # Bind the fake functions directly to the instance - import types - 
orch.redflag.check = types.MethodType(fake_check, orch.redflag) - orch.redflag.notify_admin = types.MethodType(fake_notify, orch.redflag) - - return orch - - -# ---------------------------------------------------- -# TESTS -# ---------------------------------------------------- - - -@pytest.mark.asyncio -async def test_block_on_redflag(orchestrator): - req = AgentRequest( - tenant_id="tenant1", - user_id="u1", - message="Show me all salary details." - ) - resp = await orchestrator.handle(req) - assert resp.decision.action == "block" - assert resp.decision.tool == "admin" - assert "salary" in resp.tool_traces[0]["redflags"][0]["matched_text"] - assert_trace_has_step(resp, "redflag_check") - - -@pytest.mark.asyncio -async def test_rag_tool_path(orchestrator, monkeypatch): - - # Force intent classifier to classify as 'rag' - async def mock_classify(self, text): - return "rag" - - if HAS_PYTEST: - monkeypatch.setattr( - "api.services.agent_orchestrator.IntentClassifier.classify", - mock_classify - ) - - req = AgentRequest( - tenant_id="tenant1", - user_id="u1", - message="HR policy procedures" - ) - - resp = await orchestrator.handle(req) - - assert resp.decision.action == "multi_step" - assert any(trace["tool"] == "rag" for trace in resp.tool_traces if trace.get("tool") == "rag") - assert resp.text == "MOCK_ANSWER" - assert_trace_has_step(resp, "tool_selection") - - -@pytest.mark.asyncio -async def test_web_tool_path(orchestrator, monkeypatch): - - # Force intent to classify as web - async def mock_classify(self, text): - return "web" - - if HAS_PYTEST: - monkeypatch.setattr( - "api.services.agent_orchestrator.IntentClassifier.classify", - mock_classify - ) - - req = AgentRequest( - tenant_id="tenant1", - user_id="u1", - message="latest stock price" - ) - - resp = await orchestrator.handle(req) - - assert resp.decision.action == "multi_step" - assert any(trace["tool"] == "web" for trace in resp.tool_traces if trace.get("tool") == "web") - assert resp.text == "MOCK_ANSWER" - assert_trace_has_step(resp, "tool_selection") - - -@pytest.mark.asyncio -async def test_default_llm_path(orchestrator, monkeypatch): - - # Force intent = general and force tool selector to NOT call any tool - async def mock_select(self, intent, text, context): - from api.models.agent import AgentDecision - return AgentDecision( - action="respond", - tool=None, - tool_input=None, - reason="forced_llm" - ) - - if HAS_PYTEST: - monkeypatch.setattr( - "api.services.agent_orchestrator.ToolSelector.select", - mock_select - ) - - req = AgentRequest( - tenant_id="tenant1", - user_id="u1", - message="just a normal question" - ) - - resp = await orchestrator.handle(req) - - assert resp.decision.action == "respond" - assert resp.decision.tool is None - assert resp.text == "MOCK_ANSWER" - assert_trace_has_step(resp, "intent_detection") diff --git a/backend/tests/test_analytics_store.py b/backend/tests/test_analytics_store.py deleted file mode 100644 index c8c9c0bf0e2417ba3396e7109e65ddb56c897b76..0000000000000000000000000000000000000000 --- a/backend/tests/test_analytics_store.py +++ /dev/null @@ -1,208 +0,0 @@ -""" -Tests for AnalyticsStore - tenant-level analytics logging -""" - -import sys -from pathlib import Path - -# Add backend directory to Python path -backend_dir = Path(__file__).parent.parent -sys.path.insert(0, str(backend_dir)) - -import pytest -import time -import tempfile -import os - -from api.storage.analytics_store import AnalyticsStore - - -@pytest.fixture -def temp_analytics_db(): - """Create a temporary database 
for testing.""" - with tempfile.NamedTemporaryFile(delete=False, suffix='.db') as f: - db_path = f.name - yield db_path - # Cleanup - close any connections first - try: - if os.path.exists(db_path): - # On Windows, we need to ensure the file is closed - import time - time.sleep(0.1) # Brief delay to ensure file is released - os.unlink(db_path) - except (PermissionError, OSError): - # File might still be in use, that's okay for temp files - pass - - -@pytest.fixture -def analytics_store(temp_analytics_db): - """Create an AnalyticsStore instance with temporary database.""" - return AnalyticsStore(db_path=temp_analytics_db) - - -def test_analytics_store_init(analytics_store): - """Test that AnalyticsStore initializes correctly.""" - assert analytics_store is not None - assert analytics_store.db_path.exists() - - -def test_log_tool_usage(analytics_store): - """Test logging tool usage events.""" - analytics_store.log_tool_usage( - tenant_id="test_tenant", - tool_name="rag", - latency_ms=150, - tokens_used=500, - success=True, - user_id="user123" - ) - - stats = analytics_store.get_tool_usage_stats("test_tenant") - assert "rag" in stats - assert stats["rag"]["count"] == 1 - assert stats["rag"]["avg_latency_ms"] == 150.0 - assert stats["rag"]["total_tokens"] == 500 - - -def test_log_redflag_violation(analytics_store): - """Test logging red-flag violations.""" - analytics_store.log_redflag_violation( - tenant_id="test_tenant", - rule_id="rule123", - rule_pattern=".*password.*", - severity="high", - matched_text="password123", - confidence=0.95, - message_preview="User entered password123", - user_id="user123" - ) - - violations = analytics_store.get_redflag_violations("test_tenant", limit=10) - assert len(violations) == 1 - assert violations[0]["severity"] == "high" - assert violations[0]["confidence"] == 0.95 - assert violations[0]["matched_text"] == "password123" - - -def test_log_rag_search(analytics_store): - """Test logging RAG search events with quality metrics.""" - analytics_store.log_rag_search( - tenant_id="test_tenant", - query="What is the policy?", - hits_count=5, - avg_score=0.85, - top_score=0.92, - latency_ms=120 - ) - - metrics = analytics_store.get_rag_quality_metrics("test_tenant") - assert metrics["total_searches"] == 1 - assert metrics["avg_hits_per_search"] == 5.0 - assert metrics["avg_score"] == 0.85 - assert metrics["avg_top_score"] == 0.92 - - -def test_log_agent_query(analytics_store): - """Test logging agent query events.""" - analytics_store.log_agent_query( - tenant_id="test_tenant", - message_preview="What is the company policy?", - intent="rag", - tools_used=["rag", "llm"], - total_tokens=1000, - total_latency_ms=250, - success=True, - user_id="user123" - ) - - activity = analytics_store.get_activity_summary("test_tenant") - assert activity["total_queries"] == 1 - assert activity["active_users"] == 1 - - -def test_tool_usage_stats_filtered_by_time(analytics_store): - """Test that tool usage stats can be filtered by timestamp.""" - # Log an old event (1 day ago) - old_timestamp = int(time.time()) - 86400 - # Note: We can't directly set timestamp in current implementation, - # but we can test the filtering works - - analytics_store.log_tool_usage( - tenant_id="test_tenant", - tool_name="web", - latency_ms=100 - ) - - # Get stats without time filter - all_stats = analytics_store.get_tool_usage_stats("test_tenant") - assert "web" in all_stats - - # Get stats with recent time filter - recent_timestamp = int(time.time()) - 3600 # Last hour - recent_stats = 
analytics_store.get_tool_usage_stats("test_tenant", recent_timestamp) - assert "web" in recent_stats - - -def test_get_activity_summary(analytics_store): - """Test getting activity summary for a tenant.""" - # Log multiple queries - for i in range(3): - analytics_store.log_agent_query( - tenant_id="test_tenant", - message_preview=f"Query {i}", - intent="general", - tools_used=["llm"], - user_id=f"user{i}" - ) - - activity = analytics_store.get_activity_summary("test_tenant") - assert activity["total_queries"] == 3 - assert activity["active_users"] == 3 - - -def test_get_rag_quality_metrics(analytics_store): - """Test getting RAG quality metrics.""" - # Log multiple RAG searches - for i in range(3): - analytics_store.log_rag_search( - tenant_id="test_tenant", - query=f"Query {i}", - hits_count=5 + i, - avg_score=0.8 + i * 0.05, - top_score=0.9 + i * 0.05, - latency_ms=100 + i * 10 - ) - - metrics = analytics_store.get_rag_quality_metrics("test_tenant") - assert metrics["total_searches"] == 3 - assert metrics["avg_hits_per_search"] > 0 - assert metrics["avg_score"] > 0 - - -def test_multiple_tenants_isolation(analytics_store): - """Test that analytics are properly isolated by tenant.""" - # Log events for tenant1 - analytics_store.log_tool_usage( - tenant_id="tenant1", - tool_name="rag", - latency_ms=100 - ) - - # Log events for tenant2 - analytics_store.log_tool_usage( - tenant_id="tenant2", - tool_name="web", - latency_ms=200 - ) - - # Check tenant1 stats - tenant1_stats = analytics_store.get_tool_usage_stats("tenant1") - assert "rag" in tenant1_stats - assert "web" not in tenant1_stats - - # Check tenant2 stats - tenant2_stats = analytics_store.get_tool_usage_stats("tenant2") - assert "web" in tenant2_stats - assert "rag" not in tenant2_stats - diff --git a/backend/tests/test_api_endpoints.py b/backend/tests/test_api_endpoints.py deleted file mode 100644 index d219f880f19ffe8fbed94218c4f924896a6bb802..0000000000000000000000000000000000000000 --- a/backend/tests/test_api_endpoints.py +++ /dev/null @@ -1,222 +0,0 @@ -""" -Integration tests for new API endpoints -""" - -import sys -from pathlib import Path - -# Add backend to path -backend_dir = Path(__file__).parent.parent -sys.path.insert(0, str(backend_dir)) - -# Add root directory to path for backend.api imports -root_dir = Path(__file__).resolve().parents[2] -sys.path.insert(0, str(root_dir)) - -import pytest -from fastapi.testclient import TestClient -from fastapi import FastAPI - -try: - from backend.api.main import app -except ImportError: - # Fallback if backend.api.main doesn't work - from api.main import app - - -@pytest.fixture -def client(): - """Create a test client.""" - return TestClient(app) - - -def test_analytics_overview_endpoint(client): - """Test /analytics/overview endpoint.""" - response = client.get( - "/analytics/overview", - headers={"x-tenant-id": "test_tenant", "x-user-role": "owner"}, - params={"days": 30} - ) - - assert response.status_code == 200 - data = response.json() - assert "tenant_id" in data - assert "overview" in data - assert "total_queries" in data["overview"] - assert "tool_usage" in data["overview"] - assert "redflag_count" in data["overview"] - - -def test_analytics_tool_usage_endpoint(client): - """Test /analytics/tool-usage endpoint.""" - response = client.get( - "/analytics/tool-usage", - headers={"x-tenant-id": "test_tenant", "x-user-role": "owner"}, - params={"days": 30} - ) - - assert response.status_code == 200 - data = response.json() - assert "tenant_id" in data - assert "tool_usage" in 
data - assert "period_days" in data - - -def test_analytics_rag_quality_endpoint(client): - """Test /analytics/rag-quality endpoint.""" - response = client.get( - "/analytics/rag-quality", - headers={"x-tenant-id": "test_tenant", "x-user-role": "owner"}, - params={"days": 30} - ) - - assert response.status_code == 200 - data = response.json() - assert "tenant_id" in data - assert "rag_quality" in data - - -def test_admin_rules_with_regex(client): - """Test adding admin rule with regex pattern and severity.""" - response = client.post( - "/admin/rules", - headers={"x-tenant-id": "test_tenant", "x-user-role": "owner"}, - json={ - "rule": "Block password queries", - "pattern": ".*password.*", - "severity": "high", - "description": "Blocks password-related queries" - } - ) - - assert response.status_code == 200 - data = response.json() - assert data["severity"] == "high" - assert ".*password.*" in data["pattern"] - - # Get detailed rules - response = client.get( - "/admin/rules", - headers={"x-tenant-id": "test_tenant"}, - params={"detailed": True} - ) - - assert response.status_code == 200 - data = response.json() - assert "rules" in data - assert len(data["rules"]) > 0 - assert data["rules"][0]["severity"] == "high" - - -def test_admin_violations_endpoint(client): - """Test /admin/violations endpoint.""" - response = client.get( - "/admin/violations", - headers={"x-tenant-id": "test_tenant"}, - params={"limit": 50, "days": 30} - ) - - assert response.status_code == 200 - data = response.json() - assert "tenant_id" in data - assert "violations" in data - assert "count" in data - - -def test_admin_tools_logs_endpoint(client): - """Test /admin/tools/logs endpoint.""" - response = client.get( - "/admin/tools/logs", - headers={"x-tenant-id": "test_tenant"}, - params={"tool_name": "rag", "days": 7} - ) - - assert response.status_code == 200 - data = response.json() - assert "tenant_id" in data - assert "tool_usage" in data - - -def test_agent_debug_endpoint(client): - """Test /agent/debug endpoint.""" - # Note: This will fail if LLM/MCP servers are not running - # But we can at least test the endpoint structure - response = client.post( - "/agent/debug", - json={ - "tenant_id": "test_tenant", - "message": "Test message", - "temperature": 0.0 - } - ) - - # Might fail if services not available, but should have proper error handling - assert response.status_code in [200, 500, 503] # Accept various status codes - - -def test_agent_plan_endpoint(client): - """Test /agent/plan endpoint.""" - # Note: This will fail if LLM/MCP servers are not running - response = client.post( - "/agent/plan", - json={ - "tenant_id": "test_tenant", - "message": "What is the company policy?", - "temperature": 0.0 - } - ) - - # Might fail if services not available - assert response.status_code in [200, 500, 503] - - -def test_missing_tenant_id_returns_400(client): - """Test that endpoints return 400 when tenant ID is missing.""" - endpoints = [ - "/analytics/overview", - "/analytics/tool-usage", - "/admin/rules", - "/admin/violations" - ] - - for endpoint in endpoints: - response = client.get(endpoint) - assert response.status_code == 400, f"Endpoint {endpoint} should return 400" - - -def test_admin_tenants_endpoints(client): - """Test tenant management endpoints (placeholders).""" - # List tenants - response = client.get("/admin/tenants") - assert response.status_code == 200 - data = response.json() - assert "tenants" in data - - # Create tenant (placeholder) - response = client.post("/admin/tenants", params={"tenant_id": 
"new_tenant"}) - assert response.status_code == 200 - - # Delete tenant (placeholder) - response = client.delete("/admin/tenants/new_tenant") - assert response.status_code == 200 - - -def test_analytics_requires_admin_role(client): - """Ensure analytics endpoints enforce RBAC.""" - response = client.get( - "/analytics/overview", - headers={"x-tenant-id": "test_tenant", "x-user-role": "viewer"}, - params={"days": 7} - ) - assert response.status_code == 403 - - -def test_admin_rules_requires_admin_role(client): - """Ensure rule uploads enforce RBAC.""" - response = client.post( - "/admin/rules", - headers={"x-tenant-id": "test_tenant", "x-user-role": "viewer"}, - json={"rule": "No passwords"} - ) - assert response.status_code == 403 - diff --git a/backend/tests/test_conversation_memory.py b/backend/tests/test_conversation_memory.py deleted file mode 100644 index a59437349869787edfb794074f3d9d381ee7c2a3..0000000000000000000000000000000000000000 --- a/backend/tests/test_conversation_memory.py +++ /dev/null @@ -1,479 +0,0 @@ -# ============================================================= -# File: backend/tests/test_conversation_memory.py -# ============================================================= -""" -Comprehensive tests for short-term conversation memory with expiration. - -Tests: -1. Memory storage and retrieval -2. Memory injection into tool payloads -3. Session isolation (different session_ids don't share memory) -4. Memory expiration (TTL) -5. Memory bounded size (only last N items) -6. Session clearing (end_session flag) -7. Memory is NOT keyed by tenant_id (same session_id across tenants shares memory) -""" - -import sys -from pathlib import Path -import pytest -import time -from unittest.mock import AsyncMock, MagicMock, patch -import asyncio - -# Add backend directory to Python path -backend_dir = Path(__file__).parent.parent -sys.path.insert(0, str(backend_dir)) - -from mcp_server.common import memory -from mcp_server.common.utils import execute_tool, ToolHandler -from mcp_server.common.tenant import TenantContext - - -# ============================================================= -# FIXTURES -# ============================================================= - -@pytest.fixture(autouse=True) -def clear_memory(): - """Clear memory before and after each test.""" - # Clear all memory before test - memory._MEMORY.clear() - yield - # Clear all memory after test - memory._MEMORY.clear() - - -@pytest.fixture -def mock_tool_handler(): - """Create a mock tool handler that captures the payload.""" - captured_payloads = [] - - async def handler(context: TenantContext, payload: dict) -> dict: - captured_payloads.append(payload) - return {"result": "success", "tool_output": "test_data"} - - handler.captured = captured_payloads - return handler - - -# ============================================================= -# UNIT TESTS: Memory Module -# ============================================================= - -def test_extract_session_id(): - """Test session ID extraction from payload.""" - # Test various key formats - assert memory.extract_session_id({"session_id": "s1"}) == "s1" - assert memory.extract_session_id({"sessionId": "s2"}) == "s2" - assert memory.extract_session_id({"conversation_id": "s3"}) == "s3" - assert memory.extract_session_id({"conversationId": "s4"}) == "s4" - - # Test first match wins - assert memory.extract_session_id({ - "session_id": "s1", - "sessionId": "s2" - }) == "s1" - - # Test missing session ID - assert memory.extract_session_id({"tenant_id": "t1"}) is None - 
assert memory.extract_session_id({}) is None - - # Test empty string - assert memory.extract_session_id({"session_id": ""}) is None - assert memory.extract_session_id({"session_id": " "}) is None - - -def test_add_and_get_entry(): - """Test basic memory storage and retrieval.""" - session_id = "test-session-1" - - # Add entries - memory.add_entry(session_id, "tool1", {"output": "data1"}, max_items=10, ttl_seconds=900) - memory.add_entry(session_id, "tool2", {"output": "data2"}, max_items=10, ttl_seconds=900) - memory.add_entry(session_id, "tool3", {"output": "data3"}, max_items=10, ttl_seconds=900) - - # Retrieve entries - entries = memory.get_recent(session_id, ttl_seconds=900) - - assert len(entries) == 3 - assert entries[0]["tool"] == "tool1" - assert entries[1]["tool"] == "tool2" - assert entries[2]["tool"] == "tool3" - assert entries[0]["output"] == {"output": "data1"} - assert "timestamp" in entries[0] - - -def test_memory_bounded_size(): - """Test that memory only keeps last N items.""" - session_id = "test-session-2" - max_items = 3 - - # Add more items than max - for i in range(5): - memory.add_entry(session_id, f"tool{i}", {"data": i}, max_items=max_items, ttl_seconds=900) - - entries = memory.get_recent(session_id, ttl_seconds=900) - - # Should only have last 3 items - assert len(entries) == 3 - assert entries[0]["tool"] == "tool2" - assert entries[1]["tool"] == "tool3" - assert entries[2]["tool"] == "tool4" - - -def test_memory_expiration(): - """Test that expired entries are automatically removed.""" - session_id = "test-session-3" - short_ttl = 1 # 1 second TTL - - # Add entry - memory.add_entry(session_id, "tool1", {"data": "old"}, max_items=10, ttl_seconds=short_ttl) - - # Should be present immediately - entries = memory.get_recent(session_id, ttl_seconds=short_ttl) - assert len(entries) == 1 - - # Wait for expiration - time.sleep(1.1) - - # Should be expired now - entries = memory.get_recent(session_id, ttl_seconds=short_ttl) - assert len(entries) == 0 - - -def test_session_isolation(): - """Test that different session_ids don't share memory.""" - session1 = "session-1" - session2 = "session-2" - - memory.add_entry(session1, "tool1", {"data": "s1"}, max_items=10, ttl_seconds=900) - memory.add_entry(session2, "tool2", {"data": "s2"}, max_items=10, ttl_seconds=900) - - entries1 = memory.get_recent(session1, ttl_seconds=900) - entries2 = memory.get_recent(session2, ttl_seconds=900) - - assert len(entries1) == 1 - assert len(entries2) == 1 - assert entries1[0]["tool"] == "tool1" - assert entries2[0]["tool"] == "tool2" - - -def test_clear_session(): - """Test that clear_session removes all memory for a session.""" - session_id = "test-session-4" - - memory.add_entry(session_id, "tool1", {"data": "d1"}, max_items=10, ttl_seconds=900) - memory.add_entry(session_id, "tool2", {"data": "d2"}, max_items=10, ttl_seconds=900) - - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 2 - - memory.clear_session(session_id) - - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 0 - - -def test_memory_not_keyed_by_tenant(): - """Test that memory is keyed by session_id, NOT tenant_id.""" - session_id = "shared-session" - tenant1 = "tenant-a" - tenant2 = "tenant-b" - - # Simulate: tenant1 calls tool, then tenant2 calls tool with same session_id - # They should see each other's tool outputs (because memory is session-based, not tenant-based) - - # This is intentional for safety - memory is NOT per-tenant - # In a real scenario, you'd want to ensure session_ids are unique 
per tenant - # But the memory system itself doesn't enforce this - - # Add entry from tenant1 perspective - memory.add_entry(session_id, "tool1", {"tenant": tenant1, "data": "from-tenant1"}, max_items=10, ttl_seconds=900) - - # Add entry from tenant2 perspective (same session_id) - memory.add_entry(session_id, "tool2", {"tenant": tenant2, "data": "from-tenant2"}, max_items=10, ttl_seconds=900) - - # Both should see both entries (because same session_id) - entries = memory.get_recent(session_id, ttl_seconds=900) - assert len(entries) == 2 - assert entries[0]["output"]["tenant"] == tenant1 - assert entries[1]["output"]["tenant"] == tenant2 - - -def test_get_recent_with_limit(): - """Test that get_recent respects the limit parameter.""" - session_id = "test-session-5" - - # Add 5 entries - for i in range(5): - memory.add_entry(session_id, f"tool{i}", {"data": i}, max_items=10, ttl_seconds=900) - - # Get all - all_entries = memory.get_recent(session_id, limit=None, ttl_seconds=900) - assert len(all_entries) == 5 - - # Get last 2 - recent_2 = memory.get_recent(session_id, limit=2, ttl_seconds=900) - assert len(recent_2) == 2 - assert recent_2[0]["tool"] == "tool3" - assert recent_2[1]["tool"] == "tool4" - - -# ============================================================= -# INTEGRATION TESTS: execute_tool with Memory -# ============================================================= - -@pytest.mark.asyncio -async def test_execute_tool_stores_memory(mock_tool_handler): - """Test that execute_tool stores tool output in memory.""" - payload = { - "tenant_id": "test-tenant", - "session_id": "test-session-6", - "query": "test query" - } - - result = await execute_tool("test.tool", payload, mock_tool_handler) - - # Check that result is successful - assert result["status"] == "ok" - - # Check that memory was stored - entries = memory.get_recent("test-session-6", ttl_seconds=900) - assert len(entries) == 1 - assert entries[0]["tool"] == "test.tool" - assert entries[0]["output"] == {"result": "success", "tool_output": "test_data"} - - -@pytest.mark.asyncio -async def test_execute_tool_injects_memory(mock_tool_handler): - """Test that execute_tool injects recent memory into payload.""" - session_id = "test-session-7" - - # First call - no memory yet - payload1 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "first query" - } - - await execute_tool("tool1", payload1, mock_tool_handler) - - # Second call - should have memory from first call - payload2 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "second query" - } - - await execute_tool("tool2", payload2, mock_tool_handler) - - # Check that second call received memory - assert len(mock_tool_handler.captured) == 2 - second_payload = mock_tool_handler.captured[1] - - assert "memory" in second_payload - assert len(second_payload["memory"]) == 1 - assert second_payload["memory"][0]["tool"] == "tool1" - - -@pytest.mark.asyncio -async def test_execute_tool_clears_memory_on_end_session(mock_tool_handler): - """Test that execute_tool clears memory when end_session is True.""" - session_id = "test-session-8" - - # First call - store memory - payload1 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "first query" - } - - await execute_tool("tool1", payload1, mock_tool_handler) - - # Verify memory exists - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 1 - - # Second call with end_session=True - payload2 = { - "tenant_id": "test-tenant", - "session_id": session_id, - 
"end_session": True, - "query": "closing" - } - - await execute_tool("tool2", payload2, mock_tool_handler) - - # Memory should be cleared - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 0 - - # Third call - should have no memory - payload3 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "new query" - } - - await execute_tool("tool3", payload3, mock_tool_handler) - - # Check that third call received no memory - third_payload = mock_tool_handler.captured[2] - assert "memory" in third_payload - assert len(third_payload["memory"]) == 0 - - -@pytest.mark.asyncio -async def test_execute_tool_no_memory_without_session_id(mock_tool_handler): - """Test that execute_tool doesn't store/inject memory if no session_id.""" - payload = { - "tenant_id": "test-tenant", - "query": "test query" - # No session_id - } - - await execute_tool("test.tool", payload, mock_tool_handler) - - # Should not have stored memory - # (We can't easily check this without session_id, but handler shouldn't have memory field) - first_payload = mock_tool_handler.captured[0] - assert "memory" not in first_payload - - -@pytest.mark.asyncio -async def test_execute_tool_multi_step_workflow(mock_tool_handler): - """Test a multi-step workflow where each step sees previous tool outputs.""" - session_id = "test-session-9" - - # Step 1: RAG search - payload1 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "search for X" - } - - await execute_tool("rag.search", payload1, mock_tool_handler) - - # Step 2: Web search (should see RAG results in memory) - payload2 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "search web for Y" - } - - await execute_tool("web.search", payload2, mock_tool_handler) - - # Step 3: LLM synthesis (should see both RAG and Web results) - payload3 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "synthesize results" - } - - await execute_tool("llm.synthesize", payload3, mock_tool_handler) - - # Verify all steps captured memory - assert len(mock_tool_handler.captured) == 3 - - # First call has no memory - assert "memory" not in mock_tool_handler.captured[0] or len(mock_tool_handler.captured[0].get("memory", [])) == 0 - - # Second call has memory from first - assert len(mock_tool_handler.captured[1].get("memory", [])) == 1 - assert mock_tool_handler.captured[1]["memory"][0]["tool"] == "rag.search" - - # Third call has memory from both previous calls - assert len(mock_tool_handler.captured[2].get("memory", [])) == 2 - assert mock_tool_handler.captured[2]["memory"][0]["tool"] == "rag.search" - assert mock_tool_handler.captured[2]["memory"][1]["tool"] == "web.search" - - -@pytest.mark.asyncio -async def test_execute_tool_end_session_variants(mock_tool_handler): - """Test that both end_session and endSession flags work.""" - session_id = "test-session-10" - - # Store some memory - payload1 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "query": "first" - } - await execute_tool("tool1", payload1, mock_tool_handler) - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 1 - - # Test end_session (snake_case) - payload2 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "end_session": True, - "query": "end" - } - await execute_tool("tool2", payload2, mock_tool_handler) - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 0 - - # Store memory again - await execute_tool("tool3", payload1, mock_tool_handler) - assert len(memory.get_recent(session_id, ttl_seconds=900)) 
== 1 - - # Test endSession (camelCase) - payload3 = { - "tenant_id": "test-tenant", - "session_id": session_id, - "endSession": True, - "query": "end" - } - await execute_tool("tool4", payload3, mock_tool_handler) - assert len(memory.get_recent(session_id, ttl_seconds=900)) == 0 - - -# ============================================================= -# EDGE CASES -# ============================================================= - -def test_empty_session_id(): - """Test that empty session_id doesn't cause errors.""" - memory.add_entry("", "tool1", {"data": "test"}, max_items=10, ttl_seconds=900) - # Should not store anything - assert len(memory.get_recent("", ttl_seconds=900)) == 0 - - -def test_none_session_id(): - """Test that None session_id doesn't cause errors.""" - # This shouldn't happen in practice, but test for safety - entries = memory.get_recent(None, ttl_seconds=900) # type: ignore - assert entries == [] - - -@pytest.mark.asyncio -async def test_concurrent_sessions(mock_tool_handler): - """Test that concurrent sessions don't interfere with each other.""" - session1 = "session-concurrent-1" - session2 = "session-concurrent-2" - - # Execute tools in both sessions concurrently - tasks = [ - execute_tool("tool1", { - "tenant_id": "tenant1", - "session_id": session1, - "query": "q1" - }, mock_tool_handler), - execute_tool("tool2", { - "tenant_id": "tenant2", - "session_id": session2, - "query": "q2" - }, mock_tool_handler), - ] - - await asyncio.gather(*tasks) - - # Each session should have its own memory - entries1 = memory.get_recent(session1, ttl_seconds=900) - entries2 = memory.get_recent(session2, ttl_seconds=900) - - assert len(entries1) == 1 - assert len(entries2) == 1 - assert entries1[0]["tool"] == "tool1" - assert entries2[0]["tool"] == "tool2" - - -if __name__ == "__main__": - pytest.main([__file__, "-v"]) - diff --git a/backend/tests/test_enhanced_admin_rules.py b/backend/tests/test_enhanced_admin_rules.py deleted file mode 100644 index 494782435a3511b4226ff103c618dee780b18c63..0000000000000000000000000000000000000000 --- a/backend/tests/test_enhanced_admin_rules.py +++ /dev/null @@ -1,195 +0,0 @@ -""" -Tests for enhanced admin rules with regex and severity support -""" - -import sys -from pathlib import Path - -# Add backend directory to Python path -backend_dir = Path(__file__).parent.parent -sys.path.insert(0, str(backend_dir)) - -import pytest -import tempfile -import os -import re - -from api.storage.rules_store import RulesStore - - -@pytest.fixture -def temp_rules_db(): - """Create a temporary database for testing.""" - with tempfile.NamedTemporaryFile(delete=False, suffix='.db') as f: - db_path = f.name - yield db_path - # Cleanup - if os.path.exists(db_path): - os.unlink(db_path) - - -@pytest.fixture -def rules_store(temp_rules_db): - """Create a RulesStore instance with temporary database.""" - # RulesStore uses a fixed path, so we'll just use the default - # For tests, it will create/use data/admin_rules.db - # Each test should use unique tenant_id to avoid conflicts - store = RulesStore() - yield store - # Cleanup: Delete test data after each test - # Note: In a real scenario, you'd want to clean up specific tenant data - # For now, tests use unique tenant IDs to avoid conflicts - - -def test_add_rule_with_regex_and_severity(rules_store): - """Test adding a rule with regex pattern and severity.""" - tenant_id = "test_tenant_regex_severity" # Unique tenant ID - success = rules_store.add_rule( - tenant_id=tenant_id, - rule="Block password queries", - 
pattern=r".*password.*|.*pwd.*", - severity="high", - description="Blocks any queries containing password or pwd", - enabled=True - ) - - assert success is True - - # Get detailed rules - rules = rules_store.get_rules_detailed(tenant_id) - assert len(rules) == 1 - assert rules[0]["pattern"] == r".*password.*|.*pwd.*" - assert rules[0]["severity"] == "high" - assert rules[0]["description"] == "Blocks any queries containing password or pwd" - assert rules[0]["enabled"] == 1 - - -def test_add_rule_without_pattern_uses_rule_text(rules_store): - """Test that if pattern is not provided, rule text is used as pattern.""" - tenant_id = "test_tenant_no_pattern" # Unique tenant ID - rules_store.add_rule( - tenant_id=tenant_id, - rule="Block sensitive data", - severity="medium" - ) - - rules = rules_store.get_rules_detailed(tenant_id) - assert len(rules) == 1 - assert rules[0]["pattern"] == "Block sensitive data" - assert rules[0]["severity"] == "medium" - - -def test_get_rules_backward_compatibility(rules_store): - """Test that get_rules() still returns simple list for backward compatibility.""" - tenant_id = "test_tenant_backward_compat" # Unique tenant ID - rules_store.add_rule( - tenant_id=tenant_id, - rule="Rule 1", - severity="low" - ) - rules_store.add_rule( - tenant_id=tenant_id, - rule="Rule 2", - severity="high" - ) - - rules = rules_store.get_rules(tenant_id) - assert isinstance(rules, list) - assert len(rules) == 2 - assert "Rule 1" in rules - assert "Rule 2" in rules - - -def test_regex_pattern_matching(rules_store): - """Test that regex patterns work correctly.""" - tenant_id = "test_tenant_regex_match" # Unique tenant ID - rules_store.add_rule( - tenant_id=tenant_id, - rule="Email pattern", - pattern=r".*@.*\..*", - severity="medium" - ) - - rules = rules_store.get_rules_detailed(tenant_id) - assert len(rules) == 1 - pattern = rules[0]["pattern"] - - # Test regex matching - test_cases = [ - ("user@example.com", True), - ("contact me at test@domain.org", True), - ("no email here", False), - ("just text", False) - ] - - regex = re.compile(pattern, re.IGNORECASE) - for text, should_match in test_cases: - assert (regex.search(text) is not None) == should_match, f"Failed for: {text}" - - -def test_severity_levels(rules_store): - """Test different severity levels.""" - tenant_id = "test_tenant_severity" # Unique tenant ID - severities = ["low", "medium", "high", "critical"] - - for i, severity in enumerate(severities): - rules_store.add_rule( - tenant_id=tenant_id, - rule=f"Rule {severity}", - severity=severity - ) - - rules = rules_store.get_rules_detailed(tenant_id) - assert len(rules) == len(severities) - - for rule in rules: - assert rule["severity"] in severities - - -def test_disabled_rules_not_returned(rules_store): - """Test that disabled rules are not returned by get_rules().""" - tenant_id = "test_tenant_disabled" # Unique tenant ID - rules_store.add_rule( - tenant_id=tenant_id, - rule="Enabled rule", - enabled=True - ) - rules_store.add_rule( - tenant_id=tenant_id, - rule="Disabled rule", - enabled=False - ) - - rules = rules_store.get_rules(tenant_id) - assert len(rules) == 1 - assert "Enabled rule" in rules - assert "Disabled rule" not in rules - - # But disabled rules should still exist in detailed view (if we add a method for that) - # For now, we rely on enabled column filtering - - -def test_multiple_tenants_isolation(rules_store): - """Test that rules are properly isolated by tenant.""" - rules_store.add_rule( - tenant_id="tenant1", - rule="Tenant 1 rule", - severity="low" - 
) - rules_store.add_rule( - tenant_id="tenant2", - rule="Tenant 2 rule", - severity="high" - ) - - tenant1_rules = rules_store.get_rules("tenant1") - tenant2_rules = rules_store.get_rules("tenant2") - - assert len(tenant1_rules) == 1 - assert "Tenant 1 rule" in tenant1_rules - assert "Tenant 2 rule" not in tenant1_rules - - assert len(tenant2_rules) == 1 - assert "Tenant 2 rule" in tenant2_rules - assert "Tenant 1 rule" not in tenant2_rules - diff --git a/backend/tests/test_intent.py b/backend/tests/test_intent.py deleted file mode 100644 index f0b6bbfa30599138a064a412be2ba7c0743b05cf..0000000000000000000000000000000000000000 --- a/backend/tests/test_intent.py +++ /dev/null @@ -1,118 +0,0 @@ -# ============================================================= -# File: tests/test_intent.py -# ============================================================= - -import sys -from pathlib import Path - -# Add backend directory to Python path -backend_dir = Path(__file__).parent.parent -sys.path.insert(0, str(backend_dir)) - -try: - import pytest - HAS_PYTEST = True -except ImportError: - HAS_PYTEST = False - # Create a mock pytest decorator if pytest is not available - class MockMark: - def asyncio(self, func): - return func - class MockPytest: - mark = MockMark() - pytest = MockPytest() - -import asyncio -from api.services.intent_classifier import IntentClassifier -from api.services.llm_client import LLMClient -from api.services.redflag_detector import RedFlagDetector -from api.services.tool_selector import ToolSelector -from api.models.redflag import RedFlagMatch - - -@pytest.mark.asyncio -async def test_intent_rag_keywords(): - classifier = IntentClassifier() - intent = await classifier.classify("Please check the HR policy document") - assert intent == "rag" - -@pytest.mark.asyncio -async def test_intent_web_keywords(): - classifier = IntentClassifier() - intent = await classifier.classify("latest news about Tesla stock") - assert intent == "web" - -@pytest.mark.asyncio -async def test_intent_admin_keywords(): - classifier = IntentClassifier() - intent = await classifier.classify("export all user data") - assert intent == "admin" - -@pytest.mark.asyncio -async def test_intent_general(): - classifier = IntentClassifier() - intent = await classifier.classify("explain how gravity works") - assert intent == "general" - - -# ---- LLM fallback test ---- - -class FakeLLM: - async def simple_call(self, prompt: str, temperature: float = 0.0): - return "web" - -@pytest.mark.asyncio -async def test_intent_llm_fallback(): - classifier = IntentClassifier(llm_client=FakeLLM()) - intent = await classifier.classify("What's going on in the world?") - assert intent == "web" - - -# ---- Manual run function (for non-pytest execution) ---- - -async def run_manual_tests(): - llm = LLMClient() - clf = IntentClassifier(llm_client=llm) - - # Initialize detector with empty creds (will return empty results if no Supabase) - import os - detector = RedFlagDetector( - supabase_url=os.getenv("SUPABASE_URL") or "", - supabase_key=os.getenv("SUPABASE_SERVICE_KEY") or "" - ) - selector = ToolSelector(llm_client=llm) - - print("Intent Classification:") - print("RAG:", await clf.classify("summarize internal policy")) - print("WEB:", await clf.classify("latest news about ai")) - print("ADMIN:", await clf.classify("delete all data")) - print("GENERAL:", await clf.classify("hi how are you")) - - print("\nRedFlag checks (will be empty if no Supabase configured):") - try: - print(await detector.check("tenant123", "My email is 
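The intent tests above suggest keyword routing with an LLM fallback for ambiguous messages. A minimal sketch; the keyword lists and prompt wording are illustrative, not the real classifier's:

```python
ADMIN_HINTS = ("export", "delete", "user data")
RAG_HINTS = ("policy", "document", "internal", "summarize")
WEB_HINTS = ("latest", "news", "stock")

class KeywordIntentClassifier:
    def __init__(self, llm_client=None):
        self.llm_client = llm_client

    async def classify(self, message: str) -> str:
        text = message.lower()
        if any(k in text for k in ADMIN_HINTS):
            return "admin"
        if any(k in text for k in RAG_HINTS):
            return "rag"
        if any(k in text for k in WEB_HINTS):
            return "web"
        if self.llm_client:  # ambiguous: ask the model for one of the four labels
            label = await self.llm_client.simple_call(
                f"Classify into rag/web/admin/general: {message}"
            )
            return label.strip().lower()
        return "general"
```

Checking admin keywords first is what keeps "export all user data" from being routed anywhere else, and the LLM is only consulted when every keyword list misses, exactly the case the `FakeLLM` test exercises.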
test@gmail.com")) - print(await detector.check("tenant123", "delete all data now")) - print(await detector.check("tenant123", "confidential salary report")) - print(await detector.check("tenant123", "hello world")) - except Exception as e: - print(f"RedFlag check failed (expected if Supabase not configured): {e}") - - print("\nTool selection:") - print(await selector.select("admin", "delete all data", {})) - print(await selector.select("rag", "summarize policy", {})) - print(await selector.select("web", "latest news", {})) - print(await selector.select("general", "hello", {})) - - print("\nLLM Test:") - try: - if llm.url and llm.model: - result = await llm.simple_call("Hello Llama!") - print(f"LLM Result: {result}") - else: - print("LLM not configured (OLLAMA_URL/OLLAMA_MODEL not set) - skipping LLM test") - except Exception as e: - print(f"LLM call failed (expected if Ollama not running or not configured): {e}") - - -if __name__ == "__main__": - asyncio.run(run_manual_tests()) diff --git a/backend/tests/test_metadata_extraction.py b/backend/tests/test_metadata_extraction.py deleted file mode 100644 index 5913e773c3a51ff768d30248e336c6e76437bd84..0000000000000000000000000000000000000000 --- a/backend/tests/test_metadata_extraction.py +++ /dev/null @@ -1,461 +0,0 @@ -""" -Comprehensive tests for AI-Generated Knowledge Base Metadata Extraction - -Tests all metadata extraction features: -- Title extraction (from filename, content, URL) -- Summary generation (LLM and fallback) -- Tags extraction (LLM and fallback) -- Topics extraction (LLM and fallback) -- Date detection -- Quality score calculation -- Database storage -- Integration with ingestion pipeline -""" - -import pytest -import asyncio -from unittest.mock import Mock, patch, AsyncMock -from backend.api.services.metadata_extractor import MetadataExtractor -from backend.mcp_server.common.database import insert_document_chunks, get_connection -import json - - -class TestMetadataExtractor: - """Test the MetadataExtractor service""" - - @pytest.fixture - def extractor(self): - """Create a MetadataExtractor instance""" - return MetadataExtractor() - - @pytest.fixture - def sample_content(self): - """Sample document content for testing""" - return """ - # API Documentation Guide - - This comprehensive guide covers REST API endpoints, authentication, and best practices. - Published on 2024-01-15, this document provides detailed information about our API. - - ## Authentication - All API requests require authentication using API keys or OAuth tokens. - - ## Endpoints - - GET /api/v1/users - List all users - - POST /api/v1/users - Create a new user - - GET /api/v1/users/{id} - Get user by ID - - ## Examples - Here are some example requests and responses. - - ## Troubleshooting - Common issues and their solutions. 
- """ - - def test_extract_title_from_filename(self, extractor): - """Test title extraction from filename""" - content = "Some content here" - filename = "API_Documentation_Guide.pdf" - - title = extractor._extract_title(content, filename=filename, url=None) - assert title == "Api Documentation Guide" - assert "API" in title or "Api" in title - - def test_extract_title_from_content(self, extractor, sample_content): - """Test title extraction from content (first line or markdown)""" - title = extractor._extract_title(sample_content, filename=None, url=None) - # Should extract from markdown header or first meaningful line - assert len(title) > 0 - assert len(title) < 200 - - def test_extract_title_from_url(self, extractor): - """Test title extraction from URL""" - content = "Some content" - url = "https://example.com/api/documentation-guide" - - title = extractor._extract_title(content, filename=None, url=url) - # URL extraction should return something (may be from URL path or fallback) - assert len(title) > 0 - assert isinstance(title, str) - - def test_extract_title_fallback(self, extractor): - """Test title fallback to first 50 chars""" - content = "This is a very long document that doesn't have a clear title structure and continues with more text" - title = extractor._extract_title(content, filename=None, url=None) - assert len(title) > 0 - # Fallback should return first line or first 50 chars (may not have ...) - assert isinstance(title, str) - # Title should be reasonable length (not the entire content if content is long) - # If content is short, title might equal content, which is fine - if len(content) > 50: - assert len(title) <= len(content) - - def test_detect_date_formats(self, extractor): - """Test date detection in various formats""" - # YYYY-MM-DD format - content1 = "Published on 2024-01-15" - date1 = extractor._detect_date(content1) - assert date1 == "2024-01-15" - - # MM/DD/YYYY format - content2 = "Created on 01/15/2024" - date2 = extractor._detect_date(content2) - assert date2 is not None - - # Month name format - content3 = "Last updated January 15, 2024" - date3 = extractor._detect_date(content3) - assert date3 is not None - - def test_detect_date_none(self, extractor): - """Test date detection when no date is present""" - content = "This document has no date information" - date = extractor._detect_date(content) - assert date is None - - def test_generate_basic_summary(self, extractor, sample_content): - """Test basic summary generation""" - summary = extractor._generate_basic_summary(sample_content) - assert len(summary) > 0 - assert len(summary) < len(sample_content) - assert summary.endswith('.') - - def test_extract_basic_tags(self, extractor, sample_content): - """Test basic tag extraction without LLM""" - tags = extractor._extract_basic_tags(sample_content) - assert isinstance(tags, list) - assert len(tags) > 0 - assert len(tags) <= 8 - # Should find "api" in tags - assert any("api" in tag.lower() for tag in tags) - - def test_extract_basic_topics(self, extractor, sample_content): - """Test basic topic extraction without LLM""" - topics = extractor._extract_basic_topics(sample_content) - assert isinstance(topics, list) - assert len(topics) > 0 - assert len(topics) <= 5 - # Should find topics from headers - assert any("API" in topic or "api" in topic.lower() for topic in topics) - - def test_calculate_quality_score(self, extractor): - """Test quality score calculation""" - # Good quality content - good_content = "This is a well-structured document. 
" * 50 - good_content += "It has multiple paragraphs. " * 10 - score1 = extractor._calculate_quality_score(good_content, 500, "Good summary") - assert 0.0 <= score1 <= 1.0 - assert score1 > 0.5 # Should be decent quality - - # Poor quality content - poor_content = "x" * 100 - score2 = extractor._calculate_quality_score(poor_content, 10, "") - assert 0.0 <= score2 <= 1.0 - assert score2 < score1 # Should be lower quality - - def test_extract_fallback(self, extractor, sample_content): - """Test fallback metadata extraction""" - result = extractor._extract_fallback(sample_content, "Test Title") - assert "summary" in result - assert "tags" in result - assert "topics" in result - assert isinstance(result["tags"], list) - assert isinstance(result["topics"], list) - assert len(result["summary"]) > 0 - - @pytest.mark.asyncio - async def test_extract_with_llm_success(self, extractor, sample_content): - """Test LLM-based metadata extraction (mocked)""" - # Mock LLM response - mock_response = json.dumps({ - "summary": "This document provides comprehensive API documentation.", - "tags": ["api", "documentation", "rest", "endpoints"], - "topics": ["API", "REST", "Endpoints"], - "domain": "Software Development" - }) - - with patch.object(extractor.llm, 'simple_call', new_callable=AsyncMock) as mock_llm: - mock_llm.return_value = mock_response - - result = await extractor._extract_with_llm(sample_content, "API Documentation") - - assert "summary" in result - assert "tags" in result - assert "topics" in result - assert len(result["tags"]) > 0 - assert len(result["topics"]) > 0 - assert "api" in [tag.lower() for tag in result["tags"]] - - @pytest.mark.asyncio - async def test_extract_with_llm_timeout(self, extractor, sample_content): - """Test LLM extraction timeout handling""" - with patch.object(extractor.llm, 'simple_call', new_callable=AsyncMock) as mock_llm: - mock_llm.side_effect = asyncio.TimeoutError() - - with pytest.raises(Exception) as exc_info: - await extractor._extract_with_llm(sample_content, "Test") - assert "timeout" in str(exc_info.value).lower() or isinstance(exc_info.value, asyncio.TimeoutError) - - @pytest.mark.asyncio - async def test_extract_metadata_full(self, extractor, sample_content): - """Test full metadata extraction (with LLM fallback)""" - # Mock LLM to fail (will use fallback) - with patch.object(extractor.llm, 'simple_call', new_callable=AsyncMock) as mock_llm: - mock_llm.side_effect = Exception("LLM unavailable") - - metadata = await extractor.extract_metadata( - content=sample_content, - filename="api_docs.md", - url=None, - source_type="markdown" - ) - - # Verify all required fields - assert "title" in metadata - assert "summary" in metadata - assert "tags" in metadata - assert "topics" in metadata - assert "detected_date" in metadata - assert "quality_score" in metadata - assert "word_count" in metadata - assert "char_count" in metadata - assert "source_type" in metadata - assert "extraction_method" in metadata - - # Verify data types and ranges - assert isinstance(metadata["title"], str) - assert isinstance(metadata["summary"], str) - assert isinstance(metadata["tags"], list) - assert isinstance(metadata["topics"], list) - assert isinstance(metadata["quality_score"], float) - assert 0.0 <= metadata["quality_score"] <= 1.0 - assert metadata["word_count"] > 0 - assert metadata["extraction_method"] in ["llm", "fallback"] - - @pytest.mark.asyncio - async def test_extract_metadata_with_llm(self, extractor, sample_content): - """Test metadata extraction with successful LLM 
call""" - mock_response = json.dumps({ - "summary": "Comprehensive API documentation guide.", - "tags": ["api", "documentation", "rest"], - "topics": ["API", "REST", "Documentation"], - "domain": "API" - }) - - with patch.object(extractor.llm, 'simple_call', new_callable=AsyncMock) as mock_llm: - mock_llm.return_value = mock_response - - metadata = await extractor.extract_metadata( - content=sample_content, - filename="api_docs.md" - ) - - assert metadata["extraction_method"] == "llm" - assert len(metadata["summary"]) > 0 - assert len(metadata["tags"]) > 0 - assert len(metadata["topics"]) > 0 - - -class TestDatabaseMetadataStorage: - """Test database storage of metadata""" - - @pytest.fixture - def sample_metadata(self): - """Sample metadata for testing""" - return { - "title": "Test Document", - "summary": "This is a test document for metadata extraction.", - "tags": ["test", "documentation"], - "topics": ["Testing", "Metadata"], - "detected_date": "2024-01-15", - "quality_score": 0.85, - "word_count": 100, - "char_count": 500, - "source_type": "txt", - "extraction_method": "llm" - } - - def test_insert_with_metadata(self, sample_metadata): - """Test inserting document chunk with metadata""" - # This test requires a real database connection - # Skip if database is not available - try: - conn = get_connection() - conn.close() - except Exception: - pytest.skip("Database not available for testing") - - tenant_id = "test_tenant_metadata" - text = "This is a test chunk with metadata." - - # Generate a simple embedding (384 dimensions) - embedding = [0.1] * 384 - - # Insert with metadata - insert_document_chunks( - tenant_id=tenant_id, - text=text, - embedding=embedding, - metadata=sample_metadata, - doc_id="test_doc_123" - ) - - # Verify insertion by querying - conn = get_connection() - cur = conn.cursor() - cur.execute(""" - SELECT metadata, doc_id - FROM documents - WHERE tenant_id = %s - AND chunk_text = %s - LIMIT 1; - """, (tenant_id, text)) - - result = cur.fetchone() - assert result is not None - - stored_metadata = result[0] - stored_doc_id = result[1] - - # Verify metadata was stored correctly - assert stored_metadata is not None - assert stored_metadata["title"] == sample_metadata["title"] - assert stored_metadata["summary"] == sample_metadata["summary"] - assert stored_metadata["quality_score"] == sample_metadata["quality_score"] - - # Verify doc_id was stored - assert stored_doc_id == "test_doc_123" - - # Cleanup - cur.execute("DELETE FROM documents WHERE tenant_id = %s", (tenant_id,)) - conn.commit() - cur.close() - conn.close() - - -class TestIngestionIntegration: - """Test metadata extraction integration with ingestion pipeline""" - - @pytest.mark.asyncio - async def test_metadata_extraction_in_ingestion(self): - """Test that metadata is extracted during document ingestion""" - from backend.api.services.document_ingestion import prepare_ingestion_payload, process_ingestion - from backend.api.mcp_clients.rag_client import RAGClient - from unittest.mock import AsyncMock, patch, MagicMock - - # Mock RAG client - mock_rag_client = Mock(spec=RAGClient) - mock_rag_client.ingest_with_metadata = AsyncMock(return_value={ - "chunks_stored": 3, - "status": "ok" - }) - - # Prepare payload - payload = await prepare_ingestion_payload( - tenant_id="test_tenant", - content="This is a test document about API documentation. 
Published on 2024-01-15.", - source_type="txt", - filename="api_docs.txt" - ) - - # Process with metadata extraction - patch the import path used in the function - with patch('backend.api.services.metadata_extractor.MetadataExtractor') as mock_extractor_class: - mock_extractor = MagicMock() - mock_extractor.extract_metadata = AsyncMock(return_value={ - "title": "API Documentation", - "summary": "Test document about APIs", - "tags": ["api", "documentation"], - "topics": ["API"], - "detected_date": "2024-01-15", - "quality_score": 0.8, - "word_count": 10, - "char_count": 50, - "source_type": "txt", - "extraction_method": "llm" - }) - mock_extractor_class.return_value = mock_extractor - - result = await process_ingestion(payload, mock_rag_client, extract_metadata=True) - - # Verify metadata was extracted - assert "extracted_metadata" in result - assert result["extracted_metadata"]["title"] == "API Documentation" - assert result["extracted_metadata"]["quality_score"] == 0.8 - - # Verify RAG client was called with metadata - mock_rag_client.ingest_with_metadata.assert_called_once() - call_args = mock_rag_client.ingest_with_metadata.call_args - # Check that metadata was passed (either as kwarg or in the merged metadata) - assert call_args is not None - - -class TestMetadataEdgeCases: - """Test edge cases and error handling""" - - @pytest.mark.asyncio - async def test_empty_content(self): - """Test metadata extraction with empty content""" - extractor = MetadataExtractor() - - metadata = await extractor.extract_metadata( - content="", - filename="empty.txt" - ) - - # Should still return metadata structure - assert "title" in metadata - assert "summary" in metadata - assert metadata["word_count"] == 0 - - @pytest.mark.asyncio - async def test_very_long_content(self): - """Test metadata extraction with very long content""" - extractor = MetadataExtractor() - long_content = "Word " * 10000 # 10,000 words - - metadata = await extractor.extract_metadata( - content=long_content, - filename="long_doc.txt" - ) - - assert metadata["word_count"] == 10000 - assert len(metadata["summary"]) > 0 - assert metadata["quality_score"] >= 0.0 - - @pytest.mark.asyncio - async def test_special_characters(self): - """Test metadata extraction with special characters""" - extractor = MetadataExtractor() - special_content = "Document with émojis 🚀 and spéciál chàracters!" 
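For orientation, the ingestion test above suggests roughly this control flow; the payload keys and the exact `ingest_with_metadata` call shape are assumptions beyond what the test asserts:

```python
async def process_ingestion_sketch(payload: dict, rag_client, extract_metadata: bool = True) -> dict:
    result: dict = {}
    if extract_metadata:
        from backend.api.services.metadata_extractor import MetadataExtractor
        extractor = MetadataExtractor()
        meta = await extractor.extract_metadata(
            content=payload["content"],
            filename=payload.get("filename"),
            url=payload.get("url"),
            source_type=payload.get("source_type"),
        )
        result["extracted_metadata"] = meta
        # Merge extracted fields into whatever metadata the caller supplied.
        payload["metadata"] = {**payload.get("metadata", {}), **meta}
    result.update(await rag_client.ingest_with_metadata(payload))
    return result
```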
-
-        metadata = await extractor.extract_metadata(
-            content=special_content,
-            filename="special.txt"
-        )
-
-        assert "title" in metadata
-        assert len(metadata["title"]) > 0
-
-    def test_quality_score_edge_cases(self):
-        """Test quality score with edge cases"""
-        extractor = MetadataExtractor()
-
-        # Very short content
-        short = "Hi"
-        score1 = extractor._calculate_quality_score(short, 1, "")
-        assert 0.0 <= score1 <= 1.0
-
-        # Very long content
-        long = "Word " * 20000
-        score2 = extractor._calculate_quality_score(long, 20000, "Summary")
-        assert 0.0 <= score2 <= 1.0
-
-        # No summary
-        no_summary = "Content " * 100
-        score3 = extractor._calculate_quality_score(no_summary, 100, "")
-        assert 0.0 <= score3 <= 1.0
-
-if __name__ == "__main__":
-    pytest.main([__file__, "-v", "--tb=short"])
-
diff --git a/backend/tests/test_retry_system.py b/backend/tests/test_retry_system.py
deleted file mode 100644
index cc8b5ec1f4eea05bd4ac965d5d53b15f65aa026d..0000000000000000000000000000000000000000
--- a/backend/tests/test_retry_system.py
+++ /dev/null
@@ -1,651 +0,0 @@
-# =============================================================
-# File: backend/tests/test_retry_system.py
-# =============================================================
-"""
-Comprehensive tests for autonomous retry and self-correction system.
-
-Tests:
-1. RAG retry with low scores (threshold adjustment + query expansion)
-2. Web search retry with empty results (query rewriting)
-3. Safe tool call retry mechanism
-4. Rule safe message rewriting
-5. Integration tests with reasoning traces
-6. Analytics logging verification
-"""
-
-import sys
-from pathlib import Path
-from unittest.mock import AsyncMock, MagicMock, patch
-import asyncio
-
-# Add backend directory to Python path
-backend_dir = Path(__file__).parent.parent
-sys.path.insert(0, str(backend_dir))
-
-try:
-    import pytest
-    HAS_PYTEST = True
-except ImportError:
-    HAS_PYTEST = False
-    class MockMark:
-        def asyncio(self, func):
-            return func
-    class MockPytest:
-        mark = MockMark()
-        def fixture(self, func):
-            return func
-    pytest = MockPytest()
-
-from api.services.agent_orchestrator import AgentOrchestrator
-from api.models.agent import AgentRequest
-from api.models.redflag import RedFlagMatch
-
-
-# =============================================================
-# FIXTURES
-# =============================================================
-
-@pytest.fixture
-def mock_orchestrator():
-    """Create orchestrator with mocked dependencies."""
-    orch = AgentOrchestrator(
-        rag_mcp_url="http://fake:8001",
-        web_mcp_url="http://fake:8002",
-        admin_mcp_url="http://fake:8003",
-        llm_backend="ollama"
-    )
-
-    # Mock MCP client
-    orch.mcp = MagicMock()
-    orch.analytics = MagicMock()
-    orch.llm = MagicMock()
-    orch.redflag = MagicMock()
-
-    return orch
-
-
-# =============================================================
-# RAG RETRY TESTS
-# =============================================================
-
-@pytest.mark.asyncio
-async def test_rag_with_repair_high_score_no_retry(mock_orchestrator):
-    """Test RAG repair doesn't retry when scores are good."""
-
-    # Mock high score result
-    mock_orchestrator.mcp.call_rag = AsyncMock(return_value={
-        "results": [{"text": "relevant content", "score": 0.85}]
-    })
-
-    reasoning_trace = []
-    result = await mock_orchestrator.rag_with_repair(
-        query="test query",
-        tenant_id="tenant1",
-        reasoning_trace=reasoning_trace,
-        user_id="user1"
-    )
-
-    # Should only call once (no retry needed)
-    assert mock_orchestrator.mcp.call_rag.call_count == 1
-    assert
result["results"][0]["score"] == 0.85 - - -@pytest.mark.asyncio -async def test_rag_with_repair_low_score_retry_threshold(mock_orchestrator): - """Test RAG repair retries with lower threshold when score < 0.30.""" - - # Mock first call - low score, second call - better score - mock_orchestrator.mcp.call_rag = AsyncMock(side_effect=[ - {"results": [{"text": "low relevance", "score": 0.25}]}, - {"results": [{"text": "better match", "score": 0.45}]} - ]) - - reasoning_trace = [] - result = await mock_orchestrator.rag_with_repair( - query="test query", - tenant_id="tenant1", - original_threshold=0.3, - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should have retried with lower threshold (0.15) - assert mock_orchestrator.mcp.call_rag.call_count == 2 - - # Check second call used threshold 0.15 - second_call_kwargs = mock_orchestrator.mcp.call_rag.call_args_list[1].kwargs - assert second_call_kwargs.get("threshold") == 0.15 - - # Verify reasoning trace has retry step - retry_steps = [s for s in reasoning_trace if "retry" in str(s).lower()] - assert len(retry_steps) > 0 - - -@pytest.mark.asyncio -async def test_rag_with_repair_expand_query(mock_orchestrator): - """Test RAG repair expands query when score still low after threshold retry.""" - - # Mock: low score -> still low after threshold retry -> better after expansion - mock_orchestrator.mcp.call_rag = AsyncMock(side_effect=[ - {"results": [{"text": "low", "score": 0.12}]}, # Initial - very low - {"results": [{"text": "still low", "score": 0.10}]}, # After threshold retry - still low - {"results": [{"text": "better", "score": 0.35}]} # After query expansion - better - ]) - - reasoning_trace = [] - result = await mock_orchestrator.rag_with_repair( - query="test", - tenant_id="tenant1", - original_threshold=0.3, - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should have retried 3 times (initial + threshold + expanded query) - assert mock_orchestrator.mcp.call_rag.call_count == 3 - - # Check reasoning trace has expanded query step - expand_steps = [s for s in reasoning_trace if "expanded" in str(s).lower() or "expand" in str(s).lower()] - assert len(expand_steps) > 0 - - # Verify analytics was called for retries - assert mock_orchestrator.analytics.log_tool_usage.call_count > 1 - - -@pytest.mark.asyncio -async def test_rag_with_repair_no_results(mock_orchestrator): - """Test RAG repair handles empty results gracefully.""" - - mock_orchestrator.mcp.call_rag = AsyncMock(return_value={ - "results": [] - }) - - reasoning_trace = [] - result = await mock_orchestrator.rag_with_repair( - query="test query", - tenant_id="tenant1", - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should handle gracefully (may retry or return empty) - assert isinstance(result, dict) - assert "results" in result - - -# ============================================================= -# WEB SEARCH RETRY TESTS -# ============================================================= - -@pytest.mark.asyncio -async def test_web_with_repair_has_results_no_retry(mock_orchestrator): - """Test web repair doesn't retry when results are found.""" - - mock_orchestrator.mcp.call_web = AsyncMock(return_value={ - "results": [ - {"title": "Result 1", "snippet": "Content", "url": "http://example.com"} - ] - }) - - reasoning_trace = [] - result = await mock_orchestrator.web_with_repair( - query="normal query", - tenant_id="tenant1", - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should only call once (no retry needed) - assert 
mock_orchestrator.mcp.call_web.call_count == 1 - assert len(result["results"]) > 0 - - -@pytest.mark.asyncio -async def test_web_with_repair_empty_results_retry(mock_orchestrator): - """Test web repair retries with rewritten query when results are empty.""" - - # Mock: empty -> empty -> success - mock_orchestrator.mcp.call_web = AsyncMock(side_effect=[ - {"results": []}, # Initial - empty - {"results": []}, # First retry - still empty - {"results": [{"title": "Found", "snippet": "Result", "url": "http://example.com"}]} # Second retry - success - ]) - - reasoning_trace = [] - result = await mock_orchestrator.web_with_repair( - query="obscure query xyz", - tenant_id="tenant1", - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should have retried (up to 2 rewrites) - assert mock_orchestrator.mcp.call_web.call_count >= 2 - - # Verify reasoning trace has retry steps - retry_steps = [s for s in reasoning_trace if "retry" in str(s).lower()] - assert len(retry_steps) > 0 - - # Check that rewritten queries were used - # call_web takes positional args: (tenant_id, query) - calls = mock_orchestrator.mcp.call_web.call_args_list - rewritten_queries = [] - for call in calls: - # Extract query from positional args (args[1] after tenant_id) - if len(call.args) > 1: - rewritten_queries.append(call.args[1]) - - # Should have at least original + retry queries - assert len(rewritten_queries) >= 2 - # Check that at least one rewritten query contains our rewrite patterns - assert any("best explanation" in str(q).lower() or "facts summary" in str(q).lower() - for q in rewritten_queries if q) - - -@pytest.mark.asyncio -async def test_web_with_repair_analytics_logging(mock_orchestrator): - """Test web repair logs retry attempts to analytics.""" - - mock_orchestrator.mcp.call_web = AsyncMock(side_effect=[ - {"results": []}, - {"results": [{"title": "Result", "snippet": "Content"}]} - ]) - - await mock_orchestrator.web_with_repair( - query="test", - tenant_id="tenant1", - user_id="user1" - ) - - # Verify analytics was called - assert mock_orchestrator.analytics.log_tool_usage.called - - -# ============================================================= -# SAFE TOOL CALL TESTS -# ============================================================= - -@pytest.mark.asyncio -async def test_safe_tool_call_success_first_attempt(mock_orchestrator): - """Test safe_tool_call succeeds on first attempt.""" - - successful_tool = AsyncMock(return_value={"success": True, "data": "result"}) - - result = await mock_orchestrator.safe_tool_call( - tool_fn=successful_tool, - params={"param1": "value1"}, - max_retries=2, - tool_name="test_tool", - tenant_id="tenant1", - user_id="user1" - ) - - # Should succeed on first try - assert successful_tool.call_count == 1 - assert result["success"] is True - assert result["data"] == "result" - - -@pytest.mark.asyncio -async def test_safe_tool_call_retry_on_failure(mock_orchestrator): - """Test safe_tool_call retries on failure.""" - - failing_tool = AsyncMock(side_effect=[ - Exception("First failure"), - {"success": True, "data": "recovered"} - ]) - - reasoning_trace = [] - result = await mock_orchestrator.safe_tool_call( - tool_fn=failing_tool, - params={}, - max_retries=2, - tool_name="test_tool", - tenant_id="tenant1", - user_id="user1", - reasoning_trace=reasoning_trace - ) - - # Should have retried - assert failing_tool.call_count == 2 - assert result["success"] is True - - # Verify reasoning trace has retry info - retry_steps = [s for s in reasoning_trace if "retry" in str(s).lower()] 
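The web-repair tests likewise imply a fixed list of query rewrites tried until results appear. A sketch matching the asserted positional call convention `call_web(tenant_id, query)` and the rewrite phrasings the tests look for; analytics logging is again omitted for brevity:

```python
async def web_with_repair(self, query, tenant_id, reasoning_trace=None, user_id=None):
    rewrites = [query, f"{query} best explanation", f"{query} facts summary"]
    result = {"results": []}
    for attempt, q in enumerate(rewrites):
        result = await self.mcp.call_web(tenant_id, q)
        if result.get("results"):
            break  # found something, stop rewriting
        if reasoning_trace is not None and attempt + 1 < len(rewrites):
            reasoning_trace.append(
                f"retry: no results, rewriting query to '{rewrites[attempt + 1]}'"
            )
    return result
```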
- assert len(retry_steps) > 0 - - -@pytest.mark.asyncio -async def test_safe_tool_call_exhausts_retries(mock_orchestrator): - """Test safe_tool_call returns error after all retries exhausted.""" - - failing_tool = AsyncMock(side_effect=Exception("Always fails")) - - reasoning_trace = [] - result = await mock_orchestrator.safe_tool_call( - tool_fn=failing_tool, - params={}, - max_retries=2, - tool_name="test_tool", - tenant_id="tenant1", - user_id="user1", - reasoning_trace=reasoning_trace - ) - - # Should have retried max_retries times - assert failing_tool.call_count == 2 - assert "error" in result - - # Verify analytics logged failures - assert mock_orchestrator.analytics.log_tool_usage.called - - -@pytest.mark.asyncio -async def test_safe_tool_call_fallback_params(mock_orchestrator): - """Test safe_tool_call uses fallback params on retry.""" - - tool_calls = [] - - async def mock_tool_async(**kwargs): - tool_calls.append(kwargs.copy()) - if len(tool_calls) == 1: - raise Exception("First attempt failed") - return {"success": True, "params": kwargs} - - result = await mock_orchestrator.safe_tool_call( - tool_fn=mock_tool_async, - params={"param1": "value1"}, - max_retries=2, - fallback_params={"param1": "fallback_value"}, - tool_name="test_tool", - tenant_id="tenant1" - ) - - # Should have used fallback params on retry - assert len(tool_calls) == 2 - assert tool_calls[0]["param1"] == "value1" # Original params - assert tool_calls[1]["param1"] == "fallback_value" # Fallback params on retry - assert result["success"] is True - - -# ============================================================= -# RULE SAFE MESSAGE TESTS -# ============================================================= - -@pytest.mark.asyncio -async def test_rule_safe_message_no_violations(mock_orchestrator): - """Test rule_safe_message returns original when no violations.""" - - mock_orchestrator.redflag.check = AsyncMock(return_value=[]) - - safe_msg = await mock_orchestrator.rule_safe_message( - user_message="Normal message", - tenant_id="tenant1" - ) - - # Should return original message - assert safe_msg == "Normal message" - assert mock_orchestrator.redflag.check.call_count == 1 - - -@pytest.mark.asyncio -async def test_rule_safe_message_rewrites_violation(mock_orchestrator): - """Test rule_safe_message rewrites violating messages.""" - - # Mock redflag check - first call violates, second (rewritten) passes - violation = RedFlagMatch( - rule_id="1", - pattern="salary", - severity="high", - description="salary access", - matched_text="salary" - ) - - mock_orchestrator.redflag.check = AsyncMock(side_effect=[ - [violation], # Original message violates - [] # Rewritten message is safe - ]) - - mock_orchestrator.llm.simple_call = AsyncMock( - return_value="This is a compliant version of your request about compensation" - ) - - reasoning_trace = [] - safe_msg = await mock_orchestrator.rule_safe_message( - user_message="I want to see salary info", - tenant_id="tenant1", - reasoning_trace=reasoning_trace - ) - - # Should have checked rules twice (original + rewritten) - assert mock_orchestrator.redflag.check.call_count == 2 - - # Should have called LLM to rewrite - assert mock_orchestrator.llm.simple_call.called - - # Should return rewritten message - assert "compliant" in safe_msg.lower() or safe_msg != "I want to see salary info" - - # Verify reasoning trace - rewrite_steps = [s for s in reasoning_trace if "rewrite" in str(s).lower()] - assert len(rewrite_steps) > 0 - - -@pytest.mark.asyncio -async def 
test_rule_safe_message_brief_rule_no_rewrite(mock_orchestrator): - """Test rule_safe_message doesn't rewrite brief response rules.""" - - # Brief response rules are handled separately, so should return original - brief_rule = RedFlagMatch( - rule_id="1", - pattern="greeting", - severity="low", - description="greeting", - matched_text="hi" - ) - - mock_orchestrator.redflag.check = AsyncMock(return_value=[brief_rule]) - - safe_msg = await mock_orchestrator.rule_safe_message( - user_message="Hi there", - tenant_id="tenant1" - ) - - # Should return original (brief rules are handled elsewhere) - assert safe_msg == "Hi there" - - -@pytest.mark.asyncio -async def test_rule_safe_message_llm_failure_fallback(mock_orchestrator): - """Test rule_safe_message falls back to original if LLM rewrite fails.""" - - violation = RedFlagMatch( - rule_id="1", - pattern="blocked", - severity="high", - description="blocked", - matched_text="blocked" - ) - - mock_orchestrator.redflag.check = AsyncMock(return_value=[violation]) - mock_orchestrator.llm.simple_call = AsyncMock(side_effect=Exception("LLM failed")) - - original_msg = "I want blocked content" - safe_msg = await mock_orchestrator.rule_safe_message( - user_message=original_msg, - tenant_id="tenant1" - ) - - # Should return original message if rewrite fails - assert safe_msg == original_msg - - -# ============================================================= -# INTEGRATION TESTS -# ============================================================= - -@pytest.mark.asyncio -async def test_rag_integration_reasoning_trace(mock_orchestrator): - """Test RAG retry steps appear in reasoning trace.""" - - mock_orchestrator.mcp.call_rag = AsyncMock(side_effect=[ - {"results": [{"text": "low", "score": 0.20}]}, - {"results": [{"text": "better", "score": 0.50}]} - ]) - - reasoning_trace = [] - await mock_orchestrator.rag_with_repair( - query="test", - tenant_id="tenant1", - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Check reasoning trace has retry information - trace_str = str(reasoning_trace).lower() - assert "retry" in trace_str or "threshold" in trace_str - - -@pytest.mark.asyncio -async def test_web_integration_reasoning_trace(mock_orchestrator): - """Test web retry steps appear in reasoning trace.""" - - mock_orchestrator.mcp.call_web = AsyncMock(side_effect=[ - {"results": []}, - {"results": [{"title": "Result", "snippet": "Content"}]} - ]) - - reasoning_trace = [] - await mock_orchestrator.web_with_repair( - query="test", - tenant_id="tenant1", - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Check reasoning trace has retry information - trace_str = str(reasoning_trace).lower() - assert "retry" in trace_str or "rewritten" in trace_str - - -@pytest.mark.asyncio -async def test_analytics_logging_on_retries(mock_orchestrator): - """Test that retry attempts are logged to analytics.""" - - mock_orchestrator.mcp.call_rag = AsyncMock(side_effect=[ - {"results": [{"text": "low", "score": 0.25}]}, - {"results": [{"text": "better", "score": 0.45}]} - ]) - - await mock_orchestrator.rag_with_repair( - query="test", - tenant_id="tenant1", - user_id="user1" - ) - - # Verify analytics was called (for initial + retry) - assert mock_orchestrator.analytics.log_tool_usage.call_count > 0 - - # Verify RAG search was logged - assert mock_orchestrator.analytics.log_rag_search.called - - -@pytest.mark.asyncio -async def test_full_agent_flow_with_retry(mock_orchestrator): - """Test full agent flow integrates retry system.""" - - # Setup mocks for a full agent 
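The rule-safety tests describe a check, rewrite-once, re-check loop that falls back to the original text. A sketch; the severity cutoff for triggering a rewrite is an assumption (the brief-rule test only shows that low-severity matches pass through untouched):

```python
async def rule_safe_message(self, user_message, tenant_id, reasoning_trace=None):
    violations = await self.redflag.check(tenant_id, user_message)
    blocking = [v for v in violations if v.severity in ("medium", "high", "critical")]
    if not blocking:
        return user_message  # clean, or a brief-response rule handled elsewhere
    try:
        rewritten = await self.llm.simple_call(
            f"Rewrite this request so it complies with policy: {user_message}"
        )
        if reasoning_trace is not None:
            reasoning_trace.append("rewrite: generated a policy-compliant variant")
        if not await self.redflag.check(tenant_id, rewritten):
            return rewritten  # the rewrite passed a second rules check
    except Exception:
        pass  # LLM unavailable: fall through to the original message
    return user_message
```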
request - mock_orchestrator.intent = MagicMock() - mock_orchestrator.intent.classify = AsyncMock(return_value="rag") - - mock_orchestrator.selector = MagicMock() - from api.models.agent import AgentDecision - mock_orchestrator.selector.select = AsyncMock(return_value=AgentDecision( - action="call_tool", - tool="rag", - tool_input={"query": "test query"}, - reason="test" - )) - - mock_orchestrator.redflag.check = AsyncMock(return_value=[]) - - mock_orchestrator.mcp.call_rag = AsyncMock(side_effect=[ - {"results": [{"text": "low relevance", "score": 0.25}]}, - {"results": [{"text": "better match", "score": 0.50}]} - ]) - - mock_orchestrator.llm.simple_call = AsyncMock(return_value="Final answer") - - # Create request - req = AgentRequest( - tenant_id="tenant1", - user_id="user1", - message="test query" - ) - - # Handle request - response = await mock_orchestrator.handle(req) - - # Verify retry happened (2 RAG calls) - assert mock_orchestrator.mcp.call_rag.call_count == 2 - - # Verify response is generated - assert response.text == "Final answer" - - # Verify reasoning trace contains retry info - trace_str = str(response.reasoning_trace).lower() - # Should have retry or repair related steps - - -# ============================================================= -# EDGE CASES -# ============================================================= - -@pytest.mark.asyncio -async def test_rag_repair_edge_case_exactly_threshold(mock_orchestrator): - """Test RAG repair behavior at threshold boundary.""" - - # Score exactly at threshold - should not retry - mock_orchestrator.mcp.call_rag = AsyncMock(return_value={ - "results": [{"text": "content", "score": 0.30}]} # Exactly at threshold - ) - - reasoning_trace = [] - await mock_orchestrator.rag_with_repair( - query="test", - tenant_id="tenant1", - original_threshold=0.3, - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should not retry (score >= 0.30) - assert mock_orchestrator.mcp.call_rag.call_count == 1 - - -@pytest.mark.asyncio -async def test_web_repair_all_retries_fail(mock_orchestrator): - """Test web repair handles case where all retries return empty.""" - - mock_orchestrator.mcp.call_web = AsyncMock(return_value={"results": []}) - - reasoning_trace = [] - result = await mock_orchestrator.web_with_repair( - query="very obscure query", - tenant_id="tenant1", - reasoning_trace=reasoning_trace, - user_id="user1" - ) - - # Should have attempted retries - assert mock_orchestrator.mcp.call_web.call_count >= 2 - - # Should still return result (even if empty) - assert isinstance(result, dict) - - -if __name__ == "__main__": - # Allow running tests directly - print("Running retry system tests...") - pytest.main([__file__, "-v", "--tb=short"]) - diff --git a/backend/tests/test_tool_metadata_and_routing.py b/backend/tests/test_tool_metadata_and_routing.py deleted file mode 100644 index b5b3ff389cc0970aeffe4d7bf855de615bfc1446..0000000000000000000000000000000000000000 --- a/backend/tests/test_tool_metadata_and_routing.py +++ /dev/null @@ -1,585 +0,0 @@ -""" -Comprehensive tests for: -1. Per-Tool Latency Prediction -2. Context-Aware MCP Routing -3. Tool Output Schemas - -Tests all three new features for intelligent tool selection and output validation. 
-""" - -import pytest -from unittest.mock import Mock, patch, AsyncMock -from backend.api.services.tool_metadata import ( - get_tool_latency_estimate, - estimate_path_latency, - get_fastest_path, - validate_tool_output, - get_tool_schema, - TOOL_LATENCY_METADATA, - TOOL_OUTPUT_SCHEMAS -) -from backend.api.services.tool_selector import ToolSelector -from backend.api.services.agent_orchestrator import AgentOrchestrator - - -class TestLatencyPrediction: - """Test per-tool latency prediction""" - - def test_get_tool_latency_estimate_basic(self): - """Test basic latency estimation without context""" - rag_latency = get_tool_latency_estimate("rag") - web_latency = get_tool_latency_estimate("web") - admin_latency = get_tool_latency_estimate("admin") - llm_latency = get_tool_latency_estimate("llm") - - # Check that latencies are within expected ranges - assert 60 <= rag_latency <= 120 - assert 400 <= web_latency <= 1800 - assert 5 <= admin_latency <= 20 - assert 500 <= llm_latency <= 5000 - - def test_get_tool_latency_estimate_with_context(self): - """Test latency estimation with context""" - # RAG with long query - rag_long = get_tool_latency_estimate("rag", {"query_length": 200}) - rag_short = get_tool_latency_estimate("rag", {"query_length": 10}) - - assert rag_long >= rag_short # Longer queries should take more time - - # Web with complexity - web_complex = get_tool_latency_estimate("web", {"query_complexity": "high"}) - web_simple = get_tool_latency_estimate("web", {"query_complexity": "low"}) - - assert web_complex >= web_simple # Complex queries should take more time - - def test_estimate_path_latency(self): - """Test total latency estimation for tool sequences""" - # Single tool - single = estimate_path_latency(["admin"]) - assert single > 0 - assert single <= 20 - - # Multiple tools - multi = estimate_path_latency(["rag", "web", "llm"]) - assert multi > 0 - # Should be sum of individual latencies - assert multi >= get_tool_latency_estimate("rag") - assert multi >= get_tool_latency_estimate("web") - assert multi >= get_tool_latency_estimate("llm") - - def test_get_fastest_path(self): - """Test fastest path optimization""" - tools = ["llm", "admin", "rag", "web"] - fastest = get_fastest_path(tools) - - # Should be sorted by latency (fastest first) - assert len(fastest) == len(tools) - assert "admin" in fastest # Fastest tool - assert fastest[0] == "admin" # Should be first - - # Verify order is optimized - latencies = [get_tool_latency_estimate(t) for t in fastest] - assert latencies == sorted(latencies) # Should be in ascending order - - def test_latency_metadata_structure(self): - """Test that latency metadata has correct structure""" - for tool_name, metadata in TOOL_LATENCY_METADATA.items(): - assert metadata.tool_name == tool_name - assert metadata.min_ms > 0 - assert metadata.max_ms >= metadata.min_ms - assert metadata.avg_ms >= metadata.min_ms - assert metadata.avg_ms <= metadata.max_ms - assert len(metadata.description) > 0 - - -class TestToolOutputSchemas: - """Test tool output schema validation""" - - def test_get_tool_schema(self): - """Test schema retrieval""" - rag_schema = get_tool_schema("rag") - web_schema = get_tool_schema("web") - admin_schema = get_tool_schema("admin") - llm_schema = get_tool_schema("llm") - - assert rag_schema is not None - assert web_schema is not None - assert admin_schema is not None - assert llm_schema is not None - - assert rag_schema.tool_name == "rag" - assert web_schema.tool_name == "web" - assert admin_schema.tool_name == "admin" - assert 
llm_schema.tool_name == "llm" - - def test_validate_rag_output_valid(self): - """Test validation of valid RAG output""" - valid_rag = { - "results": [ - { - "text": "Document chunk", - "similarity": 0.85, - "metadata": {"title": "Test"}, - "doc_id": "doc123" - } - ], - "query": "test query", - "tenant_id": "tenant1", - "hits_count": 1, - "avg_score": 0.85, - "top_score": 0.85, - "latency_ms": 90 - } - - is_valid, error = validate_tool_output("rag", valid_rag) - assert is_valid is True - assert error is None - - def test_validate_rag_output_missing_field(self): - """Test validation catches missing required fields""" - invalid_rag = { - "results": [], - # Missing "query" and "tenant_id" - "hits_count": 0 - } - - is_valid, error = validate_tool_output("rag", invalid_rag) - assert is_valid is False - assert "Missing required field" in error - - def test_validate_web_output_valid(self): - """Test validation of valid Web output""" - valid_web = { - "results": [ - { - "title": "Result Title", - "snippet": "Result snippet", - "link": "https://example.com", - "displayLink": "example.com" - } - ], - "query": "search query", - "total_results": 10, - "latency_ms": 800 - } - - is_valid, error = validate_tool_output("web", valid_web) - assert is_valid is True - assert error is None - - def test_validate_admin_output_valid(self): - """Test validation of valid Admin output""" - valid_admin = { - "violations": [ - { - "rule_id": "rule1", - "rule_pattern": ".*password.*", - "severity": "high", - "matched_text": "password", - "confidence": 0.95, - "message_preview": "User asked for password" - } - ], - "checked": True, - "rules_count": 5, - "latency_ms": 10 - } - - is_valid, error = validate_tool_output("admin", valid_admin) - assert is_valid is True - assert error is None - - def test_validate_llm_output_valid(self): - """Test validation of valid LLM output""" - valid_llm = { - "text": "Generated response", - "tokens_used": 150, - "latency_ms": 2000, - "model": "llama3.1:latest", - "temperature": 0.0 - } - - is_valid, error = validate_tool_output("llm", valid_llm) - assert is_valid is True - assert error is None - - def test_validate_type_mismatch(self): - """Test validation catches type mismatches""" - invalid_rag = { - "results": "not an array", # Should be array - "query": "test", - "tenant_id": "tenant1" - } - - is_valid, error = validate_tool_output("rag", invalid_rag) - assert is_valid is False - assert "must be array" in error - - def test_schema_examples(self): - """Test that all schemas have examples""" - for tool_name, schema in TOOL_OUTPUT_SCHEMAS.items(): - assert schema.example is not None - assert isinstance(schema.example, dict) - # Example should be valid - is_valid, error = validate_tool_output(tool_name, schema.example) - assert is_valid is True, f"Schema example for {tool_name} is invalid: {error}" - - -class TestContextAwareRouting: - """Test context-aware MCP routing""" - - @pytest.fixture - def tool_selector(self): - """Create a ToolSelector instance""" - return ToolSelector(llm_client=None) - - def test_analyze_context_rag_high_score(self, tool_selector): - """Test context analysis when RAG returns high score""" - rag_results = [ - {"similarity": 0.85, "text": "High quality result"}, - {"similarity": 0.90, "text": "Another high quality result"} - ] - memory = [] - admin_violations = [] - tool_scores = {"rag_fitness": 0.8, "web_fitness": 0.5} - - hints = tool_selector._analyze_context(rag_results, memory, admin_violations, tool_scores) - - assert hints.get("skip_web_if_rag_high") is 
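The validation tests fix the error strings exactly. A minimal validator that reproduces them; the real schemas presumably cover more fields and types than this coarse sketch:

```python
def validate_output(schema: dict, output: dict):
    """Check required fields and coarse types; mirrors the asserted error strings."""
    for field, expected_type in schema.items():
        if field not in output:
            return False, f"Missing required field: {field}"
        if expected_type is list and not isinstance(output[field], list):
            return False, f"Field '{field}' must be array"
    return True, None

# Required top-level fields for a RAG result, per the missing-field test.
RAG_REQUIRED = {"results": list, "query": str, "tenant_id": str}
```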
-class TestContextAwareRouting:
-    """Test context-aware MCP routing"""
-
-    @pytest.fixture
-    def tool_selector(self):
-        """Create a ToolSelector instance"""
-        return ToolSelector(llm_client=None)
-
-    def test_analyze_context_rag_high_score(self, tool_selector):
-        """Test context analysis when RAG returns high score"""
-        rag_results = [
-            {"similarity": 0.85, "text": "High quality result"},
-            {"similarity": 0.90, "text": "Another high quality result"}
-        ]
-        memory = []
-        admin_violations = []
-        tool_scores = {"rag_fitness": 0.8, "web_fitness": 0.5}
-
-        hints = tool_selector._analyze_context(rag_results, memory, admin_violations, tool_scores)
-
-        assert hints.get("skip_web_if_rag_high") is True
-        assert hints.get("rag_high_confidence") is True
-
-    def test_analyze_context_rag_low_score(self, tool_selector):
-        """Test context analysis when RAG returns low score"""
-        rag_results = [
-            {"similarity": 0.3, "text": "Low quality result"}
-        ]
-        memory = []
-        admin_violations = []
-        tool_scores = {"rag_fitness": 0.3, "web_fitness": 0.7}
-
-        hints = tool_selector._analyze_context(rag_results, memory, admin_violations, tool_scores)
-
-        # Should not skip web if RAG score is low
-        assert hints.get("skip_web_if_rag_high") is not True
-
-    def test_analyze_context_memory_relevant(self, tool_selector):
-        """Test context analysis when relevant memory exists"""
-        rag_results = []
-        memory = [
-            {
-                "tool": "rag",
-                "result": {
-                    "results": [
-                        {"similarity": 0.80, "text": "Recent RAG result"}
-                    ]
-                }
-            }
-        ]
-        admin_violations = []
-        tool_scores = {}
-
-        hints = tool_selector._analyze_context(rag_results, memory, admin_violations, tool_scores)
-
-        assert hints.get("has_relevant_memory") is True
-        # Should suggest skipping RAG if memory is recent and high quality
-        if memory[0]["result"]["results"][0]["similarity"] >= 0.75:
-            assert hints.get("skip_rag_if_memory") is True
-
-    def test_analyze_context_admin_critical(self, tool_selector):
-        """Test context analysis when admin violation is critical"""
-        rag_results = []
-        memory = []
-        admin_violations = [
-            {
-                "severity": "critical",
-                "rule_id": "rule1",
-                "matched_text": "sensitive data"
-            }
-        ]
-        tool_scores = {}
-
-        hints = tool_selector._analyze_context(rag_results, memory, admin_violations, tool_scores)
-
-        assert hints.get("skip_agent_reasoning") is True
-        assert hints.get("critical_violation") is True
-
-    def test_analyze_context_admin_low_severity(self, tool_selector):
-        """Test context analysis when admin violation is low severity"""
-        rag_results = []
-        memory = []
-        admin_violations = [
-            {
-                "severity": "low",
-                "rule_id": "rule1",
-                "matched_text": "minor issue"
-            }
-        ]
-        tool_scores = {}
-
-        hints = tool_selector._analyze_context(rag_results, memory, admin_violations, tool_scores)
-
-        # Low severity should not skip reasoning
-        assert hints.get("skip_agent_reasoning") is not True
-
-    @pytest.mark.asyncio
-    async def test_tool_selection_with_context_hints(self, tool_selector):
-        """Test tool selection uses context hints"""
-        # Mock LLM client
-        tool_selector.llm_client = AsyncMock()
-
-        # Context with high RAG score
-        ctx = {
-            "tenant_id": "test_tenant",
-            "rag_results": [
-                {"similarity": 0.85, "text": "High quality result"}
-            ],
-            "tool_scores": {
-                "rag_fitness": 0.8,
-                "web_fitness": 0.6,
-                "llm_only": 0.3
-            },
-            "memory": [],
-            "admin_violations": []
-        }
-
-        decision = await tool_selector.select("general", "What is our company policy?", ctx)
-
-        # Should include latency estimates in reason
-        assert "latency" in decision.reason.lower() or "est." in decision.reason.lower()
-
-        # Check that steps have latency estimates (for non-LLM tools)
-        if decision.tool_input and "steps" in decision.tool_input:
-            steps = decision.tool_input["steps"]
-            for step in steps:
-                if isinstance(step, dict) and "input" in step and step.get("tool") != "llm":
-                    # Non-LLM tools should have estimated latency (or be parallel)
-                    assert "_estimated_latency_ms" in step["input"] or "parallel" in step or step.get("tool") == "llm"
-
-    @pytest.mark.asyncio
-    async def test_tool_selection_skips_web_on_high_rag(self, tool_selector):
-        """Test that tool selection skips web when RAG has high score"""
-        tool_selector.llm_client = AsyncMock()
-
-        ctx = {
-            "tenant_id": "test_tenant",
-            "rag_results": [
-                {"similarity": 0.90, "text": "Very high quality result"}
-            ],
-            "tool_scores": {
-                "rag_fitness": 0.9,
-                "web_fitness": 0.7,
-                "llm_only": 0.2
-            },
-            "memory": [],
-            "admin_violations": []
-        }
-
-        decision = await tool_selector.select("general", "What is our internal policy?", ctx)
-
-        # Check reason includes context hint
-        assert "skip web" in decision.reason.lower() or "rag high" in decision.reason.lower() or "context" in decision.reason.lower()
-
-    @pytest.mark.asyncio
-    async def test_tool_selection_admin_critical_skip_reasoning(self, tool_selector):
-        """Test that tool selection skips reasoning for critical admin violations"""
-        tool_selector.llm_client = None  # No LLM needed for admin-only path
-
-        ctx = {
-            "tenant_id": "test_tenant",
-            "rag_results": [],
-            "tool_scores": {},
-            "memory": [],
-            "admin_violations": [
-                {
-                    "severity": "critical",
-                    "rule_id": "rule1",
-                    "matched_text": "critical violation"
-                }
-            ]
-        }
-
-        decision = await tool_selector.select("admin", "User trying to access sensitive data", ctx)
-
-        # Should skip LLM reasoning for critical violations
-        if decision.tool_input and "steps" in decision.tool_input:
-            steps = decision.tool_input["steps"]
-            # Should have admin step but may skip LLM
-            has_admin = any(s.get("tool") == "admin" for s in steps if isinstance(s, dict))
-            assert has_admin
-
-
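A sketch of the `_analyze_context` behavior these tests pin down. The 0.8 and 0.75 thresholds are inferred from the fixtures, not confirmed from the implementation:

```python
# Hypothetical sketch of ToolSelector._analyze_context, written as a free
# function; hint names come straight from the assertions above.
def analyze_context(rag_results, memory, admin_violations, tool_scores):
    hints = {}
    top = max((r.get("similarity", 0.0) for r in rag_results), default=0.0)
    if top >= 0.8:
        # RAG already answered well enough; web search would be redundant
        hints["rag_high_confidence"] = True
        hints["skip_web_if_rag_high"] = True
    for entry in memory:
        results = entry.get("result", {}).get("results", [])
        if results:
            hints["has_relevant_memory"] = True
            if results[0].get("similarity", 0.0) >= 0.75:
                # A recent high-quality hit makes a fresh RAG call redundant
                hints["skip_rag_if_memory"] = True
    if any(v.get("severity") == "critical" for v in admin_violations):
        # Critical violations short-circuit the agent entirely
        hints["critical_violation"] = True
        hints["skip_agent_reasoning"] = True
    return hints
```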
-class TestOrchestratorIntegration:
-    """Test orchestrator integration with new features"""
-
-    @pytest.fixture
-    def orchestrator(self):
-        """Create an AgentOrchestrator instance"""
-        return AgentOrchestrator(
-            rag_mcp_url="http://localhost:8900/rag",
-            web_mcp_url="http://localhost:8900/web",
-            admin_mcp_url="http://localhost:8900/admin",
-            llm_backend="ollama"
-        )
-
-    def test_format_rag_output(self, orchestrator):
-        """Test RAG output formatting"""
-        raw_output = {
-            "results": [
-                {"text": "Chunk 1", "similarity": 0.85},
-                {"text": "Chunk 2", "similarity": 0.75}
-            ],
-            "query": "test query"
-        }
-
-        formatted = orchestrator._format_tool_output("rag", raw_output, 90)
-
-        # Check schema compliance
-        assert "results" in formatted
-        assert "query" in formatted
-        assert "tenant_id" in formatted
-        assert "hits_count" in formatted
-        assert "avg_score" in formatted
-        assert "top_score" in formatted
-        assert "latency_ms" in formatted
-
-        # Validate against schema
-        is_valid, error = validate_tool_output("rag", formatted)
-        assert is_valid is True, f"Formatted RAG output invalid: {error}"
-
-    def test_format_web_output(self, orchestrator):
-        """Test Web output formatting"""
-        raw_output = {
-            "items": [
-                {
-                    "title": "Result Title",
-                    "snippet": "Result snippet",
-                    "link": "https://example.com"
-                }
-            ]
-        }
-
-        formatted = orchestrator._format_tool_output("web", raw_output, 800)
-
-        # Check schema compliance
-        assert "results" in formatted
-        assert "query" in formatted
-        assert "total_results" in formatted
-        assert "latency_ms" in formatted
-
-        # Validate against schema
-        is_valid, error = validate_tool_output("web", formatted)
-        assert is_valid is True, f"Formatted Web output invalid: {error}"
-
-    def test_format_admin_output(self, orchestrator):
-        """Test Admin output formatting"""
-        raw_output = {
-            "matches": [
-                {
-                    "rule_id": "rule1",
-                    "pattern": ".*password.*",
-                    "severity": "high",
-                    "text": "password",
-                    "confidence": 0.95
-                }
-            ]
-        }
-
-        formatted = orchestrator._format_tool_output("admin", raw_output, 10)
-
-        # Check schema compliance
-        assert "violations" in formatted
-        assert "checked" in formatted
-        assert "rules_count" in formatted
-        assert "latency_ms" in formatted
-
-        # Validate against schema
-        is_valid, error = validate_tool_output("admin", formatted)
-        assert is_valid is True, f"Formatted Admin output invalid: {error}"
-
-    def test_format_llm_output(self, orchestrator):
-        """Test LLM output formatting"""
-        raw_output = "This is a generated response from the LLM."
-
-        formatted = orchestrator._format_tool_output("llm", raw_output, 2000)
-
-        # Check schema compliance
-        assert "text" in formatted
-        assert "tokens_used" in formatted
-        assert "latency_ms" in formatted
-        assert "model" in formatted
-        assert "temperature" in formatted
-
-        # Validate against schema
-        is_valid, error = validate_tool_output("llm", formatted)
-        assert is_valid is True, f"Formatted LLM output invalid: {error}"
-
-    def test_format_output_handles_missing_fields(self, orchestrator):
-        """Test output formatting handles missing fields gracefully"""
-        # Minimal RAG output
-        minimal = {"results": []}
-
-        formatted = orchestrator._format_tool_output("rag", minimal, 90)
-
-        # Should have all required fields with defaults
-        assert "query" in formatted
-        assert "tenant_id" in formatted
-        assert "hits_count" in formatted
-        assert formatted["hits_count"] == 0
-
-
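And a sketch of the output normalization these orchestrator tests exercise. The real `_format_tool_output` is not in this diff; the defaults and the model name are assumptions chosen to satisfy the assertions above:

```python
# Hypothetical sketch of the raw-output normalization step.
def format_tool_output(tool: str, raw, latency_ms: int) -> dict:
    if tool == "rag":
        results = raw.get("results", [])
        scores = [r.get("similarity", 0.0) for r in results]
        return {
            "results": results,
            "query": raw.get("query", ""),        # missing fields get defaults
            "tenant_id": raw.get("tenant_id", ""),
            "hits_count": len(results),
            "avg_score": sum(scores) / len(scores) if scores else 0.0,
            "top_score": max(scores, default=0.0),
            "latency_ms": latency_ms,
        }
    if tool == "llm":
        text = str(raw)
        return {
            "text": text,
            "tokens_used": len(text.split()),  # rough proxy, not a real count
            "latency_ms": latency_ms,
            "model": "llama3.1:latest",        # taken from the test fixture
            "temperature": 0.0,
        }
    # web/admin would rename "items"/"matches" into their schema fields similarly
    return {"raw": raw, "latency_ms": latency_ms}
```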
-class TestEndToEndRouting:
-    """End-to-end tests for context-aware routing"""
-
-    @pytest.mark.asyncio
-    async def test_routing_with_high_rag_score(self):
-        """Test that high RAG score prevents web search"""
-        selector = ToolSelector(llm_client=None)
-
-        ctx = {
-            "tenant_id": "test",
-            "rag_results": [{"similarity": 0.92, "text": "Perfect match"}],
-            "tool_scores": {"rag_fitness": 0.9, "web_fitness": 0.7},
-            "memory": [],
-            "admin_violations": []
-        }
-
-        decision = await selector.select("general", "What is our policy?", ctx)
-
-        # Check that context hints are applied
-        if decision.tool_input and "steps" in decision.tool_input:
-            steps = decision.tool_input["steps"]
-            tool_names = [s.get("tool") for s in steps if isinstance(s, dict) and "tool" in s]
-
-            # Should have RAG but may skip web due to high score
-            assert "rag" in tool_names or "llm" in tool_names
-
-    @pytest.mark.asyncio
-    async def test_routing_with_memory(self):
-        """Test that relevant memory prevents redundant RAG call"""
-        selector = ToolSelector(llm_client=None)
-
-        ctx = {
-            "tenant_id": "test",
-            "rag_results": [],
-            "tool_scores": {"rag_fitness": 0.6},
-            "memory": [
-                {
-                    "tool": "rag",
-                    "result": {
-                        "results": [{"similarity": 0.85, "text": "Recent result"}]
-                    }
-                }
-            ],
-            "admin_violations": []
-        }
-
-        decision = await selector.select("general", "Tell me about our policy", ctx)
-
-        # Context should be analyzed
-        # (Actual behavior depends on implementation, but should use memory)
-        assert decision is not None
-
-
-if __name__ == "__main__":
-    pytest.main([__file__, "-v", "--tb=short"])
-
diff --git a/backend/workers/placeholder.txt b/backend/workers/placeholder.txt
deleted file mode 100644
index 520d4463a21ac6805c9e95029878a04d87b78c6a..0000000000000000000000000000000000000000
--- a/backend/workers/placeholder.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-This directory contains Celery worker tasks for async processing.
-For the Hugging Face Space submission, only placeholder files are included.
-The full worker implementation exists separately.
-
diff --git a/check_env.py b/check_env.py
deleted file mode 100644
index bea2beb0451468172ab69398c6e145c3c5c69ffb..0000000000000000000000000000000000000000
--- a/check_env.py
+++ /dev/null
@@ -1,106 +0,0 @@
-#!/usr/bin/env python3
-"""
-Simple script to check Supabase environment variables
-"""
-
-import os
-import sys
-from pathlib import Path
-from dotenv import load_dotenv
-
-# Load .env file
-load_dotenv()
-
-print("=" * 70)
-print("Supabase Environment Variables Check")
-print("=" * 70)
-print()
-
-# Check SUPABASE_URL
-supabase_url = os.getenv("SUPABASE_URL")
-if supabase_url:
-    print(f"[OK] SUPABASE_URL is set")
-    print(f"     Value: {supabase_url}")
-    if not supabase_url.startswith("https://"):
-        print(f"     [WARNING] URL should start with https://")
-    if ".supabase.co" not in supabase_url:
-        print(f"     [WARNING] URL should contain .supabase.co")
-else:
-    print("[ERROR] SUPABASE_URL is NOT set")
-    print("        Required for Supabase integration")
-
-print()
-
-# Check SUPABASE_SERVICE_KEY
-supabase_key = os.getenv("SUPABASE_SERVICE_KEY")
-if supabase_key:
-    key_length = len(supabase_key)
-    print(f"[OK] SUPABASE_SERVICE_KEY is set")
-    print(f"     Length: {key_length} characters")
-
-    if key_length < 100:
-        print(f"     [ERROR] Key is too short ({key_length} chars)")
-        print(f"             Expected: 200+ characters")
-        print(f"             This looks like an 'anon' key, not 'service_role' key!")
-        print(f"             Get the correct key from:")
-        print(f"             Supabase Dashboard -> Settings -> API -> service_role key")
-    elif key_length < 200:
-        print(f"     [WARNING] Key might be incomplete ({key_length} chars)")
-        print(f"               Expected: 200+ characters")
-    else:
-        print(f"     [OK] Key length looks correct ({key_length} chars)")
-
-    # Check if it starts with eyJ (JWT token format)
-    if supabase_key.startswith("eyJ"):
-        print(f"     [OK] Key format looks correct (JWT token)")
-    else:
-        print(f"     [WARNING] Key doesn't start with 'eyJ' (unusual for JWT)")
-
-    # Show first and last few characters (masked)
-    if key_length > 20:
-        masked = supabase_key[:10] + "..." + supabase_key[-10:]
-        print(f"     Preview: {masked}")
-else:
-    print("[ERROR] SUPABASE_SERVICE_KEY is NOT set")
-    print("        Required for Supabase integration")
-    print("        Get it from: Supabase Dashboard -> Settings -> API -> service_role key")
-
-print()
-
-# Check POSTGRESQL_URL (optional)
-postgres_url = os.getenv("POSTGRESQL_URL")
-if postgres_url:
-    print(f"[INFO] POSTGRESQL_URL is set (optional, for migrations)")
-    if len(postgres_url) > 50:
-        masked = postgres_url[:30] + "..." + postgres_url[-20:]
-        print(f"       Value: {masked}")
-    else:
-        print(f"       Value: {postgres_url}")
-else:
-    print("[INFO] POSTGRESQL_URL is not set (optional, only needed for migrations)")
-
-print()
-print("=" * 70)
-print("Summary")
-print("=" * 70)
-
-has_url = bool(supabase_url)
-has_key = bool(supabase_key)
-key_valid = has_key and len(supabase_key) >= 200
-
-if has_url and has_key and key_valid:
-    print("[SUCCESS] Supabase environment variables are correctly configured!")
-    print("          Your data should upload to Supabase automatically.")
-elif has_url and has_key:
-    print("[WARNING] Supabase URL and key are set, but key appears invalid.")
-    print("          Check that you're using the 'service_role' key (not 'anon' key).")
-elif has_url or has_key:
-    print("[ERROR] Supabase configuration is incomplete.")
-    print("        Both SUPABASE_URL and SUPABASE_SERVICE_KEY must be set.")
-else:
-    print("[ERROR] Supabase is not configured.")
-    print("        Set SUPABASE_URL and SUPABASE_SERVICE_KEY in your .env file.")
-
-print()
-print("=" * 70)
-
diff --git a/check_rag_database.py b/check_rag_database.py
deleted file mode 100644
index bdd8d38c84b7ea8f88730c0a4ee1e4291c82a0fe..0000000000000000000000000000000000000000
--- a/check_rag_database.py
+++ /dev/null
@@ -1,125 +0,0 @@
-"""
-Diagnostic script to check RAG database tenant isolation
-
-This script directly queries the database to verify tenant_id isolation.
-"""
-
-import sys
-from pathlib import Path
-
-# Add backend to path
-backend_dir = Path(__file__).parent / "backend"
-sys.path.insert(0, str(backend_dir))
-
-def check_database():
-    """Check database directly for tenant isolation"""
-    print("\n" + "="*60)
-    print("RAG Database Tenant Isolation Check")
-    print("="*60)
-
-    try:
-        from mcp_server.common.database import get_connection
-        import psycopg2.extras
-
-        conn = get_connection()
-        cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
-
-        # Check all tenant_ids in database
-        print("\n1. Checking all tenant_ids in database...")
-        cur.execute("SELECT DISTINCT tenant_id, COUNT(*) as count FROM documents GROUP BY tenant_id")
-        rows = cur.fetchall()
-
-        if not rows:
-            print("   ⚠️ No documents found in database")
-            cur.close()
-            conn.close()
-            return
-
-        print(f"   Found {len(rows)} unique tenant(s):")
-        for row in rows:
-            print(f"   - tenant_id: '{row['tenant_id']}' ({row['count']} documents)")
-
-        # Check for tenant1 documents
-        print("\n2. Checking documents for 'verify_tenant1'...")
-        cur.execute(
-            "SELECT id, tenant_id, LEFT(chunk_text, 50) as preview FROM documents WHERE tenant_id = %s LIMIT 5",
-            ("verify_tenant1",)
-        )
-        tenant1_docs = cur.fetchall()
-        print(f"   Found {len(tenant1_docs)} documents for verify_tenant1")
-        for doc in tenant1_docs:
-            preview = doc['preview'].replace('\n', ' ')
-            print(f"   - ID: {doc['id']}, tenant_id: '{doc['tenant_id']}', preview: {preview[:50]}...")
-
-        # Check for tenant2 documents
-        print("\n3. Checking documents for 'verify_tenant2'...")
-        cur.execute(
-            "SELECT id, tenant_id, LEFT(chunk_text, 50) as preview FROM documents WHERE tenant_id = %s LIMIT 5",
-            ("verify_tenant2",)
-        )
-        tenant2_docs = cur.fetchall()
-        print(f"   Found {len(tenant2_docs)} documents for verify_tenant2")
-        for doc in tenant2_docs:
-            preview = doc['preview'].replace('\n', ' ')
-            print(f"   - ID: {doc['id']}, tenant_id: '{doc['tenant_id']}', preview: {preview[:50]}...")
-
-        # Test search_vectors function directly
-        print("\n4. Testing search_vectors function directly...")
-        from mcp_server.common.embeddings import embed_text
-        from mcp_server.common.database import search_vectors
-
-        # Search for tenant1's secret as tenant1
-        query = "TENANT1_SECRET"
-        query_vector = embed_text(query)
-        results_tenant1 = search_vectors("verify_tenant1", query_vector, limit=5)
-        print(f"   Searching for '{query}' as verify_tenant1: {len(results_tenant1)} results")
-        for i, result in enumerate(results_tenant1[:2], 1):
-            text_preview = result['text'][:80].replace('\n', ' ')
-            print(f"   Result {i}: {text_preview}...")
-
-        # Search for tenant1's secret as tenant2 (should NOT find)
-        results_tenant2 = search_vectors("verify_tenant2", query_vector, limit=5)
-        print(f"   Searching for '{query}' as verify_tenant2: {len(results_tenant2)} results")
-        if results_tenant2:
-            print("   ⚠️ WARNING: tenant2 found tenant1's secret!")
-            for i, result in enumerate(results_tenant2[:2], 1):
-                text_preview = result['text'][:80].replace('\n', ' ')
-                print(f"   Result {i}: {text_preview}...")
-        else:
-            print("   ✅ PASSED: tenant2 cannot see tenant1's secret")
-
-        # Check for any documents with wrong tenant_id
-        print("\n5. Checking for data integrity issues...")
-        cur.execute("""
-            SELECT tenant_id, COUNT(*) as count
-            FROM documents
-            WHERE tenant_id IN ('verify_tenant1', 'verify_tenant2')
-            GROUP BY tenant_id
-        """)
-        integrity_check = cur.fetchall()
-        print("   Tenant document counts:")
-        for row in integrity_check:
-            print(f"   - {row['tenant_id']}: {row['count']} documents")
-
-        cur.close()
-        conn.close()
-
-        print("\n" + "="*60)
-        if results_tenant2 and "TENANT1_SECRET" in str(results_tenant2):
-            print("❌ ISOLATION FAILED: tenant2 can see tenant1's documents")
-        else:
-            print("✅ Database isolation appears to be working correctly")
-        print("="*60)
-
-    except ImportError as e:
-        print(f"\n❌ Import error: {e}")
-        print("   Make sure you're running from the project root directory")
-    except Exception as e:
-        print(f"\n❌ Error: {e}")
-        import traceback
-        traceback.print_exc()
-
-
-if __name__ == "__main__":
-    check_database()
-
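The `search_vectors` helper this check exercises is not part of this diff. A minimal sketch of a tenant-scoped vector query consistent with the table and columns used above; the pgvector `<=>` distance operator and the exact column set are assumptions:

```python
# Hypothetical sketch of a tenant-scoped search; the real implementation
# lives in mcp_server.common.database.
from mcp_server.common.database import get_connection

def search_vectors(tenant_id: str, query_vector, limit: int = 5):
    conn = get_connection()
    cur = conn.cursor()
    # The tenant_id predicate is what enforces isolation: every search is
    # filtered server-side, so one tenant can never match another's rows.
    cur.execute(
        """
        SELECT id, chunk_text AS text,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        WHERE tenant_id = %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (query_vector, tenant_id, query_vector, limit),
    )
    rows = cur.fetchall()
    cur.close()
    conn.close()
    return rows
```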
diff --git a/check_rules_db.py b/check_rules_db.py
deleted file mode 100644
index c40020cfc9ef4c2321e4d9209a6006748c3d2b63..0000000000000000000000000000000000000000
--- a/check_rules_db.py
+++ /dev/null
@@ -1,43 +0,0 @@
-"""
-Quick script to check if admin rules are saved in the database
-"""
-import sqlite3
-from pathlib import Path
-
-db_path = Path("data/admin_rules.db")
-
-if db_path.exists():
-    print(f"✅ Database found at: {db_path}")
-    print("\n" + "="*60)
-
-    conn = sqlite3.connect(db_path)
-    conn.row_factory = sqlite3.Row
-    cursor = conn.cursor()
-
-    # Get all rules
-    cursor.execute("SELECT * FROM admin_rules ORDER BY created_at DESC")
-    rules = cursor.fetchall()
-
-    if rules:
-        print(f"📋 Found {len(rules)} rule(s) in database:\n")
-        for rule in rules:
-            print(f"Tenant: {rule['tenant_id']}")
-            print(f"Rule: {rule['rule']}")
-            print(f"Pattern: {rule['pattern'] or 'N/A'}")
-            print(f"Severity: {rule['severity']}")
-            print(f"Enabled: {rule['enabled']}")
-            print(f"Created: {rule['created_at']}")
-            print("-" * 60)
-    else:
-        print("⚠️ No rules found in database.")
-        print("   Add rules via the Gradio UI or API to populate the database.")
-
-    conn.close()
-else:
-    print(f"❌ Database not found at: {db_path}")
-    print("   The database will be created automatically when you add your first rule.")
-    print("\n💡 To add rules:")
-    print("   1. Open Gradio UI (python app.py)")
-    print("   2. Go to 'Admin Rules & Compliance' tab")
-    print("   3. Add rules in the text box and click 'Upload / Append Rules'")
-
diff --git a/check_supabase_rules.py b/check_supabase_rules.py
deleted file mode 100644
index 6b4bfb3888dc5913bbe41ce53269c11448c4749d..0000000000000000000000000000000000000000
--- a/check_supabase_rules.py
+++ /dev/null
@@ -1,132 +0,0 @@
-#!/usr/bin/env python3
-"""
-Quick script to verify Supabase rules storage is working.
-Run this to check if rules are being saved to Supabase.
-"""
-
-import os
-import sys
-from pathlib import Path
-
-# Load environment variables from .env file
-from dotenv import load_dotenv
-load_dotenv()
-
-# Add backend to path
-backend_dir = Path(__file__).resolve().parent
-sys.path.insert(0, str(backend_dir))
-
-from backend.api.storage.rules_store import RulesStore
-
-
-def main():
-    print("=" * 60)
-    print("Supabase Rules Storage Verification")
-    print("=" * 60)
-
-    # Check environment variables
-    supabase_url = os.getenv("SUPABASE_URL")
-    supabase_key = os.getenv("SUPABASE_SERVICE_KEY")
-
-    print("\n1. Checking Environment Variables:")
-    if supabase_url:
-        print(f"   ✅ SUPABASE_URL is set: {supabase_url[:50]}...")
-    else:
-        print("   ❌ SUPABASE_URL is not set")
-        print("      Add it to your .env file: SUPABASE_URL=https://your-project.supabase.co")
-
-    if supabase_key:
-        print(f"   ✅ SUPABASE_SERVICE_KEY is set: {supabase_key[:20]}...")
-    else:
-        print("   ❌ SUPABASE_SERVICE_KEY is not set")
-        print("      Add it to your .env file: SUPABASE_SERVICE_KEY=your_service_role_key")
-
-    if not supabase_url or not supabase_key:
-        print("\n⚠️ Supabase credentials are missing!")
-        print("   Rules will be saved to SQLite instead.")
-        print("   See SUPABASE_SETUP.md for setup instructions.")
-        print("\n   To use Supabase:")
-        print("   1. Add SUPABASE_URL and SUPABASE_SERVICE_KEY to your .env file")
-        print("   2. Create the admin_rules table in Supabase (see supabase_admin_rules_table.sql)")
-        print("   3. Restart your application")
-        return
-
-    # Initialize RulesStore
-    print("\n2. Initializing RulesStore:")
-    try:
-        store = RulesStore(auto_create_table=True)
-        print(f"   ✅ RulesStore initialized")
-        print(f"   📦 Using Supabase: {store.use_supabase}")
-
-        if not store.use_supabase:
-            print("   ⚠️ RulesStore is using SQLite, not Supabase!")
-            print("   Check that:")
-            print("   - SUPABASE_URL and SUPABASE_SERVICE_KEY are correct")
-            print("   - Supabase Python client is installed: pip install supabase")
-            return
-
-    except Exception as e:
-        print(f"   ❌ Failed to initialize RulesStore: {e}")
-        return
-
-    # Test adding a rule
-    print("\n3. Testing Rule Storage:")
-    test_tenant = "test_verification"
-    test_rule = "Test rule for Supabase verification"
-
-    try:
-        # Delete test rule if it exists
-        store.delete_rule(test_tenant, test_rule)
-
-        # Add test rule
-        success = store.add_rule(
-            test_tenant,
-            test_rule,
-            severity="medium",
-            description="Verification test rule"
-        )
-
-        if success:
-            print(f"   ✅ Successfully added test rule to Supabase")
-        else:
-            print(f"   ❌ Failed to add rule to Supabase")
-            return
-
-        # Retrieve rule
-        rules = store.get_rules(test_tenant)
-        if test_rule in rules:
-            print(f"   ✅ Successfully retrieved rule from Supabase")
-            print(f"   📋 Found {len(rules)} rule(s) for tenant '{test_tenant}'")
-        else:
-            print(f"   ❌ Rule not found after adding")
-            return
-
-        # Get detailed rules
-        detailed_rules = store.get_rules_detailed(test_tenant)
-        if detailed_rules:
-            print(f"   ✅ Successfully retrieved detailed rules")
-            for rule in detailed_rules:
-                if rule['rule'] == test_rule:
-                    print(f"   📝 Rule details:")
-                    print(f"      - Pattern: {rule.get('pattern', 'N/A')}")
-                    print(f"      - Severity: {rule.get('severity', 'N/A')}")
-                    print(f"      - Enabled: {rule.get('enabled', 'N/A')}")
-
-        # Cleanup test rule
-        store.delete_rule(test_tenant, test_rule)
-        print(f"   🧹 Cleaned up test rule")
-
-    except Exception as e:
-        print(f"   ❌ Error during test: {e}")
-        import traceback
-        traceback.print_exc()
-        return
-
-    print("\n" + "=" * 60)
-    print("✅ All checks passed! Rules are being saved to Supabase.")
-    print("=" * 60)
-
-
-if __name__ == "__main__":
-    main()
-
diff --git a/create_supabase_table.py b/create_supabase_table.py
deleted file mode 100644
index c46dad7a089e7d7bfb103b765a457cb6338839b0..0000000000000000000000000000000000000000
--- a/create_supabase_table.py
+++ /dev/null
@@ -1,185 +0,0 @@
-"""
-Create admin_rules table in Supabase programmatically.
-This script uses the Supabase Python client to set up the table.
-"""
-
-import os
-import sys
-from pathlib import Path
-from dotenv import load_dotenv
-
-load_dotenv()
-
-def create_table_using_supabase_client():
-    """
-    Create the admin_rules table using Supabase client.
-    Since Supabase doesn't allow direct SQL execution via REST API,
-    we'll use a workaround or provide clear instructions.
-    """
- """ - supabase_url = os.getenv("SUPABASE_URL") - supabase_key = os.getenv("SUPABASE_SERVICE_KEY") - - if not supabase_url or not supabase_key: - print("❌ Missing Supabase credentials!") - print(" Set SUPABASE_URL and SUPABASE_SERVICE_KEY in .env file") - return False - - try: - from supabase import create_client - import httpx - - print("🔗 Connecting to Supabase...") - client = create_client(supabase_url, supabase_key) - - # Read SQL from file - sql_file = Path(__file__).parent / "supabase_admin_rules_table.sql" - if not sql_file.exists(): - print(f"❌ SQL file not found: {sql_file}") - return False - - with open(sql_file, "r", encoding="utf-8") as f: - sql_content = f.read() - - print("📝 Attempting to create table via Supabase API...") - - # Method 1: Try using Supabase Management API (if available) - # This requires the project to have pg_net extension enabled - try: - # Use the REST API to execute SQL via a custom function - # First, check if we can use the SQL execution endpoint - response = httpx.post( - f"{supabase_url}/rest/v1/rpc/exec_sql", - headers={ - "apikey": supabase_key, - "Authorization": f"Bearer {supabase_key}", - "Content-Type": "application/json", - "Prefer": "return=representation" - }, - json={"query": sql_content}, - timeout=30 - ) - - if response.status_code in [200, 201, 204]: - print("✅ Table created successfully via API!") - return True - else: - print(f"⚠️ API method returned: {response.status_code}") - print(f" Response: {response.text[:200]}") - except Exception as e: - print(f"⚠️ API method failed: {e}") - - # Method 2: Try using Supabase Python client's table operations - # This won't work for DDL, but we can verify if table exists - print("\n🔍 Checking if table already exists...") - try: - result = client.table("admin_rules").select("id").limit(1).execute() - print("✅ Table 'admin_rules' already exists!") - return True - except Exception as e: - error_str = str(e).lower() - if "relation" in error_str or "does not exist" in error_str: - print("⚠️ Table does not exist yet.") - else: - print(f"⚠️ Error checking table: {e}") - - # Method 3: Since direct SQL execution isn't supported, show instructions - print("\n" + "=" * 70) - print("📋 MANUAL SETUP REQUIRED") - print("=" * 70) - print("\nSupabase doesn't allow programmatic SQL execution for security.") - print("Please run the SQL manually in Supabase Dashboard:\n") - print("1. Go to: https://app.supabase.com") - print("2. Select your project") - print("3. Click 'SQL Editor' (left sidebar)") - print("4. Click 'New query'") - print("5. Copy the SQL below and paste it:") - print("\n" + "-" * 70) - print(sql_content) - print("-" * 70) - print("\n6. Click 'Run' button (or press Ctrl+Enter)") - print("7. Wait for success confirmation") - print("\n✅ After running, the table will be created automatically!") - - return False - - except ImportError: - print("❌ Supabase client not installed") - print(" Run: pip install supabase") - return False - except Exception as e: - print(f"❌ Error: {e}") - import traceback - traceback.print_exc() - return False - - -def create_table_via_psql(): - """ - Alternative: Use psql (PostgreSQL client) to execute SQL directly. - This requires POSTGRESQL_URL to be set. 
- """ - postgres_url = os.getenv("POSTGRESQL_URL") - if not postgres_url: - print("⚠️ POSTGRESQL_URL not set, skipping psql method") - return False - - sql_file = Path(__file__).parent / "supabase_admin_rules_table.sql" - if not sql_file.exists(): - return False - - try: - import subprocess - print("📝 Attempting to create table via psql...") - - # Execute SQL using psql - result = subprocess.run( - ["psql", postgres_url, "-f", str(sql_file)], - capture_output=True, - text=True, - timeout=30 - ) - - if result.returncode == 0: - print("✅ Table created successfully via psql!") - return True - else: - print(f"⚠️ psql failed: {result.stderr}") - return False - except FileNotFoundError: - print("⚠️ psql not found in PATH") - return False - except Exception as e: - print(f"⚠️ psql method failed: {e}") - return False - - -if __name__ == "__main__": - print("=" * 70) - print("Supabase Admin Rules Table Creator") - print("=" * 70) - print() - - # Try Method 1: Supabase client - success = create_table_using_supabase_client() - - if not success: - # Try Method 2: psql (if available) - print("\n" + "=" * 70) - print("Trying alternative method: psql") - print("=" * 70) - success = create_table_via_psql() - - if success: - print("\n" + "=" * 70) - print("✅ SUCCESS!") - print("=" * 70) - print("\nThe admin_rules table has been created in Supabase.") - print("RulesStore will now use Supabase instead of SQLite.") - else: - print("\n" + "=" * 70) - print("📝 Manual Setup Required") - print("=" * 70) - print("\nPlease run the SQL manually in Supabase SQL Editor.") - print("The SQL script is ready in: supabase_admin_rules_table.sql") - print("\nAfter creating the table, RulesStore will automatically use Supabase.") - diff --git a/create_supabase_table_simple.py b/create_supabase_table_simple.py deleted file mode 100644 index ea4111e086b96df839d280a4d9787c070ffe4b71..0000000000000000000000000000000000000000 --- a/create_supabase_table_simple.py +++ /dev/null @@ -1,70 +0,0 @@ -""" -Simple script to create admin_rules table in Supabase. -This uses the Supabase Management API or direct SQL execution. 
-""" - -import os -from dotenv import load_dotenv -import httpx -import json - -load_dotenv() - -SUPABASE_URL = os.getenv("SUPABASE_URL") -SUPABASE_SERVICE_KEY = os.getenv("SUPABASE_SERVICE_KEY") - -if not SUPABASE_URL or not SUPABASE_SERVICE_KEY: - print("❌ Missing Supabase credentials!") - print(" Set SUPABASE_URL and SUPABASE_SERVICE_KEY in .env file") - exit(1) - -# Read the SQL file -sql_file = Path("supabase_admin_rules_table.sql") -if not sql_file.exists(): - print(f"❌ SQL file not found: {sql_file}") - exit(1) - -with open(sql_file, "r") as f: - sql_content = f.read() - -print("🔗 Connecting to Supabase...") -print(f" URL: {SUPABASE_URL[:50]}...") - -# Method 1: Try using Supabase REST API with SQL execution -# Note: This requires the pg_net extension or a custom function -# Most Supabase projects don't allow direct SQL execution via REST API - -# Method 2: Use Supabase Python client to execute via RPC -try: - from supabase import create_client - - client = create_client(SUPABASE_URL, SUPABASE_SERVICE_KEY) - - # Split SQL into individual statements - statements = [s.strip() for s in sql_content.split(";") if s.strip() and not s.strip().startswith("--")] - - print(f"\n📝 Executing {len(statements)} SQL statements...") - - # Execute each statement - # Note: Supabase Python client doesn't support direct SQL execution - # We'll need to use a workaround or manual execution - - print("\n⚠️ Direct SQL execution via Python client is not supported.") - print(" Supabase requires SQL to be executed via the SQL Editor.") - print("\n📋 Please follow these steps:") - print(" 1. Go to: https://app.supabase.com") - print(" 2. Select your project") - print(" 3. Click 'SQL Editor' in the left sidebar") - print(" 4. Click 'New query'") - print(" 5. Copy the contents of: supabase_admin_rules_table.sql") - print(" 6. Paste into the SQL Editor") - print(" 7. 
-    print("   7. Click 'Run' (or press Ctrl+Enter)")
-    print("\n✅ After running the SQL, the table will be created!")
-
-except ImportError:
-    print("❌ Supabase client not installed")
-    print("   Run: pip install supabase")
-except Exception as e:
-    print(f"❌ Error: {e}")
-    print("\n💡 Manual setup required - see instructions above")
-
diff --git a/createingdummydata.py b/createingdummydata.py
deleted file mode 100644
index b2ba2c8160458154b51db86788ec96cb526a69df..0000000000000000000000000000000000000000
--- a/createingdummydata.py
+++ /dev/null
@@ -1,44 +0,0 @@
-from docx import Document
-
-# Dummy data
-data = {
-    "Day": ["Day 1", "Day 2", "Day 3", "Day 4", "Day 5"],
-    "Breakfast": [
-        "Oatmeal with sliced bananas and honey",
-        "Scrambled eggs with toast and orange juice",
-        "Greek yogurt with granola and berries",
-        "Pancakes with maple syrup and strawberries",
-        "Smoothie (spinach, banana, yogurt, almond milk)"
-    ],
-    "Lunch": [
-        "Grilled chicken salad with mixed greens and vinaigrette",
-        "Turkey sandwich with lettuce, tomato, and chips",
-        "Vegetable soup with whole-grain roll",
-        "Tuna salad wrap with carrot sticks",
-        "Caesar salad with grilled shrimp"
-    ],
-    "Dinner": [
-        "Spaghetti with marinara sauce and garlic bread",
-        "Baked salmon with steamed broccoli and rice",
-        "Beef stir-fry with mixed vegetables and noodles",
-        "Chicken curry with basmati rice",
-        "Veggie pizza with side salad"
-    ]
-}
-
-# Create DOCX document
-doc = Document()
-doc.add_heading("5-Day Meal Plan", level=1)
-
-for i in range(5):
-    doc.add_heading(data["Day"][i], level=2)
-    doc.add_paragraph(f"Breakfast: {data['Breakfast'][i]}")
-    doc.add_paragraph(f"Lunch: {data['Lunch'][i]}")
-    doc.add_paragraph(f"Dinner: {data['Dinner'][i]}")
-    doc.add_paragraph("")
-
-# Save file
-path = "5_day_meal_plan.docx"
-doc.save(path)
-
-print(f"Saved DOCX file to: {path}")
diff --git a/example_rules.txt b/example_rules.txt
deleted file mode 100644
index 7de92700b02a642558607ee827cc35bb76627508..0000000000000000000000000000000000000000
--- a/example_rules.txt
+++ /dev/null
@@ -1,133 +0,0 @@
-# Admin Rules Examples for IntegraChat
-# Copy and paste these rules into the Admin Rules & Compliance tab in Gradio UI
-
-# ============================================================
-# HIGH PRIORITY SECURITY RULES
-# ============================================================
-
-Block password disclosure requests
-Prevent sharing of authentication credentials
-No sharing of API keys or tokens
-Block requests for user account passwords
-Prevent disclosure of security credentials
-Block social security number requests
-No sharing of credit card information
-Prevent disclosure of personal identification numbers
-Block requests for bank account details
-No sharing of confidential access codes
-
-# ============================================================
-# MEDIUM PRIORITY COMPLIANCE RULES
-# ============================================================
-
-Block requests for employee personal information
-Prevent sharing of customer data without authorization
-No unauthorized access to financial records
-Block requests for confidential business strategies
-Prevent disclosure of proprietary information
-No sharing of trade secrets
-Block requests for competitor analysis data
-Prevent unauthorized data export
-No sharing of internal process documentation
-Block requests for customer contact lists
-
-# ============================================================
-# DATA PRIVACY RULES
-# ============================================================
-
-Block requests for personal data of EU citizens
-Prevent sharing of health information
-No disclosure of medical records
-Block requests for biometric data
-Prevent sharing of location tracking information
-No disclosure of children's personal information
-Block requests for genetic information
-Prevent sharing of religious or political affiliations
-No disclosure of sexual orientation data
-Block requests for financial transaction history
-
-# ============================================================
-# OPERATIONAL RULES
-# ============================================================
-
-Block requests to delete system logs
-Prevent unauthorized system configuration changes
-No sharing of infrastructure credentials
-Block requests for production database access
-Prevent disclosure of deployment procedures
-No sharing of monitoring tool credentials
-Block requests for backup restoration procedures
-Prevent unauthorized access to cloud resources
-No sharing of encryption keys
-Block requests for system administrator privileges
-
-# ============================================================
-# CONTENT MODERATION RULES
-# ============================================================
-
-Block requests for generating harmful content
-Prevent creation of offensive material
-No sharing of inappropriate content
-Block requests for generating misleading information
-Prevent creation of fake news content
-No sharing of defamatory statements
-Block requests for generating hate speech
-Prevent creation of discriminatory content
-No sharing of violent content
-Block requests for generating illegal content
-
-# ============================================================
-# SPECIFIC KEYWORD-BASED RULES
-# ============================================================
-
-Block queries containing "password" and "reset"
-Prevent requests with "API key" and "generate"
-No queries containing "SSN" or "social security"
-Block requests with "credit card" and "number"
-Prevent queries containing "bank account" and "details"
-No requests with "admin" and "access"
-Block queries containing "delete" and "all data"
-Prevent requests with "export" and "customer list"
-No queries containing "encryption key" and "show"
-Block requests with "root password" and "share"
-
-# ============================================================
-# REGULATORY COMPLIANCE RULES
-# ============================================================
-
-Block requests violating GDPR regulations
-Prevent sharing of data without consent
-No disclosure of information to unauthorized parties
-Block requests for data subject to HIPAA
-Prevent sharing of protected health information
-No disclosure of financial data subject to PCI-DSS
-Block requests violating SOX compliance
-Prevent sharing of audit trail information
-No disclosure of information subject to FERPA
-Block requests violating industry-specific regulations
-
-# ============================================================
-# RESPONSE BEHAVIOR RULES
-# ============================================================
-
-Keep greeting responses brief and simple
-Do not provide verbose responses to simple greetings
-Respond to hello and hi with short friendly greetings only
-Avoid mentioning RAG or documentation sources in greeting responses
-Keep casual conversation responses concise
-
-# ============================================================
-# CUSTOM BUSINESS RULES (Examples)
-# ============================================================
-
-Block requests for competitor pricing information
-Prevent sharing of upcoming product launch details
-No disclosure of merger and acquisition information
-Block requests for employee salary information
-Prevent sharing of vendor contract terms
-No disclosure of strategic partnership details
-Block requests for customer churn analysis data
-Prevent sharing of marketing campaign strategies
-No disclosure of research and development projects
-Block requests for intellectual property information
-
diff --git a/example_rules_detailed.json b/example_rules_detailed.json
deleted file mode 100644
index f0ba9ebd8ae9b46fa281a2baf6bebfdcb000ab82..0000000000000000000000000000000000000000
--- a/example_rules_detailed.json
+++ /dev/null
@@ -1,131 +0,0 @@
-{
-  "rules": [
-    {
-      "rule": "Block password disclosure requests",
-      "pattern": ".*(password|pwd|passcode|credential|login).*",
-      "severity": "high",
-      "description": "Prevents users from requesting or sharing passwords, credentials, or authentication information"
-    },
-    {
-      "rule": "Prevent sharing of API keys or tokens",
-      "pattern": ".*(api.?key|token|secret|access.?key|auth.?token).*",
-      "severity": "critical",
-      "description": "Blocks requests to share, generate, or disclose API keys, tokens, or authentication secrets"
-    },
-    {
-      "rule": "Block social security number requests",
-      "pattern": ".*(ssn|social.?security|tax.?id|ein).*",
-      "severity": "high",
-      "description": "Prevents disclosure of social security numbers or tax identification numbers"
-    },
-    {
-      "rule": "No sharing of credit card information",
-      "pattern": ".*(credit.?card|card.?number|cvv|cvc|expiration).*",
-      "severity": "critical",
-      "description": "Blocks requests to share or store credit card numbers, CVV codes, or payment card information"
-    },
-    {
-      "rule": "Block requests for bank account details",
-      "pattern": ".*(bank.?account|routing.?number|account.?number|swift|iban).*",
-      "severity": "high",
-      "description": "Prevents disclosure of bank account numbers, routing numbers, or financial account information"
-    },
-    {
-      "rule": "Prevent sharing of employee personal information",
-      "pattern": ".*(employee.?data|staff.?info|personnel.?record|hr.?data).*",
-      "severity": "medium",
-      "description": "Blocks requests to access or share employee personal information without authorization"
-    },
-    {
-      "rule": "No unauthorized access to financial records",
-      "pattern": ".*(financial.?record|accounting|bookkeeping|financial.?data).*",
-      "severity": "high",
-      "description": "Prevents unauthorized access to financial records, accounting data, or bookkeeping information"
-    },
-    {
-      "rule": "Block requests for confidential business strategies",
-      "pattern": ".*(business.?strategy|strategic.?plan|confidential.?plan|roadmap).*",
-      "severity": "medium",
-      "description": "Prevents disclosure of confidential business strategies, plans, or roadmaps"
-    },
-    {
-      "rule": "Prevent disclosure of proprietary information",
-      "pattern": ".*(proprietary|trade.?secret|intellectual.?property|ip).*",
-      "severity": "high",
-      "description": "Blocks requests to share proprietary information, trade secrets, or intellectual property"
-    },
-    {
-      "rule": "Block requests for personal data of EU citizens",
-      "pattern": ".*(gdpr|eu.?citizen|personal.?data|data.?subject).*",
-      "severity": "critical",
-      "description": "Prevents unauthorized access to personal data of EU citizens, violating GDPR regulations"
-    },
-    {
-      "rule": "Prevent sharing of health information",
-      "pattern": ".*(health.?info|medical.?record|patient.?data|hipaa).*",
-      "severity": "critical",
-      "description": "Blocks requests to share health information or medical records, protecting HIPAA compliance"
-    },
-    {
-      "rule": "No disclosure of children's personal information",
-      "pattern": ".*(child|minor|under.?18|coppa).*",
-      "severity": "critical",
-      "description": "Prevents disclosure of personal information of children under 18, ensuring COPPA compliance"
-    },
-    {
-      "rule": "Block requests to delete system logs",
-      "pattern": ".*(delete.?log|remove.?log|clear.?log|purge.?log).*",
-      "severity": "high",
-      "description": "Prevents deletion or modification of system logs, which are critical for security and compliance"
-    },
-    {
-      "rule": "Prevent unauthorized system configuration changes",
-      "pattern": ".*(system.?config|change.?setting|modify.?config|update.?config).*",
-      "severity": "high",
-      "description": "Blocks unauthorized changes to system configuration that could compromise security"
-    },
-    {
-      "rule": "No sharing of infrastructure credentials",
-      "pattern": ".*(infrastructure|server.?credential|deployment.?key|cloud.?access).*",
-      "severity": "critical",
-      "description": "Prevents sharing of infrastructure credentials, server access, or cloud deployment keys"
-    },
-    {
-      "rule": "Block requests for generating harmful content",
-      "pattern": ".*(harmful|violent|hate.?speech|offensive|illegal).*",
-      "severity": "medium",
-      "description": "Prevents generation of harmful, violent, hateful, or illegal content"
-    },
-    {
-      "rule": "Prevent creation of misleading information",
-      "pattern": ".*(misleading|fake.?news|false.?info|disinformation).*",
-      "severity": "medium",
-      "description": "Blocks creation of misleading information, fake news, or disinformation"
-    },
-    {
-      "rule": "No sharing of defamatory statements",
-      "pattern": ".*(defamatory|libel|slander|defame).*",
-      "severity": "medium",
-      "description": "Prevents creation or sharing of defamatory statements that could cause legal issues"
-    },
-    {
-      "rule": "Block requests for competitor pricing information",
-      "pattern": ".*(competitor|pricing|competitive.?intelligence).*",
-      "severity": "low",
-      "description": "Prevents sharing of competitor pricing information or competitive intelligence"
-    },
-    {
-      "rule": "Prevent sharing of upcoming product launch details",
-      "pattern": ".*(product.?launch|upcoming.?release|new.?product).*",
-      "severity": "medium",
-      "description": "Blocks disclosure of upcoming product launches or new product information"
-    }
-  ],
-  "usage_instructions": {
-    "simple": "Copy rules from example_rules.txt and paste into Gradio UI",
-    "detailed": "Use the JSON format with patterns and severity levels for more control",
-    "bulk_upload": "Use the /admin/rules/bulk endpoint with the rules array",
-    "individual": "Add rules one by one using the /admin/rules endpoint with JSON payload"
-  }
-}
-
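An illustrative use of this detailed rule format: evaluating a message against the regex patterns. A sketch only, not the shipped admin MCP logic:

```python
# Load the detailed rules and report which ones a message triggers.
import json
import re

with open("example_rules_detailed.json", encoding="utf-8") as f:
    rules = json.load(f)["rules"]

def check_message(message: str) -> list[dict]:
    """Return the rule/severity pairs whose pattern matches the message."""
    return [
        {"rule": r["rule"], "severity": r["severity"]}
        for r in rules
        if re.search(r["pattern"], message, re.IGNORECASE)
    ]

print(check_message("please share the admin password and an API key"))
# -> the password-disclosure (high) and API-key (critical) rules match
```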
"description": "Blocks requests to share health information or medical records, protecting HIPAA compliance" - }, - { - "rule": "No disclosure of children's personal information", - "pattern": ".*(child|minor|under.?18|coppa).*", - "severity": "critical", - "description": "Prevents disclosure of personal information of children under 18, ensuring COPPA compliance" - }, - { - "rule": "Block requests to delete system logs", - "pattern": ".*(delete.?log|remove.?log|clear.?log|purge.?log).*", - "severity": "high", - "description": "Prevents deletion or modification of system logs, which are critical for security and compliance" - }, - { - "rule": "Prevent unauthorized system configuration changes", - "pattern": ".*(system.?config|change.?setting|modify.?config|update.?config).*", - "severity": "high", - "description": "Blocks unauthorized changes to system configuration that could compromise security" - }, - { - "rule": "No sharing of infrastructure credentials", - "pattern": ".*(infrastructure|server.?credential|deployment.?key|cloud.?access).*", - "severity": "critical", - "description": "Prevents sharing of infrastructure credentials, server access, or cloud deployment keys" - }, - { - "rule": "Block requests for generating harmful content", - "pattern": ".*(harmful|violent|hate.?speech|offensive|illegal).*", - "severity": "medium", - "description": "Prevents generation of harmful, violent, hateful, or illegal content" - }, - { - "rule": "Prevent creation of misleading information", - "pattern": ".*(misleading|fake.?news|false.?info|disinformation).*", - "severity": "medium", - "description": "Blocks creation of misleading information, fake news, or disinformation" - }, - { - "rule": "No sharing of defamatory statements", - "pattern": ".*(defamatory|libel|slander|defame).*", - "severity": "medium", - "description": "Prevents creation or sharing of defamatory statements that could cause legal issues" - }, - { - "rule": "Block requests for competitor pricing information", - "pattern": ".*(competitor|pricing|competitive.?intelligence).*", - "severity": "low", - "description": "Prevents sharing of competitor pricing information or competitive intelligence" - }, - { - "rule": "Prevent sharing of upcoming product launch details", - "pattern": ".*(product.?launch|upcoming.?release|new.?product).*", - "severity": "medium", - "description": "Blocks disclosure of upcoming product launches or new product information" - } - ], - "usage_instructions": { - "simple": "Copy rules from example_rules.txt and paste into Gradio UI", - "detailed": "Use the JSON format with patterns and severity levels for more control", - "bulk_upload": "Use the /admin/rules/bulk endpoint with the rules array", - "individual": "Add rules one by one using the /admin/rules endpoint with JSON payload" - } -} - diff --git a/frontend/.gitignore b/frontend/.gitignore deleted file mode 100644 index 5ef6a520780202a1d6addd833d800ccb1ecac0bb..0000000000000000000000000000000000000000 --- a/frontend/.gitignore +++ /dev/null @@ -1,41 +0,0 @@ -# See https://help.github.com/articles/ignoring-files/ for more about ignoring files. 
diff --git a/frontend/README.md b/frontend/README.md
deleted file mode 100644
index 2e22c0129af130089daab31fc09f858c2835a1fc..0000000000000000000000000000000000000000
--- a/frontend/README.md
+++ /dev/null
@@ -1,134 +0,0 @@
-## IntegraChat Frontend
-
-Next.js 16 / React 19 app that showcases everything wired up in `backend/`.
-It provides a polished operator console with:
-
-- **Hero section** + feature overview describing the FastAPI + MCP stack
-- **Live chat panel** that POSTs to `POST /agent/message` for AI conversations
-- **Analytics dashboard** pulling from `GET /analytics/overview` with real-time metrics
-- **Knowledge base management** page (`/knowledge-base`) for document search and ingestion
-- **Document ingestion UI** for uploading PDF, DOCX, TXT files or raw text
-- **Feature grid** showcasing platform capabilities
-
-**Note:** IntegraChat also includes a Gradio-based UI (`app.py`) with interactive visualizations, statistics cards, and Plotly charts. See the root `README.md` for details on running the Gradio interface.
-
-## Running Locally
-
-```bash
-cd frontend
-npm install
-npm run dev
-```
-
-Visit `http://localhost:3000` for the main landing page, or `http://localhost:3000/knowledge-base` for document management.
-
-### API configuration
-
-The UI calls the FastAPI service through `NEXT_PUBLIC_API_URL` (default `http://localhost:8000`).
-Update `.env.local` if your backend runs elsewhere:
-
-```
-NEXT_PUBLIC_API_URL=http://localhost:8000
-```
-
-### Tenant & Role selector
-
-- The navbar widget now stores both the tenant ID and the MCP role (Viewer, Editor, Admin, Owner) in `localStorage`.
-- Every API call automatically includes `x-tenant-id` and `x-user-role` headers so the backend RBAC layer can authorize ingestion, admin rule uploads, analytics, and delete operations.
-- If you see a 403 "insufficient permissions" error, switch the role dropdown to a higher privilege (e.g., Admin) before retrying the action.
-- **Note**: Analytics is now accessible to all roles (viewer, editor, admin, owner) for improved transparency.
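For reference, the same header contract can be exercised outside the browser. A minimal sketch in Python using httpx; the exact `/agent/message` payload shape is an assumption here, only the headers come from this README:

```python
# Sketch: calling the backend with the tenant/role headers the UI sets.
import httpx

resp = httpx.post(
    "http://localhost:8000/agent/message",
    headers={"x-tenant-id": "tenant-a", "x-user-role": "admin"},
    json={"message": "What is our refund policy?"},  # assumed payload shape
    timeout=60,
)
print(resp.status_code, resp.json())
```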
-
-## Features
-
-### Main Landing Page (`/`)
-- **Hero section** with platform introduction
-- **Feature grid** showcasing key capabilities
-- **Chat panel** for real-time AI conversations with reasoning visualizations
-- **Analytics panel** with query metrics and tool usage statistics
-- **Ingestion card** for quick document uploads
-
-### Real-Time Visualizations
-
-The frontend includes three powerful visualization components:
-
-#### 1. Reasoning Path Visualizer (`reasoning-visualizer.tsx`)
-- Step-by-step visualization of agent decision-making
-- Animated progression through reasoning steps
-- Status indicators and detailed metrics
-- **Latency predictions** shown for each step (estimated vs actual)
-- **Context-aware routing hints** displayed (skip web/RAG/reasoning decisions)
-- Integrated into chat panel with collapsible section
-
-#### 2. Tool Invocation Timeline (`tool-timeline.tsx`)
-- Visual timeline of tool executions
-- Latency and result count visualization
-- **Schema-validated outputs** displayed (RAG results, Web results, Admin violations, LLM tokens)
-- Summary statistics
-- Integrated into chat panel
-
-#### 3. Tenant Activity Heatmap (`tenant-heatmap.tsx`)
-- Query activity heatmap (hour-by-hour, day-by-day)
-- Per-tool usage trends
-- Integrated into analytics page
-
-### Knowledge Base Page (`/knowledge-base`)
-- **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
-- **Search interface** for semantic search with cross-encoder re-ranking across documents
-- **AI-Generated Metadata Display**: After ingestion, shows extracted:
-  - Title, Summary, Tags, Topics
-  - Quality Score (0.0-1.0)
-  - Detected Date
-  - Extraction Method (LLM vs fallback)
-- **Document ingestion** with support for:
-  - Raw text input
-  - URL ingestion (automatic content fetching)
-  - PDF file uploads
-  - DOCX file uploads
-  - TXT and Markdown file uploads
-- **Document management** with tenant + role isolation:
-  - Delete individual documents by ID
-  - Delete all documents for a tenant (with confirmation)
-  - Real-time document list updates after operations
-  - Error handling with clear user feedback
-
-### Analytics Page (`/analytics`)
-- **Analytics overview** with key metrics (queries, users, red flags)
-- **Tool usage statistics** with detailed breakdowns
-- **Tenant activity heatmap** showing query patterns over time
-- **Per-tool usage trends** with visual bar charts
-- **Access**: All roles can view analytics (viewer, editor, admin, owner)
-
-### Admin Rules Page (`/admin-rules`)
-- **Rule management** with bulk upload and individual rule deletion
-- **File upload support** for TXT, PDF, DOC, DOCX, MD files with drag-and-drop
-- **LLM-Guided Rule Explanations**:
-  - Automatic generation of human-readable explanations
-  - Concrete examples of what would trigger the rule
-  - Missing pattern suggestions for rule improvement
-  - Edge cases and improvements identified by LLM
-  - **Intelligent fallback**: When LLM times out, uses keyword extraction to generate useful explanations, examples, and suggestions
-- **Expandable explanations** with "Explain" button for each rule
-- **Auto-expand** for newly added rules with explanations
-- **Role-based access**: Requires Admin or Owner role to manage rules
-- **Real-time updates** with refresh functionality
-
-### Components
-
-- `chat-panel.tsx` - Real-time chat interface with streaming responses and visualization integration
-- `analytics-panel.tsx` - Analytics dashboard with metrics visualization
-- `knowledge-base-panel.tsx` - Document search and ingestion component
-- `ingestion-card.tsx` - Quick document upload card
-- `hero.tsx` - Landing page hero section
-- `feature-grid.tsx` - Feature showcase grid
-- `footer.tsx` - Footer component
-- `reasoning-visualizer.tsx` - Real-time reasoning path visualizer component
-- `tool-timeline.tsx` - Tool invocation timeline component
-- `tenant-heatmap.tsx` - Tenant activity heatmap component
-- `rule-explanation.tsx` - LLM-generated rule explanation component with examples and pattern suggestions
-- `admin-rules-panel.tsx` - Admin rules management panel component
-
-## Deploy
-
-Deploy like any Next.js app (Vercel, Docker, etc.). Ensure the backend endpoints are reachable from the browser and CORS is enabled (already configured in `backend/api/main.py`).
-
-**Note:** Make sure Celery workers are running in production for document ingestion and analytics processing to work properly.
- -**Note:** Make sure Celery workers are running in production for document ingestion and analytics processing to work properly. diff --git a/frontend/app/admin-rules/page.tsx b/frontend/app/admin-rules/page.tsx deleted file mode 100644 index 2cb165fbc71da3c0f0ed1b6d030c77e5a125d35d..0000000000000000000000000000000000000000 --- a/frontend/app/admin-rules/page.tsx +++ /dev/null @@ -1,778 +0,0 @@ -"use client"; - -import React, { useCallback, useMemo, useState, useRef, useEffect } from "react"; -import Link from "next/link"; - -import { AdminRulesPanel } from "@/components/admin-rules-panel"; -import { RuleExplanation } from "@/components/rule-explanation"; -import { Footer } from "@/components/footer"; -import { useTenant } from "@/contexts/TenantContext"; -import { TenantSelector } from "@/components/tenant-selector"; -import { canManageRules } from "@/lib/permissions"; - -const BACKEND_BASE_URL = process.env.NEXT_PUBLIC_BACKEND_BASE_URL ?? "http://localhost:8000"; - -type StatusState = { tone: "info" | "success" | "error"; message: string } | null; - -const RBAC_ERROR_HINT = - "Insufficient permissions for this action. Switch your role to Admin or Owner in the navbar and try again."; - -async function buildErrorMessage(response: Response) { - const fallback = `Backend error ${response.status}`; - try { - const text = await response.text(); - if (!text) { - return response.status === 403 ? RBAC_ERROR_HINT : fallback; - } - try { - const parsed = JSON.parse(text); - const detail = parsed.detail || parsed.message; - if (response.status === 403) { - return detail || RBAC_ERROR_HINT; - } - return detail || fallback; - } catch { - if (response.status === 403) { - return text || RBAC_ERROR_HINT; - } - return text || fallback; - } - } catch { - return response.status === 403 ? RBAC_ERROR_HINT : fallback; - } -} - -export default function AdminRulesPage() { - const { tenantId, role } = useTenant(); - const [rulesInput, setRulesInput] = useState(""); - const [deleteInput, setDeleteInput] = useState(""); - const [rules, setRules] = useState([]); - const [loading, setLoading] = useState(false); - const [status, setStatus] = useState(null); - const [isDragging, setIsDragging] = useState(false); - const [lastUpdated, setLastUpdated] = useState(""); - const [ruleExplanations, setRuleExplanations] = useState>({}); - const [expandedRules, setExpandedRules] = useState>(new Set()); - const [loadingExplanations, setLoadingExplanations] = useState>(new Set()); - const fileInputRef = useRef(null); - - // Set initial time only on client side to avoid hydration mismatch - useEffect(() => { - setLastUpdated(new Date().toLocaleTimeString()); - }, []); - - const headers = useMemo(() => { - if (!tenantId.trim()) return undefined; - return { - "Content-Type": "application/json", - "x-tenant-id": tenantId.trim(), - "x-user-role": role, - }; - }, [tenantId, role]); - - const requireTenant = useCallback(() => { - if (!tenantId.trim()) { - setStatus({ tone: "error", message: "Enter a tenant ID in the navbar first." }); - return false; - } - return true; - }, [tenantId]); - - const handleRefresh = useCallback(async () => { - if (!requireTenant()) return; - try { - setLoading(true); - setStatus({ tone: "info", message: "Loading rules..." }); - const response = await fetch(`${BACKEND_BASE_URL}/admin/rules`, { - method: "GET", - headers, - }); - if (!response.ok) { - throw new Error(await buildErrorMessage(response)); - } - const data = await response.json(); - setRules(data.rules ?? 
diff --git a/frontend/app/admin-rules/page.tsx b/frontend/app/admin-rules/page.tsx
deleted file mode 100644
index 2cb165fbc71da3c0f0ed1b6d030c77e5a125d35d..0000000000000000000000000000000000000000
--- a/frontend/app/admin-rules/page.tsx
+++ /dev/null
@@ -1,778 +0,0 @@
-"use client";
-
-import React, { useCallback, useMemo, useState, useRef, useEffect } from "react";
-import Link from "next/link";
-
-import { AdminRulesPanel } from "@/components/admin-rules-panel";
-import { RuleExplanation } from "@/components/rule-explanation";
-import { Footer } from "@/components/footer";
-import { useTenant } from "@/contexts/TenantContext";
-import { TenantSelector } from "@/components/tenant-selector";
-import { canManageRules } from "@/lib/permissions";
-
-const BACKEND_BASE_URL = process.env.NEXT_PUBLIC_BACKEND_BASE_URL ?? "http://localhost:8000";
-
-type StatusState = { tone: "info" | "success" | "error"; message: string } | null;
-
-const RBAC_ERROR_HINT =
-  "Insufficient permissions for this action. Switch your role to Admin or Owner in the navbar and try again.";
-
-async function buildErrorMessage(response: Response) {
-  const fallback = `Backend error ${response.status}`;
-  try {
-    const text = await response.text();
-    if (!text) {
-      return response.status === 403 ? RBAC_ERROR_HINT : fallback;
-    }
-    try {
-      const parsed = JSON.parse(text);
-      const detail = parsed.detail || parsed.message;
-      if (response.status === 403) {
-        return detail || RBAC_ERROR_HINT;
-      }
-      return detail || fallback;
-    } catch {
-      if (response.status === 403) {
-        return text || RBAC_ERROR_HINT;
-      }
-      return text || fallback;
-    }
-  } catch {
-    return response.status === 403 ? RBAC_ERROR_HINT : fallback;
-  }
-}
-
-export default function AdminRulesPage() {
-  const { tenantId, role } = useTenant();
-  const [rulesInput, setRulesInput] = useState("");
-  const [deleteInput, setDeleteInput] = useState("");
-  const [rules, setRules] = useState<string[]>([]);
-  const [loading, setLoading] = useState(false);
-  const [status, setStatus] = useState<StatusState>(null);
-  const [isDragging, setIsDragging] = useState(false);
-  const [lastUpdated, setLastUpdated] = useState("");
-  const [ruleExplanations, setRuleExplanations] = useState<Record<string, any>>({});
-  const [expandedRules, setExpandedRules] = useState<Set<string>>(new Set());
-  const [loadingExplanations, setLoadingExplanations] = useState<Set<string>>(new Set());
-  const fileInputRef = useRef<HTMLInputElement>(null);
-
-  // Set initial time only on client side to avoid hydration mismatch
-  useEffect(() => {
-    setLastUpdated(new Date().toLocaleTimeString());
-  }, []);
-
-  const headers = useMemo(() => {
-    if (!tenantId.trim()) return undefined;
-    return {
-      "Content-Type": "application/json",
-      "x-tenant-id": tenantId.trim(),
-      "x-user-role": role,
-    };
-  }, [tenantId, role]);
-
-  const requireTenant = useCallback(() => {
-    if (!tenantId.trim()) {
-      setStatus({ tone: "error", message: "Enter a tenant ID in the navbar first." });
-      return false;
-    }
-    return true;
-  }, [tenantId]);
-
-  const handleRefresh = useCallback(async () => {
-    if (!requireTenant()) return;
-    try {
-      setLoading(true);
-      setStatus({ tone: "info", message: "Loading rules..." });
-      const response = await fetch(`${BACKEND_BASE_URL}/admin/rules`, {
-        method: "GET",
-        headers,
-      });
-      if (!response.ok) {
-        throw new Error(await buildErrorMessage(response));
-      }
-      const data = await response.json();
-      setRules(data.rules ?? []);
-      setLastUpdated(new Date().toLocaleTimeString());
-      setStatus({ tone: "success", message: "Rules synced." });
-    } catch (error: any) {
-      setStatus({ tone: "error", message: error.message || "Failed to fetch rules" });
-    } finally {
-      setLoading(false);
-    }
-  }, [headers, requireTenant]);
-
-  const handleUpload = useCallback(async () => {
-    if (!requireTenant()) return;
-    const lines = rulesInput
-      .split("\n")
-      .map((line) => line.trim())
-      .filter((line) => line && !line.startsWith("#")); // Filter out comments and empty lines
-
-    if (!lines.length) {
-      setStatus({ tone: "error", message: "Add at least one rule to upload. (Comment lines starting with # are ignored)" });
-      return;
-    }
-
-    try {
-      setLoading(true);
-      setStatus({ tone: "info", message: `Uploading ${lines.length} rule(s)...` });
-      const response = await fetch(`${BACKEND_BASE_URL}/admin/rules/bulk?enhance=true`, {
-        method: "POST",
-        headers,
-        body: JSON.stringify({ rules: lines }),
-      });
-      if (!response.ok) {
-        throw new Error(await buildErrorMessage(response));
-      }
-      const data = await response.json();
-      await handleRefresh();
-      setRulesInput("");
-
-      // Store explanations for display and auto-expand
-      if (data.explanations && Array.isArray(data.explanations)) {
-        const explanationsMap: Record<string, any> = {};
-        const newExpanded = new Set(expandedRules);
-
-        data.explanations.forEach((exp: any) => {
-          if (exp.rule) {
-            explanationsMap[exp.rule] = exp;
-            // Auto-expand rules that have explanations
-            if (exp.explanation || exp.examples || exp.missing_patterns) {
-              newExpanded.add(exp.rule);
-            }
-          }
-        });
-        setRuleExplanations((prev) => ({ ...prev, ...explanationsMap }));
-        setExpandedRules(newExpanded);
-      }
-
-      const enhancedMsg = data.enhanced ? " (enhanced by LLM)" : "";
-      setStatus({ tone: "success", message: `Uploaded ${data.added_rules?.length || lines.length} rule(s)${enhancedMsg}.` });
-    } catch (error: any) {
-      setStatus({ tone: "error", message: error.message || "Failed to upload rules" });
-    } finally {
-      setLoading(false);
-    }
-  }, [handleRefresh, headers, requireTenant, rulesInput]);
-
-  const processFile = useCallback(async (file: File) => {
-    if (!requireTenant()) return;
-
-    const fileExt = file.name.split('.').pop()?.toLowerCase();
-    if (!fileExt || !['txt', 'pdf', 'doc', 'docx', 'md'].includes(fileExt)) {
-      setStatus({ tone: "error", message: "Unsupported file type. Supported: TXT, PDF, DOC, DOCX, MD" });
-      return;
-    }
-
-    try {
-      setLoading(true);
-      setStatus({ tone: "info", message: `Uploading and processing ${file.name}...` });
-
-      // For TXT files, read client-side for faster processing
-      if (fileExt === 'txt' || fileExt === 'md') {
-        const fileContent = await file.text();
-        const lines = fileContent
-          .split("\n")
-          .map((line) => line.trim())
-          .filter((line) => line && !line.startsWith("#"));
-
-        if (!lines.length) {
-          setStatus({ tone: "error", message: "No valid rules found in file (after filtering comments)." });
-          setLoading(false);
-          return;
-        }
-
-        // Upload rules via bulk endpoint
-        setStatus({ tone: "info", message: `Uploading ${lines.length} rule(s)...` });
-        const response = await fetch(`${BACKEND_BASE_URL}/admin/rules/bulk?enhance=true`, {
-          method: "POST",
-          headers,
-          body: JSON.stringify({ rules: lines }),
-        });
-
-        if (!response.ok) {
-          throw new Error(await buildErrorMessage(response));
-        }
-
-        const data = await response.json();
-        await handleRefresh();
-
-        // Store explanations for display and auto-expand
-        if (data.explanations && Array.isArray(data.explanations)) {
-          const explanationsMap: Record<string, any> = {};
-          const newExpanded = new Set(expandedRules);
-
-          data.explanations.forEach((exp: any) => {
-            if (exp.rule) {
-              explanationsMap[exp.rule] = exp;
-              // Auto-expand rules that have explanations
-              if (exp.explanation || exp.examples || exp.missing_patterns) {
-                newExpanded.add(exp.rule);
-              }
-            }
-          });
-          setRuleExplanations((prev) => ({ ...prev, ...explanationsMap }));
-          setExpandedRules(newExpanded);
-        }
-
-        const enhancedMsg = data.enhanced ? " (enhanced by LLM)" : "";
-        setStatus({ tone: "success", message: `Uploaded ${data.added_rules?.length || lines.length} rule(s) from ${file.name}${enhancedMsg}.` });
-        return;
-      }
-
-      // For PDF, DOC, DOCX - use backend file upload endpoint
-      const formData = new FormData();
-      formData.append('file', file);
-
-      setStatus({ tone: "info", message: `Extracting text from ${file.name}...` });
-      const response = await fetch(`${BACKEND_BASE_URL}/admin/rules/upload-file?enhance=true`, {
-        method: "POST",
-        headers: {
-          "x-tenant-id": tenantId.trim(),
-          "x-user-role": role,
-        },
-        body: formData,
-      });
-
-      if (!response.ok) {
-        throw new Error(await buildErrorMessage(response));
-      }
-
-      const data = await response.json();
-      await handleRefresh();
-
-      // Store explanations for display and auto-expand
-      if (data.explanations && Array.isArray(data.explanations)) {
-        const explanationsMap: Record<string, any> = {};
-        const newExpanded = new Set(expandedRules);
-
-        data.explanations.forEach((exp: any) => {
-          if (exp.rule) {
-            explanationsMap[exp.rule] = exp;
-            // Auto-expand rules that have explanations
-            if (exp.explanation || exp.examples || exp.missing_patterns) {
-              newExpanded.add(exp.rule);
-            }
-          }
-        });
-        setRuleExplanations((prev) => ({ ...prev, ...explanationsMap }));
-        setExpandedRules(newExpanded);
-      }
-
-      const enhancedMsg = data.enhanced ? " (enhanced by LLM)" : "";
-      setStatus({
-        tone: "success",
-        message: `Uploaded ${data.added_rules?.length || data.total_extracted || 0} rule(s) from ${file.name}${enhancedMsg}.`
-      });
-    } catch (error: any) {
-      setStatus({ tone: "error", message: error.message || "Failed to upload rules from file" });
-    } finally {
-      setLoading(false);
-    }
-  }, [handleRefresh, headers, requireTenant, tenantId]);
-
-  const handleFileUpload = useCallback(async (event: React.ChangeEvent<HTMLInputElement>) => {
-    const file = event.target.files?.[0];
-    if (!file) return;
-    await processFile(file);
-    if (fileInputRef.current) {
-      fileInputRef.current.value = "";
-    }
-  }, [processFile]);
-
-  const handleDragOver = useCallback((e: React.DragEvent) => {
-    e.preventDefault();
-    e.stopPropagation();
-    setIsDragging(true);
-  }, []);
-
-  const handleDragLeave = useCallback((e: React.DragEvent) => {
-    e.preventDefault();
-    e.stopPropagation();
-    setIsDragging(false);
-  }, []);
-
-  const handleDrop = useCallback(async (e: React.DragEvent) => {
-    e.preventDefault();
-    e.stopPropagation();
-    setIsDragging(false);
-
-    const file = e.dataTransfer.files?.[0];
-    if (!file) return;
-
-    const fileExt = file.name.split('.').pop()?.toLowerCase();
-    if (!fileExt || !['txt', 'pdf', 'doc', 'docx', 'md'].includes(fileExt)) {
-      setStatus({ tone: "error", message: "Unsupported file type. Supported: TXT, PDF, DOC, DOCX, MD" });
-      return;
-    }
-
-    await processFile(file);
-  }, [processFile]);
-
-  const fetchRuleExplanation = useCallback(async (rule: string) => {
-    if (!requireTenant()) return;
-    if (ruleExplanations[rule]) return; // Already have explanation
-
-    try {
-      setLoadingExplanations((prev) => new Set(prev).add(rule));
-
-      // Fetch explanation by calling the enhance endpoint
-      // We'll use POST with the rule in the body to get explanation
-      const response = await fetch(
-        `${BACKEND_BASE_URL}/admin/rules?enhance=true`,
-        {
-          method: "POST",
-          headers: {
-            "Content-Type": "application/json",
-            "x-tenant-id": tenantId.trim(),
-            "x-user-role": role,
-          },
-          body: JSON.stringify({ rule }),
-        }
-      );
-
-      if (response.ok) {
-        const data = await response.json();
-        if (data.explanation || data.examples || data.missing_patterns) {
-          setRuleExplanations((prev) => ({
-            ...prev,
-            [rule]: {
-              explanation: data.explanation,
-              examples: data.examples || [],
-              missing_patterns: data.missing_patterns || [],
-              edge_cases: data.edge_cases || [],
-              improvements: data.improvements || [],
-              severity: data.severity,
-            },
-          }));
-        }
-      }
-    } catch (error) {
-      console.error("Failed to fetch rule explanation:", error);
-    } finally {
-      setLoadingExplanations((prev) => {
-        const next = new Set(prev);
-        next.delete(rule);
-        return next;
-      });
-    }
-  }, [tenantId, role, ruleExplanations, requireTenant]);
-
-  const toggleRuleExplanation = useCallback((rule: string) => {
-    setExpandedRules((prev) => {
-      const next = new Set(prev);
-      if (next.has(rule)) {
-        next.delete(rule);
-      } else {
-        next.add(rule);
-        // Fetch explanation if we don't have it
-        if (!ruleExplanations[rule]) {
-          fetchRuleExplanation(rule);
-        }
-      }
-      return next;
-    });
-  }, [ruleExplanations, fetchRuleExplanation]);
-
-  const handleDelete = useCallback(async () => {
-    if (!requireTenant()) return;
-    if (!deleteInput.trim()) {
-      setStatus({ tone: "error", message: "Enter the rule text you want to delete." });
-      return;
-    }
-
-    try {
-      setLoading(true);
-      setStatus({ tone: "info", message: "Deleting rule..." });
-      const response = await fetch(
-        `${BACKEND_BASE_URL}/admin/rules/${encodeURIComponent(deleteInput.trim())}`,
-        {
-          method: "DELETE",
-          headers,
-        }
-      );
-      if (!response.ok) {
-        throw new Error(await buildErrorMessage(response));
-      }
-      await handleRefresh();
-      setDeleteInput("");
-      setStatus({ tone: "success", message: "Rule deleted." });
-    } catch (error: any) {
-      setStatus({ tone: "error", message: error.message || "Failed to delete rule" });
-    } finally {
-      setLoading(false);
-    }
-  }, [deleteInput, handleRefresh, headers, requireTenant]);
-
-  // Check permissions AFTER all hooks are called
-  if (!canManageRules(role)) {
-    return (
-      {/* NOTE: markup below is reconstructed from recovered text content; the
-          original tags, class names, and attribute values were lost in
-          extraction and are approximate. */}
-      <div>
-        <header>
-          <div>
-            <span>IC</span>
-            <span>IntegraChat · Admin Rules</span>
-          </div>
-          <div>
-            <Link href="/">← Back Home</Link>
-          </div>
-        </header>
-        <main>
-          <h1>Access Denied</h1>
-          <p>You need Admin or Owner role to manage rules.</p>
-          <p>Your current role: {role.charAt(0).toUpperCase() + role.slice(1)}</p>
-          <p>Please switch your role using the dropdown in the header.</p>
-        </main>
-      </div>
-    );
-  }
-
-  return (
-    {/* NOTE: as above, this markup is reconstructed from recovered text;
-        tags, class names, and some controls are approximate. */}
-    <div>
-      <header>
-        <div>
-          <span>IC</span>
-          <span>IntegraChat · Admin Rule Ingestion</span>
-        </div>
-        <div>
-          <Link href="/">← Back Home</Link>
-        </div>
-      </header>
-      <p>
-        Upload governance policies, compliance workflows, and red-flag patterns.
-        Rules are automatically enhanced by LLM and stored in the backend.
-      </p>
-      <div>
-        <span>LLM Enhanced</span>
-        <span>📄 File Upload</span>
-        <span>🔄 Chunk Processing</span>
-      </div>
-      <div>
-        <TenantSelector />
-      </div>
-      <div>
-        {lastUpdated && (
-          <div>
-            <span>🔄</span>
-            <span>Last updated: {lastUpdated}</span>
-          </div>
-        )}
-        {!lastUpdated && <div />}
-      </div>
-
-      <main>
-        {/* Left Column: Upload Rules */}
-        <section>
-          <span>📝</span>
-          <h2>Add Rules</h2>