# Backend Documentation

This folder contains the production-ready FastAPI stack plus the companion MCP servers that power IntegraChat.

## Directory Overview

- `api/` – FastAPI application (routes, services, storage helpers, MCP clients)
- `mcp_server/` – Unified MCP server exposing rag/web/admin tools via namespaces
- `workers/` – Celery workers and schedulers for async ingestion + analytics maintenance

## Prerequisites

- Python 3.10+
- PostgreSQL (with the `vector` extension) for RAG data, or Supabase with pgvector enabled
- **Supabase (recommended)** for admin rules + analytics storage, with automatic SQLite fallback in `data/`
  - Both `RulesStore` and `AnalyticsStore` automatically detect and use Supabase when configured
  - Falls back to SQLite automatically if Supabase credentials are missing
  - See `SUPABASE_SETUP.md` in the root directory for setup instructions
- Optional: Ollama running locally (default) or Groq API credentials for remote LLMs

Create a virtual environment at the repo root, then:

```bash
pip install -r requirements.txt
cp env.example .env  # update MCP URLs + LLM settings
```

## Running the Services Locally

1. **FastAPI core**

   ```bash
   uvicorn backend.api.main:app --port 8000 --reload
   ```

2. **Unified MCP server (rag/web/admin)**

   ```bash
   python backend/mcp_server/server.py
   ```

   Or use the provided startup script:

   ```bash
   start.bat  # Windows - launches MCP server on port 8900 and FastAPI on port 8000
   ```

   This single server (default port 8900) exposes the following namespaced tools:

   - `rag.search` - Semantic search across tenant documents
   - `rag.ingest` - Ingest text content into the knowledge base
   - `rag.delete` - Delete individual or all documents for a tenant
   - `rag.list` - List all documents for a tenant with pagination
   - `web.search` - Google Programmable Search (Custom Search API) web search
   - `admin.getRules`, `admin.addRule`, `admin.deleteRule`, `admin.logViolation`

   **HTTP Endpoints** (for direct API access):

   - `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` - List documents
   - `POST /rag/ingest` - Ingest content
   - `POST /rag/search` - Search documents (supports `threshold` parameter, default: 0.3)
   - `DELETE /rag/delete/{document_id}?tenant_id={id}` - Delete specific document
   - `DELETE /rag/delete-all?tenant_id={id}` - Delete all documents
   - `POST /web/search` - Web search
   - `POST /admin/*` - Admin operations

3. **Optional workers** (if running Celery-based ingestion/analytics jobs):

   ```bash
   celery -A backend.workers.ingestion_worker worker --loglevel=info
   celery -A backend.workers.analytics_worker worker --loglevel=info
   ```

The Gradio UI (`python app.py`) talks to the FastAPI layer at `http://localhost:8000`.

## Key Endpoints

All endpoints require the `x-tenant-id` header unless otherwise noted.
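As a concrete example of the direct HTTP surface above, a helper can assemble the `GET /rag/list` URL with its documented query parameters. This is a minimal sketch: the base URL assumes the unified MCP server on its default port, and the helper name is illustrative, not from the codebase.

```python
from urllib.parse import urlencode

MCP_BASE = "http://localhost:8900"  # unified MCP server default port


def rag_list_url(tenant_id: str, limit: int = 20, offset: int = 0) -> str:
    """Build the GET /rag/list URL using the query parameters documented above."""
    query = urlencode({"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return f"{MCP_BASE}/rag/list?{query}"


print(rag_list_url("acme-corp"))
# http://localhost:8900/rag/list?tenant_id=acme-corp&limit=20&offset=0
```

Pass the same `tenant_id` you used at ingestion time; the server normalizes whitespace but not arbitrary formatting differences.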
| Service | Path | Notes |
| --- | --- | --- |
| Agent | `POST /agent/message` | Autonomous orchestration (RAG/Web/Admin/LLM) |
| Agent Debug | `POST /agent/debug` | Full reasoning trace + tool plan |
| Agent Plan | `POST /agent/plan` | Dry-run planning without executing tools |
| RAG | `POST /rag/ingest-document` | Rich ingestion (text, URL, metadata) |
| RAG | `POST /rag/ingest-file` | File upload (PDF/DOCX/TXT/MD) |
| RAG | `GET /rag/list` | Paginated document listing per tenant |
| RAG | `DELETE /rag/delete/{document_id}` | Delete specific document |
| RAG | `DELETE /rag/delete-all` | Delete all documents for tenant |
| Admin | `POST /admin/rules` | Regex + severity rule ingestion |
| Analytics | `GET /analytics/overview` | Summary metrics (queries, tokens, red flags) |

Refer to the root `README.md` for the complete endpoint tables.

## Diagnostics & Tenant Isolation

When validating backend changes:

- **API Testing**: Use the FastAPI interactive docs at `http://localhost:8000/docs` to test endpoints
- **Database Inspection**: Connect directly to your PostgreSQL/Supabase instance to verify tenant isolation and check that documents are tagged with the correct `tenant_id`
- **Log Monitoring**: Check FastAPI and MCP server logs for detailed error messages and debugging information

> **Troubleshooting tip:** If you suspect tenant isolation issues, check database queries to confirm they include `WHERE tenant_id = ...` filters, then restart the unified MCP server so it reloads the updated SQL filtering logic.
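To illustrate the required `x-tenant-id` header, the sketch below prepares (but does not send) a `POST /agent/message` request using only the standard library. The `message` body field is an assumption about the payload shape; check the interactive docs at `/docs` for the authoritative schema.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # FastAPI core from this README


def build_agent_request(tenant_id: str, message: str) -> urllib.request.Request:
    """Prepare a POST /agent/message request with the tenant header attached."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/agent/message",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-tenant-id": tenant_id,  # required on all endpoints
        },
        method="POST",
    )


req = build_agent_request("acme-corp", "What is our refund policy?")
# urllib.request.urlopen(req)  # uncomment with the FastAPI server running
```

`urllib` capitalizes header keys on storage, but the wire format is unaffected; any HTTP client works as long as the header is present.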
## Recent Improvements

### Tenant ID Normalization

- All database operations now normalize tenant IDs to handle whitespace and formatting differences
- Documents can be listed and deleted consistently even if stored with slightly different `tenant_id` formatting
- The system automatically matches tenant IDs after normalization, so operations work across different input formats

### HTTP Endpoint Support

- Added GET support for the `/rag/list` endpoint (previously POST-only)
- Added DELETE support for the `/rag/delete/{document_id}` and `/rag/delete-all` endpoints
- All endpoints support both the MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)

### Response Format

- MCP server responses are wrapped in a standard format with `status`, `data`, and `metadata` fields
- The RAG client automatically unwraps responses for seamless integration
- Error responses include detailed messages for easier debugging

### RAG Search Enhancements

- **Cross-Encoder Re-ranking**: Two-stage retrieval for a substantial accuracy improvement:
  - Initial vector search retrieves top candidates using embeddings
  - A cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks the top 10 results
  - Final filtering by threshold and limit is then applied
  - Seamlessly integrated with the existing search API
- **Lowered default threshold** from 0.5 to 0.3 for improved recall of relevant documents
- **Intelligent fallback mechanism** returns the top result even if its similarity score is below the threshold, ensuring knowledge base content is always accessible
- **Configurable threshold** via the `threshold` parameter in search requests (default: 0.3)
- **Enhanced tool selection** automatically triggers RAG for admin questions, fact lookups ("who is", "what is"), and internal knowledge queries
- **Response unwrapping** in the MCP client ensures the orchestrator receives properly formatted results for tool scoring and prompt building

### Conversation Memory System

- **Short-Term Memory**:
Automatic storage of tool outputs per session, with configurable size limits (default: 10 outputs) and TTL (default: 900 seconds / 15 minutes)
- **Session-Based Isolation**: Memory is keyed by `session_id` (not `tenant_id`) for safety, ensuring no cross-tenant data mixing
- **Automatic Injection**: Recent memory is automatically injected into tool payloads as a `memory` field, enabling tools to make context-aware decisions in multi-step workflows
- **Auto-Expiration**: Memory entries expire after the TTL or can be explicitly cleared via the `end_session`/`endSession` flag
- **Configuration**: Tune behavior via environment variables:
  - `MCP_MEMORY_MAX_ITEMS`: Maximum number of tool outputs to keep per session (default: 10)
  - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
- **Testing**: The conversation memory system is exercised through integration tests with the agent orchestrator

### AI-Generated KB Metadata

When ingesting documents, the system automatically extracts rich metadata:

- **Title Extraction**: From filename, URL, or content structure (with intelligent fallback)
- **Summary Generation**: 2-3 sentence summary via LLM (with keyword-based fallback)
- **Tag Extraction**: 5-8 relevant tags extracted from content
- **Topic Identification**: 3-5 main themes identified via LLM
- **Date Detection**: Multiple date formats automatically detected
- **Quality Score**: 0.0-1.0 score based on structure and completeness

**Intelligent Fallback**: When the LLM is unavailable or times out, keyword extraction and pattern matching provide useful metadata.

**Database Integration**: Metadata is stored in a JSONB column (`metadata`) for flexible querying and enhanced RAG search. Migration script: `backend/scripts/migrate_add_metadata.py`.

**API Response**: The ingestion endpoints (`/rag/ingest-document`, `/rag/ingest-file`) now return `extracted_metadata` in the response.
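The short-term memory behavior described above (per-session ring buffer, TTL expiry, explicit session teardown) can be sketched in a few lines. This is an illustrative model only, not the MCP server's implementation; the class and method names are hypothetical, while the defaults mirror `MCP_MEMORY_MAX_ITEMS` and `MCP_MEMORY_TTL_SECONDS`.

```python
import time
from collections import deque


class SessionMemory:
    """Per-session store of recent tool outputs with a size cap and TTL."""

    def __init__(self, max_items: int = 10, ttl_seconds: int = 900):
        self.ttl = ttl_seconds
        self.max_items = max_items
        self._sessions: dict[str, deque] = {}

    def add(self, session_id: str, tool_output: dict) -> None:
        # deque(maxlen=...) evicts the oldest entry once the cap is reached
        buf = self._sessions.setdefault(session_id, deque(maxlen=self.max_items))
        buf.append((time.monotonic(), tool_output))

    def recent(self, session_id: str) -> list[dict]:
        """Return unexpired outputs; this is what gets injected as `memory`."""
        now = time.monotonic()
        return [out for ts, out in self._sessions.get(session_id, deque())
                if now - ts <= self.ttl]

    def end_session(self, session_id: str) -> None:
        # Mirrors the end_session/endSession flag: drop everything at once
        self._sessions.pop(session_id, None)
```

Keying by `session_id` rather than `tenant_id` is what prevents two conversations from the same tenant (or, worse, different tenants) from sharing context.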
### Per-Tool Latency Prediction & Context-Aware Routing

The agent uses routing logic to optimize tool selection:

- **Latency Prediction**: The agent estimates expected latency before tool selection:
  - RAG: 60-120 ms (depends on result count)
  - Web: 400-1800 ms (network-dependent)
  - Admin: <20 ms (local regex matching)
  - LLM: variable, based on model and token count
- **Path Optimization**: The agent chooses the fastest tool sequence based on latency estimates
- **Context-Aware Routing**: Intelligent tool skipping based on previous outputs:
  - High RAG score (≥0.8) → skip web search
  - Critical admin violation → skip agent reasoning, block immediately
  - Relevant memory available → skip RAG, use memory instead
- **Routing Hints**: Context hints are included in the reasoning trace for transparency

**Implementation**: `backend/api/services/tool_metadata.py` defines latency estimates and routing logic; `backend/api/services/tool_selector.py` implements context-aware decisions.

### Tool Output Schemas

Every tool returns a strict JSON schema for consistency:

- **RAG**: `{results: [...], top_score: float, latency_ms: int}`
- **Web**: `{results: [...], latency_ms: int}`
- **Admin**: `{violations: [...], severity: str, latency_ms: int}`
- **LLM**: `{text: str, tokens_used: int, latency_ms: int}`

**Automatic Validation**: All tool outputs are validated and formatted in `AgentOrchestrator` before use, which simplifies debugging and monitoring.

**Schema Definitions**: `backend/api/services/tool_metadata.py` contains `TOOL_OUTPUT_SCHEMAS` with validation functions.
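A minimal validator for the four output shapes listed above might look like the following. This is a sketch, not the actual `TOOL_OUTPUT_SCHEMAS` from `tool_metadata.py`; only the field names and types are taken from the schemas documented here.

```python
# Hypothetical mirror of the documented per-tool output schemas
TOOL_OUTPUT_SCHEMAS = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}


def validate_tool_output(tool: str, output: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the output is valid."""
    schema = TOOL_OUTPUT_SCHEMAS.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    errors = []
    for field, expected in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```

Returning a list of violations (rather than raising on the first one) makes the debug trace more useful when a tool drifts from its contract.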
### UI & UX Improvements

- **Document Display**: Fixed document list formatting - now properly displays as table rows instead of `[object Object]`
- **Rule Deletion**: Enhanced to support both rule numbers and rule text for easier deletion
- **LLM Enhancement Toggle**: Added an option to disable LLM enhancement for faster rule addition
- **Timeout Improvements**: Increased the timeout for bulk rule operations from 45s to 180s

### Role Propagation & Permission Handling

- **Fixed Role Propagation**: The user role is now properly passed from API route → `process_ingestion()` → RAG client → MCP server
  - `rag_client.ingest_with_metadata()` now accepts a `user_role` parameter
  - The role is included in the payload sent to the MCP server: `{"user_role": "owner", ...}`
  - The MCP server extracts the role from the payload via `build_tenant_context()` and uses it for permission checks
- **Improved Error Handling**:
  - Permission errors (403) return clear messages with actionable guidance
  - Error messages specify which roles are allowed for each action
- **Debug Logging**: Added debug logging in route handlers and services to trace role values:
  - Logs the role received in the route handler
  - Logs the role passed to `process_ingestion`
  - Logs the role sent to the RAG client
  - Logs the role received by the MCP server
- **Admin Question Handling**: Fixed admin identity questions to prioritize RAG results from the knowledge base

### UI Enhancements (app.py)

- **Knowledge Base Library Tab**:
  - Statistics cards showing document counts by type
  - Interactive Plotly pie chart for document type distribution
  - Semantic search with relevance scoring
  - Type filtering (text, PDF, FAQ, link)
  - Document management with preview and deletion
  - Auto-refresh after operations
- **Admin Analytics Tab**:
  - Statistics cards for key metrics (queries, users, red flags, RAG searches)
  - Interactive Plotly bar charts for tool usage, latency, and RAG quality
  - Detailed tool usage table with performance metrics
  - Formatted summary with dark theme styling
  - Real-time data fetching and visualization
  - **Access**: All roles can view analytics (viewer, editor, admin, owner)
- **Debug & Reasoning Tab**:
  - Reasoning trace analyzer showing step-by-step agent decision-making
  - Tool invocation timeline with latency visualization
  - **AI metadata display** after document ingestion (title, summary, tags, topics, quality score)
  - **Latency predictions** shown in the reasoning trace (estimated vs. actual)
  - **Context-aware routing hints** visualized (skip web/RAG/reasoning decisions)
  - **Tool output schemas** displayed in the debug view
  - Formatted markdown output with detailed metrics
  - Uses the `/agent/debug` endpoint for comprehensive insights
- **Modern UI/UX**:
  - Dark theme with white text for better readability
  - Custom CSS styling for cards and charts
  - Improved error handling and status messages
  - Responsive layout with proper component scaling

### LLM-Guided Rule Explanation

The rule enhancement system includes intelligent fallback mechanisms:

- **LLM Enhancement**: When available, rules are enhanced with comprehensive explanations, examples, missing patterns, edge cases, and improvements
- **Intelligent Fallback**: When the LLM times out or fails, the system automatically generates basic explanations using keyword extraction:
  - Detects keywords (password, API key, credit card, sensitive data, etc.)
  - Generates contextual explanations based on detected keywords
  - Provides relevant examples (5-8) based on rule type
  - Suggests missing patterns (3-5) for rule improvement
- **Timeout Protection**: 30-second timeout per rule with graceful fallback
- **Chunk Processing**: Bulk rule processing handles failures gracefully - one rule failure doesn't block others

This ensures users always receive useful rule explanations even when the LLM service is unavailable or slow.
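The keyword-based fallback explainer described above can be sketched as a simple lookup over the rule text. The keyword table and function name here are hypothetical; the real fallback also generates examples and missing-pattern suggestions.

```python
# Hypothetical keyword table for the fallback explainer
SENSITIVE_KEYWORDS = {
    "password": "credential leakage",
    "api key": "secret exposure",
    "credit card": "payment-data disclosure",
}


def fallback_explanation(rule_text: str) -> str:
    """Generate a basic explanation when the LLM times out or is unavailable."""
    lowered = rule_text.lower()
    hits = [risk for kw, risk in SENSITIVE_KEYWORDS.items() if kw in lowered]
    if hits:
        return f"This rule guards against {', '.join(hits)}."
    # Generic fallback when no known keyword is detected
    return "This rule flags content matching the configured pattern."
```

The point of the design is graceful degradation: the explanation is less rich than the LLM's, but the user never receives an empty result.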
### Real-Time Visualization Components (Gradio UI)

The Gradio UI includes powerful visualization components:

- **Reasoning Path Visualizer**: Step-by-step visualization of agent reasoning with status indicators and detailed metrics. Available in the Debug & Reasoning tab.
- **Tool Invocation Timeline**: Visual timeline showing tool execution order, latency, and result counts. Available in the Debug & Reasoning tab.
- **Analytics Dashboard**: Query activity and per-tool usage trends. Available in the Admin Analytics tab.

All visualizations populate automatically when agent responses include `reasoning_trace` and `tool_traces` data.

### Context Engineering (Latest)

The system implements comprehensive context engineering strategies based on Anthropic's best practices:

- **ContextEngineer Service** (`backend/api/services/context_engineer.py`):
  - **ContextScratchpad**: Structured note-taking with objectives, architectural decisions, and unresolved issues
  - **ContextCompressor**: High-fidelity compaction and tool result clearing
  - **ContextSelector**: Just-in-time context loading and memory selection
  - **ContextIsolator**: Isolation of large tool outputs
- **Compaction Strategy**:
  - Monitors token usage and compresses at the 80% threshold
  - Uses tool result clearing first (safest), then full compaction
  - Preserves architectural decisions, unresolved issues, and implementation details
  - Targets 60% token usage after compression
- **Structured Prompts**:
  - All prompts use XML-style sections
  - Clear organization improves model understanding
  - Better separation of concerns
- **Integration Points**:
  - Conversation history compression in `agent_orchestrator.py`
  - Tool output compression for RAG and web search
  - Structured scratchpad context in all prompts
  - Memory selection before tool selection
- **Benefits**:
  - Reduced token usage and API costs
  - Support for longer conversations
  - Better agent coherence across extended interactions
  - Improved performance through structured context

Context engineering features are integrated throughout the agent orchestrator and MCP server.

## Environment Variables (excerpt)

Defined in `env.example`:

- `RAG_MCP_URL` - Default: `http://localhost:8900/rag` (unified MCP server)
- `WEB_MCP_URL` - Default: `http://localhost:8900/web` (unified MCP server for Google web search)
- `ADMIN_MCP_URL` - Default: `http://localhost:8900/admin` (unified MCP server)
- `MCP_PORT` - Port for the unified MCP server (default: 8900)
- `MCP_HOST` - Host for the unified MCP server (default: 0.0.0.0)
- `POSTGRESQL_URL` - PostgreSQL connection string with the pgvector extension
- `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
- `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` - **Required for the Supabase backend** (admin rules + analytics)
  - If not set, the system automatically falls back to SQLite in the `data/` directory
- `GOOGLE_SEARCH_API_KEY`, `GOOGLE_SEARCH_CX_ID` - Credentials for Google Programmable Search used by `web.search`
- `MCP_MEMORY_MAX_ITEMS` - Maximum number of tool outputs to keep per session (default: 10)
- `MCP_MEMORY_TTL_SECONDS` - Time-to-live for memory entries in seconds (default: 900)
- `APP_ENV`, `LOG_LEVEL`, `API_PORT`

Update these before starting the servers so the agent can reach every MCP endpoint and LLM runtime.

**Note**: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The `start.bat` script automatically configures the correct URLs.

## Supabase Configuration

Both `RulesStore` and `AnalyticsStore` support dual-backend storage with automatic detection:

### Setup Steps

1. **Create Supabase tables**:
   - Run `supabase_admin_rules_table.sql` in the Supabase SQL Editor (from repo root)
   - Run `supabase_analytics_tables.sql` in the Supabase SQL Editor (from repo root)
2. **Configure environment variables** in `.env`:

   ```env
   SUPABASE_URL=https://your-project-id.supabase.co
   SUPABASE_SERVICE_KEY=your_service_role_key_here
   ```

3.
**Verify configuration**: Check that your Supabase project is accessible and the tables were created correctly.
4. **Migrate existing data** (if you have SQLite data): Use Supabase migration tools or database export/import methods.

### How It Works

- **Automatic Detection**: Both stores check for `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` at initialization
- **Supabase First**: If credentials are found, Supabase is used automatically
- **SQLite Fallback**: If Supabase is not configured, SQLite databases in `data/` are used
- **Startup Logging**: Check the startup logs to see which backend each store is using:
  - `✅ RulesStore: Using Supabase backend`
  - `✅ AnalyticsStore: Using Supabase backend`
  - Or `⚠️ RulesStore: Using SQLite backend` if Supabase is not configured

### Tables Used

- **Admin Rules**: `admin_rules` table in Supabase
- **Analytics**: `tool_usage_events`, `redflag_violations`, `rag_search_events`, `agent_query_events`

For detailed Supabase setup instructions, refer to the Supabase documentation and ensure the tables are created correctly.
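The automatic detection described under "How It Works" amounts to an environment check at store initialization. The sketch below is illustrative (the function name is not from the codebase); the real logic lives inside `RulesStore` and `AnalyticsStore`.

```python
import os


def detect_backend() -> str:
    """Choose Supabase when both credentials are present, else SQLite in data/."""
    if os.getenv("SUPABASE_URL") and os.getenv("SUPABASE_SERVICE_KEY"):
        return "supabase"
    return "sqlite"
```

Because both variables must be non-empty, a `.env` with only `SUPABASE_URL` set still falls back to SQLite, which matches the startup-log warnings described above.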
## Unified MCP tool instructions

Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:

| Namespace | Tool | Purpose | HTTP Endpoint |
| --- | --- | --- | --- |
| `rag` | `search` | Retrieve tenant-scoped document chunks | `POST /rag/search` |
| `rag` | `ingest` | Chunk + store new knowledge | `POST /rag/ingest` |
| `rag` | `list` | List all documents for a tenant | `GET /rag/list?tenant_id={id}` |
| `rag` | `delete` | Remove one/all stored documents | `DELETE /rag/delete/{id}?tenant_id={id}` or `DELETE /rag/delete-all?tenant_id={id}` |
| `web` | `search` | Google Programmable Search (Custom Search API) | `POST /web/search` |
| `admin` | `getRules` | Fetch tenant governance rules (list or detailed) | `POST /admin/getRules` |
| `admin` | `addRule` | Insert or update a rule | `POST /admin/addRule` |
| `admin` | `deleteRule` | Remove a rule by text | `POST /admin/deleteRule` |
| `admin` | `logViolation` | Persist a red-flag event into analytics | `POST /admin/logViolation` |

**Important Notes:**

- Always send `tenant_id` in the payload (or as a query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
- The MCP server automatically normalizes tenant IDs to ensure consistent matching across operations
- All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
- Tenant ID normalization handles whitespace so documents can be listed and deleted consistently
- RAG search uses a default threshold of 0.3 for better recall; adjust via the `threshold` parameter if needed
- **Conversation Memory**: Send `session_id` (or `sessionId`/`conversation_id`/`conversationId`) in tool payloads to enable short-term memory. Recent tool outputs are automatically stored and injected into subsequent tool calls as a `memory` field.
Send `end_session: true` to clear memory for a session.

## Troubleshooting

### RAG Search Not Returning Results

- **Check the similarity threshold**: The default is 0.3. If results are still not found, try lowering it to 0.2 or 0.1
- **Verify documents are ingested**: Use `GET /rag/list?tenant_id={id}` to confirm documents exist for the tenant
- **Check tenant ID matching**: Ensure the `tenant_id` used for search matches the one used for ingestion (normalization handles whitespace automatically)
- **Review search logs**: Check MCP server logs for search metrics (`hits_count`, `avg_score`, `top_score`)

### Agent Not Using RAG for Knowledge Base Questions

- **Verify RAG results are being found**: Check the agent debug endpoint (`POST /agent/debug`) to see whether RAG results are being pre-fetched
- **Check tool scores**: The debug output shows a `rag_fitness` score; if it is low (< 0.4), the agent may skip RAG
- **Ensure knowledge base content exists**: Questions like "who is the admin" require relevant content in the knowledge base
- **Pattern matching**: The tool selector automatically triggers RAG for patterns like "admin", "who is", and "what is", but semantic similarity also plays a role

### Document Ingestion Permission Errors

- **403 Forbidden**: If you see "Role 'viewer' is not permitted to perform 'ingest_documents'":
  - Ensure the `x-user-role` header is set to "editor", "admin", or "owner"
  - Verify the `x-user-role` header is being sent correctly (check backend logs for `🔍 DEBUG:` messages)
  - Check that the role is propagated through the pipeline: route handler → `process_ingestion` → RAG client → MCP server
  - Review debug logs to see where the role might be getting lost or defaulting to "viewer"
- **Role Propagation**: The role must flow through:
  1. Client sends the `x-user-role` header
  2. Route handler receives it as the `x_user_role` parameter
  3. Route handler passes it to `process_ingestion(user_role=...)`
  4. `process_ingestion` passes it to `rag_client.ingest_with_metadata(user_role=...)`
  5. RAG client includes it in the payload: `{"user_role": "...", ...}`
  6. MCP server extracts it via `build_tenant_context()` and uses it for permission checks

### Document Deletion Issues

- **404 Not Found**: Verify the `document_id` exists and belongs to the correct tenant
- **Tenant ID mismatch**: The system normalizes tenant IDs, but ensure you're using the same `tenant_id` format as when documents were ingested
- **Check logs**: Database deletion logs show detailed information about tenant ID matching and document existence
- **Role Propagation**: Ensure the user role is being passed correctly - deletion requires the `admin` or `owner` role. The role is propagated from API request → API → RAG client → MCP server

### Rule Management Issues

- **Timeout Errors**: If rule enhancement times out:
  - Disable LLM enhancement by not setting `enhance=true` in the request
  - Add rules in smaller batches (1-3 rules at a time)
  - The enhancement timeout was increased to 180s per chunk (5 rules)
- **Rule Deletion 404**: You can delete by rule number or full text. If using a number, ensure it is a valid 1-based index
- **Permission Errors**: Rule management requires the `admin` or `owner` role. Check that the role is set correctly in the `x-user-role` header

### Supabase Configuration Issues

- **Data still going to SQLite**: Check that `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` are set correctly in `.env` (no quotes, no spaces)
- **Service role key errors**: Make sure you're using the **service_role** key (not the anon key) from Supabase Dashboard → Settings → API
- **Tables don't exist**: Run `supabase_admin_rules_table.sql` and `supabase_analytics_tables.sql` in the Supabase SQL Editor
- **Permission errors**: Check that RLS policies in Supabase allow service-role access
- **Startup warnings**: Check FastAPI startup logs to see which backend each store is using (`✅` for Supabase, `⚠️` for SQLite fallback)