
Backend Documentation

This folder contains the production-ready FastAPI stack plus the companion MCP servers that power IntegraChat.

Directory Overview

  • api/ – FastAPI application (routes, services, storage helpers, MCP clients)
  • mcp_server/ – Unified MCP server exposing rag/web/admin tools via namespaces
  • workers/ – Celery workers and schedulers for async ingestion + analytics maintenance

Prerequisites

  • Python 3.10+
  • PostgreSQL (with the vector extension) for RAG data, or Supabase with pgvector enabled
  • Supabase (recommended) for admin rules + analytics storage, with automatic SQLite fallback in data/
    • Both RulesStore and AnalyticsStore automatically detect and use Supabase when configured
    • Falls back to SQLite automatically if Supabase credentials are missing
    • See SUPABASE_SETUP.md in the root directory for setup instructions
  • Optional: Ollama running locally (default) or Groq API credentials for remote LLMs

Create a virtual environment at the repo root, then:

pip install -r requirements.txt
cp env.example .env   # update MCP URLs + LLM settings

Running the Services Locally

  1. FastAPI core

    uvicorn backend.api.main:app --port 8000 --reload
    
  2. Unified MCP server (rag/web/admin)

    python backend/mcp_server/server.py
    

    Or use the provided startup script:

    start.bat  # Windows - launches MCP server on port 8900 and FastAPI on port 8000
    

    This single server (default port 8900) exposes the following namespaced tools:

    • rag.search - Semantic search across tenant documents
    • rag.ingest - Ingest text content into knowledge base
    • rag.delete - Delete individual or all documents for a tenant
    • rag.list - List all documents for a tenant with pagination
    • web.search - Web search via Google Programmable Search (Custom Search API)
    • admin.getRules, admin.addRule, admin.deleteRule, admin.logViolation

    HTTP Endpoints (for direct API access):

    • GET /rag/list?tenant_id={id}&limit={n}&offset={n} - List documents
    • POST /rag/ingest - Ingest content
    • POST /rag/search - Search documents (supports threshold parameter, default: 0.3)
    • DELETE /rag/delete/{document_id}?tenant_id={id} - Delete specific document
    • DELETE /rag/delete-all?tenant_id={id} - Delete all documents
    • POST /web/search - Web search
    • POST /admin/* - Admin operations
  3. Optional workers (if running Celery-based ingestion/analytics jobs):

    celery -A backend.workers.ingestion_worker worker --loglevel=info
    celery -A backend.workers.analytics_worker worker --loglevel=info
    

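The HTTP endpoints above can be exercised with only the Python standard library. In this sketch the tenant ID acme-corp is illustrative, and the default limit/offset values are assumptions rather than documented server defaults:

```python
import json
import urllib.parse
import urllib.request

MCP_BASE = "http://localhost:8900"  # default unified MCP server port

def build_list_url(tenant_id: str, limit: int = 20, offset: int = 0) -> str:
    # Build GET /rag/list with tenant scoping and pagination
    query = urllib.parse.urlencode(
        {"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return f"{MCP_BASE}/rag/list?{query}"

def list_documents(tenant_id: str, limit: int = 20, offset: int = 0) -> dict:
    # Requires the MCP server to be running locally
    with urllib.request.urlopen(build_list_url(tenant_id, limit, offset)) as resp:
        return json.loads(resp.read())

print(build_list_url("acme-corp", limit=5))
```

The same pattern works for the DELETE endpoints by setting the request method explicitly.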
The Gradio UI (python app.py) and the Next.js operator console (see frontend/README.md) both talk to the FastAPI layer at http://localhost:8000.

Key Endpoints

All endpoints require the x-tenant-id header unless otherwise noted.

Service Path Notes
Agent POST /agent/message Autonomous orchestration (RAG/Web/Admin/LLM)
Agent Debug POST /agent/debug Full reasoning trace + tool plan
Agent Plan POST /agent/plan Dry-run planning without executing tools
RAG POST /rag/ingest-document Rich ingestion (text, URL, metadata)
RAG POST /rag/ingest-file File upload (PDF/DOCX/TXT/MD)
RAG GET /rag/list Paginated document listing per tenant (requires x-tenant-id header)
RAG DELETE /rag/delete/{document_id} Delete specific document (requires x-tenant-id header)
RAG DELETE /rag/delete-all Delete all documents for tenant (requires x-tenant-id header)
Admin POST /admin/rules Regex + severity rule ingestion
Analytics GET /analytics/overview Summary metrics (queries, tokens, red flags)
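A hedged sketch of calling the agent endpoint from Python. The x-tenant-id and x-user-role header names come from this document; the exact JSON body expected by /agent/message (here a single message field) is an assumption, so check the route definition before relying on it:

```python
import json

API_BASE = "http://localhost:8000"  # FastAPI layer

def build_agent_request(tenant_id: str, message: str, role: str = "viewer"):
    # Header names come from this README; the body shape is an assumption.
    headers = {
        "x-tenant-id": tenant_id,        # required by all endpoints
        "x-user-role": role,             # consumed by permission checks
        "Content-Type": "application/json",
    }
    return f"{API_BASE}/agent/message", headers, json.dumps({"message": message})

url, headers, body = build_agent_request("acme-corp", "What is our refund policy?")
```

Pass the three values to any HTTP client (requests, httpx, urllib) to issue the POST.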

Refer to the root README.md for the complete endpoint tables.

Diagnostics & Tenant Isolation

Use the helper scripts in the repo root when validating backend changes:

  • python verify_tenant_isolation.py – Exercises analytics logging, admin rule CRUD, API reachability, and proves RAG tenant isolation by ingesting + querying as multiple tenants.
  • python check_rag_database.py – Talks directly to the pgvector database to list tenant IDs, preview stored chunks, and run safeguarded searches via search_vectors(). Helpful when troubleshooting suspected cross-tenant leakage.
  • python verify_supabase_setup.py – Verifies Supabase configuration and shows which backend (Supabase or SQLite) each store is using.
  • python check_supabase_rules.py – Checks Supabase admin rules configuration and RLS policies.
  • python migrate_sqlite_to_supabase.py – One-shot migration script to copy existing SQLite data to Supabase.
  • python test_manual.py – Legacy manual smoke test harness (analytics store, admin rules, API surface).

Troubleshooting tip: If the isolation script reports a failure, first run check_rag_database.py to confirm documents are tagged with the correct tenant_id, then restart the unified MCP server so it reloads the updated SQL filtering logic.

Recent Improvements

Tenant ID Normalization

  • All database operations now normalize tenant IDs to absorb whitespace and formatting differences
  • Documents can be listed and deleted consistently even if they were stored with slightly different tenant_id formatting, because IDs are matched after normalization
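Sketched, the normalization might look like this; the exact rules (for example whether IDs are lowercased) are an implementation detail of the storage helpers:

```python
def normalize_tenant_id(raw: str) -> str:
    # Trim surrounding whitespace, collapse internal runs, lowercase.
    # Illustrative only: the real rules live in the storage helpers.
    return " ".join(raw.split()).lower()
```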

HTTP Endpoint Support

  • Added GET support for /rag/list endpoint (previously POST-only)
  • Added DELETE support for /rag/delete/{document_id} and /rag/delete-all endpoints
  • All endpoints support both MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)

Response Format

  • MCP server responses are wrapped in a standard format with status, data, and metadata fields
  • RAG client automatically unwraps responses for seamless integration
  • Error responses include detailed messages for better debugging
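A sketch of the unwrap step the RAG client performs; the concrete status values ("ok"/"error") are assumptions based on the field names above:

```python
def unwrap_mcp_response(wrapped: dict) -> dict:
    # Envelope fields per this README: status, data, metadata.
    # The literal status values are assumptions.
    if wrapped.get("status") == "error":
        detail = wrapped.get("data") or {}
        raise RuntimeError(detail.get("message", "unknown MCP error"))
    return wrapped.get("data", {})

sample = {"status": "ok",
          "data": {"results": [], "top_score": 0.0},
          "metadata": {"latency_ms": 12}}
print(unwrap_mcp_response(sample))
```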

RAG Search Enhancements

  • Cross-Encoder Re-ranking: Two-stage retrieval for a substantial accuracy improvement:
    • Initial vector search retrieves top candidates using embeddings
    • Cross-encoder model (cross-encoder/ms-marco-MiniLM-L-6-v2) re-ranks top 10 results
    • Final filtering by threshold and limit applied
    • Seamlessly integrated with existing search API
  • Lowered default threshold from 0.5 to 0.3 for improved recall of relevant documents
  • Intelligent fallback mechanism returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
  • Configurable threshold via threshold parameter in search requests (default: 0.3)
  • Enhanced tool selection automatically triggers RAG for admin questions, fact lookups ("who is", "what is"), and internal knowledge queries
  • Response unwrapping in MCP client ensures orchestrator receives properly formatted results for tool scoring and prompt building
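The two-stage flow, including the fallback behaviour, can be sketched as follows. The real implementation scores pairs with cross-encoder/ms-marco-MiniLM-L-6-v2 (for example via sentence-transformers' CrossEncoder.predict); here the scorer is injected so the sketch stays runnable without model downloads:

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]

def rerank(query: str,
           candidates: List[Scored],               # (chunk, vector score), best first
           score_fn: Callable[[str, str], float],  # e.g. wraps CrossEncoder.predict
           top_k: int = 10,
           threshold: float = 0.3,
           limit: int = 5) -> List[Scored]:
    # Stage 2: re-score the top_k vector hits with the cross-encoder
    rescored = sorted(((text, score_fn(query, text)) for text, _ in candidates[:top_k]),
                      key=lambda pair: pair[1], reverse=True)
    kept = [pair for pair in rescored if pair[1] >= threshold][:limit]
    return kept or rescored[:1]   # fallback: always surface the single best hit
```

The top_k, threshold, and limit defaults mirror the values quoted in this section; the function signature itself is illustrative.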

Conversation Memory System

  • Short-Term Memory: Automatic storage of tool outputs per session with configurable size limits (default: 10 outputs) and TTL (default: 900 seconds / 15 minutes)
  • Session-Based Isolation: Memory is keyed by session_id (not tenant_id) for safety, ensuring no cross-tenant data mixing
  • Automatic Injection: Recent memory is automatically injected into tool payloads as a memory field, enabling tools to make context-aware decisions in multi-step workflows
  • Auto-Expiration: Memory entries automatically expire after TTL or can be explicitly cleared via end_session/endSession flag
  • Configuration: Tune behavior via environment variables:
    • MCP_MEMORY_MAX_ITEMS: Maximum number of tool outputs to keep per session (default: 10)
    • MCP_MEMORY_TTL_SECONDS: Time-to-live for memory entries in seconds (default: 900)
  • Comprehensive Testing: Full test suite in backend/tests/test_conversation_memory.py covering storage, retrieval, expiration, and multi-step workflows
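The memory behaviour described above, sketched as a small in-process store (the production implementation lives in the MCP server and may differ in detail):

```python
import time
from collections import defaultdict, deque

MCP_MEMORY_MAX_ITEMS = 10      # default per this README
MCP_MEMORY_TTL_SECONDS = 900   # default per this README (15 minutes)

class ConversationMemory:
    """Per-session short-term memory with a size cap and TTL.
    Keyed by session_id (never tenant_id), matching the isolation rule above."""

    def __init__(self, max_items=MCP_MEMORY_MAX_ITEMS, ttl=MCP_MEMORY_TTL_SECONDS):
        self.max_items, self.ttl = max_items, ttl
        self._store = defaultdict(deque)   # session_id -> deque of (timestamp, output)

    def add(self, session_id: str, tool_output: dict) -> None:
        entries = self._store[session_id]
        entries.append((time.monotonic(), tool_output))
        while len(entries) > self.max_items:
            entries.popleft()              # evict oldest beyond the cap

    def recent(self, session_id: str) -> list:
        # Return unexpired outputs (oldest first), pruning stale entries
        now = time.monotonic()
        entries = self._store[session_id]
        while entries and now - entries[0][0] > self.ttl:
            entries.popleft()
        return [out for _, out in entries]

    def end_session(self, session_id: str) -> None:
        self._store.pop(session_id, None)  # explicit clear (end_session flag)
```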

AI-Generated KB Metadata

When ingesting documents, the system automatically extracts rich metadata:

  • Title Extraction: From filename, URL, or content structure (with intelligent fallback)
  • Summary Generation: 2-3 sentence summary via LLM (with keyword-based fallback)
  • Tag Extraction: 5-8 relevant tags extracted from content
  • Topic Identification: 3-5 main themes identified via LLM
  • Date Detection: Multiple date formats automatically detected
  • Quality Score: 0.0-1.0 score based on structure and completeness

Intelligent Fallback: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata.

Database Integration: Metadata stored in JSONB column (metadata) for flexible querying and enhanced RAG search. Migration script: backend/scripts/migrate_add_metadata.py.

API Response: Ingestion endpoints (/rag/ingest-document, /rag/ingest-file) now return extracted_metadata in the response.

Per-Tool Latency Prediction & Context-Aware Routing

The agent now uses sophisticated routing logic to optimize tool selection:

  • Latency Prediction: Agent estimates expected latency before tool selection:
    • RAG: 60-120ms (depends on result count)
    • Web: 400-1800ms (network-dependent)
    • Admin: <20ms (local regex matching)
    • LLM: Variable based on model and token count
  • Path Optimization: Agent chooses fastest tool sequence based on latency estimates
  • Context-Aware Routing: Intelligent tool skipping based on previous outputs:
    • High RAG score (≥0.8) → Skip web search
    • Critical admin violation → Skip agent reasoning, immediate block
    • Relevant memory available → Skip RAG, use memory instead
  • Routing Hints: Context hints included in reasoning trace for transparency

Implementation: backend/api/services/tool_metadata.py defines latency estimates and routing logic. backend/api/services/tool_selector.py implements context-aware decisions.

Tool Output Schemas

Every tool now returns strict JSON schemas for consistency:

  • RAG: {results: [...], top_score: float, latency_ms: int}
  • Web: {results: [...], latency_ms: int}
  • Admin: {violations: [...], severity: str, latency_ms: int}
  • LLM: {text: str, tokens_used: int, latency_ms: int}

Automatic Validation: All tool outputs validated and formatted in AgentOrchestrator before use. Makes debugging and monitoring simpler.

Schema Definitions: backend/api/services/tool_metadata.py contains TOOL_OUTPUT_SCHEMAS with validation functions.
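A minimal sketch of that validation. The field names match the schemas listed above, while the validator itself is illustrative (the real TOOL_OUTPUT_SCHEMAS ships in tool_metadata.py):

```python
TOOL_OUTPUT_SCHEMAS = {
    "rag":   {"results": list, "top_score": float, "latency_ms": int},
    "web":   {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm":   {"text": str, "tokens_used": int, "latency_ms": int},
}

def validate_tool_output(tool: str, output: dict) -> dict:
    # Strict sketch: every field must be present with the declared type
    schema = TOOL_OUTPUT_SCHEMAS[tool]
    for field, typ in schema.items():
        if not isinstance(output.get(field), typ):
            raise ValueError(f"{tool} output: bad or missing field '{field}'")
    return output
```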

Role Propagation & Permission Handling

  • Fixed Role Propagation: User role is now properly passed from API route → process_ingestion() → RAG client → MCP server
    • rag_client.ingest_with_metadata() now accepts user_role parameter
    • Role is included in payload sent to MCP server: {"user_role": "owner", ...}
    • MCP server extracts role from payload via build_tenant_context() and uses it for permission checks
  • Improved Error Handling:
    • Permission errors (403) return clear messages with actionable guidance
    • Error messages specify which roles are allowed for each action
    • Frontend displays user-friendly error messages with instructions
  • Debug Logging: Added debug logging in route handlers and services to trace role values:
    • Logs role received in route handler
    • Logs role passed to process_ingestion
    • Logs role sent to RAG client
    • Logs role received by MCP server
  • Admin Question Handling: Fixed admin identity questions to prioritize RAG results from knowledge base
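The payload shape and permission check can be sketched as below. The allowed-role set is an assumption inferred from the 403 guidance in the Troubleshooting section; the authoritative check runs in the MCP server via build_tenant_context():

```python
def build_ingest_payload(tenant_id: str, text: str, user_role: str = "viewer") -> dict:
    # The role rides along with the content, as described above
    return {"tenant_id": tenant_id, "text": text, "user_role": user_role}

# Assumption: roles other than viewer may ingest (per the 403 guidance)
ALLOWED_INGEST_ROLES = {"editor", "admin", "owner"}

def check_ingest_permission(payload: dict) -> None:
    role = payload.get("user_role", "viewer")   # a missing role defaults to viewer
    if role not in ALLOWED_INGEST_ROLES:
        raise PermissionError(
            f"Role '{role}' is not permitted to perform 'ingest_documents'; "
            f"allowed roles: {sorted(ALLOWED_INGEST_ROLES)}")
```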

UI Enhancements (app.py)

  • Knowledge Base Library Tab:

    • Statistics cards showing document counts by type
    • Interactive Plotly pie chart for document type distribution
    • Semantic search with relevance scoring
    • Type filtering (text, PDF, FAQ, link)
    • Document management with preview and deletion
    • Auto-refresh after operations
  • Admin Analytics Tab:

    • Statistics cards for key metrics (queries, users, red flags, RAG searches)
    • Interactive Plotly bar charts for tool usage, latency, and RAG quality
    • Detailed tool usage table with performance metrics
    • Formatted summary with dark theme styling
    • Real-time data fetching and visualization
    • Access: All roles can view analytics (viewer, editor, admin, owner)
  • Debug & Reasoning Tab:

    • Reasoning trace analyzer showing step-by-step agent decision-making
    • Tool invocation timeline with latency visualization
    • AI metadata display after document ingestion (title, summary, tags, topics, quality score)
    • Latency predictions shown in reasoning trace (estimated vs actual)
    • Context-aware routing hints visualized (skip web/RAG/reasoning decisions)
    • Tool output schemas displayed in debug view
    • Formatted markdown output with detailed metrics
    • Uses /agent/debug endpoint for comprehensive insights
  • Modern UI/UX:

    • Dark theme with white text for better readability
    • Custom CSS styling for cards and charts
    • Improved error handling and status messages
    • Responsive layout with proper component scaling

LLM-Guided Rule Explanation

The rule enhancement system includes intelligent fallback mechanisms:

  • LLM Enhancement: When available, rules are enhanced with comprehensive explanations, examples, missing patterns, edge cases, and improvements
  • Intelligent Fallback: When LLM times out or fails, the system automatically generates basic explanations using keyword extraction:
    • Detects keywords (password, API key, credit card, sensitive data, etc.)
    • Generates contextual explanations based on detected keywords
    • Provides relevant examples (5-8 examples) based on rule type
    • Suggests missing patterns (3-5 suggestions) for rule improvement
  • Timeout Protection: 30-second timeout per rule with graceful fallback
  • Chunk Processing: Bulk rule processing handles failures gracefully - one rule failure doesn't block others

This ensures users always receive useful rule explanations even when the LLM service is unavailable or slow.
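A sketch of the keyword fallback; the keyword table and wording are placeholders, not the production list:

```python
SENSITIVE_KEYWORDS = {
    "password": "credentials that must never be shared in chat",
    "api key": "secret tokens that grant programmatic access",
    "credit card": "payment data that needs restricted handling",
}

def fallback_explanation(rule_text: str) -> str:
    # Used when the LLM times out or fails; purely keyword-driven
    lowered = rule_text.lower()
    hits = [desc for kw, desc in SENSITIVE_KEYWORDS.items() if kw in lowered]
    if hits:
        return "This rule guards " + "; ".join(hits) + "."
    return "This rule flags messages matching the configured pattern."
```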

Real-Time Visualization Components (Next.js Frontend)

The Next.js frontend includes three powerful visualization components:

  • Reasoning Path Visualizer: Step-by-step visualization of agent reasoning with animated progression, status indicators, and detailed metrics. Integrated into chat panel.
  • Tool Invocation Timeline: Visual timeline showing tool execution order, latency, and result counts. Integrated into chat panel.
  • Tenant Activity Heatmap: Query activity heatmap and per-tool usage trends. Integrated into analytics page.

All visualizations are accessible to all roles and automatically populate when agent responses include reasoning_trace and tool_traces data.

Environment Variables (excerpt)

Defined in env.example:

  • RAG_MCP_URL - Default: http://localhost:8900/rag (unified MCP server)
  • WEB_MCP_URL - Default: http://localhost:8900/web (unified MCP server for Google web search)
  • ADMIN_MCP_URL - Default: http://localhost:8900/admin (unified MCP server)
  • MCP_PORT - Port for unified MCP server (default: 8900)
  • MCP_HOST - Host for unified MCP server (default: 0.0.0.0)
  • POSTGRESQL_URL - PostgreSQL connection string with pgvector extension
  • OLLAMA_URL, OLLAMA_MODEL (or GROQ_API_KEY + LLM_BACKEND=groq)
  • SUPABASE_URL, SUPABASE_SERVICE_KEY - Required for Supabase backend (admin rules + analytics)
    • If not set, the system automatically falls back to SQLite in data/ directory
    • See SUPABASE_SETUP.md in the root directory for detailed setup instructions
  • GOOGLE_SEARCH_API_KEY, GOOGLE_SEARCH_CX_ID - Credentials for Google Programmable Search used by web.search
  • MCP_MEMORY_MAX_ITEMS - Maximum number of tool outputs to keep per session (default: 10)
  • MCP_MEMORY_TTL_SECONDS - Time-to-live for memory entries in seconds (default: 900)
  • APP_ENV, LOG_LEVEL, API_PORT

Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.

Note: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The start.bat script automatically configures the correct URLs.

Supabase Configuration

Both RulesStore and AnalyticsStore support dual-backend storage with automatic detection:

Setup Steps

  1. Create Supabase tables:

    • Run supabase_admin_rules_table.sql in Supabase SQL Editor (from repo root)
    • Run supabase_analytics_tables.sql in Supabase SQL Editor (from repo root)
  2. Configure environment variables in .env:

    SUPABASE_URL=https://your-project-id.supabase.co
    SUPABASE_SERVICE_KEY=your_service_role_key_here
    
  3. Verify configuration:

    python verify_supabase_setup.py
    
  4. Migrate existing data (if you have SQLite data):

    python migrate_sqlite_to_supabase.py
    

How It Works

  • Automatic Detection: Both stores check for SUPABASE_URL and SUPABASE_SERVICE_KEY at initialization
  • Supabase First: If credentials are found, Supabase is used automatically
  • SQLite Fallback: If Supabase is not configured, SQLite databases in data/ are used
  • Startup Logging: Check startup logs to see which backend each store is using:
    • ✅ RulesStore: Using Supabase backend
    • ✅ AnalyticsStore: Using Supabase backend
    • Or ⚠️ RulesStore: Using SQLite backend if Supabase is not configured

Tables Used

  • Admin Rules: admin_rules table in Supabase
  • Analytics: tool_usage_events, redflag_violations, rag_search_events, agent_query_events

See SUPABASE_SETUP.md and SUPABASE_MIGRATION_COMPLETE.md in the root directory for detailed instructions and troubleshooting.

Unified MCP tool instructions

Agents that speak the Model Context Protocol should connect to the integrachat server ID defined in backend/mcp_server/server.py and call the namespaced tools directly:

Namespace Tool Purpose HTTP Endpoint
rag search Retrieve tenant-scoped document chunks POST /rag/search
rag ingest Chunk + store new knowledge POST /rag/ingest
rag list List all documents for tenant GET /rag/list?tenant_id={id}
rag delete Remove one/all stored documents DELETE /rag/delete/{id}?tenant_id={id} or DELETE /rag/delete-all?tenant_id={id}
web search Google Programmable Search (Custom Search API) POST /web/search
admin getRules Fetch tenant governance rules (list or detailed) POST /admin/getRules
admin addRule Insert or update a rule POST /admin/addRule
admin deleteRule Remove a rule by text POST /admin/deleteRule
admin logViolation Persist a red-flag event into analytics POST /admin/logViolation

Important Notes:

  • Always send tenant_id in the payload (or as query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
  • The MCP server automatically normalizes tenant IDs (handling whitespace) so documents can be listed and deleted consistently across operations
  • All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
  • RAG search uses a default threshold of 0.3 for better recall; adjust via threshold parameter if needed
  • Conversation Memory: Send session_id (or sessionId/conversation_id/conversationId) in tool payloads to enable short-term memory. Recent tool outputs are automatically stored and injected into subsequent tool calls as a memory field. Send end_session: true to clear memory for a session.

Troubleshooting

RAG Search Not Returning Results

  • Check similarity threshold: The default threshold is 0.3. If results are still not found, try lowering it to 0.2 or 0.1
  • Verify documents are ingested: Use GET /rag/list?tenant_id={id} to confirm documents exist for the tenant
  • Check tenant ID matching: Ensure the tenant_id used for search matches the one used for ingestion (normalization handles whitespace automatically)
  • Review search logs: Check MCP server logs for search metrics (hits_count, avg_score, top_score)

Agent Not Using RAG for Knowledge Base Questions

  • Verify RAG results are being found: Check the agent debug endpoint (POST /agent/debug) to see if RAG results are being pre-fetched
  • Check tool scores: The debug output shows rag_fitness score; if it's low (< 0.4), the agent may skip RAG
  • Ensure knowledge base content exists: Questions like "who is the admin" require relevant content in the knowledge base
  • Pattern matching: The tool selector automatically triggers RAG for patterns like "admin", "who is", "what is", but semantic similarity also plays a role

Document Ingestion Permission Errors

  • 403 Forbidden: If you see "Role 'viewer' is not permitted to perform 'ingest_documents'":
    • Change your role in the UI dropdown (top right) from "viewer" to "editor", "admin", or "owner"
    • Verify x-user-role header is being sent correctly (check backend logs for the DEBUG: messages)
    • Check that role is being propagated through the pipeline: route handler → process_ingestion → RAG client → MCP server
    • Review debug logs to see where role might be getting lost or defaulting to "viewer"
  • Role Propagation: The role must flow through:
    1. UI sends x-user-role header
    2. Route handler receives it as x_user_role parameter
    3. Route handler passes it to process_ingestion(user_role=...)
    4. process_ingestion passes it to rag_client.ingest_with_metadata(user_role=...)
    5. RAG client includes it in payload: {"user_role": "...", ...}
    6. MCP server extracts it via build_tenant_context() and uses for permission checks

Document Deletion Issues

  • 404 Not Found: Verify the document_id exists and belongs to the correct tenant
  • Tenant ID mismatch: The system normalizes tenant IDs, but ensure you're using the same tenant_id format as when documents were ingested
  • Check logs: Database deletion logs show detailed information about tenant ID matching and document existence

Supabase Configuration Issues

  • Data still going to SQLite: Check that SUPABASE_URL and SUPABASE_SERVICE_KEY are set correctly in .env (no quotes, no spaces)
  • Service role key errors: Make sure you're using the service_role key (not anon key) from Supabase Dashboard → Settings → API
  • Tables don't exist: Run supabase_admin_rules_table.sql and supabase_analytics_tables.sql in Supabase SQL Editor
  • Permission errors: Check RLS policies in Supabase allow service role access
  • Startup warnings: Check FastAPI startup logs to see which backend each store is using (✅ for Supabase, ⚠️ for SQLite fallback)